* [NYE DELUGE 3/4] xfs: modernize the realtime volume
From: Darrick J. Wong @ 2022-12-30 21:14 UTC
To: djwong; +Cc: xfs, fstests

Hi all,

This third patchset deluge is for the realtime modernization project.
There are five main parts to this effort -- adding a metadata directory
tree; sharding the realtime volume into allocation groups to reduce
metadata lock contention; adding reverse mapping; adding reflink; and
adding the one piece needed to make quotas work on realtime.  This
brings the robustness of the realtime volume up to par with the data
volume.

Originally, the modernization effort was a side project that was
intended to match XFS up to the proliferation of persistent memory.  The
data device would store metadata on cheap(er) flash storage, and the
realtime volume would be used to map persistent memory to files and take
advantage of the ability to do PMD-aligned allocations.  It's now less
clear how much of that will actually happen (CXL?), but the code's
finished, reasonably well tested, and ready for review.

NOTE: I hacked up metadump to support saving the metadata contents of
external logs and realtime devices so that I could run fuzz testing in
the least thoughtful way possible.  Chandan is working on improving the
deployment image story for our customers, and will likely produce
something better than my rush job.

As a warning, the patches will likely take several days to trickle in.

--D

* [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs
From: Darrick J. Wong @ 2022-12-30 22:17 UTC
To: djwong; +Cc: linux-xfs

Hi all,

This series hoists inode creation, renaming, and deletion operations to
libxfs in anticipation of the metadata inode directory feature, which
maintains a directory tree of metadata inodes.  This will be necessary
for further enhancements to the realtime feature, subvolume support.

There aren't supposed to be any functional changes in this intense
refactoring -- we just split the functions into pieces that are generic
and pieces that are specific to libxfs clients.  As a bonus, we can
remove various open-coded pieces of mkfs.xfs and xfs_repair when this
series gets to xfsprogs.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=inode-refactor

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=inode-refactor
---
 fs/xfs/Makefile                 |    1 
 fs/xfs/libxfs/xfs_bmap.c        |   42 +
 fs/xfs/libxfs/xfs_bmap.h        |    3 
 fs/xfs/libxfs/xfs_dir2.c        |  479 +++++++++++++++
 fs/xfs/libxfs/xfs_dir2.h        |   19 +
 fs/xfs/libxfs/xfs_ialloc.c      |   20 +
 fs/xfs/libxfs/xfs_inode_util.c  |  698 ++++++++++++++++++++++
 fs/xfs/libxfs/xfs_inode_util.h  |   79 +++
 fs/xfs/libxfs/xfs_shared.h      |    7 
 fs/xfs/libxfs/xfs_trans_inode.c |    2 
 fs/xfs/scrub/tempfile.c         |   20 -
 fs/xfs/xfs_inode.c              | 1231 +++++---------------------------------
 fs/xfs/xfs_inode.h              |   46 +
 fs/xfs/xfs_ioctl.c              |   60 --
 fs/xfs/xfs_iops.c               |   51 +-
 fs/xfs/xfs_linux.h              |    2 
 fs/xfs/xfs_qm.c                 |    8 
 fs/xfs/xfs_reflink.h            |   10 
 fs/xfs/xfs_symlink.c            |   22 -
 fs/xfs/xfs_trans.h              |    1 
 20 files changed, 1564 insertions(+), 1237 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_inode_util.c
 create mode 100644 fs/xfs/libxfs/xfs_inode_util.h

* [PATCH 01/20] xfs: move inode copy-on-write predicates to xfs_inode.[ch]
From: Darrick J. Wong @ 2022-12-30 22:17 UTC
To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move these inode predicate functions to xfs_inode.[ch] since they're not
reflink functions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c   |    7 +++++++
 fs/xfs/xfs_inode.h   |    7 +++++++
 fs/xfs/xfs_reflink.h |   10 ----------
 3 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index af4ac808a0e0..abf8844df017 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3859,3 +3859,10 @@ xfs_inode_alloc_unitsize(
 
 	return XFS_FSB_TO_B(ip->i_mount, blocks);
 }
+
+bool
+xfs_is_always_cow_inode(
+	struct xfs_inode	*ip)
+{
+	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
+}
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index be704174fa4f..926f2d74413c 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -276,6 +276,13 @@ static inline bool xfs_is_metadata_inode(struct xfs_inode *ip)
 		xfs_is_quota_inode(&mp->m_sb, ip->i_ino);
 }
 
+bool xfs_is_always_cow_inode(struct xfs_inode *ip);
+
+static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
+{
+	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
+}
+
 /*
  * Check if an inode has any data in the COW fork.  This might be often false
  * even for inodes with the reflink flag when there is no pending COW operation.
diff --git a/fs/xfs/xfs_reflink.h b/fs/xfs/xfs_reflink.h
index 65c5dfe17ecf..fb55e4ce49fa 100644
--- a/fs/xfs/xfs_reflink.h
+++ b/fs/xfs/xfs_reflink.h
@@ -6,16 +6,6 @@
 #ifndef __XFS_REFLINK_H
 #define __XFS_REFLINK_H 1
 
-static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip)
-{
-	return ip->i_mount->m_always_cow && xfs_has_reflink(ip->i_mount);
-}
-
-static inline bool xfs_is_cow_inode(struct xfs_inode *ip)
-{
-	return xfs_is_reflink_inode(ip) || xfs_is_always_cow_inode(ip);
-}
-
 extern int xfs_reflink_trim_around_shared(struct xfs_inode *ip,
 		struct xfs_bmbt_irec *irec, bool *shared);
 int xfs_bmap_trim_cow(struct xfs_inode *ip, struct xfs_bmbt_irec *imap,

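
The two predicates moved by the patch above can be exercised in userspace
with a small model. This is only a sketch: the demo_* structures and the
DEMO_DIFLAG2_REFLINK bit value are hypothetical stand-ins for the kernel's
struct xfs_inode, struct xfs_mount and XFS_DIFLAG2_REFLINK, not the real
definitions. It shows that an inode is treated as copy-on-write either
because it carries the reflink flag or because the mount forces always-COW
on a reflink-capable filesystem.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for XFS_DIFLAG2_REFLINK; the bit value is illustrative. */
#define DEMO_DIFLAG2_REFLINK	(1ULL << 1)

/* Models the two mount properties the predicates consult. */
struct demo_mount {
	bool	always_cow;	/* models mp->m_always_cow */
	bool	has_reflink;	/* models xfs_has_reflink(mp) */
};

/* Models the parts of the in-core inode the predicates consult. */
struct demo_inode {
	struct demo_mount	*mount;
	uint64_t		diflags2;
};

/* True if the inode itself shares blocks with another file. */
static bool demo_is_reflink_inode(const struct demo_inode *ip)
{
	return (ip->diflags2 & DEMO_DIFLAG2_REFLINK) != 0;
}

/* True if every write must go through COW, regardless of inode flags. */
static bool demo_is_always_cow_inode(const struct demo_inode *ip)
{
	return ip->mount->always_cow && ip->mount->has_reflink;
}

/* True if writes to this inode may need copy-on-write handling. */
static bool demo_is_cow_inode(const struct demo_inode *ip)
{
	return demo_is_reflink_inode(ip) || demo_is_always_cow_inode(ip);
}
```

Note that the always-COW setting has no effect in this model unless the
filesystem also supports reflink, which mirrors the condition in the patch.
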
* [PATCH 03/20] xfs: hoist inode flag conversion functions to libxfs
From: Darrick J. Wong @ 2022-12-30 22:17 UTC
To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Hoist the inode flag conversion functions into libxfs so that we can
keep them in sync.  Do this by creating a new xfs_inode_util.c file in
libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile                |    1 
 fs/xfs/libxfs/xfs_bmap.c       |    1 
 fs/xfs/libxfs/xfs_inode_util.c |  124 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_inode_util.h |   14 +++++
 fs/xfs/xfs_inode.c             |   49 ----------------
 fs/xfs/xfs_inode.h             |    2 -
 fs/xfs/xfs_ioctl.c             |   60 -------------------
 7 files changed, 141 insertions(+), 110 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_inode_util.c
 create mode 100644 fs/xfs/libxfs/xfs_inode_util.h

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 33b1ea3e6e6b..dac4165b02c0 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -39,6 +39,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_iext_tree.o \
 				   xfs_inode_fork.o \
 				   xfs_inode_buf.o \
+				   xfs_inode_util.o \
 				   xfs_log_rlimit.o \
 				   xfs_ag_resv.o \
 				   xfs_rmap.o \
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 5224e3fcce83..a372618ce393 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -38,6 +38,7 @@
 #include "xfs_iomap.h"
 #include "xfs_health.h"
 #include "xfs_symlink_remote.h"
+#include "xfs_inode_util.h"
 
 struct kmem_cache		*xfs_bmap_intent_cache;
 
diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
new file mode 100644
index 000000000000..ed5e1a9b4b8c
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_inode_util.c
@@ -0,0 +1,124 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2000-2006 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_sb.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_inode_util.h"
+
+uint16_t
+xfs_flags2diflags(
+	struct xfs_inode	*ip,
+	unsigned int		xflags)
+{
+	/* can't set PREALLOC this way, just preserve it */
+	uint16_t		di_flags =
+		(ip->i_diflags & XFS_DIFLAG_PREALLOC);
+
+	if (xflags & FS_XFLAG_IMMUTABLE)
+		di_flags |= XFS_DIFLAG_IMMUTABLE;
+	if (xflags & FS_XFLAG_APPEND)
+		di_flags |= XFS_DIFLAG_APPEND;
+	if (xflags & FS_XFLAG_SYNC)
+		di_flags |= XFS_DIFLAG_SYNC;
+	if (xflags & FS_XFLAG_NOATIME)
+		di_flags |= XFS_DIFLAG_NOATIME;
+	if (xflags & FS_XFLAG_NODUMP)
+		di_flags |= XFS_DIFLAG_NODUMP;
+	if (xflags & FS_XFLAG_NODEFRAG)
+		di_flags |= XFS_DIFLAG_NODEFRAG;
+	if (xflags & FS_XFLAG_FILESTREAM)
+		di_flags |= XFS_DIFLAG_FILESTREAM;
+	if (S_ISDIR(VFS_I(ip)->i_mode)) {
+		if (xflags & FS_XFLAG_RTINHERIT)
+			di_flags |= XFS_DIFLAG_RTINHERIT;
+		if (xflags & FS_XFLAG_NOSYMLINKS)
+			di_flags |= XFS_DIFLAG_NOSYMLINKS;
+		if (xflags & FS_XFLAG_EXTSZINHERIT)
+			di_flags |= XFS_DIFLAG_EXTSZINHERIT;
+		if (xflags & FS_XFLAG_PROJINHERIT)
+			di_flags |= XFS_DIFLAG_PROJINHERIT;
+	} else if (S_ISREG(VFS_I(ip)->i_mode)) {
+		if (xflags & FS_XFLAG_REALTIME)
+			di_flags |= XFS_DIFLAG_REALTIME;
+		if (xflags & FS_XFLAG_EXTSIZE)
+			di_flags |= XFS_DIFLAG_EXTSIZE;
+	}
+
+	return di_flags;
+}
+
+uint64_t
+xfs_flags2diflags2(
+	struct xfs_inode	*ip,
+	unsigned int		xflags)
+{
+	uint64_t		di_flags2 =
+		(ip->i_diflags2 & (XFS_DIFLAG2_REFLINK |
+				   XFS_DIFLAG2_BIGTIME |
+				   XFS_DIFLAG2_NREXT64));
+
+	if (xflags & FS_XFLAG_DAX)
+		di_flags2 |= XFS_DIFLAG2_DAX;
+	if (xflags & FS_XFLAG_COWEXTSIZE)
+		di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;
+
+	return di_flags2;
+}
+
+uint32_t
+xfs_ip2xflags(
+	struct xfs_inode	*ip)
+{
+	uint32_t		flags = 0;
+
+	if (ip->i_diflags & XFS_DIFLAG_ANY) {
+		if (ip->i_diflags & XFS_DIFLAG_REALTIME)
+			flags |= FS_XFLAG_REALTIME;
+		if (ip->i_diflags & XFS_DIFLAG_PREALLOC)
+			flags |= FS_XFLAG_PREALLOC;
+		if (ip->i_diflags & XFS_DIFLAG_IMMUTABLE)
+			flags |= FS_XFLAG_IMMUTABLE;
+		if (ip->i_diflags & XFS_DIFLAG_APPEND)
+			flags |= FS_XFLAG_APPEND;
+		if (ip->i_diflags & XFS_DIFLAG_SYNC)
+			flags |= FS_XFLAG_SYNC;
+		if (ip->i_diflags & XFS_DIFLAG_NOATIME)
+			flags |= FS_XFLAG_NOATIME;
+		if (ip->i_diflags & XFS_DIFLAG_NODUMP)
+			flags |= FS_XFLAG_NODUMP;
+		if (ip->i_diflags & XFS_DIFLAG_RTINHERIT)
+			flags |= FS_XFLAG_RTINHERIT;
+		if (ip->i_diflags & XFS_DIFLAG_PROJINHERIT)
+			flags |= FS_XFLAG_PROJINHERIT;
+		if (ip->i_diflags & XFS_DIFLAG_NOSYMLINKS)
+			flags |= FS_XFLAG_NOSYMLINKS;
+		if (ip->i_diflags & XFS_DIFLAG_EXTSIZE)
+			flags |= FS_XFLAG_EXTSIZE;
+		if (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT)
+			flags |= FS_XFLAG_EXTSZINHERIT;
+		if (ip->i_diflags & XFS_DIFLAG_NODEFRAG)
+			flags |= FS_XFLAG_NODEFRAG;
+		if (ip->i_diflags & XFS_DIFLAG_FILESTREAM)
+			flags |= FS_XFLAG_FILESTREAM;
+	}
+
+	if (ip->i_diflags2 & XFS_DIFLAG2_ANY) {
+		if (ip->i_diflags2 & XFS_DIFLAG2_DAX)
+			flags |= FS_XFLAG_DAX;
+		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
+			flags |= FS_XFLAG_COWEXTSIZE;
+	}
+
+	if (xfs_inode_has_attr_fork(ip))
+		flags |= FS_XFLAG_HASATTR;
+	return flags;
+}
diff --git a/fs/xfs/libxfs/xfs_inode_util.h b/fs/xfs/libxfs/xfs_inode_util.h
new file mode 100644
index 000000000000..6ad1898a0f73
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_inode_util.h
@@ -0,0 +1,14 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ */
+#ifndef	__XFS_INODE_UTIL_H__
+#define	__XFS_INODE_UTIL_H__
+
+uint16_t	xfs_flags2diflags(struct xfs_inode *ip, unsigned int xflags);
+uint64_t	xfs_flags2diflags2(struct xfs_inode *ip, unsigned int xflags);
+uint32_t	xfs_dic2xflags(struct xfs_inode *ip);
+uint32_t	xfs_ip2xflags(struct xfs_inode *ip);
+
+#endif /* __XFS_INODE_UTIL_H__ */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index a4a3ca9a3ea6..cd1d742a8a81 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -544,55 +544,6 @@ xfs_lock_two_inodes(
 	}
 }
 
-uint
-xfs_ip2xflags(
-	struct xfs_inode	*ip)
-{
-	uint			flags = 0;
-
-	if (ip->i_diflags & XFS_DIFLAG_ANY) {
-		if (ip->i_diflags & XFS_DIFLAG_REALTIME)
-			flags |= FS_XFLAG_REALTIME;
-		if (ip->i_diflags & XFS_DIFLAG_PREALLOC)
-			flags |= FS_XFLAG_PREALLOC;
-		if (ip->i_diflags & XFS_DIFLAG_IMMUTABLE)
-			flags |= FS_XFLAG_IMMUTABLE;
-		if (ip->i_diflags & XFS_DIFLAG_APPEND)
-			flags |= FS_XFLAG_APPEND;
-		if (ip->i_diflags & XFS_DIFLAG_SYNC)
-			flags |= FS_XFLAG_SYNC;
-		if (ip->i_diflags & XFS_DIFLAG_NOATIME)
-			flags |= FS_XFLAG_NOATIME;
-		if (ip->i_diflags & XFS_DIFLAG_NODUMP)
-			flags |= FS_XFLAG_NODUMP;
-		if (ip->i_diflags & XFS_DIFLAG_RTINHERIT)
-			flags |= FS_XFLAG_RTINHERIT;
-		if (ip->i_diflags & XFS_DIFLAG_PROJINHERIT)
-			flags |= FS_XFLAG_PROJINHERIT;
-		if (ip->i_diflags & XFS_DIFLAG_NOSYMLINKS)
-			flags |= FS_XFLAG_NOSYMLINKS;
-		if (ip->i_diflags & XFS_DIFLAG_EXTSIZE)
-			flags |= FS_XFLAG_EXTSIZE;
-		if (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT)
-			flags |= FS_XFLAG_EXTSZINHERIT;
-		if (ip->i_diflags & XFS_DIFLAG_NODEFRAG)
-			flags |= FS_XFLAG_NODEFRAG;
-		if (ip->i_diflags & XFS_DIFLAG_FILESTREAM)
-			flags |= FS_XFLAG_FILESTREAM;
-	}
-
-	if (ip->i_diflags2 & XFS_DIFLAG2_ANY) {
-		if (ip->i_diflags2 & XFS_DIFLAG2_DAX)
-			flags |= FS_XFLAG_DAX;
-		if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
-			flags |= FS_XFLAG_COWEXTSIZE;
-	}
-
-	if (xfs_inode_has_attr_fork(ip))
-		flags |= FS_XFLAG_HASATTR;
-	return flags;
-}
-
 /*
  * Lookups up an inode from "name". If ci_name is not NULL, then a CI match
  * is allowed, otherwise it has to be an exact match. If a CI match is found,
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index adcdc369396a..2f6072d78444 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -8,6 +8,7 @@
 
 #include "xfs_inode_buf.h"
 #include "xfs_inode_fork.h"
+#include "xfs_inode_util.h"
 
 /*
  * Kernel only inode definitions
@@ -518,7 +519,6 @@ bool		xfs_isilocked(struct xfs_inode *, uint);
 uint		xfs_ilock_data_map_shared(struct xfs_inode *);
 uint		xfs_ilock_attr_map_shared(struct xfs_inode *);
 
-uint		xfs_ip2xflags(struct xfs_inode *);
 int		xfs_ifree(struct xfs_trans *, struct xfs_inode *);
 int		xfs_itruncate_extents_flags(struct xfs_trans **,
 				struct xfs_inode *, int, xfs_fsize_t, int);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index df6601eda7ec..615fd1e4a611 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1055,66 +1055,6 @@ xfs_fileattr_get(
 	return 0;
 }
 
-STATIC uint16_t
-xfs_flags2diflags(
-	struct xfs_inode	*ip,
-	unsigned int		xflags)
-{
-	/* can't set PREALLOC this way, just preserve it */
-	uint16_t		di_flags =
-		(ip->i_diflags & XFS_DIFLAG_PREALLOC);
-
-	if (xflags & FS_XFLAG_IMMUTABLE)
-		di_flags |= XFS_DIFLAG_IMMUTABLE;
-	if (xflags & FS_XFLAG_APPEND)
-		di_flags |= XFS_DIFLAG_APPEND;
-	if (xflags & FS_XFLAG_SYNC)
-		di_flags |= XFS_DIFLAG_SYNC;
-	if (xflags & FS_XFLAG_NOATIME)
-		di_flags |= XFS_DIFLAG_NOATIME;
-	if (xflags & FS_XFLAG_NODUMP)
-		di_flags |= XFS_DIFLAG_NODUMP;
-	if (xflags & FS_XFLAG_NODEFRAG)
-		di_flags |= XFS_DIFLAG_NODEFRAG;
-	if (xflags & FS_XFLAG_FILESTREAM)
-		di_flags |= XFS_DIFLAG_FILESTREAM;
-	if (S_ISDIR(VFS_I(ip)->i_mode)) {
-		if (xflags & FS_XFLAG_RTINHERIT)
-			di_flags |= XFS_DIFLAG_RTINHERIT;
-		if (xflags & FS_XFLAG_NOSYMLINKS)
-			di_flags |= XFS_DIFLAG_NOSYMLINKS;
-		if (xflags & FS_XFLAG_EXTSZINHERIT)
-			di_flags |= XFS_DIFLAG_EXTSZINHERIT;
-		if (xflags & FS_XFLAG_PROJINHERIT)
-			di_flags |= XFS_DIFLAG_PROJINHERIT;
-	} else if (S_ISREG(VFS_I(ip)->i_mode)) {
-		if (xflags & FS_XFLAG_REALTIME)
-			di_flags |= XFS_DIFLAG_REALTIME;
-		if (xflags & FS_XFLAG_EXTSIZE)
-			di_flags |= XFS_DIFLAG_EXTSIZE;
-	}
-
-	return di_flags;
-}
-
-STATIC uint64_t
-xfs_flags2diflags2(
-	struct xfs_inode	*ip,
-	unsigned int		xflags)
-{
-	uint64_t		di_flags2 =
-		(ip->i_diflags2 & (XFS_DIFLAG2_REFLINK |
-				   XFS_DIFLAG2_BIGTIME |
-				   XFS_DIFLAG2_NREXT64));
-
-	if (xflags & FS_XFLAG_DAX)
-		di_flags2 |= XFS_DIFLAG2_DAX;
-	if (xflags & FS_XFLAG_COWEXTSIZE)
-		di_flags2 |= XFS_DIFLAG2_COWEXTSIZE;
-
-	return di_flags2;
-}
-
 static int
 xfs_ioctl_setattr_xflags(
 	struct xfs_trans	*tp,

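
The conversion rules in xfs_flags2diflags() above can be illustrated with a
reduced userspace model. This is a sketch under stated assumptions: the
DEMO_* constants, enum demo_ftype and struct demo_inode are hypothetical and
cover only three of the flags, and the bit values are illustrative rather
than the on-disk ones. It shows the three behaviours visible in the patch:
PREALLOC is carried over from the inode and cannot be set by the caller,
inherit-style flags only stick to directories, and REALTIME only sticks to
regular files.

```c
#include <stdint.h>

/* Illustrative bit values only; the real constants live in the XFS and VFS headers. */
#define DEMO_XFLAG_REALTIME	(1u << 0)
#define DEMO_XFLAG_IMMUTABLE	(1u << 1)
#define DEMO_XFLAG_RTINHERIT	(1u << 2)

#define DEMO_DIFLAG_REALTIME	(1u << 0)
#define DEMO_DIFLAG_PREALLOC	(1u << 1)
#define DEMO_DIFLAG_IMMUTABLE	(1u << 2)
#define DEMO_DIFLAG_RTINHERIT	(1u << 3)

/* Stands in for the S_ISDIR()/S_ISREG() checks on the VFS inode mode. */
enum demo_ftype { DEMO_REG, DEMO_DIR, DEMO_OTHER };

struct demo_inode {
	enum demo_ftype	ftype;
	uint16_t	diflags;	/* models ip->i_diflags */
};

/* Translate user-visible xflags into on-disk flags for this inode. */
static uint16_t demo_flags2diflags(const struct demo_inode *ip,
				   unsigned int xflags)
{
	/* PREALLOC cannot be set through this path, so carry it over. */
	uint16_t	di_flags = ip->diflags & DEMO_DIFLAG_PREALLOC;

	/* Flags that apply to any file type. */
	if (xflags & DEMO_XFLAG_IMMUTABLE)
		di_flags |= DEMO_DIFLAG_IMMUTABLE;

	if (ip->ftype == DEMO_DIR) {
		/* Inheritance flags are only meaningful on directories. */
		if (xflags & DEMO_XFLAG_RTINHERIT)
			di_flags |= DEMO_DIFLAG_RTINHERIT;
	} else if (ip->ftype == DEMO_REG) {
		/* Only regular files can live on the realtime volume. */
		if (xflags & DEMO_XFLAG_REALTIME)
			di_flags |= DEMO_DIFLAG_REALTIME;
	}

	return di_flags;
}
```

Passing the same request to a directory and to a regular file therefore
yields different on-disk flags; flags that do not apply to the file type are
dropped rather than rejected.
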
* [PATCH 04/20] xfs: hoist project id get/set functions to libxfs
From: Darrick J. Wong @ 2022-12-30 22:17 UTC
To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move the project id get and set functions into libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_inode_util.c |   11 +++++++++++
 fs/xfs/libxfs/xfs_inode_util.h |    2 ++
 fs/xfs/xfs_inode.h             |    9 ---------
 fs/xfs/xfs_linux.h             |    2 --
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
index ed5e1a9b4b8c..2624d18922c0 100644
--- a/fs/xfs/libxfs/xfs_inode_util.c
+++ b/fs/xfs/libxfs/xfs_inode_util.c
@@ -122,3 +122,14 @@ xfs_ip2xflags(
 		flags |= FS_XFLAG_HASATTR;
 	return flags;
 }
+
+#define XFS_PROJID_DEFAULT	0
+
+prid_t
+xfs_get_initial_prid(struct xfs_inode *dp)
+{
+	if (dp->i_diflags & XFS_DIFLAG_PROJINHERIT)
+		return dp->i_projid;
+
+	return XFS_PROJID_DEFAULT;
+}
diff --git a/fs/xfs/libxfs/xfs_inode_util.h b/fs/xfs/libxfs/xfs_inode_util.h
index 6ad1898a0f73..f7e4d5a8235d 100644
--- a/fs/xfs/libxfs/xfs_inode_util.h
+++ b/fs/xfs/libxfs/xfs_inode_util.h
@@ -11,4 +11,6 @@ uint64_t	xfs_flags2diflags2(struct xfs_inode *ip, unsigned int xflags);
 uint32_t	xfs_dic2xflags(struct xfs_inode *ip);
 uint32_t	xfs_ip2xflags(struct xfs_inode *ip);
 
+prid_t		xfs_get_initial_prid(struct xfs_inode *dp);
+
 #endif /* __XFS_INODE_UTIL_H__ */
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 2f6072d78444..4803904686f5 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -255,15 +255,6 @@ xfs_iflags_test_and_set(xfs_inode_t *ip, unsigned short flags)
 	return ret;
 }
 
-static inline prid_t
-xfs_get_initial_prid(struct xfs_inode *dp)
-{
-	if (dp->i_diflags & XFS_DIFLAG_PROJINHERIT)
-		return dp->i_projid;
-
-	return XFS_PROJID_DEFAULT;
-}
-
 static inline bool xfs_is_reflink_inode(struct xfs_inode *ip)
 {
 	return ip->i_diflags2 & XFS_DIFLAG2_REFLINK;
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 7e9bf03c80a3..2cdb3411aabb 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -134,8 +134,6 @@ typedef __u32			xfs_nlink_t;
  */
 #define __this_address	({ __label__ __here; __here: barrier(); &&__here; })
 
-#define XFS_PROJID_DEFAULT	0
-
 #define howmany(x, y)	(((x)+((y)-1))/(y))
 
 static inline void delay(long ticks)

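
The inheritance rule implemented by xfs_get_initial_prid() above is small
enough to model directly. The following is a hedged sketch: demo_prid_t,
struct demo_inode and the DEMO_* constants are hypothetical names, and the
flag's bit position is illustrative. It shows that a new child takes the
parent directory's project id only when the parent has the PROJINHERIT flag
set, and otherwise falls back to the default project id of zero.

```c
#include <stdint.h>

/* Hypothetical stand-ins; the bit position is illustrative only. */
#define DEMO_DIFLAG_PROJINHERIT	(1u << 9)
#define DEMO_PROJID_DEFAULT	0u

typedef uint32_t demo_prid_t;

/* Models the two parent-directory fields the helper reads. */
struct demo_inode {
	uint16_t	diflags;	/* models dp->i_diflags */
	demo_prid_t	projid;		/* models dp->i_projid */
};

/* Pick the project id for a file created inside directory dp. */
static demo_prid_t demo_get_initial_prid(const struct demo_inode *dp)
{
	if (dp->diflags & DEMO_DIFLAG_PROJINHERIT)
		return dp->projid;

	return DEMO_PROJID_DEFAULT;
}
```

A parent with a nonzero project id but without the inherit flag still
produces children in the default project.
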
* [PATCH 02/20] xfs: hoist extent size helpers to libxfs
From: Darrick J. Wong @ 2022-12-30 22:17 UTC
To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move the extent size helpers to xfs_bmap.c in libxfs since they're used
there already.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c |   41 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_bmap.h |    3 +++
 fs/xfs/xfs_inode.c       |   43 -------------------------------------------
 fs/xfs/xfs_inode.h       |    3 ---
 fs/xfs/xfs_iops.c        |    1 +
 5 files changed, 45 insertions(+), 46 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 0dfa84993a9e..5224e3fcce83 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -6413,3 +6413,44 @@ xfs_bmap_query_all(
 
 	return xfs_btree_query_all(cur, xfs_bmap_query_range_helper, &query);
 }
+
+/* Helper function to extract extent size hint from inode */
+xfs_extlen_t
+xfs_get_extsz_hint(
+	struct xfs_inode	*ip)
+{
+	/*
+	 * No point in aligning allocations if we need to COW to actually
+	 * write to them.
+	 */
+	if (xfs_is_always_cow_inode(ip))
+		return 0;
+	if ((ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize)
+		return ip->i_extsize;
+	if (XFS_IS_REALTIME_INODE(ip))
+		return ip->i_mount->m_sb.sb_rextsize;
+	return 0;
+}
+
+/*
+ * Helper function to extract CoW extent size hint from inode.
+ * Between the extent size hint and the CoW extent size hint, we
+ * return the greater of the two.  If the value is zero (automatic),
+ * use the default size.
+ */
+xfs_extlen_t
+xfs_get_cowextsz_hint(
+	struct xfs_inode	*ip)
+{
+	xfs_extlen_t		a, b;
+
+	a = 0;
+	if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
+		a = ip->i_cowextsize;
+	b = xfs_get_extsz_hint(ip);
+
+	a = max(a, b);
+	if (a == 0)
+		return XFS_DEFAULT_COWEXTSZ_HINT;
+	return a;
+}
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 9559f7174bba..d870c6a62e40 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -292,4 +292,7 @@ typedef int (*xfs_bmap_query_range_fn)(
 int xfs_bmap_query_all(struct xfs_btree_cur *cur, xfs_bmap_query_range_fn fn,
 		void *priv);
 
+xfs_extlen_t	xfs_get_extsz_hint(struct xfs_inode *ip);
+xfs_extlen_t	xfs_get_cowextsz_hint(struct xfs_inode *ip);
+
 #endif	/* __XFS_BMAP_H__ */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index abf8844df017..a4a3ca9a3ea6 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -45,49 +45,6 @@ struct kmem_cache *xfs_inode_cache;
 STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
 	struct xfs_inode *);
 
-/*
- * helper function to extract extent size hint from inode
- */
-xfs_extlen_t
-xfs_get_extsz_hint(
-	struct xfs_inode	*ip)
-{
-	/*
-	 * No point in aligning allocations if we need to COW to actually
-	 * write to them.
-	 */
-	if (xfs_is_always_cow_inode(ip))
-		return 0;
-	if ((ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize)
-		return ip->i_extsize;
-	if (XFS_IS_REALTIME_INODE(ip))
-		return ip->i_mount->m_sb.sb_rextsize;
-	return 0;
-}
-
-/*
- * Helper function to extract CoW extent size hint from inode.
- * Between the extent size hint and the CoW extent size hint, we
- * return the greater of the two.  If the value is zero (automatic),
- * use the default size.
- */
-xfs_extlen_t
-xfs_get_cowextsz_hint(
-	struct xfs_inode	*ip)
-{
-	xfs_extlen_t		a, b;
-
-	a = 0;
-	if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
-		a = ip->i_cowextsize;
-	b = xfs_get_extsz_hint(ip);
-
-	a = max(a, b);
-	if (a == 0)
-		return XFS_DEFAULT_COWEXTSZ_HINT;
-	return a;
-}
-
 /*
  * These two are wrapper routines around the xfs_ilock() routine used to
  * centralize some grungy code.  They are used in places that wish to lock the
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 926f2d74413c..adcdc369396a 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -532,9 +532,6 @@ int		xfs_iflush_cluster(struct xfs_buf *);
 void		xfs_lock_two_inodes(struct xfs_inode *ip0, uint ip0_mode,
 				struct xfs_inode *ip1, uint ip1_mode);
 
-xfs_extlen_t	xfs_get_extsz_hint(struct xfs_inode *ip);
-xfs_extlen_t	xfs_get_cowextsz_hint(struct xfs_inode *ip);
-
 int xfs_init_new_inode(struct user_namespace *mnt_userns, struct xfs_trans *tp,
 		struct xfs_inode *pip, xfs_ino_t ino, umode_t mode,
 		xfs_nlink_t nlink, dev_t rdev, prid_t prid, bool init_xattrs,
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index a24bf6bb5094..80f881c69336 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -25,6 +25,7 @@
 #include "xfs_error.h"
 #include "xfs_ioctl.h"
 #include "xfs_xattr.h"
+#include "xfs_bmap.h"
 
 #include <linux/posix_acl.h>
 #include <linux/security.h>

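
The precedence between the two hint helpers above is easier to see with
numbers. The sketch below is a userspace model, not kernel code: struct
demo_inode flattens the inode, mount and superblock fields into one
structure, all DEMO_* names are hypothetical, and the default CoW hint of 32
blocks is an assumed value chosen for illustration. It shows that the data
hint is zero for always-COW inodes, otherwise the explicit extent size, then
the realtime extent size; and that the CoW hint is the larger of the CoW and
data hints, falling back to the default when both are zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; bit values and the default are illustrative. */
#define DEMO_DIFLAG_EXTSIZE		(1u << 0)
#define DEMO_DIFLAG2_COWEXTSIZE		(1u << 0)
#define DEMO_DEFAULT_COWEXTSZ_HINT	32u	/* assumed default, in fs blocks */

/* Flattened model of the inode, mount and superblock fields consulted. */
struct demo_inode {
	bool		always_cow;	/* models xfs_is_always_cow_inode(ip) */
	bool		realtime;	/* models XFS_IS_REALTIME_INODE(ip) */
	uint16_t	diflags;
	uint64_t	diflags2;
	uint32_t	extsize;	/* models ip->i_extsize */
	uint32_t	cowextsize;	/* models ip->i_cowextsize */
	uint32_t	rextsize;	/* models mp->m_sb.sb_rextsize */
};

/* Data fork extent size hint, in filesystem blocks; 0 means no hint. */
static uint32_t demo_get_extsz_hint(const struct demo_inode *ip)
{
	/* Alignment is pointless if every write is redirected through COW. */
	if (ip->always_cow)
		return 0;
	if ((ip->diflags & DEMO_DIFLAG_EXTSIZE) && ip->extsize)
		return ip->extsize;
	if (ip->realtime)
		return ip->rextsize;
	return 0;
}

/* CoW fork extent size hint; never returns 0. */
static uint32_t demo_get_cowextsz_hint(const struct demo_inode *ip)
{
	uint32_t	a = 0;
	uint32_t	b;

	if (ip->diflags2 & DEMO_DIFLAG2_COWEXTSIZE)
		a = ip->cowextsize;
	b = demo_get_extsz_hint(ip);

	/* Take the larger of the two hints. */
	if (b > a)
		a = b;
	if (a == 0)
		return DEMO_DEFAULT_COWEXTSZ_HINT;
	return a;
}
```

For example, an inode with an extent size hint of 16 blocks and a CoW hint
of 8 blocks gets a CoW hint of 16 in this model, because the data hint wins
when it is larger.
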
* [PATCH 05/20] xfs: pack icreate initialization parameters into a separate structure
From: Darrick J. Wong @ 2022-12-30 22:17 UTC
To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Callers that want to create an inode currently pass all possible file
attribute values for the new inode into xfs_init_new_inode as ten
separate parameters.  This causes two code maintenance issues: first, we
have large multi-line call sites which programmers must read carefully
to make sure they did not accidentally invert a value.  Second, all
three file id parameters must be passed separately to the quota
functions; any discrepancy results in quota count errors.

Clean this up by creating a new icreate_args structure to hold all this
information, some helpers to initialize them properly, and make the
callers pass this structure through to the creation function, whose name
we shorten to xfs_icreate.  This eliminates the issues, enables us to
keep the inode init code in sync with userspace via libxfs, and is
needed for future metadata directory tree management.

(A subsequent cleanup will also fix the quota alloc calls.)

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_inode_util.h |   31 +++++++++++++
 fs/xfs/scrub/tempfile.c        |   13 +++--
 fs/xfs/xfs_inode.c             |   99 ++++++++++++++++++++++++++++------------
 fs/xfs/xfs_inode.h             |   10 ++--
 fs/xfs/xfs_qm.c                |    8 ++-
 fs/xfs/xfs_symlink.c           |   14 +++---
 6 files changed, 127 insertions(+), 48 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_util.h b/fs/xfs/libxfs/xfs_inode_util.h
index f7e4d5a8235d..466f0767ab5d 100644
--- a/fs/xfs/libxfs/xfs_inode_util.h
+++ b/fs/xfs/libxfs/xfs_inode_util.h
@@ -13,4 +13,35 @@ uint32_t	xfs_ip2xflags(struct xfs_inode *ip);
 
 prid_t		xfs_get_initial_prid(struct xfs_inode *dp);
 
+/*
+ * Initial ids, link count, device number, and mode of a new inode.
+ *
+ * Due to our only partial reliance on the VFS to propagate uid and gid values
+ * according to accepted Unix behaviors, callers must initialize mnt_userns to
+ * the appropriate namespace, uid to fsuid_into_mnt(), and gid to
+ * fsgid_into_mnt() to get the correct inheritance behaviors when
+ * XFS_MOUNT_GRPID is set.  Use the xfs_ialloc_inherit_args() helper.
+ *
+ * To override the default ids, use the FORCE flags defined below.
+ */
+struct xfs_icreate_args {
+	struct user_namespace	*mnt_userns;
+	struct xfs_inode	*pip;	/* parent inode or null */
+
+	kuid_t			uid;
+	kgid_t			gid;
+	prid_t			prid;
+
+	xfs_nlink_t		nlink;
+	dev_t			rdev;
+
+	umode_t			mode;
+
+#define XFS_ICREATE_ARGS_FORCE_UID	(1 << 0)
+#define XFS_ICREATE_ARGS_FORCE_GID	(1 << 1)
+#define XFS_ICREATE_ARGS_FORCE_MODE	(1 << 2)
+#define XFS_ICREATE_ARGS_INIT_XATTRS	(1 << 3)
+	uint16_t		flags;
+};
+
 #endif /* __XFS_INODE_UTIL_H__ */
diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c
index e5087f14343b..2c630a5e23ea 100644
--- a/fs/xfs/scrub/tempfile.c
+++ b/fs/xfs/scrub/tempfile.c
@@ -40,6 +40,7 @@ xrep_tempfile_create(
 	struct xfs_scrub	*sc,
 	uint16_t		mode)
 {
+	struct xfs_icreate_args	args = { .pip = sc->mp->m_rootip, };
 	struct xfs_mount	*mp = sc->mp;
 	struct xfs_trans	*tp = NULL;
 	struct xfs_dquot	*udqp = NULL;
@@ -60,12 +61,15 @@ xrep_tempfile_create(
 	ASSERT(sc->tp == NULL);
 	ASSERT(sc->tempip == NULL);
 
+	/* Force everything to have the root ids and mode we want. */
+	xfs_icreate_args_rootfile(&args, mode);
+
 	/*
 	 * Make sure that we have allocated dquot(s) on disk.  The temporary
 	 * inode should be completely root owned so that we don't fail due to
 	 * quota limits.
 	 */
-	error = xfs_qm_vop_dqalloc(dp, GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
+	error = xfs_qm_vop_dqalloc(dp, args.uid, args.gid, args.prid,
 			XFS_QMOPT_QUOTALL, &udqp, &gdqp, &pdqp);
 	if (error)
 		return error;
@@ -87,14 +91,11 @@ xrep_tempfile_create(
 	error = xfs_dialloc(&tp, dp->i_ino, mode, &ino);
 	if (error)
 		goto out_trans_cancel;
-	error = xfs_init_new_inode(&init_user_ns, tp, dp, ino, mode, 0, 0,
-			0, false, &sc->tempip);
+	error = xfs_icreate(tp, ino, &args, &sc->tempip);
 	if (error)
 		goto out_trans_cancel;
 
-	/* Change the ownership of the inode to root. */
-	VFS_I(sc->tempip)->i_uid = GLOBAL_ROOT_UID;
-	VFS_I(sc->tempip)->i_gid = GLOBAL_ROOT_GID;
+	/* We don't touch file data, so drop the realtime flags. */
 	sc->tempip->i_diflags &= ~(XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT);
 	xfs_trans_log_inode(tp, sc->tempip, XFS_ILOG_CORE);
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index cd1d742a8a81..ffbf504891aa 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -679,18 +679,13 @@ xfs_inode_inherit_flags2(
  * caller locked exclusively.
  */
 int
-xfs_init_new_inode(
-	struct user_namespace	*mnt_userns,
+xfs_icreate(
 	struct xfs_trans	*tp,
-	struct xfs_inode	*pip,
 	xfs_ino_t		ino,
-	umode_t			mode,
-	xfs_nlink_t		nlink,
-	dev_t			rdev,
-	prid_t			prid,
-	bool			init_xattrs,
+	const struct xfs_icreate_args *args,
 	struct xfs_inode	**ipp)
 {
+	struct xfs_inode	*pip = args->pip;
 	struct inode		*dir = pip ? VFS_I(pip) : NULL;
 	struct xfs_mount	*mp = tp->t_mountp;
 	struct xfs_inode	*ip;
@@ -723,16 +718,16 @@ xfs_init_new_inode(
 
 	ASSERT(ip != NULL);
 	inode = VFS_I(ip);
-	set_nlink(inode, nlink);
-	inode->i_rdev = rdev;
-	ip->i_projid = prid;
+	set_nlink(inode, args->nlink);
+	inode->i_rdev = args->rdev;
+	ip->i_projid = args->prid;
 
 	if (dir && !(dir->i_mode & S_ISGID) && xfs_has_grpid(mp)) {
-		inode_fsuid_set(inode, mnt_userns);
+		inode_fsuid_set(inode, args->mnt_userns);
 		inode->i_gid = dir->i_gid;
-		inode->i_mode = mode;
+		inode->i_mode = args->mode;
 	} else {
-		inode_init_owner(mnt_userns, inode, dir, mode);
+		inode_init_owner(args->mnt_userns, inode, dir, args->mode);
 	}
 
 	/*
@@ -741,9 +736,21 @@ xfs_init_new_inode(
 	 * (and only if the irix_sgid_inherit compatibility variable is set).
 	 */
 	if (irix_sgid_inherit && (inode->i_mode & S_ISGID) &&
-	    !vfsgid_in_group_p(i_gid_into_vfsgid(mnt_userns, inode)))
+	    !vfsgid_in_group_p(i_gid_into_vfsgid(args->mnt_userns, inode)))
 		inode->i_mode &= ~S_ISGID;
 
+	/* struct copies */
+	if (args->flags & XFS_ICREATE_ARGS_FORCE_UID)
+		inode->i_uid = args->uid;
+	else
+		ASSERT(uid_eq(inode->i_uid, args->uid));
+	if (args->flags & XFS_ICREATE_ARGS_FORCE_GID)
+		inode->i_gid = args->gid;
+	else if (!pip || !XFS_INHERIT_GID(pip))
+		ASSERT(gid_eq(inode->i_gid, args->gid));
+	if (args->flags & XFS_ICREATE_ARGS_FORCE_MODE)
+		inode->i_mode = args->mode;
+
 	ip->i_disk_size = 0;
 	ip->i_df.if_nextents = 0;
 	ASSERT(ip->i_nblocks == 0);
@@ -763,7 +770,7 @@ xfs_init_new_inode(
 	}
 
 	flags = XFS_ILOG_CORE;
-	switch (mode & S_IFMT) {
+	switch (args->mode & S_IFMT) {
 	case S_IFIFO:
 	case S_IFCHR:
 	case S_IFBLK:
@@ -796,7 +803,8 @@ xfs_init_new_inode(
 	 * this saves us from needing to run a separate transaction to set the
 	 * fork offset in the immediate future.
 	 */
-	if (init_xattrs && xfs_has_attr(mp)) {
+	if ((args->flags & XFS_ICREATE_ARGS_INIT_XATTRS) &&
+	    xfs_has_attr(mp)) {
 		ip->i_forkoff = xfs_default_attroffset(ip) >> 3;
 		xfs_ifork_init_attr(ip, XFS_DINODE_FMT_EXTENTS, 0);
 	}
@@ -814,6 +822,38 @@ xfs_init_new_inode(
 	return 0;
 }
 
+/* Set up inode attributes for newly created children of a directory. */
+void
+xfs_icreate_args_inherit(
+	struct xfs_icreate_args	*args,
+	struct xfs_inode	*dp,
+	struct user_namespace	*mnt_userns,
+	umode_t			mode)
+{
+	args->mnt_userns = mnt_userns;
+	args->pip = dp;
+	args->uid = mapped_fsuid(mnt_userns, &init_user_ns);
+	args->gid = mapped_fsgid(mnt_userns, &init_user_ns);
+	args->prid = xfs_get_initial_prid(dp);
+	args->mode = mode;
+}
+
+/* Set up inode attributes for newly created internal files. */
+void
+xfs_icreate_args_rootfile(
+	struct xfs_icreate_args	*args,
+	umode_t			mode)
+{
+	args->mnt_userns = &init_user_ns;
+	args->uid = GLOBAL_ROOT_UID;
+	args->gid = GLOBAL_ROOT_GID;
+	args->prid = 0;
+	args->mode = mode;
+	args->flags = XFS_ICREATE_ARGS_FORCE_UID |
+		      XFS_ICREATE_ARGS_FORCE_GID |
+		      XFS_ICREATE_ARGS_FORCE_MODE;
+}
+
 /*
  * Decrement the link count on an inode & log the change.  If this causes the
  * link count to go to zero, move the inode to AGI unlinked list so that it can
@@ -970,13 +1010,16 @@ xfs_create(
 	bool			init_xattrs,
 	xfs_inode_t		**ipp)
 {
+	struct xfs_icreate_args	args = {
+		.rdev		= rdev,
+		.nlink		= S_ISDIR(mode) ? 2 : 1,
+	};
 	int			is_dir = S_ISDIR(mode);
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_inode	*ip = NULL;
 	struct xfs_trans	*tp = NULL;
 	int			error;
 	bool			unlock_dp_on_error = false;
-	prid_t			prid;
 	struct xfs_dquot	*udqp = NULL;
 	struct xfs_dquot	*gdqp = NULL;
 	struct xfs_dquot	*pdqp = NULL;
@@ -989,13 +1032,14 @@ xfs_create(
 	if (xfs_is_shutdown(mp))
 		return -EIO;
 
-	prid = xfs_get_initial_prid(dp);
+	xfs_icreate_args_inherit(&args, dp, mnt_userns, mode);
+	if (init_xattrs)
+		args.flags |= XFS_ICREATE_ARGS_INIT_XATTRS;
 
 	/*
 	 * Make sure that we have allocated dquot(s) on disk.
 	 */
-	error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, &init_user_ns),
-			mapped_fsgid(mnt_userns, &init_user_ns), prid,
+	error = xfs_qm_vop_dqalloc(dp, args.uid, args.gid, args.prid,
 			XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT,
 			&udqp, &gdqp, &pdqp);
 	if (error)
@@ -1036,8 +1080,7 @@ xfs_create(
 	 */
 	error = xfs_dialloc(&tp, dp->i_ino, mode, &ino);
 	if (!error)
-		error = xfs_init_new_inode(mnt_userns, tp, dp, ino, mode,
-				is_dir ? 2 : 1, rdev, prid, init_xattrs, &ip);
+		error = xfs_icreate(tp, ino, &args, &ip);
 	if (error)
 		goto out_trans_cancel;
 
@@ -1133,11 +1176,11 @@ xfs_create_tmpfile(
 	umode_t			mode,
 	struct xfs_inode	**ipp)
 {
+	struct xfs_icreate_args	args = { NULL };
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_inode	*ip = NULL;
 	struct xfs_trans	*tp = NULL;
 	int			error;
-	prid_t			prid;
 	struct xfs_dquot	*udqp = NULL;
 	struct xfs_dquot	*gdqp = NULL;
 	struct xfs_dquot	*pdqp = NULL;
@@ -1148,13 +1191,12 @@ xfs_create_tmpfile(
 	if (xfs_is_shutdown(mp))
 		return -EIO;
 
-	prid = xfs_get_initial_prid(dp);
+	xfs_icreate_args_inherit(&args, dp, mnt_userns, mode);
 
 	/*
 	 * Make sure that we have allocated dquot(s) on disk.
 	 */
-	error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, &init_user_ns),
-			mapped_fsgid(mnt_userns, &init_user_ns), prid,
+	error = xfs_qm_vop_dqalloc(dp, args.uid, args.gid, args.prid,
 			XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT,
 			&udqp, &gdqp, &pdqp);
 	if (error)
@@ -1170,8 +1212,7 @@ xfs_create_tmpfile(
 
 	error = xfs_dialloc(&tp, dp->i_ino, mode, &ino);
 	if (!error)
-		error = xfs_init_new_inode(mnt_userns, tp, dp, ino, mode,
-				0, 0, prid, false, &ip);
+		error = xfs_icreate(tp, ino, &args, &ip);
 	if (error)
 		goto out_trans_cancel;
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 4803904686f5..cb627543e9fb 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -523,10 +523,8 @@ int		xfs_iflush_cluster(struct xfs_buf *);
 void		xfs_lock_two_inodes(struct xfs_inode *ip0, uint ip0_mode,
 				struct xfs_inode *ip1, uint ip1_mode);
 
-int xfs_init_new_inode(struct user_namespace *mnt_userns, struct xfs_trans *tp,
-		struct xfs_inode *pip, xfs_ino_t ino, umode_t mode,
-		xfs_nlink_t nlink, dev_t rdev, prid_t prid, bool init_xattrs,
-		struct xfs_inode **ipp);
+int xfs_icreate(struct xfs_trans *tp, xfs_ino_t ino,
+		const struct xfs_icreate_args *args, struct xfs_inode **ipp);
 
 static inline int
 xfs_itruncate_extents(
@@ -626,4 +624,8 @@ void xfs_nlink_hook_del(struct xfs_mount *mp, struct xfs_nlink_hook
*hook); # define xfs_nlink_dirent_delta(dp, ip, delta, name) ((void)0) #endif /* CONFIG_XFS_LIVE_HOOKS */ +void xfs_icreate_args_inherit(struct xfs_icreate_args *args, struct xfs_inode *dp, + struct user_namespace *mnt_userns, umode_t mode); +void xfs_icreate_args_rootfile(struct xfs_icreate_args *args, umode_t mode); + #endif /* __XFS_INODE_H__ */ diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 59ace2eedf69..da6c6f0e1ced 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -791,12 +791,16 @@ xfs_qm_qino_alloc( return error; if (need_alloc) { + struct xfs_icreate_args args = { + .nlink = 1, + }; xfs_ino_t ino; + xfs_icreate_args_rootfile(&args, S_IFREG); + error = xfs_dialloc(&tp, 0, S_IFREG, &ino); if (!error) - error = xfs_init_new_inode(&init_user_ns, tp, NULL, ino, - S_IFREG, 1, 0, 0, false, ipp); + error = xfs_icreate(tp, ino, &args, ipp); if (error) { xfs_trans_cancel(tp); return error; diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c index 8cf69ca4bd7c..c27bf49de7bf 100644 --- a/fs/xfs/xfs_symlink.c +++ b/fs/xfs/xfs_symlink.c @@ -88,6 +88,9 @@ xfs_symlink( umode_t mode, struct xfs_inode **ipp) { + struct xfs_icreate_args args = { + .nlink = 1, + }; struct xfs_mount *mp = dp->i_mount; struct xfs_trans *tp = NULL; struct xfs_inode *ip = NULL; @@ -95,7 +98,6 @@ xfs_symlink( int pathlen; bool unlock_dp_on_error = false; xfs_filblks_t fs_blocks; - prid_t prid; struct xfs_dquot *udqp = NULL; struct xfs_dquot *gdqp = NULL; struct xfs_dquot *pdqp = NULL; @@ -117,13 +119,13 @@ xfs_symlink( return -ENAMETOOLONG; ASSERT(pathlen > 0); - prid = xfs_get_initial_prid(dp); + xfs_icreate_args_inherit(&args, dp, mnt_userns, + S_IFLNK | (mode & ~S_IFMT)); /* * Make sure that we have allocated dquot(s) on disk. 
*/ - error = xfs_qm_vop_dqalloc(dp, mapped_fsuid(mnt_userns, &init_user_ns), - mapped_fsgid(mnt_userns, &init_user_ns), prid, + error = xfs_qm_vop_dqalloc(dp, args.uid, args.gid, args.prid, XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT, &udqp, &gdqp, &pdqp); if (error) @@ -160,9 +162,7 @@ xfs_symlink( */ error = xfs_dialloc(&tp, dp->i_ino, S_IFLNK, &ino); if (!error) - error = xfs_init_new_inode(mnt_userns, tp, dp, ino, - S_IFLNK | (mode & ~S_IFMT), 1, 0, prid, - false, &ip); + error = xfs_icreate(tp, ino, &args, &ip); if (error) goto out_trans_cancel; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 10/20] xfs: push xfs_icreate_args creation out of xfs_create* 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 05/20] xfs: pack icreate initialization parameters into a separate structure Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 11/20] xfs: wrap inode creation dqalloc calls Darrick J. Wong ` (13 subsequent siblings) 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Move the initialization of the xfs_icreate_args structure out of xfs_create and xfs_create_tmpfile into their callers so that we can set the new inode's attributes in one place and pass that through instead of open coding the collection of attributes all over the code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_inode.c | 59 +++++++++++++++++++++++----------------------------- fs/xfs/xfs_inode.h | 9 ++++---- fs/xfs/xfs_iops.c | 50 +++++++++++++++++++++++++++----------------- 3 files changed, 61 insertions(+), 57 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 270e81e12015..5350c55ac25f 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -808,44 +808,34 @@ xfs_nlink_hook_del( int xfs_create( - struct user_namespace *mnt_userns, - xfs_inode_t *dp, + struct xfs_inode *dp, struct xfs_name *name, - umode_t mode, - dev_t rdev, - bool init_xattrs, - xfs_inode_t **ipp) + const struct xfs_icreate_args *args, + struct xfs_inode **ipp) { - struct xfs_icreate_args args = { - .rdev = rdev, - .nlink = S_ISDIR(mode) ? 
2 : 1, - }; - int is_dir = S_ISDIR(mode); struct xfs_mount *mp = dp->i_mount; struct xfs_inode *ip = NULL; struct xfs_trans *tp = NULL; - int error; - bool unlock_dp_on_error = false; struct xfs_dquot *udqp = NULL; struct xfs_dquot *gdqp = NULL; struct xfs_dquot *pdqp = NULL; struct xfs_trans_res *tres; - uint resblks; xfs_ino_t ino; + bool unlock_dp_on_error = false; + bool is_dir = S_ISDIR(args->mode); + uint resblks; + int error; + ASSERT(args->pip == dp); trace_xfs_create(dp, name); if (xfs_is_shutdown(mp)) return -EIO; - xfs_icreate_args_inherit(&args, dp, mnt_userns, mode); - if (init_xattrs) - args.flags |= XFS_ICREATE_ARGS_INIT_XATTRS; - /* * Make sure that we have allocated dquot(s) on disk. */ - error = xfs_qm_vop_dqalloc(dp, args.uid, args.gid, args.prid, + error = xfs_qm_vop_dqalloc(dp, args->uid, args->gid, args->prid, XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT, &udqp, &gdqp, &pdqp); if (error) @@ -884,9 +874,9 @@ xfs_create( * entry pointing to them, but a directory also the "." entry * pointing to itself. 
*/ - error = xfs_dialloc(&tp, dp->i_ino, mode, &ino); + error = xfs_dialloc(&tp, dp->i_ino, args->mode, &ino); if (!error) - error = xfs_icreate(tp, ino, &args, &ip); + error = xfs_icreate(tp, ino, args, &ip); if (error) goto out_trans_cancel; @@ -977,32 +967,31 @@ xfs_create( int xfs_create_tmpfile( - struct user_namespace *mnt_userns, struct xfs_inode *dp, - umode_t mode, + const struct xfs_icreate_args *args, struct xfs_inode **ipp) { - struct xfs_icreate_args args = { NULL }; struct xfs_mount *mp = dp->i_mount; struct xfs_inode *ip = NULL; struct xfs_trans *tp = NULL; - int error; struct xfs_dquot *udqp = NULL; struct xfs_dquot *gdqp = NULL; struct xfs_dquot *pdqp = NULL; struct xfs_trans_res *tres; - uint resblks; xfs_ino_t ino; + uint resblks; + int error; + + ASSERT(args->nlink == 0); + ASSERT(args->pip == dp); if (xfs_is_shutdown(mp)) return -EIO; - xfs_icreate_args_inherit(&args, dp, mnt_userns, mode); - /* * Make sure that we have allocated dquot(s) on disk. */ - error = xfs_qm_vop_dqalloc(dp, args.uid, args.gid, args.prid, + error = xfs_qm_vop_dqalloc(dp, args->uid, args->gid, args->prid, XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT, &udqp, &gdqp, &pdqp); if (error) @@ -1016,9 +1005,9 @@ xfs_create_tmpfile( if (error) goto out_release_dquots; - error = xfs_dialloc(&tp, dp->i_ino, mode, &ino); + error = xfs_dialloc(&tp, dp->i_ino, args->mode, &ino); if (!error) - error = xfs_icreate(tp, ino, &args, &ip); + error = xfs_icreate(tp, ino, args, &ip); if (error) goto out_trans_cancel; @@ -2750,12 +2739,16 @@ xfs_rename_alloc_whiteout( struct xfs_inode *dp, struct xfs_inode **wip) { + struct xfs_icreate_args args = { + .nlink = 0, + }; struct xfs_inode *tmpfile; struct qstr name; int error; - error = xfs_create_tmpfile(mnt_userns, dp, S_IFCHR | WHITEOUT_MODE, - &tmpfile); + xfs_icreate_args_inherit(&args, dp, mnt_userns, S_IFCHR | WHITEOUT_MODE); + + error = xfs_create_tmpfile(dp, &args, &tmpfile); if (error) return error; diff --git a/fs/xfs/xfs_inode.h 
b/fs/xfs/xfs_inode.h index cb4e5114bac4..9617079f0a73 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -485,12 +485,11 @@ int xfs_release(struct xfs_inode *ip); void xfs_inactive(struct xfs_inode *ip); int xfs_lookup(struct xfs_inode *dp, const struct xfs_name *name, struct xfs_inode **ipp, struct xfs_name *ci_name); -int xfs_create(struct user_namespace *mnt_userns, - struct xfs_inode *dp, struct xfs_name *name, - umode_t mode, dev_t rdev, bool need_xattr, +int xfs_create(struct xfs_inode *dp, struct xfs_name *name, + const struct xfs_icreate_args *iargs, struct xfs_inode **ipp); -int xfs_create_tmpfile(struct user_namespace *mnt_userns, - struct xfs_inode *dp, umode_t mode, +int xfs_create_tmpfile(struct xfs_inode *dp, + const struct xfs_icreate_args *iargs, struct xfs_inode **ipp); int xfs_remove(struct xfs_inode *dp, struct xfs_name *name, struct xfs_inode *ip); diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 80f881c69336..d580bf591d73 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -165,44 +165,56 @@ xfs_create_need_xattr( STATIC int xfs_generic_create( struct user_namespace *mnt_userns, - struct inode *dir, - struct dentry *dentry, - umode_t mode, - dev_t rdev, - struct file *tmpfile) /* unnamed file */ + struct inode *dir, + struct dentry *dentry, + umode_t mode, + dev_t rdev, + struct file *tmpfile) /* unnamed file */ { - struct inode *inode; - struct xfs_inode *ip = NULL; - struct posix_acl *default_acl, *acl; - struct xfs_name name; - int error; + struct xfs_icreate_args args = { + .rdev = rdev, + }; + struct inode *inode; + struct xfs_inode *ip = NULL; + struct posix_acl *default_acl, *acl; + struct xfs_name name; + int error; + + xfs_icreate_args_inherit(&args, XFS_I(dir), mnt_userns, mode); + if (tmpfile) + args.nlink = 0; + else if (S_ISDIR(mode)) + args.nlink = 2; + else + args.nlink = 1; /* * Irix uses Missed'em'V split, but doesn't want to see * the upper 5 bits of (14bit) major. 
*/ - if (S_ISCHR(mode) || S_ISBLK(mode)) { - if (unlikely(!sysv_valid_dev(rdev) || MAJOR(rdev) & ~0x1ff)) + if (S_ISCHR(args.mode) || S_ISBLK(args.mode)) { + if (unlikely(!sysv_valid_dev(args.rdev) || + MAJOR(args.rdev) & ~0x1ff)) return -EINVAL; } else { - rdev = 0; + args.rdev = 0; } - error = posix_acl_create(dir, &mode, &default_acl, &acl); + error = posix_acl_create(dir, &args.mode, &default_acl, &acl); if (error) return error; /* Verify mode is valid also for tmpfile case */ - error = xfs_dentry_mode_to_name(&name, dentry, mode); + error = xfs_dentry_mode_to_name(&name, dentry, args.mode); if (unlikely(error)) goto out_free_acl; if (!tmpfile) { - error = xfs_create(mnt_userns, XFS_I(dir), &name, mode, rdev, - xfs_create_need_xattr(dir, default_acl, acl), - &ip); + if (xfs_create_need_xattr(dir, default_acl, acl)) + args.flags |= XFS_ICREATE_ARGS_INIT_XATTRS; + error = xfs_create(XFS_I(dir), &name, &args, &ip); } else { - error = xfs_create_tmpfile(mnt_userns, XFS_I(dir), mode, &ip); + error = xfs_create_tmpfile(XFS_I(dir), &args, &ip); } if (unlikely(error)) goto out_free_acl; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 11/20] xfs: wrap inode creation dqalloc calls 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 10/20] xfs: push xfs_icreate_args creation out of xfs_create* Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 07/20] xfs: use xfs_trans_ichgtime to set times when allocating inode Darrick J. Wong ` (12 subsequent siblings) 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a helper that calls dqalloc to allocate and grab a reference to dquots for the user, group, and project ids listed in an icreate structure. This simplifies the creat-related dqalloc callsites scattered around the code base. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/tempfile.c | 9 ++++----- fs/xfs/xfs_inode.c | 38 ++++++++++++++++++++++++++------------ fs/xfs/xfs_inode.h | 3 +++ fs/xfs/xfs_symlink.c | 10 ++++------ 4 files changed, 37 insertions(+), 23 deletions(-) diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c index 2c630a5e23ea..6efaab50440f 100644 --- a/fs/xfs/scrub/tempfile.c +++ b/fs/xfs/scrub/tempfile.c @@ -43,9 +43,9 @@ xrep_tempfile_create( struct xfs_icreate_args args = { .pip = sc->mp->m_rootip, }; struct xfs_mount *mp = sc->mp; struct xfs_trans *tp = NULL; - struct xfs_dquot *udqp = NULL; - struct xfs_dquot *gdqp = NULL; - struct xfs_dquot *pdqp = NULL; + struct xfs_dquot *udqp; + struct xfs_dquot *gdqp; + struct xfs_dquot *pdqp; struct xfs_trans_res *tres; struct xfs_inode *dp = mp->m_rootip; xfs_ino_t ino; @@ -69,8 +69,7 @@ xrep_tempfile_create( * inode should be completely root owned so that we don't fail due to * quota limits. 
*/ - error = xfs_qm_vop_dqalloc(dp, args.uid, args.gid, args.prid, - XFS_QMOPT_QUOTALL, &udqp, &gdqp, &pdqp); + error = xfs_icreate_dqalloc(&args, &udqp, &gdqp, &pdqp); if (error) return error; diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 5350c55ac25f..db1f521ac6d0 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -806,6 +806,24 @@ xfs_nlink_hook_del( # define xfs_nlink_backref_delta(dp, ip, delta) ((void)0) #endif /* CONFIG_XFS_LIVE_HOOKS */ +int +xfs_icreate_dqalloc( + const struct xfs_icreate_args *args, + struct xfs_dquot **udqpp, + struct xfs_dquot **gdqpp, + struct xfs_dquot **pdqpp) +{ + unsigned int flags = XFS_QMOPT_QUOTALL; + + *udqpp = *gdqpp = *pdqpp = NULL; + + if (!(args->flags & XFS_ICREATE_ARGS_FORCE_GID)) + flags |= XFS_QMOPT_INHERIT; + + return xfs_qm_vop_dqalloc(args->pip, args->uid, args->gid, args->prid, + flags, udqpp, gdqpp, pdqpp); +} + int xfs_create( struct xfs_inode *dp, @@ -816,9 +834,9 @@ xfs_create( struct xfs_mount *mp = dp->i_mount; struct xfs_inode *ip = NULL; struct xfs_trans *tp = NULL; - struct xfs_dquot *udqp = NULL; - struct xfs_dquot *gdqp = NULL; - struct xfs_dquot *pdqp = NULL; + struct xfs_dquot *udqp; + struct xfs_dquot *gdqp; + struct xfs_dquot *pdqp; struct xfs_trans_res *tres; xfs_ino_t ino; bool unlock_dp_on_error = false; @@ -835,9 +853,7 @@ xfs_create( /* * Make sure that we have allocated dquot(s) on disk. 
*/ - error = xfs_qm_vop_dqalloc(dp, args->uid, args->gid, args->prid, - XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT, - &udqp, &gdqp, &pdqp); + error = xfs_icreate_dqalloc(args, &udqp, &gdqp, &pdqp); if (error) return error; @@ -974,9 +990,9 @@ xfs_create_tmpfile( struct xfs_mount *mp = dp->i_mount; struct xfs_inode *ip = NULL; struct xfs_trans *tp = NULL; - struct xfs_dquot *udqp = NULL; - struct xfs_dquot *gdqp = NULL; - struct xfs_dquot *pdqp = NULL; + struct xfs_dquot *udqp; + struct xfs_dquot *gdqp; + struct xfs_dquot *pdqp; struct xfs_trans_res *tres; xfs_ino_t ino; uint resblks; @@ -991,9 +1007,7 @@ xfs_create_tmpfile( /* * Make sure that we have allocated dquot(s) on disk. */ - error = xfs_qm_vop_dqalloc(dp, args->uid, args->gid, args->prid, - XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT, - &udqp, &gdqp, &pdqp); + error = xfs_icreate_dqalloc(args, &udqp, &gdqp, &pdqp); if (error) return error; diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 9617079f0a73..b99c62f14919 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -625,5 +625,8 @@ void xfs_nlink_hook_del(struct xfs_mount *mp, struct xfs_nlink_hook *hook); void xfs_icreate_args_inherit(struct xfs_icreate_args *args, struct xfs_inode *dp, struct user_namespace *mnt_userns, umode_t mode); void xfs_icreate_args_rootfile(struct xfs_icreate_args *args, umode_t mode); +int xfs_icreate_dqalloc(const struct xfs_icreate_args *args, + struct xfs_dquot **udqpp, struct xfs_dquot **gdqpp, + struct xfs_dquot **pdqpp); #endif /* __XFS_INODE_H__ */ diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c index c27bf49de7bf..6dc15f125895 100644 --- a/fs/xfs/xfs_symlink.c +++ b/fs/xfs/xfs_symlink.c @@ -98,9 +98,9 @@ xfs_symlink( int pathlen; bool unlock_dp_on_error = false; xfs_filblks_t fs_blocks; - struct xfs_dquot *udqp = NULL; - struct xfs_dquot *gdqp = NULL; - struct xfs_dquot *pdqp = NULL; + struct xfs_dquot *udqp; + struct xfs_dquot *gdqp; + struct xfs_dquot *pdqp; uint resblks; xfs_ino_t ino; @@ -125,9 +125,7 
@@ xfs_symlink( /* * Make sure that we have allocated dquot(s) on disk. */ - error = xfs_qm_vop_dqalloc(dp, args.uid, args.gid, args.prid, - XFS_QMOPT_QUOTALL | XFS_QMOPT_INHERIT, - &udqp, &gdqp, &pdqp); + error = xfs_icreate_dqalloc(&args, &udqp, &gdqp, &pdqp); if (error) return error; ^ permalink raw reply related [flat|nested] 565+ messages in thread
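The flag selection inside xfs_icreate_dqalloc can be sketched in isolation as follows. This is only an illustration under stated assumptions: the SK_* constants are hypothetical placeholder values, not the real XFS_QMOPT_* or XFS_ICREATE_ARGS_FORCE_GID definitions, and dqalloc_flags_sketch does not exist in the kernel.

```c
#include <assert.h>

/* Hypothetical placeholder values; the real definitions live in the XFS headers. */
#define SK_QMOPT_QUOTALL	0x07u
#define SK_QMOPT_INHERIT	0x08u
#define SK_ARGS_FORCE_GID	0x02u

/*
 * Always ask for user, group and project dquots; only ask for group
 * inheritance from the parent when the caller has not forced a gid.
 */
unsigned int
dqalloc_flags_sketch(unsigned int icreate_flags)
{
	unsigned int flags = SK_QMOPT_QUOTALL;

	if (!(icreate_flags & SK_ARGS_FORCE_GID))
		flags |= SK_QMOPT_INHERIT;
	return flags;
}
```

With this rule the scrub tempfile caller, which previously passed XFS_QMOPT_QUOTALL alone, keeps that behaviour as long as its args carry the force-gid flag.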
* [PATCH 07/20] xfs: use xfs_trans_ichgtime to set times when allocating inode 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 11/20] xfs: wrap inode creation dqalloc calls Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 06/20] xfs: implement atime updates in xfs_trans_ichgtime Darrick J. Wong ` (11 subsequent siblings) 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Use xfs_trans_ichgtime to set the inode times when allocating an inode, instead of open-coding them here. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_inode.c | 16 +++++++--------- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index ffbf504891aa..7a634a1ea111 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -689,10 +689,11 @@ xfs_icreate( struct inode *dir = pip ? VFS_I(pip) : NULL; struct xfs_mount *mp = tp->t_mountp; struct xfs_inode *ip; - unsigned int flags; - int error; - struct timespec64 tv; struct inode *inode; + unsigned int flags; + int times = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG | + XFS_ICHGTIME_ACCESS; + int error; /* * Protect against obviously corrupt allocation btree records. Later @@ -755,20 +756,17 @@ xfs_icreate( ip->i_df.if_nextents = 0; ASSERT(ip->i_nblocks == 0); - tv = current_time(inode); - inode->i_mtime = tv; - inode->i_atime = tv; - inode->i_ctime = tv; - ip->i_extsize = 0; ip->i_diflags = 0; if (xfs_has_v3inodes(mp)) { inode_set_iversion(inode, 1); ip->i_cowextsize = 0; - ip->i_crtime = tv; + times |= XFS_ICHGTIME_CREATE; } + xfs_trans_ichgtime(tp, ip, times); + flags = XFS_ILOG_CORE; switch (args->mode & S_IFMT) { case S_IFIFO: ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 06/20] xfs: implement atime updates in xfs_trans_ichgtime 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 07/20] xfs: use xfs_trans_ichgtime to set times when allocating inode Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 08/20] xfs: split new inode creation into two pieces Darrick J. Wong ` (10 subsequent siblings) 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Enable xfs_trans_ichgtime to change the inode access time so that we can use this function to set inode times when allocating inodes instead of open-coding it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_shared.h | 1 + fs/xfs/libxfs/xfs_trans_inode.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h index 5127fa88531f..acf527eb0e1c 100644 --- a/fs/xfs/libxfs/xfs_shared.h +++ b/fs/xfs/libxfs/xfs_shared.h @@ -137,6 +137,7 @@ void xfs_log_get_max_trans_res(struct xfs_mount *mp, #define XFS_ICHGTIME_MOD 0x1 /* data fork modification timestamp */ #define XFS_ICHGTIME_CHG 0x2 /* inode field change timestamp */ #define XFS_ICHGTIME_CREATE 0x4 /* inode create timestamp */ +#define XFS_ICHGTIME_ACCESS 0x8 /* last access timestamp */ /* Computed inode geometry for the filesystem. 
*/ struct xfs_ino_geometry { diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c index 8b5547073379..6a3a869635bf 100644 --- a/fs/xfs/libxfs/xfs_trans_inode.c +++ b/fs/xfs/libxfs/xfs_trans_inode.c @@ -69,6 +69,8 @@ xfs_trans_ichgtime( inode->i_mtime = tv; if (flags & XFS_ICHGTIME_CHG) inode->i_ctime = tv; + if (flags & XFS_ICHGTIME_ACCESS) + inode->i_atime = tv; if (flags & XFS_ICHGTIME_CREATE) ip->i_crtime = tv; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
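Taken together, the two patches above build a timestamp mask at inode creation and let xfs_trans_ichgtime apply it. A minimal userspace sketch of that flow is below. The flag values are copied from the patch; struct inode_times_sketch and both function names are hypothetical, and a caller-supplied time_t stands in for current_time().

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Flag values as defined in the patch above. */
#define XFS_ICHGTIME_MOD	0x1	/* data fork modification timestamp */
#define XFS_ICHGTIME_CHG	0x2	/* inode field change timestamp */
#define XFS_ICHGTIME_CREATE	0x4	/* inode create timestamp */
#define XFS_ICHGTIME_ACCESS	0x8	/* last access timestamp */

/* Hypothetical stand-in for the timestamps carried by an inode. */
struct inode_times_sketch {
	time_t	mtime;
	time_t	ctime;
	time_t	atime;
	time_t	crtime;
};

/* Mirrors the flag tests in xfs_trans_ichgtime. */
void
ichgtime_sketch(struct inode_times_sketch *it, int flags, time_t now)
{
	if (flags & XFS_ICHGTIME_MOD)
		it->mtime = now;
	if (flags & XFS_ICHGTIME_CHG)
		it->ctime = now;
	if (flags & XFS_ICHGTIME_ACCESS)
		it->atime = now;
	if (flags & XFS_ICHGTIME_CREATE)
		it->crtime = now;
}

/* Mask used for a new inode: crtime is only set on v3 inodes. */
int
new_inode_times_sketch(bool has_v3inodes)
{
	int times = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG |
		    XFS_ICHGTIME_ACCESS;

	if (has_v3inodes)
		times |= XFS_ICHGTIME_CREATE;
	return times;
}
```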
* [PATCH 08/20] xfs: split new inode creation into two pieces 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 06/20] xfs: implement atime updates in xfs_trans_ichgtime Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 09/20] xfs: hoist new inode initialization functions to libxfs Darrick J. Wong ` (9 subsequent siblings) 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> There are two parts to initializing a newly allocated inode: setting up the incore structures, and initializing the new inode core based on the parent inode and the current user's environment. The initialization code is not specific to the kernel, so we would like to share that with userspace by hoisting it to libxfs. Therefore, split xfs_icreate into separate functions to prepare for the next few patches. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_ialloc.c | 20 ++++++++++++- fs/xfs/xfs_inode.c | 66 ++++++++++++++++++++------------------------ 2 files changed, 48 insertions(+), 38 deletions(-) diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index ecf53198907f..d4c202da84cb 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -1870,9 +1870,25 @@ xfs_dialloc( } xfs_perag_put(pag); } + if (error) + goto out; - if (!error) - *new_ino = ino; + /* + * Protect against obviously corrupt allocation btree records. Later + * xfs_iget checks will catch re-allocation of other active in-memory + * and on-disk inodes. If we don't catch reallocating the parent inode + * here we will deadlock in xfs_iget() so we have to do these checks + * first. 
+ */ + if (ino == parent || !xfs_verify_dir_ino(mp, ino)) { + xfs_alert(mp, "Allocated a known in-use inode 0x%llx!", ino); + xfs_ag_mark_sick(pag, XFS_SICK_AG_INOBT); + error = -EFSCORRUPTED; + goto out; + } + + *new_ino = ino; +out: xfs_perag_put(pag); return error; } diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 7a634a1ea111..1352599fee4c 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -674,51 +674,21 @@ xfs_inode_inherit_flags2( } } -/* - * Initialise a newly allocated inode and return the in-core inode to the - * caller locked exclusively. - */ -int -xfs_icreate( +/* Initialise an inode's attributes. */ +static void +xfs_inode_init( struct xfs_trans *tp, - xfs_ino_t ino, const struct xfs_icreate_args *args, - struct xfs_inode **ipp) + struct xfs_inode *ip) { struct xfs_inode *pip = args->pip; struct inode *dir = pip ? VFS_I(pip) : NULL; struct xfs_mount *mp = tp->t_mountp; - struct xfs_inode *ip; - struct inode *inode; + struct inode *inode = VFS_I(ip); unsigned int flags; int times = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG | XFS_ICHGTIME_ACCESS; - int error; - /* - * Protect against obviously corrupt allocation btree records. Later - * xfs_iget checks will catch re-allocation of other active in-memory - * and on-disk inodes. If we don't catch reallocating the parent inode - * here we will deadlock in xfs_iget() so we have to do these checks - * first. - */ - if ((pip && ino == pip->i_ino) || !xfs_verify_dir_ino(mp, ino)) { - xfs_alert(mp, "Allocated a known in-use inode 0x%llx!", ino); - xfs_agno_mark_sick(mp, XFS_INO_TO_AGNO(mp, ino), - XFS_SICK_AG_INOBT); - return -EFSCORRUPTED; - } - - /* - * Get the in-core inode with the lock held exclusively to prevent - * others from looking at until we're done. 
- */ - error = xfs_iget(mp, tp, ino, XFS_IGET_CREATE, XFS_ILOCK_EXCL, &ip); - if (error) - return error; - - ASSERT(ip != NULL); - inode = VFS_I(ip); set_nlink(inode, args->nlink); inode->i_rdev = args->rdev; ip->i_projid = args->prid; @@ -815,8 +785,32 @@ xfs_icreate( /* now that we have an i_mode we can setup the inode structure */ xfs_setup_inode(ip); +} - *ipp = ip; +/* + * Initialise a newly allocated inode and return the in-core inode to the + * caller locked exclusively. + */ +int +xfs_icreate( + struct xfs_trans *tp, + xfs_ino_t ino, + const struct xfs_icreate_args *args, + struct xfs_inode **ipp) +{ + struct xfs_mount *mp = tp->t_mountp; + int error; + + /* + * Get the in-core inode with the lock held exclusively to prevent + * others from looking at until we're done. + */ + error = xfs_iget(mp, tp, ino, XFS_IGET_CREATE, XFS_ILOCK_EXCL, ipp); + if (error) + return error; + + ASSERT(*ipp != NULL); + xfs_inode_init(tp, args, *ipp); return 0; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
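The sanity check that this patch moves from xfs_icreate into xfs_dialloc can be sketched on its own as shown below. Everything here is an assumption made for illustration: SK_EFSCORRUPTED is a placeholder error value, the verifier is passed as a callback instead of calling xfs_verify_dir_ino, and verify_small_ino_sketch is a toy verifier with an arbitrary limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder for the kernel's EFSCORRUPTED; the value is hypothetical. */
#define SK_EFSCORRUPTED	117

typedef bool (*verify_ino_fn)(uint64_t ino);

/* Toy verifier: accept any nonzero inode number below an arbitrary limit. */
bool
verify_small_ino_sketch(uint64_t ino)
{
	return ino != 0 && ino < 1000;
}

/*
 * Reject an allocation that hands back the parent's own inode number or
 * a number the verifier refuses; only then publish the result.
 */
int
check_new_ino_sketch(uint64_t ino, uint64_t parent, verify_ino_fn verify,
		uint64_t *new_ino)
{
	if (ino == parent || !verify(ino))
		return -SK_EFSCORRUPTED;

	*new_ino = ino;
	return 0;
}
```

Doing the check before the output parameter is written means a caller never sees a suspect inode number, which is why the later xfs_iget in xfs_icreate no longer needs its own guard.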
* [PATCH 09/20] xfs: hoist new inode initialization functions to libxfs 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (9 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 08/20] xfs: split new inode creation into two pieces Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 12/20] xfs: hoist xfs_iunlink " Darrick J. Wong ` (8 subsequent siblings) 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Move all the code that initializes a new inode's attributes from the icreate_args structure and the parent directory into libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_inode_util.c | 201 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_inode_util.h | 21 ++++ fs/xfs/libxfs/xfs_shared.h | 8 -- fs/xfs/xfs_inode.c | 198 +-------------------------------------- fs/xfs/xfs_inode.h | 1 fs/xfs/xfs_trans.h | 1 6 files changed, 228 insertions(+), 202 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c index 2624d18922c0..5c9954dd20b3 100644 --- a/fs/xfs/libxfs/xfs_inode_util.c +++ b/fs/xfs/libxfs/xfs_inode_util.c @@ -3,6 +3,7 @@ * Copyright (c) 2000-2006 Silicon Graphics, Inc. * All Rights Reserved. */ +#include <linux/iversion.h> #include "xfs.h" #include "xfs_fs.h" #include "xfs_shared.h" @@ -13,6 +14,10 @@ #include "xfs_mount.h" #include "xfs_inode.h" #include "xfs_inode_util.h" +#include "xfs_trans.h" +#include "xfs_ialloc.h" +#include "xfs_health.h" +#include "xfs_bmap.h" uint16_t xfs_flags2diflags( @@ -133,3 +138,199 @@ xfs_get_initial_prid(struct xfs_inode *dp) return XFS_PROJID_DEFAULT; } + +/* Propagate di_flags from a parent inode to a child inode. 
*/ +static inline void +xfs_inode_inherit_flags( + struct xfs_inode *ip, + const struct xfs_inode *pip) +{ + unsigned int di_flags = 0; + xfs_failaddr_t failaddr; + umode_t mode = VFS_I(ip)->i_mode; + + if (S_ISDIR(mode)) { + if (pip->i_diflags & XFS_DIFLAG_RTINHERIT) + di_flags |= XFS_DIFLAG_RTINHERIT; + if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { + di_flags |= XFS_DIFLAG_EXTSZINHERIT; + ip->i_extsize = pip->i_extsize; + } + if (pip->i_diflags & XFS_DIFLAG_PROJINHERIT) + di_flags |= XFS_DIFLAG_PROJINHERIT; + } else if (S_ISREG(mode)) { + if ((pip->i_diflags & XFS_DIFLAG_RTINHERIT) && + xfs_has_realtime(ip->i_mount)) + di_flags |= XFS_DIFLAG_REALTIME; + if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { + di_flags |= XFS_DIFLAG_EXTSIZE; + ip->i_extsize = pip->i_extsize; + } + } + if ((pip->i_diflags & XFS_DIFLAG_NOATIME) && + xfs_inherit_noatime) + di_flags |= XFS_DIFLAG_NOATIME; + if ((pip->i_diflags & XFS_DIFLAG_NODUMP) && + xfs_inherit_nodump) + di_flags |= XFS_DIFLAG_NODUMP; + if ((pip->i_diflags & XFS_DIFLAG_SYNC) && + xfs_inherit_sync) + di_flags |= XFS_DIFLAG_SYNC; + if ((pip->i_diflags & XFS_DIFLAG_NOSYMLINKS) && + xfs_inherit_nosymlinks) + di_flags |= XFS_DIFLAG_NOSYMLINKS; + if ((pip->i_diflags & XFS_DIFLAG_NODEFRAG) && + xfs_inherit_nodefrag) + di_flags |= XFS_DIFLAG_NODEFRAG; + if (pip->i_diflags & XFS_DIFLAG_FILESTREAM) + di_flags |= XFS_DIFLAG_FILESTREAM; + + ip->i_diflags |= di_flags; + + /* + * Inode verifiers on older kernels only check that the extent size + * hint is an integer multiple of the rt extent size on realtime files. + * They did not check the hint alignment on a directory with both + * rtinherit and extszinherit flags set. If the misaligned hint is + * propagated from a directory into a new realtime file, new file + * allocations will fail due to math errors in the rt allocator and/or + * trip the verifiers. Validate the hint settings in the new file so + * that we don't let broken hints propagate. 
+ */ + failaddr = xfs_inode_validate_extsize(ip->i_mount, ip->i_extsize, + VFS_I(ip)->i_mode, ip->i_diflags); + if (failaddr) { + ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | + XFS_DIFLAG_EXTSZINHERIT); + ip->i_extsize = 0; + } +} + +/* Propagate di_flags2 from a parent inode to a child inode. */ +static inline void +xfs_inode_inherit_flags2( + struct xfs_inode *ip, + const struct xfs_inode *pip) +{ + xfs_failaddr_t failaddr; + + if (pip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) { + ip->i_diflags2 |= XFS_DIFLAG2_COWEXTSIZE; + ip->i_cowextsize = pip->i_cowextsize; + } + if (pip->i_diflags2 & XFS_DIFLAG2_DAX) + ip->i_diflags2 |= XFS_DIFLAG2_DAX; + + /* Don't let invalid cowextsize hints propagate. */ + failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize, + VFS_I(ip)->i_mode, ip->i_diflags, ip->i_diflags2); + if (failaddr) { + ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE; + ip->i_cowextsize = 0; + } +} + +/* Initialise an inode's attributes. */ +void +xfs_inode_init( + struct xfs_trans *tp, + const struct xfs_icreate_args *args, + struct xfs_inode *ip) +{ + struct xfs_inode *pip = args->pip; + struct inode *dir = pip ? 
VFS_I(pip) : NULL; + struct xfs_mount *mp = tp->t_mountp; + struct inode *inode = VFS_I(ip); + unsigned int flags; + int times = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG | + XFS_ICHGTIME_ACCESS; + + set_nlink(inode, args->nlink); + inode->i_rdev = args->rdev; + ip->i_projid = args->prid; + + if (dir && !(dir->i_mode & S_ISGID) && xfs_has_grpid(mp)) { + inode_fsuid_set(inode, args->mnt_userns); + inode->i_gid = dir->i_gid; + inode->i_mode = args->mode; + } else { + inode_init_owner(args->mnt_userns, inode, dir, args->mode); + } + xfs_inode_sgid_inherit(args, ip); + + /* struct copies */ + if (args->flags & XFS_ICREATE_ARGS_FORCE_UID) + inode->i_uid = args->uid; + else + ASSERT(uid_eq(inode->i_uid, args->uid)); + if (args->flags & XFS_ICREATE_ARGS_FORCE_GID) + inode->i_gid = args->gid; + else if (!pip || !XFS_INHERIT_GID(pip)) + ASSERT(gid_eq(inode->i_gid, args->gid)); + if (args->flags & XFS_ICREATE_ARGS_FORCE_MODE) + inode->i_mode = args->mode; + + ip->i_disk_size = 0; + ip->i_df.if_nextents = 0; + ASSERT(ip->i_nblocks == 0); + + ip->i_extsize = 0; + ip->i_diflags = 0; + + if (xfs_has_v3inodes(mp)) { + inode_set_iversion(inode, 1); + ip->i_cowextsize = 0; + times |= XFS_ICHGTIME_CREATE; + } + + xfs_trans_ichgtime(tp, ip, times); + + flags = XFS_ILOG_CORE; + switch (args->mode & S_IFMT) { + case S_IFIFO: + case S_IFCHR: + case S_IFBLK: + case S_IFSOCK: + ip->i_df.if_format = XFS_DINODE_FMT_DEV; + flags |= XFS_ILOG_DEV; + break; + case S_IFREG: + case S_IFDIR: + if (pip && (pip->i_diflags & XFS_DIFLAG_ANY)) + xfs_inode_inherit_flags(ip, pip); + if (pip && (pip->i_diflags2 & XFS_DIFLAG2_ANY)) + xfs_inode_inherit_flags2(ip, pip); + fallthrough; + case S_IFLNK: + ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS; + ip->i_df.if_bytes = 0; + ip->i_df.if_u1.if_root = NULL; + break; + default: + ASSERT(0); + } + + /* + * If we need to create attributes immediately after allocating the + * inode, initialise an empty attribute fork right now. 
We use the + * default fork offset for attributes here as we don't know exactly what + * size or how many attributes we might be adding. We can do this + * safely here because we know the data fork is completely empty and + * this saves us from needing to run a separate transaction to set the + * fork offset in the immediate future. + */ + if ((args->flags & XFS_ICREATE_ARGS_INIT_XATTRS) && + xfs_has_attr(mp)) { + ip->i_forkoff = xfs_default_attroffset(ip) >> 3; + xfs_ifork_init_attr(ip, XFS_DINODE_FMT_EXTENTS, 0); + } + + /* + * Log the new values stuffed into the inode. + */ + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); + xfs_trans_log_inode(tp, ip, flags); + + /* now that we have an i_mode we can setup the inode structure */ + xfs_setup_inode(ip); +} diff --git a/fs/xfs/libxfs/xfs_inode_util.h b/fs/xfs/libxfs/xfs_inode_util.h index 466f0767ab5d..a73ccaea5582 100644 --- a/fs/xfs/libxfs/xfs_inode_util.h +++ b/fs/xfs/libxfs/xfs_inode_util.h @@ -44,4 +44,25 @@ struct xfs_icreate_args { uint16_t flags; }; +/* + * Flags for xfs_trans_ichgtime(). + */ +#define XFS_ICHGTIME_MOD 0x1 /* data fork modification timestamp */ +#define XFS_ICHGTIME_CHG 0x2 /* inode field change timestamp */ +#define XFS_ICHGTIME_CREATE 0x4 /* inode create timestamp */ +#define XFS_ICHGTIME_ACCESS 0x8 /* last access timestamp */ +void xfs_trans_ichgtime(struct xfs_trans *tp, struct xfs_inode *ip, int flags); + +void xfs_inode_init(struct xfs_trans *tp, const struct xfs_icreate_args *args, + struct xfs_inode *ip); + +/* The libxfs client must provide this group of helper functions. */ + +/* Handle legacy Irix sgid inheritance quirks. */ +void xfs_inode_sgid_inherit(const struct xfs_icreate_args *args, + struct xfs_inode *ip); + +/* Initialize the incore inode. 
*/ +void xfs_setup_inode(struct xfs_inode *ip); + #endif /* __XFS_INODE_UTIL_H__ */ diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h index acf527eb0e1c..46754fe57361 100644 --- a/fs/xfs/libxfs/xfs_shared.h +++ b/fs/xfs/libxfs/xfs_shared.h @@ -131,14 +131,6 @@ void xfs_log_get_max_trans_res(struct xfs_mount *mp, #define XFS_RCBAG_BTREE_REF 1 #define XFS_SSB_REF 0 -/* - * Flags for xfs_trans_ichgtime(). - */ -#define XFS_ICHGTIME_MOD 0x1 /* data fork modification timestamp */ -#define XFS_ICHGTIME_CHG 0x2 /* inode field change timestamp */ -#define XFS_ICHGTIME_CREATE 0x4 /* inode create timestamp */ -#define XFS_ICHGTIME_ACCESS 0x8 /* last access timestamp */ - /* Computed inode geometry for the filesystem. */ struct xfs_ino_geometry { /* Maximum inode count in this filesystem. */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 1352599fee4c..270e81e12015 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -39,6 +39,7 @@ #include "xfs_ag.h" #include "xfs_log_priv.h" #include "xfs_health.h" +#include "xfs_inode_util.h" struct kmem_cache *xfs_inode_cache; @@ -583,123 +584,12 @@ xfs_lookup( return error; } -/* Propagate di_flags from a parent inode to a child inode. 
*/ -static void -xfs_inode_inherit_flags( - struct xfs_inode *ip, - const struct xfs_inode *pip) +void +xfs_inode_sgid_inherit( + const struct xfs_icreate_args *args, + struct xfs_inode *ip) { - unsigned int di_flags = 0; - xfs_failaddr_t failaddr; - umode_t mode = VFS_I(ip)->i_mode; - - if (S_ISDIR(mode)) { - if (pip->i_diflags & XFS_DIFLAG_RTINHERIT) - di_flags |= XFS_DIFLAG_RTINHERIT; - if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { - di_flags |= XFS_DIFLAG_EXTSZINHERIT; - ip->i_extsize = pip->i_extsize; - } - if (pip->i_diflags & XFS_DIFLAG_PROJINHERIT) - di_flags |= XFS_DIFLAG_PROJINHERIT; - } else if (S_ISREG(mode)) { - if ((pip->i_diflags & XFS_DIFLAG_RTINHERIT) && - xfs_has_realtime(ip->i_mount)) - di_flags |= XFS_DIFLAG_REALTIME; - if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { - di_flags |= XFS_DIFLAG_EXTSIZE; - ip->i_extsize = pip->i_extsize; - } - } - if ((pip->i_diflags & XFS_DIFLAG_NOATIME) && - xfs_inherit_noatime) - di_flags |= XFS_DIFLAG_NOATIME; - if ((pip->i_diflags & XFS_DIFLAG_NODUMP) && - xfs_inherit_nodump) - di_flags |= XFS_DIFLAG_NODUMP; - if ((pip->i_diflags & XFS_DIFLAG_SYNC) && - xfs_inherit_sync) - di_flags |= XFS_DIFLAG_SYNC; - if ((pip->i_diflags & XFS_DIFLAG_NOSYMLINKS) && - xfs_inherit_nosymlinks) - di_flags |= XFS_DIFLAG_NOSYMLINKS; - if ((pip->i_diflags & XFS_DIFLAG_NODEFRAG) && - xfs_inherit_nodefrag) - di_flags |= XFS_DIFLAG_NODEFRAG; - if (pip->i_diflags & XFS_DIFLAG_FILESTREAM) - di_flags |= XFS_DIFLAG_FILESTREAM; - - ip->i_diflags |= di_flags; - - /* - * Inode verifiers on older kernels only check that the extent size - * hint is an integer multiple of the rt extent size on realtime files. - * They did not check the hint alignment on a directory with both - * rtinherit and extszinherit flags set. If the misaligned hint is - * propagated from a directory into a new realtime file, new file - * allocations will fail due to math errors in the rt allocator and/or - * trip the verifiers. 
Validate the hint settings in the new file so - * that we don't let broken hints propagate. - */ - failaddr = xfs_inode_validate_extsize(ip->i_mount, ip->i_extsize, - VFS_I(ip)->i_mode, ip->i_diflags); - if (failaddr) { - ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | - XFS_DIFLAG_EXTSZINHERIT); - ip->i_extsize = 0; - } -} - -/* Propagate di_flags2 from a parent inode to a child inode. */ -static void -xfs_inode_inherit_flags2( - struct xfs_inode *ip, - const struct xfs_inode *pip) -{ - xfs_failaddr_t failaddr; - - if (pip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) { - ip->i_diflags2 |= XFS_DIFLAG2_COWEXTSIZE; - ip->i_cowextsize = pip->i_cowextsize; - } - if (pip->i_diflags2 & XFS_DIFLAG2_DAX) - ip->i_diflags2 |= XFS_DIFLAG2_DAX; - - /* Don't let invalid cowextsize hints propagate. */ - failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize, - VFS_I(ip)->i_mode, ip->i_diflags, ip->i_diflags2); - if (failaddr) { - ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE; - ip->i_cowextsize = 0; - } -} - -/* Initialise an inode's attributes. */ -static void -xfs_inode_init( - struct xfs_trans *tp, - const struct xfs_icreate_args *args, - struct xfs_inode *ip) -{ - struct xfs_inode *pip = args->pip; - struct inode *dir = pip ? 
VFS_I(pip) : NULL; - struct xfs_mount *mp = tp->t_mountp; - struct inode *inode = VFS_I(ip); - unsigned int flags; - int times = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG | - XFS_ICHGTIME_ACCESS; - - set_nlink(inode, args->nlink); - inode->i_rdev = args->rdev; - ip->i_projid = args->prid; - - if (dir && !(dir->i_mode & S_ISGID) && xfs_has_grpid(mp)) { - inode_fsuid_set(inode, args->mnt_userns); - inode->i_gid = dir->i_gid; - inode->i_mode = args->mode; - } else { - inode_init_owner(args->mnt_userns, inode, dir, args->mode); - } + struct inode *inode = VFS_I(ip); /* * If the group ID of the new file does not match the effective group @@ -709,82 +599,6 @@ xfs_inode_init( if (irix_sgid_inherit && (inode->i_mode & S_ISGID) && !vfsgid_in_group_p(i_gid_into_vfsgid(args->mnt_userns, inode))) inode->i_mode &= ~S_ISGID; - - /* struct copies */ - if (args->flags & XFS_ICREATE_ARGS_FORCE_UID) - inode->i_uid = args->uid; - else - ASSERT(uid_eq(inode->i_uid, args->uid)); - if (args->flags & XFS_ICREATE_ARGS_FORCE_GID) - inode->i_gid = args->gid; - else if (!pip || !XFS_INHERIT_GID(pip)) - ASSERT(gid_eq(inode->i_gid, args->gid)); - if (args->flags & XFS_ICREATE_ARGS_FORCE_MODE) - inode->i_mode = args->mode; - - ip->i_disk_size = 0; - ip->i_df.if_nextents = 0; - ASSERT(ip->i_nblocks == 0); - - ip->i_extsize = 0; - ip->i_diflags = 0; - - if (xfs_has_v3inodes(mp)) { - inode_set_iversion(inode, 1); - ip->i_cowextsize = 0; - times |= XFS_ICHGTIME_CREATE; - } - - xfs_trans_ichgtime(tp, ip, times); - - flags = XFS_ILOG_CORE; - switch (args->mode & S_IFMT) { - case S_IFIFO: - case S_IFCHR: - case S_IFBLK: - case S_IFSOCK: - ip->i_df.if_format = XFS_DINODE_FMT_DEV; - flags |= XFS_ILOG_DEV; - break; - case S_IFREG: - case S_IFDIR: - if (pip && (pip->i_diflags & XFS_DIFLAG_ANY)) - xfs_inode_inherit_flags(ip, pip); - if (pip && (pip->i_diflags2 & XFS_DIFLAG2_ANY)) - xfs_inode_inherit_flags2(ip, pip); - fallthrough; - case S_IFLNK: - ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS; - ip->i_df.if_bytes 
= 0; - ip->i_df.if_u1.if_root = NULL; - break; - default: - ASSERT(0); - } - - /* - * If we need to create attributes immediately after allocating the - * inode, initialise an empty attribute fork right now. We use the - * default fork offset for attributes here as we don't know exactly what - * size or how many attributes we might be adding. We can do this - * safely here because we know the data fork is completely empty and - * this saves us from needing to run a separate transaction to set the - * fork offset in the immediate future. - */ - if ((args->flags & XFS_ICREATE_ARGS_INIT_XATTRS) && - xfs_has_attr(mp)) { - ip->i_forkoff = xfs_default_attroffset(ip) >> 3; - xfs_ifork_init_attr(ip, XFS_DINODE_FMT_EXTENTS, 0); - } - - /* - * Log the new values stuffed into the inode. - */ - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); - xfs_trans_log_inode(tp, ip, flags); - - /* now that we have an i_mode we can setup the inode structure */ - xfs_setup_inode(ip); } /* diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index cb627543e9fb..cb4e5114bac4 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -542,7 +542,6 @@ int xfs_break_layouts(struct inode *inode, uint *iolock, enum layout_break_reason reason); /* from xfs_iops.c */ -extern void xfs_setup_inode(struct xfs_inode *ip); extern void xfs_setup_iops(struct xfs_inode *ip); extern void xfs_diflags_to_iflags(struct xfs_inode *ip, bool init); diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h index a43d6465b9d4..0a9ec6929bbc 100644 --- a/fs/xfs/xfs_trans.h +++ b/fs/xfs/xfs_trans.h @@ -228,7 +228,6 @@ void xfs_trans_stale_inode_buf(xfs_trans_t *, struct xfs_buf *); bool xfs_trans_ordered_buf(xfs_trans_t *, struct xfs_buf *); void xfs_trans_dquot_buf(xfs_trans_t *, struct xfs_buf *, uint); void xfs_trans_inode_alloc_buf(xfs_trans_t *, struct xfs_buf *); -void xfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int); void xfs_trans_ijoin(struct xfs_trans *, struct xfs_inode *, uint); void 
xfs_trans_log_buf(struct xfs_trans *, struct xfs_buf *, uint, uint);
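For anyone who wants to poke at the inheritance rules outside the kernel, here is a rough userspace sketch of the directory versus regular-file split that xfs_inode_inherit_flags() performs in the patch above. Everything prefixed fl_ or FL_ is made up for illustration: the flag bits are not the on-disk XFS_DIFLAG_* values, and fl_extsize_ok() is only a simplified stand-in for xfs_inode_validate_extsize(), assuming the one rule described in the comment (a realtime hint must be a multiple of the rt extent size).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; NOT the real XFS_DIFLAG_* values. */
#define FL_REALTIME	(1u << 0)
#define FL_RTINHERIT	(1u << 1)
#define FL_EXTSIZE	(1u << 2)
#define FL_EXTSZINHERIT	(1u << 3)
#define FL_PROJINHERIT	(1u << 4)

/* Hypothetical, heavily reduced inode. */
struct fl_inode {
	bool		is_dir;
	uint32_t	flags;
	uint32_t	extsize;	/* extent size hint, in blocks */
};

/*
 * Simplified stand-in for xfs_inode_validate_extsize(): a hint that will be
 * used for realtime allocations must be a multiple of the rt extent size.
 */
static bool
fl_extsize_ok(const struct fl_inode *ip, uint32_t rextsize)
{
	bool	rt_hint = (ip->flags & FL_REALTIME) ||
			  ((ip->flags & FL_RTINHERIT) &&
			   (ip->flags & FL_EXTSZINHERIT));

	if (!rt_hint || rextsize == 0)
		return true;
	return ip->extsize % rextsize == 0;
}

/* Propagate flags from parent @pip to new child @ip. */
static void
fl_inherit_flags(struct fl_inode *ip, const struct fl_inode *pip,
		 bool has_realtime, uint32_t rextsize)
{
	uint32_t	f = 0;

	assert(ip && pip);

	if (ip->is_dir) {
		/* Directories pass the *INHERIT flags down unchanged. */
		if (pip->flags & FL_RTINHERIT)
			f |= FL_RTINHERIT;
		if (pip->flags & FL_EXTSZINHERIT) {
			f |= FL_EXTSZINHERIT;
			ip->extsize = pip->extsize;
		}
		if (pip->flags & FL_PROJINHERIT)
			f |= FL_PROJINHERIT;
	} else {
		/* Regular files turn the *INHERIT flags into real settings. */
		if ((pip->flags & FL_RTINHERIT) && has_realtime)
			f |= FL_REALTIME;
		if (pip->flags & FL_EXTSZINHERIT) {
			f |= FL_EXTSIZE;
			ip->extsize = pip->extsize;
		}
	}
	ip->flags |= f;

	/* Don't let a broken hint propagate into the child. */
	if (!fl_extsize_ok(ip, rextsize)) {
		ip->flags &= ~(FL_EXTSIZE | FL_EXTSZINHERIT);
		ip->extsize = 0;
	}
}
```

With a parent directory carrying rtinherit and extszinherit and a hint of 6 blocks on a filesystem whose rt extent size is 4, a new file in this sketch ends up realtime with no extent size hint at all, which is the "drop the bad hint rather than fail later" behaviour the comment in the patch describes.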
* [PATCH 12/20] xfs: hoist xfs_iunlink to libxfs
From: Darrick J. Wong @ 2022-12-30 22:17 UTC
To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move xfs_iunlink and xfs_iunlink_remove to libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_inode_util.c |  276 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_inode_util.h |    4 +
 fs/xfs/xfs_inode.c             |  275 ----------------------------------------
 fs/xfs/xfs_inode.h             |    1 -
 4 files changed, 280 insertions(+), 276 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c
index 5c9954dd20b3..0c3ac4b07cc5 100644
--- a/fs/xfs/libxfs/xfs_inode_util.c
+++ b/fs/xfs/libxfs/xfs_inode_util.c
@@ -18,6 +18,10 @@
 #include "xfs_ialloc.h"
 #include "xfs_health.h"
 #include "xfs_bmap.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_ag.h"
+#include "xfs_iunlink_item.h"
 
 uint16_t
 xfs_flags2diflags(
@@ -334,3 +338,275 @@ xfs_inode_init(
 	/* now that we have an i_mode we can setup the inode structure */
 	xfs_setup_inode(ip);
 }
+
+/*
+ * In-Core Unlinked List Lookups
+ * =============================
+ *
+ * Every inode is supposed to be reachable from some other piece of metadata
+ * with the exception of the root directory. Inodes with a connection to a
+ * file descriptor but not linked from anywhere in the on-disk directory tree
+ * are collectively known as unlinked inodes, though the filesystem itself
+ * maintains links to these inodes so that on-disk metadata are consistent.
+ * + * XFS implements a per-AG on-disk hash table of unlinked inodes. The AGI + * header contains a number of buckets that point to an inode, and each inode + * record has a pointer to the next inode in the hash chain. This + * singly-linked list causes scaling problems in the iunlink remove function + * because we must walk that list to find the inode that points to the inode + * being removed from the unlinked hash bucket list. + * + * Hence we keep an in-memory double linked list to link each inode on an + * unlinked list. Because there are 64 unlinked lists per AGI, keeping pointer + * based lists would require having 64 list heads in the perag, one for each + * list. This is expensive in terms of memory (think millions of AGs) and cache + * misses on lookups. Instead, use the fact that inodes on the unlinked list + * must be referenced at the VFS level to keep them on the list and hence we + * have an existence guarantee for inodes on the unlinked list. + * + * Given we have an existence guarantee, we can use lockless inode cache lookups + * to resolve aginos to xfs inodes. This means we only need 8 bytes per inode + * for the double linked unlinked list, and we don't need any extra locking to + * keep the list safe as all manipulations are done under the AGI buffer lock. + * Keeping the list up to date does not require memory allocation, just finding + * the XFS inode and updating the next/prev unlinked list aginos. + */ + +/* Update the prev pointer of the next agino. */ +static int +xfs_iunlink_update_backref( + struct xfs_perag *pag, + xfs_agino_t prev_agino, + xfs_agino_t next_agino) +{ + struct xfs_inode *ip; + + /* No update necessary if we are at the end of the list. 
*/ + if (next_agino == NULLAGINO) + return 0; + + ip = xfs_iunlink_lookup(pag, next_agino); + if (!ip) { + xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI); + return -EFSCORRUPTED; + } + + ip->i_prev_unlinked = prev_agino; + return 0; +} + +/* + * Point the AGI unlinked bucket at an inode and log the results. The caller + * is responsible for validating the old value. + */ +STATIC int +xfs_iunlink_update_bucket( + struct xfs_trans *tp, + struct xfs_perag *pag, + struct xfs_buf *agibp, + unsigned int bucket_index, + xfs_agino_t new_agino) +{ + struct xfs_agi *agi = agibp->b_addr; + xfs_agino_t old_value; + int offset; + + ASSERT(xfs_verify_agino_or_null(pag, new_agino)); + + old_value = be32_to_cpu(agi->agi_unlinked[bucket_index]); + trace_xfs_iunlink_update_bucket(tp->t_mountp, pag->pag_agno, bucket_index, + old_value, new_agino); + + /* + * We should never find the head of the list already set to the value + * passed in because either we're adding or removing ourselves from the + * head of the list. + */ + if (old_value == new_agino) { + xfs_buf_mark_corrupt(agibp); + xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI); + return -EFSCORRUPTED; + } + + agi->agi_unlinked[bucket_index] = cpu_to_be32(new_agino); + offset = offsetof(struct xfs_agi, agi_unlinked) + + (sizeof(xfs_agino_t) * bucket_index); + xfs_trans_log_buf(tp, agibp, offset, offset + sizeof(xfs_agino_t) - 1); + return 0; +} + +static int +xfs_iunlink_insert_inode( + struct xfs_trans *tp, + struct xfs_perag *pag, + struct xfs_buf *agibp, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_agi *agi = agibp->b_addr; + xfs_agino_t next_agino; + xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino); + short bucket_index = agino % XFS_AGI_UNLINKED_BUCKETS; + int error; + + /* + * Get the index into the agi hash table for the list this inode will + * go on. Make sure the pointer isn't garbage and that this inode + * isn't already on the list. 
+ */ + next_agino = be32_to_cpu(agi->agi_unlinked[bucket_index]); + if (next_agino == agino || + !xfs_verify_agino_or_null(pag, next_agino)) { + xfs_buf_mark_corrupt(agibp); + xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI); + return -EFSCORRUPTED; + } + + /* + * Update the prev pointer in the next inode to point back to this + * inode. + */ + error = xfs_iunlink_update_backref(pag, agino, next_agino); + if (error) + return error; + + if (next_agino != NULLAGINO) { + /* + * There is already another inode in the bucket, so point this + * inode to the current head of the list. + */ + error = xfs_iunlink_log_inode(tp, ip, pag, next_agino); + if (error) + return error; + ip->i_next_unlinked = next_agino; + } + + /* Point the head of the list to point to this inode. */ + ip->i_prev_unlinked = NULLAGINO; + return xfs_iunlink_update_bucket(tp, pag, agibp, bucket_index, agino); +} + +/* + * This is called when the inode's link count has gone to 0 or we are creating + * a tmpfile via O_TMPFILE. The inode @ip must have nlink == 0. + * + * We place the on-disk inode on a list in the AGI. It will be pulled from this + * list when the inode is freed. + */ +int +xfs_iunlink( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_perag *pag; + struct xfs_buf *agibp; + int error; + + ASSERT(VFS_I(ip)->i_nlink == 0); + ASSERT(VFS_I(ip)->i_mode != 0); + trace_xfs_iunlink(ip); + + pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino)); + + /* Get the agi buffer first. It ensures lock ordering on the list. 
*/ + error = xfs_read_agi(pag, tp, &agibp); + if (error) + goto out; + + error = xfs_iunlink_insert_inode(tp, pag, agibp, ip); +out: + xfs_perag_put(pag); + return error; +} + +static int +xfs_iunlink_remove_inode( + struct xfs_trans *tp, + struct xfs_perag *pag, + struct xfs_buf *agibp, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_agi *agi = agibp->b_addr; + xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino); + xfs_agino_t head_agino; + short bucket_index = agino % XFS_AGI_UNLINKED_BUCKETS; + int error; + + trace_xfs_iunlink_remove(ip); + + /* + * Get the index into the agi hash table for the list this inode will + * go on. Make sure the head pointer isn't garbage. + */ + head_agino = be32_to_cpu(agi->agi_unlinked[bucket_index]); + if (!xfs_verify_agino(pag, head_agino)) { + XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, + agi, sizeof(*agi)); + xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI); + return -EFSCORRUPTED; + } + + /* + * Set our inode's next_unlinked pointer to NULL and then return + * the old pointer value so that we can update whatever was previous + * to us in the list to point to whatever was next in the list. + */ + error = xfs_iunlink_log_inode(tp, ip, pag, NULLAGINO); + if (error) + return error; + + /* + * Update the prev pointer in the next inode to point back to previous + * inode in the chain. + */ + error = xfs_iunlink_update_backref(pag, ip->i_prev_unlinked, + ip->i_next_unlinked); + if (error) + return error; + + if (head_agino != agino) { + struct xfs_inode *prev_ip; + + prev_ip = xfs_iunlink_lookup(pag, ip->i_prev_unlinked); + if (!prev_ip) { + xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE); + return -EFSCORRUPTED; + } + + error = xfs_iunlink_log_inode(tp, prev_ip, pag, + ip->i_next_unlinked); + prev_ip->i_next_unlinked = ip->i_next_unlinked; + } else { + /* Point the head of the list to the next unlinked inode. 
*/ + error = xfs_iunlink_update_bucket(tp, pag, agibp, bucket_index, + ip->i_next_unlinked); + } + + ip->i_next_unlinked = NULLAGINO; + ip->i_prev_unlinked = 0; + return error; +} + +/* + * Pull the on-disk inode from the AGI unlinked list. + */ +int +xfs_iunlink_remove( + struct xfs_trans *tp, + struct xfs_perag *pag, + struct xfs_inode *ip) +{ + struct xfs_buf *agibp; + int error; + + trace_xfs_iunlink_remove(ip); + + /* Get the agi buffer first. It ensures lock ordering on the list. */ + error = xfs_read_agi(pag, tp, &agibp); + if (error) + return error; + + return xfs_iunlink_remove_inode(tp, pag, agibp, ip); +} diff --git a/fs/xfs/libxfs/xfs_inode_util.h b/fs/xfs/libxfs/xfs_inode_util.h index a73ccaea5582..e15cf94e0943 100644 --- a/fs/xfs/libxfs/xfs_inode_util.h +++ b/fs/xfs/libxfs/xfs_inode_util.h @@ -56,6 +56,10 @@ void xfs_trans_ichgtime(struct xfs_trans *tp, struct xfs_inode *ip, int flags); void xfs_inode_init(struct xfs_trans *tp, const struct xfs_icreate_args *args, struct xfs_inode *ip); +int xfs_iunlink(struct xfs_trans *tp, struct xfs_inode *ip); +int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag, + struct xfs_inode *ip); + /* The libxfs client must provide this group of helper functions. */ /* Handle legacy Irix sgid inheritance quirks. */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index db1f521ac6d0..f69423504216 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -43,9 +43,6 @@ struct kmem_cache *xfs_inode_cache; -STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag, - struct xfs_inode *); - /* * These two are wrapper routines around the xfs_ilock() routine used to * centralize some grungy code. 
They are used in places that wish to lock the @@ -1729,39 +1726,6 @@ xfs_inactive( xfs_qm_dqdetach(ip); } -/* - * In-Core Unlinked List Lookups - * ============================= - * - * Every inode is supposed to be reachable from some other piece of metadata - * with the exception of the root directory. Inodes with a connection to a - * file descriptor but not linked from anywhere in the on-disk directory tree - * are collectively known as unlinked inodes, though the filesystem itself - * maintains links to these inodes so that on-disk metadata are consistent. - * - * XFS implements a per-AG on-disk hash table of unlinked inodes. The AGI - * header contains a number of buckets that point to an inode, and each inode - * record has a pointer to the next inode in the hash chain. This - * singly-linked list causes scaling problems in the iunlink remove function - * because we must walk that list to find the inode that points to the inode - * being removed from the unlinked hash bucket list. - * - * Hence we keep an in-memory double linked list to link each inode on an - * unlinked list. Because there are 64 unlinked lists per AGI, keeping pointer - * based lists would require having 64 list heads in the perag, one for each - * list. This is expensive in terms of memory (think millions of AGs) and cache - * misses on lookups. Instead, use the fact that inodes on the unlinked list - * must be referenced at the VFS level to keep them on the list and hence we - * have an existence guarantee for inodes on the unlinked list. - * - * Given we have an existence guarantee, we can use lockless inode cache lookups - * to resolve aginos to xfs inodes. This means we only need 8 bytes per inode - * for the double linked unlinked list, and we don't need any extra locking to - * keep the list safe as all manipulations are done under the AGI buffer lock. 
- * Keeping the list up to date does not require memory allocation, just finding - * the XFS inode and updating the next/prev unlinked list aginos. - */ - /* * Find an inode on the unlinked list. This does not take references to the * inode as we have existence guarantees by holding the AGI buffer lock and that @@ -1791,245 +1755,6 @@ xfs_iunlink_lookup( return ip; } -/* Update the prev pointer of the next agino. */ -static int -xfs_iunlink_update_backref( - struct xfs_perag *pag, - xfs_agino_t prev_agino, - xfs_agino_t next_agino) -{ - struct xfs_inode *ip; - - /* No update necessary if we are at the end of the list. */ - if (next_agino == NULLAGINO) - return 0; - - ip = xfs_iunlink_lookup(pag, next_agino); - if (!ip) { - xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI); - return -EFSCORRUPTED; - } - - ip->i_prev_unlinked = prev_agino; - return 0; -} - -/* - * Point the AGI unlinked bucket at an inode and log the results. The caller - * is responsible for validating the old value. - */ -STATIC int -xfs_iunlink_update_bucket( - struct xfs_trans *tp, - struct xfs_perag *pag, - struct xfs_buf *agibp, - unsigned int bucket_index, - xfs_agino_t new_agino) -{ - struct xfs_agi *agi = agibp->b_addr; - xfs_agino_t old_value; - int offset; - - ASSERT(xfs_verify_agino_or_null(pag, new_agino)); - - old_value = be32_to_cpu(agi->agi_unlinked[bucket_index]); - trace_xfs_iunlink_update_bucket(tp->t_mountp, pag->pag_agno, bucket_index, - old_value, new_agino); - - /* - * We should never find the head of the list already set to the value - * passed in because either we're adding or removing ourselves from the - * head of the list. 
- */ - if (old_value == new_agino) { - xfs_buf_mark_corrupt(agibp); - xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI); - return -EFSCORRUPTED; - } - - agi->agi_unlinked[bucket_index] = cpu_to_be32(new_agino); - offset = offsetof(struct xfs_agi, agi_unlinked) + - (sizeof(xfs_agino_t) * bucket_index); - xfs_trans_log_buf(tp, agibp, offset, offset + sizeof(xfs_agino_t) - 1); - return 0; -} - -static int -xfs_iunlink_insert_inode( - struct xfs_trans *tp, - struct xfs_perag *pag, - struct xfs_buf *agibp, - struct xfs_inode *ip) -{ - struct xfs_mount *mp = tp->t_mountp; - struct xfs_agi *agi = agibp->b_addr; - xfs_agino_t next_agino; - xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino); - short bucket_index = agino % XFS_AGI_UNLINKED_BUCKETS; - int error; - - /* - * Get the index into the agi hash table for the list this inode will - * go on. Make sure the pointer isn't garbage and that this inode - * isn't already on the list. - */ - next_agino = be32_to_cpu(agi->agi_unlinked[bucket_index]); - if (next_agino == agino || - !xfs_verify_agino_or_null(pag, next_agino)) { - xfs_buf_mark_corrupt(agibp); - xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI); - return -EFSCORRUPTED; - } - - /* - * Update the prev pointer in the next inode to point back to this - * inode. - */ - error = xfs_iunlink_update_backref(pag, agino, next_agino); - if (error) - return error; - - if (next_agino != NULLAGINO) { - /* - * There is already another inode in the bucket, so point this - * inode to the current head of the list. - */ - error = xfs_iunlink_log_inode(tp, ip, pag, next_agino); - if (error) - return error; - ip->i_next_unlinked = next_agino; - } - - /* Point the head of the list to point to this inode. */ - ip->i_prev_unlinked = NULLAGINO; - return xfs_iunlink_update_bucket(tp, pag, agibp, bucket_index, agino); -} - -/* - * This is called when the inode's link count has gone to 0 or we are creating - * a tmpfile via O_TMPFILE. The inode @ip must have nlink == 0. 
- * - * We place the on-disk inode on a list in the AGI. It will be pulled from this - * list when the inode is freed. - */ -int -xfs_iunlink( - struct xfs_trans *tp, - struct xfs_inode *ip) -{ - struct xfs_mount *mp = tp->t_mountp; - struct xfs_perag *pag; - struct xfs_buf *agibp; - int error; - - ASSERT(VFS_I(ip)->i_nlink == 0); - ASSERT(VFS_I(ip)->i_mode != 0); - trace_xfs_iunlink(ip); - - pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino)); - - /* Get the agi buffer first. It ensures lock ordering on the list. */ - error = xfs_read_agi(pag, tp, &agibp); - if (error) - goto out; - - error = xfs_iunlink_insert_inode(tp, pag, agibp, ip); -out: - xfs_perag_put(pag); - return error; -} - -static int -xfs_iunlink_remove_inode( - struct xfs_trans *tp, - struct xfs_perag *pag, - struct xfs_buf *agibp, - struct xfs_inode *ip) -{ - struct xfs_mount *mp = tp->t_mountp; - struct xfs_agi *agi = agibp->b_addr; - xfs_agino_t agino = XFS_INO_TO_AGINO(mp, ip->i_ino); - xfs_agino_t head_agino; - short bucket_index = agino % XFS_AGI_UNLINKED_BUCKETS; - int error; - - trace_xfs_iunlink_remove(ip); - - /* - * Get the index into the agi hash table for the list this inode will - * go on. Make sure the head pointer isn't garbage. - */ - head_agino = be32_to_cpu(agi->agi_unlinked[bucket_index]); - if (!xfs_verify_agino(pag, head_agino)) { - XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp, - agi, sizeof(*agi)); - xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI); - return -EFSCORRUPTED; - } - - /* - * Set our inode's next_unlinked pointer to NULL and then return - * the old pointer value so that we can update whatever was previous - * to us in the list to point to whatever was next in the list. - */ - error = xfs_iunlink_log_inode(tp, ip, pag, NULLAGINO); - if (error) - return error; - - /* - * Update the prev pointer in the next inode to point back to previous - * inode in the chain. 
-	 */
-	error = xfs_iunlink_update_backref(pag, ip->i_prev_unlinked,
-			ip->i_next_unlinked);
-	if (error)
-		return error;
-
-	if (head_agino != agino) {
-		struct xfs_inode	*prev_ip;
-
-		prev_ip = xfs_iunlink_lookup(pag, ip->i_prev_unlinked);
-		if (!prev_ip) {
-			xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE);
-			return -EFSCORRUPTED;
-		}
-
-		error = xfs_iunlink_log_inode(tp, prev_ip, pag,
-				ip->i_next_unlinked);
-		prev_ip->i_next_unlinked = ip->i_next_unlinked;
-	} else {
-		/* Point the head of the list to the next unlinked inode. */
-		error = xfs_iunlink_update_bucket(tp, pag, agibp, bucket_index,
-				ip->i_next_unlinked);
-	}
-
-	ip->i_next_unlinked = NULLAGINO;
-	ip->i_prev_unlinked = 0;
-	return error;
-}
-
-/*
- * Pull the on-disk inode from the AGI unlinked list.
- */
-STATIC int
-xfs_iunlink_remove(
-	struct xfs_trans	*tp,
-	struct xfs_perag	*pag,
-	struct xfs_inode	*ip)
-{
-	struct xfs_buf		*agibp;
-	int			error;
-
-	trace_xfs_iunlink_remove(ip);
-
-	/* Get the agi buffer first. It ensures lock ordering on the list. */
-	error = xfs_read_agi(pag, tp, &agibp);
-	if (error)
-		return error;
-
-	return xfs_iunlink_remove_inode(tp, pag, agibp, ip);
-}
-
 /*
  * Look up the inode number specified and if it is not already marked XFS_ISTALE
  * mark it stale. We should only find clean inodes in this lookup that aren't
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index b99c62f14919..d07763312a27 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -574,7 +574,6 @@ extern struct kmem_cache	*xfs_inode_cache;
 
 bool xfs_inode_needs_inactive(struct xfs_inode *ip);
 
-int xfs_iunlink(struct xfs_trans *tp, struct xfs_inode *ip);
 struct xfs_inode *xfs_iunlink_lookup(struct xfs_perag *pag, xfs_agino_t agino);
 
 void xfs_end_io(struct work_struct *work);
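The "In-Core Unlinked List Lookups" comment moved by this patch is easier to follow with a toy model in hand, so here is a minimal userspace sketch of the idea: per-bucket list heads, a next pointer per inode (the part that is persisted on disk in XFS), and an in-memory prev pointer so that removal does not have to walk the chain. The ul_ names, the fixed-size inode table standing in for the inode cache lookup, and the use of a null value for both pointers after removal are assumptions of this sketch (the kernel code above resets i_prev_unlinked to 0 instead). There is no logging, locking, or corruption reporting here.

```c
#include <assert.h>
#include <stdint.h>

#define UL_NULLAGINO	UINT32_MAX	/* plays the role of NULLAGINO */
#define UL_BUCKETS	64		/* the comment cites 64 lists per AGI */
#define UL_MAX_INODES	1024		/* arbitrary size for this sketch */

/* Hypothetical per-inode state: two 4-byte aginos, 8 bytes total. */
struct ul_inode {
	uint32_t	next_unlinked;	/* on disk in the real filesystem */
	uint32_t	prev_unlinked;	/* in-memory back pointer only */
};

/* Hypothetical AG: bucket heads plus a table indexed by agino. */
struct ul_ag {
	uint32_t	bucket[UL_BUCKETS];
	struct ul_inode	inodes[UL_MAX_INODES];
};

static void
ul_ag_init(struct ul_ag *ag)
{
	for (uint32_t i = 0; i < UL_BUCKETS; i++)
		ag->bucket[i] = UL_NULLAGINO;
	for (uint32_t i = 0; i < UL_MAX_INODES; i++) {
		ag->inodes[i].next_unlinked = UL_NULLAGINO;
		ag->inodes[i].prev_unlinked = UL_NULLAGINO;
	}
}

/* Put @agino at the head of its bucket; returns 0, or -1 on bad input. */
static int
ul_iunlink(struct ul_ag *ag, uint32_t agino)
{
	uint32_t	b = agino % UL_BUCKETS;
	uint32_t	head;

	if (agino >= UL_MAX_INODES)
		return -1;
	head = ag->bucket[b];
	if (head == agino)		/* already at the head of the list */
		return -1;

	/* The old head (if any) now points back at us. */
	if (head != UL_NULLAGINO)
		ag->inodes[head].prev_unlinked = agino;
	ag->inodes[agino].next_unlinked = head;
	ag->inodes[agino].prev_unlinked = UL_NULLAGINO;
	ag->bucket[b] = agino;
	return 0;
}

/* Unhook @agino without walking the chain; returns 0, or -1 if not listed. */
static int
ul_iunlink_remove(struct ul_ag *ag, uint32_t agino)
{
	struct ul_inode	*ip;
	uint32_t	b = agino % UL_BUCKETS;
	int		is_head;

	if (agino >= UL_MAX_INODES || ag->bucket[b] == UL_NULLAGINO)
		return -1;
	ip = &ag->inodes[agino];
	is_head = (ag->bucket[b] == agino);
	if (!is_head && ip->prev_unlinked == UL_NULLAGINO)
		return -1;

	/* Fix the back pointer of whatever follows us. */
	if (ip->next_unlinked != UL_NULLAGINO)
		ag->inodes[ip->next_unlinked].prev_unlinked = ip->prev_unlinked;

	/* Fix the forward pointer of the bucket or of our predecessor. */
	if (is_head)
		ag->bucket[b] = ip->next_unlinked;
	else
		ag->inodes[ip->prev_unlinked].next_unlinked = ip->next_unlinked;

	ip->next_unlinked = UL_NULLAGINO;
	ip->prev_unlinked = UL_NULLAGINO;
	return 0;
}
```

Inserting aginos 5, 69 and 133 (all of which hash to bucket 5) builds the chain 133 -> 69 -> 5, and removing 69 from the middle touches only its two neighbours. That constant-time removal is the reason the comment gives for keeping the back pointer in memory.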
* [PATCH 16/20] xfs: hoist inode free function to libxfs 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (11 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 12/20] xfs: hoist xfs_iunlink " Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 15/20] xfs: create libxfs helper to link an existing inode into a directory Darrick J. Wong ` (6 subsequent siblings) 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a libxfs helper function that marks an inode free on disk. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_inode_util.c | 51 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_inode_util.h | 5 ++++ fs/xfs/xfs_inode.c | 35 +-------------------------- 3 files changed, 57 insertions(+), 34 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c index e32b3152c3df..1135bec1328b 100644 --- a/fs/xfs/libxfs/xfs_inode_util.c +++ b/fs/xfs/libxfs/xfs_inode_util.c @@ -22,6 +22,7 @@ #include "xfs_trace.h" #include "xfs_ag.h" #include "xfs_iunlink_item.h" +#include "xfs_inode_item.h" uint16_t xfs_flags2diflags( @@ -645,3 +646,53 @@ xfs_bumplink( inc_nlink(VFS_I(ip)); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); } + +/* Mark an inode free on disk. */ +int +xfs_dir_ifree( + struct xfs_trans *tp, + struct xfs_perag *pag, + struct xfs_inode *ip, + struct xfs_icluster *xic) +{ + int error; + + /* + * Free the inode first so that we guarantee that the AGI lock is going + * to be taken before we remove the inode from the unlinked list. This + * makes the AGI lock -> unlinked list modification order the same as + * used in O_TMPFILE creation. 
+ */ + error = xfs_difree(tp, pag, ip->i_ino, xic); + if (error) + return error; + + error = xfs_iunlink_remove(tp, pag, ip); + if (error) + return error; + + /* + * Free any local-format data sitting around before we reset the + * data fork to extents format. Note that the attr fork data has + * already been freed by xfs_attr_inactive. + */ + if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL) { + kmem_free(ip->i_df.if_u1.if_data); + ip->i_df.if_u1.if_data = NULL; + ip->i_df.if_bytes = 0; + } + + VFS_I(ip)->i_mode = 0; /* mark incore inode as free */ + ip->i_diflags = 0; + ip->i_diflags2 = ip->i_mount->m_ino_geo.new_diflags2; + ip->i_forkoff = 0; /* mark the attr fork not in use */ + ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS; + + /* + * Bump the generation count so no one will be confused + * by reincarnations of this inode. + */ + VFS_I(ip)->i_generation++; + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + return 0; +} diff --git a/fs/xfs/libxfs/xfs_inode_util.h b/fs/xfs/libxfs/xfs_inode_util.h index f92b14a6fbe8..fcddaa6f738c 100644 --- a/fs/xfs/libxfs/xfs_inode_util.h +++ b/fs/xfs/libxfs/xfs_inode_util.h @@ -6,6 +6,8 @@ #ifndef __XFS_INODE_UTIL_H__ #define __XFS_INODE_UTIL_H__ +struct xfs_icluster; + uint16_t xfs_flags2diflags(struct xfs_inode *ip, unsigned int xflags); uint64_t xfs_flags2diflags2(struct xfs_inode *ip, unsigned int xflags); uint32_t xfs_dic2xflags(struct xfs_inode *ip); @@ -56,6 +58,9 @@ void xfs_trans_ichgtime(struct xfs_trans *tp, struct xfs_inode *ip, int flags); void xfs_inode_init(struct xfs_trans *tp, const struct xfs_icreate_args *args, struct xfs_inode *ip); +int xfs_dir_ifree(struct xfs_trans *tp, struct xfs_perag *pag, + struct xfs_inode *ip, struct xfs_icluster *xic); + int xfs_iunlink(struct xfs_trans *tp, struct xfs_inode *ip); int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag, struct xfs_inode *ip); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index e2563401b27d..8bd9d47bf6fa 100644 --- a/fs/xfs/xfs_inode.c 
+++ b/fs/xfs/xfs_inode.c @@ -1886,36 +1886,10 @@ xfs_ifree( pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino)); - /* - * Free the inode first so that we guarantee that the AGI lock is going - * to be taken before we remove the inode from the unlinked list. This - * makes the AGI lock -> unlinked list modification order the same as - * used in O_TMPFILE creation. - */ - error = xfs_difree(tp, pag, ip->i_ino, &xic); + error = xfs_dir_ifree(tp, pag, ip, &xic); if (error) goto out; - error = xfs_iunlink_remove(tp, pag, ip); - if (error) - goto out; - - /* - * Free any local-format data sitting around before we reset the - * data fork to extents format. Note that the attr fork data has - * already been freed by xfs_attr_inactive. - */ - if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL) { - kmem_free(ip->i_df.if_u1.if_data); - ip->i_df.if_u1.if_data = NULL; - ip->i_df.if_bytes = 0; - } - - VFS_I(ip)->i_mode = 0; /* mark incore inode as free */ - ip->i_diflags = 0; - ip->i_diflags2 = mp->m_ino_geo.new_diflags2; - ip->i_forkoff = 0; /* mark the attr fork not in use */ - ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS; if (xfs_iflags_test(ip, XFS_IPRESERVE_DM_FIELDS)) xfs_iflags_clear(ip, XFS_IPRESERVE_DM_FIELDS); @@ -1924,13 +1898,6 @@ xfs_ifree( iip->ili_fields &= ~(XFS_ILOG_AOWNER | XFS_ILOG_DOWNER); spin_unlock(&iip->ili_lock); - /* - * Bump the generation count so no one will be confused - * by reincarnations of this inode. - */ - VFS_I(ip)->i_generation++; - xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); - if (xic.deleted) error = xfs_ifree_cluster(tp, pag, ip, &xic); out:
* [PATCH 15/20] xfs: create libxfs helper to link an existing inode into a directory 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (12 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 16/20] xfs: hoist inode free function " Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 18/20] xfs: create libxfs helper to exchange two directory entries Darrick J. Wong ` (5 subsequent siblings) 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a new libxfs function to link an existing inode into a directory. The upcoming metadata directory feature will need this to create a metadata directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_dir2.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_dir2.h | 3 +++ fs/xfs/xfs_inode.c | 26 +---------------------- 3 files changed, 55 insertions(+), 25 deletions(-) diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c index bca493a2da8d..e14464712eff 100644 --- a/fs/xfs/libxfs/xfs_dir2.c +++ b/fs/xfs/libxfs/xfs_dir2.c @@ -21,6 +21,7 @@ #include "xfs_health.h" #include "xfs_bmap_btree.h" #include "xfs_trans_space.h" +#include "xfs_ag.h" const struct xfs_name xfs_name_dotdot = { .name = (const unsigned char *)"..", @@ -825,3 +826,53 @@ xfs_dir_create_new_child( xfs_bumplink(tp, dp); return 0; } + +/* + * Given a directory @dp, an existing non-directory inode @ip, and a @name, + * link @ip into @dp under the given @name. Both inodes must have the ILOCK + * held. 
+ */ +int +xfs_dir_link_existing_child( + struct xfs_trans *tp, + uint resblks, + struct xfs_inode *dp, + struct xfs_name *name, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = tp->t_mountp; + int error; + + ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); + ASSERT(xfs_isilocked(dp, XFS_ILOCK_EXCL)); + ASSERT(!S_ISDIR(VFS_I(ip)->i_mode)); + + if (!resblks) { + error = xfs_dir_canenter(tp, dp, name); + if (error) + return error; + } + + /* + * Handle initial link state of O_TMPFILE inode + */ + if (VFS_I(ip)->i_nlink == 0) { + struct xfs_perag *pag; + + pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino)); + error = xfs_iunlink_remove(tp, pag, ip); + xfs_perag_put(pag); + if (error) + return error; + } + + error = xfs_dir_createname(tp, dp, name, ip->i_ino, resblks); + if (error) + return error; + + xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE); + + xfs_bumplink(tp, ip); + return 0; +} diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h index d3e7607c0e9d..4afade8b0877 100644 --- a/fs/xfs/libxfs/xfs_dir2.h +++ b/fs/xfs/libxfs/xfs_dir2.h @@ -256,5 +256,8 @@ bool xfs_dir2_namecheck(const void *name, size_t length); int xfs_dir_create_new_child(struct xfs_trans *tp, uint resblks, struct xfs_inode *dp, struct xfs_name *name, struct xfs_inode *ip); +int xfs_dir_link_existing_child(struct xfs_trans *tp, uint resblks, + struct xfs_inode *dp, struct xfs_name *name, + struct xfs_inode *ip); #endif /* __XFS_DIR2_H__ */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index b66a9cf66055..e2563401b27d 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1062,33 +1062,9 @@ xfs_link( goto error_return; } - if (!resblks) { - error = xfs_dir_canenter(tp, tdp, target_name); - if (error) - goto error_return; - } - - /* - * Handle initial link state of O_TMPFILE inode - */ - if (VFS_I(sip)->i_nlink == 0) { - struct xfs_perag *pag; - - pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, sip->i_ino)); - 
error = xfs_iunlink_remove(tp, pag, sip); - xfs_perag_put(pag); - if (error) - goto error_return; - } - - error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino, - resblks); + error = xfs_dir_link_existing_child(tp, resblks, tdp, target_name, sip); if (error) goto error_return; - xfs_trans_ichgtime(tp, tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); - xfs_trans_log_inode(tp, tdp, XFS_ILOG_CORE); - - xfs_bumplink(tp, sip); xfs_nlink_dirent_delta(tdp, sip, 1, target_name); /*
* [PATCH 18/20] xfs: create libxfs helper to exchange two directory entries 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (13 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 15/20] xfs: create libxfs helper to link an existing inode into a directory Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 14/20] xfs: create libxfs helper to link a new inode into a directory Darrick J. Wong ` (4 subsequent siblings) 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a new libxfs function to exchange two directory entries. The upcoming metadata directory feature will need this to replace a metadata inode directory entry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_dir2.c | 108 ++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_dir2.h | 4 ++ fs/xfs/xfs_inode.c | 86 +------------------------------------ 3 files changed, 115 insertions(+), 83 deletions(-) diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c index 2923cf568e9d..6d7851a613e7 100644 --- a/fs/xfs/libxfs/xfs_dir2.c +++ b/fs/xfs/libxfs/xfs_dir2.c @@ -949,3 +949,111 @@ xfs_dir_remove_child( return 0; } + +/* + * Exchange the entry (@name1, @ip1) in directory @dp1 with the entry (@name2, + * @ip2) in directory @dp2, and update '..' @ip1 and @ip2's entries as needed. + * @ip1 and @ip2 need not be of the same type. + * + * All inodes must have the ILOCK held, and both entries must already exist. 
+ */ +int +xfs_dir_exchange( + struct xfs_trans *tp, + struct xfs_inode *dp1, + struct xfs_name *name1, + struct xfs_inode *ip1, + struct xfs_inode *dp2, + struct xfs_name *name2, + struct xfs_inode *ip2, + unsigned int spaceres) +{ + int ip1_flags = 0; + int ip2_flags = 0; + int dp2_flags = 0; + int error; + + /* Swap inode number for dirent in first parent */ + error = xfs_dir_replace(tp, dp1, name1, ip2->i_ino, spaceres); + if (error) + return error; + + /* Swap inode number for dirent in second parent */ + error = xfs_dir_replace(tp, dp2, name2, ip1->i_ino, spaceres); + if (error) + return error; + + /* + * If we're renaming one or more directories across different parents, + * update the respective ".." entries (and link counts) to match the new + * parents. + */ + if (dp1 != dp2) { + dp2_flags = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG; + + if (S_ISDIR(VFS_I(ip2)->i_mode)) { + error = xfs_dir_replace(tp, ip2, &xfs_name_dotdot, + dp1->i_ino, spaceres); + if (error) + return error; + + /* transfer ip2 ".." reference to dp1 */ + if (!S_ISDIR(VFS_I(ip1)->i_mode)) { + error = xfs_droplink(tp, dp2); + if (error) + return error; + xfs_bumplink(tp, dp1); + } + + /* + * Although ip1 isn't changed here, userspace needs + * to be warned about the change, so that applications + * relying on it (like backup ones), will properly + * notify the change + */ + ip1_flags |= XFS_ICHGTIME_CHG; + ip2_flags |= XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG; + } + + if (S_ISDIR(VFS_I(ip1)->i_mode)) { + error = xfs_dir_replace(tp, ip1, &xfs_name_dotdot, + dp2->i_ino, spaceres); + if (error) + return error; + + /* transfer ip1 ".." 
reference to dp2 */ + if (!S_ISDIR(VFS_I(ip2)->i_mode)) { + error = xfs_droplink(tp, dp1); + if (error) + return error; + xfs_bumplink(tp, dp2); + } + + /* + * Although ip2 isn't changed here, userspace needs + * to be warned about the change, so that applications + * relying on it (like backup ones), will properly + * notify the change + */ + ip1_flags |= XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG; + ip2_flags |= XFS_ICHGTIME_CHG; + } + } + + if (ip1_flags) { + xfs_trans_ichgtime(tp, ip1, ip1_flags); + xfs_trans_log_inode(tp, ip1, XFS_ILOG_CORE); + } + if (ip2_flags) { + xfs_trans_ichgtime(tp, ip2, ip2_flags); + xfs_trans_log_inode(tp, ip2, XFS_ILOG_CORE); + } + if (dp2_flags) { + xfs_trans_ichgtime(tp, dp2, dp2_flags); + xfs_trans_log_inode(tp, dp2, XFS_ILOG_CORE); + } + xfs_trans_ichgtime(tp, dp1, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + xfs_trans_log_inode(tp, dp1, XFS_ILOG_CORE); + + return 0; +} diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h index e35deb273d84..f63390236f09 100644 --- a/fs/xfs/libxfs/xfs_dir2.h +++ b/fs/xfs/libxfs/xfs_dir2.h @@ -262,5 +262,9 @@ int xfs_dir_link_existing_child(struct xfs_trans *tp, uint resblks, int xfs_dir_remove_child(struct xfs_trans *tp, uint resblks, struct xfs_inode *dp, struct xfs_name *name, struct xfs_inode *ip); +int xfs_dir_exchange(struct xfs_trans *tp, struct xfs_inode *dp1, + struct xfs_name *name1, struct xfs_inode *ip1, + struct xfs_inode *dp2, struct xfs_name *name2, + struct xfs_inode *ip2, unsigned int spaceres); #endif /* __XFS_DIR2_H__ */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index f2a5de0119b3..591721755b78 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2207,93 +2207,13 @@ xfs_cross_rename( struct xfs_inode *ip2, int spaceres) { - int error = 0; - int ip1_flags = 0; - int ip2_flags = 0; - int dp2_flags = 0; + int error; - /* Swap inode number for dirent in first parent */ - error = xfs_dir_replace(tp, dp1, name1, ip2->i_ino, spaceres); + error = xfs_dir_exchange(tp, 
dp1, name1, ip1, dp2, name2, ip2, + spaceres); if (error) goto out_trans_abort; - /* Swap inode number for dirent in second parent */ - error = xfs_dir_replace(tp, dp2, name2, ip1->i_ino, spaceres); - if (error) - goto out_trans_abort; - - /* - * If we're renaming one or more directories across different parents, - * update the respective ".." entries (and link counts) to match the new - * parents. - */ - if (dp1 != dp2) { - dp2_flags = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG; - - if (S_ISDIR(VFS_I(ip2)->i_mode)) { - error = xfs_dir_replace(tp, ip2, &xfs_name_dotdot, - dp1->i_ino, spaceres); - if (error) - goto out_trans_abort; - - /* transfer ip2 ".." reference to dp1 */ - if (!S_ISDIR(VFS_I(ip1)->i_mode)) { - error = xfs_droplink(tp, dp2); - if (error) - goto out_trans_abort; - xfs_bumplink(tp, dp1); - } - - /* - * Although ip1 isn't changed here, userspace needs - * to be warned about the change, so that applications - * relying on it (like backup ones), will properly - * notify the change - */ - ip1_flags |= XFS_ICHGTIME_CHG; - ip2_flags |= XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG; - } - - if (S_ISDIR(VFS_I(ip1)->i_mode)) { - error = xfs_dir_replace(tp, ip1, &xfs_name_dotdot, - dp2->i_ino, spaceres); - if (error) - goto out_trans_abort; - - /* transfer ip1 ".." 
reference to dp2 */ - if (!S_ISDIR(VFS_I(ip2)->i_mode)) { - error = xfs_droplink(tp, dp1); - if (error) - goto out_trans_abort; - xfs_bumplink(tp, dp2); - } - - /* - * Although ip2 isn't changed here, userspace needs - * to be warned about the change, so that applications - * relying on it (like backup ones), will properly - * notify the change - */ - ip1_flags |= XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG; - ip2_flags |= XFS_ICHGTIME_CHG; - } - } - - if (ip1_flags) { - xfs_trans_ichgtime(tp, ip1, ip1_flags); - xfs_trans_log_inode(tp, ip1, XFS_ILOG_CORE); - } - if (ip2_flags) { - xfs_trans_ichgtime(tp, ip2, ip2_flags); - xfs_trans_log_inode(tp, ip2, XFS_ILOG_CORE); - } - if (dp2_flags) { - xfs_trans_ichgtime(tp, dp2, dp2_flags); - xfs_trans_log_inode(tp, dp2, XFS_ILOG_CORE); - } - xfs_trans_ichgtime(tp, dp1, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); - xfs_trans_log_inode(tp, dp1, XFS_ILOG_CORE); - if (xfs_hooks_switched_on(&xfs_nlinks_hooks_switch)) xfs_rename_call_nlink_hooks(dp1, name1, ip1, dp2, name2, ip2, NULL, RENAME_EXCHANGE);
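The link-count handling in xfs_dir_exchange is easy to misread, so here is a small userspace model of just that part. It is only a sketch: struct toy_inode and toy_dir_exchange are invented names, the dirent swaps and timestamp updates are left out, and the only rule carried over from the patch is that a parent's link count moves when exactly one of the two exchanged children is a directory and the parents differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode; only the fields the model needs. */
struct toy_inode {
	unsigned int		nlink;
	bool			is_dir;
	struct toy_inode	*dotdot;	/* parent, meaningful for dirs only */
};

/*
 * Model the ".." and link-count side effects of exchanging (dp1, ip1)
 * with (dp2, ip2).  Nothing changes when both entries share a parent.
 */
static void toy_dir_exchange(struct toy_inode *dp1, struct toy_inode *ip1,
			     struct toy_inode *dp2, struct toy_inode *ip2)
{
	if (dp1 == dp2)
		return;

	if (ip2->is_dir) {
		/* ip2 moves under dp1, so its ".." must follow. */
		ip2->dotdot = dp1;
		/* Transfer the ".." reference unless ip1 replaces it. */
		if (!ip1->is_dir) {
			dp2->nlink--;
			dp1->nlink++;
		}
	}

	if (ip1->is_dir) {
		/* ip1 moves under dp2, so its ".." must follow. */
		ip1->dotdot = dp2;
		if (!ip2->is_dir) {
			dp1->nlink--;
			dp2->nlink++;
		}
	}
}
```

When both children are directories, each parent loses one ".." reference and gains another, so neither count changes. That is why both transfers are guarded by a check on the other child's type.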
* [PATCH 14/20] xfs: create libxfs helper to link a new inode into a directory 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (14 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 18/20] xfs: create libxfs helper to exchange two directory entries Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 13/20] xfs: hoist xfs_{bump,drop}link to libxfs Darrick J. Wong ` (3 subsequent siblings) 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a new libxfs function to link a newly created inode into a directory. The upcoming metadata directory feature will need this to create a metadata directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_dir2.c | 44 ++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_dir2.h | 4 ++++ fs/xfs/xfs_inode.c | 17 ++--------------- 3 files changed, 50 insertions(+), 15 deletions(-) diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c index fb0697dc733f..bca493a2da8d 100644 --- a/fs/xfs/libxfs/xfs_dir2.c +++ b/fs/xfs/libxfs/xfs_dir2.c @@ -19,6 +19,8 @@ #include "xfs_error.h" #include "xfs_trace.h" #include "xfs_health.h" +#include "xfs_bmap_btree.h" +#include "xfs_trans_space.h" const struct xfs_name xfs_name_dotdot = { .name = (const unsigned char *)"..", @@ -781,3 +783,45 @@ xfs_dir2_compname( return xfs_ascii_ci_compname(args, name, len); return xfs_da_compname(args, name, len); } + +/* + * Given a directory @dp, a newly allocated inode @ip, and a @name, link @ip + * into @dp under the given @name. If @ip is a directory, it will be + * initialized. Both inodes must have the ILOCK held and the transaction must + * have sufficient blocks reserved. 
+ */ +int +xfs_dir_create_new_child( + struct xfs_trans *tp, + uint resblks, + struct xfs_inode *dp, + struct xfs_name *name, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = tp->t_mountp; + int error; + + ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); + ASSERT(xfs_isilocked(dp, XFS_ILOCK_EXCL)); + ASSERT(resblks == 0 || resblks > XFS_IALLOC_SPACE_RES(mp)); + + error = xfs_dir_createname(tp, dp, name, ip->i_ino, + resblks - XFS_IALLOC_SPACE_RES(mp)); + if (error) { + ASSERT(error != -ENOSPC); + return error; + } + + xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE); + + if (!S_ISDIR(VFS_I(ip)->i_mode)) + return 0; + + error = xfs_dir_init(tp, ip, dp); + if (error) + return error; + + xfs_bumplink(tp, dp); + return 0; +} diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h index 7322284f61a0..d3e7607c0e9d 100644 --- a/fs/xfs/libxfs/xfs_dir2.h +++ b/fs/xfs/libxfs/xfs_dir2.h @@ -253,4 +253,8 @@ unsigned int xfs_dir3_data_end_offset(struct xfs_da_geometry *geo, struct xfs_dir2_data_hdr *hdr); bool xfs_dir2_namecheck(const void *name, size_t length); +int xfs_dir_create_new_child(struct xfs_trans *tp, uint resblks, + struct xfs_inode *dp, struct xfs_name *name, + struct xfs_inode *ip); + #endif /* __XFS_DIR2_H__ */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index f9599aa49ab4..b66a9cf66055 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -868,22 +868,9 @@ xfs_create( xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL); unlock_dp_on_error = false; - error = xfs_dir_createname(tp, dp, name, ip->i_ino, - resblks - XFS_IALLOC_SPACE_RES(mp)); - if (error) { - ASSERT(error != -ENOSPC); + error = xfs_dir_create_new_child(tp, resblks, dp, name, ip); + if (error) goto out_trans_cancel; - } - xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); - xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE); - - if (is_dir) { - error = xfs_dir_init(tp, ip, dp); - if (error) - goto out_trans_cancel; - - 
xfs_bumplink(tp, dp); - } /* * Create ip with a reference from dp, and add '.' and '..' references
* [PATCH 13/20] xfs: hoist xfs_{bump,drop}link to libxfs 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (15 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 14/20] xfs: create libxfs helper to link a new inode into a directory Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 17/20] xfs: create libxfs helper to remove an existing inode/name from a directory Darrick J. Wong ` (2 subsequent siblings) 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Move xfs_bumplink and xfs_droplink to libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_inode_util.c | 35 +++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_inode_util.h | 2 ++ fs/xfs/xfs_inode.c | 35 ----------------------------------- fs/xfs/xfs_inode.h | 1 - 4 files changed, 37 insertions(+), 36 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c index 0c3ac4b07cc5..e32b3152c3df 100644 --- a/fs/xfs/libxfs/xfs_inode_util.c +++ b/fs/xfs/libxfs/xfs_inode_util.c @@ -610,3 +610,38 @@ xfs_iunlink_remove( return xfs_iunlink_remove_inode(tp, pag, agibp, ip); } + +/* + * Decrement the link count on an inode & log the change. If this causes the + * link count to go to zero, move the inode to AGI unlinked list so that it can + * be freed when the last active reference goes away via xfs_inactive(). + */ +int +xfs_droplink( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG); + + drop_nlink(VFS_I(ip)); + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + + if (VFS_I(ip)->i_nlink) + return 0; + + return xfs_iunlink(tp, ip); +} + +/* + * Increment the link count on an inode & log the change. 
+ */ +void +xfs_bumplink( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG); + + inc_nlink(VFS_I(ip)); + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); +} diff --git a/fs/xfs/libxfs/xfs_inode_util.h b/fs/xfs/libxfs/xfs_inode_util.h index e15cf94e0943..f92b14a6fbe8 100644 --- a/fs/xfs/libxfs/xfs_inode_util.h +++ b/fs/xfs/libxfs/xfs_inode_util.h @@ -59,6 +59,8 @@ void xfs_inode_init(struct xfs_trans *tp, const struct xfs_icreate_args *args, int xfs_iunlink(struct xfs_trans *tp, struct xfs_inode *ip); int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag, struct xfs_inode *ip); +int xfs_droplink(struct xfs_trans *tp, struct xfs_inode *ip); +void xfs_bumplink(struct xfs_trans *tp, struct xfs_inode *ip); /* The libxfs client must provide this group of helper functions. */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index f69423504216..f9599aa49ab4 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -657,41 +657,6 @@ xfs_icreate_args_rootfile( XFS_ICREATE_ARGS_FORCE_MODE; } -/* - * Decrement the link count on an inode & log the change. If this causes the - * link count to go to zero, move the inode to AGI unlinked list so that it can - * be freed when the last active reference goes away via xfs_inactive(). - */ -static int /* error */ -xfs_droplink( - xfs_trans_t *tp, - xfs_inode_t *ip) -{ - xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG); - - drop_nlink(VFS_I(ip)); - xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); - - if (VFS_I(ip)->i_nlink) - return 0; - - return xfs_iunlink(tp, ip); -} - -/* - * Increment the link count on an inode & log the change. - */ -void -xfs_bumplink( - struct xfs_trans *tp, - struct xfs_inode *ip) -{ - xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG); - - inc_nlink(VFS_I(ip)); - xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); -} - #ifdef CONFIG_XFS_LIVE_HOOKS /* * Use a static key here to reduce the overhead of link count live updates. 
If diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index d07763312a27..571f61930b7b 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -580,7 +580,6 @@ void xfs_end_io(struct work_struct *work); int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2); void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2); -void xfs_bumplink(struct xfs_trans *tp, struct xfs_inode *ip); void xfs_inode_count_blocks(struct xfs_trans *tp, struct xfs_inode *ip, xfs_filblks_t *dblocks, xfs_filblks_t *rblocks);
* [PATCH 17/20] xfs: create libxfs helper to remove an existing inode/name from a directory 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (16 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 13/20] xfs: hoist xfs_{bump,drop}link to libxfs Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 20/20] xfs: get rid of cross_rename Darrick J. Wong 2022-12-30 22:17 ` [PATCH 19/20] xfs: create libxfs helper to rename two directory entries Darrick J. Wong 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a new libxfs function to remove a (name, inode) entry from a directory. The upcoming metadata directory feature will need this to create a metadata directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_dir2.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_dir2.h | 3 ++ fs/xfs/xfs_inode.c | 55 +---------------------------------- 3 files changed, 77 insertions(+), 54 deletions(-) diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c index e14464712eff..2923cf568e9d 100644 --- a/fs/xfs/libxfs/xfs_dir2.c +++ b/fs/xfs/libxfs/xfs_dir2.c @@ -876,3 +876,76 @@ xfs_dir_link_existing_child( xfs_bumplink(tp, ip); return 0; } + +/* + * Given a directory @dp, a child @ip, and a @name, remove the (@name, @ip) + * entry from the directory. Both inodes must have the ILOCK held. + */ +int +xfs_dir_remove_child( + struct xfs_trans *tp, + uint resblks, + struct xfs_inode *dp, + struct xfs_name *name, + struct xfs_inode *ip) +{ + int error; + + ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); + ASSERT(xfs_isilocked(dp, XFS_ILOCK_EXCL)); + + /* + * If we're removing a directory perform some additional validation. 
+ */ + if (S_ISDIR(VFS_I(ip)->i_mode)) { + ASSERT(VFS_I(ip)->i_nlink >= 2); + if (VFS_I(ip)->i_nlink != 2) + return -ENOTEMPTY; + if (!xfs_dir_isempty(ip)) + return -ENOTEMPTY; + + /* Drop the link from ip's "..". */ + error = xfs_droplink(tp, dp); + if (error) + return error; + + /* Drop the "." link from ip to self. */ + error = xfs_droplink(tp, ip); + if (error) + return error; + + /* + * Point the unlinked child directory's ".." entry to the root + * directory to eliminate back-references to inodes that may + * get freed before the child directory is closed. If the fs + * gets shrunk, this can lead to dirent inode validation errors. + */ + if (dp->i_ino != tp->t_mountp->m_sb.sb_rootino) { + error = xfs_dir_replace(tp, ip, &xfs_name_dotdot, + tp->t_mountp->m_sb.sb_rootino, 0); + if (error) + return error; + } + } else { + /* + * When removing a non-directory we need to log the parent + * inode here. For a directory this is done implicitly + * by the xfs_droplink call for the ".." entry. + */ + xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE); + } + xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + + /* Drop the link from dp to ip. 
*/ + error = xfs_droplink(tp, ip); + if (error) + return error; + + error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks); + if (error) { + ASSERT(error != -ENOENT); + return error; + } + + return 0; +} diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h index 4afade8b0877..e35deb273d84 100644 --- a/fs/xfs/libxfs/xfs_dir2.h +++ b/fs/xfs/libxfs/xfs_dir2.h @@ -259,5 +259,8 @@ int xfs_dir_create_new_child(struct xfs_trans *tp, uint resblks, int xfs_dir_link_existing_child(struct xfs_trans *tp, uint resblks, struct xfs_inode *dp, struct xfs_name *name, struct xfs_inode *ip); +int xfs_dir_remove_child(struct xfs_trans *tp, uint resblks, + struct xfs_inode *dp, struct xfs_name *name, + struct xfs_inode *ip); #endif /* __XFS_DIR2_H__ */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 8bd9d47bf6fa..f2a5de0119b3 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2020,63 +2020,10 @@ xfs_remove( goto std_return; } - /* - * If we're removing a directory perform some additional validation. - */ - if (is_dir) { - ASSERT(VFS_I(ip)->i_nlink >= 2); - if (VFS_I(ip)->i_nlink != 2) { - error = -ENOTEMPTY; - goto out_trans_cancel; - } - if (!xfs_dir_isempty(ip)) { - error = -ENOTEMPTY; - goto out_trans_cancel; - } - - /* Drop the link from ip's "..". */ - error = xfs_droplink(tp, dp); - if (error) - goto out_trans_cancel; - - /* Drop the "." link from ip to self. */ - error = xfs_droplink(tp, ip); - if (error) - goto out_trans_cancel; - - /* - * Point the unlinked child directory's ".." entry to the root - * directory to eliminate back-references to inodes that may - * get freed before the child directory is closed. If the fs - * gets shrunk, this can lead to dirent inode validation errors. 
- */ - if (dp->i_ino != tp->t_mountp->m_sb.sb_rootino) { - error = xfs_dir_replace(tp, ip, &xfs_name_dotdot, - tp->t_mountp->m_sb.sb_rootino, 0); - if (error) - goto out_trans_cancel; - } - } else { - /* - * When removing a non-directory we need to log the parent - * inode here. For a directory this is done implicitly - * by the xfs_droplink call for the ".." entry. - */ - xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE); - } - xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); - - /* Drop the link from dp to ip. */ - error = xfs_droplink(tp, ip); + error = xfs_dir_remove_child(tp, resblks, dp, name, ip); if (error) goto out_trans_cancel; - error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks); - if (error) { - ASSERT(error != -ENOENT); - goto out_trans_cancel; - } - /* * Drop the link from dp to ip, and if ip was a directory, remove the * '.' and '..' references since we freed the directory.
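As a rough illustration of the link-count accounting in xfs_dir_remove_child, the sketch below models only the counters. The struct, the nr_entries field (entries other than "." and ".."), and the function name are all hypothetical, and the real helper additionally repoints "..", logs inodes, and moves zero-link inodes to the unlinked list. What it keeps from the patch is the validation (a directory must have exactly two links and be empty) and the order of the three drops.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; nr_entries excludes "." and "..". */
struct toy_inode {
	unsigned int	nlink;
	bool		is_dir;
	unsigned int	nr_entries;
};

/* Model the link-count effects of removing child ip from directory dp. */
static int toy_dir_remove_child(struct toy_inode *dp, struct toy_inode *ip)
{
	if (ip->is_dir) {
		/* Only an empty directory with no subdirectories may go. */
		if (ip->nlink != 2 || ip->nr_entries != 0)
			return -ENOTEMPTY;

		dp->nlink--;	/* drop the link from ip's ".." */
		ip->nlink--;	/* drop the "." link from ip to itself */
	}

	ip->nlink--;		/* drop the link from dp's dirent to ip */
	dp->nr_entries--;	/* the name itself leaves the directory */
	return 0;
}
```

A removed directory ends at zero links while its parent loses one, whereas removing one name of a hard-linked file leaves the parent's count alone.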
* [PATCH 20/20] xfs: get rid of cross_rename 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (17 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 17/20] xfs: create libxfs helper to remove an existing inode/name from a directory Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 19/20] xfs: create libxfs helper to rename two directory entries Darrick J. Wong 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Get rid of the largely pointless xfs_cross_rename now that we've refactored its parent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_inode.c | 72 +++++++++++----------------------------------------- 1 file changed, 15 insertions(+), 57 deletions(-) diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 4b9d680c5268..fdd5e5c89e62 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2177,54 +2177,6 @@ xfs_rename_call_nlink_hooks( ((void)0) #endif /* CONFIG_XFS_LIVE_HOOKS */ -static int -xfs_finish_rename( - struct xfs_trans *tp) -{ - /* - * If this is a synchronous mount, make sure that the rename transaction - * goes to disk before returning to the user. 
- */ - if (xfs_has_wsync(tp->t_mountp) || xfs_has_dirsync(tp->t_mountp)) - xfs_trans_set_sync(tp); - - return xfs_trans_commit(tp); -} - -/* - * xfs_cross_rename() - * - * responsible for handling RENAME_EXCHANGE flag in renameat2() syscall - */ -STATIC int -xfs_cross_rename( - struct xfs_trans *tp, - struct xfs_inode *dp1, - struct xfs_name *name1, - struct xfs_inode *ip1, - struct xfs_inode *dp2, - struct xfs_name *name2, - struct xfs_inode *ip2, - int spaceres) -{ - int error; - - error = xfs_dir_exchange(tp, dp1, name1, ip1, dp2, name2, ip2, - spaceres); - if (error) - goto out_trans_abort; - - if (xfs_hooks_switched_on(&xfs_nlinks_hooks_switch)) - xfs_rename_call_nlink_hooks(dp1, name1, ip1, dp2, name2, ip2, - NULL, RENAME_EXCHANGE); - - return xfs_finish_rename(tp); - -out_trans_abort: - xfs_trans_cancel(tp); - return error; -} - /* * xfs_rename_alloc_whiteout() * @@ -2377,12 +2329,6 @@ xfs_rename( goto out_trans_cancel; } - /* RENAME_EXCHANGE is unique from here on. */ - if (flags & RENAME_EXCHANGE) - return xfs_cross_rename(tp, src_dp, src_name, src_ip, - target_dp, target_name, target_ip, - spaceres); - /* * Try to reserve quota to handle an expansion of the target directory. 
* We'll allow the rename to continue in reservationless mode if we hit @@ -2434,8 +2380,13 @@ xfs_rename( } } - error = xfs_dir_rename(tp, src_dp, src_name, src_ip, target_dp, - target_name, target_ip, spaceres, wip); + if (flags & RENAME_EXCHANGE) + error = xfs_dir_exchange(tp, src_dp, src_name, src_ip, + target_dp, target_name, target_ip, spaceres); + else + error = xfs_dir_rename(tp, src_dp, src_name, src_ip, + target_dp, target_name, target_ip, spaceres, + wip); if (error) goto out_trans_cancel; @@ -2452,7 +2403,14 @@ xfs_rename( xfs_rename_call_nlink_hooks(src_dp, src_name, src_ip, target_dp, target_name, target_ip, wip, flags); - error = xfs_finish_rename(tp); + /* + * If this is a synchronous mount, make sure that the rename + * transaction goes to disk before returning to the user. + */ + if (xfs_has_wsync(tp->t_mountp) || xfs_has_dirsync(tp->t_mountp)) + xfs_trans_set_sync(tp); + + error = xfs_trans_commit(tp); if (wip) xfs_irele(wip); return error; ^ permalink raw reply related [flat|nested] 565+ messages in thread
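The effect of this patch on control flow is that RENAME_EXCHANGE no longer returns early through its own commit helper; both variants now share one commit path and one cancel path. A minimal sketch of that shape follows. Every toy_* name is hypothetical and only models the flow; none of it is the kernel's transaction API.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins; none of these are real XFS symbols. */
#define TOY_RENAME_EXCHANGE	(1u << 1)

enum toy_op { TOY_OP_NONE, TOY_OP_RENAME, TOY_OP_EXCHANGE };

struct toy_trans {
	bool		sync;		/* commit must reach disk before returning */
	bool		committed;
	bool		cancelled;
	enum toy_op	op;		/* which directory helper ran */
};

/* Pretend directory helpers: record the call, return the injected error. */
static int toy_dir_exchange(struct toy_trans *tp, int inject)
{
	tp->op = TOY_OP_EXCHANGE;
	return inject;
}

static int toy_dir_rename(struct toy_trans *tp, int inject)
{
	tp->op = TOY_OP_RENAME;
	return inject;
}

/* One dispatch point, one commit path, one cancel path. */
int toy_rename(struct toy_trans *tp, unsigned int flags, bool sync_mount,
	       int inject)
{
	int error;

	assert(tp);

	if (flags & TOY_RENAME_EXCHANGE)
		error = toy_dir_exchange(tp, inject);
	else
		error = toy_dir_rename(tp, inject);
	if (error)
		goto out_trans_cancel;

	/* Synchronous mounts want the rename on disk before we return. */
	if (sync_mount)
		tp->sync = true;

	tp->committed = true;
	return 0;

out_trans_cancel:
	tp->cancelled = true;
	return error;
}
```

Calling toy_rename() with TOY_RENAME_EXCHANGE set and a nonzero injected error leaves the toy transaction cancelled and never marked sync, which is the property the merged error path is meant to preserve.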
* [PATCH 19/20] xfs: create libxfs helper to rename two directory entries 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong ` (18 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 20/20] xfs: get rid of cross_rename Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 19 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a new libxfs function to rename two directory entries. The upcoming metadata directory feature will need this to replace a metadata inode directory entry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_dir2.c | 203 ++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_dir2.h | 5 + fs/xfs/xfs_inode.c | 177 ++-------------------------------------- 3 files changed, 218 insertions(+), 167 deletions(-) diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c index 6d7851a613e7..e8f4a84a97c8 100644 --- a/fs/xfs/libxfs/xfs_dir2.c +++ b/fs/xfs/libxfs/xfs_dir2.c @@ -22,6 +22,7 @@ #include "xfs_bmap_btree.h" #include "xfs_trans_space.h" #include "xfs_ag.h" +#include "xfs_ialloc.h" const struct xfs_name xfs_name_dotdot = { .name = (const unsigned char *)"..", @@ -1057,3 +1058,205 @@ xfs_dir_exchange( return 0; } + +/* + * Given an entry (@src_name, @src_ip) in directory @src_dp, make the entry + * @target_name in directory @target_dp point to @src_ip and remove the + * original entry, cleaning up everything left behind. + * + * Cleanup involves dropping a link count on @target_ip, and either removing + * the (@src_name, @src_ip) entry from @src_dp or simply replacing the entry + * with (@src_name, @wip) if a whiteout inode @wip is supplied. + * + * All inodes must have the ILOCK held. We assume that if @src_ip is a + * directory then its '..' doesn't already point to @target_dp, and that @wip + * is a freshly allocated whiteout. 
+ */ +int +xfs_dir_rename( + struct xfs_trans *tp, + struct xfs_inode *src_dp, + struct xfs_name *src_name, + struct xfs_inode *src_ip, + struct xfs_inode *target_dp, + struct xfs_name *target_name, + struct xfs_inode *target_ip, + unsigned int spaceres, + struct xfs_inode *wip) +{ + struct xfs_mount *mp = tp->t_mountp; + bool new_parent = (src_dp != target_dp); + bool src_is_directory; + int error; + + src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode); + + /* + * Check for expected errors before we dirty the transaction + * so we can return an error without a transaction abort. + */ + if (target_ip == NULL) { + /* + * If there's no space reservation, check the entry will + * fit before actually inserting it. + */ + if (!spaceres) { + error = xfs_dir_canenter(tp, target_dp, target_name); + if (error) + return error; + } + } else { + /* + * If target exists and it's a directory, check that whether + * it can be destroyed. + */ + if (S_ISDIR(VFS_I(target_ip)->i_mode) && + (!xfs_dir_isempty(target_ip) || + (VFS_I(target_ip)->i_nlink > 2))) + return -EEXIST; + } + + /* + * Directory entry creation below may acquire the AGF. Remove + * the whiteout from the unlinked list first to preserve correct + * AGI/AGF locking order. This dirties the transaction so failures + * after this point will abort and log recovery will clean up the + * mess. + * + * For whiteouts, we need to bump the link count on the whiteout + * inode. After this point, we have a real link, clear the tmpfile + * state flag from the inode so it doesn't accidentally get misused + * in future. + */ + if (wip) { + struct xfs_perag *pag; + + ASSERT(VFS_I(wip)->i_nlink == 0); + + pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, wip->i_ino)); + error = xfs_iunlink_remove(tp, pag, wip); + xfs_perag_put(pag); + if (error) + return error; + + xfs_bumplink(tp, wip); + } + + /* + * Set up the target. 
+ */ + if (target_ip == NULL) { + /* + * If target does not exist and the rename crosses + * directories, adjust the target directory link count + * to account for the ".." reference from the new entry. + */ + error = xfs_dir_createname(tp, target_dp, target_name, + src_ip->i_ino, spaceres); + if (error) + return error; + + xfs_trans_ichgtime(tp, target_dp, + XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + + if (new_parent && src_is_directory) { + xfs_bumplink(tp, target_dp); + } + } else { /* target_ip != NULL */ + /* + * Link the source inode under the target name. + * If the source inode is a directory and we are moving + * it across directories, its ".." entry will be + * inconsistent until we replace that down below. + * + * In case there is already an entry with the same + * name at the destination directory, remove it first. + */ + error = xfs_dir_replace(tp, target_dp, target_name, + src_ip->i_ino, spaceres); + if (error) + return error; + + xfs_trans_ichgtime(tp, target_dp, + XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + + /* + * Decrement the link count on the target since the target + * dir no longer points to it. + */ + error = xfs_droplink(tp, target_ip); + if (error) + return error; + + if (src_is_directory) { + /* + * Drop the link from the old "." entry. + */ + error = xfs_droplink(tp, target_ip); + if (error) + return error; + } + } /* target_ip != NULL */ + + /* + * Remove the source. + */ + if (new_parent && src_is_directory) { + /* + * Rewrite the ".." entry to point to the new + * directory. + */ + error = xfs_dir_replace(tp, src_ip, &xfs_name_dotdot, + target_dp->i_ino, spaceres); + ASSERT(error != -EEXIST); + if (error) + return error; + } + + /* + * We always want to hit the ctime on the source inode. + * + * This isn't strictly required by the standards since the source + * inode isn't really being changed, but old unix file systems did + * it and some incremental backup programs won't work without it. 
+ */ + xfs_trans_ichgtime(tp, src_ip, XFS_ICHGTIME_CHG); + xfs_trans_log_inode(tp, src_ip, XFS_ILOG_CORE); + + /* + * Adjust the link count on src_dp. This is necessary when + * renaming a directory, either within one parent when + * the target existed, or across two parent directories. + */ + if (src_is_directory && (new_parent || target_ip != NULL)) { + + /* + * Decrement link count on src_directory since the + * entry that's moved no longer points to it. + */ + error = xfs_droplink(tp, src_dp); + if (error) + return error; + } + + /* + * For whiteouts, we only need to update the source dirent with the + * inode number of the whiteout inode rather than removing it + * altogether. + */ + if (wip) + error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino, + spaceres); + else + error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino, + spaceres); + if (error) + return error; + + xfs_trans_ichgtime(tp, src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE); + if (new_parent) + xfs_trans_log_inode(tp, target_dp, XFS_ILOG_CORE); + + return 0; +} diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h index f63390236f09..00b4642bc8a8 100644 --- a/fs/xfs/libxfs/xfs_dir2.h +++ b/fs/xfs/libxfs/xfs_dir2.h @@ -266,5 +266,10 @@ int xfs_dir_exchange(struct xfs_trans *tp, struct xfs_inode *dp1, struct xfs_name *name1, struct xfs_inode *ip1, struct xfs_inode *dp2, struct xfs_name *name2, struct xfs_inode *ip2, unsigned int spaceres); +int xfs_dir_rename(struct xfs_trans *tp, struct xfs_inode *src_dp, + struct xfs_name *src_name, struct xfs_inode *src_ip, + struct xfs_inode *target_dp, struct xfs_name *target_name, + struct xfs_inode *target_ip, unsigned int spaceres, + struct xfs_inode *wip); #endif /* __XFS_DIR2_H__ */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 591721755b78..4b9d680c5268 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2408,33 +2408,6 @@ xfs_rename( goto out_trans_cancel; } - 
/* - * Check for expected errors before we dirty the transaction - * so we can return an error without a transaction abort. - */ - if (target_ip == NULL) { - /* - * If there's no space reservation, check the entry will - * fit before actually inserting it. - */ - if (!spaceres) { - error = xfs_dir_canenter(tp, target_dp, target_name); - if (error) - goto out_trans_cancel; - } - } else { - /* - * If target exists and it's a directory, check that whether - * it can be destroyed. - */ - if (S_ISDIR(VFS_I(target_ip)->i_mode) && - (!xfs_dir_isempty(target_ip) || - (VFS_I(target_ip)->i_nlink > 2))) { - error = -EEXIST; - goto out_trans_cancel; - } - } - /* * Lock the AGI buffers we need to handle bumping the nlink of the * whiteout inode off the unlinked list and to handle dropping the @@ -2461,150 +2434,20 @@ xfs_rename( } } - /* - * Directory entry creation below may acquire the AGF. Remove - * the whiteout from the unlinked list first to preserve correct - * AGI/AGF locking order. This dirties the transaction so failures - * after this point will abort and log recovery will clean up the - * mess. - * - * For whiteouts, we need to bump the link count on the whiteout - * inode. After this point, we have a real link, clear the tmpfile - * state flag from the inode so it doesn't accidentally get misused - * in future. - */ + error = xfs_dir_rename(tp, src_dp, src_name, src_ip, target_dp, + target_name, target_ip, spaceres, wip); + if (error) + goto out_trans_cancel; + if (wip) { - struct xfs_perag *pag; - - ASSERT(VFS_I(wip)->i_nlink == 0); - - pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, wip->i_ino)); - error = xfs_iunlink_remove(tp, pag, wip); - xfs_perag_put(pag); - if (error) - goto out_trans_cancel; - - xfs_bumplink(tp, wip); + /* + * Now we have a real link, clear the "I'm a tmpfile" state + * flag from the inode so it doesn't accidentally get misused in + * future. + */ VFS_I(wip)->i_state &= ~I_LINKABLE; } - /* - * Set up the target. 
- */ - if (target_ip == NULL) { - /* - * If target does not exist and the rename crosses - * directories, adjust the target directory link count - * to account for the ".." reference from the new entry. - */ - error = xfs_dir_createname(tp, target_dp, target_name, - src_ip->i_ino, spaceres); - if (error) - goto out_trans_cancel; - - xfs_trans_ichgtime(tp, target_dp, - XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); - - if (new_parent && src_is_directory) { - xfs_bumplink(tp, target_dp); - } - } else { /* target_ip != NULL */ - /* - * Link the source inode under the target name. - * If the source inode is a directory and we are moving - * it across directories, its ".." entry will be - * inconsistent until we replace that down below. - * - * In case there is already an entry with the same - * name at the destination directory, remove it first. - */ - error = xfs_dir_replace(tp, target_dp, target_name, - src_ip->i_ino, spaceres); - if (error) - goto out_trans_cancel; - - xfs_trans_ichgtime(tp, target_dp, - XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); - - /* - * Decrement the link count on the target since the target - * dir no longer points to it. - */ - error = xfs_droplink(tp, target_ip); - if (error) - goto out_trans_cancel; - - if (src_is_directory) { - /* - * Drop the link from the old "." entry. - */ - error = xfs_droplink(tp, target_ip); - if (error) - goto out_trans_cancel; - } - } /* target_ip != NULL */ - - /* - * Remove the source. - */ - if (new_parent && src_is_directory) { - /* - * Rewrite the ".." entry to point to the new - * directory. - */ - error = xfs_dir_replace(tp, src_ip, &xfs_name_dotdot, - target_dp->i_ino, spaceres); - ASSERT(error != -EEXIST); - if (error) - goto out_trans_cancel; - } - - /* - * We always want to hit the ctime on the source inode. - * - * This isn't strictly required by the standards since the source - * inode isn't really being changed, but old unix file systems did - * it and some incremental backup programs won't work without it. 
- */ - xfs_trans_ichgtime(tp, src_ip, XFS_ICHGTIME_CHG); - xfs_trans_log_inode(tp, src_ip, XFS_ILOG_CORE); - - /* - * Adjust the link count on src_dp. This is necessary when - * renaming a directory, either within one parent when - * the target existed, or across two parent directories. - */ - if (src_is_directory && (new_parent || target_ip != NULL)) { - - /* - * Decrement link count on src_directory since the - * entry that's moved no longer points to it. - */ - error = xfs_droplink(tp, src_dp); - if (error) - goto out_trans_cancel; - } - - /* - * For whiteouts, we only need to update the source dirent with the - * inode number of the whiteout inode rather than removing it - * altogether. - */ - if (wip) - error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino, - spaceres); - else - error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino, - spaceres); - - if (error) - goto out_trans_cancel; - - xfs_trans_ichgtime(tp, src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); - xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE); - if (new_parent) - xfs_trans_log_inode(tp, target_dp, XFS_ILOG_CORE); - if (xfs_hooks_switched_on(&xfs_nlinks_hooks_switch)) xfs_rename_call_nlink_hooks(src_dp, src_name, src_ip, target_dp, target_name, target_ip, wip, flags); ^ permalink raw reply related [flat|nested] 565+ messages in thread
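The link count bookkeeping in xfs_dir_rename is easy to lose track of when reading the diff. The sketch below restates only the branch conditions from the patch as a pure function that returns net nlink changes. The struct and function names are made up for illustration, and the model ignores everything else the real function does (directory entry updates, timestamps, logging).

```c
#include <assert.h>
#include <stdbool.h>

/* Net link-count changes implied by one rename; illustrative names only. */
struct rename_nlink_delta {
	int	src_dp;		/* old parent directory */
	int	target_dp;	/* new parent; same inode as src_dp if !new_parent */
	int	target_ip;	/* inode being overwritten, if any */
	int	wip;		/* whiteout inode, if any */
};

struct rename_nlink_delta
rename_nlink_delta(bool src_is_dir, bool new_parent, bool target_exists,
		   bool has_whiteout)
{
	struct rename_nlink_delta d = { 0, 0, 0, 0 };

	/* The whiteout goes from the unlinked list (nlink 0) to one link. */
	if (has_whiteout)
		d.wip += 1;

	if (!target_exists) {
		/* A directory arriving in a new parent adds a ".." backref. */
		if (new_parent && src_is_dir)
			d.target_dp += 1;
	} else {
		/* The target loses the dirent that pointed at it ... */
		d.target_ip -= 1;
		/* ... and, per the patch, a second link if src is a dir. */
		if (src_is_dir)
			d.target_ip -= 1;
	}

	/* The old parent loses a ".." backref in these two cases. */
	if (src_is_dir && (new_parent || target_exists))
		d.src_dp -= 1;

	return d;
}
```

For example, moving a directory into a different parent with no existing target gives src_dp = -1 and target_dp = +1, while overwriting an empty directory in a different parent leaves target_dp unchanged because it loses one ".." backref and gains another.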
* [PATCHSET v1.0 00/23] xfs: metadata inode directories 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 02/23] xfs: create imeta abstractions to get and set metadata inodes Darrick J. Wong ` (22 more replies) 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong ` (37 subsequent siblings) 39 siblings, 23 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs Hi all, This series delivers a new feature -- metadata inode directories. This is a separate directory tree (rooted in the superblock) that contains only inodes that contain filesystem metadata. Different metadata objects can be looked up with regular paths. We start by creating xfs_imeta_* functions to mediate access to metadata inode pointers. This enables the imeta code to abstract inode pointers, whether they're the classic five in the superblock, or the much more complex directory tree. All current users of metadata inodes (rt+quota) are converted to use the boilerplate code. Next, we define the metadir on-disk format, which consists of marking inodes with a new iflag that says they're metadata. This we use to prevent bulkstat and friends from ever getting their hands on fs metadata. Finally, we implement metadir operations so that clients can create, delete, zap, and look up metadata inodes by path. Beware that much of this code is only lightly used, because the five current users of metadata inodes don't tend to change them very often. This is likely to change if and when the subvolume and multiple-rt-volume features get written/merged/etc. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. 
This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. --D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=metadir xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=metadir fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=metadir xfsdocs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=metadir --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_format.h | 60 ++ fs/xfs/libxfs/xfs_fs.h | 12 fs/xfs/libxfs/xfs_health.h | 4 fs/xfs/libxfs/xfs_ialloc.c | 16 - fs/xfs/libxfs/xfs_ialloc.h | 2 fs/xfs/libxfs/xfs_imeta.c | 1210 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_imeta.h | 94 +++ fs/xfs/libxfs/xfs_inode_buf.c | 73 ++ fs/xfs/libxfs/xfs_inode_buf.h | 3 fs/xfs/libxfs/xfs_inode_util.c | 4 fs/xfs/libxfs/xfs_log_rlimit.c | 9 fs/xfs/libxfs/xfs_sb.c | 35 + fs/xfs/libxfs/xfs_trans_resv.c | 74 ++ fs/xfs/libxfs/xfs_trans_resv.h | 2 fs/xfs/libxfs/xfs_types.c | 7 fs/xfs/scrub/agheader.c | 29 + fs/xfs/scrub/common.c | 7 fs/xfs/scrub/dir.c | 9 fs/xfs/scrub/dir_repair.c | 6 fs/xfs/scrub/inode_repair.c | 10 fs/xfs/scrub/nlinks.c | 12 fs/xfs/scrub/nlinks_repair.c | 2 fs/xfs/scrub/parent.c | 18 + fs/xfs/scrub/parent_repair.c | 37 + fs/xfs/scrub/tempfile.c | 10 fs/xfs/xfs_health.c | 1 fs/xfs/xfs_icache.c | 39 + fs/xfs/xfs_inode.c | 131 ++++ fs/xfs/xfs_inode.h | 11 fs/xfs/xfs_ioctl.c | 7 fs/xfs/xfs_iops.c | 34 + fs/xfs/xfs_itable.c | 32 + fs/xfs/xfs_itable.h | 3 fs/xfs/xfs_mount.c | 39 + fs/xfs/xfs_mount.h | 3 fs/xfs/xfs_ondisk.h | 1 fs/xfs/xfs_qm.c | 212 +++++-- fs/xfs/xfs_qm_syscalls.c | 4 fs/xfs/xfs_rtalloc.c | 16 - fs/xfs/xfs_super.c | 4 fs/xfs/xfs_symlink.c | 2 fs/xfs/xfs_trace.h | 78 +++ 43 files changed, 2223 insertions(+), 140 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_imeta.c create mode 100644 fs/xfs/libxfs/xfs_imeta.h ^ permalink raw reply 
[flat|nested] 565+ messages in thread
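As a rough illustration of what "mediating access to metadata inode pointers" can look like while the pointers still live in the superblock, here is a table-driven lookup that maps a path key to a superblock field by offset. This is only a sketch: the toy_* types and names are invented, and the real helpers in the series operate on struct xfs_mount and struct xfs_sb.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_ino_t;
#define TOY_NULLINO	((toy_ino_t)-1)

/* Hypothetical in-core superblock with three metadata inode pointers. */
struct toy_sb {
	toy_ino_t	rbmino;
	toy_ino_t	rsumino;
	toy_ino_t	uquotino;
};

/* Lookup key; paths are compared by address, so each needs its own object. */
struct toy_path {
	int		id;
};

const struct toy_path TOY_RTBITMAP  = { .id = 0 };
const struct toy_path TOY_RTSUMMARY = { .id = 1 };
const struct toy_path TOY_USRQUOTA  = { .id = 2 };
const struct toy_path TOY_UNMAPPED  = { .id = 99 };	/* not in the table */

/* Map each known path to the offset of its superblock field. */
static const struct toy_sbmap {
	const struct toy_path	*path;
	size_t			offset;
} toy_sbmaps[] = {
	{ &TOY_RTBITMAP,  offsetof(struct toy_sb, rbmino) },
	{ &TOY_RTSUMMARY, offsetof(struct toy_sb, rsumino) },
	{ &TOY_USRQUOTA,  offsetof(struct toy_sb, uquotino) },
	{ NULL, 0 },
};

/* Return a pointer to the superblock field for @path, or NULL if unknown. */
static toy_ino_t *
toy_path_to_inop(struct toy_sb *sb, const struct toy_path *path)
{
	const struct toy_sbmap	*p;

	for (p = toy_sbmaps; p->path; p++)
		if (p->path == path)
			return (toy_ino_t *)((char *)sb + p->offset);
	return NULL;
}

/* Look up a metadata inode number by path; -EINVAL for unknown paths. */
int
toy_lookup(struct toy_sb *sb, const struct toy_path *path, toy_ino_t *inop)
{
	toy_ino_t	*sb_inop;

	assert(sb && path && inop);

	sb_inop = toy_path_to_inop(sb, path);
	if (!sb_inop)
		return -EINVAL;
	*inop = *sb_inop;
	return 0;
}
```

Callers only ever name a path, so the backing store could later change from fixed superblock fields to a directory tree without touching them, which seems to be the point of introducing the abstraction first.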
* [PATCH 02/23] xfs: create imeta abstractions to get and set metadata inodes 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 01/23] xfs: don't use the incore struct xfs_sb for offsets into struct xfs_dsb Darrick J. Wong ` (21 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create some helper routines to get and set metadata inode numbers instead of open-coding them throughout xfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_imeta.c | 438 +++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_imeta.h | 48 +++++ fs/xfs/libxfs/xfs_types.c | 5 - fs/xfs/xfs_mount.c | 21 ++ fs/xfs/xfs_trace.h | 37 ++++ 6 files changed, 546 insertions(+), 4 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_imeta.c create mode 100644 fs/xfs/libxfs/xfs_imeta.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index dac4165b02c0..3d74696755c3 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -37,6 +37,7 @@ xfs-y += $(addprefix libxfs/, \ xfs_ialloc.o \ xfs_ialloc_btree.o \ xfs_iext_tree.o \ + xfs_imeta.o \ xfs_inode_fork.o \ xfs_inode_buf.o \ xfs_inode_util.o \ diff --git a/fs/xfs/libxfs/xfs_imeta.c b/fs/xfs/libxfs/xfs_imeta.c new file mode 100644 index 000000000000..0a1cd0c5c15b --- /dev/null +++ b/fs/xfs/libxfs/xfs_imeta.c @@ -0,0 +1,438 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_bit.h" +#include "xfs_sb.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_trans.h" +#include "xfs_imeta.h" +#include "xfs_trace.h" +#include "xfs_inode.h" +#include "xfs_quota.h" +#include "xfs_ialloc.h" + +/* + * Metadata Inode Number Management + * ================================ + * + * These functions provide an abstraction layer for looking up, creating, and + * deleting metadata inodes. Pointers to these inodes live in the in-core + * superblock, so the functions mediate access to those fields and take care + * of logging. + * + * For the five existing metadata inodes (real time bitmap & summary; and the + * user, group, and project quotas) we'll continue to maintain the in-core + * superblock inodes for reads and only require xfs_imeta_create and + * xfs_imeta_unlink to persist changes. New metadata inode types must only use + * the xfs_imeta_* functions. + * + * Callers wishing to create or unlink a metadata inode must pass in an + * xfs_imeta_update structure. After committing or cancelling the transaction, + * this structure must be passed to xfs_imeta_end_update to free resources that + * cannot be freed during the transaction. + * + * Right now we only support callers passing in the predefined metadata inode + * paths; the goal is that callers will some day locate metadata inodes based + * on path lookups into a metadata directory structure. + */ + +/* Static metadata inode paths */ + +const struct xfs_imeta_path XFS_IMETA_RTBITMAP = { + .bogus = 0, +}; + +const struct xfs_imeta_path XFS_IMETA_RTSUMMARY = { + .bogus = 1, +}; + +const struct xfs_imeta_path XFS_IMETA_USRQUOTA = { + .bogus = 2, +}; + +const struct xfs_imeta_path XFS_IMETA_GRPQUOTA = { + .bogus = 3, +}; + +const struct xfs_imeta_path XFS_IMETA_PRJQUOTA = { + .bogus = 4, +}; + +/* Are these two paths equal?
*/ +STATIC bool +xfs_imeta_path_compare( + const struct xfs_imeta_path *a, + const struct xfs_imeta_path *b) +{ + return a == b; +} + +/* Is this path ok? */ +static inline bool +xfs_imeta_path_check( + const struct xfs_imeta_path *path) +{ + return true; +} + +/* Functions for storing and retrieving superblock inode values. */ + +/* Mapping of metadata inode paths to in-core superblock values. */ +static const struct xfs_imeta_sbmap { + const struct xfs_imeta_path *path; + unsigned int offset; +} xfs_imeta_sbmaps[] = { + { + .path = &XFS_IMETA_RTBITMAP, + .offset = offsetof(struct xfs_sb, sb_rbmino), + }, + { + .path = &XFS_IMETA_RTSUMMARY, + .offset = offsetof(struct xfs_sb, sb_rsumino), + }, + { + .path = &XFS_IMETA_USRQUOTA, + .offset = offsetof(struct xfs_sb, sb_uquotino), + }, + { + .path = &XFS_IMETA_GRPQUOTA, + .offset = offsetof(struct xfs_sb, sb_gquotino), + }, + { + .path = &XFS_IMETA_PRJQUOTA, + .offset = offsetof(struct xfs_sb, sb_pquotino), + }, + { NULL, 0 }, +}; + +/* Return a pointer to the in-core superblock inode value. */ +static inline xfs_ino_t * +xfs_imeta_sbmap_to_inop( + struct xfs_mount *mp, + const struct xfs_imeta_sbmap *map) +{ + return (xfs_ino_t *)(((char *)&mp->m_sb) + map->offset); +} + +/* Compute location of metadata inode pointer in the in-core superblock */ +static inline xfs_ino_t * +xfs_imeta_path_to_sb_inop( + struct xfs_mount *mp, + const struct xfs_imeta_path *path) +{ + const struct xfs_imeta_sbmap *p; + + for (p = xfs_imeta_sbmaps; p->path; p++) + if (xfs_imeta_path_compare(p->path, path)) + return xfs_imeta_sbmap_to_inop(mp, p); + + return NULL; +} + +/* Look up a superblock metadata inode by its path. 
*/ +STATIC int +xfs_imeta_sb_lookup( + struct xfs_mount *mp, + const struct xfs_imeta_path *path, + xfs_ino_t *inop) +{ + xfs_ino_t *sb_inop; + + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (!sb_inop) + return -EINVAL; + + trace_xfs_imeta_sb_lookup(mp, sb_inop); + *inop = *sb_inop; + return 0; +} + +/* Update inode pointers in the superblock. */ +static inline void +xfs_imeta_log_sb( + struct xfs_trans *tp) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_buf *bp = xfs_trans_getsb(tp); + + /* + * Update the inode flags in the ondisk superblock without touching + * the summary counters. We have not quiesced inode chunk allocation, + * so we cannot coordinate with updates to the icount and ifree percpu + * counters. + */ + xfs_sb_to_disk(bp->b_addr, &mp->m_sb); + xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF); + xfs_trans_log_buf(tp, bp, 0, sizeof(struct xfs_dsb) - 1); +} + +/* + * Create a new metadata inode and set a superblock pointer to this new inode. + * The superblock field must not already be pointing to an inode. + */ +STATIC int +xfs_imeta_sb_create( + struct xfs_trans **tpp, + const struct xfs_imeta_path *path, + umode_t mode, + unsigned int flags, + struct xfs_inode **ipp) +{ + struct xfs_icreate_args args = { + .nlink = S_ISDIR(mode) ? 2 : 1, + }; + struct xfs_mount *mp = (*tpp)->t_mountp; + xfs_ino_t *sb_inop; + xfs_ino_t ino; + int error; + + xfs_icreate_args_rootfile(&args, mode); + + /* Reject if the sb already points to some inode. */ + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (!sb_inop) + return -EINVAL; + + if (*sb_inop != NULLFSINO) + return -EEXIST; + + /* Create a new inode and set the sb pointer. */ + error = xfs_dialloc(tpp, 0, mode, &ino); + if (error) + return error; + error = xfs_icreate(*tpp, ino, &args, ipp); + if (error) + return error; + + /* Attach dquots to this file. Caller should have allocated them! 
*/ + if (!(flags & XFS_IMETA_CREATE_NOQUOTA)) { + error = xfs_qm_dqattach_locked(*ipp, false); + if (error) + return error; + xfs_trans_mod_dquot_byino(*tpp, *ipp, XFS_TRANS_DQ_ICOUNT, 1); + } + + /* Update superblock pointer. */ + *sb_inop = ino; + trace_xfs_imeta_sb_create(mp, sb_inop); + xfs_imeta_log_sb(*tpp); + return 0; +} + +/* + * Clear the given inode pointer from the superblock and drop the link count + * of the metadata inode. + */ +STATIC int +xfs_imeta_sb_unlink( + struct xfs_trans **tpp, + const struct xfs_imeta_path *path, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = (*tpp)->t_mountp; + xfs_ino_t *sb_inop; + + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (!sb_inop) + return -EINVAL; + + /* Reject if the sb doesn't point to the inode that was passed in. */ + if (*sb_inop != ip->i_ino) + return -ENOENT; + + *sb_inop = NULLFSINO; + trace_xfs_imeta_sb_unlink(mp, sb_inop); + xfs_imeta_log_sb(*tpp); + return xfs_droplink(*tpp, ip); +} + +/* Set the given inode pointer in the superblock. */ +STATIC int +xfs_imeta_sb_link( + struct xfs_trans *tp, + const struct xfs_imeta_path *path, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = tp->t_mountp; + xfs_ino_t *sb_inop; + + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (!sb_inop) + return -EINVAL; + + /* Reject if the sb already points to some inode. */ + if (*sb_inop != NULLFSINO) + return -EEXIST; + + xfs_ilock(ip, XFS_ILOCK_EXCL); + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); + + inc_nlink(VFS_I(ip)); + *sb_inop = ip->i_ino; + trace_xfs_imeta_sb_link(mp, sb_inop); + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + xfs_imeta_log_sb(tp); + return 0; +} + +/* General functions for managing metadata inode pointers */ + +/* + * Is this metadata inode pointer ok? We allow the fields to be set to + * NULLFSINO if the metadata structure isn't present, and we don't allow + * obviously incorrect inode pointers.
+ */ +static inline bool +xfs_imeta_verify( + struct xfs_mount *mp, + xfs_ino_t ino) +{ + if (ino == NULLFSINO) + return true; + return xfs_verify_ino(mp, ino); +} + +/* Look up a metadata inode by its path. */ +int +xfs_imeta_lookup( + struct xfs_mount *mp, + const struct xfs_imeta_path *path, + xfs_ino_t *inop) +{ + xfs_ino_t ino; + int error; + + ASSERT(xfs_imeta_path_check(path)); + + error = xfs_imeta_sb_lookup(mp, path, &ino); + if (error) + return error; + + if (!xfs_imeta_verify(mp, ino)) + return -EFSCORRUPTED; + + *inop = ino; + return 0; +} + +/* + * Create a metadata inode with the given @mode, and insert it into the + * metadata directory tree at the given @path. The path (up to the final + * component) must already exist. The new metadata inode @ipp will be ijoined + * and logged to @tpp, with the ILOCK held until the next transaction commit. + * The caller must provide a @upd structure. + * + * Callers must ensure that the root dquots are allocated, if applicable. + * + * NOTE: This function may pass a child inode @ipp back to the caller even if + * it returns a negative error code. If an inode is passed back, the caller + * must finish setting up the incore inode before releasing it. + */ +int +xfs_imeta_create( + struct xfs_trans **tpp, + const struct xfs_imeta_path *path, + umode_t mode, + unsigned int flags, + struct xfs_inode **ipp, + struct xfs_imeta_update *upd) +{ + ASSERT(xfs_imeta_path_check(path)); + *ipp = NULL; + + return xfs_imeta_sb_create(tpp, path, mode, flags, ipp); +} + +/* + * Unlink a metadata inode @ip from the metadata directory given by @path. The + * metadata inode must not be ILOCKed. Upon return, the inode will be ijoined + * and logged to @tpp, and returned with reduced link count, ready to be + * released. The caller must provide a @upd structure. 
+ */ +int +xfs_imeta_unlink( + struct xfs_trans **tpp, + const struct xfs_imeta_path *path, + struct xfs_inode *ip, + struct xfs_imeta_update *upd) +{ + ASSERT(xfs_imeta_path_check(path)); + ASSERT(xfs_imeta_verify((*tpp)->t_mountp, ip->i_ino)); + + return xfs_imeta_sb_unlink(tpp, path, ip); +} + +/* + * Make the metadata directory entry given by @path point to the given inode. + * The path must not already exist. The caller must not hold the ILOCK, and + * the function will return with the inode joined to the transaction. + */ +int +xfs_imeta_link( + struct xfs_trans *tp, + const struct xfs_imeta_path *path, + struct xfs_inode *ip, + struct xfs_imeta_update *upd) +{ + ASSERT(xfs_imeta_path_check(path)); + + return xfs_imeta_sb_link(tp, path, ip); +} + +/* + * Clean up after committing (or cancelling) a metadata inode creation or + * removal. + */ +void +xfs_imeta_end_update( + struct xfs_mount *mp, + struct xfs_imeta_update *upd, + int error) +{ + trace_xfs_imeta_end_update(mp, error, __return_address); +} + +/* Start setting up for a metadata directory tree operation. */ +int +xfs_imeta_start_update( + struct xfs_mount *mp, + const struct xfs_imeta_path *path, + struct xfs_imeta_update *upd) +{ + trace_xfs_imeta_start_update(mp, 0, __return_address); + + memset(upd, 0, sizeof(struct xfs_imeta_update)); + return 0; +} + +/* Does this inode number refer to a static metadata inode? */ +bool +xfs_is_static_meta_ino( + struct xfs_mount *mp, + xfs_ino_t ino) +{ + const struct xfs_imeta_sbmap *p; + + if (ino == NULLFSINO) + return false; + + for (p = xfs_imeta_sbmaps; p->path; p++) + if (ino == *xfs_imeta_sbmap_to_inop(mp, p)) + return true; + + return false; +} + +/* Ensure that the in-core superblock has all the values that it should.
*/ +int +xfs_imeta_mount( + struct xfs_mount *mp) +{ + return 0; +} diff --git a/fs/xfs/libxfs/xfs_imeta.h b/fs/xfs/libxfs/xfs_imeta.h new file mode 100644 index 000000000000..b535e19ff1a0 --- /dev/null +++ b/fs/xfs/libxfs/xfs_imeta.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#ifndef __XFS_IMETA_H__ +#define __XFS_IMETA_H__ + +/* Key for looking up metadata inodes. */ +struct xfs_imeta_path { + /* Temporary: integer to keep the static imeta definitions unique */ + int bogus; +}; + +/* Cleanup widget for metadata inode creation and deletion. */ +struct xfs_imeta_update { + /* empty for now */ +}; + +/* Lookup keys for static metadata inodes. */ +extern const struct xfs_imeta_path XFS_IMETA_RTBITMAP; +extern const struct xfs_imeta_path XFS_IMETA_RTSUMMARY; +extern const struct xfs_imeta_path XFS_IMETA_USRQUOTA; +extern const struct xfs_imeta_path XFS_IMETA_GRPQUOTA; +extern const struct xfs_imeta_path XFS_IMETA_PRJQUOTA; + +int xfs_imeta_lookup(struct xfs_mount *mp, const struct xfs_imeta_path *path, + xfs_ino_t *ino); + +/* Don't allocate quota for this file. 
*/ +#define XFS_IMETA_CREATE_NOQUOTA (1 << 0) +int xfs_imeta_create(struct xfs_trans **tpp, const struct xfs_imeta_path *path, + umode_t mode, unsigned int flags, struct xfs_inode **ipp, + struct xfs_imeta_update *upd); +int xfs_imeta_unlink(struct xfs_trans **tpp, const struct xfs_imeta_path *path, + struct xfs_inode *ip, struct xfs_imeta_update *upd); +int xfs_imeta_link(struct xfs_trans *tp, const struct xfs_imeta_path *path, + struct xfs_inode *ip, struct xfs_imeta_update *upd); +void xfs_imeta_end_update(struct xfs_mount *mp, struct xfs_imeta_update *upd, + int error); +int xfs_imeta_start_update(struct xfs_mount *mp, + const struct xfs_imeta_path *path, + struct xfs_imeta_update *upd); + +bool xfs_is_static_meta_ino(struct xfs_mount *mp, xfs_ino_t ino); +int xfs_imeta_mount(struct xfs_mount *mp); + +#endif /* __XFS_IMETA_H__ */ diff --git a/fs/xfs/libxfs/xfs_types.c b/fs/xfs/libxfs/xfs_types.c index 5c2765934732..50efa181b26d 100644 --- a/fs/xfs/libxfs/xfs_types.c +++ b/fs/xfs/libxfs/xfs_types.c @@ -12,6 +12,7 @@ #include "xfs_bit.h" #include "xfs_mount.h" #include "xfs_ag.h" +#include "xfs_imeta.h" /* @@ -115,9 +116,7 @@ xfs_internal_inum( struct xfs_mount *mp, xfs_ino_t ino) { - return ino == mp->m_sb.sb_rbmino || ino == mp->m_sb.sb_rsumino || - (xfs_has_quota(mp) && - xfs_is_quota_inode(&mp->m_sb, ino)); + return xfs_is_static_meta_ino(mp, ino); } /* diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 54cd47882991..0ee6a856f1e4 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -34,6 +34,7 @@ #include "xfs_health.h" #include "xfs_trace.h" #include "xfs_ag.h" +#include "xfs_imeta.h" static DEFINE_MUTEX(xfs_uuid_table_mutex); static int xfs_uuid_table_size; @@ -609,6 +610,22 @@ xfs_mount_setup_inode_geom( xfs_ialloc_setup_geometry(mp); } +STATIC int +xfs_mount_setup_metadir( + struct xfs_mount *mp) +{ + int error; + + error = xfs_imeta_mount(mp); + if (error) { + xfs_warn(mp, "Failed to load metadata inode info, error %d", + error); + return 
error; + } + + return 0; +} + /* Compute maximum possible height for per-AG btree types for this fs. */ static inline void xfs_agbtree_compute_maxlevels( @@ -843,6 +860,10 @@ xfs_mountfs( mp->m_features |= XFS_FEAT_ATTR2; } + error = xfs_mount_setup_metadir(mp); + if (error) + goto out_log_dealloc; + /* * Get and sanity-check the root inode. * Save the pointer to it in the mount structure. diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 147dbdf73d92..1a3176932de8 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -148,7 +148,7 @@ DEFINE_ATTR_LIST_EVENT(xfs_attr_list_notfound); DEFINE_ATTR_LIST_EVENT(xfs_attr_leaf_list); DEFINE_ATTR_LIST_EVENT(xfs_attr_node_list); -TRACE_EVENT(xlog_intent_recovery_failed, +DECLARE_EVENT_CLASS(xfs_fs_error_class, TP_PROTO(struct xfs_mount *mp, int error, void *function), TP_ARGS(mp, error, function), TP_STRUCT__entry( @@ -165,6 +165,11 @@ TRACE_EVENT(xlog_intent_recovery_failed, MAJOR(__entry->dev), MINOR(__entry->dev), __entry->error, __entry->function) ); +#define DEFINE_FS_ERROR_EVENT(name) \ +DEFINE_EVENT(xfs_fs_error_class, name, \ + TP_PROTO(struct xfs_mount *mp, int error, void *function), \ + TP_ARGS(mp, error, function)) +DEFINE_FS_ERROR_EVENT(xlog_intent_recovery_failed); DECLARE_EVENT_CLASS(xfs_perag_class, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, int refcount, @@ -4917,6 +4922,36 @@ TRACE_EVENT(xfs_swapext_delta_nextents, __entry->d_nexts1, __entry->d_nexts2) ); +DECLARE_EVENT_CLASS(xfs_imeta_sb_class, + TP_PROTO(struct xfs_mount *mp, xfs_ino_t *sb_inop), + TP_ARGS(mp, sb_inop), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned int, sb_offset) + __field(xfs_ino_t, ino) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->sb_offset = (char *)sb_inop - (char *)&mp->m_sb; + __entry->ino = *sb_inop; + ), + TP_printk("dev %d:%d sb_offset 0x%x ino 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->sb_offset, + __entry->ino) +) + +#define 
DEFINE_IMETA_SB_EVENT(name) \ +DEFINE_EVENT(xfs_imeta_sb_class, name, \ + TP_PROTO(struct xfs_mount *mp, xfs_ino_t *sb_inop), \ + TP_ARGS(mp, sb_inop)) +DEFINE_IMETA_SB_EVENT(xfs_imeta_sb_lookup); +DEFINE_IMETA_SB_EVENT(xfs_imeta_sb_create); +DEFINE_IMETA_SB_EVENT(xfs_imeta_sb_unlink); +DEFINE_IMETA_SB_EVENT(xfs_imeta_sb_link); +DEFINE_FS_ERROR_EVENT(xfs_imeta_start_update); +DEFINE_FS_ERROR_EVENT(xfs_imeta_end_update); + #endif /* _TRACE_XFS_H */ #undef TRACE_INCLUDE_PATH ^ permalink raw reply related [flat|nested] 565+ messages in thread
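For readers who want to see the shape of the superblock mapping table that xfs_is_static_meta_ino walks, here is a minimal userspace sketch. It assumes each table row pairs a lookup key with the offset of an inode field inside the in-core superblock, which is what the sb_offset tracepoint field suggests. Every toy_* name is hypothetical, and the pointer-comparison lookup is an assumption, since xfs_imeta_sb_lookup itself is not quoted in this part of the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t toy_ino_t;
#define TOY_NULLINO ((toy_ino_t)-1)

/* Toy stand-ins for the in-core superblock and mount; not the kernel structs. */
struct toy_sb {
	toy_ino_t rbmino;
	toy_ino_t rsumino;
	toy_ino_t uquotino;
};

struct toy_mount {
	struct toy_sb sb;
};

/* Lookup key; each static key is a distinct object, so its address is unique. */
struct toy_path {
	int bogus;
};

static const struct toy_path TOY_RTBITMAP = { 0 };
static const struct toy_path TOY_RTSUMMARY = { 1 };
static const struct toy_path TOY_USRQUOTA = { 2 };

/* One table row per static metadata inode: key plus superblock field offset. */
struct toy_sbmap {
	const struct toy_path *path;
	size_t offset;
};

static const struct toy_sbmap toy_sbmaps[] = {
	{ &TOY_RTBITMAP, offsetof(struct toy_sb, rbmino) },
	{ &TOY_RTSUMMARY, offsetof(struct toy_sb, rsumino) },
	{ &TOY_USRQUOTA, offsetof(struct toy_sb, uquotino) },
	{ NULL, 0 },	/* sentinel row ends the walk */
};

/* Turn a table row into a pointer to the matching superblock field. */
static toy_ino_t *
toy_sbmap_to_inop(struct toy_mount *mp, const struct toy_sbmap *map)
{
	return (toy_ino_t *)((char *)&mp->sb + map->offset);
}

/* Look up an inode number by key; returns -1 if the key is not in the table. */
static int
toy_lookup(struct toy_mount *mp, const struct toy_path *path, toy_ino_t *inop)
{
	const struct toy_sbmap *p;

	for (p = toy_sbmaps; p->path; p++) {
		if (p->path == path) {
			*inop = *toy_sbmap_to_inop(mp, p);
			return 0;
		}
	}
	return -1;
}

/* Does this inode number match any superblock-rooted metadata inode? */
static bool
toy_is_static_meta_ino(struct toy_mount *mp, toy_ino_t ino)
{
	const struct toy_sbmap *p;

	if (ino == TOY_NULLINO)
		return false;
	for (p = toy_sbmaps; p->path; p++)
		if (ino == *toy_sbmap_to_inop(mp, p))
			return true;
	return false;
}
```

The early return for the null inode number matters: unset superblock fields hold the null value, so without it a null query would match any unset slot.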
* [PATCH 01/23] xfs: don't use the incore struct xfs_sb for offsets into struct xfs_dsb 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong 2022-12-30 22:17 ` [PATCH 02/23] xfs: create imeta abstractions to get and set metadata inodes Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 09/23] xfs: load metadata directory root at mount time Darrick J. Wong ` (20 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Currently, the XFS_SB_CRC_OFF macro uses the incore superblock struct (xfs_sb) to compute the address of sb_crc within the ondisk superblock struct (xfs_dsb). This is a landmine if we ever change the layout of the incore superblock (as we're about to do), so redefine the macro to use xfs_dsb to compute the layout of xfs_dsb. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 9 ++++----- fs/xfs/xfs_ondisk.h | 1 + 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 0c457905cce5..abd75b3091ec 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -90,8 +90,7 @@ struct xfs_ifork; #define XFSLABEL_MAX 12 /* - * Superblock - in core version. Must match the ondisk version below. - * Must be padded to 64 bit alignment. + * Superblock - in core version. Must be padded to 64 bit alignment. */ typedef struct xfs_sb { uint32_t sb_magicnum; /* magic number == XFS_SB_MAGIC */ @@ -178,10 +177,8 @@ typedef struct xfs_sb { /* must be padded to 64 bit alignment */ } xfs_sb_t; -#define XFS_SB_CRC_OFF offsetof(struct xfs_sb, sb_crc) - /* - * Superblock - on disk version. Must match the in core version above. + * Superblock - on disk version. * Must be padded to 64 bit alignment. 
*/ struct xfs_dsb { @@ -265,6 +262,8 @@ struct xfs_dsb { /* must be padded to 64 bit alignment */ }; +#define XFS_SB_CRC_OFF offsetof(struct xfs_dsb, sb_crc) + /* * Misc. Flags - warning - these will be cleared by xfs_repair unless * a feature bit is set when the flag is used. diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h index 9737b5a9f405..1e71d27f0cae 100644 --- a/fs/xfs/xfs_ondisk.h +++ b/fs/xfs/xfs_ondisk.h @@ -81,6 +81,7 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(xfs_attr_leaf_name_remote_t, 12); */ + XFS_CHECK_OFFSET(struct xfs_dsb, sb_crc, 224); XFS_CHECK_OFFSET(xfs_attr_leaf_name_local_t, valuelen, 0); XFS_CHECK_OFFSET(xfs_attr_leaf_name_local_t, namelen, 2); XFS_CHECK_OFFSET(xfs_attr_leaf_name_local_t, nameval, 3); ^ permalink raw reply related [flat|nested] 565+ messages in thread
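To illustrate why the offset has to come from the ondisk struct, here is a small sketch with two toy structs whose layouts diverge once the in-core one gains a field. The toy_* names and the field layout are made up for illustration and are much smaller than the real xfs_sb and xfs_dsb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy on-disk layout: fixed forever once it has been written to disk. */
struct toy_dsb {
	uint32_t magic;
	uint32_t blocksize;
	uint32_t crc;
};

/* Toy in-core layout: free to grow fields that the disk never sees. */
struct toy_sb {
	uint32_t magic;
	uint64_t incore_only;	/* hypothetical in-core-only field */
	uint32_t blocksize;
	uint32_t crc;
};

/* Compute the checksum offset from the struct that describes the buffer. */
#define TOY_SB_CRC_OFF	offsetof(struct toy_dsb, crc)

/* Compile-time guard, in the spirit of XFS_CHECK_OFFSET. */
_Static_assert(offsetof(struct toy_dsb, crc) == 8,
	       "toy on-disk crc offset moved");

/* Read the checksum out of a raw on-disk buffer. */
static uint32_t
toy_read_crc(const unsigned char *buf)
{
	uint32_t crc;

	memcpy(&crc, buf + TOY_SB_CRC_OFF, sizeof(crc));
	return crc;
}
```

Had the macro used struct toy_sb, the read would land past the checksum as soon as incore_only was added, and nothing would complain at build time.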
* [PATCH 09/23] xfs: load metadata directory root at mount time 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong 2022-12-30 22:17 ` [PATCH 02/23] xfs: create imeta abstractions to get and set metadata inodes Darrick J. Wong 2022-12-30 22:17 ` [PATCH 01/23] xfs: don't use the incore struct xfs_sb for offsets into struct xfs_dsb Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 05/23] xfs: convert all users to xfs_imeta_log Darrick J. Wong ` (19 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Load the metadata directory root inode into memory at mount time and release it at unmount time. We also make sure that the obsolete inode pointers in the superblock are not logged or read from the superblock. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_sb.c | 31 +++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_types.c | 2 +- fs/xfs/xfs_mount.c | 20 +++++++++++++++++--- fs/xfs/xfs_mount.h | 1 + 4 files changed, 50 insertions(+), 4 deletions(-) diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 345a6fdf8625..181bede3b3f6 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -622,6 +622,25 @@ __xfs_sb_from_disk( /* Convert on-disk flags to in-memory flags? */ if (convert_xquota) xfs_sb_quota_from_disk(to); + + if (to->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) { + /* + * Set metadirino here and null out the in-core fields for + * the other inodes because metadir initialization will load + * them later. + */ + to->sb_metadirino = be64_to_cpu(from->sb_rbmino); + to->sb_rbmino = NULLFSINO; + to->sb_rsumino = NULLFSINO; + + /* + * We don't have to worry about quota inode conversion here + * because metadir requires a v5 filesystem. 
+ */ + to->sb_uquotino = NULLFSINO; + to->sb_gquotino = NULLFSINO; + to->sb_pquotino = NULLFSINO; + } } void @@ -769,6 +788,18 @@ xfs_sb_to_disk( to->sb_lsn = cpu_to_be64(from->sb_lsn); if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_META_UUID) uuid_copy(&to->sb_meta_uuid, &from->sb_meta_uuid); + + if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) { + /* + * Save metadirino here and null out the on-disk fields for + * the other inodes, at least until we reuse the fields. + */ + to->sb_rbmino = cpu_to_be64(from->sb_metadirino); + to->sb_rsumino = cpu_to_be64(NULLFSINO); + to->sb_uquotino = cpu_to_be64(NULLFSINO); + to->sb_gquotino = cpu_to_be64(NULLFSINO); + to->sb_pquotino = cpu_to_be64(NULLFSINO); + } } /* diff --git a/fs/xfs/libxfs/xfs_types.c b/fs/xfs/libxfs/xfs_types.c index 50efa181b26d..dfcc1889c203 100644 --- a/fs/xfs/libxfs/xfs_types.c +++ b/fs/xfs/libxfs/xfs_types.c @@ -128,7 +128,7 @@ xfs_verify_dir_ino( struct xfs_mount *mp, xfs_ino_t ino) { - if (xfs_internal_inum(mp, ino)) + if (!xfs_has_metadir(mp) && xfs_internal_inum(mp, ino)) return false; return xfs_verify_ino(mp, ino); } diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 0ee6a856f1e4..3957c60d5d07 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -616,6 +616,16 @@ xfs_mount_setup_metadir( { int error; + /* Load the metadata directory inode into memory. */ + if (xfs_has_metadir(mp)) { + error = xfs_imeta_iget(mp, mp->m_sb.sb_metadirino, + XFS_DIR3_FT_DIR, &mp->m_metadirip); + if (error) { + xfs_warn(mp, "Failed metadir inode init: %d", error); + return error; + } + } + error = xfs_imeta_mount(mp); if (error) { xfs_warn(mp, "Failed to load metadata inode info, error %d", @@ -862,7 +872,7 @@ xfs_mountfs( error = xfs_mount_setup_metadir(mp); if (error) - goto out_log_dealloc; + goto out_free_metadir; /* * Get and sanity-check the root inode. 
@@ -874,7 +884,7 @@ xfs_mountfs( xfs_warn(mp, "Failed to read root inode 0x%llx, error %d", sbp->sb_rootino, -error); - goto out_log_dealloc; + goto out_free_metadir; } ASSERT(rip != NULL); @@ -1017,6 +1027,9 @@ xfs_mountfs( xfs_irele(rip); /* Clean out dquots that might be in memory after quotacheck. */ xfs_qm_unmount(mp); + out_free_metadir: + if (mp->m_metadirip) + xfs_imeta_irele(mp->m_metadirip); /* * Inactivate all inodes that might still be in memory after a log @@ -1038,7 +1051,6 @@ xfs_mountfs( * quota inodes. */ xfs_unmount_flush_inodes(mp); - out_log_dealloc: xfs_log_mount_cancel(mp); out_inodegc_shrinker: unregister_shrinker(&mp->m_inodegc_shrinker); @@ -1090,6 +1102,8 @@ xfs_unmountfs( xfs_qm_unmount_quotas(mp); xfs_rtunmount_inodes(mp); xfs_irele(mp->m_rootip); + if (mp->m_metadirip) + xfs_imeta_irele(mp->m_metadirip); xfs_unmount_flush_inodes(mp); diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 0fb545e92a26..88fbbaee8806 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -90,6 +90,7 @@ typedef struct xfs_mount { struct xfs_inode *m_rbmip; /* pointer to bitmap inode */ struct xfs_inode *m_rsumip; /* pointer to summary inode */ struct xfs_inode *m_rootip; /* pointer to root directory */ + struct xfs_inode *m_metadirip; /* ptr to metadata directory */ struct xfs_quotainfo *m_quotainfo; /* disk quota information */ xfs_buftarg_t *m_ddev_targp; /* saves taking the address */ xfs_buftarg_t *m_logdev_targp;/* ptr to log device */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
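The superblock conversion in this patch reuses the ondisk sb_rbmino slot to carry the metadata directory root when the metadir feature bit is set. Below is a rough userspace sketch of that round trip. The toy_* structs, the feature flag value, and the byte-swap helpers are all stand-ins, and the toy keeps the feature word in host byte order for brevity, which the real code does not.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_NULLINO		UINT64_MAX
#define TOY_FEAT_METADIR	(1u << 0)	/* hypothetical flag value */

/* Toy on-disk superblock: inode fields are stored big-endian. */
struct toy_dsb {
	uint32_t features_incompat;
	uint64_t rbmino;
	uint64_t rsumino;
};

/* Toy in-core superblock: has a separate metadirino field. */
struct toy_sb {
	uint32_t features_incompat;
	uint64_t metadirino;
	uint64_t rbmino;
	uint64_t rsumino;
};

/* Portable stand-in for cpu_to_be64(). */
static uint64_t
toy_cpu_to_be64(uint64_t v)
{
	unsigned char b[8];
	uint64_t out;

	for (int i = 0; i < 8; i++)
		b[i] = (unsigned char)(v >> (56 - 8 * i));
	memcpy(&out, b, sizeof(out));
	return out;
}

/* Portable stand-in for be64_to_cpu(). */
static uint64_t
toy_be64_to_cpu(uint64_t v)
{
	unsigned char b[8];
	uint64_t out = 0;

	memcpy(b, &v, sizeof(b));
	for (int i = 0; i < 8; i++)
		out = (out << 8) | b[i];
	return out;
}

static void
toy_sb_to_disk(struct toy_dsb *to, const struct toy_sb *from)
{
	to->features_incompat = from->features_incompat;
	to->rbmino = toy_cpu_to_be64(from->rbmino);
	to->rsumino = toy_cpu_to_be64(from->rsumino);

	if (from->features_incompat & TOY_FEAT_METADIR) {
		/* Reuse the rbmino slot; null out the obsolete pointer. */
		to->rbmino = toy_cpu_to_be64(from->metadirino);
		to->rsumino = toy_cpu_to_be64(TOY_NULLINO);
	}
}

static void
toy_sb_from_disk(struct toy_sb *to, const struct toy_dsb *from)
{
	to->features_incompat = from->features_incompat;
	to->metadirino = TOY_NULLINO;
	to->rbmino = toy_be64_to_cpu(from->rbmino);
	to->rsumino = toy_be64_to_cpu(from->rsumino);

	if (to->features_incompat & TOY_FEAT_METADIR) {
		/* The slot holds the metadir root; others get loaded later. */
		to->metadirino = toy_be64_to_cpu(from->rbmino);
		to->rbmino = TOY_NULLINO;
		to->rsumino = TOY_NULLINO;
	}
}
```

Without the feature bit the toy leaves both fields alone, so older filesystems round-trip unchanged.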
* [PATCH 05/23] xfs: convert all users to xfs_imeta_log 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 09/23] xfs: load metadata directory root at mount time Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 06/23] xfs: iget for metadata inodes Darrick J. Wong ` (18 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert all open-coded sb metadata inode pointer logging to use xfs_imeta_log. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_qm.c | 85 +++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 57 insertions(+), 28 deletions(-) diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 4c629a3bc69e..0f193e85294b 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -27,6 +27,7 @@ #include "xfs_ialloc.h" #include "xfs_log_priv.h" #include "xfs_health.h" +#include "xfs_imeta.h" /* * The global quota manager. There is only one of these for the entire @@ -731,6 +732,18 @@ xfs_qm_destroy_quotainfo( mp->m_quotainfo = NULL; } +static inline const struct xfs_imeta_path * +xfs_qflags_to_imeta( + unsigned int qflags) +{ + if (qflags & XFS_QMOPT_UQUOTA) + return &XFS_IMETA_USRQUOTA; + else if (qflags & XFS_QMOPT_GQUOTA) + return &XFS_IMETA_GRPQUOTA; + else + return &XFS_IMETA_PRJQUOTA; +} + /* * Switch the group and project quota in-core inode pointers if needed. * @@ -738,6 +751,12 @@ xfs_qm_destroy_quotainfo( * between gquota and pquota. If the on-disk superblock has GQUOTA and the * filesystem is now mounted with PQUOTA, just use sb_gquotino for sb_pquotino * and vice-versa. 
+ * + * We tolerate the direct manipulation of the in-core sb quota inode pointers + * here because calling xfs_imeta_log is only really required for filesystems + * with the metadata directory feature. That feature requires a v5 superblock, + * which always supports simultaneous group and project quotas, so we'll never + * get here. */ STATIC int xfs_qm_qino_switch( @@ -776,8 +795,13 @@ xfs_qm_qino_switch( if (error) return error; - mp->m_sb.sb_gquotino = NULLFSINO; - mp->m_sb.sb_pquotino = NULLFSINO; + if (flags & XFS_QMOPT_PQUOTA) { + mp->m_sb.sb_gquotino = NULLFSINO; + mp->m_sb.sb_pquotino = ino; + } else if (flags & XFS_QMOPT_GQUOTA) { + mp->m_sb.sb_gquotino = ino; + mp->m_sb.sb_pquotino = NULLFSINO; + } *need_alloc = false; return 0; } @@ -792,7 +816,9 @@ xfs_qm_qino_alloc( struct xfs_inode **ipp, unsigned int flags) { + struct xfs_imeta_update upd; struct xfs_trans *tp; + const struct xfs_imeta_path *path = xfs_qflags_to_imeta(flags); int error; bool need_alloc = true; @@ -802,28 +828,15 @@ xfs_qm_qino_alloc( if (error) return error; - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_create, - need_alloc ? XFS_QM_QINOCREATE_SPACE_RES(mp) : 0, + error = xfs_imeta_start_update(mp, path, &upd); + if (error) + return error; + + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_imeta_create, + need_alloc ? xfs_imeta_create_space_res(mp) : 0, 0, 0, &tp); if (error) - return error; - - if (need_alloc) { - struct xfs_icreate_args args = { - .nlink = 1, - }; - xfs_ino_t ino; - - xfs_icreate_args_rootfile(&args, S_IFREG); - - error = xfs_dialloc(&tp, 0, S_IFREG, &ino); - if (!error) - error = xfs_icreate(tp, ino, &args, ipp); - if (error) { - xfs_trans_cancel(tp); - return error; - } - } + goto out_end; /* * Make the changes in the superblock, and log those too. 
@@ -842,22 +855,38 @@ xfs_qm_qino_alloc( /* qflags will get updated fully _after_ quotacheck */ mp->m_sb.sb_qflags = mp->m_qflags & XFS_ALL_QUOTA_ACCT; } - if (flags & XFS_QMOPT_UQUOTA) - mp->m_sb.sb_uquotino = (*ipp)->i_ino; - else if (flags & XFS_QMOPT_GQUOTA) - mp->m_sb.sb_gquotino = (*ipp)->i_ino; - else - mp->m_sb.sb_pquotino = (*ipp)->i_ino; spin_unlock(&mp->m_sb_lock); xfs_log_sb(tp); + if (need_alloc) { + error = xfs_imeta_create(&tp, path, S_IFREG, + XFS_IMETA_CREATE_NOQUOTA, ipp, &upd); + if (error) + goto out_cancel; + } + error = xfs_trans_commit(tp); if (error) { ASSERT(xfs_is_shutdown(mp)); xfs_alert(mp, "%s failed (error %d)!", __func__, error); + goto out_end; } + + xfs_imeta_end_update(mp, &upd, error); if (need_alloc) xfs_finish_inode_setup(*ipp); + return 0; + +out_cancel: + xfs_trans_cancel(tp); +out_end: + /* Have to finish setting up the inode to ensure it's deleted. */ + if (*ipp) { + xfs_finish_inode_setup(*ipp); + xfs_irele(*ipp); + *ipp = NULL; + } + xfs_imeta_end_update(mp, &upd, error); return error; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
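The error unwinding in xfs_qm_qino_alloc is easier to follow in isolation. The sketch below models only the ordering: a create step that may hand back an inode even when it fails, then cancel, then finish-setup-and-release, then the end-of-update hook. The toy_* types and counters are hypothetical stand-ins, not kernel interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_inode {
	bool setup_done;	/* stands in for xfs_finish_inode_setup() */
	bool released;		/* stands in for xfs_irele() */
};

struct toy_stats {
	int cancelled;		/* transaction cancels */
	int ended;		/* end-of-update calls */
};

/* Like the create step: may pass an inode back even when it fails. */
static int
toy_create(bool fail, struct toy_inode *backing, struct toy_inode **ipp)
{
	*ipp = backing;
	return fail ? -EIO : 0;
}

static int
toy_qino_alloc(bool fail_create, struct toy_inode *backing,
	       struct toy_inode **ipp, struct toy_stats *st)
{
	int error;

	*ipp = NULL;

	error = toy_create(fail_create, backing, ipp);
	if (error)
		goto out_cancel;

	/* A real caller would commit the transaction here. */
	st->ended++;
	(*ipp)->setup_done = true;
	return 0;

out_cancel:
	st->cancelled++;
	/* Finish setting up the half-made inode so it can be torn down. */
	if (*ipp) {
		(*ipp)->setup_done = true;
		(*ipp)->released = true;
		*ipp = NULL;
	}
	st->ended++;
	return error;
}
```

The point of the pattern is that every exit path calls the end-of-update hook exactly once and never returns a partially initialized inode to the caller.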
* [PATCH 06/23] xfs: iget for metadata inodes 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 05/23] xfs: convert all users to xfs_imeta_log Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 04/23] xfs: refactor the v4 group/project inode pointer switch Darrick J. Wong ` (17 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create an xfs_imeta_iget function for metadata inodes to ensure that we always check that the inobt thinks a metadata inode is in use. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_imeta.h | 5 +++++ fs/xfs/xfs_icache.c | 36 ++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_inode.c | 8 ++++++++ fs/xfs/xfs_qm.c | 33 +++++++++++++++++---------------- fs/xfs/xfs_qm_syscalls.c | 4 +++- fs/xfs/xfs_rtalloc.c | 16 ++++++++++------ 6 files changed, 79 insertions(+), 23 deletions(-) diff --git a/fs/xfs/libxfs/xfs_imeta.h b/fs/xfs/libxfs/xfs_imeta.h index 9d54cb0d7962..312e3a6fdb96 100644 --- a/fs/xfs/libxfs/xfs_imeta.h +++ b/fs/xfs/libxfs/xfs_imeta.h @@ -48,4 +48,9 @@ int xfs_imeta_mount(struct xfs_mount *mp); unsigned int xfs_imeta_create_space_res(struct xfs_mount *mp); unsigned int xfs_imeta_unlink_space_res(struct xfs_mount *mp); +/* Must be implemented by the libxfs client */ +int xfs_imeta_iget(struct xfs_mount *mp, xfs_ino_t ino, unsigned char ftype, + struct xfs_inode **ipp); +void xfs_imeta_irele(struct xfs_inode *ip); + #endif /* __XFS_IMETA_H__ */ diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index 06b3de67d791..bccdaf51cd67 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -25,6 +25,9 @@ #include "xfs_ag.h" #include "xfs_log_priv.h" #include "xfs_health.h" +#include "xfs_da_format.h" +#include "xfs_dir2.h" +#include "xfs_imeta.h" 
#include <linux/iversion.h> @@ -905,6 +908,39 @@ xfs_icache_inode_is_allocated( return error; } +/* Get a metadata inode. The ftype must match exactly. */ +int +xfs_imeta_iget( + struct xfs_mount *mp, + xfs_ino_t ino, + unsigned char ftype, + struct xfs_inode **ipp) +{ + struct xfs_inode *ip; + int error; + + ASSERT(ftype != XFS_DIR3_FT_UNKNOWN); + + error = xfs_iget(mp, NULL, ino, XFS_IGET_UNTRUSTED, 0, &ip); + if (error == -EFSCORRUPTED) + goto whine; + if (error) + return error; + + if (VFS_I(ip)->i_nlink == 0) + goto bad_rele; + if (xfs_mode_to_ftype(VFS_I(ip)->i_mode) != ftype) + goto bad_rele; + + *ipp = ip; + return 0; +bad_rele: + xfs_irele(ip); +whine: + xfs_err(mp, "metadata inode 0x%llx is corrupt", ino); + return -EFSCORRUPTED; +} + /* * Grab the inode for reclaim exclusively. * diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index fdd5e5c89e62..83127fed2b10 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -40,6 +40,7 @@ #include "xfs_log_priv.h" #include "xfs_health.h" #include "xfs_inode_util.h" +#include "xfs_imeta.h" struct kmem_cache *xfs_inode_cache; @@ -2709,6 +2710,13 @@ xfs_irele( iput(VFS_I(ip)); } +void +xfs_imeta_irele( + struct xfs_inode *ip) +{ + xfs_irele(ip); +} + /* * Ensure all commited transactions touching the inode are written to the log. */ diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 0f193e85294b..8828e8cafca5 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -28,6 +28,7 @@ #include "xfs_log_priv.h" #include "xfs_health.h" #include "xfs_imeta.h" +#include "xfs_da_format.h" /* * The global quota manager. 
There is only one of these for the entire @@ -232,15 +233,15 @@ xfs_qm_unmount_quotas( */ if (mp->m_quotainfo) { if (mp->m_quotainfo->qi_uquotaip) { - xfs_irele(mp->m_quotainfo->qi_uquotaip); + xfs_imeta_irele(mp->m_quotainfo->qi_uquotaip); mp->m_quotainfo->qi_uquotaip = NULL; } if (mp->m_quotainfo->qi_gquotaip) { - xfs_irele(mp->m_quotainfo->qi_gquotaip); + xfs_imeta_irele(mp->m_quotainfo->qi_gquotaip); mp->m_quotainfo->qi_gquotaip = NULL; } if (mp->m_quotainfo->qi_pquotaip) { - xfs_irele(mp->m_quotainfo->qi_pquotaip); + xfs_imeta_irele(mp->m_quotainfo->qi_pquotaip); mp->m_quotainfo->qi_pquotaip = NULL; } } @@ -791,7 +792,7 @@ xfs_qm_qino_switch( if (ino == NULLFSINO) return 0; - error = xfs_iget(mp, NULL, ino, 0, 0, ipp); + error = xfs_imeta_iget(mp, ino, XFS_DIR3_FT_REG_FILE, ipp); if (error) return error; @@ -1576,24 +1577,24 @@ xfs_qm_init_quotainos( if (XFS_IS_UQUOTA_ON(mp) && mp->m_sb.sb_uquotino != NULLFSINO) { ASSERT(mp->m_sb.sb_uquotino > 0); - error = xfs_iget(mp, NULL, mp->m_sb.sb_uquotino, - 0, 0, &uip); + error = xfs_imeta_iget(mp, mp->m_sb.sb_uquotino, + XFS_DIR3_FT_REG_FILE, &uip); if (error) return error; } if (XFS_IS_GQUOTA_ON(mp) && mp->m_sb.sb_gquotino != NULLFSINO) { ASSERT(mp->m_sb.sb_gquotino > 0); - error = xfs_iget(mp, NULL, mp->m_sb.sb_gquotino, - 0, 0, &gip); + error = xfs_imeta_iget(mp, mp->m_sb.sb_gquotino, + XFS_DIR3_FT_REG_FILE, &gip); if (error) goto error_rele; } if (XFS_IS_PQUOTA_ON(mp) && mp->m_sb.sb_pquotino != NULLFSINO) { ASSERT(mp->m_sb.sb_pquotino > 0); - error = xfs_iget(mp, NULL, mp->m_sb.sb_pquotino, - 0, 0, &pip); + error = xfs_imeta_iget(mp, mp->m_sb.sb_pquotino, + XFS_DIR3_FT_REG_FILE, &pip); if (error) goto error_rele; } @@ -1638,11 +1639,11 @@ xfs_qm_init_quotainos( error_rele: if (uip) - xfs_irele(uip); + xfs_imeta_irele(uip); if (gip) - xfs_irele(gip); + xfs_imeta_irele(gip); if (pip) - xfs_irele(pip); + xfs_imeta_irele(pip); return error; } @@ -1651,15 +1652,15 @@ xfs_qm_destroy_quotainos( struct xfs_quotainfo *qi) 
{ if (qi->qi_uquotaip) { - xfs_irele(qi->qi_uquotaip); + xfs_imeta_irele(qi->qi_uquotaip); qi->qi_uquotaip = NULL; /* paranoia */ } if (qi->qi_gquotaip) { - xfs_irele(qi->qi_gquotaip); + xfs_imeta_irele(qi->qi_gquotaip); qi->qi_gquotaip = NULL; } if (qi->qi_pquotaip) { - xfs_irele(qi->qi_pquotaip); + xfs_imeta_irele(qi->qi_pquotaip); qi->qi_pquotaip = NULL; } } diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c index 392cb39cc10c..30474d67bf82 100644 --- a/fs/xfs/xfs_qm_syscalls.c +++ b/fs/xfs/xfs_qm_syscalls.c @@ -18,6 +18,8 @@ #include "xfs_quota.h" #include "xfs_qm.h" #include "xfs_icache.h" +#include "xfs_imeta.h" +#include "xfs_da_format.h" int xfs_qm_scall_quotaoff( @@ -62,7 +64,7 @@ xfs_qm_scall_trunc_qfile( if (ino == NULLFSINO) return 0; - error = xfs_iget(mp, NULL, ino, 0, 0, &ip); + error = xfs_imeta_iget(mp, ino, XFS_DIR3_FT_REG_FILE, &ip); if (error) return error; diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 883333036519..726e3cec34d5 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -22,6 +22,8 @@ #include "xfs_log_priv.h" #include "xfs_health.h" #include "xfs_trace.h" +#include "xfs_da_format.h" +#include "xfs_imeta.h" /* * Read and return the summary information for a given extent size, @@ -1375,7 +1377,8 @@ xfs_rtmount_inodes( xfs_sb_t *sbp; sbp = &mp->m_sb; - error = xfs_iget(mp, NULL, sbp->sb_rbmino, 0, 0, &mp->m_rbmip); + error = xfs_imeta_iget(mp, mp->m_sb.sb_rbmino, XFS_DIR3_FT_REG_FILE, + &mp->m_rbmip); if (xfs_metadata_is_sick(error)) xfs_rt_mark_sick(mp, XFS_SICK_RT_BITMAP); if (error) @@ -1386,7 +1389,8 @@ xfs_rtmount_inodes( if (error) goto out_rele_bitmap; - error = xfs_iget(mp, NULL, sbp->sb_rsumino, 0, 0, &mp->m_rsumip); + error = xfs_imeta_iget(mp, mp->m_sb.sb_rsumino, XFS_DIR3_FT_REG_FILE, + &mp->m_rsumip); if (xfs_metadata_is_sick(error)) xfs_rt_mark_sick(mp, XFS_SICK_RT_SUMMARY); if (error) @@ -1401,9 +1405,9 @@ xfs_rtmount_inodes( return 0; out_rele_summary: - xfs_irele(mp->m_rsumip); 
+ xfs_imeta_irele(mp->m_rsumip); out_rele_bitmap: - xfs_irele(mp->m_rbmip); + xfs_imeta_irele(mp->m_rbmip); return error; } @@ -1413,9 +1417,9 @@ xfs_rtunmount_inodes( { kmem_free(mp->m_rsum_cache); if (mp->m_rbmip) - xfs_irele(mp->m_rbmip); + xfs_imeta_irele(mp->m_rbmip); if (mp->m_rsumip) - xfs_irele(mp->m_rsumip); + xfs_imeta_irele(mp->m_rsumip); } /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
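As a rough illustration of the checks that xfs_imeta_iget applies after the inode has been loaded, the sketch below rejects an inode with a zero link count or a file type that differs from what the caller expected. It does not model the inobt lookup that the untrusted iget performs. The toy_* names, the mode constants, and the error value are stand-ins chosen for the example.

```c
#include <assert.h>

/* Toy file type codes; the real code uses the XFS_DIR3_FT_* values. */
enum toy_ftype {
	TOY_FT_UNKNOWN = 0,
	TOY_FT_REG_FILE,
	TOY_FT_DIR,
};

/* Toy mode bits with the traditional Unix octal layout. */
#define TOY_S_IFMT	0170000u
#define TOY_S_IFDIR	0040000u
#define TOY_S_IFREG	0100000u

/* Placeholder error number for "filesystem is corrupt". */
#define TOY_EFSCORRUPTED	117

struct toy_inode {
	unsigned int mode;
	unsigned int nlink;
};

/* Map mode bits to a file type code, like xfs_mode_to_ftype() does. */
static enum toy_ftype
toy_mode_to_ftype(unsigned int mode)
{
	switch (mode & TOY_S_IFMT) {
	case TOY_S_IFREG:
		return TOY_FT_REG_FILE;
	case TOY_S_IFDIR:
		return TOY_FT_DIR;
	default:
		return TOY_FT_UNKNOWN;
	}
}

/* Validate a loaded metadata inode; the ftype must match exactly. */
static int
toy_imeta_check(const struct toy_inode *ip, enum toy_ftype ftype)
{
	/* An unlinked inode cannot be live metadata. */
	if (ip->nlink == 0)
		return -TOY_EFSCORRUPTED;
	/* A directory where a regular file was expected is corruption. */
	if (toy_mode_to_ftype(ip->mode) != ftype)
		return -TOY_EFSCORRUPTED;
	return 0;
}
```

Passing the expected type in from the caller is what lets the quota and realtime call sites each insist on a regular file while the metadir root insists on a directory.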
* [PATCH 04/23] xfs: refactor the v4 group/project inode pointer switch 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 06/23] xfs: iget for metadata inodes Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 08/23] xfs: update imeta transaction reservations for metadir Darrick J. Wong ` (16 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Refactor the group and project quota inode pointer switcheroo that happens only on v4 filesystems into a separate function prior to enhancing the xfs_qm_qino_alloc function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_qm.c | 90 +++++++++++++++++++++++++++++++++---------------------- 1 file changed, 54 insertions(+), 36 deletions(-) diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index da6c6f0e1ced..4c629a3bc69e 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -731,6 +731,57 @@ xfs_qm_destroy_quotainfo( mp->m_quotainfo = NULL; } +/* + * Switch the group and project quota in-core inode pointers if needed. + * + * On v4 superblocks that don't have separate pquotino, we share an inode + * between gquota and pquota. If the on-disk superblock has GQUOTA and the + * filesystem is now mounted with PQUOTA, just use sb_gquotino for sb_pquotino + * and vice-versa. 
+ */ +STATIC int +xfs_qm_qino_switch( + struct xfs_mount *mp, + struct xfs_inode **ipp, + unsigned int flags, + bool *need_alloc) +{ + xfs_ino_t ino = NULLFSINO; + int error; + + if (xfs_has_pquotino(mp) || + !(flags & (XFS_QMOPT_PQUOTA | XFS_QMOPT_GQUOTA))) + return 0; + + if ((flags & XFS_QMOPT_PQUOTA) && + (mp->m_sb.sb_gquotino != NULLFSINO)) { + ino = mp->m_sb.sb_gquotino; + if (XFS_IS_CORRUPT(mp, mp->m_sb.sb_pquotino != NULLFSINO)) { + xfs_fs_mark_sick(mp, XFS_SICK_FS_PQUOTA); + return -EFSCORRUPTED; + } + } else if ((flags & XFS_QMOPT_GQUOTA) && + (mp->m_sb.sb_pquotino != NULLFSINO)) { + ino = mp->m_sb.sb_pquotino; + if (XFS_IS_CORRUPT(mp, mp->m_sb.sb_gquotino != NULLFSINO)) { + xfs_fs_mark_sick(mp, XFS_SICK_FS_GQUOTA); + return -EFSCORRUPTED; + } + } + + if (ino == NULLFSINO) + return 0; + + error = xfs_iget(mp, NULL, ino, 0, 0, ipp); + if (error) + return error; + + mp->m_sb.sb_gquotino = NULLFSINO; + mp->m_sb.sb_pquotino = NULLFSINO; + *need_alloc = false; + return 0; +} + /* * Create an inode and return with a reference already taken, but unlocked * This is how we create quota inodes @@ -746,43 +797,10 @@ xfs_qm_qino_alloc( bool need_alloc = true; *ipp = NULL; - /* - * With superblock that doesn't have separate pquotino, we - * share an inode between gquota and pquota. If the on-disk - * superblock has GQUOTA and the filesystem is now mounted - * with PQUOTA, just use sb_gquotino for sb_pquotino and - * vice-versa. 
- */ - if (!xfs_has_pquotino(mp) && - (flags & (XFS_QMOPT_PQUOTA|XFS_QMOPT_GQUOTA))) { - xfs_ino_t ino = NULLFSINO; - if ((flags & XFS_QMOPT_PQUOTA) && - (mp->m_sb.sb_gquotino != NULLFSINO)) { - ino = mp->m_sb.sb_gquotino; - if (XFS_IS_CORRUPT(mp, - mp->m_sb.sb_pquotino != NULLFSINO)) { - xfs_fs_mark_sick(mp, XFS_SICK_FS_PQUOTA); - return -EFSCORRUPTED; - } - } else if ((flags & XFS_QMOPT_GQUOTA) && - (mp->m_sb.sb_pquotino != NULLFSINO)) { - ino = mp->m_sb.sb_pquotino; - if (XFS_IS_CORRUPT(mp, - mp->m_sb.sb_gquotino != NULLFSINO)) { - xfs_fs_mark_sick(mp, XFS_SICK_FS_GQUOTA); - return -EFSCORRUPTED; - } - } - if (ino != NULLFSINO) { - error = xfs_iget(mp, NULL, ino, 0, 0, ipp); - if (error) - return error; - mp->m_sb.sb_gquotino = NULLFSINO; - mp->m_sb.sb_pquotino = NULLFSINO; - need_alloc = false; - } - } + error = xfs_qm_qino_switch(mp, ipp, flags, &need_alloc); + if (error) + return error; error = xfs_trans_alloc(mp, &M_RES(mp)->tr_create, need_alloc ? XFS_QM_QINOCREATE_SPACE_RES(mp) : 0, ^ permalink raw reply related [flat|nested] 565+ messages in thread
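The decision made by xfs_qm_qino_switch can be restated as a pure function, which may help when reviewing the refactoring. This sketch only picks the inode number to reuse, or reports corruption. It leaves out the iget call and the superblock update. The toy_* names and flag values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NULLINO	UINT64_MAX

/* Toy quota option flags; values are arbitrary. */
#define TOY_QMOPT_PQUOTA	(1u << 0)
#define TOY_QMOPT_GQUOTA	(1u << 1)
#define TOY_QMOPT_UQUOTA	(1u << 2)

struct toy_sb {
	bool has_pquotino;	/* separate project quota inode field? */
	uint64_t gquotino;
	uint64_t pquotino;
};

/*
 * Pick the existing quota inode to reuse, if any.  Returns 0 with *ino set
 * to the inode to reuse (or TOY_NULLINO when a new one must be allocated),
 * or -1 if both pointers are set on a filesystem that shares one inode.
 */
static int
toy_qino_pick(const struct toy_sb *sb, unsigned int flags, uint64_t *ino)
{
	*ino = TOY_NULLINO;

	if (sb->has_pquotino ||
	    !(flags & (TOY_QMOPT_PQUOTA | TOY_QMOPT_GQUOTA)))
		return 0;

	if ((flags & TOY_QMOPT_PQUOTA) && sb->gquotino != TOY_NULLINO) {
		if (sb->pquotino != TOY_NULLINO)
			return -1;
		*ino = sb->gquotino;
	} else if ((flags & TOY_QMOPT_GQUOTA) &&
		   sb->pquotino != TOY_NULLINO) {
		if (sb->gquotino != TOY_NULLINO)
			return -1;
		*ino = sb->pquotino;
	}
	return 0;
}
```

A result of TOY_NULLINO with a zero return corresponds to the need_alloc case in the patch: nothing to reuse, so the caller allocates a fresh inode.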
* [PATCH 08/23] xfs: update imeta transaction reservations for metadir 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 04/23] xfs: refactor the v4 group/project inode pointer switch Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 07/23] xfs: define the on-disk format for the metadir feature Darrick J. Wong ` (15 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Update the new metadata inode transaction reservations to handle metadata directories if that feature is enabled. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_log_rlimit.c | 9 +++++ fs/xfs/libxfs/xfs_trans_resv.c | 68 +++++++++++++++++++++++++++++++++++++++- 2 files changed, 75 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_log_rlimit.c b/fs/xfs/libxfs/xfs_log_rlimit.c index 9975b93a7412..a7ad8fe5dab9 100644 --- a/fs/xfs/libxfs/xfs_log_rlimit.c +++ b/fs/xfs/libxfs/xfs_log_rlimit.c @@ -48,6 +48,15 @@ xfs_log_calc_trans_resv_for_minlogblocks( { unsigned int rmap_maxlevels = mp->m_rmap_maxlevels; + /* + * The metadata directory tree feature drops the oversized minimum log + * size computations introduced by the original reflink code. 
+ */ + if (xfs_has_metadir(mp)) { + xfs_trans_resv_calc(mp, resv); + return; + } + /* * In the early days of rmap+reflink, we always set the rmap maxlevels * to 9 even if the AG was small enough that it would never grow to diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c index d2716184bd8f..08d5d6e7f554 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -909,6 +909,56 @@ xfs_calc_sb_reservation( return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize); } +/* + * Metadata inode creation needs enough space to create or mkdir a directory, + * plus logging the superblock. + */ +static unsigned int +xfs_calc_imeta_create_resv( + struct xfs_mount *mp, + struct xfs_trans_resv *resp) +{ + unsigned int ret; + + ret = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize); + ret += resp->tr_create.tr_logres; + return ret; +} + +/* Metadata inode creation needs enough rounds to create or mkdir a directory */ +static int +xfs_calc_imeta_create_count( + struct xfs_mount *mp, + struct xfs_trans_resv *resp) +{ + return resp->tr_create.tr_logcount; +} + +/* + * Metadata inode unlink needs enough space to remove a file plus logging the + * superblock. + */ +static unsigned int +xfs_calc_imeta_unlink_resv( + struct xfs_mount *mp, + struct xfs_trans_resv *resp) +{ + unsigned int ret; + + ret = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize); + ret += resp->tr_remove.tr_logres; + return ret; +} + +/* Metadata inode unlink needs enough rounds to remove a file. 
*/ +static int +xfs_calc_imeta_unlink_count( + struct xfs_mount *mp, + struct xfs_trans_resv *resp) +{ + return resp->tr_remove.tr_logcount; +} + void xfs_trans_resv_calc( struct xfs_mount *mp, @@ -1027,6 +1077,20 @@ xfs_trans_resv_calc( resp->tr_qm_dqalloc.tr_logcount += logcount_adj; /* metadata inode creation and unlink */ - resp->tr_imeta_create = resp->tr_create; - resp->tr_imeta_unlink = resp->tr_remove; + if (xfs_has_metadir(mp)) { + resp->tr_imeta_create.tr_logres = + xfs_calc_imeta_create_resv(mp, resp); + resp->tr_imeta_create.tr_logcount = + xfs_calc_imeta_create_count(mp, resp); + resp->tr_imeta_create.tr_logflags |= XFS_TRANS_PERM_LOG_RES; + + resp->tr_imeta_unlink.tr_logres = + xfs_calc_imeta_unlink_resv(mp, resp); + resp->tr_imeta_unlink.tr_logcount = + xfs_calc_imeta_unlink_count(mp, resp); + resp->tr_imeta_unlink.tr_logflags |= XFS_TRANS_PERM_LOG_RES; + } else { + resp->tr_imeta_create = resp->tr_create; + resp->tr_imeta_unlink = resp->tr_remove; + } } ^ permalink raw reply related [flat|nested] 565+ messages in thread
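For readers following along, the reservation arithmetic in patch 08/23 can be modelled outside the kernel. The following is only a rough sketch: the `toy_*` types and the `sb_buf_res` parameter are hypothetical stand-ins for `struct xfs_trans_res`, `struct xfs_trans_resv`, and the result of `xfs_calc_buf_res(1, mp->m_sb.sb_sectsize)`, and the `tr_logflags` handling from the patch is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct xfs_trans_res (flags omitted). */
struct toy_res {
	unsigned int	logres;		/* log bytes reserved per roll */
	int		logcount;	/* number of rolls reserved */
};

/* Hypothetical stand-in for the relevant part of struct xfs_trans_resv. */
struct toy_resv {
	struct toy_res	create;
	struct toy_res	remove;
	struct toy_res	imeta_create;
	struct toy_res	imeta_unlink;
};

/*
 * Model of the if/else added at the end of xfs_trans_resv_calc():
 * with metadir, each imeta reservation is the ordinary create/remove
 * reservation plus room to log the superblock; without it, the imeta
 * reservations are plain copies.
 */
static void
toy_calc_imeta_resv(
	struct toy_resv	*resp,
	bool		has_metadir,
	unsigned int	sb_buf_res)	/* assumed cost of logging the sb */
{
	assert(resp);

	if (has_metadir) {
		resp->imeta_create.logres = sb_buf_res + resp->create.logres;
		resp->imeta_create.logcount = resp->create.logcount;
		resp->imeta_unlink.logres = sb_buf_res + resp->remove.logres;
		resp->imeta_unlink.logcount = resp->remove.logcount;
	} else {
		resp->imeta_create = resp->create;
		resp->imeta_unlink = resp->remove;
	}
}
```

With made-up inputs of 1000 bytes for create, 800 bytes for remove, and 512 bytes for the superblock buffer, the metadir case would reserve 1512 and 1312 bytes respectively while leaving the roll counts unchanged.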
* [PATCH 07/23] xfs: define the on-disk format for the metadir feature 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 08/23] xfs: update imeta transaction reservations for metadir Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 03/23] xfs: create transaction reservations for metadata inode operations Darrick J. Wong ` (14 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Define the on-disk layout and feature flags for the metadata inode directory feature. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 48 ++++++++++++++++++++++++++++++++++++++-- fs/xfs/libxfs/xfs_inode_util.c | 2 ++ fs/xfs/libxfs/xfs_sb.c | 2 ++ fs/xfs/xfs_inode.h | 7 ++++++ fs/xfs/xfs_mount.h | 2 ++ fs/xfs/xfs_super.c | 4 +++ 6 files changed, 63 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index abd75b3091ec..0bd915bd4eed 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -174,6 +174,16 @@ typedef struct xfs_sb { xfs_lsn_t sb_lsn; /* last write sequence */ uuid_t sb_meta_uuid; /* metadata file system unique id */ + /* Fields beyond here do not match xfs_dsb. Be very careful! */ + + /* + * Metadata Directory Inode. On disk this lives in the sb_rbmino slot, + * but we continue to use the in-core superblock to cache the classic + * inodes (rt bitmap; rt summary; user, group, and project quotas) so + * we cache the metadir inode value here too. 
+ */ + xfs_ino_t sb_metadirino; + /* must be padded to 64 bit alignment */ } xfs_sb_t; @@ -190,7 +200,14 @@ struct xfs_dsb { uuid_t sb_uuid; /* user-visible file system unique id */ __be64 sb_logstart; /* starting block of log if internal */ __be64 sb_rootino; /* root inode number */ - __be64 sb_rbmino; /* bitmap inode for realtime extents */ + /* + * bitmap inode for realtime extents. + * + * The metadata directory feature uses the sb_rbmino field to point to + * the root of the metadata directory tree. All other sb inode + * pointers are no longer used. + */ + __be64 sb_rbmino; __be64 sb_rsumino; /* summary inode for rt bitmap */ __be32 sb_rextsize; /* realtime extent size, blocks */ __be32 sb_agblocks; /* size of an allocation group */ @@ -372,6 +389,7 @@ xfs_sb_has_ro_compat_feature( #define XFS_SB_FEAT_INCOMPAT_BIGTIME (1 << 3) /* large timestamps */ #define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4) /* needs xfs_repair */ #define XFS_SB_FEAT_INCOMPAT_NREXT64 (1 << 5) /* large extent counters */ +#define XFS_SB_FEAT_INCOMPAT_METADIR (1U << 31) /* metadata dir tree */ #define XFS_SB_FEAT_INCOMPAT_ALL \ (XFS_SB_FEAT_INCOMPAT_FTYPE| \ XFS_SB_FEAT_INCOMPAT_SPINODES| \ @@ -1078,6 +1096,7 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev) #define XFS_DIFLAG2_COWEXTSIZE_BIT 2 /* copy on write extent size hint */ #define XFS_DIFLAG2_BIGTIME_BIT 3 /* big timestamps */ #define XFS_DIFLAG2_NREXT64_BIT 4 /* large extent counters */ +#define XFS_DIFLAG2_METADATA_BIT 63 /* filesystem metadata */ #define XFS_DIFLAG2_DAX (1 << XFS_DIFLAG2_DAX_BIT) #define XFS_DIFLAG2_REFLINK (1 << XFS_DIFLAG2_REFLINK_BIT) @@ -1085,9 +1104,34 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev) #define XFS_DIFLAG2_BIGTIME (1 << XFS_DIFLAG2_BIGTIME_BIT) #define XFS_DIFLAG2_NREXT64 (1 << XFS_DIFLAG2_NREXT64_BIT) +/* + * The inode contains filesystem metadata and can be found through the metadata + * directory tree. 
Metadata inodes must satisfy the following constraints: + * + * - V5 filesystem (and ftype) are enabled; + * - The only valid modes are regular files and directories; + * - The access bits must be zero; + * - DMAPI event and state masks are zero; + * - The user, group, and project IDs must be zero; + * - The immutable, sync, noatime, nodump, nodefrag flags must be set. + * - The dax flag must not be set. + * - Directories must have nosymlinks set. + * + * These requirements are chosen defensively to minimize the ability of + * userspace to read or modify the contents, should a metadata file ever + * escape to userspace. + * + * There are further constraints on the directory tree itself: + * + * - Metadata inodes must never be resolvable through the root directory; + * - They must never be accessed by userspace; + * - Metadata directory entries must have correct ftype. + */ +#define XFS_DIFLAG2_METADATA (1ULL << XFS_DIFLAG2_METADATA_BIT) + #define XFS_DIFLAG2_ANY \ (XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \ - XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64) + XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_METADATA) static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip) { diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c index 1135bec1328b..7b3e0c79c847 100644 --- a/fs/xfs/libxfs/xfs_inode_util.c +++ b/fs/xfs/libxfs/xfs_inode_util.c @@ -225,6 +225,8 @@ xfs_inode_inherit_flags2( } if (pip->i_diflags2 & XFS_DIFLAG2_DAX) ip->i_diflags2 |= XFS_DIFLAG2_DAX; + if (pip->i_diflags2 & XFS_DIFLAG2_METADATA) + ip->i_diflags2 |= XFS_DIFLAG2_METADATA; /* Don't let invalid cowextsize hints propagate. 
*/ failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize, diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 5b6f5939fda1..345a6fdf8625 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -174,6 +174,8 @@ xfs_sb_version_to_features( features |= XFS_FEAT_NEEDSREPAIR; if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NREXT64) features |= XFS_FEAT_NREXT64; + if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) + features |= XFS_FEAT_METADIR; return features; } diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 571f61930b7b..d45583cd349d 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -264,6 +264,13 @@ static inline bool xfs_is_metadata_inode(struct xfs_inode *ip) { struct xfs_mount *mp = ip->i_mount; + if (xfs_has_metadir(mp)) + return ip->i_diflags2 & XFS_DIFLAG2_METADATA; + + /* + * Before metadata directories, the only metadata inodes were the + * three quota files, the realtime bitmap, and the realtime summary. 
+ */ return ip == mp->m_rbmip || ip == mp->m_rsumip || xfs_is_quota_inode(&mp->m_sb, ip->i_ino); } diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 3b2601ab954d..0fb545e92a26 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -281,6 +281,7 @@ typedef struct xfs_mount { #define XFS_FEAT_BIGTIME (1ULL << 24) /* large timestamps */ #define XFS_FEAT_NEEDSREPAIR (1ULL << 25) /* needs xfs_repair */ #define XFS_FEAT_NREXT64 (1ULL << 26) /* large extent counters */ +#define XFS_FEAT_METADIR (1ULL << 27) /* metadata directory tree */ /* Mount features */ #define XFS_FEAT_NOATTR2 (1ULL << 48) /* disable attr2 creation */ @@ -344,6 +345,7 @@ __XFS_HAS_FEAT(inobtcounts, INOBTCNT) __XFS_HAS_FEAT(bigtime, BIGTIME) __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR) __XFS_HAS_FEAT(large_extent_counts, NREXT64) +__XFS_HAS_FEAT(metadir, METADIR) /* * Mount features diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 4cf26611f46f..9eff9ee106c4 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1640,6 +1640,10 @@ xfs_fs_fill_super( mp->m_features &= ~XFS_FEAT_DISCARD; } + if (xfs_has_metadir(mp)) + xfs_warn(mp, +"EXPERIMENTAL metadata directory feature in use. Use at your own risk!"); + if (xfs_has_reflink(mp)) { if (mp->m_sb.sb_rblocks) { xfs_alert(mp, ^ permalink raw reply related [flat|nested] 565+ messages in thread
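A side note on the flag definitions in patch 07/23: `XFS_DIFLAG2_METADATA` lives at bit 63, which is why it is built with `1ULL` rather than the plain `1` used by the lower flags (shifting a 32-bit `int` by 63 is undefined in C). Below is a small userspace sketch of the flag test and of the inheritance step added to `xfs_inode_inherit_flags2()`; the `TOY_*` names are hypothetical, and only the DAX and METADATA bits are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical copies of two of the di_flags2 bit positions. */
#define TOY_DIFLAG2_DAX_BIT		0
#define TOY_DIFLAG2_METADATA_BIT	63

/* Bit 63 only fits in a 64-bit constant, hence 1ULL for both. */
#define TOY_DIFLAG2_DAX		(1ULL << TOY_DIFLAG2_DAX_BIT)
#define TOY_DIFLAG2_METADATA	(1ULL << TOY_DIFLAG2_METADATA_BIT)

/* Is the metadata flag set in this flags2 word? */
static bool
toy_is_metadata(uint64_t diflags2)
{
	return (diflags2 & TOY_DIFLAG2_METADATA) != 0;
}

/* A new child picks up DAX and METADATA from its parent directory. */
static uint64_t
toy_inherit_flags2(uint64_t parent, uint64_t child)
{
	if (parent & TOY_DIFLAG2_DAX)
		child |= TOY_DIFLAG2_DAX;
	if (parent & TOY_DIFLAG2_METADATA)
		child |= TOY_DIFLAG2_METADATA;
	return child;
}
```

The inheritance step is what lets everything created beneath the metadata directory root carry the flag without each caller setting it by hand.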
* [PATCH 03/23] xfs: create transaction reservations for metadata inode operations 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 07/23] xfs: define the on-disk format for the metadir feature Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 10/23] xfs: convert metadata inode lookup keys to use paths Darrick J. Wong ` (13 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create transaction reservation types and block reservation helpers to help us calculate transaction requirements. Right now the reservations are the same as always; we're just separating the symbols for a future patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_imeta.c | 20 ++++++++++++++++++++ fs/xfs/libxfs/xfs_imeta.h | 3 +++ fs/xfs/libxfs/xfs_trans_resv.c | 4 ++++ fs/xfs/libxfs/xfs_trans_resv.h | 2 ++ 4 files changed, 29 insertions(+) diff --git a/fs/xfs/libxfs/xfs_imeta.c b/fs/xfs/libxfs/xfs_imeta.c index 0a1cd0c5c15b..f14b7892f50d 100644 --- a/fs/xfs/libxfs/xfs_imeta.c +++ b/fs/xfs/libxfs/xfs_imeta.c @@ -19,6 +19,10 @@ #include "xfs_inode.h" #include "xfs_quota.h" #include "xfs_ialloc.h" +#include "xfs_bmap_btree.h" +#include "xfs_da_format.h" +#include "xfs_da_btree.h" +#include "xfs_trans_space.h" /* * Metadata Inode Number Management @@ -436,3 +440,19 @@ xfs_imeta_mount( { return 0; } + +/* Calculate the log block reservation to create a metadata inode. */ +unsigned int +xfs_imeta_create_space_res( + struct xfs_mount *mp) +{ + return XFS_IALLOC_SPACE_RES(mp); +} + +/* Calculate the log block reservation to unlink a metadata inode. 
*/ +unsigned int +xfs_imeta_unlink_space_res( + struct xfs_mount *mp) +{ + return XFS_REMOVE_SPACE_RES(mp); +} diff --git a/fs/xfs/libxfs/xfs_imeta.h b/fs/xfs/libxfs/xfs_imeta.h index b535e19ff1a0..9d54cb0d7962 100644 --- a/fs/xfs/libxfs/xfs_imeta.h +++ b/fs/xfs/libxfs/xfs_imeta.h @@ -45,4 +45,7 @@ int xfs_imeta_start_update(struct xfs_mount *mp, bool xfs_is_static_meta_ino(struct xfs_mount *mp, xfs_ino_t ino); int xfs_imeta_mount(struct xfs_mount *mp); +unsigned int xfs_imeta_create_space_res(struct xfs_mount *mp); +unsigned int xfs_imeta_unlink_space_res(struct xfs_mount *mp); + #endif /* __XFS_IMETA_H__ */ diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c index 5b2f27cbdb80..d2716184bd8f 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -1025,4 +1025,8 @@ xfs_trans_resv_calc( resp->tr_itruncate.tr_logcount += logcount_adj; resp->tr_write.tr_logcount += logcount_adj; resp->tr_qm_dqalloc.tr_logcount += logcount_adj; + + /* metadata inode creation and unlink */ + resp->tr_imeta_create = resp->tr_create; + resp->tr_imeta_unlink = resp->tr_remove; } diff --git a/fs/xfs/libxfs/xfs_trans_resv.h b/fs/xfs/libxfs/xfs_trans_resv.h index 0554b9d775d2..3836c5131b91 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.h +++ b/fs/xfs/libxfs/xfs_trans_resv.h @@ -48,6 +48,8 @@ struct xfs_trans_resv { struct xfs_trans_res tr_qm_dqalloc; /* allocate quota on disk */ struct xfs_trans_res tr_sb; /* modify superblock */ struct xfs_trans_res tr_fsyncts; /* update timestamps on fsync */ + struct xfs_trans_res tr_imeta_create; /* create metadata inode */ + struct xfs_trans_res tr_imeta_unlink; /* unlink metadata inode */ }; /* shorthand way of accessing reservation structure */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 10/23] xfs: convert metadata inode lookup keys to use paths 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 03/23] xfs: create transaction reservations for metadata inode operations Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 15/23] xfs: hide metadata inodes from everyone because they are special Darrick J. Wong ` (12 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert the magic metadata inode lookup keys to use actual strings for paths. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_imeta.c | 48 ++++++++++++++++++++++++--------------------- fs/xfs/libxfs/xfs_imeta.h | 17 ++++++++++++++-- 2 files changed, 41 insertions(+), 24 deletions(-) diff --git a/fs/xfs/libxfs/xfs_imeta.c b/fs/xfs/libxfs/xfs_imeta.c index f14b7892f50d..f35ed01320b3 100644 --- a/fs/xfs/libxfs/xfs_imeta.c +++ b/fs/xfs/libxfs/xfs_imeta.c @@ -49,26 +49,17 @@ */ /* Static metadata inode paths */ - -const struct xfs_imeta_path XFS_IMETA_RTBITMAP = { - .bogus = 0, -}; - -const struct xfs_imeta_path XFS_IMETA_RTSUMMARY = { - .bogus = 1, -}; - -const struct xfs_imeta_path XFS_IMETA_USRQUOTA = { - .bogus = 2, -}; - -const struct xfs_imeta_path XFS_IMETA_GRPQUOTA = { - .bogus = 3, -}; - -const struct xfs_imeta_path XFS_IMETA_PRJQUOTA = { - .bogus = 4, -}; +static const char *rtbitmap_path[] = {"realtime", "bitmap"}; +static const char *rtsummary_path[] = {"realtime", "summary"}; +static const char *usrquota_path[] = {"quota", "user"}; +static const char *grpquota_path[] = {"quota", "group"}; +static const char *prjquota_path[] = {"quota", "project"}; + +XFS_IMETA_DEFINE_PATH(XFS_IMETA_RTBITMAP, rtbitmap_path); +XFS_IMETA_DEFINE_PATH(XFS_IMETA_RTSUMMARY, rtsummary_path); 
+XFS_IMETA_DEFINE_PATH(XFS_IMETA_USRQUOTA, usrquota_path); +XFS_IMETA_DEFINE_PATH(XFS_IMETA_GRPQUOTA, grpquota_path); +XFS_IMETA_DEFINE_PATH(XFS_IMETA_PRJQUOTA, prjquota_path); /* Are these two paths equal? */ STATIC bool @@ -76,7 +67,20 @@ xfs_imeta_path_compare( const struct xfs_imeta_path *a, const struct xfs_imeta_path *b) { - return a == b; + unsigned int i; + + if (a == b) + return true; + + if (a->im_depth != b->im_depth) + return false; + + for (i = 0; i < a->im_depth; i++) + if (a->im_path[i] != b->im_path[i] && + strcmp(a->im_path[i], b->im_path[i])) + return false; + + return true; } /* Is this path ok? */ @@ -84,7 +88,7 @@ static inline bool xfs_imeta_path_check( const struct xfs_imeta_path *path) { - return true; + return path->im_depth <= XFS_IMETA_MAX_DEPTH; } /* Functions for storing and retrieving superblock inode values. */ diff --git a/fs/xfs/libxfs/xfs_imeta.h b/fs/xfs/libxfs/xfs_imeta.h index 312e3a6fdb96..631a88120a70 100644 --- a/fs/xfs/libxfs/xfs_imeta.h +++ b/fs/xfs/libxfs/xfs_imeta.h @@ -6,10 +6,23 @@ #ifndef __XFS_IMETA_H__ #define __XFS_IMETA_H__ +/* How deep can we nest metadata dirs? */ +#define XFS_IMETA_MAX_DEPTH 64 + +/* Form an imeta path from a simple array of strings. */ +#define XFS_IMETA_DEFINE_PATH(name, path) \ +const struct xfs_imeta_path name = { \ + .im_path = (path), \ + .im_depth = ARRAY_SIZE(path), \ +} + /* Key for looking up metadata inodes. */ struct xfs_imeta_path { - /* Temporary: integer to keep the static imeta definitions unique */ - int bogus; + /* Array of string pointers. */ + const char **im_path; + + /* Number of strings in path. */ + unsigned int im_depth; }; /* Cleanup widget for metadata inode creation and deletion. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
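The path comparison introduced in patch 10/23 is easy to try in userspace. This is a minimal sketch, assuming a cut-down `struct toy_path` in place of `struct xfs_imeta_path` and a local `TOY_ARRAY_SIZE` in place of the kernel's `ARRAY_SIZE`; the logic follows the patch: identical pointers match, depths must agree, and each component is compared by pointer first and by `strcmp` second.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))
#define TOY_IMETA_MAX_DEPTH	64

/* Hypothetical cut-down version of struct xfs_imeta_path. */
struct toy_path {
	const char	**im_path;	/* array of component names */
	unsigned int	im_depth;	/* number of components */
};

/* Are these two paths equal? */
static bool
toy_path_compare(
	const struct toy_path	*a,
	const struct toy_path	*b)
{
	unsigned int		i;

	assert(a && b);

	if (a == b)
		return true;
	if (a->im_depth != b->im_depth)
		return false;

	/* Same pointer is a cheap match; otherwise compare the strings. */
	for (i = 0; i < a->im_depth; i++)
		if (a->im_path[i] != b->im_path[i] &&
		    strcmp(a->im_path[i], b->im_path[i]) != 0)
			return false;

	return true;
}

/* Is this path ok?  Only the depth limit is checked. */
static bool
toy_path_check(const struct toy_path *path)
{
	return path->im_depth <= TOY_IMETA_MAX_DEPTH;
}
```

Two separately declared arrays holding `{"quota", "user"}` compare equal under this scheme, whereas the earlier `a == b` test would only have matched the same static object.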
* [PATCH 15/23] xfs: hide metadata inodes from everyone because they are special 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (9 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 10/23] xfs: convert metadata inode lookup keys to use paths Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 13/23] xfs: ensure metadata directory paths exist before creating files Darrick J. Wong ` (11 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Metadata inodes are private files and therefore cannot be exposed to userspace. This means no bulkstat, no open-by-handle, no linking them into the directory tree, and no feeding them to LSMs. As such, we mark them S_PRIVATE, which stops all that. While we're at it, put them in a separate lockdep class so that it won't get confused by "recursive" i_rwsem locking such as what happens when we write to a rt file and need to allocate from the rt bitmap file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/tempfile.c | 8 ++++++++ fs/xfs/xfs_iops.c | 34 ++++++++++++++++++++++++++++++++-- 2 files changed, 40 insertions(+), 2 deletions(-) diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c index beaaebf27284..9ae556fa4b7a 100644 --- a/fs/xfs/scrub/tempfile.c +++ b/fs/xfs/scrub/tempfile.c @@ -805,6 +805,14 @@ xrep_is_tempfile( const struct xfs_inode *ip) { const struct inode *inode = &ip->i_vnode; + struct xfs_mount *mp = ip->i_mount; + + /* + * Files in the metadata directory tree also have S_PRIVATE set and + * IOP_XATTR unset, so we must distinguish them separately. 
+ */ + if (xfs_has_metadir(mp) && (ip->i_diflags2 & XFS_DIFLAG2_METADATA)) + return false; if (IS_PRIVATE(inode) && !(inode->i_opflags & IOP_XATTR)) return true; diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index d580bf591d73..626ce6c4e2bf 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -44,6 +44,15 @@ static struct lock_class_key xfs_nondir_ilock_class; static struct lock_class_key xfs_dir_ilock_class; +/* + * Metadata directories and files are not exposed to userspace, which means + * that they never access any of the VFS IO locks and never experience page + * faults. Give them separate locking classes so that lockdep will not + * complain about conflicts that cannot happen. + */ +static struct lock_class_key xfs_metadata_file_ilock_class; +static struct lock_class_key xfs_metadata_dir_ilock_class; + static int xfs_initxattrs( struct inode *inode, @@ -1270,6 +1279,7 @@ xfs_setup_inode( { struct inode *inode = &ip->i_vnode; gfp_t gfp_mask; + bool is_meta = xfs_is_metadata_inode(ip); inode->i_ino = ip->i_ino; inode->i_state |= I_NEW; @@ -1281,6 +1291,16 @@ xfs_setup_inode( i_size_write(inode, ip->i_disk_size); xfs_diflags_to_iflags(ip, true); + /* + * Mark our metadata files as private so that LSMs and the ACL code + * don't try to add their own metadata or reason about these files, + * and users cannot ever obtain file handles to them. 
+ */ + if (is_meta) { + inode->i_flags |= S_PRIVATE; + inode->i_opflags &= ~IOP_XATTR; + } + if (S_ISDIR(inode->i_mode)) { /* * We set the i_rwsem class here to avoid potential races with @@ -1290,9 +1310,19 @@ xfs_setup_inode( */ lockdep_set_class(&inode->i_rwsem, &inode->i_sb->s_type->i_mutex_dir_key); - lockdep_set_class(&ip->i_lock.mr_lock, &xfs_dir_ilock_class); + if (is_meta) + lockdep_set_class(&ip->i_lock.mr_lock, + &xfs_metadata_dir_ilock_class); + else + lockdep_set_class(&ip->i_lock.mr_lock, + &xfs_dir_ilock_class); } else { - lockdep_set_class(&ip->i_lock.mr_lock, &xfs_nondir_ilock_class); + if (is_meta) + lockdep_set_class(&ip->i_lock.mr_lock, + &xfs_metadata_file_ilock_class); + else + lockdep_set_class(&ip->i_lock.mr_lock, + &xfs_nondir_ilock_class); } /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 13/23] xfs: ensure metadata directory paths exist before creating files 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (10 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 15/23] xfs: hide metadata inodes from everyone because they are special Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 14/23] xfs: disable the agi rotor for metadata inodes Darrick J. Wong ` (10 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Since xfs_imeta_create can create new metadata files arbitrarily deep in the metadata directory tree, we must supply a function that can ensure that all directories in a path exist, and call it before the quota functions create the quota inodes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_imeta.h | 2 + fs/xfs/xfs_inode.c | 103 +++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_qm.c | 16 +++++++ 3 files changed, 121 insertions(+) diff --git a/fs/xfs/libxfs/xfs_imeta.h b/fs/xfs/libxfs/xfs_imeta.h index 9b139f6809f0..741f426c6a4a 100644 --- a/fs/xfs/libxfs/xfs_imeta.h +++ b/fs/xfs/libxfs/xfs_imeta.h @@ -80,5 +80,7 @@ unsigned int xfs_imeta_unlink_space_res(struct xfs_mount *mp); int xfs_imeta_iget(struct xfs_mount *mp, xfs_ino_t ino, unsigned char ftype, struct xfs_inode **ipp); void xfs_imeta_irele(struct xfs_inode *ip); +int xfs_imeta_ensure_dirpath(struct xfs_mount *mp, + const struct xfs_imeta_path *path); #endif /* __XFS_IMETA_H__ */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 3830e03ceb0a..1eb53ed0097d 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1031,6 +1031,109 @@ xfs_create_tmpfile( return error; } +/* Create a metadata directory for the last component of the path. 
*/ +STATIC int +xfs_imeta_mkdir( + struct xfs_mount *mp, + const struct xfs_imeta_path *path) +{ + struct xfs_imeta_update upd; + struct xfs_inode *ip = NULL; + struct xfs_trans *tp = NULL; + struct xfs_dquot *udqp = NULL; + struct xfs_dquot *gdqp = NULL; + struct xfs_dquot *pdqp = NULL; + unsigned int resblks; + int error; + + if (xfs_is_shutdown(mp)) + return -EIO; + + error = xfs_imeta_start_update(mp, path, &upd); + if (error) + return error; + + /* Grab all the root dquots. */ + error = xfs_qm_vop_dqalloc(mp->m_metadirip, GLOBAL_ROOT_UID, + GLOBAL_ROOT_GID, 0, XFS_QMOPT_QUOTALL, &udqp, &gdqp, + &pdqp); + if (error) + goto out_end; + + /* Allocate a transaction to create the last directory. */ + resblks = xfs_imeta_create_space_res(mp); + error = xfs_trans_alloc_icreate(mp, &M_RES(mp)->tr_imeta_create, udqp, + gdqp, pdqp, resblks, &tp); + if (error) + goto out_dqrele; + + /* Create the subdirectory. */ + error = xfs_imeta_create(&tp, path, S_IFDIR, 0, &ip, &upd); + if (error) + goto out_trans_cancel; + + /* + * Attach the dquot(s) to the inodes and modify them incore. + * These ids of the inode couldn't have changed since the new + * inode has been locked ever since it was created. + */ + xfs_qm_vop_create_dqattach(tp, ip, udqp, gdqp, pdqp); + + error = xfs_trans_commit(tp); + + /* + * We don't pass the directory we just created to the caller, so finish + * setting up the inode, then release the dir and the dquots. + */ + goto out_irele; + +out_trans_cancel: + xfs_trans_cancel(tp); +out_irele: + /* Have to finish setting up the inode to ensure it's deleted. */ + if (ip) { + xfs_finish_inode_setup(ip); + xfs_irele(ip); + } + +out_dqrele: + xfs_qm_dqrele(udqp); + xfs_qm_dqrele(gdqp); + xfs_qm_dqrele(pdqp); +out_end: + xfs_imeta_end_update(mp, &upd, error); + return error; +} + +/* + * Make sure that every metadata directory path component exists and is a + * directory. 
+ */ +int +xfs_imeta_ensure_dirpath( + struct xfs_mount *mp, + const struct xfs_imeta_path *path) +{ + struct xfs_imeta_path temp_path = { + .im_path = path->im_path, + .im_depth = 1, + .im_ftype = XFS_DIR3_FT_DIR, + }; + unsigned int i; + int error = 0; + + if (!xfs_has_metadir(mp)) + return 0; + + for (i = 0; i < path->im_depth - 1; i++, temp_path.im_depth++) { + error = xfs_imeta_mkdir(mp, &temp_path); + if (error && error != -EEXIST) + break; + } + + return error == -EEXIST ? 0 : error; +} + int xfs_link( xfs_inode_t *tdp, diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 8828e8cafca5..905765eedcb0 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -829,6 +829,22 @@ xfs_qm_qino_alloc( if (error) return error; + /* + * Ensure the quota directory exists, being careful to disable quotas + * while we do this. We'll have to quotacheck anyway, so the temporary + * undercount of the directory tree shouldn't affect the quota count. + */ + if (xfs_has_metadir(mp)) { + unsigned int old_qflags; + + old_qflags = mp->m_qflags & XFS_ALL_QUOTA_ACCT; + mp->m_qflags &= ~XFS_ALL_QUOTA_ACCT; + error = xfs_imeta_ensure_dirpath(mp, path); + mp->m_qflags |= old_qflags; + if (error) + return error; + } + error = xfs_imeta_start_update(mp, path, &upd); if (error) return error; ^ permalink raw reply related [flat|nested] 565+ messages in thread
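The prefix walk in `xfs_imeta_ensure_dirpath()` from patch 13/23 amounts to "mkdir -p for everything but the last component". A rough userspace sketch follows, assuming a hypothetical callback type `toy_mkdir_fn` in place of `xfs_imeta_mkdir()` and a small state struct that pretends some leading directories already exist. One deliberate difference from the patch: the loop condition here is `prefix.im_depth < path->im_depth` rather than `i < path->im_depth - 1`, because the unsigned subtraction would wrap if a zero-depth path were ever passed in.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical cut-down version of struct xfs_imeta_path. */
struct toy_path {
	const char	**im_path;
	unsigned int	im_depth;
};

/* Hypothetical stand-in for xfs_imeta_mkdir(): 0, -EEXIST, or an error. */
typedef int (*toy_mkdir_fn)(const struct toy_path *prefix, void *priv);

/* Example callback state: the first @existing prefixes already exist. */
struct toy_mkdir_state {
	unsigned int	existing;
	unsigned int	calls;
	unsigned int	created;
};

static int
toy_mkdir(const struct toy_path *prefix, void *priv)
{
	struct toy_mkdir_state	*st = priv;

	st->calls++;
	if (prefix->im_depth <= st->existing)
		return -EEXIST;
	st->created++;
	return 0;
}

/* Make sure every directory component leading up to the file exists. */
static int
toy_ensure_dirpath(
	const struct toy_path	*path,
	toy_mkdir_fn		mkdir_fn,
	void			*priv)
{
	struct toy_path		prefix = {
		.im_path	= path->im_path,
		.im_depth	= 1,
	};
	int			error;

	assert(mkdir_fn);

	/* The last component is the file itself, so stop one short. */
	for (; prefix.im_depth < path->im_depth; prefix.im_depth++) {
		error = mkdir_fn(&prefix, priv);
		if (error && error != -EEXIST)
			return error;	/* real failure: give up */
	}

	return 0;	/* -EEXIST along the way counts as success */
}
```

For a three-component path where only the first directory already exists, the walk makes two mkdir attempts and creates one new directory.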
* [PATCH 14/23] xfs: disable the agi rotor for metadata inodes 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (11 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 13/23] xfs: ensure metadata directory paths exist before creating files Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 12/23] xfs: read and write metadata inode directory Darrick J. Wong ` (9 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Ideally, we'd put all the metadata inodes in one place if we could, so that the metadata all stay reasonably close together instead of spreading out over the disk. Furthermore, if the log is internal we'd probably prefer to keep the metadata near the log. Therefore, disable AGI rotoring for metadata inode allocations. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_ialloc.c | 16 +++++++++------- fs/xfs/libxfs/xfs_ialloc.h | 2 +- fs/xfs/libxfs/xfs_imeta.c | 4 ++-- fs/xfs/scrub/tempfile.c | 2 +- fs/xfs/xfs_inode.c | 4 ++-- fs/xfs/xfs_symlink.c | 2 +- 6 files changed, 16 insertions(+), 14 deletions(-) diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index d4c202da84cb..1d1c3cb0389c 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -1798,26 +1798,28 @@ xfs_dialloc_try_ag( int xfs_dialloc( struct xfs_trans **tpp, - xfs_ino_t parent, + struct xfs_inode *pip, umode_t mode, xfs_ino_t *new_ino) { struct xfs_mount *mp = (*tpp)->t_mountp; + struct xfs_perag *pag; + struct xfs_ino_geometry *igeo = M_IGEO(mp); + xfs_ino_t ino; + xfs_ino_t parent = pip ? 
pip->i_ino : 0; xfs_agnumber_t agno; - int error = 0; xfs_agnumber_t start_agno; - struct xfs_perag *pag; - struct xfs_ino_geometry *igeo = M_IGEO(mp); bool ok_alloc = true; int flags; - xfs_ino_t ino; + int error = 0; /* * Directories, symlinks, and regular files frequently allocate at least * one block, so factor that potential expansion when we examine whether - * an AG has enough space for file creation. + * an AG has enough space for file creation. Try to keep metadata + * files all in the same AG. */ - if (S_ISDIR(mode)) + if (S_ISDIR(mode) && (!pip || !xfs_is_metadata_inode(pip))) start_agno = xfs_ialloc_next_ag(mp); else { start_agno = XFS_INO_TO_AGNO(mp, parent); diff --git a/fs/xfs/libxfs/xfs_ialloc.h b/fs/xfs/libxfs/xfs_ialloc.h index f4dc97bb8e83..adf60dc56e73 100644 --- a/fs/xfs/libxfs/xfs_ialloc.h +++ b/fs/xfs/libxfs/xfs_ialloc.h @@ -36,7 +36,7 @@ xfs_make_iptr(struct xfs_mount *mp, struct xfs_buf *b, int o) * Allocate an inode on disk. Mode is used to tell whether the new inode will * need space, and whether it is a directory. */ -int xfs_dialloc(struct xfs_trans **tpp, xfs_ino_t parent, umode_t mode, +int xfs_dialloc(struct xfs_trans **tpp, struct xfs_inode *dp, umode_t mode, xfs_ino_t *new_ino); int xfs_difree(struct xfs_trans *tp, struct xfs_perag *pag, diff --git a/fs/xfs/libxfs/xfs_imeta.c b/fs/xfs/libxfs/xfs_imeta.c index d3f60150f8ef..07f88df7a7e5 100644 --- a/fs/xfs/libxfs/xfs_imeta.c +++ b/fs/xfs/libxfs/xfs_imeta.c @@ -232,7 +232,7 @@ xfs_imeta_sb_create( return -EEXIST; /* Create a new inode and set the sb pointer. */ - error = xfs_dialloc(tpp, 0, mode, &ino); + error = xfs_dialloc(tpp, NULL, mode, &ino); if (error) return error; error = xfs_icreate(*tpp, ino, &args, ipp); @@ -642,7 +642,7 @@ xfs_imeta_dir_create( * entry pointing to them, but a directory also the "." entry * pointing to itself. 
*/ - error = xfs_dialloc(tpp, dp->i_ino, mode, &ino); + error = xfs_dialloc(tpp, dp, mode, &ino); if (error) goto out_ilock; error = xfs_icreate(*tpp, ino, &args, ipp); diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c index 6efaab50440f..beaaebf27284 100644 --- a/fs/xfs/scrub/tempfile.c +++ b/fs/xfs/scrub/tempfile.c @@ -87,7 +87,7 @@ xrep_tempfile_create( goto out_release_dquots; /* Allocate inode, set up directory. */ - error = xfs_dialloc(&tp, dp->i_ino, mode, &ino); + error = xfs_dialloc(&tp, dp, mode, &ino); if (error) goto out_trans_cancel; error = xfs_icreate(tp, ino, &args, &sc->tempip); diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 1eb53ed0097d..187c6025cfd8 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -864,7 +864,7 @@ xfs_create( * entry pointing to them, but a directory also the "." entry * pointing to itself. */ - error = xfs_dialloc(&tp, dp->i_ino, args->mode, &ino); + error = xfs_dialloc(&tp, dp, args->mode, &ino); if (!error) error = xfs_icreate(tp, ino, args, &ip); if (error) @@ -980,7 +980,7 @@ xfs_create_tmpfile( if (error) goto out_release_dquots; - error = xfs_dialloc(&tp, dp->i_ino, args->mode, &ino); + error = xfs_dialloc(&tp, dp, args->mode, &ino); if (!error) error = xfs_icreate(tp, ino, args, &ip); if (error) diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c index 6dc15f125895..f26ed3eba6cc 100644 --- a/fs/xfs/xfs_symlink.c +++ b/fs/xfs/xfs_symlink.c @@ -158,7 +158,7 @@ xfs_symlink( /* * Allocate an inode for the symlink. */ - error = xfs_dialloc(&tp, dp->i_ino, S_IFLNK, &ino); + error = xfs_dialloc(&tp, dp, S_IFLNK, &ino); if (!error) error = xfs_icreate(tp, ino, &args, &ip); if (error) ^ permalink raw reply related [flat|nested] 565+ messages in thread
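The starting-AG choice that patch 14/23 changes in `xfs_dialloc()` can be modelled in a few lines. This is only an illustrative sketch: `struct toy_mount`, `struct toy_inode`, and the simple round-robin `toy_next_ag()` are hypothetical stand-ins for the real mount structure, inode, and `xfs_ialloc_next_ag()`, and the clamp of an out-of-range AG number to zero is an assumption of this sketch rather than something shown in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the mount and the parent inode. */
struct toy_mount {
	uint32_t	agcount;	/* number of allocation groups */
	uint32_t	agirotor;	/* next AG the rotor will hand out */
};

struct toy_inode {
	uint32_t	agno;		/* AG holding this inode */
	bool		is_metadata;	/* models xfs_is_metadata_inode() */
};

/* Simplified round-robin rotor. */
static uint32_t
toy_next_ag(struct toy_mount *mp)
{
	uint32_t	agno = mp->agirotor;

	mp->agirotor = (agno + 1) % mp->agcount;
	return agno;
}

/*
 * New directories rotate across AGs unless the parent is a metadata
 * inode; everything else starts in the parent's AG (AG 0 if there is
 * no parent).
 */
static uint32_t
toy_pick_start_ag(
	struct toy_mount	*mp,
	const struct toy_inode	*pip,
	bool			is_dir)
{
	uint32_t		agno;

	assert(mp && mp->agcount > 0);

	if (is_dir && (!pip || !pip->is_metadata))
		return toy_next_ag(mp);

	agno = pip ? pip->agno : 0;
	if (agno >= mp->agcount)	/* assumed clamp, see lead-in */
		agno = 0;
	return agno;
}
```

The effect is that a directory created under a metadata parent stays in the parent's AG and leaves the rotor untouched, which is the locality the commit message asks for.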
* [PATCH 12/23] xfs: read and write metadata inode directory 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (12 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 14/23] xfs: disable the agi rotor for metadata inodes Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 16/23] xfs: advertise metadata directory feature Darrick J. Wong ` (8 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Plumb in the bits we need to look up metadata inode numbers from the metadata inode directory and save them back. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_imeta.c | 699 ++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_imeta.h | 17 + fs/xfs/libxfs/xfs_inode_util.c | 2 fs/xfs/libxfs/xfs_trans_resv.c | 8 fs/xfs/xfs_inode.c | 2 fs/xfs/xfs_inode.h | 4 fs/xfs/xfs_trace.h | 41 ++ 7 files changed, 765 insertions(+), 8 deletions(-) diff --git a/fs/xfs/libxfs/xfs_imeta.c b/fs/xfs/libxfs/xfs_imeta.c index f35ed01320b3..d3f60150f8ef 100644 --- a/fs/xfs/libxfs/xfs_imeta.c +++ b/fs/xfs/libxfs/xfs_imeta.c @@ -23,6 +23,9 @@ #include "xfs_da_format.h" #include "xfs_da_btree.h" #include "xfs_trans_space.h" +#include "xfs_dir2.h" +#include "xfs_dir2_priv.h" +#include "xfs_ag.h" /* * Metadata Inode Number Management @@ -43,9 +46,16 @@ * this structure must be passed to xfs_imeta_end_update to free resources that * cannot be freed during the transaction. * - * Right now we only support callers passing in the predefined metadata inode - * paths; the goal is that callers will some day locate metadata inodes based - * on path lookups into a metadata directory structure. + * When the metadata directory tree (metadir) feature is enabled, we can create + * a complex directory tree in which to store metadata inodes. 
Inodes within + * the metadata directory tree should have the "metadata" inode flag set to + * prevent them from being exposed to the outside world. + * + * Callers are expected to take the IOLOCK of metadata directories when + * performing lookups or updates to the tree. They are expected to take the + * ILOCK of any inode in the metadata directory tree (just like regular inodes) + * to synchronize access to that inode. It is not necessary to take the MMAPLOCK + * since metadata inodes should never be exposed to user space. */ /* Static metadata inode paths */ @@ -61,6 +71,11 @@ XFS_IMETA_DEFINE_PATH(XFS_IMETA_USRQUOTA, usrquota_path); XFS_IMETA_DEFINE_PATH(XFS_IMETA_GRPQUOTA, grpquota_path); XFS_IMETA_DEFINE_PATH(XFS_IMETA_PRJQUOTA, prjquota_path); +const struct xfs_imeta_path XFS_IMETA_METADIR = { + .im_depth = 0, + .im_ftype = XFS_DIR3_FT_DIR, +}; + /* Are these two paths equal? */ STATIC bool xfs_imeta_path_compare( @@ -118,6 +133,10 @@ static const struct xfs_imeta_sbmap { .path = &XFS_IMETA_PRJQUOTA, .offset = offsetof(struct xfs_sb, sb_pquotino), }, + { + .path = &XFS_IMETA_METADIR, + .offset = offsetof(struct xfs_sb, sb_metadirino), + }, { NULL, 0 }, }; @@ -289,6 +308,523 @@ xfs_imeta_sb_link( return 0; } +/* Functions for storing and retrieving metadata directory inode values. */ + +static inline void +set_xname( + struct xfs_name *xname, + const struct xfs_imeta_path *path, + unsigned int path_idx, + unsigned char ftype) +{ + xname->name = (const unsigned char *)path->im_path[path_idx]; + xname->len = strlen(path->im_path[path_idx]); + xname->type = ftype; +} + +/* Look up the inode number and filetype for an exact name in a directory.
*/ +static inline int +xfs_imeta_dir_lookup( + struct xfs_inode *dp, + struct xfs_name *xname, + xfs_ino_t *ino) +{ + struct xfs_da_args args = { + .dp = dp, + .geo = dp->i_mount->m_dir_geo, + .name = xname->name, + .namelen = xname->len, + .hashval = xfs_dir2_hashname(dp->i_mount, xname), + .whichfork = XFS_DATA_FORK, + .op_flags = XFS_DA_OP_OKNOENT, + .owner = dp->i_ino, + }; + unsigned int lock_mode; + bool isblock, isleaf; + int error; + + if (xfs_is_shutdown(dp->i_mount)) + return -EIO; + + lock_mode = xfs_ilock_data_map_shared(dp); + if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL) { + error = xfs_dir2_sf_lookup(&args); + goto out_unlock; + } + + /* dir2 functions require that the data fork is loaded */ + error = xfs_iread_extents(NULL, dp, XFS_DATA_FORK); + if (error) + goto out_unlock; + + error = xfs_dir2_isblock(&args, &isblock); + if (error) + goto out_unlock; + + if (isblock) { + error = xfs_dir2_block_lookup(&args); + goto out_unlock; + } + + error = xfs_dir2_isleaf(&args, &isleaf); + if (error) + goto out_unlock; + + if (isleaf) { + error = xfs_dir2_leaf_lookup(&args); + goto out_unlock; + } + + error = xfs_dir2_node_lookup(&args); + +out_unlock: + xfs_iunlock(dp, lock_mode); + if (error == -EEXIST) + error = 0; + if (error) + return error; + + *ino = args.inumber; + xname->type = args.filetype; + return 0; +} +/* + * Given a parent directory @dp, a metadata inode @path and component + * @path_idx, and the expected file type @ftype of the path component, fill out + * the @xname and look up the inode number in the directory, returning it in + * @ino. 
+ */ +static inline int +xfs_imeta_dir_lookup_component( + struct xfs_inode *dp, + struct xfs_name *xname, + xfs_ino_t *ino) +{ + int type_wanted = xname->type; + int error; + + trace_xfs_imeta_dir_lookup_component(dp, xname, NULLFSINO); + + if (!S_ISDIR(VFS_I(dp)->i_mode)) + return -EFSCORRUPTED; + + error = xfs_imeta_dir_lookup(dp, xname, ino); + if (error) + return error; + if (!xfs_verify_ino(dp->i_mount, *ino)) + return -EFSCORRUPTED; + if (type_wanted != XFS_DIR3_FT_UNKNOWN && xname->type != type_wanted) + return -EFSCORRUPTED; + + trace_xfs_imeta_dir_lookup_found(dp, xname, *ino); + return 0; +} + +/* + * Traverse a metadata directory tree path, returning the inode corresponding + * to the parent of the last path component. If any of the path components do + * not exist, return -ENOENT. + */ +STATIC int +xfs_imeta_dir_parent( + struct xfs_mount *mp, + const struct xfs_imeta_path *path, + struct xfs_inode **dpp) +{ + struct xfs_name xname; + struct xfs_inode *dp; + xfs_ino_t ino; + unsigned int i; + int error; + + if (mp->m_metadirip == NULL) + return -ENOENT; + + /* Grab the metadir root. */ + error = xfs_imeta_iget(mp, mp->m_metadirip->i_ino, XFS_DIR3_FT_DIR, + &dp); + if (error) + return error; + + /* Caller wanted the root, we're done! */ + if (path->im_depth == 0) { + *dpp = dp; + return 0; + } + + for (i = 0; i < path->im_depth - 1; i++) { + struct xfs_inode *ip = NULL; + + xfs_ilock(dp, XFS_IOLOCK_SHARED); + + /* Look up the name in the current directory. */ + set_xname(&xname, path, i, XFS_DIR3_FT_DIR); + error = xfs_imeta_dir_lookup_component(dp, &xname, &ino); + if (error) + goto out_rele; + + /* + * Grab the child inode while we still have the parent + * directory locked. 
+ */ + error = xfs_imeta_iget(mp, ino, XFS_DIR3_FT_DIR, &ip); + if (error) + goto out_rele; + + xfs_iunlock(dp, XFS_IOLOCK_SHARED); + xfs_imeta_irele(dp); + dp = ip; + } + + *dpp = dp; + return 0; + +out_rele: + xfs_iunlock(dp, XFS_IOLOCK_SHARED); + xfs_imeta_irele(dp); + return error; +} + +/* + * Look up a metadata inode from the metadata directory. If the last path + * component doesn't exist, return NULLFSINO. If any other part of the path + * does not exist, return -ENOENT so we can distinguish the two. + */ +STATIC int +xfs_imeta_dir_lookup_int( + struct xfs_mount *mp, + const struct xfs_imeta_path *path, + xfs_ino_t *inop) +{ + struct xfs_name xname; + struct xfs_inode *dp = NULL; + xfs_ino_t ino; + int error; + + /* metadir ino is recorded in superblock */ + if (xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) + return xfs_imeta_sb_lookup(mp, path, inop); + + ASSERT(path->im_depth > 0); + + /* Find the parent of the last path component. */ + error = xfs_imeta_dir_parent(mp, path, &dp); + if (error) + return error; + + xfs_ilock(dp, XFS_IOLOCK_SHARED); + + /* Look up the name in the current directory. */ + set_xname(&xname, path, path->im_depth - 1, path->im_ftype); + error = xfs_imeta_dir_lookup_component(dp, &xname, &ino); + switch (error) { + case 0: + *inop = ino; + break; + case -ENOENT: + *inop = NULLFSINO; + error = 0; + break; + } + + xfs_iunlock(dp, XFS_IOLOCK_SHARED); + xfs_imeta_irele(dp); + return error; +} + +/* + * Load all the metadata inode pointers that are cached in the in-core + * superblock but live somewhere in the metadata directory tree. 
+ */ +STATIC int +xfs_imeta_dir_mount( + struct xfs_mount *mp) +{ + const struct xfs_imeta_sbmap *p; + xfs_ino_t *sb_inop; + int err2; + int error = 0; + + for (p = xfs_imeta_sbmaps; p->path && p->path->im_depth > 0; p++) { + if (p->path == &XFS_IMETA_METADIR) + continue; + sb_inop = xfs_imeta_sbmap_to_inop(mp, p); + err2 = xfs_imeta_dir_lookup_int(mp, p->path, sb_inop); + if (err2 == -ENOENT) { + *sb_inop = NULLFSINO; + continue; + } + if (!error && err2) + error = err2; + } + + return error; +} + +/* Set up an inode to be recognized as a metadata inode. */ +void +xfs_imeta_set_metaflag( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + VFS_I(ip)->i_mode &= ~0777; + VFS_I(ip)->i_uid = GLOBAL_ROOT_UID; + VFS_I(ip)->i_gid = GLOBAL_ROOT_GID; + ip->i_projid = 0; + ip->i_diflags |= (XFS_DIFLAG_IMMUTABLE | XFS_DIFLAG_SYNC | + XFS_DIFLAG_NOATIME | XFS_DIFLAG_NODUMP | + XFS_DIFLAG_NODEFRAG); + if (S_ISDIR(VFS_I(ip)->i_mode)) + ip->i_diflags |= XFS_DIFLAG_NOSYMLINKS; + ip->i_diflags2 &= ~XFS_DIFLAG2_DAX; + ip->i_diflags2 |= XFS_DIFLAG2_METADATA; + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); +} + +/* + * Create a new metadata inode accessible via the given metadata directory path. + * Callers must ensure that the directory entry does not already exist; a new + * one will be created. + */ +STATIC int +xfs_imeta_dir_create( + struct xfs_trans **tpp, + const struct xfs_imeta_path *path, + umode_t mode, + unsigned int flags, + struct xfs_inode **ipp, + struct xfs_imeta_update *upd) +{ + struct xfs_icreate_args args = { + .nlink = S_ISDIR(mode) ? 
2 : 1, + }; + struct xfs_name xname; + struct xfs_mount *mp = (*tpp)->t_mountp; + struct xfs_inode *dp = upd->dp; + xfs_ino_t *sb_inop; + xfs_ino_t ino; + unsigned int resblks; + int error; + + xfs_icreate_args_rootfile(&args, mode); + + /* metadir ino is recorded in superblock; only mkfs gets to do this */ + if (xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) { + error = xfs_imeta_sb_create(tpp, path, mode, flags, ipp); + if (error) + return error; + + /* Set the metadata iflag, initialize directory. */ + xfs_imeta_set_metaflag(*tpp, *ipp); + return xfs_dir_init(*tpp, *ipp, *ipp); + } + + ASSERT(path->im_depth > 0); + + /* Check that the name does not already exist in the directory. */ + set_xname(&xname, path, path->im_depth - 1, XFS_DIR3_FT_UNKNOWN); + error = xfs_imeta_dir_lookup_component(dp, &xname, &ino); + switch (error) { + case -ENOENT: + break; + case 0: + error = -EEXIST; + fallthrough; + default: + return error; + } + + xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT); + args.pip = dp; + + /* + * A newly created regular or special file just has one directory + * entry pointing to them, but a directory also the "." entry + * pointing to itself. + */ + error = xfs_dialloc(tpp, dp->i_ino, mode, &ino); + if (error) + goto out_ilock; + error = xfs_icreate(*tpp, ino, &args, ipp); + if (error) + goto out_ilock; + xfs_imeta_set_metaflag(*tpp, *ipp); + + /* + * Once we join the parent directory to the transaction we can't + * release it until after the transaction commits or cancels, so we + * must defer releasing it to end_update. This is different from + * regular file creation, where the vfs holds the parent dir reference + * and will free it. The caller is always responsible for releasing + * ipp, even if we failed. + */ + xfs_trans_ijoin(*tpp, dp, XFS_ILOCK_EXCL); + + /* Create the entry. 
*/ + if (S_ISDIR(args.mode)) + resblks = XFS_MKDIR_SPACE_RES(mp, xname.len); + else + resblks = XFS_CREATE_SPACE_RES(mp, xname.len); + xname.type = xfs_mode_to_ftype(args.mode); + trace_xfs_imeta_dir_try_create(dp, &xname, NULLFSINO); + error = xfs_dir_create_new_child(*tpp, resblks, dp, &xname, *ipp); + if (error) + return error; + trace_xfs_imeta_dir_created(*ipp, &xname, ino); + + /* Attach dquots to this file. Caller should have allocated them! */ + if (!(flags & XFS_IMETA_CREATE_NOQUOTA)) { + error = xfs_qm_dqattach_locked(*ipp, false); + if (error) + return error; + xfs_trans_mod_dquot_byino(*tpp, *ipp, XFS_TRANS_DQ_ICOUNT, 1); + } + + /* Update the in-core superblock value if there is one. */ + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (sb_inop) + *sb_inop = ino; + return 0; + +out_ilock: + xfs_iunlock(dp, XFS_ILOCK_EXCL); + return error; +} + +/* + * Remove the given entry from the metadata directory and drop the link count + * of the metadata inode. + */ +STATIC int +xfs_imeta_dir_unlink( + struct xfs_trans **tpp, + const struct xfs_imeta_path *path, + struct xfs_inode *ip, + struct xfs_imeta_update *upd) +{ + struct xfs_name xname; + struct xfs_mount *mp = (*tpp)->t_mountp; + struct xfs_inode *dp = upd->dp; + xfs_ino_t *sb_inop; + xfs_ino_t ino; + unsigned int resblks; + int error; + + /* Metadata directory root cannot be unlinked. */ + if (xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) { + ASSERT(0); + return -EFSCORRUPTED; + } + + ASSERT(path->im_depth > 0); + + /* Look up the name in the current directory. 
*/ + set_xname(&xname, path, path->im_depth - 1, + xfs_mode_to_ftype(VFS_I(ip)->i_mode)); + error = xfs_imeta_dir_lookup_component(dp, &xname, &ino); + switch (error) { + case 0: + if (ino != ip->i_ino) + error = -ENOENT; + break; + case -ENOENT: + error = -EFSCORRUPTED; + break; + } + if (error) + return error; + + xfs_lock_two_inodes(dp, XFS_ILOCK_EXCL, ip, XFS_ILOCK_EXCL); + + /* + * Once we join the parent directory to the transaction we can't + * release it until after the transaction commits or cancels, so we + * must defer releasing it to end_update. This is different from + * regular file removal, where the vfs holds the parent dir reference + * and will free it. The unlink caller is always responsible for + * releasing ip, so we don't need to take care of that. + */ + xfs_trans_ijoin(*tpp, dp, XFS_ILOCK_EXCL); + xfs_trans_ijoin(*tpp, ip, XFS_ILOCK_EXCL); + + resblks = XFS_REMOVE_SPACE_RES(mp); + error = xfs_dir_remove_child(*tpp, resblks, dp, &xname, ip); + if (error) + return error; + trace_xfs_imeta_dir_unlinked(dp, &xname, ip->i_ino); + + /* Update the in-core superblock value if there is one. */ + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (sb_inop) + *sb_inop = NULLFSINO; + return 0; +} + +/* Set the given path in the metadata directory to point to an inode. */ +STATIC int +xfs_imeta_dir_link( + struct xfs_trans *tp, + const struct xfs_imeta_path *path, + struct xfs_inode *ip, + struct xfs_imeta_update *upd) +{ + struct xfs_name xname; + struct xfs_mount *mp = tp->t_mountp; + struct xfs_inode *dp = upd->dp; + xfs_ino_t *sb_inop; + xfs_ino_t ino; + unsigned int resblks; + int error; + + /* Metadata directory root cannot be linked. */ + if (xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) { + ASSERT(0); + return -EFSCORRUPTED; + } + + ASSERT(path->im_depth > 0); + + /* Look up the name in the current directory. 
*/ + set_xname(&xname, path, path->im_depth - 1, + xfs_mode_to_ftype(VFS_I(ip)->i_mode)); + error = xfs_imeta_dir_lookup_component(dp, &xname, &ino); + switch (error) { + case -ENOENT: + break; + case 0: + error = -EEXIST; + fallthrough; + default: + return error; + } + + xfs_lock_two_inodes(ip, XFS_ILOCK_EXCL, dp, XFS_ILOCK_EXCL); + + /* + * Once we join the parent directory to the transaction we can't + * release it until after the transaction commits or cancels, so we + * must defer releasing it to end_update. This is different from + * regular file linking, where the vfs holds the parent dir reference + * and will free it. The link caller is always responsible for + * releasing ip, so we don't need to take care of that. + */ + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); + xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL); + + resblks = XFS_LINK_SPACE_RES(mp, xname.len); + error = xfs_dir_link_existing_child(tp, resblks, dp, &xname, ip); + if (error) + return error; + + trace_xfs_imeta_dir_link(dp, &xname, ip->i_ino); + + /* Update the in-core superblock value if there is one. */ + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (sb_inop) + *sb_inop = ip->i_ino; + return 0; +} + /* General functions for managing metadata inode pointers */ /* @@ -318,7 +854,13 @@ xfs_imeta_lookup( ASSERT(xfs_imeta_path_check(path)); - error = xfs_imeta_sb_lookup(mp, path, &ino); + if (xfs_has_metadir(mp)) { + error = xfs_imeta_dir_lookup_int(mp, path, &ino); + if (error == -ENOENT) + return -EFSCORRUPTED; + } else { + error = xfs_imeta_sb_lookup(mp, path, &ino); + } if (error) return error; @@ -351,12 +893,49 @@ xfs_imeta_create( struct xfs_inode **ipp, struct xfs_imeta_update *upd) { + struct xfs_mount *mp = (*tpp)->t_mountp; + ASSERT(xfs_imeta_path_check(path)); *ipp = NULL; + if (xfs_has_metadir(mp)) + return xfs_imeta_dir_create(tpp, path, mode, flags, ipp, + upd); return xfs_imeta_sb_create(tpp, path, mode, flags, ipp); } +/* Free a file from the metadata directory tree.
*/ +STATIC int +xfs_imeta_ifree( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_perag *pag; + struct xfs_icluster xic = { 0 }; + int error; + + ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); + ASSERT(VFS_I(ip)->i_nlink == 0); + ASSERT(ip->i_df.if_nextents == 0); + ASSERT(ip->i_disk_size == 0 || !S_ISREG(VFS_I(ip)->i_mode)); + ASSERT(ip->i_nblocks == 0); + + pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino)); + + error = xfs_dir_ifree(tp, pag, ip, &xic); + if (error) + goto out; + + /* Metadata files do not support ownership changes or DMAPI. */ + + if (xic.deleted) + error = xfs_ifree_cluster(tp, pag, ip, &xic); +out: + xfs_perag_put(pag); + return error; +} + /* * Unlink a metadata inode @ip from the metadata directory given by @path. The * metadata inode must not be ILOCKed. Upon return, the inode will be ijoined @@ -370,10 +949,28 @@ xfs_imeta_unlink( struct xfs_inode *ip, struct xfs_imeta_update *upd) { + struct xfs_mount *mp = (*tpp)->t_mountp; + int error; + ASSERT(xfs_imeta_path_check(path)); ASSERT(xfs_imeta_verify((*tpp)->t_mountp, ip->i_ino)); - return xfs_imeta_sb_unlink(tpp, path, ip); + if (xfs_has_metadir(mp)) + error = xfs_imeta_dir_unlink(tpp, path, ip, upd); + else + error = xfs_imeta_sb_unlink(tpp, path, ip); + if (error) + return error; + + /* + * Metadata files require explicit resource cleanup. In other words, + * the inactivation system will not touch these files, so we must free + * the ondisk inode by ourselves if warranted. 
+ */ + if (VFS_I(ip)->i_nlink > 0) + return 0; + + return xfs_imeta_ifree(*tpp, ip); } /* @@ -388,8 +985,12 @@ xfs_imeta_link( struct xfs_inode *ip, struct xfs_imeta_update *upd) { + struct xfs_mount *mp = tp->t_mountp; + ASSERT(xfs_imeta_path_check(path)); + if (xfs_has_metadir(mp)) + return xfs_imeta_dir_link(tp, path, ip, upd); return xfs_imeta_sb_link(tp, path, ip); } @@ -404,6 +1005,14 @@ xfs_imeta_end_update( int error) { trace_xfs_imeta_end_update(mp, error, __return_address); + + if (upd->dp) { + if (upd->lock_mode) + xfs_iunlock(upd->dp, upd->lock_mode); + xfs_imeta_irele(upd->dp); + } + upd->lock_mode = 0; + upd->dp = NULL; } /* Start setting up for a metadata directory tree operation. */ @@ -413,9 +1022,32 @@ xfs_imeta_start_update( const struct xfs_imeta_path *path, struct xfs_imeta_update *upd) { + int error; + trace_xfs_imeta_start_update(mp, 0, __return_address); memset(upd, 0, sizeof(struct xfs_imeta_update)); + + /* Metadir root directory does not have a parent. */ + if (!xfs_has_metadir(mp) || + xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) + return 0; + + ASSERT(path->im_depth > 0); + + /* + * Find the parent of the last path component. If the parent path does + * not exist, we consider this corruption because paths are supposed + * to exist. 
+ */ + error = xfs_imeta_dir_parent(mp, path, &upd->dp); + if (error == -ENOENT) + return -EFSCORRUPTED; + if (error) + return error; + + xfs_ilock(upd->dp, XFS_IOLOCK_EXCL | XFS_IOLOCK_PARENT); + upd->lock_mode = XFS_IOLOCK_EXCL; return 0; } @@ -442,6 +1074,9 @@ int xfs_imeta_mount( struct xfs_mount *mp) { + if (xfs_has_metadir(mp)) + return xfs_imeta_dir_mount(mp); + return 0; } @@ -450,6 +1085,9 @@ unsigned int xfs_imeta_create_space_res( struct xfs_mount *mp) { + if (xfs_has_metadir(mp)) + return max(XFS_MKDIR_SPACE_RES(mp, NAME_MAX), + XFS_CREATE_SPACE_RES(mp, NAME_MAX)); return XFS_IALLOC_SPACE_RES(mp); } @@ -460,3 +1098,54 @@ xfs_imeta_unlink_space_res( { return XFS_REMOVE_SPACE_RES(mp); } + +/* Clear the metadata iflag if we're unlinking this inode. */ +void +xfs_imeta_droplink( + struct xfs_inode *ip) +{ + if (VFS_I(ip)->i_nlink == 0 && + xfs_has_metadir(ip->i_mount) && + xfs_is_metadata_inode(ip)) + ip->i_diflags2 &= ~XFS_DIFLAG2_METADATA; +} + +/* + * Given a metadata directory update, look up the inode number of the last + * component in the path. + */ +int +xfs_imeta_lookup_update( + struct xfs_mount *mp, + const struct xfs_imeta_path *path, + struct xfs_imeta_update *upd, + xfs_ino_t *inop) +{ + struct xfs_name xname; + xfs_ino_t ino; + int error; + + ASSERT(xfs_isilocked(upd->dp, XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)); + + /* metadir ino is recorded in superblock */ + if (!xfs_has_metadir(mp) || + xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) + return xfs_imeta_sb_lookup(mp, path, inop); + + ASSERT(path->im_depth > 0); + + /* Check that the name does not already exist in the directory. 
*/ + set_xname(&xname, path, path->im_depth - 1, XFS_DIR3_FT_UNKNOWN); + error = xfs_imeta_dir_lookup_component(upd->dp, &xname, &ino); + switch (error) { + case 0: + *inop = ino; + break; + case -ENOENT: + *inop = NULLFSINO; + error = 0; + break; + } + + return error; +} diff --git a/fs/xfs/libxfs/xfs_imeta.h b/fs/xfs/libxfs/xfs_imeta.h index 631a88120a70..9b139f6809f0 100644 --- a/fs/xfs/libxfs/xfs_imeta.h +++ b/fs/xfs/libxfs/xfs_imeta.h @@ -13,6 +13,7 @@ #define XFS_IMETA_DEFINE_PATH(name, path) \ const struct xfs_imeta_path name = { \ .im_path = (path), \ + .im_ftype = XFS_DIR3_FT_REG_FILE, \ .im_depth = ARRAY_SIZE(path), \ } @@ -23,11 +24,18 @@ struct xfs_imeta_path { /* Number of strings in path. */ unsigned int im_depth; + + /* Expected file type. */ + unsigned int im_ftype; }; /* Cleanup widget for metadata inode creation and deletion. */ struct xfs_imeta_update { - /* empty for now */ + /* Parent directory */ + struct xfs_inode *dp; + + /* Parent directory lock mode */ + unsigned int lock_mode; }; /* Lookup keys for static metadata inodes. */ @@ -36,9 +44,15 @@ extern const struct xfs_imeta_path XFS_IMETA_RTSUMMARY; extern const struct xfs_imeta_path XFS_IMETA_USRQUOTA; extern const struct xfs_imeta_path XFS_IMETA_GRPQUOTA; extern const struct xfs_imeta_path XFS_IMETA_PRJQUOTA; +extern const struct xfs_imeta_path XFS_IMETA_METADIR; int xfs_imeta_lookup(struct xfs_mount *mp, const struct xfs_imeta_path *path, xfs_ino_t *ino); +int xfs_imeta_lookup_update(struct xfs_mount *mp, + const struct xfs_imeta_path *path, + struct xfs_imeta_update *upd, xfs_ino_t *inop); + +void xfs_imeta_set_metaflag(struct xfs_trans *tp, struct xfs_inode *ip); /* Don't allocate quota for this file. 
*/ #define XFS_IMETA_CREATE_NOQUOTA (1 << 0) @@ -57,6 +71,7 @@ int xfs_imeta_start_update(struct xfs_mount *mp, bool xfs_is_static_meta_ino(struct xfs_mount *mp, xfs_ino_t ino); int xfs_imeta_mount(struct xfs_mount *mp); +void xfs_imeta_droplink(struct xfs_inode *ip); unsigned int xfs_imeta_create_space_res(struct xfs_mount *mp); unsigned int xfs_imeta_unlink_space_res(struct xfs_mount *mp); diff --git a/fs/xfs/libxfs/xfs_inode_util.c b/fs/xfs/libxfs/xfs_inode_util.c index 7b3e0c79c847..960b315d2b20 100644 --- a/fs/xfs/libxfs/xfs_inode_util.c +++ b/fs/xfs/libxfs/xfs_inode_util.c @@ -23,6 +23,7 @@ #include "xfs_ag.h" #include "xfs_iunlink_item.h" #include "xfs_inode_item.h" +#include "xfs_imeta.h" uint16_t xfs_flags2diflags( @@ -627,6 +628,7 @@ xfs_droplink( xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG); drop_nlink(VFS_I(ip)); + xfs_imeta_droplink(ip); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); if (VFS_I(ip)->i_nlink) diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c index 08d5d6e7f554..791fad6dba74 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -921,7 +921,10 @@ xfs_calc_imeta_create_resv( unsigned int ret; ret = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize); - ret += resp->tr_create.tr_logres; + if (xfs_has_metadir(mp)) + ret += max(resp->tr_create.tr_logres, resp->tr_mkdir.tr_logres); + else + ret += resp->tr_create.tr_logres; return ret; } @@ -931,6 +934,9 @@ xfs_calc_imeta_create_count( struct xfs_mount *mp, struct xfs_trans_resv *resp) { + if (xfs_has_metadir(mp)) + return max(resp->tr_create.tr_logcount, + resp->tr_mkdir.tr_logcount); return resp->tr_create.tr_logcount; } diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 2c140c6d51e7..3830e03ceb0a 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -1796,7 +1796,7 @@ xfs_ifree_mark_inode_stale( * inodes that are in memory - they all must be marked stale and attached to * the cluster buffer. 
*/ -static int +int xfs_ifree_cluster( struct xfs_trans *tp, struct xfs_perag *pag, diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index d45583cd349d..7cf45dd9d86b 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -439,6 +439,7 @@ static inline bool xfs_inode_has_bigrtextents(struct xfs_inode *ip) #define XFS_IOLOCK_SHIFT 16 #define XFS_IOLOCK_MAX_SUBCLASS 3 #define XFS_IOLOCK_DEP_MASK 0x000f0000u +#define XFS_IOLOCK_PARENT (I_MUTEX_PARENT << XFS_IOLOCK_SHIFT) #define XFS_MMAPLOCK_SHIFT 20 #define XFS_MMAPLOCK_NUMORDER 0 @@ -517,6 +518,9 @@ uint xfs_ilock_data_map_shared(struct xfs_inode *); uint xfs_ilock_attr_map_shared(struct xfs_inode *); int xfs_ifree(struct xfs_trans *, struct xfs_inode *); +int xfs_ifree_cluster(struct xfs_trans *tp, struct xfs_perag *pag, + struct xfs_inode *free_ip, + struct xfs_icluster *xic); int xfs_itruncate_extents_flags(struct xfs_trans **, struct xfs_inode *, int, xfs_fsize_t, int); void xfs_iext_realloc(xfs_inode_t *, int, int); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 1a3176932de8..b92efe4eaeae 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -4952,6 +4952,47 @@ DEFINE_IMETA_SB_EVENT(xfs_imeta_sb_link); DEFINE_FS_ERROR_EVENT(xfs_imeta_start_update); DEFINE_FS_ERROR_EVENT(xfs_imeta_end_update); +DECLARE_EVENT_CLASS(xfs_imeta_dir_class, + TP_PROTO(struct xfs_inode *dp, struct xfs_name *name, + xfs_ino_t ino), + TP_ARGS(dp, name, ino), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_ino_t, dp_ino) + __field(xfs_ino_t, ino) + __field(int, ftype) + __field(int, namelen) + __dynamic_array(char, name, name->len) + ), + TP_fast_assign( + __entry->dev = VFS_I(dp)->i_sb->s_dev; + __entry->dp_ino = dp->i_ino; + __entry->ino = ino; + __entry->ftype = name->type; + __entry->namelen = name->len; + memcpy(__get_str(name), name->name, name->len); + ), + TP_printk("dev %d:%d dir 0x%llx type %s name '%.*s' ino 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->dp_ino, +
__print_symbolic(__entry->ftype, XFS_DIR3_FTYPE_STR), + __entry->namelen, + __get_str(name), + __entry->ino) +) + +#define DEFINE_IMETA_DIR_EVENT(name) \ +DEFINE_EVENT(xfs_imeta_dir_class, name, \ + TP_PROTO(struct xfs_inode *dp, struct xfs_name *name, \ + xfs_ino_t ino), \ + TP_ARGS(dp, name, ino)) +DEFINE_IMETA_DIR_EVENT(xfs_imeta_dir_lookup_component); +DEFINE_IMETA_DIR_EVENT(xfs_imeta_dir_lookup_found); +DEFINE_IMETA_DIR_EVENT(xfs_imeta_dir_try_create); +DEFINE_IMETA_DIR_EVENT(xfs_imeta_dir_created); +DEFINE_IMETA_DIR_EVENT(xfs_imeta_dir_unlinked); +DEFINE_IMETA_DIR_EVENT(xfs_imeta_dir_link); + #endif /* _TRACE_XFS_H */ #undef TRACE_INCLUDE_PATH ^ permalink raw reply related [flat|nested] 565+ messages in thread
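The lookup convention in this patch (xfs_imeta_dir_parent walks every component but the last, then xfs_imeta_dir_lookup_int resolves the last one) is easier to reason about with a small model. What follows is a userspace sketch only: the toy_* types, the flat array of directories, and the use of -ENOTDIR where the kernel returns -EFSCORRUPTED are all assumptions for illustration. It reproduces the two outcomes the comments describe: a missing final component yields a null inode number with no error, while a missing intermediate component yields -ENOENT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for NULLFSINO. */
#define TOY_NULLINO	((uint64_t)-1)

struct toy_dirent {
	const char *name;
	uint64_t ino;
};

struct toy_dir {
	uint64_t ino;
	const struct toy_dirent *ents;
	size_t nr;
};

struct toy_fs {
	const struct toy_dir *dirs;	/* every directory in the toy tree */
	size_t nr_dirs;
	uint64_t root_ino;		/* models the metadir root */
};

/* Find the directory with this inode number; NULL if it is not a directory. */
static const struct toy_dir *toy_find_dir(const struct toy_fs *fs, uint64_t ino)
{
	size_t i;

	for (i = 0; i < fs->nr_dirs; i++)
		if (fs->dirs[i].ino == ino)
			return &fs->dirs[i];
	return NULL;
}

/* Exact-name lookup within one directory. */
static int toy_dir_lookup(const struct toy_dir *dp, const char *name,
			  uint64_t *ino)
{
	size_t i;

	for (i = 0; i < dp->nr; i++) {
		if (strcmp(dp->ents[i].name, name) == 0) {
			*ino = dp->ents[i].ino;
			return 0;
		}
	}
	return -ENOENT;
}

/* Walk all but the last path component and return the parent directory. */
static int toy_path_parent(const struct toy_fs *fs, const char *const *path,
			   unsigned int depth, const struct toy_dir **dpp)
{
	const struct toy_dir *dp = toy_find_dir(fs, fs->root_ino);
	uint64_t ino;
	unsigned int i;
	int error;

	if (!dp)
		return -ENOENT;

	for (i = 0; i + 1 < depth; i++) {
		error = toy_dir_lookup(dp, path[i], &ino);
		if (error)
			return error;
		dp = toy_find_dir(fs, ino);
		if (!dp)
			return -ENOTDIR; /* stand-in for -EFSCORRUPTED */
	}

	*dpp = dp;
	return 0;
}

/* Resolve the whole path; a missing last component is not an error. */
static int toy_path_lookup(const struct toy_fs *fs, const char *const *path,
			   unsigned int depth, uint64_t *inop)
{
	const struct toy_dir *dp = NULL;
	int error;

	assert(depth > 0);

	error = toy_path_parent(fs, path, depth, &dp);
	if (error)
		return error;

	error = toy_dir_lookup(dp, path[depth - 1], inop);
	if (error == -ENOENT) {
		*inop = TOY_NULLINO;
		error = 0;
	}
	return error;
}
```

The model leaves out locking and inode references entirely; in the patch, the parent's IOLOCK is held while the child is grabbed, and that hand-over-hand step is what a real reviewer should check.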
* [PATCH 16/23] xfs: advertise metadata directory feature 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (13 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 12/23] xfs: read and write metadata inode directory Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 11/23] xfs: enforce metadata inode flag Darrick J. Wong ` (7 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Advertise the existence of the metadata directory feature; this will be used by scrub to decide if it needs to scan the metadir too. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_fs.h | 1 + fs/xfs/libxfs/xfs_sb.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index a39fd65e6ee0..7de31a6692ae 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -239,6 +239,7 @@ typedef struct xfs_fsop_resblks { #define XFS_FSOP_GEOM_FLAGS_BIGTIME (1 << 21) /* 64-bit nsec timestamps */ #define XFS_FSOP_GEOM_FLAGS_INOBTCNT (1 << 22) /* inobt btree counter */ #define XFS_FSOP_GEOM_FLAGS_NREXT64 (1 << 23) /* large extent counters */ +#define XFS_FSOP_GEOM_FLAGS_METADIR (1 << 30) /* metadata directories */ #define XFS_FSOP_GEOM_FLAGS_ATOMIC_SWAP (1U << 31) /* atomic file extent swap */ /* diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 181bede3b3f6..8ebedfe55b15 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -1233,6 +1233,8 @@ xfs_fs_geometry( geo->flags |= XFS_FSOP_GEOM_FLAGS_NREXT64; if (xfs_swapext_supported(mp)) geo->flags |= XFS_FSOP_GEOM_FLAGS_ATOMIC_SWAP; + if (xfs_has_metadir(mp)) + geo->flags |= XFS_FSOP_GEOM_FLAGS_METADIR; geo->rtsectsize = sbp->sb_blocksize; geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp); ^ permalink raw reply related [flat|nested] 565+ 
messages in thread
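As a rough illustration of how a scrub-like tool might consume the new geometry bit, here is a small sketch. The bit position is copied from the xfs_fs.h hunk in this posting and could change before anything is merged; the toy_* names and the two-value scan plan are hypothetical and are not part of xfs_scrub. Fetching the flags word (for example via the filesystem geometry ioctl) is deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Bit value copied from this posting's xfs_fs.h hunk; treat it as provisional. */
#define TOY_GEOM_FLAGS_METADIR	(1U << 30)

/* Hypothetical scan plans for an illustrative checker. */
enum toy_scan {
	TOY_SCAN_DATA_ONLY,		/* no metadata directory tree present */
	TOY_SCAN_DATA_AND_METADIR,	/* also walk the metadata directory tree */
};

/* Decide whether the metadir needs scanning from the geometry flags word. */
static enum toy_scan toy_plan_scan(uint32_t geo_flags)
{
	if (geo_flags & TOY_GEOM_FLAGS_METADIR)
		return TOY_SCAN_DATA_AND_METADIR;
	return TOY_SCAN_DATA_ONLY;
}
```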
* [PATCH 11/23] xfs: enforce metadata inode flag 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (14 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 16/23] xfs: advertise metadata directory feature Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 23/23] xfs: enable metadata directory feature Darrick J. Wong ` (6 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add checks for the metadata inode flag so that we don't ever leak metadata inodes out to userspace, and we don't ever try to read a regular inode as metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_inode_buf.c | 73 +++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_inode_buf.h | 3 ++ fs/xfs/scrub/common.c | 7 +++- fs/xfs/scrub/inode_repair.c | 10 ++++++ fs/xfs/xfs_icache.c | 2 + fs/xfs/xfs_inode.c | 13 +++++++ 6 files changed, 107 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 454f40b29249..1fb11d0e7eba 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -457,6 +457,73 @@ xfs_dinode_verify_nrext64( return NULL; } +/* + * Validate all the picky requirements we have for a file that claims to be + * filesystem metadata. 
+ */ +xfs_failaddr_t +xfs_dinode_verify_metaflag( + struct xfs_mount *mp, + struct xfs_dinode *dip, + uint16_t mode, + uint16_t flags, + uint64_t flags2) +{ + if (!xfs_has_metadir(mp)) + return __this_address; + + /* V5 filesystem only */ + if (dip->di_version < 3) + return __this_address; + + /* V3 inode fields that are always zero */ + if (dip->di_onlink) + return __this_address; + if ((flags2 & XFS_DIFLAG2_NREXT64) && dip->di_nrext64_pad) + return __this_address; + if (!(flags2 & XFS_DIFLAG2_NREXT64) && dip->di_flushiter) + return __this_address; + + /* Metadata files can only be directories or regular files */ + if (!S_ISDIR(mode) && !S_ISREG(mode)) + return __this_address; + + /* They must have zero access permissions */ + if (mode & 0777) + return __this_address; + + /* DMAPI event and state masks are zero */ + if (dip->di_dmevmask || dip->di_dmstate) + return __this_address; + + /* User, group, and project IDs must be zero */ + if (dip->di_uid || dip->di_gid || + dip->di_projid_lo || dip->di_projid_hi) + return __this_address; + + /* Immutable, sync, noatime, nodump, and nodefrag flags must be set */ + if (!(flags & XFS_DIFLAG_IMMUTABLE)) + return __this_address; + if (!(flags & XFS_DIFLAG_SYNC)) + return __this_address; + if (!(flags & XFS_DIFLAG_NOATIME)) + return __this_address; + if (!(flags & XFS_DIFLAG_NODUMP)) + return __this_address; + if (!(flags & XFS_DIFLAG_NODEFRAG)) + return __this_address; + + /* Directories must have nosymlinks flags set */ + if (S_ISDIR(mode) && !(flags & XFS_DIFLAG_NOSYMLINKS)) + return __this_address; + + /* dax flags2 must not be set */ + if (flags2 & XFS_DIFLAG2_DAX) + return __this_address; + + return NULL; +} + xfs_failaddr_t xfs_dinode_verify( struct xfs_mount *mp, @@ -610,6 +677,12 @@ xfs_dinode_verify( !xfs_has_bigtime(mp)) return __this_address; + if (flags2 & XFS_DIFLAG2_METADATA) { + fa = xfs_dinode_verify_metaflag(mp, dip, mode, flags, flags2); + if (fa) + return fa; + } + return NULL; } diff --git 
a/fs/xfs/libxfs/xfs_inode_buf.h b/fs/xfs/libxfs/xfs_inode_buf.h index 585ed5a110af..94d6e7c018e2 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.h +++ b/fs/xfs/libxfs/xfs_inode_buf.h @@ -28,6 +28,9 @@ int xfs_inode_from_disk(struct xfs_inode *ip, struct xfs_dinode *from); xfs_failaddr_t xfs_dinode_verify(struct xfs_mount *mp, xfs_ino_t ino, struct xfs_dinode *dip); +xfs_failaddr_t xfs_dinode_verify_metaflag(struct xfs_mount *mp, + struct xfs_dinode *dip, uint16_t mode, uint16_t flags, + uint64_t flags2); xfs_failaddr_t xfs_inode_validate_extsize(struct xfs_mount *mp, uint32_t extsize, uint16_t mode, uint16_t flags); xfs_failaddr_t xfs_inode_validate_cowextsize(struct xfs_mount *mp, diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 2fbd8aa01ef7..b9c4f335cd8e 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -862,7 +862,12 @@ xchk_install_handle_inode( struct xfs_scrub *sc, struct xfs_inode *ip) { - if (VFS_I(ip)->i_generation != sc->sm->sm_gen) { + /* + * Only the directories in the metadata directory tree can be scrubbed + * by handle -- files must be checked through an explicit scrub type. 
+ */ + if ((xfs_is_metadata_inode(ip) && !S_ISDIR(VFS_I(ip)->i_mode)) || + VFS_I(ip)->i_generation != sc->sm->sm_gen) { xchk_irele(sc, ip); return -ENOENT; } diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index 6c889c21ddec..e9225536dc65 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -400,6 +400,16 @@ xrep_dinode_flags( dip->di_nrext64_pad = 0; else if (dip->di_version >= 3) dip->di_v3_pad = 0; + + if (flags2 & XFS_DIFLAG2_METADATA) { + xfs_failaddr_t fa; + + fa = xfs_dinode_verify_metaflag(sc->mp, dip, mode, flags, + flags2); + if (fa) + flags2 &= ~XFS_DIFLAG2_METADATA; + } + dip->di_flags = cpu_to_be16(flags); dip->di_flags2 = cpu_to_be64(flags2); } diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index bccdaf51cd67..fc11ae6eae0b 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -931,6 +931,8 @@ xfs_imeta_iget( goto bad_rele; if (xfs_mode_to_ftype(VFS_I(ip)->i_mode) != ftype) goto bad_rele; + if (xfs_has_metadir(mp) && !xfs_is_metadata_inode(ip)) + goto bad_rele; *ipp = ip; return 0; diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 83127fed2b10..2c140c6d51e7 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -572,8 +572,19 @@ xfs_lookup( if (error) goto out_free_name; + /* + * Make sure that a corrupt directory cannot accidentally link to a + * metadata file. + */ + if (XFS_IS_CORRUPT(dp->i_mount, xfs_is_metadata_inode(*ipp))) { + error = -EFSCORRUPTED; + goto out_irele; + } + return 0; +out_irele: + xfs_irele(*ipp); out_free_name: if (ci_name) kmem_free(ci_name->name); @@ -2714,6 +2725,8 @@ void xfs_imeta_irele( struct xfs_inode *ip) { + ASSERT(!xfs_has_metadir(ip->i_mount) || xfs_is_metadata_inode(ip)); + xfs_irele(ip); } ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 23/23] xfs: enable metadata directory feature 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (15 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 11/23] xfs: enforce metadata inode flag Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 22/23] xfs: don't check secondary super inode pointers when metadir enabled Darrick J. Wong ` (5 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Enable the metadata directory feature. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 0bd915bd4eed..33b047f9cf03 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -396,7 +396,8 @@ xfs_sb_has_ro_compat_feature( XFS_SB_FEAT_INCOMPAT_META_UUID| \ XFS_SB_FEAT_INCOMPAT_BIGTIME| \ XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR| \ - XFS_SB_FEAT_INCOMPAT_NREXT64) + XFS_SB_FEAT_INCOMPAT_NREXT64 | \ + XFS_SB_FEAT_INCOMPAT_METADIR) #define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL static inline bool ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 22/23] xfs: don't check secondary super inode pointers when metadir enabled 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (16 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 23/23] xfs: enable metadata directory feature Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 19/23] xfs: record health problems with the metadata directory Darrick J. Wong ` (4 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When metadata directories are enabled, the rt and quota inodes are no longer pointed to by the superblock, so it doesn't make sense to check these. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/agheader.c | 29 +++++++++++++++++++---------- 1 file changed, 19 insertions(+), 10 deletions(-) diff --git a/fs/xfs/scrub/agheader.c b/fs/xfs/scrub/agheader.c index fb2f32a2af5d..e5edba4219fd 100644 --- a/fs/xfs/scrub/agheader.c +++ b/fs/xfs/scrub/agheader.c @@ -144,11 +144,16 @@ xchk_superblock( if (sb->sb_rootino != cpu_to_be64(mp->m_sb.sb_rootino)) xchk_block_set_preen(sc, bp); - if (sb->sb_rbmino != cpu_to_be64(mp->m_sb.sb_rbmino)) - xchk_block_set_preen(sc, bp); + if (xfs_has_metadir(sc->mp)) { + if (sb->sb_rbmino != cpu_to_be64(mp->m_sb.sb_metadirino)) + xchk_block_set_preen(sc, bp); + } else { + if (sb->sb_rbmino != cpu_to_be64(mp->m_sb.sb_rbmino)) + xchk_block_set_preen(sc, bp); - if (sb->sb_rsumino != cpu_to_be64(mp->m_sb.sb_rsumino)) - xchk_block_set_preen(sc, bp); + if (sb->sb_rsumino != cpu_to_be64(mp->m_sb.sb_rsumino)) + xchk_block_set_preen(sc, bp); + } if (sb->sb_rextsize != cpu_to_be32(mp->m_sb.sb_rextsize)) xchk_block_set_corrupt(sc, bp); @@ -225,11 +230,13 @@ xchk_superblock( * sb_icount, sb_ifree, sb_fdblocks, sb_frexents */ - if (sb->sb_uquotino != cpu_to_be64(mp->m_sb.sb_uquotino)) - 
xchk_block_set_preen(sc, bp); + if (!xfs_has_metadir(sc->mp)) { + if (sb->sb_uquotino != cpu_to_be64(mp->m_sb.sb_uquotino)) + xchk_block_set_preen(sc, bp); - if (sb->sb_gquotino != cpu_to_be64(mp->m_sb.sb_gquotino)) - xchk_block_set_preen(sc, bp); + if (sb->sb_gquotino != cpu_to_be64(mp->m_sb.sb_gquotino)) + xchk_block_set_preen(sc, bp); + } /* * Skip the quota flags since repair will force quotacheck. @@ -338,8 +345,10 @@ xchk_superblock( if (sb->sb_spino_align != cpu_to_be32(mp->m_sb.sb_spino_align)) xchk_block_set_corrupt(sc, bp); - if (sb->sb_pquotino != cpu_to_be64(mp->m_sb.sb_pquotino)) - xchk_block_set_preen(sc, bp); + if (!xfs_has_metadir(sc->mp)) { + if (sb->sb_pquotino != cpu_to_be64(mp->m_sb.sb_pquotino)) + xchk_block_set_preen(sc, bp); + } /* Don't care about sb_lsn */ } ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 19/23] xfs: record health problems with the metadata directory 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (17 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 22/23] xfs: don't check secondary super inode pointers when metadir enabled Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 20/23] xfs: scrub metadata directories Darrick J. Wong ` (3 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make a report to the health monitoring subsystem any time we encounter something in the metadata directory tree that looks like corruption. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_fs.h | 1 + fs/xfs/libxfs/xfs_health.h | 4 +++- fs/xfs/libxfs/xfs_imeta.c | 28 ++++++++++++++++++++++------ fs/xfs/xfs_health.c | 1 + fs/xfs/xfs_icache.c | 1 + fs/xfs/xfs_inode.c | 1 + 6 files changed, 29 insertions(+), 7 deletions(-) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 6e0c45fcfeeb..c4995f6557d2 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -197,6 +197,7 @@ struct xfs_fsop_geom { #define XFS_FSOP_GEOM_SICK_RT_SUMMARY (1 << 5) /* realtime summary */ #define XFS_FSOP_GEOM_SICK_QUOTACHECK (1 << 6) /* quota counts */ #define XFS_FSOP_GEOM_SICK_NLINKS (1 << 7) /* inode link counts */ +#define XFS_FSOP_GEOM_SICK_METADIR (1 << 8) /* metadata directory */ /* Output for XFS_FS_COUNTS */ typedef struct xfs_fsop_counts { diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h index 252334bc0488..99d53bae9c13 100644 --- a/fs/xfs/libxfs/xfs_health.h +++ b/fs/xfs/libxfs/xfs_health.h @@ -60,6 +60,7 @@ struct xfs_da_args; #define XFS_SICK_FS_PQUOTA (1 << 3) /* project quota */ #define XFS_SICK_FS_QUOTACHECK (1 << 4) /* quota counts */ #define XFS_SICK_FS_NLINKS (1 << 5) /* inode link 
counts */ +#define XFS_SICK_FS_METADIR (1 << 6) /* metadata directory tree */ /* Observable health issues for realtime volume metadata. */ #define XFS_SICK_RT_BITMAP (1 << 0) /* realtime bitmap */ @@ -96,7 +97,8 @@ struct xfs_da_args; XFS_SICK_FS_GQUOTA | \ XFS_SICK_FS_PQUOTA | \ XFS_SICK_FS_QUOTACHECK | \ - XFS_SICK_FS_NLINKS) + XFS_SICK_FS_NLINKS | \ + XFS_SICK_FS_METADIR) #define XFS_SICK_RT_PRIMARY (XFS_SICK_RT_BITMAP | \ XFS_SICK_RT_SUMMARY) diff --git a/fs/xfs/libxfs/xfs_imeta.c b/fs/xfs/libxfs/xfs_imeta.c index 8960c13117fc..e4db1651d067 100644 --- a/fs/xfs/libxfs/xfs_imeta.c +++ b/fs/xfs/libxfs/xfs_imeta.c @@ -26,6 +26,7 @@ #include "xfs_dir2.h" #include "xfs_dir2_priv.h" #include "xfs_ag.h" +#include "xfs_health.h" /* * Metadata Inode Number Management @@ -405,16 +406,22 @@ xfs_imeta_dir_lookup_component( trace_xfs_imeta_dir_lookup_component(dp, xname, NULLFSINO); - if (!S_ISDIR(VFS_I(dp)->i_mode)) + if (!S_ISDIR(VFS_I(dp)->i_mode)) { + xfs_fs_mark_sick(dp->i_mount, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; + } error = xfs_imeta_dir_lookup(dp, xname, ino); if (error) return error; - if (!xfs_verify_ino(dp->i_mount, *ino)) + if (!xfs_verify_ino(dp->i_mount, *ino)) { + xfs_fs_mark_sick(dp->i_mount, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; - if (type_wanted != XFS_DIR3_FT_UNKNOWN && xname->type != type_wanted) + } + if (type_wanted != XFS_DIR3_FT_UNKNOWN && xname->type != type_wanted) { + xfs_fs_mark_sick(dp->i_mount, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; + } trace_xfs_imeta_dir_lookup_found(dp, xname, *ino); return 0; @@ -713,6 +720,7 @@ xfs_imeta_dir_unlink( /* Metadata directory root cannot be unlinked. 
*/ if (xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) { ASSERT(0); + xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; } @@ -728,6 +736,7 @@ xfs_imeta_dir_unlink( error = -ENOENT; break; case -ENOENT: + xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); error = -EFSCORRUPTED; break; } @@ -779,6 +788,7 @@ xfs_imeta_dir_link( /* Metadata directory root cannot be linked. */ if (xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) { ASSERT(0); + xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; } @@ -856,16 +866,20 @@ xfs_imeta_lookup( if (xfs_has_metadir(mp)) { error = xfs_imeta_dir_lookup_int(mp, path, &ino); - if (error == -ENOENT) + if (error == -ENOENT) { + xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; + } } else { error = xfs_imeta_sb_lookup(mp, path, &ino); } if (error) return error; - if (!xfs_imeta_verify(mp, ino)) + if (!xfs_imeta_verify(mp, ino)) { + xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; + } *inop = ino; return 0; @@ -1041,8 +1055,10 @@ xfs_imeta_start_update( * to exist. 
*/ error = xfs_imeta_dir_parent(mp, path, &upd->dp); - if (error == -ENOENT) + if (error == -ENOENT) { + xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; + } if (error) return error; diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c index 6de8780b208a..61f7a6aca6b1 100644 --- a/fs/xfs/xfs_health.c +++ b/fs/xfs/xfs_health.c @@ -360,6 +360,7 @@ static const struct ioctl_sick_map fs_map[] = { { XFS_SICK_FS_PQUOTA, XFS_FSOP_GEOM_SICK_PQUOTA }, { XFS_SICK_FS_QUOTACHECK, XFS_FSOP_GEOM_SICK_QUOTACHECK }, { XFS_SICK_FS_NLINKS, XFS_FSOP_GEOM_SICK_NLINKS }, + { XFS_SICK_FS_METADIR, XFS_FSOP_GEOM_SICK_METADIR }, { 0, 0 }, }; diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c index fc11ae6eae0b..728065bdfc32 100644 --- a/fs/xfs/xfs_icache.c +++ b/fs/xfs/xfs_icache.c @@ -940,6 +940,7 @@ xfs_imeta_iget( xfs_irele(ip); whine: xfs_err(mp, "metadata inode 0x%llx is corrupt", ino); + xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; } diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 187c6025cfd8..51bceccd8c9a 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -577,6 +577,7 @@ xfs_lookup( * metadata file. */ if (XFS_IS_CORRUPT(dp->i_mount, xfs_is_metadata_inode(*ipp))) { + xfs_fs_mark_sick(dp->i_mount, XFS_SICK_FS_METADIR); error = -EFSCORRUPTED; goto out_irele; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
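The fs_map entry added above translates an internal sick bit into the bit reported through the geometry ioctl. A rough userspace sketch of that table-driven translation follows. The bit positions are copied from the patch, but the demo_ names and the demo_sick_to_ioctl helper are illustrative assumptions, not the kernel's own functions.

```c
#include <assert.h>

/* Bit positions as they appear in the patch; the names are illustrative. */
#define DEMO_SICK_FS_NLINKS	(1u << 5)	/* internal: inode link counts */
#define DEMO_SICK_FS_METADIR	(1u << 6)	/* internal: metadata directory tree */
#define DEMO_GEOM_SICK_NLINKS	(1u << 7)	/* ioctl: inode link counts */
#define DEMO_GEOM_SICK_METADIR	(1u << 8)	/* ioctl: metadata directory */

struct demo_sick_map {
	unsigned int	sick_mask;	/* internal health bit */
	unsigned int	ioctl_mask;	/* bit reported to userspace */
};

/* Terminated by an all-zero entry, like the table in the patch. */
static const struct demo_sick_map demo_fs_map[] = {
	{ DEMO_SICK_FS_NLINKS,	DEMO_GEOM_SICK_NLINKS },
	{ DEMO_SICK_FS_METADIR,	DEMO_GEOM_SICK_METADIR },
	{ 0, 0 },
};

/* Translate a mask of internal sick bits into the ioctl representation. */
static unsigned int
demo_sick_to_ioctl(unsigned int sick)
{
	const struct demo_sick_map	*m;
	unsigned int			out = 0;

	for (m = demo_fs_map; m->sick_mask; m++) {
		if (sick & m->sick_mask)
			out |= m->ioctl_mask;
	}
	return out;
}
```

Keeping the two bit namespaces separate and joining them through a table means that a new health condition needs one new row, which is what the one-line change to fs_map in this patch does.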
* [PATCH 20/23] xfs: scrub metadata directories 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (18 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 19/23] xfs: record health problems with the metadata directory Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 21/23] xfs: teach nlink scrubber to deal with metadata directory roots Darrick J. Wong ` (2 subsequent siblings) 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Teach online scrub about the metadata directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/dir.c | 9 +++++++++ fs/xfs/scrub/dir_repair.c | 6 ++++++ fs/xfs/scrub/parent.c | 18 ++++++++++++++++++ fs/xfs/scrub/parent_repair.c | 37 +++++++++++++++++++++++++++++++------ 4 files changed, 64 insertions(+), 6 deletions(-) diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c index 218cf43cdf93..30636501fb9f 100644 --- a/fs/xfs/scrub/dir.c +++ b/fs/xfs/scrub/dir.c @@ -59,6 +59,15 @@ xchk_dir_check_ftype( if (xfs_mode_to_ftype(VFS_I(ip)->i_mode) != ftype) xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset); + + /* + * Metadata and regular inodes cannot cross trees. This property + * cannot change without a full inode free and realloc cycle, so it's + * safe to check this without holding locks. + */ + if (xfs_is_metadata_inode(ip) ^ xfs_is_metadata_inode(sc->ip)) + xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset); + } /* diff --git a/fs/xfs/scrub/dir_repair.c b/fs/xfs/scrub/dir_repair.c index 7530819e1435..14f34b9d4448 100644 --- a/fs/xfs/scrub/dir_repair.c +++ b/fs/xfs/scrub/dir_repair.c @@ -204,6 +204,12 @@ xrep_dir_salvage_entry( if (error) return 0; + /* Don't mix metadata and regular directory trees. 
*/ + if (xfs_is_metadata_inode(ip) ^ xfs_is_metadata_inode(rd->sc->ip)) { + xchk_irele(sc, ip); + return 0; + } + entry.ftype = xfs_mode_to_ftype(VFS_I(ip)->i_mode); xchk_irele(sc, ip); diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c index 92866f1757be..5af765a8182c 100644 --- a/fs/xfs/scrub/parent.c +++ b/fs/xfs/scrub/parent.c @@ -197,6 +197,16 @@ xchk_parent_validate( goto out_rele; } + /* + * Metadata and regular inodes cannot cross trees. This property + * cannot change without a full inode free and realloc cycle, so it's + * safe to check this without holding locks. + */ + if (xfs_is_metadata_inode(dp) ^ xfs_is_metadata_inode(sc->ip)) { + xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0); + goto out_rele; + } + /* * We prefer to keep the inode locked while we lock and search its * alleged parent for a forward reference. If we can grab the iolock @@ -302,5 +312,13 @@ xchk_parent( return 0; } + /* Is this the metadata root dir? Then '..' must point to itself. */ + if (sc->ip == mp->m_metadirip) { + if (sc->ip->i_ino != mp->m_sb.sb_metadirino || + sc->ip->i_ino != parent_ino) + xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0); + return 0; + } + return xchk_parent_validate(sc, parent_ino); } diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c index ffef5de0fbe2..bba1cd1c7c8b 100644 --- a/fs/xfs/scrub/parent_repair.c +++ b/fs/xfs/scrub/parent_repair.c @@ -135,6 +135,10 @@ xrep_findparent_walk_directory( if (xrep_is_tempfile(dp)) return 0; + /* Don't mix metadata and regular directory trees. */ + if (xfs_is_metadata_inode(dp) ^ xfs_is_metadata_inode(sc->ip)) + return 0; + /* Try to lock dp; if we can, we're ready to scan! */ if (!xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED)) { xfs_ino_t orig_parent, new_parent; @@ -227,15 +231,30 @@ xrep_parent_confirm( }; int error; - /* - * The root directory always points to itself. Unlinked dirs can point - * anywhere, so we point them at the root dir too. 
- */ - if (sc->ip == sc->mp->m_rootip || VFS_I(sc->ip)->i_nlink == 0) { + /* The root directory always points to itself. */ + if (sc->ip == sc->mp->m_rootip) { *parent_ino = sc->mp->m_sb.sb_rootino; return 0; } + /* The metadata root directory always points to itself. */ + if (sc->ip == sc->mp->m_metadirip) { + *parent_ino = sc->mp->m_sb.sb_metadirino; + return 0; + } + + /* + * Unlinked dirs can point anywhere, so we point them at the root dir + * of whichever tree is appropriate. + */ + if (VFS_I(sc->ip)->i_nlink == 0) { + if (xfs_is_metadata_inode(sc->ip)) + *parent_ino = sc->mp->m_sb.sb_metadirino; + else + *parent_ino = sc->mp->m_sb.sb_rootino; + return 0; + } + /* Reject garbage parent inode numbers and self-referential parents. */ if (*parent_ino == NULLFSINO) return 0; @@ -389,8 +408,14 @@ xrep_parent_self_reference( if (sc->ip->i_ino == sc->mp->m_sb.sb_rootino) return sc->mp->m_sb.sb_rootino; - if (VFS_I(sc->ip)->i_nlink == 0) + if (sc->ip->i_ino == sc->mp->m_sb.sb_metadirino) + return sc->mp->m_sb.sb_metadirino; + + if (VFS_I(sc->ip)->i_nlink == 0) { + if (xfs_is_metadata_inode(sc->ip)) + return sc->mp->m_sb.sb_metadirino; return sc->mp->m_sb.sb_rootino; + } return NULLFSINO; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
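Two rules recur throughout the scrub changes above: a directory entry or parent pointer must never cross between the metadata tree and the regular tree, and a tree root or unlinked directory gets its parent forced to the root of its own tree. The sketch below restates both rules in userspace C under stated assumptions. struct demo_node, DEMO_NULLINO, and both helper names are hypothetical, and the inode numbers are passed in rather than read from a mount structure.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical "no such inode" marker for this demo. */
#define DEMO_NULLINO	(~0ULL)

/* Hypothetical stand-in for the few inode properties the rules need. */
struct demo_node {
	unsigned long long	ino;
	unsigned int		nlink;
	bool			is_metadata;
};

/* True if a link between a and b would cross directory trees. */
static bool
demo_crosses_trees(const struct demo_node *a, const struct demo_node *b)
{
	/* For two bools, != gives the same answer as the patch's ^ test. */
	return a->is_metadata != b->is_metadata;
}

/*
 * Return the parent that a directory is forced to have, or DEMO_NULLINO
 * if the parent has to be found by scanning.
 */
static unsigned long long
demo_forced_parent(const struct demo_node *dir,
		   unsigned long long rootino,
		   unsigned long long metadirino)
{
	/* Each tree root points to itself. */
	if (dir->ino == rootino)
		return rootino;
	if (dir->ino == metadirino)
		return metadirino;

	/* Unlinked dirs point at the root of whichever tree they live in. */
	if (dir->nlink == 0)
		return dir->is_metadata ? metadirino : rootino;

	return DEMO_NULLINO;
}
```

A repair walk would call demo_crosses_trees on every candidate parent and skip the ones that return true, which is the role the xfs_is_metadata_inode comparison plays in xrep_findparent_walk_directory.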
* [PATCH 21/23] xfs: teach nlink scrubber to deal with metadata directory roots 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (19 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 20/23] xfs: scrub metadata directories Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 18/23] xfs: enable creation of dynamically allocated metadir path structures Darrick J. Wong 2022-12-30 22:17 ` [PATCH 17/23] xfs: allow bulkstat to return metadata directories Darrick J. Wong 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Enhance the inode link count online fsck code to alter its behavior when it detects metadata directory tree roots, just as it does for the regular root directory. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/nlinks.c | 12 +++++++----- fs/xfs/scrub/nlinks_repair.c | 2 +- 2 files changed, 8 insertions(+), 6 deletions(-) diff --git a/fs/xfs/scrub/nlinks.c b/fs/xfs/scrub/nlinks.c index dca759d27ac4..5325bb0e196e 100644 --- a/fs/xfs/scrub/nlinks.c +++ b/fs/xfs/scrub/nlinks.c @@ -282,7 +282,7 @@ xchk_nlinks_collect_dirent( * Otherwise, increment the number of backrefs pointing back to ino. */ if (dotdot) { - if (dp == sc->mp->m_rootip) + if (dp == sc->mp->m_rootip || dp == sc->mp->m_metadirip) error = xchk_nlinks_update_incore(xnc, ino, 1, 0, 0); else error = xchk_nlinks_update_incore(xnc, ino, 0, 1, 0); @@ -458,9 +458,11 @@ xchk_nlinks_collect( int error; /* Count the rt and quota files that are rooted in the superblock. 
*/ - error = xchk_nlinks_collect_metafiles(xnc); - if (error) - return error; + if (!xfs_has_metadir(sc->mp)) { + error = xchk_nlinks_collect_metafiles(xnc); + if (error) + return error; + } /* * Set up for a potentially lengthy filesystem scan by reducing our @@ -648,7 +650,7 @@ xchk_nlinks_compare_inode( xchk_ino_set_corrupt(sc, ip->i_ino); } - if (ip == sc->mp->m_rootip) { + if (ip == sc->mp->m_rootip || ip == sc->mp->m_metadirip) { /* * For the root of a directory tree, both the '.' and '..' * entries should point to the root directory. The dot entry diff --git a/fs/xfs/scrub/nlinks_repair.c b/fs/xfs/scrub/nlinks_repair.c index f881e5dbd432..055eb4b67053 100644 --- a/fs/xfs/scrub/nlinks_repair.c +++ b/fs/xfs/scrub/nlinks_repair.c @@ -86,7 +86,7 @@ xrep_nlinks_is_orphaned( if (obs->parents != 0) return false; - if (ip == mp->m_rootip || ip == sc->orphanage) + if (ip == mp->m_rootip || ip == sc->orphanage || ip == mp->m_metadirip) return false; return actual_nlink != 0; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 18/23] xfs: enable creation of dynamically allocated metadir path structures 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (20 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 21/23] xfs: teach nlink scrubber to deal with metadata directory roots Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 17/23] xfs: allow bulkstat to return metadata directories Darrick J. Wong 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a few helper functions so that it's possible to allocate xfs_imeta_path objects dynamically, along with dynamically allocated path components. Eventually we're going to want to support paths of the form "/realtime/$rtgroup.rmap", and this is necessary for that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_imeta.c | 43 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_imeta.h | 8 ++++++++ 2 files changed, 51 insertions(+) diff --git a/fs/xfs/libxfs/xfs_imeta.c b/fs/xfs/libxfs/xfs_imeta.c index 07f88df7a7e5..8960c13117fc 100644 --- a/fs/xfs/libxfs/xfs_imeta.c +++ b/fs/xfs/libxfs/xfs_imeta.c @@ -1149,3 +1149,46 @@ xfs_imeta_lookup_update( return error; } + +/* Create a path to a file within the metadata directory tree. 
*/ +int +xfs_imeta_create_file_path( + struct xfs_mount *mp, + unsigned int nr_components, + struct xfs_imeta_path **pathp) +{ + struct xfs_imeta_path *p; + char **components; + + p = kmalloc(sizeof(struct xfs_imeta_path), GFP_KERNEL); + if (!p) + return -ENOMEM; + + components = kvcalloc(nr_components, sizeof(char *), GFP_KERNEL); + if (!components) { + kfree(p); + return -ENOMEM; + } + + p->im_depth = nr_components; + p->im_path = (const char **)components; + p->im_ftype = XFS_DIR3_FT_REG_FILE; + p->im_dynamicmask = 0; + *pathp = p; + return 0; +} + +/* Free a metadata directory tree path. */ +void +xfs_imeta_free_path( + struct xfs_imeta_path *path) +{ + unsigned int i; + + for (i = 0; i < path->im_depth; i++) { + if ((path->im_dynamicmask & (1ULL << i)) && path->im_path[i]) + kfree(path->im_path[i]); + } + kfree(path->im_path); + kfree(path); +} diff --git a/fs/xfs/libxfs/xfs_imeta.h b/fs/xfs/libxfs/xfs_imeta.h index 741f426c6a4a..7840087b71da 100644 --- a/fs/xfs/libxfs/xfs_imeta.h +++ b/fs/xfs/libxfs/xfs_imeta.h @@ -15,6 +15,7 @@ const struct xfs_imeta_path name = { \ .im_path = (path), \ .im_ftype = XFS_DIR3_FT_REG_FILE, \ .im_depth = ARRAY_SIZE(path), \ + .im_dynamicmask = 0, \ } /* Key for looking up metadata inodes. */ @@ -27,6 +28,9 @@ struct xfs_imeta_path { /* Expected file type. */ unsigned int im_ftype; + + /* Each bit corresponds to an element of im_path needing to be freed */ + unsigned long long im_dynamicmask; }; /* Cleanup widget for metadata inode creation and deletion. */ @@ -52,6 +56,10 @@ int xfs_imeta_lookup_update(struct xfs_mount *mp, const struct xfs_imeta_path *path, struct xfs_imeta_update *upd, xfs_ino_t *inop); +int xfs_imeta_create_file_path(struct xfs_mount *mp, + unsigned int nr_components, struct xfs_imeta_path **pathp); +void xfs_imeta_free_path(struct xfs_imeta_path *path); + void xfs_imeta_set_metaflag(struct xfs_trans *tp, struct xfs_inode *ip); /* Don't allocate quota for this file. 
*/ ^ permalink raw reply related [flat|nested] 565+ messages in thread
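The im_dynamicmask field above records which path components were allocated dynamically and therefore need freeing. Here is a minimal userspace sketch of that ownership-bitmask idea, with malloc and free standing in for the kernel allocators. struct demo_path and every demo_path_* helper are hypothetical names; the patch itself adds only the create and free functions, so the two setters are an assumption about how a caller might fill in components.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for struct xfs_imeta_path; not the kernel definition. */
struct demo_path {
	const char		**components;	/* one string per component */
	unsigned int		depth;		/* number of components */
	unsigned long long	dynamicmask;	/* bit i set: slot i is owned */
};

/* Allocate a path with room for nr components; all slots start out NULL. */
static struct demo_path *
demo_path_create(unsigned int nr)
{
	struct demo_path	*p;

	/* A 64-bit mask cannot track more than 64 slots. */
	if (nr == 0 || nr > 64)
		return NULL;

	p = malloc(sizeof(*p));
	if (!p)
		return NULL;

	p->components = calloc(nr, sizeof(char *));
	if (!p->components) {
		free(p);
		return NULL;
	}
	p->depth = nr;
	p->dynamicmask = 0;
	return p;
}

/* Drop whatever slot i holds, freeing it only if the path owns it. */
static void
demo_path_release_slot(struct demo_path *p, unsigned int i)
{
	if (p->dynamicmask & (1ULL << i))
		free((void *)p->components[i]);
	p->components[i] = NULL;
	p->dynamicmask &= ~(1ULL << i);
}

/* Point slot i at a string the path does not own, such as a literal. */
static void
demo_path_set_static(struct demo_path *p, unsigned int i, const char *s)
{
	assert(i < p->depth);
	demo_path_release_slot(p, i);
	p->components[i] = s;
}

/* Copy s into slot i; the path owns the copy.  Returns 0 or -1. */
static int
demo_path_set_dynamic(struct demo_path *p, unsigned int i, const char *s)
{
	size_t	len = strlen(s) + 1;
	char	*copy;

	assert(i < p->depth);
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, s, len);

	demo_path_release_slot(p, i);
	p->components[i] = copy;
	p->dynamicmask |= 1ULL << i;
	return 0;
}

/* Free the owned components, then the array, then the path itself. */
static void
demo_path_free(struct demo_path *p)
{
	unsigned int	i;

	for (i = 0; i < p->depth; i++)
		demo_path_release_slot(p, i);
	free(p->components);
	free(p);
}
```

For a path such as "/realtime/$rtgroup.rmap", the fixed "realtime" component could be set with demo_path_set_static and the per-group name with demo_path_set_dynamic. The sketch caps the depth at 64 because shifting 1ULL by 64 or more is undefined in C; that cap is a choice made here, not something the patch states.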
* [PATCH 17/23] xfs: allow bulkstat to return metadata directories 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong ` (21 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 18/23] xfs: enable creation of dynamically allocated metadir path structures Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 22 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Allow the V5 bulkstat ioctl to return information about metadata directory files so that xfs_scrub can find and scrub them, since they are otherwise ordinary directories. (Metadata files of course require per-file scrub code and hence do not need exposure.) Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_fs.h | 10 +++++++++- fs/xfs/xfs_ioctl.c | 7 +++++++ fs/xfs/xfs_itable.c | 32 ++++++++++++++++++++++++++++---- fs/xfs/xfs_itable.h | 3 +++ 4 files changed, 47 insertions(+), 5 deletions(-) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 7de31a6692ae..6e0c45fcfeeb 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -485,9 +485,17 @@ struct xfs_bulk_ireq { */ #define XFS_BULK_IREQ_NREXT64 (1U << 2) +/* + * Allow bulkstat to return information about metadata directories. This + * enables xfs_scrub to find them for scanning, as they are otherwise ordinary + * directories. + */ +#define XFS_BULK_IREQ_METADIR (1U << 31) + #define XFS_BULK_IREQ_FLAGS_ALL (XFS_BULK_IREQ_AGNO | \ XFS_BULK_IREQ_SPECIAL | \ - XFS_BULK_IREQ_NREXT64) + XFS_BULK_IREQ_NREXT64 | \ + XFS_BULK_IREQ_METADIR) /* Operate on the root directory inode. 
*/ #define XFS_BULK_IREQ_SPECIAL_ROOT (1) diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 615fd1e4a611..37af6b7e6dbe 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -819,6 +819,10 @@ xfs_bulk_ireq_setup( if (hdr->flags & XFS_BULK_IREQ_NREXT64) breq->flags |= XFS_IBULK_NREXT64; + /* Caller wants to see metadata directories in bulkstat output. */ + if (hdr->flags & XFS_BULK_IREQ_METADIR) + breq->flags |= XFS_IBULK_METADIR; + return 0; } @@ -909,6 +913,9 @@ xfs_ioc_inumbers( if (copy_from_user(&hdr, &arg->hdr, sizeof(hdr))) return -EFAULT; + if (hdr.flags & XFS_BULK_IREQ_METADIR) + return -EINVAL; + error = xfs_bulk_ireq_setup(mp, &hdr, &breq, arg->inumbers); if (error == -ECANCELED) goto out_teardown; diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c index 7a967cc78010..37b15da0bb5c 100644 --- a/fs/xfs/xfs_itable.c +++ b/fs/xfs/xfs_itable.c @@ -36,6 +36,16 @@ struct xfs_bstat_chunk { struct xfs_bulkstat *buf; }; +static inline bool +want_metadir( + struct xfs_inode *ip, + struct xfs_ibulk *breq) +{ + return xfs_is_metadata_inode(ip) && + S_ISDIR(VFS_I(ip)->i_mode) && + (breq->flags & XFS_IBULK_METADIR); +} + /* * Fill out the bulkstat info for a single inode and report it somewhere. * @@ -69,9 +79,6 @@ xfs_bulkstat_one_int( vfsuid_t vfsuid; vfsgid_t vfsgid; - if (xfs_internal_inum(mp, ino)) - goto out_advance; - error = xfs_iget(mp, tp, ino, (XFS_IGET_DONTCACHE | XFS_IGET_UNTRUSTED), XFS_ILOCK_SHARED, &ip); @@ -86,8 +93,25 @@ xfs_bulkstat_one_int( vfsuid = i_uid_into_vfsuid(mnt_userns, inode); vfsgid = i_gid_into_vfsgid(mnt_userns, inode); + /* If we want metadata directories, push out the bare minimum. 
*/ + if (want_metadir(ip, bc->breq)) { + memset(buf, 0, sizeof(*buf)); + buf->bs_ino = ino; + buf->bs_gen = inode->i_generation; + buf->bs_mode = inode->i_mode & S_IFMT; + xfs_bulkstat_health(ip, buf); + buf->bs_version = XFS_BULKSTAT_VERSION_V5; + xfs_iunlock(ip, XFS_ILOCK_SHARED); + xfs_irele(ip); + + error = bc->formatter(bc->breq, buf); + if (!error || error == -ECANCELED) + goto out_advance; + goto out; + } + /* If this is a private inode, don't leak its details to userspace. */ - if (IS_PRIVATE(inode)) { + if (IS_PRIVATE(inode) || xfs_internal_inum(mp, ino)) { xfs_iunlock(ip, XFS_ILOCK_SHARED); xfs_irele(ip); error = -EINVAL; diff --git a/fs/xfs/xfs_itable.h b/fs/xfs/xfs_itable.h index e2d0eba43f35..bd04445e88e5 100644 --- a/fs/xfs/xfs_itable.h +++ b/fs/xfs/xfs_itable.h @@ -22,6 +22,9 @@ struct xfs_ibulk { /* Fill out the bs_extents64 field if set. */ #define XFS_IBULK_NREXT64 (1U << 1) +/* Signal that we can return metadata directories. */ +#define XFS_IBULK_METADIR (1U << 2) + /* * Advance the user buffer pointer by one record of the given size. If the * buffer is now full, return the appropriate error code. ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong 2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong 2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 03/14] xfs: refactor creation of bmap btree roots Darrick J. Wong ` (13 more replies) 2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong ` (36 subsequent siblings) 39 siblings, 14 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs Hi all, This series prepares the btree code to support realtime reverse mapping btrees by refactoring xfs_ifork_realloc to be fed a per-btree ops structure so that it can handle multiple types of inode-rooted btrees. It moves on to refactoring the btree code to use the new realloc routines and to support storing btree records in the inode root, because the current bmbt code does not support this. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. 
--D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=btree-ifork-records xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=btree-ifork-records --- fs/xfs/libxfs/xfs_alloc_btree.c | 6 - fs/xfs/libxfs/xfs_alloc_btree.h | 3 fs/xfs/libxfs/xfs_attr_leaf.c | 8 - fs/xfs/libxfs/xfs_bmap.c | 59 ++---- fs/xfs/libxfs/xfs_bmap_btree.c | 93 ++++++++- fs/xfs/libxfs/xfs_bmap_btree.h | 219 +++++++++++++++------ fs/xfs/libxfs/xfs_btree.c | 382 ++++++++++++++++++++++++++++-------- fs/xfs/libxfs/xfs_btree.h | 4 fs/xfs/libxfs/xfs_btree_staging.c | 4 fs/xfs/libxfs/xfs_ialloc.c | 4 fs/xfs/libxfs/xfs_ialloc_btree.c | 6 - fs/xfs/libxfs/xfs_ialloc_btree.h | 3 fs/xfs/libxfs/xfs_inode_fork.c | 163 +++++++-------- fs/xfs/libxfs/xfs_inode_fork.h | 27 ++- fs/xfs/libxfs/xfs_refcount_btree.c | 5 fs/xfs/libxfs/xfs_refcount_btree.h | 3 fs/xfs/libxfs/xfs_rmap_btree.c | 9 - fs/xfs/libxfs/xfs_rmap_btree.h | 3 fs/xfs/libxfs/xfs_sb.c | 16 +- fs/xfs/libxfs/xfs_trans_resv.c | 2 fs/xfs/scrub/bmap_repair.c | 2 fs/xfs/scrub/inode_repair.c | 8 - fs/xfs/xfs_xchgrange.c | 4 23 files changed, 705 insertions(+), 328 deletions(-) ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 03/14] xfs: refactor creation of bmap btree roots 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 02/14] xfs: refactor the allocation and freeing of incore inode fork " Darrick J. Wong ` (12 subsequent siblings) 13 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Now that we've created inode fork helpers to allocate and free btree roots, create a new bmap btree helper to create a new bmbt root, and refactor the extents <-> btree conversion functions to use our new helpers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap.c | 17 ++++------------- fs/xfs/libxfs/xfs_bmap_btree.c | 16 ++++++++++++++++ fs/xfs/libxfs/xfs_bmap_btree.h | 2 ++ 3 files changed, 22 insertions(+), 13 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index cbcb24df1a1e..98bd32da142d 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -591,7 +591,7 @@ xfs_bmap_btree_to_extents( xfs_trans_binval(tp, cbp); if (cur->bc_levels[0].bp == cbp) cur->bc_levels[0].bp = NULL; - xfs_iroot_realloc(ip, -1, whichfork); + xfs_iroot_free(ip, whichfork); ASSERT(ifp->if_broot == NULL); ifp->if_format = XFS_DINODE_FMT_EXTENTS; *logflagsp |= XFS_ILOG_CORE | xfs_ilog_fext(whichfork); @@ -631,20 +631,10 @@ xfs_bmap_extents_to_btree( ifp = xfs_ifork_ptr(ip, whichfork); ASSERT(ifp->if_format == XFS_DINODE_FMT_EXTENTS); - /* - * Make space in the inode incore. This needs to be undone if we fail - * to expand the root. - */ - xfs_iroot_realloc(ip, 1, whichfork); - - /* - * Fill in the root. - */ - block = ifp->if_broot; - xfs_btree_init_block(mp, block, &xfs_bmbt_ops, 1, 1, ip->i_ino); /* * Need a cursor. Can't allocate until bb_level is filled in. 
*/ + xfs_bmbt_iroot_alloc(ip, whichfork); cur = xfs_bmbt_init_cursor(mp, tp, ip, whichfork); cur->bc_ino.flags = wasdel ? XFS_BTCUR_BMBT_WASDEL : 0; /* @@ -711,6 +701,7 @@ xfs_bmap_extents_to_btree( /* * Fill in the root key and pointer. */ + block = ifp->if_broot; kp = xfs_bmbt_key_addr(mp, block, 1); arp = xfs_bmbt_rec_addr(mp, ablock, 1); kp->br_startoff = cpu_to_be64(xfs_bmbt_disk_get_startoff(arp)); @@ -732,7 +723,7 @@ xfs_bmap_extents_to_btree( out_unreserve_dquot: xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L); out_root_realloc: - xfs_iroot_realloc(ip, -1, whichfork); + xfs_iroot_free(ip, whichfork); ifp->if_format = XFS_DINODE_FMT_EXTENTS; ASSERT(ifp->if_broot == NULL); xfs_btree_del_cursor(cur, XFS_BTREE_ERROR); diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index 82f46837f79f..973fa6cc7aa6 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -796,3 +796,19 @@ xfs_bmbt_destroy_cur_cache(void) kmem_cache_destroy(xfs_bmbt_cur_cache); xfs_bmbt_cur_cache = NULL; } + +/* Create an incore bmbt btree root block. */ +void +xfs_bmbt_iroot_alloc( + struct xfs_inode *ip, + int whichfork) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); + + xfs_iroot_alloc(ip, whichfork, + xfs_bmap_broot_space_calc(ip->i_mount, 1)); + + /* Fill in the root. */ + xfs_btree_init_block(ip->i_mount, ifp->if_broot, &xfs_bmbt_ops, 1, 1, + ip->i_ino); +} diff --git a/fs/xfs/libxfs/xfs_bmap_btree.h b/fs/xfs/libxfs/xfs_bmap_btree.h index 62fbc4f7c2c4..3fe9c4f7f1a0 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.h +++ b/fs/xfs/libxfs/xfs_bmap_btree.h @@ -196,4 +196,6 @@ xfs_bmap_bmdr_space(struct xfs_btree_block *bb) return xfs_bmdr_space_calc(be16_to_cpu(bb->bb_numrecs)); } +void xfs_bmbt_iroot_alloc(struct xfs_inode *ip, int whichfork); + #endif /* __XFS_BMAP_BTREE_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
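For readers following along outside the kernel tree, the shape of the new helper can be sketched in user space. This is only a model under stated assumptions: `model_btree_block`, `model_ifork`, `model_broot_space_calc` and `model_bmbt_iroot_alloc` are hypothetical names, the header layout is not the real XFS one, and `calloc` can fail where the kernel allocation in the patch is not expected to.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures; not the real XFS layout. */
struct model_btree_block {
	uint16_t	level;
	uint16_t	numrecs;
	uint64_t	owner;
};

struct model_ifork {
	struct model_btree_block	*broot;
	size_t				broot_bytes;
};

/* Header plus nrecs (key, pointer) pairs, each half modelled as 8 bytes. */
static size_t
model_broot_space_calc(unsigned int nrecs)
{
	return sizeof(struct model_btree_block) +
	       nrecs * (sizeof(uint64_t) + sizeof(uint64_t));
}

/* Allocate a root sized for one key/pointer pair and fill in its header. */
static int
model_bmbt_iroot_alloc(struct model_ifork *ifp, uint64_t owner)
{
	size_t	bytes = model_broot_space_calc(1);

	/* The caller must not already have a root attached. */
	assert(ifp->broot == NULL);

	ifp->broot = calloc(1, bytes);
	if (!ifp->broot)
		return -1;
	ifp->broot_bytes = bytes;

	/* Level 1, one record: the root points at a single leaf block. */
	ifp->broot->level = 1;
	ifp->broot->numrecs = 1;
	ifp->broot->owner = owner;
	return 0;
}
```

The point of folding allocation and header initialization into one call is that the extents-to-btree conversion no longer has to know how large a one-record root is or how its header is filled in.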
* [PATCH 02/14] xfs: refactor the allocation and freeing of incore inode fork btree roots 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong 2022-12-30 22:17 ` [PATCH 03/14] xfs: refactor creation of bmap btree roots Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 01/14] xfs: replace shouty XFS_BM{BT,DR} macros Darrick J. Wong ` (11 subsequent siblings) 13 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Refactor the code that allocates and frees the incore inode fork btree roots. This will help us disentangle some of the weird logic when we're creating and tearing down inode-based btrees. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_inode_fork.c | 53 ++++++++++++++++++++++++++++------------ fs/xfs/libxfs/xfs_inode_fork.h | 3 ++ 2 files changed, 40 insertions(+), 16 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 9d53df7ce49d..0f220f100069 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -207,8 +207,7 @@ xfs_iformat_btree( return -EFSCORRUPTED; } - ifp->if_broot_bytes = size; - ifp->if_broot = kmem_alloc(size, KM_NOFS); + xfs_iroot_alloc(ip, whichfork, size); ASSERT(ifp->if_broot != NULL); /* * Copy and convert from the on-disk structure @@ -344,6 +343,32 @@ xfs_iformat_attr_fork( return error; } +/* Allocate a new incore ifork btree root. */ +void +xfs_iroot_alloc( + struct xfs_inode *ip, + int whichfork, + size_t bytes) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); + + ifp->if_broot = kmem_alloc(bytes, KM_NOFS); + ifp->if_broot_bytes = bytes; +} + +/* Free all the memory and state associated with an incore ifork btree root. 
*/ +void +xfs_iroot_free( + struct xfs_inode *ip, + int whichfork) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); + + ifp->if_broot_bytes = 0; + kmem_free(ifp->if_broot); + ifp->if_broot = NULL; +} + /* * Reallocate the space for if_broot based on the number of records * being added or deleted as indicated in rec_diff. Move the records @@ -392,8 +417,7 @@ xfs_iroot_realloc( */ if (ifp->if_broot_bytes == 0) { new_size = xfs_bmap_broot_space_calc(mp, rec_diff); - ifp->if_broot = kmem_alloc(new_size, KM_NOFS); - ifp->if_broot_bytes = (int)new_size; + xfs_iroot_alloc(ip, whichfork, new_size); return; } @@ -432,17 +456,15 @@ xfs_iroot_realloc( new_size = xfs_bmap_broot_space_calc(mp, new_max); else new_size = 0; - if (new_size > 0) { - new_broot = kmem_alloc(new_size, KM_NOFS); - /* - * First copy over the btree block header. - */ - memcpy(new_broot, ifp->if_broot, - xfs_bmbt_block_len(ip->i_mount)); - } else { - new_broot = NULL; + if (new_size == 0) { + xfs_iroot_free(ip, whichfork); + return; } + /* First copy over the btree block header. */ + new_broot = kmem_alloc(new_size, KM_NOFS); + memcpy(new_broot, ifp->if_broot, xfs_bmbt_block_len(ip->i_mount)); + /* * Only copy the records and pointers if there are any. 
*/ @@ -466,9 +488,8 @@ xfs_iroot_realloc( kmem_free(ifp->if_broot); ifp->if_broot = new_broot; ifp->if_broot_bytes = (int)new_size; - if (ifp->if_broot) - ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <= - xfs_inode_fork_size(ip, whichfork)); + ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <= + xfs_inode_fork_size(ip, whichfork)); return; } diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index c201d8ad5957..f4379e2df616 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -171,6 +171,9 @@ void xfs_iflush_fork(struct xfs_inode *, struct xfs_dinode *, void xfs_idestroy_fork(struct xfs_ifork *ifp); void xfs_idata_realloc(struct xfs_inode *ip, int64_t byte_diff, int whichfork); +void xfs_iroot_alloc(struct xfs_inode *ip, int whichfork, + size_t bytes); +void xfs_iroot_free(struct xfs_inode *ip, int whichfork); void xfs_iroot_realloc(struct xfs_inode *, int, int); int xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int); int xfs_iextents_copy(struct xfs_inode *, struct xfs_bmbt_rec *, ^ permalink raw reply related [flat|nested] 565+ messages in thread
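The alloc/free pairing in this patch boils down to keeping the root pointer and its byte count in lockstep. A minimal user-space sketch of that invariant follows; `model_ifork`, `model_iroot_alloc` and `model_iroot_free` are hypothetical names, and unlike the kernel allocation used in the patch, `malloc` here may fail, so the model returns an error code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the two fork fields the helpers manage. */
struct model_ifork {
	void	*broot;		/* incore root buffer, or NULL */
	size_t	broot_bytes;	/* size of that buffer, or 0 */
};

/* Attach a new root buffer; pointer and size are set together. */
static int
model_iroot_alloc(struct model_ifork *ifp, size_t bytes)
{
	assert(ifp->broot == NULL);

	ifp->broot = malloc(bytes);
	if (!ifp->broot)
		return -1;
	ifp->broot_bytes = bytes;
	return 0;
}

/* Drop the root buffer; pointer and size are cleared together. */
static void
model_iroot_free(struct model_ifork *ifp)
{
	ifp->broot_bytes = 0;
	free(ifp->broot);	/* free(NULL) is a no-op */
	ifp->broot = NULL;
}
```

Because the free helper clears both fields, callers can assert `broot == NULL` afterwards, which is what the converted call sites in the bmap code rely on.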
* [PATCH 01/14] xfs: replace shouty XFS_BM{BT,DR} macros 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong 2022-12-30 22:17 ` [PATCH 03/14] xfs: refactor creation of bmap btree roots Darrick J. Wong 2022-12-30 22:17 ` [PATCH 02/14] xfs: refactor the allocation and freeing of incore inode fork " Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 05/14] xfs: hoist the code that moves the incore inode fork broot memory Darrick J. Wong ` (10 subsequent siblings) 13 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Replace all the shouty bmap btree and bmap disk root macros with actual functions, and fix a type handling error in the xattr code that the macros previously didn't care about. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_attr_leaf.c | 8 +- fs/xfs/libxfs/xfs_bmap.c | 40 ++++---- fs/xfs/libxfs/xfs_bmap_btree.c | 18 ++-- fs/xfs/libxfs/xfs_bmap_btree.h | 204 +++++++++++++++++++++++++++------------- fs/xfs/libxfs/xfs_inode_fork.c | 30 +++--- fs/xfs/libxfs/xfs_trans_resv.c | 2 fs/xfs/scrub/bmap_repair.c | 2 fs/xfs/scrub/inode_repair.c | 8 +- fs/xfs/xfs_xchgrange.c | 4 - 9 files changed, 196 insertions(+), 120 deletions(-) diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c index 1f3febeccbe0..055290d37016 100644 --- a/fs/xfs/libxfs/xfs_attr_leaf.c +++ b/fs/xfs/libxfs/xfs_attr_leaf.c @@ -649,7 +649,7 @@ xfs_attr_shortform_bytesfit( */ if (!dp->i_forkoff && dp->i_df.if_bytes > xfs_default_attroffset(dp)) - dsize = XFS_BMDR_SPACE_CALC(MINDBTPTRS); + dsize = xfs_bmdr_space_calc(MINDBTPTRS); break; case XFS_DINODE_FMT_BTREE: /* @@ -663,7 +663,7 @@ xfs_attr_shortform_bytesfit( return 0; return dp->i_forkoff; } - dsize = XFS_BMAP_BROOT_SPACE(mp, dp->i_df.if_broot); + dsize = 
xfs_bmap_bmdr_space(dp->i_df.if_broot); break; } @@ -671,11 +671,11 @@ xfs_attr_shortform_bytesfit( * A data fork btree root must have space for at least * MINDBTPTRS key/ptr pairs if the data fork is small or empty. */ - minforkoff = max_t(int64_t, dsize, XFS_BMDR_SPACE_CALC(MINDBTPTRS)); + minforkoff = max_t(int64_t, dsize, xfs_bmdr_space_calc(MINDBTPTRS)); minforkoff = roundup(minforkoff, 8) >> 3; /* attr fork btree root can have at least this many key/ptr pairs */ - maxforkoff = XFS_LITINO(mp) - XFS_BMDR_SPACE_CALC(MINABTPTRS); + maxforkoff = XFS_LITINO(mp) - xfs_bmdr_space_calc(MINABTPTRS); maxforkoff = maxforkoff >> 3; /* rounded down */ if (offset >= maxforkoff) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 55fe8cda3d98..cbcb24df1a1e 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -78,9 +78,9 @@ xfs_bmap_compute_maxlevels( maxleafents = xfs_iext_max_nextents(xfs_has_large_extent_counts(mp), whichfork); if (whichfork == XFS_DATA_FORK) - sz = XFS_BMDR_SPACE_CALC(MINDBTPTRS); + sz = xfs_bmdr_space_calc(MINDBTPTRS); else - sz = XFS_BMDR_SPACE_CALC(MINABTPTRS); + sz = xfs_bmdr_space_calc(MINABTPTRS); maxrootrecs = xfs_bmdr_maxrecs(sz, 0); minleafrecs = mp->m_bmap_dmnr[0]; @@ -101,8 +101,8 @@ xfs_bmap_compute_attr_offset( struct xfs_mount *mp) { if (mp->m_sb.sb_inodesize == 256) - return XFS_LITINO(mp) - XFS_BMDR_SPACE_CALC(MINABTPTRS); - return XFS_BMDR_SPACE_CALC(6 * MINABTPTRS); + return XFS_LITINO(mp) - xfs_bmdr_space_calc(MINABTPTRS); + return xfs_bmdr_space_calc(6 * MINABTPTRS); } STATIC int /* error */ @@ -275,7 +275,7 @@ xfs_check_block( prevp = NULL; for( i = 1; i <= xfs_btree_get_numrecs(block); i++) { dmxr = mp->m_bmap_dmxr[0]; - keyp = XFS_BMBT_KEY_ADDR(mp, block, i); + keyp = xfs_bmbt_key_addr(mp, block, i); if (prevp) { ASSERT(be64_to_cpu(prevp->br_startoff) < @@ -287,15 +287,15 @@ xfs_check_block( * Compare the block numbers to see if there are dups. 
*/ if (root) - pp = XFS_BMAP_BROOT_PTR_ADDR(mp, block, i, sz); + pp = xfs_bmap_broot_ptr_addr(mp, block, i, sz); else - pp = XFS_BMBT_PTR_ADDR(mp, block, i, dmxr); + pp = xfs_bmbt_ptr_addr(mp, block, i, dmxr); for (j = i+1; j <= be16_to_cpu(block->bb_numrecs); j++) { if (root) - thispa = XFS_BMAP_BROOT_PTR_ADDR(mp, block, j, sz); + thispa = xfs_bmap_broot_ptr_addr(mp, block, j, sz); else - thispa = XFS_BMBT_PTR_ADDR(mp, block, j, dmxr); + thispa = xfs_bmbt_ptr_addr(mp, block, j, dmxr); if (*thispa == *pp) { xfs_warn(mp, "%s: thispa(%d) == pp(%d) %lld", __func__, j, i, @@ -350,7 +350,7 @@ xfs_bmap_check_leaf_extents( level = be16_to_cpu(block->bb_level); ASSERT(level > 0); xfs_check_block(block, mp, 1, ifp->if_broot_bytes); - pp = XFS_BMAP_BROOT_PTR_ADDR(mp, block, 1, ifp->if_broot_bytes); + pp = xfs_bmap_broot_ptr_addr(mp, block, 1, ifp->if_broot_bytes); bno = be64_to_cpu(*pp); ASSERT(bno != NULLFSBLOCK); @@ -385,7 +385,7 @@ xfs_bmap_check_leaf_extents( */ xfs_check_block(block, mp, 0, 0); - pp = XFS_BMBT_PTR_ADDR(mp, block, 1, mp->m_bmap_dmxr[1]); + pp = xfs_bmbt_ptr_addr(mp, block, 1, mp->m_bmap_dmxr[1]); bno = be64_to_cpu(*pp); if (XFS_IS_CORRUPT(mp, !xfs_verify_fsbno(mp, bno))) { xfs_btree_mark_sick(cur); @@ -425,14 +425,14 @@ xfs_bmap_check_leaf_extents( * conform with the first entry in this one. 
*/ - ep = XFS_BMBT_REC_ADDR(mp, block, 1); + ep = xfs_bmbt_rec_addr(mp, block, 1); if (i) { ASSERT(xfs_bmbt_disk_get_startoff(&last) + xfs_bmbt_disk_get_blockcount(&last) <= xfs_bmbt_disk_get_startoff(ep)); } for (j = 1; j < num_recs; j++) { - nextp = XFS_BMBT_REC_ADDR(mp, block, j + 1); + nextp = xfs_bmbt_rec_addr(mp, block, j + 1); ASSERT(xfs_bmbt_disk_get_startoff(ep) + xfs_bmbt_disk_get_blockcount(ep) <= xfs_bmbt_disk_get_startoff(nextp)); @@ -567,7 +567,7 @@ xfs_bmap_btree_to_extents( ASSERT(be16_to_cpu(rblock->bb_numrecs) == 1); ASSERT(xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0) == 1); - pp = XFS_BMAP_BROOT_PTR_ADDR(mp, rblock, 1, ifp->if_broot_bytes); + pp = xfs_bmap_broot_ptr_addr(mp, rblock, 1, ifp->if_broot_bytes); cbno = be64_to_cpu(*pp); #ifdef DEBUG if (XFS_IS_CORRUPT(cur->bc_mp, !xfs_btree_check_lptr(cur, cbno, 1))) { @@ -701,7 +701,7 @@ xfs_bmap_extents_to_btree( for_each_xfs_iext(ifp, &icur, &rec) { if (isnullstartblock(rec.br_startblock)) continue; - arp = XFS_BMBT_REC_ADDR(mp, ablock, 1 + cnt); + arp = xfs_bmbt_rec_addr(mp, ablock, 1 + cnt); xfs_bmbt_disk_set_all(arp, &rec); cnt++; } @@ -711,10 +711,10 @@ xfs_bmap_extents_to_btree( /* * Fill in the root key and pointer. 
*/ - kp = XFS_BMBT_KEY_ADDR(mp, block, 1); - arp = XFS_BMBT_REC_ADDR(mp, ablock, 1); + kp = xfs_bmbt_key_addr(mp, block, 1); + arp = xfs_bmbt_rec_addr(mp, ablock, 1); kp->br_startoff = cpu_to_be64(xfs_bmbt_disk_get_startoff(arp)); - pp = XFS_BMBT_PTR_ADDR(mp, block, 1, xfs_bmbt_get_maxrecs(cur, + pp = xfs_bmbt_ptr_addr(mp, block, 1, xfs_bmbt_get_maxrecs(cur, be16_to_cpu(block->bb_level))); *pp = cpu_to_be64(args.fsbno); @@ -888,7 +888,7 @@ xfs_bmap_add_attrfork_btree( mp = ip->i_mount; - if (XFS_BMAP_BMDR_SPACE(block) <= xfs_inode_data_fork_size(ip)) + if (xfs_bmap_bmdr_space(block) <= xfs_inode_data_fork_size(ip)) *flags |= XFS_ILOG_DBROOT; else { cur = xfs_bmbt_init_cursor(mp, tp, ip, XFS_DATA_FORK); @@ -1170,7 +1170,7 @@ xfs_iread_bmbt_block( } /* Copy records into the incore cache. */ - frp = XFS_BMBT_REC_ADDR(mp, block, 1); + frp = xfs_bmbt_rec_addr(mp, block, 1); for (j = 0; j < num_recs; j++, frp++, ir->loaded++) { struct xfs_bmbt_irec new; xfs_failaddr_t fa; diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index 2f9202ed41dd..82f46837f79f 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -48,10 +48,10 @@ xfs_bmdr_to_bmbt( ASSERT(be16_to_cpu(rblock->bb_level) > 0); rblock->bb_numrecs = dblock->bb_numrecs; dmxr = xfs_bmdr_maxrecs(dblocklen, 0); - fkp = XFS_BMDR_KEY_ADDR(dblock, 1); - tkp = XFS_BMBT_KEY_ADDR(mp, rblock, 1); - fpp = XFS_BMDR_PTR_ADDR(dblock, 1, dmxr); - tpp = XFS_BMAP_BROOT_PTR_ADDR(mp, rblock, 1, rblocklen); + fkp = xfs_bmdr_key_addr(dblock, 1); + tkp = xfs_bmbt_key_addr(mp, rblock, 1); + fpp = xfs_bmdr_ptr_addr(dblock, 1, dmxr); + tpp = xfs_bmap_broot_ptr_addr(mp, rblock, 1, rblocklen); dmxr = be16_to_cpu(dblock->bb_numrecs); memcpy(tkp, fkp, sizeof(*fkp) * dmxr); memcpy(tpp, fpp, sizeof(*fpp) * dmxr); @@ -151,10 +151,10 @@ xfs_bmbt_to_bmdr( dblock->bb_level = rblock->bb_level; dblock->bb_numrecs = rblock->bb_numrecs; dmxr = xfs_bmdr_maxrecs(dblocklen, 0); - fkp = 
XFS_BMBT_KEY_ADDR(mp, rblock, 1); - tkp = XFS_BMDR_KEY_ADDR(dblock, 1); - fpp = XFS_BMAP_BROOT_PTR_ADDR(mp, rblock, 1, rblocklen); - tpp = XFS_BMDR_PTR_ADDR(dblock, 1, dmxr); + fkp = xfs_bmbt_key_addr(mp, rblock, 1); + tkp = xfs_bmdr_key_addr(dblock, 1); + fpp = xfs_bmap_broot_ptr_addr(mp, rblock, 1, rblocklen); + tpp = xfs_bmdr_ptr_addr(dblock, 1, dmxr); dmxr = be16_to_cpu(dblock->bb_numrecs); memcpy(tkp, fkp, sizeof(*fkp) * dmxr); memcpy(tpp, fpp, sizeof(*fpp) * dmxr); @@ -688,7 +688,7 @@ xfs_bmbt_maxrecs( int blocklen, int leaf) { - blocklen -= XFS_BMBT_BLOCK_LEN(mp); + blocklen -= xfs_bmbt_block_len(mp); return xfs_bmbt_block_maxrecs(blocklen, leaf); } diff --git a/fs/xfs/libxfs/xfs_bmap_btree.h b/fs/xfs/libxfs/xfs_bmap_btree.h index 151b8491f60e..62fbc4f7c2c4 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.h +++ b/fs/xfs/libxfs/xfs_bmap_btree.h @@ -13,70 +13,6 @@ struct xfs_inode; struct xfs_trans; struct xbtree_ifakeroot; -/* - * Btree block header size depends on a superblock flag. - */ -#define XFS_BMBT_BLOCK_LEN(mp) \ - (xfs_has_crc(((mp))) ? 
\ - XFS_BTREE_LBLOCK_CRC_LEN : XFS_BTREE_LBLOCK_LEN) - -#define XFS_BMBT_REC_ADDR(mp, block, index) \ - ((xfs_bmbt_rec_t *) \ - ((char *)(block) + \ - XFS_BMBT_BLOCK_LEN(mp) + \ - ((index) - 1) * sizeof(xfs_bmbt_rec_t))) - -#define XFS_BMBT_KEY_ADDR(mp, block, index) \ - ((xfs_bmbt_key_t *) \ - ((char *)(block) + \ - XFS_BMBT_BLOCK_LEN(mp) + \ - ((index) - 1) * sizeof(xfs_bmbt_key_t))) - -#define XFS_BMBT_PTR_ADDR(mp, block, index, maxrecs) \ - ((xfs_bmbt_ptr_t *) \ - ((char *)(block) + \ - XFS_BMBT_BLOCK_LEN(mp) + \ - (maxrecs) * sizeof(xfs_bmbt_key_t) + \ - ((index) - 1) * sizeof(xfs_bmbt_ptr_t))) - -#define XFS_BMDR_REC_ADDR(block, index) \ - ((xfs_bmdr_rec_t *) \ - ((char *)(block) + \ - sizeof(struct xfs_bmdr_block) + \ - ((index) - 1) * sizeof(xfs_bmdr_rec_t))) - -#define XFS_BMDR_KEY_ADDR(block, index) \ - ((xfs_bmdr_key_t *) \ - ((char *)(block) + \ - sizeof(struct xfs_bmdr_block) + \ - ((index) - 1) * sizeof(xfs_bmdr_key_t))) - -#define XFS_BMDR_PTR_ADDR(block, index, maxrecs) \ - ((xfs_bmdr_ptr_t *) \ - ((char *)(block) + \ - sizeof(struct xfs_bmdr_block) + \ - (maxrecs) * sizeof(xfs_bmdr_key_t) + \ - ((index) - 1) * sizeof(xfs_bmdr_ptr_t))) - -/* - * These are to be used when we know the size of the block and - * we don't have a cursor. - */ -#define XFS_BMAP_BROOT_PTR_ADDR(mp, bb, i, sz) \ - XFS_BMBT_PTR_ADDR(mp, bb, i, xfs_bmbt_maxrecs(mp, sz, 0)) - -#define XFS_BMAP_BROOT_SPACE_CALC(mp, nrecs) \ - (int)(XFS_BMBT_BLOCK_LEN(mp) + \ - ((nrecs) * (sizeof(xfs_bmbt_key_t) + sizeof(xfs_bmbt_ptr_t)))) - -#define XFS_BMAP_BROOT_SPACE(mp, bb) \ - (XFS_BMAP_BROOT_SPACE_CALC(mp, be16_to_cpu((bb)->bb_numrecs))) -#define XFS_BMDR_SPACE_CALC(nrecs) \ - (int)(sizeof(xfs_bmdr_block_t) + \ - ((nrecs) * (sizeof(xfs_bmbt_key_t) + sizeof(xfs_bmbt_ptr_t)))) -#define XFS_BMAP_BMDR_SPACE(bb) \ - (XFS_BMDR_SPACE_CALC(be16_to_cpu((bb)->bb_numrecs))) - /* * Maximum number of bmap btree levels. 
*/ @@ -120,4 +56,144 @@ unsigned int xfs_bmbt_maxlevels_ondisk(void); int __init xfs_bmbt_init_cur_cache(void); void xfs_bmbt_destroy_cur_cache(void); +/* + * Btree block header size depends on a superblock flag. + */ +static inline size_t +xfs_bmbt_block_len(struct xfs_mount *mp) +{ + return xfs_has_crc(mp) ? + XFS_BTREE_LBLOCK_CRC_LEN : XFS_BTREE_LBLOCK_LEN; +} + +/* Addresses of key, pointers, and records within an incore bmbt block. */ + +static inline struct xfs_bmbt_rec * +xfs_bmbt_rec_addr( + struct xfs_mount *mp, + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_bmbt_rec *) + ((char *)block + xfs_bmbt_block_len(mp) + + (index - 1) * sizeof(struct xfs_bmbt_rec)); +} + +static inline struct xfs_bmbt_key * +xfs_bmbt_key_addr( + struct xfs_mount *mp, + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_bmbt_key *) + ((char *)block + xfs_bmbt_block_len(mp) + + (index - 1) * sizeof(struct xfs_bmbt_key)); +} + +static inline xfs_bmbt_ptr_t * +xfs_bmbt_ptr_addr( + struct xfs_mount *mp, + struct xfs_btree_block *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_bmbt_ptr_t *) + ((char *)block + xfs_bmbt_block_len(mp) + + maxrecs * sizeof(struct xfs_bmbt_key) + + (index - 1) * sizeof(xfs_bmbt_ptr_t)); +} + +/* Addresses of key, pointers, and records within an ondisk bmbt block. 
*/ + +static inline struct xfs_bmbt_rec * +xfs_bmdr_rec_addr( + struct xfs_bmdr_block *block, + unsigned int index) +{ + return (struct xfs_bmbt_rec *) + ((char *)(block + 1) + + (index - 1) * sizeof(struct xfs_bmbt_rec)); +} + +static inline struct xfs_bmbt_key * +xfs_bmdr_key_addr( + struct xfs_bmdr_block *block, + unsigned int index) +{ + return (struct xfs_bmbt_key *) + ((char *)(block + 1) + + (index - 1) * sizeof(struct xfs_bmbt_key)); +} + +static inline xfs_bmbt_ptr_t * +xfs_bmdr_ptr_addr( + struct xfs_bmdr_block *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_bmbt_ptr_t *) + ((char *)(block + 1) + + maxrecs * sizeof(struct xfs_bmbt_key) + + (index - 1) * sizeof(xfs_bmbt_ptr_t)); +} + +/* + * Address of pointers within the incore btree root. + * + * These are to be used when we know the size of the block and + * we don't have a cursor. + */ +static inline xfs_bmbt_ptr_t * +xfs_bmap_broot_ptr_addr( + struct xfs_mount *mp, + struct xfs_btree_block *bb, + unsigned int i, + unsigned int sz) +{ + return xfs_bmbt_ptr_addr(mp, bb, i, xfs_bmbt_maxrecs(mp, sz, 0)); +} + +/* + * Compute the space required for the incore btree root containing the given + * number of records. + */ +static inline size_t +xfs_bmap_broot_space_calc( + struct xfs_mount *mp, + unsigned int nrecs) +{ + return xfs_bmbt_block_len(mp) + \ + (nrecs * (sizeof(struct xfs_bmbt_key) + sizeof(xfs_bmbt_ptr_t))); +} + +/* + * Compute the space required for the incore btree root given the ondisk + * btree root block. + */ +static inline size_t +xfs_bmap_broot_space( + struct xfs_mount *mp, + struct xfs_bmdr_block *bb) +{ + return xfs_bmap_broot_space_calc(mp, be16_to_cpu(bb->bb_numrecs)); +} + +/* Compute the space required for the ondisk root block. 
*/ +static inline size_t +xfs_bmdr_space_calc(unsigned int nrecs) +{ + return sizeof(struct xfs_bmdr_block) + + (nrecs * (sizeof(struct xfs_bmbt_key) + sizeof(xfs_bmbt_ptr_t))); +} + +/* + * Compute the space required for the ondisk root block given an incore root + * block. + */ +static inline size_t +xfs_bmap_bmdr_space(struct xfs_btree_block *bb) +{ + return xfs_bmdr_space_calc(be16_to_cpu(bb->bb_numrecs)); +} + #endif /* __XFS_BMAP_BTREE_H__ */ diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index ab1bc0e3a595..9d53df7ce49d 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -181,7 +181,7 @@ xfs_iformat_btree( ifp = xfs_ifork_ptr(ip, whichfork); dfp = (xfs_bmdr_block_t *)XFS_DFORK_PTR(dip, whichfork); - size = XFS_BMAP_BROOT_SPACE(mp, dfp); + size = xfs_bmap_broot_space(mp, dfp); nrecs = be16_to_cpu(dfp->bb_numrecs); level = be16_to_cpu(dfp->bb_level); @@ -194,7 +194,7 @@ xfs_iformat_btree( */ if (unlikely(ifp->if_nextents <= XFS_IFORK_MAXEXT(ip, whichfork) || nrecs == 0 || - XFS_BMDR_SPACE_CALC(nrecs) > + xfs_bmdr_space_calc(nrecs) > XFS_DFORK_SIZE(dip, mp, whichfork) || ifp->if_nextents > ip->i_nblocks) || level == 0 || level > XFS_BM_MAXLEVELS(mp, whichfork)) { @@ -391,7 +391,7 @@ xfs_iroot_realloc( * allocate it now and get out. 
*/ if (ifp->if_broot_bytes == 0) { - new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, rec_diff); + new_size = xfs_bmap_broot_space_calc(mp, rec_diff); ifp->if_broot = kmem_alloc(new_size, KM_NOFS); ifp->if_broot_bytes = (int)new_size; return; @@ -405,15 +405,15 @@ xfs_iroot_realloc( */ cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0); new_max = cur_max + rec_diff; - new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max); + new_size = xfs_bmap_broot_space_calc(mp, new_max); ifp->if_broot = krealloc(ifp->if_broot, new_size, GFP_NOFS | __GFP_NOFAIL); - op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1, + op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1, ifp->if_broot_bytes); - np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1, + np = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1, (int)new_size); ifp->if_broot_bytes = (int)new_size; - ASSERT(XFS_BMAP_BMDR_SPACE(ifp->if_broot) <= + ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <= xfs_inode_fork_size(ip, whichfork)); memmove(np, op, cur_max * (uint)sizeof(xfs_fsblock_t)); return; @@ -429,7 +429,7 @@ xfs_iroot_realloc( new_max = cur_max + rec_diff; ASSERT(new_max >= 0); if (new_max > 0) - new_size = XFS_BMAP_BROOT_SPACE_CALC(mp, new_max); + new_size = xfs_bmap_broot_space_calc(mp, new_max); else new_size = 0; if (new_size > 0) { @@ -438,7 +438,7 @@ xfs_iroot_realloc( * First copy over the btree block header. */ memcpy(new_broot, ifp->if_broot, - XFS_BMBT_BLOCK_LEN(ip->i_mount)); + xfs_bmbt_block_len(ip->i_mount)); } else { new_broot = NULL; } @@ -450,16 +450,16 @@ xfs_iroot_realloc( /* * First copy the records. */ - op = (char *)XFS_BMBT_REC_ADDR(mp, ifp->if_broot, 1); - np = (char *)XFS_BMBT_REC_ADDR(mp, new_broot, 1); + op = (char *)xfs_bmbt_rec_addr(mp, ifp->if_broot, 1); + np = (char *)xfs_bmbt_rec_addr(mp, new_broot, 1); memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_rec_t)); /* * Then copy the pointers. 
*/ - op = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, ifp->if_broot, 1, + op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1, ifp->if_broot_bytes); - np = (char *)XFS_BMAP_BROOT_PTR_ADDR(mp, new_broot, 1, + np = (char *)xfs_bmap_broot_ptr_addr(mp, new_broot, 1, (int)new_size); memcpy(np, op, new_max * (uint)sizeof(xfs_fsblock_t)); } @@ -467,7 +467,7 @@ xfs_iroot_realloc( ifp->if_broot = new_broot; ifp->if_broot_bytes = (int)new_size; if (ifp->if_broot) - ASSERT(XFS_BMAP_BMDR_SPACE(ifp->if_broot) <= + ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <= xfs_inode_fork_size(ip, whichfork)); return; } @@ -640,7 +640,7 @@ xfs_iflush_fork( if ((iip->ili_fields & brootflag[whichfork]) && (ifp->if_broot_bytes > 0)) { ASSERT(ifp->if_broot != NULL); - ASSERT(XFS_BMAP_BMDR_SPACE(ifp->if_broot) <= + ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <= xfs_inode_fork_size(ip, whichfork)); xfs_bmbt_to_bmdr(mp, ifp->if_broot, ifp->if_broot_bytes, (xfs_bmdr_block_t *)cp, diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c index 67accd613038..3435492b1658 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -128,7 +128,7 @@ xfs_calc_inode_res( (4 * sizeof(struct xlog_op_header) + sizeof(struct xfs_inode_log_format) + mp->m_sb.sb_inodesize + - 2 * XFS_BMBT_BLOCK_LEN(mp)); + 2 * xfs_bmbt_block_len(mp)); } /* diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index 4638f3652b54..73ba5c514cde 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -437,7 +437,7 @@ xrep_bmap_iroot_size( { ASSERT(level > 0); - return XFS_BMAP_BROOT_SPACE_CALC(cur->bc_mp, nr_this_level); + return xfs_bmap_broot_space_calc(cur->bc_mp, nr_this_level); } /* Update the inode counters. 
*/ diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index ef10f031146e..1efd606bf92c 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -701,7 +701,7 @@ xrep_dinode_bad_btree_fork( nrecs = be16_to_cpu(dfp->bb_numrecs); level = be16_to_cpu(dfp->bb_level); - if (nrecs == 0 || XFS_BMDR_SPACE_CALC(nrecs) > dfork_size) + if (nrecs == 0 || xfs_bmdr_space_calc(nrecs) > dfork_size) return true; if (level == 0 || level >= XFS_BM_MAXLEVELS(sc->mp, whichfork)) return true; @@ -919,7 +919,7 @@ xrep_dinode_ensure_forkoff( struct xfs_bmdr_block *bmdr; struct xfs_scrub *sc = ri->sc; xfs_extnum_t attr_extents, data_extents; - size_t bmdr_minsz = XFS_BMDR_SPACE_CALC(1); + size_t bmdr_minsz = xfs_bmdr_space_calc(1); unsigned int lit_sz = XFS_LITINO(sc->mp); unsigned int afork_min, dfork_min; @@ -971,7 +971,7 @@ xrep_dinode_ensure_forkoff( case XFS_DINODE_FMT_BTREE: /* Must have space for btree header and key/pointers. */ bmdr = XFS_DFORK_PTR(dip, XFS_ATTR_FORK); - afork_min = XFS_BMAP_BROOT_SPACE(sc->mp, bmdr); + afork_min = xfs_bmap_broot_space(sc->mp, bmdr); break; default: /* We should never see any other formats. */ @@ -1021,7 +1021,7 @@ xrep_dinode_ensure_forkoff( case XFS_DINODE_FMT_BTREE: /* Must have space for btree header and key/pointers. 
*/ bmdr = XFS_DFORK_PTR(dip, XFS_DATA_FORK); - dfork_min = XFS_BMAP_BROOT_SPACE(sc->mp, bmdr); + dfork_min = xfs_bmap_broot_space(sc->mp, bmdr); break; default: dfork_min = 0; diff --git a/fs/xfs/xfs_xchgrange.c b/fs/xfs/xfs_xchgrange.c index 829a17ac7406..1951fcfdb1d9 100644 --- a/fs/xfs/xfs_xchgrange.c +++ b/fs/xfs/xfs_xchgrange.c @@ -163,7 +163,7 @@ xfs_swap_extents_check_format( */ if (tifp->if_format == XFS_DINODE_FMT_BTREE) { if (xfs_inode_has_attr_fork(ip) && - XFS_BMAP_BMDR_SPACE(tifp->if_broot) > xfs_inode_fork_boff(ip)) + xfs_bmap_bmdr_space(tifp->if_broot) > xfs_inode_fork_boff(ip)) return -EINVAL; if (tifp->if_nextents <= XFS_IFORK_MAXEXT(ip, XFS_DATA_FORK)) return -EINVAL; @@ -172,7 +172,7 @@ xfs_swap_extents_check_format( /* Reciprocal target->temp btree format checks */ if (ifp->if_format == XFS_DINODE_FMT_BTREE) { if (xfs_inode_has_attr_fork(tip) && - XFS_BMAP_BMDR_SPACE(ip->i_df.if_broot) > xfs_inode_fork_boff(tip)) + xfs_bmap_bmdr_space(ip->i_df.if_broot) > xfs_inode_fork_boff(tip)) return -EINVAL; if (ifp->if_nextents <= XFS_IFORK_MAXEXT(tip, XFS_DATA_FORK)) return -EINVAL; ^ permalink raw reply related [flat|nested] 565+ messages in thread
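The address arithmetic that the new inline helpers perform can be illustrated with a small user-space model. Everything here is an assumption made for illustration: `MODEL_HDR_LEN`, `model_block`, `model_key`, `model_ptr_t` and the three `model_*` functions are hypothetical, and the 16-byte header is not the real XFS header size. The layout being modelled is header, then room for maxrecs keys, then the pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_HDR_LEN	16	/* hypothetical header size, not XFS's */

struct model_block { unsigned char hdr[MODEL_HDR_LEN]; };
struct model_key { uint64_t startoff; };
typedef uint64_t model_ptr_t;

/* Address of the 1-based key @index; keys start right after the header. */
static inline struct model_key *
model_key_addr(struct model_block *block, unsigned int index)
{
	assert(index >= 1);
	return (struct model_key *)((char *)block + MODEL_HDR_LEN +
			(index - 1) * sizeof(struct model_key));
}

/* Address of the 1-based pointer @index; pointers follow maxrecs keys. */
static inline model_ptr_t *
model_ptr_addr(struct model_block *block, unsigned int index,
	       unsigned int maxrecs)
{
	assert(index >= 1 && index <= maxrecs);
	return (model_ptr_t *)((char *)block + MODEL_HDR_LEN +
			maxrecs * sizeof(struct model_key) +
			(index - 1) * sizeof(model_ptr_t));
}

/* Bytes needed for a header plus nrecs key/pointer pairs. */
static inline size_t
model_space_calc(unsigned int nrecs)
{
	return MODEL_HDR_LEN +
	       nrecs * (sizeof(struct model_key) + sizeof(model_ptr_t));
}
```

One practical difference from the macro form: a macro that casts its argument to `char *` accepts any pointer, whereas a typed function parameter lets the compiler diagnose a caller that passes the wrong kind of block.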
* [PATCH 05/14] xfs: hoist the code that moves the incore inode fork broot memory 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 01/14] xfs: replace shouty XFS_BM{BT,DR} macros Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 04/14] xfs: fix a sloppy memory handling bug in xfs_iroot_realloc Darrick J. Wong ` (9 subsequent siblings) 13 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Whenever we change the size of the memory buffer holding an inode fork btree root block, we have to copy the contents over. Refactor all this into a single function that handles both, in preparation for making xfs_iroot_realloc more generic. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_inode_fork.c | 99 +++++++++++++++++++++++----------------- 1 file changed, 57 insertions(+), 42 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index b73b971b83cd..16782d3630d2 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -369,6 +369,50 @@ xfs_iroot_free( ifp->if_broot = NULL; } +/* Move the bmap btree root from one incore buffer to another. */ +static void +xfs_ifork_move_broot( + struct xfs_inode *ip, + int whichfork, + struct xfs_btree_block *dst_broot, + size_t dst_bytes, + struct xfs_btree_block *src_broot, + size_t src_bytes, + unsigned int numrecs) +{ + struct xfs_mount *mp = ip->i_mount; + void *dptr; + void *sptr; + + ASSERT(xfs_bmap_bmdr_space(src_broot) <= xfs_inode_fork_size(ip, whichfork)); + + /* + * We always have to move the pointers because they are not butted + * against the btree block header. 
+ */ + if (numrecs) { + sptr = xfs_bmap_broot_ptr_addr(mp, src_broot, 1, src_bytes); + dptr = xfs_bmap_broot_ptr_addr(mp, dst_broot, 1, dst_bytes); + memmove(dptr, sptr, numrecs * sizeof(xfs_fsblock_t)); + } + + if (src_broot == dst_broot) + return; + + /* + * If the root is being totally relocated, we have to migrate the block + * header and the keys that come after it. + */ + memcpy(dst_broot, src_broot, xfs_bmbt_block_len(mp)); + + /* Now copy the keys, which come right after the header. */ + if (numrecs) { + sptr = xfs_bmbt_key_addr(mp, src_broot, 1); + dptr = xfs_bmbt_key_addr(mp, dst_broot, 1); + memcpy(dptr, sptr, numrecs * sizeof(struct xfs_bmbt_key)); + } +} + /* * Reallocate the space for if_broot based on the number of records * being added or deleted as indicated in rec_diff. Move the records @@ -395,12 +439,11 @@ xfs_iroot_realloc( { struct xfs_mount *mp = ip->i_mount; int cur_max; - struct xfs_ifork *ifp; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_block *new_broot; int new_max; size_t new_size; - char *np; - char *op; + size_t old_size = ifp->if_broot_bytes; /* * Handle the degenerate case quietly. @@ -409,13 +452,12 @@ xfs_iroot_realloc( return; } - ifp = xfs_ifork_ptr(ip, whichfork); if (rec_diff > 0) { /* * If there wasn't any memory allocated before, just * allocate it now and get out. */ - if (ifp->if_broot_bytes == 0) { + if (old_size == 0) { new_size = xfs_bmap_broot_space_calc(mp, rec_diff); xfs_iroot_alloc(ip, whichfork, new_size); return; @@ -424,22 +466,16 @@ xfs_iroot_realloc( /* * If there is already an existing if_broot, then we need * to realloc() it and shift the pointers to their new - * location. The records don't change location because - * they are kept butted up against the btree block header. + * location. 
*/ - cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0); + cur_max = xfs_bmbt_maxrecs(mp, old_size, 0); new_max = cur_max + rec_diff; new_size = xfs_bmap_broot_space_calc(mp, new_max); ifp->if_broot = krealloc(ifp->if_broot, new_size, GFP_NOFS | __GFP_NOFAIL); - op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1, - ifp->if_broot_bytes); - np = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1, - (int)new_size); - ifp->if_broot_bytes = (int)new_size; - ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <= - xfs_inode_fork_size(ip, whichfork)); - memmove(np, op, cur_max * (uint)sizeof(xfs_fsblock_t)); + ifp->if_broot_bytes = new_size; + xfs_ifork_move_broot(ip, whichfork, ifp->if_broot, new_size, + ifp->if_broot, old_size, cur_max); return; } @@ -448,8 +484,8 @@ xfs_iroot_realloc( * if_broot buffer. It must already exist. If we go to zero * records, just get rid of the root and clear the status bit. */ - ASSERT((ifp->if_broot != NULL) && (ifp->if_broot_bytes > 0)); - cur_max = xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0); + ASSERT((ifp->if_broot != NULL) && (old_size > 0)); + cur_max = xfs_bmbt_maxrecs(mp, old_size, 0); new_max = cur_max + rec_diff; ASSERT(new_max >= 0); if (new_max > 0) @@ -461,35 +497,14 @@ xfs_iroot_realloc( return; } - /* First copy over the btree block header. */ + /* Reallocate the btree root and move the contents. */ new_broot = kmem_alloc(new_size, KM_NOFS); - memcpy(new_broot, ifp->if_broot, xfs_bmbt_block_len(ip->i_mount)); + xfs_ifork_move_broot(ip, whichfork, new_broot, new_size, ifp->if_broot, + old_size, new_max); - /* - * Only copy the keys and pointers if there are any. - */ - if (new_max > 0) { - /* - * First copy the keys. - */ - op = (char *)xfs_bmbt_key_addr(mp, ifp->if_broot, 1); - np = (char *)xfs_bmbt_key_addr(mp, new_broot, 1); - memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_key_t)); - - /* - * Then copy the pointers. 
- */ - op = (char *)xfs_bmap_broot_ptr_addr(mp, ifp->if_broot, 1, - ifp->if_broot_bytes); - np = (char *)xfs_bmap_broot_ptr_addr(mp, new_broot, 1, - (int)new_size); - memcpy(np, op, new_max * (uint)sizeof(xfs_fsblock_t)); - } kmem_free(ifp->if_broot); ifp->if_broot = new_broot; ifp->if_broot_bytes = (int)new_size; - ASSERT(xfs_bmap_bmdr_space(ifp->if_broot) <= - xfs_inode_fork_size(ip, whichfork)); return; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
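For readers following along, here is a minimal userspace sketch of why the hoisted helper always has to move the pointers but only copies the header and keys when the buffer actually changes. It assumes the incore root is laid out as header, then room for maxrecs keys, then the pointers, so the pointer offset depends on the buffer size. Every model_* name and the MODEL_* sizes are made up for illustration; none of this is kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes; the kernel derives these from the mount. */
#define MODEL_HDR_LEN	8u
#define MODEL_KEY_LEN	8u
#define MODEL_PTR_LEN	8u

/* How many key/pointer pairs fit in a root buffer of this size. */
static unsigned int model_maxrecs(size_t bytes)
{
	return (bytes - MODEL_HDR_LEN) / (MODEL_KEY_LEN + MODEL_PTR_LEN);
}

/* Keys are butted up against the header; idx is 1-based. */
static unsigned char *model_key_addr(unsigned char *broot, unsigned int idx)
{
	return broot + MODEL_HDR_LEN + (idx - 1) * MODEL_KEY_LEN;
}

/* Pointers start after space for maxrecs keys, so they depend on size. */
static unsigned char *model_ptr_addr(unsigned char *broot, unsigned int idx,
				     size_t bytes)
{
	return broot + MODEL_HDR_LEN +
	       model_maxrecs(bytes) * MODEL_KEY_LEN +
	       (idx - 1) * MODEL_PTR_LEN;
}

/* Same shape as the hoisted helper: pointers first, then header and keys. */
static void model_move_broot(unsigned char *dst, size_t dst_bytes,
			     unsigned char *src, size_t src_bytes,
			     unsigned int numrecs)
{
	/* The pointer offset changes with the size, even for the same buffer. */
	if (numrecs)
		memmove(model_ptr_addr(dst, 1, dst_bytes),
			model_ptr_addr(src, 1, src_bytes),
			numrecs * MODEL_PTR_LEN);

	if (src == dst)
		return;

	/* Relocating to a new buffer: bring the header and keys along. */
	memcpy(dst, src, MODEL_HDR_LEN);
	if (numrecs)
		memcpy(model_key_addr(dst, 1), model_key_addr(src, 1),
		       numrecs * MODEL_KEY_LEN);
}
```

In the grow-in-place case the source and destination are the same buffer, so only the memmove runs; memmove rather than memcpy matters there because the old and new pointer regions can overlap.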
* [PATCH 04/14] xfs: fix a sloppy memory handling bug in xfs_iroot_realloc 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 05/14] xfs: hoist the code that moves the incore inode fork broot memory Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 06/14] xfs: move the zero records logic into xfs_bmap_broot_space_calc Darrick J. Wong ` (8 subsequent siblings) 13 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> While refactoring code, I noticed that when xfs_iroot_realloc tries to shrink a bmbt root block, it allocates a smaller new block and then copies "records" and pointers to the new block. However, bmbt root blocks cannot ever be leaves, which means that it's not technically correct to copy records. We /should/ be copying keys. Note that this has never resulted in actual memory corruption because sizeof(bmbt_rec) == (sizeof(bmbt_key) + sizeof(bmbt_ptr)). However, this will no longer be true when we start adding realtime rmap stuff, so fix this now. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_inode_fork.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 0f220f100069..b73b971b83cd 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -466,15 +466,15 @@ xfs_iroot_realloc( memcpy(new_broot, ifp->if_broot, xfs_bmbt_block_len(ip->i_mount)); /* - * Only copy the records and pointers if there are any. + * Only copy the keys and pointers if there are any. */ if (new_max > 0) { /* - * First copy the records. + * First copy the keys. 
*/ - op = (char *)xfs_bmbt_rec_addr(mp, ifp->if_broot, 1); - np = (char *)xfs_bmbt_rec_addr(mp, new_broot, 1); - memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_rec_t)); + op = (char *)xfs_bmbt_key_addr(mp, ifp->if_broot, 1); + np = (char *)xfs_bmbt_key_addr(mp, new_broot, 1); + memcpy(np, op, new_max * (uint)sizeof(xfs_bmbt_key_t)); /* * Then copy the pointers. ^ permalink raw reply related [flat|nested] 565+ messages in thread
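The size coincidence described in the commit message can be written down as a compile-time check. This is only a sketch with stand-in types (the model_* structs mirror the field widths of the bmbt record, key, and pointer, but are not the kernel definitions): a record is two 64-bit words, a key is one, and a long-format pointer is one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the ondisk bmbt types; the names are hypothetical. */
struct model_bmbt_rec { uint64_t l0, l1; };
struct model_bmbt_key { uint64_t br_startoff; };
typedef uint64_t model_bmbt_ptr_t;

/* The equality that kept the old record-sized copy from overrunning. */
_Static_assert(sizeof(struct model_bmbt_rec) ==
	       sizeof(struct model_bmbt_key) + sizeof(model_bmbt_ptr_t),
	       "rec size must equal key size plus ptr size");

/* Bytes the old code copied when it treated the root as holding records. */
static size_t model_bytes_copied_as_recs(unsigned int new_max)
{
	return new_max * sizeof(struct model_bmbt_rec);
}

/* Payload bytes in a new root sized for new_max key/pointer pairs. */
static size_t model_payload_bytes(unsigned int new_max)
{
	return new_max * (sizeof(struct model_bmbt_key) +
			  sizeof(model_bmbt_ptr_t));
}
```

A btree type whose record is not exactly one key plus one pointer would fail the static assertion, which is the situation the commit message is getting ahead of.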
* [PATCH 06/14] xfs: move the zero records logic into xfs_bmap_broot_space_calc 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 04/14] xfs: fix a sloppy memory handling bug in xfs_iroot_realloc Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 13/14] xfs: support storing records in the inode core root Darrick J. Wong ` (7 subsequent siblings) 13 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> The bmap btree cannot ever have zero records in an incore btree block. If the number of records drops to zero, that means we're converting the fork to extents format and are trying to remove the tree. This logic won't hold for the future realtime rmap btree, so move the logic into the bmbt code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap_btree.h | 7 +++++++ fs/xfs/libxfs/xfs_inode_fork.c | 6 ++---- 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap_btree.h b/fs/xfs/libxfs/xfs_bmap_btree.h index 3fe9c4f7f1a0..5a3bae94debd 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.h +++ b/fs/xfs/libxfs/xfs_bmap_btree.h @@ -162,6 +162,13 @@ xfs_bmap_broot_space_calc( struct xfs_mount *mp, unsigned int nrecs) { + /* + * If the bmbt root block is empty, we should be converting the fork + * to extents format. Hence, the size is zero. 
+ */ + if (nrecs == 0) + return 0; + return xfs_bmbt_block_len(mp) + \ (nrecs * (sizeof(struct xfs_bmbt_key) + sizeof(xfs_bmbt_ptr_t))); } diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 16782d3630d2..1bd8c1f9ce37 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -488,10 +488,8 @@ xfs_iroot_realloc( cur_max = xfs_bmbt_maxrecs(mp, old_size, 0); new_max = cur_max + rec_diff; ASSERT(new_max >= 0); - if (new_max > 0) - new_size = xfs_bmap_broot_space_calc(mp, new_max); - else - new_size = 0; + + new_size = xfs_bmap_broot_space_calc(mp, new_max); if (new_size == 0) { xfs_iroot_free(ip, whichfork); return; ^ permalink raw reply related [flat|nested] 565+ messages in thread
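A quick sketch of what the caller sees after this change, assuming a hypothetical header length passed in by hand (the kernel gets it from xfs_bmbt_block_len(mp)); model_broot_space_calc is an illustrative stand-in, not the real helper.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_KEY_LEN	8u	/* one 64-bit key */
#define MODEL_PTR_LEN	8u	/* one 64-bit block pointer */

/*
 * Size of an incore bmbt root holding nrecs key/pointer pairs.  Zero
 * records means the fork is going back to extents format, so there is
 * no root at all and the size is zero rather than a bare header.
 */
static size_t model_broot_space_calc(size_t hdr_len, unsigned int nrecs)
{
	if (nrecs == 0)
		return 0;

	return hdr_len + nrecs * (MODEL_KEY_LEN + MODEL_PTR_LEN);
}
```

With the special case living in the size function, xfs_iroot_realloc only has to test for a zero size before freeing the root, which is what the second hunk does.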
* [PATCH 13/14] xfs: support storing records in the inode core root 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 06/14] xfs: move the zero records logic into xfs_bmap_broot_space_calc Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 07/14] xfs: rearrange xfs_iroot_realloc a bit Darrick J. Wong ` (6 subsequent siblings) 13 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add the necessary flags and code so that we can support storing leaf records in the inode root block of a btree. This hasn't been necessary before, but the realtime rmapbt will need to be able to do this. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_btree.c | 150 ++++++++++++++++++++++++++++++++++--- fs/xfs/libxfs/xfs_btree.h | 1 fs/xfs/libxfs/xfs_btree_staging.c | 4 + 3 files changed, 141 insertions(+), 14 deletions(-) diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index d3e073a21063..18628542d316 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -267,6 +267,11 @@ xfs_btree_check_block( int level, /* level of the btree block */ struct xfs_buf *bp) /* buffer containing block, if any */ { + /* Don't check the inode-core root. 
*/ + if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) && + level == cur->bc_nlevels - 1) + return 0; + if (cur->bc_flags & XFS_BTREE_LONG_PTRS) return xfs_btree_check_lblock(cur, block, level, bp); else @@ -1570,12 +1575,16 @@ xfs_btree_log_recs( int first, int last) { + if (!bp) { + xfs_trans_log_inode(cur->bc_tp, cur->bc_ino.ip, + xfs_ilog_fbroot(cur->bc_ino.whichfork)); + return; + } xfs_trans_buf_set_type(cur->bc_tp, bp, XFS_BLFT_BTREE_BUF); xfs_trans_log_buf(cur->bc_tp, bp, xfs_btree_rec_offset(cur, first), xfs_btree_rec_offset(cur, last + 1) - 1); - } /* @@ -3091,6 +3100,64 @@ xfs_btree_iroot_realloc( cur->bc_ops->iroot_ops, rec_diff); } +/* + * Move the records from a root leaf block to a separate block. + * + * Trickery here: The amount of memory that we need per record for the incore + * root block changes when we convert a leaf block to an internal block. + * Therefore, we copy leaf records into the new btree block (cblock) before + * freeing the incore root block and changing the tree height. + * + * Once we've changed the tree height, we allocate a new incore root block + * (which will now be an internal root block) and populate it with a pointer to + * cblock and the relevant keys. + */ +STATIC void +xfs_btree_promote_leaf_iroot( + struct xfs_btree_cur *cur, + struct xfs_btree_block *block, + struct xfs_buf *cbp, + union xfs_btree_ptr *cptr, + struct xfs_btree_block *cblock) +{ + union xfs_btree_rec *rp; + union xfs_btree_rec *crp; + union xfs_btree_key *kp; + union xfs_btree_ptr *pp; + size_t size; + int numrecs = xfs_btree_get_numrecs(block); + + /* Copy the records from the leaf root into the new child block. */ + rp = xfs_btree_rec_addr(cur, 1, block); + crp = xfs_btree_rec_addr(cur, 1, cblock); + xfs_btree_copy_recs(cur, crp, rp, numrecs); + + /* Zap the old root and change the tree height. 
*/ + xfs_iroot_free(cur->bc_ino.ip, cur->bc_ino.whichfork); + cur->bc_nlevels++; + cur->bc_levels[1].ptr = 1; + + /* + * Allocate a new internal root block buffer and reinitialize it to + * point to a single new child. + */ + size = cur->bc_ops->iroot_ops->size(cur->bc_mp, cur->bc_nlevels - 1, 1); + xfs_iroot_alloc(cur->bc_ino.ip, cur->bc_ino.whichfork, size); + block = xfs_btree_get_iroot(cur); + xfs_btree_init_block(cur->bc_mp, block, cur->bc_ops, + cur->bc_nlevels - 1, 1, cur->bc_ino.ip->i_ino); + + pp = xfs_btree_ptr_addr(cur, 1, block); + kp = xfs_btree_key_addr(cur, 1, block); + xfs_btree_copy_ptrs(cur, pp, cptr, 1); + xfs_btree_get_keys(cur, cblock, kp); + + /* Attach the new block to the cursor and log it. */ + xfs_btree_setbuf(cur, 0, cbp); + xfs_btree_log_block(cur, cbp, XFS_BB_ALL_BITS); + xfs_btree_log_recs(cur, cbp, 1, numrecs); +} + /* * Move the keys and pointers from a root block to a separate block. * @@ -3175,7 +3242,7 @@ xfs_btree_new_iroot( struct xfs_buf *cbp; /* buffer for cblock */ struct xfs_btree_block *block; /* btree block */ struct xfs_btree_block *cblock; /* child btree block */ - union xfs_btree_ptr *pp; + union xfs_btree_ptr aptr; union xfs_btree_ptr nptr; /* new block addr */ int level; /* btree level */ int error; /* error return code */ @@ -3187,10 +3254,15 @@ xfs_btree_new_iroot( level = cur->bc_nlevels - 1; block = xfs_btree_get_iroot(cur); - pp = xfs_btree_ptr_addr(cur, 1, block); + ASSERT(level > 0 || (cur->bc_flags & XFS_BTREE_IROOT_RECORDS)); + if (level > 0) + aptr = *xfs_btree_ptr_addr(cur, 1, block); + else + aptr.l = cpu_to_be64(XFS_INO_TO_FSB(cur->bc_mp, + cur->bc_ino.ip->i_ino)); /* Allocate the new block. If we can't do it, we're toast. Give up. 
*/ - error = xfs_btree_alloc_block(cur, pp, &nptr, stat); + error = xfs_btree_alloc_block(cur, &aptr, &nptr, stat); if (error) goto error0; if (*stat == 0) @@ -3216,10 +3288,14 @@ xfs_btree_new_iroot( cblock->bb_u.s.bb_blkno = bno; } - error = xfs_btree_promote_node_iroot(cur, block, level, cbp, &nptr, - cblock); - if (error) - goto error0; + if (level > 0) { + error = xfs_btree_promote_node_iroot(cur, block, level, cbp, + &nptr, cblock); + if (error) + goto error0; + } else { + xfs_btree_promote_leaf_iroot(cur, block, cbp, &nptr, cblock); + } *logflags |= XFS_ILOG_CORE | xfs_ilog_fbroot(cur->bc_ino.whichfork); @@ -3716,6 +3792,45 @@ xfs_btree_insert( return error; } +/* + * Move the records from a child leaf block to the root block. + * + * Trickery here: The amount of memory we need per record for the incore root + * block changes when we convert a leaf block to an internal block. Therefore, + * we free the incore root block, change the tree height, allocate a new incore + * root, and copy the records from the doomed block into the new root. + */ +STATIC void +xfs_btree_demote_leaf_child( + struct xfs_btree_cur *cur, + struct xfs_btree_block *cblock, + int numrecs) +{ + union xfs_btree_rec *rp; + union xfs_btree_rec *crp; + struct xfs_btree_block *block; + size_t size; + + /* Zap the old root and change the tree height. */ + xfs_iroot_free(cur->bc_ino.ip, cur->bc_ino.whichfork); + cur->bc_levels[0].bp = NULL; + cur->bc_nlevels--; + + /* + * Allocate a new internal root block buffer and reinitialize it with + * the leaf records in the child. 
+ */ + size = cur->bc_ops->iroot_ops->size(cur->bc_mp, 0, numrecs); + xfs_iroot_alloc(cur->bc_ino.ip, cur->bc_ino.whichfork, size); + block = xfs_btree_get_iroot(cur); + xfs_btree_init_block(cur->bc_mp, block, cur->bc_ops, 0, numrecs, + cur->bc_ino.ip->i_ino); + + rp = xfs_btree_rec_addr(cur, 1, block); + crp = xfs_btree_rec_addr(cur, 1, cblock); + xfs_btree_copy_recs(cur, rp, crp, numrecs); +} + /* * Move the keyptrs from a child node block to the root block. * @@ -3797,14 +3912,19 @@ xfs_btree_kill_iroot( #endif ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE); - ASSERT(cur->bc_nlevels > 1); + ASSERT((cur->bc_flags & XFS_BTREE_IROOT_RECORDS) || + cur->bc_nlevels > 1); /* * Don't deal with the root block needs to be a leaf case. * We're just going to turn the thing back into extents anyway. */ level = cur->bc_nlevels - 1; - if (level == 1) + if (level == 1 && !(cur->bc_flags & XFS_BTREE_IROOT_RECORDS)) + goto out0; + + /* If we're already a leaf, jump out. */ + if (level == 0) goto out0; /* @@ -3834,9 +3954,13 @@ xfs_btree_kill_iroot( ASSERT(xfs_btree_ptr_is_null(cur, &ptr)); #endif - error = xfs_btree_demote_node_child(cur, cblock, level, numrecs); - if (error) - return error; + if (level > 1) { + error = xfs_btree_demote_node_child(cur, cblock, level, + numrecs); + if (error) + return error; + } else + xfs_btree_demote_leaf_child(cur, cblock, numrecs); error = xfs_btree_free_block(cur, cbp); if (error) diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index 3acfdcdf7561..b15bc77369cf 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -334,6 +334,7 @@ xfs_btree_cur_sizeof(unsigned int nlevels) * is dynamically allocated and must be freed when the cursor is deleted. 
*/ #define XFS_BTREE_STAGING (1<<5) +#define XFS_BTREE_IROOT_RECORDS (1<<6) /* iroot can store records */ /* btree stored in memory; not compatible with ROOT_IN_INODE */ #ifdef CONFIG_XFS_IN_MEMORY_BTREE diff --git a/fs/xfs/libxfs/xfs_btree_staging.c b/fs/xfs/libxfs/xfs_btree_staging.c index 73d9aaeafead..d647b40351c8 100644 --- a/fs/xfs/libxfs/xfs_btree_staging.c +++ b/fs/xfs/libxfs/xfs_btree_staging.c @@ -699,7 +699,9 @@ xfs_btree_bload_compute_geometry( * * Note that bmap btrees forbid records in the root. */ - if (level != 0 && nr_this_level <= avg_per_block) { + if ((level != 0 || + (cur->bc_flags & XFS_BTREE_IROOT_RECORDS)) && + nr_this_level <= avg_per_block) { nr_blocks++; break; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
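The new branching at the top and bottom of xfs_btree_kill_iroot can be summarized as a small decision function. This is a sketch only: the enum and function names are invented, and any checks that sit between these branches in the real function but are not shown in the hunks above are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for what kill_iroot ends up doing. */
enum model_kill_action {
	MODEL_KILL_NOTHING,	/* jump to out0 */
	MODEL_KILL_DEMOTE_NODE,	/* pull keyptrs from a node child into the root */
	MODEL_KILL_DEMOTE_LEAF,	/* pull records from a leaf child into the root */
};

static enum model_kill_action
model_kill_iroot_action(int nlevels, bool iroot_records)
{
	int level = nlevels - 1;

	/* Without IROOT_RECORDS the root can never become a leaf. */
	if (level == 1 && !iroot_records)
		return MODEL_KILL_NOTHING;

	/* The root is already a leaf; nothing to merge. */
	if (level == 0)
		return MODEL_KILL_NOTHING;

	return level > 1 ? MODEL_KILL_DEMOTE_NODE : MODEL_KILL_DEMOTE_LEAF;
}
```

For the bmap btree (flag clear) the behaviour is unchanged: a two-level tree is left alone and taller trees take the node path. Only a btree that sets the flag ever reaches the leaf demotion.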
* [PATCH 07/14] xfs: rearrange xfs_iroot_realloc a bit 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 13/14] xfs: support storing records in the inode core root Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 10/14] xfs: support leaves in the incore btree root block in xfs_iroot_realloc Darrick J. Wong ` (5 subsequent siblings) 13 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Rearrange the innards of xfs_iroot_realloc so that we can reduce duplicated code prior to genericizing the function. No functional changes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_inode_fork.c | 49 ++++++++++++++++++---------------------- 1 file changed, 22 insertions(+), 27 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 1bd8c1f9ce37..ceab02b19d26 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -433,44 +433,46 @@ xfs_ifork_move_broot( */ void xfs_iroot_realloc( - xfs_inode_t *ip, + struct xfs_inode *ip, int rec_diff, int whichfork) { struct xfs_mount *mp = ip->i_mount; - int cur_max; struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_block *new_broot; - int new_max; size_t new_size; size_t old_size = ifp->if_broot_bytes; + int cur_max; + int new_max; + + /* Handle degenerate cases. */ + if (rec_diff == 0) + return; /* - * Handle the degenerate case quietly. + * If there wasn't any memory allocated before, just allocate it now + * and get out. 
*/ - if (rec_diff == 0) { + if (old_size == 0) { + ASSERT(rec_diff > 0); + + new_size = xfs_bmap_broot_space_calc(mp, rec_diff); + xfs_iroot_alloc(ip, whichfork, new_size); return; } + /* Compute the new and old record count and space requirements. */ + cur_max = xfs_bmbt_maxrecs(mp, old_size, 0); + new_max = cur_max + rec_diff; + ASSERT(new_max >= 0); + new_size = xfs_bmap_broot_space_calc(mp, new_max); + if (rec_diff > 0) { - /* - * If there wasn't any memory allocated before, just - * allocate it now and get out. - */ - if (old_size == 0) { - new_size = xfs_bmap_broot_space_calc(mp, rec_diff); - xfs_iroot_alloc(ip, whichfork, new_size); - return; - } - /* * If there is already an existing if_broot, then we need * to realloc() it and shift the pointers to their new * location. */ - cur_max = xfs_bmbt_maxrecs(mp, old_size, 0); - new_max = cur_max + rec_diff; - new_size = xfs_bmap_broot_space_calc(mp, new_max); ifp->if_broot = krealloc(ifp->if_broot, new_size, GFP_NOFS | __GFP_NOFAIL); ifp->if_broot_bytes = new_size; @@ -482,14 +484,8 @@ xfs_iroot_realloc( /* * rec_diff is less than 0. In this case, we are shrinking the * if_broot buffer. It must already exist. If we go to zero - * records, just get rid of the root and clear the status bit. + * bytes, just get rid of the root and clear the status bit. */ - ASSERT((ifp->if_broot != NULL) && (old_size > 0)); - cur_max = xfs_bmbt_maxrecs(mp, old_size, 0); - new_max = cur_max + rec_diff; - ASSERT(new_max >= 0); - - new_size = xfs_bmap_broot_space_calc(mp, new_max); if (new_size == 0) { xfs_iroot_free(ip, whichfork); return; @@ -502,8 +498,7 @@ xfs_iroot_realloc( kmem_free(ifp->if_broot); ifp->if_broot = new_broot; - ifp->if_broot_bytes = (int)new_size; - return; + ifp->if_broot_bytes = new_size; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
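To make the rearranged control flow easier to see, here is a sketch that only classifies which path the function takes after this patch. The header and keyptr sizes are hypothetical constants and all model_* names are invented; the real function does the allocation and copying at each step instead of returning a label.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_HDR_LEN		24u	/* hypothetical block header length */
#define MODEL_KEYPTR_LEN	16u	/* one key plus one pointer */

enum model_realloc_path {
	MODEL_NOOP,		/* rec_diff == 0 */
	MODEL_ALLOC_FRESH,	/* no root yet, allocate one */
	MODEL_GROW_IN_PLACE,	/* krealloc and shift the pointers */
	MODEL_FREE_ROOT,	/* shrank to zero bytes */
	MODEL_SHRINK_COPY,	/* allocate smaller buffer and move contents */
};

static size_t model_size(int nrecs)
{
	return nrecs == 0 ? 0 : MODEL_HDR_LEN + (size_t)nrecs * MODEL_KEYPTR_LEN;
}

static int model_maxrecs(size_t bytes)
{
	return (int)((bytes - MODEL_HDR_LEN) / MODEL_KEYPTR_LEN);
}

static enum model_realloc_path
model_classify_realloc(size_t old_size, int rec_diff)
{
	int new_max;

	if (rec_diff == 0)
		return MODEL_NOOP;
	if (old_size == 0)
		return MODEL_ALLOC_FRESH;	/* caller must pass rec_diff > 0 */

	/* Computed once up front, as in the rearranged function. */
	new_max = model_maxrecs(old_size) + rec_diff;

	if (rec_diff > 0)
		return MODEL_GROW_IN_PLACE;
	if (model_size(new_max) == 0)
		return MODEL_FREE_ROOT;
	return MODEL_SHRINK_COPY;
}
```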
* [PATCH 10/14] xfs: support leaves in the incore btree root block in xfs_iroot_realloc 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 07/14] xfs: rearrange xfs_iroot_realloc a bit Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 12/14] xfs: hoist the node iroot update code out of xfs_btree_kill_iroot Darrick J. Wong ` (4 subsequent siblings) 13 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add some logic to xfs_iroot_realloc so that we can handle leaf records in the btree root block correctly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap_btree.c | 4 +++- fs/xfs/libxfs/xfs_bmap_btree.h | 5 ++++- fs/xfs/libxfs/xfs_inode_fork.c | 12 +++++++----- fs/xfs/libxfs/xfs_inode_fork.h | 5 +++-- fs/xfs/scrub/bmap_repair.c | 2 +- 5 files changed, 18 insertions(+), 10 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index f9d4ca6ced1f..4c6a91acdad6 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -536,6 +536,7 @@ xfs_bmbt_broot_move( size_t dst_bytes, struct xfs_btree_block *src_broot, size_t src_bytes, + unsigned int level, unsigned int numrecs) { struct xfs_mount *mp = ip->i_mount; @@ -543,6 +544,7 @@ xfs_bmbt_broot_move( void *sptr; ASSERT(xfs_bmap_bmdr_space(src_broot) <= xfs_inode_fork_size(ip, whichfork)); + ASSERT(level > 0); /* * We always have to move the pointers because they are not butted @@ -857,7 +859,7 @@ xfs_bmbt_iroot_alloc( struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); xfs_iroot_alloc(ip, whichfork, - xfs_bmap_broot_space_calc(ip->i_mount, 1)); + xfs_bmap_broot_space_calc(ip->i_mount, 1, 1)); /* Fill in the root. 
*/ xfs_btree_init_block(ip->i_mount, ifp->if_broot, &xfs_bmbt_ops, 1, 1, diff --git a/fs/xfs/libxfs/xfs_bmap_btree.h b/fs/xfs/libxfs/xfs_bmap_btree.h index a9ddc9b42e61..d20321bfe2f6 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.h +++ b/fs/xfs/libxfs/xfs_bmap_btree.h @@ -161,8 +161,11 @@ xfs_bmap_broot_ptr_addr( static inline size_t xfs_bmap_broot_space_calc( struct xfs_mount *mp, + unsigned int level, unsigned int nrecs) { + ASSERT(level > 0); + /* * If the bmbt root block is empty, we should be converting the fork * to extents format. Hence, the size is zero. @@ -183,7 +186,7 @@ xfs_bmap_broot_space( struct xfs_mount *mp, struct xfs_bmdr_block *bb) { - return xfs_bmap_broot_space_calc(mp, be16_to_cpu(bb->bb_numrecs)); + return xfs_bmap_broot_space_calc(mp, 1, be16_to_cpu(bb->bb_numrecs)); } /* Compute the space required for the ondisk root block. */ diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 0ac1c8dba2ed..b844bfd94e9c 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -395,6 +395,7 @@ xfs_iroot_realloc( struct xfs_btree_block *new_broot; size_t new_size; size_t old_size = ifp->if_broot_bytes; + unsigned int level; int cur_max; int new_max; @@ -409,16 +410,17 @@ xfs_iroot_realloc( if (old_size == 0) { ASSERT(rec_diff > 0); - new_size = ops->size(mp, rec_diff); + new_size = ops->size(mp, 0, rec_diff); xfs_iroot_alloc(ip, whichfork, new_size); return; } /* Compute the new and old record count and space requirements. 
*/ - cur_max = ops->maxrecs(mp, old_size, false); + level = be16_to_cpu(ifp->if_broot->bb_level); + cur_max = ops->maxrecs(mp, old_size, level == 0); new_max = cur_max + rec_diff; ASSERT(new_max >= 0); - new_size = ops->size(mp, new_max); + new_size = ops->size(mp, level, new_max); if (rec_diff > 0) { /* @@ -430,7 +432,7 @@ xfs_iroot_realloc( GFP_NOFS | __GFP_NOFAIL); ifp->if_broot_bytes = new_size; ops->move(ip, whichfork, ifp->if_broot, new_size, - ifp->if_broot, old_size, cur_max); + ifp->if_broot, old_size, level, cur_max); return; } @@ -447,7 +449,7 @@ xfs_iroot_realloc( /* Reallocate the btree root and move the contents. */ new_broot = kmem_alloc(new_size, KM_NOFS); ops->move(ip, whichfork, new_broot, new_size, ifp->if_broot, - ifp->if_broot_bytes, new_max); + ifp->if_broot_bytes, level, new_max); kmem_free(ifp->if_broot); ifp->if_broot = new_broot; diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index 7d95c402f870..3734642917a7 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -277,7 +277,8 @@ struct xfs_ifork_broot_ops { bool leaf); /* Calculate the bytes required for the incore btree root block. */ - size_t (*size)(struct xfs_mount *mp, unsigned int nrecs); + size_t (*size)(struct xfs_mount *mp, unsigned int level, + unsigned int nrecs); /* * Move an incore btree root from one buffer to another. 
Note that @@ -287,7 +288,7 @@ struct xfs_ifork_broot_ops { void (*move)(struct xfs_inode *ip, int whichfork, struct xfs_btree_block *dst_broot, size_t dst_bytes, struct xfs_btree_block *src_broot, size_t src_bytes, - unsigned int numrecs); + unsigned int level, unsigned int numrecs); }; void xfs_iroot_realloc(struct xfs_inode *ip, int whichfork, diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index 73ba5c514cde..0ad0f27fd8ca 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -437,7 +437,7 @@ xrep_bmap_iroot_size( { ASSERT(level > 0); - return xfs_bmap_broot_space_calc(cur->bc_mp, nr_this_level); + return xfs_bmap_broot_space_calc(cur->bc_mp, level, nr_this_level); } /* Update the inode counters. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
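As a rough illustration of why ->size grows a level argument, the sketch below puts two size callbacks behind one ops struct. The bmbt-like one insists on level > 0, matching the new ASSERT; the second belongs to an imaginary btree that may keep records in the root. All names and byte counts here are hypothetical, including the 24-byte record.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_HDR_LEN	24u	/* hypothetical block header length */
#define MODEL_KEY_LEN	8u
#define MODEL_PTR_LEN	8u
#define MODEL_REC_LEN	24u	/* hypothetical leaf record size */

/* Cut-down stand-in for the ops table; only ->size is modelled. */
struct model_broot_ops {
	size_t (*size)(unsigned int level, unsigned int nrecs);
};

/* bmbt-like: the root is always a node, and an empty root has no size. */
static size_t model_bmbt_size(unsigned int level, unsigned int nrecs)
{
	assert(level > 0);
	if (nrecs == 0)
		return 0;
	return MODEL_HDR_LEN + nrecs * (MODEL_KEY_LEN + MODEL_PTR_LEN);
}

/* Imaginary btree with records in the root: per-entry size depends on level. */
static size_t model_recroot_size(unsigned int level, unsigned int nrecs)
{
	if (level == 0)
		return MODEL_HDR_LEN + nrecs * MODEL_REC_LEN;
	return MODEL_HDR_LEN + nrecs * (MODEL_KEY_LEN + MODEL_PTR_LEN);
}

static const struct model_broot_ops model_bmbt_ops = {
	.size = model_bmbt_size,
};

static const struct model_broot_ops model_recroot_ops = {
	.size = model_recroot_size,
};
```

A generic caller can then ask ops->size(level, nrecs) without knowing whether the root currently holds records or keyptrs.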
* [PATCH 12/14] xfs: hoist the node iroot update code out of xfs_btree_kill_iroot 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 10/14] xfs: support leaves in the incore btree root block in xfs_iroot_realloc Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 11/14] xfs: hoist the node iroot update code out of xfs_btree_new_iroot Darrick J. Wong ` (3 subsequent siblings) 13 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> In preparation for allowing records in an inode btree root, hoist the code that copies keyptrs from an existing node child into the root block to a separate function. Remove some unnecessary conditionals and clean up a few function calls in the new function. Note that this change reorders the ->free_block call with respect to the change in bc_nlevels to make it easier to support inode root leaf blocks in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_btree.c | 94 +++++++++++++++++++++++++++++---------------- 1 file changed, 60 insertions(+), 34 deletions(-) diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index 94df8c6000eb..d3e073a21063 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -3716,6 +3716,63 @@ xfs_btree_insert( return error; } +/* + * Move the keyptrs from a child node block to the root block. + * + * Since the keyptr size does not change, all we have to do is resize the + * root to fit the child's keyptrs, copy the keys and pointers from the + * doomed child block (cblock) into the root, and decrease the tree height. 
+ */ +STATIC int +xfs_btree_demote_node_child( + struct xfs_btree_cur *cur, + struct xfs_btree_block *cblock, + int level, + int numrecs) +{ + struct xfs_btree_block *block; + union xfs_btree_key *ckp; + union xfs_btree_key *kp; + union xfs_btree_ptr *cpp; + union xfs_btree_ptr *pp; + int i; + int error; + int diff; + + /* + * Adjust the root btree node size and the record count to match the + * doomed child so that we can copy the keyptrs ahead of changing the + * tree shape. + */ + diff = numrecs - cur->bc_ops->get_maxrecs(cur, level); + xfs_btree_iroot_realloc(cur, diff); + block = xfs_btree_get_iroot(cur); + + xfs_btree_set_numrecs(block, numrecs); + ASSERT(block->bb_numrecs == cblock->bb_numrecs); + + /* Copy keys from the doomed block. */ + kp = xfs_btree_key_addr(cur, 1, block); + ckp = xfs_btree_key_addr(cur, 1, cblock); + xfs_btree_copy_keys(cur, kp, ckp, numrecs); + + /* Copy pointers from the doomed block. */ + pp = xfs_btree_ptr_addr(cur, 1, block); + cpp = xfs_btree_ptr_addr(cur, 1, cblock); + for (i = 0; i < numrecs; i++) { + error = xfs_btree_debug_check_ptr(cur, cpp, i, level - 1); + if (error) + return error; + } + xfs_btree_copy_ptrs(cur, pp, cpp, numrecs); + + /* Decrease tree height, adjusting the root block level to match. */ + cur->bc_levels[level - 1].bp = NULL; + be16_add_cpu(&block->bb_level, -1); + cur->bc_nlevels--; + return 0; +} + /* * Try to merge a non-leaf block back into the inode root. 
* @@ -3728,24 +3785,16 @@ STATIC int xfs_btree_kill_iroot( struct xfs_btree_cur *cur) { - int whichfork = cur->bc_ino.whichfork; struct xfs_inode *ip = cur->bc_ino.ip; - struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); struct xfs_btree_block *block; struct xfs_btree_block *cblock; - union xfs_btree_key *kp; - union xfs_btree_key *ckp; - union xfs_btree_ptr *pp; - union xfs_btree_ptr *cpp; struct xfs_buf *cbp; int level; - int index; int numrecs; int error; #ifdef DEBUG union xfs_btree_ptr ptr; #endif - int i; ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE); ASSERT(cur->bc_nlevels > 1); @@ -3785,39 +3834,16 @@ xfs_btree_kill_iroot( ASSERT(xfs_btree_ptr_is_null(cur, &ptr)); #endif - index = numrecs - cur->bc_ops->get_maxrecs(cur, level); - if (index) { - xfs_btree_iroot_realloc(cur, index); - block = ifp->if_broot; - } - - be16_add_cpu(&block->bb_numrecs, index); - ASSERT(block->bb_numrecs == cblock->bb_numrecs); - - kp = xfs_btree_key_addr(cur, 1, block); - ckp = xfs_btree_key_addr(cur, 1, cblock); - xfs_btree_copy_keys(cur, kp, ckp, numrecs); - - pp = xfs_btree_ptr_addr(cur, 1, block); - cpp = xfs_btree_ptr_addr(cur, 1, cblock); - - for (i = 0; i < numrecs; i++) { - error = xfs_btree_debug_check_ptr(cur, cpp, i, level - 1); - if (error) - return error; - } - - xfs_btree_copy_ptrs(cur, pp, cpp, numrecs); + error = xfs_btree_demote_node_child(cur, cblock, level, numrecs); + if (error) + return error; error = xfs_btree_free_block(cur, cbp); if (error) return error; - cur->bc_levels[level - 1].bp = NULL; - be16_add_cpu(&block->bb_level, -1); xfs_trans_log_inode(cur->bc_tp, ip, XFS_ILOG_CORE | xfs_ilog_fbroot(cur->bc_ino.whichfork)); - cur->bc_nlevels--; out0: return 0; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 11/14] xfs: hoist the node iroot update code out of xfs_btree_new_iroot 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong ` (9 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 12/14] xfs: hoist the node iroot update code out of xfs_btree_kill_iroot Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 09/14] xfs: generalize the btree root reallocation function Darrick J. Wong ` (2 subsequent siblings) 13 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> In preparation for allowing records in an inode btree root, hoist the code that copies keyptrs from an existing node root into a child block to a separate function. Note that the new function explicitly computes the keys of the new child block and stores that in the root block; while the bmap btree could rely on leaving the key alone, realtime rmap needs to set the new high key. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_btree.c | 113 +++++++++++++++++++++++++++++---------------- 1 file changed, 74 insertions(+), 39 deletions(-) diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index c2e6b4ea28bf..94df8c6000eb 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -3091,6 +3091,77 @@ xfs_btree_iroot_realloc( cur->bc_ops->iroot_ops, rec_diff); } +/* + * Move the keys and pointers from a root block to a separate block. + * + * Since the keyptr size does not change, all we have to do is increase the + * tree height, copy the keyptrs to the new internal node (cblock), shrink + * the root, and copy the pointers there. 
+ */ +STATIC int +xfs_btree_promote_node_iroot( + struct xfs_btree_cur *cur, + struct xfs_btree_block *block, + int level, + struct xfs_buf *cbp, + union xfs_btree_ptr *cptr, + struct xfs_btree_block *cblock) +{ + union xfs_btree_key *ckp; + union xfs_btree_key *kp; + union xfs_btree_ptr *cpp; + union xfs_btree_ptr *pp; + int i; + int error; + int numrecs = xfs_btree_get_numrecs(block); + + /* + * Increase tree height, adjusting the root block level to match. + * We cannot change the root btree node size until we've copied the + * block contents to the new child block. + */ + be16_add_cpu(&block->bb_level, 1); + cur->bc_nlevels++; + cur->bc_levels[level + 1].ptr = 1; + + /* + * Adjust the root btree record count, then copy the keys from the old + * root to the new child block. + */ + xfs_btree_set_numrecs(block, 1); + kp = xfs_btree_key_addr(cur, 1, block); + ckp = xfs_btree_key_addr(cur, 1, cblock); + xfs_btree_copy_keys(cur, ckp, kp, numrecs); + + /* Check the pointers and copy them to the new child block. */ + pp = xfs_btree_ptr_addr(cur, 1, block); + cpp = xfs_btree_ptr_addr(cur, 1, cblock); + for (i = 0; i < numrecs; i++) { + error = xfs_btree_debug_check_ptr(cur, pp, i, level); + if (error) + return error; + } + xfs_btree_copy_ptrs(cur, cpp, pp, numrecs); + + /* + * Set the first keyptr to point to the new child block, then shrink + * the memory buffer for the root block. + */ + error = xfs_btree_debug_check_ptr(cur, cptr, 0, level); + if (error) + return error; + xfs_btree_copy_ptrs(cur, pp, cptr, 1); + xfs_btree_get_keys(cur, cblock, kp); + xfs_btree_iroot_realloc(cur, 1 - numrecs); + + /* Attach the new block to the cursor and log it. */ + xfs_btree_setbuf(cur, level, cbp); + xfs_btree_log_block(cur, cbp, XFS_BB_ALL_BITS); + xfs_btree_log_keys(cur, cbp, 1, numrecs); + xfs_btree_log_ptrs(cur, cbp, 1, numrecs); + return 0; +} + /* * Copy the old inode root contents into a real block and make the * broot point to it. 
@@ -3104,14 +3175,10 @@ xfs_btree_new_iroot( struct xfs_buf *cbp; /* buffer for cblock */ struct xfs_btree_block *block; /* btree block */ struct xfs_btree_block *cblock; /* child btree block */ - union xfs_btree_key *ckp; /* child key pointer */ - union xfs_btree_ptr *cpp; /* child ptr pointer */ - union xfs_btree_key *kp; /* pointer to btree key */ - union xfs_btree_ptr *pp; /* pointer to block addr */ + union xfs_btree_ptr *pp; union xfs_btree_ptr nptr; /* new block addr */ int level; /* btree level */ int error; /* error return code */ - int i; /* loop counter */ XFS_BTREE_STATS_INC(cur, newroot); @@ -3149,43 +3216,11 @@ xfs_btree_new_iroot( cblock->bb_u.s.bb_blkno = bno; } - be16_add_cpu(&block->bb_level, 1); - xfs_btree_set_numrecs(block, 1); - cur->bc_nlevels++; - ASSERT(cur->bc_nlevels <= cur->bc_maxlevels); - cur->bc_levels[level + 1].ptr = 1; - - kp = xfs_btree_key_addr(cur, 1, block); - ckp = xfs_btree_key_addr(cur, 1, cblock); - xfs_btree_copy_keys(cur, ckp, kp, xfs_btree_get_numrecs(cblock)); - - cpp = xfs_btree_ptr_addr(cur, 1, cblock); - for (i = 0; i < be16_to_cpu(cblock->bb_numrecs); i++) { - error = xfs_btree_debug_check_ptr(cur, pp, i, level); - if (error) - goto error0; - } - - xfs_btree_copy_ptrs(cur, cpp, pp, xfs_btree_get_numrecs(cblock)); - - error = xfs_btree_debug_check_ptr(cur, &nptr, 0, level); + error = xfs_btree_promote_node_iroot(cur, block, level, cbp, &nptr, + cblock); if (error) goto error0; - xfs_btree_copy_ptrs(cur, pp, &nptr, 1); - - xfs_btree_iroot_realloc(cur, 1 - xfs_btree_get_numrecs(cblock)); - - xfs_btree_setbuf(cur, level, cbp); - - /* - * Do all this logging at the end so that - * the root is at the right level. 
- */ - xfs_btree_log_block(cur, cbp, XFS_BB_ALL_BITS); - xfs_btree_log_keys(cur, cbp, 1, be16_to_cpu(cblock->bb_numrecs)); - xfs_btree_log_ptrs(cur, cbp, 1, be16_to_cpu(cblock->bb_numrecs)); - *logflags |= XFS_ILOG_CORE | xfs_ilog_fbroot(cur->bc_ino.whichfork); *stat = 1; ^ permalink raw reply related [flat|nested] 565+ messages in thread
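The root promotion described in this patch can be sketched with a toy in-memory node. This is only an illustration under stated assumptions: `struct toy_node`, `TOY_MAXRECS` and `toy_promote_root` are hypothetical names, keys and pointers are plain ints, and only a low key is tracked. A tree with overlapping keys, such as the realtime rmap btree the commit message mentions, would also have to recompute a high key.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAXRECS	8

/* Hypothetical toy node: parallel arrays of keys and child pointers. */
struct toy_node {
	int	level;
	int	numrecs;
	int	keys[TOY_MAXRECS];
	int	ptrs[TOY_MAXRECS];
};

/*
 * Move every keyptr from the root into the child, raise the root by one
 * level, and leave the root with a single keyptr that points at the child.
 */
void
toy_promote_root(
	struct toy_node	*root,
	struct toy_node	*child,
	int		child_ptr)
{
	int		numrecs = root->numrecs;

	assert(numrecs > 0 && numrecs <= TOY_MAXRECS);

	/* The child takes over the old root level; the root moves up. */
	child->level = root->level;
	root->level++;

	/* Copy the keyptrs before touching the root's contents. */
	child->numrecs = numrecs;
	memcpy(child->keys, root->keys, numrecs * sizeof(int));
	memcpy(child->ptrs, root->ptrs, numrecs * sizeof(int));

	/* Recompute the root key from the child instead of assuming it. */
	root->numrecs = 1;
	root->keys[0] = child->keys[0];
	root->ptrs[0] = child_ptr;
}
```

The toy keeps the same ordering as the hoisted function: the child is filled before the root is shrunk, and the root key is derived from the child block rather than left alone.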
* [PATCH 09/14] xfs: generalize the btree root reallocation function 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong ` (10 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 11/14] xfs: hoist the node iroot update code out of xfs_btree_new_iroot Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 08/14] xfs: standardize the btree maxrecs function parameters Darrick J. Wong 2022-12-30 22:17 ` [PATCH 14/14] xfs: update btree keys correctly when _insrec splits an inode root block Darrick J. Wong 13 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> In preparation for storing realtime rmap btree roots in an inode fork, make xfs_iroot_realloc take an ops structure that takes care of all the btree-specific geometry pieces. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap_btree.c | 51 +++++++++++++++++++++++++ fs/xfs/libxfs/xfs_btree.c | 22 +++++++---- fs/xfs/libxfs/xfs_btree.h | 3 + fs/xfs/libxfs/xfs_inode_fork.c | 82 ++++++++-------------------------------- fs/xfs/libxfs/xfs_inode_fork.h | 23 +++++++++++ 5 files changed, 107 insertions(+), 74 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index 1d226b284db3..f9d4ca6ced1f 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -527,6 +527,56 @@ xfs_bmbt_keys_contiguous( be64_to_cpu(key2->bmbt.br_startoff)); } +/* Move the bmap btree root from one incore buffer to another. 
*/ +static void +xfs_bmbt_broot_move( + struct xfs_inode *ip, + int whichfork, + struct xfs_btree_block *dst_broot, + size_t dst_bytes, + struct xfs_btree_block *src_broot, + size_t src_bytes, + unsigned int numrecs) +{ + struct xfs_mount *mp = ip->i_mount; + void *dptr; + void *sptr; + + ASSERT(xfs_bmap_bmdr_space(src_broot) <= xfs_inode_fork_size(ip, whichfork)); + + /* + * We always have to move the pointers because they are not butted + * against the btree block header. + */ + if (numrecs) { + sptr = xfs_bmap_broot_ptr_addr(mp, src_broot, 1, src_bytes); + dptr = xfs_bmap_broot_ptr_addr(mp, dst_broot, 1, dst_bytes); + memmove(dptr, sptr, numrecs * sizeof(xfs_fsblock_t)); + } + + if (src_broot == dst_broot) + return; + + /* + * If the root is being totally relocated, we have to migrate the block + * header and the keys that come after it. + */ + memcpy(dst_broot, src_broot, xfs_bmbt_block_len(mp)); + + /* Now copy the keys, which come right after the header. */ + if (numrecs) { + sptr = xfs_bmbt_key_addr(mp, src_broot, 1); + dptr = xfs_bmbt_key_addr(mp, dst_broot, 1); + memcpy(dptr, sptr, numrecs * sizeof(struct xfs_bmbt_key)); + } +} + +static const struct xfs_ifork_broot_ops xfs_bmbt_iroot_ops = { + .maxrecs = xfs_bmbt_maxrecs, + .size = xfs_bmap_broot_space_calc, + .move = xfs_bmbt_broot_move, +}; + const struct xfs_btree_ops xfs_bmbt_ops = { .rec_len = sizeof(xfs_bmbt_rec_t), .key_len = sizeof(xfs_bmbt_key_t), @@ -549,6 +599,7 @@ const struct xfs_btree_ops xfs_bmbt_ops = { .keys_inorder = xfs_bmbt_keys_inorder, .recs_inorder = xfs_bmbt_recs_inorder, .keys_contiguous = xfs_bmbt_keys_contiguous, + .iroot_ops = &xfs_bmbt_iroot_ops, }; /* diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index 5176947870f9..c2e6b4ea28bf 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -3080,6 +3080,16 @@ xfs_btree_split( #define xfs_btree_split __xfs_btree_split #endif /* __KERNEL__ */ +static inline void +xfs_btree_iroot_realloc( + 
struct xfs_btree_cur *cur, + int rec_diff) +{ + ASSERT(cur->bc_flags & XFS_BTREE_ROOT_IN_INODE); + + xfs_iroot_realloc(cur->bc_ino.ip, cur->bc_ino.whichfork, + cur->bc_ops->iroot_ops, rec_diff); +} /* * Copy the old inode root contents into a real block and make the @@ -3164,9 +3174,7 @@ xfs_btree_new_iroot( xfs_btree_copy_ptrs(cur, pp, &nptr, 1); - xfs_iroot_realloc(cur->bc_ino.ip, - 1 - xfs_btree_get_numrecs(cblock), - cur->bc_ino.whichfork); + xfs_btree_iroot_realloc(cur, 1 - xfs_btree_get_numrecs(cblock)); xfs_btree_setbuf(cur, level, cbp); @@ -3336,7 +3344,7 @@ xfs_btree_make_block_unfull( if (numrecs < cur->bc_ops->get_dmaxrecs(cur, level)) { /* A root block that can be made bigger. */ - xfs_iroot_realloc(ip, 1, cur->bc_ino.whichfork); + xfs_btree_iroot_realloc(cur, 1); *stat = 1; } else { /* A root block that needs replacing */ @@ -3744,8 +3752,7 @@ xfs_btree_kill_iroot( index = numrecs - cur->bc_ops->get_maxrecs(cur, level); if (index) { - xfs_iroot_realloc(cur->bc_ino.ip, index, - cur->bc_ino.whichfork); + xfs_btree_iroot_realloc(cur, index); block = ifp->if_broot; } @@ -3942,8 +3949,7 @@ xfs_btree_delrec( */ if (level == cur->bc_nlevels - 1) { if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) { - xfs_iroot_realloc(cur->bc_ino.ip, -1, - cur->bc_ino.whichfork); + xfs_btree_iroot_realloc(cur, -1); error = xfs_btree_kill_iroot(cur); if (error) diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index 0e12360ae36d..3acfdcdf7561 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -202,6 +202,9 @@ struct xfs_btree_ops { const union xfs_btree_key *key1, const union xfs_btree_key *key2, const union xfs_btree_key *mask); + + /* Functions for manipulating the btree root block. 
*/ + const struct xfs_ifork_broot_ops *iroot_ops; }; /* diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index a9610452ca3a..0ac1c8dba2ed 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -369,50 +369,6 @@ xfs_iroot_free( ifp->if_broot = NULL; } -/* Move the bmap btree root from one incore buffer to another. */ -static void -xfs_ifork_move_broot( - struct xfs_inode *ip, - int whichfork, - struct xfs_btree_block *dst_broot, - size_t dst_bytes, - struct xfs_btree_block *src_broot, - size_t src_bytes, - unsigned int numrecs) -{ - struct xfs_mount *mp = ip->i_mount; - void *dptr; - void *sptr; - - ASSERT(xfs_bmap_bmdr_space(src_broot) <= xfs_inode_fork_size(ip, whichfork)); - - /* - * We always have to move the pointers because they are not butted - * against the btree block header. - */ - if (numrecs) { - sptr = xfs_bmap_broot_ptr_addr(mp, src_broot, 1, src_bytes); - dptr = xfs_bmap_broot_ptr_addr(mp, dst_broot, 1, dst_bytes); - memmove(dptr, sptr, numrecs * sizeof(xfs_fsblock_t)); - } - - if (src_broot == dst_broot) - return; - - /* - * If the root is being totally relocated, we have to migrate the block - * header and the keys that come after it. - */ - memcpy(dst_broot, src_broot, xfs_bmbt_block_len(mp)); - - /* Now copy the keys, which come right after the header. */ - if (numrecs) { - sptr = xfs_bmbt_key_addr(mp, src_broot, 1); - dptr = xfs_bmbt_key_addr(mp, dst_broot, 1); - memcpy(dptr, sptr, numrecs * sizeof(struct xfs_bmbt_key)); - } -} - /* * Reallocate the space for if_broot based on the number of records * being added or deleted as indicated in rec_diff. Move the records @@ -426,24 +382,21 @@ xfs_ifork_move_broot( * if we are adding records, one will be allocated. The caller must also * not request that the number of records go below zero, although * it can go to zero. 
- * - * ip -- the inode whose if_broot area is changing - * ext_diff -- the change in the number of records, positive or negative, - * requested for the if_broot array. */ void xfs_iroot_realloc( - struct xfs_inode *ip, - int rec_diff, - int whichfork) + struct xfs_inode *ip, + int whichfork, + const struct xfs_ifork_broot_ops *ops, + int rec_diff) { - struct xfs_mount *mp = ip->i_mount; - struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); - struct xfs_btree_block *new_broot; - size_t new_size; - size_t old_size = ifp->if_broot_bytes; - int cur_max; - int new_max; + struct xfs_mount *mp = ip->i_mount; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, whichfork); + struct xfs_btree_block *new_broot; + size_t new_size; + size_t old_size = ifp->if_broot_bytes; + int cur_max; + int new_max; /* Handle degenerate cases. */ if (rec_diff == 0) @@ -456,16 +409,16 @@ xfs_iroot_realloc( if (old_size == 0) { ASSERT(rec_diff > 0); - new_size = xfs_bmap_broot_space_calc(mp, rec_diff); + new_size = ops->size(mp, rec_diff); xfs_iroot_alloc(ip, whichfork, new_size); return; } /* Compute the new and old record count and space requirements. */ - cur_max = xfs_bmbt_maxrecs(mp, old_size, false); + cur_max = ops->maxrecs(mp, old_size, false); new_max = cur_max + rec_diff; ASSERT(new_max >= 0); - new_size = xfs_bmap_broot_space_calc(mp, new_max); + new_size = ops->size(mp, new_max); if (rec_diff > 0) { /* @@ -476,7 +429,7 @@ xfs_iroot_realloc( ifp->if_broot = krealloc(ifp->if_broot, new_size, GFP_NOFS | __GFP_NOFAIL); ifp->if_broot_bytes = new_size; - xfs_ifork_move_broot(ip, whichfork, ifp->if_broot, new_size, + ops->move(ip, whichfork, ifp->if_broot, new_size, ifp->if_broot, old_size, cur_max); return; } @@ -493,15 +446,14 @@ xfs_iroot_realloc( /* Reallocate the btree root and move the contents. 
*/ new_broot = kmem_alloc(new_size, KM_NOFS); - xfs_ifork_move_broot(ip, whichfork, new_broot, new_size, ifp->if_broot, - old_size, new_max); + ops->move(ip, whichfork, new_broot, new_size, ifp->if_broot, + ifp->if_broot_bytes, new_max); kmem_free(ifp->if_broot); ifp->if_broot = new_broot; ifp->if_broot_bytes = new_size; } - /* * This is called when the amount of space needed for if_data * is increased or decreased. The change in size is indicated by diff --git a/fs/xfs/libxfs/xfs_inode_fork.h b/fs/xfs/libxfs/xfs_inode_fork.h index f4379e2df616..7d95c402f870 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.h +++ b/fs/xfs/libxfs/xfs_inode_fork.h @@ -174,7 +174,6 @@ void xfs_idata_realloc(struct xfs_inode *ip, int64_t byte_diff, void xfs_iroot_alloc(struct xfs_inode *ip, int whichfork, size_t bytes); void xfs_iroot_free(struct xfs_inode *ip, int whichfork); -void xfs_iroot_realloc(struct xfs_inode *, int, int); int xfs_iread_extents(struct xfs_trans *, struct xfs_inode *, int); int xfs_iextents_copy(struct xfs_inode *, struct xfs_bmbt_rec *, int); @@ -272,4 +271,26 @@ static inline bool xfs_need_iread_extents(struct xfs_ifork *ifp) return ifp->if_format == XFS_DINODE_FMT_BTREE && ifp->if_height == 0; } +struct xfs_ifork_broot_ops { + /* Calculate the number of records/keys in the incore btree block. */ + unsigned int (*maxrecs)(struct xfs_mount *mp, unsigned int blocksize, + bool leaf); + + /* Calculate the bytes required for the incore btree root block. */ + size_t (*size)(struct xfs_mount *mp, unsigned int nrecs); + + /* + * Move an incore btree root from one buffer to another. Note that + * src_broot and dst_broot could be the same or they could be totally + * separate memory regions. 
+ */ + void (*move)(struct xfs_inode *ip, int whichfork, + struct xfs_btree_block *dst_broot, size_t dst_bytes, + struct xfs_btree_block *src_broot, size_t src_bytes, + unsigned int numrecs); +}; + +void xfs_iroot_realloc(struct xfs_inode *ip, int whichfork, + const struct xfs_ifork_broot_ops *ops, int rec_diff); + #endif /* __XFS_INODE_FORK_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
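As a rough sketch of the idea in this patch, the size calculation in the reallocation path can be driven entirely through an ops table. Everything below is hypothetical: `struct toy_broot_ops` is a cut-down stand-in for `struct xfs_ifork_broot_ops` (no `mp` argument, no `move` hook), and the header, key and pointer lengths are made-up toy values rather than the real bmbt geometry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct xfs_ifork_broot_ops. */
struct toy_broot_ops {
	unsigned int	(*maxrecs)(unsigned int blocklen, bool leaf);
	size_t		(*size)(unsigned int nrecs);
};

/* Toy geometry, not the real on-disk or incore layout. */
#define TOY_HDR_LEN	16u
#define TOY_KEY_LEN	8u
#define TOY_PTR_LEN	8u

unsigned int
toy_maxrecs(unsigned int blocklen, bool leaf)
{
	(void)leaf;	/* this toy root only ever holds keyptrs */
	return (blocklen - TOY_HDR_LEN) / (TOY_KEY_LEN + TOY_PTR_LEN);
}

size_t
toy_size(unsigned int nrecs)
{
	return TOY_HDR_LEN + (size_t)nrecs * (TOY_KEY_LEN + TOY_PTR_LEN);
}

const struct toy_broot_ops toy_ops = {
	.maxrecs	= toy_maxrecs,
	.size		= toy_size,
};

/* Work out the new root buffer size without knowing the btree type. */
size_t
toy_root_new_size(
	const struct toy_broot_ops	*ops,
	size_t				old_size,
	int				rec_diff)
{
	int				new_max;

	if (old_size == 0) {
		/* No root yet: size it for exactly rec_diff records. */
		assert(rec_diff > 0);
		return ops->size((unsigned int)rec_diff);
	}

	new_max = (int)ops->maxrecs((unsigned int)old_size, false) + rec_diff;
	assert(new_max >= 0);
	return ops->size((unsigned int)new_max);
}
```

With the toy geometry, a 64-byte root holds three keyptrs, so `toy_root_new_size(&toy_ops, 64, 1)` asks for room for four. A second btree type would only need to supply its own ops table.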
* [PATCH 08/14] xfs: standardize the btree maxrecs function parameters 2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong ` (11 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 09/14] xfs: generalize the btree root reallocation function Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 14/14] xfs: update btree keys correctly when _insrec splits an inode root block Darrick J. Wong 13 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Standardize the parameters in xfs_{alloc,bm,ino,rmap,refcount}bt_maxrecs so that we have consistent calling conventions. This doesn't affect the kernel that much, but enables us to clean up userspace a bit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_alloc_btree.c | 6 +++--- fs/xfs/libxfs/xfs_alloc_btree.h | 3 ++- fs/xfs/libxfs/xfs_bmap.c | 2 +- fs/xfs/libxfs/xfs_bmap_btree.c | 6 +++--- fs/xfs/libxfs/xfs_bmap_btree.h | 5 +++-- fs/xfs/libxfs/xfs_ialloc.c | 4 ++-- fs/xfs/libxfs/xfs_ialloc_btree.c | 6 +++--- fs/xfs/libxfs/xfs_ialloc_btree.h | 3 ++- fs/xfs/libxfs/xfs_inode_fork.c | 2 +- fs/xfs/libxfs/xfs_refcount_btree.c | 5 +++-- fs/xfs/libxfs/xfs_refcount_btree.h | 3 ++- fs/xfs/libxfs/xfs_rmap_btree.c | 9 +++++---- fs/xfs/libxfs/xfs_rmap_btree.h | 3 ++- fs/xfs/libxfs/xfs_sb.c | 16 ++++++++-------- 14 files changed, 40 insertions(+), 33 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c index 6f17fee31872..7f375c853492 100644 --- a/fs/xfs/libxfs/xfs_alloc_btree.c +++ b/fs/xfs/libxfs/xfs_alloc_btree.c @@ -609,11 +609,11 @@ xfs_allocbt_block_maxrecs( /* * Calculate number of records in an alloc btree block. 
*/ -int +unsigned int xfs_allocbt_maxrecs( struct xfs_mount *mp, - int blocklen, - int leaf) + unsigned int blocklen, + bool leaf) { blocklen -= XFS_ALLOC_BLOCK_LEN(mp); return xfs_allocbt_block_maxrecs(blocklen, leaf); diff --git a/fs/xfs/libxfs/xfs_alloc_btree.h b/fs/xfs/libxfs/xfs_alloc_btree.h index 45df893ef6bb..f61f51d0bd76 100644 --- a/fs/xfs/libxfs/xfs_alloc_btree.h +++ b/fs/xfs/libxfs/xfs_alloc_btree.h @@ -53,7 +53,8 @@ extern struct xfs_btree_cur *xfs_allocbt_init_cursor(struct xfs_mount *mp, struct xfs_btree_cur *xfs_allocbt_stage_cursor(struct xfs_mount *mp, struct xbtree_afakeroot *afake, struct xfs_perag *pag, xfs_btnum_t btnum); -extern int xfs_allocbt_maxrecs(struct xfs_mount *, int, int); +unsigned int xfs_allocbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, + bool leaf); extern xfs_extlen_t xfs_allocbt_calc_size(struct xfs_mount *mp, unsigned long long len); diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 98bd32da142d..eda20bb5c4af 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -565,7 +565,7 @@ xfs_bmap_btree_to_extents( ASSERT(ifp->if_format == XFS_DINODE_FMT_BTREE); ASSERT(be16_to_cpu(rblock->bb_level) == 1); ASSERT(be16_to_cpu(rblock->bb_numrecs) == 1); - ASSERT(xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, 0) == 1); + ASSERT(xfs_bmbt_maxrecs(mp, ifp->if_broot_bytes, false) == 1); pp = xfs_bmap_broot_ptr_addr(mp, rblock, 1, ifp->if_broot_bytes); cbno = be64_to_cpu(*pp); diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index 973fa6cc7aa6..1d226b284db3 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -682,11 +682,11 @@ xfs_bmbt_commit_staged_btree( /* * Calculate number of records in a bmap btree block. 
*/ -int +unsigned int xfs_bmbt_maxrecs( struct xfs_mount *mp, - int blocklen, - int leaf) + unsigned int blocklen, + bool leaf) { blocklen -= xfs_bmbt_block_len(mp); return xfs_bmbt_block_maxrecs(blocklen, leaf); diff --git a/fs/xfs/libxfs/xfs_bmap_btree.h b/fs/xfs/libxfs/xfs_bmap_btree.h index 5a3bae94debd..a9ddc9b42e61 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.h +++ b/fs/xfs/libxfs/xfs_bmap_btree.h @@ -35,7 +35,8 @@ extern void xfs_bmbt_to_bmdr(struct xfs_mount *, struct xfs_btree_block *, int, extern int xfs_bmbt_get_maxrecs(struct xfs_btree_cur *, int level); extern int xfs_bmdr_maxrecs(int blocklen, int leaf); -extern int xfs_bmbt_maxrecs(struct xfs_mount *, int blocklen, int leaf); +unsigned int xfs_bmbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, + bool leaf); extern int xfs_bmbt_change_owner(struct xfs_trans *tp, struct xfs_inode *ip, int whichfork, xfs_ino_t new_owner, @@ -150,7 +151,7 @@ xfs_bmap_broot_ptr_addr( unsigned int i, unsigned int sz) { - return xfs_bmbt_ptr_addr(mp, bb, i, xfs_bmbt_maxrecs(mp, sz, 0)); + return xfs_bmbt_ptr_addr(mp, bb, i, xfs_bmbt_maxrecs(mp, sz, false)); } /* diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index 1d1c3cb0389c..331d22a60272 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -2880,8 +2880,8 @@ xfs_ialloc_setup_geometry( /* Compute inode btree geometry. 
*/ igeo->agino_log = sbp->sb_inopblog + sbp->sb_agblklog; - igeo->inobt_mxr[0] = xfs_inobt_maxrecs(mp, sbp->sb_blocksize, 1); - igeo->inobt_mxr[1] = xfs_inobt_maxrecs(mp, sbp->sb_blocksize, 0); + igeo->inobt_mxr[0] = xfs_inobt_maxrecs(mp, sbp->sb_blocksize, true); + igeo->inobt_mxr[1] = xfs_inobt_maxrecs(mp, sbp->sb_blocksize, false); igeo->inobt_mnr[0] = igeo->inobt_mxr[0] / 2; igeo->inobt_mnr[1] = igeo->inobt_mxr[1] / 2; diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.c b/fs/xfs/libxfs/xfs_ialloc_btree.c index 84094e326a6e..9f3104db9171 100644 --- a/fs/xfs/libxfs/xfs_ialloc_btree.c +++ b/fs/xfs/libxfs/xfs_ialloc_btree.c @@ -560,11 +560,11 @@ xfs_inobt_block_maxrecs( /* * Calculate number of records in an inobt btree block. */ -int +unsigned int xfs_inobt_maxrecs( struct xfs_mount *mp, - int blocklen, - int leaf) + unsigned int blocklen, + bool leaf) { blocklen -= XFS_INOBT_BLOCK_LEN(mp); return xfs_inobt_block_maxrecs(blocklen, leaf); diff --git a/fs/xfs/libxfs/xfs_ialloc_btree.h b/fs/xfs/libxfs/xfs_ialloc_btree.h index 6d8d6bcd594d..1f219fae59c2 100644 --- a/fs/xfs/libxfs/xfs_ialloc_btree.h +++ b/fs/xfs/libxfs/xfs_ialloc_btree.h @@ -52,7 +52,8 @@ extern struct xfs_btree_cur *xfs_inobt_init_cursor(struct xfs_mount *mp, struct xfs_btree_cur *xfs_inobt_stage_cursor(struct xfs_mount *mp, struct xbtree_afakeroot *afake, struct xfs_perag *pag, xfs_btnum_t btnum); -extern int xfs_inobt_maxrecs(struct xfs_mount *, int, int); +unsigned int xfs_inobt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, + bool leaf); /* ir_holemask to inode allocation bitmap conversion */ uint64_t xfs_inobt_irec_to_allocmask(const struct xfs_inobt_rec_incore *irec); diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index ceab02b19d26..a9610452ca3a 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -462,7 +462,7 @@ xfs_iroot_realloc( } /* Compute the new and old record count and space requirements. 
*/ - cur_max = xfs_bmbt_maxrecs(mp, old_size, 0); + cur_max = xfs_bmbt_maxrecs(mp, old_size, false); new_max = cur_max + rec_diff; ASSERT(new_max >= 0); new_size = xfs_bmap_broot_space_calc(mp, new_max); diff --git a/fs/xfs/libxfs/xfs_refcount_btree.c b/fs/xfs/libxfs/xfs_refcount_btree.c index b037fd949f8c..674a846aa121 100644 --- a/fs/xfs/libxfs/xfs_refcount_btree.c +++ b/fs/xfs/libxfs/xfs_refcount_btree.c @@ -439,9 +439,10 @@ xfs_refcountbt_block_maxrecs( /* * Calculate the number of records in a refcount btree block. */ -int +unsigned int xfs_refcountbt_maxrecs( - int blocklen, + struct xfs_mount *mp, + unsigned int blocklen, bool leaf) { blocklen -= XFS_REFCOUNT_BLOCK_LEN; diff --git a/fs/xfs/libxfs/xfs_refcount_btree.h b/fs/xfs/libxfs/xfs_refcount_btree.h index d66b37259bed..fe3c20d67790 100644 --- a/fs/xfs/libxfs/xfs_refcount_btree.h +++ b/fs/xfs/libxfs/xfs_refcount_btree.h @@ -50,7 +50,8 @@ extern struct xfs_btree_cur *xfs_refcountbt_init_cursor(struct xfs_mount *mp, struct xfs_perag *pag); struct xfs_btree_cur *xfs_refcountbt_stage_cursor(struct xfs_mount *mp, struct xbtree_afakeroot *afake, struct xfs_perag *pag); -extern int xfs_refcountbt_maxrecs(int blocklen, bool leaf); +unsigned int xfs_refcountbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, + bool leaf); extern void xfs_refcountbt_compute_maxlevels(struct xfs_mount *mp); extern xfs_extlen_t xfs_refcountbt_calc_size(struct xfs_mount *mp, diff --git a/fs/xfs/libxfs/xfs_rmap_btree.c b/fs/xfs/libxfs/xfs_rmap_btree.c index 1b88766ac497..81af6444e4b9 100644 --- a/fs/xfs/libxfs/xfs_rmap_btree.c +++ b/fs/xfs/libxfs/xfs_rmap_btree.c @@ -588,7 +588,7 @@ xfs_rmapbt_mem_verify( } return xfbtree_sblock_verify(bp, - xfs_rmapbt_maxrecs(xfo_to_b(1), level == 0)); + xfs_rmapbt_maxrecs(mp, xfo_to_b(1), level == 0)); } static void @@ -715,10 +715,11 @@ xfs_rmapbt_block_maxrecs( /* * Calculate number of records in an rmap btree block. 
*/ -int +unsigned int xfs_rmapbt_maxrecs( - int blocklen, - int leaf) + struct xfs_mount *mp, + unsigned int blocklen, + bool leaf) { blocklen -= XFS_RMAP_BLOCK_LEN; return xfs_rmapbt_block_maxrecs(blocklen, leaf); diff --git a/fs/xfs/libxfs/xfs_rmap_btree.h b/fs/xfs/libxfs/xfs_rmap_btree.h index a27a236111dd..bf88fba0392a 100644 --- a/fs/xfs/libxfs/xfs_rmap_btree.h +++ b/fs/xfs/libxfs/xfs_rmap_btree.h @@ -48,7 +48,8 @@ struct xfs_btree_cur *xfs_rmapbt_stage_cursor(struct xfs_mount *mp, struct xbtree_afakeroot *afake, struct xfs_perag *pag); void xfs_rmapbt_commit_staged_btree(struct xfs_btree_cur *cur, struct xfs_trans *tp, struct xfs_buf *agbp); -int xfs_rmapbt_maxrecs(int blocklen, int leaf); +unsigned int xfs_rmapbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, + bool leaf); extern void xfs_rmapbt_compute_maxlevels(struct xfs_mount *mp); extern xfs_extlen_t xfs_rmapbt_calc_size(struct xfs_mount *mp, diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index dbbea5b86f27..54f93e1b0f00 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -1049,23 +1049,23 @@ xfs_sb_mount_common( mp->m_rgblklog = log2_if_power2(sbp->sb_rgblocks); mp->m_rgblkmask = mask64_if_power2(sbp->sb_rgblocks); - mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 1); - mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 0); + mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, true); + mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, false); mp->m_alloc_mnr[0] = mp->m_alloc_mxr[0] / 2; mp->m_alloc_mnr[1] = mp->m_alloc_mxr[1] / 2; - mp->m_bmap_dmxr[0] = xfs_bmbt_maxrecs(mp, sbp->sb_blocksize, 1); - mp->m_bmap_dmxr[1] = xfs_bmbt_maxrecs(mp, sbp->sb_blocksize, 0); + mp->m_bmap_dmxr[0] = xfs_bmbt_maxrecs(mp, sbp->sb_blocksize, true); + mp->m_bmap_dmxr[1] = xfs_bmbt_maxrecs(mp, sbp->sb_blocksize, false); mp->m_bmap_dmnr[0] = mp->m_bmap_dmxr[0] / 2; mp->m_bmap_dmnr[1] = mp->m_bmap_dmxr[1] / 2; - mp->m_rmap_mxr[0] = 
xfs_rmapbt_maxrecs(sbp->sb_blocksize, 1); - mp->m_rmap_mxr[1] = xfs_rmapbt_maxrecs(sbp->sb_blocksize, 0); + mp->m_rmap_mxr[0] = xfs_rmapbt_maxrecs(mp, sbp->sb_blocksize, true); + mp->m_rmap_mxr[1] = xfs_rmapbt_maxrecs(mp, sbp->sb_blocksize, false); mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2; mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2; - mp->m_refc_mxr[0] = xfs_refcountbt_maxrecs(sbp->sb_blocksize, true); - mp->m_refc_mxr[1] = xfs_refcountbt_maxrecs(sbp->sb_blocksize, false); + mp->m_refc_mxr[0] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, true); + mp->m_refc_mxr[1] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, false); mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2; mp->m_refc_mnr[1] = mp->m_refc_mxr[1] / 2; ^ permalink raw reply related [flat|nested] 565+ messages in thread
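One thing a common (mp, blocklen, leaf) signature allows is a single helper that fills in the max/min record limits for any btree type. The sketch below is an assumption-laden illustration, not code from this series: `struct toy_mount`, `toy_maxrecs_fn`, `toy_bt_maxrecs`, `toy_compute_limits` and all of the lengths are hypothetical, and the halving of the maximum only mirrors the `mxr[n] / 2` pattern visible in the xfs_sb_mount_common hunk.

```c
#include <stdbool.h>

/* Hypothetical mount structure; only the header length matters here. */
struct toy_mount {
	unsigned int	hdr_len;
};

/* One signature shared by every btree type's maxrecs function. */
typedef unsigned int (*toy_maxrecs_fn)(struct toy_mount *mp,
		unsigned int blocklen, bool leaf);

/* Made-up record geometry for a single toy btree type. */
#define DEMO_REC_LEN	16u
#define DEMO_KEY_LEN	8u
#define DEMO_PTR_LEN	4u

unsigned int
toy_bt_maxrecs(
	struct toy_mount	*mp,
	unsigned int		blocklen,
	bool			leaf)
{
	blocklen -= mp->hdr_len;
	if (leaf)
		return blocklen / DEMO_REC_LEN;
	return blocklen / (DEMO_KEY_LEN + DEMO_PTR_LEN);
}

struct toy_limits {
	unsigned int	mxr[2];	/* [0] = leaf, [1] = node */
	unsigned int	mnr[2];
};

/* Fill in the limits for any btree type through the shared signature. */
void
toy_compute_limits(
	toy_maxrecs_fn		fn,
	struct toy_mount	*mp,
	unsigned int		blocksize,
	struct toy_limits	*lim)
{
	lim->mxr[0] = fn(mp, blocksize, true);
	lim->mxr[1] = fn(mp, blocksize, false);
	lim->mnr[0] = lim->mxr[0] / 2;
	lim->mnr[1] = lim->mxr[1] / 2;
}
```

Passing `bool` instead of `0`/`1` for the leaf argument also makes each call site say which of the two limits it is computing.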
* [PATCH 14/14] xfs: update btree keys correctly when _insrec splits an inode root block
  2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong
                     ` (12 preceding siblings ...)
  2022-12-30 22:17   ` [PATCH 08/14] xfs: standardize the btree maxrecs function parameters Darrick J. Wong
@ 2022-12-30 22:17   ` Darrick J. Wong
  13 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In commit 2c813ad66a72, I partially fixed a bug wherein xfs_btree_insrec
would erroneously try to update the parent's key for a block that had
been split if we decided to insert the new record into the new block.
The solution was to detect this situation and update the in-core key
value that we pass up to the caller so that the caller will (eventually)
add the new block to the parent level of the tree with the correct key.

However, I missed a subtlety about the way inode-rooted btrees work.  If
the full block was a maximally sized inode root block, we'll solve that
fullness by moving the root block's records to a new block, resizing the
root block, and updating the root to point to the new block.  We don't
pass a pointer to the new block to the caller because that work has
already been done.  The new record will /always/ land in the new block,
so in this case we need to use xfs_btree_update_keys to update the keys.

This bug can theoretically manifest itself in the very rare case that we
split a bmbt root block and the new record lands in the very first slot
of the new block, though I've never managed to trigger it in practice.
However, it is very easy to reproduce by running generic/522 with the
realtime rmapbt patchset if rtinherit=1.

Fixes: 2c813ad66a72 ("xfs: support btrees with overlapping intervals for keys")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_btree.c |   29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index 18628542d316..00bc1dd73675 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -3665,14 +3665,31 @@ xfs_btree_insrec(
 	xfs_btree_log_block(cur, bp, XFS_BB_NUMRECS);
 
 	/*
-	 * If we just inserted into a new tree block, we have to
-	 * recalculate nkey here because nkey is out of date.
+	 * Update btree keys to reflect the newly added record or keyptr.
+	 * There are three cases here to be aware of.  Normally, all we have to
+	 * do is walk towards the root, updating keys as necessary.
 	 *
-	 * Otherwise we're just updating an existing block (having shoved
-	 * some records into the new tree block), so use the regular key
-	 * update mechanism.
+	 * If the caller had us target a full block for the insertion, we dealt
+	 * with that by calling the _make_block_unfull function.  If the
+	 * "make unfull" function splits the block, it'll hand us back the key
+	 * and pointer of the new block.  We haven't yet added the new block to
+	 * the next level up, so if we decide to add the new record to the new
+	 * block (bp->b_bn != old_bn), we have to update the caller's pointer
+	 * so that the caller adds the new block with the correct key.
+	 *
+	 * However, there is a third possibility-- if the selected block is the
+	 * root block of an inode-rooted btree and cannot be expanded further,
+	 * the "make unfull" function moves the root block contents to a new
+	 * block and updates the root block to point to the new block.  In this
+	 * case, no block pointer is passed back because the block has already
+	 * been added to the btree.  In this case, we need to use the regular
+	 * key update function, just like the first case.  This is critical for
+	 * overlapping btrees, because the high key must be updated to reflect
+	 * the entire tree, not just the subtree accessible through the first
+	 * child of the root (which is now two levels down from the root).
 	 */
-	if (bp && xfs_buf_daddr(bp) != old_bn) {
+	if (!xfs_btree_ptr_is_null(cur, &nptr) &&
+	    bp && xfs_buf_daddr(bp) != old_bn) {
 		xfs_btree_get_keys(cur, block, lkey);
 	} else if (xfs_btree_needs_key_update(cur, optr)) {
 		error = xfs_btree_update_keys(cur, level);
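The three cases in the new comment boil down to a small decision, which the following sketch models with plain booleans. The enum and `toy_pick_key_update` are hypothetical names for illustration only; the three inputs stand in for "a new block pointer was handed back", "the record landed in a block other than the original one", and the result of xfs_btree_needs_key_update.

```c
#include <stdbool.h>

/* Hypothetical labels for the outcomes of the patched condition. */
enum toy_key_update {
	TOY_KEY_NONE,		/* nothing needs updating */
	TOY_KEY_CALLER,		/* refresh the key handed back to the caller */
	TOY_KEY_WALK,		/* walk towards the root, updating keys */
};

enum toy_key_update
toy_pick_key_update(
	bool	new_ptr_returned,
	bool	landed_in_other_block,
	bool	needs_key_update)
{
	/* A real split: the caller still has to add the new block. */
	if (new_ptr_returned && landed_in_other_block)
		return TOY_KEY_CALLER;

	/* Ordinary insert, or an inode root that was pushed down a level. */
	if (needs_key_update)
		return TOY_KEY_WALK;

	return TOY_KEY_NONE;
}
```

The inode root case is the call with `new_ptr_returned == false` and `landed_in_other_block == true`. Without the first test that call would take the caller-key path, which appears to be the situation the commit message describes.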
* [PATCHSET v1.0 00/11] xfs: clean up realtime type usage
  2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
                   ` (2 preceding siblings ...)
  2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong
@ 2022-12-30 22:17 ` Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 05/11] xfs: make sure maxlen is still congruent with prod when rounding down Darrick J. Wong
                     ` (10 more replies)
  2022-12-30 22:17 ` [PATCHSET v1.0 0/7] xfs: refactor rt extent unit conversions Darrick J. Wong
                   ` (35 subsequent siblings)
  39 siblings, 11 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC
  To: djwong; +Cc: linux-xfs

Hi all,

The realtime code uses xfs_rtblock_t and xfs_fsblock_t in a lot of
places, and it's very confusing.  Clean up all the type usage so that an
xfs_rtblock_t is always a block within the realtime volume, an
xfs_fileoff_t is always a file offset within a realtime metadata file,
and an xfs_rtxnumber_t is always a rt extent within the realtime volume.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=clean-up-realtime-units

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=clean-up-realtime-units
---
 fs/xfs/libxfs/xfs_bmap.c     |    4 -
 fs/xfs/libxfs/xfs_format.h   |    2 
 fs/xfs/libxfs/xfs_rtbitmap.c |  121 ++++++++++-----------
 fs/xfs/libxfs/xfs_rtbitmap.h |   79 ++++++++++++++
 fs/xfs/libxfs/xfs_sb.h       |    2 
 fs/xfs/libxfs/xfs_types.c    |    4 -
 fs/xfs/libxfs/xfs_types.h    |    8 +
 fs/xfs/scrub/bmap.c          |    4 +
 fs/xfs/scrub/common.c        |   58 ++++++++++
 fs/xfs/scrub/common.h        |   17 +++
 fs/xfs/scrub/fscounters.c    |    2 
 fs/xfs/scrub/rtbitmap.c      |   23 ++--
 fs/xfs/scrub/rtsummary.c     |   30 ++---
 fs/xfs/scrub/scrub.c         |    1 
 fs/xfs/scrub/scrub.h         |    9 ++
 fs/xfs/scrub/trace.h         |    7 +
 fs/xfs/xfs_bmap_item.c       |    2 
 fs/xfs/xfs_bmap_util.c       |   18 +--
 fs/xfs/xfs_fsmap.c           |    2 
 fs/xfs/xfs_rtalloc.c         |  245 ++++++++++++++++++++++--------------------
 fs/xfs/xfs_rtalloc.h         |   99 ++---------------
 21 files changed, 417 insertions(+), 320 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_rtbitmap.h
* [PATCH 05/11] xfs: make sure maxlen is still congruent with prod when rounding down 2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 04/11] xfs: rt stubs should return negative errnos when rt disabled Darrick J. Wong ` (9 subsequent siblings) 10 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> In commit 2a6ca4baed62, we tried to fix an overflow problem in the realtime allocator that was caused by an overly large maxlen value causing xfs_rtcheck_range to run off the end of the realtime bitmap. Unfortunately, there is a subtle bug here -- maxlen (and minlen) both have to be aligned with @prod, but @prod can be larger than 1 if the user has set an extent size hint on the file, and that extent size hint is larger than the realtime extent size. If the rt free space extents are not aligned to this file's extszhint because other files without extent size hints allocated space (or the number of rt extents is similarly not aligned), then it's possible that maxlen after clamping to sb_rextents will no longer be aligned to prod. The allocation will succeed just fine, but we still trip the assertion. Fix the problem by reducing maxlen by any misalignment with prod. While we're at it, split the assertions into two so that we can tell which value had the bad alignment. Fixes: 2a6ca4baed62 ("xfs: make sure the rt allocator doesn't run off the end") Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- fs/xfs/xfs_rtalloc.c | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 3ac8ca845239..88faf7fb912d 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -252,8 +252,13 @@ xfs_rtallocate_extent_block( end = XFS_BLOCKTOBIT(mp, bbno + 1) - 1; i <= end; i++) { - /* Make sure we don't scan off the end of the rt volume. */ + /* + * Make sure we don't run off the end of the rt volume. Be + * careful that adjusting maxlen downwards doesn't cause us to + * fail the alignment checks. + */ maxlen = min(mp->m_sb.sb_rextents, i + maxlen) - i; + maxlen -= maxlen % prod; /* * See if there's a free extent of maxlen starting at i. @@ -360,7 +365,8 @@ xfs_rtallocate_extent_exact( int isfree; /* extent is free */ xfs_rtblock_t next; /* next block to try (dummy) */ - ASSERT(minlen % prod == 0 && maxlen % prod == 0); + ASSERT(minlen % prod == 0); + ASSERT(maxlen % prod == 0); /* * Check if the range in question (for maxlen) is free. */ @@ -443,7 +449,9 @@ xfs_rtallocate_extent_near( xfs_rtblock_t n; /* next block to try */ xfs_rtblock_t r; /* result block */ - ASSERT(minlen % prod == 0 && maxlen % prod == 0); + ASSERT(minlen % prod == 0); + ASSERT(maxlen % prod == 0); + /* * If the block number given is off the end, silently set it to * the last block. @@ -451,8 +459,13 @@ xfs_rtallocate_extent_near( if (bno >= mp->m_sb.sb_rextents) bno = mp->m_sb.sb_rextents - 1; - /* Make sure we don't run off the end of the rt volume. */ + /* + * Make sure we don't run off the end of the rt volume. Be careful + * that adjusting maxlen downwards doesn't cause us to fail the + * alignment checks. 
+ */ maxlen = min(mp->m_sb.sb_rextents, bno + maxlen) - bno; + maxlen -= maxlen % prod; if (maxlen < minlen) { *rtblock = NULLRTBLOCK; return 0; @@ -643,7 +656,8 @@ xfs_rtallocate_extent_size( xfs_rtblock_t r; /* result block number */ xfs_suminfo_t sum; /* summary information for extents */ - ASSERT(minlen % prod == 0 && maxlen % prod == 0); + ASSERT(minlen % prod == 0); + ASSERT(maxlen % prod == 0); ASSERT(maxlen != 0); /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
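For readers following patch 05, the clamp-then-align step can be sketched in userspace as below. This is only an illustration under stated assumptions: the function name `clamp_and_align_maxlen` and the plain `uint64_t` types are hypothetical, whereas the kernel code in the diff above uses `min()`, `xfs_rtxlen_t` and `mp->m_sb.sb_rextents`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the clamping done in the rt allocator: limit
 * maxlen so that [start, start + maxlen) stays inside a volume holding
 * rextents extents, then round the result down to a multiple of prod.
 * Overflow of start + maxlen is not handled in this sketch.
 */
static uint64_t
clamp_and_align_maxlen(uint64_t rextents, uint64_t start, uint64_t maxlen,
		       uint64_t prod)
{
	uint64_t end;

	assert(prod > 0);
	assert(start < rextents);

	/* Same effect as min(rextents, start + maxlen) - start. */
	end = start + maxlen;
	if (end > rextents)
		end = rextents;
	maxlen = end - start;

	/* Drop any remainder so that maxlen % prod == 0 holds again. */
	maxlen -= maxlen % prod;
	return maxlen;
}
```

With 103 rt extents, a start of 96, a maxlen of 8 and a prod of 4, clamping alone yields 7, which is not a multiple of 4; the rounding step reduces it to 4. If the rounded value drops below minlen, the caller is expected to give up on that range, as the `maxlen < minlen` check in the diff does.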
* [PATCH 04/11] xfs: rt stubs should return negative errnos when rt disabled 2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong 2022-12-30 22:17 ` [PATCH 05/11] xfs: make sure maxlen is still congruent with prod when rounding down Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 01/11] xfs: refactor realtime scrubbing context management Darrick J. Wong ` (8 subsequent siblings) 10 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When realtime support is not compiled into the kernel, these functions should return negative errnos, not positive errnos. While we're at it, fix a broken macro declaration. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_rtalloc.h | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h index e440f793dd98..e53bc52d81fd 100644 --- a/fs/xfs/xfs_rtalloc.h +++ b/fs/xfs/xfs_rtalloc.h @@ -143,17 +143,17 @@ int xfs_rtalloc_reinit_frextents(struct xfs_mount *mp); int xfs_rtfile_convert_unwritten(struct xfs_inode *ip, loff_t pos, uint64_t len); #else -# define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb) (ENOSYS) -# define xfs_rtfree_extent(t,b,l) (ENOSYS) -# define xfs_rtfree_blocks(t,rb,rl) (ENOSYS) -# define xfs_rtpick_extent(m,t,l,rb) (ENOSYS) -# define xfs_growfs_rt(mp,in) (ENOSYS) -# define xfs_rtalloc_query_range(t,l,h,f,p) (ENOSYS) -# define xfs_rtalloc_query_all(m,t,f,p) (ENOSYS) -# define xfs_rtbuf_get(m,t,b,i,p) (ENOSYS) -# define xfs_verify_rtbno(m, r) (false) -# define xfs_rtalloc_extent_is_free(m,t,s,l,i) (ENOSYS) -# define xfs_rtalloc_reinit_frextents(m) (0) +# define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb) (-ENOSYS) +# define xfs_rtfree_extent(t,b,l) (-ENOSYS) +# define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) +# define 
xfs_rtpick_extent(m,t,l,rb) (-ENOSYS) +# define xfs_growfs_rt(mp,in) (-ENOSYS) +# define xfs_rtalloc_query_range(m,t,l,h,f,p) (-ENOSYS) +# define xfs_rtalloc_query_all(m,t,f,p) (-ENOSYS) +# define xfs_rtbuf_get(m,t,b,i,p) (-ENOSYS) +# define xfs_verify_rtbno(m, r) (false) +# define xfs_rtalloc_extent_is_free(m,t,s,l,i) (-ENOSYS) +# define xfs_rtalloc_reinit_frextents(m) (0) static inline int /* error */ xfs_rtmount_init( xfs_mount_t *mp) /* file system mount structure */ @@ -164,7 +164,7 @@ xfs_rtmount_init( xfs_warn(mp, "Not built with CONFIG_XFS_RT"); return -ENOSYS; } -# define xfs_rtmount_inodes(m) (((mp)->m_sb.sb_rblocks == 0)? 0 : (ENOSYS)) +# define xfs_rtmount_inodes(m) (((m)->m_sb.sb_rblocks == 0)? 0 : (-ENOSYS)) # define xfs_rtunmount_inodes(m) # define xfs_rtfile_convert_unwritten(ip, pos, len) (0) #endif /* CONFIG_XFS_RT */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
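The convention that patch 04 restores can be shown with a small userspace sketch. All names here (`DEMO_HAVE_RT`, `demo_rt_free_extent`, `demo_free_one`) are hypothetical and exist only for illustration; the point is that a stub compiled in place of a disabled feature should evaluate to a negative errno, because callers simply propagate any nonzero value.

```c
#include <errno.h>

/* Hypothetical compile-time switch standing in for CONFIG_XFS_RT. */
#ifdef DEMO_HAVE_RT
int demo_rt_free_extent(unsigned long start, unsigned long len);
#else
/* Feature disabled: the stub evaluates to a negative errno. */
# define demo_rt_free_extent(start, len)	(-ENOSYS)
#endif

/*
 * Hypothetical caller.  It passes any nonzero return value straight up,
 * so a positive ENOSYS from the stub would leak out as a positive number
 * instead of being recognised as an error code.
 */
static int
demo_free_one(unsigned long start)
{
	int error;

	error = demo_rt_free_extent(start, 1);
	if (error)
		return error;
	return 0;
}
```

Built without `DEMO_HAVE_RT`, `demo_free_one()` returns `-ENOSYS`, which matches what the non-stub code paths in this sketch would be expected to return on failure.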
* [PATCH 01/11] xfs: refactor realtime scrubbing context management 2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong 2022-12-30 22:17 ` [PATCH 05/11] xfs: make sure maxlen is still congruent with prod when rounding down Darrick J. Wong 2022-12-30 22:17 ` [PATCH 04/11] xfs: rt stubs should return negative errnos when rt disabled Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 02/11] xfs: bump max fsgeom struct version Darrick J. Wong ` (7 subsequent siblings) 10 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a pair of helpers to deal with setting up the necessary incore context to check metadata records against the realtime metadata. Right now this is limited to locking the realtime bitmap and summary inodes, but as we add rmap and reflink to the realtime device this will grow to include btree cursors. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/bmap.c | 2 ++ fs/xfs/scrub/common.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/common.h | 17 +++++++++++++ fs/xfs/scrub/rtbitmap.c | 7 ++---- fs/xfs/scrub/rtsummary.c | 24 +++++-------------- fs/xfs/scrub/scrub.c | 1 + fs/xfs/scrub/scrub.h | 9 +++++++ 7 files changed, 95 insertions(+), 23 deletions(-) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index 150b8c40b809..47d6bae9d6da 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -318,8 +318,10 @@ xchk_bmap_rt_iextent_xref( struct xchk_bmap_info *info, struct xfs_bmbt_irec *irec) { + xchk_rt_init(info->sc, &info->sc->sr, XCHK_RTLOCK_BITMAP_SHARED); xchk_xref_is_used_rt_space(info->sc, irec->br_startblock, irec->br_blockcount); + xchk_rt_unlock(info->sc, &info->sc->sr); } /* Cross-reference a single datadev extent record. 
*/ diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index b9c4f335cd8e..4de13f8f4277 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -671,6 +671,64 @@ xchk_ag_init( return 0; } +/* + * For scrubbing a realtime file, grab all the in-core resources we'll need to + * check the realtime metadata, which means taking the ILOCK of the realtime + * metadata inodes. Callers must not join these inodes to the transaction + * with non-zero lockflags or concurrency problems will result. The + * @rtlock_flags argument takes XCHK_RTLOCK_* flags because scrub has somewhat + * unusual locking requirements. + */ +void +xchk_rt_init( + struct xfs_scrub *sc, + struct xchk_rt *sr, + unsigned int rtlock_flags) +{ + ASSERT(!(rtlock_flags & ~XCHK_RTLOCK_ALL)); + ASSERT(hweight32(rtlock_flags & (XCHK_RTLOCK_BITMAP | + XCHK_RTLOCK_BITMAP_SHARED)) < 2); + ASSERT(hweight32(rtlock_flags & (XCHK_RTLOCK_SUMMARY | + XCHK_RTLOCK_SUMMARY_SHARED)) < 2); + + if (rtlock_flags & XCHK_RTLOCK_BITMAP) + xfs_ilock(sc->mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP); + else if (rtlock_flags & XCHK_RTLOCK_BITMAP_SHARED) + xfs_ilock(sc->mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); + + if (rtlock_flags & XCHK_RTLOCK_SUMMARY) + xfs_ilock(sc->mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM); + else if (rtlock_flags & XCHK_RTLOCK_SUMMARY_SHARED) + xfs_ilock(sc->mp->m_rsumip, XFS_ILOCK_SHARED | XFS_ILOCK_RTSUM); + + sr->rtlock_flags = rtlock_flags; +} + +/* + * Unlock the realtime metadata inodes. This must be done /after/ committing + * (or cancelling) the scrub transaction. 
+ */ +void +xchk_rt_unlock( + struct xfs_scrub *sc, + struct xchk_rt *sr) +{ + if (!sr->rtlock_flags) + return; + + if (sr->rtlock_flags & XCHK_RTLOCK_SUMMARY) + xfs_iunlock(sc->mp->m_rsumip, XFS_ILOCK_EXCL); + else if (sr->rtlock_flags & XCHK_RTLOCK_SUMMARY_SHARED) + xfs_iunlock(sc->mp->m_rsumip, XFS_ILOCK_SHARED); + + if (sr->rtlock_flags & XCHK_RTLOCK_BITMAP) + xfs_iunlock(sc->mp->m_rbmip, XFS_ILOCK_EXCL); + else if (sr->rtlock_flags & XCHK_RTLOCK_BITMAP_SHARED) + xfs_iunlock(sc->mp->m_rbmip, XFS_ILOCK_SHARED); + + sr->rtlock_flags = 0; +} + /* Per-scrubber setup functions */ void diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 9bdacce17d82..e41224065421 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -152,6 +152,23 @@ xchk_ag_init_existing( return error == -ENOENT ? -EFSCORRUPTED : error; } +/* Lock the rt bitmap in exclusive mode */ +#define XCHK_RTLOCK_BITMAP (1U << 31) +/* Lock the rt bitmap in shared mode */ +#define XCHK_RTLOCK_BITMAP_SHARED (1U << 30) +/* Lock the rt summary in exclusive mode */ +#define XCHK_RTLOCK_SUMMARY (1U << 29) +/* Lock the rt summary in shared mode */ +#define XCHK_RTLOCK_SUMMARY_SHARED (1U << 28) + +#define XCHK_RTLOCK_ALL (XCHK_RTLOCK_BITMAP | \ + XCHK_RTLOCK_BITMAP_SHARED | \ + XCHK_RTLOCK_SUMMARY | \ + XCHK_RTLOCK_SUMMARY_SHARED) + +void xchk_rt_init(struct xfs_scrub *sc, struct xchk_rt *sr, + unsigned int xchk_rtlock_flags); +void xchk_rt_unlock(struct xfs_scrub *sc, struct xchk_rt *sr); int xchk_ag_read_headers(struct xfs_scrub *sc, xfs_agnumber_t agno, struct xchk_ag *sa); void xchk_ag_btcur_free(struct xchk_ag *sa); diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index 1d84a9eed67c..e524055ba709 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -44,7 +44,7 @@ xchk_setup_rtbitmap( if (error) return error; - xchk_ilock(sc, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP); + xchk_rt_init(sc, &sc->sr, XCHK_RTLOCK_BITMAP); return 0; } @@ -157,13 +157,10 @@ 
xchk_xref_is_used_rt_space( do_div(startext, sc->mp->m_sb.sb_rextsize); do_div(endext, sc->mp->m_sb.sb_rextsize); extcount = endext - startext + 1; - xfs_ilock(sc->mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); error = xfs_rtalloc_extent_is_free(sc->mp, sc->tp, startext, extcount, &is_free); if (!xchk_should_check_xref(sc, &error, NULL)) - goto out_unlock; + return; if (is_free) xchk_ino_xref_set_corrupt(sc, sc->mp->m_rbmip->i_ino); -out_unlock: - xfs_iunlock(sc->mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); } diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c index 7d1bc49fb3dd..c0bf65273f1a 100644 --- a/fs/xfs/scrub/rtsummary.c +++ b/fs/xfs/scrub/rtsummary.c @@ -75,14 +75,8 @@ xchk_setup_rtsummary( if (error) return error; - /* - * Locking order requires us to take the rtbitmap first. We must be - * careful to unlock it ourselves when we are done with the rtbitmap - * file since the scrub infrastructure won't do that for us. Only - * then we can lock the rtsummary inode. - */ - xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); - xchk_ilock(sc, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM); + xchk_rt_init(sc, &sc->sr, + XCHK_RTLOCK_SUMMARY | XCHK_RTLOCK_BITMAP_SHARED); return 0; } @@ -248,7 +242,7 @@ xchk_rtsummary( /* Invoke the fork scrubber. */ error = xchk_metadata_inode_forks(sc); if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) - goto out_rbm; + return error; /* Construct the new summary file from the rtbitmap. */ error = xchk_rtsum_compute(sc); @@ -258,17 +252,11 @@ xchk_rtsummary( * error since we're checking the summary file. */ xchk_ino_xref_set_corrupt(sc, mp->m_rbmip->i_ino); - error = 0; - goto out_rbm; + return 0; } if (error) - goto out_rbm; + return error; /* Does the computed summary file match the actual rtsummary file? */ - error = xchk_rtsum_compare(sc); - -out_rbm: - /* Unlock the rtbitmap since we're done with it. 
*/ - xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); - return error; + return xchk_rtsum_compare(sc); } diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index a596789e463d..1b3820b30384 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -189,6 +189,7 @@ xchk_teardown( xfs_trans_cancel(sc->tp); sc->tp = NULL; } + xchk_rt_unlock(sc, &sc->sr); if (sc->ip) { if (sc->ilock_flags) xchk_iunlock(sc, sc->ilock_flags); diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index d606d4f370c7..38437104fc86 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -67,6 +67,12 @@ struct xchk_ag { struct xfs_btree_cur *refc_cur; }; +/* Inode lock state for the RT volume. */ +struct xchk_rt { + /* XCHK_RTLOCK_* lock state */ + unsigned int rtlock_flags; +}; + struct xfs_scrub { /* General scrub state. */ struct xfs_mount *mp; @@ -125,6 +131,9 @@ struct xfs_scrub { /* State tracking for single-AG operations. */ struct xchk_ag sa; + + /* State tracking for realtime operations. */ + struct xchk_rt sr; }; /* XCHK state flags grow up from zero, XREP state flags grown down from 2^31 */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
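The flag checks at the top of `xchk_rt_init()` in patch 01 can be sketched in userspace as follows. The `DEMO_*` names and the helper functions are hypothetical; the bit positions simply copy the `XCHK_RTLOCK_*` definitions from the diff. The patch uses `hweight32()` to count set bits, while this sketch tests whether both bits of a pair are set, which amounts to the same thing for a two-bit mask.

```c
#include <stdbool.h>

/* Illustrative copies of the lock flags defined in the patch. */
#define DEMO_RTLOCK_BITMAP		(1U << 31)
#define DEMO_RTLOCK_BITMAP_SHARED	(1U << 30)
#define DEMO_RTLOCK_SUMMARY		(1U << 29)
#define DEMO_RTLOCK_SUMMARY_SHARED	(1U << 28)

#define DEMO_RTLOCK_ALL		(DEMO_RTLOCK_BITMAP | \
				 DEMO_RTLOCK_BITMAP_SHARED | \
				 DEMO_RTLOCK_SUMMARY | \
				 DEMO_RTLOCK_SUMMARY_SHARED)

/* True if every bit in @pair is set in @flags. */
static bool
demo_both_set(unsigned int flags, unsigned int pair)
{
	return (flags & pair) == pair;
}

/*
 * Accept only known flags, and reject requests that ask for the same
 * inode in both exclusive and shared mode.
 */
static bool
demo_rtlock_flags_valid(unsigned int flags)
{
	if (flags & ~DEMO_RTLOCK_ALL)
		return false;
	if (demo_both_set(flags, DEMO_RTLOCK_BITMAP |
				 DEMO_RTLOCK_BITMAP_SHARED))
		return false;
	if (demo_both_set(flags, DEMO_RTLOCK_SUMMARY |
				 DEMO_RTLOCK_SUMMARY_SHARED))
		return false;
	return true;
}
```

The combination used by `xchk_setup_rtsummary()` in the patch, summary exclusive plus bitmap shared, passes this check; asking for the bitmap in both modes at once does not.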
* [PATCH 02/11] xfs: bump max fsgeom struct version 2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 01/11] xfs: refactor realtime scrubbing context management Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 03/11] xfs: prevent rt growfs when quota is enabled Darrick J. Wong ` (6 subsequent siblings) 10 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> The latest version of the fs geometry structure is v5. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_sb.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_sb.h b/fs/xfs/libxfs/xfs_sb.h index a5e14740ec9a..19134b23c10b 100644 --- a/fs/xfs/libxfs/xfs_sb.h +++ b/fs/xfs/libxfs/xfs_sb.h @@ -25,7 +25,7 @@ extern uint64_t xfs_sb_version_to_features(struct xfs_sb *sbp); extern int xfs_update_secondary_sbs(struct xfs_mount *mp); -#define XFS_FS_GEOM_MAX_STRUCT_VER (4) +#define XFS_FS_GEOM_MAX_STRUCT_VER (5) extern void xfs_fs_geometry(struct xfs_mount *mp, struct xfs_fsop_geom *geo, int struct_version); extern int xfs_sb_read_secondary(struct xfs_mount *mp, ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 03/11] xfs: prevent rt growfs when quota is enabled 2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 02/11] xfs: bump max fsgeom struct version Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 08/11] xfs: convert rt bitmap/summary block numbers to xfs_fileoff_t Darrick J. Wong ` (5 subsequent siblings) 10 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Quotas aren't (yet) supported with realtime, so we shouldn't allow userspace to set up a realtime section when quotas are enabled, even if they attached one via mount options. IOW, you shouldn't be able to do: # mkfs.xfs -f /dev/sda # mount /dev/sda /mnt -o rtdev=/dev/sdb # xfs_growfs -r /mnt Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_rtalloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 726e3cec34d5..3ac8ca845239 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -959,7 +959,7 @@ xfs_growfs_rt( return -EINVAL; /* Unsupported realtime features. */ - if (xfs_has_rmapbt(mp) || xfs_has_reflink(mp)) + if (xfs_has_rmapbt(mp) || xfs_has_reflink(mp) || xfs_has_quota(mp)) return -EOPNOTSUPP; nrblocks = in->newblocks; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 08/11] xfs: convert rt bitmap/summary block numbers to xfs_fileoff_t 2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 03/11] xfs: prevent rt growfs when quota is enabled Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 11/11] xfs: convert rt extent numbers to xfs_rtxnum_t Darrick J. Wong ` (4 subsequent siblings) 10 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> We should use xfs_fileoff_t to store the file block offset of any location within the realtime bitmap or summary files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtbitmap.c | 22 +++++++++++----------- fs/xfs/libxfs/xfs_rtbitmap.h | 12 ++++++------ fs/xfs/scrub/rtbitmap.c | 2 +- fs/xfs/xfs_rtalloc.c | 34 +++++++++++++++++----------------- 4 files changed, 35 insertions(+), 35 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index b90d2f2d5bde..50a9d23c00c6 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -56,7 +56,7 @@ int xfs_rtbuf_get( xfs_mount_t *mp, /* file system mount structure */ xfs_trans_t *tp, /* transaction pointer */ - xfs_rtblock_t block, /* block number in bitmap or summary */ + xfs_fileoff_t block, /* block number in bitmap or summary */ int issum, /* is summary not bitmap */ struct xfs_buf **bpp) /* output: buffer for the block */ { @@ -108,7 +108,7 @@ xfs_rtfind_back( { xfs_rtword_t *b; /* current word in buffer */ int bit; /* bit number in the word */ - xfs_rtblock_t block; /* bitmap block number */ + xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ @@ -283,7 +283,7 @@ xfs_rtfind_forw( { xfs_rtword_t *b; /* 
current word in buffer */ int bit; /* bit number in the word */ - xfs_rtblock_t block; /* bitmap block number */ + xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ @@ -453,15 +453,15 @@ xfs_rtmodify_summary_int( xfs_mount_t *mp, /* file system mount structure */ xfs_trans_t *tp, /* transaction pointer */ int log, /* log2 of extent size */ - xfs_rtblock_t bbno, /* bitmap block number */ + xfs_fileoff_t bbno, /* bitmap block number */ int delta, /* change to make to summary info */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ - xfs_fsblock_t *rsb, /* in/out: summary block number */ + xfs_fileoff_t *rsb, /* in/out: summary block number */ xfs_suminfo_t *sum) /* out: summary info for this block */ { struct xfs_buf *bp; /* buffer for the summary block */ int error; /* error value */ - xfs_fsblock_t sb; /* summary fsblock */ + xfs_fileoff_t sb; /* summary fsblock */ int so; /* index into the summary file */ xfs_suminfo_t *sp; /* pointer to returned data */ @@ -523,10 +523,10 @@ xfs_rtmodify_summary( xfs_mount_t *mp, /* file system mount structure */ xfs_trans_t *tp, /* transaction pointer */ int log, /* log2 of extent size */ - xfs_rtblock_t bbno, /* bitmap block number */ + xfs_fileoff_t bbno, /* bitmap block number */ int delta, /* change to make to summary info */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ - xfs_fsblock_t *rsb) /* in/out: summary block number */ + xfs_fileoff_t *rsb) /* in/out: summary block number */ { return xfs_rtmodify_summary_int(mp, tp, log, bbno, delta, rbpp, rsb, NULL); @@ -546,7 +546,7 @@ xfs_rtmodify_range( { xfs_rtword_t *b; /* current word in buffer */ int bit; /* bit number in the word */ - xfs_rtblock_t block; /* bitmap block number */ + xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* 
error value */ @@ -699,7 +699,7 @@ xfs_rtfree_range( xfs_rtblock_t start, /* starting block to free */ xfs_rtxlen_t len, /* length to free */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ - xfs_fsblock_t *rsb) /* in/out: summary block number */ + xfs_fileoff_t *rsb) /* in/out: summary block number */ { xfs_rtblock_t end; /* end of the freed extent */ int error; /* error value */ @@ -780,7 +780,7 @@ xfs_rtcheck_range( { xfs_rtword_t *b; /* current word in buffer */ int bit; /* bit number in the word */ - xfs_rtblock_t block; /* bitmap block number */ + xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index d4449610154a..e2ea6d31c38b 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -24,7 +24,7 @@ typedef int (*xfs_rtalloc_query_range_fn)( #ifdef CONFIG_XFS_RT int xfs_rtbuf_get(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t block, int issum, struct xfs_buf **bpp); + xfs_fileoff_t block, int issum, struct xfs_buf **bpp); int xfs_rtcheck_range(struct xfs_mount *mp, struct xfs_trans *tp, xfs_rtblock_t start, xfs_rtxlen_t len, int val, xfs_rtblock_t *new, int *stat); @@ -37,15 +37,15 @@ int xfs_rtfind_forw(struct xfs_mount *mp, struct xfs_trans *tp, int xfs_rtmodify_range(struct xfs_mount *mp, struct xfs_trans *tp, xfs_rtblock_t start, xfs_rtxlen_t len, int val); int xfs_rtmodify_summary_int(struct xfs_mount *mp, struct xfs_trans *tp, - int log, xfs_rtblock_t bbno, int delta, - struct xfs_buf **rbpp, xfs_fsblock_t *rsb, + int log, xfs_fileoff_t bbno, int delta, + struct xfs_buf **rbpp, xfs_fileoff_t *rsb, xfs_suminfo_t *sum); int xfs_rtmodify_summary(struct xfs_mount *mp, struct xfs_trans *tp, int log, - xfs_rtblock_t bbno, int delta, struct xfs_buf **rbpp, - xfs_fsblock_t *rsb); + xfs_fileoff_t bbno, int delta, struct xfs_buf 
**rbpp, + xfs_fileoff_t *rsb); int xfs_rtfree_range(struct xfs_mount *mp, struct xfs_trans *tp, xfs_rtblock_t start, xfs_rtxlen_t len, - struct xfs_buf **rbpp, xfs_fsblock_t *rsb); + struct xfs_buf **rbpp, xfs_fileoff_t *rsb); int xfs_rtalloc_query_range(struct xfs_mount *mp, struct xfs_trans *tp, const struct xfs_rtalloc_rec *low_rec, const struct xfs_rtalloc_rec *high_rec, diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index 86f726577ca7..6f8becb557bd 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -77,7 +77,7 @@ xchk_rtbitmap_check_extents( { struct xfs_mount *mp = sc->mp; struct xfs_bmbt_irec map; - xfs_rtblock_t off; + xfs_fileoff_t off; int nmap; int error = 0; diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 21f0ac611ef8..12d1fe425d22 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -37,9 +37,9 @@ xfs_rtget_summary( xfs_mount_t *mp, /* file system mount structure */ xfs_trans_t *tp, /* transaction pointer */ int log, /* log2 of extent size */ - xfs_rtblock_t bbno, /* bitmap block number */ + xfs_fileoff_t bbno, /* bitmap block number */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ - xfs_fsblock_t *rsb, /* in/out: summary block number */ + xfs_fileoff_t *rsb, /* in/out: summary block number */ xfs_suminfo_t *sum) /* out: summary info for this block */ { return xfs_rtmodify_summary_int(mp, tp, log, bbno, 0, rbpp, rsb, sum); @@ -55,9 +55,9 @@ xfs_rtany_summary( xfs_trans_t *tp, /* transaction pointer */ int low, /* low log2 extent size */ int high, /* high log2 extent size */ - xfs_rtblock_t bbno, /* bitmap block number */ + xfs_fileoff_t bbno, /* bitmap block number */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ - xfs_fsblock_t *rsb, /* in/out: summary block number */ + xfs_fileoff_t *rsb, /* in/out: summary block number */ int *stat) /* out: any good extents here? 
*/ { int error; /* error value */ @@ -109,12 +109,12 @@ xfs_rtcopy_summary( xfs_mount_t *nmp, /* new file system mount point */ xfs_trans_t *tp) /* transaction pointer */ { - xfs_rtblock_t bbno; /* bitmap block number */ + xfs_fileoff_t bbno; /* bitmap block number */ struct xfs_buf *bp; /* summary buffer */ int error; /* error return value */ int log; /* summary level number (log length) */ xfs_suminfo_t sum; /* summary data */ - xfs_fsblock_t sumbno; /* summary block number */ + xfs_fileoff_t sumbno; /* summary block number */ bp = NULL; for (log = omp->m_rsumlevels - 1; log >= 0; log--) { @@ -151,7 +151,7 @@ xfs_rtallocate_range( xfs_rtblock_t start, /* start block to allocate */ xfs_rtxlen_t len, /* length to allocate */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ - xfs_fsblock_t *rsb) /* in/out: summary block number */ + xfs_fileoff_t *rsb) /* in/out: summary block number */ { xfs_rtblock_t end; /* end of the allocated extent */ int error; /* error value */ @@ -227,13 +227,13 @@ STATIC int /* error */ xfs_rtallocate_extent_block( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ - xfs_rtblock_t bbno, /* bitmap block number */ + xfs_fileoff_t bbno, /* bitmap block number */ xfs_rtxlen_t minlen, /* minimum length to allocate */ xfs_rtxlen_t maxlen, /* maximum length to allocate */ xfs_rtxlen_t *len, /* out: actual length allocated */ xfs_rtblock_t *nextp, /* out: next block to try */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ - xfs_fsblock_t *rsb, /* in/out: summary block number */ + xfs_fileoff_t *rsb, /* in/out: summary block number */ xfs_rtxlen_t prod, /* extent product factor */ xfs_rtblock_t *rtblock) /* out: start block allocated */ { @@ -357,7 +357,7 @@ xfs_rtallocate_extent_exact( xfs_rtxlen_t maxlen, /* maximum length to allocate */ xfs_rtxlen_t *len, /* out: actual length allocated */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ - xfs_fsblock_t *rsb, /* in/out: summary 
block number */ + xfs_fileoff_t *rsb, /* in/out: summary block number */ xfs_rtxlen_t prod, /* extent product factor */ xfs_rtblock_t *rtblock) /* out: start block allocated */ { @@ -437,12 +437,12 @@ xfs_rtallocate_extent_near( xfs_rtxlen_t maxlen, /* maximum length to allocate */ xfs_rtxlen_t *len, /* out: actual length allocated */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ - xfs_fsblock_t *rsb, /* in/out: summary block number */ + xfs_fileoff_t *rsb, /* in/out: summary block number */ xfs_rtxlen_t prod, /* extent product factor */ xfs_rtblock_t *rtblock) /* out: start block allocated */ { int any; /* any useful extents from summary */ - xfs_rtblock_t bbno; /* bitmap block number */ + xfs_fileoff_t bbno; /* bitmap block number */ int error; /* error value */ int i; /* bitmap block offset (loop control) */ int j; /* secondary loop control */ @@ -646,12 +646,12 @@ xfs_rtallocate_extent_size( xfs_rtxlen_t maxlen, /* maximum length to allocate */ xfs_rtxlen_t *len, /* out: actual length allocated */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ - xfs_fsblock_t *rsb, /* in/out: summary block number */ + xfs_fileoff_t *rsb, /* in/out: summary block number */ xfs_rtxlen_t prod, /* extent product factor */ xfs_rtblock_t *rtblock) /* out: start block allocated */ { int error; /* error value */ - int i; /* bitmap block number */ + xfs_fileoff_t i; /* bitmap block number */ int l; /* level number (loop control) */ xfs_rtblock_t n; /* next block to be tried */ xfs_rtblock_t r; /* result block number */ @@ -927,7 +927,7 @@ xfs_growfs_rt( xfs_mount_t *mp, /* mount point for filesystem */ xfs_growfs_rt_t *in) /* growfs rt input struct */ { - xfs_rtblock_t bmbno; /* bitmap block number */ + xfs_fileoff_t bmbno; /* bitmap block number */ struct xfs_buf *bp; /* temporary buffer */ int error; /* error return value */ xfs_mount_t *nmp; /* new (fake) mount structure */ @@ -942,7 +942,7 @@ xfs_growfs_rt( xfs_extlen_t rbmblocks; /* current number of rt 
bitmap blocks */ xfs_extlen_t rsumblocks; /* current number of rt summary blks */ xfs_sb_t *sbp; /* old superblock */ - xfs_fsblock_t sumbno; /* summary block number */ + xfs_fileoff_t sumbno; /* summary block number */ uint8_t *rsum_cache; /* old summary cache */ sbp = &mp->m_sb; @@ -1205,7 +1205,7 @@ xfs_rtallocate_extent( xfs_mount_t *mp = tp->t_mountp; int error; /* error value */ xfs_rtblock_t r; /* result allocated block */ - xfs_fsblock_t sb; /* summary file block number */ + xfs_fileoff_t sb; /* summary file block number */ struct xfs_buf *sumbp; /* summary file block buffer */ ASSERT(xfs_isilocked(mp->m_rbmip, XFS_ILOCK_EXCL)); ^ permalink raw reply related [flat|nested] 565+ messages in thread
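As background for patch 08, the sketch below shows why a "bitmap block number" is an offset into the bitmap file rather than a location on the realtime device. It assumes one bit per rt extent and a power-of-two block size; the function name `demo_rtx_to_bitmap_block` and the `blocklog` parameter are hypothetical and do not appear in the patch.

```c
#include <stdint.h>

/*
 * Hypothetical helper: given an rt extent number and the log2 of the
 * filesystem block size in bytes, return the block offset within the
 * bitmap file that holds the bit for that extent.  Each byte carries
 * 8 bits, so log2(bits per block) is blocklog + 3.
 */
static uint64_t
demo_rtx_to_bitmap_block(uint64_t rtx, unsigned int blocklog)
{
	return rtx >> (blocklog + 3);
}
```

With 4096-byte blocks (`blocklog` of 12), each bitmap block covers 32768 rt extents, so extent 32767 maps to file block 0 and extent 32768 maps to file block 1. The result counts blocks from the start of the bitmap file, which is the kind of quantity `xfs_fileoff_t` is meant to describe.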
* [PATCH 11/11] xfs: convert rt extent numbers to xfs_rtxnum_t 2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 08/11] xfs: convert rt bitmap/summary block numbers to xfs_fileoff_t Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 06/11] xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h Darrick J. Wong ` (3 subsequent siblings) 10 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Further disambiguate the xfs_rtblock_t uses by creating a new type, xfs_rtxnum_t, to store the position of an extent within the realtime section, in units of rtextents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtbitmap.c | 86 +++++++++++++-------------- fs/xfs/libxfs/xfs_rtbitmap.h | 26 ++++---- fs/xfs/libxfs/xfs_types.h | 2 + fs/xfs/scrub/rtbitmap.c | 6 +- fs/xfs/scrub/rtsummary.c | 2 - fs/xfs/scrub/trace.h | 4 + fs/xfs/xfs_bmap_util.c | 12 ++-- fs/xfs/xfs_rtalloc.c | 134 +++++++++++++++++++++--------------------- fs/xfs/xfs_rtalloc.h | 6 +- 9 files changed, 138 insertions(+), 140 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 50a9d23c00c6..ce1443681131 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -102,9 +102,9 @@ int xfs_rtfind_back( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ - xfs_rtblock_t start, /* starting block to look at */ - xfs_rtblock_t limit, /* last block to look at */ - xfs_rtblock_t *rtblock) /* out: start block found */ + xfs_rtxnum_t start, /* starting rtext to look at */ + xfs_rtxnum_t limit, /* last rtext to look at */ + xfs_rtxnum_t *rtx) /* out: start rtext found */ { xfs_rtword_t *b; /* current word in buffer */ int bit; /* bit number in the word */ @@ 
-112,9 +112,9 @@ xfs_rtfind_back( struct xfs_buf *bp; /* buf for the block */ xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ - xfs_rtblock_t firstbit; /* first useful bit in the word */ - xfs_rtblock_t i; /* current bit number rel. to start */ - xfs_rtblock_t len; /* length of inspected area */ + xfs_rtxnum_t firstbit; /* first useful bit in the word */ + xfs_rtxnum_t i; /* current bit number rel. to start */ + xfs_rtxnum_t len; /* length of inspected area */ xfs_rtword_t mask; /* mask of relevant bits for value */ xfs_rtword_t want; /* mask for "good" values */ xfs_rtword_t wdiff; /* difference from wanted value */ @@ -163,7 +163,7 @@ xfs_rtfind_back( */ xfs_trans_brelse(tp, bp); i = bit - XFS_RTHIBIT(wdiff); - *rtblock = start - i + 1; + *rtx = start - i + 1; return 0; } i = bit - firstbit + 1; @@ -209,7 +209,7 @@ xfs_rtfind_back( */ xfs_trans_brelse(tp, bp); i += XFS_NBWORD - 1 - XFS_RTHIBIT(wdiff); - *rtblock = start - i + 1; + *rtx = start - i + 1; return 0; } i += XFS_NBWORD; @@ -256,7 +256,7 @@ xfs_rtfind_back( */ xfs_trans_brelse(tp, bp); i += XFS_NBWORD - 1 - XFS_RTHIBIT(wdiff); - *rtblock = start - i + 1; + *rtx = start - i + 1; return 0; } else i = len; @@ -265,7 +265,7 @@ xfs_rtfind_back( * No match, return that we scanned the whole area. 
*/ xfs_trans_brelse(tp, bp); - *rtblock = start - i + 1; + *rtx = start - i + 1; return 0; } @@ -277,9 +277,9 @@ int xfs_rtfind_forw( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ - xfs_rtblock_t start, /* starting block to look at */ - xfs_rtblock_t limit, /* last block to look at */ - xfs_rtblock_t *rtblock) /* out: start block found */ + xfs_rtxnum_t start, /* starting rtext to look at */ + xfs_rtxnum_t limit, /* last rtext to look at */ + xfs_rtxnum_t *rtx) /* out: start rtext found */ { xfs_rtword_t *b; /* current word in buffer */ int bit; /* bit number in the word */ @@ -287,9 +287,9 @@ xfs_rtfind_forw( struct xfs_buf *bp; /* buf for the block */ xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ - xfs_rtblock_t i; /* current bit number rel. to start */ - xfs_rtblock_t lastbit; /* last useful bit in the word */ - xfs_rtblock_t len; /* length of inspected area */ + xfs_rtxnum_t i; /* current bit number rel. to start */ + xfs_rtxnum_t lastbit; /* last useful bit in the word */ + xfs_rtxnum_t len; /* length of inspected area */ xfs_rtword_t mask; /* mask of relevant bits for value */ xfs_rtword_t want; /* mask for "good" values */ xfs_rtword_t wdiff; /* difference from wanted value */ @@ -337,7 +337,7 @@ xfs_rtfind_forw( */ xfs_trans_brelse(tp, bp); i = XFS_RTLOBIT(wdiff) - bit; - *rtblock = start + i - 1; + *rtx = start + i - 1; return 0; } i = lastbit - bit; @@ -382,7 +382,7 @@ xfs_rtfind_forw( */ xfs_trans_brelse(tp, bp); i += XFS_RTLOBIT(wdiff); - *rtblock = start + i - 1; + *rtx = start + i - 1; return 0; } i += XFS_NBWORD; @@ -426,7 +426,7 @@ xfs_rtfind_forw( */ xfs_trans_brelse(tp, bp); i += XFS_RTLOBIT(wdiff); - *rtblock = start + i - 1; + *rtx = start + i - 1; return 0; } else i = len; @@ -435,7 +435,7 @@ xfs_rtfind_forw( * No match, return that we scanned the whole area. 
*/ xfs_trans_brelse(tp, bp); - *rtblock = start + i - 1; + *rtx = start + i - 1; return 0; } @@ -540,7 +540,7 @@ int xfs_rtmodify_range( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ - xfs_rtblock_t start, /* starting block to modify */ + xfs_rtxnum_t start, /* starting rtext to modify */ xfs_rtxlen_t len, /* length of extent to modify */ int val) /* 1 for free, 0 for allocated */ { @@ -696,15 +696,15 @@ int xfs_rtfree_range( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ - xfs_rtblock_t start, /* starting block to free */ + xfs_rtxnum_t start, /* starting rtext to free */ xfs_rtxlen_t len, /* length to free */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ xfs_fileoff_t *rsb) /* in/out: summary block number */ { - xfs_rtblock_t end; /* end of the freed extent */ + xfs_rtxnum_t end; /* end of the freed extent */ int error; /* error value */ - xfs_rtblock_t postblock; /* first block freed > end */ - xfs_rtblock_t preblock; /* first block freed < start */ + xfs_rtxnum_t postblock; /* first rtext freed > end */ + xfs_rtxnum_t preblock; /* first rtext freed < start */ end = start + len - 1; /* @@ -772,10 +772,10 @@ int xfs_rtcheck_range( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ - xfs_rtblock_t start, /* starting block number of extent */ + xfs_rtxnum_t start, /* starting rtext number of extent */ xfs_rtxlen_t len, /* length of extent */ int val, /* 1 for free, 0 for allocated */ - xfs_rtblock_t *new, /* out: first block not matching */ + xfs_rtxnum_t *new, /* out: first rtext not matching */ int *stat) /* out: 1 for matches, 0 for not */ { xfs_rtword_t *b; /* current word in buffer */ @@ -784,8 +784,8 @@ xfs_rtcheck_range( struct xfs_buf *bp; /* buf for the block */ xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ - xfs_rtblock_t i; /* current bit number rel. 
to start */ - xfs_rtblock_t lastbit; /* last useful bit in word */ + xfs_rtxnum_t i; /* current bit number rel. to start */ + xfs_rtxnum_t lastbit; /* last useful bit in word */ xfs_rtword_t mask; /* mask of relevant bits for value */ xfs_rtword_t wdiff; /* difference from wanted value */ int word; /* word number in the buffer */ @@ -948,14 +948,14 @@ STATIC int /* error */ xfs_rtcheck_alloc_range( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ - xfs_rtblock_t bno, /* starting block number of extent */ + xfs_rtxnum_t start, /* starting rtext number of extent */ xfs_rtxlen_t len) /* length of extent */ { - xfs_rtblock_t new; /* dummy for xfs_rtcheck_range */ + xfs_rtxnum_t new; /* dummy for xfs_rtcheck_range */ int stat; int error; - error = xfs_rtcheck_range(mp, tp, bno, len, 0, &new, &stat); + error = xfs_rtcheck_range(mp, tp, start, len, 0, &new, &stat); if (error) return error; ASSERT(stat); @@ -971,7 +971,7 @@ xfs_rtcheck_alloc_range( int /* error */ xfs_rtfree_extent( xfs_trans_t *tp, /* transaction pointer */ - xfs_rtblock_t bno, /* starting block number to free */ + xfs_rtxnum_t start, /* starting rtext number to free */ xfs_rtxlen_t len) /* length of extent freed */ { int error; /* error value */ @@ -984,14 +984,14 @@ xfs_rtfree_extent( ASSERT(mp->m_rbmip->i_itemp != NULL); ASSERT(xfs_isilocked(mp->m_rbmip, XFS_ILOCK_EXCL)); - error = xfs_rtcheck_alloc_range(mp, tp, bno, len); + error = xfs_rtcheck_alloc_range(mp, tp, start, len); if (error) return error; /* * Free the range of realtime blocks. 
*/ - error = xfs_rtfree_range(mp, tp, bno, len, &sumbp, &sb); + error = xfs_rtfree_range(mp, tp, start, len, &sumbp, &sb); if (error) { return error; } @@ -1025,7 +1025,7 @@ xfs_rtfree_blocks( xfs_filblks_t rtlen) { struct xfs_mount *mp = tp->t_mountp; - xfs_rtblock_t bno; + xfs_rtxnum_t start; xfs_filblks_t len; xfs_extlen_t mod; @@ -1037,13 +1037,13 @@ xfs_rtfree_blocks( return -EIO; } - bno = div_u64_rem(rtbno, mp->m_sb.sb_rextsize, &mod); + start = div_u64_rem(rtbno, mp->m_sb.sb_rextsize, &mod); if (mod) { ASSERT(mod == 0); return -EIO; } - return xfs_rtfree_extent(tp, bno, len); + return xfs_rtfree_extent(tp, start, len); } /* Find all the free records within a given range. */ @@ -1057,9 +1057,9 @@ xfs_rtalloc_query_range( void *priv) { struct xfs_rtalloc_rec rec; - xfs_rtblock_t rtstart; - xfs_rtblock_t rtend; - xfs_rtblock_t high_key; + xfs_rtxnum_t rtstart; + xfs_rtxnum_t rtend; + xfs_rtxnum_t high_key; int is_free; int error = 0; @@ -1122,11 +1122,11 @@ int xfs_rtalloc_extent_is_free( struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, + xfs_rtxnum_t start, xfs_rtxlen_t len, bool *is_free) { - xfs_rtblock_t end; + xfs_rtxnum_t end; int matches; int error; diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index b0a81fb8dbda..5e2afb7fea0e 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -7,12 +7,10 @@ #define __XFS_RTBITMAP_H__ /* - * XXX: Most of the realtime allocation functions deal in units of realtime - * extents, not realtime blocks. This looks funny when paired with the type - * name and screams for a larger cleanup. + * Functions for walking free space rtextents in the realtime bitmap. 
*/ struct xfs_rtalloc_rec { - xfs_rtblock_t ar_startext; + xfs_rtxnum_t ar_startext; xfs_rtbxlen_t ar_extcount; }; @@ -26,16 +24,16 @@ typedef int (*xfs_rtalloc_query_range_fn)( int xfs_rtbuf_get(struct xfs_mount *mp, struct xfs_trans *tp, xfs_fileoff_t block, int issum, struct xfs_buf **bpp); int xfs_rtcheck_range(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_rtxlen_t len, int val, - xfs_rtblock_t *new, int *stat); + xfs_rtxnum_t start, xfs_rtxlen_t len, int val, + xfs_rtxnum_t *new, int *stat); int xfs_rtfind_back(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_rtblock_t limit, - xfs_rtblock_t *rtblock); + xfs_rtxnum_t start, xfs_rtxnum_t limit, + xfs_rtxnum_t *rtblock); int xfs_rtfind_forw(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_rtblock_t limit, - xfs_rtblock_t *rtblock); + xfs_rtxnum_t start, xfs_rtxnum_t limit, + xfs_rtxnum_t *rtblock); int xfs_rtmodify_range(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_rtxlen_t len, int val); + xfs_rtxnum_t start, xfs_rtxlen_t len, int val); int xfs_rtmodify_summary_int(struct xfs_mount *mp, struct xfs_trans *tp, int log, xfs_fileoff_t bbno, int delta, struct xfs_buf **rbpp, xfs_fileoff_t *rsb, @@ -44,7 +42,7 @@ int xfs_rtmodify_summary(struct xfs_mount *mp, struct xfs_trans *tp, int log, xfs_fileoff_t bbno, int delta, struct xfs_buf **rbpp, xfs_fileoff_t *rsb); int xfs_rtfree_range(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_rtxlen_t len, + xfs_rtxnum_t start, xfs_rtxlen_t len, struct xfs_buf **rbpp, xfs_fileoff_t *rsb); int xfs_rtalloc_query_range(struct xfs_mount *mp, struct xfs_trans *tp, const struct xfs_rtalloc_rec *low_rec, @@ -54,7 +52,7 @@ int xfs_rtalloc_query_all(struct xfs_mount *mp, struct xfs_trans *tp, xfs_rtalloc_query_range_fn fn, void *priv); int xfs_rtalloc_extent_is_free(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_rtxlen_t len, + xfs_rtxnum_t start, 
xfs_rtxlen_t len, bool *is_free); /* * Free an extent in the realtime subvolume. Length is expressed in @@ -63,7 +61,7 @@ int xfs_rtalloc_extent_is_free(struct xfs_mount *mp, struct xfs_trans *tp, int /* error */ xfs_rtfree_extent( struct xfs_trans *tp, /* transaction pointer */ - xfs_rtblock_t bno, /* starting block number to free */ + xfs_rtxnum_t start, /* starting rtext number to free */ xfs_rtxlen_t len); /* length of extent freed */ /* Same as above, but in units of rt blocks. */ diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h index 532447a35732..abb07a1c7b0b 100644 --- a/fs/xfs/libxfs/xfs_types.h +++ b/fs/xfs/libxfs/xfs_types.h @@ -32,6 +32,7 @@ typedef uint64_t xfs_rfsblock_t; /* blockno in filesystem (raw) */ typedef uint64_t xfs_rtblock_t; /* extent (block) in realtime area */ typedef uint64_t xfs_fileoff_t; /* block number in a file */ typedef uint64_t xfs_filblks_t; /* number of blocks in a file */ +typedef uint64_t xfs_rtxnum_t; /* rtextent number */ typedef uint64_t xfs_rtbxlen_t; /* rtbitmap extent length in rtextents */ typedef int64_t xfs_srtblock_t; /* signed version of xfs_rtblock_t */ @@ -49,6 +50,7 @@ typedef void * xfs_failaddr_t; #define NULLRFSBLOCK ((xfs_rfsblock_t)-1) #define NULLRTBLOCK ((xfs_rtblock_t)-1) #define NULLFILEOFF ((xfs_fileoff_t)-1) +#define NULLRTEXTNO ((xfs_rtxnum_t)-1) #define NULLAGBLOCK ((xfs_agblock_t)-1) #define NULLAGNUMBER ((xfs_agnumber_t)-1) diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index 051abef66bc6..29c5af5a289f 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -59,7 +59,7 @@ xchk_rtbitmap_rec( void *priv) { struct xfs_scrub *sc = priv; - xfs_rtblock_t startblock; + xfs_rtxnum_t startblock; xfs_filblks_t blockcount; startblock = rec->ar_startext * mp->m_sb.sb_rextsize; @@ -143,8 +143,8 @@ xchk_xref_is_used_rt_space( xfs_rtblock_t fsbno, xfs_extlen_t len) { - xfs_rtblock_t startext; - xfs_rtblock_t endext; + xfs_rtxnum_t startext; + xfs_rtxnum_t endext; 
xfs_rtxlen_t extcount; bool is_free; int error; diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c index c9e5a3bbdfdc..91c39564298c 100644 --- a/fs/xfs/scrub/rtsummary.c +++ b/fs/xfs/scrub/rtsummary.c @@ -123,7 +123,7 @@ xchk_rtsum_record_free( { struct xfs_scrub *sc = priv; xfs_fileoff_t rbmoff; - xfs_rtblock_t rtbno; + xfs_rtxnum_t rtbno; xfs_filblks_t rtlen; xchk_rtsumoff_t offs; unsigned int lenlog; diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 13fea23a9ab2..650a4c88ebc4 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -1081,14 +1081,14 @@ TRACE_EVENT(xfarray_sort_stats, #ifdef CONFIG_XFS_RT TRACE_EVENT(xchk_rtsum_record_free, - TP_PROTO(struct xfs_mount *mp, xfs_rtblock_t start, + TP_PROTO(struct xfs_mount *mp, xfs_rtxnum_t start, xfs_rtbxlen_t len, unsigned int log, loff_t pos, xfs_suminfo_t v), TP_ARGS(mp, start, len, log, pos, v), TP_STRUCT__entry( __field(dev_t, dev) __field(dev_t, rtdev) - __field(xfs_rtblock_t, start) + __field(xfs_rtxnum_t, start) __field(unsigned long long, len) __field(unsigned int, log) __field(loff_t, pos) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 20c1b4f55788..018c3bcc225e 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -76,7 +76,7 @@ xfs_bmap_rtalloc( { struct xfs_mount *mp = ap->ip->i_mount; xfs_fileoff_t orig_offset = ap->offset; - xfs_rtblock_t rtb; + xfs_rtxnum_t rtx; xfs_rtxlen_t prod = 0; /* product factor for allocators */ xfs_extlen_t mod = 0; /* product factor for allocators */ xfs_rtxlen_t ralen = 0; /* realtime allocation length */ @@ -145,8 +145,6 @@ xfs_bmap_rtalloc( * pick an extent that will space things out in the rt area. 
*/ if (ap->eof && ap->offset == 0) { - xfs_rtblock_t rtx; /* realtime extent no */ - error = xfs_rtpick_extent(mp, ap->tp, ralen, &rtx); if (error) return error; @@ -164,16 +162,16 @@ xfs_bmap_rtalloc( ap->blkno = 0; else do_div(ap->blkno, mp->m_sb.sb_rextsize); - rtb = ap->blkno; + rtx = ap->blkno; ap->length = ralen; raminlen = max_t(xfs_extlen_t, 1, minlen / mp->m_sb.sb_rextsize); error = xfs_rtallocate_extent(ap->tp, ap->blkno, raminlen, ap->length, - &ralen, ap->wasdel, prod, &rtb); + &ralen, ap->wasdel, prod, &rtx); if (error) return error; - if (rtb != NULLRTBLOCK) { - ap->blkno = rtb * mp->m_sb.sb_rextsize; + if (rtx != NULLRTEXTNO) { + ap->blkno = rtx * mp->m_sb.sb_rextsize; ap->length = ralen * mp->m_sb.sb_rextsize; ap->ip->i_nblocks += ap->length; xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE); diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 12d1fe425d22..40b6df0ad633 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -148,15 +148,15 @@ STATIC int /* error */ xfs_rtallocate_range( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ - xfs_rtblock_t start, /* start block to allocate */ + xfs_rtxnum_t start, /* start rtext to allocate */ xfs_rtxlen_t len, /* length to allocate */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ xfs_fileoff_t *rsb) /* in/out: summary block number */ { - xfs_rtblock_t end; /* end of the allocated extent */ + xfs_rtxnum_t end; /* end of the allocated rtext */ int error; /* error value */ - xfs_rtblock_t postblock = 0; /* first block allocated > end */ - xfs_rtblock_t preblock = 0; /* first block allocated < start */ + xfs_rtxnum_t postblock = 0; /* first rtext allocated > end */ + xfs_rtxnum_t preblock = 0; /* first rtext allocated < start */ end = start + len - 1; /* @@ -220,7 +220,7 @@ xfs_rtallocate_range( /* * Attempt to allocate an extent minlen<=len<=maxlen starting from * bitmap block bbno. 
If we don't get maxlen then use prod to trim - * the length, if given. Returns error; returns starting block in *rtblock. + * the length, if given. Returns error; returns starting block in *rtx. * The lengths are all in rtextents. */ STATIC int /* error */ @@ -231,18 +231,18 @@ xfs_rtallocate_extent_block( xfs_rtxlen_t minlen, /* minimum length to allocate */ xfs_rtxlen_t maxlen, /* maximum length to allocate */ xfs_rtxlen_t *len, /* out: actual length allocated */ - xfs_rtblock_t *nextp, /* out: next block to try */ + xfs_rtxnum_t *nextp, /* out: next rtext to try */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ xfs_fileoff_t *rsb, /* in/out: summary block number */ xfs_rtxlen_t prod, /* extent product factor */ - xfs_rtblock_t *rtblock) /* out: start block allocated */ + xfs_rtxnum_t *rtx) /* out: start rtext allocated */ { - xfs_rtblock_t besti; /* best rtblock found so far */ - xfs_rtblock_t bestlen; /* best length found so far */ - xfs_rtblock_t end; /* last rtblock in chunk */ + xfs_rtxnum_t besti; /* best rtext found so far */ + xfs_rtxnum_t bestlen; /* best length found so far */ + xfs_rtxnum_t end; /* last rtext in chunk */ int error; /* error value */ - xfs_rtblock_t i; /* current rtblock trying */ - xfs_rtblock_t next; /* next rtblock to try */ + xfs_rtxnum_t i; /* current rtext trying */ + xfs_rtxnum_t next; /* next rtext to try */ int stat; /* status from internal calls */ /* @@ -279,7 +279,7 @@ xfs_rtallocate_extent_block( return error; } *len = maxlen; - *rtblock = i; + *rtx = i; return 0; } /* @@ -289,7 +289,7 @@ xfs_rtallocate_extent_block( * so far, remember it. */ if (minlen < maxlen) { - xfs_rtblock_t thislen; /* this extent size */ + xfs_rtxnum_t thislen; /* this extent size */ thislen = next - i; if (thislen >= minlen && thislen > bestlen) { @@ -331,47 +331,47 @@ xfs_rtallocate_extent_block( return error; } *len = bestlen; - *rtblock = besti; + *rtx = besti; return 0; } /* * Allocation failed. Set *nextp to the next block to try. 
*/ *nextp = next; - *rtblock = NULLRTBLOCK; + *rtx = NULLRTEXTNO; return 0; } /* * Allocate an extent of length minlen<=len<=maxlen, starting at block * bno. If we don't get maxlen then use prod to trim the length, if given. - * Returns error; returns starting block in *rtblock. + * Returns error; returns starting block in *rtx. * The lengths are all in rtextents. */ STATIC int /* error */ xfs_rtallocate_extent_exact( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ - xfs_rtblock_t bno, /* starting block number to allocate */ + xfs_rtxnum_t start, /* starting rtext number to allocate */ xfs_rtxlen_t minlen, /* minimum length to allocate */ xfs_rtxlen_t maxlen, /* maximum length to allocate */ xfs_rtxlen_t *len, /* out: actual length allocated */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ xfs_fileoff_t *rsb, /* in/out: summary block number */ xfs_rtxlen_t prod, /* extent product factor */ - xfs_rtblock_t *rtblock) /* out: start block allocated */ + xfs_rtxnum_t *rtx) /* out: start rtext allocated */ { int error; /* error value */ xfs_rtxlen_t i; /* extent length trimmed due to prod */ int isfree; /* extent is free */ - xfs_rtblock_t next; /* next block to try (dummy) */ + xfs_rtxnum_t next; /* next rtext to try (dummy) */ ASSERT(minlen % prod == 0); ASSERT(maxlen % prod == 0); /* * Check if the range in question (for maxlen) is free. */ - error = xfs_rtcheck_range(mp, tp, bno, maxlen, 1, &next, &isfree); + error = xfs_rtcheck_range(mp, tp, start, maxlen, 1, &next, &isfree); if (error) { return error; } @@ -379,23 +379,23 @@ xfs_rtallocate_extent_exact( /* * If it is, allocate it and return success. */ - error = xfs_rtallocate_range(mp, tp, bno, maxlen, rbpp, rsb); + error = xfs_rtallocate_range(mp, tp, start, maxlen, rbpp, rsb); if (error) { return error; } *len = maxlen; - *rtblock = bno; + *rtx = start; return 0; } /* * If not, allocate what there is, if it's at least minlen. 
*/ - maxlen = next - bno; + maxlen = next - start; if (maxlen < minlen) { /* * Failed, return failure status. */ - *rtblock = NULLRTBLOCK; + *rtx = NULLRTEXTNO; return 0; } /* @@ -407,39 +407,39 @@ xfs_rtallocate_extent_exact( /* * Now we can't do it, return failure status. */ - *rtblock = NULLRTBLOCK; + *rtx = NULLRTEXTNO; return 0; } } /* * Allocate what we can and return it. */ - error = xfs_rtallocate_range(mp, tp, bno, maxlen, rbpp, rsb); + error = xfs_rtallocate_range(mp, tp, start, maxlen, rbpp, rsb); if (error) { return error; } *len = maxlen; - *rtblock = bno; + *rtx = start; return 0; } /* * Allocate an extent of length minlen<=len<=maxlen, starting as near - * to bno as possible. If we don't get maxlen then use prod to trim + * to start as possible. If we don't get maxlen then use prod to trim * the length, if given. The lengths are all in rtextents. */ STATIC int /* error */ xfs_rtallocate_extent_near( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ - xfs_rtblock_t bno, /* starting block number to allocate */ + xfs_rtxnum_t start, /* starting rtext number to allocate */ xfs_rtxlen_t minlen, /* minimum length to allocate */ xfs_rtxlen_t maxlen, /* maximum length to allocate */ xfs_rtxlen_t *len, /* out: actual length allocated */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ xfs_fileoff_t *rsb, /* in/out: summary block number */ xfs_rtxlen_t prod, /* extent product factor */ - xfs_rtblock_t *rtblock) /* out: start block allocated */ + xfs_rtxnum_t *rtx) /* out: start rtext allocated */ { int any; /* any useful extents from summary */ xfs_fileoff_t bbno; /* bitmap block number */ @@ -447,8 +447,8 @@ xfs_rtallocate_extent_near( int i; /* bitmap block offset (loop control) */ int j; /* secondary loop control */ int log2len; /* log2 of minlen */ - xfs_rtblock_t n; /* next block to try */ - xfs_rtblock_t r; /* result block */ + xfs_rtxnum_t n; /* next rtext to try */ + xfs_rtxnum_t r; /* result rtext */ 
ASSERT(minlen % prod == 0); ASSERT(maxlen % prod == 0); @@ -457,25 +457,25 @@ xfs_rtallocate_extent_near( * If the block number given is off the end, silently set it to * the last block. */ - if (bno >= mp->m_sb.sb_rextents) - bno = mp->m_sb.sb_rextents - 1; + if (start >= mp->m_sb.sb_rextents) + start = mp->m_sb.sb_rextents - 1; /* * Make sure we don't run off the end of the rt volume. Be careful * that adjusting maxlen downwards doesn't cause us to fail the * alignment checks. */ - maxlen = min(mp->m_sb.sb_rextents, bno + maxlen) - bno; + maxlen = min(mp->m_sb.sb_rextents, start + maxlen) - start; maxlen -= maxlen % prod; if (maxlen < minlen) { - *rtblock = NULLRTBLOCK; + *rtx = NULLRTEXTNO; return 0; } /* * Try the exact allocation first. */ - error = xfs_rtallocate_extent_exact(mp, tp, bno, minlen, maxlen, len, + error = xfs_rtallocate_extent_exact(mp, tp, start, minlen, maxlen, len, rbpp, rsb, prod, &r); if (error) { return error; @@ -483,11 +483,11 @@ xfs_rtallocate_extent_near( /* * If the exact allocation worked, return that. */ - if (r != NULLRTBLOCK) { - *rtblock = r; + if (r != NULLRTEXTNO) { + *rtx = r; return 0; } - bbno = XFS_BITTOBLOCK(mp, bno); + bbno = XFS_BITTOBLOCK(mp, start); i = 0; ASSERT(minlen != 0); log2len = xfs_highbit32(minlen); @@ -526,8 +526,8 @@ xfs_rtallocate_extent_near( /* * If it worked, return it. */ - if (r != NULLRTBLOCK) { - *rtblock = r; + if (r != NULLRTEXTNO) { + *rtx = r; return 0; } } @@ -571,8 +571,8 @@ xfs_rtallocate_extent_near( /* * If it works, return the extent. */ - if (r != NULLRTBLOCK) { - *rtblock = r; + if (r != NULLRTEXTNO) { + *rtx = r; return 0; } } @@ -593,8 +593,8 @@ xfs_rtallocate_extent_near( /* * If it works, return the extent. 
*/ - if (r != NULLRTBLOCK) { - *rtblock = r; + if (r != NULLRTEXTNO) { + *rtx = r; return 0; } } @@ -629,7 +629,7 @@ xfs_rtallocate_extent_near( else break; } - *rtblock = NULLRTBLOCK; + *rtx = NULLRTEXTNO; return 0; } @@ -648,13 +648,13 @@ xfs_rtallocate_extent_size( struct xfs_buf **rbpp, /* in/out: summary block buffer */ xfs_fileoff_t *rsb, /* in/out: summary block number */ xfs_rtxlen_t prod, /* extent product factor */ - xfs_rtblock_t *rtblock) /* out: start block allocated */ + xfs_rtxnum_t *rtx) /* out: start rtext allocated */ { int error; /* error value */ xfs_fileoff_t i; /* bitmap block number */ int l; /* level number (loop control) */ - xfs_rtblock_t n; /* next block to be tried */ - xfs_rtblock_t r; /* result block number */ + xfs_rtxnum_t n; /* next rtext to be tried */ + xfs_rtxnum_t r; /* result rtext number */ xfs_suminfo_t sum; /* summary information for extents */ ASSERT(minlen % prod == 0); @@ -697,8 +697,8 @@ xfs_rtallocate_extent_size( /* * If it worked, return that. */ - if (r != NULLRTBLOCK) { - *rtblock = r; + if (r != NULLRTEXTNO) { + *rtx = r; return 0; } /* @@ -715,7 +715,7 @@ xfs_rtallocate_extent_size( * we're asking for a fixed size extent. */ if (minlen > --maxlen) { - *rtblock = NULLRTBLOCK; + *rtx = NULLRTEXTNO; return 0; } ASSERT(minlen != 0); @@ -760,8 +760,8 @@ xfs_rtallocate_extent_size( /* * If it worked, return that extent. */ - if (r != NULLRTBLOCK) { - *rtblock = r; + if (r != NULLRTEXTNO) { + *rtx = r; return 0; } /* @@ -776,7 +776,7 @@ xfs_rtallocate_extent_size( /* * Got nothing, return failure. 
*/ - *rtblock = NULLRTBLOCK; + *rtx = NULLRTEXTNO; return 0; } @@ -933,7 +933,7 @@ xfs_growfs_rt( xfs_mount_t *nmp; /* new (fake) mount structure */ xfs_rfsblock_t nrblocks; /* new number of realtime blocks */ xfs_extlen_t nrbmblocks; /* new number of rt bitmap blocks */ - xfs_rtblock_t nrextents; /* new number of realtime extents */ + xfs_rtxnum_t nrextents; /* new number of realtime extents */ uint8_t nrextslog; /* new log2 of sb_rextents */ xfs_extlen_t nrsumblocks; /* new number of summary blocks */ uint nrsumlevels; /* new rt summary levels */ @@ -1194,17 +1194,17 @@ xfs_growfs_rt( int /* error */ xfs_rtallocate_extent( xfs_trans_t *tp, /* transaction pointer */ - xfs_rtblock_t bno, /* starting block number to allocate */ + xfs_rtxnum_t start, /* starting rtext number to allocate */ xfs_rtxlen_t minlen, /* minimum length to allocate */ xfs_rtxlen_t maxlen, /* maximum length to allocate */ xfs_rtxlen_t *len, /* out: actual length allocated */ int wasdel, /* was a delayed allocation extent */ xfs_rtxlen_t prod, /* extent product factor */ - xfs_rtblock_t *rtblock) /* out: start block allocated */ + xfs_rtxnum_t *rtblock) /* out: start rtext allocated */ { xfs_mount_t *mp = tp->t_mountp; int error; /* error value */ - xfs_rtblock_t r; /* result allocated block */ + xfs_rtxnum_t r; /* result allocated rtext */ xfs_fileoff_t sb; /* summary file block number */ struct xfs_buf *sumbp; /* summary file block buffer */ @@ -1222,18 +1222,18 @@ xfs_rtallocate_extent( if ((i = minlen % prod)) minlen += prod - i; if (maxlen < minlen) { - *rtblock = NULLRTBLOCK; + *rtblock = NULLRTEXTNO; return 0; } } retry: sumbp = NULL; - if (bno == 0) { + if (start == 0) { error = xfs_rtallocate_extent_size(mp, tp, minlen, maxlen, len, &sumbp, &sb, prod, &r); } else { - error = xfs_rtallocate_extent_near(mp, tp, bno, minlen, maxlen, + error = xfs_rtallocate_extent_near(mp, tp, start, minlen, maxlen, len, &sumbp, &sb, prod, &r); } @@ -1243,7 +1243,7 @@ xfs_rtallocate_extent( /* * If it 
worked, update the superblock. */ - if (r != NULLRTBLOCK) { + if (r != NULLRTEXTNO) { long slen = (long)*len; ASSERT(*len >= minlen && *len <= maxlen); @@ -1449,9 +1449,9 @@ xfs_rtpick_extent( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ xfs_rtxlen_t len, /* allocation length (rtextents) */ - xfs_rtblock_t *pick) /* result rt extent */ + xfs_rtxnum_t *pick) /* result rt extent */ { - xfs_rtblock_t b; /* result block */ + xfs_rtxnum_t b; /* result rtext */ int log2; /* log of sequence number */ uint64_t resid; /* residual after log removed */ uint64_t seq; /* sequence number of file creation */ diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h index ec03cc566bec..5ac9c15948c8 100644 --- a/fs/xfs/xfs_rtalloc.h +++ b/fs/xfs/xfs_rtalloc.h @@ -24,13 +24,13 @@ struct xfs_trans; int /* error */ xfs_rtallocate_extent( struct xfs_trans *tp, /* transaction pointer */ - xfs_rtblock_t bno, /* starting block number to allocate */ + xfs_rtxnum_t start, /* starting rtext number to allocate */ xfs_rtxlen_t minlen, /* minimum length to allocate */ xfs_rtxlen_t maxlen, /* maximum length to allocate */ xfs_rtxlen_t *len, /* out: actual length allocated */ int wasdel, /* was a delayed allocation extent */ xfs_rtxlen_t prod, /* extent product factor */ - xfs_rtblock_t *rtblock); /* out: start block allocated */ + xfs_rtxnum_t *rtblock); /* out: start rtext allocated */ /* @@ -63,7 +63,7 @@ xfs_rtpick_extent( struct xfs_mount *mp, /* file system mount point */ struct xfs_trans *tp, /* transaction pointer */ xfs_rtxlen_t len, /* allocation length (rtextents) */ - xfs_rtblock_t *pick); /* result rt extent */ + xfs_rtxnum_t *pick); /* result rt extent */ /* * Grow the realtime area of the filesystem. ^ permalink raw reply related [flat|nested] 565+ messages in thread
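For reviewers following the unit change in the patch above: positions typed xfs_rtxnum_t count rtextents, while xfs_rtblock_t counts rt blocks, and the two differ by a factor of sb_rextsize (see the `rtx * mp->m_sb.sb_rextsize` multiplication in xfs_bmap_rtalloc and the div_u64_rem() call in xfs_rtfree_blocks). What follows is a minimal userspace sketch of that relationship, not code from the series. The helper names rtb_to_rtx() and rtx_to_rtb() and the bare `rextsize` parameter are assumptions made for illustration; the typedefs are stand-ins for the ones in xfs_types.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel typedefs in xfs_types.h. */
typedef uint64_t xfs_rtblock_t;	/* position in units of rt blocks */
typedef uint64_t xfs_rtxnum_t;	/* position in units of rtextents */

#define NULLRTEXTNO	((xfs_rtxnum_t)-1)

/*
 * Hypothetical helper: convert an rt block number to an rtextent number.
 * Like xfs_rtfree_blocks in the patch, refuse a block number that is not
 * aligned to an rtextent boundary and return -EIO.
 */
static int
rtb_to_rtx(
	xfs_rtblock_t	rtbno,		/* rt block number */
	uint32_t	rextsize,	/* blocks per rtextent (sb_rextsize) */
	xfs_rtxnum_t	*rtx)		/* out: rtextent number */
{
	assert(rextsize != 0);

	if (rtbno % rextsize) {
		*rtx = NULLRTEXTNO;
		return -EIO;
	}
	*rtx = rtbno / rextsize;
	return 0;
}

/*
 * Hypothetical helper: convert an rtextent number back to the rt block
 * number of its first block, as xfs_bmap_rtalloc does for ap->blkno.
 */
static xfs_rtblock_t
rtx_to_rtb(
	xfs_rtxnum_t	rtx,
	uint32_t	rextsize)
{
	return rtx * rextsize;
}
```

With an rtextent size of 16 blocks, rt block 64 maps to rtextent 4 and back again, while rt block 65 is rejected because it falls inside an rtextent. Keeping the two units in separate types is what lets a reader (or a static checker) spot a missing multiplication or division by sb_rextsize.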
* [PATCH 06/11] xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h 2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 11/11] xfs: convert rt extent numbers to xfs_rtxnum_t Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 09/11] xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t Darrick J. Wong ` (2 subsequent siblings) 10 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Move all the declarations for functionality in xfs_rtbitmap.c into a separate xfs_rtbitmap.h header file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap.c | 2 + fs/xfs/libxfs/xfs_rtbitmap.c | 1 + fs/xfs/libxfs/xfs_rtbitmap.h | 82 ++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/fscounters.c | 2 + fs/xfs/scrub/rtbitmap.c | 2 + fs/xfs/scrub/rtsummary.c | 2 + fs/xfs/xfs_fsmap.c | 2 + fs/xfs/xfs_rtalloc.c | 1 + fs/xfs/xfs_rtalloc.h | 73 ------------------------------------- 9 files changed, 89 insertions(+), 78 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_rtbitmap.h diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index a372618ce393..6adc7e90e59d 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -21,7 +21,7 @@ #include "xfs_bmap.h" #include "xfs_bmap_util.h" #include "xfs_bmap_btree.h" -#include "xfs_rtalloc.h" +#include "xfs_rtbitmap.h" #include "xfs_errortag.h" #include "xfs_error.h" #include "xfs_quota.h" diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index c0bd7c44a6b8..196cad3ef85c 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -17,6 +17,7 @@ #include "xfs_rtalloc.h" #include "xfs_error.h" #include "xfs_health.h" +#include "xfs_rtbitmap.h" /* * Realtime allocator bitmap functions shared with 
userspace. diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h new file mode 100644 index 000000000000..546dea34bb37 --- /dev/null +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -0,0 +1,82 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc. + * All Rights Reserved. + */ +#ifndef __XFS_RTBITMAP_H__ +#define __XFS_RTBITMAP_H__ + +/* + * XXX: Most of the realtime allocation functions deal in units of realtime + * extents, not realtime blocks. This looks funny when paired with the type + * name and screams for a larger cleanup. + */ +struct xfs_rtalloc_rec { + xfs_rtblock_t ar_startext; + xfs_rtblock_t ar_extcount; +}; + +typedef int (*xfs_rtalloc_query_range_fn)( + struct xfs_mount *mp, + struct xfs_trans *tp, + const struct xfs_rtalloc_rec *rec, + void *priv); + +#ifdef CONFIG_XFS_RT +int xfs_rtbuf_get(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t block, int issum, struct xfs_buf **bpp); +int xfs_rtcheck_range(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t start, xfs_extlen_t len, int val, + xfs_rtblock_t *new, int *stat); +int xfs_rtfind_back(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t start, xfs_rtblock_t limit, + xfs_rtblock_t *rtblock); +int xfs_rtfind_forw(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t start, xfs_rtblock_t limit, + xfs_rtblock_t *rtblock); +int xfs_rtmodify_range(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t start, xfs_extlen_t len, int val); +int xfs_rtmodify_summary_int(struct xfs_mount *mp, struct xfs_trans *tp, + int log, xfs_rtblock_t bbno, int delta, + struct xfs_buf **rbpp, xfs_fsblock_t *rsb, + xfs_suminfo_t *sum); +int xfs_rtmodify_summary(struct xfs_mount *mp, struct xfs_trans *tp, int log, + xfs_rtblock_t bbno, int delta, struct xfs_buf **rbpp, + xfs_fsblock_t *rsb); +int xfs_rtfree_range(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t start, xfs_extlen_t len, + struct xfs_buf **rbpp, 
xfs_fsblock_t *rsb); +int xfs_rtalloc_query_range(struct xfs_mount *mp, struct xfs_trans *tp, + const struct xfs_rtalloc_rec *low_rec, + const struct xfs_rtalloc_rec *high_rec, + xfs_rtalloc_query_range_fn fn, void *priv); +int xfs_rtalloc_query_all(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtalloc_query_range_fn fn, + void *priv); +bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno); +int xfs_rtalloc_extent_is_free(struct xfs_mount *mp, struct xfs_trans *tp, + xfs_rtblock_t start, xfs_extlen_t len, + bool *is_free); +/* + * Free an extent in the realtime subvolume. Length is expressed in + * realtime extents, as is the block number. + */ +int /* error */ +xfs_rtfree_extent( + struct xfs_trans *tp, /* transaction pointer */ + xfs_rtblock_t bno, /* starting block number to free */ + xfs_extlen_t len); /* length of extent freed */ + +/* Same as above, but in units of rt blocks. */ +int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno, + xfs_filblks_t rtlen); +#else /* CONFIG_XFS_RT */ +# define xfs_rtfree_extent(t,b,l) (-ENOSYS) +# define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) +# define xfs_rtalloc_query_range(m,t,l,h,f,p) (-ENOSYS) +# define xfs_rtalloc_query_all(m,t,f,p) (-ENOSYS) +# define xfs_rtbuf_get(m,t,b,i,p) (-ENOSYS) +# define xfs_rtalloc_extent_is_free(m,t,s,l,i) (-ENOSYS) +#endif /* CONFIG_XFS_RT */ + +#endif /* __XFS_RTBITMAP_H__ */ diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c index ae12da1be95c..680b2e1d2940 100644 --- a/fs/xfs/scrub/fscounters.c +++ b/fs/xfs/scrub/fscounters.c @@ -14,7 +14,7 @@ #include "xfs_health.h" #include "xfs_btree.h" #include "xfs_ag.h" -#include "xfs_rtalloc.h" +#include "xfs_rtbitmap.h" #include "xfs_inode.h" #include "xfs_icache.h" #include "scrub/scrub.h" diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index e524055ba709..4165ed739136 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -11,7 +11,7 @@ #include "xfs_mount.h" #include 
"xfs_log_format.h" #include "xfs_trans.h" -#include "xfs_rtalloc.h" +#include "xfs_rtbitmap.h" #include "xfs_inode.h" #include "xfs_bmap.h" #include "scrub/scrub.h" diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c index c0bf65273f1a..f4a2456e01d0 100644 --- a/fs/xfs/scrub/rtsummary.c +++ b/fs/xfs/scrub/rtsummary.c @@ -13,7 +13,7 @@ #include "xfs_inode.h" #include "xfs_log_format.h" #include "xfs_trans.h" -#include "xfs_rtalloc.h" +#include "xfs_rtbitmap.h" #include "xfs_bit.h" #include "xfs_bmap.h" #include "scrub/scrub.h" diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 88a88506ffff..0027e186bd52 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -23,7 +23,7 @@ #include "xfs_refcount.h" #include "xfs_refcount_btree.h" #include "xfs_alloc_btree.h" -#include "xfs_rtalloc.h" +#include "xfs_rtbitmap.h" #include "xfs_ag.h" /* Convert an xfs_fsmap to an fsmap. */ diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 88faf7fb912d..b732bac11b01 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -24,6 +24,7 @@ #include "xfs_trace.h" #include "xfs_da_format.h" #include "xfs_imeta.h" +#include "xfs_rtbitmap.h" /* * Read and return the summary information for a given extent size, diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h index e53bc52d81fd..f14da84206d9 100644 --- a/fs/xfs/xfs_rtalloc.h +++ b/fs/xfs/xfs_rtalloc.h @@ -11,22 +11,6 @@ struct xfs_mount; struct xfs_trans; -/* - * XXX: Most of the realtime allocation functions deal in units of realtime - * extents, not realtime blocks. This looks funny when paired with the type - * name and screams for a larger cleanup. - */ -struct xfs_rtalloc_rec { - xfs_rtblock_t ar_startext; - xfs_rtblock_t ar_extcount; -}; - -typedef int (*xfs_rtalloc_query_range_fn)( - struct xfs_mount *mp, - struct xfs_trans *tp, - const struct xfs_rtalloc_rec *rec, - void *priv); - #ifdef CONFIG_XFS_RT /* * Function prototypes for exported functions. 
@@ -48,19 +32,6 @@ xfs_rtallocate_extent( xfs_extlen_t prod, /* extent product factor */ xfs_rtblock_t *rtblock); /* out: start block allocated */ -/* - * Free an extent in the realtime subvolume. Length is expressed in - * realtime extents, as is the block number. - */ -int /* error */ -xfs_rtfree_extent( - struct xfs_trans *tp, /* transaction pointer */ - xfs_rtblock_t bno, /* starting block number to free */ - xfs_extlen_t len); /* length of extent freed */ - -/* Same as above, but in units of rt blocks. */ -int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno, - xfs_filblks_t rtlen); /* * Initialize realtime fields in the mount structure. @@ -102,57 +73,13 @@ xfs_growfs_rt( struct xfs_mount *mp, /* file system mount structure */ xfs_growfs_rt_t *in); /* user supplied growfs struct */ -/* - * From xfs_rtbitmap.c - */ -int xfs_rtbuf_get(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t block, int issum, struct xfs_buf **bpp); -int xfs_rtcheck_range(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_extlen_t len, int val, - xfs_rtblock_t *new, int *stat); -int xfs_rtfind_back(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_rtblock_t limit, - xfs_rtblock_t *rtblock); -int xfs_rtfind_forw(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_rtblock_t limit, - xfs_rtblock_t *rtblock); -int xfs_rtmodify_range(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_extlen_t len, int val); -int xfs_rtmodify_summary_int(struct xfs_mount *mp, struct xfs_trans *tp, - int log, xfs_rtblock_t bbno, int delta, - struct xfs_buf **rbpp, xfs_fsblock_t *rsb, - xfs_suminfo_t *sum); -int xfs_rtmodify_summary(struct xfs_mount *mp, struct xfs_trans *tp, int log, - xfs_rtblock_t bbno, int delta, struct xfs_buf **rbpp, - xfs_fsblock_t *rsb); -int xfs_rtfree_range(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_extlen_t len, - struct xfs_buf **rbpp, xfs_fsblock_t *rsb); 
-int xfs_rtalloc_query_range(struct xfs_mount *mp, struct xfs_trans *tp, - const struct xfs_rtalloc_rec *low_rec, - const struct xfs_rtalloc_rec *high_rec, - xfs_rtalloc_query_range_fn fn, void *priv); -int xfs_rtalloc_query_all(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtalloc_query_range_fn fn, - void *priv); -bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno); -int xfs_rtalloc_extent_is_free(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_extlen_t len, - bool *is_free); int xfs_rtalloc_reinit_frextents(struct xfs_mount *mp); int xfs_rtfile_convert_unwritten(struct xfs_inode *ip, loff_t pos, uint64_t len); #else # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb) (-ENOSYS) -# define xfs_rtfree_extent(t,b,l) (-ENOSYS) -# define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) # define xfs_rtpick_extent(m,t,l,rb) (-ENOSYS) # define xfs_growfs_rt(mp,in) (-ENOSYS) -# define xfs_rtalloc_query_range(m,t,l,h,f,p) (-ENOSYS) -# define xfs_rtalloc_query_all(m,t,f,p) (-ENOSYS) -# define xfs_rtbuf_get(m,t,b,i,p) (-ENOSYS) -# define xfs_verify_rtbno(m, r) (false) -# define xfs_rtalloc_extent_is_free(m,t,s,l,i) (-ENOSYS) # define xfs_rtalloc_reinit_frextents(m) (0) static inline int /* error */ xfs_rtmount_init( ^ permalink raw reply related [flat|nested] 565+ messages in thread
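The #else branch that moves into xfs_rtbitmap.h relies on a common pattern: when the feature is compiled out, each entry point becomes a macro that evaluates to a constant error, so call sites need no #ifdef of their own. Below is a minimal userspace sketch of that pattern. DEMO_CONFIG_RT and demo_rtfree_extent are hypothetical names chosen for illustration, not identifiers from the patch.

```c
#include <errno.h>

/* Hypothetical feature switch standing in for CONFIG_XFS_RT; left undefined. */
/* #define DEMO_CONFIG_RT 1 */

#ifdef DEMO_CONFIG_RT
/* Feature built in: a real function would be declared here. */
int demo_rtfree_extent(unsigned long long start, unsigned int len);
#else
/* Feature compiled out: every call collapses to a constant error code. */
# define demo_rtfree_extent(start, len)	(-ENOSYS)
#endif

/* A caller that compiles unchanged under either configuration. */
static inline int
demo_free_space(unsigned long long start, unsigned int len)
{
	return demo_rtfree_extent(start, len);
}
```

Because the macro ignores its arguments, they are never evaluated when the feature is disabled, which is worth keeping in mind if an argument has side effects.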
* [PATCH 09/11] xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t 2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 06/11] xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 10/11] xfs: rename xfs_verify_rtext to xfs_verify_rtbext Darrick J. Wong 2022-12-30 22:17 ` [PATCH 07/11] xfs: convert xfs_extlen_t to xfs_rtxlen_t in the rt allocator Darrick J. Wong 10 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> XFS uses xfs_rtblock_t for many different purposes, which makes it much more difficult to perform a unit analysis on the codebase. One of these (ab)uses is storing the length of a free space extent as recorded in the realtime bitmap. Because there can be up to 2^64 realtime extents in a filesystem, we need a new type that is larger than xfs_rtxlen_t for callers that are querying the bitmap directly. This means scrub and growfs. Create this type as "xfs_rtbxlen_t" and use it to store 64-bit rtx lengths. 'b' stands for 'bitmap' or 'big'; reader's choice. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 2 +- fs/xfs/libxfs/xfs_rtbitmap.h | 2 +- fs/xfs/libxfs/xfs_types.h | 1 + fs/xfs/scrub/trace.h | 3 ++- 4 files changed, 5 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 33b047f9cf03..d93cc0ea20e3 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -97,7 +97,7 @@ typedef struct xfs_sb { uint32_t sb_blocksize; /* logical block size, bytes */ xfs_rfsblock_t sb_dblocks; /* number of data blocks */ xfs_rfsblock_t sb_rblocks; /* number of realtime blocks */ - xfs_rtblock_t sb_rextents; /* number of realtime extents */ + xfs_rtbxlen_t sb_rextents; /* number of realtime extents */ uuid_t sb_uuid; /* user-visible file system unique id */ xfs_fsblock_t sb_logstart; /* starting block of log if internal */ xfs_ino_t sb_rootino; /* root inode number */ diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index e2ea6d31c38b..b0a81fb8dbda 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -13,7 +13,7 @@ */ struct xfs_rtalloc_rec { xfs_rtblock_t ar_startext; - xfs_rtblock_t ar_extcount; + xfs_rtbxlen_t ar_extcount; }; typedef int (*xfs_rtalloc_query_range_fn)( diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h index 0856997f84d6..a2fb880433b5 100644 --- a/fs/xfs/libxfs/xfs_types.h +++ b/fs/xfs/libxfs/xfs_types.h @@ -32,6 +32,7 @@ typedef uint64_t xfs_rfsblock_t; /* blockno in filesystem (raw) */ typedef uint64_t xfs_rtblock_t; /* extent (block) in realtime area */ typedef uint64_t xfs_fileoff_t; /* block number in a file */ typedef uint64_t xfs_filblks_t; /* number of blocks in a file */ +typedef uint64_t xfs_rtbxlen_t; /* rtbitmap extent length in rtextents */ typedef int64_t xfs_srtblock_t; /* signed version of xfs_rtblock_t */ diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 3652ac4a3eff..13fea23a9ab2 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ 
-1082,7 +1082,8 @@ TRACE_EVENT(xfarray_sort_stats, #ifdef CONFIG_XFS_RT TRACE_EVENT(xchk_rtsum_record_free, TP_PROTO(struct xfs_mount *mp, xfs_rtblock_t start, - uint64_t len, unsigned int log, loff_t pos, xfs_suminfo_t v), + xfs_rtbxlen_t len, unsigned int log, loff_t pos, + xfs_suminfo_t v), TP_ARGS(mp, start, len, log, pos, v), TP_STRUCT__entry( __field(dev_t, dev) ^ permalink raw reply related [flat|nested] 565+ messages in thread
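To make the width argument concrete, here is a small userspace sketch. The two typedefs are copied from the xfs_types.h hunks in this series; demo_fits_in_rtxlen is a hypothetical helper that does not exist in the kernel. It checks whether a bitmap-wide rtextent count survives being squeezed into the 32-bit file-mapping length type.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xfs_rtxlen_t;	/* file extent length in rtextents */
typedef uint64_t xfs_rtbxlen_t;	/* rtbitmap extent length in rtextents */

/*
 * Hypothetical helper: true if a length read from the rt bitmap can be
 * stored in the narrower per-file type without losing high bits.
 */
static inline bool
demo_fits_in_rtxlen(xfs_rtbxlen_t len)
{
	return (xfs_rtbxlen_t)(xfs_rtxlen_t)len == len;
}
```

A free extent covering 2^32 rtextents or more fails this check, which is why callers that walk the bitmap directly (scrub and growfs, per the commit message) need the wider type.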
* [PATCH 10/11] xfs: rename xfs_verify_rtext to xfs_verify_rtbext 2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 09/11] xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 07/11] xfs: convert xfs_extlen_t to xfs_rtxlen_t in the rt allocator Darrick J. Wong 10 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> This helper function validates that a range of *blocks* in the realtime section is completely contained within the realtime section. It does /not/ validate ranges of *rtextents*. Rename the function to avoid suggesting that it does, and change the type of the @len parameter since xfs_rtblock_t is a position unit, not a length unit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap.c | 2 +- fs/xfs/libxfs/xfs_types.c | 4 ++-- fs/xfs/libxfs/xfs_types.h | 4 ++-- fs/xfs/scrub/bmap.c | 2 +- fs/xfs/scrub/rtbitmap.c | 4 ++-- fs/xfs/scrub/rtsummary.c | 2 +- fs/xfs/xfs_bmap_item.c | 2 +- 7 files changed, 10 insertions(+), 10 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 6adc7e90e59d..7f7f0d435b33 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -6292,7 +6292,7 @@ xfs_bmap_validate_extent_raw( return __this_address; if (rtfile && whichfork == XFS_DATA_FORK) { - if (!xfs_verify_rtext(mp, irec->br_startblock, + if (!xfs_verify_rtbext(mp, irec->br_startblock, irec->br_blockcount)) return __this_address; } else { diff --git a/fs/xfs/libxfs/xfs_types.c b/fs/xfs/libxfs/xfs_types.c index dfcc1889c203..b1fa715e5f39 100644 --- a/fs/xfs/libxfs/xfs_types.c +++ b/fs/xfs/libxfs/xfs_types.c @@ -147,10 +147,10 @@ xfs_verify_rtbno( /* Verify that a realtime device extent is fully contained inside 
the volume. */ bool -xfs_verify_rtext( +xfs_verify_rtbext( struct xfs_mount *mp, xfs_rtblock_t rtbno, - xfs_rtblock_t len) + xfs_filblks_t len) { if (rtbno + len <= rtbno) return false; diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h index a2fb880433b5..532447a35732 100644 --- a/fs/xfs/libxfs/xfs_types.h +++ b/fs/xfs/libxfs/xfs_types.h @@ -240,8 +240,8 @@ bool xfs_verify_ino(struct xfs_mount *mp, xfs_ino_t ino); bool xfs_internal_inum(struct xfs_mount *mp, xfs_ino_t ino); bool xfs_verify_dir_ino(struct xfs_mount *mp, xfs_ino_t ino); bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno); -bool xfs_verify_rtext(struct xfs_mount *mp, xfs_rtblock_t rtbno, - xfs_rtblock_t len); +bool xfs_verify_rtbext(struct xfs_mount *mp, xfs_rtblock_t rtbno, + xfs_filblks_t len); bool xfs_verify_icount(struct xfs_mount *mp, unsigned long long icount); bool xfs_verify_dablk(struct xfs_mount *mp, xfs_fileoff_t off); void xfs_icount_range(struct xfs_mount *mp, unsigned long long *min, diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index 47d6bae9d6da..b5b081d23ca2 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -469,7 +469,7 @@ xchk_bmap_iextent( xchk_fblock_set_corrupt(info->sc, info->whichfork, irec->br_startoff); if (info->is_rt && - !xfs_verify_rtext(mp, irec->br_startblock, irec->br_blockcount)) + !xfs_verify_rtbext(mp, irec->br_startblock, irec->br_blockcount)) xchk_fblock_set_corrupt(info->sc, info->whichfork, irec->br_startoff); if (!info->is_rt && diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index 6f8becb557bd..051abef66bc6 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -60,12 +60,12 @@ xchk_rtbitmap_rec( { struct xfs_scrub *sc = priv; xfs_rtblock_t startblock; - xfs_rtblock_t blockcount; + xfs_filblks_t blockcount; startblock = rec->ar_startext * mp->m_sb.sb_rextsize; blockcount = rec->ar_extcount * mp->m_sb.sb_rextsize; - if (!xfs_verify_rtext(mp, startblock, blockcount)) + if 
(!xfs_verify_rtbext(mp, startblock, blockcount)) xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0); return 0; } diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c index f4a2456e01d0..c9e5a3bbdfdc 100644 --- a/fs/xfs/scrub/rtsummary.c +++ b/fs/xfs/scrub/rtsummary.c @@ -141,7 +141,7 @@ xchk_rtsum_record_free( rtbno = rec->ar_startext * mp->m_sb.sb_rextsize; rtlen = rec->ar_extcount * mp->m_sb.sb_rextsize; - if (!xfs_verify_rtext(mp, rtbno, rtlen)) { + if (!xfs_verify_rtbext(mp, rtbno, rtlen)) { xchk_ino_xref_set_corrupt(sc, mp->m_rbmip->i_ino); return -EFSCORRUPTED; } diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 5561c0e1136b..bf52d30d7d1c 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -476,7 +476,7 @@ xfs_bui_validate( return false; if (map->me_flags & XFS_BMAP_EXTENT_REALTIME) - return xfs_verify_rtext(mp, map->me_startblock, map->me_len); + return xfs_verify_rtbext(mp, map->me_startblock, map->me_len); return xfs_verify_fsbext(mp, map->me_startblock, map->me_len); } ^ permalink raw reply related [flat|nested] 565+ messages in thread
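The diff only shows the first check inside the renamed function, so the following is a userspace sketch of how a block-range validator of this shape can work, not a copy of the kernel source. struct demo_mount and both demo_ functions are hypothetical stand-ins; the assumption is that a block number is valid when it is below the realtime block count, and that the remaining checks test both ends of the range.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t xfs_rtblock_t;	/* position, in rt blocks */
typedef uint64_t xfs_filblks_t;	/* length, in blocks */

/* Hypothetical stand-in for the fields of struct xfs_mount used here. */
struct demo_mount {
	uint64_t	rblocks;	/* size of the realtime volume, in blocks */
};

/* Assumption: a block is valid if it lies below the volume size. */
static bool
demo_verify_rtbno(const struct demo_mount *mp, xfs_rtblock_t rtbno)
{
	return rtbno < mp->rblocks;
}

static bool
demo_verify_rtbext(const struct demo_mount *mp, xfs_rtblock_t rtbno,
		   xfs_filblks_t len)
{
	/* From the diff: rejects len == 0 and ranges that wrap past 2^64. */
	if (rtbno + len <= rtbno)
		return false;

	/* Assumed: both the first and the last block must be in the volume. */
	if (!demo_verify_rtbno(mp, rtbno))
		return false;
	return demo_verify_rtbno(mp, rtbno + len - 1);
}
```

Note that both arguments are in units of blocks, which is the point of the rename: callers holding rtextent counts have to convert before calling.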
* [PATCH 07/11] xfs: convert xfs_extlen_t to xfs_rtxlen_t in the rt allocator 2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong ` (9 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 10/11] xfs: rename xfs_verify_rtext to xfs_verify_rtbext Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 10 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> In most of the filesystem, we use xfs_extlen_t to store the length of a file (or AG) space mapping in units of fs blocks. Unfortunately, the realtime allocator also uses it to store the length of a rt space mapping in units of rt extents. This is confusing, since one rt extent can consist of many fs blocks. Separate the two by introducing a new type (xfs_rtxlen_t) to store the length of a space mapping (in units of realtime extents) that would be found in a file. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtbitmap.c | 12 +++++----- fs/xfs/libxfs/xfs_rtbitmap.h | 11 ++++----- fs/xfs/libxfs/xfs_types.h | 1 + fs/xfs/scrub/rtbitmap.c | 2 +- fs/xfs/xfs_bmap_util.c | 6 +++-- fs/xfs/xfs_rtalloc.c | 50 +++++++++++++++++++++--------------------- fs/xfs/xfs_rtalloc.h | 10 ++++---- 7 files changed, 46 insertions(+), 46 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 196cad3ef85c..b90d2f2d5bde 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -541,7 +541,7 @@ xfs_rtmodify_range( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ xfs_rtblock_t start, /* starting block to modify */ - xfs_extlen_t len, /* length of extent to modify */ + xfs_rtxlen_t len, /* length of extent to modify */ int val) /* 1 for free, 0 for allocated */ { xfs_rtword_t *b; /* current word in buffer */ @@ -697,7 +697,7 @@ xfs_rtfree_range( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ xfs_rtblock_t start, /* starting block to free */ - xfs_extlen_t len, /* length to free */ + xfs_rtxlen_t len, /* length to free */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ xfs_fsblock_t *rsb) /* in/out: summary block number */ { @@ -773,7 +773,7 @@ xfs_rtcheck_range( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ xfs_rtblock_t start, /* starting block number of extent */ - xfs_extlen_t len, /* length of extent */ + xfs_rtxlen_t len, /* length of extent */ int val, /* 1 for free, 0 for allocated */ xfs_rtblock_t *new, /* out: first block not matching */ int *stat) /* out: 1 for matches, 0 for not */ @@ -949,7 +949,7 @@ xfs_rtcheck_alloc_range( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ xfs_rtblock_t bno, /* starting block number of extent */ - xfs_extlen_t len) /* length of extent */ + xfs_rtxlen_t len) /* length of 
extent */ { xfs_rtblock_t new; /* dummy for xfs_rtcheck_range */ int stat; @@ -972,7 +972,7 @@ int /* error */ xfs_rtfree_extent( xfs_trans_t *tp, /* transaction pointer */ xfs_rtblock_t bno, /* starting block number to free */ - xfs_extlen_t len) /* length of extent freed */ + xfs_rtxlen_t len) /* length of extent freed */ { int error; /* error value */ xfs_mount_t *mp; /* file system mount structure */ @@ -1123,7 +1123,7 @@ xfs_rtalloc_extent_is_free( struct xfs_mount *mp, struct xfs_trans *tp, xfs_rtblock_t start, - xfs_extlen_t len, + xfs_rtxlen_t len, bool *is_free) { xfs_rtblock_t end; diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index 546dea34bb37..d4449610154a 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -26,7 +26,7 @@ typedef int (*xfs_rtalloc_query_range_fn)( int xfs_rtbuf_get(struct xfs_mount *mp, struct xfs_trans *tp, xfs_rtblock_t block, int issum, struct xfs_buf **bpp); int xfs_rtcheck_range(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_extlen_t len, int val, + xfs_rtblock_t start, xfs_rtxlen_t len, int val, xfs_rtblock_t *new, int *stat); int xfs_rtfind_back(struct xfs_mount *mp, struct xfs_trans *tp, xfs_rtblock_t start, xfs_rtblock_t limit, @@ -35,7 +35,7 @@ int xfs_rtfind_forw(struct xfs_mount *mp, struct xfs_trans *tp, xfs_rtblock_t start, xfs_rtblock_t limit, xfs_rtblock_t *rtblock); int xfs_rtmodify_range(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_extlen_t len, int val); + xfs_rtblock_t start, xfs_rtxlen_t len, int val); int xfs_rtmodify_summary_int(struct xfs_mount *mp, struct xfs_trans *tp, int log, xfs_rtblock_t bbno, int delta, struct xfs_buf **rbpp, xfs_fsblock_t *rsb, @@ -44,7 +44,7 @@ int xfs_rtmodify_summary(struct xfs_mount *mp, struct xfs_trans *tp, int log, xfs_rtblock_t bbno, int delta, struct xfs_buf **rbpp, xfs_fsblock_t *rsb); int xfs_rtfree_range(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, 
xfs_extlen_t len, + xfs_rtblock_t start, xfs_rtxlen_t len, struct xfs_buf **rbpp, xfs_fsblock_t *rsb); int xfs_rtalloc_query_range(struct xfs_mount *mp, struct xfs_trans *tp, const struct xfs_rtalloc_rec *low_rec, @@ -53,9 +53,8 @@ int xfs_rtalloc_query_range(struct xfs_mount *mp, struct xfs_trans *tp, int xfs_rtalloc_query_all(struct xfs_mount *mp, struct xfs_trans *tp, xfs_rtalloc_query_range_fn fn, void *priv); -bool xfs_verify_rtbno(struct xfs_mount *mp, xfs_rtblock_t rtbno); int xfs_rtalloc_extent_is_free(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_rtblock_t start, xfs_extlen_t len, + xfs_rtblock_t start, xfs_rtxlen_t len, bool *is_free); /* * Free an extent in the realtime subvolume. Length is expressed in @@ -65,7 +64,7 @@ int /* error */ xfs_rtfree_extent( struct xfs_trans *tp, /* transaction pointer */ xfs_rtblock_t bno, /* starting block number to free */ - xfs_extlen_t len); /* length of extent freed */ + xfs_rtxlen_t len); /* length of extent freed */ /* Same as above, but in units of rt blocks. */ int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno, diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h index 9a4019f23dd5..0856997f84d6 100644 --- a/fs/xfs/libxfs/xfs_types.h +++ b/fs/xfs/libxfs/xfs_types.h @@ -11,6 +11,7 @@ typedef uint32_t prid_t; /* project ID */ typedef uint32_t xfs_agblock_t; /* blockno in alloc. 
group */ typedef uint32_t xfs_agino_t; /* inode # within allocation grp */ typedef uint32_t xfs_extlen_t; /* extent length in blocks */ +typedef uint32_t xfs_rtxlen_t; /* file extent length in rtextents */ typedef uint32_t xfs_agnumber_t; /* allocation group number */ typedef uint64_t xfs_extnum_t; /* # of extents in a file */ typedef uint32_t xfs_aextnum_t; /* # extents in an attribute fork */ diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index 4165ed739136..86f726577ca7 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -145,7 +145,7 @@ xchk_xref_is_used_rt_space( { xfs_rtblock_t startext; xfs_rtblock_t endext; - xfs_rtblock_t extcount; + xfs_rtxlen_t extcount; bool is_free; int error; diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 3593c0f0ce13..20c1b4f55788 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -77,13 +77,13 @@ xfs_bmap_rtalloc( struct xfs_mount *mp = ap->ip->i_mount; xfs_fileoff_t orig_offset = ap->offset; xfs_rtblock_t rtb; - xfs_extlen_t prod = 0; /* product factor for allocators */ + xfs_rtxlen_t prod = 0; /* product factor for allocators */ xfs_extlen_t mod = 0; /* product factor for allocators */ - xfs_extlen_t ralen = 0; /* realtime allocation length */ + xfs_rtxlen_t ralen = 0; /* realtime allocation length */ xfs_extlen_t align; /* minimum allocation alignment */ xfs_extlen_t orig_length = ap->length; xfs_extlen_t minlen = mp->m_sb.sb_rextsize; - xfs_extlen_t raminlen; + xfs_rtxlen_t raminlen; bool rtlocked = false; bool ignore_locality = false; int error; diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index b732bac11b01..21f0ac611ef8 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -149,7 +149,7 @@ xfs_rtallocate_range( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ xfs_rtblock_t start, /* start block to allocate */ - xfs_extlen_t len, /* length to allocate */ + xfs_rtxlen_t len, /* length to allocate 
*/ struct xfs_buf **rbpp, /* in/out: summary block buffer */ xfs_fsblock_t *rsb) /* in/out: summary block number */ { @@ -228,13 +228,13 @@ xfs_rtallocate_extent_block( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ xfs_rtblock_t bbno, /* bitmap block number */ - xfs_extlen_t minlen, /* minimum length to allocate */ - xfs_extlen_t maxlen, /* maximum length to allocate */ - xfs_extlen_t *len, /* out: actual length allocated */ + xfs_rtxlen_t minlen, /* minimum length to allocate */ + xfs_rtxlen_t maxlen, /* maximum length to allocate */ + xfs_rtxlen_t *len, /* out: actual length allocated */ xfs_rtblock_t *nextp, /* out: next block to try */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ xfs_fsblock_t *rsb, /* in/out: summary block number */ - xfs_extlen_t prod, /* extent product factor */ + xfs_rtxlen_t prod, /* extent product factor */ xfs_rtblock_t *rtblock) /* out: start block allocated */ { xfs_rtblock_t besti; /* best rtblock found so far */ @@ -312,7 +312,7 @@ xfs_rtallocate_extent_block( * Searched the whole thing & didn't find a maxlen free extent. */ if (minlen < maxlen && besti != -1) { - xfs_extlen_t p; /* amount to trim length by */ + xfs_rtxlen_t p; /* amount to trim length by */ /* * If size should be a multiple of prod, make that so. 
@@ -353,16 +353,16 @@ xfs_rtallocate_extent_exact( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ xfs_rtblock_t bno, /* starting block number to allocate */ - xfs_extlen_t minlen, /* minimum length to allocate */ - xfs_extlen_t maxlen, /* maximum length to allocate */ - xfs_extlen_t *len, /* out: actual length allocated */ + xfs_rtxlen_t minlen, /* minimum length to allocate */ + xfs_rtxlen_t maxlen, /* maximum length to allocate */ + xfs_rtxlen_t *len, /* out: actual length allocated */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ xfs_fsblock_t *rsb, /* in/out: summary block number */ - xfs_extlen_t prod, /* extent product factor */ + xfs_rtxlen_t prod, /* extent product factor */ xfs_rtblock_t *rtblock) /* out: start block allocated */ { int error; /* error value */ - xfs_extlen_t i; /* extent length trimmed due to prod */ + xfs_rtxlen_t i; /* extent length trimmed due to prod */ int isfree; /* extent is free */ xfs_rtblock_t next; /* next block to try (dummy) */ @@ -433,12 +433,12 @@ xfs_rtallocate_extent_near( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ xfs_rtblock_t bno, /* starting block number to allocate */ - xfs_extlen_t minlen, /* minimum length to allocate */ - xfs_extlen_t maxlen, /* maximum length to allocate */ - xfs_extlen_t *len, /* out: actual length allocated */ + xfs_rtxlen_t minlen, /* minimum length to allocate */ + xfs_rtxlen_t maxlen, /* maximum length to allocate */ + xfs_rtxlen_t *len, /* out: actual length allocated */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ xfs_fsblock_t *rsb, /* in/out: summary block number */ - xfs_extlen_t prod, /* extent product factor */ + xfs_rtxlen_t prod, /* extent product factor */ xfs_rtblock_t *rtblock) /* out: start block allocated */ { int any; /* any useful extents from summary */ @@ -642,12 +642,12 @@ STATIC int /* error */ xfs_rtallocate_extent_size( xfs_mount_t *mp, /* file system mount 
point */ xfs_trans_t *tp, /* transaction pointer */ - xfs_extlen_t minlen, /* minimum length to allocate */ - xfs_extlen_t maxlen, /* maximum length to allocate */ - xfs_extlen_t *len, /* out: actual length allocated */ + xfs_rtxlen_t minlen, /* minimum length to allocate */ + xfs_rtxlen_t maxlen, /* maximum length to allocate */ + xfs_rtxlen_t *len, /* out: actual length allocated */ struct xfs_buf **rbpp, /* in/out: summary block buffer */ xfs_fsblock_t *rsb, /* in/out: summary block number */ - xfs_extlen_t prod, /* extent product factor */ + xfs_rtxlen_t prod, /* extent product factor */ xfs_rtblock_t *rtblock) /* out: start block allocated */ { int error; /* error value */ @@ -1195,11 +1195,11 @@ int /* error */ xfs_rtallocate_extent( xfs_trans_t *tp, /* transaction pointer */ xfs_rtblock_t bno, /* starting block number to allocate */ - xfs_extlen_t minlen, /* minimum length to allocate */ - xfs_extlen_t maxlen, /* maximum length to allocate */ - xfs_extlen_t *len, /* out: actual length allocated */ + xfs_rtxlen_t minlen, /* minimum length to allocate */ + xfs_rtxlen_t maxlen, /* maximum length to allocate */ + xfs_rtxlen_t *len, /* out: actual length allocated */ int wasdel, /* was a delayed allocation extent */ - xfs_extlen_t prod, /* extent product factor */ + xfs_rtxlen_t prod, /* extent product factor */ xfs_rtblock_t *rtblock) /* out: start block allocated */ { xfs_mount_t *mp = tp->t_mountp; @@ -1215,7 +1215,7 @@ xfs_rtallocate_extent( * If prod is set then figure out what to do to minlen and maxlen. 
*/ if (prod > 1) { - xfs_extlen_t i; + xfs_rtxlen_t i; if ((i = maxlen % prod)) maxlen -= i; @@ -1448,7 +1448,7 @@ int /* error */ xfs_rtpick_extent( xfs_mount_t *mp, /* file system mount point */ xfs_trans_t *tp, /* transaction pointer */ - xfs_extlen_t len, /* allocation length (rtextents) */ + xfs_rtxlen_t len, /* allocation length (rtextents) */ xfs_rtblock_t *pick) /* result rt extent */ { xfs_rtblock_t b; /* result block */ diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h index f14da84206d9..ec03cc566bec 100644 --- a/fs/xfs/xfs_rtalloc.h +++ b/fs/xfs/xfs_rtalloc.h @@ -25,11 +25,11 @@ int /* error */ xfs_rtallocate_extent( struct xfs_trans *tp, /* transaction pointer */ xfs_rtblock_t bno, /* starting block number to allocate */ - xfs_extlen_t minlen, /* minimum length to allocate */ - xfs_extlen_t maxlen, /* maximum length to allocate */ - xfs_extlen_t *len, /* out: actual length allocated */ + xfs_rtxlen_t minlen, /* minimum length to allocate */ + xfs_rtxlen_t maxlen, /* maximum length to allocate */ + xfs_rtxlen_t *len, /* out: actual length allocated */ int wasdel, /* was a delayed allocation extent */ - xfs_extlen_t prod, /* extent product factor */ + xfs_rtxlen_t prod, /* extent product factor */ xfs_rtblock_t *rtblock); /* out: start block allocated */ @@ -62,7 +62,7 @@ int /* error */ xfs_rtpick_extent( struct xfs_mount *mp, /* file system mount point */ struct xfs_trans *tp, /* transaction pointer */ - xfs_extlen_t len, /* allocation length (rtextents) */ + xfs_rtxlen_t len, /* allocation length (rtextents) */ xfs_rtblock_t *pick); /* result rt extent */ /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
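The "extent product factor" parameters converted above are lengths in rtextents, and the last hunk of xfs_rtallocate_extent shows maxlen being trimmed to a multiple of prod. The sketch below illustrates that alignment step in userspace. demo_align_to_prod is a hypothetical name, and the rounding-up of minlen is an assumption made for illustration; only the maxlen trimming is visible in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xfs_rtxlen_t;	/* file extent length in rtextents */

/*
 * Hypothetical helper: align an allocation request to a multiple of prod
 * rtextents.  Returns false if no aligned length fits in [minlen, maxlen].
 */
static bool
demo_align_to_prod(xfs_rtxlen_t prod, xfs_rtxlen_t *minlen,
		   xfs_rtxlen_t *maxlen)
{
	xfs_rtxlen_t	i;

	if (prod > 1) {
		/* Shown in the diff: round maxlen down to a multiple of prod. */
		if ((i = *maxlen % prod))
			*maxlen -= i;
		/* Assumption: round minlen up to a multiple of prod. */
		if ((i = *minlen % prod))
			*minlen += prod - i;
	}
	return *minlen <= *maxlen;
}
```

With every quantity typed as xfs_rtxlen_t, it is clear that none of these values may be compared against a block count without a conversion.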
* [PATCHSET v1.0 0/7] xfs: refactor rt extent unit conversions 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 1/7] xfs: create a helper to convert rtextents to rtblocks Darrick J. Wong ` (6 more replies) 2022-12-30 22:17 ` [PATCHSET v1.0 0/8] xfs: refactor rtbitmap/summary macros Darrick J. Wong ` (34 subsequent siblings) 39 siblings, 7 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs Hi all, This series replaces all the open-coded integer division and multiplication conversions between rt blocks and rt extents with calls to static inline helpers. Having cleaned all that up, the helpers are augmented to skip the expensive operations in favor of bit shifts and masking if the rt extent size is a power of two. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. 
--D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=refactor-rt-unit-conversions xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=refactor-rt-unit-conversions --- fs/xfs/libxfs/xfs_bmap.c | 19 +++----- fs/xfs/libxfs/xfs_rtbitmap.c | 4 +- fs/xfs/libxfs/xfs_rtbitmap.h | 88 +++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_sb.c | 2 + fs/xfs/libxfs/xfs_swapext.c | 7 ++- fs/xfs/libxfs/xfs_trans_inode.c | 3 + fs/xfs/libxfs/xfs_trans_resv.c | 3 + fs/xfs/scrub/inode.c | 3 + fs/xfs/scrub/inode_repair.c | 3 + fs/xfs/scrub/rtbitmap.c | 18 +++----- fs/xfs/scrub/rtsummary.c | 4 +- fs/xfs/xfs_bmap_util.c | 38 +++++++---------- fs/xfs/xfs_fsmap.c | 12 +++-- fs/xfs/xfs_ioctl.c | 5 +- fs/xfs/xfs_linux.h | 12 +++++ fs/xfs/xfs_mount.h | 2 + fs/xfs/xfs_rtalloc.c | 16 ++++--- fs/xfs/xfs_super.c | 3 + fs/xfs/xfs_trans.c | 9 +++- fs/xfs/xfs_xchgrange.c | 4 +- 20 files changed, 178 insertions(+), 77 deletions(-) ^ permalink raw reply [flat|nested] 565+ messages in thread
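The cover letter's plan (wrap the conversions in helpers, then use shifts and masks when the rt extent size is a power of two) can be sketched in userspace as follows. This is an illustration under stated assumptions rather than the series' actual code: struct demo_mount, its rextlog field, and all demo_ functions are hypothetical names, with rextlog set to -1 to mean "not a power of two".

```c
#include <stdint.h>

typedef uint64_t xfs_rtblock_t;	/* rt block number */
typedef uint64_t xfs_rtxnum_t;	/* rt extent number */

/* Hypothetical stand-in for the mount fields involved. */
struct demo_mount {
	uint32_t	rextsize;	/* blocks per rtextent */
	int		rextlog;	/* log2(rextsize), or -1 if not a power of 2 */
};

/* Precompute the shift once so the hot paths only test a sign. */
static inline void
demo_mount_init(struct demo_mount *mp, uint32_t rextsize)
{
	mp->rextsize = rextsize;
	mp->rextlog = -1;
	if (rextsize && (rextsize & (rextsize - 1)) == 0) {
		mp->rextlog = 0;
		while ((1U << mp->rextlog) != rextsize)
			mp->rextlog++;
	}
}

/* rt extent number -> first rt block of that extent */
static inline xfs_rtblock_t
demo_rtx_to_rtb(const struct demo_mount *mp, xfs_rtxnum_t rtx)
{
	if (mp->rextlog >= 0)
		return rtx << mp->rextlog;
	return rtx * mp->rextsize;
}

/* rt block number -> rt extent containing it */
static inline xfs_rtxnum_t
demo_rtb_to_rtx(const struct demo_mount *mp, xfs_rtblock_t rtbno)
{
	if (mp->rextlog >= 0)
		return rtbno >> mp->rextlog;
	return rtbno / mp->rextsize;
}

/* rt block number -> offset of that block within its rt extent */
static inline uint32_t
demo_rtb_to_rtxoff(const struct demo_mount *mp, xfs_rtblock_t rtbno)
{
	if (mp->rextlog >= 0)
		return rtbno & (mp->rextsize - 1);
	return rtbno % mp->rextsize;
}
```

Keeping the slow path in the same helper means callers never need to know which case applies, which is what makes the earlier mechanical conversion of open-coded arithmetic worthwhile.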
* [PATCH 1/7] xfs: create a helper to convert rtextents to rtblocks 2022-12-30 22:17 ` [PATCHSET v1.0 0/7] xfs: refactor rt extent unit conversions Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 2/7] xfs: create a helper to compute leftovers of realtime extents Darrick J. Wong ` (5 subsequent siblings) 6 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a helper to convert a realtime extent to a realtime block. Later on we'll change the helper to use bit shifts when possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtbitmap.h | 16 ++++++++++++++++ fs/xfs/scrub/rtbitmap.c | 4 ++-- fs/xfs/scrub/rtsummary.c | 4 ++-- fs/xfs/xfs_bmap_util.c | 9 +++++---- fs/xfs/xfs_fsmap.c | 4 ++-- fs/xfs/xfs_super.c | 3 ++- 6 files changed, 29 insertions(+), 11 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index 5e2afb7fea0e..099ea8902aaa 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -6,6 +6,22 @@ #ifndef __XFS_RTBITMAP_H__ #define __XFS_RTBITMAP_H__ +static inline xfs_rtblock_t +xfs_rtx_to_rtb( + struct xfs_mount *mp, + xfs_rtxnum_t rtx) +{ + return rtx * mp->m_sb.sb_rextsize; +} + +static inline xfs_extlen_t +xfs_rtxlen_to_extlen( + struct xfs_mount *mp, + xfs_rtxlen_t rtxlen) +{ + return rtxlen * mp->m_sb.sb_rextsize; +} + /* * Functions for walking free space rtextents in the realtime bitmap. 
*/ diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index 29c5af5a289f..af9eee6ed5ce 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -62,8 +62,8 @@ xchk_rtbitmap_rec( xfs_rtxnum_t startblock; xfs_filblks_t blockcount; - startblock = rec->ar_startext * mp->m_sb.sb_rextsize; - blockcount = rec->ar_extcount * mp->m_sb.sb_rextsize; + startblock = xfs_rtx_to_rtb(mp, rec->ar_startext); + blockcount = xfs_rtx_to_rtb(mp, rec->ar_extcount); if (!xfs_verify_rtbext(mp, startblock, blockcount)) xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0); diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c index 91c39564298c..7885ddd2d733 100644 --- a/fs/xfs/scrub/rtsummary.c +++ b/fs/xfs/scrub/rtsummary.c @@ -138,8 +138,8 @@ xchk_rtsum_record_free( lenlog = XFS_RTBLOCKLOG(rec->ar_extcount); offs = XFS_SUMOFFS(mp, lenlog, rbmoff); - rtbno = rec->ar_startext * mp->m_sb.sb_rextsize; - rtlen = rec->ar_extcount * mp->m_sb.sb_rextsize; + rtbno = xfs_rtx_to_rtb(mp, rec->ar_startext); + rtlen = xfs_rtx_to_rtb(mp, rec->ar_extcount); if (!xfs_verify_rtbext(mp, rtbno, rtlen)) { xchk_ino_xref_set_corrupt(sc, mp->m_rbmip->i_ino); diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 018c3bcc225e..e0d3c60c7d9c 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -29,6 +29,7 @@ #include "xfs_iomap.h" #include "xfs_reflink.h" #include "xfs_swapext.h" +#include "xfs_rtbitmap.h" /* Kernel only BMAP related definitions and functions */ @@ -126,7 +127,7 @@ xfs_bmap_rtalloc( * XFS_BMBT_MAX_EXTLEN), we don't hear about that number, and can't * adjust the starting point to match it. 
*/ - if (ralen * mp->m_sb.sb_rextsize >= XFS_MAX_BMBT_EXTLEN) + if (xfs_rtxlen_to_extlen(mp, ralen) >= XFS_MAX_BMBT_EXTLEN) ralen = XFS_MAX_BMBT_EXTLEN / mp->m_sb.sb_rextsize; /* @@ -148,7 +149,7 @@ xfs_bmap_rtalloc( error = xfs_rtpick_extent(mp, ap->tp, ralen, &rtx); if (error) return error; - ap->blkno = rtx * mp->m_sb.sb_rextsize; + ap->blkno = xfs_rtx_to_rtb(mp, rtx); } else { ap->blkno = 0; } @@ -171,8 +172,8 @@ xfs_bmap_rtalloc( return error; if (rtx != NULLRTEXTNO) { - ap->blkno = rtx * mp->m_sb.sb_rextsize; - ap->length = ralen * mp->m_sb.sb_rextsize; + ap->blkno = xfs_rtx_to_rtb(mp, rtx); + ap->length = xfs_rtxlen_to_extlen(mp, ralen); ap->ip->i_nblocks += ap->length; xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE); if (ap->wasdel) diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 0027e186bd52..3738dc936d85 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -459,9 +459,9 @@ xfs_getfsmap_rtdev_rtbitmap_helper( struct xfs_rmap_irec irec; xfs_daddr_t rec_daddr; - irec.rm_startblock = rec->ar_startext * mp->m_sb.sb_rextsize; + irec.rm_startblock = xfs_rtx_to_rtb(mp, rec->ar_startext); rec_daddr = XFS_FSB_TO_BB(mp, irec.rm_startblock); - irec.rm_blockcount = rec->ar_extcount * mp->m_sb.sb_rextsize; + irec.rm_blockcount = xfs_rtx_to_rtb(mp, rec->ar_extcount); irec.rm_owner = XFS_RMAP_OWN_NULL; /* "free" */ irec.rm_offset = 0; irec.rm_flags = 0; diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 9eff9ee106c4..19a22f9225e4 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -43,6 +43,7 @@ #include "xfs_iunlink_item.h" #include "scrub/rcbag_btree.h" #include "xfs_swapext_item.h" +#include "xfs_rtbitmap.h" #include <linux/magic.h> #include <linux/fs_context.h> @@ -863,7 +864,7 @@ xfs_fs_statfs( statp->f_blocks = sbp->sb_rblocks; freertx = percpu_counter_sum_positive(&mp->m_frextents); - statp->f_bavail = statp->f_bfree = freertx * sbp->sb_rextsize; + statp->f_bavail = statp->f_bfree = xfs_rtx_to_rtb(mp, freertx); } return 0; ^ 
permalink raw reply related [flat|nested] 565+ messages in thread
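For readers following along, the conversion that patch 1/7 factors out can be modelled in userspace roughly as below. This is only a sketch: the type names and the mount_model struct are hypothetical stand-ins for xfs_rtblock_t, xfs_rtxnum_t and struct xfs_mount, and only the multiplication by the rt extent size is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
typedef uint64_t rtblock_t;	/* realtime block number */
typedef uint64_t rtxnum_t;	/* realtime extent number */

struct mount_model {
	uint32_t	rextsize;	/* blocks per rt extent, like sb_rextsize */
};

/* Convert an rt extent number (or count) into rt blocks. */
static inline rtblock_t
rtx_to_rtb(
	const struct mount_model	*mp,
	rtxnum_t			rtx)
{
	return rtx * mp->rextsize;
}
```

With a 4-block rt extent, extent 3 starts at rt block 12. Keeping the multiplication in one helper is what lets a later patch swap in a shift without touching the callers.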
* [PATCH 2/7] xfs: create a helper to compute leftovers of realtime extents 2022-12-30 22:17 ` [PATCHSET v1.0 0/7] xfs: refactor rt extent unit conversions Darrick J. Wong 2022-12-30 22:17 ` [PATCH 1/7] xfs: create a helper to convert rtextents to rtblocks Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 5/7] xfs: convert do_div calls to xfs_rtb_to_rtx helper calls Darrick J. Wong ` (4 subsequent siblings) 6 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a helper to compute the misalignment between a file extent (xfs_extlen_t) and a realtime extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap.c | 4 ++-- fs/xfs/libxfs/xfs_rtbitmap.h | 9 +++++++++ fs/xfs/libxfs/xfs_trans_inode.c | 3 ++- fs/xfs/scrub/inode.c | 3 ++- fs/xfs/scrub/inode_repair.c | 3 ++- fs/xfs/xfs_bmap_util.c | 2 +- fs/xfs/xfs_ioctl.c | 5 +++-- 7 files changed, 21 insertions(+), 8 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 7f7f0d435b33..888b51a09acb 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -3058,7 +3058,7 @@ xfs_bmap_extsize_align( * If realtime, and the result isn't a multiple of the realtime * extent size we need to remove blocks until it is. */ - if (rt && (temp = (align_alen % mp->m_sb.sb_rextsize))) { + if (rt && (temp = xfs_extlen_to_rtxmod(mp, align_alen))) { /* * We're not covering the original request, or * we won't be able to once we fix the length. @@ -3085,7 +3085,7 @@ xfs_bmap_extsize_align( else { align_alen -= orig_off - align_off; align_off = orig_off; - align_alen -= align_alen % mp->m_sb.sb_rextsize; + align_alen -= xfs_extlen_to_rtxmod(mp, align_alen); } /* * Result doesn't cover the request, fail it. 
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index 099ea8902aaa..b6a4c46bddc0 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -22,6 +22,15 @@ xfs_rtxlen_to_extlen( return rtxlen * mp->m_sb.sb_rextsize; } +/* Compute the misalignment between an extent length and a realtime extent. */ +static inline unsigned int +xfs_extlen_to_rtxmod( + struct xfs_mount *mp, + xfs_extlen_t len) +{ + return len % mp->m_sb.sb_rextsize; +} + /* * Functions for walking free space rtextents in the realtime bitmap. */ diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c index 6a3a869635bf..4571db873f14 100644 --- a/fs/xfs/libxfs/xfs_trans_inode.c +++ b/fs/xfs/libxfs/xfs_trans_inode.c @@ -14,6 +14,7 @@ #include "xfs_trans.h" #include "xfs_trans_priv.h" #include "xfs_inode_item.h" +#include "xfs_rtbitmap.h" #include <linux/iversion.h> @@ -152,7 +153,7 @@ xfs_trans_log_inode( */ if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && - (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) { + xfs_extlen_to_rtxmod(ip->i_mount, ip->i_extsize) > 0) { ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT); ip->i_extsize = 0; diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index d86a2e1572ee..4e534ec642e2 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -19,6 +19,7 @@ #include "xfs_reflink.h" #include "xfs_rmap.h" #include "xfs_bmap_util.h" +#include "xfs_rtbitmap.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/btree.h" @@ -224,7 +225,7 @@ xchk_inode_extsize( */ if ((flags & XFS_DIFLAG_RTINHERIT) && (flags & XFS_DIFLAG_EXTSZINHERIT) && - value % sc->mp->m_sb.sb_rextsize > 0) + xfs_extlen_to_rtxmod(sc->mp, value) > 0) xchk_ino_set_warning(sc, ino); } diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index e9225536dc65..ef10f031146e 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c 
@@ -36,6 +36,7 @@ #include "xfs_attr_leaf.h" #include "xfs_log_priv.h" #include "xfs_symlink_remote.h" +#include "xfs_rtbitmap.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -1506,7 +1507,7 @@ xrep_inode_extsize( /* Fix misaligned extent size hints on a directory. */ if ((sc->ip->i_diflags & XFS_DIFLAG_RTINHERIT) && (sc->ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && - sc->ip->i_extsize % sc->mp->m_sb.sb_rextsize > 0) { + xfs_extlen_to_rtxmod(sc->mp, sc->ip->i_extsize) > 0) { sc->ip->i_extsize = 0; sc->ip->i_diflags &= ~XFS_DIFLAG_EXTSZINHERIT; } diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index e0d3c60c7d9c..cc158e5e095f 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -98,7 +98,7 @@ xfs_bmap_rtalloc( if (error) return error; ASSERT(ap->length); - ASSERT(ap->length % mp->m_sb.sb_rextsize == 0); + ASSERT(xfs_extlen_to_rtxmod(mp, ap->length) == 0); /* * If we shifted the file offset downward to satisfy an extent size diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index 37af6b7e6dbe..e3e6d377d958 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -38,6 +38,7 @@ #include "xfs_reflink.h" #include "xfs_ioctl.h" #include "xfs_xattr.h" +#include "xfs_rtbitmap.h" #include <linux/mount.h> #include <linux/namei.h> @@ -1013,7 +1014,7 @@ xfs_fill_fsxattr( * later. */ if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && - ip->i_extsize % mp->m_sb.sb_rextsize > 0) { + xfs_extlen_to_rtxmod(mp, ip->i_extsize) > 0) { fa->fsx_xflags &= ~(FS_XFLAG_EXTSIZE | FS_XFLAG_EXTSZINHERIT); fa->fsx_extsize = 0; @@ -1079,7 +1080,7 @@ xfs_ioctl_setattr_xflags( /* If realtime flag is set then must have realtime device */ if (fa->fsx_xflags & FS_XFLAG_REALTIME) { if (mp->m_sb.sb_rblocks == 0 || mp->m_sb.sb_rextsize == 0 || - (ip->i_extsize % mp->m_sb.sb_rextsize)) + xfs_extlen_to_rtxmod(mp, ip->i_extsize)) return -EINVAL; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
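The misalignment check that patch 2/7 wraps in a helper might look like this in a standalone model. The function names are hypothetical, and rextsize is passed directly instead of being read from a mount structure; the remainder computation is the only part taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Blocks left over when len is carved into whole rt extents. */
static inline uint32_t
extlen_to_rtxmod(
	uint32_t	rextsize,
	uint32_t	len)
{
	return len % rextsize;
}

/*
 * An extent size hint is only usable on a realtime file if it is a
 * whole multiple of the rt extent size, i.e. the remainder is zero.
 */
static inline bool
extsize_hint_is_aligned(
	uint32_t	rextsize,
	uint32_t	extsize)
{
	return extlen_to_rtxmod(rextsize, extsize) == 0;
}
```

This is the shape of the test the patch converts in xfs_trans_log_inode, xchk_inode_extsize and xfs_fill_fsxattr: a nonzero remainder means the hint is misaligned.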
* [PATCH 5/7] xfs: convert do_div calls to xfs_rtb_to_rtx helper calls 2022-12-30 22:17 ` [PATCHSET v1.0 0/7] xfs: refactor rt extent unit conversions Darrick J. Wong 2022-12-30 22:17 ` [PATCH 1/7] xfs: create a helper to convert rtextents to rtblocks Darrick J. Wong 2022-12-30 22:17 ` [PATCH 2/7] xfs: create a helper to compute leftovers of realtime extents Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 4/7] xfs: create helpers to convert rt block numbers to rt extent numbers Darrick J. Wong ` (3 subsequent siblings) 6 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert these calls to use the helpers, and clean up all these places where the same variable can have different units depending on where it is in the function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap.c | 8 ++------ fs/xfs/scrub/rtbitmap.c | 14 +++++--------- fs/xfs/xfs_bmap_util.c | 10 ++++------ fs/xfs/xfs_fsmap.c | 8 ++++---- fs/xfs/xfs_rtalloc.c | 3 +-- 5 files changed, 16 insertions(+), 27 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 055432476ef0..1ad8606c1dd9 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4921,12 +4921,8 @@ xfs_bmap_del_extent_delay( ASSERT(got->br_startoff <= del->br_startoff); ASSERT(got_endoff >= del_endoff); - if (isrt) { - uint64_t rtexts = XFS_FSB_TO_B(mp, del->br_blockcount); - - do_div(rtexts, mp->m_sb.sb_rextsize); - xfs_mod_frextents(mp, rtexts); - } + if (isrt) + xfs_mod_frextents(mp, xfs_rtb_to_rtxt(mp, del->br_blockcount)); /* * Update the inode delalloc counter now and wait to update the diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index af9eee6ed5ce..a50b0580f3d8 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -140,25 +140,21 @@ xchk_rtbitmap( void 
xchk_xref_is_used_rt_space( struct xfs_scrub *sc, - xfs_rtblock_t fsbno, + xfs_rtblock_t rtbno, xfs_extlen_t len) { xfs_rtxnum_t startext; xfs_rtxnum_t endext; - xfs_rtxlen_t extcount; bool is_free; int error; if (xchk_skip_xref(sc->sm)) return; - startext = fsbno; - endext = fsbno + len - 1; - do_div(startext, sc->mp->m_sb.sb_rextsize); - do_div(endext, sc->mp->m_sb.sb_rextsize); - extcount = endext - startext + 1; - error = xfs_rtalloc_extent_is_free(sc->mp, sc->tp, startext, extcount, - &is_free); + startext = xfs_rtb_to_rtxt(sc->mp, rtbno); + endext = xfs_rtb_to_rtxt(sc->mp, rtbno + len - 1); + error = xfs_rtalloc_extent_is_free(sc->mp, sc->tp, startext, + endext - startext + 1, &is_free); if (!xchk_should_check_xref(sc, &error, NULL)) return; if (is_free) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index a55347c693ed..e595625048f8 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -157,14 +157,12 @@ xfs_bmap_rtalloc( * Realtime allocation, done through xfs_rtallocate_extent. 
*/ if (ignore_locality) - ap->blkno = 0; + rtx = 0; else - do_div(ap->blkno, mp->m_sb.sb_rextsize); - rtx = ap->blkno; - ap->length = ralen; + rtx = xfs_rtb_to_rtxt(mp, ap->blkno); raminlen = max_t(xfs_rtxlen_t, 1, xfs_extlen_to_rtxlen(mp, minlen)); - error = xfs_rtallocate_extent(ap->tp, ap->blkno, raminlen, ap->length, - &ralen, ap->wasdel, prod, &rtx); + error = xfs_rtallocate_extent(ap->tp, rtx, raminlen, ralen, &ralen, + ap->wasdel, prod, &rtx); if (error) return error; diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 3738dc936d85..8d0a6f480d2a 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -522,6 +522,7 @@ xfs_getfsmap_rtdev_rtbitmap_query( struct xfs_rtalloc_rec alow = { 0 }; struct xfs_rtalloc_rec ahigh = { 0 }; struct xfs_mount *mp = tp->t_mountp; + unsigned int mod; int error; xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); @@ -530,10 +531,9 @@ xfs_getfsmap_rtdev_rtbitmap_query( * Set up query parameters to return free rtextents covering the range * we want. 
*/ - alow.ar_startext = info->low.rm_startblock; - ahigh.ar_startext = info->high.rm_startblock; - do_div(alow.ar_startext, mp->m_sb.sb_rextsize); - if (do_div(ahigh.ar_startext, mp->m_sb.sb_rextsize)) + alow.ar_startext = xfs_rtb_to_rtxt(mp, info->low.rm_startblock); + ahigh.ar_startext = xfs_rtb_to_rtx(mp, info->high.rm_startblock, &mod); + if (mod) ahigh.ar_startext++; error = xfs_rtalloc_query_range(mp, tp, &alow, &ahigh, xfs_getfsmap_rtdev_rtbitmap_helper, info); diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 04a468f4cb8a..1953a00755f4 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1056,8 +1056,7 @@ xfs_growfs_rt( nrblocks_step = (bmbno + 1) * NBBY * nsbp->sb_blocksize * nsbp->sb_rextsize; nsbp->sb_rblocks = min(nrblocks, nrblocks_step); - nsbp->sb_rextents = nsbp->sb_rblocks; - do_div(nsbp->sb_rextents, nsbp->sb_rextsize); + nsbp->sb_rextents = xfs_rtb_to_rtxt(nmp, nsbp->sb_rblocks); ASSERT(nsbp->sb_rextents != 0); nsbp->sb_rextslog = xfs_highbit32(nsbp->sb_rextents); nrsumlevels = nmp->m_rsumlevels = nsbp->sb_rextslog + 1; ^ permalink raw reply related [flat|nested] 565+ messages in thread
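The xchk_xref_is_used_rt_space change in patch 5/7 computes how many rt extents a block range touches. A rough userspace sketch of that arithmetic follows; the names are hypothetical, plain division stands in for the kernel helper, and len is assumed to be nonzero.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating conversion from rt block number to rt extent number. */
static inline uint64_t
rtb_to_rtxt(
	uint32_t	rextsize,
	uint64_t	rtbno)
{
	return rtbno / rextsize;
}

/*
 * Count the rt extents touched by the block range [rtbno, rtbno + len).
 * The caller must pass len > 0.
 */
static inline uint64_t
rtx_count_covering(
	uint32_t	rextsize,
	uint64_t	rtbno,
	uint32_t	len)
{
	uint64_t	startext = rtb_to_rtxt(rextsize, rtbno);
	uint64_t	endext = rtb_to_rtxt(rextsize, rtbno + len - 1);

	return endext - startext + 1;
}
```

Keeping startext and endext in extent units from the start avoids the pattern the commit message complains about, where one variable holds blocks before do_div and extents after it.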
* [PATCH 4/7] xfs: create helpers to convert rt block numbers to rt extent numbers 2022-12-30 22:17 ` [PATCHSET v1.0 0/7] xfs: refactor rt extent unit conversions Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 5/7] xfs: convert do_div calls to xfs_rtb_to_rtx helper calls Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 3/7] xfs: create a helper to compute leftovers of realtime extents Darrick J. Wong ` (2 subsequent siblings) 6 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create helpers to do unit conversions of rt block numbers to rt extent numbers. There are two variations -- the suffix "t" denotes the one that returns only the truncated extent number; the other one also returns the misalignment. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap.c | 7 +++---- fs/xfs/libxfs/xfs_rtbitmap.c | 4 ++-- fs/xfs/libxfs/xfs_rtbitmap.h | 17 +++++++++++++++++ fs/xfs/libxfs/xfs_swapext.c | 7 ++++--- fs/xfs/xfs_rtalloc.c | 8 ++++---- 5 files changed, 30 insertions(+), 13 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 888b51a09acb..055432476ef0 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5371,7 +5371,6 @@ __xfs_bunmapi( int tmp_logflags; /* partial logging flags */ int wasdel; /* was a delayed alloc extent */ int whichfork; /* data or attribute fork */ - xfs_fsblock_t sum; xfs_filblks_t len = *rlen; /* length to unmap in file */ xfs_fileoff_t end; struct xfs_iext_cursor icur; @@ -5468,8 +5467,7 @@ __xfs_bunmapi( if (!isrt || (flags & XFS_BMAPI_REMAP)) goto delete; - sum = del.br_startblock + del.br_blockcount; - div_u64_rem(sum, mp->m_sb.sb_rextsize, &mod); + xfs_rtb_to_rtx(mp, del.br_startblock + del.br_blockcount, &mod); if (mod) { /* * Realtime extent not lined up at the end. 
@@ -5516,7 +5514,8 @@ __xfs_bunmapi( goto error0; goto nodelete; } - div_u64_rem(del.br_startblock, mp->m_sb.sb_rextsize, &mod); + + xfs_rtb_to_rtx(mp, del.br_startblock, &mod); if (mod) { xfs_extlen_t off = mp->m_sb.sb_rextsize - mod; diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index ce1443681131..de54386cf52f 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -1031,13 +1031,13 @@ xfs_rtfree_blocks( ASSERT(rtlen <= XFS_MAX_BMBT_EXTLEN); - len = div_u64_rem(rtlen, mp->m_sb.sb_rextsize, &mod); + len = xfs_rtb_to_rtx(mp, rtlen, &mod); if (mod) { ASSERT(mod == 0); return -EIO; } - start = div_u64_rem(rtbno, mp->m_sb.sb_rextsize, &mod); + start = xfs_rtb_to_rtx(mp, rtbno, &mod); if (mod) { ASSERT(mod == 0); return -EIO; diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index e2a36fc157c4..bdd4858a794c 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -39,6 +39,23 @@ xfs_extlen_to_rtxlen( return len / mp->m_sb.sb_rextsize; } +static inline xfs_rtxnum_t +xfs_rtb_to_rtx( + struct xfs_mount *mp, + xfs_rtblock_t rtbno, + xfs_extlen_t *mod) +{ + return div_u64_rem(rtbno, mp->m_sb.sb_rextsize, mod); +} + +static inline xfs_rtxnum_t +xfs_rtb_to_rtxt( + struct xfs_mount *mp, + xfs_rtblock_t rtbno) +{ + return div_u64(rtbno, mp->m_sb.sb_rextsize); +} + /* * Functions for walking free space rtextents in the realtime bitmap. */ diff --git a/fs/xfs/libxfs/xfs_swapext.c b/fs/xfs/libxfs/xfs_swapext.c index 42df372d1a89..36f03b0bf4ed 100644 --- a/fs/xfs/libxfs/xfs_swapext.c +++ b/fs/xfs/libxfs/xfs_swapext.c @@ -30,6 +30,7 @@ #include "xfs_dir2_priv.h" #include "xfs_dir2.h" #include "xfs_symlink_remote.h" +#include "xfs_rtbitmap.h" struct kmem_cache *xfs_swapext_intent_cache; @@ -215,19 +216,19 @@ xfs_swapext_check_rt_extents( irec2.br_blockcount); /* Both mappings must be aligned to the realtime extent size. 
*/ - div_u64_rem(irec1.br_startoff, mp->m_sb.sb_rextsize, &mod); + xfs_rtb_to_rtx(mp, irec1.br_startoff, &mod); if (mod) { ASSERT(mod == 0); return -EINVAL; } - div_u64_rem(irec2.br_startoff, mp->m_sb.sb_rextsize, &mod); + xfs_rtb_to_rtx(mp, irec2.br_startoff, &mod); if (mod) { ASSERT(mod == 0); return -EINVAL; } - div_u64_rem(irec1.br_blockcount, mp->m_sb.sb_rextsize, &mod); + xfs_rtb_to_rtx(mp, irec1.br_blockcount, &mod); if (mod) { ASSERT(mod == 0); return -EINVAL; diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 40b6df0ad633..04a468f4cb8a 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1494,16 +1494,16 @@ xfs_rtfile_want_conversion( struct xfs_bmbt_irec *irec) { xfs_fileoff_t rext_next; - uint32_t modoff, modcnt; + xfs_extlen_t modoff, modcnt; if (irec->br_state != XFS_EXT_UNWRITTEN) return false; - div_u64_rem(irec->br_startoff, mp->m_sb.sb_rextsize, &modoff); + xfs_rtb_to_rtx(mp, irec->br_startoff, &modoff); if (modoff == 0) { - uint64_t rexts = div_u64_rem(irec->br_blockcount, - mp->m_sb.sb_rextsize, &modcnt); + xfs_rtbxlen_t rexts; + rexts = xfs_rtb_to_rtx(mp, irec->br_blockcount, &modcnt); if (rexts > 0) { /* * Unwritten mapping starts at an rt extent boundary ^ permalink raw reply related [flat|nested] 565+ messages in thread
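The two helper variants described in patch 4/7 can be approximated in userspace as follows. This is a sketch with hypothetical names: ordinary C division and modulo replace div_u64_rem() and div_u64(), and rextsize is passed in directly.

```c
#include <assert.h>
#include <stdint.h>

/* Convert rt block to rt extent and also report the misalignment. */
static inline uint64_t
rtb_to_rtx(
	uint32_t	rextsize,
	uint64_t	rtbno,
	uint32_t	*mod)
{
	*mod = (uint32_t)(rtbno % rextsize);
	return rtbno / rextsize;
}

/* Truncating variant ("t" suffix): the remainder is discarded. */
static inline uint64_t
rtb_to_rtxt(
	uint32_t	rextsize,
	uint64_t	rtbno)
{
	return rtbno / rextsize;
}

/* Round up to the next rt extent when the block is not aligned. */
static inline uint64_t
rtb_to_rtx_roundup(
	uint32_t	rextsize,
	uint64_t	rtbno)
{
	uint32_t	mod;
	uint64_t	rtx = rtb_to_rtx(rextsize, rtbno, &mod);

	return mod ? rtx + 1 : rtx;
}
```

The last function is not in the patch; it only illustrates how a caller such as the fsmap query can use the returned remainder to bump the high key.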
* [PATCH 3/7] xfs: create a helper to compute leftovers of realtime extents 2022-12-30 22:17 ` [PATCHSET v1.0 0/7] xfs: refactor rt extent unit conversions Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 4/7] xfs: create helpers to convert rt block numbers to rt extent numbers Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 7/7] xfs: use shifting and masking when converting rt extents, if possible Darrick J. Wong 2022-12-30 22:17 ` [PATCH 6/7] xfs: create rt extent rounding helpers for realtime extent blocks Darrick J. Wong 6 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a helper to compute the realtime extent (xfs_rtxlen_t) from an extent length (xfs_extlen_t) value. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtbitmap.h | 8 ++++++++ fs/xfs/libxfs/xfs_trans_resv.c | 3 ++- fs/xfs/xfs_bmap_util.c | 11 ++++------- fs/xfs/xfs_trans.c | 5 +++-- 4 files changed, 17 insertions(+), 10 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index b6a4c46bddc0..e2a36fc157c4 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -31,6 +31,14 @@ xfs_extlen_to_rtxmod( return len % mp->m_sb.sb_rextsize; } +static inline xfs_rtxlen_t +xfs_extlen_to_rtxlen( + struct xfs_mount *mp, + xfs_extlen_t len) +{ + return len / mp->m_sb.sb_rextsize; +} + /* * Functions for walking free space rtextents in the realtime bitmap. 
*/ diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c index 791fad6dba74..dd924842716d 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -19,6 +19,7 @@ #include "xfs_trans.h" #include "xfs_qm.h" #include "xfs_trans_space.h" +#include "xfs_rtbitmap.h" #define _ALLOC true #define _FREE false @@ -220,7 +221,7 @@ xfs_rtalloc_block_count( unsigned int blksz = XFS_FSB_TO_B(mp, 1); unsigned int rtbmp_bytes; - rtbmp_bytes = (XFS_MAX_BMBT_EXTLEN / mp->m_sb.sb_rextsize) / NBBY; + rtbmp_bytes = xfs_extlen_to_rtxlen(mp, XFS_MAX_BMBT_EXTLEN) / NBBY; return (howmany(rtbmp_bytes, blksz) + 1) * num_ops; } diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index cc158e5e095f..a55347c693ed 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -91,7 +91,7 @@ xfs_bmap_rtalloc( align = xfs_get_extsz_hint(ap->ip); retry: - prod = align / mp->m_sb.sb_rextsize; + prod = xfs_extlen_to_rtxlen(mp, align); error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev, align, 1, ap->eof, 0, ap->conv, &ap->offset, &ap->length); @@ -118,17 +118,14 @@ xfs_bmap_rtalloc( prod = 1; /* * Set ralen to be the actual requested length in rtextents. - */ - ralen = ap->length / mp->m_sb.sb_rextsize; - /* + * * If the old value was close enough to XFS_BMBT_MAX_EXTLEN that * we rounded up to it, cut it back so it's valid again. * Note that if it's a really large request (bigger than * XFS_BMBT_MAX_EXTLEN), we don't hear about that number, and can't * adjust the starting point to match it. 
*/ - if (xfs_rtxlen_to_extlen(mp, ralen) >= XFS_MAX_BMBT_EXTLEN) - ralen = XFS_MAX_BMBT_EXTLEN / mp->m_sb.sb_rextsize; + ralen = xfs_extlen_to_rtxlen(mp, min(ap->length, XFS_MAX_BMBT_EXTLEN)); /* * Lock out modifications to both the RT bitmap and summary inodes @@ -165,7 +162,7 @@ xfs_bmap_rtalloc( do_div(ap->blkno, mp->m_sb.sb_rextsize); rtx = ap->blkno; ap->length = ralen; - raminlen = max_t(xfs_extlen_t, 1, minlen / mp->m_sb.sb_rextsize); + raminlen = max_t(xfs_rtxlen_t, 1, xfs_extlen_to_rtxlen(mp, minlen)); error = xfs_rtallocate_extent(ap->tp, ap->blkno, raminlen, ap->length, &ralen, ap->wasdel, prod, &rtx); if (error) diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 1e95d11b6d7d..3e81826c9a0a 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -24,6 +24,7 @@ #include "xfs_dquot_item.h" #include "xfs_dquot.h" #include "xfs_icache.h" +#include "xfs_rtbitmap.h" struct kmem_cache *xfs_trans_cache; @@ -1245,7 +1246,7 @@ xfs_trans_alloc_inode( retry: error = xfs_trans_alloc(mp, resv, dblocks, - rblocks / mp->m_sb.sb_rextsize, + xfs_extlen_to_rtxlen(mp, rblocks), force ? XFS_TRANS_RESERVE : 0, &tp); if (error) return error; @@ -1291,7 +1292,7 @@ xfs_trans_reserve_more_inode( bool force_quota) { struct xfs_mount *mp = ip->i_mount; - unsigned int rtx = rblocks / mp->m_sb.sb_rextsize; + unsigned int rtx = xfs_extlen_to_rtxlen(mp, rblocks); int error; ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); ^ permalink raw reply related [flat|nested] 565+ messages in thread
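Patch 3/7 also folds the two-step clamp in xfs_bmap_rtalloc into a single min() expression. The sketch below compares the two forms in userspace, assuming integer division as in the patch. The names are hypothetical, and the maximum extent length is a parameter so that no particular value of XFS_MAX_BMBT_EXTLEN is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Old form: convert first, then cut back if we reached the maximum. */
static inline uint32_t
ralen_old(
	uint32_t	rextsize,
	uint32_t	length,
	uint32_t	max_extlen)
{
	uint32_t	ralen = length / rextsize;

	if (ralen * rextsize >= max_extlen)
		ralen = max_extlen / rextsize;
	return ralen;
}

/* New form: clamp the block length, then convert once. */
static inline uint32_t
ralen_new(
	uint32_t	rextsize,
	uint32_t	length,
	uint32_t	max_extlen)
{
	uint32_t	clamped = length < max_extlen ? length : max_extlen;

	return clamped / rextsize;
}
```

The two agree because once length reaches max_extlen, the truncated quotient is pinned at max_extlen / rextsize in both forms.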
* [PATCH 7/7] xfs: use shifting and masking when converting rt extents, if possible 2022-12-30 22:17 ` [PATCHSET v1.0 0/7] xfs: refactor rt extent unit conversions Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 3/7] xfs: create a helper to compute leftovers of realtime extents Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 6/7] xfs: create rt extent rounding helpers for realtime extent blocks Darrick J. Wong 6 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Avoid the costs of integer division (32-bit and 64-bit) if the realtime extent size is a power of two. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtbitmap.h | 20 ++++++++++++++++++++ fs/xfs/libxfs/xfs_sb.c | 2 ++ fs/xfs/xfs_linux.h | 12 ++++++++++++ fs/xfs/xfs_mount.h | 2 ++ fs/xfs/xfs_rtalloc.c | 1 + fs/xfs/xfs_trans.c | 4 ++++ 6 files changed, 41 insertions(+) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index bc51d3bfc7c4..9dd791181ca2 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -11,6 +11,9 @@ xfs_rtx_to_rtb( struct xfs_mount *mp, xfs_rtxnum_t rtx) { + if (mp->m_rtxblklog >= 0) + return rtx << mp->m_rtxblklog; + return rtx * mp->m_sb.sb_rextsize; } @@ -19,6 +22,9 @@ xfs_rtxlen_to_extlen( struct xfs_mount *mp, xfs_rtxlen_t rtxlen) { + if (mp->m_rtxblklog >= 0) + return rtxlen << mp->m_rtxblklog; + return rtxlen * mp->m_sb.sb_rextsize; } @@ -28,6 +34,9 @@ xfs_extlen_to_rtxmod( struct xfs_mount *mp, xfs_extlen_t len) { + if (mp->m_rtxblklog >= 0) + return len & mp->m_rtxblkmask; + return len % mp->m_sb.sb_rextsize; } @@ -36,6 +45,9 @@ xfs_extlen_to_rtxlen( struct xfs_mount *mp, xfs_extlen_t len) { + if (mp->m_rtxblklog >= 0) + return len >> mp->m_rtxblklog; + return len / mp->m_sb.sb_rextsize; } @@ -45,6 +57,11 @@ xfs_rtb_to_rtx( 
xfs_rtblock_t rtbno, xfs_extlen_t *mod) { + if (mp->m_rtxblklog >= 0) { + *mod = rtbno & mp->m_rtxblkmask; + return rtbno >> mp->m_rtxblklog; + } + return div_u64_rem(rtbno, mp->m_sb.sb_rextsize, mod); } @@ -53,6 +70,9 @@ xfs_rtb_to_rtxt( struct xfs_mount *mp, xfs_rtblock_t rtbno) { + if (mp->m_rtxblklog >= 0) + return rtbno >> mp->m_rtxblklog; + return div_u64(rtbno, mp->m_sb.sb_rextsize); } diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 8ebedfe55b15..83930abf935f 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -952,6 +952,8 @@ xfs_sb_mount_common( mp->m_blockmask = sbp->sb_blocksize - 1; mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG; mp->m_blockwmask = mp->m_blockwsize - 1; + mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize); + mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize); mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 1); mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 0); diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h index 2cdb3411aabb..7455cbadc262 100644 --- a/fs/xfs/xfs_linux.h +++ b/fs/xfs/xfs_linux.h @@ -200,6 +200,18 @@ static inline bool isaligned_64(uint64_t x, uint32_t y) return do_div(x, y) == 0; } +/* If @b is a power of 2, return log2(b). Else return -1. */ +static inline int8_t log2_if_power2(unsigned long b) +{ + return is_power_of_2(b) ? ilog2(b) : -1; +} + +/* If @b is a power of 2, return a mask of the lower bits, else return zero. */ +static inline unsigned long long mask64_if_power2(unsigned long b) +{ + return is_power_of_2(b) ? 
b - 1 : 0; +} + int xfs_rw_bdev(struct block_device *bdev, sector_t sector, unsigned int count, char *data, enum req_op op); diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 88fbbaee8806..bad926f3e102 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -118,6 +118,7 @@ typedef struct xfs_mount { uint8_t m_blkbb_log; /* blocklog - BBSHIFT */ uint8_t m_agno_log; /* log #ag's */ uint8_t m_sectbb_log; /* sectlog - BBSHIFT */ + int8_t m_rtxblklog; /* log2 of rextsize, if possible */ uint m_blockmask; /* sb_blocksize-1 */ uint m_blockwsize; /* sb_blocksize in words */ uint m_blockwmask; /* blockwsize-1 */ @@ -151,6 +152,7 @@ typedef struct xfs_mount { uint64_t m_features; /* active filesystem features */ uint64_t m_low_space[XFS_LOWSP_MAX]; uint64_t m_low_rtexts[XFS_LOWSP_MAX]; + uint64_t m_rtxblkmask; /* rt extent block mask */ struct xfs_ino_geometry m_ino_geo; /* inode geometry */ struct xfs_trans_resv m_resv; /* precomputed res values */ /* low free space thresholds */ diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index b74ba5e51cf8..3573dfef5dd7 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1052,6 +1052,7 @@ xfs_growfs_rt( * Calculate new sb and mount fields for this round. 
*/ nsbp->sb_rextsize = in->extsize; + nmp->m_rtxblklog = -1; /* don't use shift or masking */ nsbp->sb_rbmblocks = bmbno + 1; nrblocks_step = (bmbno + 1) * NBBY * nsbp->sb_blocksize * nsbp->sb_rextsize; diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 3e81826c9a0a..f39c5daeef86 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -710,6 +710,10 @@ xfs_trans_unreserve_and_mod_sb( mp->m_sb.sb_agcount += tp->t_agcount_delta; mp->m_sb.sb_imax_pct += tp->t_imaxpct_delta; mp->m_sb.sb_rextsize += tp->t_rextsize_delta; + if (tp->t_rextsize_delta) { + mp->m_rtxblklog = log2_if_power2(mp->m_sb.sb_rextsize); + mp->m_rtxblkmask = mask64_if_power2(mp->m_sb.sb_rextsize); + } mp->m_sb.sb_rbmblocks += tp->t_rbmblocks_delta; mp->m_sb.sb_rblocks += tp->t_rblocks_delta; mp->m_sb.sb_rextents += tp->t_rextents_delta; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 6/7] xfs: create rt extent rounding helpers for realtime extent blocks 2022-12-30 22:17 ` [PATCHSET v1.0 0/7] xfs: refactor rt extent unit conversions Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 7/7] xfs: use shifting and masking when converting rt extents, if possible Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 6 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a pair of functions to round rtblock numbers up or down to the nearest rt extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtbitmap.h | 18 ++++++++++++++++++ fs/xfs/xfs_bmap_util.c | 8 +++----- fs/xfs/xfs_rtalloc.c | 4 ++-- fs/xfs/xfs_xchgrange.c | 4 ++-- 4 files changed, 25 insertions(+), 9 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index bdd4858a794c..bc51d3bfc7c4 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -56,6 +56,24 @@ xfs_rtb_to_rtxt( return div_u64(rtbno, mp->m_sb.sb_rextsize); } +/* Round this rtblock up to the nearest rt extent size. */ +static inline xfs_rtblock_t +xfs_rtb_roundup_rtx( + struct xfs_mount *mp, + xfs_rtblock_t rtbno) +{ + return roundup_64(rtbno, mp->m_sb.sb_rextsize); +} + +/* Round this rtblock down to the nearest rt extent size. */ +static inline xfs_rtblock_t +xfs_rtb_rounddown_rtx( + struct xfs_mount *mp, + xfs_rtblock_t rtbno) +{ + return rounddown_64(rtbno, mp->m_sb.sb_rextsize); +} + /* * Functions for walking free space rtextents in the realtime bitmap. 
*/ diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index e595625048f8..1bfdd31723f5 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -685,7 +685,7 @@ xfs_can_free_eofblocks( */ end_fsb = XFS_B_TO_FSB(mp, (xfs_ufsize_t)XFS_ISIZE(ip)); if (XFS_IS_REALTIME_INODE(ip) && mp->m_sb.sb_rextsize > 1) - end_fsb = roundup_64(end_fsb, mp->m_sb.sb_rextsize); + end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb); last_fsb = XFS_B_TO_FSB(mp, mp->m_super->s_maxbytes); if (last_fsb <= end_fsb) return false; @@ -984,10 +984,8 @@ xfs_free_file_space( /* We can only free complete realtime extents. */ if (xfs_inode_has_bigrtextents(ip)) { - startoffset_fsb = roundup_64(startoffset_fsb, - mp->m_sb.sb_rextsize); - endoffset_fsb = rounddown_64(endoffset_fsb, - mp->m_sb.sb_rextsize); + startoffset_fsb = xfs_rtb_roundup_rtx(mp, startoffset_fsb); + endoffset_fsb = xfs_rtb_rounddown_rtx(mp, endoffset_fsb); } /* diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 1953a00755f4..b74ba5e51cf8 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1552,8 +1552,8 @@ xfs_rtfile_convert_unwritten( if (mp->m_sb.sb_rextsize == 1) return 0; - off = rounddown_64(XFS_B_TO_FSBT(mp, pos), mp->m_sb.sb_rextsize); - endoff = roundup_64(XFS_B_TO_FSB(mp, pos + len), mp->m_sb.sb_rextsize); + off = xfs_rtb_rounddown_rtx(mp, XFS_B_TO_FSBT(mp, pos)); + endoff = xfs_rtb_roundup_rtx(mp, XFS_B_TO_FSB(mp, pos + len)); trace_xfs_rtfile_convert_unwritten(ip, pos, len); diff --git a/fs/xfs/xfs_xchgrange.c b/fs/xfs/xfs_xchgrange.c index ae030a6f607e..829a17ac7406 100644 --- a/fs/xfs/xfs_xchgrange.c +++ b/fs/xfs/xfs_xchgrange.c @@ -29,6 +29,7 @@ #include "xfs_icache.h" #include "xfs_log.h" #include "xfs_rtalloc.h" +#include "xfs_rtbitmap.h" /* Lock (and optionally join) two inodes for a file range exchange. */ void @@ -802,8 +803,7 @@ xfs_xchg_range( * offsets and length in @fxr are safe to round up. 
*/ if (XFS_IS_REALTIME_INODE(ip2)) - req.blockcount = roundup_64(req.blockcount, - mp->m_sb.sb_rextsize); + req.blockcount = xfs_rtb_roundup_rtx(mp, req.blockcount); error = xfs_xchg_range_estimate(&req); if (error) ^ permalink raw reply related [flat|nested] 565+ messages in thread
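For readers following along outside the kernel tree, the rounding done by the two new helpers can be sketched in userspace C. This is a minimal illustration, not the kernel code: the names rtb_roundup_rtx and rtb_rounddown_rtx and the bare rextsize parameter are stand-ins for xfs_rtb_roundup_rtx(), xfs_rtb_rounddown_rtx() and mp->m_sb.sb_rextsize, and plain modulo arithmetic is assumed in place of roundup_64()/rounddown_64().

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-ins for the helpers in the patch above.
 * rextsize plays the role of mp->m_sb.sb_rextsize (blocks per rt extent)
 * and does not have to be a power of two.
 */

/* Round an rt block number down to the start of its rt extent. */
static inline uint64_t rtb_rounddown_rtx(uint64_t rtbno, uint32_t rextsize)
{
	assert(rextsize != 0);
	return rtbno - (rtbno % rextsize);
}

/* Round an rt block number up to the next rt extent boundary. */
static inline uint64_t rtb_roundup_rtx(uint64_t rtbno, uint32_t rextsize)
{
	uint64_t rem;

	assert(rextsize != 0);
	rem = rtbno % rextsize;
	/* Already aligned values are returned unchanged. */
	return rem ? rtbno + (rextsize - rem) : rtbno;
}
```

With a 4-block rt extent, block 10 rounds down to 8 and up to 12. This matches the "we can only free complete realtime extents" logic in xfs_free_file_space, where the start offset is rounded up and the end offset is rounded down.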
* [PATCHSET v1.0 0/8] xfs: refactor rtbitmap/summary macros
  2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
                   ` (4 preceding siblings ...)
  2022-12-30 22:17 ` [PATCHSET v1.0 0/7] xfs: refactor rt extent unit conversions Darrick J. Wong
@ 2022-12-30 22:17 ` Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 1/8] xfs: convert the rtbitmap block and bit macros to static inline functions Darrick J. Wong
                     ` (7 more replies)
  2022-12-30 22:17 ` [PATCHSET v1.0 0/3] xfs: refactor realtime meta inode locking Darrick J. Wong
                   ` (33 subsequent siblings)
  39 siblings, 8 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

Hi all,

In preparation for adding block headers and enforcing endian order in
rtbitmap and rtsummary blocks, replace open-coded geometry computations
and fugly macros with proper helper functions that can be typechecked.
Soon we'll be needing to add more complex logic to the helpers.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=refactor-rtbitmap-macros

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=refactor-rtbitmap-macros
---
 fs/xfs/libxfs/xfs_format.h      |   32 ++---
 fs/xfs/libxfs/xfs_rtbitmap.c    |  268 ++++++++++++++++++++++++++++-----------
 fs/xfs/libxfs/xfs_rtbitmap.h    |  133 +++++++++++++++++++
 fs/xfs/libxfs/xfs_trans_resv.c  |    9 +
 fs/xfs/libxfs/xfs_types.h       |    2
 fs/xfs/scrub/rtsummary.c        |   52 ++++----
 fs/xfs/scrub/rtsummary.h        |    6 -
 fs/xfs/scrub/rtsummary_repair.c |    7 +
 fs/xfs/scrub/trace.c            |    1
 fs/xfs/scrub/trace.h            |    4 -
 fs/xfs/xfs_ondisk.h             |    4 +
 fs/xfs/xfs_rtalloc.c            |   39 +++---
 12 files changed, 409 insertions(+), 148 deletions(-)
* [PATCH 1/8] xfs: convert the rtbitmap block and bit macros to static inline functions 2022-12-30 22:17 ` [PATCHSET v1.0 0/8] xfs: refactor rtbitmap/summary macros Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 8/8] xfs: use accessor functions for summary info words Darrick J. Wong ` (6 subsequent siblings) 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Replace these macros with typechecked helper functions. Eventually we're going to add more logic to the helpers and it'll be easier if we don't have to macro it up. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 5 ----- fs/xfs/libxfs/xfs_rtbitmap.c | 22 +++++++++++----------- fs/xfs/libxfs/xfs_rtbitmap.h | 27 +++++++++++++++++++++++++++ fs/xfs/scrub/rtsummary.c | 2 +- fs/xfs/xfs_rtalloc.c | 20 ++++++++++---------- 5 files changed, 49 insertions(+), 27 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index d93cc0ea20e3..6a3d684900ab 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1209,11 +1209,6 @@ static inline bool xfs_dinode_has_large_extent_counts( ((xfs_suminfo_t *)((bp)->b_addr + \ (((so) * (uint)sizeof(xfs_suminfo_t)) & XFS_BLOCKMASK(mp)))) -#define XFS_BITTOBLOCK(mp,bi) ((bi) >> (mp)->m_blkbit_log) -#define XFS_BLOCKTOBIT(mp,bb) ((bb) << (mp)->m_blkbit_log) -#define XFS_BITTOWORD(mp,bi) \ - ((int)(((bi) >> XFS_NBWORDLOG) & XFS_BLOCKWMASK(mp))) - #define XFS_RTMIN(a,b) ((a) < (b) ? (a) : (b)) #define XFS_RTMAX(a,b) ((a) > (b) ? (a) : (b)) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index de54386cf52f..ce8736666a1e 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -123,7 +123,7 @@ xfs_rtfind_back( /* * Compute and read in starting bitmap block for starting block. 
*/ - block = XFS_BITTOBLOCK(mp, start); + block = xfs_rtx_to_rbmblock(mp, start); error = xfs_rtbuf_get(mp, tp, block, 0, &bp); if (error) { return error; @@ -132,7 +132,7 @@ xfs_rtfind_back( /* * Get the first word's index & point to it. */ - word = XFS_BITTOWORD(mp, start); + word = xfs_rtx_to_rbmword(mp, start); b = &bufp[word]; bit = (int)(start & (XFS_NBWORD - 1)); len = start - limit + 1; @@ -298,7 +298,7 @@ xfs_rtfind_forw( /* * Compute and read in starting bitmap block for starting block. */ - block = XFS_BITTOBLOCK(mp, start); + block = xfs_rtx_to_rbmblock(mp, start); error = xfs_rtbuf_get(mp, tp, block, 0, &bp); if (error) { return error; @@ -307,7 +307,7 @@ xfs_rtfind_forw( /* * Get the first word's index & point to it. */ - word = XFS_BITTOWORD(mp, start); + word = xfs_rtx_to_rbmword(mp, start); b = &bufp[word]; bit = (int)(start & (XFS_NBWORD - 1)); len = limit - start + 1; @@ -559,7 +559,7 @@ xfs_rtmodify_range( /* * Compute starting bitmap block number. */ - block = XFS_BITTOBLOCK(mp, start); + block = xfs_rtx_to_rbmblock(mp, start); /* * Read the bitmap block, and point to its data. */ @@ -571,7 +571,7 @@ xfs_rtmodify_range( /* * Compute the starting word's address, and starting bit. 
*/ - word = XFS_BITTOWORD(mp, start); + word = xfs_rtx_to_rbmword(mp, start); first = b = &bufp[word]; bit = (int)(start & (XFS_NBWORD - 1)); /* @@ -737,7 +737,7 @@ xfs_rtfree_range( if (preblock < start) { error = xfs_rtmodify_summary(mp, tp, XFS_RTBLOCKLOG(start - preblock), - XFS_BITTOBLOCK(mp, preblock), -1, rbpp, rsb); + xfs_rtx_to_rbmblock(mp, preblock), -1, rbpp, rsb); if (error) { return error; } @@ -749,7 +749,7 @@ xfs_rtfree_range( if (postblock > end) { error = xfs_rtmodify_summary(mp, tp, XFS_RTBLOCKLOG(postblock - end), - XFS_BITTOBLOCK(mp, end + 1), -1, rbpp, rsb); + xfs_rtx_to_rbmblock(mp, end + 1), -1, rbpp, rsb); if (error) { return error; } @@ -760,7 +760,7 @@ xfs_rtfree_range( */ error = xfs_rtmodify_summary(mp, tp, XFS_RTBLOCKLOG(postblock + 1 - preblock), - XFS_BITTOBLOCK(mp, preblock), 1, rbpp, rsb); + xfs_rtx_to_rbmblock(mp, preblock), 1, rbpp, rsb); return error; } @@ -793,7 +793,7 @@ xfs_rtcheck_range( /* * Compute starting bitmap block number */ - block = XFS_BITTOBLOCK(mp, start); + block = xfs_rtx_to_rbmblock(mp, start); /* * Read the bitmap block. */ @@ -805,7 +805,7 @@ xfs_rtcheck_range( /* * Compute the starting word's address, and starting bit. */ - word = XFS_BITTOWORD(mp, start); + word = xfs_rtx_to_rbmword(mp, start); b = &bufp[word]; bit = (int)(start & (XFS_NBWORD - 1)); /* diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index 9dd791181ca2..e53011bc638d 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -94,6 +94,33 @@ xfs_rtb_rounddown_rtx( return rounddown_64(rtbno, mp->m_sb.sb_rextsize); } +/* Convert an rt extent number to a file block offset in the rt bitmap file. */ +static inline xfs_fileoff_t +xfs_rtx_to_rbmblock( + struct xfs_mount *mp, + xfs_rtxnum_t rtx) +{ + return rtx >> mp->m_blkbit_log; +} + +/* Convert an rt extent number to a word offset within an rt bitmap block. 
*/ +static inline unsigned int +xfs_rtx_to_rbmword( + struct xfs_mount *mp, + xfs_rtxnum_t rtx) +{ + return (rtx >> XFS_NBWORDLOG) & XFS_BLOCKWMASK(mp); +} + +/* Convert a file block offset in the rt bitmap file to an rt extent number. */ +static inline xfs_rtxnum_t +xfs_rbmblock_to_rtx( + struct xfs_mount *mp, + xfs_fileoff_t rbmoff) +{ + return rbmoff << mp->m_blkbit_log; +} + /* * Functions for walking free space rtextents in the realtime bitmap. */ diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c index 7885ddd2d733..fd6fb905904b 100644 --- a/fs/xfs/scrub/rtsummary.c +++ b/fs/xfs/scrub/rtsummary.c @@ -134,7 +134,7 @@ xchk_rtsum_record_free( return error; /* Compute the relevant location in the rtsum file. */ - rbmoff = XFS_BITTOBLOCK(mp, rec->ar_startext); + rbmoff = xfs_rtx_to_rbmblock(mp, rec->ar_startext); lenlog = XFS_RTBLOCKLOG(rec->ar_extcount); offs = XFS_SUMOFFS(mp, lenlog, rbmoff); diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 3573dfef5dd7..c63906cf94d1 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -182,7 +182,7 @@ xfs_rtallocate_range( */ error = xfs_rtmodify_summary(mp, tp, XFS_RTBLOCKLOG(postblock + 1 - preblock), - XFS_BITTOBLOCK(mp, preblock), -1, rbpp, rsb); + xfs_rtx_to_rbmblock(mp, preblock), -1, rbpp, rsb); if (error) { return error; } @@ -193,7 +193,7 @@ xfs_rtallocate_range( if (preblock < start) { error = xfs_rtmodify_summary(mp, tp, XFS_RTBLOCKLOG(start - preblock), - XFS_BITTOBLOCK(mp, preblock), 1, rbpp, rsb); + xfs_rtx_to_rbmblock(mp, preblock), 1, rbpp, rsb); if (error) { return error; } @@ -205,7 +205,7 @@ xfs_rtallocate_range( if (postblock > end) { error = xfs_rtmodify_summary(mp, tp, XFS_RTBLOCKLOG(postblock - end), - XFS_BITTOBLOCK(mp, end + 1), 1, rbpp, rsb); + xfs_rtx_to_rbmblock(mp, end + 1), 1, rbpp, rsb); if (error) { return error; } @@ -249,8 +249,8 @@ xfs_rtallocate_extent_block( * Loop over all the extents starting in this bitmap block, * looking for one that's long enough. 
*/ - for (i = XFS_BLOCKTOBIT(mp, bbno), besti = -1, bestlen = 0, - end = XFS_BLOCKTOBIT(mp, bbno + 1) - 1; + for (i = xfs_rbmblock_to_rtx(mp, bbno), besti = -1, bestlen = 0, + end = xfs_rbmblock_to_rtx(mp, bbno + 1) - 1; i <= end; i++) { /* @@ -487,7 +487,7 @@ xfs_rtallocate_extent_near( *rtx = r; return 0; } - bbno = XFS_BITTOBLOCK(mp, start); + bbno = xfs_rtx_to_rbmblock(mp, start); i = 0; ASSERT(minlen != 0); log2len = xfs_highbit32(minlen); @@ -706,8 +706,8 @@ xfs_rtallocate_extent_size( * allocator is beyond the next bitmap block, * skip to that bitmap block. */ - if (XFS_BITTOBLOCK(mp, n) > i + 1) - i = XFS_BITTOBLOCK(mp, n) - 1; + if (xfs_rtx_to_rbmblock(mp, n) > i + 1) + i = xfs_rtx_to_rbmblock(mp, n) - 1; } } /* @@ -769,8 +769,8 @@ xfs_rtallocate_extent_size( * allocator is beyond the next bitmap block, * skip to that bitmap block. */ - if (XFS_BITTOBLOCK(mp, n) > i + 1) - i = XFS_BITTOBLOCK(mp, n) - 1; + if (xfs_rtx_to_rbmblock(mp, n) > i + 1) + i = xfs_rtx_to_rbmblock(mp, n) - 1; } } /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
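To make the arithmetic behind the three new conversion helpers concrete, here is a small userspace sketch. It is an illustration under stated assumptions rather than the kernel code: struct rt_geom is a hypothetical stand-in for the two struct xfs_mount fields involved (m_blkbit_log and the words-per-block count behind XFS_BLOCKWMASK), NBWORDLOG is assumed to be 5 for 32-bit bitmap words, and the function names simply drop the xfs_ prefix.

```c
#include <assert.h>
#include <stdint.h>

#define NBWORDLOG	5	/* assumed: log2(bits per 32-bit bitmap word) */

/* Hypothetical stand-in for the xfs_mount geometry fields used here. */
struct rt_geom {
	unsigned int	blkbit_log;	/* log2(bits per bitmap block) */
	unsigned int	blockwsize;	/* words per bitmap block, power of 2 */
};

/* Which rt bitmap file block holds the bit for this rt extent? */
static inline uint64_t
rtx_to_rbmblock(const struct rt_geom *g, uint64_t rtx)
{
	return rtx >> g->blkbit_log;
}

/* Which word within that bitmap block holds the bit? */
static inline unsigned int
rtx_to_rbmword(const struct rt_geom *g, uint64_t rtx)
{
	return (unsigned int)((rtx >> NBWORDLOG) & (g->blockwsize - 1));
}

/* First rt extent tracked by the given rt bitmap file block. */
static inline uint64_t
rbmblock_to_rtx(const struct rt_geom *g, uint64_t rbmoff)
{
	return rbmoff << g->blkbit_log;
}
```

Assuming 4096-byte blocks (32768 bits and 1024 words per bitmap block), rt extent 100000 lives in bitmap block 3, word 53, and bitmap block 3 starts at rt extent 98304. Unlike the old macros, passing the arguments in the wrong order to these functions is something the compiler can flag, which is the point of the conversion.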
* [PATCH 8/8] xfs: use accessor functions for summary info words 2022-12-30 22:17 ` [PATCHSET v1.0 0/8] xfs: refactor rtbitmap/summary macros Darrick J. Wong 2022-12-30 22:17 ` [PATCH 1/8] xfs: convert the rtbitmap block and bit macros to static inline functions Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 2/8] xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros Darrick J. Wong ` (5 subsequent siblings) 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create get and set functions for rtsummary words so that we can redefine the ondisk format with a specific endianness. Note that this requires the definition of a distinct type for ondisk summary info words so that the compiler can perform proper typechecking. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 8 ++++++++ fs/xfs/libxfs/xfs_rtbitmap.c | 27 ++++++++++++++++++++++----- fs/xfs/libxfs/xfs_rtbitmap.h | 10 +++++++--- fs/xfs/scrub/rtsummary.c | 22 ++++++++++++---------- fs/xfs/scrub/rtsummary.h | 2 +- fs/xfs/scrub/trace.c | 1 + fs/xfs/scrub/trace.h | 4 ++-- fs/xfs/xfs_ondisk.h | 1 + 8 files changed, 54 insertions(+), 21 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 14da972f5508..946870eb492c 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -725,6 +725,14 @@ union xfs_rtword_ondisk { __u32 raw; }; +/* + * Realtime summary counts are accessed by the word, which is currently + * stored in host-endian format. 
+ */ +union xfs_suminfo_ondisk { + __u32 raw; +}; + /* * XFS Timestamps * ============== diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index be5c793da46c..b74261abd238 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -466,6 +466,23 @@ xfs_rtfind_forw( return 0; } +inline xfs_suminfo_t +xfs_suminfo_get( + struct xfs_mount *mp, + union xfs_suminfo_ondisk *infoptr) +{ + return infoptr->raw; +} + +inline void +xfs_suminfo_add( + struct xfs_mount *mp, + union xfs_suminfo_ondisk *infoptr, + int delta) +{ + infoptr->raw += delta; +} + /* * Read and/or modify the summary information for a given extent size, * bitmap block combination. @@ -490,7 +507,7 @@ xfs_rtmodify_summary_int( int error; /* error value */ xfs_fileoff_t sb; /* summary fsblock */ xfs_rtsumoff_t so; /* index into the summary file */ - xfs_suminfo_t *sp; /* pointer to returned data */ + union xfs_suminfo_ondisk *sp; /* pointer to returned data */ unsigned int infoword; /* @@ -533,17 +550,17 @@ xfs_rtmodify_summary_int( if (delta) { uint first = (uint)((char *)sp - (char *)bp->b_addr); - *sp += delta; + xfs_suminfo_add(mp, sp, delta); if (mp->m_rsum_cache) { - if (*sp == 0 && log == mp->m_rsum_cache[bbno]) + if (sp->raw == 0 && log == mp->m_rsum_cache[bbno]) mp->m_rsum_cache[bbno]++; - if (*sp != 0 && log < mp->m_rsum_cache[bbno]) + if (sp->raw != 0 && log < mp->m_rsum_cache[bbno]) mp->m_rsum_cache[bbno] = log; } xfs_trans_log_buf(tp, bp, first, first + sizeof(*sp) - 1); } if (sum) - *sum = *sp; + *sum = xfs_suminfo_get(mp, sp); return 0; } diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index a66357cf002b..749c8e3ec4cb 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -181,18 +181,18 @@ xfs_rtsumoffs_to_infoword( } /* Return a pointer to a summary info word within a rt summary block buffer. 
*/ -static inline xfs_suminfo_t * +static inline union xfs_suminfo_ondisk * xfs_rsumbuf_infoptr( void *buf, unsigned int infoword) { - xfs_suminfo_t *infop = buf; + union xfs_suminfo_ondisk *infop = buf; return &infop[infoword]; } /* Return a pointer to a summary info word within a rt summary block. */ -static inline xfs_suminfo_t * +static inline union xfs_suminfo_ondisk * xfs_rsumblock_infoptr( struct xfs_buf *bp, unsigned int infoword) @@ -275,6 +275,10 @@ xfs_filblks_t xfs_rtsummary_blockcount(struct xfs_mount *mp, unsigned int rsumlevels, xfs_extlen_t rbmblocks); unsigned long long xfs_rtsummary_wordcount(struct xfs_mount *mp, unsigned int rsumlevels, xfs_extlen_t rbmblocks); +xfs_suminfo_t xfs_suminfo_get(struct xfs_mount *mp, + union xfs_suminfo_ondisk *infoptr); +void xfs_suminfo_add(struct xfs_mount *mp, union xfs_suminfo_ondisk *infoptr, + int delta); #else /* CONFIG_XFS_RT */ # define xfs_rtfree_extent(t,b,l) (-ENOSYS) # define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c index e51c3d10e501..ca9153e646c9 100644 --- a/fs/xfs/scrub/rtsummary.c +++ b/fs/xfs/scrub/rtsummary.c @@ -89,9 +89,10 @@ static inline int xfsum_load( struct xfs_scrub *sc, xfs_rtsumoff_t sumoff, - xfs_suminfo_t *info) + union xfs_suminfo_ondisk *rawinfo) { - return xfile_obj_load(sc->xfile, info, sizeof(xfs_suminfo_t), + return xfile_obj_load(sc->xfile, rawinfo, + sizeof(union xfs_suminfo_ondisk), sumoff << XFS_WORDLOG); } @@ -99,9 +100,10 @@ static inline int xfsum_store( struct xfs_scrub *sc, xfs_rtsumoff_t sumoff, - const xfs_suminfo_t info) + const union xfs_suminfo_ondisk rawinfo) { - return xfile_obj_store(sc->xfile, &info, sizeof(xfs_suminfo_t), + return xfile_obj_store(sc->xfile, &rawinfo, + sizeof(union xfs_suminfo_ondisk), sumoff << XFS_WORDLOG); } @@ -109,10 +111,10 @@ inline int xfsum_copyout( struct xfs_scrub *sc, xfs_rtsumoff_t sumoff, - xfs_suminfo_t *info, + union xfs_suminfo_ondisk *rawinfo, unsigned int 
nr_words) { - return xfile_obj_load(sc->xfile, info, nr_words << XFS_WORDLOG, + return xfile_obj_load(sc->xfile, rawinfo, nr_words << XFS_WORDLOG, sumoff << XFS_WORDLOG); } @@ -130,7 +132,7 @@ xchk_rtsum_record_free( xfs_filblks_t rtlen; xfs_rtsumoff_t offs; unsigned int lenlog; - xfs_suminfo_t v = 0; + union xfs_suminfo_ondisk v; int error = 0; if (xchk_should_terminate(sc, &error)) @@ -154,9 +156,9 @@ xchk_rtsum_record_free( if (error) return error; - v++; + xfs_suminfo_add(mp, &v, 1); trace_xchk_rtsum_record_free(mp, rec->ar_startext, rec->ar_extcount, - lenlog, offs, v); + lenlog, offs, &v); return xfsum_store(sc, offs, v); } @@ -191,7 +193,7 @@ xchk_rtsum_compare( int nmap; for (off = 0; off < XFS_B_TO_FSB(mp, mp->m_rsumsize); off++) { - xfs_suminfo_t *ondisk_info; + union xfs_suminfo_ondisk *ondisk_info; int error = 0; if (xchk_should_terminate(sc, &error)) diff --git a/fs/xfs/scrub/rtsummary.h b/fs/xfs/scrub/rtsummary.h index f5fd55992957..aca13556b3a2 100644 --- a/fs/xfs/scrub/rtsummary.h +++ b/fs/xfs/scrub/rtsummary.h @@ -9,6 +9,6 @@ typedef unsigned int xfs_rtsumoff_t; int xfsum_copyout(struct xfs_scrub *sc, xfs_rtsumoff_t sumoff, - xfs_suminfo_t *info, unsigned int nr_words); + union xfs_suminfo_ondisk *info, unsigned int nr_words); #endif /* __XFS_SCRUB_RTSUMMARY_H__ */ diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c index 2e36fcc12e40..bb13f0a8e4cf 100644 --- a/fs/xfs/scrub/trace.c +++ b/fs/xfs/scrub/trace.c @@ -19,6 +19,7 @@ #include "xfs_da_format.h" #include "xfs_btree_mem.h" #include "xfs_rmap.h" +#include "xfs_rtbitmap.h" #include "scrub/scrub.h" #include "scrub/xfile.h" #include "scrub/xfarray.h" diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 650a4c88ebc4..749cf4333c8a 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -1083,7 +1083,7 @@ TRACE_EVENT(xfarray_sort_stats, TRACE_EVENT(xchk_rtsum_record_free, TP_PROTO(struct xfs_mount *mp, xfs_rtxnum_t start, xfs_rtbxlen_t len, unsigned int log, loff_t pos, - 
xfs_suminfo_t v), + union xfs_suminfo_ondisk *v), TP_ARGS(mp, start, len, log, pos, v), TP_STRUCT__entry( __field(dev_t, dev) @@ -1101,7 +1101,7 @@ TRACE_EVENT(xchk_rtsum_record_free, __entry->len = len; __entry->log = log; __entry->pos = pos; - __entry->v = v; + __entry->v = xfs_suminfo_get(mp, v); ), TP_printk("dev %d:%d rtdev %d:%d rtx 0x%llx rtxcount 0x%llx log %u rsumpos 0x%llx sumcount %u", MAJOR(__entry->dev), MINOR(__entry->dev), diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h index 7f20642b073e..f4d700ce185c 100644 --- a/fs/xfs/xfs_ondisk.h +++ b/fs/xfs/xfs_ondisk.h @@ -74,6 +74,7 @@ xfs_check_ondisk_structs(void) /* realtime structures */ XFS_CHECK_STRUCT_SIZE(union xfs_rtword_ondisk, 4); + XFS_CHECK_STRUCT_SIZE(union xfs_suminfo_ondisk, 4); /* * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to ^ permalink raw reply related [flat|nested] 565+ messages in thread
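The accessor pattern in this patch can be sketched in isolation as follows. This is a hedged userspace illustration: union suminfo_ondisk, suminfo_get and suminfo_add are simplified stand-ins for the kernel's union xfs_suminfo_ondisk, xfs_suminfo_get() and xfs_suminfo_add(), the mount pointer argument is dropped, and the words are assumed to stay host-endian as the patch comment says they currently are.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Distinct wrapper type for an ondisk summary counter. Because it is not
 * a plain integer, code that tries to read or bump the counter directly
 * (for example "*sp += delta") no longer compiles and has to go through
 * the accessors below.
 */
union suminfo_ondisk {
	uint32_t	raw;	/* currently host-endian */
};

/* Read the incore value of an ondisk summary counter. */
static inline uint32_t
suminfo_get(const union suminfo_ondisk *infoptr)
{
	return infoptr->raw;
}

/* Adjust an ondisk summary counter by a signed delta. */
static inline void
suminfo_add(union suminfo_ondisk *infoptr, int delta)
{
	/* Unsigned wraparound makes a negative delta subtract. */
	infoptr->raw += (uint32_t)delta;
}
```

If the ondisk format is later given a fixed endianness, only the bodies of these two functions would need to change in this sketch; callers would be untouched.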
* [PATCH 2/8] xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros 2022-12-30 22:17 ` [PATCHSET v1.0 0/8] xfs: refactor rtbitmap/summary macros Darrick J. Wong 2022-12-30 22:17 ` [PATCH 1/8] xfs: convert the rtbitmap block and bit macros to static inline functions Darrick J. Wong 2022-12-30 22:17 ` [PATCH 8/8] xfs: use accessor functions for summary info words Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 7/8] xfs: create helpers for rtsummary block/wordcount computations Darrick J. Wong ` (4 subsequent siblings) 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Remove these trivial macros since they're not even part of the ondisk format. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 2 -- fs/xfs/libxfs/xfs_rtbitmap.c | 16 ++++++++-------- fs/xfs/libxfs/xfs_rtbitmap.h | 2 +- 3 files changed, 9 insertions(+), 11 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 6a3d684900ab..a4278c8fba5f 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1196,8 +1196,6 @@ static inline bool xfs_dinode_has_large_extent_counts( #define XFS_BLOCKSIZE(mp) ((mp)->m_sb.sb_blocksize) #define XFS_BLOCKMASK(mp) ((mp)->m_blockmask) -#define XFS_BLOCKWSIZE(mp) ((mp)->m_blockwsize) -#define XFS_BLOCKWMASK(mp) ((mp)->m_blockwmask) /* * RT Summary and bit manipulation macros. 
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index ce8736666a1e..1f4886287aad 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -181,7 +181,7 @@ xfs_rtfind_back( return error; } bufp = bp->b_addr; - word = XFS_BLOCKWMASK(mp); + word = mp->m_blockwsize - 1; b = &bufp[word]; } else { /* @@ -227,7 +227,7 @@ xfs_rtfind_back( return error; } bufp = bp->b_addr; - word = XFS_BLOCKWMASK(mp); + word = mp->m_blockwsize - 1; b = &bufp[word]; } else { /* @@ -345,7 +345,7 @@ xfs_rtfind_forw( * Go on to next block if that's where the next word is * and we need the next word. */ - if (++word == XFS_BLOCKWSIZE(mp) && i < len) { + if (++word == mp->m_blockwsize && i < len) { /* * If done with this block, get the previous one. */ @@ -390,7 +390,7 @@ xfs_rtfind_forw( * Go on to next block if that's where the next word is * and we need the next word. */ - if (++word == XFS_BLOCKWSIZE(mp) && i < len) { + if (++word == mp->m_blockwsize && i < len) { /* * If done with this block, get the next one. */ @@ -600,7 +600,7 @@ xfs_rtmodify_range( * Go on to the next block if that's where the next word is * and we need the next word. */ - if (++word == XFS_BLOCKWSIZE(mp) && i < len) { + if (++word == mp->m_blockwsize && i < len) { /* * Log the changed part of this block. * Get the next one. @@ -640,7 +640,7 @@ xfs_rtmodify_range( * Go on to the next block if that's where the next word is * and we need the next word. */ - if (++word == XFS_BLOCKWSIZE(mp) && i < len) { + if (++word == mp->m_blockwsize && i < len) { /* * Log the changed part of this block. * Get the next one. @@ -843,7 +843,7 @@ xfs_rtcheck_range( * Go on to next block if that's where the next word is * and we need the next word. */ - if (++word == XFS_BLOCKWSIZE(mp) && i < len) { + if (++word == mp->m_blockwsize && i < len) { /* * If done with this block, get the next one. 
*/ @@ -889,7 +889,7 @@ xfs_rtcheck_range( * Go on to next block if that's where the next word is * and we need the next word. */ - if (++word == XFS_BLOCKWSIZE(mp) && i < len) { + if (++word == mp->m_blockwsize && i < len) { /* * If done with this block, get the next one. */ diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index e53011bc638d..5f4a453e29eb 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -109,7 +109,7 @@ xfs_rtx_to_rbmword( struct xfs_mount *mp, xfs_rtxnum_t rtx) { - return (rtx >> XFS_NBWORDLOG) & XFS_BLOCKWMASK(mp); + return (rtx >> XFS_NBWORDLOG) & (mp->m_blockwsize - 1); } /* Convert a file block offset in the rt bitmap file to an rt extent number. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
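The open-coded replacements in this patch rely on two facts: the last word of a bitmap block has index m_blockwsize - 1, and stepping past it means moving to word 0 of the next block. A minimal userspace sketch of that stepping logic is shown below. The struct rbm_cursor type and both function names are hypothetical, and the sketch leaves out the buffer reads and the "i < len" check that the real loops perform.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cursor: a position within the rt bitmap file. */
struct rbm_cursor {
	uint64_t	block;	/* bitmap file block */
	unsigned int	word;	/* word index within that block */
};

/* Index of the last word in a bitmap block (what XFS_BLOCKWMASK gave). */
static inline unsigned int
rbm_last_word(unsigned int blockwsize)
{
	return blockwsize - 1;
}

/* Advance to the next word, wrapping into the next block if needed. */
static inline void
rbm_cursor_next(struct rbm_cursor *c, unsigned int blockwsize)
{
	if (++c->word == blockwsize) {
		c->block++;
		c->word = 0;
	}
}
```

With 1024 words per block, advancing from word 1023 of block 0 lands on word 0 of block 1.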
* [PATCH 7/8] xfs: create helpers for rtsummary block/wordcount computations 2022-12-30 22:17 ` [PATCHSET v1.0 0/8] xfs: refactor rtbitmap/summary macros Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 2/8] xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 3/8] xfs: convert open-coded xfs_rtword_t pointer accesses to helper Darrick J. Wong ` (3 subsequent siblings) 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create helper functions that compute the number of blocks or words necessary to store the rt summary file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtbitmap.c | 29 +++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtbitmap.h | 7 +++++++ fs/xfs/scrub/rtsummary.c | 7 +++++-- fs/xfs/xfs_rtalloc.c | 17 +++++++---------- 4 files changed, 48 insertions(+), 12 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index b2b1a1aec342..be5c793da46c 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -1205,3 +1205,32 @@ xfs_rtbitmap_wordcount( blocks = xfs_rtbitmap_blockcount(mp, rtextents); return XFS_FSB_TO_B(mp, blocks) >> XFS_WORDLOG; } + +/* Compute the number of rtsummary blocks needed to track the given rt space. */ +xfs_filblks_t +xfs_rtsummary_blockcount( + struct xfs_mount *mp, + unsigned int rsumlevels, + xfs_extlen_t rbmblocks) +{ + unsigned long long rsumwords; + + rsumwords = (unsigned long long)rsumlevels * rbmblocks; + return XFS_B_TO_FSB(mp, rsumwords << XFS_WORDLOG); +} + +/* + * Compute the number of rtsummary info words needed to populate every block of + * a summary file that is large enough to track the given rt space. 
+ */ +unsigned long long +xfs_rtsummary_wordcount( + struct xfs_mount *mp, + unsigned int rsumlevels, + xfs_extlen_t rbmblocks) +{ + xfs_filblks_t blocks; + + blocks = xfs_rtsummary_blockcount(mp, rsumlevels, rbmblocks); + return XFS_FSB_TO_B(mp, blocks) >> XFS_WORDLOG; +} diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index 0a3c6299af8e..a66357cf002b 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -270,6 +270,11 @@ xfs_rtword_t xfs_rtbitmap_getword(struct xfs_mount *mp, union xfs_rtword_ondisk *wordptr); void xfs_rtbitmap_setword(struct xfs_mount *mp, union xfs_rtword_ondisk *wordptr, xfs_rtword_t incore); + +xfs_filblks_t xfs_rtsummary_blockcount(struct xfs_mount *mp, + unsigned int rsumlevels, xfs_extlen_t rbmblocks); +unsigned long long xfs_rtsummary_wordcount(struct xfs_mount *mp, + unsigned int rsumlevels, xfs_extlen_t rbmblocks); #else /* CONFIG_XFS_RT */ # define xfs_rtfree_extent(t,b,l) (-ENOSYS) # define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) @@ -284,6 +289,8 @@ xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents) return 0; } # define xfs_rtbitmap_wordcount(mp, r) (0) +# define xfs_rtsummary_blockcount(mp, l, b) (0) +# define xfs_rtsummary_wordcount(mp, l, b) (0) #endif /* CONFIG_XFS_RT */ #endif /* __XFS_RTBITMAP_H__ */ diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c index dfbbaab5a734..e51c3d10e501 100644 --- a/fs/xfs/scrub/rtsummary.c +++ b/fs/xfs/scrub/rtsummary.c @@ -40,6 +40,7 @@ xchk_setup_rtsummary( { struct xfs_mount *mp = sc->mp; size_t bufsize = mp->m_sb.sb_blocksize; + unsigned int wordcnt; unsigned int resblks = 0; int error; @@ -53,8 +54,10 @@ xchk_setup_rtsummary( * Create an xfile to construct a new rtsummary file. The xfile allows * us to avoid pinning kernel memory for this purpose. 
*/ - error = xfile_create(mp, "realtime summary file", mp->m_rsumsize, - &sc->xfile); + wordcnt = xfs_rtsummary_wordcount(mp, mp->m_rsumlevels, + mp->m_sb.sb_rbmblocks); + error = xfile_create(mp, "realtime summary file", + wordcnt << XFS_WORDLOG, &sc->xfile); if (error) return error; diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index a64f7abe6409..11c42ebfa0a5 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -999,8 +999,7 @@ xfs_growfs_rt( nrbmblocks = xfs_rtbitmap_blockcount(mp, nrextents); nrextslog = xfs_highbit32(nrextents); nrsumlevels = nrextslog + 1; - nrsumsize = (uint)sizeof(xfs_suminfo_t) * nrsumlevels * nrbmblocks; - nrsumblocks = XFS_B_TO_FSB(mp, nrsumsize); + nrsumblocks = xfs_rtsummary_blockcount(mp, nrsumlevels, nrbmblocks); nrsumsize = XFS_FSB_TO_B(mp, nrsumblocks); /* * New summary size can't be more than half the size of @@ -1061,10 +1060,8 @@ xfs_growfs_rt( ASSERT(nsbp->sb_rextents != 0); nsbp->sb_rextslog = xfs_highbit32(nsbp->sb_rextents); nrsumlevels = nmp->m_rsumlevels = nsbp->sb_rextslog + 1; - nrsumsize = - (uint)sizeof(xfs_suminfo_t) * nrsumlevels * - nsbp->sb_rbmblocks; - nrsumblocks = XFS_B_TO_FSB(mp, nrsumsize); + nrsumblocks = xfs_rtsummary_blockcount(mp, nrsumlevels, + nsbp->sb_rbmblocks); nmp->m_rsumsize = nrsumsize = XFS_FSB_TO_B(mp, nrsumblocks); /* * Start a transaction, get the log reservation. 
@@ -1270,6 +1267,7 @@ xfs_rtmount_init( struct xfs_buf *bp; /* buffer for last block of subvolume */ struct xfs_sb *sbp; /* filesystem superblock copy in mount */ xfs_daddr_t d; /* address of last block of subvolume */ + unsigned int rsumblocks; int error; sbp = &mp->m_sb; @@ -1281,10 +1279,9 @@ xfs_rtmount_init( return -ENODEV; } mp->m_rsumlevels = sbp->sb_rextslog + 1; - mp->m_rsumsize = - (uint)sizeof(xfs_suminfo_t) * mp->m_rsumlevels * - sbp->sb_rbmblocks; - mp->m_rsumsize = roundup(mp->m_rsumsize, sbp->sb_blocksize); + rsumblocks = xfs_rtsummary_blockcount(mp, mp->m_rsumlevels, + mp->m_sb.sb_rbmblocks); + mp->m_rsumsize = XFS_FSB_TO_B(mp, rsumblocks); mp->m_rbmip = mp->m_rsumip = NULL; /* * Check that the realtime section is an ok size. ^ permalink raw reply related [flat|nested] 565+ messages in thread
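The sizing computation that this patch centralizes can be worked through in userspace C. The sketch below is an approximation under stated assumptions: blocklog stands in for the filesystem block size (as log2 bytes) that XFS_B_TO_FSB and XFS_FSB_TO_B take from the mount, WORDLOG is assumed to be 2 for 4-byte summary words, and the function names drop the xfs_ prefix.

```c
#include <assert.h>
#include <stdint.h>

#define WORDLOG	2	/* assumed: log2(bytes per summary info word) */

/*
 * Number of fs blocks needed for the rt summary file: one info word per
 * (summary level, bitmap block) pair, rounded up to whole blocks.
 */
static inline uint64_t
rtsummary_blockcount(unsigned int blocklog, unsigned int rsumlevels,
		     uint32_t rbmblocks)
{
	uint64_t	rsumwords = (uint64_t)rsumlevels * rbmblocks;
	uint64_t	bytes = rsumwords << WORDLOG;
	uint64_t	blockmask = ((uint64_t)1 << blocklog) - 1;

	return (bytes + blockmask) >> blocklog;
}

/* Number of info words needed to fill every block of that file. */
static inline uint64_t
rtsummary_wordcount(unsigned int blocklog, unsigned int rsumlevels,
		    uint32_t rbmblocks)
{
	uint64_t	blocks;

	blocks = rtsummary_blockcount(blocklog, rsumlevels, rbmblocks);
	return (blocks << blocklog) >> WORDLOG;
}
```

For example, with 4096-byte blocks, 21 summary levels and 100 bitmap blocks, the file needs 2100 words (8400 bytes), which rounds up to 3 blocks holding 3072 words. The wordcount is deliberately the padded figure, since the scrub code sizes its xfile to cover every block of the summary file.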
* [PATCH 3/8] xfs: convert open-coded xfs_rtword_t pointer accesses to helper 2022-12-30 22:17 ` [PATCHSET v1.0 0/8] xfs: refactor rtbitmap/summary macros Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 7/8] xfs: create helpers for rtsummary block/wordcount computations Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 5/8] xfs: create helpers for rtbitmap block/wordcount computations Darrick J. Wong ` (2 subsequent siblings) 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> There are a bunch of places where we use open-coded logic to find a pointer to an xfs_rtword_t within a rt bitmap buffer. Convert all that to helper functions for better type safety. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtbitmap.c | 59 ++++++++++++++++++++++-------------------- fs/xfs/libxfs/xfs_rtbitmap.h | 20 ++++++++++++++ 2 files changed, 51 insertions(+), 28 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 1f4886287aad..231622a5ab68 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -110,7 +110,6 @@ xfs_rtfind_back( int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ - xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ xfs_rtxnum_t firstbit; /* first useful bit in the word */ xfs_rtxnum_t i; /* current bit number rel. to start */ @@ -128,12 +127,12 @@ xfs_rtfind_back( if (error) { return error; } - bufp = bp->b_addr; + /* * Get the first word's index & point to it. 
*/ word = xfs_rtx_to_rbmword(mp, start); - b = &bufp[word]; + b = xfs_rbmblock_wordptr(bp, word); bit = (int)(start & (XFS_NBWORD - 1)); len = start - limit + 1; /* @@ -180,9 +179,9 @@ xfs_rtfind_back( if (error) { return error; } - bufp = bp->b_addr; + word = mp->m_blockwsize - 1; - b = &bufp[word]; + b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the previous word in the buffer. @@ -226,9 +225,9 @@ xfs_rtfind_back( if (error) { return error; } - bufp = bp->b_addr; + word = mp->m_blockwsize - 1; - b = &bufp[word]; + b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the previous word in the buffer. @@ -285,7 +284,6 @@ xfs_rtfind_forw( int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ - xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ xfs_rtxnum_t i; /* current bit number rel. to start */ xfs_rtxnum_t lastbit; /* last useful bit in the word */ @@ -303,12 +301,12 @@ xfs_rtfind_forw( if (error) { return error; } - bufp = bp->b_addr; + /* * Get the first word's index & point to it. */ word = xfs_rtx_to_rbmword(mp, start); - b = &bufp[word]; + b = xfs_rbmblock_wordptr(bp, word); bit = (int)(start & (XFS_NBWORD - 1)); len = limit - start + 1; /* @@ -354,8 +352,9 @@ xfs_rtfind_forw( if (error) { return error; } - b = bufp = bp->b_addr; + word = 0; + b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the previous word in the buffer. @@ -399,8 +398,9 @@ xfs_rtfind_forw( if (error) { return error; } - b = bufp = bp->b_addr; + word = 0; + b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the next word in the buffer. 
@@ -548,7 +548,6 @@ xfs_rtmodify_range( int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ - xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ xfs_rtword_t *first; /* first used word in the buffer */ int i; /* current bit number rel. to start */ @@ -567,12 +566,12 @@ xfs_rtmodify_range( if (error) { return error; } - bufp = bp->b_addr; + /* * Compute the starting word's address, and starting bit. */ word = xfs_rtx_to_rbmword(mp, start); - first = b = &bufp[word]; + first = b = xfs_rbmblock_wordptr(bp, word); bit = (int)(start & (XFS_NBWORD - 1)); /* * 0 (allocated) => all zeroes; 1 (free) => all ones. @@ -606,14 +605,15 @@ xfs_rtmodify_range( * Get the next one. */ xfs_trans_log_buf(tp, bp, - (uint)((char *)first - (char *)bufp), - (uint)((char *)b - (char *)bufp)); + (uint)((char *)first - (char *)bp->b_addr), + (uint)((char *)b - (char *)bp->b_addr)); error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp); if (error) { return error; } - first = b = bufp = bp->b_addr; + word = 0; + first = b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the next word in the buffer @@ -646,14 +646,15 @@ xfs_rtmodify_range( * Get the next one. */ xfs_trans_log_buf(tp, bp, - (uint)((char *)first - (char *)bufp), - (uint)((char *)b - (char *)bufp)); + (uint)((char *)first - (char *)bp->b_addr), + (uint)((char *)b - (char *)bp->b_addr)); error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp); if (error) { return error; } - first = b = bufp = bp->b_addr; + word = 0; + first = b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the next word in the buffer @@ -683,8 +684,9 @@ xfs_rtmodify_range( * Log any remaining changed bytes. 
*/ if (b > first) - xfs_trans_log_buf(tp, bp, (uint)((char *)first - (char *)bufp), - (uint)((char *)b - (char *)bufp - 1)); + xfs_trans_log_buf(tp, bp, + (uint)((char *)first - (char *)bp->b_addr), + (uint)((char *)b - (char *)bp->b_addr - 1)); return 0; } @@ -782,7 +784,6 @@ xfs_rtcheck_range( int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ - xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ xfs_rtxnum_t i; /* current bit number rel. to start */ xfs_rtxnum_t lastbit; /* last useful bit in word */ @@ -801,12 +802,12 @@ xfs_rtcheck_range( if (error) { return error; } - bufp = bp->b_addr; + /* * Compute the starting word's address, and starting bit. */ word = xfs_rtx_to_rbmword(mp, start); - b = &bufp[word]; + b = xfs_rbmblock_wordptr(bp, word); bit = (int)(start & (XFS_NBWORD - 1)); /* * 0 (allocated) => all zero's; 1 (free) => all one's. @@ -852,8 +853,9 @@ xfs_rtcheck_range( if (error) { return error; } - b = bufp = bp->b_addr; + word = 0; + b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the next word in the buffer. @@ -898,8 +900,9 @@ xfs_rtcheck_range( if (error) { return error; } - b = bufp = bp->b_addr; + word = 0; + b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the next word in the buffer. diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index 5f4a453e29eb..af37afec2b01 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -121,6 +121,26 @@ xfs_rbmblock_to_rtx( return rbmoff << mp->m_blkbit_log; } +/* Return a pointer to a bitmap word within a rt bitmap block buffer. */ +static inline xfs_rtword_t * +xfs_rbmbuf_wordptr( + void *buf, + unsigned int rbmword) +{ + xfs_rtword_t *wordp = buf; + + return &wordp[rbmword]; +} + +/* Return a pointer to a bitmap word within a rt bitmap block. 
*/ +static inline xfs_rtword_t * +xfs_rbmblock_wordptr( + struct xfs_buf *bp, + unsigned int rbmword) +{ + return xfs_rbmbuf_wordptr(bp->b_addr, rbmword); +} + /* * Functions for walking free space rtextents in the realtime bitmap. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
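For readers following along outside the kernel tree, the pointer helpers added by this patch can be modelled in userspace. This is a minimal sketch, not the kernel code: `struct demo_buf`, `demo_rtword_t` and every `demo_*` function are stand-ins invented here for `struct xfs_buf`, `xfs_rtword_t` and the new helpers. It shows only that the helper returns the same address as the open-coded `&bufp[word]`, and that the byte offsets passed to the logging call can be derived from `b_addr` without a cached `bufp`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t demo_rtword_t;		/* stand-in for xfs_rtword_t */

struct demo_buf {			/* stand-in for struct xfs_buf */
	void	*b_addr;		/* start of the block's data */
};

/* Model of xfs_rbmbuf_wordptr(): index a raw buffer by bitmap word. */
static inline demo_rtword_t *
demo_rbmbuf_wordptr(void *buf, unsigned int rbmword)
{
	demo_rtword_t	*wordp = buf;

	return &wordp[rbmword];
}

/* Model of xfs_rbmblock_wordptr(): same thing, starting from the buffer. */
static inline demo_rtword_t *
demo_rbmblock_wordptr(struct demo_buf *bp, unsigned int rbmword)
{
	return demo_rbmbuf_wordptr(bp->b_addr, rbmword);
}

/* Byte offset of a word within the block, as used for logging ranges. */
static inline unsigned int
demo_word_byte_offset(struct demo_buf *bp, demo_rtword_t *b)
{
	return (unsigned int)((char *)b - (char *)bp->b_addr);
}
```

With a buffer of eight words, `demo_rbmblock_wordptr(&bp, 3)` points at the fourth word and its byte offset is `3 * sizeof(demo_rtword_t)`.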
* [PATCH 5/8] xfs: create helpers for rtbitmap block/wordcount computations 2022-12-30 22:17 ` [PATCHSET v1.0 0/8] xfs: refactor rtbitmap/summary macros Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 3/8] xfs: convert open-coded xfs_rtword_t pointer accesses to helper Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 6/8] xfs: use accessor functions for bitmap words Darrick J. Wong 2022-12-30 22:17 ` [PATCH 4/8] xfs: convert rt summary macros to helpers Darrick J. Wong 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create helper functions that compute the number of blocks or words necessary to store the rt bitmap. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtbitmap.c | 27 +++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtbitmap.h | 12 ++++++++++++ fs/xfs/libxfs/xfs_trans_resv.c | 9 +++++---- fs/xfs/scrub/rtsummary.c | 7 +++---- fs/xfs/xfs_rtalloc.c | 2 +- 5 files changed, 48 insertions(+), 9 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index b6a1d240c554..2a453f0215ee 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -1142,3 +1142,30 @@ xfs_rtalloc_extent_is_free( *is_free = matches; return 0; } + +/* + * Compute the number of rtbitmap blocks needed to track the given number of rt + * extents. + */ +xfs_filblks_t +xfs_rtbitmap_blockcount( + struct xfs_mount *mp, + xfs_rtbxlen_t rtextents) +{ + return howmany_64(rtextents, NBBY * mp->m_sb.sb_blocksize); +} + +/* + * Compute the number of rtbitmap words needed to populate every block of a + * bitmap that is large enough to track the given number of rt extents. 
+ */ +unsigned long long +xfs_rtbitmap_wordcount( + struct xfs_mount *mp, + xfs_rtbxlen_t rtextents) +{ + xfs_filblks_t blocks; + + blocks = xfs_rtbitmap_blockcount(mp, rtextents); + return XFS_FSB_TO_B(mp, blocks) >> XFS_WORDLOG; +} diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index f616956b2891..308ce814a908 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -261,6 +261,11 @@ xfs_rtfree_extent( /* Same as above, but in units of rt blocks. */ int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno, xfs_filblks_t rtlen); + +xfs_filblks_t xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t + rtextents); +unsigned long long xfs_rtbitmap_wordcount(struct xfs_mount *mp, + xfs_rtbxlen_t rtextents); #else /* CONFIG_XFS_RT */ # define xfs_rtfree_extent(t,b,l) (-ENOSYS) # define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) @@ -268,6 +273,13 @@ int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno, # define xfs_rtalloc_query_all(m,t,f,p) (-ENOSYS) # define xfs_rtbuf_get(m,t,b,i,p) (-ENOSYS) # define xfs_rtalloc_extent_is_free(m,t,s,l,i) (-ENOSYS) +static inline xfs_filblks_t +xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents) +{ + /* shut up gcc */ + return 0; +} +# define xfs_rtbitmap_wordcount(mp, r) (0) #endif /* CONFIG_XFS_RT */ #endif /* __XFS_RTBITMAP_H__ */ diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c index dd924842716d..67accd613038 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -218,11 +218,12 @@ xfs_rtalloc_block_count( struct xfs_mount *mp, unsigned int num_ops) { - unsigned int blksz = XFS_FSB_TO_B(mp, 1); - unsigned int rtbmp_bytes; + unsigned int rtbmp_blocks; + xfs_rtxlen_t rtxlen; - rtbmp_bytes = xfs_extlen_to_rtxlen(mp, XFS_MAX_BMBT_EXTLEN) / NBBY; - return (howmany(rtbmp_bytes, blksz) + 1) * num_ops; + rtxlen = xfs_extlen_to_rtxlen(mp, XFS_MAX_BMBT_EXTLEN); + rtbmp_blocks = 
xfs_rtbitmap_blockcount(mp, rtxlen); + return (rtbmp_blocks + 1) * num_ops; } /* diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c index 98baca261202..dfbbaab5a734 100644 --- a/fs/xfs/scrub/rtsummary.c +++ b/fs/xfs/scrub/rtsummary.c @@ -164,12 +164,11 @@ xchk_rtsum_compute( struct xfs_scrub *sc) { struct xfs_mount *mp = sc->mp; - unsigned long long rtbmp_bytes; + unsigned long long rtbmp_blocks; /* If the bitmap size doesn't match the computed size, bail. */ - rtbmp_bytes = howmany_64(mp->m_sb.sb_rextents, NBBY); - if (roundup_64(rtbmp_bytes, mp->m_sb.sb_blocksize) != - mp->m_rbmip->i_disk_size) + rtbmp_blocks = xfs_rtbitmap_blockcount(mp, mp->m_sb.sb_rextents); + if (XFS_FSB_TO_B(mp, rtbmp_blocks) != mp->m_rbmip->i_disk_size) return -EFSCORRUPTED; return xfs_rtalloc_query_all(sc->mp, sc->tp, xchk_rtsum_record_free, diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index c63906cf94d1..a64f7abe6409 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -996,7 +996,7 @@ xfs_growfs_rt( */ nrextents = nrblocks; do_div(nrextents, in->extsize); - nrbmblocks = howmany_64(nrextents, NBBY * sbp->sb_blocksize); + nrbmblocks = xfs_rtbitmap_blockcount(mp, nrextents); nrextslog = xfs_highbit32(nrextents); nrsumlevels = nrextslog + 1; nrsumsize = (uint)sizeof(xfs_suminfo_t) * nrsumlevels * nrbmblocks; ^ permalink raw reply related [flat|nested] 565+ messages in thread
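The arithmetic behind the two new helpers can be checked in isolation. The sketch below is a userspace model under stated assumptions: the `demo_*` names are invented here, the block size is passed in directly instead of being read from `mp->m_sb.sb_blocksize`, and the last function reproduces the open-coded scrub computation that this patch replaces so that the two can be compared.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NBBY	8	/* bits per byte */
#define DEMO_WORDLOG	2	/* log2(sizeof(uint32_t)), like XFS_WORDLOG */

/* Round-up division, like howmany_64(). */
static inline uint64_t
demo_howmany_64(uint64_t x, uint64_t y)
{
	return (x + y - 1) / y;
}

/* Blocks needed for a bitmap with one bit per rt extent. */
static inline uint64_t
demo_rtbitmap_blockcount(uint32_t blocksize, uint64_t rtextents)
{
	return demo_howmany_64(rtextents, (uint64_t)DEMO_NBBY * blocksize);
}

/* 32-bit words needed to fill every one of those blocks. */
static inline uint64_t
demo_rtbitmap_wordcount(uint32_t blocksize, uint64_t rtextents)
{
	uint64_t	blocks = demo_rtbitmap_blockcount(blocksize, rtextents);

	return (blocks * blocksize) >> DEMO_WORDLOG;
}

/* The replaced scrub computation: bytes, rounded up to a block multiple. */
static inline uint64_t
demo_old_bitmap_bytes(uint32_t blocksize, uint64_t rtextents)
{
	uint64_t	bytes = demo_howmany_64(rtextents, DEMO_NBBY);

	return demo_howmany_64(bytes, blocksize) * blocksize;
}
```

With 4096-byte blocks each bitmap block tracks 32768 extents, so 32768 extents need one block and 32769 need two. The old byte-based size and `blockcount * blocksize` agree because rounding up twice (to bytes, then to blocks) gives the same result as rounding up once to `NBBY * blocksize`.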
* [PATCH 6/8] xfs: use accessor functions for bitmap words 2022-12-30 22:17 ` [PATCHSET v1.0 0/8] xfs: refactor rtbitmap/summary macros Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 5/8] xfs: create helpers for rtbitmap block/wordcount computations Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 4/8] xfs: convert rt summary macros to helpers Darrick J. Wong 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create get and set functions for rtbitmap words so that we can redefine the ondisk format with a specific endianness. Note that this requires the definition of a distinct type for ondisk rtbitmap words so that the compiler can perform proper typechecking as we go back and forth. In the upcoming rtgroups feature, we're going to fix the problem that rtwords are written in host endian order, which means we'll need the distinct rtword/rtword_raw types. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 8 ++++ fs/xfs/libxfs/xfs_rtbitmap.c | 78 +++++++++++++++++++++++++++++++----------- fs/xfs/libxfs/xfs_rtbitmap.h | 10 ++++- fs/xfs/xfs_ondisk.h | 3 ++ 4 files changed, 75 insertions(+), 24 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index d95497c064fc..14da972f5508 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -717,6 +717,14 @@ struct xfs_agfl { ASSERT(xfs_daddr_to_agno(mp, d) == \ xfs_daddr_to_agno(mp, (d) + (len) - 1))) +/* + * Realtime bitmap information is accessed by the word, which is currently + * stored in host-endian format. 
+ */ +union xfs_rtword_ondisk { + __u32 raw; +}; + /* * XFS Timestamps * ============== diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 2a453f0215ee..b2b1a1aec342 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -94,6 +94,25 @@ xfs_rtbuf_get( return 0; } +/* Convert an ondisk bitmap word to its incore representation. */ +inline xfs_rtword_t +xfs_rtbitmap_getword( + struct xfs_mount *mp, + union xfs_rtword_ondisk *wordptr) +{ + return wordptr->raw; +} + +/* Set an ondisk bitmap word from an incore representation. */ +inline void +xfs_rtbitmap_setword( + struct xfs_mount *mp, + union xfs_rtword_ondisk *wordptr, + xfs_rtword_t incore) +{ + wordptr->raw = incore; +} + /* * Searching backward from start to limit, find the first block whose * allocated/free state is different from start's. @@ -106,7 +125,7 @@ xfs_rtfind_back( xfs_rtxnum_t limit, /* last rtext to look at */ xfs_rtxnum_t *rtx) /* out: start rtext found */ { - xfs_rtword_t *b; /* current word in buffer */ + union xfs_rtword_ondisk *b; /* current word in buffer */ int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ @@ -117,6 +136,7 @@ xfs_rtfind_back( xfs_rtword_t mask; /* mask of relevant bits for value */ xfs_rtword_t want; /* mask for "good" values */ xfs_rtword_t wdiff; /* difference from wanted value */ + xfs_rtword_t incore; int word; /* word number in the buffer */ /* @@ -139,7 +159,8 @@ xfs_rtfind_back( * Compute match value, based on the bit at start: if 1 (free) * then all-ones, else all-zeroes. */ - want = (*b & ((xfs_rtword_t)1 << bit)) ? -1 : 0; + incore = xfs_rtbitmap_getword(mp, b); + want = (incore & ((xfs_rtword_t)1 << bit)) ? -1 : 0; /* * If the starting position is not word-aligned, deal with the * partial word. @@ -156,7 +177,7 @@ xfs_rtfind_back( * Calculate the difference between the value there * and what we're looking for. 
*/ - if ((wdiff = (*b ^ want) & mask)) { + if ((wdiff = (incore ^ want) & mask)) { /* * Different. Mark where we are and return. */ @@ -202,7 +223,8 @@ xfs_rtfind_back( /* * Compute difference between actual and desired value. */ - if ((wdiff = *b ^ want)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = incore ^ want)) { /* * Different, mark where we are and return. */ @@ -249,7 +271,8 @@ xfs_rtfind_back( /* * Compute difference between actual and desired value. */ - if ((wdiff = (*b ^ want) & mask)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = (incore ^ want) & mask)) { /* * Different, mark where we are and return. */ @@ -280,7 +303,7 @@ xfs_rtfind_forw( xfs_rtxnum_t limit, /* last rtext to look at */ xfs_rtxnum_t *rtx) /* out: start rtext found */ { - xfs_rtword_t *b; /* current word in buffer */ + union xfs_rtword_ondisk *b; /* current word in buffer */ int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ @@ -291,6 +314,7 @@ xfs_rtfind_forw( xfs_rtword_t mask; /* mask of relevant bits for value */ xfs_rtword_t want; /* mask for "good" values */ xfs_rtword_t wdiff; /* difference from wanted value */ + xfs_rtword_t incore; int word; /* word number in the buffer */ /* @@ -313,7 +337,8 @@ xfs_rtfind_forw( * Compute match value, based on the bit at start: if 1 (free) * then all-ones, else all-zeroes. */ - want = (*b & ((xfs_rtword_t)1 << bit)) ? -1 : 0; + incore = xfs_rtbitmap_getword(mp, b); + want = (incore & ((xfs_rtword_t)1 << bit)) ? -1 : 0; /* * If the starting position is not word-aligned, deal with the * partial word. @@ -329,7 +354,7 @@ xfs_rtfind_forw( * Calculate the difference between the value there * and what we're looking for. */ - if ((wdiff = (*b ^ want) & mask)) { + if ((wdiff = (incore ^ want) & mask)) { /* * Different. Mark where we are and return. */ @@ -375,7 +400,8 @@ xfs_rtfind_forw( /* * Compute difference between actual and desired value. 
*/ - if ((wdiff = *b ^ want)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = incore ^ want)) { /* * Different, mark where we are and return. */ @@ -420,7 +446,8 @@ xfs_rtfind_forw( /* * Compute difference between actual and desired value. */ - if ((wdiff = (*b ^ want) & mask)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = (incore ^ want) & mask)) { /* * Different, mark where we are and return. */ @@ -546,15 +573,16 @@ xfs_rtmodify_range( xfs_rtxlen_t len, /* length of extent to modify */ int val) /* 1 for free, 0 for allocated */ { - xfs_rtword_t *b; /* current word in buffer */ + union xfs_rtword_ondisk *b; /* current word in buffer */ int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ int error; /* error value */ - xfs_rtword_t *first; /* first used word in the buffer */ + union xfs_rtword_ondisk *first; /* first used word in the buffer */ int i; /* current bit number rel. to start */ int lastbit; /* last useful bit in word */ xfs_rtword_t mask; /* mask o frelevant bits for value */ + xfs_rtword_t incore; int word; /* word number in the buffer */ /* @@ -592,10 +620,12 @@ xfs_rtmodify_range( /* * Set/clear the active bits. */ + incore = xfs_rtbitmap_getword(mp, b); if (val) - *b |= mask; + incore |= mask; else - *b &= ~mask; + incore &= ~mask; + xfs_rtbitmap_setword(mp, b, incore); i = lastbit - bit; /* * Go on to the next block if that's where the next word is @@ -636,7 +666,7 @@ xfs_rtmodify_range( /* * Set the word value correctly. */ - *b = val; + xfs_rtbitmap_setword(mp, b, val); i += XFS_NBWORD; /* * Go on to the next block if that's where the next word is @@ -676,10 +706,12 @@ xfs_rtmodify_range( /* * Set/clear the active bits. 
*/ + incore = xfs_rtbitmap_getword(mp, b); if (val) - *b |= mask; + incore |= mask; else - *b &= ~mask; + incore &= ~mask; + xfs_rtbitmap_setword(mp, b, incore); b++; } /* @@ -782,7 +814,7 @@ xfs_rtcheck_range( xfs_rtxnum_t *new, /* out: first rtext not matching */ int *stat) /* out: 1 for matches, 0 for not */ { - xfs_rtword_t *b; /* current word in buffer */ + union xfs_rtword_ondisk *b; /* current word in buffer */ int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ @@ -791,6 +823,7 @@ xfs_rtcheck_range( xfs_rtxnum_t lastbit; /* last useful bit in word */ xfs_rtword_t mask; /* mask of relevant bits for value */ xfs_rtword_t wdiff; /* difference from wanted value */ + xfs_rtword_t incore; int word; /* word number in the buffer */ /* @@ -831,7 +864,8 @@ xfs_rtcheck_range( /* * Compute difference between actual and desired value. */ - if ((wdiff = (*b ^ val) & mask)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = (incore ^ val) & mask)) { /* * Different, compute first wrong bit and return. */ @@ -878,7 +912,8 @@ xfs_rtcheck_range( /* * Compute difference between actual and desired value. */ - if ((wdiff = *b ^ val)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = incore ^ val)) { /* * Different, compute first wrong bit and return. */ @@ -924,7 +959,8 @@ xfs_rtcheck_range( /* * Compute difference between actual and desired value. */ - if ((wdiff = (*b ^ val) & mask)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = (incore ^ val) & mask)) { /* * Different, compute first wrong bit and return. */ diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index 308ce814a908..0a3c6299af8e 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -122,18 +122,18 @@ xfs_rbmblock_to_rtx( } /* Return a pointer to a bitmap word within a rt bitmap block buffer. 
*/ -static inline xfs_rtword_t * +static inline union xfs_rtword_ondisk * xfs_rbmbuf_wordptr( void *buf, unsigned int rbmword) { - xfs_rtword_t *wordp = buf; + union xfs_rtword_ondisk *wordp = buf; return &wordp[rbmword]; } /* Return a pointer to a bitmap word within a rt bitmap block. */ -static inline xfs_rtword_t * +static inline union xfs_rtword_ondisk * xfs_rbmblock_wordptr( struct xfs_buf *bp, unsigned int rbmword) @@ -266,6 +266,10 @@ xfs_filblks_t xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents); unsigned long long xfs_rtbitmap_wordcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents); +xfs_rtword_t xfs_rtbitmap_getword(struct xfs_mount *mp, + union xfs_rtword_ondisk *wordptr); +void xfs_rtbitmap_setword(struct xfs_mount *mp, + union xfs_rtword_ondisk *wordptr, xfs_rtword_t incore); #else /* CONFIG_XFS_RT */ # define xfs_rtfree_extent(t,b,l) (-ENOSYS) # define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h index 1e71d27f0cae..7f20642b073e 100644 --- a/fs/xfs/xfs_ondisk.h +++ b/fs/xfs/xfs_ondisk.h @@ -72,6 +72,9 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(xfs_attr_leaf_map_t, 4); XFS_CHECK_STRUCT_SIZE(xfs_attr_leaf_name_local_t, 4); + /* realtime structures */ + XFS_CHECK_STRUCT_SIZE(union xfs_rtword_ondisk, 4); + /* * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to * 4 bytes anyway so it's not obviously a problem. Hence for the moment ^ permalink raw reply related [flat|nested] 565+ messages in thread
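The get/modify/set pattern that replaces `*b |= mask` can be sketched in userspace as follows. Everything named `demo_*` is invented for illustration. In particular, the `fixed_endian` switch and the choice of big-endian byte order are assumptions made here only to show why funnelling all accesses through accessors helps; the patch itself still stores the word in host order and does not say which byte order the later rtgroups format will use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t demo_rtword_t;		/* incore word, like xfs_rtword_t */

union demo_rtword_ondisk {		/* model of union xfs_rtword_ondisk */
	uint32_t	raw;
};

/* Convert an ondisk word to its incore value. */
static inline demo_rtword_t
demo_getword(bool fixed_endian, const union demo_rtword_ondisk *w)
{
	if (fixed_endian) {
		/* HYPOTHETICAL: decode as big-endian, for illustration only. */
		const uint8_t	*p = (const uint8_t *)&w->raw;

		return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
	}
	return w->raw;			/* host-endian, as in this patch */
}

/* Store an incore value into an ondisk word. */
static inline void
demo_setword(bool fixed_endian, union demo_rtword_ondisk *w,
	     demo_rtword_t incore)
{
	if (fixed_endian) {
		/* HYPOTHETICAL: encode as big-endian, for illustration only. */
		uint8_t		*p = (uint8_t *)&w->raw;

		p[0] = (uint8_t)(incore >> 24);
		p[1] = (uint8_t)(incore >> 16);
		p[2] = (uint8_t)(incore >> 8);
		p[3] = (uint8_t)incore;
		return;
	}
	w->raw = incore;
}

/* Read-modify-write, mirroring the converted xfs_rtmodify_range() hunks. */
static inline void
demo_modify_bits(bool fixed_endian, union demo_rtword_ondisk *w,
		 demo_rtword_t mask, int val)
{
	demo_rtword_t	incore = demo_getword(fixed_endian, w);

	if (val)
		incore |= mask;
	else
		incore &= ~mask;
	demo_setword(fixed_endian, w, incore);
}
```

Callers never touch `raw` directly, so changing the ondisk encoding only means changing the two accessors. The distinct union type also makes the compiler reject any leftover `*b |= mask` on an ondisk pointer.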
* [PATCH 4/8] xfs: convert rt summary macros to helpers 2022-12-30 22:17 ` [PATCHSET v1.0 0/8] xfs: refactor rtbitmap/summary macros Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 6/8] xfs: use accessor functions for bitmap words Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert the realtime summary file macros to helper functions so that we can improve type checking. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 9 +----- fs/xfs/libxfs/xfs_rtbitmap.c | 10 ++++--- fs/xfs/libxfs/xfs_rtbitmap.h | 59 +++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_types.h | 2 + fs/xfs/scrub/rtsummary.c | 16 ++++++----- fs/xfs/scrub/rtsummary.h | 4 +-- fs/xfs/scrub/rtsummary_repair.c | 7 +++-- 7 files changed, 83 insertions(+), 24 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index a4278c8fba5f..d95497c064fc 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1198,15 +1198,8 @@ static inline bool xfs_dinode_has_large_extent_counts( #define XFS_BLOCKMASK(mp) ((mp)->m_blockmask) /* - * RT Summary and bit manipulation macros. + * RT bit manipulation macros. */ -#define XFS_SUMOFFS(mp,ls,bb) ((int)((ls) * (mp)->m_sb.sb_rbmblocks + (bb))) -#define XFS_SUMOFFSTOBLOCK(mp,s) \ - (((s) * (uint)sizeof(xfs_suminfo_t)) >> (mp)->m_sb.sb_blocklog) -#define XFS_SUMPTR(mp,bp,so) \ - ((xfs_suminfo_t *)((bp)->b_addr + \ - (((so) * (uint)sizeof(xfs_suminfo_t)) & XFS_BLOCKMASK(mp)))) - #define XFS_RTMIN(a,b) ((a) < (b) ? (a) : (b)) #define XFS_RTMAX(a,b) ((a) > (b) ? 
(a) : (b)) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 231622a5ab68..b6a1d240c554 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -462,17 +462,18 @@ xfs_rtmodify_summary_int( struct xfs_buf *bp; /* buffer for the summary block */ int error; /* error value */ xfs_fileoff_t sb; /* summary fsblock */ - int so; /* index into the summary file */ + xfs_rtsumoff_t so; /* index into the summary file */ xfs_suminfo_t *sp; /* pointer to returned data */ + unsigned int infoword; /* * Compute entry number in the summary file. */ - so = XFS_SUMOFFS(mp, log, bbno); + so = xfs_rtsumoffs(mp, log, bbno); /* * Compute the block number in the summary file. */ - sb = XFS_SUMOFFSTOBLOCK(mp, so); + sb = xfs_rtsumoffs_to_block(mp, so); /* * If we have an old buffer, and the block number matches, use that. */ @@ -500,7 +501,8 @@ xfs_rtmodify_summary_int( /* * Point to the summary information, modify/log it, and/or copy it out. */ - sp = XFS_SUMPTR(mp, bp, so); + infoword = xfs_rtsumoffs_to_infoword(mp, so); + sp = xfs_rsumblock_infoptr(bp, infoword); if (delta) { uint first = (uint)((char *)sp - (char *)bp->b_addr); diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index af37afec2b01..f616956b2891 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -141,6 +141,65 @@ xfs_rbmblock_wordptr( return xfs_rbmbuf_wordptr(bp->b_addr, rbmword); } +/* + * Convert a rt extent length and rt bitmap block number to a xfs_suminfo_t + * offset within the rt summary file. + */ +static inline xfs_rtsumoff_t +xfs_rtsumoffs( + struct xfs_mount *mp, + int log2_len, + xfs_fileoff_t rbmoff) +{ + return log2_len * mp->m_sb.sb_rbmblocks + rbmoff; +} + +/* + * Convert an xfs_suminfo_t offset to a file block offset within the rt summary + * file. 
+ */ +static inline xfs_fileoff_t +xfs_rtsumoffs_to_block( + struct xfs_mount *mp, + xfs_rtsumoff_t rsumoff) +{ + return XFS_B_TO_FSBT(mp, rsumoff * sizeof(xfs_suminfo_t)); +} + +/* + * Convert an xfs_suminfo_t offset to an info word offset within an rt summary + * block. + */ +static inline unsigned int +xfs_rtsumoffs_to_infoword( + struct xfs_mount *mp, + xfs_rtsumoff_t rsumoff) +{ + unsigned int mask = mp->m_blockmask >> XFS_SUMINFOLOG; + + return rsumoff & mask; +} + +/* Return a pointer to a summary info word within a rt summary block buffer. */ +static inline xfs_suminfo_t * +xfs_rsumbuf_infoptr( + void *buf, + unsigned int infoword) +{ + xfs_suminfo_t *infop = buf; + + return &infop[infoword]; +} + +/* Return a pointer to a summary info word within a rt summary block. */ +static inline xfs_suminfo_t * +xfs_rsumblock_infoptr( + struct xfs_buf *bp, + unsigned int infoword) +{ + return xfs_rsumbuf_infoptr(bp->b_addr, infoword); +} + /* * Functions for walking free space rtextents in the realtime bitmap. 
*/ diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h index abb07a1c7b0b..f4615c5be34f 100644 --- a/fs/xfs/libxfs/xfs_types.h +++ b/fs/xfs/libxfs/xfs_types.h @@ -19,6 +19,7 @@ typedef int64_t xfs_fsize_t; /* bytes in a file */ typedef uint64_t xfs_ufsize_t; /* unsigned bytes in a file */ typedef int32_t xfs_suminfo_t; /* type of bitmap summary info */ +typedef uint32_t xfs_rtsumoff_t; /* offset of an rtsummary info word */ typedef uint32_t xfs_rtword_t; /* word type for bitmap manipulations */ typedef int64_t xfs_lsn_t; /* log sequence number */ @@ -151,6 +152,7 @@ typedef uint32_t xfs_dqid_t; */ #define XFS_NBBYLOG 3 /* log2(NBBY) */ #define XFS_WORDLOG 2 /* log2(sizeof(xfs_rtword_t)) */ +#define XFS_SUMINFOLOG 2 /* log2(sizeof(xfs_suminfo_t)) */ #define XFS_NBWORDLOG (XFS_NBBYLOG + XFS_WORDLOG) #define XFS_NBWORD (1 << XFS_NBWORDLOG) #define XFS_WORDMASK ((1 << XFS_WORDLOG) - 1) diff --git a/fs/xfs/scrub/rtsummary.c b/fs/xfs/scrub/rtsummary.c index fd6fb905904b..98baca261202 100644 --- a/fs/xfs/scrub/rtsummary.c +++ b/fs/xfs/scrub/rtsummary.c @@ -85,7 +85,7 @@ xchk_setup_rtsummary( static inline int xfsum_load( struct xfs_scrub *sc, - xchk_rtsumoff_t sumoff, + xfs_rtsumoff_t sumoff, xfs_suminfo_t *info) { return xfile_obj_load(sc->xfile, info, sizeof(xfs_suminfo_t), @@ -95,7 +95,7 @@ xfsum_load( static inline int xfsum_store( struct xfs_scrub *sc, - xchk_rtsumoff_t sumoff, + xfs_rtsumoff_t sumoff, const xfs_suminfo_t info) { return xfile_obj_store(sc->xfile, &info, sizeof(xfs_suminfo_t), @@ -105,7 +105,7 @@ xfsum_store( inline int xfsum_copyout( struct xfs_scrub *sc, - xchk_rtsumoff_t sumoff, + xfs_rtsumoff_t sumoff, xfs_suminfo_t *info, unsigned int nr_words) { @@ -125,7 +125,7 @@ xchk_rtsum_record_free( xfs_fileoff_t rbmoff; xfs_rtxnum_t rtbno; xfs_filblks_t rtlen; - xchk_rtsumoff_t offs; + xfs_rtsumoff_t offs; unsigned int lenlog; xfs_suminfo_t v = 0; int error = 0; @@ -136,7 +136,7 @@ xchk_rtsum_record_free( /* Compute the relevant location in 
the rtsum file. */ rbmoff = xfs_rtx_to_rbmblock(mp, rec->ar_startext); lenlog = XFS_RTBLOCKLOG(rec->ar_extcount); - offs = XFS_SUMOFFS(mp, lenlog, rbmoff); + offs = xfs_rtsumoffs(mp, lenlog, rbmoff); rtbno = xfs_rtx_to_rtb(mp, rec->ar_startext); rtlen = xfs_rtx_to_rtb(mp, rec->ar_extcount); @@ -185,10 +185,11 @@ xchk_rtsum_compare( struct xfs_buf *bp; struct xfs_bmbt_irec map; xfs_fileoff_t off; - xchk_rtsumoff_t sumoff = 0; + xfs_rtsumoff_t sumoff = 0; int nmap; for (off = 0; off < XFS_B_TO_FSB(mp, mp->m_rsumsize); off++) { + xfs_suminfo_t *ondisk_info; int error = 0; if (xchk_should_terminate(sc, &error)) @@ -220,7 +221,8 @@ xchk_rtsum_compare( return error; } - if (memcmp(bp->b_addr, sc->buf, + ondisk_info = xfs_rsumblock_infoptr(bp, 0); + if (memcmp(ondisk_info, sc->buf, mp->m_blockwsize << XFS_WORDLOG) != 0) xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, off); diff --git a/fs/xfs/scrub/rtsummary.h b/fs/xfs/scrub/rtsummary.h index e5f3c69c4cbf..f5fd55992957 100644 --- a/fs/xfs/scrub/rtsummary.h +++ b/fs/xfs/scrub/rtsummary.h @@ -6,9 +6,9 @@ #ifndef __XFS_SCRUB_RTSUMMARY_H__ #define __XFS_SCRUB_RTSUMMARY_H__ -typedef unsigned int xchk_rtsumoff_t; +typedef unsigned int xfs_rtsumoff_t; -int xfsum_copyout(struct xfs_scrub *sc, xchk_rtsumoff_t sumoff, +int xfsum_copyout(struct xfs_scrub *sc, xfs_rtsumoff_t sumoff, xfs_suminfo_t *info, unsigned int nr_words); #endif /* __XFS_SCRUB_RTSUMMARY_H__ */ diff --git a/fs/xfs/scrub/rtsummary_repair.c b/fs/xfs/scrub/rtsummary_repair.c index f5c14c50ebf3..713b79a1f52a 100644 --- a/fs/xfs/scrub/rtsummary_repair.c +++ b/fs/xfs/scrub/rtsummary_repair.c @@ -18,6 +18,7 @@ #include "xfs_bmap.h" #include "xfs_bmap_btree.h" #include "xfs_swapext.h" +#include "xfs_rtbitmap.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -30,7 +31,7 @@ struct xrep_rtsummary { /* suminfo position of xfile as we write buffers to disk. 
*/ - xchk_rtsumoff_t prep_wordoff; + xfs_rtsumoff_t prep_wordoff; }; /* Set us up to repair the rtsummary file. */ @@ -89,8 +90,8 @@ xrep_rtsummary_prep_buf( bp->b_ops = &xfs_rtbuf_ops; - error = xfsum_copyout(sc, rs->prep_wordoff, bp->b_addr, - mp->m_blockwsize); + error = xfsum_copyout(sc, rs->prep_wordoff, + xfs_rsumblock_infoptr(bp, 0), mp->m_blockwsize); if (error) return error; ^ permalink raw reply related [flat|nested] 565+ messages in thread
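The three summary-offset conversions introduced above are plain integer arithmetic, so they can be exercised outside the kernel. This is a sketch under stated assumptions: `struct demo_mount` carries only the three fields the helpers read, all `demo_*` names are invented here, and `demo_old_sumptr_byteoff()` reproduces the byte offset that the removed `XFS_SUMPTR` macro computed so the old and new forms can be compared.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SUMINFOLOG	2	/* log2(sizeof(int32_t)), like XFS_SUMINFOLOG */

typedef uint32_t demo_rtsumoff_t;	/* like xfs_rtsumoff_t */

struct demo_mount {
	uint32_t	rbmblocks;	/* sb_rbmblocks */
	uint8_t		blocklog;	/* sb_blocklog */
	uint32_t	blockmask;	/* m_blockmask, i.e. blocksize - 1 */
};

/* (log2 extent length, bitmap block) -> info word index in the file. */
static inline demo_rtsumoff_t
demo_rtsumoffs(const struct demo_mount *mp, int log2_len, uint64_t rbmoff)
{
	return log2_len * mp->rbmblocks + rbmoff;
}

/* Info word index -> file block holding that word. */
static inline uint64_t
demo_rtsumoffs_to_block(const struct demo_mount *mp, demo_rtsumoff_t rsumoff)
{
	return ((uint64_t)rsumoff << DEMO_SUMINFOLOG) >> mp->blocklog;
}

/* Info word index -> word index within that block. */
static inline unsigned int
demo_rtsumoffs_to_infoword(const struct demo_mount *mp,
			   demo_rtsumoff_t rsumoff)
{
	unsigned int	mask = mp->blockmask >> DEMO_SUMINFOLOG;

	return rsumoff & mask;
}

/* Byte offset within the block, as the removed XFS_SUMPTR computed it. */
static inline unsigned int
demo_old_sumptr_byteoff(const struct demo_mount *mp, demo_rtsumoff_t so)
{
	return (so * (unsigned int)sizeof(int32_t)) & mp->blockmask;
}
```

For 4096-byte blocks and 100 bitmap blocks, length class 20 at bitmap block 50 gives word index 2050, which lands in summary block 2 at word 2 (byte offset 8), matching the old macro.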
* [PATCHSET v1.0 0/3] xfs: refactor realtime meta inode locking
  2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
                   ` (5 preceding siblings ...)
  2022-12-30 22:17 ` [PATCHSET v1.0 0/8] xfs: refactor rtbitmap/summary macros Darrick J. Wong
@ 2022-12-30 22:17 ` Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 1/3] xfs: use separate lock classes for realtime metadata inode ILOCKs Darrick J. Wong
                     ` (2 more replies)
  2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong
                   ` (32 subsequent siblings)
  39 siblings, 3 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

Hi all,

Replace all the open-coded locking of realtime metadata inodes with a
single rtlock function that can lock all the pieces that the caller
wants in a single call. This will be important for maintaining correct
locking order later when we start adding more realtime metadata inodes.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=refactor-rt-locking

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=refactor-rt-locking
---
 fs/xfs/libxfs/xfs_bmap.c     |    7 +----
 fs/xfs/libxfs/xfs_rtbitmap.c |   57 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rtbitmap.h |   17 ++++++++++++
 fs/xfs/scrub/common.c        |    9 +++---
 fs/xfs/scrub/fscounters.c    |    4 +--
 fs/xfs/xfs_bmap_util.c       |    5 +--
 fs/xfs/xfs_fsmap.c           |    4 +--
 fs/xfs/xfs_inode.c           |    3 +-
 fs/xfs/xfs_inode.h           |   13 +++------
 fs/xfs/xfs_rtalloc.c         |   62 ++++++++++++++++++++++++++++++------------
 10 files changed, 135 insertions(+), 46 deletions(-)

^ permalink raw reply	[flat|nested] 565+ messages in thread
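The idea in this cover letter, one entry point that takes whichever rt metadata locks the caller asks for and always in the same order, can be sketched without any real locks. This is only a toy model: the `DEMO_RTLOCK_*` flags, the trace structure and the function names are hypothetical and are not taken from the series. A lock acquisition is recorded as a positive id and a release as the negated id, so that the ordering can be inspected.

```c
#include <assert.h>
#include <stddef.h>

/* HYPOTHETICAL flags naming the pieces a caller may want locked. */
#define DEMO_RTLOCK_BITMAP	(1u << 0)
#define DEMO_RTLOCK_SUMMARY	(1u << 1)

/* Ids recorded in the trace; a negative value means "unlocked". */
enum demo_lock_id {
	DEMO_RBMIP	= 1,	/* stands for the rt bitmap inode */
	DEMO_RSUMIP	= 2,	/* stands for the rt summary inode */
};

struct demo_trace {
	int	events[8];
	size_t	nr;
};

static inline void
demo_record(struct demo_trace *t, int ev)
{
	assert(t->nr < sizeof(t->events) / sizeof(t->events[0]));
	t->events[t->nr++] = ev;
}

/* Take the requested locks, always bitmap before summary. */
static inline void
demo_rtlock(struct demo_trace *t, unsigned int flags)
{
	if (flags & DEMO_RTLOCK_BITMAP)
		demo_record(t, DEMO_RBMIP);
	if (flags & DEMO_RTLOCK_SUMMARY)
		demo_record(t, DEMO_RSUMIP);
}

/* Drop the requested locks in the reverse order. */
static inline void
demo_rtunlock(struct demo_trace *t, unsigned int flags)
{
	if (flags & DEMO_RTLOCK_SUMMARY)
		demo_record(t, -DEMO_RSUMIP);
	if (flags & DEMO_RTLOCK_BITMAP)
		demo_record(t, -DEMO_RBMIP);
}
```

Because callers pass flags instead of locking each inode themselves, the ordering lives in one place, and adding another metadata inode later would mean adding one flag and one branch in each function.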
* [PATCH 1/3] xfs: use separate lock classes for realtime metadata inode ILOCKs 2022-12-30 22:17 ` [PATCHSET v1.0 0/3] xfs: refactor realtime meta inode locking Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 3/3] xfs: remove XFS_ILOCK_RT* Darrick J. Wong 2022-12-30 22:17 ` [PATCH 2/3] xfs: refactor realtime inode locking Darrick J. Wong 2 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Realtime metadata files are not quite regular files because userspace can't access the realtime bitmap directly, and because we take the ILOCK of the rt bitmap file while holding the ILOCK of a realtime file. The double nature of inodes confuses lockdep, so up until now we've created lockdep subclasses to help lockdep keep things straight. We've gotten away with using lockdep subclasses because there's only two rt metadata files, but with the coming addition of realtime rmap and refcounting, we'd need two more subclasses, which is a lot of class bits to burn on a side feature. Therefore, switch to manually setting the lockdep class of the rt metadata ILOCKs. In the next patch we'll remove the rt-related ILOCK subclasses. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_rtalloc.c | 36 ++++++++++++++++++++++++++++++++++-- 1 file changed, 34 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 11c42ebfa0a5..674ca3dab72e 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -26,6 +26,16 @@ #include "xfs_imeta.h" #include "xfs_rtbitmap.h" +/* + * Realtime metadata files are not quite regular files because userspace can't + * access the realtime bitmap directly, and because we take the ILOCK of the rt + * bitmap file while holding the ILOCK of a regular realtime file. 
This double + * locking confuses lockdep, so create different lockdep classes here to help + * it keep things straight. + */ +static struct lock_class_key xfs_rbmip_key; +static struct lock_class_key xfs_rsumip_key; + /* * Read and return the summary information for a given extent size, * bitmap block combination. @@ -1342,6 +1352,28 @@ xfs_rtalloc_reinit_frextents( return 0; } +static inline int +__xfs_rt_iget( + struct xfs_mount *mp, + xfs_ino_t ino, + struct lock_class_key *lockdep_key, + const char *lockdep_key_name, + struct xfs_inode **ipp) +{ + int error; + + error = xfs_imeta_iget(mp, ino, XFS_DIR3_FT_REG_FILE, ipp); + if (error) + return error; + + lockdep_set_class_and_name(&(*ipp)->i_lock.mr_lock, lockdep_key, + lockdep_key_name); + return 0; +} + +#define xfs_rt_iget(mp, ino, lockdep_key, ipp) \ + __xfs_rt_iget((mp), (ino), (lockdep_key), #lockdep_key, (ipp)) + /* * Read in the bmbt of an rt metadata inode so that we never have to load them * at runtime. This enables the use of shared ILOCKs for rtbitmap scans. Use @@ -1389,7 +1421,7 @@ xfs_rtmount_inodes( xfs_sb_t *sbp; sbp = &mp->m_sb; - error = xfs_imeta_iget(mp, mp->m_sb.sb_rbmino, XFS_DIR3_FT_REG_FILE, + error = xfs_rt_iget(mp, mp->m_sb.sb_rbmino, &xfs_rbmip_key, &mp->m_rbmip); if (xfs_metadata_is_sick(error)) xfs_rt_mark_sick(mp, XFS_SICK_RT_BITMAP); @@ -1401,7 +1433,7 @@ xfs_rtmount_inodes( if (error) goto out_rele_bitmap; - error = xfs_imeta_iget(mp, mp->m_sb.sb_rsumino, XFS_DIR3_FT_REG_FILE, + error = xfs_rt_iget(mp, mp->m_sb.sb_rsumino, &xfs_rsumip_key, &mp->m_rsumip); if (xfs_metadata_is_sick(error)) xfs_rt_mark_sick(mp, XFS_SICK_RT_SUMMARY); ^ permalink raw reply related [flat|nested] 565+ messages in thread
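The xfs_rt_iget() wrapper in [PATCH 1/3] relies on the preprocessor's # operator to derive the lockdep class name from the key argument, so each rt metadata inode gets its own class and a readable name without the caller spelling the name twice. Below is a minimal userspace sketch of that pattern. All demo_* types and functions are hypothetical stand-ins and not kernel APIs; the real lockdep_set_class_and_name() does far more than record two pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct lock_class_key. */
struct demo_class_key {
	char	unused;
};

/* Hypothetical stand-in for a lock that carries a class and a name. */
struct demo_lock {
	const struct demo_class_key	*key;
	const char			*key_name;
};

/* One static key per kind of rt metadata inode, as in the patch. */
static struct demo_class_key demo_rbmip_key;
static struct demo_class_key demo_rsumip_key;

/* Record which class (and which human-readable name) a lock belongs to. */
static inline void
demo_set_class_named(
	struct demo_lock		*lock,
	const struct demo_class_key	*key,
	const char			*key_name)
{
	assert(lock != NULL && key != NULL && key_name != NULL);
	lock->key = key;
	lock->key_name = key_name;
}

/*
 * #key turns the macro argument's source text into a string literal, so
 * passing &demo_rbmip_key also yields the name "&demo_rbmip_key".
 */
#define demo_set_class(lock, key) \
	demo_set_class_named((lock), (key), #key)
```

Because the argument is stringified exactly as written, the recorded name includes the leading ampersand. That is harmless for a debugging label, but worth knowing when searching lockdep reports for the name.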
* [PATCH 3/3] xfs: remove XFS_ILOCK_RT* 2022-12-30 22:17 ` [PATCHSET v1.0 0/3] xfs: refactor realtime meta inode locking Darrick J. Wong 2022-12-30 22:17 ` [PATCH 1/3] xfs: use separate lock classes for realtime metadata inode ILOCKs Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 2/3] xfs: refactor realtime inode locking Darrick J. Wong 2 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Now that we've centralized the realtime metadata locking routines, get rid of the ILOCK subclasses since we now use explicit lockdep classes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtbitmap.c | 16 ++++++++-------- fs/xfs/scrub/common.c | 8 ++++---- fs/xfs/xfs_inode.c | 3 +-- fs/xfs/xfs_inode.h | 13 ++++--------- fs/xfs/xfs_rtalloc.c | 11 +++++------ 5 files changed, 22 insertions(+), 29 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 46095acec709..4237b5703a64 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -1262,11 +1262,11 @@ xfs_rtbitmap_lock( struct xfs_trans *tp, struct xfs_mount *mp) { - xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP); + xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL); if (tp) xfs_trans_ijoin(tp, mp->m_rbmip, XFS_ILOCK_EXCL); - xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM); + xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL); if (tp) xfs_trans_ijoin(tp, mp->m_rsumip, XFS_ILOCK_EXCL); } @@ -1276,8 +1276,8 @@ void xfs_rtbitmap_unlock( struct xfs_mount *mp) { - xfs_iunlock(mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM); - xfs_iunlock(mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP); + xfs_iunlock(mp->m_rsumip, XFS_ILOCK_EXCL); + xfs_iunlock(mp->m_rbmip, XFS_ILOCK_EXCL); } /* @@ -1290,10 +1290,10 @@ xfs_rtbitmap_lock_shared( unsigned int rbmlock_flags) { if (rbmlock_flags & 
XFS_RBMLOCK_BITMAP) - xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); + xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED); if (rbmlock_flags & XFS_RBMLOCK_SUMMARY) - xfs_ilock(mp->m_rsumip, XFS_ILOCK_SHARED | XFS_ILOCK_RTSUM); + xfs_ilock(mp->m_rsumip, XFS_ILOCK_SHARED); } /* Unlock the realtime free space metadata inodes after a freespace scan. */ @@ -1303,8 +1303,8 @@ xfs_rtbitmap_unlock_shared( unsigned int rbmlock_flags) { if (rbmlock_flags & XFS_RBMLOCK_SUMMARY) - xfs_iunlock(mp->m_rsumip, XFS_ILOCK_SHARED | XFS_ILOCK_RTSUM); + xfs_iunlock(mp->m_rsumip, XFS_ILOCK_SHARED); if (rbmlock_flags & XFS_RBMLOCK_BITMAP) - xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); + xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED); } diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index dadbe32916de..1b48726fcc65 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -693,14 +693,14 @@ xchk_rt_init( XCHK_RTLOCK_SUMMARY_SHARED)) < 2); if (rtlock_flags & XCHK_RTLOCK_BITMAP) - xfs_ilock(sc->mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP); + xfs_ilock(sc->mp->m_rbmip, XFS_ILOCK_EXCL); else if (rtlock_flags & XCHK_RTLOCK_BITMAP_SHARED) - xfs_ilock(sc->mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); + xfs_ilock(sc->mp->m_rbmip, XFS_ILOCK_SHARED); if (rtlock_flags & XCHK_RTLOCK_SUMMARY) - xfs_ilock(sc->mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM); + xfs_ilock(sc->mp->m_rsumip, XFS_ILOCK_EXCL); else if (rtlock_flags & XCHK_RTLOCK_SUMMARY_SHARED) - xfs_ilock(sc->mp->m_rsumip, XFS_ILOCK_SHARED | XFS_ILOCK_RTSUM); + xfs_ilock(sc->mp->m_rsumip, XFS_ILOCK_SHARED); sr->rtlock_flags = rtlock_flags; } diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 51bceccd8c9a..ab805df9db16 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -361,8 +361,7 @@ xfs_lock_inumorder( { uint class = 0; - ASSERT(!(lock_mode & (XFS_ILOCK_PARENT | XFS_ILOCK_RTBITMAP | - XFS_ILOCK_RTSUM))); + ASSERT(!(lock_mode & XFS_ILOCK_PARENT)); 
ASSERT(xfs_lockdep_subclass_ok(subclass)); if (lock_mode & (XFS_IOLOCK_SHARED|XFS_IOLOCK_EXCL)) { diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 7cf45dd9d86b..06601c409010 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -404,9 +404,8 @@ static inline bool xfs_inode_has_bigrtextents(struct xfs_inode *ip) * However, MAX_LOCKDEP_SUBCLASSES == 8, which means we are greatly * limited to the subclasses we can represent via nesting. We need at least * 5 inodes nest depth for the ILOCK through rename, and we also have to support - * XFS_ILOCK_PARENT, which gives 6 subclasses. Then we have XFS_ILOCK_RTBITMAP - * and XFS_ILOCK_RTSUM, which are another 2 unique subclasses, so that's all - * 8 subclasses supported by lockdep. + * XFS_ILOCK_PARENT, which gives 6 subclasses. That's 6 of the 8 subclasses + * supported by lockdep. * * This also means we have to number the sub-classes in the lowest bits of * the mask we keep, and we have to ensure we never exceed 3 bits of lockdep @@ -432,8 +431,8 @@ static inline bool xfs_inode_has_bigrtextents(struct xfs_inode *ip) * ILOCK values * 0-4 subclass values * 5 PARENT subclass (not nestable) - * 6 RTBITMAP subclass (not nestable) - * 7 RTSUM subclass (not nestable) + * 6 unused + * 7 unused * */ #define XFS_IOLOCK_SHIFT 16 @@ -449,12 +448,8 @@ static inline bool xfs_inode_has_bigrtextents(struct xfs_inode *ip) #define XFS_ILOCK_SHIFT 24 #define XFS_ILOCK_PARENT_VAL 5u #define XFS_ILOCK_MAX_SUBCLASS (XFS_ILOCK_PARENT_VAL - 1) -#define XFS_ILOCK_RTBITMAP_VAL 6u -#define XFS_ILOCK_RTSUM_VAL 7u #define XFS_ILOCK_DEP_MASK 0xff000000u #define XFS_ILOCK_PARENT (XFS_ILOCK_PARENT_VAL << XFS_ILOCK_SHIFT) -#define XFS_ILOCK_RTBITMAP (XFS_ILOCK_RTBITMAP_VAL << XFS_ILOCK_SHIFT) -#define XFS_ILOCK_RTSUM (XFS_ILOCK_RTSUM_VAL << XFS_ILOCK_SHIFT) #define XFS_LOCK_SUBCLASS_MASK (XFS_IOLOCK_DEP_MASK | \ XFS_MMAPLOCK_DEP_MASK | \ diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 11bea1c60eda..c131738efd0f 100644 --- 
a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1376,8 +1376,7 @@ __xfs_rt_iget( */ static inline int xfs_rtmount_iread_extents( - struct xfs_inode *ip, - unsigned int lock_class) + struct xfs_inode *ip) { struct xfs_trans *tp; int error; @@ -1386,7 +1385,7 @@ xfs_rtmount_iread_extents( if (error) return error; - xfs_ilock(ip, XFS_ILOCK_EXCL | lock_class); + xfs_ilock(ip, XFS_ILOCK_EXCL); error = xfs_iread_extents(tp, ip, XFS_DATA_FORK); if (error) @@ -1399,7 +1398,7 @@ xfs_rtmount_iread_extents( } out_unlock: - xfs_iunlock(ip, XFS_ILOCK_EXCL | lock_class); + xfs_iunlock(ip, XFS_ILOCK_EXCL); xfs_trans_cancel(tp); return error; } @@ -1424,7 +1423,7 @@ xfs_rtmount_inodes( return error; ASSERT(mp->m_rbmip != NULL); - error = xfs_rtmount_iread_extents(mp->m_rbmip, XFS_ILOCK_RTBITMAP); + error = xfs_rtmount_iread_extents(mp->m_rbmip); if (error) goto out_rele_bitmap; @@ -1436,7 +1435,7 @@ xfs_rtmount_inodes( goto out_rele_bitmap; ASSERT(mp->m_rsumip != NULL); - error = xfs_rtmount_iread_extents(mp->m_rsumip, XFS_ILOCK_RTSUM); + error = xfs_rtmount_iread_extents(mp->m_rsumip); if (error) goto out_rele_summary; ^ permalink raw reply related [flat|nested] 565+ messages in thread
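The xfs_inode.h hunk in [PATCH 3/3] is easier to follow with the bit layout spelled out: the lockdep subclass rides in the top byte of the lock flags, and only values 0 through 7 are representable because lockdep supports 8 subclasses. The sketch below is a rough userspace illustration. The shift, mask and PARENT values are copied from the hunk; the DEMO_ILOCK_EXCL bit position and both helper functions are assumptions made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mode bit, for illustration only. */
#define DEMO_ILOCK_EXCL			(1u << 2)

/* Values mirrored from the xfs_inode.h hunk in the patch. */
#define DEMO_ILOCK_SHIFT		24
#define DEMO_ILOCK_PARENT_VAL		5u
#define DEMO_ILOCK_MAX_SUBCLASS		(DEMO_ILOCK_PARENT_VAL - 1)
#define DEMO_ILOCK_DEP_MASK		0xff000000u
#define DEMO_ILOCK_PARENT		(DEMO_ILOCK_PARENT_VAL << DEMO_ILOCK_SHIFT)

/* The patch comment states MAX_LOCKDEP_SUBCLASSES == 8. */
#define DEMO_MAX_LOCKDEP_SUBCLASSES	8u

/* Combine a lock mode with a lockdep subclass number. */
static inline uint32_t
demo_ilock_pack(
	uint32_t	mode,
	uint32_t	subclass)
{
	assert(subclass < DEMO_MAX_LOCKDEP_SUBCLASSES);
	assert((mode & DEMO_ILOCK_DEP_MASK) == 0);
	return mode | (subclass << DEMO_ILOCK_SHIFT);
}

/* Recover the subclass number from a set of lock flags. */
static inline uint32_t
demo_ilock_subclass(
	uint32_t	lock_flags)
{
	return (lock_flags & DEMO_ILOCK_DEP_MASK) >> DEMO_ILOCK_SHIFT;
}
```

With the RTBITMAP and RTSUM subclasses gone, values 6 and 7 are simply never packed, which is why the patch can mark them unused.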
* [PATCH 2/3] xfs: refactor realtime inode locking 2022-12-30 22:17 ` [PATCHSET v1.0 0/3] xfs: refactor realtime meta inode locking Darrick J. Wong 2022-12-30 22:17 ` [PATCH 1/3] xfs: use separate lock classes for realtime metadata inode ILOCKs Darrick J. Wong 2022-12-30 22:17 ` [PATCH 3/3] xfs: remove XFS_ILOCK_RT* Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create helper functions to deal with locking realtime metadata inodes. This enables us to maintain correct locking order once we start adding the realtime rmap and refcount btree inodes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap.c | 7 +---- fs/xfs/libxfs/xfs_rtbitmap.c | 57 ++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtbitmap.h | 17 +++++++++++++ fs/xfs/scrub/common.c | 1 + fs/xfs/scrub/fscounters.c | 4 +-- fs/xfs/xfs_bmap_util.c | 5 +--- fs/xfs/xfs_fsmap.c | 4 +-- fs/xfs/xfs_rtalloc.c | 15 ++++------- 8 files changed, 87 insertions(+), 23 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 1ad8606c1dd9..55fe8cda3d98 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5416,12 +5416,9 @@ __xfs_bunmapi( if (isrt) { /* - * Synchronize by locking the bitmap inode. + * Synchronize by locking the realtime bitmap. 
*/ - xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL|XFS_ILOCK_RTBITMAP); - xfs_trans_ijoin(tp, mp->m_rbmip, XFS_ILOCK_EXCL); - xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL|XFS_ILOCK_RTSUM); - xfs_trans_ijoin(tp, mp->m_rsumip, XFS_ILOCK_EXCL); + xfs_rtbitmap_lock(tp, mp); } extno = 0; diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index b74261abd238..46095acec709 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -1251,3 +1251,60 @@ xfs_rtsummary_wordcount( blocks = xfs_rtsummary_blockcount(mp, rsumlevels, rbmblocks); return XFS_FSB_TO_B(mp, blocks) >> XFS_WORDLOG; } + +/* + * Lock both realtime free space metadata inodes for a freespace update. If a + * transaction is given, the inodes will be joined to the transaction and the + * ILOCKs will be released on transaction commit. + */ +void +xfs_rtbitmap_lock( + struct xfs_trans *tp, + struct xfs_mount *mp) +{ + xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP); + if (tp) + xfs_trans_ijoin(tp, mp->m_rbmip, XFS_ILOCK_EXCL); + + xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM); + if (tp) + xfs_trans_ijoin(tp, mp->m_rsumip, XFS_ILOCK_EXCL); +} + +/* Unlock both realtime free space metadata inodes after a freespace update. */ +void +xfs_rtbitmap_unlock( + struct xfs_mount *mp) +{ + xfs_iunlock(mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM); + xfs_iunlock(mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP); +} + +/* + * Lock the realtime free space metadata inodes for a freespace scan. Callers + * must walk metadata blocks in order of increasing file offset. + */ +void +xfs_rtbitmap_lock_shared( + struct xfs_mount *mp, + unsigned int rbmlock_flags) +{ + if (rbmlock_flags & XFS_RBMLOCK_BITMAP) + xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); + + if (rbmlock_flags & XFS_RBMLOCK_SUMMARY) + xfs_ilock(mp->m_rsumip, XFS_ILOCK_SHARED | XFS_ILOCK_RTSUM); +} + +/* Unlock the realtime free space metadata inodes after a freespace scan. 
*/ +void +xfs_rtbitmap_unlock_shared( + struct xfs_mount *mp, + unsigned int rbmlock_flags) +{ + if (rbmlock_flags & XFS_RBMLOCK_SUMMARY) + xfs_iunlock(mp->m_rsumip, XFS_ILOCK_SHARED | XFS_ILOCK_RTSUM); + + if (rbmlock_flags & XFS_RBMLOCK_BITMAP) + xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); +} diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index 749c8e3ec4cb..f6a2a48973ab 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -279,6 +279,19 @@ xfs_suminfo_t xfs_suminfo_get(struct xfs_mount *mp, union xfs_suminfo_ondisk *infoptr); void xfs_suminfo_add(struct xfs_mount *mp, union xfs_suminfo_ondisk *infoptr, int delta); + +void xfs_rtbitmap_lock(struct xfs_trans *tp, struct xfs_mount *mp); +void xfs_rtbitmap_unlock(struct xfs_mount *mp); + +/* Lock the rt bitmap inode in shared mode */ +#define XFS_RBMLOCK_BITMAP (1U << 0) +/* Lock the rt summary inode in shared mode */ +#define XFS_RBMLOCK_SUMMARY (1U << 1) + +void xfs_rtbitmap_lock_shared(struct xfs_mount *mp, + unsigned int rbmlock_flags); +void xfs_rtbitmap_unlock_shared(struct xfs_mount *mp, + unsigned int rbmlock_flags); #else /* CONFIG_XFS_RT */ # define xfs_rtfree_extent(t,b,l) (-ENOSYS) # define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) @@ -295,6 +308,10 @@ xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents) # define xfs_rtbitmap_wordcount(mp, r) (0) # define xfs_rtsummary_blockcount(mp, l, b) (0) # define xfs_rtsummary_wordcount(mp, l, b) (0) +# define xfs_rtbitmap_lock(tp, mp) do { } while (0) +# define xfs_rtbitmap_unlock(mp) do { } while (0) +# define xfs_rtbitmap_lock_shared(mp, lf) do { } while (0) +# define xfs_rtbitmap_unlock_shared(mp, lf) do { } while (0) #endif /* CONFIG_XFS_RT */ #endif /* __XFS_RTBITMAP_H__ */ diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 4de13f8f4277..dadbe32916de 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -33,6 +33,7 @@ #include "xfs_error.h" 
#include "xfs_quota.h" #include "xfs_swapext.h" +#include "xfs_rtbitmap.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c index 680b2e1d2940..043fb5777290 100644 --- a/fs/xfs/scrub/fscounters.c +++ b/fs/xfs/scrub/fscounters.c @@ -470,7 +470,7 @@ xchk_fscount_count_frextents( if (!xfs_has_realtime(mp)) return 0; - xfs_ilock(sc->mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); + xfs_rtbitmap_lock_shared(sc->mp, XFS_RBMLOCK_BITMAP); error = xfs_rtalloc_query_all(sc->mp, sc->tp, xchk_fscount_add_frextent, fsc); if (error) { @@ -479,7 +479,7 @@ xchk_fscount_count_frextents( } out_unlock: - xfs_iunlock(sc->mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); + xfs_rtbitmap_unlock_shared(sc->mp, XFS_RBMLOCK_BITMAP); return error; } #else diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 1bfdd31723f5..447c057c9331 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -131,10 +131,7 @@ xfs_bmap_rtalloc( * Lock out modifications to both the RT bitmap and summary inodes */ if (!rtlocked) { - xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL|XFS_ILOCK_RTBITMAP); - xfs_trans_ijoin(ap->tp, mp->m_rbmip, XFS_ILOCK_EXCL); - xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL|XFS_ILOCK_RTSUM); - xfs_trans_ijoin(ap->tp, mp->m_rsumip, XFS_ILOCK_EXCL); + xfs_rtbitmap_lock(ap->tp, mp); rtlocked = true; } diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 8d0a6f480d2a..71053f840ea4 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -525,7 +525,7 @@ xfs_getfsmap_rtdev_rtbitmap_query( unsigned int mod; int error; - xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); + xfs_rtbitmap_lock_shared(mp, XFS_RBMLOCK_BITMAP); /* * Set up query parameters to return free rtextents covering the range @@ -551,7 +551,7 @@ xfs_getfsmap_rtdev_rtbitmap_query( if (error) goto err; err: - xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); + 
xfs_rtbitmap_unlock_shared(mp, XFS_RBMLOCK_BITMAP); return error; } diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 674ca3dab72e..11bea1c60eda 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1081,10 +1081,10 @@ xfs_growfs_rt( if (error) break; /* - * Lock out other callers by grabbing the bitmap inode lock. + * Lock out other callers by grabbing the bitmap and summary + * inode locks and joining them to the transaction. */ - xfs_ilock(mp->m_rbmip, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP); - xfs_trans_ijoin(tp, mp->m_rbmip, XFS_ILOCK_EXCL); + xfs_rtbitmap_lock(tp, mp); /* * Update the bitmap inode's size ondisk and incore. We need * to update the incore size so that inode inactivation won't @@ -1094,11 +1094,6 @@ xfs_growfs_rt( nsbp->sb_rbmblocks * nsbp->sb_blocksize; i_size_write(VFS_I(mp->m_rbmip), mp->m_rbmip->i_disk_size); xfs_trans_log_inode(tp, mp->m_rbmip, XFS_ILOG_CORE); - /* - * Get the summary inode into the transaction. - */ - xfs_ilock(mp->m_rsumip, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM); - xfs_trans_ijoin(tp, mp->m_rsumip, XFS_ILOCK_EXCL); /* * Update the summary inode's size. We need to update the * incore size so that inode inactivation won't punch what it @@ -1338,10 +1333,10 @@ xfs_rtalloc_reinit_frextents( uint64_t val = 0; int error; - xfs_ilock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); + xfs_rtbitmap_lock_shared(mp, XFS_RBMLOCK_BITMAP); error = xfs_rtalloc_query_all(mp, NULL, xfs_rtalloc_count_frextent, &val); - xfs_iunlock(mp->m_rbmip, XFS_ILOCK_SHARED | XFS_ILOCK_RTBITMAP); + xfs_rtbitmap_unlock_shared(mp, XFS_RBMLOCK_BITMAP); if (error) return error; ^ permalink raw reply related [flat|nested] 565+ messages in thread
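The shared-lock helpers in [PATCH 2/3] always take the bitmap lock before the summary lock and release them in the opposite order, with a flag mask selecting which of the two the caller wants. The sketch below models only that ordering rule in userspace. The demo_* names, the toy holder counters and the event log are all hypothetical; no real locking or XFS API is involved.

```c
#include <assert.h>
#include <stddef.h>

/* Flag values mirror XFS_RBMLOCK_* from the patch. */
#define DEMO_RBMLOCK_BITMAP	(1U << 0)
#define DEMO_RBMLOCK_SUMMARY	(1U << 1)

/* Identifiers written to the event log. */
enum demo_inode_id {
	DEMO_RBMIP	= 1,
	DEMO_RSUMIP	= 2,
};

/* Toy mount: counts shared holders and logs lock/unlock order. */
struct demo_mount {
	int	rbm_holders;
	int	rsum_holders;
	int	log[8];		/* +id on lock, -id on unlock */
	size_t	nlog;
};

static void
demo_log(
	struct demo_mount	*mp,
	int			event)
{
	assert(mp->nlog < sizeof(mp->log) / sizeof(mp->log[0]));
	mp->log[mp->nlog++] = event;
}

/* Take the requested locks, always bitmap first, then summary. */
static void
demo_rtbitmap_lock_shared(
	struct demo_mount	*mp,
	unsigned int		rbmlock_flags)
{
	if (rbmlock_flags & DEMO_RBMLOCK_BITMAP) {
		mp->rbm_holders++;
		demo_log(mp, DEMO_RBMIP);
	}

	if (rbmlock_flags & DEMO_RBMLOCK_SUMMARY) {
		mp->rsum_holders++;
		demo_log(mp, DEMO_RSUMIP);
	}
}

/* Drop the requested locks in reverse order: summary, then bitmap. */
static void
demo_rtbitmap_unlock_shared(
	struct demo_mount	*mp,
	unsigned int		rbmlock_flags)
{
	if (rbmlock_flags & DEMO_RBMLOCK_SUMMARY) {
		assert(mp->rsum_holders > 0);
		mp->rsum_holders--;
		demo_log(mp, -DEMO_RSUMIP);
	}

	if (rbmlock_flags & DEMO_RBMLOCK_BITMAP) {
		assert(mp->rbm_holders > 0);
		mp->rbm_holders--;
		demo_log(mp, -DEMO_RBMIP);
	}
}
```

Keeping the order inside one pair of helpers means a later series only has to extend these two functions when more rt metadata inodes appear, rather than auditing every open-coded call site.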
* [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section
  2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
                   ` (6 preceding siblings ...)
  2022-12-30 22:17 ` [PATCHSET v1.0 0/3] xfs: refactor realtime meta inode locking Darrick J. Wong
@ 2022-12-30 22:17 ` Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 01/22] xfs: create incore realtime group structures Darrick J. Wong
                     ` (21 more replies)
  2022-12-30 22:17 ` [PATCHSET v1.0 0/3] xfsprogs: enable FITRIM for the realtime section Darrick J. Wong
                   ` (31 subsequent siblings)
  39 siblings, 22 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

Hi all,

Right now, the realtime section uses a single pair of metadata inodes to
store the free space information. This presents a scalability problem
since every thread trying to allocate or free rt extents has to lock
these files. It would be very useful if we could begin to tackle this
problem by sharding the realtime section, so create the notion of
realtime groups, which are similar to allocation groups on the data
section.

While we're at it, define a superblock to be stamped into the start of
each rt section. This enables utilities such as blkid to identify block
devices containing realtime sections, and helpfully avoids the situation
where a file extent can cross an rtgroup boundary.

The best advantage for rtgroups will become evident later when we get to
adding rmap and reflink to the realtime volume, since the geometry
constraints are the same for rt groups and AGs. Hence we can reuse all
that code directly.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.

--D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-groups xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-groups fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-groups --- fs/xfs/Makefile | 3 fs/xfs/libxfs/xfs_bmap.h | 5 fs/xfs/libxfs/xfs_format.h | 94 ++++++- fs/xfs/libxfs/xfs_fs.h | 24 ++ fs/xfs/libxfs/xfs_health.h | 30 ++ fs/xfs/libxfs/xfs_rtbitmap.c | 128 ++++++++- fs/xfs/libxfs/xfs_rtbitmap.h | 46 +++ fs/xfs/libxfs/xfs_rtgroup.c | 548 +++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtgroup.h | 241 +++++++++++++++++ fs/xfs/libxfs/xfs_sb.c | 124 +++++++++ fs/xfs/libxfs/xfs_shared.h | 4 fs/xfs/libxfs/xfs_types.c | 46 +++ fs/xfs/libxfs/xfs_types.h | 4 fs/xfs/scrub/common.c | 88 ++++++ fs/xfs/scrub/common.h | 52 ++-- fs/xfs/scrub/health.c | 25 ++ fs/xfs/scrub/repair.h | 3 fs/xfs/scrub/rgsuper.c | 77 +++++ fs/xfs/scrub/rgsuper_repair.c | 48 +++ fs/xfs/scrub/rtbitmap.c | 73 +++++ fs/xfs/scrub/rtsummary_repair.c | 15 + fs/xfs/scrub/scrub.c | 27 ++ fs/xfs/scrub/scrub.h | 42 +-- fs/xfs/scrub/trace.h | 6 fs/xfs/xfs_bmap_item.c | 18 + fs/xfs/xfs_buf_item_recover.c | 43 +++ fs/xfs/xfs_fsops.c | 4 fs/xfs/xfs_health.c | 114 ++++++++ fs/xfs/xfs_ioctl.c | 35 ++ fs/xfs/xfs_log_recover.c | 6 fs/xfs/xfs_mount.c | 12 + fs/xfs/xfs_mount.h | 10 + fs/xfs/xfs_ondisk.h | 2 fs/xfs/xfs_rtalloc.c | 266 +++++++++++++++++-- fs/xfs/xfs_rtalloc.h | 5 fs/xfs/xfs_super.c | 18 + fs/xfs/xfs_trace.c | 1 fs/xfs/xfs_trace.h | 60 ++++ fs/xfs/xfs_trans.c | 38 ++- fs/xfs/xfs_trans.h | 2 fs/xfs/xfs_trans_buf.c | 25 +- 41 files changed, 2295 insertions(+), 117 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_rtgroup.c create mode 100644 fs/xfs/libxfs/xfs_rtgroup.h create mode 100644 fs/xfs/scrub/rgsuper.c create mode 100644 fs/xfs/scrub/rgsuper_repair.c ^ permalink raw reply [flat|nested] 565+ messages in thread
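As a rough illustration of the sharding idea in this cover letter, the sketch below computes the size of each realtime group the way __xfs_rtgroup_block_count() does in [PATCH 01/22]: every group but the last is sb_rgblocks long, and the last one is whatever remains, rounded down to a whole rt extent. The demo_* names and the struct are hypothetical. The two mapping helpers assume a plain divide/modulo layout, which is an assumption made for this example and is not shown in the patches quoted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry; field names loosely follow the superblock. */
struct demo_rt_geometry {
	uint64_t	rblocks;	/* blocks in the realtime section */
	uint32_t	rgblocks;	/* blocks per realtime group */
	uint32_t	rgcount;	/* number of realtime groups */
	uint32_t	rextsize;	/* blocks per realtime extent */
};

/* Size of one group: full-size unless it is the last, truncated group. */
static uint32_t
demo_rtgroup_block_count(
	const struct demo_rt_geometry	*g,
	uint32_t			rgno)
{
	uint64_t			tail;

	assert(g->rextsize > 0);
	assert(rgno < g->rgcount);

	if (rgno < g->rgcount - 1)
		return g->rgblocks;

	/* Last group: leftover blocks, rounded down to whole rt extents. */
	tail = g->rblocks - (uint64_t)rgno * g->rgblocks;
	return (uint32_t)(tail - (tail % g->rextsize));
}

/* ASSUMPTION: groups are laid out back to back, so division finds one. */
static uint32_t
demo_rtb_to_rgno(
	const struct demo_rt_geometry	*g,
	uint64_t			rtbno)
{
	assert(g->rgblocks > 0);
	return (uint32_t)(rtbno / g->rgblocks);
}

/* ASSUMPTION: the block offset within the group is the remainder. */
static uint32_t
demo_rtb_to_rgbno(
	const struct demo_rt_geometry	*g,
	uint64_t			rtbno)
{
	assert(g->rgblocks > 0);
	return (uint32_t)(rtbno % g->rgblocks);
}
```

For example, a 1000-block rt section split into 300-block groups with 8-block rt extents would have three full groups and a final group of 96 blocks, since the 100 leftover blocks round down to 12 whole extents.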
* [PATCH 01/22] xfs: create incore realtime group structures 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 05/22] xfs: write secondary realtime superblocks to disk Darrick J. Wong ` (20 subsequent siblings) 21 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create an incore object that will contain information about a realtime allocation group. This will eventually enable us to shard the realtime section in a similar manner to how we shard the data section. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_format.h | 8 ++ fs/xfs/libxfs/xfs_rtgroup.c | 214 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtgroup.h | 121 ++++++++++++++++++++++++ fs/xfs/libxfs/xfs_sb.c | 5 + fs/xfs/libxfs/xfs_types.h | 4 + fs/xfs/xfs_log_recover.c | 6 + fs/xfs/xfs_mount.c | 12 ++ fs/xfs/xfs_mount.h | 6 + fs/xfs/xfs_rtalloc.c | 14 ++- fs/xfs/xfs_super.c | 2 fs/xfs/xfs_trace.h | 34 +++++++ 12 files changed, 423 insertions(+), 4 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_rtgroup.c create mode 100644 fs/xfs/libxfs/xfs_rtgroup.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 3d74696755c3..135a403c0edc 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -57,6 +57,7 @@ xfs-y += $(addprefix libxfs/, \ # xfs_rtbitmap is shared with libxfs xfs-$(CONFIG_XFS_RT) += $(addprefix libxfs/, \ xfs_rtbitmap.o \ + xfs_rtgroup.o \ ) # highlevel code diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 946870eb492c..ca87a3f8704a 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -184,6 +184,14 @@ typedef struct xfs_sb { */ xfs_ino_t sb_metadirino; + /* + * Realtime group geometry information. 
On disk these fields live in + * the rsumino slot, but we cache them separately in the in-core super + * for easy access. + */ + xfs_rgblock_t sb_rgblocks; /* size of a realtime group */ + xfs_rgnumber_t sb_rgcount; /* number of realtime groups */ + /* must be padded to 64 bit alignment */ } xfs_sb_t; diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c new file mode 100644 index 000000000000..ced2bd896106 --- /dev/null +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -0,0 +1,214 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_bit.h" +#include "xfs_sb.h" +#include "xfs_mount.h" +#include "xfs_btree.h" +#include "xfs_alloc_btree.h" +#include "xfs_rmap_btree.h" +#include "xfs_alloc.h" +#include "xfs_ialloc.h" +#include "xfs_rmap.h" +#include "xfs_ag.h" +#include "xfs_ag_resv.h" +#include "xfs_health.h" +#include "xfs_error.h" +#include "xfs_bmap.h" +#include "xfs_defer.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_trace.h" +#include "xfs_inode.h" +#include "xfs_icache.h" +#include "xfs_rtgroup.h" +#include "xfs_rtbitmap.h" + +/* + * Passive reference counting access wrappers to the rtgroup structures. If + * the rtgroup structure is to be freed, the freeing code is responsible for + * cleaning up objects with passive references before freeing the structure. 
+ */ +struct xfs_rtgroup * +xfs_rtgroup_get( + struct xfs_mount *mp, + xfs_rgnumber_t rgno) +{ + struct xfs_rtgroup *rtg; + int ref = 0; + + rcu_read_lock(); + rtg = radix_tree_lookup(&mp->m_rtgroup_tree, rgno); + if (rtg) { + ASSERT(atomic_read(&rtg->rtg_ref) >= 0); + ref = atomic_inc_return(&rtg->rtg_ref); + } + rcu_read_unlock(); + trace_xfs_rtgroup_get(mp, rgno, ref, _RET_IP_); + return rtg; +} + +struct xfs_rtgroup * +xfs_rtgroup_bump( + struct xfs_rtgroup *rtg) +{ + if (!atomic_inc_not_zero(&rtg->rtg_ref)) { + ASSERT(0); + return NULL; + } + + trace_xfs_rtgroup_bump(rtg->rtg_mount, rtg->rtg_rgno, + atomic_read(&rtg->rtg_ref), _RET_IP_); + return rtg; +} + +void +xfs_rtgroup_put( + struct xfs_rtgroup *rtg) +{ + int ref; + + ASSERT(atomic_read(&rtg->rtg_ref) > 0); + ref = atomic_dec_return(&rtg->rtg_ref); + trace_xfs_rtgroup_put(rtg->rtg_mount, rtg->rtg_rgno, ref, _RET_IP_); +} + +int +xfs_initialize_rtgroups( + struct xfs_mount *mp, + xfs_rgnumber_t rgcount) +{ + struct xfs_rtgroup *rtg; + xfs_rgnumber_t index; + xfs_rgnumber_t first_initialised = NULLRGNUMBER; + int error; + + if (!xfs_has_rtgroups(mp)) + return 0; + + /* + * Walk the current rtgroup tree so we don't try to initialise rt + * groups that already exist (growfs case). Allocate and insert all the + * rtgroups we don't find ready for initialisation. 
+ */ + for (index = 0; index < rgcount; index++) { + rtg = xfs_rtgroup_get(mp, index); + if (rtg) { + xfs_rtgroup_put(rtg); + continue; + } + + rtg = kmem_zalloc(sizeof(struct xfs_rtgroup), KM_MAYFAIL); + if (!rtg) { + error = -ENOMEM; + goto out_unwind_new_rtgs; + } + rtg->rtg_rgno = index; + rtg->rtg_mount = mp; + + error = radix_tree_preload(GFP_NOFS); + if (error) + goto out_free_rtg; + + spin_lock(&mp->m_rtgroup_lock); + if (radix_tree_insert(&mp->m_rtgroup_tree, index, rtg)) { + WARN_ON_ONCE(1); + spin_unlock(&mp->m_rtgroup_lock); + radix_tree_preload_end(); + error = -EEXIST; + goto out_free_rtg; + } + spin_unlock(&mp->m_rtgroup_lock); + radix_tree_preload_end(); + +#ifdef __KERNEL__ + /* Place kernel structure only init below this point. */ + spin_lock_init(&rtg->rtg_state_lock); +#endif /* __KERNEL__ */ + + /* first new rtg is fully initialized */ + if (first_initialised == NULLRGNUMBER) + first_initialised = index; + } + + return 0; + +out_free_rtg: + kmem_free(rtg); +out_unwind_new_rtgs: + /* unwind any prior newly initialized rtgs */ + for (index = first_initialised; index < rgcount; index++) { + rtg = radix_tree_delete(&mp->m_rtgroup_tree, index); + if (!rtg) + break; + kmem_free(rtg); + } + return error; +} + +STATIC void +__xfs_free_rtgroups( + struct rcu_head *head) +{ + struct xfs_rtgroup *rtg; + + rtg = container_of(head, struct xfs_rtgroup, rcu_head); + kmem_free(rtg); +} + +/* + * Free up the rtgroup resources associated with the mount structure. 
+ */ +void +xfs_free_rtgroups( + struct xfs_mount *mp) +{ + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + + if (!xfs_has_rtgroups(mp)) + return; + + for (rgno = 0; rgno < mp->m_sb.sb_rgcount; rgno++) { + spin_lock(&mp->m_rtgroup_lock); + rtg = radix_tree_delete(&mp->m_rtgroup_tree, rgno); + spin_unlock(&mp->m_rtgroup_lock); + ASSERT(rtg); + XFS_IS_CORRUPT(rtg->rtg_mount, atomic_read(&rtg->rtg_ref) != 0); + + call_rcu(&rtg->rcu_head, __xfs_free_rtgroups); + } +} + +/* Find the size of the rtgroup, in blocks. */ +static xfs_rgblock_t +__xfs_rtgroup_block_count( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + xfs_rgnumber_t rgcount, + xfs_rfsblock_t rblocks) +{ + ASSERT(rgno < rgcount); + + if (rgno < rgcount - 1) + return mp->m_sb.sb_rgblocks; + return xfs_rtb_rounddown_rtx(mp, + rblocks - (rgno * mp->m_sb.sb_rgblocks)); +} + +/* Compute the number of blocks in this realtime group. */ +xfs_rgblock_t +xfs_rtgroup_block_count( + struct xfs_mount *mp, + xfs_rgnumber_t rgno) +{ + return __xfs_rtgroup_block_count(mp, rgno, mp->m_sb.sb_rgcount, + mp->m_sb.sb_rblocks); +} diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h new file mode 100644 index 000000000000..f414218a66f2 --- /dev/null +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -0,0 +1,121 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#ifndef __LIBXFS_RTGROUP_H +#define __LIBXFS_RTGROUP_H 1 + +struct xfs_mount; +struct xfs_trans; + +/* + * Realtime group incore structure, similar to the per-AG structure. 
+ */
+struct xfs_rtgroup {
+	struct xfs_mount *rtg_mount;
+	xfs_rgnumber_t rtg_rgno;
+	atomic_t rtg_ref;
+
+	/* for rcu-safe freeing */
+	struct rcu_head rcu_head;
+
+	/* Number of blocks in this group */
+	xfs_rgblock_t rtg_blockcount;
+
+#ifdef __KERNEL__
+	/* -- kernel only structures below this line -- */
+	spinlock_t rtg_state_lock;
+#endif /* __KERNEL__ */
+};
+
+#ifdef CONFIG_XFS_RT
+struct xfs_rtgroup *xfs_rtgroup_get(struct xfs_mount *mp, xfs_rgnumber_t rgno);
+struct xfs_rtgroup *xfs_rtgroup_bump(struct xfs_rtgroup *rtg);
+void xfs_rtgroup_put(struct xfs_rtgroup *rtg);
+int xfs_initialize_rtgroups(struct xfs_mount *mp, xfs_rgnumber_t rgcount);
+void xfs_free_rtgroups(struct xfs_mount *mp);
+#else
+static inline struct xfs_rtgroup *
+xfs_rtgroup_get(
+	struct xfs_mount *mp,
+	xfs_rgnumber_t rgno)
+{
+	return NULL;
+}
+static inline struct xfs_rtgroup *xfs_rtgroup_bump(struct xfs_rtgroup *rtg)
+{
+	ASSERT(rtg == NULL);
+	return NULL;
+}
+# define xfs_rtgroup_put(rtg) ((void)0)
+# define xfs_initialize_rtgroups(mp, rgcount) (0)
+# define xfs_free_rtgroups(mp) ((void)0)
+#endif /* CONFIG_XFS_RT */
+
+/*
+ * rt group iteration APIs
+ */
+static inline struct xfs_rtgroup *
+xfs_rtgroup_next(
+	struct xfs_rtgroup *rtg,
+	xfs_rgnumber_t *rgno,
+	xfs_rgnumber_t end_rgno)
+{
+	struct xfs_mount *mp = rtg->rtg_mount;
+
+	*rgno = rtg->rtg_rgno + 1;
+	xfs_rtgroup_put(rtg);
+	if (*rgno > end_rgno)
+		return NULL;
+	return xfs_rtgroup_get(mp, *rgno);
+}
+
+#define for_each_rtgroup_range(mp, rgno, end_rgno, rtg) \
+	for ((rtg) = xfs_rtgroup_get((mp), (rgno)); \
+		(rtg) != NULL; \
+		(rtg) = xfs_rtgroup_next((rtg), &(rgno), (end_rgno)))
+
+#define for_each_rtgroup_from(mp, rgno, rtg) \
+	for_each_rtgroup_range((mp), (rgno), (mp)->m_sb.sb_rgcount - 1, (rtg))
+
+
+#define for_each_rtgroup(mp, rgno, rtg) \
+	(rgno) = 0; \
+	for_each_rtgroup_from((mp), (rgno), (rtg))
+
+static inline bool
+xfs_verify_rgbno(
+	struct xfs_rtgroup *rtg,
+	xfs_rgblock_t rgbno)
+{
+	if (rgbno >= rtg->rtg_blockcount)
+		return false;
+	if (rgbno < rtg->rtg_mount->m_sb.sb_rextsize)
+		return false;
+	return true;
+}
+
+static inline bool
+xfs_verify_rgbext(
+	struct xfs_rtgroup *rtg,
+	xfs_rgblock_t rgbno,
+	xfs_rgblock_t len)
+{
+	if (rgbno + len <= rgbno)
+		return false;
+
+	if (!xfs_verify_rgbno(rtg, rgbno))
+		return false;
+
+	return xfs_verify_rgbno(rtg, rgbno + len - 1);
+}
+
+#ifdef CONFIG_XFS_RT
+xfs_rgblock_t xfs_rtgroup_block_count(struct xfs_mount *mp,
+		xfs_rgnumber_t rgno);
+#else
+# define xfs_rtgroup_block_count(mp, rgno) (0)
+#endif /* CONFIG_XFS_RT */
+
+#endif /* __LIBXFS_RTGROUP_H */
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 83930abf935f..48cfb9c8296b 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -641,6 +641,9 @@ __xfs_sb_from_disk(
 		to->sb_gquotino = NULLFSINO;
 		to->sb_pquotino = NULLFSINO;
 	}
+
+	to->sb_rgcount = 0;
+	to->sb_rgblocks = 0;
 }
 
 void
@@ -954,6 +957,8 @@ xfs_sb_mount_common(
 	mp->m_blockwmask = mp->m_blockwsize - 1;
 	mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize);
 	mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize);
+	mp->m_rgblklog = 0;
+	mp->m_rgblkmask = 0;
 
 	mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 1);
 	mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 0);
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index f4615c5be34f..c27c84561b5e 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -9,10 +9,12 @@ typedef uint32_t prid_t; /* project ID */
 
 typedef uint32_t xfs_agblock_t; /* blockno in alloc. group */
+typedef uint32_t xfs_rgblock_t; /* blockno in realtime group */
 typedef uint32_t xfs_agino_t; /* inode # within allocation grp */
 typedef uint32_t xfs_extlen_t; /* extent length in blocks */
 typedef uint32_t xfs_rtxlen_t; /* file extent length in rtextents */
 typedef uint32_t xfs_agnumber_t; /* allocation group number */
+typedef uint32_t xfs_rgnumber_t; /* realtime group number */
 typedef uint64_t xfs_extnum_t; /* # of extents in a file */
 typedef uint32_t xfs_aextnum_t; /* # extents in an attribute fork */
 typedef int64_t xfs_fsize_t; /* bytes in a file */
@@ -54,7 +56,9 @@ typedef void * xfs_failaddr_t;
 #define NULLRTEXTNO ((xfs_rtxnum_t)-1)
 
 #define NULLAGBLOCK ((xfs_agblock_t)-1)
+#define NULLRGBLOCK ((xfs_rgblock_t)-1)
 #define NULLAGNUMBER ((xfs_agnumber_t)-1)
+#define NULLRGNUMBER ((xfs_rgnumber_t)-1)
 
 #define NULLCOMMITLSN ((xfs_lsn_t)-1)
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 006ceff1959d..8e6da3f34585 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -28,6 +28,7 @@
 #include "xfs_ag.h"
 #include "xfs_quota.h"
 #include "xfs_reflink.h"
+#include "xfs_rtgroup.h"
 
 #define BLK_AVG(blk1, blk2) ((blk1+blk2) >> 1)
 
@@ -3341,6 +3342,11 @@ xlog_do_recover(
 		xfs_warn(mp, "Failed post-recovery per-ag init: %d", error);
 		return error;
 	}
+	error = xfs_initialize_rtgroups(mp, sbp->sb_rgcount);
+	if (error) {
+		xfs_warn(mp, "Failed post-recovery rtgroup init: %d", error);
+		return error;
+	}
 	mp->m_alloc_set_aside = xfs_alloc_set_aside(mp);
 
 	/* Normal transactions can now occur */
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 3957c60d5d07..bcfeaaf11536 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -35,6 +35,7 @@
 #include "xfs_trace.h"
 #include "xfs_ag.h"
 #include "xfs_imeta.h"
+#include "xfs_rtgroup.h"
 
 static DEFINE_MUTEX(xfs_uuid_table_mutex);
 static int xfs_uuid_table_size;
@@ -830,10 +831,16 @@ xfs_mountfs(
 		goto out_free_dir;
 	}
 
+	error = xfs_initialize_rtgroups(mp, sbp->sb_rgcount);
+	if (error) {
+		xfs_warn(mp, "Failed rtgroup init: %d", error);
+		goto out_free_perag;
+	}
+
 	if (XFS_IS_CORRUPT(mp, !sbp->sb_logblocks)) {
 		xfs_warn(mp, "no log defined");
 		error = -EFSCORRUPTED;
-		goto out_free_perag;
+		goto out_free_rtgroup;
 	}
 
 	error = xfs_inodegc_register_shrinker(mp);
@@ -1058,6 +1065,8 @@ xfs_mountfs(
 	if (mp->m_logdev_targp && mp->m_logdev_targp != mp->m_ddev_targp)
 		xfs_buftarg_drain(mp->m_logdev_targp);
 	xfs_buftarg_drain(mp->m_ddev_targp);
+ out_free_rtgroup:
+	xfs_free_rtgroups(mp);
  out_free_perag:
 	xfs_free_perag(mp);
  out_free_dir:
@@ -1138,6 +1147,7 @@ xfs_unmountfs(
 	xfs_errortag_clearall(mp);
 #endif
 	unregister_shrinker(&mp->m_inodegc_shrinker);
+	xfs_free_rtgroups(mp);
 	xfs_free_perag(mp);
 
 	xfs_errortag_del(mp);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index bad926f3e102..674938008a97 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -119,6 +119,7 @@ typedef struct xfs_mount {
 	uint8_t m_agno_log; /* log #ag's */
 	uint8_t m_sectbb_log; /* sectlog - BBSHIFT */
 	int8_t m_rtxblklog; /* log2 of rextsize, if possible */
+	int8_t m_rgblklog; /* log2 of rt group sz if possible */
 	uint m_blockmask; /* sb_blocksize-1 */
 	uint m_blockwsize; /* sb_blocksize in words */
 	uint m_blockwmask; /* blockwsize-1 */
@@ -153,6 +154,7 @@ typedef struct xfs_mount {
 	uint64_t m_low_space[XFS_LOWSP_MAX];
 	uint64_t m_low_rtexts[XFS_LOWSP_MAX];
 	uint64_t m_rtxblkmask; /* rt extent block mask */
+	uint64_t m_rgblkmask; /* rt group block mask */
 	struct xfs_ino_geometry m_ino_geo; /* inode geometry */
 	struct xfs_trans_resv m_resv; /* precomputed res values */
 	/* low free space thresholds */
@@ -201,6 +203,8 @@ typedef struct xfs_mount {
 	 */
 	atomic64_t m_allocbt_blks;
 
+	struct radix_tree_root m_rtgroup_tree; /* per-rt group info */
+	spinlock_t m_rtgroup_lock; /* lock for m_rtgroup_tree */
 	struct radix_tree_root m_perag_tree; /* per-ag accounting info */
 	spinlock_t m_perag_lock; /* lock for m_perag_tree */
 	uint64_t m_resblks; /* total reserved blocks */
@@ -285,6 +289,7 @@ typedef struct xfs_mount {
 #define XFS_FEAT_NEEDSREPAIR (1ULL << 25) /* needs xfs_repair */
 #define XFS_FEAT_NREXT64 (1ULL << 26) /* large extent counters */
 #define XFS_FEAT_METADIR (1ULL << 27) /* metadata directory tree */
+#define XFS_FEAT_RTGROUPS (1ULL << 28) /* realtime groups */
 
 /* Mount features */
 #define XFS_FEAT_NOATTR2 (1ULL << 48) /* disable attr2 creation */
@@ -349,6 +354,7 @@ __XFS_HAS_FEAT(bigtime, BIGTIME)
 __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR)
 __XFS_HAS_FEAT(large_extent_counts, NREXT64)
 __XFS_HAS_FEAT(metadir, METADIR)
+__XFS_HAS_FEAT(rtgroups, RTGROUPS)
 
 /*
  * Mount features
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index c131738efd0f..3b13352cfbfc 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -25,6 +25,7 @@
 #include "xfs_da_format.h"
 #include "xfs_imeta.h"
 #include "xfs_rtbitmap.h"
+#include "xfs_rtgroup.h"
 
 /*
  * Realtime metadata files are not quite regular files because userspace can't
@@ -1409,10 +1410,12 @@ xfs_rtmount_iread_extents(
  */
 int /* error */
 xfs_rtmount_inodes(
-	xfs_mount_t *mp) /* file system mount structure */
+	struct xfs_mount *mp) /* file system mount structure */
 {
-	int error; /* error return value */
-	xfs_sb_t *sbp;
+	struct xfs_sb *sbp;
+	struct xfs_rtgroup *rtg;
+	xfs_rgnumber_t rgno;
+	int error; /* error return value */
 
 	sbp = &mp->m_sb;
 	error = xfs_rt_iget(mp, mp->m_sb.sb_rbmino, &xfs_rbmip_key,
@@ -1439,6 +1442,11 @@ xfs_rtmount_inodes(
 	if (error)
 		goto out_rele_summary;
 
+	for_each_rtgroup(mp, rgno, rtg) {
+		rtg->rtg_blockcount = xfs_rtgroup_block_count(mp,
+				rtg->rtg_rgno);
+	}
+
 	xfs_alloc_rsum_cache(mp, sbp->sb_rbmblocks);
 	return 0;
 
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 19a22f9225e4..737c51333d09 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1932,6 +1932,8 @@ static int xfs_init_fs_context(
 	spin_lock_init(&mp->m_agirotor_lock);
 	INIT_RADIX_TREE(&mp->m_perag_tree, GFP_ATOMIC);
 	spin_lock_init(&mp->m_perag_lock);
+	INIT_RADIX_TREE(&mp->m_rtgroup_tree, GFP_ATOMIC);
+	spin_lock_init(&mp->m_rtgroup_lock);
 	mutex_init(&mp->m_growlock);
 	INIT_WORK(&mp->m_flush_inodes_work, xfs_flush_inodes_worker);
 	INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index b92efe4eaeae..f72f694b4656 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -206,6 +206,40 @@ DEFINE_PERAG_REF_EVENT(xfs_perag_put);
 DEFINE_PERAG_REF_EVENT(xfs_perag_set_inode_tag);
 DEFINE_PERAG_REF_EVENT(xfs_perag_clear_inode_tag);
 
+#ifdef CONFIG_XFS_RT
+DECLARE_EVENT_CLASS(xfs_rtgroup_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_rgnumber_t rgno, int refcount,
+		 unsigned long caller_ip),
+	TP_ARGS(mp, rgno, refcount, caller_ip),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_rgnumber_t, rgno)
+		__field(int, refcount)
+		__field(unsigned long, caller_ip)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->rgno = rgno;
+		__entry->refcount = refcount;
+		__entry->caller_ip = caller_ip;
+	),
+	TP_printk("dev %d:%d rgno 0x%x refcount %d caller %pS",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->rgno,
+		  __entry->refcount,
+		  (char *)__entry->caller_ip)
+);
+
+#define DEFINE_RTGROUP_REF_EVENT(name) \
+DEFINE_EVENT(xfs_rtgroup_class, name, \
+	TP_PROTO(struct xfs_mount *mp, xfs_rgnumber_t rgno, int refcount, \
+		 unsigned long caller_ip), \
+	TP_ARGS(mp, rgno, refcount, caller_ip))
+DEFINE_RTGROUP_REF_EVENT(xfs_rtgroup_get);
+DEFINE_RTGROUP_REF_EVENT(xfs_rtgroup_bump);
+DEFINE_RTGROUP_REF_EVENT(xfs_rtgroup_put);
+#endif /* CONFIG_XFS_RT */
+
 TRACE_EVENT(xfs_inodegc_worker,
 	TP_PROTO(struct xfs_mount *mp, unsigned int shrinker_hits),
 	TP_ARGS(mp, shrinker_hits),

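The range checks in xfs_verify_rgbno() and xfs_verify_rgbext() above can be tried outside the kernel. What follows is only a userspace sketch: struct demo_rtgroup, the demo_* function names and the sample geometry (1000 blocks per group, 4 blocks per rt extent) are made up for illustration and are not part of the patch. The sketch assumes, as the patch's check implies, that block numbers below sb_rextsize are never valid allocation targets within a group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two values the real checks consult. */
struct demo_rtgroup {
	uint32_t blockcount;	/* plays the role of rtg_blockcount */
	uint32_t rextsize;	/* plays the role of mp->m_sb.sb_rextsize */
};

/* A block is valid if it is inside the group and past the first rt extent. */
static bool demo_verify_rgbno(const struct demo_rtgroup *rtg, uint32_t rgbno)
{
	if (rgbno >= rtg->blockcount)
		return false;
	if (rgbno < rtg->rextsize)
		return false;
	return true;
}

/* An extent is valid if it is non-empty, does not wrap, and both ends pass. */
static bool demo_verify_rgbext(const struct demo_rtgroup *rtg,
			       uint32_t rgbno, uint32_t len)
{
	uint32_t end = rgbno + len;	/* wraps modulo 2^32 on overflow */

	if (end <= rgbno)		/* catches len == 0 and wraparound */
		return false;
	if (!demo_verify_rgbno(rtg, rgbno))
		return false;
	return demo_verify_rgbno(rtg, end - 1);
}

/* Sample geometry, chosen arbitrarily for the sketch. */
static const struct demo_rtgroup demo_rtg = {
	.blockcount = 1000,
	.rextsize = 4,
};
```

With this geometry, blocks 4 through 999 pass the single-block check, and an extent is accepted only when its first and last blocks both fall in that window. The single comparison "end <= rgbno" covers both the zero-length case and 32-bit wraparound, which is why the original needs no separate length check.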
* [PATCH 05/22] xfs: write secondary realtime superblocks to disk
  2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 01/22] xfs: create incore realtime group structures Darrick J. Wong
@ 2022-12-30 22:17   ` Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 06/22] xfs: grow the realtime section when realtime groups are enabled Darrick J. Wong
                     ` (19 subsequent siblings)
  21 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create some library functions to make it easy to update all the
secondary realtime superblocks on disk; this will be used by growfs,
xfs_db, mkfs, and xfs_repair.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtgroup.c | 117 +++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rtgroup.h | 2 +
 2 files changed, 119 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c
index e9655e699f4f..037506b73384 100644
--- a/fs/xfs/libxfs/xfs_rtgroup.c
+++ b/fs/xfs/libxfs/xfs_rtgroup.c
@@ -382,3 +382,120 @@ xfs_rtgroup_log_super(
 	xfs_rtgroup_update_super(rtsb_bp, sb_bp);
 	xfs_trans_ordered_buf(tp, rtsb_bp);
 }
+
+/* Initialize a secondary realtime superblock. */
+static int
+xfs_rtgroup_init_secondary_super(
+	struct xfs_mount *mp,
+	xfs_rgnumber_t rgno,
+	struct xfs_buf **bpp)
+{
+	struct xfs_buf *bp;
+	struct xfs_rtsb *rsb;
+	xfs_rtblock_t rtbno;
+	int error;
+
+	ASSERT(rgno != 0);
+
+	error = xfs_buf_get_uncached(mp->m_rtdev_targp, XFS_FSB_TO_BB(mp, 1),
+			0, &bp);
+	if (error)
+		return error;
+
+	rtbno = xfs_rgbno_to_rtb(mp, rgno, 0);
+	bp->b_maps[0].bm_bn = xfs_rtb_to_daddr(mp, rtbno);
+	bp->b_ops = &xfs_rtsb_buf_ops;
+	xfs_buf_zero(bp, 0, BBTOB(bp->b_length));
+
+	rsb = bp->b_addr;
+	rsb->rsb_magicnum = cpu_to_be32(XFS_RTSB_MAGIC);
+	rsb->rsb_blocksize = cpu_to_be32(mp->m_sb.sb_blocksize);
+	rsb->rsb_rblocks = cpu_to_be64(mp->m_sb.sb_rblocks);
+
+	rsb->rsb_rextents = cpu_to_be64(mp->m_sb.sb_rextents);
+
+	memcpy(&rsb->rsb_uuid, &mp->m_sb.sb_uuid, sizeof(rsb->rsb_uuid));
+
+	rsb->rsb_rgcount = cpu_to_be32(mp->m_sb.sb_rgcount);
+	memcpy(&rsb->rsb_fname, &mp->m_sb.sb_fname, XFSLABEL_MAX);
+
+	rsb->rsb_rextsize = cpu_to_be32(mp->m_sb.sb_rextsize);
+	rsb->rsb_rbmblocks = cpu_to_be32(mp->m_sb.sb_rbmblocks);
+
+	rsb->rsb_rgblocks = cpu_to_be32(mp->m_sb.sb_rgblocks);
+	rsb->rsb_blocklog = mp->m_sb.sb_blocklog;
+	rsb->rsb_sectlog = mp->m_sb.sb_sectlog;
+	rsb->rsb_rextslog = mp->m_sb.sb_rextslog;
+
+	memcpy(&rsb->rsb_meta_uuid, &mp->m_sb.sb_meta_uuid,
+			sizeof(rsb->rsb_meta_uuid));
+
+	*bpp = bp;
+	return 0;
+}
+
+/*
+ * Update all the realtime superblocks to match the new state of the primary.
+ * Because we are completely overwriting all the existing fields in the
+ * secondary superblock buffers, there is no need to read them in from disk.
+ * Just get a new buffer, stamp it and write it.
+ *
+ * The rt super buffers do not need to be kept in memory once they are
+ * written, so we mark them as one-shot buffers.
+ */
+int
+xfs_rtgroup_update_secondary_sbs(
+	struct xfs_mount *mp)
+{
+	LIST_HEAD (buffer_list);
+	struct xfs_rtgroup *rtg;
+	xfs_rgnumber_t start_rgno = 1;
+	int saved_error = 0;
+	int error = 0;
+
+	for_each_rtgroup_from(mp, start_rgno, rtg) {
+		struct xfs_buf *bp;
+
+		error = xfs_rtgroup_init_secondary_super(mp, rtg->rtg_rgno,
+				&bp);
+		/*
+		 * If we get an error reading or writing alternate superblocks,
+		 * continue. If we break early, we'll leave more superblocks
+		 * un-updated than updated.
+		 */
+		if (error) {
+			xfs_warn(mp,
+		"error allocating secondary superblock for rt group %u",
+				rtg->rtg_rgno);
+			if (!saved_error)
+				saved_error = error;
+			continue;
+		}
+
+		xfs_buf_oneshot(bp);
+		xfs_buf_delwri_queue(bp, &buffer_list);
+		xfs_buf_relse(bp);
+
+		/* don't hold too many buffers at once */
+		if (rtg->rtg_rgno % 16)
+			continue;
+
+		error = xfs_buf_delwri_submit(&buffer_list);
+		if (error) {
+			xfs_warn(mp,
+		"write error %d updating a secondary superblock near rt group %u",
+				error, rtg->rtg_rgno);
+			if (!saved_error)
+				saved_error = error;
+			continue;
+		}
+	}
+	error = xfs_buf_delwri_submit(&buffer_list);
+	if (error) {
+		xfs_warn(mp,
+		"write error %d updating a secondary superblock near rt group %u",
+			error, start_rgno);
+	}
+
+	return saved_error ? saved_error : error;
+}
diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h
index c6db6b0d2ae5..d8723fabeb57 100644
--- a/fs/xfs/libxfs/xfs_rtgroup.h
+++ b/fs/xfs/libxfs/xfs_rtgroup.h
@@ -201,10 +201,12 @@ xfs_rgblock_t xfs_rtgroup_block_count(struct xfs_mount *mp,
 void xfs_rtgroup_update_super(struct xfs_buf *rtsb_bp,
 		const struct xfs_buf *sb_bp);
 void xfs_rtgroup_log_super(struct xfs_trans *tp, const struct xfs_buf *sb_bp);
+int xfs_rtgroup_update_secondary_sbs(struct xfs_mount *mp);
 #else
 # define xfs_rtgroup_block_count(mp, rgno) (0)
 # define xfs_rtgroup_update_super(bp, sb_bp) ((void)0)
 # define xfs_rtgroup_log_super(tp, sb_bp) ((void)0)
+# define xfs_rtgroup_update_secondary_sbs(mp) (0)
 #endif /* CONFIG_XFS_RT */
 
 #endif /* __LIBXFS_RTGROUP_H */

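The control flow of xfs_rtgroup_update_secondary_sbs() boils down to three rules: keep going after a failure, remember only the first error, and flush the queue every sixteenth group. The sketch below models that flow in plain userspace C. Everything named demo_* is hypothetical; buffers are replaced by simple counters, and the injected failure uses -ENOMEM purely as an example error code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define DEMO_BATCH 16	/* flush the queue every 16 groups, as the patch does */

static uint32_t demo_fail_rgno;		/* group whose setup fails; 0 = none */
static unsigned int demo_submits;	/* number of queue flushes issued */
static unsigned int demo_written;	/* buffers that reached "disk" */

/* Stand-in for building one secondary superblock buffer. */
static int demo_init_super(uint32_t rgno)
{
	return rgno == demo_fail_rgno ? -ENOMEM : 0;
}

/* Stand-in for submitting the delayed-write queue. */
static int demo_submit(unsigned int queued)
{
	demo_submits++;
	demo_written += queued;
	return 0;
}

/* Walk groups 1..rgcount-1; group 0 holds the primary and is skipped. */
static int demo_update_secondary_sbs(uint32_t rgcount, uint32_t fail_rgno)
{
	unsigned int queued = 0;
	int saved_error = 0;
	int error = 0;
	uint32_t rgno;

	demo_fail_rgno = fail_rgno;
	demo_submits = 0;
	demo_written = 0;

	for (rgno = 1; rgno < rgcount; rgno++) {
		error = demo_init_super(rgno);
		if (error) {
			/* Keep the first error but carry on with the rest. */
			if (!saved_error)
				saved_error = error;
			continue;
		}
		queued++;

		/* Don't hold too many buffers at once. */
		if (rgno % DEMO_BATCH)
			continue;

		error = demo_submit(queued);
		queued = 0;
		if (error && !saved_error)
			saved_error = error;
	}

	/* Flush whatever is left in the queue. */
	error = demo_submit(queued);
	return saved_error ? saved_error : error;
}
```

Calling demo_update_secondary_sbs(40, 20) simulates a 40-group filesystem where setup of group 20 fails: the other 38 secondaries are still written and the caller gets the first error back, which is the trade-off the comment in the patch describes.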
* [PATCH 06/22] xfs: grow the realtime section when realtime groups are enabled
  2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 01/22] xfs: create incore realtime group structures Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 05/22] xfs: write secondary realtime superblocks to disk Darrick J. Wong
@ 2022-12-30 22:17   ` Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 03/22] xfs: check the realtime superblock at mount time Darrick J. Wong
                     ` (18 subsequent siblings)
  21 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Enable growing the rt section when realtime groups are enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_shared.h | 1
 fs/xfs/xfs_rtalloc.c | 139 ++++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_trans.c | 10 +++
 fs/xfs/xfs_trans.h | 1
 4 files changed, 145 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index e76e735b1d05..bcdf298889af 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -111,6 +111,7 @@ void xfs_log_get_max_trans_res(struct xfs_mount *mp,
 #define XFS_TRANS_SB_RBLOCKS 0x00000800
 #define XFS_TRANS_SB_REXTENTS 0x00001000
 #define XFS_TRANS_SB_REXTSLOG 0x00002000
+#define XFS_TRANS_SB_RGCOUNT 0x00004000
 
 /*
  * Here we centralize the specification of XFS meta-data buffer reference count
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 9c842237c452..9d8d91fa0ecf 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -930,6 +930,92 @@ xfs_alloc_rsum_cache(
  * Visible (exported) functions.
  */
 
+static int
+xfs_growfs_rt_free_new(
+	struct xfs_trans *tp,
+	struct xfs_mount *nmp,
+	xfs_rtbxlen_t *freed_rtx)
+{
+	struct xfs_mount *mp = tp->t_mountp;
+	struct xfs_sb *sbp = &mp->m_sb;
+	struct xfs_sb *nsbp = &nmp->m_sb;
+	struct xfs_buf *bp = NULL;
+	xfs_fileoff_t sumbno;
+	xfs_rtblock_t rtbno, next_rtbno;
+	int error = 0;
+
+	if (!xfs_has_rtgroups(mp)) {
+		*freed_rtx = nsbp->sb_rextents - sbp->sb_rextents;
+		return xfs_rtfree_range(nmp, tp, sbp->sb_rextents, *freed_rtx,
+				&bp, &sumbno);
+	}
+
+	*freed_rtx = 0;
+
+	rtbno = xfs_rtx_to_rtb(nmp, sbp->sb_rextents);
+	next_rtbno = xfs_rtx_to_rtb(nmp, nsbp->sb_rextents);
+	while (rtbno < next_rtbno) {
+		xfs_rtxnum_t start_rtx, next_rtx;
+		xfs_rtblock_t next_free_rtbno;
+		xfs_rgnumber_t rgno;
+		xfs_rgblock_t rgbno;
+
+		/*
+		 * Compute the first new extent that we want to free, being
+		 * careful to skip past a realtime superblock at the start of
+		 * the new region.
+		 */
+		rgbno = xfs_rtb_to_rgbno(nmp, rtbno, &rgno);
+		if (rgbno == 0) {
+			rtbno += nsbp->sb_rextsize;
+			if (rtbno >= next_rtbno)
+				break;
+		}
+
+		start_rtx = xfs_rtb_to_rtxt(nmp, rtbno);
+
+		/*
+		 * Stop freeing either at the end of the new rt section or at
+		 * the start of the next realtime group.
+		 */
+		next_free_rtbno = xfs_rgbno_to_rtb(nmp, rgno + 1, 0);
+		next_rtx = xfs_rtb_to_rtxt(nmp, next_free_rtbno);
+		next_rtx = min(next_rtx, nsbp->sb_rextents);
+
+		bp = NULL;
+		*freed_rtx += next_rtx - start_rtx;
+		error = xfs_rtfree_range(nmp, tp, start_rtx,
+				next_rtx - start_rtx, &bp, &sumbno);
+		if (error)
+			break;
+
+		rtbno = next_free_rtbno;
+	}
+
+	return error;
+}
+
+static int
+xfs_growfs_rt_init_primary(
+	struct xfs_mount *mp)
+{
+	struct xfs_buf *rtsb_bp;
+	int error;
+
+	error = xfs_buf_get_uncached(mp->m_rtdev_targp, XFS_FSB_TO_BB(mp, 1),
+			0, &rtsb_bp);
+	if (error)
+		return error;
+
+	rtsb_bp->b_maps[0].bm_bn = XFS_RTSB_DADDR;
+	rtsb_bp->b_ops = &xfs_rtsb_buf_ops;
+
+	xfs_rtgroup_update_super(rtsb_bp, mp->m_sb_bp);
+	mp->m_rtsb_bp = rtsb_bp;
+	xfs_buf_unlock(rtsb_bp);
+	return 0;
+}
+
 /*
  * Grow the realtime area of the filesystem.
  */
@@ -953,8 +1039,8 @@ xfs_growfs_rt(
 	xfs_extlen_t rbmblocks; /* current number of rt bitmap blocks */
 	xfs_extlen_t rsumblocks; /* current number of rt summary blks */
 	xfs_sb_t *sbp; /* old superblock */
-	xfs_fileoff_t sumbno; /* summary block number */
 	uint8_t *rsum_cache; /* old summary cache */
+	xfs_rgnumber_t new_rgcount = 0;
 
 	sbp = &mp->m_sb;
 
@@ -1019,6 +1105,30 @@ xfs_growfs_rt(
 	 */
 	if (nrsumblocks > (mp->m_sb.sb_logblocks >> 1))
 		return -EINVAL;
+
+	/* Allocate the new rt group structures */
+	if (xfs_has_rtgroups(mp)) {
+		/*
+		 * We don't support changing the group size to match the extent
+		 * size, even if the size of the rt section is currently zero.
+		 */
+		if (mp->m_sb.sb_rgblocks % in->extsize != 0)
+			return -EOPNOTSUPP;
+
+		if (mp->m_sb.sb_rblocks == 0) {
+			error = xfs_growfs_rt_init_primary(mp);
+			if (error)
+				return error;
+		}
+
+		new_rgcount = howmany_64(nrblocks, mp->m_sb.sb_rgblocks);
+		if (new_rgcount > mp->m_sb.sb_rgcount) {
+			error = xfs_initialize_rtgroups(mp, new_rgcount);
+			if (error)
+				return error;
+		}
+	}
+
 	/*
 	 * Get the old block counts for bitmap and summary inodes.
 	 * These can't change since other growfs callers are locked out.
@@ -1054,7 +1164,10 @@ xfs_growfs_rt(
 	     bmbno < nrbmblocks;
 	     bmbno++) {
 		struct xfs_trans *tp;
+		struct xfs_rtgroup *rtg;
 		xfs_rfsblock_t nrblocks_step;
+		xfs_rtbxlen_t freed_rtx = 0;
+		xfs_rgnumber_t last_rgno = mp->m_sb.sb_rgcount - 1;
 
 		*nmp = *mp;
 		nsbp = &nmp->m_sb;
@@ -1074,6 +1187,11 @@ xfs_growfs_rt(
 		nrsumblocks = xfs_rtsummary_blockcount(mp, nrsumlevels,
 				nsbp->sb_rbmblocks);
 		nmp->m_rsumsize = nrsumsize = XFS_FSB_TO_B(mp, nrsumblocks);
+
+		if (xfs_has_rtgroups(mp))
+			nsbp->sb_rgcount = howmany_64(nsbp->sb_rblocks,
+					nsbp->sb_rgblocks);
+
 		/*
 		 * Start a transaction, get the log reservation.
 		 */
@@ -1113,6 +1231,7 @@ xfs_growfs_rt(
 			if (error)
 				goto error_cancel;
 		}
+
 		/*
 		 * Update superblock fields.
 		 */
@@ -1131,12 +1250,13 @@ xfs_growfs_rt(
 		if (nsbp->sb_rextslog != sbp->sb_rextslog)
 			xfs_trans_mod_sb(tp, XFS_TRANS_SB_REXTSLOG,
 				nsbp->sb_rextslog - sbp->sb_rextslog);
+		if (nsbp->sb_rgcount != sbp->sb_rgcount)
+			xfs_trans_mod_sb(tp, XFS_TRANS_SB_RGCOUNT,
+				nsbp->sb_rgcount - sbp->sb_rgcount);
 		/*
 		 * Free new extent.
 		 */
-		bp = NULL;
-		error = xfs_rtfree_range(nmp, tp, sbp->sb_rextents,
-				nsbp->sb_rextents - sbp->sb_rextents, &bp, &sumbno);
+		error = xfs_growfs_rt_free_new(tp, nmp, &freed_rtx);
 		if (error) {
 error_cancel:
 			xfs_trans_cancel(tp);
@@ -1145,8 +1265,7 @@ xfs_growfs_rt(
 		/*
 		 * Mark more blocks free in the superblock.
 		 */
-		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FREXTENTS,
-			nsbp->sb_rextents - sbp->sb_rextents);
+		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FREXTENTS, freed_rtx);
 		/*
 		 * Update mp values into the real mp structure.
 		 */
@@ -1157,6 +1276,10 @@ xfs_growfs_rt(
 		if (error)
 			break;
 
+		for_each_rtgroup_from(mp, last_rgno, rtg)
+			rtg->rtg_blockcount = xfs_rtgroup_block_count(mp,
+					rtg->rtg_rgno);
+
 		/* Ensure the mount RT feature flag is now set. */
 		mp->m_features |= XFS_FEAT_REALTIME;
 	}
@@ -1165,6 +1288,10 @@ xfs_growfs_rt(
 
 	/* Update secondary superblocks now the physical grow has completed */
 	error = xfs_update_secondary_sbs(mp);
+	if (error)
+		goto out_free;
+
+	error = xfs_rtgroup_update_secondary_sbs(mp);
 
 out_free:
 	/*
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 979aba1b2fc8..a6f46cd9e60c 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -510,6 +510,10 @@ xfs_trans_mod_sb(
 	case XFS_TRANS_SB_REXTSLOG:
 		tp->t_rextslog_delta += delta;
 		break;
+	case XFS_TRANS_SB_RGCOUNT:
+		ASSERT(delta > 0);
+		tp->t_rgcount_delta += delta;
+		break;
 	default:
 		ASSERT(0);
 		return;
@@ -615,6 +619,11 @@ xfs_trans_apply_sb_deltas(
 		whole = 1;
 		update_rtsb = true;
 	}
+	if (tp->t_rgcount_delta) {
+		be32_add_cpu(&sbp->sb_rgcount, tp->t_rgcount_delta);
+		whole = 1;
+		update_rtsb = true;
+	}
 
 	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
 	if (whole)
@@ -728,6 +737,7 @@ xfs_trans_unreserve_and_mod_sb(
 	mp->m_sb.sb_rblocks += tp->t_rblocks_delta;
 	mp->m_sb.sb_rextents += tp->t_rextents_delta;
 	mp->m_sb.sb_rextslog += tp->t_rextslog_delta;
+	mp->m_sb.sb_rgcount += tp->t_rgcount_delta;
 	spin_unlock(&mp->m_sb_lock);
 
 	/*
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 37cde68f3a31..efa7eace0859 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -150,6 +150,7 @@ typedef struct xfs_trans {
 	int64_t t_rblocks_delta;/* superblock rblocks change */
 	int64_t t_rextents_delta;/* superblocks rextents chg */
 	int64_t t_rextslog_delta;/* superblocks rextslog chg */
+	int64_t t_rgcount_delta; /* realtime group count */
 	struct list_head t_items; /* log item descriptors */
 	struct list_head t_busy; /* list of busy extents */
 	struct list_head t_dfops; /* deferred operations */

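The accounting in xfs_growfs_rt_free_new() is easier to follow with numbers. The sketch below recomputes the freed-extent count in userspace; it does not touch any bitmap. It rests on one loudly stated assumption: that the rt block number helpers used by the patch (xfs_rgbno_to_rtb, xfs_rtb_to_rgbno) map groups linearly, i.e. rtbno = rgno * rgblocks + rgbno. That mapping is not shown in this excerpt. The demo_* names and the sample geometry are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division, standing in for howmany_64(). */
static uint64_t demo_howmany(uint64_t x, uint64_t y)
{
	return (x + y - 1) / y;
}

/*
 * Count the rt extents that a grow from old_rextents to new_rextents
 * would mark free, skipping the first extent of every group that starts
 * inside the new region (that extent holds the group's superblock).
 * Assumes a linear layout: rtbno = rgno * rgblocks + rgbno.
 */
static uint64_t demo_count_freed_rtx(uint64_t old_rextents,
				     uint64_t new_rextents,
				     uint32_t rextsize, uint32_t rgblocks)
{
	uint64_t freed = 0;
	uint64_t rtbno = old_rextents * rextsize;
	uint64_t end_rtbno = new_rextents * rextsize;

	while (rtbno < end_rtbno) {
		uint64_t rgno = rtbno / rgblocks;
		uint64_t rgbno = rtbno % rgblocks;
		uint64_t start_rtx, next_rtx, next_group_rtbno;

		/* Skip the superblock extent at the start of a group. */
		if (rgbno == 0) {
			rtbno += rextsize;
			if (rtbno >= end_rtbno)
				break;
		}
		start_rtx = rtbno / rextsize;

		/* Stop at the next group or at the end of the new section. */
		next_group_rtbno = (rgno + 1) * rgblocks;
		next_rtx = next_group_rtbno / rextsize;
		if (next_rtx > new_rextents)
			next_rtx = new_rextents;

		freed += next_rtx - start_rtx;
		rtbno = next_group_rtbno;
	}
	return freed;
}
```

As a worked case with 4-block extents and 100-block groups (25 extents per group): growing from 10 to 30 extents adds extents 10 through 29, but extent 25 is the first extent of group 1 and is skipped, so 19 extents are reported free. This is why the patch passes freed_rtx to XFS_TRANS_SB_FREXTENTS instead of the raw difference in sb_rextents.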
* [PATCH 03/22] xfs: check the realtime superblock at mount time
  2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (2 preceding siblings ...)
  2022-12-30 22:17   ` [PATCH 06/22] xfs: grow the realtime section when realtime groups are enabled Darrick J. Wong
@ 2022-12-30 22:17   ` Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 04/22] xfs: update primary realtime super every time we update the primary fs super Darrick J. Wong
                     ` (17 subsequent siblings)
  21 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Check the realtime superblock at mount time, to ensure that the label
actually matches the primary superblock. If the rt superblock is good,
attach it to the xfs_mount so that the log can use ordered buffers to
keep this primary in sync with the primary super on the data device.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_mount.h | 1 +
 fs/xfs/xfs_rtalloc.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_rtalloc.h | 5 +++++
 fs/xfs/xfs_super.c | 16 ++++++++++++++--
 4 files changed, 70 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 674938008a97..7f0a80a8dcd4 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -82,6 +82,7 @@ typedef struct xfs_mount {
 	struct super_block *m_super;
 	struct xfs_ail *m_ail; /* fs active log item list */
 	struct xfs_buf *m_sb_bp; /* buffer for superblock */
+	struct xfs_buf *m_rtsb_bp; /* realtime superblock */
 	char *m_rtname; /* realtime device name */
 	char *m_logname; /* external log device name */
 	struct xfs_da_geometry *m_dir_geo; /* directory block geometry */
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 3b13352cfbfc..9c842237c452 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1263,6 +1263,56 @@ xfs_rtallocate_extent(
 	return 0;
 }
 
+/* Read the primary realtime group's superblock and attach it to the mount. */
+int
+xfs_rtmount_readsb(
+	struct xfs_mount *mp)
+{
+	struct xfs_buf *bp;
+	int error;
+
+	if (!xfs_has_rtgroups(mp))
+		return 0;
+	if (mp->m_sb.sb_rblocks == 0)
+		return 0;
+	if (mp->m_rtdev_targp == NULL) {
+		xfs_warn(mp,
+	"Filesystem has a realtime volume, use rtdev=device option");
+		return -ENODEV;
+	}
+
+	/* m_blkbb_log is not set up yet */
+	error = xfs_buf_read_uncached(mp->m_rtdev_targp, XFS_RTSB_DADDR,
+			mp->m_sb.sb_blocksize >> BBSHIFT, XBF_NO_IOACCT, &bp,
+			&xfs_rtsb_buf_ops);
+	if (error) {
+		xfs_warn(mp, "rt sb validate failed with error %d.", error);
+		/* bad CRC means corrupted metadata */
+		if (error == -EFSBADCRC)
+			error = -EFSCORRUPTED;
+		return error;
+	}
+
+	mp->m_rtsb_bp = bp;
+	xfs_buf_unlock(bp);
+	return 0;
+}
+
+/* Detach the realtime superblock from the mount and free it. */
+void
+xfs_rtmount_freesb(
+	struct xfs_mount *mp)
+{
+	struct xfs_buf *bp = mp->m_rtsb_bp;
+
+	if (!bp)
+		return;
+
+	xfs_buf_lock(bp);
+	mp->m_rtsb_bp = NULL;
+	xfs_buf_relse(bp);
+}
+
 /*
  * Initialize realtime fields in the mount structure.
  */
diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h
index 5ac9c15948c8..d0fd49db77bd 100644
--- a/fs/xfs/xfs_rtalloc.h
+++ b/fs/xfs/xfs_rtalloc.h
@@ -33,6 +33,9 @@ xfs_rtallocate_extent(
 	xfs_rtxnum_t *rtblock); /* out: start rtext allocated */
 
+int xfs_rtmount_readsb(struct xfs_mount *mp);
+void xfs_rtmount_freesb(struct xfs_mount *mp);
+
 /*
  * Initialize realtime fields in the mount structure.
  */
@@ -81,6 +84,8 @@ int xfs_rtfile_convert_unwritten(struct xfs_inode *ip, loff_t pos,
 # define xfs_rtpick_extent(m,t,l,rb) (-ENOSYS)
 # define xfs_growfs_rt(mp,in) (-ENOSYS)
 # define xfs_rtalloc_reinit_frextents(m) (0)
+# define xfs_rtmount_readsb(mp) (0)
+# define xfs_rtmount_freesb(mp) ((void)0)
 static inline int /* error */
 xfs_rtmount_init(
 	xfs_mount_t *mp) /* file system mount structure */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 737c51333d09..bfe93ca6eed4 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -44,6 +44,7 @@
 #include "scrub/rcbag_btree.h"
 #include "xfs_swapext_item.h"
 #include "xfs_rtbitmap.h"
+#include "xfs_rtalloc.h"
 
 #include <linux/magic.h>
 #include <linux/fs_context.h>
@@ -1117,6 +1118,7 @@ xfs_fs_put_super(
 	xfs_filestream_unmount(mp);
 	xfs_unmountfs(mp);
 
+	xfs_rtmount_freesb(mp);
 	xfs_freesb(mp);
 	free_percpu(mp->m_stats.xs_stats);
 	xfs_mount_list_del(mp);
@@ -1599,9 +1601,13 @@ xfs_fs_fill_super(
 		goto out_free_sb;
 	}
 
+	error = xfs_rtmount_readsb(mp);
+	if (error)
+		goto out_free_sb;
+
 	error = xfs_filestream_mount(mp);
 	if (error)
-		goto out_free_sb;
+		goto out_free_rtsb;
 
 	/*
 	 * we must configure the block size in the superblock before we run the
@@ -1645,6 +1651,10 @@ xfs_fs_fill_super(
 		xfs_warn(mp,
 "EXPERIMENTAL metadata directory feature in use. Use at your own risk!");
 
+	if (xfs_has_rtgroups(mp))
+		xfs_warn(mp,
+"EXPERIMENTAL realtime allocation group feature in use. Use at your own risk!");
+
 	if (xfs_has_reflink(mp)) {
 		if (mp->m_sb.sb_rblocks) {
 			xfs_alert(mp,
@@ -1689,6 +1699,8 @@ xfs_fs_fill_super(
 
  out_filestream_unmount:
 	xfs_filestream_unmount(mp);
+ out_free_rtsb:
+	xfs_rtmount_freesb(mp);
  out_free_sb:
 	xfs_freesb(mp);
  out_free_stats:
@@ -1710,7 +1722,7 @@ xfs_fs_fill_super(
  out_unmount:
 	xfs_filestream_unmount(mp);
 	xfs_unmountfs(mp);
-	goto out_free_sb;
+	goto out_free_rtsb;
 }
 
 static int

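The xfs_super.c hunks slot the rt superblock into the existing goto-based unwind chain: it is read after the data-device superblock and before filestreams are set up, so every later failure must release it before releasing the primary. A stripped-down userspace model of that ordering is sketched below. The demo_* names, the step numbering and the trace strings are all invented; only the relative order of the three steps comes from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

static char demo_log[64];	/* trace of acquire (+) and release (-) steps */

/* Pretend to acquire a resource; record it in the trace on success. */
static int demo_acquire(const char *name, bool fail)
{
	if (fail)
		return -EIO;
	strcat(demo_log, name);
	return 0;
}

/*
 * fail_at selects which step fails: 1 = superblock, 2 = rt superblock,
 * 3 = filestream setup, 0 = nothing fails.
 */
static int demo_fill_super(int fail_at)
{
	int error;

	demo_log[0] = '\0';

	error = demo_acquire("+sb ", fail_at == 1);
	if (error)
		goto out;

	error = demo_acquire("+rtsb ", fail_at == 2);
	if (error)
		goto out_free_sb;

	error = demo_acquire("+filestream ", fail_at == 3);
	if (error)
		goto out_free_rtsb;

	return 0;

out_free_rtsb:
	strcat(demo_log, "-rtsb ");
out_free_sb:
	strcat(demo_log, "-sb ");
out:
	return error;
}
```

The labels fall through in reverse acquisition order, so jumping to out_free_rtsb releases the rt superblock and then the primary. This mirrors why the patch retargets the filestream failure path and the out_unmount path from out_free_sb to out_free_rtsb.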
* [PATCH 04/22] xfs: update primary realtime super every time we update the primary fs super
  2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong
                     ` (3 preceding siblings ...)
  2022-12-30 22:17   ` [PATCH 03/22] xfs: check the realtime superblock at mount time Darrick J. Wong
@ 2022-12-30 22:17   ` Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 07/22] xfs: always update secondary rt supers when we update secondary fs supers Darrick J. Wong
                     ` (16 subsequent siblings)
  21 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Every time we update parts of the primary filesystem superblock that are
echoed in the primary rt super, we should update that primary realtime
super. Avoid an ondisk log format change by using ordered buffers to
write the primary rt super.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtgroup.c | 74 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rtgroup.h | 6 +++
 fs/xfs/libxfs/xfs_sb.c | 13 +++++++
 fs/xfs/xfs_buf_item_recover.c | 18 ++++++++++
 fs/xfs/xfs_trans.c | 10 ++++++
 fs/xfs/xfs_trans.h | 1 +
 fs/xfs/xfs_trans_buf.c | 25 +++++++++++---
 7 files changed, 142 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c
index edbc427725c3..e9655e699f4f 100644
--- a/fs/xfs/libxfs/xfs_rtgroup.c
+++ b/fs/xfs/libxfs/xfs_rtgroup.c
@@ -308,3 +308,77 @@ const struct xfs_buf_ops xfs_rtsb_buf_ops = {
 	.verify_write = xfs_rtsb_write_verify,
 	.verify_struct = xfs_rtsb_verify,
 };
+
+/* Update a realtime superblock from the primary fs super */
+void
+xfs_rtgroup_update_super(
+	struct xfs_buf *rtsb_bp,
+	const struct xfs_buf *sb_bp)
+{
+	const struct xfs_dsb *dsb = sb_bp->b_addr;
+	struct xfs_rtsb *rsb = rtsb_bp->b_addr;
+	const uuid_t *meta_uuid;
+
+	rsb->rsb_magicnum = cpu_to_be32(XFS_RTSB_MAGIC);
+	rsb->rsb_blocksize = dsb->sb_blocksize;
+	rsb->rsb_rblocks = dsb->sb_rblocks;
+
+	rsb->rsb_rextents = dsb->sb_rextents;
+	rsb->rsb_lsn = 0;
+
+	memcpy(&rsb->rsb_uuid, &dsb->sb_uuid, sizeof(rsb->rsb_uuid));
+
+	rsb->rsb_rgcount = dsb->sb_rgcount;
+	memcpy(&rsb->rsb_fname, &dsb->sb_fname, XFSLABEL_MAX);
+
+	rsb->rsb_rextsize = dsb->sb_rextsize;
+	rsb->rsb_rbmblocks = dsb->sb_rbmblocks;
+
+	rsb->rsb_rgblocks = dsb->sb_rgblocks;
+	rsb->rsb_blocklog = dsb->sb_blocklog;
+	rsb->rsb_sectlog = dsb->sb_sectlog;
+	rsb->rsb_rextslog = dsb->sb_rextslog;
+	rsb->rsb_pad = 0;
+	rsb->rsb_pad2 = 0;
+
+	/*
+	 * The metadata uuid is the fs uuid if the metauuid feature is not
+	 * enabled.
+	 */
+	if (dsb->sb_features_incompat &
+				cpu_to_be32(XFS_SB_FEAT_INCOMPAT_META_UUID))
+		meta_uuid = &dsb->sb_meta_uuid;
+	else
+		meta_uuid = &dsb->sb_uuid;
+	memcpy(&rsb->rsb_meta_uuid, meta_uuid, sizeof(rsb->rsb_meta_uuid));
+}
+
+/*
+ * Update the primary realtime superblock from a filesystem superblock and
+ * log it to the given transaction.
+ */
+void
+xfs_rtgroup_log_super(
+	struct xfs_trans *tp,
+	const struct xfs_buf *sb_bp)
+{
+	struct xfs_buf *rtsb_bp;
+
+	if (!xfs_has_rtgroups(tp->t_mountp))
+		return;
+
+	rtsb_bp = xfs_trans_getrtsb(tp);
+	if (!rtsb_bp) {
+		/*
+		 * It's possible for the rtgroups feature to be enabled but
+		 * there is no incore rt superblock buffer if the rt geometry
+		 * was specified at mkfs time but the rt section has not yet
+		 * been attached. In this case, rblocks must be zero.
+		 */
+		ASSERT(tp->t_mountp->m_sb.sb_rblocks == 0);
+		return;
+	}
+
+	xfs_rtgroup_update_super(rtsb_bp, sb_bp);
+	xfs_trans_ordered_buf(tp, rtsb_bp);
+}
diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h
index ff9b01d8c501..c6db6b0d2ae5 100644
--- a/fs/xfs/libxfs/xfs_rtgroup.h
+++ b/fs/xfs/libxfs/xfs_rtgroup.h
@@ -197,8 +197,14 @@ xfs_daddr_to_rgbno(
 #ifdef CONFIG_XFS_RT
 xfs_rgblock_t xfs_rtgroup_block_count(struct xfs_mount *mp,
 		xfs_rgnumber_t rgno);
+
+void xfs_rtgroup_update_super(struct xfs_buf *rtsb_bp,
+		const struct xfs_buf *sb_bp);
+void xfs_rtgroup_log_super(struct xfs_trans *tp, const struct xfs_buf *sb_bp);
 #else
 # define xfs_rtgroup_block_count(mp, rgno) (0)
+# define xfs_rtgroup_update_super(bp, sb_bp) ((void)0)
+# define xfs_rtgroup_log_super(tp, sb_bp) ((void)0)
 #endif /* CONFIG_XFS_RT */
 
 #endif /* __LIBXFS_RTGROUP_H */
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index bbadf78b4628..2e8ec9214c6c 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -26,6 +26,7 @@
 #include "xfs_health.h"
 #include "xfs_ag.h"
 #include "xfs_swapext.h"
+#include "xfs_rtgroup.h"
 
 /*
  * Physical superblock buffer manipulations. Shared with libxfs in userspace.
@@ -1100,6 +1101,8 @@ xfs_log_sb(
 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
 	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
 	xfs_trans_log_buf(tp, bp, 0, sizeof(struct xfs_dsb) - 1);
+
+	xfs_rtgroup_log_super(tp, bp);
 }
 
 /*
@@ -1216,6 +1219,7 @@ xfs_sync_sb_buf(
 {
 	struct xfs_trans *tp;
 	struct xfs_buf *bp;
+	struct xfs_buf *rtsb_bp = NULL;
 	int error;
 
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb, 0, 0, 0, &tp);
@@ -1225,6 +1229,11 @@ xfs_sync_sb_buf(
 	bp = xfs_trans_getsb(tp);
 	xfs_log_sb(tp);
 	xfs_trans_bhold(tp, bp);
+	if (xfs_has_rtgroups(mp)) {
+		rtsb_bp = xfs_trans_getrtsb(tp);
+		if (rtsb_bp)
+			xfs_trans_bhold(tp, rtsb_bp);
+	}
 	xfs_trans_set_sync(tp);
 	error = xfs_trans_commit(tp);
 	if (error)
@@ -1233,7 +1242,11 @@ xfs_sync_sb_buf(
 	 * write out the sb buffer to get the changes to disk
 	 */
 	error = xfs_bwrite(bp);
+	if (!error && rtsb_bp)
+		error = xfs_bwrite(rtsb_bp);
 out:
+	if (rtsb_bp)
+		xfs_buf_relse(rtsb_bp);
 	xfs_buf_relse(bp);
 	return error;
 }
diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index ffa94102094d..6587d18b21c3 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -22,6 +22,7 @@
 #include "xfs_inode.h"
 #include "xfs_dir2.h"
 #include "xfs_quota.h"
+#include "xfs_rtgroup.h"
 
 /*
  * This is the number of entries in the l_buf_cancel_table used during
@@ -985,6 +986,23 @@ xlog_recover_buf_commit_pass2(
 		ASSERT(bp->b_mount == mp);
 		bp->b_flags |= _XBF_LOGRECOVERY;
 		xfs_buf_delwri_queue(bp, buffer_list);
+
+		/*
+		 * Update the primary rt super if we just recovered the primary
+		 * fs super.
+		 */
+		if (xfs_has_rtgroups(mp) && bp->b_ops == &xfs_sb_buf_ops) {
+			struct xfs_buf *rtsb_bp = mp->m_rtsb_bp;
+
+			if (rtsb_bp) {
+				xfs_buf_lock(rtsb_bp);
+				xfs_buf_hold(rtsb_bp);
+				xfs_rtgroup_update_super(rtsb_bp, bp);
+				rtsb_bp->b_flags |= _XBF_LOGRECOVERY;
+				xfs_buf_delwri_queue(rtsb_bp, buffer_list);
+				xfs_buf_relse(rtsb_bp);
+			}
+		}
 	}
 
 out_release:
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index f39c5daeef86..979aba1b2fc8 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -25,6 +25,7 @@
 #include "xfs_dquot.h"
 #include "xfs_icache.h"
 #include "xfs_rtbitmap.h"
+#include "xfs_rtgroup.h"
 
 struct kmem_cache *xfs_trans_cache;
 
@@ -531,6 +532,7 @@ xfs_trans_apply_sb_deltas(
 {
 	struct xfs_dsb *sbp;
 	struct xfs_buf *bp;
+	bool update_rtsb = false;
 	int whole = 0;
 
 	bp = xfs_trans_getsb(tp);
@@ -591,22 +593,27 @@ xfs_trans_apply_sb_deltas(
 	if (tp->t_rextsize_delta) {
 		be32_add_cpu(&sbp->sb_rextsize, tp->t_rextsize_delta);
 		whole = 1;
+		update_rtsb = true;
 	}
 	if (tp->t_rbmblocks_delta) {
 		be32_add_cpu(&sbp->sb_rbmblocks, tp->t_rbmblocks_delta);
 		whole = 1;
+		update_rtsb = true;
 	}
 	if (tp->t_rblocks_delta) {
 		be64_add_cpu(&sbp->sb_rblocks, tp->t_rblocks_delta);
 		whole = 1;
+		update_rtsb = true;
 	}
 	if (tp->t_rextents_delta) {
 		be64_add_cpu(&sbp->sb_rextents, tp->t_rextents_delta);
 		whole = 1;
+		update_rtsb = true;
 	}
 	if (tp->t_rextslog_delta) {
 		sbp->sb_rextslog += tp->t_rextslog_delta;
 		whole = 1;
+		update_rtsb = true;
 	}
 
 	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
@@ -623,6 +630,9 @@ xfs_trans_apply_sb_deltas(
 		xfs_trans_log_buf(tp, bp, offsetof(struct xfs_dsb, sb_icount),
 				  offsetof(struct xfs_dsb, sb_frextents) +
 				  sizeof(sbp->sb_frextents) - 1);
+
+	if (update_rtsb)
+		xfs_rtgroup_log_super(tp, bp);
 }
 
 /*
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 0a9ec6929bbc..37cde68f3a31 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -216,6 +216,7 @@ xfs_trans_read_buf(
 }
 
 struct xfs_buf *xfs_trans_getsb(struct xfs_trans *);
+struct xfs_buf *xfs_trans_getrtsb(struct xfs_trans *tp);
 
 void xfs_trans_brelse(xfs_trans_t *, struct xfs_buf *);
 void xfs_trans_bjoin(xfs_trans_t *, struct xfs_buf *);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index e28ab74af4f0..8e886ecfd69a 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -168,12 +168,11 @@ xfs_trans_get_buf_map(
 /*
  * Get and lock the superblock buffer for the given transaction.
  */
-struct xfs_buf *
-xfs_trans_getsb(
-	struct xfs_trans *tp)
+static struct xfs_buf *
+__xfs_trans_getsb(
+	struct xfs_trans *tp,
+	struct xfs_buf *bp)
 {
-	struct xfs_buf *bp = tp->t_mountp->m_sb_bp;
-
 	/*
 	 * Just increment the lock recursion count if the buffer is already
 	 * attached to this transaction.
@@ -197,6 +196,22 @@ xfs_trans_getsb(
 	return bp;
 }
 
+struct xfs_buf *
+xfs_trans_getsb(
+	struct xfs_trans *tp)
+{
+	return __xfs_trans_getsb(tp, tp->t_mountp->m_sb_bp);
+}
+
+struct xfs_buf *
+xfs_trans_getrtsb(
+	struct xfs_trans *tp)
+{
+	if (!tp->t_mountp->m_rtsb_bp)
+		return NULL;
+	return __xfs_trans_getsb(tp, tp->t_mountp->m_rtsb_bp);
+}
+
 /*
  * Get and lock the buffer for the caller if it is not already
  * locked within the given transaction. If it has not yet been

* [PATCH 07/22] xfs: always update secondary rt supers when we update secondary fs supers 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 04/22] xfs: update primary realtime super every time we update the primary fs super Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 08/22] xfs: export realtime group geometry via XFS_FSOP_GEOM Darrick J. Wong ` (15 subsequent siblings) 21 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make sure that any update to the secondary superblocks in the data section is also echoed to the secondary superblocks in the realtime section. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_fsops.c | 4 ++++ fs/xfs/xfs_ioctl.c | 3 +++ 2 files changed, 7 insertions(+) diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 13851c0d640b..2da86f05e0e5 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -21,6 +21,7 @@ #include "xfs_ag.h" #include "xfs_ag_resv.h" #include "xfs_trace.h" +#include "xfs_rtgroup.h" /* * Write new AG headers to disk. 
Non-transactional, but need to be @@ -306,6 +307,9 @@ xfs_growfs_data( /* Update secondary superblocks now the physical grow has completed */ error = xfs_update_secondary_sbs(mp); + if (error) + goto out_error; + error = xfs_rtgroup_update_secondary_sbs(mp); out_error: /* diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index e3e6d377d958..46deb26b7cc5 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -39,6 +39,7 @@ #include "xfs_ioctl.h" #include "xfs_xattr.h" #include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" #include <linux/mount.h> #include <linux/namei.h> @@ -1719,6 +1720,8 @@ xfs_ioc_setlabel( */ mutex_lock(&mp->m_growlock); error = xfs_update_secondary_sbs(mp); + if (!error) + error = xfs_rtgroup_update_secondary_sbs(mp); mutex_unlock(&mp->m_growlock); invalidate_bdev(bdev); ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 08/22] xfs: export realtime group geometry via XFS_FSOP_GEOM 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 07/22] xfs: always update secondary rt supers when we update secondary fs supers Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 02/22] xfs: define the format of rt groups Darrick J. Wong ` (14 subsequent siblings) 21 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Export the realtime geometry information so that userspace can query it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_fs.h | 4 +++- fs/xfs/libxfs/xfs_sb.c | 5 +++++ 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index c4995f6557d2..ba90649c54e0 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -186,7 +186,9 @@ struct xfs_fsop_geom { __u32 logsunit; /* log stripe unit, bytes */ uint32_t sick; /* o: unhealthy fs & rt metadata */ uint32_t checked; /* o: checked fs & rt metadata */ - __u64 reserved[17]; /* reserved space */ + __u32 rgblocks; /* rtblocks in a realtime group */ + __u32 rgcount; /* number of realtime groups */ + __u64 reserved[16]; /* reserved space */ }; #define XFS_FSOP_GEOM_SICK_COUNTERS (1 << 0) /* summary counters */ diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 2e8ec9214c6c..db88f601e24b 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -1348,6 +1348,11 @@ xfs_fs_geometry( return; geo->version = XFS_FSOP_GEOM_VERSION_V5; + + if (xfs_has_rtgroups(mp)) { + geo->rgcount = sbp->sb_rgcount; + geo->rgblocks = sbp->sb_rgblocks; + } } /* Read a secondary superblock. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
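The geometry patch above takes one __u64 out of the reserved area of struct xfs_fsop_geom and uses it for two __u32 fields, so the overall size of the ioctl structure should stay the same. A minimal userspace sketch of that size argument follows. The two struct names are hypothetical stand-ins that model only the tail of the real structure, and the check assumes a C11 compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the tail of struct xfs_fsop_geom before the patch. */
struct geom_tail_old {
	uint64_t reserved[17];	/* reserved space */
};

/* Hypothetical model of the same tail after the patch. */
struct geom_tail_new {
	uint32_t rgblocks;	/* rtblocks in a realtime group */
	uint32_t rgcount;	/* number of realtime groups */
	uint64_t reserved[16];	/* reserved space */
};

/* Two 32-bit fields occupy exactly the one 64-bit slot that was given up. */
static_assert(sizeof(struct geom_tail_old) == sizeof(struct geom_tail_new),
	      "geometry tail changed size");
```

If the two new fields were ever widened or reordered so that padding appeared, the compile-time check would fail, which is the property an ioctl structure shared with userspace needs.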
* [PATCH 02/22] xfs: define the format of rt groups 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 08/22] xfs: export realtime group geometry via XFS_FSOP_GEOM Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 11/22] xfs: record rt group superblock errors in the health system Darrick J. Wong ` (13 subsequent siblings) 21 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Define the ondisk format of realtime group metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 62 +++++++++++++++++++++++++++- fs/xfs/libxfs/xfs_rtgroup.c | 96 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtgroup.h | 83 +++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_sb.c | 86 +++++++++++++++++++++++++++++++++++++-- fs/xfs/libxfs/xfs_shared.h | 1 fs/xfs/xfs_ondisk.h | 1 6 files changed, 324 insertions(+), 5 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index ca87a3f8704a..a38e1499bd4b 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -216,7 +216,17 @@ struct xfs_dsb { * pointers are no longer used. */ __be64 sb_rbmino; - __be64 sb_rsumino; /* summary inode for rt bitmap */ + /* + * rtgroups requires metadir, so we reuse the rsumino space to hold + * the rg block count and shift values. 
+ */ + union { + __be64 sb_rsumino; /* summary inode for rt bitmap */ + struct { + __be32 sb_rgcount; /* # of realtime groups */ + __be32 sb_rgblocks; /* rtblocks per group */ + }; + }; __be32 sb_rextsize; /* realtime extent size, blocks */ __be32 sb_agblocks; /* size of an allocation group */ __be32 sb_agcount; /* number of allocation groups */ @@ -397,6 +407,7 @@ xfs_sb_has_ro_compat_feature( #define XFS_SB_FEAT_INCOMPAT_BIGTIME (1 << 3) /* large timestamps */ #define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4) /* needs xfs_repair */ #define XFS_SB_FEAT_INCOMPAT_NREXT64 (1 << 5) /* large extent counters */ +#define XFS_SB_FEAT_INCOMPAT_RTGROUPS (1 << 30) /* realtime groups */ #define XFS_SB_FEAT_INCOMPAT_METADIR (1U << 31) /* metadata dir tree */ #define XFS_SB_FEAT_INCOMPAT_ALL \ (XFS_SB_FEAT_INCOMPAT_FTYPE| \ @@ -741,6 +752,55 @@ union xfs_suminfo_ondisk { __u32 raw; }; +/* + * Realtime allocation groups break the rt section into multiple pieces that + * could be locked independently. Realtime block group numbers are 32-bit + * quantities. Block numbers within a group are also 32-bit quantities, but + * the upper bit must never be set. + */ +#define XFS_MAX_RGBLOCKS ((xfs_rgblock_t)(1U << 31) - 1) +#define XFS_MAX_RGNUMBER ((xfs_rgnumber_t)(-1U)) + +#define XFS_RTSB_MAGIC 0x58524750 /* 'XRGP' */ + +/* + * Realtime superblock - on disk version. Must be padded to 64 bit alignment. + * The first block of each realtime group contains this superblock; this is + * how we avoid having file data extents cross a group boundary. 
+ */ +struct xfs_rtsb { + __be32 rsb_magicnum; /* magic number == XFS_RTSB_MAGIC */ + __be32 rsb_blocksize; /* logical block size, bytes */ + __be64 rsb_rblocks; /* number of realtime blocks */ + + __be64 rsb_rextents; /* number of realtime extents */ + __be64 rsb_lsn; /* last write sequence */ + + __be32 rsb_rgcount; /* # of realtime groups */ + char rsb_fname[XFSLABEL_MAX]; /* rt volume name */ + + uuid_t rsb_uuid; /* user-visible file system unique id */ + + __be32 rsb_rextsize; /* realtime extent size, blocks */ + __be32 rsb_rbmblocks; /* number of rt bitmap blocks */ + + __be32 rsb_rgblocks; /* rt blocks per group */ + __u8 rsb_blocklog; /* log2 of sb_blocksize */ + __u8 rsb_sectlog; /* log2 of sb_sectsize */ + __u8 rsb_rextslog; /* log2 of sb_rextents */ + __u8 rsb_pad; + + __le32 rsb_crc; /* superblock crc */ + __le32 rsb_pad2; + + uuid_t rsb_meta_uuid; /* metadata file system unique id */ + + /* must be padded to 64 bit alignment */ +}; + +#define XFS_RTSB_CRC_OFF offsetof(struct xfs_rtsb, rsb_crc) +#define XFS_RTSB_DADDR ((xfs_daddr_t)0) /* daddr in rt section */ + /* * XFS Timestamps * ============== diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index ced2bd896106..edbc427725c3 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -28,6 +28,7 @@ #include "xfs_trace.h" #include "xfs_inode.h" #include "xfs_icache.h" +#include "xfs_buf_item.h" #include "xfs_rtgroup.h" #include "xfs_rtbitmap.h" @@ -212,3 +213,98 @@ xfs_rtgroup_block_count( return __xfs_rtgroup_block_count(mp, rgno, mp->m_sb.sb_rgcount, mp->m_sb.sb_rblocks); } + +static xfs_failaddr_t +xfs_rtsb_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_mount; + struct xfs_rtsb *rsb = bp->b_addr; + + if (!xfs_verify_magic(bp, rsb->rsb_magicnum)) + return __this_address; + if (be32_to_cpu(rsb->rsb_blocksize) != mp->m_sb.sb_blocksize) + return __this_address; + if (be64_to_cpu(rsb->rsb_rblocks) != mp->m_sb.sb_rblocks) + return __this_address; 
+ + if (be64_to_cpu(rsb->rsb_rextents) != mp->m_sb.sb_rextents) + return __this_address; + + if (!uuid_equal(&rsb->rsb_uuid, &mp->m_sb.sb_uuid)) + return __this_address; + + if (be32_to_cpu(rsb->rsb_rgcount) != mp->m_sb.sb_rgcount) + return __this_address; + + if (be32_to_cpu(rsb->rsb_rextsize) != mp->m_sb.sb_rextsize) + return __this_address; + if (be32_to_cpu(rsb->rsb_rbmblocks) != mp->m_sb.sb_rbmblocks) + return __this_address; + + if (be32_to_cpu(rsb->rsb_rgblocks) != mp->m_sb.sb_rgblocks) + return __this_address; + if (rsb->rsb_blocklog != mp->m_sb.sb_blocklog) + return __this_address; + if (rsb->rsb_sectlog != mp->m_sb.sb_sectlog) + return __this_address; + if (rsb->rsb_rextslog != mp->m_sb.sb_rextslog) + return __this_address; + if (rsb->rsb_pad) + return __this_address; + + if (rsb->rsb_pad2) + return __this_address; + + if (!uuid_equal(&rsb->rsb_meta_uuid, &mp->m_sb.sb_meta_uuid)) + return __this_address; + + /* Everything to the end of the fs block must be zero */ + if (memchr_inv(rsb + 1, 0, BBTOB(bp->b_length) - sizeof(*rsb))) + return __this_address; + + return NULL; +} + +static void +xfs_rtsb_read_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + if (!xfs_buf_verify_cksum(bp, XFS_RTSB_CRC_OFF)) + xfs_verifier_error(bp, -EFSBADCRC, __this_address); + else { + fa = xfs_rtsb_verify(bp); + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + } +} + +static void +xfs_rtsb_write_verify( + struct xfs_buf *bp) +{ + struct xfs_rtsb *rsb = bp->b_addr; + struct xfs_buf_log_item *bip = bp->b_log_item; + xfs_failaddr_t fa; + + fa = xfs_rtsb_verify(bp); + if (fa) { + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + return; + } + + if (bip) + rsb->rsb_lsn = cpu_to_be64(bip->bli_item.li_lsn); + + xfs_buf_update_cksum(bp, XFS_RTSB_CRC_OFF); +} + +const struct xfs_buf_ops xfs_rtsb_buf_ops = { + .name = "xfs_rtsb", + .magic = { 0, cpu_to_be32(XFS_RTSB_MAGIC) }, + .verify_read = xfs_rtsb_read_verify, + .verify_write = xfs_rtsb_write_verify, + .verify_struct = 
xfs_rtsb_verify, +}; diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index f414218a66f2..ff9b01d8c501 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -111,6 +111,89 @@ xfs_verify_rgbext( return xfs_verify_rgbno(rtg, rgbno + len - 1); } +static inline xfs_rtblock_t +xfs_rgbno_to_rtb( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + xfs_rgblock_t rgbno) +{ + ASSERT(xfs_has_rtgroups(mp)); + + if (mp->m_rgblklog >= 0) + return ((xfs_rtblock_t)rgno << mp->m_rgblklog) | rgbno; + + return ((xfs_rtblock_t)rgno * mp->m_sb.sb_rgblocks) + rgbno; +} + +static inline xfs_rgnumber_t +xfs_rtb_to_rgno( + struct xfs_mount *mp, + xfs_rtblock_t rtbno) +{ + ASSERT(xfs_has_rtgroups(mp)); + + if (mp->m_rgblklog >= 0) + return rtbno >> mp->m_rgblklog; + + return div_u64(rtbno, mp->m_sb.sb_rgblocks); +} + +static inline xfs_rgblock_t +xfs_rtb_to_rgbno( + struct xfs_mount *mp, + xfs_rtblock_t rtbno, + xfs_rgnumber_t *rgno) +{ + uint32_t rem; + + ASSERT(xfs_has_rtgroups(mp)); + + if (mp->m_rgblklog >= 0) { + *rgno = rtbno >> mp->m_rgblklog; + return rtbno & mp->m_rgblkmask; + } + + *rgno = div_u64_rem(rtbno, mp->m_sb.sb_rgblocks, &rem); + return rem; +} + +static inline xfs_daddr_t +xfs_rtb_to_daddr( + struct xfs_mount *mp, + xfs_rtblock_t rtbno) +{ + return rtbno << mp->m_blkbb_log; +} + +static inline xfs_rtblock_t +xfs_daddr_to_rtb( + struct xfs_mount *mp, + xfs_daddr_t daddr) +{ + return daddr >> mp->m_blkbb_log; +} + +static inline xfs_rgnumber_t +xfs_daddr_to_rgno( + struct xfs_mount *mp, + xfs_daddr_t daddr) +{ + xfs_rtblock_t rtb = daddr >> mp->m_blkbb_log; + + return xfs_rtb_to_rgno(mp, rtb); +} + +static inline xfs_rgblock_t +xfs_daddr_to_rgbno( + struct xfs_mount *mp, + xfs_daddr_t daddr) +{ + xfs_rtblock_t rtb = daddr >> mp->m_blkbb_log; + xfs_rgnumber_t rgno; + + return xfs_rtb_to_rgbno(mp, rtb, &rgno); +} + #ifdef CONFIG_XFS_RT xfs_rgblock_t xfs_rtgroup_block_count(struct xfs_mount *mp, xfs_rgnumber_t rgno); diff --git 
a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 48cfb9c8296b..bbadf78b4628 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -176,6 +176,8 @@ xfs_sb_version_to_features( features |= XFS_FEAT_NREXT64; if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) features |= XFS_FEAT_METADIR; + if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_RTGROUPS) + features |= XFS_FEAT_RTGROUPS; return features; } @@ -302,6 +304,64 @@ xfs_validate_sb_write( return 0; } +static int +xfs_validate_sb_rtgroups( + struct xfs_mount *mp, + struct xfs_sb *sbp) +{ + uint64_t groups; + + if (!(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)) { + xfs_warn(mp, +"Realtime groups require metadata directory tree."); + return -EINVAL; + } + + if (sbp->sb_rgblocks > XFS_MAX_RGBLOCKS) { + xfs_warn(mp, +"Realtime group size (%u) must be less than %u.", + sbp->sb_rgblocks, XFS_MAX_RGBLOCKS); + return -EINVAL; + } + + if (sbp->sb_rextsize == 0) { + xfs_warn(mp, +"Realtime extent size must not be zero."); + return -EINVAL; + } + + if (sbp->sb_rgblocks % sbp->sb_rextsize != 0) { + xfs_warn(mp, +"Realtime group size (%u) must be an even multiple of extent size (%u).", + sbp->sb_rgblocks, sbp->sb_rextsize); + return -EINVAL; + } + + if (sbp->sb_rgblocks < (sbp->sb_rextsize << 1)) { + xfs_warn(mp, +"Realtime group size (%u) must be greater than 1 rt extent.", + sbp->sb_rgblocks); + return -EINVAL; + } + + if (sbp->sb_rgcount > XFS_MAX_RGNUMBER) { + xfs_warn(mp, +"Realtime groups (%u) must be less than %u.", + sbp->sb_rgcount, XFS_MAX_RGNUMBER); + return -EINVAL; + } + + groups = howmany_64(sbp->sb_rblocks, sbp->sb_rgblocks); + if (groups != sbp->sb_rgcount) { + xfs_warn(mp, +"Realtime groups (%u) do not cover the entire rt section; need (%llu) groups.", + sbp->sb_rgcount, groups); + return -EINVAL; + } + + return 0; +} + /* Check the validity of the SB. 
*/ STATIC int xfs_validate_sb_common( @@ -313,6 +373,7 @@ xfs_validate_sb_common( uint32_t agcount = 0; uint32_t rem; bool has_dalign; + int error; if (!xfs_verify_magic(bp, dsb->sb_magicnum)) { xfs_warn(mp, @@ -362,6 +423,12 @@ xfs_validate_sb_common( return -EINVAL; } } + + if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_RTGROUPS) { + error = xfs_validate_sb_rtgroups(mp, sbp); + if (error) + return error; + } } else if (sbp->sb_qflags & (XFS_PQUOTA_ENFD | XFS_GQUOTA_ENFD | XFS_PQUOTA_CHKD | XFS_GQUOTA_CHKD)) { xfs_notice(mp, @@ -642,8 +709,13 @@ __xfs_sb_from_disk( to->sb_pquotino = NULLFSINO; } - to->sb_rgcount = 0; - to->sb_rgblocks = 0; + if (to->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_RTGROUPS) { + to->sb_rgcount = be32_to_cpu(from->sb_rgcount); + to->sb_rgblocks = be32_to_cpu(from->sb_rgblocks); + } else { + to->sb_rgcount = 0; + to->sb_rgblocks = 0; + } } void @@ -803,6 +875,12 @@ xfs_sb_to_disk( to->sb_gquotino = cpu_to_be64(NULLFSINO); to->sb_pquotino = cpu_to_be64(NULLFSINO); } + + if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_RTGROUPS) { + /* must come after setting to_rsumino */ + to->sb_rgcount = cpu_to_be32(from->sb_rgcount); + to->sb_rgblocks = cpu_to_be32(from->sb_rgblocks); + } } /* @@ -957,8 +1035,8 @@ xfs_sb_mount_common( mp->m_blockwmask = mp->m_blockwsize - 1; mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize); mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize); - mp->m_rgblklog = 0; - mp->m_rgblkmask = 0; + mp->m_rgblklog = log2_if_power2(sbp->sb_rgblocks); + mp->m_rgblkmask = mask64_if_power2(sbp->sb_rgblocks); mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 1); mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 0); diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h index 46754fe57361..e76e735b1d05 100644 --- a/fs/xfs/libxfs/xfs_shared.h +++ b/fs/xfs/libxfs/xfs_shared.h @@ -39,6 +39,7 @@ extern const struct xfs_buf_ops xfs_inode_buf_ra_ops; extern const struct xfs_buf_ops 
xfs_refcountbt_buf_ops; extern const struct xfs_buf_ops xfs_rmapbt_buf_ops; extern const struct xfs_buf_ops xfs_rtbuf_ops; +extern const struct xfs_buf_ops xfs_rtsb_buf_ops; extern const struct xfs_buf_ops xfs_sb_buf_ops; extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops; extern const struct xfs_buf_ops xfs_symlink_buf_ops; diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h index f4d700ce185c..61909355731d 100644 --- a/fs/xfs/xfs_ondisk.h +++ b/fs/xfs/xfs_ondisk.h @@ -53,6 +53,7 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(xfs_inobt_ptr_t, 4); XFS_CHECK_STRUCT_SIZE(xfs_refcount_ptr_t, 4); XFS_CHECK_STRUCT_SIZE(xfs_rmap_ptr_t, 4); + XFS_CHECK_STRUCT_SIZE(struct xfs_rtsb, 104); /* dir/attr trees */ XFS_CHECK_STRUCT_SIZE(struct xfs_attr3_leaf_hdr, 80); ^ permalink raw reply related [flat|nested] 565+ messages in thread
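The conversion helpers in the patch above (xfs_rgbno_to_rtb, xfs_rtb_to_rgbno) take a shift-and-mask fast path when the group size is a power of two and fall back to multiply and divide otherwise. A minimal userspace sketch of that idea follows. struct rt_geom and the rt_geom_init, rgbno_to_rtb and rtb_to_rgbno names are hypothetical stand-ins for the mount fields and kernel helpers, and the sketch assumes that log2_if_power2() yields -1 for a size that is not a power of two, which is what the "m_rgblklog >= 0" tests in the patch suggest.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the xfs_mount fields the helpers consult. */
struct rt_geom {
	uint32_t rgblocks;	/* blocks per rt group, like sb_rgblocks */
	int	 rgblklog;	/* log2(rgblocks), or -1 if not a power of 2 */
	uint64_t rgblkmask;	/* rgblocks - 1, or 0 if not a power of 2 */
};

/* Precompute the shift and mask, as xfs_sb_mount_common does at mount. */
static void rt_geom_init(struct rt_geom *g, uint32_t rgblocks)
{
	assert(rgblocks != 0);

	g->rgblocks = rgblocks;
	g->rgblklog = -1;
	g->rgblkmask = 0;

	if ((rgblocks & (rgblocks - 1)) == 0) {
		int log = 0;

		while ((1u << log) != rgblocks)
			log++;
		g->rgblklog = log;
		g->rgblkmask = (uint64_t)rgblocks - 1;
	}
}

/* (group number, block within group) -> block number in the rt volume */
static uint64_t rgbno_to_rtb(const struct rt_geom *g, uint32_t rgno,
			     uint32_t rgbno)
{
	if (g->rgblklog >= 0)
		return ((uint64_t)rgno << g->rgblklog) | rgbno;
	return (uint64_t)rgno * g->rgblocks + rgbno;
}

/* rt volume block number -> block within group; group returned via rgno */
static uint32_t rtb_to_rgbno(const struct rt_geom *g, uint64_t rtbno,
			     uint32_t *rgno)
{
	if (g->rgblklog >= 0) {
		*rgno = (uint32_t)(rtbno >> g->rgblklog);
		return (uint32_t)(rtbno & g->rgblkmask);
	}
	*rgno = (uint32_t)(rtbno / g->rgblocks);
	return (uint32_t)(rtbno % g->rgblocks);
}
```

With 1024-block groups, block 5 of group 3 maps to rt block 3077 through the shift path. With 1000-block groups the same pair maps to 3005 through the multiply path, and both round-trip back to (3, 5).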
* [PATCH 11/22] xfs: record rt group superblock errors in the health system 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 02/22] xfs: define the format of rt groups Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 09/22] xfs: check that rtblock extents do not overlap with the rt group metadata Darrick J. Wong ` (12 subsequent siblings) 21 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Record the state of per-rtgroup metadata sickness in the rtgroup structure for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_health.h | 28 ++++++++++++++ fs/xfs/libxfs/xfs_rtgroup.h | 8 ++++ fs/xfs/scrub/health.c | 24 ++++++++++++ fs/xfs/xfs_health.c | 86 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_trace.c | 1 + fs/xfs/xfs_trace.h | 26 +++++++++++++ 6 files changed, 172 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h index 99d53bae9c13..0beb4153a43e 100644 --- a/fs/xfs/libxfs/xfs_health.h +++ b/fs/xfs/libxfs/xfs_health.h @@ -52,6 +52,7 @@ struct xfs_inode; struct xfs_fsop_geom; struct xfs_btree_cur; struct xfs_da_args; +struct xfs_rtgroup; /* Observable health issues for metadata spanning the entire filesystem. */ #define XFS_SICK_FS_COUNTERS (1 << 0) /* summary counters */ @@ -65,6 +66,7 @@ struct xfs_da_args; /* Observable health issues for realtime volume metadata. */ #define XFS_SICK_RT_BITMAP (1 << 0) /* realtime bitmap */ #define XFS_SICK_RT_SUMMARY (1 << 1) /* realtime summary */ +#define XFS_SICK_RT_SUPER (1 << 2) /* rt group superblock */ /* Observable health issues for AG metadata. 
*/ #define XFS_SICK_AG_SB (1 << 0) /* superblock */ @@ -101,7 +103,8 @@ struct xfs_da_args; XFS_SICK_FS_METADIR) #define XFS_SICK_RT_PRIMARY (XFS_SICK_RT_BITMAP | \ - XFS_SICK_RT_SUMMARY) + XFS_SICK_RT_SUMMARY | \ + XFS_SICK_RT_SUPER) #define XFS_SICK_AG_PRIMARY (XFS_SICK_AG_SB | \ XFS_SICK_AG_AGF | \ @@ -176,6 +179,14 @@ void xfs_rt_mark_healthy(struct xfs_mount *mp, unsigned int mask); void xfs_rt_measure_sickness(struct xfs_mount *mp, unsigned int *sick, unsigned int *checked); +void xfs_rgno_mark_sick(struct xfs_mount *mp, xfs_rgnumber_t rgno, + unsigned int mask); +void xfs_rtgroup_mark_sick(struct xfs_rtgroup *rtg, unsigned int mask); +void xfs_rtgroup_mark_checked(struct xfs_rtgroup *rtg, unsigned int mask); +void xfs_rtgroup_mark_healthy(struct xfs_rtgroup *rtg, unsigned int mask); +void xfs_rtgroup_measure_sickness(struct xfs_rtgroup *rtg, unsigned int *sick, + unsigned int *checked); + void xfs_agno_mark_sick(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int mask); void xfs_ag_mark_sick(struct xfs_perag *pag, unsigned int mask); @@ -225,6 +236,15 @@ xfs_ag_has_sickness(struct xfs_perag *pag, unsigned int mask) return sick & mask; } +static inline bool +xfs_rtgroup_has_sickness(struct xfs_rtgroup *rtg, unsigned int mask) +{ + unsigned int sick, checked; + + xfs_rtgroup_measure_sickness(rtg, &sick, &checked); + return sick & mask; +} + static inline bool xfs_inode_has_sickness(struct xfs_inode *ip, unsigned int mask) { @@ -246,6 +266,12 @@ xfs_rt_is_healthy(struct xfs_mount *mp) return !xfs_rt_has_sickness(mp, -1U); } +static inline bool +xfs_rtgroup_is_healthy(struct xfs_rtgroup *rtg) +{ + return !xfs_rtgroup_has_sickness(rtg, -1U); +} + static inline bool xfs_ag_is_healthy(struct xfs_perag *pag) { diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index d8723fabeb57..0e664e2436b0 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -23,6 +23,14 @@ struct xfs_rtgroup { /* Number of blocks in this group */ 
xfs_rgblock_t rtg_blockcount; + /* + * Bitsets of per-rtgroup metadata that have been checked and/or are + * sick. Callers should hold rtg_state_lock before accessing this + * field. + */ + uint16_t rtg_checked; + uint16_t rtg_sick; + #ifdef __KERNEL__ /* -- kernel only structures below this line -- */ spinlock_t rtg_state_lock; diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c index cdf059f47656..9a8d4c348cc9 100644 --- a/fs/xfs/scrub/health.c +++ b/fs/xfs/scrub/health.c @@ -14,6 +14,7 @@ #include "xfs_mount.h" #include "xfs_ag.h" #include "xfs_health.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/health.h" #include "scrub/common.h" @@ -76,6 +77,7 @@ enum xchk_health_group { XHG_RT, XHG_AG, XHG_INO, + XHG_RTGROUP, }; struct xchk_health_map { @@ -130,12 +132,16 @@ xchk_mark_all_healthy( struct xfs_mount *mp) { struct xfs_perag *pag; + struct xfs_rtgroup *rtg; xfs_agnumber_t agno; + xfs_rgnumber_t rgno; xfs_fs_mark_healthy(mp, XFS_SICK_FS_INDIRECT); xfs_rt_mark_healthy(mp, XFS_SICK_RT_INDIRECT); for_each_perag(mp, agno, pag) xfs_ag_mark_healthy(pag, XFS_SICK_AG_INDIRECT); + for_each_rtgroup(mp, rgno, rtg) + xfs_rtgroup_mark_healthy(rtg, XFS_SICK_RT_INDIRECT); } /* @@ -153,6 +159,7 @@ xchk_update_health( struct xfs_scrub *sc) { struct xfs_perag *pag; + struct xfs_rtgroup *rtg; bool bad; /* @@ -215,6 +222,15 @@ xchk_update_health( } else xfs_rt_mark_healthy(sc->mp, sc->sick_mask); break; + case XHG_RTGROUP: + rtg = xfs_rtgroup_get(sc->mp, sc->sm->sm_agno); + if (bad) { + xfs_rtgroup_mark_sick(rtg, sc->sick_mask); + xfs_rtgroup_mark_checked(rtg, sc->sick_mask); + } else + xfs_rtgroup_mark_healthy(rtg, sc->sick_mask); + xfs_rtgroup_put(rtg); + break; default: ASSERT(0); break; @@ -302,7 +318,9 @@ xchk_health_record( { struct xfs_mount *mp = sc->mp; struct xfs_perag *pag; + struct xfs_rtgroup *rtg; xfs_agnumber_t agno; + xfs_rgnumber_t rgno; unsigned int sick; unsigned int checked; @@ -321,5 +339,11 @@ xchk_health_record( 
xchk_set_corrupt(sc); } + for_each_rtgroup(mp, rgno, rtg) { + xfs_rtgroup_measure_sickness(rtg, &sick, &checked); + if (sick & XFS_SICK_RT_PRIMARY) + xchk_set_corrupt(sc); + } + return 0; } diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c index 61f7a6aca6b1..fe05b565427f 100644 --- a/fs/xfs/xfs_health.c +++ b/fs/xfs/xfs_health.c @@ -18,6 +18,7 @@ #include "xfs_da_format.h" #include "xfs_da_btree.h" #include "xfs_quota_defs.h" +#include "xfs_rtgroup.h" /* * Warn about metadata corruption that we detected but haven't fixed, and @@ -29,7 +30,9 @@ xfs_health_unmount( struct xfs_mount *mp) { struct xfs_perag *pag; + struct xfs_rtgroup *rtg; xfs_agnumber_t agno; + xfs_rgnumber_t rgno; unsigned int sick = 0; unsigned int checked = 0; bool warn = false; @@ -46,6 +49,15 @@ xfs_health_unmount( } } + /* Measure realtime group corruption levels. */ + for_each_rtgroup(mp, rgno, rtg) { + xfs_rtgroup_measure_sickness(rtg, &sick, &checked); + if (sick) { + trace_xfs_rtgroup_unfixed_corruption(rtg, sick); + warn = true; + } + } + /* Measure realtime volume corruption levels. */ xfs_rt_measure_sickness(mp, &sick, &checked); if (sick) { @@ -280,6 +292,80 @@ xfs_ag_measure_sickness( spin_unlock(&pag->pag_state_lock); } +/* Mark unhealthy per-rtgroup metadata given a raw rt group number. */ +void +xfs_rgno_mark_sick( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + unsigned int mask) +{ + struct xfs_rtgroup *rtg = xfs_rtgroup_get(mp, rgno); + + /* per-rtgroup structure not set up yet? */ + if (!rtg) + return; + + xfs_rtgroup_mark_sick(rtg, mask); + xfs_rtgroup_put(rtg); +} + +/* Mark unhealthy per-rtgroup metadata. */ +void +xfs_rtgroup_mark_sick( + struct xfs_rtgroup *rtg, + unsigned int mask) +{ + ASSERT(!(mask & ~XFS_SICK_RT_ALL)); + trace_xfs_rtgroup_mark_sick(rtg, mask); + + spin_lock(&rtg->rtg_state_lock); + rtg->rtg_sick |= mask; + spin_unlock(&rtg->rtg_state_lock); +} + +/* Mark per-rtgroup metadata as having been checked. 
*/ +void +xfs_rtgroup_mark_checked( + struct xfs_rtgroup *rtg, + unsigned int mask) +{ + ASSERT(!(mask & ~XFS_SICK_RT_PRIMARY)); + + spin_lock(&rtg->rtg_state_lock); + rtg->rtg_checked |= mask; + spin_unlock(&rtg->rtg_state_lock); +} + +/* Mark per-rtgroup metadata ok. */ +void +xfs_rtgroup_mark_healthy( + struct xfs_rtgroup *rtg, + unsigned int mask) +{ + ASSERT(!(mask & ~XFS_SICK_RT_ALL)); + trace_xfs_rtgroup_mark_healthy(rtg, mask); + + spin_lock(&rtg->rtg_state_lock); + rtg->rtg_sick &= ~mask; + if (!(rtg->rtg_sick & XFS_SICK_RT_PRIMARY)) + rtg->rtg_sick &= ~XFS_SICK_RT_SECONDARY; + rtg->rtg_checked |= mask; + spin_unlock(&rtg->rtg_state_lock); +} + +/* Sample which per-rtgroup metadata are unhealthy. */ +void +xfs_rtgroup_measure_sickness( + struct xfs_rtgroup *rtg, + unsigned int *sick, + unsigned int *checked) +{ + spin_lock(&rtg->rtg_state_lock); + *sick = rtg->rtg_sick; + *checked = rtg->rtg_checked; + spin_unlock(&rtg->rtg_state_lock); +} + /* Mark the unhealthy parts of an inode. 
*/ void xfs_inode_mark_sick( diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index e38814f4380c..36109a57fca6 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -42,6 +42,7 @@ #include "xfs_bmap.h" #include "xfs_swapext.h" #include "xfs_xchgrange.h" +#include "xfs_rtgroup.h" /* * We include this last to have the helpers above available for the trace diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index f72f694b4656..ec9aa1914a93 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -82,6 +82,7 @@ struct xfs_icwalk; struct xfs_bmap_intent; struct xfs_swapext_intent; struct xfs_swapext_req; +struct xfs_rtgroup; #define XFS_ATTR_FILTER_FLAGS \ { XFS_ATTR_ROOT, "ROOT" }, \ @@ -4233,6 +4234,31 @@ DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_sick); DEFINE_AG_CORRUPT_EVENT(xfs_ag_mark_healthy); DEFINE_AG_CORRUPT_EVENT(xfs_ag_unfixed_corruption); +DECLARE_EVENT_CLASS(xfs_rtgroup_corrupt_class, + TP_PROTO(struct xfs_rtgroup *rtg, unsigned int flags), + TP_ARGS(rtg, flags), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_rgnumber_t, rgno) + __field(unsigned int, flags) + ), + TP_fast_assign( + __entry->dev = rtg->rtg_mount->m_super->s_dev; + __entry->rgno = rtg->rtg_rgno; + __entry->flags = flags; + ), + TP_printk("dev %d:%d rgno 0x%x flags 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->rgno, __entry->flags) +); +#define DEFINE_RTGROUP_CORRUPT_EVENT(name) \ +DEFINE_EVENT(xfs_rtgroup_corrupt_class, name, \ + TP_PROTO(struct xfs_rtgroup *rtg, unsigned int flags), \ + TP_ARGS(rtg, flags)) +DEFINE_RTGROUP_CORRUPT_EVENT(xfs_rtgroup_mark_sick); +DEFINE_RTGROUP_CORRUPT_EVENT(xfs_rtgroup_mark_healthy); +DEFINE_RTGROUP_CORRUPT_EVENT(xfs_rtgroup_unfixed_corruption); + DECLARE_EVENT_CLASS(xfs_inode_corrupt_class, TP_PROTO(struct xfs_inode *ip, unsigned int flags), TP_ARGS(ip, flags), ^ permalink raw reply related [flat|nested] 565+ messages in thread
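The per-rtgroup health state in the patch above boils down to two bitmasks, "sick" and "checked", updated under rtg_state_lock. A minimal single-threaded sketch of the state transitions follows. The SICK_RT_* values mirror the XFS_SICK_RT_* bits in the patch, while struct rtg_health and the rtg_* function names are hypothetical. The spinlock, the tracepoints and the secondary-flag handling in xfs_rtgroup_mark_healthy are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SICK_RT_BITMAP	(1u << 0)	/* realtime bitmap */
#define SICK_RT_SUMMARY	(1u << 1)	/* realtime summary */
#define SICK_RT_SUPER	(1u << 2)	/* rt group superblock */
#define SICK_RT_PRIMARY	(SICK_RT_BITMAP | SICK_RT_SUMMARY | SICK_RT_SUPER)

/* Hypothetical stand-in for the rtg_checked/rtg_sick pair in xfs_rtgroup. */
struct rtg_health {
	uint16_t checked;
	uint16_t sick;
};

static void rtg_mark_sick(struct rtg_health *h, unsigned int mask)
{
	assert(!(mask & ~SICK_RT_PRIMARY));
	h->sick |= mask;
}

static void rtg_mark_checked(struct rtg_health *h, unsigned int mask)
{
	assert(!(mask & ~SICK_RT_PRIMARY));
	h->checked |= mask;
}

/* Clearing sickness also records that the metadata was looked at. */
static void rtg_mark_healthy(struct rtg_health *h, unsigned int mask)
{
	assert(!(mask & ~SICK_RT_PRIMARY));
	h->sick &= ~mask;
	h->checked |= mask;
}

/* Same branch structure as the XHG_RTGROUP case in xchk_update_health. */
static void rtg_update_health(struct rtg_health *h, unsigned int mask,
			      bool bad)
{
	if (bad) {
		rtg_mark_sick(h, mask);
		rtg_mark_checked(h, mask);
	} else {
		rtg_mark_healthy(h, mask);
	}
}
```

A scrub run that finds a bad rt superblock sets the SUPER bit in both masks. A later clean run clears it from "sick" and leaves it in "checked", so a reader can tell "verified good" apart from "never examined".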
* [PATCH 09/22] xfs: check that rtblock extents do not overlap with the rt group metadata
  2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong
                   ` (8 preceding siblings ...)
  2022-12-30 22:17   ` [PATCH 11/22] xfs: record rt group superblock errors in the health system Darrick J. Wong
@ 2022-12-30 22:17   ` Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 14/22] xfs: add block headers to realtime bitmap blocks Darrick J. Wong
                   ` (11 subsequent siblings)
  21 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The ondisk format specifies that the start of each realtime group must
have a superblock so that rt space mappings never cross an rtgroup
boundary.  Check that rt block pointers obey this.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_types.c |   46 +++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 44 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_types.c b/fs/xfs/libxfs/xfs_types.c
index b1fa715e5f39..34d02b2bfdd1 100644
--- a/fs/xfs/libxfs/xfs_types.c
+++ b/fs/xfs/libxfs/xfs_types.c
@@ -13,6 +13,8 @@
 #include "xfs_mount.h"
 #include "xfs_ag.h"
 #include "xfs_imeta.h"
+#include "xfs_rtbitmap.h"
+#include "xfs_rtgroup.h"
 
 
 /*
@@ -133,6 +135,26 @@ xfs_verify_dir_ino(
 	return xfs_verify_ino(mp, ino);
 }
 
+/*
+ * Verify that an rtgroup block number pointer neither points outside the
+ * rtgroup nor points at static metadata.
+ */
+static inline bool
+xfs_verify_rgno_rgbno(
+	struct xfs_mount	*mp,
+	xfs_rgnumber_t		rgno,
+	xfs_rgblock_t		rgbno)
+{
+	xfs_rgblock_t		eorg;
+
+	eorg = xfs_rtgroup_block_count(mp, rgno);
+	if (rgbno >= eorg)
+		return false;
+	if (rgbno < mp->m_sb.sb_rextsize)
+		return false;
+	return true;
+}
+
 /*
  * Verify that an realtime block number pointer doesn't point off the
  * end of the realtime device.
@@ -142,7 +164,20 @@ xfs_verify_rtbno(
 	struct xfs_mount	*mp,
 	xfs_rtblock_t		rtbno)
 {
-	return rtbno < mp->m_sb.sb_rblocks;
+	xfs_rgnumber_t		rgno;
+	xfs_rgblock_t		rgbno;
+
+	if (rtbno >= mp->m_sb.sb_rblocks)
+		return false;
+
+	if (!xfs_has_rtgroups(mp))
+		return true;
+
+	rgbno = xfs_rtb_to_rgbno(mp, rtbno, &rgno);
+	if (rgno >= mp->m_sb.sb_rgcount)
+		return false;
+
+	return xfs_verify_rgno_rgbno(mp, rgno, rgbno);
 }
 
 /* Verify that a realtime device extent is fully contained inside the volume. */
@@ -158,7 +193,14 @@ xfs_verify_rtbext(
 	if (!xfs_verify_rtbno(mp, rtbno))
 		return false;
 
-	return xfs_verify_rtbno(mp, rtbno + len - 1);
+	if (!xfs_verify_rtbno(mp, rtbno + len - 1))
+		return false;
+
+	if (xfs_has_rtgroups(mp) &&
+	    xfs_rtb_to_rgno(mp, rtbno) != xfs_rtb_to_rgno(mp, rtbno + len - 1))
+		return false;
+
+	return true;
 }
 
 /* Calculate the range of valid icount values. */

^ permalink raw reply related	[flat|nested] 565+ messages in thread
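The checks in the patch above can be tried out in isolation. What follows is only a rough userspace sketch, not the kernel code: struct rt_geom and every function name are made up for illustration, the group number is derived by plain division (rgblocks is assumed to be nonzero), and the rule that the last group may be shorter than the others is an assumption, since xfs_rtgroup_block_count() is not shown in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified geometry; this is not struct xfs_mount. */
struct rt_geom {
	uint64_t	rblocks;	/* total blocks on the rt volume */
	uint32_t	rgblocks;	/* blocks per rt group, nonzero */
	uint32_t	rgcount;	/* number of rt groups */
	uint32_t	rextsize;	/* blocks per rt extent */
};

/* Assumption: only the last group may be shorter than rgblocks. */
static uint64_t rg_block_count(const struct rt_geom *g, uint64_t rgno)
{
	assert(rgno < g->rgcount);
	if (rgno == g->rgcount - 1)
		return g->rblocks - rgno * g->rgblocks;
	return g->rgblocks;
}

/* A block is valid if it is inside its group and past the group superblock. */
static bool verify_rtbno(const struct rt_geom *g, uint64_t rtbno)
{
	uint64_t	rgno, rgbno;

	if (rtbno >= g->rblocks)
		return false;

	rgno = rtbno / g->rgblocks;
	rgbno = rtbno % g->rgblocks;
	if (rgno >= g->rgcount)
		return false;
	if (rgbno >= rg_block_count(g, rgno))
		return false;

	/* The first rt extent of every group holds the group superblock. */
	return rgbno >= g->rextsize;
}

/* An extent is valid if both ends are valid and lie in the same group. */
static bool verify_rtbext(const struct rt_geom *g, uint64_t rtbno, uint64_t len)
{
	uint64_t	last;

	if (len == 0 || rtbno + len - 1 < rtbno)
		return false;	/* empty extent or arithmetic overflow */

	last = rtbno + len - 1;
	if (!verify_rtbno(g, rtbno) || !verify_rtbno(g, last))
		return false;
	return rtbno / g->rgblocks == last / g->rgblocks;
}
```

With rgblocks = 1000 and rextsize = 4, block 1000 is rejected because it is the first block of group 1, and an extent that starts at block 996 with length 12 is rejected because it would straddle groups 0 and 1, even though both of its end blocks are individually valid.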
* [PATCH 14/22] xfs: add block headers to realtime bitmap blocks
  2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong
                   ` (9 preceding siblings ...)
  2022-12-30 22:17   ` [PATCH 09/22] xfs: check that rtblock extents do not overlap with the rt group metadata Darrick J. Wong
@ 2022-12-30 22:17   ` Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 13/22] xfs: export the geometry of realtime groups to userspace Darrick J. Wong
                   ` (10 subsequent siblings)
  21 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Upgrade rtbitmap blocks to have self describing metadata like most every
other thing in XFS.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_format.h    |   14 ++++++
 fs/xfs/libxfs/xfs_rtbitmap.c  |  100 +++++++++++++++++++++++++++++++++++++----
 fs/xfs/libxfs/xfs_rtbitmap.h  |   30 ++++++++++++
 fs/xfs/libxfs/xfs_sb.c        |   18 ++++++-
 fs/xfs/libxfs/xfs_shared.h    |    1 
 fs/xfs/xfs_buf_item_recover.c |   20 +++++++-
 fs/xfs/xfs_mount.h            |    3 +
 fs/xfs/xfs_ondisk.h           |    1 
 fs/xfs/xfs_rtalloc.c          |   60 ++++++++++++++++++-------
 9 files changed, 213 insertions(+), 34 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index a38e1499bd4b..4096d3f069a3 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1284,6 +1284,20 @@ static inline bool xfs_dinode_has_large_extent_counts(
 /*
  * RT bit manipulation macros.
  */
+#define XFS_RTBITMAP_MAGIC	0x424D505A	/* BMPZ */
+
+struct xfs_rtbuf_blkinfo {
+	__be32		rt_magic;	/* validity check on block */
+	__be32		rt_crc;		/* CRC of block */
+	__be64		rt_owner;	/* inode that owns the block */
+	__be64		rt_blkno;	/* first block of the buffer */
+	__be64		rt_lsn;		/* sequence number of last write */
+	uuid_t		rt_uuid;	/* filesystem we belong to */
+};
+
+#define XFS_RTBUF_CRC_OFF \
+	offsetof(struct xfs_rtbuf_blkinfo, rt_crc)
+
 #define	XFS_RTMIN(a,b)	((a) < (b) ? (a) : (b))
 #define	XFS_RTMAX(a,b)	((a) > (b) ? (a) : (b))
 
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c
index 4237b5703a64..05b0e4e92a0a 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.c
+++ b/fs/xfs/libxfs/xfs_rtbitmap.c
@@ -18,28 +18,84 @@
 #include "xfs_error.h"
 #include "xfs_health.h"
 #include "xfs_rtbitmap.h"
+#include "xfs_log.h"
+#include "xfs_buf_item.h"
 
 /*
  * Realtime allocator bitmap functions shared with userspace.
  */
 
-/*
- * Real time buffers need verifiers to avoid runtime warnings during IO.
- * We don't have anything to verify, however, so these are just dummy
- * operations.
- */
+static xfs_failaddr_t
+xfs_rtbuf_verify(
+	struct xfs_buf			*bp)
+{
+	struct xfs_mount		*mp = bp->b_mount;
+	struct xfs_rtbuf_blkinfo	*hdr = bp->b_addr;
+
+	if (!xfs_verify_magic(bp, hdr->rt_magic))
+		return __this_address;
+	if (!xfs_has_rtgroups(mp))
+		return __this_address;
+	if (!xfs_has_crc(mp))
+		return __this_address;
+	if (!uuid_equal(&hdr->rt_uuid, &mp->m_sb.sb_meta_uuid))
+		return __this_address;
+	if (hdr->rt_blkno != cpu_to_be64(xfs_buf_daddr(bp)))
+		return __this_address;
+	return NULL;
+}
+
 static void
 xfs_rtbuf_verify_read(
-	struct xfs_buf	*bp)
+	struct xfs_buf			*bp)
 {
+	struct xfs_mount		*mp = bp->b_mount;
+	struct xfs_rtbuf_blkinfo	*hdr = bp->b_addr;
+	xfs_failaddr_t			fa;
+
+	if (!xfs_has_rtgroups(mp) || bp->b_ops != &xfs_rtbitmap_buf_ops)
+		return;
+
+	if (!xfs_log_check_lsn(mp, be64_to_cpu(hdr->rt_lsn))) {
+		fa = __this_address;
+		goto fail;
+	}
+
+	if (!xfs_buf_verify_cksum(bp, XFS_RTBUF_CRC_OFF)) {
+		fa = __this_address;
+		goto fail;
+	}
+
+	fa = xfs_rtbuf_verify(bp);
+	if (fa)
+		goto fail;
+
 	return;
+fail:
+	xfs_verifier_error(bp, -EFSCORRUPTED, fa);
 }
 
 static void
 xfs_rtbuf_verify_write(
 	struct xfs_buf	*bp)
 {
-	return;
+	struct xfs_mount		*mp = bp->b_mount;
+	struct xfs_rtbuf_blkinfo	*hdr = bp->b_addr;
+	struct xfs_buf_log_item		*bip = bp->b_log_item;
+	xfs_failaddr_t			fa;
+
+	if (!xfs_has_rtgroups(mp) || bp->b_ops != &xfs_rtbitmap_buf_ops)
+		return;
+
+	fa = xfs_rtbuf_verify(bp);
+	if (fa) {
+		xfs_verifier_error(bp, -EFSCORRUPTED, fa);
+		return;
+	}
+
+	if (bip)
+		hdr->rt_lsn = cpu_to_be64(bip->bli_item.li_lsn);
+	xfs_buf_update_cksum(bp, XFS_RTBUF_CRC_OFF);
 }
 
 const struct xfs_buf_ops xfs_rtbuf_ops = {
@@ -48,6 +104,14 @@ const struct xfs_buf_ops xfs_rtbuf_ops = {
 	.verify_write = xfs_rtbuf_verify_write,
 };
 
+const struct xfs_buf_ops xfs_rtbitmap_buf_ops = {
+	.name		= "xfs_rtbitmap",
+	.magic		= { 0, cpu_to_be32(XFS_RTBITMAP_MAGIC) },
+	.verify_read	= xfs_rtbuf_verify_read,
+	.verify_write	= xfs_rtbuf_verify_write,
+	.verify_struct	= xfs_rtbuf_verify,
+};
+
 /*
  * Get a buffer for the bitmap or summary file block specified.
  * The buffer is returned read and locked.
@@ -81,13 +145,26 @@ xfs_rtbuf_get(
 	ASSERT(map.br_startblock != NULLFSBLOCK);
 	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp,
 				   XFS_FSB_TO_DADDR(mp, map.br_startblock),
-				   mp->m_bsize, 0, &bp, &xfs_rtbuf_ops);
+				   mp->m_bsize, 0, &bp,
+				   xfs_rtblock_ops(mp, issum));
 	if (xfs_metadata_is_sick(error))
 		xfs_rt_mark_sick(mp, issum ? XFS_SICK_RT_SUMMARY :
 					     XFS_SICK_RT_BITMAP);
 	if (error)
 		return error;
 
+	if (xfs_has_rtgroups(mp) && !issum) {
+		struct xfs_rtbuf_blkinfo	*hdr = bp->b_addr;
+
+		if (hdr->rt_owner != cpu_to_be64(ip->i_ino)) {
+			xfs_buf_mark_corrupt(bp);
+			xfs_trans_brelse(tp, bp);
+			xfs_rt_mark_sick(mp, issum ? XFS_SICK_RT_SUMMARY :
+						     XFS_SICK_RT_BITMAP);
+			return -EFSCORRUPTED;
+		}
+	}
+
 	xfs_trans_buf_set_type(tp, bp, issum ? XFS_BLFT_RTSUMMARY_BUF
 					     : XFS_BLFT_RTBITMAP_BUF);
 	*bpp = bp;
@@ -1205,7 +1282,12 @@ xfs_rtbitmap_blockcount(
 	struct xfs_mount	*mp,
 	xfs_rtbxlen_t		rtextents)
 {
-	return howmany_64(rtextents, NBBY * mp->m_sb.sb_blocksize);
+	unsigned int		rbmblock_bytes = mp->m_sb.sb_blocksize;
+
+	if (xfs_has_rtgroups(mp))
+		rbmblock_bytes -= sizeof(struct xfs_rtbuf_blkinfo);
+
+	return howmany_64(rtextents, NBBY * rbmblock_bytes);
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h
index f6a2a48973ab..c1f740fd27b8 100644
--- a/fs/xfs/libxfs/xfs_rtbitmap.h
+++ b/fs/xfs/libxfs/xfs_rtbitmap.h
@@ -100,6 +100,9 @@ xfs_rtx_to_rbmblock(
 	struct xfs_mount	*mp,
 	xfs_rtxnum_t		rtx)
 {
+	if (xfs_has_rtgroups(mp))
+		return div_u64(rtx, mp->m_rtx_per_rbmblock);
+
 	return rtx >> mp->m_blkbit_log;
 }
 
@@ -109,6 +112,13 @@ xfs_rtx_to_rbmword(
 	struct xfs_mount	*mp,
 	xfs_rtxnum_t		rtx)
 {
+	if (xfs_has_rtgroups(mp)) {
+		unsigned int	mod;
+
+		div_u64_rem(rtx >> XFS_NBWORDLOG, mp->m_blockwsize, &mod);
+		return mod;
+	}
+
 	return (rtx >> XFS_NBWORDLOG) & (mp->m_blockwsize - 1);
 }
 
@@ -118,16 +128,24 @@ xfs_rbmblock_to_rtx(
 	struct xfs_mount	*mp,
 	xfs_fileoff_t		rbmoff)
 {
+	if (xfs_has_rtgroups(mp))
+		return rbmoff * mp->m_rtx_per_rbmblock;
+
 	return rbmoff << mp->m_blkbit_log;
 }
 
 /* Return a pointer to a bitmap word within a rt bitmap block buffer. */
 static inline union xfs_rtword_ondisk *
 xfs_rbmbuf_wordptr(
+	struct xfs_mount	*mp,
 	void			*buf,
 	unsigned int		rbmword)
 {
 	union xfs_rtword_ondisk	*wordp = buf;
+	struct xfs_rtbuf_blkinfo *hdr = buf;
+
+	if (xfs_has_rtgroups(mp))
+		wordp = (union xfs_rtword_ondisk *)(hdr + 1);
 
 	return &wordp[rbmword];
 }
@@ -138,7 +156,7 @@ xfs_rbmblock_wordptr(
 	struct xfs_buf		*bp,
 	unsigned int		rbmword)
 {
-	return xfs_rbmbuf_wordptr(bp->b_addr, rbmword);
+	return xfs_rbmbuf_wordptr(bp->b_mount, bp->b_addr, rbmword);
 }
 
 /*
@@ -200,6 +218,16 @@ xfs_rsumblock_infoptr(
 	return xfs_rsumbuf_infoptr(bp->b_addr, infoword);
 }
 
+static inline const struct xfs_buf_ops *
+xfs_rtblock_ops(
+	struct xfs_mount	*mp,
+	bool			issum)
+{
+	if (xfs_has_rtgroups(mp) && !issum)
+		return &xfs_rtbitmap_buf_ops;
+	return &xfs_rtbuf_ops;
+}
+
 /*
  * Functions for walking free space rtextents in the realtime bitmap.
  */
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index ee4e59453edc..dbbea5b86f27 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -518,10 +518,15 @@ xfs_validate_sb_common(
 	} else {
 		uint64_t	rexts;
 		uint64_t	rbmblocks;
+		unsigned int	rbmblock_bytes = sbp->sb_blocksize;
 
 		rexts = div_u64(sbp->sb_rblocks, sbp->sb_rextsize);
-		rbmblocks = howmany_64(sbp->sb_rextents,
-				       NBBY * sbp->sb_blocksize);
+
+		if (xfs_sb_is_v5(sbp) &&
+		    (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_RTGROUPS))
+			rbmblock_bytes -= sizeof(struct xfs_rtbuf_blkinfo);
+
+		rbmblocks = howmany_64(sbp->sb_rextents, NBBY * rbmblock_bytes);
 
 		if (sbp->sb_rextents != rexts ||
 		    sbp->sb_rextslog != xfs_highbit32(sbp->sb_rextents) ||
@@ -1032,8 +1037,13 @@ xfs_sb_mount_common(
 	mp->m_sectbb_log = sbp->sb_sectlog - BBSHIFT;
 	mp->m_agno_log = xfs_highbit32(sbp->sb_agcount - 1) + 1;
 	mp->m_blockmask = sbp->sb_blocksize - 1;
-	mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG;
-	mp->m_blockwmask = mp->m_blockwsize - 1;
+	if (xfs_has_rtgroups(mp))
+		mp->m_blockwsize = (sbp->sb_blocksize -
+				    sizeof(struct xfs_rtbuf_blkinfo)) >>
+					XFS_WORDLOG;
+	else
+		mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG;
+	mp->m_rtx_per_rbmblock = mp->m_blockwsize << XFS_NBWORDLOG;
 	mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize);
 	mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize);
 	mp->m_rgblklog = log2_if_power2(sbp->sb_rgblocks);
diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h
index bcdf298889af..1c86163915cf 100644
--- a/fs/xfs/libxfs/xfs_shared.h
+++ b/fs/xfs/libxfs/xfs_shared.h
@@ -38,6 +38,7 @@ extern const struct xfs_buf_ops xfs_inode_buf_ops;
 extern const struct xfs_buf_ops xfs_inode_buf_ra_ops;
 extern const struct xfs_buf_ops xfs_refcountbt_buf_ops;
 extern const struct xfs_buf_ops xfs_rmapbt_buf_ops;
+extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops;
 extern const struct xfs_buf_ops xfs_rtbuf_ops;
 extern const struct xfs_buf_ops xfs_rtsb_buf_ops;
 extern const struct xfs_buf_ops xfs_sb_buf_ops;
diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c
index 6587d18b21c3..4dcd5d9d2c7c 100644
--- a/fs/xfs/xfs_buf_item_recover.c
+++ b/fs/xfs/xfs_buf_item_recover.c
@@ -23,6 +23,7 @@
 #include "xfs_dir2.h"
 #include "xfs_quota.h"
 #include "xfs_rtgroup.h"
+#include "xfs_rtbitmap.h"
 
 /*
  * This is the number of entries in the l_buf_cancel_table used during
@@ -391,9 +392,15 @@ xlog_recover_validate_buf_type(
 		break;
 #ifdef CONFIG_XFS_RT
 	case XFS_BLFT_RTBITMAP_BUF:
+		if (xfs_has_rtgroups(mp) && magic32 != XFS_RTBITMAP_MAGIC) {
+			warnmsg = "Bad rtbitmap magic!";
+			break;
+		}
+		bp->b_ops = xfs_rtblock_ops(mp, false);
+		break;
 	case XFS_BLFT_RTSUMMARY_BUF:
 		/* no magic numbers for verification of RT buffers */
-		bp->b_ops = &xfs_rtbuf_ops;
+		bp->b_ops = xfs_rtblock_ops(mp, true);
 		break;
 #endif /* CONFIG_XFS_RT */
 	default:
@@ -728,11 +735,20 @@ xlog_recover_get_buf_lsn(
 	 * UUIDs, so we must recover them immediately.
 	 */
 	blft = xfs_blft_from_flags(buf_f);
-	if (blft == XFS_BLFT_RTBITMAP_BUF || blft == XFS_BLFT_RTSUMMARY_BUF)
+	if (!xfs_has_rtgroups(mp) && blft == XFS_BLFT_RTBITMAP_BUF)
+		goto recover_immediately;
+	if (blft == XFS_BLFT_RTSUMMARY_BUF)
 		goto recover_immediately;
 
 	magic32 = be32_to_cpu(*(__be32 *)blk);
 	switch (magic32) {
+	case XFS_RTBITMAP_MAGIC: {
+		struct xfs_rtbuf_blkinfo	*hdr = blk;
+
+		lsn = be64_to_cpu(hdr->rt_lsn);
+		uuid = &hdr->rt_uuid;
+		break;
+	}
 	case XFS_ABTB_CRC_MAGIC:
 	case XFS_ABTC_CRC_MAGIC:
 	case XFS_ABTB_MAGIC:
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 7f0a80a8dcd4..176b2e71da9e 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -123,7 +123,8 @@ typedef struct xfs_mount {
 	int8_t			m_rgblklog;	/* log2 of rt group sz if possible */
 	uint			m_blockmask;	/* sb_blocksize-1 */
 	uint			m_blockwsize;	/* sb_blocksize in words */
-	uint			m_blockwmask;	/* blockwsize-1 */
+	/* number of rt extents per rt bitmap block if rtgroups enabled */
+	unsigned int		m_rtx_per_rbmblock;
 	uint			m_alloc_mxr[2];	/* max alloc btree records */
 	uint			m_alloc_mnr[2];	/* min alloc btree records */
 	uint			m_bmap_dmxr[2];	/* max bmap btree records */
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index 61909355731d..17e541d35194 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -76,6 +76,7 @@ xfs_check_ondisk_structs(void)
 	/* realtime structures */
 	XFS_CHECK_STRUCT_SIZE(union xfs_rtword_ondisk,		4);
 	XFS_CHECK_STRUCT_SIZE(union xfs_suminfo_ondisk,		4);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo,		48);
 
 	/*
 	 * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 9d8d91fa0ecf..9e013a8e3149 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -791,6 +791,42 @@ xfs_rtallocate_extent_size(
 	return 0;
 }
 
+/* Get a buffer for the block. */
+static int
+xfs_growfs_init_rtbuf(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	xfs_fsblock_t		fsbno,
+	enum xfs_blft		buf_type)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_buf		*bp;
+	xfs_daddr_t		d;
+	int			error;
+
+	d = XFS_FSB_TO_DADDR(mp, fsbno);
+	error = xfs_trans_get_buf(tp, mp->m_ddev_targp, d, mp->m_bsize, 0,
+			&bp);
+	if (error)
+		return error;
+
+	xfs_trans_buf_set_type(tp, bp, buf_type);
+	bp->b_ops = xfs_rtblock_ops(mp, buf_type == XFS_BLFT_RTSUMMARY_BUF);
+	memset(bp->b_addr, 0, mp->m_sb.sb_blocksize);
+
+	if (xfs_has_rtgroups(mp) && buf_type == XFS_BLFT_RTBITMAP_BUF) {
+		struct xfs_rtbuf_blkinfo	*hdr = bp->b_addr;
+
+		hdr->rt_magic = cpu_to_be32(XFS_RTBITMAP_MAGIC);
+		hdr->rt_owner = cpu_to_be64(ip->i_ino);
+		hdr->rt_blkno = cpu_to_be64(d);
+		uuid_copy(&hdr->rt_uuid, &mp->m_sb.sb_meta_uuid);
+	}
+
+	xfs_trans_log_buf(tp, bp, 0, mp->m_sb.sb_blocksize - 1);
+	return 0;
+}
+
 /*
  * Allocate space to the bitmap or summary file, and zero it, for growfs.
  */
@@ -802,8 +838,6 @@ xfs_growfs_rt_alloc(
 	struct xfs_inode	*ip)		/* inode (bitmap/summary) */
 {
 	xfs_fileoff_t		bno;		/* block number in file */
-	struct xfs_buf		*bp;	/* temporary buffer for zeroing */
-	xfs_daddr_t		d;		/* disk block address */
 	int			error;		/* error return value */
 	xfs_fsblock_t		fsbno;		/* filesystem block for bno */
 	struct xfs_bmbt_irec	map;		/* block map output */
@@ -878,19 +912,11 @@ xfs_growfs_rt_alloc(
 			 */
 			xfs_ilock(ip, XFS_ILOCK_EXCL);
 			xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
-			/*
-			 * Get a buffer for the block.
-			 */
-			d = XFS_FSB_TO_DADDR(mp, fsbno);
-			error = xfs_trans_get_buf(tp, mp->m_ddev_targp, d,
-					mp->m_bsize, 0, &bp);
+
+			error = xfs_growfs_init_rtbuf(tp, ip, fsbno, buf_type);
 			if (error)
 				goto out_trans_cancel;
 
-			xfs_trans_buf_set_type(tp, bp, buf_type);
-			bp->b_ops = &xfs_rtbuf_ops;
-			memset(bp->b_addr, 0, mp->m_sb.sb_blocksize);
-			xfs_trans_log_buf(tp, bp, 0, mp->m_sb.sb_blocksize - 1);
 			/*
 			 * Commit the transaction.
 			 */
@@ -1159,10 +1185,10 @@ xfs_growfs_rt(
 	 * Skip the current block if it is exactly full.
 	 * This also deals with the case where there were no rtextents before.
 	 */
-	for (bmbno = sbp->sb_rbmblocks -
-		     ((sbp->sb_rextents & ((1 << mp->m_blkbit_log) - 1)) != 0);
-	     bmbno < nrbmblocks;
-	     bmbno++) {
+	bmbno = sbp->sb_rbmblocks;
+	if (xfs_rtx_to_rbmword(mp, sbp->sb_rextents) != 0)
+		bmbno--;
+	for (; bmbno < nrbmblocks; bmbno++) {
 		struct xfs_trans	*tp;
 		struct xfs_rtgroup	*rtg;
 		xfs_rfsblock_t		nrblocks_step;
@@ -1177,7 +1203,7 @@ xfs_growfs_rt(
 		nsbp->sb_rextsize = in->extsize;
 		nmp->m_rtxblklog = -1; /* don't use shift or masking */
 		nsbp->sb_rbmblocks = bmbno + 1;
-		nrblocks_step = (bmbno + 1) * NBBY * nsbp->sb_blocksize *
+		nrblocks_step = (bmbno + 1) * mp->m_rtx_per_rbmblock *
 				nsbp->sb_rextsize;
 		nsbp->sb_rblocks = min(nrblocks, nrblocks_step);
 		nsbp->sb_rextents = xfs_rtb_to_rtxt(nmp, nsbp->sb_rblocks);

^ permalink raw reply related	[flat|nested] 565+ messages in thread
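The arithmetic behind the new header can be checked with a few lines of userspace C. This is a sketch only: struct rbm_geom and the function names are invented here, and the 48-byte header size is taken from the XFS_CHECK_STRUCT_SIZE line in the patch. It shows why the patch replaces shifts and masks with division: once 48 bytes are set aside, a 4096-byte block holds 1012 bitmap words, which is no longer a power of two.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BYTES	4	/* one bitmap word is 32 bits */
#define WORD_BITS	32	/* each bit tracks one rt extent */
#define RTBUF_HDR_BYTES	48	/* sizeof(struct xfs_rtbuf_blkinfo) per the patch */

/* Hypothetical container for the two derived values the patch caches. */
struct rbm_geom {
	uint32_t	words_per_block;	/* compare m_blockwsize */
	uint32_t	rtx_per_block;		/* compare m_rtx_per_rbmblock */
};

static struct rbm_geom rbm_geom_init(uint32_t blocksize, int has_header)
{
	struct rbm_geom	g;
	uint32_t	payload = blocksize - (has_header ? RTBUF_HDR_BYTES : 0);

	g.words_per_block = payload / WORD_BYTES;
	g.rtx_per_block = g.words_per_block * WORD_BITS;
	return g;
}

/* Which bitmap block tracks this rt extent? */
static uint64_t rtx_to_rbmblock(const struct rbm_geom *g, uint64_t rtx)
{
	return rtx / g->rtx_per_block;
}

/* Which word inside that block tracks this rt extent? */
static uint32_t rtx_to_rbmword(const struct rbm_geom *g, uint64_t rtx)
{
	return (uint32_t)((rtx / WORD_BITS) % g->words_per_block);
}

/* How many bitmap blocks are needed to track rtextents extents? */
static uint64_t rbm_blockcount(const struct rbm_geom *g, uint64_t rtextents)
{
	return (rtextents + g->rtx_per_block - 1) / g->rtx_per_block;
}
```

For a 4096-byte block this gives 32384 rt extents per bitmap block with the header, against 32768 without it, so rt extent 32384 is the first one tracked by bitmap block 1.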
* [PATCH 13/22] xfs: export the geometry of realtime groups to userspace
  2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong
                   ` (10 preceding siblings ...)
  2022-12-30 22:17   ` [PATCH 14/22] xfs: add block headers to realtime bitmap blocks Darrick J. Wong
@ 2022-12-30 22:17   ` Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 10/22] xfs: add frextents to the lazysbcounters when rtgroups enabled Darrick J. Wong
                   ` (9 subsequent siblings)
  21 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create an ioctl so that the kernel can report the status of realtime
groups to userspace.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_fs.h      |   16 ++++++++++++++++
 fs/xfs/libxfs/xfs_health.h  |    2 ++
 fs/xfs/libxfs/xfs_rtgroup.c |   14 ++++++++++++++
 fs/xfs/libxfs/xfs_rtgroup.h |    4 ++++
 fs/xfs/xfs_health.c         |   28 ++++++++++++++++++++++++++++
 fs/xfs/xfs_ioctl.c          |   32 ++++++++++++++++++++++++++++++++
 6 files changed, 96 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index ba90649c54e0..e3d87665e4a5 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -299,6 +299,21 @@ struct xfs_ag_geometry {
 #define XFS_AG_GEOM_SICK_REFCNTBT (1 << 9)  /* reference counts */
 #define XFS_AG_GEOM_SICK_INODES	(1 << 10) /* bad inodes were seen */
 
+/*
+ * Output for XFS_IOC_RTGROUP_GEOMETRY
+ */
+struct xfs_rtgroup_geometry {
+	uint32_t	rg_number;	/* i/o: rtgroup number */
+	uint32_t	rg_length;	/* o: length in blocks */
+	uint32_t	rg_sick;	/* o: sick things in ag */
+	uint32_t	rg_checked;	/* o: checked metadata in ag */
+	uint32_t	rg_flags;	/* i/o: flags for this ag */
+	uint32_t	rg_pad;		/* o: zero */
+	uint64_t	rg_reserved[13];/* o: zero */
+};
+#define XFS_RTGROUP_GEOM_SICK_SUPER	(1 << 0)  /* superblock */
+#define XFS_RTGROUP_GEOM_SICK_BITMAP	(1 << 1)  /* rtbitmap for this group */
+
 /*
  * Structures for XFS_IOC_FSGROWFSDATA, XFS_IOC_FSGROWFSLOG & XFS_IOC_FSGROWFSRT
  */
@@ -819,6 +834,7 @@ struct xfs_scrub_metadata {
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
 #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
 #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
+#define XFS_IOC_RTGROUP_GEOMETRY _IOWR('X', 62, struct xfs_rtgroup_geometry)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index 0beb4153a43e..44137c4983fc 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -286,6 +286,8 @@ xfs_inode_is_healthy(struct xfs_inode *ip)
 
 void xfs_fsop_geom_health(struct xfs_mount *mp, struct xfs_fsop_geom *geo);
 void xfs_ag_geom_health(struct xfs_perag *pag, struct xfs_ag_geometry *ageo);
+void xfs_rtgroup_geom_health(struct xfs_rtgroup *rtg,
+		struct xfs_rtgroup_geometry *rgeo);
 void xfs_bulkstat_health(struct xfs_inode *ip, struct xfs_bulkstat *bs);
 
 #define xfs_metadata_is_sick(error) \
diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c
index 3bf85ab524f6..a428dff81888 100644
--- a/fs/xfs/libxfs/xfs_rtgroup.c
+++ b/fs/xfs/libxfs/xfs_rtgroup.c
@@ -532,3 +532,17 @@ xfs_rtgroup_unlock(
 	else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED)
 		xfs_rtbitmap_unlock_shared(rtg->rtg_mount, XFS_RBMLOCK_BITMAP);
 }
+
+/* Retrieve rt group geometry. */
+int
+xfs_rtgroup_get_geometry(
+	struct xfs_rtgroup	*rtg,
+	struct xfs_rtgroup_geometry *rgeo)
+{
+	/* Fill out form. */
+	memset(rgeo, 0, sizeof(*rgeo));
+	rgeo->rg_number = rtg->rtg_rgno;
+	rgeo->rg_length = rtg->rtg_blockcount;
+	xfs_rtgroup_geom_health(rtg, rgeo);
+	return 0;
+}
diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h
index b1e53af5a65b..1fec49c496d4 100644
--- a/fs/xfs/libxfs/xfs_rtgroup.h
+++ b/fs/xfs/libxfs/xfs_rtgroup.h
@@ -222,6 +222,9 @@ int xfs_rtgroup_update_secondary_sbs(struct xfs_mount *mp);
 void xfs_rtgroup_lock(struct xfs_trans *tp, struct xfs_rtgroup *rtg,
 		unsigned int rtglock_flags);
 void xfs_rtgroup_unlock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags);
+
+int xfs_rtgroup_get_geometry(struct xfs_rtgroup *rtg,
+		struct xfs_rtgroup_geometry *rgeo);
 #else
 # define xfs_rtgroup_block_count(mp, rgno)	(0)
 # define xfs_rtgroup_update_super(bp, sb_bp)	((void)0)
@@ -229,6 +232,7 @@ void xfs_rtgroup_unlock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags);
 # define xfs_rtgroup_update_secondary_sbs(mp)	(0)
 # define xfs_rtgroup_lock(tp, rtg, gf)		((void)0)
 # define xfs_rtgroup_unlock(rtg, gf)		((void)0)
+# define xfs_rtgroup_get_geometry(rtg, rgeo)	(-EOPNOTSUPP)
 #endif /* CONFIG_XFS_RT */
 
 #endif /* __LIBXFS_RTGROUP_H */
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index fe05b565427f..33f332ee8044 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -528,6 +528,34 @@ xfs_ag_geom_health(
 	}
 }
 
+static const struct ioctl_sick_map rtgroup_map[] = {
+	{ XFS_SICK_RT_SUPER,	XFS_RTGROUP_GEOM_SICK_SUPER },
+	{ XFS_SICK_RT_BITMAP,	XFS_RTGROUP_GEOM_SICK_BITMAP },
+	{ 0, 0 },
+};
+
+/* Fill out rtgroup geometry health info. */
+void
+xfs_rtgroup_geom_health(
+	struct xfs_rtgroup	*rtg,
+	struct xfs_rtgroup_geometry *rgeo)
+{
+	const struct ioctl_sick_map	*m;
+	unsigned int		sick;
+	unsigned int		checked;
+
+	rgeo->rg_sick = 0;
+	rgeo->rg_checked = 0;
+
+	xfs_rtgroup_measure_sickness(rtg, &sick, &checked);
+	for (m = rtgroup_map; m->sick_mask; m++) {
+		if (checked & m->sick_mask)
+			rgeo->rg_checked |= m->ioctl_mask;
+		if (sick & m->sick_mask)
+			rgeo->rg_sick |= m->ioctl_mask;
+	}
+}
+
 static const struct ioctl_sick_map ino_map[] = {
 	{ XFS_SICK_INO_CORE,	XFS_BS_SICK_INODE },
 	{ XFS_SICK_INO_BMBTD,	XFS_BS_SICK_BMBTD },
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 46deb26b7cc5..fbe9bc50fc20 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -991,6 +991,36 @@ xfs_ioc_ag_geometry(
 	return 0;
 }
 
+STATIC int
+xfs_ioc_rtgroup_geometry(
+	struct xfs_mount	*mp,
+	void			__user *arg)
+{
+	struct xfs_rtgroup	*rtg;
+	struct xfs_rtgroup_geometry rgeo;
+	int			error;
+
+	if (copy_from_user(&rgeo, arg, sizeof(rgeo)))
+		return -EFAULT;
+	if (rgeo.rg_flags || rgeo.rg_pad)
+		return -EINVAL;
+	if (memchr_inv(&rgeo.rg_reserved, 0, sizeof(rgeo.rg_reserved)))
+		return -EINVAL;
+
+	rtg = xfs_rtgroup_get(mp, rgeo.rg_number);
+	if (!rtg)
+		return -EINVAL;
+
+	error = xfs_rtgroup_get_geometry(rtg, &rgeo);
+	xfs_rtgroup_put(rtg);
+	if (error)
+		return error;
+
+	if (copy_to_user(arg, &rgeo, sizeof(rgeo)))
+		return -EFAULT;
+	return 0;
+}
+
 /*
  * Linux extended inode flags interface.
  */
@@ -1852,6 +1882,8 @@ xfs_file_ioctl(
 
 	case XFS_IOC_AG_GEOMETRY:
 		return xfs_ioc_ag_geometry(mp, arg);
+	case XFS_IOC_RTGROUP_GEOMETRY:
+		return xfs_ioc_rtgroup_geometry(mp, arg);
 
 	case XFS_IOC_GETVERSION:
 		return put_user(inode->i_generation, (int __user *)arg);

^ permalink raw reply related	[flat|nested] 565+ messages in thread
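From userspace, a caller of the proposed ioctl might look roughly like the sketch below. The structure, the flag bits and the ioctl number 62 are copied from this patch as posted and are not taken from any installed header, so they may differ from whatever is eventually merged. query_rtgroup() and rtgroup_part_status() are hypothetical helper names. Because the kernel side returns -EINVAL when rg_flags, rg_pad or rg_reserved are nonzero, the helper zeroes the whole structure before filling in the group number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Copied from the patch as posted; not from an installed header. */
struct xfs_rtgroup_geometry {
	uint32_t	rg_number;	/* i/o: rtgroup number */
	uint32_t	rg_length;	/* o: length in blocks */
	uint32_t	rg_sick;	/* o: sick things in the group */
	uint32_t	rg_checked;	/* o: checked metadata in the group */
	uint32_t	rg_flags;	/* i/o: must be zero on input */
	uint32_t	rg_pad;		/* o: zero */
	uint64_t	rg_reserved[13];/* o: zero */
};
#define XFS_RTGROUP_GEOM_SICK_SUPER	(1 << 0)
#define XFS_RTGROUP_GEOM_SICK_BITMAP	(1 << 1)
#define XFS_IOC_RTGROUP_GEOMETRY _IOWR('X', 62, struct xfs_rtgroup_geometry)

/*
 * Hypothetical helper: ask the kernel about one rt group.  fd is any open
 * file descriptor on the filesystem.  Returns 0 on success, or -1 with
 * errno set by ioctl().
 */
static int query_rtgroup(int fd, uint32_t rgno,
			 struct xfs_rtgroup_geometry *rgeo)
{
	/* Reserved and padding fields must be zero or the call is rejected. */
	memset(rgeo, 0, sizeof(*rgeo));
	rgeo->rg_number = rgno;
	return ioctl(fd, XFS_IOC_RTGROUP_GEOMETRY, rgeo);
}

/*
 * Hypothetical helper: 1 if the part is flagged sick, 0 if it was checked
 * and not flagged, -1 if it has not been checked at all.
 */
static int rtgroup_part_status(const struct xfs_rtgroup_geometry *rgeo,
			       uint32_t mask)
{
	if (rgeo->rg_sick & mask)
		return 1;
	if (rgeo->rg_checked & mask)
		return 0;
	return -1;
}
```

A caller would typically loop rgno upward from zero and stop at the first failure, since xfs_ioc_rtgroup_geometry() returns -EINVAL when no such group exists.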
* [PATCH 10/22] xfs: add frextents to the lazysbcounters when rtgroups enabled
  2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong
                   ` (11 preceding siblings ...)
  2022-12-30 22:17   ` [PATCH 13/22] xfs: export the geometry of realtime groups to userspace Darrick J. Wong
@ 2022-12-30 22:17   ` Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 12/22] xfs: define locking primitives for realtime groups Darrick J. Wong
                   ` (8 subsequent siblings)
  21 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Make the free rt extent count a part of the lazy sb counters when the
realtime groups feature is enabled.  This is possible because the patch
to recompute frextents from the rtbitmap during log recovery predates
the code adding rtgroup support, hence we know that the value will
always be correct during runtime.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_sb.c |    5 +++++
 fs/xfs/xfs_trans.c     |   18 +++++++++++++++---
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index db88f601e24b..ee4e59453edc 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -1089,6 +1089,9 @@ xfs_log_sb(
 	 * sb counters, despite having a percpu counter. It is always kept
 	 * consistent with the ondisk rtbitmap by xfs_trans_apply_sb_deltas()
 	 * and hence we don't need have to update it here.
+	 *
+	 * sb_frextents was added to the lazy sb counters when the rt groups
+	 * feature was introduced.
 	 */
 	if (xfs_has_lazysbcount(mp)) {
 		mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount);
@@ -1097,6 +1100,8 @@ xfs_log_sb(
 				mp->m_sb.sb_icount);
 		mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks);
 	}
+	if (xfs_has_rtgroups(mp))
+		mp->m_sb.sb_frextents = percpu_counter_sum(&mp->m_frextents);
 
 	xfs_sb_to_disk(bp->b_addr, &mp->m_sb);
 	xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index a6f46cd9e60c..05e93af190df 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -475,6 +475,8 @@ xfs_trans_mod_sb(
 			ASSERT(tp->t_rtx_res_used <= tp->t_rtx_res);
 		}
 		tp->t_frextents_delta += delta;
+		if (xfs_has_rtgroups(mp))
+			flags &= ~XFS_TRANS_SB_DIRTY;
 		break;
 	case XFS_TRANS_SB_RES_FREXTENTS:
 		/*
@@ -569,8 +571,14 @@ xfs_trans_apply_sb_deltas(
 	 *
 	 * Don't touch m_frextents because it includes incore reservations,
 	 * and those are handled by the unreserve function.
+	 *
+	 * sb_frextents was added to the lazy sb counters when the rt groups
+	 * feature was introduced.  This is possible because we know that all
+	 * kernels supporting rtgroups will also recompute frextents from the
+	 * realtime bitmap.
 	 */
-	if (tp->t_frextents_delta || tp->t_res_frextents_delta) {
+	if ((tp->t_frextents_delta || tp->t_res_frextents_delta) &&
+	    !xfs_has_rtgroups(tp->t_mountp)) {
 		struct xfs_mount	*mp = tp->t_mountp;
 		int64_t			rtxdelta;
 
@@ -684,7 +692,8 @@ xfs_trans_unreserve_and_mod_sb(
 	if (tp->t_rtx_res > 0)
 		rtxdelta = tp->t_rtx_res;
 	if ((tp->t_frextents_delta != 0) &&
-	    (tp->t_flags & XFS_TRANS_SB_DIRTY))
+	    (xfs_has_rtgroups(mp) ||
+	     (tp->t_flags & XFS_TRANS_SB_DIRTY)))
 		rtxdelta += tp->t_frextents_delta;
 
 	if (xfs_has_lazysbcount(mp) ||
@@ -723,8 +732,11 @@ xfs_trans_unreserve_and_mod_sb(
 	 * Do not touch sb_frextents here because we are dealing with incore
 	 * reservation.  sb_frextents is not part of the lazy sb counters so it
 	 * must be consistent with the ondisk rtbitmap and must never include
-	 * incore reservations.
+	 * incore reservations.  sb_frextents was added to the lazy sb counters
+	 * when the realtime groups feature was introduced.
 	 */
+	if (xfs_has_rtgroups(mp))
+		mp->m_sb.sb_frextents += rtxdelta;
 	mp->m_sb.sb_dblocks += tp->t_dblocks_delta;
 	mp->m_sb.sb_agcount += tp->t_agcount_delta;
 	mp->m_sb.sb_imax_pct += tp->t_imaxpct_delta;

^ permalink raw reply related	[flat|nested] 565+ messages in thread
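The difference between an eagerly logged counter and a lazy one can be modelled in a few lines. This is a toy userspace model and not the kernel's percpu_counter or transaction code: struct fs_model, NR_SHARDS and all function names are invented for illustration, and incore reservations are left out entirely. It only shows the general idea that, in lazy mode, a transaction touches just the sharded incore counter and the ondisk field is recomputed from the shards whenever the superblock is logged for some other reason.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SHARDS 4	/* hypothetical stand-in for per-CPU counter shards */

struct fs_model {
	int64_t		frextents_shard[NR_SHARDS];	/* incore counter */
	int64_t		sb_frextents;			/* ondisk superblock field */
	bool		lazy;				/* models rtgroups being enabled */
	unsigned int	sb_logged;			/* times the sb was logged */
};

/* Fold all shards into one value, like summing a per-CPU counter. */
static int64_t frextents_sum(const struct fs_model *fs)
{
	int64_t		sum = 0;
	unsigned int	i;

	for (i = 0; i < NR_SHARDS; i++)
		sum += fs->frextents_shard[i];
	return sum;
}

/* Commit a transaction that changes the free rt extent count by delta. */
static void commit_frextents(struct fs_model *fs, unsigned int shard,
			     int64_t delta)
{
	fs->frextents_shard[shard % NR_SHARDS] += delta;
	if (!fs->lazy) {
		/* Eager mode: every change dirties and logs the superblock. */
		fs->sb_frextents += delta;
		fs->sb_logged++;
	}
}

/* Log the superblock for some unrelated reason, e.g. a periodic sync. */
static void log_sb(struct fs_model *fs)
{
	if (fs->lazy)
		fs->sb_frextents = frextents_sum(fs);
	fs->sb_logged++;
}
```

In lazy mode two allocations followed by one log_sb() call result in a single superblock update, whereas eager mode would have logged the superblock on each allocation.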
* [PATCH 12/22] xfs: define locking primitives for realtime groups
  2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong
                   ` (12 preceding siblings ...)
  2022-12-30 22:17   ` [PATCH 10/22] xfs: add frextents to the lazysbcounters when rtgroups enabled Darrick J. Wong
@ 2022-12-30 22:17   ` Darrick J. Wong
  2022-12-30 22:17   ` [PATCH 15/22] xfs: encode the rtbitmap in little endian format Darrick J. Wong
                   ` (7 subsequent siblings)
  21 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Define helper functions to lock all metadata inodes related to a
realtime group.  There's not much to look at now, but this will become
important when we add per-rtgroup metadata files and online fsck code
for them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtgroup.c |   33 +++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rtgroup.h |   14 ++++++++++++++
 2 files changed, 47 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c
index 037506b73384..3bf85ab524f6 100644
--- a/fs/xfs/libxfs/xfs_rtgroup.c
+++ b/fs/xfs/libxfs/xfs_rtgroup.c
@@ -499,3 +499,36 @@ xfs_rtgroup_update_secondary_sbs(
 
 	return saved_error ? saved_error : error;
 }
+
+/* Lock metadata inodes associated with this rt group. */
+void
+xfs_rtgroup_lock(
+	struct xfs_trans	*tp,
+	struct xfs_rtgroup	*rtg,
+	unsigned int		rtglock_flags)
+{
+	ASSERT(!(rtglock_flags & ~XFS_RTGLOCK_ALL_FLAGS));
+	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) ||
+	       !(rtglock_flags & XFS_RTGLOCK_BITMAP));
+
+	if (rtglock_flags & XFS_RTGLOCK_BITMAP)
+		xfs_rtbitmap_lock(tp, rtg->rtg_mount);
+	else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED)
+		xfs_rtbitmap_lock_shared(rtg->rtg_mount, XFS_RBMLOCK_BITMAP);
+}
+
+/* Unlock metadata inodes associated with this rt group. */
+void
+xfs_rtgroup_unlock(
+	struct xfs_rtgroup	*rtg,
+	unsigned int		rtglock_flags)
+{
+	ASSERT(!(rtglock_flags & ~XFS_RTGLOCK_ALL_FLAGS));
+	ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) ||
+	       !(rtglock_flags & XFS_RTGLOCK_BITMAP));
+
+	if (rtglock_flags & XFS_RTGLOCK_BITMAP)
+		xfs_rtbitmap_unlock(rtg->rtg_mount);
+	else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED)
+		xfs_rtbitmap_unlock_shared(rtg->rtg_mount, XFS_RBMLOCK_BITMAP);
+}
diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h
index 0e664e2436b0..b1e53af5a65b 100644
--- a/fs/xfs/libxfs/xfs_rtgroup.h
+++ b/fs/xfs/libxfs/xfs_rtgroup.h
@@ -210,11 +210,25 @@ void xfs_rtgroup_update_super(struct xfs_buf *rtsb_bp,
 		const struct xfs_buf *sb_bp);
 void xfs_rtgroup_log_super(struct xfs_trans *tp, const struct xfs_buf *sb_bp);
 int xfs_rtgroup_update_secondary_sbs(struct xfs_mount *mp);
+
+/* Lock the rt bitmap inode in exclusive mode */
+#define XFS_RTGLOCK_BITMAP		(1U << 0)
+/* Lock the rt bitmap inode in shared mode */
+#define XFS_RTGLOCK_BITMAP_SHARED	(1U << 1)
+
+#define XFS_RTGLOCK_ALL_FLAGS	(XFS_RTGLOCK_BITMAP | \
+				 XFS_RTGLOCK_BITMAP_SHARED)
+
+void xfs_rtgroup_lock(struct xfs_trans *tp, struct xfs_rtgroup *rtg,
+		unsigned int rtglock_flags);
+void xfs_rtgroup_unlock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags);
 #else
 # define xfs_rtgroup_block_count(mp, rgno)	(0)
 # define xfs_rtgroup_update_super(bp, sb_bp)	((void)0)
 # define xfs_rtgroup_log_super(tp, sb_bp)	((void)0)
 # define xfs_rtgroup_update_secondary_sbs(mp)	(0)
+# define xfs_rtgroup_lock(tp, rtg, gf)		((void)0)
+# define xfs_rtgroup_unlock(rtg, gf)		((void)0)
 #endif /* CONFIG_XFS_RT */
 
 #endif /* __LIBXFS_RTGROUP_H */

^ permalink raw reply related	[flat|nested] 565+ messages in thread
* [PATCH 15/22] xfs: encode the rtbitmap in little endian format 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong ` (13 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 12/22] xfs: define locking primitives for realtime groups Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 21/22] xfs: scrub each rtgroup's portion of the rtbitmap separately Darrick J. Wong ` (6 subsequent siblings) 21 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Currently, the ondisk realtime bitmap file is accessed in units of 32-bit words. There's no endian translation of the contents of this file, which means that the Bad Things Happen(tm) if you go from (say) x86 to powerpc. Since we have a new feature flag, let's take the opportunity to enforce an endianness on the file. The natural format of a bitmap is (IMHO) little endian, because the byte offsets of the bitmap data should always increase in step with the information being indexed. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 4 +++- fs/xfs/libxfs/xfs_rtbitmap.c | 8 +++++++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 4096d3f069a3..c7752aaa4478 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -738,10 +738,12 @@ struct xfs_agfl { /* * Realtime bitmap information is accessed by the word, which is currently - * stored in host-endian format. + * stored in host-endian format. Starting with the realtime groups feature, + * the words are stored in le32 ondisk. 
*/ union xfs_rtword_ondisk { __u32 raw; + __le32 rtg; }; /* diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 05b0e4e92a0a..3e99afea78a6 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -177,6 +177,9 @@ xfs_rtbitmap_getword( struct xfs_mount *mp, union xfs_rtword_ondisk *wordptr) { + if (xfs_has_rtgroups(mp)) + return le32_to_cpu(wordptr->rtg); + return wordptr->raw; } @@ -187,7 +190,10 @@ xfs_rtbitmap_setword( union xfs_rtword_ondisk *wordptr, xfs_rtword_t incore) { - wordptr->raw = incore; + if (xfs_has_rtgroups(mp)) + wordptr->rtg = cpu_to_le32(incore); + else + wordptr->raw = incore; } /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
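To illustrate the "byte offsets increase in step with the information being indexed" argument from the commit message, here is a small userspace sketch. It is not the kernel code: struct rtword_ondisk and the rtword_*/rtbitmap_* helpers are hypothetical stand-ins, and it assumes bit N of a word is tested with (1 << N). With little endian words, bit N of the bitmap always lands in byte N / 8 of the file, whatever the host endianness.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an ondisk bitmap word: four raw bytes. */
struct rtword_ondisk {
	uint8_t		bytes[4];
};

/* Decode a little endian word without caring about host byte order. */
static uint32_t rtword_get_le(const struct rtword_ondisk *w)
{
	return (uint32_t)w->bytes[0] |
	       ((uint32_t)w->bytes[1] << 8) |
	       ((uint32_t)w->bytes[2] << 16) |
	       ((uint32_t)w->bytes[3] << 24);
}

/* Encode an incore word as little endian bytes. */
static void rtword_set_le(struct rtword_ondisk *w, uint32_t v)
{
	w->bytes[0] = v & 0xff;
	w->bytes[1] = (v >> 8) & 0xff;
	w->bytes[2] = (v >> 16) & 0xff;
	w->bytes[3] = (v >> 24) & 0xff;
}

/* Set one bit in a bitmap made of little endian 32-bit words. */
static void rtbitmap_set_bit(struct rtword_ondisk *words, size_t nwords,
			     size_t bit)
{
	uint32_t	v;

	assert(bit / 32 < nwords);
	v = rtword_get_le(&words[bit / 32]);
	rtword_set_le(&words[bit / 32], v | (1U << (bit % 32)));
}

/* Test one bit in the same bitmap. */
static bool rtbitmap_test_bit(const struct rtword_ondisk *words,
			      size_t nwords, size_t bit)
{
	assert(bit / 32 < nwords);
	return (rtword_get_le(&words[bit / 32]) >> (bit % 32)) & 1U;
}
```

Setting bit 40 in this model touches byte 5 of the buffer (40 / 8), which is the property a big endian word encoding would not have.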
* [PATCH 21/22] xfs: scrub each rtgroup's portion of the rtbitmap separately 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong ` (14 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 15/22] xfs: encode the rtbitmap in little endian format Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 17/22] xfs: encode the rtsummary in big endian format Darrick J. Wong ` (5 subsequent siblings) 21 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a new scrub type code so that userspace can scrub each rtgroup's portion of the rtbitmap file separately. This reduces the long tail latency that results from scanning the entire bitmap all at once, and prepares us for future patchsets, wherein we'll need to be able to lock a specific rtgroup so that we can rebuild that rtgroup's part of the rtbitmap contents from the rtgroup's rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_fs.h | 3 +- fs/xfs/scrub/common.h | 6 ++++ fs/xfs/scrub/rtbitmap.c | 73 +++++++++++++++++++++++++++++++++++++++++++++-- fs/xfs/scrub/scrub.c | 7 +++++ fs/xfs/scrub/scrub.h | 2 + fs/xfs/scrub/trace.h | 4 ++- 6 files changed, 90 insertions(+), 5 deletions(-) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index c12be9dbb59d..7e9d7d7bb40b 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -742,9 +742,10 @@ struct xfs_scrub_metadata { #define XFS_SCRUB_TYPE_NLINKS 26 /* inode link counts */ #define XFS_SCRUB_TYPE_HEALTHY 27 /* everything checked out ok */ #define XFS_SCRUB_TYPE_RGSUPER 28 /* realtime superblock */ +#define XFS_SCRUB_TYPE_RGBITMAP 29 /* realtime group bitmap */ /* Number of scrub subcommands. */ -#define XFS_SCRUB_TYPE_NR 29 +#define XFS_SCRUB_TYPE_NR 30 /* i: Repair this metadata. 
*/ #define XFS_SCRUB_IFLAG_REPAIR (1u << 0) diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 96bb8bc676e7..e83e88b44e5b 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -105,10 +105,12 @@ int xchk_setup_parent(struct xfs_scrub *sc); int xchk_setup_rtbitmap(struct xfs_scrub *sc); int xchk_setup_rtsummary(struct xfs_scrub *sc); int xchk_setup_rgsuperblock(struct xfs_scrub *sc); +int xchk_setup_rgbitmap(struct xfs_scrub *sc); #else # define xchk_setup_rtbitmap xchk_setup_nothing # define xchk_setup_rtsummary xchk_setup_nothing # define xchk_setup_rgsuperblock xchk_setup_nothing +# define xchk_setup_rgbitmap xchk_setup_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_ino_dqattach(struct xfs_scrub *sc); @@ -166,6 +168,10 @@ void xchk_rt_init(struct xfs_scrub *sc, struct xchk_rt *sr, void xchk_rt_unlock(struct xfs_scrub *sc, struct xchk_rt *sr); #ifdef CONFIG_XFS_RT + +/* All the locks we need to check an rtgroup. */ +#define XCHK_RTGLOCK_ALL (XFS_RTGLOCK_BITMAP_SHARED) + int xchk_rtgroup_init(struct xfs_scrub *sc, xfs_rgnumber_t rgno, struct xchk_rt *sr, unsigned int rtglock_flags); void xchk_rtgroup_unlock(struct xfs_scrub *sc, struct xchk_rt *sr); diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index a50b0580f3d8..d847773e5f66 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -14,10 +14,34 @@ #include "xfs_rtbitmap.h" #include "xfs_inode.h" #include "xfs_bmap.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/repair.h" +/* Set us up with the realtime group metadata locked. 
*/ +int +xchk_setup_rgbitmap( + struct xfs_scrub *sc) +{ + int error; + + error = xchk_trans_alloc(sc, 0); + if (error) + return error; + + error = xchk_install_live_inode(sc, sc->mp->m_rbmip); + if (error) + return error; + + error = xchk_ino_dqattach(sc); + if (error) + return error; + + return xchk_rtgroup_init(sc, sc->sm->sm_agno, &sc->sr, + XCHK_RTGLOCK_ALL); +} + /* Set us up with the realtime metadata locked. */ int xchk_setup_rtbitmap( @@ -105,6 +129,43 @@ xchk_rtbitmap_check_extents( return error; } +/* Scrub this group's realtime bitmap. */ +int +xchk_rgbitmap( + struct xfs_scrub *sc) +{ + struct xfs_rtalloc_rec keys[2]; + struct xfs_rtgroup *rtg = sc->sr.rtg; + xfs_rtblock_t rtbno; + xfs_rgblock_t last_rgbno = rtg->rtg_blockcount - 1; + int error; + + /* Sanity check the realtime bitmap size. */ + if (sc->mp->m_rbmip->i_disk_size != + XFS_FSB_TO_B(sc->mp, sc->mp->m_sb.sb_rbmblocks)) { + xchk_ino_set_corrupt(sc, sc->mp->m_rbmip->i_ino); + return 0; + } + + /* + * Check only the portion of the rtbitmap that corresponds to this + * realtime group. + */ + rtbno = xfs_rgbno_to_rtb(sc->mp, rtg->rtg_rgno, 0); + keys[0].ar_startext = xfs_rtb_to_rtxt(sc->mp, rtbno); + + rtbno = xfs_rgbno_to_rtb(sc->mp, rtg->rtg_rgno, last_rgbno); + keys[1].ar_startext = xfs_rtb_to_rtxt(sc->mp, rtbno); + keys[0].ar_extcount = keys[1].ar_extcount = 0; + + error = xfs_rtalloc_query_range(sc->mp, sc->tp, &keys[0], &keys[1], + xchk_rtbitmap_rec, sc); + if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error)) + return error; + + return 0; +} + /* Scrub the realtime bitmap. */ int xchk_rtbitmap( @@ -128,12 +189,18 @@ xchk_rtbitmap( if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) return error; + /* + * Each rtgroup checks its portion of the rt bitmap, so if we don't + * have that feature, we have to check the bitmap contents now. 
+ */ + if (xfs_has_rtgroups(sc->mp)) + return 0; + error = xfs_rtalloc_query_all(sc->mp, sc->tp, xchk_rtbitmap_rec, sc); if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error)) - goto out; + return error; -out: - return error; + return 0; } /* xref check that the extent is not free in the rtbitmap */ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 5e07150e8f14..6066673953cb 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -416,6 +416,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .has = xfs_has_rtgroups, .repair = xrep_rgsuperblock, }, + [XFS_SCRUB_TYPE_RGBITMAP] = { /* realtime group bitmap */ + .type = ST_RTGROUP, + .setup = xchk_setup_rgbitmap, + .scrub = xchk_rgbitmap, + .has = xfs_has_rtgroups, + .repair = xrep_notsupported, + }, }; static int diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 6e5b96b6db81..48114bda2f4a 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -189,10 +189,12 @@ int xchk_parent(struct xfs_scrub *sc); int xchk_rtbitmap(struct xfs_scrub *sc); int xchk_rtsummary(struct xfs_scrub *sc); int xchk_rgsuperblock(struct xfs_scrub *sc); +int xchk_rgbitmap(struct xfs_scrub *sc); #else # define xchk_rtbitmap xchk_nothing # define xchk_rtsummary xchk_nothing # define xchk_rgsuperblock xchk_nothing +# define xchk_rgbitmap xchk_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_quota(struct xfs_scrub *sc); diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index a88ad16c90db..9a51eb404fae 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -75,6 +75,7 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_QUOTACHECK); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_NLINKS); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_HEALTHY); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGSUPER); +TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGBITMAP); #define XFS_SCRUB_TYPE_STRINGS \ { XFS_SCRUB_TYPE_PROBE, "probe" }, \ @@ -105,7 +106,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGSUPER); { XFS_SCRUB_TYPE_QUOTACHECK, "quotacheck" }, \ { 
XFS_SCRUB_TYPE_NLINKS, "nlinks" }, \ { XFS_SCRUB_TYPE_HEALTHY, "healthy" }, \ - { XFS_SCRUB_TYPE_RGSUPER, "rgsuper" } + { XFS_SCRUB_TYPE_RGSUPER, "rgsuper" }, \ + { XFS_SCRUB_TYPE_RGBITMAP, "rgbitmap" } #define XFS_SCRUB_FLAG_STRINGS \ { XFS_SCRUB_IFLAG_REPAIR, "repair" }, \ ^ permalink raw reply related [flat|nested] 565+ messages in thread
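The range query in xchk_rgbitmap above boils down to computing the first and last realtime extent covered by one group. A rough userspace sketch of that arithmetic follows. The geometry struct and helper names are hypothetical, and it assumes that a group's blocks are laid out contiguously at rgno * rgblocks and that a block converts to an extent by integer division; the real xfs_rgbno_to_rtb and xfs_rtb_to_rtxt helpers may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry; not the xfs_mount layout. */
struct rt_geom {
	uint64_t	rgblocks;	/* blocks per realtime group */
	uint32_t	rextsize;	/* blocks per realtime extent */
};

/* First and last rt extent covered by one group (inclusive). */
struct rtx_range {
	uint64_t	first;
	uint64_t	last;
};

/* Assumed mapping: groups are contiguous runs of rgblocks blocks. */
static uint64_t rgbno_to_rtb(const struct rt_geom *g, uint32_t rgno,
			     uint64_t rgbno)
{
	assert(rgbno < g->rgblocks);
	return (uint64_t)rgno * g->rgblocks + rgbno;
}

/* Assumed mapping: rt block number to rt extent number. */
static uint64_t rtb_to_rtxt(const struct rt_geom *g, uint64_t rtbno)
{
	assert(g->rextsize > 0);
	return rtbno / g->rextsize;
}

/*
 * Compute the low and high keys for a per-group bitmap query.
 * @blockcount is the number of blocks actually in this group, which can be
 * smaller than rgblocks for the last group.
 */
static struct rtx_range rtgroup_rtx_range(const struct rt_geom *g,
					  uint32_t rgno, uint64_t blockcount)
{
	struct rtx_range	r;

	assert(blockcount > 0);
	r.first = rtb_to_rtxt(g, rgbno_to_rtb(g, rgno, 0));
	r.last = rtb_to_rtxt(g, rgbno_to_rtb(g, rgno, blockcount - 1));
	return r;
}
```

Because each call only covers one group's slice, userspace can walk the groups one at a time instead of holding the bitmap lock across the whole file.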
* [PATCH 17/22] xfs: encode the rtsummary in big endian format 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong ` (15 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 21/22] xfs: scrub each rtgroup's portion of the rtbitmap separately Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 19/22] xfs: scrub the realtime group superblock Darrick J. Wong ` (4 subsequent siblings) 21 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Currently, the ondisk realtime summary file counters are accessed in units of 32-bit words. There's no endian translation of the contents of this file, which means that Bad Things Happen(tm) if you go from (say) x86 to powerpc. Since we have a new feature flag, let's take the opportunity to enforce an endianness on the file. Encode the summary information in big endian format, like most of the rest of the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 4 +++- fs/xfs/libxfs/xfs_rtbitmap.c | 8 +++++++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 47b2e31e2560..7e76bedda688 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -748,10 +748,12 @@ union xfs_rtword_ondisk { /* * Realtime summary counts are accessed by the word, which is currently - * stored in host-endian format. + * stored in host-endian format. Starting with the realtime groups feature, + * the words are stored in be32 ondisk. 
*/ union xfs_suminfo_ondisk { __u32 raw; + __be32 rtg; }; /* diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 3d5b14cc0f3a..ccefbfc70f8b 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -562,6 +562,9 @@ xfs_suminfo_get( struct xfs_mount *mp, union xfs_suminfo_ondisk *infoptr) { + if (xfs_has_rtgroups(mp)) + return be32_to_cpu(infoptr->rtg); + return infoptr->raw; } @@ -571,7 +574,10 @@ xfs_suminfo_add( union xfs_suminfo_ondisk *infoptr, int delta) { - infoptr->raw += delta; + if (xfs_has_rtgroups(mp)) + be32_add_cpu(&infoptr->rtg, delta); + else + infoptr->raw += delta; } /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
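The xfs_suminfo_add change above is a read-modify-write on a big endian counter. A portable userspace sketch of the same idea is below; struct suminfo_ondisk and the suminfo_*_be helpers are hypothetical names, and the kernel uses be32_to_cpu and be32_add_cpu rather than byte shuffling like this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for an ondisk summary counter: four raw bytes. */
struct suminfo_ondisk {
	uint8_t		bytes[4];
};

/* Decode a big endian counter: most significant byte comes first. */
static uint32_t suminfo_get_be(const struct suminfo_ondisk *s)
{
	return ((uint32_t)s->bytes[0] << 24) |
	       ((uint32_t)s->bytes[1] << 16) |
	       ((uint32_t)s->bytes[2] << 8) |
	       (uint32_t)s->bytes[3];
}

/* Encode an incore counter as big endian bytes. */
static void suminfo_set_be(struct suminfo_ondisk *s, uint32_t v)
{
	s->bytes[0] = (v >> 24) & 0xff;
	s->bytes[1] = (v >> 16) & 0xff;
	s->bytes[2] = (v >> 8) & 0xff;
	s->bytes[3] = v & 0xff;
}

/*
 * Add a signed delta to the ondisk counter.  The addition is done in
 * unsigned 32-bit arithmetic, so a negative delta subtracts.
 */
static void suminfo_add_be(struct suminfo_ondisk *s, int delta)
{
	assert(s != 0);
	suminfo_set_be(s, suminfo_get_be(s) + (uint32_t)delta);
}
```

The decode, add, encode sequence is why the patch cannot keep the plain "raw += delta" form once the ondisk byte order is fixed.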
* [PATCH 19/22] xfs: scrub the realtime group superblock 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong ` (16 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 17/22] xfs: encode the rtsummary in big endian format Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 16/22] xfs: add block headers to realtime summary blocks Darrick J. Wong ` (3 subsequent siblings) 21 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Enable scrubbing of realtime group superblocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/Makefile | 1 + fs/xfs/libxfs/xfs_fs.h | 3 +- fs/xfs/scrub/common.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/common.h | 46 ++++++++++++++----------- fs/xfs/scrub/health.c | 1 + fs/xfs/scrub/rgsuper.c | 77 ++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/scrub.c | 20 ++++++++++- fs/xfs/scrub/scrub.h | 40 ++++++++++------------ fs/xfs/scrub/trace.h | 4 ++ 9 files changed, 236 insertions(+), 44 deletions(-) create mode 100644 fs/xfs/scrub/rgsuper.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 135a403c0edc..a02fb09fed64 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -178,6 +178,7 @@ xfs-y += $(addprefix scrub/, \ ) xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \ + rgsuper.o \ rtbitmap.o \ rtsummary.o \ ) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index e3d87665e4a5..c12be9dbb59d 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -741,9 +741,10 @@ struct xfs_scrub_metadata { #define XFS_SCRUB_TYPE_QUOTACHECK 25 /* quota counters */ #define XFS_SCRUB_TYPE_NLINKS 26 /* inode link counts */ #define XFS_SCRUB_TYPE_HEALTHY 27 /* everything checked out ok */ +#define XFS_SCRUB_TYPE_RGSUPER 28 /* realtime superblock */ /* Number of scrub subcommands. 
*/ -#define XFS_SCRUB_TYPE_NR 28 +#define XFS_SCRUB_TYPE_NR 29 /* i: Repair this metadata. */ #define XFS_SCRUB_IFLAG_REPAIR (1u << 0) diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 1b48726fcc65..b63b5c016841 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -34,6 +34,7 @@ #include "xfs_quota.h" #include "xfs_swapext.h" #include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -121,6 +122,17 @@ xchk_process_error( XFS_SCRUB_OFLAG_CORRUPT, __return_address); } +bool +xchk_process_rt_error( + struct xfs_scrub *sc, + xfs_rgnumber_t rgno, + xfs_rgblock_t rgbno, + int *error) +{ + return __xchk_process_error(sc, rgno, rgbno, error, + XFS_SCRUB_OFLAG_CORRUPT, __return_address); +} + bool xchk_xref_process_error( struct xfs_scrub *sc, @@ -132,6 +144,17 @@ xchk_xref_process_error( XFS_SCRUB_OFLAG_XFAIL, __return_address); } +bool +xchk_xref_process_rt_error( + struct xfs_scrub *sc, + xfs_rgnumber_t rgno, + xfs_rgblock_t rgbno, + int *error) +{ + return __xchk_process_error(sc, rgno, rgbno, error, + XFS_SCRUB_OFLAG_XFAIL, __return_address); +} + /* Check for operational errors for a file offset. */ static bool __xchk_fblock_process_error( @@ -691,6 +714,7 @@ xchk_rt_init( XCHK_RTLOCK_BITMAP_SHARED)) < 2); ASSERT(hweight32(rtlock_flags & (XCHK_RTLOCK_SUMMARY | XCHK_RTLOCK_SUMMARY_SHARED)) < 2); + ASSERT(sr->rtg == NULL); if (rtlock_flags & XCHK_RTLOCK_BITMAP) xfs_ilock(sc->mp->m_rbmip, XFS_ILOCK_EXCL); @@ -714,6 +738,8 @@ xchk_rt_unlock( struct xfs_scrub *sc, struct xchk_rt *sr) { + ASSERT(sr->rtg == NULL); + if (!sr->rtlock_flags) return; @@ -730,6 +756,68 @@ xchk_rt_unlock( sr->rtlock_flags = 0; } +#ifdef CONFIG_XFS_RT +/* + * For scrubbing a realtime group, grab all the in-core resources we'll need to + * check the metadata, which means taking the ILOCK of the realtime group's + * metadata inodes. 
Callers must not join these inodes to the transaction with + * non-zero lockflags or concurrency problems will result. The @rtglock_flags + * argument takes XFS_RTGLOCK_* flags. + */ +int +xchk_rtgroup_init( + struct xfs_scrub *sc, + xfs_rgnumber_t rgno, + struct xchk_rt *sr, + unsigned int rtglock_flags) +{ + ASSERT(sr->rtg == NULL); + ASSERT(sr->rtlock_flags == 0); + + sr->rtg = xfs_rtgroup_get(sc->mp, rgno); + if (!sr->rtg) + return -ENOENT; + + xfs_rtgroup_lock(NULL, sr->rtg, rtglock_flags); + sr->rtlock_flags = rtglock_flags; + return 0; +} + +/* + * Unlock the realtime group. This must be done /after/ committing (or + * cancelling) the scrub transaction. + */ +void +xchk_rtgroup_unlock( + struct xfs_scrub *sc, + struct xchk_rt *sr) +{ + ASSERT(sr->rtg != NULL); + + if (sr->rtlock_flags) { + xfs_rtgroup_unlock(sr->rtg, sr->rtlock_flags); + sr->rtlock_flags = 0; + } +} + +/* + * Unlock the realtime group and release its resources. This must be done + * /after/ committing (or cancelling) the scrub transaction. 
+ */ +void +xchk_rtgroup_free( + struct xfs_scrub *sc, + struct xchk_rt *sr) +{ + ASSERT(sr->rtg != NULL); + + xchk_rtgroup_unlock(sc, sr); + + xfs_rtgroup_put(sr->rtg); + sr->rtg = NULL; +} +#endif /* CONFIG_XFS_RT */ + /* Per-scrubber setup functions */ void diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index e41224065421..96bb8bc676e7 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -37,11 +37,15 @@ void xchk_trans_cancel(struct xfs_scrub *sc); bool xchk_process_error(struct xfs_scrub *sc, xfs_agnumber_t agno, xfs_agblock_t bno, int *error); +bool xchk_process_rt_error(struct xfs_scrub *sc, xfs_rgnumber_t rgno, + xfs_rgblock_t rgbno, int *error); bool xchk_fblock_process_error(struct xfs_scrub *sc, int whichfork, xfs_fileoff_t offset, int *error); bool xchk_xref_process_error(struct xfs_scrub *sc, xfs_agnumber_t agno, xfs_agblock_t bno, int *error); +bool xchk_xref_process_rt_error(struct xfs_scrub *sc, + xfs_rgnumber_t rgno, xfs_rgblock_t rgbno, int *error); bool xchk_fblock_xref_process_error(struct xfs_scrub *sc, int whichfork, xfs_fileoff_t offset, int *error); @@ -78,6 +82,11 @@ int xchk_checkpoint_log(struct xfs_mount *mp); bool xchk_should_check_xref(struct xfs_scrub *sc, int *error, struct xfs_btree_cur **curpp); +static inline int xchk_setup_nothing(struct xfs_scrub *sc) +{ + return -ENOENT; +} + /* Setup functions */ int xchk_setup_agheader(struct xfs_scrub *sc); int xchk_setup_fs(struct xfs_scrub *sc); @@ -95,17 +104,11 @@ int xchk_setup_parent(struct xfs_scrub *sc); #ifdef CONFIG_XFS_RT int xchk_setup_rtbitmap(struct xfs_scrub *sc); int xchk_setup_rtsummary(struct xfs_scrub *sc); +int xchk_setup_rgsuperblock(struct xfs_scrub *sc); #else -static inline int -xchk_setup_rtbitmap(struct xfs_scrub *sc) -{ - return -ENOENT; -} -static inline int -xchk_setup_rtsummary(struct xfs_scrub *sc) -{ - return -ENOENT; -} +# define xchk_setup_rtbitmap xchk_setup_nothing +# define xchk_setup_rtsummary xchk_setup_nothing +# define 
xchk_setup_rgsuperblock xchk_setup_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_ino_dqattach(struct xfs_scrub *sc); @@ -117,16 +120,8 @@ xchk_ino_dqattach(struct xfs_scrub *sc) { return 0; } -static inline int -xchk_setup_quota(struct xfs_scrub *sc) -{ - return -ENOENT; -} -static inline int -xchk_setup_quotacheck(struct xfs_scrub *sc) -{ - return -ENOENT; -} +# define xchk_setup_quota xchk_setup_nothing +# define xchk_setup_quotacheck xchk_setup_nothing #endif int xchk_setup_fscounters(struct xfs_scrub *sc); int xchk_setup_nlinks(struct xfs_scrub *sc); @@ -169,6 +164,17 @@ xchk_ag_init_existing( void xchk_rt_init(struct xfs_scrub *sc, struct xchk_rt *sr, unsigned int xchk_rtlock_flags); void xchk_rt_unlock(struct xfs_scrub *sc, struct xchk_rt *sr); + +#ifdef CONFIG_XFS_RT +int xchk_rtgroup_init(struct xfs_scrub *sc, xfs_rgnumber_t rgno, + struct xchk_rt *sr, unsigned int rtglock_flags); +void xchk_rtgroup_unlock(struct xfs_scrub *sc, struct xchk_rt *sr); +void xchk_rtgroup_free(struct xfs_scrub *sc, struct xchk_rt *sr); +#else +# define xchk_rtgroup_init(sc, rgno, sr, lockflags) (-ENOSYS) +# define xchk_rtgroup_free(sc, sr) ((void)0) +#endif /* CONFIG_XFS_RT */ + int xchk_ag_read_headers(struct xfs_scrub *sc, xfs_agnumber_t agno, struct xchk_ag *sa); void xchk_ag_btcur_free(struct xchk_ag *sa); diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c index 9a8d4c348cc9..a71d4d9087b2 100644 --- a/fs/xfs/scrub/health.c +++ b/fs/xfs/scrub/health.c @@ -112,6 +112,7 @@ static const struct xchk_health_map type_to_health_flag[XFS_SCRUB_TYPE_NR] = { [XFS_SCRUB_TYPE_FSCOUNTERS] = { XHG_FS, XFS_SICK_FS_COUNTERS }, [XFS_SCRUB_TYPE_QUOTACHECK] = { XHG_FS, XFS_SICK_FS_QUOTACHECK }, [XFS_SCRUB_TYPE_NLINKS] = { XHG_FS, XFS_SICK_FS_NLINKS }, + [XFS_SCRUB_TYPE_RGSUPER] = { XHG_RTGROUP, XFS_SICK_RT_SUPER }, }; /* Return the health status mask for this scrub type. 
*/ diff --git a/fs/xfs/scrub/rgsuper.c b/fs/xfs/scrub/rgsuper.c new file mode 100644 index 000000000000..a85ad580aa62 --- /dev/null +++ b/fs/xfs/scrub/rgsuper.c @@ -0,0 +1,77 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_rtgroup.h" +#include "scrub/scrub.h" +#include "scrub/common.h" + +/* Set us up with a transaction and an empty context. */ +int +xchk_setup_rgsuperblock( + struct xfs_scrub *sc) +{ + return xchk_trans_alloc(sc, 0); +} + +/* Cross-reference with the other rt metadata. */ +STATIC void +xchk_rgsuperblock_xref( + struct xfs_scrub *sc) +{ + struct xfs_mount *mp = sc->mp; + xfs_rgnumber_t rgno = sc->sr.rtg->rtg_rgno; + xfs_rtblock_t rtbno; + + if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + rtbno = xfs_rgbno_to_rtb(mp, rgno, 0); + xchk_xref_is_used_rt_space(sc, rtbno, 1); +} + +int +xchk_rgsuperblock( + struct xfs_scrub *sc) +{ + struct xfs_buf *bp; + xfs_rgnumber_t rgno = sc->sm->sm_agno; + int error; + + /* + * Grab an active reference to the rtgroup structure. If we can't get + * it, we're racing with something that's tearing down the group, so + * signal that the group no longer exists. Take the rtbitmap in shared + * mode so that the group can't change while we're doing things. + */ + error = xchk_rtgroup_init(sc, rgno, &sc->sr, XFS_RTGLOCK_BITMAP_SHARED); + if (error) + return error; + + /* + * If this is the primary rtgroup superblock, we know it passed the + * verifier checks at mount time and do not need to load the buffer + * again. + */ + if (sc->sr.rtg->rtg_rgno == 0) { + xchk_rgsuperblock_xref(sc); + return 0; + } + + /* The secondary rt super is checked by the read verifier. 
*/ + error = xfs_buf_read_uncached(sc->mp->m_rtdev_targp, XFS_RTSB_DADDR, + XFS_FSB_TO_BB(sc->mp, 1), 0, &bp, &xfs_rtsb_buf_ops); + if (!xchk_process_rt_error(sc, rgno, 0, &error)) + return error; + + xchk_rgsuperblock_xref(sc); + xfs_buf_relse(bp); + return 0; +} diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 1b3820b30384..6c54f00b516c 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -189,7 +189,10 @@ xchk_teardown( xfs_trans_cancel(sc->tp); sc->tp = NULL; } - xchk_rt_unlock(sc, &sc->sr); + if (sc->sr.rtg) + xchk_rtgroup_free(sc, &sc->sr); + else + xchk_rt_unlock(sc, &sc->sr); if (sc->ip) { if (sc->ilock_flags) xchk_iunlock(sc, sc->ilock_flags); @@ -406,6 +409,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .scrub = xchk_health_record, .repair = xrep_notsupported, }, + [XFS_SCRUB_TYPE_RGSUPER] = { /* realtime group superblock */ + .type = ST_RTGROUP, + .setup = xchk_setup_rgsuperblock, + .scrub = xchk_rgsuperblock, + .has = xfs_has_rtgroups, + .repair = xrep_notsupported, + }, }; static int @@ -453,6 +463,14 @@ xchk_validate_inputs( if (sm->sm_agno || (sm->sm_gen && !sm->sm_ino)) goto out; break; + case ST_RTGROUP: + if (sm->sm_ino || sm->sm_gen) + goto out; + if (!xfs_has_rtgroups(mp) && sm->sm_agno != 0) + goto out; + if (xfs_has_rtgroups(mp) && sm->sm_agno >= mp->m_sb.sb_rgcount) + goto out; + break; default: goto out; } diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 38437104fc86..6e5b96b6db81 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -23,6 +23,7 @@ enum xchk_type { ST_PERAG, /* per-AG metadata */ ST_FS, /* per-FS metadata */ ST_INODE, /* per-inode metadata */ + ST_RTGROUP, /* rtgroup metadata */ }; struct xchk_meta_ops { @@ -69,7 +70,13 @@ struct xchk_ag { /* Inode lock state for the RT volume. 
*/ struct xchk_rt { - /* XCHK_RTLOCK_* lock state */ + /* incore rtgroup, if applicable */ + struct xfs_rtgroup *rtg; + + /* + * XCHK_RTLOCK_* lock state if rtg == NULL, or XFS_RTGLOCK_* lock state + * if rtg != NULL. + */ unsigned int rtlock_flags; }; @@ -153,6 +160,11 @@ struct xfs_scrub { XCHK_FSHOOKS_NLINKS | \ XCHK_FSHOOKS_RMAP) +static inline int xchk_nothing(struct xfs_scrub *sc) +{ + return -ENOENT; +} + /* Metadata scrubbers */ int xchk_tester(struct xfs_scrub *sc); int xchk_superblock(struct xfs_scrub *sc); @@ -176,32 +188,18 @@ int xchk_parent(struct xfs_scrub *sc); #ifdef CONFIG_XFS_RT int xchk_rtbitmap(struct xfs_scrub *sc); int xchk_rtsummary(struct xfs_scrub *sc); +int xchk_rgsuperblock(struct xfs_scrub *sc); #else -static inline int -xchk_rtbitmap(struct xfs_scrub *sc) -{ - return -ENOENT; -} -static inline int -xchk_rtsummary(struct xfs_scrub *sc) -{ - return -ENOENT; -} +# define xchk_rtbitmap xchk_nothing +# define xchk_rtsummary xchk_nothing +# define xchk_rgsuperblock xchk_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_quota(struct xfs_scrub *sc); int xchk_quotacheck(struct xfs_scrub *sc); #else -static inline int -xchk_quota(struct xfs_scrub *sc) -{ - return -ENOENT; -} -static inline int -xchk_quotacheck(struct xfs_scrub *sc) -{ - return -ENOENT; -} +# define xchk_quota xchk_nothing +# define xchk_quotacheck xchk_nothing #endif int xchk_fscounters(struct xfs_scrub *sc); int xchk_nlinks(struct xfs_scrub *sc); diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 749cf4333c8a..a88ad16c90db 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -74,6 +74,7 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_FSCOUNTERS); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_QUOTACHECK); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_NLINKS); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_HEALTHY); +TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGSUPER); #define XFS_SCRUB_TYPE_STRINGS \ { XFS_SCRUB_TYPE_PROBE, "probe" }, \ @@ -103,7 +104,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_HEALTHY); { 
XFS_SCRUB_TYPE_FSCOUNTERS, "fscounters" }, \ { XFS_SCRUB_TYPE_QUOTACHECK, "quotacheck" }, \ { XFS_SCRUB_TYPE_NLINKS, "nlinks" }, \ - { XFS_SCRUB_TYPE_HEALTHY, "healthy" } + { XFS_SCRUB_TYPE_HEALTHY, "healthy" }, \ + { XFS_SCRUB_TYPE_RGSUPER, "rgsuper" } #define XFS_SCRUB_FLAG_STRINGS \ { XFS_SCRUB_IFLAG_REPAIR, "repair" }, \ ^ permalink raw reply related [flat|nested] 565+ messages in thread
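The new ST_RTGROUP case in xchk_validate_inputs is easy to misread in diff form, so here is the same decision written as a standalone predicate. The struct and function names are hypothetical simplifications; only the three checks are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical slice of the scrub request that matters for rtgroup types. */
struct scrub_req {
	uint64_t	ino;	/* must be zero for rtgroup scrubbers */
	uint32_t	gen;	/* must be zero for rtgroup scrubbers */
	uint32_t	agno;	/* reused to carry the rtgroup number */
};

/* Hypothetical slice of filesystem state. */
struct fs_info {
	bool		has_rtgroups;
	uint32_t	rgcount;	/* number of realtime groups */
};

/* Mirror of the ST_RTGROUP input checks: true means the request is sane. */
static bool rtgroup_scrub_inputs_ok(const struct scrub_req *sm,
				    const struct fs_info *fs)
{
	assert(sm != 0 && fs != 0);

	/* rtgroup scrubbers do not take an inode or a generation. */
	if (sm->ino || sm->gen)
		return false;
	/* Without rtgroups, only group number zero is accepted. */
	if (!fs->has_rtgroups && sm->agno != 0)
		return false;
	/* With rtgroups, the group number must be in range. */
	if (fs->has_rtgroups && sm->agno >= fs->rgcount)
		return false;
	return true;
}
```

Note that passing this check does not mean the scrubber will run; in the patch the .has = xfs_has_rtgroups hook in meta_scrub_ops still gates the scrub type on the feature.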
* [PATCH 16/22] xfs: add block headers to realtime summary blocks 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong ` (17 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 19/22] xfs: scrub the realtime group superblock Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 20/22] xfs: repair secondary realtime group superblocks Darrick J. Wong ` (2 subsequent siblings) 21 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Upgrade rtsummary blocks to have self describing metadata like most every other thing in XFS. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 1 + fs/xfs/libxfs/xfs_rtbitmap.c | 18 +++++++++++++++--- fs/xfs/libxfs/xfs_rtbitmap.h | 18 ++++++++++++++++-- fs/xfs/libxfs/xfs_shared.h | 1 + fs/xfs/scrub/rtsummary_repair.c | 15 +++++++++++++-- fs/xfs/xfs_buf_item_recover.c | 11 +++++++---- fs/xfs/xfs_rtalloc.c | 7 +++++-- 7 files changed, 58 insertions(+), 13 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index c7752aaa4478..47b2e31e2560 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1287,6 +1287,7 @@ static inline bool xfs_dinode_has_large_extent_counts( * RT bit manipulation macros. 
*/ #define XFS_RTBITMAP_MAGIC 0x424D505A /* BMPZ */ +#define XFS_RTSUMMARY_MAGIC 0x53554D59 /* SUMY */ struct xfs_rtbuf_blkinfo { __be32 rt_magic; /* validity check on block */ diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index 3e99afea78a6..3d5b14cc0f3a 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -53,7 +53,7 @@ xfs_rtbuf_verify_read( struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; xfs_failaddr_t fa; - if (!xfs_has_rtgroups(mp) || bp->b_ops != &xfs_rtbitmap_buf_ops) + if (!xfs_has_rtgroups(mp)) return; if (!xfs_log_check_lsn(mp, be64_to_cpu(hdr->rt_lsn))) { @@ -84,7 +84,7 @@ xfs_rtbuf_verify_write( struct xfs_buf_log_item *bip = bp->b_log_item; xfs_failaddr_t fa; - if (!xfs_has_rtgroups(mp) || bp->b_ops != &xfs_rtbitmap_buf_ops) + if (!xfs_has_rtgroups(mp)) return; fa = xfs_rtbuf_verify(bp); @@ -112,6 +112,14 @@ const struct xfs_buf_ops xfs_rtbitmap_buf_ops = { .verify_struct = xfs_rtbuf_verify, }; +const struct xfs_buf_ops xfs_rtsummary_buf_ops = { + .name = "xfs_rtsummary", + .magic = { 0, cpu_to_be32(XFS_RTSUMMARY_MAGIC) }, + .verify_read = xfs_rtbuf_verify_read, + .verify_write = xfs_rtbuf_verify_write, + .verify_struct = xfs_rtbuf_verify, +}; + /* * Get a buffer for the bitmap or summary file block specified. * The buffer is returned read and locked. 
@@ -153,7 +161,7 @@ xfs_rtbuf_get( if (error) return error; - if (xfs_has_rtgroups(mp) && !issum) { + if (xfs_has_rtgroups(mp)) { struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; if (hdr->rt_owner != cpu_to_be64(ip->i_ino)) { @@ -1321,6 +1329,10 @@ xfs_rtsummary_blockcount( unsigned long long rsumwords; rsumwords = (unsigned long long)rsumlevels * rbmblocks; + + if (xfs_has_rtgroups(mp)) + return howmany_64(rsumwords, mp->m_blockwsize); + return XFS_B_TO_FSB(mp, rsumwords << XFS_WORDLOG); } diff --git a/fs/xfs/libxfs/xfs_rtbitmap.h b/fs/xfs/libxfs/xfs_rtbitmap.h index c1f740fd27b8..cebbb72c4376 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.h +++ b/fs/xfs/libxfs/xfs_rtbitmap.h @@ -181,6 +181,9 @@ xfs_rtsumoffs_to_block( struct xfs_mount *mp, xfs_rtsumoff_t rsumoff) { + if (xfs_has_rtgroups(mp)) + return rsumoff / mp->m_blockwsize; + return XFS_B_TO_FSBT(mp, rsumoff * sizeof(xfs_suminfo_t)); } @@ -195,16 +198,24 @@ xfs_rtsumoffs_to_infoword( { unsigned int mask = mp->m_blockmask >> XFS_SUMINFOLOG; + if (xfs_has_rtgroups(mp)) + return rsumoff % mp->m_blockwsize; + return rsumoff & mask; } /* Return a pointer to a summary info word within a rt summary block buffer. 
*/ static inline union xfs_suminfo_ondisk * xfs_rsumbuf_infoptr( + struct xfs_mount *mp, void *buf, unsigned int infoword) { union xfs_suminfo_ondisk *infop = buf; + struct xfs_rtbuf_blkinfo *hdr = buf; + + if (xfs_has_rtgroups(mp)) + infop = (union xfs_suminfo_ondisk *)(hdr + 1); return &infop[infoword]; } @@ -215,7 +226,7 @@ xfs_rsumblock_infoptr( struct xfs_buf *bp, unsigned int infoword) { - return xfs_rsumbuf_infoptr(bp->b_addr, infoword); + return xfs_rsumbuf_infoptr(bp->b_mount, bp->b_addr, infoword); } static inline const struct xfs_buf_ops * @@ -223,8 +234,11 @@ xfs_rtblock_ops( struct xfs_mount *mp, bool issum) { - if (xfs_has_rtgroups(mp) && !issum) + if (xfs_has_rtgroups(mp)) { + if (issum) + return &xfs_rtsummary_buf_ops; return &xfs_rtbitmap_buf_ops; + } return &xfs_rtbuf_ops; } diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h index 1c86163915cf..62839fc87b50 100644 --- a/fs/xfs/libxfs/xfs_shared.h +++ b/fs/xfs/libxfs/xfs_shared.h @@ -39,6 +39,7 @@ extern const struct xfs_buf_ops xfs_inode_buf_ra_ops; extern const struct xfs_buf_ops xfs_refcountbt_buf_ops; extern const struct xfs_buf_ops xfs_rmapbt_buf_ops; extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops; +extern const struct xfs_buf_ops xfs_rtsummary_buf_ops; extern const struct xfs_buf_ops xfs_rtbuf_ops; extern const struct xfs_buf_ops xfs_rtsb_buf_ops; extern const struct xfs_buf_ops xfs_sb_buf_ops; diff --git a/fs/xfs/scrub/rtsummary_repair.c b/fs/xfs/scrub/rtsummary_repair.c index 713b79a1f52a..0836c1e10504 100644 --- a/fs/xfs/scrub/rtsummary_repair.c +++ b/fs/xfs/scrub/rtsummary_repair.c @@ -88,13 +88,24 @@ xrep_rtsummary_prep_buf( struct xfs_mount *mp = sc->mp; int error; - bp->b_ops = &xfs_rtbuf_ops; - error = xfsum_copyout(sc, rs->prep_wordoff, xfs_rsumblock_infoptr(bp, 0), mp->m_blockwsize); if (error) return error; + if (xfs_has_rtgroups(sc->mp)) { + struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; + + hdr->rt_magic = cpu_to_be32(XFS_RTSUMMARY_MAGIC); + hdr->rt_owner = 
cpu_to_be64(sc->ip->i_ino); + hdr->rt_blkno = cpu_to_be64(xfs_buf_daddr(bp)); + hdr->rt_lsn = 0; + uuid_copy(&hdr->rt_uuid, &sc->mp->m_sb.sb_meta_uuid); + bp->b_ops = &xfs_rtsummary_buf_ops; + } else { + bp->b_ops = &xfs_rtbuf_ops; + } + rs->prep_wordoff += mp->m_blockwsize; xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_RTSUMMARY_BUF); return 0; diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c index 4dcd5d9d2c7c..b74d40f5beb1 100644 --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -399,7 +399,10 @@ xlog_recover_validate_buf_type( bp->b_ops = xfs_rtblock_ops(mp, false); break; case XFS_BLFT_RTSUMMARY_BUF: - /* no magic numbers for verification of RT buffers */ + if (xfs_has_rtgroups(mp) && magic32 != XFS_RTSUMMARY_MAGIC) { + warnmsg = "Bad rtsummary magic!"; + break; + } bp->b_ops = xfs_rtblock_ops(mp, true); break; #endif /* CONFIG_XFS_RT */ @@ -735,13 +738,13 @@ xlog_recover_get_buf_lsn( * UUIDs, so we must recover them immediately. */ blft = xfs_blft_from_flags(buf_f); - if (!xfs_has_rtgroups(mp) && blft == XFS_BLFT_RTBITMAP_BUF) - goto recover_immediately; - if (blft == XFS_BLFT_RTSUMMARY_BUF) + if (!xfs_has_rtgroups(mp) && (blft == XFS_BLFT_RTBITMAP_BUF || + blft == XFS_BLFT_RTSUMMARY_BUF)) goto recover_immediately; magic32 = be32_to_cpu(*(__be32 *)blk); switch (magic32) { + case XFS_RTSUMMARY_MAGIC: case XFS_RTBITMAP_MAGIC: { struct xfs_rtbuf_blkinfo *hdr = blk; diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 9e013a8e3149..f8f0557dc46c 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -814,10 +814,13 @@ xfs_growfs_init_rtbuf( bp->b_ops = xfs_rtblock_ops(mp, buf_type == XFS_BLFT_RTSUMMARY_BUF); memset(bp->b_addr, 0, mp->m_sb.sb_blocksize); - if (xfs_has_rtgroups(mp) && buf_type == XFS_BLFT_RTBITMAP_BUF) { + if (xfs_has_rtgroups(mp)) { struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; - hdr->rt_magic = cpu_to_be32(XFS_RTBITMAP_MAGIC); + if (buf_type == XFS_BLFT_RTBITMAP_BUF) + hdr->rt_magic 
= cpu_to_be32(XFS_RTBITMAP_MAGIC); + else + hdr->rt_magic = cpu_to_be32(XFS_RTSUMMARY_MAGIC); hdr->rt_owner = cpu_to_be64(ip->i_ino); hdr->rt_blkno = cpu_to_be64(d); uuid_copy(&hdr->rt_uuid, &mp->m_sb.sb_meta_uuid); ^ permalink raw reply related [flat|nested] 565+ messages in thread
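For readers following the recovery hunk above: when rtgroups is enabled, a logged rtsummary buffer is now rejected unless its first big-endian word is XFS_RTSUMMARY_MAGIC. Below is a minimal userspace sketch of that kind of check. Every demo_ name, both magic values, and the bitmap warning string are assumptions made up for illustration; the real constants live in xfs_format.h and are not shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic values for this sketch only. */
#define DEMO_RTBITMAP_MAGIC	0x58524242u	/* bytes "XRBB" */
#define DEMO_RTSUMMARY_MAGIC	0x58525355u	/* bytes "XRSU" */

enum demo_buf_type {
	DEMO_RTBITMAP_BUF,
	DEMO_RTSUMMARY_BUF,
};

/* Decode the big-endian 32-bit magic held in the first four bytes of a block. */
static uint32_t demo_read_magic(const unsigned char *blk)
{
	assert(blk != NULL);
	return ((uint32_t)blk[0] << 24) | ((uint32_t)blk[1] << 16) |
	       ((uint32_t)blk[2] << 8) | (uint32_t)blk[3];
}

/*
 * Return NULL if the block may be recovered as the given buffer type, or a
 * warning string if the magic does not match.  The strings are illustrative.
 */
static const char *demo_validate_rtbuf(int has_rtgroups,
				       enum demo_buf_type type,
				       const unsigned char *blk)
{
	uint32_t magic;

	/* Without rtgroups the blocks carry no header, so there is nothing to check. */
	if (!has_rtgroups)
		return NULL;

	magic = demo_read_magic(blk);
	switch (type) {
	case DEMO_RTBITMAP_BUF:
		return magic == DEMO_RTBITMAP_MAGIC ? NULL : "Bad rtbitmap magic!";
	case DEMO_RTSUMMARY_BUF:
		return magic == DEMO_RTSUMMARY_MAGIC ? NULL : "Bad rtsummary magic!";
	}
	return "Unknown rt buffer type";
}
```

The point of giving bitmap and summary blocks distinct magics is that a block of one kind can no longer be replayed over the other without being noticed.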
* [PATCH 20/22] xfs: repair secondary realtime group superblocks 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong ` (18 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 16/22] xfs: add block headers to realtime summary blocks Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 22/22] xfs: enable realtime group feature Darrick J. Wong 2022-12-30 22:17 ` [PATCH 18/22] xfs: store rtgroup information with a bmap intent Darrick J. Wong 21 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Repair secondary realtime group superblocks. They're not critical for anything, but some consistency would be a good idea. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/Makefile | 1 + fs/xfs/libxfs/xfs_rtgroup.c | 2 +- fs/xfs/libxfs/xfs_rtgroup.h | 3 +++ fs/xfs/scrub/repair.h | 3 +++ fs/xfs/scrub/rgsuper_repair.c | 48 +++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/scrub.c | 2 +- 6 files changed, 57 insertions(+), 2 deletions(-) create mode 100644 fs/xfs/scrub/rgsuper_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index a02fb09fed64..4bf6d663272b 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -217,6 +217,7 @@ xfs-y += $(addprefix scrub/, \ ) xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \ + rgsuper_repair.o \ rtbitmap_repair.o \ rtsummary_repair.o \ ) diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index a428dff81888..4d9e2c0f2fd3 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -384,7 +384,7 @@ xfs_rtgroup_log_super( } /* Initialize a secondary realtime superblock. 
*/ -static int +int xfs_rtgroup_init_secondary_super( struct xfs_mount *mp, xfs_rgnumber_t rgno, diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index 1fec49c496d4..3c9572677f79 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -210,6 +210,8 @@ void xfs_rtgroup_update_super(struct xfs_buf *rtsb_bp, const struct xfs_buf *sb_bp); void xfs_rtgroup_log_super(struct xfs_trans *tp, const struct xfs_buf *sb_bp); int xfs_rtgroup_update_secondary_sbs(struct xfs_mount *mp); +int xfs_rtgroup_init_secondary_super(struct xfs_mount *mp, xfs_rgnumber_t rgno, + struct xfs_buf **bpp); /* Lock the rt bitmap inode in exclusive mode */ #define XFS_RTGLOCK_BITMAP (1U << 0) @@ -230,6 +232,7 @@ int xfs_rtgroup_get_geometry(struct xfs_rtgroup *rtg, # define xfs_rtgroup_update_super(bp, sb_bp) ((void)0) # define xfs_rtgroup_log_super(tp, sb_bp) ((void)0) # define xfs_rtgroup_update_secondary_sbs(mp) (0) +# define xfs_rtgroup_init_secondary_super(mp, rgno, bpp) (-EOPNOTSUPP) # define xfs_rtgroup_lock(tp, rtg, gf) ((void)0) # define xfs_rtgroup_unlock(rtg, gf) ((void)0) # define xfs_rtgroup_get_geometry(rtg, rgeo) (-EOPNOTSUPP) diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index c6461acd1112..292e252efae3 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -131,9 +131,11 @@ int xrep_symlink(struct xfs_scrub *sc); #ifdef CONFIG_XFS_RT int xrep_rtbitmap(struct xfs_scrub *sc); int xrep_rtsummary(struct xfs_scrub *sc); +int xrep_rgsuperblock(struct xfs_scrub *sc); #else # define xrep_rtbitmap xrep_notsupported # define xrep_rtsummary xrep_notsupported +# define xrep_rgsuperblock xrep_notsupported #endif /* CONFIG_XFS_RT */ #ifdef CONFIG_XFS_QUOTA @@ -248,6 +250,7 @@ static inline int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *x) #define xrep_directory xrep_notsupported #define xrep_parent xrep_notsupported #define xrep_symlink xrep_notsupported +#define xrep_rgsuperblock xrep_notsupported #endif /* 
CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/rgsuper_repair.c b/fs/xfs/scrub/rgsuper_repair.c new file mode 100644 index 000000000000..9dc379c593ba --- /dev/null +++ b/fs/xfs/scrub/rgsuper_repair.c @@ -0,0 +1,48 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_inode.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_rtgroup.h" +#include "xfs_sb.h" +#include "scrub/scrub.h" +#include "scrub/repair.h" + +int +xrep_rgsuperblock( + struct xfs_scrub *sc) +{ + struct xfs_buf *bp; + int error; + + /* + * If this is the primary rtgroup superblock, log a superblock update + * to force both to disk. + */ + if (sc->sr.rtg->rtg_rgno == 0) { + xfs_log_sb(sc->tp); + return 0; + } + + /* Otherwise just write a new secondary to disk directly. */ + error = xfs_rtgroup_init_secondary_super(sc->mp, sc->sr.rtg->rtg_rgno, + &bp); + if (error) + return error; + + error = xfs_bwrite(bp); + xfs_buf_relse(bp); + return error; +} diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 6c54f00b516c..5e07150e8f14 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -414,7 +414,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .setup = xchk_setup_rgsuperblock, .scrub = xchk_rgsuperblock, .has = xfs_has_rtgroups, - .repair = xrep_notsupported, + .repair = xrep_rgsuperblock, }, }; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 22/22] xfs: enable realtime group feature 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong ` (19 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 20/22] xfs: repair secondary realtime group superblocks Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 18/22] xfs: store rtgroup information with a bmap intent Darrick J. Wong 21 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 7e76bedda688..e4f3b2c5c054 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -416,6 +416,7 @@ xfs_sb_has_ro_compat_feature( XFS_SB_FEAT_INCOMPAT_BIGTIME| \ XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR| \ XFS_SB_FEAT_INCOMPAT_NREXT64 | \ + XFS_SB_FEAT_INCOMPAT_RTGROUPS | \ XFS_SB_FEAT_INCOMPAT_METADIR) #define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL ^ permalink raw reply related [flat|nested] 565+ messages in thread
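The patch above carries no commit message, so here is a hedged reading of the one-line diff: adding XFS_SB_FEAT_INCOMPAT_RTGROUPS to XFS_SB_FEAT_INCOMPAT_ALL removes it from XFS_SB_FEAT_INCOMPAT_UNKNOWN, which is defined as the complement of the ALL mask. The sketch below shows the general mask test with made-up bit positions; the demo_ names and bit values are assumptions, not the on-disk values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, chosen only for this sketch. */
#define DEMO_FEAT_INCOMPAT_BIGTIME	(1u << 0)
#define DEMO_FEAT_INCOMPAT_METADIR	(1u << 1)
#define DEMO_FEAT_INCOMPAT_RTGROUPS	(1u << 2)

/* Supported mask before and after a patch like the one above. */
#define DEMO_FEAT_INCOMPAT_ALL_OLD \
	(DEMO_FEAT_INCOMPAT_BIGTIME | DEMO_FEAT_INCOMPAT_METADIR)
#define DEMO_FEAT_INCOMPAT_ALL_NEW \
	(DEMO_FEAT_INCOMPAT_ALL_OLD | DEMO_FEAT_INCOMPAT_RTGROUPS)

/* True if the superblock sets any incompat bit outside the supported mask. */
static bool demo_has_unknown_incompat(uint32_t sb_features, uint32_t supported)
{
	return (sb_features & ~supported) != 0;
}
```

With the old mask a superblock carrying the rtgroups bit counts as having an unknown incompat feature; with the new mask it does not.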
* [PATCH 18/22] xfs: store rtgroup information with a bmap intent 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong ` (20 preceding siblings ...) 2022-12-30 22:17 ` [PATCH 22/22] xfs: enable realtime group feature Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 21 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make the bmap intent items take an active reference to the rtgroup containing the space that is being mapped or unmapped. We will need this functionality once we start enabling rmap and reflink on the rt volume. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap.h | 5 ++++- fs/xfs/xfs_bmap_item.c | 18 ++++++++++++++++-- 2 files changed, 20 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h index d870c6a62e40..05097b1d5c7d 100644 --- a/fs/xfs/libxfs/xfs_bmap.h +++ b/fs/xfs/libxfs/xfs_bmap.h @@ -241,7 +241,10 @@ struct xfs_bmap_intent { enum xfs_bmap_intent_type bi_type; int bi_whichfork; struct xfs_inode *bi_owner; - struct xfs_perag *bi_pag; + union { + struct xfs_perag *bi_pag; + struct xfs_rtgroup *bi_rtg; + }; struct xfs_bmbt_irec bi_bmap; }; diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index bf52d30d7d1c..04eeae9aef79 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -25,6 +25,7 @@ #include "xfs_log_priv.h" #include "xfs_log_recover.h" #include "xfs_ag.h" +#include "xfs_rtgroup.h" struct kmem_cache *xfs_bui_cache; struct kmem_cache *xfs_bud_cache; @@ -362,8 +363,18 @@ xfs_bmap_update_get_group( { xfs_agnumber_t agno; - if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork)) + if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork)) { + if (xfs_has_rtgroups(mp)) { + xfs_rgnumber_t rgno; + + rgno = xfs_rtb_to_rgno(mp, bi->bi_bmap.br_startblock); + bi->bi_rtg = 
xfs_rtgroup_get(mp, rgno); + } else { + bi->bi_rtg = NULL; + } + return; + } agno = XFS_FSB_TO_AGNO(mp, bi->bi_bmap.br_startblock); bi->bi_pag = xfs_perag_get(mp, agno); @@ -383,8 +394,11 @@ static inline void xfs_bmap_update_put_group( struct xfs_bmap_intent *bi) { - if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork)) + if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork)) { + if (xfs_has_rtgroups(bi->bi_owner->i_mount)) + xfs_rtgroup_put(bi->bi_rtg); return; + } xfs_perag_drop_intents(bi->bi_pag); xfs_perag_put(bi->bi_pag); ^ permalink raw reply related [flat|nested] 565+ messages in thread
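The get/put pairing in the patch above can be modelled in userspace as follows. This is only a sketch: the demo_ types are hypothetical stand-ins, a plain bool replaces xfs_ifork_is_realtime(), integer counters replace the real reference counting, and the xfs_perag_drop_intents() call is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_perag   { int refs; };
struct demo_rtgroup { int refs; };

struct demo_intent {
	bool is_realtime;	/* stand-in for xfs_ifork_is_realtime() */
	union {			/* only one member is meaningful at a time */
		struct demo_perag   *pag;
		struct demo_rtgroup *rtg;
	};
};

/* Take a reference on whichever group backs the mapping. */
static void demo_intent_get_group(struct demo_intent *bi, bool has_rtgroups,
				  struct demo_perag *pag,
				  struct demo_rtgroup *rtg)
{
	assert(bi != NULL);
	if (bi->is_realtime) {
		if (has_rtgroups) {
			rtg->refs++;
			bi->rtg = rtg;
		} else {
			/* Realtime without rtgroups: nothing to pin. */
			bi->rtg = NULL;
		}
		return;
	}
	pag->refs++;
	bi->pag = pag;
}

/* Drop the reference taken above, using the same discriminator. */
static void demo_intent_put_group(struct demo_intent *bi, bool has_rtgroups)
{
	assert(bi != NULL);
	if (bi->is_realtime) {
		if (has_rtgroups)
			bi->rtg->refs--;
		return;
	}
	bi->pag->refs--;
}
```

Because the two pointers share storage, get and put have to agree on the discriminator. That appears to be why both functions in the patch test xfs_ifork_is_realtime() before touching either member.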
* [PATCHSET v1.0 0/3] xfsprogs: enable FITRIM for the realtime section 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 1/3] xfs: hoist data device FITRIM AG iteration to a separate function Darrick J. Wong ` (2 more replies) 2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: enable in-core block reservation for rt metadata Darrick J. Wong ` (30 subsequent siblings) 39 siblings, 3 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs Hi all, One thing that's been missing for a long time is the ability to tell underlying storage that it can unmap the unused space on the realtime device. This short series exposes this functionality through FITRIM. Callers that want ranged FITRIM should be aware that the realtime space exists in the offset range after the data device. However, it is anticipated that most callers pass in offset=0 len=-1ULL and will not notice or care. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. --D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-discard --- fs/xfs/xfs_discard.c | 167 +++++++++++++++++++++++++++++++++++++++++++------- fs/xfs/xfs_trace.h | 20 ++++++ 2 files changed, 164 insertions(+), 23 deletions(-) ^ permalink raw reply [flat|nested] 565+ messages in thread
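To make the "realtime space exists in the offset range after the data device" remark in the cover letter above concrete, here is a small arithmetic sketch of how one linear range could be split between the two devices. It assumes inclusive sector ranges and uses hypothetical demo_ names; it illustrates the layout only and does not reproduce the kernel's exact bounds checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range of sectors on one device. */
struct demo_range {
	uint64_t start;
	uint64_t end;
};

/* Clamp [start, end] to the data device, which occupies [0, dsize - 1]. */
static bool demo_ddev_range(uint64_t start, uint64_t end, uint64_t dsize,
			    struct demo_range *out)
{
	assert(out != NULL);
	if (dsize == 0 || start >= dsize)
		return false;
	out->start = start;
	out->end = end < dsize - 1 ? end : dsize - 1;
	return out->start <= out->end;
}

/*
 * Map [start, end] onto the realtime device, which occupies the linear
 * range [dsize, dsize + rsize - 1], and shift it down to device-relative
 * sectors.
 */
static bool demo_rtdev_range(uint64_t start, uint64_t end, uint64_t dsize,
			     uint64_t rsize, struct demo_range *out)
{
	assert(out != NULL);
	if (rsize == 0 || end < dsize)
		return false;
	out->start = start > dsize ? start - dsize : 0;
	out->end = end - dsize;
	if (out->end > rsize - 1)
		out->end = rsize - 1;
	return out->start <= out->end;
}
```

A caller passing offset=0 and len=-1ULL ends up covering both devices in full, which matches the expectation in the cover letter that most callers will not notice the new layout.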
* [PATCH 1/3] xfs: hoist data device FITRIM AG iteration to a separate function 2022-12-30 22:17 ` [PATCHSET v1.0 0/3] xfsprogs: enable FITRIM for the realtime section Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 2/3] xfs: convert xfs_trim_extents to use perag iteration macros Darrick J. Wong 2022-12-30 22:17 ` [PATCH 3/3] xfs: enable FITRIM on the realtime device Darrick J. Wong 2 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Hoist the AG iteration loop logic out of xfs_ioc_trim and into a separate function. No functional changes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_discard.c | 50 ++++++++++++++++++++++++++++++++++---------------- 1 file changed, 34 insertions(+), 16 deletions(-) diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c index 44658cc7d3f2..7459c5205a6b 100644 --- a/fs/xfs/xfs_discard.c +++ b/fs/xfs/xfs_discard.c @@ -140,6 +140,35 @@ xfs_trim_extents( return error; } +static int +xfs_trim_ddev_extents( + struct xfs_mount *mp, + xfs_daddr_t start, + xfs_daddr_t end, + xfs_daddr_t minlen, + uint64_t *blocks_trimmed) +{ + xfs_agnumber_t start_agno, end_agno, agno; + int error, last_error = 0; + + if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1) + end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1; + + start_agno = xfs_daddr_to_agno(mp, start); + end_agno = xfs_daddr_to_agno(mp, end); + + for (agno = start_agno; agno <= end_agno; agno++) { + error = xfs_trim_extents(mp, agno, start, end, minlen, + blocks_trimmed); + if (error == -ERESTARTSYS) + return error; + if (error) + last_error = error; + } + + return last_error; +} + /* * trim a range of the filesystem. 
* @@ -158,7 +187,6 @@ xfs_ioc_trim( unsigned int granularity = bdev_discard_granularity(bdev); struct fstrim_range range; xfs_daddr_t start, end, minlen; - xfs_agnumber_t start_agno, end_agno, agno; uint64_t blocks_trimmed = 0; int error, last_error = 0; @@ -194,21 +222,11 @@ xfs_ioc_trim( start = BTOBB(range.start); end = start + BTOBBT(range.len) - 1; - if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1) - end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks)- 1; - - start_agno = xfs_daddr_to_agno(mp, start); - end_agno = xfs_daddr_to_agno(mp, end); - - for (agno = start_agno; agno <= end_agno; agno++) { - error = xfs_trim_extents(mp, agno, start, end, minlen, - &blocks_trimmed); - if (error) { - last_error = error; - if (error == -ERESTARTSYS) - break; - } - } + error = xfs_trim_ddev_extents(mp, start, end, minlen, &blocks_trimmed); + if (error == -ERESTARTSYS) + return error; + if (error) + last_error = error; if (last_error) return last_error; ^ permalink raw reply related [flat|nested] 565+ messages in thread
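The error handling that the hoisted loop above keeps can be summarised as: remember the most recent failure, keep going, but bail out at once on -ERESTARTSYS. Here is a minimal userspace sketch of that pattern. The demo_ names and the callback are hypothetical, and DEMO_ERESTARTSYS is defined locally because ERESTARTSYS is a kernel-internal code that userspace errno.h does not provide.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_ERESTARTSYS 512	/* mirrors the kernel-internal value */

typedef int (*demo_trim_fn)(unsigned int agno, void *priv);

/*
 * Call trim_one for every group in [start_agno, end_agno].  Ordinary errors
 * are remembered and the walk continues; an interrupted call ends the walk
 * immediately.
 */
static int demo_trim_range(unsigned int start_agno, unsigned int end_agno,
			   demo_trim_fn trim_one, void *priv)
{
	unsigned int agno;
	int error, last_error = 0;

	assert(trim_one != NULL);
	for (agno = start_agno; agno <= end_agno; agno++) {
		error = trim_one(agno, priv);
		if (error == -DEMO_ERESTARTSYS)
			return error;
		if (error)
			last_error = error;
	}
	return last_error;
}

/* Test double: fails with plan->err on one chosen group, succeeds elsewhere. */
struct demo_fail_plan {
	unsigned int fail_agno;
	int err;
	unsigned int visited;
};

static int demo_fail_on(unsigned int agno, void *priv)
{
	struct demo_fail_plan *plan = priv;

	plan->visited++;
	return agno == plan->fail_agno ? plan->err : 0;
}
```

With an I/O error on one group the walk still visits the remaining groups and reports the error at the end; with an interruption it stops at the group that was interrupted.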
* [PATCH 2/3] xfs: convert xfs_trim_extents to use perag iteration macros 2022-12-30 22:17 ` [PATCHSET v1.0 0/3] xfsprogs: enable FITRIM for the realtime section Darrick J. Wong 2022-12-30 22:17 ` [PATCH 1/3] xfs: hoist data device FITRIM AG iteration to a separate function Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2022-12-30 22:17 ` [PATCH 3/3] xfs: enable FITRIM on the realtime device Darrick J. Wong 2 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert the AG iteration loop to use the ranged perag iteration macro, remove the perag_get/put calls from xfs_trim_extents, and rename the function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_discard.c | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c index 7459c5205a6b..6d3400771e21 100644 --- a/fs/xfs/xfs_discard.c +++ b/fs/xfs/xfs_discard.c @@ -21,24 +21,22 @@ #include "xfs_health.h" STATIC int -xfs_trim_extents( - struct xfs_mount *mp, - xfs_agnumber_t agno, +xfs_trim_perag_extents( + struct xfs_perag *pag, xfs_daddr_t start, xfs_daddr_t end, xfs_daddr_t minlen, uint64_t *blocks_trimmed) { + struct xfs_mount *mp = pag->pag_mount; struct block_device *bdev = xfs_buftarg_bdev(mp->m_ddev_targp); struct xfs_btree_cur *cur; struct xfs_buf *agbp; struct xfs_agf *agf; - struct xfs_perag *pag; + xfs_agnumber_t agno = pag->pag_agno; int error; int i; - pag = xfs_perag_get(mp, agno); - /* * Force out the log. 
This means any transactions that might have freed * space before we take the AGF buffer lock are now on disk, and the @@ -48,7 +46,7 @@ xfs_trim_extents( error = xfs_alloc_read_agf(pag, NULL, 0, &agbp); if (error) - goto out_put_perag; + return error; agf = agbp->b_addr; cur = xfs_allocbt_init_cursor(mp, NULL, agbp, pag, XFS_BTNUM_CNT); @@ -135,8 +133,6 @@ xfs_trim_extents( out_del_cursor: xfs_btree_del_cursor(cur, error); xfs_buf_relse(agbp); -out_put_perag: - xfs_perag_put(pag); return error; } @@ -148,7 +144,8 @@ xfs_trim_ddev_extents( xfs_daddr_t minlen, uint64_t *blocks_trimmed) { - xfs_agnumber_t start_agno, end_agno, agno; + struct xfs_perag *pag; + xfs_agnumber_t start_agno, end_agno; int error, last_error = 0; if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks) - 1) @@ -157,11 +154,13 @@ xfs_trim_ddev_extents( start_agno = xfs_daddr_to_agno(mp, start); end_agno = xfs_daddr_to_agno(mp, end); - for (agno = start_agno; agno <= end_agno; agno++) { - error = xfs_trim_extents(mp, agno, start, end, minlen, + for_each_perag_range(mp, start_agno, end_agno, pag) { + error = xfs_trim_perag_extents(pag, start, end, minlen, blocks_trimmed); - if (error == -ERESTARTSYS) + if (error == -ERESTARTSYS) { + xfs_perag_put(pag); return error; + } if (error) last_error = error; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
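One detail of the conversion above is easy to miss: the early return on -ERESTARTSYS now needs an explicit xfs_perag_put(). The sketch below shows why, assuming the iteration macro holds a reference on the current group and only drops it when advancing. That is my reading of the diff rather than something this message states, and every demo_ name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NGROUPS 4

struct demo_group { int refs; };
struct demo_mount { struct demo_group groups[DEMO_NGROUPS]; };

static struct demo_group *demo_group_get(struct demo_mount *mp,
					 unsigned int index)
{
	if (index >= DEMO_NGROUPS)
		return NULL;
	mp->groups[index].refs++;
	return &mp->groups[index];
}

static void demo_group_put(struct demo_group *g)
{
	assert(g != NULL && g->refs > 0);
	g->refs--;
}

/* Drop the current group and take the next one, or return NULL at the end. */
static struct demo_group *demo_group_next(struct demo_mount *mp,
					  struct demo_group *g,
					  unsigned int *index,
					  unsigned int end)
{
	demo_group_put(g);
	if (*index >= end)
		return NULL;
	(*index)++;
	return demo_group_get(mp, *index);
}

/* The index argument is used as the cursor; callers should pass start <= end. */
#define demo_for_each_group_range(mp, index, end, g) \
	for ((g) = demo_group_get((mp), (index)); (g) != NULL; \
	     (g) = demo_group_next((mp), (g), &(index), (end)))

/* Walk [start, end], leaving early at stop_at.  Returns groups visited. */
static int demo_walk(struct demo_mount *mp, unsigned int start,
		     unsigned int end, unsigned int stop_at)
{
	struct demo_group *g;
	int visited = 0;

	demo_for_each_group_range(mp, start, end, g) {
		visited++;
		if (start == stop_at) {
			/* Leaving the loop body early: drop the ref ourselves. */
			demo_group_put(g);
			return visited;
		}
	}
	return visited;
}

static int demo_refs_outstanding(const struct demo_mount *mp)
{
	int i, total = 0;

	for (i = 0; i < DEMO_NGROUPS; i++)
		total += mp->groups[i].refs;
	return total;
}
```

Without the put on the early-exit path, the group being processed at the time of the interruption would keep its reference for good.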
* [PATCH 3/3] xfs: enable FITRIM on the realtime device 2022-12-30 22:17 ` [PATCHSET v1.0 0/3] xfsprogs: enable FITRIM for the realtime section Darrick J. Wong 2022-12-30 22:17 ` [PATCH 1/3] xfs: hoist data device FITRIM AG iteration to a separate function Darrick J. Wong 2022-12-30 22:17 ` [PATCH 2/3] xfs: convert xfs_trim_extents to use perag iteration macros Darrick J. Wong @ 2022-12-30 22:17 ` Darrick J. Wong 2 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:17 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_discard.c | 106 ++++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_trace.h | 20 +++++++++ 2 files changed, 125 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_discard.c b/fs/xfs/xfs_discard.c index 6d3400771e21..f6ee2b8e5e9c 100644 --- a/fs/xfs/xfs_discard.c +++ b/fs/xfs/xfs_discard.c @@ -19,6 +19,7 @@ #include "xfs_log.h" #include "xfs_ag.h" #include "xfs_health.h" +#include "xfs_rtbitmap.h" STATIC int xfs_trim_perag_extents( @@ -168,6 +169,86 @@ xfs_trim_ddev_extents( return last_error; } +#ifdef CONFIG_XFS_RT +static int +xfs_trim_rtdev_extent( + struct xfs_mount *mp, + struct xfs_trans *tp, + const struct xfs_rtalloc_rec *rec, + void *priv) +{ + struct block_device *bdev = xfs_buftarg_bdev(mp->m_rtdev_targp); + uint64_t *blocks_trimmed = priv; + xfs_rtblock_t rbno, rlen; + xfs_daddr_t dbno, dlen; + int error; + + if (fatal_signal_pending(current)) + return -ERESTARTSYS; + + rbno = xfs_rtx_to_rtb(mp, rec->ar_startext); + rlen = xfs_rtx_to_rtb(mp, rec->ar_extcount); + + trace_xfs_discard_rtextent(mp, rbno, rlen); + + dbno = XFS_FSB_TO_BB(mp, rbno); + dlen = XFS_FSB_TO_BB(mp, rlen); + + error = blkdev_issue_discard(bdev, dbno, dlen, GFP_NOFS); + if (error) + return error; + + *blocks_trimmed += rlen; + return 0; +} + +static int +xfs_trim_rtdev_extents( + struct xfs_mount *mp, + xfs_daddr_t start, + 
xfs_daddr_t end, + xfs_daddr_t minlen, + uint64_t *blocks_trimmed) +{ + struct xfs_rtalloc_rec low = { }, high = { }; + xfs_daddr_t rtdev_daddr; + xfs_extlen_t mod; + int error; + + /* Shift the start and end downwards to match the rt device. */ + rtdev_daddr = XFS_FSB_TO_BB(mp, mp->m_sb.sb_dblocks); + if (start > rtdev_daddr) + start -= rtdev_daddr; + else + start = 0; + + if (end <= rtdev_daddr) + return 0; + end -= rtdev_daddr; + + if (end > XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks) - 1) + end = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks) - 1; + + /* Convert the rt blocks to rt extents */ + low.ar_startext = xfs_rtb_to_rtx(mp, XFS_BB_TO_FSB(mp, start), &mod); + if (mod) + low.ar_startext++; + high.ar_startext = xfs_rtb_to_rtx(mp, XFS_BB_TO_FSBT(mp, end), &mod); + + /* + * Walk the free ranges between low and high. The query_range function + * trims the extents returned. + */ + xfs_rtbitmap_lock_shared(mp, XFS_RBMLOCK_BITMAP); + error = xfs_rtalloc_query_range(mp, NULL, &low, &high, + xfs_trim_rtdev_extent, blocks_trimmed); + xfs_rtbitmap_unlock_shared(mp, XFS_RBMLOCK_BITMAP); + return error; +} +#else +# define xfs_trim_rtdev_extents(m,s,e,n,b) (-EOPNOTSUPP) +#endif /* CONFIG_XFS_RT */ + /* * trim a range of the filesystem. * @@ -176,6 +257,9 @@ xfs_trim_ddev_extents( * addressing. FSB addressing is sparse (AGNO|AGBNO), while the incoming format * is a linear address range. Hence we need to use DADDR based conversions and * comparisons for determining the correct offset and regions to trim. + * + * The realtime device is mapped into the FITRIM "address space" immediately + * after the data device. 
*/ int xfs_ioc_trim( @@ -183,8 +267,10 @@ xfs_ioc_trim( struct fstrim_range __user *urange) { struct block_device *bdev = xfs_buftarg_bdev(mp->m_ddev_targp); + struct block_device *rt_bdev = NULL; unsigned int granularity = bdev_discard_granularity(bdev); struct fstrim_range range; + xfs_rfsblock_t max_blocks; xfs_daddr_t start, end, minlen; uint64_t blocks_trimmed = 0; int error, last_error = 0; @@ -194,6 +280,14 @@ xfs_ioc_trim( if (!bdev_max_discard_sectors(bdev)) return -EOPNOTSUPP; + if (mp->m_rtdev_targp) { + rt_bdev = xfs_buftarg_bdev(mp->m_rtdev_targp); + if (!bdev_max_discard_sectors(rt_bdev)) + return -EOPNOTSUPP; + granularity = max(granularity, + bdev_discard_granularity(rt_bdev)); + } + /* * We haven't recovered the log, so we cannot use our bnobt-guided * storage zapping commands. @@ -213,7 +307,8 @@ xfs_ioc_trim( * used by the fstrim application. In the end it really doesn't * matter as trimming blocks is an advisory interface. */ - if (range.start >= XFS_FSB_TO_B(mp, mp->m_sb.sb_dblocks) || + max_blocks = mp->m_sb.sb_dblocks + mp->m_sb.sb_rblocks; + if (range.start >= XFS_FSB_TO_B(mp, max_blocks) || range.minlen > XFS_FSB_TO_B(mp, mp->m_ag_max_usable) || range.len < mp->m_sb.sb_blocksize) return -EINVAL; @@ -227,6 +322,15 @@ xfs_ioc_trim( if (error) last_error = error; + if (rt_bdev) { + error = xfs_trim_rtdev_extents(mp, start, end, minlen, + &blocks_trimmed); + if (error == -ERESTARTSYS) + return error; + if (error) + last_error = error; + } + if (last_error) return last_error; diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index ec9aa1914a93..cfb26288394a 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -2520,6 +2520,26 @@ DEFINE_DISCARD_EVENT(xfs_discard_toosmall); DEFINE_DISCARD_EVENT(xfs_discard_exclude); DEFINE_DISCARD_EVENT(xfs_discard_busy); +TRACE_EVENT(xfs_discard_rtextent, + TP_PROTO(struct xfs_mount *mp, + xfs_rtblock_t rtbno, xfs_rtblock_t len), + TP_ARGS(mp, rtbno, len), + TP_STRUCT__entry( + __field(dev_t, dev) + 
__field(xfs_rtblock_t, rtbno) + __field(xfs_rtblock_t, len) + ), + TP_fast_assign( + __entry->dev = mp->m_rtdev_targp->bt_dev; + __entry->rtbno = rtbno; + __entry->len = len; + ), + TP_printk("dev %d:%d rtbno 0x%llx rtbcount 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->rtbno, + __entry->len) +); + /* btree cursor events */ TRACE_DEFINE_ENUM(XFS_BTNUM_BNOi); TRACE_DEFINE_ENUM(XFS_BTNUM_CNTi); ^ permalink raw reply related [flat|nested] 565+ messages in thread
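Going back to the low/high key setup in xfs_trim_rtdev_extents() above: the start of the range is converted to a realtime extent number and bumped by one when it is not extent-aligned, while the end is simply truncated. A small sketch of that rounding, using hypothetical demo_ helpers and taking the extent size in blocks as a plain parameter:

```c
#include <assert.h>
#include <stdint.h>

/* First rt extent that begins at or after block rtbno (rounds up). */
static uint64_t demo_rtx_round_up(uint64_t rtbno, uint32_t rextsize)
{
	uint64_t rtx;

	assert(rextsize > 0);
	rtx = rtbno / rextsize;
	if (rtbno % rextsize)
		rtx++;
	return rtx;
}

/* The rt extent that contains block rtbno (rounds down). */
static uint64_t demo_rtx_round_down(uint64_t rtbno, uint32_t rextsize)
{
	assert(rextsize > 0);
	return rtbno / rextsize;
}
```

For example, with four blocks per extent a range starting at block 5 begins its query at extent 2, since extent 1 is only partially covered, and a range ending at block 11 ends its query at extent 2.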
* [PATCHSET v1.0 0/2] xfs: enable in-core block reservation for rt metadata 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:17 ` [PATCHSET v1.0 0/3] xfsprogs: enable FITRIM for the realtime section Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/2] xfs: allow inode-based btrees to reserve space in the data device Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/2] xfs: simplify xfs_ag_resv_free signature Darrick J. Wong 2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: extent free log intent cleanups Darrick J. Wong ` (29 subsequent siblings) 39 siblings, 2 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs Hi all, In preparation for adding reverse mapping and refcounting to the realtime device, enhance the metadir code to reserve free space for btree shape changes as delayed allocation blocks. This effectively allows us to pre-allocate space for the rmap and refcount btrees in the same manner as we do for the data device counterparts, which is how we avoid ENOSPC failures when space is low but we've already committed to a COW operation. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. 
--D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=reserve-rt-metadata-space xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=reserve-rt-metadata-space --- fs/xfs/libxfs/xfs_ag.c | 4 - fs/xfs/libxfs/xfs_ag_resv.c | 25 ++---- fs/xfs/libxfs/xfs_ag_resv.h | 2 fs/xfs/libxfs/xfs_errortag.h | 4 + fs/xfs/libxfs/xfs_imeta.c | 187 ++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_imeta.h | 11 ++ fs/xfs/libxfs/xfs_types.h | 7 ++ fs/xfs/scrub/newbt.c | 3 - fs/xfs/scrub/repair.c | 5 - fs/xfs/xfs_error.c | 3 + fs/xfs/xfs_fsops.c | 39 +++++---- fs/xfs/xfs_fsops.h | 2 fs/xfs/xfs_inode.h | 3 + fs/xfs/xfs_mount.c | 10 ++ fs/xfs/xfs_mount.h | 1 fs/xfs/xfs_rtalloc.c | 23 +++++ fs/xfs/xfs_rtalloc.h | 5 + fs/xfs/xfs_super.c | 6 - fs/xfs/xfs_trace.h | 46 ++++++++++ 19 files changed, 335 insertions(+), 51 deletions(-) ^ permalink raw reply [flat|nested] 565+ messages in thread
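The reservation scheme described in the cover letter above can be modelled roughly as follows. This is a simplified userspace sketch loosely following the xfs_imeta_resv_* helpers in patch 2/2 of this series. It collapses the in-core and on-disk free block counters into a single number, ignores the per-cpu counter batching and the error injection knob, and all demo_ names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct demo_fs {
	int64_t fdblocks;	/* free blocks visible to everyone */
};

struct demo_metafile {
	int64_t nblocks;	/* blocks the file currently uses */
	int64_t delayed_blks;	/* blocks reserved but not yet used */
	int64_t resv_asked;	/* target for used + reserved */
};

/* Hide (ask - used) blocks from the free pool as this file's reservation. */
static int demo_resv_init(struct demo_fs *fs, struct demo_metafile *ip,
			  int64_t ask)
{
	int64_t hidden;

	if (ip->nblocks > ask)
		ask = ip->nblocks;
	hidden = ask - ip->nblocks;
	if (hidden > fs->fdblocks)
		return -ENOSPC;
	fs->fdblocks -= hidden;
	ip->delayed_blks = hidden;
	ip->resv_asked = ask;
	return 0;
}

/* Allocate len blocks: drain the reservation first, then the free pool. */
static void demo_resv_alloc(struct demo_fs *fs, struct demo_metafile *ip,
			    int64_t len)
{
	int64_t from_resv = len < ip->delayed_blks ? len : ip->delayed_blks;

	ip->delayed_blks -= from_resv;
	fs->fdblocks -= len - from_resv;
	ip->nblocks += len;
}

/* Free len blocks: refill the reservation up to the target, return the rest. */
static void demo_resv_free(struct demo_fs *fs, struct demo_metafile *ip,
			   int64_t len)
{
	int64_t to_resv;

	ip->nblocks -= len;
	to_resv = ip->resv_asked - (ip->nblocks + ip->delayed_blks);
	if (to_resv < 0)
		to_resv = 0;
	if (to_resv > len)
		to_resv = len;
	ip->delayed_blks += to_resv;
	fs->fdblocks += len - to_resv;
}

/* Critical if reachable space is below min_blocks or below 10% of the ask. */
static bool demo_resv_critical(const struct demo_fs *fs,
			       const struct demo_metafile *ip,
			       int64_t min_blocks)
{
	int64_t avail = ip->delayed_blks + fs->fdblocks;

	return avail < min_blocks || avail < ip->resv_asked / 10;
}
```

In this model used, reserved and free blocks always add up to the same total, so a btree that grows and then shrinks hands its blocks back to its own reservation before anything returns to the general pool.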
* [PATCH 2/2] xfs: allow inode-based btrees to reserve space in the data device 2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: enable in-core block reservation for rt metadata Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/2] xfs: simplify xfs_ag_resv_free signature Darrick J. Wong 1 sibling, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a new space reservation scheme so that btree metadata for the realtime volume can reserve space in the data device to avoid space underruns. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_ag_resv.c | 3 + fs/xfs/libxfs/xfs_errortag.h | 4 + fs/xfs/libxfs/xfs_imeta.c | 187 ++++++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_imeta.h | 11 ++ fs/xfs/libxfs/xfs_types.h | 7 ++ fs/xfs/scrub/newbt.c | 3 - fs/xfs/xfs_error.c | 3 + fs/xfs/xfs_fsops.c | 17 ++++ fs/xfs/xfs_inode.h | 3 + fs/xfs/xfs_mount.c | 10 ++ fs/xfs/xfs_mount.h | 1 fs/xfs/xfs_rtalloc.c | 23 +++++ fs/xfs/xfs_rtalloc.h | 5 + fs/xfs/xfs_trace.h | 45 ++++++++++ 14 files changed, 320 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c index 8723cd0d3f58..75c04319e9e3 100644 --- a/fs/xfs/libxfs/xfs_ag_resv.c +++ b/fs/xfs/libxfs/xfs_ag_resv.c @@ -113,6 +113,7 @@ xfs_ag_resv_needed( case XFS_AG_RESV_RMAPBT: len -= xfs_perag_resv(pag, type)->ar_reserved; break; + case XFS_AG_RESV_IMETA: case XFS_AG_RESV_NONE: /* empty */ break; @@ -347,6 +348,7 @@ xfs_ag_resv_alloc_extent( switch (type) { case XFS_AG_RESV_AGFL: + case XFS_AG_RESV_IMETA: return; case XFS_AG_RESV_METADATA: case XFS_AG_RESV_RMAPBT: @@ -389,6 +391,7 @@ xfs_ag_resv_free_extent( switch (type) { case XFS_AG_RESV_AGFL: + case XFS_AG_RESV_IMETA: return; case XFS_AG_RESV_METADATA: case XFS_AG_RESV_RMAPBT: diff --git a/fs/xfs/libxfs/xfs_errortag.h b/fs/xfs/libxfs/xfs_errortag.h index 
263d62a8d70f..f359df69d6b5 100644 --- a/fs/xfs/libxfs/xfs_errortag.h +++ b/fs/xfs/libxfs/xfs_errortag.h @@ -64,7 +64,8 @@ #define XFS_ERRTAG_WB_DELAY_MS 42 #define XFS_ERRTAG_WRITE_DELAY_MS 43 #define XFS_ERRTAG_SWAPEXT_FINISH_ONE 44 -#define XFS_ERRTAG_MAX 45 +#define XFS_ERRTAG_IMETA_RESV_CRITICAL 45 +#define XFS_ERRTAG_MAX 46 /* * Random factors for above tags, 1 means always, 2 means 1/2 time, etc. @@ -113,5 +114,6 @@ #define XFS_RANDOM_WB_DELAY_MS 3000 #define XFS_RANDOM_WRITE_DELAY_MS 3000 #define XFS_RANDOM_SWAPEXT_FINISH_ONE 1 +#define XFS_RANDOM_IMETA_RESV_CRITICAL 4 #endif /* __XFS_ERRORTAG_H_ */ diff --git a/fs/xfs/libxfs/xfs_imeta.c b/fs/xfs/libxfs/xfs_imeta.c index e4db1651d067..5bfb1eabf21d 100644 --- a/fs/xfs/libxfs/xfs_imeta.c +++ b/fs/xfs/libxfs/xfs_imeta.c @@ -27,6 +27,10 @@ #include "xfs_dir2_priv.h" #include "xfs_ag.h" #include "xfs_health.h" +#include "xfs_errortag.h" +#include "xfs_error.h" +#include "xfs_btree.h" +#include "xfs_alloc.h" /* * Metadata Inode Number Management @@ -1208,3 +1212,186 @@ xfs_imeta_free_path( kfree(path->im_path); kfree(path); } + +/* + * Is the amount of space that could be allocated towards a given metadata + * file at or beneath a certain threshold? + */ +static inline bool +xfs_imeta_resv_can_cover( + struct xfs_inode *ip, + int64_t rhs) +{ + /* + * The amount of space that can be allocated to this metadata file is + * the remaining reservation for the particular metadata file + the + * global free block count. Take care of the first case to avoid + * touching the per-cpu counter. + */ + if (ip->i_delayed_blks >= rhs) + return true; + + /* + * There aren't enough blocks left in the inode's reservation, but it + * isn't critical unless there also isn't enough free space. + */ + return __percpu_counter_compare(&ip->i_mount->m_fdblocks, + rhs - ip->i_delayed_blks, 2048) >= 0; +} + +/* + * Is this metadata file critically low on blocks? 
For now we'll define that + * as the number of blocks we can get our hands on being less than 10% of what + * we reserved or less than some arbitrary number (maximum btree height). + */ +bool +xfs_imeta_resv_critical( + struct xfs_inode *ip) +{ + uint64_t asked_low_water; + + if (!ip) + return false; + + ASSERT(xfs_is_metadata_inode(ip)); + trace_xfs_imeta_resv_critical(ip, 0); + + if (!xfs_imeta_resv_can_cover(ip, ip->i_mount->m_rtbtree_maxlevels)) + return true; + + asked_low_water = div_u64(ip->i_meta_resv_asked, 10); + if (!xfs_imeta_resv_can_cover(ip, asked_low_water)) + return true; + + return XFS_TEST_ERROR(false, ip->i_mount, + XFS_ERRTAG_IMETA_RESV_CRITICAL); +} + +/* Allocate a block from the metadata file's reservation. */ +void +xfs_imeta_resv_alloc_extent( + struct xfs_inode *ip, + struct xfs_alloc_arg *args) +{ + int64_t len = args->len; + + ASSERT(xfs_is_metadata_inode(ip)); + ASSERT(args->resv == XFS_AG_RESV_IMETA); + + trace_xfs_imeta_resv_alloc_extent(ip, args->len); + + /* + * Allocate the blocks from the metadata inode's block reservation + * and update the ondisk sb counter. + */ + if (ip->i_delayed_blks > 0) { + int64_t from_resv; + + from_resv = min_t(int64_t, len, ip->i_delayed_blks); + ip->i_delayed_blks -= from_resv; + xfs_mod_delalloc(ip->i_mount, -from_resv); + xfs_trans_mod_sb(args->tp, XFS_TRANS_SB_RES_FDBLOCKS, + -from_resv); + len -= from_resv; + } + + /* + * Any allocation in excess of the reservation requires in-core and + * on-disk fdblocks updates. + */ + if (len) + xfs_trans_mod_sb(args->tp, XFS_TRANS_SB_FDBLOCKS, -len); + + ip->i_nblocks += args->len; +} + +/* Free a block to the metadata file's reservation. 
*/ +void +xfs_imeta_resv_free_extent( + struct xfs_inode *ip, + struct xfs_trans *tp, + xfs_filblks_t len) +{ + int64_t to_resv; + + ASSERT(xfs_is_metadata_inode(ip)); + trace_xfs_imeta_resv_free_extent(ip, len); + + ip->i_nblocks -= len; + + /* + * Add the freed blocks back into the inode's delalloc reservation + * until it reaches the maximum size. Update the ondisk fdblocks only. + */ + to_resv = ip->i_meta_resv_asked - (ip->i_nblocks + ip->i_delayed_blks); + if (to_resv > 0) { + to_resv = min_t(int64_t, to_resv, len); + ip->i_delayed_blks += to_resv; + xfs_mod_delalloc(ip->i_mount, to_resv); + xfs_trans_mod_sb(tp, XFS_TRANS_SB_RES_FDBLOCKS, to_resv); + len -= to_resv; + } + + /* + * Everything else goes back to the filesystem, so update the in-core + * and on-disk counters. + */ + if (len) + xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, len); +} + +/* Release a metadata file's space reservation. */ +void +xfs_imeta_resv_free_inode( + struct xfs_inode *ip) +{ + if (!ip) + return; + + ASSERT(xfs_is_metadata_inode(ip)); + trace_xfs_imeta_resv_free(ip, 0); + + xfs_mod_delalloc(ip->i_mount, -ip->i_delayed_blks); + xfs_mod_fdblocks(ip->i_mount, ip->i_delayed_blks, true); + ip->i_delayed_blks = 0; + ip->i_meta_resv_asked = 0; +} + +/* Set up a metadata file's space reservation. */ +int +xfs_imeta_resv_init_inode( + struct xfs_inode *ip, + xfs_filblks_t ask) +{ + xfs_filblks_t hidden_space; + xfs_filblks_t used; + int error; + + if (!ip || ip->i_meta_resv_asked > 0) + return 0; + + ASSERT(xfs_is_metadata_inode(ip)); + + /* + * Space taken by all other metadata btrees are accounted on-disk as + * used space. We therefore only hide the space that is reserved but + * not used by the trees. 
+ */ + used = ip->i_nblocks; + if (used > ask) + ask = used; + hidden_space = ask - used; + + error = xfs_mod_fdblocks(ip->i_mount, -(int64_t)hidden_space, true); + if (error) { + trace_xfs_imeta_resv_init_error(ip, error, _RET_IP_); + return error; + } + + xfs_mod_delalloc(ip->i_mount, hidden_space); + ip->i_delayed_blks = hidden_space; + ip->i_meta_resv_asked = ask; + + trace_xfs_imeta_resv_init(ip, ask); + return 0; +} diff --git a/fs/xfs/libxfs/xfs_imeta.h b/fs/xfs/libxfs/xfs_imeta.h index 7840087b71da..c3137be4c47c 100644 --- a/fs/xfs/libxfs/xfs_imeta.h +++ b/fs/xfs/libxfs/xfs_imeta.h @@ -84,6 +84,17 @@ void xfs_imeta_droplink(struct xfs_inode *ip); unsigned int xfs_imeta_create_space_res(struct xfs_mount *mp); unsigned int xfs_imeta_unlink_space_res(struct xfs_mount *mp); +/* Space reservations for metadata inodes. */ +struct xfs_alloc_arg; + +bool xfs_imeta_resv_critical(struct xfs_inode *ip); +void xfs_imeta_resv_alloc_extent(struct xfs_inode *ip, + struct xfs_alloc_arg *args); +void xfs_imeta_resv_free_extent(struct xfs_inode *ip, struct xfs_trans *tp, + xfs_filblks_t len); +void xfs_imeta_resv_free_inode(struct xfs_inode *ip); +int xfs_imeta_resv_init_inode(struct xfs_inode *ip, xfs_filblks_t ask); + /* Must be implemented by the libxfs client */ int xfs_imeta_iget(struct xfs_mount *mp, xfs_ino_t ino, unsigned char ftype, struct xfs_inode **ipp); diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h index c27c84561b5e..d37f8a7ce5f8 100644 --- a/fs/xfs/libxfs/xfs_types.h +++ b/fs/xfs/libxfs/xfs_types.h @@ -221,6 +221,13 @@ enum xfs_ag_resv_type { * altering fdblocks. If you think you need this you're wrong. */ XFS_AG_RESV_IGNORE, + + /* + * This allocation activity is being done on behalf of a metadata file. + * These files maintain their own permanent space reservations and are + * required to adjust fdblocks using the xfs_imeta_resv_* helpers. + */ + XFS_AG_RESV_IMETA, }; /* Results of scanning a btree keyspace to check occupancy. 
*/ diff --git a/fs/xfs/scrub/newbt.c b/fs/xfs/scrub/newbt.c index ebdfdf631be3..9c0ccba75656 100644 --- a/fs/xfs/scrub/newbt.c +++ b/fs/xfs/scrub/newbt.c @@ -422,7 +422,8 @@ xrep_newbt_free_extent( } if (xnr->resv == XFS_AG_RESV_RMAPBT || - xnr->resv == XFS_AG_RESV_METADATA) { + xnr->resv == XFS_AG_RESV_METADATA || + xnr->resv == XFS_AG_RESV_IMETA) { /* * Metadata blocks taken from a per-AG reservation must be put * back into that reservation immediately because EFIs cannot diff --git a/fs/xfs/xfs_error.c b/fs/xfs/xfs_error.c index 4b57a809ced5..af449cf8847e 100644 --- a/fs/xfs/xfs_error.c +++ b/fs/xfs/xfs_error.c @@ -63,6 +63,7 @@ static unsigned int xfs_errortag_random_default[] = { XFS_RANDOM_WB_DELAY_MS, XFS_RANDOM_WRITE_DELAY_MS, XFS_RANDOM_SWAPEXT_FINISH_ONE, + XFS_RANDOM_IMETA_RESV_CRITICAL, }; struct xfs_errortag_attr { @@ -181,6 +182,7 @@ XFS_ERRORTAG_ATTR_RW(attr_leaf_to_node, XFS_ERRTAG_ATTR_LEAF_TO_NODE); XFS_ERRORTAG_ATTR_RW(wb_delay_ms, XFS_ERRTAG_WB_DELAY_MS); XFS_ERRORTAG_ATTR_RW(write_delay_ms, XFS_ERRTAG_WRITE_DELAY_MS); XFS_ERRORTAG_ATTR_RW(swapext_finish_one, XFS_ERRTAG_SWAPEXT_FINISH_ONE); +XFS_ERRORTAG_ATTR_RW(imeta_resv_critical, XFS_ERRTAG_IMETA_RESV_CRITICAL); static struct attribute *xfs_errortag_attrs[] = { XFS_ERRORTAG_ATTR_LIST(noerror), @@ -227,6 +229,7 @@ static struct attribute *xfs_errortag_attrs[] = { XFS_ERRORTAG_ATTR_LIST(wb_delay_ms), XFS_ERRORTAG_ATTR_LIST(write_delay_ms), XFS_ERRORTAG_ATTR_LIST(swapext_finish_one), + XFS_ERRORTAG_ATTR_LIST(imeta_resv_critical), NULL, }; ATTRIBUTE_GROUPS(xfs_errortag); diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 8186f142864a..9770916acd69 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -22,6 +22,7 @@ #include "xfs_ag_resv.h" #include "xfs_trace.h" #include "xfs_rtgroup.h" +#include "xfs_rtalloc.h" /* * Write new AG headers to disk. 
Non-transactional, but need to be @@ -576,6 +577,19 @@ xfs_fs_reserve_ag_blocks( xfs_warn(mp, "Error %d reserving per-AG metadata reserve pool.", error); xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + return error; + } + + if (xfs_has_realtime(mp)) { + err2 = xfs_rt_resv_init(mp); + if (err2 && err2 != -ENOSPC) { + xfs_warn(mp, + "Error %d reserving realtime metadata reserve pool.", err2); + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + } + + if (err2 && !error) + error = err2; } return error; @@ -591,6 +605,9 @@ xfs_fs_unreserve_ag_blocks( struct xfs_perag *pag; xfs_agnumber_t agno; + if (xfs_has_realtime(mp)) + xfs_rt_resv_free(mp); + for_each_perag(mp, agno, pag) xfs_ag_resv_free(pag); } diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index 06601c409010..ca7ebb07efc7 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -55,6 +55,9 @@ typedef struct xfs_inode { /* Miscellaneous state. */ unsigned long i_flags; /* see defined flags below */ uint64_t i_delayed_blks; /* count of delay alloc blks */ + /* Space that has been set aside to root a btree in this file. */ + uint64_t i_meta_resv_asked; + xfs_fsize_t i_disk_size; /* number of bytes in file */ xfs_rfsblock_t i_nblocks; /* # of direct & btree blocks */ prid_t i_projid; /* owner's project id */ diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index bcfeaaf11536..d94d44f40be4 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -649,6 +649,15 @@ xfs_agbtree_compute_maxlevels( mp->m_agbtree_maxlevels = max(levels, mp->m_refc_maxlevels); } +/* Compute maximum possible height for realtime btree types for this fs. */ +static inline void +xfs_rtbtree_compute_maxlevels( + struct xfs_mount *mp) +{ + /* This will be filled in later. 
*/ + mp->m_rtbtree_maxlevels = 0; +} + /* * This function does the following on an initial mount of a file system: * - reads the superblock from disk and init the mount struct @@ -721,6 +730,7 @@ xfs_mountfs( xfs_refcountbt_compute_maxlevels(mp); xfs_agbtree_compute_maxlevels(mp); + xfs_rtbtree_compute_maxlevels(mp); /* * Check if sb_agblocks is aligned at stripe boundary. If sb_agblocks diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 176b2e71da9e..55e6e30f9045 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -138,6 +138,7 @@ typedef struct xfs_mount { uint m_rmap_maxlevels; /* max rmap btree levels */ uint m_refc_maxlevels; /* max refcount btree level */ unsigned int m_agbtree_maxlevels; /* max level of all AG btrees */ + unsigned int m_rtbtree_maxlevels; /* max level of all rt btrees */ xfs_extlen_t m_ag_prealloc_blocks; /* reserved ag blocks */ uint m_alloc_set_aside; /* space we can't use */ uint m_ag_max_usable; /* max space per AG */ diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index f8f0557dc46c..7a94fb5b5a7f 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1321,6 +1321,14 @@ xfs_growfs_rt( goto out_free; error = xfs_rtgroup_update_secondary_sbs(mp); + if (error) + goto out_free; + + /* Reset the rt metadata btree space reservations. */ + xfs_rt_resv_free(mp); + error = xfs_rt_resv_init(mp); + if (error == -ENOSPC) + error = 0; out_free: /* @@ -1554,6 +1562,21 @@ xfs_rtalloc_reinit_frextents( return 0; } +/* Free space reservations for rt metadata inodes. */ +void +xfs_rt_resv_free( + struct xfs_mount *mp) +{ +} + +/* Reserve space for rt metadata inodes' space expansion. 
*/ +int +xfs_rt_resv_init( + struct xfs_mount *mp) +{ + return 0; +} + static inline int __xfs_rt_iget( struct xfs_mount *mp, diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h index d0fd49db77bd..04931ab1bcac 100644 --- a/fs/xfs/xfs_rtalloc.h +++ b/fs/xfs/xfs_rtalloc.h @@ -54,6 +54,9 @@ int /* error */ xfs_rtmount_inodes( struct xfs_mount *mp); /* file system mount structure */ +void xfs_rt_resv_free(struct xfs_mount *mp); +int xfs_rt_resv_init(struct xfs_mount *mp); + /* * Pick an extent for allocation at the start of a new realtime file. * Use the sequence number stored in the atime field of the bitmap inode. @@ -99,6 +102,8 @@ xfs_rtmount_init( # define xfs_rtmount_inodes(m) (((mp)->m_sb.sb_rblocks == 0)? 0 : (-ENOSYS)) # define xfs_rtunmount_inodes(m) # define xfs_rtfile_convert_unwritten(ip, pos, len) (0) +# define xfs_rt_resv_free(mp) ((void)0) +# define xfs_rt_resv_init(mp) (0) #endif /* CONFIG_XFS_RT */ #endif /* __XFS_RTALLOC_H__ */ diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index d0e939f5b706..0b5748546c4c 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -5072,6 +5072,51 @@ DEFINE_IMETA_DIR_EVENT(xfs_imeta_dir_created); DEFINE_IMETA_DIR_EVENT(xfs_imeta_dir_unlinked); DEFINE_IMETA_DIR_EVENT(xfs_imeta_dir_link); +/* metadata inode space reservations */ + +DECLARE_EVENT_CLASS(xfs_imeta_resv_class, + TP_PROTO(struct xfs_inode *ip, xfs_filblks_t len), + TP_ARGS(ip, len), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_ino_t, ino) + __field(unsigned long long, freeblks) + __field(unsigned long long, reserved) + __field(unsigned long long, asked) + __field(unsigned long long, used) + __field(unsigned long long, len) + ), + TP_fast_assign( + struct xfs_mount *mp = ip->i_mount; + + __entry->dev = mp->m_super->s_dev; + __entry->ino = ip->i_ino; + __entry->freeblks = percpu_counter_sum(&mp->m_fdblocks); + __entry->reserved = ip->i_delayed_blks; + __entry->asked = ip->i_meta_resv_asked; + __entry->used = ip->i_nblocks; + 
__entry->len = len; + ), + TP_printk("dev %d:%d ino 0x%llx freeblks %llu resv %llu ask %llu used %llu len %llu", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->ino, + __entry->freeblks, + __entry->reserved, + __entry->asked, + __entry->used, + __entry->len) +) +#define DEFINE_IMETA_RESV_EVENT(name) \ +DEFINE_EVENT(xfs_imeta_resv_class, name, \ + TP_PROTO(struct xfs_inode *ip, xfs_filblks_t len), \ + TP_ARGS(ip, len)) +DEFINE_IMETA_RESV_EVENT(xfs_imeta_resv_init); +DEFINE_IMETA_RESV_EVENT(xfs_imeta_resv_free); +DEFINE_IMETA_RESV_EVENT(xfs_imeta_resv_alloc_extent); +DEFINE_IMETA_RESV_EVENT(xfs_imeta_resv_free_extent); +DEFINE_IMETA_RESV_EVENT(xfs_imeta_resv_critical); +DEFINE_INODE_ERROR_EVENT(xfs_imeta_resv_init_error); + #endif /* _TRACE_XFS_H */ #undef TRACE_INCLUDE_PATH ^ permalink raw reply related [flat|nested] 565+ messages in thread
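The accounting done by the xfs_imeta_resv_* helpers in this patch can be followed more easily in a small userspace model. What follows is only a sketch under stated assumptions: `struct resv_model` and every function name are hypothetical, plain integers stand in for the percpu counters and the transaction superblock deltas, the reserve-pool behaviour of xfs_mod_fdblocks is ignored, and "can cover" is assumed to mean that the remaining reservation plus in-core free space is at least the requested amount.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; these are not the kernel structures. */
struct resv_model {
	int64_t incore_fdblocks; /* free blocks visible to allocators */
	int64_t ondisk_fdblocks; /* free blocks recorded in the superblock */
	int64_t delayed_blks;    /* reserved blocks not yet allocated */
	int64_t asked;           /* total size of the reservation */
	int64_t nblocks;         /* blocks owned by the metadata file */
};

static int64_t min_i64(int64_t a, int64_t b)
{
	return a < b ? a : b;
}

/* Hide the unused part of the reservation from the in-core counter. */
static int resv_init(struct resv_model *m, int64_t ask)
{
	int64_t hidden;

	if (m->nblocks > ask)
		ask = m->nblocks;
	hidden = ask - m->nblocks;
	if (hidden > m->incore_fdblocks)
		return -1;	/* stands in for -ENOSPC */

	m->incore_fdblocks -= hidden;
	m->delayed_blks = hidden;
	m->asked = ask;
	return 0;
}

/* Consume the reservation first; only the excess touches in-core fdblocks. */
static void resv_alloc_extent(struct resv_model *m, int64_t len)
{
	int64_t from_resv = min_i64(len, m->delayed_blks);

	assert(len >= 0);
	m->delayed_blks -= from_resv;
	m->ondisk_fdblocks -= from_resv;	/* on-disk counter only */
	m->incore_fdblocks -= len - from_resv;	/* excess hits both counters */
	m->ondisk_fdblocks -= len - from_resv;
	m->nblocks += len;
}

/* Refill the reservation up to "asked"; return the rest to the fs. */
static void resv_free_extent(struct resv_model *m, int64_t len)
{
	int64_t to_resv;

	assert(len >= 0 && len <= m->nblocks);
	m->nblocks -= len;
	to_resv = m->asked - (m->nblocks + m->delayed_blks);
	if (to_resv > 0) {
		to_resv = min_i64(to_resv, len);
		m->delayed_blks += to_resv;
		m->ondisk_fdblocks += to_resv;	/* on-disk counter only */
		len -= to_resv;
	}
	m->incore_fdblocks += len;		/* remainder hits both counters */
	m->ondisk_fdblocks += len;
}

/* Assumed meaning of "can cover": reservation plus free space >= need. */
static bool resv_can_cover(const struct resv_model *m, int64_t need)
{
	if (m->delayed_blks >= need)
		return true;
	return m->incore_fdblocks >= need - m->delayed_blks;
}

/* Critical when below the max btree height or below 10% of the ask. */
static bool resv_critical(const struct resv_model *m, int64_t maxlevels)
{
	if (!resv_can_cover(m, maxlevels))
		return true;
	return !resv_can_cover(m, m->asked / 10);
}
```

With a single metadata file in this model, the in-core counter always equals the on-disk counter minus the blocks still hidden in the reservation, which is a convenient invariant to check after each alloc or free step.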
* [PATCH 1/2] xfs: simplify xfs_ag_resv_free signature 2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: enable in-core block reservation for rt metadata Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/2] xfs: allow inode-based btrees to reserve space in the data device Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 1 sibling, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> It's not possible to fail at increasing fdblocks, so get rid of all the error returns here. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_ag.c | 4 +--- fs/xfs/libxfs/xfs_ag_resv.c | 22 +++++----------------- fs/xfs/libxfs/xfs_ag_resv.h | 2 +- fs/xfs/scrub/repair.c | 5 +---- fs/xfs/xfs_fsops.c | 24 ++++++------------------ fs/xfs/xfs_fsops.h | 2 +- fs/xfs/xfs_super.c | 6 +----- fs/xfs/xfs_trace.h | 1 - 8 files changed, 16 insertions(+), 50 deletions(-) diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c index 05d0a97e08c3..bc1fc86df322 100644 --- a/fs/xfs/libxfs/xfs_ag.c +++ b/fs/xfs/libxfs/xfs_ag.c @@ -887,9 +887,7 @@ xfs_ag_shrink_space( * Disable perag reservations so it doesn't cause the allocation request * to fail. We'll reestablish reservation before we return. 
*/ - error = xfs_ag_resv_free(pag); - if (error) - return error; + xfs_ag_resv_free(pag); /* internal log shouldn't also show up in the free space btrees */ error = xfs_alloc_vextent(&args); diff --git a/fs/xfs/libxfs/xfs_ag_resv.c b/fs/xfs/libxfs/xfs_ag_resv.c index 2e6128a25635..8723cd0d3f58 100644 --- a/fs/xfs/libxfs/xfs_ag_resv.c +++ b/fs/xfs/libxfs/xfs_ag_resv.c @@ -126,14 +126,13 @@ xfs_ag_resv_needed( } /* Clean out a reservation */ -static int +static void __xfs_ag_resv_free( struct xfs_perag *pag, enum xfs_ag_resv_type type) { struct xfs_ag_resv *resv; xfs_extlen_t oldresv; - int error; trace_xfs_ag_resv_free(pag, type, 0); @@ -149,30 +148,19 @@ __xfs_ag_resv_free( oldresv = resv->ar_orig_reserved; else oldresv = resv->ar_reserved; - error = xfs_mod_fdblocks(pag->pag_mount, oldresv, true); + xfs_mod_fdblocks(pag->pag_mount, oldresv, true); resv->ar_reserved = 0; resv->ar_asked = 0; resv->ar_orig_reserved = 0; - - if (error) - trace_xfs_ag_resv_free_error(pag->pag_mount, pag->pag_agno, - error, _RET_IP_); - return error; } /* Free a per-AG reservation. 
*/ -int +void xfs_ag_resv_free( struct xfs_perag *pag) { - int error; - int err2; - - error = __xfs_ag_resv_free(pag, XFS_AG_RESV_RMAPBT); - err2 = __xfs_ag_resv_free(pag, XFS_AG_RESV_METADATA); - if (err2 && !error) - error = err2; - return error; + __xfs_ag_resv_free(pag, XFS_AG_RESV_RMAPBT); + __xfs_ag_resv_free(pag, XFS_AG_RESV_METADATA); } static int diff --git a/fs/xfs/libxfs/xfs_ag_resv.h b/fs/xfs/libxfs/xfs_ag_resv.h index b74b210008ea..ff20ed93de77 100644 --- a/fs/xfs/libxfs/xfs_ag_resv.h +++ b/fs/xfs/libxfs/xfs_ag_resv.h @@ -6,7 +6,7 @@ #ifndef __XFS_AG_RESV_H__ #define __XFS_AG_RESV_H__ -int xfs_ag_resv_free(struct xfs_perag *pag); +void xfs_ag_resv_free(struct xfs_perag *pag); int xfs_ag_resv_init(struct xfs_perag *pag, struct xfs_trans *tp); bool xfs_ag_resv_critical(struct xfs_perag *pag, enum xfs_ag_resv_type type); diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index b5c5ee7f512b..1652f633f692 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -951,9 +951,7 @@ xrep_reset_perag_resv( ASSERT(sc->tp); sc->flags &= ~XREP_RESET_PERAG_RESV; - error = xfs_ag_resv_free(sc->sa.pag); - if (error) - goto out; + xfs_ag_resv_free(sc->sa.pag); error = xfs_ag_resv_init(sc->sa.pag, sc->tp); if (error == -ENOSPC) { xfs_err(sc->mp, @@ -962,7 +960,6 @@ xrep_reset_perag_resv( error = 0; } -out: return error; } diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 2da86f05e0e5..8186f142864a 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -201,11 +201,10 @@ xfs_growfs_data_private( struct xfs_perag *pag; pag = xfs_perag_get(mp, id.agno); - error = xfs_ag_resv_free(pag); + xfs_ag_resv_free(pag); xfs_perag_put(pag); - if (error) - return error; } + /* * Reserve AG metadata blocks. ENOSPC here does not mean there * was a growfs failure, just that there still isn't space for @@ -585,24 +584,13 @@ xfs_fs_reserve_ag_blocks( /* * Free space reserved for per-AG metadata. 
*/ -int +void xfs_fs_unreserve_ag_blocks( struct xfs_mount *mp) { - xfs_agnumber_t agno; struct xfs_perag *pag; - int error = 0; - int err2; + xfs_agnumber_t agno; - for_each_perag(mp, agno, pag) { - err2 = xfs_ag_resv_free(pag); - if (err2 && !error) - error = err2; - } - - if (error) - xfs_warn(mp, - "Error %d freeing per-AG metadata reserve pool.", error); - - return error; + for_each_perag(mp, agno, pag) + xfs_ag_resv_free(pag); } diff --git a/fs/xfs/xfs_fsops.h b/fs/xfs/xfs_fsops.h index 2cffe51a31e8..dba17c404e7d 100644 --- a/fs/xfs/xfs_fsops.h +++ b/fs/xfs/xfs_fsops.h @@ -14,6 +14,6 @@ extern int xfs_reserve_blocks(xfs_mount_t *mp, uint64_t *inval, extern int xfs_fs_goingdown(xfs_mount_t *mp, uint32_t inflags); extern int xfs_fs_reserve_ag_blocks(struct xfs_mount *mp); -extern int xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp); +extern void xfs_fs_unreserve_ag_blocks(struct xfs_mount *mp); #endif /* __XFS_FSOPS_H__ */ diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index bfe93ca6eed4..e145de0bd562 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1830,11 +1830,7 @@ xfs_remount_ro( xfs_inodegc_stop(mp); /* Free the per-AG metadata reservation pool. */ - error = xfs_fs_unreserve_ag_blocks(mp); - if (error) { - xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); - return error; - } + xfs_fs_unreserve_ag_blocks(mp); /* * Before we sync the metadata, we need to free up the reserve block diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index cfb26288394a..d0e939f5b706 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3145,7 +3145,6 @@ DEFINE_AG_RESV_EVENT(xfs_ag_resv_free_extent); DEFINE_AG_RESV_EVENT(xfs_ag_resv_critical); DEFINE_AG_RESV_EVENT(xfs_ag_resv_needed); -DEFINE_AG_ERROR_EVENT(xfs_ag_resv_free_error); DEFINE_AG_ERROR_EVENT(xfs_ag_resv_init_error); /* refcount tracepoint classes */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
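The reasoning in this commit message (adding blocks back to fdblocks cannot fail, so the free path has no error to return) can be illustrated with a toy counter. This is a minimal sketch, not the kernel implementation: `mod_fdblocks`, `struct resv` and `resv_free` are hypothetical names, and the model ignores the reserve pool and integer overflow.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical free-block counter: only a decrease can run out of space. */
static int mod_fdblocks(int64_t *fdblocks, int64_t delta)
{
	if (delta < 0 && *fdblocks + delta < 0)
		return -ENOSPC;
	*fdblocks += delta;
	return 0;
}

/* Hypothetical reservation record. */
struct resv {
	int64_t reserved;
	int64_t asked;
};

/* Returning blocks always succeeds, so there is nothing to report. */
static void resv_free(int64_t *fdblocks, struct resv *r)
{
	int error;

	assert(r->reserved >= 0);
	error = mod_fdblocks(fdblocks, r->reserved);
	assert(error == 0);	/* delta >= 0 never takes the ENOSPC path */
	(void)error;		/* keep NDEBUG builds warning-free */

	r->reserved = 0;
	r->asked = 0;
}
```

Because the only failure mode sits on the subtraction path, every caller of the free function can drop its error handling, which is the simplification the diffstat above reflects.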
* [PATCHSET v1.0 0/2] xfs: extent free log intent cleanups
  2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
                   ` (9 preceding siblings ...)
  2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: enable in-core block reservation for rt metadata Darrick J. Wong
@ 2022-12-30 22:18 ` Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 1/2] xfs: clean up extent free log intent item tracepoint callsites Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 2/2] xfs: convert "skip_discard" to a proper flags bitset Darrick J. Wong
  2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: widen EFI format to support rt Darrick J. Wong
                   ` (28 subsequent siblings)
  39 siblings, 2 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

Hi all,

This series cleans up some warts in the extent freeing log intent code.
We start by acknowledging that this mechanism does not have anything to
do with the bmap code by moving it to xfs_alloc.c and giving the
function a more descriptive name. Then we clean up the tracepoints and
the _finish_one call paths to pass the intent structure around. This
reduces the overhead when the tracepoints are disabled and will make
things much cleaner when we start adding realtime support in the next
patch.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=extfree-intent-cleanups

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=extfree-intent-cleanups
---
 fs/xfs/libxfs/xfs_ag.c         |    3 ++-
 fs/xfs/libxfs/xfs_alloc.c      |   14 +++++++-------
 fs/xfs/libxfs/xfs_alloc.h      |   20 +++++++-------------
 fs/xfs/libxfs/xfs_bmap.c       |   14 +++++++++-----
 fs/xfs/libxfs/xfs_bmap_btree.c |    2 +-
 fs/xfs/libxfs/xfs_ialloc.c     |    6 +++---
 fs/xfs/libxfs/xfs_refcount.c   |    7 ++++---
 fs/xfs/scrub/newbt.c           |    4 ++--
 fs/xfs/scrub/reap.c            |   11 +++++++----
 fs/xfs/xfs_extfree_item.c      |    6 ++----
 fs/xfs/xfs_reflink.c           |    2 +-
 fs/xfs/xfs_trace.h             |   33 +++++++++++++++------------------
 12 files changed, 60 insertions(+), 62 deletions(-)

^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 1/2] xfs: clean up extent free log intent item tracepoint callsites 2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: extent free log intent cleanups Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/2] xfs: convert "skip_discard" to a proper flags bitset Darrick J. Wong 1 sibling, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Pass the incore EFI structure to the tracepoints instead of open-coding the argument passing. This cleans up the call sites a bit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_alloc.c | 7 +++---- fs/xfs/libxfs/xfs_alloc.h | 1 - fs/xfs/xfs_extfree_item.c | 6 ++---- fs/xfs/xfs_trace.h | 33 +++++++++++++++------------------ 4 files changed, 20 insertions(+), 27 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index 550d0e3c8528..b9aef7937a2c 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2587,7 +2587,7 @@ xfs_defer_agfl_block( xefi->xefi_blockcount = 1; xefi->xefi_owner = oinfo->oi_owner; - trace_xfs_agfl_free_defer(mp, agno, 0, agbno, 1); + trace_xfs_agfl_free_defer(mp, xefi); xfs_extent_free_get_group(mp, xefi); xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_AGFL_FREE, &xefi->xefi_list); @@ -2641,9 +2641,8 @@ __xfs_free_extent_later( } else { xefi->xefi_owner = XFS_RMAP_OWN_NULL; } - trace_xfs_bmap_free_defer(mp, - XFS_FSB_TO_AGNO(tp->t_mountp, bno), 0, - XFS_FSB_TO_AGBNO(tp->t_mountp, bno), len); + + trace_xfs_extent_free_defer(mp, xefi); xfs_extent_free_get_group(mp, xefi); xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list); diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index 5b05c8bfa60a..83b92c3b3452 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -255,7 +255,6 @@ xfs_free_extent_later( __xfs_free_extent_later(tp, bno, len, oinfo, false); } - extern struct 
kmem_cache *xfs_extfree_item_cache; int __init xfs_extfree_intent_init_cache(void); diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index cec637de322e..e23af5ee16b1 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -361,8 +361,7 @@ xfs_trans_free_extent( if (xefi->xefi_flags & XFS_EFI_BMBT_BLOCK) oinfo.oi_flags |= XFS_OWNER_INFO_BMBT_BLOCK; - trace_xfs_bmap_free_deferred(tp->t_mountp, xefi->xefi_pag->pag_agno, 0, - agbno, xefi->xefi_blockcount); + trace_xfs_extent_free_deferred(mp, xefi); error = __xfs_free_extent(tp, xefi->xefi_pag, agbno, xefi->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE, @@ -558,8 +557,7 @@ xfs_agfl_free_finish_item( agbno = XFS_FSB_TO_AGBNO(mp, xefi->xefi_startblock); oinfo.oi_owner = xefi->xefi_owner; - trace_xfs_agfl_free_deferred(mp, xefi->xefi_pag->pag_agno, 0, agbno, - xefi->xefi_blockcount); + trace_xfs_agfl_free_deferred(mp, xefi); error = xfs_alloc_read_agf(xefi->xefi_pag, tp, 0, &agbp); if (!error) diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 0b5748546c4c..698616531ea0 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -83,6 +83,7 @@ struct xfs_bmap_intent; struct xfs_swapext_intent; struct xfs_swapext_req; struct xfs_rtgroup; +struct xfs_extent_free_item; #define XFS_ATTR_FILTER_FLAGS \ { XFS_ATTR_ROOT, "ROOT" }, \ @@ -2761,41 +2762,37 @@ DEFINE_DEFER_PENDING_EVENT(xfs_defer_pending_abort); DEFINE_DEFER_PENDING_EVENT(xfs_defer_relog_intent); DECLARE_EVENT_CLASS(xfs_free_extent_deferred_class, - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, - int type, xfs_agblock_t agbno, xfs_extlen_t len), - TP_ARGS(mp, agno, type, agbno, len), + TP_PROTO(struct xfs_mount *mp, struct xfs_extent_free_item *free), + TP_ARGS(mp, free), TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_agnumber_t, agno) - __field(int, type) __field(xfs_agblock_t, agbno) __field(xfs_extlen_t, len) + __field(unsigned int, flags) ), TP_fast_assign( __entry->dev = mp->m_super->s_dev; - __entry->agno = 
agno; - __entry->type = type; - __entry->agbno = agbno; - __entry->len = len; + __entry->agno = XFS_FSB_TO_AGNO(mp, free->xefi_startblock); + __entry->agbno = XFS_FSB_TO_AGBNO(mp, free->xefi_startblock); + __entry->len = free->xefi_blockcount; + __entry->flags = free->xefi_flags; ), - TP_printk("dev %d:%d op %d agno 0x%x agbno 0x%x fsbcount 0x%x", + TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x flags 0x%x", MAJOR(__entry->dev), MINOR(__entry->dev), - __entry->type, __entry->agno, __entry->agbno, - __entry->len) + __entry->len, + __entry->flags) ); #define DEFINE_FREE_EXTENT_DEFERRED_EVENT(name) \ DEFINE_EVENT(xfs_free_extent_deferred_class, name, \ - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \ - int type, \ - xfs_agblock_t bno, \ - xfs_extlen_t len), \ - TP_ARGS(mp, agno, type, bno, len)) -DEFINE_FREE_EXTENT_DEFERRED_EVENT(xfs_bmap_free_defer); -DEFINE_FREE_EXTENT_DEFERRED_EVENT(xfs_bmap_free_deferred); + TP_PROTO(struct xfs_mount *mp, struct xfs_extent_free_item *free), \ + TP_ARGS(mp, free)) DEFINE_FREE_EXTENT_DEFERRED_EVENT(xfs_agfl_free_defer); DEFINE_FREE_EXTENT_DEFERRED_EVENT(xfs_agfl_free_deferred); +DEFINE_FREE_EXTENT_DEFERRED_EVENT(xfs_extent_free_defer); +DEFINE_FREE_EXTENT_DEFERRED_EVENT(xfs_extent_free_deferred); DECLARE_EVENT_CLASS(xfs_defer_pending_item_class, TP_PROTO(struct xfs_mount *mp, struct xfs_defer_pending *dfp, ^ permalink raw reply related [flat|nested] 565+ messages in thread
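The tracepoint change in this patch moves the block-number decoding out of every call site and into the event class, which now receives the incore EFI directly. A rough userspace sketch of that idea follows. All `model_` names are hypothetical and the structures are heavily simplified; the shift-and-mask decoding assumes the usual XFS layout in which a filesystem block number carries the AG number above `agblklog` bits of AG-relative block number.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the mount structure. */
struct model_mount {
	unsigned int agblklog;	/* bits used for the AG-relative block number */
};

/* Hypothetical, simplified stand-in for the incore EFI. */
struct model_free_item {
	uint64_t     startblock;
	uint32_t     blockcount;
	unsigned int flags;
};

/* Fields recorded by the trace event, as in the new TP_STRUCT__entry. */
struct model_trace_entry {
	uint32_t     agno;
	uint32_t     agbno;
	uint32_t     len;
	unsigned int flags;
};

/*
 * Counterpart of TP_fast_assign: the caller hands over the whole item and
 * the decoding happens here, not at each call site.
 */
static void model_trace_fill(struct model_trace_entry *e,
			     const struct model_mount *mp,
			     const struct model_free_item *free)
{
	assert(mp->agblklog < 64);

	e->agno = (uint32_t)(free->startblock >> mp->agblklog);
	e->agbno = (uint32_t)(free->startblock &
			      ((1ULL << mp->agblklog) - 1));
	e->len = free->blockcount;
	e->flags = free->flags;
}
```

A call site then shrinks to a single call that passes the mount and the item, which matches the shape of the new `trace_xfs_extent_free_defer(mp, xefi)` lines in the diff above.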
* [PATCH 2/2] xfs: convert "skip_discard" to a proper flags bitset 2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: extent free log intent cleanups Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/2] xfs: clean up extent free log intent item tracepoint callsites Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 1 sibling, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert the boolean to skip discard on free into a proper flags field so that we can add more flags in the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_ag.c | 3 ++- fs/xfs/libxfs/xfs_alloc.c | 7 ++++--- fs/xfs/libxfs/xfs_alloc.h | 19 +++++++------------ fs/xfs/libxfs/xfs_bmap.c | 14 +++++++++----- fs/xfs/libxfs/xfs_bmap_btree.c | 2 +- fs/xfs/libxfs/xfs_ialloc.c | 6 +++--- fs/xfs/libxfs/xfs_refcount.c | 7 ++++--- fs/xfs/scrub/newbt.c | 4 ++-- fs/xfs/scrub/reap.c | 11 +++++++---- fs/xfs/xfs_reflink.c | 2 +- 10 files changed, 40 insertions(+), 35 deletions(-) diff --git a/fs/xfs/libxfs/xfs_ag.c b/fs/xfs/libxfs/xfs_ag.c index bc1fc86df322..baf13b4fc0f2 100644 --- a/fs/xfs/libxfs/xfs_ag.c +++ b/fs/xfs/libxfs/xfs_ag.c @@ -921,7 +921,8 @@ xfs_ag_shrink_space( if (err2 != -ENOSPC) goto resv_err; - __xfs_free_extent_later(*tpp, args.fsbno, delta, NULL, true); + xfs_free_extent_later(*tpp, args.fsbno, delta, NULL, + XFS_FREE_EXTENT_SKIP_DISCARD); /* * Roll the transaction before trying to re-init the per-ag diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index b9aef7937a2c..d4943c197a76 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2598,12 +2598,12 @@ xfs_defer_agfl_block( * The list is maintained sorted (by block number). 
*/ void -__xfs_free_extent_later( +xfs_free_extent_later( struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo, - bool skip_discard) + unsigned int flags) { struct xfs_extent_free_item *xefi; struct xfs_mount *mp = tp->t_mountp; @@ -2622,13 +2622,14 @@ __xfs_free_extent_later( ASSERT(len < mp->m_sb.sb_agblocks); ASSERT(agbno + len <= mp->m_sb.sb_agblocks); #endif + ASSERT(!(flags & ~XFS_FREE_EXTENT_ALL_FLAGS)); ASSERT(xfs_extfree_item_cache != NULL); xefi = kmem_cache_zalloc(xfs_extfree_item_cache, GFP_KERNEL | __GFP_NOFAIL); xefi->xefi_startblock = bno; xefi->xefi_blockcount = (xfs_extlen_t)len; - if (skip_discard) + if (flags & XFS_FREE_EXTENT_SKIP_DISCARD) xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD; if (oinfo) { ASSERT(oinfo->oi_offset == 0); diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index 83b92c3b3452..19c5f046c3c4 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -221,9 +221,14 @@ xfs_buf_to_agfl_bno( return bp->b_addr; } -void __xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno, +void xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno, xfs_filblks_t len, const struct xfs_owner_info *oinfo, - bool skip_discard); + unsigned int flags); + +/* Don't issue a discard for the blocks freed. */ +#define XFS_FREE_EXTENT_SKIP_DISCARD (1U << 0) + +#define XFS_FREE_EXTENT_ALL_FLAGS (XFS_FREE_EXTENT_SKIP_DISCARD) /* * List of extents to be free "later". 
@@ -245,16 +250,6 @@ void xfs_extent_free_get_group(struct xfs_mount *mp, #define XFS_EFI_ATTR_FORK (1U << 1) /* freeing attr fork block */ #define XFS_EFI_BMBT_BLOCK (1U << 2) /* freeing bmap btree block */ -static inline void -xfs_free_extent_later( - struct xfs_trans *tp, - xfs_fsblock_t bno, - xfs_filblks_t len, - const struct xfs_owner_info *oinfo) -{ - __xfs_free_extent_later(tp, bno, len, oinfo, false); -} - extern struct kmem_cache *xfs_extfree_item_cache; int __init xfs_extfree_intent_init_cache(void); diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index eda20bb5c4af..2e93b018d150 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -585,7 +585,7 @@ xfs_bmap_btree_to_extents( if ((error = xfs_btree_check_block(cur, cblock, 0, cbp))) return error; xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, whichfork); - xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo); + xfs_free_extent_later(cur->bc_tp, cbno, 1, &oinfo, 0); ip->i_nblocks--; xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L); xfs_trans_binval(tp, cbp); @@ -5307,10 +5307,14 @@ xfs_bmap_del_extent_real( if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { xfs_refcount_decrease_extent(tp, del); } else { - __xfs_free_extent_later(tp, del->br_startblock, - del->br_blockcount, NULL, - (bflags & XFS_BMAPI_NODISCARD) || - del->br_state == XFS_EXT_UNWRITTEN); + unsigned int efi_flags = 0; + + if ((bflags & XFS_BMAPI_NODISCARD) || + del->br_state == XFS_EXT_UNWRITTEN) + efi_flags |= XFS_FREE_EXTENT_SKIP_DISCARD; + + xfs_free_extent_later(tp, del->br_startblock, + del->br_blockcount, NULL, efi_flags); } } diff --git a/fs/xfs/libxfs/xfs_bmap_btree.c b/fs/xfs/libxfs/xfs_bmap_btree.c index 4c6a91acdad6..4f0bf593c2a5 100644 --- a/fs/xfs/libxfs/xfs_bmap_btree.c +++ b/fs/xfs/libxfs/xfs_bmap_btree.c @@ -287,7 +287,7 @@ xfs_bmbt_free_block( struct xfs_owner_info oinfo; xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork); - xfs_free_extent_later(cur->bc_tp, 
fsbno, 1, &oinfo); + xfs_free_extent_later(cur->bc_tp, fsbno, 1, &oinfo, 0); ip->i_nblocks--; xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c index 331d22a60272..7bfda1c884aa 100644 --- a/fs/xfs/libxfs/xfs_ialloc.c +++ b/fs/xfs/libxfs/xfs_ialloc.c @@ -1918,8 +1918,8 @@ xfs_difree_inode_chunk( if (!xfs_inobt_issparse(rec->ir_holemask)) { /* not sparse, calculate extent info directly */ xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, sagbno), - M_IGEO(mp)->ialloc_blks, - &XFS_RMAP_OINFO_INODES); + M_IGEO(mp)->ialloc_blks, + &XFS_RMAP_OINFO_INODES, 0); return; } @@ -1963,7 +1963,7 @@ xfs_difree_inode_chunk( ASSERT(agbno % mp->m_sb.sb_spino_align == 0); ASSERT(contigblk % mp->m_sb.sb_spino_align == 0); xfs_free_extent_later(tp, XFS_AGB_TO_FSB(mp, agno, agbno), - contigblk, &XFS_RMAP_OINFO_INODES); + contigblk, &XFS_RMAP_OINFO_INODES, 0); /* reset range to current bit and carry on... */ startidx = endidx = nextbit; diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index ba329fa53a56..2721c6076712 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -1180,7 +1180,7 @@ xfs_refcount_adjust_extents( cur->bc_ag.pag->pag_agno, tmp.rc_startblock); xfs_free_extent_later(cur->bc_tp, fsbno, - tmp.rc_blockcount, NULL); + tmp.rc_blockcount, NULL, 0); } (*agbno) += tmp.rc_blockcount; @@ -1241,7 +1241,7 @@ xfs_refcount_adjust_extents( cur->bc_ag.pag->pag_agno, ext.rc_startblock); xfs_free_extent_later(cur->bc_tp, fsbno, - ext.rc_blockcount, NULL); + ext.rc_blockcount, NULL, 0); } skip: @@ -2021,7 +2021,8 @@ xfs_refcount_recover_cow_leftovers( rr->rr_rrec.rc_blockcount); /* Free the block. 
*/ - xfs_free_extent_later(tp, fsb, rr->rr_rrec.rc_blockcount, NULL); + xfs_free_extent_later(tp, fsb, rr->rr_rrec.rc_blockcount, NULL, + 0); error = xfs_trans_commit(tp); if (error) diff --git a/fs/xfs/scrub/newbt.c b/fs/xfs/scrub/newbt.c index 9c0ccba75656..6812ff67848d 100644 --- a/fs/xfs/scrub/newbt.c +++ b/fs/xfs/scrub/newbt.c @@ -416,8 +416,8 @@ xrep_newbt_free_extent( * if the system goes down. */ fsbno = XFS_AGB_TO_FSB(sc->mp, resv->pag->pag_agno, free_agbno); - __xfs_free_extent_later(sc->tp, fsbno, free_aglen, &xnr->oinfo, - true); + xfs_free_extent_later(sc->tp, fsbno, free_aglen, &xnr->oinfo, + XFS_FREE_EXTENT_SKIP_DISCARD); return 1; } diff --git a/fs/xfs/scrub/reap.c b/fs/xfs/scrub/reap.c index f43ad4dfc6f7..151afacab982 100644 --- a/fs/xfs/scrub/reap.c +++ b/fs/xfs/scrub/reap.c @@ -382,7 +382,8 @@ xreap_agextent( rs->force_roll = true; xfs_refcount_free_cow_extent(sc->tp, fsbno, *aglenp); - __xfs_free_extent_later(sc->tp, fsbno, *aglenp, NULL, true); + xfs_free_extent_later(sc->tp, fsbno, *aglenp, NULL, + XFS_FREE_EXTENT_SKIP_DISCARD); return 0; } @@ -412,7 +413,8 @@ xreap_agextent( * to minimize the window in which we could crash and lose the * old blocks. 
*/ - __xfs_free_extent_later(sc->tp, fsbno, *aglenp, rs->oinfo, true); + xfs_free_extent_later(sc->tp, fsbno, *aglenp, rs->oinfo, + XFS_FREE_EXTENT_SKIP_DISCARD); rs->deferred++; break; } @@ -959,8 +961,9 @@ xreap_ifork_extent( xfs_bmap_unmap_extent(sc->tp, ip, whichfork, imap); xfs_trans_mod_dquot_byino(sc->tp, ip, XFS_TRANS_DQ_BCOUNT, -(int64_t)imap->br_blockcount); - __xfs_free_extent_later(sc->tp, imap->br_startblock, - imap->br_blockcount, NULL, true); + xfs_free_extent_later(sc->tp, imap->br_startblock, + imap->br_blockcount, NULL, + XFS_FREE_EXTENT_SKIP_DISCARD); } out_agf: diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 0804f0ad6b1c..cf514af238ce 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -618,7 +618,7 @@ xfs_reflink_cancel_cow_blocks( del.br_blockcount); xfs_free_extent_later(*tpp, del.br_startblock, - del.br_blockcount, NULL); + del.br_blockcount, NULL, 0); /* Roll the transaction */ error = xfs_defer_finish(tpp); ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCHSET v1.0 0/2] xfs: widen EFI format to support rt 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (10 preceding siblings ...) 2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: extent free log intent cleanups Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/2] xfs: support logging EFIs for realtime extents Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/2] xfs: support error injection when freeing rt extents Darrick J. Wong 2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: rmap log intent cleanups Darrick J. Wong ` (27 subsequent siblings) 39 siblings, 2 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs Hi all, Realtime reverse mapping (and beyond that, realtime reflink) needs to be able to defer file mapping and extent freeing work in much the same manner as is required on the data volume. Make the extent freeing log items operate on rt extents in preparation for realtime rmap. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. --D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-extfree-intents xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-extfree-intents --- fs/xfs/libxfs/xfs_alloc.c | 35 ++++++++++++++++---- fs/xfs/libxfs/xfs_alloc.h | 17 ++++++++-- fs/xfs/libxfs/xfs_defer.c | 1 + fs/xfs/libxfs/xfs_defer.h | 1 + fs/xfs/libxfs/xfs_log_format.h | 7 ++++ fs/xfs/libxfs/xfs_rtbitmap.c | 4 ++ fs/xfs/xfs_extfree_item.c | 71 ++++++++++++++++++++++++++++++++++++++-- 7 files changed, 123 insertions(+), 13 deletions(-) ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 1/2] xfs: support logging EFIs for realtime extents 2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: widen EFI format to support rt Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/2] xfs: support error injection when freeing rt extents Darrick J. Wong 1 sibling, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Teach the EFI mechanism how to free realtime extents. We do this very sneakily, by using the upper bit of the length field in the log format (and a boolean flag incore) to convey the realtime status. We're going to need this to enforce proper ordering of operations when we enable realtime rmap. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_alloc.c | 35 ++++++++++++++++---- fs/xfs/libxfs/xfs_alloc.h | 17 ++++++++-- fs/xfs/libxfs/xfs_defer.c | 1 + fs/xfs/libxfs/xfs_defer.h | 1 + fs/xfs/libxfs/xfs_log_format.h | 7 ++++ fs/xfs/xfs_extfree_item.c | 71 ++++++++++++++++++++++++++++++++++++++-- 6 files changed, 119 insertions(+), 13 deletions(-) diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c index d4943c197a76..5d091789ff74 100644 --- a/fs/xfs/libxfs/xfs_alloc.c +++ b/fs/xfs/libxfs/xfs_alloc.c @@ -2607,6 +2607,7 @@ xfs_free_extent_later( { struct xfs_extent_free_item *xefi; struct xfs_mount *mp = tp->t_mountp; + enum xfs_defer_ops_type optype; #ifdef DEBUG xfs_agnumber_t agno; xfs_agblock_t agbno; @@ -2615,12 +2616,19 @@ xfs_free_extent_later( ASSERT(len > 0); ASSERT(len <= XFS_MAX_BMBT_EXTLEN); ASSERT(!isnullstartblock(bno)); - agno = XFS_FSB_TO_AGNO(mp, bno); - agbno = XFS_FSB_TO_AGBNO(mp, bno); - ASSERT(agno < mp->m_sb.sb_agcount); - ASSERT(agbno < mp->m_sb.sb_agblocks); - ASSERT(len < mp->m_sb.sb_agblocks); - ASSERT(agbno + len <= mp->m_sb.sb_agblocks); + if (flags & XFS_FREE_EXTENT_REALTIME) { + ASSERT(bno < mp->m_sb.sb_rblocks); + ASSERT(len <= 
mp->m_sb.sb_rblocks); + ASSERT(bno + len <= mp->m_sb.sb_rblocks); + } else { + agno = XFS_FSB_TO_AGNO(mp, bno); + agbno = XFS_FSB_TO_AGBNO(mp, bno); + + ASSERT(agno < mp->m_sb.sb_agcount); + ASSERT(agbno < mp->m_sb.sb_agblocks); + ASSERT(len < mp->m_sb.sb_agblocks); + ASSERT(agbno + len <= mp->m_sb.sb_agblocks); + } #endif ASSERT(!(flags & ~XFS_FREE_EXTENT_ALL_FLAGS)); ASSERT(xfs_extfree_item_cache != NULL); @@ -2631,6 +2639,19 @@ xfs_free_extent_later( xefi->xefi_blockcount = (xfs_extlen_t)len; if (flags & XFS_FREE_EXTENT_SKIP_DISCARD) xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD; + if (flags & XFS_FREE_EXTENT_REALTIME) { + /* + * Realtime and data section EFIs must use separate + * transactions to finish deferred work because updates to + * realtime metadata files can lock AGFs to allocate btree + * blocks and we don't want that mixing with the AGF locks + * taken to finish data section EFIs. + */ + optype = XFS_DEFER_OPS_TYPE_FREE_RT; + xefi->xefi_flags |= XFS_EFI_REALTIME; + } else { + optype = XFS_DEFER_OPS_TYPE_FREE; + } if (oinfo) { ASSERT(oinfo->oi_offset == 0); @@ -2646,7 +2667,7 @@ xfs_free_extent_later( trace_xfs_extent_free_defer(mp, xefi); xfs_extent_free_get_group(mp, xefi); - xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list); + xfs_defer_add(tp, optype, &xefi->xefi_list); } #ifdef DEBUG diff --git a/fs/xfs/libxfs/xfs_alloc.h b/fs/xfs/libxfs/xfs_alloc.h index 19c5f046c3c4..cd7b26568a33 100644 --- a/fs/xfs/libxfs/xfs_alloc.h +++ b/fs/xfs/libxfs/xfs_alloc.h @@ -228,7 +228,11 @@ void xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno, /* Don't issue a discard for the blocks freed. */ #define XFS_FREE_EXTENT_SKIP_DISCARD (1U << 0) -#define XFS_FREE_EXTENT_ALL_FLAGS (XFS_FREE_EXTENT_SKIP_DISCARD) +/* Free blocks on the realtime device. */ +#define XFS_FREE_EXTENT_REALTIME (1U << 1) + +#define XFS_FREE_EXTENT_ALL_FLAGS (XFS_FREE_EXTENT_SKIP_DISCARD | \ + XFS_FREE_EXTENT_REALTIME) /* * List of extents to be free "later". 
@@ -239,7 +243,10 @@ struct xfs_extent_free_item { uint64_t xefi_owner; xfs_fsblock_t xefi_startblock;/* starting fs block number */ xfs_extlen_t xefi_blockcount;/* number of blocks in extent */ - struct xfs_perag *xefi_pag; + union { + struct xfs_perag *xefi_pag; + struct xfs_rtgroup *xefi_rtg; + }; unsigned int xefi_flags; }; @@ -249,6 +256,12 @@ void xfs_extent_free_get_group(struct xfs_mount *mp, #define XFS_EFI_SKIP_DISCARD (1U << 0) /* don't issue discard */ #define XFS_EFI_ATTR_FORK (1U << 1) /* freeing attr fork block */ #define XFS_EFI_BMBT_BLOCK (1U << 2) /* freeing bmap btree block */ +#define XFS_EFI_REALTIME (1U << 3) /* freeing realtime extent */ + +static inline bool xfs_efi_is_realtime(const struct xfs_extent_free_item *xefi) +{ + return xefi->xefi_flags & XFS_EFI_REALTIME; +} extern struct kmem_cache *xfs_extfree_item_cache; diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index 1619b9b928db..c0416bae880a 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -188,6 +188,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = { [XFS_DEFER_OPS_TYPE_REFCOUNT] = &xfs_refcount_update_defer_type, [XFS_DEFER_OPS_TYPE_RMAP] = &xfs_rmap_update_defer_type, [XFS_DEFER_OPS_TYPE_FREE] = &xfs_extent_free_defer_type, + [XFS_DEFER_OPS_TYPE_FREE_RT] = &xfs_extent_free_defer_type, [XFS_DEFER_OPS_TYPE_AGFL_FREE] = &xfs_agfl_free_defer_type, [XFS_DEFER_OPS_TYPE_ATTR] = &xfs_attr_defer_type, [XFS_DEFER_OPS_TYPE_SWAPEXT] = &xfs_swapext_defer_type, diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h index bcc48b0c75c9..52198c7124c6 100644 --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -19,6 +19,7 @@ enum xfs_defer_ops_type { XFS_DEFER_OPS_TYPE_RMAP, XFS_DEFER_OPS_TYPE_FREE, XFS_DEFER_OPS_TYPE_AGFL_FREE, + XFS_DEFER_OPS_TYPE_FREE_RT, XFS_DEFER_OPS_TYPE_ATTR, XFS_DEFER_OPS_TYPE_SWAPEXT, XFS_DEFER_OPS_TYPE_MAX, diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h index 
378201a70028..f3c8257a7545 100644 --- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -591,6 +591,13 @@ typedef struct xfs_extent { xfs_extlen_t ext_len; } xfs_extent_t; +/* + * This EFI extent describes a realtime extent. We can never free more than + * XFS_MAX_BMBT_EXTLEN (2^21) blocks at a time, so we know that the upper bits + * of ext_len cannot be used. + */ +#define XFS_EFI_EXTLEN_REALTIME_EXT (1U << 31) + /* * Since an xfs_extent_t has types (start:64, len: 32) * there are different alignments on 32 bit and 64 bit kernels. diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index e23af5ee16b1..42b89c9e996b 100644 --- a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -25,6 +25,10 @@ #include "xfs_error.h" #include "xfs_log_priv.h" #include "xfs_log_recover.h" +#include "xfs_rtalloc.h" +#include "xfs_inode.h" +#include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" struct kmem_cache *xfs_efi_cache; struct kmem_cache *xfs_efd_cache; @@ -363,9 +367,17 @@ xfs_trans_free_extent( trace_xfs_extent_free_deferred(mp, xefi); - error = __xfs_free_extent(tp, xefi->xefi_pag, agbno, - xefi->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE, - xefi->xefi_flags & XFS_EFI_SKIP_DISCARD); + if (xfs_efi_is_realtime(xefi)) { + ASSERT(xefi->xefi_owner == XFS_RMAP_OWN_NULL || + xefi->xefi_owner == XFS_RMAP_OWN_UNKNOWN); + + error = xfs_rtfree_blocks(tp, xefi->xefi_startblock, + xefi->xefi_blockcount); + } else { + error = __xfs_free_extent(tp, xefi->xefi_pag, agbno, + xefi->xefi_blockcount, &oinfo, XFS_AG_RESV_NONE, + xefi->xefi_flags & XFS_EFI_SKIP_DISCARD); + } /* * Mark the transaction dirty, even on error. 
This ensures the @@ -400,6 +412,11 @@ xfs_extent_free_diff_items( ra = container_of(a, struct xfs_extent_free_item, xefi_list); rb = container_of(b, struct xfs_extent_free_item, xefi_list); + ASSERT(xfs_efi_is_realtime(ra) == xfs_efi_is_realtime(rb)); + + if (xfs_efi_is_realtime(ra)) + return ra->xefi_rtg->rtg_rgno - rb->xefi_rtg->rtg_rgno; + return ra->xefi_pag->pag_agno - rb->xefi_pag->pag_agno; } @@ -426,6 +443,8 @@ xfs_extent_free_log_item( extp = &efip->efi_format.efi_extents[next_extent]; extp->ext_start = xefi->xefi_startblock; extp->ext_len = xefi->xefi_blockcount; + if (xfs_efi_is_realtime(xefi)) + extp->ext_len |= XFS_EFI_EXTLEN_REALTIME_EXT; } static struct xfs_log_item * @@ -467,6 +486,14 @@ xfs_extent_free_get_group( { xfs_agnumber_t agno; + if (xfs_efi_is_realtime(xefi)) { + xfs_rgnumber_t rgno; + + rgno = xfs_rtb_to_rgno(mp, xefi->xefi_startblock); + xefi->xefi_rtg = xfs_rtgroup_get(mp, rgno); + return; + } + agno = XFS_FSB_TO_AGNO(mp, xefi->xefi_startblock); xefi->xefi_pag = xfs_perag_get(mp, agno); xfs_perag_bump_intents(xefi->xefi_pag); @@ -477,6 +504,11 @@ static inline void xfs_extent_free_put_group( struct xfs_extent_free_item *xefi) { + if (xfs_efi_is_realtime(xefi)) { + xfs_rtgroup_put(xefi->xefi_rtg); + return; + } + xfs_perag_drop_intents(xefi->xefi_pag); xfs_perag_put(xefi->xefi_pag); } @@ -494,6 +526,15 @@ xfs_extent_free_finish_item( xefi = container_of(item, struct xfs_extent_free_item, xefi_list); + /* + * Lock the rt bitmap if we've any realtime extents to free and we + * haven't locked the rt inodes yet. 
+ */ + if (*state == NULL && xfs_efi_is_realtime(xefi)) { + xfs_rtbitmap_lock(tp, tp->t_mountp); + *state = (struct xfs_btree_cur *)1; + } + error = xfs_trans_free_extent(tp, EFD_ITEM(done), xefi); xfs_extent_free_put_group(xefi); @@ -554,6 +595,7 @@ xfs_agfl_free_finish_item( xefi = container_of(item, struct xfs_extent_free_item, xefi_list); ASSERT(xefi->xefi_blockcount == 1); + ASSERT(!xfs_efi_is_realtime(xefi)); agbno = XFS_FSB_TO_AGBNO(mp, xefi->xefi_startblock); oinfo.oi_owner = xefi->xefi_owner; @@ -602,6 +644,10 @@ xfs_efi_validate_ext( struct xfs_mount *mp, struct xfs_extent *extp) { + if (extp->ext_len & XFS_EFI_EXTLEN_REALTIME_EXT) + return xfs_verify_rtbext(mp, extp->ext_start, + extp->ext_len & ~XFS_EFI_EXTLEN_REALTIME_EXT); + return xfs_verify_fsbext(mp, extp->ext_start, extp->ext_len); } @@ -641,16 +687,33 @@ xfs_efi_item_recover( return error; efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents); + /* Lock the rt bitmap if we've any realtime extents to free. */ + for (i = 0; i < efip->efi_format.efi_nextents; i++) { + struct xfs_extent *extp; + + extp = &efip->efi_format.efi_extents[i]; + if (extp->ext_len & XFS_EFI_EXTLEN_REALTIME_EXT) { + xfs_rtbitmap_lock(tp, mp); + break; + } + } + for (i = 0; i < efip->efi_format.efi_nextents; i++) { struct xfs_extent_free_item fake = { .xefi_owner = XFS_RMAP_OWN_UNKNOWN, }; struct xfs_extent *extp; + unsigned int len; extp = &efip->efi_format.efi_extents[i]; fake.xefi_startblock = extp->ext_start; - fake.xefi_blockcount = extp->ext_len; + len = extp->ext_len; + if (len & XFS_EFI_EXTLEN_REALTIME_EXT) { + len &= ~XFS_EFI_EXTLEN_REALTIME_EXT; + fake.xefi_flags |= XFS_EFI_REALTIME; + } + fake.xefi_blockcount = len; xfs_extent_free_get_group(mp, &fake); error = xfs_trans_free_extent(tp, efdp, &fake); ^ permalink raw reply related [flat|nested] 565+ messages in thread
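For readers following the log format change in this patch, the encoding trick can be sketched in isolation. This is a userspace illustration only, not the kernel code: the `demo_` names are hypothetical stand-ins, and the sketch assumes (as the patch comment states) that a valid length always fits well below bit 31, so the top bit of the 32-bit length is free to carry the realtime flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for XFS_MAX_BMBT_EXTLEN and XFS_EFI_EXTLEN_REALTIME_EXT. */
#define DEMO_MAX_BMBT_EXTLEN		((uint32_t)((1U << 21) - 1))
#define DEMO_EFI_EXTLEN_REALTIME_EXT	(1U << 31)

/* Simplified model of the on-disk extent: 64-bit start, 32-bit length. */
struct demo_extent {
	uint64_t	ext_start;
	uint32_t	ext_len;
};

/* Log an extent, stashing the realtime status in the top bit of ext_len. */
static inline void
demo_efi_encode(struct demo_extent *extp, uint64_t start, uint32_t len,
		bool realtime)
{
	/* A real length never reaches bit 31, so the flag cannot collide. */
	assert(len > 0 && len <= DEMO_MAX_BMBT_EXTLEN);

	extp->ext_start = start;
	extp->ext_len = len;
	if (realtime)
		extp->ext_len |= DEMO_EFI_EXTLEN_REALTIME_EXT;
}

/* Recovery side: does this logged extent describe realtime space? */
static inline bool
demo_efi_is_realtime(const struct demo_extent *extp)
{
	return (extp->ext_len & DEMO_EFI_EXTLEN_REALTIME_EXT) != 0;
}

/* Recovery side: mask the flag off to get the real block count back. */
static inline uint32_t
demo_efi_len(const struct demo_extent *extp)
{
	return extp->ext_len & ~DEMO_EFI_EXTLEN_REALTIME_EXT;
}
```

The point of the design is that the record size does not change: old and new records share one layout, and only code that knows about the flag bit needs to mask it before treating `ext_len` as a block count.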
* [PATCH 2/2] xfs: support error injection when freeing rt extents 2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: widen EFI format to support rt Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/2] xfs: support logging EFIs for realtime extents Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 1 sibling, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> A handful of fstests expect to be able to test what happens when extent free intents fail to actually free the extent. Now that we're supporting EFIs for realtime extents, add to xfs_rtfree_extent the same injection point that exists in the regular extent freeing code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtbitmap.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/xfs/libxfs/xfs_rtbitmap.c b/fs/xfs/libxfs/xfs_rtbitmap.c index ccefbfc70f8b..a4cd7925492d 100644 --- a/fs/xfs/libxfs/xfs_rtbitmap.c +++ b/fs/xfs/libxfs/xfs_rtbitmap.c @@ -20,6 +20,7 @@ #include "xfs_rtbitmap.h" #include "xfs_log.h" #include "xfs_buf_item.h" +#include "xfs_errortag.h" /* * Realtime allocator bitmap functions shared with userspace. @@ -1139,6 +1140,9 @@ xfs_rtfree_extent( ASSERT(mp->m_rbmip->i_itemp != NULL); ASSERT(xfs_isilocked(mp->m_rbmip, XFS_ILOCK_EXCL)); + if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_FREE_EXTENT)) + return -EIO; + error = xfs_rtcheck_alloc_range(mp, tp, start, len); if (error) return error; ^ permalink raw reply related [flat|nested] 565+ messages in thread
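As a rough illustration of what an injection point at the top of a free routine buys a test, here is a small userspace model. Everything in it is hypothetical (`demo_errtag`, `demo_test_error`, `demo_rtfree_extent`), and it fires deterministically on every Nth call so the behaviour is easy to check; the kernel's injector is, to my understanding, driven by a configurable pseudo-random frequency instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical error tag: fail every "every"-th call; 0 disables it. */
struct demo_errtag {
	unsigned int	every;
	unsigned int	calls;
};

/* Return true when the caller should simulate a failure. */
static bool
demo_test_error(struct demo_errtag *tag)
{
	if (tag->every == 0)
		return false;
	tag->calls++;
	return (tag->calls % tag->every) == 0;
}

/*
 * Model of a free routine with the injection point placed before any
 * state is modified, so an injected failure leaves the counter untouched.
 */
static int
demo_rtfree_extent(struct demo_errtag *tag, unsigned long long *freeblocks,
		unsigned long long len)
{
	assert(tag && freeblocks);

	if (demo_test_error(tag))
		return -EIO;

	*freeblocks += len;
	return 0;
}
```

Placing the check ahead of the bitmap update mirrors the ordering in the hunk above: the failure is reported before anything has been changed, which is what lets a test observe how the caller copes with an intent that could not be finished.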
* [PATCHSET v1.0 0/5] xfs: rmap log intent cleanups 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (11 preceding siblings ...) 2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: widen EFI format to support rt Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/5] xfs: attach rtgroup objects to btree cursors Darrick J. Wong ` (4 more replies) 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (26 subsequent siblings) 39 siblings, 5 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs Hi all, This series cleans up the rmap intent code before we start adding support for realtime devices. Similar to previous intent cleanup patchsets, we start transforming the tracepoints so that the data extraction is done inside the tracepoint code, and then we start passing the intent itself to the _finish_one function. This reduces the boxing and unboxing of parameters. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. --D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=rmap-intent-cleanups xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=rmap-intent-cleanups --- fs/xfs/libxfs/xfs_btree.c | 4 + fs/xfs/libxfs/xfs_btree.h | 2 fs/xfs/libxfs/xfs_rmap.c | 233 +++++++++++++++++---------------------------- fs/xfs/libxfs/xfs_rmap.h | 10 ++ fs/xfs/xfs_rmap_item.c | 79 +++++++-------- fs/xfs/xfs_trace.c | 1 fs/xfs/xfs_trace.h | 187 +++++++++++++++++++++++++----------- 7 files changed, 265 insertions(+), 251 deletions(-) ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 1/5] xfs: attach rtgroup objects to btree cursors 2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: rmap log intent cleanups Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/5] xfs: give rmap btree cursor error tracepoints their own class Darrick J. Wong ` (3 subsequent siblings) 4 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make it so that we can attach realtime group objects to btree cursors. This will be crucial for enabling rmap btrees in realtime groups. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_btree.c | 4 ++++ fs/xfs/libxfs/xfs_btree.h | 2 ++ 2 files changed, 6 insertions(+) diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index 00bc1dd73675..c02748e16075 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -31,6 +31,7 @@ #include "scrub/xfile.h" #include "scrub/xfbtree.h" #include "xfs_btree_mem.h" +#include "xfs_rtgroup.h" /* * Btree magic numbers. @@ -476,6 +477,9 @@ xfs_btree_del_cursor( xfs_is_shutdown(cur->bc_mp) || error != 0); if (unlikely(cur->bc_flags & XFS_BTREE_STAGING)) kmem_free(cur->bc_ops); + if ((cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) && + !(cur->bc_flags & XFS_BTREE_IN_MEMORY) && cur->bc_ino.rtg) + xfs_rtgroup_put(cur->bc_ino.rtg); if (!(cur->bc_flags & XFS_BTREE_LONG_PTRS) && !(cur->bc_flags & XFS_BTREE_IN_MEMORY) && cur->bc_ag.pag) xfs_perag_put(cur->bc_ag.pag); diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index b15bc77369cf..125f45731a54 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -12,6 +12,7 @@ struct xfs_mount; struct xfs_trans; struct xfs_ifork; struct xfs_perag; +struct xfs_rtgroup; /* * Generic key, ptr and record wrapper structures. 
@@ -244,6 +245,7 @@ struct xfs_btree_cur_ag { /* Btree-in-inode cursor information */ struct xfs_btree_cur_ino { struct xfs_inode *ip; + struct xfs_rtgroup *rtg; /* if realtime metadata */ struct xbtree_ifakeroot *ifake; /* for staging cursor */ int allocated; short forksize; ^ permalink raw reply related [flat|nested] 565+ messages in thread
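The ownership rule this patch adds to cursor teardown can be modelled outside the kernel. The sketch below is illustrative only: the `demo_` types, the flag values, and the plain integer reference count are assumptions made for the example, and the real cursor carries far more state. What it tries to show is that the cursor flags decide which arm of the union is live, and therefore which group reference is dropped when the cursor goes away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag values; only the names echo the kernel's. */
#define DEMO_BTREE_LONG_PTRS		(1U << 0)
#define DEMO_BTREE_ROOT_IN_INODE	(1U << 1)
#define DEMO_BTREE_IN_MEMORY		(1U << 2)

/* Stand-in for a perag or rtgroup: just a reference count. */
struct demo_group {
	int	refcount;
};

static void
demo_group_put(struct demo_group *grp)
{
	assert(grp->refcount > 0);
	grp->refcount--;
}

/* Simplified cursor: the AG and inode-rooted views share storage. */
struct demo_cur {
	unsigned int	flags;
	union {
		struct {
			struct demo_group	*pag;
		} ag;
		struct {
			void			*ip;
			struct demo_group	*rtg;	/* if realtime metadata */
		} ino;
	};
};

static struct demo_cur *
demo_cur_alloc(unsigned int flags)
{
	struct demo_cur	*cur = calloc(1, sizeof(*cur));

	assert(cur != NULL);
	cur->flags = flags;
	return cur;
}

/* Drop whichever group reference the cursor type implies, then free it. */
static void
demo_del_cursor(struct demo_cur *cur)
{
	bool	in_memory = (cur->flags & DEMO_BTREE_IN_MEMORY) != 0;

	if ((cur->flags & DEMO_BTREE_ROOT_IN_INODE) && !in_memory &&
	    cur->ino.rtg)
		demo_group_put(cur->ino.rtg);
	if (!(cur->flags & DEMO_BTREE_LONG_PTRS) && !in_memory &&
	    cur->ag.pag)
		demo_group_put(cur->ag.pag);
	free(cur);
}
```

In this model an inode-rooted cursor is expected to set the long-pointer flag as well, which is what keeps the second check from misreading the inode arm of the union as a perag pointer.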
* [PATCH 2/5] xfs: give rmap btree cursor error tracepoints their own class 2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: rmap log intent cleanups Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/5] xfs: attach rtgroup objects to btree cursors Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 3/5] xfs: prepare rmap btree tracepoints for widening Darrick J. Wong ` (2 subsequent siblings) 4 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a new tracepoint class for btree-related errors, then convert all the rmap tracepoints to use it. Also fix the one tracepoint that was abusing the old class by making it a separate tracepoint. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rmap.c | 33 +++++--------- fs/xfs/xfs_trace.h | 106 ++++++++++++++++++++++++++++++++++++++-------- 2 files changed, 99 insertions(+), 40 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 8da59935780a..f85ff3ddb5c4 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -111,8 +111,7 @@ xfs_rmap_update( xfs_rmap_irec_offset_pack(irec)); error = xfs_btree_update(cur, &rec); if (error) - trace_xfs_rmap_update_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_rmap_update_error(cur, error, _RET_IP_); return error; } @@ -155,8 +154,7 @@ xfs_rmap_insert( } done: if (error) - trace_xfs_rmap_insert_error(rcur->bc_mp, - rcur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_rmap_insert_error(rcur, error, _RET_IP_); return error; } @@ -194,8 +192,7 @@ xfs_rmap_delete( } done: if (error) - trace_xfs_rmap_delete_error(rcur->bc_mp, - rcur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_rmap_delete_error(rcur, error, _RET_IP_); return error; } @@ -816,8 +813,7 @@ xfs_rmap_unmap( unwritten, oinfo); out_error: if (error) - trace_xfs_rmap_unmap_error(mp, 
cur->bc_ag.pag->pag_agno, - error, _RET_IP_); + trace_xfs_rmap_unmap_error(cur, error, _RET_IP_); return error; } @@ -1139,8 +1135,7 @@ xfs_rmap_map( unwritten, oinfo); out_error: if (error) - trace_xfs_rmap_map_error(mp, cur->bc_ag.pag->pag_agno, - error, _RET_IP_); + trace_xfs_rmap_map_error(cur, error, _RET_IP_); return error; } @@ -1335,8 +1330,7 @@ xfs_rmap_convert( RIGHT.rm_blockcount > XFS_RMAP_LEN_MAX) state &= ~RMAP_RIGHT_CONTIG; - trace_xfs_rmap_convert_state(mp, cur->bc_ag.pag->pag_agno, state, - _RET_IP_); + trace_xfs_rmap_convert_state(cur, state, _RET_IP_); /* reset the cursor back to PREV */ error = xfs_rmap_lookup_le(cur, bno, owner, offset, oldext, NULL, &i); @@ -1689,8 +1683,7 @@ xfs_rmap_convert( unwritten, oinfo); done: if (error) - trace_xfs_rmap_convert_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_rmap_convert_error(cur, error, _RET_IP_); return error; } @@ -1813,8 +1806,7 @@ xfs_rmap_convert_shared( RIGHT.rm_blockcount > XFS_RMAP_LEN_MAX) state &= ~RMAP_RIGHT_CONTIG; - trace_xfs_rmap_convert_state(mp, cur->bc_ag.pag->pag_agno, state, - _RET_IP_); + trace_xfs_rmap_convert_state(cur, state, _RET_IP_); /* * Switch out based on the FILLING and CONTIG state bits. 
*/ @@ -2116,8 +2108,7 @@ xfs_rmap_convert_shared( unwritten, oinfo); done: if (error) - trace_xfs_rmap_convert_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_rmap_convert_error(cur, error, _RET_IP_); return error; } @@ -2316,8 +2307,7 @@ xfs_rmap_unmap_shared( unwritten, oinfo); out_error: if (error) - trace_xfs_rmap_unmap_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_rmap_unmap_error(cur, error, _RET_IP_); return error; } @@ -2477,8 +2467,7 @@ xfs_rmap_map_shared( unwritten, oinfo); out_error: if (error) - trace_xfs_rmap_map_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_rmap_map_error(cur, error, _RET_IP_); return error; } diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 698616531ea0..aaac43e61e83 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -2874,46 +2874,87 @@ DEFINE_EVENT(xfs_rmap_class, name, \ const struct xfs_owner_info *oinfo), \ TP_ARGS(mp, agno, agbno, len, unwritten, oinfo)) -/* simple AG-based error/%ip tracepoint class */ -DECLARE_EVENT_CLASS(xfs_ag_error_class, - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, int error, +/* btree cursor error/%ip tracepoint class */ +DECLARE_EVENT_CLASS(xfs_btree_error_class, + TP_PROTO(struct xfs_btree_cur *cur, int error, unsigned long caller_ip), - TP_ARGS(mp, agno, error, caller_ip), + TP_ARGS(cur, error, caller_ip), TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_agnumber_t, agno) + __field(xfs_ino_t, ino) __field(int, error) __field(unsigned long, caller_ip) ), TP_fast_assign( - __entry->dev = mp->m_super->s_dev; - __entry->agno = agno; + __entry->dev = cur->bc_mp->m_super->s_dev; + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) { + __entry->agno = 0; + __entry->ino = 0; + } else if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) { + __entry->agno = 0; + __entry->ino = cur->bc_ino.ip->i_ino; + } else { + __entry->agno = cur->bc_ag.pag->pag_agno; + __entry->ino = 0; + } __entry->error = error; 
__entry->caller_ip = caller_ip; ), - TP_printk("dev %d:%d agno 0x%x error %d caller %pS", + TP_printk("dev %d:%d agno 0x%x ino 0x%llx error %d caller %pS", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->agno, + __entry->ino, __entry->error, (char *)__entry->caller_ip) ); -#define DEFINE_AG_ERROR_EVENT(name) \ -DEFINE_EVENT(xfs_ag_error_class, name, \ - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, int error, \ +#define DEFINE_BTREE_ERROR_EVENT(name) \ +DEFINE_EVENT(xfs_btree_error_class, name, \ + TP_PROTO(struct xfs_btree_cur *cur, int error, \ unsigned long caller_ip), \ - TP_ARGS(mp, agno, error, caller_ip)) + TP_ARGS(cur, error, caller_ip)) DEFINE_RMAP_EVENT(xfs_rmap_unmap); DEFINE_RMAP_EVENT(xfs_rmap_unmap_done); -DEFINE_AG_ERROR_EVENT(xfs_rmap_unmap_error); +DEFINE_BTREE_ERROR_EVENT(xfs_rmap_unmap_error); DEFINE_RMAP_EVENT(xfs_rmap_map); DEFINE_RMAP_EVENT(xfs_rmap_map_done); -DEFINE_AG_ERROR_EVENT(xfs_rmap_map_error); +DEFINE_BTREE_ERROR_EVENT(xfs_rmap_map_error); DEFINE_RMAP_EVENT(xfs_rmap_convert); DEFINE_RMAP_EVENT(xfs_rmap_convert_done); -DEFINE_AG_ERROR_EVENT(xfs_rmap_convert_error); -DEFINE_AG_ERROR_EVENT(xfs_rmap_convert_state); +DEFINE_BTREE_ERROR_EVENT(xfs_rmap_convert_error); + +TRACE_EVENT(xfs_rmap_convert_state, + TP_PROTO(struct xfs_btree_cur *cur, int state, + unsigned long caller_ip), + TP_ARGS(cur, state, caller_ip), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_agnumber_t, agno) + __field(xfs_ino_t, ino) + __field(int, state) + __field(unsigned long, caller_ip) + ), + TP_fast_assign( + __entry->dev = cur->bc_mp->m_super->s_dev; + if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) { + __entry->agno = 0; + __entry->ino = cur->bc_ino.ip->i_ino; + } else { + __entry->agno = cur->bc_ag.pag->pag_agno; + __entry->ino = 0; + } + __entry->state = state; + __entry->caller_ip = caller_ip; + ), + TP_printk("dev %d:%d agno 0x%x ino 0x%llx state %d caller %pS", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->agno, + __entry->ino, + 
__entry->state, + (char *)__entry->caller_ip) +); DECLARE_EVENT_CLASS(xfs_rmapbt_class, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, @@ -3014,9 +3055,9 @@ DEFINE_RMAP_DEFERRED_EVENT(xfs_rmap_deferred); DEFINE_RMAPBT_EVENT(xfs_rmap_update); DEFINE_RMAPBT_EVENT(xfs_rmap_insert); DEFINE_RMAPBT_EVENT(xfs_rmap_delete); -DEFINE_AG_ERROR_EVENT(xfs_rmap_insert_error); -DEFINE_AG_ERROR_EVENT(xfs_rmap_delete_error); -DEFINE_AG_ERROR_EVENT(xfs_rmap_update_error); +DEFINE_BTREE_ERROR_EVENT(xfs_rmap_insert_error); +DEFINE_BTREE_ERROR_EVENT(xfs_rmap_delete_error); +DEFINE_BTREE_ERROR_EVENT(xfs_rmap_update_error); DEFINE_RMAPBT_EVENT(xfs_rmap_find_left_neighbor_candidate); DEFINE_RMAPBT_EVENT(xfs_rmap_find_left_neighbor_query); @@ -3142,6 +3183,35 @@ DEFINE_AG_RESV_EVENT(xfs_ag_resv_free_extent); DEFINE_AG_RESV_EVENT(xfs_ag_resv_critical); DEFINE_AG_RESV_EVENT(xfs_ag_resv_needed); +/* simple AG-based error/%ip tracepoint class */ +DECLARE_EVENT_CLASS(xfs_ag_error_class, + TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, int error, + unsigned long caller_ip), + TP_ARGS(mp, agno, error, caller_ip), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_agnumber_t, agno) + __field(int, error) + __field(unsigned long, caller_ip) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->agno = agno; + __entry->error = error; + __entry->caller_ip = caller_ip; + ), + TP_printk("dev %d:%d agno 0x%x error %d caller %pS", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->agno, + __entry->error, + (char *)__entry->caller_ip) +); + +#define DEFINE_AG_ERROR_EVENT(name) \ +DEFINE_EVENT(xfs_ag_error_class, name, \ + TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, int error, \ + unsigned long caller_ip), \ + TP_ARGS(mp, agno, error, caller_ip)) DEFINE_AG_ERROR_EVENT(xfs_ag_resv_init_error); /* refcount tracepoint classes */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
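The gist of the new tracepoint class, where the call site hands over the cursor and the tracepoint works out which identifier to record, can be sketched in plain C. This is a userspace approximation with hypothetical names (`demo_cur`, `demo_trace_entry`, `demo_trace_fill`); it flattens the cursor down to the three fields the class reads and uses `snprintf` in place of the tracing machinery.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag values; only the names echo the kernel's. */
#define DEMO_BTREE_ROOT_IN_INODE	(1U << 1)
#define DEMO_BTREE_IN_MEMORY		(1U << 2)

/* Flattened stand-in for the cursor fields the tracepoint looks at. */
struct demo_cur {
	unsigned int	flags;
	uint32_t	agno;	/* valid for per-AG btrees */
	uint64_t	ino;	/* valid for inode-rooted btrees */
};

struct demo_trace_entry {
	uint32_t	agno;
	uint64_t	ino;
	int		error;
};

/* Pick the identifier that is meaningful for this kind of cursor. */
static void
demo_trace_fill(struct demo_trace_entry *e, const struct demo_cur *cur,
		int error)
{
	assert(e && cur);

	if (cur->flags & DEMO_BTREE_IN_MEMORY) {
		e->agno = 0;
		e->ino = 0;
	} else if (cur->flags & DEMO_BTREE_ROOT_IN_INODE) {
		e->agno = 0;
		e->ino = cur->ino;
	} else {
		e->agno = cur->agno;
		e->ino = 0;
	}
	e->error = error;
}

/* Render the entry roughly the way the TP_printk format string does. */
static int
demo_trace_format(char *buf, size_t len, const struct demo_trace_entry *e)
{
	return snprintf(buf, len, "agno 0x%x ino 0x%llx error %d",
			(unsigned int)e->agno,
			(unsigned long long)e->ino, e->error);
}
```

Centralising the choice in the fill step means call sites no longer reach into the AG-specific part of the cursor, which an inode-rooted or in-memory btree would not have populated.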
* [PATCH 3/5] xfs: prepare rmap btree tracepoints for widening 2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: rmap log intent cleanups Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/5] xfs: attach rtgroup objects to btree cursors Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/5] xfs: give rmap btree cursor error tracepoints their own class Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 5/5] xfs: remove xfs_trans_set_rmap_flags Darrick J. Wong 2022-12-30 22:18 ` [PATCH 4/5] xfs: clean up rmap log intent item tracepoint callsites Darrick J. Wong 4 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Prepare the rmap btree tracepoints for use with realtime rmap btrees by making them take the btree cursor object as a parameter. This will save us a lot of trouble later on. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rmap.c | 184 ++++++++++++++++++---------------------------- fs/xfs/xfs_trace.h | 24 +++--- 2 files changed, 85 insertions(+), 123 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index f85ff3ddb5c4..065cb95a1ce7 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -100,8 +100,7 @@ xfs_rmap_update( union xfs_btree_rec rec; int error; - trace_xfs_rmap_update(cur->bc_mp, cur->bc_ag.pag->pag_agno, - irec->rm_startblock, irec->rm_blockcount, + trace_xfs_rmap_update(cur, irec->rm_startblock, irec->rm_blockcount, irec->rm_owner, irec->rm_offset, irec->rm_flags); rec.rmap.rm_startblock = cpu_to_be32(irec->rm_startblock); @@ -127,8 +126,7 @@ xfs_rmap_insert( int i; int error; - trace_xfs_rmap_insert(rcur->bc_mp, rcur->bc_ag.pag->pag_agno, agbno, - len, owner, offset, flags); + trace_xfs_rmap_insert(rcur, agbno, len, owner, offset, flags); error = xfs_rmap_lookup_eq(rcur, agbno, len, owner, offset, flags, &i); if (error) @@ -170,8 +168,7 @@ 
xfs_rmap_delete( int i; int error; - trace_xfs_rmap_delete(rcur->bc_mp, rcur->bc_ag.pag->pag_agno, agbno, - len, owner, offset, flags); + trace_xfs_rmap_delete(rcur, agbno, len, owner, offset, flags); error = xfs_rmap_lookup_eq(rcur, agbno, len, owner, offset, flags, &i); if (error) @@ -339,8 +336,7 @@ xfs_rmap_find_left_neighbor_helper( { struct xfs_find_left_neighbor_info *info = priv; - trace_xfs_rmap_find_left_neighbor_candidate(cur->bc_mp, - cur->bc_ag.pag->pag_agno, rec->rm_startblock, + trace_xfs_rmap_find_left_neighbor_candidate(cur, rec->rm_startblock, rec->rm_blockcount, rec->rm_owner, rec->rm_offset, rec->rm_flags); @@ -390,8 +386,8 @@ xfs_rmap_find_left_neighbor( info.high.rm_blockcount = 0; info.irec = irec; - trace_xfs_rmap_find_left_neighbor_query(cur->bc_mp, - cur->bc_ag.pag->pag_agno, bno, 0, owner, offset, flags); + trace_xfs_rmap_find_left_neighbor_query(cur, bno, 0, owner, offset, + flags); /* * Historically, we always used the range query to walk every reverse @@ -422,8 +418,7 @@ xfs_rmap_find_left_neighbor( return error; *stat = 1; - trace_xfs_rmap_find_left_neighbor_result(cur->bc_mp, - cur->bc_ag.pag->pag_agno, irec->rm_startblock, + trace_xfs_rmap_find_left_neighbor_result(cur, irec->rm_startblock, irec->rm_blockcount, irec->rm_owner, irec->rm_offset, irec->rm_flags); return 0; @@ -438,8 +433,7 @@ xfs_rmap_lookup_le_range_helper( { struct xfs_find_left_neighbor_info *info = priv; - trace_xfs_rmap_lookup_le_range_candidate(cur->bc_mp, - cur->bc_ag.pag->pag_agno, rec->rm_startblock, + trace_xfs_rmap_lookup_le_range_candidate(cur, rec->rm_startblock, rec->rm_blockcount, rec->rm_owner, rec->rm_offset, rec->rm_flags); @@ -486,8 +480,7 @@ xfs_rmap_lookup_le_range( *stat = 0; info.irec = irec; - trace_xfs_rmap_lookup_le_range(cur->bc_mp, cur->bc_ag.pag->pag_agno, - bno, 0, owner, offset, flags); + trace_xfs_rmap_lookup_le_range(cur, bno, 0, owner, offset, flags); /* * Historically, we always used the range query to walk every reverse @@ -518,8 
+511,7 @@ xfs_rmap_lookup_le_range( return error; *stat = 1; - trace_xfs_rmap_lookup_le_range_result(cur->bc_mp, - cur->bc_ag.pag->pag_agno, irec->rm_startblock, + trace_xfs_rmap_lookup_le_range_result(cur, irec->rm_startblock, irec->rm_blockcount, irec->rm_owner, irec->rm_offset, irec->rm_flags); return 0; @@ -631,8 +623,7 @@ xfs_rmap_unmap( (flags & XFS_RMAP_BMBT_BLOCK); if (unwritten) flags |= XFS_RMAP_UNWRITTEN; - trace_xfs_rmap_unmap(mp, cur->bc_ag.pag->pag_agno, bno, len, - unwritten, oinfo); + trace_xfs_rmap_unmap(cur, bno, len, unwritten, oinfo); /* * We should always have a left record because there's a static record @@ -648,10 +639,9 @@ xfs_rmap_unmap( goto out_error; } - trace_xfs_rmap_lookup_le_range_result(cur->bc_mp, - cur->bc_ag.pag->pag_agno, ltrec.rm_startblock, - ltrec.rm_blockcount, ltrec.rm_owner, - ltrec.rm_offset, ltrec.rm_flags); + trace_xfs_rmap_lookup_le_range_result(cur, ltrec.rm_startblock, + ltrec.rm_blockcount, ltrec.rm_owner, ltrec.rm_offset, + ltrec.rm_flags); ltoff = ltrec.rm_offset; /* @@ -718,10 +708,9 @@ xfs_rmap_unmap( if (ltrec.rm_startblock == bno && ltrec.rm_blockcount == len) { /* exact match, simply remove the record from rmap tree */ - trace_xfs_rmap_delete(mp, cur->bc_ag.pag->pag_agno, - ltrec.rm_startblock, ltrec.rm_blockcount, - ltrec.rm_owner, ltrec.rm_offset, - ltrec.rm_flags); + trace_xfs_rmap_delete(cur, ltrec.rm_startblock, + ltrec.rm_blockcount, ltrec.rm_owner, + ltrec.rm_offset, ltrec.rm_flags); error = xfs_btree_delete(cur, &i); if (error) goto out_error; @@ -797,8 +786,7 @@ xfs_rmap_unmap( else cur->bc_rec.r.rm_offset = offset + len; cur->bc_rec.r.rm_flags = flags; - trace_xfs_rmap_insert(mp, cur->bc_ag.pag->pag_agno, - cur->bc_rec.r.rm_startblock, + trace_xfs_rmap_insert(cur, cur->bc_rec.r.rm_startblock, cur->bc_rec.r.rm_blockcount, cur->bc_rec.r.rm_owner, cur->bc_rec.r.rm_offset, @@ -809,8 +797,7 @@ xfs_rmap_unmap( } out_done: - trace_xfs_rmap_unmap_done(mp, cur->bc_ag.pag->pag_agno, bno, len, - unwritten, 
oinfo); + trace_xfs_rmap_unmap_done(cur, bno, len, unwritten, oinfo); out_error: if (error) trace_xfs_rmap_unmap_error(cur, error, _RET_IP_); @@ -974,8 +961,7 @@ xfs_rmap_map( (flags & XFS_RMAP_BMBT_BLOCK); if (unwritten) flags |= XFS_RMAP_UNWRITTEN; - trace_xfs_rmap_map(mp, cur->bc_ag.pag->pag_agno, bno, len, - unwritten, oinfo); + trace_xfs_rmap_map(cur, bno, len, unwritten, oinfo); ASSERT(!xfs_rmap_should_skip_owner_update(oinfo)); /* @@ -988,8 +974,7 @@ xfs_rmap_map( if (error) goto out_error; if (have_lt) { - trace_xfs_rmap_lookup_le_range_result(cur->bc_mp, - cur->bc_ag.pag->pag_agno, ltrec.rm_startblock, + trace_xfs_rmap_lookup_le_range_result(cur, ltrec.rm_startblock, ltrec.rm_blockcount, ltrec.rm_owner, ltrec.rm_offset, ltrec.rm_flags); @@ -1027,10 +1012,10 @@ xfs_rmap_map( error = -EFSCORRUPTED; goto out_error; } - trace_xfs_rmap_find_right_neighbor_result(cur->bc_mp, - cur->bc_ag.pag->pag_agno, gtrec.rm_startblock, - gtrec.rm_blockcount, gtrec.rm_owner, - gtrec.rm_offset, gtrec.rm_flags); + trace_xfs_rmap_find_right_neighbor_result(cur, + gtrec.rm_startblock, gtrec.rm_blockcount, + gtrec.rm_owner, gtrec.rm_offset, + gtrec.rm_flags); if (!xfs_rmap_is_mergeable(&gtrec, owner, flags)) have_gt = 0; } @@ -1067,12 +1052,9 @@ xfs_rmap_map( * result: |rrrrrrrrrrrrrrrrrrrrrrrrrrrrr| */ ltrec.rm_blockcount += gtrec.rm_blockcount; - trace_xfs_rmap_delete(mp, cur->bc_ag.pag->pag_agno, - gtrec.rm_startblock, - gtrec.rm_blockcount, - gtrec.rm_owner, - gtrec.rm_offset, - gtrec.rm_flags); + trace_xfs_rmap_delete(cur, gtrec.rm_startblock, + gtrec.rm_blockcount, gtrec.rm_owner, + gtrec.rm_offset, gtrec.rm_flags); error = xfs_btree_delete(cur, &i); if (error) goto out_error; @@ -1119,8 +1101,7 @@ xfs_rmap_map( cur->bc_rec.r.rm_owner = owner; cur->bc_rec.r.rm_offset = offset; cur->bc_rec.r.rm_flags = flags; - trace_xfs_rmap_insert(mp, cur->bc_ag.pag->pag_agno, bno, len, - owner, offset, flags); + trace_xfs_rmap_insert(cur, bno, len, owner, offset, flags); error =
xfs_btree_insert(cur, &i); if (error) goto out_error; @@ -1131,8 +1112,7 @@ xfs_rmap_map( } } - trace_xfs_rmap_map_done(mp, cur->bc_ag.pag->pag_agno, bno, len, - unwritten, oinfo); + trace_xfs_rmap_map_done(cur, bno, len, unwritten, oinfo); out_error: if (error) trace_xfs_rmap_map_error(cur, error, _RET_IP_); @@ -1209,8 +1189,7 @@ xfs_rmap_convert( (flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK)))); oldext = unwritten ? XFS_RMAP_UNWRITTEN : 0; new_endoff = offset + len; - trace_xfs_rmap_convert(mp, cur->bc_ag.pag->pag_agno, bno, len, - unwritten, oinfo); + trace_xfs_rmap_convert(cur, bno, len, unwritten, oinfo); /* * For the initial lookup, look for an exact match or the left-adjacent @@ -1226,10 +1205,9 @@ xfs_rmap_convert( goto done; } - trace_xfs_rmap_lookup_le_range_result(cur->bc_mp, - cur->bc_ag.pag->pag_agno, PREV.rm_startblock, - PREV.rm_blockcount, PREV.rm_owner, - PREV.rm_offset, PREV.rm_flags); + trace_xfs_rmap_lookup_le_range_result(cur, PREV.rm_startblock, + PREV.rm_blockcount, PREV.rm_owner, PREV.rm_offset, + PREV.rm_flags); ASSERT(PREV.rm_offset <= offset); ASSERT(PREV.rm_offset + PREV.rm_blockcount >= new_endoff); @@ -1270,10 +1248,9 @@ xfs_rmap_convert( error = -EFSCORRUPTED; goto done; } - trace_xfs_rmap_find_left_neighbor_result(cur->bc_mp, - cur->bc_ag.pag->pag_agno, LEFT.rm_startblock, - LEFT.rm_blockcount, LEFT.rm_owner, - LEFT.rm_offset, LEFT.rm_flags); + trace_xfs_rmap_find_left_neighbor_result(cur, + LEFT.rm_startblock, LEFT.rm_blockcount, + LEFT.rm_owner, LEFT.rm_offset, LEFT.rm_flags); if (LEFT.rm_startblock + LEFT.rm_blockcount == bno && LEFT.rm_offset + LEFT.rm_blockcount == offset && xfs_rmap_is_mergeable(&LEFT, owner, newext)) @@ -1311,10 +1288,10 @@ xfs_rmap_convert( error = -EFSCORRUPTED; goto done; } - trace_xfs_rmap_find_right_neighbor_result(cur->bc_mp, - cur->bc_ag.pag->pag_agno, RIGHT.rm_startblock, - RIGHT.rm_blockcount, RIGHT.rm_owner, - RIGHT.rm_offset, RIGHT.rm_flags); + trace_xfs_rmap_find_right_neighbor_result(cur, + 
RIGHT.rm_startblock, RIGHT.rm_blockcount, + RIGHT.rm_owner, RIGHT.rm_offset, + RIGHT.rm_flags); if (bno + len == RIGHT.rm_startblock && offset + len == RIGHT.rm_offset && xfs_rmap_is_mergeable(&RIGHT, owner, newext)) @@ -1361,10 +1338,9 @@ xfs_rmap_convert( error = -EFSCORRUPTED; goto done; } - trace_xfs_rmap_delete(mp, cur->bc_ag.pag->pag_agno, - RIGHT.rm_startblock, RIGHT.rm_blockcount, - RIGHT.rm_owner, RIGHT.rm_offset, - RIGHT.rm_flags); + trace_xfs_rmap_delete(cur, RIGHT.rm_startblock, + RIGHT.rm_blockcount, RIGHT.rm_owner, + RIGHT.rm_offset, RIGHT.rm_flags); error = xfs_btree_delete(cur, &i); if (error) goto done; @@ -1381,10 +1357,9 @@ xfs_rmap_convert( error = -EFSCORRUPTED; goto done; } - trace_xfs_rmap_delete(mp, cur->bc_ag.pag->pag_agno, - PREV.rm_startblock, PREV.rm_blockcount, - PREV.rm_owner, PREV.rm_offset, - PREV.rm_flags); + trace_xfs_rmap_delete(cur, PREV.rm_startblock, + PREV.rm_blockcount, PREV.rm_owner, + PREV.rm_offset, PREV.rm_flags); error = xfs_btree_delete(cur, &i); if (error) goto done; @@ -1413,10 +1388,9 @@ xfs_rmap_convert( * Setting all of a previous oldext extent to newext. * The left neighbor is contiguous, the right is not. 
*/ - trace_xfs_rmap_delete(mp, cur->bc_ag.pag->pag_agno, - PREV.rm_startblock, PREV.rm_blockcount, - PREV.rm_owner, PREV.rm_offset, - PREV.rm_flags); + trace_xfs_rmap_delete(cur, PREV.rm_startblock, + PREV.rm_blockcount, PREV.rm_owner, + PREV.rm_offset, PREV.rm_flags); error = xfs_btree_delete(cur, &i); if (error) goto done; @@ -1453,10 +1427,9 @@ xfs_rmap_convert( error = -EFSCORRUPTED; goto done; } - trace_xfs_rmap_delete(mp, cur->bc_ag.pag->pag_agno, - RIGHT.rm_startblock, RIGHT.rm_blockcount, - RIGHT.rm_owner, RIGHT.rm_offset, - RIGHT.rm_flags); + trace_xfs_rmap_delete(cur, RIGHT.rm_startblock, + RIGHT.rm_blockcount, RIGHT.rm_owner, + RIGHT.rm_offset, RIGHT.rm_flags); error = xfs_btree_delete(cur, &i); if (error) goto done; @@ -1534,8 +1507,7 @@ xfs_rmap_convert( NEW.rm_blockcount = len; NEW.rm_flags = newext; cur->bc_rec.r = NEW; - trace_xfs_rmap_insert(mp, cur->bc_ag.pag->pag_agno, bno, - len, owner, offset, newext); + trace_xfs_rmap_insert(cur, bno, len, owner, offset, newext); error = xfs_btree_insert(cur, &i); if (error) goto done; @@ -1593,8 +1565,7 @@ xfs_rmap_convert( NEW.rm_blockcount = len; NEW.rm_flags = newext; cur->bc_rec.r = NEW; - trace_xfs_rmap_insert(mp, cur->bc_ag.pag->pag_agno, bno, - len, owner, offset, newext); + trace_xfs_rmap_insert(cur, bno, len, owner, offset, newext); error = xfs_btree_insert(cur, &i); if (error) goto done; @@ -1625,9 +1596,8 @@ xfs_rmap_convert( NEW = PREV; NEW.rm_blockcount = offset - PREV.rm_offset; cur->bc_rec.r = NEW; - trace_xfs_rmap_insert(mp, cur->bc_ag.pag->pag_agno, - NEW.rm_startblock, NEW.rm_blockcount, - NEW.rm_owner, NEW.rm_offset, + trace_xfs_rmap_insert(cur, NEW.rm_startblock, + NEW.rm_blockcount, NEW.rm_owner, NEW.rm_offset, NEW.rm_flags); error = xfs_btree_insert(cur, &i); if (error) @@ -1654,8 +1624,7 @@ xfs_rmap_convert( /* new middle extent - newext */ cur->bc_rec.r.rm_flags &= ~XFS_RMAP_UNWRITTEN; cur->bc_rec.r.rm_flags |= newext; - trace_xfs_rmap_insert(mp, cur->bc_ag.pag->pag_agno, bno, len, - 
owner, offset, newext); + trace_xfs_rmap_insert(cur, bno, len, owner, offset, newext); error = xfs_btree_insert(cur, &i); if (error) goto done; @@ -1679,8 +1648,7 @@ xfs_rmap_convert( ASSERT(0); } - trace_xfs_rmap_convert_done(mp, cur->bc_ag.pag->pag_agno, bno, len, - unwritten, oinfo); + trace_xfs_rmap_convert_done(cur, bno, len, unwritten, oinfo); done: if (error) trace_xfs_rmap_convert_error(cur, error, _RET_IP_); @@ -1719,8 +1687,7 @@ xfs_rmap_convert_shared( (flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK)))); oldext = unwritten ? XFS_RMAP_UNWRITTEN : 0; new_endoff = offset + len; - trace_xfs_rmap_convert(mp, cur->bc_ag.pag->pag_agno, bno, len, - unwritten, oinfo); + trace_xfs_rmap_convert(cur, bno, len, unwritten, oinfo); /* * For the initial lookup, look for and exact match or the left-adjacent @@ -1789,10 +1756,10 @@ xfs_rmap_convert_shared( error = -EFSCORRUPTED; goto done; } - trace_xfs_rmap_find_right_neighbor_result(cur->bc_mp, - cur->bc_ag.pag->pag_agno, RIGHT.rm_startblock, - RIGHT.rm_blockcount, RIGHT.rm_owner, - RIGHT.rm_offset, RIGHT.rm_flags); + trace_xfs_rmap_find_right_neighbor_result(cur, + RIGHT.rm_startblock, RIGHT.rm_blockcount, + RIGHT.rm_owner, RIGHT.rm_offset, + RIGHT.rm_flags); if (xfs_rmap_is_mergeable(&RIGHT, owner, newext)) state |= RMAP_RIGHT_CONTIG; } @@ -2104,8 +2071,7 @@ xfs_rmap_convert_shared( ASSERT(0); } - trace_xfs_rmap_convert_done(mp, cur->bc_ag.pag->pag_agno, bno, len, - unwritten, oinfo); + trace_xfs_rmap_convert_done(cur, bno, len, unwritten, oinfo); done: if (error) trace_xfs_rmap_convert_error(cur, error, _RET_IP_); @@ -2146,8 +2112,7 @@ xfs_rmap_unmap_shared( xfs_owner_info_unpack(oinfo, &owner, &offset, &flags); if (unwritten) flags |= XFS_RMAP_UNWRITTEN; - trace_xfs_rmap_unmap(mp, cur->bc_ag.pag->pag_agno, bno, len, - unwritten, oinfo); + trace_xfs_rmap_unmap(cur, bno, len, unwritten, oinfo); /* * We should always have a left record because there's a static record @@ -2303,8 +2268,7 @@ xfs_rmap_unmap_shared( goto 
out_error; } - trace_xfs_rmap_unmap_done(mp, cur->bc_ag.pag->pag_agno, bno, len, - unwritten, oinfo); + trace_xfs_rmap_unmap_done(cur, bno, len, unwritten, oinfo); out_error: if (error) trace_xfs_rmap_unmap_error(cur, error, _RET_IP_); @@ -2342,8 +2306,7 @@ xfs_rmap_map_shared( xfs_owner_info_unpack(oinfo, &owner, &offset, &flags); if (unwritten) flags |= XFS_RMAP_UNWRITTEN; - trace_xfs_rmap_map(mp, cur->bc_ag.pag->pag_agno, bno, len, - unwritten, oinfo); + trace_xfs_rmap_map(cur, bno, len, unwritten, oinfo); /* Is there a left record that abuts our range? */ error = xfs_rmap_find_left_neighbor(cur, bno, owner, offset, flags, @@ -2368,10 +2331,10 @@ xfs_rmap_map_shared( error = -EFSCORRUPTED; goto out_error; } - trace_xfs_rmap_find_right_neighbor_result(cur->bc_mp, - cur->bc_ag.pag->pag_agno, gtrec.rm_startblock, - gtrec.rm_blockcount, gtrec.rm_owner, - gtrec.rm_offset, gtrec.rm_flags); + trace_xfs_rmap_find_right_neighbor_result(cur, + gtrec.rm_startblock, gtrec.rm_blockcount, + gtrec.rm_owner, gtrec.rm_offset, + gtrec.rm_flags); if (!xfs_rmap_is_mergeable(&gtrec, owner, flags)) have_gt = 0; @@ -2463,8 +2426,7 @@ xfs_rmap_map_shared( goto out_error; } - trace_xfs_rmap_map_done(mp, cur->bc_ag.pag->pag_agno, bno, len, - unwritten, oinfo); + trace_xfs_rmap_map_done(cur, bno, len, unwritten, oinfo); out_error: if (error) trace_xfs_rmap_map_error(cur, error, _RET_IP_); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index aaac43e61e83..3130b8def8ec 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -2834,10 +2834,10 @@ DEFINE_DEFER_PENDING_ITEM_EVENT(xfs_defer_finish_item); /* rmap tracepoints */ DECLARE_EVENT_CLASS(xfs_rmap_class, - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, + TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t agbno, xfs_extlen_t len, bool unwritten, const struct xfs_owner_info *oinfo), - TP_ARGS(mp, agno, agbno, len, unwritten, oinfo), + TP_ARGS(cur, agbno, len, unwritten, oinfo), TP_STRUCT__entry( __field(dev_t, dev)
__field(xfs_agnumber_t, agno) @@ -2848,8 +2848,8 @@ DECLARE_EVENT_CLASS(xfs_rmap_class, __field(unsigned long, flags) ), TP_fast_assign( - __entry->dev = mp->m_super->s_dev; - __entry->agno = agno; + __entry->dev = cur->bc_mp->m_super->s_dev; + __entry->agno = cur->bc_ag.pag->pag_agno; __entry->agbno = agbno; __entry->len = len; __entry->owner = oinfo->oi_owner; @@ -2869,10 +2869,10 @@ DECLARE_EVENT_CLASS(xfs_rmap_class, ); #define DEFINE_RMAP_EVENT(name) \ DEFINE_EVENT(xfs_rmap_class, name, \ - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \ + TP_PROTO(struct xfs_btree_cur *cur, \ xfs_agblock_t agbno, xfs_extlen_t len, bool unwritten, \ const struct xfs_owner_info *oinfo), \ - TP_ARGS(mp, agno, agbno, len, unwritten, oinfo)) + TP_ARGS(cur, agbno, len, unwritten, oinfo)) /* btree cursor error/%ip tracepoint class */ DECLARE_EVENT_CLASS(xfs_btree_error_class, @@ -2957,10 +2957,10 @@ TRACE_EVENT(xfs_rmap_convert_state, ); DECLARE_EVENT_CLASS(xfs_rmapbt_class, - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, + TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner, uint64_t offset, unsigned int flags), - TP_ARGS(mp, agno, agbno, len, owner, offset, flags), + TP_ARGS(cur, agbno, len, owner, offset, flags), TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_agnumber_t, agno) @@ -2971,8 +2971,8 @@ DECLARE_EVENT_CLASS(xfs_rmapbt_class, __field(unsigned int, flags) ), TP_fast_assign( - __entry->dev = mp->m_super->s_dev; - __entry->agno = agno; + __entry->dev = cur->bc_mp->m_super->s_dev; + __entry->agno = cur->bc_ag.pag->pag_agno; __entry->agbno = agbno; __entry->len = len; __entry->owner = owner; @@ -2990,10 +2990,10 @@ DECLARE_EVENT_CLASS(xfs_rmapbt_class, ); #define DEFINE_RMAPBT_EVENT(name) \ DEFINE_EVENT(xfs_rmapbt_class, name, \ - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \ + TP_PROTO(struct xfs_btree_cur *cur, \ xfs_agblock_t agbno, xfs_extlen_t len, \ uint64_t owner, uint64_t offset, unsigned int flags), \ - 
TP_ARGS(mp, agno, agbno, len, owner, offset, flags)) + TP_ARGS(cur, agbno, len, owner, offset, flags)) DECLARE_EVENT_CLASS(xfs_rmap_deferred_class, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, ^ permalink raw reply related [flat|nested] 565+ messages in thread
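For readers skimming the diff above, the calling-convention change in this patch can be sketched in plain userspace C. This is not kernel code: every demo_* name below is a hypothetical stand-in (for struct xfs_mount, struct xfs_perag, struct xfs_btree_cur and the tracepoint's assignment step), and only the shape of the change is taken from the patch. The idea is that once the tracepoint receives the cursor, the device and AG number are derived in one place rather than at every call site.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct demo_mount {
	uint32_t		dev;
};

struct demo_perag {
	uint32_t		agno;
};

struct demo_btree_cur {
	struct demo_mount	*mp;
	struct demo_perag	*pag;
};

/* What a trace entry would record for one rmap event. */
struct demo_rmap_entry {
	uint32_t		dev;
	uint32_t		agno;
	uint32_t		agbno;
	uint32_t		len;
};

/* Old style: each caller digs the mount and AG number out of the cursor. */
static struct demo_rmap_entry
demo_trace_old(
	const struct demo_mount	*mp,
	uint32_t		agno,
	uint32_t		agbno,
	uint32_t		len)
{
	struct demo_rmap_entry	e = { mp->dev, agno, agbno, len };

	return e;
}

/* New style: callers pass the cursor and the tracepoint derives the rest. */
static struct demo_rmap_entry
demo_trace_new(
	const struct demo_btree_cur	*cur,
	uint32_t			agbno,
	uint32_t			len)
{
	struct demo_rmap_entry	e = { cur->mp->dev, cur->pag->agno, agbno, len };

	return e;
}
```

Both helpers record the same entry for the same cursor; the second form simply leaves a single spot to change if a different kind of cursor has to be handled later, which appears to be the motivation stated in the commit message.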
* [PATCH 5/5] xfs: remove xfs_trans_set_rmap_flags 2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: rmap log intent cleanups Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 3/5] xfs: prepare rmap btree tracepoints for widening Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 4/5] xfs: clean up rmap log intent item tracepoint callsites Darrick J. Wong 4 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Remove this single-use helper. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_rmap_item.c | 79 +++++++++++++++++++++--------------------------- 1 file changed, 34 insertions(+), 45 deletions(-) diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index 38915e92bf2b..a84f7e0e91a3 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -242,49 +242,6 @@ xfs_trans_get_rud( return rudp; } -/* Set the map extent flags for this reverse mapping. 
*/ -static void -xfs_trans_set_rmap_flags( - struct xfs_map_extent *map, - enum xfs_rmap_intent_type type, - int whichfork, - xfs_exntst_t state) -{ - map->me_flags = 0; - if (state == XFS_EXT_UNWRITTEN) - map->me_flags |= XFS_RMAP_EXTENT_UNWRITTEN; - if (whichfork == XFS_ATTR_FORK) - map->me_flags |= XFS_RMAP_EXTENT_ATTR_FORK; - switch (type) { - case XFS_RMAP_MAP: - map->me_flags |= XFS_RMAP_EXTENT_MAP; - break; - case XFS_RMAP_MAP_SHARED: - map->me_flags |= XFS_RMAP_EXTENT_MAP_SHARED; - break; - case XFS_RMAP_UNMAP: - map->me_flags |= XFS_RMAP_EXTENT_UNMAP; - break; - case XFS_RMAP_UNMAP_SHARED: - map->me_flags |= XFS_RMAP_EXTENT_UNMAP_SHARED; - break; - case XFS_RMAP_CONVERT: - map->me_flags |= XFS_RMAP_EXTENT_CONVERT; - break; - case XFS_RMAP_CONVERT_SHARED: - map->me_flags |= XFS_RMAP_EXTENT_CONVERT_SHARED; - break; - case XFS_RMAP_ALLOC: - map->me_flags |= XFS_RMAP_EXTENT_ALLOC; - break; - case XFS_RMAP_FREE: - map->me_flags |= XFS_RMAP_EXTENT_FREE; - break; - default: - ASSERT(0); - } -} - /* * Finish an rmap update and log it to the RUD. 
Note that the transaction is * marked dirty regardless of whether the rmap update succeeds or fails to @@ -355,8 +312,40 @@ xfs_rmap_update_log_item( map->me_startblock = ri->ri_bmap.br_startblock; map->me_startoff = ri->ri_bmap.br_startoff; map->me_len = ri->ri_bmap.br_blockcount; - xfs_trans_set_rmap_flags(map, ri->ri_type, ri->ri_whichfork, - ri->ri_bmap.br_state); + + map->me_flags = 0; + if (ri->ri_bmap.br_state == XFS_EXT_UNWRITTEN) + map->me_flags |= XFS_RMAP_EXTENT_UNWRITTEN; + if (ri->ri_whichfork == XFS_ATTR_FORK) + map->me_flags |= XFS_RMAP_EXTENT_ATTR_FORK; + switch (ri->ri_type) { + case XFS_RMAP_MAP: + map->me_flags |= XFS_RMAP_EXTENT_MAP; + break; + case XFS_RMAP_MAP_SHARED: + map->me_flags |= XFS_RMAP_EXTENT_MAP_SHARED; + break; + case XFS_RMAP_UNMAP: + map->me_flags |= XFS_RMAP_EXTENT_UNMAP; + break; + case XFS_RMAP_UNMAP_SHARED: + map->me_flags |= XFS_RMAP_EXTENT_UNMAP_SHARED; + break; + case XFS_RMAP_CONVERT: + map->me_flags |= XFS_RMAP_EXTENT_CONVERT; + break; + case XFS_RMAP_CONVERT_SHARED: + map->me_flags |= XFS_RMAP_EXTENT_CONVERT_SHARED; + break; + case XFS_RMAP_ALLOC: + map->me_flags |= XFS_RMAP_EXTENT_ALLOC; + break; + case XFS_RMAP_FREE: + map->me_flags |= XFS_RMAP_EXTENT_FREE; + break; + default: + ASSERT(0); + } } static struct xfs_log_item * ^ permalink raw reply related [flat|nested] 565+ messages in thread
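The flag composition that this patch open-codes in xfs_rmap_update_log_item can be tried outside the kernel with a small sketch. Everything named demo_* or DEMO_* below is hypothetical, and the numeric values are chosen for illustration only; consult the XFS log format header for the real XFS_RMAP_EXTENT_* definitions. The logic is written as a standalone function purely so that it can be compiled and exercised here, whereas the patch itself inlines it at the single call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the rmap intent types. */
enum demo_intent {
	DEMO_MAP,
	DEMO_MAP_SHARED,
	DEMO_UNMAP,
	DEMO_UNMAP_SHARED,
	DEMO_CONVERT,
	DEMO_CONVERT_SHARED,
	DEMO_ALLOC,
	DEMO_FREE,
};

/* Illustrative flag values: an operation code plus modifier bits. */
#define DEMO_EXTENT_MAP			1U
#define DEMO_EXTENT_MAP_SHARED		2U
#define DEMO_EXTENT_UNMAP		3U
#define DEMO_EXTENT_UNMAP_SHARED	4U
#define DEMO_EXTENT_CONVERT		5U
#define DEMO_EXTENT_CONVERT_SHARED	6U
#define DEMO_EXTENT_ALLOC		7U
#define DEMO_EXTENT_FREE		8U
#define DEMO_EXTENT_ATTR_FORK		(1U << 31)
#define DEMO_EXTENT_UNWRITTEN		(1U << 29)

/* Build the flags word from the extent state, the fork, and the intent type. */
static uint32_t
demo_rmap_extent_flags(
	enum demo_intent	type,
	bool			attr_fork,
	bool			unwritten)
{
	uint32_t		flags = 0;

	if (unwritten)
		flags |= DEMO_EXTENT_UNWRITTEN;
	if (attr_fork)
		flags |= DEMO_EXTENT_ATTR_FORK;

	switch (type) {
	case DEMO_MAP:
		flags |= DEMO_EXTENT_MAP;
		break;
	case DEMO_MAP_SHARED:
		flags |= DEMO_EXTENT_MAP_SHARED;
		break;
	case DEMO_UNMAP:
		flags |= DEMO_EXTENT_UNMAP;
		break;
	case DEMO_UNMAP_SHARED:
		flags |= DEMO_EXTENT_UNMAP_SHARED;
		break;
	case DEMO_CONVERT:
		flags |= DEMO_EXTENT_CONVERT;
		break;
	case DEMO_CONVERT_SHARED:
		flags |= DEMO_EXTENT_CONVERT_SHARED;
		break;
	case DEMO_ALLOC:
		flags |= DEMO_EXTENT_ALLOC;
		break;
	case DEMO_FREE:
		flags |= DEMO_EXTENT_FREE;
		break;
	default:
		/* Mirrors the ASSERT(0) in the patch for unknown types. */
		assert(0);
	}
	return flags;
}
```

For instance, an unmap of an unwritten attr fork extent yields the unmap code combined with both modifier bits in this sketch.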
* [PATCH 4/5] xfs: clean up rmap log intent item tracepoint callsites 2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: rmap log intent cleanups Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 5/5] xfs: remove xfs_trans_set_rmap_flags Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 4 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Pass the incore rmap structure to the tracepoints instead of open-coding the argument passing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rmap.c | 16 ++----------- fs/xfs/libxfs/xfs_rmap.h | 10 ++++++++ fs/xfs/xfs_trace.c | 1 + fs/xfs/xfs_trace.h | 57 ++++++++++++++++++++++------------------------ 4 files changed, 41 insertions(+), 43 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 065cb95a1ce7..9ad3e5077f34 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -2590,10 +2590,7 @@ xfs_rmap_finish_one( bno = XFS_FSB_TO_AGBNO(mp, ri->ri_bmap.br_startblock); - trace_xfs_rmap_deferred(mp, ri->ri_pag->pag_agno, ri->ri_type, bno, - ri->ri_owner, ri->ri_whichfork, - ri->ri_bmap.br_startoff, ri->ri_bmap.br_blockcount, - ri->ri_bmap.br_state); + trace_xfs_rmap_deferred(mp, ri); if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_RMAP_FINISH_ONE)) return -EIO; @@ -2668,15 +2665,6 @@ __xfs_rmap_add( { struct xfs_rmap_intent *ri; - trace_xfs_rmap_defer(tp->t_mountp, - XFS_FSB_TO_AGNO(tp->t_mountp, bmap->br_startblock), - type, - XFS_FSB_TO_AGBNO(tp->t_mountp, bmap->br_startblock), - owner, whichfork, - bmap->br_startoff, - bmap->br_blockcount, - bmap->br_state); - ri = kmem_cache_alloc(xfs_rmap_intent_cache, GFP_NOFS | __GFP_NOFAIL); INIT_LIST_HEAD(&ri->ri_list); ri->ri_type = type; @@ -2684,6 +2672,8 @@ __xfs_rmap_add( ri->ri_whichfork = whichfork; ri->ri_bmap = *bmap; + trace_xfs_rmap_defer(tp->t_mountp, ri); + 
xfs_rmap_update_get_group(tp->t_mountp, ri); xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_RMAP, &ri->ri_list); } diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h index 2a9265218f1d..36af4de506c7 100644 --- a/fs/xfs/libxfs/xfs_rmap.h +++ b/fs/xfs/libxfs/xfs_rmap.h @@ -157,6 +157,16 @@ enum xfs_rmap_intent_type { XFS_RMAP_FREE, }; +#define XFS_RMAP_INTENT_STRINGS \ + { XFS_RMAP_MAP, "map" }, \ + { XFS_RMAP_MAP_SHARED, "map_shared" }, \ + { XFS_RMAP_UNMAP, "unmap" }, \ + { XFS_RMAP_UNMAP_SHARED, "unmap_shared" }, \ + { XFS_RMAP_CONVERT, "cvt" }, \ + { XFS_RMAP_CONVERT_SHARED, "cvt_shared" }, \ + { XFS_RMAP_ALLOC, "alloc" }, \ + { XFS_RMAP_FREE, "free" } + struct xfs_rmap_intent { struct list_head ri_list; enum xfs_rmap_intent_type ri_type; diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index 36109a57fca6..d7ede15110e8 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -43,6 +43,7 @@ #include "xfs_swapext.h" #include "xfs_xchgrange.h" #include "xfs_rtgroup.h" +#include "xfs_rmap.h" /* * We include this last to have the helpers above available for the trace diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 3130b8def8ec..fd067e1e28db 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -84,6 +84,7 @@ struct xfs_swapext_intent; struct xfs_swapext_req; struct xfs_rtgroup; struct xfs_extent_free_item; +struct xfs_rmap_intent; #define XFS_ATTR_FILTER_FLAGS \ { XFS_ATTR_ROOT, "ROOT" }, \ @@ -2995,20 +2996,22 @@ DEFINE_EVENT(xfs_rmapbt_class, name, \ uint64_t owner, uint64_t offset, unsigned int flags), \ TP_ARGS(cur, agbno, len, owner, offset, flags)) +TRACE_DEFINE_ENUM(XFS_RMAP_MAP); +TRACE_DEFINE_ENUM(XFS_RMAP_MAP_SHARED); +TRACE_DEFINE_ENUM(XFS_RMAP_UNMAP); +TRACE_DEFINE_ENUM(XFS_RMAP_UNMAP_SHARED); +TRACE_DEFINE_ENUM(XFS_RMAP_CONVERT); +TRACE_DEFINE_ENUM(XFS_RMAP_CONVERT_SHARED); +TRACE_DEFINE_ENUM(XFS_RMAP_ALLOC); +TRACE_DEFINE_ENUM(XFS_RMAP_FREE); + DECLARE_EVENT_CLASS(xfs_rmap_deferred_class, - TP_PROTO(struct xfs_mount *mp, 
xfs_agnumber_t agno, - int op, - xfs_agblock_t agbno, - xfs_ino_t ino, - int whichfork, - xfs_fileoff_t offset, - xfs_filblks_t len, - xfs_exntst_t state), - TP_ARGS(mp, agno, op, agbno, ino, whichfork, offset, len, state), + TP_PROTO(struct xfs_mount *mp, struct xfs_rmap_intent *ri), + TP_ARGS(mp, ri), TP_STRUCT__entry( __field(dev_t, dev) + __field(unsigned long long, owner) __field(xfs_agnumber_t, agno) - __field(xfs_ino_t, ino) __field(xfs_agblock_t, agbno) __field(int, whichfork) __field(xfs_fileoff_t, l_loff) @@ -3018,21 +3021,22 @@ DECLARE_EVENT_CLASS(xfs_rmap_deferred_class, ), TP_fast_assign( __entry->dev = mp->m_super->s_dev; - __entry->agno = agno; - __entry->ino = ino; - __entry->agbno = agbno; - __entry->whichfork = whichfork; - __entry->l_loff = offset; - __entry->l_len = len; - __entry->l_state = state; - __entry->op = op; + __entry->agno = XFS_FSB_TO_AGNO(mp, ri->ri_bmap.br_startblock); + __entry->agbno = XFS_FSB_TO_AGBNO(mp, + ri->ri_bmap.br_startblock); + __entry->owner = ri->ri_owner; + __entry->whichfork = ri->ri_whichfork; + __entry->l_loff = ri->ri_bmap.br_startoff; + __entry->l_len = ri->ri_bmap.br_blockcount; + __entry->l_state = ri->ri_bmap.br_state; + __entry->op = ri->ri_type; ), - TP_printk("dev %d:%d op %d agno 0x%x agbno 0x%x owner 0x%llx %s fileoff 0x%llx fsbcount 0x%llx state %d", + TP_printk("dev %d:%d op %s agno 0x%x agbno 0x%x owner 0x%llx %s fileoff 0x%llx fsbcount 0x%llx state %d", MAJOR(__entry->dev), MINOR(__entry->dev), - __entry->op, + __print_symbolic(__entry->op, XFS_RMAP_INTENT_STRINGS), __entry->agno, __entry->agbno, - __entry->ino, + __entry->owner, __print_symbolic(__entry->whichfork, XFS_WHICHFORK_STRINGS), __entry->l_loff, __entry->l_len, @@ -3040,15 +3044,8 @@ DECLARE_EVENT_CLASS(xfs_rmap_deferred_class, ); #define DEFINE_RMAP_DEFERRED_EVENT(name) \ DEFINE_EVENT(xfs_rmap_deferred_class, name, \ - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \ - int op, \ - xfs_agblock_t agbno, \ - xfs_ino_t ino, \ - int 
whichfork, \ - xfs_fileoff_t offset, \ - xfs_filblks_t len, \ - xfs_exntst_t state), \ - TP_ARGS(mp, agno, op, agbno, ino, whichfork, offset, len, state)) + TP_PROTO(struct xfs_mount *mp, struct xfs_rmap_intent *ri), \ + TP_ARGS(mp, ri)) DEFINE_RMAP_DEFERRED_EVENT(xfs_rmap_defer); DEFINE_RMAP_DEFERRED_EVENT(xfs_rmap_deferred); ^ permalink raw reply related [flat|nested] 565+ messages in thread
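The "op %d" to "op %s" change above relies on a value-to-name table. A rough userspace sketch of that idea follows; it is not the kernel's __print_symbolic implementation. The demo_* and DEMO_* names are hypothetical, the strings are copied from the XFS_RMAP_INTENT_STRINGS table in the patch, and the "unknown" fallback is this sketch's own choice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for enum xfs_rmap_intent_type. */
enum demo_rmap_intent_type {
	DEMO_RMAP_MAP,
	DEMO_RMAP_MAP_SHARED,
	DEMO_RMAP_UNMAP,
	DEMO_RMAP_UNMAP_SHARED,
	DEMO_RMAP_CONVERT,
	DEMO_RMAP_CONVERT_SHARED,
	DEMO_RMAP_ALLOC,
	DEMO_RMAP_FREE,
};

struct demo_sym {
	int		value;
	const char	*name;
};

/* Same layout as the table in the patch: a list of { value, "name" } pairs. */
#define DEMO_RMAP_INTENT_STRINGS \
	{ DEMO_RMAP_MAP,		"map" }, \
	{ DEMO_RMAP_MAP_SHARED,		"map_shared" }, \
	{ DEMO_RMAP_UNMAP,		"unmap" }, \
	{ DEMO_RMAP_UNMAP_SHARED,	"unmap_shared" }, \
	{ DEMO_RMAP_CONVERT,		"cvt" }, \
	{ DEMO_RMAP_CONVERT_SHARED,	"cvt_shared" }, \
	{ DEMO_RMAP_ALLOC,		"alloc" }, \
	{ DEMO_RMAP_FREE,		"free" }

/* Linear search of the table; fine for a handful of entries. */
static const char *
demo_intent_name(
	int			value)
{
	static const struct demo_sym	table[] = { DEMO_RMAP_INTENT_STRINGS };
	size_t				i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].value == value)
			return table[i].name;
	}
	return "unknown";
}
```

Keeping the pairs in a macro next to the enum, as the patch does in xfs_rmap.h, makes it harder for the enum and the printed names to drift apart.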
* [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (12 preceding siblings ...) 2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: rmap log intent cleanups Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 06/38] xfs: add realtime rmap btree operations Darrick J. Wong ` (37 more replies) 2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: refcount log intent cleanups Darrick J. Wong ` (25 subsequent siblings) 39 siblings, 38 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs Hi all, This is the latest revision of a patchset that adds to XFS kernel support for reverse mapping for the realtime device. This time around I've fixed some of the bitrot that I've noticed over the past few months, and most notably have converted rtrmapbt to use the metadata inode directory feature instead of burning more space in the superblock. At the beginning of the set are patches to implement storing B+tree leaves in an inode root, since the realtime rmapbt is rooted in an inode, unlike the regular rmapbt which is rooted in an AG block. Prior to this, the only btree that could be rooted in the inode fork was the block mapping btree; if all the extent records fit in the inode, format would be switched from 'btree' to 'extents'. The next few patches widen the reverse mapping routines to fit the 64-bit numbers required to store information about the realtime device and establish a new b+tree type (rtrmapbt) for the realtime variant of the rmapbt. After that are a few patches to handle rooting the rtrmapbt in a specific inode that's referenced by the superblock. Finally, there are patches to implement GETFSMAP with the rtrmapbt and scrub functionality for the rtrmapbt and rtbitmap; and then wire up the online scrub functionality. 
We also enhance EFIs to support tracking freeing of realtime extents so that when rmap is turned on we can maintain the same order of operations as the regular rmap code. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. --D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-rmap xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-rmap fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-rmap xfsdocs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=realtime-rmap --- fs/xfs/Makefile | 3 fs/xfs/libxfs/xfs_bmap.c | 22 + fs/xfs/libxfs/xfs_btree.c | 121 ++++ fs/xfs/libxfs/xfs_btree.h | 7 fs/xfs/libxfs/xfs_defer.c | 1 fs/xfs/libxfs/xfs_defer.h | 1 fs/xfs/libxfs/xfs_format.h | 24 + fs/xfs/libxfs/xfs_fs.h | 4 fs/xfs/libxfs/xfs_health.h | 4 fs/xfs/libxfs/xfs_imeta.c | 6 fs/xfs/libxfs/xfs_inode_buf.c | 6 fs/xfs/libxfs/xfs_inode_fork.c | 13 fs/xfs/libxfs/xfs_log_format.h | 4 fs/xfs/libxfs/xfs_refcount.c | 6 fs/xfs/libxfs/xfs_rmap.c | 227 +++++++- fs/xfs/libxfs/xfs_rmap.h | 22 + fs/xfs/libxfs/xfs_rtgroup.c | 12 fs/xfs/libxfs/xfs_rtgroup.h | 20 + fs/xfs/libxfs/xfs_rtrmap_btree.c | 1036 ++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 218 ++++++++ fs/xfs/libxfs/xfs_sb.c | 6 fs/xfs/libxfs/xfs_shared.h | 2 fs/xfs/libxfs/xfs_swapext.c | 4 fs/xfs/libxfs/xfs_trans_resv.c | 12 fs/xfs/libxfs/xfs_trans_space.h | 13 fs/xfs/libxfs/xfs_types.h | 5 fs/xfs/scrub/alloc_repair.c | 10 fs/xfs/scrub/bmap.c | 128 ++++- fs/xfs/scrub/bmap_repair.c | 131 +++++ fs/xfs/scrub/common.c | 153 +++++- fs/xfs/scrub/common.h | 14 - fs/xfs/scrub/cow_repair.c | 2 fs/xfs/scrub/health.c | 1 fs/xfs/scrub/inode.c | 10 
fs/xfs/scrub/inode_repair.c | 75 +++ fs/xfs/scrub/reap.c | 5 fs/xfs/scrub/reap.h | 2 fs/xfs/scrub/repair.c | 229 ++++++++ fs/xfs/scrub/repair.h | 34 + fs/xfs/scrub/rmap_repair.c | 36 + fs/xfs/scrub/rtbitmap.c | 104 ++++ fs/xfs/scrub/rtbitmap_repair.c | 692 +++++++++++++++++++++++++ fs/xfs/scrub/rtrmap.c | 282 ++++++++++ fs/xfs/scrub/rtrmap_repair.c | 908 +++++++++++++++++++++++++++++++++ fs/xfs/scrub/rtsummary_repair.c | 3 fs/xfs/scrub/scrub.c | 11 fs/xfs/scrub/scrub.h | 14 + fs/xfs/scrub/tempfile.c | 15 - fs/xfs/scrub/tempswap.h | 2 fs/xfs/scrub/trace.c | 1 fs/xfs/scrub/trace.h | 249 +++++++++ fs/xfs/scrub/xfbtree.c | 3 fs/xfs/xfs_bmap_item.c | 5 fs/xfs/xfs_buf_item_recover.c | 4 fs/xfs/xfs_drain.c | 41 ++ fs/xfs/xfs_drain.h | 19 + fs/xfs/xfs_extfree_item.c | 2 fs/xfs/xfs_fsmap.c | 579 ++++++++++++++------- fs/xfs/xfs_fsops.c | 12 fs/xfs/xfs_health.c | 4 fs/xfs/xfs_inode.c | 19 + fs/xfs/xfs_inode_item.c | 2 fs/xfs/xfs_inode_item_recover.c | 33 + fs/xfs/xfs_mount.c | 9 fs/xfs/xfs_mount.h | 10 fs/xfs/xfs_ondisk.h | 2 fs/xfs/xfs_qm.c | 20 + fs/xfs/xfs_qm_bhv.c | 2 fs/xfs/xfs_quota.h | 4 fs/xfs/xfs_rmap_item.c | 27 + fs/xfs/xfs_rtalloc.c | 283 ++++++++++ fs/xfs/xfs_rtalloc.h | 9 fs/xfs/xfs_super.c | 6 fs/xfs/xfs_trace.c | 18 + fs/xfs/xfs_trace.h | 136 ++++- 75 files changed, 5771 insertions(+), 388 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_rtrmap_btree.c create mode 100644 fs/xfs/libxfs/xfs_rtrmap_btree.h create mode 100644 fs/xfs/scrub/rtrmap.c create mode 100644 fs/xfs/scrub/rtrmap_repair.c ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 06/38] xfs: add realtime rmap btree operations 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 01/38] xfs: prepare rmap btree cursor tracepoints for realtime Darrick J. Wong ` (36 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Implement the generic btree operations needed to manipulate rtrmap btree blocks. This is different from the regular rmapbt in that we allocate space from the filesystem at large, and are neither constrained to the free space nor any particular AG. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_btree.c | 113 ++++++++++++++++ fs/xfs/libxfs/xfs_btree.h | 5 + fs/xfs/libxfs/xfs_imeta.c | 6 + fs/xfs/libxfs/xfs_rtrmap_btree.c | 271 ++++++++++++++++++++++++++++++++++++++ 4 files changed, 395 insertions(+) diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index 4f1f03b207d3..fe742567a7dd 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -33,6 +33,10 @@ #include "xfs_btree_mem.h" #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_bmap.h" +#include "xfs_rmap.h" +#include "xfs_quota.h" +#include "xfs_imeta.h" /* * Btree magic numbers. @@ -5589,3 +5593,112 @@ xfs_btree_goto_left_edge( return 0; } + +/* Allocate a block for an inode-rooted metadata btree. 
*/ +int +xfs_btree_alloc_imeta_block( + struct xfs_btree_cur *cur, + const union xfs_btree_ptr *start, + union xfs_btree_ptr *new, + int *stat) +{ + struct xfs_alloc_arg args = { + .mp = cur->bc_mp, + .tp = cur->bc_tp + }; + struct xfs_inode *ip = cur->bc_ino.ip; + struct xfs_trans *tp = cur->bc_tp; + int error; + + ASSERT(!XFS_NOT_DQATTACHED(cur->bc_mp, ip)); + + args.fsbno = tp->t_firstblock; + args.resv = XFS_AG_RESV_IMETA; + xfs_rmap_ino_bmbt_owner(&args.oinfo, ip->i_ino, cur->bc_ino.whichfork); + + if (args.fsbno == NULLFSBLOCK) { + args.fsbno = be64_to_cpu(start->l); + args.type = XFS_ALLOCTYPE_START_BNO; + /* + * Make sure there is sufficient room left in the AG to + * complete a full tree split for an extent insert. If + * we are converting the middle part of an extent then + * we may need space for two tree splits. + * + * We are relying on the caller to make the correct block + * reservation for this operation to succeed. If the + * reservation amount is insufficient then we may fail a + * block allocation here and corrupt the filesystem. + */ + args.minleft = tp->t_blk_res; + } else if (tp->t_flags & XFS_TRANS_LOWMODE) { + args.type = XFS_ALLOCTYPE_START_BNO; + } else { + args.type = XFS_ALLOCTYPE_NEAR_BNO; + } + + args.minlen = args.maxlen = args.prod = 1; + error = xfs_alloc_vextent(&args); + if (error) + goto error0; + + if (args.fsbno == NULLFSBLOCK && args.minleft) { + /* + * Could not find an AG with enough free space to satisfy + * a full btree split. Try again without minleft and if + * successful activate the lowspace algorithm. 
+ */ + args.fsbno = 0; + args.type = XFS_ALLOCTYPE_FIRST_AG; + args.minleft = 0; + error = xfs_alloc_vextent(&args); + if (error) + goto error0; + tp->t_flags |= XFS_TRANS_LOWMODE; + } + if (args.fsbno == NULLFSBLOCK) { + *stat = 0; + return 0; + } + ASSERT(args.len == 1); + + xfs_imeta_resv_alloc_extent(ip, &args); + cur->bc_ino.allocated++; + + new->l = cpu_to_be64(args.fsbno); + *stat = 1; + return 0; + + error0: + return error; +} + +/* Free a block from an inode-rooted metadata btree. */ +int +xfs_btree_free_imeta_block( + struct xfs_btree_cur *cur, + struct xfs_buf *bp) +{ + struct xfs_owner_info oinfo; + struct xfs_mount *mp = cur->bc_mp; + struct xfs_inode *ip = cur->bc_ino.ip; + struct xfs_trans *tp = cur->bc_tp; + struct xfs_perag *pag; + xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp)); + xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, fsbno); + xfs_agblock_t agbno = XFS_FSB_TO_AGBNO(mp, fsbno); + int error; + + ASSERT(!XFS_NOT_DQATTACHED(mp, ip)); + + xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork); + pag = xfs_perag_get(mp, agno); + error = __xfs_free_extent(tp, pag, agbno, 1, &oinfo, XFS_AG_RESV_IMETA, + false); + xfs_perag_put(pag); + if (error) + return error; + + xfs_imeta_resv_free_extent(ip, tp, 1); + return 0; +} diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index ddaad83d4ff9..5a733767649b 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -761,4 +761,9 @@ void xfs_btree_destroy_cur_caches(void); int xfs_btree_goto_left_edge(struct xfs_btree_cur *cur); +int xfs_btree_alloc_imeta_block(struct xfs_btree_cur *cur, + const union xfs_btree_ptr *start, union xfs_btree_ptr *newp, + int *stat); +int xfs_btree_free_imeta_block(struct xfs_btree_cur *cur, struct xfs_buf *bp); + #endif /* __XFS_BTREE_H__ */ diff --git a/fs/xfs/libxfs/xfs_imeta.c b/fs/xfs/libxfs/xfs_imeta.c index 5bfb1eabf21d..1065144911b3 100644 --- a/fs/xfs/libxfs/xfs_imeta.c +++ b/fs/xfs/libxfs/xfs_imeta.c @@ -1303,6 
+1303,9 @@ xfs_imeta_resv_alloc_extent( xfs_trans_mod_sb(args->tp, XFS_TRANS_SB_FDBLOCKS, -len); ip->i_nblocks += args->len; + xfs_trans_mod_dquot_byino(args->tp, ip, XFS_TRANS_DQ_BCOUNT, args->len); + + xfs_trans_log_inode(args->tp, ip, XFS_ILOG_CORE); } /* Free a block to the metadata file's reservation. */ @@ -1318,6 +1321,7 @@ xfs_imeta_resv_free_extent( trace_xfs_imeta_resv_free_extent(ip, len); ip->i_nblocks -= len; + xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -len); /* * Add the freed blocks back into the inode's delalloc reservation @@ -1338,6 +1342,8 @@ xfs_imeta_resv_free_extent( */ if (len) xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, len); + + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); } /* Release a metadata file's space reservation. */ diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 7f6ba2efdaf2..551d575713db 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -18,12 +18,14 @@ #include "xfs_alloc.h" #include "xfs_btree.h" #include "xfs_btree_staging.h" +#include "xfs_rmap.h" #include "xfs_rtrmap_btree.h" #include "xfs_trace.h" #include "xfs_cksum.h" #include "xfs_error.h" #include "xfs_extent_busy.h" #include "xfs_rtgroup.h" +#include "xfs_bmap.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -52,6 +54,182 @@ xfs_rtrmapbt_dup_cursor( return new; } +STATIC int +xfs_rtrmapbt_get_minrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + + return xfs_rtrmapbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes, + level == 0) / 2; + } + + return cur->bc_mp->m_rtrmap_mnr[level != 0]; +} + +STATIC int +xfs_rtrmapbt_get_maxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + + return xfs_rtrmapbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes, + level == 0); + } + + return cur->bc_mp->m_rtrmap_mxr[level != 0]; 
+} + +/* + * Convert the ondisk record's offset field into the ondisk key's offset field. + * Fork and bmbt are significant parts of the rmap record key, but written + * status is merely a record attribute. + */ +static inline __be64 ondisk_rec_offset_to_key(const union xfs_btree_rec *rec) +{ + return rec->rmap.rm_offset & ~cpu_to_be64(XFS_RMAP_OFF_UNWRITTEN); +} + +STATIC void +xfs_rtrmapbt_init_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + key->rmap.rm_startblock = rec->rmap.rm_startblock; + key->rmap.rm_owner = rec->rmap.rm_owner; + key->rmap.rm_offset = ondisk_rec_offset_to_key(rec); +} + +STATIC void +xfs_rtrmapbt_init_high_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + uint64_t off; + int adj; + + adj = be32_to_cpu(rec->rmap.rm_blockcount) - 1; + + key->rmap.rm_startblock = rec->rmap.rm_startblock; + be32_add_cpu(&key->rmap.rm_startblock, adj); + key->rmap.rm_owner = rec->rmap.rm_owner; + key->rmap.rm_offset = ondisk_rec_offset_to_key(rec); + if (XFS_RMAP_NON_INODE_OWNER(be64_to_cpu(rec->rmap.rm_owner)) || + XFS_RMAP_IS_BMBT_BLOCK(be64_to_cpu(rec->rmap.rm_offset))) + return; + off = be64_to_cpu(key->rmap.rm_offset); + off = (XFS_RMAP_OFF(off) + adj) | (off & ~XFS_RMAP_OFF_MASK); + key->rmap.rm_offset = cpu_to_be64(off); +} + +STATIC void +xfs_rtrmapbt_init_rec_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_rec *rec) +{ + rec->rmap.rm_startblock = cpu_to_be32(cur->bc_rec.r.rm_startblock); + rec->rmap.rm_blockcount = cpu_to_be32(cur->bc_rec.r.rm_blockcount); + rec->rmap.rm_owner = cpu_to_be64(cur->bc_rec.r.rm_owner); + rec->rmap.rm_offset = cpu_to_be64( + xfs_rmap_irec_offset_pack(&cur->bc_rec.r)); +} + +STATIC void +xfs_rtrmapbt_init_ptr_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr) +{ + ptr->l = 0; +} + +/* + * Mask the appropriate parts of the ondisk key field for a key comparison. 
+ * Fork and bmbt are significant parts of the rmap record key, but written + * status is merely a record attribute. + */ +static inline uint64_t offset_keymask(uint64_t offset) +{ + return offset & ~XFS_RMAP_OFF_UNWRITTEN; +} + +STATIC int64_t +xfs_rtrmapbt_key_diff( + struct xfs_btree_cur *cur, + const union xfs_btree_key *key) +{ + struct xfs_rmap_irec *rec = &cur->bc_rec.r; + const struct xfs_rmap_key *kp = &key->rmap; + __u64 x, y; + int64_t d; + + d = (int64_t)be32_to_cpu(kp->rm_startblock) - rec->rm_startblock; + if (d) + return d; + + x = be64_to_cpu(kp->rm_owner); + y = rec->rm_owner; + if (x > y) + return 1; + else if (y > x) + return -1; + + x = offset_keymask(be64_to_cpu(kp->rm_offset)); + y = offset_keymask(xfs_rmap_irec_offset_pack(rec)); + if (x > y) + return 1; + else if (y > x) + return -1; + return 0; +} + +STATIC int64_t +xfs_rtrmapbt_diff_two_keys( + struct xfs_btree_cur *cur, + const union xfs_btree_key *k1, + const union xfs_btree_key *k2, + const union xfs_btree_key *mask) +{ + const struct xfs_rmap_key *kp1 = &k1->rmap; + const struct xfs_rmap_key *kp2 = &k2->rmap; + int64_t d; + __u64 x, y; + + /* Doesn't make sense to mask off the physical space part */ + ASSERT(!mask || mask->rmap.rm_startblock); + + d = (int64_t)be32_to_cpu(kp1->rm_startblock) - + be32_to_cpu(kp2->rm_startblock); + if (d) + return d; + + if (!mask || mask->rmap.rm_owner) { + x = be64_to_cpu(kp1->rm_owner); + y = be64_to_cpu(kp2->rm_owner); + if (x > y) + return 1; + else if (y > x) + return -1; + } + + if (!mask || mask->rmap.rm_offset) { + /* Doesn't make sense to allow offset but not owner */ + ASSERT(!mask || mask->rmap.rm_owner); + + x = offset_keymask(be64_to_cpu(kp1->rm_offset)); + y = offset_keymask(be64_to_cpu(kp2->rm_offset)); + if (x > y) + return 1; + else if (y > x) + return -1; + } + + return 0; +} + static xfs_failaddr_t xfs_rtrmapbt_verify( struct xfs_buf *bp) @@ -118,6 +296,86 @@ const struct xfs_buf_ops xfs_rtrmapbt_buf_ops = { .verify_struct = 
xfs_rtrmapbt_verify, }; +STATIC int +xfs_rtrmapbt_keys_inorder( + struct xfs_btree_cur *cur, + const union xfs_btree_key *k1, + const union xfs_btree_key *k2) +{ + uint32_t x; + uint32_t y; + uint64_t a; + uint64_t b; + + x = be32_to_cpu(k1->rmap.rm_startblock); + y = be32_to_cpu(k2->rmap.rm_startblock); + if (x < y) + return 1; + else if (x > y) + return 0; + a = be64_to_cpu(k1->rmap.rm_owner); + b = be64_to_cpu(k2->rmap.rm_owner); + if (a < b) + return 1; + else if (a > b) + return 0; + a = offset_keymask(be64_to_cpu(k1->rmap.rm_offset)); + b = offset_keymask(be64_to_cpu(k2->rmap.rm_offset)); + if (a <= b) + return 1; + return 0; +} + +STATIC int +xfs_rtrmapbt_recs_inorder( + struct xfs_btree_cur *cur, + const union xfs_btree_rec *r1, + const union xfs_btree_rec *r2) +{ + uint32_t x; + uint32_t y; + uint64_t a; + uint64_t b; + + x = be32_to_cpu(r1->rmap.rm_startblock); + y = be32_to_cpu(r2->rmap.rm_startblock); + if (x < y) + return 1; + else if (x > y) + return 0; + a = be64_to_cpu(r1->rmap.rm_owner); + b = be64_to_cpu(r2->rmap.rm_owner); + if (a < b) + return 1; + else if (a > b) + return 0; + a = offset_keymask(be64_to_cpu(r1->rmap.rm_offset)); + b = offset_keymask(be64_to_cpu(r2->rmap.rm_offset)); + if (a <= b) + return 1; + return 0; +} + +STATIC enum xbtree_key_contig +xfs_rtrmapbt_keys_contiguous( + struct xfs_btree_cur *cur, + const union xfs_btree_key *key1, + const union xfs_btree_key *key2, + const union xfs_btree_key *mask) +{ + ASSERT(!mask || mask->rmap.rm_startblock); + + /* + * We only support checking contiguity of the physical space component. + * If any callers ever need more specificity than that, they'll have to + * implement it here. 
+ */ + ASSERT(!mask || (!mask->rmap.rm_owner && !mask->rmap.rm_offset)); + + return xbtree_key_contig(be32_to_cpu(key1->rmap.rm_startblock), + be32_to_cpu(key2->rmap.rm_startblock)); +} + const struct xfs_btree_ops xfs_rtrmapbt_ops = { .rec_len = sizeof(struct xfs_rmap_rec), .key_len = 2 * sizeof(struct xfs_rmap_key), @@ -126,7 +384,20 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = { XFS_BTREE_IROOT_RECORDS, .dup_cursor = xfs_rtrmapbt_dup_cursor, + .alloc_block = xfs_btree_alloc_imeta_block, + .free_block = xfs_btree_free_imeta_block, + .get_minrecs = xfs_rtrmapbt_get_minrecs, + .get_maxrecs = xfs_rtrmapbt_get_maxrecs, + .init_key_from_rec = xfs_rtrmapbt_init_key_from_rec, + .init_high_key_from_rec = xfs_rtrmapbt_init_high_key_from_rec, + .init_rec_from_cur = xfs_rtrmapbt_init_rec_from_cur, + .init_ptr_from_cur = xfs_rtrmapbt_init_ptr_from_cur, + .key_diff = xfs_rtrmapbt_key_diff, .buf_ops = &xfs_rtrmapbt_buf_ops, + .diff_two_keys = xfs_rtrmapbt_diff_two_keys, + .keys_inorder = xfs_rtrmapbt_keys_inorder, + .recs_inorder = xfs_rtrmapbt_recs_inorder, + .keys_contiguous = xfs_rtrmapbt_keys_contiguous, }; /* Initialize a new rt rmap btree cursor. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
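Editorial aside on the key comparison in the patch above: the rtrmapbt orders keys by start block, then owner, then offset, and it strips the unwritten flag from the offset before comparing because written status is a record attribute, not part of the key. What follows is a minimal standalone sketch of that ordering, not the kernel code. The sketch_* names, the host-endian struct, and the bit position chosen for the unwritten flag are all assumptions made for illustration; the real code works on big-endian on-disk fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position for the "unwritten" flag; illustrative only. */
#define SKETCH_OFF_UNWRITTEN	(1ULL << 61)

/* Hypothetical host-endian stand-in for the on-disk rmap key. */
struct sketch_rmap_key {
	uint32_t	startblock;
	uint64_t	owner;
	uint64_t	offset;
};

/* Drop the unwritten flag so it never influences key ordering. */
static inline uint64_t sketch_offset_keymask(uint64_t offset)
{
	return offset & ~SKETCH_OFF_UNWRITTEN;
}

/* Three-way compare of unsigned values without overflow: -1, 0, or 1. */
static inline int sketch_cmp_u64(uint64_t x, uint64_t y)
{
	return (x > y) - (x < y);
}

/*
 * Compare two keys.  A NULL mask compares every field; otherwise only the
 * fields that are nonzero in the mask take part in the comparison.
 */
int64_t sketch_diff_two_keys(const struct sketch_rmap_key *k1,
			     const struct sketch_rmap_key *k2,
			     const struct sketch_rmap_key *mask)
{
	int64_t d;

	/* Masking off the physical space part makes no sense. */
	assert(!mask || mask->startblock);

	d = (int64_t)k1->startblock - (int64_t)k2->startblock;
	if (d)
		return d;

	if (!mask || mask->owner) {
		d = sketch_cmp_u64(k1->owner, k2->owner);
		if (d)
			return d;
	}

	if (!mask || mask->offset) {
		/* Comparing offset but not owner makes no sense either. */
		assert(!mask || mask->owner);

		d = sketch_cmp_u64(sketch_offset_keymask(k1->offset),
				   sketch_offset_keymask(k2->offset));
		if (d)
			return d;
	}

	return 0;
}
```

With this sketch, two keys that differ only in the unwritten bit compare as equal, and a mask that selects only the start block makes keys with different owners compare as equal too.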
* [PATCH 01/38] xfs: prepare rmap btree cursor tracepoints for realtime 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong 2022-12-30 22:18 ` [PATCH 06/38] xfs: add realtime rmap btree operations Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 05/38] xfs: realtime rmap btree transaction reservations Darrick J. Wong ` (35 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Rework the rmap btree cursor tracepoints in preparation to handle the realtime rmap btree cursor. Mostly this involves renaming the field to "rmapbno" and extracting the group number from the cursor when possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_trace.c | 18 ++++++++++++++++ fs/xfs/xfs_trace.h | 58 +++++++++++++++++++++++++++------------------------- 2 files changed, 48 insertions(+), 28 deletions(-) diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c index d7ede15110e8..ae35868e0638 100644 --- a/fs/xfs/xfs_trace.c +++ b/fs/xfs/xfs_trace.c @@ -45,6 +45,24 @@ #include "xfs_rtgroup.h" #include "xfs_rmap.h" +static inline void +xfs_rmapbt_crack_agno_opdev( + struct xfs_btree_cur *cur, + xfs_agnumber_t *agno, + dev_t *opdev) +{ + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) { + *agno = 0; + *opdev = xfbtree_target(cur->bc_mem.xfbtree)->bt_dev; + } else if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) { + *agno = cur->bc_ino.rtg->rtg_rgno; + *opdev = cur->bc_mp->m_rtdev_targp->bt_dev; + } else { + *agno = cur->bc_ag.pag->pag_agno; + *opdev = cur->bc_mp->m_super->s_dev; + } +} + /* * We include this last to have the helpers above available for the trace * event implementations. 
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index fd067e1e28db..6bf7c2aa8e9d 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -14,11 +14,15 @@ * ino: filesystem inode number * * agbno: per-AG block number in fs blocks + * rgbno: per-rtgroup block number in fs blocks * startblock: physical block number for file mappings. This is either a * segmented fsblock for data device mappings, or a rfsblock * for realtime device mappings * fsbcount: number of blocks in an extent, in fs blocks * + * rmapbno: physical block number for a reverse mapping. This is an agbno for + * per-AG rmap btrees or a rgbno for realtime rmap btrees. + * * daddr: physical block number in 512b blocks * bbcount: number of blocks in a physical extent, in 512b blocks * @@ -2836,13 +2840,14 @@ DEFINE_DEFER_PENDING_ITEM_EVENT(xfs_defer_finish_item); /* rmap tracepoints */ DECLARE_EVENT_CLASS(xfs_rmap_class, TP_PROTO(struct xfs_btree_cur *cur, - xfs_agblock_t agbno, xfs_extlen_t len, bool unwritten, + xfs_agblock_t rmapbno, xfs_extlen_t len, bool unwritten, const struct xfs_owner_info *oinfo), - TP_ARGS(cur, agbno, len, unwritten, oinfo), + TP_ARGS(cur, rmapbno, len, unwritten, oinfo), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(xfs_agnumber_t, agno) - __field(xfs_agblock_t, agbno) + __field(xfs_agblock_t, rmapbno) __field(xfs_extlen_t, len) __field(uint64_t, owner) __field(uint64_t, offset) @@ -2850,8 +2855,8 @@ DECLARE_EVENT_CLASS(xfs_rmap_class, ), TP_fast_assign( __entry->dev = cur->bc_mp->m_super->s_dev; - __entry->agno = cur->bc_ag.pag->pag_agno; - __entry->agbno = agbno; + xfs_rmapbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev); + __entry->rmapbno = rmapbno; __entry->len = len; __entry->owner = oinfo->oi_owner; __entry->offset = oinfo->oi_offset; @@ -2859,10 +2864,11 @@ DECLARE_EVENT_CLASS(xfs_rmap_class, if (unwritten) __entry->flags |= XFS_RMAP_UNWRITTEN; ), - TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 
0x%llx flags 0x%lx", + TP_printk("dev %d:%d opdev %d:%d agno 0x%x rmapbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%lx", MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, - __entry->agbno, + __entry->rmapbno, __entry->len, __entry->owner, __entry->offset, @@ -2871,9 +2877,9 @@ DECLARE_EVENT_CLASS(xfs_rmap_class, #define DEFINE_RMAP_EVENT(name) \ DEFINE_EVENT(xfs_rmap_class, name, \ TP_PROTO(struct xfs_btree_cur *cur, \ - xfs_agblock_t agbno, xfs_extlen_t len, bool unwritten, \ + xfs_agblock_t rmapbno, xfs_extlen_t len, bool unwritten, \ const struct xfs_owner_info *oinfo), \ - TP_ARGS(cur, agbno, len, unwritten, oinfo)) + TP_ARGS(cur, rmapbno, len, unwritten, oinfo)) /* btree cursor error/%ip tracepoint class */ DECLARE_EVENT_CLASS(xfs_btree_error_class, @@ -2932,40 +2938,35 @@ TRACE_EVENT(xfs_rmap_convert_state, TP_ARGS(cur, state, caller_ip), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(xfs_agnumber_t, agno) - __field(xfs_ino_t, ino) __field(int, state) __field(unsigned long, caller_ip) ), TP_fast_assign( __entry->dev = cur->bc_mp->m_super->s_dev; - if (cur->bc_flags & XFS_BTREE_ROOT_IN_INODE) { - __entry->agno = 0; - __entry->ino = cur->bc_ino.ip->i_ino; - } else { - __entry->agno = cur->bc_ag.pag->pag_agno; - __entry->ino = 0; - } + xfs_rmapbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev); __entry->state = state; __entry->caller_ip = caller_ip; ), - TP_printk("dev %d:%d agno 0x%x ino 0x%llx state %d caller %pS", + TP_printk("dev %d:%d opdev %d:%d agno 0x%x state %d caller %pS", MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, - __entry->ino, __entry->state, (char *)__entry->caller_ip) ); DECLARE_EVENT_CLASS(xfs_rmapbt_class, TP_PROTO(struct xfs_btree_cur *cur, - xfs_agblock_t agbno, xfs_extlen_t len, + xfs_agblock_t rmapbno, xfs_extlen_t len, uint64_t owner, uint64_t offset, unsigned int flags), - 
TP_ARGS(cur, agbno, len, owner, offset, flags), + TP_ARGS(cur, rmapbno, len, owner, offset, flags), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(xfs_agnumber_t, agno) - __field(xfs_agblock_t, agbno) + __field(xfs_agblock_t, rmapbno) __field(xfs_extlen_t, len) __field(uint64_t, owner) __field(uint64_t, offset) @@ -2973,17 +2974,18 @@ DECLARE_EVENT_CLASS(xfs_rmapbt_class, ), TP_fast_assign( __entry->dev = cur->bc_mp->m_super->s_dev; - __entry->agno = cur->bc_ag.pag->pag_agno; - __entry->agbno = agbno; + xfs_rmapbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev); + __entry->rmapbno = rmapbno; __entry->len = len; __entry->owner = owner; __entry->offset = offset; __entry->flags = flags; ), - TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%x", + TP_printk("dev %d:%d opdev %d:%d agno 0x%x rmapbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%x", MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, - __entry->agbno, + __entry->rmapbno, __entry->len, __entry->owner, __entry->offset, @@ -2992,9 +2994,9 @@ DECLARE_EVENT_CLASS(xfs_rmapbt_class, #define DEFINE_RMAPBT_EVENT(name) \ DEFINE_EVENT(xfs_rmapbt_class, name, \ TP_PROTO(struct xfs_btree_cur *cur, \ - xfs_agblock_t agbno, xfs_extlen_t len, \ + xfs_agblock_t rmapbno, xfs_extlen_t len, \ uint64_t owner, uint64_t offset, unsigned int flags), \ - TP_ARGS(cur, agbno, len, owner, offset, flags)) + TP_ARGS(cur, rmapbno, len, owner, offset, flags)) TRACE_DEFINE_ENUM(XFS_RMAP_MAP); TRACE_DEFINE_ENUM(XFS_RMAP_MAP_SHARED); ^ permalink raw reply related [flat|nested] 565+ messages in thread
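Editorial aside on the tracepoint helper in the patch above: the helper picks a group number and an "operating device" based on where the btree lives, checking the in-memory case first, then the inode-rooted (realtime) case, and falling back to the per-AG case. The following is a rough standalone sketch of that three-way dispatch. The struct, the flag values, and the field names are hypothetical stand-ins for the real cursor, not the kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the kernel's actual values may differ. */
#define SKETCH_BTREE_IN_MEMORY		(1u << 0)
#define SKETCH_BTREE_ROOT_IN_INODE	(1u << 1)

/* Hypothetical flattened stand-in for the parts of a btree cursor we need. */
struct sketch_cursor {
	unsigned int	flags;
	uint32_t	mem_dev;	/* device backing an in-memory btree */
	uint32_t	rt_dev;		/* realtime device */
	uint32_t	rgno;		/* realtime group number */
	uint32_t	data_dev;	/* data device */
	uint32_t	agno;		/* allocation group number */
};

/* Report which group and which device this cursor's records refer to. */
void sketch_crack_group_opdev(const struct sketch_cursor *cur,
			      uint32_t *group, uint32_t *opdev)
{
	assert(cur && group && opdev);

	if (cur->flags & SKETCH_BTREE_IN_MEMORY) {
		/* In-memory btrees have no group; report group 0. */
		*group = 0;
		*opdev = cur->mem_dev;
	} else if (cur->flags & SKETCH_BTREE_ROOT_IN_INODE) {
		/* Inode-rooted rmap btree: records describe the rt device. */
		*group = cur->rgno;
		*opdev = cur->rt_dev;
	} else {
		/* Classic per-AG rmap btree on the data device. */
		*group = cur->agno;
		*opdev = cur->data_dev;
	}
}
```

The point of the sketch is only the ordering of the checks: each tracepoint can then print both the filesystem device and the device the mapping actually refers to.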
* [PATCH 05/38] xfs: realtime rmap btree transaction reservations 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong 2022-12-30 22:18 ` [PATCH 06/38] xfs: add realtime rmap btree operations Darrick J. Wong 2022-12-30 22:18 ` [PATCH 01/38] xfs: prepare rmap btree cursor tracepoints for realtime Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 04/38] xfs: define the on-disk realtime rmap btree format Darrick J. Wong ` (34 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make sure that there's enough log reservation to handle mapping and unmapping realtime extents. We have to reserve enough space to handle a split in the rtrmapbt to add the record and a second split in the regular rmapbt to record the rtrmapbt split. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_swapext.c | 4 +++- fs/xfs/libxfs/xfs_trans_resv.c | 12 ++++++++++-- fs/xfs/libxfs/xfs_trans_space.h | 13 +++++++++++++ 3 files changed, 26 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_swapext.c b/fs/xfs/libxfs/xfs_swapext.c index 36f03b0bf4ed..9d2ad2a680f8 100644 --- a/fs/xfs/libxfs/xfs_swapext.c +++ b/fs/xfs/libxfs/xfs_swapext.c @@ -702,7 +702,9 @@ xfs_swapext_rmapbt_blocks( if (!xfs_has_rmapbt(mp)) return 0; if (XFS_IS_REALTIME_INODE(req->ip1)) - return 0; + return howmany_64(req->nr_exchanges, + XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp)) * + XFS_RTRMAPADD_SPACE_RES(mp); return howmany_64(req->nr_exchanges, XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)) * diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c index 3435492b1658..52a4386a3d96 100644 --- a/fs/xfs/libxfs/xfs_trans_resv.c +++ b/fs/xfs/libxfs/xfs_trans_resv.c @@ -211,7 +211,9 @@ xfs_calc_inode_chunk_res( * Per-extent log reservation for the btree changes involved in freeing or 
* allocating a realtime extent. We have to be able to log as many rtbitmap * blocks as needed to mark inuse XFS_BMBT_MAX_EXTLEN blocks' worth of realtime - * extents, as well as the realtime summary block. + * extents, as well as the realtime summary block (t1). Realtime rmap btree + * operations happen in a second transaction, so factor in a couple of rtrmapbt + * splits (t2). */ static unsigned int xfs_rtalloc_block_count( @@ -220,10 +222,16 @@ xfs_rtalloc_block_count( { unsigned int rtbmp_blocks; xfs_rtxlen_t rtxlen; + unsigned int t1, t2 = 0; rtxlen = xfs_extlen_to_rtxlen(mp, XFS_MAX_BMBT_EXTLEN); rtbmp_blocks = xfs_rtbitmap_blockcount(mp, rtxlen); - return (rtbmp_blocks + 1) * num_ops; + t1 = (rtbmp_blocks + 1) * num_ops; + + if (xfs_has_rmapbt(mp)) + t2 = num_ops * (2 * mp->m_rtrmap_maxlevels - 1); + + return max(t1, t2); } /* diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h index 9640fc232c14..8124893a035d 100644 --- a/fs/xfs/libxfs/xfs_trans_space.h +++ b/fs/xfs/libxfs/xfs_trans_space.h @@ -14,6 +14,19 @@ #define XFS_MAX_CONTIG_BMAPS_PER_BLOCK(mp) \ (((mp)->m_bmap_dmxr[0]) - ((mp)->m_bmap_dmnr[0])) +/* Worst case number of realtime rmaps that can be held in a block. */ +#define XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp) \ + (((mp)->m_rtrmap_mxr[0]) - ((mp)->m_rtrmap_mnr[0])) + +/* Adding one realtime rmap could split every level to the top of the tree. */ +#define XFS_RTRMAPADD_SPACE_RES(mp) ((mp)->m_rtrmap_maxlevels) + +/* Blocks we might need to add "b" realtime rmaps to a tree. */ +#define XFS_NRTRMAPADD_SPACE_RES(mp, b) \ + ((((b) + XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp) - 1) / \ + XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp)) * \ + XFS_RTRMAPADD_SPACE_RES(mp)) + /* Worst case number of rmaps that can be held in a block. */ #define XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp) \ (((mp)->m_rmap_mxr[0]) - ((mp)->m_rmap_mnr[0])) ^ permalink raw reply related [flat|nested] 565+ messages in thread
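Editorial aside on the reservation arithmetic in the patch above: the space macro rounds the number of new rmaps up to whole leaf blocks and charges one full-height split per block, while the log block count takes the larger of the rtbitmap/summary cost (t1) and the rtrmapbt split cost (t2), since the two happen in separate transactions. Below is a minimal sketch of both calculations with plain integers. The sketch_geom struct and function names are hypothetical, and the numbers in the note after the code are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few mount geometry fields the macros read. */
struct sketch_geom {
	unsigned int	rtrmap_mxr0;		/* max records per leaf block */
	unsigned int	rtrmap_mnr0;		/* min records per leaf block */
	unsigned int	rtrmap_maxlevels;	/* max btree height */
};

/*
 * Blocks needed to add "nr" realtime rmaps: one worst-case split of every
 * level for each leaf block's worth of new records.
 */
unsigned int sketch_nrtrmapadd_space_res(const struct sketch_geom *g,
					 unsigned int nr)
{
	unsigned int per_block = g->rtrmap_mxr0 - g->rtrmap_mnr0;

	assert(per_block > 0);
	return ((nr + per_block - 1) / per_block) * g->rtrmap_maxlevels;
}

/*
 * Blocks to log per realtime extent operation: the larger of the rtbitmap
 * plus summary cost (t1) and the rtrmapbt split cost (t2).
 */
unsigned int sketch_rtalloc_block_count(const struct sketch_geom *g,
					unsigned int rtbmp_blocks,
					unsigned int num_ops,
					int has_rmapbt)
{
	unsigned int t1 = (rtbmp_blocks + 1) * num_ops;
	unsigned int t2 = 0;

	if (has_rmapbt)
		t2 = num_ops * (2 * g->rtrmap_maxlevels - 1);

	return t1 > t2 ? t1 : t2;
}
```

As a worked example under made-up geometry (100 max and 50 min records per leaf, height 4): adding 51 rmaps rounds up to two leaf blocks and reserves 2 * 4 = 8 blocks, and with two rtbitmap blocks and two operations the count is max((2 + 1) * 2, 2 * (2 * 4 - 1)) = max(6, 14) = 14.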
* [PATCH 04/38] xfs: define the on-disk realtime rmap btree format 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 05/38] xfs: realtime rmap btree transaction reservations Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 03/38] xfs: introduce realtime rmap btree definitions Darrick J. Wong ` (33 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Start filling out the rtrmap btree implementation. Start with the on-disk btree format; add everything needed to read, write and manipulate rmap btree blocks. This prepares the way for connecting the btree operations implementation. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_btree.c | 6 + fs/xfs/libxfs/xfs_format.h | 3 fs/xfs/libxfs/xfs_rtrmap_btree.c | 306 ++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 83 ++++++++++ fs/xfs/libxfs/xfs_sb.c | 6 + fs/xfs/libxfs/xfs_shared.h | 2 fs/xfs/xfs_mount.c | 5 - fs/xfs/xfs_mount.h | 9 + fs/xfs/xfs_ondisk.h | 1 10 files changed, 420 insertions(+), 2 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_rtrmap_btree.c create mode 100644 fs/xfs/libxfs/xfs_rtrmap_btree.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 4bf6d663272b..84934538bf52 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -47,6 +47,7 @@ xfs-y += $(addprefix libxfs/, \ xfs_rmap_btree.o \ xfs_refcount.o \ xfs_refcount_btree.o \ + xfs_rtrmap_btree.o \ xfs_sb.o \ xfs_swapext.o \ xfs_symlink_remote.o \ diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index c02748e16075..4f1f03b207d3 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -32,6 +32,7 @@ #include "scrub/xfbtree.h" #include "xfs_btree_mem.h" #include "xfs_rtgroup.h" 
+#include "xfs_rtrmap_btree.h" /* * Btree magic numbers. @@ -1377,6 +1378,7 @@ xfs_btree_set_refs( xfs_buf_set_ref(bp, XFS_BMAP_BTREE_REF); break; case XFS_BTNUM_RMAP: + case XFS_BTNUM_RTRMAP: xfs_buf_set_ref(bp, XFS_RMAP_BTREE_REF); break; case XFS_BTNUM_REFC: @@ -5537,6 +5539,9 @@ xfs_btree_init_cur_caches(void) if (error) goto err; error = xfs_refcountbt_init_cur_cache(); + if (error) + goto err; + error = xfs_rtrmapbt_init_cur_cache(); if (error) goto err; @@ -5555,6 +5560,7 @@ xfs_btree_destroy_cur_caches(void) xfs_bmbt_destroy_cur_cache(); xfs_rmapbt_destroy_cur_cache(); xfs_refcountbt_destroy_cur_cache(); + xfs_rtrmapbt_destroy_cur_cache(); } /* Move the btree cursor before the first record. */ diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index b2d4ef28a480..fb727e1e4072 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1734,6 +1734,9 @@ typedef __be32 xfs_rmap_ptr_t; */ #define XFS_RTRMAP_CRC_MAGIC 0x4d415052 /* 'MAPR' */ +/* inode-based btree pointer type */ +typedef __be64 xfs_rtrmap_ptr_t; + /* * Reference Count Btree format definitions * diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c new file mode 100644 index 000000000000..7f6ba2efdaf2 --- /dev/null +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -0,0 +1,306 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_bit.h" +#include "xfs_sb.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_inode.h" +#include "xfs_trans.h" +#include "xfs_alloc.h" +#include "xfs_btree.h" +#include "xfs_btree_staging.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_trace.h" +#include "xfs_cksum.h" +#include "xfs_error.h" +#include "xfs_extent_busy.h" +#include "xfs_rtgroup.h" + +static struct kmem_cache *xfs_rtrmapbt_cur_cache; + +/* + * Realtime Reverse Map btree. + * + * This is a btree used to track the owner(s) of a given extent in the realtime + * device. See the comments in xfs_rmap_btree.c for more information. + * + * This tree is basically the same as the regular rmap btree except that it + * is rooted in an inode and does not live in free space. + */ + +static struct xfs_btree_cur * +xfs_rtrmapbt_dup_cursor( + struct xfs_btree_cur *cur) +{ + struct xfs_btree_cur *new; + + new = xfs_rtrmapbt_init_cursor(cur->bc_mp, cur->bc_tp, cur->bc_ino.rtg, + cur->bc_ino.ip); + + /* Copy the flags values since init cursor doesn't get them. 
*/ + new->bc_ino.flags = cur->bc_ino.flags; + + return new; +} + +static xfs_failaddr_t +xfs_rtrmapbt_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_target->bt_mount; + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + xfs_failaddr_t fa; + int level; + + if (!xfs_verify_magic(bp, block->bb_magic)) + return __this_address; + + if (!xfs_has_rmapbt(mp)) + return __this_address; + fa = xfs_btree_lblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN); + if (fa) + return fa; + level = be16_to_cpu(block->bb_level); + if (level > mp->m_rtrmap_maxlevels) + return __this_address; + + return xfs_btree_lblock_verify(bp, mp->m_rtrmap_mxr[level != 0]); +} + +static void +xfs_rtrmapbt_read_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + if (!xfs_btree_lblock_verify_crc(bp)) + xfs_verifier_error(bp, -EFSBADCRC, __this_address); + else { + fa = xfs_rtrmapbt_verify(bp); + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + } + + if (bp->b_error) + trace_xfs_btree_corrupt(bp, _RET_IP_); +} + +static void +xfs_rtrmapbt_write_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + fa = xfs_rtrmapbt_verify(bp); + if (fa) { + trace_xfs_btree_corrupt(bp, _RET_IP_); + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + return; + } + xfs_btree_lblock_calc_crc(bp); + +} + +const struct xfs_buf_ops xfs_rtrmapbt_buf_ops = { + .name = "xfs_rtrmapbt", + .magic = { 0, cpu_to_be32(XFS_RTRMAP_CRC_MAGIC) }, + .verify_read = xfs_rtrmapbt_read_verify, + .verify_write = xfs_rtrmapbt_write_verify, + .verify_struct = xfs_rtrmapbt_verify, +}; + +const struct xfs_btree_ops xfs_rtrmapbt_ops = { + .rec_len = sizeof(struct xfs_rmap_rec), + .key_len = 2 * sizeof(struct xfs_rmap_key), + .geom_flags = XFS_BTREE_LONG_PTRS | XFS_BTREE_ROOT_IN_INODE | + XFS_BTREE_CRC_BLOCKS | XFS_BTREE_OVERLAPPING | + XFS_BTREE_IROOT_RECORDS, + + .dup_cursor = xfs_rtrmapbt_dup_cursor, + .buf_ops = &xfs_rtrmapbt_buf_ops, +}; + +/* Initialize a new rt rmap btree cursor. 
*/ +static struct xfs_btree_cur * +xfs_rtrmapbt_init_common( + struct xfs_mount *mp, + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct xfs_btree_cur *cur; + + ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)); + + cur = xfs_btree_alloc_cursor(mp, tp, XFS_BTNUM_RTRMAP, + &xfs_rtrmapbt_ops, mp->m_rtrmap_maxlevels, + xfs_rtrmapbt_cur_cache); + cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_rmap_2); + + cur->bc_ino.ip = ip; + cur->bc_ino.allocated = 0; + cur->bc_ino.flags = 0; + + cur->bc_ino.rtg = xfs_rtgroup_bump(rtg); + return cur; +} + +/* Allocate a new rt rmap btree cursor. */ +struct xfs_btree_cur * +xfs_rtrmapbt_init_cursor( + struct xfs_mount *mp, + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct xfs_btree_cur *cur; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + + cur = xfs_rtrmapbt_init_common(mp, tp, rtg, ip); + cur->bc_nlevels = be16_to_cpu(ifp->if_broot->bb_level) + 1; + cur->bc_ino.forksize = xfs_inode_fork_size(ip, XFS_DATA_FORK); + cur->bc_ino.whichfork = XFS_DATA_FORK; + return cur; +} + +/* Create a new rt reverse mapping btree cursor with a fake root for staging. */ +struct xfs_btree_cur * +xfs_rtrmapbt_stage_cursor( + struct xfs_mount *mp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip, + struct xbtree_ifakeroot *ifake) +{ + struct xfs_btree_cur *cur; + + cur = xfs_rtrmapbt_init_common(mp, NULL, rtg, ip); + cur->bc_nlevels = ifake->if_levels; + cur->bc_ino.forksize = ifake->if_fork_size; + cur->bc_ino.whichfork = -1; + xfs_btree_stage_ifakeroot(cur, ifake, NULL); + return cur; +} + +/* + * Install a new rt reverse mapping btree root. Caller is responsible for + * invalidating and freeing the old btree blocks. 
+ */ +void +xfs_rtrmapbt_commit_staged_btree( + struct xfs_btree_cur *cur, + struct xfs_trans *tp) +{ + struct xbtree_ifakeroot *ifake = cur->bc_ino.ifake; + struct xfs_ifork *ifp; + int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; + + ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + + /* + * Free any resources hanging off the real fork, then shallow-copy the + * staging fork's contents into the real fork to transfer everything + * we just built. + */ + ifp = xfs_ifork_ptr(cur->bc_ino.ip, XFS_DATA_FORK); + xfs_idestroy_fork(ifp); + memcpy(ifp, ifake->if_fork, sizeof(struct xfs_ifork)); + + xfs_trans_log_inode(tp, cur->bc_ino.ip, flags); + xfs_btree_commit_ifakeroot(cur, tp, XFS_DATA_FORK, &xfs_rtrmapbt_ops); +} + +/* Calculate number of records in a rt reverse mapping btree block. */ +static inline unsigned int +xfs_rtrmapbt_block_maxrecs( + unsigned int blocklen, + bool leaf) +{ + if (leaf) + return blocklen / sizeof(struct xfs_rmap_rec); + return blocklen / + (2 * sizeof(struct xfs_rmap_key) + sizeof(xfs_rtrmap_ptr_t)); +} + +/* + * Calculate number of records in an rt reverse mapping btree block. + */ +unsigned int +xfs_rtrmapbt_maxrecs( + struct xfs_mount *mp, + unsigned int blocklen, + bool leaf) +{ + blocklen -= XFS_RTRMAP_BLOCK_LEN; + return xfs_rtrmapbt_block_maxrecs(blocklen, leaf); +} + +/* Compute the max possible height for realtime reverse mapping btrees. */ +unsigned int +xfs_rtrmapbt_maxlevels_ondisk(void) +{ + unsigned int minrecs[2]; + unsigned int blocklen; + + blocklen = XFS_MIN_CRC_BLOCKSIZE - XFS_BTREE_LBLOCK_CRC_LEN; + + minrecs[0] = xfs_rtrmapbt_block_maxrecs(blocklen, true) / 2; + minrecs[1] = xfs_rtrmapbt_block_maxrecs(blocklen, false) / 2; + + /* We need at most one record for every block in an rt group. 
*/ + return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_RGBLOCKS); +} + +int __init +xfs_rtrmapbt_init_cur_cache(void) +{ + xfs_rtrmapbt_cur_cache = kmem_cache_create("xfs_rtrmapbt_cur", + xfs_btree_cur_sizeof(xfs_rtrmapbt_maxlevels_ondisk()), + 0, 0, NULL); + + if (!xfs_rtrmapbt_cur_cache) + return -ENOMEM; + return 0; +} + +void +xfs_rtrmapbt_destroy_cur_cache(void) +{ + kmem_cache_destroy(xfs_rtrmapbt_cur_cache); + xfs_rtrmapbt_cur_cache = NULL; +} + +/* Compute the maximum height of an rt reverse mapping btree. */ +void +xfs_rtrmapbt_compute_maxlevels( + struct xfs_mount *mp) +{ + unsigned int d_maxlevels, r_maxlevels; + + if (!xfs_has_rtrmapbt(mp)) { + mp->m_rtrmap_maxlevels = 0; + return; + } + + /* + * The realtime rmapbt lives on the data device, which means that its + * maximum height is constrained by the size of the data device and + * the height required to store one rmap record for each block in an + * rt group. + */ + d_maxlevels = xfs_btree_space_to_height(mp->m_rtrmap_mnr, + mp->m_sb.sb_dblocks); + r_maxlevels = xfs_btree_compute_maxlevels(mp->m_rtrmap_mnr, + mp->m_sb.sb_rgblocks); + + /* Add one level to handle the inode root level. */ + mp->m_rtrmap_maxlevels = min(d_maxlevels, r_maxlevels) + 1; +} diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h new file mode 100644 index 000000000000..7380c04e7705 --- /dev/null +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +#ifndef __XFS_RTRMAP_BTREE_H__ +#define __XFS_RTRMAP_BTREE_H__ + +struct xfs_buf; +struct xfs_btree_cur; +struct xfs_mount; +struct xbtree_ifakeroot; +struct xfs_rtgroup; + +/* rmaps only exist on crc enabled filesystems */ +#define XFS_RTRMAP_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN + +struct xfs_btree_cur *xfs_rtrmapbt_init_cursor(struct xfs_mount *mp, + struct xfs_trans *tp, struct xfs_rtgroup *rtg, + struct xfs_inode *ip); +struct xfs_btree_cur *xfs_rtrmapbt_stage_cursor(struct xfs_mount *mp, + struct xfs_rtgroup *rtg, struct xfs_inode *ip, + struct xbtree_ifakeroot *ifake); +void xfs_rtrmapbt_commit_staged_btree(struct xfs_btree_cur *cur, + struct xfs_trans *tp); +unsigned int xfs_rtrmapbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, + bool leaf); +void xfs_rtrmapbt_compute_maxlevels(struct xfs_mount *mp); + +/* + * Addresses of records, keys, and pointers within an incore rtrmapbt block. + * + * (note that some of these may appear unused, but they are used in userspace) + */ +static inline struct xfs_rmap_rec * +xfs_rtrmap_rec_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_rmap_rec *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + (index - 1) * sizeof(struct xfs_rmap_rec)); +} + +static inline struct xfs_rmap_key * +xfs_rtrmap_key_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_rmap_key *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + (index - 1) * 2 * sizeof(struct xfs_rmap_key)); +} + +static inline struct xfs_rmap_key * +xfs_rtrmap_high_key_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_rmap_key *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + sizeof(struct xfs_rmap_key) + + (index - 1) * 2 * sizeof(struct xfs_rmap_key)); +} + +static inline xfs_rtrmap_ptr_t * +xfs_rtrmap_ptr_addr( + struct xfs_btree_block *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrmap_ptr_t *) + ((char *)block + 
XFS_RTRMAP_BLOCK_LEN + + maxrecs * 2 * sizeof(struct xfs_rmap_key) + + (index - 1) * sizeof(xfs_rtrmap_ptr_t)); +} + +unsigned int xfs_rtrmapbt_maxlevels_ondisk(void); + +int __init xfs_rtrmapbt_init_cur_cache(void); +void xfs_rtrmapbt_destroy_cur_cache(void); + +#endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 54f93e1b0f00..570919c223c9 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -27,6 +27,7 @@ #include "xfs_ag.h" #include "xfs_swapext.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" /* * Physical superblock buffer manipulations. Shared with libxfs in userspace. @@ -1064,6 +1065,11 @@ xfs_sb_mount_common( mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2; mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2; + mp->m_rtrmap_mxr[0] = xfs_rtrmapbt_maxrecs(mp, sbp->sb_blocksize, true); + mp->m_rtrmap_mxr[1] = xfs_rtrmapbt_maxrecs(mp, sbp->sb_blocksize, false); + mp->m_rtrmap_mnr[0] = mp->m_rtrmap_mxr[0] / 2; + mp->m_rtrmap_mnr[1] = mp->m_rtrmap_mxr[1] / 2; + mp->m_refc_mxr[0] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, true); mp->m_refc_mxr[1] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, false); mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2; diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h index 62839fc87b50..31c577a94295 100644 --- a/fs/xfs/libxfs/xfs_shared.h +++ b/fs/xfs/libxfs/xfs_shared.h @@ -42,6 +42,7 @@ extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops; extern const struct xfs_buf_ops xfs_rtsummary_buf_ops; extern const struct xfs_buf_ops xfs_rtbuf_ops; extern const struct xfs_buf_ops xfs_rtsb_buf_ops; +extern const struct xfs_buf_ops xfs_rtrmapbt_buf_ops; extern const struct xfs_buf_ops xfs_sb_buf_ops; extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops; extern const struct xfs_buf_ops xfs_symlink_buf_ops; @@ -54,6 +55,7 @@ extern const struct xfs_btree_ops xfs_finobt_ops; extern const struct xfs_btree_ops xfs_bmbt_ops; extern const struct xfs_btree_ops 
xfs_refcountbt_ops; extern const struct xfs_btree_ops xfs_rmapbt_ops; +extern const struct xfs_btree_ops xfs_rtrmapbt_ops; /* log size calculation functions */ int xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes); diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index d94d44f40be4..1d2403b93f58 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -36,6 +36,7 @@ #include "xfs_ag.h" #include "xfs_imeta.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" static DEFINE_MUTEX(xfs_uuid_table_mutex); static int xfs_uuid_table_size; @@ -654,8 +655,7 @@ static inline void xfs_rtbtree_compute_maxlevels( struct xfs_mount *mp) { - /* This will be filled in later. */ - mp->m_rtbtree_maxlevels = 0; + mp->m_rtbtree_maxlevels = mp->m_rtrmap_maxlevels; } /* @@ -727,6 +727,7 @@ xfs_mountfs( xfs_bmap_compute_maxlevels(mp, XFS_ATTR_FORK); xfs_mount_setup_inode_geom(mp); xfs_rmapbt_compute_maxlevels(mp); + xfs_rtrmapbt_compute_maxlevels(mp); xfs_refcountbt_compute_maxlevels(mp); xfs_agbtree_compute_maxlevels(mp); diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index 55e6e30f9045..a565b1b1372a 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -131,11 +131,14 @@ typedef struct xfs_mount { uint m_bmap_dmnr[2]; /* min bmap btree records */ uint m_rmap_mxr[2]; /* max rmap btree records */ uint m_rmap_mnr[2]; /* min rmap btree records */ + uint m_rtrmap_mxr[2]; /* max rtrmap btree records */ + uint m_rtrmap_mnr[2]; /* min rtrmap btree records */ uint m_refc_mxr[2]; /* max refc btree records */ uint m_refc_mnr[2]; /* min refc btree records */ uint m_alloc_maxlevels; /* max alloc btree levels */ uint m_bm_maxlevels[2]; /* max bmap btree levels */ uint m_rmap_maxlevels; /* max rmap btree levels */ + uint m_rtrmap_maxlevels; /* max rtrmap btree level */ uint m_refc_maxlevels; /* max refcount btree level */ unsigned int m_agbtree_maxlevels; /* max level of all AG btrees */ unsigned int m_rtbtree_maxlevels; /* max level of all rt btrees */ @@ -359,6 
+362,12 @@ __XFS_HAS_FEAT(large_extent_counts, NREXT64) __XFS_HAS_FEAT(metadir, METADIR) __XFS_HAS_FEAT(rtgroups, RTGROUPS) +static inline bool xfs_has_rtrmapbt(struct xfs_mount *mp) +{ + return xfs_has_rtgroups(mp) && xfs_has_realtime(mp) && + xfs_has_rmapbt(mp); +} + /* * Mount features * diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h index 17e541d35194..35d0695fbf57 100644 --- a/fs/xfs/xfs_ondisk.h +++ b/fs/xfs/xfs_ondisk.h @@ -77,6 +77,7 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(union xfs_rtword_ondisk, 4); XFS_CHECK_STRUCT_SIZE(union xfs_suminfo_ondisk, 4); XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo, 48); + XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t, 8); /* * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to ^ permalink raw reply related [flat|nested] 565+ messages in thread
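A note on the geometry math in the patch above: the per-block record limits and the tree height follow from a few integer divisions. The sketch below is a standalone userspace approximation, not the kernel code. The `sketch_` names are hypothetical, and the header, record, and key sizes (72, 24, and 20 bytes) are assumptions; only the 8-byte pointer size is confirmed by the `XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t, 8)` hunk. The height loop is assumed to match what `xfs_btree_compute_maxlevels` does with the min-records array.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed on-disk sizes; the patch itself only confirms the pointer size. */
#define SKETCH_LBLOCK_CRC_LEN	72u	/* long-format CRC btree block header */
#define SKETCH_RMAP_REC_LEN	24u	/* one rmap record */
#define SKETCH_RMAP_KEY_LEN	20u	/* one rmap key (low or high) */
#define SKETCH_RTRMAP_PTR_LEN	8u	/* xfs_rtrmap_ptr_t */

/* Max records per block: leaves hold records, nodes hold key pairs + pointers. */
static unsigned int
sketch_rtrmapbt_maxrecs(unsigned int blocksize, int leaf)
{
	unsigned int blocklen;

	assert(blocksize > SKETCH_LBLOCK_CRC_LEN);
	blocklen = blocksize - SKETCH_LBLOCK_CRC_LEN;
	if (leaf)
		return blocklen / SKETCH_RMAP_REC_LEN;
	/* Overlapping btrees store a low and a high key per pointer. */
	return blocklen / (2 * SKETCH_RMAP_KEY_LEN + SKETCH_RTRMAP_PTR_LEN);
}

/*
 * Worst-case height: assume every block is only half full (minrecs), then
 * count how many levels are needed until a single block covers the level.
 */
static unsigned int
sketch_btree_height(const unsigned int *minrecs, uint64_t records)
{
	uint64_t level_blocks = (records + minrecs[0] - 1) / minrecs[0];
	unsigned int height = 1;

	while (level_blocks > 1) {
		level_blocks = (level_blocks + minrecs[1] - 1) / minrecs[1];
		height++;
	}
	return height;
}
```

The patch then takes the smaller of the data-device-constrained and rtgroup-constrained heights and adds one level for the root that lives in the inode fork.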
* [PATCH 03/38] xfs: introduce realtime rmap btree definitions 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 04/38] xfs: define the on-disk realtime rmap btree format Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 02/38] xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions Darrick J. Wong ` (32 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add new realtime rmap btree definitions. The realtime rmap btree will be rooted from a hidden inode, but has its own shape and therefore needs to have most of its own separate types. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_btree.h | 1 + fs/xfs/libxfs/xfs_format.h | 7 +++++++ fs/xfs/libxfs/xfs_types.h | 5 +++-- fs/xfs/scrub/trace.h | 1 + fs/xfs/xfs_trace.h | 1 + 5 files changed, 13 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index 125f45731a54..ddaad83d4ff9 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -64,6 +64,7 @@ union xfs_btree_rec { #define XFS_BTNUM_RMAP ((xfs_btnum_t)XFS_BTNUM_RMAPi) #define XFS_BTNUM_REFC ((xfs_btnum_t)XFS_BTNUM_REFCi) #define XFS_BTNUM_RCBAG ((xfs_btnum_t)XFS_BTNUM_RCBAGi) +#define XFS_BTNUM_RTRMAP ((xfs_btnum_t)XFS_BTNUM_RTRMAPi) struct xfs_btree_ops; uint32_t xfs_btree_magic(struct xfs_mount *mp, const struct xfs_btree_ops *ops); diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index e4f3b2c5c054..b2d4ef28a480 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1727,6 +1727,13 @@ typedef __be32 xfs_rmap_ptr_t; XFS_FIBT_BLOCK(mp) + 1 : \ XFS_IBT_BLOCK(mp) + 1) +/* + * Realtime Reverse mapping btree format definitions + * + * This is a btree for 
reverse mapping records for realtime volumes + */ +#define XFS_RTRMAP_CRC_MAGIC 0x4d415052 /* 'MAPR' */ + /* * Reference Count Btree format definitions * diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h index d37f8a7ce5f8..e6a4f4a7d009 100644 --- a/fs/xfs/libxfs/xfs_types.h +++ b/fs/xfs/libxfs/xfs_types.h @@ -126,7 +126,7 @@ typedef enum { typedef enum { XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi, XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_REFCi, XFS_BTNUM_RCBAGi, - XFS_BTNUM_MAX + XFS_BTNUM_RTRMAPi, XFS_BTNUM_MAX } xfs_btnum_t; #define XFS_BTNUM_STRINGS \ @@ -137,7 +137,8 @@ typedef enum { { XFS_BTNUM_INOi, "inobt" }, \ { XFS_BTNUM_FINOi, "finobt" }, \ { XFS_BTNUM_REFCi, "refcbt" }, \ - { XFS_BTNUM_RCBAGi, "rcbagbt" } + { XFS_BTNUM_RCBAGi, "rcbagbt" }, \ + { XFS_BTNUM_RTRMAPi, "rtrmapbt" } struct xfs_name { const unsigned char *name; diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 9a51eb404fae..cf1635e00cb0 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -42,6 +42,7 @@ TRACE_DEFINE_ENUM(XFS_BTNUM_FINOi); TRACE_DEFINE_ENUM(XFS_BTNUM_RMAPi); TRACE_DEFINE_ENUM(XFS_BTNUM_REFCi); TRACE_DEFINE_ENUM(XFS_BTNUM_RCBAGi); +TRACE_DEFINE_ENUM(XFS_BTNUM_RTRMAPi); TRACE_DEFINE_ENUM(XFS_REFC_DOMAIN_SHARED); TRACE_DEFINE_ENUM(XFS_REFC_DOMAIN_COW); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 6bf7c2aa8e9d..390aa7a4afae 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -2555,6 +2555,7 @@ TRACE_DEFINE_ENUM(XFS_BTNUM_FINOi); TRACE_DEFINE_ENUM(XFS_BTNUM_RMAPi); TRACE_DEFINE_ENUM(XFS_BTNUM_REFCi); TRACE_DEFINE_ENUM(XFS_BTNUM_RCBAGi); +TRACE_DEFINE_ENUM(XFS_BTNUM_RTRMAPi); DECLARE_EVENT_CLASS(xfs_btree_cur_class, TP_PROTO(struct xfs_btree_cur *cur, int level, struct xfs_buf *bp), ^ permalink raw reply related [flat|nested] 565+ messages in thread
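As a quick sanity check on the new magic number in the patch above, the comment claims that 0x4d415052 spells 'MAPR'. The sketch below decodes the value most-significant byte first, which is how it would appear on disk after `cpu_to_be32()`. The `sketch_` names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

#define SKETCH_RTRMAP_CRC_MAGIC	0x4d415052u

/* Write the four magic bytes, most significant first, as a C string. */
static void
sketch_magic_to_str(uint32_t magic, char out[5])
{
	out[0] = (char)((magic >> 24) & 0xff);
	out[1] = (char)((magic >> 16) & 0xff);
	out[2] = (char)((magic >> 8) & 0xff);
	out[3] = (char)(magic & 0xff);
	out[4] = '\0';
}
```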
* [PATCH 02/38] xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 03/38] xfs: introduce realtime rmap btree definitions Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 07/38] xfs: prepare rmap functions to deal with rtrmapbt Darrick J. Wong ` (31 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Simplify the calling conventions by allowing callers to pass a fsbno (xfs_fsblock_t) directly into these functions, since we're just going to set it in a struct anyway. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_refcount.c | 6 ++---- fs/xfs/libxfs/xfs_rmap.c | 12 +++++------- fs/xfs/libxfs/xfs_rmap.h | 8 ++++---- fs/xfs/scrub/alloc_repair.c | 10 +++++++--- 4 files changed, 18 insertions(+), 18 deletions(-) diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 2721c6076712..20c12cb7b7de 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -1889,8 +1889,7 @@ xfs_refcount_alloc_cow_extent( __xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len); /* Add rmap entry */ - xfs_rmap_alloc_extent(tp, XFS_FSB_TO_AGNO(mp, fsb), - XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW); + xfs_rmap_alloc_extent(tp, fsb, len, XFS_RMAP_OWN_COW); } /* Forget a CoW staging event in the refcount btree. 
*/ @@ -1906,8 +1905,7 @@ xfs_refcount_free_cow_extent( return; /* Remove rmap entry */ - xfs_rmap_free_extent(tp, XFS_FSB_TO_AGNO(mp, fsb), - XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW); + xfs_rmap_free_extent(tp, fsb, len, XFS_RMAP_OWN_COW); __xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len); } diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 9ad3e5077f34..a2a863e0c7fb 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -526,7 +526,7 @@ xfs_rmap_free_check_owner( struct xfs_btree_cur *cur, uint64_t ltoff, struct xfs_rmap_irec *rec, - xfs_filblks_t len, + xfs_extlen_t len, uint64_t owner, uint64_t offset, unsigned int flags) @@ -2745,8 +2745,7 @@ xfs_rmap_convert_extent( void xfs_rmap_alloc_extent( struct xfs_trans *tp, - xfs_agnumber_t agno, - xfs_agblock_t bno, + xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner) { @@ -2755,7 +2754,7 @@ xfs_rmap_alloc_extent( if (!xfs_rmap_update_is_needed(tp->t_mountp, XFS_DATA_FORK)) return; - bmap.br_startblock = XFS_AGB_TO_FSB(tp->t_mountp, agno, bno); + bmap.br_startblock = fsbno; bmap.br_blockcount = len; bmap.br_startoff = 0; bmap.br_state = XFS_EXT_NORM; @@ -2767,8 +2766,7 @@ xfs_rmap_alloc_extent( void xfs_rmap_free_extent( struct xfs_trans *tp, - xfs_agnumber_t agno, - xfs_agblock_t bno, + xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner) { @@ -2777,7 +2775,7 @@ xfs_rmap_free_extent( if (!xfs_rmap_update_is_needed(tp->t_mountp, XFS_DATA_FORK)) return; - bmap.br_startblock = XFS_AGB_TO_FSB(tp->t_mountp, agno, bno); + bmap.br_startblock = fsbno; bmap.br_blockcount = len; bmap.br_startoff = 0; bmap.br_state = XFS_EXT_NORM; diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h index 36af4de506c7..54c969731cf4 100644 --- a/fs/xfs/libxfs/xfs_rmap.h +++ b/fs/xfs/libxfs/xfs_rmap.h @@ -187,10 +187,10 @@ void xfs_rmap_unmap_extent(struct xfs_trans *tp, struct xfs_inode *ip, void xfs_rmap_convert_extent(struct xfs_mount *mp, struct xfs_trans *tp, struct 
xfs_inode *ip, int whichfork, struct xfs_bmbt_irec *imap); -void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_agnumber_t agno, - xfs_agblock_t bno, xfs_extlen_t len, uint64_t owner); -void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_agnumber_t agno, - xfs_agblock_t bno, xfs_extlen_t len, uint64_t owner); +void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno, + xfs_extlen_t len, uint64_t owner); +void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno, + xfs_extlen_t len, uint64_t owner); void xfs_rmap_finish_one_cleanup(struct xfs_trans *tp, struct xfs_btree_cur *rcur, int error); diff --git a/fs/xfs/scrub/alloc_repair.c b/fs/xfs/scrub/alloc_repair.c index 1e06ffe26029..6506fc202571 100644 --- a/fs/xfs/scrub/alloc_repair.c +++ b/fs/xfs/scrub/alloc_repair.c @@ -524,9 +524,13 @@ xrep_abt_dispose_one( ASSERT(pag == resv->pag); /* Add a deferred rmap for each extent we used. */ - if (resv->used > 0) - xfs_rmap_alloc_extent(sc->tp, pag->pag_agno, resv->agbno, - resv->used, XFS_RMAP_OWN_AG); + if (resv->used > 0) { + xfs_fsblock_t fsbno; + + fsbno = XFS_AGB_TO_FSB(sc->mp, pag->pag_agno, resv->agbno); + xfs_rmap_alloc_extent(sc->tp, fsbno, resv->used, + XFS_RMAP_OWN_AG); + } /* * For each reserved btree block we didn't use, add it to the free ^ permalink raw reply related [flat|nested] 565+ messages in thread
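To illustrate why the calling-convention change above should be behavior-neutral: the old callers split an fsblock into (agno, agbno) only for the callee to glue the pair back together. The sketch below models that packing in userspace, assuming the usual XFS layout where the AG number sits above `sb_agblklog` bits of AG block number. The `sketch_` names and the `agblklog` parameter are stand-ins, not kernel symbols.

```c
#include <stdint.h>

typedef uint64_t sketch_fsblock_t;

/* Pack an AG number and AG-relative block into one filesystem block number. */
static sketch_fsblock_t
sketch_agb_to_fsb(unsigned int agblklog, uint32_t agno, uint32_t agbno)
{
	return ((sketch_fsblock_t)agno << agblklog) | agbno;
}

/* Extract the AG number (the high bits). */
static uint32_t
sketch_fsb_to_agno(unsigned int agblklog, sketch_fsblock_t fsb)
{
	return (uint32_t)(fsb >> agblklog);
}

/* Extract the AG-relative block number (the low agblklog bits). */
static uint32_t
sketch_fsb_to_agbno(unsigned int agblklog, sketch_fsblock_t fsb)
{
	return (uint32_t)(fsb & (((sketch_fsblock_t)1 << agblklog) - 1));
}
```

Since split-then-pack is an identity as long as agbno fits in `agblklog` bits, handing the fsblock straight through loses nothing, and a caller that only has the pair (such as `xrep_abt_dispose_one`) does the packing itself.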
* [PATCH 07/38] xfs: prepare rmap functions to deal with rtrmapbt 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 02/38] xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 10/38] xfs: add realtime rmap btree block detection to log recovery Darrick J. Wong ` (30 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Prepare the high-level rmap functions to deal with the new realtime rmapbt and its slightly different conventions. Provide the ability to talk to either rmapbt or rtrmapbt formats from the same high level code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rmap.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index a2a863e0c7fb..31194cc14c0b 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -24,6 +24,7 @@ #include "xfs_inode.h" #include "xfs_ag.h" #include "xfs_health.h" +#include "xfs_rtgroup.h" struct kmem_cache *xfs_rmap_intent_cache; @@ -262,12 +263,73 @@ xfs_rmap_check_perag_irec( return NULL; } +static inline xfs_failaddr_t +xfs_rmap_check_rtgroup_irec( + struct xfs_rtgroup *rtg, + const struct xfs_rmap_irec *irec) +{ + struct xfs_mount *mp = rtg->rtg_mount; + bool is_inode; + bool is_unwritten; + bool is_bmbt; + bool is_attr; + + if (irec->rm_blockcount == 0) + return __this_address; + + if (irec->rm_owner == XFS_RMAP_OWN_FS) { + if (irec->rm_startblock != 0) + return __this_address; + if (irec->rm_blockcount != mp->m_sb.sb_rextsize) + return __this_address; + if (irec->rm_offset != 0) + return __this_address; + } else { + if (!xfs_verify_rgbext(rtg, 
irec->rm_startblock, + irec->rm_blockcount)) + return __this_address; + } + + if (!(xfs_verify_ino(mp, irec->rm_owner) || + (irec->rm_owner <= XFS_RMAP_OWN_FS && + irec->rm_owner >= XFS_RMAP_OWN_MIN))) + return __this_address; + + /* Check flags. */ + is_inode = !XFS_RMAP_NON_INODE_OWNER(irec->rm_owner); + is_bmbt = irec->rm_flags & XFS_RMAP_BMBT_BLOCK; + is_attr = irec->rm_flags & XFS_RMAP_ATTR_FORK; + is_unwritten = irec->rm_flags & XFS_RMAP_UNWRITTEN; + + if (!is_inode && irec->rm_owner != XFS_RMAP_OWN_FS) + return __this_address; + + if (!is_inode && irec->rm_offset != 0) + return __this_address; + + if (is_bmbt || is_attr) + return __this_address; + + if (is_unwritten && !is_inode) + return __this_address; + + /* Check for a valid fork offset, if applicable. */ + if (is_inode && + !xfs_verify_fileext(mp, irec->rm_offset, irec->rm_blockcount)) + return __this_address; + + return NULL; +} + /* Simple checks for rmap records. */ xfs_failaddr_t xfs_rmap_check_irec( struct xfs_btree_cur *cur, const struct xfs_rmap_irec *irec) { + if (cur->bc_btnum == XFS_BTNUM_RTRMAP) + return xfs_rmap_check_rtgroup_irec(cur->bc_ino.rtg, irec); + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) return xfs_rmap_check_perag_irec(cur->bc_mem.pag, irec); return xfs_rmap_check_perag_irec(cur->bc_ag.pag, irec); @@ -284,6 +346,10 @@ xfs_rmap_complain_bad_rec( if (cur->bc_flags & XFS_BTREE_IN_MEMORY) xfs_warn(mp, "In-Memory Reverse Mapping BTree record corruption detected at %pS!", fa); + else if (cur->bc_btnum == XFS_BTNUM_RTRMAP) + xfs_warn(mp, + "RT Reverse Mapping BTree record corruption in rtgroup %u detected at %pS!", + cur->bc_ino.rtg->rtg_rgno, fa); else xfs_warn(mp, "Reverse Mapping BTree record corruption in AG %d detected at %pS!", ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 10/38] xfs: add realtime rmap btree block detection to log recovery 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 07/38] xfs: prepare rmap functions to deal with rtrmapbt Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 12/38] xfs: add realtime reverse map inode to metadata directory Darrick J. Wong ` (29 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Identify rtrmapbt blocks in the log correctly so that we can validate them during log recovery. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_buf_item_recover.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c index b74d40f5beb1..496260c9d8cd 100644 --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -259,6 +259,9 @@ xlog_recover_validate_buf_type( case XFS_BMAP_MAGIC: bp->b_ops = &xfs_bmbt_buf_ops; break; + case XFS_RTRMAP_CRC_MAGIC: + bp->b_ops = &xfs_rtrmapbt_buf_ops; + break; case XFS_RMAP_CRC_MAGIC: bp->b_ops = &xfs_rmapbt_buf_ops; break; @@ -768,6 +771,7 @@ xlog_recover_get_buf_lsn( uuid = &btb->bb_u.s.bb_uuid; break; } + case XFS_RTRMAP_CRC_MAGIC: case XFS_BMAP_CRC_MAGIC: case XFS_BMAP_MAGIC: { struct xfs_btree_block *btb = blk; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 12/38] xfs: add realtime reverse map inode to metadata directory 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 10/38] xfs: add realtime rmap btree block detection to log recovery Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 13/38] xfs: add metadata reservations for realtime rmap btrees Darrick J. Wong ` (28 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a metadir path to select the realtime rmap btree inode and load it at mount time. The rtrmapbt inode will have a unique extent format code, which means that we also have to update the inode validation and flush routines to look for it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 6 ++- fs/xfs/libxfs/xfs_inode_buf.c | 6 +++ fs/xfs/libxfs/xfs_inode_fork.c | 9 ++++ fs/xfs/libxfs/xfs_rtgroup.h | 3 + fs/xfs/libxfs/xfs_rtrmap_btree.c | 33 ++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 4 ++ fs/xfs/xfs_inode.c | 19 +++++++++ fs/xfs/xfs_inode_item.c | 2 + fs/xfs/xfs_inode_item_recover.c | 1 fs/xfs/xfs_rtalloc.c | 79 ++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_trace.h | 1 11 files changed, 159 insertions(+), 4 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index fb727e1e4072..babe5d3fabb1 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1009,7 +1009,8 @@ enum xfs_dinode_fmt { XFS_DINODE_FMT_LOCAL, /* bulk data */ XFS_DINODE_FMT_EXTENTS, /* struct xfs_bmbt_rec */ XFS_DINODE_FMT_BTREE, /* struct xfs_bmdr_block */ - XFS_DINODE_FMT_UUID /* added long ago, but never used */ + XFS_DINODE_FMT_UUID, /* added long ago, but never used */ + XFS_DINODE_FMT_RMAP, /* reverse mapping btree */ }; #define XFS_INODE_FORMAT_STR \ @@ 
-1017,7 +1018,8 @@ enum xfs_dinode_fmt { { XFS_DINODE_FMT_LOCAL, "local" }, \ { XFS_DINODE_FMT_EXTENTS, "extent" }, \ { XFS_DINODE_FMT_BTREE, "btree" }, \ - { XFS_DINODE_FMT_UUID, "uuid" } + { XFS_DINODE_FMT_UUID, "uuid" }, \ + { XFS_DINODE_FMT_RMAP, "rmap" } /* * Max values for extnum and aextnum. diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 1fb11d0e7eba..9ac84be391b3 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -408,6 +408,12 @@ xfs_dinode_verify_fork( if (di_nextents > max_extents) return __this_address; break; + case XFS_DINODE_FMT_RMAP: + if (!xfs_has_rtrmapbt(mp)) + return __this_address; + if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))) + return __this_address; + break; default: return __this_address; } diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index b844bfd94e9c..899428f96b94 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -259,6 +259,11 @@ xfs_iformat_data_fork( return xfs_iformat_extents(ip, dip, XFS_DATA_FORK); case XFS_DINODE_FMT_BTREE: return xfs_iformat_btree(ip, dip, XFS_DATA_FORK); + case XFS_DINODE_FMT_RMAP: + if (!xfs_has_rtrmapbt(ip->i_mount)) + return -EFSCORRUPTED; + ASSERT(0); /* to be implemented later */ + return -EFSCORRUPTED; default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, sizeof(*dip), __this_address); @@ -639,6 +644,10 @@ xfs_iflush_fork( } break; + case XFS_DINODE_FMT_RMAP: + ASSERT(0); /* to be implemented later */ + break; + default: ASSERT(0); break; diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index 3c9572677f79..1792a9ab3bbf 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -20,6 +20,9 @@ struct xfs_rtgroup { /* for rcu-safe freeing */ struct rcu_head rcu_head; + /* reverse mapping btree inode */ + struct xfs_inode *rtg_rmapip; + /* Number of blocks in this group */ xfs_rgblock_t rtg_blockcount; diff --git 
a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 551d575713db..754812eaff87 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -18,6 +18,7 @@ #include "xfs_alloc.h" #include "xfs_btree.h" #include "xfs_btree_staging.h" +#include "xfs_imeta.h" #include "xfs_rmap.h" #include "xfs_rtrmap_btree.h" #include "xfs_trace.h" @@ -475,6 +476,7 @@ xfs_rtrmapbt_commit_staged_btree( int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + ASSERT(ifake->if_fork->if_format == XFS_DINODE_FMT_RMAP); /* * Free any resources hanging off the real fork, then shallow-copy the @@ -575,3 +577,34 @@ xfs_rtrmapbt_compute_maxlevels( /* Add one level to handle the inode root level. */ mp->m_rtrmap_maxlevels = min(d_maxlevels, r_maxlevels) + 1; } + +#define XFS_RTRMAP_NAMELEN 17 + +/* Create the metadata directory path for an rtrmap btree inode. */ +int +xfs_rtrmapbt_create_path( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + struct xfs_imeta_path **pathp) +{ + struct xfs_imeta_path *path; + char *fname; + int error; + + error = xfs_imeta_create_file_path(mp, 2, &path); + if (error) + return error; + + fname = kmalloc(XFS_RTRMAP_NAMELEN, GFP_KERNEL); + if (!fname) { + xfs_imeta_free_path(path); + return -ENOMEM; + } + + snprintf(fname, XFS_RTRMAP_NAMELEN, "%u.rmap", rgno); + path->im_path[0] = "realtime"; + path->im_path[1] = fname; + path->im_dynamicmask = 0x2; + *pathp = path; + return 0; +} diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index 7380c04e7705..26e2445f5d6c 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -11,6 +11,7 @@ struct xfs_btree_cur; struct xfs_mount; struct xbtree_ifakeroot; struct xfs_rtgroup; +struct xfs_imeta_path; /* rmaps only exist on crc enabled filesystems */ #define XFS_RTRMAP_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN @@ -80,4 +81,7 @@ unsigned int xfs_rtrmapbt_maxlevels_ondisk(void); int __init 
xfs_rtrmapbt_init_cur_cache(void); void xfs_rtrmapbt_destroy_cur_cache(void); +int xfs_rtrmapbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, + struct xfs_imeta_path **pathp); + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index ab805df9db16..3b0c04b6bcdf 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2572,7 +2572,15 @@ xfs_iflush( __func__, ip->i_ino, be16_to_cpu(dip->di_magic), dip); goto flush_out; } - if (S_ISREG(VFS_I(ip)->i_mode)) { + if (ip->i_df.if_format == XFS_DINODE_FMT_RMAP) { + if (!S_ISREG(VFS_I(ip)->i_mode) || + !(ip->i_diflags2 & XFS_DIFLAG2_METADATA)) { + xfs_alert_tag(mp, XFS_PTAG_IFLUSH, + "%s: Bad rt rmapbt inode %Lu, ptr "PTR_FMT, + __func__, ip->i_ino, ip); + goto flush_out; + } + } else if (S_ISREG(VFS_I(ip)->i_mode)) { if (XFS_TEST_ERROR( ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS && ip->i_df.if_format != XFS_DINODE_FMT_BTREE, @@ -2612,6 +2620,15 @@ xfs_iflush( goto flush_out; } + if (xfs_inode_has_attr_fork(ip)) { + if (ip->i_af.if_format == XFS_DINODE_FMT_RMAP) { + xfs_alert_tag(mp, XFS_PTAG_IFLUSH, + "%s: rt rmapbt in inode %Lu attr fork, ptr "PTR_FMT, + __func__, ip->i_ino, ip); + goto flush_out; + } + } + /* * Inode item log recovery for v2 inodes are dependent on the flushiter * count for correct sequencing. 
We bump the flush iteration count so diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index ca2941ab6cbc..b6e374744474 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -62,6 +62,7 @@ xfs_inode_item_data_fork_size( } break; case XFS_DINODE_FMT_BTREE: + case XFS_DINODE_FMT_RMAP: if ((iip->ili_fields & XFS_ILOG_DBROOT) && ip->i_df.if_broot_bytes > 0) { *nbytes += ip->i_df.if_broot_bytes; @@ -182,6 +183,7 @@ xfs_inode_item_format_data_fork( } break; case XFS_DINODE_FMT_BTREE: + case XFS_DINODE_FMT_RMAP: iip->ili_fields &= ~(XFS_ILOG_DDATA | XFS_ILOG_DEXT | XFS_ILOG_DEV); diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index 0e5dba2343ea..3453a204d196 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -390,6 +390,7 @@ xlog_recover_inode_commit_pass2( if (unlikely(S_ISREG(ldip->di_mode))) { if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) && + (ldip->di_format != XFS_DINODE_FMT_RMAP) && (ldip->di_format != XFS_DINODE_FMT_BTREE)) { XFS_CORRUPTION_ERROR( "Bad log dinode data fork format for regular file", diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 82b729a86740..ba330265ab8a 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -27,6 +27,8 @@ #include "xfs_rtbitmap.h" #include "xfs_rtgroup.h" #include "xfs_quota.h" +#include "xfs_error.h" +#include "xfs_rtrmap_btree.h" /* * Realtime metadata files are not quite regular files because userspace can't @@ -37,6 +39,7 @@ */ static struct lock_class_key xfs_rbmip_key; static struct lock_class_key xfs_rsumip_key; +static struct lock_class_key xfs_rrmapip_key; /* * Read and return the summary information for a given extent size, @@ -1600,6 +1603,47 @@ __xfs_rt_iget( #define xfs_rt_iget(mp, ino, lockdep_key, ipp) \ __xfs_rt_iget((mp), (ino), (lockdep_key), #lockdep_key, (ipp)) +/* Load realtime rmap btree inode. 
*/ +STATIC int +xfs_rtmount_rmapbt( + struct xfs_rtgroup *rtg) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_imeta_path *path; + struct xfs_inode *ip; + xfs_ino_t ino; + int error; + + if (!xfs_has_rtrmapbt(mp)) + return 0; + + error = xfs_rtrmapbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + return error; + + error = xfs_imeta_lookup(mp, path, &ino); + if (error) + goto out_path; + + error = xfs_rt_iget(mp, ino, &xfs_rrmapip_key, &ip); + if (error) + goto out_path; + + if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_RMAP)) { + error = -EFSCORRUPTED; + goto out_rele; + } + + rtg->rtg_rmapip = ip; + ip = NULL; +out_rele: + if (ip) + xfs_imeta_irele(ip); +out_path: + xfs_imeta_free_path(path); + return error; +} + /* * Read in the bmbt of an rt metadata inode so that we never have to load them * at runtime. This enables the use of shared ILOCKs for rtbitmap scans. Use @@ -1638,7 +1682,7 @@ xfs_rtmount_iread_extents( * Get the bitmap and summary inodes and the summary cache into the mount * structure at mount time. 
*/ -int /* error */ +int xfs_rtmount_inodes( struct xfs_mount *mp) /* file system mount structure */ { @@ -1675,11 +1719,23 @@ xfs_rtmount_inodes( for_each_rtgroup(mp, rgno, rtg) { rtg->rtg_blockcount = xfs_rtgroup_block_count(mp, rtg->rtg_rgno); + + error = xfs_rtmount_rmapbt(rtg); + if (error) { + xfs_rtgroup_put(rtg); + goto out_rele_rtgroup; + } } xfs_alloc_rsum_cache(mp, sbp->sb_rbmblocks); return 0; +out_rele_rtgroup: + for_each_rtgroup(mp, rgno, rtg) { + if (rtg->rtg_rmapip) + xfs_imeta_irele(rtg->rtg_rmapip); + rtg->rtg_rmapip = NULL; + } out_rele_summary: xfs_imeta_irele(mp->m_rsumip); out_rele_bitmap: @@ -1692,6 +1748,8 @@ int xfs_rtmount_dqattach( struct xfs_mount *mp) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; int error; error = xfs_qm_dqattach(mp->m_rbmip); @@ -1702,6 +1760,16 @@ xfs_rtmount_dqattach( if (error) return error; + for_each_rtgroup(mp, rgno, rtg) { + if (rtg->rtg_rmapip) { + error = xfs_qm_dqattach(rtg->rtg_rmapip); + if (error) { + xfs_rtgroup_put(rtg); + return error; + } + } + } + return 0; } @@ -1709,7 +1777,16 @@ void xfs_rtunmount_inodes( struct xfs_mount *mp) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + kmem_free(mp->m_rsum_cache); + + for_each_rtgroup(mp, rgno, rtg) { + if (rtg->rtg_rmapip) + xfs_imeta_irele(rtg->rtg_rmapip); + rtg->rtg_rmapip = NULL; + } if (mp->m_rbmip) xfs_imeta_irele(mp->m_rbmip); if (mp->m_rsumip) diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index c02a58cbf15b..77f4acc1b923 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -2235,6 +2235,7 @@ TRACE_DEFINE_ENUM(XFS_DINODE_FMT_LOCAL); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_EXTENTS); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_BTREE); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_UUID); +TRACE_DEFINE_ENUM(XFS_DINODE_FMT_RMAP); DECLARE_EVENT_CLASS(xfs_swap_extent_class, TP_PROTO(struct xfs_inode *ip, int which), ^ permalink raw reply related [flat|nested] 565+ messages in thread
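The inode hand-off in xfs_rtmount_rmapbt() above (store the inode in rtg->rtg_rmapip, then clear the local pointer so that the shared out_rele label skips the release) can be sketched in userspace as follows. This is only an illustration of the cleanup pattern: every demo_* name is hypothetical, malloc/free stand in for the inode grab and release, and -EINVAL stands in for the kernel's -EFSCORRUPTED.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the inode and the per-rtgroup structure. */
struct demo_inode { int format; };
struct demo_group { struct demo_inode *rmapip; };

#define DEMO_FMT_RMAP	5	/* arbitrary value, not the ondisk constant */

static int demo_live_inodes;	/* counts grabbed-but-unreleased inodes */

static struct demo_inode *demo_iget(int format)
{
	struct demo_inode *ip = malloc(sizeof(*ip));

	if (!ip)
		return NULL;
	ip->format = format;
	demo_live_inodes++;
	return ip;
}

static void demo_irele(struct demo_inode *ip)
{
	free(ip);
	demo_live_inodes--;
}

/* Grab an inode, check its fork format, and hand it to the group. */
static int demo_load_rmap_inode(struct demo_group *grp, int ondisk_format)
{
	struct demo_inode *ip;
	int error = 0;

	assert(grp != NULL);

	ip = demo_iget(ondisk_format);
	if (!ip)
		return -ENOMEM;

	if (ip->format != DEMO_FMT_RMAP) {
		error = -EINVAL;	/* the patch returns -EFSCORRUPTED */
		goto out_rele;
	}

	/* Ownership moves to the group; NULL the local so we don't drop it. */
	grp->rmapip = ip;
	ip = NULL;
out_rele:
	if (ip)
		demo_irele(ip);
	return error;
}
```

With this shape the success path and the error path share one exit label, and the release only happens if the local pointer still owns the inode.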
* [PATCH 13/38] xfs: add metadata reservations for realtime rmap btrees
  2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong
                   ` (8 preceding siblings ...)
  2022-12-30 22:18 ` [PATCH 12/38] xfs: add realtime reverse map inode to metadata directory Darrick J. Wong
@ 2022-12-30 22:18 ` Darrick J. Wong
  2022-12-30 22:18 ` [PATCH 09/38] xfs: support recovering rmap intent items targetting realtime extents Darrick J. Wong
                   ` (27 subsequent siblings)
  37 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Reserve some free blocks so that we will always have enough free blocks
in the data volume to handle expansion of the realtime rmap btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rtrmap_btree.c |   39 ++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rtrmap_btree.h |    2 ++
 fs/xfs/xfs_rtalloc.c             |   21 +++++++++++++++++++-
 3 files changed, 61 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c
index 754812eaff87..c90017408574 100644
--- a/fs/xfs/libxfs/xfs_rtrmap_btree.c
+++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c
@@ -608,3 +608,42 @@ xfs_rtrmapbt_create_path(
 	*pathp = path;
 	return 0;
 }
+
+/* Calculate the rtrmap btree size for some records. */
+static unsigned long long
+xfs_rtrmapbt_calc_size(
+	struct xfs_mount	*mp,
+	unsigned long long	len)
+{
+	return xfs_btree_calc_size(mp->m_rtrmap_mnr, len);
+}
+
+/*
+ * Calculate the maximum rmap btree size.
+ */
+static unsigned long long
+xfs_rtrmapbt_max_size(
+	struct xfs_mount	*mp,
+	xfs_rtblock_t		rtblocks)
+{
+	/* Bail out if we're uninitialized, which can happen in mkfs. */
+	if (mp->m_rtrmap_mxr[0] == 0)
+		return 0;
+
+	return xfs_rtrmapbt_calc_size(mp, rtblocks);
+}
+
+/*
+ * Figure out how many blocks to reserve and how many are used by this btree.
+ */
+xfs_filblks_t
+xfs_rtrmapbt_calc_reserves(
+	struct xfs_mount	*mp)
+{
+	if (!xfs_has_rtrmapbt(mp))
+		return 0;
+
+	/* 1/64th (~1.5%) of the space, and enough for 1 record per block. */
+	return max_t(xfs_filblks_t, mp->m_sb.sb_rgblocks >> 6,
+			xfs_rtrmapbt_max_size(mp, mp->m_sb.sb_rgblocks));
+}
diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h
index 26e2445f5d6c..63e667d0d76d 100644
--- a/fs/xfs/libxfs/xfs_rtrmap_btree.h
+++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h
@@ -84,4 +84,6 @@ void xfs_rtrmapbt_destroy_cur_cache(void);
 int xfs_rtrmapbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno,
 		struct xfs_imeta_path **pathp);
 
+xfs_filblks_t xfs_rtrmapbt_calc_reserves(struct xfs_mount *mp);
+
 #endif	/* __XFS_RTRMAP_BTREE_H__ */
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index ba330265ab8a..c3d27cb85c26 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1571,6 +1571,11 @@ void
 xfs_rt_resv_free(
 	struct xfs_mount	*mp)
 {
+	struct xfs_rtgroup	*rtg;
+	xfs_rgnumber_t		rgno;
+
+	for_each_rtgroup(mp, rgno, rtg)
+		xfs_imeta_resv_free_inode(rtg->rtg_rmapip);
 }
 
 /* Reserve space for rt metadata inodes' space expansion. */
@@ -1578,7 +1583,21 @@ int
 xfs_rt_resv_init(
 	struct xfs_mount	*mp)
 {
-	return 0;
+	struct xfs_rtgroup	*rtg;
+	xfs_filblks_t		ask;
+	xfs_rgnumber_t		rgno;
+	int			error = 0;
+
+	for_each_rtgroup(mp, rgno, rtg) {
+		int		err2;
+
+		ask = xfs_rtrmapbt_calc_reserves(mp);
+		err2 = xfs_imeta_resv_init_inode(rtg->rtg_rmapip, ask);
+		if (err2 && !error)
+			error = err2;
+	}
+
+	return error;
 }
 
 static inline int

^ permalink raw reply related	[flat|nested] 565+ messages in thread
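The reservation rule in xfs_rtrmapbt_calc_reserves() above takes the larger of 1/64th of the rtgroup and the size of a btree holding one record per rt block. A rough userspace sketch of that arithmetic follows. The demo_* names are hypothetical, the per-level loop is modelled on my understanding of xfs_btree_calc_size() rather than copied from it, and the minrecs values used with it are made up, so treat the numbers as illustrative only.

```c
#include <assert.h>

/* Round-up division. */
static unsigned long long
demo_howmany(unsigned long long n, unsigned long long d)
{
	return (n + d - 1) / d;
}

/*
 * Blocks needed for a btree of @records records when every block is only
 * minimally full.  limits[0] is the leaf minrecs, limits[1] the node
 * minrecs (both assumed here, not read from a real mount).
 */
static unsigned long long
demo_btree_calc_size(const unsigned int *limits, unsigned long long records)
{
	unsigned long long level_blocks;
	unsigned long long blocks;

	/* A node fanout below 2 would never converge to a single root. */
	assert(limits[0] >= 1 && limits[1] >= 2);

	level_blocks = demo_howmany(records, limits[0]);
	blocks = level_blocks;
	while (level_blocks > 1) {
		level_blocks = demo_howmany(level_blocks, limits[1]);
		blocks += level_blocks;
	}
	return blocks;
}

/* max(rgblocks / 64, worst-case btree size for one record per block) */
static unsigned long long
demo_rtrmapbt_calc_reserves(const unsigned int *minrecs,
			    unsigned long long rgblocks)
{
	unsigned long long floor = rgblocks >> 6;
	unsigned long long worst = demo_btree_calc_size(minrecs, rgblocks);

	return floor > worst ? floor : worst;
}
```

With small minrecs the worst-case tree dominates; with large minrecs the 1/64th floor wins, which is presumably why the patch takes the maximum of the two.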
* [PATCH 09/38] xfs: support recovering rmap intent items targetting realtime extents
  2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong
                   ` (9 preceding siblings ...)
  2022-12-30 22:18 ` [PATCH 13/38] xfs: add metadata reservations for realtime rmap btrees Darrick J. Wong
@ 2022-12-30 22:18 ` Darrick J. Wong
  2022-12-30 22:18 ` [PATCH 11/38] xfs: attach dquots to rt metadata files when starting quota Darrick J. Wong
                   ` (26 subsequent siblings)
  37 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that we have rmap on the realtime device, log recovery has to
support remapping extents on the realtime volume.  Make this work.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_rmap_item.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c
index 5f04f55f5caa..a2949f818e0c 100644
--- a/fs/xfs/xfs_rmap_item.c
+++ b/fs/xfs/xfs_rmap_item.c
@@ -507,6 +507,9 @@ xfs_rui_validate_map(
 	if (!xfs_verify_fileext(mp, map->me_startoff, map->me_len))
 		return false;
 
+	if (map->me_flags & XFS_RMAP_EXTENT_REALTIME)
+		return xfs_verify_rtbext(mp, map->me_startblock, map->me_len);
+
 	return xfs_verify_fsbext(mp, map->me_startblock, map->me_len);
 }
 

^ permalink raw reply related	[flat|nested] 565+ messages in thread
* [PATCH 11/38] xfs: attach dquots to rt metadata files when starting quota 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (10 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 09/38] xfs: support recovering rmap intent items targetting realtime extents Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 08/38] xfs: add a realtime flag to the rmap update log redo items Darrick J. Wong ` (25 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Attach dquots to the realtime metadata files when starting up quotas, since the resources used by them are charged to the root dquot. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_mount.c | 4 +++- fs/xfs/xfs_qm.c | 20 +++++++++++++++++--- fs/xfs/xfs_qm_bhv.c | 2 +- fs/xfs/xfs_quota.h | 4 ++-- fs/xfs/xfs_rtalloc.c | 19 +++++++++++++++++++ fs/xfs/xfs_rtalloc.h | 3 +++ 6 files changed, 45 insertions(+), 7 deletions(-) diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 1d2403b93f58..2e64f18deabf 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -1007,7 +1007,9 @@ xfs_mountfs( ASSERT(mp->m_qflags == 0); mp->m_qflags = quotaflags; - xfs_qm_mount_quotas(mp); + error = xfs_qm_mount_quotas(mp); + if (error) + goto out_rtunmount; } /* diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 905765eedcb0..63085d8b5ec1 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -29,6 +29,7 @@ #include "xfs_health.h" #include "xfs_imeta.h" #include "xfs_da_format.h" +#include "xfs_rtalloc.h" /* * The global quota manager. There is only one of these for the entire @@ -1486,7 +1487,7 @@ xfs_qm_quotacheck( * If we fail here, the mount will continue with quota turned off. We don't * need to inidicate success or failure at all. 
*/ -void +int xfs_qm_mount_quotas( struct xfs_mount *mp) { @@ -1525,7 +1526,7 @@ xfs_qm_mount_quotas( error = xfs_qm_quotacheck(mp); if (error) { /* Quotacheck failed and disabled quotas. */ - return; + return 0; } } /* @@ -1566,8 +1567,21 @@ xfs_qm_mount_quotas( if (error) { xfs_warn(mp, "Failed to initialize disk quotas."); - return; + return 0; } + + /* + * Attach dquots to realtime metadata files before we do anything that + * could alter the resource usage of rt metadata (log recovery, normal + * operation, etc). + */ + error = xfs_rtmount_dqattach(mp); + if (error) { + xfs_qm_unmount_quotas(mp); + return error; + } + + return 0; } /* diff --git a/fs/xfs/xfs_qm_bhv.c b/fs/xfs/xfs_qm_bhv.c index 271c1021c733..df569a839d3f 100644 --- a/fs/xfs/xfs_qm_bhv.c +++ b/fs/xfs/xfs_qm_bhv.c @@ -119,7 +119,7 @@ xfs_qm_newmount( * mounting, and get on with the boring life * without disk quotas. */ - xfs_qm_mount_quotas(mp); + return xfs_qm_mount_quotas(mp); } else { /* * Clear the quota flags, but remember them. 
This diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h index fe63489d91b2..0cb52d5be4aa 100644 --- a/fs/xfs/xfs_quota.h +++ b/fs/xfs/xfs_quota.h @@ -120,7 +120,7 @@ extern void xfs_qm_dqdetach(struct xfs_inode *); extern void xfs_qm_dqrele(struct xfs_dquot *); extern void xfs_qm_statvfs(struct xfs_inode *, struct kstatfs *); extern int xfs_qm_newmount(struct xfs_mount *, uint *, uint *); -extern void xfs_qm_mount_quotas(struct xfs_mount *); +int xfs_qm_mount_quotas(struct xfs_mount *mp); extern void xfs_qm_unmount(struct xfs_mount *); extern void xfs_qm_unmount_quotas(struct xfs_mount *); @@ -205,7 +205,7 @@ xfs_trans_reserve_quota_icreate(struct xfs_trans *tp, struct xfs_dquot *udqp, #define xfs_qm_dqrele(d) do { (d) = (d); } while(0) #define xfs_qm_statvfs(ip, s) do { } while(0) #define xfs_qm_newmount(mp, a, b) (0) -#define xfs_qm_mount_quotas(mp) +#define xfs_qm_mount_quotas(mp) (0) #define xfs_qm_unmount(mp) #define xfs_qm_unmount_quotas(mp) #define xfs_inode_near_dquot_enforcement(ip, type) (false) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 7a94fb5b5a7f..82b729a86740 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -26,6 +26,7 @@ #include "xfs_imeta.h" #include "xfs_rtbitmap.h" #include "xfs_rtgroup.h" +#include "xfs_quota.h" /* * Realtime metadata files are not quite regular files because userspace can't @@ -1686,6 +1687,24 @@ xfs_rtmount_inodes( return error; } +/* Attach dquots for realtime metadata files. 
*/ +int +xfs_rtmount_dqattach( + struct xfs_mount *mp) +{ + int error; + + error = xfs_qm_dqattach(mp->m_rbmip); + if (error) + return error; + + error = xfs_qm_dqattach(mp->m_rsumip); + if (error) + return error; + + return 0; +} + void xfs_rtunmount_inodes( struct xfs_mount *mp) diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h index 04931ab1bcac..873ebac239dd 100644 --- a/fs/xfs/xfs_rtalloc.h +++ b/fs/xfs/xfs_rtalloc.h @@ -46,6 +46,8 @@ void xfs_rtunmount_inodes( struct xfs_mount *mp); +int xfs_rtmount_dqattach(struct xfs_mount *mp); + /* * Get the bitmap and summary inodes into the mount structure * at mount time. @@ -104,6 +106,7 @@ xfs_rtmount_init( # define xfs_rtfile_convert_unwritten(ip, pos, len) (0) # define xfs_rt_resv_free(mp) ((void)0) # define xfs_rt_resv_init(mp) (0) +# define xfs_rtmount_dqattach(mp) (0) #endif /* CONFIG_XFS_RT */ #endif /* __XFS_RTALLOC_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 08/38] xfs: add a realtime flag to the rmap update log redo items 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (11 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 11/38] xfs: attach dquots to rt metadata files when starting quota Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 14/38] xfs: wire up a new inode fork type for the realtime rmap Darrick J. Wong ` (24 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Extend the rmap update (RUI) log items with a new realtime flag that indicates that the updates apply against the realtime rmapbt. We'll wire up the actual rmap code later. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_defer.c | 1 + fs/xfs/libxfs/xfs_defer.h | 1 + fs/xfs/libxfs/xfs_log_format.h | 4 +++- fs/xfs/libxfs/xfs_refcount.c | 4 ++-- fs/xfs/libxfs/xfs_rmap.c | 38 ++++++++++++++++++++++++++++++++------ fs/xfs/libxfs/xfs_rmap.h | 10 +++++++--- fs/xfs/scrub/alloc_repair.c | 2 +- fs/xfs/xfs_rmap_item.c | 22 ++++++++++++++++++++++ fs/xfs/xfs_trace.h | 23 +++++++++++++++++------ 9 files changed, 86 insertions(+), 19 deletions(-) diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index c0416bae880a..ce3bc5fe2bdc 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -187,6 +187,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = { [XFS_DEFER_OPS_TYPE_BMAP] = &xfs_bmap_update_defer_type, [XFS_DEFER_OPS_TYPE_REFCOUNT] = &xfs_refcount_update_defer_type, [XFS_DEFER_OPS_TYPE_RMAP] = &xfs_rmap_update_defer_type, + [XFS_DEFER_OPS_TYPE_RMAP_RT] = &xfs_rmap_update_defer_type, [XFS_DEFER_OPS_TYPE_FREE] = &xfs_extent_free_defer_type, [XFS_DEFER_OPS_TYPE_FREE_RT] = &xfs_extent_free_defer_type, [XFS_DEFER_OPS_TYPE_AGFL_FREE] = &xfs_agfl_free_defer_type, diff 
--git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h index 52198c7124c6..89c279185ce6 100644 --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -17,6 +17,7 @@ enum xfs_defer_ops_type { XFS_DEFER_OPS_TYPE_BMAP, XFS_DEFER_OPS_TYPE_REFCOUNT, XFS_DEFER_OPS_TYPE_RMAP, + XFS_DEFER_OPS_TYPE_RMAP_RT, XFS_DEFER_OPS_TYPE_FREE, XFS_DEFER_OPS_TYPE_AGFL_FREE, XFS_DEFER_OPS_TYPE_FREE_RT, diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h index f3c8257a7545..3a23282d6e6f 100644 --- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -746,11 +746,13 @@ struct xfs_map_extent { #define XFS_RMAP_EXTENT_ATTR_FORK (1U << 31) #define XFS_RMAP_EXTENT_BMBT_BLOCK (1U << 30) #define XFS_RMAP_EXTENT_UNWRITTEN (1U << 29) +#define XFS_RMAP_EXTENT_REALTIME (1U << 28) #define XFS_RMAP_EXTENT_FLAGS (XFS_RMAP_EXTENT_TYPE_MASK | \ XFS_RMAP_EXTENT_ATTR_FORK | \ XFS_RMAP_EXTENT_BMBT_BLOCK | \ - XFS_RMAP_EXTENT_UNWRITTEN) + XFS_RMAP_EXTENT_UNWRITTEN | \ + XFS_RMAP_EXTENT_REALTIME) /* * This is the structure used to lay out an rui log item in the diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 20c12cb7b7de..83f681fb49fb 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -1889,7 +1889,7 @@ xfs_refcount_alloc_cow_extent( __xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len); /* Add rmap entry */ - xfs_rmap_alloc_extent(tp, fsb, len, XFS_RMAP_OWN_COW); + xfs_rmap_alloc_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW); } /* Forget a CoW staging event in the refcount btree. 
*/ @@ -1905,7 +1905,7 @@ xfs_refcount_free_cow_extent( return; /* Remove rmap entry */ - xfs_rmap_free_extent(tp, fsb, len, XFS_RMAP_OWN_COW); + xfs_rmap_free_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW); __xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len); } diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 31194cc14c0b..1a3607082d12 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -2654,6 +2654,12 @@ xfs_rmap_finish_one( xfs_agblock_t bno; bool unwritten; + if (ri->ri_realtime) { + /* coming in a subsequent patch */ + ASSERT(0); + return -EFSCORRUPTED; + } + bno = XFS_FSB_TO_AGBNO(mp, ri->ri_bmap.br_startblock); trace_xfs_rmap_deferred(mp, ri); @@ -2726,10 +2732,12 @@ __xfs_rmap_add( struct xfs_trans *tp, enum xfs_rmap_intent_type type, uint64_t owner, + bool isrt, int whichfork, struct xfs_bmbt_irec *bmap) { struct xfs_rmap_intent *ri; + enum xfs_defer_ops_type optype; ri = kmem_cache_alloc(xfs_rmap_intent_cache, GFP_NOFS | __GFP_NOFAIL); INIT_LIST_HEAD(&ri->ri_list); @@ -2737,11 +2745,24 @@ __xfs_rmap_add( ri->ri_owner = owner; ri->ri_whichfork = whichfork; ri->ri_bmap = *bmap; + ri->ri_realtime = isrt; + + /* + * Deferred rmap updates for the realtime and data sections must use + * separate transactions to finish deferred work because updates to + * realtime metadata files can lock AGFs to allocate btree blocks and + * we don't want that mixing with the AGF locks taken to finish data + * section updates. + */ + if (isrt) + optype = XFS_DEFER_OPS_TYPE_RMAP_RT; + else + optype = XFS_DEFER_OPS_TYPE_RMAP; trace_xfs_rmap_defer(tp->t_mountp, ri); xfs_rmap_update_get_group(tp->t_mountp, ri); - xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_RMAP, &ri->ri_list); + xfs_defer_add(tp, optype, &ri->ri_list); } /* Map an extent into a file. 
*/ @@ -2753,6 +2774,7 @@ xfs_rmap_map_extent( struct xfs_bmbt_irec *PREV) { enum xfs_rmap_intent_type type = XFS_RMAP_MAP; + bool isrt = xfs_ifork_is_realtime(ip, whichfork); if (!xfs_rmap_update_is_needed(tp->t_mountp, whichfork)) return; @@ -2760,7 +2782,7 @@ xfs_rmap_map_extent( if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip)) type = XFS_RMAP_MAP_SHARED; - __xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV); + __xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV); } /* Unmap an extent out of a file. */ @@ -2772,6 +2794,7 @@ xfs_rmap_unmap_extent( struct xfs_bmbt_irec *PREV) { enum xfs_rmap_intent_type type = XFS_RMAP_UNMAP; + bool isrt = xfs_ifork_is_realtime(ip, whichfork); if (!xfs_rmap_update_is_needed(tp->t_mountp, whichfork)) return; @@ -2779,7 +2802,7 @@ xfs_rmap_unmap_extent( if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip)) type = XFS_RMAP_UNMAP_SHARED; - __xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV); + __xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV); } /* @@ -2797,6 +2820,7 @@ xfs_rmap_convert_extent( struct xfs_bmbt_irec *PREV) { enum xfs_rmap_intent_type type = XFS_RMAP_CONVERT; + bool isrt = xfs_ifork_is_realtime(ip, whichfork); if (!xfs_rmap_update_is_needed(mp, whichfork)) return; @@ -2804,13 +2828,14 @@ xfs_rmap_convert_extent( if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip)) type = XFS_RMAP_CONVERT_SHARED; - __xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV); + __xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV); } /* Schedule the creation of an rmap for non-file data. */ void xfs_rmap_alloc_extent( struct xfs_trans *tp, + bool isrt, xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner) @@ -2825,13 +2850,14 @@ xfs_rmap_alloc_extent( bmap.br_startoff = 0; bmap.br_state = XFS_EXT_NORM; - __xfs_rmap_add(tp, XFS_RMAP_ALLOC, owner, XFS_DATA_FORK, &bmap); + __xfs_rmap_add(tp, XFS_RMAP_ALLOC, owner, isrt, XFS_DATA_FORK, &bmap); } /* Schedule the deletion of an rmap for non-file data. 
*/ void xfs_rmap_free_extent( struct xfs_trans *tp, + bool isrt, xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner) @@ -2846,7 +2872,7 @@ xfs_rmap_free_extent( bmap.br_startoff = 0; bmap.br_state = XFS_EXT_NORM; - __xfs_rmap_add(tp, XFS_RMAP_FREE, owner, XFS_DATA_FORK, &bmap); + __xfs_rmap_add(tp, XFS_RMAP_FREE, owner, isrt, XFS_DATA_FORK, &bmap); } /* Compare rmap records. Returns -1 if a < b, 1 if a > b, and 0 if equal. */ diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h index 54c969731cf4..e98f37c39f2f 100644 --- a/fs/xfs/libxfs/xfs_rmap.h +++ b/fs/xfs/libxfs/xfs_rmap.h @@ -173,7 +173,11 @@ struct xfs_rmap_intent { int ri_whichfork; uint64_t ri_owner; struct xfs_bmbt_irec ri_bmap; - struct xfs_perag *ri_pag; + union { + struct xfs_perag *ri_pag; + struct xfs_rtgroup *ri_rtg; + }; + bool ri_realtime; }; void xfs_rmap_update_get_group(struct xfs_mount *mp, @@ -187,9 +191,9 @@ void xfs_rmap_unmap_extent(struct xfs_trans *tp, struct xfs_inode *ip, void xfs_rmap_convert_extent(struct xfs_mount *mp, struct xfs_trans *tp, struct xfs_inode *ip, int whichfork, struct xfs_bmbt_irec *imap); -void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno, +void xfs_rmap_alloc_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner); -void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno, +void xfs_rmap_free_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner); void xfs_rmap_finish_one_cleanup(struct xfs_trans *tp, diff --git a/fs/xfs/scrub/alloc_repair.c b/fs/xfs/scrub/alloc_repair.c index 6506fc202571..b695cd2b0a56 100644 --- a/fs/xfs/scrub/alloc_repair.c +++ b/fs/xfs/scrub/alloc_repair.c @@ -528,7 +528,7 @@ xrep_abt_dispose_one( xfs_fsblock_t fsbno; fsbno = XFS_AGB_TO_FSB(sc->mp, pag->pag_agno, resv->agbno); - xfs_rmap_alloc_extent(sc->tp, fsbno, resv->used, + xfs_rmap_alloc_extent(sc->tp, false, fsbno, resv->used, XFS_RMAP_OWN_AG); } diff --git 
a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index a84f7e0e91a3..5f04f55f5caa 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -21,6 +21,7 @@ #include "xfs_log_priv.h" #include "xfs_log_recover.h" #include "xfs_ag.h" +#include "xfs_rtgroup.h" struct kmem_cache *xfs_rui_cache; struct kmem_cache *xfs_rud_cache; @@ -284,6 +285,11 @@ xfs_rmap_update_diff_items( ra = container_of(a, struct xfs_rmap_intent, ri_list); rb = container_of(b, struct xfs_rmap_intent, ri_list); + ASSERT(ra->ri_realtime == rb->ri_realtime); + + if (ra->ri_realtime) + return ra->ri_rtg->rtg_rgno - rb->ri_rtg->rtg_rgno; + return ra->ri_pag->pag_agno - rb->ri_pag->pag_agno; } @@ -318,6 +324,8 @@ xfs_rmap_update_log_item( map->me_flags |= XFS_RMAP_EXTENT_UNWRITTEN; if (ri->ri_whichfork == XFS_ATTR_FORK) map->me_flags |= XFS_RMAP_EXTENT_ATTR_FORK; + if (ri->ri_realtime) + map->me_flags |= XFS_RMAP_EXTENT_REALTIME; switch (ri->ri_type) { case XFS_RMAP_MAP: map->me_flags |= XFS_RMAP_EXTENT_MAP; @@ -387,6 +395,14 @@ xfs_rmap_update_get_group( { xfs_agnumber_t agno; + if (ri->ri_realtime) { + xfs_rgnumber_t rgno; + + rgno = xfs_rtb_to_rgno(mp, ri->ri_bmap.br_startblock); + ri->ri_rtg = xfs_rtgroup_get(mp, rgno); + return; + } + agno = XFS_FSB_TO_AGNO(mp, ri->ri_bmap.br_startblock); ri->ri_pag = xfs_perag_get(mp, agno); xfs_perag_bump_intents(ri->ri_pag); @@ -397,6 +413,11 @@ static inline void xfs_rmap_update_put_group( struct xfs_rmap_intent *ri) { + if (ri->ri_realtime) { + xfs_rtgroup_put(ri->ri_rtg); + return; + } + xfs_perag_drop_intents(ri->ri_pag); xfs_perag_put(ri->ri_pag); } @@ -565,6 +586,7 @@ xfs_rui_item_recover( goto abort_error; } + fake.ri_realtime = !!(map->me_flags & XFS_RMAP_EXTENT_REALTIME); fake.ri_owner = map->me_owner; fake.ri_whichfork = (map->me_flags & XFS_RMAP_EXTENT_ATTR_FORK) ? 
XFS_ATTR_FORK : XFS_DATA_FORK; diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 390aa7a4afae..c02a58cbf15b 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3013,9 +3013,10 @@ DECLARE_EVENT_CLASS(xfs_rmap_deferred_class, TP_ARGS(mp, ri), TP_STRUCT__entry( __field(dev_t, dev) + __field(dev_t, opdev) __field(unsigned long long, owner) __field(xfs_agnumber_t, agno) - __field(xfs_agblock_t, agbno) + __field(xfs_agblock_t, rmapbno) __field(int, whichfork) __field(xfs_fileoff_t, l_loff) __field(xfs_filblks_t, l_len) @@ -3024,9 +3025,18 @@ DECLARE_EVENT_CLASS(xfs_rmap_deferred_class, ), TP_fast_assign( __entry->dev = mp->m_super->s_dev; - __entry->agno = XFS_FSB_TO_AGNO(mp, ri->ri_bmap.br_startblock); - __entry->agbno = XFS_FSB_TO_AGBNO(mp, - ri->ri_bmap.br_startblock); + if (ri->ri_realtime) { + __entry->opdev = mp->m_rtdev_targp->bt_dev; + __entry->rmapbno = xfs_rtb_to_rgbno(mp, + ri->ri_bmap.br_startblock, + &__entry->agno); + } else { + __entry->agno = XFS_FSB_TO_AGNO(mp, + ri->ri_bmap.br_startblock); + __entry->opdev = __entry->dev; + __entry->rmapbno = XFS_FSB_TO_AGBNO(mp, + ri->ri_bmap.br_startblock); + } __entry->owner = ri->ri_owner; __entry->whichfork = ri->ri_whichfork; __entry->l_loff = ri->ri_bmap.br_startoff; @@ -3034,11 +3044,12 @@ DECLARE_EVENT_CLASS(xfs_rmap_deferred_class, __entry->l_state = ri->ri_bmap.br_state; __entry->op = ri->ri_type; ), - TP_printk("dev %d:%d op %s agno 0x%x agbno 0x%x owner 0x%llx %s fileoff 0x%llx fsbcount 0x%llx state %d", + TP_printk("dev %d:%d op %s opdev %d:%d agno 0x%x rmapbno 0x%x owner 0x%llx %s fileoff 0x%llx fsbcount 0x%llx state %d", MAJOR(__entry->dev), MINOR(__entry->dev), __print_symbolic(__entry->op, XFS_RMAP_INTENT_STRINGS), + MAJOR(__entry->opdev), MINOR(__entry->opdev), __entry->agno, - __entry->agbno, + __entry->rmapbno, __entry->owner, __print_symbolic(__entry->whichfork, XFS_WHICHFORK_STRINGS), __entry->l_loff, ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 14/38] xfs: wire up a new inode fork type for the realtime rmap 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (12 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 08/38] xfs: add a realtime flag to the rmap update log redo items Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 18/38] xfs: rearrange xfs_fsmap.c a little bit Darrick J. Wong ` (23 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Plumb in the pieces we need to embed the root of the realtime rmap btree in an inode's data fork, complete with new fork type and on-disk interpretation functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 8 + fs/xfs/libxfs/xfs_inode_fork.c | 8 + fs/xfs/libxfs/xfs_rtrmap_btree.c | 220 ++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 112 +++++++++++++++++++ fs/xfs/xfs_inode_item_recover.c | 32 +++++- fs/xfs/xfs_ondisk.h | 1 6 files changed, 375 insertions(+), 6 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index babe5d3fabb1..a2b8d8ee8afd 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1736,6 +1736,14 @@ typedef __be32 xfs_rmap_ptr_t; */ #define XFS_RTRMAP_CRC_MAGIC 0x4d415052 /* 'MAPR' */ +/* + * rtrmap root header, on-disk form only. 
+ */ +struct xfs_rtrmap_root { + __be16 bb_level; /* 0 is a leaf */ + __be16 bb_numrecs; /* current # of data records */ +}; + /* inode-based btree pointer type */ typedef __be64 xfs_rtrmap_ptr_t; diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 899428f96b94..94979bed8f32 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -27,6 +27,7 @@ #include "xfs_errortag.h" #include "xfs_health.h" #include "xfs_symlink_remote.h" +#include "xfs_rtrmap_btree.h" struct kmem_cache *xfs_ifork_cache; @@ -262,8 +263,7 @@ xfs_iformat_data_fork( case XFS_DINODE_FMT_RMAP: if (!xfs_has_rtrmapbt(ip->i_mount)) return -EFSCORRUPTED; - ASSERT(0); /* to be implemented later */ - return -EFSCORRUPTED; + return xfs_iformat_rtrmap(ip, dip); default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, sizeof(*dip), __this_address); @@ -645,7 +645,9 @@ xfs_iflush_fork( break; case XFS_DINODE_FMT_RMAP: - ASSERT(0); /* to be implemented later */ + ASSERT(whichfork == XFS_DATA_FORK); + if (iip->ili_fields & brootflag[whichfork]) + xfs_iflush_rtrmap(ip, dip); break; default: diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index c90017408574..a099f33f26ab 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -85,6 +85,39 @@ xfs_rtrmapbt_get_maxrecs( return cur->bc_mp->m_rtrmap_mxr[level != 0]; } +/* Calculate number of records in the ondisk realtime rmap btree inode root. */ +unsigned int +xfs_rtrmapbt_droot_maxrecs( + unsigned int blocklen, + bool leaf) +{ + blocklen -= sizeof(struct xfs_rtrmap_root); + + if (leaf) + return blocklen / sizeof(struct xfs_rmap_rec); + return blocklen / (2 * sizeof(struct xfs_rmap_key) + + sizeof(xfs_rtrmap_ptr_t)); +} + +/* + * Get the maximum records we could store in the on-disk format. 
+ * + * For non-root nodes this is equivalent to xfs_rtrmapbt_get_maxrecs, but + * for the root node this checks the available space in the dinode fork + * so that we can resize the in-memory buffer to match it. After a + * resize to the maximum size this function returns the same value + * as xfs_rtrmapbt_get_maxrecs for the root node, too. + */ +STATIC int +xfs_rtrmapbt_get_dmaxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level != cur->bc_nlevels - 1) + return cur->bc_mp->m_rtrmap_mxr[level != 0]; + return xfs_rtrmapbt_droot_maxrecs(cur->bc_ino.forksize, level == 0); +} + /* * Convert the ondisk record's offset field into the ondisk key's offset field. * Fork and bmbt are significant parts of the rmap record key, but written @@ -377,6 +410,64 @@ xfs_rtrmapbt_keys_contiguous( be32_to_cpu(key2->rmap.rm_startblock)); } +/* Move the rtrmap btree root from one incore buffer to another. */ +static void +xfs_rtrmapbt_broot_move( + struct xfs_inode *ip, + int whichfork, + struct xfs_btree_block *dst_broot, + size_t dst_bytes, + struct xfs_btree_block *src_broot, + size_t src_bytes, + unsigned int level, + unsigned int numrecs) +{ + struct xfs_mount *mp = ip->i_mount; + void *dptr; + void *sptr; + + ASSERT(xfs_rtrmap_droot_space(src_broot) <= + xfs_inode_fork_size(ip, whichfork)); + + /* + * We always have to move the pointers because they are not butted + * against the btree block header. + */ + if (numrecs && level > 0) { + sptr = xfs_rtrmap_broot_ptr_addr(mp, src_broot, 1, src_bytes); + dptr = xfs_rtrmap_broot_ptr_addr(mp, dst_broot, 1, dst_bytes); + memmove(dptr, sptr, numrecs * sizeof(xfs_fsblock_t)); + } + + if (src_broot == dst_broot) + return; + + /* + * If the root is being totally relocated, we have to migrate the block + * header and the keys/records that come after it. 
+ */ + memcpy(dst_broot, src_broot, XFS_RTRMAP_BLOCK_LEN); + + if (!numrecs) + return; + + if (level == 0) { + sptr = xfs_rtrmap_rec_addr(src_broot, 1); + dptr = xfs_rtrmap_rec_addr(dst_broot, 1); + memcpy(dptr, sptr, numrecs * sizeof(struct xfs_rmap_rec)); + } else { + sptr = xfs_rtrmap_key_addr(src_broot, 1); + dptr = xfs_rtrmap_key_addr(dst_broot, 1); + memcpy(dptr, sptr, numrecs * 2 * sizeof(struct xfs_rmap_key)); + } +} + +static const struct xfs_ifork_broot_ops xfs_rtrmapbt_iroot_ops = { + .maxrecs = xfs_rtrmapbt_maxrecs, + .size = xfs_rtrmap_broot_space_calc, + .move = xfs_rtrmapbt_broot_move, +}; + const struct xfs_btree_ops xfs_rtrmapbt_ops = { .rec_len = sizeof(struct xfs_rmap_rec), .key_len = 2 * sizeof(struct xfs_rmap_key), @@ -389,6 +480,7 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = { .free_block = xfs_btree_free_imeta_block, .get_minrecs = xfs_rtrmapbt_get_minrecs, .get_maxrecs = xfs_rtrmapbt_get_maxrecs, + .get_dmaxrecs = xfs_rtrmapbt_get_dmaxrecs, .init_key_from_rec = xfs_rtrmapbt_init_key_from_rec, .init_high_key_from_rec = xfs_rtrmapbt_init_high_key_from_rec, .init_rec_from_cur = xfs_rtrmapbt_init_rec_from_cur, @@ -399,6 +491,7 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = { .keys_inorder = xfs_rtrmapbt_keys_inorder, .recs_inorder = xfs_rtrmapbt_recs_inorder, .keys_contiguous = xfs_rtrmapbt_keys_contiguous, + .iroot_ops = &xfs_rtrmapbt_iroot_ops, }; /* Initialize a new rt rmap btree cursor. */ @@ -647,3 +740,130 @@ xfs_rtrmapbt_calc_reserves( return max_t(xfs_filblks_t, mp->m_sb.sb_rgblocks >> 6, xfs_rtrmapbt_max_size(mp, mp->m_sb.sb_rgblocks)); } + +/* Convert on-disk form of btree root to in-memory form. 
*/ +STATIC void +xfs_rtrmapbt_from_disk( + struct xfs_inode *ip, + struct xfs_rtrmap_root *dblock, + unsigned int dblocklen, + struct xfs_btree_block *rblock) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_rmap_key *fkp; + __be64 *fpp; + struct xfs_rmap_key *tkp; + __be64 *tpp; + struct xfs_rmap_rec *frp; + struct xfs_rmap_rec *trp; + unsigned int rblocklen = xfs_rtrmap_broot_space(mp, dblock); + unsigned int numrecs; + unsigned int maxrecs; + + xfs_btree_init_block(mp, rblock, &xfs_rtrmapbt_ops, 0, 0, ip->i_ino); + + rblock->bb_level = dblock->bb_level; + rblock->bb_numrecs = dblock->bb_numrecs; + numrecs = be16_to_cpu(dblock->bb_numrecs); + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrmapbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrmap_droot_key_addr(dblock, 1); + tkp = xfs_rtrmap_key_addr(rblock, 1); + fpp = xfs_rtrmap_droot_ptr_addr(dblock, 1, maxrecs); + tpp = xfs_rtrmap_broot_ptr_addr(mp, rblock, 1, rblocklen); + memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrmap_droot_rec_addr(dblock, 1); + trp = xfs_rtrmap_rec_addr(rblock, 1); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* Load a realtime reverse mapping btree root in from disk. 
*/ +int +xfs_iformat_rtrmap( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_rtrmap_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + unsigned int numrecs; + unsigned int level; + int dsize; + + dsize = XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK); + numrecs = be16_to_cpu(dfp->bb_numrecs); + level = be16_to_cpu(dfp->bb_level); + + if (level > mp->m_rtrmap_maxlevels || + xfs_rtrmap_droot_space_calc(level, numrecs) > dsize) + return -EFSCORRUPTED; + + xfs_iroot_alloc(ip, XFS_DATA_FORK, + xfs_rtrmap_broot_space_calc(mp, level, numrecs)); + xfs_rtrmapbt_from_disk(ip, dfp, dsize, ifp->if_broot); + return 0; +} + +/* Convert in-memory form of btree root to on-disk form. */ +void +xfs_rtrmapbt_to_disk( + struct xfs_mount *mp, + struct xfs_btree_block *rblock, + unsigned int rblocklen, + struct xfs_rtrmap_root *dblock, + unsigned int dblocklen) +{ + struct xfs_rmap_key *fkp; + __be64 *fpp; + struct xfs_rmap_key *tkp; + __be64 *tpp; + struct xfs_rmap_rec *frp; + struct xfs_rmap_rec *trp; + unsigned int numrecs; + unsigned int maxrecs; + + ASSERT(rblock->bb_magic == cpu_to_be32(XFS_RTRMAP_CRC_MAGIC)); + ASSERT(uuid_equal(&rblock->bb_u.l.bb_uuid, &mp->m_sb.sb_meta_uuid)); + ASSERT(rblock->bb_u.l.bb_blkno == cpu_to_be64(XFS_BUF_DADDR_NULL)); + ASSERT(rblock->bb_u.l.bb_leftsib == cpu_to_be64(NULLFSBLOCK)); + ASSERT(rblock->bb_u.l.bb_rightsib == cpu_to_be64(NULLFSBLOCK)); + + dblock->bb_level = rblock->bb_level; + dblock->bb_numrecs = rblock->bb_numrecs; + numrecs = be16_to_cpu(rblock->bb_numrecs); + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrmapbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrmap_key_addr(rblock, 1); + tkp = xfs_rtrmap_droot_key_addr(dblock, 1); + fpp = xfs_rtrmap_broot_ptr_addr(mp, rblock, 1, rblocklen); + tpp = xfs_rtrmap_droot_ptr_addr(dblock, 1, maxrecs); + memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, 
sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrmap_rec_addr(rblock, 1); + trp = xfs_rtrmap_droot_rec_addr(dblock, 1); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* Flush a realtime reverse mapping btree root out to disk. */ +void +xfs_iflush_rtrmap( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_rtrmap_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + + ASSERT(ifp->if_broot != NULL); + ASSERT(ifp->if_broot_bytes > 0); + ASSERT(xfs_rtrmap_droot_space(ifp->if_broot) <= + xfs_inode_fork_size(ip, XFS_DATA_FORK)); + xfs_rtrmapbt_to_disk(ip->i_mount, ifp->if_broot, ifp->if_broot_bytes, + dfp, XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); +} diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index 63e667d0d76d..6917a31bfe0c 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -27,6 +27,7 @@ void xfs_rtrmapbt_commit_staged_btree(struct xfs_btree_cur *cur, unsigned int xfs_rtrmapbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, bool leaf); void xfs_rtrmapbt_compute_maxlevels(struct xfs_mount *mp); +unsigned int xfs_rtrmapbt_droot_maxrecs(unsigned int blocklen, bool leaf); /* * Addresses of records, keys, and pointers within an incore rtrmapbt block. @@ -86,4 +87,115 @@ int xfs_rtrmapbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, xfs_filblks_t xfs_rtrmapbt_calc_reserves(struct xfs_mount *mp); +/* Addresses of key, pointers, and records within an ondisk rtrmapbt block. 
*/ + +static inline struct xfs_rmap_rec * +xfs_rtrmap_droot_rec_addr( + struct xfs_rtrmap_root *block, + unsigned int index) +{ + return (struct xfs_rmap_rec *) + ((char *)(block + 1) + + (index - 1) * sizeof(struct xfs_rmap_rec)); +} + +static inline struct xfs_rmap_key * +xfs_rtrmap_droot_key_addr( + struct xfs_rtrmap_root *block, + unsigned int index) +{ + return (struct xfs_rmap_key *) + ((char *)(block + 1) + + (index - 1) * 2 * sizeof(struct xfs_rmap_key)); +} + +static inline xfs_rtrmap_ptr_t * +xfs_rtrmap_droot_ptr_addr( + struct xfs_rtrmap_root *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrmap_ptr_t *) + ((char *)(block + 1) + + maxrecs * 2 * sizeof(struct xfs_rmap_key) + + (index - 1) * sizeof(xfs_rtrmap_ptr_t)); +} + +/* + * Address of pointers within the incore btree root. + * + * These are to be used when we know the size of the block and + * we don't have a cursor. + */ +static inline xfs_rtrmap_ptr_t * +xfs_rtrmap_broot_ptr_addr( + struct xfs_mount *mp, + struct xfs_btree_block *bb, + unsigned int index, + unsigned int block_size) +{ + return xfs_rtrmap_ptr_addr(bb, index, + xfs_rtrmapbt_maxrecs(mp, block_size, false)); +} + +/* + * Compute the space required for the incore btree root containing the given + * number of records. + */ +static inline size_t +xfs_rtrmap_broot_space_calc( + struct xfs_mount *mp, + unsigned int level, + unsigned int nrecs) +{ + size_t sz = XFS_RTRMAP_BLOCK_LEN; + + if (level > 0) + return sz + nrecs * (2 * sizeof(struct xfs_rmap_key) + + sizeof(xfs_rtrmap_ptr_t)); + return sz + nrecs * sizeof(struct xfs_rmap_rec); +} + +/* + * Compute the space required for the incore btree root given the ondisk + * btree root block. + */ +static inline size_t +xfs_rtrmap_broot_space(struct xfs_mount *mp, struct xfs_rtrmap_root *bb) +{ + return xfs_rtrmap_broot_space_calc(mp, be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +/* Compute the space required for the ondisk root block. 
*/ +static inline size_t +xfs_rtrmap_droot_space_calc( + unsigned int level, + unsigned int nrecs) +{ + size_t sz = sizeof(struct xfs_rtrmap_root); + + if (level > 0) + return sz + nrecs * (2 * sizeof(struct xfs_rmap_key) + + sizeof(xfs_rtrmap_ptr_t)); + return sz + nrecs * sizeof(struct xfs_rmap_rec); +} + +/* + * Compute the space required for the ondisk root block given an incore root + * block. + */ +static inline size_t +xfs_rtrmap_droot_space(struct xfs_btree_block *bb) +{ + return xfs_rtrmap_droot_space_calc(be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +int xfs_iformat_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip); +void xfs_rtrmapbt_to_disk(struct xfs_mount *mp, struct xfs_btree_block *rblock, + unsigned int rblocklen, struct xfs_rtrmap_root *dblock, + unsigned int dblocklen); +void xfs_iflush_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip); + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index 3453a204d196..4f1ed1f6a34d 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -22,6 +22,7 @@ #include "xfs_log_recover.h" #include "xfs_icache.h" #include "xfs_bmap_btree.h" +#include "xfs_rtrmap_btree.h" STATIC void xlog_recover_inode_ra_pass2( @@ -266,6 +267,31 @@ xlog_dinode_verify_extent_counts( return 0; } +static inline int +xlog_recover_inode_dbroot( + struct xfs_mount *mp, + void *src, + unsigned int len, + struct xfs_dinode *dip) +{ + void *dfork = XFS_DFORK_DPTR(dip); + unsigned int dsize = XFS_DFORK_DSIZE(dip, mp); + + switch (dip->di_format) { + case XFS_DINODE_FMT_BTREE: + xfs_bmbt_to_bmdr(mp, src, len, dfork, dsize); + break; + case XFS_DINODE_FMT_RMAP: + xfs_rtrmapbt_to_disk(mp, src, len, dfork, dsize); + break; + default: + ASSERT(0); + return -EFSCORRUPTED; + } + + return 0; +} + STATIC int xlog_recover_inode_commit_pass2( struct xlog *log, @@ -472,9 +498,9 @@ xlog_recover_inode_commit_pass2( break; case 
XFS_ILOG_DBROOT: - xfs_bmbt_to_bmdr(mp, (struct xfs_btree_block *)src, len, - (struct xfs_bmdr_block *)XFS_DFORK_DPTR(dip), - XFS_DFORK_DSIZE(dip, mp)); + error = xlog_recover_inode_dbroot(mp, src, len, dip); + if (error) + goto out_release; break; default: diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h index 35d0695fbf57..f24a08dd63e9 100644 --- a/fs/xfs/xfs_ondisk.h +++ b/fs/xfs/xfs_ondisk.h @@ -78,6 +78,7 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(union xfs_suminfo_ondisk, 4); XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo, 48); XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t, 8); + XFS_CHECK_STRUCT_SIZE(struct xfs_rtrmap_root, 4); /* * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to ^ permalink raw reply related [flat|nested] 565+ messages in thread
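A rough userspace sketch of the ondisk root layout arithmetic used by the xfs_rtrmap_droot_* helpers in this patch may help when reviewing xfs_rtrmapbt_broot_move. The 4-byte header and 8-byte pointer sizes come from the XFS_CHECK_STRUCT_SIZE lines in the patch; the 24-byte record and 20-byte key sizes are assumptions, and the body of droot_maxrecs() is a guess, since the patch only shows the prototype of xfs_rtrmapbt_droot_maxrecs. All function names below are hypothetical stand-ins, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sizes in bytes.  DROOT_HDR and PTR_SZ match the XFS_CHECK_STRUCT_SIZE
 * lines in the patch; REC_SZ and KEY_SZ are assumptions for illustration.
 */
enum {
	DROOT_HDR = 4,	/* sizeof(struct xfs_rtrmap_root) */
	PTR_SZ    = 8,	/* sizeof(xfs_rtrmap_ptr_t) */
	REC_SZ    = 24,	/* assumed sizeof(struct xfs_rmap_rec) */
	KEY_SZ    = 20,	/* assumed sizeof(struct xfs_rmap_key) */
};

/* Mirrors xfs_rtrmap_droot_space_calc(): bytes needed for an ondisk root. */
static size_t droot_space_calc(unsigned int level, unsigned int nrecs)
{
	if (level > 0)
		return DROOT_HDR + (size_t)nrecs * (2 * KEY_SZ + PTR_SZ);
	return DROOT_HDR + (size_t)nrecs * REC_SZ;
}

/* Hypothetical stand-in for xfs_rtrmapbt_droot_maxrecs(); body is a guess. */
static unsigned int droot_maxrecs(unsigned int blocklen, int leaf)
{
	assert(blocklen >= DROOT_HDR);
	blocklen -= DROOT_HDR;
	if (leaf)
		return blocklen / REC_SZ;
	return blocklen / (2 * KEY_SZ + PTR_SZ);
}

/* Byte offset of key pair @index (1-based); independent of the fork size. */
static size_t droot_key_offset(unsigned int index)
{
	assert(index >= 1);
	return DROOT_HDR + (size_t)(index - 1) * 2 * KEY_SZ;
}

/* Byte offset of pointer @index (1-based); depends on @maxrecs. */
static size_t droot_ptr_offset(unsigned int index, unsigned int maxrecs)
{
	assert(index >= 1);
	return DROOT_HDR + (size_t)maxrecs * 2 * KEY_SZ +
	       (size_t)(index - 1) * PTR_SZ;
}
```

Under these assumed sizes, the first key pair always sits at byte 4, but the first pointer sits at byte 244 in a 336-byte fork and at byte 164 in a 200-byte fork. That dependence on maxrecs is why the pointers must be moved on every resize, even when the root buffer itself stays put.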
* [PATCH 18/38] xfs: rearrange xfs_fsmap.c a little bit 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (13 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 14/38] xfs: wire up a new inode fork type for the realtime rmap Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 17/38] xfs: create routine to allocate and initialize a realtime rmap btree inode Darrick J. Wong ` (22 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> The order of the functions in this file has gotten a little confusing over the years. Specifically, the two data device implementations (bnobt and rmapbt) could be adjacent in the source code instead of split in two by the logdev and rtdev fsmap implementations. We're about to add more functionality to this file, so rearrange things now. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- fs/xfs/xfs_fsmap.c | 366 ++++++++++++++++++++++++++-------------------------- 1 file changed, 183 insertions(+), 183 deletions(-) diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index 71053f840ea4..dfd9e39ded6e 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -343,6 +343,21 @@ xfs_getfsmap_helper( return 0; } +/* Set rmap flags based on the getfsmap flags */ +static void +xfs_getfsmap_set_irec_flags( + struct xfs_rmap_irec *irec, + const struct xfs_fsmap *fmr) +{ + irec->rm_flags = 0; + if (fmr->fmr_flags & FMR_OF_ATTR_FORK) + irec->rm_flags |= XFS_RMAP_ATTR_FORK; + if (fmr->fmr_flags & FMR_OF_EXTENT_MAP) + irec->rm_flags |= XFS_RMAP_BMBT_BLOCK; + if (fmr->fmr_flags & FMR_OF_PREALLOC) + irec->rm_flags |= XFS_RMAP_UNWRITTEN; +} + /* Transform a rmapbt irec into a fsmap */ STATIC int xfs_getfsmap_datadev_helper( @@ -385,189 +400,6 @@ xfs_getfsmap_datadev_bnobt_helper( return xfs_getfsmap_helper(cur->bc_tp, info, &irec, rec_daddr); } -/* Set rmap flags based on the getfsmap flags */ -static void -xfs_getfsmap_set_irec_flags( - struct xfs_rmap_irec *irec, - const struct xfs_fsmap *fmr) -{ - irec->rm_flags = 0; - if (fmr->fmr_flags & FMR_OF_ATTR_FORK) - irec->rm_flags |= XFS_RMAP_ATTR_FORK; - if (fmr->fmr_flags & FMR_OF_EXTENT_MAP) - irec->rm_flags |= XFS_RMAP_BMBT_BLOCK; - if (fmr->fmr_flags & FMR_OF_PREALLOC) - irec->rm_flags |= XFS_RMAP_UNWRITTEN; -} - -/* Execute a getfsmap query against the log device. 
*/ -STATIC int -xfs_getfsmap_logdev( - struct xfs_trans *tp, - const struct xfs_fsmap *keys, - struct xfs_getfsmap_info *info) -{ - struct xfs_mount *mp = tp->t_mountp; - struct xfs_rmap_irec rmap; - int error; - - /* Set up search keys */ - info->low.rm_startblock = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical); - info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); - error = xfs_fsmap_owner_to_rmap(&info->low, keys); - if (error) - return error; - info->low.rm_blockcount = 0; - xfs_getfsmap_set_irec_flags(&info->low, &keys[0]); - - error = xfs_fsmap_owner_to_rmap(&info->high, keys + 1); - if (error) - return error; - info->high.rm_startblock = -1U; - info->high.rm_owner = ULLONG_MAX; - info->high.rm_offset = ULLONG_MAX; - info->high.rm_blockcount = 0; - info->high.rm_flags = XFS_RMAP_KEY_FLAGS | XFS_RMAP_REC_FLAGS; - info->missing_owner = XFS_FMR_OWN_FREE; - - trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low); - trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high); - - if (keys[0].fmr_physical > 0) - return 0; - - /* Fabricate an rmap entry for the external log device. 
*/ - rmap.rm_startblock = 0; - rmap.rm_blockcount = mp->m_sb.sb_logblocks; - rmap.rm_owner = XFS_RMAP_OWN_LOG; - rmap.rm_offset = 0; - rmap.rm_flags = 0; - - return xfs_getfsmap_helper(tp, info, &rmap, 0); -} - -#ifdef CONFIG_XFS_RT -/* Transform a rtbitmap "record" into a fsmap */ -STATIC int -xfs_getfsmap_rtdev_rtbitmap_helper( - struct xfs_mount *mp, - struct xfs_trans *tp, - const struct xfs_rtalloc_rec *rec, - void *priv) -{ - struct xfs_getfsmap_info *info = priv; - struct xfs_rmap_irec irec; - xfs_daddr_t rec_daddr; - - irec.rm_startblock = xfs_rtx_to_rtb(mp, rec->ar_startext); - rec_daddr = XFS_FSB_TO_BB(mp, irec.rm_startblock); - irec.rm_blockcount = xfs_rtx_to_rtb(mp, rec->ar_extcount); - irec.rm_owner = XFS_RMAP_OWN_NULL; /* "free" */ - irec.rm_offset = 0; - irec.rm_flags = 0; - - return xfs_getfsmap_helper(tp, info, &irec, rec_daddr); -} - -/* Execute a getfsmap query against the realtime device. */ -STATIC int -__xfs_getfsmap_rtdev( - struct xfs_trans *tp, - const struct xfs_fsmap *keys, - int (*query_fn)(struct xfs_trans *, - struct xfs_getfsmap_info *), - struct xfs_getfsmap_info *info) -{ - struct xfs_mount *mp = tp->t_mountp; - xfs_fsblock_t start_fsb; - xfs_fsblock_t end_fsb; - uint64_t eofs; - int error = 0; - - eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks); - if (keys[0].fmr_physical >= eofs) - return 0; - start_fsb = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical); - end_fsb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical)); - - /* Set up search keys */ - info->low.rm_startblock = start_fsb; - error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]); - if (error) - return error; - info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); - info->low.rm_blockcount = 0; - xfs_getfsmap_set_irec_flags(&info->low, &keys[0]); - - info->high.rm_startblock = end_fsb; - error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]); - if (error) - return error; - info->high.rm_offset = XFS_BB_TO_FSBT(mp, keys[1].fmr_offset); - info->high.rm_blockcount = 0; - 
xfs_getfsmap_set_irec_flags(&info->high, &keys[1]); - - trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low); - trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high); - - return query_fn(tp, info); -} - -/* Actually query the realtime bitmap. */ -STATIC int -xfs_getfsmap_rtdev_rtbitmap_query( - struct xfs_trans *tp, - struct xfs_getfsmap_info *info) -{ - struct xfs_rtalloc_rec alow = { 0 }; - struct xfs_rtalloc_rec ahigh = { 0 }; - struct xfs_mount *mp = tp->t_mountp; - unsigned int mod; - int error; - - xfs_rtbitmap_lock_shared(mp, XFS_RBMLOCK_BITMAP); - - /* - * Set up query parameters to return free rtextents covering the range - * we want. - */ - alow.ar_startext = xfs_rtb_to_rtxt(mp, info->low.rm_startblock); - ahigh.ar_startext = xfs_rtb_to_rtx(mp, info->high.rm_startblock, &mod); - if (mod) - ahigh.ar_startext++; - error = xfs_rtalloc_query_range(mp, tp, &alow, &ahigh, - xfs_getfsmap_rtdev_rtbitmap_helper, info); - if (error) - goto err; - - /* - * Report any gaps at the end of the rtbitmap by simulating a null - * rmap starting at the block after the end of the query range. - */ - info->last = true; - ahigh.ar_startext = min(mp->m_sb.sb_rextents, ahigh.ar_startext); - - error = xfs_getfsmap_rtdev_rtbitmap_helper(mp, tp, &ahigh, info); - if (error) - goto err; -err: - xfs_rtbitmap_unlock_shared(mp, XFS_RBMLOCK_BITMAP); - return error; -} - -/* Execute a getfsmap query against the realtime device rtbitmap. */ -STATIC int -xfs_getfsmap_rtdev_rtbitmap( - struct xfs_trans *tp, - const struct xfs_fsmap *keys, - struct xfs_getfsmap_info *info) -{ - info->missing_owner = XFS_FMR_OWN_UNKNOWN; - return __xfs_getfsmap_rtdev(tp, keys, xfs_getfsmap_rtdev_rtbitmap_query, - info); -} -#endif /* CONFIG_XFS_RT */ - /* Execute a getfsmap query against the regular data device. 
*/ STATIC int __xfs_getfsmap_datadev( @@ -766,6 +598,174 @@ xfs_getfsmap_datadev_bnobt( xfs_getfsmap_datadev_bnobt_query, &akeys[0]); } +/* Execute a getfsmap query against the log device. */ +STATIC int +xfs_getfsmap_logdev( + struct xfs_trans *tp, + const struct xfs_fsmap *keys, + struct xfs_getfsmap_info *info) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_rmap_irec rmap; + int error; + + /* Set up search keys */ + info->low.rm_startblock = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical); + info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); + error = xfs_fsmap_owner_to_rmap(&info->low, keys); + if (error) + return error; + info->low.rm_blockcount = 0; + xfs_getfsmap_set_irec_flags(&info->low, &keys[0]); + + error = xfs_fsmap_owner_to_rmap(&info->high, keys + 1); + if (error) + return error; + info->high.rm_startblock = -1U; + info->high.rm_owner = ULLONG_MAX; + info->high.rm_offset = ULLONG_MAX; + info->high.rm_blockcount = 0; + info->high.rm_flags = XFS_RMAP_KEY_FLAGS | XFS_RMAP_REC_FLAGS; + info->missing_owner = XFS_FMR_OWN_FREE; + + trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low); + trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high); + + if (keys[0].fmr_physical > 0) + return 0; + + /* Fabricate an rmap entry for the external log device. 
*/ + rmap.rm_startblock = 0; + rmap.rm_blockcount = mp->m_sb.sb_logblocks; + rmap.rm_owner = XFS_RMAP_OWN_LOG; + rmap.rm_offset = 0; + rmap.rm_flags = 0; + + return xfs_getfsmap_helper(tp, info, &rmap, 0); +} + +#ifdef CONFIG_XFS_RT +/* Transform a rtbitmap "record" into a fsmap */ +STATIC int +xfs_getfsmap_rtdev_rtbitmap_helper( + struct xfs_mount *mp, + struct xfs_trans *tp, + const struct xfs_rtalloc_rec *rec, + void *priv) +{ + struct xfs_getfsmap_info *info = priv; + struct xfs_rmap_irec irec; + xfs_daddr_t rec_daddr; + + irec.rm_startblock = xfs_rtx_to_rtb(mp, rec->ar_startext); + rec_daddr = XFS_FSB_TO_BB(mp, irec.rm_startblock); + irec.rm_blockcount = xfs_rtx_to_rtb(mp, rec->ar_extcount); + irec.rm_owner = XFS_RMAP_OWN_NULL; /* "free" */ + irec.rm_offset = 0; + irec.rm_flags = 0; + + return xfs_getfsmap_helper(tp, info, &irec, rec_daddr); +} + +/* Execute a getfsmap query against the realtime device. */ +STATIC int +__xfs_getfsmap_rtdev( + struct xfs_trans *tp, + const struct xfs_fsmap *keys, + int (*query_fn)(struct xfs_trans *, + struct xfs_getfsmap_info *), + struct xfs_getfsmap_info *info) +{ + struct xfs_mount *mp = tp->t_mountp; + xfs_fsblock_t start_fsb; + xfs_fsblock_t end_fsb; + uint64_t eofs; + int error = 0; + + eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks); + if (keys[0].fmr_physical >= eofs) + return 0; + start_fsb = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical); + end_fsb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical)); + + /* Set up search keys */ + info->low.rm_startblock = start_fsb; + error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]); + if (error) + return error; + info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); + info->low.rm_blockcount = 0; + xfs_getfsmap_set_irec_flags(&info->low, &keys[0]); + + info->high.rm_startblock = end_fsb; + error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]); + if (error) + return error; + info->high.rm_offset = XFS_BB_TO_FSBT(mp, keys[1].fmr_offset); + info->high.rm_blockcount = 0; + 
xfs_getfsmap_set_irec_flags(&info->high, &keys[1]); + + trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low); + trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high); + + return query_fn(tp, info); +} + +/* Actually query the realtime bitmap. */ +STATIC int +xfs_getfsmap_rtdev_rtbitmap_query( + struct xfs_trans *tp, + struct xfs_getfsmap_info *info) +{ + struct xfs_rtalloc_rec alow = { 0 }; + struct xfs_rtalloc_rec ahigh = { 0 }; + struct xfs_mount *mp = tp->t_mountp; + unsigned int mod; + int error; + + xfs_rtbitmap_lock_shared(mp, XFS_RBMLOCK_BITMAP); + + /* + * Set up query parameters to return free rtextents covering the range + * we want. + */ + alow.ar_startext = xfs_rtb_to_rtxt(mp, info->low.rm_startblock); + ahigh.ar_startext = xfs_rtb_to_rtx(mp, info->high.rm_startblock, &mod); + if (mod) + ahigh.ar_startext++; + error = xfs_rtalloc_query_range(mp, tp, &alow, &ahigh, + xfs_getfsmap_rtdev_rtbitmap_helper, info); + if (error) + goto err; + + /* + * Report any gaps at the end of the rtbitmap by simulating a null + * rmap starting at the block after the end of the query range. + */ + info->last = true; + ahigh.ar_startext = min(mp->m_sb.sb_rextents, ahigh.ar_startext); + + error = xfs_getfsmap_rtdev_rtbitmap_helper(mp, tp, &ahigh, info); + if (error) + goto err; +err: + xfs_rtbitmap_unlock_shared(mp, XFS_RBMLOCK_BITMAP); + return error; +} + +/* Execute a getfsmap query against the realtime device rtbitmap. */ +STATIC int +xfs_getfsmap_rtdev_rtbitmap( + struct xfs_trans *tp, + const struct xfs_fsmap *keys, + struct xfs_getfsmap_info *info) +{ + info->missing_owner = XFS_FMR_OWN_UNKNOWN; + return __xfs_getfsmap_rtdev(tp, keys, xfs_getfsmap_rtdev_rtbitmap_query, + info); +} +#endif /* CONFIG_XFS_RT */ + /* Do we recognize the device? */ STATIC bool xfs_getfsmap_is_valid_device( ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 17/38] xfs: create routine to allocate and initialize a realtime rmap btree inode 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (14 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 18/38] xfs: rearrange xfs_fsmap.c a little bit Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 20/38] xfs: fix integer overflows in the fsmap rtbitmap backend Darrick J. Wong ` (21 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a library routine to allocate and initialize an empty realtime rmapbt inode. We'll use this for mkfs and repair. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtrmap_btree.c | 42 ++++++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrmap_btree.h | 5 +++++ 2 files changed, 47 insertions(+) diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index a099f33f26ab..9181fca2ba54 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -27,6 +27,7 @@ #include "xfs_extent_busy.h" #include "xfs_rtgroup.h" #include "xfs_bmap.h" +#include "xfs_imeta.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -867,3 +868,44 @@ xfs_iflush_rtrmap( xfs_rtrmapbt_to_disk(ip->i_mount, ifp->if_broot, ifp->if_broot_bytes, dfp, XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); } + +/* + * Create a realtime rmap btree inode. + * + * Regardless of the return value, the caller must clean up @upd. If a new + * inode is returned through *ipp, the caller must finish setting up the incore + * inode and release it.
+ */ +int +xfs_rtrmapbt_create( + struct xfs_trans **tpp, + struct xfs_imeta_path *path, + struct xfs_imeta_update *upd, + struct xfs_inode **ipp) +{ + struct xfs_mount *mp = (*tpp)->t_mountp; + struct xfs_ifork *ifp; + struct xfs_inode *ip; + int error; + + *ipp = NULL; + + error = xfs_imeta_create(tpp, path, S_IFREG, 0, &ip, upd); + if (error) + return error; + + ifp = &ip->i_df; + ifp->if_format = XFS_DINODE_FMT_RMAP; + ASSERT(ifp->if_broot_bytes == 0); + ASSERT(ifp->if_bytes == 0); + + /* Initialize the empty incore btree root. */ + xfs_iroot_alloc(ip, XFS_DATA_FORK, + xfs_rtrmap_broot_space_calc(mp, 0, 0)); + xfs_btree_init_block(mp, ifp->if_broot, &xfs_rtrmapbt_ops, 0, 0, + ip->i_ino); + xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE | XFS_ILOG_DBROOT); + + *ipp = ip; + return 0; +} diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index 6917a31bfe0c..046a60816736 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -198,4 +198,9 @@ void xfs_rtrmapbt_to_disk(struct xfs_mount *mp, struct xfs_btree_block *rblock, unsigned int dblocklen); void xfs_iflush_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip); +struct xfs_imeta_update; + +int xfs_rtrmapbt_create(struct xfs_trans **tpp, struct xfs_imeta_path *path, + struct xfs_imeta_update *ic, struct xfs_inode **ipp); + #endif /* __XFS_RTRMAP_BTREE_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 20/38] xfs: fix integer overflows in the fsmap rtbitmap backend 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (15 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 17/38] xfs: create routine to allocate and initialize a realtime rmap btree inode Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 15/38] xfs: use realtime EFI to free extents when realtime rmap is enabled Darrick J. Wong ` (20 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_fsmap.c | 54 +++++++++++++++++++++++++++++++++++++++------------- 1 file changed, 40 insertions(+), 14 deletions(-) diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index e330a7e55d1d..b5e7ae77cab9 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -163,6 +163,8 @@ struct xfs_getfsmap_info { struct xfs_rtgroup *rtg; /* rt group info, if needed */ struct xfs_perag *pag; /* AG info, if applicable */ xfs_daddr_t next_daddr; /* next daddr we expect */ + /* daddr of low fsmap key when we're using the rtbitmap */ + xfs_daddr_t low_daddr; u64 missing_owner; /* owner of holes */ u32 dev; /* device id */ struct xfs_rmap_irec low; /* low rmap key */ @@ -240,16 +242,29 @@ xfs_getfsmap_format( xfs_fsmap_from_internal(rec, xfm); } +static inline bool +xfs_getfsmap_rec_before_start( + struct xfs_getfsmap_info *info, + const struct xfs_rmap_irec *rec, + xfs_daddr_t rec_daddr) +{ + if (info->low_daddr != -1ULL) + return rec_daddr < info->low_daddr; + return xfs_rmap_compare(rec, &info->low) < 0; +} + /* * Format a reverse mapping for getfsmap, having translated rm_startblock - * into the appropriate daddr units. + * into the appropriate daddr units. 
Pass in a nonzero @len_daddr if the + * length could be larger than rm_blockcount in struct xfs_rmap_irec. */ STATIC int xfs_getfsmap_helper( struct xfs_trans *tp, struct xfs_getfsmap_info *info, const struct xfs_rmap_irec *rec, - xfs_daddr_t rec_daddr) + xfs_daddr_t rec_daddr, + xfs_daddr_t len_daddr) { struct xfs_fsmap fmr; struct xfs_mount *mp = tp->t_mountp; @@ -259,12 +274,15 @@ xfs_getfsmap_helper( if (fatal_signal_pending(current)) return -EINTR; + if (len_daddr == 0) + len_daddr = XFS_FSB_TO_BB(mp, rec->rm_blockcount); + /* * Filter out records that start before our startpoint, if the * caller requested that. */ - if (xfs_rmap_compare(rec, &info->low) < 0) { - rec_daddr += XFS_FSB_TO_BB(mp, rec->rm_blockcount); + if (xfs_getfsmap_rec_before_start(info, rec, rec_daddr)) { + rec_daddr += len_daddr; if (info->next_daddr < rec_daddr) info->next_daddr = rec_daddr; return 0; @@ -283,7 +301,7 @@ xfs_getfsmap_helper( info->head->fmh_entries++; - rec_daddr += XFS_FSB_TO_BB(mp, rec->rm_blockcount); + rec_daddr += len_daddr; if (info->next_daddr < rec_daddr) info->next_daddr = rec_daddr; return 0; @@ -329,7 +347,7 @@ xfs_getfsmap_helper( if (error) return error; fmr.fmr_offset = XFS_FSB_TO_BB(mp, rec->rm_offset); - fmr.fmr_length = XFS_FSB_TO_BB(mp, rec->rm_blockcount); + fmr.fmr_length = len_daddr; if (rec->rm_flags & XFS_RMAP_UNWRITTEN) fmr.fmr_flags |= FMR_OF_PREALLOC; if (rec->rm_flags & XFS_RMAP_ATTR_FORK) @@ -346,7 +364,7 @@ xfs_getfsmap_helper( xfs_getfsmap_format(mp, &fmr, info); out: - rec_daddr += XFS_FSB_TO_BB(mp, rec->rm_blockcount); + rec_daddr += len_daddr; if (info->next_daddr < rec_daddr) info->next_daddr = rec_daddr; return 0; @@ -382,7 +400,7 @@ xfs_getfsmap_datadev_helper( fsb = XFS_AGB_TO_FSB(mp, cur->bc_ag.pag->pag_agno, rec->rm_startblock); rec_daddr = XFS_FSB_TO_DADDR(mp, fsb); - return xfs_getfsmap_helper(cur->bc_tp, info, rec, rec_daddr); + return xfs_getfsmap_helper(cur->bc_tp, info, rec, rec_daddr, 0); } /* Transform a bnobt irec into a 
fsmap */ @@ -406,7 +424,7 @@ xfs_getfsmap_datadev_bnobt_helper( irec.rm_offset = 0; irec.rm_flags = 0; - return xfs_getfsmap_helper(cur->bc_tp, info, &irec, rec_daddr); + return xfs_getfsmap_helper(cur->bc_tp, info, &irec, rec_daddr, 0); } /* Execute a getfsmap query against the regular data device. */ @@ -650,7 +668,7 @@ xfs_getfsmap_logdev( rmap.rm_offset = 0; rmap.rm_flags = 0; - return xfs_getfsmap_helper(tp, info, &rmap, 0); + return xfs_getfsmap_helper(tp, info, &rmap, 0, 0); } #ifdef CONFIG_XFS_RT @@ -664,16 +682,22 @@ xfs_getfsmap_rtdev_rtbitmap_helper( { struct xfs_getfsmap_info *info = priv; struct xfs_rmap_irec irec; - xfs_daddr_t rec_daddr; + xfs_rtblock_t rtbno; + xfs_daddr_t rec_daddr, len_daddr; + + rtbno = xfs_rtx_to_rtb(mp, rec->ar_startext); + rec_daddr = XFS_FSB_TO_BB(mp, rtbno); + + rtbno = xfs_rtx_to_rtb(mp, rec->ar_extcount); + len_daddr = XFS_FSB_TO_BB(mp, rtbno); irec.rm_startblock = xfs_rtx_to_rtb(mp, rec->ar_startext); - rec_daddr = XFS_FSB_TO_BB(mp, irec.rm_startblock); irec.rm_blockcount = xfs_rtx_to_rtb(mp, rec->ar_extcount); irec.rm_owner = XFS_RMAP_OWN_NULL; /* "free" */ irec.rm_offset = 0; irec.rm_flags = 0; - return xfs_getfsmap_helper(tp, info, &irec, rec_daddr); + return xfs_getfsmap_helper(tp, info, &irec, rec_daddr, len_daddr); } /* Actually query the realtime bitmap. */ @@ -741,6 +765,7 @@ xfs_getfsmap_rtdev_rtbitmap( /* Set up search keys */ info->low.rm_startblock = start_fsb; + info->low_daddr = XFS_FSB_TO_BB(mp, start_fsb); error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]); if (error) return error; @@ -778,7 +803,7 @@ xfs_getfsmap_rtdev_helper( rec->rm_startblock); rec_daddr = xfs_rtb_to_daddr(mp, rtbno); - return xfs_getfsmap_helper(cur->bc_tp, info, rec, rec_daddr); + return xfs_getfsmap_helper(cur->bc_tp, info, rec, rec_daddr, 0); } /* Actually query the rtrmap btree. 
*/ @@ -1122,6 +1147,7 @@ xfs_getfsmap( info.last = false; info.pag = NULL; info.rtg = NULL; + info.low_daddr = -1ULL; error = handlers[i].fn(tp, dkeys, &info); if (error) break; ^ permalink raw reply related [flat|nested] 565+ messages in thread
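The patch above carries no changelog, so the following is only one reading of it: a minimal userspace sketch of the truncation that the new @len_daddr argument seems meant to avoid. It assumes that rm_blockcount in struct xfs_rmap_irec is a 32-bit field, and that the filesystem uses 4096-byte blocks with 512-byte basic blocks; fsb_to_bb(), len_daddr_via_irec() and len_daddr_wide() are hypothetical names, not kernel functions.

```c
#include <stdint.h>

/* Assumed geometry: 4096-byte fs blocks and 512-byte basic blocks. */
#define BLKBB_LOG	3

/* Stand-in for XFS_FSB_TO_BB() under the assumed geometry. */
static uint64_t fsb_to_bb(uint64_t fsb)
{
	return fsb << BLKBB_LOG;
}

/* Length routed through a 32-bit count, as rm_blockcount is assumed to be. */
static uint64_t len_daddr_via_irec(uint64_t rtx_count, uint32_t rextsize)
{
	uint32_t rm_blockcount = (uint32_t)(rtx_count * rextsize);

	return fsb_to_bb(rm_blockcount);
}

/* Length computed in 64 bits throughout, like len_daddr in the patch. */
static uint64_t len_daddr_wide(uint64_t rtx_count, uint32_t rextsize)
{
	uint64_t rtbno = rtx_count * rextsize;

	return fsb_to_bb(rtbno);
}
```

For small free extents the two functions agree. Once a single free range reported by the rtbitmap spans 2^32 or more blocks, the 32-bit path silently drops the high bits, which is consistent with the patch's comment about lengths that "could be larger than rm_blockcount".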
* [PATCH 15/38] xfs: use realtime EFI to free extents when realtime rmap is enabled 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (16 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 20/38] xfs: fix integer overflows in the fsmap rtbitmap backend Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 19/38] xfs: wire up getfsmap to the realtime reverse mapping btree Darrick J. Wong ` (19 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When rmap is enabled, XFS expects a certain order of operations, which is: 1) remove the file mapping, 2) remove the reverse mapping, and then 3) free the blocks. xfs_bmap_del_extent_real tries to do 1 and 3 in the same transaction, which means that when rtrmap is enabled, we have to use realtime EFIs to maintain the expected order. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 2e93b018d150..8c683db35788 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5094,7 +5094,6 @@ xfs_bmap_del_extent_real( { xfs_fsblock_t del_endblock=0; /* first block past del */ xfs_fileoff_t del_endoff; /* first offset past del */ - int do_fx; /* free extent at end of routine */ int error; /* error return value */ int flags = 0;/* inode logging flags */ struct xfs_bmbt_irec got; /* current extent entry */ @@ -5108,6 +5107,8 @@ xfs_bmap_del_extent_real( uint qfield; /* quota field to update */ uint32_t state = xfs_bmap_fork_to_state(whichfork); struct xfs_bmbt_irec old; + bool isrt = xfs_ifork_is_realtime(ip, whichfork); + bool want_free = !(bflags & XFS_BMAPI_REMAP); mp = ip->i_mount; XFS_STATS_INC(mp, xs_del_exlist); @@ -5138,17 +5139,24 @@ xfs_bmap_del_extent_real( return -ENOSPC; flags = XFS_ILOG_CORE; - if (xfs_ifork_is_realtime(ip, whichfork)) { - if (!(bflags & XFS_BMAPI_REMAP)) { + if (isrt) { + /* + * Historically, we did not use EFIs to free realtime extents. + * However, when reverse mapping is enabled, we must maintain + * the same order of operations as the data device, which is: + * Remove the file mapping, remove the reverse mapping, and + * then free the blocks. This means that we must delay the + * freeing until after we've scheduled the rmap update. + */ + if (want_free && !xfs_has_rtrmapbt(mp)) { error = xfs_rtfree_blocks(tp, del->br_startblock, del->br_blockcount); if (error) goto done; + want_free = false; } - do_fx = 0; qfield = XFS_TRANS_DQ_RTBCOUNT; } else { - do_fx = 1; qfield = XFS_TRANS_DQ_BCOUNT; } nblks = del->br_blockcount; @@ -5303,7 +5311,7 @@ xfs_bmap_del_extent_real( /* * If we need to, add to list of extents to delete. 
*/ - if (do_fx && !(bflags & XFS_BMAPI_REMAP)) { + if (want_free) { if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { xfs_refcount_decrease_extent(tp, del); } else { @@ -5312,6 +5320,8 @@ xfs_bmap_del_extent_real( if ((bflags & XFS_BMAPI_NODISCARD) || del->br_state == XFS_EXT_UNWRITTEN) efi_flags |= XFS_FREE_EXTENT_SKIP_DISCARD; + if (isrt) + efi_flags |= XFS_FREE_EXTENT_REALTIME; xfs_free_extent_later(tp, del->br_startblock, del->br_blockcount, NULL, efi_flags); ^ permalink raw reply related [flat|nested] 565+ messages in thread
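The want_free/isrt logic in this patch can be modelled as a small decision function. This is only a sketch under stated assumptions: the enum and function names are hypothetical, and the reflink branch (which calls xfs_refcount_decrease_extent instead of queueing an EFI) is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical outcomes; these are not kernel identifiers. */
enum sketch_free_action {
	SKETCH_FREE_NONE,		/* remap: blocks stay allocated */
	SKETCH_FREE_NOW,		/* rt extent freed in this transaction */
	SKETCH_FREE_DEFERRED,		/* data device EFI queued */
	SKETCH_FREE_DEFERRED_RT,	/* realtime EFI queued */
};

/*
 * Mirror of the patched flow: realtime extents are freed immediately only
 * when rtrmap is disabled; otherwise the free is deferred so that the rmap
 * update can be scheduled first.
 */
static inline enum sketch_free_action
sketch_pick_free_action(bool isrt, bool has_rtrmapbt, bool remap)
{
	bool want_free = !remap;

	if (isrt && want_free && !has_rtrmapbt)
		return SKETCH_FREE_NOW;
	if (!want_free)
		return SKETCH_FREE_NONE;
	return isrt ? SKETCH_FREE_DEFERRED_RT : SKETCH_FREE_DEFERRED;
}
```

The point of the model is that the only case that still bypasses the EFI is a realtime fork on a filesystem without rtrmapbt, which matches the historical behaviour described in the new comment.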
* [PATCH 19/38] xfs: wire up getfsmap to the realtime reverse mapping btree 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (17 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 15/38] xfs: use realtime EFI to free extents when realtime rmap is enabled Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 16/38] xfs: wire up rmap map and unmap to the realtime rmapbt Darrick J. Wong ` (18 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Connect the getfsmap ioctl to the realtime rmapbt. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_fsmap.c | 261 ++++++++++++++++++++++++++++++++++++++++++---------- 1 file changed, 212 insertions(+), 49 deletions(-) diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index dfd9e39ded6e..e330a7e55d1d 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -25,6 +25,8 @@ #include "xfs_alloc_btree.h" #include "xfs_rtbitmap.h" #include "xfs_ag.h" +#include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" /* Convert an xfs_fsmap to an fsmap. */ static void @@ -158,6 +160,7 @@ struct xfs_getfsmap_info { struct xfs_fsmap_head *head; struct fsmap *fsmap_recs; /* mapping records */ struct xfs_buf *agf_bp; /* AGF, for refcount queries */ + struct xfs_rtgroup *rtg; /* rt group info, if needed */ struct xfs_perag *pag; /* AG info, if applicable */ xfs_daddr_t next_daddr; /* next daddr we expect */ u64 missing_owner; /* owner of holes */ @@ -311,8 +314,14 @@ xfs_getfsmap_helper( if (info->head->fmh_entries >= info->head->fmh_count) return -ECANCELED; - trace_xfs_fsmap_mapping(mp, info->dev, - info->pag ? 
info->pag->pag_agno : NULLAGNUMBER, rec); + if (info->pag) + trace_xfs_fsmap_mapping(mp, info->dev, info->pag->pag_agno, + rec); + else if (info->rtg) + trace_xfs_fsmap_mapping(mp, info->dev, info->rtg->rtg_rgno, + rec); + else + trace_xfs_fsmap_mapping(mp, info->dev, NULLAGNUMBER, rec); fmr.fmr_device = info->dev; fmr.fmr_physical = rec_daddr; @@ -667,50 +676,6 @@ xfs_getfsmap_rtdev_rtbitmap_helper( return xfs_getfsmap_helper(tp, info, &irec, rec_daddr); } -/* Execute a getfsmap query against the realtime device. */ -STATIC int -__xfs_getfsmap_rtdev( - struct xfs_trans *tp, - const struct xfs_fsmap *keys, - int (*query_fn)(struct xfs_trans *, - struct xfs_getfsmap_info *), - struct xfs_getfsmap_info *info) -{ - struct xfs_mount *mp = tp->t_mountp; - xfs_fsblock_t start_fsb; - xfs_fsblock_t end_fsb; - uint64_t eofs; - int error = 0; - - eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks); - if (keys[0].fmr_physical >= eofs) - return 0; - start_fsb = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical); - end_fsb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical)); - - /* Set up search keys */ - info->low.rm_startblock = start_fsb; - error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]); - if (error) - return error; - info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); - info->low.rm_blockcount = 0; - xfs_getfsmap_set_irec_flags(&info->low, &keys[0]); - - info->high.rm_startblock = end_fsb; - error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]); - if (error) - return error; - info->high.rm_offset = XFS_BB_TO_FSBT(mp, keys[1].fmr_offset); - info->high.rm_blockcount = 0; - xfs_getfsmap_set_irec_flags(&info->high, &keys[1]); - - trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low); - trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high); - - return query_fn(tp, info); -} - /* Actually query the realtime bitmap. 
*/ STATIC int xfs_getfsmap_rtdev_rtbitmap_query( @@ -760,9 +725,203 @@ xfs_getfsmap_rtdev_rtbitmap( const struct xfs_fsmap *keys, struct xfs_getfsmap_info *info) { + struct xfs_mount *mp = tp->t_mountp; + xfs_fsblock_t start_fsb; + xfs_fsblock_t end_fsb; + uint64_t eofs; + int error = 0; + + eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks); + if (keys[0].fmr_physical >= eofs) + return 0; + start_fsb = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical); + end_fsb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical)); + info->missing_owner = XFS_FMR_OWN_UNKNOWN; - return __xfs_getfsmap_rtdev(tp, keys, xfs_getfsmap_rtdev_rtbitmap_query, - info); + + /* Set up search keys */ + info->low.rm_startblock = start_fsb; + error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]); + if (error) + return error; + info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); + info->low.rm_blockcount = 0; + xfs_getfsmap_set_irec_flags(&info->low, &keys[0]); + + info->high.rm_startblock = end_fsb; + error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]); + if (error) + return error; + info->high.rm_offset = XFS_BB_TO_FSBT(mp, keys[1].fmr_offset); + info->high.rm_blockcount = 0; + xfs_getfsmap_set_irec_flags(&info->high, &keys[1]); + + trace_xfs_fsmap_low_key(mp, info->dev, NULLAGNUMBER, &info->low); + trace_xfs_fsmap_high_key(mp, info->dev, NULLAGNUMBER, &info->high); + + return xfs_getfsmap_rtdev_rtbitmap_query(tp, info); +} + +/* Transform an absolute-startblock rmap (rtdev, logdev) into a fsmap */ +STATIC int +xfs_getfsmap_rtdev_helper( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xfs_mount *mp = cur->bc_mp; + struct xfs_getfsmap_info *info = priv; + xfs_rtblock_t rtbno; + xfs_daddr_t rec_daddr; + + rtbno = xfs_rgbno_to_rtb(mp, cur->bc_ino.rtg->rtg_rgno, + rec->rm_startblock); + rec_daddr = xfs_rtb_to_daddr(mp, rtbno); + + return xfs_getfsmap_helper(cur->bc_tp, info, rec, rec_daddr); +} + +/* Actually query the rtrmap btree. 
*/ +STATIC int +xfs_getfsmap_rtdev_rmapbt_query( + struct xfs_trans *tp, + struct xfs_getfsmap_info *info, + struct xfs_btree_cur **curpp) +{ + struct xfs_mount *mp = tp->t_mountp; + + /* Report any gap at the end of the last rtgroup. */ + if (info->last) + return xfs_getfsmap_rtdev_helper(*curpp, &info->high, info); + + /* Query the rtrmapbt */ + xfs_rtgroup_lock(NULL, info->rtg, XFS_RTGLOCK_RMAP); + *curpp = xfs_rtrmapbt_init_cursor(mp, tp, info->rtg, + info->rtg->rtg_rmapip); + return xfs_rmap_query_range(*curpp, &info->low, &info->high, + xfs_getfsmap_rtdev_helper, info); +} + +/* Execute a getfsmap query against the realtime device rmapbt. */ +STATIC int +xfs_getfsmap_rtdev_rmapbt( + struct xfs_trans *tp, + const struct xfs_fsmap *keys, + struct xfs_getfsmap_info *info) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_rtgroup *rtg; + struct xfs_btree_cur *bt_cur = NULL; + xfs_fsblock_t start_fsb; + xfs_fsblock_t end_fsb; + xfs_rgnumber_t start_rg, end_rg; + uint64_t eofs; + int error = 0; + + eofs = XFS_FSB_TO_BB(mp, xfs_rtx_to_rtb(mp, mp->m_sb.sb_rextents)); + if (keys[0].fmr_physical >= eofs) + return 0; + start_fsb = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical); + end_fsb = XFS_BB_TO_FSB(mp, min(eofs - 1, keys[1].fmr_physical)); + + info->missing_owner = XFS_FMR_OWN_FREE; + + /* + * Convert the fsmap low/high keys to rtgroup based keys. Initialize + * low to the fsmap low key and max out the high key to the end + * of the rtgroup. 
+ */ + info->low.rm_startblock = xfs_rtb_to_rgbno(mp, start_fsb, &start_rg); + info->low.rm_offset = XFS_BB_TO_FSBT(mp, keys[0].fmr_offset); + error = xfs_fsmap_owner_to_rmap(&info->low, &keys[0]); + if (error) + return error; + info->low.rm_blockcount = 0; + xfs_getfsmap_set_irec_flags(&info->low, &keys[0]); + + info->high.rm_startblock = -1U; + info->high.rm_owner = ULLONG_MAX; + info->high.rm_offset = ULLONG_MAX; + info->high.rm_blockcount = 0; + info->high.rm_flags = XFS_RMAP_KEY_FLAGS | XFS_RMAP_REC_FLAGS; + + end_rg = xfs_rtb_to_rgno(mp, end_fsb); + + for_each_rtgroup_range(mp, start_rg, end_rg, rtg) { + /* + * Set the rtgroup high key from the fsmap high key if this + * is the last rtgroup that we're querying. + */ + info->rtg = rtg; + if (rtg->rtg_rgno == end_rg) { + xfs_rgnumber_t junk; + + info->high.rm_startblock = xfs_rtb_to_rgbno(mp, + end_fsb, &junk); + info->high.rm_offset = XFS_BB_TO_FSBT(mp, + keys[1].fmr_offset); + error = xfs_fsmap_owner_to_rmap(&info->high, &keys[1]); + if (error) + break; + xfs_getfsmap_set_irec_flags(&info->high, &keys[1]); + } + + if (bt_cur) { + xfs_rtgroup_unlock(bt_cur->bc_ino.rtg, + XFS_RTGLOCK_RMAP); + xfs_btree_del_cursor(bt_cur, XFS_BTREE_NOERROR); + bt_cur = NULL; + } + + trace_xfs_fsmap_low_key(mp, info->dev, rtg->rtg_rgno, + &info->low); + trace_xfs_fsmap_high_key(mp, info->dev, rtg->rtg_rgno, + &info->high); + + error = xfs_getfsmap_rtdev_rmapbt_query(tp, info, &bt_cur); + if (error) + break; + + /* + * Set the rtgroup low key to the start of the rtgroup prior to + * moving on to the next rtgroup. + */ + if (rtg->rtg_rgno == start_rg) { + info->low.rm_startblock = 0; + info->low.rm_owner = 0; + info->low.rm_offset = 0; + info->low.rm_flags = 0; + } + + /* + * If this is the last rtgroup, report any gap at the end of it + * before we drop the reference to the rtgroup when the loop + * terminates. 
+ */ + if (rtg->rtg_rgno == end_rg) { + info->last = true; + error = xfs_getfsmap_rtdev_rmapbt_query(tp, info, + &bt_cur); + if (error) + break; + } + info->rtg = NULL; + } + + if (bt_cur) { + xfs_rtgroup_unlock(bt_cur->bc_ino.rtg, XFS_RTGLOCK_RMAP); + xfs_btree_del_cursor(bt_cur, error < 0 ? XFS_BTREE_ERROR : + XFS_BTREE_NOERROR); + } + if (info->rtg) { + xfs_rtgroup_put(info->rtg); + info->rtg = NULL; + } else if (rtg) { + /* loop termination case */ + xfs_rtgroup_put(rtg); + } + + return error; } #endif /* CONFIG_XFS_RT */ @@ -881,7 +1040,10 @@ xfs_getfsmap( #ifdef CONFIG_XFS_RT if (mp->m_rtdev_targp) { handlers[2].dev = new_encode_dev(mp->m_rtdev_targp->bt_dev); - handlers[2].fn = xfs_getfsmap_rtdev_rtbitmap; + if (use_rmap) + handlers[2].fn = xfs_getfsmap_rtdev_rmapbt; + else + handlers[2].fn = xfs_getfsmap_rtdev_rtbitmap; } #endif /* CONFIG_XFS_RT */ @@ -959,6 +1121,7 @@ xfs_getfsmap( info.dev = handlers[i].dev; info.last = false; info.pag = NULL; + info.rtg = NULL; error = handlers[i].fn(tp, dkeys, &info); if (error) break; ^ permalink raw reply related [flat|nested] 565+ messages in thread
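The per-rtgroup key handling in xfs_getfsmap_rtdev_rmapbt can be sketched as splitting one absolute block range into group-relative windows. This is a simplified model, not the kernel code: it assumes fixed-size groups so that the group number and group-relative block fall out of a plain division and remainder, and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One query window, expressed in group-relative block numbers. */
struct sketch_rg_window {
	uint32_t rgno;	/* rt group number */
	uint32_t low;	/* first group-relative block to query */
	uint32_t high;	/* last block to query; UINT32_MAX means "to the end" */
};

/*
 * Split the absolute range [start, end] into per-group windows.  Only the
 * first group uses the caller's low key and only the last group uses the
 * caller's high key; every group in between is queried in full.
 * Returns the number of windows written to @out (at most @max).
 */
static inline int
sketch_split_range(uint64_t start, uint64_t end, uint32_t rgblocks,
		struct sketch_rg_window *out, int max)
{
	uint32_t start_rg, end_rg, rg;
	int n = 0;

	assert(rgblocks != 0);
	start_rg = (uint32_t)(start / rgblocks);
	end_rg = (uint32_t)(end / rgblocks);

	for (rg = start_rg; rg <= end_rg && n < max; rg++, n++) {
		out[n].rgno = rg;
		out[n].low = (rg == start_rg) ?
				(uint32_t)(start % rgblocks) : 0;
		out[n].high = (rg == end_rg) ?
				(uint32_t)(end % rgblocks) : UINT32_MAX;
	}
	return n;
}
```

This mirrors how the patch maxes out the high key up front, narrows it only when the loop reaches end_rg, and resets the low key to zero after the first group has been queried.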
* [PATCH 16/38] xfs: wire up rmap map and unmap to the realtime rmapbt 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (18 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 19/38] xfs: wire up getfsmap to the realtime reverse mapping btree Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 24/38] xfs: report realtime rmap btree corruption errors to the health system Darrick J. Wong ` (17 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Connect the map and unmap reverse-mapping operations to the realtime rmapbt via the deferred operation callbacks. This enables us to perform rmap operations against the correct btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rmap.c | 80 +++++++++++++++++++++++++++---------------- fs/xfs/libxfs/xfs_rtgroup.c | 9 +++++ fs/xfs/libxfs/xfs_rtgroup.h | 5 ++- 3 files changed, 63 insertions(+), 31 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 1a3607082d12..e3bff42d003d 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -25,6 +25,7 @@ #include "xfs_ag.h" #include "xfs_health.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" struct kmem_cache *xfs_rmap_intent_cache; @@ -2591,13 +2592,14 @@ xfs_rmap_finish_one_cleanup( struct xfs_btree_cur *rcur, int error) { - struct xfs_buf *agbp; + struct xfs_buf *agbp = NULL; if (rcur == NULL) return; - agbp = rcur->bc_ag.agbp; + if (rcur->bc_btnum == XFS_BTNUM_RMAP) + agbp = rcur->bc_ag.agbp; xfs_btree_del_cursor(rcur, error); - if (error) + if (error && agbp) xfs_trans_brelse(tp, agbp); } @@ -2633,6 +2635,17 @@ __xfs_rmap_finish_intent( } } +/* Does this btree cursor match the given group object? 
*/ +static inline bool +xfs_rmap_is_wrong_cursor( + struct xfs_btree_cur *cur, + struct xfs_rmap_intent *ri) +{ + if (cur->bc_btnum == XFS_BTNUM_RTRMAP) + return cur->bc_ino.rtg != ri->ri_rtg; + return cur->bc_ag.pag != ri->ri_pag; +} + /* * Process one of the deferred rmap operations. We pass back the * btree cursor to maintain our lock on the rmapbt between calls. @@ -2646,24 +2659,24 @@ xfs_rmap_finish_one( struct xfs_rmap_intent *ri, struct xfs_btree_cur **pcur) { + struct xfs_owner_info oinfo; struct xfs_mount *mp = tp->t_mountp; struct xfs_btree_cur *rcur; struct xfs_buf *agbp = NULL; - int error = 0; - struct xfs_owner_info oinfo; xfs_agblock_t bno; bool unwritten; - - if (ri->ri_realtime) { - /* coming in a subsequent patch */ - ASSERT(0); - return -EFSCORRUPTED; - } - - bno = XFS_FSB_TO_AGBNO(mp, ri->ri_bmap.br_startblock); + int error = 0; trace_xfs_rmap_deferred(mp, ri); + if (ri->ri_realtime) { + xfs_rgnumber_t rgno; + + bno = xfs_rtb_to_rgbno(mp, ri->ri_bmap.br_startblock, &rgno); + } else { + bno = XFS_FSB_TO_AGBNO(mp, ri->ri_bmap.br_startblock); + } + if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_RMAP_FINISH_ONE)) return -EIO; @@ -2672,35 +2685,42 @@ xfs_rmap_finish_one( * the startblock, get one now. */ rcur = *pcur; - if (rcur != NULL && rcur->bc_ag.pag != ri->ri_pag) { + if (rcur != NULL && xfs_rmap_is_wrong_cursor(rcur, ri)) { xfs_rmap_finish_one_cleanup(tp, rcur, 0); rcur = NULL; *pcur = NULL; } if (rcur == NULL) { - /* - * Refresh the freelist before we start changing the - * rmapbt, because a shape change could cause us to - * allocate blocks. 
- */ - error = xfs_free_extent_fix_freelist(tp, ri->ri_pag, &agbp); - if (error) { - xfs_ag_mark_sick(ri->ri_pag, XFS_SICK_AG_AGFL); - return error; - } - if (XFS_IS_CORRUPT(tp->t_mountp, !agbp)) { - xfs_ag_mark_sick(ri->ri_pag, XFS_SICK_AG_AGFL); - return -EFSCORRUPTED; - } + if (ri->ri_realtime) { + xfs_rtgroup_lock(tp, ri->ri_rtg, XFS_RTGLOCK_RMAP); + rcur = xfs_rtrmapbt_init_cursor(mp, tp, ri->ri_rtg, + ri->ri_rtg->rtg_rmapip); + rcur->bc_ino.flags = 0; + } else { + /* + * Refresh the freelist before we start changing the + * rmapbt, because a shape change could cause us to + * allocate blocks. + */ + error = xfs_free_extent_fix_freelist(tp, ri->ri_pag, + &agbp); + if (error) { + xfs_ag_mark_sick(ri->ri_pag, XFS_SICK_AG_AGFL); + return error; + } + if (XFS_IS_CORRUPT(tp->t_mountp, !agbp)) { + xfs_ag_mark_sick(ri->ri_pag, XFS_SICK_AG_AGFL); + return -EFSCORRUPTED; + } - rcur = xfs_rmapbt_init_cursor(mp, tp, agbp, ri->ri_pag); + rcur = xfs_rmapbt_init_cursor(mp, tp, agbp, ri->ri_pag); + } } *pcur = rcur; xfs_rmap_ino_owner(&oinfo, ri->ri_owner, ri->ri_whichfork, ri->ri_bmap.br_startoff); unwritten = ri->ri_bmap.br_state == XFS_EXT_UNWRITTEN; - bno = XFS_FSB_TO_AGBNO(rcur->bc_mp, ri->ri_bmap.br_startblock); error = __xfs_rmap_finish_intent(rcur, ri->ri_type, bno, ri->ri_bmap.br_blockcount, &oinfo, unwritten); diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index 4d9e2c0f2fd3..d6b790741265 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -515,6 +515,12 @@ xfs_rtgroup_lock( xfs_rtbitmap_lock(tp, rtg->rtg_mount); else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) xfs_rtbitmap_lock_shared(rtg->rtg_mount, XFS_RBMLOCK_BITMAP); + + if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg->rtg_rmapip) { + xfs_ilock(rtg->rtg_rmapip, XFS_ILOCK_EXCL); + if (tp) + xfs_trans_ijoin(tp, rtg->rtg_rmapip, XFS_ILOCK_EXCL); + } } /* Unlock metadata inodes associated with this rt group. 
*/ @@ -527,6 +533,9 @@ xfs_rtgroup_unlock( ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) || !(rtglock_flags & XFS_RTGLOCK_BITMAP)); + if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg->rtg_rmapip) + xfs_iunlock(rtg->rtg_rmapip, XFS_ILOCK_EXCL); + if (rtglock_flags & XFS_RTGLOCK_BITMAP) xfs_rtbitmap_unlock(rtg->rtg_mount); else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index 1792a9ab3bbf..3230dd03d8f8 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -220,9 +220,12 @@ int xfs_rtgroup_init_secondary_super(struct xfs_mount *mp, xfs_rgnumber_t rgno, #define XFS_RTGLOCK_BITMAP (1U << 0) /* Lock the rt bitmap inode in shared mode */ #define XFS_RTGLOCK_BITMAP_SHARED (1U << 1) +/* Lock the rt rmap inode in exclusive mode */ +#define XFS_RTGLOCK_RMAP (1U << 2) #define XFS_RTGLOCK_ALL_FLAGS (XFS_RTGLOCK_BITMAP | \ - XFS_RTGLOCK_BITMAP_SHARED) + XFS_RTGLOCK_BITMAP_SHARED | \ + XFS_RTGLOCK_RMAP) void xfs_rtgroup_lock(struct xfs_trans *tp, struct xfs_rtgroup *rtg, unsigned int rtglock_flags); ^ permalink raw reply related [flat|nested] 565+ messages in thread
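As a rough illustration of the extended XFS_RTGLOCK_* flag set, the sketch below validates a flags word the way the ASSERT in xfs_rtgroup_unlock implies: the exclusive and shared bitmap locks cannot be combined, while the new rmap lock can be added to either. The SKETCH_* macros copy the bit positions from the header hunk; the helper name is hypothetical, and rejecting unknown bits is an assumption about how XFS_RTGLOCK_ALL_FLAGS is meant to be used.

```c
#include <stdbool.h>

#define SKETCH_RTGLOCK_BITMAP		(1U << 0)	/* rt bitmap, exclusive */
#define SKETCH_RTGLOCK_BITMAP_SHARED	(1U << 1)	/* rt bitmap, shared */
#define SKETCH_RTGLOCK_RMAP		(1U << 2)	/* rt rmap inode, exclusive */

#define SKETCH_RTGLOCK_ALL_FLAGS	(SKETCH_RTGLOCK_BITMAP | \
					 SKETCH_RTGLOCK_BITMAP_SHARED | \
					 SKETCH_RTGLOCK_RMAP)

/* Return true if @flags is a combination the lock helpers could accept. */
static inline bool sketch_rtglock_flags_valid(unsigned int flags)
{
	/* Assumption: bits outside the known set are a caller bug. */
	if (flags & ~SKETCH_RTGLOCK_ALL_FLAGS)
		return false;

	/* The bitmap cannot be locked exclusively and shared at once. */
	if ((flags & SKETCH_RTGLOCK_BITMAP) &&
	    (flags & SKETCH_RTGLOCK_BITMAP_SHARED))
		return false;

	return true;
}
```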
* [PATCH 24/38] xfs: report realtime rmap btree corruption errors to the health system 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (19 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 16/38] xfs: wire up rmap map and unmap to the realtime rmapbt Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 23/38] xfs: add realtime rmap btree when adding rt volume Darrick J. Wong ` (16 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Whenever we encounter corrupt realtime rmap btree blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_fs.h | 1 + fs/xfs/libxfs/xfs_health.h | 4 +++- fs/xfs/libxfs/xfs_inode_fork.c | 4 +++- fs/xfs/libxfs/xfs_rtrmap_btree.c | 5 ++++- fs/xfs/xfs_health.c | 4 ++++ fs/xfs/xfs_rtalloc.c | 1 + 6 files changed, 16 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 7e9d7d7bb40b..5c557d5ff13e 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -313,6 +313,7 @@ struct xfs_rtgroup_geometry { }; #define XFS_RTGROUP_GEOM_SICK_SUPER (1 << 0) /* superblock */ #define XFS_RTGROUP_GEOM_SICK_BITMAP (1 << 1) /* rtbitmap for this group */ +#define XFS_RTGROUP_GEOM_SICK_RMAPBT (1 << 2) /* reverse mappings */ /* * Structures for XFS_IOC_FSGROWFSDATA, XFS_IOC_FSGROWFSLOG & XFS_IOC_FSGROWFSRT diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h index 44137c4983fc..d5976f6b0de1 100644 --- a/fs/xfs/libxfs/xfs_health.h +++ b/fs/xfs/libxfs/xfs_health.h @@ -67,6 +67,7 @@ struct xfs_rtgroup; #define XFS_SICK_RT_BITMAP (1 << 0) /* realtime bitmap */ #define XFS_SICK_RT_SUMMARY (1 << 1) /* realtime summary */ #define XFS_SICK_RT_SUPER (1 << 2) /* rt group superblock */ 
+#define XFS_SICK_RT_RMAPBT (1 << 3) /* reverse mappings */ /* Observable health issues for AG metadata. */ #define XFS_SICK_AG_SB (1 << 0) /* superblock */ @@ -104,7 +105,8 @@ struct xfs_rtgroup; #define XFS_SICK_RT_PRIMARY (XFS_SICK_RT_BITMAP | \ XFS_SICK_RT_SUMMARY | \ - XFS_SICK_RT_SUPER) + XFS_SICK_RT_SUPER | \ + XFS_SICK_RT_RMAPBT) #define XFS_SICK_AG_PRIMARY (XFS_SICK_AG_SB | \ XFS_SICK_AG_AGF | \ diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 94979bed8f32..61926c07aad3 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -261,8 +261,10 @@ xfs_iformat_data_fork( case XFS_DINODE_FMT_BTREE: return xfs_iformat_btree(ip, dip, XFS_DATA_FORK); case XFS_DINODE_FMT_RMAP: - if (!xfs_has_rtrmapbt(ip->i_mount)) + if (!xfs_has_rtrmapbt(ip->i_mount)) { + xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE); return -EFSCORRUPTED; + } return xfs_iformat_rtrmap(ip, dip); default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 9181fca2ba54..2d8130b4c187 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -28,6 +28,7 @@ #include "xfs_rtgroup.h" #include "xfs_bmap.h" #include "xfs_imeta.h" +#include "xfs_health.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -800,8 +801,10 @@ xfs_iformat_rtrmap( level = be16_to_cpu(dfp->bb_level); if (level > mp->m_rtrmap_maxlevels || - xfs_rtrmap_droot_space_calc(level, numrecs) > dsize) + xfs_rtrmap_droot_space_calc(level, numrecs) > dsize) { + xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE); return -EFSCORRUPTED; + } xfs_iroot_alloc(ip, XFS_DATA_FORK, xfs_rtrmap_broot_space_calc(mp, level, numrecs)); diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c index 33f332ee8044..80cc735b52d1 100644 --- a/fs/xfs/xfs_health.c +++ b/fs/xfs/xfs_health.c @@ -531,6 +531,7 @@ xfs_ag_geom_health( static const struct ioctl_sick_map rtgroup_map[] = { { 
XFS_SICK_RT_SUPER, XFS_RTGROUP_GEOM_SICK_SUPER }, { XFS_SICK_RT_BITMAP, XFS_RTGROUP_GEOM_SICK_BITMAP }, + { XFS_SICK_RT_RMAPBT, XFS_RTGROUP_GEOM_SICK_RMAPBT }, { 0, 0 }, }; @@ -630,6 +631,9 @@ xfs_btree_mark_sick( case XFS_BTNUM_BMAP: xfs_bmap_mark_sick(cur->bc_ino.ip, cur->bc_ino.whichfork); return; + case XFS_BTNUM_RTRMAP: + xfs_rtgroup_mark_sick(cur->bc_ino.rtg, XFS_SICK_RT_RMAPBT); + return; case XFS_BTNUM_BNO: mask = XFS_SICK_AG_BNOBT; break; diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 45c388ad4c1f..0f31680284fb 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1806,6 +1806,7 @@ xfs_rtmount_rmapbt( goto out_path; if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_RMAP)) { + xfs_rtgroup_mark_sick(rtg, XFS_SICK_RT_RMAPBT); error = -EFSCORRUPTED; goto out_rele; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
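The rtgroup_map table extended by this patch translates in-core sick bits into the flags reported through the rtgroup geometry ioctl. A minimal, table-driven sketch of that translation follows. The bit values are copied from the hunks above; the SKETCH_* names and the helper are hypothetical, and the real health code also tracks which metadata has been checked, which this model ignores.

```c
/* In-core health bits (values as in the xfs_health.h hunk). */
#define SKETCH_SICK_RT_BITMAP		(1U << 0)
#define SKETCH_SICK_RT_SUMMARY		(1U << 1)
#define SKETCH_SICK_RT_SUPER		(1U << 2)
#define SKETCH_SICK_RT_RMAPBT		(1U << 3)

/* Userspace-visible bits (values as in the xfs_fs.h hunk). */
#define SKETCH_GEOM_SICK_SUPER		(1U << 0)
#define SKETCH_GEOM_SICK_BITMAP		(1U << 1)
#define SKETCH_GEOM_SICK_RMAPBT		(1U << 2)

struct sketch_sick_map {
	unsigned int	sick_mask;	/* in-core bit */
	unsigned int	ioctl_mask;	/* bit reported to userspace */
};

/* Zero-terminated, like the kernel's ioctl_sick_map tables. */
static const struct sketch_sick_map sketch_rtgroup_map[] = {
	{ SKETCH_SICK_RT_SUPER,		SKETCH_GEOM_SICK_SUPER },
	{ SKETCH_SICK_RT_BITMAP,	SKETCH_GEOM_SICK_BITMAP },
	{ SKETCH_SICK_RT_RMAPBT,	SKETCH_GEOM_SICK_RMAPBT },
	{ 0, 0 },
};

/* Translate an in-core sick mask into the ioctl flag word. */
static inline unsigned int sketch_sick_to_ioctl(unsigned int sick)
{
	const struct sketch_sick_map *m;
	unsigned int out = 0;

	for (m = sketch_rtgroup_map; m->sick_mask; m++) {
		if (sick & m->sick_mask)
			out |= m->ioctl_mask;
	}
	return out;
}
```

Note that the two bit spaces are numbered differently (the superblock is bit 2 in-core but bit 0 in the ioctl), which is why a table is needed rather than a direct copy.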
* [PATCH 23/38] xfs: add realtime rmap btree when adding rt volume 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (20 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 24/38] xfs: report realtime rmap btree corruption errors to the health system Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 26/38] xfs: allow queued realtime intents to drain before scrubbing Darrick J. Wong ` (15 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> If we're adding enough space to the realtime section to require the creation of new realtime groups, create the rt rmap btree inode before we start adding the space. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_rtalloc.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 98 insertions(+), 2 deletions(-) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 7b7e22b36d48..45c388ad4c1f 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -28,6 +28,8 @@ #include "xfs_rtgroup.h" #include "xfs_quota.h" #include "xfs_error.h" +#include "xfs_btree.h" +#include "xfs_rmap.h" #include "xfs_rtrmap_btree.h" /* @@ -1049,6 +1051,87 @@ xfs_growfs_rt_init_primary( return 0; } +/* Add a metadata inode for a realtime rmap btree. 
*/ +static int +xfs_growfsrt_create_rtrmap( + struct xfs_rtgroup *rtg) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_imeta_update upd; + struct xfs_rmap_irec rmap = { + .rm_startblock = 0, + .rm_blockcount = mp->m_sb.sb_rextsize, + .rm_owner = XFS_RMAP_OWN_FS, + .rm_offset = 0, + .rm_flags = 0, + }; + struct xfs_btree_cur *cur; + struct xfs_imeta_path *path; + struct xfs_trans *tp; + struct xfs_inode *ip = NULL; + int error; + + if (!xfs_has_rtrmapbt(mp) || rtg->rtg_rmapip) + return 0; + + error = xfs_rtrmapbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + return error; + + error = xfs_imeta_ensure_dirpath(mp, path); + if (error) + goto out_path; + + error = xfs_imeta_start_update(mp, path, &upd); + if (error) + goto out_path; + + error = xfs_qm_dqattach(upd.dp); + if (error) + goto out_upd; + + error = xfs_trans_alloc(mp, &M_RES(mp)->tr_imeta_create, + xfs_imeta_create_space_res(mp), 0, 0, &tp); + if (error) + goto out_end; + + error = xfs_rtrmapbt_create(&tp, path, &upd, &ip); + if (error) + goto out_cancel; + + lockdep_set_class(&ip->i_lock.mr_lock, &xfs_rrmapip_key); + + cur = xfs_rtrmapbt_init_cursor(mp, tp, rtg, ip); + error = xfs_rmap_map_raw(cur, &rmap); + xfs_btree_del_cursor(cur, error); + if (error) + goto out_cancel; + + error = xfs_trans_commit(tp); + if (error) + goto out_end; + + xfs_imeta_end_update(mp, &upd, error); + xfs_imeta_free_path(path); + xfs_finish_inode_setup(ip); + rtg->rtg_rmapip = ip; + return 0; + +out_cancel: + xfs_trans_cancel(tp); +out_end: + /* Have to finish setting up the inode to ensure it's deleted. */ + if (ip) { + xfs_finish_inode_setup(ip); + xfs_irele(ip); + } +out_upd: + xfs_imeta_end_update(mp, &upd, error); +out_path: + xfs_imeta_free_path(path); + return error; +} + /* * Check that changes to the realtime geometry won't affect the minimum * log size, which would cause the fs to become unusable. @@ -1155,7 +1238,9 @@ xfs_growfs_rt( return -EINVAL; /* Unsupported realtime features. 
*/ - if (xfs_has_rmapbt(mp) || xfs_has_reflink(mp) || xfs_has_quota(mp)) + if (!xfs_has_rtgroups(mp) && xfs_has_rmapbt(mp)) + return -EOPNOTSUPP; + if (xfs_has_reflink(mp) || xfs_has_quota(mp)) return -EOPNOTSUPP; nrblocks = in->newblocks; @@ -1278,10 +1363,21 @@ xfs_growfs_rt( nsbp->sb_rbmblocks); nmp->m_rsumsize = nrsumsize = XFS_FSB_TO_B(mp, nrsumblocks); - if (xfs_has_rtgroups(mp)) + if (xfs_has_rtgroups(mp)) { + xfs_rgnumber_t rgno = last_rgno; + nsbp->sb_rgcount = howmany_64(nsbp->sb_rblocks, nsbp->sb_rgblocks); + for_each_rtgroup_range(mp, rgno, nsbp->sb_rgcount, rtg) { + error = xfs_growfsrt_create_rtrmap(rtg); + if (error) { + xfs_rtgroup_put(rtg); + break; + } + } + } + /* * Start a transaction, get the log reservation. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
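The growfs hunk recomputes sb_rgcount with howmany_64 and then walks the groups that may need a new rmap btree inode. The arithmetic can be sketched as below. The helper names are hypothetical, and the rounding is written as a quotient plus remainder test rather than the kernel's add-then-divide form, which gives the same result when nothing overflows.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division: how many units of size @y are needed to cover @x. */
static inline uint64_t sketch_howmany_64(uint64_t x, uint64_t y)
{
	assert(y != 0);
	return x / y + (x % y != 0);
}

/*
 * Number of rt groups that come into existence when the rt volume grows
 * from @old_rblocks to @new_rblocks, with @rgblocks blocks per group.
 */
static inline uint64_t
sketch_new_rtgroups(uint64_t old_rblocks, uint64_t new_rblocks,
		uint64_t rgblocks)
{
	uint64_t old_count = sketch_howmany_64(old_rblocks, rgblocks);
	uint64_t new_count = sketch_howmany_64(new_rblocks, rgblocks);

	return new_count > old_count ? new_count - old_count : 0;
}
```

For example, growing from 250 to 1001 blocks with 100-block groups raises the group count from 3 to 11, so eight groups would each need an rmap inode before the new space is added.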
* [PATCH 26/38] xfs: allow queued realtime intents to drain before scrubbing 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (21 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 23/38] xfs: add realtime rmap btree when adding rt volume Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 22/38] xfs: check that the rtrmapbt maxlevels doesn't increase when growing fs Darrick J. Wong ` (14 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When a writer thread executes a chain of log intent items for the realtime volume, the ILOCKs taken during each step are for each rt metadata file, not the entire rt volume itself. Although scrub takes all rt metadata ILOCKs, this isn't sufficient to guard against scrub checking the rt volume while that writer thread is in the middle of finishing a chain because there's no higher level locking primitive guarding the realtime volume. When there's a collision, cross-referencing between data structures (e.g. rtrmapbt and rtrefcountbt) yields false corruption events; if repair is running, this results in incorrect repairs, which is catastrophic. Fix this by adding to the mount structure the same drain that we use to protect scrub against concurrent AG updates, but this time for the realtime volume. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtgroup.c | 3 ++ fs/xfs/libxfs/xfs_rtgroup.h | 9 +++++ fs/xfs/scrub/common.c | 76 ++++++++++++++++++++++++++++++++++++++++--- fs/xfs/scrub/rtbitmap.c | 3 ++ fs/xfs/xfs_bmap_item.c | 5 ++- fs/xfs/xfs_drain.c | 41 +++++++++++++++++++++++ fs/xfs/xfs_drain.h | 19 +++++++++++ fs/xfs/xfs_extfree_item.c | 2 + fs/xfs/xfs_mount.h | 1 + fs/xfs/xfs_rmap_item.c | 2 + fs/xfs/xfs_trace.h | 32 ++++++++++++++++++ 11 files changed, 186 insertions(+), 7 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index d6b790741265..e40806c84256 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -132,6 +132,8 @@ xfs_initialize_rtgroups( #ifdef __KERNEL__ /* Place kernel structure only init below this point. */ spin_lock_init(&rtg->rtg_state_lock); + xfs_drain_init(&rtg->rtg_intents); + #endif /* __KERNEL__ */ /* first new rtg is fully initialized */ @@ -183,6 +185,7 @@ xfs_free_rtgroups( spin_unlock(&mp->m_rtgroup_lock); ASSERT(rtg); XFS_IS_CORRUPT(rtg->rtg_mount, atomic_read(&rtg->rtg_ref) != 0); + xfs_drain_free(&rtg->rtg_intents); call_rcu(&rtg->rcu_head, __xfs_free_rtgroups); } diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index 3230dd03d8f8..1d41a2cac34f 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -37,6 +37,15 @@ struct xfs_rtgroup { #ifdef __KERNEL__ /* -- kernel only structures below this line -- */ spinlock_t rtg_state_lock; + + /* + * We use xfs_drain to track the number of deferred log intent items + * that have been queued (but not yet processed) so that waiters (e.g. + * scrub) will not lock resources when other threads are in the middle + * of processing a chain of intent items only to find momentary + * inconsistencies. 
+ */ + struct xfs_drain rtg_intents; #endif /* __KERNEL__ */ }; diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index b63b5c016841..bb1d9ca20374 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -757,12 +757,78 @@ xchk_rt_unlock( } #ifdef CONFIG_XFS_RT +/* Lock all the rt group metadata inode ILOCKs and wait for intents. */ +static int +xchk_rtgroup_lock( + struct xfs_scrub *sc, + struct xchk_rt *sr, + unsigned int rtglock_flags) +{ + int error = 0; + + ASSERT(sr->rtg != NULL); + + /* + * If we're /only/ locking the rtbitmap in shared mode, then we're + * obviously not trying to compare records in two metadata inodes. + * There's no need to drain intents here because the caller (most + * likely the rgsuper scanner) doesn't need that level of consistency. + */ + if (rtglock_flags == XFS_RTGLOCK_BITMAP_SHARED) { + xfs_rtgroup_lock(NULL, sr->rtg, rtglock_flags); + sr->rtlock_flags = rtglock_flags; + return 0; + } + + do { + if (xchk_should_terminate(sc, &error)) + return error; + + xfs_rtgroup_lock(NULL, sr->rtg, rtglock_flags); + + /* + * Decide if the rt group is quiet enough for all metadata to + * be consistent with each other. Regular file IO doesn't get + * to lock all the rt inodes at the same time, which means that + * there could be other threads in the middle of processing a + * chain of deferred ops. + * + * We just locked all the metadata inodes for this rt group; + * now take a look to see if there are any intents in progress. + * If there are, drop the rt group inode locks and wait for the + * intents to drain. Since we hold the rt group inode locks + * for the duration of the scrub, this is the only time we have + * to sample the intents counter; any threads increasing it + * after this point can't possibly be in the middle of a chain + * of rt metadata updates. + * + * Obviously, this should be slanted against scrub and in favor + * of runtime threads. 
+ */ + if (!xfs_rtgroup_intents_busy(sr->rtg)) { + sr->rtlock_flags = rtglock_flags; + return 0; + } + + xfs_rtgroup_unlock(sr->rtg, rtglock_flags); + + if (!(sc->flags & XCHK_FSHOOKS_DRAIN)) + return -ECHRNG; + error = xfs_rtgroup_drain_intents(sr->rtg); + if (error == -ERESTARTSYS) + error = -EINTR; + } while (!error); + + return error; +} + /* * For scrubbing a realtime group, grab all the in-core resources we'll need to * check the metadata, which means taking the ILOCK of the realtime group's - * metadata inodes. Callers must not join these inodes to the transaction with - * non-zero lockflags or concurrency problems will result. The @rtglock_flags - * argument takes XFS_RTGLOCK_* flags. + * metadata inodes and draining any running intent chains. Callers must not + * join these inodes to the transaction with non-zero lockflags or concurrency + * problems will result. The @rtglock_flags argument takes XFS_RTGLOCK_* + * flags. */ int xchk_rtgroup_init( @@ -778,9 +844,7 @@ xchk_rtgroup_init( if (!sr->rtg) return -ENOENT; - xfs_rtgroup_lock(NULL, sr->rtg, rtglock_flags); - sr->rtlock_flags = rtglock_flags; - return 0; + return xchk_rtgroup_lock(sc, sr, rtglock_flags); } /* diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index d847773e5f66..a034f2d392f5 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -26,6 +26,9 @@ xchk_setup_rgbitmap( { int error; + if (xchk_need_fshook_drain(sc)) + xchk_fshooks_enable(sc, XCHK_FSHOOKS_DRAIN); + error = xchk_trans_alloc(sc, 0); if (error) return error; diff --git a/fs/xfs/xfs_bmap_item.c b/fs/xfs/xfs_bmap_item.c index 04eeae9aef79..e2e7e5f678e9 100644 --- a/fs/xfs/xfs_bmap_item.c +++ b/fs/xfs/xfs_bmap_item.c @@ -369,6 +369,7 @@ xfs_bmap_update_get_group( rgno = xfs_rtb_to_rgno(mp, bi->bi_bmap.br_startblock); bi->bi_rtg = xfs_rtgroup_get(mp, rgno); + xfs_rtgroup_bump_intents(bi->bi_rtg); } else { bi->bi_rtg = NULL; } @@ -395,8 +396,10 @@ xfs_bmap_update_put_group( struct xfs_bmap_intent *bi) { 
if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork)) { - if (xfs_has_rtgroups(bi->bi_owner->i_mount)) + if (xfs_has_rtgroups(bi->bi_owner->i_mount)) { + xfs_rtgroup_drop_intents(bi->bi_rtg); xfs_rtgroup_put(bi->bi_rtg); + } return; } diff --git a/fs/xfs/xfs_drain.c b/fs/xfs/xfs_drain.c index 9b463e1183f6..4fda4cd096fa 100644 --- a/fs/xfs/xfs_drain.c +++ b/fs/xfs/xfs_drain.c @@ -11,6 +11,7 @@ #include "xfs_mount.h" #include "xfs_ag.h" #include "xfs_trace.h" +#include "xfs_rtgroup.h" /* * Use a static key here to reduce the overhead of xfs_drain_drop. If the @@ -119,3 +120,43 @@ xfs_perag_intents_busy( { return xfs_drain_busy(&pag->pag_intents); } + +#ifdef CONFIG_XFS_RT +/* Add an item to the pending count. */ +void +xfs_rtgroup_bump_intents( + struct xfs_rtgroup *rtg) +{ + trace_xfs_rtgroup_bump_intents(rtg, __return_address); + xfs_drain_bump(&rtg->rtg_intents); +} + +/* Remove an item from the pending count. */ +void +xfs_rtgroup_drop_intents( + struct xfs_rtgroup *rtg) +{ + trace_xfs_rtgroup_drop_intents(rtg, __return_address); + xfs_drain_drop(&rtg->rtg_intents); +} + +/* + * Wait for the pending intent count for realtime metadata to hit zero. + * Callers must not hold any rt metadata inode locks. + */ +int +xfs_rtgroup_drain_intents( + struct xfs_rtgroup *rtg) +{ + trace_xfs_rtgroup_wait_intents(rtg, __return_address); + return xfs_drain_wait(&rtg->rtg_intents); +} + +/* Might someone else be processing intents for this rt group? 
*/ +bool +xfs_rtgroup_intents_busy( + struct xfs_rtgroup *rtg) +{ + return xfs_drain_busy(&rtg->rtg_intents); +} +#endif /* CONFIG_XFS_RT */ diff --git a/fs/xfs/xfs_drain.h b/fs/xfs/xfs_drain.h index a980df6d3508..478ffab95b0f 100644 --- a/fs/xfs/xfs_drain.h +++ b/fs/xfs/xfs_drain.h @@ -7,6 +7,7 @@ #define XFS_DRAIN_H_ struct xfs_perag; +struct xfs_rtgroup; #ifdef CONFIG_XFS_DRAIN_INTENTS /* @@ -60,12 +61,27 @@ void xfs_drain_wait_enable(void); * All functions that create work items must increment the intent counter as * soon as the item is added to the transaction and cannot drop the counter * until the item is finished or cancelled. + * + * The same principles apply to realtime groups because the rt metadata inode + * ILOCKs are not held across transaction rolls. */ void xfs_perag_bump_intents(struct xfs_perag *pag); void xfs_perag_drop_intents(struct xfs_perag *pag); int xfs_perag_drain_intents(struct xfs_perag *pag); bool xfs_perag_intents_busy(struct xfs_perag *pag); + +#ifdef CONFIG_XFS_RT +void xfs_rtgroup_bump_intents(struct xfs_rtgroup *rtg); +void xfs_rtgroup_drop_intents(struct xfs_rtgroup *rtg); + +int xfs_rtgroup_drain_intents(struct xfs_rtgroup *rtg); +bool xfs_rtgroup_intents_busy(struct xfs_rtgroup *rtg); +#else +static inline void xfs_rtgroup_bump_intents(struct xfs_rtgroup *rtg) { } +static inline void xfs_rtgroup_drop_intents(struct xfs_rtgroup *rtg) { } +#endif /* CONFIG_XFS_RT */ + #else struct xfs_drain { /* empty */ }; @@ -75,6 +91,9 @@ struct xfs_drain { /* empty */ }; static inline void xfs_perag_bump_intents(struct xfs_perag *pag) { } static inline void xfs_perag_drop_intents(struct xfs_perag *pag) { } +static inline void xfs_rtgroup_bump_intents(struct xfs_rtgroup *rtg) { } +static inline void xfs_rtgroup_drop_intents(struct xfs_rtgroup *rtg) { } + #endif /* CONFIG_XFS_DRAIN_INTENTS */ #endif /* XFS_DRAIN_H_ */ diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c index 42b89c9e996b..e2e888bc1b1c 100644 --- 
a/fs/xfs/xfs_extfree_item.c +++ b/fs/xfs/xfs_extfree_item.c @@ -491,6 +491,7 @@ xfs_extent_free_get_group( rgno = xfs_rtb_to_rgno(mp, xefi->xefi_startblock); xefi->xefi_rtg = xfs_rtgroup_get(mp, rgno); + xfs_rtgroup_bump_intents(xefi->xefi_rtg); return; } @@ -505,6 +506,7 @@ xfs_extent_free_put_group( struct xfs_extent_free_item *xefi) { if (xfs_efi_is_realtime(xefi)) { + xfs_rtgroup_drop_intents(xefi->xefi_rtg); xfs_rtgroup_put(xefi->xefi_rtg); return; } diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index a565b1b1372a..b1ffab4cb9cd 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -13,6 +13,7 @@ struct xfs_ail; struct xfs_quotainfo; struct xfs_da_geometry; struct xfs_perag; +struct xfs_rtgroup; /* dynamic preallocation free space thresholds, 5% down to 1% */ enum { diff --git a/fs/xfs/xfs_rmap_item.c b/fs/xfs/xfs_rmap_item.c index a2949f818e0c..a95783622adb 100644 --- a/fs/xfs/xfs_rmap_item.c +++ b/fs/xfs/xfs_rmap_item.c @@ -400,6 +400,7 @@ xfs_rmap_update_get_group( rgno = xfs_rtb_to_rgno(mp, ri->ri_bmap.br_startblock); ri->ri_rtg = xfs_rtgroup_get(mp, rgno); + xfs_rtgroup_bump_intents(ri->ri_rtg); return; } @@ -414,6 +415,7 @@ xfs_rmap_update_put_group( struct xfs_rmap_intent *ri) { if (ri->ri_realtime) { + xfs_rtgroup_drop_intents(ri->ri_rtg); xfs_rtgroup_put(ri->ri_rtg); return; } diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index d90e9183dfc7..a6de7b6e4afd 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -4872,6 +4872,38 @@ DEFINE_PERAG_INTENTS_EVENT(xfs_perag_bump_intents); DEFINE_PERAG_INTENTS_EVENT(xfs_perag_drop_intents); DEFINE_PERAG_INTENTS_EVENT(xfs_perag_wait_intents); +#ifdef CONFIG_XFS_RT +DECLARE_EVENT_CLASS(xfs_rtgroup_intents_class, + TP_PROTO(struct xfs_rtgroup *rtg, void *caller_ip), + TP_ARGS(rtg, caller_ip), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(long, nr_intents) + __field(void *, caller_ip) + ), + TP_fast_assign( + __entry->dev = rtg->rtg_mount->m_super->s_dev; + 
__entry->rtdev = rtg->rtg_mount->m_rtdev_targp->bt_dev; + __entry->nr_intents = atomic_read(&rtg->rtg_intents.dr_count); + __entry->caller_ip = caller_ip; + ), + TP_printk("dev %d:%d rtdev %d:%d intents %ld caller %pS", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->nr_intents, + __entry->caller_ip) +); + +#define DEFINE_RTGROUP_INTENTS_EVENT(name) \ +DEFINE_EVENT(xfs_rtgroup_intents_class, name, \ + TP_PROTO(struct xfs_rtgroup *rtg, void *caller_ip), \ + TP_ARGS(rtg, caller_ip)) +DEFINE_RTGROUP_INTENTS_EVENT(xfs_rtgroup_bump_intents); +DEFINE_RTGROUP_INTENTS_EVENT(xfs_rtgroup_drop_intents); +DEFINE_RTGROUP_INTENTS_EVENT(xfs_rtgroup_wait_intents); +#endif /* CONFIG_XFS_RT */ + #endif /* CONFIG_XFS_DRAIN_INTENTS */ TRACE_EVENT(xfs_swapext_overhead, ^ permalink raw reply related [flat|nested] 565+ messages in thread
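For readers following the drain logic in the patch above, here is a minimal userspace sketch of the "lock, sample the intent counter once, back off if busy" pattern. It is not the kernel code: `toy_drain`, `toy_rtgroup`, and every `toy_*` function are hypothetical stand-ins, the lock is modelled as a plain flag, and the sketch returns -EBUSY where the patch returns -ECHRNG (or sleeps in xfs_drain_wait when XCHK_FSHOOKS_DRAIN is set).

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct xfs_drain: only the pending-intent counter. */
struct toy_drain {
	atomic_int		count;
};

/* Toy stand-in for an rt group: a "locked" flag plus its drain. */
struct toy_rtgroup {
	bool			locked;
	struct toy_drain	intents;
};

/* Called when an intent item is added to a transaction. */
void toy_bump_intents(struct toy_rtgroup *rtg)
{
	atomic_fetch_add(&rtg->intents.count, 1);
}

/* Called when the intent item is finished or cancelled. */
void toy_drop_intents(struct toy_rtgroup *rtg)
{
	int	old = atomic_fetch_sub(&rtg->intents.count, 1);

	/* Dropping more often than bumping would be a counting bug. */
	assert(old > 0);
}

/* Might another thread be in the middle of an intent chain? */
bool toy_intents_busy(struct toy_rtgroup *rtg)
{
	return atomic_load(&rtg->intents.count) > 0;
}

/*
 * Take the (toy) lock, then sample the counter exactly once.  If intents
 * are pending, drop the lock and tell the caller to retry later.  If the
 * group is quiet, keep the lock; anything bumping the counter afterwards
 * cannot be midway through a chain that touches the locked metadata.
 */
int toy_scrub_lock(struct toy_rtgroup *rtg)
{
	rtg->locked = true;
	if (!toy_intents_busy(rtg))
		return 0;

	rtg->locked = false;
	return -EBUSY;
}
```

A caller would loop on toy_scrub_lock() until it returns 0, waiting between attempts; the real code waits on the drain's wait queue instead of polling.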
* [PATCH 22/38] xfs: check that the rtrmapbt maxlevels doesn't increase when growing fs 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (22 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 26/38] xfs: allow queued realtime intents to drain before scrubbing Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 21/38] xfs: fix getfsmap reporting past the last rt extent Darrick J. Wong ` (13 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> The size of filesystem transaction reservations depends on the maximum height (maxlevels) of the realtime btrees. Since we don't want a grow operation to increase the reservation size enough that we'll fail the minimum log size checks on the next mount, constrain growfs operations if they would cause an increase in those maxlevels. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_fsops.c | 12 ++++++++++ fs/xfs/xfs_rtalloc.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++- fs/xfs/xfs_rtalloc.h | 6 +++++ fs/xfs/xfs_trace.h | 21 +++++++++++++++++ 4 files changed, 101 insertions(+), 1 deletion(-) diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c index 9770916acd69..65b44ad8884e 100644 --- a/fs/xfs/xfs_fsops.c +++ b/fs/xfs/xfs_fsops.c @@ -23,6 +23,7 @@ #include "xfs_trace.h" #include "xfs_rtgroup.h" #include "xfs_rtalloc.h" +#include "xfs_rtrmap_btree.h" /* * Write new AG headers to disk. Non-transactional, but need to be @@ -115,6 +116,13 @@ xfs_growfs_data_private( xfs_buf_relse(bp); } + /* Make sure the new fs size won't cause problems with the log. 
*/ + error = xfs_growfs_check_rtgeom(mp, nb, mp->m_sb.sb_rblocks, + mp->m_sb.sb_rextsize, mp->m_sb.sb_rextents, + mp->m_sb.sb_rbmblocks, mp->m_sb.sb_rextslog); + if (error) + return error; + nb_div = nb; nb_mod = do_div(nb_div, mp->m_sb.sb_agblocks); nagcount = nb_div + (nb_mod != 0); @@ -214,7 +222,11 @@ xfs_growfs_data_private( error = xfs_fs_reserve_ag_blocks(mp); if (error == -ENOSPC) error = 0; + + /* Compute new maxlevels for rt btrees. */ + xfs_rtrmapbt_compute_maxlevels(mp); } + return error; out_trans_cancel: diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index c3d27cb85c26..7b7e22b36d48 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1049,6 +1049,57 @@ xfs_growfs_rt_init_primary( return 0; } +/* + * Check that changes to the realtime geometry won't affect the minimum + * log size, which would cause the fs to become unusable. + */ +int +xfs_growfs_check_rtgeom( + const struct xfs_mount *mp, + xfs_rfsblock_t dblocks, + xfs_rfsblock_t rblocks, + xfs_agblock_t rextsize, + xfs_rtblock_t rextents, + xfs_extlen_t rbmblocks, + uint8_t rextslog) +{ + struct xfs_mount *fake_mp; + int min_logfsbs; + + fake_mp = kmem_alloc(sizeof(struct xfs_mount), KM_MAYFAIL); + if (!fake_mp) + return -ENOMEM; + + /* + * Create a dummy xfs_mount with the new rt geometry, and compute the + * new minimum log size. This ensures that the log is big enough to + * handle the larger transactions that we could start sending. 
+ */ + memcpy(fake_mp, mp, sizeof(struct xfs_mount)); + + fake_mp->m_sb.sb_dblocks = dblocks; + fake_mp->m_sb.sb_rblocks = rblocks; + fake_mp->m_sb.sb_rextents = rextents; + fake_mp->m_sb.sb_rextsize = rextsize; + fake_mp->m_sb.sb_rbmblocks = rbmblocks; + fake_mp->m_sb.sb_rextslog = rextslog; + if (rblocks > 0) + fake_mp->m_features |= XFS_FEAT_REALTIME; + + xfs_rtrmapbt_compute_maxlevels(fake_mp); + + xfs_trans_resv_calc(fake_mp, M_RES(fake_mp)); + min_logfsbs = xfs_log_calc_minimum_size(fake_mp); + trace_xfs_growfs_check_rtgeom(mp, min_logfsbs); + + kmem_free(fake_mp); + + if (mp->m_sb.sb_logblocks < min_logfsbs) + return -ENOSPC; + + return 0; +} + /* * Grow the realtime area of the filesystem. */ @@ -1139,6 +1190,12 @@ xfs_growfs_rt( if (nrsumblocks > (mp->m_sb.sb_logblocks >> 1)) return -EINVAL; + /* Make sure the new fs size won't cause problems with the log. */ + error = xfs_growfs_check_rtgeom(mp, mp->m_sb.sb_dblocks, nrblocks, + in->extsize, nrextents, nrbmblocks, nrextslog); + if (error) + return error; + /* Allocate the new rt group structures */ if (xfs_has_rtgroups(mp)) { /* @@ -1313,8 +1370,12 @@ xfs_growfs_rt( rtg->rtg_blockcount = xfs_rtgroup_block_count(mp, rtg->rtg_rgno); - /* Ensure the mount RT feature flag is now set. */ + /* + * Ensure the mount RT feature flag is now set, and compute new + * maxlevels for rt btrees. 
+ */ mp->m_features |= XFS_FEAT_REALTIME; + xfs_rtrmapbt_compute_maxlevels(mp); } if (error) goto out_free; diff --git a/fs/xfs/xfs_rtalloc.h b/fs/xfs/xfs_rtalloc.h index 873ebac239dd..35737a09cdb9 100644 --- a/fs/xfs/xfs_rtalloc.h +++ b/fs/xfs/xfs_rtalloc.h @@ -84,6 +84,11 @@ xfs_growfs_rt( int xfs_rtalloc_reinit_frextents(struct xfs_mount *mp); int xfs_rtfile_convert_unwritten(struct xfs_inode *ip, loff_t pos, uint64_t len); + +int xfs_growfs_check_rtgeom(const struct xfs_mount *mp, xfs_rfsblock_t dblocks, + xfs_rfsblock_t rblocks, xfs_agblock_t rextsize, + xfs_rtblock_t rextents, xfs_extlen_t rbmblocks, + uint8_t rextslog); #else # define xfs_rtallocate_extent(t,b,min,max,l,f,p,rb) (-ENOSYS) # define xfs_rtpick_extent(m,t,l,rb) (-ENOSYS) @@ -107,6 +112,7 @@ xfs_rtmount_init( # define xfs_rt_resv_free(mp) ((void)0) # define xfs_rt_resv_init(mp) (0) # define xfs_rtmount_dqattach(mp) (0) +# define xfs_growfs_check_rtgeom(mp, d, r, rs, rx, rb, rl) (0) #endif /* CONFIG_XFS_RT */ #endif /* __XFS_RTALLOC_H__ */ diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 77f4acc1b923..d90e9183dfc7 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -5196,6 +5196,27 @@ DEFINE_IMETA_RESV_EVENT(xfs_imeta_resv_free_extent); DEFINE_IMETA_RESV_EVENT(xfs_imeta_resv_critical); DEFINE_INODE_ERROR_EVENT(xfs_imeta_resv_init_error); +#ifdef CONFIG_XFS_RT +TRACE_EVENT(xfs_growfs_check_rtgeom, + TP_PROTO(const struct xfs_mount *mp, unsigned int min_logfsbs), + TP_ARGS(mp, min_logfsbs), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(unsigned int, logblocks) + __field(unsigned int, min_logfsbs) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->logblocks = mp->m_sb.sb_logblocks; + __entry->min_logfsbs = min_logfsbs; + ), + TP_printk("dev %d:%d logblocks %u min_logfsbs %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->logblocks, + __entry->min_logfsbs) +); +#endif /* CONFIG_XFS_RT */ + #endif /* _TRACE_XFS_H */ #undef TRACE_INCLUDE_PATH ^ 
permalink raw reply related [flat|nested] 565+ messages in thread
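The check in the patch above boils down to "copy the mount, apply the proposed geometry, recompute the minimum log size, compare against the existing log". A rough sketch of that shape follows, under loud assumptions: `toy_mount`, `TOY_FANOUT`, and the `64 * maxlevels` minimum-log formula are invented for illustration and bear no relation to the real xfs_log_calc_minimum_size() arithmetic.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_FANOUT	16	/* hypothetical records per btree block */

/* Toy stand-in for the few xfs_mount fields this sketch cares about. */
struct toy_mount {
	uint64_t	rblocks;		/* realtime blocks */
	uint32_t	logblocks;		/* current log size */
	unsigned int	rtrmap_maxlevels;	/* worst-case btree height */
};

/* Height of a btree that must index nrecs records (hypothetical model). */
unsigned int toy_btree_maxlevels(uint64_t nrecs, unsigned int fanout)
{
	unsigned int	levels = 1;

	while (nrecs > fanout) {
		/* Round up without risking overflow near UINT64_MAX. */
		nrecs = nrecs / fanout + (nrecs % fanout != 0);
		levels++;
	}
	return levels;
}

/* Invented formula: taller btrees need bigger reservations, hence more log. */
uint32_t toy_min_logblocks(const struct toy_mount *mp)
{
	return 64 * mp->rtrmap_maxlevels;
}

/*
 * Would growing the rt section to new_rblocks push the minimum log size
 * past what the filesystem already has?  Work on a scratch copy so that
 * the live mount is never modified by a check that might fail.
 */
int toy_growfs_check_rtgeom(const struct toy_mount *mp, uint64_t new_rblocks)
{
	struct toy_mount	fake = *mp;

	fake.rblocks = new_rblocks;
	fake.rtrmap_maxlevels = toy_btree_maxlevels(new_rblocks, TOY_FANOUT);

	if (mp->logblocks < toy_min_logblocks(&fake))
		return -ENOSPC;
	return 0;
}
```

The point of the scratch copy is the same as the fake_mp in the patch: the comparison uses the existing log size from the real mount and the recomputed requirement from the copy.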
* [PATCH 21/38] xfs: fix getfsmap reporting past the last rt extent 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (23 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 22/38] xfs: check that the rtrmapbt maxlevels doesn't increase when growing fs Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 27/38] xfs: scrub the realtime rmapbt Darrick J. Wong ` (12 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> The realtime section ends at the last rt extent. If the user configures the rt geometry with an extent size that is not an integer factor of the number of rt blocks, it's possible for there to be rt blocks past the end of the last rt extent. These tail blocks cannot ever be allocated and will cause corruption reports if the last extent coincides with the end of an rt bitmap block, so do not consider them for the GETFSMAP output. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_fsmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index b5e7ae77cab9..efbcc4b1d850 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -755,7 +755,7 @@ xfs_getfsmap_rtdev_rtbitmap( uint64_t eofs; int error = 0; - eofs = XFS_FSB_TO_BB(mp, mp->m_sb.sb_rblocks); + eofs = XFS_FSB_TO_BB(mp, xfs_rtx_to_rtb(mp, mp->m_sb.sb_rextents)); if (keys[0].fmr_physical >= eofs) return 0; start_fsb = XFS_BB_TO_FSBT(mp, keys[0].fmr_physical); ^ permalink raw reply related [flat|nested] 565+ messages in thread
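To make the tail-block arithmetic in the patch above concrete, here is a small worked example. It assumes the extent count is the block count divided by the extent size, rounded down; the `toy_*` helpers are hypothetical and only mimic what xfs_rtx_to_rtb(mp, sb_rextents) is used for in the fix.

```c
#include <stdint.h>

/* Number of whole rt extents that fit in the rt section (rounds down). */
uint64_t toy_rextents(uint64_t rblocks, uint32_t rextsize)
{
	return rblocks / rextsize;
}

/*
 * First block past the last whole rt extent.  This, not rblocks, is where
 * the allocatable rt space ends.
 */
uint64_t toy_rt_end_block(uint64_t rblocks, uint32_t rextsize)
{
	return toy_rextents(rblocks, rextsize) * rextsize;
}

/* Blocks after the last whole extent that can never be allocated. */
uint64_t toy_rt_tail_blocks(uint64_t rblocks, uint32_t rextsize)
{
	return rblocks - toy_rt_end_block(rblocks, rextsize);
}
```

With 1000 rt blocks and a 7-block extent size, 142 whole extents fit, the usable space ends at block 994, and 6 tail blocks are left over. With an 8-block extent size the division is exact and there is no tail.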
* [PATCH 27/38] xfs: scrub the realtime rmapbt 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (24 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 21/38] xfs: fix getfsmap reporting past the last rt extent Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 25/38] xfs: fix scrub tracepoints when inode-rooted btrees are involved Darrick J. Wong ` (11 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Check the realtime reverse mapping btree against the rtbitmap, and modify the rtbitmap scrub to check against the rtrmapbt. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_fs.h | 3 - fs/xfs/scrub/bmap.c | 1 fs/xfs/scrub/bmap_repair.c | 1 fs/xfs/scrub/common.c | 78 +++++++++++++++++ fs/xfs/scrub/common.h | 11 ++ fs/xfs/scrub/health.c | 1 fs/xfs/scrub/inode.c | 10 +- fs/xfs/scrub/inode_repair.c | 7 +- fs/xfs/scrub/repair.c | 1 fs/xfs/scrub/rtrmap.c | 192 +++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/scrub.c | 9 ++ fs/xfs/scrub/scrub.h | 5 + fs/xfs/scrub/trace.h | 4 + 14 files changed, 312 insertions(+), 12 deletions(-) create mode 100644 fs/xfs/scrub/rtrmap.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 84934538bf52..1060ea739210 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -181,6 +181,7 @@ xfs-y += $(addprefix scrub/, \ xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \ rgsuper.o \ rtbitmap.o \ + rtrmap.o \ rtsummary.o \ ) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 5c557d5ff13e..8547ba85c550 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -744,9 +744,10 @@ struct xfs_scrub_metadata { #define XFS_SCRUB_TYPE_HEALTHY 27 /* everything checked out ok */ #define XFS_SCRUB_TYPE_RGSUPER 28 /* realtime superblock */ #define 
XFS_SCRUB_TYPE_RGBITMAP 29 /* realtime group bitmap */ +#define XFS_SCRUB_TYPE_RTRMAPBT 30 /* rtgroup reverse mapping btree */ /* Number of scrub subcommands. */ -#define XFS_SCRUB_TYPE_NR 30 +#define XFS_SCRUB_TYPE_NR 31 /* i: Repair this metadata. */ #define XFS_SCRUB_IFLAG_REPAIR (1u << 0) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index b5b081d23ca2..0c79185daedf 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -825,6 +825,7 @@ xchk_bmap( case XFS_DINODE_FMT_UUID: case XFS_DINODE_FMT_DEV: case XFS_DINODE_FMT_LOCAL: + case XFS_DINODE_FMT_RMAP: /* No mappings to check. */ if (whichfork == XFS_COW_FORK) xchk_fblock_set_corrupt(sc, whichfork, 0); diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index 0ad0f27fd8ca..ca7df344581d 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -682,6 +682,7 @@ xrep_bmap_check_inputs( case XFS_DINODE_FMT_DEV: case XFS_DINODE_FMT_LOCAL: case XFS_DINODE_FMT_UUID: + case XFS_DINODE_FMT_RMAP: return -ECANCELED; case XFS_DINODE_FMT_EXTENTS: case XFS_DINODE_FMT_BTREE: diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index bb1d9ca20374..fa8e0064c41d 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -35,6 +35,8 @@ #include "xfs_swapext.h" #include "xfs_rtbitmap.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_bmap_util.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -837,6 +839,8 @@ xchk_rtgroup_init( struct xchk_rt *sr, unsigned int rtglock_flags) { + int error; + ASSERT(sr->rtg == NULL); ASSERT(sr->rtlock_flags == 0); @@ -844,7 +848,30 @@ xchk_rtgroup_init( if (!sr->rtg) return -ENOENT; - return xchk_rtgroup_lock(sc, sr, rtglock_flags); + error = xchk_rtgroup_lock(sc, sr, rtglock_flags); + if (error) + return error; + + if (xfs_has_rtrmapbt(sc->mp) && (rtglock_flags & XFS_RTGLOCK_RMAP)) + sr->rmap_cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, + sr->rtg, sr->rtg->rtg_rmapip); + + return 
0; +} + +/* + * Free all the btree cursors and other incore data relating to the realtime + * group. This has to be done /before/ committing (or cancelling) the scrub + * transaction. + */ +void +xchk_rtgroup_btcur_free( + struct xchk_rt *sr) +{ + if (sr->rmap_cur) + xfs_btree_del_cursor(sr->rmap_cur, XFS_BTREE_ERROR); + + sr->rmap_cur = NULL; } /* @@ -932,6 +959,14 @@ xchk_setup_fs( return xchk_trans_alloc(sc, resblks); } +/* Set us up with a transaction and an empty context to repair rt metadata. */ +int +xchk_setup_rt( + struct xfs_scrub *sc) +{ + return xchk_trans_alloc(sc, 0); +} + /* Set us up with AG headers and btree cursors. */ int xchk_setup_ag_btree( @@ -1490,3 +1525,44 @@ xchk_fshooks_enable( sc->flags |= scrub_fshooks; } + +/* Count the blocks used by a file, even if it's a metadata inode. */ +int +xchk_inode_count_blocks( + struct xfs_scrub *sc, + int whichfork, + xfs_extnum_t *nextents, + xfs_filblks_t *count) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(sc->ip, whichfork); + struct xfs_btree_cur *cur; + xfs_extlen_t btblocks; + int error; + + if (!ifp) { + *nextents = 0; + *count = 0; + return 0; + } + + switch (ifp->if_format) { + case XFS_DINODE_FMT_RMAP: + if (!sc->sr.rtg) { + ASSERT(0); + return -EFSCORRUPTED; + } + cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, sc->sr.rtg, + sc->ip); + error = xfs_btree_count_blocks(cur, &btblocks); + xfs_btree_del_cursor(cur, error); + if (error) + return error; + + *nextents = 0; + *count = btblocks - 1; + return 0; + default: + return xfs_bmap_count_blocks(sc->tp, sc->ip, whichfork, + nextents, count); + } +} diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index e83e88b44e5b..9ca2fbaac72c 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -90,6 +90,7 @@ static inline int xchk_setup_nothing(struct xfs_scrub *sc) /* Setup functions */ int xchk_setup_agheader(struct xfs_scrub *sc); int xchk_setup_fs(struct xfs_scrub *sc); +int xchk_setup_rt(struct xfs_scrub *sc); int 
xchk_setup_ag_allocbt(struct xfs_scrub *sc); int xchk_setup_ag_iallocbt(struct xfs_scrub *sc); int xchk_setup_ag_rmapbt(struct xfs_scrub *sc); @@ -106,11 +107,13 @@ int xchk_setup_rtbitmap(struct xfs_scrub *sc); int xchk_setup_rtsummary(struct xfs_scrub *sc); int xchk_setup_rgsuperblock(struct xfs_scrub *sc); int xchk_setup_rgbitmap(struct xfs_scrub *sc); +int xchk_setup_rtrmapbt(struct xfs_scrub *sc); #else # define xchk_setup_rtbitmap xchk_setup_nothing # define xchk_setup_rtsummary xchk_setup_nothing # define xchk_setup_rgsuperblock xchk_setup_nothing # define xchk_setup_rgbitmap xchk_setup_nothing +# define xchk_setup_rtrmapbt xchk_setup_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_ino_dqattach(struct xfs_scrub *sc); @@ -170,14 +173,17 @@ void xchk_rt_unlock(struct xfs_scrub *sc, struct xchk_rt *sr); #ifdef CONFIG_XFS_RT /* All the locks we need to check an rtgroup. */ -#define XCHK_RTGLOCK_ALL (XFS_RTGLOCK_BITMAP_SHARED) +#define XCHK_RTGLOCK_ALL (XFS_RTGLOCK_BITMAP_SHARED | \ + XFS_RTGLOCK_RMAP) int xchk_rtgroup_init(struct xfs_scrub *sc, xfs_rgnumber_t rgno, struct xchk_rt *sr, unsigned int rtglock_flags); void xchk_rtgroup_unlock(struct xfs_scrub *sc, struct xchk_rt *sr); +void xchk_rtgroup_btcur_free(struct xchk_rt *sr); void xchk_rtgroup_free(struct xfs_scrub *sc, struct xchk_rt *sr); #else # define xchk_rtgroup_init(sc, rgno, sr, lockflags) (-ENOSYS) +# define xchk_rtgroup_btcur_free(sr) ((void)0) # define xchk_rtgroup_free(sc, sr) ((void)0) #endif /* CONFIG_XFS_RT */ @@ -258,4 +264,7 @@ static inline bool xchk_need_fshook_drain(struct xfs_scrub *sc) void xchk_fshooks_enable(struct xfs_scrub *sc, unsigned int scrub_fshooks); +int xchk_inode_count_blocks(struct xfs_scrub *sc, int whichfork, + xfs_extnum_t *nextents, xfs_filblks_t *count); + #endif /* __XFS_SCRUB_COMMON_H__ */ diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c index a71d4d9087b2..061f6f73b666 100644 --- a/fs/xfs/scrub/health.c +++ b/fs/xfs/scrub/health.c @@ -113,6 +113,7 @@ 
static const struct xchk_health_map type_to_health_flag[XFS_SCRUB_TYPE_NR] = { [XFS_SCRUB_TYPE_QUOTACHECK] = { XHG_FS, XFS_SICK_FS_QUOTACHECK }, [XFS_SCRUB_TYPE_NLINKS] = { XHG_FS, XFS_SICK_FS_NLINKS }, [XFS_SCRUB_TYPE_RGSUPER] = { XHG_RTGROUP, XFS_SICK_RT_SUPER }, + [XFS_SCRUB_TYPE_RTRMAPBT] = { XHG_RTGROUP, XFS_SICK_RT_RMAPBT }, }; /* Return the health status mask for this scrub type. */ diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index 4e534ec642e2..f2c60c3515e7 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -466,6 +466,10 @@ xchk_dinode( if (!S_ISREG(mode) && !S_ISDIR(mode)) xchk_ino_set_corrupt(sc, ino); break; + case XFS_DINODE_FMT_RMAP: + if (!S_ISREG(mode)) + xchk_ino_set_corrupt(sc, ino); + break; case XFS_DINODE_FMT_UUID: default: xchk_ino_set_corrupt(sc, ino); @@ -650,15 +654,13 @@ xchk_inode_xref_bmap( return; /* Walk all the extents to check nextents/naextents/nblocks. */ - error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_DATA_FORK, - &nextents, &count); + error = xchk_inode_count_blocks(sc, XFS_DATA_FORK, &nextents, &count); if (!xchk_should_check_xref(sc, &error, NULL)) return; if (nextents < xfs_dfork_data_extents(dip)) xchk_ino_xref_set_corrupt(sc, sc->ip->i_ino); - error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_ATTR_FORK, - &nextents, &acount); + error = xchk_inode_count_blocks(sc, XFS_ATTR_FORK, &nextents, &acount); if (!xchk_should_check_xref(sc, &error, NULL)) return; if (nextents != xfs_dfork_attr_extents(dip)) diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index 1efd606bf92c..a8d19d1e76e3 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -1275,8 +1275,7 @@ xrep_inode_blockcounts( trace_xrep_inode_blockcounts(sc); /* Set data fork counters from the data fork mappings. 
*/ - error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_DATA_FORK, - &nextents, &count); + error = xchk_inode_count_blocks(sc, XFS_DATA_FORK, &nextents, &count); if (error) return error; if (xfs_has_reflink(sc->mp)) { @@ -1296,8 +1295,8 @@ xrep_inode_blockcounts( /* Set attr fork counters from the attr fork mappings. */ ifp = xfs_ifork_ptr(sc->ip, XFS_ATTR_FORK); if (ifp) { - error = xfs_bmap_count_blocks(sc->tp, sc->ip, XFS_ATTR_FORK, - &nextents, &acount); + error = xchk_inode_count_blocks(sc, XFS_ATTR_FORK, &nextents, + &acount); if (error) return error; if (count >= sc->mp->m_sb.sb_dblocks) diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 1652f633f692..eb0dda2df7af 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -56,6 +56,7 @@ xrep_attempt( trace_xrep_attempt(XFS_I(file_inode(sc->file)), sc->sm, error); xchk_ag_btcur_free(&sc->sa); + xchk_rtgroup_btcur_free(&sc->sr); /* Repair whatever's broken. */ ASSERT(sc->ops->repair); diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c new file mode 100644 index 000000000000..e60b454b39f3 --- /dev/null +++ b/fs/xfs/scrub/rtrmap.c @@ -0,0 +1,192 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_inode.h" +#include "xfs_rtalloc.h" +#include "xfs_rtgroup.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" + +/* Set us up with the realtime metadata locked. 
*/ +int +xchk_setup_rtrmapbt( + struct xfs_scrub *sc) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_rtgroup *rtg; + int error = 0; + + if (xchk_need_fshook_drain(sc)) + xchk_fshooks_enable(sc, XCHK_FSHOOKS_DRAIN); + + rtg = xfs_rtgroup_get(mp, sc->sm->sm_agno); + if (!rtg) + return -ENOENT; + + error = xchk_setup_rt(sc); + if (error) + goto out_rtg; + + error = xchk_install_live_inode(sc, rtg->rtg_rmapip); + if (error) + goto out_rtg; + + error = xchk_ino_dqattach(sc); + if (error) + goto out_rtg; + + error = xchk_rtgroup_init(sc, rtg->rtg_rgno, &sc->sr, XCHK_RTGLOCK_ALL); +out_rtg: + xfs_rtgroup_put(rtg); + return error; +} + +/* Realtime reverse mapping. */ + +struct xchk_rtrmap { + /* + * The furthest-reaching of the rmapbt records that we've already + * processed. This enables us to detect overlapping records for space + * allocations that cannot be shared. + */ + struct xfs_rmap_irec overlap_rec; + + /* + * The previous rmapbt record, so that we can check for two records + * that could be one. + */ + struct xfs_rmap_irec prev_rec; +}; + +/* Flag failures for records that overlap but cannot. */ +STATIC void +xchk_rtrmapbt_check_overlapping( + struct xchk_btree *bs, + struct xchk_rtrmap *cr, + const struct xfs_rmap_irec *irec) +{ + xfs_rtblock_t pnext, inext; + + if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + /* No previous record? */ + if (cr->overlap_rec.rm_blockcount == 0) + goto set_prev; + + /* Do overlap_rec and irec overlap? */ + pnext = cr->overlap_rec.rm_startblock + cr->overlap_rec.rm_blockcount; + if (pnext <= irec->rm_startblock) + goto set_prev; + + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + + /* Save whichever rmap record extends furthest. */ + inext = irec->rm_startblock + irec->rm_blockcount; + if (pnext > inext) + return; + +set_prev: + memcpy(&cr->overlap_rec, irec, sizeof(struct xfs_rmap_irec)); +} + +/* Decide if two reverse-mapping records can be merged. 
*/ +static inline bool +xchk_rtrmap_mergeable( + struct xchk_rtrmap *cr, + const struct xfs_rmap_irec *r2) +{ + const struct xfs_rmap_irec *r1 = &cr->prev_rec; + + /* Ignore if prev_rec is not yet initialized. */ + if (cr->prev_rec.rm_blockcount == 0) + return false; + + if (r1->rm_owner != r2->rm_owner) + return false; + if (r1->rm_startblock + r1->rm_blockcount != r2->rm_startblock) + return false; + if ((unsigned long long)r1->rm_blockcount + r2->rm_blockcount > + XFS_RMAP_LEN_MAX) + return false; + if (r1->rm_flags != r2->rm_flags) + return false; + return r1->rm_offset + r1->rm_blockcount == r2->rm_offset; +} + +/* Flag failures for records that could be merged. */ +STATIC void +xchk_rtrmapbt_check_mergeable( + struct xchk_btree *bs, + struct xchk_rtrmap *cr, + const struct xfs_rmap_irec *irec) +{ + if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + if (xchk_rtrmap_mergeable(cr, irec)) + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + + memcpy(&cr->prev_rec, irec, sizeof(struct xfs_rmap_irec)); +} + +/* Scrub a realtime rmapbt record. */ +STATIC int +xchk_rtrmapbt_rec( + struct xchk_btree *bs, + const union xfs_btree_rec *rec) +{ + struct xchk_rtrmap *cr = bs->private; + struct xfs_rmap_irec irec; + + if (xfs_rmap_btrec_to_irec(rec, &irec) != NULL || + xfs_rmap_check_irec(bs->cur, &irec) != NULL) { + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + return 0; + } + + if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return 0; + + xchk_rtrmapbt_check_mergeable(bs, cr, &irec); + xchk_rtrmapbt_check_overlapping(bs, cr, &irec); + return 0; +} + +/* Scrub the realtime rmap btree. 
*/ +int +xchk_rtrmapbt( + struct xfs_scrub *sc) +{ + struct xfs_owner_info oinfo; + struct xchk_rtrmap cr = { }; + int error; + + error = xchk_metadata_inode_forks(sc); + if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) + return error; + + xfs_rmap_ino_bmbt_owner(&oinfo, sc->sr.rtg->rtg_rmapip->i_ino, + XFS_DATA_FORK); + return xchk_btree(sc, sc->sr.rmap_cur, xchk_rtrmapbt_rec, &oinfo, &cr); +} diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 6066673953cb..c9b4899c8b6a 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -182,6 +182,8 @@ xchk_teardown( int error) { xchk_ag_free(sc, &sc->sa); + xchk_rtgroup_btcur_free(&sc->sr); + if (sc->tp) { if (error == 0 && (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)) error = xfs_trans_commit(sc->tp); @@ -423,6 +425,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .has = xfs_has_rtgroups, .repair = xrep_notsupported, }, + [XFS_SCRUB_TYPE_RTRMAPBT] = { /* realtime group rmapbt */ + .type = ST_RTGROUP, + .setup = xchk_setup_rtrmapbt, + .scrub = xchk_rtrmapbt, + .has = xfs_has_rtrmapbt, + .repair = xrep_notsupported, + }, }; static int diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index 48114bda2f4a..fa75034b9051 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -78,6 +78,9 @@ struct xchk_rt { * if rtg != NULL. 
*/ unsigned int rtlock_flags; + + /* rtgroup btrees */ + struct xfs_btree_cur *rmap_cur; }; struct xfs_scrub { @@ -190,11 +193,13 @@ int xchk_rtbitmap(struct xfs_scrub *sc); int xchk_rtsummary(struct xfs_scrub *sc); int xchk_rgsuperblock(struct xfs_scrub *sc); int xchk_rgbitmap(struct xfs_scrub *sc); +int xchk_rtrmapbt(struct xfs_scrub *sc); #else # define xchk_rtbitmap xchk_nothing # define xchk_rtsummary xchk_nothing # define xchk_rgsuperblock xchk_nothing # define xchk_rgbitmap xchk_nothing +# define xchk_rtrmapbt xchk_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_quota(struct xfs_scrub *sc); diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 3ffee717062d..844f49091b1d 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -77,6 +77,7 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_NLINKS); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_HEALTHY); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGSUPER); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGBITMAP); +TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RTRMAPBT); #define XFS_SCRUB_TYPE_STRINGS \ { XFS_SCRUB_TYPE_PROBE, "probe" }, \ @@ -108,7 +109,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGBITMAP); { XFS_SCRUB_TYPE_NLINKS, "nlinks" }, \ { XFS_SCRUB_TYPE_HEALTHY, "healthy" }, \ { XFS_SCRUB_TYPE_RGSUPER, "rgsuper" }, \ - { XFS_SCRUB_TYPE_RGBITMAP, "rgbitmap" } + { XFS_SCRUB_TYPE_RGBITMAP, "rgbitmap" }, \ + { XFS_SCRUB_TYPE_RTRMAPBT, "rtrmapbt" } #define XFS_SCRUB_FLAG_STRINGS \ { XFS_SCRUB_IFLAG_REPAIR, "repair" }, \ ^ permalink raw reply related [flat|nested] 565+ messages in thread
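For readers who want to poke at the "could these two records have been merged?" test from the rtrmapbt scrubber in this patch without building a kernel, here is a minimal userspace sketch. Everything in it is an assumption made for illustration: struct sketch_rmap and SKETCH_RMAP_LEN_MAX are hypothetical stand-ins for struct xfs_rmap_irec and XFS_RMAP_LEN_MAX, the fields are widened to 64 bits so the additions cannot wrap, and nothing here is kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_rmap_irec; not the kernel type. */
struct sketch_rmap {
	uint64_t	startblock;	/* first physical block */
	uint64_t	blockcount;	/* length in blocks */
	uint64_t	owner;		/* owning inode number */
	uint64_t	offset;		/* logical offset within the owner */
	unsigned int	flags;
};

/* Assumed cap on one record's length, standing in for XFS_RMAP_LEN_MAX. */
#define SKETCH_RMAP_LEN_MAX	((uint64_t)UINT32_MAX)

/*
 * Return true if r2 could have been folded into r1: same owner and flags,
 * physically adjacent, logically adjacent, and short enough to fit in a
 * single record.
 */
static bool
sketch_rmap_mergeable(
	const struct sketch_rmap	*r1,
	const struct sketch_rmap	*r2)
{
	assert(r1 != NULL && r2 != NULL);

	/* A zero-length r1 means "no previous record seen yet". */
	if (r1->blockcount == 0)
		return false;
	if (r1->owner != r2->owner)
		return false;
	if (r1->startblock + r1->blockcount != r2->startblock)
		return false;
	if (r1->blockcount + r2->blockcount > SKETCH_RMAP_LEN_MAX)
		return false;
	if (r1->flags != r2->flags)
		return false;
	return r1->offset + r1->blockcount == r2->offset;
}
```

In the scrubber the previous record is remembered across callbacks and a mergeable pair is flagged as corruption; a caller of this sketch would do the same by keeping the last record it saw and copying the current one over it after each comparison.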
* [PATCH 25/38] xfs: fix scrub tracepoints when inode-rooted btrees are involved 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (25 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 27/38] xfs: scrub the realtime rmapbt Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 29/38] xfs: cross-reference the realtime rmapbt Darrick J. Wong ` (10 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Fix a minor mistake in the scrub tracepoints that can manifest when inode-rooted btrees are enabled. The existing code worked fine for bmap btrees, but we should tighten the code up to be less sloppy. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/trace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index cf1635e00cb0..3ffee717062d 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -477,7 +477,7 @@ TRACE_EVENT(xchk_ifork_btree_op_error, TP_fast_assign( xfs_fsblock_t fsbno = xchk_btree_cur_fsbno(cur, level); __entry->dev = sc->mp->m_super->s_dev; - __entry->ino = sc->ip->i_ino; + __entry->ino = cur->bc_ino.ip->i_ino; __entry->whichfork = cur->bc_ino.whichfork; __entry->type = sc->sm->sm_type; __entry->btnum = cur->bc_btnum; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 29/38] xfs: cross-reference the realtime rmapbt 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (26 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 25/38] xfs: fix scrub tracepoints when inode-rooted btrees are involved Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 31/38] xfs: walk the rt reverse mapping tree when rebuilding rmap Darrick J. Wong ` (9 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Teach the data fork and realtime bitmap scrubbers to cross-reference information with the realtime rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/bmap.c | 67 +++++++++++++++++++++++++++++++-------- fs/xfs/scrub/rtbitmap.c | 80 +++++++++++++++++++++++++++++++++++++++++++++-- fs/xfs/scrub/rtrmap.c | 65 ++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/scrub.h | 9 +++++ 4 files changed, 202 insertions(+), 19 deletions(-) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index 0c79185daedf..49fffe85dde6 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -19,6 +19,7 @@ #include "xfs_bmap_btree.h" #include "xfs_rmap.h" #include "xfs_rmap_btree.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/btree.h" @@ -127,15 +128,22 @@ static inline bool xchk_bmap_get_rmap( struct xchk_bmap_info *info, struct xfs_bmbt_irec *irec, - xfs_agblock_t agbno, + xfs_agblock_t bno, uint64_t owner, struct xfs_rmap_irec *rmap) { + struct xfs_btree_cur **curp = &info->sc->sa.rmap_cur; xfs_fileoff_t offset; unsigned int rflags = 0; int has_rmap; int error; + if (xfs_ifork_is_realtime(info->sc->ip, info->whichfork)) + curp = &info->sc->sr.rmap_cur; + + if (*curp == NULL) + return false; + if (info->whichfork == XFS_ATTR_FORK) rflags |= XFS_RMAP_ATTR_FORK; if 
(irec->br_state == XFS_EXT_UNWRITTEN) @@ -156,13 +164,13 @@ xchk_bmap_get_rmap( * range rmap lookup to make sure we get the correct owner/offset. */ if (info->is_shared) { - error = xfs_rmap_lookup_le_range(info->sc->sa.rmap_cur, agbno, - owner, offset, rflags, rmap, &has_rmap); + error = xfs_rmap_lookup_le_range(*curp, bno, owner, offset, + rflags, rmap, &has_rmap); } else { - error = xfs_rmap_lookup_le(info->sc->sa.rmap_cur, agbno, - owner, offset, rflags, rmap, &has_rmap); + error = xfs_rmap_lookup_le(*curp, bno, owner, offset, + rflags, rmap, &has_rmap); } - if (!xchk_should_check_xref(info->sc, &error, &info->sc->sa.rmap_cur)) + if (!xchk_should_check_xref(info->sc, &error, curp)) return false; if (!has_rmap) @@ -218,13 +226,13 @@ STATIC void xchk_bmap_xref_rmap( struct xchk_bmap_info *info, struct xfs_bmbt_irec *irec, - xfs_agblock_t agbno) + xfs_agblock_t bno) { struct xfs_rmap_irec rmap; unsigned long long rmap_end; uint64_t owner; - if (!info->sc->sa.rmap_cur || xchk_skip_xref(info->sc->sm)) + if (xchk_skip_xref(info->sc->sm)) return; if (info->whichfork == XFS_COW_FORK) @@ -233,13 +241,12 @@ xchk_bmap_xref_rmap( owner = info->sc->ip->i_ino; /* Find the rmap record for this irec. */ - if (!xchk_bmap_get_rmap(info, irec, agbno, owner, &rmap)) + if (!xchk_bmap_get_rmap(info, irec, bno, owner, &rmap)) return; /* Check the rmap. */ rmap_end = (unsigned long long)rmap.rm_startblock + rmap.rm_blockcount; - if (rmap.rm_startblock > agbno || - agbno + irec->br_blockcount > rmap_end) + if (rmap.rm_startblock > bno || bno + irec->br_blockcount > rmap_end) xchk_fblock_xref_set_corrupt(info->sc, info->whichfork, irec->br_startoff); @@ -288,7 +295,7 @@ xchk_bmap_xref_rmap( * Skip this for CoW fork extents because the refcount btree (and not * the inode) is the ondisk owner for those extents. 
*/ - if (info->whichfork != XFS_COW_FORK && rmap.rm_startblock < agbno && + if (info->whichfork != XFS_COW_FORK && rmap.rm_startblock < bno && !xchk_bmap_has_prev(info, irec)) { xchk_fblock_xref_set_corrupt(info->sc, info->whichfork, irec->br_startoff); @@ -303,7 +310,7 @@ xchk_bmap_xref_rmap( */ rmap_end = (unsigned long long)rmap.rm_startblock + rmap.rm_blockcount; if (info->whichfork != XFS_COW_FORK && - rmap_end > agbno + irec->br_blockcount && + rmap_end > bno + irec->br_blockcount && !xchk_bmap_has_next(info, irec)) { xchk_fblock_xref_set_corrupt(info->sc, info->whichfork, irec->br_startoff); @@ -318,10 +325,40 @@ xchk_bmap_rt_iextent_xref( struct xchk_bmap_info *info, struct xfs_bmbt_irec *irec) { - xchk_rt_init(info->sc, &info->sc->sr, XCHK_RTLOCK_BITMAP_SHARED); + struct xfs_owner_info oinfo; + struct xfs_mount *mp = ip->i_mount; + xfs_rgnumber_t rgno; + xfs_rgblock_t rgbno; + int error; + + if (!xfs_has_rtrmapbt(mp)) { + xchk_rt_init(info->sc, &info->sc->sr, + XCHK_RTLOCK_BITMAP_SHARED); + xchk_xref_is_used_rt_space(info->sc, irec->br_startblock, + irec->br_blockcount); + xchk_rt_unlock(info->sc, &info->sc->sr); + return; + } + + rgbno = xfs_rtb_to_rgbno(mp, irec->br_startblock, &rgno); + error = xchk_rtgroup_init(info->sc, rgno, &info->sc->sr, + XCHK_RTGLOCK_ALL); + if (!xchk_fblock_process_error(info->sc, info->whichfork, + irec->br_startoff, &error)) + goto out_free; + xchk_xref_is_used_rt_space(info->sc, irec->br_startblock, irec->br_blockcount); - xchk_rt_unlock(info->sc, &info->sc->sr); + xchk_bmap_xref_rmap(info, irec, rgbno); + + xfs_rmap_ino_owner(&oinfo, info->sc->ip->i_ino, info->whichfork, + irec->br_startoff); + xchk_xref_is_only_rt_owned_by(info->sc, rgbno, irec->br_blockcount, + &oinfo); + +out_free: + xchk_rtgroup_btcur_free(&info->sc->sr); + xchk_rtgroup_free(info->sc, &info->sc->sr); } /* Cross-reference a single datadev extent record. 
*/ diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index a034f2d392f5..eb150c40d33c 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -9,15 +9,19 @@ #include "xfs_format.h" #include "xfs_trans_resv.h" #include "xfs_mount.h" +#include "xfs_btree.h" #include "xfs_log_format.h" #include "xfs_trans.h" #include "xfs_rtbitmap.h" #include "xfs_inode.h" #include "xfs_bmap.h" #include "xfs_rtgroup.h" +#include "xfs_rmap.h" +#include "xfs_rtrmap_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/repair.h" +#include "scrub/btree.h" /* Set us up with the realtime group metadata locked. */ int @@ -77,6 +81,43 @@ xchk_setup_rtbitmap( /* Realtime bitmap. */ +struct xchk_rtbitmap { + struct xfs_scrub *sc; + + /* The next free rt block that we expect to see. */ + xfs_rtblock_t next_free_rtblock; +}; + +/* Cross-reference rtbitmap entries with other metadata. */ +STATIC void +xchk_rtbitmap_xref( + struct xchk_rtbitmap *rtb, + xfs_rtblock_t startblock, + xfs_rtblock_t blockcount) +{ + struct xfs_scrub *sc = rtb->sc; + xfs_rgnumber_t rgno; + xfs_rgblock_t rgbno; + + if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + if (!sc->sr.rmap_cur) + return; + + rgbno = xfs_rtb_to_rgbno(sc->mp, startblock, &rgno); + xchk_xref_has_no_rt_owner(sc, rgbno, blockcount); + + if (rtb->next_free_rtblock < startblock) { + xfs_rgblock_t next_rgbno; + + next_rgbno = xfs_rtb_to_rgbno(sc->mp, rtb->next_free_rtblock, + &rgno); + xchk_xref_has_rt_owner(sc, next_rgbno, rgbno - next_rgbno); + } + + rtb->next_free_rtblock = startblock + blockcount; +} + /* Scrub a free extent record from the realtime bitmap. 
*/ STATIC int xchk_rtbitmap_rec( @@ -85,8 +126,9 @@ xchk_rtbitmap_rec( const struct xfs_rtalloc_rec *rec, void *priv) { - struct xfs_scrub *sc = priv; - xfs_rtxnum_t startblock; + struct xchk_rtbitmap *rtb = priv; + struct xfs_scrub *sc = rtb->sc; + xfs_rtblock_t startblock; xfs_filblks_t blockcount; startblock = xfs_rtx_to_rtb(mp, rec->ar_startext); @@ -94,6 +136,12 @@ xchk_rtbitmap_rec( if (!xfs_verify_rtbext(mp, startblock, blockcount)) xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0); + + xchk_rtbitmap_xref(rtb, startblock, blockcount); + + if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return -ECANCELED; + return 0; } @@ -138,8 +186,12 @@ xchk_rgbitmap( struct xfs_scrub *sc) { struct xfs_rtalloc_rec keys[2]; + struct xchk_rtbitmap rtb = { + .sc = sc, + }; struct xfs_rtgroup *rtg = sc->sr.rtg; xfs_rtblock_t rtbno; + xfs_rtblock_t last_rtbno; xfs_rgblock_t last_rgbno = rtg->rtg_blockcount - 1; int error; @@ -155,6 +207,7 @@ xchk_rgbitmap( * realtime group. */ rtbno = xfs_rgbno_to_rtb(sc->mp, rtg->rtg_rgno, 0); + rtb.next_free_rtblock = rtbno; keys[0].ar_startext = xfs_rtb_to_rtxt(sc->mp, rtbno); rtbno = xfs_rgbno_to_rtb(sc->mp, rtg->rtg_rgno, last_rgbno); @@ -162,10 +215,26 @@ xchk_rgbitmap( keys[0].ar_extcount = keys[1].ar_extcount = 0; error = xfs_rtalloc_query_range(sc->mp, sc->tp, &keys[0], &keys[1], - xchk_rtbitmap_rec, sc); + xchk_rtbitmap_rec, &rtb); if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error)) return error; + /* + * Check that there are rmappings for all rt extents between the end of + * the last free extent we saw and the last possible extent in the rt + * group. 
+ */ + last_rtbno = xfs_rgbno_to_rtb(sc->mp, rtg->rtg_rgno, last_rgbno); + if (rtb.next_free_rtblock < last_rtbno) { + xfs_rgnumber_t rgno; + xfs_rgblock_t next_rgbno; + + next_rgbno = xfs_rtb_to_rgbno(sc->mp, rtb.next_free_rtblock, + &rgno); + xchk_xref_has_rt_owner(sc, next_rgbno, + last_rgbno - next_rgbno); + } + return 0; } @@ -174,6 +243,9 @@ int xchk_rtbitmap( struct xfs_scrub *sc) { + struct xchk_rtbitmap rtb = { + .sc = sc, + }; int error; /* Is the size of the rtbitmap correct? */ @@ -199,7 +271,7 @@ xchk_rtbitmap( if (xfs_has_rtgroups(sc->mp)) return 0; - error = xfs_rtalloc_query_all(sc->mp, sc->tp, xchk_rtbitmap_rec, sc); + error = xfs_rtalloc_query_all(sc->mp, sc->tp, xchk_rtbitmap_rec, &rtb); if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error)) return error; diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c index 72fc47cc25f0..e9ca9670f3af 100644 --- a/fs/xfs/scrub/rtrmap.c +++ b/fs/xfs/scrub/rtrmap.c @@ -208,3 +208,68 @@ xchk_rtrmapbt( XFS_DATA_FORK); return xchk_btree(sc, sc->sr.rmap_cur, xchk_rtrmapbt_rec, &oinfo, &cr); } + +/* xref check that the extent has no realtime reverse mapping at all */ +void +xchk_xref_has_no_rt_owner( + struct xfs_scrub *sc, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + enum xbtree_recpacking outcome; + int error; + + if (!sc->sr.rmap_cur || xchk_skip_xref(sc->sm)) + return; + + error = xfs_rmap_has_records(sc->sr.rmap_cur, bno, len, &outcome); + if (!xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur)) + return; + if (outcome != XBTREE_RECPACKING_EMPTY) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); +} + +/* xref check that the extent is completely mapped */ +void +xchk_xref_has_rt_owner( + struct xfs_scrub *sc, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + enum xbtree_recpacking outcome; + int error; + + if (!sc->sr.rmap_cur || xchk_skip_xref(sc->sm)) + return; + + error = xfs_rmap_has_records(sc->sr.rmap_cur, bno, len, &outcome); + if (!xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur)) + 
return; + if (outcome != XBTREE_RECPACKING_FULL) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); +} + +/* xref check that the extent is only owned by a given owner */ +void +xchk_xref_is_only_rt_owned_by( + struct xfs_scrub *sc, + xfs_agblock_t bno, + xfs_extlen_t len, + const struct xfs_owner_info *oinfo) +{ + struct xfs_rmap_matches res; + int error; + + if (!sc->sr.rmap_cur || xchk_skip_xref(sc->sm)) + return; + + error = xfs_rmap_count_owners(sc->sr.rmap_cur, bno, len, oinfo, &res); + if (!xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur)) + return; + if (res.matches != 1) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); + if (res.badno_matches) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); + if (res.nono_matches) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); +} diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index fa75034b9051..d47db84e6b7f 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -233,8 +233,17 @@ void xchk_xref_is_not_cow_staging(struct xfs_scrub *sc, xfs_agblock_t bno, #ifdef CONFIG_XFS_RT void xchk_xref_is_used_rt_space(struct xfs_scrub *sc, xfs_rtblock_t rtbno, xfs_extlen_t len); +void xchk_xref_has_no_rt_owner(struct xfs_scrub *sc, xfs_rgblock_t rgbno, + xfs_extlen_t len); +void xchk_xref_has_rt_owner(struct xfs_scrub *sc, xfs_rgblock_t rgbno, + xfs_extlen_t len); +void xchk_xref_is_only_rt_owned_by(struct xfs_scrub *sc, xfs_rgblock_t rgbno, + xfs_extlen_t len, const struct xfs_owner_info *oinfo); #else # define xchk_xref_is_used_rt_space(sc, rtbno, len) do { } while (0) +# define xchk_xref_has_no_rt_owner(sc, rtbno, len) do { } while (0) +# define xchk_xref_has_rt_owner(sc, rtbno, len) do { } while (0) +# define xchk_xref_is_only_rt_owned_by(sc, bno, len, oinfo) do { } while (0) #endif #endif /* __XFS_SCRUB_SCRUB_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
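The cross-referencing rule this patch adds to the rtbitmap scrubber is easier to see on a toy model: every free extent must have no reverse mapping at all, and every gap between free extents (plus the tail of the group) must be completely covered by reverse mappings. The following is a rough userspace sketch of that walk, not the kernel implementation. The names struct sketch_extent, enum sketch_packing, sketch_has_records and sketch_xref_group are all hypothetical; the three-way EMPTY/SPARSE/FULL result only imitates what the patch reads back through xfs_rmap_has_records. The sketch assumes both arrays are sorted by start block, and it takes the group size as a block count rather than a last block number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: [start, start + len) in group-relative blocks. */
struct sketch_extent {
	uint64_t	start;
	uint64_t	len;
};

/* Imitates the EMPTY/SPARSE/FULL outcomes used by the patch. */
enum sketch_packing { SKETCH_EMPTY, SKETCH_SPARSE, SKETCH_FULL };

/*
 * Classify how much of [bno, bno + len) is covered by the records.
 * Assumes recs[] is sorted by start block; overlapping records are fine.
 */
static enum sketch_packing
sketch_has_records(
	const struct sketch_extent	*recs,
	size_t				nr,
	uint64_t			bno,
	uint64_t			len)
{
	uint64_t	end = bno + len;
	uint64_t	covered = bno;	/* [bno, covered) is known mapped */
	bool		seen = false;
	size_t		i;

	assert(len > 0);
	for (i = 0; i < nr; i++) {
		uint64_t	rs = recs[i].start;
		uint64_t	re = rs + recs[i].len;

		if (re <= bno || rs >= end)
			continue;	/* does not touch the query range */
		seen = true;
		if (rs <= covered && re > covered)
			covered = re;	/* extends the contiguous coverage */
	}
	if (!seen)
		return SKETCH_EMPTY;
	return covered >= end ? SKETCH_FULL : SKETCH_SPARSE;
}

/* Count the inconsistencies between free extents and reverse mappings. */
static unsigned int
sketch_xref_group(
	const struct sketch_extent	*frees,
	size_t				nfree,
	const struct sketch_extent	*rmaps,
	size_t				nrmaps,
	uint64_t			nblocks)
{
	uint64_t	next = 0;	/* first block after the last free extent */
	unsigned int	bad = 0;
	size_t		i;

	for (i = 0; i < nfree; i++) {
		/* Free space must not be owned by anything. */
		if (sketch_has_records(rmaps, nrmaps, frees[i].start,
				frees[i].len) != SKETCH_EMPTY)
			bad++;
		/* The allocated gap before it must be fully mapped. */
		if (next < frees[i].start &&
		    sketch_has_records(rmaps, nrmaps, next,
				frees[i].start - next) != SKETCH_FULL)
			bad++;
		next = frees[i].start + frees[i].len;
	}
	/* So must everything after the last free extent. */
	if (next < nblocks &&
	    sketch_has_records(rmaps, nrmaps, next,
			nblocks - next) != SKETCH_FULL)
		bad++;
	return bad;
}
```

The scrubber itself stops at the first corruption it flags instead of counting; the counter here only exists to make the sketch easy to test.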
* [PATCH 31/38] xfs: walk the rt reverse mapping tree when rebuilding rmap 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (27 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 29/38] xfs: cross-reference the realtime rmapbt Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 33/38] xfs: repair inodes that have realtime extents Darrick J. Wong ` (8 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When we're rebuilding the data device rmap, if we encounter an "rmap" format fork, we have to walk the (realtime) rmap btree inode to build the appropriate mappings. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/rmap_repair.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/fs/xfs/scrub/rmap_repair.c b/fs/xfs/scrub/rmap_repair.c index ed937e461bf8..86c5338a12b9 100644 --- a/fs/xfs/scrub/rmap_repair.c +++ b/fs/xfs/scrub/rmap_repair.c @@ -30,6 +30,8 @@ #include "xfs_refcount.h" #include "xfs_refcount_btree.h" #include "xfs_ag.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_rtgroup.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -496,6 +498,38 @@ xrep_rmap_scan_iext( return xrep_rmap_stash_accumulated(rf); } +static int +xrep_rmap_scan_rtrmapbt( + struct xrep_rmap_ifork *rf, + struct xfs_inode *ip) +{ + struct xfs_scrub *sc = rf->rr->sc; + struct xfs_btree_cur *cur; + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + int error; + + if (rf->whichfork != XFS_DATA_FORK) + return -EFSCORRUPTED; + + for_each_rtgroup(sc->mp, rgno, rtg) { + if (ip == rtg->rtg_rmapip) { + cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, rtg, ip); + error = xrep_rmap_scan_iroot_btree(rf, cur); + xfs_btree_del_cursor(cur, error); + xfs_rtgroup_put(rtg); + return error; + } + } + + 
/* + * We shouldn't find an rmap format inode that isn't associated with + * an rtgroup! + */ + ASSERT(0); + return -EFSCORRUPTED; +} + /* Find all the extents from a given AG in an inode fork. */ STATIC int xrep_rmap_scan_ifork( @@ -525,6 +559,8 @@ xrep_rmap_scan_ifork( error = xrep_rmap_scan_bmbt(&rf, ip, &mappings_done); if (error || mappings_done) return error; + } else if (ifp->if_format == XFS_DINODE_FMT_RMAP) { + return xrep_rmap_scan_rtrmapbt(&rf, ip); } else if (ifp->if_format != XFS_DINODE_FMT_EXTENTS) { return 0; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 33/38] xfs: repair inodes that have realtime extents 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (28 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 31/38] xfs: walk the rt reverse mapping tree when rebuilding rmap Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 34/38] xfs: online repair of realtime bitmaps for a realtime group Darrick J. Wong ` (7 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Plumb into the inode core repair code the ability to search for extents on realtime devices. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/inode_repair.c | 68 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 67 insertions(+), 1 deletion(-) diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index a8d19d1e76e3..8566282827f8 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -37,6 +37,8 @@ #include "xfs_log_priv.h" #include "xfs_symlink_remote.h" #include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -610,18 +612,77 @@ xrep_dinode_count_ag_rmaps( return error; } +/* Count extents and blocks for an inode given an rt rmap. */ +STATIC int +xrep_dinode_walk_rtrmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_inode *ri = priv; + int error = 0; + + if (xchk_should_terminate(ri->sc, &error)) + return error; + + /* We only care about this inode. 
*/ + if (rec->rm_owner != ri->sc->sm->sm_ino) + return 0; + + if (rec->rm_flags & (XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK)) + return -EFSCORRUPTED; + + ri->rt_blocks += rec->rm_blockcount; + ri->rt_extents++; + return 0; +} + +/* Count extents and blocks for an inode from all realtime rmap data. */ +STATIC int +xrep_dinode_count_rtgroup_rmaps( + struct xrep_inode *ri, + struct xfs_rtgroup *rtg) +{ + struct xfs_scrub *sc = ri->sc; + int error; + + if (!xfs_has_realtime(sc->mp) || + xrep_is_rtmeta_ino(sc, rtg, sc->sm->sm_ino)) + return 0; + + error = xrep_rtgroup_init(sc, rtg, &sc->sr, XFS_RTGLOCK_RMAP); + if (error) + return error; + + error = xfs_rmap_query_all(sc->sr.rmap_cur, xrep_dinode_walk_rtrmap, + ri); + xchk_rtgroup_btcur_free(&sc->sr); + xchk_rtgroup_free(sc, &sc->sr); + return error; +} + /* Count extents and blocks for a given inode from all rmap data. */ STATIC int xrep_dinode_count_rmaps( struct xrep_inode *ri) { struct xfs_perag *pag; + struct xfs_rtgroup *rtg; xfs_agnumber_t agno; + xfs_rgnumber_t rgno; int error; - if (!xfs_has_rmapbt(ri->sc->mp) || xfs_has_realtime(ri->sc->mp)) + if (!xfs_has_rmapbt(ri->sc->mp)) return -EOPNOTSUPP; + for_each_rtgroup(ri->sc->mp, rgno, rtg) { + error = xrep_dinode_count_rtgroup_rmaps(ri, rtg); + if (error) { + xfs_rtgroup_put(rtg); + return error; + } + } + for_each_perag(ri->sc->mp, agno, pag) { error = xrep_dinode_count_ag_rmaps(ri, pag); if (error) { @@ -917,6 +978,7 @@ xrep_dinode_ensure_forkoff( uint16_t mode) { struct xfs_bmdr_block *bmdr; + struct xfs_rtrmap_root *rmdr; struct xfs_scrub *sc = ri->sc; xfs_extnum_t attr_extents, data_extents; size_t bmdr_minsz = xfs_bmdr_space_calc(1); @@ -1023,6 +1085,10 @@ xrep_dinode_ensure_forkoff( bmdr = XFS_DFORK_PTR(dip, XFS_DATA_FORK); dfork_min = xfs_bmap_broot_space(sc->mp, bmdr); break; + case XFS_DINODE_FMT_RMAP: + rmdr = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + dfork_min = xfs_rtrmap_broot_space(sc->mp, rmdr); + break; default: dfork_min = 0; break; ^ permalink raw 
reply related [flat|nested] 565+ messages in thread
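The per-record callback in this patch (count blocks and extents for one inode, bail out on records that cannot belong to a realtime file) follows the usual "query all records, pass a private cookie" pattern. Below is a small userspace sketch of that pattern under stated assumptions: struct sketch_rec, struct sketch_tally, the SKETCH_* flag values, SKETCH_ECORRUPT and sketch_query_all are all invented for illustration and do not correspond to kernel definitions; the array walk merely stands in for the btree query used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits and error code; not the kernel's values. */
#define SKETCH_ATTR_FORK	(1U << 0)
#define SKETCH_BMBT_BLOCK	(1U << 1)
#define SKETCH_ECORRUPT		1

/* Hypothetical reverse-mapping record. */
struct sketch_rec {
	uint64_t	owner;		/* owning inode number */
	uint64_t	blockcount;	/* length in blocks */
	unsigned int	flags;
};

/* Running totals for the one inode we care about. */
struct sketch_tally {
	uint64_t	ino;
	uint64_t	rt_blocks;
	uint64_t	rt_extents;
};

typedef int (*sketch_walk_fn)(const struct sketch_rec *rec, void *priv);

/* Per-record callback: tally records owned by the target inode. */
static int
sketch_tally_rec(
	const struct sketch_rec	*rec,
	void			*priv)
{
	struct sketch_tally	*t = priv;

	/* We only care about this inode. */
	if (rec->owner != t->ino)
		return 0;

	/* Realtime space cannot hold attr fork or bmbt blocks. */
	if (rec->flags & (SKETCH_ATTR_FORK | SKETCH_BMBT_BLOCK))
		return -SKETCH_ECORRUPT;

	t->rt_blocks += rec->blockcount;
	t->rt_extents++;
	return 0;
}

/* Stand-in for a btree query: call fn on each record, stop on error. */
static int
sketch_query_all(
	const struct sketch_rec	*recs,
	size_t			nr,
	sketch_walk_fn		fn,
	void			*priv)
{
	size_t	i;
	int	error;

	assert(fn != NULL);
	for (i = 0; i < nr; i++) {
		error = fn(&recs[i], priv);
		if (error)
			return error;
	}
	return 0;
}
```

A caller would zero a struct sketch_tally, set its ino field, and run sketch_query_all once per group, much as the patch loops over every rtgroup before it loops over every AG.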
* [PATCH 34/38] xfs: online repair of realtime bitmaps for a realtime group 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (29 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 33/38] xfs: repair inodes that have realtime extents Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 30/38] xfs: scan rt rmap when we're doing an intense rmap check of bmbt mappings Darrick J. Wong ` (6 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> For a given rt group, regenerate the bitmap contents from the group's realtime rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/repair.c | 2 fs/xfs/scrub/repair.h | 10 + fs/xfs/scrub/rtbitmap.c | 21 + fs/xfs/scrub/rtbitmap_repair.c | 692 +++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/rtsummary_repair.c | 3 fs/xfs/scrub/scrub.c | 2 fs/xfs/scrub/tempfile.c | 15 + fs/xfs/scrub/tempswap.h | 2 fs/xfs/scrub/trace.c | 1 fs/xfs/scrub/trace.h | 149 ++++++++ 10 files changed, 885 insertions(+), 12 deletions(-) diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 18ce73dcdf3b..995b60f2d41e 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -942,7 +942,7 @@ xrep_ag_init( #ifdef CONFIG_XFS_RT /* Initialize all the btree cursors for a RT repair. 
*/ -static void +void xrep_rtgroup_btcur_init( struct xfs_scrub *sc, struct xchk_rt *sr) diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index c75081185c24..a0ed79506195 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -87,6 +87,7 @@ int xrep_setup_directory(struct xfs_scrub *sc); int xrep_setup_parent(struct xfs_scrub *sc); int xrep_setup_nlinks(struct xfs_scrub *sc); int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *resblks); +int xrep_setup_rgbitmap(struct xfs_scrub *sc, unsigned int *resblks); int xrep_xattr_reset_fork(struct xfs_scrub *sc); @@ -103,6 +104,7 @@ int xrep_ag_init(struct xfs_scrub *sc, struct xfs_perag *pag, #ifdef CONFIG_XFS_RT int xrep_rtgroup_init(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, struct xchk_rt *sr, unsigned int rtglock_flags); +void xrep_rtgroup_btcur_init(struct xfs_scrub *sc, struct xchk_rt *sr); int xrep_require_rtext_inuse(struct xfs_scrub *sc, xfs_rtblock_t rtbno, xfs_filblks_t len); #else @@ -143,10 +145,12 @@ int xrep_symlink(struct xfs_scrub *sc); int xrep_rtbitmap(struct xfs_scrub *sc); int xrep_rtsummary(struct xfs_scrub *sc); int xrep_rgsuperblock(struct xfs_scrub *sc); +int xrep_rgbitmap(struct xfs_scrub *sc); #else # define xrep_rtbitmap xrep_notsupported # define xrep_rtsummary xrep_notsupported # define xrep_rgsuperblock xrep_notsupported +# define xrep_rgbitmap xrep_notsupported #endif /* CONFIG_XFS_RT */ #ifdef CONFIG_XFS_QUOTA @@ -235,6 +239,11 @@ static inline int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *x) return 0; } +static inline int xrep_setup_rgbitmap(struct xfs_scrub *sc, unsigned int *x) +{ + return 0; +} + #define xrep_revalidate_allocbt (NULL) #define xrep_revalidate_iallocbt (NULL) @@ -262,6 +271,7 @@ static inline int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *x) #define xrep_parent xrep_notsupported #define xrep_symlink xrep_notsupported #define xrep_rgsuperblock xrep_notsupported +#define xrep_rgbitmap xrep_notsupported #endif /* 
CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index eb150c40d33c..ca478fbd514e 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -22,18 +22,34 @@ #include "scrub/common.h" #include "scrub/repair.h" #include "scrub/btree.h" +#include "scrub/repair.h" /* Set us up with the realtime group metadata locked. */ int xchk_setup_rgbitmap( struct xfs_scrub *sc) { + unsigned int resblks = 0; + unsigned int rtglock_flags = XCHK_RTGLOCK_ALL; int error; if (xchk_need_fshook_drain(sc)) xchk_fshooks_enable(sc, XCHK_FSHOOKS_DRAIN); - error = xchk_trans_alloc(sc, 0); + if (xchk_could_repair(sc)) { + error = xrep_setup_rgbitmap(sc, &resblks); + if (error) + return error; + + /* + * We must hold rbmip with ILOCK_EXCL to use the extent swap + * at the end of the repair function. + */ + rtglock_flags &= ~XFS_RTGLOCK_BITMAP_SHARED; + rtglock_flags |= XFS_RTGLOCK_BITMAP; + } + + error = xchk_trans_alloc(sc, resblks); if (error) return error; @@ -45,8 +61,7 @@ xchk_setup_rgbitmap( if (error) return error; - return xchk_rtgroup_init(sc, sc->sm->sm_agno, &sc->sr, - XCHK_RTGLOCK_ALL); + return xchk_rtgroup_init(sc, sc->sm->sm_agno, &sc->sr, rtglock_flags); } /* Set us up with the realtime metadata locked. 
*/ diff --git a/fs/xfs/scrub/rtbitmap_repair.c b/fs/xfs/scrub/rtbitmap_repair.c index c88c49b03e86..0fa8942d14e7 100644 --- a/fs/xfs/scrub/rtbitmap_repair.c +++ b/fs/xfs/scrub/rtbitmap_repair.c @@ -12,15 +12,707 @@ #include "xfs_btree.h" #include "xfs_log_format.h" #include "xfs_trans.h" +#include "xfs_rtalloc.h" #include "xfs_inode.h" #include "xfs_bit.h" #include "xfs_bmap.h" #include "xfs_bmap_btree.h" +#include "xfs_rmap.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_swapext.h" +#include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" #include "scrub/repair.h" #include "scrub/xfile.h" +#include "scrub/tempfile.h" +#include "scrub/tempswap.h" +#include "scrub/reap.h" + +/* + * We use an xfile to construct new bitmap blocks for the portion of the + * rtbitmap file that we're replacing. Whereas the ondisk bitmap must be + * accessed through the buffer cache, the xfile bitmap supports direct + * word-level accesses. Therefore, we create a small abstraction for linear + * access. + */ +typedef unsigned long long xrep_wordoff_t; +typedef unsigned int xrep_wordcnt_t; + +struct xrep_rgbmp { + struct xfs_scrub *sc; + + /* file offset inside the rtbitmap where we start swapping */ + xfs_fileoff_t group_rbmoff; + + /* number of rtbitmap blocks for this group */ + xfs_filblks_t group_rbmlen; + + /* The next rtgroup block we expect to see during our rtrmapbt walk. */ + xfs_rgblock_t next_rgbno; + + /* rtword position of xfile as we write buffers to disk. */ + xrep_wordoff_t prep_wordoff; +}; + +/* Mask to round an rtx down to the nearest bitmap word. */ +#define XREP_RTBMP_WORDMASK ((1ULL << XFS_NBWORDLOG) - 1) + +/* Set up to repair the realtime bitmap for this group. 
*/ +int +xrep_setup_rgbitmap( + struct xfs_scrub *sc, + unsigned int *resblks) +{ + struct xfs_mount *mp = sc->mp; + unsigned long long blocks = 0; + unsigned long long rtbmp_words; + size_t bufsize = mp->m_sb.sb_blocksize; + int error; + + error = xrep_tempfile_create(sc, S_IFREG); + if (error) + return error; + + /* Create an xfile to hold our reconstructed bitmap. */ + rtbmp_words = xfs_rtbitmap_wordcount(mp, mp->m_sb.sb_rextents); + error = xfile_create(sc->mp, "rtbitmap", rtbmp_words << XFS_WORDLOG, + &sc->xfile); + if (error) + return error; + + bufsize = max(bufsize, sizeof(struct xrep_tempswap)); + + /* + * Allocate a memory buffer for faster creation of new bitmap + * blocks. + */ + sc->buf = kvmalloc(bufsize, XCHK_GFP_FLAGS); + if (!sc->buf) + return -ENOMEM; + + /* + * Reserve enough blocks to write out a completely new bitmap file, + * plus twice as many blocks as we would need if we can only allocate + * one block per data fork mapping. This should cover the + * preallocation of the temporary file and swapping the extent + * mappings. + * + * We cannot use xfs_swapext_estimate because we have not yet + * constructed the replacement bitmap and therefore do not know how + * many extents it will use. By the time we do, we will have a dirty + * transaction (which we cannot drop because we cannot drop the + * rtbitmap ILOCK) and cannot ask for more reservation. + */ + blocks = mp->m_sb.sb_rbmblocks; + blocks += xfs_bmbt_calc_size(mp, blocks) * 2; + if (blocks > UINT_MAX) + return -EOPNOTSUPP; + + *resblks += blocks; + + /* + * Grab support for atomic extent swapping before we allocate any + * transactions or grab ILOCKs. 
+ */ + return xrep_tempswap_grab_log_assist(sc); +} + +static inline xrep_wordoff_t +rtx_to_wordoff( + struct xfs_mount *mp, + xfs_rtxnum_t rtx) +{ + return rtx >> XFS_NBWORDLOG; +} + +static inline xrep_wordcnt_t +rtxlen_to_wordcnt( + xfs_rtxlen_t rtxlen) +{ + return rtxlen >> XFS_NBWORDLOG; +} + +/* Helper functions to record rtwords in an xfile. */ + +static inline int +xfbmp_load( + struct xrep_rgbmp *rb, + xrep_wordoff_t wordoff, + xfs_rtword_t *word) +{ + union xfs_rtword_ondisk urk; + int error; + + error = xfile_obj_load(rb->sc->xfile, &urk, + sizeof(union xfs_rtword_ondisk), + wordoff << XFS_WORDLOG); + if (error) + return error; + + *word = xfs_rtbitmap_getword(rb->sc->mp, &urk); + return 0; +} + +static inline int +xfbmp_store( + struct xrep_rgbmp *rb, + xrep_wordoff_t wordoff, + const xfs_rtword_t word) +{ + union xfs_rtword_ondisk urk; + + xfs_rtbitmap_setword(rb->sc->mp, &urk, word); + return xfile_obj_store(rb->sc->xfile, &urk, + sizeof(union xfs_rtword_ondisk), + wordoff << XFS_WORDLOG); +} + +static inline int +xfbmp_copyin( + struct xrep_rgbmp *rb, + xrep_wordoff_t wordoff, + const union xfs_rtword_ondisk *word, + xrep_wordcnt_t nr_words) +{ + return xfile_obj_store(rb->sc->xfile, word, nr_words << XFS_WORDLOG, + wordoff << XFS_WORDLOG); +} + +static inline int +xfbmp_copyout( + struct xrep_rgbmp *rb, + xrep_wordoff_t wordoff, + union xfs_rtword_ondisk *word, + xrep_wordcnt_t nr_words) +{ + return xfile_obj_load(rb->sc->xfile, word, nr_words << XFS_WORDLOG, + wordoff << XFS_WORDLOG); +} + +/* + * Preserve the portions of the rtbitmap block for the start of this rtgroup + * that map to the previous rtgroup. 
+ */ +STATIC int +xrep_rgbitmap_load_before( + struct xrep_rgbmp *rb) +{ + struct xfs_scrub *sc = rb->sc; + struct xfs_mount *mp = sc->mp; + struct xfs_rtgroup *rtg = sc->sr.rtg; + struct xfs_buf *bp; + xrep_wordoff_t wordoff; + xfs_rtblock_t group_rtbno; + xfs_rtxnum_t group_rtx, rbmoff_rtx; + xfs_rtword_t ondisk_word; + xfs_rtword_t xfile_word; + xfs_rtword_t mask; + xrep_wordcnt_t wordcnt; + int bit; + int error; + + /* + * Compute the file offset within the rtbitmap block that corresponds + * to the start of this group, and decide if we need to read blocks + * from the group before this one. + */ + group_rtbno = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, 0); + group_rtx = xfs_rtb_to_rtxt(mp, group_rtbno); + + rb->group_rbmoff = xfs_rtx_to_rbmblock(mp, group_rtx); + rbmoff_rtx = xfs_rbmblock_to_rtx(mp, rb->group_rbmoff); + rb->prep_wordoff = rtx_to_wordoff(mp, rbmoff_rtx); + + trace_xrep_rgbitmap_load(rtg, rb->group_rbmoff, rbmoff_rtx, + group_rtx - 1); + + if (rbmoff_rtx == group_rtx) + return 0; + + error = xfs_rtbuf_get(mp, sc->tp, rb->group_rbmoff, 0, &bp); + if (error) { + /* + * Reading the existing rbmblock failed, and we must deal with + * the part of the rtbitmap block that corresponds to the + * previous group. The most conservative option is to fill + * that part of the bitmap with zeroes so that it won't get + * allocated. The xfile contains zeroes already, so we can + * return. + */ + return 0; + } + + /* + * Copy full rtbitmap words into memory from the beginning of the + * ondisk block until we get to the word that corresponds to the start + * of this group. 
+ */ + wordoff = rtx_to_wordoff(mp, rbmoff_rtx); + wordcnt = rtxlen_to_wordcnt(group_rtx - rbmoff_rtx); + if (wordcnt > 0) { + union xfs_rtword_ondisk *p; + + p = xfs_rbmblock_wordptr(bp, 0); + error = xfbmp_copyin(rb, wordoff, p, wordcnt); + if (error) + goto out_rele; + + trace_xrep_rgbitmap_load_words(mp, rb->group_rbmoff, wordoff, + wordcnt); + wordoff += wordcnt; + } + + /* + * Compute the bit position of the first rtextent of this group. If + * the bit position is zero, we don't have to RMW a partial word and + * move to the next step. + */ + bit = group_rtx & XREP_RTBMP_WORDMASK; + if (bit == 0) + goto out_rele; + + /* + * Create a mask of the bits that we want to load from disk. These + * bits track space in a different rtgroup, which is why we must + * preserve them even as we replace parts of the bitmap. + */ + mask = ~((((xfs_rtword_t)1 << (XFS_NBWORD - bit)) - 1) << bit); + + error = xfbmp_load(rb, wordoff, &xfile_word); + if (error) + goto out_rele; + ondisk_word = xfs_rtbitmap_getword(mp, + xfs_rbmblock_wordptr(bp, wordcnt)); + + trace_xrep_rgbitmap_load_word(mp, wordoff, bit, ondisk_word, + xfile_word, mask); + + xfile_word &= ~mask; + xfile_word |= (ondisk_word & mask); + + error = xfbmp_store(rb, wordoff, xfile_word); + if (error) + goto out_rele; + +out_rele: + xfs_trans_brelse(sc->tp, bp); + return error; +} + +/* + * Preserve the portions of the rtbitmap block for the end of this rtgroup + * that map to the next rtgroup. 
+ */ +STATIC int +xrep_rgbitmap_load_after( + struct xrep_rgbmp *rb) +{ + struct xfs_scrub *sc = rb->sc; + struct xfs_mount *mp = rb->sc->mp; + struct xfs_rtgroup *rtg = rb->sc->sr.rtg; + struct xfs_buf *bp; + xrep_wordoff_t wordoff; + xfs_rtblock_t last_rtbno; + xfs_rtxnum_t last_group_rtx, last_rbmblock_rtx; + xfs_fileoff_t last_group_rbmoff; + xfs_rtword_t ondisk_word; + xfs_rtword_t xfile_word; + xfs_rtword_t mask; + xrep_wordcnt_t wordcnt; + unsigned int last_group_word; + int bit; + int error; + + last_rtbno = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, + rtg->rtg_blockcount - 1); + last_group_rtx = xfs_rtb_to_rtxt(mp, last_rtbno); + + last_group_rbmoff = xfs_rtx_to_rbmblock(mp, last_group_rtx); + rb->group_rbmlen = last_group_rbmoff - rb->group_rbmoff + 1; + last_rbmblock_rtx = xfs_rbmblock_to_rtx(mp, last_group_rbmoff + 1) - 1; + + trace_xrep_rgbitmap_load(rtg, last_group_rbmoff, last_group_rtx + 1, + last_rbmblock_rtx); + + if (last_rbmblock_rtx == last_group_rtx || + rtg->rtg_rgno == mp->m_sb.sb_rgcount - 1) + return 0; + + error = xfs_rtbuf_get(mp, sc->tp, last_group_rbmoff, 0, &bp); + if (error) { + /* + * Reading the existing rbmblock failed, and we must deal with + * the part of the rtbitmap block that corresponds to the + * next group. The most conservative option is to fill + * that part of the bitmap with zeroes so that it won't get + * allocated. The xfile contains zeroes already, so we can + * return. + */ + return 0; + } + + /* + * Compute the bit position of the first rtextent of the next group. + * If the bit position is zero, we don't have to RMW a partial word + * and move to the next step. + */ + wordoff = rtx_to_wordoff(mp, last_group_rtx); + bit = (last_group_rtx + 1) & XREP_RTBMP_WORDMASK; + if (bit == 0) + goto copy_words; + + /* + * Create a mask of the bits that we want to load from disk. These + * bits track space in a different rtgroup, which is why we must + * preserve them even as we replace parts of the bitmap. 
+ */ + mask = (((xfs_rtword_t)1 << (XFS_NBWORD - bit)) - 1) << bit; + + error = xfbmp_load(rb, wordoff, &xfile_word); + if (error) + goto out_rele; + last_group_word = xfs_rtx_to_rbmword(mp, last_group_rtx); + ondisk_word = xfs_rtbitmap_getword(mp, + xfs_rbmblock_wordptr(bp, last_group_word)); + + trace_xrep_rgbitmap_load_word(mp, wordoff, bit, ondisk_word, + xfile_word, mask); + + xfile_word &= ~mask; + xfile_word |= (ondisk_word & mask); + + error = xfbmp_store(rb, wordoff, xfile_word); + if (error) + goto out_rele; + +copy_words: + /* Copy as many full words as we can. */ + wordoff++; + wordcnt = rtxlen_to_wordcnt(last_rbmblock_rtx - last_group_rtx); + if (wordcnt > 0) { + union xfs_rtword_ondisk *p; + + p = xfs_rbmblock_wordptr(bp, mp->m_blockwsize - wordcnt); + error = xfbmp_copyin(rb, wordoff, p, wordcnt); + if (error) + goto out_rele; + + trace_xrep_rgbitmap_load_words(mp, last_group_rbmoff, wordoff, + wordcnt); + } + +out_rele: + xfs_trans_brelse(sc->tp, bp); + return error; +} + +/* Perform a logical OR operation on an rtword in the incore bitmap. */ +static int +xrep_rgbitmap_or( + struct xrep_rgbmp *rb, + xrep_wordoff_t wordoff, + xfs_rtword_t mask) +{ + xfs_rtword_t word; + int error; + + error = xfbmp_load(rb, wordoff, &word); + if (error) + return error; + + trace_xrep_rgbitmap_or(rb->sc->mp, wordoff, mask, word); + + return xfbmp_store(rb, wordoff, word | mask); +} + +/* + * Mark as free every rt extent between the next rt block we expected to see + * in the rtrmap records and the given rt block. 
+ */ +STATIC int +xrep_rgbitmap_mark_free( + struct xrep_rgbmp *rb, + xfs_rgblock_t rgbno) +{ + struct xfs_mount *mp = rb->sc->mp; + struct xfs_rtgroup *rtg = rb->sc->sr.rtg; + xfs_rtblock_t rtbno; + xfs_rtxnum_t startrtx; + xfs_rtxnum_t nextrtx; + xrep_wordoff_t wordoff, nextwordoff; + unsigned int bit; + unsigned int bufwsize; + xfs_extlen_t mod; + xfs_rtword_t mask; + int error; + + if (!xfs_verify_rgbext(rtg, rb->next_rgbno, rgbno - rb->next_rgbno)) + return -EFSCORRUPTED; + + /* + * Convert rt blocks to rt extents. The block range we find must be + * aligned to an rtextent boundary on both ends. + */ + rtbno = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, rb->next_rgbno); + startrtx = xfs_rtb_to_rtx(mp, rtbno, &mod); + if (mod) + return -EFSCORRUPTED; + + rtbno = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, rgbno - 1); + nextrtx = xfs_rtb_to_rtx(mp, rtbno, &mod) + 1; + if (mod != mp->m_sb.sb_rextsize - 1) + return -EFSCORRUPTED; + + trace_xrep_rgbitmap_record_free(mp, startrtx, nextrtx - 1); + + /* Set bits as needed to round startrtx up to the nearest word. */ + bit = startrtx & XREP_RTBMP_WORDMASK; + if (bit) { + xfs_rtblock_t len = nextrtx - startrtx; + unsigned int lastbit; + + lastbit = XFS_RTMIN(bit + len, XFS_NBWORD); + mask = (((xfs_rtword_t)1 << (lastbit - bit)) - 1) << bit; + + error = xrep_rgbitmap_or(rb, rtx_to_wordoff(mp, startrtx), mask); + if (error || lastbit - bit == len) + return error; + startrtx += XFS_NBWORD - bit; + } + + /* Set bits as needed to round nextrtx down to the nearest word. */ + bit = nextrtx & XREP_RTBMP_WORDMASK; + if (bit) { + mask = ((xfs_rtword_t)1 << bit) - 1; + + error = xrep_rgbitmap_or(rb, rtx_to_wordoff(mp, nextrtx), mask); + if (error || startrtx + bit == nextrtx) + return error; + nextrtx -= bit; + } + + trace_xrep_rgbitmap_record_free_bulk(mp, startrtx, nextrtx - 1); + + /* Set all the words in between, up to a whole fs block at once. 
*/ + wordoff = rtx_to_wordoff(mp, startrtx); + nextwordoff = rtx_to_wordoff(mp, nextrtx); + bufwsize = mp->m_sb.sb_blocksize >> XFS_WORDLOG; + + while (wordoff < nextwordoff) { + xrep_wordoff_t rem; + xrep_wordcnt_t wordcnt; + + wordcnt = min_t(xrep_wordcnt_t, nextwordoff - wordoff, + bufwsize); + + /* + * Try to keep us aligned to sc->buf to reduce the number of + * xfile writes. + */ + rem = wordoff & (bufwsize - 1); + if (rem) + wordcnt = min_t(xrep_wordcnt_t, wordcnt, + bufwsize - rem); + + error = xfbmp_copyin(rb, wordoff, rb->sc->buf, wordcnt); + if (error) + return error; + + wordoff += wordcnt; + } + + return 0; +} + +/* Set free space in the rtbitmap based on rtrmapbt records. */ +STATIC int +xrep_rgbitmap_walk_rtrmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_rgbmp *rb = priv; + int error = 0; + + if (xchk_should_terminate(rb->sc, &error)) + return error; + + if (rb->next_rgbno < rec->rm_startblock) { + error = xrep_rgbitmap_mark_free(rb, rec->rm_startblock); + if (error) + return error; + } + + rb->next_rgbno = max(rb->next_rgbno, + rec->rm_startblock + rec->rm_blockcount); + return 0; +} + +/* + * Walk the rtrmapbt to find all the gaps between records, and mark the gaps + * in the realtime bitmap that we're computing. + */ +STATIC int +xrep_rgbitmap_find_freespace( + struct xrep_rgbmp *rb) +{ + struct xfs_scrub *sc = rb->sc; + struct xfs_mount *mp = sc->mp; + struct xfs_rtgroup *rtg = sc->sr.rtg; + int error; + + /* Prepare a buffer of ones so that we can accelerate bulk setting. */ + memset(sc->buf, 0xFF, mp->m_sb.sb_blocksize); + + xrep_rtgroup_btcur_init(sc, &sc->sr); + error = xfs_rmap_query_all(sc->sr.rmap_cur, xrep_rgbitmap_walk_rtrmap, + rb); + if (error) + goto out; + + /* + * Mark as free every possible rt extent from the last one we saw to + * the end of the rt group. 
+ */ + if (rb->next_rgbno < rtg->rtg_blockcount) { + error = xrep_rgbitmap_mark_free(rb, rtg->rtg_blockcount); + if (error) + goto out; + } + +out: + xchk_rtgroup_btcur_free(&sc->sr); + return error; +} + +static int +xrep_rgbitmap_prep_buf( + struct xfs_scrub *sc, + struct xfs_buf *bp, + void *data) +{ + struct xrep_rgbmp *rb = data; + struct xfs_mount *mp = sc->mp; + int error; + + error = xfbmp_copyout(rb, rb->prep_wordoff, + xfs_rbmblock_wordptr(bp, 0), mp->m_blockwsize); + if (error) + return error; + + if (xfs_has_rtgroups(sc->mp)) { + struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; + + hdr->rt_magic = cpu_to_be32(XFS_RTBITMAP_MAGIC); + hdr->rt_owner = cpu_to_be64(sc->ip->i_ino); + hdr->rt_blkno = cpu_to_be64(xfs_buf_daddr(bp)); + hdr->rt_lsn = 0; + uuid_copy(&hdr->rt_uuid, &sc->mp->m_sb.sb_meta_uuid); + bp->b_ops = &xfs_rtbitmap_buf_ops; + } else { + bp->b_ops = &xfs_rtbuf_ops; + } + + rb->prep_wordoff += mp->m_blockwsize; + xfs_trans_buf_set_type(sc->tp, bp, XFS_BLFT_RTBITMAP_BUF); + return 0; +} + +/* Repair the realtime bitmap for this rt group. */ +int +xrep_rgbitmap( + struct xfs_scrub *sc) +{ + struct xrep_rgbmp rb = { + .sc = sc, + .next_rgbno = 0, + }; + struct xrep_tempswap *ti = NULL; + int error; + + /* + * We require the realtime rmapbt (and atomic file updates) to rebuild + * anything. + */ + if (!xfs_has_rtrmapbt(sc->mp)) + return -EOPNOTSUPP; + + /* + * If the start or end of this rt group happens to be in the middle of + * an rtbitmap block, try to read in the parts of the bitmap that are + * from some other group. + */ + error = xrep_rgbitmap_load_before(&rb); + if (error) + return error; + error = xrep_rgbitmap_load_after(&rb); + if (error) + return error; + + /* + * Generate the new rtbitmap data. We don't need the rtbmp information + * once this call is finished. + */ + error = xrep_rgbitmap_find_freespace(&rb); + if (error) + return error; + + /* + * Try to take ILOCK_EXCL of the temporary file. 
We had better be the + * only ones holding onto this inode, but we can't block while holding + * the rtbitmap file's ILOCK_EXCL. + */ + while (!xrep_tempfile_ilock_nowait(sc)) { + if (xchk_should_terminate(sc, &error)) + return error; + delay(1); + } + + /* + * Make sure we have space allocated for the part of the bitmap + * file that corresponds to this group. + */ + xfs_trans_ijoin(sc->tp, sc->ip, 0); + xfs_trans_ijoin(sc->tp, sc->tempip, 0); + error = xrep_tempfile_prealloc(sc, rb.group_rbmoff, rb.group_rbmlen); + if (error) + return error; + + /* Last chance to abort before we start committing fixes. */ + if (xchk_should_terminate(sc, &error)) + return error; + + /* Copy the bitmap file that we generated. */ + error = xrep_tempfile_copyin(sc, rb.group_rbmoff, rb.group_rbmlen, + xrep_rgbitmap_prep_buf, &rb); + if (error) + return error; + error = xrep_tempfile_set_isize(sc, + XFS_FSB_TO_B(sc->mp, sc->mp->m_sb.sb_rbmblocks)); + if (error) + return error; + + /* + * Now swap the extents. We're done with the temporary buffer, so + * we can reuse it for the tempfile swapext information. + */ + ti = sc->buf; + error = xrep_tempswap_trans_reserve(sc, XFS_DATA_FORK, rb.group_rbmoff, + rb.group_rbmlen, ti); + if (error) + return error; + + error = xrep_tempswap_contents(sc, ti); + if (error) + return error; + ti = NULL; + + /* Free the old bitmap blocks if they are free. */ + return xrep_reap_ifork(sc, sc->tempip, XFS_DATA_FORK); +} /* Set up to repair the realtime bitmap file metadata. */ int diff --git a/fs/xfs/scrub/rtsummary_repair.c b/fs/xfs/scrub/rtsummary_repair.c index 0836c1e10504..cf160fbdc370 100644 --- a/fs/xfs/scrub/rtsummary_repair.c +++ b/fs/xfs/scrub/rtsummary_repair.c @@ -167,7 +167,8 @@ xrep_rtsummary( * so we can reuse it for the tempfile swapext information. 
*/ ti = sc->buf; - error = xrep_tempswap_trans_reserve(sc, XFS_DATA_FORK, ti); + error = xrep_tempswap_trans_reserve(sc, XFS_DATA_FORK, 0, rsumblocks, + ti); if (error) return error; diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index c9b4899c8b6a..7abd25b37c97 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -423,7 +423,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .setup = xchk_setup_rgbitmap, .scrub = xchk_rgbitmap, .has = xfs_has_rtgroups, - .repair = xrep_notsupported, + .repair = xrep_rgbitmap, }, [XFS_SCRUB_TYPE_RTRMAPBT] = { /* realtime group rmapbt */ .type = ST_RTGROUP, diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c index 9ae556fa4b7a..a8ee84379af4 100644 --- a/fs/xfs/scrub/tempfile.c +++ b/fs/xfs/scrub/tempfile.c @@ -475,6 +475,8 @@ STATIC int xrep_tempswap_prep_request( struct xfs_scrub *sc, int whichfork, + xfs_fileoff_t off, + xfs_filblks_t len, struct xrep_tempswap *tx) { struct xfs_swapext_req *req = &tx->req; @@ -497,10 +499,10 @@ xrep_tempswap_prep_request( /* Swap all mappings in both forks. */ req->ip1 = sc->tempip; req->ip2 = sc->ip; - req->startoff1 = 0; - req->startoff2 = 0; + req->startoff1 = off; + req->startoff2 = off; req->whichfork = whichfork; - req->blockcount = XFS_MAX_FILEOFF; + req->blockcount = len; req->req_flags = XFS_SWAP_REQ_LOGGED; /* Always swap sizes when we're swapping data fork mappings. 
*/ @@ -653,6 +655,8 @@ int xrep_tempswap_trans_reserve( struct xfs_scrub *sc, int whichfork, + xfs_fileoff_t off, + xfs_filblks_t len, struct xrep_tempswap *tx) { int error; @@ -661,7 +665,7 @@ xrep_tempswap_trans_reserve( ASSERT(xfs_isilocked(sc->ip, XFS_ILOCK_EXCL)); ASSERT(xfs_isilocked(sc->tempip, XFS_ILOCK_EXCL)); - error = xrep_tempswap_prep_request(sc, whichfork, tx); + error = xrep_tempswap_prep_request(sc, whichfork, off, len, tx); if (error) return error; @@ -692,7 +696,8 @@ xrep_tempswap_trans_alloc( ASSERT(sc->tp == NULL); - error = xrep_tempswap_prep_request(sc, whichfork, tx); + error = xrep_tempswap_prep_request(sc, whichfork, 0, XFS_MAX_FILEOFF, + tx); if (error) return error; diff --git a/fs/xfs/scrub/tempswap.h b/fs/xfs/scrub/tempswap.h index bef8d2d2134d..a7cd96aa2fc7 100644 --- a/fs/xfs/scrub/tempswap.h +++ b/fs/xfs/scrub/tempswap.h @@ -13,7 +13,7 @@ struct xrep_tempswap { int xrep_tempswap_grab_log_assist(struct xfs_scrub *sc); int xrep_tempswap_trans_reserve(struct xfs_scrub *sc, int whichfork, - struct xrep_tempswap *ti); + xfs_fileoff_t off, xfs_filblks_t len, struct xrep_tempswap *ti); int xrep_tempswap_trans_alloc(struct xfs_scrub *sc, int whichfork, struct xrep_tempswap *ti); diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c index bb13f0a8e4cf..1bb868a54c06 100644 --- a/fs/xfs/scrub/trace.c +++ b/fs/xfs/scrub/trace.c @@ -20,6 +20,7 @@ #include "xfs_btree_mem.h" #include "xfs_rmap.h" #include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/xfile.h" #include "scrub/xfarray.h" diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 844f49091b1d..7d086ffce7e3 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -2820,6 +2820,155 @@ TRACE_EVENT(xrep_iunlink_commit_bucket, __entry->agino) ); +#ifdef CONFIG_XFS_RT +DECLARE_EVENT_CLASS(xrep_rgbitmap_class, + TP_PROTO(struct xfs_mount *mp, xfs_rtxnum_t start, xfs_rtxnum_t end), + TP_ARGS(mp, start, end), + TP_STRUCT__entry( + 
__field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rtxnum_t, start) + __field(xfs_rtxnum_t, end) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->start = start; + __entry->end = end; + ), + TP_printk("dev %d:%d rtdev %d:%d startrtx 0x%llx endrtx 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->start, + __entry->end) +); +#define DEFINE_REPAIR_RGBITMAP_EVENT(name) \ +DEFINE_EVENT(xrep_rgbitmap_class, name, \ + TP_PROTO(struct xfs_mount *mp, xfs_rtxnum_t start, \ + xfs_rtxnum_t end), \ + TP_ARGS(mp, start, end)) +DEFINE_REPAIR_RGBITMAP_EVENT(xrep_rgbitmap_record_free); +DEFINE_REPAIR_RGBITMAP_EVENT(xrep_rgbitmap_record_free_bulk); + +TRACE_EVENT(xrep_rgbitmap_or, + TP_PROTO(struct xfs_mount *mp, unsigned long long wordoff, + xfs_rtword_t mask, xfs_rtword_t word), + TP_ARGS(mp, wordoff, mask, word), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(unsigned long long, wordoff) + __field(unsigned int, mask) + __field(unsigned int, word) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->wordoff = wordoff; + __entry->mask = mask; + __entry->word = word; + ), + TP_printk("dev %d:%d rtdev %d:%d wordoff 0x%llx mask 0x%x word 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->wordoff, + __entry->mask, + __entry->word) +); + +TRACE_EVENT(xrep_rgbitmap_load, + TP_PROTO(struct xfs_rtgroup *rtg, xfs_fileoff_t rbmoff, + xfs_rtxnum_t rtx, xfs_rtxnum_t len), + TP_ARGS(rtg, rbmoff, rtx, len), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rgnumber_t, rgno) + __field(xfs_fileoff_t, rbmoff) + __field(xfs_rtxnum_t, rtx) + __field(xfs_rtxnum_t, len) + ), + TP_fast_assign( + __entry->dev = rtg->rtg_mount->m_super->s_dev; + __entry->rtdev = 
rtg->rtg_mount->m_rtdev_targp->bt_dev; + __entry->rgno = rtg->rtg_rgno; + __entry->rbmoff = rbmoff; + __entry->rtx = rtx; + __entry->len = len; + ), + TP_printk("dev %d:%d rtdev %d:%d rgno 0x%x rbmoff 0x%llx rtx 0x%llx rtxcount 0x%llx", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rgno, + __entry->rbmoff, + __entry->rtx, + __entry->len) +); + +TRACE_EVENT(xrep_rgbitmap_load_words, + TP_PROTO(struct xfs_mount *mp, xfs_fileoff_t rbmoff, + unsigned long long wordoff, unsigned int wordcnt), + TP_ARGS(mp, rbmoff, wordoff, wordcnt), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_fileoff_t, rbmoff) + __field(unsigned long long, wordoff) + __field(unsigned int, wordcnt) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->rbmoff = rbmoff; + __entry->wordoff = wordoff; + __entry->wordcnt = wordcnt; + ), + TP_printk("dev %d:%d rtdev %d:%d rbmoff 0x%llx wordoff 0x%llx wordcnt 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rbmoff, + __entry->wordoff, + __entry->wordcnt) +); + +TRACE_EVENT(xrep_rgbitmap_load_word, + TP_PROTO(struct xfs_mount *mp, unsigned long long wordoff, + unsigned int bit, xfs_rtword_t ondisk_word, + xfs_rtword_t xfile_word, xfs_rtword_t word_mask), + TP_ARGS(mp, wordoff, bit, ondisk_word, xfile_word, word_mask), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(unsigned long long, wordoff) + __field(unsigned int, bit) + __field(xfs_rtword_t, ondisk_word) + __field(xfs_rtword_t, xfile_word) + __field(xfs_rtword_t, word_mask) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->wordoff = wordoff; + __entry->bit = bit; + __entry->ondisk_word = ondisk_word; + __entry->xfile_word = xfile_word; + __entry->word_mask = word_mask; + ), + TP_printk("dev %d:%d 
rtdev %d:%d wordoff 0x%llx bit %u ondisk 0x%x(0x%x) inmem 0x%x(0x%x) result 0x%x mask 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->wordoff, + __entry->bit, + __entry->ondisk_word, + __entry->ondisk_word & __entry->word_mask, + __entry->xfile_word, + __entry->xfile_word & ~__entry->word_mask, + (__entry->xfile_word & ~__entry->word_mask) | + (__entry->ondisk_word & __entry->word_mask), + __entry->word_mask) +); +#endif /* CONFIG_XFS_RT */ + #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
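Editorial sketch for the rgbitmap repair patch above: the bitmap arithmetic in xrep_rgbitmap_mark_free and the gap walk in xrep_rgbitmap_walk_rtrmap can be tried out in plain userspace C. This is only an illustration under stated assumptions, not the kernel code: words are 32 bits wide, a set bit means "free", the bitmap is an ordinary array instead of an xfile, and the names `struct used_ext`, `bitmap_mark_free` and `bitmap_rebuild` are hypothetical. Locking, the rtextent alignment checks and the preservation of neighbouring groups' bits are all left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBWORD   32u              /* bits per bitmap word (assumed) */
#define WORDMASK (NBWORD - 1)
#define WORDLOG  5u               /* log2(NBWORD) */

/* Hypothetical stand-in for an rmap record: [start, start + len) is in use. */
struct used_ext {
	uint64_t start;
	uint64_t len;
};

/* Set (mark free) every bit in [start, next): head word, tail word, bulk. */
static void bitmap_mark_free(uint32_t *words, uint64_t start, uint64_t next)
{
	unsigned int bit;

	assert(start <= next);
	if (start == next)
		return;

	/* Head: partial word from start up to the next word boundary. */
	bit = start & WORDMASK;
	if (bit) {
		uint64_t len = next - start;
		unsigned int lastbit = bit + len < NBWORD ? bit + len : NBWORD;

		words[start >> WORDLOG] |=
			(((uint32_t)1 << (lastbit - bit)) - 1) << bit;
		if (lastbit - bit == len)
			return;		/* the range ended inside this word */
		start += NBWORD - bit;
	}

	/* Tail: partial word after the last word boundary. */
	bit = next & WORDMASK;
	if (bit) {
		words[next >> WORDLOG] |= ((uint32_t)1 << bit) - 1;
		next -= bit;
	}

	/* Bulk: whole words in between are set to all ones. */
	for (uint64_t w = start >> WORDLOG; w < next >> WORDLOG; w++)
		words[w] = UINT32_MAX;
}

/* Walk used extents sorted by start; every gap between them is free space. */
static void bitmap_rebuild(uint32_t *words, const struct used_ext *recs,
			   size_t nr, uint64_t total)
{
	uint64_t next = 0;	/* first position not yet accounted for */

	for (size_t i = 0; i < nr; i++) {
		uint64_t end = recs[i].start + recs[i].len;

		if (next < recs[i].start)
			bitmap_mark_free(words, next, recs[i].start);
		if (next < end)
			next = end;	/* taking the max tolerates overlaps */
	}

	/* Everything after the last record is free as well. */
	if (next < total)
		bitmap_mark_free(words, next, total);
}
```

With used extents [0, 5) and [40, 64) in a zeroed 96-bit map, the three words come out as 0xFFFFFFE0, 0x000000FF and 0xFFFFFFFF. Starting from zeroes is the conservative default the patch relies on: anything not proven free stays unallocatable.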
* [PATCH 30/38] xfs: scan rt rmap when we're doing an intense rmap check of bmbt mappings 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (30 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 34/38] xfs: online repair of realtime bitmaps for a realtime group Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 28/38] xfs: cross-reference realtime bitmap to realtime rmapbt scrubber Darrick J. Wong ` (5 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Teach the bmbt scrubber how to perform a comprehensive check that the rmapbt does not contain /any/ mappings that are not described by bmbt records when it's dealing with a realtime file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/bmap.c | 60 +++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 53 insertions(+), 7 deletions(-) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index 49fffe85dde6..8ce279ae9c95 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -20,6 +20,8 @@ #include "xfs_rmap.h" #include "xfs_rmap_btree.h" #include "xfs_rtgroup.h" +#include "xfs_rtalloc.h" +#include "xfs_rtrmap_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/btree.h" @@ -673,12 +675,20 @@ xchk_bmap_check_rmap( */ check_rec = *rec; while (have_map) { + xfs_fsblock_t startblock; + if (irec.br_startoff != check_rec.rm_offset) xchk_fblock_set_corrupt(sc, sbcri->whichfork, check_rec.rm_offset); - if (irec.br_startblock != XFS_AGB_TO_FSB(sc->mp, - cur->bc_ag.pag->pag_agno, - check_rec.rm_startblock)) + if (cur->bc_btnum == XFS_BTNUM_RMAP) + startblock = XFS_AGB_TO_FSB(sc->mp, + cur->bc_ag.pag->pag_agno, + check_rec.rm_startblock); + else + startblock = xfs_rgbno_to_rtb(sc->mp, + cur->bc_ino.rtg->rtg_rgno, + check_rec.rm_startblock); + if 
(irec.br_startblock != startblock) xchk_fblock_set_corrupt(sc, sbcri->whichfork, check_rec.rm_offset); if (irec.br_blockcount > check_rec.rm_blockcount) @@ -732,6 +742,30 @@ xchk_bmap_check_ag_rmaps( return error; } +/* Make sure each rt rmap has a corresponding bmbt entry. */ +STATIC int +xchk_bmap_check_rt_rmaps( + struct xfs_scrub *sc, + struct xfs_rtgroup *rtg) +{ + struct xchk_bmap_check_rmap_info sbcri; + struct xfs_btree_cur *cur; + int error; + + xfs_rtgroup_lock(NULL, rtg, XFS_RTGLOCK_RMAP); + cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, rtg, rtg->rtg_rmapip); + + sbcri.sc = sc; + sbcri.whichfork = XFS_DATA_FORK; + error = xfs_rmap_query_all(cur, xchk_bmap_check_rmap, &sbcri); + if (error == -ECANCELED) + error = 0; + + xfs_btree_del_cursor(cur, error); + xfs_rtgroup_unlock(rtg, XFS_RTGLOCK_RMAP); + return error; +} + /* Make sure each rmap has a corresponding bmbt entry. */ STATIC int xchk_bmap_check_rmaps( @@ -749,10 +783,6 @@ xchk_bmap_check_rmaps( (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) return 0; - /* Don't support realtime rmap checks yet. */ - if (xfs_ifork_is_realtime(sc->ip, whichfork)) - return 0; - ASSERT(xfs_ifork_ptr(sc->ip, whichfork) != NULL); /* @@ -772,6 +802,22 @@ xchk_bmap_check_rmaps( (zero_size || ifp->if_nextents > 0)) return 0; + if (xfs_ifork_is_realtime(sc->ip, whichfork)) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + + for_each_rtgroup(sc->mp, rgno, rtg) { + error = xchk_bmap_check_rt_rmaps(sc, rtg); + if (error || + (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) { + xfs_rtgroup_put(rtg); + return error; + } + } + + return 0; + } + for_each_perag(sc->mp, agno, pag) { error = xchk_bmap_check_ag_rmaps(sc, whichfork, pag); if (error || ^ permalink raw reply related [flat|nested] 565+ messages in thread
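Editorial sketch for the bmbt/rmap check above: the idea of "every rmap record owned by this file must be described by the file's own mappings" can be shown with a small userspace model. Everything here is an assumption made for illustration: `struct file_map`, `struct rmap_rec`, `find_map`, `rmap_is_mapped` and `count_unmapped` are hypothetical names, the lookup is a linear search rather than an extent-tree lookup, and the way the record is advanced past each mapping is a guess at the part of the loop that this excerpt does not show.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for bmbt and rmap records. */
struct file_map {
	uint64_t off;	/* file offset, in blocks */
	uint64_t start;	/* physical start block */
	uint64_t len;	/* length, in blocks */
};

struct rmap_rec {
	uint64_t owner;	/* owning inode number */
	uint64_t off;	/* file offset, in blocks */
	uint64_t start;	/* physical start block */
	uint64_t len;	/* length, in blocks */
};

/* Find the file mapping that begins exactly at @off, or NULL. */
static const struct file_map *
find_map(const struct file_map *maps, size_t nr, uint64_t off)
{
	assert(maps != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++)
		if (maps[i].off == off)
			return &maps[i];
	return NULL;
}

/* Is every block of @rec described by one or more consecutive mappings? */
static bool rmap_is_mapped(const struct file_map *maps, size_t nr,
			   struct rmap_rec rec)
{
	while (rec.len > 0) {
		const struct file_map *m = find_map(maps, nr, rec.off);

		/* Same three comparisons as the patch: offset, block, length. */
		if (!m || m->len == 0 || m->start != rec.start ||
		    m->len > rec.len)
			return false;

		/* Consume the part of the record that this mapping covers. */
		rec.off += m->len;
		rec.start += m->len;
		rec.len -= m->len;
	}
	return true;
}

/* Count the records owned by @ino that the file's mappings do not explain. */
static size_t count_unmapped(const struct file_map *maps, size_t nr_maps,
			     const struct rmap_rec *recs, size_t nr_recs,
			     uint64_t ino)
{
	size_t bad = 0;

	for (size_t i = 0; i < nr_recs; i++)
		if (recs[i].owner == ino &&
		    !rmap_is_mapped(maps, nr_maps, recs[i]))
			bad++;
	return bad;
}
```

Records belonging to other owners are skipped, which is why the scan has to visit every rtgroup: nothing in the file's own mappings says where a stray rmap record might live.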
* [PATCH 28/38] xfs: cross-reference realtime bitmap to realtime rmapbt scrubber 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (31 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 30/38] xfs: scan rt rmap when we're doing an intense rmap check of bmbt mappings Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 32/38] xfs: online repair of realtime file bmaps Darrick J. Wong ` (4 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When we're checking the realtime rmap btree entries, cross-reference those entries with the realtime bitmap too. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/rtrmap.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c index e60b454b39f3..72fc47cc25f0 100644 --- a/fs/xfs/scrub/rtrmap.c +++ b/fs/xfs/scrub/rtrmap.c @@ -150,6 +150,23 @@ xchk_rtrmapbt_check_mergeable( memcpy(&cr->prev_rec, irec, sizeof(struct xfs_rmap_irec)); } +/* Cross-reference with other metadata. */ +STATIC void +xchk_rtrmapbt_xref( + struct xfs_scrub *sc, + struct xfs_rmap_irec *irec) +{ + xfs_rtblock_t rtbno; + + if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + rtbno = xfs_rgbno_to_rtb(sc->mp, sc->sr.rtg->rtg_rgno, + irec->rm_startblock); + + xchk_xref_is_used_rt_space(sc, rtbno, irec->rm_blockcount); +} + /* Scrub a realtime rmapbt record. */ STATIC int xchk_rtrmapbt_rec( @@ -170,6 +187,7 @@ xchk_rtrmapbt_rec( xchk_rtrmapbt_check_mergeable(bs, cr, &irec); xchk_rtrmapbt_check_overlapping(bs, cr, &irec); + xchk_rtrmapbt_xref(bs->sc, &irec); return 0; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
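Editorial sketch for the cross-reference above: an rmap record claims that a block range is in use, so the matching bits of the realtime bitmap should not say "free". The model below is a hedged illustration only. It assumes 32-bit words, a set bit meaning "free", and a plain array for the bitmap; `rtx_is_free` and `range_in_use` are hypothetical names, and the real xchk_xref_is_used_rt_space helper is not shown in this message, so it may treat partially free ranges differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NBWORD 32u	/* bits per bitmap word (assumed) */

/* Is bit @rtx set, i.e. is that rt extent marked free? */
static bool rtx_is_free(const uint32_t *words, uint64_t rtx)
{
	return (words[rtx / NBWORD] >> (rtx % NBWORD)) & 1u;
}

/*
 * Return true if no rt extent touched by blocks [bno, bno + len) is marked
 * free.  @rextsize is the number of blocks per rt extent.
 */
static bool range_in_use(const uint32_t *words, uint32_t rextsize,
			 uint64_t bno, uint64_t len)
{
	uint64_t startrtx, endrtx;

	assert(rextsize > 0 && len > 0);

	/* Round outward to the rt extents holding the first and last block. */
	startrtx = bno / rextsize;
	endrtx = (bno + len - 1) / rextsize;

	for (uint64_t rtx = startrtx; rtx <= endrtx; rtx++)
		if (rtx_is_free(words, rtx))
			return false;
	return true;
}
```

The sketch flags a range as soon as any one of its extents is free, which is the strictest reading of "used"; a scrubber built on an "is the whole extent free?" query would only catch ranges that are free from end to end.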
* [PATCH 32/38] xfs: online repair of realtime file bmaps 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong ` (32 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 28/38] xfs: cross-reference realtime bitmap to realtime rmapbt scrubber Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 35/38] xfs: online repair of the realtime rmap btree Darrick J. Wong ` (3 subsequent siblings) 37 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Repair the block mappings of realtime files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/bmap_repair.c | 127 +++++++++++++++++++++++++++++++++++++++++++- fs/xfs/scrub/common.c | 2 - fs/xfs/scrub/common.h | 3 + fs/xfs/scrub/repair.c | 93 ++++++++++++++++++++++++++++++++ fs/xfs/scrub/repair.h | 11 ++++ 5 files changed, 231 insertions(+), 5 deletions(-) diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index ca7df344581d..77d601afbcfb 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -25,10 +25,12 @@ #include "xfs_bmap_btree.h" #include "xfs_rmap.h" #include "xfs_rmap_btree.h" +#include "xfs_rtrmap_btree.h" #include "xfs_refcount.h" #include "xfs_quota.h" #include "xfs_ialloc.h" #include "xfs_ag.h" +#include "xfs_rtgroup.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -313,6 +315,116 @@ xrep_bmap_scan_ag( return error; } +#ifdef CONFIG_XFS_RT +/* Check for any obvious errors or conflicts in the file mapping. 
*/ +STATIC int +xrep_bmap_check_rtfork_rmap( + struct xfs_scrub *sc, + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec) +{ + xfs_rtblock_t rtbno; + + /* xattr extents are never stored on realtime devices */ + if (rec->rm_flags & XFS_RMAP_ATTR_FORK) + return -EFSCORRUPTED; + + /* bmbt blocks are never stored on realtime devices */ + if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) + return -EFSCORRUPTED; + + /* Data extents for non-rt files are never stored on the rt device. */ + if (!XFS_IS_REALTIME_INODE(sc->ip)) + return -EFSCORRUPTED; + + /* Check the file offsets and physical extents. */ + if (!xfs_verify_fileext(sc->mp, rec->rm_offset, rec->rm_blockcount)) + return -EFSCORRUPTED; + + /* Check that this is within the rtgroup. */ + if (!xfs_verify_rgbext(cur->bc_ino.rtg, rec->rm_startblock, + rec->rm_blockcount)) + return -EFSCORRUPTED; + + /* Make sure this isn't free space. */ + rtbno = xfs_rgbno_to_rtb(sc->mp, cur->bc_ino.rtg->rtg_rgno, + rec->rm_startblock); + return xrep_require_rtext_inuse(sc, rtbno, rec->rm_blockcount); +} + +/* Record realtime extents that belong to this inode's fork. */ +STATIC int +xrep_bmap_walk_rtrmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_bmap *rb = priv; + xfs_rtblock_t rtbno; + int error = 0; + + if (xchk_should_terminate(rb->sc, &error)) + return error; + + /* Skip extents which are not owned by this inode and fork. */ + if (rec->rm_owner != rb->sc->ip->i_ino) + return 0; + + error = xrep_bmap_check_rtfork_rmap(rb->sc, cur, rec); + if (error) + return error; + + /* + * Record all blocks allocated to this file even if the extent isn't + * for the fork we're rebuilding so that we can reset di_nblocks later. + */ + rb->nblocks += rec->rm_blockcount; + + /* If this rmap isn't for the fork we want, we're done. 
*/ + if (rb->whichfork == XFS_DATA_FORK && + (rec->rm_flags & XFS_RMAP_ATTR_FORK)) + return 0; + if (rb->whichfork == XFS_ATTR_FORK && + !(rec->rm_flags & XFS_RMAP_ATTR_FORK)) + return 0; + + rtbno = xfs_rgbno_to_rtb(cur->bc_mp, cur->bc_ino.rtg->rtg_rgno, + rec->rm_startblock); + return xrep_bmap_from_rmap(rb, rec->rm_offset, rtbno, + rec->rm_blockcount, + rec->rm_flags & XFS_RMAP_UNWRITTEN); +} + +/* Scan the realtime reverse mappings to build the new extent map. */ +STATIC int +xrep_bmap_scan_rtgroup( + struct xrep_bmap *rb, + struct xfs_rtgroup *rtg) +{ + struct xfs_scrub *sc = rb->sc; + int error; + + if (xrep_is_rtmeta_ino(sc, rtg, sc->ip->i_ino)) + return 0; + + error = xrep_rtgroup_init(sc, rtg, &sc->sr, + XFS_RTGLOCK_RMAP | XFS_RTGLOCK_BITMAP_SHARED); + if (error) + return error; + + error = xfs_rmap_query_all(sc->sr.rmap_cur, xrep_bmap_walk_rtrmap, rb); + xchk_rtgroup_btcur_free(&sc->sr); + xchk_rtgroup_free(sc, &sc->sr); + return error; +} +#else +static inline int +xrep_bmap_scan_rtgroup(struct xrep_bmap *rb, struct xfs_rtgroup *rtg) +{ + return -EFSCORRUPTED; +} +#endif + /* Find the delalloc extents from the old incore extent tree. */ STATIC int xrep_bmap_find_delalloc( @@ -362,9 +474,20 @@ xrep_bmap_find_mappings( { struct xfs_scrub *sc = rb->sc; struct xfs_perag *pag; + struct xfs_rtgroup *rtg; xfs_agnumber_t agno; + xfs_rgnumber_t rgno; int error = 0; + /* Iterate the rtrmaps for extents. */ + for_each_rtgroup(sc->mp, rgno, rtg) { + error = xrep_bmap_scan_rtgroup(rb, rtg); + if (error) { + xfs_rtgroup_put(rtg); + return error; + } + } + /* Iterate the rmaps for extents. */ for_each_perag(sc->mp, agno, pag) { error = xrep_bmap_scan_ag(rb, pag); @@ -705,10 +828,6 @@ xrep_bmap_check_inputs( return -EINVAL; } - /* Don't know how to rebuild realtime data forks. 
*/ - if (XFS_IS_REALTIME_INODE(sc->ip)) - return -EOPNOTSUPP; - return 0; } diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index fa8e0064c41d..18763d136ef5 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -760,7 +760,7 @@ xchk_rt_unlock( #ifdef CONFIG_XFS_RT /* Lock all the rt group metadata inode ILOCKs and wait for intents. */ -static int +int xchk_rtgroup_lock( struct xfs_scrub *sc, struct xchk_rt *sr, diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index 9ca2fbaac72c..e135f792cfcc 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -181,10 +181,13 @@ int xchk_rtgroup_init(struct xfs_scrub *sc, xfs_rgnumber_t rgno, void xchk_rtgroup_unlock(struct xfs_scrub *sc, struct xchk_rt *sr); void xchk_rtgroup_btcur_free(struct xchk_rt *sr); void xchk_rtgroup_free(struct xfs_scrub *sc, struct xchk_rt *sr); +int xchk_rtgroup_lock(struct xfs_scrub *sc, struct xchk_rt *sr, + unsigned int rtglock_flags); #else # define xchk_rtgroup_init(sc, rgno, sr, lockflags) (-ENOSYS) # define xchk_rtgroup_btcur_free(sr) ((void)0) # define xchk_rtgroup_free(sc, sr) ((void)0) +# define xchk_rtgroup_lock(sc, sr, lockflags) (-ENOSYS) #endif /* CONFIG_XFS_RT */ int xchk_ag_read_headers(struct xfs_scrub *sc, xfs_agnumber_t agno, diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index eb0dda2df7af..18ce73dcdf3b 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -35,6 +35,9 @@ #include "xfs_da_btree.h" #include "xfs_attr.h" #include "xfs_dir2.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -937,6 +940,73 @@ xrep_ag_init( return 0; } +#ifdef CONFIG_XFS_RT +/* Initialize all the btree cursors for a RT repair. 
*/ +static void +xrep_rtgroup_btcur_init( + struct xfs_scrub *sc, + struct xchk_rt *sr) +{ + struct xfs_mount *mp = sc->mp; + + ASSERT(sr->rtg != NULL); + + if (sc->sm->sm_type != XFS_SCRUB_TYPE_RTRMAPBT && + (sr->rtlock_flags & XFS_RTGLOCK_RMAP) && + xfs_has_rtrmapbt(mp)) + sr->rmap_cur = xfs_rtrmapbt_init_cursor(mp, sc->tp, sr->rtg, + sr->rtg->rtg_rmapip); +} + +/* + * Given a reference to a rtgroup structure, lock rtgroup btree inodes and + * create btree cursors. Must only be called to repair a regular rt file. + */ +int +xrep_rtgroup_init( + struct xfs_scrub *sc, + struct xfs_rtgroup *rtg, + struct xchk_rt *sr, + unsigned int rtglock_flags) +{ + ASSERT(sr->rtg == NULL); + + xfs_rtgroup_lock(NULL, rtg, rtglock_flags); + sr->rtlock_flags = rtglock_flags; + + /* Grab our own reference to the rtgroup structure. */ + sr->rtg = xfs_rtgroup_bump(rtg); + xrep_rtgroup_btcur_init(sc, sr); + return 0; +} + +/* Ensure that all rt blocks in the given range are not marked free. */ +int +xrep_require_rtext_inuse( + struct xfs_scrub *sc, + xfs_rtblock_t rtbno, + xfs_filblks_t len) +{ + struct xfs_mount *mp = sc->mp; + xfs_rtxnum_t startrtx; + xfs_rtxnum_t endrtx; + bool is_free = false; + int error; + + startrtx = xfs_rtb_to_rtxt(mp, rtbno); + endrtx = xfs_rtb_to_rtxt(mp, rtbno + len - 1); + + error = xfs_rtalloc_extent_is_free(mp, sc->tp, startrtx, + endrtx - startrtx + 1, &is_free); + if (error) + return error; + if (is_free) + return -EFSCORRUPTED; + + return 0; +} +#endif /* CONFIG_XFS_RT */ + /* Reinitialize the per-AG block reservation for the AG we just fixed. */ int xrep_reset_perag_resv( @@ -1261,3 +1331,26 @@ xrep_dotdot_lookup( return ino; } + +/* Are we looking at a realtime metadata inode? */ +bool +xrep_is_rtmeta_ino( + struct xfs_scrub *sc, + struct xfs_rtgroup *rtg, + xfs_ino_t ino) +{ + /* + * All filesystems have rt bitmap and summary inodes, even if they + * don't have an rt section. 
+ */ + if (ino == sc->mp->m_rbmip->i_ino) + return true; + if (ino == sc->mp->m_rsumip->i_ino) + return true; + + /* Newer rt metadata files are not guaranteed to exist */ + if (rtg->rtg_rmapip && ino == rtg->rtg_rmapip->i_ino) + return true; + + return false; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 292e252efae3..c75081185c24 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -100,6 +100,17 @@ int xrep_setup_rtbitmap(struct xfs_scrub *sc, unsigned int *resblks); void xrep_ag_btcur_init(struct xfs_scrub *sc, struct xchk_ag *sa); int xrep_ag_init(struct xfs_scrub *sc, struct xfs_perag *pag, struct xchk_ag *sa); +#ifdef CONFIG_XFS_RT +int xrep_rtgroup_init(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, + struct xchk_rt *sr, unsigned int rtglock_flags); +int xrep_require_rtext_inuse(struct xfs_scrub *sc, xfs_rtblock_t rtbno, + xfs_filblks_t len); +#else +# define xrep_rtgroup_init(sc, rtg, sr, lockflags) (-ENOSYS) +#endif /* CONFIG_XFS_RT */ + +bool xrep_is_rtmeta_ino(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, + xfs_ino_t ino); /* Metadata revalidators */
* [PATCH 35/38] xfs: online repair of the realtime rmap btree From: Darrick J. Wong @ 2022-12-30 22:18 UTC To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Repair the realtime rmap btree while mounted. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_rmap.c | 2 fs/xfs/libxfs/xfs_rmap.h | 2 fs/xfs/libxfs/xfs_rtrmap_btree.c | 2 fs/xfs/libxfs/xfs_rtrmap_btree.h | 3 fs/xfs/scrub/bmap_repair.c | 3 fs/xfs/scrub/common.c | 5 fs/xfs/scrub/cow_repair.c | 2 fs/xfs/scrub/reap.c | 5 fs/xfs/scrub/reap.h | 2 fs/xfs/scrub/repair.c | 135 +++++++ fs/xfs/scrub/repair.h | 13 + fs/xfs/scrub/rtrmap.c | 7 fs/xfs/scrub/rtrmap_repair.c | 722 ++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/scrub.c | 2 fs/xfs/scrub/trace.h | 57 +++ 16 files changed, 954 insertions(+), 9 deletions(-) create mode 100644 fs/xfs/scrub/rtrmap_repair.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 1060ea739210..17c65dce6d26 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -221,6 +221,7 @@ xfs-y += $(addprefix scrub/, \ xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \ rgsuper_repair.o \ rtbitmap_repair.o \ + rtrmap_repair.o \ rtsummary_repair.o \ ) diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index e3bff42d003d..9c678e9fded5 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -264,7 +264,7 @@ xfs_rmap_check_perag_irec( return NULL; } -static inline xfs_failaddr_t +inline xfs_failaddr_t xfs_rmap_check_rtgroup_irec( struct xfs_rtgroup *rtg, const struct 
xfs_rmap_irec *irec) diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h index e98f37c39f2f..9d0aaa16f551 100644 --- a/fs/xfs/libxfs/xfs_rmap.h +++ b/fs/xfs/libxfs/xfs_rmap.h @@ -215,6 +215,8 @@ xfs_failaddr_t xfs_rmap_btrec_to_irec(const union xfs_btree_rec *rec, struct xfs_rmap_irec *irec); xfs_failaddr_t xfs_rmap_check_perag_irec(struct xfs_perag *pag, const struct xfs_rmap_irec *irec); +xfs_failaddr_t xfs_rmap_check_rtgroup_irec(struct xfs_rtgroup *rtg, + const struct xfs_rmap_irec *irec); xfs_failaddr_t xfs_rmap_check_irec(struct xfs_btree_cur *cur, const struct xfs_rmap_irec *irec); diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 2d8130b4c187..418173f6f3ca 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -705,7 +705,7 @@ xfs_rtrmapbt_create_path( } /* Calculate the rtrmap btree size for some records. */ -static unsigned long long +unsigned long long xfs_rtrmapbt_calc_size( struct xfs_mount *mp, unsigned long long len) diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index 046a60816736..1f0a6f9620e8 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -203,4 +203,7 @@ struct xfs_imeta_update; int xfs_rtrmapbt_create(struct xfs_trans **tpp, struct xfs_imeta_path *path, struct xfs_imeta_update *ic, struct xfs_inode **ipp); +unsigned long long xfs_rtrmapbt_calc_size(struct xfs_mount *mp, + unsigned long long len); + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index 77d601afbcfb..b8cdcba984f3 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -775,7 +775,8 @@ xrep_bmap_remove_old_tree( /* Free the old bmbt blocks if they're not in use. 
*/ xfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, rb->whichfork); - return xrep_reap_fsblocks(sc, &rb->old_bmbt_blocks, &oinfo); + return xrep_reap_fsblocks(sc, &rb->old_bmbt_blocks, &oinfo, + XFS_AG_RESV_NONE); } /* Check for garbage inputs. Returns -ECANCELED if there's nothing to do. */ diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index 18763d136ef5..c2c379aae770 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -964,7 +964,10 @@ int xchk_setup_rt( struct xfs_scrub *sc) { - return xchk_trans_alloc(sc, 0); + uint resblks; + + resblks = xrep_calc_rtgroup_resblks(sc); + return xchk_trans_alloc(sc, resblks); } /* Set us up with AG headers and btree cursors. */ diff --git a/fs/xfs/scrub/cow_repair.c b/fs/xfs/scrub/cow_repair.c index d1b5915e1703..5292171e6a2b 100644 --- a/fs/xfs/scrub/cow_repair.c +++ b/fs/xfs/scrub/cow_repair.c @@ -649,7 +649,7 @@ xrep_bmap_cow( * like inode metadata. */ error = xrep_reap_fsblocks(sc, &xc->old_cowfork_fsblocks, - &XFS_RMAP_OINFO_COW); + &XFS_RMAP_OINFO_COW, XFS_AG_RESV_NONE); if (error) goto out_bitmap; diff --git a/fs/xfs/scrub/reap.c b/fs/xfs/scrub/reap.c index 151afacab982..b0b29b1e139b 100644 --- a/fs/xfs/scrub/reap.c +++ b/fs/xfs/scrub/reap.c @@ -652,12 +652,13 @@ int xrep_reap_fsblocks( struct xfs_scrub *sc, struct xfsb_bitmap *bitmap, - const struct xfs_owner_info *oinfo) + const struct xfs_owner_info *oinfo, + enum xfs_ag_resv_type type) { struct xreap_state rs = { .sc = sc, .oinfo = oinfo, - .resv = XFS_AG_RESV_NONE, + .resv = type, }; int error; diff --git a/fs/xfs/scrub/reap.h b/fs/xfs/scrub/reap.h index 6606b119b9ec..cfaef544f659 100644 --- a/fs/xfs/scrub/reap.h +++ b/fs/xfs/scrub/reap.h @@ -9,7 +9,7 @@ int xrep_reap_agblocks(struct xfs_scrub *sc, struct xagb_bitmap *bitmap, const struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type); int xrep_reap_fsblocks(struct xfs_scrub *sc, struct xfsb_bitmap *bitmap, - const struct xfs_owner_info *oinfo); + const struct xfs_owner_info *oinfo, enum 
xfs_ag_resv_type type); int xrep_reap_ifork(struct xfs_scrub *sc, struct xfs_inode *ip, int whichfork); /* Buffer cache scan context. */ diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 995b60f2d41e..b76c01e9f540 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -38,6 +38,8 @@ #include "xfs_rtrmap_btree.h" #include "xfs_rtbitmap.h" #include "xfs_rtgroup.h" +#include "xfs_rtalloc.h" +#include "xfs_imeta.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -371,6 +373,39 @@ xrep_calc_ag_resblks( return max(max(bnobt_sz, inobt_sz), max(rmapbt_sz, refcbt_sz)); } +#ifdef CONFIG_XFS_RT +/* + * Figure out how many blocks to reserve for a rtgroup repair. We calculate + * the worst case estimate for the number of blocks we'd need to rebuild one of + * any type of per-rtgroup btree. + */ +xfs_extlen_t +xrep_calc_rtgroup_resblks( + struct xfs_scrub *sc) +{ + struct xfs_mount *mp = sc->mp; + struct xfs_scrub_metadata *sm = sc->sm; + struct xfs_rtgroup *rtg; + xfs_extlen_t usedlen; + xfs_extlen_t rmapbt_sz = 0; + + if (!(sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)) + return 0; + + rtg = xfs_rtgroup_get(mp, sm->sm_agno); + usedlen = rtg->rtg_blockcount; + xfs_rtgroup_put(rtg); + + if (xfs_has_rmapbt(mp)) + rmapbt_sz = xfs_rtrmapbt_calc_size(mp, usedlen); + + trace_xrep_calc_rtgroup_resblks_btsize(mp, sm->sm_agno, usedlen, + rmapbt_sz); + + return rmapbt_sz; +} +#endif /* CONFIG_XFS_RT */ + /* * Reconstructing per-AG Btrees * @@ -1354,3 +1389,103 @@ xrep_is_rtmeta_ino( return false; } + +/* Check the sanity of a rmap record for a metadata btree inode. */ +int +xrep_check_ino_btree_mapping( + struct xfs_scrub *sc, + const struct xfs_rmap_irec *rec) +{ + enum xbtree_recpacking outcome; + int error; + + /* + * Metadata btree inodes never have extended attributes, and all blocks + * should have the bmbt block flag set. 
+ */ + if ((rec->rm_flags & XFS_RMAP_ATTR_FORK) || + !(rec->rm_flags & XFS_RMAP_BMBT_BLOCK)) + return -EFSCORRUPTED; + + /* Make sure the block is within the AG. */ + if (!xfs_verify_agbext(sc->sa.pag, rec->rm_startblock, + rec->rm_blockcount)) + return -EFSCORRUPTED; + + /* Make sure this isn't free space. */ + error = xfs_alloc_has_records(sc->sa.bno_cur, rec->rm_startblock, + rec->rm_blockcount, &outcome); + if (error) + return error; + if (outcome != XBTREE_RECPACKING_EMPTY) + return -EFSCORRUPTED; + + return 0; +} + +/* + * Reset the block count of the inode being repaired, and adjust the dquot + * block usage to match. The inode must not have an xattr fork. + */ +void +xrep_inode_set_nblocks( + struct xfs_scrub *sc, + int64_t new_blocks) +{ + int64_t delta; + + delta = new_blocks - sc->ip->i_nblocks; + sc->ip->i_nblocks = new_blocks; + + xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE); + if (delta != 0) + xfs_trans_mod_dquot_byino(sc->tp, sc->ip, XFS_TRANS_DQ_BCOUNT, + delta); +} + +/* Reset the block reservation for a metadata inode. */ +int +xrep_reset_imeta_reservation( + struct xfs_scrub *sc) +{ + struct xfs_inode *ip = sc->ip; + int64_t delta; + int error; + + delta = ip->i_nblocks + ip->i_delayed_blks - ip->i_meta_resv_asked; + if (delta == 0) + return 0; + + if (delta > 0) { + int64_t give_back; + + /* Too many blocks, free from the incore reservation. */ + give_back = min_t(uint64_t, delta, ip->i_delayed_blks); + if (give_back > 0) { + xfs_mod_delalloc(ip->i_mount, -give_back); + xfs_mod_fdblocks(ip->i_mount, give_back, true); + ip->i_delayed_blks -= give_back; + } + + return 0; + } + + /* Not enough reservation, try to add more. @delta is negative here. 
*/ + error = xfs_mod_fdblocks(sc->mp, delta, true); + while (error == -ENOSPC) { + delta++; + if (delta == 0) { + xfs_warn(sc->mp, +"Insufficient free space to reset space reservation for inode 0x%llx after repair.", + ip->i_ino); + return 0; + } + error = xfs_mod_fdblocks(sc->mp, delta, true); + } + if (error) + return error; + + xfs_mod_delalloc(sc->mp, -delta); + ip->i_delayed_blks += -delta; + return 0; +} diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index a0ed79506195..ff8605849a72 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -88,6 +88,7 @@ int xrep_setup_parent(struct xfs_scrub *sc); int xrep_setup_nlinks(struct xfs_scrub *sc); int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *resblks); int xrep_setup_rgbitmap(struct xfs_scrub *sc, unsigned int *resblks); +int xrep_setup_rtrmapbt(struct xfs_scrub *sc); int xrep_xattr_reset_fork(struct xfs_scrub *sc); @@ -107,12 +108,16 @@ int xrep_rtgroup_init(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, void xrep_rtgroup_btcur_init(struct xfs_scrub *sc, struct xchk_rt *sr); int xrep_require_rtext_inuse(struct xfs_scrub *sc, xfs_rtblock_t rtbno, xfs_filblks_t len); +xfs_extlen_t xrep_calc_rtgroup_resblks(struct xfs_scrub *sc); #else # define xrep_rtgroup_init(sc, rtg, sr, lockflags) (-ENOSYS) +# define xrep_calc_rtgroup_resblks(sc) (0) #endif /* CONFIG_XFS_RT */ bool xrep_is_rtmeta_ino(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, xfs_ino_t ino); +int xrep_check_ino_btree_mapping(struct xfs_scrub *sc, + const struct xfs_rmap_irec *rec); /* Metadata revalidators */ @@ -146,11 +151,13 @@ int xrep_rtbitmap(struct xfs_scrub *sc); int xrep_rtsummary(struct xfs_scrub *sc); int xrep_rgsuperblock(struct xfs_scrub *sc); int xrep_rgbitmap(struct xfs_scrub *sc); +int xrep_rtrmapbt(struct xfs_scrub *sc); #else # define xrep_rtbitmap xrep_notsupported # define xrep_rtsummary xrep_notsupported # define xrep_rgsuperblock xrep_notsupported # define xrep_rgbitmap xrep_notsupported +# define 
xrep_rtrmapbt xrep_notsupported #endif /* CONFIG_XFS_RT */ #ifdef CONFIG_XFS_QUOTA @@ -170,6 +177,8 @@ void xrep_trans_cancel_hook_dummy(void **cookiep, struct xfs_trans *tp); bool xrep_buf_verify_struct(struct xfs_buf *bp, const struct xfs_buf_ops *ops); xfs_ino_t xrep_dotdot_lookup(struct xfs_scrub *sc); +void xrep_inode_set_nblocks(struct xfs_scrub *sc, int64_t new_blocks); +int xrep_reset_imeta_reservation(struct xfs_scrub *sc); #else @@ -192,6 +201,8 @@ xrep_calc_ag_resblks( return 0; } +#define xrep_calc_rtgroup_resblks xrep_calc_ag_resblks + static inline int xrep_reset_perag_resv( struct xfs_scrub *sc) @@ -217,6 +228,7 @@ xrep_setup_nothing( #define xrep_setup_directory xrep_setup_nothing #define xrep_setup_parent xrep_setup_nothing #define xrep_setup_nlinks xrep_setup_nothing +#define xrep_setup_rtrmapbt xrep_setup_nothing #define xrep_setup_inode(sc, imap) ((void)0) @@ -272,6 +284,7 @@ static inline int xrep_setup_rgbitmap(struct xfs_scrub *sc, unsigned int *x) #define xrep_symlink xrep_notsupported #define xrep_rgsuperblock xrep_notsupported #define xrep_rgbitmap xrep_notsupported +#define xrep_rtrmapbt xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c index e9ca9670f3af..5442325a6982 100644 --- a/fs/xfs/scrub/rtrmap.c +++ b/fs/xfs/scrub/rtrmap.c @@ -26,6 +26,7 @@ #include "scrub/common.h" #include "scrub/btree.h" #include "scrub/trace.h" +#include "scrub/repair.h" /* Set us up with the realtime metadata locked. 
*/ int @@ -43,6 +44,12 @@ xchk_setup_rtrmapbt( if (!rtg) return -ENOENT; + if (xchk_could_repair(sc)) { + error = xrep_setup_rtrmapbt(sc); + if (error) + return error; + } + error = xchk_setup_rt(sc); if (error) goto out_rtg; diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c new file mode 100644 index 000000000000..d856a4e46d6f --- /dev/null +++ b/fs/xfs/scrub/rtrmap_repair.c @@ -0,0 +1,722 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_btree_staging.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_alloc.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_inode.h" +#include "xfs_icache.h" +#include "xfs_bmap.h" +#include "xfs_bmap_btree.h" +#include "xfs_quota.h" +#include "xfs_rtalloc.h" +#include "xfs_ag.h" +#include "xfs_rtgroup.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" +#include "scrub/xfile.h" +#include "scrub/xfarray.h" +#include "scrub/iscan.h" +#include "scrub/newbt.h" +#include "scrub/reap.h" + +/* + * Realtime Reverse Mapping Btree Repair + * ===================================== + * + * This isn't quite as difficult as repairing the rmap btree on the data + * device, since we only store the data fork extents of realtime files on the + * realtime device. We still have to freeze the filesystem and stop the + * background threads like we do for the rmap repair, but we only have to scan + * realtime inodes. 
+ * + * Collecting entries for the new realtime rmap btree is easy -- all we have + * to do is generate rtrmap entries from the data fork mappings of all realtime + * files in the filesystem. We then scan the rmap btrees of the data device + * looking for extents belonging to the old btree and note them in a bitmap. + * + * To rebuild the realtime rmap btree, we bulk-load the collected mappings into + * a new btree cursor and atomically swap that into the realtime inode. Then + * we can free the blocks from the old btree. + * + * We use the 'xrep_rtrmap' prefix for all the rmap functions. + */ + +/* Set us up to repair rt reverse mapping btrees. */ +int +xrep_setup_rtrmapbt( + struct xfs_scrub *sc) +{ + /* For now this is a placeholder until we land other pieces. */ + return 0; +} + +/* + * Packed rmap record. The UNWRITTEN flags are hidden in the upper bits of + * offset, just like the on-disk record. + */ +struct xrep_rtrmap_extent { + xfs_rgblock_t startblock; + xfs_extlen_t blockcount; + uint64_t owner; + uint64_t offset; +} __packed; + +/* Context for collecting rmaps */ +struct xrep_rtrmap { + /* new rtrmapbt information */ + struct xrep_newbt new_btree; + + /* rmap records generated from primary metadata */ + struct xfarray *rtrmap_records; + + struct xfs_scrub *sc; + + /* bitmap of old rtrmapbt blocks */ + struct xfsb_bitmap old_rtrmapbt_blocks; + + /* inode scan cursor */ + struct xchk_iscan iscan; + + /* get_records()'s position in the rtrmap record array. */ + xfarray_idx_t array_cur; +}; + +/* Make sure there's nothing funny about this mapping. */ +STATIC int +xrep_rtrmap_check_mapping( + struct xfs_scrub *sc, + const struct xfs_rmap_irec *rec) +{ + xfs_rtblock_t rtbno; + + if (xfs_rmap_check_rtgroup_irec(sc->sr.rtg, rec) != NULL) + return -EFSCORRUPTED; + + /* Make sure this isn't free space. 
*/ + rtbno = xfs_rgbno_to_rtb(sc->mp, sc->sr.rtg->rtg_rgno, + rec->rm_startblock); + return xrep_require_rtext_inuse(sc, rtbno, rec->rm_blockcount); +} + +/* Store a reverse-mapping record. */ +static inline int +xrep_rtrmap_stash( + struct xrep_rtrmap *rr, + xfs_rgblock_t startblock, + xfs_extlen_t blockcount, + uint64_t owner, + uint64_t offset, + unsigned int flags) +{ + struct xrep_rtrmap_extent rre = { + .startblock = startblock, + .blockcount = blockcount, + .owner = owner, + }; + struct xfs_rmap_irec rmap = { + .rm_startblock = startblock, + .rm_blockcount = blockcount, + .rm_owner = owner, + .rm_offset = offset, + .rm_flags = flags, + }; + struct xfs_scrub *sc = rr->sc; + int error = 0; + + if (xchk_should_terminate(sc, &error)) + return error; + + trace_xrep_rtrmap_found(sc->mp, &rmap); + + rre.offset = xfs_rmap_irec_offset_pack(&rmap); + return xfarray_append(rr->rtrmap_records, &rre); +} + +/* Finding all file and bmbt extents. */ + +/* Context for accumulating rmaps for an inode fork. */ +struct xrep_rtrmap_ifork { + /* + * Accumulate rmap data here to turn multiple adjacent bmaps into a + * single rmap. + */ + struct xfs_rmap_irec accum; + + struct xrep_rtrmap *rr; +}; + +/* Stash an rmap that we accumulated while walking an inode fork. */ +STATIC int +xrep_rtrmap_stash_accumulated( + struct xrep_rtrmap_ifork *rf) +{ + if (rf->accum.rm_blockcount == 0) + return 0; + + return xrep_rtrmap_stash(rf->rr, rf->accum.rm_startblock, + rf->accum.rm_blockcount, rf->accum.rm_owner, + rf->accum.rm_offset, rf->accum.rm_flags); +} + +/* Accumulate a bmbt record. 
*/ +STATIC int +xrep_rtrmap_visit_bmbt( + struct xfs_btree_cur *cur, + struct xfs_bmbt_irec *rec, + void *priv) +{ + struct xrep_rtrmap_ifork *rf = priv; + struct xfs_rmap_irec *accum = &rf->accum; + struct xfs_mount *mp = rf->rr->sc->mp; + xfs_rgnumber_t rgno; + xfs_rgblock_t rgbno; + unsigned int rmap_flags = 0; + int error; + + rgbno = xfs_rtb_to_rgbno(mp, rec->br_startblock, &rgno); + if (rgno != rf->rr->sc->sr.rtg->rtg_rgno) + return 0; + + if (rec->br_state == XFS_EXT_UNWRITTEN) + rmap_flags |= XFS_RMAP_UNWRITTEN; + + /* If this bmap is adjacent to the previous one, just add it. */ + if (accum->rm_blockcount > 0 && + rec->br_startoff == accum->rm_offset + accum->rm_blockcount && + rgbno == accum->rm_startblock + accum->rm_blockcount && + rmap_flags == accum->rm_flags) { + accum->rm_blockcount += rec->br_blockcount; + return 0; + } + + /* Otherwise stash the old rmap and start accumulating a new one. */ + error = xrep_rtrmap_stash_accumulated(rf); + if (error) + return error; + + accum->rm_startblock = rgbno; + accum->rm_blockcount = rec->br_blockcount; + accum->rm_offset = rec->br_startoff; + accum->rm_flags = rmap_flags; + return 0; +} + +/* + * Iterate the block mapping btree to collect rmap records for anything in this + * fork that maps to the rt volume. Sets @mappings_done to true if we've + * scanned the block mappings in this fork. + */ +STATIC int +xrep_rtrmap_scan_bmbt( + struct xrep_rtrmap_ifork *rf, + struct xfs_inode *ip, + bool *mappings_done) +{ + struct xrep_rtrmap *rr = rf->rr; + struct xfs_btree_cur *cur; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + int error = 0; + + *mappings_done = false; + + /* + * If the incore extent cache is already loaded, we'll just use the + * incore extent scanner to record mappings. Don't bother walking the + * ondisk extent tree. + */ + if (!xfs_need_iread_extents(ifp)) + return 0; + + /* Accumulate all the mappings in the bmap btree. 
*/ + cur = xfs_bmbt_init_cursor(rr->sc->mp, rr->sc->tp, ip, XFS_DATA_FORK); + error = xfs_bmap_query_all(cur, xrep_rtrmap_visit_bmbt, rf); + xfs_btree_del_cursor(cur, error); + if (error) + return error; + + /* Stash any remaining accumulated rmaps and exit. */ + *mappings_done = true; + return xrep_rtrmap_stash_accumulated(rf); +} + +/* + * Iterate the in-core extent cache to collect rmap records for anything in + * this fork that maps to the rtgroup being repaired. + */ +STATIC int +xrep_rtrmap_scan_iext( + struct xrep_rtrmap_ifork *rf, + struct xfs_ifork *ifp) +{ + struct xfs_bmbt_irec rec; + struct xfs_iext_cursor icur; + int error; + + for_each_xfs_iext(ifp, &icur, &rec) { + if (isnullstartblock(rec.br_startblock)) + continue; + error = xrep_rtrmap_visit_bmbt(NULL, &rec, rf); + if (error) + return error; + } + + return xrep_rtrmap_stash_accumulated(rf); +} + +/* Find all the extents on the realtime device mapped by an inode fork. */ +STATIC int +xrep_rtrmap_scan_dfork( + struct xrep_rtrmap *rr, + struct xfs_inode *ip) +{ + struct xrep_rtrmap_ifork rf = { + .accum = { .rm_owner = ip->i_ino, }, + .rr = rr, + }; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + int error = 0; + + if (ifp->if_format == XFS_DINODE_FMT_BTREE) { + bool mappings_done; + + /* + * Scan the bmbt for mappings. If the incore extent tree is + * loaded, we want to scan the cached mappings since that's + * faster when the extent counts are very high. + */ + error = xrep_rtrmap_scan_bmbt(&rf, ip, &mappings_done); + if (error || mappings_done) + return error; + } else if (ifp->if_format != XFS_DINODE_FMT_EXTENTS) { + /* realtime data forks should only be extents or btree */ + return -EFSCORRUPTED; + } + + /* Scan incore extent cache. */ + return xrep_rtrmap_scan_iext(&rf, ifp); +} + +/* Record reverse mappings for a file. */ +STATIC int +xrep_rtrmap_scan_inode( + struct xrep_rtrmap *rr, + struct xfs_inode *ip) +{ + unsigned int lock_mode; + int error = 0; + + /* Skip the rt rmap btree inode. 
*/ + if (rr->sc->ip == ip) + return 0; + + xfs_ilock(ip, XFS_IOLOCK_SHARED | XFS_MMAPLOCK_SHARED); + lock_mode = xfs_ilock_data_map_shared(ip); + + /* Check the data fork if it's on the realtime device. */ + if (XFS_IS_REALTIME_INODE(ip)) { + error = xrep_rtrmap_scan_dfork(rr, ip); + if (error) + goto out_unlock; + } + + xchk_iscan_mark_visited(&rr->iscan, ip); +out_unlock: + xfs_iunlock(ip, XFS_IOLOCK_SHARED | XFS_MMAPLOCK_SHARED | lock_mode); + return error; +} + +/* Record extents that belong to the realtime rmap inode. */ +STATIC int +xrep_rtrmap_walk_rmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_rtrmap *rr = priv; + struct xfs_mount *mp = cur->bc_mp; + xfs_fsblock_t fsbno; + int error = 0; + + if (xchk_should_terminate(rr->sc, &error)) + return error; + + /* Skip extents which are not owned by this inode and fork. */ + if (rec->rm_owner != rr->sc->ip->i_ino) + return 0; + + error = xrep_check_ino_btree_mapping(rr->sc, rec); + if (error) + return error; + + fsbno = XFS_AGB_TO_FSB(mp, cur->bc_ag.pag->pag_agno, + rec->rm_startblock); + + return xfsb_bitmap_set(&rr->old_rtrmapbt_blocks, fsbno, + rec->rm_blockcount); +} + +/* Scan one AG for reverse mappings for the realtime rmap btree. */ +STATIC int +xrep_rtrmap_scan_ag( + struct xrep_rtrmap *rr, + struct xfs_perag *pag) +{ + struct xfs_scrub *sc = rr->sc; + int error; + + error = xrep_ag_init(sc, pag, &sc->sa); + if (error) + return error; + + error = xfs_rmap_query_all(sc->sa.rmap_cur, xrep_rtrmap_walk_rmap, rr); + xchk_ag_free(sc, &sc->sa); + return error; +} + +STATIC int +xrep_rtrmap_find_super_rmaps( + struct xrep_rtrmap *rr) +{ + struct xfs_scrub *sc = rr->sc; + + /* Create a record for the rtgroup superblock. */ + return xrep_rtrmap_stash(rr, 0, sc->mp->m_sb.sb_rextsize, + XFS_RMAP_OWN_FS, 0, 0); +} + +/* Generate all the reverse-mappings for the realtime device. 
*/ +STATIC int +xrep_rtrmap_find_rmaps( + struct xrep_rtrmap *rr) +{ + struct xfs_scrub *sc = rr->sc; + struct xfs_perag *pag; + struct xfs_inode *ip; + xfs_agnumber_t agno; + int error; + + /* Generate rmaps for the rtgroup superblock */ + error = xrep_rtrmap_find_super_rmaps(rr); + if (error) + return error; + + /* + * Set up for a potentially lengthy filesystem scan by reducing our + * transaction resource usage for the duration. Specifically: + * + * Unlock the realtime metadata inodes and cancel the transaction to + * release the log grant space while we scan the filesystem. + * + * Create a new empty transaction to eliminate the possibility of the + * inode scan deadlocking on cyclical metadata. + * + * We pass the empty transaction to the file scanning function to avoid + * repeatedly cycling empty transactions. This can be done even though + * we take the IOLOCK to quiesce the file because empty transactions + * do not take sb_internal. + */ + xchk_trans_cancel(sc); + xchk_rtgroup_unlock(sc, &sc->sr); + error = xchk_trans_alloc_empty(sc); + if (error) + return error; + + while ((error = xchk_iscan_iter(sc, &rr->iscan, &ip)) == 1) { + error = xrep_rtrmap_scan_inode(rr, ip); + xchk_irele(sc, ip); + if (error) + break; + + if (xchk_should_terminate(sc, &error)) + break; + } + if (error) + return error; + + /* + * Switch out for a real transaction and lock the RT metadata in + * preparation for building a new tree. + */ + xchk_trans_cancel(sc); + error = xchk_setup_rt(sc); + if (error) + return error; + error = xchk_rtgroup_lock(sc, &sc->sr, XCHK_RTGLOCK_ALL); + if (error) + return error; + + /* Scan for old rtrmap blocks. */ + for_each_perag(sc->mp, agno, pag) { + error = xrep_rtrmap_scan_ag(rr, pag); + if (error) { + xfs_perag_put(pag); + return error; + } + } + + return 0; +} + +/* Building the new rtrmap btree. */ + +/* Retrieve rtrmapbt data for bulk load. 
*/ +STATIC int +xrep_rtrmap_get_records( + struct xfs_btree_cur *cur, + unsigned int idx, + struct xfs_btree_block *block, + unsigned int nr_wanted, + void *priv) +{ + struct xrep_rtrmap_extent rec; + struct xfs_rmap_irec *irec = &cur->bc_rec.r; + struct xrep_rtrmap *rr = priv; + union xfs_btree_rec *block_rec; + unsigned int loaded; + int error; + + for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { + error = xfarray_load_next(rr->rtrmap_records, &rr->array_cur, + &rec); + if (error) + return error; + + irec->rm_startblock = rec.startblock; + irec->rm_blockcount = rec.blockcount; + irec->rm_owner = rec.owner; + + if (xfs_rmap_irec_offset_unpack(rec.offset, irec) != NULL) + return -EFSCORRUPTED; + + error = xrep_rtrmap_check_mapping(rr->sc, irec); + if (error) + return error; + + block_rec = xfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + } + + return loaded; +} + +/* Feed one of the new btree blocks to the bulk loader. */ +STATIC int +xrep_rtrmap_claim_block( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr, + void *priv) +{ + struct xrep_rtrmap *rr = priv; + + return xrep_newbt_claim_block(cur, &rr->new_btree, ptr); +} + +/* Figure out how much space we need to create the incore btree root block. */ +STATIC size_t +xrep_rtrmap_iroot_size( + struct xfs_btree_cur *cur, + unsigned int level, + unsigned int nr_this_level, + void *priv) +{ + return xfs_rtrmap_broot_space_calc(cur->bc_mp, level, nr_this_level); +} + +/* + * Use the collected rmap information to stage a new rmap btree. If this is + * successful we'll return with the new btree root information logged to the + * repair transaction but not yet committed. This implements the bulk-load + * step described in the comment at the top of this file. 
+ */ +STATIC int +xrep_rtrmap_build_new_tree( + struct xrep_rtrmap *rr) +{ + struct xfs_owner_info oinfo; + struct xfs_scrub *sc = rr->sc; + struct xfs_rtgroup *rtg = sc->sr.rtg; + struct xfs_btree_cur *rmap_cur; + uint64_t nr_records; + int error; + + /* + * Prepare to construct the new btree by reserving disk space for the + * new btree and setting up all the accounting information we'll need + * to root the new btree while it's under construction and before we + * attach it to the realtime rmapbt inode. + */ + xfs_rmap_ino_bmbt_owner(&oinfo, rtg->rtg_rmapip->i_ino, XFS_DATA_FORK); + error = xrep_newbt_init_inode(&rr->new_btree, sc, XFS_DATA_FORK, + &oinfo); + if (error) + return error; + rr->new_btree.bload.get_records = xrep_rtrmap_get_records; + rr->new_btree.bload.claim_block = xrep_rtrmap_claim_block; + rr->new_btree.bload.iroot_size = xrep_rtrmap_iroot_size; + + rmap_cur = xfs_rtrmapbt_stage_cursor(sc->mp, rtg, rtg->rtg_rmapip, + &rr->new_btree.ifake); + + nr_records = xfarray_length(rr->rtrmap_records); + + /* Compute how many blocks we'll need for the rmaps collected. */ + error = xfs_btree_bload_compute_geometry(rmap_cur, + &rr->new_btree.bload, nr_records); + if (error) + goto err_cur; + + /* Last chance to abort before we start committing fixes. */ + if (xchk_should_terminate(sc, &error)) + goto err_cur; + + /* + * Guess how many blocks we're going to need to rebuild an entire + * rtrmapbt from the number of extents we found, and pump up our + * transaction to have sufficient block reservation. We're allowed + * to exceed quota to repair inconsistent metadata, though this is + * unlikely. + */ + error = xfs_trans_reserve_more_inode(sc->tp, rtg->rtg_rmapip, + rr->new_btree.bload.nr_blocks, 0, true); + if (error) + goto err_cur; + + /* Reserve the space we'll need for the new btree. */ + error = xrep_newbt_alloc_blocks(&rr->new_btree, + rr->new_btree.bload.nr_blocks); + if (error) + goto err_cur; + + /* Add all observed rmap records. 
*/ + rr->new_btree.ifake.if_fork->if_format = XFS_DINODE_FMT_RMAP; + rr->array_cur = XFARRAY_CURSOR_INIT; + error = xfs_btree_bload(rmap_cur, &rr->new_btree.bload, rr); + if (error) + goto err_cur; + + /* + * Install the new rtrmap btree in the inode. After this point the old + * btree is no longer accessible, the new tree is live, and we can + * delete the cursor. + */ + xfs_rtrmapbt_commit_staged_btree(rmap_cur, sc->tp); + xrep_inode_set_nblocks(rr->sc, rr->new_btree.ifake.if_blocks); + xfs_btree_del_cursor(rmap_cur, 0); + + /* Dispose of any unused blocks and the accounting information. */ + error = xrep_newbt_commit(&rr->new_btree); + if (error) + return error; + + return xrep_roll_trans(sc); + +err_cur: + xfs_btree_del_cursor(rmap_cur, error); + xrep_newbt_cancel(&rr->new_btree); + return error; +} + +/* Reaping the old btree. */ + +/* Reap the old rtrmapbt blocks. */ +STATIC int +xrep_rtrmap_remove_old_tree( + struct xrep_rtrmap *rr) +{ + struct xfs_owner_info oinfo; + int error; + + /* + * Free all the extents that were allocated to the former rtrmapbt and + * aren't cross-linked with something else. + */ + xfs_rmap_ino_bmbt_owner(&oinfo, rr->sc->ip->i_ino, XFS_DATA_FORK); + error = xrep_reap_fsblocks(rr->sc, &rr->old_rtrmapbt_blocks, &oinfo, + XFS_AG_RESV_IMETA); + if (error) + return error; + + /* + * Ensure the proper reservation for the rtrmap inode so that we don't + * fail to expand the new btree. + */ + return xrep_reset_imeta_reservation(rr->sc); +} + +/* Repair the realtime rmap btree. */ +int +xrep_rtrmapbt( + struct xfs_scrub *sc) +{ + struct xrep_rtrmap *rr; + int error; + + /* Functionality is not yet complete. */ + return xrep_notsupported(sc); + + /* Make sure any problems with the fork are fixed. 
*/ + error = xrep_metadata_inode_forks(sc); + if (error) + return error; + + rr = kzalloc(sizeof(struct xrep_rtrmap), XCHK_GFP_FLAGS); + if (!rr) + return -ENOMEM; + rr->sc = sc; + + xfsb_bitmap_init(&rr->old_rtrmapbt_blocks); + + /* Set up some storage */ + error = xfarray_create(sc->mp, "rtrmap records", 0, + sizeof(struct xrep_rtrmap_extent), &rr->rtrmap_records); + if (error) + goto out_bitmap; + + /* Retry iget every tenth of a second for up to 30 seconds. */ + xchk_iscan_start(&rr->iscan, 30000, 100); + + /* Collect rmaps for realtime files. */ + error = xrep_rtrmap_find_rmaps(rr); + if (error) + goto out_records; + + xfs_trans_ijoin(sc->tp, sc->ip, 0); + + /* Rebuild the rtrmap information. */ + error = xrep_rtrmap_build_new_tree(rr); + if (error) + goto out_records; + + /* Kill the old tree. */ + error = xrep_rtrmap_remove_old_tree(rr); + +out_records: + xchk_iscan_finish(&rr->iscan); + xfarray_destroy(rr->rtrmap_records); +out_bitmap: + xfsb_bitmap_destroy(&rr->old_rtrmapbt_blocks); + kfree(rr); + return error; +} diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index 7abd25b37c97..ab7a36efab3b 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -430,7 +430,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .setup = xchk_setup_rtrmapbt, .scrub = xchk_rtrmapbt, .has = xfs_has_rtrmapbt, - .repair = xrep_notsupported, + .repair = xrep_rtrmapbt, }, }; diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 7d086ffce7e3..654cbcbd99ea 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -1715,6 +1715,32 @@ TRACE_EVENT(xrep_calc_ag_resblks_btsize, __entry->rmapbt_sz, __entry->refcbt_sz) ) + +#ifdef CONFIG_XFS_RT +TRACE_EVENT(xrep_calc_rtgroup_resblks_btsize, + TP_PROTO(struct xfs_mount *mp, xfs_rgnumber_t rgno, + xfs_rgblock_t usedlen, xfs_rgblock_t rmapbt_sz), + TP_ARGS(mp, rgno, usedlen, rmapbt_sz), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_rgnumber_t, rgno) + __field(xfs_rgblock_t, usedlen) + 
__field(xfs_rgblock_t, rmapbt_sz) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rgno = rgno; + __entry->usedlen = usedlen; + __entry->rmapbt_sz = rmapbt_sz; + ), + TP_printk("dev %d:%d rgno 0x%x usedlen %u rmapbt %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->rgno, + __entry->usedlen, + __entry->rmapbt_sz) +); +#endif /* CONFIG_XFS_RT */ + TRACE_EVENT(xrep_reset_counters, TP_PROTO(struct xfs_mount *mp, struct xchk_fscounters *fsc), TP_ARGS(mp, fsc), @@ -2967,6 +2993,37 @@ TRACE_EVENT(xrep_rgbitmap_load_word, (__entry->ondisk_word & __entry->word_mask), __entry->word_mask) ); + +TRACE_EVENT(xrep_rtrmap_found, + TP_PROTO(struct xfs_mount *mp, const struct xfs_rmap_irec *rec), + TP_ARGS(mp, rec), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rgblock_t, rgbno) + __field(xfs_extlen_t, len) + __field(uint64_t, owner) + __field(uint64_t, offset) + __field(unsigned int, flags) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rtdev = mp->m_rtdev_targp->bt_dev; + __entry->rgbno = rec->rm_startblock; + __entry->len = rec->rm_blockcount; + __entry->owner = rec->rm_owner; + __entry->offset = rec->rm_offset; + __entry->flags = rec->rm_flags; + ), + TP_printk("dev %d:%d rtdev %d:%d rgbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rgbno, + __entry->len, + __entry->owner, + __entry->offset, + __entry->flags) +); #endif /* CONFIG_XFS_RT */ #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 36/38] xfs: create a shadow rmap btree during realtime rmap repair
  2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong
                     ` (34 preceding siblings ...)
  2022-12-30 22:18   ` [PATCH 35/38] xfs: online repair of the realtime rmap btree Darrick J. Wong
@ 2022-12-30 22:18   ` Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 38/38] xfs: enable realtime rmap btree Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 37/38] xfs: hook live realtime rmap operations during a repair operation Darrick J. Wong
  37 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create an in-memory btree of rmap records instead of an array.  This
enables us to do live record collection instead of freezing the fs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_btree.c        |    2 +
 fs/xfs/libxfs/xfs_btree.h        |    1 
 fs/xfs/libxfs/xfs_rmap.c         |    6 +-
 fs/xfs/libxfs/xfs_rtrmap_btree.c |  122 +++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rtrmap_btree.h |    9 ++
 fs/xfs/scrub/rtrmap_repair.c     |  150 +++++++++++++++++++++++++++-----------
 fs/xfs/scrub/xfbtree.c           |    3 +
 7 files changed, 248 insertions(+), 45 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c
index fe742567a7dd..377dc9b0a6e6 100644
--- a/fs/xfs/libxfs/xfs_btree.c
+++ b/fs/xfs/libxfs/xfs_btree.c
@@ -491,6 +491,8 @@ xfs_btree_del_cursor(
 	if (cur->bc_flags & XFS_BTREE_IN_MEMORY) {
 		if (cur->bc_mem.pag)
 			xfs_perag_put(cur->bc_mem.pag);
+		if (cur->bc_mem.rtg)
+			xfs_rtgroup_put(cur->bc_mem.rtg);
 	}
 	kmem_cache_free(cur->bc_cache, cur);
 }
diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 5a733767649b..20342ed62bf4 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -266,6 +266,7 @@ struct xfs_btree_cur_mem {
 	struct xfbtree			*xfbtree;
 	struct xfs_buf			*head_bp;
 	struct xfs_perag		*pag;
+	struct xfs_rtgroup		*rtg;
 };

 struct
xfs_btree_level { diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index 9c678e9fded5..06840fc31f02 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -328,8 +328,12 @@ xfs_rmap_check_irec( struct xfs_btree_cur *cur, const struct xfs_rmap_irec *irec) { - if (cur->bc_btnum == XFS_BTNUM_RTRMAP) + if (cur->bc_btnum == XFS_BTNUM_RTRMAP) { + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) + return xfs_rmap_check_rtgroup_irec(cur->bc_mem.rtg, + irec); return xfs_rmap_check_rtgroup_irec(cur->bc_ino.rtg, irec); + } if (cur->bc_flags & XFS_BTREE_IN_MEMORY) return xfs_rmap_check_perag_irec(cur->bc_mem.pag, irec); diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 418173f6f3ca..878bfeed411f 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -29,6 +29,9 @@ #include "xfs_bmap.h" #include "xfs_imeta.h" #include "xfs_health.h" +#include "scrub/xfile.h" +#include "scrub/xfbtree.h" +#include "xfs_btree_mem.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -557,6 +560,125 @@ xfs_rtrmapbt_stage_cursor( return cur; } +#ifdef CONFIG_XFS_IN_MEMORY_BTREE +/* + * Validate an in-memory realtime rmap btree block. Callers are allowed to + * generate an in-memory btree even if the ondisk feature is not enabled. 
+ */ +static xfs_failaddr_t +xfs_rtrmapbt_mem_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_mount; + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + xfs_failaddr_t fa; + unsigned int level; + + if (!xfs_verify_magic(bp, block->bb_magic)) + return __this_address; + + fa = xfs_btree_lblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN); + if (fa) + return fa; + + level = be16_to_cpu(block->bb_level); + if (xfs_has_rmapbt(mp)) { + if (level >= mp->m_rtrmap_maxlevels) + return __this_address; + } else { + if (level >= xfs_rtrmapbt_maxlevels_ondisk()) + return __this_address; + } + + return xfbtree_lblock_verify(bp, + xfs_rtrmapbt_maxrecs(mp, xfo_to_b(1), level == 0)); +} + +static void +xfs_rtrmapbt_mem_rw_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa = xfs_rtrmapbt_mem_verify(bp); + + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); +} + +/* skip crc checks on in-memory btrees to save time */ +static const struct xfs_buf_ops xfs_rtrmapbt_mem_buf_ops = { + .name = "xfs_rtrmapbt_mem", + .magic = { 0, cpu_to_be32(XFS_RTRMAP_CRC_MAGIC) }, + .verify_read = xfs_rtrmapbt_mem_rw_verify, + .verify_write = xfs_rtrmapbt_mem_rw_verify, + .verify_struct = xfs_rtrmapbt_mem_verify, +}; + +static const struct xfs_btree_ops xfs_rtrmapbt_mem_ops = { + .rec_len = sizeof(struct xfs_rmap_rec), + .key_len = 2 * sizeof(struct xfs_rmap_key), + .geom_flags = XFS_BTREE_CRC_BLOCKS | XFS_BTREE_OVERLAPPING | + XFS_BTREE_LONG_PTRS | XFS_BTREE_IN_MEMORY, + + .dup_cursor = xfbtree_dup_cursor, + .set_root = xfbtree_set_root, + .alloc_block = xfbtree_alloc_block, + .free_block = xfbtree_free_block, + .get_minrecs = xfbtree_get_minrecs, + .get_maxrecs = xfbtree_get_maxrecs, + .init_key_from_rec = xfs_rtrmapbt_init_key_from_rec, + .init_high_key_from_rec = xfs_rtrmapbt_init_high_key_from_rec, + .init_rec_from_cur = xfs_rtrmapbt_init_rec_from_cur, + .init_ptr_from_cur = xfbtree_init_ptr_from_cur, + .key_diff = xfs_rtrmapbt_key_diff, + .buf_ops = &xfs_rtrmapbt_mem_buf_ops, + 
.diff_two_keys = xfs_rtrmapbt_diff_two_keys, + .keys_inorder = xfs_rtrmapbt_keys_inorder, + .recs_inorder = xfs_rtrmapbt_recs_inorder, + .keys_contiguous = xfs_rtrmapbt_keys_contiguous, +}; + +/* Create a cursor for an in-memory btree. */ +struct xfs_btree_cur * +xfs_rtrmapbt_mem_cursor( + struct xfs_rtgroup *rtg, + struct xfs_trans *tp, + struct xfs_buf *head_bp, + struct xfbtree *xfbtree) +{ + struct xfs_btree_cur *cur; + struct xfs_mount *mp = rtg->rtg_mount; + + /* Overlapping btree; 2 keys per pointer. */ + cur = xfs_btree_alloc_cursor(mp, tp, XFS_BTNUM_RTRMAP, + &xfs_rtrmapbt_mem_ops, mp->m_rtrmap_maxlevels, + xfs_rtrmapbt_cur_cache); + cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_rmap_2); + cur->bc_mem.xfbtree = xfbtree; + cur->bc_mem.head_bp = head_bp; + cur->bc_nlevels = xfs_btree_mem_head_nlevels(head_bp); + + cur->bc_mem.rtg = xfs_rtgroup_bump(rtg); + return cur; +} + +int +xfs_rtrmapbt_mem_create( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + struct xfs_buftarg *target, + struct xfbtree **xfbtreep) +{ + struct xfbtree_config cfg = { + .btree_ops = &xfs_rtrmapbt_mem_ops, + .target = target, + .flags = XFBTREE_DIRECT_MAP, + .owner = rgno, + }; + + return xfbtree_create(mp, &cfg, xfbtreep); +} +#endif /* CONFIG_XFS_IN_MEMORY_BTREE */ + /* * Install a new rt reverse mapping btree root. Caller is responsible for * invalidating and freeing the old btree blocks. 
diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.h b/fs/xfs/libxfs/xfs_rtrmap_btree.h index 1f0a6f9620e8..ff60a2ca945f 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.h +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.h @@ -206,4 +206,13 @@ int xfs_rtrmapbt_create(struct xfs_trans **tpp, struct xfs_imeta_path *path, unsigned long long xfs_rtrmapbt_calc_size(struct xfs_mount *mp, unsigned long long len); +#ifdef CONFIG_XFS_IN_MEMORY_BTREE +struct xfbtree; +struct xfs_btree_cur *xfs_rtrmapbt_mem_cursor(struct xfs_rtgroup *rtg, + struct xfs_trans *tp, struct xfs_buf *mhead_bp, + struct xfbtree *xfbtree); +int xfs_rtrmapbt_mem_create(struct xfs_mount *mp, xfs_rgnumber_t rgno, + struct xfs_buftarg *target, struct xfbtree **xfbtreep); +#endif /* CONFIG_XFS_IN_MEMORY_BTREE */ + #endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c index d856a4e46d6f..5775efa67de6 100644 --- a/fs/xfs/scrub/rtrmap_repair.c +++ b/fs/xfs/scrub/rtrmap_repair.c @@ -12,6 +12,7 @@ #include "xfs_defer.h" #include "xfs_btree.h" #include "xfs_btree_staging.h" +#include "xfs_btree_mem.h" #include "xfs_bit.h" #include "xfs_log_format.h" #include "xfs_trans.h" @@ -40,6 +41,7 @@ #include "scrub/iscan.h" #include "scrub/newbt.h" #include "scrub/reap.h" +#include "scrub/xfbtree.h" /* * Realtime Reverse Mapping Btree Repair @@ -68,28 +70,16 @@ int xrep_setup_rtrmapbt( struct xfs_scrub *sc) { - /* For now this is a placeholder until we land other pieces. */ - return 0; + return xrep_setup_buftarg(sc, "rtrmapbt repair"); } -/* - * Packed rmap record. The UNWRITTEN flags are hidden in the upper bits of - * offset, just like the on-disk record. 
- */
-struct xrep_rtrmap_extent {
-	xfs_rgblock_t		startblock;
-	xfs_extlen_t		blockcount;
-	uint64_t		owner;
-	uint64_t		offset;
-} __packed;
-
 /* Context for collecting rmaps */
 struct xrep_rtrmap {
 	/* new rtrmapbt information */
 	struct xrep_newbt	new_btree;

 	/* rmap records generated from primary metadata */
-	struct xfarray		*rtrmap_records;
+	struct xfbtree		*rtrmap_btree;

 	struct xfs_scrub	*sc;

@@ -99,8 +89,11 @@ struct xrep_rtrmap {
 	/* inode scan cursor */
 	struct xchk_iscan	iscan;

-	/* get_records()'s position in the free space record array. */
-	xfarray_idx_t		array_cur;
+	/* in-memory btree cursor for the ->get_records walk */
+	struct xfs_btree_cur	*mcur;
+
+	/* Number of records we're staging in the new btree. */
+	uint64_t		nr_records;
 };

 /* Make sure there's nothing funny about this mapping. */
@@ -130,11 +123,6 @@ xrep_rtrmap_stash(
 	uint64_t		offset,
 	unsigned int		flags)
 {
-	struct xrep_rtrmap_extent	rre = {
-		.startblock		= startblock,
-		.blockcount		= blockcount,
-		.owner			= owner,
-	};
 	struct xfs_rmap_irec	rmap = {
 		.rm_startblock	= startblock,
 		.rm_blockcount	= blockcount,
@@ -143,6 +131,8 @@ xrep_rtrmap_stash(
 		.rm_flags	= flags,
 	};
 	struct xfs_scrub	*sc = rr->sc;
+	struct xfs_btree_cur	*mcur;
+	struct xfs_buf		*mhead_bp;
 	int			error = 0;

 	if (xchk_should_terminate(sc, &error))
@@ -150,8 +140,23 @@ xrep_rtrmap_stash(

 	trace_xrep_rtrmap_found(sc->mp, &rmap);

-	rre.offset = xfs_rmap_irec_offset_pack(&rmap);
-	return xfarray_append(rr->rtrmap_records, &rre);
+	/* Add entry to in-memory btree. */
+	error = xfbtree_head_read_buf(rr->rtrmap_btree, sc->tp, &mhead_bp);
+	if (error)
+		return error;
+
+	mcur = xfs_rtrmapbt_mem_cursor(sc->sr.rtg, sc->tp, mhead_bp,
+			rr->rtrmap_btree);
+	error = xfs_rmap_map_raw(mcur, &rmap);
+	xfs_btree_del_cursor(mcur, error);
+	if (error)
+		goto out_cancel;
+
+	return xfbtree_trans_commit(rr->rtrmap_btree, sc->tp);
+
+out_cancel:
+	xfbtree_trans_cancel(rr->rtrmap_btree, sc->tp);
+	return error;
 }

 /* Finding all file and bmbt extents.
*/ @@ -395,6 +400,24 @@ xrep_rtrmap_scan_ag( return error; } +/* Count and check all collected records. */ +STATIC int +xrep_rtrmap_check_record( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_rtrmap *rr = priv; + int error; + + error = xrep_rtrmap_check_mapping(rr->sc, rec); + if (error) + return error; + + rr->nr_records++; + return 0; +} + STATIC int xrep_rtrmap_find_super_rmaps( struct xrep_rtrmap *rr) @@ -414,6 +437,8 @@ xrep_rtrmap_find_rmaps( struct xfs_scrub *sc = rr->sc; struct xfs_perag *pag; struct xfs_inode *ip; + struct xfs_buf *mhead_bp; + struct xfs_btree_cur *mcur; xfs_agnumber_t agno; int error; @@ -476,7 +501,25 @@ xrep_rtrmap_find_rmaps( } } - return 0; + /* + * Now that we have everything locked again, we need to count the + * number of rmap records stashed in the btree. This should reflect + * all actively-owned rt files in the filesystem. At the same time, + * check all our records before we start building a new btree, which + * requires the rtbitmap lock. + */ + error = xfbtree_head_read_buf(rr->rtrmap_btree, NULL, &mhead_bp); + if (error) + return error; + + mcur = xfs_rtrmapbt_mem_cursor(rr->sc->sr.rtg, NULL, mhead_bp, + rr->rtrmap_btree); + rr->nr_records = 0; + error = xfs_rmap_query_all(mcur, xrep_rtrmap_check_record, rr); + xfs_btree_del_cursor(mcur, error); + xfs_buf_relse(mhead_bp); + + return error; } /* Building the new rtrmap btree. 
*/ @@ -490,29 +533,25 @@ xrep_rtrmap_get_records( unsigned int nr_wanted, void *priv) { - struct xrep_rtrmap_extent rec; - struct xfs_rmap_irec *irec = &cur->bc_rec.r; struct xrep_rtrmap *rr = priv; union xfs_btree_rec *block_rec; unsigned int loaded; int error; for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { - error = xfarray_load_next(rr->rtrmap_records, &rr->array_cur, - &rec); + int stat = 0; + + error = xfs_btree_increment(rr->mcur, 0, &stat); if (error) return error; - - irec->rm_startblock = rec.startblock; - irec->rm_blockcount = rec.blockcount; - irec->rm_owner = rec.owner; - - if (xfs_rmap_irec_offset_unpack(rec.offset, irec) != NULL) + if (!stat) return -EFSCORRUPTED; - error = xrep_rtrmap_check_mapping(rr->sc, irec); + error = xfs_rmap_get_rec(rr->mcur, &cur->bc_rec.r, &stat); if (error) return error; + if (!stat) + return -EFSCORRUPTED; block_rec = xfs_btree_rec_addr(cur, idx, block); cur->bc_ops->init_rec_from_cur(cur, block_rec); @@ -558,7 +597,7 @@ xrep_rtrmap_build_new_tree( struct xfs_scrub *sc = rr->sc; struct xfs_rtgroup *rtg = sc->sr.rtg; struct xfs_btree_cur *rmap_cur; - uint64_t nr_records; + struct xfs_buf *mhead_bp; int error; /* @@ -579,11 +618,9 @@ xrep_rtrmap_build_new_tree( rmap_cur = xfs_rtrmapbt_stage_cursor(sc->mp, rtg, rtg->rtg_rmapip, &rr->new_btree.ifake); - nr_records = xfarray_length(rr->rtrmap_records); - /* Compute how many blocks we'll need for the rmaps collected. */ error = xfs_btree_bload_compute_geometry(rmap_cur, - &rr->new_btree.bload, nr_records); + &rr->new_btree.bload, rr->nr_records); if (error) goto err_cur; @@ -609,12 +646,25 @@ xrep_rtrmap_build_new_tree( if (error) goto err_cur; + /* + * Create a cursor to the in-memory btree so that we can bulk load the + * new btree. 
+ */ + error = xfbtree_head_read_buf(rr->rtrmap_btree, NULL, &mhead_bp); + if (error) + goto err_cur; + + rr->mcur = xfs_rtrmapbt_mem_cursor(sc->sr.rtg, NULL, mhead_bp, + rr->rtrmap_btree); + error = xfs_btree_goto_left_edge(rr->mcur); + if (error) + goto err_mcur; + /* Add all observed rmap records. */ rr->new_btree.ifake.if_fork->if_format = XFS_DINODE_FMT_RMAP; - rr->array_cur = XFARRAY_CURSOR_INIT; error = xfs_btree_bload(rmap_cur, &rr->new_btree.bload, rr); if (error) - goto err_cur; + goto err_mcur; /* * Install the new rtrmap btree in the inode. After this point the old @@ -624,6 +674,15 @@ xrep_rtrmap_build_new_tree( xfs_rtrmapbt_commit_staged_btree(rmap_cur, sc->tp); xrep_inode_set_nblocks(rr->sc, rr->new_btree.ifake.if_blocks); xfs_btree_del_cursor(rmap_cur, 0); + xfs_btree_del_cursor(rr->mcur, 0); + rr->mcur = NULL; + xfs_buf_relse(mhead_bp); + + /* + * Now that we've written the new btree to disk, we don't need to keep + * updating the in-memory btree. Abort the scan to stop live updates. + */ + xchk_iscan_abort(&rr->iscan); /* Dispose of any unused blocks and the accounting information. 
*/ error = xrep_newbt_commit(&rr->new_btree); @@ -632,6 +691,9 @@ xrep_rtrmap_build_new_tree( return xrep_roll_trans(sc); +err_mcur: + xfs_btree_del_cursor(rr->mcur, error); + xfs_buf_relse(mhead_bp); err_cur: xfs_btree_del_cursor(rmap_cur, error); xrep_newbt_cancel(&rr->new_btree); @@ -689,8 +751,8 @@ xrep_rtrmapbt( xfsb_bitmap_init(&rr->old_rtrmapbt_blocks); /* Set up some storage */ - error = xfarray_create(sc->mp, "rtrmap records", 0, - sizeof(struct xrep_rtrmap_extent), &rr->rtrmap_records); + error = xfs_rtrmapbt_mem_create(sc->mp, sc->sr.rtg->rtg_rgno, + sc->xfile_buftarg, &rr->rtrmap_btree); if (error) goto out_bitmap; @@ -714,7 +776,7 @@ xrep_rtrmapbt( out_records: xchk_iscan_finish(&rr->iscan); - xfarray_destroy(rr->rtrmap_records); + xfbtree_destroy(rr->rtrmap_btree); out_bitmap: xfsb_bitmap_destroy(&rr->old_rtrmapbt_blocks); kfree(rr); diff --git a/fs/xfs/scrub/xfbtree.c b/fs/xfs/scrub/xfbtree.c index 55d530213d40..d803bb1d151a 100644 --- a/fs/xfs/scrub/xfbtree.c +++ b/fs/xfs/scrub/xfbtree.c @@ -17,6 +17,7 @@ #include "xfs_error.h" #include "xfs_btree_mem.h" #include "xfs_ag.h" +#include "xfs_rtgroup.h" #include "scrub/scrub.h" #include "scrub/xfile.h" #include "scrub/xfbtree.h" @@ -267,6 +268,8 @@ xfbtree_dup_cursor( if (cur->bc_mem.pag) ncur->bc_mem.pag = xfs_perag_bump(cur->bc_mem.pag); + if (cur->bc_mem.rtg) + ncur->bc_mem.rtg = xfs_rtgroup_bump(cur->bc_mem.rtg); return ncur; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
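As a rough userspace illustration of why patch 36 swaps the record array for an ordered structure: a sorted store can accept inserts at any point during the scan and still hand records to a bulk loader in key order through a cursor. This is only a sketch, not kernel code. `struct rec`, `rec_store`, `rec_store_insert`, and `rec_cursor_next` are hypothetical names, and a sorted array stands in for the in-memory btree.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for an rmap record; not the kernel's struct. */
struct rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
};

/* Sorted record store standing in for the in-memory btree. */
struct rec_store {
	struct rec	*recs;
	size_t		nr;
	size_t		cap;
};

/* Order records by startblock, then by owner. */
static int rec_cmp(const struct rec *a, const struct rec *b)
{
	if (a->startblock != b->startblock)
		return a->startblock < b->startblock ? -1 : 1;
	if (a->owner != b->owner)
		return a->owner < b->owner ? -1 : 1;
	return 0;
}

/* Insert one record at its sorted position; returns 0, or -1 if out of memory. */
static int rec_store_insert(struct rec_store *rs, const struct rec *r)
{
	size_t lo = 0, hi = rs->nr;

	if (rs->nr == rs->cap) {
		size_t ncap = rs->cap ? rs->cap * 2 : 8;
		struct rec *n = realloc(rs->recs, ncap * sizeof(*n));

		if (!n)
			return -1;
		rs->recs = n;
		rs->cap = ncap;
	}

	/* Binary search for the first slot whose record sorts after @r. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (rec_cmp(&rs->recs[mid], r) <= 0)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* Shift the tail up by one slot and drop the new record in place. */
	memmove(&rs->recs[lo + 1], &rs->recs[lo],
		(rs->nr - lo) * sizeof(*r));
	rs->recs[lo] = *r;
	rs->nr++;
	return 0;
}

/* Cursor walk: copy out the next record in key order; returns 0 at the end. */
static int rec_cursor_next(const struct rec_store *rs, size_t *cur,
			   struct rec *out)
{
	assert(rs && cur && out);

	if (*cur >= rs->nr)
		return 0;
	*out = rs->recs[(*cur)++];
	return 1;
}
```

With a structure like this, the inode scan and any later live updates can both add records while the final walk still sees them in sorted order, which is what a bulk loader needs. An append-only array would need a separate sort pass once collection had stopped.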
* [PATCH 38/38] xfs: enable realtime rmap btree
  2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong
                     ` (35 preceding siblings ...)
  2022-12-30 22:18   ` [PATCH 36/38] xfs: create a shadow rmap btree during realtime rmap repair Darrick J. Wong
@ 2022-12-30 22:18   ` Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 37/38] xfs: hook live realtime rmap operations during a repair operation Darrick J. Wong
  37 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_super.c |    6 ------
 1 file changed, 6 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index e145de0bd562..4abeff701093 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1669,12 +1669,6 @@ xfs_fs_fill_super(
 		}
 	}

-	if (xfs_has_rmapbt(mp) && mp->m_sb.sb_rblocks) {
-		xfs_alert(mp,
-	"reverse mapping btree not compatible with realtime device!");
-		error = -EINVAL;
-		goto out_filestream_unmount;
-	}

 	if (xfs_has_large_extent_counts(mp))
 		xfs_warn(mp,

^ permalink raw reply related	[flat|nested] 565+ messages in thread
* [PATCH 37/38] xfs: hook live realtime rmap operations during a repair operation
  2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong
                     ` (36 preceding siblings ...)
  2022-12-30 22:18   ` [PATCH 38/38] xfs: enable realtime rmap btree Darrick J. Wong
@ 2022-12-30 22:18   ` Darrick J. Wong
  37 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Hook the regular realtime rmap code when an rtrmapbt repair operation is
running so that we can drop the rtrmap ILOCK to scan the filesystem and
keep the in-memory btree up to date during the scan.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_rmap.c     |   39 ++++++++++--
 fs/xfs/libxfs/xfs_rmap.h     |    6 ++
 fs/xfs/libxfs/xfs_rtgroup.c  |    2 -
 fs/xfs/libxfs/xfs_rtgroup.h  |    3 +
 fs/xfs/scrub/rtrmap_repair.c |  138 ++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/scrub/trace.h         |   36 +++++++++++
 6 files changed, 211 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c
index 06840fc31f02..a533588a9b5b 100644
--- a/fs/xfs/libxfs/xfs_rmap.c
+++ b/fs/xfs/libxfs/xfs_rmap.c
@@ -906,6 +906,7 @@ static inline void
 xfs_rmap_update_hook(
 	struct xfs_trans		*tp,
 	struct xfs_perag		*pag,
+	struct xfs_rtgroup		*rtg,
 	enum xfs_rmap_intent_type	op,
 	xfs_agblock_t			startblock,
 	xfs_extlen_t			blockcount,
@@ -922,6 +923,8 @@ xfs_rmap_update_hook(

 		if (pag)
 			xfs_hooks_call(&pag->pag_rmap_update_hooks, op, &p);
+		else if (rtg)
+			xfs_hooks_call(&rtg->rtg_rmap_update_hooks, op, &p);
 	}
 }

@@ -942,8 +945,28 @@ xfs_rmap_hook_del(
 {
 	xfs_hooks_del(&pag->pag_rmap_update_hooks, &hook->update_hook);
 }
+
+# ifdef CONFIG_XFS_RT
+/* Call the specified function during a rt reverse mapping update.
*/ +int +xfs_rtrmap_hook_add( + struct xfs_rtgroup *rtg, + struct xfs_rmap_hook *hook) +{ + return xfs_hooks_add(&rtg->rtg_rmap_update_hooks, &hook->update_hook); +} + +/* Stop calling the specified function during a rt reverse mapping update. */ +void +xfs_rtrmap_hook_del( + struct xfs_rtgroup *rtg, + struct xfs_rmap_hook *hook) +{ + xfs_hooks_del(&rtg->rtg_rmap_update_hooks, &hook->update_hook); +} +# endif /* CONFIG_XFS_RT */ #else -# define xfs_rmap_update_hook(t, p, o, s, b, u, oi) do { } while(0) +# define xfs_rmap_update_hook(t, p, r, o, s, b, u, oi) do { } while(0) #endif /* CONFIG_XFS_LIVE_HOOKS */ /* @@ -966,7 +989,8 @@ xfs_rmap_free( return 0; cur = xfs_rmapbt_init_cursor(mp, tp, agbp, pag); - xfs_rmap_update_hook(tp, pag, XFS_RMAP_UNMAP, bno, len, false, oinfo); + xfs_rmap_update_hook(tp, pag, NULL, XFS_RMAP_UNMAP, bno, len, false, + oinfo); error = xfs_rmap_unmap(cur, bno, len, false, oinfo); xfs_btree_del_cursor(cur, error); @@ -1210,7 +1234,8 @@ xfs_rmap_alloc( return 0; cur = xfs_rmapbt_init_cursor(mp, tp, agbp, pag); - xfs_rmap_update_hook(tp, pag, XFS_RMAP_MAP, bno, len, false, oinfo); + xfs_rmap_update_hook(tp, pag, NULL, XFS_RMAP_MAP, bno, len, false, + oinfo); error = xfs_rmap_map(cur, bno, len, false, oinfo); xfs_btree_del_cursor(cur, error); @@ -2731,8 +2756,12 @@ xfs_rmap_finish_one( if (error) return error; - xfs_rmap_update_hook(tp, ri->ri_pag, ri->ri_type, bno, - ri->ri_bmap.br_blockcount, unwritten, &oinfo); + if (ri->ri_realtime) + xfs_rmap_update_hook(tp, NULL, ri->ri_rtg, ri->ri_type, bno, + ri->ri_bmap.br_blockcount, unwritten, &oinfo); + else + xfs_rmap_update_hook(tp, ri->ri_pag, NULL, ri->ri_type, bno, + ri->ri_bmap.br_blockcount, unwritten, &oinfo); return 0; } diff --git a/fs/xfs/libxfs/xfs_rmap.h b/fs/xfs/libxfs/xfs_rmap.h index 9d0aaa16f551..36d071b3b44c 100644 --- a/fs/xfs/libxfs/xfs_rmap.h +++ b/fs/xfs/libxfs/xfs_rmap.h @@ -279,6 +279,12 @@ void xfs_rmap_hook_enable(void); int xfs_rmap_hook_add(struct xfs_perag *pag, struct 
xfs_rmap_hook *hook); void xfs_rmap_hook_del(struct xfs_perag *pag, struct xfs_rmap_hook *hook); + +# ifdef CONFIG_XFS_RT +int xfs_rtrmap_hook_add(struct xfs_rtgroup *rtg, struct xfs_rmap_hook *hook); +void xfs_rtrmap_hook_del(struct xfs_rtgroup *rtg, struct xfs_rmap_hook *hook); +# endif /* CONFIG_XFS_RT */ + #endif #endif /* __XFS_RMAP_H__ */ diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index e40806c84256..bd878e65bc44 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -133,7 +133,7 @@ xfs_initialize_rtgroups( /* Place kernel structure only init below this point. */ spin_lock_init(&rtg->rtg_state_lock); xfs_drain_init(&rtg->rtg_intents); - + xfs_hooks_init(&rtg->rtg_rmap_update_hooks); #endif /* __KERNEL__ */ /* first new rtg is fully initialized */ diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index 1d41a2cac34f..4e9b9098f2f2 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -46,6 +46,9 @@ struct xfs_rtgroup { * inconsistencies. */ struct xfs_drain rtg_intents; + + /* Hook to feed rt rmapbt updates to an active online repair. */ + struct xfs_hooks rtg_rmap_update_hooks; #endif /* __KERNEL__ */ }; diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c index 5775efa67de6..e26847784d21 100644 --- a/fs/xfs/scrub/rtrmap_repair.c +++ b/fs/xfs/scrub/rtrmap_repair.c @@ -70,6 +70,8 @@ int xrep_setup_rtrmapbt( struct xfs_scrub *sc) { + xchk_fshooks_enable(sc, XCHK_FSHOOKS_RMAP); + return xrep_setup_buftarg(sc, "rtrmapbt repair"); } @@ -78,6 +80,9 @@ struct xrep_rtrmap { /* new rtrmapbt information */ struct xrep_newbt new_btree; + /* lock for the xfbtree and xfile */ + struct mutex lock; + /* rmap records generated from primary metadata */ struct xfbtree *rtrmap_btree; @@ -86,6 +91,9 @@ struct xrep_rtrmap { /* bitmap of old rtrmapbt blocks */ struct xfsb_bitmap old_rtrmapbt_blocks; + /* Hooks into rtrmap update code. 
*/ + struct xfs_rmap_hook hooks; + /* inode scan cursor */ struct xchk_iscan iscan; @@ -138,12 +146,16 @@ xrep_rtrmap_stash( if (xchk_should_terminate(sc, &error)) return error; + if (xchk_iscan_aborted(&rr->iscan)) + return -EFSCORRUPTED; + trace_xrep_rtrmap_found(sc->mp, &rmap); /* Add entry to in-memory btree. */ + mutex_lock(&rr->lock); error = xfbtree_head_read_buf(rr->rtrmap_btree, sc->tp, &mhead_bp); if (error) - return error; + goto out_abort; mcur = xfs_rtrmapbt_mem_cursor(sc->sr.rtg, sc->tp, mhead_bp, rr->rtrmap_btree); @@ -152,10 +164,18 @@ xrep_rtrmap_stash( if (error) goto out_cancel; - return xfbtree_trans_commit(rr->rtrmap_btree, sc->tp); + error = xfbtree_trans_commit(rr->rtrmap_btree, sc->tp); + if (error) + goto out_abort; + + mutex_unlock(&rr->lock); + return 0; out_cancel: xfbtree_trans_cancel(rr->rtrmap_btree, sc->tp); +out_abort: + xchk_iscan_abort(&rr->iscan); + mutex_unlock(&rr->lock); return error; } @@ -492,6 +512,13 @@ xrep_rtrmap_find_rmaps( if (error) return error; + /* + * If a hook failed to update the in-memory btree, we lack the data to + * continue the repair. + */ + if (xchk_iscan_aborted(&rr->iscan)) + return -EFSCORRUPTED; + /* Scan for old rtrmap blocks. */ for_each_perag(sc->mp, agno, pag) { error = xrep_rtrmap_scan_ag(rr, pag); @@ -727,6 +754,89 @@ xrep_rtrmap_remove_old_tree( return xrep_reset_imeta_reservation(rr->sc); } +static inline bool +xrep_rtrmapbt_want_live_update( + struct xchk_iscan *iscan, + const struct xfs_owner_info *oi) +{ + if (xchk_iscan_aborted(iscan)) + return false; + + /* + * We scanned the CoW staging extents before we started the iscan, so + * we need all the updates. + */ + if (XFS_RMAP_NON_INODE_OWNER(oi->oi_owner)) + return true; + + /* Ignore updates to files that the scanner hasn't visited yet. */ + return xchk_iscan_want_live_update(iscan, oi->oi_owner); +} + +/* + * Apply a rtrmapbt update from the regular filesystem into our shadow btree. 
+ * We're running from the thread that owns the rtrmap ILOCK and is generating + * the update, so we must be careful about which parts of the struct + * xrep_rtrmap that we change. + */ +static int +xrep_rtrmapbt_live_update( + struct xfs_hook *update_hook, + unsigned long action, + void *data) +{ + struct xfs_rmap_update_params *p = data; + struct xrep_rtrmap *rr; + struct xfs_mount *mp; + struct xfs_btree_cur *mcur; + struct xfs_buf *mhead_bp; + struct xfs_trans *tp; + void *txcookie; + int error; + + rr = container_of(update_hook, struct xrep_rtrmap, hooks.update_hook); + mp = rr->sc->mp; + + if (!xrep_rtrmapbt_want_live_update(&rr->iscan, &p->oinfo)) + goto out_unlock; + + trace_xrep_rtrmap_live_update(mp, rr->sc->sr.rtg->rtg_rgno, action, p); + + error = xrep_trans_alloc_hook_dummy(mp, &txcookie, &tp); + if (error) + goto out_abort; + + mutex_lock(&rr->lock); + error = xfbtree_head_read_buf(rr->rtrmap_btree, tp, &mhead_bp); + if (error) + goto out_cancel; + + mcur = xfs_rtrmapbt_mem_cursor(rr->sc->sr.rtg, tp, mhead_bp, + rr->rtrmap_btree); + error = __xfs_rmap_finish_intent(mcur, action, p->startblock, + p->blockcount, &p->oinfo, p->unwritten); + xfs_btree_del_cursor(mcur, error); + if (error) + goto out_cancel; + + error = xfbtree_trans_commit(rr->rtrmap_btree, tp); + if (error) + goto out_cancel; + + xrep_trans_cancel_hook_dummy(&txcookie, tp); + mutex_unlock(&rr->lock); + return NOTIFY_DONE; + +out_cancel: + xfbtree_trans_cancel(rr->rtrmap_btree, tp); + xrep_trans_cancel_hook_dummy(&txcookie, tp); +out_abort: + xchk_iscan_abort(&rr->iscan); + mutex_unlock(&rr->lock); +out_unlock: + return NOTIFY_DONE; +} + /* Repair the realtime rmap btree. */ int xrep_rtrmapbt( @@ -735,9 +845,6 @@ xrep_rtrmapbt( struct xrep_rtrmap *rr; int error; - /* Functionality is not yet complete. */ - return xrep_notsupported(sc); - /* Make sure any problems with the fork are fixed. 
*/ error = xrep_metadata_inode_forks(sc); if (error) @@ -748,6 +855,7 @@ xrep_rtrmapbt( return -ENOMEM; rr->sc = sc; + mutex_init(&rr->lock); xfsb_bitmap_init(&rr->old_rtrmapbt_blocks); /* Set up some storage */ @@ -759,26 +867,42 @@ xrep_rtrmapbt( /* Retry iget every tenth of a second for up to 30 seconds. */ xchk_iscan_start(&rr->iscan, 30000, 100); + /* + * Hook into live rtrmap operations so that we can update our in-memory + * btree to reflect live changes on the filesystem. Since we drop the + * rtrmap ILOCK to scan all the inodes, we need this piece to avoid + * installing a stale btree. + */ + ASSERT(sc->flags & XCHK_FSHOOKS_RMAP); + xfs_hook_setup(&rr->hooks.update_hook, xrep_rtrmapbt_live_update); + error = xfs_rtrmap_hook_add(sc->sr.rtg, &rr->hooks); + if (error) + goto out_records; + /* Collect rmaps for realtime files. */ error = xrep_rtrmap_find_rmaps(rr); if (error) - goto out_records; + goto out_hook; xfs_trans_ijoin(sc->tp, sc->ip, 0); /* Rebuild the rtrmap information. */ error = xrep_rtrmap_build_new_tree(rr); if (error) - goto out_records; + goto out_hook; /* Kill the old tree. 
*/ error = xrep_rtrmap_remove_old_tree(rr); +out_hook: + xchk_iscan_abort(&rr->iscan); + xfs_rtrmap_hook_del(sc->sr.rtg, &rr->hooks); out_records: xchk_iscan_finish(&rr->iscan); xfbtree_destroy(rr->rtrmap_btree); out_bitmap: xfsb_bitmap_destroy(&rr->old_rtrmapbt_blocks); + mutex_destroy(&rr->lock); kfree(rr); return error; } diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 654cbcbd99ea..4cf8180173ca 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -3024,6 +3024,42 @@ TRACE_EVENT(xrep_rtrmap_found, __entry->offset, __entry->flags) ); + +TRACE_EVENT(xrep_rtrmap_live_update, + TP_PROTO(struct xfs_mount *mp, xfs_rgnumber_t rgno, unsigned int op, + const struct xfs_rmap_update_params *p), + TP_ARGS(mp, rgno, op, p), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(xfs_rgnumber_t, rgno) + __field(unsigned int, op) + __field(xfs_rgblock_t, rgbno) + __field(xfs_extlen_t, len) + __field(uint64_t, owner) + __field(uint64_t, offset) + __field(unsigned int, flags) + ), + TP_fast_assign( + __entry->dev = mp->m_super->s_dev; + __entry->rgno = rgno; + __entry->op = op; + __entry->rgbno = p->startblock; + __entry->len = p->blockcount; + xfs_owner_info_unpack(&p->oinfo, &__entry->owner, + &__entry->offset, &__entry->flags); + if (p->unwritten) + __entry->flags |= XFS_RMAP_UNWRITTEN; + ), + TP_printk("dev %d:%d rgno 0x%x op %s rgbno 0x%x fsbcount 0x%x owner 0x%llx fileoff 0x%llx flags 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + __entry->rgno, + __print_symbolic(__entry->op, XFS_RMAP_INTENT_STRINGS), + __entry->rgbno, + __entry->len, + __entry->owner, + __entry->offset, + __entry->flags) +); #endif /* CONFIG_XFS_RT */ #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
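The filtering rule that the live update hook in the patch above applies can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct demo_scan`, `DEMO_OWNER_NON_INODE`, `demo_want_live_update` and `demo_live_update` are hypothetical stand-ins, not the kernel's `xchk_iscan` API, and the "visited" test is reduced to a simple cursor comparison. It shows the three decisions the patch describes: ignore everything once the scan is aborted, always accept non-inode owners, and otherwise accept only owners the scanner has already visited. A failed update poisons the scan rather than returning an error to the thread that generated it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the scan state; not the kernel's xchk_iscan. */
struct demo_scan {
	uint64_t	cursor;		/* owners below this were already scanned */
	bool		aborted;	/* set once a live update could not be applied */
};

/* Hypothetical marker for owners that are not inodes (assumption). */
#define DEMO_OWNER_NON_INODE(owner)	((owner) >= UINT64_MAX - 16)

/* Decide whether a live update must be replayed into the shadow structure. */
static bool demo_want_live_update(const struct demo_scan *s, uint64_t owner)
{
	if (s->aborted)
		return false;

	/* Non-inode owners were collected before the scan, so take them all. */
	if (DEMO_OWNER_NON_INODE(owner))
		return true;

	/* Skip files that the scanner has not visited yet. */
	return owner < s->cursor;
}

/*
 * Apply one update.  A failure marks the scan aborted instead of reporting
 * an error to the caller; the scanner checks the flag later and gives up.
 */
static void demo_live_update(struct demo_scan *s, uint64_t owner,
		int apply_error, unsigned int *nr_applied)
{
	if (!demo_want_live_update(s, owner))
		return;

	if (apply_error) {
		s->aborted = true;
		return;
	}

	(*nr_applied)++;
}
```

Keeping the decision in one small predicate means the scanner and the hook cannot disagree about which updates count, which is presumably why the patch factors it out the same way.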
* [PATCHSET v1.0 0/5] xfs: refcount log intent cleanups
From: Darrick J. Wong @ 2022-12-30 22:18 UTC
To: djwong; +Cc: linux-xfs

Hi all,

This series cleans up the refcount intent code before we start adding
support for realtime devices. Similar to previous intent cleanup
patchsets, we start transforming the tracepoints so that the data
extraction is done inside the tracepoint code, and then we start passing
the intent itself to the _finish_one function. This reduces the boxing
and unboxing of parameters.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything. Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=refcount-intent-cleanups

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=refcount-intent-cleanups
---
 fs/xfs/libxfs/xfs_refcount.c | 122 ++++++++--------------
 fs/xfs/libxfs/xfs_refcount.h | 6 +
 fs/xfs/xfs_refcount_item.c | 32 ++----
 fs/xfs/xfs_trace.c | 1
 fs/xfs/xfs_trace.h | 229 ++++++++++++++++++++----------------------
 5 files changed, 169 insertions(+), 221 deletions(-)
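To illustrate what "boxing and unboxing of parameters" means in the cover letter above, here is a minimal userspace sketch. Every name in it (`struct demo_refcount_intent`, `DEMO_BLOCKS_PER_AG`, `demo_trace_deferred`) is hypothetical and the fixed group size is an assumption made for the example; it is not the kernel's tracepoint machinery. The point is only that the caller hands over the intent itself, and the trace helper derives the group number and group-relative block on its own.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed allocation group size, for illustration only. */
#define DEMO_BLOCKS_PER_AG	1024u

/* Hypothetical stand-in for an incore refcount intent. */
struct demo_refcount_intent {
	int		type;
	uint64_t	startblock;
	uint32_t	blockcount;
};

/*
 * Format a trace line from the intent itself.  The caller no longer has to
 * split startblock into (agno, agbno) before calling; that extraction lives
 * here, in one place.
 */
static int demo_trace_deferred(char *buf, size_t len,
		const struct demo_refcount_intent *ri)
{
	unsigned int agno = (unsigned int)(ri->startblock / DEMO_BLOCKS_PER_AG);
	unsigned int agbno = (unsigned int)(ri->startblock % DEMO_BLOCKS_PER_AG);

	return snprintf(buf, len, "op %d agno 0x%x agbno 0x%x fsbcount 0x%x",
			ri->type, agno, agbno, (unsigned int)ri->blockcount);
}
```

With this shape, a call site shrinks to a single argument for the intent, and a later change to how blocks are addressed only touches the helper.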
* [PATCH 3/5] xfs: prepare refcount btree tracepoints for widening 2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: refcount log intent cleanups Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/5] xfs: give refcount btree cursor error tracepoints their own class Darrick J. Wong ` (3 subsequent siblings) 4 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Prepare the rest of refcount btree tracepoints for use with realtime reflink by making them take the btree cursor object as a parameter. This will save us a lot of trouble later on. Remove the xfs_refcount_recover_extent tracepoint since it's already covered by other refcount tracepoints. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_refcount.c | 42 ++++++++------------- fs/xfs/xfs_trace.h | 83 +++++++++++++++++++----------------------- 2 files changed, 53 insertions(+), 72 deletions(-) diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 1d181561a9ff..4c6ed75059c8 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -191,7 +191,7 @@ xfs_refcount_get_rec( if (fa) return xfs_refcount_complain_bad_rec(cur, fa, irec); - trace_xfs_refcount_get(cur->bc_mp, cur->bc_ag.pag->pag_agno, irec); + trace_xfs_refcount_get(cur, irec); return 0; } @@ -209,7 +209,7 @@ xfs_refcount_update( uint32_t start; int error; - trace_xfs_refcount_update(cur->bc_mp, cur->bc_ag.pag->pag_agno, irec); + trace_xfs_refcount_update(cur, irec); start = xfs_refcount_encode_startblock(irec->rc_startblock, irec->rc_domain); @@ -236,7 +236,7 @@ xfs_refcount_insert( { int error; - trace_xfs_refcount_insert(cur->bc_mp, cur->bc_ag.pag->pag_agno, irec); + trace_xfs_refcount_insert(cur, irec); cur->bc_rec.rc.rc_startblock = irec->rc_startblock; cur->bc_rec.rc.rc_blockcount = irec->rc_blockcount; @@ -281,7 +281,7 @@ 
xfs_refcount_delete( error = -EFSCORRUPTED; goto out_error; } - trace_xfs_refcount_delete(cur->bc_mp, cur->bc_ag.pag->pag_agno, &irec); + trace_xfs_refcount_delete(cur, &irec); error = xfs_btree_delete(cur, i); if (XFS_IS_CORRUPT(cur->bc_mp, *i != 1)) { xfs_btree_mark_sick(cur); @@ -418,8 +418,7 @@ xfs_refcount_split_extent( return 0; *shape_changed = true; - trace_xfs_refcount_split_extent(cur->bc_mp, cur->bc_ag.pag->pag_agno, - &rcext, agbno); + trace_xfs_refcount_split_extent(cur, &rcext, agbno); /* Establish the right extent. */ tmp = rcext; @@ -462,8 +461,7 @@ xfs_refcount_merge_center_extents( int error; int found_rec; - trace_xfs_refcount_merge_center_extents(cur->bc_mp, - cur->bc_ag.pag->pag_agno, left, center, right); + trace_xfs_refcount_merge_center_extents(cur, left, center, right); ASSERT(left->rc_domain == center->rc_domain); ASSERT(right->rc_domain == center->rc_domain); @@ -544,8 +542,7 @@ xfs_refcount_merge_left_extent( int error; int found_rec; - trace_xfs_refcount_merge_left_extent(cur->bc_mp, - cur->bc_ag.pag->pag_agno, left, cleft); + trace_xfs_refcount_merge_left_extent(cur, left, cleft); ASSERT(left->rc_domain == cleft->rc_domain); @@ -609,8 +606,7 @@ xfs_refcount_merge_right_extent( int error; int found_rec; - trace_xfs_refcount_merge_right_extent(cur->bc_mp, - cur->bc_ag.pag->pag_agno, cright, right); + trace_xfs_refcount_merge_right_extent(cur, cright, right); ASSERT(right->rc_domain == cright->rc_domain); @@ -749,8 +745,7 @@ xfs_refcount_find_left_extents( cleft->rc_refcount = 1; cleft->rc_domain = domain; } - trace_xfs_refcount_find_left_extent(cur->bc_mp, cur->bc_ag.pag->pag_agno, - left, cleft, agbno); + trace_xfs_refcount_find_left_extent(cur, left, cleft, agbno); return error; out_error: @@ -843,8 +838,8 @@ xfs_refcount_find_right_extents( cright->rc_refcount = 1; cright->rc_domain = domain; } - trace_xfs_refcount_find_right_extent(cur->bc_mp, cur->bc_ag.pag->pag_agno, - cright, right, agbno + aglen); + 
trace_xfs_refcount_find_right_extent(cur, cright, right, + agbno + aglen); return error; out_error: @@ -1147,8 +1142,7 @@ xfs_refcount_adjust_extents( tmp.rc_refcount = 1 + adj; tmp.rc_domain = XFS_REFC_DOMAIN_SHARED; - trace_xfs_refcount_modify_extent(cur->bc_mp, - cur->bc_ag.pag->pag_agno, &tmp); + trace_xfs_refcount_modify_extent(cur, &tmp); /* * Either cover the hole (increment) or @@ -1210,8 +1204,7 @@ xfs_refcount_adjust_extents( if (ext.rc_refcount == MAXREFCOUNT) goto skip; ext.rc_refcount += adj; - trace_xfs_refcount_modify_extent(cur->bc_mp, - cur->bc_ag.pag->pag_agno, &ext); + trace_xfs_refcount_modify_extent(cur, &ext); cur->bc_ag.refc.nr_ops++; if (ext.rc_refcount > 1) { error = xfs_refcount_update(cur, &ext); @@ -1723,8 +1716,7 @@ xfs_refcount_adjust_cow_extents( tmp.rc_refcount = 1; tmp.rc_domain = XFS_REFC_DOMAIN_COW; - trace_xfs_refcount_modify_extent(cur->bc_mp, - cur->bc_ag.pag->pag_agno, &tmp); + trace_xfs_refcount_modify_extent(cur, &tmp); error = xfs_refcount_insert(cur, &tmp, &found_tmp); @@ -1755,8 +1747,7 @@ xfs_refcount_adjust_cow_extents( } ext.rc_refcount = 0; - trace_xfs_refcount_modify_extent(cur->bc_mp, - cur->bc_ag.pag->pag_agno, &ext); + trace_xfs_refcount_modify_extent(cur, &ext); error = xfs_refcount_delete(cur, &found_rec); if (error) goto out_error; @@ -1989,9 +1980,6 @@ xfs_refcount_recover_cow_leftovers( if (error) goto out_free; - trace_xfs_refcount_recover_extent(mp, pag->pag_agno, - &rr->rr_rrec); - /* Free the orphan record */ fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno, rr->rr_rrec.rc_startblock); diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index a5686b53ca80..233c611b6018 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3285,9 +3285,8 @@ TRACE_EVENT(xfs_refcount_lookup, /* single-rcext tracepoint class */ DECLARE_EVENT_CLASS(xfs_refcount_extent_class, - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, - struct xfs_refcount_irec *irec), - TP_ARGS(mp, agno, irec), + TP_PROTO(struct xfs_btree_cur *cur, 
struct xfs_refcount_irec *irec), + TP_ARGS(cur, irec), TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_agnumber_t, agno) @@ -3297,8 +3296,8 @@ DECLARE_EVENT_CLASS(xfs_refcount_extent_class, __field(xfs_nlink_t, refcount) ), TP_fast_assign( - __entry->dev = mp->m_super->s_dev; - __entry->agno = agno; + __entry->dev = cur->bc_mp->m_super->s_dev; + __entry->agno = cur->bc_ag.pag->pag_agno; __entry->domain = irec->rc_domain; __entry->startblock = irec->rc_startblock; __entry->blockcount = irec->rc_blockcount; @@ -3315,15 +3314,14 @@ DECLARE_EVENT_CLASS(xfs_refcount_extent_class, #define DEFINE_REFCOUNT_EXTENT_EVENT(name) \ DEFINE_EVENT(xfs_refcount_extent_class, name, \ - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \ - struct xfs_refcount_irec *irec), \ - TP_ARGS(mp, agno, irec)) + TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *irec), \ + TP_ARGS(cur, irec)) /* single-rcext and an agbno tracepoint class */ DECLARE_EVENT_CLASS(xfs_refcount_extent_at_class, - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, - struct xfs_refcount_irec *irec, xfs_agblock_t agbno), - TP_ARGS(mp, agno, irec, agbno), + TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *irec, + xfs_agblock_t agbno), + TP_ARGS(cur, irec, agbno), TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_agnumber_t, agno) @@ -3334,8 +3332,8 @@ DECLARE_EVENT_CLASS(xfs_refcount_extent_at_class, __field(xfs_agblock_t, agbno) ), TP_fast_assign( - __entry->dev = mp->m_super->s_dev; - __entry->agno = agno; + __entry->dev = cur->bc_mp->m_super->s_dev; + __entry->agno = cur->bc_ag.pag->pag_agno; __entry->domain = irec->rc_domain; __entry->startblock = irec->rc_startblock; __entry->blockcount = irec->rc_blockcount; @@ -3354,15 +3352,15 @@ DECLARE_EVENT_CLASS(xfs_refcount_extent_at_class, #define DEFINE_REFCOUNT_EXTENT_AT_EVENT(name) \ DEFINE_EVENT(xfs_refcount_extent_at_class, name, \ - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \ - struct xfs_refcount_irec *irec, xfs_agblock_t 
agbno), \ - TP_ARGS(mp, agno, irec, agbno)) + TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *irec, \ + xfs_agblock_t agbno), \ + TP_ARGS(cur, irec, agbno)) /* double-rcext tracepoint class */ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_class, - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, - struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2), - TP_ARGS(mp, agno, i1, i2), + TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *i1, + struct xfs_refcount_irec *i2), + TP_ARGS(cur, i1, i2), TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_agnumber_t, agno) @@ -3376,8 +3374,8 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_class, __field(xfs_nlink_t, i2_refcount) ), TP_fast_assign( - __entry->dev = mp->m_super->s_dev; - __entry->agno = agno; + __entry->dev = cur->bc_mp->m_super->s_dev; + __entry->agno = cur->bc_ag.pag->pag_agno; __entry->i1_domain = i1->rc_domain; __entry->i1_startblock = i1->rc_startblock; __entry->i1_blockcount = i1->rc_blockcount; @@ -3403,16 +3401,15 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_class, #define DEFINE_REFCOUNT_DOUBLE_EXTENT_EVENT(name) \ DEFINE_EVENT(xfs_refcount_double_extent_class, name, \ - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \ - struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2), \ - TP_ARGS(mp, agno, i1, i2)) + TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *i1, \ + struct xfs_refcount_irec *i2), \ + TP_ARGS(cur, i1, i2)) /* double-rcext and an agbno tracepoint class */ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class, - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, - struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2, - xfs_agblock_t agbno), - TP_ARGS(mp, agno, i1, i2, agbno), + TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *i1, + struct xfs_refcount_irec *i2, xfs_agblock_t agbno), + TP_ARGS(cur, i1, i2, agbno), TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_agnumber_t, agno) @@ -3427,8 +3424,8 @@ 
DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class, __field(xfs_agblock_t, agbno) ), TP_fast_assign( - __entry->dev = mp->m_super->s_dev; - __entry->agno = agno; + __entry->dev = cur->bc_mp->m_super->s_dev; + __entry->agno = cur->bc_ag.pag->pag_agno; __entry->i1_domain = i1->rc_domain; __entry->i1_startblock = i1->rc_startblock; __entry->i1_blockcount = i1->rc_blockcount; @@ -3456,17 +3453,15 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class, #define DEFINE_REFCOUNT_DOUBLE_EXTENT_AT_EVENT(name) \ DEFINE_EVENT(xfs_refcount_double_extent_at_class, name, \ - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \ - struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2, \ - xfs_agblock_t agbno), \ - TP_ARGS(mp, agno, i1, i2, agbno)) + TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *i1, \ + struct xfs_refcount_irec *i2, xfs_agblock_t agbno), \ + TP_ARGS(cur, i1, i2, agbno)) /* triple-rcext tracepoint class */ DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class, - TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, - struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2, - struct xfs_refcount_irec *i3), - TP_ARGS(mp, agno, i1, i2, i3), + TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *i1, + struct xfs_refcount_irec *i2, struct xfs_refcount_irec *i3), + TP_ARGS(cur, i1, i2, i3), TP_STRUCT__entry( __field(dev_t, dev) __field(xfs_agnumber_t, agno) @@ -3484,8 +3479,8 @@ DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class, __field(xfs_nlink_t, i3_refcount) ), TP_fast_assign( - __entry->dev = mp->m_super->s_dev; - __entry->agno = agno; + __entry->dev = cur->bc_mp->m_super->s_dev; + __entry->agno = cur->bc_ag.pag->pag_agno; __entry->i1_domain = i1->rc_domain; __entry->i1_startblock = i1->rc_startblock; __entry->i1_blockcount = i1->rc_blockcount; @@ -3520,10 +3515,9 @@ DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class, #define DEFINE_REFCOUNT_TRIPLE_EXTENT_EVENT(name) \ DEFINE_EVENT(xfs_refcount_triple_extent_class, name, \ 
- TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \ - struct xfs_refcount_irec *i1, struct xfs_refcount_irec *i2, \ - struct xfs_refcount_irec *i3), \ - TP_ARGS(mp, agno, i1, i2, i3)) + TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *i1, \ + struct xfs_refcount_irec *i2, struct xfs_refcount_irec *i3), \ + TP_ARGS(cur, i1, i2, i3)) /* refcount btree tracepoints */ DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_get); @@ -3541,7 +3535,6 @@ DEFINE_REFCOUNT_EVENT(xfs_refcount_cow_increase); DEFINE_REFCOUNT_EVENT(xfs_refcount_cow_decrease); DEFINE_REFCOUNT_TRIPLE_EXTENT_EVENT(xfs_refcount_merge_center_extents); DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_modify_extent); -DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_recover_extent); DEFINE_REFCOUNT_EXTENT_AT_EVENT(xfs_refcount_split_extent); DEFINE_REFCOUNT_DOUBLE_EXTENT_EVENT(xfs_refcount_merge_left_extent); DEFINE_REFCOUNT_DOUBLE_EXTENT_EVENT(xfs_refcount_merge_right_extent); ^ permalink raw reply related [flat|nested] 565+ messages in thread
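The class-plus-event structure used by the tracepoint macros in the patch above can be approximated in ordinary C. The sketch below is an analogy only, under the assumption that one shared formatter plays the role of the event class and a macro stamps out thin per-event wrappers; `struct demo_cursor`, `struct demo_irec`, `demo_extent_class` and `DEFINE_DEMO_EXTENT_EVENT` are hypothetical names and do not correspond to the kernel's `DECLARE_EVENT_CLASS` or `DEFINE_EVENT` implementation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the btree cursor and a refcount record. */
struct demo_cursor {
	unsigned int	dev_major;
	unsigned int	dev_minor;
	unsigned int	agno;
};

struct demo_irec {
	unsigned int	startblock;
	unsigned int	blockcount;
	unsigned int	refcount;
};

/* The shared "class": all field extraction and formatting happens here. */
static int demo_extent_class(char *buf, size_t len, const char *name,
		const struct demo_cursor *cur, const struct demo_irec *irec)
{
	return snprintf(buf, len,
			"%s: dev %u:%u agno 0x%x agbno 0x%x fsbcount 0x%x refcount %u",
			name, cur->dev_major, cur->dev_minor, cur->agno,
			irec->startblock, irec->blockcount, irec->refcount);
}

/* Stamp out one thin wrapper per event; each only supplies its own name. */
#define DEFINE_DEMO_EXTENT_EVENT(name) \
static int demo_trace_##name(char *buf, size_t len, \
		const struct demo_cursor *cur, const struct demo_irec *irec) \
{ \
	return demo_extent_class(buf, len, #name, cur, irec); \
}

DEFINE_DEMO_EXTENT_EVENT(refcount_get)
DEFINE_DEMO_EXTENT_EVENT(refcount_update)
```

Because every event funnels through the same formatter, changing what the cursor contributes (for example, a different group number source) is a one-place edit rather than a change at every call site.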
* [PATCH 1/5] xfs: give refcount btree cursor error tracepoints their own class 2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: refcount log intent cleanups Darrick J. Wong 2022-12-30 22:18 ` [PATCH 3/5] xfs: prepare refcount btree tracepoints for widening Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/5] xfs: create specialized classes for refcount tracepoints Darrick J. Wong ` (2 subsequent siblings) 4 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert all the refcount tracepoints to use the btree error tracepoint class. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_refcount.c | 42 ++++++++++++++---------------------------- fs/xfs/xfs_trace.h | 26 +++++++++++++------------- 2 files changed, 27 insertions(+), 41 deletions(-) diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 83f681fb49fb..b77d40631e60 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -219,8 +219,7 @@ xfs_refcount_update( error = xfs_btree_update(cur, &rec); if (error) - trace_xfs_refcount_update_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_refcount_update_error(cur, error, _RET_IP_); return error; } @@ -255,8 +254,7 @@ xfs_refcount_insert( out_error: if (error) - trace_xfs_refcount_insert_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_refcount_insert_error(cur, error, _RET_IP_); return error; } @@ -296,8 +294,7 @@ xfs_refcount_delete( &found_rec); out_error: if (error) - trace_xfs_refcount_delete_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_refcount_delete_error(cur, error, _RET_IP_); return error; } @@ -446,8 +443,7 @@ xfs_refcount_split_extent( return error; out_error: - trace_xfs_refcount_split_extent_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, 
_RET_IP_); + trace_xfs_refcount_split_extent_error(cur, error, _RET_IP_); return error; } @@ -530,8 +526,7 @@ xfs_refcount_merge_center_extents( return error; out_error: - trace_xfs_refcount_merge_center_extents_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_refcount_merge_center_extents_error(cur, error, _RET_IP_); return error; } @@ -597,8 +592,7 @@ xfs_refcount_merge_left_extent( return error; out_error: - trace_xfs_refcount_merge_left_extent_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_refcount_merge_left_extent_error(cur, error, _RET_IP_); return error; } @@ -666,8 +660,7 @@ xfs_refcount_merge_right_extent( return error; out_error: - trace_xfs_refcount_merge_right_extent_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_refcount_merge_right_extent_error(cur, error, _RET_IP_); return error; } @@ -761,8 +754,7 @@ xfs_refcount_find_left_extents( return error; out_error: - trace_xfs_refcount_find_left_extent_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_refcount_find_left_extent_error(cur, error, _RET_IP_); return error; } @@ -856,8 +848,7 @@ xfs_refcount_find_right_extents( return error; out_error: - trace_xfs_refcount_find_right_extent_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_refcount_find_right_extent_error(cur, error, _RET_IP_); return error; } @@ -1256,8 +1247,7 @@ xfs_refcount_adjust_extents( return error; out_error: - trace_xfs_refcount_modify_extent_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_refcount_modify_extent_error(cur, error, _RET_IP_); return error; } @@ -1317,8 +1307,7 @@ xfs_refcount_adjust( return 0; out_error: - trace_xfs_refcount_adjust_error(cur->bc_mp, cur->bc_ag.pag->pag_agno, - error, _RET_IP_); + trace_xfs_refcount_adjust_error(cur, error, _RET_IP_); return error; } @@ -1632,8 +1621,7 @@ xfs_refcount_find_shared( out_error: if (error) - 
trace_xfs_refcount_find_shared_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_refcount_find_shared_error(cur, error, _RET_IP_); return error; } @@ -1788,8 +1776,7 @@ xfs_refcount_adjust_cow_extents( return error; out_error: - trace_xfs_refcount_modify_extent_error(cur->bc_mp, - cur->bc_ag.pag->pag_agno, error, _RET_IP_); + trace_xfs_refcount_modify_extent_error(cur, error, _RET_IP_); return error; } @@ -1835,8 +1822,7 @@ xfs_refcount_adjust_cow( return 0; out_error: - trace_xfs_refcount_adjust_cow_error(cur->bc_mp, cur->bc_ag.pag->pag_agno, - error, _RET_IP_); + trace_xfs_refcount_adjust_cow_error(cur, error, _RET_IP_); return error; } diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index a6de7b6e4afd..d6da679ebaba 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3513,9 +3513,9 @@ DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_get); DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_update); DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_insert); DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_delete); -DEFINE_AG_ERROR_EVENT(xfs_refcount_insert_error); -DEFINE_AG_ERROR_EVENT(xfs_refcount_delete_error); -DEFINE_AG_ERROR_EVENT(xfs_refcount_update_error); +DEFINE_BTREE_ERROR_EVENT(xfs_refcount_insert_error); +DEFINE_BTREE_ERROR_EVENT(xfs_refcount_delete_error); +DEFINE_BTREE_ERROR_EVENT(xfs_refcount_update_error); /* refcount adjustment tracepoints */ DEFINE_AG_EXTENT_EVENT(xfs_refcount_increase); @@ -3530,20 +3530,20 @@ DEFINE_REFCOUNT_DOUBLE_EXTENT_EVENT(xfs_refcount_merge_left_extent); DEFINE_REFCOUNT_DOUBLE_EXTENT_EVENT(xfs_refcount_merge_right_extent); DEFINE_REFCOUNT_DOUBLE_EXTENT_AT_EVENT(xfs_refcount_find_left_extent); DEFINE_REFCOUNT_DOUBLE_EXTENT_AT_EVENT(xfs_refcount_find_right_extent); -DEFINE_AG_ERROR_EVENT(xfs_refcount_adjust_error); -DEFINE_AG_ERROR_EVENT(xfs_refcount_adjust_cow_error); -DEFINE_AG_ERROR_EVENT(xfs_refcount_merge_center_extents_error); -DEFINE_AG_ERROR_EVENT(xfs_refcount_modify_extent_error); 
-DEFINE_AG_ERROR_EVENT(xfs_refcount_split_extent_error); -DEFINE_AG_ERROR_EVENT(xfs_refcount_merge_left_extent_error); -DEFINE_AG_ERROR_EVENT(xfs_refcount_merge_right_extent_error); -DEFINE_AG_ERROR_EVENT(xfs_refcount_find_left_extent_error); -DEFINE_AG_ERROR_EVENT(xfs_refcount_find_right_extent_error); +DEFINE_BTREE_ERROR_EVENT(xfs_refcount_adjust_error); +DEFINE_BTREE_ERROR_EVENT(xfs_refcount_adjust_cow_error); +DEFINE_BTREE_ERROR_EVENT(xfs_refcount_merge_center_extents_error); +DEFINE_BTREE_ERROR_EVENT(xfs_refcount_modify_extent_error); +DEFINE_BTREE_ERROR_EVENT(xfs_refcount_split_extent_error); +DEFINE_BTREE_ERROR_EVENT(xfs_refcount_merge_left_extent_error); +DEFINE_BTREE_ERROR_EVENT(xfs_refcount_merge_right_extent_error); +DEFINE_BTREE_ERROR_EVENT(xfs_refcount_find_left_extent_error); +DEFINE_BTREE_ERROR_EVENT(xfs_refcount_find_right_extent_error); /* reflink helpers */ DEFINE_AG_EXTENT_EVENT(xfs_refcount_find_shared); DEFINE_AG_EXTENT_EVENT(xfs_refcount_find_shared_result); -DEFINE_AG_ERROR_EVENT(xfs_refcount_find_shared_error); +DEFINE_BTREE_ERROR_EVENT(xfs_refcount_find_shared_error); DECLARE_EVENT_CLASS(xfs_refcount_deferred_class, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 2/5] xfs: create specialized classes for refcount tracepoints 2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: refcount log intent cleanups Darrick J. Wong 2022-12-30 22:18 ` [PATCH 3/5] xfs: prepare refcount btree tracepoints for widening Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/5] xfs: give refcount btree cursor error tracepoints their own class Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 4/5] xfs: clean up refcount log intent item tracepoint callsites Darrick J. Wong 2022-12-30 22:18 ` [PATCH 5/5] xfs: remove xfs_trans_set_refcount_flags Darrick J. Wong 4 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> The only user of the "ag" tracepoint event classes is the refcount btree, so rename them to make that obvious and make them take the btree cursor to simplify the arguments. This will save us a lot of trouble later on. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_refcount.c | 24 ++++++----------- fs/xfs/xfs_trace.h | 61 +++++++++++++++++++++++++++--------------- 2 files changed, 48 insertions(+), 37 deletions(-) diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index b77d40631e60..1d181561a9ff 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -51,7 +51,7 @@ xfs_refcount_lookup_le( xfs_agblock_t bno, int *stat) { - trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno, + trace_xfs_refcount_lookup(cur, xfs_refcount_encode_startblock(bno, domain), XFS_LOOKUP_LE); cur->bc_rec.rc.rc_startblock = bno; @@ -71,7 +71,7 @@ xfs_refcount_lookup_ge( xfs_agblock_t bno, int *stat) { - trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno, + trace_xfs_refcount_lookup(cur, xfs_refcount_encode_startblock(bno, domain), XFS_LOOKUP_GE); cur->bc_rec.rc.rc_startblock = bno; @@ -91,7 +91,7 @@ xfs_refcount_lookup_eq( xfs_agblock_t bno, int *stat) { - trace_xfs_refcount_lookup(cur->bc_mp, cur->bc_ag.pag->pag_agno, + trace_xfs_refcount_lookup(cur, xfs_refcount_encode_startblock(bno, domain), XFS_LOOKUP_LE); cur->bc_rec.rc.rc_startblock = bno; @@ -1264,11 +1264,9 @@ xfs_refcount_adjust( int error; if (adj == XFS_REFCOUNT_ADJUST_INCREASE) - trace_xfs_refcount_increase(cur->bc_mp, - cur->bc_ag.pag->pag_agno, *agbno, *aglen); + trace_xfs_refcount_increase(cur, *agbno, *aglen); else - trace_xfs_refcount_decrease(cur->bc_mp, - cur->bc_ag.pag->pag_agno, *agbno, *aglen); + trace_xfs_refcount_decrease(cur, *agbno, *aglen); /* * Ensure that no rcextents cross the boundary of the adjustment range. 
@@ -1528,8 +1526,7 @@ xfs_refcount_find_shared(
 	int				have;
 	int				error;
 
-	trace_xfs_refcount_find_shared(cur->bc_mp, cur->bc_ag.pag->pag_agno,
-			agbno, aglen);
+	trace_xfs_refcount_find_shared(cur, agbno, aglen);
 
 	/* By default, skip the whole range */
 	*fbno = NULLAGBLOCK;
@@ -1616,8 +1613,7 @@ xfs_refcount_find_shared(
 	}
 
 done:
-	trace_xfs_refcount_find_shared_result(cur->bc_mp,
-			cur->bc_ag.pag->pag_agno, *fbno, *flen);
+	trace_xfs_refcount_find_shared_result(cur, *fbno, *flen);
 
 out_error:
 	if (error)
@@ -1835,8 +1831,7 @@ __xfs_refcount_cow_alloc(
 	xfs_agblock_t		agbno,
 	xfs_extlen_t		aglen)
 {
-	trace_xfs_refcount_cow_increase(rcur->bc_mp, rcur->bc_ag.pag->pag_agno,
-			agbno, aglen);
+	trace_xfs_refcount_cow_increase(rcur, agbno, aglen);
 
 	/* Add refcount btree reservation */
 	return xfs_refcount_adjust_cow(rcur, agbno, aglen,
@@ -1852,8 +1847,7 @@ __xfs_refcount_cow_free(
 	xfs_agblock_t		agbno,
 	xfs_extlen_t		aglen)
 {
-	trace_xfs_refcount_cow_decrease(rcur->bc_mp, rcur->bc_ag.pag->pag_agno,
-			agbno, aglen);
+	trace_xfs_refcount_cow_decrease(rcur, agbno, aglen);
 
 	/* Remove refcount btree reservation */
 	return xfs_refcount_adjust_cow(rcur, agbno, aglen,
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index d6da679ebaba..a5686b53ca80 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -3228,17 +3228,41 @@ DEFINE_AG_ERROR_EVENT(xfs_ag_resv_init_error);
 
 /* refcount tracepoint classes */
 
-/* reuse the discard trace class for agbno/aglen-based traces */
-#define DEFINE_AG_EXTENT_EVENT(name) DEFINE_DISCARD_EVENT(name)
+DECLARE_EVENT_CLASS(xfs_refcount_class,
+	TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t agbno,
+		xfs_extlen_t len),
+	TP_ARGS(cur, agbno, len),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_agblock_t, agbno)
+		__field(xfs_extlen_t, len)
+	),
+	TP_fast_assign(
+		__entry->dev = cur->bc_mp->m_super->s_dev;
+		__entry->agno = cur->bc_ag.pag->pag_agno;
+		__entry->agbno = agbno;
+		__entry->len = len;
+	),
+	TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno,
+		  __entry->agbno,
+		  __entry->len)
+);
+#define DEFINE_REFCOUNT_EVENT(name) \
+DEFINE_EVENT(xfs_refcount_class, name, \
+	TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t agbno, \
+		xfs_extlen_t len), \
+	TP_ARGS(cur, agbno, len))
 
-/* ag btree lookup tracepoint class */
 TRACE_DEFINE_ENUM(XFS_LOOKUP_EQi);
 TRACE_DEFINE_ENUM(XFS_LOOKUP_LEi);
 TRACE_DEFINE_ENUM(XFS_LOOKUP_GEi);
-DECLARE_EVENT_CLASS(xfs_ag_btree_lookup_class,
-	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
-		 xfs_agblock_t agbno, xfs_lookup_t dir),
-	TP_ARGS(mp, agno, agbno, dir),
+TRACE_EVENT(xfs_refcount_lookup,
+	TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t agbno,
+		 xfs_lookup_t dir),
+	TP_ARGS(cur, agbno, dir),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_agnumber_t, agno)
@@ -3246,8 +3270,8 @@ DECLARE_EVENT_CLASS(xfs_ag_btree_lookup_class,
 		__field(xfs_lookup_t, dir)
 	),
 	TP_fast_assign(
-		__entry->dev = mp->m_super->s_dev;
-		__entry->agno = agno;
+		__entry->dev = cur->bc_mp->m_super->s_dev;
+		__entry->agno = cur->bc_ag.pag->pag_agno;
 		__entry->agbno = agbno;
 		__entry->dir = dir;
 	),
@@ -3259,12 +3283,6 @@ DECLARE_EVENT_CLASS(xfs_ag_btree_lookup_class,
 		  __entry->dir)
 )
 
-#define DEFINE_AG_BTREE_LOOKUP_EVENT(name) \
-DEFINE_EVENT(xfs_ag_btree_lookup_class, name, \
-	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
-		 xfs_agblock_t agbno, xfs_lookup_t dir), \
-	TP_ARGS(mp, agno, agbno, dir))
-
 /* single-rcext tracepoint class */
 DECLARE_EVENT_CLASS(xfs_refcount_extent_class,
 	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
@@ -3508,7 +3526,6 @@ DEFINE_EVENT(xfs_refcount_triple_extent_class, name, \
 	TP_ARGS(mp, agno, i1, i2, i3))
 
 /* refcount btree tracepoints */
-DEFINE_AG_BTREE_LOOKUP_EVENT(xfs_refcount_lookup);
 DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_get);
 DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_update);
 DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_insert);
@@ -3518,10 +3535,10 @@ DEFINE_BTREE_ERROR_EVENT(xfs_refcount_delete_error);
 DEFINE_BTREE_ERROR_EVENT(xfs_refcount_update_error);
 
 /* refcount adjustment tracepoints */
-DEFINE_AG_EXTENT_EVENT(xfs_refcount_increase);
-DEFINE_AG_EXTENT_EVENT(xfs_refcount_decrease);
-DEFINE_AG_EXTENT_EVENT(xfs_refcount_cow_increase);
-DEFINE_AG_EXTENT_EVENT(xfs_refcount_cow_decrease);
+DEFINE_REFCOUNT_EVENT(xfs_refcount_increase);
+DEFINE_REFCOUNT_EVENT(xfs_refcount_decrease);
+DEFINE_REFCOUNT_EVENT(xfs_refcount_cow_increase);
+DEFINE_REFCOUNT_EVENT(xfs_refcount_cow_decrease);
 DEFINE_REFCOUNT_TRIPLE_EXTENT_EVENT(xfs_refcount_merge_center_extents);
 DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_modify_extent);
 DEFINE_REFCOUNT_EXTENT_EVENT(xfs_refcount_recover_extent);
@@ -3541,8 +3558,8 @@ DEFINE_BTREE_ERROR_EVENT(xfs_refcount_find_left_extent_error);
 DEFINE_BTREE_ERROR_EVENT(xfs_refcount_find_right_extent_error);
 
 /* reflink helpers */
-DEFINE_AG_EXTENT_EVENT(xfs_refcount_find_shared);
-DEFINE_AG_EXTENT_EVENT(xfs_refcount_find_shared_result);
+DEFINE_REFCOUNT_EVENT(xfs_refcount_find_shared);
+DEFINE_REFCOUNT_EVENT(xfs_refcount_find_shared_result);
 DEFINE_BTREE_ERROR_EVENT(xfs_refcount_find_shared_error);
 
 DECLARE_EVENT_CLASS(xfs_refcount_deferred_class,

* [PATCH 4/5] xfs: clean up refcount log intent item tracepoint callsites
  2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: refcount log intent cleanups Darrick J. Wong
                   ` (2 preceding siblings ...)
  2022-12-30 22:18   ` [PATCH 2/5] xfs: create specialized classes for refcount tracepoints Darrick J. Wong
@ 2022-12-30 22:18   ` Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 5/5] xfs: remove xfs_trans_set_refcount_flags Darrick J. Wong
  4 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Pass the incore refcount intent structure to the tracepoints instead of
open-coding the argument passing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_refcount.c |   14 +++-------
 fs/xfs/libxfs/xfs_refcount.h |    6 ++++
 fs/xfs/xfs_trace.c           |    1 +
 fs/xfs/xfs_trace.h           |   59 +++++++++++++-----------------------------
 4 files changed, 29 insertions(+), 51 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 4c6ed75059c8..3d2269c6855a 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -1369,9 +1369,7 @@ xfs_refcount_finish_one(
 
 	bno = XFS_FSB_TO_AGBNO(mp, ri->ri_startblock);
 
-	trace_xfs_refcount_deferred(mp, XFS_FSB_TO_AGNO(mp, ri->ri_startblock),
-			ri->ri_type, XFS_FSB_TO_AGBNO(mp, ri->ri_startblock),
-			ri->ri_blockcount);
+	trace_xfs_refcount_deferred(mp, ri);
 	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REFCOUNT_FINISH_ONE))
 		return -EIO;
 
@@ -1434,8 +1432,7 @@ xfs_refcount_finish_one(
 		return -EFSCORRUPTED;
 	}
 	if (!error && ri->ri_blockcount > 0)
-		trace_xfs_refcount_finish_one_leftover(mp, ri->ri_pag->pag_agno,
-				ri->ri_type, bno, ri->ri_blockcount);
+		trace_xfs_refcount_finish_one_leftover(mp, ri);
 	return error;
 }
 
@@ -1451,11 +1448,6 @@ __xfs_refcount_add(
 {
 	struct xfs_refcount_intent	*ri;
 
-	trace_xfs_refcount_defer(tp->t_mountp,
-			XFS_FSB_TO_AGNO(tp->t_mountp, startblock),
-			type, XFS_FSB_TO_AGBNO(tp->t_mountp, startblock),
-			blockcount);
-
 	ri = kmem_cache_alloc(xfs_refcount_intent_cache,
 			GFP_NOFS | __GFP_NOFAIL);
 	INIT_LIST_HEAD(&ri->ri_list);
@@ -1463,6 +1455,8 @@ __xfs_refcount_add(
 	ri->ri_startblock = startblock;
 	ri->ri_blockcount = blockcount;
 
+	trace_xfs_refcount_defer(tp->t_mountp, ri);
+
 	xfs_refcount_update_get_group(tp->t_mountp, ri);
 	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_REFCOUNT, &ri->ri_list);
 }
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
index 9563eb91be17..7713bb908bdc 100644
--- a/fs/xfs/libxfs/xfs_refcount.h
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -48,6 +48,12 @@ enum xfs_refcount_intent_type {
 	XFS_REFCOUNT_FREE_COW,
 };
 
+#define XFS_REFCOUNT_INTENT_STRINGS \
+	{ XFS_REFCOUNT_INCREASE,	"incr" }, \
+	{ XFS_REFCOUNT_DECREASE,	"decr" }, \
+	{ XFS_REFCOUNT_ALLOC_COW,	"alloc_cow" }, \
+	{ XFS_REFCOUNT_FREE_COW,	"free_cow" }
+
 struct xfs_refcount_intent {
 	struct list_head			ri_list;
 	struct xfs_perag			*ri_pag;
diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
index ae35868e0638..0b9405749079 100644
--- a/fs/xfs/xfs_trace.c
+++ b/fs/xfs/xfs_trace.c
@@ -44,6 +44,7 @@
 #include "xfs_xchgrange.h"
 #include "xfs_rtgroup.h"
 #include "xfs_rmap.h"
+#include "xfs_refcount.h"
 
 static inline void
 xfs_rmapbt_crack_agno_opdev(
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 233c611b6018..c22ffe459002 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -89,6 +89,7 @@ struct xfs_swapext_req;
 struct xfs_rtgroup;
 struct xfs_extent_free_item;
 struct xfs_rmap_intent;
+struct xfs_refcount_intent;
 
 #define XFS_ATTR_FILTER_FLAGS \
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
@@ -3555,66 +3556,42 @@ DEFINE_REFCOUNT_EVENT(xfs_refcount_find_shared);
 DEFINE_REFCOUNT_EVENT(xfs_refcount_find_shared_result);
 DEFINE_BTREE_ERROR_EVENT(xfs_refcount_find_shared_error);
 
+TRACE_DEFINE_ENUM(XFS_REFCOUNT_INCREASE);
+TRACE_DEFINE_ENUM(XFS_REFCOUNT_DECREASE);
+TRACE_DEFINE_ENUM(XFS_REFCOUNT_ALLOC_COW);
+TRACE_DEFINE_ENUM(XFS_REFCOUNT_FREE_COW);
+
 DECLARE_EVENT_CLASS(xfs_refcount_deferred_class,
-	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
-		 int type, xfs_agblock_t agbno, xfs_extlen_t len),
-	TP_ARGS(mp, agno, type, agbno, len),
+	TP_PROTO(struct xfs_mount *mp, struct xfs_refcount_intent *refc),
+	TP_ARGS(mp, refc),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_agnumber_t, agno)
-		__field(int, type)
+		__field(int, op)
 		__field(xfs_agblock_t, agbno)
 		__field(xfs_extlen_t, len)
 	),
 	TP_fast_assign(
 		__entry->dev = mp->m_super->s_dev;
-		__entry->agno = agno;
-		__entry->type = type;
-		__entry->agbno = agbno;
-		__entry->len = len;
+		__entry->agno = XFS_FSB_TO_AGNO(mp, refc->ri_startblock);
+		__entry->op = refc->ri_type;
+		__entry->agbno = XFS_FSB_TO_AGBNO(mp, refc->ri_startblock);
+		__entry->len = refc->ri_blockcount;
 	),
-	TP_printk("dev %d:%d op %d agno 0x%x agbno 0x%x fsbcount 0x%x",
+	TP_printk("dev %d:%d op %s agno 0x%x agbno 0x%x fsbcount 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->type,
+		  __print_symbolic(__entry->op, XFS_REFCOUNT_INTENT_STRINGS),
 		  __entry->agno,
 		  __entry->agbno,
 		  __entry->len)
 );
 #define DEFINE_REFCOUNT_DEFERRED_EVENT(name) \
 DEFINE_EVENT(xfs_refcount_deferred_class, name, \
-	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, \
-		 int type, \
-		 xfs_agblock_t bno, \
-		 xfs_extlen_t len), \
-	TP_ARGS(mp, agno, type, bno, len))
+	TP_PROTO(struct xfs_mount *mp, struct xfs_refcount_intent *refc), \
+	TP_ARGS(mp, refc))
 DEFINE_REFCOUNT_DEFERRED_EVENT(xfs_refcount_defer);
 DEFINE_REFCOUNT_DEFERRED_EVENT(xfs_refcount_deferred);
-
-TRACE_EVENT(xfs_refcount_finish_one_leftover,
-	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno,
-		 int type, xfs_agblock_t agbno, xfs_extlen_t len),
-	TP_ARGS(mp, agno, type, agbno, len),
-	TP_STRUCT__entry(
-		__field(dev_t, dev)
-		__field(xfs_agnumber_t, agno)
-		__field(int, type)
-		__field(xfs_agblock_t, agbno)
-		__field(xfs_extlen_t, len)
-	),
-	TP_fast_assign(
-		__entry->dev = mp->m_super->s_dev;
-		__entry->agno = agno;
-		__entry->type = type;
-		__entry->agbno = agbno;
-		__entry->len = len;
-	),
-	TP_printk("dev %d:%d type %d agno 0x%x agbno 0x%x fsbcount 0x%x",
-		  MAJOR(__entry->dev), MINOR(__entry->dev),
-		  __entry->type,
-		  __entry->agno,
-		  __entry->agbno,
-		  __entry->len)
-);
+DEFINE_REFCOUNT_DEFERRED_EVENT(xfs_refcount_finish_one_leftover);
 
 /* simple inode-based error/%ip tracepoint class */
 DECLARE_EVENT_CLASS(xfs_inode_error_class,

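Editor's sketch for PATCH 4/5: the tracepoint now prints the intent type by
name through a value/name table instead of as a bare integer.  The userspace
code below is only an illustration of that lookup, not the kernel's
__print_symbolic implementation.  Every demo_/DEMO_ name is hypothetical, the
numeric enum values are an assumption (the patch does not show them), and the
hex fallback for unknown values is likewise an assumption about what a reader
would want to see.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for enum xfs_refcount_intent_type; values assumed. */
enum demo_refcount_intent_type {
	DEMO_REFCOUNT_INCREASE = 1,
	DEMO_REFCOUNT_DECREASE,
	DEMO_REFCOUNT_ALLOC_COW,
	DEMO_REFCOUNT_FREE_COW,
};

/* One value/name pair, the shape a symbolic printer consumes. */
struct demo_symbol {
	int		value;
	const char	*name;
};

/* Same layout as XFS_REFCOUNT_INTENT_STRINGS in the patch. */
#define DEMO_REFCOUNT_INTENT_STRINGS \
	{ DEMO_REFCOUNT_INCREASE,	"incr" }, \
	{ DEMO_REFCOUNT_DECREASE,	"decr" }, \
	{ DEMO_REFCOUNT_ALLOC_COW,	"alloc_cow" }, \
	{ DEMO_REFCOUNT_FREE_COW,	"free_cow" }

/* Write the name for op into buf; fall back to hex for unknown values. */
static const char *
demo_print_symbolic(int op, char *buf, size_t len)
{
	static const struct demo_symbol table[] = {
		DEMO_REFCOUNT_INTENT_STRINGS
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].value == op) {
			snprintf(buf, len, "%s", table[i].name);
			return buf;
		}
	}
	snprintf(buf, len, "0x%x", (unsigned int)op);
	return buf;
}
```

Keeping the table next to the enum, as the patch does in xfs_refcount.h, means
a new intent type only needs one extra table row to show up by name.
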
* [PATCH 5/5] xfs: remove xfs_trans_set_refcount_flags
  2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: refcount log intent cleanups Darrick J. Wong
                   ` (3 preceding siblings ...)
  2022-12-30 22:18   ` [PATCH 4/5] xfs: clean up refcount log intent item tracepoint callsites Darrick J. Wong
@ 2022-12-30 22:18   ` Darrick J. Wong
  4 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Remove this single-use helper.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_refcount_item.c |   32 ++++++++++++--------------------
 1 file changed, 12 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c
index 5c6eecc5318a..ccc334d482a4 100644
--- a/fs/xfs/xfs_refcount_item.c
+++ b/fs/xfs/xfs_refcount_item.c
@@ -289,25 +289,6 @@ xfs_refcount_update_diff_items(
 	return ra->ri_pag->pag_agno - rb->ri_pag->pag_agno;
 }
 
-/* Set the phys extent flags for this reverse mapping. */
-static void
-xfs_trans_set_refcount_flags(
-	struct xfs_phys_extent		*pmap,
-	enum xfs_refcount_intent_type	type)
-{
-	pmap->pe_flags = 0;
-	switch (type) {
-	case XFS_REFCOUNT_INCREASE:
-	case XFS_REFCOUNT_DECREASE:
-	case XFS_REFCOUNT_ALLOC_COW:
-	case XFS_REFCOUNT_FREE_COW:
-		pmap->pe_flags |= type;
-		break;
-	default:
-		ASSERT(0);
-	}
-}
-
 /* Log refcount updates in the intent item. */
 STATIC void
 xfs_refcount_update_log_item(
@@ -331,7 +312,18 @@ xfs_refcount_update_log_item(
 	pmap = &cuip->cui_format.cui_extents[next_extent];
 	pmap->pe_startblock = ri->ri_startblock;
 	pmap->pe_len = ri->ri_blockcount;
-	xfs_trans_set_refcount_flags(pmap, ri->ri_type);
+
+	pmap->pe_flags = 0;
+	switch (ri->ri_type) {
+	case XFS_REFCOUNT_INCREASE:
+	case XFS_REFCOUNT_DECREASE:
+	case XFS_REFCOUNT_ALLOC_COW:
+	case XFS_REFCOUNT_FREE_COW:
+		pmap->pe_flags |= ri->ri_type;
+		break;
+	default:
+		ASSERT(0);
+	}
 }
 
 static struct xfs_log_item *

* [PATCHSET v1.0 00/42] xfs: reflink on the realtime device
  2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
                   ` (14 preceding siblings ...)
  2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: refcount log intent cleanups Darrick J. Wong
@ 2022-12-30 22:18 ` Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 03/42] xfs: namespace the maximum length/refcount symbols Darrick J. Wong
                     ` (41 more replies)
  2022-12-30 22:18 ` [PATCHSET v1.0 0/9] xfs: reflink with large realtime extents Darrick J. Wong
                   ` (23 subsequent siblings)
  39 siblings, 42 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC
  To: djwong; +Cc: linux-xfs

Hi all,

This patchset enables use of the file data block sharing feature (i.e.
reflink) on the realtime device.  It follows the same basic sequence as
the realtime rmap series -- first a few cleanups; then widening of the
API parameters; and introduction of the new btree format and inode fork
format.  Next comes enabling CoW and remapping for the rt device; new
scrub, repair, and health reporting code; and at the end we implement
some code to lengthen write requests so that rt extents are always CoWed
fully.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-reflink

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-reflink

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-reflink

xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=realtime-reflink
---
 fs/xfs/Makefile                      |   3
 fs/xfs/libxfs/xfs_bmap.c             |  31 +
 fs/xfs/libxfs/xfs_btree.c            |   6
 fs/xfs/libxfs/xfs_btree.h            |  12 -
 fs/xfs/libxfs/xfs_defer.c            |   1
 fs/xfs/libxfs/xfs_defer.h            |   1
 fs/xfs/libxfs/xfs_format.h           |  25 +
 fs/xfs/libxfs/xfs_fs.h               |   4
 fs/xfs/libxfs/xfs_health.h           |   4
 fs/xfs/libxfs/xfs_inode_buf.c        |  34 +
 fs/xfs/libxfs/xfs_inode_fork.c       |  13 +
 fs/xfs/libxfs/xfs_log_format.h       |   5
 fs/xfs/libxfs/xfs_refcount.c         | 315 ++++++++++---
 fs/xfs/libxfs/xfs_refcount.h         |  25 +
 fs/xfs/libxfs/xfs_rmap.c             |  14 +
 fs/xfs/libxfs/xfs_rtgroup.c          |  10
 fs/xfs/libxfs/xfs_rtgroup.h          |   8
 fs/xfs/libxfs/xfs_rtrefcount_btree.c | 811 ++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_rtrefcount_btree.h | 195 ++++++++
 fs/xfs/libxfs/xfs_rtrmap_btree.c     |  28 +
 fs/xfs/libxfs/xfs_sb.c               |   8
 fs/xfs/libxfs/xfs_shared.h           |   2
 fs/xfs/libxfs/xfs_trans_inode.c      |  14 +
 fs/xfs/libxfs/xfs_trans_resv.c       |  25 +
 fs/xfs/libxfs/xfs_types.h            |   5
 fs/xfs/scrub/bitmap.h                |  56 ++
 fs/xfs/scrub/bmap.c                  |  28 +
 fs/xfs/scrub/bmap_repair.c           |   3
 fs/xfs/scrub/common.c                |  40 +-
 fs/xfs/scrub/common.h                |   5
 fs/xfs/scrub/cow_repair.c            | 212 ++++++++-
 fs/xfs/scrub/health.c                |   1
 fs/xfs/scrub/inode.c                 |  32 +
 fs/xfs/scrub/inode_repair.c          |  27 +
 fs/xfs/scrub/quota.c                 |   8
 fs/xfs/scrub/quota_repair.c          |   2
 fs/xfs/scrub/reap.c                  | 224 +++++++++
 fs/xfs/scrub/reap.h                  |   7
 fs/xfs/scrub/refcount.c              |   2
 fs/xfs/scrub/refcount_repair.c       |   4
 fs/xfs/scrub/repair.c                |  27 +
 fs/xfs/scrub/repair.h                |   9
 fs/xfs/scrub/rmap_repair.c           |  36 ++
 fs/xfs/scrub/rtbitmap.c              |   2
 fs/xfs/scrub/rtbitmap_repair.c       |  21 +
 fs/xfs/scrub/rtrefcount.c            | 669 ++++++++++++++++++++++++++++
 fs/xfs/scrub/rtrefcount_repair.c     | 783 +++++++++++++++++++++++++++++++++
 fs/xfs/scrub/rtrmap.c                |  54 ++
 fs/xfs/scrub/rtrmap_repair.c         | 104 ++++
 fs/xfs/scrub/scrub.c                 |   7
 fs/xfs/scrub/scrub.h                 |  12 +
 fs/xfs/scrub/trace.h                 | 108 ++++-
 fs/xfs/xfs_bmap_util.c               |  66 ++-
 fs/xfs/xfs_buf_item_recover.c        |   4
 fs/xfs/xfs_fsmap.c                   |  22 +
 fs/xfs/xfs_fsops.c                   |   2
 fs/xfs/xfs_health.c                  |   4
 fs/xfs/xfs_inode.c                   |  13 +
 fs/xfs/xfs_inode_item.c              |   2
 fs/xfs/xfs_inode_item_recover.c      |   5
 fs/xfs/xfs_ioctl.c                   |  21 +
 fs/xfs/xfs_mount.c                   |   7
 fs/xfs/xfs_mount.h                   |   9
 fs/xfs/xfs_ondisk.h                  |   2
 fs/xfs/xfs_quota.h                   |   6
 fs/xfs/xfs_refcount_item.c           |  35 +
 fs/xfs/xfs_reflink.c                 | 231 ++++++++--
 fs/xfs/xfs_rtalloc.c                 | 154 ++++++
 fs/xfs/xfs_super.c                   |  19 +
 fs/xfs/xfs_trace.c                   |   9
 fs/xfs/xfs_trace.h                   | 116 +++--
 fs/xfs/xfs_trans_dquot.c             |  11
 72 files changed, 4495 insertions(+), 325 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_rtrefcount_btree.c
 create mode 100644 fs/xfs/libxfs/xfs_rtrefcount_btree.h
 create mode 100644 fs/xfs/scrub/rtrefcount.c
 create mode 100644 fs/xfs/scrub/rtrefcount_repair.c

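Editor's sketch for the cover letter: the last step it describes, lengthening
a write so that whole rt extents are CoWed, amounts to rounding the write
range outward to rt extent boundaries.  The code below is a minimal userspace
illustration of that arithmetic only.  It is not taken from the patchset; the
demo_ names, the use of plain block numbers, and the assumption that the rt
extent size is a nonzero block count (not necessarily a power of two) are all
mine.

```c
#include <stdint.h>

/* A block range: start block and length, both in filesystem blocks. */
struct demo_range {
	uint64_t	start;
	uint64_t	len;
};

/*
 * Expand [start, start + len) outward so that both ends fall on multiples
 * of rextsize.  rextsize must be nonzero.
 */
static struct demo_range
demo_round_to_rtext(uint64_t start, uint64_t len, uint32_t rextsize)
{
	struct demo_range	r;
	uint64_t		end = start + len;

	/* Round the start down to the enclosing rt extent boundary. */
	r.start = start - (start % rextsize);

	/* Round the end up unless it is already aligned. */
	if (end % rextsize)
		end += rextsize - (end % rextsize);

	r.len = end - r.start;
	return r;
}
```

With a 4-block rt extent, a 2-block write at block 5 would grow to cover
blocks 4 through 7, so the whole extent is copied at once.
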
* [PATCH 03/42] xfs: namespace the maximum length/refcount symbols
  2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong
@ 2022-12-30 22:18   ` Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 02/42] xfs: introduce realtime refcount btree definitions Darrick J. Wong
                     ` (40 subsequent siblings)
  41 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Actually namespace these variables properly, so that readers can tell
that this is an XFS symbol, and that it's for the refcount
functionality.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_format.h     |    4 ++--
 fs/xfs/libxfs/xfs_refcount.c   |   18 +++++++++---------
 fs/xfs/scrub/refcount.c        |    2 +-
 fs/xfs/scrub/refcount_repair.c |    4 ++--
 4 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index c78fe8e78b8c..c49a946e79f3 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1790,8 +1790,8 @@ struct xfs_refcount_key {
 	__be32		rc_startblock;	/* starting block number */
 };
 
-#define MAXREFCOUNT	((xfs_nlink_t)~0U)
-#define MAXREFCEXTLEN	((xfs_extlen_t)~0U)
+#define XFS_REFC_REFCOUNT_MAX	((xfs_nlink_t)~0U)
+#define XFS_REFC_LEN_MAX	((xfs_extlen_t)~0U)
 
 /* btree pointer type */
 typedef __be32 xfs_refcount_ptr_t;
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 3d2269c6855a..e1f55edceccf 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -126,7 +126,7 @@ xfs_refcount_check_perag_irec(
 	struct xfs_perag		*pag,
 	const struct xfs_refcount_irec	*irec)
 {
-	if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN)
+	if (irec->rc_blockcount == 0 || irec->rc_blockcount > XFS_REFC_LEN_MAX)
 		return __this_address;
 
 	if (!xfs_refcount_check_domain(irec))
@@ -136,7 +136,7 @@ xfs_refcount_check_perag_irec(
 	if (!xfs_verify_agbext(pag, irec->rc_startblock, irec->rc_blockcount))
 		return __this_address;
 
-	if (irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT)
+	if (irec->rc_refcount == 0 || irec->rc_refcount > XFS_REFC_REFCOUNT_MAX)
 		return __this_address;
 
 	return NULL;
@@ -860,9 +860,9 @@ xfs_refc_merge_refcount(
 	const struct xfs_refcount_irec	*irec,
 	enum xfs_refc_adjust_op		adjust)
 {
-	/* Once a record hits MAXREFCOUNT, it is pinned there forever */
-	if (irec->rc_refcount == MAXREFCOUNT)
-		return MAXREFCOUNT;
+	/* Once a record hits XFS_REFC_REFCOUNT_MAX, it is pinned forever */
+	if (irec->rc_refcount == XFS_REFC_REFCOUNT_MAX)
+		return XFS_REFC_REFCOUNT_MAX;
 
 	return irec->rc_refcount + adjust;
 }
@@ -905,7 +905,7 @@ xfs_refc_want_merge_center(
 	 * hence we need to catch u32 addition overflows here.
 	 */
 	ulen += cleft->rc_blockcount + right->rc_blockcount;
-	if (ulen >= MAXREFCEXTLEN)
+	if (ulen >= XFS_REFC_LEN_MAX)
 		return false;
 
 	*ulenp = ulen;
@@ -940,7 +940,7 @@ xfs_refc_want_merge_left(
 	 * hence we need to catch u32 addition overflows here.
 	 */
 	ulen += cleft->rc_blockcount;
-	if (ulen >= MAXREFCEXTLEN)
+	if (ulen >= XFS_REFC_LEN_MAX)
 		return false;
 
 	return true;
@@ -974,7 +974,7 @@ xfs_refc_want_merge_right(
 	 * hence we need to catch u32 addition overflows here.
 	 */
 	ulen += cright->rc_blockcount;
-	if (ulen >= MAXREFCEXTLEN)
+	if (ulen >= XFS_REFC_LEN_MAX)
 		return false;
 
 	return true;
@@ -1201,7 +1201,7 @@ xfs_refcount_adjust_extents(
 		 * Adjust the reference count and either update the tree
 		 * (incr) or free the blocks (decr).
 		 */
-		if (ext.rc_refcount == MAXREFCOUNT)
+		if (ext.rc_refcount == XFS_REFC_REFCOUNT_MAX)
 			goto skip;
 		ext.rc_refcount += adj;
 		trace_xfs_refcount_modify_extent(cur, &ext);
diff --git a/fs/xfs/scrub/refcount.c b/fs/xfs/scrub/refcount.c
index 413885eca333..78b52c8a4d7f 100644
--- a/fs/xfs/scrub/refcount.c
+++ b/fs/xfs/scrub/refcount.c
@@ -421,7 +421,7 @@ xchk_refcount_mergeable(
 	if (r1->rc_refcount != r2->rc_refcount)
 		return false;
 	if ((unsigned long long)r1->rc_blockcount + r2->rc_blockcount >
-			MAXREFCEXTLEN)
+			XFS_REFC_LEN_MAX)
 		return false;
 
 	return true;
diff --git a/fs/xfs/scrub/refcount_repair.c b/fs/xfs/scrub/refcount_repair.c
index 539548cdc65a..81709afdd9e6 100644
--- a/fs/xfs/scrub/refcount_repair.c
+++ b/fs/xfs/scrub/refcount_repair.c
@@ -176,7 +176,7 @@ xrep_refc_stash(
 	if (xchk_should_terminate(sc, &error))
 		return error;
 
-	irec.rc_refcount = min_t(uint64_t, MAXREFCOUNT, refcount);
+	irec.rc_refcount = min_t(uint64_t, XFS_REFC_REFCOUNT_MAX, refcount);
 
 	error = xrep_refc_check_ext(rr->sc, &irec);
 	if (error)
@@ -415,7 +415,7 @@ xrep_refc_find_refcounts(
 	/*
 	 * Set up a bag to store all the rmap records that we're tracking to
 	 * generate a reference count record.  If the size of the bag exceeds
-	 * MAXREFCOUNT, we clamp rc_refcount.
+	 * XFS_REFC_REFCOUNT_MAX, we clamp rc_refcount.
 	 */
 	error = rcbag_init(sc->mp, sc->xfile_buftarg, &rcstack);
 	if (error)

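Editor's sketch for PATCH 03/42: the two renamed limits guard two different
behaviours in the hunks above.  A refcount that has reached the maximum is
left pinned there, and a merge is refused when the combined extent length
would reach the length limit.  The userspace code below restates those two
checks under stated assumptions: uint32_t stands in for xfs_nlink_t and
xfs_extlen_t, and every demo_/DEMO_ name is hypothetical rather than kernel
code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for XFS_REFC_REFCOUNT_MAX and XFS_REFC_LEN_MAX (assumed 32-bit). */
#define DEMO_REFC_REFCOUNT_MAX	((uint32_t)~0U)
#define DEMO_REFC_LEN_MAX	((uint32_t)~0U)

/* Apply an adjustment of +1 or -1; a count pinned at the maximum stays put. */
static uint32_t
demo_adjust_refcount(uint32_t refcount, int adjust)
{
	if (refcount == DEMO_REFC_REFCOUNT_MAX)
		return DEMO_REFC_REFCOUNT_MAX;
	return refcount + adjust;
}

/*
 * Decide whether two extent lengths may be merged.  The sum is computed in
 * 64 bits so that adding two 32-bit lengths cannot wrap before the check.
 */
static bool
demo_can_merge_len(uint32_t left_len, uint32_t right_len)
{
	uint64_t	ulen = (uint64_t)left_len + right_len;

	return ulen < DEMO_REFC_LEN_MAX;
}
```

The pinning means a record that once saturated is never decremented again,
which trades leaked space for never undercounting a shared extent.
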
* [PATCH 02/42] xfs: introduce realtime refcount btree definitions
  2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 03/42] xfs: namespace the maximum length/refcount symbols Darrick J. Wong
@ 2022-12-30 22:18   ` Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 01/42] xfs: prepare refcount btree cursor tracepoints for realtime Darrick J. Wong
                     ` (39 subsequent siblings)
  41 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add new realtime refcount btree definitions.  The realtime refcount
btree will be rooted from a hidden inode, but has its own shape and
therefore needs to have most of its own separate types.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_btree.h  |    1 +
 fs/xfs/libxfs/xfs_format.h |    6 ++++++
 fs/xfs/libxfs/xfs_types.h  |    5 +++--
 fs/xfs/scrub/trace.h       |    1 +
 fs/xfs/xfs_trace.h         |    1 +
 5 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h
index 20342ed62bf4..ce5ef798c3bc 100644
--- a/fs/xfs/libxfs/xfs_btree.h
+++ b/fs/xfs/libxfs/xfs_btree.h
@@ -65,6 +65,7 @@ union xfs_btree_rec {
 #define	XFS_BTNUM_REFC	((xfs_btnum_t)XFS_BTNUM_REFCi)
 #define	XFS_BTNUM_RCBAG	((xfs_btnum_t)XFS_BTNUM_RCBAGi)
 #define	XFS_BTNUM_RTRMAP ((xfs_btnum_t)XFS_BTNUM_RTRMAPi)
+#define	XFS_BTNUM_RTREFC ((xfs_btnum_t)XFS_BTNUM_RTREFCi)
 
 struct xfs_btree_ops;
 uint32_t xfs_btree_magic(struct xfs_mount *mp, const struct xfs_btree_ops *ops);
diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index a2b8d8ee8afd..c78fe8e78b8c 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -1796,6 +1796,12 @@ struct xfs_refcount_key {
 /* btree pointer type */
 typedef __be32 xfs_refcount_ptr_t;
 
+/*
+ * Realtime Reference Count btree format definitions
+ *
+ * This is a btree for reference count records for realtime volumes
+ */
+#define	XFS_RTREFC_CRC_MAGIC	0x52434e54	/* 'RCNT' */
 
 /*
  * BMAP Btree format definitions
diff --git a/fs/xfs/libxfs/xfs_types.h b/fs/xfs/libxfs/xfs_types.h
index e6a4f4a7d009..92c60a9d5862 100644
--- a/fs/xfs/libxfs/xfs_types.h
+++ b/fs/xfs/libxfs/xfs_types.h
@@ -126,7 +126,7 @@ typedef enum {
 typedef enum {
 	XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi,
 	XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_REFCi, XFS_BTNUM_RCBAGi,
-	XFS_BTNUM_RTRMAPi, XFS_BTNUM_MAX
+	XFS_BTNUM_RTRMAPi, XFS_BTNUM_RTREFCi, XFS_BTNUM_MAX
 } xfs_btnum_t;
 
 #define XFS_BTNUM_STRINGS \
@@ -138,7 +138,8 @@ typedef enum {
 	{ XFS_BTNUM_FINOi,	"finobt" }, \
 	{ XFS_BTNUM_REFCi,	"refcbt" }, \
 	{ XFS_BTNUM_RCBAGi,	"rcbagbt" }, \
-	{ XFS_BTNUM_RTRMAPi,	"rtrmapbt" }
+	{ XFS_BTNUM_RTRMAPi,	"rtrmapbt" }, \
+	{ XFS_BTNUM_RTREFCi,	"rtrefcbt" }
 
 struct xfs_name {
 	const unsigned char	*name;
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 4cf8180173ca..8d66ab10e1fd 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -43,6 +43,7 @@ TRACE_DEFINE_ENUM(XFS_BTNUM_RMAPi);
 TRACE_DEFINE_ENUM(XFS_BTNUM_REFCi);
 TRACE_DEFINE_ENUM(XFS_BTNUM_RCBAGi);
 TRACE_DEFINE_ENUM(XFS_BTNUM_RTRMAPi);
+TRACE_DEFINE_ENUM(XFS_BTNUM_RTREFCi);
 
 TRACE_DEFINE_ENUM(XFS_REFC_DOMAIN_SHARED);
 TRACE_DEFINE_ENUM(XFS_REFC_DOMAIN_COW);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 4e0c40934a7f..1f8ab7c436a9 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -2560,6 +2560,7 @@ TRACE_DEFINE_ENUM(XFS_BTNUM_RMAPi);
 TRACE_DEFINE_ENUM(XFS_BTNUM_REFCi);
 TRACE_DEFINE_ENUM(XFS_BTNUM_RCBAGi);
 TRACE_DEFINE_ENUM(XFS_BTNUM_RTRMAPi);
+TRACE_DEFINE_ENUM(XFS_BTNUM_RTREFCi);
 
 DECLARE_EVENT_CLASS(xfs_btree_cur_class,
 	TP_PROTO(struct xfs_btree_cur *cur, int level, struct xfs_buf *bp),

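Editor's sketch for PATCH 02/42: the new magic number 0x52434e54 is four
ASCII bytes that spell the 'RCNT' tag in the comment when read most
significant byte first.  The helper below is a hypothetical userspace
illustration for checking such a constant by eye; only the numeric value is
taken from the patch, and the DEMO_/demo_ names are mine.

```c
#include <stdint.h>

/* Value copied from the patch; the name and helper are illustrative only. */
#define DEMO_RTREFC_CRC_MAGIC	0x52434e54u	/* 'RCNT' */

/* Unpack the magic, most significant byte first, into a NUL-terminated tag. */
static void
demo_magic_to_string(uint32_t magic, char out[5])
{
	out[0] = (char)((magic >> 24) & 0xff);
	out[1] = (char)((magic >> 16) & 0xff);
	out[2] = (char)((magic >> 8) & 0xff);
	out[3] = (char)(magic & 0xff);
	out[4] = '\0';
}
```

Decoding the constant this way is a quick sanity check that the hex value and
the quoted tag in the comment agree.
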
* [PATCH 01/42] xfs: prepare refcount btree cursor tracepoints for realtime
  2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 03/42] xfs: namespace the maximum length/refcount symbols Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 02/42] xfs: introduce realtime refcount btree definitions Darrick J. Wong
@ 2022-12-30 22:18   ` Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 05/42] xfs: realtime refcount btree transaction reservations Darrick J. Wong
                     ` (38 subsequent siblings)
  41 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Rework the refcount btree cursor tracepoints in preparation to handle
the realtime refcount btree cursor.  Mostly this involves renaming the
field to "refcbno" and extracting the group number from the cursor when
possible.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_trace.c |    9 ++++
 fs/xfs/xfs_trace.h |  114 ++++++++++++++++++++++++++++++----------------------
 2 files changed, 74 insertions(+), 49 deletions(-)

diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
index 0b9405749079..64f11a535763 100644
--- a/fs/xfs/xfs_trace.c
+++ b/fs/xfs/xfs_trace.c
@@ -64,6 +64,15 @@ xfs_rmapbt_crack_agno_opdev(
 	}
 }
 
+static inline void
+xfs_refcountbt_crack_agno_opdev(
+	struct xfs_btree_cur	*cur,
+	xfs_agnumber_t		*agno,
+	dev_t			*opdev)
+{
+	return xfs_rmapbt_crack_agno_opdev(cur, agno, opdev);
+}
+
 /*
  * We include this last to have the helpers above available for the trace
  * event implementations.
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index c22ffe459002..4e0c40934a7f 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -22,6 +22,8 @@
  *
  * rmapbno: physical block number for a reverse mapping.  This is an agbno for
  *          per-AG rmap btrees or a rgbno for realtime rmap btrees.
+ * refcbno: physical block number for a refcount record.  This is an agbno for
+ *          per-AG refcount btrees or a rgbno for realtime refcount btrees.
  *
  * daddr: physical block number in 512b blocks
  * bbcount: number of blocks in a physical extent, in 512b blocks
@@ -3230,56 +3232,60 @@ DEFINE_AG_ERROR_EVENT(xfs_ag_resv_init_error);
 /* refcount tracepoint classes */
 
 DECLARE_EVENT_CLASS(xfs_refcount_class,
-	TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t agbno,
+	TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t refcbno,
 		xfs_extlen_t len),
-	TP_ARGS(cur, agbno, len),
+	TP_ARGS(cur, refcbno, len),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
+		__field(dev_t, opdev)
 		__field(xfs_agnumber_t, agno)
-		__field(xfs_agblock_t, agbno)
+		__field(xfs_agblock_t, refcbno)
 		__field(xfs_extlen_t, len)
 	),
 	TP_fast_assign(
 		__entry->dev = cur->bc_mp->m_super->s_dev;
-		__entry->agno = cur->bc_ag.pag->pag_agno;
-		__entry->agbno = agbno;
+		xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev);
+		__entry->refcbno = refcbno;
 		__entry->len = len;
 	),
-	TP_printk("dev %d:%d agno 0x%x agbno 0x%x fsbcount 0x%x",
+	TP_printk("dev %d:%d opdev %d:%d agno 0x%x refcbno 0x%x fsbcount 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  MAJOR(__entry->opdev), MINOR(__entry->opdev),
 		  __entry->agno,
-		  __entry->agbno,
+		  __entry->refcbno,
 		  __entry->len)
 );
 #define DEFINE_REFCOUNT_EVENT(name) \
 DEFINE_EVENT(xfs_refcount_class, name, \
-	TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t agbno, \
+	TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t refcbno, \
 		xfs_extlen_t len), \
-	TP_ARGS(cur, agbno, len))
+	TP_ARGS(cur, refcbno, len))
 
 TRACE_DEFINE_ENUM(XFS_LOOKUP_EQi);
 TRACE_DEFINE_ENUM(XFS_LOOKUP_LEi);
 TRACE_DEFINE_ENUM(XFS_LOOKUP_GEi);
 TRACE_EVENT(xfs_refcount_lookup,
-	TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t agbno,
+	TP_PROTO(struct xfs_btree_cur *cur, xfs_agblock_t refcbno,
 		 xfs_lookup_t dir),
-	TP_ARGS(cur, agbno, dir),
+	TP_ARGS(cur, refcbno, dir),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
+		__field(dev_t, opdev)
 		__field(xfs_agnumber_t, agno)
-		__field(xfs_agblock_t, agbno)
+		__field(xfs_agblock_t, refcbno)
 		__field(xfs_lookup_t, dir)
 	),
 	TP_fast_assign(
 		__entry->dev = cur->bc_mp->m_super->s_dev;
-		__entry->agno = cur->bc_ag.pag->pag_agno;
-		__entry->agbno = agbno;
+		xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev);
+		__entry->refcbno = refcbno;
 		__entry->dir = dir;
 	),
-	TP_printk("dev %d:%d agno 0x%x agbno 0x%x cmp %s(%d)",
+	TP_printk("dev %d:%d opdev %d:%d agno 0x%x refcbno 0x%x cmp %s(%d)",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  MAJOR(__entry->opdev), MINOR(__entry->opdev),
 		  __entry->agno,
-		  __entry->agbno,
+		  __entry->refcbno,
 		  __print_symbolic(__entry->dir, XFS_AG_BTREE_CMP_FORMAT_STR),
 		  __entry->dir)
 )
@@ -3290,6 +3296,7 @@ DECLARE_EVENT_CLASS(xfs_refcount_extent_class,
 	TP_ARGS(cur, irec),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
+		__field(dev_t, opdev)
 		__field(xfs_agnumber_t, agno)
 		__field(enum xfs_refc_domain, domain)
 		__field(xfs_agblock_t, startblock)
@@ -3298,14 +3305,15 @@ DECLARE_EVENT_CLASS(xfs_refcount_extent_class,
 	),
 	TP_fast_assign(
 		__entry->dev = cur->bc_mp->m_super->s_dev;
-		__entry->agno = cur->bc_ag.pag->pag_agno;
+		xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev);
 		__entry->domain = irec->rc_domain;
 		__entry->startblock = irec->rc_startblock;
 		__entry->blockcount = irec->rc_blockcount;
 		__entry->refcount = irec->rc_refcount;
 	),
-	TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u",
+	TP_printk("dev %d:%d opdev %d:%d agno 0x%x dom %s refcbno 0x%x fsbcount 0x%x refcount %u",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  MAJOR(__entry->opdev), MINOR(__entry->opdev),
 		  __entry->agno,
 		  __print_symbolic(__entry->domain, XFS_REFC_DOMAIN_STRINGS),
 		  __entry->startblock,
@@ -3321,49 +3329,52 @@ DEFINE_EVENT(xfs_refcount_extent_class, name, \
 /* single-rcext and an agbno tracepoint class */
 DECLARE_EVENT_CLASS(xfs_refcount_extent_at_class,
 	TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *irec,
-		 xfs_agblock_t agbno),
-	TP_ARGS(cur, irec, agbno),
+		 xfs_agblock_t refcbno),
+	TP_ARGS(cur, irec, refcbno),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
+		__field(dev_t, opdev)
 		__field(xfs_agnumber_t, agno)
 		__field(enum xfs_refc_domain, domain)
 		__field(xfs_agblock_t, startblock)
 		__field(xfs_extlen_t, blockcount)
 		__field(xfs_nlink_t, refcount)
-		__field(xfs_agblock_t, agbno)
+		__field(xfs_agblock_t, refcbno)
 	),
 	TP_fast_assign(
 		__entry->dev = cur->bc_mp->m_super->s_dev;
-		__entry->agno = cur->bc_ag.pag->pag_agno;
+		xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev);
 		__entry->domain = irec->rc_domain;
 		__entry->startblock = irec->rc_startblock;
 		__entry->blockcount = irec->rc_blockcount;
 		__entry->refcount = irec->rc_refcount;
-		__entry->agbno = agbno;
+		__entry->refcbno = refcbno;
 	),
-	TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u @ agbno 0x%x",
+	TP_printk("dev %d:%d opdev %d:%d agno 0x%x dom %s refcbno 0x%x fsbcount 0x%x refcount %u @ refcbno 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  MAJOR(__entry->opdev), MINOR(__entry->opdev),
 		  __entry->agno,
 		  __print_symbolic(__entry->domain, XFS_REFC_DOMAIN_STRINGS),
 		  __entry->startblock,
 		  __entry->blockcount,
 		  __entry->refcount,
-		  __entry->agbno)
+		  __entry->refcbno)
 )
 
 #define DEFINE_REFCOUNT_EXTENT_AT_EVENT(name) \
 DEFINE_EVENT(xfs_refcount_extent_at_class, name, \
 	TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *irec, \
-		 xfs_agblock_t agbno), \
-	TP_ARGS(cur, irec, agbno))
+		 xfs_agblock_t refcbno), \
+	TP_ARGS(cur, irec, refcbno))
 
 /* double-rcext tracepoint class */
 DECLARE_EVENT_CLASS(xfs_refcount_double_extent_class,
 	TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *i1,
-		struct xfs_refcount_irec *i2),
+		 struct xfs_refcount_irec *i2),
 	TP_ARGS(cur, i1, i2),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
+		__field(dev_t, opdev)
 		__field(xfs_agnumber_t, agno)
 		__field(enum xfs_refc_domain, i1_domain)
 		__field(xfs_agblock_t, i1_startblock)
@@ -3376,7 +3387,7 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_class,
 	),
 	TP_fast_assign(
 		__entry->dev = cur->bc_mp->m_super->s_dev;
-		__entry->agno = cur->bc_ag.pag->pag_agno;
+		xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev);
 		__entry->i1_domain = i1->rc_domain;
 		__entry->i1_startblock = i1->rc_startblock;
 		__entry->i1_blockcount = i1->rc_blockcount;
@@ -3386,9 +3397,10 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_class,
 		__entry->i2_blockcount = i2->rc_blockcount;
 		__entry->i2_refcount = i2->rc_refcount;
 	),
-	TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u -- "
-		  "dom %s agbno 0x%x fsbcount 0x%x refcount %u",
+	TP_printk("dev %d:%d opdev %d:%d agno 0x%x dom %s refcbno 0x%x fsbcount 0x%x refcount %u -- "
+		  "dom %s refcbno 0x%x fsbcount 0x%x refcount %u",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  MAJOR(__entry->opdev), MINOR(__entry->opdev),
 		  __entry->agno,
 		  __print_symbolic(__entry->i1_domain, XFS_REFC_DOMAIN_STRINGS),
 		  __entry->i1_startblock,
@@ -3409,10 +3421,11 @@ DEFINE_EVENT(xfs_refcount_double_extent_class, name, \
 /* double-rcext and an agbno tracepoint class */
 DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class,
 	TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *i1,
-		 struct xfs_refcount_irec *i2, xfs_agblock_t agbno),
-	TP_ARGS(cur, i1, i2, agbno),
+		 struct xfs_refcount_irec *i2, xfs_agblock_t refcbno),
+	TP_ARGS(cur, i1, i2, refcbno),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
+		__field(dev_t, opdev)
 		__field(xfs_agnumber_t, agno)
 		__field(enum xfs_refc_domain, i1_domain)
 		__field(xfs_agblock_t, i1_startblock)
@@ -3422,11 +3435,11 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class,
 		__field(xfs_agblock_t, i2_startblock)
 		__field(xfs_extlen_t, i2_blockcount)
 		__field(xfs_nlink_t, i2_refcount)
-		__field(xfs_agblock_t, agbno)
+		__field(xfs_agblock_t, refcbno)
 	),
 	TP_fast_assign(
 		__entry->dev = cur->bc_mp->m_super->s_dev;
-		__entry->agno = cur->bc_ag.pag->pag_agno;
+		xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev);
 		__entry->i1_domain = i1->rc_domain;
 		__entry->i1_startblock = i1->rc_startblock;
 		__entry->i1_blockcount = i1->rc_blockcount;
@@ -3435,11 +3448,12 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class,
 		__entry->i2_startblock = i2->rc_startblock;
 		__entry->i2_blockcount = i2->rc_blockcount;
 		__entry->i2_refcount = i2->rc_refcount;
-		__entry->agbno = agbno;
+		__entry->refcbno = refcbno;
 	),
-	TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u -- "
-		  "dom %s agbno 0x%x fsbcount 0x%x refcount %u @ agbno 0x%x",
+	TP_printk("dev %d:%d opdev %d:%d agno 0x%x dom %s refcbno 0x%x fsbcount 0x%x refcount %u -- "
+		  "dom %s refcbno 0x%x fsbcount 0x%x refcount %u @ refcbno 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  MAJOR(__entry->opdev), MINOR(__entry->opdev),
 		  __entry->agno,
 		  __print_symbolic(__entry->i1_domain, XFS_REFC_DOMAIN_STRINGS),
 		  __entry->i1_startblock,
@@ -3449,14 +3463,14 @@ DECLARE_EVENT_CLASS(xfs_refcount_double_extent_at_class,
 		  __entry->i2_startblock,
 		  __entry->i2_blockcount,
 		  __entry->i2_refcount,
-		  __entry->agbno)
+		  __entry->refcbno)
 )
 
 #define DEFINE_REFCOUNT_DOUBLE_EXTENT_AT_EVENT(name) \
 DEFINE_EVENT(xfs_refcount_double_extent_at_class, name, \
 	TP_PROTO(struct xfs_btree_cur *cur, struct xfs_refcount_irec *i1, \
-		struct xfs_refcount_irec *i2, xfs_agblock_t agbno), \
-	TP_ARGS(cur, i1, i2, agbno))
+		struct xfs_refcount_irec *i2, xfs_agblock_t refcbno), \
+	TP_ARGS(cur, i1, i2, refcbno))
 
 /* triple-rcext tracepoint class */
 DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class,
@@ -3465,6 +3479,7 @@ DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class,
 	TP_ARGS(cur, i1, i2, i3),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
+		__field(dev_t, opdev)
 		__field(xfs_agnumber_t, agno)
 		__field(enum xfs_refc_domain, i1_domain)
 		__field(xfs_agblock_t, i1_startblock)
@@ -3481,7 +3496,7 @@ DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class,
 	),
 	TP_fast_assign(
 		__entry->dev = cur->bc_mp->m_super->s_dev;
-		__entry->agno = cur->bc_ag.pag->pag_agno;
+		xfs_refcountbt_crack_agno_opdev(cur, &__entry->agno, &__entry->opdev);
 		__entry->i1_domain = i1->rc_domain;
 		__entry->i1_startblock = i1->rc_startblock;
 		__entry->i1_blockcount = i1->rc_blockcount;
@@ -3495,10 +3510,11 @@ DECLARE_EVENT_CLASS(xfs_refcount_triple_extent_class,
 		__entry->i3_blockcount = i3->rc_blockcount;
 		__entry->i3_refcount = i3->rc_refcount;
 	),
-	TP_printk("dev %d:%d agno 0x%x dom %s agbno 0x%x fsbcount 0x%x refcount %u -- "
-		  "dom %s agbno 0x%x fsbcount 0x%x refcount %u -- "
-		  "dom %s agbno 0x%x fsbcount 0x%x refcount %u",
+	TP_printk("dev %d:%d opdev %d:%d agno 0x%x dom %s refcbno 0x%x fsbcount 0x%x refcount %u -- "
+		  "dom %s refcbno 0x%x fsbcount 0x%x refcount %u -- "
+		  "dom %s refcbno 0x%x fsbcount 0x%x refcount %u",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  MAJOR(__entry->opdev), MINOR(__entry->opdev),
 		  __entry->agno,
 		  __print_symbolic(__entry->i1_domain, XFS_REFC_DOMAIN_STRINGS),
 		  __entry->i1_startblock,
@@ -3568,21 +3584,21 @@ DECLARE_EVENT_CLASS(xfs_refcount_deferred_class,
 		__field(dev_t, dev)
 		__field(xfs_agnumber_t, agno)
 		__field(int, op)
-		__field(xfs_agblock_t, agbno)
+		__field(xfs_agblock_t, refcbno)
 		__field(xfs_extlen_t, len)
 	),
 	TP_fast_assign(
 		__entry->dev = mp->m_super->s_dev;
 		__entry->agno = XFS_FSB_TO_AGNO(mp, refc->ri_startblock);
 		__entry->op = refc->ri_type;
-		__entry->agbno = XFS_FSB_TO_AGBNO(mp, refc->ri_startblock);
+		__entry->refcbno = XFS_FSB_TO_AGBNO(mp, refc->ri_startblock);
 		__entry->len = refc->ri_blockcount;
 	),
-	TP_printk("dev %d:%d op %s agno 0x%x agbno 0x%x fsbcount 0x%x",
+	TP_printk("dev %d:%d op %s agno 0x%x refcbno 0x%x fsbcount 0x%x",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __print_symbolic(__entry->op, XFS_REFCOUNT_INTENT_STRINGS),
 		  __entry->agno,
-		  __entry->agbno,
+		  __entry->refcbno,
 		  __entry->len)
 );
 #define DEFINE_REFCOUNT_DEFERRED_EVENT(name) \

* [PATCH 05/42] xfs: realtime refcount btree transaction reservations
  2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong
                     ` (2 preceding siblings ...)
  2022-12-30 22:18   ` [PATCH 01/42] xfs: prepare refcount btree cursor tracepoints for realtime Darrick J. Wong
@ 2022-12-30 22:18   ` Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 06/42] xfs: add realtime refcount btree operations Darrick J. Wong
                     ` (37 subsequent siblings)
  41 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrefcountbt to add the record and a second
split in the regular refcountbt to record the rtrefcountbt split.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_trans_resv.c |   25 ++++++++++++++++++++++---
 1 file changed, 22 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 52a4386a3d96..2b8b8dd5dec3 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -90,6 +90,14 @@ xfs_refcountbt_block_count(
 	return num_ops * (2 * mp->m_refc_maxlevels - 1);
 }
 
+static unsigned int
+xfs_rtrefcountbt_block_count(
+	struct xfs_mount	*mp,
+	unsigned int		num_ops)
+{
+	return num_ops * (2 * mp->m_rtrefc_maxlevels - 1);
+}
+
 /*
  * Logging inodes is really tricksy. They are logged in memory format,
  * which means that what we write into the log doesn't directly translate into
@@ -257,10 +265,13 @@ xfs_rtalloc_block_count(
  * Compute the log reservation required to handle the refcount update
  * transaction.  Refcount updates are always done via deferred log items.
  *
- * This is calculated as:
+ * This is calculated as the max of:
  * Data device refcount updates (t1):
  *    the agfs of the ags containing the blocks: nr_ops * sector size
  *    the refcount btrees: nr_ops * 1 trees * (2 * max depth - 1) * block size
+ * Realtime refcount updates (t2):
+ *    the rt refcount inode
+ *    the rtrefcount btrees: nr_ops * 1 trees * (2 * max depth - 1) * block size
  */
 static unsigned int
 xfs_calc_refcountbt_reservation(
@@ -268,12 +279,20 @@ xfs_calc_refcountbt_reservation(
 	unsigned int		nr_ops)
 {
 	unsigned int		blksz = XFS_FSB_TO_B(mp, 1);
+	unsigned int		t1, t2 = 0;
 
 	if (!xfs_has_reflink(mp))
 		return 0;
 
-	return xfs_calc_buf_res(nr_ops, mp->m_sb.sb_sectsize) +
-	       xfs_calc_buf_res(xfs_refcountbt_block_count(mp, nr_ops), blksz);
+	t1 = xfs_calc_buf_res(nr_ops, mp->m_sb.sb_sectsize) +
+	     xfs_calc_buf_res(xfs_refcountbt_block_count(mp, nr_ops), blksz);
+
+	if (xfs_has_realtime(mp))
+		t2 = xfs_calc_inode_res(mp, 1) +
+		     xfs_calc_buf_res(xfs_rtrefcountbt_block_count(mp, nr_ops),
+				     blksz);
+
+	return max(t1, t2);
 }
 
 /*
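The "max of t1 and t2" rule in the reservation comment above can be sketched in standalone userspace C. This is only an illustration under stated assumptions: struct sketch_geom, the sketch_* helpers, the per-buffer overhead and the inode logging cost are all hypothetical stand-ins for values the kernel derives from struct xfs_mount, and the numbers in the usage note are made up.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for the handful of mount fields the reservation
 * depends on.  All names here are illustrative, not kernel API.
 */
struct sketch_geom {
	unsigned int	blocksize;		/* fs block size, bytes */
	unsigned int	sectsize;		/* sector size, bytes */
	unsigned int	refc_maxlevels;		/* data device refcountbt height */
	unsigned int	rtrefc_maxlevels;	/* rt refcountbt height */
	unsigned int	buf_overhead;		/* assumed per-buffer log overhead */
	unsigned int	inode_res;		/* assumed cost of logging one inode */
	int		has_realtime;
};

/* Worst case: each of nr_ops updates splits the btree at every level. */
static unsigned int
sketch_btree_block_count(unsigned int maxlevels, unsigned int nr_ops)
{
	assert(maxlevels > 0);	/* 2 * 0 - 1 would wrap around */
	return nr_ops * (2 * maxlevels - 1);
}

/* Space needed to log nbufs buffers of the given size. */
static unsigned int
sketch_buf_res(const struct sketch_geom *g, unsigned int nbufs,
	       unsigned int size)
{
	return nbufs * (size + g->buf_overhead);
}

static unsigned int
sketch_refcount_resv(const struct sketch_geom *g, unsigned int nr_ops)
{
	unsigned int	t1, t2 = 0;

	/* t1: one AGF sector per op, plus the refcountbt blocks. */
	t1 = sketch_buf_res(g, nr_ops, g->sectsize) +
	     sketch_buf_res(g,
			sketch_btree_block_count(g->refc_maxlevels, nr_ops),
			g->blocksize);

	/* t2: the rt refcount inode, plus the rtrefcountbt blocks. */
	if (g->has_realtime)
		t2 = g->inode_res +
		     sketch_buf_res(g,
			sketch_btree_block_count(g->rtrefc_maxlevels, nr_ops),
			g->blocksize);

	return t1 > t2 ? t1 : t2;
}
```

With 4096-byte blocks, 512-byte sectors, tree heights of 3 and 4, an assumed 128-byte buffer overhead, an assumed 1000-byte inode cost and nr_ops = 2, t1 is 43520 bytes and t2 is 60136 bytes, so the sketch reserves 60136. Taking the larger value rather than the sum presumably reflects that one refcount update touches either the data device tree or the realtime tree, not both.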
* [PATCH 06/42] xfs: add realtime refcount btree operations 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 05/42] xfs: realtime refcount btree transaction reservations Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 04/42] xfs: define the on-disk realtime refcount btree format Darrick J. Wong ` (36 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Implement the generic btree operations needed to manipulate rtrefcount btree blocks. This is different from the regular refcountbt in that we allocate space from the filesystem at large, and are neither constrained to the free space nor any particular AG. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtrefcount_btree.c | 148 ++++++++++++++++++++++++++++++++++ 1 file changed, 148 insertions(+) diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c index dd8e628b068b..bdefc4f5939d 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.c +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c @@ -19,6 +19,7 @@ #include "xfs_btree.h" #include "xfs_btree_staging.h" #include "xfs_rtrefcount_btree.h" +#include "xfs_refcount.h" #include "xfs_trace.h" #include "xfs_cksum.h" #include "xfs_error.h" @@ -53,6 +54,106 @@ xfs_rtrefcountbt_dup_cursor( return new; } +STATIC int +xfs_rtrefcountbt_get_minrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + + return xfs_rtrefcountbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes, + level == 0) / 2; + } + + return cur->bc_mp->m_rtrefc_mnr[level != 0]; +} + +STATIC int +xfs_rtrefcountbt_get_maxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct 
xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + + return xfs_rtrefcountbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes, + level == 0); + } + + return cur->bc_mp->m_rtrefc_mxr[level != 0]; +} + +STATIC void +xfs_rtrefcountbt_init_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + key->refc.rc_startblock = rec->refc.rc_startblock; +} + +STATIC void +xfs_rtrefcountbt_init_high_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + __u32 x; + + x = be32_to_cpu(rec->refc.rc_startblock); + x += be32_to_cpu(rec->refc.rc_blockcount) - 1; + key->refc.rc_startblock = cpu_to_be32(x); +} + +STATIC void +xfs_rtrefcountbt_init_rec_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_rec *rec) +{ + const struct xfs_refcount_irec *irec = &cur->bc_rec.rc; + uint32_t start; + + start = xfs_refcount_encode_startblock(irec->rc_startblock, + irec->rc_domain); + rec->refc.rc_startblock = cpu_to_be32(start); + rec->refc.rc_blockcount = cpu_to_be32(cur->bc_rec.rc.rc_blockcount); + rec->refc.rc_refcount = cpu_to_be32(cur->bc_rec.rc.rc_refcount); +} + +STATIC void +xfs_rtrefcountbt_init_ptr_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr) +{ + ptr->l = 0; +} + +STATIC int64_t +xfs_rtrefcountbt_key_diff( + struct xfs_btree_cur *cur, + const union xfs_btree_key *key) +{ + const struct xfs_refcount_key *kp = &key->refc; + const struct xfs_refcount_irec *irec = &cur->bc_rec.rc; + uint32_t start; + + start = xfs_refcount_encode_startblock(irec->rc_startblock, + irec->rc_domain); + return (int64_t)be32_to_cpu(kp->rc_startblock) - start; +} + +STATIC int64_t +xfs_rtrefcountbt_diff_two_keys( + struct xfs_btree_cur *cur, + const union xfs_btree_key *k1, + const union xfs_btree_key *k2, + const union xfs_btree_key *mask) +{ + ASSERT(!mask || mask->refc.rc_startblock); + + return (int64_t)be32_to_cpu(k1->refc.rc_startblock) - + be32_to_cpu(k2->refc.rc_startblock); +} + static xfs_failaddr_t xfs_rtrefcountbt_verify( struct xfs_buf 
*bp) @@ -119,6 +220,40 @@ const struct xfs_buf_ops xfs_rtrefcountbt_buf_ops = { .verify_struct = xfs_rtrefcountbt_verify, }; +STATIC int +xfs_rtrefcountbt_keys_inorder( + struct xfs_btree_cur *cur, + const union xfs_btree_key *k1, + const union xfs_btree_key *k2) +{ + return be32_to_cpu(k1->refc.rc_startblock) < + be32_to_cpu(k2->refc.rc_startblock); +} + +STATIC int +xfs_rtrefcountbt_recs_inorder( + struct xfs_btree_cur *cur, + const union xfs_btree_rec *r1, + const union xfs_btree_rec *r2) +{ + return be32_to_cpu(r1->refc.rc_startblock) + + be32_to_cpu(r1->refc.rc_blockcount) <= + be32_to_cpu(r2->refc.rc_startblock); +} + +STATIC enum xbtree_key_contig +xfs_rtrefcountbt_keys_contiguous( + struct xfs_btree_cur *cur, + const union xfs_btree_key *key1, + const union xfs_btree_key *key2, + const union xfs_btree_key *mask) +{ + ASSERT(!mask || mask->refc.rc_startblock); + + return xbtree_key_contig(be32_to_cpu(key1->refc.rc_startblock), + be32_to_cpu(key2->refc.rc_startblock)); +} + const struct xfs_btree_ops xfs_rtrefcountbt_ops = { .rec_len = sizeof(struct xfs_refcount_rec), .key_len = sizeof(struct xfs_refcount_key), @@ -126,7 +261,20 @@ const struct xfs_btree_ops xfs_rtrefcountbt_ops = { XFS_BTREE_CRC_BLOCKS | XFS_BTREE_IROOT_RECORDS, .dup_cursor = xfs_rtrefcountbt_dup_cursor, + .alloc_block = xfs_btree_alloc_imeta_block, + .free_block = xfs_btree_free_imeta_block, + .get_minrecs = xfs_rtrefcountbt_get_minrecs, + .get_maxrecs = xfs_rtrefcountbt_get_maxrecs, + .init_key_from_rec = xfs_rtrefcountbt_init_key_from_rec, + .init_high_key_from_rec = xfs_rtrefcountbt_init_high_key_from_rec, + .init_rec_from_cur = xfs_rtrefcountbt_init_rec_from_cur, + .init_ptr_from_cur = xfs_rtrefcountbt_init_ptr_from_cur, + .key_diff = xfs_rtrefcountbt_key_diff, .buf_ops = &xfs_rtrefcountbt_buf_ops, + .diff_two_keys = xfs_rtrefcountbt_diff_two_keys, + .keys_inorder = xfs_rtrefcountbt_keys_inorder, + .recs_inorder = xfs_rtrefcountbt_recs_inorder, + .keys_contiguous = 
xfs_rtrefcountbt_keys_contiguous, }; /* Initialize a new rt refcount btree cursor. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
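The key and record helpers in the patch above follow a simple pattern, sketched here as standalone C. Several things are assumptions rather than facts from this thread: the sketch_* names are invented, the CoW domain is assumed to be encoded as the top bit of the 32-bit start block (which is what xfs_refcount_encode_startblock appears to do), and the big-endian conversions (be32_to_cpu / cpu_to_be32) are left out so that the arithmetic is easier to follow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: top bit of the start block marks the CoW domain. */
#define SKETCH_COWFLAG	(1U << 31)

enum sketch_domain {
	SKETCH_DOMAIN_SHARED,
	SKETCH_DOMAIN_COW,
};

/* Host-endian stand-in for the three 32-bit fields of a refcount record. */
struct sketch_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint32_t	refcount;
};

/* Fold the domain into the start block, as init_rec_from_cur does. */
static uint32_t
sketch_encode_startblock(uint32_t startblock, enum sketch_domain domain)
{
	uint32_t	start = startblock & ~SKETCH_COWFLAG;

	if (domain == SKETCH_DOMAIN_COW)
		start |= SKETCH_COWFLAG;
	return start;
}

/* High key: the last block covered by the record. */
static uint32_t
sketch_high_key(const struct sketch_rec *rec)
{
	return rec->startblock + rec->blockcount - 1;
}

/* Records are in order when r1 ends at or before the start of r2. */
static bool
sketch_recs_inorder(const struct sketch_rec *r1, const struct sketch_rec *r2)
{
	return r1->startblock + r1->blockcount <= r2->startblock;
}

/* Signed distance from the lookup target to a key, widened to 64 bits. */
static int64_t
sketch_key_diff(uint32_t key_startblock, uint32_t want)
{
	return (int64_t)key_startblock - want;
}
```

Under this assumed encoding every CoW staging record sorts after every shared record, because the flag bit dominates the comparison. Widening to int64_t before subtracting keeps the sign meaningful even when one operand has the top bit set.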
* [PATCH 04/42] xfs: define the on-disk realtime refcount btree format 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 06/42] xfs: add realtime refcount btree operations Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 10/42] xfs: add realtime refcount btree block detection to log recovery Darrick J. Wong ` (35 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Start filling out the rtrefcount btree implementation. Start with the on-disk btree format; add everything needed to read, write and manipulate refcount btree blocks. This prepares the way for connecting the btree operations implementation. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_btree.c | 6 + fs/xfs/libxfs/xfs_btree.h | 11 + fs/xfs/libxfs/xfs_format.h | 3 fs/xfs/libxfs/xfs_rtrefcount_btree.c | 311 ++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrefcount_btree.h | 71 ++++++++ fs/xfs/libxfs/xfs_sb.c | 8 + fs/xfs/libxfs/xfs_shared.h | 2 fs/xfs/xfs_mount.c | 7 + fs/xfs/xfs_mount.h | 9 + fs/xfs/xfs_ondisk.h | 1 11 files changed, 425 insertions(+), 5 deletions(-) create mode 100644 fs/xfs/libxfs/xfs_rtrefcount_btree.c create mode 100644 fs/xfs/libxfs/xfs_rtrefcount_btree.h diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 17c65dce6d26..9cc30333c089 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -47,6 +47,7 @@ xfs-y += $(addprefix libxfs/, \ xfs_rmap_btree.o \ xfs_refcount.o \ xfs_refcount_btree.o \ + xfs_rtrefcount_btree.o \ xfs_rtrmap_btree.o \ xfs_sb.o \ xfs_swapext.o \ diff --git a/fs/xfs/libxfs/xfs_btree.c b/fs/xfs/libxfs/xfs_btree.c index 377dc9b0a6e6..a789fb75e77d 100644 --- a/fs/xfs/libxfs/xfs_btree.c +++ b/fs/xfs/libxfs/xfs_btree.c @@ -37,6 +37,7 @@ #include 
"xfs_rmap.h" #include "xfs_quota.h" #include "xfs_imeta.h" +#include "xfs_rtrefcount_btree.h" /* * Btree magic numbers. @@ -1388,6 +1389,7 @@ xfs_btree_set_refs( xfs_buf_set_ref(bp, XFS_RMAP_BTREE_REF); break; case XFS_BTNUM_REFC: + case XFS_BTNUM_RTREFC: xfs_buf_set_ref(bp, XFS_REFC_BTREE_REF); break; case XFS_BTNUM_RCBAG: @@ -5548,6 +5550,9 @@ xfs_btree_init_cur_caches(void) if (error) goto err; error = xfs_rtrmapbt_init_cur_cache(); + if (error) + goto err; + error = xfs_rtrefcountbt_init_cur_cache(); if (error) goto err; @@ -5567,6 +5572,7 @@ xfs_btree_destroy_cur_caches(void) xfs_rmapbt_destroy_cur_cache(); xfs_refcountbt_destroy_cur_cache(); xfs_rtrmapbt_destroy_cur_cache(); + xfs_rtrefcountbt_destroy_cur_cache(); } /* Move the btree cursor before the first record. */ diff --git a/fs/xfs/libxfs/xfs_btree.h b/fs/xfs/libxfs/xfs_btree.h index ce5ef798c3bc..97127030aea6 100644 --- a/fs/xfs/libxfs/xfs_btree.h +++ b/fs/xfs/libxfs/xfs_btree.h @@ -226,6 +226,11 @@ union xfs_btree_irec { struct xfs_refcount_irec rc; }; +struct xbtree_refc { + unsigned int nr_ops; /* # record updates */ + unsigned int shape_changes; /* # of extent splits */ +}; + /* Per-AG btree information. 
*/ struct xfs_btree_cur_ag { struct xfs_perag *pag; @@ -234,10 +239,7 @@ struct xfs_btree_cur_ag { struct xbtree_afakeroot *afake; /* for staging cursor */ }; union { - struct { - unsigned int nr_ops; /* # record updates */ - unsigned int shape_changes; /* # of extent splits */ - } refc; + struct xbtree_refc refc; struct { bool active; /* allocation cursor state */ } abt; @@ -258,6 +260,7 @@ struct xfs_btree_cur_ino { /* For extent swap, ignore owner check in verifier */ #define XFS_BTCUR_BMBT_INVALID_OWNER (1 << 1) + struct xbtree_refc refc; }; /* In-memory btree information */ diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index c49a946e79f3..d2270f95bfbc 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1803,6 +1803,9 @@ typedef __be32 xfs_refcount_ptr_t; */ #define XFS_RTREFC_CRC_MAGIC 0x52434e54 /* 'RCNT' */ +/* inode-rooted btree pointer type */ +typedef __be64 xfs_rtrefcount_ptr_t; + /* * BMAP Btree format definitions * diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c new file mode 100644 index 000000000000..dd8e628b068b --- /dev/null +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c @@ -0,0 +1,311 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_bit.h" +#include "xfs_sb.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_inode.h" +#include "xfs_trans.h" +#include "xfs_alloc.h" +#include "xfs_btree.h" +#include "xfs_btree_staging.h" +#include "xfs_rtrefcount_btree.h" +#include "xfs_trace.h" +#include "xfs_cksum.h" +#include "xfs_error.h" +#include "xfs_extent_busy.h" +#include "xfs_rtgroup.h" +#include "xfs_rtbitmap.h" + +static struct kmem_cache *xfs_rtrefcountbt_cur_cache; + +/* + * Realtime Reference Count btree. + * + * This is a btree used to track the owner(s) of a given extent in the realtime + * device. See the comments in xfs_refcount_btree.c for more information. + * + * This tree is basically the same as the regular refcount btree except that + * it's rooted in an inode. + */ + +static struct xfs_btree_cur * +xfs_rtrefcountbt_dup_cursor( + struct xfs_btree_cur *cur) +{ + struct xfs_btree_cur *new; + + new = xfs_rtrefcountbt_init_cursor(cur->bc_mp, cur->bc_tp, + cur->bc_ino.rtg, cur->bc_ino.ip); + + /* Copy the flags values since init cursor doesn't get them. 
*/ + new->bc_ino.flags = cur->bc_ino.flags; + + return new; +} + +static xfs_failaddr_t +xfs_rtrefcountbt_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_target->bt_mount; + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + xfs_failaddr_t fa; + int level; + + if (!xfs_verify_magic(bp, block->bb_magic)) + return __this_address; + + if (!xfs_has_reflink(mp)) + return __this_address; + fa = xfs_btree_lblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN); + if (fa) + return fa; + level = be16_to_cpu(block->bb_level); + if (level > mp->m_rtrefc_maxlevels) + return __this_address; + + return xfs_btree_lblock_verify(bp, mp->m_rtrefc_mxr[level != 0]); +} + +static void +xfs_rtrefcountbt_read_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + if (!xfs_btree_lblock_verify_crc(bp)) + xfs_verifier_error(bp, -EFSBADCRC, __this_address); + else { + fa = xfs_rtrefcountbt_verify(bp); + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + } + + if (bp->b_error) + trace_xfs_btree_corrupt(bp, _RET_IP_); +} + +static void +xfs_rtrefcountbt_write_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + fa = xfs_rtrefcountbt_verify(bp); + if (fa) { + trace_xfs_btree_corrupt(bp, _RET_IP_); + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + return; + } + xfs_btree_lblock_calc_crc(bp); + +} + +const struct xfs_buf_ops xfs_rtrefcountbt_buf_ops = { + .name = "xfs_rtrefcountbt", + .magic = { 0, cpu_to_be32(XFS_RTREFC_CRC_MAGIC) }, + .verify_read = xfs_rtrefcountbt_read_verify, + .verify_write = xfs_rtrefcountbt_write_verify, + .verify_struct = xfs_rtrefcountbt_verify, +}; + +const struct xfs_btree_ops xfs_rtrefcountbt_ops = { + .rec_len = sizeof(struct xfs_refcount_rec), + .key_len = sizeof(struct xfs_refcount_key), + .geom_flags = XFS_BTREE_LONG_PTRS | XFS_BTREE_ROOT_IN_INODE | + XFS_BTREE_CRC_BLOCKS | XFS_BTREE_IROOT_RECORDS, + + .dup_cursor = xfs_rtrefcountbt_dup_cursor, + .buf_ops = &xfs_rtrefcountbt_buf_ops, +}; + +/* Initialize a new rt refcount btree cursor. 
*/ +static struct xfs_btree_cur * +xfs_rtrefcountbt_init_common( + struct xfs_mount *mp, + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct xfs_btree_cur *cur; + + ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)); + + cur = xfs_btree_alloc_cursor(mp, tp, XFS_BTNUM_RTREFC, + &xfs_rtrefcountbt_ops, mp->m_rtrefc_maxlevels, + xfs_rtrefcountbt_cur_cache); + cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_refcbt_2); + + cur->bc_ino.ip = ip; + cur->bc_ino.allocated = 0; + cur->bc_ino.flags = 0; + cur->bc_ino.refc.nr_ops = 0; + cur->bc_ino.refc.shape_changes = 0; + + cur->bc_ino.rtg = xfs_rtgroup_bump(rtg); + return cur; +} + +/* Allocate a new rt refcount btree cursor. */ +struct xfs_btree_cur * +xfs_rtrefcountbt_init_cursor( + struct xfs_mount *mp, + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct xfs_btree_cur *cur; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + + cur = xfs_rtrefcountbt_init_common(mp, tp, rtg, ip); + cur->bc_nlevels = be16_to_cpu(ifp->if_broot->bb_level) + 1; + cur->bc_ino.forksize = xfs_inode_fork_size(ip, XFS_DATA_FORK); + cur->bc_ino.whichfork = XFS_DATA_FORK; + return cur; +} + +/* Create a new rt refcount btree cursor with a fake root for staging. */ +struct xfs_btree_cur * +xfs_rtrefcountbt_stage_cursor( + struct xfs_mount *mp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip, + struct xbtree_ifakeroot *ifake) +{ + struct xfs_btree_cur *cur; + + cur = xfs_rtrefcountbt_init_common(mp, NULL, rtg, ip); + cur->bc_nlevels = ifake->if_levels; + cur->bc_ino.forksize = ifake->if_fork_size; + cur->bc_ino.whichfork = -1; + xfs_btree_stage_ifakeroot(cur, ifake, NULL); + return cur; +} + +/* + * Install a new rt refcount btree root. Caller is responsible for + * invalidating and freeing the old btree blocks. 
+ */ +void +xfs_rtrefcountbt_commit_staged_btree( + struct xfs_btree_cur *cur, + struct xfs_trans *tp) +{ + struct xbtree_ifakeroot *ifake = cur->bc_ino.ifake; + struct xfs_ifork *ifp; + int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; + + ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + + /* + * Free any resources hanging off the real fork, then shallow-copy the + * staging fork's contents into the real fork to transfer everything + * we just built. + */ + ifp = xfs_ifork_ptr(cur->bc_ino.ip, XFS_DATA_FORK); + xfs_idestroy_fork(ifp); + memcpy(ifp, ifake->if_fork, sizeof(struct xfs_ifork)); + + xfs_trans_log_inode(tp, cur->bc_ino.ip, flags); + xfs_btree_commit_ifakeroot(cur, tp, XFS_DATA_FORK, + &xfs_rtrefcountbt_ops); +} + +/* Calculate number of records in a realtime refcount btree block. */ +static inline unsigned int +xfs_rtrefcountbt_block_maxrecs( + unsigned int blocklen, + bool leaf) +{ + + if (leaf) + return blocklen / sizeof(struct xfs_refcount_rec); + return blocklen / (sizeof(struct xfs_refcount_key) + + sizeof(xfs_rtrefcount_ptr_t)); +} + +/* + * Calculate number of records in an rt refcount btree block. + */ +unsigned int +xfs_rtrefcountbt_maxrecs( + struct xfs_mount *mp, + unsigned int blocklen, + bool leaf) +{ + blocklen -= XFS_RTREFCOUNT_BLOCK_LEN; + return xfs_rtrefcountbt_block_maxrecs(blocklen, leaf); +} + +/* Compute the max possible height for realtime refcount btrees. */ +unsigned int +xfs_rtrefcountbt_maxlevels_ondisk(void) +{ + unsigned int minrecs[2]; + unsigned int blocklen; + + blocklen = XFS_MIN_CRC_BLOCKSIZE - XFS_BTREE_LBLOCK_CRC_LEN; + + minrecs[0] = xfs_rtrefcountbt_block_maxrecs(blocklen, true) / 2; + minrecs[1] = xfs_rtrefcountbt_block_maxrecs(blocklen, false) / 2; + + /* We need at most one record for every block in an rt group. 
*/ + return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_RGBLOCKS); +} + +int __init +xfs_rtrefcountbt_init_cur_cache(void) +{ + xfs_rtrefcountbt_cur_cache = kmem_cache_create("xfs_rtrefcountbt_cur", + xfs_btree_cur_sizeof( + xfs_rtrefcountbt_maxlevels_ondisk()), + 0, 0, NULL); + + if (!xfs_rtrefcountbt_cur_cache) + return -ENOMEM; + return 0; +} + +void +xfs_rtrefcountbt_destroy_cur_cache(void) +{ + kmem_cache_destroy(xfs_rtrefcountbt_cur_cache); + xfs_rtrefcountbt_cur_cache = NULL; +} + +/* Compute the maximum height of a realtime refcount btree. */ +void +xfs_rtrefcountbt_compute_maxlevels( + struct xfs_mount *mp) +{ + unsigned int d_maxlevels, r_maxlevels; + + if (!xfs_has_rtreflink(mp)) { + mp->m_rtrefc_maxlevels = 0; + return; + } + + /* + * The realtime refcountbt lives on the data device, which means that + * its maximum height is constrained by the size of the data device and + * the height required to store one refcount record for each rtextent + * in an rt group. + */ + d_maxlevels = xfs_btree_space_to_height(mp->m_rtrefc_mnr, + mp->m_sb.sb_dblocks); + r_maxlevels = xfs_btree_compute_maxlevels(mp->m_rtrefc_mnr, + xfs_rtb_to_rtxt(mp, mp->m_sb.sb_rgblocks)); + + /* Add one level to handle the inode root level. */ + mp->m_rtrefc_maxlevels = min(d_maxlevels, r_maxlevels) + 1; +} diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.h b/fs/xfs/libxfs/xfs_rtrefcount_btree.h new file mode 100644 index 000000000000..d10ebdcf7727 --- /dev/null +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.h @@ -0,0 +1,71 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +#ifndef __XFS_RTREFCOUNT_BTREE_H__ +#define __XFS_RTREFCOUNT_BTREE_H__ + +struct xfs_buf; +struct xfs_btree_cur; +struct xfs_mount; +struct xbtree_ifakeroot; +struct xfs_rtgroup; + +/* refcounts only exist on crc enabled filesystems */ +#define XFS_RTREFCOUNT_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN + +struct xfs_btree_cur *xfs_rtrefcountbt_init_cursor(struct xfs_mount *mp, + struct xfs_trans *tp, struct xfs_rtgroup *rtg, + struct xfs_inode *ip); +struct xfs_btree_cur *xfs_rtrefcountbt_stage_cursor(struct xfs_mount *mp, + struct xfs_rtgroup *rtg, struct xfs_inode *ip, + struct xbtree_ifakeroot *ifake); +void xfs_rtrefcountbt_commit_staged_btree(struct xfs_btree_cur *cur, + struct xfs_trans *tp); +unsigned int xfs_rtrefcountbt_maxrecs(struct xfs_mount *mp, + unsigned int blocklen, bool leaf); +void xfs_rtrefcountbt_compute_maxlevels(struct xfs_mount *mp); + +/* + * Addresses of records, keys, and pointers within an incore rtrefcountbt block. + * + * (note that some of these may appear unused, but they are used in userspace) + */ +static inline struct xfs_refcount_rec * +xfs_rtrefcount_rec_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_refcount_rec *) + ((char *)block + XFS_RTREFCOUNT_BLOCK_LEN + + (index - 1) * sizeof(struct xfs_refcount_rec)); +} + +static inline struct xfs_refcount_key * +xfs_rtrefcount_key_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_refcount_key *) + ((char *)block + XFS_RTREFCOUNT_BLOCK_LEN + + (index - 1) * sizeof(struct xfs_refcount_key)); +} + +static inline xfs_rtrefcount_ptr_t * +xfs_rtrefcount_ptr_addr( + struct xfs_btree_block *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrefcount_ptr_t *) + ((char *)block + XFS_RTREFCOUNT_BLOCK_LEN + + maxrecs * sizeof(struct xfs_refcount_key) + + (index - 1) * sizeof(xfs_rtrefcount_ptr_t)); +} + +unsigned int xfs_rtrefcountbt_maxlevels_ondisk(void); +int __init 
xfs_rtrefcountbt_init_cur_cache(void); +void xfs_rtrefcountbt_destroy_cur_cache(void); + +#endif /* __XFS_RTREFCOUNT_BTREE_H__ */ diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c index 570919c223c9..c002cf661912 100644 --- a/fs/xfs/libxfs/xfs_sb.c +++ b/fs/xfs/libxfs/xfs_sb.c @@ -28,6 +28,7 @@ #include "xfs_swapext.h" #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" /* * Physical superblock buffer manipulations. Shared with libxfs in userspace. @@ -1075,6 +1076,13 @@ xfs_sb_mount_common( mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2; mp->m_refc_mnr[1] = mp->m_refc_mxr[1] / 2; + mp->m_rtrefc_mxr[0] = xfs_rtrefcountbt_maxrecs(mp, sbp->sb_blocksize, + true); + mp->m_rtrefc_mxr[1] = xfs_rtrefcountbt_maxrecs(mp, sbp->sb_blocksize, + false); + mp->m_rtrefc_mnr[0] = mp->m_rtrefc_mxr[0] / 2; + mp->m_rtrefc_mnr[1] = mp->m_rtrefc_mxr[1] / 2; + mp->m_bsize = XFS_FSB_TO_BB(mp, 1); mp->m_alloc_set_aside = xfs_alloc_set_aside(mp); mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp); diff --git a/fs/xfs/libxfs/xfs_shared.h b/fs/xfs/libxfs/xfs_shared.h index 31c577a94295..a1bfc98c47a3 100644 --- a/fs/xfs/libxfs/xfs_shared.h +++ b/fs/xfs/libxfs/xfs_shared.h @@ -42,6 +42,7 @@ extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops; extern const struct xfs_buf_ops xfs_rtsummary_buf_ops; extern const struct xfs_buf_ops xfs_rtbuf_ops; extern const struct xfs_buf_ops xfs_rtsb_buf_ops; +extern const struct xfs_buf_ops xfs_rtrefcountbt_buf_ops; extern const struct xfs_buf_ops xfs_rtrmapbt_buf_ops; extern const struct xfs_buf_ops xfs_sb_buf_ops; extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops; @@ -56,6 +57,7 @@ extern const struct xfs_btree_ops xfs_bmbt_ops; extern const struct xfs_btree_ops xfs_refcountbt_ops; extern const struct xfs_btree_ops xfs_rmapbt_ops; extern const struct xfs_btree_ops xfs_rtrmapbt_ops; +extern const struct xfs_btree_ops xfs_rtrefcountbt_ops; /* log size calculation functions */ int xfs_log_calc_unit_res(struct 
xfs_mount *mp, int unit_bytes); diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 2e64f18deabf..f3ef385f9aaf 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -37,6 +37,7 @@ #include "xfs_imeta.h" #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" static DEFINE_MUTEX(xfs_uuid_table_mutex); static int xfs_uuid_table_size; @@ -655,7 +656,10 @@ static inline void xfs_rtbtree_compute_maxlevels( struct xfs_mount *mp) { - mp->m_rtbtree_maxlevels = mp->m_rtrmap_maxlevels; + unsigned int levels; + + levels = max(mp->m_rtrmap_maxlevels, mp->m_rtrefc_maxlevels); + mp->m_rtbtree_maxlevels = levels; } /* @@ -729,6 +733,7 @@ xfs_mountfs( xfs_rmapbt_compute_maxlevels(mp); xfs_rtrmapbt_compute_maxlevels(mp); xfs_refcountbt_compute_maxlevels(mp); + xfs_rtrefcountbt_compute_maxlevels(mp); xfs_agbtree_compute_maxlevels(mp); xfs_rtbtree_compute_maxlevels(mp); diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h index b1ffab4cb9cd..487567d1839b 100644 --- a/fs/xfs/xfs_mount.h +++ b/fs/xfs/xfs_mount.h @@ -136,11 +136,14 @@ typedef struct xfs_mount { uint m_rtrmap_mnr[2]; /* min rtrmap btree records */ uint m_refc_mxr[2]; /* max refc btree records */ uint m_refc_mnr[2]; /* min refc btree records */ + uint m_rtrefc_mxr[2]; /* max rtrefc btree records */ + uint m_rtrefc_mnr[2]; /* min rtrefc btree records */ uint m_alloc_maxlevels; /* max alloc btree levels */ uint m_bm_maxlevels[2]; /* max bmap btree levels */ uint m_rmap_maxlevels; /* max rmap btree levels */ uint m_rtrmap_maxlevels; /* max rtrmap btree level */ uint m_refc_maxlevels; /* max refcount btree level */ + uint m_rtrefc_maxlevels; /* max rtrefc btree level */ unsigned int m_agbtree_maxlevels; /* max level of all AG btrees */ unsigned int m_rtbtree_maxlevels; /* max level of all rt btrees */ xfs_extlen_t m_ag_prealloc_blocks; /* reserved ag blocks */ @@ -369,6 +372,12 @@ static inline bool xfs_has_rtrmapbt(struct xfs_mount *mp) xfs_has_rmapbt(mp); } +static inline bool 
xfs_has_rtreflink(struct xfs_mount *mp) +{ + return xfs_has_metadir(mp) && xfs_has_realtime(mp) && + xfs_has_reflink(mp); +} + /* * Mount features * diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h index f24a08dd63e9..94bbb6351d3d 100644 --- a/fs/xfs/xfs_ondisk.h +++ b/fs/xfs/xfs_ondisk.h @@ -79,6 +79,7 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(struct xfs_rtbuf_blkinfo, 48); XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t, 8); XFS_CHECK_STRUCT_SIZE(struct xfs_rtrmap_root, 4); + XFS_CHECK_STRUCT_SIZE(xfs_rtrefcount_ptr_t, 8); /* * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 10/42] xfs: add realtime refcount btree block detection to log recovery 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 04/42] xfs: define the on-disk realtime refcount btree format Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 11/42] xfs: add realtime refcount btree inode to metadata directory Darrick J. Wong ` (34 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Identify rt refcount btree blocks in the log correctly so that we can validate them during log recovery. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_buf_item_recover.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/fs/xfs/xfs_buf_item_recover.c b/fs/xfs/xfs_buf_item_recover.c index 496260c9d8cd..5368a0d34452 100644 --- a/fs/xfs/xfs_buf_item_recover.c +++ b/fs/xfs/xfs_buf_item_recover.c @@ -268,6 +268,9 @@ xlog_recover_validate_buf_type( case XFS_REFC_CRC_MAGIC: bp->b_ops = &xfs_refcountbt_buf_ops; break; + case XFS_RTREFC_CRC_MAGIC: + bp->b_ops = &xfs_rtrefcountbt_buf_ops; + break; default: warnmsg = "Bad btree block magic!"; break; @@ -772,6 +775,7 @@ xlog_recover_get_buf_lsn( break; } case XFS_RTRMAP_CRC_MAGIC: + case XFS_RTREFC_CRC_MAGIC: case XFS_BMAP_CRC_MAGIC: case XFS_BMAP_MAGIC: { struct xfs_btree_block *btb = blk; ^ permalink raw reply related [flat|nested] 565+ messages in thread
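For readers following along, the magic-number dispatch that patch 10/42 extends can be modelled in userspace roughly as below. This is only a sketch: the `demo_*` names and the two magic values are made up for illustration (the real constants are defined elsewhere in the series), and the ops structure is reduced to a name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic numbers, NOT the real on-disk values. */
#define DEMO_REFC_CRC_MAGIC	0x11111111u
#define DEMO_RTREFC_CRC_MAGIC	0x22222222u

/* Stand-in for struct xfs_buf_ops; only a name is needed here. */
struct demo_buf_ops {
	const char *name;
};

/* Stand-in for struct xfs_buf; only the attached ops matter here. */
struct demo_buf {
	const struct demo_buf_ops *ops;
};

static const struct demo_buf_ops demo_refcountbt_buf_ops = { "refcountbt" };
static const struct demo_buf_ops demo_rtrefcountbt_buf_ops = { "rtrefcountbt" };

/*
 * Attach verifier ops according to the block magic.  Returns NULL on
 * success or a warning string if the magic is not recognised, in which
 * case bp->ops is left untouched.
 */
const char *
demo_validate_buf_type(struct demo_buf *bp, uint32_t magic)
{
	const char *warnmsg = NULL;

	assert(bp != NULL);

	switch (magic) {
	case DEMO_REFC_CRC_MAGIC:
		bp->ops = &demo_refcountbt_buf_ops;
		break;
	case DEMO_RTREFC_CRC_MAGIC:
		/* the new case: rt refcount btree blocks get their own ops */
		bp->ops = &demo_rtrefcountbt_buf_ops;
		break;
	default:
		warnmsg = "Bad btree block magic!";
		break;
	}
	return warnmsg;
}
```

Without the new case, a logged rt refcount btree block would fall through to the default branch and be replayed with no verifier attached, which is presumably what the patch is meant to avoid.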
* [PATCH 11/42] xfs: add realtime refcount btree inode to metadata directory 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 10/42] xfs: add realtime refcount btree block detection to log recovery Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 09/42] xfs: support recovering refcount intent items targetting realtime extents Darrick J. Wong ` (33 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a metadir path to select the realtime refcount btree inode and load it at mount time. The rtrefcountbt inode will have a unique extent format code, which means that we also have to update the inode validation and flush routines to look for it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap.c | 8 +++- fs/xfs/libxfs/xfs_format.h | 4 ++ fs/xfs/libxfs/xfs_inode_buf.c | 6 +++ fs/xfs/libxfs/xfs_inode_fork.c | 9 +++++ fs/xfs/libxfs/xfs_rtgroup.h | 3 ++ fs/xfs/libxfs/xfs_rtrefcount_btree.c | 33 ++++++++++++++++++ fs/xfs/libxfs/xfs_rtrefcount_btree.h | 4 ++ fs/xfs/xfs_inode.c | 13 +++++++ fs/xfs/xfs_inode_item.c | 2 + fs/xfs/xfs_inode_item_recover.c | 1 + fs/xfs/xfs_rtalloc.c | 63 ++++++++++++++++++++++++++++++++++ fs/xfs/xfs_trace.h | 1 + 12 files changed, 144 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index b46504d861e3..fe31f3cb5d91 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -5148,9 +5148,13 @@ xfs_bmap_del_extent_real( * the same order of operations as the data device, which is: * Remove the file mapping, remove the reverse mapping, and * then free the blocks. This means that we must delay the - * freeing until after we've scheduled the rmap update. 
+ * freeing until after we've scheduled the rmap update. If + * realtime reflink is enabled, use deferred refcount intent + * items to decide what to do with the extent, just like we do + * for the data device. */ - if (want_free && !xfs_has_rtrmapbt(mp)) { + if (want_free && !xfs_has_rtrmapbt(mp) && + !xfs_has_rtreflink(mp)) { error = xfs_rtfree_blocks(tp, del->br_startblock, del->br_blockcount); if (error) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index d2270f95bfbc..20af5b730d6d 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1011,6 +1011,7 @@ enum xfs_dinode_fmt { XFS_DINODE_FMT_BTREE, /* struct xfs_bmdr_block */ XFS_DINODE_FMT_UUID, /* added long ago, but never used */ XFS_DINODE_FMT_RMAP, /* reverse mapping btree */ + XFS_DINODE_FMT_REFCOUNT, /* reference count btree */ }; #define XFS_INODE_FORMAT_STR \ @@ -1019,7 +1020,8 @@ enum xfs_dinode_fmt { { XFS_DINODE_FMT_EXTENTS, "extent" }, \ { XFS_DINODE_FMT_BTREE, "btree" }, \ { XFS_DINODE_FMT_UUID, "uuid" }, \ - { XFS_DINODE_FMT_RMAP, "rmap" } + { XFS_DINODE_FMT_RMAP, "rmap" }, \ + { XFS_DINODE_FMT_REFCOUNT, "refcount" } /* * Max values for extnum and aextnum. 
diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 9ac84be391b3..dcf816f2643b 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -414,6 +414,12 @@ xfs_dinode_verify_fork( if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))) return __this_address; break; + case XFS_DINODE_FMT_REFCOUNT: + if (!xfs_has_rtreflink(mp)) + return __this_address; + if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))) + return __this_address; + break; default: return __this_address; } diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index 61926c07aad3..e69ec68b5a9d 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -266,6 +266,11 @@ xfs_iformat_data_fork( return -EFSCORRUPTED; } return xfs_iformat_rtrmap(ip, dip); + case XFS_DINODE_FMT_REFCOUNT: + if (!xfs_has_rtreflink(ip->i_mount)) + return -EFSCORRUPTED; + ASSERT(0); /* to be implemented later */ + return -EFSCORRUPTED; default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, sizeof(*dip), __this_address); @@ -652,6 +657,10 @@ xfs_iflush_fork( xfs_iflush_rtrmap(ip, dip); break; + case XFS_DINODE_FMT_REFCOUNT: + ASSERT(0); /* to be implemented later */ + break; + default: ASSERT(0); break; diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index 4e9b9098f2f2..0f400f133d88 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -23,6 +23,9 @@ struct xfs_rtgroup { /* reverse mapping btree inode */ struct xfs_inode *rtg_rmapip; + /* refcount btree inode */ + struct xfs_inode *rtg_refcountip; + /* Number of blocks in this group */ xfs_rgblock_t rtg_blockcount; diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c index bdefc4f5939d..40524fee3860 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.c +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c @@ -26,6 +26,7 @@ #include "xfs_extent_busy.h" #include "xfs_rtgroup.h" #include 
"xfs_rtbitmap.h" +#include "xfs_imeta.h" static struct kmem_cache *xfs_rtrefcountbt_cur_cache; @@ -354,6 +355,7 @@ xfs_rtrefcountbt_commit_staged_btree( int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + ASSERT(ifake->if_fork->if_format == XFS_DINODE_FMT_REFCOUNT); /* * Free any resources hanging off the real fork, then shallow-copy the @@ -457,3 +459,34 @@ xfs_rtrefcountbt_compute_maxlevels( /* Add one level to handle the inode root level. */ mp->m_rtrefc_maxlevels = min(d_maxlevels, r_maxlevels) + 1; } + +#define XFS_RTREFC_NAMELEN 21 + +/* Create the metadata directory path for an rtrefcount btree inode. */ +int +xfs_rtrefcountbt_create_path( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + struct xfs_imeta_path **pathp) +{ + struct xfs_imeta_path *path; + char *fname; + int error; + + error = xfs_imeta_create_file_path(mp, 2, &path); + if (error) + return error; + + fname = kmalloc(XFS_RTREFC_NAMELEN, GFP_KERNEL); + if (!fname) { + xfs_imeta_free_path(path); + return -ENOMEM; + } + + snprintf(fname, XFS_RTREFC_NAMELEN, "%u.refcount", rgno); + path->im_path[0] = "realtime"; + path->im_path[1] = fname; + path->im_dynamicmask = 0x2; + *pathp = path; + return 0; +} diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.h b/fs/xfs/libxfs/xfs_rtrefcount_btree.h index d10ebdcf7727..1f3f590c68e6 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.h +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.h @@ -11,6 +11,7 @@ struct xfs_btree_cur; struct xfs_mount; struct xbtree_ifakeroot; struct xfs_rtgroup; +struct xfs_imeta_path; /* refcounts only exist on crc enabled filesystems */ #define XFS_RTREFCOUNT_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN @@ -68,4 +69,7 @@ unsigned int xfs_rtrefcountbt_maxlevels_ondisk(void); int __init xfs_rtrefcountbt_init_cur_cache(void); void xfs_rtrefcountbt_destroy_cur_cache(void); +int xfs_rtrefcountbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, + struct xfs_imeta_path **pathp); + #endif /* __XFS_RTREFCOUNT_BTREE_H__ */ 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c index 3b0c04b6bcdf..d50cbd0eb260 100644 --- a/fs/xfs/xfs_inode.c +++ b/fs/xfs/xfs_inode.c @@ -2580,6 +2580,14 @@ xfs_iflush( __func__, ip->i_ino, ip); goto flush_out; } + } else if (ip->i_df.if_format == XFS_DINODE_FMT_REFCOUNT) { + if (!S_ISREG(VFS_I(ip)->i_mode) || + !(ip->i_diflags2 & XFS_DIFLAG2_METADATA)) { + xfs_alert_tag(mp, XFS_PTAG_IFLUSH, + "%s: Bad rt refcountbt inode %Lu, ptr "PTR_FMT, + __func__, ip->i_ino, ip); + goto flush_out; + } } else if (S_ISREG(VFS_I(ip)->i_mode)) { if (XFS_TEST_ERROR( ip->i_df.if_format != XFS_DINODE_FMT_EXTENTS && @@ -2626,6 +2634,11 @@ xfs_iflush( "%s: rt rmapbt in inode %Lu attr fork, ptr "PTR_FMT, __func__, ip->i_ino, ip); goto flush_out; + } else if (ip->i_af.if_format == XFS_DINODE_FMT_REFCOUNT) { + xfs_alert_tag(mp, XFS_PTAG_IFLUSH, + "%s: rt refcountbt in inode %Lu attr fork, ptr "PTR_FMT, + __func__, ip->i_ino, ip); + goto flush_out; } } diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c index b6e374744474..7cbc79e3997a 100644 --- a/fs/xfs/xfs_inode_item.c +++ b/fs/xfs/xfs_inode_item.c @@ -63,6 +63,7 @@ xfs_inode_item_data_fork_size( break; case XFS_DINODE_FMT_BTREE: case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_REFCOUNT: if ((iip->ili_fields & XFS_ILOG_DBROOT) && ip->i_df.if_broot_bytes > 0) { *nbytes += ip->i_df.if_broot_bytes; @@ -184,6 +185,7 @@ xfs_inode_item_format_data_fork( break; case XFS_DINODE_FMT_BTREE: case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_REFCOUNT: iip->ili_fields &= ~(XFS_ILOG_DDATA | XFS_ILOG_DEXT | XFS_ILOG_DEV); diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index 4f1ed1f6a34d..feeba1dff01e 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -417,6 +417,7 @@ xlog_recover_inode_commit_pass2( if (unlikely(S_ISREG(ldip->di_mode))) { if ((ldip->di_format != XFS_DINODE_FMT_EXTENTS) && (ldip->di_format != XFS_DINODE_FMT_RMAP) && + (ldip->di_format != 
XFS_DINODE_FMT_REFCOUNT) && (ldip->di_format != XFS_DINODE_FMT_BTREE)) { XFS_CORRUPTION_ERROR( "Bad log dinode data fork format for regular file", diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 0f31680284fb..c998e26f5db9 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -31,6 +31,7 @@ #include "xfs_btree.h" #include "xfs_rmap.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" /* * Realtime metadata files are not quite regular files because userspace can't @@ -42,6 +43,7 @@ static struct lock_class_key xfs_rbmip_key; static struct lock_class_key xfs_rsumip_key; static struct lock_class_key xfs_rrmapip_key; +static struct lock_class_key xfs_rrefcountip_key; /* * Read and return the summary information for a given extent size, @@ -1855,6 +1857,47 @@ xfs_rtmount_iread_extents( return error; } +/* Load realtime refcount btree inode. */ +STATIC int +xfs_rtmount_refcountbt( + struct xfs_rtgroup *rtg) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_imeta_path *path; + struct xfs_inode *ip; + xfs_ino_t ino; + int error; + + if (!xfs_has_rtreflink(mp)) + return 0; + + error = xfs_rtrefcountbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + return error; + + error = xfs_imeta_lookup(mp, path, &ino); + if (error) + goto out_path; + + error = xfs_rt_iget(mp, ino, &xfs_rrefcountip_key, &ip); + if (error) + goto out_path; + + if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_REFCOUNT)) { + error = -EFSCORRUPTED; + goto out_rele; + } + + rtg->rtg_refcountip = ip; + ip = NULL; +out_rele: + if (ip) + xfs_imeta_irele(ip); +out_path: + xfs_imeta_free_path(path); + return error; +} + /* * Get the bitmap and summary inodes and the summary cache into the mount * structure at mount time. 
@@ -1902,6 +1945,10 @@ xfs_rtmount_inodes( xfs_rtgroup_put(rtg); goto out_rele_rtgroup; } + + error = xfs_rtmount_refcountbt(rtg); + if (error) + goto out_rele_rtgroup; } xfs_alloc_rsum_cache(mp, sbp->sb_rbmblocks); @@ -1909,6 +1956,10 @@ xfs_rtmount_inodes( out_rele_rtgroup: for_each_rtgroup(mp, rgno, rtg) { + if (rtg->rtg_refcountip) + xfs_imeta_irele(rtg->rtg_refcountip); + rtg->rtg_refcountip = NULL; + if (rtg->rtg_rmapip) xfs_imeta_irele(rtg->rtg_rmapip); rtg->rtg_rmapip = NULL; @@ -1945,6 +1996,14 @@ xfs_rtmount_dqattach( return error; } } + + if (rtg->rtg_refcountip) { + error = xfs_qm_dqattach(rtg->rtg_refcountip); + if (error) { + xfs_rtgroup_put(rtg); + return error; + } + } } return 0; @@ -1960,6 +2019,10 @@ xfs_rtunmount_inodes( kmem_free(mp->m_rsum_cache); for_each_rtgroup(mp, rgno, rtg) { + if (rtg->rtg_refcountip) + xfs_imeta_irele(rtg->rtg_refcountip); + rtg->rtg_refcountip = NULL; + if (rtg->rtg_rmapip) xfs_imeta_irele(rtg->rtg_rmapip); rtg->rtg_rmapip = NULL; diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index 1f8ab7c436a9..d07947451ec9 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -2239,6 +2239,7 @@ TRACE_DEFINE_ENUM(XFS_DINODE_FMT_EXTENTS); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_BTREE); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_UUID); TRACE_DEFINE_ENUM(XFS_DINODE_FMT_RMAP); +TRACE_DEFINE_ENUM(XFS_DINODE_FMT_REFCOUNT); DECLARE_EVENT_CLASS(xfs_swap_extent_class, TP_PROTO(struct xfs_inode *ip, int which), ^ permalink raw reply related [flat|nested] 565+ messages in thread
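A note on the name buffer in xfs_rtrefcountbt_create_path() from patch 11/42: the file name is formatted as "%u.refcount" into XFS_RTREFC_NAMELEN (21) bytes. The userspace sketch below, with hypothetical `demo_*` names, shows why that is enough, assuming the group number is a 32-bit unsigned value: ten digits at most, nine characters for ".refcount", and one NUL byte make 20.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same size as XFS_RTREFC_NAMELEN in the patch. */
#define DEMO_RTREFC_NAMELEN	21

/*
 * Format "<rgno>.refcount" into buf.  Returns the length of the name
 * (excluding the NUL), or -1 if the name did not fit in buflen bytes.
 */
int
demo_rtrefcount_name(char *buf, size_t buflen, uint32_t rgno)
{
	int n;

	assert(buf != NULL);

	n = snprintf(buf, buflen, "%u.refcount", (unsigned int)rgno);
	if (n < 0 || (size_t)n >= buflen)
		return -1;	/* output error or truncation */
	return n;
}
```

Calling this with the largest 32-bit group number yields a 19-character name, so the 21-byte allocation in the patch leaves one byte to spare.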
* [PATCH 09/42] xfs: support recovering refcount intent items targetting realtime extents 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 11/42] xfs: add realtime refcount btree inode to metadata directory Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 12/42] xfs: add metadata reservations for realtime refcount btree Darrick J. Wong ` (32 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Now that we have reflink on the realtime device, refcount intent items have to support remapping extents on the realtime volume. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_refcount_item.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index 7a366b316e79..fc6dbbb17ad7 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -482,6 +482,9 @@ xfs_cui_validate_phys( return false; } + if (pmap->pe_flags & XFS_REFCOUNT_EXTENT_REALTIME) + return xfs_verify_rtbext(mp, pmap->pe_startblock, pmap->pe_len); + return xfs_verify_fsbext(mp, pmap->pe_startblock, pmap->pe_len); } ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 12/42] xfs: add metadata reservations for realtime refcount btree 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 09/42] xfs: support recovering refcount intent items targetting realtime extents Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 07/42] xfs: prepare refcount functions to deal with rtrefcountbt Darrick J. Wong ` (31 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Reserve some free blocks so that we will always have enough free blocks in the data volume to handle expansion of the realtime refcount btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtrefcount_btree.c | 39 ++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrefcount_btree.h | 2 ++ fs/xfs/xfs_rtalloc.c | 9 +++++++- 3 files changed, 49 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c index 40524fee3860..74c5cf9a0d3a 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.c +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c @@ -490,3 +490,42 @@ xfs_rtrefcountbt_create_path( *pathp = path; return 0; } + +/* Calculate the rtrefcount btree size for some records. */ +static unsigned long long +xfs_rtrefcountbt_calc_size( + struct xfs_mount *mp, + unsigned long long len) +{ + return xfs_btree_calc_size(mp->m_rtrefc_mnr, len); +} + +/* + * Calculate the maximum refcount btree size. + */ +static unsigned long long +xfs_rtrefcountbt_max_size( + struct xfs_mount *mp, + xfs_rtblock_t rtblocks) +{ + /* Bail out if we're uninitialized, which can happen in mkfs. 
*/ + if (mp->m_rtrefc_mxr[0] == 0) + return 0; + + return xfs_rtrefcountbt_calc_size(mp, rtblocks); +} + +/* + * Figure out how many blocks to reserve and how many are used by this btree. + * We need enough space to hold one record for every rt extent in the rtgroup. + */ +xfs_filblks_t +xfs_rtrefcountbt_calc_reserves( + struct xfs_mount *mp) +{ + if (!xfs_has_rtreflink(mp)) + return 0; + + return xfs_rtrefcountbt_max_size(mp, + xfs_rtb_to_rtxt(mp, mp->m_sb.sb_rgblocks)); +} diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.h b/fs/xfs/libxfs/xfs_rtrefcount_btree.h index 1f3f590c68e6..ffda0b063bcf 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.h +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.h @@ -72,4 +72,6 @@ void xfs_rtrefcountbt_destroy_cur_cache(void); int xfs_rtrefcountbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, struct xfs_imeta_path **pathp); +xfs_filblks_t xfs_rtrefcountbt_calc_reserves(struct xfs_mount *mp); + #endif /* __XFS_RTREFCOUNT_BTREE_H__ */ diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index c998e26f5db9..48c7cc28b7f2 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1733,8 +1733,10 @@ xfs_rt_resv_free( struct xfs_rtgroup *rtg; xfs_rgnumber_t rgno; - for_each_rtgroup(mp, rgno, rtg) + for_each_rtgroup(mp, rgno, rtg) { + xfs_imeta_resv_free_inode(rtg->rtg_refcountip); xfs_imeta_resv_free_inode(rtg->rtg_rmapip); + } } /* Reserve space for rt metadata inodes' space expansion. */ @@ -1754,6 +1756,11 @@ xfs_rt_resv_init( err2 = xfs_imeta_resv_init_inode(rtg->rtg_rmapip, ask); if (err2 && !error) error = err2; + + ask = xfs_rtrefcountbt_calc_reserves(mp); + err2 = xfs_imeta_resv_init_inode(rtg->rtg_refcountip, ask); + if (err2 && !error) + error = err2; } return error; ^ permalink raw reply related [flat|nested] 565+ messages in thread
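The reservation in patch 12/42 comes down to asking how many blocks a btree needs in order to hold one record per rt extent, given the minimum records per block. The sketch below is a simplified model of that kind of calculation, not the kernel's xfs_btree_calc_size(); the `demo_*` names are hypothetical and the limits in the usage note are invented.

```c
#include <assert.h>

/* ceil(a / b) for b > 0; overflow of a + b - 1 is ignored in this sketch */
static unsigned long long
demo_howmany(unsigned long long a, unsigned int b)
{
	return (a + b - 1) / b;
}

/*
 * Worst-case block count for a btree holding the given number of records.
 * limits[0] is the minimum number of records per leaf block, limits[1] the
 * minimum number of pointers per node block.  Using the minimums assumes
 * every block is as empty as the btree code permits.
 */
unsigned long long
demo_btree_calc_size(const unsigned int *limits, unsigned long long records)
{
	unsigned long long level_blocks;
	unsigned long long blocks;

	/* limits[1] must exceed 1 or the loop below would never finish */
	assert(limits[0] > 0 && limits[1] > 1);

	/* leaf level */
	level_blocks = demo_howmany(records, limits[0]);
	blocks = level_blocks;

	/* add node levels until a single block covers the level below */
	while (level_blocks > 1) {
		level_blocks = demo_howmany(level_blocks, limits[1]);
		blocks += level_blocks;
	}
	return blocks;
}
```

With limits of {10, 5} and 1000 records this gives 100 leaf blocks plus 20, 4 and 1 node blocks, 125 in total. In the patch the record count passed in is the number of rt extents in one rtgroup.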
* [PATCH 07/42] xfs: prepare refcount functions to deal with rtrefcountbt 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (9 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 12/42] xfs: add metadata reservations for realtime refcount btree Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 08/42] xfs: add a realtime flag to the refcount update log redo items Darrick J. Wong ` (30 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Prepare the high-level refcount functions to deal with the new realtime refcountbt and its slightly different conventions. Provide the ability to talk to either refcountbt or rtrefcountbt formats from the same high level code. Note that we leave the _recover_cow_leftovers functions for a separate patch so that we can convert it all at once. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_refcount.c | 79 ++++++++++++++++++++++++++++++++++-------- 1 file changed, 64 insertions(+), 15 deletions(-) diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index e1f55edceccf..a54a633f2ef9 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -24,6 +24,7 @@ #include "xfs_rmap.h" #include "xfs_ag.h" #include "xfs_health.h" +#include "xfs_rtgroup.h" struct kmem_cache *xfs_refcount_intent_cache; @@ -40,6 +41,16 @@ STATIC int __xfs_refcount_cow_alloc(struct xfs_btree_cur *rcur, STATIC int __xfs_refcount_cow_free(struct xfs_btree_cur *rcur, xfs_agblock_t agbno, xfs_extlen_t aglen); +/* Return the maximum startblock number of the refcountbt. 
*/ +static inline xfs_agblock_t +xrefc_max_startblock( + struct xfs_btree_cur *cur) +{ + if (cur->bc_btnum == XFS_BTNUM_RTREFC) + return cur->bc_mp->m_sb.sb_rgblocks; + return cur->bc_mp->m_sb.sb_agblocks; +} + /* * Look up the first record less than or equal to [bno, len] in the btree * given by cur. @@ -142,12 +153,35 @@ xfs_refcount_check_perag_irec( return NULL; } +static inline xfs_failaddr_t +xfs_refcount_check_rtgroup_irec( + struct xfs_rtgroup *rtg, + const struct xfs_refcount_irec *irec) +{ + if (irec->rc_blockcount == 0 || irec->rc_blockcount > XFS_REFC_LEN_MAX) + return __this_address; + + if (!xfs_refcount_check_domain(irec)) + return __this_address; + + /* check for valid extent range, including overflow */ + if (!xfs_verify_rgbext(rtg, irec->rc_startblock, irec->rc_blockcount)) + return __this_address; + + if (irec->rc_refcount == 0 || irec->rc_refcount > XFS_REFC_REFCOUNT_MAX) + return __this_address; + + return NULL; +} + /* Simple checks for refcount records. */ xfs_failaddr_t xfs_refcount_check_irec( struct xfs_btree_cur *cur, const struct xfs_refcount_irec *irec) { + if (cur->bc_btnum == XFS_BTNUM_RTREFC) + return xfs_refcount_check_rtgroup_irec(cur->bc_ino.rtg, irec); return xfs_refcount_check_perag_irec(cur->bc_ag.pag, irec); } @@ -159,9 +193,15 @@ xfs_refcount_complain_bad_rec( { struct xfs_mount *mp = cur->bc_mp; - xfs_warn(mp, + if (cur->bc_btnum == XFS_BTNUM_RTREFC) { + xfs_warn(mp, + "RT Refcount BTree record corruption in rtgroup %u detected at %pS!", + cur->bc_ino.rtg->rtg_rgno, fa); + } else { + xfs_warn(mp, "Refcount BTree record corruption in AG %d detected at %pS!", cur->bc_ag.pag->pag_agno, fa); + } xfs_warn(mp, "Start block 0x%x, block count 0x%x, references 0x%x", irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount); @@ -1054,6 +1094,15 @@ xfs_refcount_merge_extents( return 0; } +static inline struct xbtree_refc * +xrefc_btree_state( + struct xfs_btree_cur *cur) +{ + if (cur->bc_btnum == XFS_BTNUM_RTREFC) + return 
&cur->bc_ino.refc; + return &cur->bc_ag.refc; +} + /* * XXX: This is a pretty hand-wavy estimate. The penalty for guessing * true incorrectly is a shutdown FS; the penalty for guessing false @@ -1071,25 +1120,25 @@ xfs_refcount_still_have_space( * to handle each of the shape changes to the refcount btree. */ overhead = xfs_allocfree_block_count(cur->bc_mp, - cur->bc_ag.refc.shape_changes); - overhead += cur->bc_mp->m_refc_maxlevels; + xrefc_btree_state(cur)->shape_changes); + overhead += cur->bc_maxlevels; overhead *= cur->bc_mp->m_sb.sb_blocksize; /* * Only allow 2 refcount extent updates per transaction if the * refcount continue update "error" has been injected. */ - if (cur->bc_ag.refc.nr_ops > 2 && + if (xrefc_btree_state(cur)->nr_ops > 2 && XFS_TEST_ERROR(false, cur->bc_mp, XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE)) return false; - if (cur->bc_ag.refc.nr_ops == 0) + if (xrefc_btree_state(cur)->nr_ops == 0) return true; else if (overhead > cur->bc_tp->t_log_res) return false; return cur->bc_tp->t_log_res - overhead > - cur->bc_ag.refc.nr_ops * XFS_REFCOUNT_ITEM_OVERHEAD; + xrefc_btree_state(cur)->nr_ops * XFS_REFCOUNT_ITEM_OVERHEAD; } /* @@ -1124,7 +1173,7 @@ xfs_refcount_adjust_extents( if (error) goto out_error; if (!found_rec || ext.rc_domain != XFS_REFC_DOMAIN_SHARED) { - ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks; + ext.rc_startblock = xrefc_max_startblock(cur); ext.rc_blockcount = 0; ext.rc_refcount = 0; ext.rc_domain = XFS_REFC_DOMAIN_SHARED; @@ -1148,7 +1197,7 @@ xfs_refcount_adjust_extents( * Either cover the hole (increment) or * delete the range (decrement). 
*/ - cur->bc_ag.refc.nr_ops++; + xrefc_btree_state(cur)->nr_ops++; if (tmp.rc_refcount) { error = xfs_refcount_insert(cur, &tmp, &found_tmp); @@ -1205,7 +1254,7 @@ xfs_refcount_adjust_extents( goto skip; ext.rc_refcount += adj; trace_xfs_refcount_modify_extent(cur, &ext); - cur->bc_ag.refc.nr_ops++; + xrefc_btree_state(cur)->nr_ops++; if (ext.rc_refcount > 1) { error = xfs_refcount_update(cur, &ext); if (error) @@ -1288,7 +1337,7 @@ xfs_refcount_adjust( if (shape_changed) shape_changes++; if (shape_changes) - cur->bc_ag.refc.shape_changes++; + xrefc_btree_state(cur)->shape_changes++; /* Now that we've taken care of the ends, adjust the middle extents */ error = xfs_refcount_adjust_extents(cur, agbno, aglen, adj); @@ -1380,8 +1429,8 @@ xfs_refcount_finish_one( */ rcur = *pcur; if (rcur != NULL && rcur->bc_ag.pag != ri->ri_pag) { - nr_ops = rcur->bc_ag.refc.nr_ops; - shape_changes = rcur->bc_ag.refc.shape_changes; + nr_ops = xrefc_btree_state(rcur)->nr_ops; + shape_changes = xrefc_btree_state(rcur)->shape_changes; xfs_refcount_finish_one_cleanup(tp, rcur, 0); rcur = NULL; *pcur = NULL; @@ -1393,8 +1442,8 @@ xfs_refcount_finish_one( return error; rcur = xfs_refcountbt_init_cursor(mp, tp, agbp, ri->ri_pag); - rcur->bc_ag.refc.nr_ops = nr_ops; - rcur->bc_ag.refc.shape_changes = shape_changes; + xrefc_btree_state(rcur)->nr_ops = nr_ops; + xrefc_btree_state(rcur)->shape_changes = shape_changes; } *pcur = rcur; @@ -1689,7 +1738,7 @@ xfs_refcount_adjust_cow_extents( goto out_error; } if (!found_rec) { - ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks; + ext.rc_startblock = xrefc_max_startblock(cur); ext.rc_blockcount = 0; ext.rc_refcount = 0; ext.rc_domain = XFS_REFC_DOMAIN_COW; ^ permalink raw reply related [flat|nested] 565+ messages in thread
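The xrefc_btree_state() helper in patch 07/42 exists because the per-cursor refcount bookkeeping sits in a different union member depending on whether the cursor is AG-rooted or inode-rooted. A reduced userspace model follows; all `demo_*` names and the placeholder fields are hypothetical, and the real struct xfs_btree_cur carries far more state.

```c
#include <assert.h>
#include <stddef.h>

enum demo_btnum {
	DEMO_BTNUM_REFC,	/* per-AG refcount btree */
	DEMO_BTNUM_RTREFC,	/* realtime refcount btree, rooted in an inode */
};

/* Bookkeeping that the refcount code updates as it works. */
struct demo_refc_state {
	unsigned long	nr_ops;
	int		shape_changes;
};

struct demo_btree_cur {
	enum demo_btnum	bc_btnum;
	union {
		struct {
			void			*pag;	/* placeholder */
			struct demo_refc_state	refc;
		} bc_ag;
		struct {
			void			*ip;	/* placeholder */
			void			*rtg;	/* placeholder */
			struct demo_refc_state	refc;
		} bc_ino;
	};
};

/* Return the refcount state that belongs to this kind of cursor. */
struct demo_refc_state *
demo_refc_btree_state(struct demo_btree_cur *cur)
{
	assert(cur != NULL);

	if (cur->bc_btnum == DEMO_BTNUM_RTREFC)
		return &cur->bc_ino.refc;
	return &cur->bc_ag.refc;
}
```

Callers then write demo_refc_btree_state(cur)->nr_ops++ without caring which kind of cursor they hold, which is the substitution the patch makes throughout xfs_refcount.c.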
* [PATCH 08/42] xfs: add a realtime flag to the refcount update log redo items 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (10 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 07/42] xfs: prepare refcount functions to deal with rtrefcountbt Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 13/42] xfs: wire up a new inode fork type for the realtime refcount Darrick J. Wong ` (29 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Extend the refcount update (CUI) log items with a new realtime flag that indicates that the updates apply against the realtime refcountbt. We'll wire up the actual refcount code later. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap.c | 10 ++- fs/xfs/libxfs/xfs_defer.c | 1 fs/xfs/libxfs/xfs_defer.h | 1 fs/xfs/libxfs/xfs_log_format.h | 5 + fs/xfs/libxfs/xfs_refcount.c | 156 +++++++++++++++++++++++++++++----------- fs/xfs/libxfs/xfs_refcount.h | 18 +++-- fs/xfs/scrub/cow_repair.c | 2 - fs/xfs/scrub/reap.c | 5 + fs/xfs/xfs_refcount_item.c | 32 ++++++++ fs/xfs/xfs_reflink.c | 19 +++-- 10 files changed, 184 insertions(+), 65 deletions(-) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index 8c683db35788..b46504d861e3 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -4529,8 +4529,9 @@ xfs_bmapi_write( * the refcount btree for orphan recovery. */ if (whichfork == XFS_COW_FORK) - xfs_refcount_alloc_cow_extent(tp, bma.blkno, - bma.length); + xfs_refcount_alloc_cow_extent(tp, + XFS_IS_REALTIME_INODE(ip), + bma.blkno, bma.length); } /* Deal with the allocated space we found. 
*/ @@ -4696,7 +4697,8 @@ xfs_bmapi_convert_delalloc( *seq = READ_ONCE(ifp->if_seq); if (whichfork == XFS_COW_FORK) - xfs_refcount_alloc_cow_extent(tp, bma.blkno, bma.length); + xfs_refcount_alloc_cow_extent(tp, XFS_IS_REALTIME_INODE(ip), + bma.blkno, bma.length); error = xfs_bmap_btree_to_extents(tp, ip, bma.cur, &bma.logflags, whichfork); @@ -5313,7 +5315,7 @@ xfs_bmap_del_extent_real( */ if (want_free) { if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { - xfs_refcount_decrease_extent(tp, del); + xfs_refcount_decrease_extent(tp, isrt, del); } else { unsigned int efi_flags = 0; diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c index ce3bc5fe2bdc..1aefb4c99e7b 100644 --- a/fs/xfs/libxfs/xfs_defer.c +++ b/fs/xfs/libxfs/xfs_defer.c @@ -186,6 +186,7 @@ static struct kmem_cache *xfs_defer_pending_cache; static const struct xfs_defer_op_type *defer_op_types[] = { [XFS_DEFER_OPS_TYPE_BMAP] = &xfs_bmap_update_defer_type, [XFS_DEFER_OPS_TYPE_REFCOUNT] = &xfs_refcount_update_defer_type, + [XFS_DEFER_OPS_TYPE_REFCOUNT_RT] = &xfs_refcount_update_defer_type, [XFS_DEFER_OPS_TYPE_RMAP] = &xfs_rmap_update_defer_type, [XFS_DEFER_OPS_TYPE_RMAP_RT] = &xfs_rmap_update_defer_type, [XFS_DEFER_OPS_TYPE_FREE] = &xfs_extent_free_defer_type, diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h index 89c279185ce6..8564777c4c49 100644 --- a/fs/xfs/libxfs/xfs_defer.h +++ b/fs/xfs/libxfs/xfs_defer.h @@ -16,6 +16,7 @@ struct xfs_defer_capture; enum xfs_defer_ops_type { XFS_DEFER_OPS_TYPE_BMAP, XFS_DEFER_OPS_TYPE_REFCOUNT, + XFS_DEFER_OPS_TYPE_REFCOUNT_RT, XFS_DEFER_OPS_TYPE_RMAP, XFS_DEFER_OPS_TYPE_RMAP_RT, XFS_DEFER_OPS_TYPE_FREE, diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h index 3a23282d6e6f..66cfcafae9b8 100644 --- a/fs/xfs/libxfs/xfs_log_format.h +++ b/fs/xfs/libxfs/xfs_log_format.h @@ -800,7 +800,10 @@ struct xfs_phys_extent { /* Type codes are taken directly from enum xfs_refcount_intent_type. 
*/ #define XFS_REFCOUNT_EXTENT_TYPE_MASK 0xFF -#define XFS_REFCOUNT_EXTENT_FLAGS (XFS_REFCOUNT_EXTENT_TYPE_MASK) +#define XFS_REFCOUNT_EXTENT_REALTIME (1U << 31) + +#define XFS_REFCOUNT_EXTENT_FLAGS (XFS_REFCOUNT_EXTENT_TYPE_MASK | \ + XFS_REFCOUNT_EXTENT_REALTIME) /* * This is the structure used to lay out a cui log item in the diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index a54a633f2ef9..999ba2c5c37d 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -25,6 +25,7 @@ #include "xfs_ag.h" #include "xfs_health.h" #include "xfs_rtgroup.h" +#include "xfs_rtalloc.h" struct kmem_cache *xfs_refcount_intent_cache; @@ -1141,6 +1142,28 @@ xfs_refcount_still_have_space( xrefc_btree_state(cur)->nr_ops * XFS_REFCOUNT_ITEM_OVERHEAD; } +/* Schedule an extent free. */ +static void +xrefc_free_extent( + struct xfs_btree_cur *cur, + struct xfs_refcount_irec *rec) +{ + xfs_fsblock_t fsbno; + unsigned int flags = 0; + + if (cur->bc_btnum == XFS_BTNUM_RTREFC) { + flags |= XFS_FREE_EXTENT_REALTIME; + fsbno = xfs_rgbno_to_rtb(cur->bc_mp, cur->bc_ino.rtg->rtg_rgno, + rec->rc_startblock); + } else { + fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.pag->pag_agno, + rec->rc_startblock); + } + + xfs_free_extent_later(cur->bc_tp, fsbno, rec->rc_blockcount, NULL, + flags); +} + /* * Adjust the refcounts of middle extents. At this point we should have * split extents that crossed the adjustment range; merged with adjacent @@ -1157,7 +1180,6 @@ xfs_refcount_adjust_extents( struct xfs_refcount_irec ext, tmp; int error; int found_rec, found_tmp; - xfs_fsblock_t fsbno; /* Merging did all the work already. 
*/ if (*aglen == 0) @@ -1210,11 +1232,7 @@ xfs_refcount_adjust_extents( goto out_error; } } else { - fsbno = XFS_AGB_TO_FSB(cur->bc_mp, - cur->bc_ag.pag->pag_agno, - tmp.rc_startblock); - xfs_free_extent_later(cur->bc_tp, fsbno, - tmp.rc_blockcount, NULL, 0); + xrefc_free_extent(cur, &tmp); } (*agbno) += tmp.rc_blockcount; @@ -1270,11 +1288,7 @@ xfs_refcount_adjust_extents( } goto advloop; } else { - fsbno = XFS_AGB_TO_FSB(cur->bc_mp, - cur->bc_ag.pag->pag_agno, - ext.rc_startblock); - xfs_free_extent_later(cur->bc_tp, fsbno, - ext.rc_blockcount, NULL, 0); + xrefc_free_extent(cur, &ext); } skip: @@ -1358,19 +1372,31 @@ xfs_refcount_finish_one_cleanup( struct xfs_btree_cur *rcur, int error) { - struct xfs_buf *agbp; + struct xfs_buf *agbp = NULL; if (rcur == NULL) return; - agbp = rcur->bc_ag.agbp; + if (rcur->bc_btnum == XFS_BTNUM_REFC) + agbp = rcur->bc_ag.agbp; xfs_btree_del_cursor(rcur, error); - if (error) + if (agbp) xfs_trans_brelse(tp, agbp); } +/* Does this btree cursor match the given AG? */ +static inline bool +xfs_refcount_is_wrong_cursor( + struct xfs_btree_cur *cur, + struct xfs_refcount_intent *ri) +{ + if (cur->bc_btnum == XFS_BTNUM_RTREFC) + return cur->bc_ino.rtg != ri->ri_rtg; + return cur->bc_ag.pag != ri->ri_pag; +} + /* * Set up a continuation a deferred refcount operation by updating the intent. - * Checks to make sure we're not going to run off the end of the AG. + * Checks to make sure we're not going to run off the end of the AG or rtgroup. 
*/ static inline int xfs_refcount_continue_op( @@ -1379,19 +1405,35 @@ xfs_refcount_continue_op( xfs_agblock_t new_agbno) { struct xfs_mount *mp = cur->bc_mp; - struct xfs_perag *pag = cur->bc_ag.pag; - if (XFS_IS_CORRUPT(mp, !xfs_verify_agbext(pag, new_agbno, - ri->ri_blockcount))) { - xfs_btree_mark_sick(cur); - return -EFSCORRUPTED; + if (ri->ri_realtime) { + struct xfs_rtgroup *rtg = ri->ri_rtg; + + if (XFS_IS_CORRUPT(mp, !xfs_verify_rgbext(rtg, new_agbno, + ri->ri_blockcount))) { + xfs_btree_mark_sick(cur); + return -EFSCORRUPTED; + } + + ri->ri_startblock = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, new_agbno); + + ASSERT(xfs_verify_rtbext(mp, ri->ri_startblock, ri->ri_blockcount)); + ASSERT(rtg->rtg_rgno == xfs_rtb_to_rgno(mp, ri->ri_startblock)); + } else { + struct xfs_perag *pag = cur->bc_ag.pag; + + if (XFS_IS_CORRUPT(mp, !xfs_verify_agbext(pag, new_agbno, + ri->ri_blockcount))) { + xfs_btree_mark_sick(cur); + return -EFSCORRUPTED; + } + + ri->ri_startblock = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno); + + ASSERT(xfs_verify_fsbext(mp, ri->ri_startblock, ri->ri_blockcount)); + ASSERT(pag->pag_agno == XFS_FSB_TO_AGNO(mp, ri->ri_startblock)); } - ri->ri_startblock = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno); - - ASSERT(xfs_verify_fsbext(mp, ri->ri_startblock, ri->ri_blockcount)); - ASSERT(pag->pag_agno == XFS_FSB_TO_AGNO(mp, ri->ri_startblock)); - return 0; } @@ -1416,10 +1458,16 @@ xfs_refcount_finish_one( unsigned long nr_ops = 0; int shape_changes = 0; - bno = XFS_FSB_TO_AGBNO(mp, ri->ri_startblock); - trace_xfs_refcount_deferred(mp, ri); + if (ri->ri_realtime) { + xfs_rgnumber_t rgno; + + bno = xfs_rtb_to_rgbno(mp, ri->ri_startblock, &rgno); + } else { + bno = XFS_FSB_TO_AGBNO(mp, ri->ri_startblock); + } + if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REFCOUNT_FINISH_ONE)) return -EIO; @@ -1428,7 +1476,7 @@ xfs_refcount_finish_one( * the startblock, get one now. 
*/ rcur = *pcur; - if (rcur != NULL && rcur->bc_ag.pag != ri->ri_pag) { + if (rcur != NULL && xfs_refcount_is_wrong_cursor(rcur, ri)) { nr_ops = xrefc_btree_state(rcur)->nr_ops; shape_changes = xrefc_btree_state(rcur)->shape_changes; xfs_refcount_finish_one_cleanup(tp, rcur, 0); @@ -1436,12 +1484,19 @@ xfs_refcount_finish_one( *pcur = NULL; } if (rcur == NULL) { - error = xfs_alloc_read_agf(ri->ri_pag, tp, - XFS_ALLOC_FLAG_FREEING, &agbp); - if (error) - return error; + if (ri->ri_realtime) { + /* coming in a later patch */ + ASSERT(0); + return -EFSCORRUPTED; + } else { + error = xfs_alloc_read_agf(ri->ri_pag, tp, + XFS_ALLOC_FLAG_FREEING, &agbp); + if (error) + return error; - rcur = xfs_refcountbt_init_cursor(mp, tp, agbp, ri->ri_pag); + rcur = xfs_refcountbt_init_cursor(mp, tp, agbp, + ri->ri_pag); + } xrefc_btree_state(rcur)->nr_ops = nr_ops; xrefc_btree_state(rcur)->shape_changes = shape_changes; } @@ -1492,10 +1547,12 @@ static void __xfs_refcount_add( struct xfs_trans *tp, enum xfs_refcount_intent_type type, + bool isrt, xfs_fsblock_t startblock, xfs_extlen_t blockcount) { struct xfs_refcount_intent *ri; + enum xfs_defer_ops_type optype; ri = kmem_cache_alloc(xfs_refcount_intent_cache, GFP_NOFS | __GFP_NOFAIL); @@ -1503,11 +1560,24 @@ __xfs_refcount_add( ri->ri_type = type; ri->ri_startblock = startblock; ri->ri_blockcount = blockcount; + ri->ri_realtime = isrt; trace_xfs_refcount_defer(tp->t_mountp, ri); + /* + * Deferred refcount updates for the realtime and data sections must + * use separate transactions to finish deferred work because updates to + * realtime metadata files can lock AGFs to allocate btree blocks and + * we don't want that mixing with the AGF locks taken to finish data + * section updates. 
+ */ + if (isrt) + optype = XFS_DEFER_OPS_TYPE_REFCOUNT_RT; + else + optype = XFS_DEFER_OPS_TYPE_REFCOUNT; + xfs_refcount_update_get_group(tp->t_mountp, ri); - xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_REFCOUNT, &ri->ri_list); + xfs_defer_add(tp, optype, &ri->ri_list); } /* @@ -1516,12 +1586,13 @@ __xfs_refcount_add( void xfs_refcount_increase_extent( struct xfs_trans *tp, + bool isrt, struct xfs_bmbt_irec *PREV) { if (!xfs_has_reflink(tp->t_mountp)) return; - __xfs_refcount_add(tp, XFS_REFCOUNT_INCREASE, PREV->br_startblock, + __xfs_refcount_add(tp, XFS_REFCOUNT_INCREASE, isrt, PREV->br_startblock, PREV->br_blockcount); } @@ -1531,12 +1602,13 @@ xfs_refcount_increase_extent( void xfs_refcount_decrease_extent( struct xfs_trans *tp, + bool isrt, struct xfs_bmbt_irec *PREV) { if (!xfs_has_reflink(tp->t_mountp)) return; - __xfs_refcount_add(tp, XFS_REFCOUNT_DECREASE, PREV->br_startblock, + __xfs_refcount_add(tp, XFS_REFCOUNT_DECREASE, isrt, PREV->br_startblock, PREV->br_blockcount); } @@ -1892,6 +1964,7 @@ __xfs_refcount_cow_free( void xfs_refcount_alloc_cow_extent( struct xfs_trans *tp, + bool isrt, xfs_fsblock_t fsb, xfs_extlen_t len) { @@ -1900,16 +1973,17 @@ xfs_refcount_alloc_cow_extent( if (!xfs_has_reflink(mp)) return; - __xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len); + __xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, isrt, fsb, len); /* Add rmap entry */ - xfs_rmap_alloc_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW); + xfs_rmap_alloc_extent(tp, isrt, fsb, len, XFS_RMAP_OWN_COW); } /* Forget a CoW staging event in the refcount btree. 
*/ void xfs_refcount_free_cow_extent( struct xfs_trans *tp, + bool isrt, xfs_fsblock_t fsb, xfs_extlen_t len) { @@ -1919,8 +1993,8 @@ xfs_refcount_free_cow_extent( return; /* Remove rmap entry */ - xfs_rmap_free_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW); - __xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len); + xfs_rmap_free_extent(tp, isrt, fsb, len, XFS_RMAP_OWN_COW); + __xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, isrt, fsb, len); } struct xfs_refcount_recovery { @@ -2026,7 +2100,7 @@ xfs_refcount_recover_cow_leftovers( /* Free the orphan record */ fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno, rr->rr_rrec.rc_startblock); - xfs_refcount_free_cow_extent(tp, fsb, + xfs_refcount_free_cow_extent(tp, false, fsb, rr->rr_rrec.rc_blockcount); /* Free the block. */ diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h index 7713bb908bdc..4e725d723e88 100644 --- a/fs/xfs/libxfs/xfs_refcount.h +++ b/fs/xfs/libxfs/xfs_refcount.h @@ -56,10 +56,14 @@ enum xfs_refcount_intent_type { struct xfs_refcount_intent { struct list_head ri_list; - struct xfs_perag *ri_pag; + union { + struct xfs_perag *ri_pag; + struct xfs_rtgroup *ri_rtg; + }; enum xfs_refcount_intent_type ri_type; xfs_extlen_t ri_blockcount; xfs_fsblock_t ri_startblock; + bool ri_realtime; }; /* Check that the refcount is appropriate for the record domain. 
*/ @@ -77,9 +81,9 @@ xfs_refcount_check_domain( void xfs_refcount_update_get_group(struct xfs_mount *mp, struct xfs_refcount_intent *ri); -void xfs_refcount_increase_extent(struct xfs_trans *tp, +void xfs_refcount_increase_extent(struct xfs_trans *tp, bool isrt, struct xfs_bmbt_irec *irec); -void xfs_refcount_decrease_extent(struct xfs_trans *tp, +void xfs_refcount_decrease_extent(struct xfs_trans *tp, bool isrt, struct xfs_bmbt_irec *irec); extern void xfs_refcount_finish_one_cleanup(struct xfs_trans *tp, @@ -91,10 +95,10 @@ extern int xfs_refcount_find_shared(struct xfs_btree_cur *cur, xfs_agblock_t agbno, xfs_extlen_t aglen, xfs_agblock_t *fbno, xfs_extlen_t *flen, bool find_end_of_shared); -void xfs_refcount_alloc_cow_extent(struct xfs_trans *tp, xfs_fsblock_t fsb, - xfs_extlen_t len); -void xfs_refcount_free_cow_extent(struct xfs_trans *tp, xfs_fsblock_t fsb, - xfs_extlen_t len); +void xfs_refcount_alloc_cow_extent(struct xfs_trans *tp, bool isrt, + xfs_fsblock_t fsb, xfs_extlen_t len); +void xfs_refcount_free_cow_extent(struct xfs_trans *tp, bool isrt, + xfs_fsblock_t fsb, xfs_extlen_t len); extern int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp, struct xfs_perag *pag); diff --git a/fs/xfs/scrub/cow_repair.c b/fs/xfs/scrub/cow_repair.c index 5292171e6a2b..a0c1d97ab8b6 100644 --- a/fs/xfs/scrub/cow_repair.c +++ b/fs/xfs/scrub/cow_repair.c @@ -336,7 +336,7 @@ xrep_cow_alloc( if (args.fsbno == NULLFSBLOCK) return -ENOSPC; - xfs_refcount_alloc_cow_extent(sc->tp, args.fsbno, args.len); + xfs_refcount_alloc_cow_extent(sc->tp, false, args.fsbno, args.len); irec->br_startblock = args.fsbno; irec->br_blockcount = args.len; diff --git a/fs/xfs/scrub/reap.c b/fs/xfs/scrub/reap.c index b0b29b1e139b..77354bdb0511 100644 --- a/fs/xfs/scrub/reap.c +++ b/fs/xfs/scrub/reap.c @@ -349,7 +349,8 @@ xreap_agextent( * If we're unmapping CoW staging extents, remove the * records from the refcountbt as well. 
*/ - xfs_refcount_free_cow_extent(sc->tp, fsbno, *aglenp); + xfs_refcount_free_cow_extent(sc->tp, false, fsbno, + *aglenp); return 0; } return xfs_rmap_free(sc->tp, sc->sa.agf_bp, sc->sa.pag, agbno, @@ -381,7 +382,7 @@ xreap_agextent( ASSERT(rs->resv == XFS_AG_RESV_NONE); rs->force_roll = true; - xfs_refcount_free_cow_extent(sc->tp, fsbno, *aglenp); + xfs_refcount_free_cow_extent(sc->tp, false, fsbno, *aglenp); xfs_free_extent_later(sc->tp, fsbno, *aglenp, NULL, XFS_FREE_EXTENT_SKIP_DISCARD); return 0; diff --git a/fs/xfs/xfs_refcount_item.c b/fs/xfs/xfs_refcount_item.c index ccc334d482a4..7a366b316e79 100644 --- a/fs/xfs/xfs_refcount_item.c +++ b/fs/xfs/xfs_refcount_item.c @@ -21,6 +21,7 @@ #include "xfs_log_priv.h" #include "xfs_log_recover.h" #include "xfs_ag.h" +#include "xfs_rtgroup.h" struct kmem_cache *xfs_cui_cache; struct kmem_cache *xfs_cud_cache; @@ -286,6 +287,11 @@ xfs_refcount_update_diff_items( ra = container_of(a, struct xfs_refcount_intent, ri_list); rb = container_of(b, struct xfs_refcount_intent, ri_list); + ASSERT(ra->ri_realtime == rb->ri_realtime); + + if (ra->ri_realtime) + return ra->ri_rtg->rtg_rgno - rb->ri_rtg->rtg_rgno; + return ra->ri_pag->pag_agno - rb->ri_pag->pag_agno; } @@ -324,6 +330,8 @@ xfs_refcount_update_log_item( default: ASSERT(0); } + if (ri->ri_realtime) + pmap->pe_flags |= XFS_REFCOUNT_EXTENT_REALTIME; } static struct xfs_log_item * @@ -365,6 +373,15 @@ xfs_refcount_update_get_group( { xfs_agnumber_t agno; + if (ri->ri_realtime) { + xfs_rgnumber_t rgno; + + rgno = xfs_rtb_to_rgno(mp, ri->ri_startblock); + ri->ri_rtg = xfs_rtgroup_get(mp, rgno); + xfs_rtgroup_bump_intents(ri->ri_rtg); + return; + } + agno = XFS_FSB_TO_AGNO(mp, ri->ri_startblock); ri->ri_pag = xfs_perag_get(mp, agno); xfs_perag_bump_intents(ri->ri_pag); @@ -375,6 +392,12 @@ static inline void xfs_refcount_update_put_group( struct xfs_refcount_intent *ri) { + if (ri->ri_realtime) { + xfs_rtgroup_drop_intents(ri->ri_rtg); + xfs_rtgroup_put(ri->ri_rtg); + 
return; + } + xfs_perag_drop_intents(ri->ri_pag); xfs_perag_put(ri->ri_pag); } @@ -536,6 +559,7 @@ xfs_cui_item_recover( goto abort_error; } + fake.ri_realtime = pmap->pe_flags & XFS_REFCOUNT_EXTENT_REALTIME; fake.ri_startblock = pmap->pe_startblock; fake.ri_blockcount = pmap->pe_len; @@ -561,18 +585,22 @@ xfs_cui_item_recover( switch (fake.ri_type) { case XFS_REFCOUNT_INCREASE: - xfs_refcount_increase_extent(tp, &irec); + xfs_refcount_increase_extent(tp, + fake.ri_realtime, &irec); break; case XFS_REFCOUNT_DECREASE: - xfs_refcount_decrease_extent(tp, &irec); + xfs_refcount_decrease_extent(tp, + fake.ri_realtime, &irec); break; case XFS_REFCOUNT_ALLOC_COW: xfs_refcount_alloc_cow_extent(tp, + fake.ri_realtime, irec.br_startblock, irec.br_blockcount); break; case XFS_REFCOUNT_FREE_COW: xfs_refcount_free_cow_extent(tp, + fake.ri_realtime, irec.br_startblock, irec.br_blockcount); break; diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index cf514af238ce..52e73aa2c38e 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -585,6 +585,7 @@ xfs_reflink_cancel_cow_blocks( struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_COW_FORK); struct xfs_bmbt_irec got, del; struct xfs_iext_cursor icur; + bool isrt = XFS_IS_REALTIME_INODE(ip); int error = 0; if (!xfs_inode_has_cow_data(ip)) @@ -614,11 +615,12 @@ xfs_reflink_cancel_cow_blocks( ASSERT((*tpp)->t_firstblock == NULLFSBLOCK); /* Free the CoW orphan record. */ - xfs_refcount_free_cow_extent(*tpp, del.br_startblock, - del.br_blockcount); + xfs_refcount_free_cow_extent(*tpp, isrt, + del.br_startblock, del.br_blockcount); xfs_free_extent_later(*tpp, del.br_startblock, - del.br_blockcount, NULL, 0); + del.br_blockcount, NULL, + isrt ? 
XFS_FREE_EXTENT_REALTIME : 0); /* Roll the transaction */ error = xfs_defer_finish(tpp); @@ -726,6 +728,7 @@ xfs_reflink_end_cow_extent( struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_COW_FORK); unsigned int resblks; int nmaps; + bool isrt = XFS_IS_REALTIME_INODE(ip); int error; /* No COW extents? That's easy! */ @@ -803,7 +806,7 @@ xfs_reflink_end_cow_extent( * or not), unmap the extent and drop its refcount. */ xfs_bmap_unmap_extent(tp, ip, XFS_DATA_FORK, &data); - xfs_refcount_decrease_extent(tp, &data); + xfs_refcount_decrease_extent(tp, isrt, &data); xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -data.br_blockcount); } else if (data.br_startblock == DELAYSTARTBLOCK) { @@ -823,7 +826,8 @@ xfs_reflink_end_cow_extent( } /* Free the CoW orphan record. */ - xfs_refcount_free_cow_extent(tp, del.br_startblock, del.br_blockcount); + xfs_refcount_free_cow_extent(tp, isrt, del.br_startblock, + del.br_blockcount); /* Map the new blocks into the data fork. */ xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &del); @@ -1160,6 +1164,7 @@ xfs_reflink_remap_extent( bool quota_reserved = true; bool smap_real; bool dmap_written = xfs_bmap_is_written_extent(dmap); + bool isrt = XFS_IS_REALTIME_INODE(ip); int iext_delta = 0; int nimaps; int error; @@ -1291,7 +1296,7 @@ xfs_reflink_remap_extent( * or not), unmap the extent and drop its refcount. */ xfs_bmap_unmap_extent(tp, ip, XFS_DATA_FORK, &smap); - xfs_refcount_decrease_extent(tp, &smap); + xfs_refcount_decrease_extent(tp, isrt, &smap); qdelta -= smap.br_blockcount; } else if (smap.br_startblock == DELAYSTARTBLOCK) { int done; @@ -1314,7 +1319,7 @@ xfs_reflink_remap_extent( * its refcount and map it into the file. */ if (dmap_written) { - xfs_refcount_increase_extent(tp, dmap); + xfs_refcount_increase_extent(tp, isrt, dmap); xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, dmap); qdelta += dmap->br_blockcount; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
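A note on the log format change in the patch above: the realtime bit shares pe_flags with the intent type code, so recovery has to mask the two apart. The following is a minimal userspace sketch of that packing, not the kernel code. The three macros are copied from the xfs_log_format.h hunk; every demo_* helper is a hypothetical name introduced here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the xfs_log_format.h hunk in the patch above. */
#define XFS_REFCOUNT_EXTENT_TYPE_MASK	0xFF
#define XFS_REFCOUNT_EXTENT_REALTIME	(1U << 31)
#define XFS_REFCOUNT_EXTENT_FLAGS	(XFS_REFCOUNT_EXTENT_TYPE_MASK | \
					 XFS_REFCOUNT_EXTENT_REALTIME)

/* Hypothetical helper: pack an intent type and the rt bit into one word. */
static inline uint32_t
demo_cui_encode_flags(unsigned int type, bool isrt)
{
	uint32_t	flags;

	/* The type code must fit in the low byte. */
	assert(type <= XFS_REFCOUNT_EXTENT_TYPE_MASK);
	flags = type & XFS_REFCOUNT_EXTENT_TYPE_MASK;
	if (isrt)
		flags |= XFS_REFCOUNT_EXTENT_REALTIME;
	return flags;
}

/* Hypothetical helper: true if no bits outside the known set are present. */
static inline bool
demo_cui_flags_valid(uint32_t flags)
{
	return (flags & ~XFS_REFCOUNT_EXTENT_FLAGS) == 0;
}

/* Hypothetical helper: extract the intent type code. */
static inline unsigned int
demo_cui_type(uint32_t flags)
{
	return flags & XFS_REFCOUNT_EXTENT_TYPE_MASK;
}

/* Hypothetical helper: extract the realtime bit as a proper boolean. */
static inline bool
demo_cui_is_realtime(uint32_t flags)
{
	return (flags & XFS_REFCOUNT_EXTENT_REALTIME) != 0;
}
```

Putting the new bit at the top of the word leaves the low byte free for the type codes, which is presumably why older type values keep decoding unchanged.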
* [PATCH 13/42] xfs: wire up a new inode fork type for the realtime refcount 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (11 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 08/42] xfs: add a realtime flag to the refcount update log redo items Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 20/42] xfs: enable sharing of realtime file blocks Darrick J. Wong ` (28 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Plumb in the pieces we need to embed the root of the realtime refcount btree in an inode's data fork, complete with new fork type and on-disk interpretation functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_format.h | 8 + fs/xfs/libxfs/xfs_inode_fork.c | 8 + fs/xfs/libxfs/xfs_rtrefcount_btree.c | 236 ++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrefcount_btree.h | 112 ++++++++++++++++ fs/xfs/xfs_inode_item_recover.c | 4 + fs/xfs/xfs_ondisk.h | 1 6 files changed, 366 insertions(+), 3 deletions(-) diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h index 20af5b730d6d..17be73c45226 100644 --- a/fs/xfs/libxfs/xfs_format.h +++ b/fs/xfs/libxfs/xfs_format.h @@ -1805,6 +1805,14 @@ typedef __be32 xfs_refcount_ptr_t; */ #define XFS_RTREFC_CRC_MAGIC 0x52434e54 /* 'RCNT' */ +/* + * rt refcount root header, on-disk form only. 
+ */ +struct xfs_rtrefcount_root { + __be16 bb_level; /* 0 is a leaf */ + __be16 bb_numrecs; /* current # of data records */ +}; + /* inode-rooted btree pointer type */ typedef __be64 xfs_rtrefcount_ptr_t; diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c index e69ec68b5a9d..7aae3ae810b7 100644 --- a/fs/xfs/libxfs/xfs_inode_fork.c +++ b/fs/xfs/libxfs/xfs_inode_fork.c @@ -28,6 +28,7 @@ #include "xfs_health.h" #include "xfs_symlink_remote.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" struct kmem_cache *xfs_ifork_cache; @@ -269,8 +270,7 @@ xfs_iformat_data_fork( case XFS_DINODE_FMT_REFCOUNT: if (!xfs_has_rtreflink(ip->i_mount)) return -EFSCORRUPTED; - ASSERT(0); /* to be implemented later */ - return -EFSCORRUPTED; + return xfs_iformat_rtrefcount(ip, dip); default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, sizeof(*dip), __this_address); @@ -658,7 +658,9 @@ xfs_iflush_fork( break; case XFS_DINODE_FMT_REFCOUNT: - ASSERT(0); /* to be implemented later */ + ASSERT(whichfork == XFS_DATA_FORK); + if (iip->ili_fields & brootflag[whichfork]) + xfs_iflush_rtrefcount(ip, dip); break; default: diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c index 74c5cf9a0d3a..a43ee6d7b547 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.c +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c @@ -85,6 +85,41 @@ xfs_rtrefcountbt_get_maxrecs( return cur->bc_mp->m_rtrefc_mxr[level != 0]; } +/* + * Calculate number of records in a realtime refcount btree inode root. + */ +unsigned int +xfs_rtrefcountbt_droot_maxrecs( + unsigned int blocklen, + bool leaf) +{ + blocklen -= sizeof(struct xfs_rtrefcount_root); + + if (leaf) + return blocklen / sizeof(struct xfs_refcount_rec); + return blocklen / (2 * sizeof(struct xfs_refcount_key) + + sizeof(xfs_rtrefcount_ptr_t)); +} + +/* + * Get the maximum records we could store in the on-disk format. 
+ * + * For non-root nodes this is equivalent to xfs_rtrefcountbt_get_maxrecs, but + * for the root node this checks the available space in the dinode fork so that + * we can resize the in-memory buffer to match it. After a resize to the + * maximum size this function returns the same value as + * xfs_rtrefcountbt_get_maxrecs for the root node, too. + */ +STATIC int +xfs_rtrefcountbt_get_dmaxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level != cur->bc_nlevels - 1) + return cur->bc_mp->m_rtrefc_mxr[level != 0]; + return xfs_rtrefcountbt_droot_maxrecs(cur->bc_ino.forksize, level == 0); +} + STATIC void xfs_rtrefcountbt_init_key_from_rec( union xfs_btree_key *key, @@ -255,6 +290,68 @@ xfs_rtrefcountbt_keys_contiguous( be32_to_cpu(key2->refc.rc_startblock)); } +/* Move the rt refcount btree root from one incore buffer to another. */ +static void +xfs_rtrefcountbt_broot_move( + struct xfs_inode *ip, + int whichfork, + struct xfs_btree_block *dst_broot, + size_t dst_bytes, + struct xfs_btree_block *src_broot, + size_t src_bytes, + unsigned int level, + unsigned int numrecs) +{ + struct xfs_mount *mp = ip->i_mount; + void *dptr; + void *sptr; + + ASSERT(xfs_rtrefcount_droot_space(src_broot) <= + xfs_inode_fork_size(ip, whichfork)); + + /* + * We always have to move the pointers because they are not butted + * against the btree block header. + */ + if (numrecs && level > 0) { + sptr = xfs_rtrefcount_broot_ptr_addr(mp, src_broot, 1, + src_bytes); + dptr = xfs_rtrefcount_broot_ptr_addr(mp, dst_broot, 1, + dst_bytes); + memmove(dptr, sptr, numrecs * sizeof(xfs_fsblock_t)); + } + + if (src_broot == dst_broot) + return; + + /* + * If the root is being totally relocated, we have to migrate the block + * header and the keys/records that come after it. 
+ */ + memcpy(dst_broot, src_broot, XFS_RTREFCOUNT_BLOCK_LEN); + + if (!numrecs) + return; + + if (level == 0) { + sptr = xfs_rtrefcount_rec_addr(src_broot, 1); + dptr = xfs_rtrefcount_rec_addr(dst_broot, 1); + memcpy(dptr, sptr, + numrecs * sizeof(struct xfs_refcount_rec)); + } else { + sptr = xfs_rtrefcount_key_addr(src_broot, 1); + dptr = xfs_rtrefcount_key_addr(dst_broot, 1); + memcpy(dptr, sptr, + numrecs * sizeof(struct xfs_refcount_key)); + } +} + +static const struct xfs_ifork_broot_ops xfs_rtrefcountbt_iroot_ops = { + .maxrecs = xfs_rtrefcountbt_maxrecs, + .size = xfs_rtrefcount_broot_space_calc, + .move = xfs_rtrefcountbt_broot_move, +}; + const struct xfs_btree_ops xfs_rtrefcountbt_ops = { .rec_len = sizeof(struct xfs_refcount_rec), .key_len = sizeof(struct xfs_refcount_key), @@ -266,6 +363,7 @@ const struct xfs_btree_ops xfs_rtrefcountbt_ops = { .free_block = xfs_btree_free_imeta_block, .get_minrecs = xfs_rtrefcountbt_get_minrecs, .get_maxrecs = xfs_rtrefcountbt_get_maxrecs, + .get_dmaxrecs = xfs_rtrefcountbt_get_dmaxrecs, .init_key_from_rec = xfs_rtrefcountbt_init_key_from_rec, .init_high_key_from_rec = xfs_rtrefcountbt_init_high_key_from_rec, .init_rec_from_cur = xfs_rtrefcountbt_init_rec_from_cur, @@ -276,6 +374,7 @@ const struct xfs_btree_ops xfs_rtrefcountbt_ops = { .keys_inorder = xfs_rtrefcountbt_keys_inorder, .recs_inorder = xfs_rtrefcountbt_recs_inorder, .keys_contiguous = xfs_rtrefcountbt_keys_contiguous, + .iroot_ops = &xfs_rtrefcountbt_iroot_ops, }; /* Initialize a new rt refcount btree cursor. */ @@ -529,3 +628,140 @@ xfs_rtrefcountbt_calc_reserves( return xfs_rtrefcountbt_max_size(mp, xfs_rtb_to_rtxt(mp, mp->m_sb.sb_rgblocks)); } + +/* + * Convert on-disk form of btree root to in-memory form. 
+ */ +STATIC void +xfs_rtrefcountbt_from_disk( + struct xfs_inode *ip, + struct xfs_rtrefcount_root *dblock, + int dblocklen, + struct xfs_btree_block *rblock) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_refcount_key *fkp; + __be64 *fpp; + struct xfs_refcount_key *tkp; + __be64 *tpp; + struct xfs_refcount_rec *frp; + struct xfs_refcount_rec *trp; + unsigned int numrecs; + unsigned int maxrecs; + unsigned int rblocklen; + + rblocklen = xfs_rtrefcount_broot_space(mp, dblock); + + xfs_btree_init_block(mp, rblock, &xfs_rtrefcountbt_ops, 0, 0, + ip->i_ino); + + rblock->bb_level = dblock->bb_level; + rblock->bb_numrecs = dblock->bb_numrecs; + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrefcountbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrefcount_droot_key_addr(dblock, 1); + tkp = xfs_rtrefcount_key_addr(rblock, 1); + fpp = xfs_rtrefcount_droot_ptr_addr(dblock, 1, maxrecs); + tpp = xfs_rtrefcount_broot_ptr_addr(mp, rblock, 1, rblocklen); + numrecs = be16_to_cpu(dblock->bb_numrecs); + memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrefcount_droot_rec_addr(dblock, 1); + trp = xfs_rtrefcount_rec_addr(rblock, 1); + numrecs = be16_to_cpu(dblock->bb_numrecs); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* Load a realtime reference count btree root in from disk. 
*/ +int +xfs_iformat_rtrefcount( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_rtrefcount_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + unsigned int numrecs; + unsigned int level; + int dsize; + + dsize = XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK); + numrecs = be16_to_cpu(dfp->bb_numrecs); + level = be16_to_cpu(dfp->bb_level); + + if (level > mp->m_rtrefc_maxlevels || + xfs_rtrefcount_droot_space_calc(level, numrecs) > dsize) + return -EFSCORRUPTED; + + xfs_iroot_alloc(ip, XFS_DATA_FORK, + xfs_rtrefcount_broot_space_calc(mp, level, numrecs)); + xfs_rtrefcountbt_from_disk(ip, dfp, dsize, ifp->if_broot); + return 0; +} + +/* + * Convert in-memory form of btree root to on-disk form. + */ +void +xfs_rtrefcountbt_to_disk( + struct xfs_mount *mp, + struct xfs_btree_block *rblock, + int rblocklen, + struct xfs_rtrefcount_root *dblock, + int dblocklen) +{ + struct xfs_refcount_key *fkp; + __be64 *fpp; + struct xfs_refcount_key *tkp; + __be64 *tpp; + struct xfs_refcount_rec *frp; + struct xfs_refcount_rec *trp; + unsigned int maxrecs; + unsigned int numrecs; + + ASSERT(rblock->bb_magic == cpu_to_be32(XFS_RTREFC_CRC_MAGIC)); + ASSERT(uuid_equal(&rblock->bb_u.l.bb_uuid, &mp->m_sb.sb_meta_uuid)); + ASSERT(rblock->bb_u.l.bb_blkno == cpu_to_be64(XFS_BUF_DADDR_NULL)); + ASSERT(rblock->bb_u.l.bb_leftsib == cpu_to_be64(NULLFSBLOCK)); + ASSERT(rblock->bb_u.l.bb_rightsib == cpu_to_be64(NULLFSBLOCK)); + + dblock->bb_level = rblock->bb_level; + dblock->bb_numrecs = rblock->bb_numrecs; + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrefcountbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrefcount_key_addr(rblock, 1); + tkp = xfs_rtrefcount_droot_key_addr(dblock, 1); + fpp = xfs_rtrefcount_broot_ptr_addr(mp, rblock, 1, rblocklen); + tpp = xfs_rtrefcount_droot_ptr_addr(dblock, 1, maxrecs); + numrecs = be16_to_cpu(rblock->bb_numrecs); + memcpy(tkp, fkp, 2 * 
sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrefcount_rec_addr(rblock, 1); + trp = xfs_rtrefcount_droot_rec_addr(dblock, 1); + numrecs = be16_to_cpu(rblock->bb_numrecs); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* Flush a realtime reference count btree root out to disk. */ +void +xfs_iflush_rtrefcount( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_rtrefcount_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + + ASSERT(ifp->if_broot != NULL); + ASSERT(ifp->if_broot_bytes > 0); + ASSERT(xfs_rtrefcount_droot_space(ifp->if_broot) <= + xfs_inode_fork_size(ip, XFS_DATA_FORK)); + xfs_rtrefcountbt_to_disk(ip->i_mount, ifp->if_broot, + ifp->if_broot_bytes, dfp, + XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); +} diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.h b/fs/xfs/libxfs/xfs_rtrefcount_btree.h index ffda0b063bcf..d2fe2004568d 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.h +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.h @@ -27,6 +27,7 @@ void xfs_rtrefcountbt_commit_staged_btree(struct xfs_btree_cur *cur, unsigned int xfs_rtrefcountbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, bool leaf); void xfs_rtrefcountbt_compute_maxlevels(struct xfs_mount *mp); +unsigned int xfs_rtrefcountbt_droot_maxrecs(unsigned int blocklen, bool leaf); /* * Addresses of records, keys, and pointers within an incore rtrefcountbt block. @@ -74,4 +75,115 @@ int xfs_rtrefcountbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, xfs_filblks_t xfs_rtrefcountbt_calc_reserves(struct xfs_mount *mp); +/* Addresses of key, pointers, and records within an ondisk rtrefcount block. 
*/ + +static inline struct xfs_refcount_rec * +xfs_rtrefcount_droot_rec_addr( + struct xfs_rtrefcount_root *block, + unsigned int index) +{ + return (struct xfs_refcount_rec *) + ((char *)(block + 1) + + (index - 1) * sizeof(struct xfs_refcount_rec)); +} + +static inline struct xfs_refcount_key * +xfs_rtrefcount_droot_key_addr( + struct xfs_rtrefcount_root *block, + unsigned int index) +{ + return (struct xfs_refcount_key *) + ((char *)(block + 1) + + (index - 1) * sizeof(struct xfs_refcount_key)); +} + +static inline xfs_rtrefcount_ptr_t * +xfs_rtrefcount_droot_ptr_addr( + struct xfs_rtrefcount_root *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrefcount_ptr_t *) + ((char *)(block + 1) + + maxrecs * sizeof(struct xfs_refcount_key) + + (index - 1) * sizeof(xfs_rtrefcount_ptr_t)); +} + +/* + * Address of pointers within the incore btree root. + * + * These are to be used when we know the size of the block and + * we don't have a cursor. + */ +static inline xfs_rtrefcount_ptr_t * +xfs_rtrefcount_broot_ptr_addr( + struct xfs_mount *mp, + struct xfs_btree_block *bb, + unsigned int index, + unsigned int block_size) +{ + return xfs_rtrefcount_ptr_addr(bb, index, + xfs_rtrefcountbt_maxrecs(mp, block_size, false)); +} + +/* + * Compute the space required for the incore btree root containing the given + * number of records. + */ +static inline size_t +xfs_rtrefcount_broot_space_calc( + struct xfs_mount *mp, + unsigned int level, + unsigned int nrecs) +{ + size_t sz = XFS_RTREFCOUNT_BLOCK_LEN; + + if (level > 0) + return sz + nrecs * (sizeof(struct xfs_refcount_key) + + sizeof(xfs_rtrefcount_ptr_t)); + return sz + nrecs * sizeof(struct xfs_refcount_rec); +} + +/* + * Compute the space required for the incore btree root given the ondisk + * btree root block. 
+ */ +static inline size_t +xfs_rtrefcount_broot_space(struct xfs_mount *mp, struct xfs_rtrefcount_root *bb) +{ + return xfs_rtrefcount_broot_space_calc(mp, be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +/* Compute the space required for the ondisk root block. */ +static inline size_t +xfs_rtrefcount_droot_space_calc( + unsigned int level, + unsigned int nrecs) +{ + size_t sz = sizeof(struct xfs_rtrefcount_root); + + if (level > 0) + return sz + nrecs * (sizeof(struct xfs_refcount_key) + + sizeof(xfs_rtrefcount_ptr_t)); + return sz + nrecs * sizeof(struct xfs_refcount_rec); +} + +/* + * Compute the space required for the ondisk root block given an incore root + * block. + */ +static inline size_t +xfs_rtrefcount_droot_space(struct xfs_btree_block *bb) +{ + return xfs_rtrefcount_droot_space_calc(be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +int xfs_iformat_rtrefcount(struct xfs_inode *ip, struct xfs_dinode *dip); +void xfs_rtrefcountbt_to_disk(struct xfs_mount *mp, + struct xfs_btree_block *rblock, int rblocklen, + struct xfs_rtrefcount_root *dblock, int dblocklen); +void xfs_iflush_rtrefcount(struct xfs_inode *ip, struct xfs_dinode *dip); + #endif /* __XFS_RTREFCOUNT_BTREE_H__ */ diff --git a/fs/xfs/xfs_inode_item_recover.c b/fs/xfs/xfs_inode_item_recover.c index feeba1dff01e..f13bf35793f1 100644 --- a/fs/xfs/xfs_inode_item_recover.c +++ b/fs/xfs/xfs_inode_item_recover.c @@ -23,6 +23,7 @@ #include "xfs_icache.h" #include "xfs_bmap_btree.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" STATIC void xlog_recover_inode_ra_pass2( @@ -284,6 +285,9 @@ xlog_recover_inode_dbroot( case XFS_DINODE_FMT_RMAP: xfs_rtrmapbt_to_disk(mp, src, len, dfork, dsize); break; + case XFS_DINODE_FMT_REFCOUNT: + xfs_rtrefcountbt_to_disk(mp, src, len, dfork, dsize); + break; default: ASSERT(0); return -EFSCORRUPTED; diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h index 94bbb6351d3d..7c14dd104191 100644 --- a/fs/xfs/xfs_ondisk.h 
+++ b/fs/xfs/xfs_ondisk.h @@ -80,6 +80,7 @@ xfs_check_ondisk_structs(void) XFS_CHECK_STRUCT_SIZE(xfs_rtrmap_ptr_t, 8); XFS_CHECK_STRUCT_SIZE(struct xfs_rtrmap_root, 4); XFS_CHECK_STRUCT_SIZE(xfs_rtrefcount_ptr_t, 8); + XFS_CHECK_STRUCT_SIZE(struct xfs_rtrefcount_root, 4); /* * m68k has problems with xfs_attr_leaf_name_remote_t, but we pad it to ^ permalink raw reply related [flat|nested] 565+ messages in thread
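For readers following the size arithmetic in the patch above, here is a small userspace sketch of the on-disk root space calculation and the bounds check that xfs_iformat_rtrefcount performs before loading the root. It is an illustration under stated assumptions, not the kernel implementation. The 4-byte root header and 8-byte pointer match the xfs_ondisk.h checks in the patch; the 12-byte record and 4-byte key sizes are assumptions taken from the existing refcount btree format and are not shown in this patch. All DEMO_* and demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_ROOT_HDR_LEN	4	/* struct xfs_rtrefcount_root, per patch */
#define DEMO_PTR_LEN		8	/* xfs_rtrefcount_ptr_t, per patch */
#define DEMO_REC_LEN		12	/* assumed sizeof(struct xfs_refcount_rec) */
#define DEMO_KEY_LEN		4	/* assumed sizeof(struct xfs_refcount_key) */

/* Bytes needed for an on-disk root holding nrecs entries at this level. */
static inline size_t
demo_droot_space_calc(unsigned int level, unsigned int nrecs)
{
	size_t	sz = DEMO_ROOT_HDR_LEN;

	/* Node roots store one key and one pointer per entry. */
	if (level > 0)
		return sz + nrecs * (DEMO_KEY_LEN + DEMO_PTR_LEN);
	/* Leaf roots store full records. */
	return sz + nrecs * DEMO_REC_LEN;
}

/*
 * Same shape as the corruption check in xfs_iformat_rtrefcount: the level
 * must not exceed the maximum and the root must fit in the data fork.
 */
static inline bool
demo_droot_fits(unsigned int level, unsigned int nrecs,
		unsigned int maxlevels, size_t dsize)
{
	if (level > maxlevels)
		return false;
	return demo_droot_space_calc(level, nrecs) <= dsize;
}
```

With these assumed sizes a leaf entry and a node entry both cost 12 bytes, so the two branches happen to give the same total for the same nrecs.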
* [PATCH 20/42] xfs: enable sharing of realtime file blocks 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (12 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 13/42] xfs: wire up a new inode fork type for the realtime refcount Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 19/42] xfs: enable CoW for realtime data Darrick J. Wong ` (27 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Update the remapping routines to be able to handle realtime files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_reflink.c | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 3b5d144bef41..3cead39e4308 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -33,6 +33,7 @@ #include "xfs_rtrefcount_btree.h" #include "xfs_rtalloc.h" #include "xfs_rtgroup.h" +#include "xfs_imeta.h" /* * Copy on Write of Shared Blocks @@ -1207,14 +1208,29 @@ xfs_reflink_update_dest( static int xfs_reflink_ag_has_free_space( struct xfs_mount *mp, - xfs_agnumber_t agno) + struct xfs_inode *ip, + xfs_fsblock_t fsb) { struct xfs_perag *pag; + xfs_agnumber_t agno; int error = 0; if (!xfs_has_rmapbt(mp)) return 0; + if (XFS_IS_REALTIME_INODE(ip)) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + rgno = xfs_rtb_to_rgno(mp, fsb); + rtg = xfs_rtgroup_get(mp, rgno); + if (xfs_imeta_resv_critical(rtg->rtg_rmapip) || + xfs_imeta_resv_critical(rtg->rtg_refcountip)) + error = -ENOSPC; + xfs_rtgroup_put(rtg); + return error; + } + + agno = XFS_FSB_TO_AGNO(mp, fsb); pag = xfs_perag_get(mp, agno); if (xfs_ag_resv_critical(pag, XFS_AG_RESV_RMAPBT) || xfs_ag_resv_critical(pag, XFS_AG_RESV_METADATA)) @@ -1328,8 +1344,8 @@ xfs_reflink_remap_extent( /* No reflinking if the AG of 
the dest mapping is low on space. */ if (dmap_written) { - error = xfs_reflink_ag_has_free_space(mp, - XFS_FSB_TO_AGNO(mp, dmap->br_startblock)); + error = xfs_reflink_ag_has_free_space(mp, ip, + dmap->br_startblock); if (error) goto out_cancel; } @@ -1589,8 +1605,8 @@ xfs_reflink_remap_prep( /* Check file eligibility and prepare for block sharing. */ ret = -EINVAL; - /* Don't reflink realtime inodes */ - if (XFS_IS_REALTIME_INODE(src) || XFS_IS_REALTIME_INODE(dest)) + /* Can't reflink between data and rt volumes */ + if (XFS_IS_REALTIME_INODE(src) != XFS_IS_REALTIME_INODE(dest)) goto out_unlock; /* Don't share DAX file data with non-DAX file. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
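Annotation on the xfs_reflink_remap_prep hunk above: the old test refused to remap if either file was on the realtime volume, whereas the new test refuses only when the two files are on different volumes. A minimal standalone sketch of the two predicates follows. It is not kernel code; `struct file_model` and both function names are hypothetical stand-ins for `struct xfs_inode` and the XFS_IS_REALTIME_INODE() checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct xfs_inode; only the rt flag matters. */
struct file_model {
	bool is_realtime;
};

/* Old rule: refuse if either file lives on the realtime volume. */
static bool can_remap_old(const struct file_model *src,
			  const struct file_model *dest)
{
	assert(src && dest);
	return !(src->is_realtime || dest->is_realtime);
}

/* New rule: refuse only if the files live on different volumes. */
static bool can_remap_new(const struct file_model *src,
			  const struct file_model *dest)
{
	assert(src && dest);
	return src->is_realtime == dest->is_realtime;
}
```

Under the new rule a realtime-to-realtime remap is accepted, while data-to-realtime and realtime-to-data remaps are still rejected.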
* [PATCH 19/42] xfs: enable CoW for realtime data 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (13 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 20/42] xfs: enable sharing of realtime file blocks Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 15/42] xfs: create routine to allocate and initialize a realtime refcount btree inode Darrick J. Wong ` (26 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Update our write paths to support copy on write on the rt volume. This works in more or less the same way as it does on the data device, with the major exception that we never do delalloc on the rt volume. Because we consider unwritten CoW fork staging extents to be incore quota reservation, we update xfs_quota_reserve_blkres to support this case. Though xfs doesn't allow rt and quota together, the change is trivial and we shouldn't leave a logic bomb here. While we're at it, add a missing xfs_mod_delalloc call when we remove delalloc block reservation from the inode. This is largely irrelevant since realtime files do not use delalloc, but we want to avoid leaving logic bombs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_bmap_util.c | 61 ++++++++++++++++++++++++++++++++++++++-------- fs/xfs/xfs_quota.h | 6 +---- fs/xfs/xfs_reflink.c | 36 +++++++++++++++++++++------ fs/xfs/xfs_trans_dquot.c | 11 ++++++++ 4 files changed, 90 insertions(+), 24 deletions(-) diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c index 447c057c9331..842f472292cd 100644 --- a/fs/xfs/xfs_bmap_util.c +++ b/fs/xfs/xfs_bmap_util.c @@ -71,6 +71,55 @@ xfs_zero_extent( } #ifdef CONFIG_XFS_RT + +/* Update all inode and quota accounting for the allocation we just did.
*/ +static void +xfs_bmap_rtalloc_accounting( + struct xfs_bmalloca *ap) +{ + if (ap->flags & XFS_BMAPI_COWFORK) { + /* + * COW fork blocks are in-core only and thus are treated as + * in-core quota reservation (like delalloc blocks) even when + * converted to real blocks. The quota reservation is not + * accounted to disk until blocks are remapped to the data + * fork. So if these blocks were previously delalloc, we + * already have quota reservation and there's nothing to do + * yet. + */ + if (ap->wasdel) { + xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)ap->length); + return; + } + + /* + * Otherwise, we've allocated blocks in a hole. The transaction + * has acquired in-core quota reservation for this extent. + * Rather than account these as real blocks, however, we reduce + * the transaction quota reservation based on the allocation. + * This essentially transfers the transaction quota reservation + * to that of a delalloc extent. + */ + ap->ip->i_delayed_blks += ap->length; + xfs_trans_mod_dquot_byino(ap->tp, ap->ip, + XFS_TRANS_DQ_RES_RTBLKS, -(long)ap->length); + return; + } + + /* data fork only */ + ap->ip->i_nblocks += ap->length; + xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE); + if (ap->wasdel) { + ap->ip->i_delayed_blks -= ap->length; + xfs_mod_delalloc(ap->ip->i_mount, -(int64_t)ap->length); + } + + /* Adjust the disk quota also. This was reserved earlier. */ + xfs_trans_mod_dquot_byino(ap->tp, ap->ip, + ap->wasdel ? XFS_TRANS_DQ_DELRTBCOUNT : + XFS_TRANS_DQ_RTBCOUNT, ap->length); +} + int xfs_bmap_rtalloc( struct xfs_bmalloca *ap) @@ -166,17 +215,7 @@ xfs_bmap_rtalloc( if (rtx != NULLRTEXTNO) { ap->blkno = xfs_rtx_to_rtb(mp, rtx); ap->length = xfs_rtxlen_to_extlen(mp, ralen); - ap->ip->i_nblocks += ap->length; - xfs_trans_log_inode(ap->tp, ap->ip, XFS_ILOG_CORE); - if (ap->wasdel) - ap->ip->i_delayed_blks -= ap->length; - /* - * Adjust the disk quota also. This was reserved - * earlier. 
- */ - xfs_trans_mod_dquot_byino(ap->tp, ap->ip, - ap->wasdel ? XFS_TRANS_DQ_DELRTBCOUNT : - XFS_TRANS_DQ_RTBCOUNT, ap->length); + xfs_bmap_rtalloc_accounting(ap); return 0; } diff --git a/fs/xfs/xfs_quota.h b/fs/xfs/xfs_quota.h index 0cb52d5be4aa..fa34d997b747 100644 --- a/fs/xfs/xfs_quota.h +++ b/fs/xfs/xfs_quota.h @@ -124,11 +124,7 @@ int xfs_qm_mount_quotas(struct xfs_mount *mp); extern void xfs_qm_unmount(struct xfs_mount *); extern void xfs_qm_unmount_quotas(struct xfs_mount *); -static inline int -xfs_quota_reserve_blkres(struct xfs_inode *ip, int64_t blocks) -{ - return xfs_trans_reserve_quota_nblks(NULL, ip, blocks, 0, false); -} +int xfs_quota_reserve_blkres(struct xfs_inode *ip, int64_t blocks); bool xfs_inode_near_dquot_enforcement(struct xfs_inode *ip, xfs_dqtype_t type); # ifdef CONFIG_XFS_LIVE_HOOKS diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 455adcce994d..3b5d144bef41 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -434,20 +434,26 @@ xfs_reflink_fill_cow_hole( struct xfs_mount *mp = ip->i_mount; struct xfs_trans *tp; xfs_filblks_t resaligned; - xfs_extlen_t resblks; + unsigned int dblocks = 0, rblocks = 0; int nimaps; int error; bool found; resaligned = xfs_aligned_fsb_count(imap->br_startoff, imap->br_blockcount, xfs_get_cowextsz_hint(ip)); - resblks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned); + if (XFS_IS_REALTIME_INODE(ip)) { + dblocks = XFS_DIOSTRAT_SPACE_RES(mp, 0); + rblocks = resaligned; + } else { + dblocks = XFS_DIOSTRAT_SPACE_RES(mp, resaligned); + rblocks = 0; + } xfs_iunlock(ip, *lockmode); *lockmode = 0; - error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, resblks, 0, - false, &tp); + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, dblocks, + rblocks, false, &tp); if (error) return error; @@ -1232,7 +1238,7 @@ xfs_reflink_remap_extent( struct xfs_trans *tp; xfs_off_t newlen; int64_t qdelta = 0; - unsigned int resblks; + unsigned int dblocks, rblocks, resblks; bool quota_reserved = true; bool 
smap_real; bool dmap_written = xfs_bmap_is_written_extent(dmap); @@ -1263,8 +1269,15 @@ xfs_reflink_remap_extent( * we're remapping. */ resblks = XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK); + if (XFS_IS_REALTIME_INODE(ip)) { + dblocks = resblks; + rblocks = dmap->br_blockcount; + } else { + dblocks = resblks + dmap->br_blockcount; + rblocks = 0; + } error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, - resblks + dmap->br_blockcount, 0, false, &tp); + dblocks, rblocks, false, &tp); if (error == -EDQUOT || error == -ENOSPC) { quota_reserved = false; error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_write, @@ -1344,8 +1357,15 @@ xfs_reflink_remap_extent( * done. */ if (!quota_reserved && !smap_real && dmap_written) { - error = xfs_trans_reserve_quota_nblks(tp, ip, - dmap->br_blockcount, 0, false); + if (XFS_IS_REALTIME_INODE(ip)) { + dblocks = 0; + rblocks = dmap->br_blockcount; + } else { + dblocks = dmap->br_blockcount; + rblocks = 0; + } + error = xfs_trans_reserve_quota_nblks(tp, ip, dblocks, rblocks, + false); if (error) goto out_cancel; } diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c index f5e9d76fb9a2..31ab1c5d6b13 100644 --- a/fs/xfs/xfs_trans_dquot.c +++ b/fs/xfs/xfs_trans_dquot.c @@ -1009,3 +1009,14 @@ xfs_trans_free_dqinfo( kmem_cache_free(xfs_dqtrx_cache, tp->t_dqinfo); tp->t_dqinfo = NULL; } + +int +xfs_quota_reserve_blkres( + struct xfs_inode *ip, + int64_t blocks) +{ + if (XFS_IS_REALTIME_INODE(ip)) + return xfs_trans_reserve_quota_nblks(NULL, ip, 0, blocks, + false); + return xfs_trans_reserve_quota_nblks(NULL, ip, blocks, 0, false); +} ^ permalink raw reply related [flat|nested] 565+ messages in thread
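Annotation on the reservation hunks above: in xfs_reflink_remap_extent the patch keeps the mapping overhead on the data device and charges the file blocks themselves to whichever volume backs the file. The sketch below models only that split. `struct resv_model` and `split_remap_resv` are hypothetical names for illustration, not kernel interfaces.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pair of block reservations: data device and rt volume. */
struct resv_model {
	unsigned int dblocks;
	unsigned int rblocks;
};

/*
 * Split a remap reservation as the patch does: bmbt overhead always comes
 * from the data device, the file blocks come from the backing volume.
 */
static struct resv_model split_remap_resv(bool is_realtime,
					  unsigned int overhead,
					  unsigned int len)
{
	struct resv_model r;

	if (is_realtime) {
		r.dblocks = overhead;
		r.rblocks = len;
	} else {
		r.dblocks = overhead + len;
		r.rblocks = 0;
	}

	/* Every requested block is charged to exactly one device. */
	assert(r.dblocks + r.rblocks == overhead + len);
	return r;
}
```

The same shape appears in the later quota-only retry path, where the overhead term is zero and only the file blocks are split.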
* [PATCH 15/42] xfs: create routine to allocate and initialize a realtime refcount btree inode 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (14 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 19/42] xfs: enable CoW for realtime data Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 16/42] xfs: update rmap to allow cow staging extents in the rt rmap Darrick J. Wong ` (25 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a library routine to allocate and initialize an empty realtime refcountbt inode. We'll use this for growfs, mkfs, and repair. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtrefcount_btree.c | 41 ++++++++++++++++++++++++++++++++++ fs/xfs/libxfs/xfs_rtrefcount_btree.h | 6 +++++ 2 files changed, 47 insertions(+) diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c index a43ee6d7b547..0a6fa9851371 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.c +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c @@ -765,3 +765,44 @@ xfs_iflush_rtrefcount( ifp->if_broot_bytes, dfp, XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); } + +/* + * Create a realtime refcount btree inode. + * + * Regardless of the return value, the caller must clean up @ic. If a new + * inode is returned through *ipp, the caller must finish setting up the incore + * inode and release it. 
+ */ +int +xfs_rtrefcountbt_create( + struct xfs_trans **tpp, + struct xfs_imeta_path *path, + struct xfs_imeta_update *upd, + struct xfs_inode **ipp) +{ + struct xfs_mount *mp = (*tpp)->t_mountp; + struct xfs_ifork *ifp; + struct xfs_inode *ip; + int error; + + *ipp = NULL; + + error = xfs_imeta_create(tpp, path, S_IFREG, 0, &ip, upd); + if (error) + return error; + + ifp = &ip->i_df; + ifp->if_format = XFS_DINODE_FMT_REFCOUNT; + ASSERT(ifp->if_broot_bytes == 0); + ASSERT(ifp->if_bytes == 0); + + /* Initialize the empty incore btree root. */ + xfs_iroot_alloc(ip, XFS_DATA_FORK, + xfs_rtrefcount_broot_space_calc(mp, 0, 0)); + xfs_btree_init_block(ip->i_mount, ifp->if_broot, &xfs_rtrefcountbt_ops, + 0, 0, ip->i_ino); + xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE | XFS_ILOG_DBROOT); + + *ipp = ip; + return 0; +} diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.h b/fs/xfs/libxfs/xfs_rtrefcount_btree.h index d2fe2004568d..86a547529c9d 100644 --- a/fs/xfs/libxfs/xfs_rtrefcount_btree.h +++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.h @@ -186,4 +186,10 @@ void xfs_rtrefcountbt_to_disk(struct xfs_mount *mp, struct xfs_rtrefcount_root *dblock, int dblocklen); void xfs_iflush_rtrefcount(struct xfs_inode *ip, struct xfs_dinode *dip); +struct xfs_imeta_update; + +int xfs_rtrefcountbt_create(struct xfs_trans **tpp, + struct xfs_imeta_path *path, struct xfs_imeta_update *ic, + struct xfs_inode **ipp); + #endif /* __XFS_RTREFCOUNT_BTREE_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 16/42] xfs: update rmap to allow cow staging extents in the rt rmap 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (15 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 15/42] xfs: create routine to allocate and initialize a realtime refcount btree inode Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 17/42] xfs: compute rtrmap btree max levels when reflink enabled Darrick J. Wong ` (24 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Don't error out on CoW staging extent records when realtime reflink is enabled. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rmap.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/fs/xfs/libxfs/xfs_rmap.c b/fs/xfs/libxfs/xfs_rmap.c index a533588a9b5b..891af03afccc 100644 --- a/fs/xfs/libxfs/xfs_rmap.c +++ b/fs/xfs/libxfs/xfs_rmap.c @@ -274,6 +274,7 @@ xfs_rmap_check_rtgroup_irec( bool is_unwritten; bool is_bmbt; bool is_attr; + bool is_cow; if (irec->rm_blockcount == 0) return __this_address; @@ -285,6 +286,12 @@ xfs_rmap_check_rtgroup_irec( return __this_address; if (irec->rm_offset != 0) return __this_address; + } else if (irec->rm_owner == XFS_RMAP_OWN_COW) { + if (!xfs_has_rtreflink(mp)) + return __this_address; + if (!xfs_verify_rgbext(rtg, irec->rm_startblock, + irec->rm_blockcount)) + return __this_address; } else { if (!xfs_verify_rgbext(rtg, irec->rm_startblock, irec->rm_blockcount)) @@ -301,8 +308,10 @@ xfs_rmap_check_rtgroup_irec( is_bmbt = irec->rm_flags & XFS_RMAP_BMBT_BLOCK; is_attr = irec->rm_flags & XFS_RMAP_ATTR_FORK; is_unwritten = irec->rm_flags & XFS_RMAP_UNWRITTEN; + is_cow = xfs_has_rtreflink(mp) && + irec->rm_owner == XFS_RMAP_OWN_COW; - if (!is_inode && irec->rm_owner != XFS_RMAP_OWN_FS) + if (!is_inode && !is_cow 
&& irec->rm_owner != XFS_RMAP_OWN_FS) return __this_address; if (!is_inode && irec->rm_offset != 0) @@ -314,6 +323,9 @@ xfs_rmap_check_rtgroup_irec( if (is_unwritten && !is_inode) return __this_address; + if (is_unwritten && is_cow) + return __this_address; + /* Check for a valid fork offset, if applicable. */ if (is_inode && !xfs_verify_fileext(mp, irec->rm_offset, irec->rm_blockcount)) ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 17/42] xfs: compute rtrmap btree max levels when reflink enabled 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (16 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 16/42] xfs: update rmap to allow cow staging extents in the rt rmap Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 18/42] xfs: refactor reflink quota updates Darrick J. Wong ` (23 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Compute the maximum possible height of the realtime rmap btree when reflink is enabled. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_rtrmap_btree.c | 28 ++++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/fs/xfs/libxfs/xfs_rtrmap_btree.c b/fs/xfs/libxfs/xfs_rtrmap_btree.c index 878bfeed411f..35ae3171a0cc 100644 --- a/fs/xfs/libxfs/xfs_rtrmap_btree.c +++ b/fs/xfs/libxfs/xfs_rtrmap_btree.c @@ -737,6 +737,7 @@ xfs_rtrmapbt_maxrecs( unsigned int xfs_rtrmapbt_maxlevels_ondisk(void) { + unsigned long long max_dblocks; unsigned int minrecs[2]; unsigned int blocklen; @@ -745,8 +746,20 @@ xfs_rtrmapbt_maxlevels_ondisk(void) minrecs[0] = xfs_rtrmapbt_block_maxrecs(blocklen, true) / 2; minrecs[1] = xfs_rtrmapbt_block_maxrecs(blocklen, false) / 2; - /* We need at most one record for every block in an rt group. */ - return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_RGBLOCKS); + /* + * Compute the asymptotic maxlevels for an rtrmapbt on any rtreflink fs. + * + * On a reflink filesystem, each block in an rtgroup can have up to + * 2^32 (per the refcount record format) owners, which means that + * theoretically we could face up to 2^64 rmap records. 
However, we're + * likely to run out of blocks in the data device long before that + * happens, which means that we must compute the max height based on + * what the btree will look like if it consumes almost all the blocks + * in the data device due to maximal sharing factor. + */ + max_dblocks = -1U; /* max ag count */ + max_dblocks *= XFS_MAX_CRC_AG_BLOCKS; + return xfs_btree_space_to_height(minrecs, max_dblocks); } int __init @@ -785,9 +798,20 @@ xfs_rtrmapbt_compute_maxlevels( * maximum height is constrained by the size of the data device and * the height required to store one rmap record for each block in an * rt group. + * + * On a reflink filesystem, each rt block can have up to 2^32 (per the + * refcount record format) owners, which means that theoretically we + * could face up to 2^64 rmap records. This makes the computation of + * maxlevels based on record count meaningless, so we only consider the + * size of the data device. */ d_maxlevels = xfs_btree_space_to_height(mp->m_rtrmap_mnr, mp->m_sb.sb_dblocks); + if (xfs_has_rtreflink(mp)) { + mp->m_rtrmap_maxlevels = d_maxlevels + 1; + return; + } + r_maxlevels = xfs_btree_compute_maxlevels(mp->m_rtrmap_mnr, mp->m_sb.sb_rgblocks); ^ permalink raw reply related [flat|nested] 565+ messages in thread
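Annotation on the maxlevels computation above: without reflink, the height bound comes from a record count (one rmap record per rt group block). The sketch below shows that style of calculation in isolation, assuming the worst case where every block holds only its minimum number of records. `height_for_records` is a hypothetical name, and this is a simplified model rather than a copy of the kernel helper. With rtreflink the patch abandons the record-count bound, because the record count can reach 2^64, and bounds the height by data device size instead.

```c
#include <assert.h>

/*
 * Worst-case height of a btree holding `records` records when every leaf
 * holds only minrecs[0] records and every interior node holds only
 * minrecs[1] pointers.
 */
static unsigned int height_for_records(const unsigned int minrecs[2],
				       unsigned long long records)
{
	unsigned long long blocks;
	unsigned int height = 1;

	/* minrecs[1] must exceed 1 or the loop below would never shrink. */
	assert(minrecs[0] > 0 && minrecs[1] > 1);

	/* Leaf level: records / minrecs[0], rounded up. */
	blocks = records / minrecs[0] + (records % minrecs[0] != 0);

	/* Add interior levels until a single root block remains. */
	while (blocks > 1) {
		blocks = blocks / minrecs[1] + (blocks % minrecs[1] != 0);
		height++;
	}
	return height;
}
```

Because the height grows only logarithmically with the record count, the large jump in possible records under reflink still adds only a few levels, but the bound has to come from somewhere finite, hence the switch to available space.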
* [PATCH 18/42] xfs: refactor reflink quota updates 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (17 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 17/42] xfs: compute rtrmap btree max levels when reflink enabled Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 14/42] xfs: wire up realtime refcount btree cursors Darrick J. Wong ` (22 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Hoist all quota updates for reflink into a helper function, since things are about to become more complicated. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_reflink.c | 37 ++++++++++++++++++++++++++++++++----- 1 file changed, 32 insertions(+), 5 deletions(-) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 1a8a254c81f4..455adcce994d 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -750,6 +750,35 @@ xfs_reflink_cancel_cow_range( return error; } +#ifdef CONFIG_XFS_QUOTA +/* + * Update quota accounting for a remapping operation. When we're remapping + * something from the CoW fork to the data fork, we must update the quota + * accounting for delayed allocations. For remapping from the data fork to the + * data fork, use regular block accounting. + */ +static inline void +xfs_reflink_update_quota( + struct xfs_trans *tp, + struct xfs_inode *ip, + bool is_cow, + int64_t blocks) +{ + unsigned int qflag; + + if (XFS_IS_REALTIME_INODE(ip)) { + qflag = is_cow ? XFS_TRANS_DQ_DELRTBCOUNT : + XFS_TRANS_DQ_RTBCOUNT; + } else { + qflag = is_cow ? XFS_TRANS_DQ_DELBCOUNT : + XFS_TRANS_DQ_BCOUNT; + } + xfs_trans_mod_dquot_byino(tp, ip, qflag, blocks); +} +#else +# define xfs_reflink_update_quota(tp, ip, is_cow, blocks) ((void)0) +#endif + /* * Remap part of the CoW fork into the data fork. 
* @@ -852,8 +881,7 @@ xfs_reflink_end_cow_extent( */ xfs_bmap_unmap_extent(tp, ip, XFS_DATA_FORK, &data); xfs_refcount_decrease_extent(tp, isrt, &data); - xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, - -data.br_blockcount); + xfs_reflink_update_quota(tp, ip, false, -data.br_blockcount); } else if (data.br_startblock == DELAYSTARTBLOCK) { int done; @@ -878,8 +906,7 @@ xfs_reflink_end_cow_extent( xfs_bmap_map_extent(tp, ip, XFS_DATA_FORK, &del); /* Charge this new data fork mapping to the on-disk quota. */ - xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_DELBCOUNT, - (long)del.br_blockcount); + xfs_reflink_update_quota(tp, ip, true, del.br_blockcount); /* Remove the mapping from the CoW fork. */ xfs_bmap_del_extent_cow(ip, &icur, &got, &del); @@ -1369,7 +1396,7 @@ xfs_reflink_remap_extent( qdelta += dmap->br_blockcount; } - xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, qdelta); + xfs_reflink_update_quota(tp, ip, false, qdelta); /* Update dest isize if needed. */ newlen = XFS_FSB_TO_B(mp, dmap->br_startoff + dmap->br_blockcount); ^ permalink raw reply related [flat|nested] 565+ messages in thread
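Annotation on xfs_reflink_update_quota above: the helper picks one of four quota counters from two booleans, the volume backing the file and whether the blocks come from the CoW fork. The table can be sketched on its own as follows. The enum labels are hypothetical and merely mirror the XFS_TRANS_DQ_* names used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels mirroring the four counters used in the patch. */
enum quota_counter {
	Q_BCOUNT,	/* data device, real blocks */
	Q_DELBCOUNT,	/* data device, delalloc reservation */
	Q_RTBCOUNT,	/* rt volume, real blocks */
	Q_DELRTBCOUNT,	/* rt volume, delalloc reservation */
};

static enum quota_counter pick_quota_counter(bool is_realtime, bool is_cow)
{
	if (is_realtime)
		return is_cow ? Q_DELRTBCOUNT : Q_RTBCOUNT;
	return is_cow ? Q_DELBCOUNT : Q_BCOUNT;
}

/* Usage: each combination of inputs maps to a distinct counter. */
static void check_quota_counter_table(void)
{
	assert(pick_quota_counter(false, false) == Q_BCOUNT);
	assert(pick_quota_counter(false, true) == Q_DELBCOUNT);
	assert(pick_quota_counter(true, false) == Q_RTBCOUNT);
	assert(pick_quota_counter(true, true) == Q_DELRTBCOUNT);
}
```

Centralizing the choice keeps the three call sites in the patch from each having to know about the realtime counters.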
* [PATCH 14/42] xfs: wire up realtime refcount btree cursors 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (18 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 18/42] xfs: refactor reflink quota updates Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 21/42] xfs: allow inodes to have the realtime and reflink flags Darrick J. Wong ` (21 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Wire up realtime refcount btree cursors wherever they're needed throughout the code base. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_refcount.c | 7 ++- fs/xfs/libxfs/xfs_rtgroup.c | 10 ++++ fs/xfs/libxfs/xfs_rtgroup.h | 5 ++ fs/xfs/xfs_fsmap.c | 22 ++++++--- fs/xfs/xfs_reflink.c | 99 ++++++++++++++++++++++++++++++++++-------- 5 files changed, 111 insertions(+), 32 deletions(-) diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c index 999ba2c5c37d..c4ab749c78e4 100644 --- a/fs/xfs/libxfs/xfs_refcount.c +++ b/fs/xfs/libxfs/xfs_refcount.c @@ -26,6 +26,7 @@ #include "xfs_health.h" #include "xfs_rtgroup.h" #include "xfs_rtalloc.h" +#include "xfs_rtrefcount_btree.h" struct kmem_cache *xfs_refcount_intent_cache; @@ -1485,9 +1486,9 @@ xfs_refcount_finish_one( } if (rcur == NULL) { if (ri->ri_realtime) { - /* coming in a later patch */ - ASSERT(0); - return -EFSCORRUPTED; + xfs_rtgroup_lock(tp, ri->ri_rtg, XFS_RTGLOCK_REFCOUNT); + rcur = xfs_rtrefcountbt_init_cursor(mp, tp, ri->ri_rtg, + ri->ri_rtg->rtg_refcountip); } else { error = xfs_alloc_read_agf(ri->ri_pag, tp, XFS_ALLOC_FLAG_FREEING, &agbp); diff --git a/fs/xfs/libxfs/xfs_rtgroup.c b/fs/xfs/libxfs/xfs_rtgroup.c index bd878e65bc44..836b19e0406d 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.c +++ b/fs/xfs/libxfs/xfs_rtgroup.c @@ -524,6 +524,13 @@ xfs_rtgroup_lock( if 
(tp) xfs_trans_ijoin(tp, rtg->rtg_rmapip, XFS_ILOCK_EXCL); } + + if ((rtglock_flags & XFS_RTGLOCK_REFCOUNT) && rtg->rtg_refcountip) { + xfs_ilock(rtg->rtg_refcountip, XFS_ILOCK_EXCL); + if (tp) + xfs_trans_ijoin(tp, rtg->rtg_refcountip, + XFS_ILOCK_EXCL); + } } /* Unlock metadata inodes associated with this rt group. */ @@ -536,6 +543,9 @@ xfs_rtgroup_unlock( ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) || !(rtglock_flags & XFS_RTGLOCK_BITMAP)); + if ((rtglock_flags & XFS_RTGLOCK_REFCOUNT) && rtg->rtg_refcountip) + xfs_iunlock(rtg->rtg_refcountip, XFS_ILOCK_EXCL); + if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg->rtg_rmapip) xfs_iunlock(rtg->rtg_rmapip, XFS_ILOCK_EXCL); diff --git a/fs/xfs/libxfs/xfs_rtgroup.h b/fs/xfs/libxfs/xfs_rtgroup.h index 0f400f133d88..4f0358d63457 100644 --- a/fs/xfs/libxfs/xfs_rtgroup.h +++ b/fs/xfs/libxfs/xfs_rtgroup.h @@ -237,10 +237,13 @@ int xfs_rtgroup_init_secondary_super(struct xfs_mount *mp, xfs_rgnumber_t rgno, #define XFS_RTGLOCK_BITMAP_SHARED (1U << 1) /* Lock the rt rmap inode in exclusive mode */ #define XFS_RTGLOCK_RMAP (1U << 2) +/* Lock the rt refcount inode in exclusive mode */ +#define XFS_RTGLOCK_REFCOUNT (1U << 3) #define XFS_RTGLOCK_ALL_FLAGS (XFS_RTGLOCK_BITMAP | \ XFS_RTGLOCK_BITMAP_SHARED | \ - XFS_RTGLOCK_RMAP) + XFS_RTGLOCK_RMAP | \ + XFS_RTGLOCK_REFCOUNT) void xfs_rtgroup_lock(struct xfs_trans *tp, struct xfs_rtgroup *rtg, unsigned int rtglock_flags); diff --git a/fs/xfs/xfs_fsmap.c b/fs/xfs/xfs_fsmap.c index efbcc4b1d850..5f7e7ea2fde3 100644 --- a/fs/xfs/xfs_fsmap.c +++ b/fs/xfs/xfs_fsmap.c @@ -27,6 +27,7 @@ #include "xfs_ag.h" #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" /* Convert an xfs_fsmap to an fsmap. 
*/ static void @@ -209,14 +210,16 @@ xfs_getfsmap_is_shared( *stat = false; if (!xfs_has_reflink(mp)) return 0; - /* rt files will have no perag structure */ - if (!info->pag) - return 0; + + if (info->rtg) + cur = xfs_rtrefcountbt_init_cursor(mp, tp, info->rtg, + info->rtg->rtg_refcountip); + else + cur = xfs_refcountbt_init_cursor(mp, tp, info->agf_bp, + info->pag); /* Are there any shared blocks here? */ flen = 0; - cur = xfs_refcountbt_init_cursor(mp, tp, info->agf_bp, info->pag); - error = xfs_refcount_find_shared(cur, rec->rm_startblock, rec->rm_blockcount, &fbno, &flen, false); @@ -820,7 +823,8 @@ xfs_getfsmap_rtdev_rmapbt_query( return xfs_getfsmap_rtdev_helper(*curpp, &info->high, info); /* Query the rtrmapbt */ - xfs_rtgroup_lock(NULL, info->rtg, XFS_RTGLOCK_RMAP); + xfs_rtgroup_lock(NULL, info->rtg, XFS_RTGLOCK_RMAP | + XFS_RTGLOCK_REFCOUNT); *curpp = xfs_rtrmapbt_init_cursor(mp, tp, info->rtg, info->rtg->rtg_rmapip); return xfs_rmap_query_range(*curpp, &info->low, &info->high, @@ -893,7 +897,8 @@ xfs_getfsmap_rtdev_rmapbt( if (bt_cur) { xfs_rtgroup_unlock(bt_cur->bc_ino.rtg, - XFS_RTGLOCK_RMAP); + XFS_RTGLOCK_RMAP | + XFS_RTGLOCK_REFCOUNT); xfs_btree_del_cursor(bt_cur, XFS_BTREE_NOERROR); bt_cur = NULL; } @@ -934,7 +939,8 @@ xfs_getfsmap_rtdev_rmapbt( } if (bt_cur) { - xfs_rtgroup_unlock(bt_cur->bc_ino.rtg, XFS_RTGLOCK_RMAP); + xfs_rtgroup_unlock(bt_cur->bc_ino.rtg, XFS_RTGLOCK_RMAP | + XFS_RTGLOCK_REFCOUNT); xfs_btree_del_cursor(bt_cur, error < 0 ? 
XFS_BTREE_ERROR : XFS_BTREE_NOERROR); } diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 52e73aa2c38e..1a8a254c81f4 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -30,6 +30,9 @@ #include "xfs_ag.h" #include "xfs_ag_resv.h" #include "xfs_health.h" +#include "xfs_rtrefcount_btree.h" +#include "xfs_rtalloc.h" +#include "xfs_rtgroup.h" /* * Copy on Write of Shared Blocks @@ -155,6 +158,38 @@ xfs_reflink_find_shared( return error; } +/* + * Given an RT extent, find the lowest-numbered run of shared blocks + * within that range and return the range in fbno/flen. If + * find_end_of_shared is true, return the longest contiguous extent of + * shared blocks. If there are no shared extents, fbno and flen will + * be set to NULLRGBLOCK and 0, respectively. + */ +static int +xfs_reflink_find_rtshared( + struct xfs_rtgroup *rtg, + struct xfs_trans *tp, + xfs_agblock_t rtbno, + xfs_extlen_t rtlen, + xfs_agblock_t *fbno, + xfs_extlen_t *flen, + bool find_end_of_shared) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_btree_cur *cur; + int error; + + BUILD_BUG_ON(NULLRGBLOCK != NULLAGBLOCK); + + xfs_rtgroup_lock(NULL, rtg, XFS_RTGLOCK_REFCOUNT); + cur = xfs_rtrefcountbt_init_cursor(mp, tp, rtg, rtg->rtg_refcountip); + error = xfs_refcount_find_shared(cur, rtbno, rtlen, fbno, flen, + find_end_of_shared); + xfs_btree_del_cursor(cur, error); + xfs_rtgroup_unlock(rtg, XFS_RTGLOCK_REFCOUNT); + return error; +} + /* * Trim the mapping to the next block where there's a change in the * shared/unshared status. 
More specifically, this means that we @@ -172,9 +207,7 @@ xfs_reflink_trim_around_shared( bool *shared) { struct xfs_mount *mp = ip->i_mount; - struct xfs_perag *pag; - xfs_agblock_t agbno; - xfs_extlen_t aglen; + xfs_agblock_t orig_bno; xfs_agblock_t fbno; xfs_extlen_t flen; int error = 0; @@ -187,13 +220,25 @@ xfs_reflink_trim_around_shared( trace_xfs_reflink_trim_around_shared(ip, irec); - pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, irec->br_startblock)); - agbno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock); - aglen = irec->br_blockcount; + if (XFS_IS_REALTIME_INODE(ip)) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; - error = xfs_reflink_find_shared(pag, NULL, agbno, aglen, &fbno, &flen, - true); - xfs_perag_put(pag); + orig_bno = xfs_rtb_to_rgbno(mp, irec->br_startblock, &rgno); + rtg = xfs_rtgroup_get(mp, rgno); + error = xfs_reflink_find_rtshared(rtg, NULL, orig_bno, + irec->br_blockcount, &fbno, &flen, true); + xfs_rtgroup_put(rtg); + } else { + struct xfs_perag *pag; + + pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, + irec->br_startblock)); + orig_bno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock); + error = xfs_reflink_find_shared(pag, NULL, orig_bno, + irec->br_blockcount, &fbno, &flen, true); + xfs_perag_put(pag); + } if (error) return error; @@ -203,7 +248,7 @@ xfs_reflink_trim_around_shared( return 0; } - if (fbno == agbno) { + if (fbno == orig_bno) { /* * The start of this extent is shared. Truncate the * mapping at the end of the shared region so that a @@ -221,7 +266,7 @@ xfs_reflink_trim_around_shared( * extent so that a subsequent iteration starts at the * start of the shared region. 
*/ - irec->br_blockcount = fbno - agbno; + irec->br_blockcount = fbno - orig_bno; return 0; } @@ -1574,9 +1619,6 @@ xfs_reflink_inode_has_shared_extents( *has_shared = false; found = xfs_iext_lookup_extent(ip, ifp, 0, &icur, &got); while (found) { - struct xfs_perag *pag; - xfs_agblock_t agbno; - xfs_extlen_t aglen; xfs_agblock_t rbno; xfs_extlen_t rlen; @@ -1584,12 +1626,29 @@ xfs_reflink_inode_has_shared_extents( got.br_state != XFS_EXT_NORM) goto next; - pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, got.br_startblock)); - agbno = XFS_FSB_TO_AGBNO(mp, got.br_startblock); - aglen = got.br_blockcount; - error = xfs_reflink_find_shared(pag, tp, agbno, aglen, - &rbno, &rlen, false); - xfs_perag_put(pag); + if (XFS_IS_REALTIME_INODE(ip)) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + xfs_rgblock_t rgbno; + + rgbno = xfs_rtb_to_rgbno(mp, got.br_startblock, &rgno); + rtg = xfs_rtgroup_get(mp, rgno); + error = xfs_reflink_find_rtshared(rtg, tp, rgbno, + got.br_blockcount, &rbno, &rlen, + false); + xfs_rtgroup_put(rtg); + } else { + struct xfs_perag *pag; + xfs_agblock_t agbno; + + pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, + got.br_startblock)); + agbno = XFS_FSB_TO_AGBNO(mp, got.br_startblock); + error = xfs_reflink_find_shared(pag, tp, agbno, + got.br_blockcount, &rbno, &rlen, + false); + xfs_perag_put(pag); + } if (error) return error; ^ permalink raw reply related [flat|nested] 565+ messages in thread
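Annotation on xfs_reflink_trim_around_shared above: once the first shared run inside a mapping is known, the trimming rule is the same for AG blocks and rt group blocks, which is why the patch can fold both into a single orig_bno variable. A standalone sketch of the rule follows. `NO_SHARED_BLOCK` and `trim_around_shared` are hypothetical names, with the former standing in for NULLAGBLOCK / NULLRGBLOCK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NO_SHARED_BLOCK	UINT32_MAX	/* hypothetical "nothing shared" marker */

/*
 * Trim an extent of `len` blocks starting at `bno` to the next change in
 * shared status, given the first shared run (fbno, flen) found inside it.
 * Returns the trimmed length and sets *shared for the trimmed range.
 */
static uint32_t trim_around_shared(uint32_t bno, uint32_t len,
				   uint32_t fbno, uint32_t flen,
				   bool *shared)
{
	*shared = false;

	/* No shared blocks at all: keep the whole extent. */
	if (fbno == NO_SHARED_BLOCK)
		return len;

	/* The shared run must lie inside the extent. */
	assert(fbno >= bno && fbno - bno + flen <= len);

	/* Extent starts shared: stop where the shared run ends. */
	if (fbno == bno) {
		*shared = true;
		return flen;
	}

	/* Extent starts unshared: stop where the shared run begins. */
	return fbno - bno;
}
```

A caller that loops over a mapping with this rule sees alternating unshared and shared pieces, each with a uniform status.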
* [PATCH 21/42] xfs: allow inodes to have the realtime and reflink flags 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (19 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 14/42] xfs: wire up realtime refcount btree cursors Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 27/42] xfs: add realtime refcount btree when adding rt volume Darrick J. Wong ` (20 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Now that we can share blocks between realtime files, allow this combination. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_inode_buf.c | 3 ++- fs/xfs/scrub/inode.c | 5 +++-- fs/xfs/scrub/inode_repair.c | 6 ------ fs/xfs/xfs_ioctl.c | 4 ---- 4 files changed, 5 insertions(+), 13 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index dcf816f2643b..0db719f80bf2 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -675,7 +675,8 @@ xfs_dinode_verify( return __this_address; /* don't let reflink and realtime mix */ - if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags & XFS_DIFLAG_REALTIME)) + if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags & XFS_DIFLAG_REALTIME) && + !xfs_has_rtreflink(mp)) return __this_address; /* COW extent size hint validation */ diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index f2c60c3515e7..3b19976b6066 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -329,8 +329,9 @@ xchk_inode_flags2( if ((flags2 & XFS_DIFLAG2_REFLINK) && !S_ISREG(mode)) goto bad; - /* realtime and reflink make no sense, currently */ - if ((flags & XFS_DIFLAG_REALTIME) && (flags2 & XFS_DIFLAG2_REFLINK)) + /* realtime and reflink don't always go together */ + if ((flags & XFS_DIFLAG_REALTIME) && (flags2 & XFS_DIFLAG2_REFLINK) && + 
!xfs_has_rtreflink(mp)) goto bad; /* no bigtime iflag without the bigtime feature */ diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index 8566282827f8..9f946406cfa0 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -391,8 +391,6 @@ xrep_dinode_flags( flags2 |= XFS_DIFLAG2_REFLINK; else flags2 &= ~(XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE); - if (flags & XFS_DIFLAG_REALTIME) - flags2 &= ~XFS_DIFLAG2_REFLINK; if (flags2 & XFS_DIFLAG2_REFLINK) flags2 &= ~XFS_DIFLAG2_DAX; if (!xfs_has_bigtime(mp)) @@ -1480,10 +1478,6 @@ xrep_inode_flags( if (!(S_ISREG(mode) || S_ISDIR(mode))) sc->ip->i_diflags2 &= ~XFS_DIFLAG2_DAX; - /* No reflink files on the realtime device. */ - if (sc->ip->i_diflags & XFS_DIFLAG_REALTIME) - sc->ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK; - /* No mixing reflink and DAX yet. */ if (sc->ip->i_diflags2 & XFS_DIFLAG2_REFLINK) sc->ip->i_diflags2 &= ~XFS_DIFLAG2_DAX; diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index fbe9bc50fc20..939cc6d862da 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1115,10 +1115,6 @@ xfs_ioctl_setattr_xflags( return -EINVAL; } - /* Clear reflink if we are actually able to set the rt flag. */ - if ((fa->fsx_xflags & FS_XFLAG_REALTIME) && xfs_is_reflink_inode(ip)) - ip->i_diflags2 &= ~XFS_DIFLAG2_REFLINK; - /* diflags2 only valid for v3 inodes. */ i_flags2 = xfs_flags2diflags2(ip, fa->fsx_xflags); if (i_flags2 && !xfs_has_v3inodes(mp)) ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 27/42] xfs: add realtime refcount btree when adding rt volume 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (20 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 21/42] xfs: allow inodes to have the realtime and reflink flags Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 23/42] xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files Darrick J. Wong ` (19 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If we're adding enough space to the realtime section to require the
creation of new realtime groups, create the rt refcount btree inode
before we start adding the space.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_rtalloc.c |   79 +++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 7f1ee9432e71..8929c4fffb53 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1134,6 +1134,73 @@ xfs_growfsrt_create_rtrmap(
 	return error;
 }
 
+/* Add a metadata inode for a realtime refcount btree. */
+static int
+xfs_growfsrt_create_rtrefcount(
+	struct xfs_rtgroup	*rtg)
+{
+	struct xfs_mount	*mp = rtg->rtg_mount;
+	struct xfs_imeta_update	upd;
+	struct xfs_imeta_path	*path;
+	struct xfs_trans	*tp;
+	struct xfs_inode	*ip = NULL;
+	int			error;
+
+	if (!xfs_has_rtreflink(mp) || rtg->rtg_refcountip)
+		return 0;
+
+	error = xfs_rtrefcountbt_create_path(mp, rtg->rtg_rgno, &path);
+	if (error)
+		return error;
+
+	error = xfs_imeta_ensure_dirpath(mp, path);
+	if (error)
+		goto out_path;
+
+	error = xfs_imeta_start_update(mp, path, &upd);
+	if (error)
+		goto out_path;
+
+	error = xfs_qm_dqattach(upd.dp);
+	if (error)
+		goto out_upd;
+
+	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_imeta_create,
+			xfs_imeta_create_space_res(mp), 0, 0, &tp);
+	if (error)
+		goto out_end;
+
+	error = xfs_rtrefcountbt_create(&tp, path, &upd, &ip);
+	if (error)
+		goto out_cancel;
+
+	lockdep_set_class(&ip->i_lock.mr_lock, &xfs_rrefcountip_key);
+
+	error = xfs_trans_commit(tp);
+	if (error)
+		goto out_end;
+
+	xfs_imeta_end_update(mp, &upd, error);
+	xfs_imeta_free_path(path);
+	xfs_finish_inode_setup(ip);
+	rtg->rtg_refcountip = ip;
+	return 0;
+
+out_cancel:
+	xfs_trans_cancel(tp);
+out_end:
+	/* Have to finish setting up the inode to ensure it's deleted. */
+	if (ip) {
+		xfs_finish_inode_setup(ip);
+		xfs_irele(ip);
+	}
+out_upd:
+	xfs_imeta_end_update(mp, &upd, error);
+out_path:
+	xfs_imeta_free_path(path);
+	return error;
+}
+
 /*
  * Check that changes to the realtime geometry won't affect the minimum
  * log size, which would cause the fs to become unusable.
@@ -1241,9 +1308,11 @@ xfs_growfs_rt(
 		return -EINVAL;
 
 	/* Unsupported realtime features. */
-	if (!xfs_has_rtgroups(mp) && xfs_has_rmapbt(mp))
+	if (!xfs_has_rtgroups(mp) && (xfs_has_rmapbt(mp) || xfs_has_reflink(mp)))
 		return -EOPNOTSUPP;
-	if (xfs_has_reflink(mp) || xfs_has_quota(mp))
+	if (xfs_has_quota(mp))
+		return -EOPNOTSUPP;
+	if (xfs_has_reflink(mp) && in->extsize != 1)
 		return -EOPNOTSUPP;
 
 	nrblocks = in->newblocks;
@@ -1378,6 +1447,12 @@ xfs_growfs_rt(
 				xfs_rtgroup_put(rtg);
 				break;
 			}
+
+			error = xfs_growfsrt_create_rtrefcount(rtg);
+			if (error) {
+				xfs_rtgroup_put(rtg);
+				break;
+			}
 		}
 	}
 
^ permalink raw reply related [flat|nested] 565+ messages in thread
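The reworked "Unsupported realtime features" block in `xfs_growfs_rt()` amounts to a three-step gate. A minimal sketch of that gate follows, assuming a hypothetical `struct ex_growfs_rt_req` that gathers the feature bits and the requested extent size in one place; the real code reads them from the mount and the `in` argument.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the feature bits and request fields the checks read. */
struct ex_growfs_rt_req {
	bool		has_rtgroups;
	bool		has_rmapbt;
	bool		has_reflink;
	bool		has_quota;
	uint32_t	extsize;	/* requested rt extent size, in fs blocks */
};

/* Return 0 if the grow request may proceed, or -EOPNOTSUPP if not. */
int
ex_growfs_rt_check(
	const struct ex_growfs_rt_req	*req)
{
	/* rmap and reflink on the rt volume both need rt groups. */
	if (!req->has_rtgroups && (req->has_rmapbt || req->has_reflink))
		return -EOPNOTSUPP;
	/* Quota is still rejected outright at this point in the series. */
	if (req->has_quota)
		return -EOPNOTSUPP;
	/* With reflink, only an rt extent size of one block is accepted. */
	if (req->has_reflink && req->extsize != 1)
		return -EOPNOTSUPP;
	return 0;
}
```

Compared with the old check, reflink alone no longer blocks the grow; it only does so without rt groups or with an rt extent size other than 1.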
* [PATCH 23/42] xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (21 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 27/42] xfs: add realtime refcount btree when adding rt volume Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 24/42] xfs: apply rt extent alignment constraints to CoW extsize hint Darrick J. Wong ` (18 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Currently, we (ab)use xfs_get_extsz_hint so that it always returns a
nonzero value for realtime files. This apparently was done to disable
delayed allocation for realtime files.

However, once we enable realtime reflink, we can also turn on the
alwayscow flag to force CoW writes to realtime files. In this case, the
logic will incorrectly send the write through the delalloc write path.

Fix this by adjusting the logic slightly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index fe31f3cb5d91..552875ddcc4a 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -6427,9 +6427,8 @@ xfs_get_extsz_hint(
 	 * No point in aligning allocations if we need to COW to actually
 	 * write to them.
 	 */
-	if (xfs_is_always_cow_inode(ip))
-		return 0;
-	if ((ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize)
+	if (!xfs_is_always_cow_inode(ip) &&
+	    (ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize)
 		return ip->i_extsize;
 	if (XFS_IS_REALTIME_INODE(ip))
 		return ip->i_mount->m_sb.sb_rextsize;
^ permalink raw reply related [flat|nested] 565+ messages in thread
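To make the behaviour change in patch 23 concrete, here is a small userspace model of the adjusted decision order. It is a sketch, not the kernel function: `struct ex_inode` is a hypothetical stand-in for the inode and mount fields involved, and the final `return 0` is an assumption, since that fall-through lies outside the quoted hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields the hint logic reads. */
struct ex_inode {
	bool		always_cow;		/* writes always go through CoW */
	bool		has_extsize_flag;	/* explicit extent size hint set */
	bool		is_realtime;		/* data lives on the rt device */
	uint32_t	extsize;		/* explicit hint, in fs blocks */
	uint32_t	rextsize;		/* rt extent size, in fs blocks */
};

uint32_t
ex_get_extsz_hint(
	const struct ex_inode	*ip)
{
	/* An always-CoW file ignores its explicit hint... */
	if (!ip->always_cow && ip->has_extsize_flag && ip->extsize)
		return ip->extsize;
	/* ...but a realtime file still reports a nonzero hint. */
	if (ip->is_realtime)
		return ip->rextsize;
	return 0;	/* assumed fall-through; not shown in the hunk */
}
```

With the old ordering, an always-CoW realtime file returned 0 here; with the new ordering it returns the rt extent size, so it keeps the nonzero hint that the commit message says is used to keep realtime files off the delalloc path.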
* [PATCH 24/42] xfs: apply rt extent alignment constraints to CoW extsize hint 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (22 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 23/42] xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 26/42] xfs: check that the rtrefcount maxlevels doesn't increase when growing fs Darrick J. Wong ` (17 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint. Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.

Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_inode_buf.c   |   25 ++++++++++++++++++++-----
 fs/xfs/libxfs/xfs_trans_inode.c |   14 ++++++++++++++
 fs/xfs/xfs_ioctl.c              |   17 +++++++++++++++--
 3 files changed, 49 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index 0db719f80bf2..09dafa8a9ab2 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -831,11 +831,29 @@ xfs_inode_validate_cowextsize(
 	bool				rt_flag;
 	bool				hint_flag;
 	uint32_t			cowextsize_bytes;
+	uint32_t			blocksize_bytes;
 
 	rt_flag = (flags & XFS_DIFLAG_REALTIME);
 	hint_flag = (flags2 & XFS_DIFLAG2_COWEXTSIZE);
 	cowextsize_bytes = XFS_FSB_TO_B(mp, cowextsize);
 
+	/*
+	 * Similar to extent size hints, a directory can be configured to
+	 * propagate realtime status and a CoW extent size hint to newly
+	 * created files even if there is no realtime device, and the hints on
+	 * disk can become misaligned if the sysadmin changes the rt extent
+	 * size while adding the realtime device.
+	 *
+	 * Therefore, we can only enforce the rextsize alignment check against
+	 * regular realtime files, and rely on callers to decide when alignment
+	 * checks are appropriate, and fix things up as needed.
+	 */
+
+	if (rt_flag)
+		blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
+	else
+		blocksize_bytes = mp->m_sb.sb_blocksize;
+
 	if (hint_flag && !xfs_has_reflink(mp))
 		return __this_address;
 
@@ -849,16 +867,13 @@ xfs_inode_validate_cowextsize(
 	if (mode && !hint_flag && cowextsize != 0)
 		return __this_address;
 
-	if (hint_flag && rt_flag)
-		return __this_address;
-
-	if (cowextsize_bytes % mp->m_sb.sb_blocksize)
+	if (cowextsize_bytes % blocksize_bytes)
 		return __this_address;
 
 	if (cowextsize > XFS_MAX_BMBT_EXTLEN)
 		return __this_address;
 
-	if (cowextsize > mp->m_sb.sb_agblocks / 2)
+	if (!rt_flag && cowextsize > mp->m_sb.sb_agblocks / 2)
 		return __this_address;
 
 	return NULL;
diff --git a/fs/xfs/libxfs/xfs_trans_inode.c b/fs/xfs/libxfs/xfs_trans_inode.c
index 4571db873f14..e292851e3b9d 100644
--- a/fs/xfs/libxfs/xfs_trans_inode.c
+++ b/fs/xfs/libxfs/xfs_trans_inode.c
@@ -160,6 +160,20 @@ xfs_trans_log_inode(
 		flags |= XFS_ILOG_CORE;
 	}
 
+	/*
+	 * Inode verifiers do not check that the CoW extent size hint is an
+	 * integer multiple of the rt extent size on a directory with both
+	 * rtinherit and cowextsize flags set. If we're logging a directory
+	 * that is misconfigured in this way, clear the hint.
+	 */
+	if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
+	    (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) &&
+	    (ip->i_cowextsize % ip->i_mount->m_sb.sb_rextsize) > 0) {
+		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
+		ip->i_cowextsize = 0;
+		flags |= XFS_ILOG_CORE;
+	}
+
 	/*
 	 * Record the specific change for fdatasync optimisation. This allows
 	 * fdatasync to skip log forces for inodes that are only timestamp
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 939cc6d862da..abca384c86a4 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1054,8 +1054,21 @@ xfs_fill_fsxattr(
 		}
 	}
 
-	if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
-		fa->fsx_cowextsize = XFS_FSB_TO_B(mp, ip->i_cowextsize);
+	if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) {
+		/*
+		 * Don't let a misaligned CoW extent size hint on a directory
+		 * escape to userspace if it won't pass the setattr checks
+		 * later.
+		 */
+		if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
+		    ip->i_cowextsize % mp->m_sb.sb_rextsize > 0) {
+			fa->fsx_xflags &= ~FS_XFLAG_COWEXTSIZE;
+			fa->fsx_cowextsize = 0;
+		} else {
+			fa->fsx_cowextsize = XFS_FSB_TO_B(mp, ip->i_cowextsize);
+		}
+	}
+
 	fa->fsx_projid = ip->i_projid;
 	if (ifp && !xfs_need_iread_extents(ifp))
 		fa->fsx_nextents = xfs_iext_count(ifp);
^ permalink raw reply related [flat|nested] 565+ messages in thread
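The core of the validator change in patch 24 is the choice of alignment unit: the filesystem block for ordinary files, the rt extent for realtime files. The sketch below isolates just that step, using plain integer parameters in place of the mount structure; the function name and parameter list are made up for the example, and it widens to 64 bits where the kernel code works in `uint32_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if a CoW extent size hint (in fs blocks) is a multiple of
 * the relevant allocation unit: one fs block for data-device files, one
 * rt extent for realtime files.
 */
bool
ex_cowextsize_aligned(
	uint32_t	cowextsize_blocks,
	uint32_t	blocksize,		/* bytes per fs block */
	uint32_t	rextsize_blocks,	/* fs blocks per rt extent */
	bool		rt_flag)
{
	uint64_t	hint_bytes = (uint64_t)cowextsize_blocks * blocksize;
	uint64_t	unit_bytes = blocksize;

	if (rt_flag)
		unit_bytes = (uint64_t)rextsize_blocks * blocksize;

	assert(unit_bytes > 0);
	return hint_bytes % unit_bytes == 0;
}
```

For example, with 4096-byte blocks and an rt extent size of 4 blocks, a hint of 6 blocks is 24576 bytes, which leaves a remainder of 8192 against the 16384-byte rt extent and is therefore rejected for a realtime file, while the same hint is fine on the data device.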
* [PATCH 26/42] xfs: check that the rtrefcount maxlevels doesn't increase when growing fs 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (23 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 24/42] xfs: apply rt extent alignment constraints to CoW extsize hint Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 28/42] xfs: report realtime refcount btree corruption errors to the health system Darrick J. Wong ` (16 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The size of filesystem transaction reservations depends on the maximum
height (maxlevels) of the realtime btrees. Since we don't want a grow
operation to increase the reservation size enough that we'll fail the
minimum log size checks on the next mount, constrain growfs operations
if they would cause an increase in the rt refcount btree maxlevels.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_fsops.c   |    2 ++
 fs/xfs/xfs_rtalloc.c |    2 ++
 2 files changed, 4 insertions(+)

diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 65b44ad8884e..317f0461f490 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -24,6 +24,7 @@
 #include "xfs_rtgroup.h"
 #include "xfs_rtalloc.h"
 #include "xfs_rtrmap_btree.h"
+#include "xfs_rtrefcount_btree.h"
 
 /*
  * Write new AG headers to disk. Non-transactional, but need to be
@@ -225,6 +226,7 @@ xfs_growfs_data_private(
 
 		/* Compute new maxlevels for rt btrees. */
 		xfs_rtrmapbt_compute_maxlevels(mp);
+		xfs_rtrefcountbt_compute_maxlevels(mp);
 	}
 
 	return error;
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 48c7cc28b7f2..7f1ee9432e71 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1172,6 +1172,7 @@ xfs_growfs_check_rtgeom(
 	fake_mp->m_features |= XFS_FEAT_REALTIME;
 
 	xfs_rtrmapbt_compute_maxlevels(fake_mp);
+	xfs_rtrefcountbt_compute_maxlevels(fake_mp);
 
 	xfs_trans_resv_calc(fake_mp, M_RES(fake_mp));
 	min_logfsbs = xfs_log_calc_minimum_size(fake_mp);
@@ -1474,6 +1475,7 @@ xfs_growfs_rt(
 		 */
 		mp->m_features |= XFS_FEAT_REALTIME;
 		xfs_rtrmapbt_compute_maxlevels(mp);
+		xfs_rtrefcountbt_compute_maxlevels(mp);
 	}
 	if (error)
 		goto out_free;
^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 28/42] xfs: report realtime refcount btree corruption errors to the health system 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (24 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 26/42] xfs: check that the rtrefcount maxlevels doesn't increase when growing fs Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 25/42] xfs: enable extent size hints for CoW operations Darrick J. Wong ` (15 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Whenever we encounter corrupt realtime refcount btree blocks, we should
report that to the health monitoring system for later reporting.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_fs.h               |    1 +
 fs/xfs/libxfs/xfs_health.h           |    4 +++-
 fs/xfs/libxfs/xfs_inode_fork.c       |    4 +++-
 fs/xfs/libxfs/xfs_rtrefcount_btree.c |    5 ++++-
 fs/xfs/xfs_health.c                  |    4 ++++
 fs/xfs/xfs_rtalloc.c                 |    1 +
 6 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 8547ba85c550..5819576a51a1 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -314,6 +314,7 @@ struct xfs_rtgroup_geometry {
 #define XFS_RTGROUP_GEOM_SICK_SUPER	(1 << 0)  /* superblock */
 #define XFS_RTGROUP_GEOM_SICK_BITMAP	(1 << 1)  /* rtbitmap for this group */
 #define XFS_RTGROUP_GEOM_SICK_RMAPBT	(1 << 2)  /* reverse mappings */
+#define XFS_RTGROUP_GEOM_SICK_REFCNTBT	(1 << 3)  /* reference counts */
 
 /*
  * Structures for XFS_IOC_FSGROWFSDATA, XFS_IOC_FSGROWFSLOG & XFS_IOC_FSGROWFSRT
diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index d5976f6b0de1..131282167548 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -68,6 +68,7 @@ struct xfs_rtgroup;
 #define XFS_SICK_RT_SUMMARY	(1 << 1)  /* realtime summary */
 #define XFS_SICK_RT_SUPER	(1 << 2)  /* rt group superblock */
 #define XFS_SICK_RT_RMAPBT	(1 << 3)  /* reverse mappings */
+#define XFS_SICK_RT_REFCNTBT	(1 << 4)  /* reference counts */
 
 /* Observable health issues for AG metadata. */
 #define XFS_SICK_AG_SB		(1 << 0)  /* superblock */
@@ -106,7 +107,8 @@ struct xfs_rtgroup;
 #define XFS_SICK_RT_PRIMARY	(XFS_SICK_RT_BITMAP | \
 				 XFS_SICK_RT_SUMMARY | \
 				 XFS_SICK_RT_SUPER | \
-				 XFS_SICK_RT_RMAPBT)
+				 XFS_SICK_RT_RMAPBT | \
+				 XFS_SICK_RT_REFCNTBT)
 
 #define XFS_SICK_AG_PRIMARY	(XFS_SICK_AG_SB | \
 				 XFS_SICK_AG_AGF | \
diff --git a/fs/xfs/libxfs/xfs_inode_fork.c b/fs/xfs/libxfs/xfs_inode_fork.c
index 7aae3ae810b7..5d5134a61994 100644
--- a/fs/xfs/libxfs/xfs_inode_fork.c
+++ b/fs/xfs/libxfs/xfs_inode_fork.c
@@ -268,8 +268,10 @@ xfs_iformat_data_fork(
 			}
 			return xfs_iformat_rtrmap(ip, dip);
 		case XFS_DINODE_FMT_REFCOUNT:
-			if (!xfs_has_rtreflink(ip->i_mount))
+			if (!xfs_has_rtreflink(ip->i_mount)) {
+				xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE);
 				return -EFSCORRUPTED;
+			}
 			return xfs_iformat_rtrefcount(ip, dip);
 		default:
 			xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__,
diff --git a/fs/xfs/libxfs/xfs_rtrefcount_btree.c b/fs/xfs/libxfs/xfs_rtrefcount_btree.c
index 0a6fa9851371..4bbda3ff0b39 100644
--- a/fs/xfs/libxfs/xfs_rtrefcount_btree.c
+++ b/fs/xfs/libxfs/xfs_rtrefcount_btree.c
@@ -27,6 +27,7 @@
 #include "xfs_rtgroup.h"
 #include "xfs_rtbitmap.h"
 #include "xfs_imeta.h"
+#include "xfs_health.h"
 
 static struct kmem_cache	*xfs_rtrefcountbt_cur_cache;
 
@@ -693,8 +694,10 @@ xfs_iformat_rtrefcount(
 	level = be16_to_cpu(dfp->bb_level);
 
 	if (level > mp->m_rtrefc_maxlevels ||
-	    xfs_rtrefcount_droot_space_calc(level, numrecs) > dsize)
+	    xfs_rtrefcount_droot_space_calc(level, numrecs) > dsize) {
+		xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE);
 		return -EFSCORRUPTED;
+	}
 
 	xfs_iroot_alloc(ip, XFS_DATA_FORK,
 			xfs_rtrefcount_broot_space_calc(mp, level, numrecs));
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index 80cc735b52d1..3a6684acd858 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -532,6 +532,7 @@ static const struct ioctl_sick_map rtgroup_map[] = {
 	{ XFS_SICK_RT_SUPER,	XFS_RTGROUP_GEOM_SICK_SUPER },
 	{ XFS_SICK_RT_BITMAP,	XFS_RTGROUP_GEOM_SICK_BITMAP },
 	{ XFS_SICK_RT_RMAPBT,	XFS_RTGROUP_GEOM_SICK_RMAPBT },
+	{ XFS_SICK_RT_REFCNTBT,	XFS_RTGROUP_GEOM_SICK_REFCNTBT },
 	{ 0, 0 },
 };
 
@@ -634,6 +635,9 @@ xfs_btree_mark_sick(
 	case XFS_BTNUM_RTRMAP:
 		xfs_rtgroup_mark_sick(cur->bc_ino.rtg, XFS_SICK_RT_RMAPBT);
 		return;
+	case XFS_BTNUM_RTREFC:
+		xfs_rtgroup_mark_sick(cur->bc_ino.rtg, XFS_SICK_RT_REFCNTBT);
+		return;
 	case XFS_BTNUM_BNO:
 		mask = XFS_SICK_AG_BNOBT;
 		break;
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index 8929c4fffb53..75d39c3274df 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1968,6 +1968,7 @@ xfs_rtmount_refcountbt(
 		goto out_path;
 
 	if (XFS_IS_CORRUPT(mp, ip->i_df.if_format != XFS_DINODE_FMT_REFCOUNT)) {
+		xfs_rtgroup_mark_sick(rtg, XFS_SICK_RT_REFCNTBT);
 		error = -EFSCORRUPTED;
 		goto out_rele;
 	}
^ permalink raw reply related [flat|nested] 565+ messages in thread
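The new `rtgroup_map[]` entry in patch 28 pairs an internal health bit with the bit reported through the rtgroup geometry ioctl. The patch does not quote the code that walks the table, so the loop below is an assumption about how such a zero-terminated map is typically consumed; the `EX_` names are placeholders, with bit positions copied from the hunks above for the three flags whose values are shown on both sides.

```c
#include <assert.h>
#include <stddef.h>

/* Internal health bits (positions as shown in the xfs_health.h hunk). */
#define EX_SICK_RT_SUPER		(1u << 2)
#define EX_SICK_RT_RMAPBT		(1u << 3)
#define EX_SICK_RT_REFCNTBT		(1u << 4)

/* Bits reported to userspace (positions as shown in the xfs_fs.h hunk). */
#define EX_GEOM_SICK_SUPER		(1u << 0)
#define EX_GEOM_SICK_RMAPBT		(1u << 2)
#define EX_GEOM_SICK_REFCNTBT		(1u << 3)

struct ex_sick_map {
	unsigned int	sick_mask;
	unsigned int	ioctl_mask;
};

static const struct ex_sick_map ex_rtgroup_map[] = {
	{ EX_SICK_RT_SUPER,	EX_GEOM_SICK_SUPER },
	{ EX_SICK_RT_RMAPBT,	EX_GEOM_SICK_RMAPBT },
	{ EX_SICK_RT_REFCNTBT,	EX_GEOM_SICK_REFCNTBT },
	{ 0, 0 },	/* terminator */
};

/* Translate a mask of internal health bits into the userspace bits. */
unsigned int
ex_sick_to_geom(
	unsigned int	sick)
{
	const struct ex_sick_map	*m;
	unsigned int			out = 0;

	for (m = ex_rtgroup_map; m->sick_mask; m++) {
		if (sick & m->sick_mask)
			out |= m->ioctl_mask;
	}
	return out;
}
```

A bit with no table entry is silently dropped in this sketch, which is why the patch has to add the `REFCNTBT` pair for the new state to become visible to userspace.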
* [PATCH 25/42] xfs: enable extent size hints for CoW operations 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (25 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 28/42] xfs: report realtime refcount btree corruption errors to the health system Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 22/42] xfs: recover CoW leftovers in the realtime volume Darrick J. Wong ` (14 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Wire up the copy-on-write extent size hint for realtime files, and
connect it to the rt allocator so that we avoid fragmentation on rt
filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c |    8 +++++++-
 fs/xfs/xfs_bmap_util.c   |    5 ++++-
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 552875ddcc4a..b2bc39b1f9b7 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -6450,7 +6450,13 @@ xfs_get_cowextsz_hint(
 	a = 0;
 	if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
 		a = ip->i_cowextsize;
-	b = xfs_get_extsz_hint(ip);
+	if (XFS_IS_REALTIME_INODE(ip)) {
+		b = 0;
+		if (ip->i_diflags & XFS_DIFLAG_EXTSIZE)
+			b = ip->i_extsize;
+	} else {
+		b = xfs_get_extsz_hint(ip);
+	}
 
 	a = max(a, b);
 	if (a == 0)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 842f472292cd..a54ed26e1cc0 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -138,7 +138,10 @@ xfs_bmap_rtalloc(
 	bool			ignore_locality = false;
 	int			error;
 
-	align = xfs_get_extsz_hint(ap->ip);
+	if (ap->flags & XFS_BMAPI_COWFORK)
+		align = xfs_get_cowextsz_hint(ap->ip);
+	else
+		align = xfs_get_extsz_hint(ap->ip);
 retry:
 	prod = xfs_extlen_to_rtxlen(mp, align);
 	error = xfs_bmap_extsize_align(mp, &ap->got, &ap->prev,
^ permalink raw reply related [flat|nested] 565+ messages in thread
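The reason `xfs_get_cowextsz_hint()` needs a realtime special case is that the regular hint function never returns zero for a realtime file, so taking the maximum of the two hints would always land on at least the rt extent size. The model below illustrates this under stated assumptions: `struct ex_inode` and the `fallback` parameter are hypothetical, and what the kernel does when both hints are zero lies outside the quoted hunk, so the sketch lets the caller supply that value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the fields the hint logic reads. */
struct ex_inode {
	bool		is_realtime;
	bool		has_extsize_flag;
	bool		has_cowextsize_flag;
	uint32_t	extsize;	/* explicit hint, in fs blocks */
	uint32_t	cowextsize;	/* explicit CoW hint, in fs blocks */
	uint32_t	rextsize;	/* rt extent size, in fs blocks */
};

/* Simplified regular hint: explicit hint if set, else rextsize for rt files. */
uint32_t
ex_get_extsz_hint(
	const struct ex_inode	*ip)
{
	if (ip->has_extsize_flag && ip->extsize)
		return ip->extsize;
	if (ip->is_realtime)
		return ip->rextsize;
	return 0;
}

/* CoW hint: the larger of the two explicit hints, else the fallback. */
uint32_t
ex_get_cowextsz_hint(
	const struct ex_inode	*ip,
	uint32_t		fallback)
{
	uint32_t		a = 0;
	uint32_t		b;

	if (ip->has_cowextsize_flag)
		a = ip->cowextsize;
	if (ip->is_realtime) {
		/* Only honour an explicit hint; skip the rextsize default. */
		b = 0;
		if (ip->has_extsize_flag)
			b = ip->extsize;
	} else {
		b = ex_get_extsz_hint(ip);
	}

	if (b > a)
		a = b;
	if (a == 0)
		return fallback;
	return a;
}
```

With an rt extent size of 4 and no explicit hints, the regular hint is 4 but the CoW hint falls through to the fallback, which is the behaviour the realtime branch exists to preserve.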
* [PATCH 22/42] xfs: recover CoW leftovers in the realtime volume 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (26 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 25/42] xfs: enable extent size hints for CoW operations Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 31/42] xfs: allow overlapping rtrmapbt records for shared data extents Darrick J. Wong ` (13 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Scan the realtime refcount tree at mount time to get rid of leftover
CoW staging extents.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_refcount.c |   63 +++++++++++++++++++++++++++++++++---------
 fs/xfs/libxfs/xfs_refcount.h |    5 +++
 fs/xfs/xfs_reflink.c         |   14 ++++++++-
 3 files changed, 65 insertions(+), 17 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index c4ab749c78e4..8b878a7a5a3e 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -2037,14 +2037,15 @@ xfs_refcount_recover_extent(
 }
 
 /* Find and remove leftover CoW reservations. */
-int
-xfs_refcount_recover_cow_leftovers(
+static int
+xfs_refcount_recover_group_cow_leftovers(
 	struct xfs_mount		*mp,
-	struct xfs_perag		*pag)
+	struct xfs_perag		*pag,
+	struct xfs_rtgroup		*rtg)
 {
 	struct xfs_trans		*tp;
 	struct xfs_btree_cur		*cur;
-	struct xfs_buf			*agbp;
+	struct xfs_buf			*agbp = NULL;
 	struct xfs_refcount_recovery	*rr, *n;
 	struct list_head		debris;
 	union xfs_btree_irec		low;
@@ -2054,7 +2055,12 @@ xfs_refcount_recover_cow_leftovers(
 
 	/* reflink filesystems mustn't have AGs larger than 2^31-1 blocks */
 	BUILD_BUG_ON(XFS_MAX_CRC_AG_BLOCKS >= XFS_REFC_COWFLAG);
-	if (mp->m_sb.sb_agblocks > XFS_MAX_CRC_AG_BLOCKS)
+	if (pag && mp->m_sb.sb_agblocks > XFS_MAX_CRC_AG_BLOCKS)
+		return -EOPNOTSUPP;
+
+	/* rtreflink filesystems can't have rtgroups larger than 2^31-1 blocks */
+	BUILD_BUG_ON(XFS_MAX_RGBLOCKS >= XFS_REFC_COWFLAG);
+	if (rtg && mp->m_sb.sb_rgblocks >= XFS_MAX_RGBLOCKS)
 		return -EOPNOTSUPP;
 
 	INIT_LIST_HEAD(&debris);
@@ -2073,10 +2079,16 @@ xfs_refcount_recover_cow_leftovers(
 	if (error)
 		return error;
 
-	error = xfs_alloc_read_agf(pag, tp, 0, &agbp);
-	if (error)
-		goto out_trans;
-	cur = xfs_refcountbt_init_cursor(mp, tp, agbp, pag);
+	if (rtg) {
+		xfs_rtgroup_lock(NULL, rtg, XFS_RTGLOCK_REFCOUNT);
+		cur = xfs_rtrefcountbt_init_cursor(mp, tp, rtg,
+				rtg->rtg_refcountip);
+	} else {
+		error = xfs_alloc_read_agf(pag, tp, 0, &agbp);
+		if (error)
+			goto out_trans;
+		cur = xfs_refcountbt_init_cursor(mp, tp, agbp, pag);
+	}
 
 	/* Find all the leftover CoW staging extents. */
 	memset(&low, 0, sizeof(low));
@@ -2086,7 +2098,10 @@ xfs_refcount_recover_cow_leftovers(
 	error = xfs_btree_query_range(cur, &low, &high,
 			xfs_refcount_recover_extent, &debris);
 	xfs_btree_del_cursor(cur, error);
-	xfs_trans_brelse(tp, agbp);
+	if (agbp)
+		xfs_trans_brelse(tp, agbp);
+	else
+		xfs_rtgroup_unlock(rtg, XFS_RTGLOCK_REFCOUNT);
 	xfs_trans_cancel(tp);
 	if (error)
 		goto out_free;
@@ -2099,14 +2114,18 @@ xfs_refcount_recover_cow_leftovers(
 			goto out_free;
 
 		/* Free the orphan record */
-		fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno,
-				rr->rr_rrec.rc_startblock);
-		xfs_refcount_free_cow_extent(tp, false, fsb,
+		if (rtg)
+			fsb = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno,
+					rr->rr_rrec.rc_startblock);
+		else
+			fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno,
+					rr->rr_rrec.rc_startblock);
+		xfs_refcount_free_cow_extent(tp, rtg != NULL, fsb,
 				rr->rr_rrec.rc_blockcount);
 
 		/* Free the block. */
 		xfs_free_extent_later(tp, fsb, rr->rr_rrec.rc_blockcount, NULL,
-				0);
+				rtg != NULL ? XFS_FREE_EXTENT_REALTIME : 0);
 
 		error = xfs_trans_commit(tp);
 		if (error)
@@ -2128,6 +2147,22 @@ xfs_refcount_recover_cow_leftovers(
 	return error;
 }
 
+int
+xfs_refcount_recover_cow_leftovers(
+	struct xfs_mount		*mp,
+	struct xfs_perag		*pag)
+{
+	return xfs_refcount_recover_group_cow_leftovers(mp, pag, NULL);
+}
+
+int
+xfs_refcount_recover_rtcow_leftovers(
+	struct xfs_mount		*mp,
+	struct xfs_rtgroup		*rtg)
+{
+	return xfs_refcount_recover_group_cow_leftovers(mp, NULL, rtg);
+}
+
 /*
  * Scan part of the keyspace of the refcount records and tell us if the area
  * has no records, is fully mapped by records, or is partially filled.
diff --git a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h
index 4e725d723e88..c7907119d10c 100644
--- a/fs/xfs/libxfs/xfs_refcount.h
+++ b/fs/xfs/libxfs/xfs_refcount.h
@@ -12,6 +12,7 @@ struct xfs_perag;
 struct xfs_btree_cur;
 struct xfs_bmbt_irec;
 struct xfs_refcount_irec;
+struct xfs_rtgroup;
 
 extern int xfs_refcount_lookup_le(struct xfs_btree_cur *cur,
 		enum xfs_refc_domain domain, xfs_agblock_t bno, int *stat);
@@ -99,8 +100,10 @@ void xfs_refcount_alloc_cow_extent(struct xfs_trans *tp, bool isrt,
 		xfs_fsblock_t fsb, xfs_extlen_t len);
 void xfs_refcount_free_cow_extent(struct xfs_trans *tp, bool isrt,
 		xfs_fsblock_t fsb, xfs_extlen_t len);
-extern int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp,
+int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp,
 		struct xfs_perag *pag);
+int xfs_refcount_recover_rtcow_leftovers(struct xfs_mount *mp,
+		struct xfs_rtgroup *rtg);
 
 /*
  * While we're adjusting the refcounts records of an extent, we have
diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c
index 3cead39e4308..13a613c077df 100644
--- a/fs/xfs/xfs_reflink.c
+++ b/fs/xfs/xfs_reflink.c
@@ -1002,7 +1002,9 @@ xfs_reflink_recover_cow(
 	struct xfs_mount	*mp)
 {
 	struct xfs_perag	*pag;
+	struct xfs_rtgroup	*rtg;
 	xfs_agnumber_t		agno;
+	xfs_rgnumber_t		rgno;
 	int			error = 0;
 
 	if (!xfs_has_reflink(mp))
@@ -1012,11 +1014,19 @@ xfs_reflink_recover_cow(
 		error = xfs_refcount_recover_cow_leftovers(mp, pag);
 		if (error) {
 			xfs_perag_put(pag);
-			break;
+			return error;
 		}
 	}
 
-	return error;
+	for_each_rtgroup(mp, rgno, rtg) {
+		error = xfs_refcount_recover_rtcow_leftovers(mp, rtg);
+		if (error) {
+			xfs_rtgroup_put(rtg);
+			return error;
+		}
+	}
+
+	return 0;
 }
 
 /*
^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 31/42] xfs: allow overlapping rtrmapbt records for shared data extents 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (27 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 22/42] xfs: recover CoW leftovers in the realtime volume Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 34/42] xfs: detect and repair misaligned rtinherit directory cowextsize hints Darrick J. Wong ` (12 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Allow overlapping realtime reverse mapping records if they both describe
shared data extents and the fs supports reflink on the realtime volume.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/rtrmap.c |   17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c
index e89d5310117a..3ff4151b2c0a 100644
--- a/fs/xfs/scrub/rtrmap.c
+++ b/fs/xfs/scrub/rtrmap.c
@@ -86,6 +86,18 @@ struct xchk_rtrmap {
 	struct xfs_rmap_irec	prev_rec;
 };
 
+static inline bool
+xchk_rtrmapbt_is_shareable(
+	struct xfs_scrub		*sc,
+	const struct xfs_rmap_irec	*irec)
+{
+	if (!xfs_has_rtreflink(sc->mp))
+		return false;
+	if (irec->rm_flags & XFS_RMAP_UNWRITTEN)
+		return false;
+	return true;
+}
+
 /* Flag failures for records that overlap but cannot. */
 STATIC void
 xchk_rtrmapbt_check_overlapping(
@@ -107,7 +119,10 @@ xchk_rtrmapbt_check_overlapping(
 	if (pnext <= irec->rm_startblock)
 		goto set_prev;
 
-	xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+	/* Overlap is only allowed if both records are data fork mappings. */
+	if (!xchk_rtrmapbt_is_shareable(bs->sc, &cr->overlap_rec) ||
+	    !xchk_rtrmapbt_is_shareable(bs->sc, irec))
+		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
 
 	/* Save whichever rmap record extends furthest. */
 	inext = irec->rm_startblock + irec->rm_blockcount;
^ permalink raw reply related [flat|nested] 565+ messages in thread
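The overlap rule from patch 31 can be exercised outside the scrubber with a small model. This is a sketch only: `struct ex_rmap` is a hypothetical reduction of the rmap record to the three fields the check looks at, and the rtreflink feature test is passed in as a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduction of a reverse-mapping record. */
struct ex_rmap {
	uint64_t	startblock;
	uint64_t	blockcount;
	bool		unwritten;
};

/* A record may take part in an overlap only on rtreflink, and only if written. */
bool
ex_rmap_is_shareable(
	const struct ex_rmap	*rec,
	bool			has_rtreflink)
{
	if (!has_rtreflink)
		return false;
	if (rec->unwritten)
		return false;
	return true;
}

/*
 * Given two records where prev sorts at or before cur by startblock,
 * return true if they overlap in a way that should be flagged.
 */
bool
ex_overlap_is_corrupt(
	const struct ex_rmap	*prev,
	const struct ex_rmap	*cur,
	bool			has_rtreflink)
{
	uint64_t		pnext = prev->startblock + prev->blockcount;

	if (pnext <= cur->startblock)
		return false;	/* no overlap at all */

	return !ex_rmap_is_shareable(prev, has_rtreflink) ||
	       !ex_rmap_is_shareable(cur, has_rtreflink);
}
```

Two written extents covering blocks 0-9 and 5-14 are flagged without rtreflink and accepted with it; if either one is unwritten, the overlap is flagged regardless.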
* [PATCH 34/42] xfs: detect and repair misaligned rtinherit directory cowextsize hints 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (28 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 31/42] xfs: allow overlapping rtrmapbt records for shared data extents Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 35/42] xfs: don't flag quota rt block usage on rtreflink filesystems Darrick J. Wong ` (11 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If we encounter a directory that has been configured to pass on a CoW
extent size hint to a new realtime file and the hint isn't an integer
multiple of the rt extent size, we should flag the hint for
administrative review and/or turn it off because that is a
misconfiguration.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/inode.c        |   26 +++++++++++++++++---------
 fs/xfs/scrub/inode_repair.c |   15 +++++++++++++++
 2 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
index be9739035226..6a37973823d2 100644
--- a/fs/xfs/scrub/inode.c
+++ b/fs/xfs/scrub/inode.c
@@ -229,12 +229,7 @@ xchk_inode_extsize(
 		xchk_ino_set_warning(sc, ino);
 }
 
-/*
- * Validate di_cowextsize hint.
- *
- * The rules are documented at xfs_ioctl_setattr_check_cowextsize().
- * These functions must be kept in sync with each other.
- */
+/* Validate di_cowextsize hint. */
 STATIC void
 xchk_inode_cowextsize(
 	struct xfs_scrub	*sc,
@@ -245,12 +240,25 @@ xchk_inode_cowextsize(
 	uint64_t		flags2)
 {
 	xfs_failaddr_t		fa;
+	uint32_t		value = be32_to_cpu(dip->di_cowextsize);
 
-	fa = xfs_inode_validate_cowextsize(sc->mp,
-			be32_to_cpu(dip->di_cowextsize), mode, flags,
-			flags2);
+	fa = xfs_inode_validate_cowextsize(sc->mp, value, mode, flags, flags2);
 	if (fa)
 		xchk_ino_set_corrupt(sc, ino);
+
+	/*
+	 * XFS allows a sysadmin to change the rt extent size when adding a rt
+	 * section to a filesystem after formatting. If there are any
+	 * directories with cowextsize and rtinherit set, the hint could become
+	 * misaligned with the new rextsize. The verifier doesn't check this,
+	 * because we allow rtinherit directories even without an rt device.
+	 * Flag this as an administrative warning since we will clean this up
+	 * eventually.
+	 */
+	if ((flags & XFS_DIFLAG_RTINHERIT) &&
+	    (flags2 & XFS_DIFLAG2_COWEXTSIZE) &&
+	    value % sc->mp->m_sb.sb_rextsize > 0)
+		xchk_ino_set_warning(sc, ino);
 }
 
 /* Make sure the di_flags make sense for the inode. */
diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c
index 9f946406cfa0..3ce9ac5b0fc4 100644
--- a/fs/xfs/scrub/inode_repair.c
+++ b/fs/xfs/scrub/inode_repair.c
@@ -1572,6 +1572,20 @@ xrep_inode_extsize(
 	}
 }
 
+/* Fix COW extent size hint problems. */
+STATIC void
+xrep_inode_cowextsize(
+	struct xfs_scrub	*sc)
+{
+	/* Fix misaligned CoW extent size hints on a directory. */
+	if ((sc->ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
+	    (sc->ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) &&
+	    sc->ip->i_cowextsize % sc->mp->m_sb.sb_rextsize > 0) {
+		sc->ip->i_cowextsize = 0;
+		sc->ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
+	}
+}
+
 /* Fix any irregularities in an inode that the verifiers don't catch. */
 STATIC int
 xrep_inode_problems(
@@ -1587,6 +1601,7 @@ xrep_inode_problems(
 	xrep_inode_ids(sc);
 	xrep_inode_size(sc);
 	xrep_inode_extsize(sc);
+	xrep_inode_cowextsize(sc);
 
 	trace_xrep_inode_fixed(sc);
 	xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
^ permalink raw reply related [flat|nested] 565+ messages in thread
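The repair described in patch 34's commit message, turning off a directory's CoW extent size hint when it is not a multiple of the rt extent size, is easy to model in isolation. The sketch below assumes a hypothetical `struct ex_dir` holding just the two flags and the hint, and tests the CoW hint itself against the rt extent size as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduction of a directory inode to the fields involved. */
struct ex_dir {
	bool		rtinherit;		/* new files inherit the rt flag */
	bool		cowextsize_flag;	/* CoW extent size hint is set */
	uint32_t	cowextsize;		/* hint, in fs blocks */
};

/*
 * Clear a CoW extent size hint that is not a multiple of the rt extent
 * size.  Returns true if the directory was changed.
 */
bool
ex_fix_dir_cowextsize(
	struct ex_dir	*dp,
	uint32_t	rextsize)
{
	assert(rextsize > 0);

	if (dp->rtinherit && dp->cowextsize_flag &&
	    dp->cowextsize % rextsize > 0) {
		dp->cowextsize = 0;
		dp->cowextsize_flag = false;
		return true;
	}
	return false;
}
```

A hint of 6 blocks against an rt extent size of 4 is cleared, a hint of 8 is left alone, and a directory without rtinherit is never touched.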
* [PATCH 35/42] xfs: don't flag quota rt block usage on rtreflink filesystems 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (29 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 34/42] xfs: detect and repair misaligned rtinherit directory cowextsize hints Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 33/42] xfs: allow dquot rt block count to exceed rt blocks on reflink fs Darrick J. Wong ` (10 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Quota space usage is allowed to exceed the size of the physical storage
when reflink is enabled. Now that we have reflink for the realtime
volume, apply this same logic to the rtb repair logic.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/quota_repair.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/scrub/quota_repair.c b/fs/xfs/scrub/quota_repair.c
index a150719c2b90..c79c47714eb6 100644
--- a/fs/xfs/scrub/quota_repair.c
+++ b/fs/xfs/scrub/quota_repair.c
@@ -101,7 +101,7 @@ xrep_quota_item(
 		rqi->need_quotacheck = true;
 		dirty = true;
 	}
-	if (dqp->q_rtb.count > mp->m_sb.sb_rblocks) {
+	if (!xfs_has_reflink(mp) && dqp->q_rtb.count > mp->m_sb.sb_rblocks) {
 		dqp->q_rtb.reserved -= dqp->q_rtb.count;
 		dqp->q_rtb.reserved += mp->m_sb.sb_rblocks;
 		dqp->q_rtb.count = mp->m_sb.sb_rblocks;
^ permalink raw reply related [flat|nested] 565+ messages in thread
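The clamp that patch 35 makes conditional can be worked through with numbers. In the sketch below, `struct ex_res` is a hypothetical two-field stand-in for the dquot's realtime block resource, and the reflink feature test is a plain boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one dquot resource counter pair. */
struct ex_res {
	uint64_t	reserved;	/* blocks reserved, including count */
	uint64_t	count;		/* blocks actually in use */
};

/*
 * Clamp rt block usage to the size of the rt volume, unless reflink
 * makes usage above the physical size legitimate.  Returns true if the
 * counters were changed.
 */
bool
ex_clamp_rt_usage(
	struct ex_res	*rtb,
	uint64_t	rblocks,
	bool		has_reflink)
{
	if (!has_reflink && rtb->count > rblocks) {
		/* Swap the old count out of the reservation for the new one. */
		rtb->reserved -= rtb->count;
		rtb->reserved += rblocks;
		rtb->count = rblocks;
		return true;
	}
	return false;
}
```

With 80 rt blocks, a dquot showing count 100 and reserved 120 is rewritten to count 80 and reserved 100 on a non-reflink filesystem, and left untouched when reflink is enabled, since shared blocks are charged to every owner.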
* [PATCH 33/42] xfs: allow dquot rt block count to exceed rt blocks on reflink fs 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (30 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 35/42] xfs: don't flag quota rt block usage on rtreflink filesystems Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 29/42] xfs: scrub the realtime refcount btree Darrick J. Wong ` (9 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Update the quota scrubber to allow dquots where the realtime block count exceeds the block count of the rt volume if reflink is enabled. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/quota.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c index 714bd4c0753a..2d2064126bc9 100644 --- a/fs/xfs/scrub/quota.c +++ b/fs/xfs/scrub/quota.c @@ -139,12 +139,18 @@ xchk_quota_item( if (mp->m_sb.sb_dblocks < dq->q_blk.count) xchk_fblock_set_warning(sc, XFS_DATA_FORK, offset); + if (mp->m_sb.sb_rblocks < dq->q_rtb.count) + xchk_fblock_set_warning(sc, XFS_DATA_FORK, + offset); } else { if (mp->m_sb.sb_dblocks < dq->q_blk.count) xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset); + if (mp->m_sb.sb_rblocks < dq->q_rtb.count) + xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, + offset); } - if (dq->q_ino.count > fs_icount || dq->q_rtb.count > mp->m_sb.sb_rblocks) + if (dq->q_ino.count > fs_icount) xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset); /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
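The severity decision in the hunk above can be summarised in a few lines of standalone C. This sketch assumes that the enclosing branch, which is not visible in the hunk, tests whether reflink is enabled, as the commit message implies; the enum, struct, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical outcome of checking one dquot counter. */
enum sketch_severity {
	SKETCH_OK,
	SKETCH_WARNING,
	SKETCH_CORRUPT,
};

/* Hypothetical, trimmed-down view of the mount. */
struct sketch_fs {
	uint64_t	rblocks;	/* size of the rt volume, in blocks */
	bool		has_reflink;	/* assumed condition of the outer branch */
};

/*
 * An rt block count above the volume size is only suspicious on a reflink
 * filesystem (shared blocks are counted once per owner), but it is
 * impossible without reflink.
 */
static enum sketch_severity
sketch_check_rtb_count(
	const struct sketch_fs	*fs,
	uint64_t		rtb_count)
{
	assert(fs != NULL);

	if (rtb_count <= fs->rblocks)
		return SKETCH_OK;
	return fs->has_reflink ? SKETCH_WARNING : SKETCH_CORRUPT;
}
```

This is the same split the existing code already applies to the data device block count, which is why the rt check moves out of the unconditional inode-count test and into both arms of the branch.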
* [PATCH 29/42] xfs: scrub the realtime refcount btree 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (31 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 33/42] xfs: allow dquot rt block count to exceed rt blocks on reflink fs Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 30/42] xfs: cross-reference checks with the rt " Darrick J. Wong ` (8 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add code to scrub realtime refcount btrees. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/Makefile | 1 fs/xfs/libxfs/xfs_fs.h | 3 fs/xfs/scrub/bmap.c | 1 fs/xfs/scrub/bmap_repair.c | 1 fs/xfs/scrub/common.c | 40 +++- fs/xfs/scrub/common.h | 5 fs/xfs/scrub/health.c | 1 fs/xfs/scrub/inode.c | 1 fs/xfs/scrub/rtrefcount.c | 495 ++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/scrub.c | 7 + fs/xfs/scrub/scrub.h | 3 fs/xfs/scrub/trace.h | 4 12 files changed, 548 insertions(+), 14 deletions(-) create mode 100644 fs/xfs/scrub/rtrefcount.c diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile index 9cc30333c089..cb1074c67dc5 100644 --- a/fs/xfs/Makefile +++ b/fs/xfs/Makefile @@ -182,6 +182,7 @@ xfs-y += $(addprefix scrub/, \ xfs-$(CONFIG_XFS_RT) += $(addprefix scrub/, \ rgsuper.o \ rtbitmap.o \ + rtrefcount.o \ rtrmap.o \ rtsummary.o \ ) diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h index 5819576a51a1..453b08612256 100644 --- a/fs/xfs/libxfs/xfs_fs.h +++ b/fs/xfs/libxfs/xfs_fs.h @@ -746,9 +746,10 @@ struct xfs_scrub_metadata { #define XFS_SCRUB_TYPE_RGSUPER 28 /* realtime superblock */ #define XFS_SCRUB_TYPE_RGBITMAP 29 /* realtime group bitmap */ #define XFS_SCRUB_TYPE_RTRMAPBT 30 /* rtgroup reverse mapping btree */ +#define XFS_SCRUB_TYPE_RTREFCBT 31 /* realtime reference count btree */ /* Number of scrub subcommands. 
*/ -#define XFS_SCRUB_TYPE_NR 31 +#define XFS_SCRUB_TYPE_NR 32 /* i: Repair this metadata. */ #define XFS_SCRUB_IFLAG_REPAIR (1u << 0) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index 8ce279ae9c95..f18b22bc2548 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -909,6 +909,7 @@ xchk_bmap( case XFS_DINODE_FMT_DEV: case XFS_DINODE_FMT_LOCAL: case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_REFCOUNT: /* No mappings to check. */ if (whichfork == XFS_COW_FORK) xchk_fblock_set_corrupt(sc, whichfork, 0); diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index b8cdcba984f3..5dca4680657f 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -807,6 +807,7 @@ xrep_bmap_check_inputs( case XFS_DINODE_FMT_LOCAL: case XFS_DINODE_FMT_UUID: case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_REFCOUNT: return -ECANCELED; case XFS_DINODE_FMT_EXTENTS: case XFS_DINODE_FMT_BTREE: diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c index c2c379aae770..a632d56f255f 100644 --- a/fs/xfs/scrub/common.c +++ b/fs/xfs/scrub/common.c @@ -37,6 +37,7 @@ #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" #include "xfs_bmap_util.h" +#include "xfs_rtrefcount_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -856,6 +857,10 @@ xchk_rtgroup_init( sr->rmap_cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, sr->rtg, sr->rtg->rtg_rmapip); + if (xfs_has_rtreflink(sc->mp) && (rtglock_flags & XFS_RTGLOCK_REFCOUNT)) + sr->refc_cur = xfs_rtrefcountbt_init_cursor(sc->mp, sc->tp, + sr->rtg, sr->rtg->rtg_refcountip); + return 0; } @@ -870,7 +875,10 @@ xchk_rtgroup_btcur_free( { if (sr->rmap_cur) xfs_btree_del_cursor(sr->rmap_cur, XFS_BTREE_ERROR); + if (sr->refc_cur) + xfs_btree_del_cursor(sr->refc_cur, XFS_BTREE_ERROR); + sr->refc_cur = NULL; sr->rmap_cur = NULL; } @@ -1556,16 +1564,26 @@ xchk_inode_count_blocks( } cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, sc->sr.rtg, sc->ip); - error = 
xfs_btree_count_blocks(cur, &btblocks); - xfs_btree_del_cursor(cur, error); - if (error) - return error; - - *nextents = 0; - *count = btblocks - 1; - return 0; - default: - return xfs_bmap_count_blocks(sc->tp, sc->ip, whichfork, - nextents, count); + goto meta_btree; + case XFS_DINODE_FMT_REFCOUNT: + if (!sc->sr.rtg) { + ASSERT(0); + return -EFSCORRUPTED; + } + cur = xfs_rtrefcountbt_init_cursor(sc->mp, sc->tp, sc->sr.rtg, + sc->ip); + goto meta_btree; } + + return xfs_bmap_count_blocks(sc->tp, sc->ip, whichfork, nextents, + count); +meta_btree: + error = xfs_btree_count_blocks(cur, &btblocks); + xfs_btree_del_cursor(cur, error); + if (error) + return error; + + *nextents = 0; + *count = btblocks - 1; + return 0; } diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h index e135f792cfcc..dd1b838a183f 100644 --- a/fs/xfs/scrub/common.h +++ b/fs/xfs/scrub/common.h @@ -108,12 +108,14 @@ int xchk_setup_rtsummary(struct xfs_scrub *sc); int xchk_setup_rgsuperblock(struct xfs_scrub *sc); int xchk_setup_rgbitmap(struct xfs_scrub *sc); int xchk_setup_rtrmapbt(struct xfs_scrub *sc); +int xchk_setup_rtrefcountbt(struct xfs_scrub *sc); #else # define xchk_setup_rtbitmap xchk_setup_nothing # define xchk_setup_rtsummary xchk_setup_nothing # define xchk_setup_rgsuperblock xchk_setup_nothing # define xchk_setup_rgbitmap xchk_setup_nothing # define xchk_setup_rtrmapbt xchk_setup_nothing +# define xchk_setup_rtrefcountbt xchk_setup_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_ino_dqattach(struct xfs_scrub *sc); @@ -174,7 +176,8 @@ void xchk_rt_unlock(struct xfs_scrub *sc, struct xchk_rt *sr); /* All the locks we need to check an rtgroup. 
*/ #define XCHK_RTGLOCK_ALL (XFS_RTGLOCK_BITMAP_SHARED | \ - XFS_RTGLOCK_RMAP) + XFS_RTGLOCK_RMAP | \ + XFS_RTGLOCK_REFCOUNT) int xchk_rtgroup_init(struct xfs_scrub *sc, xfs_rgnumber_t rgno, struct xchk_rt *sr, unsigned int rtglock_flags); diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c index 061f6f73b666..cb3b0b221275 100644 --- a/fs/xfs/scrub/health.c +++ b/fs/xfs/scrub/health.c @@ -114,6 +114,7 @@ static const struct xchk_health_map type_to_health_flag[XFS_SCRUB_TYPE_NR] = { [XFS_SCRUB_TYPE_NLINKS] = { XHG_FS, XFS_SICK_FS_NLINKS }, [XFS_SCRUB_TYPE_RGSUPER] = { XHG_RTGROUP, XFS_SICK_RT_SUPER }, [XFS_SCRUB_TYPE_RTRMAPBT] = { XHG_RTGROUP, XFS_SICK_RT_RMAPBT }, + [XFS_SCRUB_TYPE_RTREFCBT] = { XHG_RTGROUP, XFS_SICK_RT_REFCNTBT }, }; /* Return the health status mask for this scrub type. */ diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c index 3b19976b6066..be9739035226 100644 --- a/fs/xfs/scrub/inode.c +++ b/fs/xfs/scrub/inode.c @@ -468,6 +468,7 @@ xchk_dinode( xchk_ino_set_corrupt(sc, ino); break; case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_REFCOUNT: if (!S_ISREG(mode)) xchk_ino_set_corrupt(sc, ino); break; diff --git a/fs/xfs/scrub/rtrefcount.c b/fs/xfs/scrub/rtrefcount.c new file mode 100644 index 000000000000..528a056c7932 --- /dev/null +++ b/fs/xfs/scrub/rtrefcount.c @@ -0,0 +1,495 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_btree.h" +#include "xfs_rmap.h" +#include "xfs_refcount.h" +#include "xfs_inode.h" +#include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" + +/* Set us up with the realtime refcount metadata locked. 
*/ +int +xchk_setup_rtrefcountbt( + struct xfs_scrub *sc) +{ + struct xfs_rtgroup *rtg; + int error; + + if (xchk_need_fshook_drain(sc)) + xchk_fshooks_enable(sc, XCHK_FSHOOKS_DRAIN); + + rtg = xfs_rtgroup_get(sc->mp, sc->sm->sm_agno); + if (!rtg) + return -ENOENT; + + error = xchk_setup_rt(sc); + if (error) + goto out_rtg; + + error = xchk_install_live_inode(sc, rtg->rtg_refcountip); + if (error) + goto out_rtg; + + error = xchk_ino_dqattach(sc); + if (error) + goto out_rtg; + + error = xchk_rtgroup_init(sc, rtg->rtg_rgno, &sc->sr, XCHK_RTGLOCK_ALL); +out_rtg: + xfs_rtgroup_put(rtg); + return error; +} + +/* Realtime Reference count btree scrubber. */ + +/* + * Confirming Reference Counts via Reverse Mappings + * + * We want to count the reverse mappings overlapping a refcount record + * (bno, len, refcount), allowing for the possibility that some of the + * overlap may come from smaller adjoining reverse mappings, while some + * comes from single extents which overlap the range entirely. The + * outer loop is as follows: + * + * 1. For all reverse mappings overlapping the refcount extent, + * a. If a given rmap completely overlaps, mark it as seen. + * b. Otherwise, record the fragment (in agbno order) for later + * processing. + * + * Once we've seen all the rmaps, we know that for all blocks in the + * refcount record we want to find $refcount owners and we've already + * visited $seen extents that overlap all the blocks. Therefore, we + * need to find ($refcount - $seen) owners for every block in the + * extent; call that quantity $target_nr. Proceed as follows: + * + * 2. Pull the first $target_nr fragments from the list; all of them + * should start at or before the start of the extent. + * Call this subset of fragments the working set. + * 3. Until there are no more unprocessed fragments, + * a. Find the shortest fragments in the set and remove them. + * b. Note the block number of the end of these fragments. + * c. 
Pull the same number of fragments from the list. All of these + * fragments should start at the block number recorded in the + * previous step. + * d. Put those fragments in the set. + * 4. Check that there are $target_nr fragments remaining in the list, + * and that they all end at or beyond the end of the refcount extent. + * + * If the refcount is correct, all the check conditions in the algorithm + * should always hold true. If not, the refcount is incorrect. + */ +struct xchk_rtrefcnt_frag { + struct list_head list; + struct xfs_rmap_irec rm; +}; + +struct xchk_rtrefcnt_check { + struct xfs_scrub *sc; + struct list_head fragments; + + /* refcount extent we're examining */ + xfs_rgblock_t bno; + xfs_extlen_t len; + xfs_nlink_t refcount; + + /* number of owners seen */ + xfs_nlink_t seen; +}; + +/* + * Decide if the given rmap is large enough that we can redeem it + * towards refcount verification now, or if it's a fragment, in + * which case we'll hang onto it in the hopes that we'll later + * discover that we've collected exactly the correct number of + * fragments as the rtrefcountbt says we should have. + */ +STATIC int +xchk_rtrefcountbt_rmap_check( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xchk_rtrefcnt_check *refchk = priv; + struct xchk_rtrefcnt_frag *frag; + xfs_rgblock_t rm_last; + xfs_rgblock_t rc_last; + int error = 0; + + if (xchk_should_terminate(refchk->sc, &error)) + return error; + + rm_last = rec->rm_startblock + rec->rm_blockcount - 1; + rc_last = refchk->bno + refchk->len - 1; + + /* Confirm that a single-owner refc extent is a CoW stage. */ + if (refchk->refcount == 1 && rec->rm_owner != XFS_RMAP_OWN_COW) { + xchk_btree_xref_set_corrupt(refchk->sc, cur, 0); + return 0; + } + + if (rec->rm_startblock <= refchk->bno && rm_last >= rc_last) { + /* + * The rmap overlaps the refcount record, so we can confirm + * one refcount owner seen. 
+ */ + refchk->seen++; + } else { + /* + * This rmap covers only part of the refcount record, so + * save the fragment for later processing. If the rmapbt + * is healthy each rmap_irec we see will be in agbno order + * so we don't need insertion sort here. + */ + frag = kmalloc(sizeof(struct xchk_rtrefcnt_frag), + XCHK_GFP_FLAGS); + if (!frag) + return -ENOMEM; + memcpy(&frag->rm, rec, sizeof(frag->rm)); + list_add_tail(&frag->list, &refchk->fragments); + } + + return 0; +} + +/* + * Given a bunch of rmap fragments, iterate through them, keeping + * a running tally of the refcount. If this ever deviates from + * what we expect (which is the rtrefcountbt's refcount minus the + * number of extents that totally covered the rtrefcountbt extent), + * we have a rtrefcountbt error. + */ +STATIC void +xchk_rtrefcountbt_process_rmap_fragments( + struct xchk_rtrefcnt_check *refchk) +{ + struct list_head worklist; + struct xchk_rtrefcnt_frag *frag; + struct xchk_rtrefcnt_frag *n; + xfs_rgblock_t bno; + xfs_rgblock_t rbno; + xfs_rgblock_t next_rbno; + xfs_nlink_t nr; + xfs_nlink_t target_nr; + + target_nr = refchk->refcount - refchk->seen; + if (target_nr == 0) + return; + + /* + * There are (refchk->rc.rc_refcount - refchk->nr refcount) + * references we haven't found yet. Pull that many off the + * fragment list and figure out where the smallest rmap ends + * (and therefore the next rmap should start). All the rmaps + * we pull off should start at or before the beginning of the + * refcount record's range. + */ + INIT_LIST_HEAD(&worklist); + rbno = NULLRGBLOCK; + + /* Make sure the fragments actually /are/ in bno order. */ + bno = 0; + list_for_each_entry(frag, &refchk->fragments, list) { + if (frag->rm.rm_startblock < bno) + goto done; + bno = frag->rm.rm_startblock; + } + + /* + * Find all the rmaps that start at or before the refc extent, + * and put them on the worklist. 
+ */ + nr = 0; + list_for_each_entry_safe(frag, n, &refchk->fragments, list) { + if (frag->rm.rm_startblock > refchk->bno || nr > target_nr) + break; + bno = frag->rm.rm_startblock + frag->rm.rm_blockcount; + if (bno < rbno) + rbno = bno; + list_move_tail(&frag->list, &worklist); + nr++; + } + + /* + * We should have found exactly $target_nr rmap fragments starting + * at or before the refcount extent. + */ + if (nr != target_nr) + goto done; + + while (!list_empty(&refchk->fragments)) { + /* Discard any fragments ending at rbno from the worklist. */ + nr = 0; + next_rbno = NULLRGBLOCK; + list_for_each_entry_safe(frag, n, &worklist, list) { + bno = frag->rm.rm_startblock + frag->rm.rm_blockcount; + if (bno != rbno) { + if (bno < next_rbno) + next_rbno = bno; + continue; + } + list_del(&frag->list); + kfree(frag); + nr++; + } + + /* Try to add nr rmaps starting at rbno to the worklist. */ + list_for_each_entry_safe(frag, n, &refchk->fragments, list) { + bno = frag->rm.rm_startblock + frag->rm.rm_blockcount; + if (frag->rm.rm_startblock != rbno) + goto done; + list_move_tail(&frag->list, &worklist); + if (next_rbno > bno) + next_rbno = bno; + nr--; + if (nr == 0) + break; + } + + /* + * If we get here and nr > 0, this means that we added fewer + * items to the worklist than we discarded because the fragment + * list ran out of items. Therefore, we cannot maintain the + * required refcount. Something is wrong, so we're done. + */ + if (nr) + goto done; + + rbno = next_rbno; + } + + /* + * Make sure the last extent we processed ends at or beyond + * the end of the refcount extent. + */ + if (rbno < refchk->bno + refchk->len) + goto done; + + /* Actually record us having seen the remaining refcount. */ + refchk->seen = refchk->refcount; +done: + /* Delete fragments and work list. 
*/ + list_for_each_entry_safe(frag, n, &worklist, list) { + list_del(&frag->list); + kfree(frag); + } + list_for_each_entry_safe(frag, n, &refchk->fragments, list) { + list_del(&frag->list); + kfree(frag); + } +} + +/* Use the rmap entries covering this extent to verify the refcount. */ +STATIC void +xchk_rtrefcountbt_xref_rmap( + struct xfs_scrub *sc, + const struct xfs_refcount_irec *irec) +{ + struct xchk_rtrefcnt_check refchk = { + .sc = sc, + .bno = irec->rc_startblock, + .len = irec->rc_blockcount, + .refcount = irec->rc_refcount, + .seen = 0, + }; + struct xfs_rmap_irec low; + struct xfs_rmap_irec high; + struct xchk_rtrefcnt_frag *frag; + struct xchk_rtrefcnt_frag *n; + int error; + + if (!sc->sr.rmap_cur || xchk_skip_xref(sc->sm)) + return; + + /* Cross-reference with the rmapbt to confirm the refcount. */ + memset(&low, 0, sizeof(low)); + low.rm_startblock = irec->rc_startblock; + memset(&high, 0xFF, sizeof(high)); + high.rm_startblock = irec->rc_startblock + irec->rc_blockcount - 1; + + INIT_LIST_HEAD(&refchk.fragments); + error = xfs_rmap_query_range(sc->sr.rmap_cur, &low, &high, + xchk_rtrefcountbt_rmap_check, &refchk); + if (!xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur)) + goto out_free; + + xchk_rtrefcountbt_process_rmap_fragments(&refchk); + if (irec->rc_refcount != refchk.seen) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); + +out_free: + list_for_each_entry_safe(frag, n, &refchk.fragments, list) { + list_del(&frag->list); + kfree(frag); + } +} + +/* Cross-reference with the other btrees. */ +STATIC void +xchk_rtrefcountbt_xref( + struct xfs_scrub *sc, + const struct xfs_refcount_irec *irec) +{ + xfs_rtblock_t rtbno; + + if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + rtbno = xfs_rgbno_to_rtb(sc->mp, sc->sr.rtg->rtg_rgno, + irec->rc_startblock); + xchk_xref_is_used_rt_space(sc, rtbno, irec->rc_blockcount); + xchk_rtrefcountbt_xref_rmap(sc, irec); +} + +struct xchk_rtrefcbt_records { + /* Previous refcount record. 
*/ + struct xfs_refcount_irec prev_rec; + + /* Number of CoW blocks we expect. */ + xfs_extlen_t cow_blocks; +}; + +static inline bool +xchk_rtrefcount_mergeable( + struct xchk_rtrefcbt_records *rrc, + const struct xfs_refcount_irec *r2) +{ + const struct xfs_refcount_irec *r1 = &rrc->prev_rec; + + /* Ignore if prev_rec is not yet initialized. */ + if (r1->rc_blockcount == 0) + return false; + + if (r1->rc_startblock + r1->rc_blockcount != r2->rc_startblock) + return false; + if (r1->rc_refcount != r2->rc_refcount) + return false; + if ((unsigned long long)r1->rc_blockcount + r2->rc_blockcount > + XFS_REFC_LEN_MAX) + return false; + + return true; +} + +/* Flag failures for records that could be merged. */ +STATIC void +xchk_rtrefcountbt_check_mergeable( + struct xchk_btree *bs, + struct xchk_rtrefcbt_records *rrc, + const struct xfs_refcount_irec *irec) +{ + if (bs->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT) + return; + + if (xchk_rtrefcount_mergeable(rrc, irec)) + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + + memcpy(&rrc->prev_rec, irec, sizeof(struct xfs_refcount_irec)); +} + +/* Scrub a rtrefcountbt record. */ +STATIC int +xchk_rtrefcountbt_rec( + struct xchk_btree *bs, + const union xfs_btree_rec *rec) +{ + struct xfs_mount *mp = bs->cur->bc_mp; + struct xchk_rtrefcbt_records *rrc = bs->private; + struct xfs_refcount_irec irec; + u32 mod; + + xfs_refcount_btrec_to_irec(rec, &irec); + if (xfs_refcount_check_irec(bs->cur, &irec) != NULL) { + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + return 0; + } + + /* We can only share full rt extents.
*/ + xfs_rtb_to_rtx(mp, irec.rc_startblock, &mod); + if (mod) + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + xfs_rtb_to_rtx(mp, irec.rc_blockcount, &mod); + if (mod) + xchk_btree_set_corrupt(bs->sc, bs->cur, 0); + + if (irec.rc_domain == XFS_REFC_DOMAIN_COW) + rrc->cow_blocks += irec.rc_blockcount; + + xchk_rtrefcountbt_check_mergeable(bs, rrc, &irec); + xchk_rtrefcountbt_xref(bs->sc, &irec); + + return 0; +} + +/* Make sure we have as many refc blocks as the rmap says. */ +STATIC void +xchk_refcount_xref_rmap( + struct xfs_scrub *sc, + const struct xfs_owner_info *btree_oinfo, + xfs_extlen_t cow_blocks) +{ + xfs_extlen_t refcbt_blocks = 0; + xfs_filblks_t blocks; + int error; + + if (!sc->sr.rmap_cur || !sc->sa.rmap_cur || xchk_skip_xref(sc->sm)) + return; + + /* Check that we saw as many refcbt blocks as the rmap knows about. */ + error = xfs_btree_count_blocks(sc->sr.refc_cur, &refcbt_blocks); + if (!xchk_btree_process_error(sc, sc->sr.refc_cur, 0, &error)) + return; + error = xchk_count_rmap_ownedby_ag(sc, sc->sa.rmap_cur, btree_oinfo, + &blocks); + if (!xchk_should_check_xref(sc, &error, &sc->sa.rmap_cur)) + return; + if (blocks != refcbt_blocks) + xchk_btree_xref_set_corrupt(sc, sc->sa.rmap_cur, 0); + + /* Check that we saw as many cow blocks as the rmap knows about. */ + error = xchk_count_rmap_ownedby_ag(sc, sc->sr.rmap_cur, + &XFS_RMAP_OINFO_COW, &blocks); + if (!xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur)) + return; + if (blocks != cow_blocks) + xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0); +} + +/* Scrub the refcount btree for some AG. 
*/ +int +xchk_rtrefcountbt( + struct xfs_scrub *sc) +{ + struct xfs_owner_info btree_oinfo; + struct xchk_rtrefcbt_records rrc = { + .cow_blocks = 0, + }; + int error; + + error = xchk_metadata_inode_forks(sc); + if (error || (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)) + return error; + + xfs_rmap_ino_bmbt_owner(&btree_oinfo, sc->sr.rtg->rtg_refcountip->i_ino, + XFS_DATA_FORK); + error = xchk_btree(sc, sc->sr.refc_cur, xchk_rtrefcountbt_rec, + &btree_oinfo, &rrc); + if (error) + goto out_unlock; + + xchk_refcount_xref_rmap(sc, &btree_oinfo, rrc.cow_blocks); + +out_unlock: + return error; +} diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index ab7a36efab3b..ad6f297ae6cf 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -432,6 +432,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .has = xfs_has_rtrmapbt, .repair = xrep_rtrmapbt, }, + [XFS_SCRUB_TYPE_RTREFCBT] = { /* realtime refcountbt */ + .type = ST_RTGROUP, + .setup = xchk_setup_rtrefcountbt, + .scrub = xchk_rtrefcountbt, + .has = xfs_has_rtreflink, + .repair = xrep_notsupported, + }, }; static int diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h index d47db84e6b7f..3a9dd26eca7d 100644 --- a/fs/xfs/scrub/scrub.h +++ b/fs/xfs/scrub/scrub.h @@ -81,6 +81,7 @@ struct xchk_rt { /* rtgroup btrees */ struct xfs_btree_cur *rmap_cur; + struct xfs_btree_cur *refc_cur; }; struct xfs_scrub { @@ -194,12 +195,14 @@ int xchk_rtsummary(struct xfs_scrub *sc); int xchk_rgsuperblock(struct xfs_scrub *sc); int xchk_rgbitmap(struct xfs_scrub *sc); int xchk_rtrmapbt(struct xfs_scrub *sc); +int xchk_rtrefcountbt(struct xfs_scrub *sc); #else # define xchk_rtbitmap xchk_nothing # define xchk_rtsummary xchk_nothing # define xchk_rgsuperblock xchk_nothing # define xchk_rgbitmap xchk_nothing # define xchk_rtrmapbt xchk_nothing +# define xchk_rtrefcountbt xchk_nothing #endif #ifdef CONFIG_XFS_QUOTA int xchk_quota(struct xfs_scrub *sc); diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 
8d66ab10e1fd..8070d946ae1d 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -79,6 +79,7 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_HEALTHY); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGSUPER); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RGBITMAP); TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RTRMAPBT); +TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RTREFCBT); #define XFS_SCRUB_TYPE_STRINGS \ { XFS_SCRUB_TYPE_PROBE, "probe" }, \ @@ -111,7 +112,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_RTRMAPBT); { XFS_SCRUB_TYPE_HEALTHY, "healthy" }, \ { XFS_SCRUB_TYPE_RGSUPER, "rgsuper" }, \ { XFS_SCRUB_TYPE_RGBITMAP, "rgbitmap" }, \ - { XFS_SCRUB_TYPE_RTRMAPBT, "rtrmapbt" } + { XFS_SCRUB_TYPE_RTRMAPBT, "rtrmapbt" }, \ + { XFS_SCRUB_TYPE_RTREFCBT, "rtrefcountbt" } #define XFS_SCRUB_FLAG_STRINGS \ { XFS_SCRUB_IFLAG_REPAIR, "repair" }, \ ^ permalink raw reply related [flat|nested] 565+ messages in thread
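The invariant that the fragment worklist in xchk_rtrefcountbt_process_rmap_fragments() verifies can be stated much more simply, at the cost of speed: every block of the refcount extent must be covered by exactly $refcount reverse mappings. The sketch below is a naive per-block reference check, not the worklist algorithm from the patch. The struct and function names are hypothetical, and it ignores the special case where a refcount of 1 must belong to a CoW staging owner.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified extent: [start, start + len) in rtgroup blocks. */
struct sketch_extent {
	uint32_t	start;
	uint32_t	len;
};

/*
 * Return true if every block of the refcount extent (bno, len) is covered
 * by exactly @refcount of the given reverse mappings.  This costs
 * O(len * nr_rmaps), which is why a real scrubber would not do it this way.
 */
static bool
sketch_refcount_matches_rmaps(
	const struct sketch_extent	*rmaps,
	size_t				nr_rmaps,
	uint32_t			bno,
	uint32_t			len,
	uint32_t			refcount)
{
	uint64_t			end = (uint64_t)bno + len;
	uint64_t			b;
	size_t				i;

	assert(rmaps != NULL || nr_rmaps == 0);

	for (b = bno; b < end; b++) {
		uint32_t		owners = 0;

		for (i = 0; i < nr_rmaps; i++) {
			uint64_t	rm_end;

			rm_end = (uint64_t)rmaps[i].start + rmaps[i].len;
			if (b >= rmaps[i].start && b < rm_end)
				owners++;
		}
		if (owners != refcount)
			return false;
	}
	return true;
}
```

With the mappings {8, 16}, {10, 4} and {14, 6}, the first one covers the refcount extent (10, 10) completely and would be counted as "seen" by the patch, while the other two are fragments that tile the extent end to end. Every block therefore has two owners, so a refcount of 2 passes; dropping the last fragment leaves blocks 14 through 19 with a single owner and the check fails.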
* [PATCH 30/42] xfs: cross-reference checks with the rt refcount btree 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (32 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 29/42] xfs: scrub the realtime refcount btree Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 32/42] xfs: check reference counts of gaps between rt refcount records Darrick J. Wong ` (7 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Use the realtime refcount btree to implement cross-reference checks in other data structures. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/bmap.c | 27 ++++++++++++-- fs/xfs/scrub/rtbitmap.c | 2 + fs/xfs/scrub/rtrefcount.c | 86 +++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/rtrmap.c | 37 +++++++++++++++++++ fs/xfs/scrub/scrub.h | 9 +++++ 5 files changed, 156 insertions(+), 5 deletions(-) diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c index f18b22bc2548..8191a67598d0 100644 --- a/fs/xfs/scrub/bmap.c +++ b/fs/xfs/scrub/bmap.c @@ -352,11 +352,28 @@ xchk_bmap_rt_iextent_xref( xchk_xref_is_used_rt_space(info->sc, irec->br_startblock, irec->br_blockcount); xchk_bmap_xref_rmap(info, irec, rgbno); - - xfs_rmap_ino_owner(&oinfo, info->sc->ip->i_ino, info->whichfork, - irec->br_startoff); - xchk_xref_is_only_rt_owned_by(info->sc, rgbno, irec->br_blockcount, - &oinfo); + switch (info->whichfork) { + case XFS_DATA_FORK: + if (!xfs_is_reflink_inode(info->sc->ip)) { + xfs_rmap_ino_owner(&oinfo, info->sc->ip->i_ino, + info->whichfork, irec->br_startoff); + xchk_xref_is_only_rt_owned_by(info->sc, rgbno, + irec->br_blockcount, &oinfo); + xchk_xref_is_not_rt_shared(info->sc, rgbno, + irec->br_blockcount); + } + xchk_xref_is_not_rt_cow_staging(info->sc, rgbno, + irec->br_blockcount); + break; + case XFS_COW_FORK: + 
xchk_xref_is_only_rt_owned_by(info->sc, rgbno, + irec->br_blockcount, &XFS_RMAP_OINFO_COW); + xchk_xref_is_rt_cow_staging(info->sc, rgbno, + irec->br_blockcount); + xchk_xref_is_not_rt_shared(info->sc, rgbno, + irec->br_blockcount); + break; + } out_free: xchk_rtgroup_btcur_free(&info->sc->sr); diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c index ca478fbd514e..9419219a534f 100644 --- a/fs/xfs/scrub/rtbitmap.c +++ b/fs/xfs/scrub/rtbitmap.c @@ -121,6 +121,8 @@ xchk_rtbitmap_xref( rgbno = xfs_rtb_to_rgbno(sc->mp, startblock, &rgno); xchk_xref_has_no_rt_owner(sc, rgbno, blockcount); + xchk_xref_is_not_rt_shared(sc, rgbno, blockcount); + xchk_xref_is_not_rt_cow_staging(sc, rgbno, blockcount); if (rtb->next_free_rtblock < startblock) { xfs_rgblock_t next_rgbno; diff --git a/fs/xfs/scrub/rtrefcount.c b/fs/xfs/scrub/rtrefcount.c index 528a056c7932..05512f8443a2 100644 --- a/fs/xfs/scrub/rtrefcount.c +++ b/fs/xfs/scrub/rtrefcount.c @@ -493,3 +493,89 @@ xchk_rtrefcountbt( out_unlock: return error; } + +/* xref check that a cow staging extent is marked in the rtrefcountbt. */ +void +xchk_xref_is_rt_cow_staging( + struct xfs_scrub *sc, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + struct xfs_refcount_irec rc; + int has_refcount; + int error; + + if (!sc->sr.refc_cur || xchk_skip_xref(sc->sm)) + return; + + /* Find the CoW staging extent. */ + error = xfs_refcount_lookup_le(sc->sr.refc_cur, XFS_REFC_DOMAIN_COW, + bno, &has_refcount); + if (!xchk_should_check_xref(sc, &error, &sc->sr.refc_cur)) + return; + if (!has_refcount) { + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); + return; + } + + error = xfs_refcount_get_rec(sc->sr.refc_cur, &rc, &has_refcount); + if (!xchk_should_check_xref(sc, &error, &sc->sr.refc_cur)) + return; + if (!has_refcount) { + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); + return; + } + + /* CoW lookup returned a shared extent record? 
*/ + if (rc.rc_domain != XFS_REFC_DOMAIN_COW) + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); + + /* Must be at least as long as what was passed in */ + if (rc.rc_blockcount < len) + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); +} + +/* + * xref check that the extent is not shared. Only file data blocks + * can have multiple owners. + */ +void +xchk_xref_is_not_rt_shared( + struct xfs_scrub *sc, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + enum xbtree_recpacking outcome; + int error; + + if (!sc->sr.refc_cur || xchk_skip_xref(sc->sm)) + return; + + error = xfs_refcount_has_records(sc->sr.refc_cur, + XFS_REFC_DOMAIN_SHARED, bno, len, &outcome); + if (!xchk_should_check_xref(sc, &error, &sc->sr.refc_cur)) + return; + if (outcome != XBTREE_RECPACKING_EMPTY) + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); +} + +/* xref check that the extent is not being used for CoW staging. */ +void +xchk_xref_is_not_rt_cow_staging( + struct xfs_scrub *sc, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + enum xbtree_recpacking outcome; + int error; + + if (!sc->sr.refc_cur || xchk_skip_xref(sc->sm)) + return; + + error = xfs_refcount_has_records(sc->sr.refc_cur, XFS_REFC_DOMAIN_COW, + bno, len, &outcome); + if (!xchk_should_check_xref(sc, &error, &sc->sr.refc_cur)) + return; + if (outcome != XBTREE_RECPACKING_EMPTY) + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); +} diff --git a/fs/xfs/scrub/rtrmap.c b/fs/xfs/scrub/rtrmap.c index 5442325a6982..e89d5310117a 100644 --- a/fs/xfs/scrub/rtrmap.c +++ b/fs/xfs/scrub/rtrmap.c @@ -21,6 +21,7 @@ #include "xfs_inode.h" #include "xfs_rtalloc.h" #include "xfs_rtgroup.h" +#include "xfs_refcount.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -157,6 +158,37 @@ xchk_rtrmapbt_check_mergeable( memcpy(&cr->prev_rec, irec, sizeof(struct xfs_rmap_irec)); } +/* Cross-reference a rmap against the refcount btree.
*/ +STATIC void +xchk_rtrmapbt_xref_rtrefc( + struct xfs_scrub *sc, + struct xfs_rmap_irec *irec) +{ + xfs_rgblock_t fbno; + xfs_extlen_t flen; + bool is_inode; + bool is_bmbt; + bool is_attr; + bool is_unwritten; + int error; + + if (!sc->sr.refc_cur || xchk_skip_xref(sc->sm)) + return; + + is_inode = !XFS_RMAP_NON_INODE_OWNER(irec->rm_owner); + is_bmbt = irec->rm_flags & XFS_RMAP_BMBT_BLOCK; + is_attr = irec->rm_flags & XFS_RMAP_ATTR_FORK; + is_unwritten = irec->rm_flags & XFS_RMAP_UNWRITTEN; + + /* If this is shared, must be a data fork extent. */ + error = xfs_refcount_find_shared(sc->sr.refc_cur, irec->rm_startblock, + irec->rm_blockcount, &fbno, &flen, false); + if (!xchk_should_check_xref(sc, &error, &sc->sr.refc_cur)) + return; + if (flen != 0 && (!is_inode || is_attr || is_bmbt || is_unwritten)) + xchk_btree_xref_set_corrupt(sc, sc->sr.refc_cur, 0); +} + /* Cross-reference with other metadata. */ STATIC void xchk_rtrmapbt_xref( @@ -172,6 +204,11 @@ xchk_rtrmapbt_xref( irec->rm_startblock); xchk_xref_is_used_rt_space(sc, rtbno, irec->rm_blockcount); + if (irec->rm_owner == XFS_RMAP_OWN_COW) + xchk_xref_is_rt_cow_staging(sc, irec->rm_startblock, + irec->rm_blockcount); + else + xchk_rtrmapbt_xref_rtrefc(sc, irec); } /* Scrub a realtime rmapbt record.
*/
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 3a9dd26eca7d..0a3b151f9870 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -242,11 +242,20 @@ void xchk_xref_has_rt_owner(struct xfs_scrub *sc, xfs_rgblock_t rgbno,
 		xfs_extlen_t len);
 void xchk_xref_is_only_rt_owned_by(struct xfs_scrub *sc, xfs_rgblock_t rgbno,
 		xfs_extlen_t len, const struct xfs_owner_info *oinfo);
+void xchk_xref_is_rt_cow_staging(struct xfs_scrub *sc, xfs_rgblock_t rgbno,
+		xfs_extlen_t len);
+void xchk_xref_is_not_rt_shared(struct xfs_scrub *sc, xfs_rgblock_t rgbno,
+		xfs_extlen_t len);
+void xchk_xref_is_not_rt_cow_staging(struct xfs_scrub *sc, xfs_rgblock_t rgbno,
+		xfs_extlen_t len);
 #else
 # define xchk_xref_is_used_rt_space(sc, rtbno, len) do { } while (0)
 # define xchk_xref_has_no_rt_owner(sc, rtbno, len) do { } while (0)
 # define xchk_xref_has_rt_owner(sc, rtbno, len) do { } while (0)
 # define xchk_xref_is_only_rt_owned_by(sc, bno, len, oinfo) do { } while (0)
+# define xchk_xref_is_rt_cow_staging(sc, bno, len) do { } while (0)
+# define xchk_xref_is_not_rt_shared(sc, bno, len) do { } while (0)
+# define xchk_xref_is_not_rt_cow_staging(sc, bno, len) do { } while (0)
 #endif
 
 #endif /* __XFS_SCRUB_SCRUB_H__ */
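The rule that the rtrmap cross-reference in this patch enforces ("If this is shared, must be a data fork extent") can be sketched on its own. This is a minimal sketch, not the kernel code: `struct rmap_info`, the `RMAP_*` bit values, and both helper names are hypothetical stand-ins, and the real `XFS_RMAP_*` flag values are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration; not the kernel's XFS_RMAP_* values. */
#define RMAP_ATTR_FORK	(1u << 0)
#define RMAP_BMBT_BLOCK	(1u << 1)
#define RMAP_UNWRITTEN	(1u << 2)

/* Hypothetical, cut-down view of a reverse mapping record. */
struct rmap_info {
	bool		inode_owned;	/* owner is a file, not metadata */
	unsigned int	flags;		/* RMAP_* bits above */
};

/* Only written data fork extents owned by an inode may cover shared blocks. */
static bool
rmap_may_be_shared(const struct rmap_info *r)
{
	assert(r != NULL);
	return r->inode_owned &&
	       !(r->flags & (RMAP_ATTR_FORK | RMAP_BMBT_BLOCK | RMAP_UNWRITTEN));
}

/*
 * shared_len is the length of the shared region that a refcount lookup found
 * inside the mapping (0 means nothing is shared).  Finding shared blocks
 * under a mapping that cannot be shared is a cross-reference failure.
 */
static bool
xref_is_corrupt(const struct rmap_info *r, uint32_t shared_len)
{
	return shared_len != 0 && !rmap_may_be_shared(r);
}
```

The patch makes the same decision from `flen`, `is_inode`, `is_attr`, `is_bmbt` and `is_unwritten`; the sketch only isolates that boolean logic.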
* [PATCH 32/42] xfs: check reference counts of gaps between rt refcount records
  2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong
                   ` (33 preceding siblings ...)
  2022-12-30 22:18   ` [PATCH 30/42] xfs: cross-reference checks with the rt " Darrick J. Wong
@ 2022-12-30 22:18   ` Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 39/42] xfs: online repair of the realtime refcount btree Darrick J. Wong
                     ` (6 subsequent siblings)
  41 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If there's a gap between records in the rt refcount btree, we ought to
cross-reference the gap with the rtrmap records to make sure that there
aren't any overlapping records for a region that doesn't have any shared
ownership.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/rtrefcount.c |   81 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 80 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/scrub/rtrefcount.c b/fs/xfs/scrub/rtrefcount.c
index 05512f8443a2..3cb2ff8443da 100644
--- a/fs/xfs/scrub/rtrefcount.c
+++ b/fs/xfs/scrub/rtrefcount.c
@@ -15,6 +15,7 @@
 #include "xfs_inode.h"
 #include "xfs_rtbitmap.h"
 #include "xfs_rtgroup.h"
+#include "xfs_rtalloc.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/btree.h"
@@ -356,8 +357,14 @@ struct xchk_rtrefcbt_records {
 	/* Previous refcount record. */
 	struct xfs_refcount_irec	prev_rec;
 
+	/* The next rtgroup block where we aren't expecting shared extents. */
+	xfs_rgblock_t			next_unshared_rgbno;
+
 	/* Number of CoW blocks we expect. */
 	xfs_extlen_t			cow_blocks;
+
+	/* Was the last record a shared or CoW staging extent? */
+	enum xfs_refc_domain		prev_domain;
 };
 
 static inline bool
@@ -398,6 +405,53 @@ xchk_rtrefcountbt_check_mergeable(
 	memcpy(&rrc->prev_rec, irec, sizeof(struct xfs_refcount_irec));
 }
 
+STATIC int
+xchk_rtrefcountbt_rmap_check_gap(
+	struct xfs_btree_cur		*cur,
+	const struct xfs_rmap_irec	*rec,
+	void				*priv)
+{
+	xfs_rgblock_t			*next_bno = priv;
+
+	if (*next_bno != NULLRGBLOCK && rec->rm_startblock < *next_bno)
+		return -ECANCELED;
+
+	*next_bno = rec->rm_startblock + rec->rm_blockcount;
+	return 0;
+}
+
+/*
+ * Make sure that a gap in the reference count records does not correspond to
+ * overlapping records (i.e. shared extents) in the reverse mappings.
+ */
+static inline void
+xchk_rtrefcountbt_xref_gaps(
+	struct xfs_scrub	*sc,
+	struct xchk_rtrefcbt_records *rrc,
+	xfs_rtblock_t		bno)
+{
+	struct xfs_rmap_irec	low;
+	struct xfs_rmap_irec	high;
+	xfs_rgblock_t		next_bno = NULLRGBLOCK;
+	int			error;
+
+	if (bno <= rrc->next_unshared_rgbno || !sc->sr.rmap_cur ||
+	    xchk_skip_xref(sc->sm))
+		return;
+
+	memset(&low, 0, sizeof(low));
+	low.rm_startblock = rrc->next_unshared_rgbno;
+	memset(&high, 0xFF, sizeof(high));
+	high.rm_startblock = bno - 1;
+
+	error = xfs_rmap_query_range(sc->sr.rmap_cur, &low, &high,
+			xchk_rtrefcountbt_rmap_check_gap, &next_bno);
+	if (error == -ECANCELED)
+		xchk_btree_xref_set_corrupt(sc, sc->sr.rmap_cur, 0);
+	else
+		xchk_should_check_xref(sc, &error, &sc->sr.rmap_cur);
+}
+
 /* Scrub a rtrefcountbt record. */
 STATIC int
 xchk_rtrefcountbt_rec(
@@ -426,9 +480,26 @@ xchk_rtrefcountbt_rec(
 	if (irec.rc_domain == XFS_REFC_DOMAIN_COW)
 		rrc->cow_blocks += irec.rc_blockcount;
 
+	/* Shared records always come before CoW records. */
+	if (irec.rc_domain == XFS_REFC_DOMAIN_SHARED &&
+	    rrc->prev_domain == XFS_REFC_DOMAIN_COW)
+		xchk_btree_set_corrupt(bs->sc, bs->cur, 0);
+	rrc->prev_domain = irec.rc_domain;
+
 	xchk_rtrefcountbt_check_mergeable(bs, rrc, &irec);
 	xchk_rtrefcountbt_xref(bs->sc, &irec);
 
+	/*
+	 * If this is a record for a shared extent, check that all blocks
+	 * between the previous record and this one have at most one reverse
+	 * mapping.
+	 */
+	if (irec.rc_domain == XFS_REFC_DOMAIN_SHARED) {
+		xchk_rtrefcountbt_xref_gaps(bs->sc, rrc, irec.rc_startblock);
+		rrc->next_unshared_rgbno = irec.rc_startblock +
+					   irec.rc_blockcount;
+	}
+
 	return 0;
 }
 
@@ -473,7 +544,9 @@ xchk_rtrefcountbt(
 {
 	struct xfs_owner_info	btree_oinfo;
 	struct xchk_rtrefcbt_records rrc = {
-		.cow_blocks	= 0,
+		.cow_blocks		= 0,
+		.next_unshared_rgbno	= 0,
+		.prev_domain		= XFS_REFC_DOMAIN_SHARED,
 	};
 	int			error;
 
@@ -488,6 +561,12 @@ xchk_rtrefcountbt(
 	if (error)
 		goto out_unlock;
 
+	/*
+	 * Check that all blocks between the last refcount > 1 record and the
+	 * end of the rt volume have at most one reverse mapping.
+	 */
+	xchk_rtrefcountbt_xref_gaps(sc, &rrc, sc->mp->m_sb.sb_rblocks);
+
 	xchk_refcount_xref_rmap(sc, &btree_oinfo, rrc.cow_blocks);
 
 out_unlock:
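The gap check in this patch can be read as a single pass over reverse mappings in ascending start order. The block below is a minimal userspace sketch of that idea, assuming the records are already sorted the way a btree range query would return them. `struct rmap_rec`, `NO_BLOCK` and `gap_has_shared_blocks` are hypothetical names that stand in for the rmap record, `NULLRGBLOCK` and the query callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_BLOCK	UINT32_MAX	/* stand-in for NULLRGBLOCK */

/* Hypothetical, cut-down reverse mapping: a physical start and a length. */
struct rmap_rec {
	uint32_t	start;
	uint32_t	len;
};

/*
 * Walk the reverse mappings that fall inside a refcount gap, in ascending
 * start order, and report whether any record begins before the previous
 * one ended.  An overlap means some block has two owners even though the
 * refcount btree has no record for it.
 */
static bool
gap_has_shared_blocks(const struct rmap_rec *recs, size_t nr)
{
	uint32_t	next_bno = NO_BLOCK;
	size_t		i;

	assert(recs != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (next_bno != NO_BLOCK && recs[i].start < next_bno)
			return true;	/* the patch returns -ECANCELED here */
		next_bno = recs[i].start + recs[i].len;
	}
	return false;
}
```

Like the callback in the patch, the sketch remembers only the end of the most recent record rather than the furthest end seen so far.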
* [PATCH 39/42] xfs: online repair of the realtime refcount btree
  2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong
                   ` (34 preceding siblings ...)
  2022-12-30 22:18   ` [PATCH 32/42] xfs: check reference counts of gaps between rt refcount records Darrick J. Wong
@ 2022-12-30 22:18   ` Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 42/42] xfs: enable realtime reflink Darrick J. Wong
                     ` (5 subsequent siblings)
  41 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Port the data device's refcount btree repair code to the realtime
refcount btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile                  |    1 
 fs/xfs/libxfs/xfs_refcount.c     |    2 
 fs/xfs/libxfs/xfs_refcount.h     |    2 
 fs/xfs/scrub/bmap_repair.c       |    2 
 fs/xfs/scrub/repair.c            |   20 +
 fs/xfs/scrub/repair.h            |    7 
 fs/xfs/scrub/rtrefcount.c        |    9 
 fs/xfs/scrub/rtrefcount_repair.c |  783 ++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/rtrmap_repair.c     |    2 
 fs/xfs/scrub/scrub.c             |    2 
 fs/xfs/scrub/trace.h             |   31 ++
 11 files changed, 852 insertions(+), 9 deletions(-)
 create mode 100644 fs/xfs/scrub/rtrefcount_repair.c

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index cb1074c67dc5..2f84dff55b6e 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -223,6 +223,7 @@ xfs-y				+= $(addprefix scrub/, \
 xfs-$(CONFIG_XFS_RT)		+= $(addprefix scrub/, \
 				   rgsuper_repair.o \
 				   rtbitmap_repair.o \
+				   rtrefcount_repair.o \
 				   rtrmap_repair.o \
 				   rtsummary_repair.o \
 				   )
diff --git a/fs/xfs/libxfs/xfs_refcount.c b/fs/xfs/libxfs/xfs_refcount.c
index 8b878a7a5a3e..e3e349cad04f 100644
--- a/fs/xfs/libxfs/xfs_refcount.c
+++ b/fs/xfs/libxfs/xfs_refcount.c
@@ -155,7 +155,7 @@ xfs_refcount_check_perag_irec(
 	return NULL;
 }
 
-static inline xfs_failaddr_t
+inline xfs_failaddr_t
 xfs_refcount_check_rtgroup_irec(
 	struct xfs_rtgroup		*rtg,
 	const struct xfs_refcount_irec	*irec)
diff --git
a/fs/xfs/libxfs/xfs_refcount.h b/fs/xfs/libxfs/xfs_refcount.h index c7907119d10c..790d7fe9e67e 100644 --- a/fs/xfs/libxfs/xfs_refcount.h +++ b/fs/xfs/libxfs/xfs_refcount.h @@ -132,6 +132,8 @@ extern void xfs_refcount_btrec_to_irec(const union xfs_btree_rec *rec, struct xfs_refcount_irec *irec); xfs_failaddr_t xfs_refcount_check_perag_irec(struct xfs_perag *pag, const struct xfs_refcount_irec *irec); +xfs_failaddr_t xfs_refcount_check_rtgroup_irec(struct xfs_rtgroup *rtg, + const struct xfs_refcount_irec *irec); xfs_failaddr_t xfs_refcount_check_irec(struct xfs_btree_cur *cur, const struct xfs_refcount_irec *irec); extern int xfs_refcount_insert(struct xfs_btree_cur *cur, diff --git a/fs/xfs/scrub/bmap_repair.c b/fs/xfs/scrub/bmap_repair.c index 5dca4680657f..4df6ce7beef4 100644 --- a/fs/xfs/scrub/bmap_repair.c +++ b/fs/xfs/scrub/bmap_repair.c @@ -349,7 +349,7 @@ xrep_bmap_check_rtfork_rmap( /* Make sure this isn't free space. */ rtbno = xfs_rgbno_to_rtb(sc->mp, cur->bc_ino.rtg->rtg_rgno, rec->rm_startblock); - return xrep_require_rtext_inuse(sc, rtbno, rec->rm_blockcount); + return xrep_require_rtext_inuse(sc, rtbno, rec->rm_blockcount, false); } /* Record realtime extents that belong to this inode's fork. */ diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index 3bde5ea86cf5..566fff059384 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -1022,21 +1022,31 @@ xrep_rtgroup_init( return 0; } -/* Ensure that all rt blocks in the given range are not marked free. */ +/* + * Ensure that all rt blocks in the given range are not marked free. If + * @must_align is true, then both ends must be aligned to a rt extent. 
+ */ int xrep_require_rtext_inuse( struct xfs_scrub *sc, xfs_rtblock_t rtbno, - xfs_filblks_t len) + xfs_filblks_t len, + bool must_align) { struct xfs_mount *mp = sc->mp; xfs_rtxnum_t startrtx; xfs_rtxnum_t endrtx; + xfs_extlen_t mod; bool is_free = false; int error; - startrtx = xfs_rtb_to_rtxt(mp, rtbno); - endrtx = xfs_rtb_to_rtxt(mp, rtbno + len - 1); + startrtx = xfs_rtb_to_rtx(mp, rtbno, &mod); + if (must_align && mod != 0) + return -EFSCORRUPTED; + + endrtx = xfs_rtb_to_rtx(mp, rtbno + len - 1, &mod); + if (must_align && mod != mp->m_sb.sb_rextsize - 1) + return -EFSCORRUPTED; error = xfs_rtalloc_extent_is_free(mp, sc->tp, startrtx, endrtx - startrtx + 1, &is_free); @@ -1393,6 +1403,8 @@ xrep_is_rtmeta_ino( /* Newer rt metadata files are not guaranteed to exist */ if (rtg->rtg_rmapip && ino == rtg->rtg_rmapip->i_ino) return true; + if (rtg->rtg_refcountip && ino == rtg->rtg_refcountip->i_ino) + return true; return false; } diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 4a0cedea3fe0..aa15aeffa724 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -90,6 +90,7 @@ int xrep_setup_nlinks(struct xfs_scrub *sc); int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *resblks); int xrep_setup_rgbitmap(struct xfs_scrub *sc, unsigned int *resblks); int xrep_setup_rtrmapbt(struct xfs_scrub *sc); +int xrep_setup_rtrefcountbt(struct xfs_scrub *sc); int xrep_xattr_reset_fork(struct xfs_scrub *sc); @@ -108,7 +109,7 @@ int xrep_rtgroup_init(struct xfs_scrub *sc, struct xfs_rtgroup *rtg, struct xchk_rt *sr, unsigned int rtglock_flags); void xrep_rtgroup_btcur_init(struct xfs_scrub *sc, struct xchk_rt *sr); int xrep_require_rtext_inuse(struct xfs_scrub *sc, xfs_rtblock_t rtbno, - xfs_filblks_t len); + xfs_filblks_t len, bool must_align); xfs_extlen_t xrep_calc_rtgroup_resblks(struct xfs_scrub *sc); #else # define xrep_rtgroup_init(sc, rtg, sr, lockflags) (-ENOSYS) @@ -153,12 +154,14 @@ int xrep_rtsummary(struct xfs_scrub *sc); int 
xrep_rgsuperblock(struct xfs_scrub *sc); int xrep_rgbitmap(struct xfs_scrub *sc); int xrep_rtrmapbt(struct xfs_scrub *sc); +int xrep_rtrefcountbt(struct xfs_scrub *sc); #else # define xrep_rtbitmap xrep_notsupported # define xrep_rtsummary xrep_notsupported # define xrep_rgsuperblock xrep_notsupported # define xrep_rgbitmap xrep_notsupported # define xrep_rtrmapbt xrep_notsupported +# define xrep_rtrefcountbt xrep_notsupported #endif /* CONFIG_XFS_RT */ #ifdef CONFIG_XFS_QUOTA @@ -230,6 +233,7 @@ xrep_setup_nothing( #define xrep_setup_parent xrep_setup_nothing #define xrep_setup_nlinks xrep_setup_nothing #define xrep_setup_rtrmapbt xrep_setup_nothing +#define xrep_setup_rtrefcountbt xrep_setup_nothing #define xrep_setup_inode(sc, imap) ((void)0) @@ -286,6 +290,7 @@ static inline int xrep_setup_rgbitmap(struct xfs_scrub *sc, unsigned int *x) #define xrep_rgsuperblock xrep_notsupported #define xrep_rgbitmap xrep_notsupported #define xrep_rtrmapbt xrep_notsupported +#define xrep_rtrefcountbt xrep_notsupported #endif /* CONFIG_XFS_ONLINE_REPAIR */ diff --git a/fs/xfs/scrub/rtrefcount.c b/fs/xfs/scrub/rtrefcount.c index 3cb2ff8443da..8eb79f7030e7 100644 --- a/fs/xfs/scrub/rtrefcount.c +++ b/fs/xfs/scrub/rtrefcount.c @@ -7,8 +7,10 @@ #include "xfs_fs.h" #include "xfs_shared.h" #include "xfs_format.h" +#include "xfs_log_format.h" #include "xfs_trans_resv.h" #include "xfs_mount.h" +#include "xfs_trans.h" #include "xfs_btree.h" #include "xfs_rmap.h" #include "xfs_refcount.h" @@ -19,6 +21,7 @@ #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/btree.h" +#include "scrub/repair.h" /* Set us up with the realtime refcount metadata locked. 
*/ int @@ -31,6 +34,12 @@ xchk_setup_rtrefcountbt( if (xchk_need_fshook_drain(sc)) xchk_fshooks_enable(sc, XCHK_FSHOOKS_DRAIN); + if (xchk_could_repair(sc)) { + error = xrep_setup_rtrefcountbt(sc); + if (error) + return error; + } + rtg = xfs_rtgroup_get(sc->mp, sc->sm->sm_agno); if (!rtg) return -ENOENT; diff --git a/fs/xfs/scrub/rtrefcount_repair.c b/fs/xfs/scrub/rtrefcount_repair.c new file mode 100644 index 000000000000..f56966aaaad8 --- /dev/null +++ b/fs/xfs/scrub/rtrefcount_repair.c @@ -0,0 +1,783 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#include "xfs.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_btree.h" +#include "xfs_btree_staging.h" +#include "xfs_bit.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_sb.h" +#include "xfs_alloc.h" +#include "xfs_ialloc.h" +#include "xfs_rmap.h" +#include "xfs_rmap_btree.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_refcount.h" +#include "xfs_rtrefcount_btree.h" +#include "xfs_error.h" +#include "xfs_health.h" +#include "xfs_inode.h" +#include "xfs_quota.h" +#include "xfs_rtalloc.h" +#include "xfs_ag.h" +#include "xfs_rtgroup.h" +#include "scrub/xfs_scrub.h" +#include "scrub/scrub.h" +#include "scrub/common.h" +#include "scrub/btree.h" +#include "scrub/trace.h" +#include "scrub/repair.h" +#include "scrub/bitmap.h" +#include "scrub/xfile.h" +#include "scrub/xfarray.h" +#include "scrub/newbt.h" +#include "scrub/reap.h" +#include "scrub/rcbag.h" + +/* + * Rebuilding the Reference Count Btree + * ==================================== + * + * This algorithm is "borrowed" from xfs_repair. 
Imagine the rmap + * entries as rectangles representing extents of physical blocks, and + * that the rectangles can be laid down to allow them to overlap each + * other; then we know that we must emit a refcnt btree entry wherever + * the amount of overlap changes, i.e. the emission stimulus is + * level-triggered: + * + * - --- + * -- ----- ---- --- ------ + * -- ---- ----------- ---- --------- + * -------------------------------- ----------- + * ^ ^ ^^ ^^ ^ ^^ ^^^ ^^^^ ^ ^^ ^ ^ ^ + * 2 1 23 21 3 43 234 2123 1 01 2 3 0 + * + * For our purposes, a rmap is a tuple (startblock, len, fileoff, owner). + * + * Note that in the actual refcnt btree we don't store the refcount < 2 + * cases because the bnobt tells us which blocks are free; single-use + * blocks aren't recorded in the bnobt or the refcntbt. If the rmapbt + * supports storing multiple entries covering a given block we could + * theoretically dispense with the refcntbt and simply count rmaps, but + * that's inefficient in the (hot) write path, so we'll take the cost of + * the extra tree to save time. Also there's no guarantee that rmap + * will be enabled. + * + * Given an array of rmaps sorted by physical block number, a starting + * physical block (sp), a bag to hold rmaps that cover sp, and the next + * physical block where the level changes (np), we can reconstruct the + * rt refcount btree as follows: + * + * While there are still unprocessed rmaps in the array, + * - Set sp to the physical block (pblk) of the next unprocessed rmap. + * - Add to the bag all rmaps in the array where startblock == sp. + * - Set np to the physical block where the bag size will change. This + * is the minimum of (the pblk of the next unprocessed rmap) and + * (startblock + len of each rmap in the bag). + * - Record the bag size as old_bag_size. + * + * - While the bag isn't empty, + * - Remove from the bag all rmaps where startblock + len == np. + * - Add to the bag all rmaps in the array where startblock == np. 
+ * - If the bag size isn't old_bag_size, store the refcount entry + * (sp, np - sp, bag_size) in the refcnt btree. + * - If the bag is empty, break out of the inner loop. + * - Set old_bag_size to the bag size + * - Set sp = np. + * - Set np to the physical block where the bag size will change. + * This is the minimum of (the pblk of the next unprocessed rmap) + * and (startblock + len of each rmap in the bag). + * + * Like all the other repairers, we make a list of all the refcount + * records we need, then reinitialize the rt refcount btree root and + * insert all the records. + */ + +struct xrep_rtrefc { + /* refcount extents */ + struct xfarray *refcount_records; + + /* new refcountbt information */ + struct xrep_newbt new_btree; + + /* old refcountbt blocks */ + struct xfsb_bitmap old_rtrefcountbt_blocks; + + struct xfs_scrub *sc; + + /* get_records()'s position in the rt refcount record array. */ + xfarray_idx_t array_cur; + + /* # of refcountbt blocks */ + xfs_filblks_t btblocks; +}; + +/* Set us up to repair refcount btrees. */ +int +xrep_setup_rtrefcountbt( + struct xfs_scrub *sc) +{ + return xrep_setup_buftarg(sc, "rtrefcount bag"); +} + +/* Check for any obvious conflicts with this shared/CoW staging extent. */ +STATIC int +xrep_rtrefc_check_ext( + struct xfs_scrub *sc, + const struct xfs_refcount_irec *rec) +{ + xfs_rtblock_t rtbno; + + if (xfs_refcount_check_rtgroup_irec(sc->sr.rtg, rec) != NULL) + return -EFSCORRUPTED; + + /* Make sure this isn't free space or misaligned. */ + rtbno = xfs_rgbno_to_rtb(sc->mp, sc->sr.rtg->rtg_rgno, + rec->rc_startblock); + return xrep_require_rtext_inuse(sc, rtbno, rec->rc_blockcount, true); +} + +/* Record a reference count extent. 
*/ +STATIC int +xrep_rtrefc_stash( + struct xrep_rtrefc *rr, + enum xfs_refc_domain domain, + xfs_rgblock_t bno, + xfs_extlen_t len, + uint64_t refcount) +{ + struct xfs_refcount_irec irec = { + .rc_startblock = bno, + .rc_blockcount = len, + .rc_refcount = refcount, + .rc_domain = domain, + }; + int error = 0; + + if (xchk_should_terminate(rr->sc, &error)) + return error; + + irec.rc_refcount = min_t(uint64_t, XFS_REFC_REFCOUNT_MAX, refcount); + + error = xrep_rtrefc_check_ext(rr->sc, &irec); + if (error) + return error; + + trace_xrep_rtrefc_found(rr->sc->sr.rtg, &irec); + + return xfarray_append(rr->refcount_records, &irec); +} + +/* Record a CoW staging extent. */ +STATIC int +xrep_rtrefc_stash_cow( + struct xrep_rtrefc *rr, + xfs_rgblock_t bno, + xfs_extlen_t len) +{ + return xrep_rtrefc_stash(rr, XFS_REFC_DOMAIN_COW, bno, len, 1); +} + +/* Decide if an rmap could describe a shared extent. */ +static inline bool +xrep_rtrefc_rmap_shareable( + const struct xfs_rmap_irec *rmap) +{ + /* rt metadata are never sharable */ + if (XFS_RMAP_NON_INODE_OWNER(rmap->rm_owner)) + return false; + + /* Unwritten file blocks are not shareable. */ + if (rmap->rm_flags & XFS_RMAP_UNWRITTEN) + return false; + + return true; +} + +/* Grab the next (abbreviated) rmap record from the rmapbt. */ +STATIC int +xrep_rtrefc_walk_rmaps( + struct xrep_rtrefc *rr, + struct xfs_rmap_irec *rmap, + bool *have_rec) +{ + struct xfs_btree_cur *cur = rr->sc->sr.rmap_cur; + struct xfs_mount *mp = cur->bc_mp; + int have_gt; + int error = 0; + + *have_rec = false; + + /* + * Loop through the remaining rmaps. Remember CoW staging + * extents and the refcountbt blocks from the old tree for later + * disposal. We can only share written data fork extents, so + * keep looping until we find an rmap for one. 
+ */ + do { + if (xchk_should_terminate(rr->sc, &error)) + return error; + + error = xfs_btree_increment(cur, 0, &have_gt); + if (error) + return error; + if (!have_gt) + return 0; + + error = xfs_rmap_get_rec(cur, rmap, &have_gt); + if (error) + return error; + if (XFS_IS_CORRUPT(mp, !have_gt)) { + xfs_btree_mark_sick(cur); + return -EFSCORRUPTED; + } + + if (rmap->rm_owner == XFS_RMAP_OWN_COW) { + error = xrep_rtrefc_stash_cow(rr, rmap->rm_startblock, + rmap->rm_blockcount); + if (error) + return error; + } else if (xfs_internal_inum(mp, rmap->rm_owner) || + (rmap->rm_flags & (XFS_RMAP_ATTR_FORK | + XFS_RMAP_BMBT_BLOCK))) { + xfs_btree_mark_sick(cur); + return -EFSCORRUPTED; + } + } while (!xrep_rtrefc_rmap_shareable(rmap)); + + *have_rec = true; + return 0; +} + +static inline uint32_t +xrep_rtrefc_encode_startblock( + const struct xfs_refcount_irec *irec) +{ + uint32_t start; + + start = irec->rc_startblock & ~XFS_REFC_COWFLAG; + if (irec->rc_domain == XFS_REFC_DOMAIN_COW) + start |= XFS_REFC_COWFLAG; + + return start; +} + +/* + * Compare two refcount records. We want to sort in order of increasing block + * number. + */ +static int +xrep_rtrefc_extent_cmp( + const void *a, + const void *b) +{ + const struct xfs_refcount_irec *ap = a; + const struct xfs_refcount_irec *bp = b; + uint32_t sa, sb; + + sa = xrep_rtrefc_encode_startblock(ap); + sb = xrep_rtrefc_encode_startblock(bp); + + if (sa > sb) + return 1; + if (sa < sb) + return -1; + return 0; +} + +/* + * Sort the refcount extents by startblock or else the btree records will be in + * the wrong order. Make sure the records do not overlap in physical space. 
+ */ +STATIC int +xrep_rtrefc_sort_records( + struct xrep_rtrefc *rr) +{ + struct xfs_refcount_irec irec; + xfarray_idx_t cur; + enum xfs_refc_domain dom = XFS_REFC_DOMAIN_SHARED; + xfs_rgblock_t next_rgbno = 0; + int error; + + error = xfarray_sort(rr->refcount_records, xrep_rtrefc_extent_cmp, + XFARRAY_SORT_KILLABLE); + if (error) + return error; + + foreach_xfarray_idx(rr->refcount_records, cur) { + if (xchk_should_terminate(rr->sc, &error)) + return error; + + error = xfarray_load(rr->refcount_records, cur, &irec); + if (error) + return error; + + if (dom == XFS_REFC_DOMAIN_SHARED && + irec.rc_domain == XFS_REFC_DOMAIN_COW) { + dom = irec.rc_domain; + next_rgbno = 0; + } + + if (dom != irec.rc_domain) + return -EFSCORRUPTED; + if (irec.rc_startblock < next_rgbno) + return -EFSCORRUPTED; + + next_rgbno = irec.rc_startblock + irec.rc_blockcount; + } + + return error; +} + +/* Record extents that belong to the realtime refcount inode. */ +STATIC int +xrep_rtrefc_walk_rmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_rtrefc *rr = priv; + struct xfs_mount *mp = cur->bc_mp; + xfs_fsblock_t fsbno; + int error = 0; + + if (xchk_should_terminate(rr->sc, &error)) + return error; + + /* Skip extents which are not owned by this inode and fork. */ + if (rec->rm_owner != rr->sc->ip->i_ino) + return 0; + + error = xrep_check_ino_btree_mapping(rr->sc, rec); + if (error) + return error; + + fsbno = XFS_AGB_TO_FSB(mp, cur->bc_ag.pag->pag_agno, + rec->rm_startblock); + + return xfsb_bitmap_set(&rr->old_rtrefcountbt_blocks, fsbno, + rec->rm_blockcount); +} + +/* + * Walk forward through the rmap btree to collect all rmaps starting at + * @bno in @rmap_bag. These represent the file(s) that share ownership of + * the current block. Upon return, the rmap cursor points to the last record + * satisfying the startblock constraint. 
+ */ +static int +xrep_rtrefc_push_rmaps_at( + struct xrep_rtrefc *rr, + struct rcbag *rcstack, + xfs_rgblock_t bno, + struct xfs_rmap_irec *rmap, + bool *have) +{ + struct xfs_scrub *sc = rr->sc; + int have_gt; + int error; + + while (*have && rmap->rm_startblock == bno) { + error = rcbag_add(rcstack, rr->sc->tp, rmap); + if (error) + return error; + + error = xrep_rtrefc_walk_rmaps(rr, rmap, have); + if (error) + return error; + } + + error = xfs_btree_decrement(sc->sr.rmap_cur, 0, &have_gt); + if (error) + return error; + if (XFS_IS_CORRUPT(sc->mp, !have_gt)) { + xfs_btree_mark_sick(sc->sr.rmap_cur); + return -EFSCORRUPTED; + } + + return 0; +} + +/* Scan one AG for reverse mappings for the realtime refcount btree. */ +STATIC int +xrep_rtrefc_scan_ag( + struct xrep_rtrefc *rr, + struct xfs_perag *pag) +{ + struct xfs_scrub *sc = rr->sc; + int error; + + error = xrep_ag_init(sc, pag, &sc->sa); + if (error) + return error; + + error = xfs_rmap_query_all(sc->sa.rmap_cur, xrep_rtrefc_walk_rmap, rr); + xchk_ag_free(sc, &sc->sa); + return error; +} + +/* Iterate all the rmap records to generate reference count data. */ +STATIC int +xrep_rtrefc_find_refcounts( + struct xrep_rtrefc *rr) +{ + struct xfs_scrub *sc = rr->sc; + struct rcbag *rcstack; + struct xfs_perag *pag; + uint64_t old_stack_height; + xfs_rgblock_t sbno; + xfs_rgblock_t cbno; + xfs_rgblock_t nbno; + xfs_agnumber_t agno; + bool have; + int error; + + /* Scan for old rtrefc btree blocks. */ + for_each_perag(sc->mp, agno, pag) { + error = xrep_rtrefc_scan_ag(rr, pag); + if (error) { + xfs_perag_put(pag); + return error; + } + } + + xrep_rtgroup_btcur_init(sc, &sc->sr); + + /* + * Set up a bag to store all the rmap records that we're tracking to + * generate a reference count record. If this exceeds + * XFS_REFC_REFCOUNT_MAX, we clamp rc_refcount. + */ + error = rcbag_init(sc->mp, sc->xfile_buftarg, &rcstack); + if (error) + goto out_cur; + + /* Start the rtrmapbt cursor to the left of all records. 
*/ + error = xfs_btree_goto_left_edge(sc->sr.rmap_cur); + if (error) + goto out_bag; + + /* Process reverse mappings into refcount data. */ + while (xfs_btree_has_more_records(sc->sr.rmap_cur)) { + struct xfs_rmap_irec rmap; + + /* Push all rmaps with pblk == sbno onto the stack */ + error = xrep_rtrefc_walk_rmaps(rr, &rmap, &have); + if (error) + goto out_bag; + if (!have) + break; + sbno = cbno = rmap.rm_startblock; + error = xrep_rtrefc_push_rmaps_at(rr, rcstack, sbno, &rmap, + &have); + if (error) + goto out_bag; + + /* Set nbno to the bno of the next refcount change */ + error = rcbag_next_edge(rcstack, sc->tp, &rmap, have, &nbno); + if (error) + goto out_bag; + + ASSERT(nbno > sbno); + old_stack_height = rcbag_count(rcstack); + + /* While stack isn't empty... */ + while (rcbag_count(rcstack) > 0) { + /* Pop all rmaps that end at nbno */ + error = rcbag_remove_ending_at(rcstack, sc->tp, nbno); + if (error) + goto out_bag; + + /* Push array items that start at nbno */ + error = xrep_rtrefc_walk_rmaps(rr, &rmap, &have); + if (error) + goto out_bag; + if (have) { + error = xrep_rtrefc_push_rmaps_at(rr, rcstack, + nbno, &rmap, &have); + if (error) + goto out_bag; + } + + /* Emit refcount if necessary */ + ASSERT(nbno > cbno); + if (rcbag_count(rcstack) != old_stack_height) { + if (old_stack_height > 1) { + error = xrep_rtrefc_stash(rr, + XFS_REFC_DOMAIN_SHARED, + cbno, nbno - cbno, + old_stack_height); + if (error) + goto out_bag; + } + cbno = nbno; + } + + /* Stack empty, go find the next rmap */ + if (rcbag_count(rcstack) == 0) + break; + old_stack_height = rcbag_count(rcstack); + sbno = nbno; + + /* Set nbno to the bno of the next refcount change */ + error = rcbag_next_edge(rcstack, sc->tp, &rmap, have, + &nbno); + if (error) + goto out_bag; + + ASSERT(nbno > sbno); + } + } + + ASSERT(rcbag_count(rcstack) == 0); +out_bag: + rcbag_free(&rcstack); +out_cur: + xchk_rtgroup_btcur_free(&sc->sr); + return error; +} + +/* Retrieve refcountbt data for bulk load. 
*/ +STATIC int +xrep_rtrefc_get_records( + struct xfs_btree_cur *cur, + unsigned int idx, + struct xfs_btree_block *block, + unsigned int nr_wanted, + void *priv) +{ + struct xrep_rtrefc *rr = priv; + union xfs_btree_rec *block_rec; + unsigned int loaded; + int error; + + for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { + error = xfarray_load(rr->refcount_records, rr->array_cur++, + &cur->bc_rec.rc); + if (error) + return error; + + block_rec = xfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + } + + return loaded; +} + +/* Feed one of the new btree blocks to the bulk loader. */ +STATIC int +xrep_rtrefc_claim_block( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr, + void *priv) +{ + struct xrep_rtrefc *rr = priv; + int error; + + error = xrep_newbt_relog_autoreap(&rr->new_btree); + if (error) + return error; + + return xrep_newbt_claim_block(cur, &rr->new_btree, ptr); +} + +/* Figure out how much space we need to create the incore btree root block. */ +STATIC size_t +xrep_rtrefc_iroot_size( + struct xfs_btree_cur *cur, + unsigned int level, + unsigned int nr_this_level, + void *priv) +{ + return xfs_rtrefcount_broot_space_calc(cur->bc_mp, level, + nr_this_level); +} + +/* + * Use the collected refcount information to stage a new rt refcount btree. If + * this is successful we'll return with the new btree root information logged + * to the repair transaction but not yet committed. 
+ */ +STATIC int +xrep_rtrefc_build_new_tree( + struct xrep_rtrefc *rr) +{ + struct xfs_owner_info oinfo; + struct xfs_scrub *sc = rr->sc; + struct xfs_mount *mp = sc->mp; + struct xfs_rtgroup *rtg = sc->sr.rtg; + struct xfs_btree_cur *refc_cur; + int error; + + error = xrep_rtrefc_sort_records(rr); + if (error) + return error; + + /* + * Prepare to construct the new btree by reserving disk space for the + * new btree and setting up all the accounting information we'll need + * to root the new btree while it's under construction and before we + * attach it to the realtime refcount inode. + */ + xfs_rmap_ino_bmbt_owner(&oinfo, rtg->rtg_refcountip->i_ino, + XFS_DATA_FORK); + error = xrep_newbt_init_inode(&rr->new_btree, sc, XFS_DATA_FORK, + &oinfo); + if (error) + return error; + rr->new_btree.bload.get_records = xrep_rtrefc_get_records; + rr->new_btree.bload.claim_block = xrep_rtrefc_claim_block; + rr->new_btree.bload.iroot_size = xrep_rtrefc_iroot_size; + + /* Compute how many blocks we'll need. */ + refc_cur = xfs_rtrefcountbt_stage_cursor(mp, rtg, rtg->rtg_refcountip, + &rr->new_btree.ifake); + error = xfs_btree_bload_compute_geometry(refc_cur, &rr->new_btree.bload, + xfarray_length(rr->refcount_records)); + if (error) + goto err_cur; + + /* Last chance to abort before we start committing fixes. */ + if (xchk_should_terminate(sc, &error)) + goto err_cur; + + /* + * Guess how many blocks we're going to need to rebuild an entire + * rtrefcountbt from the number of extents we found, and pump up our + * transaction to have sufficient block reservation. We're allowed + * to exceed quota to repair inconsistent metadata, though this is + * unlikely. + */ + error = xfs_trans_reserve_more_inode(sc->tp, rtg->rtg_refcountip, + rr->new_btree.bload.nr_blocks, 0, true); + if (error) + goto err_cur; + + /* Reserve the space we'll need for the new btree. 
*/ + error = xrep_newbt_alloc_blocks(&rr->new_btree, + rr->new_btree.bload.nr_blocks); + if (error) + goto err_cur; + + /* Add all observed refcount records. */ + rr->new_btree.ifake.if_fork->if_format = XFS_DINODE_FMT_REFCOUNT; + rr->array_cur = XFARRAY_CURSOR_INIT; + error = xfs_btree_bload(refc_cur, &rr->new_btree.bload, rr); + if (error) + goto err_cur; + + /* + * Install the new rtrefc btree in the inode. After this point the old + * btree is no longer accessible, the new tree is live, and we can + * delete the cursor. + */ + xfs_rtrefcountbt_commit_staged_btree(refc_cur, sc->tp); + xrep_inode_set_nblocks(rr->sc, rr->new_btree.ifake.if_blocks); + xfs_btree_del_cursor(refc_cur, 0); + + /* Dispose of any unused blocks and the accounting information. */ + error = xrep_newbt_commit(&rr->new_btree); + if (error) + return error; + + return xrep_roll_trans(sc); +err_cur: + xfs_btree_del_cursor(refc_cur, error); + xrep_newbt_cancel(&rr->new_btree); + return error; +} + +/* + * Now that we've logged the roots of the new btrees, invalidate all of the + * old blocks and free them. + */ +STATIC int +xrep_rtrefc_remove_old_tree( + struct xrep_rtrefc *rr) +{ + struct xfs_owner_info oinfo; + int error; + + xfs_rmap_ino_bmbt_owner(&oinfo, rr->sc->ip->i_ino, XFS_DATA_FORK); + + /* + * Free all the extents that were allocated to the former rtrefcountbt + * and aren't cross-linked with something else. If the incore space + * reservation for the rtrmap inode is insufficient, this will refill + * it. + */ + error = xrep_reap_fsblocks(rr->sc, &rr->old_rtrefcountbt_blocks, + &oinfo, XFS_AG_RESV_IMETA); + if (error) + return error; + + /* + * Ensure the proper reservation for the rtrefcount inode so that we + * don't fail to expand the btree. + */ + return xrep_reset_imeta_reservation(rr->sc); +} + +/* Rebuild the rt refcount btree. 
*/ +int +xrep_rtrefcountbt( + struct xfs_scrub *sc) +{ + struct xrep_rtrefc *rr; + struct xfs_mount *mp = sc->mp; + int error; + + /* We require the rmapbt to rebuild anything. */ + if (!xfs_has_rtrmapbt(mp)) + return -EOPNOTSUPP; + + /* Make sure any problems with the fork are fixed. */ + error = xrep_metadata_inode_forks(sc); + if (error) + return error; + + rr = kzalloc(sizeof(struct xrep_rtrefc), XCHK_GFP_FLAGS); + if (!rr) + return -ENOMEM; + rr->sc = sc; + + /* Set up enough storage to handle one refcount record per rt extent. */ + error = xfarray_create(mp, "rtrefcount records", + mp->m_sb.sb_rextents, + sizeof(struct xfs_refcount_irec), + &rr->refcount_records); + if (error) + goto out_rr; + + /* Collect all reference counts. */ + xfsb_bitmap_init(&rr->old_rtrefcountbt_blocks); + error = xrep_rtrefc_find_refcounts(rr); + if (error) + goto out_bitmap; + + xfs_trans_ijoin(sc->tp, sc->ip, 0); + + /* Rebuild the refcount information. */ + error = xrep_rtrefc_build_new_tree(rr); + if (error) + goto out_bitmap; + + /* Kill the old tree. */ + error = xrep_rtrefc_remove_old_tree(rr); + +out_bitmap: + xfsb_bitmap_destroy(&rr->old_rtrefcountbt_blocks); + xfarray_destroy(rr->refcount_records); +out_rr: + kfree(rr); + return error; +} diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c index 36c03e48c3fb..fb841036b89f 100644 --- a/fs/xfs/scrub/rtrmap_repair.c +++ b/fs/xfs/scrub/rtrmap_repair.c @@ -119,7 +119,7 @@ xrep_rtrmap_check_mapping( /* Make sure this isn't free space. */ rtbno = xfs_rgbno_to_rtb(sc->mp, sc->sr.rtg->rtg_rgno, rec->rm_startblock); - return xrep_require_rtext_inuse(sc, rtbno, rec->rm_blockcount); + return xrep_require_rtext_inuse(sc, rtbno, rec->rm_blockcount, false); } /* Store a reverse-mapping record. 
*/ diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c index ad6f297ae6cf..2f60fd6b86a9 100644 --- a/fs/xfs/scrub/scrub.c +++ b/fs/xfs/scrub/scrub.c @@ -437,7 +437,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = { .setup = xchk_setup_rtrefcountbt, .scrub = xchk_rtrefcountbt, .has = xfs_has_rtreflink, - .repair = xrep_notsupported, + .repair = xrep_rtrefcountbt, }, }; diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index 8070d946ae1d..d74bba391854 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -3063,6 +3063,37 @@ TRACE_EVENT(xrep_rtrmap_live_update, __entry->offset, __entry->flags) ); + +TRACE_EVENT(xrep_rtrefc_found, + TP_PROTO(struct xfs_rtgroup *rtg, const struct xfs_refcount_irec *rec), + TP_ARGS(rtg, rec), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rgnumber_t, rgno) + __field(enum xfs_refc_domain, domain) + __field(xfs_rgblock_t, startblock) + __field(xfs_extlen_t, blockcount) + __field(xfs_nlink_t, refcount) + ), + TP_fast_assign( + __entry->dev = rtg->rtg_mount->m_super->s_dev; + __entry->rtdev = rtg->rtg_mount->m_rtdev_targp->bt_dev; + __entry->rgno = rtg->rtg_rgno; + __entry->domain = rec->rc_domain; + __entry->startblock = rec->rc_startblock; + __entry->blockcount = rec->rc_blockcount; + __entry->refcount = rec->rc_refcount; + ), + TP_printk("dev %d:%d rtdev %d:%d rgno 0x%x dom %s rgbno 0x%x fsbcount 0x%x refcount %u", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rgno, + __print_symbolic(__entry->domain, XFS_REFC_DOMAIN_STRINGS), + __entry->startblock, + __entry->blockcount, + __entry->refcount) +); #endif /* CONFIG_XFS_RT */ #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 42/42] xfs: enable realtime reflink 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (35 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 39/42] xfs: online repair of the realtime refcount btree Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 36/42] xfs: check new rtbitmap records against rt refcount btree Darrick J. Wong ` (4 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Enable reflink for realtime devices, sort of. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_super.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 4abeff701093..a3a0011272e5 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1656,14 +1656,27 @@ xfs_fs_fill_super( "EXPERIMENTAL realtime allocation group feature in use. Use at your own risk!"); if (xfs_has_reflink(mp)) { - if (mp->m_sb.sb_rblocks) { + /* + * Reflink doesn't support rt extent sizes larger than a single + * block because we would have to perform unshare-around for + * rtext-unaligned write requests. + */ + if (xfs_has_realtime(mp) && mp->m_sb.sb_rextsize != 1) { xfs_alert(mp, - "reflink not compatible with realtime device!"); + "reflink not compatible with realtime extent size %u!", + mp->m_sb.sb_rextsize); error = -EINVAL; goto out_filestream_unmount; } - if (xfs_globals.always_cow) { + /* + * always-cow mode is not supported on filesystems with rt + * extent sizes larger than a single block because we'd have + * to perform write-around for unaligned writes because remap + * requests must be aligned to an rt extent. 
+ */ + if (xfs_globals.always_cow && + (!xfs_has_realtime(mp) || mp->m_sb.sb_rextsize == 1)) { xfs_info(mp, "using DEBUG-only always_cow mode."); mp->m_always_cow = true; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
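For anyone reading along without the tree handy, the mount-time gating in this hunk can be restated in userspace. This is only a minimal sketch, assuming a hypothetical `struct sketch_sb` that carries just the three facts the hunk consults (`xfs_has_reflink()`, `xfs_has_realtime()`, and `sb_rextsize`); none of the `sketch_*` names exist in XFS.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock/feature state read by the hunk. */
struct sketch_sb {
	bool		has_reflink;	/* models xfs_has_reflink(mp) */
	bool		has_realtime;	/* models xfs_has_realtime(mp) */
	uint32_t	rextsize;	/* rt extent size, in fs blocks */
};

/*
 * Reflink plus a realtime volume is only accepted when the rt extent size
 * is exactly one block; otherwise the mount is refused with -EINVAL.
 */
static int sketch_reflink_mount_check(const struct sketch_sb *sb)
{
	assert(sb != NULL);

	if (sb->has_reflink && sb->has_realtime && sb->rextsize != 1)
		return -EINVAL;
	return 0;
}

/*
 * The debug-only always_cow knob is honoured only on reflink filesystems
 * that either have no realtime volume or use single-block rt extents.
 */
static bool sketch_always_cow(const struct sketch_sb *sb, bool knob)
{
	assert(sb != NULL);

	if (!sb->has_reflink || !knob)
		return false;
	return !sb->has_realtime || sb->rextsize == 1;
}
```

Both predicates hinge on the same condition (`sb_rextsize == 1`), which is why the two comments in the hunk give closely related reasons: anything larger would need unshare-around or write-around for writes that are not aligned to an rt extent.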
* [PATCH 36/42] xfs: check new rtbitmap records against rt refcount btree 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (36 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 42/42] xfs: enable realtime reflink Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 40/42] xfs: repair inodes that have a refcount btree in the data fork Darrick J. Wong ` (3 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When we're rebuilding the realtime bitmap, check the proposed free extents against the rt refcount btree to make sure we don't commit any grievous errors. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/repair.c | 7 +++++++ fs/xfs/scrub/rtbitmap_repair.c | 21 +++++++++++++++++++++ 2 files changed, 28 insertions(+) diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c index b76c01e9f540..3bde5ea86cf5 100644 --- a/fs/xfs/scrub/repair.c +++ b/fs/xfs/scrub/repair.c @@ -40,6 +40,7 @@ #include "xfs_rtgroup.h" #include "xfs_rtalloc.h" #include "xfs_imeta.h" +#include "xfs_rtrefcount_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -991,6 +992,12 @@ xrep_rtgroup_btcur_init( xfs_has_rtrmapbt(mp)) sr->rmap_cur = xfs_rtrmapbt_init_cursor(mp, sc->tp, sr->rtg, sr->rtg->rtg_rmapip); + + if (sc->sm->sm_type != XFS_SCRUB_TYPE_RTREFCBT && + (sr->rtlock_flags & XFS_RTGLOCK_REFCOUNT) && + xfs_has_rtreflink(mp)) + sr->refc_cur = xfs_rtrefcountbt_init_cursor(mp, sc->tp, + sr->rtg, sr->rtg->rtg_refcountip); } /* diff --git a/fs/xfs/scrub/rtbitmap_repair.c b/fs/xfs/scrub/rtbitmap_repair.c index 0fa8942d14e7..d099f988274e 100644 --- a/fs/xfs/scrub/rtbitmap_repair.c +++ b/fs/xfs/scrub/rtbitmap_repair.c @@ -22,6 +22,7 @@ #include "xfs_swapext.h" #include "xfs_rtbitmap.h" #include "xfs_rtgroup.h" +#include 
"xfs_refcount.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -447,6 +448,7 @@ xrep_rgbitmap_mark_free( unsigned int bufwsize; xfs_extlen_t mod; xfs_rtword_t mask; + enum xbtree_recpacking outcome; int error; if (!xfs_verify_rgbext(rtg, rb->next_rgbno, rgbno - rb->next_rgbno)) @@ -466,6 +468,25 @@ xrep_rgbitmap_mark_free( if (mod != mp->m_sb.sb_rextsize - 1) return -EFSCORRUPTED; + /* Must not be shared or CoW staging. */ + if (rb->sc->sr.refc_cur) { + error = xfs_refcount_has_records(rb->sc->sr.refc_cur, + XFS_REFC_DOMAIN_SHARED, rb->next_rgbno, + rgbno - rb->next_rgbno, &outcome); + if (error) + return error; + if (outcome != XBTREE_RECPACKING_EMPTY) + return -EFSCORRUPTED; + + error = xfs_refcount_has_records(rb->sc->sr.refc_cur, + XFS_REFC_DOMAIN_COW, rb->next_rgbno, + rgbno - rb->next_rgbno, &outcome); + if (error) + return error; + if (outcome != XBTREE_RECPACKING_EMPTY) + return -EFSCORRUPTED; + } + trace_xrep_rgbitmap_record_free(mp, startrtx, nextrtx - 1); /* Set bits as needed to round startrtx up to the nearest word. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 40/42] xfs: repair inodes that have a refcount btree in the data fork 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (37 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 36/42] xfs: check new rtbitmap records against rt refcount btree Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 37/42] xfs: walk the rt reference count tree when rebuilding rmap Darrick J. Wong ` (2 subsequent siblings) 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Plumb knowledge of refcount btrees into the inode core repair code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/inode_repair.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c index 3ce9ac5b0fc4..15dbb8a08b81 100644 --- a/fs/xfs/scrub/inode_repair.c +++ b/fs/xfs/scrub/inode_repair.c @@ -39,6 +39,7 @@ #include "xfs_rtbitmap.h" #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -977,6 +978,7 @@ xrep_dinode_ensure_forkoff( { struct xfs_bmdr_block *bmdr; struct xfs_rtrmap_root *rmdr; + struct xfs_rtrefcount_root *rcdr; struct xfs_scrub *sc = ri->sc; xfs_extnum_t attr_extents, data_extents; size_t bmdr_minsz = xfs_bmdr_space_calc(1); @@ -1087,6 +1089,10 @@ xrep_dinode_ensure_forkoff( rmdr = XFS_DFORK_PTR(dip, XFS_DATA_FORK); dfork_min = xfs_rtrmap_broot_space(sc->mp, rmdr); break; + case XFS_DINODE_FMT_REFCOUNT: + rcdr = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + dfork_min = xfs_rtrefcount_broot_space(sc->mp, rcdr); + break; default: dfork_min = 0; break; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 37/42] xfs: walk the rt reference count tree when rebuilding rmap 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (38 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 40/42] xfs: repair inodes that have a refcount btree in the data fork Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 41/42] xfs: fix cow forks for realtime files Darrick J. Wong 2022-12-30 22:18 ` [PATCH 38/42] xfs: capture realtime CoW staging extents when rebuilding rt rmapbt Darrick J. Wong 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When we're rebuilding the data device rmap, if we encounter a "refcount" format fork, we have to walk the (realtime) refcount btree inode to build the appropriate mappings. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/rmap_repair.c | 36 ++++++++++++++++++++++++++++++++++++ 1 file changed, 36 insertions(+) diff --git a/fs/xfs/scrub/rmap_repair.c b/fs/xfs/scrub/rmap_repair.c index 86c5338a12b9..24dcd3842ce6 100644 --- a/fs/xfs/scrub/rmap_repair.c +++ b/fs/xfs/scrub/rmap_repair.c @@ -32,6 +32,7 @@ #include "xfs_ag.h" #include "xfs_rtrmap_btree.h" #include "xfs_rtgroup.h" +#include "xfs_rtrefcount_btree.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -530,6 +531,39 @@ xrep_rmap_scan_rtrmapbt( return -EFSCORRUPTED; } +static int +xrep_rmap_scan_rtrefcountbt( + struct xrep_rmap_ifork *rf, + struct xfs_inode *ip) +{ + struct xfs_scrub *sc = rf->rr->sc; + struct xfs_btree_cur *cur; + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + int error; + + if (rf->whichfork != XFS_DATA_FORK) + return -EFSCORRUPTED; + + for_each_rtgroup(sc->mp, rgno, rtg) { + if (ip == rtg->rtg_refcountip) { + cur = xfs_rtrefcountbt_init_cursor(sc->mp, sc->tp, rtg, + ip); + error = xrep_rmap_scan_iroot_btree(rf, cur); + 
xfs_btree_del_cursor(cur, error); + xfs_rtgroup_put(rtg); + return error; + } + } + + /* + * We shouldn't find a refcount format inode that isn't associated with + * an rtgroup! + */ + ASSERT(0); + return -EFSCORRUPTED; +} + /* Find all the extents from a given AG in an inode fork. */ STATIC int xrep_rmap_scan_ifork( @@ -561,6 +595,8 @@ xrep_rmap_scan_ifork( return error; } else if (ifp->if_format == XFS_DINODE_FMT_RMAP) { return xrep_rmap_scan_rtrmapbt(&rf, ip); + } else if (ifp->if_format == XFS_DINODE_FMT_REFCOUNT) { + return xrep_rmap_scan_rtrefcountbt(&rf, ip); } else if (ifp->if_format != XFS_DINODE_FMT_EXTENTS) { return 0; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
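The lookup in xrep_rmap_scan_rtrefcountbt() boils down to "find the one rtgroup whose refcount inode is this inode, else report corruption". A rough userspace sketch of that shape follows; `struct sketch_rtgroup`, `sketch_find_refcount_group()` and `SKETCH_ECORRUPT` are made-up names for illustration, and the error value is arbitrary rather than the kernel's EFSCORRUPTED.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ECORRUPT	1	/* illustrative only, not the kernel's value */

/* Hypothetical per-group state: just the group number and its refcount inode. */
struct sketch_rtgroup {
	uint32_t	rgno;
	uint64_t	refcount_ino;
};

/*
 * Walk every group looking for the one that owns @ino as its refcount btree
 * inode. A refcount-format inode that belongs to no group is treated as
 * corruption, mirroring the fallthrough at the end of the kernel function.
 */
static int sketch_find_refcount_group(const struct sketch_rtgroup *groups,
				      size_t nr_groups, uint64_t ino,
				      uint32_t *rgno_out)
{
	size_t i;

	assert(rgno_out != NULL);

	for (i = 0; i < nr_groups; i++) {
		if (groups[i].refcount_ino == ino) {
			*rgno_out = groups[i].rgno;
			return 0;
		}
	}
	return -SKETCH_ECORRUPT;
}
```

In the patch itself the match is by inode pointer rather than inode number, and the group reference taken by the iterator is dropped before returning from inside the loop.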
* [PATCH 41/42] xfs: fix cow forks for realtime files 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (39 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 37/42] xfs: walk the rt reference count tree when rebuilding rmap Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 38/42] xfs: capture realtime CoW staging extents when rebuilding rt rmapbt Darrick J. Wong 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Port the CoW fork repair to realtime files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/bitmap.h | 28 ++++++ fs/xfs/scrub/cow_repair.c | 210 ++++++++++++++++++++++++++++++++++++++++--- fs/xfs/scrub/reap.c | 219 +++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/scrub/reap.h | 7 + fs/xfs/scrub/repair.h | 1 fs/xfs/scrub/trace.h | 72 +++++++++++++++ 6 files changed, 520 insertions(+), 17 deletions(-) diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h index 29faf2b63715..7541b5fd68f8 100644 --- a/fs/xfs/scrub/bitmap.h +++ b/fs/xfs/scrub/bitmap.h @@ -167,4 +167,32 @@ static inline int xfsb_bitmap_walk(struct xfsb_bitmap *bitmap, return xbitmap_walk(&bitmap->fsbitmap, fn, priv); } +/* Bitmaps, but type-checked for xfs_rtblock_t */ + +struct xrtb_bitmap { + struct xbitmap rtbitmap; +}; + +static inline void xrtb_bitmap_init(struct xrtb_bitmap *bitmap) +{ + xbitmap_init(&bitmap->rtbitmap); +} + +static inline void xrtb_bitmap_destroy(struct xrtb_bitmap *bitmap) +{ + xbitmap_destroy(&bitmap->rtbitmap); +} + +static inline int xrtb_bitmap_set(struct xrtb_bitmap *bitmap, + xfs_rtblock_t start, xfs_filblks_t len) +{ + return xbitmap_set(&bitmap->rtbitmap, start, len); +} + +static inline int xrtb_bitmap_walk(struct xrtb_bitmap *bitmap, + xbitmap_walk_fn fn, void *priv) +{ + return xbitmap_walk(&bitmap->rtbitmap, fn, priv); +} + 
#endif /* __XFS_SCRUB_BITMAP_H__ */ diff --git a/fs/xfs/scrub/cow_repair.c b/fs/xfs/scrub/cow_repair.c index a0c1d97ab8b6..5605c4ecbdca 100644 --- a/fs/xfs/scrub/cow_repair.c +++ b/fs/xfs/scrub/cow_repair.c @@ -26,6 +26,9 @@ #include "xfs_errortag.h" #include "xfs_icache.h" #include "xfs_refcount_btree.h" +#include "xfs_rtalloc.h" +#include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -59,7 +62,10 @@ struct xrep_cow { struct xbitmap bad_fileoffs; /* Bitmap of fsblocks that were removed from the CoW fork. */ - struct xfsb_bitmap old_cowfork_fsblocks; + union { + struct xfsb_bitmap old_cowfork_fsblocks; + struct xrtb_bitmap old_cowfork_rtblocks; + }; /* CoW fork mappings used to scan for bad CoW staging extents. */ struct xfs_bmbt_irec irec; @@ -137,8 +143,12 @@ xrep_cow_mark_shared_staging( xrep_cow_trim_refcount(xc, &rrec, rec); - fsbno = XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno, - rrec.rc_startblock); + if (XFS_IS_REALTIME_INODE(xc->sc->ip)) + fsbno = xfs_rgbno_to_rtb(xc->sc->mp, cur->bc_ino.rtg->rtg_rgno, + rrec.rc_startblock); + else + fsbno = XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno, + rrec.rc_startblock); return xrep_cow_mark_file_range(xc, fsbno, rrec.rc_blockcount); } @@ -158,6 +168,7 @@ xrep_cow_mark_missing_staging( { struct xrep_cow *xc = priv; struct xfs_refcount_irec rrec; + xfs_fsblock_t fsbno; int error; if (!xfs_refcount_check_domain(rec) || @@ -169,9 +180,13 @@ xrep_cow_mark_missing_staging( if (xc->next_bno >= rrec.rc_startblock) goto next; - error = xrep_cow_mark_file_range(xc, - XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno, - xc->next_bno), + if (XFS_IS_REALTIME_INODE(xc->sc->ip)) + fsbno = xfs_rgbno_to_rtb(xc->sc->mp, cur->bc_ino.rtg->rtg_rgno, + xc->next_bno); + else + fsbno = XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno, + xc->next_bno); + error = xrep_cow_mark_file_range(xc, fsbno, rrec.rc_startblock - xc->next_bno); if (error) 
return error; @@ -214,7 +229,12 @@ xrep_cow_mark_missing_staging_rmap( rec_len -= adj; } - fsbno = XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno, rec_bno); + if (XFS_IS_REALTIME_INODE(xc->sc->ip)) + fsbno = xfs_rgbno_to_rtb(xc->sc->mp, cur->bc_ino.rtg->rtg_rgno, + rec_bno); + else + fsbno = XFS_AGB_TO_FSB(xc->sc->mp, cur->bc_ag.pag->pag_agno, + rec_bno); return xrep_cow_mark_file_range(xc, fsbno, rec_len); } @@ -303,6 +323,99 @@ xrep_cow_find_bad( return 0; } +/* + * Find any part of the CoW fork mapping that isn't a single-owner CoW staging + * extent and mark the corresponding part of the file range in the bitmap. + */ +STATIC int +xrep_cow_find_bad_rt( + struct xrep_cow *xc) +{ + struct xfs_refcount_irec rc_low = { 0 }; + struct xfs_refcount_irec rc_high = { 0 }; + struct xfs_rmap_irec rm_low = { 0 }; + struct xfs_rmap_irec rm_high = { 0 }; + struct xfs_scrub *sc = xc->sc; + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + int error = 0; + + xc->irec_startbno = xfs_rtb_to_rgbno(sc->mp, xc->irec.br_startblock, + &rgno); + + rtg = xfs_rtgroup_get(sc->mp, rgno); + if (!rtg) + return -EFSCORRUPTED; + + if (xrep_is_rtmeta_ino(sc, rtg, sc->ip->i_ino)) { + xfs_rtgroup_put(rtg); + return 0; + } + + error = xrep_rtgroup_init(sc, rtg, &sc->sr, + XFS_RTGLOCK_RMAP | XFS_RTGLOCK_REFCOUNT); + if (error) + goto out_rtg; + + /* Mark any CoW fork extents that are shared. */ + rc_low.rc_startblock = xc->irec_startbno; + rc_high.rc_startblock = xc->irec_startbno + xc->irec.br_blockcount - 1; + rc_low.rc_domain = rc_high.rc_domain = XFS_REFC_DOMAIN_SHARED; + error = xfs_refcount_query_range(sc->sr.refc_cur, &rc_low, &rc_high, + xrep_cow_mark_shared_staging, xc); + if (error) + goto out_sr; + + /* Make sure there are CoW staging extents for the whole mapping. 
*/ + rc_low.rc_startblock = xc->irec_startbno; + rc_high.rc_startblock = xc->irec_startbno + xc->irec.br_blockcount - 1; + rc_low.rc_domain = rc_high.rc_domain = XFS_REFC_DOMAIN_COW; + xc->next_bno = xc->irec_startbno; + error = xfs_refcount_query_range(sc->sr.refc_cur, &rc_low, &rc_high, + xrep_cow_mark_missing_staging, xc); + if (error) + goto out_sr; + + if (xc->next_bno < xc->irec_startbno + xc->irec.br_blockcount) { + error = xrep_cow_mark_file_range(xc, + xfs_rgbno_to_rtb(sc->mp, rtg->rtg_rgno, + xc->next_bno), + xc->irec_startbno + xc->irec.br_blockcount - + xc->next_bno); + if (error) + goto out_sr; + } + + /* Mark any area that has an rmap that isn't a CoW staging extent. */ + rm_low.rm_startblock = xc->irec_startbno; + memset(&rm_high, 0xFF, sizeof(rm_high)); + rm_high.rm_startblock = xc->irec_startbno + xc->irec.br_blockcount - 1; + error = xfs_rmap_query_range(sc->sr.rmap_cur, &rm_low, &rm_high, + xrep_cow_mark_missing_staging_rmap, xc); + if (error) + goto out_sr; + + /* + * If userspace is forcing us to rebuild the CoW fork or someone + * turned on the debugging knob, replace everything in the + * CoW fork and then scan for staging extents in the refcountbt. + */ + if ((sc->sm->sm_flags & XFS_SCRUB_IFLAG_FORCE_REBUILD) || + XFS_TEST_ERROR(false, sc->mp, XFS_ERRTAG_FORCE_SCRUB_REPAIR)) { + error = xrep_cow_mark_file_range(xc, xc->irec.br_startblock, + xc->irec.br_blockcount); + if (error) + goto out_sr; + } + +out_sr: + xchk_rtgroup_btcur_free(&sc->sr); + xchk_rtgroup_free(sc, &sc->sr); +out_rtg: + xfs_rtgroup_put(rtg); + return error; +} + /* * Allocate a replacement CoW staging extent of up to the given number of * blocks, and fill out the mapping. The caller must set irec->br_blockcount. @@ -343,6 +456,45 @@ xrep_cow_alloc( return 0; } +/* + * Allocate a replacement rt CoW staging extent of up to the given number of + * blocks, and fill out the mapping. The caller must set irec->br_blockcount. 
+ */ +STATIC int +xrep_cow_alloc_rt( + struct xfs_scrub *sc, + struct xfs_bmbt_irec *irec) +{ + xfs_rtxnum_t rtx = NULLRTEXTNO; + xfs_rtxlen_t rtxlen = 0; + xfs_rtblock_t rtbno; + xfs_extlen_t len; + int error; + + ASSERT(sc->mp->m_sb.sb_rextsize == 1); + + error = xfs_trans_reserve_more(sc->tp, 0, irec->br_blockcount); + if (error) + return error; + + xfs_rtbitmap_lock(sc->tp, sc->mp); + + error = xfs_rtallocate_extent(sc->tp, 0, 1, irec->br_blockcount, + &rtxlen, 0, 1, &rtx); + if (error) + return error; + if (rtx == NULLRTEXTNO) + return -ENOSPC; + + rtbno = xfs_rtx_to_rtb(sc->mp, rtx); + len = xfs_rtxlen_to_extlen(sc->mp, rtxlen); + xfs_refcount_alloc_cow_extent(sc->tp, true, rtbno, len); + + irec->br_startblock = rtbno; + irec->br_blockcount = len; + return 0; +} + /* * Look up the current CoW fork mapping so that we only allocate enough to * replace a single mapping. If we don't find a mapping that covers the start @@ -514,7 +666,10 @@ xrep_cow_replace_one( * Allocate a replacement extent. If we don't fill all the blocks, * shorten the quantity that will be deleted in this step. */ - error = xrep_cow_alloc(sc, &rep); + if (XFS_IS_REALTIME_INODE(sc->ip)) + error = xrep_cow_alloc_rt(sc, &rep); + else + error = xrep_cow_alloc(sc, &rep); if (error) return error; @@ -531,8 +686,12 @@ xrep_cow_replace_one( return error; /* Note the old CoW staging extents; we'll reap them all later. 
*/ - error = xfsb_bitmap_set(&xc->old_cowfork_fsblocks, old_startblock, - rep.br_blockcount); + if (XFS_IS_REALTIME_INODE(sc->ip)) + error = xrtb_bitmap_set(&xc->old_cowfork_rtblocks, + old_startblock, rep.br_blockcount); + else + error = xfsb_bitmap_set(&xc->old_cowfork_fsblocks, + old_startblock, rep.br_blockcount); if (error) return error; @@ -588,8 +747,12 @@ xrep_bmap_cow( if (!ifp) return 0; - /* realtime files aren't supported yet */ - if (XFS_IS_REALTIME_INODE(sc->ip)) + /* + * Realtime files with large extent sizes are not supported because + * we could encounter a CoW mapping that has been partially written + * out *and* requires replacement, and there's no solution to that. + */ + if (XFS_IS_REALTIME_INODE(sc->ip) && sc->mp->m_sb.sb_rextsize != 1) return -EOPNOTSUPP; /* @@ -610,7 +773,10 @@ xrep_bmap_cow( xc->sc = sc; xbitmap_init(&xc->bad_fileoffs); - xfsb_bitmap_init(&xc->old_cowfork_fsblocks); + if (XFS_IS_REALTIME_INODE(sc->ip)) + xrtb_bitmap_init(&xc->old_cowfork_rtblocks); + else + xfsb_bitmap_init(&xc->old_cowfork_fsblocks); for_each_xfs_iext(ifp, &icur, &xc->irec) { if (xchk_should_terminate(sc, &error)) @@ -633,7 +799,10 @@ xrep_bmap_cow( if (xfs_bmap_is_written_extent(&xc->irec)) continue; - error = xrep_cow_find_bad(xc); + if (XFS_IS_REALTIME_INODE(sc->ip)) + error = xrep_cow_find_bad_rt(xc); + else + error = xrep_cow_find_bad(xc); if (error) goto out_bitmap; } @@ -648,13 +817,20 @@ xrep_bmap_cow( * by the refcount btree, not the inode, so it is correct to treat them * like inode metadata. 
*/ - error = xrep_reap_fsblocks(sc, &xc->old_cowfork_fsblocks, - &XFS_RMAP_OINFO_COW, XFS_AG_RESV_NONE); + if (XFS_IS_REALTIME_INODE(sc->ip)) + error = xrep_reap_rtblocks(sc, &xc->old_cowfork_rtblocks, + &XFS_RMAP_OINFO_COW); + else + error = xrep_reap_fsblocks(sc, &xc->old_cowfork_fsblocks, + &XFS_RMAP_OINFO_COW, XFS_AG_RESV_NONE); if (error) goto out_bitmap; out_bitmap: - xfsb_bitmap_destroy(&xc->old_cowfork_fsblocks); + if (XFS_IS_REALTIME_INODE(sc->ip)) + xrtb_bitmap_destroy(&xc->old_cowfork_rtblocks); + else + xfsb_bitmap_destroy(&xc->old_cowfork_fsblocks); xbitmap_destroy(&xc->bad_fileoffs); kmem_free(xc); return error; diff --git a/fs/xfs/scrub/reap.c b/fs/xfs/scrub/reap.c index 77354bdb0511..b5b5963f6d99 100644 --- a/fs/xfs/scrub/reap.c +++ b/fs/xfs/scrub/reap.c @@ -33,6 +33,8 @@ #include "xfs_attr.h" #include "xfs_attr_remote.h" #include "xfs_defer.h" +#include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" #include "scrub/scrub.h" #include "scrub/common.h" #include "scrub/trace.h" @@ -676,6 +678,223 @@ xrep_reap_fsblocks( return 0; } +#ifdef CONFIG_XFS_RT +/* Dispose of a single rtgroup extent. */ +STATIC int +xreap_rgextent( + struct xreap_state *rs, + xfs_rgblock_t rgbno, + xfs_extlen_t *rglenp, + bool crosslinked) +{ + struct xfs_scrub *sc = rs->sc; + xfs_rtblock_t rtbno; + + /* + * The only caller so far is CoW fork repair, so we only know how to + * unlink or free CoW staging extents. + */ + if (rs->oinfo != &XFS_RMAP_OINFO_COW) { + ASSERT(rs->oinfo == &XFS_RMAP_OINFO_COW); + return -EFSCORRUPTED; + } + ASSERT(rs->resv == XFS_AG_RESV_NONE); + + rtbno = xfs_rgbno_to_rtb(sc->mp, sc->sr.rtg->rtg_rgno, rgbno); + + /* + * If there are other rmappings, this block is cross linked and must + * not be freed. Remove the forward and reverse mapping and move on. 
+ * + * XXX: XFS doesn't support detecting the case where a single block + * metadata structure is crosslinked with a multi-block structure + * because the buffer cache doesn't detect aliasing problems, so we + * can't fix 100% of crosslinking problems (yet). The verifiers will + * blow on writeout, the filesystem will shut down, and the admin gets + * to run xfs_repair. + */ + if (crosslinked) { + trace_xreap_dispose_unmap_rtextent(sc->sr.rtg, rgbno, *rglenp); + + xfs_refcount_free_cow_extent(sc->tp, true, rtbno, *rglenp); + rs->deferred++; + return 0; + } + + trace_xreap_dispose_free_rtextent(sc->sr.rtg, rgbno, *rglenp); + + /* + * The CoW staging extent is not crosslinked. Use deferred work items + * to remove the refcountbt records (which removes the rmap records) + * and free the extent. We're not worried about the system going down + * here because log recovery walks the refcount btree to clean out the + * CoW staging extents. + */ + xfs_refcount_free_cow_extent(sc->tp, true, rtbno, *rglenp); + xfs_free_extent_later(sc->tp, rtbno, *rglenp, NULL, + XFS_FREE_EXTENT_REALTIME | + XFS_FREE_EXTENT_SKIP_DISCARD); + rs->deferred++; + return 0; +} + +/* + * Figure out the longest run of blocks that we can dispose of with a single + * call. Cross-linked blocks should have their reverse mappings removed, but + * single-owner extents can be freed. Units are rt blocks, not rt extents. + */ +STATIC int +xreap_rgextent_select( + struct xreap_state *rs, + xfs_rgblock_t rgbno, + xfs_rgblock_t rgbno_next, + bool *crosslinked, + xfs_extlen_t *rglenp) +{ + struct xfs_scrub *sc = rs->sc; + struct xfs_btree_cur *cur; + xfs_rgblock_t bno = rgbno + 1; + xfs_extlen_t len = 1; + int error; + + /* + * Determine if there are any other rmap records covering the first + * block of this extent. If so, the block is crosslinked. 
+ */ + cur = xfs_rtrmapbt_init_cursor(sc->mp, sc->tp, sc->sr.rtg, + sc->sr.rtg->rtg_rmapip); + error = xfs_rmap_has_other_keys(cur, rgbno, 1, rs->oinfo, + crosslinked); + if (error) + goto out_cur; + + /* + * Figure out how many of the subsequent blocks have the same crosslink + * status. + */ + while (bno < rgbno_next) { + bool also_crosslinked; + + error = xfs_rmap_has_other_keys(cur, bno, 1, rs->oinfo, + &also_crosslinked); + if (error) + goto out_cur; + + if (*crosslinked != also_crosslinked) + break; + + len++; + bno++; + } + + *rglenp = len; + trace_xreap_rgextent_select(sc->sr.rtg, rgbno, len, *crosslinked); +out_cur: + xfs_btree_del_cursor(cur, error); + return error; +} + +#define XREAP_RTGLOCK_ALL (XFS_RTGLOCK_BITMAP | \ + XFS_RTGLOCK_RMAP | \ + XFS_RTGLOCK_REFCOUNT) + +/* + * Break a rt file metadata extent into sub-extents by fate (crosslinked, not + * crosslinked), and dispose of each sub-extent separately. The extent must + * be aligned to a realtime extent. + */ +STATIC int +xreap_rtmeta_extent( + uint64_t rtbno, + uint64_t len, + void *priv) +{ + struct xreap_state *rs = priv; + struct xfs_scrub *sc = rs->sc; + xfs_rgnumber_t rgno; + xfs_rgblock_t rgbno = xfs_rtb_to_rgbno(sc->mp, rtbno, &rgno); + xfs_rgblock_t rgbno_next = rgbno + len; + int error = 0; + + ASSERT(sc->ip != NULL); + ASSERT(!sc->sr.rtg); + + /* + * We're reaping blocks after repairing file metadata, which means that + * we have to init the xchk_ag structure ourselves. 
+ */ + sc->sr.rtg = xfs_rtgroup_get(sc->mp, rgno); + if (!sc->sr.rtg) + return -EFSCORRUPTED; + + xfs_rtgroup_lock(NULL, sc->sr.rtg, XREAP_RTGLOCK_ALL); + + while (rgbno < rgbno_next) { + xfs_extlen_t rglen; + bool crosslinked; + + error = xreap_rgextent_select(rs, rgbno, rgbno_next, + &crosslinked, &rglen); + if (error) + goto out_unlock; + + error = xreap_rgextent(rs, rgbno, &rglen, crosslinked); + if (error) + goto out_unlock; + + if (xreap_want_defer_finish(rs)) { + error = xfs_defer_finish(&sc->tp); + if (error) + goto out_unlock; + xreap_defer_finish_reset(rs); + } else if (xreap_want_roll(rs)) { + error = xfs_trans_roll_inode(&sc->tp, sc->ip); + if (error) + goto out_unlock; + xreap_reset(rs); + } + + rgbno += rglen; + } + +out_unlock: + xfs_rtgroup_unlock(sc->sr.rtg, XREAP_RTGLOCK_ALL); + xfs_rtgroup_put(sc->sr.rtg); + sc->sr.rtg = NULL; + return error; +} + +/* + * Dispose of every block of every rt metadata extent in the bitmap. + * Do not use this to dispose of the mappings in an ondisk inode fork. + */ +int +xrep_reap_rtblocks( + struct xfs_scrub *sc, + struct xrtb_bitmap *bitmap, + const struct xfs_owner_info *oinfo) +{ + struct xreap_state rs = { + .sc = sc, + .oinfo = oinfo, + .resv = XFS_AG_RESV_NONE, + }; + int error; + + ASSERT(xfs_has_rmapbt(sc->mp)); + ASSERT(sc->ip != NULL); + + error = xrtb_bitmap_walk(bitmap, xreap_rtmeta_extent, &rs); + if (error) + return error; + + if (xreap_dirty(&rs)) + return xrep_defer_finish(sc); + + return 0; +} +#endif /* CONFIG_XFS_RT */ + /* * Metadata files are not supposed to share blocks with anything else. 
* If blocks are shared, we remove the reverse mapping (thus reducing the diff --git a/fs/xfs/scrub/reap.h b/fs/xfs/scrub/reap.h index cfaef544f659..bf025ec3501b 100644 --- a/fs/xfs/scrub/reap.h +++ b/fs/xfs/scrub/reap.h @@ -12,6 +12,13 @@ int xrep_reap_fsblocks(struct xfs_scrub *sc, struct xfsb_bitmap *bitmap, const struct xfs_owner_info *oinfo, enum xfs_ag_resv_type type); int xrep_reap_ifork(struct xfs_scrub *sc, struct xfs_inode *ip, int whichfork); +#ifdef CONFIG_XFS_RT +int xrep_reap_rtblocks(struct xfs_scrub *sc, struct xrtb_bitmap *bitmap, + const struct xfs_owner_info *oinfo); +#else +# define xrep_reap_rtblocks(...) (-EOPNOTSUPP) +#endif /* CONFIG_XFS_RT */ + /* Buffer cache scan context. */ struct xrep_bufscan { /* Disk address for the buffers we want to scan. */ diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index aa15aeffa724..e2b75c449046 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -41,6 +41,7 @@ struct xbitmap; struct xagb_bitmap; struct xrgb_bitmap; struct xfsb_bitmap; +struct xrtb_bitmap; int xrep_fix_freelist(struct xfs_scrub *sc, int alloc_flags); diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h index d74bba391854..4d8e4b77cbbe 100644 --- a/fs/xfs/scrub/trace.h +++ b/fs/xfs/scrub/trace.h @@ -1414,6 +1414,41 @@ DEFINE_REPAIR_EXTENT_EVENT(xreap_agextent_binval); DEFINE_REPAIR_EXTENT_EVENT(xreap_bmapi_binval); DEFINE_REPAIR_EXTENT_EVENT(xrep_agfl_insert); +#ifdef CONFIG_XFS_RT +DECLARE_EVENT_CLASS(xrep_rtgroup_extent_class, + TP_PROTO(struct xfs_rtgroup *rtg, xfs_rgblock_t rgbno, + xfs_extlen_t len), + TP_ARGS(rtg, rgbno, len), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rgnumber_t, rgno) + __field(xfs_rgblock_t, rgbno) + __field(xfs_extlen_t, len) + ), + TP_fast_assign( + __entry->dev = rtg->rtg_mount->m_super->s_dev; + __entry->rtdev = rtg->rtg_mount->m_rtdev_targp->bt_dev; + __entry->rgno = rtg->rtg_rgno; + __entry->rgbno = rgbno; + __entry->len = len; + ), + 
TP_printk("dev %d:%d rtdev %d:%d rgno 0x%x rgbno 0x%x fsbcount 0x%x", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rgno, + __entry->rgbno, + __entry->len) +); +#define DEFINE_REPAIR_RTGROUP_EXTENT_EVENT(name) \ +DEFINE_EVENT(xrep_rtgroup_extent_class, name, \ + TP_PROTO(struct xfs_rtgroup *rtg, xfs_rgblock_t rgbno, \ + xfs_extlen_t len), \ + TP_ARGS(rtg, rgbno, len)) +DEFINE_REPAIR_RTGROUP_EXTENT_EVENT(xreap_dispose_unmap_rtextent); +DEFINE_REPAIR_RTGROUP_EXTENT_EVENT(xreap_dispose_free_rtextent); +#endif /* CONFIG_XFS_RT */ + DECLARE_EVENT_CLASS(xrep_reap_find_class, TP_PROTO(struct xfs_perag *pag, xfs_agblock_t agbno, xfs_extlen_t len, bool crosslinked), @@ -1447,6 +1482,43 @@ DEFINE_EVENT(xrep_reap_find_class, name, \ DEFINE_REPAIR_REAP_FIND_EVENT(xreap_agextent_select); DEFINE_REPAIR_REAP_FIND_EVENT(xreap_bmapi_select); +#ifdef CONFIG_XFS_RT +DECLARE_EVENT_CLASS(xrep_rtgroup_reap_find_class, + TP_PROTO(struct xfs_rtgroup *rtg, xfs_rgblock_t rgbno, xfs_extlen_t len, + bool crosslinked), + TP_ARGS(rtg, rgbno, len, crosslinked), + TP_STRUCT__entry( + __field(dev_t, dev) + __field(dev_t, rtdev) + __field(xfs_rgnumber_t, rgno) + __field(xfs_rgblock_t, rgbno) + __field(xfs_extlen_t, len) + __field(bool, crosslinked) + ), + TP_fast_assign( + __entry->dev = rtg->rtg_mount->m_super->s_dev; + __entry->rtdev = rtg->rtg_mount->m_rtdev_targp->bt_dev; + __entry->rgno = rtg->rtg_rgno; + __entry->rgbno = rgbno; + __entry->len = len; + __entry->crosslinked = crosslinked; + ), + TP_printk("dev %d:%d rtdev %d:%d rgno 0x%x rgbno 0x%x fsbcount 0x%x crosslinked %d", + MAJOR(__entry->dev), MINOR(__entry->dev), + MAJOR(__entry->rtdev), MINOR(__entry->rtdev), + __entry->rgno, + __entry->rgbno, + __entry->len, + __entry->crosslinked ? 
1 : 0) +); +#define DEFINE_REPAIR_RTGROUP_REAP_FIND_EVENT(name) \ +DEFINE_EVENT(xrep_rtgroup_reap_find_class, name, \ + TP_PROTO(struct xfs_rtgroup *rtg, xfs_rgblock_t rgbno, \ + xfs_extlen_t len, bool crosslinked), \ + TP_ARGS(rtg, rgbno, len, crosslinked)) +DEFINE_REPAIR_RTGROUP_REAP_FIND_EVENT(xreap_rgextent_select); +#endif /* CONFIG_XFS_RT */ + DECLARE_EVENT_CLASS(xrep_rmap_class, TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno, xfs_extlen_t len, ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 38/42] xfs: capture realtime CoW staging extents when rebuilding rt rmapbt 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong ` (40 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 41/42] xfs: fix cow forks for realtime files Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 41 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Walk the realtime refcount btree to find the CoW staging extents when we're rebuilding the realtime rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/scrub/bitmap.h | 28 ++++++++++++ fs/xfs/scrub/repair.h | 1 fs/xfs/scrub/rtrmap_repair.c | 102 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 131 insertions(+) diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h index d59d5e76782c..29faf2b63715 100644 --- a/fs/xfs/scrub/bitmap.h +++ b/fs/xfs/scrub/bitmap.h @@ -111,6 +111,34 @@ int xagb_bitmap_set_btblocks(struct xagb_bitmap *bitmap, int xagb_bitmap_set_btcur_path(struct xagb_bitmap *bitmap, struct xfs_btree_cur *cur); +/* Bitmaps, but for type-checked for xfs_rgblock_t */ + +struct xrgb_bitmap { + struct xbitmap rgbitmap; +}; + +static inline void xrgb_bitmap_init(struct xrgb_bitmap *bitmap) +{ + xbitmap_init(&bitmap->rgbitmap); +} + +static inline void xrgb_bitmap_destroy(struct xrgb_bitmap *bitmap) +{ + xbitmap_destroy(&bitmap->rgbitmap); +} + +static inline int xrgb_bitmap_set(struct xrgb_bitmap *bitmap, + xfs_rgblock_t start, xfs_extlen_t len) +{ + return xbitmap_set(&bitmap->rgbitmap, start, len); +} + +static inline int xrgb_bitmap_walk(struct xrgb_bitmap *bitmap, + xbitmap_walk_fn fn, void *priv) +{ + return xbitmap_walk(&bitmap->rgbitmap, fn, priv); +} + /* Bitmaps, but for type-checked for xfs_fsblock_t */ struct xfsb_bitmap { diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h index 
ff8605849a72..4a0cedea3fe0 100644 --- a/fs/xfs/scrub/repair.h +++ b/fs/xfs/scrub/repair.h @@ -39,6 +39,7 @@ xrep_trans_commit( struct xbitmap; struct xagb_bitmap; +struct xrgb_bitmap; struct xfsb_bitmap; int xrep_fix_freelist(struct xfs_scrub *sc, int alloc_flags); diff --git a/fs/xfs/scrub/rtrmap_repair.c b/fs/xfs/scrub/rtrmap_repair.c index e26847784d21..36c03e48c3fb 100644 --- a/fs/xfs/scrub/rtrmap_repair.c +++ b/fs/xfs/scrub/rtrmap_repair.c @@ -29,6 +29,7 @@ #include "xfs_rtalloc.h" #include "xfs_ag.h" #include "xfs_rtgroup.h" +#include "xfs_refcount.h" #include "scrub/xfs_scrub.h" #include "scrub/scrub.h" #include "scrub/common.h" @@ -420,6 +421,100 @@ xrep_rtrmap_scan_ag( return error; } +struct xrep_rtrmap_stash_run { + struct xrep_rtrmap *rr; + uint64_t owner; +}; + +static int +xrep_rtrmap_stash_run( + uint64_t start, + uint64_t len, + void *priv) +{ + struct xrep_rtrmap_stash_run *rsr = priv; + struct xrep_rtrmap *rr = rsr->rr; + xfs_rgblock_t rgbno = start; + + return xrep_rtrmap_stash(rr, rgbno, len, rsr->owner, 0, 0); +} + +/* + * Emit rmaps for every extent of bits set in the bitmap. Caller must ensure + * that the ranges are in units of FS blocks. + */ +STATIC int +xrep_rtrmap_stash_bitmap( + struct xrep_rtrmap *rr, + struct xrgb_bitmap *bitmap, + const struct xfs_owner_info *oinfo) +{ + struct xrep_rtrmap_stash_run rsr = { + .rr = rr, + .owner = oinfo->oi_owner, + }; + + return xrgb_bitmap_walk(bitmap, xrep_rtrmap_stash_run, &rsr); +} + +/* Record a CoW staging extent. */ +STATIC int +xrep_rtrmap_walk_cowblocks( + struct xfs_btree_cur *cur, + const struct xfs_refcount_irec *irec, + void *priv) +{ + struct xrgb_bitmap *bitmap = priv; + + if (!xfs_refcount_check_domain(irec) || + irec->rc_domain != XFS_REFC_DOMAIN_COW) + return -EFSCORRUPTED; + + return xrgb_bitmap_set(bitmap, irec->rc_startblock, + irec->rc_blockcount); +} + +/* + * Collect rmaps for the blocks containing the refcount btree, and all CoW + * staging extents. 
+ */ +STATIC int +xrep_rtrmap_find_refcount_rmaps( + struct xrep_rtrmap *rr) +{ + struct xrgb_bitmap cow_blocks; /* COWBIT */ + struct xfs_refcount_irec low = { + .rc_startblock = 0, + .rc_domain = XFS_REFC_DOMAIN_COW, + }; + struct xfs_refcount_irec high = { + .rc_startblock = -1U, + .rc_domain = XFS_REFC_DOMAIN_COW, + }; + struct xfs_scrub *sc = rr->sc; + int error; + + if (!xfs_has_rtreflink(sc->mp)) + return 0; + + xrgb_bitmap_init(&cow_blocks); + + /* Collect rmaps for CoW staging extents. */ + error = xfs_refcount_query_range(sc->sr.refc_cur, &low, &high, + xrep_rtrmap_walk_cowblocks, &cow_blocks); + if (error) + goto out_bitmap; + + /* Generate rmaps for everything. */ + error = xrep_rtrmap_stash_bitmap(rr, &cow_blocks, &XFS_RMAP_OINFO_COW); + if (error) + goto out_bitmap; + +out_bitmap: + xrgb_bitmap_destroy(&cow_blocks); + return error; +} + /* Count and check all collected records. */ STATIC int xrep_rtrmap_check_record( @@ -467,6 +562,13 @@ xrep_rtrmap_find_rmaps( if (error) return error; + /* Find CoW staging extents. */ + xrep_rtgroup_btcur_init(sc, &sc->sr); + error = xrep_rtrmap_find_refcount_rmaps(rr); + xchk_rtgroup_btcur_free(&sc->sr); + if (error) + return error; + /* * Set up for a potentially lengthy filesystem scan by reducing our * transaction resource usage for the duration. Specifically: ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCHSET v1.0 0/9] xfs: reflink with large realtime extents 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (15 preceding siblings ...) 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 4/9] xfs: forcibly convert unwritten blocks within an rt extent before sharing Darrick J. Wong ` (8 more replies) 2022-12-30 22:18 ` [PATCHSET v1.0 0/3] xfs: enable quota for realtime voluems Darrick J. Wong ` (22 subsequent siblings) 39 siblings, 9 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs Hi all, Now that we've landed support for reflink on the realtime device for cases where the rt extent size is the same as the fs block size, enhance the reflink code further to support cases where the rt extent size is a power-of-two multiple of the fs block size. This enables us to do data block sharing (for example) for much larger allocation units by dirtying pagecache around shared extents and expanding writeback to write back shared extents fully. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. 
--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-reflink-extsize

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-reflink-extsize

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-reflink-extsize
---
 fs/dax.c                      |    5 +
 fs/iomap/buffered-io.c        |   55 ++++++++++
 fs/remap_range.c              |   30 +++---
 fs/xfs/libxfs/xfs_bmap.c      |   22 ++++
 fs/xfs/libxfs/xfs_inode_buf.c |   20 +---
 fs/xfs/xfs_aops.c             |   40 +++++++
 fs/xfs/xfs_file.c             |  171 ++++++++++++++++++++++++++++++++
 fs/xfs/xfs_inode.h            |    9 ++
 fs/xfs/xfs_iops.c             |   15 +++
 fs/xfs/xfs_reflink.c          |  220 ++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_rtalloc.c          |    3 -
 fs/xfs/xfs_super.c            |   17 ++-
 fs/xfs/xfs_trace.h            |    4 +
 include/linux/fs.h            |    3 -
 include/linux/iomap.h         |    2
 15 files changed, 574 insertions(+), 42 deletions(-)

^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 4/9] xfs: forcibly convert unwritten blocks within an rt extent before sharing 2022-12-30 22:18 ` [PATCHSET v1.0 0/9] xfs: reflink with large realtime extents Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/9] iomap: set up for COWing around pages Darrick J. Wong ` (7 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> As noted in the previous patch, XFS can only unmap and map full rt extents. This means that we cannot stop mid-extent for any reason, including stepping around unwritten/written extents. Second, the reflink and CoW mechanisms were not designed to handle shared unwritten extents, so we have to do something to get rid of them. If the user asks us to remap two files, we must scan both ranges beforehand to convert any unwritten extents that are not aligned to rt extent boundaries into zeroed written extents before sharing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_reflink.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 8690017beb9b..b9f47bdbe383 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1689,6 +1689,25 @@ xfs_reflink_remap_prep( if (ret) goto out_unlock; + /* + * Now that we've marked both inodes for reflink, make sure that all + * possible rt extents in both files' ranges are either wholly written, + * wholly unwritten, or holes. The bmap code requires that we align + * all unmap and remap requests to a rt extent boundary. We've already + * flushed the page cache and finished directio for the range that's + * being remapped, so we can convert the extents directly. 
+ */ + if (xfs_inode_has_bigrtextents(src)) { + ret = xfs_rtfile_convert_unwritten(src, pos_in, *len); + if (ret) + goto out_unlock; + } + if (xfs_inode_has_bigrtextents(dest)) { + ret = xfs_rtfile_convert_unwritten(dest, pos_out, *len); + if (ret) + goto out_unlock; + } + /* * If pos_out > EOF, we may have dirtied blocks between EOF and * pos_out. In that case, we need to extend the flush and unmap to cover ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 2/9] iomap: set up for COWing around pages 2022-12-30 22:18 ` [PATCHSET v1.0 0/9] xfs: reflink with large realtime extents Darrick J. Wong 2022-12-30 22:18 ` [PATCH 4/9] xfs: forcibly convert unwritten blocks within an rt extent before sharing Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/9] vfs: explicitly pass the block size to the remap prep function Darrick J. Wong ` (6 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> In anticipation of enabling reflink on the realtime volume where the allocation unit is larger than a page, create an iomap function to dirty arbitrary parts of a file's page cache so that when we dirty part of a file that could undergo a COW extent, we can dirty an entire allocation unit's worth of pages. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/iomap/buffered-io.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/iomap.h | 2 ++ 2 files changed, 57 insertions(+) diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 356193e44cf0..da5a5d28e2ee 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1141,6 +1141,61 @@ iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len, } EXPORT_SYMBOL_GPL(iomap_file_unshare); +static loff_t iomap_dirty_iter(struct iomap_iter *iter) +{ + loff_t pos = iter->pos; + loff_t length = iomap_length(iter); + long status = 0; + loff_t written = 0; + + do { + unsigned long offset = offset_in_page(pos); + unsigned long bytes = min_t(loff_t, PAGE_SIZE - offset, length); + struct folio *folio; + + status = iomap_write_begin(iter, pos, bytes, &folio); + if (unlikely(status)) + return status; + + folio_mark_accessed(folio); + + status = iomap_write_end(iter, pos, bytes, bytes, folio); + if (WARN_ON_ONCE(status == 0)) + return -EIO; + + cond_resched(); + + pos += 
status; + written += status; + length -= status; + + balance_dirty_pages_ratelimited(iter->inode->i_mapping); + } while (length); + + return written; +} + +int +iomap_dirty_range(struct inode *inode, loff_t pos, u64 len, + const struct iomap_ops *ops) +{ + struct iomap_iter iter = { + .inode = inode, + .pos = pos, + .len = len, + .flags = IOMAP_WRITE, + }; + int ret; + + if (IS_DAX(inode)) + return -EINVAL; + + while ((ret = iomap_iter(&iter, ops)) > 0) + iter.processed = iomap_dirty_iter(&iter); + return ret; +} +EXPORT_SYMBOL_GPL(iomap_dirty_range); + static loff_t iomap_zero_iter(struct iomap_iter *iter, bool *did_zero) { const struct iomap *srcmap = iomap_iter_srcmap(iter); diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 0983dfc9a203..4d911d780165 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -264,6 +264,8 @@ bool iomap_release_folio(struct folio *folio, gfp_t gfp_flags); void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len); int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len, const struct iomap_ops *ops); +int iomap_dirty_range(struct inode *inode, loff_t pos, u64 len, + const struct iomap_ops *ops); int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, const struct iomap_ops *ops); int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, ^ permalink raw reply related [flat|nested] 565+ messages in thread
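For readers following the loop in iomap_dirty_iter() above, here is a minimal userspace sketch of just the per-page chunking arithmetic (offset within the page, then at most the rest of that page per step). It is not kernel code: SKETCH_PAGE_SIZE, struct page_chunk and split_into_page_chunks() are names made up for illustration, a 4096-byte page is assumed, and the sketch uses a plain while loop, so a zero-length range yields no chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch; the kernel uses PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical record of one step of the loop. */
struct page_chunk {
	uint64_t pos;	/* file offset where the chunk starts */
	uint64_t bytes;	/* bytes covered, never crossing a page boundary */
};

/*
 * Split [pos, pos + length) into per-page chunks, mirroring the
 * offset/bytes arithmetic in iomap_dirty_iter().  Returns the number of
 * chunks stored in @out (at most @max).
 */
static size_t
split_into_page_chunks(uint64_t pos, uint64_t length,
		       struct page_chunk *out, size_t max)
{
	size_t n = 0;

	assert(out != NULL);

	while (length > 0 && n < max) {
		/* Equivalent of offset_in_page(pos). */
		uint64_t offset = pos % SKETCH_PAGE_SIZE;
		/* Equivalent of min(PAGE_SIZE - offset, length). */
		uint64_t bytes = SKETCH_PAGE_SIZE - offset;

		if (bytes > length)
			bytes = length;

		out[n].pos = pos;
		out[n].bytes = bytes;
		n++;

		pos += bytes;
		length -= bytes;
	}
	return n;
}
```

With these assumptions, a 5000-byte range starting at offset 4000 is walked as three chunks: the 96-byte tail of the first page, one full page, and an 808-byte head of the third page.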
* [PATCH 1/9] vfs: explicitly pass the block size to the remap prep function 2022-12-30 22:18 ` [PATCHSET v1.0 0/9] xfs: reflink with large realtime extents Darrick J. Wong 2022-12-30 22:18 ` [PATCH 4/9] xfs: forcibly convert unwritten blocks within an rt extent before sharing Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/9] iomap: set up for COWing around pages Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 3/9] xfs: enable CoW when rt extent size is larger than 1 block Darrick J. Wong ` (5 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make it so that filesystems can pass an explicit blocksize to the remap prep function. This enables filesystems whose fundamental allocation units are /not/ the same as the blocksize to ensure that the remapping checks are aligned properly. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/dax.c | 5 ++++- fs/remap_range.c | 30 ++++++++++++++++++------------ include/linux/fs.h | 3 ++- 3 files changed, 24 insertions(+), 14 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index c48a3a93ab29..9ec07a06f49c 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -2035,7 +2035,10 @@ int dax_remap_file_range_prep(struct file *file_in, loff_t pos_in, loff_t *len, unsigned int remap_flags, const struct iomap_ops *ops) { + unsigned int blocksize = file_inode(file_out)->i_sb->s_blocksize; + return __generic_remap_file_range_prep(file_in, pos_in, file_out, - pos_out, len, remap_flags, ops); + pos_out, len, remap_flags, ops, + blocksize); } EXPORT_SYMBOL_GPL(dax_remap_file_range_prep); diff --git a/fs/remap_range.c b/fs/remap_range.c index 469d53fb42e9..8a43038dc3e7 100644 --- a/fs/remap_range.c +++ b/fs/remap_range.c @@ -29,18 +29,18 @@ */ static int generic_remap_checks(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, - loff_t *req_count, unsigned 
int remap_flags) + loff_t *req_count, unsigned int remap_flags, + unsigned int blocksize) { struct inode *inode_in = file_in->f_mapping->host; struct inode *inode_out = file_out->f_mapping->host; uint64_t count = *req_count; uint64_t bcount; loff_t size_in, size_out; - loff_t bs = inode_out->i_sb->s_blocksize; int ret; /* The start of both ranges must be aligned to an fs block. */ - if (!IS_ALIGNED(pos_in, bs) || !IS_ALIGNED(pos_out, bs)) + if (!IS_ALIGNED(pos_in, blocksize) || !IS_ALIGNED(pos_out, blocksize)) return -EINVAL; /* Ensure offsets don't wrap. */ @@ -74,10 +74,10 @@ static int generic_remap_checks(struct file *file_in, loff_t pos_in, */ if (pos_in + count == size_in && (!(remap_flags & REMAP_FILE_DEDUP) || pos_out + count == size_out)) { - bcount = ALIGN(size_in, bs) - pos_in; + bcount = ALIGN(size_in, blocksize) - pos_in; } else { - if (!IS_ALIGNED(count, bs)) - count = ALIGN_DOWN(count, bs); + if (!IS_ALIGNED(count, blocksize)) + count = ALIGN_DOWN(count, blocksize); bcount = count; } @@ -125,9 +125,10 @@ static int generic_remap_check_len(struct inode *inode_in, struct inode *inode_out, loff_t pos_out, loff_t *len, - unsigned int remap_flags) + unsigned int remap_flags, + unsigned int blocksize) { - u64 blkmask = i_blocksize(inode_in) - 1; + u64 blkmask = blocksize - 1; loff_t new_len = *len; if ((*len & blkmask) == 0) @@ -268,7 +269,8 @@ int __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t *len, unsigned int remap_flags, - const struct iomap_ops *dax_read_ops) + const struct iomap_ops *dax_read_ops, + unsigned int blocksize) { struct inode *inode_in = file_inode(file_in); struct inode *inode_out = file_inode(file_out); @@ -303,7 +305,7 @@ __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, /* Check that we don't violate system file offset limits. 
*/ ret = generic_remap_checks(file_in, pos_in, file_out, pos_out, len, - remap_flags); + remap_flags, blocksize); if (ret || *len == 0) return ret; @@ -344,7 +346,7 @@ __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, } ret = generic_remap_check_len(inode_in, inode_out, pos_out, len, - remap_flags); + remap_flags, blocksize); if (ret || *len == 0) return ret; @@ -354,13 +356,17 @@ __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, return ret; } +EXPORT_SYMBOL(__generic_remap_file_range_prep); int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t *len, unsigned int remap_flags) { + unsigned int blocksize = file_inode(file_out)->i_sb->s_blocksize; + return __generic_remap_file_range_prep(file_in, pos_in, file_out, - pos_out, len, remap_flags, NULL); + pos_out, len, remap_flags, NULL, + blocksize); } EXPORT_SYMBOL(generic_remap_file_range_prep); diff --git a/include/linux/fs.h b/include/linux/fs.h index cd86ac22c339..5f8f4b11dc28 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2204,7 +2204,8 @@ extern ssize_t generic_copy_file_range(struct file *file_in, loff_t pos_in, int __generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t *len, unsigned int remap_flags, - const struct iomap_ops *dax_read_ops); + const struct iomap_ops *dax_read_ops, + unsigned int block_size); int generic_remap_file_range_prep(struct file *file_in, loff_t pos_in, struct file *file_out, loff_t pos_out, loff_t *count, unsigned int remap_flags); ^ permalink raw reply related [flat|nested] 565+ messages in thread
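To illustrate why the explicit blocksize argument in the patch above matters, here is a simplified userspace sketch of the alignment part of generic_remap_checks(). It models only the clone (non-dedupe) case, skips the wraparound and size-limit checks, and returns -1 where the kernel returns -EINVAL. All sketch_* names are invented for this example, and the helpers assume a power-of-two allocation unit, as the cover letter describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's IS_ALIGNED() and ALIGN_DOWN(). */
static bool sketch_is_pow2(uint64_t a)
{
	return a != 0 && (a & (a - 1)) == 0;
}

static bool sketch_is_aligned(uint64_t x, uint64_t a)
{
	assert(sketch_is_pow2(a));
	return (x & (a - 1)) == 0;
}

static uint64_t sketch_align_down(uint64_t x, uint64_t a)
{
	assert(sketch_is_pow2(a));
	return x & ~(a - 1);
}

/*
 * Check that both start offsets are aligned to @blocksize and trim
 * *@count down to a multiple of @blocksize, unless the source range ends
 * exactly at EOF (@size_in), in which case the unaligned tail is kept.
 * Returns 0 on success, -1 if either offset is misaligned.
 */
static int
sketch_remap_checks(uint64_t pos_in, uint64_t pos_out, uint64_t size_in,
		    uint64_t *count, unsigned int blocksize)
{
	if (!sketch_is_aligned(pos_in, blocksize) ||
	    !sketch_is_aligned(pos_out, blocksize))
		return -1;

	/* A range that ends at the source EOF may keep its tail. */
	if (pos_in + *count == size_in)
		return 0;

	*count = sketch_align_down(*count, blocksize);
	return 0;
}
```

Under these assumptions, offsets 4096 and 8192 pass with a 4096-byte block size but are rejected when the caller passes a 65536-byte allocation unit, which is the kind of stricter check a filesystem with large rt extents would want.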
* [PATCH 3/9] xfs: enable CoW when rt extent size is larger than 1 block 2022-12-30 22:18 ` [PATCHSET v1.0 0/9] xfs: reflink with large realtime extents Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 1/9] vfs: explicitly pass the block size to the remap prep function Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 5/9] xfs: extend writeback requests to handle rt cow correctly Darrick J. Wong ` (4 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Copy on write encounters a major plot twist when the file being CoW'd lives on the realtime volume and the realtime extent size is larger than a single filesystem block. XFS can only unmap and remap full rt extents, which means that allocations are always done in units of full rt extents, and a request to unmap less than one extent is treated as a request to convert an extent to unwritten status. This behavioral quirk is not compatible with the existing CoW mechanism, so we have to intercept every path through which files can be modified to ensure that we dirty an entire rt extent at once so that we can remap a full rt extent. Use the existing VFS unshare functions to dirty the page cache to set that up. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- fs/xfs/xfs_file.c | 171 ++++++++++++++++++++++++++++++++++++++++++++++++++ fs/xfs/xfs_inode.h | 9 +++ fs/xfs/xfs_iops.c | 15 ++++ fs/xfs/xfs_reflink.c | 39 +++++++++++ fs/xfs/xfs_trace.h | 1 5 files changed, 235 insertions(+) diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c index 87dfb05640a8..e172ca1b18df 100644 --- a/fs/xfs/xfs_file.c +++ b/fs/xfs/xfs_file.c @@ -32,6 +32,7 @@ #include <linux/mman.h> #include <linux/fadvise.h> #include <linux/mount.h> +#include <linux/buffer_head.h> static const struct vm_operations_struct xfs_file_vm_ops; @@ -396,6 +397,13 @@ xfs_file_write_checks( goto restart; } + if (xfs_inode_needs_cow_around(ip)) { + error = xfs_file_cow_around(ip, isize, + iocb->ki_pos - isize); + if (error) + return error; + } + trace_xfs_zero_eof(ip, isize, iocb->ki_pos - isize); error = xfs_zero_range(ip, isize, iocb->ki_pos - isize, NULL); if (error) @@ -508,6 +516,7 @@ xfs_file_dio_write_aligned( struct iov_iter *from) { unsigned int iolock = XFS_IOLOCK_SHARED; + size_t count = iov_iter_count(from); ssize_t ret; ret = xfs_ilock_iocb(iocb, iolock); @@ -517,6 +526,17 @@ xfs_file_dio_write_aligned( if (ret) goto out_unlock; + /* + * We can't unshare a partial rt extent yet, which means that we can't + * handle direct writes that are block-aligned but not rtextent-aligned. + */ + if (xfs_inode_needs_cow_around(ip) && + !xfs_is_falloc_aligned(ip, iocb->ki_pos, count)) { + trace_xfs_reflink_bounce_dio_write(iocb, from); + ret = -ENOTBLK; + goto out_unlock; + } + /* * We don't need to hold the IOLOCK exclusively across the IO, so demote * the iolock back to shared if we had to take the exclusive lock in @@ -753,6 +773,68 @@ xfs_file_buffered_write( return ret; } +/* Unshare the rtextent at the given file position. 
*/ +static inline int +xfs_file_unshare_at( + struct xfs_inode *ip, + loff_t isize, + unsigned int extsize, + loff_t pos) +{ + loff_t len = extsize; + uint32_t mod; + + div_u64_rem(pos, extsize, &mod); + if (mod == 0) + return 0; + + pos -= mod; + if (pos >= isize) + return 0; + + if (pos + len > isize) + len = isize - pos; + + trace_xfs_file_cow_around(ip, pos, len); + + return iomap_file_unshare(VFS_I(ip), pos, len, + &xfs_buffered_write_iomap_ops); +} + +/* + * Dirty the pages on either side of a write request as needed to satisfy + * alignment requirements if we're going to perform a copy-write. + * + * This is only needed for realtime files when the rt extent size is larger + * than 1 fs block, because we don't allow a logical rt extent in a file to map + * to multiple physical rt extents. In other words, we can only map and unmap + * full rt extents. Note that page cache doesn't exist above EOF, so be + * careful to stay below EOF. + */ +int +xfs_file_cow_around( + struct xfs_inode *ip, + loff_t pos, + long long int count) +{ + unsigned int extsize = xfs_inode_alloc_unitsize(ip); + loff_t isize = i_size_read(VFS_I(ip)); + int error; + + if (xfs_is_falloc_aligned(ip, pos, count)) + return 0; + + inode_dio_wait(VFS_I(ip)); + + /* Unshare at the start of the extent. */ + error = xfs_file_unshare_at(ip, isize, extsize, pos); + if (error) + return error; + + /* Unshare at the end. 
*/ + return xfs_file_unshare_at(ip, isize, extsize, pos + count); +} + STATIC ssize_t xfs_file_write_iter( struct kiocb *iocb, @@ -774,6 +856,16 @@ xfs_file_write_iter( if (IS_DAX(inode)) return xfs_file_dax_write(iocb, from); + if (xfs_inode_needs_cow_around(ip)) { + ret = xfs_ilock_iocb(iocb, XFS_IOLOCK_EXCL); + if (ret) + return ret; + ret = xfs_file_cow_around(ip, iocb->ki_pos, ocount); + xfs_iunlock(ip, XFS_IOLOCK_EXCL); + if (ret) + return ret; + } + if (iocb->ki_flags & IOCB_DIRECT) { /* * Allow a directio write to fall back to a buffered @@ -929,6 +1021,13 @@ xfs_file_fallocate( goto out_unlock; if (mode & FALLOC_FL_PUNCH_HOLE) { + /* Unshare around the region to punch, if needed. */ + if (xfs_inode_needs_cow_around(ip)) { + error = xfs_file_cow_around(ip, offset, len); + if (error) + goto out_unlock; + } + error = xfs_free_file_space(ip, offset, len); if (error) goto out_unlock; @@ -999,6 +1098,14 @@ xfs_file_fallocate( trace_xfs_zero_file_space(ip); + /* Unshare around the region to zero, if needed. */ + if (xfs_inode_needs_cow_around(ip)) { + error = xfs_file_cow_around(ip, offset, + len); + if (error) + goto out_unlock; + } + error = xfs_free_file_space(ip, offset, len); if (error) goto out_unlock; @@ -1007,6 +1114,26 @@ xfs_file_fallocate( round_down(offset, blksize); offset = round_down(offset, blksize); } else if (mode & FALLOC_FL_UNSHARE_RANGE) { + /* + * Enlarge the unshare region to align to a full + * allocation unit. 
+ */ + if (xfs_inode_needs_cow_around(ip)) { + loff_t isize = i_size_read(VFS_I(ip)); + unsigned int rextsize; + uint32_t mod; + + rextsize = xfs_inode_alloc_unitsize(ip); + div_u64_rem(offset, rextsize, &mod); + offset -= mod; + len += mod; + + div_u64_rem(offset + len, rextsize, &mod); + if (mod) + len += rextsize - mod; + if (offset + len > isize) + len = isize - offset; + } error = xfs_reflink_unshare(ip, offset, len); if (error) goto out_unlock; @@ -1341,6 +1468,35 @@ xfs_dax_fault( } #endif +static int +xfs_filemap_fault_around( + struct vm_fault *vmf, + struct inode *inode) +{ + struct folio *folio = page_folio(vmf->page); + loff_t pos; + ssize_t len; + int error; + + if (!xfs_inode_needs_cow_around(XFS_I(inode))) + return 0; + + folio_lock(folio); + len = folio_mkwrite_check_truncate(folio, inode); + if (len < 0) { + folio_unlock(folio); + return len; + } + pos = folio_pos(folio); + folio_unlock(folio); + + error = xfs_file_cow_around(XFS_I(inode), pos, len); + if (error) + return error; + + return 0; +} + /* * Locking for serialisation of IO during page faults. This results in a lock * ordering of: @@ -1378,7 +1534,21 @@ __xfs_filemap_fault( xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED); } else { if (write_fault) { + int error; + xfs_ilock(XFS_I(inode), XFS_MMAPLOCK_SHARED); + + /* + * Unshare all the blocks in this rt extent surrounding + * this page. 
+ */ + error = xfs_filemap_fault_around(vmf, inode); + if (error) { + xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED); + ret = block_page_mkwrite_return(error); + goto out; + } + ret = iomap_page_mkwrite(vmf, &xfs_page_mkwrite_iomap_ops); xfs_iunlock(XFS_I(inode), XFS_MMAPLOCK_SHARED); @@ -1387,6 +1557,7 @@ __xfs_filemap_fault( } } +out: if (write_fault) sb_end_pagefault(inode->i_sb); return ret; diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h index ca7ebb07efc7..32a1d114dfaf 100644 --- a/fs/xfs/xfs_inode.h +++ b/fs/xfs/xfs_inode.h @@ -309,6 +309,12 @@ static inline bool xfs_inode_has_bigrtextents(struct xfs_inode *ip) return XFS_IS_REALTIME_INODE(ip) && ip->i_mount->m_sb.sb_rextsize > 1; } +/* Decide if we need to unshare the blocks around a range that we're writing. */ +static inline bool xfs_inode_needs_cow_around(struct xfs_inode *ip) +{ + return xfs_is_reflink_inode(ip) && xfs_inode_has_bigrtextents(ip); +} + /* * Return the buftarg used for data allocations on a given inode. */ @@ -636,4 +642,7 @@ int xfs_icreate_dqalloc(const struct xfs_icreate_args *args, struct xfs_dquot **udqpp, struct xfs_dquot **gdqpp, struct xfs_dquot **pdqpp); +int xfs_file_cow_around(struct xfs_inode *ip, loff_t pos, + long long int count); + #endif /* __XFS_INODE_H__ */ diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c index 626ce6c4e2bf..c0a827b23948 100644 --- a/fs/xfs/xfs_iops.c +++ b/fs/xfs/xfs_iops.c @@ -26,6 +26,7 @@ #include "xfs_ioctl.h" #include "xfs_xattr.h" #include "xfs_bmap.h" +#include "xfs_reflink.h" #include <linux/posix_acl.h> #include <linux/security.h> @@ -861,10 +862,24 @@ xfs_setattr_size( * truncate. 
*/ if (newsize > oldsize) { + if (xfs_inode_needs_cow_around(ip)) { + error = xfs_file_cow_around(ip, oldsize, + newsize - oldsize); + if (error) + return error; + } + trace_xfs_zero_eof(ip, oldsize, newsize - oldsize); error = xfs_zero_range(ip, oldsize, newsize - oldsize, &did_zeroing); } else { + if (xfs_inode_needs_cow_around(ip)) { + error = xfs_file_cow_around(ip, newsize, + oldsize - newsize); + if (error) + return error; + } + /* * iomap won't detect a dirty page over an unwritten block (or a * cow block over a hole) and subsequently skips zeroing the diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 13a613c077df..8690017beb9b 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -34,6 +34,7 @@ #include "xfs_rtalloc.h" #include "xfs_rtgroup.h" #include "xfs_imeta.h" +#include "xfs_rtbitmap.h" /* * Copy on Write of Shared Blocks @@ -297,9 +298,26 @@ xfs_reflink_convert_cow_locked( struct xfs_iext_cursor icur; struct xfs_bmbt_irec got; struct xfs_btree_cur *dummy_cur = NULL; + struct xfs_mount *mp = ip->i_mount; int dummy_logflags; int error = 0; + /* + * We can only remap full rt extents, so make sure that we convert the + * entire extent. The caller must ensure that this is either a direct + * write that's aligned to the rt extent size, or a buffered write for + * which we've dirtied extra pages to make this work properly. 
+ */ + if (xfs_inode_needs_cow_around(ip)) { + xfs_fileoff_t new_off; + + new_off = xfs_rtb_rounddown_rtx(mp, offset_fsb); + count_fsb += offset_fsb - new_off; + offset_fsb = new_off; + + count_fsb = xfs_rtb_roundup_rtx(mp, count_fsb); + } + if (!xfs_iext_lookup_extent(ip, ip->i_cowfp, offset_fsb, &icur, &got)) return 0; @@ -635,11 +653,21 @@ xfs_reflink_cancel_cow_blocks( bool cancel_real) { struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_COW_FORK); + struct xfs_mount *mp = ip->i_mount; struct xfs_bmbt_irec got, del; struct xfs_iext_cursor icur; bool isrt = XFS_IS_REALTIME_INODE(ip); int error = 0; + /* + * Shrink the range that we're cancelling if they don't align to the + * realtime extent size, since we can only free full extents. + */ + if (xfs_inode_needs_cow_around(ip)) { + offset_fsb = xfs_rtb_roundup_rtx(mp, offset_fsb); + end_fsb = xfs_rtb_rounddown_rtx(mp, end_fsb); + } + if (!xfs_inode_has_cow_data(ip)) return 0; if (!xfs_iext_lookup_extent_before(ip, ifp, &end_fsb, &icur, &got)) @@ -942,6 +970,7 @@ xfs_reflink_end_cow( xfs_off_t offset, xfs_off_t count) { + struct xfs_mount *mp = ip->i_mount; xfs_fileoff_t offset_fsb; xfs_fileoff_t end_fsb; int error = 0; @@ -951,6 +980,16 @@ xfs_reflink_end_cow( offset_fsb = XFS_B_TO_FSBT(ip->i_mount, offset); end_fsb = XFS_B_TO_FSB(ip->i_mount, offset + count); + /* + * Make sure the end is aligned with a rt extent (if desired), since + * the end of the range could be EOF. The _convert_cow function should + * have set us up to swap only full rt extents. + */ + if (xfs_inode_needs_cow_around(ip)) { + offset_fsb = xfs_rtb_rounddown_rtx(mp, offset_fsb); + end_fsb = xfs_rtb_roundup_rtx(mp, end_fsb); + } + /* * Walk forwards until we've remapped the I/O range. 
The loop function * repeatedly cycles the ILOCK to allocate one transaction per remapped diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index d07947451ec9..d5b0dc3c5a0d 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3888,6 +3888,7 @@ TRACE_EVENT(xfs_ioctl_clone, /* unshare tracepoints */ DEFINE_SIMPLE_IO_EVENT(xfs_reflink_unshare); +DEFINE_SIMPLE_IO_EVENT(xfs_file_cow_around); DEFINE_INODE_ERROR_EVENT(xfs_reflink_unshare_error); #ifdef CONFIG_XFS_RT DEFINE_SIMPLE_IO_EVENT(xfs_rtfile_convert_unwritten); ^ permalink raw reply related [flat|nested] 565+ messages in thread
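The rounding in xfs_reflink_convert_cow_locked() and xfs_reflink_end_cow() widens a block range outward to rt extent boundaries, while xfs_reflink_cancel_cow_blocks() shrinks it inward. Here is a minimal standalone sketch of both directions in plain C. The helpers rtx_rounddown() and rtx_roundup() are hypothetical stand-ins for xfs_rtb_rounddown_rtx() and xfs_rtb_roundup_rtx(), whose definitions are not part of this patch, and struct fsb_range exists only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for xfs_rtb_rounddown_rtx(): round down to an rt extent. */
static uint64_t rtx_rounddown(uint64_t fsb, uint32_t rextsize)
{
	assert(rextsize > 0);
	return fsb - (fsb % rextsize);
}

/* Hypothetical stand-in for xfs_rtb_roundup_rtx(): round up to an rt extent. */
static uint64_t rtx_roundup(uint64_t fsb, uint32_t rextsize)
{
	return rtx_rounddown(fsb + rextsize - 1, rextsize);
}

/* Illustration-only type: a block range [off, off + len). */
struct fsb_range {
	uint64_t off;
	uint64_t len;
};

/* Widen [off, off + len) so both ends land on rt extent boundaries. */
static struct fsb_range range_round_out(uint64_t off, uint64_t len,
					uint32_t rextsize)
{
	struct fsb_range r;
	uint64_t end = rtx_roundup(off + len, rextsize);

	r.off = rtx_rounddown(off, rextsize);
	r.len = end - r.off;
	return r;
}

/* Narrow [off, end) to the full rt extents it covers; len is 0 if none. */
static struct fsb_range range_round_in(uint64_t off, uint64_t end,
				       uint32_t rextsize)
{
	struct fsb_range r;
	uint64_t new_end = rtx_rounddown(end, rextsize);

	r.off = rtx_roundup(off, rextsize);
	r.len = new_end > r.off ? new_end - r.off : 0;
	return r;
}
```

With a 7-block rt extent, widening blocks [10, 15) gives [7, 21), and narrowing [10, 30) gives [14, 28). A range that does not cover a single full extent narrows to nothing, so a cancel request that small would free nothing.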
* [PATCH 5/9] xfs: extend writeback requests to handle rt cow correctly 2022-12-30 22:18 ` [PATCHSET v1.0 0/9] xfs: reflink with large realtime extents Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 3/9] xfs: enable CoW when rt extent size is larger than 1 block Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 6/9] xfs: enable extent size hints for CoW when rtextsize > 1 Darrick J. Wong ` (3 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> If we have shared realtime files and the rt extent size is larger than a single fs block, we need to extend writeback requests to be aligned to rt extent size granularity because we cannot share partial rt extents. The front end should have set us up for this by dirtying the relevant ranges. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_aops.c | 40 ++++++++++++++++++++++++++++++++++++---- 1 file changed, 36 insertions(+), 4 deletions(-) diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c index c3a9df0c0eab..af5c854a72dc 100644 --- a/fs/xfs/xfs_aops.c +++ b/fs/xfs/xfs_aops.c @@ -488,12 +488,41 @@ static const struct iomap_writeback_ops xfs_writeback_ops = { .discard_folio = xfs_discard_folio, }; +/* + * Extend the writeback range to allocation unit granularity and alignment. + * This is a requirement for blocksize > pagesize scenarios such as realtime + * copy on write, since we can only share full rt extents. 
+ */ +static void +xfs_vm_writepage_extend( + struct xfs_inode *ip, + struct writeback_control *wbc) +{ + unsigned int bsize = xfs_inode_alloc_unitsize(ip); + long long int pages_to_write; + + wbc->range_start = rounddown_64(wbc->range_start, bsize); + if (wbc->range_end != LLONG_MAX) + wbc->range_end = roundup_64(wbc->range_end, bsize); + + if (wbc->nr_to_write == LONG_MAX) + return; + + pages_to_write = roundup_64(wbc->range_end - wbc->range_start, + PAGE_SIZE); + if (pages_to_write >= LONG_MAX) + pages_to_write = LONG_MAX; + if (wbc->nr_to_write < pages_to_write) + wbc->nr_to_write = pages_to_write; +} + STATIC int xfs_vm_writepages( - struct address_space *mapping, - struct writeback_control *wbc) + struct address_space *mapping, + struct writeback_control *wbc) { - struct xfs_writepage_ctx wpc = { }; + struct xfs_writepage_ctx wpc = { }; + struct xfs_inode *ip = XFS_I(mapping->host); /* * Writing back data in a transaction context can result in recursive @@ -502,7 +531,10 @@ xfs_vm_writepages( if (WARN_ON_ONCE(current->journal_info)) return 0; - xfs_iflags_clear(XFS_I(mapping->host), XFS_ITRUNCATED); + if (xfs_inode_needs_cow_around(ip)) + xfs_vm_writepage_extend(ip, wbc); + + xfs_iflags_clear(ip, XFS_ITRUNCATED); return iomap_writepages(mapping, wbc, &wpc.ctx, &xfs_writeback_ops); } ^ permalink raw reply related [flat|nested] 565+ messages in thread
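To make the range extension concrete, here is a userspace sketch of the same idea. Everything in it is an assumption for illustration: struct wb_range is a cut-down stand-in for struct writeback_control, SKETCH_PAGE_SIZE is fixed at 4096, and the rounding helpers are local rather than the kernel's rounddown_64()/roundup_64(). The sketch also converts the byte span to a page count by dividing by the page size and skips the page-count bump for open-ended ranges. That is my reading of the intent and differs from the hunk as posted, which passes the byte span through roundup_64(..., PAGE_SIZE) without dividing.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096LL	/* assumed page size for this sketch */

/* Hypothetical, cut-down stand-in for struct writeback_control. */
struct wb_range {
	long long range_start;	/* bytes */
	long long range_end;	/* bytes; LLONG_MAX means "to EOF" */
	long nr_to_write;	/* pages; LONG_MAX means "no limit" */
};

static long long round_down_ll(long long x, long long unit)
{
	assert(unit > 0 && x >= 0);
	return x - (x % unit);
}

/* Callers must keep x well below LLONG_MAX to avoid overflow. */
static long long round_up_ll(long long x, long long unit)
{
	return round_down_ll(x + unit - 1, unit);
}

/* Extend a writeback request to allocation unit granularity. */
static void wb_extend(struct wb_range *wbc, long long alloc_unit_bytes)
{
	long long pages;

	wbc->range_start = round_down_ll(wbc->range_start, alloc_unit_bytes);
	if (wbc->range_end != LLONG_MAX)
		wbc->range_end = round_up_ll(wbc->range_end, alloc_unit_bytes);

	/* Nothing to recompute for unlimited or open-ended requests. */
	if (wbc->nr_to_write == LONG_MAX || wbc->range_end == LLONG_MAX)
		return;

	/* Make sure the page budget covers the widened byte range. */
	pages = round_up_ll(wbc->range_end - wbc->range_start,
			    SKETCH_PAGE_SIZE) / SKETCH_PAGE_SIZE;
	if (pages > LONG_MAX)
		pages = LONG_MAX;
	if (wbc->nr_to_write < pages)
		wbc->nr_to_write = (long)pages;
}
```

For a 28k allocation unit, a one-page request covering bytes 5000 to 9000 becomes a seven-page request covering bytes 0 to 28672.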
* [PATCH 6/9] xfs: enable extent size hints for CoW when rtextsize > 1 2022-12-30 22:18 ` [PATCHSET v1.0 0/9] xfs: reflink with large realtime extents Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 5/9] xfs: extend writeback requests to handle rt cow correctly Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 8/9] xfs: fix integer overflow when validating extent size hints Darrick J. Wong ` (2 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> CoW extent size hints are not allowed on filesystems that have large realtime extents because we only want to perform the minimum required amount of write-around (aka write amplification) for shared extents. On filesystems where rtextsize > 1, allocations can only be done in units of full rt extents, which means that we can only map an entire rt extent's worth of blocks into the data fork. Hole punch requests become conversions to unwritten if the request isn't aligned properly. Because a copy-write fundamentally requires remapping, this means that we also can only do copy-writes of a full rt extent. This is too expensive for large hint sizes, since it's all or nothing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_bmap.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c index b2bc39b1f9b7..053d72063999 100644 --- a/fs/xfs/libxfs/xfs_bmap.c +++ b/fs/xfs/libxfs/xfs_bmap.c @@ -6451,6 +6451,28 @@ xfs_get_cowextsz_hint( if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) a = ip->i_cowextsize; if (XFS_IS_REALTIME_INODE(ip)) { + /* + * For realtime files, the realtime extent is the fundamental + * unit of allocation. This means that data sharing and CoW + * remapping can only be done in those units. 
For filesystems + * where the extent size is larger than one block, write + * requests that are not aligned to an extent boundary employ + * an unshare-around strategy to ensure that all pages for a + * shared extent are fully dirtied. + * + * Because the remapping alignment requirement applies equally + * to all CoW writes, any regular overwrites that could be + * turned (by a speculative CoW preallocation) into a CoW write + * must either employ this dirty-around strategy, or be smart + * enough to ignore the CoW fork mapping unless the entire + * extent is dirty or becomes shared by writeback time. Doing + * the first would dramatically increase write amplification, + * and the second would require deeper insight into the state + * of the page cache during a writeback request. For now, we + * ignore the hint. + */ + if (ip->i_mount->m_sb.sb_rextsize > 1) + return ip->i_mount->m_sb.sb_rextsize; b = 0; if (ip->i_diflags & XFS_DIFLAG_EXTSIZE) b = ip->i_extsize; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 8/9] xfs: fix integer overflow when validating extent size hints 2022-12-30 22:18 ` [PATCHSET v1.0 0/9] xfs: reflink with large realtime extents Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 6/9] xfs: enable extent size hints for CoW when rtextsize > 1 Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 7/9] xfs: allow reflink on the rt volume when extent size is larger than 1 rt block Darrick J. Wong 2022-12-30 22:18 ` [PATCH 9/9] xfs: support realtime reflink with an extent size that isn't a power of 2 Darrick J. Wong 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Both file extent size hints are stored as 32-bit quantities, in units of filesystem blocks. As part of validating the hints, we convert these quantities to bytes to ensure that the hint is congruent with the file's allocation size. The maximum possible hint value is 2097151 (aka XFS_MAX_BMBT_EXTLEN). If the file allocation unit is larger than 2048, the unit conversion will exceed 32 bits in size, which overflows the uint32_t used to store the value used in the comparison. This isn't a problem for files on the data device since the hint will always be a multiple of the block size. However, this is a problem for realtime files because the rtextent size can be any integer number of fs blocks, and truncation of upper bits changes the outcome of division. Eliminate the overflow by performing the congruency check in units of blocks, not bytes. Otherwise, we get errors like this: $ truncate -s 500T /tmp/a $ mkfs.xfs -f -N /tmp/a -d extszinherit=2097151,rtinherit=1 -r extsize=28k illegal extent size hint 2097151, must be less than 2097151 and a multiple of 7. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- fs/xfs/libxfs/xfs_inode_buf.c | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c index 09dafa8a9ab2..6f2ae73559d1 100644 --- a/fs/xfs/libxfs/xfs_inode_buf.c +++ b/fs/xfs/libxfs/xfs_inode_buf.c @@ -740,13 +740,11 @@ xfs_inode_validate_extsize( bool rt_flag; bool hint_flag; bool inherit_flag; - uint32_t extsize_bytes; - uint32_t blocksize_bytes; + uint32_t alloc_unit = 1; rt_flag = (flags & XFS_DIFLAG_REALTIME); hint_flag = (flags & XFS_DIFLAG_EXTSIZE); inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT); - extsize_bytes = XFS_FSB_TO_B(mp, extsize); /* * This comment describes a historic gap in this verifier function. @@ -775,9 +773,7 @@ xfs_inode_validate_extsize( */ if (rt_flag) - blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize); - else - blocksize_bytes = mp->m_sb.sb_blocksize; + alloc_unit = mp->m_sb.sb_rextsize; if ((hint_flag || inherit_flag) && !(S_ISDIR(mode) || S_ISREG(mode))) return __this_address; @@ -795,7 +791,7 @@ xfs_inode_validate_extsize( if (mode && !(hint_flag || inherit_flag) && extsize != 0) return __this_address; - if (extsize_bytes % blocksize_bytes) + if (extsize % alloc_unit) return __this_address; if (extsize > XFS_MAX_BMBT_EXTLEN) @@ -830,12 +826,10 @@ xfs_inode_validate_cowextsize( { bool rt_flag; bool hint_flag; - uint32_t cowextsize_bytes; - uint32_t blocksize_bytes; + uint32_t alloc_unit = 1; rt_flag = (flags & XFS_DIFLAG_REALTIME); hint_flag = (flags2 & XFS_DIFLAG2_COWEXTSIZE); - cowextsize_bytes = XFS_FSB_TO_B(mp, cowextsize); /* * Similar to extent size hints, a directory can be configured to @@ -850,9 +844,7 @@ xfs_inode_validate_cowextsize( */ if (rt_flag) - blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize); - else - blocksize_bytes = mp->m_sb.sb_blocksize; + alloc_unit = mp->m_sb.sb_rextsize; if (hint_flag && !xfs_has_reflink(mp)) return __this_address; @@ -867,7 +859,7 @@ 
xfs_inode_validate_cowextsize( if (mode && !hint_flag && cowextsize != 0) return __this_address; - if (cowextsize_bytes % blocksize_bytes) + if (cowextsize % alloc_unit) return __this_address; if (cowextsize > XFS_MAX_BMBT_EXTLEN) ^ permalink raw reply related [flat|nested] 565+ messages in thread
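The arithmetic behind the mkfs failure in the commit message can be reproduced in userspace. The sketch below is not the kernel code: hint_ok_bytes32() and hint_ok_blocks() are hypothetical names, and the 4096-byte block size is inferred from the example (28k / 7). The first function mimics the old check by truncating the byte conversion to 32 bits, the second mimics the new check in units of blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Old-style check (hypothetical name): convert both quantities to bytes
 * and store them in 32-bit variables, as the removed code did.
 */
static bool hint_ok_bytes32(uint32_t extsize_fsb, uint32_t alloc_unit_fsb,
			    uint32_t blocksize)
{
	/* The 64-bit product is truncated when stored in a uint32_t. */
	uint32_t extsize_bytes = (uint32_t)((uint64_t)extsize_fsb * blocksize);
	uint32_t unit_bytes = (uint32_t)((uint64_t)alloc_unit_fsb * blocksize);

	assert(unit_bytes != 0);
	return extsize_bytes % unit_bytes == 0;
}

/* New-style check (hypothetical name): compare in blocks, no conversion. */
static bool hint_ok_blocks(uint32_t extsize_fsb, uint32_t alloc_unit_fsb)
{
	assert(alloc_unit_fsb != 0);
	return extsize_fsb % alloc_unit_fsb == 0;
}
```

With a hint of 2097151 blocks and a 7-block rt extent, the hint really is a multiple of 7 (2097151 = 7 * 299593). In bytes it is 8589930496, which truncates to 4294963200 in 32 bits. That is 1048575 blocks, and 1048575 mod 7 is 3, so hint_ok_bytes32(2097151, 7, 4096) wrongly rejects the hint while hint_ok_blocks(2097151, 7) accepts it.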
* [PATCH 7/9] xfs: allow reflink on the rt volume when extent size is larger than 1 rt block 2022-12-30 22:18 ` [PATCHSET v1.0 0/9] xfs: reflink with large realtime extents Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 8/9] xfs: fix integer overflow when validating extent size hints Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 9/9] xfs: support realtime reflink with an extent size that isn't a power of 2 Darrick J. Wong 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make the necessary tweaks to the reflink remapping code to support remapping on the realtime volume when the rt extent size is larger than a single rt block. We need to check that the remap arguments from userspace are aligned to a rt extent boundary, and that the length is always aligned, even if the kernel tried to round it up to EOF for us. XFS can only map and remap full rt extents, so we have to be a little more strict about the alignment there. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_reflink.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++---- fs/xfs/xfs_rtalloc.c | 2 + fs/xfs/xfs_super.c | 19 +++++++++--- fs/xfs/xfs_trace.h | 3 ++ 4 files changed, 90 insertions(+), 12 deletions(-) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index b9f47bdbe383..28fe946ecd08 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1530,6 +1530,13 @@ xfs_reflink_remap_blocks( len = min_t(xfs_filblks_t, XFS_B_TO_FSB(mp, remap_len), XFS_MAX_FILEOFF); + /* + * Make sure the end is aligned with a rt extent (if desired), since + * the end of the range could be EOF. 
+ */ + if (xfs_inode_has_bigrtextents(dest)) + len = xfs_rtb_roundup_rtx(mp, len); + trace_xfs_reflink_remap_blocks(src, srcoff, len, dest, destoff); while (len > 0) { @@ -1603,6 +1610,54 @@ xfs_reflink_zero_posteof( return xfs_zero_range(ip, isize, pos - isize, NULL); } +#ifdef CONFIG_XFS_RT +/* Adjust the length of the remap operation to end on a rt extent boundary. */ +STATIC int +xfs_reflink_remap_adjust_rtlen( + struct xfs_inode *src, + loff_t pos_in, + struct xfs_inode *dest, + loff_t pos_out, + loff_t *len, + unsigned int remap_flags) +{ + struct xfs_mount *mp = src->i_mount; + uint32_t mod; + + div_u64_rem(*len, XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize), &mod); + + /* + * We previously checked the rtextent alignment of both offsets, so we + * now have to check the alignment of the length. The VFS remap prep + * function can change the length on us, so we can only make length + * adjustments after that. If the length is aligned to an rtextent, + * we're trivially good to go. + * + * Otherwise, the length is not aligned to an rt extent. If the source + * file's range ends at EOF, the VFS ensured that the dest file's range + * also ends at EOF. The actual remap function will round the (byte) + * length up to the nearest rtextent unit, so we're ok here too. + */ + if (mod == 0 || pos_in + *len == i_size_read(VFS_I(src))) + return 0; + + /* + * Otherwise, the only thing we can do is round the request length down + * to an rt extent boundary. If the caller doesn't allow that, we are + * finished. + */ + if (!(remap_flags & REMAP_FILE_CAN_SHORTEN)) + return -EINVAL; + + /* Back off by a single extent. */ + (*len) -= mod; + trace_xfs_reflink_remap_adjust_rtlen(src, pos_in, *len, dest, pos_out); + return 0; +} +#else +# define xfs_reflink_remap_adjust_rtlen(...) (0) +#endif /* CONFIG_XFS_RT */ + /* * Prepare two files for range cloning. 
Upon a successful return both inodes * will have the iolock and mmaplock held, the page cache of the out file will @@ -1645,6 +1700,7 @@ xfs_reflink_remap_prep( struct xfs_inode *src = XFS_I(inode_in); struct inode *inode_out = file_inode(file_out); struct xfs_inode *dest = XFS_I(inode_out); + const struct iomap_ops *dax_read_ops = NULL; int ret; /* Lock both files against IO */ @@ -1662,15 +1718,25 @@ xfs_reflink_remap_prep( if (IS_DAX(inode_in) != IS_DAX(inode_out)) goto out_unlock; - if (!IS_DAX(inode_in)) - ret = generic_remap_file_range_prep(file_in, pos_in, file_out, - pos_out, len, remap_flags); - else - ret = dax_remap_file_range_prep(file_in, pos_in, file_out, - pos_out, len, remap_flags, &xfs_read_iomap_ops); + ASSERT(is_power_of_2(xfs_inode_alloc_unitsize(dest))); + + if (IS_DAX(inode_in)) + dax_read_ops = &xfs_read_iomap_ops; + + ret = __generic_remap_file_range_prep(file_in, pos_in, file_out, + pos_out, len, remap_flags, dax_read_ops, + xfs_inode_alloc_unitsize(dest)); if (ret || *len == 0) goto out_unlock; + /* Make sure the end is aligned with a rt extent. 
*/ + if (xfs_inode_has_bigrtextents(src)) { + ret = xfs_reflink_remap_adjust_rtlen(src, pos_in, dest, + pos_out, len, remap_flags); + if (ret || *len == 0) + goto out_unlock; + } + /* Attach dquots to dest inode before changing block map */ ret = xfs_qm_dqattach(dest); if (ret) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 75d39c3274df..7c1edd5c2554 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1312,7 +1312,7 @@ xfs_growfs_rt( return -EOPNOTSUPP; if (xfs_has_quota(mp)) return -EOPNOTSUPP; - if (xfs_has_reflink(mp) && in->extsize != 1) + if (xfs_has_reflink(mp) && !is_power_of_2(mp->m_sb.sb_rextsize)) return -EOPNOTSUPP; nrblocks = in->newblocks; diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index a3a0011272e5..31c1690ed847 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1657,13 +1657,22 @@ xfs_fs_fill_super( if (xfs_has_reflink(mp)) { /* - * Reflink doesn't support rt extent sizes larger than a single - * block because we would have to perform unshare-around for - * rtext-unaligned write requests. + * Reflink doesn't support pagecache pages that span multiple + * realtime extents because iomap doesn't track subpage dirty + * state. This means that we cannot dirty all the pages + * backing an rt extent without dirtying the adjoining rt + * extents. If those rt extents are shared and extend into + * other pages, this leads to crazy write amplification. The + * VFS remap_range checks assume power-of-two block sizes. + * + * Hence we only support rt extent sizes that are an integer + * power of two because we know those will align with the page + * size. 
*/ - if (xfs_has_realtime(mp) && mp->m_sb.sb_rextsize != 1) { + if (xfs_has_realtime(mp) && + !is_power_of_2(mp->m_sb.sb_rextsize)) { xfs_alert(mp, - "reflink not compatible with realtime extent size %u!", + "reflink not compatible with non-power-of-2 realtime extent size %u!", mp->m_sb.sb_rextsize); error = -EINVAL; goto out_filestream_unmount; diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h index d5b0dc3c5a0d..00716f112f4e 100644 --- a/fs/xfs/xfs_trace.h +++ b/fs/xfs/xfs_trace.h @@ -3848,6 +3848,9 @@ TRACE_EVENT(xfs_reflink_remap_blocks, __entry->dest_lblk) ); DEFINE_DOUBLE_IO_EVENT(xfs_reflink_remap_range); +#ifdef CONFIG_XFS_RT +DEFINE_DOUBLE_IO_EVENT(xfs_reflink_remap_adjust_rtlen); +#endif /* CONFIG_XFS_RT */ DEFINE_INODE_ERROR_EVENT(xfs_reflink_remap_range_error); DEFINE_INODE_ERROR_EVENT(xfs_reflink_set_inode_flag_error); DEFINE_INODE_ERROR_EVENT(xfs_reflink_update_inode_size_error); ^ permalink raw reply related [flat|nested] 565+ messages in thread
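The decision made by xfs_reflink_remap_adjust_rtlen() boils down to three cases: the length is already rt extent aligned, the request ends at the source file's EOF, or the length has to be trimmed. A userspace sketch of that logic follows. The name remap_adjust_len() and its flattened argument list are assumptions for illustration, and the can_shorten flag stands in for testing REMAP_FILE_CAN_SHORTEN in remap_flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the length adjustment.  Returns 0 on
 * success (possibly after shortening *len) or -EINVAL if the length is
 * unaligned and the caller does not allow shortening.
 */
static int remap_adjust_len(uint64_t pos_in, uint64_t *len, uint64_t src_size,
			    uint64_t rext_bytes, bool can_shorten)
{
	uint64_t mod;

	assert(rext_bytes > 0);
	mod = *len % rext_bytes;

	/* Aligned length, or a range ending at EOF: nothing to do. */
	if (mod == 0 || pos_in + *len == src_size)
		return 0;

	/* Unaligned and the caller insists on the full length. */
	if (!can_shorten)
		return -EINVAL;

	/* Trim the partial rt extent off the end of the request. */
	*len -= mod;
	return 0;
}
```

With 8192-byte rt extents, a 10000-byte request in the middle of a file is trimmed to 8192 bytes when shortening is allowed and rejected otherwise. The same request passes through unchanged if it ends exactly at the source file's EOF, since the remap itself rounds that case up.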
* [PATCH 9/9] xfs: support realtime reflink with an extent size that isn't a power of 2 2022-12-30 22:18 ` [PATCHSET v1.0 0/9] xfs: reflink with large realtime extents Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:18 ` [PATCH 7/9] xfs: allow reflink on the rt volume when extent size is larger than 1 rt block Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add the necessary alignment checking code to the reflink remap code to ensure that remap requests are aligned to rt extent boundaries if the realtime extent size isn't a power of two. The VFS helpers assume that they can use the usual (blocksize - 1) masking to avoid slow 64-bit division, but since XFS is special we won't make everyone pay that cost for our weird edge case. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_reflink.c | 92 ++++++++++++++++++++++++++++++++++++++++++++++++-- fs/xfs/xfs_rtalloc.c | 3 +- fs/xfs/xfs_super.c | 12 +++---- 3 files changed, 97 insertions(+), 10 deletions(-) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 28fe946ecd08..1ec00204b33f 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1658,6 +1658,83 @@ xfs_reflink_remap_adjust_rtlen( # define xfs_reflink_remap_adjust_rtlen(...) (0) #endif /* CONFIG_XFS_RT */ +/* + * Check the alignment of a remap request when the allocation unit size isn't a + * power of two. The VFS helpers use (fast) bitmask-based alignment checks, + * but here we have to use slow long division. 
+ */ +static int +xfs_reflink_remap_check_rtalign( + struct xfs_inode *ip_in, + loff_t pos_in, + struct xfs_inode *ip_out, + loff_t pos_out, + loff_t *req_len, + unsigned int remap_flags) +{ + struct xfs_mount *mp = ip_in->i_mount; + uint32_t rextbytes; + loff_t in_size, out_size; + loff_t new_length, length = *req_len; + loff_t blen; + + rextbytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize); + in_size = i_size_read(VFS_I(ip_in)); + out_size = i_size_read(VFS_I(ip_out)); + + /* The start of both ranges must be aligned to a rt extent. */ + if (!isaligned_64(pos_in, rextbytes) || + !isaligned_64(pos_out, rextbytes)) + return -EINVAL; + + if (length == 0) + length = in_size - pos_in; + + /* + * If the user wanted us to exchange up to the infile's EOF, round up + * to the next block boundary for this check. + * + * Otherwise, reject the range length if it's not extent aligned. We + * already confirmed the starting offsets' extent alignment. + */ + if (pos_in + length == in_size) + blen = roundup_64(in_size, rextbytes) - pos_in; + else + blen = rounddown_64(length, rextbytes); + + /* Don't allow overlapped remappings within the same file. */ + if (ip_in == ip_out && + pos_out + blen > pos_in && + pos_in + blen > pos_out) + return -EINVAL; + + /* + * Ensure that we don't exchange a partial EOF extent into the middle + * of another file. + */ + if (isaligned_64(length, rextbytes)) + return 0; + + new_length = length; + if (pos_out + length < out_size) + new_length = rounddown_64(new_length, rextbytes); + + if (new_length == length) + return 0; + + /* + * Return the shortened request if the caller permits it. If the + * request was shortened to zero rt extents, we know that the original + * arguments weren't valid in the first place. + */ + if ((remap_flags & REMAP_FILE_CAN_SHORTEN) && new_length > 0) { + *req_len = new_length; + return 0; + } + + return (remap_flags & REMAP_FILE_DEDUP) ? -EBADE : -EINVAL; +} + /* * Prepare two files for range cloning. 
Upon a successful return both inodes * will have the iolock and mmaplock held, the page cache of the out file will @@ -1701,6 +1778,7 @@ xfs_reflink_remap_prep( struct inode *inode_out = file_inode(file_out); struct xfs_inode *dest = XFS_I(inode_out); const struct iomap_ops *dax_read_ops = NULL; + unsigned int alloc_unit = xfs_inode_alloc_unitsize(dest); int ret; /* Lock both files against IO */ @@ -1718,14 +1796,22 @@ xfs_reflink_remap_prep( if (IS_DAX(inode_in) != IS_DAX(inode_out)) goto out_unlock; - ASSERT(is_power_of_2(xfs_inode_alloc_unitsize(dest))); + /* Check non-power of two alignment issues, if necessary. */ + if (XFS_IS_REALTIME_INODE(dest) && !is_power_of_2(alloc_unit)) { + ret = xfs_reflink_remap_check_rtalign(src, pos_in, dest, + pos_out, len, remap_flags); + if (ret) + goto out_unlock; + + /* Do the VFS checks with the regular block alignment. */ + alloc_unit = src->i_mount->m_sb.sb_blocksize; + } if (IS_DAX(inode_in)) dax_read_ops = &xfs_read_iomap_ops; ret = __generic_remap_file_range_prep(file_in, pos_in, file_out, - pos_out, len, remap_flags, dax_read_ops, - xfs_inode_alloc_unitsize(dest)); + pos_out, len, remap_flags, dax_read_ops, alloc_unit); if (ret || *len == 0) goto out_unlock; diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 7c1edd5c2554..5e27cb7fce36 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -1312,7 +1312,8 @@ xfs_growfs_rt( return -EOPNOTSUPP; if (xfs_has_quota(mp)) return -EOPNOTSUPP; - if (xfs_has_reflink(mp) && !is_power_of_2(mp->m_sb.sb_rextsize)) + if (xfs_has_reflink(mp) && !is_power_of_2(mp->m_sb.sb_rextsize) && + (XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize) & ~PAGE_MASK)) return -EOPNOTSUPP; nrblocks = in->newblocks; diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c index 31c1690ed847..627fa40bbc5b 100644 --- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1662,17 +1662,17 @@ xfs_fs_fill_super( * state. 
This means that we cannot dirty all the pages * backing an rt extent without dirtying the adjoining rt * extents. If those rt extents are shared and extend into - * other pages, this leads to crazy write amplification. The - * VFS remap_range checks assume power-of-two block sizes. + * other pages, this leads to crazy write amplification. * * Hence we only support rt extent sizes that are an integer - * power of two because we know those will align with the page - * size. + * power of two or an integer multiple of the page size because + * we know those will align with the page size. */ if (xfs_has_realtime(mp) && - !is_power_of_2(mp->m_sb.sb_rextsize)) { + !is_power_of_2(mp->m_sb.sb_rextsize) && + (XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize) & ~PAGE_MASK)) { xfs_alert(mp, - "reflink not compatible with non-power-of-2 realtime extent size %u!", + "reflink not compatible with realtime extent size %u!", mp->m_sb.sb_rextsize); error = -EINVAL; goto out_filestream_unmount; ^ permalink raw reply related [flat|nested] 565+ messages in thread
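Two points from this patch are easy to check in isolation: why the mask-based alignment test used by the VFS helpers breaks for non-power-of-two units, and which rt extent sizes pass the updated mount-time check. The sketch below uses hypothetical helper names throughout. aligned_mask() and aligned_div() model the two alignment tests, and rt_reflink_extsize_ok() models the condition in xfs_fs_fill_super(), with the page size passed in as a parameter rather than taken from PAGE_MASK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_pow2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Fast alignment test; only correct when unit is a power of two. */
static bool aligned_mask(uint64_t pos, uint64_t unit)
{
	return (pos & (unit - 1)) == 0;
}

/* Slow alignment test; correct for any nonzero unit. */
static bool aligned_div(uint64_t pos, uint64_t unit)
{
	assert(unit > 0);
	return pos % unit == 0;
}

/*
 * Hypothetical model of the mount-time check: reflink is allowed if the
 * rt extent size is a power of two, or its byte size is a multiple of
 * the page size (page_size itself must be a power of two).
 */
static bool rt_reflink_extsize_ok(uint32_t rextsize_fsb, uint32_t blocksize,
				  uint32_t page_size)
{
	uint64_t rext_bytes = (uint64_t)rextsize_fsb * blocksize;

	assert(is_pow2(page_size));
	if (is_pow2(rextsize_fsb))
		return true;
	return (rext_bytes & (page_size - 1)) == 0;
}
```

Offset 57344 is exactly two 28672-byte rt extents, so aligned_div() accepts it, but aligned_mask() rejects it because 28671 is not an all-ones mask. On the mount side, a 7-block extent with 4k blocks and 4k pages passes (28672 is page aligned), while a 3-block extent with 1k blocks does not.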
* [PATCHSET v1.0 0/3] xfs: enable quota for realtime volumes
  2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
                   ` (16 preceding siblings ...)
  2022-12-30 22:18 ` [PATCHSET v1.0 0/9] xfs: reflink with large realtime extents Darrick J. Wong
@ 2022-12-30 22:18 ` Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 3/3] xfs: enable realtime quota again Darrick J. Wong
                     ` (2 more replies)
  2022-12-30 22:19 ` [PATCHSET 0/4] xfs_repair: add other v5 features to filesystems Darrick J. Wong
                   ` (21 subsequent siblings)
  39 siblings, 3 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

Hi all,

At some point, I realized that I've refactored enough of the quota code
in XFS that I should evaluate whether or not quota actually works on
realtime volumes.  It turns out that with two exceptions, it actually
does seem to work properly!  There are three broken pieces that I've
found so far: chown doesn't work, the quota accounting goes wrong when
the rt bitmap changes size, and the VFS quota ioctls don't report the
realtime warning counts or limits.  Hence this series fixes two things
in XFS and re-enables rt quota after a break of a couple decades.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-quotas

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-quotas

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-quotas
---
 fs/xfs/xfs_qm.c      |   56 +++++++++++++++++++++++++++-----------------------
 fs/xfs/xfs_rtalloc.c |   24 +++++++--------------
 fs/xfs/xfs_trans.c   |   31 ++++++++++++++++++++++++++--
 3 files changed, 67 insertions(+), 44 deletions(-)

^ permalink raw reply	[flat|nested] 565+ messages in thread
* [PATCH 3/3] xfs: enable realtime quota again
  2022-12-30 22:18 ` [PATCHSET v1.0 0/3] xfs: enable quota for realtime volumes Darrick J. Wong
@ 2022-12-30 22:18   ` Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 1/3] xfs: fix chown with rt quota Darrick J. Wong
  2022-12-30 22:18   ` [PATCH 2/3] xfs: fix rt growfs quota accounting Darrick J. Wong
  2 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Enable quotas for the realtime device.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_qm.c |   12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 7a69857c4e49..99167e3250f9 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -1493,15 +1493,9 @@ xfs_qm_mount_quotas(
 	int			error = 0;
 	uint			sbf;
 
-	/*
-	 * If quotas on realtime volumes is not supported, we disable
-	 * quotas immediately.
-	 */
-	if (mp->m_sb.sb_rextents) {
-		xfs_notice(mp, "Cannot turn on quotas for realtime filesystem");
-		mp->m_qflags = 0;
-		goto write_changes;
-	}
+	if (mp->m_sb.sb_rextents)
+		xfs_warn(mp,
+ "EXPERIMENTAL realtime quota feature in use. Use at your own risk!");
 
 	ASSERT(XFS_IS_QUOTA_ON(mp));
 

^ permalink raw reply related	[flat|nested] 565+ messages in thread
* [PATCH 1/3] xfs: fix chown with rt quota 2022-12-30 22:18 ` [PATCHSET v1.0 0/3] xfs: enable quota for realtime volumes Darrick J. Wong 2022-12-30 22:18 ` [PATCH 3/3] xfs: enable realtime quota again Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/3] xfs: fix rt growfs quota accounting Darrick J. Wong 2 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make chown's quota adjustments work with realtime files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_qm.c | 44 +++++++++++++++++++++++++++----------------- fs/xfs/xfs_trans.c | 31 +++++++++++++++++++++++++++++-- 2 files changed, 56 insertions(+), 19 deletions(-) diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c index 63085d8b5ec1..7a69857c4e49 100644 --- a/fs/xfs/xfs_qm.c +++ b/fs/xfs/xfs_qm.c @@ -1214,8 +1214,8 @@ xfs_qm_dqusage_adjust( void *data) { struct xfs_inode *ip; - xfs_qcnt_t nblks; - xfs_filblks_t rtblks = 0; /* total rt blks */ + xfs_filblks_t nblks, rtblks; + unsigned int lock_mode; int error; ASSERT(XFS_IS_QUOTA_ON(mp)); @@ -1239,17 +1239,16 @@ xfs_qm_dqusage_adjust( ASSERT(ip->i_delayed_blks == 0); + lock_mode = xfs_ilock_data_map_shared(ip); if (XFS_IS_REALTIME_INODE(ip)) { - struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); - error = xfs_iread_extents(tp, ip, XFS_DATA_FORK); - if (error) + if (error) { + xfs_iunlock(ip, lock_mode); goto error0; - - xfs_bmap_count_leaves(ifp, &rtblks); + } } - - nblks = (xfs_qcnt_t)ip->i_nblocks - rtblks; + xfs_inode_count_blocks(tp, ip, &nblks, &rtblks); + xfs_iunlock(ip, lock_mode); /* * Add the (disk blocks and inode) resources occupied by this @@ -1870,9 +1869,8 @@ xfs_qm_vop_chown( struct xfs_dquot *newdq) { struct xfs_dquot *prevdq; - uint bfield = XFS_IS_REALTIME_INODE(ip) ? 
- XFS_TRANS_DQ_RTBCOUNT : XFS_TRANS_DQ_BCOUNT; - + xfs_filblks_t dblocks, rblocks; + bool isrt = XFS_IS_REALTIME_INODE(ip); ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); ASSERT(XFS_IS_QUOTA_ON(ip->i_mount)); @@ -1882,11 +1880,17 @@ xfs_qm_vop_chown( ASSERT(prevdq); ASSERT(prevdq != newdq); - xfs_trans_mod_ino_dquot(tp, ip, prevdq, bfield, -(ip->i_nblocks)); + xfs_inode_count_blocks(tp, ip, &dblocks, &rblocks); + + xfs_trans_mod_ino_dquot(tp, ip, prevdq, XFS_TRANS_DQ_BCOUNT, + -(xfs_qcnt_t)dblocks); + xfs_trans_mod_ino_dquot(tp, ip, prevdq, XFS_TRANS_DQ_RTBCOUNT, + -(xfs_qcnt_t)rblocks); xfs_trans_mod_ino_dquot(tp, ip, prevdq, XFS_TRANS_DQ_ICOUNT, -1); /* the sparkling new dquot */ - xfs_trans_mod_ino_dquot(tp, ip, newdq, bfield, ip->i_nblocks); + xfs_trans_mod_ino_dquot(tp, ip, newdq, XFS_TRANS_DQ_BCOUNT, dblocks); + xfs_trans_mod_ino_dquot(tp, ip, newdq, XFS_TRANS_DQ_RTBCOUNT, rblocks); xfs_trans_mod_ino_dquot(tp, ip, newdq, XFS_TRANS_DQ_ICOUNT, 1); /* @@ -1896,7 +1900,8 @@ xfs_qm_vop_chown( * (having already bumped up the real counter) so that we don't have * any reservation to give back when we commit. */ - xfs_trans_mod_dquot(tp, newdq, XFS_TRANS_DQ_RES_BLKS, + xfs_trans_mod_dquot(tp, newdq, + isrt ? XFS_TRANS_DQ_RES_RTBLKS : XFS_TRANS_DQ_RES_BLKS, -ip->i_delayed_blks); /* @@ -1908,8 +1913,13 @@ xfs_qm_vop_chown( */ tp->t_flags |= XFS_TRANS_DIRTY; xfs_dqlock(prevdq); - ASSERT(prevdq->q_blk.reserved >= ip->i_delayed_blks); - prevdq->q_blk.reserved -= ip->i_delayed_blks; + if (isrt) { + ASSERT(prevdq->q_rtb.reserved >= ip->i_delayed_blks); + prevdq->q_rtb.reserved -= ip->i_delayed_blks; + } else { + ASSERT(prevdq->q_blk.reserved >= ip->i_delayed_blks); + prevdq->q_blk.reserved -= ip->i_delayed_blks; + } xfs_dqunlock(prevdq); /* diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c index 05e93af190df..fd389a8582fd 100644 --- a/fs/xfs/xfs_trans.c +++ b/fs/xfs/xfs_trans.c @@ -1446,11 +1446,26 @@ xfs_trans_alloc_ichange( gdqp = (new_gdqp != ip->i_gdquot) ? 
new_gdqp : NULL; pdqp = (new_pdqp != ip->i_pdquot) ? new_pdqp : NULL; if (udqp || gdqp || pdqp) { + xfs_filblks_t dblocks, rblocks; unsigned int qflags = XFS_QMOPT_RES_REGBLKS; + bool isrt = XFS_IS_REALTIME_INODE(ip); if (force) qflags |= XFS_QMOPT_FORCE_RES; + if (isrt) { + error = xfs_iread_extents(tp, ip, XFS_DATA_FORK); + if (error) + goto out_cancel; + } + + xfs_inode_count_blocks(tp, ip, &dblocks, &rblocks); + + if (isrt) + rblocks += ip->i_delayed_blks; + else + dblocks += ip->i_delayed_blks; + /* * Reserve enough quota to handle blocks on disk and reserved * for a delayed allocation. We'll actually transfer the @@ -1458,8 +1473,20 @@ xfs_trans_alloc_ichange( * though that part is only semi-transactional. */ error = xfs_trans_reserve_quota_bydquots(tp, mp, udqp, gdqp, - pdqp, ip->i_nblocks + ip->i_delayed_blks, - 1, qflags); + pdqp, dblocks, 1, qflags); + if ((error == -EDQUOT || error == -ENOSPC) && !retried) { + xfs_trans_cancel(tp); + xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0); + retried = true; + goto retry; + } + if (error) + goto out_cancel; + + /* Do the same for realtime. */ + qflags = XFS_QMOPT_RES_RTBLKS | (qflags & XFS_QMOPT_FORCE_RES); + error = xfs_trans_reserve_quota_bydquots(tp, mp, udqp, gdqp, + pdqp, rblocks, 0, qflags); if ((error == -EDQUOT || error == -ENOSPC) && !retried) { xfs_trans_cancel(tp); xfs_blockgc_free_dquots(mp, udqp, gdqp, pdqp, 0); ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 2/3] xfs: fix rt growfs quota accounting 2022-12-30 22:18 ` [PATCHSET v1.0 0/3] xfs: enable quota for realtime voluems Darrick J. Wong 2022-12-30 22:18 ` [PATCH 3/3] xfs: enable realtime quota again Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/3] xfs: fix chown with rt quota Darrick J. Wong @ 2022-12-30 22:18 ` Darrick J. Wong 2 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:18 UTC (permalink / raw) To: djwong; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When growing the realtime bitmap or summary inodes, use xfs_trans_alloc_inode to reserve quota for the blocks that could be allocated to the file. Although we never enforce limits against the root dquot, making a reservation means that the bmap code will update the quota block count, which is necessary for correct accounting. Found by running xfs/521. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- fs/xfs/xfs_rtalloc.c | 24 ++++++++---------------- 1 file changed, 8 insertions(+), 16 deletions(-) diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c index 5e27cb7fce36..4165899cdc96 100644 --- a/fs/xfs/xfs_rtalloc.c +++ b/fs/xfs/xfs_rtalloc.c @@ -870,15 +870,10 @@ xfs_growfs_rt_alloc( /* * Reserve space & log for one extent added to the file. */ - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growrtalloc, resblks, - 0, 0, &tp); + error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_growrtalloc, + resblks, 0, false, &tp); if (error) return error; - /* - * Lock the inode. - */ - xfs_ilock(ip, XFS_ILOCK_EXCL); - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); error = xfs_iext_count_may_overflow(ip, XFS_DATA_FORK, XFS_IEXT_ADD_NOSPLIT_CNT); @@ -902,6 +897,7 @@ xfs_growfs_rt_alloc( * Free any blocks freed up in the transaction, then commit. */ error = xfs_trans_commit(tp); + xfs_iunlock(ip, XFS_ILOCK_EXCL); if (error) return error; /* @@ -914,15 +910,11 @@ xfs_growfs_rt_alloc( /* * Reserve log for one block zeroing. 
*/ - error = xfs_trans_alloc(mp, &M_RES(mp)->tr_growrtzero, - 0, 0, 0, &tp); + error = xfs_trans_alloc_inode(ip, + &M_RES(mp)->tr_growrtzero, 0, 0, false, + &tp); if (error) return error; - /* - * Lock the bitmap inode. - */ - xfs_ilock(ip, XFS_ILOCK_EXCL); - xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); error = xfs_growfs_init_rtbuf(tp, ip, fsbno, buf_type); if (error) @@ -932,6 +924,7 @@ xfs_growfs_rt_alloc( * Commit the transaction. */ error = xfs_trans_commit(tp); + xfs_iunlock(ip, XFS_ILOCK_EXCL); if (error) return error; } @@ -945,6 +938,7 @@ xfs_growfs_rt_alloc( out_trans_cancel: xfs_trans_cancel(tp); + xfs_iunlock(ip, XFS_ILOCK_EXCL); return error; } @@ -1310,8 +1304,6 @@ xfs_growfs_rt( /* Unsupported realtime features. */ if (!xfs_has_rtgroups(mp) && (xfs_has_rmapbt(mp) || xfs_has_reflink(mp))) return -EOPNOTSUPP; - if (xfs_has_quota(mp)) - return -EOPNOTSUPP; if (xfs_has_reflink(mp) && !is_power_of_2(mp->m_sb.sb_rextsize) && (XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize) & ~PAGE_MASK)) return -EOPNOTSUPP; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCHSET 0/4] xfs_repair: add other v5 features to filesystems 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (17 preceding siblings ...) 2022-12-30 22:18 ` [PATCHSET v1.0 0/3] xfs: enable quota for realtime voluems Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/4] xfs_repair: check free space requirements before allowing upgrades Darrick J. Wong ` (3 more replies) 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (20 subsequent siblings) 39 siblings, 4 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: Dave Chinner, Chandan Babu R, linux-xfs Hi all, This series enables xfs_repair to add select features to existing V5 filesystems. Specifically, one can add free inode btrees, reflink support, and reverse mapping. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. --D xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=upgrade-older-features fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=upgrade-older-features --- include/libxfs.h | 1 man/man8/xfs_admin.8 | 21 +++++ repair/globals.c | 3 + repair/globals.h | 3 + repair/phase2.c | 229 ++++++++++++++++++++++++++++++++++++++++++++++++++ repair/rmap.c | 8 +- repair/xfs_repair.c | 33 +++++++ 7 files changed, 294 insertions(+), 4 deletions(-) ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 1/4] xfs_repair: check free space requirements before allowing upgrades 2022-12-30 22:19 ` [PATCHSET 0/4] xfs_repair: add other v5 features to filesystems Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 4/4] xfs_repair: allow sysadmins to add reverse mapping indexes Darrick J. Wong ` (2 subsequent siblings) 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: Chandan Babu R, Dave Chinner, linux-xfs From: Darrick J. Wong <djwong@kernel.org> Currently, the V5 feature upgrades permitted by xfs_repair do not affect filesystem space usage, so we haven't needed to verify the geometry. However, this will change once we start to allow the sysadmin to add new metadata indexes to existing filesystems. Add all the infrastructure we need to ensure that there's enough space for metadata space reservations and per-AG reservations the next time the filesystem will be mounted. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> Signed-off-by: Chandan Babu R <chandan.babu@oracle.com> [david: Recompute transaction reservation values; Exit with error if upgrade fails] Signed-off-by: Dave Chinner <david@fromorbit.com> [djwong: Refuse to upgrade if any part of the fs has < 10% free] --- include/libxfs.h | 1 repair/phase2.c | 134 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 135 insertions(+) diff --git a/include/libxfs.h b/include/libxfs.h index d4b5d8e564d..14f6d629c9f 100644 --- a/include/libxfs.h +++ b/include/libxfs.h @@ -80,6 +80,7 @@ struct iomap; #include "xfs_refcount.h" #include "xfs_btree_staging.h" #include "xfs_symlink_remote.h" +#include "xfs_ag_resv.h" #ifndef ARRAY_SIZE #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) diff --git a/repair/phase2.c b/repair/phase2.c index 2ada95aefd1..cdfc98bf39f 100644 --- a/repair/phase2.c +++ b/repair/phase2.c @@ -242,6 +242,137 @@ install_new_state( libxfs_trans_init(mp); } +#define GIGABYTES(count, blog) ((uint64_t)(count) << (30 - (blog))) +static inline bool +check_free_space( + struct xfs_mount *mp, + unsigned long long avail, + unsigned long long total) +{ + /* Ok if there's more than 10% free. */ + if (avail >= total / 10) + return true; + + /* Not ok if there's less than 5% free. */ + if (avail < total / 20) + return false; + + /* Let it slide if there's at least 10GB free. */ + return avail > GIGABYTES(10, mp->m_sb.sb_blocklog); +} + +static void +check_fs_free_space( + struct xfs_mount *mp, + const struct check_state *old, + struct xfs_sb *new_sb) +{ + struct xfs_perag *pag; + xfs_agnumber_t agno; + int error; + + /* Make sure we have enough space for per-AG reservations. */ + for_each_perag(mp, agno, pag) { + struct xfs_trans *tp; + struct xfs_agf *agf; + struct xfs_buf *agi_bp, *agf_bp; + unsigned int avail, agblocks; + + /* Put back the old super so that we can read AG headers.
*/ + restore_old_state(mp, old); + + /* + * Create a dummy transaction so that we can load the AGI and + * AGF buffers in memory with the old fs geometry and pin them + * there while we try to make a per-AG reservation with the new + * geometry. + */ + error = -libxfs_trans_alloc_empty(mp, &tp); + if (error) + do_error( + _("Cannot reserve resources for upgrade check, err=%d.\n"), + error); + + error = -libxfs_ialloc_read_agi(pag, tp, &agi_bp); + if (error) + do_error( + _("Cannot read AGI %u for upgrade check, err=%d.\n"), + pag->pag_agno, error); + + error = -libxfs_alloc_read_agf(pag, tp, 0, &agf_bp); + if (error) + do_error( + _("Cannot read AGF %u for upgrade check, err=%d.\n"), + pag->pag_agno, error); + agf = agf_bp->b_addr; + agblocks = be32_to_cpu(agf->agf_length); + + /* + * Install the new superblock and try to make a per-AG space + * reservation with the new geometry. We pinned the AG header + * buffers to the transaction, so we shouldn't hit any + * corruption errors on account of the new geometry. + */ + install_new_state(mp, new_sb); + + error = -libxfs_ag_resv_init(pag, tp); + if (error == ENOSPC) { + printf( + _("Not enough free space would remain in AG %u for metadata.\n"), + pag->pag_agno); + exit(1); + } + if (error) + do_error( + _("Error %d while checking AG %u space reservation.\n"), + error, pag->pag_agno); + + /* + * Would the post-upgrade filesystem have enough free space in + * this AG after making per-AG reservations? + */ + avail = pag->pagf_freeblks + pag->pagf_flcount; + avail -= pag->pag_meta_resv.ar_reserved; + avail -= pag->pag_rmapbt_resv.ar_asked; + + if (!check_free_space(mp, avail, agblocks)) { + printf( + _("AG %u will be low on space after upgrade.\n"), + pag->pag_agno); + exit(1); + } + libxfs_trans_cancel(tp); + } + + /* + * Would the post-upgrade filesystem have enough free space on the data + * device after making per-AG reservations? 
+ */ + if (!check_free_space(mp, mp->m_sb.sb_fdblocks, mp->m_sb.sb_dblocks)) { + printf(_("Filesystem will be low on space after upgrade.\n")); + exit(1); + } + + /* + * Release the per-AG reservations and mark the per-AG structure as + * uninitialized so that we don't trip over stale cached counters + * after the upgrade. + */ + for_each_perag(mp, agno, pag) { + libxfs_ag_resv_free(pag); + pag->pagf_init = 0; + pag->pagi_init = 0; + } +} + +static bool +need_check_fs_free_space( + struct xfs_mount *mp, + const struct check_state *old) +{ + return false; +} + /* * Make sure we can actually upgrade this (v5) filesystem without running afoul * of root inode or log size requirements that would prevent us from mounting @@ -284,6 +415,9 @@ install_new_geometry( exit(1); } + if (need_check_fs_free_space(mp, &old)) + check_fs_free_space(mp, &old, new_sb); + /* * Restore the old state to get everything back to a clean state, * upgrade the featureset one more time, and recompute the btree max ^ permalink raw reply related [flat|nested] 565+ messages in thread
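The three-tier free space policy described in the comments of check_free_space can be tried outside xfs_repair with a small standalone sketch. It takes the block size log as a plain parameter instead of a struct xfs_mount, and writes the 5% floor as total / 20, following the comment text. The function names are hypothetical and this is an illustration of the policy, not the xfs_repair implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert a gigabyte count to filesystem blocks of 2^blocklog bytes each. */
static uint64_t
gigabytes_to_blocks(uint64_t count, unsigned int blocklog)
{
	assert(blocklog <= 30);
	return count << (30 - blocklog);
}

/* All quantities are in filesystem blocks. */
static bool
enough_free_space(unsigned int blocklog, uint64_t avail, uint64_t total)
{
	/* Ok if at least 10% is free. */
	if (avail >= total / 10)
		return true;

	/* Not ok if less than 5% is free. */
	if (avail < total / 20)
		return false;

	/* Between 5% and 10%: accept only if more than 10GB is free. */
	return avail > gigabytes_to_blocks(10, blocklog);
}
```

With 4096-byte blocks (blocklog 12), 10GB is 2621440 blocks, so the middle tier only matters on large filesystems; a small filesystem with 7% free is still refused.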
* [PATCH 4/4] xfs_repair: allow sysadmins to add reverse mapping indexes 2022-12-30 22:19 ` [PATCHSET 0/4] xfs_repair: add other v5 features to filesystems Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/4] xfs_repair: check free space requirements before allowing upgrades Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 2/4] xfs_repair: allow sysadmins to add free inode btree indexes Darrick J. Wong 2022-12-30 22:19 ` [PATCH 3/4] xfs_repair: allow sysadmins to add reflink Darrick J. Wong 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Allow the sysadmin to use xfs_repair to upgrade an existing filesystem to support the reverse mapping btree index. This is needed for online fsck. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- man/man8/xfs_admin.8 | 8 ++++++++ repair/globals.c | 1 + repair/globals.h | 1 + repair/phase2.c | 38 ++++++++++++++++++++++++++++++++++++++ repair/rmap.c | 4 ++-- repair/xfs_repair.c | 11 +++++++++++ 6 files changed, 61 insertions(+), 2 deletions(-) diff --git a/man/man8/xfs_admin.8 b/man/man8/xfs_admin.8 index 3af201cadc3..467fb2dfd0a 100644 --- a/man/man8/xfs_admin.8 +++ b/man/man8/xfs_admin.8 @@ -169,6 +169,14 @@ Enable sharing of file data blocks. This upgrade can fail if any AG has less than 2% free space remaining. The filesystem cannot be downgraded after this feature is enabled. This feature was added to Linux 4.9. +.TP 0.4i +.B rmapbt +Store an index of the owners of on-disk blocks. +This enables much stronger cross-referencing of various metadata structures +and online repairs to space usage metadata. +The filesystem cannot be downgraded after this feature is enabled. +This upgrade can fail if any AG has less than 5% free space remaining. +This feature was added to Linux 4.8. 
.RE .TP .BI \-U " uuid" diff --git a/repair/globals.c b/repair/globals.c index f5b5269c5d9..ec11bc67139 100644 --- a/repair/globals.c +++ b/repair/globals.c @@ -54,6 +54,7 @@ bool add_bigtime; /* add support for timestamps up to 2486 */ bool add_nrext64; bool add_finobt; /* add free inode btrees */ bool add_reflink; /* add reference count btrees */ +bool add_rmapbt; /* add reverse mapping btrees */ /* misc status variables */ diff --git a/repair/globals.h b/repair/globals.h index d42c4d3c6a5..d5a04a75d41 100644 --- a/repair/globals.h +++ b/repair/globals.h @@ -95,6 +95,7 @@ extern bool add_bigtime; /* add support for timestamps up to 2486 */ extern bool add_nrext64; extern bool add_finobt; /* add free inode btrees */ extern bool add_reflink; /* add reference count btrees */ +extern bool add_rmapbt; /* add reverse mapping btrees */ /* misc status variables */ diff --git a/repair/phase2.c b/repair/phase2.c index d968b4fd558..05964b3d23c 100644 --- a/repair/phase2.c +++ b/repair/phase2.c @@ -252,6 +252,40 @@ set_reflink( return true; } +static bool +set_rmapbt( + struct xfs_mount *mp, + struct xfs_sb *new_sb) +{ + if (!xfs_has_crc(mp)) { + printf( + _("Reverse mapping btree feature only supported on V5 filesystems.\n")); + exit(0); + } + + if (xfs_has_realtime(mp)) { + printf( + _("Reverse mapping btree feature not supported with realtime.\n")); + exit(0); + } + + if (xfs_has_reflink(mp)) { + printf( + _("Reverse mapping btrees cannot be added when reflink is enabled.\n")); + exit(0); + } + + if (xfs_has_rmapbt(mp)) { + printf(_("Filesystem already supports reverse mapping btrees.\n")); + exit(0); + } + + printf(_("Adding reverse mapping btrees to filesystem.\n")); + new_sb->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_RMAPBT; + new_sb->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR; + return true; +} + struct check_state { struct xfs_sb sb; uint64_t features; @@ -423,6 +457,8 @@ need_check_fs_free_space( return true; if (xfs_has_reflink(mp) && 
!(old->features & XFS_FEAT_REFLINK)) return true; + if (xfs_has_rmapbt(mp) && !(old->features & XFS_FEAT_RMAPBT)) + return true; return false; } @@ -502,6 +538,8 @@ upgrade_filesystem( dirty |= set_finobt(mp, &new_sb); if (add_reflink) dirty |= set_reflink(mp, &new_sb); + if (add_rmapbt) + dirty |= set_rmapbt(mp, &new_sb); if (!dirty) return; diff --git a/repair/rmap.c b/repair/rmap.c index e49cd2364ec..00381c6e69d 100644 --- a/repair/rmap.c +++ b/repair/rmap.c @@ -52,7 +52,7 @@ rmap_needs_work( struct xfs_mount *mp) { return xfs_has_reflink(mp) || add_reflink || - xfs_has_rmapbt(mp); + xfs_has_rmapbt(mp) || add_rmapbt; } /* Destroy an in-memory rmap btree. */ @@ -1159,7 +1159,7 @@ rmaps_verify_btree( int have; int error; - if (!xfs_has_rmapbt(mp)) + if (!xfs_has_rmapbt(mp) || add_rmapbt) return; if (rmapbt_suspect) { if (no_modify && agno == 0) diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c index 90da0d0e956..c461bc8eb07 100644 --- a/repair/xfs_repair.c +++ b/repair/xfs_repair.c @@ -71,6 +71,7 @@ enum c_opt_nums { CONVERT_NREXT64, CONVERT_FINOBT, CONVERT_REFLINK, + CONVERT_RMAPBT, C_MAX_OPTS, }; @@ -81,6 +82,7 @@ static char *c_opts[] = { [CONVERT_NREXT64] = "nrext64", [CONVERT_FINOBT] = "finobt", [CONVERT_REFLINK] = "reflink", + [CONVERT_RMAPBT] = "rmapbt", [C_MAX_OPTS] = NULL, }; @@ -358,6 +360,15 @@ process_args(int argc, char **argv) _("-c reflink only supports upgrades\n")); add_reflink = true; break; + case CONVERT_RMAPBT: + if (!val) + do_abort( + _("-c rmapbt requires a parameter\n")); + if (strtol(val, NULL, 0) != 1) + do_abort( + _("-c rmapbt only supports upgrades\n")); + add_rmapbt = true; + break; default: unknown('c', val); break; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 2/4] xfs_repair: allow sysadmins to add free inode btree indexes 2022-12-30 22:19 ` [PATCHSET 0/4] xfs_repair: add other v5 features to filesystems Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/4] xfs_repair: check free space requirements before allowing upgrades Darrick J. Wong 2022-12-30 22:19 ` [PATCH 4/4] xfs_repair: allow sysadmins to add reverse mapping indexes Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 3/4] xfs_repair: allow sysadmins to add reflink Darrick J. Wong 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Allow the sysadmin to use xfs_repair to upgrade an existing filesystem to support the free inode btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- man/man8/xfs_admin.8 | 7 +++++++ repair/globals.c | 1 + repair/globals.h | 1 + repair/phase2.c | 26 ++++++++++++++++++++++++++ repair/xfs_repair.c | 11 +++++++++++ 5 files changed, 46 insertions(+) diff --git a/man/man8/xfs_admin.8 b/man/man8/xfs_admin.8 index 4794d6774ed..efe2ce45fc2 100644 --- a/man/man8/xfs_admin.8 +++ b/man/man8/xfs_admin.8 @@ -156,6 +156,13 @@ data fork extent count will be 2^48 - 1, while the maximum attribute fork extent count will be 2^32 - 1. The filesystem cannot be downgraded after this feature is enabled. Once enabled, the filesystem will not be mountable by older kernels. This feature was added to Linux 5.19. +.TP 0.4i +.B finobt +Track free inodes through a separate free inode btree index to speed up inode +allocation on old filesystems. +This upgrade can fail if any AG has less than 1% free space remaining. +The filesystem cannot be downgraded after this feature is enabled. +This feature was added to Linux 3.16. 
.RE .TP .BI \-U " uuid" diff --git a/repair/globals.c b/repair/globals.c index c40849853b8..9640877b703 100644 --- a/repair/globals.c +++ b/repair/globals.c @@ -52,6 +52,7 @@ bool features_changed; /* did we change superblock feature bits? */ bool add_inobtcount; /* add inode btree counts to AGI */ bool add_bigtime; /* add support for timestamps up to 2486 */ bool add_nrext64; +bool add_finobt; /* add free inode btrees */ /* misc status variables */ diff --git a/repair/globals.h b/repair/globals.h index b65e4a2d09c..d7539294b5f 100644 --- a/repair/globals.h +++ b/repair/globals.h @@ -93,6 +93,7 @@ extern bool features_changed; /* did we change superblock feature bits? */ extern bool add_inobtcount; /* add inode btree counts to AGI */ extern bool add_bigtime; /* add support for timestamps up to 2486 */ extern bool add_nrext64; +extern bool add_finobt; /* add free inode btrees */ /* misc status variables */ diff --git a/repair/phase2.c b/repair/phase2.c index cdfc98bf39f..29bc0e34363 100644 --- a/repair/phase2.c +++ b/repair/phase2.c @@ -203,6 +203,28 @@ set_nrext64( return true; } +static bool +set_finobt( + struct xfs_mount *mp, + struct xfs_sb *new_sb) +{ + if (!xfs_has_crc(mp)) { + printf( + _("Free inode btree feature only supported on V5 filesystems.\n")); + exit(0); + } + + if (xfs_has_finobt(mp)) { + printf(_("Filesystem already supports free inode btrees.\n")); + exit(0); + } + + printf(_("Adding free inode btrees to filesystem.\n")); + new_sb->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_FINOBT; + new_sb->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR; + return true; +} + struct check_state { struct xfs_sb sb; uint64_t features; @@ -370,6 +392,8 @@ need_check_fs_free_space( struct xfs_mount *mp, const struct check_state *old) { + if (xfs_has_finobt(mp) && !(old->features & XFS_FEAT_FINOBT)) + return true; return false; } @@ -445,6 +469,8 @@ upgrade_filesystem( dirty |= set_bigtime(mp, &new_sb); if (add_nrext64) dirty |= set_nrext64(mp, &new_sb); 
+ if (add_finobt) + dirty |= set_finobt(mp, &new_sb); if (!dirty) return; diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c index 8e62533ac53..45130ab7559 100644 --- a/repair/xfs_repair.c +++ b/repair/xfs_repair.c @@ -69,6 +69,7 @@ enum c_opt_nums { CONVERT_INOBTCOUNT, CONVERT_BIGTIME, CONVERT_NREXT64, + CONVERT_FINOBT, C_MAX_OPTS, }; @@ -77,6 +78,7 @@ static char *c_opts[] = { [CONVERT_INOBTCOUNT] = "inobtcount", [CONVERT_BIGTIME] = "bigtime", [CONVERT_NREXT64] = "nrext64", + [CONVERT_FINOBT] = "finobt", [C_MAX_OPTS] = NULL, }; @@ -336,6 +338,15 @@ process_args(int argc, char **argv) _("-c nrext64 only supports upgrades\n")); add_nrext64 = true; break; + case CONVERT_FINOBT: + if (!val) + do_abort( + _("-c finobt requires a parameter\n")); + if (strtol(val, NULL, 0) != 1) + do_abort( + _("-c finobt only supports upgrades\n")); + add_finobt = true; + break; default: unknown('c', val); break; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 3/4] xfs_repair: allow sysadmins to add reflink 2022-12-30 22:19 ` [PATCHSET 0/4] xfs_repair: add other v5 features to filesystems Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 2/4] xfs_repair: allow sysadmins to add free inode btree indexes Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Allow the sysadmin to use xfs_repair to upgrade an existing filesystem to support the reference count btree, and therefore reflink. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- man/man8/xfs_admin.8 | 6 ++++++ repair/globals.c | 1 + repair/globals.h | 1 + repair/phase2.c | 31 +++++++++++++++++++++++++++++++ repair/rmap.c | 4 ++-- repair/xfs_repair.c | 11 +++++++++++ 6 files changed, 52 insertions(+), 2 deletions(-) diff --git a/man/man8/xfs_admin.8 b/man/man8/xfs_admin.8 index efe2ce45fc2..3af201cadc3 100644 --- a/man/man8/xfs_admin.8 +++ b/man/man8/xfs_admin.8 @@ -163,6 +163,12 @@ allocation on old filesystems. This upgrade can fail if any AG has less than 1% free space remaining. The filesystem cannot be downgraded after this feature is enabled. This feature was added to Linux 3.16. +.TP 0.4i +.B reflink +Enable sharing of file data blocks. +This upgrade can fail if any AG has less than 2% free space remaining. +The filesystem cannot be downgraded after this feature is enabled. +This feature was added to Linux 4.9. 
.RE .TP .BI \-U " uuid" diff --git a/repair/globals.c b/repair/globals.c index 9640877b703..f5b5269c5d9 100644 --- a/repair/globals.c +++ b/repair/globals.c @@ -53,6 +53,7 @@ bool add_inobtcount; /* add inode btree counts to AGI */ bool add_bigtime; /* add support for timestamps up to 2486 */ bool add_nrext64; bool add_finobt; /* add free inode btrees */ +bool add_reflink; /* add reference count btrees */ /* misc status variables */ diff --git a/repair/globals.h b/repair/globals.h index d7539294b5f..d42c4d3c6a5 100644 --- a/repair/globals.h +++ b/repair/globals.h @@ -94,6 +94,7 @@ extern bool add_inobtcount; /* add inode btree counts to AGI */ extern bool add_bigtime; /* add support for timestamps up to 2486 */ extern bool add_nrext64; extern bool add_finobt; /* add free inode btrees */ +extern bool add_reflink; /* add reference count btrees */ /* misc status variables */ diff --git a/repair/phase2.c b/repair/phase2.c index 29bc0e34363..d968b4fd558 100644 --- a/repair/phase2.c +++ b/repair/phase2.c @@ -225,6 +225,33 @@ set_finobt( return true; } +static bool +set_reflink( + struct xfs_mount *mp, + struct xfs_sb *new_sb) +{ + if (!xfs_has_crc(mp)) { + printf( + _("Reflink feature only supported on V5 filesystems.\n")); + exit(0); + } + + if (xfs_has_reflink(mp)) { + printf(_("Filesystem already supports reflink.\n")); + exit(0); + } + + if (xfs_has_realtime(mp)) { + printf(_("Reflink feature not supported with realtime.\n")); + exit(0); + } + + printf(_("Adding reflink support to filesystem.\n")); + new_sb->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_REFLINK; + new_sb->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR; + return true; +} + struct check_state { struct xfs_sb sb; uint64_t features; @@ -394,6 +421,8 @@ need_check_fs_free_space( { if (xfs_has_finobt(mp) && !(old->features & XFS_FEAT_FINOBT)) return true; + if (xfs_has_reflink(mp) && !(old->features & XFS_FEAT_REFLINK)) + return true; return false; } @@ -471,6 +500,8 @@ upgrade_filesystem( dirty 
|= set_nrext64(mp, &new_sb); if (add_finobt) dirty |= set_finobt(mp, &new_sb); + if (add_reflink) + dirty |= set_reflink(mp, &new_sb); if (!dirty) return; diff --git a/repair/rmap.c b/repair/rmap.c index f8294cc3e13..e49cd2364ec 100644 --- a/repair/rmap.c +++ b/repair/rmap.c @@ -51,7 +51,7 @@ bool rmap_needs_work( struct xfs_mount *mp) { - return xfs_has_reflink(mp) || + return xfs_has_reflink(mp) || add_reflink || xfs_has_rmapbt(mp); } @@ -1529,7 +1529,7 @@ check_refcounts( int i; int error; - if (!xfs_has_reflink(mp)) + if (!xfs_has_reflink(mp) || add_reflink) return; if (refcbt_suspect) { if (no_modify && agno == 0) diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c index 45130ab7559..90da0d0e956 100644 --- a/repair/xfs_repair.c +++ b/repair/xfs_repair.c @@ -70,6 +70,7 @@ enum c_opt_nums { CONVERT_BIGTIME, CONVERT_NREXT64, CONVERT_FINOBT, + CONVERT_REFLINK, C_MAX_OPTS, }; @@ -79,6 +80,7 @@ static char *c_opts[] = { [CONVERT_BIGTIME] = "bigtime", [CONVERT_NREXT64] = "nrext64", [CONVERT_FINOBT] = "finobt", + [CONVERT_REFLINK] = "reflink", [C_MAX_OPTS] = NULL, }; @@ -347,6 +349,15 @@ process_args(int argc, char **argv) _("-c finobt only supports upgrades\n")); add_finobt = true; break; + case CONVERT_REFLINK: + if (!val) + do_abort( + _("-c reflink requires a parameter\n")); + if (strtol(val, NULL, 0) != 1) + do_abort( + _("-c reflink only supports upgrades\n")); + add_reflink = true; + break; default: unknown('c', val); break; ^ permalink raw reply related [flat|nested] 565+ messages in thread
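Each of the "-c feature=1" cases added in these patches applies the same two checks: a value must be present, and it must parse to 1 because only upgrades are supported. The sketch below shows that validation in isolation. The surrounding option parser is not part of the diffs, so the name=value splitting here is hand-rolled as an assumption, and parse_upgrade_opt, the enum and the negative return codes are hypothetical names for illustration.

```c
#include <stdlib.h>
#include <string.h>

enum upgrade_opt {
	OPT_FINOBT,
	OPT_REFLINK,
	OPT_RMAPBT,
	OPT_MAX,
};

static const char *const upgrade_opt_names[OPT_MAX] = {
	[OPT_FINOBT]	= "finobt",
	[OPT_REFLINK]	= "reflink",
	[OPT_RMAPBT]	= "rmapbt",
};

/*
 * Parse one "name=value" suboption.  Returns the option index on success,
 * -1 for an unknown name, -2 if the value is missing, and -3 if the value
 * is anything other than 1 (only upgrades are supported).
 */
static int
parse_upgrade_opt(const char *arg)
{
	const char	*eq = strchr(arg, '=');
	size_t		namelen = eq ? (size_t)(eq - arg) : strlen(arg);
	int		i;

	for (i = 0; i < OPT_MAX; i++) {
		if (strlen(upgrade_opt_names[i]) == namelen &&
		    strncmp(arg, upgrade_opt_names[i], namelen) == 0)
			break;
	}
	if (i == OPT_MAX)
		return -1;
	if (!eq)
		return -2;
	if (strtol(eq + 1, NULL, 0) != 1)
		return -3;
	return i;
}
```

A caller would map the returned index to the matching add_* flag, the way the switch statement in process_args does.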
* [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (18 preceding siblings ...) 2022-12-30 22:19 ` [PATCHSET 0/4] xfs_repair: add other v5 features to filesystems Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 03/26] xfs: hoist project id get/set functions " Darrick J. Wong ` (25 more replies) 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (19 subsequent siblings) 39 siblings, 26 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs Hi all, Port the kernel's inode refactoring into libxfs, then fix up xfs_repair and mkfs to use library functions instead of open-coding inode (re)creation. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. 
--D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=inode-refactor xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=inode-refactor --- db/sb.c | 4 include/libxfs.h | 1 include/xfs_inode.h | 85 ++++-- include/xfs_mount.h | 1 include/xfs_trace.h | 6 libxfs/Makefile | 5 libxfs/inode.c | 279 ++++++++++++++++++ libxfs/iunlink.c | 126 ++++++++ libxfs/iunlink.h | 22 + libxfs/libxfs_api_defs.h | 7 libxfs/libxfs_priv.h | 36 ++ libxfs/rdwr.c | 87 ------ libxfs/util.c | 282 ------------------ libxfs/xfs_bmap.c | 42 +++ libxfs/xfs_bmap.h | 3 libxfs/xfs_dir2.c | 480 +++++++++++++++++++++++++++++++ libxfs/xfs_dir2.h | 19 + libxfs/xfs_format.h | 9 - libxfs/xfs_ialloc.c | 20 + libxfs/xfs_inode_util.c | 695 +++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_inode_util.h | 79 +++++ libxfs/xfs_shared.h | 7 libxfs/xfs_trans_inode.c | 2 mdrestore/xfs_mdrestore.c | 6 mkfs/proto.c | 94 +++++- repair/agheader.c | 12 - repair/phase6.c | 199 ++++--------- 27 files changed, 2037 insertions(+), 571 deletions(-) create mode 100644 libxfs/inode.c create mode 100644 libxfs/iunlink.c create mode 100644 libxfs/iunlink.h create mode 100644 libxfs/xfs_inode_util.c create mode 100644 libxfs/xfs_inode_util.h ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 03/26] xfs: hoist project id get/set functions to libxfs 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 01/26] xfs: hoist extent size helpers " Darrick J. Wong ` (24 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Move the project id get and set functions into libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 2 ++ libxfs/xfs_inode_util.c | 11 +++++++++++ libxfs/xfs_inode_util.h | 2 ++ 3 files changed, 15 insertions(+) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index d871963966c..01ad6e54624 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -135,6 +135,7 @@ #define xfs_free_extent libxfs_free_extent #define xfs_free_perag libxfs_free_perag #define xfs_fs_geometry libxfs_fs_geometry +#define xfs_get_projid libxfs_get_projid #define xfs_highbit32 libxfs_highbit32 #define xfs_highbit64 libxfs_highbit64 #define xfs_ialloc_calc_rootino libxfs_ialloc_calc_rootino @@ -207,6 +208,7 @@ #define xfs_sb_read_secondary libxfs_sb_read_secondary #define xfs_sb_to_disk libxfs_sb_to_disk #define xfs_sb_version_to_features libxfs_sb_version_to_features +#define xfs_set_projid libxfs_set_projid #define xfs_symlink_blocks libxfs_symlink_blocks #define xfs_symlink_hdr_ok libxfs_symlink_hdr_ok #define xfs_symlink_write_target libxfs_symlink_write_target diff --git a/libxfs/xfs_inode_util.c b/libxfs/xfs_inode_util.c index 868a77cafa6..89fb58807a1 100644 --- a/libxfs/xfs_inode_util.c +++ b/libxfs/xfs_inode_util.c @@ -122,3 +122,14 @@ xfs_ip2xflags( flags |= FS_XFLAG_HASATTR; return flags; } + +#define XFS_PROJID_DEFAULT 0 + +prid_t +xfs_get_initial_prid(struct xfs_inode *dp) +{ + if (dp->i_diflags & XFS_DIFLAG_PROJINHERIT) + return 
dp->i_projid; + + return XFS_PROJID_DEFAULT; +} diff --git a/libxfs/xfs_inode_util.h b/libxfs/xfs_inode_util.h index 6ad1898a0f7..f7e4d5a8235 100644 --- a/libxfs/xfs_inode_util.h +++ b/libxfs/xfs_inode_util.h @@ -11,4 +11,6 @@ uint64_t xfs_flags2diflags2(struct xfs_inode *ip, unsigned int xflags); uint32_t xfs_dic2xflags(struct xfs_inode *ip); uint32_t xfs_ip2xflags(struct xfs_inode *ip); +prid_t xfs_get_initial_prid(struct xfs_inode *dp); + #endif /* __XFS_INODE_UTIL_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 01/26] xfs: hoist extent size helpers to libxfs 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong 2022-12-30 22:19 ` [PATCH 03/26] xfs: hoist project id get/set functions " Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/26] xfs: hoist inode flag conversion functions " Darrick J. Wong ` (23 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Move the extent size helpers to xfs_bmap.c in libxfs since they're used there already. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/xfs_inode.h | 7 +++++++ libxfs/libxfs_priv.h | 2 -- libxfs/xfs_bmap.c | 41 +++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_bmap.h | 3 +++ 4 files changed, 51 insertions(+), 2 deletions(-) diff --git a/include/xfs_inode.h b/include/xfs_inode.h index 489fd7d107d..3bc5aa2c7cb 100644 --- a/include/xfs_inode.h +++ b/include/xfs_inode.h @@ -237,6 +237,11 @@ static inline bool xfs_inode_has_bigrtextents(struct xfs_inode *ip) return XFS_IS_REALTIME_INODE(ip) && ip->i_mount->m_sb.sb_rextsize > 1; } +static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip) +{ + return false; +} + /* Always set the child's GID to this value, even if the parent is setgid. 
*/ #define CRED_FORCE_GID (1U << 0) struct cred { @@ -262,4 +267,6 @@ extern int libxfs_iget(struct xfs_mount *, struct xfs_trans *, xfs_ino_t, uint, struct xfs_inode **); extern void libxfs_irele(struct xfs_inode *ip); +#define XFS_DEFAULT_COWEXTSZ_HINT 32 + #endif /* __XFS_INODE_H__ */ diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h index 63bc6ea7c2b..716e711cde4 100644 --- a/libxfs/libxfs_priv.h +++ b/libxfs/libxfs_priv.h @@ -508,8 +508,6 @@ void __xfs_buf_mark_corrupt(struct xfs_buf *bp, xfs_failaddr_t fa); #define xfs_rotorstep 1 #define xfs_bmap_rtalloc(a) (-ENOSYS) -#define xfs_get_extsz_hint(ip) (0) -#define xfs_get_cowextsz_hint(ip) (0) #define xfs_inode_is_filestream(ip) (0) #define xfs_filestream_lookup_ag(ip) (0) #define xfs_filestream_new_ag(ip,ag) (0) diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c index f2af8a012a9..c4a81537ccf 100644 --- a/libxfs/xfs_bmap.c +++ b/libxfs/xfs_bmap.c @@ -6406,3 +6406,44 @@ xfs_bmap_query_all( return xfs_btree_query_all(cur, xfs_bmap_query_range_helper, &query); } + +/* Helper function to extract extent size hint from inode */ +xfs_extlen_t +xfs_get_extsz_hint( + struct xfs_inode *ip) +{ + /* + * No point in aligning allocations if we need to COW to actually + * write to them. + */ + if (xfs_is_always_cow_inode(ip)) + return 0; + if ((ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize) + return ip->i_extsize; + if (XFS_IS_REALTIME_INODE(ip)) + return ip->i_mount->m_sb.sb_rextsize; + return 0; +} + +/* + * Helper function to extract CoW extent size hint from inode. + * Between the extent size hint and the CoW extent size hint, we + * return the greater of the two. If the value is zero (automatic), + * use the default size. 
+ */ +xfs_extlen_t +xfs_get_cowextsz_hint( + struct xfs_inode *ip) +{ + xfs_extlen_t a, b; + + a = 0; + if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) + a = ip->i_cowextsize; + b = xfs_get_extsz_hint(ip); + + a = max(a, b); + if (a == 0) + return XFS_DEFAULT_COWEXTSZ_HINT; + return a; +} diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h index 9559f7174bb..d870c6a62e4 100644 --- a/libxfs/xfs_bmap.h +++ b/libxfs/xfs_bmap.h @@ -292,4 +292,7 @@ typedef int (*xfs_bmap_query_range_fn)( int xfs_bmap_query_all(struct xfs_btree_cur *cur, xfs_bmap_query_range_fn fn, void *priv); +xfs_extlen_t xfs_get_extsz_hint(struct xfs_inode *ip); +xfs_extlen_t xfs_get_cowextsz_hint(struct xfs_inode *ip); + #endif /* __XFS_BMAP_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
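The precedence between the two hints in the patch above can be hard to see in diff form, so here is a small standalone sketch of the same decision order. It assumes a toy inode with hypothetical field names (always_cow, realtime, sb_rextsize) and made-up flag bit values; only the ordering of the checks and the default of 32 are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_extlen_t;

/* Illustrative bit values; the real flags are defined in the XFS headers. */
#define TOY_DIFLAG_EXTSIZE		(1u << 0)
#define TOY_DIFLAG2_COWEXTSIZE		(1u << 0)
#define TOY_DEFAULT_COWEXTSZ_HINT	32

struct toy_inode {
	bool		always_cow;	/* stand-in for xfs_is_always_cow_inode() */
	bool		realtime;	/* stand-in for XFS_IS_REALTIME_INODE() */
	uint16_t	i_diflags;
	uint64_t	i_diflags2;
	toy_extlen_t	i_extsize;
	toy_extlen_t	i_cowextsize;
	toy_extlen_t	sb_rextsize;	/* realtime extent size from the superblock */
};

/* Extent size hint: explicit per-inode hint first, then the rt extent size. */
static toy_extlen_t
toy_get_extsz_hint(const struct toy_inode *ip)
{
	assert(ip != NULL);

	/* No point aligning allocations that must be COWed anyway. */
	if (ip->always_cow)
		return 0;
	if ((ip->i_diflags & TOY_DIFLAG_EXTSIZE) && ip->i_extsize)
		return ip->i_extsize;
	if (ip->realtime)
		return ip->sb_rextsize;
	return 0;
}

/* CoW hint: the larger of the two hints, or the default if both are zero. */
static toy_extlen_t
toy_get_cowextsz_hint(const struct toy_inode *ip)
{
	toy_extlen_t	a = 0;
	toy_extlen_t	b;

	if (ip->i_diflags2 & TOY_DIFLAG2_COWEXTSIZE)
		a = ip->i_cowextsize;
	b = toy_get_extsz_hint(ip);

	if (b > a)
		a = b;
	return a ? a : TOY_DEFAULT_COWEXTSZ_HINT;
}
```

Note that the CoW hint never returns zero in this sketch: an inode with no hints at all falls back to the default, which mirrors the "zero means automatic" comment in the patch.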
* [PATCH 02/26] xfs: hoist inode flag conversion functions to libxfs 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong 2022-12-30 22:19 ` [PATCH 03/26] xfs: hoist project id get/set functions " Darrick J. Wong 2022-12-30 22:19 ` [PATCH 01/26] xfs: hoist extent size helpers " Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 06/26] libxfs: pack icreate initialization parameters into a separate structure Darrick J. Wong ` (22 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Hoist the inode flag conversion functions into libxfs so that we can keep them in sync. Do this by creating a new xfs_inode_util.c file in libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/libxfs.h | 1 include/xfs_inode.h | 1 libxfs/Makefile | 2 + libxfs/util.c | 60 ----------------------- libxfs/xfs_bmap.c | 1 libxfs/xfs_inode_util.c | 124 +++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_inode_util.h | 14 +++++ 7 files changed, 143 insertions(+), 60 deletions(-) create mode 100644 libxfs/xfs_inode_util.c create mode 100644 libxfs/xfs_inode_util.h diff --git a/include/libxfs.h b/include/libxfs.h index 14f6d629c9f..a4f6e1c2b28 100644 --- a/include/libxfs.h +++ b/include/libxfs.h @@ -68,6 +68,7 @@ struct iomap; #include "xfs_attr_sf.h" #include "xfs_inode_fork.h" #include "xfs_inode_buf.h" +#include "xfs_inode_util.h" #include "xfs_alloc.h" #include "xfs_btree.h" #include "xfs_bmap.h" diff --git a/include/xfs_inode.h b/include/xfs_inode.h index 3bc5aa2c7cb..ef62ac50912 100644 --- a/include/xfs_inode.h +++ b/include/xfs_inode.h @@ -10,6 +10,7 @@ /* These match kernel side includes */ #include "xfs_inode_buf.h" #include "xfs_inode_fork.h" +#include "xfs_inode_util.h" struct xfs_trans; struct xfs_mount; diff --git a/libxfs/Makefile 
b/libxfs/Makefile index 0e43941948d..0d9c4adf82b 100644 --- a/libxfs/Makefile +++ b/libxfs/Makefile @@ -48,6 +48,7 @@ HFILES = \ xfs_ialloc_btree.h \ xfs_inode_buf.h \ xfs_inode_fork.h \ + xfs_inode_util.h \ xfs_quota_defs.h \ xfs_refcount.h \ xfs_refcount_btree.h \ @@ -96,6 +97,7 @@ CFILES = cache.c \ xfs_iext_tree.c \ xfs_inode_buf.c \ xfs_inode_fork.c \ + xfs_inode_util.c \ xfs_ialloc_btree.c \ xfs_log_rlimit.c \ xfs_refcount.c \ diff --git a/libxfs/util.c b/libxfs/util.c index 3d5ef68d8e7..6b888e9f996 100644 --- a/libxfs/util.c +++ b/libxfs/util.c @@ -150,66 +150,6 @@ current_time(struct inode *inode) return tv; } -STATIC uint16_t -xfs_flags2diflags( - struct xfs_inode *ip, - unsigned int xflags) -{ - /* can't set PREALLOC this way, just preserve it */ - uint16_t di_flags = - (ip->i_diflags & XFS_DIFLAG_PREALLOC); - - if (xflags & FS_XFLAG_IMMUTABLE) - di_flags |= XFS_DIFLAG_IMMUTABLE; - if (xflags & FS_XFLAG_APPEND) - di_flags |= XFS_DIFLAG_APPEND; - if (xflags & FS_XFLAG_SYNC) - di_flags |= XFS_DIFLAG_SYNC; - if (xflags & FS_XFLAG_NOATIME) - di_flags |= XFS_DIFLAG_NOATIME; - if (xflags & FS_XFLAG_NODUMP) - di_flags |= XFS_DIFLAG_NODUMP; - if (xflags & FS_XFLAG_NODEFRAG) - di_flags |= XFS_DIFLAG_NODEFRAG; - if (xflags & FS_XFLAG_FILESTREAM) - di_flags |= XFS_DIFLAG_FILESTREAM; - if (S_ISDIR(VFS_I(ip)->i_mode)) { - if (xflags & FS_XFLAG_RTINHERIT) - di_flags |= XFS_DIFLAG_RTINHERIT; - if (xflags & FS_XFLAG_NOSYMLINKS) - di_flags |= XFS_DIFLAG_NOSYMLINKS; - if (xflags & FS_XFLAG_EXTSZINHERIT) - di_flags |= XFS_DIFLAG_EXTSZINHERIT; - if (xflags & FS_XFLAG_PROJINHERIT) - di_flags |= XFS_DIFLAG_PROJINHERIT; - } else if (S_ISREG(VFS_I(ip)->i_mode)) { - if (xflags & FS_XFLAG_REALTIME) - di_flags |= XFS_DIFLAG_REALTIME; - if (xflags & FS_XFLAG_EXTSIZE) - di_flags |= XFS_DIFLAG_EXTSIZE; - } - - return di_flags; -} - -STATIC uint64_t -xfs_flags2diflags2( - struct xfs_inode *ip, - unsigned int xflags) -{ - uint64_t di_flags2 = - (ip->i_diflags2 & (XFS_DIFLAG2_REFLINK | - 
XFS_DIFLAG2_BIGTIME | - XFS_DIFLAG2_NREXT64)); - - if (xflags & FS_XFLAG_DAX) - di_flags2 |= XFS_DIFLAG2_DAX; - if (xflags & FS_XFLAG_COWEXTSIZE) - di_flags2 |= XFS_DIFLAG2_COWEXTSIZE; - - return di_flags2; -} - /* Propagate di_flags from a parent inode to a child inode. */ static void xfs_inode_propagate_flags( diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c index c4a81537ccf..afa432727db 100644 --- a/libxfs/xfs_bmap.c +++ b/libxfs/xfs_bmap.c @@ -31,6 +31,7 @@ #include "xfs_refcount.h" #include "xfs_health.h" #include "xfs_symlink_remote.h" +#include "xfs_inode_util.h" struct kmem_cache *xfs_bmap_intent_cache; diff --git a/libxfs/xfs_inode_util.c b/libxfs/xfs_inode_util.c new file mode 100644 index 00000000000..868a77cafa6 --- /dev/null +++ b/libxfs/xfs_inode_util.c @@ -0,0 +1,124 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2000-2006 Silicon Graphics, Inc. + * All Rights Reserved. + */ +#include "libxfs_priv.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_sb.h" +#include "xfs_mount.h" +#include "xfs_inode.h" +#include "xfs_inode_util.h" + +uint16_t +xfs_flags2diflags( + struct xfs_inode *ip, + unsigned int xflags) +{ + /* can't set PREALLOC this way, just preserve it */ + uint16_t di_flags = + (ip->i_diflags & XFS_DIFLAG_PREALLOC); + + if (xflags & FS_XFLAG_IMMUTABLE) + di_flags |= XFS_DIFLAG_IMMUTABLE; + if (xflags & FS_XFLAG_APPEND) + di_flags |= XFS_DIFLAG_APPEND; + if (xflags & FS_XFLAG_SYNC) + di_flags |= XFS_DIFLAG_SYNC; + if (xflags & FS_XFLAG_NOATIME) + di_flags |= XFS_DIFLAG_NOATIME; + if (xflags & FS_XFLAG_NODUMP) + di_flags |= XFS_DIFLAG_NODUMP; + if (xflags & FS_XFLAG_NODEFRAG) + di_flags |= XFS_DIFLAG_NODEFRAG; + if (xflags & FS_XFLAG_FILESTREAM) + di_flags |= XFS_DIFLAG_FILESTREAM; + if (S_ISDIR(VFS_I(ip)->i_mode)) { + if (xflags & FS_XFLAG_RTINHERIT) + di_flags |= XFS_DIFLAG_RTINHERIT; + if (xflags & FS_XFLAG_NOSYMLINKS) + di_flags 
|= XFS_DIFLAG_NOSYMLINKS; + if (xflags & FS_XFLAG_EXTSZINHERIT) + di_flags |= XFS_DIFLAG_EXTSZINHERIT; + if (xflags & FS_XFLAG_PROJINHERIT) + di_flags |= XFS_DIFLAG_PROJINHERIT; + } else if (S_ISREG(VFS_I(ip)->i_mode)) { + if (xflags & FS_XFLAG_REALTIME) + di_flags |= XFS_DIFLAG_REALTIME; + if (xflags & FS_XFLAG_EXTSIZE) + di_flags |= XFS_DIFLAG_EXTSIZE; + } + + return di_flags; +} + +uint64_t +xfs_flags2diflags2( + struct xfs_inode *ip, + unsigned int xflags) +{ + uint64_t di_flags2 = + (ip->i_diflags2 & (XFS_DIFLAG2_REFLINK | + XFS_DIFLAG2_BIGTIME | + XFS_DIFLAG2_NREXT64)); + + if (xflags & FS_XFLAG_DAX) + di_flags2 |= XFS_DIFLAG2_DAX; + if (xflags & FS_XFLAG_COWEXTSIZE) + di_flags2 |= XFS_DIFLAG2_COWEXTSIZE; + + return di_flags2; +} + +uint32_t +xfs_ip2xflags( + struct xfs_inode *ip) +{ + uint32_t flags = 0; + + if (ip->i_diflags & XFS_DIFLAG_ANY) { + if (ip->i_diflags & XFS_DIFLAG_REALTIME) + flags |= FS_XFLAG_REALTIME; + if (ip->i_diflags & XFS_DIFLAG_PREALLOC) + flags |= FS_XFLAG_PREALLOC; + if (ip->i_diflags & XFS_DIFLAG_IMMUTABLE) + flags |= FS_XFLAG_IMMUTABLE; + if (ip->i_diflags & XFS_DIFLAG_APPEND) + flags |= FS_XFLAG_APPEND; + if (ip->i_diflags & XFS_DIFLAG_SYNC) + flags |= FS_XFLAG_SYNC; + if (ip->i_diflags & XFS_DIFLAG_NOATIME) + flags |= FS_XFLAG_NOATIME; + if (ip->i_diflags & XFS_DIFLAG_NODUMP) + flags |= FS_XFLAG_NODUMP; + if (ip->i_diflags & XFS_DIFLAG_RTINHERIT) + flags |= FS_XFLAG_RTINHERIT; + if (ip->i_diflags & XFS_DIFLAG_PROJINHERIT) + flags |= FS_XFLAG_PROJINHERIT; + if (ip->i_diflags & XFS_DIFLAG_NOSYMLINKS) + flags |= FS_XFLAG_NOSYMLINKS; + if (ip->i_diflags & XFS_DIFLAG_EXTSIZE) + flags |= FS_XFLAG_EXTSIZE; + if (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) + flags |= FS_XFLAG_EXTSZINHERIT; + if (ip->i_diflags & XFS_DIFLAG_NODEFRAG) + flags |= FS_XFLAG_NODEFRAG; + if (ip->i_diflags & XFS_DIFLAG_FILESTREAM) + flags |= FS_XFLAG_FILESTREAM; + } + + if (ip->i_diflags2 & XFS_DIFLAG2_ANY) { + if (ip->i_diflags2 & XFS_DIFLAG2_DAX) + flags |= 
FS_XFLAG_DAX; + if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) + flags |= FS_XFLAG_COWEXTSIZE; + } + + if (xfs_inode_has_attr_fork(ip)) + flags |= FS_XFLAG_HASATTR; + return flags; +} diff --git a/libxfs/xfs_inode_util.h b/libxfs/xfs_inode_util.h new file mode 100644 index 00000000000..6ad1898a0f7 --- /dev/null +++ b/libxfs/xfs_inode_util.h @@ -0,0 +1,14 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc. + * All Rights Reserved. + */ +#ifndef __XFS_INODE_UTIL_H__ +#define __XFS_INODE_UTIL_H__ + +uint16_t xfs_flags2diflags(struct xfs_inode *ip, unsigned int xflags); +uint64_t xfs_flags2diflags2(struct xfs_inode *ip, unsigned int xflags); +uint32_t xfs_dic2xflags(struct xfs_inode *ip); +uint32_t xfs_ip2xflags(struct xfs_inode *ip); + +#endif /* __XFS_INODE_UTIL_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
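As a rough illustration of what the hoisted xfs_flags2diflags() does, the sketch below converts user-visible flags to on-disk flags for a toy inode. It shows the three behaviours visible in the patch: PREALLOC is preserved rather than set from the caller's flags, some flags apply to any file type, and others are honoured only on directories or only on regular files. The enum, the toy_ names, and every bit value here are assumptions for illustration and do not match the real FS_XFLAG_* or XFS_DIFLAG_* encodings.

```c
#include <stdint.h>

/* Hypothetical file types standing in for the S_ISDIR/S_ISREG mode checks. */
enum toy_ftype { TOY_FT_REG, TOY_FT_DIR, TOY_FT_OTHER };

/* Toy user-visible flags (illustrative values). */
#define TOY_XFLAG_IMMUTABLE	(1u << 0)
#define TOY_XFLAG_PROJINHERIT	(1u << 1)
#define TOY_XFLAG_REALTIME	(1u << 2)

/* Toy on-disk flags (illustrative values). */
#define TOY_DIFLAG_PREALLOC	(1u << 0)
#define TOY_DIFLAG_IMMUTABLE	(1u << 1)
#define TOY_DIFLAG_PROJINHERIT	(1u << 2)
#define TOY_DIFLAG_REALTIME	(1u << 3)

static uint16_t
toy_flags2diflags(enum toy_ftype ftype, uint16_t old_diflags,
		  unsigned int xflags)
{
	/* PREALLOC cannot be set this way, so just carry it over. */
	uint16_t	di_flags = old_diflags & TOY_DIFLAG_PREALLOC;

	/* Valid for every file type. */
	if (xflags & TOY_XFLAG_IMMUTABLE)
		di_flags |= TOY_DIFLAG_IMMUTABLE;

	if (ftype == TOY_FT_DIR) {
		/* Inheritance flags only make sense on directories. */
		if (xflags & TOY_XFLAG_PROJINHERIT)
			di_flags |= TOY_DIFLAG_PROJINHERIT;
	} else if (ftype == TOY_FT_REG) {
		/* Data placement flags only make sense on regular files. */
		if (xflags & TOY_XFLAG_REALTIME)
			di_flags |= TOY_DIFLAG_REALTIME;
	}

	return di_flags;
}
```

Passing all three toy xflags for a directory yields IMMUTABLE and PROJINHERIT only; for a regular file it yields IMMUTABLE and REALTIME only. Flags that do not apply to the file type are silently dropped rather than rejected.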
* [PATCH 06/26] libxfs: pack icreate initialization parameters into a separate structure 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 02/26] xfs: hoist inode flag conversion functions " Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 05/26] libxfs: pass IGET flags through to xfs_iread Darrick J. Wong ` (21 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Callers that want to create an inode currently pass all possible file attribute values for the new inode into xfs_init_new_inode as ten separate parameters. This causes two code maintenance issues: first, we have large multi-line call sites which programmers must read carefully to make sure they did not accidentally invert a value. Second, all three file id parameters must be passed separately to the quota functions; any discrepancy results in quota count errors. Clean this up by creating a new icreate_args structure to hold all this information, some helpers to initialize them properly, and make the callers pass this structure through to the creation function, whose name we shorten to xfs_icreate. This eliminates the issues, enables us to keep the inode init code in sync with userspace via libxfs, and is needed for future metadata directory tree management. (A subsequent cleanup will also fix the quota alloc calls and remove libxfs_dir_ialloc entirely.) Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- include/xfs_inode.h | 37 +++++++++++++++++++---- libxfs/inode.c | 75 ++++++++++++++++++++++++++++------------------- libxfs/xfs_inode_util.h | 31 +++++++++++++++++++ 3 files changed, 107 insertions(+), 36 deletions(-) diff --git a/include/xfs_inode.h b/include/xfs_inode.h index ef62ac50912..bf8322ee2ec 100644 --- a/include/xfs_inode.h +++ b/include/xfs_inode.h @@ -7,6 +7,31 @@ #ifndef __XFS_INODE_H__ #define __XFS_INODE_H__ +/* + * Borrow the kernel's uid/gid types. These are used by xfs_inode_util.h, so + * they must come first in the header file. + */ + +typedef struct { + uid_t val; +} kuid_t; + +typedef struct { + gid_t val; +} kgid_t; + +static inline kuid_t make_kuid(uid_t uid) +{ + kuid_t v = { .val = uid }; + return v; +} + +static inline kgid_t make_kgid(gid_t gid) +{ + kgid_t v = { .val = gid }; + return v; +} + /* These match kernel side includes */ #include "xfs_inode_buf.h" #include "xfs_inode_fork.h" @@ -33,8 +58,8 @@ struct xfs_inode_log_item; */ struct inode { mode_t i_mode; - uint32_t i_uid; - uint32_t i_gid; + kuid_t i_uid; + kgid_t i_gid; uint32_t i_nlink; xfs_dev_t i_rdev; /* This actually holds xfs_dev_t */ unsigned int i_count; @@ -49,19 +74,19 @@ struct inode { static inline uint32_t i_uid_read(struct inode *inode) { - return inode->i_uid; + return inode->i_uid.val; } static inline uint32_t i_gid_read(struct inode *inode) { - return inode->i_gid; + return inode->i_gid.val; } static inline void i_uid_write(struct inode *inode, uint32_t uid) { - inode->i_uid = uid; + inode->i_uid.val = uid; } static inline void i_gid_write(struct inode *inode, uint32_t gid) { - inode->i_gid = gid; + inode->i_gid.val = gid; } static inline void ihold(struct inode *inode) diff --git a/libxfs/inode.c b/libxfs/inode.c index 588aff33ef4..63150422b01 100644 --- a/libxfs/inode.c +++ b/libxfs/inode.c @@ -64,17 +64,13 @@ xfs_inode_propagate_flags( * caller locked exclusively. 
*/ static int -libxfs_init_new_inode( +libxfs_icreate( struct xfs_trans *tp, - struct xfs_inode *pip, xfs_ino_t ino, - umode_t mode, - xfs_nlink_t nlink, - dev_t rdev, - struct cred *cr, - struct fsxattr *fsx, + const struct xfs_icreate_args *args, struct xfs_inode **ipp) { + struct xfs_inode *pip = args->pip; struct xfs_inode *ip; unsigned int flags; int error; @@ -84,48 +80,41 @@ libxfs_init_new_inode( return error; ASSERT(ip != NULL); - VFS_I(ip)->i_mode = mode; - set_nlink(VFS_I(ip), nlink); - i_uid_write(VFS_I(ip), cr->cr_uid); - i_gid_write(VFS_I(ip), cr->cr_gid); - ip->i_projid = pip ? 0 : fsx->fsx_projid; + VFS_I(ip)->i_mode = args->mode; + set_nlink(VFS_I(ip), args->nlink); + VFS_I(ip)->i_uid = args->uid; + ip->i_projid = args->prid; xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG | XFS_ICHGTIME_MOD); if (pip && (VFS_I(pip)->i_mode & S_ISGID)) { - if (!(cr->cr_flags & CRED_FORCE_GID)) + if (!(args->flags & XFS_ICREATE_ARGS_FORCE_GID)) VFS_I(ip)->i_gid = VFS_I(pip)->i_gid; - if ((VFS_I(pip)->i_mode & S_ISGID) && (mode & S_IFMT) == S_IFDIR) + if ((VFS_I(pip)->i_mode & S_ISGID) && S_ISDIR(args->mode)) VFS_I(ip)->i_mode |= S_ISGID; - } + } else + VFS_I(ip)->i_gid = args->gid; ip->i_disk_size = 0; ip->i_df.if_nextents = 0; ASSERT(ip->i_nblocks == 0); - ip->i_extsize = pip ? 0 : fsx->fsx_extsize; - ip->i_diflags = pip ? 0 : xfs_flags2diflags(ip, fsx->fsx_xflags); - + ip->i_extsize = 0; + ip->i_diflags = 0; if (xfs_has_v3inodes(ip->i_mount)) { VFS_I(ip)->i_version = 1; ip->i_diflags2 = ip->i_mount->m_ino_geo.new_diflags2; - if (!pip) - ip->i_diflags2 = xfs_flags2diflags2(ip, - fsx->fsx_xflags); - ip->i_crtime = VFS_I(ip)->i_mtime; /* struct copy */ - ip->i_cowextsize = pip ? 
0 : fsx->fsx_cowextsize; + ip->i_crtime = VFS_I(ip)->i_mtime; + ip->i_cowextsize = 0; } flags = XFS_ILOG_CORE; - switch (mode & S_IFMT) { + switch (args->mode & S_IFMT) { case S_IFIFO: case S_IFSOCK: - /* doesn't make sense to set an rdev for these */ - rdev = 0; - /* FALLTHROUGH */ case S_IFCHR: case S_IFBLK: ip->i_df.if_format = XFS_DINODE_FMT_DEV; flags |= XFS_ILOG_DEV; - VFS_I(ip)->i_rdev = rdev; + VFS_I(ip)->i_rdev = args->rdev; break; case S_IFREG: case S_IFDIR: @@ -235,10 +224,22 @@ libxfs_dir_ialloc( struct fsxattr *fsx, struct xfs_inode **ipp) { + struct xfs_icreate_args args = { + .pip = dp, + .uid = make_kuid(cr->cr_uid), + .gid = make_kgid(cr->cr_gid), + .nlink = nlink, + .rdev = rdev, + .mode = mode, + }; + struct xfs_inode *ip; xfs_ino_t parent_ino = dp ? dp->i_ino : 0; xfs_ino_t ino; int error; + if (cr->cr_flags & CRED_FORCE_GID) + args.flags |= XFS_ICREATE_ARGS_FORCE_GID; + /* * Call the space management code to pick the on-disk inode to be * allocated. @@ -247,8 +248,22 @@ libxfs_dir_ialloc( if (error) return error; - return libxfs_init_new_inode(*tpp, dp, ino, mode, nlink, rdev, cr, - fsx, ipp); + error = libxfs_icreate(*tpp, ino, &args, ipp); + if (error || dp) + return error; + + /* If there is no parent dir, initialize the file from fsxattr data. */ + ip = *ipp; + ip->i_projid = fsx->fsx_projid; + ip->i_extsize = fsx->fsx_extsize; + ip->i_diflags = xfs_flags2diflags(ip, fsx->fsx_xflags); + + if (xfs_has_v3inodes(ip->i_mount)) { + ip->i_diflags2 = xfs_flags2diflags2(ip, fsx->fsx_xflags); + ip->i_cowextsize = fsx->fsx_cowextsize; + } + xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE); + return 0; } /* diff --git a/libxfs/xfs_inode_util.h b/libxfs/xfs_inode_util.h index f7e4d5a8235..466f0767ab5 100644 --- a/libxfs/xfs_inode_util.h +++ b/libxfs/xfs_inode_util.h @@ -13,4 +13,35 @@ uint32_t xfs_ip2xflags(struct xfs_inode *ip); prid_t xfs_get_initial_prid(struct xfs_inode *dp); +/* + * Initial ids, link count, device number, and mode of a new inode. 
+ * + * Due to our only partial reliance on the VFS to propagate uid and gid values + * according to accepted Unix behaviors, callers must initialize mnt_userns to + * the appropriate namespace, uid to fsuid_into_mnt(), and gid to + * fsgid_into_mnt() to get the correct inheritance behaviors when + * XFS_MOUNT_GRPID is set. Use the xfs_ialloc_inherit_args() helper. + * + * To override the default ids, use the FORCE flags defined below. + */ +struct xfs_icreate_args { + struct user_namespace *mnt_userns; + struct xfs_inode *pip; /* parent inode or null */ + + kuid_t uid; + kgid_t gid; + prid_t prid; + + xfs_nlink_t nlink; + dev_t rdev; + + umode_t mode; + +#define XFS_ICREATE_ARGS_FORCE_UID (1 << 0) +#define XFS_ICREATE_ARGS_FORCE_GID (1 << 1) +#define XFS_ICREATE_ARGS_FORCE_MODE (1 << 2) +#define XFS_ICREATE_ARGS_INIT_XATTRS (1 << 3) + uint16_t flags; +}; + #endif /* __XFS_INODE_UTIL_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
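To make the motivation for the args structure concrete, here is a hedged standalone sketch of the pattern: the caller fills in a struct with designated initializers, so each value is named at the call site, and the creation function reads everything from that one struct. All toy_ names and fields are hypothetical. The gid handling is also a simplification: the sketch assigns the caller's gid first and then overrides it when inheriting from a setgid parent, which is not the exact control flow of libxfs_icreate() in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ICREATE_FORCE_GID	(1u << 0)	/* illustrative flag value */

struct toy_inode {
	bool		is_dir;
	bool		setgid;
	uint32_t	uid;
	uint32_t	gid;
};

/* Everything the creation function needs, in one named bundle. */
struct toy_icreate_args {
	const struct toy_inode	*pip;	/* parent inode or NULL */
	uint32_t		uid;
	uint32_t		gid;
	bool			is_dir;
	unsigned int		flags;
};

static void
toy_icreate(struct toy_inode *ip, const struct toy_icreate_args *args)
{
	const struct toy_inode	*pip = args->pip;

	assert(ip != NULL);

	ip->is_dir = args->is_dir;
	ip->setgid = false;
	ip->uid = args->uid;
	ip->gid = args->gid;

	if (pip && pip->setgid) {
		/* A setgid parent donates its gid unless the caller forces one. */
		if (!(args->flags & TOY_ICREATE_FORCE_GID))
			ip->gid = pip->gid;
		/* New subdirectories keep propagating the setgid bit. */
		if (args->is_dir)
			ip->setgid = true;
	}
}
```

A call site then reads as `struct toy_icreate_args args = { .pip = &parent, .uid = 7, .gid = 5 };` followed by `toy_icreate(&child, &args);`, which makes it much harder to swap two same-typed values than with a long positional parameter list.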
* [PATCH 05/26] libxfs: pass IGET flags through to xfs_iread 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 06/26] libxfs: pack icreate initialization parameters into a separate structure Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 04/26] libxfs: put all the inode functions in a single file Darrick J. Wong ` (20 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Change the lock_flags parameter to iget_flags so that we can supply XFS_IGET_ flags in future patches. All callers of libxfs_iget and libxfs_trans_iget pass zero for this parameter and there are no inode locks in xfsprogs, so there's no behavior change here. Port the kernel's version of the xfs_inode_from_disk callsite. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/inode.c | 40 ++++++++++++++++++++++++++++------------ 1 file changed, 28 insertions(+), 12 deletions(-) diff --git a/libxfs/inode.c b/libxfs/inode.c index c7843aea753..588aff33ef4 100644 --- a/libxfs/inode.c +++ b/libxfs/inode.c @@ -263,11 +263,10 @@ libxfs_iget( struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino, - uint lock_flags, + uint flags, struct xfs_inode **ipp) { struct xfs_inode *ip; - struct xfs_buf *bp; int error = 0; ip = kmem_cache_zalloc(xfs_inode_cache, 0); @@ -284,18 +283,35 @@ libxfs_iget( if (error) goto out_destroy; - error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &bp); - if (error) - goto out_destroy; + /* + * For version 5 superblocks, if we are initialising a new inode and we + * are not utilising the XFS_MOUNT_IKEEP inode cluster mode, we can + * simply build the new inode core with a random generation number. 
+ * + * For version 4 (and older) superblocks, log recovery is dependent on + * the di_flushiter field being initialised from the current on-disk + * value and hence we must also read the inode off disk even when + * initializing new inodes. + */ + if (xfs_has_v3inodes(mp) && + (flags & XFS_IGET_CREATE) && !xfs_has_ikeep(mp)) { + VFS_I(ip)->i_generation = get_random_u32(); + } else { + struct xfs_buf *bp; - error = xfs_inode_from_disk(ip, - xfs_buf_offset(bp, ip->i_imap.im_boffset)); - if (!error) - xfs_buf_set_ref(bp, XFS_INO_REF); - xfs_trans_brelse(tp, bp); + error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &bp); + if (error) + goto out_destroy; - if (error) - goto out_destroy; + error = xfs_inode_from_disk(ip, + xfs_buf_offset(bp, ip->i_imap.im_boffset)); + if (!error) + xfs_buf_set_ref(bp, XFS_INO_REF); + xfs_trans_brelse(tp, bp); + + if (error) + goto out_destroy; + } *ipp = ip; return 0; ^ permalink raw reply related [flat|nested] 565+ messages in thread
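The new branch in libxfs_iget() boils down to a single predicate, sketched below in standalone form. The toy_mount struct and TOY_IGET_CREATE value are assumptions for illustration; the condition itself follows the patch: the on-disk inode can be skipped only when creating a new inode on a filesystem with v3 inodes and without the ikeep mode.

```c
#include <stdbool.h>

#define TOY_IGET_CREATE	(1u << 0)	/* illustrative flag value */

/* Hypothetical stand-in for the feature checks on struct xfs_mount. */
struct toy_mount {
	bool	has_v3inodes;	/* stand-in for xfs_has_v3inodes() */
	bool	has_ikeep;	/* stand-in for xfs_has_ikeep() */
};

/*
 * Return true if the inode core must be read from disk, false if a
 * freshly created inode can be built in memory instead.
 */
static bool
toy_iget_needs_disk_read(const struct toy_mount *mp, unsigned int flags)
{
	if (mp->has_v3inodes && (flags & TOY_IGET_CREATE) && !mp->has_ikeep)
		return false;

	/* Older formats, ikeep, and plain lookups all read the buffer. */
	return true;
}
```

Since the commit message says that all current callers pass zero for the flags argument, this predicate would return true everywhere today, which is consistent with the stated "no behavior change".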
* [PATCH 04/26] libxfs: put all the inode functions in a single file 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 05/26] libxfs: pass IGET flags through to xfs_iread Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 10/26] libxfs: when creating a file in a directory, set the project id based on the parent Darrick J. Wong ` (19 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Move all the inode functions into a single source code file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/Makefile | 1 libxfs/inode.c | 340 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ libxfs/rdwr.c | 87 -------------- libxfs/util.c | 222 ------------------------------------ 4 files changed, 341 insertions(+), 309 deletions(-) create mode 100644 libxfs/inode.c diff --git a/libxfs/Makefile b/libxfs/Makefile index 0d9c4adf82b..f9bc82cc9e8 100644 --- a/libxfs/Makefile +++ b/libxfs/Makefile @@ -64,6 +64,7 @@ HFILES = \ CFILES = cache.c \ defer_item.c \ init.c \ + inode.c \ kmem.c \ logitem.c \ rdwr.c \ diff --git a/libxfs/inode.c b/libxfs/inode.c new file mode 100644 index 00000000000..c7843aea753 --- /dev/null +++ b/libxfs/inode.c @@ -0,0 +1,340 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2000-2005 Silicon Graphics, Inc. + * All Rights Reserved. 
+ */ + +#include "libxfs_priv.h" +#include "libxfs.h" +#include "libxfs_io.h" +#include "init.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_inode_buf.h" +#include "xfs_inode_fork.h" +#include "xfs_inode.h" +#include "xfs_trans.h" +#include "xfs_bmap.h" +#include "xfs_bmap_btree.h" +#include "xfs_trans_space.h" +#include "xfs_ialloc.h" +#include "xfs_alloc.h" +#include "xfs_bit.h" +#include "xfs_da_format.h" +#include "xfs_da_btree.h" +#include "xfs_dir2_priv.h" + +/* Propagate di_flags from a parent inode to a child inode. */ +static void +xfs_inode_propagate_flags( + struct xfs_inode *ip, + const struct xfs_inode *pip) +{ + unsigned int di_flags = 0; + umode_t mode = VFS_I(ip)->i_mode; + + if ((mode & S_IFMT) == S_IFDIR) { + if (pip->i_diflags & XFS_DIFLAG_RTINHERIT) + di_flags |= XFS_DIFLAG_RTINHERIT; + if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { + di_flags |= XFS_DIFLAG_EXTSZINHERIT; + ip->i_extsize = pip->i_extsize; + } + } else { + if ((pip->i_diflags & XFS_DIFLAG_RTINHERIT) && + xfs_has_realtime(ip->i_mount)) + di_flags |= XFS_DIFLAG_REALTIME; + if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { + di_flags |= XFS_DIFLAG_EXTSIZE; + ip->i_extsize = pip->i_extsize; + } + } + if (pip->i_diflags & XFS_DIFLAG_PROJINHERIT) + di_flags |= XFS_DIFLAG_PROJINHERIT; + ip->i_diflags |= di_flags; +} + +/* + * Initialise a newly allocated inode and return the in-core inode to the + * caller locked exclusively. 
+ */ +static int +libxfs_init_new_inode( + struct xfs_trans *tp, + struct xfs_inode *pip, + xfs_ino_t ino, + umode_t mode, + xfs_nlink_t nlink, + dev_t rdev, + struct cred *cr, + struct fsxattr *fsx, + struct xfs_inode **ipp) +{ + struct xfs_inode *ip; + unsigned int flags; + int error; + + error = libxfs_iget(tp->t_mountp, tp, ino, 0, &ip); + if (error != 0) + return error; + ASSERT(ip != NULL); + + VFS_I(ip)->i_mode = mode; + set_nlink(VFS_I(ip), nlink); + i_uid_write(VFS_I(ip), cr->cr_uid); + i_gid_write(VFS_I(ip), cr->cr_gid); + ip->i_projid = pip ? 0 : fsx->fsx_projid; + xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG | XFS_ICHGTIME_MOD); + + if (pip && (VFS_I(pip)->i_mode & S_ISGID)) { + if (!(cr->cr_flags & CRED_FORCE_GID)) + VFS_I(ip)->i_gid = VFS_I(pip)->i_gid; + if ((VFS_I(pip)->i_mode & S_ISGID) && (mode & S_IFMT) == S_IFDIR) + VFS_I(ip)->i_mode |= S_ISGID; + } + + ip->i_disk_size = 0; + ip->i_df.if_nextents = 0; + ASSERT(ip->i_nblocks == 0); + ip->i_extsize = pip ? 0 : fsx->fsx_extsize; + ip->i_diflags = pip ? 0 : xfs_flags2diflags(ip, fsx->fsx_xflags); + + if (xfs_has_v3inodes(ip->i_mount)) { + VFS_I(ip)->i_version = 1; + ip->i_diflags2 = ip->i_mount->m_ino_geo.new_diflags2; + if (!pip) + ip->i_diflags2 = xfs_flags2diflags2(ip, + fsx->fsx_xflags); + ip->i_crtime = VFS_I(ip)->i_mtime; /* struct copy */ + ip->i_cowextsize = pip ? 
0 : fsx->fsx_cowextsize; + } + + flags = XFS_ILOG_CORE; + switch (mode & S_IFMT) { + case S_IFIFO: + case S_IFSOCK: + /* doesn't make sense to set an rdev for these */ + rdev = 0; + /* FALLTHROUGH */ + case S_IFCHR: + case S_IFBLK: + ip->i_df.if_format = XFS_DINODE_FMT_DEV; + flags |= XFS_ILOG_DEV; + VFS_I(ip)->i_rdev = rdev; + break; + case S_IFREG: + case S_IFDIR: + if (pip && (pip->i_diflags & XFS_DIFLAG_ANY)) + xfs_inode_propagate_flags(ip, pip); + /* FALLTHROUGH */ + case S_IFLNK: + ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS; + ip->i_df.if_bytes = 0; + ip->i_df.if_u1.if_root = NULL; + break; + default: + ASSERT(0); + } + + /* + * Log the new values stuffed into the inode. + */ + xfs_trans_ijoin(tp, ip, 0); + xfs_trans_log_inode(tp, ip, flags); + *ipp = ip; + return 0; +} + +/* + * Writes a modified inode's changes out to the inode's on disk home. + * Originally based on xfs_iflush_int() from xfs_inode.c in the kernel. + */ +int +libxfs_iflush_int( + struct xfs_inode *ip, + struct xfs_buf *bp) +{ + struct xfs_inode_log_item *iip; + struct xfs_dinode *dip; + struct xfs_mount *mp; + + ASSERT(ip->i_df.if_format != XFS_DINODE_FMT_BTREE || + ip->i_df.if_nextents > ip->i_df.if_ext_max); + + iip = ip->i_itemp; + mp = ip->i_mount; + + /* set *dip = inode's place in the buffer */ + dip = xfs_buf_offset(bp, ip->i_imap.im_boffset); + + if (XFS_ISREG(ip)) { + ASSERT( (ip->i_df.if_format == XFS_DINODE_FMT_EXTENTS) || + (ip->i_df.if_format == XFS_DINODE_FMT_BTREE) ); + } else if (XFS_ISDIR(ip)) { + ASSERT( (ip->i_df.if_format == XFS_DINODE_FMT_EXTENTS) || + (ip->i_df.if_format == XFS_DINODE_FMT_BTREE) || + (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL) ); + } + ASSERT(ip->i_df.if_nextents+ip->i_af.if_nextents <= ip->i_nblocks); + ASSERT(ip->i_forkoff <= mp->m_sb.sb_inodesize); + + /* bump the change count on v3 inodes */ + if (xfs_has_v3inodes(mp)) + VFS_I(ip)->i_version++; + + /* + * If there are inline format data / attr forks attached to this inode, + * make sure they are 
not corrupt. + */ + if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL && + xfs_ifork_verify_local_data(ip)) + return -EFSCORRUPTED; + if (xfs_inode_has_attr_fork(ip) && + ip->i_af.if_format == XFS_DINODE_FMT_LOCAL && + xfs_ifork_verify_local_attr(ip)) + return -EFSCORRUPTED; + + /* + * Copy the dirty parts of the inode into the on-disk + * inode. We always copy out the core of the inode, + * because if the inode is dirty at all the core must + * be. + */ + xfs_inode_to_disk(ip, dip, iip->ili_item.li_lsn); + + xfs_iflush_fork(ip, dip, iip, XFS_DATA_FORK); + if (xfs_inode_has_attr_fork(ip)) + xfs_iflush_fork(ip, dip, iip, XFS_ATTR_FORK); + + /* generate the checksum. */ + xfs_dinode_calc_crc(mp, dip); + + return 0; +} + +/* + * Wrapper around call to libxfs_ialloc. Takes care of committing and + * allocating a new transaction as needed. + * + * Originally there were two copies of this code - one in mkfs, the + * other in repair - now there is just the one. + */ +int +libxfs_dir_ialloc( + struct xfs_trans **tpp, + struct xfs_inode *dp, + mode_t mode, + nlink_t nlink, + xfs_dev_t rdev, + struct cred *cr, + struct fsxattr *fsx, + struct xfs_inode **ipp) +{ + xfs_ino_t parent_ino = dp ? dp->i_ino : 0; + xfs_ino_t ino; + int error; + + /* + * Call the space management code to pick the on-disk inode to be + * allocated. + */ + error = xfs_dialloc(tpp, parent_ino, mode, &ino); + if (error) + return error; + + return libxfs_init_new_inode(*tpp, dp, ino, mode, nlink, rdev, cr, + fsx, ipp); +} + +/* + * Inode cache stubs. 
+ */ + +struct kmem_cache *xfs_inode_cache; +extern struct kmem_cache *xfs_ili_cache; + +int +libxfs_iget( + struct xfs_mount *mp, + struct xfs_trans *tp, + xfs_ino_t ino, + uint lock_flags, + struct xfs_inode **ipp) +{ + struct xfs_inode *ip; + struct xfs_buf *bp; + int error = 0; + + ip = kmem_cache_zalloc(xfs_inode_cache, 0); + if (!ip) + return -ENOMEM; + + VFS_I(ip)->i_count = 1; + ip->i_ino = ino; + ip->i_mount = mp; + ip->i_af.if_format = XFS_DINODE_FMT_EXTENTS; + spin_lock_init(&VFS_I(ip)->i_lock); + + error = xfs_imap(mp, tp, ip->i_ino, &ip->i_imap, 0); + if (error) + goto out_destroy; + + error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &bp); + if (error) + goto out_destroy; + + error = xfs_inode_from_disk(ip, + xfs_buf_offset(bp, ip->i_imap.im_boffset)); + if (!error) + xfs_buf_set_ref(bp, XFS_INO_REF); + xfs_trans_brelse(tp, bp); + + if (error) + goto out_destroy; + + *ipp = ip; + return 0; + +out_destroy: + kmem_cache_free(xfs_inode_cache, ip); + *ipp = NULL; + return error; +} + +static void +libxfs_idestroy( + struct xfs_inode *ip) +{ + switch (VFS_I(ip)->i_mode & S_IFMT) { + case S_IFREG: + case S_IFDIR: + case S_IFLNK: + libxfs_idestroy_fork(&ip->i_df); + break; + } + + libxfs_ifork_zap_attr(ip); + + if (ip->i_cowfp) { + libxfs_idestroy_fork(ip->i_cowfp); + kmem_cache_free(xfs_ifork_cache, ip->i_cowfp); + } +} + +void +libxfs_irele( + struct xfs_inode *ip) +{ + VFS_I(ip)->i_count--; + + if (VFS_I(ip)->i_count == 0) { + ASSERT(ip->i_itemp == NULL); + libxfs_idestroy(ip); + kmem_cache_free(xfs_inode_cache, ip); + } +} diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c index c2dbc51f3f2..2c66b84ff83 100644 --- a/libxfs/rdwr.c +++ b/libxfs/rdwr.c @@ -1121,93 +1121,6 @@ xfs_verify_magic16( return dmagic == bp->b_ops->magic16[idx]; } -/* - * Inode cache stubs. 
- */ - -struct kmem_cache *xfs_inode_cache; -extern struct kmem_cache *xfs_ili_cache; - -int -libxfs_iget( - struct xfs_mount *mp, - struct xfs_trans *tp, - xfs_ino_t ino, - uint lock_flags, - struct xfs_inode **ipp) -{ - struct xfs_inode *ip; - struct xfs_buf *bp; - int error = 0; - - ip = kmem_cache_zalloc(xfs_inode_cache, 0); - if (!ip) - return -ENOMEM; - - VFS_I(ip)->i_count = 1; - ip->i_ino = ino; - ip->i_mount = mp; - ip->i_af.if_format = XFS_DINODE_FMT_EXTENTS; - spin_lock_init(&VFS_I(ip)->i_lock); - - error = xfs_imap(mp, tp, ip->i_ino, &ip->i_imap, 0); - if (error) - goto out_destroy; - - error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &bp); - if (error) - goto out_destroy; - - error = xfs_inode_from_disk(ip, - xfs_buf_offset(bp, ip->i_imap.im_boffset)); - if (!error) - xfs_buf_set_ref(bp, XFS_INO_REF); - xfs_trans_brelse(tp, bp); - - if (error) - goto out_destroy; - - *ipp = ip; - return 0; - -out_destroy: - kmem_cache_free(xfs_inode_cache, ip); - *ipp = NULL; - return error; -} - -static void -libxfs_idestroy(xfs_inode_t *ip) -{ - switch (VFS_I(ip)->i_mode & S_IFMT) { - case S_IFREG: - case S_IFDIR: - case S_IFLNK: - libxfs_idestroy_fork(&ip->i_df); - break; - } - - libxfs_ifork_zap_attr(ip); - - if (ip->i_cowfp) { - libxfs_idestroy_fork(ip->i_cowfp); - kmem_cache_free(xfs_ifork_cache, ip->i_cowfp); - } -} - -void -libxfs_irele( - struct xfs_inode *ip) -{ - VFS_I(ip)->i_count--; - - if (VFS_I(ip)->i_count == 0) { - ASSERT(ip->i_itemp == NULL); - libxfs_idestroy(ip); - kmem_cache_free(xfs_inode_cache, ip); - } -} - /* * Flush everything dirty in the kernel and disk write caches to stable media. * Returns 0 for success or a negative error code. diff --git a/libxfs/util.c b/libxfs/util.c index 6b888e9f996..51a0f513e7a 100644 --- a/libxfs/util.c +++ b/libxfs/util.c @@ -150,194 +150,6 @@ current_time(struct inode *inode) return tv; } -/* Propagate di_flags from a parent inode to a child inode. 
*/ -static void -xfs_inode_propagate_flags( - struct xfs_inode *ip, - const struct xfs_inode *pip) -{ - unsigned int di_flags = 0; - umode_t mode = VFS_I(ip)->i_mode; - - if ((mode & S_IFMT) == S_IFDIR) { - if (pip->i_diflags & XFS_DIFLAG_RTINHERIT) - di_flags |= XFS_DIFLAG_RTINHERIT; - if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { - di_flags |= XFS_DIFLAG_EXTSZINHERIT; - ip->i_extsize = pip->i_extsize; - } - } else { - if ((pip->i_diflags & XFS_DIFLAG_RTINHERIT) && - xfs_has_realtime(ip->i_mount)) - di_flags |= XFS_DIFLAG_REALTIME; - if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { - di_flags |= XFS_DIFLAG_EXTSIZE; - ip->i_extsize = pip->i_extsize; - } - } - if (pip->i_diflags & XFS_DIFLAG_PROJINHERIT) - di_flags |= XFS_DIFLAG_PROJINHERIT; - ip->i_diflags |= di_flags; -} - -/* - * Initialise a newly allocated inode and return the in-core inode to the - * caller locked exclusively. - */ -static int -libxfs_init_new_inode( - struct xfs_trans *tp, - struct xfs_inode *pip, - xfs_ino_t ino, - umode_t mode, - xfs_nlink_t nlink, - dev_t rdev, - struct cred *cr, - struct fsxattr *fsx, - struct xfs_inode **ipp) -{ - struct xfs_inode *ip; - unsigned int flags; - int error; - - error = libxfs_iget(tp->t_mountp, tp, ino, 0, &ip); - if (error != 0) - return error; - ASSERT(ip != NULL); - - VFS_I(ip)->i_mode = mode; - set_nlink(VFS_I(ip), nlink); - i_uid_write(VFS_I(ip), cr->cr_uid); - i_gid_write(VFS_I(ip), cr->cr_gid); - ip->i_projid = pip ? 0 : fsx->fsx_projid; - xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG | XFS_ICHGTIME_MOD); - - if (pip && (VFS_I(pip)->i_mode & S_ISGID)) { - if (!(cr->cr_flags & CRED_FORCE_GID)) - VFS_I(ip)->i_gid = VFS_I(pip)->i_gid; - if ((VFS_I(pip)->i_mode & S_ISGID) && (mode & S_IFMT) == S_IFDIR) - VFS_I(ip)->i_mode |= S_ISGID; - } - - ip->i_disk_size = 0; - ip->i_df.if_nextents = 0; - ASSERT(ip->i_nblocks == 0); - ip->i_extsize = pip ? 0 : fsx->fsx_extsize; - ip->i_diflags = pip ? 
0 : xfs_flags2diflags(ip, fsx->fsx_xflags); - - if (xfs_has_v3inodes(ip->i_mount)) { - VFS_I(ip)->i_version = 1; - ip->i_diflags2 = ip->i_mount->m_ino_geo.new_diflags2; - if (!pip) - ip->i_diflags2 = xfs_flags2diflags2(ip, - fsx->fsx_xflags); - ip->i_crtime = VFS_I(ip)->i_mtime; /* struct copy */ - ip->i_cowextsize = pip ? 0 : fsx->fsx_cowextsize; - } - - flags = XFS_ILOG_CORE; - switch (mode & S_IFMT) { - case S_IFIFO: - case S_IFSOCK: - /* doesn't make sense to set an rdev for these */ - rdev = 0; - /* FALLTHROUGH */ - case S_IFCHR: - case S_IFBLK: - ip->i_df.if_format = XFS_DINODE_FMT_DEV; - flags |= XFS_ILOG_DEV; - VFS_I(ip)->i_rdev = rdev; - break; - case S_IFREG: - case S_IFDIR: - if (pip && (pip->i_diflags & XFS_DIFLAG_ANY)) - xfs_inode_propagate_flags(ip, pip); - /* FALLTHROUGH */ - case S_IFLNK: - ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS; - ip->i_df.if_bytes = 0; - ip->i_df.if_u1.if_root = NULL; - break; - default: - ASSERT(0); - } - - /* - * Log the new values stuffed into the inode. - */ - xfs_trans_ijoin(tp, ip, 0); - xfs_trans_log_inode(tp, ip, flags); - *ipp = ip; - return 0; -} - -/* - * Writes a modified inode's changes out to the inode's on disk home. - * Originally based on xfs_iflush_int() from xfs_inode.c in the kernel. 
- */ -int -libxfs_iflush_int( - xfs_inode_t *ip, - struct xfs_buf *bp) -{ - struct xfs_inode_log_item *iip; - struct xfs_dinode *dip; - xfs_mount_t *mp; - - ASSERT(ip->i_df.if_format != XFS_DINODE_FMT_BTREE || - ip->i_df.if_nextents > ip->i_df.if_ext_max); - - iip = ip->i_itemp; - mp = ip->i_mount; - - /* set *dip = inode's place in the buffer */ - dip = xfs_buf_offset(bp, ip->i_imap.im_boffset); - - if (XFS_ISREG(ip)) { - ASSERT( (ip->i_df.if_format == XFS_DINODE_FMT_EXTENTS) || - (ip->i_df.if_format == XFS_DINODE_FMT_BTREE) ); - } else if (XFS_ISDIR(ip)) { - ASSERT( (ip->i_df.if_format == XFS_DINODE_FMT_EXTENTS) || - (ip->i_df.if_format == XFS_DINODE_FMT_BTREE) || - (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL) ); - } - ASSERT(ip->i_df.if_nextents+ip.i_af->if_nextents <= ip->i_nblocks); - ASSERT(ip->i_forkoff <= mp->m_sb.sb_inodesize); - - /* bump the change count on v3 inodes */ - if (xfs_has_v3inodes(mp)) - VFS_I(ip)->i_version++; - - /* - * If there are inline format data / attr forks attached to this inode, - * make sure they are not corrupt. - */ - if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL && - xfs_ifork_verify_local_data(ip)) - return -EFSCORRUPTED; - if (xfs_inode_has_attr_fork(ip) && - ip->i_af.if_format == XFS_DINODE_FMT_LOCAL && - xfs_ifork_verify_local_attr(ip)) - return -EFSCORRUPTED; - - /* - * Copy the dirty parts of the inode into the on-disk - * inode. We always copy out the core of the inode, - * because if the inode is dirty at all the core must - * be. - */ - xfs_inode_to_disk(ip, dip, iip->ili_item.li_lsn); - - xfs_iflush_fork(ip, dip, iip, XFS_DATA_FORK); - if (xfs_inode_has_attr_fork(ip)) - xfs_iflush_fork(ip, dip, iip, XFS_ATTR_FORK); - - /* generate the checksum. */ - xfs_dinode_calc_crc(mp, dip); - - return 0; -} - int libxfs_mod_incore_sb( struct xfs_mount *mp, @@ -442,40 +254,6 @@ libxfs_alloc_file_space( return error; } -/* - * Wrapper around call to libxfs_ialloc. 
Takes care of committing and - * allocating a new transaction as needed. - * - * Originally there were two copies of this code - one in mkfs, the - * other in repair - now there is just the one. - */ -int -libxfs_dir_ialloc( - struct xfs_trans **tpp, - struct xfs_inode *dp, - mode_t mode, - nlink_t nlink, - xfs_dev_t rdev, - struct cred *cr, - struct fsxattr *fsx, - struct xfs_inode **ipp) -{ - xfs_ino_t parent_ino = dp ? dp->i_ino : 0; - xfs_ino_t ino; - int error; - - /* - * Call the space management code to pick the on-disk inode to be - * allocated. - */ - error = xfs_dialloc(tpp, parent_ino, mode, &ino); - if (error) - return error; - - return libxfs_init_new_inode(*tpp, dp, ino, mode, nlink, rdev, cr, - fsx, ipp); -} - void cmn_err(int level, char *fmt, ...) { ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 10/26] libxfs: when creating a file in a directory, set the project id based on the parent 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 04/26] libxfs: put all the inode functions in a single file Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 09/26] libxfs: set access time when creating files Darrick J. Wong ` (18 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When we're creating a file as a child of an existing directory, use xfs_get_initial_prid to have the child inherit the project id of the directory if the directory has PROJINHERIT set, just like the kernel does. This fixes mkfs project id propagation with -d projinherit=X when protofiles are in use. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/inode.c | 1 + libxfs/libxfs_api_defs.h | 1 + 2 files changed, 2 insertions(+) diff --git a/libxfs/inode.c b/libxfs/inode.c index 7f8f1164e08..c63cc0543d6 100644 --- a/libxfs/inode.c +++ b/libxfs/inode.c @@ -231,6 +231,7 @@ libxfs_dir_ialloc( .pip = dp, .uid = make_kuid(cr->cr_uid), .gid = make_kgid(cr->cr_gid), + .prid = dp ? 
libxfs_get_initial_prid(dp) : 0, .nlink = nlink, .rdev = rdev, .mode = mode, diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 01ad6e54624..5752733a833 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -136,6 +136,7 @@ #define xfs_free_perag libxfs_free_perag #define xfs_fs_geometry libxfs_fs_geometry #define xfs_get_projid libxfs_get_projid +#define xfs_get_initial_prid libxfs_get_initial_prid #define xfs_highbit32 libxfs_highbit32 #define xfs_highbit64 libxfs_highbit64 #define xfs_ialloc_calc_rootino libxfs_ialloc_calc_rootino ^ permalink raw reply related [flat|nested] 565+ messages in thread
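For readers following along, the inheritance rule described in this commit message can be sketched in isolation. This is a minimal model, not the libxfs code: `toy_inode`, `toy_get_initial_prid`, `toy_child_prid`, and the `TOY_*` constants are hypothetical stand-ins, and the flag bit position is an assumption for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in the XFS headers. */
#define TOY_DIFLAG_PROJINHERIT	(1U << 9)	/* assumed bit position */
#define TOY_PROJID_DEFAULT	0U

struct toy_inode {
	uint32_t	diflags;	/* inode flags */
	uint32_t	projid;		/* project quota id */
};

/* A child takes the directory's project id only if PROJINHERIT is set. */
uint32_t
toy_get_initial_prid(const struct toy_inode *dp)
{
	assert(dp != NULL);
	if (dp->diflags & TOY_DIFLAG_PROJINHERIT)
		return dp->projid;
	return TOY_PROJID_DEFAULT;
}

/* Mirrors the shape of the ".prid = dp ? ... : 0" initializer in the patch. */
uint32_t
toy_child_prid(const struct toy_inode *dp)
{
	return dp ? toy_get_initial_prid(dp) : TOY_PROJID_DEFAULT;
}
```

With a parent whose `diflags` has the inherit bit set and `projid` of 42, `toy_child_prid()` returns 42. Without the bit, or with no parent at all, it returns the default id.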
* [PATCH 09/26] libxfs: set access time when creating files
  2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong
                     ` (6 preceding siblings ...)
  2022-12-30 22:19   ` [PATCH 10/26] libxfs: when creating a file in a directory, set the project id based on the parent Darrick J. Wong
@ 2022-12-30 22:19   ` Darrick J. Wong
  2022-12-30 22:19   ` [PATCH 07/26] libxfs: implement access timestamp updates in ichgtime Darrick J. Wong
                     ` (17 subsequent siblings)
  25 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Set the access time on files that we're creating, to match the behavior
of the kernel.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/inode.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libxfs/inode.c b/libxfs/inode.c
index c14a4c5a27f..7f8f1164e08 100644
--- a/libxfs/inode.c
+++ b/libxfs/inode.c
@@ -73,7 +73,8 @@ libxfs_icreate(
 	struct xfs_inode	*pip = args->pip;
 	struct xfs_inode	*ip;
 	unsigned int		flags;
-	int			times = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG;
+	int			times = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG |
+					XFS_ICHGTIME_ACCESS;
 	int			error;
 
 	error = libxfs_iget(tp->t_mountp, tp, ino, 0, &ip);

^ permalink raw reply related	[flat|nested] 565+ messages in thread
* [PATCH 07/26] libxfs: implement access timestamp updates in ichgtime
  2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong
                     ` (7 preceding siblings ...)
  2022-12-30 22:19   ` [PATCH 09/26] libxfs: set access time when creating files Darrick J. Wong
@ 2022-12-30 22:19   ` Darrick J. Wong
  2022-12-30 22:19   ` [PATCH 08/26] libxfs: rearrange libxfs_trans_ichgtime call when creating inodes Darrick J. Wong
                     ` (16 subsequent siblings)
  25 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Implement access time updates in ichgtime so that we can use the common
ichgtime function when setting up inodes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_shared.h      |    1 +
 libxfs/xfs_trans_inode.c |    2 ++
 2 files changed, 3 insertions(+)

diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h
index 5127fa88531..acf527eb0e1 100644
--- a/libxfs/xfs_shared.h
+++ b/libxfs/xfs_shared.h
@@ -137,6 +137,7 @@ void xfs_log_get_max_trans_res(struct xfs_mount *mp,
 #define	XFS_ICHGTIME_MOD	0x1	/* data fork modification timestamp */
 #define	XFS_ICHGTIME_CHG	0x2	/* inode field change timestamp */
 #define	XFS_ICHGTIME_CREATE	0x4	/* inode create timestamp */
+#define	XFS_ICHGTIME_ACCESS	0x8	/* last access timestamp */
 
 /* Computed inode geometry for the filesystem. */
 struct xfs_ino_geometry {
diff --git a/libxfs/xfs_trans_inode.c b/libxfs/xfs_trans_inode.c
index 276d57cf737..6fc7a65d517 100644
--- a/libxfs/xfs_trans_inode.c
+++ b/libxfs/xfs_trans_inode.c
@@ -66,6 +66,8 @@ xfs_trans_ichgtime(
 		inode->i_mtime = tv;
 	if (flags & XFS_ICHGTIME_CHG)
 		inode->i_ctime = tv;
+	if (flags & XFS_ICHGTIME_ACCESS)
+		inode->i_atime = tv;
 	if (flags & XFS_ICHGTIME_CREATE)
 		ip->i_crtime = tv;
 }

^ permalink raw reply related	[flat|nested] 565+ messages in thread
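The flag-driven update in this patch can be modelled on its own. What follows is a sketch under stated assumptions, not the libxfs implementation: `toy_inode` and `toy_ichgtime` are hypothetical names, the `TOY_ICHGTIME_*` values simply copy the ones in the xfs_shared.h hunk, and the timestamp is passed in by the caller, whereas the real function works on the inode's own timestamps inside a transaction.

```c
#include <assert.h>
#include <time.h>

/* Same bit values as the XFS_ICHGTIME_* definitions in the patch. */
#define TOY_ICHGTIME_MOD	0x1	/* data modification timestamp */
#define TOY_ICHGTIME_CHG	0x2	/* inode change timestamp */
#define TOY_ICHGTIME_CREATE	0x4	/* creation timestamp */
#define TOY_ICHGTIME_ACCESS	0x8	/* last access timestamp */

/* Hypothetical in-memory inode holding only the four timestamps. */
struct toy_inode {
	struct timespec	mtime;
	struct timespec	ctime;
	struct timespec	atime;
	struct timespec	crtime;
};

/* Stamp every timestamp selected in @flags with the same value @tv. */
void
toy_ichgtime(struct toy_inode *ip, int flags, struct timespec tv)
{
	assert(ip != NULL);
	if (flags & TOY_ICHGTIME_MOD)
		ip->mtime = tv;
	if (flags & TOY_ICHGTIME_CHG)
		ip->ctime = tv;
	if (flags & TOY_ICHGTIME_ACCESS)
		ip->atime = tv;
	if (flags & TOY_ICHGTIME_CREATE)
		ip->crtime = tv;
}
```

Because each timestamp is gated by its own bit, a caller can OR together the set it wants and make a single call, and all selected fields end up with an identical value.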
* [PATCH 08/26] libxfs: rearrange libxfs_trans_ichgtime call when creating inodes
  2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong
                     ` (8 preceding siblings ...)
  2022-12-30 22:19   ` [PATCH 07/26] libxfs: implement access timestamp updates in ichgtime Darrick J. Wong
@ 2022-12-30 22:19   ` Darrick J. Wong
  2022-12-30 22:19   ` [PATCH 11/26] libxfs: pass flags2 from parent to child when creating files Darrick J. Wong
                     ` (15 subsequent siblings)
  25 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Rearrange the libxfs_trans_ichgtime call in libxfs_ialloc so that we
call it once with the flags we want.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/inode.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/libxfs/inode.c b/libxfs/inode.c
index 63150422b01..c14a4c5a27f 100644
--- a/libxfs/inode.c
+++ b/libxfs/inode.c
@@ -73,6 +73,7 @@ libxfs_icreate(
 	struct xfs_inode	*pip = args->pip;
 	struct xfs_inode	*ip;
 	unsigned int		flags;
+	int			times = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG;
 	int			error;
 
 	error = libxfs_iget(tp->t_mountp, tp, ino, 0, &ip);
@@ -84,7 +85,6 @@ libxfs_icreate(
 	set_nlink(VFS_I(ip), args->nlink);
 	VFS_I(ip)->i_uid = args->uid;
 	ip->i_projid = args->prid;
-	xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG | XFS_ICHGTIME_MOD);
 
 	if (pip && (VFS_I(pip)->i_mode & S_ISGID)) {
 		if (!(args->flags & XFS_ICREATE_ARGS_FORCE_GID))
@@ -102,10 +102,12 @@ libxfs_icreate(
 	if (xfs_has_v3inodes(ip->i_mount)) {
 		VFS_I(ip)->i_version = 1;
 		ip->i_diflags2 = ip->i_mount->m_ino_geo.new_diflags2;
-		ip->i_crtime = VFS_I(ip)->i_mtime;
 		ip->i_cowextsize = 0;
+		times |= XFS_ICHGTIME_CREATE;
 	}
 
+	xfs_trans_ichgtime(tp, ip, times);
+
 	flags = XFS_ILOG_CORE;
 	switch (args->mode & S_IFMT) {
 	case S_IFIFO:

^ permalink raw reply related	[flat|nested] 565+ messages in thread
* [PATCH 11/26] libxfs: pass flags2 from parent to child when creating files 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (9 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 08/26] libxfs: rearrange libxfs_trans_ichgtime call when creating inodes Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 12/26] libxfs: split new inode creation into two pieces Darrick J. Wong ` (14 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When mkfs creates a new file as a child of an existing directory, we should propagate the flags2 field from parent to child like the kernel does. This ensures that mkfs propagates cowextsize hints properly when protofiles are in use. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/inode.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/libxfs/inode.c b/libxfs/inode.c index c63cc0543d6..9835c708021 100644 --- a/libxfs/inode.c +++ b/libxfs/inode.c @@ -59,6 +59,20 @@ xfs_inode_propagate_flags( ip->i_diflags |= di_flags; } +/* Propagate di_flags2 from a parent inode to a child inode. */ +static void +xfs_inode_inherit_flags2( + struct xfs_inode *ip, + const struct xfs_inode *pip) +{ + if (pip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) { + ip->i_diflags2 |= XFS_DIFLAG2_COWEXTSIZE; + ip->i_cowextsize = pip->i_cowextsize; + } + if (pip->i_diflags2 & XFS_DIFLAG2_DAX) + ip->i_diflags2 |= XFS_DIFLAG2_DAX; +} + /* * Initialise a newly allocated inode and return the in-core inode to the * caller locked exclusively. 
@@ -123,6 +137,8 @@ libxfs_icreate( case S_IFDIR: if (pip && (pip->i_diflags & XFS_DIFLAG_ANY)) xfs_inode_propagate_flags(ip, pip); + if (pip && (pip->i_diflags2 & XFS_DIFLAG2_ANY)) + xfs_inode_inherit_flags2(ip, pip); /* FALLTHROUGH */ case S_IFLNK: ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 12/26] libxfs: split new inode creation into two pieces 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (10 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 11/26] libxfs: pass flags2 from parent to child when creating files Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 14/26] libxfs: remove libxfs_dir_ialloc Darrick J. Wong ` (13 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> There are two parts to initializing a newly allocated inode: setting up the incore structures, and initializing the new inode core based on the parent inode and the current user's environment. The initialization code is not specific to the kernel, so we would like to share that with userspace by hoisting it to libxfs. Therefore, split xfs_icreate into separate functions to prepare for the next few patches. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/inode.c | 42 ++++++++++++++++++++++++++---------------- libxfs/xfs_ialloc.c | 20 ++++++++++++++++++-- 2 files changed, 44 insertions(+), 18 deletions(-) diff --git a/libxfs/inode.c b/libxfs/inode.c index 9835c708021..44d889f3f0f 100644 --- a/libxfs/inode.c +++ b/libxfs/inode.c @@ -73,28 +73,17 @@ xfs_inode_inherit_flags2( ip->i_diflags2 |= XFS_DIFLAG2_DAX; } -/* - * Initialise a newly allocated inode and return the in-core inode to the - * caller locked exclusively. - */ -static int -libxfs_icreate( +/* Initialise an inode's attributes. 
*/ +static void +xfs_inode_init( struct xfs_trans *tp, - xfs_ino_t ino, const struct xfs_icreate_args *args, - struct xfs_inode **ipp) + struct xfs_inode *ip) { struct xfs_inode *pip = args->pip; - struct xfs_inode *ip; unsigned int flags; int times = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG | XFS_ICHGTIME_ACCESS; - int error; - - error = libxfs_iget(tp->t_mountp, tp, ino, 0, &ip); - if (error != 0) - return error; - ASSERT(ip != NULL); VFS_I(ip)->i_mode = args->mode; set_nlink(VFS_I(ip), args->nlink); @@ -154,7 +143,28 @@ libxfs_icreate( */ xfs_trans_ijoin(tp, ip, 0); xfs_trans_log_inode(tp, ip, flags); - *ipp = ip; +} + +/* + * Initialise a newly allocated inode and return the in-core inode to the + * caller locked exclusively. + */ +static int +libxfs_icreate( + struct xfs_trans *tp, + xfs_ino_t ino, + const struct xfs_icreate_args *args, + struct xfs_inode **ipp) +{ + struct xfs_mount *mp = tp->t_mountp; + int error; + + error = libxfs_iget(mp, tp, ino, 0, ipp); + if (error) + return error; + + ASSERT(*ipp != NULL); + xfs_inode_init(tp, args, *ipp); return 0; } diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c index 3d55f5d1f46..9ce36b2cd8d 100644 --- a/libxfs/xfs_ialloc.c +++ b/libxfs/xfs_ialloc.c @@ -1865,9 +1865,25 @@ xfs_dialloc( } xfs_perag_put(pag); } + if (error) + goto out; - if (!error) - *new_ino = ino; + /* + * Protect against obviously corrupt allocation btree records. Later + * xfs_iget checks will catch re-allocation of other active in-memory + * and on-disk inodes. If we don't catch reallocating the parent inode + * here we will deadlock in xfs_iget() so we have to do these checks + * first. + */ + if (ino == parent || !xfs_verify_dir_ino(mp, ino)) { + xfs_alert(mp, "Allocated a known in-use inode 0x%llx!", ino); + xfs_ag_mark_sick(pag, XFS_SICK_AG_INOBT); + error = -EFSCORRUPTED; + goto out; + } + + *new_ino = ino; +out: xfs_perag_put(pag); return error; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
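The two-piece split described in this commit message (obtain the in-core object, then initialise its attributes) can be sketched as follows. This is a toy model under stated assumptions: `toy_iget`, `toy_inode_init`, `toy_icreate`, and both structs are hypothetical, and plain `calloc` stands in for the inode cache.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical creation arguments and in-memory inode. */
struct toy_args {
	uint32_t	mode;
	uint32_t	nlink;
	uint32_t	prid;
};

struct toy_inode {
	uint64_t	ino;
	uint32_t	mode;
	uint32_t	nlink;
	uint32_t	prid;
};

/* Piece 1: set up the in-memory structure only. This step can fail. */
int
toy_iget(uint64_t ino, struct toy_inode **ipp)
{
	struct toy_inode	*ip = calloc(1, sizeof(*ip));

	if (!ip)
		return -ENOMEM;
	ip->ino = ino;
	*ipp = ip;
	return 0;
}

/* Piece 2: fill in attributes from @args. No allocation, cannot fail. */
void
toy_inode_init(const struct toy_args *args, struct toy_inode *ip)
{
	ip->mode = args->mode;
	ip->nlink = args->nlink;
	ip->prid = args->prid;
}

/* The wrapper keeps the old single-call interface by composing both. */
int
toy_icreate(uint64_t ino, const struct toy_args *args,
	    struct toy_inode **ipp)
{
	int	error;

	error = toy_iget(ino, ipp);
	if (error)
		return error;

	assert(*ipp != NULL);
	toy_inode_init(args, *ipp);
	return 0;
}
```

Keeping the fallible lookup separate from the infallible attribute setup means the second piece depends only on the arguments and the inode, which is the property that makes it a candidate for sharing between environments.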
* [PATCH 14/26] libxfs: remove libxfs_dir_ialloc 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (11 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 12/26] libxfs: split new inode creation into two pieces Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 13/26] libxfs: backport inode init code from the kernel Darrick J. Wong ` (12 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> This function no longer exists in the kernel, and it's not really needed in userspace either. There are two users of it: repair and mkfs. Repair passes in zeroed cred and fsxattr structures so it can call libxfs_dialloc and libxfs_icreate directly. For mkfs we'll move the guts of libxfs_dir_ialloc into proto.c as a creatproto function that takes care of all that, and move struct cred to mkfs since it's now the only user. This gets us ready to hoist the rest of the inode initialization code to libxfs for metadata directories. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/xfs_inode.h | 15 ++----- libxfs/inode.c | 79 ++++++++------------------------------- libxfs/libxfs_api_defs.h | 1 mkfs/proto.c | 94 +++++++++++++++++++++++++++++++++++++--------- repair/phase6.c | 48 +++++++++++++---------- 5 files changed, 124 insertions(+), 113 deletions(-) diff --git a/include/xfs_inode.h b/include/xfs_inode.h index 4e8a3dc6fd8..03add740fa7 100644 --- a/include/xfs_inode.h +++ b/include/xfs_inode.h @@ -268,17 +268,6 @@ static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip) return false; } -/* Always set the child's GID to this value, even if the parent is setgid. 
*/ -#define CRED_FORCE_GID (1U << 0) -struct cred { - uid_t cr_uid; - gid_t cr_gid; - unsigned int cr_flags; -}; - -extern int libxfs_dir_ialloc (struct xfs_trans **, struct xfs_inode *, - mode_t, nlink_t, xfs_dev_t, struct cred *, - struct fsxattr *, struct xfs_inode **); extern void libxfs_trans_inode_alloc_buf (struct xfs_trans *, struct xfs_buf *); @@ -286,6 +275,10 @@ extern void libxfs_trans_ichgtime(struct xfs_trans *, struct xfs_inode *, int); extern int libxfs_iflush_int (struct xfs_inode *, struct xfs_buf *); +int libxfs_icreate(struct xfs_trans *tp, xfs_ino_t ino, + const struct xfs_icreate_args *args, struct xfs_inode **ipp); +void libxfs_icreate_args_rootfile(struct xfs_icreate_args *args, umode_t mode); + extern struct timespec64 current_time(struct inode *inode); /* Inode Cache Interfaces */ diff --git a/libxfs/inode.c b/libxfs/inode.c index d311abafd79..c1fb622f306 100644 --- a/libxfs/inode.c +++ b/libxfs/inode.c @@ -179,7 +179,7 @@ xfs_inode_init( * Initialise a newly allocated inode and return the in-core inode to the * caller locked exclusively. */ -static int +int libxfs_icreate( struct xfs_trans *tp, xfs_ino_t ino, @@ -198,6 +198,22 @@ libxfs_icreate( return 0; } +/* Set up inode attributes for newly created internal files. */ +void +libxfs_icreate_args_rootfile( + struct xfs_icreate_args *args, + umode_t mode) +{ + args->mnt_userns = NULL; + args->uid = make_kuid(0); + args->gid = make_kgid(0); + args->prid = 0; + args->mode = mode; + args->flags = XFS_ICREATE_ARGS_FORCE_UID | + XFS_ICREATE_ARGS_FORCE_GID | + XFS_ICREATE_ARGS_FORCE_MODE; +} + /* * Writes a modified inode's changes out to the inode's on disk home. * Originally based on xfs_iflush_int() from xfs_inode.c in the kernel. @@ -265,67 +281,6 @@ libxfs_iflush_int( return 0; } -/* - * Wrapper around call to libxfs_ialloc. Takes care of committing and - * allocating a new transaction as needed. 
- * - * Originally there were two copies of this code - one in mkfs, the - * other in repair - now there is just the one. - */ -int -libxfs_dir_ialloc( - struct xfs_trans **tpp, - struct xfs_inode *dp, - mode_t mode, - nlink_t nlink, - xfs_dev_t rdev, - struct cred *cr, - struct fsxattr *fsx, - struct xfs_inode **ipp) -{ - struct xfs_icreate_args args = { - .pip = dp, - .uid = make_kuid(cr->cr_uid), - .gid = make_kgid(cr->cr_gid), - .prid = dp ? libxfs_get_initial_prid(dp) : 0, - .nlink = nlink, - .rdev = rdev, - .mode = mode, - .flags = XFS_ICREATE_ARGS_FORCE_UID | - XFS_ICREATE_ARGS_FORCE_GID | - XFS_ICREATE_ARGS_FORCE_MODE, - }; - struct xfs_inode *ip; - xfs_ino_t parent_ino = dp ? dp->i_ino : 0; - xfs_ino_t ino; - int error; - - /* - * Call the space management code to pick the on-disk inode to be - * allocated. - */ - error = xfs_dialloc(tpp, parent_ino, mode, &ino); - if (error) - return error; - - error = libxfs_icreate(*tpp, ino, &args, ipp); - if (error || dp) - return error; - - /* If there is no parent dir, initialize the file from fsxattr data. */ - ip = *ipp; - ip->i_projid = fsx->fsx_projid; - ip->i_extsize = fsx->fsx_extsize; - ip->i_diflags = xfs_flags2diflags(ip, fsx->fsx_xflags); - - if (xfs_has_v3inodes(ip->i_mount)) { - ip->i_diflags2 = xfs_flags2diflags2(ip, fsx->fsx_xflags); - ip->i_cowextsize = fsx->fsx_cowextsize; - } - xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE); - return 0; -} - /* * Inode cache stubs. 
*/ diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 5752733a833..782a551ee1c 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -91,6 +91,7 @@ #define xfs_da_shrink_inode libxfs_da_shrink_inode #define xfs_defer_cancel libxfs_defer_cancel #define xfs_defer_finish libxfs_defer_finish +#define xfs_dialloc libxfs_dialloc #define xfs_dinode_calc_crc libxfs_dinode_calc_crc #define xfs_dinode_good_version libxfs_dinode_good_version #define xfs_dinode_verify libxfs_dinode_verify diff --git a/mkfs/proto.c b/mkfs/proto.c index bd306f95568..b60def70652 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -351,6 +351,65 @@ newdirectory( fail(_("directory create error"), error); } +struct cred { + uid_t cr_uid; + gid_t cr_gid; +}; + +static int +creatproto( + struct xfs_trans **tpp, + struct xfs_inode *dp, + mode_t mode, + nlink_t nlink, + xfs_dev_t rdev, + struct cred *cr, + struct fsxattr *fsx, + struct xfs_inode **ipp) +{ + struct xfs_icreate_args args = { + .pip = dp, + .uid = make_kuid(cr->cr_uid), + .gid = make_kgid(cr->cr_gid), + .prid = dp ? libxfs_get_initial_prid(dp) : 0, + .nlink = nlink, + .rdev = rdev, + .mode = mode, + .flags = XFS_ICREATE_ARGS_FORCE_UID | + XFS_ICREATE_ARGS_FORCE_GID | + XFS_ICREATE_ARGS_FORCE_MODE, + }; + struct xfs_inode *ip; + xfs_ino_t parent_ino = dp ? dp->i_ino : 0; + xfs_ino_t ino; + int error; + + /* + * Call the space management code to pick the on-disk inode to be + * allocated. + */ + error = -libxfs_dialloc(tpp, parent_ino, mode, &ino); + if (error) + return error; + + error = -libxfs_icreate(*tpp, ino, &args, ipp); + if (error || dp) + return error; + + /* If there is no parent dir, initialize the file from fsxattr data. 
*/ + ip = *ipp; + ip->i_projid = fsx->fsx_projid; + ip->i_extsize = fsx->fsx_extsize; + ip->i_diflags = xfs_flags2diflags(ip, fsx->fsx_xflags); + + if (xfs_has_v3inodes(ip->i_mount)) { + ip->i_diflags2 = xfs_flags2diflags2(ip, fsx->fsx_xflags); + ip->i_cowextsize = fsx->fsx_cowextsize; + } + libxfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE); + return 0; +} + static void parseproto( xfs_mount_t *mp, @@ -450,7 +509,6 @@ parseproto( mode |= val; creds.cr_uid = (int)getnum(getstr(pp), 0, 0, false); creds.cr_gid = (int)getnum(getstr(pp), 0, 0, false); - creds.cr_flags = CRED_FORCE_GID; xname.name = (unsigned char *)name; xname.len = name ? strlen(name) : 0; xname.type = 0; @@ -459,8 +517,8 @@ parseproto( case IF_REGULAR: buf = newregfile(pp, &len); tp = getres(mp, XFS_B_TO_FSB(mp, len)); - error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFREG, 1, 0, - &creds, fsxp, &ip); + error = creatproto(&tp, pip, mode | S_IFREG, 1, 0, &creds, + fsxp, &ip); if (error) fail(_("Inode allocation failed"), error); writefile(tp, ip, buf, len); @@ -483,8 +541,8 @@ parseproto( } tp = getres(mp, XFS_B_TO_FSB(mp, llen)); - error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFREG, 1, 0, - &creds, fsxp, &ip); + error = creatproto(&tp, pip, mode | S_IFREG, 1, 0, &creds, + fsxp, &ip); if (error) fail(_("Inode pre-allocation failed"), error); @@ -504,7 +562,7 @@ parseproto( tp = getres(mp, 0); majdev = getnum(getstr(pp), 0, 0, false); mindev = getnum(getstr(pp), 0, 0, false); - error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFBLK, 1, + error = creatproto(&tp, pip, mode | S_IFBLK, 1, IRIX_MKDEV(majdev, mindev), &creds, fsxp, &ip); if (error) { fail(_("Inode allocation failed"), error); @@ -519,7 +577,7 @@ parseproto( tp = getres(mp, 0); majdev = getnum(getstr(pp), 0, 0, false); mindev = getnum(getstr(pp), 0, 0, false); - error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFCHR, 1, + error = creatproto(&tp, pip, mode | S_IFCHR, 1, IRIX_MKDEV(majdev, mindev), &creds, fsxp, &ip); if (error) fail(_("Inode allocation 
failed"), error); @@ -531,8 +589,8 @@ parseproto( case IF_FIFO: tp = getres(mp, 0); - error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFIFO, 1, 0, - &creds, fsxp, &ip); + error = creatproto(&tp, pip, mode | S_IFIFO, 1, 0, &creds, + fsxp, &ip); if (error) fail(_("Inode allocation failed"), error); libxfs_trans_ijoin(tp, pip, 0); @@ -543,8 +601,8 @@ parseproto( buf = getstr(pp); len = (int)strlen(buf); tp = getres(mp, XFS_B_TO_FSB(mp, len)); - error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFLNK, 1, 0, - &creds, fsxp, &ip); + error = creatproto(&tp, pip, mode | S_IFLNK, 1, 0, &creds, + fsxp, &ip); if (error) fail(_("Inode allocation failed"), error); writesymlink(tp, ip, buf, len); @@ -554,8 +612,8 @@ parseproto( break; case IF_DIRECTORY: tp = getres(mp, 0); - error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFDIR, 1, 0, - &creds, fsxp, &ip); + error = creatproto(&tp, pip, mode | S_IFDIR, 1, 0, &creds, + fsxp, &ip); if (error) fail(_("Inode allocation failed"), error); inc_nlink(VFS_I(ip)); /* account for . */ @@ -646,14 +704,14 @@ rtinit( memset(&creds, 0, sizeof(creds)); memset(&fsxattrs, 0, sizeof(fsxattrs)); - error = -libxfs_dir_ialloc(&tp, NULL, S_IFREG, 1, 0, - &creds, &fsxattrs, &rbmip); + error = creatproto(&tp, NULL, S_IFREG, 1, 0, &creds, &fsxattrs, + &rbmip); if (error) { fail(_("Realtime bitmap inode allocation failed"), error); } /* * Do our thing with rbmip before allocating rsumip, - * because the next call to ialloc() may + * because the next call to createproto may * commit the transaction in which rbmip was allocated. 
*/ mp->m_sb.sb_rbmino = rbmip->i_ino; @@ -663,8 +721,8 @@ rtinit( libxfs_trans_log_inode(tp, rbmip, XFS_ILOG_CORE); libxfs_log_sb(tp); mp->m_rbmip = rbmip; - error = -libxfs_dir_ialloc(&tp, NULL, S_IFREG, 1, 0, - &creds, &fsxattrs, &rsumip); + error = creatproto(&tp, NULL, S_IFREG, 1, 0, &creds, &fsxattrs, + &rsumip); if (error) { fail(_("Realtime summary inode allocation failed"), error); } diff --git a/repair/phase6.c b/repair/phase6.c index 75b0e06b31a..e7e2bf3f475 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -19,8 +19,6 @@ #include "progress.h" #include "versions.h" -static struct cred zerocr; -static struct fsxattr zerofsx; static xfs_ino_t orphanage_ino; /* @@ -873,19 +871,25 @@ mk_root_dir(xfs_mount_t *mp) * orphanage name == lost+found */ static xfs_ino_t -mk_orphanage(xfs_mount_t *mp) +mk_orphanage( + struct xfs_mount *mp) { - xfs_ino_t ino; - xfs_trans_t *tp; - xfs_inode_t *ip; - xfs_inode_t *pip; - ino_tree_node_t *irec; - int ino_offset = 0; - int i; - int error; - const int mode = 0755; - int nres; - struct xfs_name xname; + struct xfs_icreate_args args = { + .nlink = 2, + }; + struct xfs_trans *tp; + struct xfs_inode *ip; + struct xfs_inode *pip; + struct ino_tree_node *irec; + xfs_ino_t ino; + int ino_offset = 0; + int i; + int error; + int nres; + const umode_t mode = S_IFDIR | 0755; + struct xfs_name xname; + + libxfs_icreate_args_rootfile(&args, mode); /* * check for an existing lost+found first, if it exists, return @@ -898,6 +902,7 @@ mk_orphanage(xfs_mount_t *mp) do_error(_("%d - couldn't iget root inode to obtain %s\n"), i, ORPHANAGE); + args.pip = pip; xname.name = (unsigned char *)ORPHANAGE; xname.len = strlen(ORPHANAGE); xname.type = XFS_DIR3_FT_DIR; @@ -922,14 +927,15 @@ mk_orphanage(xfs_mount_t *mp) do_error(_("%d - couldn't iget root inode to make %s\n"), i, ORPHANAGE);*/ - error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFDIR, - 1, 0, &zerocr, &zerofsx, &ip); - if (error) { + error = -libxfs_dialloc(&tp, mp->m_sb.sb_rootino, mode, 
&ino); + if (error) do_error(_("%s inode allocation failed %d\n"), ORPHANAGE, error); - } - inc_nlink(VFS_I(ip)); /* account for . */ - ino = ip->i_ino; + + error = -libxfs_icreate(tp, ino, &args, &ip); + if (error) + do_error(_("%s inode initialization failed %d\n"), + ORPHANAGE, error); irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, ino), @@ -3207,8 +3213,6 @@ phase6(xfs_mount_t *mp) ino_tree_node_t *irec; int i; - memset(&zerocr, 0, sizeof(struct cred)); - memset(&zerofsx, 0, sizeof(struct fsxattr)); orphanage_ino = 0; do_log(_("Phase 6 - check inode connectivity...\n")); ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 13/26] libxfs: backport inode init code from the kernel 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (12 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 14/26] libxfs: remove libxfs_dir_ialloc Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 17/26] xfs: hoist xfs_{bump,drop}link to libxfs Darrick J. Wong ` (11 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Reorganize the userspace inode initialization code to more closely resemble its kernel counterpart. This is preparation to hoist the initialization routines to libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/xfs_inode.h | 2 + include/xfs_mount.h | 1 + libxfs/inode.c | 92 +++++++++++++++++++++++++++++++++++++++++--------- libxfs/libxfs_priv.h | 6 +++ 4 files changed, 84 insertions(+), 17 deletions(-) diff --git a/include/xfs_inode.h b/include/xfs_inode.h index bf8322ee2ec..4e8a3dc6fd8 100644 --- a/include/xfs_inode.h +++ b/include/xfs_inode.h @@ -295,4 +295,6 @@ extern void libxfs_irele(struct xfs_inode *ip); #define XFS_DEFAULT_COWEXTSZ_HINT 32 +#define XFS_INHERIT_GID(pip) (VFS_I(pip)->i_mode & S_ISGID) + #endif /* __XFS_INODE_H__ */ diff --git a/include/xfs_mount.h b/include/xfs_mount.h index c67d0237686..1690660ed5b 100644 --- a/include/xfs_mount.h +++ b/include/xfs_mount.h @@ -218,6 +218,7 @@ __XFS_UNSUPP_FEAT(ikeep) __XFS_UNSUPP_FEAT(swalloc) __XFS_UNSUPP_FEAT(small_inums) __XFS_UNSUPP_FEAT(readonly) +__XFS_UNSUPP_FEAT(grpid) /* Operational mount state flags */ #define XFS_OPSTATE_INODE32 0 /* inode32 allocator active */ diff --git a/libxfs/inode.c b/libxfs/inode.c index 44d889f3f0f..d311abafd79 100644 --- a/libxfs/inode.c +++ b/libxfs/inode.c @@ -31,7 +31,7 @@ /* Propagate di_flags from a parent inode to a child inode. 
*/ static void -xfs_inode_propagate_flags( +xfs_inode_inherit_flags( struct xfs_inode *ip, const struct xfs_inode *pip) { @@ -81,31 +81,47 @@ xfs_inode_init( struct xfs_inode *ip) { struct xfs_inode *pip = args->pip; + struct inode *dir = pip ? VFS_I(pip) : NULL; + struct xfs_mount *mp = tp->t_mountp; + struct inode *inode = VFS_I(ip); unsigned int flags; int times = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG | XFS_ICHGTIME_ACCESS; - VFS_I(ip)->i_mode = args->mode; - set_nlink(VFS_I(ip), args->nlink); - VFS_I(ip)->i_uid = args->uid; + set_nlink(inode, args->nlink); + inode->i_rdev = args->rdev; ip->i_projid = args->prid; - if (pip && (VFS_I(pip)->i_mode & S_ISGID)) { - if (!(args->flags & XFS_ICREATE_ARGS_FORCE_GID)) - VFS_I(ip)->i_gid = VFS_I(pip)->i_gid; - if ((VFS_I(pip)->i_mode & S_ISGID) && S_ISDIR(args->mode)) - VFS_I(ip)->i_mode |= S_ISGID; - } else - VFS_I(ip)->i_gid = args->gid; + if (dir && !(dir->i_mode & S_ISGID) && + xfs_has_grpid(mp)) { + inode->i_uid = args->uid; + inode->i_gid = dir->i_gid; + inode->i_mode = args->mode; + } else { + inode_init_owner(args->mnt_userns, inode, dir, args->mode); + } + + /* struct copies */ + if (args->flags & XFS_ICREATE_ARGS_FORCE_UID) + inode->i_uid = args->uid; + else + ASSERT(uid_eq(inode->i_uid, args->uid)); + if (args->flags & XFS_ICREATE_ARGS_FORCE_GID) + inode->i_gid = args->gid; + else if (!pip || !XFS_INHERIT_GID(pip)) + ASSERT(gid_eq(inode->i_gid, args->gid)); + if (args->flags & XFS_ICREATE_ARGS_FORCE_MODE) + inode->i_mode = args->mode; ip->i_disk_size = 0; ip->i_df.if_nextents = 0; ASSERT(ip->i_nblocks == 0); + ip->i_extsize = 0; ip->i_diflags = 0; + if (xfs_has_v3inodes(ip->i_mount)) { VFS_I(ip)->i_version = 1; - ip->i_diflags2 = ip->i_mount->m_ino_geo.new_diflags2; ip->i_cowextsize = 0; times |= XFS_ICHGTIME_CREATE; } @@ -120,12 +136,11 @@ xfs_inode_init( case S_IFBLK: ip->i_df.if_format = XFS_DINODE_FMT_DEV; flags |= XFS_ILOG_DEV; - VFS_I(ip)->i_rdev = args->rdev; break; case S_IFREG: case S_IFDIR: if (pip && 
(pip->i_diflags & XFS_DIFLAG_ANY)) - xfs_inode_propagate_flags(ip, pip); + xfs_inode_inherit_flags(ip, pip); if (pip && (pip->i_diflags2 & XFS_DIFLAG2_ANY)) xfs_inode_inherit_flags2(ip, pip); /* FALLTHROUGH */ @@ -138,6 +153,21 @@ xfs_inode_init( ASSERT(0); } + /* + * If we need to create attributes immediately after allocating the + * inode, initialise an empty attribute fork right now. We use the + * default fork offset for attributes here as we don't know exactly what + * size or how many attributes we might be adding. We can do this + * safely here because we know the data fork is completely empty and + * this saves us from needing to run a separate transaction to set the + * fork offset in the immediate future. + */ + if ((args->flags & XFS_ICREATE_ARGS_INIT_XATTRS) && + xfs_has_attr(mp)) { + ip->i_forkoff = xfs_default_attroffset(ip) >> 3; + xfs_ifork_init_attr(ip, XFS_DINODE_FMT_EXTENTS, 0); + } + /* * Log the new values stuffed into the inode. */ @@ -261,15 +291,15 @@ libxfs_dir_ialloc( .nlink = nlink, .rdev = rdev, .mode = mode, + .flags = XFS_ICREATE_ARGS_FORCE_UID | + XFS_ICREATE_ARGS_FORCE_GID | + XFS_ICREATE_ARGS_FORCE_MODE, }; struct xfs_inode *ip; xfs_ino_t parent_ino = dp ? dp->i_ino : 0; xfs_ino_t ino; int error; - if (cr->cr_flags & CRED_FORCE_GID) - args.flags |= XFS_ICREATE_ARGS_FORCE_GID; - /* * Call the space management code to pick the on-disk inode to be * allocated. 
@@ -321,6 +351,7 @@ libxfs_iget( VFS_I(ip)->i_count = 1; ip->i_ino = ino; ip->i_mount = mp; + ip->i_diflags2 = mp->m_ino_geo.new_diflags2; ip->i_af.if_format = XFS_DINODE_FMT_EXTENTS; spin_lock_init(&VFS_I(ip)->i_lock); @@ -399,3 +430,30 @@ libxfs_irele( kmem_cache_free(xfs_inode_cache, ip); } } + +static inline void inode_fsuid_set(struct inode *inode, + struct user_namespace *mnt_userns) +{ + inode->i_uid = make_kuid(0); +} + +static inline void inode_fsgid_set(struct inode *inode, + struct user_namespace *mnt_userns) +{ + inode->i_gid = make_kgid(0); +} + +void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode, + const struct inode *dir, umode_t mode) +{ + inode_fsuid_set(inode, mnt_userns); + if (dir && dir->i_mode & S_ISGID) { + inode->i_gid = dir->i_gid; + + /* Directories are special, and always inherit S_ISGID */ + if (S_ISDIR(mode)) + mode |= S_ISGID; + } else + inode_fsgid_set(inode, mnt_userns); + inode->i_mode = mode; +} diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h index 716e711cde4..acad5ccd228 100644 --- a/libxfs/libxfs_priv.h +++ b/libxfs/libxfs_priv.h @@ -219,6 +219,12 @@ static inline bool WARN_ON(bool expr) { (inode)->i_version = (version); \ } while (0) +struct inode; +struct user_namespace; + +void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode, + const struct inode *dir, umode_t mode); + #define __must_check __attribute__((__warn_unused_result__)) /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
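The setgid inheritance rule that the backported inode_init_owner() applies can be sketched in isolation. This is a minimal model, not the libxfs code: struct toy_inode, toy_init_owner() and the TOY_* mode bits are hypothetical names made up for illustration, and the "no fsuid/fsgid in userspace, so default to zero" behaviour is taken from the make_kuid(0)/make_kgid(0) helpers in the patch above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode bits for this sketch (conventional octal values). */
#define TOY_IFMT	0170000u
#define TOY_IFDIR	0040000u
#define TOY_IFREG	0100000u
#define TOY_ISGID	0002000u

/* Hypothetical stand-in for struct inode; ownership fields only. */
struct toy_inode {
	unsigned int	mode;
	unsigned int	uid;
	unsigned int	gid;
};

/*
 * Simplified model of inode_init_owner(): the owner defaults to 0, and a
 * setgid parent directory passes its group (and, for new directories, the
 * setgid bit itself) on to the child.
 */
static void
toy_init_owner(
	struct toy_inode	*inode,
	const struct toy_inode	*dir,
	unsigned int		mode)
{
	inode->uid = 0;
	if (dir && (dir->mode & TOY_ISGID)) {
		/* Setgid parent: the child takes the parent's group. */
		inode->gid = dir->gid;

		/* New directories also inherit the setgid bit. */
		if ((mode & TOY_IFMT) == TOY_IFDIR)
			mode |= TOY_ISGID;
	} else {
		inode->gid = 0;
	}
	inode->mode = mode;
}
```

In this model a regular file created under a setgid directory picks up the directory's group but not the setgid bit, which is the distinction the XFS_ICREATE_ARGS_FORCE_* flags can then override.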
* [PATCH 17/26] xfs: hoist xfs_{bump,drop}link to libxfs 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (13 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 13/26] libxfs: backport inode init code from the kernel Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 15/26] xfs: hoist new inode initialization functions " Darrick J. Wong ` (10 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Move xfs_bumplink and xfs_droplink to libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/xfs_inode.h | 4 ++++ libxfs/xfs_inode_util.c | 35 +++++++++++++++++++++++++++++++++++ libxfs/xfs_inode_util.h | 2 ++ 3 files changed, 41 insertions(+) diff --git a/include/xfs_inode.h b/include/xfs_inode.h index 234f8d3affa..ccd19e5ee5b 100644 --- a/include/xfs_inode.h +++ b/include/xfs_inode.h @@ -257,6 +257,10 @@ static inline void inc_nlink(struct inode *inode) { inode->i_nlink++; } +static inline void drop_nlink(struct inode *inode) +{ + inode->i_nlink--; +} static inline bool xfs_is_reflink_inode(struct xfs_inode *ip) { diff --git a/libxfs/xfs_inode_util.c b/libxfs/xfs_inode_util.c index 4b19edd9ab1..e12c43954cf 100644 --- a/libxfs/xfs_inode_util.c +++ b/libxfs/xfs_inode_util.c @@ -608,3 +608,38 @@ xfs_iunlink_remove( return xfs_iunlink_remove_inode(tp, pag, agibp, ip); } + +/* + * Decrement the link count on an inode & log the change. If this causes the + * link count to go to zero, move the inode to AGI unlinked list so that it can + * be freed when the last active reference goes away via xfs_inactive(). 
+ */ +int +xfs_droplink( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG); + + drop_nlink(VFS_I(ip)); + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + + if (VFS_I(ip)->i_nlink) + return 0; + + return xfs_iunlink(tp, ip); +} + +/* + * Increment the link count on an inode & log the change. + */ +void +xfs_bumplink( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG); + + inc_nlink(VFS_I(ip)); + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); +} diff --git a/libxfs/xfs_inode_util.h b/libxfs/xfs_inode_util.h index e15cf94e094..f92b14a6fbe 100644 --- a/libxfs/xfs_inode_util.h +++ b/libxfs/xfs_inode_util.h @@ -59,6 +59,8 @@ void xfs_inode_init(struct xfs_trans *tp, const struct xfs_icreate_args *args, int xfs_iunlink(struct xfs_trans *tp, struct xfs_inode *ip); int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag, struct xfs_inode *ip); +int xfs_droplink(struct xfs_trans *tp, struct xfs_inode *ip); +void xfs_bumplink(struct xfs_trans *tp, struct xfs_inode *ip); /* The libxfs client must provide this group of helper functions. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
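The contract of the two hoisted helpers is small enough to model directly. The sketch below is an illustration under stated assumptions: struct toy_inode, toy_bumplink() and toy_droplink() are hypothetical, the transaction and timestamp updates are left out, and a boolean stands in for the AGI unlinked list that xfs_iunlink() manages in the real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; link count and unlinked state only. */
struct toy_inode {
	unsigned int	nlink;
	bool		on_unlinked_list;
};

/* Model of xfs_bumplink(): take one more reference to the inode. */
static void
toy_bumplink(
	struct toy_inode	*ip)
{
	ip->nlink++;
}

/*
 * Model of xfs_droplink(): drop one reference, and park the inode on the
 * unlinked list once the count reaches zero so it can be freed later.
 * Always returns 0 here; the real helper can return an error from
 * xfs_iunlink().
 */
static int
toy_droplink(
	struct toy_inode	*ip)
{
	assert(ip->nlink > 0);
	ip->nlink--;
	if (ip->nlink)
		return 0;

	ip->on_unlinked_list = true;
	return 0;
}
```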
* [PATCH 15/26] xfs: hoist new inode initialization functions to libxfs 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (14 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 17/26] xfs: hoist xfs_{bump,drop}link to libxfs Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 22/26] xfs: create libxfs helper to exchange two directory entries Darrick J. Wong ` (9 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Move all the code that initializes a new inode's attributes from the icreate_args structure and the parent directory into libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/xfs_inode.h | 20 +++++ libxfs/inode.c | 153 ++---------------------------------- libxfs/xfs_inode_util.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_inode_util.h | 21 +++++ libxfs/xfs_shared.h | 8 -- repair/phase6.c | 3 - 6 files changed, 251 insertions(+), 154 deletions(-) diff --git a/include/xfs_inode.h b/include/xfs_inode.h index 03add740fa7..5c806b3a58c 100644 --- a/include/xfs_inode.h +++ b/include/xfs_inode.h @@ -94,6 +94,20 @@ static inline void ihold(struct inode *inode) inode->i_count++; } +static inline void +inode_fsuid_set( + struct inode *inode, + struct user_namespace *mnt_userns) +{ + inode->i_uid = make_kuid(0); +} + +static inline void +inode_set_iversion(struct inode *inode, uint64_t version) +{ + inode->i_version = version; +} + typedef struct xfs_inode { struct cache_node i_node; struct xfs_mount *i_mount; /* fs mount struct ptr */ @@ -290,4 +304,10 @@ extern void libxfs_irele(struct xfs_inode *ip); #define XFS_INHERIT_GID(pip) (VFS_I(pip)->i_mode & S_ISGID) +#define xfs_inherit_noatime (false) +#define xfs_inherit_nodump (false) +#define xfs_inherit_sync (false) +#define xfs_inherit_nosymlinks 
(false) +#define xfs_inherit_nodefrag (false) + #endif /* __XFS_INODE_H__ */ diff --git a/libxfs/inode.c b/libxfs/inode.c index c1fb622f306..8ef2b654769 100644 --- a/libxfs/inode.c +++ b/libxfs/inode.c @@ -29,150 +29,11 @@ #include "xfs_da_btree.h" #include "xfs_dir2_priv.h" -/* Propagate di_flags from a parent inode to a child inode. */ -static void -xfs_inode_inherit_flags( - struct xfs_inode *ip, - const struct xfs_inode *pip) -{ - unsigned int di_flags = 0; - umode_t mode = VFS_I(ip)->i_mode; - - if ((mode & S_IFMT) == S_IFDIR) { - if (pip->i_diflags & XFS_DIFLAG_RTINHERIT) - di_flags |= XFS_DIFLAG_RTINHERIT; - if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { - di_flags |= XFS_DIFLAG_EXTSZINHERIT; - ip->i_extsize = pip->i_extsize; - } - } else { - if ((pip->i_diflags & XFS_DIFLAG_RTINHERIT) && - xfs_has_realtime(ip->i_mount)) - di_flags |= XFS_DIFLAG_REALTIME; - if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { - di_flags |= XFS_DIFLAG_EXTSIZE; - ip->i_extsize = pip->i_extsize; - } - } - if (pip->i_diflags & XFS_DIFLAG_PROJINHERIT) - di_flags |= XFS_DIFLAG_PROJINHERIT; - ip->i_diflags |= di_flags; -} - -/* Propagate di_flags2 from a parent inode to a child inode. */ -static void -xfs_inode_inherit_flags2( - struct xfs_inode *ip, - const struct xfs_inode *pip) -{ - if (pip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) { - ip->i_diflags2 |= XFS_DIFLAG2_COWEXTSIZE; - ip->i_cowextsize = pip->i_cowextsize; - } - if (pip->i_diflags2 & XFS_DIFLAG2_DAX) - ip->i_diflags2 |= XFS_DIFLAG2_DAX; -} - -/* Initialise an inode's attributes. */ -static void -xfs_inode_init( - struct xfs_trans *tp, - const struct xfs_icreate_args *args, +void +xfs_setup_inode( struct xfs_inode *ip) { - struct xfs_inode *pip = args->pip; - struct inode *dir = pip ? 
VFS_I(pip) : NULL; - struct xfs_mount *mp = tp->t_mountp; - struct inode *inode = VFS_I(ip); - unsigned int flags; - int times = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG | - XFS_ICHGTIME_ACCESS; - - set_nlink(inode, args->nlink); - inode->i_rdev = args->rdev; - ip->i_projid = args->prid; - - if (dir && !(dir->i_mode & S_ISGID) && - xfs_has_grpid(mp)) { - inode->i_uid = args->uid; - inode->i_gid = dir->i_gid; - inode->i_mode = args->mode; - } else { - inode_init_owner(args->mnt_userns, inode, dir, args->mode); - } - - /* struct copies */ - if (args->flags & XFS_ICREATE_ARGS_FORCE_UID) - inode->i_uid = args->uid; - else - ASSERT(uid_eq(inode->i_uid, args->uid)); - if (args->flags & XFS_ICREATE_ARGS_FORCE_GID) - inode->i_gid = args->gid; - else if (!pip || !XFS_INHERIT_GID(pip)) - ASSERT(gid_eq(inode->i_gid, args->gid)); - if (args->flags & XFS_ICREATE_ARGS_FORCE_MODE) - inode->i_mode = args->mode; - - ip->i_disk_size = 0; - ip->i_df.if_nextents = 0; - ASSERT(ip->i_nblocks == 0); - - ip->i_extsize = 0; - ip->i_diflags = 0; - - if (xfs_has_v3inodes(ip->i_mount)) { - VFS_I(ip)->i_version = 1; - ip->i_cowextsize = 0; - times |= XFS_ICHGTIME_CREATE; - } - - xfs_trans_ichgtime(tp, ip, times); - - flags = XFS_ILOG_CORE; - switch (args->mode & S_IFMT) { - case S_IFIFO: - case S_IFSOCK: - case S_IFCHR: - case S_IFBLK: - ip->i_df.if_format = XFS_DINODE_FMT_DEV; - flags |= XFS_ILOG_DEV; - break; - case S_IFREG: - case S_IFDIR: - if (pip && (pip->i_diflags & XFS_DIFLAG_ANY)) - xfs_inode_inherit_flags(ip, pip); - if (pip && (pip->i_diflags2 & XFS_DIFLAG2_ANY)) - xfs_inode_inherit_flags2(ip, pip); - /* FALLTHROUGH */ - case S_IFLNK: - ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS; - ip->i_df.if_bytes = 0; - ip->i_df.if_u1.if_root = NULL; - break; - default: - ASSERT(0); - } - - /* - * If we need to create attributes immediately after allocating the - * inode, initialise an empty attribute fork right now. 
We use the - * default fork offset for attributes here as we don't know exactly what - * size or how many attributes we might be adding. We can do this - * safely here because we know the data fork is completely empty and - * this saves us from needing to run a separate transaction to set the - * fork offset in the immediate future. - */ - if ((args->flags & XFS_ICREATE_ARGS_INIT_XATTRS) && - xfs_has_attr(mp)) { - ip->i_forkoff = xfs_default_attroffset(ip) >> 3; - xfs_ifork_init_attr(ip, XFS_DINODE_FMT_EXTENTS, 0); - } - - /* - * Log the new values stuffed into the inode. - */ - xfs_trans_ijoin(tp, ip, 0); - xfs_trans_log_inode(tp, ip, flags); + /* empty */ } /* @@ -386,10 +247,12 @@ libxfs_irele( } } -static inline void inode_fsuid_set(struct inode *inode, - struct user_namespace *mnt_userns) +void +xfs_inode_sgid_inherit( + const struct xfs_icreate_args *args, + struct xfs_inode *ip) { - inode->i_uid = make_kuid(0); + /* empty */ } static inline void inode_fsgid_set(struct inode *inode, diff --git a/libxfs/xfs_inode_util.c b/libxfs/xfs_inode_util.c index 89fb58807a1..21196a899da 100644 --- a/libxfs/xfs_inode_util.c +++ b/libxfs/xfs_inode_util.c @@ -13,6 +13,10 @@ #include "xfs_mount.h" #include "xfs_inode.h" #include "xfs_inode_util.h" +#include "xfs_trans.h" +#include "xfs_ialloc.h" +#include "xfs_health.h" +#include "xfs_bmap.h" uint16_t xfs_flags2diflags( @@ -133,3 +137,199 @@ xfs_get_initial_prid(struct xfs_inode *dp) return XFS_PROJID_DEFAULT; } + +/* Propagate di_flags from a parent inode to a child inode. 
*/ +static inline void +xfs_inode_inherit_flags( + struct xfs_inode *ip, + const struct xfs_inode *pip) +{ + unsigned int di_flags = 0; + xfs_failaddr_t failaddr; + umode_t mode = VFS_I(ip)->i_mode; + + if (S_ISDIR(mode)) { + if (pip->i_diflags & XFS_DIFLAG_RTINHERIT) + di_flags |= XFS_DIFLAG_RTINHERIT; + if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { + di_flags |= XFS_DIFLAG_EXTSZINHERIT; + ip->i_extsize = pip->i_extsize; + } + if (pip->i_diflags & XFS_DIFLAG_PROJINHERIT) + di_flags |= XFS_DIFLAG_PROJINHERIT; + } else if (S_ISREG(mode)) { + if ((pip->i_diflags & XFS_DIFLAG_RTINHERIT) && + xfs_has_realtime(ip->i_mount)) + di_flags |= XFS_DIFLAG_REALTIME; + if (pip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) { + di_flags |= XFS_DIFLAG_EXTSIZE; + ip->i_extsize = pip->i_extsize; + } + } + if ((pip->i_diflags & XFS_DIFLAG_NOATIME) && + xfs_inherit_noatime) + di_flags |= XFS_DIFLAG_NOATIME; + if ((pip->i_diflags & XFS_DIFLAG_NODUMP) && + xfs_inherit_nodump) + di_flags |= XFS_DIFLAG_NODUMP; + if ((pip->i_diflags & XFS_DIFLAG_SYNC) && + xfs_inherit_sync) + di_flags |= XFS_DIFLAG_SYNC; + if ((pip->i_diflags & XFS_DIFLAG_NOSYMLINKS) && + xfs_inherit_nosymlinks) + di_flags |= XFS_DIFLAG_NOSYMLINKS; + if ((pip->i_diflags & XFS_DIFLAG_NODEFRAG) && + xfs_inherit_nodefrag) + di_flags |= XFS_DIFLAG_NODEFRAG; + if (pip->i_diflags & XFS_DIFLAG_FILESTREAM) + di_flags |= XFS_DIFLAG_FILESTREAM; + + ip->i_diflags |= di_flags; + + /* + * Inode verifiers on older kernels only check that the extent size + * hint is an integer multiple of the rt extent size on realtime files. + * They did not check the hint alignment on a directory with both + * rtinherit and extszinherit flags set. If the misaligned hint is + * propagated from a directory into a new realtime file, new file + * allocations will fail due to math errors in the rt allocator and/or + * trip the verifiers. Validate the hint settings in the new file so + * that we don't let broken hints propagate. 
+ */ + failaddr = xfs_inode_validate_extsize(ip->i_mount, ip->i_extsize, + VFS_I(ip)->i_mode, ip->i_diflags); + if (failaddr) { + ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | + XFS_DIFLAG_EXTSZINHERIT); + ip->i_extsize = 0; + } +} + +/* Propagate di_flags2 from a parent inode to a child inode. */ +static inline void +xfs_inode_inherit_flags2( + struct xfs_inode *ip, + const struct xfs_inode *pip) +{ + xfs_failaddr_t failaddr; + + if (pip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) { + ip->i_diflags2 |= XFS_DIFLAG2_COWEXTSIZE; + ip->i_cowextsize = pip->i_cowextsize; + } + if (pip->i_diflags2 & XFS_DIFLAG2_DAX) + ip->i_diflags2 |= XFS_DIFLAG2_DAX; + + /* Don't let invalid cowextsize hints propagate. */ + failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize, + VFS_I(ip)->i_mode, ip->i_diflags, ip->i_diflags2); + if (failaddr) { + ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE; + ip->i_cowextsize = 0; + } +} + +/* Initialise an inode's attributes. */ +void +xfs_inode_init( + struct xfs_trans *tp, + const struct xfs_icreate_args *args, + struct xfs_inode *ip) +{ + struct xfs_inode *pip = args->pip; + struct inode *dir = pip ? 
VFS_I(pip) : NULL; + struct xfs_mount *mp = tp->t_mountp; + struct inode *inode = VFS_I(ip); + unsigned int flags; + int times = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG | + XFS_ICHGTIME_ACCESS; + + set_nlink(inode, args->nlink); + inode->i_rdev = args->rdev; + ip->i_projid = args->prid; + + if (dir && !(dir->i_mode & S_ISGID) && xfs_has_grpid(mp)) { + inode_fsuid_set(inode, args->mnt_userns); + inode->i_gid = dir->i_gid; + inode->i_mode = args->mode; + } else { + inode_init_owner(args->mnt_userns, inode, dir, args->mode); + } + xfs_inode_sgid_inherit(args, ip); + + /* struct copies */ + if (args->flags & XFS_ICREATE_ARGS_FORCE_UID) + inode->i_uid = args->uid; + else + ASSERT(uid_eq(inode->i_uid, args->uid)); + if (args->flags & XFS_ICREATE_ARGS_FORCE_GID) + inode->i_gid = args->gid; + else if (!pip || !XFS_INHERIT_GID(pip)) + ASSERT(gid_eq(inode->i_gid, args->gid)); + if (args->flags & XFS_ICREATE_ARGS_FORCE_MODE) + inode->i_mode = args->mode; + + ip->i_disk_size = 0; + ip->i_df.if_nextents = 0; + ASSERT(ip->i_nblocks == 0); + + ip->i_extsize = 0; + ip->i_diflags = 0; + + if (xfs_has_v3inodes(mp)) { + inode_set_iversion(inode, 1); + ip->i_cowextsize = 0; + times |= XFS_ICHGTIME_CREATE; + } + + xfs_trans_ichgtime(tp, ip, times); + + flags = XFS_ILOG_CORE; + switch (args->mode & S_IFMT) { + case S_IFIFO: + case S_IFCHR: + case S_IFBLK: + case S_IFSOCK: + ip->i_df.if_format = XFS_DINODE_FMT_DEV; + flags |= XFS_ILOG_DEV; + break; + case S_IFREG: + case S_IFDIR: + if (pip && (pip->i_diflags & XFS_DIFLAG_ANY)) + xfs_inode_inherit_flags(ip, pip); + if (pip && (pip->i_diflags2 & XFS_DIFLAG2_ANY)) + xfs_inode_inherit_flags2(ip, pip); + fallthrough; + case S_IFLNK: + ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS; + ip->i_df.if_bytes = 0; + ip->i_df.if_u1.if_root = NULL; + break; + default: + ASSERT(0); + } + + /* + * If we need to create attributes immediately after allocating the + * inode, initialise an empty attribute fork right now. 
We use the + * default fork offset for attributes here as we don't know exactly what + * size or how many attributes we might be adding. We can do this + * safely here because we know the data fork is completely empty and + * this saves us from needing to run a separate transaction to set the + * fork offset in the immediate future. + */ + if ((args->flags & XFS_ICREATE_ARGS_INIT_XATTRS) && + xfs_has_attr(mp)) { + ip->i_forkoff = xfs_default_attroffset(ip) >> 3; + xfs_ifork_init_attr(ip, XFS_DINODE_FMT_EXTENTS, 0); + } + + /* + * Log the new values stuffed into the inode. + */ + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); + xfs_trans_log_inode(tp, ip, flags); + + /* now that we have an i_mode we can setup the inode structure */ + xfs_setup_inode(ip); +} diff --git a/libxfs/xfs_inode_util.h b/libxfs/xfs_inode_util.h index 466f0767ab5..a73ccaea558 100644 --- a/libxfs/xfs_inode_util.h +++ b/libxfs/xfs_inode_util.h @@ -44,4 +44,25 @@ struct xfs_icreate_args { uint16_t flags; }; +/* + * Flags for xfs_trans_ichgtime(). + */ +#define XFS_ICHGTIME_MOD 0x1 /* data fork modification timestamp */ +#define XFS_ICHGTIME_CHG 0x2 /* inode field change timestamp */ +#define XFS_ICHGTIME_CREATE 0x4 /* inode create timestamp */ +#define XFS_ICHGTIME_ACCESS 0x8 /* last access timestamp */ +void xfs_trans_ichgtime(struct xfs_trans *tp, struct xfs_inode *ip, int flags); + +void xfs_inode_init(struct xfs_trans *tp, const struct xfs_icreate_args *args, + struct xfs_inode *ip); + +/* The libxfs client must provide this group of helper functions. */ + +/* Handle legacy Irix sgid inheritance quirks. */ +void xfs_inode_sgid_inherit(const struct xfs_icreate_args *args, + struct xfs_inode *ip); + +/* Initialize the incore inode. 
*/ +void xfs_setup_inode(struct xfs_inode *ip); + #endif /* __XFS_INODE_UTIL_H__ */ diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h index acf527eb0e1..46754fe5736 100644 --- a/libxfs/xfs_shared.h +++ b/libxfs/xfs_shared.h @@ -131,14 +131,6 @@ void xfs_log_get_max_trans_res(struct xfs_mount *mp, #define XFS_RCBAG_BTREE_REF 1 #define XFS_SSB_REF 0 -/* - * Flags for xfs_trans_ichgtime(). - */ -#define XFS_ICHGTIME_MOD 0x1 /* data fork modification timestamp */ -#define XFS_ICHGTIME_CHG 0x2 /* inode field change timestamp */ -#define XFS_ICHGTIME_CREATE 0x4 /* inode create timestamp */ -#define XFS_ICHGTIME_ACCESS 0x8 /* last access timestamp */ - /* Computed inode geometry for the filesystem. */ struct xfs_ino_geometry { /* Maximum inode count in this filesystem. */ diff --git a/repair/phase6.c b/repair/phase6.c index e7e2bf3f475..0c24cfbf144 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -824,7 +824,8 @@ mk_root_dir(xfs_mount_t *mp) } /* - * take care of the core -- initialization from xfs_ialloc() + * take care of the core since we didn't call the libxfs ialloc function + * (comment changed to avoid tangling xfs/437) */ reset_inode_fields(ip); ^ permalink raw reply related [flat|nested] 565+ messages in thread
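The "inherit, then validate" shape of xfs_inode_inherit_flags() may be easier to follow in a reduced model. Everything named toy_* or TOY_* below is hypothetical and the flag values are not the on-disk XFS_DIFLAG_* bits. The validator is a deliberately reduced stand-in for xfs_inode_validate_extsize() that implements only the rule quoted in the comment above: on a realtime file, the extent size hint must be a multiple of the rt extent size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch only. */
#define TOY_RTINHERIT		(1u << 0)
#define TOY_EXTSZINHERIT	(1u << 1)
#define TOY_REALTIME		(1u << 2)
#define TOY_EXTSIZE		(1u << 3)

struct toy_inode {
	bool		is_dir;
	uint32_t	flags;
	uint32_t	extsize;	/* extent size hint, in fs blocks */
};

/* Reduced validator: only realtime files with a hint are checked. */
static bool
toy_extsize_ok(
	const struct toy_inode	*ip,
	uint32_t		rextsize)
{
	if ((ip->flags & TOY_REALTIME) && (ip->flags & TOY_EXTSIZE))
		return rextsize != 0 && ip->extsize % rextsize == 0;
	return true;
}

/* Copy inheritable hints from parent to child, then drop invalid ones. */
static void
toy_inherit_flags(
	struct toy_inode	*ip,
	const struct toy_inode	*pip,
	bool			has_realtime,
	uint32_t		rextsize)
{
	uint32_t		flags = 0;

	if (ip->is_dir) {
		/* Directories pass the "inherit" flags down unchanged. */
		if (pip->flags & TOY_RTINHERIT)
			flags |= TOY_RTINHERIT;
		if (pip->flags & TOY_EXTSZINHERIT) {
			flags |= TOY_EXTSZINHERIT;
			ip->extsize = pip->extsize;
		}
	} else {
		/* Files turn the "inherit" flags into the real ones. */
		if ((pip->flags & TOY_RTINHERIT) && has_realtime)
			flags |= TOY_REALTIME;
		if (pip->flags & TOY_EXTSZINHERIT) {
			flags |= TOY_EXTSIZE;
			ip->extsize = pip->extsize;
		}
	}
	ip->flags |= flags;

	/* Do not let a misaligned hint propagate into the new inode. */
	if (!toy_extsize_ok(ip, rextsize)) {
		ip->flags &= ~(TOY_EXTSIZE | TOY_EXTSZINHERIT);
		ip->extsize = 0;
	}
}
```

With a parent hint of 5 blocks and an rt extent size of 4, a new realtime file in this model keeps its realtime flag but loses the hint, rather than carrying a value the allocator cannot honour.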
* [PATCH 22/26] xfs: create libxfs helper to exchange two directory entries 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (15 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 15/26] xfs: hoist new inode initialization functions " Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 21/26] xfs: create libxfs helper to remove an existing inode/name from a directory Darrick J. Wong ` (8 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a new libxfs function to exchange two directory entries. The upcoming metadata directory feature will need this to replace a metadata inode directory entry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_dir2.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_dir2.h | 4 ++ 2 files changed, 112 insertions(+) diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c index a0853d766f2..b0bb22ac506 100644 --- a/libxfs/xfs_dir2.c +++ b/libxfs/xfs_dir2.c @@ -949,3 +949,111 @@ xfs_dir_remove_child( return 0; } + +/* + * Exchange the entry (@name1, @ip1) in directory @dp1 with the entry (@name2, + * @ip2) in directory @dp2, and update '..' @ip1 and @ip2's entries as needed. + * @ip1 and @ip2 need not be of the same type. + * + * All inodes must have the ILOCK held, and both entries must already exist. 
+ */ +int +xfs_dir_exchange( + struct xfs_trans *tp, + struct xfs_inode *dp1, + struct xfs_name *name1, + struct xfs_inode *ip1, + struct xfs_inode *dp2, + struct xfs_name *name2, + struct xfs_inode *ip2, + unsigned int spaceres) +{ + int ip1_flags = 0; + int ip2_flags = 0; + int dp2_flags = 0; + int error; + + /* Swap inode number for dirent in first parent */ + error = xfs_dir_replace(tp, dp1, name1, ip2->i_ino, spaceres); + if (error) + return error; + + /* Swap inode number for dirent in second parent */ + error = xfs_dir_replace(tp, dp2, name2, ip1->i_ino, spaceres); + if (error) + return error; + + /* + * If we're renaming one or more directories across different parents, + * update the respective ".." entries (and link counts) to match the new + * parents. + */ + if (dp1 != dp2) { + dp2_flags = XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG; + + if (S_ISDIR(VFS_I(ip2)->i_mode)) { + error = xfs_dir_replace(tp, ip2, &xfs_name_dotdot, + dp1->i_ino, spaceres); + if (error) + return error; + + /* transfer ip2 ".." reference to dp1 */ + if (!S_ISDIR(VFS_I(ip1)->i_mode)) { + error = xfs_droplink(tp, dp2); + if (error) + return error; + xfs_bumplink(tp, dp1); + } + + /* + * Although ip1 isn't changed here, userspace needs + * to be warned about the change, so that applications + * relying on it (like backup ones), will properly + * notify the change + */ + ip1_flags |= XFS_ICHGTIME_CHG; + ip2_flags |= XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG; + } + + if (S_ISDIR(VFS_I(ip1)->i_mode)) { + error = xfs_dir_replace(tp, ip1, &xfs_name_dotdot, + dp2->i_ino, spaceres); + if (error) + return error; + + /* transfer ip1 ".." 
reference to dp2 */ + if (!S_ISDIR(VFS_I(ip2)->i_mode)) { + error = xfs_droplink(tp, dp1); + if (error) + return error; + xfs_bumplink(tp, dp2); + } + + /* + * Although ip2 isn't changed here, userspace needs + * to be warned about the change, so that applications + * relying on it (like backup ones), will properly + * notify the change + */ + ip1_flags |= XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG; + ip2_flags |= XFS_ICHGTIME_CHG; + } + } + + if (ip1_flags) { + xfs_trans_ichgtime(tp, ip1, ip1_flags); + xfs_trans_log_inode(tp, ip1, XFS_ILOG_CORE); + } + if (ip2_flags) { + xfs_trans_ichgtime(tp, ip2, ip2_flags); + xfs_trans_log_inode(tp, ip2, XFS_ILOG_CORE); + } + if (dp2_flags) { + xfs_trans_ichgtime(tp, dp2, dp2_flags); + xfs_trans_log_inode(tp, dp2, XFS_ILOG_CORE); + } + xfs_trans_ichgtime(tp, dp1, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + xfs_trans_log_inode(tp, dp1, XFS_ILOG_CORE); + + return 0; +} diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h index e35deb273d8..f63390236f0 100644 --- a/libxfs/xfs_dir2.h +++ b/libxfs/xfs_dir2.h @@ -262,5 +262,9 @@ int xfs_dir_link_existing_child(struct xfs_trans *tp, uint resblks, int xfs_dir_remove_child(struct xfs_trans *tp, uint resblks, struct xfs_inode *dp, struct xfs_name *name, struct xfs_inode *ip); +int xfs_dir_exchange(struct xfs_trans *tp, struct xfs_inode *dp1, + struct xfs_name *name1, struct xfs_inode *ip1, + struct xfs_inode *dp2, struct xfs_name *name2, + struct xfs_inode *ip2, unsigned int spaceres); #endif /* __XFS_DIR2_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
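The link-count side of xfs_dir_exchange() reduces to one rule: a parent's link count only changes when a directory trades places with a non-directory across two different parents. The sketch below models just that rule. struct toy_inode and toy_exchange_links() are hypothetical names, and the dirent replacement, ".." rewrite and timestamp logging of the real function are omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; type and link count only. */
struct toy_inode {
	bool		is_dir;
	unsigned int	nlink;
};

/*
 * Model of the parent link-count transfers in xfs_dir_exchange().
 * ip1 starts in dp1 and ip2 starts in dp2; afterwards they are swapped.
 */
static void
toy_exchange_links(
	struct toy_inode	*dp1,
	const struct toy_inode	*ip1,
	struct toy_inode	*dp2,
	const struct toy_inode	*ip2)
{
	/* Same parent: every ".." still points at the same directory. */
	if (dp1 == dp2)
		return;

	/* ip2 is a directory moving under dp1; its ".." follows it. */
	if (ip2->is_dir && !ip1->is_dir) {
		dp2->nlink--;
		dp1->nlink++;
	}

	/* ip1 is a directory moving under dp2; its ".." follows it. */
	if (ip1->is_dir && !ip2->is_dir) {
		dp1->nlink--;
		dp2->nlink++;
	}

	/* Two directories swapping parents cancel out: no net change. */
}
```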
* [PATCH 21/26] xfs: create libxfs helper to remove an existing inode/name from a directory
  2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong
                     ` (16 preceding siblings ...)
  2022-12-30 22:19   ` [PATCH 22/26] xfs: create libxfs helper to exchange two directory entries Darrick J. Wong
@ 2022-12-30 22:19   ` Darrick J. Wong
  2022-12-30 22:19   ` [PATCH 20/26] xfs: hoist inode free function to libxfs Darrick J. Wong
                     ` (7 subsequent siblings)
  25 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a new libxfs function to remove a (name, inode) entry from a
directory. The upcoming metadata directory feature will need this to
create a metadata directory tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_dir2.c |   73 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_dir2.h |    3 ++
 2 files changed, 76 insertions(+)

diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 8755ef615f8..a0853d766f2 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -876,3 +876,76 @@ xfs_dir_link_existing_child(
 	xfs_bumplink(tp, ip);
 	return 0;
 }
+
+/*
+ * Given a directory @dp, a child @ip, and a @name, remove the (@name, @ip)
+ * entry from the directory. Both inodes must have the ILOCK held.
+ */
+int
+xfs_dir_remove_child(
+	struct xfs_trans	*tp,
+	uint			resblks,
+	struct xfs_inode	*dp,
+	struct xfs_name		*name,
+	struct xfs_inode	*ip)
+{
+	int			error;
+
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+	ASSERT(xfs_isilocked(dp, XFS_ILOCK_EXCL));
+
+	/*
+	 * If we're removing a directory perform some additional validation.
+	 */
+	if (S_ISDIR(VFS_I(ip)->i_mode)) {
+		ASSERT(VFS_I(ip)->i_nlink >= 2);
+		if (VFS_I(ip)->i_nlink != 2)
+			return -ENOTEMPTY;
+		if (!xfs_dir_isempty(ip))
+			return -ENOTEMPTY;
+
+		/* Drop the link from ip's "..". */
+		error = xfs_droplink(tp, dp);
+		if (error)
+			return error;
+
+		/* Drop the "." link from ip to self. */
+		error = xfs_droplink(tp, ip);
+		if (error)
+			return error;
+
+		/*
+		 * Point the unlinked child directory's ".." entry to the root
+		 * directory to eliminate back-references to inodes that may
+		 * get freed before the child directory is closed. If the fs
+		 * gets shrunk, this can lead to dirent inode validation errors.
+		 */
+		if (dp->i_ino != tp->t_mountp->m_sb.sb_rootino) {
+			error = xfs_dir_replace(tp, ip, &xfs_name_dotdot,
+					tp->t_mountp->m_sb.sb_rootino, 0);
+			if (error)
+				return error;
+		}
+	} else {
+		/*
+		 * When removing a non-directory we need to log the parent
+		 * inode here. For a directory this is done implicitly
+		 * by the xfs_droplink call for the ".." entry.
+		 */
+		xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
+	}
+	xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
+
+	/* Drop the link from dp to ip. */
+	error = xfs_droplink(tp, ip);
+	if (error)
+		return error;
+
+	error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks);
+	if (error) {
+		ASSERT(error != -ENOENT);
+		return error;
+	}
+
+	return 0;
+}
diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h
index 4afade8b087..e35deb273d8 100644
--- a/libxfs/xfs_dir2.h
+++ b/libxfs/xfs_dir2.h
@@ -259,5 +259,8 @@ int xfs_dir_create_new_child(struct xfs_trans *tp, uint resblks,
 int xfs_dir_link_existing_child(struct xfs_trans *tp, uint resblks,
 		struct xfs_inode *dp, struct xfs_name *name,
 		struct xfs_inode *ip);
+int xfs_dir_remove_child(struct xfs_trans *tp, uint resblks,
+		struct xfs_inode *dp, struct xfs_name *name,
+		struct xfs_inode *ip);
 
 #endif	/* __XFS_DIR2_H__ */

^ permalink raw reply related [flat|nested] 565+ messages in thread
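The link-count bookkeeping in xfs_dir_remove_child above can be sketched with a toy model. This is only an illustration under stated assumptions, not libxfs code: struct toy_inode, its fields, and toy_remove_child() are hypothetical stand-ins, and the transaction, the locking, and the on-disk directory update are all left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; only the link-count state is kept. */
struct toy_inode {
	unsigned int	nlink;
	bool		is_dir;
	unsigned int	nentries;	/* entries other than "." and ".." */
};

/*
 * Remove child @ip from directory @dp, following the same order of checks
 * as the helper: validate an empty directory first, then drop links.
 * Returns 0 on success or -ENOTEMPTY.
 */
static int
toy_remove_child(
	struct toy_inode	*dp,
	struct toy_inode	*ip)
{
	if (ip->is_dir) {
		if (ip->nlink != 2 || ip->nentries != 0)
			return -ENOTEMPTY;
		dp->nlink--;	/* the child's ".." no longer references dp */
		ip->nlink--;	/* the child's "." self reference */
	}
	ip->nlink--;		/* the dirent in dp that named ip */
	dp->nentries--;
	return 0;
}
```

In this model, removing a regular file leaves the parent's link count alone, while removing an empty subdirectory takes one link from the parent and both links from the child.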
* [PATCH 20/26] xfs: hoist inode free function to libxfs
  2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong
                     ` (17 preceding siblings ...)
  2022-12-30 22:19   ` [PATCH 21/26] xfs: create libxfs helper to remove an existing inode/name from a directory Darrick J. Wong
@ 2022-12-30 22:19   ` Darrick J. Wong
  2022-12-30 22:19   ` [PATCH 19/26] xfs: create libxfs helper to link an existing inode into a directory Darrick J. Wong
                     ` (6 subsequent siblings)
  25 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a libxfs helper function that marks an inode free on disk.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_inode_util.c |   50 +++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_inode_util.h |    5 +++++
 2 files changed, 55 insertions(+)

diff --git a/libxfs/xfs_inode_util.c b/libxfs/xfs_inode_util.c
index e12c43954cf..65c025f3573 100644
--- a/libxfs/xfs_inode_util.c
+++ b/libxfs/xfs_inode_util.c
@@ -643,3 +643,53 @@ xfs_bumplink(
 	inc_nlink(VFS_I(ip));
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 }
+
+/* Mark an inode free on disk. */
+int
+xfs_dir_ifree(
+	struct xfs_trans	*tp,
+	struct xfs_perag	*pag,
+	struct xfs_inode	*ip,
+	struct xfs_icluster	*xic)
+{
+	int			error;
+
+	/*
+	 * Free the inode first so that we guarantee that the AGI lock is going
+	 * to be taken before we remove the inode from the unlinked list. This
+	 * makes the AGI lock -> unlinked list modification order the same as
+	 * used in O_TMPFILE creation.
+	 */
+	error = xfs_difree(tp, pag, ip->i_ino, xic);
+	if (error)
+		return error;
+
+	error = xfs_iunlink_remove(tp, pag, ip);
+	if (error)
+		return error;
+
+	/*
+	 * Free any local-format data sitting around before we reset the
+	 * data fork to extents format. Note that the attr fork data has
+	 * already been freed by xfs_attr_inactive.
+	 */
+	if (ip->i_df.if_format == XFS_DINODE_FMT_LOCAL) {
+		kmem_free(ip->i_df.if_u1.if_data);
+		ip->i_df.if_u1.if_data = NULL;
+		ip->i_df.if_bytes = 0;
+	}
+
+	VFS_I(ip)->i_mode = 0;		/* mark incore inode as free */
+	ip->i_diflags = 0;
+	ip->i_diflags2 = ip->i_mount->m_ino_geo.new_diflags2;
+	ip->i_forkoff = 0;		/* mark the attr fork not in use */
+	ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS;
+
+	/*
+	 * Bump the generation count so no one will be confused
+	 * by reincarnations of this inode.
+	 */
+	VFS_I(ip)->i_generation++;
+	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	return 0;
+}
diff --git a/libxfs/xfs_inode_util.h b/libxfs/xfs_inode_util.h
index f92b14a6fbe..fcddaa6f738 100644
--- a/libxfs/xfs_inode_util.h
+++ b/libxfs/xfs_inode_util.h
@@ -6,6 +6,8 @@
 #ifndef	__XFS_INODE_UTIL_H__
 #define	__XFS_INODE_UTIL_H__
 
+struct xfs_icluster;
+
 uint16_t	xfs_flags2diflags(struct xfs_inode *ip, unsigned int xflags);
 uint64_t	xfs_flags2diflags2(struct xfs_inode *ip, unsigned int xflags);
 uint32_t	xfs_dic2xflags(struct xfs_inode *ip);
@@ -56,6 +58,9 @@ void xfs_trans_ichgtime(struct xfs_trans *tp, struct xfs_inode *ip, int flags);
 void xfs_inode_init(struct xfs_trans *tp, const struct xfs_icreate_args *args,
 		struct xfs_inode *ip);
 
+int xfs_dir_ifree(struct xfs_trans *tp, struct xfs_perag *pag,
+		struct xfs_inode *ip, struct xfs_icluster *xic);
+
 int xfs_iunlink(struct xfs_trans *tp, struct xfs_inode *ip);
 int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
 		struct xfs_inode *ip);

^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 19/26] xfs: create libxfs helper to link an existing inode into a directory
  2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong
                     ` (18 preceding siblings ...)
  2022-12-30 22:19   ` [PATCH 20/26] xfs: hoist inode free function to libxfs Darrick J. Wong
@ 2022-12-30 22:19   ` Darrick J. Wong
  2022-12-30 22:19   ` [PATCH 18/26] xfs: create libxfs helper to link a new " Darrick J. Wong
                     ` (5 subsequent siblings)
  25 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a new libxfs function to link an existing inode into a directory.
The upcoming metadata directory feature will need this to create a
metadata directory tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_dir2.c |   51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_dir2.h |    3 +++
 2 files changed, 54 insertions(+)

diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index ed52969b45d..8755ef615f8 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -21,6 +21,7 @@
 #include "xfs_shared.h"
 #include "xfs_bmap_btree.h"
 #include "xfs_trans_space.h"
+#include "xfs_ag.h"
 
 const struct xfs_name xfs_name_dotdot = {
 	.name	= (const unsigned char *)"..",
@@ -825,3 +826,53 @@ xfs_dir_create_new_child(
 	xfs_bumplink(tp, dp);
 	return 0;
 }
+
+/*
+ * Given a directory @dp, an existing non-directory inode @ip, and a @name,
+ * link @ip into @dp under the given @name. Both inodes must have the ILOCK
+ * held.
+ */
+int
+xfs_dir_link_existing_child(
+	struct xfs_trans	*tp,
+	uint			resblks,
+	struct xfs_inode	*dp,
+	struct xfs_name		*name,
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	int			error;
+
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+	ASSERT(xfs_isilocked(dp, XFS_ILOCK_EXCL));
+	ASSERT(!S_ISDIR(VFS_I(ip)->i_mode));
+
+	if (!resblks) {
+		error = xfs_dir_canenter(tp, dp, name);
+		if (error)
+			return error;
+	}
+
+	/*
+	 * Handle initial link state of O_TMPFILE inode
+	 */
+	if (VFS_I(ip)->i_nlink == 0) {
+		struct xfs_perag	*pag;
+
+		pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
+		error = xfs_iunlink_remove(tp, pag, ip);
+		xfs_perag_put(pag);
+		if (error)
+			return error;
+	}
+
+	error = xfs_dir_createname(tp, dp, name, ip->i_ino, resblks);
+	if (error)
+		return error;
+
+	xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
+	xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
+
+	xfs_bumplink(tp, ip);
+	return 0;
+}
diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h
index d3e7607c0e9..4afade8b087 100644
--- a/libxfs/xfs_dir2.h
+++ b/libxfs/xfs_dir2.h
@@ -256,5 +256,8 @@ bool xfs_dir2_namecheck(const void *name, size_t length);
 int xfs_dir_create_new_child(struct xfs_trans *tp, uint resblks,
 		struct xfs_inode *dp, struct xfs_name *name,
 		struct xfs_inode *ip);
+int xfs_dir_link_existing_child(struct xfs_trans *tp, uint resblks,
+		struct xfs_inode *dp, struct xfs_name *name,
+		struct xfs_inode *ip);
 
 #endif	/* __XFS_DIR2_H__ */

^ permalink raw reply related [flat|nested] 565+ messages in thread
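The O_TMPFILE special case in xfs_dir_link_existing_child above can be reduced to a small state change. The sketch below is a toy model under stated assumptions: struct toy_inode, its on_unlinked_list flag, and toy_link_existing() are hypothetical names, and the directory entry creation and the transaction are elided.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an inode; not a libxfs structure. */
struct toy_inode {
	unsigned int	nlink;
	bool		on_unlinked_list;
};

/* Give an existing non-directory inode one more name. */
static void
toy_link_existing(
	struct toy_inode	*ip)
{
	/*
	 * A link count of zero models an O_TMPFILE inode that is still
	 * parked on the unlinked list; take it off before it gains a name.
	 */
	if (ip->nlink == 0) {
		assert(ip->on_unlinked_list);
		ip->on_unlinked_list = false;
	}
	ip->nlink++;
}
```

The first link of a tmpfile therefore does two things, while every later hard link only bumps the count.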
* [PATCH 18/26] xfs: create libxfs helper to link a new inode into a directory
  2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong
                     ` (19 preceding siblings ...)
  2022-12-30 22:19   ` [PATCH 19/26] xfs: create libxfs helper to link an existing inode into a directory Darrick J. Wong
@ 2022-12-30 22:19   ` Darrick J. Wong
  2022-12-30 22:19   ` [PATCH 16/26] xfs: hoist xfs_iunlink to libxfs Darrick J. Wong
                     ` (4 subsequent siblings)
  25 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a new libxfs function to link a newly created inode into a
directory. The upcoming metadata directory feature will need this to
create a metadata directory tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_dir2.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_dir2.h |    4 ++++
 2 files changed, 49 insertions(+)

diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 460b64339d7..ed52969b45d 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -18,6 +18,9 @@
 #include "xfs_errortag.h"
 #include "xfs_trace.h"
 #include "xfs_health.h"
+#include "xfs_shared.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_trans_space.h"
 
 const struct xfs_name xfs_name_dotdot = {
 	.name	= (const unsigned char *)"..",
@@ -780,3 +783,45 @@ xfs_dir2_compname(
 		return xfs_ascii_ci_compname(args, name, len);
 	return xfs_da_compname(args, name, len);
 }
+
+/*
+ * Given a directory @dp, a newly allocated inode @ip, and a @name, link @ip
+ * into @dp under the given @name. If @ip is a directory, it will be
+ * initialized. Both inodes must have the ILOCK held and the transaction must
+ * have sufficient blocks reserved.
+ */
+int
+xfs_dir_create_new_child(
+	struct xfs_trans	*tp,
+	uint			resblks,
+	struct xfs_inode	*dp,
+	struct xfs_name		*name,
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	int			error;
+
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
+	ASSERT(xfs_isilocked(dp, XFS_ILOCK_EXCL));
+	ASSERT(resblks == 0 || resblks > XFS_IALLOC_SPACE_RES(mp));
+
+	error = xfs_dir_createname(tp, dp, name, ip->i_ino,
+					resblks - XFS_IALLOC_SPACE_RES(mp));
+	if (error) {
+		ASSERT(error != -ENOSPC);
+		return error;
+	}
+
+	xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
+	xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
+
+	if (!S_ISDIR(VFS_I(ip)->i_mode))
+		return 0;
+
+	error = xfs_dir_init(tp, ip, dp);
+	if (error)
+		return error;
+
+	xfs_bumplink(tp, dp);
+	return 0;
+}
diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h
index 7322284f61a..d3e7607c0e9 100644
--- a/libxfs/xfs_dir2.h
+++ b/libxfs/xfs_dir2.h
@@ -253,4 +253,8 @@ unsigned int xfs_dir3_data_end_offset(struct xfs_da_geometry *geo,
 		struct xfs_dir2_data_hdr *hdr);
 bool xfs_dir2_namecheck(const void *name, size_t length);
 
+int xfs_dir_create_new_child(struct xfs_trans *tp, uint resblks,
+		struct xfs_inode *dp, struct xfs_name *name,
+		struct xfs_inode *ip);
+
 #endif	/* __XFS_DIR2_H__ */

^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 16/26] xfs: hoist xfs_iunlink to libxfs
  2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong
                     ` (20 preceding siblings ...)
  2022-12-30 22:19   ` [PATCH 18/26] xfs: create libxfs helper to link a new " Darrick J. Wong
@ 2022-12-30 22:19   ` Darrick J. Wong
  2022-12-30 22:19   ` [PATCH 23/26] xfs: create libxfs helper to rename two directory entries Darrick J. Wong
                     ` (3 subsequent siblings)
  25 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move xfs_iunlink and xfs_iunlink_remove to libxfs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 include/xfs_inode.h     |    1 
 include/xfs_trace.h     |    6 +
 libxfs/Makefile         |    2 
 libxfs/inode.c          |    2 
 libxfs/iunlink.c        |  126 ++++++++++++++++++++++
 libxfs/iunlink.h        |   22 ++++
 libxfs/libxfs_priv.h    |   28 +++++
 libxfs/xfs_inode_util.c |  275 +++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_inode_util.h |    4 +
 9 files changed, 466 insertions(+)
 create mode 100644 libxfs/iunlink.c
 create mode 100644 libxfs/iunlink.h

diff --git a/include/xfs_inode.h b/include/xfs_inode.h
index 5c806b3a58c..234f8d3affa 100644
--- a/include/xfs_inode.h
+++ b/include/xfs_inode.h
@@ -135,6 +135,7 @@ typedef struct xfs_inode {
 
 	/* unlinked list pointers */
 	xfs_agino_t		i_next_unlinked;
+	xfs_agino_t		i_prev_unlinked;
 
 	xfs_extnum_t		i_cnextents;	/* # of extents in cow fork */
 	unsigned int		i_cformat;	/* format of cow fork */
diff --git a/include/xfs_trace.h b/include/xfs_trace.h
index a6ba6fc93bf..d94d8d29bed 100644
--- a/include/xfs_trace.h
+++ b/include/xfs_trace.h
@@ -349,4 +349,10 @@
 #define trace_xfs_perag_get_tag(a,b,c,d)	((c) = (c))
 #define trace_xfs_perag_put(a,b,c,d)		((c) = (c))
 
+#define trace_xfs_iunlink_update_bucket(...)	((void) 0)
+#define trace_xfs_iunlink_update_dinode(...)	((void) 0)
+#define trace_xfs_iunlink(...)			((void) 0)
+#define trace_xfs_iunlink_remove(...)		((void) 0)
+#define trace_xfs_iunlink_map_prev_fallback(...)	((void) 0)
+
 #endif /* __TRACE_H__ */
diff --git a/libxfs/Makefile b/libxfs/Makefile
index f9bc82cc9e8..94f5968e862 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -23,6 +23,7 @@ HFILES = \
 	libxfs_io.h \
 	libxfs_api_defs.h \
 	init.h \
+	iunlink.h \
 	libxfs_priv.h \
 	linux-err.h \
 	topology.h \
@@ -65,6 +66,7 @@ CFILES = cache.c \
 	defer_item.c \
 	init.c \
 	inode.c \
+	iunlink.c \
 	kmem.c \
 	logitem.c \
 	rdwr.c \
diff --git a/libxfs/inode.c b/libxfs/inode.c
index 8ef2b654769..1a27016a763 100644
--- a/libxfs/inode.c
+++ b/libxfs/inode.c
@@ -169,6 +169,8 @@ libxfs_iget(
 	ip->i_mount = mp;
 	ip->i_diflags2 = mp->m_ino_geo.new_diflags2;
 	ip->i_af.if_format = XFS_DINODE_FMT_EXTENTS;
+	ip->i_next_unlinked = NULLAGINO;
+	ip->i_prev_unlinked = NULLAGINO;
 	spin_lock_init(&VFS_I(ip)->i_lock);
 
 	error = xfs_imap(mp, tp, ip->i_ino, &ip->i_imap, 0);
diff --git a/libxfs/iunlink.c b/libxfs/iunlink.c
new file mode 100644
index 00000000000..2123dfdcbbf
--- /dev/null
+++ b/libxfs/iunlink.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2020-2022, Red Hat, Inc.
+ * All Rights Reserved.
+ */
+
+#include "libxfs_priv.h"
+#include "libxfs.h"
+#include "libxfs_io.h"
+#include "init.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_trans.h"
+#include "xfs_ag.h"
+#include "iunlink.h"
+#include "xfs_trace.h"
+
+/* in memory log item structure */
+struct xfs_iunlink_item {
+	struct xfs_inode	*ip;
+	struct xfs_perag	*pag;
+	xfs_agino_t		next_agino;
+	xfs_agino_t		old_agino;
+};
+
+/*
+ * Look up the inode cluster buffer and log the on-disk unlinked inode change
+ * we need to make.
+ */
+static int
+xfs_iunlink_log_dinode(
+	struct xfs_trans	*tp,
+	struct xfs_iunlink_item	*iup)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_inode	*ip = iup->ip;
+	struct xfs_dinode	*dip;
+	struct xfs_buf		*ibp;
+	int			offset;
+	int			error;
+
+	error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &ibp);
+	if (error)
+		return error;
+	/*
+	 * Don't log the unlinked field on stale buffers as this may be the
+	 * transaction that frees the inode cluster and relogging the buffer
+	 * here will incorrectly remove the stale state.
+	 */
+	if (ibp->b_flags & LIBXFS_B_STALE)
+		goto out;
+
+	dip = xfs_buf_offset(ibp, ip->i_imap.im_boffset);
+
+	/* Make sure the old pointer isn't garbage. */
+	if (be32_to_cpu(dip->di_next_unlinked) != iup->old_agino) {
+		xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip,
+				sizeof(*dip), __this_address);
+		error = -EFSCORRUPTED;
+		goto out;
+	}
+
+	trace_xfs_iunlink_update_dinode(mp, iup->pag->pag_agno,
+			XFS_INO_TO_AGINO(mp, ip->i_ino),
+			be32_to_cpu(dip->di_next_unlinked), iup->next_agino);
+
+	dip->di_next_unlinked = cpu_to_be32(iup->next_agino);
+	offset = ip->i_imap.im_boffset +
+			offsetof(struct xfs_dinode, di_next_unlinked);
+
+	xfs_dinode_calc_crc(mp, dip);
+	xfs_trans_inode_buf(tp, ibp);
+	xfs_trans_log_buf(tp, ibp, offset, offset + sizeof(xfs_agino_t) - 1);
+	return 0;
+out:
+	xfs_trans_brelse(tp, ibp);
+	return error;
+}
+
+/*
+ * Initialize the inode log item for a newly allocated (in-core) inode.
+ *
+ * Inode extents can only reside within an AG. Hence specify the starting
+ * block for the inode chunk by offset within an AG as well as the
+ * length of the allocated extent.
+ *
+ * This joins the item to the transaction and marks it dirty so
+ * that we don't need a separate call to do this, nor does the
+ * caller need to know anything about the iunlink item.
+ */
+int
+xfs_iunlink_log_inode(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	struct xfs_perag	*pag,
+	xfs_agino_t		next_agino)
+{
+	struct xfs_iunlink_item	iup = {
+		.ip		= ip,
+		.pag		= pag,
+		.next_agino	= next_agino,
+		.old_agino	= ip->i_next_unlinked,
+	};
+
+	ASSERT(xfs_verify_agino_or_null(pag, next_agino));
+	ASSERT(xfs_verify_agino_or_null(pag, ip->i_next_unlinked));
+
+	/*
+	 * Since we're updating a linked list, we should never find that the
+	 * current pointer is the same as the new value, unless we're
+	 * terminating the list.
+	 */
+	if (ip->i_next_unlinked == next_agino) {
+		if (next_agino != NULLAGINO)
+			return -EFSCORRUPTED;
+		return 0;
+	}
+
+	return xfs_iunlink_log_dinode(tp, &iup);
+}
+
diff --git a/libxfs/iunlink.h b/libxfs/iunlink.h
new file mode 100644
index 00000000000..fec6a515181
--- /dev/null
+++ b/libxfs/iunlink.h
@@ -0,0 +1,22 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2020-2022, Red Hat, Inc.
+ * All Rights Reserved.
+ */
+#ifndef XFS_IUNLINK_ITEM_H
+#define XFS_IUNLINK_ITEM_H	1
+
+struct xfs_trans;
+struct xfs_inode;
+struct xfs_perag;
+
+static inline struct xfs_inode *
+xfs_iunlink_lookup(struct xfs_perag *pag, xfs_agino_t agino)
+{
+	return NULL;
+}
+
+int xfs_iunlink_log_inode(struct xfs_trans *tp, struct xfs_inode *ip,
+			struct xfs_perag *pag, xfs_agino_t next_agino);
+
+#endif	/* XFS_IUNLINK_ITEM_H */
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index acad5ccd228..90335331cde 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -518,6 +518,8 @@ void __xfs_buf_mark_corrupt(struct xfs_buf *bp, xfs_failaddr_t fa);
 #define xfs_filestream_lookup_ag(ip)		(0)
 #define xfs_filestream_new_ag(ip,ag)		(0)
 
+#define xfs_trans_inode_buf(tp, bp)		((void) 0)
+
 /* quota bits */
 #define xfs_trans_mod_dquot_byino(t,i,f,d)		((void) 0)
 #define xfs_trans_reserve_quota_nblks(t,i,b,n,f)	(0)
@@ -690,6 +692,32 @@ static inline void xfs_buf_hash_destroy(struct xfs_perag *pag) { }
 static inline int xfs_iunlink_init(struct xfs_perag *pag) { return 0; }
 static inline void xfs_iunlink_destroy(struct xfs_perag *pag) { }
 
+static inline xfs_agino_t
+xfs_iunlink_lookup_backref(
+	struct xfs_perag	*pag,
+	xfs_agino_t		agino)
+{
+	return NULLAGINO;
+}
+
+static inline int
+xfs_iunlink_add_backref(
+	struct xfs_perag	*pag,
+	xfs_agino_t		prev_agino,
+	xfs_agino_t		this_agino)
+{
+	return 0;
+}
+
+static inline int
+xfs_iunlink_change_backref(
+	struct xfs_perag	*pag,
+	xfs_agino_t		agino,
+	xfs_agino_t		next_unlinked)
+{
+	return 0;
+}
+
 xfs_agnumber_t xfs_set_inode_alloc(struct xfs_mount *mp,
 		xfs_agnumber_t agcount);
 
diff --git a/libxfs/xfs_inode_util.c b/libxfs/xfs_inode_util.c
index 21196a899da..4b19edd9ab1 100644
--- a/libxfs/xfs_inode_util.c
+++ b/libxfs/xfs_inode_util.c
@@ -17,6 +17,9 @@
 #include "xfs_ialloc.h"
 #include "xfs_health.h"
 #include "xfs_bmap.h"
+#include "xfs_trace.h"
+#include "xfs_ag.h"
+#include "iunlink.h"
 
 uint16_t
 xfs_flags2diflags(
@@ -333,3 +336,275 @@ xfs_inode_init(
 	/* now that we have an i_mode we can setup the inode structure */
 	xfs_setup_inode(ip);
 }
+
+/*
+ * In-Core Unlinked List Lookups
+ * =============================
+ *
+ * Every inode is supposed to be reachable from some other piece of metadata
+ * with the exception of the root directory. Inodes with a connection to a
+ * file descriptor but not linked from anywhere in the on-disk directory tree
+ * are collectively known as unlinked inodes, though the filesystem itself
+ * maintains links to these inodes so that on-disk metadata are consistent.
+ *
+ * XFS implements a per-AG on-disk hash table of unlinked inodes. The AGI
+ * header contains a number of buckets that point to an inode, and each inode
+ * record has a pointer to the next inode in the hash chain. This
+ * singly-linked list causes scaling problems in the iunlink remove function
+ * because we must walk that list to find the inode that points to the inode
+ * being removed from the unlinked hash bucket list.
+ *
+ * Hence we keep an in-memory double linked list to link each inode on an
+ * unlinked list. Because there are 64 unlinked lists per AGI, keeping pointer
+ * based lists would require having 64 list heads in the perag, one for each
+ * list. This is expensive in terms of memory (think millions of AGs) and cache
+ * misses on lookups. Instead, use the fact that inodes on the unlinked list
+ * must be referenced at the VFS level to keep them on the list and hence we
+ * have an existence guarantee for inodes on the unlinked list.
+ *
+ * Given we have an existence guarantee, we can use lockless inode cache lookups
+ * to resolve aginos to xfs inodes. This means we only need 8 bytes per inode
+ * for the double linked unlinked list, and we don't need any extra locking to
+ * keep the list safe as all manipulations are done under the AGI buffer lock.
+ * Keeping the list up to date does not require memory allocation, just finding
+ * the XFS inode and updating the next/prev unlinked list aginos.
+ */
+
+/* Update the prev pointer of the next agino. */
+static int
+xfs_iunlink_update_backref(
+	struct xfs_perag	*pag,
+	xfs_agino_t		prev_agino,
+	xfs_agino_t		next_agino)
+{
+	struct xfs_inode	*ip;
+
+	/* No update necessary if we are at the end of the list. */
+	if (next_agino == NULLAGINO)
+		return 0;
+
+	ip = xfs_iunlink_lookup(pag, next_agino);
+	if (!ip) {
+		xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
+		return -EFSCORRUPTED;
+	}
+
+	ip->i_prev_unlinked = prev_agino;
+	return 0;
+}
+
+/*
+ * Point the AGI unlinked bucket at an inode and log the results. The caller
+ * is responsible for validating the old value.
+ */
+STATIC int
+xfs_iunlink_update_bucket(
+	struct xfs_trans	*tp,
+	struct xfs_perag	*pag,
+	struct xfs_buf		*agibp,
+	unsigned int		bucket_index,
+	xfs_agino_t		new_agino)
+{
+	struct xfs_agi		*agi = agibp->b_addr;
+	xfs_agino_t		old_value;
+	int			offset;
+
+	ASSERT(xfs_verify_agino_or_null(pag, new_agino));
+
+	old_value = be32_to_cpu(agi->agi_unlinked[bucket_index]);
+	trace_xfs_iunlink_update_bucket(tp->t_mountp, pag->pag_agno, bucket_index,
+			old_value, new_agino);
+
+	/*
+	 * We should never find the head of the list already set to the value
+	 * passed in because either we're adding or removing ourselves from the
+	 * head of the list.
+	 */
+	if (old_value == new_agino) {
+		xfs_buf_mark_corrupt(agibp);
+		xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
+		return -EFSCORRUPTED;
+	}
+
+	agi->agi_unlinked[bucket_index] = cpu_to_be32(new_agino);
+	offset = offsetof(struct xfs_agi, agi_unlinked) +
+			(sizeof(xfs_agino_t) * bucket_index);
+	xfs_trans_log_buf(tp, agibp, offset, offset + sizeof(xfs_agino_t) - 1);
+	return 0;
+}
+
+static int
+xfs_iunlink_insert_inode(
+	struct xfs_trans	*tp,
+	struct xfs_perag	*pag,
+	struct xfs_buf		*agibp,
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_agi		*agi = agibp->b_addr;
+	xfs_agino_t		next_agino;
+	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
+	short			bucket_index = agino % XFS_AGI_UNLINKED_BUCKETS;
+	int			error;
+
+	/*
+	 * Get the index into the agi hash table for the list this inode will
+	 * go on. Make sure the pointer isn't garbage and that this inode
+	 * isn't already on the list.
+	 */
+	next_agino = be32_to_cpu(agi->agi_unlinked[bucket_index]);
+	if (next_agino == agino ||
+	    !xfs_verify_agino_or_null(pag, next_agino)) {
+		xfs_buf_mark_corrupt(agibp);
+		xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
+		return -EFSCORRUPTED;
+	}
+
+	/*
+	 * Update the prev pointer in the next inode to point back to this
+	 * inode.
+	 */
+	error = xfs_iunlink_update_backref(pag, agino, next_agino);
+	if (error)
+		return error;
+
+	if (next_agino != NULLAGINO) {
+		/*
+		 * There is already another inode in the bucket, so point this
+		 * inode to the current head of the list.
+		 */
+		error = xfs_iunlink_log_inode(tp, ip, pag, next_agino);
+		if (error)
+			return error;
+		ip->i_next_unlinked = next_agino;
+	}
+
+	/* Point the head of the list to point to this inode. */
+	ip->i_prev_unlinked = NULLAGINO;
+	return xfs_iunlink_update_bucket(tp, pag, agibp, bucket_index, agino);
+}
+
+/*
+ * This is called when the inode's link count has gone to 0 or we are creating
+ * a tmpfile via O_TMPFILE. The inode @ip must have nlink == 0.
+ *
+ * We place the on-disk inode on a list in the AGI. It will be pulled from this
+ * list when the inode is freed.
+ */
+int
+xfs_iunlink(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_perag	*pag;
+	struct xfs_buf		*agibp;
+	int			error;
+
+	ASSERT(VFS_I(ip)->i_nlink == 0);
+	ASSERT(VFS_I(ip)->i_mode != 0);
+	trace_xfs_iunlink(ip);
+
+	pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino));
+
+	/* Get the agi buffer first. It ensures lock ordering on the list. */
+	error = xfs_read_agi(pag, tp, &agibp);
+	if (error)
+		goto out;
+
+	error = xfs_iunlink_insert_inode(tp, pag, agibp, ip);
+out:
+	xfs_perag_put(pag);
+	return error;
+}
+
+static int
+xfs_iunlink_remove_inode(
+	struct xfs_trans	*tp,
+	struct xfs_perag	*pag,
+	struct xfs_buf		*agibp,
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = tp->t_mountp;
+	struct xfs_agi		*agi = agibp->b_addr;
+	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
+	xfs_agino_t		head_agino;
+	short			bucket_index = agino % XFS_AGI_UNLINKED_BUCKETS;
+	int			error;
+
+	trace_xfs_iunlink_remove(ip);
+
+	/*
+	 * Get the index into the agi hash table for the list this inode will
+	 * go on. Make sure the head pointer isn't garbage.
+	 */
+	head_agino = be32_to_cpu(agi->agi_unlinked[bucket_index]);
+	if (!xfs_verify_agino(pag, head_agino)) {
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+				agi, sizeof(*agi));
+		xfs_ag_mark_sick(pag, XFS_SICK_AG_AGI);
+		return -EFSCORRUPTED;
+	}
+
+	/*
+	 * Set our inode's next_unlinked pointer to NULL and then return
+	 * the old pointer value so that we can update whatever was previous
+	 * to us in the list to point to whatever was next in the list.
+	 */
+	error = xfs_iunlink_log_inode(tp, ip, pag, NULLAGINO);
+	if (error)
+		return error;
+
+	/*
+	 * Update the prev pointer in the next inode to point back to previous
+	 * inode in the chain.
+	 */
+	error = xfs_iunlink_update_backref(pag, ip->i_prev_unlinked,
+			ip->i_next_unlinked);
+	if (error)
+		return error;
+
+	if (head_agino != agino) {
+		struct xfs_inode	*prev_ip;
+
+		prev_ip = xfs_iunlink_lookup(pag, ip->i_prev_unlinked);
+		if (!prev_ip) {
+			xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE);
+			return -EFSCORRUPTED;
+		}
+
+		error = xfs_iunlink_log_inode(tp, prev_ip, pag,
+				ip->i_next_unlinked);
+		prev_ip->i_next_unlinked = ip->i_next_unlinked;
+	} else {
+		/* Point the head of the list to the next unlinked inode. */
+		error = xfs_iunlink_update_bucket(tp, pag, agibp, bucket_index,
+				ip->i_next_unlinked);
+	}
+
+	ip->i_next_unlinked = NULLAGINO;
+	ip->i_prev_unlinked = 0;
+	return error;
+}
+
+/*
+ * Pull the on-disk inode from the AGI unlinked list.
+ */
+int
+xfs_iunlink_remove(
+	struct xfs_trans	*tp,
+	struct xfs_perag	*pag,
+	struct xfs_inode	*ip)
+{
+	struct xfs_buf		*agibp;
+	int			error;
+
+	trace_xfs_iunlink_remove(ip);
+
+	/* Get the agi buffer first. It ensures lock ordering on the list. */
+	error = xfs_read_agi(pag, tp, &agibp);
+	if (error)
+		return error;
+
+	return xfs_iunlink_remove_inode(tp, pag, agibp, ip);
+}
diff --git a/libxfs/xfs_inode_util.h b/libxfs/xfs_inode_util.h
index a73ccaea558..e15cf94e094 100644
--- a/libxfs/xfs_inode_util.h
+++ b/libxfs/xfs_inode_util.h
@@ -56,6 +56,10 @@ void xfs_trans_ichgtime(struct xfs_trans *tp, struct xfs_inode *ip, int flags);
 void xfs_inode_init(struct xfs_trans *tp, const struct xfs_icreate_args *args,
 		struct xfs_inode *ip);
 
+int xfs_iunlink(struct xfs_trans *tp, struct xfs_inode *ip);
+int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
+		struct xfs_inode *ip);
+
 /* The libxfs client must provide this group of helper functions. */
 
 /* Handle legacy Irix sgid inheritance quirks. */

^ permalink raw reply related [flat|nested] 565+ messages in thread
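The "In-Core Unlinked List Lookups" comment in the patch above describes bucketed lists whose in-memory entries carry both next and prev inode numbers, so that removal does not have to walk a chain. A minimal sketch of that idea follows. It is a toy under stated assumptions: struct toy_ag, struct toy_inode, NINODES, NULLINO, and the toy_* functions are hypothetical, a flat array stands in for the inode cache lookup, and nothing is logged or written to disk. The sketch also resets both pointers to the sentinel on removal, which is a simplification.

```c
#include <assert.h>
#include <stdint.h>

#define NBUCKETS	64u		/* unlinked lists per AGI, per the comment */
#define NINODES		256u		/* toy AG size, an assumption */
#define NULLINO		UINT32_MAX	/* stand-in for NULLAGINO */

/* Toy in-core inode: only the two list pointers matter here. */
struct toy_inode {
	uint32_t	next_unlinked;
	uint32_t	prev_unlinked;
};

/* Toy AG: bucket heads plus an array that plays the role of the lookup. */
struct toy_ag {
	uint32_t		bucket[NBUCKETS];
	struct toy_inode	inodes[NINODES];
};

static void
toy_ag_init(struct toy_ag *ag)
{
	for (uint32_t i = 0; i < NBUCKETS; i++)
		ag->bucket[i] = NULLINO;
	for (uint32_t i = 0; i < NINODES; i++) {
		ag->inodes[i].next_unlinked = NULLINO;
		ag->inodes[i].prev_unlinked = NULLINO;
	}
}

/* Push @agino onto the head of its bucket and fix the old head's backref. */
static void
toy_iunlink(struct toy_ag *ag, uint32_t agino)
{
	uint32_t	b = agino % NBUCKETS;
	uint32_t	head = ag->bucket[b];

	assert(agino < NINODES && head != agino);
	if (head != NULLINO)
		ag->inodes[head].prev_unlinked = agino;
	ag->inodes[agino].next_unlinked = head;
	ag->inodes[agino].prev_unlinked = NULLINO;
	ag->bucket[b] = agino;
}

/* Unhook @agino without walking the chain from the bucket head. */
static void
toy_iunlink_remove(struct toy_ag *ag, uint32_t agino)
{
	uint32_t		b = agino % NBUCKETS;
	struct toy_inode	*ip = &ag->inodes[agino];

	if (ip->next_unlinked != NULLINO)
		ag->inodes[ip->next_unlinked].prev_unlinked = ip->prev_unlinked;
	if (ag->bucket[b] == agino)
		ag->bucket[b] = ip->next_unlinked;
	else
		ag->inodes[ip->prev_unlinked].next_unlinked = ip->next_unlinked;
	ip->next_unlinked = NULLINO;
	ip->prev_unlinked = NULLINO;
}
```

Inode numbers 5, 69, and 133 all hash to bucket 5 in this toy, so unlinking them in that order and then removing 69 exercises the middle-of-chain case that the singly-linked on-disk list makes expensive.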
* [PATCH 23/26] xfs: create libxfs helper to rename two directory entries 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (21 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 16/26] xfs: hoist xfs_iunlink to libxfs Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 25/26] xfs_repair: use library functions to reset root/rbm/rsum inodes Darrick J. Wong ` (2 subsequent siblings) 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a new libxfs function to rename two directory entries. The upcoming metadata directory feature will need this to replace a metadata inode directory entry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_dir2.c | 203 +++++++++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_dir2.h | 5 + 2 files changed, 208 insertions(+) diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c index b0bb22ac506..ae91fc48d79 100644 --- a/libxfs/xfs_dir2.c +++ b/libxfs/xfs_dir2.c @@ -22,6 +22,7 @@ #include "xfs_bmap_btree.h" #include "xfs_trans_space.h" #include "xfs_ag.h" +#include "xfs_ialloc.h" const struct xfs_name xfs_name_dotdot = { .name = (const unsigned char *)"..", @@ -1057,3 +1058,205 @@ xfs_dir_exchange( return 0; } + +/* + * Given an entry (@src_name, @src_ip) in directory @src_dp, make the entry + * @target_name in directory @target_dp point to @src_ip and remove the + * original entry, cleaning up everything left behind. + * + * Cleanup involves dropping a link count on @target_ip, and either removing + * the (@src_name, @src_ip) entry from @src_dp or simply replacing the entry + * with (@src_name, @wip) if a whiteout inode @wip is supplied. + * + * All inodes must have the ILOCK held. We assume that if @src_ip is a + * directory then its '..' 
doesn't already point to @target_dp, and that @wip + * is a freshly allocated whiteout. + */ +int +xfs_dir_rename( + struct xfs_trans *tp, + struct xfs_inode *src_dp, + struct xfs_name *src_name, + struct xfs_inode *src_ip, + struct xfs_inode *target_dp, + struct xfs_name *target_name, + struct xfs_inode *target_ip, + unsigned int spaceres, + struct xfs_inode *wip) +{ + struct xfs_mount *mp = tp->t_mountp; + bool new_parent = (src_dp != target_dp); + bool src_is_directory; + int error; + + src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode); + + /* + * Check for expected errors before we dirty the transaction + * so we can return an error without a transaction abort. + */ + if (target_ip == NULL) { + /* + * If there's no space reservation, check the entry will + * fit before actually inserting it. + */ + if (!spaceres) { + error = xfs_dir_canenter(tp, target_dp, target_name); + if (error) + return error; + } + } else { + /* + * If target exists and it's a directory, check that whether + * it can be destroyed. + */ + if (S_ISDIR(VFS_I(target_ip)->i_mode) && + (!xfs_dir_isempty(target_ip) || + (VFS_I(target_ip)->i_nlink > 2))) + return -EEXIST; + } + + /* + * Directory entry creation below may acquire the AGF. Remove + * the whiteout from the unlinked list first to preserve correct + * AGI/AGF locking order. This dirties the transaction so failures + * after this point will abort and log recovery will clean up the + * mess. + * + * For whiteouts, we need to bump the link count on the whiteout + * inode. After this point, we have a real link, clear the tmpfile + * state flag from the inode so it doesn't accidentally get misused + * in future. + */ + if (wip) { + struct xfs_perag *pag; + + ASSERT(VFS_I(wip)->i_nlink == 0); + + pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, wip->i_ino)); + error = xfs_iunlink_remove(tp, pag, wip); + xfs_perag_put(pag); + if (error) + return error; + + xfs_bumplink(tp, wip); + } + + /* + * Set up the target. 
+ */ + if (target_ip == NULL) { + /* + * If target does not exist and the rename crosses + * directories, adjust the target directory link count + * to account for the ".." reference from the new entry. + */ + error = xfs_dir_createname(tp, target_dp, target_name, + src_ip->i_ino, spaceres); + if (error) + return error; + + xfs_trans_ichgtime(tp, target_dp, + XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + + if (new_parent && src_is_directory) { + xfs_bumplink(tp, target_dp); + } + } else { /* target_ip != NULL */ + /* + * Link the source inode under the target name. + * If the source inode is a directory and we are moving + * it across directories, its ".." entry will be + * inconsistent until we replace that down below. + * + * In case there is already an entry with the same + * name at the destination directory, remove it first. + */ + error = xfs_dir_replace(tp, target_dp, target_name, + src_ip->i_ino, spaceres); + if (error) + return error; + + xfs_trans_ichgtime(tp, target_dp, + XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + + /* + * Decrement the link count on the target since the target + * dir no longer points to it. + */ + error = xfs_droplink(tp, target_ip); + if (error) + return error; + + if (src_is_directory) { + /* + * Drop the link from the old "." entry. + */ + error = xfs_droplink(tp, target_ip); + if (error) + return error; + } + } /* target_ip != NULL */ + + /* + * Remove the source. + */ + if (new_parent && src_is_directory) { + /* + * Rewrite the ".." entry to point to the new + * directory. + */ + error = xfs_dir_replace(tp, src_ip, &xfs_name_dotdot, + target_dp->i_ino, spaceres); + ASSERT(error != -EEXIST); + if (error) + return error; + } + + /* + * We always want to hit the ctime on the source inode. + * + * This isn't strictly required by the standards since the source + * inode isn't really being changed, but old unix file systems did + * it and some incremental backup programs won't work without it. 
+ */ + xfs_trans_ichgtime(tp, src_ip, XFS_ICHGTIME_CHG); + xfs_trans_log_inode(tp, src_ip, XFS_ILOG_CORE); + + /* + * Adjust the link count on src_dp. This is necessary when + * renaming a directory, either within one parent when + * the target existed, or across two parent directories. + */ + if (src_is_directory && (new_parent || target_ip != NULL)) { + + /* + * Decrement link count on src_directory since the + * entry that's moved no longer points to it. + */ + error = xfs_droplink(tp, src_dp); + if (error) + return error; + } + + /* + * For whiteouts, we only need to update the source dirent with the + * inode number of the whiteout inode rather than removing it + * altogether. + */ + if (wip) + error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino, + spaceres); + else + error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino, + spaceres); + if (error) + return error; + + xfs_trans_ichgtime(tp, src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG); + xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE); + if (new_parent) + xfs_trans_log_inode(tp, target_dp, XFS_ILOG_CORE); + + return 0; +} diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h index f63390236f0..00b4642bc8a 100644 --- a/libxfs/xfs_dir2.h +++ b/libxfs/xfs_dir2.h @@ -266,5 +266,10 @@ int xfs_dir_exchange(struct xfs_trans *tp, struct xfs_inode *dp1, struct xfs_name *name1, struct xfs_inode *ip1, struct xfs_inode *dp2, struct xfs_name *name2, struct xfs_inode *ip2, unsigned int spaceres); +int xfs_dir_rename(struct xfs_trans *tp, struct xfs_inode *src_dp, + struct xfs_name *src_name, struct xfs_inode *src_ip, + struct xfs_inode *target_dp, struct xfs_name *target_name, + struct xfs_inode *target_ip, unsigned int spaceres, + struct xfs_inode *wip); #endif /* __XFS_DIR2_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
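Editorial sketch for the patch above: the link count bookkeeping in xfs_dir_rename() is easier to follow as a table of deltas. The toy model below only mirrors the branches visible in the diff; struct toy_rename_deltas and toy_rename_nlink_deltas() are hypothetical names and are not part of libxfs. It ignores transactions, locking, and error handling entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of link count changes; not a libxfs type. */
struct toy_rename_deltas {
	int src_dp;	/* directory that held the source entry */
	int target_dp;	/* directory that receives the entry */
	int target_ip;	/* inode being overwritten, if any */
	int wip;	/* whiteout inode, if any */
};

/*
 * Return the nlink change for each participant of a rename, following
 * the same conditions as the diff above.  When the rename stays within
 * one directory, src_dp and target_dp describe the same inode.
 */
static struct toy_rename_deltas
toy_rename_nlink_deltas(bool src_is_directory, bool new_parent,
		bool target_exists, bool has_whiteout)
{
	struct toy_rename_deltas d = { 0, 0, 0, 0 };

	/* The whiteout leaves the unlinked list and gains a real link. */
	if (has_whiteout)
		d.wip += 1;

	if (!target_exists) {
		/* A moved directory's ".." now references target_dp. */
		if (new_parent && src_is_directory)
			d.target_dp += 1;
	} else {
		/* The target name no longer points at target_ip. */
		d.target_ip -= 1;

		/* The patch drops a second link when the source is a dir. */
		if (src_is_directory)
			d.target_ip -= 1;
	}

	/* One fewer subdirectory ".." points back at src_dp. */
	if (src_is_directory && (new_parent || target_exists))
		d.src_dp -= 1;

	return d;
}
```

For example, moving a directory to a new parent with no existing target yields -1 for src_dp and +1 for target_dp, while renaming a regular file over an existing file only drops one link on the target inode.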
* [PATCH 25/26] xfs_repair: use library functions to reset root/rbm/rsum inodes 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (22 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 23/26] xfs: create libxfs helper to rename two directory entries Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 26/26] xfs_repair: use library functions for orphanage creation Darrick J. Wong 2022-12-30 22:19 ` [PATCH 24/26] xfs: don't use the incore struct xfs_sb for offsets into struct xfs_dsb Darrick J. Wong 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Use the iroot reset function to reset root inodes instead of open-coding the reset routine. While we're at it, fix a longstanding memory leak if the inode being reset actually had an xattr fork full of mappings. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 2 + repair/phase6.c | 126 +++++++++------------------------------------- 2 files changed, 28 insertions(+), 100 deletions(-) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 782a551ee1c..2b2b958d8a9 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -133,6 +133,7 @@ #define xfs_dquot_verify libxfs_dquot_verify #define xfs_finobt_calc_reserves libxfs_finobt_calc_reserves +#define xfs_fixed_inode_reset libxfs_fixed_inode_reset #define xfs_free_extent libxfs_free_extent #define xfs_free_perag libxfs_free_perag #define xfs_fs_geometry libxfs_fs_geometry @@ -159,6 +160,7 @@ #define xfs_inobt_stage_cursor libxfs_inobt_stage_cursor #define xfs_inode_from_disk libxfs_inode_from_disk #define xfs_inode_from_disk_ts libxfs_inode_from_disk_ts +#define xfs_inode_init libxfs_inode_init #define xfs_inode_to_disk libxfs_inode_to_disk #define xfs_inode_validate_cowextsize libxfs_inode_validate_cowextsize #define xfs_inode_validate_extsize libxfs_inode_validate_extsize diff --git a/repair/phase6.c b/repair/phase6.c index 0c24cfbf144..5765c2b1250 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -445,20 +445,28 @@ res_failed( do_error(_("xfs_trans_reserve returned %d\n"), err); } -static inline void -reset_inode_fields(struct xfs_inode *ip) +/* + * Forcibly reinitialize a fixed-location inode, such as a filesystem root + * directory or the realtime metadata inodes. The inode must not otherwise be + * in use; the data fork must be empty, and the attr fork will be reset. + */ +static void +reset_root_ino( + struct xfs_trans *tp, + umode_t mode, + struct xfs_inode *ip) { - ip->i_projid = 0; - ip->i_disk_size = 0; - ip->i_nblocks = 0; - ip->i_extsize = 0; - ip->i_cowextsize = 0; - ip->i_flushiter = 0; + struct xfs_icreate_args args = { + .nlink = S_ISDIR(mode) ? 
2 : 1, + }; + + libxfs_icreate_args_rootfile(&args, mode); + + /* Erase the attr fork since libxfs_inode_init won't do it for us. */ ip->i_forkoff = 0; - ip->i_diflags = 0; - ip->i_diflags2 = 0; - ip->i_crtime.tv_sec = 0; - ip->i_crtime.tv_nsec = 0; + libxfs_ifork_zap_attr(ip); + + libxfs_inode_init(tp, &args, ip); } static void @@ -472,7 +480,6 @@ mk_rbmino(xfs_mount_t *mp) int error; xfs_fileoff_t bno; xfs_bmbt_irec_t map[XFS_BMAP_MAX_NMAP]; - int times; uint blocks; /* @@ -489,34 +496,9 @@ mk_rbmino(xfs_mount_t *mp) error); } - reset_inode_fields(ip); - - VFS_I(ip)->i_mode = S_IFREG; - ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS; - libxfs_ifork_zap_attr(ip); - - set_nlink(VFS_I(ip), 1); /* account for sb ptr */ - - times = XFS_ICHGTIME_CHG | XFS_ICHGTIME_MOD; - if (xfs_has_v3inodes(mp)) { - VFS_I(ip)->i_version = 1; - ip->i_diflags2 = 0; - times |= XFS_ICHGTIME_CREATE; - } - libxfs_trans_ichgtime(tp, ip, times); - - /* - * now the ifork - */ - ip->i_df.if_bytes = 0; - ip->i_df.if_u1.if_root = NULL; - + /* Reset the realtime bitmap inode. 
*/ + reset_root_ino(tp, S_IFREG, ip); ip->i_disk_size = mp->m_sb.sb_rbmblocks * mp->m_sb.sb_blocksize; - - /* - * commit changes - */ - libxfs_trans_ijoin(tp, ip, 0); libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); error = -libxfs_trans_commit(tp); if (error) @@ -711,7 +693,6 @@ mk_rsumino(xfs_mount_t *mp) int nsumblocks; xfs_fileoff_t bno; xfs_bmbt_irec_t map[XFS_BMAP_MAX_NMAP]; - int times; uint blocks; /* @@ -728,34 +709,9 @@ mk_rsumino(xfs_mount_t *mp) error); } - reset_inode_fields(ip); - - VFS_I(ip)->i_mode = S_IFREG; - ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS; - libxfs_ifork_zap_attr(ip); - - set_nlink(VFS_I(ip), 1); /* account for sb ptr */ - - times = XFS_ICHGTIME_CHG | XFS_ICHGTIME_MOD; - if (xfs_has_v3inodes(mp)) { - VFS_I(ip)->i_version = 1; - ip->i_diflags2 = 0; - times |= XFS_ICHGTIME_CREATE; - } - libxfs_trans_ichgtime(tp, ip, times); - - /* - * now the ifork - */ - ip->i_df.if_bytes = 0; - ip->i_df.if_u1.if_root = NULL; - + /* Reset the rt summary inode. */ + reset_root_ino(tp, S_IFREG, ip); ip->i_disk_size = mp->m_rsumsize; - - /* - * commit changes - */ - libxfs_trans_ijoin(tp, ip, 0); libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); error = -libxfs_trans_commit(tp); if (error) @@ -811,7 +767,6 @@ mk_root_dir(xfs_mount_t *mp) int error; const mode_t mode = 0755; ino_tree_node_t *irec; - int times; ip = NULL; i = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 10, 0, 0, &tp); @@ -823,37 +778,8 @@ mk_root_dir(xfs_mount_t *mp) do_error(_("could not iget root inode -- error - %d\n"), error); } - /* - * take care of the core since we didn't call the libxfs ialloc function - * (comment changed to avoid tangling xfs/437) - */ - reset_inode_fields(ip); - - VFS_I(ip)->i_mode = mode|S_IFDIR; - ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS; - libxfs_ifork_zap_attr(ip); - - set_nlink(VFS_I(ip), 2); /* account for . and .. 
*/ - - times = XFS_ICHGTIME_CHG | XFS_ICHGTIME_MOD; - if (xfs_has_v3inodes(mp)) { - VFS_I(ip)->i_version = 1; - ip->i_diflags2 = 0; - times |= XFS_ICHGTIME_CREATE; - } - libxfs_trans_ichgtime(tp, ip, times); - libxfs_trans_ijoin(tp, ip, 0); - libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); - - /* - * now the ifork - */ - ip->i_df.if_bytes = 0; - ip->i_df.if_u1.if_root = NULL; - - /* - * initialize the directory - */ + /* Reset the root directory. */ + reset_root_ino(tp, mode | S_IFDIR, ip); libxfs_dir_init(tp, ip, ip); error = -libxfs_trans_commit(tp); ^ permalink raw reply related [flat|nested] 565+ messages in thread
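Editorial sketch for the patch above: the new reset_root_ino() picks the link count from the file type, which replaces the two open-coded set_nlink() calls that the diff removes (1 "for sb ptr", 2 "for . and .."). The toy below shows only that rule plus the wipe of stale fields. struct toy_inode, the TOY_S_* constants, and toy_reset_fixed_inode() are all hypothetical; the real work is done by libxfs_inode_init(), whose internals are not shown in this patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the S_IFMT family of mode bits. */
#define TOY_S_IFMT	0170000u
#define TOY_S_IFDIR	0040000u
#define TOY_S_IFREG	0100000u

/* Hypothetical, heavily reduced inode; not the libxfs struct xfs_inode. */
struct toy_inode {
	unsigned int		mode;
	unsigned int		nlink;
	unsigned int		forkoff;
	unsigned long long	disk_size;
	unsigned int		diflags;
};

/*
 * Wipe every field and reinitialize the inode for the given mode.
 * A directory starts with two links ("." and ".."); a regular metadata
 * file starts with one (the superblock pointer).
 */
static void
toy_reset_fixed_inode(struct toy_inode *ip, unsigned int mode)
{
	int	is_dir = (mode & TOY_S_IFMT) == TOY_S_IFDIR;

	memset(ip, 0, sizeof(*ip));
	ip->mode = mode;
	ip->nlink = is_dir ? 2 : 1;
}
```

As in the patch, a caller would set size-dependent fields such as the disk size after the reset, since the generic reset cannot know them.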
* [PATCH 26/26] xfs_repair: use library functions for orphanage creation 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (23 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 25/26] xfs_repair: use library functions to reset root/rbm/rsum inodes Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 24/26] xfs: don't use the incore struct xfs_sb for offsets into struct xfs_dsb Darrick J. Wong 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Use new library functions to create lost+found. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 1 + repair/phase6.c | 28 ++++++++-------------------- 2 files changed, 9 insertions(+), 20 deletions(-) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 2b2b958d8a9..a5f9c6006f6 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -123,6 +123,7 @@ #define xfs_dir2_shrink_inode libxfs_dir2_shrink_inode #define xfs_dir_createname libxfs_dir_createname +#define xfs_dir_create_new_child libxfs_dir_create_new_child #define xfs_dir_init libxfs_dir_init #define xfs_dir_ino_validate libxfs_dir_ino_validate #define xfs_dir_lookup libxfs_dir_lookup diff --git a/repair/phase6.c b/repair/phase6.c index 5765c2b1250..f8f42eb6e29 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -804,6 +804,11 @@ mk_orphanage( struct xfs_icreate_args args = { .nlink = 2, }; + struct xfs_name xname = { + .name = ORPHANAGE, + .len = strlen(ORPHANAGE), + .type = XFS_DIR3_FT_DIR, + }; struct xfs_trans *tp; struct xfs_inode *ip; struct xfs_inode *pip; @@ -814,7 +819,6 @@ mk_orphanage( int error; int nres; const umode_t mode = S_IFDIR | 0755; - struct xfs_name xname; libxfs_icreate_args_rootfile(&args, mode); @@ -830,9 +834,6 @@ mk_orphanage( i, ORPHANAGE); args.pip = pip; - xname.name 
= (unsigned char *)ORPHANAGE; - xname.len = strlen(ORPHANAGE); - xname.type = XFS_DIR3_FT_DIR; if (libxfs_dir_lookup(NULL, pip, &xname, &ino, NULL) == 0) return ino; @@ -845,15 +846,6 @@ mk_orphanage( if (i) res_failed(i); - /* - * use iget/ijoin instead of trans_iget because the ialloc - * wrapper can commit the transaction and start a new one - */ -/* i = -libxfs_iget(mp, NULL, mp->m_sb.sb_rootino, 0, &pip); - if (i) - do_error(_("%d - couldn't iget root inode to make %s\n"), - i, ORPHANAGE);*/ - error = -libxfs_dialloc(&tp, mp->m_sb.sb_rootino, mode, &ino); if (error) do_error(_("%s inode allocation failed %d\n"), @@ -902,26 +894,22 @@ mk_orphanage( /* * create the actual entry */ - error = -libxfs_dir_createname(tp, pip, &xname, ip->i_ino, nres); + error = -libxfs_dir_create_new_child(tp, nres, pip, &xname, ip); if (error) do_error( _("can't make %s, createname error %d\n"), ORPHANAGE, error); /* - * bump up the link count in the root directory to account - * for .. in the new directory, and update the irec copy of the + * We bumped up the link count in the root directory to account + * for .. in the new directory, so now update the irec copy of the * on-disk nlink so we don't fail the link count check later. */ - inc_nlink(VFS_I(pip)); irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rootino), XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rootino)); add_inode_ref(irec, 0); set_inode_disk_nlinks(irec, 0, get_inode_disk_nlinks(irec, 0) + 1); - libxfs_trans_log_inode(tp, pip, XFS_ILOG_CORE); - libxfs_dir_init(tp, ip, pip); - libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); error = -libxfs_trans_commit(tp); if (error) { do_error(_("%s directory creation failed -- bmapf error %d\n"), ^ permalink raw reply related [flat|nested] 565+ messages in thread
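Editorial sketch for the patch above: the rewritten comment says the parent's link count is bumped to account for ".." in the new directory, and the xname setup moves into a designated initializer. The toy below models just those two points. struct toy_dir, struct toy_name, toy_make_name(), and toy_create_child_dir() are hypothetical, and this is an assumption about the observable effect of libxfs_dir_create_new_child(), not a description of its implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical directory model: only a link count and a ".." pointer. */
struct toy_dir {
	unsigned int		nlink;
	const struct toy_dir	*parent;
};

/* Hypothetical counterpart of a name/length pair. */
struct toy_name {
	const char	*name;
	size_t		len;
};

/* Build the name with a designated initializer, as the patch does. */
static struct toy_name
toy_make_name(const char *s)
{
	struct toy_name	n = {
		.name	= s,
		.len	= strlen(s),
	};

	return n;
}

/* Link a brand new, empty child directory under parent. */
static void
toy_create_child_dir(struct toy_dir *parent, struct toy_dir *child)
{
	child->nlink = 2;	/* entry in parent, plus the child's "." */
	child->parent = parent;	/* the child's ".." */
	parent->nlink += 1;	/* the child's ".." references the parent */
}
```

This is why xfs_repair still has to add one to its cached copy of the root directory's on-disk link count after the call, even though it no longer calls inc_nlink() itself.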
* [PATCH 24/26] xfs: don't use the incore struct xfs_sb for offsets into struct xfs_dsb 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong ` (24 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 26/26] xfs_repair: use library functions for orphanage creation Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 25 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Currently, the XFS_SB_CRC_OFF macro uses the incore superblock struct (xfs_sb) to compute the address of sb_crc within the ondisk superblock struct (xfs_dsb). This is a landmine if we ever change the layout of the incore superblock (as we're about to do), so redefine the macro to use xfs_dsb to compute the layout of xfs_dsb. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/sb.c | 4 ++-- libxfs/xfs_format.h | 9 ++++----- mdrestore/xfs_mdrestore.c | 6 ++---- repair/agheader.c | 12 ++++++------ 4 files changed, 14 insertions(+), 17 deletions(-) diff --git a/db/sb.c b/db/sb.c index fd81286cd60..095c59596a4 100644 --- a/db/sb.c +++ b/db/sb.c @@ -50,8 +50,8 @@ sb_init(void) add_command(&version_cmd); } -#define OFF(f) bitize(offsetof(xfs_sb_t, sb_ ## f)) -#define SZC(f) szcount(xfs_sb_t, sb_ ## f) +#define OFF(f) bitize(offsetof(struct xfs_dsb, sb_ ## f)) +#define SZC(f) szcount(struct xfs_dsb, sb_ ## f) const field_t sb_flds[] = { { "magicnum", FLDT_UINT32X, OI(OFF(magicnum)), C1, 0, TYP_NONE }, { "blocksize", FLDT_UINT32D, OI(OFF(blocksize)), C1, 0, TYP_NONE }, diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index 0c457905cce..abd75b3091e 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -90,8 +90,7 @@ struct xfs_ifork; #define XFSLABEL_MAX 12 /* - * Superblock - in core version. Must match the ondisk version below. - * Must be padded to 64 bit alignment. + * Superblock - in core version. 
Must be padded to 64 bit alignment. */ typedef struct xfs_sb { uint32_t sb_magicnum; /* magic number == XFS_SB_MAGIC */ @@ -178,10 +177,8 @@ typedef struct xfs_sb { /* must be padded to 64 bit alignment */ } xfs_sb_t; -#define XFS_SB_CRC_OFF offsetof(struct xfs_sb, sb_crc) - /* - * Superblock - on disk version. Must match the in core version above. + * Superblock - on disk version. * Must be padded to 64 bit alignment. */ struct xfs_dsb { @@ -265,6 +262,8 @@ struct xfs_dsb { /* must be padded to 64 bit alignment */ }; +#define XFS_SB_CRC_OFF offsetof(struct xfs_dsb, sb_crc) + /* * Misc. Flags - warning - these will be cleared by xfs_repair unless * a feature bit is set when the flag is used. diff --git a/mdrestore/xfs_mdrestore.c b/mdrestore/xfs_mdrestore.c index 7c1a66c4001..9f8cbe98cd6 100644 --- a/mdrestore/xfs_mdrestore.c +++ b/mdrestore/xfs_mdrestore.c @@ -164,10 +164,8 @@ perform_restore( memset(block_buffer, 0, sb.sb_sectsize); sb.sb_inprogress = 0; libxfs_sb_to_disk((struct xfs_dsb *)block_buffer, &sb); - if (xfs_sb_version_hascrc(&sb)) { - xfs_update_cksum(block_buffer, sb.sb_sectsize, - offsetof(struct xfs_sb, sb_crc)); - } + if (xfs_sb_version_hascrc(&sb)) + xfs_update_cksum(block_buffer, sb.sb_sectsize, XFS_SB_CRC_OFF); if (pwrite(dst_fd, block_buffer, sb.sb_sectsize, 0) < 0) fatal("error writing primary superblock: %s\n", strerror(errno)); diff --git a/repair/agheader.c b/repair/agheader.c index 762901581e1..3930a0ac091 100644 --- a/repair/agheader.c +++ b/repair/agheader.c @@ -358,22 +358,22 @@ secondary_sb_whack( * size is the size of data which is valid for this sb. 
*/ if (xfs_sb_version_hasmetauuid(sb)) - size = offsetof(xfs_sb_t, sb_meta_uuid) + size = offsetof(struct xfs_dsb, sb_meta_uuid) + sizeof(sb->sb_meta_uuid); else if (xfs_sb_version_hascrc(sb)) - size = offsetof(xfs_sb_t, sb_lsn) + size = offsetof(struct xfs_dsb, sb_lsn) + sizeof(sb->sb_lsn); else if (xfs_sb_version_hasmorebits(sb)) - size = offsetof(xfs_sb_t, sb_bad_features2) + size = offsetof(struct xfs_dsb, sb_bad_features2) + sizeof(sb->sb_bad_features2); else if (xfs_sb_version_haslogv2(sb)) - size = offsetof(xfs_sb_t, sb_logsunit) + size = offsetof(struct xfs_dsb, sb_logsunit) + sizeof(sb->sb_logsunit); else if (xfs_sb_version_hassector(sb)) - size = offsetof(xfs_sb_t, sb_logsectsize) + size = offsetof(struct xfs_dsb, sb_logsectsize) + sizeof(sb->sb_logsectsize); else /* only support dirv2 or more recent */ - size = offsetof(xfs_sb_t, sb_dirblklog) + size = offsetof(struct xfs_dsb, sb_dirblklog) + sizeof(sb->sb_dirblklog); /* Check the buffer we read from disk for garbage outside size */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
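Editorial sketch for the patch above: the "landmine" in the commit message is that offsetof() on the incore struct only matches the ondisk layout as long as the two structs stay field-for-field identical. The toy below shows the offsets diverging once the incore struct gains a field. struct toy_sb, struct toy_dsb, and TOY_SB_CRC_OFF are hypothetical miniatures, not the real xfs_sb and xfs_dsb layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ondisk layout: fixed, must never change. */
struct toy_dsb {
	uint32_t	magic;
	uint32_t	blocksize;
	uint32_t	crc;
};

/* Hypothetical incore layout: free to grow incore-only fields. */
struct toy_sb {
	uint32_t	magic;
	uint32_t	blocksize;
	uint64_t	incore_only;	/* added later, never written to disk */
	uint32_t	crc;
};

/* Correct: the checksum offset comes from the ondisk struct. */
#define TOY_SB_CRC_OFF	offsetof(struct toy_dsb, crc)

/* Offset to use when checksumming a disk buffer. */
static size_t
toy_right_crc_off(void)
{
	return TOY_SB_CRC_OFF;
}

/* What the old style of macro would compute after the incore change. */
static size_t
toy_wrong_crc_off(void)
{
	return offsetof(struct toy_sb, crc);
}
```

With these toy structs the ondisk offset is 8 while the incore offset is 16, so a checksum routine handed the incore offset would read and write the wrong bytes of the disk buffer.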
* [PATCHSET v1.0 00/46] libxfs: metadata inode directories 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (19 preceding siblings ...) 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 04/46] libxfs: convert all users to libxfs_imeta_create Darrick J. Wong ` (45 more replies) 2022-12-30 22:19 ` [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions Darrick J. Wong ` (18 subsequent siblings) 39 siblings, 46 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs Hi all, Add libxfs code from the kernel, then teach xfs_repair and mkfs to use the metadir functions to find metadata inodes. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. 
--D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=metadir xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=metadir fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=metadir xfsdocs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=metadir --- db/check.c | 21 + db/inode.c | 3 db/metadump.c | 92 ++- db/namei.c | 43 + db/sb.c | 43 + include/kmem.h | 4 include/libxfs.h | 1 include/xfs_inode.h | 13 include/xfs_mount.h | 3 include/xfs_trace.h | 13 io/bulkstat.c | 16 - libfrog/fsgeom.c | 4 libxfs/Makefile | 2 libxfs/init.c | 40 + libxfs/inode.c | 130 ++++ libxfs/libxfs_api_defs.h | 20 + libxfs/libxfs_priv.h | 4 libxfs/util.c | 75 ++ libxfs/xfs_format.h | 51 ++ libxfs/xfs_fs.h | 12 libxfs/xfs_health.h | 4 libxfs/xfs_ialloc.c | 16 - libxfs/xfs_ialloc.h | 2 libxfs/xfs_imeta.c | 1209 +++++++++++++++++++++++++++++++++++++++ libxfs/xfs_imeta.h | 94 +++ libxfs/xfs_inode_buf.c | 73 ++ libxfs/xfs_inode_buf.h | 3 libxfs/xfs_inode_util.c | 4 libxfs/xfs_log_rlimit.c | 9 libxfs/xfs_sb.c | 35 + libxfs/xfs_trans_resv.c | 74 ++ libxfs/xfs_trans_resv.h | 2 libxfs/xfs_types.c | 7 man/man2/ioctl_xfs_fsgeometry.2 | 3 man/man8/mkfs.xfs.8.in | 11 man/man8/xfs_admin.8 | 9 man/man8/xfs_db.8 | 11 man/man8/xfs_io.8 | 10 man/man8/xfs_protofile.8 | 33 + mkfs/Makefile | 10 mkfs/lts_4.19.conf | 1 mkfs/lts_5.10.conf | 1 mkfs/lts_5.15.conf | 1 mkfs/proto.c | 283 +++++++-- mkfs/xfs_mkfs.c | 26 + mkfs/xfs_protofile.in | 152 +++++ repair/agheader.c | 7 repair/dino_chunks.c | 58 ++ repair/dinode.c | 173 +++++- repair/dinode.h | 6 repair/dir2.c | 77 ++ repair/globals.c | 4 repair/globals.h | 4 repair/incore.h | 50 +- repair/incore_ino.c | 1 repair/phase1.c | 2 repair/phase2.c | 76 ++ repair/phase4.c | 21 + repair/phase5.c | 7 repair/phase6.c | 853 +++++++++++++++++++++++----- repair/protos.h | 6 repair/sb.c | 3 repair/scan.c | 43 + 
repair/scan.h | 7 repair/xfs_repair.c | 88 +++ scrub/inodes.c | 5 scrub/inodes.h | 5 scrub/phase3.c | 7 scrub/phase5.c | 2 scrub/phase6.c | 2 70 files changed, 3785 insertions(+), 395 deletions(-) create mode 100644 libxfs/xfs_imeta.c create mode 100644 libxfs/xfs_imeta.h create mode 100644 man/man8/xfs_protofile.8 create mode 100644 mkfs/xfs_protofile.in ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 04/46] libxfs: convert all users to libxfs_imeta_create 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/46] xfs: create transaction reservations for metadata inode operations Darrick J. Wong ` (44 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert all open-coded sb metadata inode pointer logging to use libxfs_imeta_create to create metadata inodes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- mkfs/proto.c | 54 +++++++++++++++++++++++++++++++----------------------- 1 file changed, 31 insertions(+), 23 deletions(-) diff --git a/mkfs/proto.c b/mkfs/proto.c index 3121c35baa1..354c9fa8a02 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -680,8 +680,7 @@ static void rtinit( struct xfs_mount *mp) { - struct cred creds; - struct fsxattr fsxattrs; + struct xfs_imeta_update upd; struct xfs_bmbt_irec map[XFS_BMAP_MAX_NMAP]; struct xfs_inode *rbmip; struct xfs_inode *rsumip; @@ -696,45 +695,54 @@ rtinit( int error; /* Create the realtime bitmap inode. 
*/ - error = -libxfs_trans_alloc_rollable(mp, MKFS_BLOCKRES_INODE, &tp); + error = -libxfs_imeta_start_update(mp, &XFS_IMETA_RTBITMAP, &upd); if (error) res_failed(error); - memset(&creds, 0, sizeof(creds)); - memset(&fsxattrs, 0, sizeof(fsxattrs)); - error = creatproto(&tp, NULL, S_IFREG, 1, 0, &creds, &fsxattrs, - &rbmip); - if (error) { + error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_imeta_create, + libxfs_imeta_create_space_res(mp), 0, 0, &tp); + if (error) + res_failed(error); + + error = -libxfs_imeta_create(&tp, &XFS_IMETA_RTBITMAP, S_IFREG, 0, + &rbmip, &upd); + if (error) fail(_("Realtime bitmap inode allocation failed"), error); - } - /* - * Do our thing with rbmip before allocating rsumip, - * because the next call to createproto may - * commit the transaction in which rbmip was allocated. - */ - mp->m_sb.sb_rbmino = rbmip->i_ino; + rbmip->i_disk_size = mp->m_sb.sb_rbmblocks * mp->m_sb.sb_blocksize; - rbmip->i_diflags = XFS_DIFLAG_NEWRTBM; + rbmip->i_diflags |= XFS_DIFLAG_NEWRTBM; *(uint64_t *)&VFS_I(rbmip)->i_atime = 0; libxfs_trans_log_inode(tp, rbmip, XFS_ILOG_CORE); - libxfs_log_sb(tp); + error = -libxfs_trans_commit(tp); + if (error) + fail(_("Completion of the realtime bitmap inode failed"), + error); mp->m_rbmip = rbmip; + libxfs_imeta_end_update(mp, &upd, 0); /* Create the realtime summary inode. 
*/ - error = creatproto(&tp, NULL, S_IFREG, 1, 0, &creds, &fsxattrs, - &rsumip); - if (error) { + error = -libxfs_imeta_start_update(mp, &XFS_IMETA_RTSUMMARY, &upd); + if (error) + res_failed(error); + + error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_imeta_create, + libxfs_imeta_create_space_res(mp), 0, 0, &tp); + if (error) + res_failed(error); + + error = -libxfs_imeta_create(&tp, &XFS_IMETA_RTSUMMARY, S_IFREG, 0, + &rsumip, &upd); + if (error) fail(_("Realtime summary inode allocation failed"), error); - } - mp->m_sb.sb_rsumino = rsumip->i_ino; + rsumip->i_disk_size = mp->m_rsumsize; libxfs_trans_log_inode(tp, rsumip, XFS_ILOG_CORE); - libxfs_log_sb(tp); error = -libxfs_trans_commit(tp); if (error) fail(_("Completion of the realtime summary inode failed"), error); mp->m_rsumip = rsumip; + libxfs_imeta_end_update(mp, &upd, 0); /* Zero the realtime bitmap. */ blocks = mp->m_sb.sb_rbmblocks + ^ permalink raw reply related [flat|nested] 565+ messages in thread
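Editorial sketch for the patch above: one easy-to-miss hunk changes "rbmip->i_diflags = XFS_DIFLAG_NEWRTBM" into "|=". The commit message does not say why; one plausible reading, offered here as an assumption, is that an inode returned by the creation helper may already carry flags that a plain assignment would wipe out. The flag values and function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits; the values are for illustration only. */
#define TOY_DIFLAG_PRESET	(1u << 0)	/* set by the creation helper */
#define TOY_DIFLAG_NEWRTBM	(1u << 6)	/* set by the caller afterwards */

/* Old style: plain assignment discards whatever was set before. */
static uint16_t
toy_assign_flag(uint16_t diflags)
{
	diflags = TOY_DIFLAG_NEWRTBM;
	return diflags;
}

/* New style: OR the flag in and keep the existing bits. */
static uint16_t
toy_or_flag(uint16_t diflags)
{
	diflags |= TOY_DIFLAG_NEWRTBM;
	return diflags;
}
```

Starting from an inode with TOY_DIFLAG_PRESET set, the first helper returns only the new flag, while the second returns both.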
* [PATCH 02/46] xfs: create transaction reservations for metadata inode operations 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong 2022-12-30 22:19 ` [PATCH 04/46] libxfs: convert all users to libxfs_imeta_create Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 03/46] mkfs: clean up the rtinit() function Darrick J. Wong ` (43 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create transaction reservation types and block reservation helpers to help us calculate transaction requirements. Right now we're just separating the symbols for a future patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 2 ++ libxfs/xfs_imeta.c | 20 ++++++++++++++++++++ libxfs/xfs_imeta.h | 3 +++ libxfs/xfs_trans_resv.c | 4 ++++ libxfs/xfs_trans_resv.h | 2 ++ 5 files changed, 31 insertions(+) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 5657fee51b8..69f1cf2c752 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -157,12 +157,14 @@ #define xfs_imap_to_bp libxfs_imap_to_bp #define xfs_imeta_create libxfs_imeta_create +#define xfs_imeta_create_space_res libxfs_imeta_create_space_res #define xfs_imeta_end_update libxfs_imeta_end_update #define xfs_imeta_link libxfs_imeta_link #define xfs_imeta_lookup libxfs_imeta_lookup #define xfs_imeta_mount libxfs_imeta_mount #define xfs_imeta_start_update libxfs_imeta_start_update #define xfs_imeta_unlink libxfs_imeta_unlink +#define xfs_imeta_unlink_space_res libxfs_imeta_unlink_space_res #define xfs_initialize_perag libxfs_initialize_perag #define xfs_initialize_perag_data libxfs_initialize_perag_data diff --git a/libxfs/xfs_imeta.c b/libxfs/xfs_imeta.c index b5c6672e7d5..bc3c94634ce 100644 --- a/libxfs/xfs_imeta.c +++ b/libxfs/xfs_imeta.c @@ -18,6 
+18,10 @@ #include "xfs_trace.h" #include "xfs_inode.h" #include "xfs_ialloc.h" +#include "xfs_bmap_btree.h" +#include "xfs_da_format.h" +#include "xfs_da_btree.h" +#include "xfs_trans_space.h" /* * Metadata Inode Number Management @@ -435,3 +439,19 @@ xfs_imeta_mount( { return 0; } + +/* Calculate the log block reservation to create a metadata inode. */ +unsigned int +xfs_imeta_create_space_res( + struct xfs_mount *mp) +{ + return XFS_IALLOC_SPACE_RES(mp); +} + +/* Calculate the log block reservation to unlink a metadata inode. */ +unsigned int +xfs_imeta_unlink_space_res( + struct xfs_mount *mp) +{ + return XFS_REMOVE_SPACE_RES(mp); +} diff --git a/libxfs/xfs_imeta.h b/libxfs/xfs_imeta.h index b535e19ff1a..9d54cb0d796 100644 --- a/libxfs/xfs_imeta.h +++ b/libxfs/xfs_imeta.h @@ -45,4 +45,7 @@ int xfs_imeta_start_update(struct xfs_mount *mp, bool xfs_is_static_meta_ino(struct xfs_mount *mp, xfs_ino_t ino); int xfs_imeta_mount(struct xfs_mount *mp); +unsigned int xfs_imeta_create_space_res(struct xfs_mount *mp); +unsigned int xfs_imeta_unlink_space_res(struct xfs_mount *mp); + #endif /* __XFS_IMETA_H__ */ diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c index 04c444806fe..00bdcb1d550 100644 --- a/libxfs/xfs_trans_resv.c +++ b/libxfs/xfs_trans_resv.c @@ -1024,4 +1024,8 @@ xfs_trans_resv_calc( resp->tr_itruncate.tr_logcount += logcount_adj; resp->tr_write.tr_logcount += logcount_adj; resp->tr_qm_dqalloc.tr_logcount += logcount_adj; + + /* metadata inode creation and unlink */ + resp->tr_imeta_create = resp->tr_create; + resp->tr_imeta_unlink = resp->tr_remove; } diff --git a/libxfs/xfs_trans_resv.h b/libxfs/xfs_trans_resv.h index 0554b9d775d..3836c5131b9 100644 --- a/libxfs/xfs_trans_resv.h +++ b/libxfs/xfs_trans_resv.h @@ -48,6 +48,8 @@ struct xfs_trans_resv { struct xfs_trans_res tr_qm_dqalloc; /* allocate quota on disk */ struct xfs_trans_res tr_sb; /* modify superblock */ struct xfs_trans_res tr_fsyncts; /* update timestamps on fsync */ + struct 
xfs_trans_res tr_imeta_create; /* create metadata inode */ + struct xfs_trans_res tr_imeta_unlink; /* unlink metadata inode */ }; /* shorthand way of accessing reservation structure */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
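Editorial sketch for the patch above: the new reservations start life as struct copies of tr_create and tr_remove, which is what "just separating the symbols" amounts to. The toy below shows that the copy is by value, so a later patch can adjust the metadata reservations without touching the regular ones (and vice versa). The struct names and the numbers are hypothetical; real reservation sizes are computed from the filesystem geometry.

```c
#include <assert.h>

/* Hypothetical miniature of one log reservation. */
struct toy_trans_res {
	unsigned int	logres;		/* log space in bytes */
	int		logcount;	/* number of log operations */
	int		logflags;
};

/* Hypothetical miniature of the per-mount reservation table. */
struct toy_trans_resv {
	struct toy_trans_res	tr_create;
	struct toy_trans_res	tr_remove;
	struct toy_trans_res	tr_imeta_create;
	struct toy_trans_res	tr_imeta_unlink;
};

/* Fill in the table; the numbers are made up for illustration. */
static void
toy_trans_resv_calc(struct toy_trans_resv *resp)
{
	resp->tr_create = (struct toy_trans_res){ 4096, 2, 1 };
	resp->tr_remove = (struct toy_trans_res){ 2048, 2, 1 };

	/* metadata inode creation and unlink: independent copies */
	resp->tr_imeta_create = resp->tr_create;
	resp->tr_imeta_unlink = resp->tr_remove;
}
```

Because the assignment copies the whole struct, changing tr_create afterwards leaves tr_imeta_create at the value it had when the table was calculated.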
* [PATCH 03/46] mkfs: clean up the rtinit() function 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong 2022-12-30 22:19 ` [PATCH 04/46] libxfs: convert all users to libxfs_imeta_create Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/46] xfs: create transaction reservations for metadata inode operations Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 01/46] xfs: create imeta abstractions to get and set metadata inodes Darrick J. Wong ` (42 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Clean up some of the warts in this function, like the inconsistent use of @i for @error, missing comments, and make this more visually pleasing by adding some whitespace between major sections. Some things are left untouched for the next patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- mkfs/proto.c | 81 +++++++++++++++++++++++++++++----------------------------- 1 file changed, 41 insertions(+), 40 deletions(-) diff --git a/mkfs/proto.c b/mkfs/proto.c index b60def70652..3121c35baa1 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -678,29 +678,27 @@ parse_proto( */ static void rtinit( - xfs_mount_t *mp) + struct xfs_mount *mp) { - xfs_fileoff_t bno; - xfs_fileoff_t ebno; - xfs_bmbt_irec_t *ep; - int error; - int i; - xfs_bmbt_irec_t map[XFS_BMAP_MAX_NMAP]; - xfs_extlen_t nsumblocks; - uint blocks; - int nmap; - xfs_inode_t *rbmip; - xfs_inode_t *rsumip; - xfs_trans_t *tp; - struct cred creds; - struct fsxattr fsxattrs; + struct cred creds; + struct fsxattr fsxattrs; + struct xfs_bmbt_irec map[XFS_BMAP_MAX_NMAP]; + struct xfs_inode *rbmip; + struct xfs_inode *rsumip; + struct xfs_trans *tp; + struct xfs_bmbt_irec *ep; + xfs_fileoff_t bno; + xfs_fileoff_t ebno; + xfs_extlen_t nsumblocks; + uint blocks; + int i; + int nmap; + int error; - /* - * First, 
allocate the inodes. - */ - i = -libxfs_trans_alloc_rollable(mp, MKFS_BLOCKRES_INODE, &tp); - if (i) - res_failed(i); + /* Create the realtime bitmap inode. */ + error = -libxfs_trans_alloc_rollable(mp, MKFS_BLOCKRES_INODE, &tp); + if (error) + res_failed(error); memset(&creds, 0, sizeof(creds)); memset(&fsxattrs, 0, sizeof(fsxattrs)); @@ -721,6 +719,8 @@ rtinit( libxfs_trans_log_inode(tp, rbmip, XFS_ILOG_CORE); libxfs_log_sb(tp); mp->m_rbmip = rbmip; + + /* Create the realtime summary inode. */ error = creatproto(&tp, NULL, S_IFREG, 1, 0, &creds, &fsxattrs, &rsumip); if (error) { @@ -735,14 +735,13 @@ rtinit( fail(_("Completion of the realtime summary inode failed"), error); mp->m_rsumip = rsumip; - /* - * Next, give the bitmap file some zero-filled blocks. - */ + + /* Zero the realtime bitmap. */ blocks = mp->m_sb.sb_rbmblocks + XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1; - i = -libxfs_trans_alloc_rollable(mp, blocks, &tp); - if (i) - res_failed(i); + error = -libxfs_trans_alloc_rollable(mp, blocks, &tp); + if (error) + res_failed(error); libxfs_trans_ijoin(tp, rbmip, 0); bno = 0; @@ -751,10 +750,10 @@ rtinit( error = -libxfs_bmapi_write(tp, rbmip, bno, (xfs_extlen_t)(mp->m_sb.sb_rbmblocks - bno), 0, mp->m_sb.sb_rbmblocks, map, &nmap); - if (error) { + if (error) fail(_("Allocation of the realtime bitmap failed"), error); - } + for (i = 0, ep = map; i < nmap; i++, ep++) { libxfs_device_zero(mp->m_ddev_targp, XFS_FSB_TO_DADDR(mp, ep->br_startblock), @@ -768,25 +767,24 @@ rtinit( fail(_("Block allocation of the realtime bitmap inode failed"), error); - /* - * Give the summary file some zero-filled blocks. - */ + /* Zero the summary file. 
*/ nsumblocks = mp->m_rsumsize >> mp->m_sb.sb_blocklog; blocks = nsumblocks + XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1; - i = -libxfs_trans_alloc_rollable(mp, blocks, &tp); - if (i) - res_failed(i); + error = -libxfs_trans_alloc_rollable(mp, blocks, &tp); + if (error) + res_failed(error); libxfs_trans_ijoin(tp, rsumip, 0); + bno = 0; while (bno < nsumblocks) { nmap = XFS_BMAP_MAX_NMAP; error = -libxfs_bmapi_write(tp, rsumip, bno, (xfs_extlen_t)(nsumblocks - bno), 0, nsumblocks, map, &nmap); - if (error) { + if (error) fail(_("Allocation of the realtime summary failed"), error); - } + for (i = 0, ep = map; i < nmap; i++, ep++) { libxfs_device_zero(mp->m_ddev_targp, XFS_FSB_TO_DADDR(mp, ep->br_startblock), @@ -794,6 +792,7 @@ rtinit( bno += ep->br_blockcount; } } + error = -libxfs_trans_commit(tp); if (error) fail(_("Block allocation of the realtime summary inode failed"), @@ -804,13 +803,15 @@ rtinit( * Do one transaction per bitmap block. */ for (bno = 0; bno < mp->m_sb.sb_rextents; bno = ebno) { - i = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, + error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp); - if (i) - res_failed(i); + if (error) + res_failed(error); + libxfs_trans_ijoin(tp, rbmip, 0); ebno = XFS_RTMIN(mp->m_sb.sb_rextents, bno + NBBY * mp->m_sb.sb_blocksize); + error = -libxfs_rtfree_extent(tp, bno, (xfs_extlen_t)(ebno-bno)); if (error) { fail(_("Error initializing the realtime space"), ^ permalink raw reply related [flat|nested] 565+ messages in thread
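For readers following the allocation loops that this cleanup touches: a minimal, self-contained sketch of the "ask for the remainder, consume whatever mappings came back, repeat" pattern used for both the bitmap and the summary file. Everything prefixed demo_ is hypothetical and only imitates the shape of the libxfs_bmapi_write() loop; the fake mapper assumes max_len > 0 and never fails.

```c
#include <assert.h>

#define DEMO_MAX_NMAP	4	/* stand-in for XFS_BMAP_MAX_NMAP */

/* Hypothetical extent record; loosely modelled on struct xfs_bmbt_irec. */
struct demo_extent {
	unsigned long	startoff;
	unsigned long	blockcount;
};

/*
 * Fake mapper: fills at most *nmap extents of at most @max_len blocks each,
 * starting at file offset @bno.  On return, *nmap is the number filled.
 */
int
demo_map(
	unsigned long		bno,
	unsigned long		len,
	unsigned long		max_len,
	struct demo_extent	*map,
	int			*nmap)
{
	int			i;

	for (i = 0; i < *nmap && len > 0; i++) {
		unsigned long	n = len < max_len ? len : max_len;

		map[i].startoff = bno;
		map[i].blockcount = n;
		bno += n;
		len -= n;
	}
	*nmap = i;
	return 0;
}

/* Walk [0, total) the way the rtinit() loops do; return blocks covered. */
unsigned long
demo_fill(
	unsigned long		total,
	unsigned long		max_len,
	int			*calls)
{
	struct demo_extent	map[DEMO_MAX_NMAP];
	struct demo_extent	*ep;
	unsigned long		bno = 0;
	unsigned long		covered = 0;
	int			nmap;
	int			i;

	*calls = 0;
	while (bno < total) {
		nmap = DEMO_MAX_NMAP;	/* reset the in/out count every pass */
		if (demo_map(bno, total - bno, max_len, map, &nmap))
			break;
		(*calls)++;

		for (i = 0, ep = map; i < nmap; i++, ep++) {
			/* the real code zeroes these blocks on disk here */
			covered += ep->blockcount;
			bno += ep->blockcount;
		}
	}
	return covered;
}
```

The point of the sketch is that nmap is both an input (array capacity) and an output (mappings returned), so it has to be reset before each call, and bno only advances by what was actually mapped.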
* [PATCH 01/46] xfs: create imeta abstractions to get and set metadata inodes 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 03/46] mkfs: clean up the rtinit() function Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 05/46] mkfs: break up the rest of the rtinit() function Darrick J. Wong ` (41 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create some helper routines to get and set metadata inode numbers instead of open-coding them throughout xfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/libxfs.h | 1 include/xfs_trace.h | 7 + libxfs/Makefile | 2 libxfs/init.c | 20 ++ libxfs/libxfs_api_defs.h | 12 + libxfs/libxfs_priv.h | 2 libxfs/xfs_imeta.c | 437 ++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_imeta.h | 48 +++++ libxfs/xfs_types.c | 5 - mkfs/xfs_mkfs.c | 2 10 files changed, 532 insertions(+), 4 deletions(-) create mode 100644 libxfs/xfs_imeta.c create mode 100644 libxfs/xfs_imeta.h diff --git a/include/libxfs.h b/include/libxfs.h index a4f6e1c2b28..b06d691e283 100644 --- a/include/libxfs.h +++ b/include/libxfs.h @@ -82,6 +82,7 @@ struct iomap; #include "xfs_btree_staging.h" #include "xfs_symlink_remote.h" #include "xfs_ag_resv.h" +#include "xfs_imeta.h" #ifndef ARRAY_SIZE #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) diff --git a/include/xfs_trace.h b/include/xfs_trace.h index d94d8d29bed..78bce651a6f 100644 --- a/include/xfs_trace.h +++ b/include/xfs_trace.h @@ -349,6 +349,13 @@ #define trace_xfs_perag_get_tag(a,b,c,d) ((c) = (c)) #define trace_xfs_perag_put(a,b,c,d) ((c) = (c)) +#define trace_xfs_imeta_end_update(...) ((void) 0) +#define trace_xfs_imeta_sb_link(...) ((void) 0) +#define trace_xfs_imeta_sb_lookup(...) 
((void) 0) +#define trace_xfs_imeta_sb_create(...) ((void) 0) +#define trace_xfs_imeta_sb_unlink(...) ((void) 0) +#define trace_xfs_imeta_start_update(...) ((void) 0) + #define trace_xfs_iunlink_update_bucket(...) ((void) 0) #define trace_xfs_iunlink_update_dinode(...) ((void) 0) #define trace_xfs_iunlink(...) ((void) 0) diff --git a/libxfs/Makefile b/libxfs/Makefile index 94f5968e862..4296a6d9158 100644 --- a/libxfs/Makefile +++ b/libxfs/Makefile @@ -47,6 +47,7 @@ HFILES = \ xfs_errortag.h \ xfs_ialloc.h \ xfs_ialloc_btree.h \ + xfs_imeta.h \ xfs_inode_buf.h \ xfs_inode_fork.h \ xfs_inode_util.h \ @@ -98,6 +99,7 @@ CFILES = cache.c \ xfs_dquot_buf.c \ xfs_ialloc.c \ xfs_iext_tree.c \ + xfs_imeta.c \ xfs_inode_buf.c \ xfs_inode_fork.c \ xfs_inode_util.c \ diff --git a/libxfs/init.c b/libxfs/init.c index b80f6bfd8fc..e19b4e6d4cf 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -791,6 +791,24 @@ libxfs_compute_all_maxlevels( xfs_agbtree_compute_maxlevels(mp); } +/* Mount the metadata files under the metadata directory tree. */ +STATIC void +libxfs_mountfs_imeta( + struct xfs_mount *mp) +{ + int error; + + /* Ignore filesystems that are under construction. */ + if (mp->m_sb.sb_inprogress) + return; + + error = -xfs_imeta_mount(mp); + if (error) + fprintf(stderr, +_("%s: metadata inode mounting failed, error %d\n"), + progname, error); +} + /* * Mount structure initialization, provides a filled-in xfs_mount_t * such that the numerous XFS_* macros can be used. 
If dev is zero, @@ -953,6 +971,8 @@ libxfs_mount( } xfs_set_perag_data_loaded(mp); + libxfs_mountfs_imeta(mp); + return mp; out_da: xfs_da_unmount(mp); diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index a5f9c6006f6..5657fee51b8 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -145,6 +145,8 @@ #define xfs_ialloc_calc_rootino libxfs_ialloc_calc_rootino #define xfs_iallocbt_maxlevels_ondisk libxfs_iallocbt_maxlevels_ondisk #define xfs_ialloc_read_agi libxfs_ialloc_read_agi +#define xfs_icreate libxfs_icreate +#define xfs_icreate_args_rootfile libxfs_icreate_args_rootfile #define xfs_idata_realloc libxfs_idata_realloc #define xfs_idestroy_fork libxfs_idestroy_fork #define xfs_iext_first libxfs_iext_first @@ -153,6 +155,15 @@ #define xfs_iext_next libxfs_iext_next #define xfs_ifork_zap_attr libxfs_ifork_zap_attr #define xfs_imap_to_bp libxfs_imap_to_bp + +#define xfs_imeta_create libxfs_imeta_create +#define xfs_imeta_end_update libxfs_imeta_end_update +#define xfs_imeta_link libxfs_imeta_link +#define xfs_imeta_lookup libxfs_imeta_lookup +#define xfs_imeta_mount libxfs_imeta_mount +#define xfs_imeta_start_update libxfs_imeta_start_update +#define xfs_imeta_unlink libxfs_imeta_unlink + #define xfs_initialize_perag libxfs_initialize_perag #define xfs_initialize_perag_data libxfs_initialize_perag_data #define xfs_init_local_fork libxfs_init_local_fork @@ -170,6 +181,7 @@ #define xfs_iread_extents libxfs_iread_extents #define xfs_irele libxfs_irele +#define xfs_is_meta_ino libxfs_is_meta_ino #define xfs_log_calc_minimum_size libxfs_log_calc_minimum_size #define xfs_log_get_max_trans_res libxfs_log_get_max_trans_res #define xfs_log_sb libxfs_log_sb diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h index 90335331cde..85b54f16803 100644 --- a/libxfs/libxfs_priv.h +++ b/libxfs/libxfs_priv.h @@ -528,6 +528,8 @@ void __xfs_buf_mark_corrupt(struct xfs_buf *bp, xfs_failaddr_t fa); static inline int retzero(void) { return 0; } 
#define xfs_trans_unreserve_quota_nblks(t,i,b,n,f) retzero() #define xfs_quota_unreserve_blkres(i,b) retzero() +#define xfs_qm_dqattach(i) (0) +#define xfs_qm_dqattach_locked(ip, alloc) (0) #define xfs_quota_reserve_blkres(i,b) (0) #define xfs_qm_dqattach(i) (0) diff --git a/libxfs/xfs_imeta.c b/libxfs/xfs_imeta.c new file mode 100644 index 00000000000..b5c6672e7d5 --- /dev/null +++ b/libxfs/xfs_imeta.c @@ -0,0 +1,437 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#include "libxfs_priv.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_bit.h" +#include "xfs_sb.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_trans.h" +#include "xfs_imeta.h" +#include "xfs_trace.h" +#include "xfs_inode.h" +#include "xfs_ialloc.h" + +/* + * Metadata Inode Number Management + * ================================ + * + * These functions provide an abstraction layer for looking up, creating, and + * deleting metadata inodes. These pointers live in the in-core superblock, + * so the functions moderate access to those fields and take care of logging. + * + * For the five existing metadata inodes (real time bitmap & summary; and the + * user, group, and project quotas) we'll continue to maintain the in-core superblock + * inodes for reads and only require xfs_imeta_create and xfs_imeta_unlink to + * persist changes. New metadata inode types must only use the xfs_imeta_* + * functions. + * + * Callers wishing to create or unlink a metadata inode must pass in a + * xfs_imeta_update structure. After committing or cancelling the transaction, + * this structure must be passed to xfs_imeta_end_update to free resources that + * cannot be freed during the transaction. 
+ * + * Right now we only support callers passing in the predefined metadata inode + * paths; the goal is that callers will some day locate metadata inodes based + * on path lookups into a metadata directory structure. + */ + +/* Static metadata inode paths */ + +const struct xfs_imeta_path XFS_IMETA_RTBITMAP = { + .bogus = 0, +}; + +const struct xfs_imeta_path XFS_IMETA_RTSUMMARY = { + .bogus = 1, +}; + +const struct xfs_imeta_path XFS_IMETA_USRQUOTA = { + .bogus = 2, +}; + +const struct xfs_imeta_path XFS_IMETA_GRPQUOTA = { + .bogus = 3, +}; + +const struct xfs_imeta_path XFS_IMETA_PRJQUOTA = { + .bogus = 4, +}; + +/* Are these two paths equal? */ +STATIC bool +xfs_imeta_path_compare( + const struct xfs_imeta_path *a, + const struct xfs_imeta_path *b) +{ + return a == b; +} + +/* Is this path ok? */ +static inline bool +xfs_imeta_path_check( + const struct xfs_imeta_path *path) +{ + return true; +} + +/* Functions for storing and retrieving superblock inode values. */ + +/* Mapping of metadata inode paths to in-core superblock values. */ +static const struct xfs_imeta_sbmap { + const struct xfs_imeta_path *path; + unsigned int offset; +} xfs_imeta_sbmaps[] = { + { + .path = &XFS_IMETA_RTBITMAP, + .offset = offsetof(struct xfs_sb, sb_rbmino), + }, + { + .path = &XFS_IMETA_RTSUMMARY, + .offset = offsetof(struct xfs_sb, sb_rsumino), + }, + { + .path = &XFS_IMETA_USRQUOTA, + .offset = offsetof(struct xfs_sb, sb_uquotino), + }, + { + .path = &XFS_IMETA_GRPQUOTA, + .offset = offsetof(struct xfs_sb, sb_gquotino), + }, + { + .path = &XFS_IMETA_PRJQUOTA, + .offset = offsetof(struct xfs_sb, sb_pquotino), + }, + { NULL, 0 }, +}; + +/* Return a pointer to the in-core superblock inode value. 
*/ +static inline xfs_ino_t * +xfs_imeta_sbmap_to_inop( + struct xfs_mount *mp, + const struct xfs_imeta_sbmap *map) +{ + return (xfs_ino_t *)(((char *)&mp->m_sb) + map->offset); +} + +/* Compute location of metadata inode pointer in the in-core superblock */ +static inline xfs_ino_t * +xfs_imeta_path_to_sb_inop( + struct xfs_mount *mp, + const struct xfs_imeta_path *path) +{ + const struct xfs_imeta_sbmap *p; + + for (p = xfs_imeta_sbmaps; p->path; p++) + if (xfs_imeta_path_compare(p->path, path)) + return xfs_imeta_sbmap_to_inop(mp, p); + + return NULL; +} + +/* Look up a superblock metadata inode by its path. */ +STATIC int +xfs_imeta_sb_lookup( + struct xfs_mount *mp, + const struct xfs_imeta_path *path, + xfs_ino_t *inop) +{ + xfs_ino_t *sb_inop; + + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (!sb_inop) + return -EINVAL; + + trace_xfs_imeta_sb_lookup(mp, sb_inop); + *inop = *sb_inop; + return 0; +} + +/* Update inode pointers in the superblock. */ +static inline void +xfs_imeta_log_sb( + struct xfs_trans *tp) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_buf *bp = xfs_trans_getsb(tp); + + /* + * Update the inode flags in the ondisk superblock without touching + * the summary counters. We have not quiesced inode chunk allocation, + * so we cannot coordinate with updates to the icount and ifree percpu + * counters. + */ + xfs_sb_to_disk(bp->b_addr, &mp->m_sb); + xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF); + xfs_trans_log_buf(tp, bp, 0, sizeof(struct xfs_dsb) - 1); +} + +/* + * Create a new metadata inode and set a superblock pointer to this new inode. + * The superblock field must not already be pointing to an inode. + */ +STATIC int +xfs_imeta_sb_create( + struct xfs_trans **tpp, + const struct xfs_imeta_path *path, + umode_t mode, + unsigned int flags, + struct xfs_inode **ipp) +{ + struct xfs_icreate_args args = { + .nlink = S_ISDIR(mode) ? 
2 : 1, + }; + struct xfs_mount *mp = (*tpp)->t_mountp; + xfs_ino_t *sb_inop; + xfs_ino_t ino; + int error; + + xfs_icreate_args_rootfile(&args, mode); + + /* Reject if the sb already points to some inode. */ + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (!sb_inop) + return -EINVAL; + + if (*sb_inop != NULLFSINO) + return -EEXIST; + + /* Create a new inode and set the sb pointer. */ + error = xfs_dialloc(tpp, 0, mode, &ino); + if (error) + return error; + error = xfs_icreate(*tpp, ino, &args, ipp); + if (error) + return error; + + /* Attach dquots to this file. Caller should have allocated them! */ + if (!(flags & XFS_IMETA_CREATE_NOQUOTA)) { + error = xfs_qm_dqattach_locked(*ipp, false); + if (error) + return error; + xfs_trans_mod_dquot_byino(*tpp, *ipp, XFS_TRANS_DQ_ICOUNT, 1); + } + + /* Update superblock pointer. */ + *sb_inop = ino; + trace_xfs_imeta_sb_create(mp, sb_inop); + xfs_imeta_log_sb(*tpp); + return 0; +} + +/* + * Clear the given inode pointer from the superblock and drop the link count + * of the metadata inode. + */ +STATIC int +xfs_imeta_sb_unlink( + struct xfs_trans **tpp, + const struct xfs_imeta_path *path, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = (*tpp)->t_mountp; + xfs_ino_t *sb_inop; + + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (!sb_inop) + return -EINVAL; + + /* Reject if the sb doesn't point to the inode that was passed in. */ + if (*sb_inop != ip->i_ino) + return -ENOENT; + + *sb_inop = NULLFSINO; + trace_xfs_imeta_sb_unlink(mp, sb_inop); + xfs_imeta_log_sb(*tpp); + return xfs_droplink(*tpp, ip); +} + +/* Set the given inode pointer in the superblock. 
*/ +STATIC int +xfs_imeta_sb_link( + struct xfs_trans *tp, + const struct xfs_imeta_path *path, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = tp->t_mountp; + xfs_ino_t *sb_inop; + + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (!sb_inop) + return -EINVAL; + if (*sb_inop != NULLFSINO) + return -EEXIST; + + xfs_ilock(ip, XFS_ILOCK_EXCL); + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); + + inc_nlink(VFS_I(ip)); + *sb_inop = ip->i_ino; + trace_xfs_imeta_sb_link(mp, sb_inop); + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + xfs_imeta_log_sb(tp); + return 0; +} + +/* General functions for managing metadata inode pointers */ + +/* + * Is this metadata inode pointer ok? We allow the fields to be set to + * NULLFSINO if the metadata structure isn't present, and we don't allow + * obviously incorrect inode pointers. + */ +static inline bool +xfs_imeta_verify( + struct xfs_mount *mp, + xfs_ino_t ino) +{ + if (ino == NULLFSINO) + return true; + return xfs_verify_ino(mp, ino); +} + +/* Look up a metadata inode by its path. */ +int +xfs_imeta_lookup( + struct xfs_mount *mp, + const struct xfs_imeta_path *path, + xfs_ino_t *inop) +{ + xfs_ino_t ino; + int error; + + ASSERT(xfs_imeta_path_check(path)); + + error = xfs_imeta_sb_lookup(mp, path, &ino); + if (error) + return error; + + if (!xfs_imeta_verify(mp, ino)) + return -EFSCORRUPTED; + + *inop = ino; + return 0; +} + +/* + * Create a metadata inode with the given @mode, and insert it into the + * metadata directory tree at the given @path. The path (up to the final + * component) must already exist. The new metadata inode @ipp will be ijoined + * and logged to @tpp, with the ILOCK held until the next transaction commit. + * The caller must provide a @upd structure. + * + * Callers must ensure that the root dquots are allocated, if applicable. + * + * NOTE: This function may pass a child inode @ipp back to the caller even if + * it returns a negative error code. 
If an inode is passed back, the caller + * must finish setting up the incore inode before releasing it. + */ +int +xfs_imeta_create( + struct xfs_trans **tpp, + const struct xfs_imeta_path *path, + umode_t mode, + unsigned int flags, + struct xfs_inode **ipp, + struct xfs_imeta_update *upd) +{ + ASSERT(xfs_imeta_path_check(path)); + *ipp = NULL; + + return xfs_imeta_sb_create(tpp, path, mode, flags, ipp); +} + +/* + * Unlink a metadata inode @ip from the metadata directory given by @path. The + * metadata inode must not be ILOCKed. Upon return, the inode will be ijoined + * and logged to @tpp, and returned with reduced link count, ready to be + * released. The caller must provide a @upd structure. + */ +int +xfs_imeta_unlink( + struct xfs_trans **tpp, + const struct xfs_imeta_path *path, + struct xfs_inode *ip, + struct xfs_imeta_update *upd) +{ + ASSERT(xfs_imeta_path_check(path)); + ASSERT(xfs_imeta_verify((*tpp)->t_mountp, ip->i_ino)); + + return xfs_imeta_sb_unlink(tpp, path, ip); +} + +/* + * Make the metadata directory path given by @path point to the given inode. + * The path must not already exist. The caller must not hold the ILOCK, and + * the function will return with the inode joined to the transaction. + */ +int +xfs_imeta_link( + struct xfs_trans *tp, + const struct xfs_imeta_path *path, + struct xfs_inode *ip, + struct xfs_imeta_update *upd) +{ + ASSERT(xfs_imeta_path_check(path)); + + return xfs_imeta_sb_link(tp, path, ip); +} + +/* + * Clean up after committing (or cancelling) a metadata inode creation or + * removal. + */ +void +xfs_imeta_end_update( + struct xfs_mount *mp, + struct xfs_imeta_update *upd, + int error) +{ + trace_xfs_imeta_end_update(mp, error, __return_address); +} + +/* Start setting up for a metadata directory tree operation. 
*/ +int +xfs_imeta_start_update( + struct xfs_mount *mp, + const struct xfs_imeta_path *path, + struct xfs_imeta_update *upd) +{ + trace_xfs_imeta_start_update(mp, 0, __return_address); + + memset(upd, 0, sizeof(struct xfs_imeta_update)); + return 0; +} + +/* Does this inode number refer to a static metadata inode? */ +bool +xfs_is_static_meta_ino( + struct xfs_mount *mp, + xfs_ino_t ino) +{ + const struct xfs_imeta_sbmap *p; + + if (ino == NULLFSINO) + return false; + + for (p = xfs_imeta_sbmaps; p->path; p++) + if (ino == *xfs_imeta_sbmap_to_inop(mp, p)) + return true; + + return false; +} + +/* Ensure that the in-core superblock has all the values that it should. */ +int +xfs_imeta_mount( + struct xfs_mount *mp) +{ + return 0; +} diff --git a/libxfs/xfs_imeta.h b/libxfs/xfs_imeta.h new file mode 100644 index 00000000000..b535e19ff1a --- /dev/null +++ b/libxfs/xfs_imeta.h @@ -0,0 +1,48 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#ifndef __XFS_IMETA_H__ +#define __XFS_IMETA_H__ + +/* Key for looking up metadata inodes. */ +struct xfs_imeta_path { + /* Temporary: integer to keep the static imeta definitions unique */ + int bogus; +}; + +/* Cleanup widget for metadata inode creation and deletion. */ +struct xfs_imeta_update { + /* empty for now */ +}; + +/* Lookup keys for static metadata inodes. */ +extern const struct xfs_imeta_path XFS_IMETA_RTBITMAP; +extern const struct xfs_imeta_path XFS_IMETA_RTSUMMARY; +extern const struct xfs_imeta_path XFS_IMETA_USRQUOTA; +extern const struct xfs_imeta_path XFS_IMETA_GRPQUOTA; +extern const struct xfs_imeta_path XFS_IMETA_PRJQUOTA; + +int xfs_imeta_lookup(struct xfs_mount *mp, const struct xfs_imeta_path *path, + xfs_ino_t *ino); + +/* Don't allocate quota for this file. 
*/ +#define XFS_IMETA_CREATE_NOQUOTA (1 << 0) +int xfs_imeta_create(struct xfs_trans **tpp, const struct xfs_imeta_path *path, + umode_t mode, unsigned int flags, struct xfs_inode **ipp, + struct xfs_imeta_update *upd); +int xfs_imeta_unlink(struct xfs_trans **tpp, const struct xfs_imeta_path *path, + struct xfs_inode *ip, struct xfs_imeta_update *upd); +int xfs_imeta_link(struct xfs_trans *tp, const struct xfs_imeta_path *path, + struct xfs_inode *ip, struct xfs_imeta_update *upd); +void xfs_imeta_end_update(struct xfs_mount *mp, struct xfs_imeta_update *upd, + int error); +int xfs_imeta_start_update(struct xfs_mount *mp, + const struct xfs_imeta_path *path, + struct xfs_imeta_update *upd); + +bool xfs_is_static_meta_ino(struct xfs_mount *mp, xfs_ino_t ino); +int xfs_imeta_mount(struct xfs_mount *mp); + +#endif /* __XFS_IMETA_H__ */ diff --git a/libxfs/xfs_types.c b/libxfs/xfs_types.c index 87abc824479..93eefd7b35f 100644 --- a/libxfs/xfs_types.c +++ b/libxfs/xfs_types.c @@ -12,6 +12,7 @@ #include "xfs_bit.h" #include "xfs_mount.h" #include "xfs_ag.h" +#include "xfs_imeta.h" /* @@ -115,9 +116,7 @@ xfs_internal_inum( struct xfs_mount *mp, xfs_ino_t ino) { - return ino == mp->m_sb.sb_rbmino || ino == mp->m_sb.sb_rsumino || - (xfs_has_quota(mp) && - xfs_is_quota_inode(&mp->m_sb, ino)); + return xfs_is_static_meta_ino(mp, ino); } /* diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c index cca3497ab64..bd730d6cb07 100644 --- a/mkfs/xfs_mkfs.c +++ b/mkfs/xfs_mkfs.c @@ -3824,6 +3824,7 @@ start_superblock_setup( struct xfs_mount *mp, struct xfs_sb *sbp) { + sbp->sb_inprogress = 1; /* mkfs is in progress */ sbp->sb_magicnum = XFS_SB_MAGIC; sbp->sb_sectsize = (uint16_t)cfg->sectorsize; sbp->sb_sectlog = (uint8_t)cfg->sectorlog; @@ -3907,7 +3908,6 @@ finish_superblock_setup( sbp->sb_logblocks = (xfs_extlen_t)cfg->logblocks; sbp->sb_rextslog = (uint8_t)(cfg->rtextents ? 
libxfs_highbit32((unsigned int)cfg->rtextents) : 0); - sbp->sb_inprogress = 1; /* mkfs is in progress */ sbp->sb_imax_pct = cfg->imaxpct; sbp->sb_icount = 0; sbp->sb_ifree = 0; ^ permalink raw reply related [flat|nested] 565+ messages in thread
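As an aside on the xfs_imeta_sbmaps design in this patch: the sketch below is a stand-alone imitation of the idea, namely a NULL-terminated table that maps a static path key (compared by address, as xfs_imeta_path_compare does at this stage) to the byte offset of an inode-number field, resolved with offsetof(). All demo_ and DEMO_ names are hypothetical; the structures are deliberately tiny and are not the real xfs_sb or xfs_imeta_path.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NULLINO	((uint64_t)-1)	/* stand-in for NULLFSINO */

/* Hypothetical miniature superblock with two inode pointers. */
struct demo_sb {
	uint64_t	rbmino;
	uint64_t	rsumino;
};

/* Lookup key; only its address matters, as with the "bogus" field. */
struct demo_path {
	int		bogus;
};

const struct demo_path DEMO_RTBITMAP = { .bogus = 0 };
const struct demo_path DEMO_RTSUMMARY = { .bogus = 1 };
const struct demo_path DEMO_UNKNOWN = { .bogus = 2 };	/* not in the table */

/* Map each known key to the offset of its field in struct demo_sb. */
static const struct demo_sbmap {
	const struct demo_path	*path;
	size_t			offset;
} demo_sbmaps[] = {
	{ &DEMO_RTBITMAP,	offsetof(struct demo_sb, rbmino) },
	{ &DEMO_RTSUMMARY,	offsetof(struct demo_sb, rsumino) },
	{ NULL, 0 },
};

/* Return a pointer to the field for @path, or NULL for an unknown key. */
uint64_t *
demo_path_to_inop(
	struct demo_sb		*sb,
	const struct demo_path	*path)
{
	const struct demo_sbmap	*p;

	for (p = demo_sbmaps; p->path; p++)
		if (p->path == path)
			return (uint64_t *)((char *)sb + p->offset);
	return NULL;
}

/* Point @path at @ino; refuse unknown keys and already-set fields. */
int
demo_set(
	struct demo_sb		*sb,
	const struct demo_path	*path,
	uint64_t		ino)
{
	uint64_t		*inop = demo_path_to_inop(sb, path);

	if (!inop)
		return -EINVAL;
	if (*inop != DEMO_NULLINO)
		return -EEXIST;
	*inop = ino;
	return 0;
}
```

One consequence of comparing keys by address is that callers must pass the exported constants themselves; a copy of a key with the same contents would not match any table entry.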
* [PATCH 05/46] mkfs: break up the rest of the rtinit() function 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 01/46] xfs: create imeta abstractions to get and set metadata inodes Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 12/46] xfs: read and write metadata inode directory Darrick J. Wong ` (40 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Break up this really long function into smaller functions that each do one thing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- mkfs/proto.c | 106 +++++++++++++++++++++++++++++++++++++++++++--------------- 1 file changed, 79 insertions(+), 27 deletions(-) diff --git a/mkfs/proto.c b/mkfs/proto.c index 354c9fa8a02..f145a7ba753 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -673,28 +673,16 @@ parse_proto( parseproto(mp, NULL, fsx, pp, NULL); } -/* - * Allocate the realtime bitmap and summary inodes, and fill in data if any. - */ +/* Create the realtime bitmap inode. */ static void -rtinit( +rtbitmap_create( struct xfs_mount *mp) { struct xfs_imeta_update upd; - struct xfs_bmbt_irec map[XFS_BMAP_MAX_NMAP]; + struct xfs_trans *tp; struct xfs_inode *rbmip; - struct xfs_inode *rsumip; - struct xfs_trans *tp; - struct xfs_bmbt_irec *ep; - xfs_fileoff_t bno; - xfs_fileoff_t ebno; - xfs_extlen_t nsumblocks; - uint blocks; - int i; - int nmap; int error; - /* Create the realtime bitmap inode. */ error = -libxfs_imeta_start_update(mp, &XFS_IMETA_RTBITMAP, &upd); if (error) res_failed(error); @@ -719,8 +707,18 @@ rtinit( error); mp->m_rbmip = rbmip; libxfs_imeta_end_update(mp, &upd, 0); +} + +/* Create the realtime summary inode. 
*/ +static void +rtsummary_create( + struct xfs_mount *mp) +{ + struct xfs_imeta_update upd; + struct xfs_trans *tp; + struct xfs_inode *rsumip; + int error; - /* Create the realtime summary inode. */ error = -libxfs_imeta_start_update(mp, &XFS_IMETA_RTSUMMARY, &upd); if (error) res_failed(error); @@ -743,19 +741,33 @@ rtinit( error); mp->m_rsumip = rsumip; libxfs_imeta_end_update(mp, &upd, 0); +} + +/* Zero the realtime bitmap. */ +static void +rtbitmap_init( + struct xfs_mount *mp) +{ + struct xfs_bmbt_irec map[XFS_BMAP_MAX_NMAP]; + struct xfs_trans *tp; + struct xfs_bmbt_irec *ep; + xfs_fileoff_t bno; + uint blocks; + int i; + int nmap; + int error; - /* Zero the realtime bitmap. */ blocks = mp->m_sb.sb_rbmblocks + XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1; error = -libxfs_trans_alloc_rollable(mp, blocks, &tp); if (error) res_failed(error); - libxfs_trans_ijoin(tp, rbmip, 0); + libxfs_trans_ijoin(tp, mp->m_rbmip, 0); bno = 0; while (bno < mp->m_sb.sb_rbmblocks) { nmap = XFS_BMAP_MAX_NMAP; - error = -libxfs_bmapi_write(tp, rbmip, bno, + error = -libxfs_bmapi_write(tp, mp->m_rbmip, bno, (xfs_extlen_t)(mp->m_sb.sb_rbmblocks - bno), 0, mp->m_sb.sb_rbmblocks, map, &nmap); if (error) @@ -774,19 +786,34 @@ rtinit( if (error) fail(_("Block allocation of the realtime bitmap inode failed"), error); +} + +/* Zero the realtime summary file. */ +static void +rtsummary_init( + struct xfs_mount *mp) +{ + struct xfs_bmbt_irec map[XFS_BMAP_MAX_NMAP]; + struct xfs_trans *tp; + struct xfs_bmbt_irec *ep; + xfs_fileoff_t bno; + xfs_extlen_t nsumblocks; + uint blocks; + int i; + int nmap; + int error; - /* Zero the summary file. 
*/ nsumblocks = mp->m_rsumsize >> mp->m_sb.sb_blocklog; blocks = nsumblocks + XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1; error = -libxfs_trans_alloc_rollable(mp, blocks, &tp); if (error) res_failed(error); - libxfs_trans_ijoin(tp, rsumip, 0); + libxfs_trans_ijoin(tp, mp->m_rsumip, 0); bno = 0; while (bno < nsumblocks) { nmap = XFS_BMAP_MAX_NMAP; - error = -libxfs_bmapi_write(tp, rsumip, bno, + error = -libxfs_bmapi_write(tp, mp->m_rsumip, bno, (xfs_extlen_t)(nsumblocks - bno), 0, nsumblocks, map, &nmap); if (error) @@ -805,18 +832,28 @@ rtinit( if (error) fail(_("Block allocation of the realtime summary inode failed"), error); +} + +/* + * Free the whole realtime area using transactions. + * Do one transaction per bitmap block. + */ +static void +rtfreesp_init( + struct xfs_mount *mp) +{ + struct xfs_trans *tp; + xfs_fileoff_t bno; + xfs_fileoff_t ebno; + int error; - /* - * Free the whole area using transactions. - * Do one transaction per bitmap block. - */ for (bno = 0; bno < mp->m_sb.sb_rextents; bno = ebno) { error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp); if (error) res_failed(error); - libxfs_trans_ijoin(tp, rbmip, 0); + libxfs_trans_ijoin(tp, mp->m_rbmip, 0); ebno = XFS_RTMIN(mp->m_sb.sb_rextents, bno + NBBY * mp->m_sb.sb_blocksize); @@ -832,6 +869,21 @@ rtinit( } } +/* + * Allocate the realtime bitmap and summary inodes, and fill in data if any. + */ +static void +rtinit( + struct xfs_mount *mp) +{ + rtbitmap_create(mp); + rtsummary_create(mp); + + rtbitmap_init(mp); + rtsummary_init(mp); + rtfreesp_init(mp); +} + static long filesize( int fd) ^ permalink raw reply related [flat|nested] 565+ messages in thread
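To make the "one transaction per bitmap block" stepping in rtfreesp_init() concrete: each bitmap block holds NBBY * blocksize bits, one per realtime extent, so the loop advances in chunks of that size and clamps the last one. The helper below is a hypothetical, stand-alone counter that only mirrors the loop bounds (it assumes blocksize > 0 and performs no I/O or transactions).

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NBBY	8	/* bits per byte, same value as NBBY */

/* Count the [bno, ebno) chunks that the rtfreesp_init()-style loop visits. */
uint64_t
demo_count_chunks(
	uint64_t	rextents,
	uint32_t	blocksize)
{
	uint64_t	per_block = (uint64_t)DEMO_NBBY * blocksize;
	uint64_t	bno;
	uint64_t	ebno;
	uint64_t	nr = 0;

	for (bno = 0; bno < rextents; bno = ebno) {
		/* same clamp as XFS_RTMIN(rextents, bno + NBBY * blocksize) */
		ebno = bno + per_block;
		if (ebno > rextents)
			ebno = rextents;

		/* the real code frees [bno, ebno) in its own transaction */
		nr++;
	}
	return nr;
}
```

With 4096-byte blocks each chunk covers 32768 realtime extents, so a volume with 100000 extents would take four passes through the loop under these assumptions.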
* [PATCH 12/46] xfs: read and write metadata inode directory 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 05/46] mkfs: break up the rest of the rtinit() function Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 08/46] xfs: update imeta transaction reservations for metadir Darrick J. Wong ` (39 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Plumb in the bits we need to look up metadata inode numbers from the metadata inode directory and save them back. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/xfs_inode.h | 8 + include/xfs_trace.h | 6 libxfs/inode.c | 93 ++++++ libxfs/libxfs_api_defs.h | 1 libxfs/libxfs_priv.h | 2 libxfs/xfs_imeta.c | 699 ++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_imeta.h | 17 + libxfs/xfs_inode_util.c | 2 libxfs/xfs_trans_resv.c | 8 - 9 files changed, 829 insertions(+), 7 deletions(-) diff --git a/include/xfs_inode.h b/include/xfs_inode.h index b099c036ef2..b8e82090628 100644 --- a/include/xfs_inode.h +++ b/include/xfs_inode.h @@ -32,6 +32,11 @@ static inline kgid_t make_kgid(gid_t gid) return v; } +#define KUIDT_INIT(value) (kuid_t){ value } +#define KGIDT_INIT(value) (kgid_t){ value } +#define GLOBAL_ROOT_UID KUIDT_INIT(0) +#define GLOBAL_ROOT_GID KGIDT_INIT(0) + /* These match kernel side includes */ #include "xfs_inode_buf.h" #include "xfs_inode_fork.h" @@ -320,4 +325,7 @@ extern void libxfs_irele(struct xfs_inode *ip); #define xfs_inherit_nosymlinks (false) #define xfs_inherit_nodefrag (false) +int libxfs_ifree_cluster(struct xfs_trans *tp, struct xfs_perag *pag, + struct xfs_inode *free_ip, struct xfs_icluster *xic); + #endif /* __XFS_INODE_H__ */ diff --git a/include/xfs_trace.h b/include/xfs_trace.h index 
78bce651a6f..fef869dbea3 100644 --- a/include/xfs_trace.h +++ b/include/xfs_trace.h @@ -349,6 +349,12 @@ #define trace_xfs_perag_get_tag(a,b,c,d) ((c) = (c)) #define trace_xfs_perag_put(a,b,c,d) ((c) = (c)) +#define trace_xfs_imeta_dir_link(...) ((void) 0) +#define trace_xfs_imeta_dir_lookup_component(...) ((void) 0) +#define trace_xfs_imeta_dir_lookup_found(...) ((void) 0) +#define trace_xfs_imeta_dir_try_create(...) ((void) 0) +#define trace_xfs_imeta_dir_created(...) ((void) 0) +#define trace_xfs_imeta_dir_unlinked(...) ((void) 0) #define trace_xfs_imeta_end_update(...) ((void) 0) #define trace_xfs_imeta_sb_link(...) ((void) 0) #define trace_xfs_imeta_sb_lookup(...) ((void) 0) diff --git a/libxfs/inode.c b/libxfs/inode.c index db42529e07e..8dabad93247 100644 --- a/libxfs/inode.c +++ b/libxfs/inode.c @@ -314,3 +314,96 @@ void inode_init_owner(struct user_namespace *mnt_userns, struct inode *inode, inode_fsgid_set(inode, mnt_userns); inode->i_mode = mode; } + +/* + * This call is used to indicate that the buffer is going to + * be staled and was an inode buffer. This means it gets + * special processing during unpin - where any inodes + * associated with the buffer should be removed from ail. + * There is also special processing during recovery, + * any replay of the inodes in the buffer needs to be + * prevented as the buffer may have been reused. + */ +static void +xfs_trans_stale_inode_buf( + xfs_trans_t *tp, + struct xfs_buf *bp) +{ + ASSERT(bp->b_transp == tp); + ASSERT(bip != NULL); + ASSERT(atomic_read(&bip->bli_refcount) > 0); + + bp->b_flags |= _XBF_INODES; + xfs_trans_buf_set_type(tp, bp, XFS_BLFT_DINO_BUF); +} + +/* + * A big issue when freeing the inode cluster is that we _cannot_ skip any + * inodes that are in memory - they all must be marked stale and attached to + * the cluster buffer. 
+ */ +int +libxfs_ifree_cluster( + struct xfs_trans *tp, + struct xfs_perag *pag, + struct xfs_inode *free_ip, + struct xfs_icluster *xic) +{ + struct xfs_mount *mp = free_ip->i_mount; + struct xfs_ino_geometry *igeo = M_IGEO(mp); + struct xfs_buf *bp; + xfs_daddr_t blkno; + xfs_ino_t inum = xic->first_ino; + int nbufs; + int j; + int ioffset; + int error; + + nbufs = igeo->ialloc_blks / igeo->blocks_per_cluster; + + for (j = 0; j < nbufs; j++, inum += igeo->inodes_per_cluster) { + /* + * The allocation bitmap tells us which inodes of the chunk were + * physically allocated. Skip the cluster if an inode falls into + * a sparse region. + */ + ioffset = inum - xic->first_ino; + if ((xic->alloc & XFS_INOBT_MASK(ioffset)) == 0) { + ASSERT(ioffset % igeo->inodes_per_cluster == 0); + continue; + } + + blkno = XFS_AGB_TO_DADDR(mp, XFS_INO_TO_AGNO(mp, inum), + XFS_INO_TO_AGBNO(mp, inum)); + + /* + * We obtain and lock the backing buffer first in the process + * here to ensure dirty inodes attached to the buffer remain in + * the flushing state while we mark them stale. + * + * If we scan the in-memory inodes first, then buffer IO can + * complete before we get a lock on it, and hence we may fail + * to mark all the active inodes on the buffer stale. + */ + error = xfs_trans_get_buf(tp, mp->m_ddev_targp, blkno, + mp->m_bsize * igeo->blocks_per_cluster, + XBF_UNMAPPED, &bp); + if (error) + return error; + + /* + * This buffer may not have been correctly initialised as we + * didn't read it from disk. That's not important because we are + * only using it to mark the buffer as stale in the log, and to + * attach stale cached inodes on it. That means it will never be + * dispatched for IO. If it is, we want to know about it, and we + * want it to fail. We can achieve this by adding a write + * verifier to the buffer. 
+ */ + bp->b_ops = &xfs_inode_buf_ops; + + xfs_trans_stale_inode_buf(tp, bp); + xfs_trans_binval(tp, bp); + } + return 0; +} diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index c27949e5f48..d4cc059abfb 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -154,6 +154,7 @@ #define xfs_iext_lookup_extent libxfs_iext_lookup_extent #define xfs_iext_next libxfs_iext_next #define xfs_ifork_zap_attr libxfs_ifork_zap_attr +#define xfs_ifree_cluster libxfs_ifree_cluster #define xfs_imap_to_bp libxfs_imap_to_bp #define xfs_imeta_create libxfs_imeta_create diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h index 85b54f16803..b7885cfea06 100644 --- a/libxfs/libxfs_priv.h +++ b/libxfs/libxfs_priv.h @@ -167,6 +167,8 @@ enum ce { CE_DEBUG, CE_CONT, CE_NOTE, CE_WARN, CE_ALERT, CE_PANIC }; #define XFS_ERRLEVEL_LOW 1 #define XFS_ILOCK_EXCL 0 +#define XFS_IOLOCK_SHARED 0 +#define XFS_IOLOCK_EXCL 0 #define XFS_STATS_INC(mp, count) do { (mp) = (mp); } while (0) #define XFS_STATS_DEC(mp, count, x) do { (mp) = (mp); } while (0) #define XFS_STATS_ADD(mp, count, x) do { (mp) = (mp); } while (0) diff --git a/libxfs/xfs_imeta.c b/libxfs/xfs_imeta.c index 8fa0b5a5c1c..9e92186b58c 100644 --- a/libxfs/xfs_imeta.c +++ b/libxfs/xfs_imeta.c @@ -22,6 +22,9 @@ #include "xfs_da_format.h" #include "xfs_da_btree.h" #include "xfs_trans_space.h" +#include "xfs_dir2.h" +#include "xfs_dir2_priv.h" +#include "xfs_ag.h" /* * Metadata Inode Number Management @@ -42,9 +45,16 @@ * this structure must be passed to xfs_imeta_end_update to free resources that * cannot be freed during the transaction. * - * Right now we only support callers passing in the predefined metadata inode - * paths; the goal is that callers will some day locate metadata inodes based - * on path lookups into a metadata directory structure. + * When the metadata directory tree (metadir) feature is enabled, we can create + * a complex directory tree in which to store metadata inodes. 
Inodes within + * the metadata directory tree should have the "metadata" inode flag set to + * prevent them from being exposed to the outside world. + * + * Callers are expected to take the IOLOCK of metadata directories when + * performing lookups or updates to the tree. They are expected to take the + * ILOCK of any inode in the metadata directory tree (just like the regular directory tree) to + * synchronize access to that inode. It is not necessary to take the MMAPLOCK + * since metadata inodes should never be exposed to user space. */ /* Static metadata inode paths */ @@ -60,6 +70,11 @@ XFS_IMETA_DEFINE_PATH(XFS_IMETA_USRQUOTA, usrquota_path); XFS_IMETA_DEFINE_PATH(XFS_IMETA_GRPQUOTA, grpquota_path); XFS_IMETA_DEFINE_PATH(XFS_IMETA_PRJQUOTA, prjquota_path); +const struct xfs_imeta_path XFS_IMETA_METADIR = { + .im_depth = 0, + .im_ftype = XFS_DIR3_FT_DIR, +}; + /* Are these two paths equal? */ STATIC bool xfs_imeta_path_compare( @@ -117,6 +132,10 @@ static const struct xfs_imeta_sbmap { .path = &XFS_IMETA_PRJQUOTA, .offset = offsetof(struct xfs_sb, sb_pquotino), }, + { + .path = &XFS_IMETA_METADIR, + .offset = offsetof(struct xfs_sb, sb_metadirino), + }, { NULL, 0 }, }; @@ -288,6 +307,523 @@ xfs_imeta_sb_link( return 0; } +/* Functions for storing and retrieving metadata directory inode values. */ + +static inline void +set_xname( + struct xfs_name *xname, + const struct xfs_imeta_path *path, + unsigned int path_idx, + unsigned char ftype) +{ + xname->name = (const unsigned char *)path->im_path[path_idx]; + xname->len = strlen(path->im_path[path_idx]); + xname->type = ftype; +} + +/* Look up the inode number and filetype for an exact name in a directory. 
*/ +static inline int +xfs_imeta_dir_lookup( + struct xfs_inode *dp, + struct xfs_name *xname, + xfs_ino_t *ino) +{ + struct xfs_da_args args = { + .dp = dp, + .geo = dp->i_mount->m_dir_geo, + .name = xname->name, + .namelen = xname->len, + .hashval = xfs_dir2_hashname(dp->i_mount, xname), + .whichfork = XFS_DATA_FORK, + .op_flags = XFS_DA_OP_OKNOENT, + .owner = dp->i_ino, + }; + unsigned int lock_mode; + bool isblock, isleaf; + int error; + + if (xfs_is_shutdown(dp->i_mount)) + return -EIO; + + lock_mode = xfs_ilock_data_map_shared(dp); + if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL) { + error = xfs_dir2_sf_lookup(&args); + goto out_unlock; + } + + /* dir2 functions require that the data fork is loaded */ + error = xfs_iread_extents(NULL, dp, XFS_DATA_FORK); + if (error) + goto out_unlock; + + error = xfs_dir2_isblock(&args, &isblock); + if (error) + goto out_unlock; + + if (isblock) { + error = xfs_dir2_block_lookup(&args); + goto out_unlock; + } + + error = xfs_dir2_isleaf(&args, &isleaf); + if (error) + goto out_unlock; + + if (isleaf) { + error = xfs_dir2_leaf_lookup(&args); + goto out_unlock; + } + + error = xfs_dir2_node_lookup(&args); + +out_unlock: + xfs_iunlock(dp, lock_mode); + if (error == -EEXIST) + error = 0; + if (error) + return error; + + *ino = args.inumber; + xname->type = args.filetype; + return 0; +} +/* + * Given a parent directory @dp, a metadata inode @path and component + * @path_idx, and the expected file type @ftype of the path component, fill out + * the @xname and look up the inode number in the directory, returning it in + * @ino. 
+ */ +static inline int +xfs_imeta_dir_lookup_component( + struct xfs_inode *dp, + struct xfs_name *xname, + xfs_ino_t *ino) +{ + int type_wanted = xname->type; + int error; + + trace_xfs_imeta_dir_lookup_component(dp, xname, NULLFSINO); + + if (!S_ISDIR(VFS_I(dp)->i_mode)) + return -EFSCORRUPTED; + + error = xfs_imeta_dir_lookup(dp, xname, ino); + if (error) + return error; + if (!xfs_verify_ino(dp->i_mount, *ino)) + return -EFSCORRUPTED; + if (type_wanted != XFS_DIR3_FT_UNKNOWN && xname->type != type_wanted) + return -EFSCORRUPTED; + + trace_xfs_imeta_dir_lookup_found(dp, xname, *ino); + return 0; +} + +/* + * Traverse a metadata directory tree path, returning the inode corresponding + * to the parent of the last path component. If any of the path components do + * not exist, return -ENOENT. + */ +STATIC int +xfs_imeta_dir_parent( + struct xfs_mount *mp, + const struct xfs_imeta_path *path, + struct xfs_inode **dpp) +{ + struct xfs_name xname; + struct xfs_inode *dp; + xfs_ino_t ino; + unsigned int i; + int error; + + if (mp->m_metadirip == NULL) + return -ENOENT; + + /* Grab the metadir root. */ + error = xfs_imeta_iget(mp, mp->m_metadirip->i_ino, XFS_DIR3_FT_DIR, + &dp); + if (error) + return error; + + /* Caller wanted the root, we're done! */ + if (path->im_depth == 0) { + *dpp = dp; + return 0; + } + + for (i = 0; i < path->im_depth - 1; i++) { + struct xfs_inode *ip = NULL; + + xfs_ilock(dp, XFS_IOLOCK_SHARED); + + /* Look up the name in the current directory. */ + set_xname(&xname, path, i, XFS_DIR3_FT_DIR); + error = xfs_imeta_dir_lookup_component(dp, &xname, &ino); + if (error) + goto out_rele; + + /* + * Grab the child inode while we still have the parent + * directory locked. 
+ */ + error = xfs_imeta_iget(mp, ino, XFS_DIR3_FT_DIR, &ip); + if (error) + goto out_rele; + + xfs_iunlock(dp, XFS_IOLOCK_SHARED); + xfs_imeta_irele(dp); + dp = ip; + } + + *dpp = dp; + return 0; + +out_rele: + xfs_iunlock(dp, XFS_IOLOCK_SHARED); + xfs_imeta_irele(dp); + return error; +} + +/* + * Look up a metadata inode from the metadata directory. If the last path + * component doesn't exist, return NULLFSINO. If any other part of the path + * does not exist, return -ENOENT so we can distinguish the two. + */ +STATIC int +xfs_imeta_dir_lookup_int( + struct xfs_mount *mp, + const struct xfs_imeta_path *path, + xfs_ino_t *inop) +{ + struct xfs_name xname; + struct xfs_inode *dp = NULL; + xfs_ino_t ino; + int error; + + /* metadir ino is recorded in superblock */ + if (xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) + return xfs_imeta_sb_lookup(mp, path, inop); + + ASSERT(path->im_depth > 0); + + /* Find the parent of the last path component. */ + error = xfs_imeta_dir_parent(mp, path, &dp); + if (error) + return error; + + xfs_ilock(dp, XFS_IOLOCK_SHARED); + + /* Look up the name in the current directory. */ + set_xname(&xname, path, path->im_depth - 1, path->im_ftype); + error = xfs_imeta_dir_lookup_component(dp, &xname, &ino); + switch (error) { + case 0: + *inop = ino; + break; + case -ENOENT: + *inop = NULLFSINO; + error = 0; + break; + } + + xfs_iunlock(dp, XFS_IOLOCK_SHARED); + xfs_imeta_irele(dp); + return error; +} + +/* + * Load all the metadata inode pointers that are cached in the in-core + * superblock but live somewhere in the metadata directory tree. 
+ */ +STATIC int +xfs_imeta_dir_mount( + struct xfs_mount *mp) +{ + const struct xfs_imeta_sbmap *p; + xfs_ino_t *sb_inop; + int err2; + int error = 0; + + for (p = xfs_imeta_sbmaps; p->path && p->path->im_depth > 0; p++) { + if (p->path == &XFS_IMETA_METADIR) + continue; + sb_inop = xfs_imeta_sbmap_to_inop(mp, p); + err2 = xfs_imeta_dir_lookup_int(mp, p->path, sb_inop); + if (err2 == -ENOENT) { + *sb_inop = NULLFSINO; + continue; + } + if (!error && err2) + error = err2; + } + + return error; +} + +/* Set up an inode to be recognized as a metadata inode. */ +void +xfs_imeta_set_metaflag( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + VFS_I(ip)->i_mode &= ~0777; + VFS_I(ip)->i_uid = GLOBAL_ROOT_UID; + VFS_I(ip)->i_gid = GLOBAL_ROOT_GID; + ip->i_projid = 0; + ip->i_diflags |= (XFS_DIFLAG_IMMUTABLE | XFS_DIFLAG_SYNC | + XFS_DIFLAG_NOATIME | XFS_DIFLAG_NODUMP | + XFS_DIFLAG_NODEFRAG); + if (S_ISDIR(VFS_I(ip)->i_mode)) + ip->i_diflags |= XFS_DIFLAG_NOSYMLINKS; + ip->i_diflags2 &= ~XFS_DIFLAG2_DAX; + ip->i_diflags2 |= XFS_DIFLAG2_METADATA; + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); +} + +/* + * Create a new metadata inode accessible via the given metadata directory path. + * Callers must ensure that the directory entry does not already exist; a new + * one will be created. + */ +STATIC int +xfs_imeta_dir_create( + struct xfs_trans **tpp, + const struct xfs_imeta_path *path, + umode_t mode, + unsigned int flags, + struct xfs_inode **ipp, + struct xfs_imeta_update *upd) +{ + struct xfs_icreate_args args = { + .nlink = S_ISDIR(mode) ? 
2 : 1, + }; + struct xfs_name xname; + struct xfs_mount *mp = (*tpp)->t_mountp; + struct xfs_inode *dp = upd->dp; + xfs_ino_t *sb_inop; + xfs_ino_t ino; + unsigned int resblks; + int error; + + xfs_icreate_args_rootfile(&args, mode); + + /* metadir ino is recorded in superblock; only mkfs gets to do this */ + if (xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) { + error = xfs_imeta_sb_create(tpp, path, mode, flags, ipp); + if (error) + return error; + + /* Set the metadata iflag, initialize directory. */ + xfs_imeta_set_metaflag(*tpp, *ipp); + return xfs_dir_init(*tpp, *ipp, *ipp); + } + + ASSERT(path->im_depth > 0); + + /* Check that the name does not already exist in the directory. */ + set_xname(&xname, path, path->im_depth - 1, XFS_DIR3_FT_UNKNOWN); + error = xfs_imeta_dir_lookup_component(dp, &xname, &ino); + switch (error) { + case -ENOENT: + break; + case 0: + error = -EEXIST; + fallthrough; + default: + return error; + } + + xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT); + args.pip = dp; + + /* + * A newly created regular or special file just has one directory + * entry pointing to it, but a directory also has the "." entry + * pointing to itself. + */ + error = xfs_dialloc(tpp, dp->i_ino, mode, &ino); + if (error) + goto out_ilock; + error = xfs_icreate(*tpp, ino, &args, ipp); + if (error) + goto out_ilock; + xfs_imeta_set_metaflag(*tpp, *ipp); + + /* + * Once we join the parent directory to the transaction we can't + * release it until after the transaction commits or cancels, so we + * must defer releasing it to end_update. This is different from + * regular file creation, where the vfs holds the parent dir reference + * and will free it. The caller is always responsible for releasing + * ipp, even if we failed. + */ + xfs_trans_ijoin(*tpp, dp, XFS_ILOCK_EXCL); + + /* Create the entry. 
*/ + if (S_ISDIR(args.mode)) + resblks = XFS_MKDIR_SPACE_RES(mp, xname.len); + else + resblks = XFS_CREATE_SPACE_RES(mp, xname.len); + xname.type = xfs_mode_to_ftype(args.mode); + trace_xfs_imeta_dir_try_create(dp, &xname, NULLFSINO); + error = xfs_dir_create_new_child(*tpp, resblks, dp, &xname, *ipp); + if (error) + return error; + trace_xfs_imeta_dir_created(*ipp, &xname, ino); + + /* Attach dquots to this file. Caller should have allocated them! */ + if (!(flags & XFS_IMETA_CREATE_NOQUOTA)) { + error = xfs_qm_dqattach_locked(*ipp, false); + if (error) + return error; + xfs_trans_mod_dquot_byino(*tpp, *ipp, XFS_TRANS_DQ_ICOUNT, 1); + } + + /* Update the in-core superblock value if there is one. */ + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (sb_inop) + *sb_inop = ino; + return 0; + +out_ilock: + xfs_iunlock(dp, XFS_ILOCK_EXCL); + return error; +} + +/* + * Remove the given entry from the metadata directory and drop the link count + * of the metadata inode. + */ +STATIC int +xfs_imeta_dir_unlink( + struct xfs_trans **tpp, + const struct xfs_imeta_path *path, + struct xfs_inode *ip, + struct xfs_imeta_update *upd) +{ + struct xfs_name xname; + struct xfs_mount *mp = (*tpp)->t_mountp; + struct xfs_inode *dp = upd->dp; + xfs_ino_t *sb_inop; + xfs_ino_t ino; + unsigned int resblks; + int error; + + /* Metadata directory root cannot be unlinked. */ + if (xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) { + ASSERT(0); + return -EFSCORRUPTED; + } + + ASSERT(path->im_depth > 0); + + /* Look up the name in the current directory. 
*/ + set_xname(&xname, path, path->im_depth - 1, + xfs_mode_to_ftype(VFS_I(ip)->i_mode)); + error = xfs_imeta_dir_lookup_component(dp, &xname, &ino); + switch (error) { + case 0: + if (ino != ip->i_ino) + error = -ENOENT; + break; + case -ENOENT: + error = -EFSCORRUPTED; + break; + } + if (error) + return error; + + xfs_lock_two_inodes(dp, XFS_ILOCK_EXCL, ip, XFS_ILOCK_EXCL); + + /* + * Once we join the parent directory to the transaction we can't + * release it until after the transaction commits or cancels, so we + * must defer releasing it to end_update. This is different from + * regular file removal, where the vfs holds the parent dir reference + * and will free it. The unlink caller is always responsible for + * releasing ip, so we don't need to take care of that. + */ + xfs_trans_ijoin(*tpp, dp, XFS_ILOCK_EXCL); + xfs_trans_ijoin(*tpp, ip, XFS_ILOCK_EXCL); + + resblks = XFS_REMOVE_SPACE_RES(mp); + error = xfs_dir_remove_child(*tpp, resblks, dp, &xname, ip); + if (error) + return error; + trace_xfs_imeta_dir_unlinked(dp, &xname, ip->i_ino); + + /* Update the in-core superblock value if there is one. */ + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (sb_inop) + *sb_inop = NULLFSINO; + return 0; +} + +/* Set the given path in the metadata directory to point to an inode. */ +STATIC int +xfs_imeta_dir_link( + struct xfs_trans *tp, + const struct xfs_imeta_path *path, + struct xfs_inode *ip, + struct xfs_imeta_update *upd) +{ + struct xfs_name xname; + struct xfs_mount *mp = tp->t_mountp; + struct xfs_inode *dp = upd->dp; + xfs_ino_t *sb_inop; + xfs_ino_t ino; + unsigned int resblks; + int error; + + /* Metadata directory root cannot be linked. */ + if (xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) { + ASSERT(0); + return -EFSCORRUPTED; + } + + ASSERT(path->im_depth > 0); + + /* Look up the name in the current directory. 
*/ + set_xname(&xname, path, path->im_depth - 1, + xfs_mode_to_ftype(VFS_I(ip)->i_mode)); + error = xfs_imeta_dir_lookup_component(dp, &xname, &ino); + switch (error) { + case -ENOENT: + break; + case 0: + error = -EEXIST; + fallthrough; + default: + return error; + } + + xfs_lock_two_inodes(ip, XFS_ILOCK_EXCL, dp, XFS_ILOCK_EXCL); + + /* + * Once we join the parent directory to the transaction we can't + * release it until after the transaction commits or cancels, so we + * must defer releasing it to end_update. This is different from + * regular file linking, where the vfs holds the parent dir reference + * and will free it. The link caller is always responsible for + * releasing ip, so we don't need to take care of that. + */ + xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL); + xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL); + + resblks = XFS_LINK_SPACE_RES(mp, xname.len); + error = xfs_dir_link_existing_child(tp, resblks, dp, &xname, ip); + if (error) + return error; + + trace_xfs_imeta_dir_link(dp, &xname, ip->i_ino); + + /* Update the in-core superblock value if there is one. */ + sb_inop = xfs_imeta_path_to_sb_inop(mp, path); + if (sb_inop) + *sb_inop = ip->i_ino; + return 0; +} + /* General functions for managing metadata inode pointers */ /* @@ -317,7 +853,13 @@ xfs_imeta_lookup( ASSERT(xfs_imeta_path_check(path)); - error = xfs_imeta_sb_lookup(mp, path, &ino); + if (xfs_has_metadir(mp)) { + error = xfs_imeta_dir_lookup_int(mp, path, &ino); + if (error == -ENOENT) + return -EFSCORRUPTED; + } else { + error = xfs_imeta_sb_lookup(mp, path, &ino); + } if (error) return error; @@ -350,12 +892,49 @@ xfs_imeta_create( struct xfs_inode **ipp, struct xfs_imeta_update *upd) { + struct xfs_mount *mp = (*tpp)->t_mountp; + ASSERT(xfs_imeta_path_check(path)); *ipp = NULL; + if (xfs_has_metadir(mp)) + return xfs_imeta_dir_create(tpp, path, mode, flags, ipp, + upd); return xfs_imeta_sb_create(tpp, path, mode, flags, ipp); } +/* Free a file from the metadata directory tree. 
*/ +STATIC int +xfs_imeta_ifree( + struct xfs_trans *tp, + struct xfs_inode *ip) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_perag *pag; + struct xfs_icluster xic = { 0 }; + int error; + + ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL)); + ASSERT(VFS_I(ip)->i_nlink == 0); + ASSERT(ip->i_df.if_nextents == 0); + ASSERT(ip->i_disk_size == 0 || !S_ISREG(VFS_I(ip)->i_mode)); + ASSERT(ip->i_nblocks == 0); + + pag = xfs_perag_get(mp, XFS_INO_TO_AGNO(mp, ip->i_ino)); + + error = xfs_dir_ifree(tp, pag, ip, &xic); + if (error) + goto out; + + /* Metadata files do not support ownership changes or DMAPI. */ + + if (xic.deleted) + error = xfs_ifree_cluster(tp, pag, ip, &xic); +out: + xfs_perag_put(pag); + return error; +} + /* * Unlink a metadata inode @ip from the metadata directory given by @path. The * metadata inode must not be ILOCKed. Upon return, the inode will be ijoined @@ -369,10 +948,28 @@ xfs_imeta_unlink( struct xfs_inode *ip, struct xfs_imeta_update *upd) { + struct xfs_mount *mp = (*tpp)->t_mountp; + int error; + ASSERT(xfs_imeta_path_check(path)); ASSERT(xfs_imeta_verify((*tpp)->t_mountp, ip->i_ino)); - return xfs_imeta_sb_unlink(tpp, path, ip); + if (xfs_has_metadir(mp)) + error = xfs_imeta_dir_unlink(tpp, path, ip, upd); + else + error = xfs_imeta_sb_unlink(tpp, path, ip); + if (error) + return error; + + /* + * Metadata files require explicit resource cleanup. In other words, + * the inactivation system will not touch these files, so we must free + * the ondisk inode by ourselves if warranted. 
+ */ + if (VFS_I(ip)->i_nlink > 0) + return 0; + + return xfs_imeta_ifree(*tpp, ip); } /* @@ -387,8 +984,12 @@ xfs_imeta_link( struct xfs_inode *ip, struct xfs_imeta_update *upd) { + struct xfs_mount *mp = tp->t_mountp; + ASSERT(xfs_imeta_path_check(path)); + if (xfs_has_metadir(mp)) + return xfs_imeta_dir_link(tp, path, ip, upd); return xfs_imeta_sb_link(tp, path, ip); } @@ -403,6 +1004,14 @@ xfs_imeta_end_update( int error) { trace_xfs_imeta_end_update(mp, error, __return_address); + + if (upd->dp) { + if (upd->lock_mode) + xfs_iunlock(upd->dp, upd->lock_mode); + xfs_imeta_irele(upd->dp); + } + upd->lock_mode = 0; + upd->dp = NULL; } /* Start setting up for a metadata directory tree operation. */ @@ -412,9 +1021,32 @@ xfs_imeta_start_update( const struct xfs_imeta_path *path, struct xfs_imeta_update *upd) { + int error; + trace_xfs_imeta_start_update(mp, 0, __return_address); memset(upd, 0, sizeof(struct xfs_imeta_update)); + + /* Metadir root directory does not have a parent. */ + if (!xfs_has_metadir(mp) || + xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) + return 0; + + ASSERT(path->im_depth > 0); + + /* + * Find the parent of the last path component. If the parent path does + * not exist, we consider this corruption because paths are supposed + * to exist. 
+ */ + error = xfs_imeta_dir_parent(mp, path, &upd->dp); + if (error == -ENOENT) + return -EFSCORRUPTED; + if (error) + return error; + + xfs_ilock(upd->dp, XFS_IOLOCK_EXCL | XFS_IOLOCK_PARENT); + upd->lock_mode = XFS_IOLOCK_EXCL; return 0; } @@ -441,6 +1073,9 @@ int xfs_imeta_mount( struct xfs_mount *mp) { + if (xfs_has_metadir(mp)) + return xfs_imeta_dir_mount(mp); + return 0; } @@ -449,6 +1084,9 @@ unsigned int xfs_imeta_create_space_res( struct xfs_mount *mp) { + if (xfs_has_metadir(mp)) + return max(XFS_MKDIR_SPACE_RES(mp, NAME_MAX), + XFS_CREATE_SPACE_RES(mp, NAME_MAX)); return XFS_IALLOC_SPACE_RES(mp); } @@ -459,3 +1097,54 @@ xfs_imeta_unlink_space_res( { return XFS_REMOVE_SPACE_RES(mp); } + +/* Clear the metadata iflag if we're unlinking this inode. */ +void +xfs_imeta_droplink( + struct xfs_inode *ip) +{ + if (VFS_I(ip)->i_nlink == 0 && + xfs_has_metadir(ip->i_mount) && + xfs_is_metadata_inode(ip)) + ip->i_diflags2 &= ~XFS_DIFLAG2_METADATA; +} + +/* + * Given a metadata directory update, look up the inode number of the last + * component in the path. + */ +int +xfs_imeta_lookup_update( + struct xfs_mount *mp, + const struct xfs_imeta_path *path, + struct xfs_imeta_update *upd, + xfs_ino_t *inop) +{ + struct xfs_name xname; + xfs_ino_t ino; + int error; + + ASSERT(xfs_isilocked(upd->dp, XFS_IOLOCK_SHARED | XFS_IOLOCK_EXCL)); + + /* metadir ino is recorded in superblock */ + if (!xfs_has_metadir(mp) || + xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) + return xfs_imeta_sb_lookup(mp, path, inop); + + ASSERT(path->im_depth > 0); + + /* Check that the name does not already exist in the directory. 
*/ + set_xname(&xname, path, path->im_depth - 1, XFS_DIR3_FT_UNKNOWN); + error = xfs_imeta_dir_lookup_component(upd->dp, &xname, &ino); + switch (error) { + case 0: + *inop = ino; + break; + case -ENOENT: + *inop = NULLFSINO; + error = 0; + break; + } + + return error; +} diff --git a/libxfs/xfs_imeta.h b/libxfs/xfs_imeta.h index 631a88120a7..9b139f6809f 100644 --- a/libxfs/xfs_imeta.h +++ b/libxfs/xfs_imeta.h @@ -13,6 +13,7 @@ #define XFS_IMETA_DEFINE_PATH(name, path) \ const struct xfs_imeta_path name = { \ .im_path = (path), \ + .im_ftype = XFS_DIR3_FT_REG_FILE, \ .im_depth = ARRAY_SIZE(path), \ } @@ -23,11 +24,18 @@ struct xfs_imeta_path { /* Number of strings in path. */ unsigned int im_depth; + + /* Expected file type. */ + unsigned int im_ftype; }; /* Cleanup widget for metadata inode creation and deletion. */ struct xfs_imeta_update { - /* empty for now */ + /* Parent directory */ + struct xfs_inode *dp; + + /* Parent directory lock mode */ + unsigned int lock_mode; }; /* Lookup keys for static metadata inodes. */ @@ -36,9 +44,15 @@ extern const struct xfs_imeta_path XFS_IMETA_RTSUMMARY; extern const struct xfs_imeta_path XFS_IMETA_USRQUOTA; extern const struct xfs_imeta_path XFS_IMETA_GRPQUOTA; extern const struct xfs_imeta_path XFS_IMETA_PRJQUOTA; +extern const struct xfs_imeta_path XFS_IMETA_METADIR; int xfs_imeta_lookup(struct xfs_mount *mp, const struct xfs_imeta_path *path, xfs_ino_t *ino); +int xfs_imeta_lookup_update(struct xfs_mount *mp, + const struct xfs_imeta_path *path, + struct xfs_imeta_update *upd, xfs_ino_t *inop); + +void xfs_imeta_set_metaflag(struct xfs_trans *tp, struct xfs_inode *ip); /* Don't allocate quota for this file. 
*/ #define XFS_IMETA_CREATE_NOQUOTA (1 << 0) @@ -57,6 +71,7 @@ int xfs_imeta_start_update(struct xfs_mount *mp, bool xfs_is_static_meta_ino(struct xfs_mount *mp, xfs_ino_t ino); int xfs_imeta_mount(struct xfs_mount *mp); +void xfs_imeta_droplink(struct xfs_inode *ip); unsigned int xfs_imeta_create_space_res(struct xfs_mount *mp); unsigned int xfs_imeta_unlink_space_res(struct xfs_mount *mp); diff --git a/libxfs/xfs_inode_util.c b/libxfs/xfs_inode_util.c index cc203321dad..9f92af3274b 100644 --- a/libxfs/xfs_inode_util.c +++ b/libxfs/xfs_inode_util.c @@ -19,6 +19,7 @@ #include "xfs_bmap.h" #include "xfs_trace.h" #include "xfs_ag.h" +#include "xfs_imeta.h" #include "iunlink.h" uint16_t @@ -624,6 +625,7 @@ xfs_droplink( xfs_trans_ichgtime(tp, ip, XFS_ICHGTIME_CHG); drop_nlink(VFS_I(ip)); + xfs_imeta_droplink(ip); xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); if (VFS_I(ip)->i_nlink) diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c index 67008bb4b72..2835d7754a8 100644 --- a/libxfs/xfs_trans_resv.c +++ b/libxfs/xfs_trans_resv.c @@ -920,7 +920,10 @@ xfs_calc_imeta_create_resv( unsigned int ret; ret = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize); - ret += resp->tr_create.tr_logres; + if (xfs_has_metadir(mp)) + ret += max(resp->tr_create.tr_logres, resp->tr_mkdir.tr_logres); + else + ret += resp->tr_create.tr_logres; return ret; } @@ -930,6 +933,9 @@ xfs_calc_imeta_create_count( struct xfs_mount *mp, struct xfs_trans_resv *resp) { + if (xfs_has_metadir(mp)) + return max(resp->tr_create.tr_logcount, + resp->tr_mkdir.tr_logcount); return resp->tr_create.tr_logcount; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
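The path-walking rules in the patch above (a missing intermediate component yields -ENOENT, while a missing final component yields NULLFSINO with a zero return) can be sketched with a small in-memory model. This is only an illustration under stated assumptions: every `toy_*` name, the fixed directory table, and the `TOY_*` constants are hypothetical stand-ins and are not part of libxfs, and no locking or inode references are modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for NULLFSINO and EFSCORRUPTED; not libxfs values. */
#define TOY_NULLINO		((uint64_t)-1)
#define TOY_EFSCORRUPTED	117
#define TOY_ROOT_INO		((uint64_t)1)

/* One directory entry in the toy tree. */
struct toy_dirent {
	uint64_t	parent;	/* inode number of the containing directory */
	const char	*name;
	uint64_t	ino;
	int		is_dir;
};

/* A path is an array of component names, like im_path/im_depth. */
struct toy_path {
	const char	**components;
	unsigned int	depth;
};

/* Toy tree: /realtime (ino 2, dir) and /realtime/bitmap (ino 3, file). */
static const struct toy_dirent toy_tree[] = {
	{ 1, "realtime", 2, 1 },
	{ 2, "bitmap",   3, 0 },
};

/* Look up one name in one directory; -ENOENT if the name is absent. */
static int
toy_lookup_component(uint64_t dir, const char *name, int want_dir,
		uint64_t *ino)
{
	size_t	i;

	for (i = 0; i < sizeof(toy_tree) / sizeof(toy_tree[0]); i++) {
		const struct toy_dirent	*de = &toy_tree[i];

		if (de->parent != dir || strcmp(de->name, name) != 0)
			continue;
		/* Intermediate components must be directories. */
		if (want_dir && !de->is_dir)
			return -TOY_EFSCORRUPTED;
		*ino = de->ino;
		return 0;
	}
	return -ENOENT;
}

/*
 * Walk the path.  A missing parent component returns -ENOENT; a missing
 * last component returns 0 with *inop set to TOY_NULLINO.
 */
int
toy_path_lookup(const struct toy_path *path, uint64_t *inop)
{
	uint64_t	dir = TOY_ROOT_INO;
	uint64_t	ino;
	unsigned int	i;
	int		error;

	/* A zero-depth path names the root itself. */
	if (path->depth == 0) {
		*inop = dir;
		return 0;
	}

	/* Find the parent of the last component. */
	for (i = 0; i + 1 < path->depth; i++) {
		error = toy_lookup_component(dir, path->components[i], 1, &ino);
		if (error)
			return error;
		dir = ino;
	}

	error = toy_lookup_component(dir, path->components[path->depth - 1],
			0, &ino);
	if (error == -ENOENT) {
		*inop = TOY_NULLINO;
		return 0;
	}
	if (error)
		return error;
	*inop = ino;
	return 0;
}

/* Small usage example for the sketch. */
void
toy_path_lookup_demo(void)
{
	const char	*bitmap[] = { "realtime", "bitmap" };
	struct toy_path	p = { bitmap, 2 };
	uint64_t	ino = 0;

	assert(toy_path_lookup(&p, &ino) == 0);
	assert(ino == 3);
}
```

Keeping the two "not found" outcomes separate is what lets a mount-time caller treat an absent final entry as "not created yet" while an absent parent directory can be reported as corruption.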
* [PATCH 08/46] xfs: update imeta transaction reservations for metadir 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 12/46] xfs: read and write metadata inode directory Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 06/46] libxfs: iget for metadata inodes Darrick J. Wong ` (38 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Update the new metadata inode transaction reservations to handle metadata directories if that feature is enabled. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_log_rlimit.c | 9 ++++++ libxfs/xfs_trans_resv.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 75 insertions(+), 2 deletions(-) diff --git a/libxfs/xfs_log_rlimit.c b/libxfs/xfs_log_rlimit.c index cba24493f86..84ebdbddd0a 100644 --- a/libxfs/xfs_log_rlimit.c +++ b/libxfs/xfs_log_rlimit.c @@ -48,6 +48,15 @@ xfs_log_calc_trans_resv_for_minlogblocks( { unsigned int rmap_maxlevels = mp->m_rmap_maxlevels; + /* + * The metadata directory tree feature drops the oversized minimum log + * size computations introduced by the original reflink code. + */ + if (xfs_has_metadir(mp)) { + xfs_trans_resv_calc(mp, resv); + return; + } + /* * In the early days of rmap+reflink, we always set the rmap maxlevels * to 9 even if the AG was small enough that it would never grow to diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c index 00bdcb1d550..67008bb4b72 100644 --- a/libxfs/xfs_trans_resv.c +++ b/libxfs/xfs_trans_resv.c @@ -908,6 +908,56 @@ xfs_calc_sb_reservation( return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize); } +/* + * Metadata inode creation needs enough space to create or mkdir a directory, + * plus logging the superblock. 
+ */ +static unsigned int +xfs_calc_imeta_create_resv( + struct xfs_mount *mp, + struct xfs_trans_resv *resp) +{ + unsigned int ret; + + ret = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize); + ret += resp->tr_create.tr_logres; + return ret; +} + +/* Metadata inode creation needs enough rounds to create or mkdir a directory */ +static int +xfs_calc_imeta_create_count( + struct xfs_mount *mp, + struct xfs_trans_resv *resp) +{ + return resp->tr_create.tr_logcount; +} + +/* + * Metadata inode unlink needs enough space to remove a file plus logging the + * superblock. + */ +static unsigned int +xfs_calc_imeta_unlink_resv( + struct xfs_mount *mp, + struct xfs_trans_resv *resp) +{ + unsigned int ret; + + ret = xfs_calc_buf_res(1, mp->m_sb.sb_sectsize); + ret += resp->tr_remove.tr_logres; + return ret; +} + +/* Metadata inode unlink needs enough rounds to remove a file. */ +static int +xfs_calc_imeta_unlink_count( + struct xfs_mount *mp, + struct xfs_trans_resv *resp) +{ + return resp->tr_remove.tr_logcount; +} + void xfs_trans_resv_calc( struct xfs_mount *mp, @@ -1026,6 +1076,20 @@ xfs_trans_resv_calc( resp->tr_qm_dqalloc.tr_logcount += logcount_adj; /* metadata inode creation and unlink */ - resp->tr_imeta_create = resp->tr_create; - resp->tr_imeta_unlink = resp->tr_remove; + if (xfs_has_metadir(mp)) { + resp->tr_imeta_create.tr_logres = + xfs_calc_imeta_create_resv(mp, resp); + resp->tr_imeta_create.tr_logcount = + xfs_calc_imeta_create_count(mp, resp); + resp->tr_imeta_create.tr_logflags |= XFS_TRANS_PERM_LOG_RES; + + resp->tr_imeta_unlink.tr_logres = + xfs_calc_imeta_unlink_resv(mp, resp); + resp->tr_imeta_unlink.tr_logcount = + xfs_calc_imeta_unlink_count(mp, resp); + resp->tr_imeta_unlink.tr_logflags |= XFS_TRANS_PERM_LOG_RES; + } else { + resp->tr_imeta_create = resp->tr_create; + resp->tr_imeta_unlink = resp->tr_remove; + } } ^ permalink raw reply related [flat|nested] 565+ messages in thread
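The reservation arithmetic in this patch can be sketched as plain integer math. The sketch below is illustrative only: the `toy_*` names are hypothetical, `TOY_BUF_LOG_OVERHEAD` is an invented per-buffer constant standing in for whatever overhead `xfs_calc_buf_res` really adds, and the `has_metadir` branch mirrors the max-of-create-and-mkdir variant that appears in the other imeta patch in this thread rather than in this one.

```c
#include <assert.h>

/* Toy version of one transaction reservation: bytes and log rounds. */
struct toy_res {
	unsigned int	logres;
	int		logcount;
};

/* Toy subset of the reservation table. */
struct toy_resv {
	struct toy_res	tr_create;
	struct toy_res	tr_mkdir;
	struct toy_res	tr_remove;
};

/* Invented per-buffer logging overhead; an assumption, not the XFS value. */
#define TOY_BUF_LOG_OVERHEAD	128u

/* Space needed to log nbufs buffers of the given size. */
static unsigned int
toy_calc_buf_res(unsigned int nbufs, unsigned int size)
{
	return nbufs * (size + TOY_BUF_LOG_OVERHEAD);
}

/* Creation: one superblock sector plus the base create reservation. */
unsigned int
toy_imeta_create_resv(const struct toy_resv *resp, unsigned int sectsize,
		int has_metadir)
{
	unsigned int	ret = toy_calc_buf_res(1, sectsize);
	unsigned int	base = resp->tr_create.logres;

	/* With metadir, the new inode may be a file or a directory. */
	if (has_metadir && resp->tr_mkdir.logres > base)
		base = resp->tr_mkdir.logres;
	return ret + base;
}

/* Creation rounds follow the same choice between create and mkdir. */
int
toy_imeta_create_count(const struct toy_resv *resp, int has_metadir)
{
	int	count = resp->tr_create.logcount;

	if (has_metadir && resp->tr_mkdir.logcount > count)
		count = resp->tr_mkdir.logcount;
	return count;
}

/* Unlink: one superblock sector plus the base remove reservation. */
unsigned int
toy_imeta_unlink_resv(const struct toy_resv *resp, unsigned int sectsize)
{
	return toy_calc_buf_res(1, sectsize) + resp->tr_remove.logres;
}

/* Small usage example with made-up numbers. */
void
toy_resv_demo(void)
{
	struct toy_resv	r = { { 1000, 2 }, { 1500, 3 }, { 800, 2 } };

	/* 512 + 128 bytes for the superblock, plus 1000 for create. */
	assert(toy_imeta_create_resv(&r, 512, 0) == 1640);
}
```

The numbers in the usage function are arbitrary; only the shape of the sum (superblock buffer plus base reservation) is taken from the patch.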
* [PATCH 06/46] libxfs: iget for metadata inodes 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 08/46] xfs: update imeta transaction reservations for metadir Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 11/46] xfs: enforce metadata inode flag Darrick J. Wong ` (37 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a libxfs_imeta_iget function for metadata inodes to ensure that we always check that the inobt thinks a metadata inode is in use. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/init.c | 4 ++-- libxfs/inode.c | 32 ++++++++++++++++++++++++++++++++ libxfs/libxfs_api_defs.h | 2 ++ libxfs/xfs_imeta.h | 5 +++++ 4 files changed, 41 insertions(+), 2 deletions(-) diff --git a/libxfs/init.c b/libxfs/init.c index e19b4e6d4cf..d114ac87f19 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -983,9 +983,9 @@ void libxfs_rtmount_destroy(xfs_mount_t *mp) { if (mp->m_rsumip) - libxfs_irele(mp->m_rsumip); + libxfs_imeta_irele(mp->m_rsumip); if (mp->m_rbmip) - libxfs_irele(mp->m_rbmip); + libxfs_imeta_irele(mp->m_rbmip); mp->m_rsumip = mp->m_rbmip = NULL; } diff --git a/libxfs/inode.c b/libxfs/inode.c index 1a27016a763..95a1ba50cdf 100644 --- a/libxfs/inode.c +++ b/libxfs/inode.c @@ -216,6 +216,31 @@ libxfs_iget( return error; } +/* Get a metadata inode. The ftype must match exactly. 
*/ +int +libxfs_imeta_iget( + struct xfs_mount *mp, + xfs_ino_t ino, + unsigned char ftype, + struct xfs_inode **ipp) +{ + struct xfs_inode *ip; + int error; + + error = libxfs_iget(mp, NULL, ino, 0, &ip); + if (error) + return error; + + if (ftype == XFS_DIR3_FT_UNKNOWN || + xfs_mode_to_ftype(VFS_I(ip)->i_mode) != ftype) { + libxfs_irele(ip); + return -EFSCORRUPTED; + } + + *ipp = ip; + return 0; +} + static void libxfs_idestroy( struct xfs_inode *ip) @@ -249,6 +274,13 @@ libxfs_irele( } } +void +libxfs_imeta_irele( + struct xfs_inode *ip) +{ + libxfs_irele(ip); +} + void xfs_inode_sgid_inherit( const struct xfs_icreate_args *args, diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 69f1cf2c752..c27949e5f48 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -159,6 +159,8 @@ #define xfs_imeta_create libxfs_imeta_create #define xfs_imeta_create_space_res libxfs_imeta_create_space_res #define xfs_imeta_end_update libxfs_imeta_end_update +#define xfs_imeta_iget libxfs_imeta_iget +#define xfs_imeta_irele libxfs_imeta_irele #define xfs_imeta_link libxfs_imeta_link #define xfs_imeta_lookup libxfs_imeta_lookup #define xfs_imeta_mount libxfs_imeta_mount diff --git a/libxfs/xfs_imeta.h b/libxfs/xfs_imeta.h index 9d54cb0d796..312e3a6fdb9 100644 --- a/libxfs/xfs_imeta.h +++ b/libxfs/xfs_imeta.h @@ -48,4 +48,9 @@ int xfs_imeta_mount(struct xfs_mount *mp); unsigned int xfs_imeta_create_space_res(struct xfs_mount *mp); unsigned int xfs_imeta_unlink_space_res(struct xfs_mount *mp); +/* Must be implemented by the libxfs client */ +int xfs_imeta_iget(struct xfs_mount *mp, xfs_ino_t ino, unsigned char ftype, + struct xfs_inode **ipp); +void xfs_imeta_irele(struct xfs_inode *ip); + #endif /* __XFS_IMETA_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
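The "ftype must match exactly" rule in `libxfs_imeta_iget` can be sketched in isolation. This is a simplified model, assuming toy mode bits and file type codes defined locally; the `toy_*` and `TOY_*` names are hypothetical and the check stands apart from any real inode cache. Note that a caller asking for the "unknown" type is rejected outright, as in the patch.

```c
#include <assert.h>

/* Toy file-format bits, using the traditional octal layout. */
#define TOY_S_IFMT	0170000u
#define TOY_S_IFDIR	0040000u
#define TOY_S_IFREG	0100000u
#define TOY_S_IFLNK	0120000u

/* Hypothetical stand-in for EFSCORRUPTED. */
#define TOY_EFSCORRUPTED	117

/* Toy directory entry file types. */
enum toy_ftype {
	TOY_FT_UNKNOWN	= 0,
	TOY_FT_REG_FILE	= 1,
	TOY_FT_DIR	= 2,
};

/* Map mode bits to a file type; anything else is "unknown" here. */
static enum toy_ftype
toy_mode_to_ftype(unsigned int mode)
{
	switch (mode & TOY_S_IFMT) {
	case TOY_S_IFREG:
		return TOY_FT_REG_FILE;
	case TOY_S_IFDIR:
		return TOY_FT_DIR;
	default:
		return TOY_FT_UNKNOWN;
	}
}

/*
 * Return 0 if an inode with the given mode may be handed back for the
 * wanted file type, or a negative error if the types disagree or the
 * caller did not say what it expected.
 */
int
toy_imeta_ftype_check(unsigned int mode, enum toy_ftype wanted)
{
	if (wanted == TOY_FT_UNKNOWN ||
	    toy_mode_to_ftype(mode) != wanted)
		return -TOY_EFSCORRUPTED;
	return 0;
}

/* Small usage example for the sketch. */
void
toy_imeta_ftype_demo(void)
{
	assert(toy_imeta_ftype_check(TOY_S_IFDIR, TOY_FT_DIR) == 0);
	assert(toy_imeta_ftype_check(TOY_S_IFREG, TOY_FT_DIR) != 0);
}
```

Rejecting the "unknown" request forces every caller to state which kind of metadata file it expects, so a type mismatch is caught when the inode is obtained rather than later.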
* [PATCH 11/46] xfs: enforce metadata inode flag 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 06/46] libxfs: iget for metadata inodes Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 09/46] xfs: load metadata directory root at mount time Darrick J. Wong ` (36 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add checks for the metadata inode flag so that we don't ever leak metadata inodes out to userspace, and we don't ever try to read a regular inode as metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/inode.c | 7 ++++- libxfs/xfs_inode_buf.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_inode_buf.h | 3 ++ 3 files changed, 82 insertions(+), 1 deletion(-) diff --git a/libxfs/inode.c b/libxfs/inode.c index 95a1ba50cdf..db42529e07e 100644 --- a/libxfs/inode.c +++ b/libxfs/inode.c @@ -231,7 +231,9 @@ libxfs_imeta_iget( if (error) return error; - if (ftype == XFS_DIR3_FT_UNKNOWN || + if ((xfs_has_metadir(mp) && + !xfs_is_metadata_inode(ip)) || + ftype == XFS_DIR3_FT_UNKNOWN || xfs_mode_to_ftype(VFS_I(ip)->i_mode) != ftype) { libxfs_irele(ip); return -EFSCORRUPTED; @@ -278,6 +280,9 @@ void libxfs_imeta_irele( struct xfs_inode *ip) { + ASSERT(!xfs_has_metadir(ip->i_mount) || + xfs_is_metadata_inode(ip)); + libxfs_irele(ip); } diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c index 82eb3f91b9d..b5d4e5dd7ca 100644 --- a/libxfs/xfs_inode_buf.c +++ b/libxfs/xfs_inode_buf.c @@ -454,6 +454,73 @@ xfs_dinode_verify_nrext64( return NULL; } +/* + * Validate all the picky requirements we have for a file that claims to be + * filesystem metadata. 
+ */ +xfs_failaddr_t +xfs_dinode_verify_metaflag( + struct xfs_mount *mp, + struct xfs_dinode *dip, + uint16_t mode, + uint16_t flags, + uint64_t flags2) +{ + if (!xfs_has_metadir(mp)) + return __this_address; + + /* V5 filesystem only */ + if (dip->di_version < 3) + return __this_address; + + /* V3 inode fields that are always zero */ + if (dip->di_onlink) + return __this_address; + if ((flags2 & XFS_DIFLAG2_NREXT64) && dip->di_nrext64_pad) + return __this_address; + if (!(flags2 & XFS_DIFLAG2_NREXT64) && dip->di_flushiter) + return __this_address; + + /* Metadata files can only be directories or regular files */ + if (!S_ISDIR(mode) && !S_ISREG(mode)) + return __this_address; + + /* They must have zero access permissions */ + if (mode & 0777) + return __this_address; + + /* DMAPI event and state masks are zero */ + if (dip->di_dmevmask || dip->di_dmstate) + return __this_address; + + /* User, group, and project IDs must be zero */ + if (dip->di_uid || dip->di_gid || + dip->di_projid_lo || dip->di_projid_hi) + return __this_address; + + /* Immutable, sync, noatime, nodump, and nodefrag flags must be set */ + if (!(flags & XFS_DIFLAG_IMMUTABLE)) + return __this_address; + if (!(flags & XFS_DIFLAG_SYNC)) + return __this_address; + if (!(flags & XFS_DIFLAG_NOATIME)) + return __this_address; + if (!(flags & XFS_DIFLAG_NODUMP)) + return __this_address; + if (!(flags & XFS_DIFLAG_NODEFRAG)) + return __this_address; + + /* Directories must have nosymlinks flags set */ + if (S_ISDIR(mode) && !(flags & XFS_DIFLAG_NOSYMLINKS)) + return __this_address; + + /* dax flags2 must not be set */ + if (flags2 & XFS_DIFLAG2_DAX) + return __this_address; + + return NULL; +} + xfs_failaddr_t xfs_dinode_verify( struct xfs_mount *mp, @@ -607,6 +674,12 @@ xfs_dinode_verify( !xfs_has_bigtime(mp)) return __this_address; + if (flags2 & XFS_DIFLAG2_METADATA) { + fa = xfs_dinode_verify_metaflag(mp, dip, mode, flags, flags2); + if (fa) + return fa; + } + return NULL; } diff --git 
a/libxfs/xfs_inode_buf.h b/libxfs/xfs_inode_buf.h index 585ed5a110a..94d6e7c018e 100644 --- a/libxfs/xfs_inode_buf.h +++ b/libxfs/xfs_inode_buf.h @@ -28,6 +28,9 @@ int xfs_inode_from_disk(struct xfs_inode *ip, struct xfs_dinode *from); xfs_failaddr_t xfs_dinode_verify(struct xfs_mount *mp, xfs_ino_t ino, struct xfs_dinode *dip); +xfs_failaddr_t xfs_dinode_verify_metaflag(struct xfs_mount *mp, + struct xfs_dinode *dip, uint16_t mode, uint16_t flags, + uint64_t flags2); xfs_failaddr_t xfs_inode_validate_extsize(struct xfs_mount *mp, uint32_t extsize, uint16_t mode, uint16_t flags); xfs_failaddr_t xfs_inode_validate_cowextsize(struct xfs_mount *mp, ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 09/46] xfs: load metadata directory root at mount time 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 11/46] xfs: enforce metadata inode flag Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 07/46] xfs: define the on-disk format for the metadir feature Darrick J. Wong ` (35 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Load the metadata directory root inode into memory at mount time and release it at unmount time. We also make sure that the obsolete inode pointers in the superblock are not logged or read from the superblock. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/xfs_mount.h | 1 + libxfs/init.c | 20 ++++++++++++++++++-- libxfs/xfs_sb.c | 31 +++++++++++++++++++++++++++++++ libxfs/xfs_types.c | 2 +- 4 files changed, 51 insertions(+), 3 deletions(-) diff --git a/include/xfs_mount.h b/include/xfs_mount.h index 5a8f45e7796..4347098dc7e 100644 --- a/include/xfs_mount.h +++ b/include/xfs_mount.h @@ -55,6 +55,7 @@ typedef struct xfs_mount { uint8_t *m_rsum_cache; struct xfs_inode *m_rbmip; /* pointer to bitmap inode */ struct xfs_inode *m_rsumip; /* pointer to summary inode */ + struct xfs_inode *m_metadirip; /* ptr to metadata directory */ struct xfs_buftarg *m_ddev_targp; struct xfs_buftarg *m_logdev_targp; struct xfs_buftarg *m_rtdev_targp; diff --git a/libxfs/init.c b/libxfs/init.c index d114ac87f19..787f7c108db 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -802,11 +802,25 @@ libxfs_mountfs_imeta( if (mp->m_sb.sb_inprogress) return; + if (xfs_has_metadir(mp)) { + error = -libxfs_imeta_iget(mp, mp->m_sb.sb_metadirino, + XFS_DIR3_FT_DIR, &mp->m_metadirip); + if (error) + fprintf(stderr, +_("%s: could not open metadata directory, error %d\n"), + 
progname, error); + } + error = -xfs_imeta_mount(mp); - if (error) + if (error) { + if (mp->m_metadirip) + libxfs_imeta_irele(mp->m_metadirip); + mp->m_metadirip = NULL; + fprintf(stderr, -_("%s: metadata inode mounting failed, error %d\n"), +_("%s: mounting metadata directory failed, error %d\n"), progname, error); + } } /* @@ -1091,6 +1105,8 @@ libxfs_umount( int error; libxfs_rtmount_destroy(mp); + if (mp->m_metadirip) + libxfs_imeta_irele(mp->m_metadirip); /* * Purge the buffer cache to write all dirty buffers to disk and free diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c index c421099e4f9..6452856d45b 100644 --- a/libxfs/xfs_sb.c +++ b/libxfs/xfs_sb.c @@ -620,6 +620,25 @@ __xfs_sb_from_disk( /* Convert on-disk flags to in-memory flags? */ if (convert_xquota) xfs_sb_quota_from_disk(to); + + if (to->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) { + /* + * Set metadirino here and null out the in-core fields for + * the other inodes because metadir initialization will load + * them later. + */ + to->sb_metadirino = be64_to_cpu(from->sb_rbmino); + to->sb_rbmino = NULLFSINO; + to->sb_rsumino = NULLFSINO; + + /* + * We don't have to worry about quota inode conversion here + * because metadir requires a v5 filesystem. + */ + to->sb_uquotino = NULLFSINO; + to->sb_gquotino = NULLFSINO; + to->sb_pquotino = NULLFSINO; + } } void @@ -767,6 +786,18 @@ xfs_sb_to_disk( to->sb_lsn = cpu_to_be64(from->sb_lsn); if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_META_UUID) uuid_copy(&to->sb_meta_uuid, &from->sb_meta_uuid); + + if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) { + /* + * Save metadirino here and null out the on-disk fields for + * the other inodes, at least until we reuse the fields. 
+ */ + to->sb_rbmino = cpu_to_be64(from->sb_metadirino); + to->sb_rsumino = cpu_to_be64(NULLFSINO); + to->sb_uquotino = cpu_to_be64(NULLFSINO); + to->sb_gquotino = cpu_to_be64(NULLFSINO); + to->sb_pquotino = cpu_to_be64(NULLFSINO); + } } /* diff --git a/libxfs/xfs_types.c b/libxfs/xfs_types.c index 93eefd7b35f..d20d9a5c915 100644 --- a/libxfs/xfs_types.c +++ b/libxfs/xfs_types.c @@ -128,7 +128,7 @@ xfs_verify_dir_ino( struct xfs_mount *mp, xfs_ino_t ino) { - if (xfs_internal_inum(mp, ino)) + if (!xfs_has_metadir(mp) && xfs_internal_inum(mp, ino)) return false; return xfs_verify_ino(mp, ino); } ^ permalink raw reply related [flat|nested] 565+ messages in thread
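The superblock conversion above reuses the on-disk sb_rbmino slot for the metadata directory root. A minimal round-trip sketch of that idea follows, assuming a two-field superblock; the demo_* structures and helpers are hypothetical. Unlike the patch, which reads the feature bit from sb_features_incompat, this model has the caller set has_metadir.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NULLINO	((uint64_t)-1)	/* stand-in for NULLFSINO */

/* On-disk form: big-endian fields, modelled as byte arrays. */
struct demo_dsb {
	uint8_t		rbmino[8];
	uint8_t		rsumino[8];
};

/* In-core form; metadirino has no on-disk slot of its own. */
struct demo_sb {
	bool		has_metadir;	/* assumption: set by the caller */
	uint64_t	rbmino;
	uint64_t	rsumino;
	uint64_t	metadirino;
};

static uint64_t demo_get_be64(const uint8_t *p)
{
	uint64_t	v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

static void demo_put_be64(uint8_t *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Read: with metadir, the rbmino slot holds the metadir root. */
static void demo_sb_from_disk(struct demo_sb *to, const struct demo_dsb *from)
{
	to->rbmino = demo_get_be64(from->rbmino);
	to->rsumino = demo_get_be64(from->rsumino);
	to->metadirino = DEMO_NULLINO;
	if (to->has_metadir) {
		to->metadirino = to->rbmino;
		to->rbmino = DEMO_NULLINO;
		to->rsumino = DEMO_NULLINO;
	}
}

/* Write: with metadir, store the root in the rbmino slot, null the rest. */
static void demo_sb_to_disk(struct demo_dsb *to, const struct demo_sb *from)
{
	if (from->has_metadir) {
		demo_put_be64(to->rbmino, from->metadirino);
		demo_put_be64(to->rsumino, DEMO_NULLINO);
		return;
	}
	demo_put_be64(to->rbmino, from->rbmino);
	demo_put_be64(to->rsumino, from->rsumino);
}

/* Write a metadir superblock out, read it back, return the recovered root. */
static uint64_t demo_roundtrip_metadirino(uint64_t ino)
{
	struct demo_sb	in = { .has_metadir = true, .rbmino = DEMO_NULLINO,
			       .rsumino = DEMO_NULLINO, .metadirino = ino };
	struct demo_sb	out = { .has_metadir = true };
	struct demo_dsb	disk;

	demo_sb_to_disk(&disk, &in);
	demo_sb_from_disk(&out, &disk);
	return out.metadirino;
}
```

The in-core rbmino and rsumino fields come back as null values and are expected to be filled in later by metadir initialization, as the comment in the patch describes.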
* [PATCH 07/46] xfs: define the on-disk format for the metadir feature 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (9 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 09/46] xfs: load metadata directory root at mount time Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 10/46] xfs: convert metadata inode lookup keys to use paths Darrick J. Wong ` (34 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Define the on-disk layout and feature flags for the metadata inode directory feature. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/xfs_inode.h | 5 +++++ include/xfs_mount.h | 2 ++ libxfs/xfs_format.h | 48 +++++++++++++++++++++++++++++++++++++++++++++-- libxfs/xfs_inode_util.c | 2 ++ libxfs/xfs_sb.c | 2 ++ 5 files changed, 57 insertions(+), 2 deletions(-) diff --git a/include/xfs_inode.h b/include/xfs_inode.h index ccd19e5ee5b..b099c036ef2 100644 --- a/include/xfs_inode.h +++ b/include/xfs_inode.h @@ -287,6 +287,11 @@ static inline bool xfs_is_always_cow_inode(struct xfs_inode *ip) return false; } +static inline bool xfs_is_metadata_inode(struct xfs_inode *ip) +{ + return ip->i_diflags2 & XFS_DIFLAG2_METADATA; +} + extern void libxfs_trans_inode_alloc_buf (struct xfs_trans *, struct xfs_buf *); diff --git a/include/xfs_mount.h b/include/xfs_mount.h index 1690660ed5b..5a8f45e7796 100644 --- a/include/xfs_mount.h +++ b/include/xfs_mount.h @@ -161,6 +161,7 @@ typedef struct xfs_mount { #define XFS_FEAT_BIGTIME (1ULL << 24) /* large timestamps */ #define XFS_FEAT_NEEDSREPAIR (1ULL << 25) /* needs xfs_repair */ #define XFS_FEAT_NREXT64 (1ULL << 26) /* large extent counters */ +#define XFS_FEAT_METADIR (1ULL << 27) /* metadata directory tree */ #define __XFS_HAS_FEAT(name, NAME) \ static inline bool xfs_has_ ## name (struct 
xfs_mount *mp) \ @@ -205,6 +206,7 @@ __XFS_HAS_FEAT(inobtcounts, INOBTCNT) __XFS_HAS_FEAT(bigtime, BIGTIME) __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR) __XFS_HAS_FEAT(large_extent_counts, NREXT64) +__XFS_HAS_FEAT(metadir, METADIR) /* Kernel mount features that we don't support */ #define __XFS_UNSUPP_FEAT(name) \ diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index abd75b3091e..0bd915bd4ee 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -174,6 +174,16 @@ typedef struct xfs_sb { xfs_lsn_t sb_lsn; /* last write sequence */ uuid_t sb_meta_uuid; /* metadata file system unique id */ + /* Fields beyond here do not match xfs_dsb. Be very careful! */ + + /* + * Metadata Directory Inode. On disk this lives in the sb_rbmino slot, + * but we continue to use the in-core superblock to cache the classic + * inodes (rt bitmap; rt summary; user, group, and project quotas) so + * we cache the metadir inode value here too. + */ + xfs_ino_t sb_metadirino; + /* must be padded to 64 bit alignment */ } xfs_sb_t; @@ -190,7 +200,14 @@ struct xfs_dsb { uuid_t sb_uuid; /* user-visible file system unique id */ __be64 sb_logstart; /* starting block of log if internal */ __be64 sb_rootino; /* root inode number */ - __be64 sb_rbmino; /* bitmap inode for realtime extents */ + /* + * bitmap inode for realtime extents. + * + * The metadata directory feature uses the sb_rbmino field to point to + * the root of the metadata directory tree. All other sb inode + * pointers are no longer used. 
+ */ + __be64 sb_rbmino; __be64 sb_rsumino; /* summary inode for rt bitmap */ __be32 sb_rextsize; /* realtime extent size, blocks */ __be32 sb_agblocks; /* size of an allocation group */ @@ -372,6 +389,7 @@ xfs_sb_has_ro_compat_feature( #define XFS_SB_FEAT_INCOMPAT_BIGTIME (1 << 3) /* large timestamps */ #define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4) /* needs xfs_repair */ #define XFS_SB_FEAT_INCOMPAT_NREXT64 (1 << 5) /* large extent counters */ +#define XFS_SB_FEAT_INCOMPAT_METADIR (1U << 31) /* metadata dir tree */ #define XFS_SB_FEAT_INCOMPAT_ALL \ (XFS_SB_FEAT_INCOMPAT_FTYPE| \ XFS_SB_FEAT_INCOMPAT_SPINODES| \ @@ -1078,6 +1096,7 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev) #define XFS_DIFLAG2_COWEXTSIZE_BIT 2 /* copy on write extent size hint */ #define XFS_DIFLAG2_BIGTIME_BIT 3 /* big timestamps */ #define XFS_DIFLAG2_NREXT64_BIT 4 /* large extent counters */ +#define XFS_DIFLAG2_METADATA_BIT 63 /* filesystem metadata */ #define XFS_DIFLAG2_DAX (1 << XFS_DIFLAG2_DAX_BIT) #define XFS_DIFLAG2_REFLINK (1 << XFS_DIFLAG2_REFLINK_BIT) @@ -1085,9 +1104,34 @@ static inline void xfs_dinode_put_rdev(struct xfs_dinode *dip, xfs_dev_t rdev) #define XFS_DIFLAG2_BIGTIME (1 << XFS_DIFLAG2_BIGTIME_BIT) #define XFS_DIFLAG2_NREXT64 (1 << XFS_DIFLAG2_NREXT64_BIT) +/* + * The inode contains filesystem metadata and can be found through the metadata + * directory tree. Metadata inodes must satisfy the following constraints: + * + * - V5 filesystem (and ftype) are enabled; + * - The only valid modes are regular files and directories; + * - The access bits must be zero; + * - DMAPI event and state masks are zero; + * - The user, group, and project IDs must be zero; + * - The immutable, sync, noatime, nodump, nodefrag flags must be set. + * - The dax flag must not be set. + * - Directories must have nosymlinks set. 
+ * + * These requirements are chosen defensively to minimize the ability of + * userspace to read or modify the contents, should a metadata file ever + * escape to userspace. + * + * There are further constraints on the directory tree itself: + * + * - Metadata inodes must never be resolvable through the root directory; + * - They must never be accessed by userspace; + * - Metadata directory entries must have correct ftype. + */ +#define XFS_DIFLAG2_METADATA (1ULL << XFS_DIFLAG2_METADATA_BIT) + #define XFS_DIFLAG2_ANY \ (XFS_DIFLAG2_DAX | XFS_DIFLAG2_REFLINK | XFS_DIFLAG2_COWEXTSIZE | \ - XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64) + XFS_DIFLAG2_BIGTIME | XFS_DIFLAG2_NREXT64 | XFS_DIFLAG2_METADATA) static inline bool xfs_dinode_has_bigtime(const struct xfs_dinode *dip) { diff --git a/libxfs/xfs_inode_util.c b/libxfs/xfs_inode_util.c index 65c025f3573..cc203321dad 100644 --- a/libxfs/xfs_inode_util.c +++ b/libxfs/xfs_inode_util.c @@ -222,6 +222,8 @@ xfs_inode_inherit_flags2( } if (pip->i_diflags2 & XFS_DIFLAG2_DAX) ip->i_diflags2 |= XFS_DIFLAG2_DAX; + if (pip->i_diflags2 & XFS_DIFLAG2_METADATA) + ip->i_diflags2 |= XFS_DIFLAG2_METADATA; /* Don't let invalid cowextsize hints propagate. */ failaddr = xfs_inode_validate_cowextsize(ip->i_mount, ip->i_cowextsize, diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c index 10f699b4e99..c421099e4f9 100644 --- a/libxfs/xfs_sb.c +++ b/libxfs/xfs_sb.c @@ -172,6 +172,8 @@ xfs_sb_version_to_features( features |= XFS_FEAT_NEEDSREPAIR; if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NREXT64) features |= XFS_FEAT_NREXT64; + if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) + features |= XFS_FEAT_METADIR; return features; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
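One detail in the format patch is that the new inode flag sits at bit 63 and is built with 1ULL, while the older flags use a plain int 1. A short sketch of why the wider constant is needed is below; the DEMO_* names are hypothetical stand-ins for the XFS_DIFLAG2_* macros.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_FLAG2_DAX_BIT	0
#define DEMO_FLAG2_METADATA_BIT	63

/*
 * Shifting a plain int 1 left by 63 is undefined behaviour in C because
 * the shift count exceeds the width of int.  Building the mask from an
 * unsigned long long constant keeps the shift well defined.
 */
#define DEMO_FLAG2_DAX		(1ULL << DEMO_FLAG2_DAX_BIT)
#define DEMO_FLAG2_METADATA	(1ULL << DEMO_FLAG2_METADATA_BIT)

/* Test the flag with an explicit comparison so the result is 0 or 1. */
static bool demo_is_metadata(uint64_t flags2)
{
	return (flags2 & DEMO_FLAG2_METADATA) != 0;
}
```

The patch's xfs_is_metadata_inode returns bool, so the masked value is normalized to true or false. A helper with an int return type would need the explicit comparison shown here, since the masked value does not fit in an int.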
* [PATCH 10/46] xfs: convert metadata inode lookup keys to use paths 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (10 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 07/46] xfs: define the on-disk format for the metadir feature Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 14/46] xfs: disable the agi rotor for metadata inodes Darrick J. Wong ` (33 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert the magic metadata inode lookup keys to use actual strings for paths. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_imeta.c | 48 ++++++++++++++++++++++++++---------------------- libxfs/xfs_imeta.h | 17 +++++++++++++++-- 2 files changed, 41 insertions(+), 24 deletions(-) diff --git a/libxfs/xfs_imeta.c b/libxfs/xfs_imeta.c index bc3c94634ce..8fa0b5a5c1c 100644 --- a/libxfs/xfs_imeta.c +++ b/libxfs/xfs_imeta.c @@ -48,26 +48,17 @@ */ /* Static metadata inode paths */ - -const struct xfs_imeta_path XFS_IMETA_RTBITMAP = { - .bogus = 0, -}; - -const struct xfs_imeta_path XFS_IMETA_RTSUMMARY = { - .bogus = 1, -}; - -const struct xfs_imeta_path XFS_IMETA_USRQUOTA = { - .bogus = 2, -}; - -const struct xfs_imeta_path XFS_IMETA_GRPQUOTA = { - .bogus = 3, -}; - -const struct xfs_imeta_path XFS_IMETA_PRJQUOTA = { - .bogus = 4, -}; +static const char *rtbitmap_path[] = {"realtime", "bitmap"}; +static const char *rtsummary_path[] = {"realtime", "summary"}; +static const char *usrquota_path[] = {"quota", "user"}; +static const char *grpquota_path[] = {"quota", "group"}; +static const char *prjquota_path[] = {"quota", "project"}; + +XFS_IMETA_DEFINE_PATH(XFS_IMETA_RTBITMAP, rtbitmap_path); +XFS_IMETA_DEFINE_PATH(XFS_IMETA_RTSUMMARY, rtsummary_path); +XFS_IMETA_DEFINE_PATH(XFS_IMETA_USRQUOTA, usrquota_path); 
+XFS_IMETA_DEFINE_PATH(XFS_IMETA_GRPQUOTA, grpquota_path); +XFS_IMETA_DEFINE_PATH(XFS_IMETA_PRJQUOTA, prjquota_path); /* Are these two paths equal? */ STATIC bool @@ -75,7 +66,20 @@ xfs_imeta_path_compare( const struct xfs_imeta_path *a, const struct xfs_imeta_path *b) { - return a == b; + unsigned int i; + + if (a == b) + return true; + + if (a->im_depth != b->im_depth) + return false; + + for (i = 0; i < a->im_depth; i++) + if (a->im_path[i] != b->im_path[i] && + strcmp(a->im_path[i], b->im_path[i])) + return false; + + return true; } /* Is this path ok? */ @@ -83,7 +87,7 @@ static inline bool xfs_imeta_path_check( const struct xfs_imeta_path *path) { - return true; + return path->im_depth <= XFS_IMETA_MAX_DEPTH; } /* Functions for storing and retrieving superblock inode values. */ diff --git a/libxfs/xfs_imeta.h b/libxfs/xfs_imeta.h index 312e3a6fdb9..631a88120a7 100644 --- a/libxfs/xfs_imeta.h +++ b/libxfs/xfs_imeta.h @@ -6,10 +6,23 @@ #ifndef __XFS_IMETA_H__ #define __XFS_IMETA_H__ +/* How deep can we nest metadata dirs? */ +#define XFS_IMETA_MAX_DEPTH 64 + +/* Form an imeta path from a simple array of strings. */ +#define XFS_IMETA_DEFINE_PATH(name, path) \ +const struct xfs_imeta_path name = { \ + .im_path = (path), \ + .im_depth = ARRAY_SIZE(path), \ +} + /* Key for looking up metadata inodes. */ struct xfs_imeta_path { - /* Temporary: integer to keep the static imeta definitions unique */ - int bogus; + /* Array of string pointers. */ + const char **im_path; + + /* Number of strings in path. */ + unsigned int im_depth; }; /* Cleanup widget for metadata inode creation and deletion. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
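The path comparison in this patch can be tried out in isolation. The sketch below mirrors the logic of xfs_imeta_path_compare and xfs_imeta_path_check with hypothetical demo_* names, and adds a second copy of the rt bitmap path to show that two distinct path objects with the same components compare equal.

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_MAX_DEPTH		64
#define DEMO_ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for struct xfs_imeta_path. */
struct demo_path {
	const char	**components;	/* array of path component strings */
	unsigned int	depth;		/* number of components */
};

/* Two paths are equal if they have the same components in the same order. */
static bool demo_path_equal(const struct demo_path *a,
			    const struct demo_path *b)
{
	unsigned int	i;

	if (a == b)
		return true;
	if (a->depth != b->depth)
		return false;

	for (i = 0; i < a->depth; i++) {
		/* Pointer equality is a shortcut; strcmp decides otherwise. */
		if (a->components[i] != b->components[i] &&
		    strcmp(a->components[i], b->components[i]) != 0)
			return false;
	}
	return true;
}

/* A path is acceptable if it does not nest too deeply. */
static bool demo_path_ok(const struct demo_path *path)
{
	return path->depth <= DEMO_MAX_DEPTH;
}

static const char *demo_rtbitmap[] = {"realtime", "bitmap"};
static const char *demo_rtbitmap2[] = {"realtime", "bitmap"};
static const char *demo_rtsummary[] = {"realtime", "summary"};

static const struct demo_path DEMO_RTBITMAP = {
	.components = demo_rtbitmap,
	.depth = DEMO_ARRAY_SIZE(demo_rtbitmap),
};
static const struct demo_path DEMO_RTBITMAP2 = {
	.components = demo_rtbitmap2,
	.depth = DEMO_ARRAY_SIZE(demo_rtbitmap2),
};
static const struct demo_path DEMO_RTSUMMARY = {
	.components = demo_rtsummary,
	.depth = DEMO_ARRAY_SIZE(demo_rtsummary),
};
```

Comparing by content rather than by address is what makes dynamically built paths usable as lookup keys alongside the static ones.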
* [PATCH 14/46] xfs: disable the agi rotor for metadata inodes 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (11 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 10/46] xfs: convert metadata inode lookup keys to use paths Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 19/46] libfrog: report metadata directories in the geometry report Darrick J. Wong ` (32 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Ideally, we'd put all the metadata inodes in one place if we could, so that the metadata all stay reasonably close together instead of spreading out over the disk. Furthermore, if the log is internal we'd probably prefer to keep the metadata near the log. Therefore, disable AGI rotoring for metadata inode allocations. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/util.c | 3 --- libxfs/xfs_ialloc.c | 16 +++++++++------- libxfs/xfs_ialloc.h | 2 +- libxfs/xfs_imeta.c | 4 ++-- mkfs/proto.c | 3 +-- repair/phase6.c | 2 +- 6 files changed, 14 insertions(+), 16 deletions(-) diff --git a/libxfs/util.c b/libxfs/util.c index fec26e6d30f..7b16d30b754 100644 --- a/libxfs/util.c +++ b/libxfs/util.c @@ -467,9 +467,6 @@ libxfs_imeta_mkdir( uint resblks; int error; - /* Try to place metadata directories in AG 0. 
*/ - mp->m_agirotor = 0; - error = xfs_imeta_start_update(mp, path, &upd); if (error) return error; diff --git a/libxfs/xfs_ialloc.c b/libxfs/xfs_ialloc.c index 9ce36b2cd8d..e7cafdd395b 100644 --- a/libxfs/xfs_ialloc.c +++ b/libxfs/xfs_ialloc.c @@ -1793,26 +1793,28 @@ xfs_dialloc_try_ag( int xfs_dialloc( struct xfs_trans **tpp, - xfs_ino_t parent, + struct xfs_inode *pip, umode_t mode, xfs_ino_t *new_ino) { struct xfs_mount *mp = (*tpp)->t_mountp; + struct xfs_perag *pag; + struct xfs_ino_geometry *igeo = M_IGEO(mp); + xfs_ino_t ino; + xfs_ino_t parent = pip ? pip->i_ino : 0; xfs_agnumber_t agno; - int error = 0; xfs_agnumber_t start_agno; - struct xfs_perag *pag; - struct xfs_ino_geometry *igeo = M_IGEO(mp); bool ok_alloc = true; int flags; - xfs_ino_t ino; + int error = 0; /* * Directories, symlinks, and regular files frequently allocate at least * one block, so factor that potential expansion when we examine whether - * an AG has enough space for file creation. + * an AG has enough space for file creation. Try to keep metadata + * files all in the same AG. */ - if (S_ISDIR(mode)) + if (S_ISDIR(mode) && (!pip || !xfs_is_metadata_inode(pip))) start_agno = xfs_ialloc_next_ag(mp); else { start_agno = XFS_INO_TO_AGNO(mp, parent); diff --git a/libxfs/xfs_ialloc.h b/libxfs/xfs_ialloc.h index f4dc97bb8e8..adf60dc56e7 100644 --- a/libxfs/xfs_ialloc.h +++ b/libxfs/xfs_ialloc.h @@ -36,7 +36,7 @@ xfs_make_iptr(struct xfs_mount *mp, struct xfs_buf *b, int o) * Allocate an inode on disk. Mode is used to tell whether the new inode will * need space, and whether it is a directory. 
*/ -int xfs_dialloc(struct xfs_trans **tpp, xfs_ino_t parent, umode_t mode, +int xfs_dialloc(struct xfs_trans **tpp, struct xfs_inode *dp, umode_t mode, xfs_ino_t *new_ino); int xfs_difree(struct xfs_trans *tp, struct xfs_perag *pag, diff --git a/libxfs/xfs_imeta.c b/libxfs/xfs_imeta.c index 9e92186b58c..1502d4eb2e3 100644 --- a/libxfs/xfs_imeta.c +++ b/libxfs/xfs_imeta.c @@ -231,7 +231,7 @@ xfs_imeta_sb_create( return -EEXIST; /* Create a new inode and set the sb pointer. */ - error = xfs_dialloc(tpp, 0, mode, &ino); + error = xfs_dialloc(tpp, NULL, mode, &ino); if (error) return error; error = xfs_icreate(*tpp, ino, &args, ipp); @@ -641,7 +641,7 @@ xfs_imeta_dir_create( * entry pointing to them, but a directory also the "." entry * pointing to itself. */ - error = xfs_dialloc(tpp, dp->i_ino, mode, &ino); + error = xfs_dialloc(tpp, dp, mode, &ino); if (error) goto out_ilock; error = xfs_icreate(*tpp, ino, &args, ipp); diff --git a/mkfs/proto.c b/mkfs/proto.c index f15cbea84c7..6fb58bd7cd4 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -380,7 +380,6 @@ creatproto( XFS_ICREATE_ARGS_FORCE_MODE, }; struct xfs_inode *ip; - xfs_ino_t parent_ino = dp ? dp->i_ino : 0; xfs_ino_t ino; int error; @@ -388,7 +387,7 @@ creatproto( * Call the space management code to pick the on-disk inode to be * allocated. */ - error = -libxfs_dialloc(tpp, parent_ino, mode, &ino); + error = -libxfs_dialloc(tpp, dp, mode, &ino); if (error) return error; diff --git a/repair/phase6.c b/repair/phase6.c index f8f42eb6e29..90413251b56 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -846,7 +846,7 @@ mk_orphanage( if (i) res_failed(i); - error = -libxfs_dialloc(&tp, mp->m_sb.sb_rootino, mode, &ino); + error = -libxfs_dialloc(&tp, pip, mode, &ino); if (error) do_error(_("%s inode allocation failed %d\n"), ORPHANAGE, error); ^ permalink raw reply related [flat|nested] 565+ messages in thread
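The allocation-group choice that this patch changes in xfs_dialloc can be modelled in a few lines. This is a simplified sketch under stated assumptions: the demo_mount structure and both helpers are hypothetical, the rotor is a plain counter, and the later steps of the real allocator (space checks, retries across AGs) are not modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mount state: just an AG count and a rotor. */
struct demo_mount {
	uint32_t	agcount;	/* number of allocation groups */
	uint32_t	rotor;		/* next AG handed to a new directory */
};

/* Hand out the current rotor position and advance it, wrapping around. */
static uint32_t demo_next_ag(struct demo_mount *mp)
{
	uint32_t	agno = mp->rotor;

	mp->rotor = (mp->rotor + 1) % mp->agcount;
	return agno;
}

/*
 * Pick the AG where the search for a free inode starts.  New directories
 * are normally spread out by the rotor, but anything created under a
 * metadata parent stays in the parent's AG, as do ordinary files.
 */
static uint32_t demo_pick_start_ag(struct demo_mount *mp, bool new_is_dir,
				   bool parent_is_metadata,
				   uint32_t parent_agno)
{
	if (new_is_dir && !parent_is_metadata)
		return demo_next_ag(mp);
	return parent_agno;
}
```

In the patch, a missing parent yields inode number 0 and therefore AG 0, which corresponds to calling this helper with parent_is_metadata false and parent_agno 0.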
* [PATCH 19/46] libfrog: report metadata directories in the geometry report 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (12 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 14/46] xfs: disable the agi rotor for metadata inodes Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 15/46] xfs: advertise metadata directory feature Darrick J. Wong ` (31 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Report the presence of a metadata directory tree in the geometry report. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libfrog/fsgeom.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c index 6980d3ffab6..3f4c38d1e1b 100644 --- a/libfrog/fsgeom.c +++ b/libfrog/fsgeom.c @@ -31,6 +31,7 @@ xfs_report_geom( int bigtime_enabled; int inobtcount; int nrext64; + int metadir; isint = geo->logstart > 0; lazycount = geo->flags & XFS_FSOP_GEOM_FLAGS_LAZYSB ? 1 : 0; @@ -49,12 +50,14 @@ xfs_report_geom( bigtime_enabled = geo->flags & XFS_FSOP_GEOM_FLAGS_BIGTIME ? 1 : 0; inobtcount = geo->flags & XFS_FSOP_GEOM_FLAGS_INOBTCNT ? 1 : 0; nrext64 = geo->flags & XFS_FSOP_GEOM_FLAGS_NREXT64 ? 1 : 0; + metadir = geo->flags & XFS_FSOP_GEOM_FLAGS_METADIR ? 
1 : 0; printf(_( "meta-data=%-22s isize=%-6d agcount=%u, agsize=%u blks\n" " =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n" " =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u\n" " =%-22s reflink=%-4u bigtime=%u inobtcount=%u nrext64=%u\n" +" =%-22s metadir=%-4u\n" "data =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n" " =%-22s sunit=%-6u swidth=%u blks\n" "naming =version %-14u bsize=%-6u ascii-ci=%d, ftype=%d\n" @@ -65,6 +68,7 @@ xfs_report_geom( "", geo->sectsize, attrversion, projid32bit, "", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled, "", reflink_enabled, bigtime_enabled, inobtcount, nrext64, + "", metadir, "", geo->blocksize, (unsigned long long)geo->datablocks, geo->imaxpct, "", geo->sunit, geo->swidth, ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 15/46] xfs: advertise metadata directory feature 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (13 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 19/46] libfrog: report metadata directories in the geometry report Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 20/46] xfs_db: basic xfs_check support for metadir Darrick J. Wong ` (30 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Advertise the existence of the metadata directory feature; this will be used by scrub to decide if it needs to scan the metadir too. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_fs.h | 1 + libxfs/xfs_sb.c | 2 ++ man/man2/ioctl_xfs_fsgeometry.2 | 3 +++ 3 files changed, 6 insertions(+) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index a39fd65e6ee..7de31a6692a 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -239,6 +239,7 @@ typedef struct xfs_fsop_resblks { #define XFS_FSOP_GEOM_FLAGS_BIGTIME (1 << 21) /* 64-bit nsec timestamps */ #define XFS_FSOP_GEOM_FLAGS_INOBTCNT (1 << 22) /* inobt btree counter */ #define XFS_FSOP_GEOM_FLAGS_NREXT64 (1 << 23) /* large extent counters */ +#define XFS_FSOP_GEOM_FLAGS_METADIR (1 << 30) /* metadata directories */ #define XFS_FSOP_GEOM_FLAGS_ATOMIC_SWAP (1U << 31) /* atomic file extent swap */ /* diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c index 6452856d45b..55a5c5fc631 100644 --- a/libxfs/xfs_sb.c +++ b/libxfs/xfs_sb.c @@ -1231,6 +1231,8 @@ xfs_fs_geometry( geo->flags |= XFS_FSOP_GEOM_FLAGS_NREXT64; if (xfs_swapext_supported(mp)) geo->flags |= XFS_FSOP_GEOM_FLAGS_ATOMIC_SWAP; + if (xfs_has_metadir(mp)) + geo->flags |= XFS_FSOP_GEOM_FLAGS_METADIR; geo->rtsectsize = sbp->sb_blocksize; geo->dirblocksize = xfs_dir2_dirblock_bytes(sbp); diff --git a/man/man2/ioctl_xfs_fsgeometry.2 
b/man/man2/ioctl_xfs_fsgeometry.2 index 7c563ca0454..19328bb4be4 100644 --- a/man/man2/ioctl_xfs_fsgeometry.2 +++ b/man/man2/ioctl_xfs_fsgeometry.2 @@ -214,6 +214,9 @@ Filesystem supports sharing blocks between files. .TP .B XFS_FSOP_GEOM_FLAGS_ATOMICSWAP Filesystem can exchange file contents atomically via FIEXCHANGE_RANGE. +.TP +.B XFS_FSOP_GEOM_FLAGS_METADIR +Filesystem contains a metadata directory tree. .RE .SH XFS METADATA HEALTH REPORTING .PP ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 20/46] xfs_db: basic xfs_check support for metadir 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (14 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 15/46] xfs: advertise metadata directory feature Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 18/46] xfs: record health problems with the metadata directory Darrick J. Wong ` (29 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Support metadata directories in xfs_check. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/check.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/db/check.c b/db/check.c index 964756d0ae5..5297ea25459 100644 --- a/db/check.c +++ b/db/check.c @@ -2649,7 +2649,9 @@ process_dir( if (!sflag || id->ilist || CHECK_BLIST(bno)) dbprintf(_("no .. entry for directory %lld\n"), id->ino); error++; - } else if (parent == id->ino && id->ino != mp->m_sb.sb_rootino) { + } else if (parent == id->ino && + id->ino != mp->m_sb.sb_rootino && + id->ino != mp->m_sb.sb_metadirino) { if (!sflag || id->ilist || CHECK_BLIST(bno)) dbprintf(_(". and .. same for non-root directory %lld\n"), id->ino); @@ -2659,6 +2661,11 @@ process_dir( dbprintf(_("root directory %lld has .. %lld\n"), id->ino, parent); error++; + } else if (id->ino == mp->m_sb.sb_metadirino && id->ino != parent) { + if (!sflag || id->ilist || CHECK_BLIST(bno)) + dbprintf(_("metadata directory %lld has .. 
%lld\n"), + id->ino, parent); + error++; } else if (parent != NULLFSINO && id->ino != parent) addparent_inode(id, parent); } @@ -2902,6 +2909,9 @@ process_inode( type = DBM_DIR; if (dip->di_format == XFS_DINODE_FMT_LOCAL) break; + if (xfs_has_metadir(mp) && + id->ino == mp->m_sb.sb_metadirino) + addlink_inode(id); blkmap = blkmap_alloc(dnextents); break; case S_IFREG: @@ -2910,18 +2920,21 @@ process_inode( else if (id->ino == mp->m_sb.sb_rbmino) { type = DBM_RTBITMAP; blkmap = blkmap_alloc(dnextents); - addlink_inode(id); + if (!xfs_has_metadir(mp)) + addlink_inode(id); } else if (id->ino == mp->m_sb.sb_rsumino) { type = DBM_RTSUM; blkmap = blkmap_alloc(dnextents); - addlink_inode(id); + if (!xfs_has_metadir(mp)) + addlink_inode(id); } else if (id->ino == mp->m_sb.sb_uquotino || id->ino == mp->m_sb.sb_gquotino || id->ino == mp->m_sb.sb_pquotino) { type = DBM_QUOTA; blkmap = blkmap_alloc(dnextents); - addlink_inode(id); + if (!xfs_has_metadir(mp)) + addlink_inode(id); } else type = DBM_DATA; ^ permalink raw reply related [flat|nested] 565+ messages in thread
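The ".." rules that the xfs_check change adds to process_dir can be restated as a small classifier. The sketch covers only the three branches visible in the diff; the enum and function names are hypothetical, and the "no .. entry" case handled earlier in process_dir is not modelled.

```c
#include <stdint.h>

/* Hypothetical result codes for the ".." check. */
enum demo_dotdot {
	DEMO_DOTDOT_OK = 0,
	DEMO_DOTDOT_SELF_NONROOT,	/* "." == ".." outside a tree root */
	DEMO_DOTDOT_ROOT_NOT_SELF,	/* root dir ".." points elsewhere */
	DEMO_DOTDOT_METADIR_NOT_SELF,	/* metadir root ".." points elsewhere */
};

/*
 * Only the two tree roots (the user-visible root directory and the
 * metadata directory root) may have ".." pointing at themselves, and
 * both of them must do so.
 */
static enum demo_dotdot demo_check_dotdot(uint64_t ino, uint64_t parent,
					  uint64_t rootino,
					  uint64_t metadirino)
{
	if (parent == ino && ino != rootino && ino != metadirino)
		return DEMO_DOTDOT_SELF_NONROOT;
	if (ino == rootino && parent != ino)
		return DEMO_DOTDOT_ROOT_NOT_SELF;
	if (ino == metadirino && parent != ino)
		return DEMO_DOTDOT_METADIR_NOT_SELF;
	return DEMO_DOTDOT_OK;
}
```

The inode numbers used to exercise this are arbitrary; only their equality relationships matter.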
* [PATCH 18/46] xfs: record health problems with the metadata directory 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (15 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 20/46] xfs_db: basic xfs_check support for metadir Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 17/46] xfs: enable creation of dynamically allocated metadir path structures Darrick J. Wong ` (28 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make a report to the health monitoring subsystem any time we encounter something in the metadata directory tree that looks like corruption. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_fs.h | 1 + libxfs/xfs_health.h | 4 +++- libxfs/xfs_imeta.c | 28 ++++++++++++++++++++++------ 3 files changed, 26 insertions(+), 7 deletions(-) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index 6e0c45fcfee..c4995f6557d 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -197,6 +197,7 @@ struct xfs_fsop_geom { #define XFS_FSOP_GEOM_SICK_RT_SUMMARY (1 << 5) /* realtime summary */ #define XFS_FSOP_GEOM_SICK_QUOTACHECK (1 << 6) /* quota counts */ #define XFS_FSOP_GEOM_SICK_NLINKS (1 << 7) /* inode link counts */ +#define XFS_FSOP_GEOM_SICK_METADIR (1 << 8) /* metadata directory */ /* Output for XFS_FS_COUNTS */ typedef struct xfs_fsop_counts { diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h index 252334bc048..99d53bae9c1 100644 --- a/libxfs/xfs_health.h +++ b/libxfs/xfs_health.h @@ -60,6 +60,7 @@ struct xfs_da_args; #define XFS_SICK_FS_PQUOTA (1 << 3) /* project quota */ #define XFS_SICK_FS_QUOTACHECK (1 << 4) /* quota counts */ #define XFS_SICK_FS_NLINKS (1 << 5) /* inode link counts */ +#define XFS_SICK_FS_METADIR (1 << 6) /* metadata directory tree */ /* Observable health issues for realtime volume metadata. 
*/ #define XFS_SICK_RT_BITMAP (1 << 0) /* realtime bitmap */ @@ -96,7 +97,8 @@ struct xfs_da_args; XFS_SICK_FS_GQUOTA | \ XFS_SICK_FS_PQUOTA | \ XFS_SICK_FS_QUOTACHECK | \ - XFS_SICK_FS_NLINKS) + XFS_SICK_FS_NLINKS | \ + XFS_SICK_FS_METADIR) #define XFS_SICK_RT_PRIMARY (XFS_SICK_RT_BITMAP | \ XFS_SICK_RT_SUMMARY) diff --git a/libxfs/xfs_imeta.c b/libxfs/xfs_imeta.c index eaf63275c08..a5d8e6057bb 100644 --- a/libxfs/xfs_imeta.c +++ b/libxfs/xfs_imeta.c @@ -25,6 +25,7 @@ #include "xfs_dir2.h" #include "xfs_dir2_priv.h" #include "xfs_ag.h" +#include "xfs_health.h" /* * Metadata Inode Number Management @@ -404,16 +405,22 @@ xfs_imeta_dir_lookup_component( trace_xfs_imeta_dir_lookup_component(dp, xname, NULLFSINO); - if (!S_ISDIR(VFS_I(dp)->i_mode)) + if (!S_ISDIR(VFS_I(dp)->i_mode)) { + xfs_fs_mark_sick(dp->i_mount, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; + } error = xfs_imeta_dir_lookup(dp, xname, ino); if (error) return error; - if (!xfs_verify_ino(dp->i_mount, *ino)) + if (!xfs_verify_ino(dp->i_mount, *ino)) { + xfs_fs_mark_sick(dp->i_mount, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; - if (type_wanted != XFS_DIR3_FT_UNKNOWN && xname->type != type_wanted) + } + if (type_wanted != XFS_DIR3_FT_UNKNOWN && xname->type != type_wanted) { + xfs_fs_mark_sick(dp->i_mount, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; + } trace_xfs_imeta_dir_lookup_found(dp, xname, *ino); return 0; @@ -712,6 +719,7 @@ xfs_imeta_dir_unlink( /* Metadata directory root cannot be unlinked. */ if (xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) { ASSERT(0); + xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; } @@ -727,6 +735,7 @@ xfs_imeta_dir_unlink( error = -ENOENT; break; case -ENOENT: + xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); error = -EFSCORRUPTED; break; } @@ -778,6 +787,7 @@ xfs_imeta_dir_link( /* Metadata directory root cannot be linked. 
*/ if (xfs_imeta_path_compare(path, &XFS_IMETA_METADIR)) { ASSERT(0); + xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; } @@ -855,16 +865,20 @@ xfs_imeta_lookup( if (xfs_has_metadir(mp)) { error = xfs_imeta_dir_lookup_int(mp, path, &ino); - if (error == -ENOENT) + if (error == -ENOENT) { + xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; + } } else { error = xfs_imeta_sb_lookup(mp, path, &ino); } if (error) return error; - if (!xfs_imeta_verify(mp, ino)) + if (!xfs_imeta_verify(mp, ino)) { + xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; + } *inop = ino; return 0; @@ -1040,8 +1054,10 @@ xfs_imeta_start_update( * to exist. */ error = xfs_imeta_dir_parent(mp, path, &upd->dp); - if (error == -ENOENT) + if (error == -ENOENT) { + xfs_fs_mark_sick(mp, XFS_SICK_FS_METADIR); return -EFSCORRUPTED; + } if (error) return error; ^ permalink raw reply related [flat|nested] 565+ messages in thread
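The pattern applied throughout this patch (set a bit in the filesystem health mask, then return -EFSCORRUPTED) can be sketched outside of libxfs as follows. All names here are hypothetical stand-ins, the error constant is an arbitrary demo value, and the real xfs_fs_mark_sick() does more than OR a bit into a word; only the bit position of the METADIR flag is taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit position copied from the patch; the name is a demo stand-in. */
#define DEMO_SICK_FS_METADIR	(1u << 6)

/* Arbitrary demo value, not the real EFSCORRUPTED definition. */
enum { DEMO_EFSCORRUPTED = 117 };

struct demo_fs_health {
	unsigned int	sick;	/* accumulated health problems */
};

static void
demo_fs_mark_sick(
	struct demo_fs_health	*h,
	unsigned int		mask)
{
	h->sick |= mask;
}

/*
 * Mirror the three checks in the lookup path: the parent must be a
 * directory, the inode number must verify, and the entry type must match.
 * Any failure is recorded before the corruption error is returned.
 */
static int
demo_lookup_check(
	struct demo_fs_health	*h,
	bool			parent_is_dir,
	bool			ino_valid,
	bool			type_matches)
{
	if (!parent_is_dir || !ino_valid || !type_matches) {
		demo_fs_mark_sick(h, DEMO_SICK_FS_METADIR);
		return -DEMO_EFSCORRUPTED;
	}
	return 0;
}
```

Recording the problem at the point of detection means a later health query can report the metadata directory tree as sick even if the caller drops the error code.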
* [PATCH 17/46] xfs: enable creation of dynamically allocated metadir path structures 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (16 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 18/46] xfs: record health problems with the metadata directory Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 13/46] xfs: ensure metadata directory paths exist before creating files Darrick J. Wong ` (27 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a few helper functions so that it's possible to allocate xfs_imeta_path objects dynamically, along with dynamically allocated path components. Eventually we're going to want to support paths of the form "/realtime/$rtgroup.rmap", and this is necessary for that. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/kmem.h | 4 +++- libxfs/xfs_imeta.c | 43 +++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_imeta.h | 8 ++++++++ 3 files changed, 54 insertions(+), 1 deletion(-) diff --git a/include/kmem.h b/include/kmem.h index 8ae919c7066..014983173a2 100644 --- a/include/kmem.h +++ b/include/kmem.h @@ -26,7 +26,7 @@ typedef unsigned int __bitwise gfp_t; #define __GFP_NOFAIL ((__force gfp_t)0) #define __GFP_NOLOCKDEP ((__force gfp_t)0) -#define __GFP_ZERO (__force gfp_t)1 +#define __GFP_ZERO ((__force gfp_t)1) struct kmem_cache * kmem_cache_create(const char *name, unsigned int size, unsigned int align, unsigned int slab_flags, @@ -65,6 +65,8 @@ static inline void *kmalloc(size_t size, gfp_t flags) return kvmalloc(size, flags); } +#define kvcalloc(nr, size, gfp) kvmalloc((nr) * (size), (gfp) | __GFP_ZERO) + static inline void kfree(const void *ptr) { return kmem_free(ptr); diff --git a/libxfs/xfs_imeta.c b/libxfs/xfs_imeta.c index 1502d4eb2e3..eaf63275c08 100644 --- a/libxfs/xfs_imeta.c +++ 
b/libxfs/xfs_imeta.c @@ -1148,3 +1148,46 @@ xfs_imeta_lookup_update( return error; } + +/* Create a path to a file within the metadata directory tree. */ +int +xfs_imeta_create_file_path( + struct xfs_mount *mp, + unsigned int nr_components, + struct xfs_imeta_path **pathp) +{ + struct xfs_imeta_path *p; + char **components; + + p = kmalloc(sizeof(struct xfs_imeta_path), GFP_KERNEL); + if (!p) + return -ENOMEM; + + components = kvcalloc(nr_components, sizeof(char *), GFP_KERNEL); + if (!components) { + kfree(p); + return -ENOMEM; + } + + p->im_depth = nr_components; + p->im_path = (const char **)components; + p->im_ftype = XFS_DIR3_FT_REG_FILE; + p->im_dynamicmask = 0; + *pathp = p; + return 0; +} + +/* Free a metadata directory tree path. */ +void +xfs_imeta_free_path( + struct xfs_imeta_path *path) +{ + unsigned int i; + + for (i = 0; i < path->im_depth; i++) { + if ((path->im_dynamicmask & (1ULL << i)) && path->im_path[i]) + kfree(path->im_path[i]); + } + kfree(path->im_path); + kfree(path); +} diff --git a/libxfs/xfs_imeta.h b/libxfs/xfs_imeta.h index 741f426c6a4..7840087b71d 100644 --- a/libxfs/xfs_imeta.h +++ b/libxfs/xfs_imeta.h @@ -15,6 +15,7 @@ const struct xfs_imeta_path name = { \ .im_path = (path), \ .im_ftype = XFS_DIR3_FT_REG_FILE, \ .im_depth = ARRAY_SIZE(path), \ + .im_dynamicmask = 0, \ } /* Key for looking up metadata inodes. */ @@ -27,6 +28,9 @@ struct xfs_imeta_path { /* Expected file type. */ unsigned int im_ftype; + + /* Each bit corresponds to an element of im_path needing to be freed */ + unsigned long long im_dynamicmask; }; /* Cleanup widget for metadata inode creation and deletion. 
*/ @@ -52,6 +56,10 @@ int xfs_imeta_lookup_update(struct xfs_mount *mp, const struct xfs_imeta_path *path, struct xfs_imeta_update *upd, xfs_ino_t *inop); +int xfs_imeta_create_file_path(struct xfs_mount *mp, + unsigned int nr_components, struct xfs_imeta_path **pathp); +void xfs_imeta_free_path(struct xfs_imeta_path *path); + void xfs_imeta_set_metaflag(struct xfs_trans *tp, struct xfs_inode *ip); /* Don't allocate quota for this file. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
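For readers who want to see how im_dynamicmask is meant to be used, here is a rough userspace sketch with malloc/free in place of the kmem helpers. The demo_* names and the two setter helpers are hypothetical (the patch itself only adds the allocate and free functions), and the cap of 64 components is an extra check assumed here because the mask is 64 bits wide.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct demo_path {
	const char		**components;
	unsigned int		depth;
	/* Bit i set means components[i] is heap memory owned by the path. */
	unsigned long long	dynamicmask;
};

static struct demo_path *
demo_path_alloc(unsigned int nr)
{
	struct demo_path	*p;

	if (nr == 0 || nr > 64)
		return NULL;
	p = malloc(sizeof(*p));
	if (!p)
		return NULL;
	p->components = calloc(nr, sizeof(char *));
	if (!p->components) {
		free(p);
		return NULL;
	}
	p->depth = nr;
	p->dynamicmask = 0;
	return p;
}

/* Point component i at a string that the path does not own. */
static void
demo_path_set_static(struct demo_path *p, unsigned int i, const char *name)
{
	p->components[i] = name;
}

/* Store a heap copy of name in component i and flag it for freeing. */
static int
demo_path_set_dynamic(struct demo_path *p, unsigned int i, const char *name)
{
	size_t	len = strlen(name) + 1;
	char	*copy;

	if (i >= p->depth)
		return -1;
	copy = malloc(len);
	if (!copy)
		return -1;
	memcpy(copy, name, len);
	p->components[i] = copy;
	p->dynamicmask |= 1ULL << i;
	return 0;
}

/* Free only the components that the mask marks as dynamically allocated. */
static void
demo_path_free(struct demo_path *p)
{
	unsigned int	i;

	for (i = 0; i < p->depth; i++) {
		if ((p->dynamicmask & (1ULL << i)) && p->components[i])
			free((void *)p->components[i]);
	}
	free(p->components);
	free(p);
}
```

A path such as "realtime/0.rmap" would then use a static first component and a dynamic second one, so only the formatted name is freed.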
* [PATCH 13/46] xfs: ensure metadata directory paths exist before creating files 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (17 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 17/46] xfs: enable creation of dynamically allocated metadir path structures Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 16/46] xfs: allow bulkstat to return metadata directories Darrick J. Wong ` (26 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Since xfs_imeta_create can create new metadata files arbitrarily deep in the metadata directory tree, we must supply a function that can ensure that all directories in a path exist, and call it before the quota functions create the quota inodes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 1 + libxfs/util.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_imeta.h | 2 + mkfs/proto.c | 8 +++++ 4 files changed, 89 insertions(+) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index d4cc059abfb..785354d3ec8 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -160,6 +160,7 @@ #define xfs_imeta_create libxfs_imeta_create #define xfs_imeta_create_space_res libxfs_imeta_create_space_res #define xfs_imeta_end_update libxfs_imeta_end_update +#define xfs_imeta_ensure_dirpath libxfs_imeta_ensure_dirpath #define xfs_imeta_iget libxfs_imeta_iget #define xfs_imeta_irele libxfs_imeta_irele #define xfs_imeta_link libxfs_imeta_link diff --git a/libxfs/util.c b/libxfs/util.c index 51a0f513e7a..fec26e6d30f 100644 --- a/libxfs/util.c +++ b/libxfs/util.c @@ -454,3 +454,81 @@ void xfs_dirattr_mark_sick(struct xfs_inode *ip, int whichfork) { } void xfs_da_mark_sick(struct xfs_da_args *args) { } void xfs_inode_mark_sick(struct xfs_inode *ip, 
unsigned int mask) { } void xfs_rt_mark_sick(struct xfs_mount *mp, unsigned int mask) { } + +/* Create a metadata directory for the last component of the path. */ +STATIC int +libxfs_imeta_mkdir( + struct xfs_mount *mp, + const struct xfs_imeta_path *path) +{ + struct xfs_imeta_update upd; + struct xfs_inode *ip = NULL; + struct xfs_trans *tp = NULL; + uint resblks; + int error; + + /* Try to place metadata directories in AG 0. */ + mp->m_agirotor = 0; + + error = xfs_imeta_start_update(mp, path, &upd); + if (error) + return error; + + /* Allocate a transaction to create the last directory. */ + resblks = libxfs_imeta_create_space_res(mp); + error = libxfs_trans_alloc(mp, &M_RES(mp)->tr_imeta_create, resblks, + 0, 0, &tp); + if (error) + goto out_end; + + /* Create the subdirectory. */ + error = libxfs_imeta_create(&tp, path, S_IFDIR, 0, &ip, &upd); + if (error) + goto out_trans_cancel; + + error = libxfs_trans_commit(tp); + + /* + * We don't pass the directory we just created to the caller, so finish + * setting up the inode, then release the dir. + */ + goto out_irele; + +out_trans_cancel: + libxfs_trans_cancel(tp); +out_irele: + if (ip) + libxfs_irele(ip); +out_end: + libxfs_imeta_end_update(mp, &upd, error); + return error; +} + +/* + * Make sure that every metadata directory path component exists and is a + * directory. + */ +int +libxfs_imeta_ensure_dirpath( + struct xfs_mount *mp, + const struct xfs_imeta_path *path) +{ + struct xfs_imeta_path temp_path = { + .im_path = path->im_path, + .im_depth = 1, + .im_ftype = XFS_DIR3_FT_DIR, + }; + unsigned int i; + int error = 0; + + if (!xfs_has_metadir(mp)) + return 0; + + for (i = 0; i < path->im_depth - 1; i++, temp_path.im_depth++) { + error = libxfs_imeta_mkdir(mp, &temp_path); + if (error && error != -EEXIST) + break; + } + + return error == -EEXIST ? 
0 : error; +} diff --git a/libxfs/xfs_imeta.h b/libxfs/xfs_imeta.h index 9b139f6809f..741f426c6a4 100644 --- a/libxfs/xfs_imeta.h +++ b/libxfs/xfs_imeta.h @@ -80,5 +80,7 @@ unsigned int xfs_imeta_unlink_space_res(struct xfs_mount *mp); int xfs_imeta_iget(struct xfs_mount *mp, xfs_ino_t ino, unsigned char ftype, struct xfs_inode **ipp); void xfs_imeta_irele(struct xfs_inode *ip); +int xfs_imeta_ensure_dirpath(struct xfs_mount *mp, + const struct xfs_imeta_path *path); #endif /* __XFS_IMETA_H__ */ diff --git a/mkfs/proto.c b/mkfs/proto.c index f145a7ba753..f15cbea84c7 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -683,6 +683,10 @@ rtbitmap_create( struct xfs_inode *rbmip; int error; + error = -libxfs_imeta_ensure_dirpath(mp, &XFS_IMETA_RTBITMAP); + if (error) + fail(_("Realtime bitmap directory allocation failed"), error); + error = -libxfs_imeta_start_update(mp, &XFS_IMETA_RTBITMAP, &upd); if (error) res_failed(error); @@ -719,6 +723,10 @@ rtsummary_create( struct xfs_inode *rsumip; int error; + error = -libxfs_imeta_ensure_dirpath(mp, &XFS_IMETA_RTSUMMARY); + if (error) + fail(_("Realtime summary directory allocation failed"), error); + error = -libxfs_imeta_start_update(mp, &XFS_IMETA_RTSUMMARY, &upd); if (error) res_failed(error); ^ permalink raw reply related [flat|nested] 565+ messages in thread
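The prefix walk in libxfs_imeta_ensure_dirpath() can be tried without a filesystem by swapping the mkdir step for a callback. The sketch below is an assumption-laden model, not the libxfs code: demo_ensure_dirpath, demo_mkdir_fn and the fake mkdir are invented for illustration, and the loop bound is written so that a depth of zero does not underflow.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical hook: create the directory named by the first depth components. */
typedef int (*demo_mkdir_fn)(const char *const *components,
		unsigned int depth, void *priv);

/*
 * Create every directory leading up to the final path component, which
 * names the file itself.  A component that already exists is not an error.
 */
static int
demo_ensure_dirpath(
	const char *const	*components,
	unsigned int		depth,
	demo_mkdir_fn		mkdir_fn,
	void			*priv)
{
	unsigned int		i;
	int			error;

	for (i = 1; i < depth; i++) {
		error = mkdir_fn(components, i, priv);
		if (error && error != -EEXIST)
			return error;
	}
	return 0;
}

/* Fake mkdir for demonstration: the top-level directory already exists. */
struct demo_mkdir_log {
	unsigned int	calls;
	unsigned int	deepest;
	int		deeper_error;	/* returned for depth >= 2 */
};

static int
demo_fake_mkdir(
	const char *const	*components,
	unsigned int		depth,
	void			*priv)
{
	struct demo_mkdir_log	*log = priv;

	(void)components;
	log->calls++;
	log->deepest = depth;
	if (depth == 1)
		return -EEXIST;
	return log->deeper_error;
}
```

For a three-component path the hook runs twice, once per directory prefix, and the -EEXIST from the first prefix is swallowed while any other error stops the walk.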
* [PATCH 16/46] xfs: allow bulkstat to return metadata directories 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (18 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 13/46] xfs: ensure metadata directory paths exist before creating files Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 24/46] xfs_db: mask superblock fields when metadir feature is enabled Darrick J. Wong ` (25 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Allow the V5 bulkstat ioctl to return information about metadata directory files so that xfs_scrub can find and scrub them, since they are otherwise ordinary directories. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_fs.h | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index 7de31a6692a..6e0c45fcfee 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -485,9 +485,17 @@ struct xfs_bulk_ireq { */ #define XFS_BULK_IREQ_NREXT64 (1U << 2) +/* + * Allow bulkstat to return information about metadata directories. This + * enables xfs_scrub to find them for scanning, as they are otherwise ordinary + * directories. + */ +#define XFS_BULK_IREQ_METADIR (1U << 31) + #define XFS_BULK_IREQ_FLAGS_ALL (XFS_BULK_IREQ_AGNO | \ XFS_BULK_IREQ_SPECIAL | \ - XFS_BULK_IREQ_NREXT64) + XFS_BULK_IREQ_NREXT64 | \ + XFS_BULK_IREQ_METADIR) /* Operate on the root directory inode. */ #define XFS_BULK_IREQ_SPECIAL_ROOT (1) ^ permalink raw reply related [flat|nested] 565+ messages in thread
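As an aside on why the new bit also has to be added to XFS_BULK_IREQ_FLAGS_ALL: a mask like that is commonly used to reject requests carrying unknown flags. The sketch below shows that idea with demo_* names; how the kernel actually validates the request header is not shown in this patch, so treat the check as an assumption. The bit positions match xfs_fs.h as quoted in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BULK_IREQ_AGNO	(1U << 0)
#define DEMO_BULK_IREQ_SPECIAL	(1U << 1)
#define DEMO_BULK_IREQ_NREXT64	(1U << 2)
#define DEMO_BULK_IREQ_METADIR	(1U << 31)

#define DEMO_BULK_IREQ_FLAGS_ALL	(DEMO_BULK_IREQ_AGNO | \
					 DEMO_BULK_IREQ_SPECIAL | \
					 DEMO_BULK_IREQ_NREXT64 | \
					 DEMO_BULK_IREQ_METADIR)

/* A request is acceptable only if every set bit is a known flag. */
static bool
demo_bulk_ireq_flags_ok(uint32_t flags)
{
	return (flags & ~DEMO_BULK_IREQ_FLAGS_ALL) == 0;
}

/* Did the caller opt in to seeing metadata directory inodes? */
static bool
demo_bulk_ireq_wants_metadir(uint32_t flags)
{
	return (flags & DEMO_BULK_IREQ_METADIR) != 0;
}
```

Without the addition to the ALL mask, a caller setting only the METADIR bit would fail this kind of check.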
* [PATCH 24/46] xfs_db: mask superblock fields when metadir feature is enabled 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (19 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 16/46] xfs: allow bulkstat to return metadata directories Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 23/46] xfs_db: support metadata directories in the path command Darrick J. Wong ` (24 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When the metadata directory feature is enabled, mask the superblock fields (rt, quota inodes) that got migrated to the directory tree. Similarly, hide the 'metadirino' field when the feature is disabled. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/sb.c | 41 ++++++++++++++++++++++++++++++++++++----- 1 file changed, 36 insertions(+), 5 deletions(-) diff --git a/db/sb.c b/db/sb.c index 8a54ff7b00c..d7df55e02e9 100644 --- a/db/sb.c +++ b/db/sb.c @@ -50,6 +50,30 @@ sb_init(void) add_command(&version_cmd); } +/* + * Counts superblock fields that only exist when the metadata directory feature + * is enabled. + */ +static int +metadirino_count( + void *obj, + int startoff) +{ + return xfs_has_metadir(mp) ? 1 : 0; +} + +/* + * Counts superblock fields that only existed before the metadata directory + * feature came along. + */ +static int +rootino_count( + void *obj, + int startoff) +{ + return xfs_has_metadir(mp) ? 
0 : 1; +} + #define OFF(f) bitize(offsetof(struct xfs_dsb, sb_ ## f)) #define SZC(f) szcount(struct xfs_dsb, sb_ ## f) const field_t sb_flds[] = { @@ -61,8 +85,12 @@ const field_t sb_flds[] = { { "uuid", FLDT_UUID, OI(OFF(uuid)), C1, 0, TYP_NONE }, { "logstart", FLDT_DFSBNO, OI(OFF(logstart)), C1, 0, TYP_LOG }, { "rootino", FLDT_INO, OI(OFF(rootino)), C1, 0, TYP_INODE }, - { "rbmino", FLDT_INO, OI(OFF(rbmino)), C1, 0, TYP_INODE }, - { "rsumino", FLDT_INO, OI(OFF(rsumino)), C1, 0, TYP_INODE }, + { "metadirino", FLDT_INO, OI(OFF(rbmino)), metadirino_count, + FLD_COUNT, TYP_INODE }, + { "rbmino", FLDT_INO, OI(OFF(rbmino)), rootino_count, FLD_COUNT, + TYP_INODE }, + { "rsumino", FLDT_INO, OI(OFF(rsumino)), rootino_count, FLD_COUNT, + TYP_INODE }, { "rextsize", FLDT_AGBLOCK, OI(OFF(rextsize)), C1, 0, TYP_NONE }, { "agblocks", FLDT_AGBLOCK, OI(OFF(agblocks)), C1, 0, TYP_NONE }, { "agcount", FLDT_AGNUMBER, OI(OFF(agcount)), C1, 0, TYP_NONE }, @@ -85,8 +113,10 @@ const field_t sb_flds[] = { { "ifree", FLDT_UINT64D, OI(OFF(ifree)), C1, 0, TYP_NONE }, { "fdblocks", FLDT_UINT64D, OI(OFF(fdblocks)), C1, 0, TYP_NONE }, { "frextents", FLDT_UINT64D, OI(OFF(frextents)), C1, 0, TYP_NONE }, - { "uquotino", FLDT_INO, OI(OFF(uquotino)), C1, 0, TYP_INODE }, - { "gquotino", FLDT_INO, OI(OFF(gquotino)), C1, 0, TYP_INODE }, + { "uquotino", FLDT_INO, OI(OFF(uquotino)), rootino_count, FLD_COUNT, + TYP_INODE }, + { "gquotino", FLDT_INO, OI(OFF(gquotino)), rootino_count, FLD_COUNT, + TYP_INODE }, { "qflags", FLDT_UINT16X, OI(OFF(qflags)), C1, 0, TYP_NONE }, { "flags", FLDT_UINT8X, OI(OFF(flags)), C1, 0, TYP_NONE }, { "shared_vn", FLDT_UINT8D, OI(OFF(shared_vn)), C1, 0, TYP_NONE }, @@ -110,7 +140,8 @@ const field_t sb_flds[] = { C1, 0, TYP_NONE }, { "crc", FLDT_CRC, OI(OFF(crc)), C1, 0, TYP_NONE }, { "spino_align", FLDT_EXTLEN, OI(OFF(spino_align)), C1, 0, TYP_NONE }, - { "pquotino", FLDT_INO, OI(OFF(pquotino)), C1, 0, TYP_INODE }, + { "pquotino", FLDT_INO, OI(OFF(pquotino)), rootino_count, 
FLD_COUNT, + TYP_INODE }, { "lsn", FLDT_UINT64X, OI(OFF(lsn)), C1, 0, TYP_NONE }, { "meta_uuid", FLDT_UUID, OI(OFF(meta_uuid)), C1, 0, TYP_NONE }, { NULL } ^ permalink raw reply related [flat|nested] 565+ messages in thread
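The count-callback trick used in sb_flds can be modelled in a few lines: a field whose count function returns 0 is simply not shown. This sketch uses a made-up table and takes the feature state as a parameter instead of reading the global mount, so every name in it is hypothetical; only the field names come from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns how many instances of a field exist: 0 hides it, 1 shows it. */
typedef int (*demo_count_fn)(bool has_metadir);

struct demo_field {
	const char	*name;
	demo_count_fn	count;
};

static int demo_always(bool has_metadir) { (void)has_metadir; return 1; }
static int demo_metadir_only(bool has_metadir) { return has_metadir ? 1 : 0; }
static int demo_legacy_only(bool has_metadir) { return has_metadir ? 0 : 1; }

static const struct demo_field demo_sb_fields[] = {
	{ "rootino",	demo_always },
	{ "metadirino",	demo_metadir_only },
	{ "rbmino",	demo_legacy_only },
	{ "rsumino",	demo_legacy_only },
	{ "uquotino",	demo_legacy_only },
	{ "gquotino",	demo_legacy_only },
	{ "pquotino",	demo_legacy_only },
};

/* Collect the names of the visible fields; returns how many are visible. */
static size_t
demo_visible_fields(
	bool		has_metadir,
	const char	**names,
	size_t		max)
{
	size_t		i, n = 0;

	for (i = 0; i < sizeof(demo_sb_fields) / sizeof(demo_sb_fields[0]); i++) {
		if (demo_sb_fields[i].count(has_metadir) == 0)
			continue;
		if (n < max)
			names[n] = demo_sb_fields[i].name;
		n++;
	}
	return n;
}
```

With the feature on, only rootino and metadirino remain visible; with it off, metadirino disappears and the five legacy inode pointers come back.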
* [PATCH 23/46] xfs_db: support metadata directories in the path command 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (20 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 24/46] xfs_db: mask superblock fields when metadir feature is enabled Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 26/46] xfs_scrub: scan metadata directories during phase 3 Darrick J. Wong ` (23 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Teach the path and ls commands to traverse the metadata directory tree when passed the new -m option, which makes absolute paths start at the metadata directory root. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/namei.c | 43 +++++++++++++++++++++++++++++++++++++------ man/man8/xfs_db.8 | 11 +++++++++-- 2 files changed, 46 insertions(+), 8 deletions(-) diff --git a/db/namei.c b/db/namei.c index dc3cbbeda38..a3d917db5c6 100644 --- a/db/namei.c +++ b/db/namei.c @@ -139,11 +139,11 @@ path_navigate( /* Walk a directory path to an inode and set the io cursor to that inode. */ static int path_walk( + xfs_ino_t rootino, char *path) { struct dirpath *dirpath; char *p = path; - xfs_ino_t rootino = mp->m_sb.sb_rootino; int error = 0; if (*p == '/') { @@ -173,6 +173,9 @@ path_help(void) dbprintf(_( "\n" " Navigate to an inode via directory path.\n" +"\n" +" Options:\n" +" -m -- Walk an absolute path down the metadata directory tree.\n" )); } @@ -181,18 +184,34 @@ path_f( int argc, char **argv) { + xfs_ino_t rootino = mp->m_sb.sb_rootino; int c; int error; - while ((c = getopt(argc, argv, "")) != -1) { + while ((c = getopt(argc, argv, "m")) != -1) { switch (c) { + case 'm': + /* Absolute path, start from metadata rootdir. 
*/ + if (!xfs_has_metadir(mp)) { + dbprintf( + _("filesystem does not support metadata directories.\n")); + exitcode = 1; + return 0; + } + rootino = mp->m_sb.sb_metadirino; + break; default: path_help(); return 0; } } - error = path_walk(argv[optind]); + if (argc == optind || argc > optind + 1) { + dbprintf(_("Only supply one path.\n")); + return -1; + } + + error = path_walk(rootino, argv[optind]); if (error) { dbprintf("%s: %s\n", argv[optind], strerror(error)); exitcode = 1; @@ -206,7 +225,7 @@ static struct cmdinfo path_cmd = { .altname = NULL, .cfunc = path_f, .argmin = 1, - .argmax = 1, + .argmax = -1, .canpush = 0, .args = "", .help = path_help, @@ -521,6 +540,7 @@ ls_help(void) " Options:\n" " -i -- Resolve the given paths to their corresponding inode numbers.\n" " If no paths are given, display the current inode number.\n" +" -m -- Walk an absolute path down the metadata directory tree.\n" "\n" " Directory contents will be listed in the format:\n" " dir_cookie inode_number type hash name_length name\n" @@ -532,15 +552,26 @@ ls_f( int argc, char **argv) { + xfs_ino_t rootino = mp->m_sb.sb_rootino; bool inum_only = false; int c; int error = 0; - while ((c = getopt(argc, argv, "i")) != -1) { + while ((c = getopt(argc, argv, "im")) != -1) { switch (c) { case 'i': inum_only = true; break; + case 'm': + /* Absolute path, start from metadata rootdir. 
*/ + if (!xfs_has_metadir(mp)) { + dbprintf( + _("filesystem does not support metadata directories.\n")); + exitcode = 1; + return 0; + } + rootino = mp->m_sb.sb_metadirino; + break; default: ls_help(); return 0; @@ -563,7 +594,7 @@ ls_f( for (c = optind; c < argc; c++) { push_cur(); - error = path_walk(argv[c]); + error = path_walk(rootino, argv[c]); if (error) goto err_cur; diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8 index 43c7db5e225..a7e42e1a333 100644 --- a/man/man8/xfs_db.8 +++ b/man/man8/xfs_db.8 @@ -835,7 +835,7 @@ This makes it easier to find discrepancies in the reservation calculations between xfsprogs and the kernel, which will help when diagnosing minimum log size calculation errors. .TP -.BI "ls [\-i] [" paths "]..." +.BI "ls [\-im] [" paths "]..." List the contents of a directory. If a path resolves to a directory, the directory will be listed. If no paths are supplied and the IO cursor points at a directory inode, @@ -849,6 +849,9 @@ directory cookie, inode number, file type, hash, name length, name. Resolve each of the given paths to an inode number and print that number. If no paths are given and the IO cursor points to an inode, print the inode number. +.TP +.B \-m +Absolute paths should be walked from the root of the metadata directory tree. .RE .TP .BI "metadump [\-egow] " filename @@ -876,9 +879,13 @@ See the .B print command. .TP -.BI "path " dir_path +.BI "path [\-m] " dir_path Walk the directory tree to an inode using the supplied path. Absolute and relative paths are supported. +The +.B \-m +option causes absolute paths to be walked from the root of the metadata +directory tree. .TP .B pop Pop location from the stack. ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 26/46] xfs_scrub: scan metadata directories during phase 3 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (21 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 23/46] xfs_db: support metadata directories in the path command Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 25/46] xfs_io: support the bulkstat metadata directory flag Darrick J. Wong ` (22 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Scan metadata directories for correctness during phase 3. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- scrub/inodes.c | 5 +++++ scrub/inodes.h | 5 ++++- scrub/phase3.c | 7 ++++++- scrub/phase5.c | 2 +- scrub/phase6.c | 2 +- 5 files changed, 17 insertions(+), 4 deletions(-) diff --git a/scrub/inodes.c b/scrub/inodes.c index 78f0914b8d9..52d17c5c646 100644 --- a/scrub/inodes.c +++ b/scrub/inodes.c @@ -100,6 +100,7 @@ struct scan_inodes { scrub_inode_iter_fn fn; void *arg; unsigned int nr_threads; + unsigned int flags; bool aborted; }; @@ -158,6 +159,8 @@ alloc_ichunk( breq = ichunk_to_bulkstat(ichunk); breq->hdr.icount = LIBFROG_BULKSTAT_CHUNKSIZE; + if (si->flags & SCRUB_SCAN_METADIR) + breq->hdr.flags |= XFS_BULK_IREQ_METADIR; *ichunkp = ichunk; return 0; @@ -380,10 +383,12 @@ int scrub_scan_all_inodes( struct scrub_ctx *ctx, scrub_inode_iter_fn fn, + unsigned int flags, void *arg) { struct scan_inodes si = { .fn = fn, + .flags = flags, .arg = arg, .nr_threads = scrub_nproc_workqueue(ctx), }; diff --git a/scrub/inodes.h b/scrub/inodes.h index f03180458ab..d99eaf0a2a7 100644 --- a/scrub/inodes.h +++ b/scrub/inodes.h @@ -17,8 +17,11 @@ typedef int (*scrub_inode_iter_fn)(struct scrub_ctx *ctx, struct xfs_handle *handle, struct xfs_bulkstat *bs, void *arg); +/* Return metadata directories too. 
*/ +#define SCRUB_SCAN_METADIR (1 << 0) + int scrub_scan_all_inodes(struct scrub_ctx *ctx, scrub_inode_iter_fn fn, - void *arg); + unsigned int flags, void *arg); int scrub_open_handle(struct xfs_handle *handle); diff --git a/scrub/phase3.c b/scrub/phase3.c index c5950b1b9e3..56a4385a408 100644 --- a/scrub/phase3.c +++ b/scrub/phase3.c @@ -247,6 +247,7 @@ phase3_func( struct scrub_inode_ctx ictx = { .ctx = ctx }; uint64_t val; xfs_agnumber_t agno; + unsigned int scan_flags = 0; int err; err = -ptvar_alloc(scrub_nproc(ctx), sizeof(struct action_list), @@ -263,6 +264,10 @@ phase3_func( goto out_ptvar; } + /* Scan the metadata directory tree too. */ + if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_METADIR) + scan_flags |= SCRUB_SCAN_METADIR; + /* * If we already have ag/fs metadata to repair from previous phases, * we would rather not try to repair file metadata until we've tried @@ -273,7 +278,7 @@ phase3_func( ictx.always_defer_repairs = true; } - err = scrub_scan_all_inodes(ctx, scrub_inode, &ictx); + err = scrub_scan_all_inodes(ctx, scrub_inode, scan_flags, &ictx); if (!err && ictx.aborted) err = ECANCELED; if (err) diff --git a/scrub/phase5.c b/scrub/phase5.c index 96e13ac423f..e6786b4f25c 100644 --- a/scrub/phase5.c +++ b/scrub/phase5.c @@ -532,7 +532,7 @@ _("Filesystem has errors, skipping connectivity checks.")); if (ret) return ret; - ret = scrub_scan_all_inodes(ctx, check_inode_names, &aborted); + ret = scrub_scan_all_inodes(ctx, check_inode_names, 0, &aborted); if (ret) return ret; if (aborted) diff --git a/scrub/phase6.c b/scrub/phase6.c index 1a2643bdaf0..fb7cd3f13ea 100644 --- a/scrub/phase6.c +++ b/scrub/phase6.c @@ -507,7 +507,7 @@ report_all_media_errors( return ret; /* Scan for unlinked files. */ - return scrub_scan_all_inodes(ctx, report_inode_loss, vs); + return scrub_scan_all_inodes(ctx, report_inode_loss, 0, vs); } /* Schedule a read-verify of a (data block) extent. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 25/46] xfs_io: support the bulkstat metadata directory flag 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (22 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 26/46] xfs_scrub: scan metadata directories during phase 3 Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 21/46] xfs_db: report metadir support for version command Darrick J. Wong ` (21 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Support the new XFS_BULK_IREQ_METADIR flag for bulkstat commands. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- io/bulkstat.c | 16 +++++++++++++++- man/man8/xfs_io.8 | 10 +++++++--- 2 files changed, 22 insertions(+), 4 deletions(-) diff --git a/io/bulkstat.c b/io/bulkstat.c index a9ad87ca183..829f6a02515 100644 --- a/io/bulkstat.c +++ b/io/bulkstat.c @@ -70,6 +70,7 @@ bulkstat_help(void) " -d Print debugging output.\n" " -q Be quiet, no output.\n" " -e <ino> Stop after this inode.\n" +" -m Include metadata directories.\n" " -n <nr> Ask for this many results at once.\n" " -s <ino> Inode to start with.\n" " -v <ver> Use this version of the ioctl (1 or 5).\n")); @@ -107,11 +108,12 @@ bulkstat_f( bool has_agno = false; bool debug = false; bool quiet = false; + bool metadir = false; unsigned int i; int c; int ret; - while ((c = getopt(argc, argv, "a:de:n:qs:v:")) != -1) { + while ((c = getopt(argc, argv, "a:de:mn:qs:v:")) != -1) { switch (c) { case 'a': agno = cvt_u32(optarg, 10); @@ -131,6 +133,9 @@ bulkstat_f( return 1; } break; + case 'm': + metadir = true; + break; case 'n': batch_size = cvt_u32(optarg, 10); if (errno) { @@ -185,6 +190,8 @@ bulkstat_f( if (has_agno) xfrog_bulkstat_set_ag(breq, agno); + if (metadir) + breq->hdr.flags |= XFS_BULK_IREQ_METADIR; set_xfd_flags(&xfd, ver); @@ -253,6 +260,7 @@ bulkstat_single_f( 
unsigned long ver = 0; unsigned int i; bool debug = false; + bool metadir = false; int c; int ret; @@ -261,6 +269,9 @@ bulkstat_single_f( case 'd': debug = true; break; + case 'm': + metadir = true; + break; case 'v': errno = 0; ver = strtoull(optarg, NULL, 10); @@ -313,6 +324,9 @@ bulkstat_single_f( } } + if (metadir) + flags |= XFS_BULK_IREQ_METADIR; + ret = -xfrog_bulkstat_single(&xfd, ino, flags, &bulkstat); if (ret) { xfrog_perror(ret, "xfrog_bulkstat_single"); diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8 index d531cabc3ef..0c0b00b5712 100644 --- a/man/man8/xfs_io.8 +++ b/man/man8/xfs_io.8 @@ -1228,7 +1228,7 @@ for the current memory mapping. .SH FILESYSTEM COMMANDS .TP -.BI "bulkstat [ \-a " agno " ] [ \-d ] [ \-e " endino " ] [ \-n " batchsize " ] [ \-q ] [ \-s " startino " ] [ \-v " version" ] +.BI "bulkstat [ \-a " agno " ] [ \-d ] [ \-e " endino " ] [ \-m ] [ \-n " batchsize " ] [ \-q ] [ \-s " startino " ] [ \-v " version" ] Display raw stat information about a bunch of inodes in an XFS filesystem. Options are as follows: .RS 1.0i @@ -1245,6 +1245,9 @@ Print debugging information about call results. Stop displaying records when this inode number is reached. Defaults to stopping when the system call stops returning results. .TP +.BI \-m +Include metadata directories in the output. +.TP .BI \-n " batchsize" Retrieve at most this many records per call. Defaults to 4,096. @@ -1265,10 +1268,11 @@ Currently supported versions are 1 and 5. .RE .PD .TP -.BI "bulkstat_single [ \-d ] [ \-v " version " ] [ " inum... " | " special... " ] +.BI "bulkstat_single [ \-d ] [ \-m ] [ \-v " version " ] [ " inum... " | " special... " ] Display raw stat information about individual inodes in an XFS filesystem. The -.B \-d +.BR \-d , +.BR \-m , and .B \-v options are the same as the ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 21/46] xfs_db: report metadir support for version command 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (23 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 25/46] xfs_io: support the bulkstat metadata directory flag Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 22/46] xfs_db: don't obfuscate metadata directories and attributes Darrick J. Wong ` (20 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Report metadir support if we have it enabled. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/inode.c | 3 +++ db/sb.c | 2 ++ 2 files changed, 5 insertions(+) diff --git a/db/inode.c b/db/inode.c index c9b506b905d..4c2fd19f446 100644 --- a/db/inode.c +++ b/db/inode.c @@ -207,6 +207,9 @@ const field_t inode_v3_flds[] = { { "nrext64", FLDT_UINT1, OI(COFF(flags2) + bitsz(uint64_t) - XFS_DIFLAG2_NREXT64_BIT - 1), C1, 0, TYP_NONE }, + { "metadata", FLDT_UINT1, + OI(COFF(flags2) + bitsz(uint64_t) - XFS_DIFLAG2_METADATA_BIT-1), C1, + 0, TYP_NONE }, { NULL } }; diff --git a/db/sb.c b/db/sb.c index 095c59596a4..8a54ff7b00c 100644 --- a/db/sb.c +++ b/db/sb.c @@ -707,6 +707,8 @@ version_string( strcat(s, ",NEEDSREPAIR"); if (xfs_has_large_extent_counts(mp)) strcat(s, ",NREXT64"); + if (xfs_has_metadir(mp)) + strcat(s, ",METADIR"); return s; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
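Note on the db/inode.c hunk above: the offset expression bitsz(uint64_t) - XFS_DIFLAG2_METADATA_BIT - 1 follows the pattern already used for the other flags2 fields. The sketch below shows that arithmetic in isolation, assuming the flags word is stored big-endian and that field offsets are counted from the most significant bit of the first byte. The helper names and the sample bit number used with them are invented for illustration and are not part of xfsprogs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of xfsprogs: convert a bit number counted
 * from the least significant bit into an offset counted from the most
 * significant bit of a big-endian field that is "width" bits wide.
 */
static unsigned int
msb_bit_offset(unsigned int width, unsigned int bit)
{
	assert(bit < width);
	return width - bit - 1;
}

/* Test the bit that sits "bitoff" bits from the start of a byte buffer. */
static bool
test_bit_at(const uint8_t *buf, unsigned int bitoff)
{
	return (buf[bitoff / 8] >> (7 - (bitoff % 8))) & 1;
}

/* Store a 64-bit value in big-endian byte order. */
static void
put_be64(uint8_t *buf, uint64_t val)
{
	int i;

	for (i = 0; i < 8; i++)
		buf[i] = (uint8_t)(val >> (56 - 8 * i));
}
```

With a flag at (assumed) bit 5, put_be64() places it in the last byte of the buffer, and msb_bit_offset(64, 5) yields 58, which is where test_bit_at() finds it.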
* [PATCH 22/46] xfs_db: don't obfuscate metadata directories and attributes 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (24 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 21/46] xfs_db: report metadir support for version command Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 27/46] xfs_repair: don't zero the incore secondary super when zeroing Darrick J. Wong ` (19 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Don't obfuscate the directory and attribute names of metadata inodes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/metadump.c | 92 ++++++++++++++++++++++++++++++++++++++++----------------- 1 file changed, 64 insertions(+), 28 deletions(-) diff --git a/db/metadump.c b/db/metadump.c index 27d1df43279..996c97ca6a2 100644 --- a/db/metadump.c +++ b/db/metadump.c @@ -1234,7 +1234,8 @@ generate_obfuscated_name( static void process_sf_dir( - struct xfs_dinode *dip) + struct xfs_dinode *dip, + bool is_meta) { struct xfs_dir2_sf_hdr *sfp; xfs_dir2_sf_entry_t *sfep; @@ -1280,7 +1281,7 @@ process_sf_dir( (char *)sfp); } - if (obfuscate) + if (obfuscate && !is_meta) generate_obfuscated_name( libxfs_dir2_sf_get_ino(mp, sfp, sfep), namelen, &sfep->name[0]); @@ -1363,7 +1364,8 @@ process_sf_symlink( static void process_sf_attr( - struct xfs_dinode *dip) + struct xfs_dinode *dip, + bool is_meta) { /* * with extended attributes, obfuscate the names and fill the actual @@ -1406,7 +1408,7 @@ process_sf_attr( break; } - if (obfuscate) { + if (obfuscate && !is_meta) { generate_obfuscated_name(0, asfep->namelen, &asfep->nameval[0]); memset(&asfep->nameval[asfep->namelen], 'v', @@ -1509,7 +1511,8 @@ static void process_dir_data_block( char *block, xfs_fileoff_t offset, - int is_block_format) + int is_block_format, + bool is_meta) 
{ /* * we have to rely on the fileoffset and signature of the block to @@ -1616,7 +1619,7 @@ process_dir_data_block( dir_offset) return; - if (obfuscate) + if (obfuscate && !is_meta) generate_obfuscated_name(be64_to_cpu(dep->inumber), dep->namelen, &dep->name[0]); dir_offset += length; @@ -1641,7 +1644,8 @@ process_symlink_block( xfs_fsblock_t s, xfs_filblks_t c, typnm_t btype, - xfs_fileoff_t last) + xfs_fileoff_t last, + bool is_meta) { struct bbmap map; char *link; @@ -1666,7 +1670,7 @@ process_symlink_block( if (xfs_has_crc((mp))) link += sizeof(struct xfs_dsymlink_hdr); - if (obfuscate) + if (obfuscate && !is_meta) obfuscate_path_components(link, XFS_SYMLINK_BUF_SPACE(mp, mp->m_sb.sb_blocksize)); if (zero_stale_data) { @@ -1717,7 +1721,8 @@ add_remote_vals( static void process_attr_block( char *block, - xfs_fileoff_t offset) + xfs_fileoff_t offset, + bool is_meta) { struct xfs_attr_leafblock *leaf; struct xfs_attr3_icleaf_hdr hdr; @@ -1785,7 +1790,7 @@ process_attr_block( (long long)cur_ino); break; } - if (obfuscate) { + if (obfuscate && !is_meta) { generate_obfuscated_name(0, local->namelen, &local->nameval[0]); memset(&local->nameval[local->namelen], 'v', @@ -1808,7 +1813,7 @@ process_attr_block( (long long)cur_ino); break; } - if (obfuscate) { + if (obfuscate && !is_meta) { generate_obfuscated_name(0, remote->namelen, &remote->name[0]); add_remote_vals(be32_to_cpu(remote->valueblk), @@ -1841,7 +1846,8 @@ process_single_fsb_objects( xfs_fsblock_t s, xfs_filblks_t c, typnm_t btype, - xfs_fileoff_t last) + xfs_fileoff_t last, + bool is_meta) { int rval = 1; char *dp; @@ -1911,12 +1917,13 @@ process_single_fsb_objects( process_dir_leaf_block(dp); } else { process_dir_data_block(dp, o, - last == mp->m_dir_geo->fsbcount); + last == mp->m_dir_geo->fsbcount, + is_meta); } iocur_top->need_crc = 1; break; case TYP_ATTR: - process_attr_block(dp, o); + process_attr_block(dp, o, is_meta); iocur_top->need_crc = 1; break; default: @@ -1949,7 +1956,8 @@ 
process_multi_fsb_dir( xfs_fsblock_t s, xfs_filblks_t c, typnm_t btype, - xfs_fileoff_t last) + xfs_fileoff_t last, + bool is_meta) { char *dp; int rval = 1; @@ -1993,7 +2001,8 @@ process_multi_fsb_dir( process_dir_leaf_block(dp); } else { process_dir_data_block(dp, o, - last == mp->m_dir_geo->fsbcount); + last == mp->m_dir_geo->fsbcount, + is_meta); } iocur_top->need_crc = 1; write: @@ -2030,13 +2039,14 @@ process_multi_fsb_objects( xfs_fsblock_t s, xfs_filblks_t c, typnm_t btype, - xfs_fileoff_t last) + xfs_fileoff_t last, + bool is_meta) { switch (btype) { case TYP_DIR2: - return process_multi_fsb_dir(o, s, c, btype, last); + return process_multi_fsb_dir(o, s, c, btype, last, is_meta); case TYP_SYMLINK: - return process_symlink_block(o, s, c, btype, last); + return process_symlink_block(o, s, c, btype, last, is_meta); default: print_warning("bad type for multi-fsb object %d", btype); return 1; @@ -2048,7 +2058,8 @@ static int process_bmbt_reclist( xfs_bmbt_rec_t *rp, int numrecs, - typnm_t btype) + typnm_t btype, + bool is_meta) { int i; xfs_fileoff_t o, op = NULLFILEOFF; @@ -2124,10 +2135,10 @@ process_bmbt_reclist( /* multi-extent blocks require special handling */ if (is_multi_fsb) rval = process_multi_fsb_objects(o, s, c, btype, - last); + last, is_meta); else rval = process_single_fsb_objects(o, s, c, btype, - last); + last, is_meta); if (!rval) break; } @@ -2135,6 +2146,11 @@ process_bmbt_reclist( return rval; } +struct scan_bmap { + enum typnm typ; + bool is_meta; +}; + static int scanfunc_bmap( struct xfs_btree_block *block, @@ -2144,6 +2160,7 @@ scanfunc_bmap( typnm_t btype, void *arg) /* ptr to itype */ { + struct scan_bmap *sbm = arg; int i; xfs_bmbt_ptr_t *pp; int nrecs; @@ -2159,7 +2176,7 @@ scanfunc_bmap( return 1; } return process_bmbt_reclist(XFS_BMBT_REC_ADDR(mp, block, 1), - nrecs, *(typnm_t*)arg); + nrecs, sbm->typ, sbm->is_meta); } if (nrecs > mp->m_bmap_dmxr[1]) { @@ -2191,6 +2208,15 @@ scanfunc_bmap( return 1; } +static inline bool 
+is_metadata_ino( + struct xfs_dinode *dip) +{ + return xfs_has_metadir(mp) && + dip->di_version >= 3 && + (dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA)); +} + static int process_btinode( struct xfs_dinode *dip, @@ -2204,6 +2230,7 @@ process_btinode( int maxrecs; int whichfork; typnm_t btype; + bool is_meta = is_metadata_ino(dip); whichfork = (itype == TYP_ATTR) ? XFS_ATTR_FORK : XFS_DATA_FORK; btype = (itype == TYP_ATTR) ? TYP_BMAPBTA : TYP_BMAPBTD; @@ -2222,7 +2249,7 @@ process_btinode( if (level == 0) { return process_bmbt_reclist(XFS_BMDR_REC_ADDR(dib, 1), - nrecs, itype); + nrecs, itype, is_meta); } maxrecs = libxfs_bmdr_maxrecs(XFS_DFORK_SIZE(dip, mp, whichfork), 0); @@ -2249,6 +2276,10 @@ process_btinode( } for (i = 0; i < nrecs; i++) { + struct scan_bmap sbm = { + .typ = itype, + .is_meta = is_meta, + }; xfs_agnumber_t ag; xfs_agblock_t bno; @@ -2265,7 +2296,7 @@ process_btinode( continue; } - if (!scan_btree(ag, bno, level, btype, &itype, scanfunc_bmap)) + if (!scan_btree(ag, bno, level, btype, &sbm, scanfunc_bmap)) return 0; } return 1; @@ -2279,6 +2310,7 @@ process_exinode( int whichfork; int used; xfs_extnum_t nex, max_nex; + bool is_meta = is_metadata_ino(dip); whichfork = (itype == TYP_ATTR) ? 
XFS_ATTR_FORK : XFS_DATA_FORK; @@ -2301,7 +2333,7 @@ process_exinode( return process_bmbt_reclist((xfs_bmbt_rec_t *)XFS_DFORK_PTR(dip, - whichfork), nex, itype); + whichfork), nex, itype, is_meta); } static int @@ -2309,6 +2341,8 @@ process_inode_data( struct xfs_dinode *dip, typnm_t itype) { + bool is_meta = is_metadata_ino(dip); + switch (dip->di_format) { case XFS_DINODE_FMT_LOCAL: if (!(obfuscate || zero_stale_data)) @@ -2329,7 +2363,7 @@ process_inode_data( switch (itype) { case TYP_DIR2: - process_sf_dir(dip); + process_sf_dir(dip, is_meta); break; case TYP_SYMLINK: @@ -2447,12 +2481,14 @@ process_inode( /* copy extended attributes if they exist and forkoff is valid */ if (XFS_DFORK_DSIZE(dip, mp) < XFS_LITINO(mp)) { + bool is_meta = is_metadata_ino(dip); + attr_data.remote_val_count = 0; switch (dip->di_aformat) { case XFS_DINODE_FMT_LOCAL: need_new_crc = 1; if (obfuscate || zero_stale_data) - process_sf_attr(dip); + process_sf_attr(dip, is_meta); break; case XFS_DINODE_FMT_EXTENTS: ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 27/46] xfs_repair: don't zero the incore secondary super when zeroing 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (25 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 22/46] xfs_db: don't obfuscate metadata directories and attributes Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 28/46] xfs_repair: refactor metadata inode tagging Darrick J. Wong ` (18 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> If secondary_sb_whack detects nonzero bytes beyond the end of the ondisk superblock, it will try to zero the end of the ondisk buffer as well as the incore superblock prior to scan_ag using that incore super to rewrite the ondisk super. However, the metadata directory feature adds a sb_metadirino field to the incore super. On disk, this is stored in the same slot as sb_rbmino, but we wanted to cache both inumbers incore to minimize the churn. Therefore, it is now only safe to zero the "end" of an xfs_dsb buffer, and never an xfs_sb object. Most of the XFS codebase moved off that second behavior long ago, with the exception of this one part of repair. The zeroing probably ought to be turned into explicit logic to zero fields that weren't defined with the featureset encoded in the primary superblock, but for now we'll resort to always resetting the values from the xfs_mount's xfs_sb. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- repair/agheader.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/repair/agheader.c b/repair/agheader.c index 3930a0ac091..af88802ffdf 100644 --- a/repair/agheader.c +++ b/repair/agheader.c @@ -405,6 +405,13 @@ secondary_sb_whack( mp->m_sb.sb_sectsize - size); /* Preserve meta_uuid so we don't fail uuid checks */ memcpy(&sb->sb_meta_uuid, &tmpuuid, sizeof(uuid_t)); + + /* + * Preserve the parts of the incore super that extend + * beyond the part that's supposed to match the ondisk + * super byte for byte. + */ + sb->sb_metadirino = mp->m_sb.sb_metadirino; } else do_warn( _("would zero unused portion of %s superblock (AG #%u)\n"), ^ permalink raw reply related [flat|nested] 565+ messages in thread
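Note on the agheader.c change above: the hazard is that zeroing the tail of an incore structure also wipes fields that have no ondisk counterpart. The sketch below reproduces the problem and the fix on a toy structure. The struct layout and names are hypothetical and far simpler than the real xfs_sb; only the zero-then-restore pattern is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical incore superblock: the leading fields mirror the ondisk
 * layout, while ex_metadirino exists only in memory.
 */
struct ex_sb {
	uint64_t	ex_rootino;
	uint64_t	ex_rbmino;
	uint64_t	ex_metadirino;	/* incore only */
};

/*
 * Zero everything in "sb" from byte "size" onward, then restore the
 * incore-only field from the trusted primary copy.
 */
static void
ex_zero_tail(struct ex_sb *sb, const struct ex_sb *primary, size_t size)
{
	assert(size <= sizeof(*sb));

	memset((char *)sb + size, 0, sizeof(*sb) - size);
	sb->ex_metadirino = primary->ex_metadirino;
}
```

Without the final assignment, the memset would leave ex_metadirino at zero and any later code that trusts the incore copy would see a bogus inode number.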
* [PATCH 28/46] xfs_repair: refactor metadata inode tagging 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (26 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 27/46] xfs_repair: don't zero the incore secondary super when zeroing Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 33/46] xfs_repair: check metadata inode flag Darrick J. Wong ` (17 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Refactor tagging of metadata inodes into a single helper function instead of open-coding an if-else statement. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dir2.c | 52 ++++++++++++++++++++++++++++------------------------ 1 file changed, 28 insertions(+), 24 deletions(-) diff --git a/repair/dir2.c b/repair/dir2.c index 022b61b885f..24d0dd84aaf 100644 --- a/repair/dir2.c +++ b/repair/dir2.c @@ -136,6 +136,31 @@ process_sf_dir2_fixoff( } } +static bool +is_meta_ino( + struct xfs_mount *mp, + xfs_ino_t dirino, + xfs_ino_t lino, + char **junkreason) +{ + char *reason = NULL; + + if (lino == mp->m_sb.sb_rbmino) + reason = _("realtime bitmap"); + else if (lino == mp->m_sb.sb_rsumino) + reason = _("realtime summary"); + else if (lino == mp->m_sb.sb_uquotino) + reason = _("user quota"); + else if (lino == mp->m_sb.sb_gquotino) + reason = _("group quota"); + else if (lino == mp->m_sb.sb_pquotino) + reason = _("project quota"); + + if (reason) + *junkreason = reason; + return reason != NULL; +} + /* * this routine performs inode discovery and tries to fix things * in place. 
available redundancy -- inode data size should match @@ -227,21 +252,8 @@ process_sf_dir2( } else if (!libxfs_verify_dir_ino(mp, lino)) { junkit = 1; junkreason = _("invalid"); - } else if (lino == mp->m_sb.sb_rbmino) { + } else if (is_meta_ino(mp, ino, lino, &junkreason)) { junkit = 1; - junkreason = _("realtime bitmap"); - } else if (lino == mp->m_sb.sb_rsumino) { - junkit = 1; - junkreason = _("realtime summary"); - } else if (lino == mp->m_sb.sb_uquotino) { - junkit = 1; - junkreason = _("user quota"); - } else if (lino == mp->m_sb.sb_gquotino) { - junkit = 1; - junkreason = _("group quota"); - } else if (lino == mp->m_sb.sb_pquotino) { - junkit = 1; - junkreason = _("project quota"); } else if ((irec_p = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, lino), XFS_INO_TO_AGINO(mp, lino))) != NULL) { @@ -698,16 +710,8 @@ process_dir2_data( * directory since it's still structurally intact. */ clearreason = _("invalid"); - } else if (ent_ino == mp->m_sb.sb_rbmino) { - clearreason = _("realtime bitmap"); - } else if (ent_ino == mp->m_sb.sb_rsumino) { - clearreason = _("realtime summary"); - } else if (ent_ino == mp->m_sb.sb_uquotino) { - clearreason = _("user quota"); - } else if (ent_ino == mp->m_sb.sb_gquotino) { - clearreason = _("group quota"); - } else if (ent_ino == mp->m_sb.sb_pquotino) { - clearreason = _("project quota"); + } else if (is_meta_ino(mp, ino, ent_ino, &clearreason)) { + /* empty */ } else { irec_p = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, ent_ino), ^ permalink raw reply related [flat|nested] 565+ messages in thread
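Note on the dir2.c refactoring above: the helper returns a boolean and reports the reason through an out-parameter, so both call sites collapse to a single branch. A reduced standalone sketch of that calling convention follows, assuming a made-up table of reserved inode numbers. None of the ex_ names exist in xfs_repair.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table of reserved inode numbers. */
struct ex_reserved {
	uint64_t	rbmino;
	uint64_t	rsumino;
	uint64_t	uquotino;
};

/*
 * Return true and set *reason if "ino" is one of the reserved inodes.
 * *reason is left untouched otherwise, so a caller can keep a reason
 * that an earlier check already assigned.
 */
static bool
ex_is_meta_ino(const struct ex_reserved *r, uint64_t ino, const char **reason)
{
	const char *why = NULL;

	assert(reason != NULL);

	if (ino == r->rbmino)
		why = "realtime bitmap";
	else if (ino == r->rsumino)
		why = "realtime summary";
	else if (ino == r->uquotino)
		why = "user quota";

	if (why)
		*reason = why;
	return why != NULL;
}
```

A caller in the style of process_sf_dir2() then only needs "else if (ex_is_meta_ino(&r, ino, &junkreason)) junkit = 1;" in place of one branch per reserved inode.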
* [PATCH 33/46] xfs_repair: check metadata inode flag 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (27 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 28/46] xfs_repair: refactor metadata inode tagging Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 29/46] xfs_repair: refactor fixing dotdot Darrick J. Wong ` (16 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Check whether or not the metadata inode flag is set appropriately. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 1 + repair/dinode.c | 14 ++++++++++++++ 2 files changed, 15 insertions(+) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 785354d3ec8..65fa90c8a2f 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -95,6 +95,7 @@ #define xfs_dinode_calc_crc libxfs_dinode_calc_crc #define xfs_dinode_good_version libxfs_dinode_good_version #define xfs_dinode_verify libxfs_dinode_verify +#define xfs_dinode_verify_metaflag libxfs_dinode_verify_metaflag #define xfs_dir2_data_bestfree_p libxfs_dir2_data_bestfree_p #define xfs_dir2_data_entry_tag_p libxfs_dir2_data_entry_tag_p diff --git a/repair/dinode.c b/repair/dinode.c index ee34a62ae8b..cf517f77173 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -2662,6 +2662,20 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"), } } + if (flags2 & XFS_DIFLAG2_METADATA) { + xfs_failaddr_t fa; + + fa = libxfs_dinode_verify_metaflag(mp, dino, di_mode, + be16_to_cpu(dino->di_flags), flags2); + if (fa) { + if (!uncertain) + do_warn( + _("inode %" PRIu64 " is incorrectly marked as metadata\n"), + lino); + goto clear_bad_out; + } + } + if ((flags2 & XFS_DIFLAG2_REFLINK) && !xfs_has_reflink(mp)) { if (!uncertain) { ^ permalink raw reply related [flat|nested] 
565+ messages in thread
* [PATCH 29/46] xfs_repair: refactor fixing dotdot 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (28 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 33/46] xfs_repair: check metadata inode flag Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 34/46] xfs_repair: rebuild the metadata directory Darrick J. Wong ` (15 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Pull the code that fixes a directory's dot-dot entry into a separate helper function so that we can call it on the rootdir and (later) the metadir. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/phase6.c | 96 +++++++++++++++++++++++++++++++++---------------------- 1 file changed, 57 insertions(+), 39 deletions(-) diff --git a/repair/phase6.c b/repair/phase6.c index 90413251b56..053e3ace8ee 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -2711,6 +2711,62 @@ _("entry \"%s\" (ino %" PRIu64 ") in dir %" PRIu64 " is a duplicate name"), } } +/* + * If we have to create a .. for /, do it now *before* we delete the bogus + * entries, otherwise the directory could transform into a shortform dir which + * would probably cause the simulation to choke. Even if the illegal entries + * get shifted around, it's ok because the entries are structurally intact and + * in in hash-value order so the simulation won't get confused if it has to + * move them around. + */ +static void +fix_dotdot( + struct xfs_mount *mp, + xfs_ino_t ino, + struct xfs_inode *ip, + xfs_ino_t rootino, + const char *tag, + int *need_dotdot) +{ + struct xfs_trans *tp; + int nres; + int error; + + if (ino != rootino || !*need_dotdot) + return; + + if (no_modify) { + do_warn(_("would recreate %s directory .. 
entry\n"), tag); + return; + } + + ASSERT(ip->i_df.if_format != XFS_DINODE_FMT_LOCAL); + + do_warn(_("recreating %s directory .. entry\n"), tag); + + nres = XFS_MKDIR_SPACE_RES(mp, 2); + error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_mkdir, nres, 0, 0, &tp); + if (error) + res_failed(error); + + libxfs_trans_ijoin(tp, ip, 0); + + error = -libxfs_dir_createname(tp, ip, &xfs_name_dotdot, ip->i_ino, + nres); + if (error) + do_error( +_("can't make \"..\" entry in %s inode %" PRIu64 ", createname error %d\n"), + tag ,ino, error); + + libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + error = -libxfs_trans_commit(tp); + if (error) + do_error( +_("%s inode \"..\" entry recreation failed (%d)\n"), tag, error); + + *need_dotdot = 0; +} + /* * processes all reachable inodes in directories */ @@ -2839,45 +2895,7 @@ _("error %d fixing shortform directory %llu\n"), } dir_hash_done(hashtab); - /* - * if we have to create a .. for /, do it now *before* - * we delete the bogus entries, otherwise the directory - * could transform into a shortform dir which would - * probably cause the simulation to choke. Even - * if the illegal entries get shifted around, it's ok - * because the entries are structurally intact and in - * in hash-value order so the simulation won't get confused - * if it has to move them around. - */ - if (!no_modify && need_root_dotdot && ino == mp->m_sb.sb_rootino) { - ASSERT(ip->i_df.if_format != XFS_DINODE_FMT_LOCAL); - - do_warn(_("recreating root directory .. 
entry\n")); - - nres = XFS_MKDIR_SPACE_RES(mp, 2); - error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_mkdir, - nres, 0, 0, &tp); - if (error) - res_failed(error); - - libxfs_trans_ijoin(tp, ip, 0); - - error = -libxfs_dir_createname(tp, ip, &xfs_name_dotdot, - ip->i_ino, nres); - if (error) - do_error( - _("can't make \"..\" entry in root inode %" PRIu64 ", createname error %d\n"), ino, error); - - libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); - error = -libxfs_trans_commit(tp); - if (error) - do_error( - _("root inode \"..\" entry recreation failed (%d)\n"), error); - - need_root_dotdot = 0; - } else if (need_root_dotdot && ino == mp->m_sb.sb_rootino) { - do_warn(_("would recreate root directory .. entry\n")); - } + fix_dotdot(mp, ino, ip, mp->m_sb.sb_rootino, "root", &need_root_dotdot); /* * if we need to create the '.' entry, do so only if ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 34/46] xfs_repair: rebuild the metadata directory 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (29 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 29/46] xfs_repair: refactor fixing dotdot Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 31/46] xfs_repair: refactor root directory initialization Darrick J. Wong ` (14 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Check the metadata directory for problems and rebuild it if necessary. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 1 repair/dino_chunks.c | 12 ++ repair/dir2.c | 25 ++++ repair/globals.c | 3 repair/globals.h | 3 repair/phase1.c | 2 repair/phase2.c | 7 + repair/phase4.c | 16 +++ repair/phase6.c | 280 +++++++++++++++++++++++++++++++++++++++++++++- repair/sb.c | 3 repair/xfs_repair.c | 71 +++++++++++- 11 files changed, 410 insertions(+), 13 deletions(-) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 65fa90c8a2f..494172b213b 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -167,6 +167,7 @@ #define xfs_imeta_link libxfs_imeta_link #define xfs_imeta_lookup libxfs_imeta_lookup #define xfs_imeta_mount libxfs_imeta_mount +#define xfs_imeta_set_metaflag libxfs_imeta_set_metaflag #define xfs_imeta_start_update libxfs_imeta_start_update #define xfs_imeta_unlink libxfs_imeta_unlink #define xfs_imeta_unlink_space_res libxfs_imeta_unlink_space_res diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c index 5c7799b1888..3de0c24b1d8 100644 --- a/repair/dino_chunks.c +++ b/repair/dino_chunks.c @@ -934,6 +934,18 @@ process_inode_chunk( _("would clear root inode %" PRIu64 "\n"), ino); } + } else if (mp->m_sb.sb_metadirino == ino) { + need_metadir_inode = true; + + if (!no_modify) { + do_warn( + 
_("cleared metadata directory %" PRIu64 "\n"), + ino); + } else { + do_warn( + _("would clear metadata directory %" PRIu64 "\n"), + ino); + } } else if (mp->m_sb.sb_rbmino == ino) { need_rbmino = 1; diff --git a/repair/dir2.c b/repair/dir2.c index 24d0dd84aaf..e1fb195df34 100644 --- a/repair/dir2.c +++ b/repair/dir2.c @@ -145,6 +145,10 @@ is_meta_ino( { char *reason = NULL; + /* in metadir land we don't have static metadata inodes anymore */ + if (xfs_has_metadir(mp)) + return false; + if (lino == mp->m_sb.sb_rbmino) reason = _("realtime bitmap"); else if (lino == mp->m_sb.sb_rsumino) @@ -156,6 +160,16 @@ is_meta_ino( else if (lino == mp->m_sb.sb_pquotino) reason = _("project quota"); + if (xfs_has_metadir(mp) && + dirino == mp->m_sb.sb_metadirino) { + if (reason == NULL) { + /* no regular files in the metadir */ + *junkreason = _("non-metadata inode"); + return true; + } + return false; + } + if (reason) *junkreason = reason; return reason != NULL; @@ -547,7 +561,8 @@ _("corrected root directory %" PRIu64 " .. entry, was %" PRIu64 ", now %" PRIu64 _("would have corrected root directory %" PRIu64 " .. entry from %" PRIu64" to %" PRIu64 "\n"), ino, *parent, ino); } - } else if (ino == *parent && ino != mp->m_sb.sb_rootino) { + } else if (ino == *parent && ino != mp->m_sb.sb_rootino && + ino != mp->m_sb.sb_metadirino) { /* * likewise, non-root directories can't have .. pointing * to . @@ -833,7 +848,8 @@ _("entry at block %u offset %" PRIdPTR " in directory inode %" PRIu64 " has ille * NULLFSINO otherwise. */ if (ino == ent_ino && - ino != mp->m_sb.sb_rootino) { + ino != mp->m_sb.sb_rootino && + ino != mp->m_sb.sb_metadirino) { *parent = NULLFSINO; do_warn( _("bad .. entry in directory inode %" PRIu64 ", points to self: "), @@ -1474,9 +1490,14 @@ process_dir2( } else if (dotdot == 0 && ino == mp->m_sb.sb_rootino) { do_warn(_("no .. 
entry for root directory %" PRIu64 "\n"), ino); need_root_dotdot = 1; + } else if (dotdot == 0 && ino == mp->m_sb.sb_metadirino) { + do_warn(_("no .. entry for metaino directory %" PRIu64 "\n"), ino); + need_metadir_dotdot = 1; } ASSERT((ino != mp->m_sb.sb_rootino && ino != *parent) || + (ino == mp->m_sb.sb_metadirino && + (ino == *parent || need_metadir_dotdot == 1)) || (ino == mp->m_sb.sb_rootino && (ino == *parent || need_root_dotdot == 1))); diff --git a/repair/globals.c b/repair/globals.c index ec11bc67139..c731d6bdff1 100644 --- a/repair/globals.c +++ b/repair/globals.c @@ -68,6 +68,9 @@ int fs_is_dirty; int need_root_inode; int need_root_dotdot; +bool need_metadir_inode; +int need_metadir_dotdot; + int need_rbmino; int need_rsumino; diff --git a/repair/globals.h b/repair/globals.h index d5a04a75d41..6bd4be20cb1 100644 --- a/repair/globals.h +++ b/repair/globals.h @@ -109,6 +109,9 @@ extern int fs_is_dirty; extern int need_root_inode; extern int need_root_dotdot; +extern bool need_metadir_inode; +extern int need_metadir_dotdot; + extern int need_rbmino; extern int need_rsumino; diff --git a/repair/phase1.c b/repair/phase1.c index 00b98584eed..40e7f164c55 100644 --- a/repair/phase1.c +++ b/repair/phase1.c @@ -48,6 +48,8 @@ phase1(xfs_mount_t *mp) primary_sb_modified = 0; need_root_inode = 0; need_root_dotdot = 0; + need_metadir_inode = false; + need_metadir_dotdot = 0; need_rbmino = 0; need_rsumino = 0; lost_quotas = 0; diff --git a/repair/phase2.c b/repair/phase2.c index 05964b3d23c..77324a976a1 100644 --- a/repair/phase2.c +++ b/repair/phase2.c @@ -628,8 +628,11 @@ phase2( * make sure we know about the root inode chunk */ if ((ino_rec = find_inode_rec(mp, 0, mp->m_sb.sb_rootino)) == NULL) { - ASSERT(mp->m_sb.sb_rbmino == mp->m_sb.sb_rootino + 1 && - mp->m_sb.sb_rsumino == mp->m_sb.sb_rootino + 2); + ASSERT(!xfs_has_metadir(mp) || + mp->m_sb.sb_metadirino == mp->m_sb.sb_rootino + 1); + ASSERT(xfs_has_metadir(mp) || + (mp->m_sb.sb_rbmino == mp->m_sb.sb_rootino 
+ 1 && + mp->m_sb.sb_rsumino == mp->m_sb.sb_rootino + 2)); do_warn(_("root inode chunk not found\n")); /* diff --git a/repair/phase4.c b/repair/phase4.c index b5e713aaa82..fdc5d777be4 100644 --- a/repair/phase4.c +++ b/repair/phase4.c @@ -264,6 +264,22 @@ phase4(xfs_mount_t *mp) do_warn(_("root inode lost\n")); } + /* + * If metadata directory trees are enabled, the metadata root directory + * always comes immediately after the regular root directory, even if + * it's free. + */ + if (xfs_has_metadir(mp) && + (is_inode_free(irec, 1) || !inode_isadir(irec, 1))) { + need_metadir_inode = true; + if (no_modify) + do_warn( + _("metadata directory root inode would be lost\n")); + else + do_warn( + _("metadata directory root inode lost\n")); + } + for (i = 0; i < mp->m_sb.sb_agcount; i++) { ag_end = (i < mp->m_sb.sb_agcount - 1) ? mp->m_sb.sb_agblocks : mp->m_sb.sb_dblocks - diff --git a/repair/phase6.c b/repair/phase6.c index aaaebc79098..4bdea2a2a38 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -469,6 +469,130 @@ reset_root_ino( libxfs_inode_init(tp, &args, ip); } +/* Mark a newly allocated inode in use in the incore bitmap. */ +static void +mark_ino_inuse( + struct xfs_mount *mp, + xfs_ino_t ino, + int mode, + xfs_ino_t parent) +{ + struct ino_tree_node *irec; + int ino_offset; + int i; + + irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, ino), + XFS_INO_TO_AGINO(mp, ino)); + + if (irec == NULL) { + /* + * This inode is allocated from a newly created inode + * chunk and therefore did not exist when inode chunks + * were processed in phase3. Add this group of inodes to + * the entry avl tree as if they were discovered in phase3. + */ + irec = set_inode_free_alloc(mp, + XFS_INO_TO_AGNO(mp, ino), + XFS_INO_TO_AGINO(mp, ino)); + alloc_ex_data(irec); + + for (i = 0; i < XFS_INODES_PER_CHUNK; i++) + set_inode_free(irec, i); + } + + ino_offset = get_inode_offset(mp, ino, irec); + + /* + * Mark the inode allocated so it is not skipped in phase 7. 
We'll + * find it with the directory traverser soon, so we don't need to + * mark it reached. + */ + set_inode_used(irec, ino_offset); + set_inode_ftype(irec, ino_offset, libxfs_mode_to_ftype(mode)); + set_inode_parent(irec, ino_offset, parent); + if (S_ISDIR(mode)) + set_inode_isadir(irec, ino_offset); +} + +/* Make sure this metadata directory path exists. */ +static int +ensure_imeta_dirpath( + struct xfs_mount *mp, + const struct xfs_imeta_path *path) +{ + struct xfs_imeta_path temp_path = { + .im_path = path->im_path, + .im_depth = 1, + .im_ftype = XFS_DIR3_FT_DIR, + }; + unsigned int i; + xfs_ino_t parent; + int error; + + if (!xfs_has_metadir(mp)) + return 0; + + error = -libxfs_imeta_ensure_dirpath(mp, path); + if (error) + return error; + + /* Mark all directories in this path as inuse. */ + parent = mp->m_metadirip->i_ino; + for (i = 0; i < path->im_depth - 1; i++, temp_path.im_depth++) { + xfs_ino_t ino; + + error = -libxfs_imeta_lookup(mp, &temp_path, &ino); + if (error) + return error; + if (ino == NULLFSINO) + return ENOENT; + mark_ino_inuse(mp, ino, S_IFDIR, parent); + parent = ino; + } + + return 0; +} + +/* Look up the parent of this path. 
*/ +static xfs_ino_t +lookup_imeta_path_dirname( + struct xfs_mount *mp, + const struct xfs_imeta_path *path) +{ + struct xfs_imeta_path temp_path = { + .im_path = path->im_path, + .im_depth = path->im_depth - 1, + .im_ftype = XFS_DIR3_FT_DIR, + }; + xfs_ino_t ino; + int error; + + if (!xfs_has_metadir(mp)) + return NULLFSINO; + + error = -libxfs_imeta_lookup(mp, &temp_path, &ino); + if (error) + return NULLFSINO; + + return ino; +} + +static inline bool +is_inode_inuse( + struct xfs_mount *mp, + xfs_ino_t inum) +{ + struct ino_tree_node *irec; + int ino_offset; + + irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, inum), + XFS_INO_TO_AGINO(mp, inum)); + if (!irec) + return false; + ino_offset = XFS_INO_TO_AGINO(mp, inum) - irec->ino_startnum; + return !is_inode_free(irec, ino_offset); +} + /* Load a realtime metadata inode from disk and reset it. */ static int ensure_rtino( @@ -487,10 +611,66 @@ ensure_rtino( return 0; } +/* + * Either link the old rtbitmap/summary inode into the (reinitialized) metadata + * directory tree, or create new ones. + */ +static int +ensure_rtino_metadir( + struct xfs_trans **tpp, + const struct xfs_imeta_path *path, + xfs_ino_t ino, + struct xfs_inode **ipp, + struct xfs_imeta_update *upd) +{ + struct xfs_mount *mp = (*tpp)->t_mountp; + int error; + + /* + * We've already voided the old metadata directory, which means that we + * cannot call libxfs_imeta_lookup. Hence we're reliant on the caller + * to have saved the rbmino/rsumino values and to have marked the inode + * inuse if it proved to be ok. + */ + if (ino != NULLFSINO && is_inode_inuse(mp, ino)) { + /* + * This rt metadata inode was fine, so we'll just link it + * into the new metadata directory tree. 
+ */ + error = -libxfs_imeta_iget(mp, ino, XFS_DIR3_FT_REG_FILE, + ipp); + if (error) + do_error( + _("failed to iget rt metadata inode 0x%llx, error %d\n"), + (unsigned long long)ino, error); + + error = -libxfs_imeta_link(*tpp, path, *ipp, upd); + if (error) + do_error( + _("failed to link rt metadata inode 0x%llx, error %d\n"), + (unsigned long long)ino, error); + + set_nlink(VFS_I(*ipp), 1); + libxfs_trans_log_inode(*tpp, *ipp, XFS_ILOG_CORE); + return 0; + } + + /* Allocate a new inode. */ + error = -libxfs_imeta_create(tpp, path, S_IFREG, 0, ipp, upd); + if (error) + do_error( +_("couldn't create new metadata inode, error %d\n"), error); + + mark_ino_inuse(mp, (*ipp)->i_ino, S_IFREG, + lookup_imeta_path_dirname(mp, path)); + return 0; +} + static void mk_rbmino( struct xfs_mount *mp) { + struct xfs_imeta_update upd; struct xfs_trans *tp; struct xfs_inode *ip; struct xfs_bmbt_irec *ep; @@ -501,15 +681,31 @@ mk_rbmino( struct xfs_bmbt_irec map[XFS_BMAP_MAX_NMAP]; uint blocks; + error = ensure_imeta_dirpath(mp, &XFS_IMETA_RTBITMAP); + if (error) + do_error( + _("Couldn't create realtime metadata directory, error %d\n"), error); + + error = -libxfs_imeta_start_update(mp, &XFS_IMETA_RTBITMAP, &upd); + if (error) + do_error( +_("Couldn't find realtime bitmap parent, error %d\n"), + error); + /* * first set up inode */ - i = -libxfs_trans_alloc_rollable(mp, 10, &tp); + i = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_imeta_create, + libxfs_imeta_create_space_res(mp), 0, 0, &tp); if (i) res_failed(i); /* Reset the realtime bitmap inode. 
*/ - error = ensure_rtino(&tp, mp->m_sb.sb_rbmino, &ip); + if (xfs_has_metadir(mp)) + error = ensure_rtino_metadir(&tp, &XFS_IMETA_RTBITMAP, + mp->m_sb.sb_rbmino, &ip, &upd); + else + error = ensure_rtino(&tp, mp->m_sb.sb_rbmino, &ip); if (error) { do_error( _("couldn't iget realtime bitmap inode -- error - %d\n"), @@ -520,6 +716,7 @@ mk_rbmino( error = -libxfs_trans_commit(tp); if (error) do_error(_("%s: commit failed, error %d\n"), __func__, error); + libxfs_imeta_end_update(mp, &upd, error); /* * then allocate blocks for file and fill with zeroes (stolen @@ -702,6 +899,7 @@ static void mk_rsumino( struct xfs_mount *mp) { + struct xfs_imeta_update upd; struct xfs_trans *tp; struct xfs_inode *ip; struct xfs_bmbt_irec *ep; @@ -713,15 +911,31 @@ mk_rsumino( struct xfs_bmbt_irec map[XFS_BMAP_MAX_NMAP]; uint blocks; + error = ensure_imeta_dirpath(mp, &XFS_IMETA_RTSUMMARY); + if (error) + do_error( + _("Couldn't create realtime metadata directory, error %d\n"), error); + + error = -libxfs_imeta_start_update(mp, &XFS_IMETA_RTSUMMARY, &upd); + if (error) + do_error( +_("Couldn't find realtime summary parent, error %d\n"), + error); + /* * first set up inode */ - i = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 10, 0, 0, &tp); + i = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_imeta_create, + libxfs_imeta_create_space_res(mp), 0, 0, &tp); if (i) res_failed(i); /* Reset the rt summary inode. 
*/ - error = ensure_rtino(&tp, mp->m_sb.sb_rsumino, &ip); + if (xfs_has_metadir(mp)) + error = ensure_rtino_metadir(&tp, &XFS_IMETA_RTSUMMARY, + mp->m_sb.sb_rsumino, &ip, &upd); + else + error = ensure_rtino(&tp, mp->m_sb.sb_rsumino, &ip); if (error) { do_error( _("couldn't iget realtime summary inode -- error - %d\n"), @@ -732,6 +946,7 @@ mk_rsumino( error = -libxfs_trans_commit(tp); if (error) do_error(_("%s: commit failed, error %d\n"), __func__, error); + libxfs_imeta_end_update(mp, &upd, error); /* * then allocate blocks for file and fill with zeroes (stolen @@ -827,6 +1042,36 @@ mk_root_dir(xfs_mount_t *mp) libxfs_irele(ip); } +/* Create a new metadata directory root. */ +static void +mk_metadir( + struct xfs_mount *mp) +{ + struct xfs_trans *tp; + int error; + + error = init_fs_root_dir(mp, mp->m_sb.sb_metadirino, 0, + &mp->m_metadirip); + if (error) + do_error( + _("Initialization of the metadata root directory failed, error %d\n"), + error); + + /* Mark the new metadata root dir as metadata. 
*/ + error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp); + if (error) + do_error( + _("Marking metadata root directory failed")); + + libxfs_trans_ijoin(tp, mp->m_metadirip, 0); + libxfs_imeta_set_metaflag(tp, mp->m_metadirip); + + error = -libxfs_trans_commit(tp); + if (error) + do_error( + _("Marking metadata root directory failed, error %d\n"), error); +} + /* * orphanage name == lost+found */ @@ -1265,6 +1510,8 @@ longform_dir2_rebuild( if (ino == mp->m_sb.sb_rootino) need_root_dotdot = 0; + else if (ino == mp->m_sb.sb_metadirino) + need_metadir_dotdot = 0; /* go through the hash list and re-add the inodes */ @@ -2855,7 +3102,7 @@ process_dir_inode( need_dot = dirty = num_illegal = 0; - if (mp->m_sb.sb_rootino == ino) { + if (mp->m_sb.sb_rootino == ino || mp->m_sb.sb_metadirino == ino) { /* * mark root inode reached and bump up * link count for root inode to account @@ -2929,6 +3176,9 @@ _("error %d fixing shortform directory %llu\n"), dir_hash_done(hashtab); fix_dotdot(mp, ino, ip, mp->m_sb.sb_rootino, "root", &need_root_dotdot); + if (xfs_has_metadir(mp)) + fix_dotdot(mp, ino, ip, mp->m_sb.sb_metadirino, "metadata", + &need_metadir_dotdot); /* * if we need to create the '.' entry, do so only if @@ -3008,6 +3258,15 @@ mark_inode( static void mark_standalone_inodes(xfs_mount_t *mp) { + if (xfs_has_metadir(mp)) { + /* + * The directory connectivity scanner will pick up the metadata + * inode directory, which will mark the rest of the metadata + * inodes. 
+ */ + return; + } + mark_inode(mp, mp->m_sb.sb_rbmino); mark_inode(mp, mp->m_sb.sb_rsumino); @@ -3184,6 +3443,17 @@ phase6(xfs_mount_t *mp) } } + if (need_metadir_inode) { + if (!no_modify) { + do_warn(_("reinitializing metadata root directory\n")); + mk_metadir(mp); + need_metadir_inode = false; + need_metadir_dotdot = 0; + } else { + do_warn(_("would reinitialize metadata root directory\n")); + } + } + if (need_rbmino) { if (!no_modify) { do_warn(_("reinitializing realtime bitmap inode\n")); diff --git a/repair/sb.c b/repair/sb.c index 7391cf043fd..c5dbc6c2062 100644 --- a/repair/sb.c +++ b/repair/sb.c @@ -28,6 +28,7 @@ copy_sb(xfs_sb_t *source, xfs_sb_t *dest) xfs_ino_t uquotino; xfs_ino_t gquotino; xfs_ino_t pquotino; + xfs_ino_t metadirino; uint16_t versionnum; rootino = dest->sb_rootino; @@ -36,6 +37,7 @@ copy_sb(xfs_sb_t *source, xfs_sb_t *dest) uquotino = dest->sb_uquotino; gquotino = dest->sb_gquotino; pquotino = dest->sb_pquotino; + metadirino = dest->sb_metadirino; versionnum = dest->sb_versionnum; @@ -47,6 +49,7 @@ copy_sb(xfs_sb_t *source, xfs_sb_t *dest) dest->sb_uquotino = uquotino; dest->sb_gquotino = gquotino; dest->sb_pquotino = pquotino; + dest->sb_metadirino = metadirino; dest->sb_versionnum = versionnum; diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c index c461bc8eb07..53d45c5b189 100644 --- a/repair/xfs_repair.c +++ b/repair/xfs_repair.c @@ -628,6 +628,60 @@ guess_correct_sunit( do_warn(_("Would reset sb_width to %u\n"), new_sunit); } +/* + * Check that the metadata directory inode comes immediately after the root + * directory inode and that it seems to look like a metadata directory. + */ +STATIC void +check_metadir_inode( + struct xfs_mount *mp, + xfs_ino_t rootino) +{ + int error; + + validate_sb_ino(&mp->m_sb.sb_metadirino, rootino + 1, + _("metadata root directory")); + + /* If we changed the metadir inode, try reloading it. 
*/ + if (!mp->m_metadirip || + mp->m_metadirip->i_ino != mp->m_sb.sb_metadirino) { + if (mp->m_metadirip) + libxfs_irele(mp->m_metadirip); + + error = -libxfs_imeta_iget(mp, mp->m_sb.sb_metadirino, + XFS_DIR3_FT_DIR, &mp->m_metadirip); + if (error) { + need_metadir_inode = true; + goto done; + } + + error = -libxfs_imeta_mount(mp); + if (error) + need_metadir_inode = true; + } + +done: + if (need_metadir_inode) { + if (!no_modify) + do_warn(_("will reset metadata root directory\n")); + else + do_warn(_("would reset metadata root directory\n")); + if (mp->m_metadirip) + libxfs_irele(mp->m_metadirip); + mp->m_metadirip = NULL; + } + + /* + * Since these two realtime inodes are no longer fixed, we must + * remember to regenerate them if we still haven't gotten a pointer to + * a valid realtime inode. + */ + if (!libxfs_verify_ino(mp, mp->m_sb.sb_rbmino)) + need_rbmino = 1; + if (!libxfs_verify_ino(mp, mp->m_sb.sb_rsumino)) + need_rsumino = 1; +} + /* * Make sure that the first 3 inodes in the filesystem are the root directory, * the realtime bitmap, and the realtime summary, in that order. @@ -657,10 +711,19 @@ _("sb root inode value %" PRIu64 " valid but in unaligned location (expected %"P validate_sb_ino(&mp->m_sb.sb_rootino, rootino, _("root")); - validate_sb_ino(&mp->m_sb.sb_rbmino, rootino + 1, - _("realtime bitmap")); - validate_sb_ino(&mp->m_sb.sb_rsumino, rootino + 2, - _("realtime summary")); + + if (xfs_has_metadir(mp)) { + check_metadir_inode(mp, rootino); + } else { + /* + * The realtime bitmap and summary inodes only comes after the + * root directory when the metadir feature is not enabled. + */ + validate_sb_ino(&mp->m_sb.sb_rbmino, rootino + 1, + _("realtime bitmap")); + validate_sb_ino(&mp->m_sb.sb_rsumino, rootino + 2, + _("realtime summary")); + } } /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 31/46] xfs_repair: refactor root directory initialization 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (30 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 34/46] xfs_repair: rebuild the metadata directory Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 35/46] xfs_repair: don't let metadata and regular files mix Darrick J. Wong ` (13 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Refactor root directory initialization into a separate function we can call for both the root dir and the metadir. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/phase6.c | 63 +++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 40 insertions(+), 23 deletions(-) diff --git a/repair/phase6.c b/repair/phase6.c index d8df0f608f8..7e751f41770 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -755,27 +755,27 @@ mk_rsumino(xfs_mount_t *mp) libxfs_irele(ip); } -/* - * makes a new root directory. - */ -static void -mk_root_dir(xfs_mount_t *mp) +/* Initialize a root directory. 
*/ +static int +init_fs_root_dir( + struct xfs_mount *mp, + xfs_ino_t ino, + mode_t mode, + struct xfs_inode **ipp) { - xfs_trans_t *tp; - xfs_inode_t *ip; - int i; - int error; - const mode_t mode = 0755; - ino_tree_node_t *irec; + struct xfs_trans *tp; + struct xfs_inode *ip = NULL; + struct ino_tree_node *irec; + int error; - ip = NULL; - i = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 10, 0, 0, &tp); - if (i) - res_failed(i); + error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 10, 0, 0, &tp); + if (error) + return error; - error = -libxfs_iget(mp, tp, mp->m_sb.sb_rootino, 0, &ip); + error = -libxfs_iget(mp, tp, ino, 0, &ip); if (error) { - do_error(_("could not iget root inode -- error - %d\n"), error); + libxfs_trans_cancel(tp); + return error; } /* Reset the root directory. */ @@ -784,14 +784,31 @@ mk_root_dir(xfs_mount_t *mp) error = -libxfs_trans_commit(tp); if (error) - do_error(_("%s: commit failed, error %d\n"), __func__, error); + return error; + + irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, ino), + XFS_INO_TO_AGINO(mp, ino)); + set_inode_isadir(irec, XFS_INO_TO_AGINO(mp, ino) - irec->ino_startnum); + *ipp = ip; + return 0; +} + +/* + * makes a new root directory. + */ +static void +mk_root_dir(xfs_mount_t *mp) +{ + struct xfs_inode *ip = NULL; + int error; + + error = init_fs_root_dir(mp, mp->m_sb.sb_rootino, 0755, &ip); + if (error) + do_error( + _("Could not reinitialize root directory inode, error %d\n"), + error); libxfs_irele(ip); - - irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rootino), - XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rootino)); - set_inode_isadir(irec, XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rootino) - - irec->ino_startnum); } /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 35/46] xfs_repair: don't let metadata and regular files mix 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (31 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 31/46] xfs_repair: refactor root directory initialization Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 30/46] xfs_repair: refactor marking of metadata inodes Darrick J. Wong ` (12 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Track whether or not inodes thought they were metadata inodes. We cannot allow metadata inodes to appear in the regular directory tree, and we cannot allow regular inodes to appear in the metadata directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dinode.c | 21 +++++++++++++++++ repair/incore.h | 19 +++++++++++++++ repair/incore_ino.c | 1 + repair/phase6.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 104 insertions(+) diff --git a/repair/dinode.c b/repair/dinode.c index cf517f77173..eae64a0556f 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -2354,6 +2354,7 @@ process_dinode_int( struct xfs_dinode *dino = *dinop; xfs_agino_t unlinked_ino; struct xfs_perag *pag; + bool is_meta = false; *dirty = *isa_dir = 0; *used = is_used; @@ -2926,6 +2927,18 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "), if (collect_rmaps) record_inode_reflink_flag(mp, dino, agno, ino, lino); + /* Does this inode think it was metadata? 
*/ + if (dino->di_version >= 3 && + (dino->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))) { + struct ino_tree_node *irec; + int off; + + irec = find_inode_rec(mp, agno, ino); + off = get_inode_offset(mp, lino, irec); + set_inode_is_meta(irec, off); + is_meta = true; + } + /* * check data fork -- if it's bad, clear the inode */ @@ -3012,6 +3025,14 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "), *used = is_free; *isa_dir = 0; blkmap_free(dblkmap); + if (is_meta) { + struct ino_tree_node *irec; + int off; + + irec = find_inode_rec(mp, agno, ino); + off = get_inode_offset(mp, lino, irec); + clear_inode_is_meta(irec, off); + } return 1; } diff --git a/repair/incore.h b/repair/incore.h index 8a1a39ec60c..0027593ae31 100644 --- a/repair/incore.h +++ b/repair/incore.h @@ -274,6 +274,7 @@ typedef struct ino_tree_node { uint64_t ino_isa_dir; /* bit == 1 if a directory */ uint64_t ino_was_rl; /* bit == 1 if reflink flag set */ uint64_t ino_is_rl; /* bit == 1 if reflink flag should be set */ + uint64_t ino_was_meta; /* bit == 1 if metadata */ uint8_t nlink_size; union ino_nlink disk_nlinks; /* on-disk nlinks, set in P3 */ union { @@ -541,6 +542,24 @@ static inline int inode_is_rl(struct ino_tree_node *irec, int offset) return (irec->ino_is_rl & IREC_MASK(offset)) != 0; } +/* + * set/clear/test was inode marked as metadata + */ +static inline void set_inode_is_meta(struct ino_tree_node *irec, int offset) +{ + irec->ino_was_meta |= IREC_MASK(offset); +} + +static inline void clear_inode_is_meta(struct ino_tree_node *irec, int offset) +{ + irec->ino_was_meta &= ~IREC_MASK(offset); +} + +static inline int inode_is_meta(struct ino_tree_node *irec, int offset) +{ + return (irec->ino_was_meta & IREC_MASK(offset)) != 0; +} + /* * add_inode_reached() is set on inode I only if I has been reached * by an inode P claiming to be the parent and if I is a directory, diff --git a/repair/incore_ino.c b/repair/incore_ino.c index 0dd7a2f060f..ef74e64f308 100644 --- a/repair/incore_ino.c 
+++ b/repair/incore_ino.c @@ -253,6 +253,7 @@ alloc_ino_node( irec->ino_isa_dir = 0; irec->ino_was_rl = 0; irec->ino_is_rl = 0; + irec->ino_was_meta = 0; irec->ir_free = (xfs_inofree_t) - 1; irec->ir_sparse = 0; irec->ino_un.ex_data = NULL; diff --git a/repair/phase6.c b/repair/phase6.c index 4bdea2a2a38..3e740079235 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -1905,6 +1905,38 @@ longform_dir2_entry_check_data( continue; } + /* + * Regular directories cannot point to metadata files. If + * we find such a thing, blow out the entry. + */ + if (!xfs_is_metadata_inode(ip) && + inode_is_meta(irec, ino_offset)) { + nbad++; + if (entry_junked( + _("entry \"%s\" in regular dir %" PRIu64" points to a metadata inode %" PRIu64 "\n"), + fname, ip->i_ino, inum)) { + dep->name[0] = '/'; + libxfs_dir2_data_log_entry(&da, bp, dep); + } + continue; + } + + /* + * Metadata directories cannot point to regular files. If + * we find such a thing, blow out the entry. + */ + if (xfs_is_metadata_inode(ip) && + !inode_is_meta(irec, ino_offset)) { + nbad++; + if (entry_junked( + _("entry \"%s\" in metadata dir %" PRIu64" points to a regular inode %" PRIu64 "\n"), + fname, ip->i_ino, inum)) { + dep->name[0] = '/'; + libxfs_dir2_data_log_entry(&da, bp, dep); + } + continue; + } + /* * check if this inode is lost+found dir in the root */ @@ -2815,6 +2847,37 @@ shortform_dir2_entry_check( ino_dirty); continue; } + + /* + * Regular directories cannot point to metadata files. If + * we find such a thing, blow out the entry. + */ + if (!xfs_is_metadata_inode(ip) && + inode_is_meta(irec, ino_offset)) { + do_warn( + _("entry \"%s\" in regular dir %" PRIu64" points to a metadata inode %" PRIu64 "\n"), + fname, ip->i_ino, lino); + next_sfep = shortform_dir2_junk(mp, sfp, sfep, lino, + &max_size, &i, &bytes_deleted, + ino_dirty); + continue; + } + + /* + * Metadata directories cannot point to regular files. If + * we find such a thing, blow out the entry. 
+ */ + if (xfs_is_metadata_inode(ip) && + !inode_is_meta(irec, ino_offset)) { + do_warn( + _("entry \"%s\" in metadata dir %" PRIu64" points to a regular inode %" PRIu64 "\n"), + fname, ip->i_ino, lino); + next_sfep = shortform_dir2_junk(mp, sfp, sfep, lino, + &max_size, &i, &bytes_deleted, + ino_dirty); + continue; + } + /* * check if this inode is lost+found dir in the root */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
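Editorial sketch (not part of the patch): the incore tracking above is one bit per inode in a 64-inode chunk record, and a directory entry is junked whenever the parent directory and the child disagree about being metadata. The code below is a minimal standalone illustration under stated assumptions. Every sketch_* name is hypothetical, the struct is a stand-in for repair's ino_tree_node, and the mask macro is assumed to behave like IREC_MASK (a single bit shifted by the inode's offset in the chunk).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: one record covers 64 inodes, one bit per inode. */
#define SKETCH_INODES_PER_CHUNK	64
#define SKETCH_MASK(off)	((uint64_t)1 << (off))

/* Hypothetical stand-in for the ino_was_meta field of ino_tree_node. */
struct sketch_irec {
	uint64_t	was_meta;	/* bit == 1 if ondisk inode claimed metadata */
};

static inline void
sketch_set_meta(struct sketch_irec *irec, int off)
{
	assert(off >= 0 && off < SKETCH_INODES_PER_CHUNK);
	irec->was_meta |= SKETCH_MASK(off);
}

static inline void
sketch_clear_meta(struct sketch_irec *irec, int off)
{
	assert(off >= 0 && off < SKETCH_INODES_PER_CHUNK);
	irec->was_meta &= ~SKETCH_MASK(off);
}

static inline bool
sketch_is_meta(const struct sketch_irec *irec, int off)
{
	assert(off >= 0 && off < SKETCH_INODES_PER_CHUNK);
	return (irec->was_meta & SKETCH_MASK(off)) != 0;
}

/*
 * Regular directories may not point to metadata files and metadata
 * directories may not point to regular files, so an entry is bad whenever
 * the two sides disagree.
 */
static inline bool
sketch_entry_must_be_junked(bool dir_is_meta,
		const struct sketch_irec *irec, int off)
{
	return dir_is_meta != sketch_is_meta(irec, off);
}
```

The patch spells the two mismatch cases out separately so that each one gets its own warning message; the single comparison here only shows that both cases reduce to the same test.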
* [PATCH 30/46] xfs_repair: refactor marking of metadata inodes 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (32 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 35/46] xfs_repair: don't let metadata and regular files mix Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 32/46] xfs_repair: refactor grabbing realtime " Darrick J. Wong ` (11 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Refactor the mechanics of marking a metadata inode into a helper function so that we don't have to open-code that for every single metadata inode. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/phase6.c | 76 ++++++++++++++++++++----------------------------------- 1 file changed, 28 insertions(+), 48 deletions(-) diff --git a/repair/phase6.c b/repair/phase6.c index 053e3ace8ee..d8df0f608f8 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -2952,6 +2952,22 @@ _("error %d fixing shortform directory %llu\n"), libxfs_irele(ip); } +static void +mark_inode( + struct xfs_mount *mp, + xfs_ino_t ino) +{ + struct ino_tree_node *irec; + int offset; + + irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, ino), + XFS_INO_TO_AGINO(mp, ino)); + + offset = XFS_INO_TO_AGINO(mp, ino) - irec->ino_startnum; + + add_inode_reached(irec, offset); +} + /* * mark realtime bitmap and summary inodes as reached. 
* quota inode will be marked here as well @@ -2959,54 +2975,18 @@ _("error %d fixing shortform directory %llu\n"), static void mark_standalone_inodes(xfs_mount_t *mp) { - ino_tree_node_t *irec; - int offset; - - irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rbmino), - XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rbmino)); - - offset = XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rbmino) - - irec->ino_startnum; - - add_inode_reached(irec, offset); - - irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rsumino), - XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rsumino)); - - offset = XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rsumino) - - irec->ino_startnum; - - add_inode_reached(irec, offset); - - if (fs_quotas) { - if (mp->m_sb.sb_uquotino - && mp->m_sb.sb_uquotino != NULLFSINO) { - irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, - mp->m_sb.sb_uquotino), - XFS_INO_TO_AGINO(mp, mp->m_sb.sb_uquotino)); - offset = XFS_INO_TO_AGINO(mp, mp->m_sb.sb_uquotino) - - irec->ino_startnum; - add_inode_reached(irec, offset); - } - if (mp->m_sb.sb_gquotino - && mp->m_sb.sb_gquotino != NULLFSINO) { - irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, - mp->m_sb.sb_gquotino), - XFS_INO_TO_AGINO(mp, mp->m_sb.sb_gquotino)); - offset = XFS_INO_TO_AGINO(mp, mp->m_sb.sb_gquotino) - - irec->ino_startnum; - add_inode_reached(irec, offset); - } - if (mp->m_sb.sb_pquotino - && mp->m_sb.sb_pquotino != NULLFSINO) { - irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, - mp->m_sb.sb_pquotino), - XFS_INO_TO_AGINO(mp, mp->m_sb.sb_pquotino)); - offset = XFS_INO_TO_AGINO(mp, mp->m_sb.sb_pquotino) - - irec->ino_startnum; - add_inode_reached(irec, offset); - } - } + mark_inode(mp, mp->m_sb.sb_rbmino); + mark_inode(mp, mp->m_sb.sb_rsumino); + + if (!fs_quotas) + return; + + if (mp->m_sb.sb_uquotino && mp->m_sb.sb_uquotino != NULLFSINO) + mark_inode(mp, mp->m_sb.sb_uquotino); + if (mp->m_sb.sb_gquotino && mp->m_sb.sb_gquotino != NULLFSINO) + mark_inode(mp, mp->m_sb.sb_gquotino); + if (mp->m_sb.sb_pquotino && mp->m_sb.sb_pquotino != NULLFSINO) 
+ mark_inode(mp, mp->m_sb.sb_pquotino); } static void ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 32/46] xfs_repair: refactor grabbing realtime metadata inodes 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (33 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 30/46] xfs_repair: refactor marking of metadata inodes Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 39/46] xfs_repair: adjust keep_fsinos to handle metadata directories Darrick J. Wong ` (10 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a helper function to grab a realtime metadata inode. When metadir arrives, the bitmap and summary inodes can float, so we'll turn this function into a "load or allocate" function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/phase6.c | 90 ++++++++++++++++++++++++++++++++----------------------- 1 file changed, 53 insertions(+), 37 deletions(-) diff --git a/repair/phase6.c b/repair/phase6.c index 7e751f41770..aaaebc79098 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -469,18 +469,37 @@ reset_root_ino( libxfs_inode_init(tp, &args, ip); } +/* Load a realtime metadata inode from disk and reset it. 
*/ +static int +ensure_rtino( + struct xfs_trans **tpp, + xfs_ino_t ino, + struct xfs_inode **ipp) +{ + struct xfs_mount *mp = (*tpp)->t_mountp; + int error; + + error = -libxfs_iget(mp, *tpp, ino, 0, ipp); + if (error) + return error; + + reset_root_ino(*tpp, S_IFREG, *ipp); + return 0; +} + static void -mk_rbmino(xfs_mount_t *mp) +mk_rbmino( + struct xfs_mount *mp) { - xfs_trans_t *tp; - xfs_inode_t *ip; - xfs_bmbt_irec_t *ep; - int i; - int nmap; - int error; - xfs_fileoff_t bno; - xfs_bmbt_irec_t map[XFS_BMAP_MAX_NMAP]; - uint blocks; + struct xfs_trans *tp; + struct xfs_inode *ip; + struct xfs_bmbt_irec *ep; + int i; + int nmap; + int error; + xfs_fileoff_t bno; + struct xfs_bmbt_irec map[XFS_BMAP_MAX_NMAP]; + uint blocks; /* * first set up inode @@ -489,15 +508,13 @@ mk_rbmino(xfs_mount_t *mp) if (i) res_failed(i); - error = -libxfs_iget(mp, tp, mp->m_sb.sb_rbmino, 0, &ip); - if (error) { - do_error( - _("couldn't iget realtime bitmap inode -- error - %d\n"), - error); - } - /* Reset the realtime bitmap inode. 
*/ - reset_root_ino(tp, S_IFREG, ip); + error = ensure_rtino(&tp, mp->m_sb.sb_rbmino, &ip); + if (error) { + do_error( + _("couldn't iget realtime bitmap inode -- error - %d\n"), + error); + } ip->i_disk_size = mp->m_sb.sb_rbmblocks * mp->m_sb.sb_blocksize; libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); error = -libxfs_trans_commit(tp); @@ -682,18 +699,19 @@ _("can't access block %" PRIu64 " (fsbno %" PRIu64 ") of realtime summary inode } static void -mk_rsumino(xfs_mount_t *mp) +mk_rsumino( + struct xfs_mount *mp) { - xfs_trans_t *tp; - xfs_inode_t *ip; - xfs_bmbt_irec_t *ep; - int i; - int nmap; - int error; - int nsumblocks; - xfs_fileoff_t bno; - xfs_bmbt_irec_t map[XFS_BMAP_MAX_NMAP]; - uint blocks; + struct xfs_trans *tp; + struct xfs_inode *ip; + struct xfs_bmbt_irec *ep; + int i; + int nmap; + int error; + int nsumblocks; + xfs_fileoff_t bno; + struct xfs_bmbt_irec map[XFS_BMAP_MAX_NMAP]; + uint blocks; /* * first set up inode @@ -702,15 +720,13 @@ mk_rsumino(xfs_mount_t *mp) if (i) res_failed(i); - error = -libxfs_iget(mp, tp, mp->m_sb.sb_rsumino, 0, &ip); - if (error) { - do_error( - _("couldn't iget realtime summary inode -- error - %d\n"), - error); - } - /* Reset the rt summary inode. */ - reset_root_ino(tp, S_IFREG, ip); + error = ensure_rtino(&tp, mp->m_sb.sb_rsumino, &ip); + if (error) { + do_error( + _("couldn't iget realtime summary inode -- error - %d\n"), + error); + } ip->i_disk_size = mp->m_rsumsize; libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); error = -libxfs_trans_commit(tp); ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 39/46] xfs_repair: adjust keep_fsinos to handle metadata directories 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (34 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 32/46] xfs_repair: refactor grabbing realtime " Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 42/46] xfs_repair: drop all the metadata directory files during pass 4 Darrick J. Wong ` (9 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> On a filesystem with metadata directories, we only want to automatically mark the two root directories present because those are the only two statically allocated inode numbers -- the rt summary inode is now just a regular file in a directory. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/phase5.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/repair/phase5.c b/repair/phase5.c index 361e5649b29..252442c9fd8 100644 --- a/repair/phase5.c +++ b/repair/phase5.c @@ -421,13 +421,14 @@ static void keep_fsinos(xfs_mount_t *mp) { ino_tree_node_t *irec; - int i; irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, mp->m_sb.sb_rootino), XFS_INO_TO_AGINO(mp, mp->m_sb.sb_rootino)); - for (i = 0; i < 3; i++) - set_inode_used(irec, i); + set_inode_used(irec, 0); /* root dir */ + set_inode_used(irec, 1); /* rt bitmap or metadata dir root */ + if (!xfs_has_metadir(mp)) + set_inode_used(irec, 2); /* rt summary */ } static void ^ permalink raw reply related [flat|nested] 565+ messages in thread
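Editorial sketch (not part of the patch): the rule described in the commit message can be restated as a small predicate over the offset from the root directory inode. This is only an illustration; sketch_is_static_inode is a hypothetical name, and the has_metadir parameter stands in for the result of xfs_has_metadir(mp).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not part of xfs_repair: report whether the inode at
 * the given offset from the root directory inode has a statically allocated
 * inode number that should always be marked in use.
 */
static bool
sketch_is_static_inode(
	int	offset,
	bool	has_metadir)
{
	assert(offset >= 0);

	switch (offset) {
	case 0:		/* root dir */
		return true;
	case 1:		/* rt bitmap, or metadata dir root */
		return true;
	case 2:		/* rt summary, only without metadir */
		return !has_metadir;
	default:
		return false;
	}
}
```

With metadir enabled only offsets 0 and 1 are pinned, which matches the three set_inode_used() calls in the hunk above.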
* [PATCH 42/46] xfs_repair: drop all the metadata directory files during pass 4 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (35 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 39/46] xfs_repair: adjust keep_fsinos to handle metadata directories Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 38/46] xfs_repair: mark space used by metadata files Darrick J. Wong ` (8 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Drop the entire metadata directory tree during pass 4 so that we can reinitialize the entire tree in phase 6. The existing metadata files (rtbitmap, rtsummary, quotas) will be reattached to the newly rebuilt directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dino_chunks.c | 9 +++++++++ repair/dinode.c | 14 +++++++++++++- repair/phase6.c | 21 +++++++++++---------- repair/scan.c | 2 +- 4 files changed, 34 insertions(+), 12 deletions(-) diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c index fb2bca66a47..382196cc170 100644 --- a/repair/dino_chunks.c +++ b/repair/dino_chunks.c @@ -952,6 +952,15 @@ process_inode_chunk( clear_inode_isadir(ino_rec, irec_offset); } + /* + * We always reinitialize the rt bitmap and summary inodes if + * the metadata directory feature is enabled. 
+ */ + if (xfs_has_metadir(mp) && !no_modify) { + need_rbmino = -1; + need_rsumino = -1; + } + if (status) { if (mp->m_sb.sb_rootino == ino) { need_root_inode = 1; diff --git a/repair/dinode.c b/repair/dinode.c index 4efc7fe6b8b..5c1f07d5bc1 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -653,7 +653,7 @@ _("illegal state %d in block map %" PRIu64 "\n"), break; } } - if (collect_rmaps) /* && !check_dups */ + if (collect_rmaps && !zap_metadata) /* && !check_dups */ rmap_add_rec(mp, ino, whichfork, &irec); *tot += irec.br_blockcount; } @@ -3077,6 +3077,18 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "), */ *dirty += process_check_inode_nlink_version(dino, lino); + /* + * The entire metadata directory tree will be rebuilt during phase 6. + * Therefore, if we're at the end of phase 4 and this is a metadata + * file, zero the ondisk inode and the incore state. + */ + if (check_dups && zap_metadata && !no_modify) { + clear_dinode(mp, dino, lino); + *dirty += 1; + *used = is_free; + *isa_dir = 0; + } + return retval; clear_bad_out: diff --git a/repair/phase6.c b/repair/phase6.c index c440c2293d1..964342c31d6 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -3638,20 +3638,20 @@ phase6(xfs_mount_t *mp) } } - if (need_metadir_inode) { - if (!no_modify) { + if (!no_modify && xfs_has_metadir(mp)) { + if (need_metadir_inode) do_warn(_("reinitializing metadata root directory\n")); - mk_metadir(mp); - need_metadir_inode = false; - need_metadir_dotdot = 0; - } else { - do_warn(_("would reinitialize metadata root directory\n")); - } + mk_metadir(mp); + need_metadir_inode = false; + need_metadir_dotdot = 0; + } else if (need_metadir_inode) { + do_warn(_("would reinitialize metadata root directory\n")); } if (need_rbmino) { if (!no_modify) { - do_warn(_("reinitializing realtime bitmap inode\n")); + if (need_rbmino > 0) + do_warn(_("reinitializing realtime bitmap inode\n")); mk_rbmino(mp); need_rbmino = 0; } else { @@ -3661,7 +3661,8 @@ phase6(xfs_mount_t *mp) if 
(need_rsumino) { if (!no_modify) { - do_warn(_("reinitializing realtime summary inode\n")); + if (need_rsumino > 0) + do_warn(_("reinitializing realtime summary inode\n")); mk_rsumino(mp); need_rsumino = 0; } else { diff --git a/repair/scan.c b/repair/scan.c index ef78b4cce50..1f5db1c11ca 100644 --- a/repair/scan.c +++ b/repair/scan.c @@ -427,7 +427,7 @@ _("bad state %d, inode %" PRIu64 " bmap block 0x%" PRIx64 "\n"), numrecs = be16_to_cpu(block->bb_numrecs); /* Record BMBT blocks in the reverse-mapping data. */ - if (check_dups && collect_rmaps) { + if (check_dups && collect_rmaps && !zap_metadata) { agno = XFS_FSB_TO_AGNO(mp, bno); pthread_mutex_lock(&ag_locks[agno].lock); rmap_add_bmbt_rec(mp, ino, whichfork, bno); ^ permalink raw reply related [flat|nested] 565+ messages in thread
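Editorial sketch (not part of the patch): after this change need_rbmino and need_rsumino appear to act as three-state flags, where a negative value means "reinitialize quietly because metadir always rebuilds these inodes" and a positive value means "reinitialize and warn because the inode was actually bad". The standalone code below illustrates that reading. The enum, the function name, and the assumption that the flag only takes the values -1, 0 and 1 are all hypothetical; the "would reinitialize" case is inferred from the else branches visible in the phase6() hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcomes for one realtime metadata inode in phase 6. */
enum sketch_action {
	SKETCH_NOTHING,		/* inode is fine, leave it alone */
	SKETCH_REINIT_QUIET,	/* rebuild without a warning */
	SKETCH_REINIT_WARN,	/* warn, then rebuild */
	SKETCH_WOULD_REINIT,	/* no-modify mode: only report */
};

/*
 * Decide what to do with a realtime metadata inode.
 * Assumption: need is -1, 0 or 1, mirroring need_rbmino/need_rsumino.
 */
static enum sketch_action
sketch_rtino_action(
	int	need,
	bool	no_modify)
{
	assert(need >= -1 && need <= 1);

	if (!need)
		return SKETCH_NOTHING;
	if (no_modify)
		return SKETCH_WOULD_REINIT;
	return need > 0 ? SKETCH_REINIT_WARN : SKETCH_REINIT_QUIET;
}
```

Note that the dino_chunks.c hunk only stores -1 when no_modify is not set, so in the patch itself the quiet value should never reach the no-modify branch.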
* [PATCH 38/46] xfs_repair: mark space used by metadata files 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (36 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 42/46] xfs_repair: drop all the metadata directory files during pass 4 Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 43/46] xfs_repair: truncate and unmark orphaned metadata inodes Darrick J. Wong ` (7 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Track space used by metadata files as a separate incore extent type. This ensures that we can warn about cross-linked metadata files, even though we are going to rebuild the entire metadata directory tree in the end. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dino_chunks.c | 31 +++++++++++++ repair/dinode.c | 121 ++++++++++++++++++++++++++++++++++++++------------ repair/dinode.h | 6 ++ repair/incore.h | 39 ++++++++++------ repair/incore_ino.c | 2 - repair/phase4.c | 2 + repair/scan.c | 30 +++++++++++- 7 files changed, 179 insertions(+), 52 deletions(-) diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c index 3de0c24b1d8..fb2bca66a47 100644 --- a/repair/dino_chunks.c +++ b/repair/dino_chunks.c @@ -141,6 +141,16 @@ verify_inode_chunk(xfs_mount_t *mp, _("uncertain inode block %d/%d already known\n"), agno, agbno); break; + case XR_E_METADATA: + /* + * Files in the metadata directory tree are always + * reconstructed, so it's ok to let go if this block + * is also a valid inode cluster. 
+ */ + do_warn( + _("inode block %d/%d claimed by metadata file\n"), + agno, agbno); + fallthrough; case XR_E_UNKNOWN: case XR_E_FREE1: case XR_E_FREE: @@ -430,6 +440,7 @@ verify_inode_chunk(xfs_mount_t *mp, set_bmap_ext(agno, cur_agbno, blen, XR_E_MULT); pthread_mutex_unlock(&ag_locks[agno].lock); return 0; + case XR_E_METADATA: case XR_E_INO: do_error( _("uncertain inode block overlap, agbno = %d, ino = %" PRIu64 "\n"), @@ -474,6 +485,16 @@ verify_inode_chunk(xfs_mount_t *mp, _("uncertain inode block %" PRIu64 " already known\n"), XFS_AGB_TO_FSB(mp, agno, cur_agbno)); break; + case XR_E_METADATA: + /* + * Files in the metadata directory tree are always + * reconstructed, so it's ok to let go if this block + * is also a valid inode cluster. + */ + do_warn( + _("inode block %d/%d claimed by metadata file\n"), + agno, agbno); + fallthrough; case XR_E_UNKNOWN: case XR_E_FREE1: case XR_E_FREE: @@ -559,6 +580,16 @@ process_inode_agbno_state( switch (state) { case XR_E_INO: /* already marked */ break; + case XR_E_METADATA: + /* + * Files in the metadata directory tree are always + * reconstructed, so it's ok to let go if this block is also a + * valid inode cluster. + */ + do_warn( + _("inode block %d/%d claimed by metadata file\n"), + agno, agbno); + fallthrough; case XR_E_UNKNOWN: case XR_E_FREE: case XR_E_FREE1: diff --git a/repair/dinode.c b/repair/dinode.c index 4e402d1bd59..4efc7fe6b8b 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -220,6 +220,7 @@ static int process_rt_rec_state( struct xfs_mount *mp, xfs_ino_t ino, + bool zap_metadata, struct xfs_bmbt_irec *irec) { xfs_fsblock_t b = irec->br_startblock; @@ -253,7 +254,13 @@ _("data fork in rt inode %" PRIu64 " found invalid rt extent %"PRIu64" state %d switch (state) { case XR_E_FREE: case XR_E_UNKNOWN: - set_rtbmap(ext, XR_E_INUSE); + set_rtbmap(ext, zap_metadata ? 
XR_E_METADATA : + XR_E_INUSE); + break; + case XR_E_METADATA: + do_error( +_("data fork in rt inode %" PRIu64 " found metadata file block %" PRIu64 " in rt bmap\n"), + ino, ext); break; case XR_E_BAD_STATE: do_error( @@ -290,7 +297,8 @@ process_rt_rec( struct xfs_bmbt_irec *irec, xfs_ino_t ino, xfs_rfsblock_t *tot, - int check_dups) + int check_dups, + bool zap_metadata) { xfs_fsblock_t lastb; int bad; @@ -330,7 +338,7 @@ _("inode %" PRIu64 " - bad rt extent overflows - start %" PRIu64 ", " if (check_dups) bad = process_rt_rec_dups(mp, ino, irec); else - bad = process_rt_rec_state(mp, ino, irec); + bad = process_rt_rec_state(mp, ino, zap_metadata, irec); if (bad) return bad; @@ -361,7 +369,8 @@ process_bmbt_reclist_int( xfs_fileoff_t *first_key, xfs_fileoff_t *last_key, int check_dups, - int whichfork) + int whichfork, + bool zap_metadata) { xfs_bmbt_irec_t irec; xfs_filblks_t cp = 0; /* prev count */ @@ -440,7 +449,8 @@ _("zero length extent (off = %" PRIu64 ", fsbno = %" PRIu64 ") in ino %" PRIu64 if (type == XR_INO_RTDATA && whichfork == XFS_DATA_FORK) { pthread_mutex_lock(&rt_lock.lock); - error2 = process_rt_rec(mp, &irec, ino, tot, check_dups); + error2 = process_rt_rec(mp, &irec, ino, tot, check_dups, + zap_metadata); pthread_mutex_unlock(&rt_lock.lock); if (error2) return error2; @@ -555,6 +565,11 @@ _("%s fork in ino %" PRIu64 " claims free block %" PRIu64 "\n"), case XR_E_INUSE_FS1: do_warn(_("rmap claims metadata use!\n")); fallthrough; + case XR_E_METADATA: + do_warn( +_("%s fork in inode %" PRIu64 " claims metadata file block %" PRIu64 "\n"), + forkname, ino, b); + break; case XR_E_FS_MAP: case XR_E_INO: case XR_E_INUSE_FS: @@ -611,15 +626,28 @@ _("illegal state %d in block map %" PRIu64 "\n"), for (; agbno < ebno; agbno += blen) { state = get_bmap_ext(agno, agbno, ebno, &blen); switch (state) { + case XR_E_METADATA: + /* + * The entire metadata directory tree is rebuilt + * every time, so we can let regular files take + * ownership of this block. 
+ */ + if (zap_metadata) + break; + fallthrough; case XR_E_FREE: case XR_E_FREE1: case XR_E_INUSE1: case XR_E_UNKNOWN: - set_bmap_ext(agno, agbno, blen, XR_E_INUSE); + set_bmap_ext(agno, agbno, blen, zap_metadata ? + XR_E_METADATA : XR_E_INUSE); break; + case XR_E_INUSE: case XR_E_MULT: - set_bmap_ext(agno, agbno, blen, XR_E_MULT); + if (!zap_metadata) + set_bmap_ext(agno, agbno, blen, + XR_E_MULT); break; default: break; @@ -658,10 +686,12 @@ process_bmbt_reclist( blkmap_t **blkmapp, xfs_fileoff_t *first_key, xfs_fileoff_t *last_key, - int whichfork) + int whichfork, + bool zap_metadata) { return process_bmbt_reclist_int(mp, rp, numrecs, type, ino, tot, - blkmapp, first_key, last_key, 0, whichfork); + blkmapp, first_key, last_key, 0, whichfork, + zap_metadata); } /* @@ -676,13 +706,15 @@ scan_bmbt_reclist( int type, xfs_ino_t ino, xfs_rfsblock_t *tot, - int whichfork) + int whichfork, + bool zap_metadata) { xfs_fileoff_t first_key = 0; xfs_fileoff_t last_key = 0; return process_bmbt_reclist_int(mp, rp, numrecs, type, ino, tot, - NULL, &first_key, &last_key, 1, whichfork); + NULL, &first_key, &last_key, 1, whichfork, + zap_metadata); } /* @@ -757,7 +789,8 @@ process_btinode( xfs_extnum_t *nex, blkmap_t **blkmapp, int whichfork, - int check_dups) + int check_dups, + bool zap_metadata) { xfs_bmdr_block_t *dib; xfs_fileoff_t last_key; @@ -836,8 +869,8 @@ _("bad bmap btree ptr 0x%" PRIx64 " in ino %" PRIu64 "\n"), if (scan_lbtree(get_unaligned_be64(&pp[i]), level, scan_bmapbt, type, whichfork, lino, tot, nex, blkmapp, - &cursor, 1, check_dups, magic, NULL, - &xfs_bmbt_buf_ops)) + &cursor, 1, check_dups, magic, + (void *)zap_metadata, &xfs_bmbt_buf_ops)) return(1); /* * fix key (offset) mismatches between the keys in root @@ -932,7 +965,8 @@ process_exinode( xfs_extnum_t *nex, blkmap_t **blkmapp, int whichfork, - int check_dups) + int check_dups, + bool zap_metadata) { xfs_ino_t lino; xfs_bmbt_rec_t *rp; @@ -966,10 +1000,10 @@ process_exinode( if (check_dups == 0) ret = 
process_bmbt_reclist(mp, rp, &numrecs, type, lino, tot, blkmapp, &first_key, &last_key, - whichfork); + whichfork, zap_metadata); else ret = scan_bmbt_reclist(mp, rp, &numrecs, type, lino, tot, - whichfork); + whichfork, zap_metadata); *nex = numrecs; return ret; @@ -1895,7 +1929,8 @@ process_inode_data_fork( xfs_extnum_t *nextents, blkmap_t **dblkmap, int check_dups, - struct xfs_buf **ino_bpp) + struct xfs_buf **ino_bpp, + bool zap_metadata) { struct xfs_dinode *dino = *dinop; xfs_ino_t lino = XFS_AGINO_TO_INO(mp, agno, ino); @@ -1936,14 +1971,14 @@ process_inode_data_fork( try_rebuild = 1; err = process_exinode(mp, agno, ino, dino, type, dirty, totblocks, nextents, dblkmap, XFS_DATA_FORK, - check_dups); + check_dups, zap_metadata); break; case XFS_DINODE_FMT_BTREE: if (!rmapbt_suspect && try_rebuild == -1) try_rebuild = 1; err = process_btinode(mp, agno, ino, dino, type, dirty, totblocks, nextents, dblkmap, XFS_DATA_FORK, - check_dups); + check_dups, zap_metadata); break; case XFS_DINODE_FMT_DEV: err = 0; @@ -1996,12 +2031,12 @@ _("would have tried to rebuild inode %"PRIu64" data fork\n"), case XFS_DINODE_FMT_EXTENTS: err = process_exinode(mp, agno, ino, dino, type, dirty, totblocks, nextents, dblkmap, - XFS_DATA_FORK, 0); + XFS_DATA_FORK, 0, zap_metadata); break; case XFS_DINODE_FMT_BTREE: err = process_btinode(mp, agno, ino, dino, type, dirty, totblocks, nextents, dblkmap, - XFS_DATA_FORK, 0); + XFS_DATA_FORK, 0, zap_metadata); break; case XFS_DINODE_FMT_DEV: err = 0; @@ -2036,7 +2071,8 @@ process_inode_attr_fork( int check_dups, int extra_attr_check, int *retval, - struct xfs_buf **ino_bpp) + struct xfs_buf **ino_bpp, + bool zap_metadata) { xfs_ino_t lino = XFS_AGINO_TO_INO(mp, agno, ino); struct xfs_dinode *dino = *dinop; @@ -2078,7 +2114,7 @@ process_inode_attr_fork( *anextents = 0; err = process_exinode(mp, agno, ino, dino, type, dirty, atotblocks, anextents, &ablkmap, - XFS_ATTR_FORK, check_dups); + XFS_ATTR_FORK, check_dups, zap_metadata); break; case 
XFS_DINODE_FMT_BTREE: if (!rmapbt_suspect && try_rebuild == -1) @@ -2087,7 +2123,7 @@ process_inode_attr_fork( *anextents = 0; err = process_btinode(mp, agno, ino, dino, type, dirty, atotblocks, anextents, &ablkmap, - XFS_ATTR_FORK, check_dups); + XFS_ATTR_FORK, check_dups, zap_metadata); break; default: do_warn(_("illegal attribute format %d, ino %" PRIu64 "\n"), @@ -2152,12 +2188,12 @@ _("would have tried to rebuild inode %"PRIu64" attr fork or cleared it\n"), case XFS_DINODE_FMT_EXTENTS: err = process_exinode(mp, agno, ino, dino, type, dirty, atotblocks, anextents, - &ablkmap, XFS_ATTR_FORK, 0); + &ablkmap, XFS_ATTR_FORK, 0, zap_metadata); break; case XFS_DINODE_FMT_BTREE: err = process_btinode(mp, agno, ino, dino, type, dirty, atotblocks, anextents, - &ablkmap, XFS_ATTR_FORK, 0); + &ablkmap, XFS_ATTR_FORK, 0, zap_metadata); break; default: do_error(_("illegal attribute fmt %d, ino %" PRIu64 "\n"), @@ -2355,6 +2391,7 @@ process_dinode_int( xfs_agino_t unlinked_ino; struct xfs_perag *pag; bool is_meta = false; + bool zap_metadata = false; *dirty = *isa_dir = 0; *used = is_used; @@ -2937,6 +2974,32 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "), off = get_inode_offset(mp, lino, irec); set_inode_is_meta(irec, off); is_meta = true; + + /* + * We always rebuild the metadata directory tree during phase + * 6, so we use this flag to get all the directory blocks + * marked as free, and any other metadata files whose contents + * we don't want to save. + * + * Currently, there are no metadata files that use xattrs, so + * we always drop the xattr blocks of metadata files. + */ + switch (type) { + case XR_INO_RTBITMAP: + case XR_INO_RTSUM: + case XR_INO_UQUOTA: + case XR_INO_GQUOTA: + case XR_INO_PQUOTA: + /* + * This inode was recognized as being filesystem + * metadata, so preserve the inode and its contents for + * later checking and repair. 
+ */ + break; + default: + zap_metadata = true; + break; + } } /* @@ -2944,7 +3007,7 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "), */ if (process_inode_data_fork(mp, agno, ino, dinop, type, dirty, &totblocks, &nextents, &dblkmap, check_dups, - ino_bpp) != 0) + ino_bpp, zap_metadata) != 0) goto bad_out; dino = *dinop; @@ -2954,7 +3017,7 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "), */ if (process_inode_attr_fork(mp, agno, ino, dinop, type, dirty, &atotblocks, &anextents, check_dups, extra_attr_check, - &retval, ino_bpp)) + &retval, ino_bpp, is_meta)) goto bad_out; dino = *dinop; diff --git a/repair/dinode.h b/repair/dinode.h index 92df83da621..ed2ec4ca238 100644 --- a/repair/dinode.h +++ b/repair/dinode.h @@ -27,7 +27,8 @@ process_bmbt_reclist(xfs_mount_t *mp, struct blkmap **blkmapp, uint64_t *first_key, uint64_t *last_key, - int whichfork); + int whichfork, + bool zap_metadata); int scan_bmbt_reclist( @@ -37,7 +38,8 @@ scan_bmbt_reclist( int type, xfs_ino_t ino, xfs_rfsblock_t *tot, - int whichfork); + int whichfork, + bool zap_metadata); void update_rootino(xfs_mount_t *mp); diff --git a/repair/incore.h b/repair/incore.h index 0027593ae31..53609f683af 100644 --- a/repair/incore.h +++ b/repair/incore.h @@ -85,18 +85,25 @@ typedef struct rt_extent_tree_node { #define XR_E_UNKNOWN 0 /* unknown state */ #define XR_E_FREE1 1 /* free block (marked by one fs space tree) */ #define XR_E_FREE 2 /* free block (marked by both fs space trees) */ -#define XR_E_INUSE 3 /* extent used by file/dir data or metadata */ -#define XR_E_INUSE_FS 4 /* extent used by fs ag header or log */ -#define XR_E_MULT 5 /* extent is multiply referenced */ -#define XR_E_INO 6 /* extent used by inodes (inode blocks) */ -#define XR_E_FS_MAP 7 /* extent used by fs space/inode maps */ -#define XR_E_INUSE1 8 /* used block (marked by rmap btree) */ -#define XR_E_INUSE_FS1 9 /* used by fs ag header or log (rmap btree) */ -#define XR_E_INO1 10 /* used by inodes (marked by rmap btree) */ 
-#define XR_E_FS_MAP1 11 /* used by fs space/inode maps (rmap btree) */ -#define XR_E_REFC 12 /* used by fs ag reference count btree */ -#define XR_E_COW 13 /* leftover cow extent */ -#define XR_E_BAD_STATE 14 +/* + * Space used by metadata files. The entire metadata directory tree will be + * rebuilt from scratch during phase 6, so this value must be less than + * XR_E_INUSE so that the space will go back to the free space btrees during + * phase 5. + */ +#define XR_E_METADATA 3 +#define XR_E_INUSE 4 /* extent used by file/dir data or metadata */ +#define XR_E_INUSE_FS 5 /* extent used by fs ag header or log */ +#define XR_E_MULT 6 /* extent is multiply referenced */ +#define XR_E_INO 7 /* extent used by inodes (inode blocks) */ +#define XR_E_FS_MAP 8 /* extent used by fs space/inode maps */ +#define XR_E_INUSE1 9 /* used block (marked by rmap btree) */ +#define XR_E_INUSE_FS1 10 /* used by fs ag header or log (rmap btree) */ +#define XR_E_INO1 11 /* used by inodes (marked by rmap btree) */ +#define XR_E_FS_MAP1 12 /* used by fs space/inode maps (rmap btree) */ +#define XR_E_REFC 13 /* used by fs ag reference count btree */ +#define XR_E_COW 14 /* leftover cow extent */ +#define XR_E_BAD_STATE 15 /* separate state bit, OR'ed into high (4th) bit of ex_state field */ @@ -274,7 +281,7 @@ typedef struct ino_tree_node { uint64_t ino_isa_dir; /* bit == 1 if a directory */ uint64_t ino_was_rl; /* bit == 1 if reflink flag set */ uint64_t ino_is_rl; /* bit == 1 if reflink flag should be set */ - uint64_t ino_was_meta; /* bit == 1 if metadata */ + uint64_t ino_is_meta; /* bit == 1 if metadata */ uint8_t nlink_size; union ino_nlink disk_nlinks; /* on-disk nlinks, set in P3 */ union { @@ -547,17 +554,17 @@ static inline int inode_is_rl(struct ino_tree_node *irec, int offset) */ static inline void set_inode_is_meta(struct ino_tree_node *irec, int offset) { - irec->ino_was_meta |= IREC_MASK(offset); + irec->ino_is_meta |= IREC_MASK(offset); } static inline void 
clear_inode_is_meta(struct ino_tree_node *irec, int offset) { - irec->ino_was_meta &= ~IREC_MASK(offset); + irec->ino_is_meta &= ~IREC_MASK(offset); } static inline int inode_is_meta(struct ino_tree_node *irec, int offset) { - return (irec->ino_was_meta & IREC_MASK(offset)) != 0; + return (irec->ino_is_meta & IREC_MASK(offset)) != 0; } /* diff --git a/repair/incore_ino.c b/repair/incore_ino.c index ef74e64f308..fc1de77141b 100644 --- a/repair/incore_ino.c +++ b/repair/incore_ino.c @@ -253,7 +253,7 @@ alloc_ino_node( irec->ino_isa_dir = 0; irec->ino_was_rl = 0; irec->ino_is_rl = 0; - irec->ino_was_meta = 0; + irec->ino_is_meta = 0; irec->ir_free = (xfs_inofree_t) - 1; irec->ir_sparse = 0; irec->ino_un.ex_data = NULL; diff --git a/repair/phase4.c b/repair/phase4.c index fdc5d777be4..5721647863a 100644 --- a/repair/phase4.c +++ b/repair/phase4.c @@ -303,6 +303,7 @@ phase4(xfs_mount_t *mp) _("unknown block state, ag %d, blocks %u-%u\n"), i, j, j + blen - 1); fallthrough; + case XR_E_METADATA: case XR_E_UNKNOWN: case XR_E_FREE: case XR_E_INUSE: @@ -335,6 +336,7 @@ phase4(xfs_mount_t *mp) _("unknown rt extent state, extent %" PRIu64 "\n"), bno); fallthrough; + case XR_E_METADATA: case XR_E_UNKNOWN: case XR_E_FREE1: case XR_E_FREE: diff --git a/repair/scan.c b/repair/scan.c index 42b37dd22ec..ef78b4cce50 100644 --- a/repair/scan.c +++ b/repair/scan.c @@ -227,6 +227,7 @@ scan_bmapbt( xfs_agnumber_t agno; xfs_agblock_t agbno; int state; + bool zap_metadata = priv != NULL; /* * unlike the ag freeblock btrees, if anything looks wrong @@ -352,7 +353,20 @@ _("bad back (left) sibling pointer (saw %llu should be NULL (0))\n" case XR_E_UNKNOWN: case XR_E_FREE1: case XR_E_FREE: - set_bmap(agno, agbno, XR_E_INUSE); + set_bmap(agno, agbno, zap_metadata ? XR_E_METADATA : + XR_E_INUSE); + break; + case XR_E_METADATA: + /* + * bmbt block already claimed by a metadata file. We + * always reconstruct the entire metadata tree, so if + * this is a regular file we mark it owned by the file. 
+ */ + do_warn( +_("inode 0x%" PRIx64 "bmap block 0x%" PRIx64 " claimed by metadata file\n"), + ino, bno); + if (!zap_metadata) + set_bmap(agno, agbno, XR_E_INUSE); break; case XR_E_FS_MAP: case XR_E_INUSE: @@ -364,7 +378,8 @@ _("bad back (left) sibling pointer (saw %llu should be NULL (0))\n" * we made it here, the block probably * contains btree data. */ - set_bmap(agno, agbno, XR_E_MULT); + if (!zap_metadata) + set_bmap(agno, agbno, XR_E_MULT); do_warn( _("inode 0x%" PRIx64 "bmap block 0x%" PRIx64 " claimed, state is %d\n"), ino, bno, state); @@ -438,7 +453,8 @@ _("inode %" PRIu64 " bad # of bmap records (%" PRIu64 ", min - %u, max - %u)\n") if (check_dups == 0) { err = process_bmbt_reclist(mp, rp, &numrecs, type, ino, tot, blkmapp, &first_key, - &last_key, whichfork); + &last_key, whichfork, + zap_metadata); if (err) return 1; @@ -468,7 +484,7 @@ _("out-of-order bmap key (file offset) in inode %" PRIu64 ", %s fork, fsbno %" P return 0; } else { return scan_bmbt_reclist(mp, rp, &numrecs, type, ino, - tot, whichfork); + tot, whichfork, zap_metadata); } } if (numrecs > mp->m_bmap_dmxr[1] || (isroot == 0 && numrecs < @@ -858,6 +874,12 @@ process_rmap_rec( break; } break; + case XR_E_METADATA: + do_warn( +_("Metadata file block (%d,%d-%d) mismatch in %s tree, state - %d,%" PRIx64 "\n"), + agno, b, b + blen - 1, + name, state, owner); + break; case XR_E_INUSE_FS: if (owner == XFS_RMAP_OWN_FS || owner == XFS_RMAP_OWN_LOG) ^ permalink raw reply related [flat|nested] 565+ messages in thread
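The extent-state handling in the patch above can be sketched as a small pure function. This is only an illustration under stated assumptions: the state numbers are copied from the patched repair/incore.h, the transitions mirror the non-dup-checking switch in process_bmbt_reclist_int, and the sketch_* names are hypothetical and not part of xfs_repair. The "reclaimable" test encodes the header comment's statement that XR_E_METADATA must sort below XR_E_INUSE; how phase 5 actually evaluates states is not shown in this patch.

```c
#include <stdbool.h>

/* Subset of the block states, numbered as in the patched repair/incore.h. */
enum sketch_state {
	XR_E_UNKNOWN	= 0,
	XR_E_FREE1	= 1,
	XR_E_FREE	= 2,
	XR_E_METADATA	= 3,	/* space used by metadata files */
	XR_E_INUSE	= 4,
	XR_E_MULT	= 6,
	XR_E_INUSE1	= 9,
};

/*
 * Hypothetical helper: per the incore.h comment, anything numbered below
 * XR_E_INUSE is expected to go back to the free space btrees.
 */
static bool
sketch_state_is_reclaimable(enum sketch_state state)
{
	return state < XR_E_INUSE;
}

/*
 * Hypothetical helper: the state a block ends up in after a fork mapping
 * claims it.  zap_metadata is true when the claiming file is a metadata
 * file whose contents will be thrown away and rebuilt.
 */
static enum sketch_state
sketch_claim_state(enum sketch_state cur, bool zap_metadata)
{
	switch (cur) {
	case XR_E_METADATA:
		/* A second metadata file changes nothing. */
		if (zap_metadata)
			return cur;
		/* A regular file takes ownership of the block. */
		return XR_E_INUSE;
	case XR_E_FREE:
	case XR_E_FREE1:
	case XR_E_INUSE1:
	case XR_E_UNKNOWN:
		return zap_metadata ? XR_E_METADATA : XR_E_INUSE;
	case XR_E_INUSE:
	case XR_E_MULT:
		/* Only regular files escalate to multiply-referenced. */
		return zap_metadata ? cur : XR_E_MULT;
	default:
		return cur;
	}
}
```

The point of the ordering is that a block claimed only by discarded metadata files never reaches XR_E_INUSE, so it stays on the "free" side of the comparison.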
* [PATCH 43/46] xfs_repair: truncate and unmark orphaned metadata inodes 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (37 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 38/46] xfs_repair: mark space used by metadata files Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 36/46] xfs_repair: update incore metadata state whenever we create new files Darrick J. Wong ` (6 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> If an inode claims to be a metadata inode but wasn't linked in either directory tree, remove the attr fork and reset the data fork if the contents weren't regular extent mappings before moving the inode to the lost+found. We don't ifree the inode, because it's possible that the inode was not actually a metadata inode but simply got corrupted due to bitflips or something, and we'd rather let the sysadmin examine what's left of the file instead of photorec'ing it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/phase6.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) diff --git a/repair/phase6.c b/repair/phase6.c index 964342c31d6..13094730407 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -1220,6 +1220,53 @@ mk_orphanage( return(ino); } +/* Don't let metadata inode contents leak to lost+found. 
*/ +static void +trunc_metadata_inode( + struct xfs_inode *ip) +{ + struct xfs_trans *tp; + struct xfs_mount *mp = ip->i_mount; + int err; + + err = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_ichange, 0, 0, 0, &tp); + if (err) + do_error( + _("space reservation failed (%d), filesystem may be out of space\n"), + err); + + libxfs_trans_ijoin(tp, ip, 0); + ip->i_diflags2 &= ~XFS_DIFLAG2_METADATA; + + switch (VFS_I(ip)->i_mode & S_IFMT) { + case S_IFIFO: + case S_IFCHR: + case S_IFBLK: + case S_IFSOCK: + ip->i_df.if_format = XFS_DINODE_FMT_DEV; + break; + case S_IFREG: + switch (ip->i_df.if_format) { + case XFS_DINODE_FMT_EXTENTS: + case XFS_DINODE_FMT_BTREE: + break; + default: + ip->i_df.if_format = XFS_DINODE_FMT_EXTENTS; + ip->i_df.if_nextents = 0; + break; + } + break; + } + + libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + + err = -libxfs_trans_commit(tp); + if (err) + do_error( + _("truncation of metadata inode 0x%llx failed, err=%d\n"), + (unsigned long long)ip->i_ino, err); +} + /* * move a file to the orphange. */ @@ -1262,6 +1309,9 @@ mv_orphanage( if (err) do_error(_("%d - couldn't iget disconnected inode\n"), err); + if (xfs_is_metadata_inode(ino_p)) + trunc_metadata_inode(ino_p); + xname.type = libxfs_mode_to_ftype(VFS_I(ino_p)->i_mode); if (isa_dir) { ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 36/46] xfs_repair: update incore metadata state whenever we create new files 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (38 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 43/46] xfs_repair: truncate and unmark orphaned metadata inodes Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 40/46] xfs_repair: metadata dirs are never plausible root dirs Darrick J. Wong ` (5 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make sure that we update our incore metadata inode bookkeeping whenever we create new metadata files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/phase6.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/repair/phase6.c b/repair/phase6.c index 3e740079235..b3ad4074ff8 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -514,6 +514,24 @@ mark_ino_inuse( set_inode_isadir(irec, ino_offset); } +/* + * Mark a newly allocated inode as metadata in the incore bitmap. Callers + * must have already called mark_ino_inuse to ensure there is an incore record. + */ +static void +mark_ino_metadata( + struct xfs_mount *mp, + xfs_ino_t ino) +{ + struct ino_tree_node *irec; + int ino_offset; + + irec = find_inode_rec(mp, XFS_INO_TO_AGNO(mp, ino), + XFS_INO_TO_AGINO(mp, ino)); + ino_offset = get_inode_offset(mp, ino, irec); + set_inode_is_meta(irec, ino_offset); +} + /* Make sure this metadata directory path exists. 
*/ static int ensure_imeta_dirpath( @@ -547,6 +565,7 @@ ensure_imeta_dirpath( if (ino == NULLFSINO) return ENOENT; mark_ino_inuse(mp, ino, S_IFDIR, parent); + mark_ino_metadata(mp, ino); parent = ino; } @@ -663,6 +682,7 @@ _("couldn't create new metadata inode, error %d\n"), error); mark_ino_inuse(mp, (*ipp)->i_ino, S_IFREG, lookup_imeta_path_dirname(mp, path)); + mark_ino_metadata(mp, (*ipp)->i_ino); return 0; } @@ -1065,6 +1085,7 @@ mk_metadir( libxfs_trans_ijoin(tp, mp->m_metadirip, 0); libxfs_imeta_set_metaflag(tp, mp->m_metadirip); + mark_ino_metadata(mp, mp->m_metadirip->i_ino); error = -libxfs_trans_commit(tp); if (error) ^ permalink raw reply related [flat|nested] 565+ messages in thread
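For readers unfamiliar with the incore inode records, the per-chunk "is metadata" bitmap that mark_ino_metadata updates can be sketched as below. This is a simplified illustration, assuming one 64-bit word per inode chunk and a mask of the form 1 << offset; the sketch_* names are hypothetical, and the real IREC_MASK definition lives in repair/incore.h and is not reproduced in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: one bit per inode, 64 inodes per record. */
#define SKETCH_INODES_PER_CHUNK	64
#define SKETCH_MASK(off)	((uint64_t)1 << (off))

/* Hypothetical cut-down version of struct ino_tree_node. */
struct sketch_irec {
	uint64_t	ino_is_meta;	/* bit == 1 if metadata */
};

/* Flag the inode at this offset within the chunk as a metadata inode. */
static inline void
sketch_set_is_meta(struct sketch_irec *irec, int offset)
{
	assert(offset >= 0 && offset < SKETCH_INODES_PER_CHUNK);
	irec->ino_is_meta |= SKETCH_MASK(offset);
}

/* Drop the metadata flag for the inode at this offset. */
static inline void
sketch_clear_is_meta(struct sketch_irec *irec, int offset)
{
	assert(offset >= 0 && offset < SKETCH_INODES_PER_CHUNK);
	irec->ino_is_meta &= ~SKETCH_MASK(offset);
}

/* Return 1 if the inode at this offset is flagged as metadata, else 0. */
static inline int
sketch_is_meta(const struct sketch_irec *irec, int offset)
{
	assert(offset >= 0 && offset < SKETCH_INODES_PER_CHUNK);
	return (irec->ino_is_meta & SKETCH_MASK(offset)) != 0;
}
```

In the patch, the record and offset come from find_inode_rec and get_inode_offset, which is why the caller must already have created the incore record with mark_ino_inuse.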
* [PATCH 40/46] xfs_repair: metadata dirs are never plausible root dirs 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (39 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 36/46] xfs_repair: update incore metadata state whenever we create new files Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 37/46] xfs_repair: pass private data pointer to scan_lbtree Darrick J. Wong ` (4 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Metadata directories are never candidates to be the root of the user-accessible directory tree. Update has_plausible_rootdir to ignore them all, and to detect the case where the superblock incorrectly claims that both trees have the same root. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/xfs_repair.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c index 53d45c5b189..fe3fe341530 100644 --- a/repair/xfs_repair.c +++ b/repair/xfs_repair.c @@ -530,9 +530,15 @@ has_plausible_rootdir( int error; bool ret = false; + if (xfs_has_metadir(mp) && + mp->m_sb.sb_rootino == mp->m_sb.sb_metadirino) + goto out; + error = -libxfs_iget(mp, NULL, mp->m_sb.sb_rootino, 0, &ip); if (error) goto out; + if (xfs_is_metadata_inode(ip)) + goto out_rele; if (!S_ISDIR(VFS_I(ip)->i_mode)) goto out_rele; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 37/46] xfs_repair: pass private data pointer to scan_lbtree 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (40 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 40/46] xfs_repair: metadata dirs are never plausible root dirs Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 41/46] xfs_repair: reattach quota inodes to metadata directory Darrick J. Wong ` (3 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Pass a private data pointer through scan_lbtree. We'll use this later when scanning the rtrmapbt to keep track of scan state. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dinode.c | 2 +- repair/scan.c | 11 +++++++---- repair/scan.h | 7 +++++-- 3 files changed, 13 insertions(+), 7 deletions(-) diff --git a/repair/dinode.c b/repair/dinode.c index eae64a0556f..4e402d1bd59 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -836,7 +836,7 @@ _("bad bmap btree ptr 0x%" PRIx64 " in ino %" PRIu64 "\n"), if (scan_lbtree(get_unaligned_be64(&pp[i]), level, scan_bmapbt, type, whichfork, lino, tot, nex, blkmapp, - &cursor, 1, check_dups, magic, + &cursor, 1, check_dups, magic, NULL, &xfs_bmbt_buf_ops)) return(1); /* diff --git a/repair/scan.c b/repair/scan.c index ff51eb0a602..42b37dd22ec 100644 --- a/repair/scan.c +++ b/repair/scan.c @@ -139,7 +139,8 @@ scan_lbtree( int isroot, int check_dups, int *dirty, - uint64_t magic), + uint64_t magic, + void *priv), int type, int whichfork, xfs_ino_t ino, @@ -150,6 +151,7 @@ scan_lbtree( int isroot, int check_dups, uint64_t magic, + void *priv, const struct xfs_buf_ops *ops) { struct xfs_buf *bp; @@ -181,7 +183,7 @@ scan_lbtree( err = (*func)(XFS_BUF_TO_BLOCK(bp), nlevels - 1, type, whichfork, root, ino, tot, nex, blkmapp, bm_cursor, isroot, check_dups, &dirty, - magic); + magic, 
priv); ASSERT(dirty == 0 || (dirty && !no_modify)); @@ -210,7 +212,8 @@ scan_bmapbt( int isroot, int check_dups, int *dirty, - uint64_t magic) + uint64_t magic, + void *priv) { int i; int err; @@ -495,7 +498,7 @@ _("bad bmap btree ptr 0x%llx in ino %" PRIu64 "\n"), err = scan_lbtree(be64_to_cpu(pp[i]), level, scan_bmapbt, type, whichfork, ino, tot, nex, blkmapp, - bm_cursor, 0, check_dups, magic, + bm_cursor, 0, check_dups, magic, priv, &xfs_bmbt_buf_ops); if (err) return(1); diff --git a/repair/scan.h b/repair/scan.h index ee16362b6d3..4da788becbe 100644 --- a/repair/scan.h +++ b/repair/scan.h @@ -26,7 +26,8 @@ int scan_lbtree( int isroot, int check_dups, int *dirty, - uint64_t magic), + uint64_t magic, + void *priv), int type, int whichfork, xfs_ino_t ino, @@ -37,6 +38,7 @@ int scan_lbtree( int isroot, int check_dups, uint64_t magic, + void *priv, const struct xfs_buf_ops *ops); int scan_bmapbt( @@ -53,7 +55,8 @@ int scan_bmapbt( int isroot, int check_dups, int *dirty, - uint64_t magic); + uint64_t magic, + void *priv); void scan_ags( ^ permalink raw reply related [flat|nested] 565+ messages in thread
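The private data pointer added above is the usual C way to hand caller-owned state to a callback through a generic walker. Patch 38/46 in this series passes a single boolean by casting it into the pointer and testing priv != NULL in scan_bmapbt; the sketch below deliberately swaps in a small state structure instead, to show the more general form that the commit message anticipates for tracking scan state. Everything here (sketch_* names, the fixed fanout, the toy tree) is hypothetical and not taken from xfs_repair.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_FANOUT	2	/* toy tree: every interior node has 2 children */

/* Hypothetical per-scan state that the walker itself never interprets. */
struct sketch_scan_state {
	unsigned int	blocks_seen;
	bool		zap_metadata;
};

/* Callback type: the walker only forwards priv, it never looks inside. */
typedef int (*sketch_scan_fn)(int level, void *priv);

/* Example callback: count every block visited during the walk. */
static int
sketch_count_block(int level, void *priv)
{
	struct sketch_scan_state *st = priv;

	(void)level;
	st->blocks_seen++;
	return 0;
}

/*
 * Toy walker in the spirit of scan_lbtree: visit this block, then recurse
 * into each child, passing the same priv pointer all the way down.
 */
static int
sketch_scan_tree(int level, sketch_scan_fn fn, void *priv)
{
	int	err;
	int	i;

	err = fn(level, priv);
	if (err || level == 0)
		return err;

	for (i = 0; i < SKETCH_FANOUT; i++) {
		err = sketch_scan_tree(level - 1, fn, priv);
		if (err)
			return err;
	}
	return 0;
}
```

A structure costs one extra declaration at the call site, but it avoids overloading the pointer value itself and leaves room for more than one flag.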
* [PATCH 41/46] xfs_repair: reattach quota inodes to metadata directory 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (41 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 37/46] xfs_repair: pass private data pointer to scan_lbtree Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 45/46] mkfs.xfs: enable metadata directories Darrick J. Wong ` (2 subsequent siblings) 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> If the quota inodes came through unscathed, we should attach them to the new metadata directory so that phase 7 can run quotacheck on them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/phase6.c | 113 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 113 insertions(+) diff --git a/repair/phase6.c b/repair/phase6.c index b3ad4074ff8..c440c2293d1 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -3489,6 +3489,117 @@ update_missing_dotdot_entries( } } +static int +reattach_quota_inode( + struct xfs_mount *mp, + xfs_ino_t ino, + const struct xfs_imeta_path *path) +{ + struct xfs_imeta_update upd; + struct xfs_inode *ip; + struct xfs_trans *tp; + unsigned int resblks; + int error; + + error = ensure_imeta_dirpath(mp, path); + if (error) { + do_warn( +_("Couldn't create quota metadata directory, error %d\n"), error); + return error; + } + + error = -libxfs_imeta_start_update(mp, path, &upd); + if (error) { + do_warn( +_("Couldn't start metadata directory update -- error - %d\n"), + error); + return error; + } + + error = -libxfs_imeta_iget(mp, ino, XFS_DIR3_FT_REG_FILE, &ip); + if (error) { + do_warn( +_("Couldn't grab quota inode 0x%llx, error %d\n"), + (unsigned long long)ino, error); + goto cleanup; + } + + resblks = libxfs_imeta_create_space_res(mp); + error = -libxfs_trans_alloc(mp, 
&M_RES(mp)->tr_imeta_create, + resblks, 0, 0, &tp); + if (error) { + do_warn( +_("Couldn't allocate transaction to attach quota inode 0x%llx, error %d\n"), + (unsigned long long)ino, error); + goto rele; + } + + error = -libxfs_imeta_link(tp, path, ip, &upd); + if (error) { + do_warn( +_("Couldn't link quota inode 0x%llx, error %d\n"), + (unsigned long long)ino, error); + libxfs_trans_cancel(tp); + goto rele; + } + + set_nlink(VFS_I(ip), 1); + libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + error = -libxfs_trans_commit(tp); + if (error) { + do_warn( +_("Couldn't commit quota inode 0x%llx reattachment transaction, error %d\n"), + (unsigned long long)ino, error); + } + +rele: + libxfs_irele(ip); +cleanup: + libxfs_imeta_end_update(mp, &upd, error); + return error; +} + +/* + * Reattach quota inodes to the metadata directory if we rebuilt the metadata + * directory tree. + */ +static inline void +reattach_metadir_quota_inodes( + struct xfs_mount *mp) +{ + int error; + + if (!xfs_has_metadir(mp) || no_modify) + return; + + if (mp->m_sb.sb_uquotino != NULLFSINO) { + error = reattach_quota_inode(mp, mp->m_sb.sb_uquotino, + &XFS_IMETA_USRQUOTA); + if (error) { + mp->m_sb.sb_uquotino = NULLFSINO; + lost_uquotino = 1; + } + } + + if (mp->m_sb.sb_gquotino != NULLFSINO) { + error = reattach_quota_inode(mp, mp->m_sb.sb_gquotino, + &XFS_IMETA_GRPQUOTA); + if (error) { + mp->m_sb.sb_gquotino = NULLFSINO; + lost_gquotino = 1; + } + } + + if (mp->m_sb.sb_pquotino != NULLFSINO) { + error = reattach_quota_inode(mp, mp->m_sb.sb_pquotino, + &XFS_IMETA_PRJQUOTA); + if (error) { + mp->m_sb.sb_pquotino = NULLFSINO; + lost_pquotino = 1; + } + } +} + static void traverse_ags( struct xfs_mount *mp) @@ -3572,6 +3683,8 @@ _(" - resetting contents of realtime bitmap and summary inodes\n")); } } + reattach_metadir_quota_inodes(mp); + mark_standalone_inodes(mp); do_log(_(" - traversing filesystem ...\n")); ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 45/46] mkfs.xfs: enable metadata directories 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (42 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 41/46] xfs_repair: reattach quota inodes to metadata directory Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 46/46] mkfs: add a utility to generate protofiles Darrick J. Wong 2022-12-30 22:19 ` [PATCH 44/46] xfs_repair: allow sysadmins to add metadata directories Darrick J. Wong 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Enable formatting filesystems with metadata directories. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_format.h | 3 +- man/man8/mkfs.xfs.8.in | 11 ++++++++ mkfs/lts_4.19.conf | 1 + mkfs/lts_5.10.conf | 1 + mkfs/lts_5.15.conf | 1 + mkfs/proto.c | 65 +++++++++++++++++++++++++++++++++++++++++++++++- mkfs/xfs_mkfs.c | 24 +++++++++++++++++- 7 files changed, 103 insertions(+), 3 deletions(-) diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index 0bd915bd4ee..33b047f9cf0 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -396,7 +396,8 @@ xfs_sb_has_ro_compat_feature( XFS_SB_FEAT_INCOMPAT_META_UUID| \ XFS_SB_FEAT_INCOMPAT_BIGTIME| \ XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR| \ - XFS_SB_FEAT_INCOMPAT_NREXT64) + XFS_SB_FEAT_INCOMPAT_NREXT64 | \ + XFS_SB_FEAT_INCOMPAT_METADIR) #define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL static inline bool diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in index 94f117b6917..8cdfe9a7ff1 100644 --- a/man/man8/mkfs.xfs.8.in +++ b/man/man8/mkfs.xfs.8.in @@ -271,6 +271,17 @@ option set. When the option .B \-m finobt=0 is used, the inode btree counter feature is not supported and is disabled. +.TP +.BI metadir= value +This option creates an internal directory tree to store filesystem metadata. 
+.IP +By default, +.B mkfs.xfs +will not enable this feature. +If the option +.B \-m crc=0 +is used, the metadata directory feature is not supported and is disabled. + .TP .BI uuid= value Use the given value as the filesystem UUID for the newly created filesystem. diff --git a/mkfs/lts_4.19.conf b/mkfs/lts_4.19.conf index 751be45e519..20b35e5e13a 100644 --- a/mkfs/lts_4.19.conf +++ b/mkfs/lts_4.19.conf @@ -7,6 +7,7 @@ bigtime=0 crc=1 finobt=1 inobtcount=0 +metadir=0 reflink=0 rmapbt=0 diff --git a/mkfs/lts_5.10.conf b/mkfs/lts_5.10.conf index a1c991cec3c..606b3e0149a 100644 --- a/mkfs/lts_5.10.conf +++ b/mkfs/lts_5.10.conf @@ -7,6 +7,7 @@ bigtime=0 crc=1 finobt=1 inobtcount=0 +metadir=0 reflink=1 rmapbt=0 diff --git a/mkfs/lts_5.15.conf b/mkfs/lts_5.15.conf index d751f4c4667..571d6dd3e44 100644 --- a/mkfs/lts_5.15.conf +++ b/mkfs/lts_5.15.conf @@ -7,6 +7,7 @@ bigtime=1 crc=1 finobt=1 inobtcount=1 +metadir=0 reflink=1 rmapbt=0 diff --git a/mkfs/proto.c b/mkfs/proto.c index 6fb58bd7cd4..484b5deced8 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -17,6 +17,7 @@ static void fail(char *msg, int i); static struct xfs_trans * getres(struct xfs_mount *mp, uint blocks); static void rsvfile(xfs_mount_t *mp, xfs_inode_t *ip, long long len); static char *newregfile(char **pp, int *len); +static int metadir_create(struct xfs_mount *mp); static void rtinit(xfs_mount_t *mp); static long filesize(int fd); @@ -637,8 +638,15 @@ parseproto( * RT initialization. Do this here to ensure that * the RT inodes get placed after the root inode. */ - if (isroot) + if (isroot) { + error = metadir_create(mp); + if (error) + fail( + _("Creation of the metadata directory inode failed"), + error); + rtinit(mp); + } tp = NULL; for (;;) { name = getstr(pp); @@ -672,6 +680,61 @@ parse_proto( parseproto(mp, NULL, fsx, pp, NULL); } +/* Create a new metadata root directory. 
*/ +static int +metadir_create( + struct xfs_mount *mp) +{ + struct xfs_imeta_update upd; + struct xfs_trans *tp; + struct xfs_inode *ip; + int error; + + if (!xfs_has_metadir(mp)) + return 0; + + /* + * The root of the metadata directory tree must be the next inode + * after the root directory. Reset the AGI rotor to satisfy this + * requirement. + */ + mp->m_agirotor = 0; + + error = -libxfs_imeta_start_update(mp, &XFS_IMETA_METADIR, &upd); + if (error) + return error; + + /* + * The metadata directory should always be the inode after the root + * directory. The chunk containing both of those inodes should already + * exist, because we (re)create the root directory first. So, no block + * reservation is necessary. + */ + error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_imeta_create, + libxfs_imeta_create_space_res(mp), 0, 0, &tp); + if (error) + goto out_end; + + error = -libxfs_imeta_create(&tp, &XFS_IMETA_METADIR, S_IFDIR, 0, &ip, + &upd); + if (error) + goto out_cancel; + + error = -libxfs_trans_commit(tp); + if (error) + goto out_end; + + libxfs_imeta_end_update(mp, &upd, error); + mp->m_metadirip = ip; + return 0; + +out_cancel: + libxfs_trans_cancel(tp); +out_end: + libxfs_imeta_end_update(mp, &upd, error); + return error; +} + /* Create the realtime bitmap inode. 
*/ static void rtbitmap_create( diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c index bd730d6cb07..df8acf221ac 100644 --- a/mkfs/xfs_mkfs.c +++ b/mkfs/xfs_mkfs.c @@ -139,6 +139,7 @@ enum { M_REFLINK, M_INOBTCNT, M_BIGTIME, + M_METADIR, M_MAX_OPTS, }; @@ -762,6 +763,7 @@ static struct opt_params mopts = { [M_REFLINK] = "reflink", [M_INOBTCNT] = "inobtcount", [M_BIGTIME] = "bigtime", + [M_METADIR] = "metadir", [M_MAX_OPTS] = NULL, }, .subopt_params = { @@ -805,6 +807,12 @@ static struct opt_params mopts = { .maxval = 1, .defaultval = 1, }, + { .index = M_METADIR, + .conflicts = { { NULL, LAST_CONFLICT } }, + .minval = 0, + .maxval = 1, + .defaultval = 1, + }, }, }; @@ -857,6 +865,7 @@ struct sb_feat_args { bool reflink; /* XFS_SB_FEAT_RO_COMPAT_REFLINK */ bool inobtcnt; /* XFS_SB_FEAT_RO_COMPAT_INOBTCNT */ bool bigtime; /* XFS_SB_FEAT_INCOMPAT_BIGTIME */ + bool metadir; /* XFS_SB_FEAT_INCOMPAT_METADIR */ bool nodalign; bool nortalign; bool nrext64; @@ -987,7 +996,7 @@ usage( void ) /* blocksize */ [-b size=num]\n\ /* config file */ [-c options=xxx]\n\ /* metadata */ [-m crc=0|1,finobt=0|1,uuid=xxx,rmapbt=0|1,reflink=0|1,\n\ - inobtcount=0|1,bigtime=0|1]\n\ + inobtcount=0|1,bigtime=0|1,metadir=0|1]\n\ /* data subvol */ [-d agcount=n,agsize=n,file,name=xxx,size=num,\n\ (sunit=value,swidth=value|su=num,sw=num|noalign),\n\ sectsize=num,concurrency=num]\n\ @@ -1810,6 +1819,9 @@ meta_opts_parser( case M_BIGTIME: cli->sb_feat.bigtime = getnum(value, opts, subopt); break; + case M_METADIR: + cli->sb_feat.metadir = getnum(value, opts, subopt); + break; default: return -EINVAL; } @@ -2310,6 +2322,13 @@ _("64 bit extent count not supported without CRC support\n")); usage(); } cli->sb_feat.nrext64 = false; + + if (cli->sb_feat.metadir) { + fprintf(stderr, +_("metadata directory not supported without CRC support\n")); + usage(); + } + cli->sb_feat.metadir = false; } if (!cli->sb_feat.finobt) { @@ -3455,6 +3474,8 @@ sb_set_features( sbp->sb_features_ro_compat |= 
XFS_SB_FEAT_RO_COMPAT_INOBTCNT; if (fp->bigtime) sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_BIGTIME; + if (fp->metadir) + sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_METADIR; /* * Sparse inode chunk support has two main inode alignment requirements. @@ -3903,6 +3924,7 @@ finish_superblock_setup( platform_uuid_copy(&sbp->sb_meta_uuid, &cfg->uuid); sbp->sb_logstart = cfg->logstart; sbp->sb_rootino = sbp->sb_rbmino = sbp->sb_rsumino = NULLFSINO; + sbp->sb_metadirino = NULLFSINO; sbp->sb_agcount = (xfs_agnumber_t)cfg->agcount; sbp->sb_rbmblocks = cfg->rtbmblocks; sbp->sb_logblocks = (xfs_extlen_t)cfg->logblocks; ^ permalink raw reply related [flat|nested] 565+ messages in thread
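For readers skimming the mkfs changes above, here is a minimal sketch of the two rules the patch adds: metadir is rejected without CRC (V5) support, and a feature only counts as known once its bit is part of the "all incompat features" mask. The bit values, struct, and function names below are hypothetical stand-ins for illustration; the real definitions live in libxfs/xfs_format.h and mkfs/xfs_mkfs.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; not the on-disk values. */
#define FEAT_INCOMPAT_BIGTIME	(1u << 0)
#define FEAT_INCOMPAT_METADIR	(1u << 1)
#define FEAT_INCOMPAT_ALL	(FEAT_INCOMPAT_BIGTIME | FEAT_INCOMPAT_METADIR)
#define FEAT_INCOMPAT_UNKNOWN	(~FEAT_INCOMPAT_ALL)

/* Simplified stand-in for mkfs's struct sb_feat_args. */
struct feat_args {
	bool	crcs_enabled;
	bool	bigtime;
	bool	metadir;
};

/* Returns false if the requested feature combination must be rejected. */
static bool feat_validate(const struct feat_args *fp)
{
	assert(fp);
	if (fp->crcs_enabled)
		return true;
	/* metadir needs a V5 (CRC-enabled) filesystem */
	return !fp->metadir;
}

/* Translate the parsed options into incompat feature bits. */
static uint32_t feat_to_incompat(const struct feat_args *fp)
{
	uint32_t bits = 0;

	assert(fp);
	if (fp->bigtime)
		bits |= FEAT_INCOMPAT_BIGTIME;
	if (fp->metadir)
		bits |= FEAT_INCOMPAT_METADIR;
	return bits;
}

/* A reader refuses any filesystem that has incompat bits it does not know. */
static bool feat_supported(uint32_t incompat)
{
	return (incompat & FEAT_INCOMPAT_UNKNOWN) == 0;
}
```

Dropping FEAT_INCOMPAT_METADIR from FEAT_INCOMPAT_ALL in this sketch makes feat_supported() fail for a metadir filesystem, which is why the format header hunk is what actually switches the feature on.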
* [PATCH 46/46] mkfs: add a utility to generate protofiles 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (43 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 45/46] mkfs.xfs: enable metadata directories Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 44/46] xfs_repair: allow sysadmins to add metadata directories Darrick J. Wong 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a new utility to generate mkfs protofiles from a directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- man/man8/xfs_protofile.8 | 33 ++++++++++ mkfs/Makefile | 10 +++ mkfs/xfs_protofile.in | 152 ++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 194 insertions(+), 1 deletion(-) create mode 100644 man/man8/xfs_protofile.8 create mode 100644 mkfs/xfs_protofile.in diff --git a/man/man8/xfs_protofile.8 b/man/man8/xfs_protofile.8 new file mode 100644 index 00000000000..75090c138f3 --- /dev/null +++ b/man/man8/xfs_protofile.8 @@ -0,0 +1,33 @@ +.TH xfs_protofile 8 +.SH NAME +xfs_protofile \- create a protofile for use with mkfs.xfs +.SH SYNOPSIS +.B xfs_protofile +.I path +[ +.I paths... +] +.br +.B xfs_protofile \-V +.SH DESCRIPTION +.B xfs_protofile +walks a directory tree to generate a protofile. +The protofile format is specified in the +.BR mkfs.xfs (8) +manual page and is derived from 3rd edition Unix. +.SH OPTIONS +.TP 1.0i +.I path +Create protofile directives to copy this path into the root directory. +If the path is a directory, protofile directives will be emitted to +replicate the entire subtree as a subtree of the root directory. +If the path is not a directory, protofile directives will be emitted +to create the file as an entry in the root directory. +The first path must resolve to a directory. 
+ +.SH BUGS +Filenames cannot contain spaces. +Extended attributes are not copied into the filesystem. + +.PD +.RE diff --git a/mkfs/Makefile b/mkfs/Makefile index 6c7ee186fa2..98463e4362b 100644 --- a/mkfs/Makefile +++ b/mkfs/Makefile @@ -6,6 +6,7 @@ TOPDIR = .. include $(TOPDIR)/include/builddefs LTCOMMAND = mkfs.xfs +XFS_PROTOFILE = xfs_protofile HFILES = CFILES = proto.c xfs_mkfs.c @@ -21,17 +22,24 @@ LLDLIBS += $(LIBXFS) $(LIBXCMD) $(LIBFROG) $(LIBRT) $(LIBBLKID) \ $(LIBUUID) $(LIBINIH) $(LIBURCU) $(LIBPTHREAD) LTDEPENDENCIES += $(LIBXFS) $(LIBXCMD) $(LIBFROG) LLDFLAGS = -static-libtool-libs +DIRT = $(XFS_PROTOFILE) -default: depend $(LTCOMMAND) $(CFGFILES) +default: depend $(LTCOMMAND) $(CFGFILES) $(XFS_PROTOFILE) include $(BUILDRULES) install: default $(INSTALL) -m 755 -d $(PKG_ROOT_SBIN_DIR) $(LTINSTALL) -m 755 $(LTCOMMAND) $(PKG_ROOT_SBIN_DIR) + $(INSTALL) -m 755 $(XFS_PROTOFILE) $(PKG_ROOT_SBIN_DIR) $(INSTALL) -m 755 -d $(MKFS_CFG_DIR) $(INSTALL) -m 644 $(CFGFILES) $(MKFS_CFG_DIR) install-dev: +$(XFS_PROTOFILE): $(XFS_PROTOFILE).in + @echo " [SED] $@" + $(Q)$(SED) -e "s|@pkg_version@|$(PKG_VERSION)|g" < $< > $@ + $(Q)chmod a+x $@ + -include .dep diff --git a/mkfs/xfs_protofile.in b/mkfs/xfs_protofile.in new file mode 100644 index 00000000000..f2d09735a11 --- /dev/null +++ b/mkfs/xfs_protofile.in @@ -0,0 +1,152 @@ +#!/usr/bin/python3 + +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (C) 2022 Oracle. All rights reserved. +# +# Author: Darrick J. Wong <djwong@kernel.org> + +# Walk a filesystem tree to generate a protofile for mkfs. 
+ +import os +import argparse +import sys +import stat + +def emit_proto_header(): + '''Emit the protofile header.''' + print('/') + print('0 0') + +def stat_to_str(statbuf): + '''Convert a stat buffer to a proto string.''' + + if stat.S_ISREG(statbuf.st_mode): + type = '-' + elif stat.S_ISCHR(statbuf.st_mode): + type = 'c' + elif stat.S_ISBLK(statbuf.st_mode): + type = 'b' + elif stat.S_ISFIFO(statbuf.st_mode): + type = 'p' + elif stat.S_ISDIR(statbuf.st_mode): + type = 'd' + elif stat.S_ISLNK(statbuf.st_mode): + type = 'l' + + if statbuf.st_mode & stat.S_ISUID: + suid = 'u' + else: + suid = '-' + + if statbuf.st_mode & stat.S_ISGID: + sgid = 'g' + else: + sgid = '-' + + perms = statbuf.st_mode & 0o777 + + return '%s%s%s%03o %d %d' % (type, suid, sgid, perms, statbuf.st_uid, \ + statbuf.st_gid) + +def stat_to_extra(statbuf, fullpath): + '''Compute the extras column for a protofile.''' + + if stat.S_ISREG(statbuf.st_mode): + return ' %s' % fullpath + elif stat.S_ISCHR(statbuf.st_mode) or stat.S_ISBLK(statbuf.st_mode): + return ' %d %d' % (os.major(statbuf.st_rdev), os.minor(statbuf.st_rdev)) + elif stat.S_ISLNK(statbuf.st_mode): + return ' %s' % os.readlink(fullpath) + return '' + +def max_fname_len(s1): + '''Return the length of the longest string in s1.''' + ret = 0 + for s in s1: + if len(s) > ret: + ret = len(s) + return ret + +def walk_tree(path, depth): + '''Walk the directory tree rooted by path.''' + dirs = [] + files = [] + + for fname in os.listdir(path): + fullpath = os.path.join(path, fname) + sb = os.lstat(fullpath) + + if stat.S_ISDIR(sb.st_mode): + dirs.append(fname) + continue + elif stat.S_ISSOCK(sb.st_mode): + continue + else: + files.append(fname) + + for fname in files: + if ' ' in fname: + raise ValueError( \ + f'{fname}: Spaces not allowed in file names.') + for fname in dirs: + if ' ' in fname: + raise Exception( \ + f'{fname}: Spaces not allowed in file names.') + + fname_width = max_fname_len(files) + for fname in files: + fullpath = os.path.join(path, 
fname) + sb = os.lstat(fullpath) + extra = stat_to_extra(sb, fullpath) + print('%*s%-*s %s%s' % (depth, ' ', fname_width, fname, \ + stat_to_str(sb), extra)) + + for fname in dirs: + fullpath = os.path.join(path, fname) + sb = os.lstat(fullpath) + extra = stat_to_extra(sb, fullpath) + print('%*s%s %s' % (depth, ' ', fname, \ + stat_to_str(sb))) + walk_tree(fullpath, depth + 1) + + if depth > 1: + print('%*s$' % (depth - 1, ' ')) + +def main(): + parser = argparse.ArgumentParser( \ + description = "Generate mkfs.xfs protofile for a directory tree.") + parser.add_argument('paths', metavar = 'paths', type = str, \ + nargs = '*', help = 'Directory paths to walk.') + parser.add_argument("-V", help = "Report version and exit.", \ + action = "store_true") + args = parser.parse_args() + + if args.V: + print("xfs_protofile version @pkg_version@") + sys.exit(0) + + emit_proto_header() + if len(args.paths) == 0: + print('d--755 0 0') + print('$') + else: + # Copy the first argument's stat to the rootdir + statbuf = os.stat(args.paths[0]) + if not stat.S_ISDIR(statbuf.st_mode): + raise NotADirectoryError(args.paths[0]) + print(stat_to_str(statbuf)) + + # All files under each path go in the root dir, recursively + for path in args.paths: + print(': Descending path %s' % path) + try: + walk_tree(path, 1) + except Exception as e: + print(e, file = sys.stderr) + return 1 + + print('$') + return 0 + +if __name__ == '__main__': + sys.exit(main()) ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 44/46] xfs_repair: allow sysadmins to add metadata directories 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong ` (44 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 46/46] mkfs: add a utility to generate protofiles Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 45 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Allow the sysadmin to use xfs_repair to upgrade an existing filesystem to support metadata directories. This will be needed to upgrade filesystems to support realtime rmap and reflink. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- man/man8/xfs_admin.8 | 9 +++++++ repair/dino_chunks.c | 6 ++++ repair/dinode.c | 5 +++- repair/globals.c | 1 + repair/globals.h | 1 + repair/phase2.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++ repair/phase4.c | 5 +++- repair/protos.h | 6 ++++ repair/xfs_repair.c | 11 ++++++++ 9 files changed, 110 insertions(+), 3 deletions(-) diff --git a/man/man8/xfs_admin.8 b/man/man8/xfs_admin.8 index 467fb2dfd0a..f9c53235e7b 100644 --- a/man/man8/xfs_admin.8 +++ b/man/man8/xfs_admin.8 @@ -177,6 +177,15 @@ and online repairs to space usage metadata. The filesystem cannot be downgraded after this feature is enabled. This upgrade can fail if any AG has less than 5% free space remaining. This feature was added to Linux 4.8. +.TP 0.4i +.B metadir +Create a directory tree of metadata inodes instead of storing them all in the +superblock. +This is required for reverse mapping btrees and reflink support on the realtime +device. +The filesystem cannot be downgraded after this feature is enabled. +This upgrade can fail if any AG has less than 5% free space remaining. +This feature is not upstream yet. 
.RE .TP .BI \-U " uuid" diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c index 382196cc170..c68d92a4d88 100644 --- a/repair/dino_chunks.c +++ b/repair/dino_chunks.c @@ -962,7 +962,11 @@ process_inode_chunk( } if (status) { - if (mp->m_sb.sb_rootino == ino) { + if (wipe_pre_metadir_file(ino)) { + if (!ino_discovery) + do_warn( + _("wiping pre-metadir metadata inode %"PRIu64".\n"), ino); + } else if (mp->m_sb.sb_rootino == ino) { need_root_inode = 1; if (!no_modify) { diff --git a/repair/dinode.c b/repair/dinode.c index 5c1f07d5bc1..cc2c3474634 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -2415,6 +2415,9 @@ process_dinode_int( ASSERT(uncertain == 0 || verify_mode != 0); ASSERT(ino_bpp != NULL || verify_mode != 0); + if (wipe_pre_metadir_file(lino)) + goto clear_bad_out; + /* * This is the only valid point to check the CRC; after this we may have * made changes which invalidate it, and the CRC is only updated again @@ -2624,7 +2627,7 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"), if (flags & XFS_DIFLAG_NEWRTBM) { /* must be a rt bitmap inode */ if (lino != mp->m_sb.sb_rbmino) { - if (!uncertain) { + if (!uncertain && !add_metadir) { do_warn( _("inode %" PRIu64 " not rt bitmap\n"), lino); diff --git a/repair/globals.c b/repair/globals.c index c731d6bdff1..3200342e9f1 100644 --- a/repair/globals.c +++ b/repair/globals.c @@ -55,6 +55,7 @@ bool add_nrext64; bool add_finobt; /* add free inode btrees */ bool add_reflink; /* add reference count btrees */ bool add_rmapbt; /* add reverse mapping btrees */ +bool add_metadir; /* add metadata directory tree */ /* misc status variables */ diff --git a/repair/globals.h b/repair/globals.h index 6bd4be20cb1..e51f4e7ece4 100644 --- a/repair/globals.h +++ b/repair/globals.h @@ -96,6 +96,7 @@ extern bool add_nrext64; extern bool add_finobt; /* add free inode btrees */ extern bool add_reflink; /* add reference count btrees */ extern bool add_rmapbt; /* add reverse mapping btrees */ +extern bool 
add_metadir; /* add metadata directory tree */ /* misc status variables */ diff --git a/repair/phase2.c b/repair/phase2.c index 77324a976a1..707fe5ca519 100644 --- a/repair/phase2.c +++ b/repair/phase2.c @@ -14,6 +14,7 @@ #include "incore.h" #include "progress.h" #include "scan.h" +#include "quotacheck.h" /* workaround craziness in the xlog routines */ int xlog_recover_do_trans(struct xlog *log, struct xlog_recover *t, int p) @@ -286,6 +287,70 @@ set_rmapbt( return true; } +static xfs_ino_t doomed_rbmino = NULLFSINO; +static xfs_ino_t doomed_rsumino = NULLFSINO; +static xfs_ino_t doomed_uquotino = NULLFSINO; +static xfs_ino_t doomed_gquotino = NULLFSINO; +static xfs_ino_t doomed_pquotino = NULLFSINO; + +bool +wipe_pre_metadir_file( + xfs_ino_t ino) +{ + if (ino == doomed_rbmino || + ino == doomed_rsumino || + ino == doomed_uquotino || + ino == doomed_gquotino || + ino == doomed_pquotino) + return true; + return false; +} + +static bool +set_metadir( + struct xfs_mount *mp, + struct xfs_sb *new_sb) +{ + if (!xfs_has_crc(mp)) { + printf( + _("Metadata directory trees only supported on V5 filesystems.\n")); + exit(0); + } + + if (xfs_has_metadir(mp)) { + printf(_("Filesystem already supports metadata directory trees.\n")); + exit(0); + } + + printf(_("Adding metadata directory trees to filesystem.\n")); + new_sb->sb_features_incompat |= (XFS_SB_FEAT_INCOMPAT_METADIR | + XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR); + + /* Blow out all the old metadata inodes; we'll rebuild in phase6. */ + new_sb->sb_metadirino = new_sb->sb_rootino + 1; + doomed_rbmino = mp->m_sb.sb_rbmino; + doomed_rsumino = mp->m_sb.sb_rsumino; + doomed_uquotino = mp->m_sb.sb_uquotino; + doomed_gquotino = mp->m_sb.sb_gquotino; + doomed_pquotino = mp->m_sb.sb_pquotino; + + new_sb->sb_rbmino = NULLFSINO; + new_sb->sb_rsumino = NULLFSINO; + new_sb->sb_uquotino = NULLFSINO; + new_sb->sb_gquotino = NULLFSINO; + new_sb->sb_pquotino = NULLFSINO; + + /* Indicate that we need a rebuild. 
*/ + need_metadir_inode = 1; + need_rbmino = 1; + need_rsumino = 1; + have_uquotino = 0; + have_gquotino = 0; + have_pquotino = 0; + quotacheck_skip(); + return true; +} + struct check_state { struct xfs_sb sb; uint64_t features; @@ -459,6 +524,8 @@ need_check_fs_free_space( return true; if (xfs_has_rmapbt(mp) && !(old->features & XFS_FEAT_RMAPBT)) return true; + if (xfs_has_metadir(mp) && !(old->features & XFS_FEAT_METADIR)) + return true; return false; } @@ -540,6 +607,8 @@ upgrade_filesystem( dirty |= set_reflink(mp, &new_sb); if (add_rmapbt) dirty |= set_rmapbt(mp, &new_sb); + if (add_metadir) + dirty |= set_metadir(mp, &new_sb); if (!dirty) return; diff --git a/repair/phase4.c b/repair/phase4.c index 5721647863a..28ecf56f45b 100644 --- a/repair/phase4.c +++ b/repair/phase4.c @@ -272,7 +272,10 @@ phase4(xfs_mount_t *mp) if (xfs_has_metadir(mp) && (is_inode_free(irec, 1) || !inode_isadir(irec, 1))) { need_metadir_inode = true; - if (no_modify) + if (add_metadir) + do_warn( + _("metadata directory root inode needs to be initialized\n")); + else if (no_modify) do_warn( _("metadata directory root inode would be lost\n")); else diff --git a/repair/protos.h b/repair/protos.h index 83e471ff2ad..20618bb2bc2 100644 --- a/repair/protos.h +++ b/repair/protos.h @@ -3,6 +3,8 @@ * Copyright (c) 2000-2001,2005 Silicon Graphics, Inc. * All Rights Reserved. 
*/ +#ifndef __XFS_REPAIR_PROTOS_H__ +#define __XFS_REPAIR_PROTOS_H__ void xfs_init(libxfs_init_t *args); @@ -45,3 +47,7 @@ void phase7(struct xfs_mount *, int); int verify_set_agheader(struct xfs_mount *, struct xfs_buf *, struct xfs_sb *, struct xfs_agf *, struct xfs_agi *, xfs_agnumber_t); + +bool wipe_pre_metadir_file(xfs_ino_t ino); + +#endif /* __XFS_REPAIR_PROTOS_H__ */ diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c index fe3fe341530..92dc0fb2d9f 100644 --- a/repair/xfs_repair.c +++ b/repair/xfs_repair.c @@ -72,6 +72,7 @@ enum c_opt_nums { CONVERT_FINOBT, CONVERT_REFLINK, CONVERT_RMAPBT, + CONVERT_METADIR, C_MAX_OPTS, }; @@ -83,6 +84,7 @@ static char *c_opts[] = { [CONVERT_FINOBT] = "finobt", [CONVERT_REFLINK] = "reflink", [CONVERT_RMAPBT] = "rmapbt", + [CONVERT_METADIR] = "metadir", [C_MAX_OPTS] = NULL, }; @@ -369,6 +371,15 @@ process_args(int argc, char **argv) _("-c rmapbt only supports upgrades\n")); add_rmapbt = true; break; + case CONVERT_METADIR: + if (!val) + do_abort( + _("-c metadir requires a parameter\n")); + if (strtol(val, NULL, 0) != 1) + do_abort( + _("-c metadir only supports upgrades\n")); + add_metadir = true; + break; default: unknown('c', val); break; ^ permalink raw reply related [flat|nested] 565+ messages in thread
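The upgrade path above boils down to two small pieces: "-c metadir" accepts only the value 1, and the old superblock-rooted metadata inodes are remembered so that later phases can wipe them. A simplified sketch of both follows. NULLINO, the doomed[] array, and the function names are hypothetical stand-ins for NULLFSINO and the five doomed_* variables in repair/phase2.c; unlike the patch, this sketch also refuses to match the null inode value explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define NULLINO		UINT64_MAX	/* stand-in for NULLFSINO */
#define NR_DOOMED	5		/* rbm, rsum, uquota, gquota, pquota */

static uint64_t doomed[NR_DOOMED] = {
	NULLINO, NULLINO, NULLINO, NULLINO, NULLINO
};

/* Accept only an explicit "1"; upgrades cannot be undone. */
static int parse_upgrade_value(const char *val)
{
	if (!val)
		return -1;	/* "-c metadir requires a parameter" */
	if (strtol(val, NULL, 0) != 1)
		return -1;	/* "-c metadir only supports upgrades" */
	return 0;
}

/* Remember an old metadata inode so that inode scanning can clear it. */
static void doom_inode(unsigned int slot, uint64_t ino)
{
	assert(slot < NR_DOOMED);
	doomed[slot] = ino;
}

/* Should this inode be wiped because it predates the metadata directory? */
static bool is_doomed(uint64_t ino)
{
	unsigned int	i;

	if (ino == NULLINO)
		return false;
	for (i = 0; i < NR_DOOMED; i++)
		if (doomed[i] == ino)
			return true;
	return false;
}
```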
* [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (20 preceding siblings ...) 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/10] xfs: create a helper to compute leftovers of realtime extents Darrick J. Wong ` (9 more replies) 2022-12-30 22:19 ` [PATCHSET v1.0 0/9] libxfs: refactor rtbitmap/summary macros Darrick J. Wong ` (17 subsequent siblings) 39 siblings, 10 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs Hi all, This series replaces all the open-coded integer division and multiplication conversions between rt blocks and rt extents with calls to static inline helpers. Having cleaned all that up, the helpers are augmented to skip the expensive operations in favor of bit shifts and masking if the rt extent size is a power of two. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. 
--D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=refactor-rt-unit-conversions xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=refactor-rt-unit-conversions --- include/libxfs.h | 1 include/xfs_mount.h | 2 + libfrog/Makefile | 1 libfrog/div64.h | 83 +++++++++++++++++++++++++++++++++++++++++ libxfs/libxfs_priv.h | 93 ++++++++++++---------------------------------- libxfs/xfs_bmap.c | 19 +++------ libxfs/xfs_rtbitmap.c | 4 +- libxfs/xfs_rtbitmap.h | 88 ++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_sb.c | 2 + libxfs/xfs_swapext.c | 7 ++- libxfs/xfs_trans_inode.c | 3 + libxfs/xfs_trans_resv.c | 3 + mkfs/proto.c | 13 +++--- repair/agheader.h | 2 - repair/dinode.c | 21 ++++++---- repair/incore.c | 16 ++++---- repair/incore.h | 4 +- repair/phase4.c | 16 ++++---- repair/rt.c | 4 +- 19 files changed, 259 insertions(+), 123 deletions(-) create mode 100644 libfrog/div64.h ^ permalink raw reply [flat|nested] 565+ messages in thread
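To make the power-of-two optimization mentioned in the cover letter concrete, here is a minimal sketch of the idea, not the code from this series. struct rt_geom and every function name are invented for the example; the real helpers take a struct xfs_mount and live in libxfs/xfs_rtbitmap.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cached geometry; not the real struct xfs_mount. */
struct rt_geom {
	uint32_t	rextsize;	/* rt extent size, in fs blocks */
	int		rextlog;	/* log2(rextsize), or -1 if not a power of 2 */
	uint64_t	rextmask;	/* rextsize - 1 when rextlog >= 0 */
};

static inline bool is_pow2(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Precompute the shift and mask once, e.g. at mount time. */
static inline void rt_geom_init(struct rt_geom *g, uint32_t rextsize)
{
	assert(rextsize != 0);
	g->rextsize = rextsize;
	g->rextlog = -1;
	g->rextmask = 0;
	if (is_pow2(rextsize)) {
		int	log = 0;

		while ((1U << log) != rextsize)
			log++;
		g->rextlog = log;
		g->rextmask = (uint64_t)rextsize - 1;
	}
}

/* rt extent number -> first rt block of that extent */
static inline uint64_t rtx_to_rtb(const struct rt_geom *g, uint64_t rtx)
{
	if (g->rextlog >= 0)
		return rtx << g->rextlog;
	return rtx * g->rextsize;
}

/* rt block number -> rt extent number, plus the offset within the extent */
static inline uint64_t rtb_to_rtx(const struct rt_geom *g, uint64_t rtb,
				  uint32_t *mod)
{
	if (g->rextlog >= 0) {
		*mod = (uint32_t)(rtb & g->rextmask);
		return rtb >> g->rextlog;
	}
	*mod = (uint32_t)(rtb % g->rextsize);
	return rtb / g->rextsize;
}
```

With the conversions funnelled through helpers like these, the fast path only has to be written once instead of at every open-coded division.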
* [PATCH 02/10] xfs: create a helper to compute leftovers of realtime extents 2022-12-30 22:19 ` [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 01/10] xfs: create a helper to convert rtextents to rtblocks Darrick J. Wong ` (8 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a helper to compute the misalignment between a file extent (xfs_extlen_t) and a realtime extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_bmap.c | 4 ++-- libxfs/xfs_rtbitmap.h | 9 +++++++++ libxfs/xfs_trans_inode.c | 3 ++- 3 files changed, 13 insertions(+), 3 deletions(-) diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c index 25b89d5ee13..025298db1e8 100644 --- a/libxfs/xfs_bmap.c +++ b/libxfs/xfs_bmap.c @@ -3052,7 +3052,7 @@ xfs_bmap_extsize_align( * If realtime, and the result isn't a multiple of the realtime * extent size we need to remove blocks until it is. */ - if (rt && (temp = (align_alen % mp->m_sb.sb_rextsize))) { + if (rt && (temp = xfs_extlen_to_rtxmod(mp, align_alen))) { /* * We're not covering the original request, or * we won't be able to once we fix the length. @@ -3079,7 +3079,7 @@ xfs_bmap_extsize_align( else { align_alen -= orig_off - align_off; align_off = orig_off; - align_alen -= align_alen % mp->m_sb.sb_rextsize; + align_alen -= xfs_extlen_to_rtxmod(mp, align_alen); } /* * Result doesn't cover the request, fail it. 
diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index 099ea8902aa..b6a4c46bddc 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -22,6 +22,15 @@ xfs_rtxlen_to_extlen( return rtxlen * mp->m_sb.sb_rextsize; } +/* Compute the misalignment between an extent length and a realtime extent. */ +static inline unsigned int +xfs_extlen_to_rtxmod( + struct xfs_mount *mp, + xfs_extlen_t len) +{ + return len % mp->m_sb.sb_rextsize; +} + /* * Functions for walking free space rtextents in the realtime bitmap. */ diff --git a/libxfs/xfs_trans_inode.c b/libxfs/xfs_trans_inode.c index 6fc7a65d517..e2d5d3efaab 100644 --- a/libxfs/xfs_trans_inode.c +++ b/libxfs/xfs_trans_inode.c @@ -12,6 +12,7 @@ #include "xfs_mount.h" #include "xfs_inode.h" #include "xfs_trans.h" +#include "xfs_rtbitmap.h" /* @@ -149,7 +150,7 @@ xfs_trans_log_inode( */ if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) && (ip->i_diflags & XFS_DIFLAG_EXTSZINHERIT) && - (ip->i_extsize % ip->i_mount->m_sb.sb_rextsize) > 0) { + xfs_extlen_to_rtxmod(ip->i_mount, ip->i_extsize) > 0) { ip->i_diflags &= ~(XFS_DIFLAG_EXTSIZE | XFS_DIFLAG_EXTSZINHERIT); ip->i_extsize = 0; ^ permalink raw reply related [flat|nested] 565+ messages in thread
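The two call sites converted in this patch use the remainder in slightly different ways: one trims a length down to a whole number of rt extents, the other rejects an extent size hint that is not a multiple of the rt extent size. A small sketch of both uses, with the rt extent size passed in directly instead of read from a struct xfs_mount (the function names are illustrative only):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Leftover blocks when len is split into whole rt extents. */
static inline uint32_t extlen_to_rtxmod(uint32_t rextsize, uint32_t len)
{
	assert(rextsize != 0);
	return len % rextsize;
}

/* Round a length down to a multiple of the rt extent size. */
static inline uint32_t extlen_trim_to_rtx(uint32_t rextsize, uint32_t len)
{
	return len - extlen_to_rtxmod(rextsize, len);
}

/* An extent size hint is only usable on rt files if it is rtx-aligned. */
static inline bool extsize_hint_ok(uint32_t rextsize, uint32_t extsize)
{
	return extlen_to_rtxmod(rextsize, extsize) == 0;
}
```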
* [PATCH 01/10] xfs: create a helper to convert rtextents to rtblocks 2022-12-30 22:19 ` [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/10] xfs: create a helper to compute leftovers of realtime extents Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 05/10] xfs: create helpers to convert rt block numbers to rt extent numbers Darrick J. Wong ` (7 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a helper to convert a realtime extent to a realtime block. Later on we'll change the helper to use bit shifts when possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rtbitmap.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index 5e2afb7fea0..099ea8902aa 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -6,6 +6,22 @@ #ifndef __XFS_RTBITMAP_H__ #define __XFS_RTBITMAP_H__ +static inline xfs_rtblock_t +xfs_rtx_to_rtb( + struct xfs_mount *mp, + xfs_rtxnum_t rtx) +{ + return rtx * mp->m_sb.sb_rextsize; +} + +static inline xfs_extlen_t +xfs_rtxlen_to_extlen( + struct xfs_mount *mp, + xfs_rtxlen_t rtxlen) +{ + return rtxlen * mp->m_sb.sb_rextsize; +} + /* * Functions for walking free space rtextents in the realtime bitmap. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 05/10] xfs: create helpers to convert rt block numbers to rt extent numbers 2022-12-30 22:19 ` [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/10] xfs: create a helper to compute leftovers of realtime extents Darrick J. Wong 2022-12-30 22:19 ` [PATCH 01/10] xfs: create a helper to convert rtextents to rtblocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 03/10] xfs: create a helper to compute leftovers of realtime extents Darrick J. Wong ` (6 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create helpers to do unit conversions of rt block numbers to rt extent numbers. There are two variations -- the suffix "t" denotes the one that returns only the truncated extent number; the other one also returns the misalignment. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- include/libxfs.h | 1 + libxfs/xfs_bmap.c | 7 +++---- libxfs/xfs_rtbitmap.c | 4 ++-- libxfs/xfs_rtbitmap.h | 17 +++++++++++++++++ libxfs/xfs_swapext.c | 7 ++++--- 5 files changed, 27 insertions(+), 9 deletions(-) diff --git a/include/libxfs.h b/include/libxfs.h index 661964b8a1e..26202dede67 100644 --- a/include/libxfs.h +++ b/include/libxfs.h @@ -20,6 +20,7 @@ #include "bitops.h" #include "kmem.h" #include "libfrog/radix-tree.h" +#include "libfrog/div64.h" #include "atomic.h" #include "spinlock.h" diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c index 025298db1e8..5fbfc5372c9 100644 --- a/libxfs/xfs_bmap.c +++ b/libxfs/xfs_bmap.c @@ -5365,7 +5365,6 @@ __xfs_bunmapi( int tmp_logflags; /* partial logging flags */ int wasdel; /* was a delayed alloc extent */ int whichfork; /* data or attribute fork */ - xfs_fsblock_t sum; xfs_filblks_t len = *rlen; /* length to unmap in file */ xfs_fileoff_t end; struct xfs_iext_cursor icur; @@ -5462,8 +5461,7 @@ __xfs_bunmapi( if (!isrt || (flags & XFS_BMAPI_REMAP)) goto delete; - sum = del.br_startblock + del.br_blockcount; - div_u64_rem(sum, mp->m_sb.sb_rextsize, &mod); + xfs_rtb_to_rtx(mp, del.br_startblock + del.br_blockcount, &mod); if (mod) { /* * Realtime extent not lined up at the end. 
@@ -5510,7 +5508,8 @@ __xfs_bunmapi( goto error0; goto nodelete; } - div_u64_rem(del.br_startblock, mp->m_sb.sb_rextsize, &mod); + + xfs_rtb_to_rtx(mp, del.br_startblock, &mod); if (mod) { xfs_extlen_t off = mp->m_sb.sb_rextsize - mod; diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c index 70424ffb7d3..f74618323b4 100644 --- a/libxfs/xfs_rtbitmap.c +++ b/libxfs/xfs_rtbitmap.c @@ -1029,13 +1029,13 @@ xfs_rtfree_blocks( ASSERT(rtlen <= XFS_MAX_BMBT_EXTLEN); - len = div_u64_rem(rtlen, mp->m_sb.sb_rextsize, &mod); + len = xfs_rtb_to_rtx(mp, rtlen, &mod); if (mod) { ASSERT(mod == 0); return -EIO; } - start = div_u64_rem(rtbno, mp->m_sb.sb_rextsize, &mod); + start = xfs_rtb_to_rtx(mp, rtbno, &mod); if (mod) { ASSERT(mod == 0); return -EIO; diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index e2a36fc157c..bdd4858a794 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -39,6 +39,23 @@ xfs_extlen_to_rtxlen( return len / mp->m_sb.sb_rextsize; } +static inline xfs_rtxnum_t +xfs_rtb_to_rtx( + struct xfs_mount *mp, + xfs_rtblock_t rtbno, + xfs_extlen_t *mod) +{ + return div_u64_rem(rtbno, mp->m_sb.sb_rextsize, mod); +} + +static inline xfs_rtxnum_t +xfs_rtb_to_rtxt( + struct xfs_mount *mp, + xfs_rtblock_t rtbno) +{ + return div_u64(rtbno, mp->m_sb.sb_rextsize); +} + /* * Functions for walking free space rtextents in the realtime bitmap. */ diff --git a/libxfs/xfs_swapext.c b/libxfs/xfs_swapext.c index d2f0c89571d..718600019a7 100644 --- a/libxfs/xfs_swapext.c +++ b/libxfs/xfs_swapext.c @@ -28,6 +28,7 @@ #include "xfs_dir2_priv.h" #include "xfs_dir2.h" #include "xfs_symlink_remote.h" +#include "xfs_rtbitmap.h" struct kmem_cache *xfs_swapext_intent_cache; @@ -213,19 +214,19 @@ xfs_swapext_check_rt_extents( irec2.br_blockcount); /* Both mappings must be aligned to the realtime extent size. 
*/ - div_u64_rem(irec1.br_startoff, mp->m_sb.sb_rextsize, &mod); + xfs_rtb_to_rtx(mp, irec1.br_startoff, &mod); if (mod) { ASSERT(mod == 0); return -EINVAL; } - div_u64_rem(irec2.br_startoff, mp->m_sb.sb_rextsize, &mod); + xfs_rtb_to_rtx(mp, irec2.br_startoff, &mod); if (mod) { ASSERT(mod == 0); return -EINVAL; } - div_u64_rem(irec1.br_blockcount, mp->m_sb.sb_rextsize, &mod); + xfs_rtb_to_rtx(mp, irec1.br_blockcount, &mod); if (mod) { ASSERT(mod == 0); return -EINVAL; ^ permalink raw reply related [flat|nested] 565+ messages in thread
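A minimal standalone sketch of the two conversion helpers introduced by the patch above, for readers who want to try the arithmetic outside of libxfs. The `demo_` names and the one-field `struct demo_mount` are hypothetical stand-ins for `struct xfs_mount` and `mp->m_sb.sb_rextsize`; only the divide-with-remainder logic is taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the single xfs_mount field used here. */
struct demo_mount {
	uint32_t	rextsize;	/* rt extent size, in fs blocks */
};

/* Userspace equivalent of div_u64_rem() as quoted in libfrog/div64.h. */
static inline uint64_t
demo_div_u64_rem(uint64_t dividend, uint32_t divisor, uint32_t *remainder)
{
	*remainder = dividend % divisor;
	return dividend / divisor;
}

/*
 * Convert an rt block number to an rt extent number; *mod receives the
 * block offset within that rt extent (zero means "aligned").
 */
static inline uint64_t
demo_rtb_to_rtx(const struct demo_mount *mp, uint64_t rtbno, uint32_t *mod)
{
	assert(mp->rextsize != 0);
	return demo_div_u64_rem(rtbno, mp->rextsize, mod);
}

/* Same conversion, truncating: the remainder is discarded. */
static inline uint64_t
demo_rtb_to_rtxt(const struct demo_mount *mp, uint64_t rtbno)
{
	uint32_t	mod;

	return demo_rtb_to_rtx(mp, rtbno, &mod);
}
```

Callers that only care about alignment, such as the swapext checks in the diff, ignore the return value and test `mod` alone.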
* [PATCH 03/10] xfs: create a helper to compute leftovers of realtime extents 2022-12-30 22:19 ` [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 05/10] xfs: create helpers to convert rt block numbers to rt extent numbers Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 04/10] libfrog: move 64-bit division wrappers to libfrog Darrick J. Wong ` (5 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a helper to compute the realtime extent (xfs_rtxlen_t) from an extent length (xfs_extlen_t) value. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rtbitmap.h | 8 ++++++++ libxfs/xfs_trans_resv.c | 3 ++- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index b6a4c46bddc..e2a36fc157c 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -31,6 +31,14 @@ xfs_extlen_to_rtxmod( return len % mp->m_sb.sb_rextsize; } +static inline xfs_rtxlen_t +xfs_extlen_to_rtxlen( + struct xfs_mount *mp, + xfs_extlen_t len) +{ + return len / mp->m_sb.sb_rextsize; +} + /* * Functions for walking free space rtextents in the realtime bitmap. 
*/ diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c index 2835d7754a8..be486ed42c3 100644 --- a/libxfs/xfs_trans_resv.c +++ b/libxfs/xfs_trans_resv.c @@ -18,6 +18,7 @@ #include "xfs_trans.h" #include "xfs_trans_space.h" #include "xfs_quota_defs.h" +#include "xfs_rtbitmap.h" #define _ALLOC true #define _FREE false @@ -219,7 +220,7 @@ xfs_rtalloc_block_count( unsigned int blksz = XFS_FSB_TO_B(mp, 1); unsigned int rtbmp_bytes; - rtbmp_bytes = (XFS_MAX_BMBT_EXTLEN / mp->m_sb.sb_rextsize) / NBBY; + rtbmp_bytes = xfs_extlen_to_rtxlen(mp, XFS_MAX_BMBT_EXTLEN) / NBBY; return (howmany(rtbmp_bytes, blksz) + 1) * num_ops; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
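To make the `xfs_rtalloc_block_count()` arithmetic in the hunk above concrete, here is a small standalone sketch. It assumes `XFS_MAX_BMBT_EXTLEN` is 2^21 - 1 and `NBBY` is 8; the `demo_` names are hypothetical and the block size is passed in directly rather than derived from a mount structure.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the real definitions live in the libxfs headers. */
#define DEMO_MAX_BMBT_EXTLEN	((1U << 21) - 1)
#define DEMO_NBBY		8

/* Number of y-sized units needed to hold x, rounding up. */
static unsigned int
demo_howmany(unsigned int x, unsigned int y)
{
	return (x + y - 1) / y;
}

/* Plain-division form of xfs_extlen_to_rtxlen(). */
static uint32_t
demo_extlen_to_rtxlen(uint32_t rextsize, uint32_t len)
{
	assert(rextsize != 0);
	return len / rextsize;
}

/*
 * Bitmap blocks touched when allocating or freeing one maximally sized
 * extent: one bit per rt extent, converted to bytes, then to blocks,
 * plus one extra block, multiplied by the number of operations.
 */
static unsigned int
demo_rtalloc_block_count(uint32_t rextsize, unsigned int blksz,
		unsigned int num_ops)
{
	unsigned int	rtbmp_bytes;

	rtbmp_bytes = demo_extlen_to_rtxlen(rextsize, DEMO_MAX_BMBT_EXTLEN) /
			DEMO_NBBY;
	return (demo_howmany(rtbmp_bytes, blksz) + 1) * num_ops;
}
```

With a 4096-byte block size and an rt extent size of one block, the bitmap covering a maximal extent spans 262143 bytes, i.e. 64 blocks, so one operation reserves 65.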
* [PATCH 04/10] libfrog: move 64-bit division wrappers to libfrog 2022-12-30 22:19 ` [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 03/10] xfs: create a helper to compute leftovers of realtime extents Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 06/10] xfs: convert do_div calls to xfs_rtb_to_rtx helper calls Darrick J. Wong ` (4 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> We want to keep the rtgroup unit conversion functions as static inlines, so share the div64 functions via libfrog instead of libxfs_priv.h. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libfrog/Makefile | 1 + libfrog/div64.h | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++ libxfs/libxfs_priv.h | 61 +------------------------------------------- 3 files changed, 71 insertions(+), 60 deletions(-) create mode 100644 libfrog/div64.h diff --git a/libfrog/Makefile b/libfrog/Makefile index 66d2afe56fe..04aecf1abf1 100644 --- a/libfrog/Makefile +++ b/libfrog/Makefile @@ -40,6 +40,7 @@ crc32c.h \ crc32cselftest.h \ crc32defs.h \ crc32table.h \ +div64.h \ file_exchange.h \ fsgeom.h \ logging.h \ diff --git a/libfrog/div64.h b/libfrog/div64.h new file mode 100644 index 00000000000..265487916fc --- /dev/null +++ b/libfrog/div64.h @@ -0,0 +1,69 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (c) 2000-2005 Silicon Graphics, Inc. + * All Rights Reserved. 
+ */ +#ifndef LIBFROG_DIV64_H_ +#define LIBFROG_DIV64_H_ + +static inline int __do_div(unsigned long long *n, unsigned base) +{ + int __res; + __res = (int)(((unsigned long) *n) % (unsigned) base); + *n = ((unsigned long) *n) / (unsigned) base; + return __res; +} + +#define do_div(n,base) (__do_div((unsigned long long *)&(n), (base))) +#define do_mod(a, b) ((a) % (b)) +#define rol32(x,y) (((x) << (y)) | ((x) >> (32 - (y)))) + +/** + * div_u64_rem - unsigned 64bit divide with 32bit divisor with remainder + * @dividend: unsigned 64bit dividend + * @divisor: unsigned 32bit divisor + * @remainder: pointer to unsigned 32bit remainder + * + * Return: sets ``*remainder``, then returns dividend / divisor + * + * This is commonly provided by 32bit archs to provide an optimized 64bit + * divide. + */ +static inline uint64_t +div_u64_rem(uint64_t dividend, uint32_t divisor, uint32_t *remainder) +{ + *remainder = dividend % divisor; + return dividend / divisor; +} + +/** + * div_u64 - unsigned 64bit divide with 32bit divisor + * @dividend: unsigned 64bit dividend + * @divisor: unsigned 32bit divisor + * + * This is the most common 64bit divide and should be used if possible, + * as many 32bit archs can optimize this variant better than a full 64bit + * divide. 
+ */ +static inline uint64_t div_u64(uint64_t dividend, uint32_t divisor) +{ + uint32_t remainder; + return div_u64_rem(dividend, divisor, &remainder); +} + +/** + * div64_u64_rem - unsigned 64bit divide with 64bit divisor and remainder + * @dividend: unsigned 64bit dividend + * @divisor: unsigned 64bit divisor + * @remainder: pointer to unsigned 64bit remainder + * + * Return: sets ``*remainder``, then returns dividend / divisor + */ +static inline uint64_t +div64_u64_rem(uint64_t dividend, uint64_t divisor, uint64_t *remainder) +{ + *remainder = dividend % divisor; + return dividend / divisor; +} + +#endif /* LIBFROG_DIV64_H_ */ diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h index 4cb996a3f3f..49441ac787f 100644 --- a/libxfs/libxfs_priv.h +++ b/libxfs/libxfs_priv.h @@ -50,6 +50,7 @@ #include "bitops.h" #include "kmem.h" #include "libfrog/radix-tree.h" +#include "libfrog/div64.h" #include "atomic.h" #include "spinlock.h" #include "linux-err.h" @@ -257,66 +258,6 @@ static inline bool __must_check __must_check_overflow(bool overflow) __builtin_add_overflow(__a, __b, __d); \ })) -static inline int __do_div(unsigned long long *n, unsigned base) -{ - int __res; - __res = (int)(((unsigned long) *n) % (unsigned) base); - *n = ((unsigned long) *n) / (unsigned) base; - return __res; -} - -#define do_div(n,base) (__do_div((unsigned long long *)&(n), (base))) -#define do_mod(a, b) ((a) % (b)) -#define rol32(x,y) (((x) << (y)) | ((x) >> (32 - (y)))) - -/** - * div_u64_rem - unsigned 64bit divide with 32bit divisor with remainder - * @dividend: unsigned 64bit dividend - * @divisor: unsigned 32bit divisor - * @remainder: pointer to unsigned 32bit remainder - * - * Return: sets ``*remainder``, then returns dividend / divisor - * - * This is commonly provided by 32bit archs to provide an optimized 64bit - * divide. 
- */ -static inline uint64_t -div_u64_rem(uint64_t dividend, uint32_t divisor, uint32_t *remainder) -{ - *remainder = dividend % divisor; - return dividend / divisor; -} - -/** - * div_u64 - unsigned 64bit divide with 32bit divisor - * @dividend: unsigned 64bit dividend - * @divisor: unsigned 32bit divisor - * - * This is the most common 64bit divide and should be used if possible, - * as many 32bit archs can optimize this variant better than a full 64bit - * divide. - */ -static inline uint64_t div_u64(uint64_t dividend, uint32_t divisor) -{ - uint32_t remainder; - return div_u64_rem(dividend, divisor, &remainder); -} - -/** - * div64_u64_rem - unsigned 64bit divide with 64bit divisor and remainder - * @dividend: unsigned 64bit dividend - * @divisor: unsigned 64bit divisor - * @remainder: pointer to unsigned 64bit remainder - * - * Return: sets ``*remainder``, then returns dividend / divisor - */ -static inline uint64_t -div64_u64_rem(uint64_t dividend, uint64_t divisor, uint64_t *remainder) -{ - *remainder = dividend % divisor; - return dividend / divisor; -} - #define min_t(type,x,y) \ ({ type __x = (x); type __y = (y); __x < __y ? __x: __y; }) #define max_t(type,x,y) \ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 06/10] xfs: convert do_div calls to xfs_rtb_to_rtx helper calls 2022-12-30 22:19 ` [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 04/10] libfrog: move 64-bit division wrappers to libfrog Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 09/10] xfs_repair: convert utility to use new rt extent helpers and types Darrick J. Wong ` (3 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert these calls to use the helpers, and clean up all these places where the same variable can have different units depending on where it is in the function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_bmap.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c index 5fbfc5372c9..3637b07feba 100644 --- a/libxfs/xfs_bmap.c +++ b/libxfs/xfs_bmap.c @@ -4915,12 +4915,8 @@ xfs_bmap_del_extent_delay( ASSERT(got->br_startoff <= del->br_startoff); ASSERT(got_endoff >= del_endoff); - if (isrt) { - uint64_t rtexts = XFS_FSB_TO_B(mp, del->br_blockcount); - - do_div(rtexts, mp->m_sb.sb_rextsize); - xfs_mod_frextents(mp, rtexts); - } + if (isrt) + xfs_mod_frextents(mp, xfs_rtb_to_rtxt(mp, del->br_blockcount)); /* * Update the inode delalloc counter now and wait to update the ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 09/10] xfs_repair: convert utility to use new rt extent helpers and types 2022-12-30 22:19 ` [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 06/10] xfs: convert do_div calls to xfs_rtb_to_rtx helper calls Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 08/10] xfs: use shifting and masking when converting rt extents, if possible Darrick J. Wong ` (2 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert the repair program to use the new realtime extent types and helper functions instead of open-coding them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/agheader.h | 2 +- repair/dinode.c | 21 ++++++++++++--------- repair/incore.c | 16 ++++++++-------- repair/incore.h | 4 ++-- repair/phase4.c | 16 ++++++++-------- repair/rt.c | 4 ++-- 6 files changed, 33 insertions(+), 30 deletions(-) diff --git a/repair/agheader.h b/repair/agheader.h index a63827c8725..e3e4a21e02b 100644 --- a/repair/agheader.h +++ b/repair/agheader.h @@ -11,7 +11,7 @@ typedef struct fs_geometry { uint32_t sb_blocksize; /* blocksize (bytes) */ xfs_rfsblock_t sb_dblocks; /* # data blocks */ xfs_rfsblock_t sb_rblocks; /* # realtime blocks */ - xfs_rtblock_t sb_rextents; /* # realtime extents */ + xfs_rtbxlen_t sb_rextents; /* # realtime extents */ xfs_fsblock_t sb_logstart; /* starting log block # */ xfs_agblock_t sb_rextsize; /* realtime extent size (blocks )*/ xfs_agblock_t sb_agblocks; /* # of blocks per ag */ diff --git a/repair/dinode.c b/repair/dinode.c index cc2c3474634..e66f93abb1d 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -194,13 +194,13 @@ process_rt_rec_dups( xfs_ino_t ino, struct xfs_bmbt_irec *irec) { - xfs_fsblock_t b; - xfs_rtblock_t ext; + xfs_rtblock_t b; + xfs_rtxnum_t ext; - 
for (b = rounddown(irec->br_startblock, mp->m_sb.sb_rextsize); + for (b = xfs_rtb_rounddown_rtx(mp, irec->br_startblock); b < irec->br_startblock + irec->br_blockcount; b += mp->m_sb.sb_rextsize) { - ext = (xfs_rtblock_t) b / mp->m_sb.sb_rextsize; + ext = xfs_rtb_to_rtxt(mp, b); if (search_rt_dup_extent(mp, ext)) { do_warn( _("data fork in rt ino %" PRIu64 " claims dup rt extent," @@ -224,14 +224,17 @@ process_rt_rec_state( struct xfs_bmbt_irec *irec) { xfs_fsblock_t b = irec->br_startblock; - xfs_rtblock_t ext; + xfs_rtxnum_t ext; int state; do { - ext = (xfs_rtblock_t)b / mp->m_sb.sb_rextsize; + xfs_extlen_t mod; + + ext = xfs_rtb_to_rtxt(mp, b); state = get_rtbmap(ext); - if ((b % mp->m_sb.sb_rextsize) != 0) { + xfs_rtb_to_rtx(mp, b, &mod); + if (mod) { /* * We are midway through a partially written extent. * If we don't find the state that gets set in the @@ -242,7 +245,7 @@ process_rt_rec_state( do_error( _("data fork in rt inode %" PRIu64 " found invalid rt extent %"PRIu64" state %d at rt block %"PRIu64"\n"), ino, ext, state, b); - b = roundup(b, mp->m_sb.sb_rextsize); + b = xfs_rtb_roundup_rtx(mp, b); continue; } @@ -2321,7 +2324,7 @@ validate_extsize( */ if ((flags & XFS_DIFLAG_EXTSZINHERIT) && (flags & XFS_DIFLAG_RTINHERIT) && - value % mp->m_sb.sb_rextsize > 0) + xfs_extlen_to_rtxmod(mp, value) > 0) misaligned = true; /* diff --git a/repair/incore.c b/repair/incore.c index f7a89e70d91..06edaf0d605 100644 --- a/repair/incore.c +++ b/repair/incore.c @@ -178,21 +178,21 @@ static size_t rt_bmap_size; */ int get_rtbmap( - xfs_rtblock_t bno) + xfs_rtxnum_t rtx) { - return (*(rt_bmap + bno / XR_BB_NUM) >> - ((bno % XR_BB_NUM) * XR_BB)) & XR_BB_MASK; + return (*(rt_bmap + rtx / XR_BB_NUM) >> + ((rtx % XR_BB_NUM) * XR_BB)) & XR_BB_MASK; } void set_rtbmap( - xfs_rtblock_t bno, + xfs_rtxnum_t rtx, int state) { - *(rt_bmap + bno / XR_BB_NUM) = - ((*(rt_bmap + bno / XR_BB_NUM) & - (~((uint64_t) XR_BB_MASK << ((bno % XR_BB_NUM) * XR_BB)))) | - (((uint64_t) state) << 
((bno % XR_BB_NUM) * XR_BB))); + *(rt_bmap + rtx / XR_BB_NUM) = + ((*(rt_bmap + rtx / XR_BB_NUM) & + (~((uint64_t) XR_BB_MASK << ((rtx % XR_BB_NUM) * XR_BB)))) | + (((uint64_t) state) << ((rtx % XR_BB_NUM) * XR_BB))); } static void diff --git a/repair/incore.h b/repair/incore.h index 53609f683af..c31b778a0fb 100644 --- a/repair/incore.h +++ b/repair/incore.h @@ -28,8 +28,8 @@ void set_bmap_ext(xfs_agnumber_t agno, xfs_agblock_t agbno, int get_bmap_ext(xfs_agnumber_t agno, xfs_agblock_t agbno, xfs_agblock_t maxbno, xfs_extlen_t *blen); -void set_rtbmap(xfs_rtblock_t bno, int state); -int get_rtbmap(xfs_rtblock_t bno); +void set_rtbmap(xfs_rtxnum_t rtx, int state); +int get_rtbmap(xfs_rtxnum_t rtx); static inline void set_bmap(xfs_agnumber_t agno, xfs_agblock_t agbno, int state) diff --git a/repair/phase4.c b/repair/phase4.c index 28ecf56f45b..cfdea1460e5 100644 --- a/repair/phase4.c +++ b/repair/phase4.c @@ -229,9 +229,9 @@ void phase4(xfs_mount_t *mp) { ino_tree_node_t *irec; - xfs_rtblock_t bno; - xfs_rtblock_t rt_start; - xfs_extlen_t rt_len; + xfs_rtxnum_t rtx; + xfs_rtxnum_t rt_start; + xfs_rtxlen_t rt_len; xfs_agnumber_t i; xfs_agblock_t j; xfs_agblock_t ag_end; @@ -330,14 +330,14 @@ phase4(xfs_mount_t *mp) rt_start = 0; rt_len = 0; - for (bno = 0; bno < mp->m_sb.sb_rextents; bno++) { - bstate = get_rtbmap(bno); + for (rtx = 0; rtx < mp->m_sb.sb_rextents; rtx++) { + bstate = get_rtbmap(rtx); switch (bstate) { case XR_E_BAD_STATE: default: do_warn( _("unknown rt extent state, extent %" PRIu64 "\n"), - bno); + rtx); fallthrough; case XR_E_METADATA: case XR_E_UNKNOWN: @@ -360,14 +360,14 @@ phase4(xfs_mount_t *mp) break; case XR_E_MULT: if (rt_start == 0) { - rt_start = bno; + rt_start = rtx; rt_len = 1; } else if (rt_len == XFS_MAX_BMBT_EXTLEN) { /* * large extent case */ add_rt_dup_extent(rt_start, rt_len); - rt_start = bno; + rt_start = rtx; rt_len = 1; } else rt_len++; diff --git a/repair/rt.c b/repair/rt.c index a4cca7aa223..947382e9ede 100644 --- 
a/repair/rt.c +++ b/repair/rt.c @@ -48,8 +48,8 @@ generate_rtinfo(xfs_mount_t *mp, xfs_rtword_t *words, xfs_suminfo_t *sumcompute) { - xfs_rtblock_t extno; - xfs_rtblock_t start_ext; + xfs_rtxnum_t extno; + xfs_rtxnum_t start_ext; int bitsperblock; int bmbno; xfs_rtword_t freebit; ^ permalink raw reply related [flat|nested] 565+ messages in thread
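The `get_rtbmap()`/`set_rtbmap()` pair touched by the repair patch above packs one small state value per rt extent into 64-bit words. A minimal sketch of that packing follows, assuming four bits of state per extent (the real `XR_BB*` constants are defined in repair/incore.h and are not shown in this thread). The fixed-size array and the `demo_` names are hypothetical, and unlike the quoted code this sketch masks the incoming state.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values standing in for XR_BB, XR_BB_NUM and XR_BB_MASK. */
#define DEMO_BB		4			/* state bits per rt extent */
#define DEMO_BB_NUM	(64 / DEMO_BB)		/* states per 64-bit word */
#define DEMO_BB_MASK	0xfULL
#define DEMO_WORDS	4

/* Hypothetical fixed-size map; xfs_repair sizes rt_bmap at runtime. */
static uint64_t demo_rt_bmap[DEMO_WORDS];

/* Fetch the state recorded for rt extent @rtx. */
static int
demo_get_rtbmap(uint64_t rtx)
{
	assert(rtx < DEMO_WORDS * DEMO_BB_NUM);
	return (demo_rt_bmap[rtx / DEMO_BB_NUM] >>
		((rtx % DEMO_BB_NUM) * DEMO_BB)) & DEMO_BB_MASK;
}

/* Record @state for rt extent @rtx without disturbing its neighbours. */
static void
demo_set_rtbmap(uint64_t rtx, int state)
{
	unsigned int	shift = (rtx % DEMO_BB_NUM) * DEMO_BB;
	uint64_t	*word;

	assert(rtx < DEMO_WORDS * DEMO_BB_NUM);
	word = &demo_rt_bmap[rtx / DEMO_BB_NUM];
	*word = (*word & ~(DEMO_BB_MASK << shift)) |
		(((uint64_t)state & DEMO_BB_MASK) << shift);
}
```

The index is an rt extent number, not an rt block number, which is the distinction the `xfs_rtblock_t` to `xfs_rtxnum_t` parameter change records in the type.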
* [PATCH 08/10] xfs: use shifting and masking when converting rt extents, if possible 2022-12-30 22:19 ` [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 09/10] xfs_repair: convert utility to use new rt extent helpers and types Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 10/10] mkfs: convert utility to use new rt extent helpers and types Darrick J. Wong 2022-12-30 22:19 ` [PATCH 07/10] xfs: create rt extent rounding helpers for realtime extent blocks Darrick J. Wong 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Avoid the costs of integer division (32-bit and 64-bit) if the realtime extent size is a power of two. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/xfs_mount.h | 2 ++ libxfs/libxfs_priv.h | 24 ++++++++++++++++++++++++ libxfs/xfs_rtbitmap.h | 20 ++++++++++++++++++++ libxfs/xfs_sb.c | 2 ++ 4 files changed, 48 insertions(+) diff --git a/include/xfs_mount.h b/include/xfs_mount.h index 4347098dc7e..6de360d33d3 100644 --- a/include/xfs_mount.h +++ b/include/xfs_mount.h @@ -67,6 +67,7 @@ typedef struct xfs_mount { uint8_t m_blkbb_log; /* blocklog - BBSHIFT */ uint8_t m_sectbb_log; /* sectorlog - BBSHIFT */ uint8_t m_agno_log; /* log #ag's */ + int8_t m_rtxblklog; /* log2 of rextsize, if possible */ uint m_blockmask; /* sb_blocksize-1 */ uint m_blockwsize; /* sb_blocksize in words */ uint m_blockwmask; /* blockwsize-1 */ @@ -88,6 +89,7 @@ typedef struct xfs_mount { uint m_ag_max_usable; /* max space per AG */ struct radix_tree_root m_perag_tree; uint64_t m_features; /* active filesystem features */ + uint64_t m_rtxblkmask; /* rt extent block mask */ unsigned long m_opstate; /* dynamic state flags */ bool m_finobt_nores; /* no per-AG finobt resv. 
*/ uint m_qflags; /* quota status flags */ diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h index 71abfdbe401..268c52b508d 100644 --- a/libxfs/libxfs_priv.h +++ b/libxfs/libxfs_priv.h @@ -371,6 +371,30 @@ howmany_64(uint64_t x, uint32_t y) return x; } +/* If @b is a power of 2, return log2(b). Else return -1. */ +static inline int8_t log2_if_power2(unsigned long b) +{ + unsigned long mask = 1; + unsigned int i; + unsigned int ret = 1; + + if (!is_power_of_2(b)) + return -1; + + for (i = 0; i < NBBY * sizeof(unsigned long); i++, mask <<= 1) { + if (b & mask) + ret = i; + } + + return ret; +} + +/* If @b is a power of 2, return a mask of the lower bits, else return zero. */ +static inline unsigned long long mask64_if_power2(unsigned long b) +{ + return is_power_of_2(b) ? b - 1 : 0; +} + /* buffer management */ #define XBF_TRYLOCK 0 #define XBF_UNMAPPED 0 diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index bc51d3bfc7c..9dd791181ca 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -11,6 +11,9 @@ xfs_rtx_to_rtb( struct xfs_mount *mp, xfs_rtxnum_t rtx) { + if (mp->m_rtxblklog >= 0) + return rtx << mp->m_rtxblklog; + return rtx * mp->m_sb.sb_rextsize; } @@ -19,6 +22,9 @@ xfs_rtxlen_to_extlen( struct xfs_mount *mp, xfs_rtxlen_t rtxlen) { + if (mp->m_rtxblklog >= 0) + return rtxlen << mp->m_rtxblklog; + return rtxlen * mp->m_sb.sb_rextsize; } @@ -28,6 +34,9 @@ xfs_extlen_to_rtxmod( struct xfs_mount *mp, xfs_extlen_t len) { + if (mp->m_rtxblklog >= 0) + return len & mp->m_rtxblkmask; + return len % mp->m_sb.sb_rextsize; } @@ -36,6 +45,9 @@ xfs_extlen_to_rtxlen( struct xfs_mount *mp, xfs_extlen_t len) { + if (mp->m_rtxblklog >= 0) + return len >> mp->m_rtxblklog; + return len / mp->m_sb.sb_rextsize; } @@ -45,6 +57,11 @@ xfs_rtb_to_rtx( xfs_rtblock_t rtbno, xfs_extlen_t *mod) { + if (mp->m_rtxblklog >= 0) { + *mod = rtbno & mp->m_rtxblkmask; + return rtbno >> mp->m_rtxblklog; + } + return div_u64_rem(rtbno, mp->m_sb.sb_rextsize, mod); } @@ 
-53,6 +70,9 @@ xfs_rtb_to_rtxt( struct xfs_mount *mp, xfs_rtblock_t rtbno) { + if (mp->m_rtxblklog >= 0) + return rtbno >> mp->m_rtxblklog; + return div_u64(rtbno, mp->m_sb.sb_rextsize); } diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c index 55a5c5fc631..8605c91e212 100644 --- a/libxfs/xfs_sb.c +++ b/libxfs/xfs_sb.c @@ -950,6 +950,8 @@ xfs_sb_mount_common( mp->m_blockmask = sbp->sb_blocksize - 1; mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG; mp->m_blockwmask = mp->m_blockwsize - 1; + mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize); + mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize); mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 1); mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 0); ^ permalink raw reply related [flat|nested] 565+ messages in thread
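As a rough illustration of the fast path added above, the sketch below caches log2 and mask values at "mount" time and falls back to division when the rt extent size is not a power of two. `struct demo_mount` and all `demo_` functions are hypothetical; the log2 loop is a simplified substitute for `log2_if_power2()`, not a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the xfs_mount fields used by the helpers. */
struct demo_mount {
	uint32_t	rextsize;	/* rt extent size, in blocks */
	int8_t		rtxblklog;	/* log2(rextsize), or -1 */
	uint64_t	rtxblkmask;	/* rextsize - 1, or 0 */
};

static bool
demo_is_power_of_2(uint64_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Cache the shift and mask once, as xfs_sb_mount_common() does. */
static void
demo_mount_init(struct demo_mount *mp, uint32_t rextsize)
{
	assert(rextsize != 0);
	mp->rextsize = rextsize;
	mp->rtxblklog = -1;
	mp->rtxblkmask = 0;
	if (demo_is_power_of_2(rextsize)) {
		mp->rtxblklog = 0;
		while ((1ULL << mp->rtxblklog) < rextsize)
			mp->rtxblklog++;
		mp->rtxblkmask = rextsize - 1;
	}
}

/* rt block -> rt extent, shifting and masking when possible. */
static uint64_t
demo_rtb_to_rtx(const struct demo_mount *mp, uint64_t rtbno, uint32_t *mod)
{
	if (mp->rtxblklog >= 0) {
		*mod = rtbno & mp->rtxblkmask;
		return rtbno >> mp->rtxblklog;
	}

	*mod = rtbno % mp->rextsize;
	return rtbno / mp->rextsize;
}
```

Both branches must return identical results for power-of-two sizes; the negative log value is what selects the division fallback.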
* [PATCH 10/10] mkfs: convert utility to use new rt extent helpers and types 2022-12-30 22:19 ` [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 08/10] xfs: use shifting and masking when converting rt extents, if possible Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 07/10] xfs: create rt extent rounding helpers for realtime extent blocks Darrick J. Wong 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert mkfs to use the new realtime extent types and helper functions instead of open-coding them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- mkfs/proto.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/mkfs/proto.c b/mkfs/proto.c index 484b5deced8..21fe2c7f972 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -913,21 +913,22 @@ rtfreesp_init( struct xfs_mount *mp) { struct xfs_trans *tp; - xfs_fileoff_t bno; - xfs_fileoff_t ebno; + xfs_rtxnum_t rtx; + xfs_rtxnum_t ertx; int error; - for (bno = 0; bno < mp->m_sb.sb_rextents; bno = ebno) { + for (rtx = 0; rtx < mp->m_sb.sb_rextents; rtx = ertx) { error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp); if (error) res_failed(error); libxfs_trans_ijoin(tp, mp->m_rbmip, 0); - ebno = XFS_RTMIN(mp->m_sb.sb_rextents, - bno + NBBY * mp->m_sb.sb_blocksize); + ertx = XFS_RTMIN(mp->m_sb.sb_rextents, + rtx + NBBY * mp->m_sb.sb_blocksize); - error = -libxfs_rtfree_extent(tp, bno, (xfs_extlen_t)(ebno-bno)); + error = -libxfs_rtfree_extent(tp, rtx, + (xfs_rtxlen_t)(ertx - rtx)); if (error) { fail(_("Error initializing the realtime space"), error); ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 07/10] xfs: create rt extent rounding helpers for realtime extent blocks 2022-12-30 22:19 ` [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 10/10] mkfs: convert utility to use new rt extent helpers and types Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a pair of functions to round rtblock numbers up or down to the nearest rt extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libfrog/div64.h | 14 ++++++++++++++ libxfs/libxfs_priv.h | 8 -------- libxfs/xfs_rtbitmap.h | 18 ++++++++++++++++++ 3 files changed, 32 insertions(+), 8 deletions(-) diff --git a/libfrog/div64.h b/libfrog/div64.h index 265487916fc..9317b28aad4 100644 --- a/libfrog/div64.h +++ b/libfrog/div64.h @@ -66,4 +66,18 @@ div64_u64_rem(uint64_t dividend, uint64_t divisor, uint64_t *remainder) return dividend / divisor; } +static inline uint64_t rounddown_64(uint64_t x, uint32_t y) +{ + do_div(x, y); + return x * y; +} + +static inline uint64_t +roundup_64(uint64_t x, uint32_t y) +{ + x += y - 1; + do_div(x, y); + return x * y; +} + #endif /* LIBFROG_DIV64_H_ */ diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h index 49441ac787f..71abfdbe401 100644 --- a/libxfs/libxfs_priv.h +++ b/libxfs/libxfs_priv.h @@ -363,14 +363,6 @@ roundup_pow_of_two(uint v) return 0; } -static inline uint64_t -roundup_64(uint64_t x, uint32_t y) -{ - x += y - 1; - do_div(x, y); - return x * y; -} - static inline uint64_t howmany_64(uint64_t x, uint32_t y) { diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index bdd4858a794..bc51d3bfc7c 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -56,6 +56,24 @@ xfs_rtb_to_rtxt( return div_u64(rtbno, mp->m_sb.sb_rextsize); } +/* Round this rtblock up to the 
nearest rt extent size. */ +static inline xfs_rtblock_t +xfs_rtb_roundup_rtx( + struct xfs_mount *mp, + xfs_rtblock_t rtbno) +{ + return roundup_64(rtbno, mp->m_sb.sb_rextsize); +} + +/* Round this rtblock down to the nearest rt extent size. */ +static inline xfs_rtblock_t +xfs_rtb_rounddown_rtx( + struct xfs_mount *mp, + xfs_rtblock_t rtbno) +{ + return rounddown_64(rtbno, mp->m_sb.sb_rextsize); +} + /* * Functions for walking free space rtextents in the realtime bitmap. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
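A short standalone sketch of the rounding helpers from the patch above, written with plain C division instead of `do_div()` so it can be compiled on its own. The `demo_` names are hypothetical and the rt extent size is passed as a parameter rather than read from `mp->m_sb.sb_rextsize`.

```c
#include <assert.h>
#include <stdint.h>

/* Round x down to a multiple of y. */
static inline uint64_t
demo_rounddown_64(uint64_t x, uint32_t y)
{
	assert(y != 0);
	return (x / y) * y;
}

/*
 * Round x up to a multiple of y.  Like the quoted roundup_64(), this
 * adds y - 1 first, so it can wrap for x close to UINT64_MAX.
 */
static inline uint64_t
demo_roundup_64(uint64_t x, uint32_t y)
{
	assert(y != 0);
	x += y - 1;
	return (x / y) * y;
}

/* Round an rt block number down to the start of its rt extent. */
static inline uint64_t
demo_rtb_rounddown_rtx(uint32_t rextsize, uint64_t rtbno)
{
	return demo_rounddown_64(rtbno, rextsize);
}

/* Round an rt block number up to the start of the next rt extent. */
static inline uint64_t
demo_rtb_roundup_rtx(uint32_t rextsize, uint64_t rtbno)
{
	return demo_roundup_64(rtbno, rextsize);
}
```

An rt block number that is already aligned is returned unchanged by both helpers, which is what lets loops such as the one in `process_rt_rec_dups()` start from the rounded-down value.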
* [PATCHSET v1.0 0/9] libxfs: refactor rtbitmap/summary macros 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (21 preceding siblings ...) 2022-12-30 22:19 ` [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 3/9] xfs: convert open-coded xfs_rtword_t pointer accesses to helper Darrick J. Wong ` (8 more replies) 2022-12-30 22:19 ` [PATCHSET v1.0 0/8] xfs_db: debug realtime geometry Darrick J. Wong ` (16 subsequent siblings) 39 siblings, 9 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs Hi all, In preparation for adding block headers and enforcing endian order in rtbitmap and rtsummary blocks, replace open-coded geometry computations and fugly macros with proper helper functions that can be typechecked. Soon we'll be needing to add more complex logic to the helpers. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. --D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=refactor-rtbitmap-macros xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=refactor-rtbitmap-macros --- db/check.c | 63 ++++++++--- libxfs/init.c | 8 + libxfs/libxfs_api_defs.h | 8 + libxfs/xfs_format.h | 32 +++-- libxfs/xfs_rtbitmap.c | 268 +++++++++++++++++++++++++++++++++------------- libxfs/xfs_rtbitmap.h | 133 +++++++++++++++++++++++ libxfs/xfs_trans_resv.c | 9 +- libxfs/xfs_types.h | 2 repair/globals.c | 4 - repair/globals.h | 4 - repair/phase6.c | 14 +- repair/rt.c | 43 ++++--- repair/rt.h | 6 - 13 files changed, 443 insertions(+), 151 deletions(-) ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 3/9] xfs: convert open-coded xfs_rtword_t pointer accesses to helper 2022-12-30 22:19 ` [PATCHSET v1.0 0/9] libxfs: refactor rtbitmap/summary macros Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/9] xfs: convert the rtbitmap block and bit macros to static inline functions Darrick J. Wong ` (7 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> There are a bunch of places where we use open-coded logic to find a pointer to an xfs_rtword_t within a rt bitmap buffer. Convert all that to helper functions for better type safety. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/check.c | 4 ++- libxfs/xfs_rtbitmap.c | 59 ++++++++++++++++++++++++++----------------------- libxfs/xfs_rtbitmap.h | 20 +++++++++++++++++ repair/phase6.c | 4 ++- 4 files changed, 56 insertions(+), 31 deletions(-) diff --git a/db/check.c b/db/check.c index 5297ea25459..185be6352b8 100644 --- a/db/check.c +++ b/db/check.c @@ -3635,7 +3635,7 @@ process_rtbitmap( push_cur(); set_cur(&typtab[TYP_RTBITMAP], XFS_FSB_TO_DADDR(mp, bno), blkbb, DB_RING_IGN, NULL); - if ((words = iocur_top->data) == NULL) { + if (!iocur_top->bp) { if (!sflag) dbprintf(_("can't read block %lld for rtbitmap " "inode\n"), @@ -3644,6 +3644,8 @@ process_rtbitmap( pop_cur(); continue; } + + words = xfs_rbmblock_wordptr(iocur_top->bp, 0); for (bit = 0; bit < bitsperblock && extno < mp->m_sb.sb_rextents; bit++, extno++) { diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c index 5301b0448f1..8fec2b769d5 100644 --- a/libxfs/xfs_rtbitmap.c +++ b/libxfs/xfs_rtbitmap.c @@ -108,7 +108,6 @@ xfs_rtfind_back( int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ - xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ xfs_rtxnum_t 
firstbit; /* first useful bit in the word */ xfs_rtxnum_t i; /* current bit number rel. to start */ @@ -126,12 +125,12 @@ xfs_rtfind_back( if (error) { return error; } - bufp = bp->b_addr; + /* * Get the first word's index & point to it. */ word = xfs_rtx_to_rbmword(mp, start); - b = &bufp[word]; + b = xfs_rbmblock_wordptr(bp, word); bit = (int)(start & (XFS_NBWORD - 1)); len = start - limit + 1; /* @@ -178,9 +177,9 @@ xfs_rtfind_back( if (error) { return error; } - bufp = bp->b_addr; + word = mp->m_blockwsize - 1; - b = &bufp[word]; + b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the previous word in the buffer. @@ -224,9 +223,9 @@ xfs_rtfind_back( if (error) { return error; } - bufp = bp->b_addr; + word = mp->m_blockwsize - 1; - b = &bufp[word]; + b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the previous word in the buffer. @@ -283,7 +282,6 @@ xfs_rtfind_forw( int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ - xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ xfs_rtxnum_t i; /* current bit number rel. to start */ xfs_rtxnum_t lastbit; /* last useful bit in the word */ @@ -301,12 +299,12 @@ xfs_rtfind_forw( if (error) { return error; } - bufp = bp->b_addr; + /* * Get the first word's index & point to it. */ word = xfs_rtx_to_rbmword(mp, start); - b = &bufp[word]; + b = xfs_rbmblock_wordptr(bp, word); bit = (int)(start & (XFS_NBWORD - 1)); len = limit - start + 1; /* @@ -352,8 +350,9 @@ xfs_rtfind_forw( if (error) { return error; } - b = bufp = bp->b_addr; + word = 0; + b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the previous word in the buffer. @@ -397,8 +396,9 @@ xfs_rtfind_forw( if (error) { return error; } - b = bufp = bp->b_addr; + word = 0; + b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the next word in the buffer. 
@@ -546,7 +546,6 @@ xfs_rtmodify_range( int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ - xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ xfs_rtword_t *first; /* first used word in the buffer */ int i; /* current bit number rel. to start */ @@ -565,12 +564,12 @@ xfs_rtmodify_range( if (error) { return error; } - bufp = bp->b_addr; + /* * Compute the starting word's address, and starting bit. */ word = xfs_rtx_to_rbmword(mp, start); - first = b = &bufp[word]; + first = b = xfs_rbmblock_wordptr(bp, word); bit = (int)(start & (XFS_NBWORD - 1)); /* * 0 (allocated) => all zeroes; 1 (free) => all ones. @@ -604,14 +603,15 @@ xfs_rtmodify_range( * Get the next one. */ xfs_trans_log_buf(tp, bp, - (uint)((char *)first - (char *)bufp), - (uint)((char *)b - (char *)bufp)); + (uint)((char *)first - (char *)bp->b_addr), + (uint)((char *)b - (char *)bp->b_addr)); error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp); if (error) { return error; } - first = b = bufp = bp->b_addr; + word = 0; + first = b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the next word in the buffer @@ -644,14 +644,15 @@ xfs_rtmodify_range( * Get the next one. */ xfs_trans_log_buf(tp, bp, - (uint)((char *)first - (char *)bufp), - (uint)((char *)b - (char *)bufp)); + (uint)((char *)first - (char *)bp->b_addr), + (uint)((char *)b - (char *)bp->b_addr)); error = xfs_rtbuf_get(mp, tp, ++block, 0, &bp); if (error) { return error; } - first = b = bufp = bp->b_addr; + word = 0; + first = b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the next word in the buffer @@ -681,8 +682,9 @@ xfs_rtmodify_range( * Log any remaining changed bytes. 
*/ if (b > first) - xfs_trans_log_buf(tp, bp, (uint)((char *)first - (char *)bufp), - (uint)((char *)b - (char *)bufp - 1)); + xfs_trans_log_buf(tp, bp, + (uint)((char *)first - (char *)bp->b_addr), + (uint)((char *)b - (char *)bp->b_addr - 1)); return 0; } @@ -780,7 +782,6 @@ xfs_rtcheck_range( int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ - xfs_rtword_t *bufp; /* starting word in buffer */ int error; /* error value */ xfs_rtxnum_t i; /* current bit number rel. to start */ xfs_rtxnum_t lastbit; /* last useful bit in word */ @@ -799,12 +800,12 @@ xfs_rtcheck_range( if (error) { return error; } - bufp = bp->b_addr; + /* * Compute the starting word's address, and starting bit. */ word = xfs_rtx_to_rbmword(mp, start); - b = &bufp[word]; + b = xfs_rbmblock_wordptr(bp, word); bit = (int)(start & (XFS_NBWORD - 1)); /* * 0 (allocated) => all zero's; 1 (free) => all one's. @@ -850,8 +851,9 @@ xfs_rtcheck_range( if (error) { return error; } - b = bufp = bp->b_addr; + word = 0; + b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the next word in the buffer. @@ -896,8 +898,9 @@ xfs_rtcheck_range( if (error) { return error; } - b = bufp = bp->b_addr; + word = 0; + b = xfs_rbmblock_wordptr(bp, word); } else { /* * Go on to the next word in the buffer. diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index 5f4a453e29e..af37afec2b0 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -121,6 +121,26 @@ xfs_rbmblock_to_rtx( return rbmoff << mp->m_blkbit_log; } +/* Return a pointer to a bitmap word within a rt bitmap block buffer. */ +static inline xfs_rtword_t * +xfs_rbmbuf_wordptr( + void *buf, + unsigned int rbmword) +{ + xfs_rtword_t *wordp = buf; + + return &wordp[rbmword]; +} + +/* Return a pointer to a bitmap word within a rt bitmap block. 
*/ +static inline xfs_rtword_t * +xfs_rbmblock_wordptr( + struct xfs_buf *bp, + unsigned int rbmword) +{ + return xfs_rbmbuf_wordptr(bp->b_addr, rbmword); +} + /* * Functions for walking free space rtextents in the realtime bitmap. */ diff --git a/repair/phase6.c b/repair/phase6.c index 13094730407..aef8a0d6b3f 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -828,11 +828,11 @@ _("can't access block %" PRIu64 " (fsbno %" PRIu64 ") of realtime bitmap inode % return(1); } - memmove(bp->b_addr, bmp, mp->m_sb.sb_blocksize); + memcpy(xfs_rbmblock_wordptr(bp, 0), bmp, mp->m_sb.sb_blocksize); libxfs_trans_log_buf(tp, bp, 0, mp->m_sb.sb_blocksize - 1); - bmp = (xfs_rtword_t *)((intptr_t) bmp + mp->m_sb.sb_blocksize); + bmp += mp->m_blockwsize; bno++; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
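For readers following along outside the tree, here is a minimal standalone sketch of the word-pointer helpers added above. The demo_* names, rtword_t and struct demo_buf are illustrative stand-ins (assumptions, not part of the patch); only the indexing logic is taken from xfs_rbmbuf_wordptr and xfs_rbmblock_wordptr. demo_word_byteoff is my own wrapper around the pointer subtraction that the callers open-code when computing the byte range for xfs_trans_log_buf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t rtword_t;	/* stand-in for xfs_rtword_t */

/* Hypothetical stand-in for struct xfs_buf; only b_addr matters here. */
struct demo_buf {
	void	*b_addr;
};

/* Mirrors xfs_rbmbuf_wordptr: index a raw buffer by bitmap word number. */
static inline rtword_t *
demo_rbmbuf_wordptr(void *buf, unsigned int rbmword)
{
	rtword_t	*wordp = buf;

	return &wordp[rbmword];
}

/* Mirrors xfs_rbmblock_wordptr: same lookup, starting from a buffer object. */
static inline rtword_t *
demo_rbmblock_wordptr(struct demo_buf *bp, unsigned int rbmword)
{
	return demo_rbmbuf_wordptr(bp->b_addr, rbmword);
}

/*
 * Byte offset of a word pointer from the start of the buffer.  This is the
 * (char *)b - (char *)bp->b_addr arithmetic used for the logging ranges.
 */
static inline size_t
demo_word_byteoff(struct demo_buf *bp, rtword_t *wordp)
{
	assert((char *)wordp >= (char *)bp->b_addr);
	return (size_t)((char *)wordp - (char *)bp->b_addr);
}
```

With the bufp local gone, every caller derives both the word pointer and the logging offsets from bp, so a stale base pointer cannot outlive a buffer switch.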
* [PATCH 1/9] xfs: convert the rtbitmap block and bit macros to static inline functions 2022-12-30 22:19 ` [PATCHSET v1.0 0/9] libxfs: refactor rtbitmap/summary macros Darrick J. Wong 2022-12-30 22:19 ` [PATCH 3/9] xfs: convert open-coded xfs_rtword_t pointer accesses to helper Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 4/9] xfs: convert rt summary macros to helpers Darrick J. Wong ` (6 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Replace these macros with typechecked helper functions. Eventually we're going to add more logic to the helpers and it'll be easier if we don't have to macro it up. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_format.h | 5 ----- libxfs/xfs_rtbitmap.c | 22 +++++++++++----------- libxfs/xfs_rtbitmap.h | 27 +++++++++++++++++++++++++++ 3 files changed, 38 insertions(+), 16 deletions(-) diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index d93cc0ea20e..6a3d684900a 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1209,11 +1209,6 @@ static inline bool xfs_dinode_has_large_extent_counts( ((xfs_suminfo_t *)((bp)->b_addr + \ (((so) * (uint)sizeof(xfs_suminfo_t)) & XFS_BLOCKMASK(mp)))) -#define XFS_BITTOBLOCK(mp,bi) ((bi) >> (mp)->m_blkbit_log) -#define XFS_BLOCKTOBIT(mp,bb) ((bb) << (mp)->m_blkbit_log) -#define XFS_BITTOWORD(mp,bi) \ - ((int)(((bi) >> XFS_NBWORDLOG) & XFS_BLOCKWMASK(mp))) - #define XFS_RTMIN(a,b) ((a) < (b) ? (a) : (b)) #define XFS_RTMAX(a,b) ((a) > (b) ? (a) : (b)) diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c index f74618323b4..62bd2d0eae3 100644 --- a/libxfs/xfs_rtbitmap.c +++ b/libxfs/xfs_rtbitmap.c @@ -121,7 +121,7 @@ xfs_rtfind_back( /* * Compute and read in starting bitmap block for starting block. 
*/ - block = XFS_BITTOBLOCK(mp, start); + block = xfs_rtx_to_rbmblock(mp, start); error = xfs_rtbuf_get(mp, tp, block, 0, &bp); if (error) { return error; @@ -130,7 +130,7 @@ xfs_rtfind_back( /* * Get the first word's index & point to it. */ - word = XFS_BITTOWORD(mp, start); + word = xfs_rtx_to_rbmword(mp, start); b = &bufp[word]; bit = (int)(start & (XFS_NBWORD - 1)); len = start - limit + 1; @@ -296,7 +296,7 @@ xfs_rtfind_forw( /* * Compute and read in starting bitmap block for starting block. */ - block = XFS_BITTOBLOCK(mp, start); + block = xfs_rtx_to_rbmblock(mp, start); error = xfs_rtbuf_get(mp, tp, block, 0, &bp); if (error) { return error; @@ -305,7 +305,7 @@ xfs_rtfind_forw( /* * Get the first word's index & point to it. */ - word = XFS_BITTOWORD(mp, start); + word = xfs_rtx_to_rbmword(mp, start); b = &bufp[word]; bit = (int)(start & (XFS_NBWORD - 1)); len = limit - start + 1; @@ -557,7 +557,7 @@ xfs_rtmodify_range( /* * Compute starting bitmap block number. */ - block = XFS_BITTOBLOCK(mp, start); + block = xfs_rtx_to_rbmblock(mp, start); /* * Read the bitmap block, and point to its data. */ @@ -569,7 +569,7 @@ xfs_rtmodify_range( /* * Compute the starting word's address, and starting bit. 
*/ - word = XFS_BITTOWORD(mp, start); + word = xfs_rtx_to_rbmword(mp, start); first = b = &bufp[word]; bit = (int)(start & (XFS_NBWORD - 1)); /* @@ -735,7 +735,7 @@ xfs_rtfree_range( if (preblock < start) { error = xfs_rtmodify_summary(mp, tp, XFS_RTBLOCKLOG(start - preblock), - XFS_BITTOBLOCK(mp, preblock), -1, rbpp, rsb); + xfs_rtx_to_rbmblock(mp, preblock), -1, rbpp, rsb); if (error) { return error; } @@ -747,7 +747,7 @@ xfs_rtfree_range( if (postblock > end) { error = xfs_rtmodify_summary(mp, tp, XFS_RTBLOCKLOG(postblock - end), - XFS_BITTOBLOCK(mp, end + 1), -1, rbpp, rsb); + xfs_rtx_to_rbmblock(mp, end + 1), -1, rbpp, rsb); if (error) { return error; } @@ -758,7 +758,7 @@ xfs_rtfree_range( */ error = xfs_rtmodify_summary(mp, tp, XFS_RTBLOCKLOG(postblock + 1 - preblock), - XFS_BITTOBLOCK(mp, preblock), 1, rbpp, rsb); + xfs_rtx_to_rbmblock(mp, preblock), 1, rbpp, rsb); return error; } @@ -791,7 +791,7 @@ xfs_rtcheck_range( /* * Compute starting bitmap block number */ - block = XFS_BITTOBLOCK(mp, start); + block = xfs_rtx_to_rbmblock(mp, start); /* * Read the bitmap block. */ @@ -803,7 +803,7 @@ xfs_rtcheck_range( /* * Compute the starting word's address, and starting bit. */ - word = XFS_BITTOWORD(mp, start); + word = xfs_rtx_to_rbmword(mp, start); b = &bufp[word]; bit = (int)(start & (XFS_NBWORD - 1)); /* diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index 9dd791181ca..e53011bc638 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -94,6 +94,33 @@ xfs_rtb_rounddown_rtx( return rounddown_64(rtbno, mp->m_sb.sb_rextsize); } +/* Convert an rt extent number to a file block offset in the rt bitmap file. */ +static inline xfs_fileoff_t +xfs_rtx_to_rbmblock( + struct xfs_mount *mp, + xfs_rtxnum_t rtx) +{ + return rtx >> mp->m_blkbit_log; +} + +/* Convert an rt extent number to a word offset within an rt bitmap block. 
*/ +static inline unsigned int +xfs_rtx_to_rbmword( + struct xfs_mount *mp, + xfs_rtxnum_t rtx) +{ + return (rtx >> XFS_NBWORDLOG) & XFS_BLOCKWMASK(mp); +} + +/* Convert a file block offset in the rt bitmap file to an rt extent number. */ +static inline xfs_rtxnum_t +xfs_rbmblock_to_rtx( + struct xfs_mount *mp, + xfs_fileoff_t rbmoff) +{ + return rbmoff << mp->m_blkbit_log; +} + /* * Functions for walking free space rtextents in the realtime bitmap. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
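As a worked example of the three conversions above, the following standalone sketch reproduces the arithmetic with stand-in types. struct demo_mount and the demo_* names are assumptions for illustration; demo_rtx_to_rbmbit is not a helper in this patch, it just wraps the open-coded "start & (XFS_NBWORD - 1)" expression from the callers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NBWORDLOG	5			/* log2(bits per 32-bit word) */
#define DEMO_NBWORD	(1u << DEMO_NBWORDLOG)

/* Hypothetical subset of struct xfs_mount. */
struct demo_mount {
	unsigned int	blkbit_log;	/* log2(bits per fs block) */
	unsigned int	blockwsize;	/* bitmap words per fs block */
};

/* rt extent number -> file block offset in the rt bitmap file */
static inline uint64_t
demo_rtx_to_rbmblock(const struct demo_mount *mp, uint64_t rtx)
{
	return rtx >> mp->blkbit_log;
}

/* rt extent number -> word offset within that bitmap block */
static inline unsigned int
demo_rtx_to_rbmword(const struct demo_mount *mp, uint64_t rtx)
{
	assert((mp->blockwsize & (mp->blockwsize - 1)) == 0);
	return (unsigned int)((rtx >> DEMO_NBWORDLOG) & (mp->blockwsize - 1));
}

/* rt extent number -> bit within the word (open-coded in the callers) */
static inline unsigned int
demo_rtx_to_rbmbit(uint64_t rtx)
{
	return (unsigned int)(rtx & (DEMO_NBWORD - 1));
}

/* bitmap file block offset -> first rt extent tracked by that block */
static inline uint64_t
demo_rbmblock_to_rtx(const struct demo_mount *mp, uint64_t rbmoff)
{
	return rbmoff << mp->blkbit_log;
}
```

Assuming 4096-byte blocks (blkbit_log = 15, blockwsize = 1024), rt extent 100000 lands in bitmap block 3, word 53, bit 0, and block 3 starts at rt extent 98304.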
* [PATCH 4/9] xfs: convert rt summary macros to helpers 2022-12-30 22:19 ` [PATCHSET v1.0 0/9] libxfs: refactor rtbitmap/summary macros Darrick J. Wong 2022-12-30 22:19 ` [PATCH 3/9] xfs: convert open-coded xfs_rtword_t pointer accesses to helper Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/9] xfs: convert the rtbitmap block and bit macros to static inline functions Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 2/9] xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros Darrick J. Wong ` (5 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert the realtime summary file macros to helper functions so that we can improve type checking. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/check.c | 4 ++- libxfs/xfs_format.h | 9 +------ libxfs/xfs_rtbitmap.c | 10 +++++--- libxfs/xfs_rtbitmap.h | 59 +++++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_types.h | 2 ++ repair/rt.c | 4 ++- 6 files changed, 72 insertions(+), 16 deletions(-) diff --git a/db/check.c b/db/check.c index 185be6352b8..4e1d3c6d366 100644 --- a/db/check.c +++ b/db/check.c @@ -3663,7 +3663,7 @@ process_rtbitmap( len = ((int)bmbno - start_bmbno) * bitsperblock + (bit - start_bit); log = XFS_RTBLOCKLOG(len); - offs = XFS_SUMOFFS(mp, log, start_bmbno); + offs = xfs_rtsumoffs(mp, log, start_bmbno); sumcompute[offs]++; prevbit = 0; } @@ -3676,7 +3676,7 @@ process_rtbitmap( len = ((int)bmbno - start_bmbno) * bitsperblock + (bit - start_bit); log = XFS_RTBLOCKLOG(len); - offs = XFS_SUMOFFS(mp, log, start_bmbno); + offs = xfs_rtsumoffs(mp, log, start_bmbno); sumcompute[offs]++; } } diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index a4278c8fba5..d95497c064f 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1198,15 +1198,8 @@ static inline bool xfs_dinode_has_large_extent_counts( #define 
XFS_BLOCKMASK(mp) ((mp)->m_blockmask) /* - * RT Summary and bit manipulation macros. + * RT bit manipulation macros. */ -#define XFS_SUMOFFS(mp,ls,bb) ((int)((ls) * (mp)->m_sb.sb_rbmblocks + (bb))) -#define XFS_SUMOFFSTOBLOCK(mp,s) \ - (((s) * (uint)sizeof(xfs_suminfo_t)) >> (mp)->m_sb.sb_blocklog) -#define XFS_SUMPTR(mp,bp,so) \ - ((xfs_suminfo_t *)((bp)->b_addr + \ - (((so) * (uint)sizeof(xfs_suminfo_t)) & XFS_BLOCKMASK(mp)))) - #define XFS_RTMIN(a,b) ((a) < (b) ? (a) : (b)) #define XFS_RTMAX(a,b) ((a) > (b) ? (a) : (b)) diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c index 8fec2b769d5..e17af3b9d28 100644 --- a/libxfs/xfs_rtbitmap.c +++ b/libxfs/xfs_rtbitmap.c @@ -460,17 +460,18 @@ xfs_rtmodify_summary_int( struct xfs_buf *bp; /* buffer for the summary block */ int error; /* error value */ xfs_fileoff_t sb; /* summary fsblock */ - int so; /* index into the summary file */ + xfs_rtsumoff_t so; /* index into the summary file */ xfs_suminfo_t *sp; /* pointer to returned data */ + unsigned int infoword; /* * Compute entry number in the summary file. */ - so = XFS_SUMOFFS(mp, log, bbno); + so = xfs_rtsumoffs(mp, log, bbno); /* * Compute the block number in the summary file. */ - sb = XFS_SUMOFFSTOBLOCK(mp, so); + sb = xfs_rtsumoffs_to_block(mp, so); /* * If we have an old buffer, and the block number matches, use that. */ @@ -498,7 +499,8 @@ xfs_rtmodify_summary_int( /* * Point to the summary information, modify/log it, and/or copy it out. 
*/ - sp = XFS_SUMPTR(mp, bp, so); + infoword = xfs_rtsumoffs_to_infoword(mp, so); + sp = xfs_rsumblock_infoptr(bp, infoword); if (delta) { uint first = (uint)((char *)sp - (char *)bp->b_addr); diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index af37afec2b0..f616956b289 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -141,6 +141,65 @@ xfs_rbmblock_wordptr( return xfs_rbmbuf_wordptr(bp->b_addr, rbmword); } +/* + * Convert a rt extent length and rt bitmap block number to a xfs_suminfo_t + * offset within the rt summary file. + */ +static inline xfs_rtsumoff_t +xfs_rtsumoffs( + struct xfs_mount *mp, + int log2_len, + xfs_fileoff_t rbmoff) +{ + return log2_len * mp->m_sb.sb_rbmblocks + rbmoff; +} + +/* + * Convert an xfs_suminfo_t offset to a file block offset within the rt summary + * file. + */ +static inline xfs_fileoff_t +xfs_rtsumoffs_to_block( + struct xfs_mount *mp, + xfs_rtsumoff_t rsumoff) +{ + return XFS_B_TO_FSBT(mp, rsumoff * sizeof(xfs_suminfo_t)); +} + +/* + * Convert an xfs_suminfo_t offset to an info word offset within an rt summary + * block. + */ +static inline unsigned int +xfs_rtsumoffs_to_infoword( + struct xfs_mount *mp, + xfs_rtsumoff_t rsumoff) +{ + unsigned int mask = mp->m_blockmask >> XFS_SUMINFOLOG; + + return rsumoff & mask; +} + +/* Return a pointer to a summary info word within a rt summary block buffer. */ +static inline xfs_suminfo_t * +xfs_rsumbuf_infoptr( + void *buf, + unsigned int infoword) +{ + xfs_suminfo_t *infop = buf; + + return &infop[infoword]; +} + +/* Return a pointer to a summary info word within a rt summary block. */ +static inline xfs_suminfo_t * +xfs_rsumblock_infoptr( + struct xfs_buf *bp, + unsigned int infoword) +{ + return xfs_rsumbuf_infoptr(bp->b_addr, infoword); +} + /* * Functions for walking free space rtextents in the realtime bitmap. 
*/ diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h index abb07a1c7b0..f4615c5be34 100644 --- a/libxfs/xfs_types.h +++ b/libxfs/xfs_types.h @@ -19,6 +19,7 @@ typedef int64_t xfs_fsize_t; /* bytes in a file */ typedef uint64_t xfs_ufsize_t; /* unsigned bytes in a file */ typedef int32_t xfs_suminfo_t; /* type of bitmap summary info */ +typedef uint32_t xfs_rtsumoff_t; /* offset of an rtsummary info word */ typedef uint32_t xfs_rtword_t; /* word type for bitmap manipulations */ typedef int64_t xfs_lsn_t; /* log sequence number */ @@ -151,6 +152,7 @@ typedef uint32_t xfs_dqid_t; */ #define XFS_NBBYLOG 3 /* log2(NBBY) */ #define XFS_WORDLOG 2 /* log2(sizeof(xfs_rtword_t)) */ +#define XFS_SUMINFOLOG 2 /* log2(sizeof(xfs_suminfo_t)) */ #define XFS_NBWORDLOG (XFS_NBBYLOG + XFS_WORDLOG) #define XFS_NBWORD (1 << XFS_NBWORDLOG) #define XFS_WORDMASK ((1 << XFS_WORDLOG) - 1) diff --git a/repair/rt.c b/repair/rt.c index 947382e9ede..8f3b9082a9b 100644 --- a/repair/rt.c +++ b/repair/rt.c @@ -91,7 +91,7 @@ generate_rtinfo(xfs_mount_t *mp, } else if (in_extent == 1) { len = (int) (extno - start_ext); log = XFS_RTBLOCKLOG(len); - offs = XFS_SUMOFFS(mp, log, start_bmbno); + offs = xfs_rtsumoffs(mp, log, start_bmbno); sumcompute[offs]++; in_extent = 0; } @@ -107,7 +107,7 @@ generate_rtinfo(xfs_mount_t *mp, if (in_extent == 1) { len = (int) (extno - start_ext); log = XFS_RTBLOCKLOG(len); - offs = XFS_SUMOFFS(mp, log, start_bmbno); + offs = xfs_rtsumoffs(mp, log, start_bmbno); sumcompute[offs]++; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
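The summary-offset helpers in this patch can be exercised in isolation with the sketch below. It assumes a hypothetical struct demo_mount carrying only the fields the helpers read, and it assumes XFS_B_TO_FSBT is a truncating shift by the block log; the demo_* names are illustrative, not from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SUMINFOLOG	2	/* log2(sizeof(int32_t)), like XFS_SUMINFOLOG */

typedef uint32_t demo_rtsumoff_t;	/* stand-in for xfs_rtsumoff_t */

/* Hypothetical subset of struct xfs_mount / the superblock. */
struct demo_mount {
	unsigned int	blocklog;	/* log2(block size in bytes) */
	unsigned int	blockmask;	/* block size - 1 */
	uint32_t	rbmblocks;	/* number of rt bitmap blocks */
};

/* (log2 extent length, bitmap block) -> info word index in the summary file */
static inline demo_rtsumoff_t
demo_rtsumoffs(const struct demo_mount *mp, int log2_len, uint64_t rbmoff)
{
	assert(log2_len >= 0);
	return (demo_rtsumoff_t)((uint64_t)log2_len * mp->rbmblocks + rbmoff);
}

/* info word index -> file block offset within the summary file */
static inline uint64_t
demo_rtsumoffs_to_block(const struct demo_mount *mp, demo_rtsumoff_t rsumoff)
{
	return ((uint64_t)rsumoff << DEMO_SUMINFOLOG) >> mp->blocklog;
}

/* info word index -> info word offset within that summary block */
static inline unsigned int
demo_rtsumoffs_to_infoword(const struct demo_mount *mp, demo_rtsumoff_t rsumoff)
{
	unsigned int	mask = mp->blockmask >> DEMO_SUMINFOLOG;

	return rsumoff & mask;
}
```

With 4096-byte blocks and 600 bitmap blocks, (log 5, bitmap block 17) maps to info word 3017, which sits in summary block 2 at word 969 (2 * 1024 + 969 = 3017).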
* [PATCH 2/9] xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros 2022-12-30 22:19 ` [PATCHSET v1.0 0/9] libxfs: refactor rtbitmap/summary macros Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 4/9] xfs: convert rt summary macros to helpers Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 5/9] xfs: create helpers for rtbitmap block/wordcount computations Darrick J. Wong ` (4 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Remove these trivial macros since they're not even part of the ondisk format. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_format.h | 2 -- libxfs/xfs_rtbitmap.c | 16 ++++++++-------- libxfs/xfs_rtbitmap.h | 2 +- 3 files changed, 9 insertions(+), 11 deletions(-) diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index 6a3d684900a..a4278c8fba5 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1196,8 +1196,6 @@ static inline bool xfs_dinode_has_large_extent_counts( #define XFS_BLOCKSIZE(mp) ((mp)->m_sb.sb_blocksize) #define XFS_BLOCKMASK(mp) ((mp)->m_blockmask) -#define XFS_BLOCKWSIZE(mp) ((mp)->m_blockwsize) -#define XFS_BLOCKWMASK(mp) ((mp)->m_blockwmask) /* * RT Summary and bit manipulation macros. diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c index 62bd2d0eae3..5301b0448f1 100644 --- a/libxfs/xfs_rtbitmap.c +++ b/libxfs/xfs_rtbitmap.c @@ -179,7 +179,7 @@ xfs_rtfind_back( return error; } bufp = bp->b_addr; - word = XFS_BLOCKWMASK(mp); + word = mp->m_blockwsize - 1; b = &bufp[word]; } else { /* @@ -225,7 +225,7 @@ xfs_rtfind_back( return error; } bufp = bp->b_addr; - word = XFS_BLOCKWMASK(mp); + word = mp->m_blockwsize - 1; b = &bufp[word]; } else { /* @@ -343,7 +343,7 @@ xfs_rtfind_forw( * Go on to next block if that's where the next word is * and we need the next word. 
*/ - if (++word == XFS_BLOCKWSIZE(mp) && i < len) { + if (++word == mp->m_blockwsize && i < len) { /* * If done with this block, get the previous one. */ @@ -388,7 +388,7 @@ xfs_rtfind_forw( * Go on to next block if that's where the next word is * and we need the next word. */ - if (++word == XFS_BLOCKWSIZE(mp) && i < len) { + if (++word == mp->m_blockwsize && i < len) { /* * If done with this block, get the next one. */ @@ -598,7 +598,7 @@ xfs_rtmodify_range( * Go on to the next block if that's where the next word is * and we need the next word. */ - if (++word == XFS_BLOCKWSIZE(mp) && i < len) { + if (++word == mp->m_blockwsize && i < len) { /* * Log the changed part of this block. * Get the next one. @@ -638,7 +638,7 @@ xfs_rtmodify_range( * Go on to the next block if that's where the next word is * and we need the next word. */ - if (++word == XFS_BLOCKWSIZE(mp) && i < len) { + if (++word == mp->m_blockwsize && i < len) { /* * Log the changed part of this block. * Get the next one. @@ -841,7 +841,7 @@ xfs_rtcheck_range( * Go on to next block if that's where the next word is * and we need the next word. */ - if (++word == XFS_BLOCKWSIZE(mp) && i < len) { + if (++word == mp->m_blockwsize && i < len) { /* * If done with this block, get the next one. */ @@ -887,7 +887,7 @@ xfs_rtcheck_range( * Go on to next block if that's where the next word is * and we need the next word. */ - if (++word == XFS_BLOCKWSIZE(mp) && i < len) { + if (++word == mp->m_blockwsize && i < len) { /* * If done with this block, get the next one. */ diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index e53011bc638..5f4a453e29e 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -109,7 +109,7 @@ xfs_rtx_to_rbmword( struct xfs_mount *mp, xfs_rtxnum_t rtx) { - return (rtx >> XFS_NBWORDLOG) & XFS_BLOCKWMASK(mp); + return (rtx >> XFS_NBWORDLOG) & (mp->m_blockwsize - 1); } /* Convert a file block offset in the rt bitmap file to an rt extent number. 
*/ ^ permalink raw reply related [flat|nested] 565+ messages in thread
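A small sketch of what the open-coded replacements rely on: the words-per-block count is a power of two, so "blockwsize - 1" serves as both the last word index and the mask. struct demo_pos and the demo_* functions are hypothetical; the stepping only mirrors the "++word == mp->m_blockwsize" and "word = mp->m_blockwsize - 1" patterns in the diff and leaves out the buffer reads and the "i < len" check.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cursor: which bitmap block, and which word inside it. */
struct demo_pos {
	uint64_t	block;
	unsigned int	word;
};

/* Equivalent of the removed XFS_BLOCKWMASK(mp) for a power-of-two wsize. */
static inline unsigned int
demo_blockwmask(unsigned int blockwsize)
{
	assert(blockwsize != 0 && (blockwsize & (blockwsize - 1)) == 0);
	return blockwsize - 1;
}

/* Step forward one word, rolling into word 0 of the next block. */
static inline void
demo_next_word(unsigned int blockwsize, struct demo_pos *pos)
{
	if (++pos->word == blockwsize) {
		pos->block++;
		pos->word = 0;
	}
}

/* Step back one word, rolling into the last word of the previous block. */
static inline void
demo_prev_word(unsigned int blockwsize, struct demo_pos *pos)
{
	if (pos->word == 0) {
		assert(pos->block > 0);
		pos->block--;
		pos->word = blockwsize - 1;
	} else {
		pos->word--;
	}
}
```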
* [PATCH 5/9] xfs: create helpers for rtbitmap block/wordcount computations 2022-12-30 22:19 ` [PATCHSET v1.0 0/9] libxfs: refactor rtbitmap/summary macros Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 2/9] xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 7/9] xfs: create helpers for rtsummary " Darrick J. Wong ` (3 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create helper functions that compute the number of blocks or words necessary to store the rt bitmap. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 2 ++ libxfs/xfs_rtbitmap.c | 27 +++++++++++++++++++++++++++ libxfs/xfs_rtbitmap.h | 12 ++++++++++++ libxfs/xfs_trans_resv.c | 9 +++++---- repair/rt.c | 10 +++++----- 5 files changed, 51 insertions(+), 9 deletions(-) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 494172b213b..818406b0415 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -227,6 +227,8 @@ #define xfs_rmap_query_all libxfs_rmap_query_all #define xfs_rmap_query_range libxfs_rmap_query_range +#define xfs_rtbitmap_wordcount libxfs_rtbitmap_wordcount + #define xfs_rtfree_extent libxfs_rtfree_extent #define xfs_sb_from_disk libxfs_sb_from_disk #define xfs_sb_quota_from_disk libxfs_sb_quota_from_disk diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c index e17af3b9d28..116afec2a75 100644 --- a/libxfs/xfs_rtbitmap.c +++ b/libxfs/xfs_rtbitmap.c @@ -1140,3 +1140,30 @@ xfs_rtalloc_extent_is_free( *is_free = matches; return 0; } + +/* + * Compute the number of rtbitmap blocks needed to track the given number of rt + * extents. 
+ */ +xfs_filblks_t +xfs_rtbitmap_blockcount( + struct xfs_mount *mp, + xfs_rtbxlen_t rtextents) +{ + return howmany_64(rtextents, NBBY * mp->m_sb.sb_blocksize); +} + +/* + * Compute the number of rtbitmap words needed to populate every block of a + * bitmap that is large enough to track the given number of rt extents. + */ +unsigned long long +xfs_rtbitmap_wordcount( + struct xfs_mount *mp, + xfs_rtbxlen_t rtextents) +{ + xfs_filblks_t blocks; + + blocks = xfs_rtbitmap_blockcount(mp, rtextents); + return XFS_FSB_TO_B(mp, blocks) >> XFS_WORDLOG; +} diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index f616956b289..308ce814a90 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -261,6 +261,11 @@ xfs_rtfree_extent( /* Same as above, but in units of rt blocks. */ int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno, xfs_filblks_t rtlen); + +xfs_filblks_t xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t + rtextents); +unsigned long long xfs_rtbitmap_wordcount(struct xfs_mount *mp, + xfs_rtbxlen_t rtextents); #else /* CONFIG_XFS_RT */ # define xfs_rtfree_extent(t,b,l) (-ENOSYS) # define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) @@ -268,6 +273,13 @@ int xfs_rtfree_blocks(struct xfs_trans *tp, xfs_fsblock_t rtbno, # define xfs_rtalloc_query_all(m,t,f,p) (-ENOSYS) # define xfs_rtbuf_get(m,t,b,i,p) (-ENOSYS) # define xfs_rtalloc_extent_is_free(m,t,s,l,i) (-ENOSYS) +static inline xfs_filblks_t +xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents) +{ + /* shut up gcc */ + return 0; +} +# define xfs_rtbitmap_wordcount(mp, r) (0) #endif /* CONFIG_XFS_RT */ #endif /* __XFS_RTBITMAP_H__ */ diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c index be486ed42c3..36db7d709fe 100644 --- a/libxfs/xfs_trans_resv.c +++ b/libxfs/xfs_trans_resv.c @@ -217,11 +217,12 @@ xfs_rtalloc_block_count( struct xfs_mount *mp, unsigned int num_ops) { - unsigned int blksz = XFS_FSB_TO_B(mp, 1); - unsigned int rtbmp_bytes; + unsigned 
int rtbmp_blocks; + xfs_rtxlen_t rtxlen; - rtbmp_bytes = xfs_extlen_to_rtxlen(mp, XFS_MAX_BMBT_EXTLEN) / NBBY; - return (howmany(rtbmp_bytes, blksz) + 1) * num_ops; + rtxlen = xfs_extlen_to_rtxlen(mp, XFS_MAX_BMBT_EXTLEN); + rtbmp_blocks = xfs_rtbitmap_blockcount(mp, rtxlen); + return (rtbmp_blocks + 1) * num_ops; } /* diff --git a/repair/rt.c b/repair/rt.c index 8f3b9082a9b..244b59f04ce 100644 --- a/repair/rt.c +++ b/repair/rt.c @@ -19,6 +19,8 @@ void rtinit(xfs_mount_t *mp) { + unsigned long long wordcnt; + if (mp->m_sb.sb_rblocks == 0) return; @@ -26,11 +28,9 @@ rtinit(xfs_mount_t *mp) * realtime init -- blockmap initialization is * handled by incore_init() */ - /* - sumfile = calloc(mp->m_rsumsize, 1); - */ - if ((btmcompute = calloc(mp->m_sb.sb_rbmblocks * - mp->m_sb.sb_blocksize, 1)) == NULL) + wordcnt = libxfs_rtbitmap_wordcount(mp, mp->m_sb.sb_rextents); + btmcompute = calloc(wordcnt, sizeof(xfs_rtword_t)); + if (!btmcompute) do_error( _("couldn't allocate memory for incore realtime bitmap.\n")); ^ permalink raw reply related [flat|nested] 565+ messages in thread
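To make the sizing arithmetic concrete, here is a standalone sketch of the two computations. It assumes howmany_64 is a plain round-up division and that XFS_FSB_TO_B is a left shift by the block log; struct demo_mount and the demo_* names are illustrative stand-ins, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NBBY	8	/* bits per byte */
#define DEMO_WORDLOG	2	/* log2(sizeof(uint32_t)), like XFS_WORDLOG */

/* Hypothetical subset of the superblock geometry. */
struct demo_mount {
	unsigned int	blocksize;	/* bytes per fs block */
	unsigned int	blocklog;	/* log2(blocksize) */
};

/* Round-up division, standing in for howmany_64(). */
static inline uint64_t
demo_howmany_64(uint64_t x, uint64_t y)
{
	assert(y != 0);
	return (x + y - 1) / y;
}

/* Bitmap blocks needed to track rtextents, at one bit per rt extent. */
static inline uint64_t
demo_rtbitmap_blockcount(const struct demo_mount *mp, uint64_t rtextents)
{
	return demo_howmany_64(rtextents, (uint64_t)DEMO_NBBY * mp->blocksize);
}

/* Words needed to fill every block of a bitmap of that size. */
static inline unsigned long long
demo_rtbitmap_wordcount(const struct demo_mount *mp, uint64_t rtextents)
{
	uint64_t	blocks = demo_rtbitmap_blockcount(mp, rtextents);

	return (blocks << mp->blocklog) >> DEMO_WORDLOG;
}
```

For 4096-byte blocks each bitmap block covers 32768 rt extents, so 100000 rt extents need 4 blocks and 4096 words. The word count is deliberately rounded up to whole blocks, which is what lets repair size its incore bitmap with calloc(wordcnt, sizeof(xfs_rtword_t)).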
* [PATCH 7/9] xfs: create helpers for rtsummary block/wordcount computations 2022-12-30 22:19 ` [PATCHSET v1.0 0/9] libxfs: refactor rtbitmap/summary macros Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 5/9] xfs: create helpers for rtbitmap block/wordcount computations Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 8/9] xfs: use accessor functions for summary info words Darrick J. Wong ` (2 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create helper functions that compute the number of blocks or words necessary to store the rt summary file. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/check.c | 8 ++++++-- libxfs/init.c | 8 ++++---- libxfs/libxfs_api_defs.h | 2 ++ libxfs/xfs_rtbitmap.c | 29 +++++++++++++++++++++++++++++ libxfs/xfs_rtbitmap.h | 7 +++++++ repair/rt.c | 5 ++++- 6 files changed, 52 insertions(+), 7 deletions(-) diff --git a/db/check.c b/db/check.c index a10eb74ae81..81ba4732790 100644 --- a/db/check.c +++ b/db/check.c @@ -1944,10 +1944,14 @@ init( inodata[c] = xcalloc(inodata_hash_size, sizeof(**inodata)); } if (rt) { + unsigned long long words; + dbmap[c] = xcalloc(mp->m_sb.sb_rblocks, sizeof(**dbmap)); inomap[c] = xcalloc(mp->m_sb.sb_rblocks, sizeof(**inomap)); - sumfile = xcalloc(mp->m_rsumsize, 1); - sumcompute = xcalloc(mp->m_rsumsize, 1); + words = libxfs_rtsummary_wordcount(mp, mp->m_rsumlevels, + mp->m_sb.sb_rbmblocks); + sumfile = xcalloc(words, sizeof(xfs_suminfo_t)); + sumcompute = xcalloc(words, sizeof(xfs_suminfo_t)); } nflag = sflag = tflag = verbose = optind = 0; while ((c = getopt(argc, argv, "b:i:npstv")) != EOF) { diff --git a/libxfs/init.c b/libxfs/init.c index 787f7c108db..a440943cbdb 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -440,6 +440,7 @@ rtmount_init( { struct xfs_buf *bp; /* buffer for last 
block of subvolume */ xfs_daddr_t d; /* address of last block of subvolume */ + unsigned int rsumblocks; int error; if (mp->m_sb.sb_rblocks == 0) @@ -465,10 +466,9 @@ rtmount_init( return -1; } mp->m_rsumlevels = mp->m_sb.sb_rextslog + 1; - mp->m_rsumsize = - (uint)sizeof(xfs_suminfo_t) * mp->m_rsumlevels * - mp->m_sb.sb_rbmblocks; - mp->m_rsumsize = roundup(mp->m_rsumsize, mp->m_sb.sb_blocksize); + rsumblocks = xfs_rtsummary_blockcount(mp, mp->m_rsumlevels, + mp->m_sb.sb_rbmblocks); + mp->m_rsumsize = XFS_FSB_TO_B(mp, rsumblocks); mp->m_rbmip = mp->m_rsumip = NULL; /* diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 23b4365cc6e..38162b2fb2c 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -231,6 +231,8 @@ #define xfs_rtbitmap_setword libxfs_rtbitmap_setword #define xfs_rtbitmap_wordcount libxfs_rtbitmap_wordcount +#define xfs_rtsummary_wordcount libxfs_rtsummary_wordcount + #define xfs_rtfree_extent libxfs_rtfree_extent #define xfs_sb_from_disk libxfs_sb_from_disk #define xfs_sb_quota_from_disk libxfs_sb_quota_from_disk diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c index 9cfefccb36c..2c84a5a0b14 100644 --- a/libxfs/xfs_rtbitmap.c +++ b/libxfs/xfs_rtbitmap.c @@ -1203,3 +1203,32 @@ xfs_rtbitmap_wordcount( blocks = xfs_rtbitmap_blockcount(mp, rtextents); return XFS_FSB_TO_B(mp, blocks) >> XFS_WORDLOG; } + +/* Compute the number of rtsummary blocks needed to track the given rt space. */ +xfs_filblks_t +xfs_rtsummary_blockcount( + struct xfs_mount *mp, + unsigned int rsumlevels, + xfs_extlen_t rbmblocks) +{ + unsigned long long rsumwords; + + rsumwords = (unsigned long long)rsumlevels * rbmblocks; + return XFS_B_TO_FSB(mp, rsumwords << XFS_WORDLOG); +} + +/* + * Compute the number of rtsummary info words needed to populate every block of + * a summary file that is large enough to track the given rt space. 
+ */ +unsigned long long +xfs_rtsummary_wordcount( + struct xfs_mount *mp, + unsigned int rsumlevels, + xfs_extlen_t rbmblocks) +{ + xfs_filblks_t blocks; + + blocks = xfs_rtsummary_blockcount(mp, rsumlevels, rbmblocks); + return XFS_FSB_TO_B(mp, blocks) >> XFS_WORDLOG; +} diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index 0a3c6299af8..a66357cf002 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -270,6 +270,11 @@ xfs_rtword_t xfs_rtbitmap_getword(struct xfs_mount *mp, union xfs_rtword_ondisk *wordptr); void xfs_rtbitmap_setword(struct xfs_mount *mp, union xfs_rtword_ondisk *wordptr, xfs_rtword_t incore); + +xfs_filblks_t xfs_rtsummary_blockcount(struct xfs_mount *mp, + unsigned int rsumlevels, xfs_extlen_t rbmblocks); +unsigned long long xfs_rtsummary_wordcount(struct xfs_mount *mp, + unsigned int rsumlevels, xfs_extlen_t rbmblocks); #else /* CONFIG_XFS_RT */ # define xfs_rtfree_extent(t,b,l) (-ENOSYS) # define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) @@ -284,6 +289,8 @@ xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents) return 0; } # define xfs_rtbitmap_wordcount(mp, r) (0) +# define xfs_rtsummary_blockcount(mp, l, b) (0) +# define xfs_rtsummary_wordcount(mp, l, b) (0) #endif /* CONFIG_XFS_RT */ #endif /* __XFS_RTBITMAP_H__ */ diff --git a/repair/rt.c b/repair/rt.c index c6df8819cc7..ded9e02367d 100644 --- a/repair/rt.c +++ b/repair/rt.c @@ -34,7 +34,10 @@ rtinit(xfs_mount_t *mp) do_error( _("couldn't allocate memory for incore realtime bitmap.\n")); - if ((sumcompute = calloc(mp->m_rsumsize, 1)) == NULL) + wordcnt = libxfs_rtsummary_wordcount(mp, mp->m_rsumlevels, + mp->m_sb.sb_rbmblocks); + sumcompute = calloc(wordcnt, sizeof(xfs_suminfo_t)); + if (!sumcompute) do_error( _("couldn't allocate memory for incore realtime summary info.\n")); } ^ permalink raw reply related [flat|nested] 565+ messages in thread
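The summary-file sizing can be checked the same way. This sketch assumes XFS_B_TO_FSB rounds a byte count up to whole blocks and XFS_FSB_TO_B shifts left by the block log; struct demo_mount and the demo_* names are hypothetical stand-ins for illustration only.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_WORDLOG	2	/* log2(bytes per summary info word) */

/* Hypothetical subset of the mount geometry. */
struct demo_mount {
	unsigned int	blocklog;	/* log2(block size in bytes) */
	unsigned int	blockmask;	/* block size - 1 */
};

/* Summary blocks needed: one info word per (level, bitmap block) pair. */
static inline uint64_t
demo_rtsummary_blockcount(const struct demo_mount *mp,
			  unsigned int rsumlevels, uint32_t rbmblocks)
{
	unsigned long long	rsumwords;

	rsumwords = (unsigned long long)rsumlevels * rbmblocks;
	/* round the byte count up to whole blocks */
	return ((rsumwords << DEMO_WORDLOG) + mp->blockmask) >> mp->blocklog;
}

/* Info words needed to fill every block of a summary file of that size. */
static inline unsigned long long
demo_rtsummary_wordcount(const struct demo_mount *mp,
			 unsigned int rsumlevels, uint32_t rbmblocks)
{
	uint64_t	blocks;

	blocks = demo_rtsummary_blockcount(mp, rsumlevels, rbmblocks);
	return (blocks << mp->blocklog) >> DEMO_WORDLOG;
}
```

With 4096-byte blocks, 17 levels and 600 bitmap blocks give 10200 info words (40800 bytes), which rounds up to 10 blocks, or 40960 bytes. That matches what the removed roundup() computation of m_rsumsize in rtmount_init would have produced.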
* [PATCH 8/9] xfs: use accessor functions for summary info words 2022-12-30 22:19 ` [PATCHSET v1.0 0/9] libxfs: refactor rtbitmap/summary macros Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 7/9] xfs: create helpers for rtsummary " Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 9/9] misc: use m_blockwsize instead of sb_blocksize for rt blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 6/9] xfs: use accessor functions for bitmap words Darrick J. Wong 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create get and set functions for rtsummary words so that we can redefine the ondisk format with a specific endianness. Note that this requires the definition of a distinct type for ondisk summary info words so that the compiler can perform proper typechecking. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/check.c | 35 +++++++++++++++++++++-------------- libxfs/libxfs_api_defs.h | 2 ++ libxfs/xfs_format.h | 8 ++++++++ libxfs/xfs_rtbitmap.c | 27 ++++++++++++++++++++++----- libxfs/xfs_rtbitmap.h | 10 +++++++--- repair/globals.c | 2 +- repair/globals.h | 2 +- repair/phase6.c | 6 +++--- repair/rt.c | 8 ++++---- repair/rt.h | 2 +- 10 files changed, 70 insertions(+), 32 deletions(-) diff --git a/db/check.c b/db/check.c index 81ba4732790..2dcab8e87e6 100644 --- a/db/check.c +++ b/db/check.c @@ -132,8 +132,8 @@ static unsigned sbversion; static int sbver_err; static int serious_error; static int sflag; -static xfs_suminfo_t *sumcompute; -static xfs_suminfo_t *sumfile; +static union xfs_suminfo_ondisk *sumcompute; +static union xfs_suminfo_ondisk *sumfile; static const char *typename[] = { "unknown", "agf", @@ -1708,8 +1708,8 @@ static void check_summary(void) { xfs_rfsblock_t bno; - xfs_suminfo_t *csp; - xfs_suminfo_t *fsp; + union xfs_suminfo_ondisk *csp; + union 
xfs_suminfo_ondisk *fsp; int log; csp = sumcompute; @@ -1718,12 +1718,14 @@ check_summary(void) for (bno = 0; bno < mp->m_sb.sb_rbmblocks; bno++, csp++, fsp++) { - if (*csp != *fsp) { + if (csp->raw != fsp->raw) { if (!sflag) dbprintf(_("rt summary mismatch, size %d " "block %llu, file: %d, " "computed: %d\n"), - log, bno, *fsp, *csp); + log, bno, + libxfs_suminfo_get(mp, fsp), + libxfs_suminfo_get(mp, csp)); error++; } } @@ -1950,8 +1952,8 @@ init( inomap[c] = xcalloc(mp->m_sb.sb_rblocks, sizeof(**inomap)); words = libxfs_rtsummary_wordcount(mp, mp->m_rsumlevels, mp->m_sb.sb_rbmblocks); - sumfile = xcalloc(words, sizeof(xfs_suminfo_t)); - sumcompute = xcalloc(words, sizeof(xfs_suminfo_t)); + sumfile = xcalloc(words, sizeof(union xfs_suminfo_ondisk)); + sumcompute = xcalloc(words, sizeof(union xfs_suminfo_ondisk)); } nflag = sflag = tflag = verbose = optind = 0; while ((c = getopt(argc, argv, "b:i:npstv")) != EOF) { @@ -3681,7 +3683,7 @@ process_rtbitmap( bitsperblock + (bit - start_bit); log = XFS_RTBLOCKLOG(len); offs = xfs_rtsumoffs(mp, log, start_bmbno); - sumcompute[offs]++; + libxfs_suminfo_add(mp, &sumcompute[offs], 1); prevbit = 0; } } @@ -3694,7 +3696,7 @@ process_rtbitmap( (bit - start_bit); log = XFS_RTBLOCKLOG(len); offs = xfs_rtsumoffs(mp, log, start_bmbno); - sumcompute[offs]++; + libxfs_suminfo_add(mp, &sumcompute[offs], 1); } free(words); } @@ -3704,12 +3706,14 @@ process_rtsummary( blkmap_t *blkmap) { xfs_fsblock_t bno; - char *bytes; + union xfs_suminfo_ondisk *sfile = sumfile; xfs_fileoff_t sumbno; int t; sumbno = NULLFILEOFF; while ((sumbno = blkmap_next_off(blkmap, sumbno, &t)) != NULLFILEOFF) { + union xfs_suminfo_ondisk *ondisk; + bno = blkmap_get(blkmap, sumbno); if (bno == NULLFSBLOCK) { if (!sflag) @@ -3722,18 +3726,21 @@ process_rtsummary( push_cur(); set_cur(&typtab[TYP_RTSUMMARY], XFS_FSB_TO_DADDR(mp, bno), blkbb, DB_RING_IGN, NULL); - if ((bytes = iocur_top->data) == NULL) { + if (!iocur_top->bp) { if (!sflag) dbprintf(_("can't read 
block %lld for rtsummary " "inode\n"), (xfs_fileoff_t)sumbno); error++; pop_cur(); + sfile += mp->m_blockwsize; continue; } - memcpy((char *)sumfile + sumbno * mp->m_sb.sb_blocksize, bytes, - mp->m_sb.sb_blocksize); + + ondisk = xfs_rsumblock_infoptr(iocur_top->bp, 0); + memcpy(sfile, ondisk, mp->m_sb.sb_blocksize); pop_cur(); + sfile += mp->m_blockwsize; } } diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 38162b2fb2c..ca9144dd949 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -231,6 +231,8 @@ #define xfs_rtbitmap_setword libxfs_rtbitmap_setword #define xfs_rtbitmap_wordcount libxfs_rtbitmap_wordcount +#define xfs_suminfo_add libxfs_suminfo_add +#define xfs_suminfo_get libxfs_suminfo_get #define xfs_rtsummary_wordcount libxfs_rtsummary_wordcount #define xfs_rtfree_extent libxfs_rtfree_extent diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index 14da972f550..946870eb492 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -725,6 +725,14 @@ union xfs_rtword_ondisk { __u32 raw; }; +/* + * Realtime summary counts are accessed by the word, which is currently + * stored in host-endian format. + */ +union xfs_suminfo_ondisk { + __u32 raw; +}; + /* * XFS Timestamps * ============== diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c index 2c84a5a0b14..dca41254ee0 100644 --- a/libxfs/xfs_rtbitmap.c +++ b/libxfs/xfs_rtbitmap.c @@ -464,6 +464,23 @@ xfs_rtfind_forw( return 0; } +inline xfs_suminfo_t +xfs_suminfo_get( + struct xfs_mount *mp, + union xfs_suminfo_ondisk *infoptr) +{ + return infoptr->raw; +} + +inline void +xfs_suminfo_add( + struct xfs_mount *mp, + union xfs_suminfo_ondisk *infoptr, + int delta) +{ + infoptr->raw += delta; +} + /* * Read and/or modify the summary information for a given extent size, * bitmap block combination. 
@@ -488,7 +505,7 @@ xfs_rtmodify_summary_int( int error; /* error value */ xfs_fileoff_t sb; /* summary fsblock */ xfs_rtsumoff_t so; /* index into the summary file */ - xfs_suminfo_t *sp; /* pointer to returned data */ + union xfs_suminfo_ondisk *sp; /* pointer to returned data */ unsigned int infoword; /* @@ -531,17 +548,17 @@ xfs_rtmodify_summary_int( if (delta) { uint first = (uint)((char *)sp - (char *)bp->b_addr); - *sp += delta; + xfs_suminfo_add(mp, sp, delta); if (mp->m_rsum_cache) { - if (*sp == 0 && log == mp->m_rsum_cache[bbno]) + if (sp->raw == 0 && log == mp->m_rsum_cache[bbno]) mp->m_rsum_cache[bbno]++; - if (*sp != 0 && log < mp->m_rsum_cache[bbno]) + if (sp->raw != 0 && log < mp->m_rsum_cache[bbno]) mp->m_rsum_cache[bbno] = log; } xfs_trans_log_buf(tp, bp, first, first + sizeof(*sp) - 1); } if (sum) - *sum = *sp; + *sum = xfs_suminfo_get(mp, sp); return 0; } diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index a66357cf002..749c8e3ec4c 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -181,18 +181,18 @@ xfs_rtsumoffs_to_infoword( } /* Return a pointer to a summary info word within a rt summary block buffer. */ -static inline xfs_suminfo_t * +static inline union xfs_suminfo_ondisk * xfs_rsumbuf_infoptr( void *buf, unsigned int infoword) { - xfs_suminfo_t *infop = buf; + union xfs_suminfo_ondisk *infop = buf; return &infop[infoword]; } /* Return a pointer to a summary info word within a rt summary block. 
*/ -static inline xfs_suminfo_t * +static inline union xfs_suminfo_ondisk * xfs_rsumblock_infoptr( struct xfs_buf *bp, unsigned int infoword) @@ -275,6 +275,10 @@ xfs_filblks_t xfs_rtsummary_blockcount(struct xfs_mount *mp, unsigned int rsumlevels, xfs_extlen_t rbmblocks); unsigned long long xfs_rtsummary_wordcount(struct xfs_mount *mp, unsigned int rsumlevels, xfs_extlen_t rbmblocks); +xfs_suminfo_t xfs_suminfo_get(struct xfs_mount *mp, + union xfs_suminfo_ondisk *infoptr); +void xfs_suminfo_add(struct xfs_mount *mp, union xfs_suminfo_ondisk *infoptr, + int delta); #else /* CONFIG_XFS_RT */ # define xfs_rtfree_extent(t,b,l) (-ENOSYS) # define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) diff --git a/repair/globals.c b/repair/globals.c index 694a4c39cbd..a7b903e8ff6 100644 --- a/repair/globals.c +++ b/repair/globals.c @@ -94,7 +94,7 @@ int64_t fs_max_file_offset; /* realtime info */ union xfs_rtword_ondisk *btmcompute; -xfs_suminfo_t *sumcompute; +union xfs_suminfo_ondisk *sumcompute; /* inode tree records have full or partial backptr fields ? */ diff --git a/repair/globals.h b/repair/globals.h index 51d94ce8224..27895dd39c5 100644 --- a/repair/globals.h +++ b/repair/globals.h @@ -135,7 +135,7 @@ extern int64_t fs_max_file_offset; /* realtime info */ extern union xfs_rtword_ondisk *btmcompute; -extern xfs_suminfo_t *sumcompute; +extern union xfs_suminfo_ondisk *sumcompute; /* inode tree records have full or partial backptr fields ? 
*/ diff --git a/repair/phase6.c b/repair/phase6.c index 27c47032fcb..3be1da033c5 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -850,7 +850,7 @@ fill_rsumino(xfs_mount_t *mp) struct xfs_buf *bp; xfs_trans_t *tp; xfs_inode_t *ip; - xfs_suminfo_t *smp; + union xfs_suminfo_ondisk *smp; int nmap; int error; xfs_fileoff_t bno; @@ -899,11 +899,11 @@ _("can't access block %" PRIu64 " (fsbno %" PRIu64 ") of realtime summary inode return(1); } - memmove(bp->b_addr, smp, mp->m_sb.sb_blocksize); + memcpy(xfs_rsumblock_infoptr(bp, 0), smp, mp->m_sb.sb_blocksize); libxfs_trans_log_buf(tp, bp, 0, mp->m_sb.sb_blocksize - 1); - smp = (xfs_suminfo_t *)((intptr_t)smp + mp->m_sb.sb_blocksize); + smp += mp->m_blockwsize; bno++; } diff --git a/repair/rt.c b/repair/rt.c index ded9e02367d..9333bce8fbb 100644 --- a/repair/rt.c +++ b/repair/rt.c @@ -36,7 +36,7 @@ rtinit(xfs_mount_t *mp) wordcnt = libxfs_rtsummary_wordcount(mp, mp->m_rsumlevels, mp->m_sb.sb_rbmblocks); - sumcompute = calloc(wordcnt, sizeof(xfs_suminfo_t)); + sumcompute = calloc(wordcnt, sizeof(union xfs_suminfo_ondisk)); if (!sumcompute) do_error( _("couldn't allocate memory for incore realtime summary info.\n")); @@ -50,7 +50,7 @@ int generate_rtinfo( struct xfs_mount *mp, union xfs_rtword_ondisk *words, - xfs_suminfo_t *sumcompute) + union xfs_suminfo_ondisk *sumcompute) { xfs_rtxnum_t extno; xfs_rtxnum_t start_ext; @@ -96,7 +96,7 @@ generate_rtinfo( len = (int) (extno - start_ext); log = XFS_RTBLOCKLOG(len); offs = xfs_rtsumoffs(mp, log, start_bmbno); - sumcompute[offs]++; + libxfs_suminfo_add(mp, &sumcompute[offs], 1); in_extent = 0; } @@ -112,7 +112,7 @@ generate_rtinfo( len = (int) (extno - start_ext); log = XFS_RTBLOCKLOG(len); offs = xfs_rtsumoffs(mp, log, start_bmbno); - sumcompute[offs]++; + libxfs_suminfo_add(mp, &sumcompute[offs], 1); } if (mp->m_sb.sb_frextents != sb_frextents) { diff --git a/repair/rt.h b/repair/rt.h index 4f944487d63..16b39c21a67 100644 --- a/repair/rt.h +++ b/repair/rt.h @@ -12,7 
+12,7 @@ void rtinit(xfs_mount_t *mp); int generate_rtinfo(struct xfs_mount *mp, union xfs_rtword_ondisk *words, - xfs_suminfo_t *sumcompute); + union xfs_suminfo_ondisk *sumcompute); void check_rtbitmap(struct xfs_mount *mp); void check_rtsummary(struct xfs_mount *mp); ^ permalink raw reply related [flat|nested] 565+ messages in thread
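A note on where the summary accessors in patch 8/9 are headed: the commit message says the point of `xfs_suminfo_get`/`xfs_suminfo_add` is to let the ondisk format be redefined with a specific endianness later. The sketch below is one way that could look, not the series' actual follow-up. The `big_endian` flag, the `suminfo_*` names, and the choice of big-endian as the target order are all assumptions made for illustration; the patch itself still reads and writes `raw` in host order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for union xfs_suminfo_ondisk from the patch. */
union suminfo_ondisk {
	uint32_t raw;	/* counter bytes exactly as they sit on disk */
};

/* Portable 32-bit byte swap, written out to avoid non-standard headers. */
static uint32_t swab32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Detect host byte order at run time. */
static bool host_is_le(void)
{
	const uint16_t probe = 1;

	return *(const uint8_t *)&probe == 1;
}

/*
 * Convert between host order and the ondisk order.  A byte swap is its
 * own inverse, so one helper serves both directions.  With big_endian
 * false this degenerates to today's host-endian behaviour.
 */
static uint32_t suminfo_convert(bool big_endian, uint32_t v)
{
	return (big_endian && host_is_le()) ? swab32(v) : v;
}

/* Read one summary counter in host order. */
uint32_t suminfo_get(bool big_endian, const union suminfo_ondisk *infoptr)
{
	assert(infoptr != NULL);
	return suminfo_convert(big_endian, infoptr->raw);
}

/* Add a (possibly negative) delta to one summary counter. */
void suminfo_add(bool big_endian, union suminfo_ondisk *infoptr, int delta)
{
	uint32_t v = suminfo_get(big_endian, infoptr) + (uint32_t)delta;

	infoptr->raw = suminfo_convert(big_endian, v);
}
```

Because callers never touch `raw` directly, switching the byte order would be confined to these two helpers, which is the property the distinct union type is meant to enforce.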
* [PATCH 9/9] misc: use m_blockwsize instead of sb_blocksize for rt blocks 2022-12-30 22:19 ` [PATCHSET v1.0 0/9] libxfs: refactor rtbitmap/summary macros Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 8/9] xfs: use accessor functions for summary info words Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 6/9] xfs: use accessor functions for bitmap words Darrick J. Wong 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> In preparation to add block headers to rt bitmap and summary blocks, convert all the relevant calculations in the userspace tools to use the per-block word count instead of the raw blocksize. This is key to adding this support outside of libxfs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/check.c | 4 ++-- repair/phase6.c | 6 ++++-- repair/rt.c | 9 +++++---- 3 files changed, 11 insertions(+), 8 deletions(-) diff --git a/db/check.c b/db/check.c index 2dcab8e87e6..f39d732d04d 100644 --- a/db/check.c +++ b/db/check.c @@ -3624,7 +3624,7 @@ process_rtbitmap( int t; xfs_rtword_t *words; - bitsperblock = mp->m_sb.sb_blocksize * NBBY; + bitsperblock = mp->m_blockwsize << XFS_NBWORDLOG; words = malloc(mp->m_blockwsize << XFS_WORDLOG); if (!words) { dbprintf(_("could not allocate rtwords buffer\n")); @@ -3738,7 +3738,7 @@ process_rtsummary( } ondisk = xfs_rsumblock_infoptr(iocur_top->bp, 0); - memcpy(sfile, ondisk, mp->m_sb.sb_blocksize); + memcpy(sfile, ondisk, mp->m_blockwsize << XFS_WORDLOG); pop_cur(); sfile += mp->m_blockwsize; } diff --git a/repair/phase6.c b/repair/phase6.c index 3be1da033c5..31d42b9306b 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -828,7 +828,8 @@ _("can't access block %" PRIu64 " (fsbno %" PRIu64 ") of realtime bitmap inode % return(1); } - memcpy(xfs_rbmblock_wordptr(bp, 0), bmp, mp->m_sb.sb_blocksize); + 
memcpy(xfs_rbmblock_wordptr(bp, 0), bmp, + mp->m_blockwsize << XFS_WORDLOG); libxfs_trans_log_buf(tp, bp, 0, mp->m_sb.sb_blocksize - 1); @@ -899,7 +900,8 @@ _("can't access block %" PRIu64 " (fsbno %" PRIu64 ") of realtime summary inode return(1); } - memcpy(xfs_rsumblock_infoptr(bp, 0), smp, mp->m_sb.sb_blocksize); + memcpy(xfs_rsumblock_infoptr(bp, 0), smp, + mp->m_blockwsize << XFS_WORDLOG); libxfs_trans_log_buf(tp, bp, 0, mp->m_sb.sb_blocksize - 1); diff --git a/repair/rt.c b/repair/rt.c index 9333bce8fbb..56a04c3de6e 100644 --- a/repair/rt.c +++ b/repair/rt.c @@ -25,8 +25,9 @@ rtinit(xfs_mount_t *mp) return; /* - * realtime init -- blockmap initialization is - * handled by incore_init() + * Allocate buffers for formatting the collected rt free space + * information. The rtbitmap buffer must be large enough to compare + * against any unused bytes in the last block of the file. */ wordcnt = libxfs_rtbitmap_wordcount(mp, mp->m_sb.sb_rextents); btmcompute = calloc(wordcnt, sizeof(union xfs_rtword_ondisk)); @@ -67,7 +68,7 @@ generate_rtinfo( ASSERT(mp->m_rbmip == NULL); - bitsperblock = mp->m_sb.sb_blocksize * NBBY; + bitsperblock = mp->m_blockwsize << XFS_NBWORDLOG; extno = start_ext = 0; bmbno = in_extent = start_bmbno = 0; @@ -179,7 +180,7 @@ check_rtfile_contents( break; } - if (memcmp(bp->b_addr, buf, mp->m_sb.sb_blocksize)) + if (memcmp(bp->b_addr, buf, mp->m_blockwsize << XFS_WORDLOG)) do_warn(_("discrepancy in %s at dblock 0x%llx\n"), filename, (unsigned long long)bno); ^ permalink raw reply related [flat|nested] 565+ messages in thread
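To make the arithmetic in patch 9/9 concrete: once rt bitmap and summary blocks carry a header, the usable payload is smaller than `sb_blocksize`, so byte and bit counts have to be derived from the per-block word count instead. The sketch below shows that derivation under stated assumptions. The `hdr_bytes` field and the `rt_*` names are hypothetical, and how `m_blockwsize` is really computed is not shown in this patch; the shift constants mirror `XFS_WORDLOG` (2) and `XFS_NBWORDLOG` (5).

```c
#include <assert.h>
#include <stdint.h>

#define WORDLOG		2			/* log2(bytes per 32-bit word) */
#define NBBYLOG		3			/* log2(bits per byte) */
#define NBWORDLOG	(NBBYLOG + WORDLOG)	/* log2(bits per word) */

/* Hypothetical geometry: block size plus an optional per-block header. */
struct rt_layout {
	uint32_t blocksize;	/* filesystem block size in bytes */
	uint32_t hdr_bytes;	/* assumed header size, 0 for today's format */
};

/* Words of payload per rt bitmap/summary block (cf. m_blockwsize). */
uint32_t rt_blockwsize(const struct rt_layout *l)
{
	assert(l->hdr_bytes <= l->blocksize);
	return (l->blocksize - l->hdr_bytes) >> WORDLOG;
}

/* Bytes to copy or compare per block: blockwsize << XFS_WORDLOG. */
uint32_t rt_payload_bytes(const struct rt_layout *l)
{
	return rt_blockwsize(l) << WORDLOG;
}

/* Bitmap bits tracked per block: blockwsize << XFS_NBWORDLOG. */
uint64_t rt_bits_per_block(const struct rt_layout *l)
{
	return (uint64_t)rt_blockwsize(l) << NBWORDLOG;
}
```

With no header the results match the old `sb_blocksize` and `sb_blocksize * NBBY` expressions exactly, which is why the conversion is safe to make ahead of the format change.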
* [PATCH 6/9] xfs: use accessor functions for bitmap words 2022-12-30 22:19 ` [PATCHSET v1.0 0/9] libxfs: refactor rtbitmap/summary macros Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 9/9] misc: use m_blockwsize instead of sb_blocksize for rt blocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create get and set functions for rtbitmap words so that we can redefine the ondisk format with a specific endianness. Note that this requires the definition of a distinct type for ondisk rtbitmap words so that the compiler can perform proper typechecking as we go back and forth. In the upcoming rtgroups feature, we're going to fix the problem that rtwords are written in host endian order, which means we'll need distinct incore (xfs_rtword_t) and ondisk (union xfs_rtword_ondisk) types. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/check.c | 16 +++++++++ libxfs/libxfs_api_defs.h | 2 + libxfs/xfs_format.h | 8 +++++ libxfs/xfs_rtbitmap.c | 78 ++++++++++++++++++++++++++++++++++------------ libxfs/xfs_rtbitmap.h | 10 ++++-- repair/globals.c | 2 + repair/globals.h | 2 + repair/phase6.c | 2 + repair/rt.c | 13 ++++---- repair/rt.h | 6 +--- 10 files changed, 101 insertions(+), 38 deletions(-) diff --git a/db/check.c b/db/check.c index 4e1d3c6d366..a10eb74ae81 100644 --- a/db/check.c +++ b/db/check.c @@ -3619,10 +3619,20 @@ process_rtbitmap( xfs_rtword_t *words; bitsperblock = mp->m_sb.sb_blocksize * NBBY; + words = malloc(mp->m_blockwsize << XFS_WORDLOG); + if (!words) { + dbprintf(_("could not allocate rtwords buffer\n")); + error++; + return; + } bit = extno = prevbit = start_bmbno = start_bit = 0; bmbno = NULLFILEOFF; while ((bmbno = blkmap_next_off(blkmap, bmbno, &t)) != NULLFILEOFF) { + xfs_rtword_t *incore = words; + union xfs_rtword_ondisk *ondisk; + unsigned int i; + bno =
blkmap_get(blkmap, bmbno); if (bno == NULLFSBLOCK) { if (!sflag) @@ -3645,7 +3655,10 @@ process_rtbitmap( continue; } - words = xfs_rbmblock_wordptr(iocur_top->bp, 0); + ondisk = xfs_rbmblock_wordptr(iocur_top->bp, 0); + for (i = 0; i < mp->m_blockwsize; i++, incore++, ondisk++) + *incore = libxfs_rtbitmap_getword(mp, ondisk); + for (bit = 0; bit < bitsperblock && extno < mp->m_sb.sb_rextents; bit++, extno++) { @@ -3679,6 +3692,7 @@ process_rtbitmap( offs = xfs_rtsumoffs(mp, log, start_bmbno); sumcompute[offs]++; } + free(words); } static void diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 818406b0415..23b4365cc6e 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -227,6 +227,8 @@ #define xfs_rmap_query_all libxfs_rmap_query_all #define xfs_rmap_query_range libxfs_rmap_query_range +#define xfs_rtbitmap_getword libxfs_rtbitmap_getword +#define xfs_rtbitmap_setword libxfs_rtbitmap_setword #define xfs_rtbitmap_wordcount libxfs_rtbitmap_wordcount #define xfs_rtfree_extent libxfs_rtfree_extent diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index d95497c064f..14da972f550 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -717,6 +717,14 @@ struct xfs_agfl { ASSERT(xfs_daddr_to_agno(mp, d) == \ xfs_daddr_to_agno(mp, (d) + (len) - 1))) +/* + * Realtime bitmap information is accessed by the word, which is currently + * stored in host-endian format. + */ +union xfs_rtword_ondisk { + __u32 raw; +}; + /* * XFS Timestamps * ============== diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c index 116afec2a75..9cfefccb36c 100644 --- a/libxfs/xfs_rtbitmap.c +++ b/libxfs/xfs_rtbitmap.c @@ -92,6 +92,25 @@ xfs_rtbuf_get( return 0; } +/* Convert an ondisk bitmap word to its incore representation. */ +inline xfs_rtword_t +xfs_rtbitmap_getword( + struct xfs_mount *mp, + union xfs_rtword_ondisk *wordptr) +{ + return wordptr->raw; +} + +/* Set an ondisk bitmap word from an incore representation. 
*/ +inline void +xfs_rtbitmap_setword( + struct xfs_mount *mp, + union xfs_rtword_ondisk *wordptr, + xfs_rtword_t incore) +{ + wordptr->raw = incore; +} + /* * Searching backward from start to limit, find the first block whose * allocated/free state is different from start's. @@ -104,7 +123,7 @@ xfs_rtfind_back( xfs_rtxnum_t limit, /* last rtext to look at */ xfs_rtxnum_t *rtx) /* out: start rtext found */ { - xfs_rtword_t *b; /* current word in buffer */ + union xfs_rtword_ondisk *b; /* current word in buffer */ int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ @@ -115,6 +134,7 @@ xfs_rtfind_back( xfs_rtword_t mask; /* mask of relevant bits for value */ xfs_rtword_t want; /* mask for "good" values */ xfs_rtword_t wdiff; /* difference from wanted value */ + xfs_rtword_t incore; int word; /* word number in the buffer */ /* @@ -137,7 +157,8 @@ xfs_rtfind_back( * Compute match value, based on the bit at start: if 1 (free) * then all-ones, else all-zeroes. */ - want = (*b & ((xfs_rtword_t)1 << bit)) ? -1 : 0; + incore = xfs_rtbitmap_getword(mp, b); + want = (incore & ((xfs_rtword_t)1 << bit)) ? -1 : 0; /* * If the starting position is not word-aligned, deal with the * partial word. @@ -154,7 +175,7 @@ xfs_rtfind_back( * Calculate the difference between the value there * and what we're looking for. */ - if ((wdiff = (*b ^ want) & mask)) { + if ((wdiff = (incore ^ want) & mask)) { /* * Different. Mark where we are and return. */ @@ -200,7 +221,8 @@ xfs_rtfind_back( /* * Compute difference between actual and desired value. */ - if ((wdiff = *b ^ want)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = incore ^ want)) { /* * Different, mark where we are and return. */ @@ -247,7 +269,8 @@ xfs_rtfind_back( /* * Compute difference between actual and desired value. 
*/ - if ((wdiff = (*b ^ want) & mask)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = (incore ^ want) & mask)) { /* * Different, mark where we are and return. */ @@ -278,7 +301,7 @@ xfs_rtfind_forw( xfs_rtxnum_t limit, /* last rtext to look at */ xfs_rtxnum_t *rtx) /* out: start rtext found */ { - xfs_rtword_t *b; /* current word in buffer */ + union xfs_rtword_ondisk *b; /* current word in buffer */ int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ @@ -289,6 +312,7 @@ xfs_rtfind_forw( xfs_rtword_t mask; /* mask of relevant bits for value */ xfs_rtword_t want; /* mask for "good" values */ xfs_rtword_t wdiff; /* difference from wanted value */ + xfs_rtword_t incore; int word; /* word number in the buffer */ /* @@ -311,7 +335,8 @@ xfs_rtfind_forw( * Compute match value, based on the bit at start: if 1 (free) * then all-ones, else all-zeroes. */ - want = (*b & ((xfs_rtword_t)1 << bit)) ? -1 : 0; + incore = xfs_rtbitmap_getword(mp, b); + want = (incore & ((xfs_rtword_t)1 << bit)) ? -1 : 0; /* * If the starting position is not word-aligned, deal with the * partial word. @@ -327,7 +352,7 @@ xfs_rtfind_forw( * Calculate the difference between the value there * and what we're looking for. */ - if ((wdiff = (*b ^ want) & mask)) { + if ((wdiff = (incore ^ want) & mask)) { /* * Different. Mark where we are and return. */ @@ -373,7 +398,8 @@ xfs_rtfind_forw( /* * Compute difference between actual and desired value. */ - if ((wdiff = *b ^ want)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = incore ^ want)) { /* * Different, mark where we are and return. */ @@ -418,7 +444,8 @@ xfs_rtfind_forw( /* * Compute difference between actual and desired value. */ - if ((wdiff = (*b ^ want) & mask)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = (incore ^ want) & mask)) { /* * Different, mark where we are and return. 
*/ @@ -544,15 +571,16 @@ xfs_rtmodify_range( xfs_rtxlen_t len, /* length of extent to modify */ int val) /* 1 for free, 0 for allocated */ { - xfs_rtword_t *b; /* current word in buffer */ + union xfs_rtword_ondisk *b; /* current word in buffer */ int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ int error; /* error value */ - xfs_rtword_t *first; /* first used word in the buffer */ + union xfs_rtword_ondisk *first; /* first used word in the buffer */ int i; /* current bit number rel. to start */ int lastbit; /* last useful bit in word */ xfs_rtword_t mask; /* mask o frelevant bits for value */ + xfs_rtword_t incore; int word; /* word number in the buffer */ /* @@ -590,10 +618,12 @@ xfs_rtmodify_range( /* * Set/clear the active bits. */ + incore = xfs_rtbitmap_getword(mp, b); if (val) - *b |= mask; + incore |= mask; else - *b &= ~mask; + incore &= ~mask; + xfs_rtbitmap_setword(mp, b, incore); i = lastbit - bit; /* * Go on to the next block if that's where the next word is @@ -634,7 +664,7 @@ xfs_rtmodify_range( /* * Set the word value correctly. */ - *b = val; + xfs_rtbitmap_setword(mp, b, val); i += XFS_NBWORD; /* * Go on to the next block if that's where the next word is @@ -674,10 +704,12 @@ xfs_rtmodify_range( /* * Set/clear the active bits. 
*/ + incore = xfs_rtbitmap_getword(mp, b); if (val) - *b |= mask; + incore |= mask; else - *b &= ~mask; + incore &= ~mask; + xfs_rtbitmap_setword(mp, b, incore); b++; } /* @@ -780,7 +812,7 @@ xfs_rtcheck_range( xfs_rtxnum_t *new, /* out: first rtext not matching */ int *stat) /* out: 1 for matches, 0 for not */ { - xfs_rtword_t *b; /* current word in buffer */ + union xfs_rtword_ondisk *b; /* current word in buffer */ int bit; /* bit number in the word */ xfs_fileoff_t block; /* bitmap block number */ struct xfs_buf *bp; /* buf for the block */ @@ -789,6 +821,7 @@ xfs_rtcheck_range( xfs_rtxnum_t lastbit; /* last useful bit in word */ xfs_rtword_t mask; /* mask of relevant bits for value */ xfs_rtword_t wdiff; /* difference from wanted value */ + xfs_rtword_t incore; int word; /* word number in the buffer */ /* @@ -829,7 +862,8 @@ xfs_rtcheck_range( /* * Compute difference between actual and desired value. */ - if ((wdiff = (*b ^ val) & mask)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = (incore ^ val) & mask)) { /* * Different, compute first wrong bit and return. */ @@ -876,7 +910,8 @@ xfs_rtcheck_range( /* * Compute difference between actual and desired value. */ - if ((wdiff = *b ^ val)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = incore ^ val)) { /* * Different, compute first wrong bit and return. */ @@ -922,7 +957,8 @@ xfs_rtcheck_range( /* * Compute difference between actual and desired value. */ - if ((wdiff = (*b ^ val) & mask)) { + incore = xfs_rtbitmap_getword(mp, b); + if ((wdiff = (incore ^ val) & mask)) { /* * Different, compute first wrong bit and return. */ diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index 308ce814a90..0a3c6299af8 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -122,18 +122,18 @@ xfs_rbmblock_to_rtx( } /* Return a pointer to a bitmap word within a rt bitmap block buffer. 
*/ -static inline xfs_rtword_t * +static inline union xfs_rtword_ondisk * xfs_rbmbuf_wordptr( void *buf, unsigned int rbmword) { - xfs_rtword_t *wordp = buf; + union xfs_rtword_ondisk *wordp = buf; return &wordp[rbmword]; } /* Return a pointer to a bitmap word within a rt bitmap block. */ -static inline xfs_rtword_t * +static inline union xfs_rtword_ondisk * xfs_rbmblock_wordptr( struct xfs_buf *bp, unsigned int rbmword) @@ -266,6 +266,10 @@ xfs_filblks_t xfs_rtbitmap_blockcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents); unsigned long long xfs_rtbitmap_wordcount(struct xfs_mount *mp, xfs_rtbxlen_t rtextents); +xfs_rtword_t xfs_rtbitmap_getword(struct xfs_mount *mp, + union xfs_rtword_ondisk *wordptr); +void xfs_rtbitmap_setword(struct xfs_mount *mp, + union xfs_rtword_ondisk *wordptr, xfs_rtword_t incore); #else /* CONFIG_XFS_RT */ # define xfs_rtfree_extent(t,b,l) (-ENOSYS) # define xfs_rtfree_blocks(t,rb,rl) (-ENOSYS) diff --git a/repair/globals.c b/repair/globals.c index 3200342e9f1..694a4c39cbd 100644 --- a/repair/globals.c +++ b/repair/globals.c @@ -93,7 +93,7 @@ int64_t fs_max_file_offset; /* realtime info */ -xfs_rtword_t *btmcompute; +union xfs_rtword_ondisk *btmcompute; xfs_suminfo_t *sumcompute; /* inode tree records have full or partial backptr fields ? */ diff --git a/repair/globals.h b/repair/globals.h index e51f4e7ece4..51d94ce8224 100644 --- a/repair/globals.h +++ b/repair/globals.h @@ -134,7 +134,7 @@ extern int64_t fs_max_file_offset; /* realtime info */ -extern xfs_rtword_t *btmcompute; +extern union xfs_rtword_ondisk *btmcompute; extern xfs_suminfo_t *sumcompute; /* inode tree records have full or partial backptr fields ? 
*/ diff --git a/repair/phase6.c b/repair/phase6.c index aef8a0d6b3f..27c47032fcb 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -782,7 +782,7 @@ fill_rbmino(xfs_mount_t *mp) struct xfs_buf *bp; xfs_trans_t *tp; xfs_inode_t *ip; - xfs_rtword_t *bmp; + union xfs_rtword_ondisk *bmp; int nmap; int error; xfs_fileoff_t bno; diff --git a/repair/rt.c b/repair/rt.c index 244b59f04ce..c6df8819cc7 100644 --- a/repair/rt.c +++ b/repair/rt.c @@ -29,7 +29,7 @@ rtinit(xfs_mount_t *mp) * handled by incore_init() */ wordcnt = libxfs_rtbitmap_wordcount(mp, mp->m_sb.sb_rextents); - btmcompute = calloc(wordcnt, sizeof(xfs_rtword_t)); + btmcompute = calloc(wordcnt, sizeof(union xfs_rtword_ondisk)); if (!btmcompute) do_error( _("couldn't allocate memory for incore realtime bitmap.\n")); @@ -44,9 +44,10 @@ rtinit(xfs_mount_t *mp) * incore realtime extent map. */ int -generate_rtinfo(xfs_mount_t *mp, - xfs_rtword_t *words, - xfs_suminfo_t *sumcompute) +generate_rtinfo( + struct xfs_mount *mp, + union xfs_rtword_ondisk *words, + xfs_suminfo_t *sumcompute) { xfs_rtxnum_t extno; xfs_rtxnum_t start_ext; @@ -75,7 +76,7 @@ generate_rtinfo(xfs_mount_t *mp, */ while (extno < mp->m_sb.sb_rextents) { freebit = 1; - *words = 0; + libxfs_rtbitmap_setword(mp, words, 0); bits = 0; for (i = 0; i < sizeof(xfs_rtword_t) * NBBY && extno < mp->m_sb.sb_rextents; i++, extno++) { @@ -98,7 +99,7 @@ generate_rtinfo(xfs_mount_t *mp, freebit <<= 1; } - *words = bits; + libxfs_rtbitmap_setword(mp, words, bits); words++; if (extno % bitsperblock == 0) diff --git a/repair/rt.h b/repair/rt.h index be24e91c95e..4f944487d63 100644 --- a/repair/rt.h +++ b/repair/rt.h @@ -11,10 +11,8 @@ struct blkmap; void rtinit(xfs_mount_t *mp); -int -generate_rtinfo(xfs_mount_t *mp, - xfs_rtword_t *words, - xfs_suminfo_t *sumcompute); +int generate_rtinfo(struct xfs_mount *mp, union xfs_rtword_ondisk *words, + xfs_suminfo_t *sumcompute); void check_rtbitmap(struct xfs_mount *mp); void check_rtsummary(struct xfs_mount *mp); ^ 
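For readers skimming patch 6/9, the recurring change in `xfs_rtmodify_range` is a read-modify-write through the accessors: fetch the ondisk word into an incore value, apply the mask, and store it back. The following is a reduced sketch of that pattern for a single word. The `rtword_*` names and the `rtword_modify` helper are made up for this illustration and drop the `mp` argument the real accessors take.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t rtword_t;		/* incore bitmap word */

/* Hypothetical stand-in for union xfs_rtword_ondisk. */
union rtword_ondisk {
	uint32_t raw;			/* word as stored in the block */
};

/* Convert an ondisk bitmap word to its incore representation. */
rtword_t rtword_get(const union rtword_ondisk *wordptr)
{
	assert(wordptr != NULL);
	return wordptr->raw;
}

/* Set an ondisk bitmap word from an incore representation. */
void rtword_set(union rtword_ondisk *wordptr, rtword_t incore)
{
	assert(wordptr != NULL);
	wordptr->raw = incore;
}

/*
 * Mark nbits bits starting at 'bit' as free (val != 0) or allocated
 * (val == 0) within one word, going through the accessors only.
 */
void rtword_modify(union rtword_ondisk *wordptr, unsigned int bit,
		   unsigned int nbits, int val)
{
	rtword_t mask;
	rtword_t incore;

	assert(nbits >= 1 && bit + nbits <= 32);

	/* Avoid an undefined 32-bit shift when the whole word is covered. */
	if (nbits == 32)
		mask = ~(rtword_t)0;
	else
		mask = (((rtword_t)1 << nbits) - 1) << bit;

	incore = rtword_get(wordptr);
	if (val)
		incore |= mask;
	else
		incore &= ~mask;
	rtword_set(wordptr, incore);
}
```

Since a pointer to the union cannot be dereferenced as a plain integer, any leftover `*b |= mask` style access fails to compile, which is the typechecking benefit the commit message describes.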
* [PATCHSET v1.0 0/8] xfs_db: debug realtime geometry 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (22 preceding siblings ...) 2022-12-30 22:19 ` [PATCHSET v1.0 0/9] libxfs: refactor rtbitmap/summary macros Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 2/8] xfs_db: report the device associated with each io cursor Darrick J. Wong ` (7 more replies) 2022-12-30 22:19 ` [PATCHSET v1.0 0/5] xfs_metadump: support external devices Darrick J. Wong ` (15 subsequent siblings) 39 siblings, 8 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs Hi all, Before we start modernizing the realtime device, let's first make a few improvements to the XFS debugger to make our lives easier. First up is making it so that users can point the debugger at the block device containing the realtime section, and augmenting the io cursor code to be able to read blocks from the rt device. Next, we add a new geometry conversion command (rtconvert) to make it easier to go back and forth between rt blocks, rt extents, and the corresponding locations within the rt bitmap and summary files. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. --D xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=debug-realtime-geometry --- db/block.c | 183 +++++++++++++++++++++++-- db/convert.c | 395 ++++++++++++++++++++++++++++++++++++++++++++++++++--- db/faddr.c | 4 - db/init.c | 7 + db/io.c | 89 +++++++++++- db/io.h | 6 + db/xfs_admin.sh | 5 - man/man8/xfs_db.8 | 129 +++++++++++++++++ 8 files changed, 772 insertions(+), 46 deletions(-) ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 2/8] xfs_db: report the device associated with each io cursor 2022-12-30 22:19 ` [PATCHSET v1.0 0/8] xfs_db: debug realtime geometry Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 5/8] xfs_db: access arbitrary realtime blocks and extents Darrick J. Wong ` (6 subsequent siblings) 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When db is reporting on an io cursor, have it print out the device that the cursor is pointing to. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/block.c | 16 +++++++++++++++- db/io.c | 46 +++++++++++++++++++++++++++++++++++++++++++--- db/io.h | 4 ++++ 3 files changed, 62 insertions(+), 4 deletions(-) diff --git a/db/block.c b/db/block.c index 788337d3709..b2b5edf9385 100644 --- a/db/block.c +++ b/db/block.c @@ -126,7 +126,17 @@ daddr_f( char *p; if (argc == 1) { - dbprintf(_("current daddr is %lld\n"), iocur_top->off >> BBSHIFT); + xfs_daddr_t daddr = iocur_top->off >> BBSHIFT; + + if (iocur_is_ddev(iocur_top)) + dbprintf(_("datadev daddr is %lld\n"), daddr); + else if (iocur_is_extlogdev(iocur_top)) + dbprintf(_("logdev daddr is %lld\n"), daddr); + else if (iocur_is_rtdev(iocur_top)) + dbprintf(_("rtdev daddr is %lld\n"), daddr); + else + dbprintf(_("current daddr is %lld\n"), daddr); + return 0; } d = (int64_t)strtoull(argv[1], &p, 0); @@ -220,6 +230,10 @@ fsblock_f( char *p; if (argc == 1) { + if (!iocur_is_ddev(iocur_top)) { + dbprintf(_("cursor does not point to data device\n")); + return 0; + } dbprintf(_("current fsblock is %lld\n"), XFS_DADDR_TO_FSB(mp, iocur_top->off >> BBSHIFT)); return 0; diff --git a/db/io.c b/db/io.c index 8688ee8e9c0..00eb5e98dc2 100644 --- a/db/io.c +++ b/db/io.c @@ -137,18 +137,58 @@ pop_help(void) )); } +bool +iocur_is_ddev(const struct iocur *ioc) +{ + if (!ioc->bp) + return false; + + return ioc->bp->b_target == 
ioc->bp->b_mount->m_ddev_targp; +} + +bool +iocur_is_extlogdev(const struct iocur *ioc) +{ + struct xfs_buf *bp = ioc->bp; + + if (!bp) + return false; + if (bp->b_mount->m_logdev_targp == bp->b_mount->m_ddev_targp) + return false; + + return bp->b_target == bp->b_mount->m_logdev_targp; +} + +bool +iocur_is_rtdev(const struct iocur *ioc) +{ + if (!ioc->bp) + return false; + + return ioc->bp->b_target == ioc->bp->b_mount->m_rtdev_targp; +} + void print_iocur( char *tag, iocur_t *ioc) { + const char *block_unit = "fsbno?"; int i; + if (iocur_is_ddev(ioc)) + block_unit = "fsbno"; + else if (iocur_is_extlogdev(ioc)) + block_unit = "logbno"; + else if (iocur_is_rtdev(ioc)) + block_unit = "rtbno"; + dbprintf("%s\n", tag); dbprintf(_("\tbyte offset %lld, length %d\n"), ioc->off, ioc->len); - dbprintf(_("\tbuffer block %lld (fsbno %lld), %d bb%s\n"), ioc->bb, - (xfs_fsblock_t)XFS_DADDR_TO_FSB(mp, ioc->bb), ioc->blen, - ioc->blen == 1 ? "" : "s"); + dbprintf(_("\tbuffer block %lld (%s %lld), %d bb%s\n"), ioc->bb, + block_unit, + (xfs_fsblock_t)XFS_DADDR_TO_FSB(mp, ioc->bb), + ioc->blen, ioc->blen == 1 ? "" : "s"); if (ioc->bbmap) { dbprintf(_("\tblock map")); for (i = 0; i < ioc->bbmap->nmaps; i++) diff --git a/db/io.h b/db/io.h index 29b22037bd6..1a37ee78c72 100644 --- a/db/io.h +++ b/db/io.h @@ -56,6 +56,10 @@ extern void set_iocur_type(const struct typ *type); extern void xfs_dummy_verify(struct xfs_buf *bp); extern void xfs_verify_recalc_crc(struct xfs_buf *bp); +bool iocur_is_ddev(const struct iocur *ioc); +bool iocur_is_extlogdev(const struct iocur *ioc); +bool iocur_is_rtdev(const struct iocur *ioc); + /* * returns -1 for unchecked, 0 for bad and 1 for good */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
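One detail in patch 2/8 that is easy to miss: with an internal log, the log target is the same object as the data target, so the external-log test has to rule that case out before comparing. A stripped-down sketch of the classification order follows. The `struct target`, `struct mount`, and `classify_cursor` names are invented stand-ins for the buftarg and mount structures; only the comparison logic is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct xfs_buftarg and struct xfs_mount. */
struct target {
	int id;
};

struct mount {
	const struct target *ddev;	/* data device */
	const struct target *logdev;	/* equals ddev for an internal log */
	const struct target *rtdev;	/* NULL when there is no rt volume */
};

enum cursor_dev {
	CUR_UNKNOWN,
	CUR_DATA,
	CUR_EXTLOG,
	CUR_RT,
};

/* Decide which device a cursor's buffer target belongs to. */
enum cursor_dev classify_cursor(const struct mount *mp, const struct target *t)
{
	assert(mp != NULL);

	if (t == NULL)			/* cursor has no buffer attached */
		return CUR_UNKNOWN;
	if (t == mp->ddev)
		return CUR_DATA;
	/* Only an external log has a target distinct from the data device. */
	if (mp->logdev != mp->ddev && t == mp->logdev)
		return CUR_EXTLOG;
	if (t == mp->rtdev)
		return CUR_RT;
	return CUR_UNKNOWN;
}
```

The unknown case corresponds to the "fsbno?" and "current daddr" fallbacks in `print_iocur` and `daddr_f`.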
* [PATCH 5/8] xfs_db: access arbitrary realtime blocks and extents 2022-12-30 22:19 ` [PATCHSET v1.0 0/8] xfs_db: debug realtime geometry Darrick J. Wong 2022-12-30 22:19 ` [PATCH 2/8] xfs_db: report the device associated with each io cursor Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 6/8] xfs_db: enable conversion of rt space units Darrick J. Wong ` (5 subsequent siblings) 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add two commands to xfs_db so that we can point ourselves at any arbitrary realtime block or extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/block.c | 107 +++++++++++++++++++++++++++++++++++++++++++++++++++++ man/man8/xfs_db.8 | 20 ++++++++++ 2 files changed, 127 insertions(+) diff --git a/db/block.c b/db/block.c index ae8744685b0..1afe201d0b2 100644 --- a/db/block.c +++ b/db/block.c @@ -25,6 +25,10 @@ static int dblock_f(int argc, char **argv); static void dblock_help(void); static int fsblock_f(int argc, char **argv); static void fsblock_help(void); +static int rtblock_f(int argc, char **argv); +static void rtblock_help(void); +static int rtextent_f(int argc, char **argv); +static void rtextent_help(void); static void print_rawdata(void *data, int len); static const cmdinfo_t ablock_cmd = @@ -39,6 +43,12 @@ static const cmdinfo_t dblock_cmd = static const cmdinfo_t fsblock_cmd = { "fsblock", "fsb", fsblock_f, 0, 1, 1, N_("[fsb]"), N_("set address to fsblock value"), fsblock_help }; +static const cmdinfo_t rtblock_cmd = + { "rtblock", "rtbno", rtblock_f, 0, 1, 1, N_("[rtbno]"), + N_("set address to rtblock value"), rtblock_help }; +static const cmdinfo_t rtextent_cmd = + { "rtextent", "rtx", rtextent_f, 0, 1, 1, N_("[rtxno]"), + N_("set address to rtextent value"), rtextent_help }; static void ablock_help(void) @@ -104,6 +114,8 @@ block_init(void) 
add_command(&daddr_cmd); add_command(&dblock_cmd); add_command(&fsblock_cmd); + add_command(&rtblock_cmd); + add_command(&rtextent_cmd); } static void @@ -301,6 +313,101 @@ fsblock_f( return 0; } +static void +rtblock_help(void) +{ + dbprintf(_( +"\n Example:\n" +"\n" +" 'rtblock 1023' - sets the file position to the 1023rd block on the realtime\n" +" volume. The filesystem block size is specified in the superblock and set\n" +" during mkfs time.\n\n" +)); +} + +static int +rtblock_f( + int argc, + char **argv) +{ + xfs_rtblock_t rtbno; + char *p; + + if (argc == 1) { + if (!iocur_is_rtdev(iocur_top)) { + dbprintf(_("cursor does not point to rt device\n")); + return 0; + } + dbprintf(_("current rtblock is %lld\n"), + XFS_BB_TO_FSB(mp, iocur_top->off >> BBSHIFT)); + return 0; + } + rtbno = strtoull(argv[1], &p, 0); + if (*p != '\0') { + dbprintf(_("bad rtblock %s\n"), argv[1]); + return 0; + } + if (rtbno >= mp->m_sb.sb_rblocks) { + dbprintf(_("bad rtblock %s\n"), argv[1]); + return 0; + } + ASSERT(typtab[TYP_DATA].typnm == TYP_DATA); + set_rt_cur(&typtab[TYP_DATA], XFS_FSB_TO_BB(mp, rtbno), blkbb, + DB_RING_ADD, NULL); + return 0; +} + +static void +rtextent_help(void) +{ + dbprintf(_( +"\n Example:\n" +"\n" +" 'rtextent 10' - sets the file position to the 10th extent on the realtime\n" +" volume. 
The realtime extent size is specified in the superblock and set\n" +" during mkfs or growfs time.\n\n" +)); +} + +static int +rtextent_f( + int argc, + char **argv) +{ + xfs_rtblock_t rtbno; + xfs_rtxnum_t rtx; + char *p; + + if (argc == 1) { + xfs_extlen_t dontcare; + + if (!iocur_is_rtdev(iocur_top)) { + dbprintf(_("cursor does not point to rt device\n")); + return 0; + } + + rtbno = XFS_BB_TO_FSB(mp, iocur_top->off >> BBSHIFT); + dbprintf(_("current rtextent is %lld\n"), + xfs_rtb_to_rtx(mp, rtbno, &dontcare)); + return 0; + } + rtx = strtoull(argv[1], &p, 0); + if (*p != '\0') { + dbprintf(_("bad rtextent %s\n"), argv[1]); + return 0; + } + if (rtx >= mp->m_sb.sb_rextents) { + dbprintf(_("bad rtextent %s\n"), argv[1]); + return 0; + } + + rtbno = xfs_rtx_to_rtb(mp, rtx); + ASSERT(typtab[TYP_DATA].typnm == TYP_DATA); + set_rt_cur(&typtab[TYP_DATA], XFS_FSB_TO_BB(mp, rtbno), + mp->m_sb.sb_rextsize * blkbb, DB_RING_ADD, NULL); + return 0; +} + void print_block( const field_t *fields, diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8 index aa097a13b27..de30fbc6230 100644 --- a/man/man8/xfs_db.8 +++ b/man/man8/xfs_db.8 @@ -927,6 +927,26 @@ command. Exit .BR xfs_db . .TP +.BI "rtblock [" rtbno ] +Set current address to the rtblock value given by +.IR rtbno . +If no value for +.I rtbno +is given, the current address is printed, expressed as an rtbno. +The type is set to +.B data +(uninterpreted). +.TP +.BI "rtextent [" rtxno ] +Set current address to the rtextent value given by +.IR rtxno . +If no value for +.I rtxno +is given, the current address is printed, expressed as an rtextent. +The type is set to +.B data +(uninterpreted). +.TP .BI "ring [" index ] Show position ring (if no .I index ^ permalink raw reply related [flat|nested] 565+ messages in thread
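For readers following the unit arithmetic behind the rtblock and rtextent commands, here is a minimal standalone sketch. It is not xfs_db code: struct rt_geom and the helper names are hypothetical stand-ins for the superblock fields (sb_blocklog, sb_rextsize, sb_rblocks, sb_rextents) and for what XFS_FSB_TO_BB and xfs_rtx_to_rtb compute in the patch, and it assumes 512-byte basic blocks (BBSHIFT == 9).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BB_SHIFT 9	/* assumed: one basic block (daddr unit) is 512 bytes */

/* Hypothetical stand-in for the superblock fields the commands consult. */
struct rt_geom {
	unsigned int	blocklog;	/* log2(fs block size), like sb_blocklog */
	uint32_t	rextsize;	/* fs blocks per rt extent, like sb_rextsize */
	uint64_t	rblocks;	/* rt volume size in fs blocks */
	uint64_t	rextents;	/* rt volume size in rt extents */
};

/* Basic blocks per fs block; plays the role of blkbb in the patch. */
static inline uint64_t
rt_blkbb(const struct rt_geom *g)
{
	assert(g->blocklog >= BB_SHIFT);
	return UINT64_C(1) << (g->blocklog - BB_SHIFT);
}

/* rt block number -> daddr on the realtime device */
static inline uint64_t
rtb_to_daddr(const struct rt_geom *g, uint64_t rtbno)
{
	return rtbno * rt_blkbb(g);
}

/* rt extent number -> first rt block of that extent */
static inline uint64_t
rtx_to_rtb(const struct rt_geom *g, uint64_t rtx)
{
	return rtx * g->rextsize;
}

/* Range checks corresponding to the "bad rtblock" / "bad rtextent" paths */
static inline bool
rtb_valid(const struct rt_geom *g, uint64_t rtbno)
{
	return rtbno < g->rblocks;
}

static inline bool
rtx_valid(const struct rt_geom *g, uint64_t rtx)
{
	return rtx < g->rextents;
}

/* Cursor length, in basic blocks, when pointing at one whole rt extent */
static inline uint64_t
rtx_len_bb(const struct rt_geom *g)
{
	return (uint64_t)g->rextsize * rt_blkbb(g);
}
```

With 4096-byte blocks and a 4-block extent size, "rtblock 1023" in this model lands on daddr 8184, and "rtextent 10" lands on daddr 320 with a 32-bb cursor.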
* [PATCH 6/8] xfs_db: enable conversion of rt space units 2022-12-30 22:19 ` [PATCHSET v1.0 0/8] xfs_db: debug realtime geometry Darrick J. Wong 2022-12-30 22:19 ` [PATCH 2/8] xfs_db: report the device associated with each io cursor Darrick J. Wong 2022-12-30 22:19 ` [PATCH 5/8] xfs_db: access arbitrary realtime blocks and extents Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 3/8] xfs_db: make the daddr command target the realtime device Darrick J. Wong ` (4 subsequent siblings) 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Teach the xfs_db convert function about rt extents, rt block numbers, and how to compute offsets within the rt bitmap and summary files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/convert.c | 232 ++++++++++++++++++++++++++++++++++++++++++++++++----- man/man8/xfs_db.8 | 51 ++++++++++++ 2 files changed, 260 insertions(+), 23 deletions(-) diff --git a/db/convert.c b/db/convert.c index e1466057031..811bac00f71 100644 --- a/db/convert.c +++ b/db/convert.c @@ -26,6 +26,10 @@ agino_to_bytes(XFS_INO_TO_AGINO(mp, (x)))) #define inoidx_to_bytes(x) \ ((uint64_t)(x) << mp->m_sb.sb_inodelog) +#define rtblock_to_bytes(x) \ + ((uint64_t)(x) << mp->m_sb.sb_blocklog) +#define rtx_to_rtblock(x) \ + ((uint64_t)(x) * mp->m_sb.sb_rextsize) typedef enum { CT_NONE = -1, @@ -40,11 +44,12 @@ typedef enum { CT_INO, /* xfs_ino_t */ CT_INOIDX, /* index of inode in fsblock */ CT_INOOFF, /* byte offset in inode */ + CT_RTBLOCK, /* realtime block */ + CT_RTX, /* realtime extent */ NCTS } ctype_t; typedef struct ctydesc { - ctype_t ctype; int allowed; const char **names; } ctydesc_t; @@ -61,12 +66,16 @@ typedef union { xfs_ino_t ino; int inoidx; int inooff; + xfs_rtblock_t rtblock; + xfs_rtblock_t rtx; } cval_t; static uint64_t bytevalue(ctype_t ctype, cval_t *val); +static int rtconvert_f(int 
argc, char **argv); static int convert_f(int argc, char **argv); static int getvalue(char *s, ctype_t ctype, cval_t *val); -static ctype_t lookupcty(char *ctyname); +static ctype_t lookupcty(const struct ctydesc *descs, + const char *ctyname); static const char *agblock_names[] = { "agblock", "agbno", NULL }; static const char *agino_names[] = { "agino", "aginode", NULL }; @@ -74,6 +83,8 @@ static const char *agnumber_names[] = { "agnumber", "agno", NULL }; static const char *bboff_names[] = { "bboff", "daddroff", NULL }; static const char *blkoff_names[] = { "blkoff", "fsboff", "agboff", NULL }; +static const char *rtblkoff_names[] = { "blkoff", "rtboff", + NULL }; static const char *byte_names[] = { "byte", "fsbyte", NULL }; static const char *daddr_names[] = { "daddr", "bb", NULL }; static const char *fsblock_names[] = { "fsblock", "fsb", "fsbno", NULL }; @@ -81,30 +92,91 @@ static const char *ino_names[] = { "ino", "inode", NULL }; static const char *inoidx_names[] = { "inoidx", "offset", NULL }; static const char *inooff_names[] = { "inooff", "inodeoff", NULL }; +static const char *rtblock_names[] = { "rtblock", "rtb", "rtbno", NULL }; +static const char *rtx_names[] = { "rtx", "rtextent", NULL }; + static const ctydesc_t ctydescs[NCTS] = { - { CT_AGBLOCK, M(AGNUMBER)|M(BBOFF)|M(BLKOFF)|M(INOIDX)|M(INOOFF), - agblock_names }, - { CT_AGINO, M(AGNUMBER)|M(INOOFF), agino_names }, - { CT_AGNUMBER, - M(AGBLOCK)|M(AGINO)|M(BBOFF)|M(BLKOFF)|M(INOIDX)|M(INOOFF), - agnumber_names }, - { CT_BBOFF, M(AGBLOCK)|M(AGNUMBER)|M(DADDR)|M(FSBLOCK), bboff_names }, - { CT_BLKOFF, M(AGBLOCK)|M(AGNUMBER)|M(FSBLOCK), blkoff_names }, - { CT_BYTE, 0, byte_names }, - { CT_DADDR, M(BBOFF), daddr_names }, - { CT_FSBLOCK, M(BBOFF)|M(BLKOFF)|M(INOIDX), fsblock_names }, - { CT_INO, M(INOOFF), ino_names }, - { CT_INOIDX, M(AGBLOCK)|M(AGNUMBER)|M(FSBLOCK)|M(INOOFF), - inoidx_names }, - { CT_INOOFF, - M(AGBLOCK)|M(AGINO)|M(AGNUMBER)|M(FSBLOCK)|M(INO)|M(INOIDX), - inooff_names }, + [CT_AGBLOCK] 
= { + .allowed = M(AGNUMBER)|M(BBOFF)|M(BLKOFF)|M(INOIDX)|M(INOOFF), + .names = agblock_names, + }, + [CT_AGINO] = { + .allowed = M(AGNUMBER)|M(INOOFF), + .names = agino_names, + }, + [CT_AGNUMBER] = { + .allowed = M(AGBLOCK)|M(AGINO)|M(BBOFF)|M(BLKOFF)|M(INOIDX)|M(INOOFF), + .names = agnumber_names, + }, + [CT_BBOFF] = { + .allowed = M(AGBLOCK)|M(AGNUMBER)|M(DADDR)|M(FSBLOCK), + .names = bboff_names, + }, + [CT_BLKOFF] = { + .allowed = M(AGBLOCK)|M(AGNUMBER)|M(FSBLOCK), + .names = blkoff_names, + }, + [CT_BYTE] = { + .allowed = 0, + .names = byte_names, + }, + [CT_DADDR] = { + .allowed = M(BBOFF), + .names = daddr_names, + }, + [CT_FSBLOCK] = { + .allowed = M(BBOFF)|M(BLKOFF)|M(INOIDX), + .names = fsblock_names, + }, + [CT_INO] = { + .allowed = M(INOOFF), + .names = ino_names, + }, + [CT_INOIDX] = { + .allowed = M(AGBLOCK)|M(AGNUMBER)|M(FSBLOCK)|M(INOOFF), + .names = inoidx_names, + }, + [CT_INOOFF] = { + .allowed = M(AGBLOCK)|M(AGINO)|M(AGNUMBER)|M(FSBLOCK)|M(INO)|M(INOIDX), + .names = inooff_names, + }, +}; + +static const ctydesc_t ctydescs_rt[NCTS] = { + [CT_BBOFF] = { + .allowed = M(DADDR)|M(RTBLOCK), + .names = bboff_names, + }, + [CT_BLKOFF] = { + .allowed = M(RTBLOCK), + .names = rtblkoff_names, + }, + [CT_BYTE] = { + .allowed = 0, + .names = byte_names, + }, + [CT_DADDR] = { + .allowed = M(BBOFF), + .names = daddr_names, + }, + [CT_RTBLOCK] = { + .allowed = M(BBOFF)|M(BLKOFF), + .names = rtblock_names, + }, + [CT_RTX] = { + .allowed = M(BBOFF)|M(BLKOFF), + .names = rtx_names, + }, }; static const cmdinfo_t convert_cmd = { "convert", NULL, convert_f, 3, 9, 0, "type num [type num]... type", "convert from one address form to another", NULL }; +static const cmdinfo_t rtconvert_cmd = + { "rtconvert", NULL, rtconvert_f, 3, 9, 0, "type num [type num]... 
type", + "convert from one realtime address form to another", NULL }; + static uint64_t bytevalue(ctype_t ctype, cval_t *val) { @@ -131,6 +203,10 @@ bytevalue(ctype_t ctype, cval_t *val) return inoidx_to_bytes(val->inoidx); case CT_INOOFF: return (uint64_t)val->inooff; + case CT_RTBLOCK: + return rtblock_to_bytes(val->rtblock); + case CT_RTX: + return rtblock_to_bytes(rtx_to_rtblock(val->rtx)); case CT_NONE: case NCTS: break; @@ -159,13 +235,13 @@ convert_f(int argc, char **argv) "arguments\n"), argc); return 0; } - if ((wtype = lookupcty(argv[argc - 1])) == CT_NONE) { + if ((wtype = lookupcty(ctydescs, argv[argc - 1])) == CT_NONE) { dbprintf(_("unknown conversion type %s\n"), argv[argc - 1]); return 0; } for (i = mask = conmask = 0; i < (argc - 1) / 2; i++) { - c = lookupcty(argv[i * 2]); + c = lookupcty(ctydescs, argv[i * 2]); if (c == CT_NONE) { dbprintf(_("unknown conversion type %s\n"), argv[i * 2]); return 0; @@ -230,6 +306,107 @@ convert_f(int argc, char **argv) case CT_INOOFF: v &= mp->m_sb.sb_inodesize - 1; break; + case CT_RTBLOCK: + case CT_RTX: + /* shouldn't get here */ + ASSERT(0); + break; + case CT_NONE: + case NCTS: + /* NOTREACHED */ + break; + } + dbprintf("0x%llx (%llu)\n", v, v); + return 0; +} + +static inline xfs_rtblock_t +xfs_daddr_to_rtb( + struct xfs_mount *mp, + xfs_daddr_t daddr) +{ + return daddr >> mp->m_blkbb_log; +} + +static int +rtconvert_f(int argc, char **argv) +{ + ctype_t c; + int conmask; + cval_t cvals[NCTS] = {}; + int i; + int mask; + uint64_t v; + ctype_t wtype; + + /* move past the "rtconvert" command */ + argc--; + argv++; + + if ((argc % 2) != 1) { + dbprintf(_("bad argument count %d to rtconvert, expected 3,5,7,9 " + "arguments\n"), argc); + return 0; + } + if ((wtype = lookupcty(ctydescs_rt, argv[argc - 1])) == CT_NONE) { + dbprintf(_("unknown conversion type %s\n"), argv[argc - 1]); + return 0; + } + + for (i = mask = conmask = 0; i < (argc - 1) / 2; i++) { + c = lookupcty(ctydescs_rt, argv[i * 2]); + if (c == 
CT_NONE) { + dbprintf(_("unknown conversion type %s\n"), argv[i * 2]); + return 0; + } + if (c == wtype) { + dbprintf(_("result type same as argument\n")); + return 0; + } + if (conmask & (1 << c)) { + dbprintf(_("conflicting conversion type %s\n"), + argv[i * 2]); + return 0; + } + if (!getvalue(argv[i * 2 + 1], c, &cvals[c])) + return 0; + mask |= 1 << c; + conmask |= ~ctydescs_rt[c].allowed; + } + v = 0; + for (c = (ctype_t)0; c < NCTS; c++) { + if (!(mask & (1 << c))) + continue; + v += bytevalue(c, &cvals[c]); + } + switch (wtype) { + case CT_BBOFF: + v &= BBMASK; + break; + case CT_BLKOFF: + v &= mp->m_blockmask; + break; + case CT_BYTE: + break; + case CT_DADDR: + v >>= BBSHIFT; + break; + case CT_RTBLOCK: + v = xfs_daddr_to_rtb(mp, v >> BBSHIFT); + break; + case CT_RTX: + v = xfs_daddr_to_rtb(mp, v >> BBSHIFT) / mp->m_sb.sb_rextsize; + break; + case CT_AGBLOCK: + case CT_AGINO: + case CT_AGNUMBER: + case CT_FSBLOCK: + case CT_INO: + case CT_INOIDX: + case CT_INOOFF: + /* shouldn't get here */ + ASSERT(0); + break; case CT_NONE: case NCTS: /* NOTREACHED */ @@ -243,6 +420,7 @@ void convert_init(void) { add_command(&convert_cmd); + add_command(&rtconvert_cmd); } static int @@ -290,6 +468,12 @@ getvalue(char *s, ctype_t ctype, cval_t *val) case CT_INOOFF: val->inooff = (int)v; break; + case CT_RTBLOCK: + val->rtblock = (xfs_rtblock_t)v; + break; + case CT_RTX: + val->rtx = (xfs_rtblock_t)v; + break; case CT_NONE: case NCTS: /* NOTREACHED */ @@ -299,13 +483,15 @@ getvalue(char *s, ctype_t ctype, cval_t *val) } static ctype_t -lookupcty(char *ctyname) +lookupcty( + const struct ctydesc *descs, + const char *ctyname) { ctype_t cty; const char **name; for (cty = (ctype_t)0; cty < NCTS; cty++) { - for (name = ctydescs[cty].names; *name; name++) { + for (name = descs[cty].names; name && *name; name++) { if (strcmp(ctyname, *name) == 0) return cty; } diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8 index de30fbc6230..40ff7de335e 100644 --- a/man/man8/xfs_db.8 +++ 
b/man/man8/xfs_db.8 @@ -953,6 +953,57 @@ Show position ring (if no argument is given), or move to a specific entry in the position ring given by .IR index . .TP +.BI "rtconvert " "type number" " [" "type number" "] ... " type +Convert from one address form to another for realtime section addresses. +The known +.IR type s, +with alternate names, are: +.RS 1.0i +.PD 0 +.HP +.B bboff +or +.B daddroff +(byte offset in a +.BR daddr ) +.HP +.B blkoff +or +.B fsboff or +.B rtboff +(byte offset in a +.B rtblock +or +.BR rtextent ) +.HP +.B byte +or +.B fsbyte +(byte address in filesystem) +.HP +.B daddr +or +.B bb +(disk address, 512-byte blocks) +.HP +.B rtblock +or +.B rtb +or +.B rtbno +(realtime filesystem block, see the +.B fsblock +command) +.HP +.B rtx +or +.B rtextent +(realtime extent) +.PD +.RE +.IP +Only conversions that "make sense" are allowed. +.TP .BI "sb [" agno ] Set current address to SB header in allocation group .IR agno . ^ permalink raw reply related [flat|nested] 565+ messages in thread
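The conversion model in rtconvert_f (sum every input as a byte address, then reduce the sum to the requested unit) can be sketched in isolation. This is an illustrative rewrite under stated assumptions, not the patch's code: enum rt_unit, struct rt_geom, rt_to_bytes and rt_from_bytes are hypothetical names, the mask checks that reject conflicting inputs are left out, and basic blocks are assumed to be 512 bytes.

```c
#include <assert.h>
#include <stdint.h>

#define BB_SHIFT 9				/* assumed 512-byte basic blocks */
#define BB_MASK  ((UINT64_C(1) << BB_SHIFT) - 1)

/* Hypothetical subset of the conversion types accepted by rtconvert. */
enum rt_unit { RT_BBOFF, RT_BLKOFF, RT_BYTE, RT_DADDR, RT_RTBLOCK, RT_RTX };

/* Hypothetical stand-in for sb_blocklog and sb_rextsize. */
struct rt_geom {
	unsigned int	blocklog;
	uint32_t	rextsize;
};

/* Step 1: express one "type number" pair as a byte address contribution. */
static uint64_t
rt_to_bytes(const struct rt_geom *g, enum rt_unit unit, uint64_t val)
{
	switch (unit) {
	case RT_BBOFF:
	case RT_BLKOFF:
	case RT_BYTE:
		return val;
	case RT_DADDR:
		return val << BB_SHIFT;
	case RT_RTBLOCK:
		return val << g->blocklog;
	case RT_RTX:
		return (val * g->rextsize) << g->blocklog;
	}
	return 0;
}

/* Step 2: reduce the summed byte address to the requested unit. */
static uint64_t
rt_from_bytes(const struct rt_geom *g, enum rt_unit want, uint64_t bytes)
{
	switch (want) {
	case RT_BBOFF:
		return bytes & BB_MASK;
	case RT_BLKOFF:
		return bytes & ((UINT64_C(1) << g->blocklog) - 1);
	case RT_BYTE:
		return bytes;
	case RT_DADDR:
		return bytes >> BB_SHIFT;
	case RT_RTBLOCK:
		return bytes >> g->blocklog;
	case RT_RTX:
		assert(g->rextsize > 0);
		return (bytes >> g->blocklog) / g->rextsize;
	}
	return 0;
}
```

In this model, "rtconvert rtblock 41 blkoff 100 rtx" sums to byte 168036 and reduces to extent 10 when the extent size is 4 blocks of 4096 bytes.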
* [PATCH 3/8] xfs_db: make the daddr command target the realtime device 2022-12-30 22:19 ` [PATCHSET v1.0 0/8] xfs_db: debug realtime geometry Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 6/8] xfs_db: enable conversion of rt space units Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 4/8] xfs_db: access realtime file blocks Darrick J. Wong ` (3 subsequent siblings) 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make it so that users can issue the command "daddr -r XXX" to select disk block XXX on the realtime device. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/block.c | 43 ++++++++++++++++++++++++++++++++++++++----- man/man8/xfs_db.8 | 6 +++++- 2 files changed, 43 insertions(+), 6 deletions(-) diff --git a/db/block.c b/db/block.c index b2b5edf9385..d064fbed5aa 100644 --- a/db/block.c +++ b/db/block.c @@ -31,7 +31,7 @@ static const cmdinfo_t ablock_cmd = { "ablock", NULL, ablock_f, 1, 1, 1, N_("filoff"), N_("set address to file offset (attr fork)"), ablock_help }; static const cmdinfo_t daddr_cmd = - { "daddr", NULL, daddr_f, 0, 1, 1, N_("[d]"), + { "daddr", NULL, daddr_f, 0, -1, 1, N_("[d]"), N_("set address to daddr value"), daddr_help }; static const cmdinfo_t dblock_cmd = { "dblock", NULL, dblock_f, 1, 1, 1, N_("filoff"), @@ -117,6 +117,11 @@ daddr_help(void) )); } +enum daddr_target { + DT_DATA, + DT_RT, +}; + static int daddr_f( int argc, @@ -124,8 +129,23 @@ daddr_f( { int64_t d; char *p; + int c; + xfs_rfsblock_t max_daddrs = mp->m_sb.sb_dblocks; + enum daddr_target tgt = DT_DATA; - if (argc == 1) { + while ((c = getopt(argc, argv, "r")) != -1) { + switch (c) { + case 'r': + tgt = DT_RT; + max_daddrs = mp->m_sb.sb_rblocks; + break; + default: + daddr_help(); + return 0; + } + } + + if (optind == argc) { xfs_daddr_t daddr = iocur_top->off >> BBSHIFT; if 
(iocur_is_ddev(iocur_top)) @@ -139,14 +159,27 @@ daddr_f( return 0; } - d = (int64_t)strtoull(argv[1], &p, 0); + + if (optind != argc - 1) { + daddr_help(); + return 0; + } + + d = (int64_t)strtoull(argv[optind], &p, 0); if (*p != '\0' || - d >= mp->m_sb.sb_dblocks << (mp->m_sb.sb_blocklog - BBSHIFT)) { + d >= max_daddrs << (mp->m_sb.sb_blocklog - BBSHIFT)) { dbprintf(_("bad daddr %s\n"), argv[1]); return 0; } ASSERT(typtab[TYP_DATA].typnm == TYP_DATA); - set_cur(&typtab[TYP_DATA], d, 1, DB_RING_ADD, NULL); + switch (tgt) { + case DT_DATA: + set_cur(&typtab[TYP_DATA], d, 1, DB_RING_ADD, NULL); + break; + case DT_RT: + set_rt_cur(&typtab[TYP_DATA], d, 1, DB_RING_ADD, NULL); + break; + } return 0; } diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8 index 593b8037251..aa097a13b27 100644 --- a/man/man8/xfs_db.8 +++ b/man/man8/xfs_db.8 @@ -590,7 +590,7 @@ Recalculate the current structure's correct CRC value, and write it to disk. Validate and display the current value and state of the structure's CRC. .RE .TP -.BI "daddr [" d ] +.BI "daddr [" -r "] [" d ] Set current address to the daddr (512 byte block) given by .IR d . If no value for @@ -599,6 +599,10 @@ is given, the current address is printed, expressed as a daddr. The type is set to .B data (uninterpreted). + +If an address and the +.B \-r +option are specified, the current address is set to the realtime device. .TP .BI dblock " filoff" Set current address to the offset ^ permalink raw reply related [flat|nested] 565+ messages in thread
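The bounds check that "daddr -r" relies on is the same shift as before, only fed with the realtime block count instead of the data block count. A minimal sketch of that check follows; struct fs_geom, daddr_limit and daddr_in_range are hypothetical names standing in for sb_blocklog, sb_dblocks, sb_rblocks and the comparison in daddr_f, and 512-byte basic blocks are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BB_SHIFT 9	/* assumed: a daddr addresses 512-byte units */

enum daddr_target { DT_DATA, DT_RT };

/* Hypothetical stand-in for the superblock fields used by the check. */
struct fs_geom {
	unsigned int	blocklog;	/* like sb_blocklog */
	uint64_t	dblocks;	/* like sb_dblocks */
	uint64_t	rblocks;	/* like sb_rblocks */
};

/* First daddr that is past the end of the selected device. */
static uint64_t
daddr_limit(const struct fs_geom *g, enum daddr_target tgt)
{
	uint64_t max_blocks = (tgt == DT_RT) ? g->rblocks : g->dblocks;

	assert(g->blocklog >= BB_SHIFT);
	return max_blocks << (g->blocklog - BB_SHIFT);
}

/* True if d is addressable on the selected device. */
static bool
daddr_in_range(const struct fs_geom *g, enum daddr_target tgt, uint64_t d)
{
	return d < daddr_limit(g, tgt);
}
```

The point of the sketch is that the same daddr can be valid on one device and out of range on the other, which is why the limit has to follow the -r flag.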
* [PATCH 4/8] xfs_db: access realtime file blocks 2022-12-30 22:19 ` [PATCHSET v1.0 0/8] xfs_db: debug realtime geometry Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 3/8] xfs_db: make the daddr command target the realtime device Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/8] xfs_db: support passing the realtime device to the debugger Darrick J. Wong ` (2 subsequent siblings) 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Now that we have the ability to point the io cursor at the realtime device, let's make it so that the "dblock" command can walk the contents of realtime files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/block.c | 17 +++++++++++++++-- db/faddr.c | 4 +++- 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/db/block.c b/db/block.c index d064fbed5aa..ae8744685b0 100644 --- a/db/block.c +++ b/db/block.c @@ -195,6 +195,13 @@ dblock_help(void) )); } +static inline bool +is_rtfile( + struct xfs_dinode *dip) +{ + return dip->di_flags & cpu_to_be16(XFS_DIFLAG_REALTIME); +} + static int dblock_f( int argc, @@ -234,8 +241,14 @@ dblock_f( ASSERT(typtab[type].typnm == type); if (nex > 1) make_bbmap(&bbmap, nex, bmp); - set_cur(&typtab[type], (int64_t)XFS_FSB_TO_DADDR(mp, dfsbno), - nb * blkbb, DB_RING_ADD, nex > 1 ? &bbmap : NULL); + if (is_rtfile(iocur_top->data)) + set_rt_cur(&typtab[type], (int64_t)XFS_FSB_TO_DADDR(mp, dfsbno), + nb * blkbb, DB_RING_ADD, + nex > 1 ? &bbmap : NULL); + else + set_cur(&typtab[type], (int64_t)XFS_FSB_TO_DADDR(mp, dfsbno), + nb * blkbb, DB_RING_ADD, + nex > 1 ? 
&bbmap : NULL); free(bmp); return 0; } diff --git a/db/faddr.c b/db/faddr.c index ec4aae68bb5..fd65b86b5e9 100644 --- a/db/faddr.c +++ b/db/faddr.c @@ -323,7 +323,9 @@ fa_drtbno( dbprintf(_("null block number, cannot set new addr\n")); return; } - /* need set_cur to understand rt subvolume */ + + set_rt_cur(&typtab[next], (int64_t)XFS_FSB_TO_BB(mp, bno), blkbb, + DB_RING_ADD, NULL); } /*ARGSUSED*/ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 1/8] xfs_db: support passing the realtime device to the debugger 2022-12-30 22:19 ` [PATCHSET v1.0 0/8] xfs_db: debug realtime geometry Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 4/8] xfs_db: access realtime file blocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 8/8] xfs_db: convert rtsummary geometry Darrick J. Wong 2022-12-30 22:19 ` [PATCH 7/8] xfs_db: convert rtbitmap geometry Darrick J. Wong 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a new -R flag so that sysadmins can pass the realtime device to the xfs debugger. Since we can now have superblocks on the rt device, we need this to be able to inspect/dump/etc. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/init.c | 7 +++++-- db/io.c | 43 +++++++++++++++++++++++++++++++++++++------ db/io.h | 2 ++ db/xfs_admin.sh | 5 +++-- man/man8/xfs_db.8 | 13 +++++++++++++ 5 files changed, 60 insertions(+), 10 deletions(-) diff --git a/db/init.c b/db/init.c index 9f045d27076..fc3bc403ea1 100644 --- a/db/init.c +++ b/db/init.c @@ -33,7 +33,7 @@ static void usage(void) { fprintf(stderr, _( - "Usage: %s [-ifFrxV] [-p prog] [-l logdev] [-c cmd]... device\n" + "Usage: %s [-ifFrxV] [-p prog] [-l logdev] [-R rtdev] [-c cmd]... 
device\n" ), progname); exit(1); } @@ -54,7 +54,7 @@ init( textdomain(PACKAGE); progname = basename(argv[0]); - while ((c = getopt(argc, argv, "c:fFip:rxVl:")) != EOF) { + while ((c = getopt(argc, argv, "c:fFip:rR:xVl:")) != EOF) { switch (c) { case 'c': cmdline = xrealloc(cmdline, (ncmdline+1)*sizeof(char*)); @@ -75,6 +75,9 @@ init( case 'r': x.isreadonly = LIBXFS_ISREADONLY; break; + case 'R': + x.rtname = optarg; + break; case 'l': x.logname = optarg; break; diff --git a/db/io.c b/db/io.c index 3d2572364d3..8688ee8e9c0 100644 --- a/db/io.c +++ b/db/io.c @@ -429,6 +429,7 @@ ring_add(void) static void write_cur_buf(void) { + struct xfs_buftarg *btp = iocur_top->bp->b_target; int ret; ret = -libxfs_bwrite(iocur_top->bp); @@ -436,7 +437,7 @@ write_cur_buf(void) dbprintf(_("write error: %s\n"), strerror(ret)); /* re-read buffer from disk */ - ret = -libxfs_readbufr(mp->m_ddev_targp, iocur_top->bb, iocur_top->bp, + ret = -libxfs_readbufr(btp, iocur_top->bb, iocur_top->bp, iocur_top->blen, 0); if (ret != 0) dbprintf(_("read error: %s\n"), strerror(ret)); @@ -445,6 +446,7 @@ write_cur_buf(void) static void write_cur_bbs(void) { + struct xfs_buftarg *btp = iocur_top->bp->b_target; int ret; ret = -libxfs_bwrite(iocur_top->bp); @@ -453,7 +455,7 @@ write_cur_bbs(void) /* re-read buffer from disk */ - ret = -libxfs_readbufr_map(mp->m_ddev_targp, iocur_top->bp, 0); + ret = -libxfs_readbufr_map(btp, iocur_top->bp, 0); if (ret != 0) dbprintf(_("read error: %s\n"), strerror(ret)); } @@ -508,8 +510,9 @@ write_cur(void) } -void -set_cur( +static void +__set_cur( + struct xfs_buftarg *btp, const typ_t *type, xfs_daddr_t blknum, int len, @@ -548,11 +551,11 @@ set_cur( if (!iocur_top->bbmap) return; memcpy(iocur_top->bbmap, bbmap, sizeof(struct bbmap)); - error = -libxfs_buf_read_map(mp->m_ddev_targp, bbmap->b, + error = -libxfs_buf_read_map(btp, bbmap->b, bbmap->nmaps, LIBXFS_READBUF_SALVAGE, &bp, ops); } else { - error = -libxfs_buf_read(mp->m_ddev_targp, blknum, len, + error = 
-libxfs_buf_read(btp, blknum, len, LIBXFS_READBUF_SALVAGE, &bp, ops); iocur_top->bbmap = NULL; } @@ -589,6 +592,34 @@ set_cur( ring_add(); } +void +set_cur( + const typ_t *type, + xfs_daddr_t blknum, + int len, + int ring_flag, + bbmap_t *bbmap) +{ + __set_cur(mp->m_ddev_targp, type, blknum, len, ring_flag, bbmap); +} + +int +set_rt_cur( + const typ_t *type, + xfs_daddr_t blknum, + int len, + int ring_flag, + bbmap_t *bbmap) +{ + if (!mp->m_rtdev_targp->bt_bdev) { + printf(_("realtime device not loaded, use -R.\n")); + return ENODEV; + } + + __set_cur(mp->m_rtdev_targp, type, blknum, len, ring_flag, bbmap); + return 0; +} + void set_iocur_type( const typ_t *type) diff --git a/db/io.h b/db/io.h index c29a7488198..29b22037bd6 100644 --- a/db/io.h +++ b/db/io.h @@ -49,6 +49,8 @@ extern void push_cur_and_set_type(void); extern void write_cur(void); extern void set_cur(const struct typ *type, xfs_daddr_t blknum, int len, int ring_add, bbmap_t *bbmap); +extern int set_rt_cur(const struct typ *type, xfs_daddr_t blknum, + int len, int ring_add, bbmap_t *bbmap); extern void ring_add(void); extern void set_iocur_type(const struct typ *type); extern void xfs_dummy_verify(struct xfs_buf *bp); diff --git a/db/xfs_admin.sh b/db/xfs_admin.sh index 409975b2228..a45a75bc9a6 100755 --- a/db/xfs_admin.sh +++ b/db/xfs_admin.sh @@ -6,6 +6,7 @@ status=0 DB_OPTS="" +DB_DEV_OPTS="" REPAIR_OPTS="" REPAIR_DEV_OPTS="" LOG_OPTS="" @@ -22,7 +23,7 @@ do L) DB_OPTS=$DB_OPTS" -c 'label "$OPTARG"'";; O) REPAIR_OPTS=$REPAIR_OPTS" -c $OPTARG";; p) DB_OPTS=$DB_OPTS" -c 'version projid32bit'";; - r) REPAIR_DEV_OPTS=" -r '$OPTARG'";; + r) REPAIR_DEV_OPTS=" -r '$OPTARG'"; DB_DEV_OPTS=" -R '$OPTARG'";; u) DB_OPTS=$DB_OPTS" -r -c uuid";; U) DB_OPTS=$DB_OPTS" -c 'uuid "$OPTARG"'";; V) xfs_db -p xfs_admin -V @@ -45,7 +46,7 @@ case $# in if [ -n "$DB_OPTS" ] then - eval xfs_db -x -p xfs_admin $LOG_OPTS $DB_OPTS "$1" + eval xfs_db -x -p xfs_admin $LOG_OPTS $DB_DEV_OPTS $DB_OPTS "$1" status=$? 
fi if [ -n "$REPAIR_OPTS" ] diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8 index a7e42e1a333..593b8037251 100644 --- a/man/man8/xfs_db.8 +++ b/man/man8/xfs_db.8 @@ -14,6 +14,9 @@ xfs_db \- debug an XFS filesystem .B \-l .I logdev ] [ +.B \-R +.I rtdev +] [ .B \-p .I progname ] @@ -80,6 +83,16 @@ Set the program name to for prompts and some error messages, the default value is .BR xfs_db . .TP +.B -R +.I rtdev +Specifies the device where the realtime data resides. +This is only relevant for filesystems that have a realtime section. +See the +.BR mkfs.xfs "(8) " \-r +option, and refer to +.BR xfs (5) +for a detailed description of the XFS realtime section. +.TP .B -r Open .I device ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 8/8] xfs_db: convert rtsummary geometry 2022-12-30 22:19 ` [PATCHSET v1.0 0/8] xfs_db: debug realtime geometry Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 1/8] xfs_db: support passing the realtime device to the debugger Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 7/8] xfs_db: convert rtbitmap geometry Darrick J. Wong 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Teach the rtconvert command to be able to convert realtime blocks and extents to locations within the rt summary. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/convert.c | 141 ++++++++++++++++++++++++++++++++++++++++++++++++++--- man/man8/xfs_db.8 | 29 +++++++++++ 2 files changed, 162 insertions(+), 8 deletions(-) diff --git a/db/convert.c b/db/convert.c index 691361604ee..0aed1437dc4 100644 --- a/db/convert.c +++ b/db/convert.c @@ -52,6 +52,9 @@ typedef enum { CT_RTX, /* realtime extent */ CT_RBMBLOCK, /* block within rt bitmap */ CT_RBMWORD, /* word within rt bitmap */ + CT_RSUMBLOCK, /* block within rt summary */ + CT_RSUMLOG, /* log level for rtsummary computations */ + CT_RSUMINFO, /* info word within rt summary */ NCTS } ctype_t; @@ -76,6 +79,7 @@ typedef union { xfs_rtblock_t rtx; xfs_fileoff_t rbmblock; unsigned int rbmword; + xfs_fileoff_t rsumblock; } cval_t; static uint64_t bytevalue(ctype_t ctype, cval_t *val); @@ -104,6 +108,12 @@ static const char *rtblock_names[] = { "rtblock", "rtb", "rtbno", NULL }; static const char *rtx_names[] = { "rtx", "rtextent", NULL }; static const char *rbmblock_names[] = { "rbmblock", "rbmb", NULL }; static const char *rbmword_names[] = { "rbmword", "rbmw", NULL }; +static const char *rsumblock_names[] = { "rsumblock", "rsmb", NULL }; +static const char *rsumlog_names[] = { "rsumlog", "rsml", NULL }; +static const char *rsumword_names[] = { 
"rsuminfo", "rsmi", NULL }; + +static int rsuminfo; +static int rsumlog; static const ctydesc_t ctydescs[NCTS] = { [CT_AGBLOCK] = { @@ -154,37 +164,50 @@ static const ctydesc_t ctydescs[NCTS] = { static const ctydesc_t ctydescs_rt[NCTS] = { [CT_BBOFF] = { - .allowed = M(DADDR)|M(RTBLOCK), + .allowed = M(DADDR)|M(RTBLOCK)|M(RSUMLOG), .names = bboff_names, }, [CT_BLKOFF] = { - .allowed = M(RTBLOCK), + .allowed = M(RTBLOCK)|M(RSUMLOG), .names = rtblkoff_names, }, [CT_BYTE] = { - .allowed = 0, + .allowed = 0|M(RSUMLOG), .names = byte_names, }, [CT_DADDR] = { - .allowed = M(BBOFF), + .allowed = M(BBOFF)|M(RSUMLOG), .names = daddr_names, }, [CT_RTBLOCK] = { - .allowed = M(BBOFF)|M(BLKOFF), + .allowed = M(BBOFF)|M(BLKOFF)|M(RSUMLOG), .names = rtblock_names, }, [CT_RTX] = { - .allowed = M(BBOFF)|M(BLKOFF), + .allowed = M(BBOFF)|M(BLKOFF)|M(RSUMLOG), .names = rtx_names, }, [CT_RBMBLOCK] = { - .allowed = M(RBMWORD), + .allowed = M(RBMWORD)|M(RSUMLOG), .names = rbmblock_names, }, [CT_RBMWORD] = { - .allowed = M(RBMBLOCK), + .allowed = M(RBMBLOCK)|M(RSUMLOG), .names = rbmword_names, }, + /* must be specified in order rsumlog -> rsuminfo -> rsumblock */ + [CT_RSUMBLOCK] = { + .allowed = 0, + .names = rsumblock_names, + }, + [CT_RSUMLOG] = { + .allowed = M(RSUMINFO)|M(RSUMBLOCK), + .names = rsumlog_names, + }, + [CT_RSUMINFO] = { + .allowed = M(RSUMBLOCK), + .names = rsumword_names, + }, }; static const cmdinfo_t convert_cmd = @@ -195,6 +218,39 @@ static const cmdinfo_t rtconvert_cmd = { "rtconvert", NULL, rtconvert_f, 3, 9, 0, "type num [type num]... 
type", "convert from one realtime address form to another", NULL }; +static inline uint64_t +rsumblock_to_bytes( + xfs_fileoff_t rsumblock) +{ + /* + * We compute the rt summary file block with this formula: + * sumoffs = (log2len * sb_rbmblocks) + rbmblock; + * sumblock = sumoffs / blockwsize; + * + * Hence the return value is the inverse of this: + * sumoffs = (rsumblock * blockwsize) + rsuminfo; + * rbmblock = sumoffs % (log2len * sb_rbmblocks); + */ + xfs_rtsumoff_t sumoff; + xfs_fileoff_t rbmblock; + + if (rsumlog < 0) { + dbprintf(_("need to set rsumlog\n")); + return 0; + } + if (rsuminfo < 0) { + dbprintf(_("need to set rsuminfo\n")); + return 0; + } + + sumoff = rsuminfo + (rsumblock * mp->m_blockwsize); + if (rsumlog) + rbmblock = sumoff % (rsumlog * mp->m_sb.sb_rbmblocks); + else + rbmblock = sumoff; + return rbmblock_to_bytes(rbmblock); +} + static uint64_t bytevalue(ctype_t ctype, cval_t *val) { @@ -229,6 +285,16 @@ bytevalue(ctype_t ctype, cval_t *val) return rbmblock_to_bytes(val->rbmblock); case CT_RBMWORD: return rbmword_to_bytes(val->rbmword); + case CT_RSUMBLOCK: + return rsumblock_to_bytes(val->rsumblock); + case CT_RSUMLOG: + case CT_RSUMINFO: + /* + * These have to be specified before rsumblock, and are stored in + * global variables. Hence they do not adjust the disk address + * value. 
+ */ + return 0; case CT_NONE: case NCTS: break; @@ -332,6 +398,9 @@ convert_f(int argc, char **argv) case CT_RTX: case CT_RBMBLOCK: case CT_RBMWORD: + case CT_RSUMBLOCK: + case CT_RSUMLOG: + case CT_RSUMINFO: /* shouldn't get here */ ASSERT(0); break; @@ -352,6 +421,40 @@ xfs_daddr_to_rtb( return daddr >> mp->m_blkbb_log; } +static inline uint64_t +rt_daddr_to_rsumblock( + struct xfs_mount *mp, + uint64_t input) +{ + xfs_fileoff_t rbmblock; + + if (rsumlog < 0) { + dbprintf(_("need to set rsumlog\n")); + return 0; + } + + rbmblock = xfs_rtx_to_rbmblock(mp, xfs_rtb_to_rtxt(mp, + xfs_daddr_to_rtb(mp, input >> BBSHIFT))); + return xfs_rtsumoffs_to_block(mp, xfs_rtsumoffs(mp, rsumlog, rbmblock)); +} + +static inline uint64_t +rt_daddr_to_rsuminfo( + struct xfs_mount *mp, + uint64_t input) +{ + xfs_fileoff_t rbmblock; + + if (rsumlog < 0) { + dbprintf(_("need to set rsumlog\n")); + return 0; + } + + rbmblock = xfs_rtx_to_rbmblock(mp, xfs_rtb_to_rtxt(mp, + xfs_daddr_to_rtb(mp, input >> BBSHIFT))); + return xfs_rtsumoffs_to_infoword(mp, xfs_rtsumoffs(mp, rsumlog, rbmblock)); +} + static int rtconvert_f(int argc, char **argv) { @@ -363,6 +466,9 @@ rtconvert_f(int argc, char **argv) uint64_t v; ctype_t wtype; + rsumlog = -1; + rsuminfo = -1; + /* move past the "rtconvert" command */ argc--; argv++; @@ -429,6 +535,16 @@ rtconvert_f(int argc, char **argv) v = xfs_rtx_to_rbmword(mp, xfs_rtb_to_rtxt(mp, xfs_daddr_to_rtb(mp, v >> BBSHIFT))); break; + case CT_RSUMBLOCK: + v = rt_daddr_to_rsumblock(mp, v); + break; + case CT_RSUMLOG: + dbprintf(_("cannot convert to rsumlog\n")); + return 0; + break; + case CT_RSUMINFO: + v = rt_daddr_to_rsuminfo(mp, v); + break; case CT_AGBLOCK: case CT_AGINO: case CT_AGNUMBER: @@ -512,6 +628,15 @@ getvalue(char *s, ctype_t ctype, cval_t *val) case CT_RBMWORD: val->rbmword = (unsigned int)v; break; + case CT_RSUMBLOCK: + val->rsumblock = (xfs_fileoff_t)v; + break; + case CT_RSUMLOG: + rsumlog = (unsigned int)v; + break; + case CT_RSUMINFO: + 
rsuminfo = (unsigned int)v; + break; case CT_NONE: case NCTS: /* NOTREACHED */ diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8 index 65d1a65b75f..1246eb6327c 100644 --- a/man/man8/xfs_db.8 +++ b/man/man8/xfs_db.8 @@ -1009,10 +1009,39 @@ or or .B rbmw (32-bit word within a realtime bitmap block) +.HP +.B rsumblock +or +.B rsmb +(realtime summary file block) +.HP +.B rsuminfo +or +.B rsmi +(32-bit counter within a realtime summary block) +.HP +.B rsumlog +or +.B rsml +(log2len parameter used for summary file offset computations) .PD .RE .IP Only conversions that "make sense" are allowed. + +Realtime summary file location conversions have the following rules: +Each info word in the rt summary file counts the number of free extents of a +given log2(length) that start in a given rt bitmap block. + +To compute summary file location information for a given rt bitmap block, a +log2(extent length) must be specified as the last type/number pair before the +conversion type, and the type must be +.BR rsumlog . + +To compute the rt bitmap block from summary file location, the type/number pairs +must be specified exactly in the order +.BR rsumlog ", " rsuminfo ", " rsumblock . + .TP .BI "sb [" agno ] Set current address to SB header in allocation group ^ permalink raw reply related [flat|nested] 565+ messages in thread
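The summary-file arithmetic described in the code comment and man page text of the patch above can be sketched in isolation. This is only an illustration of the two formulas quoted in the patch; `struct rt_geom` and the `sketch_*` helpers are hypothetical stand-ins for `mp->m_sb.sb_rbmblocks`, `mp->m_blockwsize`, and the libxfs helpers, not the real xfsprogs API.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two mount fields the patch uses. */
struct rt_geom {
	uint64_t rbmblocks;	/* assumed: number of rt bitmap blocks */
	uint64_t blockwsize;	/* assumed: 32-bit words per fs block */
};

/* Forward: sumoffs = (log2len * sb_rbmblocks) + rbmblock */
static uint64_t
sketch_rsum_offset(const struct rt_geom *g, unsigned int log2len,
		uint64_t rbmblock)
{
	return (uint64_t)log2len * g->rbmblocks + rbmblock;
}

/* Forward: sumblock = sumoffs / blockwsize */
static uint64_t
sketch_rsum_block(const struct rt_geom *g, uint64_t sumoff)
{
	return sumoff / g->blockwsize;
}

/* Forward: info word index within that summary block */
static uint64_t
sketch_rsum_infoword(const struct rt_geom *g, uint64_t sumoff)
{
	return sumoff % g->blockwsize;
}

/*
 * Inverse, following the patch: rebuild sumoffs from (rsumblock, rsuminfo),
 * then strip the log2len contribution.  With log2len == 0 the offset is
 * already the bitmap block number.
 */
static uint64_t
sketch_rsum_to_rbmblock(const struct rt_geom *g, unsigned int log2len,
		uint64_t rsumblock, uint64_t rsuminfo)
{
	uint64_t sumoff = rsumblock * g->blockwsize + rsuminfo;

	if (log2len)
		return sumoff % ((uint64_t)log2len * g->rbmblocks);
	return sumoff;
}
```

The inverse needs all three inputs, which is consistent with the man page requiring `rsumlog`, `rsuminfo`, and `rsumblock` to be given in that order before converting back to a bitmap block.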
* [PATCH 7/8] xfs_db: convert rtbitmap geometry 2022-12-30 22:19 ` [PATCHSET v1.0 0/8] xfs_db: debug realtime geometry Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 8/8] xfs_db: convert rtsummary geometry Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 7 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Teach the rtconvert command to be able to convert realtime blocks and extents to locations within the rt bitmap. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/convert.c | 38 ++++++++++++++++++++++++++++++++++++++ man/man8/xfs_db.8 | 10 ++++++++++ 2 files changed, 48 insertions(+) diff --git a/db/convert.c b/db/convert.c index 811bac00f71..691361604ee 100644 --- a/db/convert.c +++ b/db/convert.c @@ -30,6 +30,10 @@ ((uint64_t)(x) << mp->m_sb.sb_blocklog) #define rtx_to_rtblock(x) \ ((uint64_t)(x) * mp->m_sb.sb_rextsize) +#define rbmblock_to_bytes(x) \ + rtblock_to_bytes(rtx_to_rtblock(xfs_rbmblock_to_rtx(mp, (uint64_t)x))) +#define rbmword_to_bytes(x) \ + rtblock_to_bytes(rtx_to_rtblock((uint64_t)(x) << XFS_NBWORDLOG)) typedef enum { CT_NONE = -1, @@ -46,6 +50,8 @@ typedef enum { CT_INOOFF, /* byte offset in inode */ CT_RTBLOCK, /* realtime block */ CT_RTX, /* realtime extent */ + CT_RBMBLOCK, /* block within rt bitmap */ + CT_RBMWORD, /* word within rt bitmap */ NCTS } ctype_t; @@ -68,6 +74,8 @@ typedef union { int inooff; xfs_rtblock_t rtblock; xfs_rtblock_t rtx; + xfs_fileoff_t rbmblock; + unsigned int rbmword; } cval_t; static uint64_t bytevalue(ctype_t ctype, cval_t *val); @@ -94,6 +102,8 @@ static const char *inooff_names[] = { "inooff", "inodeoff", NULL }; static const char *rtblock_names[] = { "rtblock", "rtb", "rtbno", NULL }; static const char *rtx_names[] = { "rtx", "rtextent", NULL }; +static const char *rbmblock_names[] = { "rbmblock", "rbmb", NULL }; +static const char 
*rbmword_names[] = { "rbmword", "rbmw", NULL }; static const ctydesc_t ctydescs[NCTS] = { [CT_AGBLOCK] = { @@ -167,6 +177,14 @@ static const ctydesc_t ctydescs_rt[NCTS] = { .allowed = M(BBOFF)|M(BLKOFF), .names = rtx_names, }, + [CT_RBMBLOCK] = { + .allowed = M(RBMWORD), + .names = rbmblock_names, + }, + [CT_RBMWORD] = { + .allowed = M(RBMBLOCK), + .names = rbmword_names, + }, }; static const cmdinfo_t convert_cmd = @@ -207,6 +225,10 @@ bytevalue(ctype_t ctype, cval_t *val) return rtblock_to_bytes(val->rtblock); case CT_RTX: return rtblock_to_bytes(rtx_to_rtblock(val->rtx)); + case CT_RBMBLOCK: + return rbmblock_to_bytes(val->rbmblock); + case CT_RBMWORD: + return rbmword_to_bytes(val->rbmword); case CT_NONE: case NCTS: break; @@ -308,6 +330,8 @@ convert_f(int argc, char **argv) break; case CT_RTBLOCK: case CT_RTX: + case CT_RBMBLOCK: + case CT_RBMWORD: /* shouldn't get here */ ASSERT(0); break; @@ -397,6 +421,14 @@ rtconvert_f(int argc, char **argv) case CT_RTX: v = xfs_daddr_to_rtb(mp, v >> BBSHIFT) / mp->m_sb.sb_rextsize; break; + case CT_RBMBLOCK: + v = xfs_rtx_to_rbmblock(mp, xfs_rtb_to_rtxt(mp, + xfs_daddr_to_rtb(mp, v >> BBSHIFT))); + break; + case CT_RBMWORD: + v = xfs_rtx_to_rbmword(mp, xfs_rtb_to_rtxt(mp, + xfs_daddr_to_rtb(mp, v >> BBSHIFT))); + break; case CT_AGBLOCK: case CT_AGINO: case CT_AGNUMBER: @@ -474,6 +506,12 @@ getvalue(char *s, ctype_t ctype, cval_t *val) case CT_RTX: val->rtx = (xfs_rtblock_t)v; break; + case CT_RBMBLOCK: + val->rbmblock = (xfs_fileoff_t)v; + break; + case CT_RBMWORD: + val->rbmword = (unsigned int)v; + break; case CT_NONE: case NCTS: /* NOTREACHED */ diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8 index 40ff7de335e..65d1a65b75f 100644 --- a/man/man8/xfs_db.8 +++ b/man/man8/xfs_db.8 @@ -999,6 +999,16 @@ command) or .B rtextent (realtime extent) +.HP +.B rbmblock +or +.B rbmb +(realtime bitmap block) +.HP +.B rbmword +or +.B rbmw +(32-bit word within a realtime bitmap block) .PD .RE .IP ^ permalink raw reply related 
[flat|nested] 565+ messages in thread
* [PATCHSET v1.0 0/5] xfs_metadump: support external devices 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (23 preceding siblings ...) 2022-12-30 22:19 ` [PATCHSET v1.0 0/8] xfs_db: debug realtime geometry Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/5] xfs_db: allow selecting logdev blocks Darrick J. Wong ` (4 more replies) 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (14 subsequent siblings) 39 siblings, 5 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs Hi all, This series augments the xfs_metadump and xfs_mdrestore utilities to capture the contents of an external log in a metadump, and restore it on the other end. This will enable better debugging analysis of broken filesystems, since it will now be possible to capture external log data. This is a prerequisite for the rt groups feature, since we'll also need to capture the rt superblocks written to the rt device. This also means we can capture the contents of external logs for better analysis by support staff. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. 
--D xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=metadump-external-devices fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=metadump-external-devices --- db/block.c | 103 ++++++++++++++++++++++++ db/io.c | 18 ++++ db/io.h | 2 db/metadump.c | 98 +++++++++++++++++++++++ db/xfs_metadump.sh | 5 + include/xfs_metadump.h | 3 + man/man8/xfs_db.8 | 17 ++++ man/man8/xfs_mdrestore.8 | 8 ++ man/man8/xfs_metadump.8 | 13 ++- mdrestore/xfs_mdrestore.c | 190 ++++++++++++++++++++++++++++++++------------- 10 files changed, 393 insertions(+), 64 deletions(-) ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 1/5] xfs_db: allow selecting logdev blocks 2022-12-30 22:19 ` [PATCHSET v1.0 0/5] xfs_metadump: support external devices Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 3/5] xfs_db: metadump external log devices Darrick J. Wong ` (3 subsequent siblings) 4 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make it so that xfs_db can examine blocks on an external log device. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/io.c | 18 ++++++++++++++++++ db/io.h | 2 ++ 2 files changed, 20 insertions(+) diff --git a/db/io.c b/db/io.c index 00eb5e98dc2..8e3b32d9551 100644 --- a/db/io.c +++ b/db/io.c @@ -660,6 +660,24 @@ set_rt_cur( return 0; } +int +set_log_cur( + const typ_t *type, + xfs_daddr_t blknum, + int len, + int ring_flag, + bbmap_t *bbmap) +{ + if (!mp->m_logdev_targp->bt_bdev || + mp->m_logdev_targp->bt_bdev == mp->m_ddev_targp->bt_bdev) { + printf(_("external log device not loaded, use -l.\n")); + return ENODEV; + } + + __set_cur(mp->m_logdev_targp, type, blknum, len, ring_flag, bbmap); + return 0; +} + void set_iocur_type( const typ_t *type) diff --git a/db/io.h b/db/io.h index 1a37ee78c72..b3d8123d548 100644 --- a/db/io.h +++ b/db/io.h @@ -49,6 +49,8 @@ extern void push_cur_and_set_type(void); extern void write_cur(void); extern void set_cur(const struct typ *type, xfs_daddr_t blknum, int len, int ring_add, bbmap_t *bbmap); +extern int set_log_cur(const struct typ *type, xfs_daddr_t blknum, + int len, int ring_add, bbmap_t *bbmap); extern int set_rt_cur(const struct typ *type, xfs_daddr_t blknum, int len, int ring_add, bbmap_t *bbmap); extern void ring_add(void); ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 3/5] xfs_db: metadump external log devices 2022-12-30 22:19 ` [PATCHSET v1.0 0/5] xfs_metadump: support external devices Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/5] xfs_db: allow selecting logdev blocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 5/5] xfs_mdrestore: fix missed progress reporting Darrick J. Wong ` (2 subsequent siblings) 4 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Teach the metadump command to dump the contents of an external log to the metadump file. Older mdrestore programs aren't going to recognize the new metablock info flag, so change the magic number before adding new information flags to signal that the metablock is describing blocks on either an external log device or a realtime device. Realtime support isn't needed now, but it will be for realtime groups. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/metadump.c | 98 ++++++++++++++++++++++++++++++++++++++++++++++- db/xfs_metadump.sh | 5 +- include/xfs_metadump.h | 3 + man/man8/xfs_metadump.8 | 13 +++++- 4 files changed, 111 insertions(+), 8 deletions(-) diff --git a/db/metadump.c b/db/metadump.c index 996c97ca6a2..f337493d505 100644 --- a/db/metadump.c +++ b/db/metadump.c @@ -3002,6 +3002,74 @@ _("Could not discern log; image will contain unobfuscated metadata in log.")); return !write_buf(iocur_top); } +static int +copy_external_log(void) +{ + struct xlog log; + int dirty; + xfs_daddr_t logstart; + int logblocks; + int logversion; + int cycle = XLOG_INIT_CYCLE; + int error; + + if (show_progress) + print_progress("Copying external log"); + + push_cur(); + error = set_log_cur(&typtab[TYP_LOG], + XFS_FSB_TO_DADDR(mp, mp->m_sb.sb_logstart), + mp->m_sb.sb_logblocks * blkbb, DB_RING_IGN, NULL); + if (error) + return 0; + if (iocur_top->data == NULL) { + pop_cur(); + print_warning("cannot read external 
log"); + return !stop_on_read_error; + } + + /* If not obfuscating or zeroing, just copy the log as it is */ + if (!obfuscate && !zero_stale_data) + goto done; + + dirty = xlog_is_dirty(mp, &log, &x, 0); + + switch (dirty) { + case 0: + /* clear out a clean log */ + if (show_progress) + print_progress("Zeroing clean log"); + + logstart = XFS_FSB_TO_DADDR(mp, mp->m_sb.sb_logstart); + logblocks = XFS_FSB_TO_BB(mp, mp->m_sb.sb_logblocks); + logversion = xfs_has_logv2(mp) ? 2 : 1; + if (xfs_has_crc(mp)) + cycle = log.l_curr_cycle + 1; + + libxfs_log_clear(NULL, iocur_top->data, logstart, logblocks, + &mp->m_sb.sb_uuid, logversion, + mp->m_sb.sb_logsunit, XLOG_FMT, cycle, true); + break; + case 1: + /* keep the dirty log */ + if (obfuscate) + print_warning( +_("Warning: log recovery of an obfuscated metadata image can leak " +"unobfuscated metadata and/or cause image corruption. If possible, " +"please mount the filesystem to clean the log, or disable obfuscation.")); + break; + case -1: + /* log detection error */ + if (obfuscate) + print_warning( +_("Could not discern log; image will contain unobfuscated metadata in log.")); + break; + } + +done: + return !write_buf(iocur_top); +} + static int metadump_f( int argc, @@ -3012,6 +3080,7 @@ metadump_f( int start_iocur_sp; int outfd = -1; int ret; + bool copy_external = false; char *p; exitcode = 1; @@ -3035,7 +3104,7 @@ metadump_f( return 0; } - while ((c = getopt(argc, argv, "aegm:ow")) != EOF) { + while ((c = getopt(argc, argv, "aegm:owx")) != EOF) { switch (c) { case 'a': zero_stale_data = 0; @@ -3060,6 +3129,9 @@ metadump_f( case 'w': show_warnings = 1; break; + case 'x': + copy_external = true; + break; default: print_warning("bad option for metadump command"); return 0; @@ -3071,13 +3143,23 @@ metadump_f( return 0; } + /* + * Use the old format if there are no external devices with metadata to + * dump. 
+ */ + if (mp->m_sb.sb_logstart != 0) + copy_external = false; + metablock = (xfs_metablock_t *)calloc(BBSIZE + 1, BBSIZE); if (metablock == NULL) { print_warning("memory allocation failure"); return 0; } metablock->mb_blocklog = BBSHIFT; - metablock->mb_magic = cpu_to_be32(XFS_MD_MAGIC); + if (copy_external) + metablock->mb_magic = cpu_to_be32(XFS_MDX_MAGIC); + else + metablock->mb_magic = cpu_to_be32(XFS_MD_MAGIC); /* Set flags about state of metadump */ metablock->mb_info = XFS_METADUMP_INFO_FLAGS; @@ -3165,7 +3247,7 @@ metadump_f( exitcode = 0; - for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) { + for (agno = 0; !exitcode && agno < mp->m_sb.sb_agcount; agno++) { if (!scan_ag(agno)) { exitcode = 1; break; @@ -3184,6 +3266,16 @@ metadump_f( if (!exitcode) exitcode = write_index() < 0; + /* write the external log, if desired */ + if (!exitcode && mp->m_sb.sb_logstart == 0 && copy_external) { + metablock->mb_info |= XFS_METADUMP_LOGDEV; + + if (!copy_external_log()) + exitcode = 1; + if (!exitcode) + exitcode = write_index() < 0; + } + if (progress_since_warning) fputc('\n', stdout_metadump ? stderr : stdout); diff --git a/db/xfs_metadump.sh b/db/xfs_metadump.sh index 9852a5bc2b0..06bfc4e7bd4 100755 --- a/db/xfs_metadump.sh +++ b/db/xfs_metadump.sh @@ -6,9 +6,9 @@ OPTS=" " DBOPTS=" " -USAGE="Usage: xfs_metadump [-aefFogwV] [-m max_extents] [-l logdev] source target" +USAGE="Usage: xfs_metadump [-aefFgoVwx] [-m max_extents] [-l logdev] source target" -while getopts "aefgl:m:owFV" c +while getopts "aefgl:m:owFVx" c do case $c in a) OPTS=$OPTS"-a ";; @@ -24,6 +24,7 @@ do status=$? exit $status ;; + x) OPTS=$OPTS"-x ";; \?) 
echo $USAGE 1>&2 exit 2 ;; diff --git a/include/xfs_metadump.h b/include/xfs_metadump.h index fbd9902327b..2373b0d8b50 100644 --- a/include/xfs_metadump.h +++ b/include/xfs_metadump.h @@ -8,6 +8,7 @@ #define _XFS_METADUMP_H_ #define XFS_MD_MAGIC 0x5846534d /* 'XFSM' */ +#define XFS_MDX_MAGIC 0x584d4458 /* 'XMDX' */ typedef struct xfs_metablock { __be32 mb_magic; @@ -22,5 +23,7 @@ typedef struct xfs_metablock { #define XFS_METADUMP_OBFUSCATED (1 << 1) #define XFS_METADUMP_FULLBLOCKS (1 << 2) #define XFS_METADUMP_DIRTYLOG (1 << 3) +#define XFS_METADUMP_LOGDEV (1 << 4) /* targets external log device */ +#define XFS_METADUMP_RTDEV (1 << 5) /* targets realtime volume */ #endif /* _XFS_METADUMP_H_ */ diff --git a/man/man8/xfs_metadump.8 b/man/man8/xfs_metadump.8 index c0e79d77993..b940cb084b5 100644 --- a/man/man8/xfs_metadump.8 +++ b/man/man8/xfs_metadump.8 @@ -4,7 +4,7 @@ xfs_metadump \- copy XFS filesystem metadata to a file .SH SYNOPSIS .B xfs_metadump [ -.B \-aefFgow +.B \-aefFgowx ] [ .B \-m .I max_extents @@ -123,8 +123,10 @@ is stdout. .TP .BI \-l " logdev" For filesystems which use an external log, this specifies the device where the -external log resides. The external log is not copied, only internal logs are -copied. +external log resides. +To record the contents of the external log in the dump, the +.B \-x +option must also be specified. .TP .B \-m Set the maximum size of an allowed metadata extent. Extremely large metadata @@ -138,6 +140,11 @@ Disables obfuscation of file names and extended attributes. Prints warnings of inconsistent metadata encountered to stderr. Bad metadata is still copied. .TP +.B \-x +Dump the external log device, if present. +The metadump file will not be compatible with older versions of +.BR xfs_mdrestore (1). +.TP .B \-V Prints the version number and exits. .SH DIAGNOSTICS ^ permalink raw reply related [flat|nested] 565+ messages in thread
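The format selection in the patch above comes down to one rule: the new magic is used only when `-x` was given and the filesystem has an external log (`sb_logstart == 0`). A minimal sketch of that rule follows. The two magic values are taken from the patch; the `SKETCH_` prefix and the helpers `sketch_pick_magic()` and `sketch_magic_to_str()` are made up for illustration and do not exist in xfsprogs.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MD_MAGIC		0x5846534dU	/* 'XFSM', value from the patch */
#define SKETCH_MDX_MAGIC	0x584d4458U	/* 'XMDX', value from the patch */

/*
 * Hypothetical helper: an internal log (sb_logstart != 0) forces the old
 * format, because there is no external device with metadata to dump.
 */
static uint32_t
sketch_pick_magic(uint64_t sb_logstart, bool want_external)
{
	if (sb_logstart != 0)
		want_external = false;
	return want_external ? SKETCH_MDX_MAGIC : SKETCH_MD_MAGIC;
}

/* Hypothetical helper: render a magic number as its four ASCII bytes. */
static void
sketch_magic_to_str(uint32_t magic, char out[5])
{
	out[0] = (char)(magic >> 24);
	out[1] = (char)(magic >> 16);
	out[2] = (char)(magic >> 8);
	out[3] = (char)magic;
	out[4] = '\0';
}
```

Falling back to the old magic whenever the log is internal keeps such dumps readable by older mdrestore binaries, which matches the "use the old format if there are no external devices" comment in the patch.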
* [PATCH 5/5] xfs_mdrestore: fix missed progress reporting 2022-12-30 22:19 ` [PATCHSET v1.0 0/5] xfs_metadump: support external devices Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/5] xfs_db: allow selecting logdev blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 3/5] xfs_db: metadump external log devices Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 4/5] xfs_mdrestore: restore log contents to external log devices Darrick J. Wong 2022-12-30 22:19 ` [PATCH 2/5] xfs_db: allow setting current address to log blocks Darrick J. Wong 4 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Currently, the progress reporting only triggers when the number of bytes read is exactly a multiple of a megabyte. This isn't always guaranteed, since AG headers can be 512 bytes in size. Fix the algorithm by recording the number of megabytes we've reported as being read, and emit a new report any time the bytes_read count, once converted to megabytes, doesn't match. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- mdrestore/xfs_mdrestore.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/mdrestore/xfs_mdrestore.c b/mdrestore/xfs_mdrestore.c index 4318fac9008..672010bcc6e 100644 --- a/mdrestore/xfs_mdrestore.c +++ b/mdrestore/xfs_mdrestore.c @@ -126,6 +126,7 @@ perform_restore( int mb_count; xfs_sb_t sb; int64_t bytes_read; + int64_t mb_read = 0; int log_fd = -1; bool is_mdx; @@ -205,8 +206,14 @@ perform_restore( fatal("rtdev not supported\n"); } - if (show_progress && (bytes_read & ((1 << 20) - 1)) == 0) - print_progress("%lld MB read", bytes_read >> 20); + if (show_progress) { + int64_t mb_now = bytes_read >> 20; + + if (mb_now != mb_read) { + print_progress("%lld MB read", mb_now); + mb_read = mb_now; + } + } for (cur_index = 0; cur_index < mb_count; cur_index++) { if (pwrite(write_fd, &block_buffer[cur_index << @@ -245,6 +252,9 @@ perform_restore( bytes_read += block_size + (mb_count << mbp->mb_blocklog); } + if (show_progress && bytes_read > (mb_read << 20)) + print_progress("%lld MB read", mb_read + 1); + if (progress_since_warning) putchar('\n'); ^ permalink raw reply related [flat|nested] 565+ messages in thread
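The difference between the old and new progress checks in the patch above can be shown with a small standalone sketch. `struct sketch_progress` and both helpers are hypothetical; only the `bytes_read >> 20` comparison and the old mask test are taken from the patch.

```c
#include <stdint.h>

/* Hypothetical tracker for the last megabyte count that was reported. */
struct sketch_progress {
	int64_t mb_reported;
};

/* Old test: fires only when bytes_read is an exact multiple of 1 MiB. */
static int
sketch_old_should_report(int64_t bytes_read)
{
	return (bytes_read & ((1 << 20) - 1)) == 0;
}

/*
 * New test: fires whenever the whole-megabyte count has changed since the
 * last report.  Returns 1 and stores the value to print in *mb_out.
 */
static int
sketch_new_should_report(struct sketch_progress *p, int64_t bytes_read,
		int64_t *mb_out)
{
	int64_t mb_now = bytes_read >> 20;

	if (mb_now == p->mb_reported)
		return 0;
	p->mb_reported = mb_now;
	*mb_out = mb_now;
	return 1;
}
```

Once a 512-byte read has shifted the running total off a megabyte boundary, the old test may never fire again, whereas the new one reports once per megabyte crossed no matter how the total is aligned.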
* [PATCH 4/5] xfs_mdrestore: restore log contents to external log devices 2022-12-30 22:19 ` [PATCHSET v1.0 0/5] xfs_metadump: support external devices Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 5/5] xfs_mdrestore: fix missed progress reporting Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 2/5] xfs_db: allow setting current address to log blocks Darrick J. Wong 4 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Support restoring log data to an external log device, if the dumped filesystem had one. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- man/man8/xfs_mdrestore.8 | 8 ++ mdrestore/xfs_mdrestore.c | 176 +++++++++++++++++++++++++++++++-------------- 2 files changed, 131 insertions(+), 53 deletions(-) diff --git a/man/man8/xfs_mdrestore.8 b/man/man8/xfs_mdrestore.8 index 72f3b297787..4626b98e749 100644 --- a/man/man8/xfs_mdrestore.8 +++ b/man/man8/xfs_mdrestore.8 @@ -5,12 +5,17 @@ xfs_mdrestore \- restores an XFS metadump image to a filesystem image .B xfs_mdrestore [ .B \-gi +] [ +.B \-l logdev ] .I source .I target .br .B xfs_mdrestore .B \-i +[ +.B \-l logdev +] .I source .br .B xfs_mdrestore \-V @@ -43,6 +48,9 @@ can be destroyed. .B \-g Shows restore progress on stdout. .TP +.B \-l +Restore log contents to this external log device. +.TP .B \-i Shows metadump information on stdout. If no .I target diff --git a/mdrestore/xfs_mdrestore.c b/mdrestore/xfs_mdrestore.c index 9f8cbe98cd6..4318fac9008 100644 --- a/mdrestore/xfs_mdrestore.c +++ b/mdrestore/xfs_mdrestore.c @@ -38,6 +38,67 @@ print_progress(const char *fmt, ...) 
progress_since_warning = 1; } +extern int platform_check_ismounted(char *, char *, struct stat *, int); + +static int +open_device( + char *path, + int *is_target_file) +{ + struct stat statbuf; + int open_flags = O_RDWR; + int dst_fd; + + *is_target_file = 0; + if (stat(path, &statbuf) < 0) { + /* ok, assume it's a file and create it */ + open_flags |= O_CREAT; + *is_target_file = 1; + } else if (S_ISREG(statbuf.st_mode)) { + open_flags |= O_TRUNC; + *is_target_file = 1; + } else { + /* + * check to make sure a filesystem isn't mounted on the device + */ + if (platform_check_ismounted(path, NULL, &statbuf, 0)) + fatal("a filesystem is mounted on target device \"%s\"," + " cannot restore to a mounted filesystem.\n", + path); + } + + dst_fd = open(path, open_flags, 0644); + if (dst_fd < 0) + fatal("couldn't open target \"%s\"\n", path); + + return dst_fd; +} + +static void +check_dev( + int dst_fd, + int is_target_file, + unsigned long long bytes) +{ + if (is_target_file) { + /* ensure regular files are correctly sized */ + + if (ftruncate(dst_fd, bytes)) + fatal("cannot set filesystem image size: %s\n", + strerror(errno)); + } else { + /* ensure device is sufficiently large enough */ + + char *lb[XFS_MAX_SECTORSIZE] = { NULL }; + off64_t off; + + off = bytes - sizeof(lb); + if (pwrite(dst_fd, lb, sizeof(lb), off) < 0) + fatal("failed to write last block, is target too " + "small? 
(error: %s)\n", strerror(errno)); + } +} + /* * perform_restore() -- do the actual work to restore the metadump * @@ -53,7 +114,8 @@ perform_restore( FILE *src_f, int dst_fd, int is_target_file, - const struct xfs_metablock *mbp) + const struct xfs_metablock *mbp, + char *log_path) { struct xfs_metablock *metablock; /* header + index + blocks */ __be64 *block_index; @@ -64,6 +126,10 @@ perform_restore( int mb_count; xfs_sb_t sb; int64_t bytes_read; + int log_fd = -1; + bool is_mdx; + + is_mdx = mbp->mb_magic == cpu_to_be32(XFS_MDX_MAGIC); block_size = 1 << mbp->mb_blocklog; max_indices = (block_size - sizeof(xfs_metablock_t)) / sizeof(__be64); @@ -76,6 +142,7 @@ perform_restore( if (mb_count == 0 || mb_count > max_indices) fatal("bad block count: %u\n", mb_count); + memcpy(metablock, mbp, sizeof(struct xfs_metablock)); block_index = (__be64 *)((char *)metablock + sizeof(xfs_metablock_t)); block_buffer = (char *)metablock + block_size; @@ -106,32 +173,43 @@ perform_restore( ((struct xfs_dsb*)block_buffer)->sb_inprogress = 1; - if (is_target_file) { - /* ensure regular files are correctly sized */ - - if (ftruncate(dst_fd, sb.sb_dblocks * sb.sb_blocksize)) - fatal("cannot set filesystem image size: %s\n", - strerror(errno)); - } else { - /* ensure device is sufficiently large enough */ - - char *lb[XFS_MAX_SECTORSIZE] = { NULL }; - off64_t off; - - off = sb.sb_dblocks * sb.sb_blocksize - sizeof(lb); - if (pwrite(dst_fd, lb, sizeof(lb), off) < 0) - fatal("failed to write last block, is target too " - "small? 
(error: %s)\n", strerror(errno)); - } + check_dev(dst_fd, is_target_file, sb.sb_dblocks * sb.sb_blocksize); bytes_read = 0; for (;;) { + int write_fd = dst_fd; + + if (metablock->mb_magic != mbp->mb_magic) + fatal("magic value 0x%x wrong, expected 0x%x\n", + metablock->mb_magic, mbp->mb_magic); + + if (metablock->mb_info & XFS_METADUMP_LOGDEV) { + int log_is_file; + + if (!is_mdx) + fatal("logdev set on an old style metadump?\n"); + if (log_fd == -1) { + if (!log_path) + fatal( + "metadump has log contents but -l was not specified?\n"); + log_fd = open_device(log_path, &log_is_file); + check_dev(log_fd, log_is_file, + sb.sb_logblocks * sb.sb_blocksize); + } + write_fd = log_fd; + } + if (metablock->mb_info & XFS_METADUMP_RTDEV) { + if (!is_mdx) + fatal("rtdev set on an old style metadump?\n"); + fatal("rtdev not supported\n"); + } + if (show_progress && (bytes_read & ((1 << 20) - 1)) == 0) print_progress("%lld MB read", bytes_read >> 20); for (cur_index = 0; cur_index < mb_count; cur_index++) { - if (pwrite(dst_fd, &block_buffer[cur_index << + if (pwrite(write_fd, &block_buffer[cur_index << mbp->mb_blocklog], block_size, be64_to_cpu(block_index[cur_index]) << BBSHIFT) < 0) @@ -139,11 +217,20 @@ perform_restore( be64_to_cpu(block_index[cur_index]) << BBSHIFT, strerror(errno)); } - if (mb_count < max_indices) - break; + if (is_mdx) { + size_t nr = fread(metablock, block_size, 1, src_f); - if (fread(metablock, block_size, 1, src_f) != 1) - fatal("error reading from metadump file\n"); + if (nr == 0) + break; + if (nr != 1) + fatal("error reading from extended metadump file\n"); + } else { + if (mb_count < max_indices) + break; + + if (fread(metablock, block_size, 1, src_f) != 1) + fatal("error reading from metadump file\n"); + } mb_count = be16_to_cpu(metablock->mb_count); if (mb_count == 0) @@ -170,38 +257,41 @@ perform_restore( if (pwrite(dst_fd, block_buffer, sb.sb_sectsize, 0) < 0) fatal("error writing primary superblock: %s\n", strerror(errno)); + if (log_fd >= 0) 
+ close(log_fd); + free(metablock); } static void usage(void) { - fprintf(stderr, "Usage: %s [-V] [-g] [-i] source target\n", progname); + fprintf(stderr, "Usage: %s [-V] [-g] [-i] [-l logdev] source target\n", progname); exit(1); } -extern int platform_check_ismounted(char *, char *, struct stat *, int); - int main( int argc, char **argv) { + char *log_path = NULL; FILE *src_f; int dst_fd; int c; - int open_flags; - struct stat statbuf; int is_target_file; struct xfs_metablock mb; progname = basename(argv[0]); - while ((c = getopt(argc, argv, "giV")) != EOF) { + while ((c = getopt(argc, argv, "gl:iV")) != EOF) { switch (c) { case 'g': show_progress = 1; break; + case 'l': + log_path = optarg; + break; case 'i': show_info = 1; break; @@ -238,7 +328,8 @@ main( if (fread(&mb, sizeof(mb), 1, src_f) != 1) fatal("error reading from metadump file\n"); - if (mb.mb_magic != cpu_to_be32(XFS_MD_MAGIC)) + if (mb.mb_magic != cpu_to_be32(XFS_MD_MAGIC) && + mb.mb_magic != cpu_to_be32(XFS_MDX_MAGIC)) fatal("specified file is not a metadata dump\n"); if (show_info) { @@ -260,30 +351,9 @@ main( optind++; /* check and open target */ - open_flags = O_RDWR; - is_target_file = 0; - if (stat(argv[optind], &statbuf) < 0) { - /* ok, assume it's a file and create it */ - open_flags |= O_CREAT; - is_target_file = 1; - } else if (S_ISREG(statbuf.st_mode)) { - open_flags |= O_TRUNC; - is_target_file = 1; - } else { - /* - * check to make sure a filesystem isn't mounted on the device - */ - if (platform_check_ismounted(argv[optind], NULL, &statbuf, 0)) - fatal("a filesystem is mounted on target device \"%s\"," - " cannot restore to a mounted filesystem.\n", - argv[optind]); - } + dst_fd = open_device(argv[optind], &is_target_file); - dst_fd = open(argv[optind], open_flags, 0644); - if (dst_fd < 0) - fatal("couldn't open target \"%s\"\n", argv[optind]); - - perform_restore(src_f, dst_fd, is_target_file, &mb); + perform_restore(src_f, dst_fd, is_target_file, &mb, log_path); close(dst_fd); if 
(src_f != stdin) ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 2/5] xfs_db: allow setting current address to log blocks 2022-12-30 22:19 ` [PATCHSET v1.0 0/5] xfs_metadump: support external devices Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 4/5] xfs_mdrestore: restore log contents to external log devices Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 4 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add commands so that users can target blocks on an external log device. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/block.c | 103 ++++++++++++++++++++++++++++++++++++++++++++++++++++- man/man8/xfs_db.8 | 17 +++++++++ 2 files changed, 119 insertions(+), 1 deletion(-) diff --git a/db/block.c b/db/block.c index 1afe201d0b2..98df0ce10ac 100644 --- a/db/block.c +++ b/db/block.c @@ -29,6 +29,8 @@ static int rtblock_f(int argc, char **argv); static void rtblock_help(void); static int rtextent_f(int argc, char **argv); static void rtextent_help(void); +static int logblock_f(int argc, char **argv); +static void logblock_help(void); static void print_rawdata(void *data, int len); static const cmdinfo_t ablock_cmd = @@ -49,6 +51,9 @@ static const cmdinfo_t rtblock_cmd = static const cmdinfo_t rtextent_cmd = { "rtextent", "rtx", rtextent_f, 0, 1, 1, N_("[rtxno]"), N_("set address to rtextent value"), rtextent_help }; +static const cmdinfo_t logblock_cmd = + { "logblock", "lsb", logblock_f, 0, 1, 1, N_("[logbno]"), + N_("set address to logblock value"), logblock_help }; static void ablock_help(void) @@ -116,6 +121,7 @@ block_init(void) add_command(&fsblock_cmd); add_command(&rtblock_cmd); add_command(&rtextent_cmd); + add_command(&logblock_cmd); } static void @@ -132,6 +138,7 @@ daddr_help(void) enum daddr_target { DT_DATA, DT_RT, + DT_LOG, }; static int @@ -145,18 +152,27 @@ daddr_f( xfs_rfsblock_t max_daddrs = mp->m_sb.sb_dblocks; enum daddr_target tgt = 
DT_DATA; - while ((c = getopt(argc, argv, "r")) != -1) { + while ((c = getopt(argc, argv, "rl")) != -1) { switch (c) { case 'r': tgt = DT_RT; max_daddrs = mp->m_sb.sb_rblocks; break; + case 'l': + tgt = DT_LOG; + max_daddrs = mp->m_sb.sb_logblocks; + break; default: daddr_help(); return 0; } } + if (tgt == DT_LOG && mp->m_sb.sb_logstart > 0) { + dbprintf(_("filesystem has internal log\n")); + return 0; + } + if (optind == argc) { xfs_daddr_t daddr = iocur_top->off >> BBSHIFT; @@ -191,6 +207,9 @@ daddr_f( case DT_RT: set_rt_cur(&typtab[TYP_DATA], d, 1, DB_RING_ADD, NULL); break; + case DT_LOG: + set_log_cur(&typtab[TYP_DATA], d, 1, DB_RING_ADD, NULL); + break; } return 0; } @@ -408,6 +427,88 @@ rtextent_f( return 0; } +static void +logblock_help(void) +{ + dbprintf(_( +"\n Example:\n" +"\n" +" 'logblock 1023' - sets the file position to the 1023rd log block.\n" +" The external log device or the block offset within the internal log will be\n" +" chosen as appropriate.\n" +)); +} + +static int +logblock_f( + int argc, + char **argv) +{ + xfs_fsblock_t logblock; + char *p; + + if (argc == 1) { + if (mp->m_sb.sb_logstart > 0 && iocur_is_ddev(iocur_top)) { + logblock = XFS_DADDR_TO_FSB(mp, + iocur_top->off >> BBSHIFT); + + if (logblock < mp->m_sb.sb_logstart || + logblock >= mp->m_sb.sb_logstart + + mp->m_sb.sb_logblocks) { + dbprintf( + _("current address not within internal log\n")); + return 0; + } + + dbprintf(_("current logblock is %lld\n"), + logblock - mp->m_sb.sb_logstart); + return 0; + } + + if (mp->m_sb.sb_logstart == 0 && + iocur_is_extlogdev(iocur_top)) { + logblock = XFS_BB_TO_FSB(mp, + iocur_top->off >> BBSHIFT); + + if (logblock >= mp->m_sb.sb_logblocks) { + dbprintf( + _("current address not within external log\n")); + return 0; + } + + dbprintf(_("current logblock is %lld\n"), logblock); + return 0; + } + + dbprintf(_("current address does not point to log\n")); + return 0; + } + + logblock = strtoull(argv[1], &p, 0); + if (*p != '\0') { + 
dbprintf(_("bad logblock %s\n"), argv[1]); + return 0; + } + + if (logblock >= mp->m_sb.sb_logblocks) { + dbprintf(_("bad logblock %s\n"), argv[1]); + return 0; + } + + ASSERT(typtab[TYP_DATA].typnm == TYP_DATA); + + if (mp->m_sb.sb_logstart) { + logblock += mp->m_sb.sb_logstart; + set_cur(&typtab[TYP_DATA], XFS_FSB_TO_DADDR(mp, logblock), + blkbb, DB_RING_ADD, NULL); + } else { + set_log_cur(&typtab[TYP_DATA], XFS_FSB_TO_BB(mp, logblock), + blkbb, DB_RING_ADD, NULL); + } + + return 0; +} + void print_block( const field_t *fields, diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8 index 1246eb6327c..9c1ce5d79cf 100644 --- a/man/man8/xfs_db.8 +++ b/man/man8/xfs_db.8 @@ -600,6 +600,9 @@ The type is set to .B data (uninterpreted). +If an address and the +.B \-l +option are specified, the current address is set to the external log device. If an address and the .B \-r option are specified, the current address is set to the realtime device. @@ -839,6 +842,20 @@ Start logging output to .IR filename , stop logging, or print the current logging status. .TP +.BI "logblock [" logbno ] +Set current address to the log block value given by +.IR logbno . +If no value for +.I logbno +is given the current address is printed, expressed as an fsb. +The type is set to +.B data +(uninterpreted). +If the filesystem has an external log, then the address will be within the log +device. +If the filesystem has an internal log, then the address will be within the +internal log. +.TP .BI "logformat [\-c " cycle "] [\-s " sunit "]" Reformats the log to the specified log cycle and log stripe unit. This has the effect of clearing the log destructively. ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCHSET v1.0 00/45] libxfs: shard the realtime section 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (24 preceding siblings ...) 2022-12-30 22:19 ` [PATCHSET v1.0 0/5] xfs_metadump: support external devices Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/45] xfs: define the format of rt groups Darrick J. Wong ` (44 more replies) 2022-12-30 22:19 ` [PATCHSET v1.0 0/3] libxfs: widen EFI format to support rt Darrick J. Wong ` (13 subsequent siblings) 39 siblings, 45 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs Hi all, Right now, the realtime section uses a single pair of metadata inodes to store the free space information. This presents a scalability problem since every thread trying to allocate or free rt extents has to lock these files. It would be very useful if we could begin to tackle these problems by sharding the realtime section, so create the notion of realtime groups, which are similar to allocation groups on the data section. While we're at it, define a superblock to be stamped into the start of each rt section. This enables utilities such as blkid to identify block devices containing realtime sections, and helpfully avoids the situation where a file extent can cross an rtgroup boundary. The best advantage for rtgroups will become evident later when we get to adding rmap and reflink to the realtime volume, since the geometry constraints are the same for rt groups and AGs. Hence we can reuse all that code directly. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. 
--D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-groups xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-groups fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-groups --- db/Makefile | 2 db/bit.c | 24 ++ db/bit.h | 1 db/check.c | 58 ++++- db/command.c | 2 db/convert.c | 46 +++- db/field.c | 18 + db/field.h | 11 + db/fprint.c | 11 + db/inode.c | 9 + db/metadump.c | 51 ++++ db/rtgroup.c | 169 ++++++++++++++ db/rtgroup.h | 21 ++ db/sb.c | 136 ++++++++++- db/type.c | 16 + db/type.h | 32 ++- db/xfs_metadump.sh | 5 include/libxfs.h | 1 include/xfs_arch.h | 6 include/xfs_mount.h | 12 + include/xfs_trace.h | 5 include/xfs_trans.h | 1 io/Makefile | 2 io/aginfo.c | 215 ++++++++++++++++++ io/bmap.c | 30 ++ io/fsmap.c | 23 ++ io/init.c | 1 io/io.h | 1 io/scrub.c | 15 + libfrog/div64.h | 6 libfrog/fsgeom.c | 24 ++ libfrog/fsgeom.h | 23 ++ libfrog/scrub.c | 10 + libfrog/scrub.h | 1 libfrog/util.c | 26 ++ libfrog/util.h | 3 libxfs/Makefile | 2 libxfs/defer_item.c | 17 + libxfs/init.c | 21 ++ libxfs/libxfs_api_defs.h | 6 libxfs/libxfs_io.h | 1 libxfs/libxfs_priv.h | 25 ++ libxfs/rdwr.c | 17 + libxfs/topology.c | 42 +++ libxfs/topology.h | 3 libxfs/trans.c | 29 ++ libxfs/util.c | 7 + libxfs/xfs_bmap.h | 5 libxfs/xfs_format.h | 94 ++++++++ libxfs/xfs_fs.h | 24 ++ libxfs/xfs_health.h | 30 ++ libxfs/xfs_rtbitmap.c | 126 +++++++++- libxfs/xfs_rtbitmap.h | 46 ++++ libxfs/xfs_rtgroup.c | 545 +++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtgroup.h | 241 ++++++++++++++++++++ libxfs/xfs_sb.c | 124 ++++++++++ libxfs/xfs_shared.h | 4 libxfs/xfs_types.c | 46 ++++ libxfs/xfs_types.h | 4 man/man8/mkfs.xfs.8.in | 44 ++++ man/man8/xfs_db.8 | 17 + man/man8/xfs_io.8 | 23 ++ man/man8/xfs_mdrestore.8 | 7 + man/man8/xfs_metadump.8 | 12 + man/man8/xfs_spaceman.8 | 5 mdrestore/xfs_mdrestore.c | 30 ++ mkfs/proto.c | 104 +++++++++ 
mkfs/xfs_mkfs.c | 279 +++++++++++++++++++++++ repair/agheader.c | 2 repair/incore.c | 22 ++ repair/phase3.c | 3 repair/phase6.c | 39 +++ repair/rt.c | 153 ++++++++++++- repair/rt.h | 3 repair/sb.c | 46 ++++ repair/xfs_repair.c | 11 + scrub/phase2.c | 97 ++++++++ scrub/phase8.c | 46 ++++ scrub/repair.c | 2 scrub/scrub.c | 4 scrub/scrub.h | 9 + spaceman/health.c | 59 +++++ 82 files changed, 3361 insertions(+), 132 deletions(-) create mode 100644 db/rtgroup.c create mode 100644 db/rtgroup.h create mode 100644 io/aginfo.c create mode 100644 libxfs/xfs_rtgroup.c create mode 100644 libxfs/xfs_rtgroup.h ^ permalink raw reply [flat|nested] 565+ messages in thread
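The sharding geometry described in the cover letter above comes down to a small amount of arithmetic. The following is a sketch under stated assumptions, not code from the series: struct demo_rtgeom and both functions are hypothetical names, and rounding the last group down to a whole number of rt extents is my reading of what the series does for the tail group.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry record; the field names only loosely follow the patches. */
struct demo_rtgeom {
	uint64_t rblocks;	/* total blocks in the realtime section */
	uint32_t rgblocks;	/* nominal blocks per realtime group */
	uint32_t rextsize;	/* realtime extent size, in blocks */
};

/* Number of groups needed to cover the whole section (round up). */
static uint64_t
demo_rgcount(const struct demo_rtgeom *g)
{
	assert(g->rgblocks != 0);
	return (g->rblocks + g->rgblocks - 1) / g->rgblocks;
}

/*
 * Size of one group, in blocks.  Every group is full size except the last,
 * which gets whatever remains.
 *
 * ASSUMPTION: the remainder is rounded down to a whole number of rt
 * extents, so a partial extent at the end of the section is unused.
 */
static uint64_t
demo_rg_block_count(const struct demo_rtgeom *g, uint64_t rgno)
{
	uint64_t rgcount = demo_rgcount(g);
	uint64_t tail;

	assert(rgno < rgcount);
	assert(g->rextsize != 0);

	if (rgno < rgcount - 1)
		return g->rgblocks;

	tail = g->rblocks - rgno * g->rgblocks;
	return tail - (tail % g->rextsize);
}
```

For example, a 1000-block section split into 256-block groups with 16-block extents needs four groups; the first three hold 256 blocks each and the last holds 224, leaving 8 blocks past the final whole extent unused.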
* [PATCH 02/45] xfs: define the format of rt groups 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 01/45] xfs: create incore realtime group structures Darrick J. Wong ` (43 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Define the ondisk format of realtime group metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/convert.c | 8 ---- include/libxfs.h | 1 + libfrog/util.c | 14 +++++++ libfrog/util.h | 2 + libxfs/libxfs_priv.h | 19 ++++++++++ libxfs/xfs_format.h | 62 ++++++++++++++++++++++++++++++++- libxfs/xfs_rtgroup.c | 95 ++++++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtgroup.h | 83 ++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_sb.c | 86 +++++++++++++++++++++++++++++++++++++++++++-- libxfs/xfs_shared.h | 1 + mkfs/xfs_mkfs.c | 1 + repair/sb.c | 1 + 12 files changed, 360 insertions(+), 13 deletions(-) diff --git a/db/convert.c b/db/convert.c index 0aed1437dc4..072ccc8f6ef 100644 --- a/db/convert.c +++ b/db/convert.c @@ -413,14 +413,6 @@ convert_f(int argc, char **argv) return 0; } -static inline xfs_rtblock_t -xfs_daddr_to_rtb( - struct xfs_mount *mp, - xfs_daddr_t daddr) -{ - return daddr >> mp->m_blkbb_log; -} - static inline uint64_t rt_daddr_to_rsumblock( struct xfs_mount *mp, diff --git a/include/libxfs.h b/include/libxfs.h index 26202dede67..5b58750fcd5 100644 --- a/include/libxfs.h +++ b/include/libxfs.h @@ -86,6 +86,7 @@ struct iomap; #include "xfs_ag_resv.h" #include "xfs_imeta.h" #include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" #ifndef ARRAY_SIZE #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) diff --git a/libfrog/util.c b/libfrog/util.c index 8fb10cf82f5..46047571a55 100644 --- a/libfrog/util.c +++ b/libfrog/util.c @@ -22,3 +22,17 @@ 
log2_roundup(unsigned int i) } return rval; } + +void * +memchr_inv(const void *start, int c, size_t bytes) +{ + const unsigned char *p = start; + + while (bytes > 0) { + if (*p != (unsigned char)c) + return (void *)p; + bytes--; + } + + return NULL; +} diff --git a/libfrog/util.h b/libfrog/util.h index 1b97881bf16..ac2f331c93e 100644 --- a/libfrog/util.h +++ b/libfrog/util.h @@ -8,4 +8,6 @@ unsigned int log2_roundup(unsigned int i); +void *memchr_inv(const void *start, int c, size_t bytes); + #endif /* __LIBFROG_UTIL_H__ */ diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h index aeb42837af4..57b92ac1b99 100644 --- a/libxfs/libxfs_priv.h +++ b/libxfs/libxfs_priv.h @@ -51,6 +51,7 @@ #include "kmem.h" #include "libfrog/radix-tree.h" #include "libfrog/div64.h" +#include "libfrog/util.h" #include "atomic.h" #include "spinlock.h" #include "linux-err.h" @@ -396,6 +397,24 @@ static inline unsigned long long mask64_if_power2(unsigned long b) return is_power_of_2(b) ? b - 1 : 0; } +/* If @b is a power of 2, return log2(b). Else return zero. */ +static inline unsigned int log2_if_power(unsigned long b) +{ + unsigned long mask = 1; + unsigned int i; + unsigned int ret = 1; + + if (!is_power_of_2(b)) + return 0; + + for (i = 0; i < NBBY * sizeof(unsigned long); i++, mask <<= 1) { + if (b & mask) + ret = i; + } + + return ret; +} + /* buffer management */ #define XBF_TRYLOCK 0 #define XBF_UNMAPPED 0 diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index ca87a3f8704..a38e1499bd4 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -216,7 +216,17 @@ struct xfs_dsb { * pointers are no longer used. */ __be64 sb_rbmino; - __be64 sb_rsumino; /* summary inode for rt bitmap */ + /* + * rtgroups requires metadir, so we reuse the rsumino space to hold + * the rg block count and shift values. 
+ */ + union { + __be64 sb_rsumino; /* summary inode for rt bitmap */ + struct { + __be32 sb_rgcount; /* # of realtime groups */ + __be32 sb_rgblocks; /* rtblocks per group */ + }; + }; __be32 sb_rextsize; /* realtime extent size, blocks */ __be32 sb_agblocks; /* size of an allocation group */ __be32 sb_agcount; /* number of allocation groups */ @@ -397,6 +407,7 @@ xfs_sb_has_ro_compat_feature( #define XFS_SB_FEAT_INCOMPAT_BIGTIME (1 << 3) /* large timestamps */ #define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4) /* needs xfs_repair */ #define XFS_SB_FEAT_INCOMPAT_NREXT64 (1 << 5) /* large extent counters */ +#define XFS_SB_FEAT_INCOMPAT_RTGROUPS (1 << 30) /* realtime groups */ #define XFS_SB_FEAT_INCOMPAT_METADIR (1U << 31) /* metadata dir tree */ #define XFS_SB_FEAT_INCOMPAT_ALL \ (XFS_SB_FEAT_INCOMPAT_FTYPE| \ @@ -741,6 +752,55 @@ union xfs_suminfo_ondisk { __u32 raw; }; +/* + * Realtime allocation groups break the rt section into multiple pieces that + * could be locked independently. Realtime block group numbers are 32-bit + * quantities. Block numbers within a group are also 32-bit quantities, but + * the upper bit must never be set. + */ +#define XFS_MAX_RGBLOCKS ((xfs_rgblock_t)(1U << 31) - 1) +#define XFS_MAX_RGNUMBER ((xfs_rgnumber_t)(-1U)) + +#define XFS_RTSB_MAGIC 0x58524750 /* 'XRGP' */ + +/* + * Realtime superblock - on disk version. Must be padded to 64 bit alignment. + * The first block of each realtime group contains this superblock; this is + * how we avoid having file data extents cross a group boundary. 
+ */ +struct xfs_rtsb { + __be32 rsb_magicnum; /* magic number == XFS_RTSB_MAGIC */ + __be32 rsb_blocksize; /* logical block size, bytes */ + __be64 rsb_rblocks; /* number of realtime blocks */ + + __be64 rsb_rextents; /* number of realtime extents */ + __be64 rsb_lsn; /* last write sequence */ + + __be32 rsb_rgcount; /* # of realtime groups */ + char rsb_fname[XFSLABEL_MAX]; /* rt volume name */ + + uuid_t rsb_uuid; /* user-visible file system unique id */ + + __be32 rsb_rextsize; /* realtime extent size, blocks */ + __be32 rsb_rbmblocks; /* number of rt bitmap blocks */ + + __be32 rsb_rgblocks; /* rt blocks per group */ + __u8 rsb_blocklog; /* log2 of sb_blocksize */ + __u8 rsb_sectlog; /* log2 of sb_sectsize */ + __u8 rsb_rextslog; /* log2 of sb_rextents */ + __u8 rsb_pad; + + __le32 rsb_crc; /* superblock crc */ + __le32 rsb_pad2; + + uuid_t rsb_meta_uuid; /* metadata file system unique id */ + + /* must be padded to 64 bit alignment */ +}; + +#define XFS_RTSB_CRC_OFF offsetof(struct xfs_rtsb, rsb_crc) +#define XFS_RTSB_DADDR ((xfs_daddr_t)0) /* daddr in rt section */ + /* * XFS Timestamps * ============== diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c index 7d26ef76d3e..ef1d1f29d64 100644 --- a/libxfs/xfs_rtgroup.c +++ b/libxfs/xfs_rtgroup.c @@ -210,3 +210,98 @@ xfs_rtgroup_block_count( return __xfs_rtgroup_block_count(mp, rgno, mp->m_sb.sb_rgcount, mp->m_sb.sb_rblocks); } + +static xfs_failaddr_t +xfs_rtsb_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_mount; + struct xfs_rtsb *rsb = bp->b_addr; + + if (!xfs_verify_magic(bp, rsb->rsb_magicnum)) + return __this_address; + if (be32_to_cpu(rsb->rsb_blocksize) != mp->m_sb.sb_blocksize) + return __this_address; + if (be64_to_cpu(rsb->rsb_rblocks) != mp->m_sb.sb_rblocks) + return __this_address; + + if (be64_to_cpu(rsb->rsb_rextents) != mp->m_sb.sb_rextents) + return __this_address; + + if (!uuid_equal(&rsb->rsb_uuid, &mp->m_sb.sb_uuid)) + return __this_address; + + if 
(be32_to_cpu(rsb->rsb_rgcount) != mp->m_sb.sb_rgcount) + return __this_address; + + if (be32_to_cpu(rsb->rsb_rextsize) != mp->m_sb.sb_rextsize) + return __this_address; + if (be32_to_cpu(rsb->rsb_rbmblocks) != mp->m_sb.sb_rbmblocks) + return __this_address; + + if (be32_to_cpu(rsb->rsb_rgblocks) != mp->m_sb.sb_rgblocks) + return __this_address; + if (rsb->rsb_blocklog != mp->m_sb.sb_blocklog) + return __this_address; + if (rsb->rsb_sectlog != mp->m_sb.sb_sectlog) + return __this_address; + if (rsb->rsb_rextslog != mp->m_sb.sb_rextslog) + return __this_address; + if (rsb->rsb_pad) + return __this_address; + + if (rsb->rsb_pad2) + return __this_address; + + if (!uuid_equal(&rsb->rsb_meta_uuid, &mp->m_sb.sb_meta_uuid)) + return __this_address; + + /* Everything to the end of the fs block must be zero */ + if (memchr_inv(rsb + 1, 0, BBTOB(bp->b_length) - sizeof(*rsb))) + return __this_address; + + return NULL; +} + +static void +xfs_rtsb_read_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + if (!xfs_buf_verify_cksum(bp, XFS_RTSB_CRC_OFF)) + xfs_verifier_error(bp, -EFSBADCRC, __this_address); + else { + fa = xfs_rtsb_verify(bp); + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + } +} + +static void +xfs_rtsb_write_verify( + struct xfs_buf *bp) +{ + struct xfs_rtsb *rsb = bp->b_addr; + struct xfs_buf_log_item *bip = bp->b_log_item; + xfs_failaddr_t fa; + + fa = xfs_rtsb_verify(bp); + if (fa) { + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + return; + } + + if (bip) + rsb->rsb_lsn = cpu_to_be64(bip->bli_item.li_lsn); + + xfs_buf_update_cksum(bp, XFS_RTSB_CRC_OFF); +} + +const struct xfs_buf_ops xfs_rtsb_buf_ops = { + .name = "xfs_rtsb", + .magic = { 0, cpu_to_be32(XFS_RTSB_MAGIC) }, + .verify_read = xfs_rtsb_read_verify, + .verify_write = xfs_rtsb_write_verify, + .verify_struct = xfs_rtsb_verify, +}; diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h index f414218a66f..ff9b01d8c50 100644 --- a/libxfs/xfs_rtgroup.h +++ b/libxfs/xfs_rtgroup.h @@ -111,6 
+111,89 @@ xfs_verify_rgbext( return xfs_verify_rgbno(rtg, rgbno + len - 1); } +static inline xfs_rtblock_t +xfs_rgbno_to_rtb( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + xfs_rgblock_t rgbno) +{ + ASSERT(xfs_has_rtgroups(mp)); + + if (mp->m_rgblklog >= 0) + return ((xfs_rtblock_t)rgno << mp->m_rgblklog) | rgbno; + + return ((xfs_rtblock_t)rgno * mp->m_sb.sb_rgblocks) + rgbno; +} + +static inline xfs_rgnumber_t +xfs_rtb_to_rgno( + struct xfs_mount *mp, + xfs_rtblock_t rtbno) +{ + ASSERT(xfs_has_rtgroups(mp)); + + if (mp->m_rgblklog >= 0) + return rtbno >> mp->m_rgblklog; + + return div_u64(rtbno, mp->m_sb.sb_rgblocks); +} + +static inline xfs_rgblock_t +xfs_rtb_to_rgbno( + struct xfs_mount *mp, + xfs_rtblock_t rtbno, + xfs_rgnumber_t *rgno) +{ + uint32_t rem; + + ASSERT(xfs_has_rtgroups(mp)); + + if (mp->m_rgblklog >= 0) { + *rgno = rtbno >> mp->m_rgblklog; + return rtbno & mp->m_rgblkmask; + } + + *rgno = div_u64_rem(rtbno, mp->m_sb.sb_rgblocks, &rem); + return rem; +} + +static inline xfs_daddr_t +xfs_rtb_to_daddr( + struct xfs_mount *mp, + xfs_rtblock_t rtbno) +{ + return rtbno << mp->m_blkbb_log; +} + +static inline xfs_rtblock_t +xfs_daddr_to_rtb( + struct xfs_mount *mp, + xfs_daddr_t daddr) +{ + return daddr >> mp->m_blkbb_log; +} + +static inline xfs_rgnumber_t +xfs_daddr_to_rgno( + struct xfs_mount *mp, + xfs_daddr_t daddr) +{ + xfs_rtblock_t rtb = daddr >> mp->m_blkbb_log; + + return xfs_rtb_to_rgno(mp, rtb); +} + +static inline xfs_rgblock_t +xfs_daddr_to_rgbno( + struct xfs_mount *mp, + xfs_daddr_t daddr) +{ + xfs_rtblock_t rtb = daddr >> mp->m_blkbb_log; + xfs_rgnumber_t rgno; + + return xfs_rtb_to_rgbno(mp, rtb, &rgno); +} + #ifdef CONFIG_XFS_RT xfs_rgblock_t xfs_rtgroup_block_count(struct xfs_mount *mp, xfs_rgnumber_t rgno); diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c index ac2e9f91989..aec147fe5f8 100644 --- a/libxfs/xfs_sb.c +++ b/libxfs/xfs_sb.c @@ -174,6 +174,8 @@ xfs_sb_version_to_features( features |= XFS_FEAT_NREXT64; if 
(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR) features |= XFS_FEAT_METADIR; + if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_RTGROUPS) + features |= XFS_FEAT_RTGROUPS; return features; } @@ -300,6 +302,64 @@ xfs_validate_sb_write( return 0; } +static int +xfs_validate_sb_rtgroups( + struct xfs_mount *mp, + struct xfs_sb *sbp) +{ + uint64_t groups; + + if (!(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)) { + xfs_warn(mp, +"Realtime groups require metadata directory tree."); + return -EINVAL; + } + + if (sbp->sb_rgblocks > XFS_MAX_RGBLOCKS) { + xfs_warn(mp, +"Realtime group size (%u) must be less than %u.", + sbp->sb_rgblocks, XFS_MAX_RGBLOCKS); + return -EINVAL; + } + + if (sbp->sb_rextsize == 0) { + xfs_warn(mp, +"Realtime extent size must not be zero."); + return -EINVAL; + } + + if (sbp->sb_rgblocks % sbp->sb_rextsize != 0) { + xfs_warn(mp, +"Realtime group size (%u) must be an even multiple of extent size (%u).", + sbp->sb_rgblocks, sbp->sb_rextsize); + return -EINVAL; + } + + if (sbp->sb_rgblocks < (sbp->sb_rextsize << 1)) { + xfs_warn(mp, +"Realtime group size (%u) must be greater than 1 rt extent.", + sbp->sb_rgblocks); + return -EINVAL; + } + + if (sbp->sb_rgcount > XFS_MAX_RGNUMBER) { + xfs_warn(mp, +"Realtime groups (%u) must be less than %u.", + sbp->sb_rgcount, XFS_MAX_RGNUMBER); + return -EINVAL; + } + + groups = howmany_64(sbp->sb_rblocks, sbp->sb_rgblocks); + if (groups != sbp->sb_rgcount) { + xfs_warn(mp, +"Realtime groups (%u) do not cover the entire rt section; need (%llu) groups.", + sbp->sb_rgcount, groups); + return -EINVAL; + } + + return 0; +} + /* Check the validity of the SB. 
*/ STATIC int xfs_validate_sb_common( @@ -311,6 +371,7 @@ xfs_validate_sb_common( uint32_t agcount = 0; uint32_t rem; bool has_dalign; + int error; if (!xfs_verify_magic(bp, dsb->sb_magicnum)) { xfs_warn(mp, @@ -360,6 +421,12 @@ xfs_validate_sb_common( return -EINVAL; } } + + if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_RTGROUPS) { + error = xfs_validate_sb_rtgroups(mp, sbp); + if (error) + return error; + } } else if (sbp->sb_qflags & (XFS_PQUOTA_ENFD | XFS_GQUOTA_ENFD | XFS_PQUOTA_CHKD | XFS_GQUOTA_CHKD)) { xfs_notice(mp, @@ -640,8 +707,13 @@ __xfs_sb_from_disk( to->sb_pquotino = NULLFSINO; } - to->sb_rgcount = 0; - to->sb_rgblocks = 0; + if (to->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_RTGROUPS) { + to->sb_rgcount = be32_to_cpu(from->sb_rgcount); + to->sb_rgblocks = be32_to_cpu(from->sb_rgblocks); + } else { + to->sb_rgcount = 0; + to->sb_rgblocks = 0; + } } void @@ -801,6 +873,12 @@ xfs_sb_to_disk( to->sb_gquotino = cpu_to_be64(NULLFSINO); to->sb_pquotino = cpu_to_be64(NULLFSINO); } + + if (from->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_RTGROUPS) { + /* must come after setting to_rsumino */ + to->sb_rgcount = cpu_to_be32(from->sb_rgcount); + to->sb_rgblocks = cpu_to_be32(from->sb_rgblocks); + } } /* @@ -955,8 +1033,8 @@ xfs_sb_mount_common( mp->m_blockwmask = mp->m_blockwsize - 1; mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize); mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize); - mp->m_rgblklog = 0; - mp->m_rgblkmask = 0; + mp->m_rgblklog = log2_if_power2(sbp->sb_rgblocks); + mp->m_rgblkmask = mask64_if_power2(sbp->sb_rgblocks); mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 1); mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 0); diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h index 46754fe5736..e76e735b1d0 100644 --- a/libxfs/xfs_shared.h +++ b/libxfs/xfs_shared.h @@ -39,6 +39,7 @@ extern const struct xfs_buf_ops xfs_inode_buf_ra_ops; extern const struct xfs_buf_ops xfs_refcountbt_buf_ops; extern 
const struct xfs_buf_ops xfs_rmapbt_buf_ops; extern const struct xfs_buf_ops xfs_rtbuf_ops; +extern const struct xfs_buf_ops xfs_rtsb_buf_ops; extern const struct xfs_buf_ops xfs_sb_buf_ops; extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops; extern const struct xfs_buf_ops xfs_symlink_buf_ops; diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c index df8acf221ac..0324daaad3a 100644 --- a/mkfs/xfs_mkfs.c +++ b/mkfs/xfs_mkfs.c @@ -3,6 +3,7 @@ * Copyright (c) 2000-2005 Silicon Graphics, Inc. * All Rights Reserved. */ +#include <stddef.h> #include "libfrog/util.h" #include "libxfs.h" #include <ctype.h> diff --git a/repair/sb.c b/repair/sb.c index c5dbc6c2062..6e7f448596e 100644 --- a/repair/sb.c +++ b/repair/sb.c @@ -3,6 +3,7 @@ * Copyright (c) 2000-2003,2005 Silicon Graphics, Inc. * All Rights Reserved. */ +#include <stddef.h> #include "libfrog/util.h" #include "libxfs.h" #include "libxcmd.h" ^ permalink raw reply related [flat|nested] 565+ messages in thread
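Two ideas from the patch above are easier to follow outside the diff: converting between a group-relative block number and a section-wide rt block number, and scanning a buffer for the first byte that differs from an expected fill value. The sketch below is illustrative only. All demo_* names are hypothetical, and it assumes a negative rgblklog is the sentinel for a group size that is not a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mount geometry; rgblklog < 0 means "not a power of two". */
struct demo_mount {
	uint32_t rgblocks;	/* blocks per realtime group */
	int	 rgblklog;	/* log2(rgblocks), or -1 (ASSUMED sentinel) */
	uint64_t rgblkmask;	/* rgblocks - 1 when rgblocks is a power of two */
};

static void
demo_mount_init(struct demo_mount *mp, uint32_t rgblocks)
{
	assert(rgblocks != 0);
	mp->rgblocks = rgblocks;
	mp->rgblklog = -1;
	mp->rgblkmask = 0;

	if ((rgblocks & (rgblocks - 1)) == 0) {
		int log = 0;

		while ((1u << log) != rgblocks)
			log++;
		mp->rgblklog = log;
		mp->rgblkmask = (uint64_t)rgblocks - 1;
	}
}

/* (group, block in group) -> section-wide rt block number. */
static uint64_t
demo_rgbno_to_rtb(const struct demo_mount *mp, uint32_t rgno, uint32_t rgbno)
{
	if (mp->rgblklog >= 0)
		return ((uint64_t)rgno << mp->rgblklog) | rgbno;
	return (uint64_t)rgno * mp->rgblocks + rgbno;
}

/* Section-wide rt block number -> block in group; the group goes in *rgno. */
static uint32_t
demo_rtb_to_rgbno(const struct demo_mount *mp, uint64_t rtb, uint32_t *rgno)
{
	if (mp->rgblklog >= 0) {
		*rgno = (uint32_t)(rtb >> mp->rgblklog);
		return (uint32_t)(rtb & mp->rgblkmask);
	}
	*rgno = (uint32_t)(rtb / mp->rgblocks);
	return (uint32_t)(rtb % mp->rgblocks);
}

/*
 * Return a pointer to the first byte that is not c, or NULL if all bytes
 * match.  The cursor advances together with the remaining byte count.
 */
static const void *
demo_memchr_inv(const void *start, int c, size_t bytes)
{
	const unsigned char *p = start;

	for (; bytes > 0; p++, bytes--) {
		if (*p != (unsigned char)c)
			return p;
	}
	return NULL;
}
```

The shift-and-mask path is only valid when the group size is a power of two; the multiply and divide path handles every other size. The byte scan is the kind of check a verifier can use to confirm that everything after a fixed-size header is zero.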
* [PATCH 01/45] xfs: create incore realtime group structures 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/45] xfs: define the format of rt groups Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 05/45] xfs: grow the realtime section when realtime groups are enabled Darrick J. Wong ` (42 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create an incore object that will contain information about a realtime allocation group. This will eventually enable us to shard the realtime section in a similar manner to how we shard the data section. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/xfs_mount.h | 8 ++ include/xfs_trace.h | 5 + libxfs/Makefile | 2 libxfs/init.c | 21 +++++ libxfs/libxfs_api_defs.h | 2 libxfs/xfs_format.h | 8 ++ libxfs/xfs_rtgroup.c | 212 ++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtgroup.h | 121 ++++++++++++++++++++++++++ libxfs/xfs_sb.c | 5 + libxfs/xfs_types.h | 4 + 10 files changed, 388 insertions(+) create mode 100644 libxfs/xfs_rtgroup.c create mode 100644 libxfs/xfs_rtgroup.h diff --git a/include/xfs_mount.h b/include/xfs_mount.h index 6de360d33d3..5987650c639 100644 --- a/include/xfs_mount.h +++ b/include/xfs_mount.h @@ -68,6 +68,7 @@ typedef struct xfs_mount { uint8_t m_sectbb_log; /* sectorlog - BBSHIFT */ uint8_t m_agno_log; /* log #ag's */ int8_t m_rtxblklog; /* log2 of rextsize, if possible */ + int8_t m_rgblklog; /* log2 of rt group sz if possible */ uint m_blockmask; /* sb_blocksize-1 */ uint m_blockwsize; /* sb_blocksize in words */ uint m_blockwmask; /* blockwsize-1 */ @@ -88,8 +89,10 @@ typedef struct xfs_mount { uint m_alloc_set_aside; /* space we can't use */ uint m_ag_max_usable; /* max space per AG */ struct radix_tree_root m_perag_tree; + 
struct radix_tree_root m_rtgroup_tree; uint64_t m_features; /* active filesystem features */ uint64_t m_rtxblkmask; /* rt extent block mask */ + uint64_t m_rgblkmask; /* rt group block mask */ unsigned long m_opstate; /* dynamic state flags */ bool m_finobt_nores; /* no per-AG finobt resv. */ uint m_qflags; /* quota status flags */ @@ -126,6 +129,7 @@ typedef struct xfs_mount { */ atomic64_t m_allocbt_blks; spinlock_t m_perag_lock; /* lock for m_perag_tree */ + spinlock_t m_rtgroup_lock; /* lock for m_rtgroup_tree */ } xfs_mount_t; @@ -165,6 +169,7 @@ typedef struct xfs_mount { #define XFS_FEAT_NEEDSREPAIR (1ULL << 25) /* needs xfs_repair */ #define XFS_FEAT_NREXT64 (1ULL << 26) /* large extent counters */ #define XFS_FEAT_METADIR (1ULL << 27) /* metadata directory tree */ +#define XFS_FEAT_RTGROUPS (1ULL << 28) /* realtime groups */ #define __XFS_HAS_FEAT(name, NAME) \ static inline bool xfs_has_ ## name (struct xfs_mount *mp) \ @@ -210,6 +215,7 @@ __XFS_HAS_FEAT(bigtime, BIGTIME) __XFS_HAS_FEAT(needsrepair, NEEDSREPAIR) __XFS_HAS_FEAT(large_extent_counts, NREXT64) __XFS_HAS_FEAT(metadir, METADIR) +__XFS_HAS_FEAT(rtgroups, RTGROUPS) /* Kernel mount features that we don't support */ #define __XFS_UNSUPP_FEAT(name) \ @@ -230,6 +236,7 @@ __XFS_UNSUPP_FEAT(grpid) #define XFS_OPSTATE_DEBUGGER 1 /* is this the debugger? */ #define XFS_OPSTATE_REPORT_CORRUPTION 2 /* report buffer corruption? */ #define XFS_OPSTATE_PERAG_DATA_LOADED 3 /* per-AG data initialized? */ +#define XFS_OPSTATE_RTGROUP_DATA_LOADED 4 /* rtgroup data initialized? 
*/ #define __XFS_IS_OPSTATE(name, NAME) \ static inline bool xfs_is_ ## name (struct xfs_mount *mp) \ @@ -255,6 +262,7 @@ __XFS_IS_OPSTATE(inode32, INODE32) __XFS_IS_OPSTATE(debugger, DEBUGGER) __XFS_IS_OPSTATE(reporting_corruption, REPORT_CORRUPTION) __XFS_IS_OPSTATE(perag_data_loaded, PERAG_DATA_LOADED) +__XFS_IS_OPSTATE(rtgroup_data_loaded, RTGROUP_DATA_LOADED) #define __XFS_UNSUPP_OPSTATE(name) \ static inline bool xfs_is_ ## name (struct xfs_mount *mp) \ diff --git a/include/xfs_trace.h b/include/xfs_trace.h index fef869dbea3..4c73f86d8f0 100644 --- a/include/xfs_trace.h +++ b/include/xfs_trace.h @@ -332,6 +332,11 @@ #define trace_xfs_rmap_map_error(...) ((void) 0) #define trace_xfs_rmap_delete_error(...) ((void) 0) +/* set c = c to avoid unused var warnings */ +#define trace_xfs_rtgroup_bump(...) ((void) 0) +#define trace_xfs_rtgroup_get(a,b,c,d) ((c) = (c)) +#define trace_xfs_rtgroup_put(a,b,c,d) ((c) = (c)) + #define trace_xfs_swapext_defer(...) ((void) 0) #define trace_xfs_swapext_delta_nextents(...) ((void) 0) #define trace_xfs_swapext_delta_nextents_step(...) 
((void) 0) diff --git a/libxfs/Makefile b/libxfs/Makefile index 5d6e1c7bcc2..1bd8a2ab01d 100644 --- a/libxfs/Makefile +++ b/libxfs/Makefile @@ -57,6 +57,7 @@ HFILES = \ xfs_rmap.h \ xfs_rmap_btree.h \ xfs_rtbitmap.h \ + xfs_rtgroup.h \ xfs_sb.h \ xfs_shared.h \ xfs_swapext.h \ @@ -111,6 +112,7 @@ CFILES = cache.c \ xfs_rmap.c \ xfs_rmap_btree.c \ xfs_rtbitmap.c \ + xfs_rtgroup.c \ xfs_sb.c \ xfs_swapext.c \ xfs_symlink_remote.c \ diff --git a/libxfs/init.c b/libxfs/init.c index a440943cbdb..c7f10823870 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -25,6 +25,7 @@ #include "xfile.h" #include "libxfs.h" /* for now */ +#include "xfs_rtgroup.h" #ifndef HAVE_LIBURCU_ATOMIC64 pthread_mutex_t atomic64_lock = PTHREAD_MUTEX_INITIALIZER; @@ -839,7 +840,9 @@ libxfs_mount( { struct xfs_buf *bp; struct xfs_sb *sbp; + struct xfs_rtgroup *rtg; xfs_daddr_t d; + xfs_rgnumber_t rgno; unsigned int btflags = 0; int error; @@ -857,9 +860,11 @@ libxfs_mount( xfs_set_inode32(mp); mp->m_sb = *sb; INIT_RADIX_TREE(&mp->m_perag_tree, GFP_KERNEL); + INIT_RADIX_TREE(&mp->m_rtgroup_tree, GFP_KERNEL); sbp = &mp->m_sb; spin_lock_init(&mp->m_sb_lock); spin_lock_init(&mp->m_agirotor_lock); + spin_lock_init(&mp->m_rtgroup_lock); xfs_sb_mount_common(mp, sb); @@ -987,6 +992,20 @@ libxfs_mount( libxfs_mountfs_imeta(mp); + error = libxfs_initialize_rtgroups(mp, sbp->sb_rgcount); + if (error) { + fprintf(stderr, _("%s: rtgroup init failed\n"), + progname); + exit(1); + } + + for_each_rtgroup(mp, rgno, rtg) { + rtg->rtg_blockcount = xfs_rtgroup_block_count(mp, + rtg->rtg_rgno); + } + + xfs_set_rtgroup_data_loaded(mp); + return mp; out_da: xfs_da_unmount(mp); @@ -1120,6 +1139,8 @@ libxfs_umount( * Only try to free the per-AG structures if we set them up in the * first place. 
*/ + if (xfs_is_rtgroup_data_loaded(mp)) + xfs_free_rtgroups(mp); if (xfs_is_perag_data_loaded(mp)) libxfs_free_perag(mp); diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index ca9144dd949..7ce9686c00d 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -138,6 +138,7 @@ #define xfs_fixed_inode_reset libxfs_fixed_inode_reset #define xfs_free_extent libxfs_free_extent #define xfs_free_perag libxfs_free_perag +#define xfs_free_rtgroups libxfs_free_rtgroups #define xfs_fs_geometry libxfs_fs_geometry #define xfs_get_projid libxfs_get_projid #define xfs_get_initial_prid libxfs_get_initial_prid @@ -174,6 +175,7 @@ #define xfs_initialize_perag libxfs_initialize_perag #define xfs_initialize_perag_data libxfs_initialize_perag_data +#define xfs_initialize_rtgroups libxfs_initialize_rtgroups #define xfs_init_local_fork libxfs_init_local_fork #define xfs_inobt_maxrecs libxfs_inobt_maxrecs diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index 946870eb492..ca87a3f8704 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -184,6 +184,14 @@ typedef struct xfs_sb { */ xfs_ino_t sb_metadirino; + /* + * Realtime group geometry information. On disk these fields live in + * the rsumino slot, but we cache them separately in the in-core super + * for easy access. + */ + xfs_rgblock_t sb_rgblocks; /* size of a realtime group */ + xfs_rgnumber_t sb_rgcount; /* number of realtime groups */ + /* must be padded to 64 bit alignment */ } xfs_sb_t; diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c new file mode 100644 index 00000000000..7d26ef76d3e --- /dev/null +++ b/libxfs/xfs_rtgroup.c @@ -0,0 +1,212 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +#include "libxfs_priv.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_trans_resv.h" +#include "xfs_bit.h" +#include "xfs_sb.h" +#include "xfs_mount.h" +#include "xfs_btree.h" +#include "xfs_alloc_btree.h" +#include "xfs_rmap_btree.h" +#include "xfs_alloc.h" +#include "xfs_ialloc.h" +#include "xfs_rmap.h" +#include "xfs_ag.h" +#include "xfs_ag_resv.h" +#include "xfs_health.h" +#include "xfs_bmap.h" +#include "xfs_defer.h" +#include "xfs_log_format.h" +#include "xfs_trans.h" +#include "xfs_trace.h" +#include "xfs_inode.h" +#include "xfs_rtgroup.h" +#include "xfs_rtbitmap.h" + +/* + * Passive reference counting access wrappers to the rtgroup structures. If + * the rtgroup structure is to be freed, the freeing code is responsible for + * cleaning up objects with passive references before freeing the structure. + */ +struct xfs_rtgroup * +xfs_rtgroup_get( + struct xfs_mount *mp, + xfs_rgnumber_t rgno) +{ + struct xfs_rtgroup *rtg; + int ref = 0; + + rcu_read_lock(); + rtg = radix_tree_lookup(&mp->m_rtgroup_tree, rgno); + if (rtg) { + ASSERT(atomic_read(&rtg->rtg_ref) >= 0); + ref = atomic_inc_return(&rtg->rtg_ref); + } + rcu_read_unlock(); + trace_xfs_rtgroup_get(mp, rgno, ref, _RET_IP_); + return rtg; +} + +struct xfs_rtgroup * +xfs_rtgroup_bump( + struct xfs_rtgroup *rtg) +{ + if (!atomic_inc_not_zero(&rtg->rtg_ref)) { + ASSERT(0); + return NULL; + } + + trace_xfs_rtgroup_bump(rtg->rtg_mount, rtg->rtg_rgno, + atomic_read(&rtg->rtg_ref), _RET_IP_); + return rtg; +} + +void +xfs_rtgroup_put( + struct xfs_rtgroup *rtg) +{ + int ref; + + ASSERT(atomic_read(&rtg->rtg_ref) > 0); + ref = atomic_dec_return(&rtg->rtg_ref); + trace_xfs_rtgroup_put(rtg->rtg_mount, rtg->rtg_rgno, ref, _RET_IP_); +} + +int +xfs_initialize_rtgroups( + struct xfs_mount *mp, + xfs_rgnumber_t rgcount) +{ + struct xfs_rtgroup *rtg; + xfs_rgnumber_t index; + xfs_rgnumber_t first_initialised = NULLRGNUMBER; + int error; + + if 
(!xfs_has_rtgroups(mp)) + return 0; + + /* + * Walk the current rtgroup tree so we don't try to initialise rt + * groups that already exist (growfs case). Allocate and insert all the + * rtgroups we don't find ready for initialisation. + */ + for (index = 0; index < rgcount; index++) { + rtg = xfs_rtgroup_get(mp, index); + if (rtg) { + xfs_rtgroup_put(rtg); + continue; + } + + rtg = kmem_zalloc(sizeof(struct xfs_rtgroup), KM_MAYFAIL); + if (!rtg) { + error = -ENOMEM; + goto out_unwind_new_rtgs; + } + rtg->rtg_rgno = index; + rtg->rtg_mount = mp; + + error = radix_tree_preload(GFP_NOFS); + if (error) + goto out_free_rtg; + + spin_lock(&mp->m_rtgroup_lock); + if (radix_tree_insert(&mp->m_rtgroup_tree, index, rtg)) { + WARN_ON_ONCE(1); + spin_unlock(&mp->m_rtgroup_lock); + radix_tree_preload_end(); + error = -EEXIST; + goto out_free_rtg; + } + spin_unlock(&mp->m_rtgroup_lock); + radix_tree_preload_end(); + +#ifdef __KERNEL__ + /* Place kernel structure only init below this point. */ + spin_lock_init(&rtg->rtg_state_lock); +#endif /* __KERNEL__ */ + + /* first new rtg is fully initialized */ + if (first_initialised == NULLRGNUMBER) + first_initialised = index; + } + + return 0; + +out_free_rtg: + kmem_free(rtg); +out_unwind_new_rtgs: + /* unwind any prior newly initialized rtgs */ + for (index = first_initialised; index < rgcount; index++) { + rtg = radix_tree_delete(&mp->m_rtgroup_tree, index); + if (!rtg) + break; + kmem_free(rtg); + } + return error; +} + +STATIC void +__xfs_free_rtgroups( + struct rcu_head *head) +{ + struct xfs_rtgroup *rtg; + + rtg = container_of(head, struct xfs_rtgroup, rcu_head); + kmem_free(rtg); +} + +/* + * Free up the rtgroup resources associated with the mount structure. 
+ */ +void +xfs_free_rtgroups( + struct xfs_mount *mp) +{ + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + + if (!xfs_has_rtgroups(mp)) + return; + + for (rgno = 0; rgno < mp->m_sb.sb_rgcount; rgno++) { + spin_lock(&mp->m_rtgroup_lock); + rtg = radix_tree_delete(&mp->m_rtgroup_tree, rgno); + spin_unlock(&mp->m_rtgroup_lock); + ASSERT(rtg); + XFS_IS_CORRUPT(rtg->rtg_mount, atomic_read(&rtg->rtg_ref) != 0); + + call_rcu(&rtg->rcu_head, __xfs_free_rtgroups); + } +} + +/* Find the size of the rtgroup, in blocks. */ +static xfs_rgblock_t +__xfs_rtgroup_block_count( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + xfs_rgnumber_t rgcount, + xfs_rfsblock_t rblocks) +{ + ASSERT(rgno < rgcount); + + if (rgno < rgcount - 1) + return mp->m_sb.sb_rgblocks; + return xfs_rtb_rounddown_rtx(mp, + rblocks - (rgno * mp->m_sb.sb_rgblocks)); +} + +/* Compute the number of blocks in this realtime group. */ +xfs_rgblock_t +xfs_rtgroup_block_count( + struct xfs_mount *mp, + xfs_rgnumber_t rgno) +{ + return __xfs_rtgroup_block_count(mp, rgno, mp->m_sb.sb_rgcount, + mp->m_sb.sb_rblocks); +} diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h new file mode 100644 index 00000000000..f414218a66f --- /dev/null +++ b/libxfs/xfs_rtgroup.h @@ -0,0 +1,121 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#ifndef __LIBXFS_RTGROUP_H +#define __LIBXFS_RTGROUP_H 1 + +struct xfs_mount; +struct xfs_trans; + +/* + * Realtime group incore structure, similar to the per-AG structure. 
+ */ +struct xfs_rtgroup { + struct xfs_mount *rtg_mount; + xfs_rgnumber_t rtg_rgno; + atomic_t rtg_ref; + + /* for rcu-safe freeing */ + struct rcu_head rcu_head; + + /* Number of blocks in this group */ + xfs_rgblock_t rtg_blockcount; + +#ifdef __KERNEL__ + /* -- kernel only structures below this line -- */ + spinlock_t rtg_state_lock; +#endif /* __KERNEL__ */ +}; + +#ifdef CONFIG_XFS_RT +struct xfs_rtgroup *xfs_rtgroup_get(struct xfs_mount *mp, xfs_rgnumber_t rgno); +struct xfs_rtgroup *xfs_rtgroup_bump(struct xfs_rtgroup *rtg); +void xfs_rtgroup_put(struct xfs_rtgroup *rtg); +int xfs_initialize_rtgroups(struct xfs_mount *mp, xfs_rgnumber_t rgcount); +void xfs_free_rtgroups(struct xfs_mount *mp); +#else +static inline struct xfs_rtgroup * +xfs_rtgroup_get( + struct xfs_mount *mp, + xfs_rgnumber_t rgno) +{ + return NULL; +} +static inline struct xfs_rtgroup *xfs_rtgroup_bump(struct xfs_rtgroup *rtg) +{ + ASSERT(rtg == NULL); + return NULL; +} +# define xfs_rtgroup_put(rtg) ((void)0) +# define xfs_initialize_rtgroups(mp, rgcount) (0) +# define xfs_free_rtgroups(mp) ((void)0) +#endif /* CONFIG_XFS_RT */ + +/* + * rt group iteration APIs + */ +static inline struct xfs_rtgroup * +xfs_rtgroup_next( + struct xfs_rtgroup *rtg, + xfs_rgnumber_t *rgno, + xfs_rgnumber_t end_rgno) +{ + struct xfs_mount *mp = rtg->rtg_mount; + + *rgno = rtg->rtg_rgno + 1; + xfs_rtgroup_put(rtg); + if (*rgno > end_rgno) + return NULL; + return xfs_rtgroup_get(mp, *rgno); +} + +#define for_each_rtgroup_range(mp, rgno, end_rgno, rtg) \ + for ((rtg) = xfs_rtgroup_get((mp), (rgno)); \ + (rtg) != NULL; \ + (rtg) = xfs_rtgroup_next((rtg), &(rgno), (end_rgno))) + +#define for_each_rtgroup_from(mp, rgno, rtg) \ + for_each_rtgroup_range((mp), (rgno), (mp)->m_sb.sb_rgcount - 1, (rtg)) + + +#define for_each_rtgroup(mp, rgno, rtg) \ + (rgno) = 0; \ + for_each_rtgroup_from((mp), (rgno), (rtg)) + +static inline bool +xfs_verify_rgbno( + struct xfs_rtgroup *rtg, + xfs_rgblock_t rgbno) +{ + if (rgbno >= 
rtg->rtg_blockcount) + return false; + if (rgbno < rtg->rtg_mount->m_sb.sb_rextsize) + return false; + return true; +} + +static inline bool +xfs_verify_rgbext( + struct xfs_rtgroup *rtg, + xfs_rgblock_t rgbno, + xfs_rgblock_t len) +{ + if (rgbno + len <= rgbno) + return false; + + if (!xfs_verify_rgbno(rtg, rgbno)) + return false; + + return xfs_verify_rgbno(rtg, rgbno + len - 1); +} + +#ifdef CONFIG_XFS_RT +xfs_rgblock_t xfs_rtgroup_block_count(struct xfs_mount *mp, + xfs_rgnumber_t rgno); +#else +# define xfs_rtgroup_block_count(mp, rgno) (0) +#endif /* CONFIG_XFS_RT */ + +#endif /* __LIBXFS_RTGROUP_H */ diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c index 8605c91e212..ac2e9f91989 100644 --- a/libxfs/xfs_sb.c +++ b/libxfs/xfs_sb.c @@ -639,6 +639,9 @@ __xfs_sb_from_disk( to->sb_gquotino = NULLFSINO; to->sb_pquotino = NULLFSINO; } + + to->sb_rgcount = 0; + to->sb_rgblocks = 0; } void @@ -952,6 +955,8 @@ xfs_sb_mount_common( mp->m_blockwmask = mp->m_blockwsize - 1; mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize); mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize); + mp->m_rgblklog = 0; + mp->m_rgblkmask = 0; mp->m_alloc_mxr[0] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 1); mp->m_alloc_mxr[1] = xfs_allocbt_maxrecs(mp, sbp->sb_blocksize, 0); diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h index f4615c5be34..c27c84561b5 100644 --- a/libxfs/xfs_types.h +++ b/libxfs/xfs_types.h @@ -9,10 +9,12 @@ typedef uint32_t prid_t; /* project ID */ typedef uint32_t xfs_agblock_t; /* blockno in alloc. 
group */ +typedef uint32_t xfs_rgblock_t; /* blockno in realtime group */ typedef uint32_t xfs_agino_t; /* inode # within allocation grp */ typedef uint32_t xfs_extlen_t; /* extent length in blocks */ typedef uint32_t xfs_rtxlen_t; /* file extent length in rtextents */ typedef uint32_t xfs_agnumber_t; /* allocation group number */ +typedef uint32_t xfs_rgnumber_t; /* realtime group number */ typedef uint64_t xfs_extnum_t; /* # of extents in a file */ typedef uint32_t xfs_aextnum_t; /* # extents in an attribute fork */ typedef int64_t xfs_fsize_t; /* bytes in a file */ @@ -54,7 +56,9 @@ typedef void * xfs_failaddr_t; #define NULLRTEXTNO ((xfs_rtxnum_t)-1) #define NULLAGBLOCK ((xfs_agblock_t)-1) +#define NULLRGBLOCK ((xfs_rgblock_t)-1) #define NULLAGNUMBER ((xfs_agnumber_t)-1) +#define NULLRGNUMBER ((xfs_rgnumber_t)-1) #define NULLCOMMITLSN ((xfs_lsn_t)-1) ^ permalink raw reply related [flat|nested] 565+ messages in thread
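Reader's note on the group sizing in patch 01/45: every realtime group holds sb_rgblocks blocks except the last, which holds whatever remains of sb_rblocks, rounded down to a whole rt extent. The standalone sketch below models that arithmetic outside the kernel. The struct rt_geom type and the function name are hypothetical stand-ins, and it assumes that xfs_rtb_rounddown_rtx() rounds a block count down to a multiple of sb_rextsize, which the patch does not show.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock fields the helper reads. */
struct rt_geom {
	uint64_t rblocks;	/* total blocks on the rt volume */
	uint32_t rgblocks;	/* nominal blocks per rt group */
	uint32_t rgcount;	/* number of rt groups */
	uint32_t rextsize;	/* blocks per rt extent */
};

/* Size of group rgno in blocks; only the last group may be short. */
static uint32_t
rtgroup_block_count(const struct rt_geom *g, uint32_t rgno)
{
	uint64_t remaining;

	assert(rgno < g->rgcount);

	if (rgno < g->rgcount - 1)
		return g->rgblocks;

	/*
	 * Assumption: rounding down to a whole number of rt extents is
	 * what xfs_rtb_rounddown_rtx() does in the real code.
	 */
	remaining = g->rblocks - (uint64_t)rgno * g->rgblocks;
	return (uint32_t)(remaining - remaining % g->rextsize);
}
```

With rblocks = 1000, rgblocks = 300, rgcount = 4 and rextsize = 8, groups 0 to 2 each report 300 blocks, and the last group reports 96 because the 100 leftover blocks are trimmed to twelve whole extents.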
* [PATCH 05/45] xfs: grow the realtime section when realtime groups are enabled 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/45] xfs: define the format of rt groups Darrick J. Wong 2022-12-30 22:19 ` [PATCH 01/45] xfs: create incore realtime group structures Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 08/45] xfs: add frextents to the lazysbcounters when rtgroups enabled Darrick J. Wong ` (41 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Enable growing the rt section when realtime groups are enabled. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_shared.h | 1 + 1 file changed, 1 insertion(+) diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h index e76e735b1d0..bcdf298889a 100644 --- a/libxfs/xfs_shared.h +++ b/libxfs/xfs_shared.h @@ -111,6 +111,7 @@ void xfs_log_get_max_trans_res(struct xfs_mount *mp, #define XFS_TRANS_SB_RBLOCKS 0x00000800 #define XFS_TRANS_SB_REXTENTS 0x00001000 #define XFS_TRANS_SB_REXTSLOG 0x00002000 +#define XFS_TRANS_SB_RGCOUNT 0x00004000 /* * Here we centralize the specification of XFS meta-data buffer reference count ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 08/45] xfs: add frextents to the lazysbcounters when rtgroups enabled 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 05/45] xfs: grow the realtime section when realtime groups are enabled Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 03/45] xfs: update primary realtime super every time we update the primary fs super Darrick J. Wong ` (40 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make the free rt extent count a part of the lazy sb counters when the realtime groups feature is enabled. This is possible because the patch to recompute frextents from the rtbitmap during log recovery predates the code adding rtgroup support, hence we know that the value will always be correct during runtime. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/xfs_mount.h | 1 + libxfs/xfs_sb.c | 5 +++++ 2 files changed, 6 insertions(+) diff --git a/include/xfs_mount.h b/include/xfs_mount.h index 5987650c639..ed19b15fcb5 100644 --- a/include/xfs_mount.h +++ b/include/xfs_mount.h @@ -28,6 +28,7 @@ typedef struct xfs_mount { #define m_icount m_sb.sb_icount #define m_ifree m_sb.sb_ifree #define m_fdblocks m_sb.sb_fdblocks +#define m_frextents m_sb.sb_frextents spinlock_t m_sb_lock; /* diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c index 0ba9143e7c5..1bcffb24761 100644 --- a/libxfs/xfs_sb.c +++ b/libxfs/xfs_sb.c @@ -1087,6 +1087,9 @@ xfs_log_sb( * sb counters, despite having a percpu counter. It is always kept * consistent with the ondisk rtbitmap by xfs_trans_apply_sb_deltas() * and hence we don't need have to update it here. + * + * sb_frextents was added to the lazy sb counters when the rt groups + * feature was introduced. 
*/ if (xfs_has_lazysbcount(mp)) { mp->m_sb.sb_icount = percpu_counter_sum(&mp->m_icount); @@ -1095,6 +1098,8 @@ xfs_log_sb( mp->m_sb.sb_icount); mp->m_sb.sb_fdblocks = percpu_counter_sum(&mp->m_fdblocks); } + if (xfs_has_rtgroups(mp)) + mp->m_sb.sb_frextents = percpu_counter_sum(&mp->m_frextents); xfs_sb_to_disk(bp->b_addr, &mp->m_sb); xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF); ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 03/45] xfs: update primary realtime super every time we update the primary fs super 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 08/45] xfs: add frextents to the lazysbcounters when rtgroups enabled Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 07/45] xfs: check that rtblock extents do not overlap with the rt group metadata Darrick J. Wong ` (39 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Every time we update parts of the primary filesystem superblock that are echoed in the primary rt super, we should update that primary realtime super. Avoid an ondisk log format change by using ordered buffers to write the primary rt super. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/xfs_trans.h | 1 + libxfs/libxfs_api_defs.h | 1 + libxfs/libxfs_io.h | 1 + libxfs/rdwr.c | 17 +++++++++++ libxfs/trans.c | 29 ++++++++++++++++++ libxfs/xfs_rtgroup.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtgroup.h | 6 ++++ libxfs/xfs_sb.c | 13 ++++++++ 8 files changed, 142 insertions(+) diff --git a/include/xfs_trans.h b/include/xfs_trans.h index bfaee7e8fed..0ecf0a95560 100644 --- a/include/xfs_trans.h +++ b/include/xfs_trans.h @@ -98,6 +98,7 @@ int libxfs_trans_reserve_more(struct xfs_trans *tp, uint blocks, void xfs_defer_cancel(struct xfs_trans *); struct xfs_buf *libxfs_trans_getsb(struct xfs_trans *); +struct xfs_buf *libxfs_trans_getrtsb(struct xfs_trans *tp); void libxfs_trans_ijoin(struct xfs_trans *, struct xfs_inode *, uint); void libxfs_trans_log_inode (struct xfs_trans *, struct xfs_inode *, diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 7ce9686c00d..4d9499529c0 100644 --- a/libxfs/libxfs_api_defs.h +++ 
b/libxfs/libxfs_api_defs.h @@ -263,6 +263,7 @@ #define xfs_trans_dirty_buf libxfs_trans_dirty_buf #define xfs_trans_get_buf libxfs_trans_get_buf #define xfs_trans_get_buf_map libxfs_trans_get_buf_map +#define xfs_trans_getrtsb libxfs_trans_getrtsb #define xfs_trans_getsb libxfs_trans_getsb #define xfs_trans_ichgtime libxfs_trans_ichgtime #define xfs_trans_ijoin libxfs_trans_ijoin diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h index fb536c1c3c9..d54258c7355 100644 --- a/libxfs/libxfs_io.h +++ b/libxfs/libxfs_io.h @@ -197,6 +197,7 @@ libxfs_buf_read( int libxfs_readbuf_verify(struct xfs_buf *bp, const struct xfs_buf_ops *ops); struct xfs_buf *libxfs_getsb(struct xfs_mount *mp); +struct xfs_buf *libxfs_getrtsb(struct xfs_mount *mp); extern void libxfs_bcache_purge(struct xfs_mount *mp); extern void libxfs_bcache_free(void); extern void libxfs_bcache_flush(struct xfs_mount *mp); diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c index 2c66b84ff83..1c91f557f41 100644 --- a/libxfs/rdwr.c +++ b/libxfs/rdwr.c @@ -164,6 +164,23 @@ libxfs_getsb( return bp; } +struct xfs_buf * +libxfs_getrtsb( + struct xfs_mount *mp) +{ + struct xfs_buf *bp; + int error; + + if (!mp->m_rtdev_targp->bt_bdev) + return NULL; + + error = libxfs_buf_read_uncached(mp->m_rtdev_targp, XFS_RTSB_DADDR, + XFS_FSB_TO_BB(mp, 1), 0, &bp, &xfs_rtsb_buf_ops); + if (error) + return NULL; + return bp; +} + struct kmem_cache *xfs_buf_cache; static struct cache_mru xfs_buf_freelist = diff --git a/libxfs/trans.c b/libxfs/trans.c index 3120d8b1dea..06d3655c33b 100644 --- a/libxfs/trans.c +++ b/libxfs/trans.c @@ -511,6 +511,35 @@ libxfs_trans_getsb( return bp; } +struct xfs_buf * +libxfs_trans_getrtsb( + struct xfs_trans *tp) +{ + struct xfs_mount *mp = tp->t_mountp; + struct xfs_buf *bp; + struct xfs_buf_log_item *bip; + int len = XFS_FSS_TO_BB(mp, 1); + DEFINE_SINGLE_BUF_MAP(map, XFS_SB_DADDR, len); + + bp = xfs_trans_buf_item_match(tp, mp->m_rtdev, &map, 1); + if (bp != NULL) { + ASSERT(bp->b_transp == tp); + bip 
= bp->b_log_item; + ASSERT(bip != NULL); + bip->bli_recur++; + trace_xfs_trans_getsb_recur(bip); + return bp; + } + + bp = libxfs_getrtsb(mp); + if (bp == NULL) + return NULL; + + _libxfs_trans_bjoin(tp, bp, 1); + trace_xfs_trans_getsb(bp->b_log_item); + return bp; +} + int libxfs_trans_read_buf_map( struct xfs_mount *mp, diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c index ef1d1f29d64..a96df704070 100644 --- a/libxfs/xfs_rtgroup.c +++ b/libxfs/xfs_rtgroup.c @@ -305,3 +305,77 @@ const struct xfs_buf_ops xfs_rtsb_buf_ops = { .verify_write = xfs_rtsb_write_verify, .verify_struct = xfs_rtsb_verify, }; + +/* Update a realtime superblock from the primary fs super */ +void +xfs_rtgroup_update_super( + struct xfs_buf *rtsb_bp, + const struct xfs_buf *sb_bp) +{ + const struct xfs_dsb *dsb = sb_bp->b_addr; + struct xfs_rtsb *rsb = rtsb_bp->b_addr; + const uuid_t *meta_uuid; + + rsb->rsb_magicnum = cpu_to_be32(XFS_RTSB_MAGIC); + rsb->rsb_blocksize = dsb->sb_blocksize; + rsb->rsb_rblocks = dsb->sb_rblocks; + + rsb->rsb_rextents = dsb->sb_rextents; + rsb->rsb_lsn = 0; + + memcpy(&rsb->rsb_uuid, &dsb->sb_uuid, sizeof(rsb->rsb_uuid)); + + rsb->rsb_rgcount = dsb->sb_rgcount; + memcpy(&rsb->rsb_fname, &dsb->sb_fname, XFSLABEL_MAX); + + rsb->rsb_rextsize = dsb->sb_rextsize; + rsb->rsb_rbmblocks = dsb->sb_rbmblocks; + + rsb->rsb_rgblocks = dsb->sb_rgblocks; + rsb->rsb_blocklog = dsb->sb_blocklog; + rsb->rsb_sectlog = dsb->sb_sectlog; + rsb->rsb_rextslog = dsb->sb_rextslog; + rsb->rsb_pad = 0; + rsb->rsb_pad2 = 0; + + /* + * The metadata uuid is the fs uuid if the metauuid feature is not + * enabled. + */ + if (dsb->sb_features_incompat & + cpu_to_be32(XFS_SB_FEAT_INCOMPAT_META_UUID)) + meta_uuid = &dsb->sb_meta_uuid; + else + meta_uuid = &dsb->sb_uuid; + memcpy(&rsb->rsb_meta_uuid, meta_uuid, sizeof(rsb->rsb_meta_uuid)); +} + +/* + * Update the primary realtime superblock from a filesystem superblock and + * log it to the given transaction. 
+ */ +void +xfs_rtgroup_log_super( + struct xfs_trans *tp, + const struct xfs_buf *sb_bp) +{ + struct xfs_buf *rtsb_bp; + + if (!xfs_has_rtgroups(tp->t_mountp)) + return; + + rtsb_bp = xfs_trans_getrtsb(tp); + if (!rtsb_bp) { + /* + * It's possible for the rtgroups feature to be enabled but + * there is no incore rt superblock buffer if the rt geometry + * was specified at mkfs time but the rt section has not yet + * been attached. In this case, rblocks must be zero. + */ + ASSERT(tp->t_mountp->m_sb.sb_rblocks == 0); + return; + } + + xfs_rtgroup_update_super(rtsb_bp, sb_bp); + xfs_trans_ordered_buf(tp, rtsb_bp); +} diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h index ff9b01d8c50..c6db6b0d2ae 100644 --- a/libxfs/xfs_rtgroup.h +++ b/libxfs/xfs_rtgroup.h @@ -197,8 +197,14 @@ xfs_daddr_to_rgbno( #ifdef CONFIG_XFS_RT xfs_rgblock_t xfs_rtgroup_block_count(struct xfs_mount *mp, xfs_rgnumber_t rgno); + +void xfs_rtgroup_update_super(struct xfs_buf *rtsb_bp, + const struct xfs_buf *sb_bp); +void xfs_rtgroup_log_super(struct xfs_trans *tp, const struct xfs_buf *sb_bp); #else # define xfs_rtgroup_block_count(mp, rgno) (0) +# define xfs_rtgroup_update_super(bp, sb_bp) ((void)0) +# define xfs_rtgroup_log_super(tp, sb_bp) ((void)0) #endif /* CONFIG_XFS_RT */ #endif /* __LIBXFS_RTGROUP_H */ diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c index aec147fe5f8..7b8baf64e82 100644 --- a/libxfs/xfs_sb.c +++ b/libxfs/xfs_sb.c @@ -24,6 +24,7 @@ #include "xfs_health.h" #include "xfs_ag.h" #include "xfs_swapext.h" +#include "xfs_rtgroup.h" /* * Physical superblock buffer manipulations. Shared with libxfs in userspace. 
@@ -1098,6 +1099,8 @@ xfs_log_sb( xfs_sb_to_disk(bp->b_addr, &mp->m_sb); xfs_trans_buf_set_type(tp, bp, XFS_BLFT_SB_BUF); xfs_trans_log_buf(tp, bp, 0, sizeof(struct xfs_dsb) - 1); + + xfs_rtgroup_log_super(tp, bp); } /* @@ -1214,6 +1217,7 @@ xfs_sync_sb_buf( { struct xfs_trans *tp; struct xfs_buf *bp; + struct xfs_buf *rtsb_bp = NULL; int error; error = xfs_trans_alloc(mp, &M_RES(mp)->tr_sb, 0, 0, 0, &tp); @@ -1223,6 +1227,11 @@ xfs_sync_sb_buf( bp = xfs_trans_getsb(tp); xfs_log_sb(tp); xfs_trans_bhold(tp, bp); + if (xfs_has_rtgroups(mp)) { + rtsb_bp = xfs_trans_getrtsb(tp); + if (rtsb_bp) + xfs_trans_bhold(tp, rtsb_bp); + } xfs_trans_set_sync(tp); error = xfs_trans_commit(tp); if (error) @@ -1231,7 +1240,11 @@ xfs_sync_sb_buf( * write out the sb buffer to get the changes to disk */ error = xfs_bwrite(bp); + if (!error && rtsb_bp) + error = xfs_bwrite(rtsb_bp); out: + if (rtsb_bp) + xfs_buf_relse(rtsb_bp); xfs_buf_relse(bp); return error; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 07/45] xfs: check that rtblock extents do not overlap with the rt group metadata 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 03/45] xfs: update primary realtime super every time we update the primary fs super Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 06/45] xfs: export realtime group geometry via XFS_FSOP_GEOM Darrick J. Wong ` (38 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> The ondisk format specifies that the start of each realtime group must have a superblock so that rt space mappings never cross an rtgroup boundary. Check that rt block pointers obey this. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_types.c | 46 ++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 44 insertions(+), 2 deletions(-) diff --git a/libxfs/xfs_types.c b/libxfs/xfs_types.c index f5eab8839e3..6488cda24e8 100644 --- a/libxfs/xfs_types.c +++ b/libxfs/xfs_types.c @@ -13,6 +13,8 @@ #include "xfs_mount.h" #include "xfs_ag.h" #include "xfs_imeta.h" +#include "xfs_rtbitmap.h" +#include "xfs_rtgroup.h" /* @@ -133,6 +135,26 @@ xfs_verify_dir_ino( return xfs_verify_ino(mp, ino); } +/* + * Verify that an rtgroup block number pointer neither points outside the + * rtgroup nor points at static metadata. + */ +static inline bool +xfs_verify_rgno_rgbno( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + xfs_rgblock_t rgbno) +{ + xfs_rgblock_t eorg; + + eorg = xfs_rtgroup_block_count(mp, rgno); + if (rgbno >= eorg) + return false; + if (rgbno < mp->m_sb.sb_rextsize) + return false; + return true; +} + /* * Verify that an realtime block number pointer doesn't point off the * end of the realtime device. 
@@ -142,7 +164,20 @@ xfs_verify_rtbno( struct xfs_mount *mp, xfs_rtblock_t rtbno) { - return rtbno < mp->m_sb.sb_rblocks; + xfs_rgnumber_t rgno; + xfs_rgblock_t rgbno; + + if (rtbno >= mp->m_sb.sb_rblocks) + return false; + + if (!xfs_has_rtgroups(mp)) + return true; + + rgbno = xfs_rtb_to_rgbno(mp, rtbno, &rgno); + if (rgno >= mp->m_sb.sb_rgcount) + return false; + + return xfs_verify_rgno_rgbno(mp, rgno, rgbno); } /* Verify that a realtime device extent is fully contained inside the volume. */ @@ -158,7 +193,14 @@ xfs_verify_rtbext( if (!xfs_verify_rtbno(mp, rtbno)) return false; - return xfs_verify_rtbno(mp, rtbno + len - 1); + if (!xfs_verify_rtbno(mp, rtbno + len - 1)) + return false; + + if (xfs_has_rtgroups(mp) && + xfs_rtb_to_rgno(mp, rtbno) != xfs_rtb_to_rgno(mp, rtbno + len - 1)) + return false; + + return true; } /* Calculate the range of valid icount values. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
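Reader's note on the checks in patch 07/45: a realtime block pointer is valid only if it lands inside the volume, inside an existing group, past the first rt extent of that group (which holds the group superblock), and before the end of the group. An extent must also start and end in the same group. The sketch below is a userspace model of those rules, not the kernel code. It assumes the simplest possible mapping (rgno = rtbno / rgblocks, rgbno = rtbno % rgblocks), because xfs_rtb_to_rgbno() is not shown in this patch, and struct rt_geom is a hypothetical stand-in for the superblock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the superblock geometry fields. */
struct rt_geom {
	uint64_t rblocks;	/* total blocks on the rt volume */
	uint32_t rgblocks;	/* nominal blocks per rt group */
	uint32_t rgcount;	/* number of rt groups */
	uint32_t rextsize;	/* blocks per rt extent */
};

/* Does rtbno point at usable space inside some rt group? */
static bool
verify_rtbno(const struct rt_geom *g, uint64_t rtbno)
{
	uint64_t rgno, rgbno, eorg;

	if (rtbno >= g->rblocks)
		return false;

	/* Assumed mapping; the real helper may use shifts and masks. */
	rgno = rtbno / g->rgblocks;
	rgbno = rtbno % g->rgblocks;
	if (rgno >= g->rgcount)
		return false;

	/* End of group: the last group may be shorter than the rest. */
	eorg = g->rgblocks;
	if (rgno == g->rgcount - 1) {
		uint64_t rem = g->rblocks - rgno * g->rgblocks;

		eorg = rem - rem % g->rextsize;
	}
	if (rgbno >= eorg)
		return false;

	/* The first rt extent of each group is static metadata. */
	if (rgbno < g->rextsize)
		return false;
	return true;
}

/* Is [rtbno, rtbno + len) valid and contained in a single group? */
static bool
verify_rtbext(const struct rt_geom *g, uint64_t rtbno, uint64_t len)
{
	uint64_t last = rtbno + len - 1;

	if (rtbno + len <= rtbno)	/* zero length or overflow */
		return false;
	if (!verify_rtbno(g, rtbno) || !verify_rtbno(g, last))
		return false;
	return rtbno / g->rgblocks == last / g->rgblocks;
}
```

With rgblocks = 300 and rextsize = 8, block 300 is rejected because it is the first block of group 1, and an extent covering blocks 290 to 309 is rejected even though both endpoints are individually valid, because it straddles the boundary between groups 0 and 1.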
* [PATCH 06/45] xfs: export realtime group geometry via XFS_FSOP_GEOM 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 07/45] xfs: check that rtblock extents do not overlap with the rt group metadata Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 04/45] xfs: write secondary realtime superblocks to disk Darrick J. Wong ` (37 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Export the realtime geometry information so that userspace can query it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_fs.h | 4 +++- libxfs/xfs_sb.c | 5 +++++ 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index c4995f6557d..ba90649c54e 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -186,7 +186,9 @@ struct xfs_fsop_geom { __u32 logsunit; /* log stripe unit, bytes */ uint32_t sick; /* o: unhealthy fs & rt metadata */ uint32_t checked; /* o: checked fs & rt metadata */ - __u64 reserved[17]; /* reserved space */ + __u32 rgblocks; /* rtblocks in a realtime group */ + __u32 rgcount; /* number of realtime groups */ + __u64 reserved[16]; /* reserved space */ }; #define XFS_FSOP_GEOM_SICK_COUNTERS (1 << 0) /* summary counters */ diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c index 7b8baf64e82..0ba9143e7c5 100644 --- a/libxfs/xfs_sb.c +++ b/libxfs/xfs_sb.c @@ -1346,6 +1346,11 @@ xfs_fs_geometry( return; geo->version = XFS_FSOP_GEOM_VERSION_V5; + + if (xfs_has_rtgroups(mp)) { + geo->rgcount = sbp->sb_rgcount; + geo->rgblocks = sbp->sb_rgblocks; + } } /* Read a secondary superblock. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
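Reader's note on the layout change in patch 06/45: the two new __u32 fields take the place of one __u64 from the reserved array, so the size of the ioctl structure should stay the same. The sketch below checks that claim on an abridged copy of the structure tail. The struct names are hypothetical, and only the fields visible in the hunk are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Tail of the geometry structure before the patch (abridged). */
struct geom_tail_old {
	uint32_t sick;
	uint32_t checked;
	uint64_t reserved[17];
};

/* Tail of the geometry structure after the patch (abridged). */
struct geom_tail_new {
	uint32_t sick;
	uint32_t checked;
	uint32_t rgblocks;	/* rtblocks in a realtime group */
	uint32_t rgcount;	/* number of realtime groups */
	uint64_t reserved[16];
};

/* Two 32-bit fields consume exactly one 64-bit reserved slot. */
_Static_assert(sizeof(struct geom_tail_old) == sizeof(struct geom_tail_new),
	       "new fields must not change the structure size");
```

A compile-time assertion suits this kind of check because any accidental size change fails the build rather than surfacing later as an ABI mismatch.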
* [PATCH 04/45] xfs: write secondary realtime superblocks to disk 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 06/45] xfs: export realtime group geometry via XFS_FSOP_GEOM Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 09/45] xfs: record rt group superblock errors in the health system Darrick J. Wong ` (36 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create some library functions to make it easy to update all the secondary realtime superblocks on disk; this will be used by growfs, xfs_db, mkfs, and xfs_repair. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rtgroup.c | 117 ++++++++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtgroup.h | 2 + 2 files changed, 119 insertions(+) diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c index a96df704070..9caf39fd51a 100644 --- a/libxfs/xfs_rtgroup.c +++ b/libxfs/xfs_rtgroup.c @@ -379,3 +379,120 @@ xfs_rtgroup_log_super( xfs_rtgroup_update_super(rtsb_bp, sb_bp); xfs_trans_ordered_buf(tp, rtsb_bp); } + +/* Initialize a secondary realtime superblock. 
*/ +static int +xfs_rtgroup_init_secondary_super( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + struct xfs_buf **bpp) +{ + struct xfs_buf *bp; + struct xfs_rtsb *rsb; + xfs_rtblock_t rtbno; + int error; + + ASSERT(rgno != 0); + + error = xfs_buf_get_uncached(mp->m_rtdev_targp, XFS_FSB_TO_BB(mp, 1), + 0, &bp); + if (error) + return error; + + rtbno = xfs_rgbno_to_rtb(mp, rgno, 0); + bp->b_maps[0].bm_bn = xfs_rtb_to_daddr(mp, rtbno); + bp->b_ops = &xfs_rtsb_buf_ops; + xfs_buf_zero(bp, 0, BBTOB(bp->b_length)); + + rsb = bp->b_addr; + rsb->rsb_magicnum = cpu_to_be32(XFS_RTSB_MAGIC); + rsb->rsb_blocksize = cpu_to_be32(mp->m_sb.sb_blocksize); + rsb->rsb_rblocks = cpu_to_be64(mp->m_sb.sb_rblocks); + + rsb->rsb_rextents = cpu_to_be64(mp->m_sb.sb_rextents); + + memcpy(&rsb->rsb_uuid, &mp->m_sb.sb_uuid, sizeof(rsb->rsb_uuid)); + + rsb->rsb_rgcount = cpu_to_be32(mp->m_sb.sb_rgcount); + memcpy(&rsb->rsb_fname, &mp->m_sb.sb_fname, XFSLABEL_MAX); + + rsb->rsb_rextsize = cpu_to_be32(mp->m_sb.sb_rextsize); + rsb->rsb_rbmblocks = cpu_to_be32(mp->m_sb.sb_rbmblocks); + + rsb->rsb_rgblocks = cpu_to_be32(mp->m_sb.sb_rgblocks); + rsb->rsb_blocklog = mp->m_sb.sb_blocklog; + rsb->rsb_sectlog = mp->m_sb.sb_sectlog; + rsb->rsb_rextslog = mp->m_sb.sb_rextslog; + + memcpy(&rsb->rsb_meta_uuid, &mp->m_sb.sb_meta_uuid, + sizeof(rsb->rsb_meta_uuid)); + + *bpp = bp; + return 0; +} + +/* + * Update all the realtime superblocks to match the new state of the primary. + * Because we are completely overwriting all the existing fields in the + * secondary superblock buffers, there is no need to read them in from disk. + * Just get a new buffer, stamp it and write it. + * + * The rt super buffers do not need to be kept in memory once they are + * written, so we mark them as one-shot buffers. 
+ */ +int +xfs_rtgroup_update_secondary_sbs( + struct xfs_mount *mp) +{ + LIST_HEAD (buffer_list); + struct xfs_rtgroup *rtg; + xfs_rgnumber_t start_rgno = 1; + int saved_error = 0; + int error = 0; + + for_each_rtgroup_from(mp, start_rgno, rtg) { + struct xfs_buf *bp; + + error = xfs_rtgroup_init_secondary_super(mp, rtg->rtg_rgno, + &bp); + /* + * If we get an error reading or writing alternate superblocks, + * continue. If we break early, we'll leave more superblocks + * un-updated than updated. + */ + if (error) { + xfs_warn(mp, + "error allocating secondary superblock for rt group %u", + rtg->rtg_rgno); + if (!saved_error) + saved_error = error; + continue; + } + + xfs_buf_oneshot(bp); + xfs_buf_delwri_queue(bp, &buffer_list); + xfs_buf_relse(bp); + + /* don't hold too many buffers at once */ + if (rtg->rtg_rgno % 16) + continue; + + error = xfs_buf_delwri_submit(&buffer_list); + if (error) { + xfs_warn(mp, + "write error %d updating a secondary superblock near rt group %u", + error, rtg->rtg_rgno); + if (!saved_error) + saved_error = error; + continue; + } + } + error = xfs_buf_delwri_submit(&buffer_list); + if (error) { + xfs_warn(mp, + "write error %d updating a secondary superblock near rt group %u", + error, start_rgno); + } + + return saved_error ? 
saved_error : error; +} diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h index c6db6b0d2ae..d8723fabeb5 100644 --- a/libxfs/xfs_rtgroup.h +++ b/libxfs/xfs_rtgroup.h @@ -201,10 +201,12 @@ xfs_rgblock_t xfs_rtgroup_block_count(struct xfs_mount *mp, void xfs_rtgroup_update_super(struct xfs_buf *rtsb_bp, const struct xfs_buf *sb_bp); void xfs_rtgroup_log_super(struct xfs_trans *tp, const struct xfs_buf *sb_bp); +int xfs_rtgroup_update_secondary_sbs(struct xfs_mount *mp); #else # define xfs_rtgroup_block_count(mp, rgno) (0) # define xfs_rtgroup_update_super(bp, sb_bp) ((void)0) # define xfs_rtgroup_log_super(tp, sb_bp) ((void)0) +# define xfs_rtgroup_update_secondary_sbs(mp) (0) #endif /* CONFIG_XFS_RT */ #endif /* __LIBXFS_RTGROUP_H */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
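Reader's note on the batching in patch 04/45: the update loop queues one buffer per secondary group, submits the queue whenever the group number is a multiple of 16, and submits once more after the loop to flush any remainder. The model below only counts queue depth and submissions, with no real I/O. The struct and function names are hypothetical, and it assumes every group number from 1 to rgcount - 1 is visited without error.

```c
#include <assert.h>
#include <stdint.h>

/* Bookkeeping for the simulated delayed-write queue. */
struct flush_stats {
	unsigned int submits;		/* number of submit calls */
	unsigned int max_queued;	/* deepest the queue ever got */
};

/* Simulate updating the secondary supers of groups 1..rgcount-1. */
static struct flush_stats
simulate_secondary_sb_update(uint32_t rgcount)
{
	struct flush_stats st = { 0, 0 };
	unsigned int queued = 0;
	uint32_t rgno;

	for (rgno = 1; rgno < rgcount; rgno++) {
		queued++;
		if (queued > st.max_queued)
			st.max_queued = queued;

		/* Don't hold too many buffers at once. */
		if (rgno % 16)
			continue;

		st.submits++;
		queued = 0;
	}

	/* The final submit runs unconditionally, as in the patch. */
	st.submits++;
	return st;
}
```

For a volume with 40 groups the model submits at groups 16 and 32 and once more at the end, so three submissions in total, and the queue never holds more than 16 buffers.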
* [PATCH 09/45] xfs: record rt group superblock errors in the health system 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 04/45] xfs: write secondary realtime superblocks to disk Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 10/45] xfs: define locking primitives for realtime groups Darrick J. Wong ` (35 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Record the state of per-rtgroup metadata sickness in the rtgroup structure for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_health.h | 28 +++++++++++++++++++++++++++- libxfs/xfs_rtgroup.h | 8 ++++++++ 2 files changed, 35 insertions(+), 1 deletion(-) diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h index 99d53bae9c1..0beb4153a43 100644 --- a/libxfs/xfs_health.h +++ b/libxfs/xfs_health.h @@ -52,6 +52,7 @@ struct xfs_inode; struct xfs_fsop_geom; struct xfs_btree_cur; struct xfs_da_args; +struct xfs_rtgroup; /* Observable health issues for metadata spanning the entire filesystem. */ #define XFS_SICK_FS_COUNTERS (1 << 0) /* summary counters */ @@ -65,6 +66,7 @@ struct xfs_da_args; /* Observable health issues for realtime volume metadata. */ #define XFS_SICK_RT_BITMAP (1 << 0) /* realtime bitmap */ #define XFS_SICK_RT_SUMMARY (1 << 1) /* realtime summary */ +#define XFS_SICK_RT_SUPER (1 << 2) /* rt group superblock */ /* Observable health issues for AG metadata. 
*/ #define XFS_SICK_AG_SB (1 << 0) /* superblock */ @@ -101,7 +103,8 @@ struct xfs_da_args; XFS_SICK_FS_METADIR) #define XFS_SICK_RT_PRIMARY (XFS_SICK_RT_BITMAP | \ - XFS_SICK_RT_SUMMARY) + XFS_SICK_RT_SUMMARY | \ + XFS_SICK_RT_SUPER) #define XFS_SICK_AG_PRIMARY (XFS_SICK_AG_SB | \ XFS_SICK_AG_AGF | \ @@ -176,6 +179,14 @@ void xfs_rt_mark_healthy(struct xfs_mount *mp, unsigned int mask); void xfs_rt_measure_sickness(struct xfs_mount *mp, unsigned int *sick, unsigned int *checked); +void xfs_rgno_mark_sick(struct xfs_mount *mp, xfs_rgnumber_t rgno, + unsigned int mask); +void xfs_rtgroup_mark_sick(struct xfs_rtgroup *rtg, unsigned int mask); +void xfs_rtgroup_mark_checked(struct xfs_rtgroup *rtg, unsigned int mask); +void xfs_rtgroup_mark_healthy(struct xfs_rtgroup *rtg, unsigned int mask); +void xfs_rtgroup_measure_sickness(struct xfs_rtgroup *rtg, unsigned int *sick, + unsigned int *checked); + void xfs_agno_mark_sick(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int mask); void xfs_ag_mark_sick(struct xfs_perag *pag, unsigned int mask); @@ -225,6 +236,15 @@ xfs_ag_has_sickness(struct xfs_perag *pag, unsigned int mask) return sick & mask; } +static inline bool +xfs_rtgroup_has_sickness(struct xfs_rtgroup *rtg, unsigned int mask) +{ + unsigned int sick, checked; + + xfs_rtgroup_measure_sickness(rtg, &sick, &checked); + return sick & mask; +} + static inline bool xfs_inode_has_sickness(struct xfs_inode *ip, unsigned int mask) { @@ -246,6 +266,12 @@ xfs_rt_is_healthy(struct xfs_mount *mp) return !xfs_rt_has_sickness(mp, -1U); } +static inline bool +xfs_rtgroup_is_healthy(struct xfs_rtgroup *rtg) +{ + return !xfs_rtgroup_has_sickness(rtg, -1U); +} + static inline bool xfs_ag_is_healthy(struct xfs_perag *pag) { diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h index d8723fabeb5..0e664e2436b 100644 --- a/libxfs/xfs_rtgroup.h +++ b/libxfs/xfs_rtgroup.h @@ -23,6 +23,14 @@ struct xfs_rtgroup { /* Number of blocks in this group */ xfs_rgblock_t rtg_blockcount; 
+ /* + * Bitsets of per-rtgroup metadata that have been checked and/or are + * sick. Callers should hold rtg_state_lock before accessing this + * field. + */ + uint16_t rtg_checked; + uint16_t rtg_sick; + #ifdef __KERNEL__ /* -- kernel only structures below this line -- */ spinlock_t rtg_state_lock; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 10/45] xfs: define locking primitives for realtime groups 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 09/45] xfs: record rt group superblock errors in the health system Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 15/45] xfs: encode the rtsummary in big endian format Darrick J. Wong ` (34 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Define helper functions to lock all metadata inodes related to a realtime group. There's not much to look at now, but this will become important when we add per-rtgroup metadata files and online fsck code for them. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rtgroup.c | 33 +++++++++++++++++++++++++++++++++ libxfs/xfs_rtgroup.h | 14 ++++++++++++++ 2 files changed, 47 insertions(+) diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c index 9caf39fd51a..86751cb8d31 100644 --- a/libxfs/xfs_rtgroup.c +++ b/libxfs/xfs_rtgroup.c @@ -496,3 +496,36 @@ xfs_rtgroup_update_secondary_sbs( return saved_error ? saved_error : error; } + +/* Lock metadata inodes associated with this rt group. */ +void +xfs_rtgroup_lock( + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + unsigned int rtglock_flags) +{ + ASSERT(!(rtglock_flags & ~XFS_RTGLOCK_ALL_FLAGS)); + ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) || + !(rtglock_flags & XFS_RTGLOCK_BITMAP)); + + if (rtglock_flags & XFS_RTGLOCK_BITMAP) + xfs_rtbitmap_lock(tp, rtg->rtg_mount); + else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) + xfs_rtbitmap_lock_shared(rtg->rtg_mount, XFS_RBMLOCK_BITMAP); +} + +/* Unlock metadata inodes associated with this rt group. 
*/ +void +xfs_rtgroup_unlock( + struct xfs_rtgroup *rtg, + unsigned int rtglock_flags) +{ + ASSERT(!(rtglock_flags & ~XFS_RTGLOCK_ALL_FLAGS)); + ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) || + !(rtglock_flags & XFS_RTGLOCK_BITMAP)); + + if (rtglock_flags & XFS_RTGLOCK_BITMAP) + xfs_rtbitmap_unlock(rtg->rtg_mount); + else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) + xfs_rtbitmap_unlock_shared(rtg->rtg_mount, XFS_RBMLOCK_BITMAP); +} diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h index 0e664e2436b..b1e53af5a65 100644 --- a/libxfs/xfs_rtgroup.h +++ b/libxfs/xfs_rtgroup.h @@ -210,11 +210,25 @@ void xfs_rtgroup_update_super(struct xfs_buf *rtsb_bp, const struct xfs_buf *sb_bp); void xfs_rtgroup_log_super(struct xfs_trans *tp, const struct xfs_buf *sb_bp); int xfs_rtgroup_update_secondary_sbs(struct xfs_mount *mp); + +/* Lock the rt bitmap inode in exclusive mode */ +#define XFS_RTGLOCK_BITMAP (1U << 0) +/* Lock the rt bitmap inode in shared mode */ +#define XFS_RTGLOCK_BITMAP_SHARED (1U << 1) + +#define XFS_RTGLOCK_ALL_FLAGS (XFS_RTGLOCK_BITMAP | \ + XFS_RTGLOCK_BITMAP_SHARED) + +void xfs_rtgroup_lock(struct xfs_trans *tp, struct xfs_rtgroup *rtg, + unsigned int rtglock_flags); +void xfs_rtgroup_unlock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags); #else # define xfs_rtgroup_block_count(mp, rgno) (0) # define xfs_rtgroup_update_super(bp, sb_bp) ((void)0) # define xfs_rtgroup_log_super(tp, sb_bp) ((void)0) # define xfs_rtgroup_update_secondary_sbs(mp) (0) +# define xfs_rtgroup_lock(tp, rtg, gf) ((void)0) +# define xfs_rtgroup_unlock(rtg, gf) ((void)0) #endif /* CONFIG_XFS_RT */ #endif /* __LIBXFS_RTGROUP_H */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
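For anyone skimming patch 10/45: the two ASSERTs in xfs_rtgroup_lock() and xfs_rtgroup_unlock() reject unknown flag bits, and reject a request for the bitmap lock in both exclusive and shared mode at once. Below is a standalone sketch of just that check. The flag values are copied from the patch; rtglock_flags_valid() is a made-up name for illustration and is not part of libxfs.

```c
#include <stdbool.h>

/* Flag values copied from the patch. */
#define XFS_RTGLOCK_BITMAP		(1U << 0)
#define XFS_RTGLOCK_BITMAP_SHARED	(1U << 1)
#define XFS_RTGLOCK_ALL_FLAGS		(XFS_RTGLOCK_BITMAP | \
					 XFS_RTGLOCK_BITMAP_SHARED)

/*
 * Hypothetical helper (not in libxfs): returns true if both ASSERTs in
 * xfs_rtgroup_lock() would pass for this flag word.
 */
static bool
rtglock_flags_valid(unsigned int rtglock_flags)
{
	/* No bits outside the known set. */
	if (rtglock_flags & ~XFS_RTGLOCK_ALL_FLAGS)
		return false;

	/* Exclusive and shared bitmap locks are mutually exclusive. */
	if ((rtglock_flags & XFS_RTGLOCK_BITMAP) &&
	    (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED))
		return false;

	return true;
}
```

An empty flag word passes both checks; in that case the lock and unlock helpers in the patch simply do nothing.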
* [PATCH 15/45] xfs: encode the rtsummary in big endian format 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (9 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 10/45] xfs: define locking primitives for realtime groups Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 13/45] xfs: encode the rtbitmap in little " Darrick J. Wong ` (33 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Currently, the ondisk realtime summary file counters are accessed in units of 32-bit words. There's no endian translation of the contents of this file, which means that Bad Things Happen(tm) if you move a filesystem from (say) x86 to powerpc. Since we have a new feature flag, let's take the opportunity to enforce an endianness on the file. Encode the summary information in big endian format, like most of the rest of the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_format.h | 4 +++- libxfs/xfs_rtbitmap.c | 8 +++++++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index 47b2e31e256..7e76bedda68 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -748,10 +748,12 @@ union xfs_rtword_ondisk { /* * Realtime summary counts are accessed by the word, which is currently - * stored in host-endian format. + * stored in host-endian format. Starting with the realtime groups feature, + * the words are stored in be32 ondisk. 
*/ union xfs_suminfo_ondisk { __u32 raw; + __be32 rtg; }; /* diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c index 4ed3bd261f6..26428898d60 100644 --- a/libxfs/xfs_rtbitmap.c +++ b/libxfs/xfs_rtbitmap.c @@ -558,6 +558,9 @@ xfs_suminfo_get( struct xfs_mount *mp, union xfs_suminfo_ondisk *infoptr) { + if (xfs_has_rtgroups(mp)) + return be32_to_cpu(infoptr->rtg); + return infoptr->raw; } @@ -567,7 +570,10 @@ xfs_suminfo_add( union xfs_suminfo_ondisk *infoptr, int delta) { - infoptr->raw += delta; + if (xfs_has_rtgroups(mp)) + be32_add_cpu(&infoptr->rtg, delta); + else + infoptr->raw += delta; } /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
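To illustrate what patch 15/45 buys us, here is a minimal userspace sketch of a summary counter that is always stored big endian, whatever the host byte order. put_be32(), get_be32() and suminfo_add() are illustrative names only; the kernel code uses be32_to_cpu() and be32_add_cpu() as shown in the diff above.

```c
#include <stdint.h>

/* Store a 32-bit counter in big endian byte order, independent of the host. */
static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Load a 32-bit counter that was stored in big endian byte order. */
static uint32_t
get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Userspace analogue of be32_add_cpu(): decode, add, re-encode. */
static void
suminfo_add(uint8_t *p, int delta)
{
	put_be32(p, get_be32(p) + (uint32_t)delta);
}
```

Because the encoding is done byte by byte, the bytes that land on disk are the same on x86 and powerpc, which is the property the raw host-endian counters lack.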
* [PATCH 13/45] xfs: encode the rtbitmap in little endian format 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (10 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 15/45] xfs: encode the rtsummary in big endian format Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 11/45] xfs: export the geometry of realtime groups to userspace Darrick J. Wong ` (32 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Currently, the ondisk realtime bitmap file is accessed in units of 32-bit words. There's no endian translation of the contents of this file, which means that Bad Things Happen(tm) if you move a filesystem from (say) x86 to powerpc. Since we have a new feature flag, let's take the opportunity to enforce an endianness on the file. The natural format of a bitmap is (IMHO) little endian, because the byte offsets of the bitmap data should always increase in step with the information being indexed. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_format.h | 4 +++- libxfs/xfs_rtbitmap.c | 8 +++++++- 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index 4096d3f069a..c7752aaa447 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -738,10 +738,12 @@ struct xfs_agfl { /* * Realtime bitmap information is accessed by the word, which is currently - * stored in host-endian format. + * stored in host-endian format. Starting with the realtime groups feature, + * the words are stored in le32 ondisk. 
*/ union xfs_rtword_ondisk { __u32 raw; + __le32 rtg; }; /* diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c index 2e286a22196..db80f740151 100644 --- a/libxfs/xfs_rtbitmap.c +++ b/libxfs/xfs_rtbitmap.c @@ -173,6 +173,9 @@ xfs_rtbitmap_getword( struct xfs_mount *mp, union xfs_rtword_ondisk *wordptr) { + if (xfs_has_rtgroups(mp)) + return le32_to_cpu(wordptr->rtg); + return wordptr->raw; } @@ -183,7 +186,10 @@ xfs_rtbitmap_setword( union xfs_rtword_ondisk *wordptr, xfs_rtword_t incore) { - wordptr->raw = incore; + if (xfs_has_rtgroups(mp)) + wordptr->rtg = cpu_to_le32(incore); + else + wordptr->raw = incore; } /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
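The "byte offsets increase in step with the information being indexed" argument in patch 13/45 can be checked with a small userspace sketch. It assumes bit 0 of each 32-bit word is the least significant bit; the helper names are made up for illustration. With little endian words, setting bit N a word at a time touches byte N / 8, so a reader that walks the bitmap a byte at a time sees the same bit numbering.

```c
#include <stdbool.h>
#include <stdint.h>

/* Store a 32-bit bitmap word in little endian byte order. */
static void
put_le32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)v;
	p[1] = (uint8_t)(v >> 8);
	p[2] = (uint8_t)(v >> 16);
	p[3] = (uint8_t)(v >> 24);
}

/* Load a 32-bit bitmap word stored in little endian byte order. */
static uint32_t
get_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Set one bit, operating on the bitmap a 32-bit word at a time. */
static void
bitmap_set_by_word(uint8_t *map, unsigned int bit)
{
	uint8_t		*wordp = map + (bit / 32) * 4;

	put_le32(wordp, get_le32(wordp) | (1U << (bit % 32)));
}

/* Test the same bit, operating on the bitmap a byte at a time. */
static bool
bitmap_test_by_byte(const uint8_t *map, unsigned int bit)
{
	return (map[bit / 8] >> (bit % 8)) & 1;
}
```

With big endian words the byte holding bit N would instead be (N / 32) * 4 + 3 - (N % 32) / 8, which is the awkwardness the commit message is alluding to.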
* [PATCH 11/45] xfs: export the geometry of realtime groups to userspace 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (11 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 13/45] xfs: encode the rtbitmap in little " Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 14/45] xfs: add block headers to realtime summary blocks Darrick J. Wong ` (31 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create an ioctl so that the kernel can report the status of realtime groups to userspace. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/util.c | 7 +++++++ libxfs/xfs_fs.h | 16 ++++++++++++++++ libxfs/xfs_health.h | 2 ++ libxfs/xfs_rtgroup.c | 14 ++++++++++++++ libxfs/xfs_rtgroup.h | 4 ++++ 5 files changed, 43 insertions(+) diff --git a/libxfs/util.c b/libxfs/util.c index 7b16d30b754..e8397fdc341 100644 --- a/libxfs/util.c +++ b/libxfs/util.c @@ -444,6 +444,13 @@ xfs_fs_mark_healthy( } void xfs_ag_geom_health(struct xfs_perag *pag, struct xfs_ag_geometry *ageo) { } +void +xfs_rtgroup_geom_health( + struct xfs_rtgroup *rtg, + struct xfs_rtgroup_geometry *rgeo) +{ + /* empty */ +} void xfs_fs_mark_sick(struct xfs_mount *mp, unsigned int mask) { } void xfs_agno_mark_sick(struct xfs_mount *mp, xfs_agnumber_t agno, unsigned int mask) { } diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index ba90649c54e..e3d87665e4a 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -299,6 +299,21 @@ struct xfs_ag_geometry { #define XFS_AG_GEOM_SICK_REFCNTBT (1 << 9) /* reference counts */ #define XFS_AG_GEOM_SICK_INODES (1 << 10) /* bad inodes were seen */ +/* + * Output for XFS_IOC_RTGROUP_GEOMETRY + */ +struct xfs_rtgroup_geometry { + uint32_t rg_number; /* i/o: rtgroup number */ + uint32_t rg_length; /* o: length in blocks */ + uint32_t rg_sick; /* 
o: sick things in rtgroup */ + uint32_t rg_checked; /* o: checked metadata in rtgroup */ + uint32_t rg_flags; /* i/o: flags for this rtgroup */ + uint32_t rg_pad; /* o: zero */ + uint64_t rg_reserved[13];/* o: zero */ +}; +#define XFS_RTGROUP_GEOM_SICK_SUPER (1 << 0) /* superblock */ +#define XFS_RTGROUP_GEOM_SICK_BITMAP (1 << 1) /* rtbitmap for this group */ + /* * Structures for XFS_IOC_FSGROWFSDATA, XFS_IOC_FSGROWFSLOG & XFS_IOC_FSGROWFSRT */ @@ -819,6 +834,7 @@ struct xfs_scrub_metadata { /* XFS_IOC_GETFSMAP ------ hoisted 59 */ #define XFS_IOC_SCRUB_METADATA _IOWR('X', 60, struct xfs_scrub_metadata) #define XFS_IOC_AG_GEOMETRY _IOWR('X', 61, struct xfs_ag_geometry) +#define XFS_IOC_RTGROUP_GEOMETRY _IOWR('X', 62, struct xfs_rtgroup_geometry) /* * ioctl commands that replace IRIX syssgi()'s diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h index 0beb4153a43..44137c4983f 100644 --- a/libxfs/xfs_health.h +++ b/libxfs/xfs_health.h @@ -286,6 +286,8 @@ xfs_inode_is_healthy(struct xfs_inode *ip) void xfs_fsop_geom_health(struct xfs_mount *mp, struct xfs_fsop_geom *geo); void xfs_ag_geom_health(struct xfs_perag *pag, struct xfs_ag_geometry *ageo); +void xfs_rtgroup_geom_health(struct xfs_rtgroup *rtg, + struct xfs_rtgroup_geometry *rgeo); void xfs_bulkstat_health(struct xfs_inode *ip, struct xfs_bulkstat *bs); #define xfs_metadata_is_sick(error) \ diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c index 86751cb8d31..ebbd0d13a8a 100644 --- a/libxfs/xfs_rtgroup.c +++ b/libxfs/xfs_rtgroup.c @@ -529,3 +529,17 @@ xfs_rtgroup_unlock( else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) xfs_rtbitmap_unlock_shared(rtg->rtg_mount, XFS_RBMLOCK_BITMAP); } + +/* Retrieve rt group geometry. */ +int +xfs_rtgroup_get_geometry( + struct xfs_rtgroup *rtg, + struct xfs_rtgroup_geometry *rgeo) +{ + /* Fill out form. 
*/ + memset(rgeo, 0, sizeof(*rgeo)); + rgeo->rg_number = rtg->rtg_rgno; + rgeo->rg_length = rtg->rtg_blockcount; + xfs_rtgroup_geom_health(rtg, rgeo); + return 0; +} diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h index b1e53af5a65..1fec49c496d 100644 --- a/libxfs/xfs_rtgroup.h +++ b/libxfs/xfs_rtgroup.h @@ -222,6 +222,9 @@ int xfs_rtgroup_update_secondary_sbs(struct xfs_mount *mp); void xfs_rtgroup_lock(struct xfs_trans *tp, struct xfs_rtgroup *rtg, unsigned int rtglock_flags); void xfs_rtgroup_unlock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags); + +int xfs_rtgroup_get_geometry(struct xfs_rtgroup *rtg, + struct xfs_rtgroup_geometry *rgeo); #else # define xfs_rtgroup_block_count(mp, rgno) (0) # define xfs_rtgroup_update_super(bp, sb_bp) ((void)0) @@ -229,6 +232,7 @@ void xfs_rtgroup_unlock(struct xfs_rtgroup *rtg, unsigned int rtglock_flags); # define xfs_rtgroup_update_secondary_sbs(mp) (0) # define xfs_rtgroup_lock(tp, rtg, gf) ((void)0) # define xfs_rtgroup_unlock(rtg, gf) ((void)0) +# define xfs_rtgroup_get_geometry(rtg, rgeo) (-EOPNOTSUPP) #endif /* CONFIG_XFS_RT */ #endif /* __LIBXFS_RTGROUP_H */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
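For userspace folks, a rough sketch of how a caller might drive the ioctl from patch 11/45. This assumes a kernel carrying this patchset; the structure and ioctl number are copied locally from the patch rather than taken from an installed header, and query_rtgroup_geometry() is a hypothetical wrapper, not an existing libfrog or libxfs function.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copy of the structure as proposed in this patch (128 bytes). */
struct xfs_rtgroup_geometry {
	uint32_t	rg_number;	/* i/o: rtgroup number */
	uint32_t	rg_length;	/* o: length in blocks */
	uint32_t	rg_sick;	/* o: sick things in rtgroup */
	uint32_t	rg_checked;	/* o: checked metadata in rtgroup */
	uint32_t	rg_flags;	/* i/o: flags for this rtgroup */
	uint32_t	rg_pad;		/* o: zero */
	uint64_t	rg_reserved[13];/* o: zero */
};

/* ioctl number as proposed in this patch. */
#define XFS_IOC_RTGROUP_GEOMETRY \
	_IOWR('X', 62, struct xfs_rtgroup_geometry)

/*
 * Hypothetical wrapper: ask about rtgroup @rgno on the filesystem open at
 * @fd.  Returns 0 on success or -1 with errno set, like ioctl() itself.
 */
static int
query_rtgroup_geometry(int fd, uint32_t rgno,
		       struct xfs_rtgroup_geometry *rgeo)
{
	/* Reserved and pad fields are zeroed before the call. */
	memset(rgeo, 0, sizeof(*rgeo));
	rgeo->rg_number = rgno;
	return ioctl(fd, XFS_IOC_RTGROUP_GEOMETRY, rgeo);
}
```

On success the caller would inspect rg_length, and test rg_sick and rg_checked against the XFS_RTGROUP_GEOM_SICK_* bits defined in the patch.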
* [PATCH 14/45] xfs: add block headers to realtime summary blocks 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (12 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 11/45] xfs: export the geometry of realtime groups to userspace Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 16/45] xfs: store rtgroup information with a bmap intent Darrick J. Wong ` (30 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Upgrade rtsummary blocks to have self describing metadata like most every other thing in XFS. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_format.h | 1 + libxfs/xfs_rtbitmap.c | 18 +++++++++++++++--- libxfs/xfs_rtbitmap.h | 18 ++++++++++++++++-- libxfs/xfs_shared.h | 1 + 4 files changed, 33 insertions(+), 5 deletions(-) diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index c7752aaa447..47b2e31e256 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1287,6 +1287,7 @@ static inline bool xfs_dinode_has_large_extent_counts( * RT bit manipulation macros. 
*/ #define XFS_RTBITMAP_MAGIC 0x424D505A /* BMPZ */ +#define XFS_RTSUMMARY_MAGIC 0x53554D59 /* SUMY */ struct xfs_rtbuf_blkinfo { __be32 rt_magic; /* validity check on block */ diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c index db80f740151..4ed3bd261f6 100644 --- a/libxfs/xfs_rtbitmap.c +++ b/libxfs/xfs_rtbitmap.c @@ -49,7 +49,7 @@ xfs_rtbuf_verify_read( struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; xfs_failaddr_t fa; - if (!xfs_has_rtgroups(mp) || bp->b_ops != &xfs_rtbitmap_buf_ops) + if (!xfs_has_rtgroups(mp)) return; if (!xfs_log_check_lsn(mp, be64_to_cpu(hdr->rt_lsn))) { @@ -80,7 +80,7 @@ xfs_rtbuf_verify_write( struct xfs_buf_log_item *bip = bp->b_log_item; xfs_failaddr_t fa; - if (!xfs_has_rtgroups(mp) || bp->b_ops != &xfs_rtbitmap_buf_ops) + if (!xfs_has_rtgroups(mp)) return; fa = xfs_rtbuf_verify(bp); @@ -108,6 +108,14 @@ const struct xfs_buf_ops xfs_rtbitmap_buf_ops = { .verify_struct = xfs_rtbuf_verify, }; +const struct xfs_buf_ops xfs_rtsummary_buf_ops = { + .name = "xfs_rtsummary", + .magic = { 0, cpu_to_be32(XFS_RTSUMMARY_MAGIC) }, + .verify_read = xfs_rtbuf_verify_read, + .verify_write = xfs_rtbuf_verify_write, + .verify_struct = xfs_rtbuf_verify, +}; + /* * Get a buffer for the bitmap or summary file block specified. * The buffer is returned read and locked. 
@@ -149,7 +157,7 @@ xfs_rtbuf_get( if (error) return error; - if (xfs_has_rtgroups(mp) && !issum) { + if (xfs_has_rtgroups(mp)) { struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; if (hdr->rt_owner != cpu_to_be64(ip->i_ino)) { @@ -1317,6 +1325,10 @@ xfs_rtsummary_blockcount( unsigned long long rsumwords; rsumwords = (unsigned long long)rsumlevels * rbmblocks; + + if (xfs_has_rtgroups(mp)) + return howmany_64(rsumwords, mp->m_blockwsize); + return XFS_B_TO_FSB(mp, rsumwords << XFS_WORDLOG); } diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index c1f740fd27b..cebbb72c437 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -181,6 +181,9 @@ xfs_rtsumoffs_to_block( struct xfs_mount *mp, xfs_rtsumoff_t rsumoff) { + if (xfs_has_rtgroups(mp)) + return rsumoff / mp->m_blockwsize; + return XFS_B_TO_FSBT(mp, rsumoff * sizeof(xfs_suminfo_t)); } @@ -195,16 +198,24 @@ xfs_rtsumoffs_to_infoword( { unsigned int mask = mp->m_blockmask >> XFS_SUMINFOLOG; + if (xfs_has_rtgroups(mp)) + return rsumoff % mp->m_blockwsize; + return rsumoff & mask; } /* Return a pointer to a summary info word within a rt summary block buffer. 
*/ static inline union xfs_suminfo_ondisk * xfs_rsumbuf_infoptr( + struct xfs_mount *mp, void *buf, unsigned int infoword) { union xfs_suminfo_ondisk *infop = buf; + struct xfs_rtbuf_blkinfo *hdr = buf; + + if (xfs_has_rtgroups(mp)) + infop = (union xfs_suminfo_ondisk *)(hdr + 1); return &infop[infoword]; } @@ -215,7 +226,7 @@ xfs_rsumblock_infoptr( struct xfs_buf *bp, unsigned int infoword) { - return xfs_rsumbuf_infoptr(bp->b_addr, infoword); + return xfs_rsumbuf_infoptr(bp->b_mount, bp->b_addr, infoword); } static inline const struct xfs_buf_ops * @@ -223,8 +234,11 @@ xfs_rtblock_ops( struct xfs_mount *mp, bool issum) { - if (xfs_has_rtgroups(mp) && !issum) + if (xfs_has_rtgroups(mp)) { + if (issum) + return &xfs_rtsummary_buf_ops; return &xfs_rtbitmap_buf_ops; + } return &xfs_rtbuf_ops; } diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h index 1c86163915c..62839fc87b5 100644 --- a/libxfs/xfs_shared.h +++ b/libxfs/xfs_shared.h @@ -39,6 +39,7 @@ extern const struct xfs_buf_ops xfs_inode_buf_ra_ops; extern const struct xfs_buf_ops xfs_refcountbt_buf_ops; extern const struct xfs_buf_ops xfs_rmapbt_buf_ops; extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops; +extern const struct xfs_buf_ops xfs_rtsummary_buf_ops; extern const struct xfs_buf_ops xfs_rtbuf_ops; extern const struct xfs_buf_ops xfs_rtsb_buf_ops; extern const struct xfs_buf_ops xfs_sb_buf_ops; ^ permalink raw reply related [flat|nested] 565+ messages in thread
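A quick worked example of the xfs_rtsummary_blockcount() change in patch 14/45, as a userspace sketch. It assumes the header is 48 bytes (the sum of the field sizes of struct xfs_rtbuf_blkinfo in patch 12/45) and XFS_WORDLOG is 2; rtsummary_blockcount() here is a simplified stand-in that takes the block size and feature bit as plain arguments instead of a struct xfs_mount.

```c
#include <stdbool.h>
#include <stdint.h>

#define XFS_WORDLOG	2	/* log2(bytes per 32-bit word) */
#define BLKINFO_BYTES	48	/* assumed sizeof(struct xfs_rtbuf_blkinfo) */

/* Round-up division, same idea as the howmany_64() used in the patch. */
static uint64_t
howmany_64(uint64_t x, uint64_t y)
{
	return (x + y - 1) / y;
}

/* Number of fs blocks needed to hold @rsumwords summary counters. */
static uint64_t
rtsummary_blockcount(unsigned int blocksize, uint64_t rsumwords,
		     bool has_rtgroups)
{
	unsigned int	blockwsize;

	/* Old layout: counters fill the whole block. */
	if (!has_rtgroups)
		return howmany_64(rsumwords << XFS_WORDLOG, blocksize);

	/* New layout: each block loses the header before the counters. */
	blockwsize = (blocksize - BLKINFO_BYTES) >> XFS_WORDLOG;
	return howmany_64(rsumwords, blockwsize);
}
```

With 4096-byte blocks, 1024 counters used to fit in exactly one block; with the header each block only holds 1012 counters, so the same summary now needs two blocks.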
* [PATCH 16/45] xfs: store rtgroup information with a bmap intent 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (13 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 14/45] xfs: add block headers to realtime summary blocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 12/45] xfs: add block headers to realtime bitmap blocks Darrick J. Wong ` (29 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make the bmap intent items take an active reference to the rtgroup containing the space that is being mapped or unmapped. We will need this functionality once we start enabling rmap and reflink on the rt volume. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/defer_item.c | 17 +++++++++++++++-- libxfs/xfs_bmap.h | 5 ++++- 2 files changed, 19 insertions(+), 3 deletions(-) diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c index 316cc87a802..be6ecbc348f 100644 --- a/libxfs/defer_item.c +++ b/libxfs/defer_item.c @@ -479,8 +479,18 @@ xfs_bmap_update_get_group( { xfs_agnumber_t agno; - if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork)) + if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork)) { + if (xfs_has_rtgroups(mp)) { + xfs_rgnumber_t rgno; + + rgno = xfs_rtb_to_rgno(mp, bi->bi_bmap.br_startblock); + bi->bi_rtg = xfs_rtgroup_get(mp, rgno); + } else { + bi->bi_rtg = NULL; + } + return; + } agno = XFS_FSB_TO_AGNO(mp, bi->bi_bmap.br_startblock); bi->bi_pag = xfs_perag_get(mp, agno); @@ -500,8 +510,11 @@ static inline void xfs_bmap_update_put_group( struct xfs_bmap_intent *bi) { - if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork)) + if (xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork)) { + if (xfs_has_rtgroups(bi->bi_owner->i_mount)) + xfs_rtgroup_put(bi->bi_rtg); return; + } 
xfs_perag_drop_intents(bi->bi_pag); xfs_perag_put(bi->bi_pag); diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h index d870c6a62e4..05097b1d5c7 100644 --- a/libxfs/xfs_bmap.h +++ b/libxfs/xfs_bmap.h @@ -241,7 +241,10 @@ struct xfs_bmap_intent { enum xfs_bmap_intent_type bi_type; int bi_whichfork; struct xfs_inode *bi_owner; - struct xfs_perag *bi_pag; + union { + struct xfs_perag *bi_pag; + struct xfs_rtgroup *bi_rtg; + }; struct xfs_bmbt_irec bi_bmap; }; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 12/45] xfs: add block headers to realtime bitmap blocks 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (14 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 16/45] xfs: store rtgroup information with a bmap intent Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 17/45] xfs: scrub the realtime group superblock Darrick J. Wong ` (28 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Upgrade rtbitmap blocks to have self describing metadata like most every other thing in XFS. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/xfs_mount.h | 3 +- libxfs/xfs_format.h | 14 +++++++ libxfs/xfs_rtbitmap.c | 98 +++++++++++++++++++++++++++++++++++++++++++++---- libxfs/xfs_rtbitmap.h | 30 +++++++++++++++ libxfs/xfs_sb.c | 18 +++++++-- libxfs/xfs_shared.h | 1 + 6 files changed, 149 insertions(+), 15 deletions(-) diff --git a/include/xfs_mount.h b/include/xfs_mount.h index ed19b15fcb5..040e594f721 100644 --- a/include/xfs_mount.h +++ b/include/xfs_mount.h @@ -72,7 +72,8 @@ typedef struct xfs_mount { int8_t m_rgblklog; /* log2 of rt group sz if possible */ uint m_blockmask; /* sb_blocksize-1 */ uint m_blockwsize; /* sb_blocksize in words */ - uint m_blockwmask; /* blockwsize-1 */ + /* number of rt extents per rt bitmap block if rtgroups enabled */ + unsigned int m_rtx_per_rbmblock; uint m_alloc_mxr[2]; /* XFS_ALLOC_BLOCK_MAXRECS */ uint m_alloc_mnr[2]; /* XFS_ALLOC_BLOCK_MINRECS */ uint m_bmap_dmxr[2]; /* XFS_BMAP_BLOCK_DMAXRECS */ diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index a38e1499bd4..4096d3f069a 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1284,6 +1284,20 @@ static inline bool xfs_dinode_has_large_extent_counts( /* * RT bit manipulation macros. 
*/ +#define XFS_RTBITMAP_MAGIC 0x424D505A /* BMPZ */ + +struct xfs_rtbuf_blkinfo { + __be32 rt_magic; /* validity check on block */ + __be32 rt_crc; /* CRC of block */ + __be64 rt_owner; /* inode that owns the block */ + __be64 rt_blkno; /* first block of the buffer */ + __be64 rt_lsn; /* sequence number of last write */ + uuid_t rt_uuid; /* filesystem we belong to */ +}; + +#define XFS_RTBUF_CRC_OFF \ + offsetof(struct xfs_rtbuf_blkinfo, rt_crc) + #define XFS_RTMIN(a,b) ((a) < (b) ? (a) : (b)) #define XFS_RTMAX(a,b) ((a) > (b) ? (a) : (b)) diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c index 5d402b250c8..2e286a22196 100644 --- a/libxfs/xfs_rtbitmap.c +++ b/libxfs/xfs_rtbitmap.c @@ -21,23 +21,77 @@ * Realtime allocator bitmap functions shared with userspace. */ -/* - * Real time buffers need verifiers to avoid runtime warnings during IO. - * We don't have anything to verify, however, so these are just dummy - * operations. - */ +static xfs_failaddr_t +xfs_rtbuf_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_mount; + struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; + + if (!xfs_verify_magic(bp, hdr->rt_magic)) + return __this_address; + if (!xfs_has_rtgroups(mp)) + return __this_address; + if (!xfs_has_crc(mp)) + return __this_address; + if (!uuid_equal(&hdr->rt_uuid, &mp->m_sb.sb_meta_uuid)) + return __this_address; + if (hdr->rt_blkno != cpu_to_be64(xfs_buf_daddr(bp))) + return __this_address; + return NULL; +} + static void xfs_rtbuf_verify_read( - struct xfs_buf *bp) + struct xfs_buf *bp) { + struct xfs_mount *mp = bp->b_mount; + struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; + xfs_failaddr_t fa; + + if (!xfs_has_rtgroups(mp) || bp->b_ops != &xfs_rtbitmap_buf_ops) + return; + + if (!xfs_log_check_lsn(mp, be64_to_cpu(hdr->rt_lsn))) { + fa = __this_address; + goto fail; + } + + if (!xfs_buf_verify_cksum(bp, XFS_RTBUF_CRC_OFF)) { + fa = __this_address; + goto fail; + } + + fa = xfs_rtbuf_verify(bp); + if (fa) + goto fail; + return; +fail: + 
xfs_verifier_error(bp, -EFSCORRUPTED, fa); } static void xfs_rtbuf_verify_write( struct xfs_buf *bp) { - return; + struct xfs_mount *mp = bp->b_mount; + struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; + struct xfs_buf_log_item *bip = bp->b_log_item; + xfs_failaddr_t fa; + + if (!xfs_has_rtgroups(mp) || bp->b_ops != &xfs_rtbitmap_buf_ops) + return; + + fa = xfs_rtbuf_verify(bp); + if (fa) { + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + return; + } + + if (bip) + hdr->rt_lsn = cpu_to_be64(bip->bli_item.li_lsn); + xfs_buf_update_cksum(bp, XFS_RTBUF_CRC_OFF); } const struct xfs_buf_ops xfs_rtbuf_ops = { @@ -46,6 +100,14 @@ const struct xfs_buf_ops xfs_rtbuf_ops = { .verify_write = xfs_rtbuf_verify_write, }; +const struct xfs_buf_ops xfs_rtbitmap_buf_ops = { + .name = "xfs_rtbitmap", + .magic = { 0, cpu_to_be32(XFS_RTBITMAP_MAGIC) }, + .verify_read = xfs_rtbuf_verify_read, + .verify_write = xfs_rtbuf_verify_write, + .verify_struct = xfs_rtbuf_verify, +}; + /* * Get a buffer for the bitmap or summary file block specified. * The buffer is returned read and locked. @@ -79,13 +141,26 @@ xfs_rtbuf_get( ASSERT(map.br_startblock != NULLFSBLOCK); error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp, XFS_FSB_TO_DADDR(mp, map.br_startblock), - mp->m_bsize, 0, &bp, &xfs_rtbuf_ops); + mp->m_bsize, 0, &bp, + xfs_rtblock_ops(mp, issum)); if (xfs_metadata_is_sick(error)) xfs_rt_mark_sick(mp, issum ? XFS_SICK_RT_SUMMARY : XFS_SICK_RT_BITMAP); if (error) return error; + if (xfs_has_rtgroups(mp) && !issum) { + struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; + + if (hdr->rt_owner != cpu_to_be64(ip->i_ino)) { + xfs_buf_mark_corrupt(bp); + xfs_trans_brelse(tp, bp); + xfs_rt_mark_sick(mp, issum ? XFS_SICK_RT_SUMMARY : + XFS_SICK_RT_BITMAP); + return -EFSCORRUPTED; + } + } + xfs_trans_buf_set_type(tp, bp, issum ? 
XFS_BLFT_RTSUMMARY_BUF : XFS_BLFT_RTBITMAP_BUF); *bpp = bp; @@ -1203,7 +1278,12 @@ xfs_rtbitmap_blockcount( struct xfs_mount *mp, xfs_rtbxlen_t rtextents) { - return howmany_64(rtextents, NBBY * mp->m_sb.sb_blocksize); + unsigned int rbmblock_bytes = mp->m_sb.sb_blocksize; + + if (xfs_has_rtgroups(mp)) + rbmblock_bytes -= sizeof(struct xfs_rtbuf_blkinfo); + + return howmany_64(rtextents, NBBY * rbmblock_bytes); } /* diff --git a/libxfs/xfs_rtbitmap.h b/libxfs/xfs_rtbitmap.h index f6a2a48973a..c1f740fd27b 100644 --- a/libxfs/xfs_rtbitmap.h +++ b/libxfs/xfs_rtbitmap.h @@ -100,6 +100,9 @@ xfs_rtx_to_rbmblock( struct xfs_mount *mp, xfs_rtxnum_t rtx) { + if (xfs_has_rtgroups(mp)) + return div_u64(rtx, mp->m_rtx_per_rbmblock); + return rtx >> mp->m_blkbit_log; } @@ -109,6 +112,13 @@ xfs_rtx_to_rbmword( struct xfs_mount *mp, xfs_rtxnum_t rtx) { + if (xfs_has_rtgroups(mp)) { + unsigned int mod; + + div_u64_rem(rtx >> XFS_NBWORDLOG, mp->m_blockwsize, &mod); + return mod; + } + return (rtx >> XFS_NBWORDLOG) & (mp->m_blockwsize - 1); } @@ -118,16 +128,24 @@ xfs_rbmblock_to_rtx( struct xfs_mount *mp, xfs_fileoff_t rbmoff) { + if (xfs_has_rtgroups(mp)) + return rbmoff * mp->m_rtx_per_rbmblock; + return rbmoff << mp->m_blkbit_log; } /* Return a pointer to a bitmap word within a rt bitmap block buffer. 
*/ static inline union xfs_rtword_ondisk * xfs_rbmbuf_wordptr( + struct xfs_mount *mp, void *buf, unsigned int rbmword) { union xfs_rtword_ondisk *wordp = buf; + struct xfs_rtbuf_blkinfo *hdr = buf; + + if (xfs_has_rtgroups(mp)) + wordp = (union xfs_rtword_ondisk *)(hdr + 1); return &wordp[rbmword]; } @@ -138,7 +156,7 @@ xfs_rbmblock_wordptr( struct xfs_buf *bp, unsigned int rbmword) { - return xfs_rbmbuf_wordptr(bp->b_addr, rbmword); + return xfs_rbmbuf_wordptr(bp->b_mount, bp->b_addr, rbmword); } /* @@ -200,6 +218,16 @@ xfs_rsumblock_infoptr( return xfs_rsumbuf_infoptr(bp->b_addr, infoword); } +static inline const struct xfs_buf_ops * +xfs_rtblock_ops( + struct xfs_mount *mp, + bool issum) +{ + if (xfs_has_rtgroups(mp) && !issum) + return &xfs_rtbitmap_buf_ops; + return &xfs_rtbuf_ops; +} + /* * Functions for walking free space rtextents in the realtime bitmap. */ diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c index 1bcffb24761..94dc93ed5dc 100644 --- a/libxfs/xfs_sb.c +++ b/libxfs/xfs_sb.c @@ -516,10 +516,15 @@ xfs_validate_sb_common( } else { uint64_t rexts; uint64_t rbmblocks; + unsigned int rbmblock_bytes = sbp->sb_blocksize; rexts = div_u64(sbp->sb_rblocks, sbp->sb_rextsize); - rbmblocks = howmany_64(sbp->sb_rextents, - NBBY * sbp->sb_blocksize); + + if (xfs_sb_is_v5(sbp) && + (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_RTGROUPS)) + rbmblock_bytes -= sizeof(struct xfs_rtbuf_blkinfo); + + rbmblocks = howmany_64(sbp->sb_rextents, NBBY * rbmblock_bytes); if (sbp->sb_rextents != rexts || sbp->sb_rextslog != xfs_highbit32(sbp->sb_rextents) || @@ -1030,8 +1035,13 @@ xfs_sb_mount_common( mp->m_sectbb_log = sbp->sb_sectlog - BBSHIFT; mp->m_agno_log = xfs_highbit32(sbp->sb_agcount - 1) + 1; mp->m_blockmask = sbp->sb_blocksize - 1; - mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG; - mp->m_blockwmask = mp->m_blockwsize - 1; + if (xfs_has_rtgroups(mp)) + mp->m_blockwsize = (sbp->sb_blocksize - + sizeof(struct xfs_rtbuf_blkinfo)) >> + XFS_WORDLOG; + else + 
mp->m_blockwsize = sbp->sb_blocksize >> XFS_WORDLOG; + mp->m_rtx_per_rbmblock = mp->m_blockwsize << XFS_NBWORDLOG; mp->m_rtxblklog = log2_if_power2(sbp->sb_rextsize); mp->m_rtxblkmask = mask64_if_power2(sbp->sb_rextsize); mp->m_rgblklog = log2_if_power2(sbp->sb_rgblocks); diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h index bcdf298889a..1c86163915c 100644 --- a/libxfs/xfs_shared.h +++ b/libxfs/xfs_shared.h @@ -38,6 +38,7 @@ extern const struct xfs_buf_ops xfs_inode_buf_ops; extern const struct xfs_buf_ops xfs_inode_buf_ra_ops; extern const struct xfs_buf_ops xfs_refcountbt_buf_ops; extern const struct xfs_buf_ops xfs_rmapbt_buf_ops; +extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops; extern const struct xfs_buf_ops xfs_rtbuf_ops; extern const struct xfs_buf_ops xfs_rtsb_buf_ops; extern const struct xfs_buf_ops xfs_sb_buf_ops; ^ permalink raw reply related [flat|nested] 565+ messages in thread
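To make the geometry change in patch 12/45 concrete, here is a userspace sketch of the m_blockwsize and m_rtx_per_rbmblock computation from xfs_sb_mount_common() and the two mapping helpers for the rtgroups case. The struct below is a stand-in with the same field sizes as struct xfs_rtbuf_blkinfo (endian annotations dropped), XFS_WORDLOG and XFS_NBWORDLOG are assumed to be 2 and 5, and struct rbm_geom and the function names are invented for the example.

```c
#include <stdint.h>

#define XFS_WORDLOG	2	/* log2(bytes per 32-bit word) */
#define XFS_NBWORDLOG	5	/* log2(bits per 32-bit word) */

/* Stand-in for the ondisk header; same field sizes, 48 bytes total. */
struct rtbuf_blkinfo {
	uint32_t	rt_magic;
	uint32_t	rt_crc;
	uint64_t	rt_owner;
	uint64_t	rt_blkno;
	uint64_t	rt_lsn;
	uint8_t		rt_uuid[16];
};

/* Hypothetical holder for the two derived mount fields. */
struct rbm_geom {
	unsigned int	blockwsize;		/* bitmap words per block */
	unsigned int	rtx_per_rbmblock;	/* rt extents per block */
};

static struct rbm_geom
rbm_geom_init(unsigned int blocksize)
{
	struct rbm_geom	g;

	g.blockwsize = (unsigned int)
		((blocksize - sizeof(struct rtbuf_blkinfo)) >> XFS_WORDLOG);
	g.rtx_per_rbmblock = g.blockwsize << XFS_NBWORDLOG;
	return g;
}

/* Which bitmap block tracks this rt extent? */
static uint64_t
rtx_to_rbmblock(const struct rbm_geom *g, uint64_t rtx)
{
	return rtx / g->rtx_per_rbmblock;
}

/* Which word within that block (after the header) tracks it? */
static unsigned int
rtx_to_rbmword(const struct rbm_geom *g, uint64_t rtx)
{
	return (unsigned int)((rtx >> XFS_NBWORDLOG) % g->blockwsize);
}
```

For a 4096-byte block this gives 1012 words and 32384 rt extents per bitmap block. Neither is a power of two, which would explain why the patch switches the rtgroups paths from shifts and masks to div_u64() and div_u64_rem().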
* [PATCH 17/45] xfs: scrub the realtime group superblock 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (15 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 12/45] xfs: add block headers to realtime bitmap blocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 19/45] xfs: scrub the rtbitmap by group Darrick J. Wong ` (27 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Enable scrubbing of realtime group superblocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- io/scrub.c | 15 +++++++++++++++ libfrog/scrub.c | 5 +++++ libfrog/scrub.h | 1 + libxfs/xfs_fs.h | 3 ++- scrub/repair.c | 2 ++ scrub/scrub.c | 4 ++++ 6 files changed, 29 insertions(+), 1 deletion(-) diff --git a/io/scrub.c b/io/scrub.c index 0ad1b0229cc..d764a5a997b 100644 --- a/io/scrub.c +++ b/io/scrub.c @@ -59,6 +59,7 @@ scrub_ioctl( switch (sc->group) { case XFROG_SCRUB_GROUP_AGHEADER: case XFROG_SCRUB_GROUP_PERAG: + case XFROG_SCRUB_GROUP_RTGROUP: meta.sm_agno = control; break; case XFROG_SCRUB_GROUP_INODE: @@ -178,6 +179,19 @@ parse_args( return 0; } break; + case XFROG_SCRUB_GROUP_RTGROUP: + if (optind != argc - 1) { + fprintf(stderr, + _("Must specify one rtgroup number.\n")); + return 0; + } + control = strtoul(argv[optind], &p, 0); + if (*p != '\0') { + fprintf(stderr, + _("Bad rtgroup number '%s'.\n"), argv[optind]); + return 0; + } + break; default: ASSERT(0); break; @@ -255,6 +269,7 @@ repair_ioctl( switch (sc->group) { case XFROG_SCRUB_GROUP_AGHEADER: case XFROG_SCRUB_GROUP_PERAG: + case XFROG_SCRUB_GROUP_RTGROUP: meta.sm_agno = control; break; case XFROG_SCRUB_GROUP_INODE: diff --git a/libfrog/scrub.c b/libfrog/scrub.c index 3e322b4717d..7d6c9c69e4a 100644 --- a/libfrog/scrub.c +++ b/libfrog/scrub.c @@ -149,6 +149,11 @@ const struct xfrog_scrub_descr 
xfrog_scrubbers[XFS_SCRUB_TYPE_NR] = { .descr = "retained health records", .group = XFROG_SCRUB_GROUP_NONE, }, + [XFS_SCRUB_TYPE_RGSUPER] = { + .name = "rgsuper", + .descr = "realtime group superblock", + .group = XFROG_SCRUB_GROUP_RTGROUP, + }, }; #undef DEP diff --git a/libfrog/scrub.h b/libfrog/scrub.h index a59371fe141..7155e6a9b0e 100644 --- a/libfrog/scrub.h +++ b/libfrog/scrub.h @@ -15,6 +15,7 @@ enum xfrog_scrub_group { XFROG_SCRUB_GROUP_INODE, /* per-inode metadata */ XFROG_SCRUB_GROUP_ISCAN, /* metadata requiring full inode scan */ XFROG_SCRUB_GROUP_SUMMARY, /* summary metadata */ + XFROG_SCRUB_GROUP_RTGROUP, /* per-rtgroup metadata */ }; /* Catalog of scrub types and names, indexed by XFS_SCRUB_TYPE_* */ diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index e3d87665e4a..c12be9dbb59 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -741,9 +741,10 @@ struct xfs_scrub_metadata { #define XFS_SCRUB_TYPE_QUOTACHECK 25 /* quota counters */ #define XFS_SCRUB_TYPE_NLINKS 26 /* inode link counts */ #define XFS_SCRUB_TYPE_HEALTHY 27 /* everything checked out ok */ +#define XFS_SCRUB_TYPE_RGSUPER 28 /* realtime superblock */ /* Number of scrub subcommands. */ -#define XFS_SCRUB_TYPE_NR 28 +#define XFS_SCRUB_TYPE_NR 29 /* i: Repair this metadata. 
*/ #define XFS_SCRUB_IFLAG_REPAIR (1u << 0) diff --git a/scrub/repair.c b/scrub/repair.c index 6629125578c..10db103c87f 100644 --- a/scrub/repair.c +++ b/scrub/repair.c @@ -108,6 +108,7 @@ xfs_repair_metadata( switch (xfrog_scrubbers[scrub_type].group) { case XFROG_SCRUB_GROUP_AGHEADER: case XFROG_SCRUB_GROUP_PERAG: + case XFROG_SCRUB_GROUP_RTGROUP: meta.sm_agno = sri->sri_agno; break; case XFROG_SCRUB_GROUP_INODE: @@ -412,6 +413,7 @@ repair_item_difficulty( case XFS_SCRUB_TYPE_REFCNTBT: case XFS_SCRUB_TYPE_RTBITMAP: case XFS_SCRUB_TYPE_RTSUM: + case XFS_SCRUB_TYPE_RGSUPER: ret |= REPAIR_DIFFICULTY_PRIMARY; break; } diff --git a/scrub/scrub.c b/scrub/scrub.c index 19c35bfd907..a6d5ec056c8 100644 --- a/scrub/scrub.c +++ b/scrub/scrub.c @@ -49,6 +49,9 @@ format_scrub_descr( case XFROG_SCRUB_GROUP_ISCAN: case XFROG_SCRUB_GROUP_NONE: return snprintf(buf, buflen, _("%s"), _(sc->descr)); + case XFROG_SCRUB_GROUP_RTGROUP: + return snprintf(buf, buflen, _("rtgroup %u %s"), meta->sm_agno, + _(sc->descr)); } return -1; } @@ -97,6 +100,7 @@ xfs_check_metadata( switch (group) { case XFROG_SCRUB_GROUP_AGHEADER: case XFROG_SCRUB_GROUP_PERAG: + case XFROG_SCRUB_GROUP_RTGROUP: meta.sm_agno = sri->sri_agno; break; case XFROG_SCRUB_GROUP_METAFILES: ^ permalink raw reply related [flat|nested] 565+ messages in thread
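A note on the rtgroup argument parsing in io/scrub.c above: the check is only that strtoul() consumed the whole string. The following is a minimal sketch of a stricter parse, not part of the patch. parse_group_number() is a hypothetical helper name, and the 32-bit bound is an assumption about the width of an rtgroup number.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not in xfsprogs): parse a group number from a
 * command-line argument.  Rejects empty strings, trailing junk, and
 * values that do not fit in 32 bits.
 */
bool
parse_group_number(
	const char	*arg,
	uint32_t	*out)
{
	char		*end;
	unsigned long	val;

	errno = 0;
	val = strtoul(arg, &end, 0);

	/* No digits consumed, or something left over after the number. */
	if (end == arg || *end != '\0')
		return false;

	/* Overflowed unsigned long, or too wide for a 32-bit group number. */
	if (errno == ERANGE || val > UINT32_MAX)
		return false;

	*out = (uint32_t)val;
	return true;
}
```

Base 0 is kept so that hex and octal input behave the same as in the patch.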
* [PATCH 19/45] xfs: scrub the rtbitmap by group 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (16 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 17/45] xfs: scrub the realtime group superblock Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 22/45] xfs_repair: improve rtbitmap discrepancy reporting Darrick J. Wong ` (26 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Reduce the amount of time that the kernel spends with the rtbitmap locked for a scrub by splitting the work by rtgroup. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libfrog/scrub.c | 5 +++++ libxfs/xfs_fs.h | 3 ++- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/libfrog/scrub.c b/libfrog/scrub.c index 7d6c9c69e4a..7efb7ecfbd0 100644 --- a/libfrog/scrub.c +++ b/libfrog/scrub.c @@ -154,6 +154,11 @@ const struct xfrog_scrub_descr xfrog_scrubbers[XFS_SCRUB_TYPE_NR] = { .descr = "realtime group superblock", .group = XFROG_SCRUB_GROUP_RTGROUP, }, + [XFS_SCRUB_TYPE_RGBITMAP] = { + .name = "rgbitmap", + .descr = "realtime group bitmap", + .group = XFROG_SCRUB_GROUP_RTGROUP, + }, }; #undef DEP diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index c12be9dbb59..7e9d7d7bb40 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -742,9 +742,10 @@ struct xfs_scrub_metadata { #define XFS_SCRUB_TYPE_NLINKS 26 /* inode link counts */ #define XFS_SCRUB_TYPE_HEALTHY 27 /* everything checked out ok */ #define XFS_SCRUB_TYPE_RGSUPER 28 /* realtime superblock */ +#define XFS_SCRUB_TYPE_RGBITMAP 29 /* realtime group bitmap */ /* Number of scrub subcommands. */ -#define XFS_SCRUB_TYPE_NR 29 +#define XFS_SCRUB_TYPE_NR 30 /* i: Repair this metadata. */ #define XFS_SCRUB_IFLAG_REPAIR (1u << 0) ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 22/45] xfs_repair: improve rtbitmap discrepancy reporting 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (17 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 19/45] xfs: scrub the rtbitmap by group Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 25/45] xfs_db: listify the definition of enum typnm Darrick J. Wong ` (25 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Improve the reporting of discrepancies in the realtime bitmap and summary files by creating a separate helper function that will pinpoint the exact (word) locations of mismatches. This will help developers to diagnose problems with the rtgroups feature and users to figure out exactly what's bad in a filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/rt.c | 42 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 39 insertions(+), 3 deletions(-) diff --git a/repair/rt.c b/repair/rt.c index 33bc6836f71..ed0f744cb9f 100644 --- a/repair/rt.c +++ b/repair/rt.c @@ -124,6 +124,44 @@ generate_rtinfo( return(0); } +static void +check_rtwords( + struct xfs_mount *mp, + const char *filename, + unsigned long long bno, + void *ondisk, + void *incore) +{ + unsigned int wordcnt = mp->m_blockwsize; + union xfs_rtword_ondisk *o = ondisk, *i = incore; + int badstart = -1; + unsigned int j; + + if (memcmp(ondisk, incore, wordcnt << XFS_WORDLOG) == 0) + return; + + for (j = 0; j < wordcnt; j++, o++, i++) { + if (o->raw == i->raw) { + /* Report a range of inconsistency that just ended. 
*/ + if (badstart >= 0) + do_warn( + _("discrepancy in %s at dblock 0x%llx words 0x%x-0x%x/0x%x\n"), + filename, bno, badstart, j - 1, wordcnt); + badstart = -1; + continue; + } + + if (badstart == -1) + badstart = j; + } + + if (badstart >= 0) + do_warn( + _("discrepancy in %s at dblock 0x%llx words 0x%x-0x%x/0x%x\n"), + filename, bno, badstart, wordcnt - 1, + wordcnt); +} + static void check_rtfile_contents( struct xfs_mount *mp, @@ -180,9 +218,7 @@ check_rtfile_contents( break; } - if (memcmp(bp->b_addr, buf, mp->m_blockwsize << XFS_WORDLOG)) - do_warn(_("discrepancy in %s at dblock 0x%llx\n"), - filename, (unsigned long long)bno); + check_rtwords(mp, filename, bno, bp->b_addr, buf); buf += XFS_FSB_TO_B(mp, map.br_blockcount); bno += map.br_blockcount; ^ permalink raw reply related [flat|nested] 565+ messages in thread
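For anyone following the range reporting in check_rtwords(), here is a standalone sketch of the same scan over plain uint32_t arrays. It is an illustration only. struct bad_range and find_bad_ranges() are hypothetical names, and the sketch records each run with inclusive start and end indices, so a run that reaches the end of the block ends at wordcnt - 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one run of mismatched words, inclusive ends. */
struct bad_range {
	size_t	first;
	size_t	last;
};

/*
 * Compare two word arrays and record every maximal run of mismatches.
 * Returns the number of runs found; only the first max_ranges are stored.
 */
size_t
find_bad_ranges(
	const uint32_t		*ondisk,
	const uint32_t		*incore,
	size_t			wordcnt,
	struct bad_range	*ranges,
	size_t			max_ranges)
{
	size_t			nr = 0;
	long			badstart = -1;
	size_t			j;

	for (j = 0; j < wordcnt; j++) {
		if (ondisk[j] == incore[j]) {
			/* A run of mismatches just ended at word j - 1. */
			if (badstart >= 0) {
				if (nr < max_ranges) {
					ranges[nr].first = (size_t)badstart;
					ranges[nr].last = j - 1;
				}
				nr++;
				badstart = -1;
			}
			continue;
		}

		/* First mismatching word of a new run. */
		if (badstart < 0)
			badstart = (long)j;
	}

	/* A run that extends to the last word of the block. */
	if (badstart >= 0) {
		if (nr < max_ranges) {
			ranges[nr].first = (size_t)badstart;
			ranges[nr].last = wordcnt - 1;
		}
		nr++;
	}

	return nr;
}
```

The real function prints a warning per run instead of storing it; collecting the runs here just makes the behaviour easy to check.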
* [PATCH 25/45] xfs_db: listify the definition of enum typnm 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (18 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 22/45] xfs_repair: improve rtbitmap discrepancy reporting Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 23/45] xfs_repair: repair rtbitmap block headers Darrick J. Wong ` (24 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert the enum definition into a list so that future patches adding things to enum typnm don't have to reflow the entire thing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/type.h | 29 ++++++++++++++++++++++++----- 1 file changed, 24 insertions(+), 5 deletions(-) diff --git a/db/type.h b/db/type.h index 411bfe90dbc..397dcf5464c 100644 --- a/db/type.h +++ b/db/type.h @@ -11,11 +11,30 @@ struct field; typedef enum typnm { - TYP_AGF, TYP_AGFL, TYP_AGI, TYP_ATTR, TYP_BMAPBTA, - TYP_BMAPBTD, TYP_BNOBT, TYP_CNTBT, TYP_RMAPBT, TYP_REFCBT, TYP_DATA, - TYP_DIR2, TYP_DQBLK, TYP_INOBT, TYP_INODATA, TYP_INODE, - TYP_LOG, TYP_RTBITMAP, TYP_RTSUMMARY, TYP_SB, TYP_SYMLINK, - TYP_TEXT, TYP_FINOBT, TYP_NONE + TYP_AGF, + TYP_AGFL, + TYP_AGI, + TYP_ATTR, + TYP_BMAPBTA, + TYP_BMAPBTD, + TYP_BNOBT, + TYP_CNTBT, + TYP_RMAPBT, + TYP_REFCBT, + TYP_DATA, + TYP_DIR2, + TYP_DQBLK, + TYP_INOBT, + TYP_INODATA, + TYP_INODE, + TYP_LOG, + TYP_RTBITMAP, + TYP_RTSUMMARY, + TYP_SB, + TYP_SYMLINK, + TYP_TEXT, + TYP_FINOBT, + TYP_NONE } typnm_t; #define DB_FUZZ 2 ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 23/45] xfs_repair: repair rtbitmap block headers 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (19 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 25/45] xfs_db: listify the definition of enum typnm Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 21/45] xfs_repair: support realtime groups Darrick J. Wong ` (23 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Check and repair the new block headers attached to rtbitmap blocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/phase6.c | 20 ++++++++++++++++---- repair/rt.c | 42 ++++++++++++++++++++++++++++++------------ repair/sb.c | 8 +++++++- 3 files changed, 53 insertions(+), 17 deletions(-) diff --git a/repair/phase6.c b/repair/phase6.c index 31d42b9306b..ad70b22a953 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -803,6 +803,8 @@ fill_rbmino(xfs_mount_t *mp) } while (bno < mp->m_sb.sb_rbmblocks) { + xfs_daddr_t daddr; + /* * fill the file one block at a time */ @@ -816,11 +818,9 @@ fill_rbmino(xfs_mount_t *mp) ASSERT(map.br_startblock != HOLESTARTBLOCK); - error = -libxfs_trans_read_buf( - mp, tp, mp->m_dev, - XFS_FSB_TO_DADDR(mp, map.br_startblock), + daddr = XFS_FSB_TO_DADDR(mp, map.br_startblock); + error = -libxfs_trans_read_buf(mp, tp, mp->m_dev, daddr, XFS_FSB_TO_BB(mp, 1), 1, &bp, NULL); - if (error) { do_warn( _("can't access block %" PRIu64 " (fsbno %" PRIu64 ") of realtime bitmap inode %" PRIu64 "\n"), @@ -831,6 +831,18 @@ _("can't access block %" PRIu64 " (fsbno %" PRIu64 ") of realtime bitmap inode % memcpy(xfs_rbmblock_wordptr(bp, 0), bmp, mp->m_blockwsize << XFS_WORDLOG); + if (xfs_has_rtgroups(mp)) { + struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; + + bp->b_ops = &xfs_rtbitmap_buf_ops; + hdr->rt_magic = cpu_to_be32(XFS_RTBITMAP_MAGIC); + 
hdr->rt_owner = cpu_to_be64(ip->i_ino); + hdr->rt_lsn = 0; + hdr->rt_blkno = cpu_to_be64(daddr); + platform_uuid_copy(&hdr->rt_uuid, + &mp->m_sb.sb_meta_uuid); + } + libxfs_trans_log_buf(tp, bp, 0, mp->m_sb.sb_blocksize - 1); bmp += mp->m_blockwsize; diff --git a/repair/rt.c b/repair/rt.c index ed0f744cb9f..e7190383da3 100644 --- a/repair/rt.c +++ b/repair/rt.c @@ -168,11 +168,13 @@ check_rtfile_contents( const char *filename, xfs_ino_t ino, void *buf, - xfs_fileoff_t filelen) + xfs_fileoff_t filelen, + const struct xfs_buf_ops *buf_ops) { struct xfs_bmbt_irec map; struct xfs_buf *bp; struct xfs_inode *ip; + union xfs_rtword_ondisk *words = buf; xfs_fileoff_t bno = 0; int error; @@ -190,12 +192,11 @@ check_rtfile_contents( } while (bno < filelen) { - xfs_filblks_t maplen; + union xfs_rtword_ondisk *ondisk; + xfs_daddr_t daddr; int nmap = 1; - /* Read up to 1MB at a time. */ - maplen = min(filelen - bno, XFS_B_TO_FSBT(mp, 1048576)); - error = -libxfs_bmapi_read(ip, bno, maplen, &map, &nmap, 0); + error = -libxfs_bmapi_read(ip, bno, 1, &map, &nmap, 0); if (error) { do_warn(_("unable to read %s mapping, err %d\n"), filename, error); @@ -208,19 +209,32 @@ check_rtfile_contents( break; } - error = -libxfs_buf_read_uncached(mp->m_dev, - XFS_FSB_TO_DADDR(mp, map.br_startblock), + daddr = XFS_FSB_TO_DADDR(mp, map.br_startblock); + error = -libxfs_buf_read_uncached(mp->m_dev, daddr, XFS_FSB_TO_BB(mp, map.br_blockcount), - 0, &bp, NULL); + 0, &bp, buf_ops); if (error) { do_warn(_("unable to read %s at dblock 0x%llx, err %d\n"), filename, (unsigned long long)bno, error); break; } - check_rtwords(mp, filename, bno, bp->b_addr, buf); + if (buf_ops) { + struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; + + if (hdr->rt_owner != cpu_to_be64(ino)) { + do_warn( + _("corrupt owner in %s at dblock 0x%llx\n"), + filename, (unsigned long long)bno); + } + ondisk = xfs_rbmblock_wordptr(bp, 0); + check_rtwords(mp, filename, bno, ondisk, words); + words += mp->m_blockwsize; + } else { + 
check_rtwords(mp, filename, bno, bp->b_addr, buf); + buf += XFS_FSB_TO_B(mp, map.br_blockcount); + } - buf += XFS_FSB_TO_B(mp, map.br_blockcount); bno += map.br_blockcount; libxfs_buf_relse(bp); } @@ -232,11 +246,15 @@ void check_rtbitmap( struct xfs_mount *mp) { + const struct xfs_buf_ops *buf_ops = NULL; + if (need_rbmino) return; + if (xfs_has_rtgroups(mp)) + buf_ops = &xfs_rtbitmap_buf_ops; check_rtfile_contents(mp, "rtbitmap", mp->m_sb.sb_rbmino, btmcompute, - mp->m_sb.sb_rbmblocks); + mp->m_sb.sb_rbmblocks, buf_ops); } void @@ -247,7 +265,7 @@ check_rtsummary( return; check_rtfile_contents(mp, "rtsummary", mp->m_sb.sb_rsumino, sumcompute, - XFS_B_TO_FSB(mp, mp->m_rsumsize)); + XFS_B_TO_FSB(mp, mp->m_rsumsize), NULL); } void diff --git a/repair/sb.c b/repair/sb.c index a1cfeff1e91..04b3d8cf9ce 100644 --- a/repair/sb.c +++ b/repair/sb.c @@ -506,6 +506,8 @@ verify_sb(char *sb_buf, xfs_sb_t *sb, int is_primary_sb) if (sb->sb_frextents != 0) return(XR_BAD_RT_GEO_DATA); } else { + unsigned int rbmblock_bytes = sb->sb_blocksize; + /* * if we have a real-time partition, sanity-check geometry */ @@ -516,8 +518,12 @@ verify_sb(char *sb_buf, xfs_sb_t *sb, int is_primary_sb) libxfs_highbit32((unsigned int)sb->sb_rextents)) return(XR_BAD_RT_GEO_DATA); + if (xfs_sb_is_v5(sb) && + (sb->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_RTGROUPS)) + rbmblock_bytes -= sizeof(struct xfs_rtbuf_blkinfo); + if (sb->sb_rbmblocks != (xfs_extlen_t) howmany(sb->sb_rextents, - NBBY * sb->sb_blocksize)) + NBBY * rbmblock_bytes)) return(XR_BAD_RT_GEO_DATA); } ^ permalink raw reply related [flat|nested] 565+ messages in thread
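The verify_sb() change above boils down to this: with rtgroups enabled, each rtbitmap block loses a header's worth of bytes, so fewer rt extents fit per block and sb_rbmblocks has to be computed from the reduced payload. A rough sketch of that arithmetic follows. expected_rbmblocks() is a hypothetical name, and the header size is passed in as a parameter because sizeof(struct xfs_rtbuf_blkinfo) is not shown in this thread.

```c
#include <stdint.h>

#define BITS_PER_BYTE	8	/* stands in for NBBY */

/*
 * Hypothetical helper: number of rtbitmap blocks needed to track
 * rextents realtime extents at one bit per extent.  hdr_bytes is 0 for
 * filesystems without rtgroups, or the per-block header size with them.
 * Assumes hdr_bytes < blocksize.
 */
uint64_t
expected_rbmblocks(
	uint64_t	rextents,
	uint32_t	blocksize,
	uint32_t	hdr_bytes)
{
	uint64_t	bits_per_block;

	bits_per_block = (uint64_t)BITS_PER_BYTE * (blocksize - hdr_bytes);

	/* Round up, same as howmany(). */
	return (rextents + bits_per_block - 1) / bits_per_block;
}
```

For example, 32768 rt extents fit in one headerless 4096-byte block, but need two blocks once any nonzero header is carved out of each block.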
* [PATCH 21/45] xfs_repair: support realtime groups 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (20 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 23/45] xfs_repair: repair rtbitmap block headers Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 24/45] xfs_repair: repair rtsummary block headers Darrick J. Wong ` (22 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Support the realtime group feature. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 2 + repair/agheader.c | 2 + repair/incore.c | 22 +++++++++++++++ repair/phase3.c | 3 ++ repair/rt.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++ repair/rt.h | 3 ++ repair/sb.c | 37 +++++++++++++++++++++++++ repair/xfs_repair.c | 11 +++++++ 8 files changed, 149 insertions(+) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 4d9499529c0..deadfe2c422 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -238,6 +238,8 @@ #define xfs_rtsummary_wordcount libxfs_rtsummary_wordcount #define xfs_rtfree_extent libxfs_rtfree_extent +#define xfs_rtgroup_update_secondary_sbs libxfs_rtgroup_update_secondary_sbs +#define xfs_rtgroup_update_super libxfs_rtgroup_update_super #define xfs_sb_from_disk libxfs_sb_from_disk #define xfs_sb_quota_from_disk libxfs_sb_quota_from_disk #define xfs_sb_read_secondary libxfs_sb_read_secondary diff --git a/repair/agheader.c b/repair/agheader.c index af88802ffdf..076860a4451 100644 --- a/repair/agheader.c +++ b/repair/agheader.c @@ -412,6 +412,8 @@ secondary_sb_whack( * super byte for byte. 
*/ sb->sb_metadirino = mp->m_sb.sb_metadirino; + sb->sb_rgblocks = mp->m_sb.sb_rgblocks; + sb->sb_rgcount = mp->m_sb.sb_rgcount; } else do_warn( _("would zero unused portion of %s superblock (AG #%u)\n"), diff --git a/repair/incore.c b/repair/incore.c index 06edaf0d605..27457a7c17e 100644 --- a/repair/incore.c +++ b/repair/incore.c @@ -195,6 +195,25 @@ set_rtbmap( (((uint64_t) state) << ((rtx % XR_BB_NUM) * XR_BB))); } +static void +rtgroups_init( + struct xfs_mount *mp) +{ + xfs_rgnumber_t rgno; + + if (!xfs_has_rtgroups(mp) || !rt_bmap) + return; + + for (rgno = 0; rgno < mp->m_sb.sb_rgcount; rgno++) { + xfs_rtblock_t start_rtx; + + start_rtx = xfs_rgbno_to_rtb(mp, rgno, 0) / + mp->m_sb.sb_rextsize; + + set_rtbmap(start_rtx, XR_E_INUSE_FS); + } +} + static void reset_rt_bmap(void) { @@ -219,6 +238,8 @@ init_rt_bmap( mp->m_sb.sb_rextents); return; } + + rtgroups_init(mp); } static void @@ -271,6 +292,7 @@ reset_bmaps(xfs_mount_t *mp) } reset_rt_bmap(); + rtgroups_init(mp); } void diff --git a/repair/phase3.c b/repair/phase3.c index ca4dbee4743..19490dbe9bb 100644 --- a/repair/phase3.c +++ b/repair/phase3.c @@ -17,6 +17,7 @@ #include "progress.h" #include "bmap.h" #include "threads.h" +#include "rt.h" static void process_agi_unlinked( @@ -116,6 +117,8 @@ phase3( set_progress_msg(PROG_FMT_AGI_UNLINKED, (uint64_t) glob_agcount); + check_rtsupers(mp); + /* first clear the agi unlinked AGI list */ if (!no_modify) { for (i = 0; i < mp->m_sb.sb_agcount; i++) diff --git a/repair/rt.c b/repair/rt.c index 56a04c3de6e..33bc6836f71 100644 --- a/repair/rt.c +++ b/repair/rt.c @@ -213,3 +213,72 @@ check_rtsummary( check_rtfile_contents(mp, "rtsummary", mp->m_sb.sb_rsumino, sumcompute, XFS_B_TO_FSB(mp, mp->m_rsumsize)); } + +void +check_rtsupers( + struct xfs_mount *mp) +{ + struct xfs_buf *bp; + xfs_rtblock_t rtbno; + xfs_rgnumber_t rgno; + int error; + + if (!xfs_has_rtgroups(mp)) + return; + + for (rgno = 0; rgno < mp->m_sb.sb_rgcount; rgno++) { + rtbno = xfs_rgbno_to_rtb(mp, 
rgno, 0); + error = -libxfs_buf_read_uncached(mp->m_rtdev_targp, + xfs_rtb_to_daddr(mp, rtbno), + XFS_FSB_TO_BB(mp, 1), 0, &bp, + &xfs_rtsb_buf_ops); + if (!error) { + libxfs_buf_relse(bp); + continue; + } + + if (no_modify) { + do_warn( + _("would rewrite realtime group %u superblock\n"), + rgno); + } else { + do_warn( + _("will rewrite realtime group %u superblock\n"), + rgno); + /* + * Rewrite the primary rt superblock before an update + * to the primary fs superblock trips over the rt super + * being corrupt. + */ + if (rgno == 0) + rewrite_primary_rt_super(mp); + } + } +} + +void +rewrite_primary_rt_super( + struct xfs_mount *mp) +{ + struct xfs_buf *rtsb_bp; + struct xfs_buf *sb_bp = libxfs_getsb(mp); + int error; + + if (!sb_bp) + do_error( + _("couldn't grab primary sb to update rt superblocks\n")); + + error = -libxfs_buf_get_uncached(mp->m_rtdev_targp, + XFS_FSB_TO_BB(mp, 1), 0, &rtsb_bp); + if (error) + do_error( + _("couldn't grab primary rt superblock\n")); + + rtsb_bp->b_maps[0].bm_bn = XFS_RTSB_DADDR; + rtsb_bp->b_ops = &xfs_rtsb_buf_ops; + + libxfs_rtgroup_update_super(rtsb_bp, sb_bp); + libxfs_buf_mark_dirty(rtsb_bp); + libxfs_buf_relse(rtsb_bp); + libxfs_buf_relse(sb_bp); +} diff --git a/repair/rt.h b/repair/rt.h index 16b39c21a67..8e8796aa3c1 100644 --- a/repair/rt.h +++ b/repair/rt.h @@ -16,5 +16,8 @@ int generate_rtinfo(struct xfs_mount *mp, union xfs_rtword_ondisk *words, void check_rtbitmap(struct xfs_mount *mp); void check_rtsummary(struct xfs_mount *mp); +void check_rtsupers(struct xfs_mount *mp); + +void rewrite_primary_rt_super(struct xfs_mount *mp); #endif /* _XFS_REPAIR_RT_H_ */ diff --git a/repair/sb.c b/repair/sb.c index 6e7f448596e..a1cfeff1e91 100644 --- a/repair/sb.c +++ b/repair/sb.c @@ -314,6 +314,37 @@ verify_sb_loginfo( return true; } +static int +verify_sb_rtgroups( + struct xfs_sb *sbp) +{ + uint64_t groups; + + if (!(sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_METADIR)) + return XR_BAD_RT_GEO_DATA; + + if 
(sbp->sb_rgblocks > XFS_MAX_RGBLOCKS) + return XR_BAD_RT_GEO_DATA; + + if (sbp->sb_rextsize == 0) + return XR_BAD_RT_GEO_DATA; + + if (sbp->sb_rgblocks % sbp->sb_rextsize != 0) + return XR_BAD_RT_GEO_DATA; + + if (sbp->sb_rgblocks < (sbp->sb_rextsize << 1)) + return XR_BAD_RT_GEO_DATA; + + if (sbp->sb_rgcount > XFS_MAX_RGNUMBER) + return XR_BAD_RT_GEO_DATA; + + groups = howmany(sbp->sb_rblocks, sbp->sb_rgblocks); + if (groups != sbp->sb_rgcount) + return XR_BAD_RT_GEO_DATA; + + return 0; +} + /* * verify a superblock -- does not verify root inode # * can only check that geometry info is internally @@ -519,6 +550,12 @@ verify_sb(char *sb_buf, xfs_sb_t *sb, int is_primary_sb) if (sb->sb_blocklog + sb->sb_dirblklog > XFS_MAX_BLOCKSIZE_LOG) return XR_BAD_DIR_SIZE_DATA; + if (sb->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_RTGROUPS) { + int err = verify_sb_rtgroups(sb); + if (err) + return err; + } + return(XR_OK); } diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c index 92dc0fb2d9f..13e1f2deccf 100644 --- a/repair/xfs_repair.c +++ b/repair/xfs_repair.c @@ -27,6 +27,7 @@ #include "bulkload.h" #include "quotacheck.h" #include "rcbag_btree.h" +#include "rt.h" /* * option tables for getsubopt calls @@ -1510,6 +1511,16 @@ _("Note - stripe unit (%d) and width (%d) were copied from a backup superblock.\ XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR; } + /* Always rewrite the realtime superblocks. */ + if (xfs_has_rtgroups(mp)) { + if (mp->m_sb.sb_rgcount > 0) + rewrite_primary_rt_super(mp); + + error = -libxfs_rtgroup_update_secondary_sbs(mp); + if (error) + do_error(_("updating rt superblocks, err %d"), error); + } + /* * Done. Flush all cached buffers and inodes first to ensure all * verifiers are run (where we discover the max metadata LSN), reformat ^ permalink raw reply related [flat|nested] 565+ messages in thread
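To make the geometry rules in verify_sb_rtgroups() easier to poke at, here is a self-contained sketch of the same checks on a plain struct. It leaves out the metadir feature-bit test. struct rt_geom and rt_geom_ok() are hypothetical names, and the two limits are parameters because the values of XFS_MAX_RGBLOCKS and XFS_MAX_RGNUMBER do not appear in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the superblock fields the checks look at. */
struct rt_geom {
	uint64_t	rblocks;	/* total rt blocks */
	uint32_t	rgblocks;	/* blocks per rt group */
	uint32_t	rgcount;	/* number of rt groups */
	uint32_t	rextsize;	/* blocks per rt extent */
};

/* Return true if the realtime group geometry is self-consistent. */
bool
rt_geom_ok(
	const struct rt_geom	*g,
	uint32_t		max_rgblocks,
	uint32_t		max_rgcount)
{
	uint64_t		groups;

	if (g->rgblocks > max_rgblocks)
		return false;

	if (g->rextsize == 0)
		return false;

	/* Groups must hold a whole number of rt extents... */
	if (g->rgblocks % g->rextsize != 0)
		return false;

	/* ...and at least two of them. */
	if (g->rgblocks < (uint64_t)g->rextsize * 2)
		return false;

	if (g->rgcount > max_rgcount)
		return false;

	/* rgblocks is at least 2 here, so the division is safe. */
	groups = (g->rblocks + g->rgblocks - 1) / g->rgblocks;
	return groups == g->rgcount;
}
```

The last group may be short, which is why the group count is a round-up of rblocks / rgblocks rather than an exact division.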
* [PATCH 24/45] xfs_repair: repair rtsummary block headers 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (21 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 21/45] xfs_repair: support realtime groups Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 18/45] xfs: repair secondary realtime group superblocks Darrick J. Wong ` (21 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Check and repair the new block headers attached to rtsummary blocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/phase6.c | 19 ++++++++++++++++--- repair/rt.c | 6 +++++- 2 files changed, 21 insertions(+), 4 deletions(-) diff --git a/repair/phase6.c b/repair/phase6.c index ad70b22a953..1dbd600915d 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -886,6 +886,8 @@ fill_rsumino(xfs_mount_t *mp) } while (bno < end_bno) { + xfs_daddr_t daddr; + /* * fill the file one block at a time */ @@ -899,9 +901,8 @@ fill_rsumino(xfs_mount_t *mp) ASSERT(map.br_startblock != HOLESTARTBLOCK); - error = -libxfs_trans_read_buf( - mp, tp, mp->m_dev, - XFS_FSB_TO_DADDR(mp, map.br_startblock), + daddr = XFS_FSB_TO_DADDR(mp, map.br_startblock); + error = -libxfs_trans_read_buf(mp, tp, mp->m_dev, daddr, XFS_FSB_TO_BB(mp, 1), 1, &bp, NULL); if (error) { @@ -915,6 +916,18 @@ _("can't access block %" PRIu64 " (fsbno %" PRIu64 ") of realtime summary inode memcpy(xfs_rsumblock_infoptr(bp, 0), smp, mp->m_blockwsize << XFS_WORDLOG); + if (xfs_has_rtgroups(mp)) { + struct xfs_rtbuf_blkinfo *hdr = bp->b_addr; + + bp->b_ops = &xfs_rtsummary_buf_ops; + hdr->rt_magic = cpu_to_be32(XFS_RTSUMMARY_MAGIC); + hdr->rt_owner = cpu_to_be64(ip->i_ino); + hdr->rt_lsn = 0; + hdr->rt_blkno = cpu_to_be64(daddr); + platform_uuid_copy(&hdr->rt_uuid, + &mp->m_sb.sb_meta_uuid); + } + 
libxfs_trans_log_buf(tp, bp, 0, mp->m_sb.sb_blocksize - 1); smp += mp->m_blockwsize; diff --git a/repair/rt.c b/repair/rt.c index e7190383da3..33641031731 100644 --- a/repair/rt.c +++ b/repair/rt.c @@ -261,11 +261,15 @@ void check_rtsummary( struct xfs_mount *mp) { + const struct xfs_buf_ops *buf_ops = NULL; + if (need_rsumino) return; + if (xfs_has_rtgroups(mp)) + buf_ops = &xfs_rtsummary_buf_ops; check_rtfile_contents(mp, "rtsummary", mp->m_sb.sb_rsumino, sumcompute, - XFS_B_TO_FSB(mp, mp->m_rsumsize), NULL); + XFS_B_TO_FSB(mp, mp->m_rsumsize), buf_ops); } void ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 18/45] xfs: repair secondary realtime group superblocks 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (22 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 24/45] xfs_repair: repair rtsummary block headers Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 20/45] libfrog: report rt groups in output Darrick J. Wong ` (20 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Repair secondary realtime group superblocks. They're not critical for anything, but some consistency would be a good idea. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rtgroup.c | 2 +- libxfs/xfs_rtgroup.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c index ebbd0d13a8a..97643fdcc7c 100644 --- a/libxfs/xfs_rtgroup.c +++ b/libxfs/xfs_rtgroup.c @@ -381,7 +381,7 @@ xfs_rtgroup_log_super( } /* Initialize a secondary realtime superblock. 
*/ -static int +int xfs_rtgroup_init_secondary_super( struct xfs_mount *mp, xfs_rgnumber_t rgno, diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h index 1fec49c496d..3c9572677f7 100644 --- a/libxfs/xfs_rtgroup.h +++ b/libxfs/xfs_rtgroup.h @@ -210,6 +210,8 @@ void xfs_rtgroup_update_super(struct xfs_buf *rtsb_bp, const struct xfs_buf *sb_bp); void xfs_rtgroup_log_super(struct xfs_trans *tp, const struct xfs_buf *sb_bp); int xfs_rtgroup_update_secondary_sbs(struct xfs_mount *mp); +int xfs_rtgroup_init_secondary_super(struct xfs_mount *mp, xfs_rgnumber_t rgno, + struct xfs_buf **bpp); /* Lock the rt bitmap inode in exclusive mode */ #define XFS_RTGLOCK_BITMAP (1U << 0) @@ -230,6 +232,7 @@ int xfs_rtgroup_get_geometry(struct xfs_rtgroup *rtg, # define xfs_rtgroup_update_super(bp, sb_bp) ((void)0) # define xfs_rtgroup_log_super(tp, sb_bp) ((void)0) # define xfs_rtgroup_update_secondary_sbs(mp) (0) +# define xfs_rtgroup_init_secondary_super(mp, rgno, bpp) (-EOPNOTSUPP) # define xfs_rtgroup_lock(tp, rtg, gf) ((void)0) # define xfs_rtgroup_unlock(rtg, gf) ((void)0) # define xfs_rtgroup_get_geometry(rtg, rgeo) (-EOPNOTSUPP) ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 20/45] libfrog: report rt groups in output 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (23 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 18/45] xfs: repair secondary realtime group superblocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 26/45] xfs_db: support dumping realtime superblocks Darrick J. Wong ` (19 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Report realtime group geometry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libfrog/fsgeom.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c index 3f4c38d1e1b..66e813a863f 100644 --- a/libfrog/fsgeom.c +++ b/libfrog/fsgeom.c @@ -63,7 +63,8 @@ xfs_report_geom( "naming =version %-14u bsize=%-6u ascii-ci=%d, ftype=%d\n" "log =%-22s bsize=%-6d blocks=%u, version=%d\n" " =%-22s sectsz=%-5u sunit=%d blks, lazy-count=%d\n" -"realtime =%-22s extsz=%-6d blocks=%lld, rtextents=%lld\n"), +"realtime =%-22s extsz=%-6d blocks=%lld, rtextents=%lld\n" +" =%-22s rgcount=%-4d rgsize=%u blks\n"), mntpoint, geo->inodesize, geo->agcount, geo->agblocks, "", geo->sectsize, attrversion, projid32bit, "", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled, @@ -78,7 +79,8 @@ xfs_report_geom( "", geo->logsectsize, geo->logsunit / geo->blocksize, lazycount, !geo->rtblocks ? _("none") : rtname ? rtname : _("external"), geo->rtextsize * geo->blocksize, (unsigned long long)geo->rtblocks, - (unsigned long long)geo->rtextents); + (unsigned long long)geo->rtextents, + "", geo->rgcount, geo->rgblocks); } /* Try to obtain the xfs geometry. On error returns a negative error code. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 26/45] xfs_db: support dumping realtime superblocks 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (24 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 20/45] libfrog: report rt groups in output Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 31/45] xfs_db: report rtgroups via version command Darrick J. Wong ` (18 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Allow debugging of realtime superblocks, and add the relevant fields in the fs superblock that point us at the existence and location of the rt supers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/Makefile | 2 + db/command.c | 2 + db/field.c | 8 ++++ db/field.h | 4 ++ db/rtgroup.c | 115 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ db/rtgroup.h | 15 ++++++++ db/sb.c | 15 ++++++++ db/type.c | 6 +++ db/type.h | 1 + 9 files changed, 167 insertions(+), 1 deletion(-) create mode 100644 db/rtgroup.c create mode 100644 db/rtgroup.h diff --git a/db/Makefile b/db/Makefile index dbe79a9a1b1..d22adc2d18d 100644 --- a/db/Makefile +++ b/db/Makefile @@ -13,7 +13,7 @@ HFILES = addr.h agf.h agfl.h agi.h attr.h attrshort.h bit.h block.h bmap.h \ flist.h fprint.h frag.h freesp.h hash.h help.h init.h inode.h input.h \ io.h logformat.h malloc.h metadump.h output.h print.h quit.h sb.h \ sig.h strvec.h text.h type.h write.h attrset.h symlink.h fsmap.h \ - fuzz.h + fuzz.h rtgroup.h CFILES = $(HFILES:.h=.c) btdump.c btheight.c convert.c info.c namei.c \ timelimit.c bmap_inflate.c unlinked.c LSRCFILES = xfs_admin.sh xfs_ncheck.sh xfs_metadump.sh diff --git a/db/command.c b/db/command.c index be6d045a23a..f2b59709848 100644 --- a/db/command.c +++ b/db/command.c @@ -39,6 +39,7 @@ #include "fsmap.h" #include "crc.h" #include "fuzz.h" +#include "rtgroup.h" cmdinfo_t 
*cmdtab; int ncmds; @@ -135,6 +136,7 @@ init_commands(void) output_init(); print_init(); quit_init(); + rtsb_init(); sb_init(); type_init(); write_init(); diff --git a/db/field.c b/db/field.c index a3e47ee81cc..cee5c661595 100644 --- a/db/field.c +++ b/db/field.c @@ -23,6 +23,7 @@ #include "dir2.h" #include "dir2sf.h" #include "symlink.h" +#include "rtgroup.h" const ftattr_t ftattrtab[] = { { FLDT_AGBLOCK, "agblock", fp_num, "%u", SI(bitsz(xfs_agblock_t)), @@ -44,6 +45,11 @@ const ftattr_t ftattrtab[] = { { FLDT_AGNUMBER, "agnumber", fp_num, "%u", SI(bitsz(xfs_agnumber_t)), FTARG_DONULL, NULL, NULL }, + { FLDT_RGBLOCK, "rgblock", fp_num, "%u", SI(bitsz(xfs_rgblock_t)), + FTARG_DONULL, NULL, NULL }, + { FLDT_RGNUMBER, "rgnumber", fp_num, "%u", SI(bitsz(xfs_rgnumber_t)), + FTARG_DONULL, NULL, NULL }, + /* attr fields */ { FLDT_ATTR, "attr", NULL, (char *)attr_flds, attr_size, FTARG_SIZE, NULL, attr_flds }, @@ -347,6 +353,8 @@ const ftattr_t ftattrtab[] = { NULL, NULL }, { FLDT_SB, "sb", NULL, (char *)sb_flds, sb_size, FTARG_SIZE, NULL, sb_flds }, + { FLDT_RTSB, "rtsb", NULL, (char *)rtsb_flds, rtsb_size, FTARG_SIZE, + NULL, rtsb_flds }, /* CRC enabled symlink */ { FLDT_SYMLINK_CRC, "symlink", NULL, (char *)symlink_crc_flds, diff --git a/db/field.h b/db/field.h index 634742a572c..226753490ad 100644 --- a/db/field.h +++ b/db/field.h @@ -15,6 +15,9 @@ typedef enum fldt { FLDT_AGINONN, FLDT_AGNUMBER, + FLDT_RGBLOCK, + FLDT_RGNUMBER, + /* attr fields */ FLDT_ATTR, FLDT_ATTR_BLKINFO, @@ -166,6 +169,7 @@ typedef enum fldt { FLDT_QCNT, FLDT_QWARNCNT, FLDT_SB, + FLDT_RTSB, /* CRC enabled symlink */ FLDT_SYMLINK_CRC, diff --git a/db/rtgroup.c b/db/rtgroup.c new file mode 100644 index 00000000000..c4debc1d394 --- /dev/null +++ b/db/rtgroup.c @@ -0,0 +1,115 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +#include "libxfs.h" +#include "libxlog.h" +#include "command.h" +#include "type.h" +#include "faddr.h" +#include "fprint.h" +#include "field.h" +#include "io.h" +#include "sb.h" +#include "bit.h" +#include "output.h" +#include "init.h" +#include "rtgroup.h" + +#define uuid_equal(s,d) (platform_uuid_compare((s),(d)) == 0) + +static int rtsb_f(int argc, char **argv); +static void rtsb_help(void); + +static const cmdinfo_t rtsb_cmd = + { "rtsb", NULL, rtsb_f, 0, 1, 1, N_("[rgno]"), + N_("set current address to realtime sb header"), rtsb_help }; + +void +rtsb_init(void) +{ + if (xfs_has_rtgroups(mp)) + add_command(&rtsb_cmd); +} + +#define OFF(f) bitize(offsetof(struct xfs_rtsb, rsb_ ## f)) +#define SZC(f) szcount(struct xfs_rtsb, rsb_ ## f) +const field_t rtsb_flds[] = { + { "magicnum", FLDT_UINT32X, OI(OFF(magicnum)), C1, 0, TYP_NONE }, + { "blocksize", FLDT_UINT32D, OI(OFF(blocksize)), C1, 0, TYP_NONE }, + { "rblocks", FLDT_DRFSBNO, OI(OFF(rblocks)), C1, 0, TYP_NONE }, + { "rextents", FLDT_DRTBNO, OI(OFF(rextents)), C1, 0, TYP_NONE }, + { "uuid", FLDT_UUID, OI(OFF(uuid)), C1, 0, TYP_NONE }, + { "rextsize", FLDT_AGBLOCK, OI(OFF(rextsize)), C1, 0, TYP_NONE }, + { "rgblocks", FLDT_RGBLOCK, OI(OFF(rgblocks)), C1, 0, TYP_NONE }, + { "rgcount", FLDT_RGNUMBER, OI(OFF(rgcount)), C1, 0, TYP_NONE }, + { "rbmblocks", FLDT_EXTLEN, OI(OFF(rbmblocks)), C1, 0, TYP_NONE }, + { "fname", FLDT_CHARNS, OI(OFF(fname)), CI(SZC(fname)), 0, TYP_NONE }, + { "blocklog", FLDT_UINT8D, OI(OFF(blocklog)), C1, 0, TYP_NONE }, + { "sectlog", FLDT_UINT8D, OI(OFF(sectlog)), C1, 0, TYP_NONE }, + { "rextslog", FLDT_UINT8D, OI(OFF(rextslog)), C1, 0, TYP_NONE }, + { "crc", FLDT_CRC, OI(OFF(crc)), C1, 0, TYP_NONE }, + { "lsn", FLDT_UINT64X, OI(OFF(lsn)), C1, 0, TYP_NONE }, + { "meta_uuid", FLDT_UUID, OI(OFF(meta_uuid)), C1, 0, TYP_NONE }, + { NULL } +}; + +const field_t rtsb_hfld[] = { + { "", FLDT_RTSB, OI(0), C1, 0, TYP_NONE }, + { NULL } +}; + +static void 
+rtsb_help(void) +{ + dbprintf(_( +"\n" +" set realtime group superblock\n" +"\n" +" Example:\n" +"\n" +" 'rtsb 7' - set location to 7th realtime group superblock, set type to 'rtsb'\n" +"\n" +" Located in the first block of each realtime group, the rt superblock\n" +" contains the base information for the realtime section of a filesystem.\n" +" The superblock in realtime group 0 is the primary. The copies in the\n" +" remaining realtime groups only serve as backup for filesystem recovery.\n" +"\n" +)); +} + +static int +rtsb_f( + int argc, + char **argv) +{ + xfs_rtblock_t rtbno; + xfs_rgnumber_t rgno = 0; + char *p; + + if (argc > 1) { + rgno = (xfs_rgnumber_t)strtoul(argv[1], &p, 0); + if (*p != '\0' || rgno >= mp->m_sb.sb_rgcount) { + dbprintf(_("bad realtime group number %s\n"), argv[1]); + return 0; + } + } + cur_agno = NULLAGNUMBER; + + rtbno = xfs_rgbno_to_rtb(mp, rgno, 0); + + ASSERT(typtab[TYP_RTSB].typnm == TYP_RTSB); + set_rt_cur(&typtab[TYP_RTSB], xfs_rtb_to_daddr(mp, rtbno), + XFS_FSB_TO_BB(mp, 1), DB_RING_ADD, NULL); + return 0; +} + +int +rtsb_size( + void *obj, + int startoff, + int idx) +{ + return bitize(mp->m_sb.sb_blocksize); +} diff --git a/db/rtgroup.h b/db/rtgroup.h new file mode 100644 index 00000000000..49077bee141 --- /dev/null +++ b/db/rtgroup.h @@ -0,0 +1,15 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#ifndef DB_RTGROUP_H_ +#define DB_RTGROUP_H_ + +extern const struct field rtsb_flds[]; +extern const struct field rtsb_hfld[]; + +extern void rtsb_init(void); +extern int rtsb_size(void *obj, int startoff, int idx); + +#endif /* DB_RTGROUP_H_ */ diff --git a/db/sb.c b/db/sb.c index d7df55e02e9..fa0706d3676 100644 --- a/db/sb.c +++ b/db/sb.c @@ -74,6 +74,17 @@ rootino_count( return xfs_has_metadir(mp) ? 0 : 1; } +/* + * Counts superblock fields that only exist when realtime groups are enabled. 
+ */ +static int +rtgroups_count( + void *obj, + int startoff) +{ + return xfs_has_rtgroups(mp) ? 1 : 0; +} + #define OFF(f) bitize(offsetof(struct xfs_dsb, sb_ ## f)) #define SZC(f) szcount(struct xfs_dsb, sb_ ## f) const field_t sb_flds[] = { @@ -91,6 +102,10 @@ const field_t sb_flds[] = { TYP_INODE }, { "rsumino", FLDT_INO, OI(OFF(rsumino)), rootino_count, FLD_COUNT, TYP_INODE }, + { "rgblocks", FLDT_RGBLOCK, OI(OFF(rgblocks)), rtgroups_count, + FLD_COUNT, TYP_NONE }, + { "rgcount", FLDT_RGNUMBER, OI(OFF(rgcount)), rtgroups_count, + FLD_COUNT, TYP_NONE }, { "rextsize", FLDT_AGBLOCK, OI(OFF(rextsize)), C1, 0, TYP_NONE }, { "agblocks", FLDT_AGBLOCK, OI(OFF(agblocks)), C1, 0, TYP_NONE }, { "agcount", FLDT_AGNUMBER, OI(OFF(agcount)), C1, 0, TYP_NONE }, diff --git a/db/type.c b/db/type.c index efe7044569d..d875c0c6365 100644 --- a/db/type.c +++ b/db/type.c @@ -28,6 +28,7 @@ #include "text.h" #include "symlink.h" #include "fuzz.h" +#include "rtgroup.h" static const typ_t *findtyp(char *name); static int type_f(int argc, char **argv); @@ -60,6 +61,7 @@ static const typ_t __typtab[] = { { TYP_LOG, "log", NULL, NULL, NULL, TYP_F_NO_CRC_OFF }, { TYP_RTBITMAP, "rtbitmap", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF }, { TYP_RTSUMMARY, "rtsummary", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF }, + { TYP_RTSB, "rtsb", handle_struct, rtsb_hfld, NULL, TYP_F_NO_CRC_OFF }, { TYP_SB, "sb", handle_struct, sb_hfld, NULL, TYP_F_NO_CRC_OFF }, { TYP_SYMLINK, "symlink", handle_string, NULL, NULL, TYP_F_NO_CRC_OFF }, { TYP_TEXT, "text", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF }, @@ -102,6 +104,8 @@ static const typ_t __typtab_crc[] = { { TYP_LOG, "log", NULL, NULL, NULL, TYP_F_NO_CRC_OFF }, { TYP_RTBITMAP, "rtbitmap", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF }, { TYP_RTSUMMARY, "rtsummary", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF }, + { TYP_RTSB, "rtsb", handle_struct, rtsb_hfld, &xfs_rtsb_buf_ops, + XFS_SB_CRC_OFF }, { TYP_SB, "sb", handle_struct, sb_hfld, &xfs_sb_buf_ops, 
XFS_SB_CRC_OFF }, { TYP_SYMLINK, "symlink", handle_struct, symlink_crc_hfld, @@ -146,6 +150,8 @@ static const typ_t __typtab_spcrc[] = { { TYP_LOG, "log", NULL, NULL, NULL, TYP_F_NO_CRC_OFF }, { TYP_RTBITMAP, "rtbitmap", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF }, { TYP_RTSUMMARY, "rtsummary", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF }, + { TYP_RTSB, "rtsb", handle_struct, rtsb_hfld, &xfs_rtsb_buf_ops, + XFS_SB_CRC_OFF }, { TYP_SB, "sb", handle_struct, sb_hfld, &xfs_sb_buf_ops, XFS_SB_CRC_OFF }, { TYP_SYMLINK, "symlink", handle_struct, symlink_crc_hfld, diff --git a/db/type.h b/db/type.h index 397dcf5464c..d4efa4b0fab 100644 --- a/db/type.h +++ b/db/type.h @@ -30,6 +30,7 @@ typedef enum typnm TYP_LOG, TYP_RTBITMAP, TYP_RTSUMMARY, + TYP_RTSB, TYP_SB, TYP_SYMLINK, TYP_TEXT, ^ permalink raw reply related [flat|nested] 565+ messages in thread
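For readers following along, here is a rough standalone sketch of how the rtsb command above goes from a command-line argument to a disk address. It is not the xfs_db code: struct rt_geom, parse_rgno() and rtsb_daddr() are hypothetical names made up for illustration, and the sketch assumes that realtime groups are laid out back to back (rtbno = rgno * rgblocks) and that a filesystem block converts to 512-byte basic blocks by shifting with (blocklog - 9).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BBSHIFT	9	/* log2 of a 512-byte basic block */

/* Hypothetical stand-in for the superblock geometry fields used here. */
struct rt_geom {
	uint8_t		blocklog;	/* log2 of the fs block size */
	uint32_t	rgblocks;	/* blocks per realtime group */
	uint32_t	rgcount;	/* number of realtime groups */
};

/*
 * Parse a realtime group number roughly the way rtsb_f() does: accept any
 * base strtoul() understands, reject trailing junk and out-of-range groups.
 * Returns 0 on success, -1 on bad input.
 */
static int
parse_rgno(const struct rt_geom *g, const char *arg, uint32_t *rgno)
{
	char		*end;
	unsigned long	v = strtoul(arg, &end, 0);

	if (*end != '\0' || v >= g->rgcount)
		return -1;
	*rgno = (uint32_t)v;
	return 0;
}

/*
 * Disk address (in basic blocks) of the superblock of a group, assuming the
 * superblock sits in block 0 of the group and groups are contiguous.
 */
static uint64_t
rtsb_daddr(const struct rt_geom *g, uint32_t rgno)
{
	uint64_t	rtbno;

	assert(g->blocklog >= BBSHIFT);
	rtbno = (uint64_t)rgno * g->rgblocks;
	return rtbno << (g->blocklog - BBSHIFT);
}
```

With 4k blocks and 1024 blocks per group, group 3 would start at realtime block 3072, which is basic block 24576 under these assumptions.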
* [PATCH 31/45] xfs_db: report rtgroups via version command 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (25 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 26/45] xfs_db: support dumping realtime superblocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 27/45] xfs_db: support changing the label and uuid of rt superblocks Darrick J. Wong ` (17 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Report the rtgroups feature in the version command output. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/sb.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/db/sb.c b/db/sb.c index 36d4c317dba..52c5edc065a 100644 --- a/db/sb.c +++ b/db/sb.c @@ -852,6 +852,8 @@ version_string( strcat(s, ",NREXT64"); if (xfs_has_metadir(mp)) strcat(s, ",METADIR"); + if (xfs_has_rtgroups(mp)) + strcat(s, ",RTGROUPS"); return s; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 27/45] xfs_db: support changing the label and uuid of rt superblocks 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (26 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 31/45] xfs_db: report rtgroups via version command Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 28/45] xfs_db: listify the definition of dbm_t Darrick J. Wong ` (16 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Update the label and uuid commands to change the rt superblocks along with the filesystem superblocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/sb.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++------ 1 file changed, 108 insertions(+), 11 deletions(-) diff --git a/db/sb.c b/db/sb.c index fa0706d3676..36d4c317dba 100644 --- a/db/sb.c +++ b/db/sb.c @@ -27,6 +27,7 @@ static int label_f(int argc, char **argv); static void label_help(void); static int version_f(int argc, char **argv); static void version_help(void); +static size_t check_label(char *label, bool can_warn); static const cmdinfo_t sb_cmd = { "sb", NULL, sb_f, 0, 1, 1, N_("[agno]"), @@ -357,6 +358,77 @@ uuid_help(void) )); } +static bool +check_rtgroup_update_problems( + struct xfs_mount *mp) +{ + int error; + + if (!xfs_has_rtgroups(mp) || mp->m_sb.sb_rgcount == 0) + return false; + + push_cur(); + error = set_rt_cur(&typtab[TYP_RTSB], XFS_RTSB_DADDR, + XFS_FSB_TO_BB(mp, 1), DB_RING_ADD, NULL); + if (error == ENODEV) { + /* no rt dev means we should just bail out */ + pop_cur(); + return true; + } + + pop_cur(); + return false; +} + +static int +update_rt_supers( + struct xfs_mount *mp, + uuid_t *uuid, + char *label) +{ + uuid_t old_uuid; + xfs_rgnumber_t rgno; + int error; + + if (uuid) + memcpy(&old_uuid, &mp->m_sb.sb_uuid, sizeof(uuid_t)); + + for 
(rgno = 0; rgno < mp->m_sb.sb_rgcount; rgno++) { + struct xfs_rtsb *rsb; + xfs_rtblock_t rtbno; + + push_cur(); + rtbno = xfs_rgbno_to_rtb(mp, rgno, 0); + error = set_rt_cur(&typtab[TYP_RTSB], + xfs_rtb_to_daddr(mp, rtbno), + XFS_FSB_TO_BB(mp, 1), DB_RING_ADD, NULL); + if (error == ENODEV) { + /* no rt dev means we should just bail out */ + exitcode = 1; + pop_cur(); + return 1; + } + + rsb = iocur_top->data; + if (label) { + size_t len = check_label(label, false); + + memset(&rsb->rsb_fname, 0, XFSLABEL_MAX); + memcpy(&rsb->rsb_fname, label, len); + } + if (uuid) { + memcpy(&mp->m_sb.sb_uuid, uuid, sizeof(uuid_t)); + memcpy(&rsb->rsb_uuid, uuid, sizeof(rsb->rsb_uuid)); + } + write_cur(); + if (uuid) + memcpy(&mp->m_sb.sb_uuid, &old_uuid, sizeof(uuid_t)); + pop_cur(); + } + + return 0; +} + static uuid_t * do_uuid(xfs_agnumber_t agno, uuid_t *uuid) { @@ -463,11 +535,18 @@ uuid_f( } } + if (check_rtgroup_update_problems(mp)) { + exitcode = 1; + return 0; + } + /* clear the log (setting uuid) if it's not dirty */ if (!sb_logzero(&uu)) return 0; dbprintf(_("writing all SBs\n")); + if (update_rt_supers(mp, &uu, NULL)) + return 1; for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) if (!do_uuid(agno, &uu)) { dbprintf(_("failed to set UUID in AG %d\n"), agno); @@ -536,6 +615,27 @@ label_help(void) )); } +static size_t +check_label( + char *label, + bool can_warn) +{ + size_t len = strlen(label); + + if (len > XFSLABEL_MAX) { + if (can_warn) + dbprintf(_("%s: truncating label length from %d to %d\n"), + progname, (int)len, XFSLABEL_MAX); + len = XFSLABEL_MAX; + } + if ( len == 2 && + (strcmp(label, "\"\"") == 0 || + strcmp(label, "''") == 0 || + strcmp(label, "--") == 0) ) + label[0] = label[1] = '\0'; + return len; +} + static char * do_label(xfs_agnumber_t agno, char *label) { @@ -554,17 +654,7 @@ do_label(xfs_agnumber_t agno, char *label) return &lbl[0]; } /* set label */ - if ((len = strlen(label)) > sizeof(tsb.sb_fname)) { - if (agno == 0) - dbprintf(_("%s: truncating 
label length from %d to %d\n"), - progname, (int)len, (int)sizeof(tsb.sb_fname)); - len = sizeof(tsb.sb_fname); - } - if ( len == 2 && - (strcmp(label, "\"\"") == 0 || - strcmp(label, "''") == 0 || - strcmp(label, "--") == 0) ) - label[0] = label[1] = '\0'; + len = check_label(label, agno == 0); memset(&tsb.sb_fname, 0, sizeof(tsb.sb_fname)); memcpy(&tsb.sb_fname, label, len); memcpy(&lbl[0], &tsb.sb_fname, sizeof(tsb.sb_fname)); @@ -601,7 +691,14 @@ label_f( return 0; } + if (check_rtgroup_update_problems(mp)) { + exitcode = 1; + return 0; + } + dbprintf(_("writing all SBs\n")); + if (update_rt_supers(mp, NULL, argv[1])) + return 1; for (ag = 0; ag < mp->m_sb.sb_agcount; ag++) if ((p = do_label(ag, argv[1])) == NULL) { dbprintf(_("failed to set label in AG %d\n"), ag); ^ permalink raw reply related [flat|nested] 565+ messages in thread
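The label handling that this patch factors out into check_label() can be sketched on its own as below. This is an illustration only: LABEL_MAX is assumed to match XFSLABEL_MAX (12 bytes), and label_copy_len() and set_label() are hypothetical names, not functions from xfsprogs. The warning message is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LABEL_MAX	12	/* assumed to equal XFSLABEL_MAX */

/*
 * Work out how many bytes of the label to copy.  Labels longer than the
 * on-disk field are truncated, and the two-character strings "", '' and --
 * are treated as a request to clear the label.
 */
static size_t
label_copy_len(char *label)
{
	size_t	len;

	assert(label != NULL);
	len = strlen(label);
	if (len > LABEL_MAX)
		len = LABEL_MAX;
	if (len == 2 &&
	    (strcmp(label, "\"\"") == 0 ||
	     strcmp(label, "''") == 0 ||
	     strcmp(label, "--") == 0))
		label[0] = label[1] = '\0';
	return len;
}

/* Zero the fixed-size on-disk field, then copy the (possibly cut) label. */
static void
set_label(char dst[LABEL_MAX], char *label)
{
	size_t	len = label_copy_len(label);

	memset(dst, 0, LABEL_MAX);
	memcpy(dst, label, len);
}
```

Note that the on-disk field is not NUL-terminated when the label fills all 12 bytes, which is why the field is cleared first and only len bytes are copied.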
* [PATCH 28/45] xfs_db: listify the definition of dbm_t 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (27 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 27/45] xfs_db: support changing the label and uuid of rt superblocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 32/45] xfs_db: metadump realtime devices Darrick J. Wong ` (15 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Convert this enum definition to a list so that code adding elements to the enum does not have to reflow the whole thing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/check.c | 38 ++++++++++++++++++++++++++++++-------- 1 file changed, 30 insertions(+), 8 deletions(-) diff --git a/db/check.c b/db/check.c index f39d732d04d..4f4bff58e22 100644 --- a/db/check.c +++ b/db/check.c @@ -26,14 +26,36 @@ typedef enum { } qtype_t; typedef enum { - DBM_UNKNOWN, DBM_AGF, DBM_AGFL, DBM_AGI, - DBM_ATTR, DBM_BTBMAPA, DBM_BTBMAPD, DBM_BTBNO, - DBM_BTCNT, DBM_BTINO, DBM_DATA, DBM_DIR, - DBM_FREE1, DBM_FREE2, DBM_FREELIST, DBM_INODE, - DBM_LOG, DBM_MISSING, DBM_QUOTA, DBM_RTBITMAP, - DBM_RTDATA, DBM_RTFREE, DBM_RTSUM, DBM_SB, - DBM_SYMLINK, DBM_BTFINO, DBM_BTRMAP, DBM_BTREFC, - DBM_RLDATA, DBM_COWDATA, + DBM_UNKNOWN, + DBM_AGF, + DBM_AGFL, + DBM_AGI, + DBM_ATTR, + DBM_BTBMAPA, + DBM_BTBMAPD, + DBM_BTBNO, + DBM_BTCNT, + DBM_BTINO, + DBM_DATA, + DBM_DIR, + DBM_FREE1, + DBM_FREE2, + DBM_FREELIST, + DBM_INODE, + DBM_LOG, + DBM_MISSING, + DBM_QUOTA, + DBM_RTBITMAP, + DBM_RTDATA, + DBM_RTFREE, + DBM_RTSUM, + DBM_SB, + DBM_SYMLINK, + DBM_BTFINO, + DBM_BTRMAP, + DBM_BTREFC, + DBM_RLDATA, + DBM_COWDATA, DBM_NDBM } dbm_t; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 32/45] xfs_db: metadump realtime devices 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (28 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 28/45] xfs_db: listify the definition of dbm_t Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 30/45] xfs_db: enable conversion of rt space units Darrick J. Wong ` (14 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Teach the metadump command to dump the filesystem metadata of a realtime device to the metadump file. Currently, this is limited to the rt group superblocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/metadump.c | 51 +++++++++++++++++++++++++++++++++++++++++++++-- db/xfs_metadump.sh | 5 +++-- man/man8/xfs_metadump.8 | 12 ++++++++++- 3 files changed, 63 insertions(+), 5 deletions(-) diff --git a/db/metadump.c b/db/metadump.c index f337493d505..e997c386c0b 100644 --- a/db/metadump.c +++ b/db/metadump.c @@ -3070,6 +3070,40 @@ _("Could not discern log; image will contain unobfuscated metadata in log.")); return !write_buf(iocur_top); } +static int +copy_rtsupers(void) +{ + int error; + + if (show_progress) + print_progress("Copying realtime superblocks"); + + xfs_rtblock_t rtbno; + xfs_rgnumber_t rgno = 0; + + for (rgno = 0; rgno < mp->m_sb.sb_rgcount; rgno++) { + rtbno = xfs_rgbno_to_rtb(mp, rgno, 0); + + push_cur(); + error = set_rt_cur(&typtab[TYP_RTSB], + xfs_rtb_to_daddr(mp, rtbno), + XFS_FSB_TO_BB(mp, 1), DB_RING_ADD, NULL); + if (error) + return 0; + if (iocur_top->data == NULL) { + pop_cur(); + print_warning("cannot read rt super %u", rgno); + return !stop_on_read_error; + } + error = write_buf(iocur_top); + pop_cur(); + if (error) + return 0; + } + + return 1; +} + static int metadump_f( int argc, @@ -3145,9 +3179,11 @@ metadump_f( /* * Use the old format 
if there are no external devices with metadata to - * dump. + * dump. Force the new format if we have realtime group superblocks. */ - if (mp->m_sb.sb_logstart != 0) + if (xfs_has_rtgroups(mp)) + copy_external = true; + else if (mp->m_sb.sb_logstart != 0 /* && !rtgroups */) copy_external = false; metablock = (xfs_metablock_t *)calloc(BBSIZE + 1, BBSIZE); @@ -3276,6 +3312,17 @@ metadump_f( exitcode = write_index() < 0; } + /* write the realtime device, if desired */ + if (!exitcode && xfs_has_rtgroups(mp) && copy_external) { + metablock->mb_info &= ~XFS_METADUMP_LOGDEV; + metablock->mb_info |= XFS_METADUMP_RTDEV; + + if (!copy_rtsupers()) + exitcode = 1; + if (!exitcode) + exitcode = write_index() < 0; + } + if (progress_since_warning) fputc('\n', stdout_metadump ? stderr : stdout); diff --git a/db/xfs_metadump.sh b/db/xfs_metadump.sh index 06bfc4e7bd4..29cced9f302 100755 --- a/db/xfs_metadump.sh +++ b/db/xfs_metadump.sh @@ -6,9 +6,9 @@ OPTS=" " DBOPTS=" " -USAGE="Usage: xfs_metadump [-aefFgoVwx] [-m max_extents] [-l logdev] source target" +USAGE="Usage: xfs_metadump [-aefFgoVwx] [-m max_extents] [-l logdev] [-R rtdev] source target" -while getopts "aefgl:m:owFVx" c +while getopts "aefgl:m:owFVxR:" c do case $c in a) OPTS=$OPTS"-a ";; @@ -25,6 +25,7 @@ do exit $status ;; x) OPTS=$OPTS"-x ";; + R) DBOPTS=$DBOPTS"-R "$OPTARG" ";; \?) echo $USAGE 1>&2 exit 2 ;; diff --git a/man/man8/xfs_metadump.8 b/man/man8/xfs_metadump.8 index b940cb084b5..98e822b39d0 100644 --- a/man/man8/xfs_metadump.8 +++ b/man/man8/xfs_metadump.8 @@ -11,6 +11,9 @@ xfs_metadump \- copy XFS filesystem metadata to a file ] [ .B \-l .I logdev +] [ +.B \-R +.I rtdev ] .I source .I target @@ -136,12 +139,19 @@ this value. The default size is 2097151 blocks. .B \-o Disables obfuscation of file names and extended attributes. .TP +.B \-R +For filesystems that have a realtime section, this specifies the device where +the realtime section resides. 
+To record the contents of the realtime section in the dump, the +.B \-x +option must also be specified. +.TP .B \-w Prints warnings of inconsistent metadata encountered to stderr. Bad metadata is still copied. .TP .B \-x -Dump the external log device, if present. +Dump the external log and realtime device, if present. The metadump file will not be compatible with older versions of .BR xfs_mdrestore (1). .TP ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 30/45] xfs_db: enable conversion of rt space units 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (29 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 32/45] xfs_db: metadump realtime devices Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 29/45] xfs_db: implement check for rt superblocks Darrick J. Wong ` (13 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Teach the xfs_db convert function about realtime extents, blocks, and realtime group numbers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/convert.c | 38 +++++++++++++++++++++++++++++++++++++- man/man8/xfs_db.8 | 17 +++++++++++++++++ 2 files changed, 54 insertions(+), 1 deletion(-) diff --git a/db/convert.c b/db/convert.c index 072ccc8f6ef..2214649650d 100644 --- a/db/convert.c +++ b/db/convert.c @@ -34,6 +34,10 @@ rtblock_to_bytes(rtx_to_rtblock(xfs_rbmblock_to_rtx(mp, (uint64_t)x))) #define rbmword_to_bytes(x) \ rtblock_to_bytes(rtx_to_rtblock((uint64_t)(x) << XFS_NBWORDLOG)) +#define rgblock_to_bytes(x) \ + ((uint64_t)(x) << mp->m_sb.sb_blocklog) +#define rgnumber_to_bytes(x) \ + rgblock_to_bytes((uint64_t)(x) * mp->m_sb.sb_rgblocks) typedef enum { CT_NONE = -1, @@ -55,6 +59,8 @@ typedef enum { CT_RSUMBLOCK, /* block within rt summary */ CT_RSUMLOG, /* log level for rtsummary computations */ CT_RSUMINFO, /* info word within rt summary */ + CT_RGBLOCK, /* xfs_rgblock_t */ + CT_RGNUMBER, /* xfs_rgnumber_t */ NCTS } ctype_t; @@ -80,6 +86,8 @@ typedef union { xfs_fileoff_t rbmblock; unsigned int rbmword; xfs_fileoff_t rsumblock; + xfs_rgnumber_t rgnumber; + xfs_rgblock_t rgblock; } cval_t; static uint64_t bytevalue(ctype_t ctype, cval_t *val); @@ -95,7 +103,7 @@ static const char *agnumber_names[] = { "agnumber", "agno", NULL }; static const char *bboff_names[] = 
{ "bboff", "daddroff", NULL }; static const char *blkoff_names[] = { "blkoff", "fsboff", "agboff", NULL }; -static const char *rtblkoff_names[] = { "blkoff", "rtboff", +static const char *rtblkoff_names[] = { "blkoff", "rtboff", "rgboff", NULL }; static const char *byte_names[] = { "byte", "fsbyte", NULL }; static const char *daddr_names[] = { "daddr", "bb", NULL }; @@ -111,6 +119,8 @@ static const char *rbmword_names[] = { "rbmword", "rbmw", NULL }; static const char *rsumblock_names[] = { "rsumblock", "rsmb", NULL }; static const char *rsumlog_names[] = { "rsumlog", "rsml", NULL }; static const char *rsumword_names[] = { "rsuminfo", "rsmi", NULL }; +static const char *rgblock_names[] = { "rgblock", "rgbno", NULL }; +static const char *rgnumber_names[] = { "rgnumber", "rgno", NULL }; static int rsuminfo; static int rsumlog; @@ -208,6 +218,14 @@ static const ctydesc_t ctydescs_rt[NCTS] = { .allowed = M(RSUMBLOCK), .names = rsumword_names, }, + [CT_RGBLOCK] = { + .allowed = M(RGNUMBER)|M(BBOFF)|M(BLKOFF)|M(RSUMLOG), + .names = rgblock_names, + }, + [CT_RGNUMBER] = { + .allowed = M(RGBLOCK)|M(BBOFF)|M(BLKOFF)|M(RSUMLOG), + .names = rgnumber_names, + }, }; static const cmdinfo_t convert_cmd = @@ -295,6 +313,10 @@ bytevalue(ctype_t ctype, cval_t *val) * value. 
*/ return 0; + case CT_RGBLOCK: + return rgblock_to_bytes(val->rgblock); + case CT_RGNUMBER: + return rgnumber_to_bytes(val->rgnumber); case CT_NONE: case NCTS: break; @@ -401,6 +423,8 @@ convert_f(int argc, char **argv) case CT_RSUMBLOCK: case CT_RSUMLOG: case CT_RSUMINFO: + case CT_RGBLOCK: + case CT_RGNUMBER: /* shouldn't get here */ ASSERT(0); break; @@ -537,6 +561,12 @@ rtconvert_f(int argc, char **argv) case CT_RSUMINFO: v = rt_daddr_to_rsuminfo(mp, v); break; + case CT_RGBLOCK: + v = xfs_daddr_to_rgbno(mp, v >> BBSHIFT); + break; + case CT_RGNUMBER: + v = xfs_daddr_to_rgno(mp, v >> BBSHIFT); + break; case CT_AGBLOCK: case CT_AGINO: case CT_AGNUMBER: @@ -629,6 +659,12 @@ getvalue(char *s, ctype_t ctype, cval_t *val) case CT_RSUMINFO: rsuminfo = (unsigned int)v; break; + case CT_RGBLOCK: + val->rgblock = (xfs_rgblock_t)v; + break; + case CT_RGNUMBER: + val->rgnumber = (xfs_rgnumber_t)v; + break; case CT_NONE: case NCTS: /* NOTREACHED */ diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8 index 9c1ce5d79cf..bcb4c871827 100644 --- a/man/man8/xfs_db.8 +++ b/man/man8/xfs_db.8 @@ -978,6 +978,16 @@ with alternate names, are: .RS 1.0i .PD 0 .HP +.B rgblock +or +.B rgbno +(realtime block within a realtime group) +.HP +.B rgnumber +or +.B rgno +(realtime group number) +.HP .B bboff or .B daddroff @@ -1045,6 +1055,13 @@ or .RE .IP Only conversions that "make sense" are allowed. +The compound form (with more than three arguments) is useful for +conversions such as +.B convert rgno +.I rg +.B rgbno +.I rgb +.BR rtblock . Realtime summary file location conversions have the following rules: Each info word in the rt summary file counts the number of free extents of a ^ permalink raw reply related [flat|nested] 565+ messages in thread
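As an aside for reviewers, the arithmetic behind the two new macros, rgblock_to_bytes() and rgnumber_to_bytes(), can be tried outside of xfs_db with a small sketch like the one below. struct rt_geom and rg_to_bytes() are hypothetical; the sketch assumes, as I understand the compound form of convert, that the byte values of the individual (type, value) pairs are simply added together before converting to the target unit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the two superblock fields the macros read. */
struct rt_geom {
	uint8_t		blocklog;	/* sb_blocklog: log2 of the fs block size */
	uint32_t	rgblocks;	/* sb_rgblocks: blocks per realtime group */
};

/* Byte offset of a block count, mirroring the rgblock_to_bytes() macro. */
static uint64_t
rgblock_to_bytes(const struct rt_geom *g, uint64_t rgbno)
{
	return rgbno << g->blocklog;
}

/* Byte offset of the start of a group, mirroring rgnumber_to_bytes(). */
static uint64_t
rgnumber_to_bytes(const struct rt_geom *g, uint32_t rgno)
{
	return rgblock_to_bytes(g, (uint64_t)rgno * g->rgblocks);
}

/*
 * Byte offset of block rgbno inside group rgno, i.e. what a compound
 * "convert rgno X rgbno Y ..." would sum up under the assumption above.
 */
static uint64_t
rg_to_bytes(const struct rt_geom *g, uint32_t rgno, uint32_t rgbno)
{
	assert(rgbno < g->rgblocks);
	return rgnumber_to_bytes(g, rgno) + rgblock_to_bytes(g, rgbno);
}
```

Dividing the result by the block size gives the linear realtime block number, so with 4k blocks and 1024 blocks per group, group 2 block 3 would come out as rtblock 2051.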
* [PATCH 29/45] xfs_db: implement check for rt superblocks 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (30 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 30/45] xfs_db: enable conversion of rt space units Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 35/45] xfs_mdrestore: restore rt group superblocks to realtime device Darrick J. Wong ` (12 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Implement the bare minimum needed to avoid xfs_check regressions when realtime groups are enabled. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/check.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/db/check.c b/db/check.c index 4f4bff58e22..d3d22f0531c 100644 --- a/db/check.c +++ b/db/check.c @@ -56,6 +56,7 @@ typedef enum { DBM_BTREFC, DBM_RLDATA, DBM_COWDATA, + DBM_RTSB, DBM_NDBM } dbm_t; @@ -187,6 +188,7 @@ static const char *typename[] = { "btrefcnt", "rldata", "cowdata", + "rtsb", NULL }; @@ -809,6 +811,23 @@ blockfree_f( return 0; } +static void +rtgroups_init( + struct xfs_mount *mp) +{ + xfs_rgnumber_t rgno; + + if (!xfs_has_rtgroups(mp)) + return; + + for (rgno = 0; rgno < mp->m_sb.sb_rgcount; rgno++) { + xfs_rtblock_t rtbno; + + rtbno = xfs_rgbno_to_rtb(mp, rgno, 0); + set_rdbmap(rtbno, mp->m_sb.sb_rextsize, DBM_RTSB); + } +} + /* * Check consistency of xfs filesystem contents. */ @@ -843,6 +862,7 @@ blockget_f( "filesystem.\n")); } } + rtgroups_init(mp); if (blist_size) { xfree(blist); blist = NULL; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 35/45] xfs_mdrestore: restore rt group superblocks to realtime device 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (31 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 29/45] xfs_db: implement check for rt superblocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 38/45] xfs_io: display rt group in verbose bmap output Darrick J. Wong ` (11 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Support restoring realtime device metadata to the realtime device, if the dumped filesystem had one. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- man/man8/xfs_mdrestore.8 | 7 +++++++ mdrestore/xfs_mdrestore.c | 30 ++++++++++++++++++++++++------ 2 files changed, 31 insertions(+), 6 deletions(-) diff --git a/man/man8/xfs_mdrestore.8 b/man/man8/xfs_mdrestore.8 index 4626b98e749..4a6b335a380 100644 --- a/man/man8/xfs_mdrestore.8 +++ b/man/man8/xfs_mdrestore.8 @@ -7,6 +7,8 @@ xfs_mdrestore \- restores an XFS metadump image to a filesystem image .B \-gi ] [ .B \-l logdev +] [ +.B \-R rtdev ] .I source .I target @@ -15,6 +17,8 @@ xfs_mdrestore \- restores an XFS metadump image to a filesystem image .B \-i [ .B \-l logdev +] [ +.B \-R rtdev ] .I source .br @@ -57,6 +61,9 @@ Shows metadump information on stdout. If no is specified, exits after displaying information. Older metadumps man not include any descriptive information. .TP +.B \-R +Restore realtime device metadata to this device. +.TP .B \-V Prints the version number and exits. 
.SH DIAGNOSTICS diff --git a/mdrestore/xfs_mdrestore.c b/mdrestore/xfs_mdrestore.c index 672010bcc6e..b75b30830ea 100644 --- a/mdrestore/xfs_mdrestore.c +++ b/mdrestore/xfs_mdrestore.c @@ -115,7 +115,8 @@ perform_restore( int dst_fd, int is_target_file, const struct xfs_metablock *mbp, - char *log_path) + char *log_path, + char *rtdev_path) { struct xfs_metablock *metablock; /* header + index + blocks */ __be64 *block_index; @@ -127,7 +128,7 @@ perform_restore( xfs_sb_t sb; int64_t bytes_read; int64_t mb_read = 0; - int log_fd = -1; + int log_fd = -1, rtdev_fd = -1; bool is_mdx; is_mdx = mbp->mb_magic == cpu_to_be32(XFS_MDX_MAGIC); @@ -201,9 +202,19 @@ perform_restore( write_fd = log_fd; } if (metablock->mb_info & XFS_METADUMP_RTDEV) { + int rtdev_is_file; + if (!is_mdx) fatal("rtdev set on an old style metadump?\n"); - fatal("rtdev not supported\n"); + if (rtdev_fd == -1) { + if (!rtdev_path) + fatal( + "metadump has rtdev contents but -R was not specified?\n"); + rtdev_fd = open_device(rtdev_path, &rtdev_is_file); + check_dev(rtdev_fd, rtdev_is_file, + sb.sb_rblocks * sb.sb_blocksize); + } + write_fd = rtdev_fd; } if (show_progress) { @@ -267,6 +278,8 @@ perform_restore( if (pwrite(dst_fd, block_buffer, sb.sb_sectsize, 0) < 0) fatal("error writing primary superblock: %s\n", strerror(errno)); + if (rtdev_fd >= 0) + close(rtdev_fd); if (log_fd >= 0) close(log_fd); @@ -276,7 +289,7 @@ perform_restore( static void usage(void) { - fprintf(stderr, "Usage: %s [-V] [-g] [-i] [-l logdev] source target\n", progname); + fprintf(stderr, "Usage: %s [-V] [-g] [-i] [-l logdev] [-R rtdev] source target\n", progname); exit(1); } @@ -286,6 +299,7 @@ main( char **argv) { char *log_path = NULL; + char *rtdev_path = NULL; FILE *src_f; int dst_fd; int c; @@ -294,7 +308,7 @@ main( progname = basename(argv[0]); - while ((c = getopt(argc, argv, "gl:iV")) != EOF) { + while ((c = getopt(argc, argv, "gl:iVR:")) != EOF) { switch (c) { case 'g': show_progress = 1; @@ -308,6 +322,9 @@ main( 
case 'V': printf("%s version %s\n", progname, VERSION); exit(0); + case 'R': + rtdev_path = optarg; + break; default: usage(); } @@ -363,7 +380,8 @@ main( /* check and open target */ dst_fd = open_device(argv[optind], &is_target_file); - perform_restore(src_f, dst_fd, is_target_file, &mb, log_path); + perform_restore(src_f, dst_fd, is_target_file, &mb, log_path, + rtdev_path); close(dst_fd); if (src_f != stdin) ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 38/45] xfs_io: display rt group in verbose bmap output 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (32 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 35/45] xfs_mdrestore: restore rt group superblocks to realtime device Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 33/45] xfs_db: dump rt bitmap blocks Darrick J. Wong ` (10 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Display the rt group number in the bmap -v output, just like we do for regular data files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- io/bmap.c | 30 ++++++++++++++++++++++-------- 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/io/bmap.c b/io/bmap.c index 27383ca6037..a78e0c65440 100644 --- a/io/bmap.c +++ b/io/bmap.c @@ -264,9 +264,16 @@ bmap_f( foff_w = boff_w = aoff_w = MINRANGE_WIDTH; tot_w = MINTOT_WIDTH; - if (is_rt) - sunit = swidth = bbperag = 0; - else { + if (is_rt) { + if (fsgeo.rgcount == 0) { + bbperag = 0; + } else { + bbperag = (off64_t)fsgeo.rgblocks * + (off64_t)fsgeo.blocksize / BBSIZE; + } + sunit = 0; + swidth = 0; + } else { bbperag = (off64_t)fsgeo.agblocks * (off64_t)fsgeo.blocksize / BBSIZE; sunit = (fsgeo.sunit * fsgeo.blocksize) / BBSIZE; @@ -295,7 +302,7 @@ bmap_f( (long long)(map[i + 1].bmv_block + map[i + 1].bmv_length - 1LL)); boff_w = max(boff_w, strlen(bbuf)); - if (!is_rt) { + if (bbperag > 0) { agno = map[i + 1].bmv_block / bbperag; agoff = map[i + 1].bmv_block - (agno * bbperag); @@ -312,13 +319,20 @@ bmap_f( numlen(map[i+1].bmv_length, 10)); } } - agno_w = is_rt ? 
0 : max(MINAG_WIDTH, numlen(fsgeo.agcount, 10)); + if (is_rt) { + if (fsgeo.rgcount > 0) + agno_w = max(MINAG_WIDTH, numlen(fsgeo.rgcount, 10)); + else + agno_w = 0; + } else { + agno_w = max(MINAG_WIDTH, numlen(fsgeo.agcount, 10)); + } printf("%4s: %-*s %-*s %*s %-*s %*s%s\n", _("EXT"), foff_w, _("FILE-OFFSET"), boff_w, is_rt ? _("RT-BLOCK-RANGE") : _("BLOCK-RANGE"), - agno_w, is_rt ? "" : _("AG"), - aoff_w, is_rt ? "" : _("AG-OFFSET"), + agno_w, is_rt ? (fsgeo.rgcount ? _("RG") : "") : _("AG"), + aoff_w, is_rt ? (fsgeo.rgcount ? _("RG-OFFSET") : "") : _("AG-OFFSET"), tot_w, _("TOTAL"), flg ? _(" FLAGS") : ""); for (i = 0; i < egcnt; i++) { @@ -377,7 +391,7 @@ bmap_f( map[i + 1].bmv_length - 1LL)); printf("%4d: %-*s %-*s", i, foff_w, rbuf, boff_w, bbuf); - if (!is_rt) { + if (bbperag > 0) { agno = map[i + 1].bmv_block / bbperag; agoff = map[i + 1].bmv_block - (agno * bbperag); ^ permalink raw reply related [flat|nested] 565+ messages in thread
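The group/offset split that bmap -v now does for realtime files can be sketched in isolation as follows. This is only an illustration of the arithmetic in the hunks above: struct rg_loc, bb_per_group() and split_bmv_block() are hypothetical names, and BBSIZE is assumed to be the usual 512-byte basic block.

```c
#include <assert.h>
#include <stdint.h>

#define BBSIZE	512	/* assumed size of a basic block in bytes */

/* Hypothetical result type: which group a mapping starts in, and where. */
struct rg_loc {
	int64_t	rgno;	/* realtime group number */
	int64_t	rgoff;	/* offset within the group, in basic blocks */
};

/*
 * Basic blocks per realtime group.  A filesystem without rt groups reports
 * rgcount == 0, in which case 0 is returned and no group column is shown.
 */
static int64_t
bb_per_group(uint32_t rgcount, uint32_t rgblocks, uint32_t blocksize)
{
	if (rgcount == 0)
		return 0;
	return (int64_t)rgblocks * (int64_t)blocksize / BBSIZE;
}

/* Split a bmv_block value (in basic blocks) into group number and offset. */
static struct rg_loc
split_bmv_block(int64_t bmv_block, int64_t bbperag)
{
	struct rg_loc	loc;

	assert(bbperag > 0);
	loc.rgno = bmv_block / bbperag;
	loc.rgoff = bmv_block - loc.rgno * bbperag;
	return loc;
}
```

Keying the output on bbperag > 0 instead of !is_rt is what lets the same printing code serve both data files (AG columns) and realtime files on a filesystem with rt groups (RG columns).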
* [PATCH 33/45] xfs_db: dump rt bitmap blocks 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (33 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 38/45] xfs_io: display rt group in verbose bmap output Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 40/45] xfs_spaceman: report on realtime group health Darrick J. Wong ` (9 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Now that rtbitmap blocks have a header, make it so that xfs_db can analyze the structure. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/bit.c | 24 +++++++++++++++++++----- db/bit.h | 1 + db/field.c | 5 +++++ db/field.h | 4 ++++ db/fprint.c | 11 +++++++++-- db/inode.c | 6 ++++-- db/rtgroup.c | 34 ++++++++++++++++++++++++++++++++++ db/rtgroup.h | 3 +++ db/type.c | 5 +++++ db/type.h | 1 + include/xfs_arch.h | 6 ++++++ 11 files changed, 91 insertions(+), 9 deletions(-) diff --git a/db/bit.c b/db/bit.c index c9bfd2eb025..84f46290b5a 100644 --- a/db/bit.c +++ b/db/bit.c @@ -55,6 +55,7 @@ getbitval( char *p; int64_t rval; int signext; + bool is_le = (flags & BV_LE); int z1, z2, z3, z4; ASSERT(nbits<=64); @@ -63,21 +64,34 @@ getbitval( bit = bitoffs(bitoff); signext = (flags & BVSIGNED) != 0; z4 = ((intptr_t)p & 0xf) == 0 && bit == 0; - if (nbits == 64 && z4) + if (nbits == 64 && z4) { + if (is_le) + return le64_to_cpu(*(__le64 *)p); return be64_to_cpu(*(__be64 *)p); + } z3 = ((intptr_t)p & 0x7) == 0 && bit == 0; if (nbits == 32 && z3) { - if (signext) + if (signext) { + if (is_le) + return (__s32)le32_to_cpu(*(__le32 *)p); return (__s32)be32_to_cpu(*(__be32 *)p); - else + } else { + if (is_le) + return (__u32)le32_to_cpu(*(__le32 *)p); return (__u32)be32_to_cpu(*(__be32 *)p); + } } z2 = ((intptr_t)p & 0x3) == 0 && bit == 0; if (nbits == 16 && z2) { - if (signext) 
+ if (signext) { + if (is_le) + return (__s16)le16_to_cpu(*(__le16 *)p); return (__s16)be16_to_cpu(*(__be16 *)p); - else + } else { + if (is_le) + return (__u16)le16_to_cpu(*(__le16 *)p); return (__u16)be16_to_cpu(*(__be16 *)p); + } } z1 = ((intptr_t)p & 0x1) == 0 && bit == 0; if (nbits == 8 && z1) { diff --git a/db/bit.h b/db/bit.h index 4df86030abc..912283a7348 100644 --- a/db/bit.h +++ b/db/bit.h @@ -13,6 +13,7 @@ #define BVUNSIGNED 0 #define BVSIGNED 1 +#define BV_LE (1U << 1) /* little endian */ extern int64_t getbitval(void *obj, int bitoff, int nbits, int flags); extern void setbitval(void *obuf, int bitoff, int nbits, void *ibuf); diff --git a/db/field.c b/db/field.c index cee5c661595..7dee8c3735c 100644 --- a/db/field.c +++ b/db/field.c @@ -392,6 +392,11 @@ const ftattr_t ftattrtab[] = { { FLDT_UINT8X, "uint8x", fp_num, "%#x", SI(bitsz(uint8_t)), 0, NULL, NULL }, { FLDT_UUID, "uuid", fp_uuid, NULL, SI(bitsz(uuid_t)), 0, NULL, NULL }, + + { FLDT_RTWORD, "rtword", fp_num, "%#x", SI(bitsz(xfs_rtword_t)), + FTARG_LE, NULL, NULL }, + { FLDT_RGBITMAP, "rgbitmap", NULL, (char *)rgbitmap_flds, btblock_size, + FTARG_SIZE, NULL, rgbitmap_flds }, { FLDT_ZZZ, NULL } }; diff --git a/db/field.h b/db/field.h index 226753490ad..ce7e7297afa 100644 --- a/db/field.h +++ b/db/field.h @@ -191,6 +191,9 @@ typedef enum fldt { FLDT_UINT8O, FLDT_UINT8X, FLDT_UUID, + + FLDT_RTWORD, + FLDT_RGBITMAP, FLDT_ZZZ /* mark last entry */ } fldt_t; @@ -246,6 +249,7 @@ extern const ftattr_t ftattrtab[]; #define FTARG_SIZE 16 /* size field is a function */ #define FTARG_SKIPNMS 32 /* skip printing names this time */ #define FTARG_OKEMPTY 64 /* ok if this (union type) is empty */ +#define FTARG_LE (1U << 7) /* little endian */ extern int bitoffset(const field_t *f, void *obj, int startoff, int idx); diff --git a/db/fprint.c b/db/fprint.c index 65accfda3fe..ac916d511e8 100644 --- a/db/fprint.c +++ b/db/fprint.c @@ -68,13 +68,20 @@ fp_num( int bitpos; int i; int isnull; + int bvflags = 0; int64_t 
val; + if (arg & FTARG_LE) + bvflags |= BV_LE; + if (arg & FTARG_SIGNED) + bvflags |= BVSIGNED; + else + bvflags |= BVUNSIGNED; + for (i = 0, bitpos = bit; i < count && !seenint(); i++, bitpos += size) { - val = getbitval(obj, bitpos, size, - (arg & FTARG_SIGNED) ? BVSIGNED : BVUNSIGNED); + val = getbitval(obj, bitpos, size, bvflags); if ((arg & FTARG_SKIPZERO) && val == 0) continue; isnull = (arg & FTARG_SIGNED) || size == 64 ? diff --git a/db/inode.c b/db/inode.c index 4c2fd19f446..663487f8b14 100644 --- a/db/inode.c +++ b/db/inode.c @@ -642,9 +642,11 @@ inode_next_type(void) case S_IFLNK: return TYP_SYMLINK; case S_IFREG: - if (iocur_top->ino == mp->m_sb.sb_rbmino) + if (iocur_top->ino == mp->m_sb.sb_rbmino) { + if (xfs_has_rtgroups(mp)) + return TYP_RGBITMAP; return TYP_RTBITMAP; - else if (iocur_top->ino == mp->m_sb.sb_rsumino) + } else if (iocur_top->ino == mp->m_sb.sb_rsumino) return TYP_RTSUMMARY; else if (iocur_top->ino == mp->m_sb.sb_uquotino || iocur_top->ino == mp->m_sb.sb_gquotino || diff --git a/db/rtgroup.c b/db/rtgroup.c index c4debc1d394..350677d4687 100644 --- a/db/rtgroup.c +++ b/db/rtgroup.c @@ -54,6 +54,7 @@ const field_t rtsb_flds[] = { { "meta_uuid", FLDT_UUID, OI(OFF(meta_uuid)), C1, 0, TYP_NONE }, { NULL } }; +#undef OFF const field_t rtsb_hfld[] = { { "", FLDT_RTSB, OI(0), C1, 0, TYP_NONE }, @@ -113,3 +114,36 @@ rtsb_size( { return bitize(mp->m_sb.sb_blocksize); } + +static int +rtwords_count( + void *obj, + int startoff) +{ + unsigned int blksz = mp->m_sb.sb_blocksize; + + if (xfs_has_rtgroups(mp)) + blksz -= sizeof(struct xfs_rtbuf_blkinfo); + + return blksz >> XFS_WORDLOG; +} + +#define OFF(f) bitize(offsetof(struct xfs_rtbuf_blkinfo, rt_ ## f)) +const field_t rgbitmap_flds[] = { + { "magicnum", FLDT_UINT32X, OI(OFF(magic)), C1, 0, TYP_NONE }, + { "crc", FLDT_CRC, OI(OFF(crc)), C1, 0, TYP_NONE }, + { "owner", FLDT_INO, OI(OFF(owner)), C1, 0, TYP_NONE }, + { "bno", FLDT_DFSBNO, OI(OFF(blkno)), C1, 0, TYP_BMAPBTD }, + { "lsn", 
FLDT_UINT64X, OI(OFF(lsn)), C1, 0, TYP_NONE }, + { "uuid", FLDT_UUID, OI(OFF(uuid)), C1, 0, TYP_NONE }, + /* the rtword array is after the actual structure */ + { "rtwords", FLDT_RTWORD, OI(bitize(sizeof(struct xfs_rtbuf_blkinfo))), + rtwords_count, FLD_ARRAY | FLD_COUNT, TYP_DATA }, + { NULL } +}; +#undef OFF + +const field_t rgbitmap_hfld[] = { + { "", FLDT_RGBITMAP, OI(0), C1, 0, TYP_NONE }, + { NULL } +}; diff --git a/db/rtgroup.h b/db/rtgroup.h index 49077bee141..3c9b16146fc 100644 --- a/db/rtgroup.h +++ b/db/rtgroup.h @@ -9,6 +9,9 @@ extern const struct field rtsb_flds[]; extern const struct field rtsb_hfld[]; +extern const struct field rgbitmap_flds[]; +extern const struct field rgbitmap_hfld[]; + extern void rtsb_init(void); extern int rtsb_size(void *obj, int startoff, int idx); diff --git a/db/type.c b/db/type.c index d875c0c6365..65e7b24146f 100644 --- a/db/type.c +++ b/db/type.c @@ -67,6 +67,7 @@ static const typ_t __typtab[] = { { TYP_TEXT, "text", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF }, { TYP_FINOBT, "finobt", handle_struct, finobt_hfld, NULL, TYP_F_NO_CRC_OFF }, + { TYP_RGBITMAP, NULL }, { TYP_NONE, NULL } }; @@ -113,6 +114,8 @@ static const typ_t __typtab_crc[] = { { TYP_TEXT, "text", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF }, { TYP_FINOBT, "finobt", handle_struct, finobt_crc_hfld, &xfs_finobt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF }, + { TYP_RGBITMAP, "rgbitmap", handle_struct, rgbitmap_hfld, + &xfs_rtbitmap_buf_ops, XFS_RTBUF_CRC_OFF }, { TYP_NONE, NULL } }; @@ -159,6 +162,8 @@ static const typ_t __typtab_spcrc[] = { { TYP_TEXT, "text", handle_text, NULL, NULL, TYP_F_NO_CRC_OFF }, { TYP_FINOBT, "finobt", handle_struct, finobt_spcrc_hfld, &xfs_finobt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF }, + { TYP_RGBITMAP, "rgbitmap", handle_struct, rgbitmap_hfld, + &xfs_rtbitmap_buf_ops, XFS_RTBUF_CRC_OFF }, { TYP_NONE, NULL } }; diff --git a/db/type.h b/db/type.h index d4efa4b0fab..e2148c6351d 100644 --- a/db/type.h +++ b/db/type.h @@ -35,6 +35,7 @@ typedef enum 
typnm TYP_SYMLINK, TYP_TEXT, TYP_FINOBT, + TYP_RGBITMAP, TYP_NONE } typnm_t; diff --git a/include/xfs_arch.h b/include/xfs_arch.h index d46ae47094a..6312e62b0a1 100644 --- a/include/xfs_arch.h +++ b/include/xfs_arch.h @@ -200,6 +200,9 @@ static __inline__ void __swab64s(__u64 *addr) ((__force __le32)___constant_swab32((__u32)(val))) #define __constant_cpu_to_be32(val) \ ((__force __be32)(__u32)(val)) + +#define le64_to_cpu(val) (__swab64((__force __u64)(__le64)(val))) +#define le16_to_cpu(val) (__swab16((__force __u16)(__le16)(val))) #else #define cpu_to_be16(val) ((__force __be16)__swab16((__u16)(val))) #define cpu_to_be32(val) ((__force __be32)__swab32((__u32)(val))) @@ -215,6 +218,9 @@ static __inline__ void __swab64s(__u64 *addr) ((__force __le32)(__u32)(val)) #define __constant_cpu_to_be32(val) \ ((__force __be32)___constant_swab32((__u32)(val))) + +#define le64_to_cpu(val) ((__force __u64)(__le64)(val)) +#define le16_to_cpu(val) ((__force __u16)(__le16)(val)) #endif static inline void be16_add_cpu(__be16 *a, __s16 b) ^ permalink raw reply related [flat|nested] 565+ messages in thread
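As an aside on the two ideas in the xfs_db patch above: fields flagged FTARG_LE are decoded as little-endian instead of the usual big-endian, and the number of rtwords per block shrinks by the header size when rtgroups are enabled. The sketch below illustrates both in portable C. It decodes byte by byte rather than using the le32_to_cpu()/be32_to_cpu() macros, and the function names and the 48-byte header size in the usage are assumptions made for the example, not values taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_WORDLOG 2	/* log2(bytes per 32-bit word); stands in for XFS_WORDLOG */

/* Decode a 32-bit on-disk value in the requested byte order. */
static uint32_t load32(const unsigned char *p, bool little_endian)
{
	if (little_endian)
		return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
		       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/*
 * Number of 32-bit words that fit in a block once the header is
 * subtracted. Pass header_bytes = 0 for the headerless layout.
 */
static unsigned int words_per_block(unsigned int block_size,
				    unsigned int header_bytes)
{
	return (block_size - header_bytes) >> SKETCH_WORDLOG;
}
```

The byte-wise decode gives the same answer on any host, which is handy for checking what the macro-based paths in getbitval() should return.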
* [PATCH 40/45] xfs_spaceman: report on realtime group health 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (34 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 33/45] xfs_db: dump rt bitmap blocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 36/45] xfs_io: add a command to display allocation group information Darrick J. Wong ` (8 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add the realtime group status to the health reporting done by xfs_spaceman. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- man/man8/xfs_spaceman.8 | 5 +++- spaceman/health.c | 59 +++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 61 insertions(+), 3 deletions(-) diff --git a/man/man8/xfs_spaceman.8 b/man/man8/xfs_spaceman.8 index ece840d7300..837fc497f27 100644 --- a/man/man8/xfs_spaceman.8 +++ b/man/man8/xfs_spaceman.8 @@ -91,7 +91,7 @@ The output will have the same format that .BR "xfs_info" "(8)" prints when querying a filesystem. .TP -.BI "health [ \-a agno] [ \-c ] [ \-f ] [ \-i inum ] [ \-q ] [ paths ]" +.BI "health [ \-a agno] [ \-c ] [ \-f ] [ \-i inum ] [ \-q ] [ \-r rgno ] [ paths ]" Reports the health of the given group of filesystem metadata. .RS 1.0i .PD 0 @@ -114,6 +114,9 @@ Report on the health of a specific inode. .B \-q Report only unhealthy metadata. .TP +.B \-r +Report on the health of the given realtime group. +.TP .B paths Report on the health of the files at the given path. 
.PD diff --git a/spaceman/health.c b/spaceman/health.c index 12fb67bab28..928d92abb8c 100644 --- a/spaceman/health.c +++ b/spaceman/health.c @@ -134,6 +134,18 @@ static const struct flag_map ag_flags[] = { {0}, }; +static const struct flag_map rtgroup_flags[] = { + { + .mask = XFS_RTGROUP_GEOM_SICK_SUPER, + .descr = "superblock", + }, + { + .mask = XFS_RTGROUP_GEOM_SICK_BITMAP, + .descr = "realtime bitmap", + }, + {0}, +}; + static const struct flag_map inode_flags[] = { { .mask = XFS_BS_SICK_INODE, @@ -214,6 +226,25 @@ report_ag_sick( return 0; } +/* Report on a rt group's health. */ +static int +report_rtgroup_sick( + xfs_rgnumber_t rgno) +{ + struct xfs_rtgroup_geometry rgeo = { 0 }; + char descr[256]; + int ret; + + ret = -xfrog_rtgroup_geometry(file->xfd.fd, rgno, &rgeo); + if (ret) { + xfrog_perror(ret, "rtgroup_geometry"); + return 1; + } + snprintf(descr, sizeof(descr) - 1, _("rtgroup %u"), rgno); + report_sick(descr, rtgroup_flags, rgeo.rg_sick, rgeo.rg_checked); + return 0; +} + /* Report on an inode's health. */ static int report_inode_health( @@ -312,7 +343,7 @@ report_bulkstat_health( return error; } -#define OPT_STRING ("a:cfi:q") +#define OPT_STRING ("a:cfi:qr:") /* Report on health problems in XFS filesystem. 
*/ static int @@ -322,6 +353,7 @@ health_f( { unsigned long long x; xfs_agnumber_t agno; + xfs_rgnumber_t rgno; bool default_report = true; int c; int ret; @@ -365,6 +397,17 @@ health_f( case 'q': quiet = true; break; + case 'r': + default_report = false; + errno = 0; + x = strtoll(optarg, NULL, 10); + if (!errno && x >= NULLRGNUMBER) + errno = ERANGE; + if (errno) { + perror("rtgroup health"); + return 1; + } + break; default: return command_usage(&health_cmd); } @@ -400,6 +443,12 @@ health_f( if (ret) return 1; break; + case 'r': + rgno = strtoll(optarg, NULL, 10); + ret = report_rtgroup_sick(rgno); + if (ret) + return 1; + break; default: break; } @@ -421,6 +470,11 @@ health_f( if (ret) return 1; } + for (rgno = 0; rgno < file->xfd.fsgeom.rgcount; rgno++) { + ret = report_rtgroup_sick(rgno); + if (ret) + return 1; + } if (comprehensive) { ret = report_bulkstat_health(NULLAGNUMBER); if (ret) @@ -450,6 +504,7 @@ health_help(void) " -f -- Report health of the overall filesystem.\n" " -i inum -- Report health of a given inode number.\n" " -q -- Only report unhealthy metadata.\n" +" -r rgno -- Report health of the given realtime group.\n" " paths -- Report health of the given file path.\n" "\n")); @@ -460,7 +515,7 @@ static cmdinfo_t health_cmd = { .cfunc = health_f, .argmin = 0, .argmax = -1, - .args = "[-a agno] [-c] [-f] [-i inum] [-q] [paths]", + .args = "[-a agno] [-c] [-f] [-i inum] [-q] [-r rgno] [paths]", .flags = CMD_FLAG_ONESHOT, .help = health_help, }; ^ permalink raw reply related [flat|nested] 565+ messages in thread
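The -r handling above validates the argument in the first getopt pass and re-parses it with strtoll() in the second. One way to keep the two passes in agreement is a single parsing helper that both can call. The following is only a sketch under stated assumptions: parse_group_number and SKETCH_NULL_GROUP are invented names, and the sentinel is assumed to be the all-ones 32-bit value, analogous to NULLAGNUMBER.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_NULL_GROUP UINT32_MAX	/* assumed stand-in for NULLRGNUMBER */

/*
 * Parse a decimal group number. Returns 0 and stores the value on
 * success, or a negative errno: -EINVAL for trailing junk or an empty
 * string, -ERANGE for negative or too-large values.
 */
static int parse_group_number(const char *arg, uint32_t *out)
{
	char *end;
	long long v;

	errno = 0;
	v = strtoll(arg, &end, 10);
	if (errno)
		return -errno;
	if (end == arg || *end != '\0')
		return -EINVAL;
	if (v < 0 || (unsigned long long)v >= SKETCH_NULL_GROUP)
		return -ERANGE;
	*out = (uint32_t)v;
	return 0;
}
```

Unlike the open-coded version, this also rejects input such as "3abc", which strtoll() with a NULL end pointer would silently accept as 3.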
* [PATCH 36/45] xfs_io: add a command to display allocation group information 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (35 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 40/45] xfs_spaceman: report on realtime group health Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 34/45] xfs_db: dump rt summary blocks Darrick J. Wong ` (7 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a new 'aginfo' command to xfs_io so that we can display allocation group geometry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- io/Makefile | 2 - io/aginfo.c | 119 +++++++++++++++++++++++++++++++++++++++++++++++++++++ io/init.c | 1 io/io.h | 1 man/man8/xfs_io.8 | 12 +++++ 5 files changed, 133 insertions(+), 2 deletions(-) create mode 100644 io/aginfo.c diff --git a/io/Makefile b/io/Makefile index aa0d216b25f..2b7748bfc13 100644 --- a/io/Makefile +++ b/io/Makefile @@ -8,7 +8,7 @@ include $(TOPDIR)/include/builddefs LTCOMMAND = xfs_io LSRCFILES = xfs_bmap.sh xfs_freeze.sh xfs_mkfile.sh HFILES = init.h io.h -CFILES = init.c \ +CFILES = init.c aginfo.c \ attr.c bmap.c bulkstat.c crc32cselftest.c cowextsize.c encrypt.c \ file.c freeze.c fsync.c getrusage.c imap.c inject.c label.c link.c \ mmap.c open.c parent.c pread.c prealloc.c pwrite.c reflink.c \ diff --git a/io/aginfo.c b/io/aginfo.c new file mode 100644 index 00000000000..06e1cb7ba88 --- /dev/null +++ b/io/aginfo.c @@ -0,0 +1,119 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (c) 2022 Oracle. + * All Rights Reserved. 
+ */ +#include "platform_defs.h" +#include "libxfs.h" +#include "command.h" +#include "input.h" +#include "init.h" +#include "io.h" +#include "libfrog/logging.h" +#include "libfrog/paths.h" +#include "libfrog/fsgeom.h" + +static cmdinfo_t aginfo_cmd; + +static int +report_aginfo( + struct xfs_fd *xfd, + xfs_agnumber_t agno) +{ + struct xfs_ag_geometry ageo = { 0 }; + int ret; + + ret = -xfrog_ag_geometry(xfd->fd, agno, &ageo); + if (ret) { + xfrog_perror(ret, "aginfo"); + return 1; + } + + printf(_("AG: %u\n"), ageo.ag_number); + printf(_("Blocks: %u\n"), ageo.ag_length); + printf(_("Free Blocks: %u\n"), ageo.ag_freeblks); + printf(_("Inodes: %u\n"), ageo.ag_icount); + printf(_("Free Inodes: %u\n"), ageo.ag_ifree); + printf(_("Sick: 0x%x\n"), ageo.ag_sick); + printf(_("Checked: 0x%x\n"), ageo.ag_checked); + printf(_("Flags: 0x%x\n"), ageo.ag_flags); + + return 0; +} + +/* Display AG status. */ +static int +aginfo_f( + int argc, + char **argv) +{ + struct xfs_fd xfd = XFS_FD_INIT(file->fd); + unsigned long long x; + xfs_agnumber_t agno = NULLAGNUMBER; + int c; + int ret = 0; + + ret = -xfd_prepare_geometry(&xfd); + if (ret) { + xfrog_perror(ret, "xfd_prepare_geometry"); + exitcode = 1; + return 1; + } + + while ((c = getopt(argc, argv, "a:")) != EOF) { + switch (c) { + case 'a': + errno = 0; + x = strtoll(optarg, NULL, 10); + if (!errno && x >= NULLAGNUMBER) + errno = ERANGE; + if (errno) { + perror("aginfo"); + return 1; + } + agno = x; + break; + default: + return command_usage(&aginfo_cmd); + } + } + + if (agno != NULLAGNUMBER) { + ret = report_aginfo(&xfd, agno); + } else { + for (agno = 0; !ret && agno < xfd.fsgeom.agcount; agno++) { + ret = report_aginfo(&xfd, agno); + } + } + + return ret; +} + +static void +aginfo_help(void) +{ + printf(_( +"\n" +"Report allocation group geometry.\n" +"\n" +" -a agno -- Report on the given allocation group.\n" +"\n")); + +} + +static cmdinfo_t aginfo_cmd = { + .name = "aginfo", + .cfunc = aginfo_f, + .argmin = 0, + .argmax = 
-1, + .args = "[-a agno]", + .flags = CMD_NOMAP_OK, + .help = aginfo_help, +}; + +void +aginfo_init(void) +{ + aginfo_cmd.oneline = _("Get XFS allocation group state."); + add_command(&aginfo_cmd); +} diff --git a/io/init.c b/io/init.c index 96536a25a1f..78d7d04e7a6 100644 --- a/io/init.c +++ b/io/init.c @@ -44,6 +44,7 @@ init_cvtnum( static void init_commands(void) { + aginfo_init(); atomicupdate_init(); attr_init(); bmap_init(); diff --git a/io/io.h b/io/io.h index 1cfe8edc2db..77bedf5159d 100644 --- a/io/io.h +++ b/io/io.h @@ -189,3 +189,4 @@ extern void repair_init(void); extern void crc32cselftest_init(void); extern void bulkstat_init(void); extern void atomicupdate_init(void); +extern void aginfo_init(void); diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8 index 0c0b00b5712..95c63f32a2a 100644 --- a/man/man8/xfs_io.8 +++ b/man/man8/xfs_io.8 @@ -1545,7 +1545,17 @@ This option is not compatible with the flag. .RE .PD - +.TP +.BI "aginfo [ \-a " agno " ]" +Show information about or update the state of allocation groups. +.RE +.RS 1.0i +.PD 0 +.TP +.BI \-a +Act only on a specific allocation group. +.PD +.RE .SH OTHER COMMANDS .TP ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 34/45] xfs_db: dump rt summary blocks 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (36 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 36/45] xfs_io: add a command to display allocation group information Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 39/45] xfs_io: display rt group in verbose fsmap output Darrick J. Wong ` (6 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Now that rtsummary blocks have a header, make it so that xfs_db can analyze the structure. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/field.c | 5 +++++ db/field.h | 3 +++ db/inode.c | 5 ++++- db/rtgroup.c | 20 ++++++++++++++++++++ db/rtgroup.h | 3 +++ db/type.c | 5 +++++ db/type.h | 1 + 7 files changed, 41 insertions(+), 1 deletion(-) diff --git a/db/field.c b/db/field.c index 7dee8c3735c..4a6a4cf51c3 100644 --- a/db/field.c +++ b/db/field.c @@ -397,6 +397,11 @@ const ftattr_t ftattrtab[] = { FTARG_LE, NULL, NULL }, { FLDT_RGBITMAP, "rgbitmap", NULL, (char *)rgbitmap_flds, btblock_size, FTARG_SIZE, NULL, rgbitmap_flds }, + { FLDT_SUMINFO, "suminfo", fp_num, "%u", SI(bitsz(xfs_suminfo_t)), + 0, NULL, NULL }, + { FLDT_RGSUMMARY, "rgsummary", NULL, (char *)rgsummary_flds, + btblock_size, FTARG_SIZE, NULL, rgsummary_flds }, + { FLDT_ZZZ, NULL } }; diff --git a/db/field.h b/db/field.h index ce7e7297afa..e9c6142f282 100644 --- a/db/field.h +++ b/db/field.h @@ -194,6 +194,9 @@ typedef enum fldt { FLDT_RTWORD, FLDT_RGBITMAP, + FLDT_SUMINFO, + FLDT_RGSUMMARY, + FLDT_ZZZ /* mark last entry */ } fldt_t; diff --git a/db/inode.c b/db/inode.c index 663487f8b14..0b9dc617ba9 100644 --- a/db/inode.c +++ b/db/inode.c @@ -646,8 +646,11 @@ inode_next_type(void) if (xfs_has_rtgroups(mp)) return TYP_RGBITMAP; return TYP_RTBITMAP; - } else if (iocur_top->ino 
== mp->m_sb.sb_rsumino) + } else if (iocur_top->ino == mp->m_sb.sb_rsumino) { + if (xfs_has_rtgroups(mp)) + return TYP_RGSUMMARY; return TYP_RTSUMMARY; + } else if (iocur_top->ino == mp->m_sb.sb_uquotino || iocur_top->ino == mp->m_sb.sb_gquotino || iocur_top->ino == mp->m_sb.sb_pquotino) diff --git a/db/rtgroup.c b/db/rtgroup.c index 350677d4687..db1b9b595d5 100644 --- a/db/rtgroup.c +++ b/db/rtgroup.c @@ -147,3 +147,23 @@ const field_t rgbitmap_hfld[] = { { "", FLDT_RGBITMAP, OI(0), C1, 0, TYP_NONE }, { NULL } }; + +#define OFF(f) bitize(offsetof(struct xfs_rtbuf_blkinfo, rt_ ## f)) +const field_t rgsummary_flds[] = { + { "magicnum", FLDT_UINT32X, OI(OFF(magic)), C1, 0, TYP_NONE }, + { "crc", FLDT_CRC, OI(OFF(crc)), C1, 0, TYP_NONE }, + { "owner", FLDT_INO, OI(OFF(owner)), C1, 0, TYP_NONE }, + { "bno", FLDT_DFSBNO, OI(OFF(blkno)), C1, 0, TYP_BMAPBTD }, + { "lsn", FLDT_UINT64X, OI(OFF(lsn)), C1, 0, TYP_NONE }, + { "uuid", FLDT_UUID, OI(OFF(uuid)), C1, 0, TYP_NONE }, + /* the suminfo array is after the actual structure */ + { "suminfo", FLDT_SUMINFO, OI(bitize(sizeof(struct xfs_rtbuf_blkinfo))), + rtwords_count, FLD_ARRAY | FLD_COUNT, TYP_DATA }, + { NULL } +}; +#undef OFF + +const field_t rgsummary_hfld[] = { + { "", FLDT_RGSUMMARY, OI(0), C1, 0, TYP_NONE }, + { NULL } +}; diff --git a/db/rtgroup.h b/db/rtgroup.h index 3c9b16146fc..2e85d3587fe 100644 --- a/db/rtgroup.h +++ b/db/rtgroup.h @@ -12,6 +12,9 @@ extern const struct field rtsb_hfld[]; extern const struct field rgbitmap_flds[]; extern const struct field rgbitmap_hfld[]; +extern const struct field rgsummary_flds[]; +extern const struct field rgsummary_hfld[]; + extern void rtsb_init(void); extern int rtsb_size(void *obj, int startoff, int idx); diff --git a/db/type.c b/db/type.c index 65e7b24146f..2091b4ac8b1 100644 --- a/db/type.c +++ b/db/type.c @@ -68,6 +68,7 @@ static const typ_t __typtab[] = { { TYP_FINOBT, "finobt", handle_struct, finobt_hfld, NULL, TYP_F_NO_CRC_OFF }, { TYP_RGBITMAP, NULL }, + { 
TYP_RGSUMMARY, NULL }, { TYP_NONE, NULL } }; @@ -116,6 +117,8 @@ static const typ_t __typtab_crc[] = { &xfs_finobt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF }, { TYP_RGBITMAP, "rgbitmap", handle_struct, rgbitmap_hfld, &xfs_rtbitmap_buf_ops, XFS_RTBUF_CRC_OFF }, + { TYP_RGSUMMARY, "rgsummary", handle_struct, rgsummary_hfld, + &xfs_rtsummary_buf_ops, XFS_RTBUF_CRC_OFF }, { TYP_NONE, NULL } }; @@ -164,6 +167,8 @@ static const typ_t __typtab_spcrc[] = { &xfs_finobt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF }, { TYP_RGBITMAP, "rgbitmap", handle_struct, rgbitmap_hfld, &xfs_rtbitmap_buf_ops, XFS_RTBUF_CRC_OFF }, + { TYP_RGSUMMARY, "rgsummary", handle_struct, rgsummary_hfld, + &xfs_rtsummary_buf_ops, XFS_RTBUF_CRC_OFF }, { TYP_NONE, NULL } }; diff --git a/db/type.h b/db/type.h index e2148c6351d..e7f0ecc1768 100644 --- a/db/type.h +++ b/db/type.h @@ -36,6 +36,7 @@ typedef enum typnm TYP_TEXT, TYP_FINOBT, TYP_RGBITMAP, + TYP_RGSUMMARY, TYP_NONE } typnm_t; ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 39/45] xfs_io: display rt group in verbose fsmap output 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (37 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 34/45] xfs_db: dump rt summary blocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 37/45] xfs_io: add a command to display realtime group information Darrick J. Wong ` (5 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Display the rt group number in the fsmap output, just like we do for regular data files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- io/fsmap.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/io/fsmap.c b/io/fsmap.c index 7db51847e2b..cb70f86cb96 100644 --- a/io/fsmap.c +++ b/io/fsmap.c @@ -14,6 +14,7 @@ static cmdinfo_t fsmap_cmd; static dev_t xfs_data_dev; +static dev_t xfs_rt_dev; static void fsmap_help(void) @@ -170,7 +171,7 @@ dump_map_verbose( unsigned long long i; struct fsmap *p; int agno; - off64_t agoff, bperag; + off64_t agoff, bperag, bperrtg; int foff_w, boff_w, aoff_w, tot_w, agno_w, own_w; int nr_w, dev_w; char rbuf[40], bbuf[40], abuf[40], obuf[40]; @@ -185,6 +186,8 @@ dump_map_verbose( tot_w = MINTOT_WIDTH; bperag = (off64_t)fsgeo->agblocks * (off64_t)fsgeo->blocksize; + bperrtg = (off64_t)fsgeo->rgblocks * + (off64_t)fsgeo->blocksize; sunit = (fsgeo->sunit * fsgeo->blocksize); swidth = (fsgeo->swidth * fsgeo->blocksize); @@ -243,6 +246,13 @@ dump_map_verbose( "(%lld..%lld)", (long long)BTOBBT(agoff), (long long)BTOBBT(agoff + p->fmr_length - 1)); + } else if (p->fmr_device == xfs_rt_dev && fsgeo->rgcount > 0) { + agno = p->fmr_physical / bperrtg; + agoff = p->fmr_physical - (agno * bperrtg); + snprintf(abuf, sizeof(abuf), + "(%lld..%lld)", + (long long)BTOBBT(agoff), + (long 
long)BTOBBT(agoff + p->fmr_length - 1)); } else abuf[0] = 0; aoff_w = max(aoff_w, strlen(abuf)); @@ -315,6 +325,16 @@ dump_map_verbose( snprintf(gbuf, sizeof(gbuf), "%lld", (long long)agno); + } else if (p->fmr_device == xfs_rt_dev && fsgeo->rgcount > 0) { + agno = p->fmr_physical / bperrtg; + agoff = p->fmr_physical - (agno * bperrtg); + snprintf(abuf, sizeof(abuf), + "(%lld..%lld)", + (long long)BTOBBT(agoff), + (long long)BTOBBT(agoff + p->fmr_length - 1)); + snprintf(gbuf, sizeof(gbuf), + "%lld", + (long long)agno); } else { abuf[0] = 0; gbuf[0] = 0; @@ -501,6 +521,7 @@ fsmap_f( } fs = fs_table_lookup(file->name, FS_MOUNT_POINT); xfs_data_dev = fs ? fs->fs_datadev : 0; + xfs_rt_dev = fs ? fs->fs_rtdev : 0; head->fmh_count = map_size; do { ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 37/45] xfs_io: add a command to display realtime group information 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (38 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 39/45] xfs_io: display rt group in verbose fsmap output Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 43/45] mkfs: add headers to realtime bitmap blocks Darrick J. Wong ` (4 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a new 'rginfo' command to xfs_io so that we can display realtime group geometry. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- io/aginfo.c | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++ libfrog/fsgeom.c | 18 ++++++++++ libfrog/fsgeom.h | 2 + man/man8/xfs_io.8 | 11 ++++++ 4 files changed, 127 insertions(+) diff --git a/io/aginfo.c b/io/aginfo.c index 06e1cb7ba88..037af54b60c 100644 --- a/io/aginfo.c +++ b/io/aginfo.c @@ -14,6 +14,7 @@ #include "libfrog/fsgeom.h" static cmdinfo_t aginfo_cmd; +static cmdinfo_t rginfo_cmd; static int report_aginfo( @@ -111,9 +112,104 @@ static cmdinfo_t aginfo_cmd = { .help = aginfo_help, }; +static int +report_rginfo( + struct xfs_fd *xfd, + xfs_rgnumber_t rgno) +{ + struct xfs_rtgroup_geometry rgeo = { 0 }; + int ret; + + ret = -xfrog_rtgroup_geometry(xfd->fd, rgno, &rgeo); + if (ret) { + xfrog_perror(ret, "rginfo"); + return 1; + } + + printf(_("RG: %u\n"), rgeo.rg_number); + printf(_("Blocks: %u\n"), rgeo.rg_length); + printf(_("Sick: 0x%x\n"), rgeo.rg_sick); + printf(_("Checked: 0x%x\n"), rgeo.rg_checked); + printf(_("Flags: 0x%x\n"), rgeo.rg_flags); + + return 0; +} + +/* Display rtgroup status. 
*/ +static int +rginfo_f( + int argc, + char **argv) +{ + struct xfs_fd xfd = XFS_FD_INIT(file->fd); + unsigned long long x; + xfs_rgnumber_t rgno = NULLRGNUMBER; + int c; + int ret = 0; + + ret = -xfd_prepare_geometry(&xfd); + if (ret) { + xfrog_perror(ret, "xfd_prepare_geometry"); + exitcode = 1; + return 1; + } + + while ((c = getopt(argc, argv, "r:")) != EOF) { + switch (c) { + case 'r': + errno = 0; + x = strtoll(optarg, NULL, 10); + if (!errno && x >= NULLRGNUMBER) + errno = ERANGE; + if (errno) { + perror("rginfo"); + return 1; + } + rgno = x; + break; + default: + return command_usage(&rginfo_cmd); + } + } + + if (rgno != NULLRGNUMBER) { + ret = report_rginfo(&xfd, rgno); + } else { + for (rgno = 0; !ret && rgno < xfd.fsgeom.rgcount; rgno++) { + ret = report_rginfo(&xfd, rgno); + } + } + + return ret; +} + +static void +rginfo_help(void) +{ + printf(_( +"\n" +"Report realtime group geometry.\n" +"\n" +" -r rgno -- Report on the given realtime group.\n" +"\n")); + +} + +static cmdinfo_t rginfo_cmd = { + .name = "rginfo", + .cfunc = rginfo_f, + .argmin = 0, + .argmax = -1, + .args = "[-r rgno]", + .flags = CMD_NOMAP_OK, + .help = rginfo_help, +}; + void aginfo_init(void) { aginfo_cmd.oneline = _("Get XFS allocation group state."); add_command(&aginfo_cmd); + rginfo_cmd.oneline = _("Get XFS realtime group state."); + add_command(&rginfo_cmd); } diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c index 66e813a863f..5a89c3e3ca3 100644 --- a/libfrog/fsgeom.c +++ b/libfrog/fsgeom.c @@ -210,3 +210,21 @@ xfrog_ag_geometry( return -errno; return 0; } + +/* + * Try to obtain a rt group's geometry. Returns zero or a negative error code. 
+ */ +int +xfrog_rtgroup_geometry( + int fd, + unsigned int rgno, + struct xfs_rtgroup_geometry *rgeo) +{ + int ret; + + rgeo->rg_number = rgno; + ret = ioctl(fd, XFS_IOC_RTGROUP_GEOMETRY, rgeo); + if (ret) + return -errno; + return 0; +} diff --git a/libfrog/fsgeom.h b/libfrog/fsgeom.h index bef864fce63..8c21b240bb2 100644 --- a/libfrog/fsgeom.h +++ b/libfrog/fsgeom.h @@ -9,6 +9,8 @@ void xfs_report_geom(struct xfs_fsop_geom *geo, const char *mntpoint, const char *logname, const char *rtname); int xfrog_geometry(int fd, struct xfs_fsop_geom *fsgeo); int xfrog_ag_geometry(int fd, unsigned int agno, struct xfs_ag_geometry *ageo); +int xfrog_rtgroup_geometry(int fd, unsigned int rgno, + struct xfs_rtgroup_geometry *rgeo); /* * Structure for recording whatever observations we want about the level of diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8 index 95c63f32a2a..16768275b5c 100644 --- a/man/man8/xfs_io.8 +++ b/man/man8/xfs_io.8 @@ -1556,6 +1556,17 @@ Show information about or update the state of allocation groups. Act only on a specific allocation group. .PD .RE +.TP +.BI "rginfo [ \-r " rgno " ]" +Show information about realtime groups. +.RE +.RS 1.0i +.PD 0 +.TP +.BI \-r +Act only on a specific realtime group. +.PD +.RE .SH OTHER COMMANDS .TP ^ permalink raw reply related [flat|nested] 565+ messages in thread
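A note on the error convention used by xfrog_rtgroup_geometry() above: the helper returns zero or a negative errno, and callers such as report_rginfo() negate the result to get a positive errno for xfrog_perror(). The sketch below shows the same pattern with fsync() standing in for the ioctl, since XFS_IOC_RTGROUP_GEOMETRY is only defined by this patch series. Both function names are hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical wrapper in the libfrog style: zero or a negative errno. */
static int sketch_fsync(int fd)
{
	if (fsync(fd))
		return -errno;
	return 0;
}

/*
 * Caller-side view: flip the sign so the result is 0 on success or a
 * positive errno that can be handed to a perror-style reporter.
 */
static int sketch_report(int fd)
{
	return -sketch_fsync(fd);
}
```

Capturing errno immediately after the failing call, inside the wrapper, avoids having it clobbered by whatever the caller does next.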
* [PATCH 43/45] mkfs: add headers to realtime bitmap blocks 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (39 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 37/45] xfs_io: add a command to display realtime group information Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 44/45] mkfs: add headers to realtime summary blocks Darrick J. Wong ` (3 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When the rtgroups feature is enabled, format rtbitmap blocks with the appropriate block headers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- mkfs/proto.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++ mkfs/xfs_mkfs.c | 6 +++++- 2 files changed, 56 insertions(+), 1 deletion(-) diff --git a/mkfs/proto.c b/mkfs/proto.c index 21fe2c7f972..daf0d419bce 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -813,6 +813,50 @@ rtsummary_create( libxfs_imeta_end_update(mp, &upd, 0); } +/* Initialize block headers of rt free space files. 
*/ +static int +init_rtblock_headers( + struct xfs_inode *ip, + xfs_fileoff_t nrblocks, + const struct xfs_buf_ops *ops, + uint32_t magic) +{ + struct xfs_bmbt_irec map; + struct xfs_mount *mp = ip->i_mount; + struct xfs_rtbuf_blkinfo *hdr; + xfs_fileoff_t off = 0; + int error; + + while (off < nrblocks) { + struct xfs_buf *bp; + xfs_daddr_t daddr; + int nimaps = 1; + + error = -libxfs_bmapi_read(ip, off, 1, &map, &nimaps, 0); + if (error) + return error; + + daddr = XFS_FSB_TO_DADDR(mp, map.br_startblock); + error = -libxfs_buf_get(mp->m_ddev_targp, daddr, + XFS_FSB_TO_BB(mp, map.br_blockcount), &bp); + if (error) + return error; + + bp->b_ops = ops; + hdr = bp->b_addr; + hdr->rt_magic = cpu_to_be32(magic); + hdr->rt_owner = cpu_to_be64(ip->i_ino); + hdr->rt_blkno = cpu_to_be64(daddr); + platform_uuid_copy(&hdr->rt_uuid, &mp->m_sb.sb_meta_uuid); + libxfs_buf_mark_dirty(bp); + libxfs_buf_relse(bp); + + off = map.br_startoff + map.br_blockcount; + } + + return 0; +} + /* Zero the realtime bitmap. */ static void rtbitmap_init( @@ -856,6 +900,13 @@ rtbitmap_init( if (error) fail(_("Block allocation of the realtime bitmap inode failed"), error); + + if (xfs_has_rtgroups(mp)) { + error = init_rtblock_headers(mp->m_rbmip, mp->m_sb.sb_rbmblocks, + &xfs_rtbitmap_buf_ops, XFS_RTBITMAP_MAGIC); + if (error) + fail(_("Initialization of rtbitmap failed"), error); + } } /* Zero the realtime summary file. 
*/ diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c index 0324daaad3a..826f9e53309 100644 --- a/mkfs/xfs_mkfs.c +++ b/mkfs/xfs_mkfs.c @@ -870,6 +870,7 @@ struct sb_feat_args { bool nodalign; bool nortalign; bool nrext64; + bool rtgroups; /* XFS_SB_FEAT_INCOMPAT_RTGROUPS */ }; struct cli_params { @@ -3065,6 +3066,7 @@ validate_rtdev( char **devname) { struct libxfs_xinit *xi = cli->xi; + unsigned int rbmblocksize = cfg->blocksize; *devname = NULL; @@ -3112,8 +3114,10 @@ reported by the device (%u).\n"), } cfg->rtextents = cfg->rtblocks / cfg->rtextblocks; + if (cfg->sb_feat.rtgroups) + rbmblocksize -= sizeof(struct xfs_rtbuf_blkinfo); cfg->rtbmblocks = (xfs_extlen_t)howmany(cfg->rtextents, - NBBY * cfg->blocksize); + NBBY * rbmblocksize); } static bool ^ permalink raw reply related [flat|nested] 565+ messages in thread
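As a worked illustration of the sizing change in validate_rtdev(): once each bitmap block gives up space to a header, fewer rt extents fit per block, so the block count can grow. The sketch below is standalone C, not mkfs code. The helper name rtbm_blocks() is hypothetical, and the 48-byte header used in the usage note is an assumption for illustration only; the real value is sizeof(struct xfs_rtbuf_blkinfo), whose full layout is not shown in this patch.

```c
#include <assert.h>
#include <stdint.h>

#define NBBY 8 /* bits per byte, as in the patch */

/* Round-up division, in the spirit of the howmany() macro used by mkfs. */
static uint64_t howmany_u64(uint64_t x, uint64_t y)
{
	return (x + y - 1) / y;
}

/*
 * Hypothetical helper: number of bitmap blocks needed to track rtextents,
 * one bit per extent, when each block reserves hdr_bytes for a header.
 * Pass hdr_bytes = 0 for the layout without block headers.
 */
static uint64_t rtbm_blocks(uint64_t rtextents, unsigned int blocksize,
			    unsigned int hdr_bytes)
{
	assert(hdr_bytes < blocksize);
	return howmany_u64(rtextents,
			   (uint64_t)NBBY * (blocksize - hdr_bytes));
}
```

With 4096-byte blocks, rtbm_blocks(32768, 4096, 0) is 1, while rtbm_blocks(32768, 4096, 48) is 2, because only 32384 bits remain usable per block under the assumed header size.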
* [PATCH 44/45] mkfs: add headers to realtime summary blocks 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (40 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 43/45] mkfs: add headers to realtime bitmap blocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 45/45] mkfs: format realtime groups Darrick J. Wong ` (2 subsequent siblings) 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When the rtgroups feature is enabled, format rtsummary blocks with the appropriate block headers. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- mkfs/proto.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mkfs/proto.c b/mkfs/proto.c index daf0d419bce..846b48ec789 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -953,6 +953,14 @@ rtsummary_init( if (error) fail(_("Block allocation of the realtime summary inode failed"), error); + + if (xfs_has_rtgroups(mp)) { + error = init_rtblock_headers(mp->m_rsumip, + XFS_B_TO_FSB(mp, mp->m_rsumsize), + &xfs_rtsummary_buf_ops, XFS_RTSUMMARY_MAGIC); + if (error) + fail(_("Initialization of rtsummary failed"), error); + } } /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 45/45] mkfs: format realtime groups 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (41 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 44/45] mkfs: add headers to realtime summary blocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 41/45] xfs_scrub: scrub realtime allocation group metadata Darrick J. Wong 2022-12-30 22:19 ` [PATCH 42/45] xfs_scrub: fstrim each rtgroup in parallel Darrick J. Wong 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create filesystems with the realtime group feature enabled. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libfrog/div64.h | 6 + libfrog/util.c | 12 ++ libfrog/util.h | 1 libxfs/libxfs_api_defs.h | 1 libxfs/libxfs_priv.h | 6 - libxfs/topology.c | 42 +++++++ libxfs/topology.h | 3 + libxfs/xfs_format.h | 1 man/man8/mkfs.xfs.8.in | 44 +++++++ mkfs/proto.c | 45 +++++++- mkfs/xfs_mkfs.c | 272 ++++++++++++++++++++++++++++++++++++++++++++++ 11 files changed, 425 insertions(+), 8 deletions(-) diff --git a/libfrog/div64.h b/libfrog/div64.h index 9317b28aad4..0ce8b747938 100644 --- a/libfrog/div64.h +++ b/libfrog/div64.h @@ -80,4 +80,10 @@ roundup_64(uint64_t x, uint32_t y) return x * y; } +static inline __attribute__((const)) +int is_power_of_2(unsigned long n) +{ + return (n != 0 && ((n & (n - 1)) == 0)); +} + #endif /* LIBFROG_DIV64_H_ */ diff --git a/libfrog/util.c b/libfrog/util.c index 46047571a55..4e130c884c1 100644 --- a/libfrog/util.c +++ b/libfrog/util.c @@ -36,3 +36,15 @@ memchr_inv(const void *start, int c, size_t bytes) return NULL; } + +unsigned int +log2_rounddown(unsigned long long i) +{ + int rval; + + for (rval = NBBY * sizeof(i) - 1; rval >= 0; rval--) { + if ((1ULL << rval) < i) + break; + } + return rval; +} diff --git a/libfrog/util.h b/libfrog/util.h index ac2f331c93e..b0715576e8d 
100644 --- a/libfrog/util.h +++ b/libfrog/util.h @@ -7,6 +7,7 @@ #define __LIBFROG_UTIL_H__ unsigned int log2_roundup(unsigned int i); +unsigned int log2_rounddown(unsigned long long i); void *memchr_inv(const void *start, int c, size_t bytes); diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index deadfe2c422..715df25f18b 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -73,6 +73,7 @@ #define xfs_btree_update libxfs_btree_update #define xfs_btree_space_to_height libxfs_btree_space_to_height #define xfs_btree_visit_blocks libxfs_btree_visit_blocks +#define xfs_buf_delwri_queue libxfs_buf_delwri_queue #define xfs_buf_delwri_submit libxfs_buf_delwri_submit #define xfs_buf_get libxfs_buf_get #define xfs_buf_get_uncached libxfs_buf_get_uncached diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h index 57b92ac1b99..66b9409a0b0 100644 --- a/libxfs/libxfs_priv.h +++ b/libxfs/libxfs_priv.h @@ -334,12 +334,6 @@ find_next_zero_bit(const unsigned long *addr, unsigned long size, } #define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0) -static inline __attribute__((const)) -int is_power_of_2(unsigned long n) -{ - return (n != 0 && ((n & (n - 1)) == 0)); -} - /* * xfs_iroundup: round up argument to next power of two */ diff --git a/libxfs/topology.c b/libxfs/topology.c index a17c19691a4..75c6164887f 100644 --- a/libxfs/topology.c +++ b/libxfs/topology.c @@ -89,6 +89,48 @@ calc_default_ag_geometry( *agcount = dblocks / blocks + (dblocks % blocks != 0); } +void +calc_default_rtgroup_geometry( + int blocklog, + uint64_t rblocks, + uint64_t *rgsize, + uint64_t *rgcount) +{ + uint64_t blocks = 0; + int shift = 0; + + /* + * For a single underlying storage device over 4TB in size use the + * maximum rtgroup size. Between 128MB and 4TB, just use 4 rtgroups + * and scale up smoothly between min/max rtgroup sizes. 
+ */ + if (rblocks >= TERABYTES(4, blocklog)) { + blocks = XFS_MAX_RGBLOCKS; + goto done; + } + if (rblocks >= MEGABYTES(128, blocklog)) { + shift = XFS_NOMULTIDISK_AGLOG; + goto calc_blocks; + } + + /* + * If rblocks is not evenly divisible by the number of desired rt + * groups, round "blocks" up so we don't lose the last bit of the + * filesystem. The same principle applies to the rt group count, so we + * don't lose the last rt group! + */ +calc_blocks: + ASSERT(shift >= 0 && shift <= XFS_MULTIDISK_AGLOG); + blocks = rblocks >> shift; + if (rblocks & xfs_mask32lo(shift)) { + if (blocks < XFS_MAX_RGBLOCKS) + blocks++; + } +done: + *rgsize = blocks; + *rgcount = rblocks / blocks + (rblocks % blocks != 0); +} + /* * Check for existing filesystem or partition table on device. * Returns: diff --git a/libxfs/topology.h b/libxfs/topology.h index 1a0fe24c09d..81843cbb803 100644 --- a/libxfs/topology.h +++ b/libxfs/topology.h @@ -32,6 +32,9 @@ calc_default_ag_geometry( uint64_t *agsize, uint64_t *agcount); +void calc_default_rtgroup_geometry(int blocklog, uint64_t rblocks, + uint64_t *rgsize, uint64_t *rgcount); + extern int check_overwrite( const char *device); diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index 7e76bedda68..e4f3b2c5c05 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -416,6 +416,7 @@ xfs_sb_has_ro_compat_feature( XFS_SB_FEAT_INCOMPAT_BIGTIME| \ XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR| \ XFS_SB_FEAT_INCOMPAT_NREXT64 | \ + XFS_SB_FEAT_INCOMPAT_RTGROUPS | \ XFS_SB_FEAT_INCOMPAT_METADIR) #define XFS_SB_FEAT_INCOMPAT_UNKNOWN ~XFS_SB_FEAT_INCOMPAT_ALL diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in index 8cdfe9a7ff1..1735caecca7 100644 --- a/man/man8/mkfs.xfs.8.in +++ b/man/man8/mkfs.xfs.8.in @@ -1083,6 +1083,50 @@ or logical volume containing the section. .BI noalign This option disables stripe size detection, enforcing a realtime device with no stripe geometry. 
+.TP +.BI rtgroups= value +This feature breaks the realtime section into multiple allocation groups for +improved scalability. +This feature is only available if the metadata directory tree feature is +enabled. +.IP +By default, +.B mkfs.xfs +will not enable this feature. +If the option +.B \-r rtgroups=0 +is used, the rt group feature is not supported and is disabled. +.TP +.BI rgcount= +This is used to specify the number of allocation groups in the realtime +section. +The realtime section of the filesystem can be divided into allocation groups to +improve the performance of XFS. +More allocation groups imply that more parallelism can be achieved when +allocating blocks. +The minimum allocation group size is 2 realtime extents; the maximum size is +2^31 blocks. +The rt section of the filesystem is divided into +.I value +allocation groups (default value is scaled automatically based +on the underlying device size). +.TP +.BI rgsize= value +This is an alternative to using the +.B rgcount +suboption. The +.I value +is the desired size of the realtime allocation group expressed in bytes +(usually using the +.BR m " or " g +suffixes). +This value must be a multiple of the realtime extent size, +must be at least two realtime extents, and no more than 2^31 blocks. +The +.B rgcount +and +.B rgsize +suboptions are mutually exclusive. 
.RE .PP .PD 0 diff --git a/mkfs/proto.c b/mkfs/proto.c index 846b48ec789..e734269864e 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -963,6 +963,46 @@ rtsummary_init( } } +static void +rtfreesp_init_groups( + struct xfs_mount *mp) +{ + xfs_rgnumber_t rgno; + int error; + + for (rgno = 0; rgno < mp->m_sb.sb_rgcount; rgno++) { + struct xfs_trans *tp; + xfs_rtblock_t rtbno; + xfs_rtxnum_t start_rtx; + xfs_rtxnum_t next_rtx; + + rtbno = xfs_rgbno_to_rtb(mp, rgno, mp->m_sb.sb_rextsize); + start_rtx = xfs_rtb_to_rtxt(mp, rtbno); + + rtbno = xfs_rgbno_to_rtb(mp, rgno + 1, 0); + next_rtx = xfs_rtb_to_rtxt(mp, rtbno); + next_rtx = min(next_rtx, mp->m_sb.sb_rextents); + + error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, + 0, 0, 0, &tp); + if (error) + res_failed(error); + + libxfs_trans_ijoin(tp, mp->m_rbmip, 0); + error = -libxfs_rtfree_extent(tp, start_rtx, + next_rtx - start_rtx); + if (error) { + fail(_("Error initializing the realtime space"), + error); + } + error = -libxfs_trans_commit(tp); + if (error) + fail(_("Initialization of the realtime space failed"), + error); + + } +} + /* * Free the whole realtime area using transactions. * Do one transaction per bitmap block. 
@@ -1011,7 +1051,10 @@ rtinit( rtbitmap_init(mp); rtsummary_init(mp); - rtfreesp_init(mp); + if (xfs_has_rtgroups(mp)) + rtfreesp_init_groups(mp); + else + rtfreesp_init(mp); } static long diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c index 826f9e53309..4f96e436d32 100644 --- a/mkfs/xfs_mkfs.c +++ b/mkfs/xfs_mkfs.c @@ -123,6 +123,9 @@ enum { R_FILE, R_NAME, R_NOALIGN, + R_RTGROUPS, + R_RGCOUNT, + R_RGSIZE, R_MAX_OPTS, }; @@ -679,6 +682,9 @@ static struct opt_params ropts = { [R_FILE] = "file", [R_NAME] = "name", [R_NOALIGN] = "noalign", + [R_RTGROUPS] = "rtgroups", + [R_RGCOUNT] = "rgcount", + [R_RGSIZE] = "rgsize", [R_MAX_OPTS] = NULL, }, .subopt_params = { @@ -718,6 +724,27 @@ static struct opt_params ropts = { .defaultval = 1, .conflicts = { { NULL, LAST_CONFLICT } }, }, + { .index = R_RTGROUPS, + .conflicts = { { NULL, LAST_CONFLICT } }, + .minval = 0, + .maxval = 1, + .defaultval = 1, + }, + { .index = R_RGCOUNT, + .conflicts = { { &ropts, R_RGSIZE }, + { NULL, LAST_CONFLICT } }, + .minval = 1, + .maxval = XFS_MAX_RGNUMBER, + .defaultval = SUBOPT_NEEDS_VAL, + }, + { .index = R_RGSIZE, + .conflicts = { { &ropts, R_RGCOUNT }, + { NULL, LAST_CONFLICT } }, + .convert = true, + .minval = 0, + .maxval = (unsigned long long)XFS_MAX_RGBLOCKS << XFS_MAX_BLOCKSIZE_LOG, + .defaultval = SUBOPT_NEEDS_VAL, + }, }, }; @@ -882,6 +909,7 @@ struct cli_params { /* parameters that depend on sector/block size being validated.
*/ char *dsize; char *agsize; + char *rgsize; char *dsu; char *dirblocksize; char *logsize; @@ -902,6 +930,7 @@ struct cli_params { /* parameters where 0 is not a valid value */ int64_t agcount; + int64_t rgcount; int inodesize; int inopblock; int imaxpct; @@ -958,6 +987,9 @@ struct mkfs_params { uint64_t agsize; uint64_t agcount; + uint64_t rgsize; + uint64_t rgcount; + int imaxpct; bool loginternal; @@ -1014,7 +1046,8 @@ usage( void ) /* no-op info only */ [-N]\n\ /* prototype file */ [-p fname]\n\ /* quiet */ [-q]\n\ -/* realtime subvol */ [-r extsize=num,size=num,rtdev=xxx]\n\ +/* realtime subvol */ [-r extsize=num,size=num,rtdev=xxx,rtgroups=0|1,\n\ + rgcount=n,rgsize=n]\n\ /* sectorsize */ [-s size=num]\n\ /* version */ [-V]\n\ devicename\n\ @@ -1884,6 +1917,15 @@ rtdev_opts_parser( case R_NOALIGN: cli->sb_feat.nortalign = getnum(value, opts, subopt); break; + case R_RTGROUPS: + cli->sb_feat.rtgroups = getnum(value, opts, subopt); + break; + case R_RGCOUNT: + cli->rgcount = getnum(value, opts, subopt); + break; + case R_RGSIZE: + cli->rgsize = getstr(value, opts, subopt); + break; default: return -EINVAL; } @@ -2365,6 +2407,15 @@ _("cowextsize not supported without reflink support\n")); usage(); } + if (cli->sb_feat.rtgroups && !cli->sb_feat.metadir) { + if (cli_opt_set(&mopts, M_METADIR)) { + fprintf(stderr, +_("realtime groups not supported without metadata directory support\n")); + usage(); + } + cli->sb_feat.metadir = true; + } + /* * Copy features across to config structure now. */ @@ -3362,6 +3413,181 @@ an AG size that is one stripe unit smaller or larger, for example %llu.\n"), cfg->agsize, cfg->agcount); } +static uint64_t +calc_rgsize_extsize_nonpower( + struct mkfs_params *cfg) +{ + uint64_t try_rgsize, rgsize, rgcount; + + /* + * For non-power-of-two rt extent sizes, round the rtgroup size down to + * the nearest extent. 
+ */ + calc_default_rtgroup_geometry(cfg->blocklog, cfg->rtblocks, &rgsize, + &rgcount); + rgsize -= rgsize % cfg->rtextblocks; + rgsize = min(XFS_MAX_RGBLOCKS, rgsize); + + /* + * If we would be left with a too-small rtgroup, increase or decrease + * the size of the group until we have a working geometry. + */ + for (try_rgsize = rgsize; + try_rgsize <= XFS_MAX_RGBLOCKS - cfg->rtextblocks; + try_rgsize += cfg->rtextblocks) { + if (cfg->rtblocks % try_rgsize >= (2 * cfg->rtextblocks)) + return try_rgsize; + } + for (try_rgsize = rgsize; + try_rgsize > (2 * cfg->rtextblocks); + try_rgsize -= cfg->rtextblocks) { + if (cfg->rtblocks % try_rgsize >= (2 * cfg->rtextblocks)) + return try_rgsize; + } + + fprintf(stderr, +_("realtime group size (%llu) not at all congruent with extent size (%llu)\n"), + (unsigned long long)rgsize, + (unsigned long long)cfg->rtextblocks); + usage(); + return 0; +} + +static uint64_t +calc_rgsize_extsize_power( + struct mkfs_params *cfg) +{ + uint64_t try_rgsize, rgsize, rgcount; + unsigned int rgsizelog; + + /* + * Find the rt group size that is both a power of two and yields at + * least as many rt groups as the default geometry specified. + */ + calc_default_rtgroup_geometry(cfg->blocklog, cfg->rtblocks, &rgsize, + &rgcount); + rgsizelog = log2_rounddown(rgsize); + rgsize = min(XFS_MAX_RGBLOCKS, 1U << rgsizelog); + + /* + * If we would be left with a too-small rtgroup, increase or decrease + * the size of the group by powers of 2 until we have a working + * geometry. If that doesn't work, try bumping by the extent size. 
+ */ + for (try_rgsize = rgsize; + try_rgsize <= XFS_MAX_RGBLOCKS - cfg->rtextblocks; + try_rgsize <<= 2) { + if (cfg->rtblocks % try_rgsize >= (2 * cfg->rtextblocks)) + return try_rgsize; + } + for (try_rgsize = rgsize; + try_rgsize > (2 * cfg->rtextblocks); + try_rgsize >>= 2) { + if (cfg->rtblocks % try_rgsize >= (2 * cfg->rtextblocks)) + return try_rgsize; + } + for (try_rgsize = rgsize; + try_rgsize <= XFS_MAX_RGBLOCKS - cfg->rtextblocks; + try_rgsize += cfg->rtextblocks) { + if (cfg->rtblocks % try_rgsize >= (2 * cfg->rtextblocks)) + return try_rgsize; + } + for (try_rgsize = rgsize; + try_rgsize > (2 * cfg->rtextblocks); + try_rgsize -= cfg->rtextblocks) { + if (cfg->rtblocks % try_rgsize >= (2 * cfg->rtextblocks)) + return try_rgsize; + } + + fprintf(stderr, +_("realtime group size (%llu) not at all congruent with extent size (%llu)\n"), + (unsigned long long)rgsize, + (unsigned long long)cfg->rtextblocks); + usage(); + return 0; +} + +static void +calculate_rtgroup_geometry( + struct mkfs_params *cfg, + struct cli_params *cli) +{ + if (!cli->sb_feat.rtgroups) { + cfg->rgcount = 0; + cfg->rgsize = 0; + return; + } + + if (cli->rgsize) { /* User-specified rtgroup size */ + cfg->rgsize = getnum(cli->rgsize, &ropts, R_RGSIZE); + + /* + * Check specified agsize is a multiple of blocksize. + */ + if (cfg->rgsize % cfg->blocksize) { + fprintf(stderr, +_("rgsize (%s) not a multiple of fs blk size (%d)\n"), + cli->rgsize, cfg->blocksize); + usage(); + } + cfg->rgsize /= cfg->blocksize; + cfg->rgcount = cfg->rtblocks / cfg->rgsize + + (cfg->rtblocks % cfg->rgsize != 0); + + } else if (cli->rgcount) { /* User-specified rtgroup count */ + cfg->rgcount = cli->rgcount; + cfg->rgsize = cfg->rtblocks / cfg->rgcount + + (cfg->rtblocks % cfg->rgcount != 0); + } else if (cfg->rtblocks == 0) { + /* + * If nobody specified a realtime device or the rtgroup size, + * try 1TB, rounded down to the nearest rt extent. 
+ */ + cfg->rgsize = TERABYTES(1, cfg->blocklog); + cfg->rgsize -= cfg->rgsize % cfg->rtextblocks; + cfg->rgcount = 0; + } else if (!is_power_of_2(cfg->rtextblocks)) { + cfg->rgsize = calc_rgsize_extsize_nonpower(cfg); + cfg->rgcount = cfg->rtblocks / cfg->rgsize + + (cfg->rtblocks % cfg->rgsize != 0); + } else { + cfg->rgsize = calc_rgsize_extsize_power(cfg); + cfg->rgcount = cfg->rtblocks / cfg->rgsize + + (cfg->rtblocks % cfg->rgsize != 0); + } + + if (cfg->rgsize > XFS_MAX_RGBLOCKS) { + fprintf(stderr, +_("realtime group size (%llu) must be less than the maximum (%u)\n"), + (unsigned long long)cfg->rgsize, + XFS_MAX_RGBLOCKS); + usage(); + } + + if (cfg->rgsize % cfg->rtextblocks != 0) { + fprintf(stderr, +_("realtime group size (%llu) not a multiple of rt extent size (%llu)\n"), + (unsigned long long)cfg->rgsize, + (unsigned long long)cfg->rtextblocks); + usage(); + } + + if (cfg->rgsize <= cfg->rtextblocks) { + fprintf(stderr, +_("realtime group size (%llu) must be at least two realtime extents\n"), + (unsigned long long)cfg->rgsize); + usage(); + } + + if (cfg->rgcount > XFS_MAX_RGNUMBER) { + fprintf(stderr, +_("realtime group count (%llu) must be less than the maximum (%u)\n"), + (unsigned long long)cfg->rgcount, + XFS_MAX_RGNUMBER); + usage(); + } +} + static void calculate_imaxpct( struct mkfs_params *cfg, @@ -3499,6 +3725,12 @@ sb_set_features( if (fp->nrext64) sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_NREXT64; + + if (fp->rtgroups) { + sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_RTGROUPS; + sbp->sb_rgcount = cfg->rgcount; + sbp->sb_rgblocks = cfg->rgsize; + } } /* @@ -4274,6 +4506,7 @@ main( char **argv) { xfs_agnumber_t agno; + xfs_rgnumber_t rgno; struct xfs_buf *buf; int c; char *dfile = NULL; @@ -4494,6 +4727,7 @@ main( */ calculate_initial_ag_geometry(&cfg, &cli, &xi); align_ag_geometry(&cfg); + calculate_rtgroup_geometry(&cfg, &cli); calculate_imaxpct(&cfg, &cli); @@ -4587,6 +4821,42 @@ main( exit(1); } + /* Write all the realtime 
group superblocks. */ + for (rgno = 0; rgno < cfg.rgcount; rgno++) { + struct xfs_buf *rtsb_bp; + struct xfs_buf *sb_bp = libxfs_getsb(mp); + + if (!sb_bp) { + fprintf(stderr, + _("%s: couldn't grab buffers to write primary rt superblock\n"), progname); + exit(1); + } + + error = -libxfs_buf_get_uncached(mp->m_rtdev_targp, + XFS_FSB_TO_BB(mp, 1), 0, + &rtsb_bp); + if (error) { + fprintf(stderr, + _("%s: couldn't grab primary rt superblock\n"), progname); + exit(1); + } + rtsb_bp->b_maps[0].bm_bn = XFS_RTSB_DADDR; + rtsb_bp->b_ops = &xfs_rtsb_buf_ops; + + libxfs_rtgroup_update_super(rtsb_bp, sb_bp); + libxfs_buf_mark_dirty(rtsb_bp); + libxfs_buf_relse(rtsb_bp); + libxfs_buf_relse(sb_bp); + + error = -libxfs_rtgroup_update_secondary_sbs(mp); + if (error) { + fprintf(stderr, + _("%s: writing secondary rtgroup headers failed, err=%d\n"), + progname, error); + exit(1); + } + } + /* * Initialise the freespace freelists (i.e. AGFLs) in each AG. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
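One detail that may be worth checking in review is how log2_rounddown() behaves at exact powers of two, since calc_rgsize_extsize_power() feeds its result into a shift. The standalone sketch below copies the loop from the patch into log2_rounddown_posted() and compares it with a hypothetical variant, log2_floor(), that tests with <= instead of <. Both function names are made up for this illustration, and the patch does not say which behavior is intended.

```c
#include <limits.h>

/* The loop as posted in libfrog/util.c: stops at the first 2^rval < i. */
static unsigned int log2_rounddown_posted(unsigned long long i)
{
	int rval;

	for (rval = CHAR_BIT * sizeof(i) - 1; rval >= 0; rval--) {
		if ((1ULL << rval) < i)
			break;
	}
	return rval;
}

/* Hypothetical variant: stops at the first 2^rval <= i, i.e. floor(log2(i)). */
static unsigned int log2_floor(unsigned long long i)
{
	int rval;

	for (rval = CHAR_BIT * sizeof(i) - 1; rval >= 0; rval--) {
		if ((1ULL << rval) <= i)
			break;
	}
	return rval;
}
```

For inputs that are not powers of two, the two functions agree (both give 3 for an input of 9). For an input of 8, the posted loop gives 2 and the variant gives 3, so a default rtgroup size that is already a power of two would be halved by the subsequent 1U << rgsizelog. For an input of 1, the posted loop runs off the end and returns -1 converted to unsigned int.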
* [PATCH 41/45] xfs_scrub: scrub realtime allocation group metadata 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (42 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 45/45] mkfs: format realtime groups Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 42/45] xfs_scrub: fstrim each rtgroup in parallel Darrick J. Wong 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Scan realtime group metadata as part of phase 2, just like we do for AG metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- scrub/phase2.c | 97 +++++++++++++++++++++++++++++++++++++++++++++++++++++++- scrub/scrub.h | 9 +++++ 2 files changed, 104 insertions(+), 2 deletions(-) diff --git a/scrub/phase2.c b/scrub/phase2.c index ebe3ad3ad5c..a224af11ed4 100644 --- a/scrub/phase2.c +++ b/scrub/phase2.c @@ -28,6 +28,14 @@ struct scan_ctl { pthread_mutex_t rbm_waitlock; bool rbm_done; + /* + * Control mechanism to signal that each group's scan of the rt bitmap + * file scan is done and wake up any waiters. + */ + pthread_cond_t rbm_group_wait; + pthread_mutex_t rbm_group_waitlock; + unsigned int rbm_group_count; + bool aborted; }; @@ -178,6 +186,48 @@ scan_metafile( } } +/* Scrub each rt group's metadata. */ +static void +scan_rtgroup_metadata( + struct workqueue *wq, + xfs_agnumber_t rgno, + void *arg) +{ + struct scrub_item sri; + struct scrub_ctx *ctx = (struct scrub_ctx *)wq->wq_ctx; + struct scan_ctl *sctl = arg; + char descr[DESCR_BUFSZ]; + int ret; + + if (sctl->aborted) + goto out; + + scrub_item_init_rtgroup(&sri, rgno); + snprintf(descr, DESCR_BUFSZ, _("rtgroup %u"), rgno); + + scrub_item_schedule_group(&sri, XFROG_SCRUB_GROUP_RTGROUP); + ret = scrub_item_check(ctx, &sri); + if (ret) { + sctl->aborted = true; + goto out; + } + + /* Everything else gets fixed during phase 4. 
*/ + ret = defer_fs_repair(ctx, &sri); + if (ret) { + sctl->aborted = true; + goto out; + } + +out: + /* Signal anybody waiting for the group bitmap scan to finish. */ + pthread_mutex_lock(&sctl->rbm_group_waitlock); + sctl->rbm_group_count--; + if (sctl->rbm_group_count == 0) + pthread_cond_broadcast(&sctl->rbm_group_wait); + pthread_mutex_unlock(&sctl->rbm_group_waitlock); +} + /* Scan all filesystem metadata. */ int phase2_func( @@ -191,6 +241,7 @@ phase2_func( struct scrub_item sri; const struct xfrog_scrub_descr *sc = xfrog_scrubbers; xfs_agnumber_t agno; + xfs_rgnumber_t rgno; unsigned int type; int ret, ret2; @@ -256,8 +307,10 @@ phase2_func( goto out_wq; /* - * Wait for the rt bitmap to finish scanning, then scan the rt summary - * since the summary can be regenerated completely from the bitmap. + * Wait for the rt bitmap to finish scanning, then scan the realtime + * group metadata. When rtgroups are enabled, the RTBITMAP scanner + * only checks the inode and fork data of the rt bitmap file, and each + * group checks its own part of the rtbitmap. */ ret = pthread_mutex_lock(&sctl.rbm_waitlock); if (ret) { @@ -274,6 +327,46 @@ phase2_func( } pthread_mutex_unlock(&sctl.rbm_waitlock); + if (sctl.aborted) + goto out_wq; + + for (rgno = 0; + rgno < ctx->mnt.fsgeom.rgcount && !sctl.aborted; + rgno++) { + pthread_mutex_lock(&sctl.rbm_group_waitlock); + sctl.rbm_group_count++; + pthread_mutex_unlock(&sctl.rbm_group_waitlock); + ret = -workqueue_add(&wq, scan_rtgroup_metadata, rgno, &sctl); + if (ret) { + str_liberror(ctx, ret, + _("queueing rtgroup scrub work")); + goto out_wq; + } + } + + if (sctl.aborted) + goto out_wq; + + /* + * Wait for the rtgroups to finish scanning, then scan the rt summary + * since the summary can be regenerated completely from the bitmap. 
+ */ + ret = pthread_mutex_lock(&sctl.rbm_group_waitlock); + if (ret) { + str_liberror(ctx, ret, _("waiting for rtgroup scrubbers")); + goto out_wq; + } + if (sctl.rbm_group_count > 0) { + ret = pthread_cond_wait(&sctl.rbm_group_wait, + &sctl.rbm_group_waitlock); + if (ret) { + str_liberror(ctx, ret, + _("waiting for rtgroup scrubbers")); + goto out_wq; + } + } + pthread_mutex_unlock(&sctl.rbm_group_waitlock); + if (sctl.aborted) goto out_wq; diff --git a/scrub/scrub.h b/scrub/scrub.h index 53354099c81..b7e6173f8fa 100644 --- a/scrub/scrub.h +++ b/scrub/scrub.h @@ -87,6 +87,15 @@ scrub_item_init_ag(struct scrub_item *sri, xfs_agnumber_t agno) sri->sri_gen = -1U; } +static inline void +scrub_item_init_rtgroup(struct scrub_item *sri, xfs_rgnumber_t rgno) +{ + memset(sri, 0, sizeof(*sri)); + sri->sri_agno = rgno; + sri->sri_ino = -1ULL; + sri->sri_gen = -1U; +} + static inline void scrub_item_init_fs(struct scrub_item *sri) { ^ permalink raw reply related [flat|nested] 565+ messages in thread
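The wait in phase2_func() tests rbm_group_count with an if before calling pthread_cond_wait(). POSIX permits condition waits to return spuriously, so the conventional form re-checks the predicate in a loop. Below is a minimal standalone sketch of that pattern under those assumptions. It is not xfs_scrub code; struct group_waiter and its functions are hypothetical names standing in for the rbm_group_* fields of struct scan_ctl.

```c
#include <pthread.h>

/* Hypothetical stand-in for the rbm_group_* fields of struct scan_ctl. */
struct group_waiter {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int outstanding;
};

static void group_waiter_init(struct group_waiter *gw)
{
	pthread_mutex_init(&gw->lock, NULL);
	pthread_cond_init(&gw->cond, NULL);
	gw->outstanding = 0;
}

/* Called before queueing each work item. */
static void group_waiter_add(struct group_waiter *gw)
{
	pthread_mutex_lock(&gw->lock);
	gw->outstanding++;
	pthread_mutex_unlock(&gw->lock);
}

/* Called by each worker when it finishes; wakes waiters on the last one. */
static void group_waiter_done(struct group_waiter *gw)
{
	pthread_mutex_lock(&gw->lock);
	if (--gw->outstanding == 0)
		pthread_cond_broadcast(&gw->cond);
	pthread_mutex_unlock(&gw->lock);
}

/*
 * Block until every queued item has finished. The while loop re-checks the
 * predicate after each wakeup, and the mutex is released on every return
 * path. Returns 0, or the error number from pthread_cond_wait().
 */
static int group_waiter_wait(struct group_waiter *gw)
{
	int ret = 0;

	pthread_mutex_lock(&gw->lock);
	while (gw->outstanding > 0 && ret == 0)
		ret = pthread_cond_wait(&gw->cond, &gw->lock);
	pthread_mutex_unlock(&gw->lock);
	return ret;
}
```

In the patch as posted, the error path after pthread_cond_wait() jumps to out_wq without unlocking rbm_group_waitlock. The sketch unlocks on all paths, which may or may not matter for how xfs_scrub tears down its workqueue.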
* [PATCH 42/45] xfs_scrub: fstrim each rtgroup in parallel 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong ` (43 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 41/45] xfs_scrub: scrub realtime allocation group metadata Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 44 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libfrog/fsgeom.h | 21 +++++++++++++++++++++ scrub/phase8.c | 46 +++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 66 insertions(+), 1 deletion(-) diff --git a/libfrog/fsgeom.h b/libfrog/fsgeom.h index 8c21b240bb2..6c6d6bb815a 100644 --- a/libfrog/fsgeom.h +++ b/libfrog/fsgeom.h @@ -203,4 +203,25 @@ cvt_b_to_agbno( return cvt_daddr_to_agbno(xfd, cvt_btobbt(byteno)); } +/* Convert rtgroup number and rtgroup block to fs block number */ +static inline uint64_t +cvt_rgbno_to_daddr( + struct xfs_fd *xfd, + uint32_t rgno, + uint32_t rgbno) +{ + return cvt_off_fsb_to_bb(xfd, + (uint64_t)rgno * xfd->fsgeom.rgblocks + rgbno); +} + +/* Convert rtgroup number and rtgroup block to a byte location on disk. */ +static inline uint64_t +cvt_rgbno_to_b( + struct xfs_fd *xfd, + xfs_rgnumber_t rgno, + xfs_rgblock_t rgbno) +{ + return cvt_bbtob(cvt_rgbno_to_daddr(xfd, rgno, rgbno)); +} + #endif /* __LIBFROG_FSGEOM_H__ */ diff --git a/scrub/phase8.c b/scrub/phase8.c index a8ea8db706b..cc4901f8614 100644 --- a/scrub/phase8.c +++ b/scrub/phase8.c @@ -48,6 +48,7 @@ fstrim_ok( struct trim_ctl { uint64_t datadev_end_pos; + uint64_t rtdev_end_pos; bool aborted; }; @@ -80,6 +81,35 @@ trim_ag( progress_add(1); } +/* Trim each rt group. 
*/ +static void +trim_rtgroup( + struct workqueue *wq, + xfs_agnumber_t rgno, + void *arg) +{ + struct scrub_ctx *ctx = (struct scrub_ctx *)wq->wq_ctx; + struct trim_ctl *tctl = arg; + uint64_t pos, len, eortg_pos; + int error; + + pos = cvt_rgbno_to_b(&ctx->mnt, rgno, 0); + eortg_pos = cvt_rgbno_to_b(&ctx->mnt, rgno, ctx->mnt.fsgeom.rgblocks); + len = min(tctl->rtdev_end_pos, eortg_pos) - pos; + + error = fstrim(ctx, pos + tctl->datadev_end_pos, len); + if (error) { + char descr[DESCR_BUFSZ]; + + snprintf(descr, sizeof(descr) - 1, _("fstrim rgno %u"), rgno); + str_liberror(ctx, error, descr); + tctl->aborted = true; + return; + } + + progress_add(1); +} + /* Trim the filesystem, if desired. */ int phase8_func( @@ -97,6 +127,8 @@ phase8_func( tctl.datadev_end_pos = cvt_off_fsb_to_b(&ctx->mnt, ctx->mnt.fsgeom.datablocks); + tctl.rtdev_end_pos = cvt_off_fsb_to_b(&ctx->mnt, + ctx->mnt.fsgeom.rtblocks); error = -workqueue_create(&wq, (struct xfs_mount *)ctx, disk_heads(ctx->datadev)); @@ -117,6 +149,18 @@ phase8_func( } } + /* Trim each rtgroup in parallel. */ + for (agno = 0; + agno < ctx->mnt.fsgeom.rgcount && !tctl.aborted; + agno++) { + error = -workqueue_add(&wq, trim_rtgroup, agno, &tctl); + if (error) { + str_liberror(ctx, error, + _("queueing per-rtgroup fstrim work")); + goto out_wq; + } + } + out_wq: err2 = -workqueue_terminate(&wq); if (err2) { @@ -142,7 +186,7 @@ phase8_estimate( *items = 0; if (fstrim_ok(ctx)) - *items = ctx->mnt.fsgeom.agcount; + *items = ctx->mnt.fsgeom.agcount + ctx->mnt.fsgeom.rgcount; *nr_threads = disk_heads(ctx->datadev); *rshift = 0; ^ permalink raw reply related [flat|nested] 565+ messages in thread
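To make the range arithmetic in trim_rtgroup() concrete: each group starts at rgno * rgblocks, and the length is clamped so that a short final group does not run past the end of the realtime device. The following standalone sketch works in bytes relative to the start of the rt device; in the patch, the position handed to fstrim() is additionally offset by datadev_end_pos. The struct and function names here are hypothetical and are not part of libfrog.

```c
#include <stdint.h>

/* Hypothetical description of one fstrim request, in bytes. */
struct trim_range {
	uint64_t pos;
	uint64_t len;
};

/*
 * Byte range covered by rtgroup rgno, relative to the start of the realtime
 * device and clamped to the device end. All parameters are illustrative:
 * rgblocks is the group size in fs blocks, rtblocks the device size in fs
 * blocks, and blocksize the fs block size in bytes.
 */
static struct trim_range rtgroup_trim_range(uint32_t rgno, uint32_t rgblocks,
					    uint64_t rtblocks,
					    uint32_t blocksize)
{
	struct trim_range r;
	uint64_t rtdev_end = rtblocks * blocksize;
	uint64_t group_end = ((uint64_t)rgno + 1) * rgblocks * blocksize;

	r.pos = (uint64_t)rgno * rgblocks * blocksize;
	r.len = (group_end < rtdev_end ? group_end : rtdev_end) - r.pos;
	return r;
}
```

For example, with 1000-block groups on a 2500-block device and 4096-byte blocks, group 2 starts at byte 8192000 and covers only 500 blocks (2048000 bytes) instead of a full 1000.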
* [PATCHSET v1.0 0/3] libxfs: widen EFI format to support rt 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (25 preceding siblings ...) 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 2/3] xfs_logprint: report realtime EFIs Darrick J. Wong ` (2 more replies) 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (12 subsequent siblings) 39 siblings, 3 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs Hi all, Realtime reverse mapping (and beyond that, realtime reflink) needs to be able to defer file mapping and extent freeing work in much the same manner as is required on the data volume. Make the extent freeing log items operate on rt extents in preparation for realtime rmap. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. --D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-extfree-intents xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-extfree-intents --- libxfs/defer_item.c | 18 ++++++++++++++++++ libxfs/xfs_alloc.c | 35 ++++++++++++++++++++++++++++------- libxfs/xfs_alloc.h | 17 +++++++++++++++-- libxfs/xfs_defer.c | 1 + libxfs/xfs_defer.h | 1 + libxfs/xfs_log_format.h | 7 +++++++ libxfs/xfs_rtbitmap.c | 4 ++++ logprint/log_redo.c | 20 ++++++++++++++++---- 8 files changed, 90 insertions(+), 13 deletions(-) ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 2/3] xfs_logprint: report realtime EFIs 2022-12-30 22:19 ` [PATCHSET v1.0 0/3] libxfs: widen EFI format to support rt Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/3] xfs: support logging EFIs for realtime extents Darrick J. Wong 2022-12-30 22:19 ` [PATCH 3/3] xfs: support error injection when freeing rt extents Darrick J. Wong 2 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Decode the EFI format just enough to report if an EFI targets the realtime device or not. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- logprint/log_redo.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/logprint/log_redo.c b/logprint/log_redo.c index 770485df75d..8fbc19a60e9 100644 --- a/logprint/log_redo.c +++ b/logprint/log_redo.c @@ -113,8 +113,14 @@ xlog_print_trans_efi( ex = f->efi_extents; for (i=0; i < f->efi_nextents; i++) { - printf("(s: 0x%llx, l: %d) ", - (unsigned long long)ex->ext_start, ex->ext_len); + unsigned int len; + int rt; + + rt = !!(ex->ext_len & XFS_EFI_EXTLEN_REALTIME_EXT); + len = ex->ext_len & ~XFS_EFI_EXTLEN_REALTIME_EXT; + + printf("(s: 0x%llx, l: %u, rt? %d) ", + (unsigned long long)ex->ext_start, len, rt); if (i % 4 == 3) printf("\n"); ex++; } @@ -160,8 +166,14 @@ xlog_recover_print_efi( ex = f->efi_extents; printf(" "); for (i=0; i< f->efi_nextents; i++) { - printf("(s: 0x%llx, l: %d) ", - (unsigned long long)ex->ext_start, ex->ext_len); + unsigned int len; + int rt; + + rt = !!(ex->ext_len & XFS_EFI_EXTLEN_REALTIME_EXT); + len = ex->ext_len & ~XFS_EFI_EXTLEN_REALTIME_EXT; + + printf("(s: 0x%llx, l: %u, rt? %d) ", + (unsigned long long)ex->ext_start, len, rt); if (i % 4 == 3) printf("\n"); ex++; ^ permalink raw reply related [flat|nested] 565+ messages in thread
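The decoding done in both print functions can be sketched on its own. Patch 1/3 describes the realtime marker as the upper bit of the length field, so the sketch below assumes bit 31 of a 32-bit length. That value is an assumption for illustration; the authoritative definition is XFS_EFI_EXTLEN_REALTIME_EXT in libxfs/xfs_log_format.h. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed flag value: the top bit of the 32-bit extent length. The real
 * definition lives in libxfs/xfs_log_format.h.
 */
#define EFI_EXTLEN_REALTIME_ASSUMED	(1U << 31)

/* Hypothetical decoded view of an EFI extent length field. */
struct efi_len {
	uint32_t len;		/* extent length with the flag masked off */
	bool realtime;		/* true if the extent is on the rt device */
};

static struct efi_len efi_decode_len(uint32_t ext_len)
{
	struct efi_len d;

	d.realtime = (ext_len & EFI_EXTLEN_REALTIME_ASSUMED) != 0;
	d.len = ext_len & ~EFI_EXTLEN_REALTIME_ASSUMED;
	return d;
}
```

Masking before printing is what lets logprint switch the format from %d to %u without showing a huge bogus length for realtime extents.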
* [PATCH 1/3] xfs: support logging EFIs for realtime extents
From: Darrick J. Wong @ 2022-12-30 22:19 UTC
To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Teach the EFI mechanism how to free realtime extents.  We do this very
sneakily, by using the upper bit of the length field in the log format
(and a boolean flag incore) to convey the realtime status.  We're going
to need this to enforce proper ordering of operations when we enable
realtime rmap.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/defer_item.c     |   18 ++++++++++++++++++
 libxfs/xfs_alloc.c      |   35 ++++++++++++++++++++++++++++-------
 libxfs/xfs_alloc.h      |   17 +++++++++++++++--
 libxfs/xfs_defer.c      |    1 +
 libxfs/xfs_defer.h      |    1 +
 libxfs/xfs_log_format.h |    7 +++++++
 6 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c
index be6ecbc348f..99899e6b617 100644
--- a/libxfs/defer_item.c
+++ b/libxfs/defer_item.c
@@ -44,6 +44,11 @@ xfs_extent_free_diff_items(
 	ra = container_of(a, struct xfs_extent_free_item, xefi_list);
 	rb = container_of(b, struct xfs_extent_free_item, xefi_list);

+	ASSERT(xfs_efi_is_realtime(ra) == xfs_efi_is_realtime(rb));
+
+	if (xfs_efi_is_realtime(ra))
+		return ra->xefi_rtg->rtg_rgno - rb->xefi_rtg->rtg_rgno;
+
 	return ra->xefi_pag->pag_agno - rb->xefi_pag->pag_agno;
 }

@@ -80,6 +85,14 @@ xfs_extent_free_get_group(
 {
 	xfs_agnumber_t			agno;

+	if (xfs_efi_is_realtime(xefi)) {
+		xfs_rgnumber_t		rgno;
+
+		rgno = xfs_rtb_to_rgno(mp, xefi->xefi_startblock);
+		xefi->xefi_rtg = xfs_rtgroup_get(mp, rgno);
+		return;
+	}
+
 	agno = XFS_FSB_TO_AGNO(mp, xefi->xefi_startblock);
 	xefi->xefi_pag = xfs_perag_get(mp, agno);
 	xfs_perag_bump_intents(xefi->xefi_pag);
@@ -90,6 +103,11 @@ static inline void
 xfs_extent_free_put_group(
 	struct xfs_extent_free_item	*xefi)
 {
+	if (xfs_efi_is_realtime(xefi)) {
+		xfs_rtgroup_put(xefi->xefi_rtg);
+		return;
+	}
+
 	xfs_perag_drop_intents(xefi->xefi_pag);
 	xfs_perag_put(xefi->xefi_pag);
 }
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 461178716b6..43b63462374 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -2603,6 +2603,7 @@ xfs_free_extent_later(
 {
 	struct xfs_extent_free_item	*xefi;
 	struct xfs_mount		*mp = tp->t_mountp;
+	enum xfs_defer_ops_type		optype;
 #ifdef DEBUG
 	xfs_agnumber_t			agno;
 	xfs_agblock_t			agbno;
@@ -2611,12 +2612,19 @@ xfs_free_extent_later(
 	ASSERT(len > 0);
 	ASSERT(len <= XFS_MAX_BMBT_EXTLEN);
 	ASSERT(!isnullstartblock(bno));
-	agno = XFS_FSB_TO_AGNO(mp, bno);
-	agbno = XFS_FSB_TO_AGBNO(mp, bno);
-	ASSERT(agno < mp->m_sb.sb_agcount);
-	ASSERT(agbno < mp->m_sb.sb_agblocks);
-	ASSERT(len < mp->m_sb.sb_agblocks);
-	ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
+	if (flags & XFS_FREE_EXTENT_REALTIME) {
+		ASSERT(bno < mp->m_sb.sb_rblocks);
+		ASSERT(len <= mp->m_sb.sb_rblocks);
+		ASSERT(bno + len <= mp->m_sb.sb_rblocks);
+	} else {
+		agno = XFS_FSB_TO_AGNO(mp, bno);
+		agbno = XFS_FSB_TO_AGBNO(mp, bno);
+
+		ASSERT(agno < mp->m_sb.sb_agcount);
+		ASSERT(agbno < mp->m_sb.sb_agblocks);
+		ASSERT(len < mp->m_sb.sb_agblocks);
+		ASSERT(agbno + len <= mp->m_sb.sb_agblocks);
+	}
 #endif
 	ASSERT(!(flags & ~XFS_FREE_EXTENT_ALL_FLAGS));
 	ASSERT(xfs_extfree_item_cache != NULL);
@@ -2627,6 +2635,19 @@ xfs_free_extent_later(
 	xefi->xefi_blockcount = (xfs_extlen_t)len;
 	if (flags & XFS_FREE_EXTENT_SKIP_DISCARD)
 		xefi->xefi_flags |= XFS_EFI_SKIP_DISCARD;
+	if (flags & XFS_FREE_EXTENT_REALTIME) {
+		/*
+		 * Realtime and data section EFIs must use separate
+		 * transactions to finish deferred work because updates to
+		 * realtime metadata files can lock AGFs to allocate btree
+		 * blocks and we don't want that mixing with the AGF locks
+		 * taken to finish data section EFIs.
+		 */
+		optype = XFS_DEFER_OPS_TYPE_FREE_RT;
+		xefi->xefi_flags |= XFS_EFI_REALTIME;
+	} else {
+		optype = XFS_DEFER_OPS_TYPE_FREE;
+	}
 	if (oinfo) {
 		ASSERT(oinfo->oi_offset == 0);

@@ -2642,7 +2663,7 @@ xfs_free_extent_later(
 	trace_xfs_extent_free_defer(mp, xefi);

 	xfs_extent_free_get_group(mp, xefi);
-	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_FREE, &xefi->xefi_list);
+	xfs_defer_add(tp, optype, &xefi->xefi_list);
 }

 #ifdef DEBUG
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index 19c5f046c3c..cd7b26568a3 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -228,7 +228,11 @@ void xfs_free_extent_later(struct xfs_trans *tp, xfs_fsblock_t bno,
 /* Don't issue a discard for the blocks freed. */
 #define XFS_FREE_EXTENT_SKIP_DISCARD	(1U << 0)

-#define XFS_FREE_EXTENT_ALL_FLAGS	(XFS_FREE_EXTENT_SKIP_DISCARD)
+/* Free blocks on the realtime device. */
+#define XFS_FREE_EXTENT_REALTIME	(1U << 1)
+
+#define XFS_FREE_EXTENT_ALL_FLAGS	(XFS_FREE_EXTENT_SKIP_DISCARD | \
+					 XFS_FREE_EXTENT_REALTIME)

 /*
  * List of extents to be free "later".
@@ -239,7 +243,10 @@ struct xfs_extent_free_item {
 	uint64_t		xefi_owner;
 	xfs_fsblock_t		xefi_startblock;/* starting fs block number */
 	xfs_extlen_t		xefi_blockcount;/* number of blocks in extent */
-	struct xfs_perag	*xefi_pag;
+	union {
+		struct xfs_perag	*xefi_pag;
+		struct xfs_rtgroup	*xefi_rtg;
+	};
 	unsigned int		xefi_flags;
 };

@@ -249,6 +256,12 @@ void xfs_extent_free_get_group(struct xfs_mount *mp,
 #define XFS_EFI_SKIP_DISCARD	(1U << 0) /* don't issue discard */
 #define XFS_EFI_ATTR_FORK	(1U << 1) /* freeing attr fork block */
 #define XFS_EFI_BMBT_BLOCK	(1U << 2) /* freeing bmap btree block */
+#define XFS_EFI_REALTIME	(1U << 3) /* freeing realtime extent */
+
+static inline bool xfs_efi_is_realtime(const struct xfs_extent_free_item *xefi)
+{
+	return xefi->xefi_flags & XFS_EFI_REALTIME;
+}

 extern struct kmem_cache	*xfs_extfree_item_cache;

diff --git a/libxfs/xfs_defer.c b/libxfs/xfs_defer.c
index 47a7c5ed1f5..c148ed38eb0 100644
--- a/libxfs/xfs_defer.c
+++ b/libxfs/xfs_defer.c
@@ -183,6 +183,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = {
 	[XFS_DEFER_OPS_TYPE_REFCOUNT]	= &xfs_refcount_update_defer_type,
 	[XFS_DEFER_OPS_TYPE_RMAP]	= &xfs_rmap_update_defer_type,
 	[XFS_DEFER_OPS_TYPE_FREE]	= &xfs_extent_free_defer_type,
+	[XFS_DEFER_OPS_TYPE_FREE_RT]	= &xfs_extent_free_defer_type,
 	[XFS_DEFER_OPS_TYPE_AGFL_FREE]	= &xfs_agfl_free_defer_type,
 	[XFS_DEFER_OPS_TYPE_ATTR]	= &xfs_attr_defer_type,
 	[XFS_DEFER_OPS_TYPE_SWAPEXT]	= &xfs_swapext_defer_type,
diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h
index bcc48b0c75c..52198c7124c 100644
--- a/libxfs/xfs_defer.h
+++ b/libxfs/xfs_defer.h
@@ -19,6 +19,7 @@ enum xfs_defer_ops_type {
 	XFS_DEFER_OPS_TYPE_RMAP,
 	XFS_DEFER_OPS_TYPE_FREE,
 	XFS_DEFER_OPS_TYPE_AGFL_FREE,
+	XFS_DEFER_OPS_TYPE_FREE_RT,
 	XFS_DEFER_OPS_TYPE_ATTR,
 	XFS_DEFER_OPS_TYPE_SWAPEXT,
 	XFS_DEFER_OPS_TYPE_MAX,
diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index 378201a7002..f3c8257a754 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -591,6 +591,13 @@ typedef struct xfs_extent {
 	xfs_extlen_t	ext_len;
 } xfs_extent_t;

+/*
+ * This EFI extent describes a realtime extent.  We can never free more than
+ * XFS_MAX_BMBT_EXTLEN (2^21) blocks at a time, so we know that the upper bits
+ * of ext_len cannot be used.
+ */
+#define XFS_EFI_EXTLEN_REALTIME_EXT	(1U << 31)
+
 /*
  * Since an xfs_extent_t has types (start:64, len: 32)
  * there are different alignments on 32 bit and 64 bit kernels.
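The incore side of this patch is a flag-discriminated union: XFS_EFI_REALTIME in xefi_flags says which of xefi_pag or xefi_rtg is valid. A minimal standalone sketch of that pattern follows. The struct and function names here (perag, rtgroup, free_item, free_item_group) are simplified stand-ins invented for the example, not the libxfs definitions; only the flag value matches the patch.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFI_REALTIME	(1U << 3)	/* same value as XFS_EFI_REALTIME */

/* Simplified stand-ins for struct xfs_perag and struct xfs_rtgroup. */
struct perag   { uint32_t agno; };
struct rtgroup { uint32_t rgno; };

/* Simplified stand-in for struct xfs_extent_free_item. */
struct free_item {
	union {
		struct perag	*pag;	/* valid when EFI_REALTIME is clear */
		struct rtgroup	*rtg;	/* valid when EFI_REALTIME is set */
	};
	unsigned int		flags;
};

/* Counterpart of xfs_efi_is_realtime() for the stand-in struct. */
static inline bool free_item_is_realtime(const struct free_item *fi)
{
	return fi->flags & EFI_REALTIME;
}

/* Return the group number of whichever group this item refers to. */
static inline uint32_t free_item_group(const struct free_item *fi)
{
	return free_item_is_realtime(fi) ? fi->rtg->rgno : fi->pag->agno;
}
```

Every reader of the union has to check the flag first, which is why the patch adds an early realtime branch to each of the get_group, put_group, and diff_items helpers.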
* [PATCH 3/3] xfs: support error injection when freeing rt extents
From: Darrick J. Wong @ 2022-12-30 22:19 UTC
To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

A handful of fstests expect to be able to test what happens when extent
free intents fail to actually free the extent.  Now that we're
supporting EFIs for realtime extents, add to xfs_rtfree_extent the same
injection point that exists in the regular extent freeing code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_rtbitmap.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/libxfs/xfs_rtbitmap.c b/libxfs/xfs_rtbitmap.c
index 26428898d60..b0934b1f24a 100644
--- a/libxfs/xfs_rtbitmap.c
+++ b/libxfs/xfs_rtbitmap.c
@@ -16,6 +16,7 @@
 #include "xfs_trans.h"
 #include "xfs_health.h"
 #include "xfs_rtbitmap.h"
+#include "xfs_errortag.h"

 /*
  * Realtime allocator bitmap functions shared with userspace.
@@ -1135,6 +1136,9 @@ xfs_rtfree_extent(
 	ASSERT(mp->m_rbmip->i_itemp != NULL);
 	ASSERT(xfs_isilocked(mp->m_rbmip, XFS_ILOCK_EXCL));

+	if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_FREE_EXTENT))
+		return -EIO;
+
 	error = xfs_rtcheck_alloc_range(mp, tp, start, len);
 	if (error)
 		return error;
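The general shape of an injection point like this one can be sketched in standalone C. Note the assumptions: struct errtag, test_error() and free_extent_sketch() are hypothetical names, and the every-Nth-call trigger is a deterministic simplification chosen so the behaviour is easy to follow. It is not a description of how XFS_TEST_ERROR decides when to fire.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical injection knob; a frequency of 0 disables the injection point. */
struct errtag {
	unsigned int	frequency;	/* fail once every 'frequency' calls */
	unsigned int	calls;		/* armed calls seen so far */
};

/* Return true if the real check failed or if the knob says to fail now. */
static bool test_error(bool real_failure, struct errtag *tag)
{
	if (real_failure)
		return true;
	if (tag->frequency == 0)
		return false;
	return ++tag->calls % tag->frequency == 0;
}

/* Stand-in for a free routine with the injection point ahead of the real work. */
static int free_extent_sketch(struct errtag *tag)
{
	if (test_error(false, tag))
		return -EIO;
	/* ... the real bitmap and summary updates would go here ... */
	return 0;
}
```

Placing the check before any metadata is modified, as the patch does, means an injected failure leaves the extent untouched, which is the situation the fstests mentioned above want to exercise.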
* [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support
From: Darrick J. Wong @ 2022-12-30 22:19 UTC
To: djwong, cem; +Cc: linux-xfs

Hi all,

Add libxfs code from the kernel, then teach the various utilities about
how to access realtime rmapbt information and rebuild it.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-rmap

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-rmap

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-rmap

xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=realtime-rmap
---
 db/bmroot.c               |  149 ++++++
 db/bmroot.h               |    2
 db/btblock.c              |  103 ++++
 db/btblock.h              |    5
 db/btdump.c               |   64 +++
 db/btheight.c             |    5
 db/check.c                |  203 +++++++++
 db/field.c                |   11
 db/field.h                |    5
 db/fsmap.c                |  135 ++++++
 db/inode.c                |  113 +++++
 db/inode.h                |    4
 db/metadump.c             |  125 +++++
 db/type.c                 |    5
 db/type.h                 |    1
 include/libxfs.h          |    2
 include/xfs_mount.h       |   14 +
 libfrog/scrub.c           |    5
 libxfs/Makefile           |    2
 libxfs/defer_item.c       |   28 +
 libxfs/init.c             |   20 -
 libxfs/libxfs_api_defs.h  |   19 +
 libxfs/rdwr.c             |    2
 libxfs/trans.c            |    1
 libxfs/xfbtree.c          |    2
 libxfs/xfs_bmap.c         |   22 +
 libxfs/xfs_btree.c        |  120 +++++
 libxfs/xfs_btree.h        |    7
 libxfs/xfs_defer.c        |    1
 libxfs/xfs_defer.h        |    1
 libxfs/xfs_format.h       |   24 +
 libxfs/xfs_fs.h           |    4
 libxfs/xfs_health.h       |    4
 libxfs/xfs_imeta.c        |    6
 libxfs/xfs_inode_buf.c    |    6
 libxfs/xfs_inode_fork.c   |   13 +
 libxfs/xfs_log_format.h   |    4
 libxfs/xfs_refcount.c     |    6
 libxfs/xfs_rmap.c         |  227 ++++++++--
 libxfs/xfs_rmap.h         |   22 +
 libxfs/xfs_rtgroup.c      |   12 +
 libxfs/xfs_rtgroup.h      |   20 +
 libxfs/xfs_rtrmap_btree.c | 1033 +++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtrmap_btree.h |  218 +++++++++
 libxfs/xfs_sb.c           |    6
 libxfs/xfs_shared.h       |    2
 libxfs/xfs_swapext.c      |    4
 libxfs/xfs_trans_resv.c   |   12 -
 libxfs/xfs_trans_space.h  |   13 +
 libxfs/xfs_types.h        |    5
 man/man8/xfs_db.8         |   63 +++
 mkfs/proto.c              |   62 +++
 mkfs/xfs_mkfs.c           |   90 ++++
 repair/Makefile           |    1
 repair/agbtree.c          |    5
 repair/bmap_repair.c      |  122 +++++
 repair/dino_chunks.c      |   13 +
 repair/dinode.c           |  373 ++++++++++++++--
 repair/dir2.c             |    4
 repair/incore.h           |    1
 repair/phase2.c           |   92 ++++
 repair/phase4.c           |   12 +
 repair/phase6.c           |  178 ++++++++
 repair/rmap.c             |  482 +++++++++++++++++----
 repair/rmap.h             |   15 -
 repair/rtrmap_repair.c    |  253 +++++++++++
 repair/scan.c             |  411 +++++++++++++++++-
 repair/scan.h             |   37 ++
 repair/xfs_repair.c       |    8
 scrub/phase4.c            |   43 ++
 scrub/repair.c            |  124 +++++
 scrub/repair.h            |    5
 spaceman/health.c         |   10
 73 files changed, 4949 insertions(+), 272 deletions(-)
 create mode 100644 libxfs/xfs_rtrmap_btree.c
 create mode 100644 libxfs/xfs_rtrmap_btree.h
 create mode 100644 repair/rtrmap_repair.c
* [PATCH 04/41] xfs: realtime rmap btree transaction reservations
From: Darrick J. Wong @ 2022-12-30 22:19 UTC
To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Make sure that there's enough log reservation to handle mapping
and unmapping realtime extents.  We have to reserve enough space
to handle a split in the rtrmapbt to add the record and a second
split in the regular rmapbt to record the rtrmapbt split.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_swapext.c     |    4 +++-
 libxfs/xfs_trans_resv.c  |   12 ++++++++++--
 libxfs/xfs_trans_space.h |   13 +++++++++++++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/libxfs/xfs_swapext.c b/libxfs/xfs_swapext.c
index 718600019a7..3c22c1d8e2f 100644
--- a/libxfs/xfs_swapext.c
+++ b/libxfs/xfs_swapext.c
@@ -700,7 +700,9 @@ xfs_swapext_rmapbt_blocks(
 	if (!xfs_has_rmapbt(mp))
 		return 0;
 	if (XFS_IS_REALTIME_INODE(req->ip1))
-		return 0;
+		return howmany_64(req->nr_exchanges,
+					XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp)) *
+			XFS_RTRMAPADD_SPACE_RES(mp);

 	return howmany_64(req->nr_exchanges,
 					XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)) *
diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c
index db0c745a431..42322a42058 100644
--- a/libxfs/xfs_trans_resv.c
+++ b/libxfs/xfs_trans_resv.c
@@ -210,7 +210,9 @@ xfs_calc_inode_chunk_res(
  * Per-extent log reservation for the btree changes involved in freeing or
  * allocating a realtime extent.  We have to be able to log as many rtbitmap
  * blocks as needed to mark inuse XFS_BMBT_MAX_EXTLEN blocks' worth of realtime
- * extents, as well as the realtime summary block.
+ * extents, as well as the realtime summary block (t1).  Realtime rmap btree
+ * operations happen in a second transaction, so factor in a couple of rtrmapbt
+ * splits (t2).
  */
 static unsigned int
 xfs_rtalloc_block_count(
@@ -219,10 +221,16 @@ xfs_rtalloc_block_count(
 {
 	unsigned int		rtbmp_blocks;
 	xfs_rtxlen_t		rtxlen;
+	unsigned int		t1, t2 = 0;

 	rtxlen = xfs_extlen_to_rtxlen(mp, XFS_MAX_BMBT_EXTLEN);
 	rtbmp_blocks = xfs_rtbitmap_blockcount(mp, rtxlen);
-	return (rtbmp_blocks + 1) * num_ops;
+	t1 = (rtbmp_blocks + 1) * num_ops;
+
+	if (xfs_has_rmapbt(mp))
+		t2 = num_ops * (2 * mp->m_rtrmap_maxlevels - 1);
+
+	return max(t1, t2);
 }

 /*
diff --git a/libxfs/xfs_trans_space.h b/libxfs/xfs_trans_space.h
index 9640fc232c1..8124893a035 100644
--- a/libxfs/xfs_trans_space.h
+++ b/libxfs/xfs_trans_space.h
@@ -14,6 +14,19 @@
 #define XFS_MAX_CONTIG_BMAPS_PER_BLOCK(mp)    \
 		(((mp)->m_bmap_dmxr[0]) - ((mp)->m_bmap_dmnr[0]))

+/* Worst case number of realtime rmaps that can be held in a block. */
+#define XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp)    \
+		(((mp)->m_rtrmap_mxr[0]) - ((mp)->m_rtrmap_mnr[0]))
+
+/* Adding one realtime rmap could split every level to the top of the tree. */
+#define XFS_RTRMAPADD_SPACE_RES(mp) ((mp)->m_rtrmap_maxlevels)
+
+/* Blocks we might need to add "b" realtime rmaps to a tree. */
+#define XFS_NRTRMAPADD_SPACE_RES(mp, b) \
+	((((b) + XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp) - 1) / \
+	  XFS_MAX_CONTIG_RTRMAPS_PER_BLOCK(mp)) * \
+	  XFS_RTRMAPADD_SPACE_RES(mp))
+
 /* Worst case number of rmaps that can be held in a block. */
 #define XFS_MAX_CONTIG_RMAPS_PER_BLOCK(mp)    \
 		(((mp)->m_rmap_mxr[0]) - ((mp)->m_rmap_mnr[0]))
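The arithmetic in this patch can be tried out with a small standalone sketch. It takes plain integers instead of a struct xfs_mount, and the function names (div_round_up, rtalloc_block_count, nrtrmapadd_space_res) are made up for the example. One deliberate simplification is labeled in the code: a maxlevels value of zero stands in for the xfs_has_rmapbt() check.

```c
/* Round-up division, playing the role of howmany_64() in the patch. */
static unsigned long long div_round_up(unsigned long long x,
		unsigned long long y)
{
	return (x + y - 1) / y;
}

/*
 * Per-transaction block count, following xfs_rtalloc_block_count():
 * t1 covers the rtbitmap blocks plus one rtsummary block per operation,
 * t2 is the (2 * maxlevels - 1) rtrmapbt term per operation.  The two
 * are logged in separate transactions, so the larger one is returned.
 * Assumption for this sketch: rtrmap_maxlevels == 0 means "no rmapbt".
 */
static unsigned int rtalloc_block_count(unsigned int rtbmp_blocks,
		unsigned int rtrmap_maxlevels, unsigned int num_ops)
{
	unsigned int	t1 = (rtbmp_blocks + 1) * num_ops;
	unsigned int	t2 = 0;

	if (rtrmap_maxlevels > 0)
		t2 = num_ops * (2 * rtrmap_maxlevels - 1);

	return t1 > t2 ? t1 : t2;
}

/*
 * Blocks that might be needed to add nr_rmaps records, following
 * XFS_NRTRMAPADD_SPACE_RES(): one worst-case split (maxlevels blocks)
 * for every max_contig_per_block records, rounded up.
 */
static unsigned long long nrtrmapadd_space_res(unsigned long long nr_rmaps,
		unsigned int max_contig_per_block,
		unsigned int rtrmap_maxlevels)
{
	return div_round_up(nr_rmaps, max_contig_per_block) * rtrmap_maxlevels;
}
```

With 3 rtbitmap blocks, 2 operations and a 5-level rtrmapbt, t1 is 8 and t2 is 18, so the sketch returns 18. Adding 10 rmaps at 4 per block with a 3-level tree reserves 9 blocks.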
* [PATCH 01/41] xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions
From: Darrick J. Wong @ 2022-12-30 22:19 UTC
To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Simplify the calling conventions by allowing callers to pass a fsbno
(xfs_fsblock_t) directly into these functions, since we're just going to
set it in a struct anyway.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_refcount.c |    6 ++----
 libxfs/xfs_rmap.c     |   12 +++++-------
 libxfs/xfs_rmap.h     |    8 ++++----
 repair/rmap.c         |    8 ++++----
 4 files changed, 15 insertions(+), 19 deletions(-)

diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index 6d1e1bfccd5..5cb2132d9ac 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -1888,8 +1888,7 @@ xfs_refcount_alloc_cow_extent(
 	__xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len);

 	/* Add rmap entry */
-	xfs_rmap_alloc_extent(tp, XFS_FSB_TO_AGNO(mp, fsb),
-			XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW);
+	xfs_rmap_alloc_extent(tp, fsb, len, XFS_RMAP_OWN_COW);
 }

 /* Forget a CoW staging event in the refcount btree. */
@@ -1905,8 +1904,7 @@ xfs_refcount_free_cow_extent(
 		return;

 	/* Remove rmap entry */
-	xfs_rmap_free_extent(tp, XFS_FSB_TO_AGNO(mp, fsb),
-			XFS_FSB_TO_AGBNO(mp, fsb), len, XFS_RMAP_OWN_COW);
+	xfs_rmap_free_extent(tp, fsb, len, XFS_RMAP_OWN_COW);
 	__xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len);
 }

diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index 958e26f2dfd..bce30dd66d6 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -525,7 +525,7 @@ xfs_rmap_free_check_owner(
 	struct xfs_btree_cur	*cur,
 	uint64_t		ltoff,
 	struct xfs_rmap_irec	*rec,
-	xfs_filblks_t		len,
+	xfs_extlen_t		len,
 	uint64_t		owner,
 	uint64_t		offset,
 	unsigned int		flags)
@@ -2744,8 +2744,7 @@ xfs_rmap_convert_extent(
 void
 xfs_rmap_alloc_extent(
 	struct xfs_trans	*tp,
-	xfs_agnumber_t		agno,
-	xfs_agblock_t		bno,
+	xfs_fsblock_t		fsbno,
 	xfs_extlen_t		len,
 	uint64_t		owner)
 {
@@ -2754,7 +2753,7 @@ xfs_rmap_alloc_extent(
 	if (!xfs_rmap_update_is_needed(tp->t_mountp, XFS_DATA_FORK))
 		return;

-	bmap.br_startblock = XFS_AGB_TO_FSB(tp->t_mountp, agno, bno);
+	bmap.br_startblock = fsbno;
 	bmap.br_blockcount = len;
 	bmap.br_startoff = 0;
 	bmap.br_state = XFS_EXT_NORM;
@@ -2766,8 +2765,7 @@ xfs_rmap_alloc_extent(
 void
 xfs_rmap_free_extent(
 	struct xfs_trans	*tp,
-	xfs_agnumber_t		agno,
-	xfs_agblock_t		bno,
+	xfs_fsblock_t		fsbno,
 	xfs_extlen_t		len,
 	uint64_t		owner)
 {
@@ -2776,7 +2774,7 @@ xfs_rmap_free_extent(
 	if (!xfs_rmap_update_is_needed(tp->t_mountp, XFS_DATA_FORK))
 		return;

-	bmap.br_startblock = XFS_AGB_TO_FSB(tp->t_mountp, agno, bno);
+	bmap.br_startblock = fsbno;
 	bmap.br_blockcount = len;
 	bmap.br_startoff = 0;
 	bmap.br_state = XFS_EXT_NORM;
diff --git a/libxfs/xfs_rmap.h b/libxfs/xfs_rmap.h
index 36af4de506c..54c969731cf 100644
--- a/libxfs/xfs_rmap.h
+++ b/libxfs/xfs_rmap.h
@@ -187,10 +187,10 @@ void xfs_rmap_unmap_extent(struct xfs_trans *tp, struct xfs_inode *ip,
 void xfs_rmap_convert_extent(struct xfs_mount *mp, struct xfs_trans *tp,
 		struct xfs_inode *ip, int whichfork,
 		struct xfs_bmbt_irec *imap);
-void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_agnumber_t agno,
-		xfs_agblock_t bno, xfs_extlen_t len, uint64_t owner);
-void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_agnumber_t agno,
-		xfs_agblock_t bno, xfs_extlen_t len, uint64_t owner);
+void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno,
+		xfs_extlen_t len, uint64_t owner);
+void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno,
+		xfs_extlen_t len, uint64_t owner);

 void xfs_rmap_finish_one_cleanup(struct xfs_trans *tp,
 		struct xfs_btree_cur *rcur, int error);
diff --git a/repair/rmap.c b/repair/rmap.c
index 00381c6e69d..db85b747e53 100644
--- a/repair/rmap.c
+++ b/repair/rmap.c
@@ -1281,7 +1281,6 @@ rmap_diffkeys(
 {
 	__u64			oa;
 	__u64			ob;
-	int64_t			d;
 	struct xfs_rmap_irec	tmp;

 	tmp = *kp1;
@@ -1291,9 +1290,10 @@ rmap_diffkeys(
 	tmp.rm_flags &= ~XFS_RMAP_REC_FLAGS;
 	ob = libxfs_rmap_irec_offset_pack(&tmp);

-	d = (int64_t)kp1->rm_startblock - kp2->rm_startblock;
-	if (d)
-		return d;
+	if (kp1->rm_startblock > kp2->rm_startblock)
+		return 1;
+	else if (kp2->rm_startblock > kp1->rm_startblock)
+		return -1;

 	if (kp1->rm_owner > kp2->rm_owner)
 		return 1;
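The commit message does not explain why rmap_diffkeys() drops the subtraction in favour of explicit comparisons, so the following is only a guess at the motivation: in general, the difference of two unsigned 64-bit values does not fit in a signed 64-bit result, and a subtraction-based comparator can then report the wrong sign. A minimal standalone sketch of the two styles, with made-up function names:

```c
#include <stdint.h>

/*
 * Subtraction-based comparison.  The sign of the result is only
 * meaningful while the two values differ by less than 2^63; beyond
 * that the converted difference wraps on the usual two's-complement
 * targets.
 */
static int64_t cmp_by_subtraction(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b);
}

/* Explicit three-way comparison, correct for every pair of inputs. */
static int cmp_three_way(uint64_t a, uint64_t b)
{
	if (a > b)
		return 1;
	if (b > a)
		return -1;
	return 0;
}
```

For small inputs the two agree. Comparing UINT64_MAX against 0, the three-way version returns 1, while the subtraction version yields a negative value on common compilers and so claims the larger key is the smaller one.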
* [PATCH 03/41] xfs: define the on-disk realtime rmap btree format 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong 2022-12-30 22:19 ` [PATCH 04/41] xfs: realtime rmap btree transaction reservations Darrick J. Wong 2022-12-30 22:19 ` [PATCH 01/41] xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/41] xfs: introduce realtime rmap btree definitions Darrick J. Wong ` (37 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Start filling out the rtrmap btree implementation. Start with the on-disk btree format; add everything needed to read, write and manipulate rmap btree blocks. This prepares the way for connecting the btree operations implementation. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/libxfs.h | 1 include/xfs_mount.h | 9 + libxfs/Makefile | 2 libxfs/init.c | 5 - libxfs/xfs_btree.c | 6 + libxfs/xfs_format.h | 3 libxfs/xfs_rtrmap_btree.c | 304 +++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtrmap_btree.h | 83 ++++++++++++ libxfs/xfs_sb.c | 6 + libxfs/xfs_shared.h | 2 10 files changed, 419 insertions(+), 2 deletions(-) create mode 100644 libxfs/xfs_rtrmap_btree.c create mode 100644 libxfs/xfs_rtrmap_btree.h diff --git a/include/libxfs.h b/include/libxfs.h index 5b58750fcd5..0b255e2c104 100644 --- a/include/libxfs.h +++ b/include/libxfs.h @@ -87,6 +87,7 @@ struct iomap; #include "xfs_imeta.h" #include "xfs_rtbitmap.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" #ifndef ARRAY_SIZE #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) diff --git a/include/xfs_mount.h b/include/xfs_mount.h index ff33a2c450b..a4d0ba70e83 100644 --- a/include/xfs_mount.h +++ b/include/xfs_mount.h @@ -80,11 +80,14 @@ typedef struct xfs_mount { uint 
m_bmap_dmnr[2]; /* XFS_BMAP_BLOCK_DMINRECS */ uint m_rmap_mxr[2]; /* max rmap btree records */ uint m_rmap_mnr[2]; /* min rmap btree records */ + uint m_rtrmap_mxr[2]; /* max rtrmap btree records */ + uint m_rtrmap_mnr[2]; /* min rtrmap btree records */ uint m_refc_mxr[2]; /* max refc btree records */ uint m_refc_mnr[2]; /* min refc btree records */ uint m_alloc_maxlevels; /* max alloc btree levels */ uint m_bm_maxlevels[2]; /* max bmap btree levels */ uint m_rmap_maxlevels; /* max rmap btree levels */ + uint m_rtrmap_maxlevels; /* max rtrmap btree level */ uint m_refc_maxlevels; /* max refc btree levels */ unsigned int m_agbtree_maxlevels; /* max level of all AG btrees */ unsigned int m_rtbtree_maxlevels; /* max level of all rt btrees */ @@ -220,6 +223,12 @@ __XFS_HAS_FEAT(large_extent_counts, NREXT64) __XFS_HAS_FEAT(metadir, METADIR) __XFS_HAS_FEAT(rtgroups, RTGROUPS) +static inline bool xfs_has_rtrmapbt(struct xfs_mount *mp) +{ + return xfs_has_rtgroups(mp) && xfs_has_realtime(mp) && + xfs_has_rmapbt(mp); +} + /* Kernel mount features that we don't support */ #define __XFS_UNSUPP_FEAT(name) \ static inline bool xfs_has_ ## name (struct xfs_mount *mp) \ diff --git a/libxfs/Makefile b/libxfs/Makefile index 1bd8a2ab01d..2c636816082 100644 --- a/libxfs/Makefile +++ b/libxfs/Makefile @@ -58,6 +58,7 @@ HFILES = \ xfs_rmap_btree.h \ xfs_rtbitmap.h \ xfs_rtgroup.h \ + xfs_rtrmap_btree.h \ xfs_sb.h \ xfs_shared.h \ xfs_swapext.h \ @@ -113,6 +114,7 @@ CFILES = cache.c \ xfs_rmap_btree.c \ xfs_rtbitmap.c \ xfs_rtgroup.c \ + xfs_rtrmap_btree.c \ xfs_sb.c \ xfs_swapext.c \ xfs_symlink_remote.c \ diff --git a/libxfs/init.c b/libxfs/init.c index e3d193a9794..4ce0aca9796 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -21,6 +21,7 @@ #include "xfs_trans.h" #include "xfs_rmap_btree.h" #include "xfs_refcount_btree.h" +#include "xfs_imeta.h" #include "libfrog/platform.h" #include "xfile.h" @@ -782,8 +783,7 @@ static inline void xfs_rtbtree_compute_maxlevels( struct xfs_mount *mp) 
{ - /* This will be filled in later. */ - mp->m_rtbtree_maxlevels = 0; + mp->m_rtbtree_maxlevels = mp->m_rtrmap_maxlevels; } /* Compute maximum possible height of all btrees. */ @@ -796,6 +796,7 @@ libxfs_compute_all_maxlevels( xfs_bmap_compute_maxlevels(mp, XFS_ATTR_FORK); xfs_ialloc_setup_geometry(mp); xfs_rmapbt_compute_maxlevels(mp); + xfs_rtrmapbt_compute_maxlevels(mp); xfs_refcountbt_compute_maxlevels(mp); xfs_agbtree_compute_maxlevels(mp); diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c index 63a3ceb4bcb..8c101f55039 100644 --- a/libxfs/xfs_btree.c +++ b/libxfs/xfs_btree.c @@ -29,6 +29,7 @@ #include "xfbtree.h" #include "xfs_btree_mem.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" /* * Btree magic numbers. @@ -1374,6 +1375,7 @@ xfs_btree_set_refs( xfs_buf_set_ref(bp, XFS_BMAP_BTREE_REF); break; case XFS_BTNUM_RMAP: + case XFS_BTNUM_RTRMAP: xfs_buf_set_ref(bp, XFS_RMAP_BTREE_REF); break; case XFS_BTNUM_REFC: @@ -5534,6 +5536,9 @@ xfs_btree_init_cur_caches(void) if (error) goto err; error = xfs_refcountbt_init_cur_cache(); + if (error) + goto err; + error = xfs_rtrmapbt_init_cur_cache(); if (error) goto err; @@ -5552,6 +5557,7 @@ xfs_btree_destroy_cur_caches(void) xfs_bmbt_destroy_cur_cache(); xfs_rmapbt_destroy_cur_cache(); xfs_refcountbt_destroy_cur_cache(); + xfs_rtrmapbt_destroy_cur_cache(); } /* Move the btree cursor before the first record. 
*/ diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index b2d4ef28a48..fb727e1e407 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1734,6 +1734,9 @@ typedef __be32 xfs_rmap_ptr_t; */ #define XFS_RTRMAP_CRC_MAGIC 0x4d415052 /* 'MAPR' */ +/* inode-based btree pointer type */ +typedef __be64 xfs_rtrmap_ptr_t; + /* * Reference Count Btree format definitions * diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c new file mode 100644 index 00000000000..99ccdffc30d --- /dev/null +++ b/libxfs/xfs_rtrmap_btree.c @@ -0,0 +1,304 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#include "libxfs_priv.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_bit.h" +#include "xfs_sb.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_inode.h" +#include "xfs_trans.h" +#include "xfs_alloc.h" +#include "xfs_btree.h" +#include "xfs_btree_staging.h" +#include "xfs_rtrmap_btree.h" +#include "xfs_trace.h" +#include "xfs_cksum.h" +#include "xfs_rtgroup.h" + +static struct kmem_cache *xfs_rtrmapbt_cur_cache; + +/* + * Realtime Reverse Map btree. + * + * This is a btree used to track the owner(s) of a given extent in the realtime + * device. See the comments in xfs_rmap_btree.c for more information. + * + * This tree is basically the same as the regular rmap btree except that it + * is rooted in an inode and does not live in free space. + */ + +static struct xfs_btree_cur * +xfs_rtrmapbt_dup_cursor( + struct xfs_btree_cur *cur) +{ + struct xfs_btree_cur *new; + + new = xfs_rtrmapbt_init_cursor(cur->bc_mp, cur->bc_tp, cur->bc_ino.rtg, + cur->bc_ino.ip); + + /* Copy the flags values since init cursor doesn't get them. 
*/ + new->bc_ino.flags = cur->bc_ino.flags; + + return new; +} + +static xfs_failaddr_t +xfs_rtrmapbt_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_target->bt_mount; + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + xfs_failaddr_t fa; + int level; + + if (!xfs_verify_magic(bp, block->bb_magic)) + return __this_address; + + if (!xfs_has_rmapbt(mp)) + return __this_address; + fa = xfs_btree_lblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN); + if (fa) + return fa; + level = be16_to_cpu(block->bb_level); + if (level > mp->m_rtrmap_maxlevels) + return __this_address; + + return xfs_btree_lblock_verify(bp, mp->m_rtrmap_mxr[level != 0]); +} + +static void +xfs_rtrmapbt_read_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + if (!xfs_btree_lblock_verify_crc(bp)) + xfs_verifier_error(bp, -EFSBADCRC, __this_address); + else { + fa = xfs_rtrmapbt_verify(bp); + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + } + + if (bp->b_error) + trace_xfs_btree_corrupt(bp, _RET_IP_); +} + +static void +xfs_rtrmapbt_write_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + fa = xfs_rtrmapbt_verify(bp); + if (fa) { + trace_xfs_btree_corrupt(bp, _RET_IP_); + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + return; + } + xfs_btree_lblock_calc_crc(bp); + +} + +const struct xfs_buf_ops xfs_rtrmapbt_buf_ops = { + .name = "xfs_rtrmapbt", + .magic = { 0, cpu_to_be32(XFS_RTRMAP_CRC_MAGIC) }, + .verify_read = xfs_rtrmapbt_read_verify, + .verify_write = xfs_rtrmapbt_write_verify, + .verify_struct = xfs_rtrmapbt_verify, +}; + +const struct xfs_btree_ops xfs_rtrmapbt_ops = { + .rec_len = sizeof(struct xfs_rmap_rec), + .key_len = 2 * sizeof(struct xfs_rmap_key), + .geom_flags = XFS_BTREE_LONG_PTRS | XFS_BTREE_ROOT_IN_INODE | + XFS_BTREE_CRC_BLOCKS | XFS_BTREE_OVERLAPPING | + XFS_BTREE_IROOT_RECORDS, + + .dup_cursor = xfs_rtrmapbt_dup_cursor, + .buf_ops = &xfs_rtrmapbt_buf_ops, +}; + +/* Initialize a new rt rmap btree cursor. 
*/ +static struct xfs_btree_cur * +xfs_rtrmapbt_init_common( + struct xfs_mount *mp, + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct xfs_btree_cur *cur; + + ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)); + + cur = xfs_btree_alloc_cursor(mp, tp, XFS_BTNUM_RTRMAP, + &xfs_rtrmapbt_ops, mp->m_rtrmap_maxlevels, + xfs_rtrmapbt_cur_cache); + cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_rmap_2); + + cur->bc_ino.ip = ip; + cur->bc_ino.allocated = 0; + cur->bc_ino.flags = 0; + + cur->bc_ino.rtg = xfs_rtgroup_bump(rtg); + return cur; +} + +/* Allocate a new rt rmap btree cursor. */ +struct xfs_btree_cur * +xfs_rtrmapbt_init_cursor( + struct xfs_mount *mp, + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct xfs_btree_cur *cur; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + + cur = xfs_rtrmapbt_init_common(mp, tp, rtg, ip); + cur->bc_nlevels = be16_to_cpu(ifp->if_broot->bb_level) + 1; + cur->bc_ino.forksize = xfs_inode_fork_size(ip, XFS_DATA_FORK); + cur->bc_ino.whichfork = XFS_DATA_FORK; + return cur; +} + +/* Create a new rt reverse mapping btree cursor with a fake root for staging. */ +struct xfs_btree_cur * +xfs_rtrmapbt_stage_cursor( + struct xfs_mount *mp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip, + struct xbtree_ifakeroot *ifake) +{ + struct xfs_btree_cur *cur; + + cur = xfs_rtrmapbt_init_common(mp, NULL, rtg, ip); + cur->bc_nlevels = ifake->if_levels; + cur->bc_ino.forksize = ifake->if_fork_size; + cur->bc_ino.whichfork = -1; + xfs_btree_stage_ifakeroot(cur, ifake, NULL); + return cur; +} + +/* + * Install a new rt reverse mapping btree root. Caller is responsible for + * invalidating and freeing the old btree blocks. 
+ */ +void +xfs_rtrmapbt_commit_staged_btree( + struct xfs_btree_cur *cur, + struct xfs_trans *tp) +{ + struct xbtree_ifakeroot *ifake = cur->bc_ino.ifake; + struct xfs_ifork *ifp; + int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; + + ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + + /* + * Free any resources hanging off the real fork, then shallow-copy the + * staging fork's contents into the real fork to transfer everything + * we just built. + */ + ifp = xfs_ifork_ptr(cur->bc_ino.ip, XFS_DATA_FORK); + xfs_idestroy_fork(ifp); + memcpy(ifp, ifake->if_fork, sizeof(struct xfs_ifork)); + + xfs_trans_log_inode(tp, cur->bc_ino.ip, flags); + xfs_btree_commit_ifakeroot(cur, tp, XFS_DATA_FORK, &xfs_rtrmapbt_ops); +} + +/* Calculate number of records in a rt reverse mapping btree block. */ +static inline unsigned int +xfs_rtrmapbt_block_maxrecs( + unsigned int blocklen, + bool leaf) +{ + if (leaf) + return blocklen / sizeof(struct xfs_rmap_rec); + return blocklen / + (2 * sizeof(struct xfs_rmap_key) + sizeof(xfs_rtrmap_ptr_t)); +} + +/* + * Calculate number of records in an rt reverse mapping btree block. + */ +unsigned int +xfs_rtrmapbt_maxrecs( + struct xfs_mount *mp, + unsigned int blocklen, + bool leaf) +{ + blocklen -= XFS_RTRMAP_BLOCK_LEN; + return xfs_rtrmapbt_block_maxrecs(blocklen, leaf); +} + +/* Compute the max possible height for realtime reverse mapping btrees. */ +unsigned int +xfs_rtrmapbt_maxlevels_ondisk(void) +{ + unsigned int minrecs[2]; + unsigned int blocklen; + + blocklen = XFS_MIN_CRC_BLOCKSIZE - XFS_BTREE_LBLOCK_CRC_LEN; + + minrecs[0] = xfs_rtrmapbt_block_maxrecs(blocklen, true) / 2; + minrecs[1] = xfs_rtrmapbt_block_maxrecs(blocklen, false) / 2; + + /* We need at most one record for every block in an rt group. 
*/ + return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_RGBLOCKS); +} + +int __init +xfs_rtrmapbt_init_cur_cache(void) +{ + xfs_rtrmapbt_cur_cache = kmem_cache_create("xfs_rtrmapbt_cur", + xfs_btree_cur_sizeof(xfs_rtrmapbt_maxlevels_ondisk()), + 0, 0, NULL); + + if (!xfs_rtrmapbt_cur_cache) + return -ENOMEM; + return 0; +} + +void +xfs_rtrmapbt_destroy_cur_cache(void) +{ + kmem_cache_destroy(xfs_rtrmapbt_cur_cache); + xfs_rtrmapbt_cur_cache = NULL; +} + +/* Compute the maximum height of an rt reverse mapping btree. */ +void +xfs_rtrmapbt_compute_maxlevels( + struct xfs_mount *mp) +{ + unsigned int d_maxlevels, r_maxlevels; + + if (!xfs_has_rtrmapbt(mp)) { + mp->m_rtrmap_maxlevels = 0; + return; + } + + /* + * The realtime rmapbt lives on the data device, which means that its + * maximum height is constrained by the size of the data device and + * the height required to store one rmap record for each block in an + * rt group. + */ + d_maxlevels = xfs_btree_space_to_height(mp->m_rtrmap_mnr, + mp->m_sb.sb_dblocks); + r_maxlevels = xfs_btree_compute_maxlevels(mp->m_rtrmap_mnr, + mp->m_sb.sb_rgblocks); + + /* Add one level to handle the inode root level. */ + mp->m_rtrmap_maxlevels = min(d_maxlevels, r_maxlevels) + 1; +} diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h new file mode 100644 index 00000000000..7380c04e770 --- /dev/null +++ b/libxfs/xfs_rtrmap_btree.h @@ -0,0 +1,83 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +#ifndef __XFS_RTRMAP_BTREE_H__ +#define __XFS_RTRMAP_BTREE_H__ + +struct xfs_buf; +struct xfs_btree_cur; +struct xfs_mount; +struct xbtree_ifakeroot; +struct xfs_rtgroup; + +/* rmaps only exist on crc enabled filesystems */ +#define XFS_RTRMAP_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN + +struct xfs_btree_cur *xfs_rtrmapbt_init_cursor(struct xfs_mount *mp, + struct xfs_trans *tp, struct xfs_rtgroup *rtg, + struct xfs_inode *ip); +struct xfs_btree_cur *xfs_rtrmapbt_stage_cursor(struct xfs_mount *mp, + struct xfs_rtgroup *rtg, struct xfs_inode *ip, + struct xbtree_ifakeroot *ifake); +void xfs_rtrmapbt_commit_staged_btree(struct xfs_btree_cur *cur, + struct xfs_trans *tp); +unsigned int xfs_rtrmapbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, + bool leaf); +void xfs_rtrmapbt_compute_maxlevels(struct xfs_mount *mp); + +/* + * Addresses of records, keys, and pointers within an incore rtrmapbt block. + * + * (note that some of these may appear unused, but they are used in userspace) + */ +static inline struct xfs_rmap_rec * +xfs_rtrmap_rec_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_rmap_rec *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + (index - 1) * sizeof(struct xfs_rmap_rec)); +} + +static inline struct xfs_rmap_key * +xfs_rtrmap_key_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_rmap_key *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + (index - 1) * 2 * sizeof(struct xfs_rmap_key)); +} + +static inline struct xfs_rmap_key * +xfs_rtrmap_high_key_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_rmap_key *) + ((char *)block + XFS_RTRMAP_BLOCK_LEN + + sizeof(struct xfs_rmap_key) + + (index - 1) * 2 * sizeof(struct xfs_rmap_key)); +} + +static inline xfs_rtrmap_ptr_t * +xfs_rtrmap_ptr_addr( + struct xfs_btree_block *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrmap_ptr_t *) + ((char *)block + 
XFS_RTRMAP_BLOCK_LEN + + maxrecs * 2 * sizeof(struct xfs_rmap_key) + + (index - 1) * sizeof(xfs_rtrmap_ptr_t)); +} + +unsigned int xfs_rtrmapbt_maxlevels_ondisk(void); + +int __init xfs_rtrmapbt_init_cur_cache(void); +void xfs_rtrmapbt_destroy_cur_cache(void); + +#endif /* __XFS_RTRMAP_BTREE_H__ */ diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c index c9cb21ea712..816a3915ae6 100644 --- a/libxfs/xfs_sb.c +++ b/libxfs/xfs_sb.c @@ -25,6 +25,7 @@ #include "xfs_ag.h" #include "xfs_swapext.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" /* * Physical superblock buffer manipulations. Shared with libxfs in userspace. @@ -1062,6 +1063,11 @@ xfs_sb_mount_common( mp->m_rmap_mnr[0] = mp->m_rmap_mxr[0] / 2; mp->m_rmap_mnr[1] = mp->m_rmap_mxr[1] / 2; + mp->m_rtrmap_mxr[0] = xfs_rtrmapbt_maxrecs(mp, sbp->sb_blocksize, true); + mp->m_rtrmap_mxr[1] = xfs_rtrmapbt_maxrecs(mp, sbp->sb_blocksize, false); + mp->m_rtrmap_mnr[0] = mp->m_rtrmap_mxr[0] / 2; + mp->m_rtrmap_mnr[1] = mp->m_rtrmap_mxr[1] / 2; + mp->m_refc_mxr[0] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, true); mp->m_refc_mxr[1] = xfs_refcountbt_maxrecs(mp, sbp->sb_blocksize, false); mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2; diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h index 62839fc87b5..31c577a9429 100644 --- a/libxfs/xfs_shared.h +++ b/libxfs/xfs_shared.h @@ -42,6 +42,7 @@ extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops; extern const struct xfs_buf_ops xfs_rtsummary_buf_ops; extern const struct xfs_buf_ops xfs_rtbuf_ops; extern const struct xfs_buf_ops xfs_rtsb_buf_ops; +extern const struct xfs_buf_ops xfs_rtrmapbt_buf_ops; extern const struct xfs_buf_ops xfs_sb_buf_ops; extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops; extern const struct xfs_buf_ops xfs_symlink_buf_ops; @@ -54,6 +55,7 @@ extern const struct xfs_btree_ops xfs_finobt_ops; extern const struct xfs_btree_ops xfs_bmbt_ops; extern const struct xfs_btree_ops xfs_refcountbt_ops; extern const struct xfs_btree_ops xfs_rmapbt_ops; 
+extern const struct xfs_btree_ops xfs_rtrmapbt_ops; /* log size calculation functions */ int xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes); ^ permalink raw reply related [flat|nested] 565+ messages in thread
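For readers following the geometry math in the patch above, here is a minimal standalone sketch of how the per-block record counts and the node pointer offset fall out of the block size. The byte sizes below (72-byte long-format CRC header, 24-byte record, 20-byte key, 8-byte pointer) are assumptions that the patch does not state, and every sketch_ name is hypothetical and not part of libxfs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed on-disk sizes in bytes; none of these are spelled out in the patch. */
#define SKETCH_LBLOCK_CRC_LEN	72u	/* long-format CRC btree block header */
#define SKETCH_RMAP_REC_LEN	24u	/* startblock + blockcount + owner + offset */
#define SKETCH_RMAP_KEY_LEN	20u	/* startblock + owner + offset, packed */
#define SKETCH_RTRMAP_PTR_LEN	8u	/* 64-bit block pointer */

/* Records (leaf) or low/high key pair plus pointer slots (node) in blocklen bytes. */
static unsigned int
sketch_block_maxrecs(unsigned int blocklen, bool leaf)
{
	if (leaf)
		return blocklen / SKETCH_RMAP_REC_LEN;
	return blocklen / (2 * SKETCH_RMAP_KEY_LEN + SKETCH_RTRMAP_PTR_LEN);
}

/* Same computation for a whole filesystem block, after removing the header. */
static unsigned int
sketch_maxrecs(unsigned int blocksize, bool leaf)
{
	assert(blocksize > SKETCH_LBLOCK_CRC_LEN);
	return sketch_block_maxrecs(blocksize - SKETCH_LBLOCK_CRC_LEN, leaf);
}

/* Byte offset of the 1-based pointer slot `index` within a node block. */
static size_t
sketch_ptr_offset(unsigned int index, unsigned int maxrecs)
{
	assert(index >= 1 && index <= maxrecs);
	return SKETCH_LBLOCK_CRC_LEN +
	       (size_t)maxrecs * 2 * SKETCH_RMAP_KEY_LEN +
	       (size_t)(index - 1) * SKETCH_RTRMAP_PTR_LEN;
}
```

Under those assumed sizes, the pointer array starts only after room for maxrecs key pairs, which is why the pointer helper in the header takes maxrecs as an argument while the record and key helpers do not.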
* [PATCH 02/41] xfs: introduce realtime rmap btree definitions 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 03/41] xfs: define the on-disk realtime rmap btree format Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 06/41] xfs: prepare rmap functions to deal with rtrmapbt Darrick J. Wong ` (36 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add new realtime rmap btree definitions. The realtime rmap btree will be rooted in a hidden inode, but it has its own shape and therefore needs most of its own separate types. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_btree.h | 1 + libxfs/xfs_format.h | 7 +++++++ libxfs/xfs_types.h | 5 +++-- 3 files changed, 11 insertions(+), 2 deletions(-) diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h index 6ded5af94b3..72fd341eccd 100644 --- a/libxfs/xfs_btree.h +++ b/libxfs/xfs_btree.h @@ -64,6 +64,7 @@ union xfs_btree_rec { #define XFS_BTNUM_RMAP ((xfs_btnum_t)XFS_BTNUM_RMAPi) #define XFS_BTNUM_REFC ((xfs_btnum_t)XFS_BTNUM_REFCi) #define XFS_BTNUM_RCBAG ((xfs_btnum_t)XFS_BTNUM_RCBAGi) +#define XFS_BTNUM_RTRMAP ((xfs_btnum_t)XFS_BTNUM_RTRMAPi) struct xfs_btree_ops; uint32_t xfs_btree_magic(struct xfs_mount *mp, const struct xfs_btree_ops *ops); diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index e4f3b2c5c05..b2d4ef28a48 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1727,6 +1727,13 @@ typedef __be32 xfs_rmap_ptr_t; XFS_FIBT_BLOCK(mp) + 1 : \ XFS_IBT_BLOCK(mp) + 1) +/* + * Realtime Reverse mapping btree format definitions + * + * This is a btree for reverse mapping records for realtime volumes + */ +#define XFS_RTRMAP_CRC_MAGIC 0x4d415052 /* 'MAPR' */ + /* * Reference Count Btree format
definitions * diff --git a/libxfs/xfs_types.h b/libxfs/xfs_types.h index d37f8a7ce5f..e6a4f4a7d00 100644 --- a/libxfs/xfs_types.h +++ b/libxfs/xfs_types.h @@ -126,7 +126,7 @@ typedef enum { typedef enum { XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi, XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_REFCi, XFS_BTNUM_RCBAGi, - XFS_BTNUM_MAX + XFS_BTNUM_RTRMAPi, XFS_BTNUM_MAX } xfs_btnum_t; #define XFS_BTNUM_STRINGS \ @@ -137,7 +137,8 @@ typedef enum { { XFS_BTNUM_INOi, "inobt" }, \ { XFS_BTNUM_FINOi, "finobt" }, \ { XFS_BTNUM_REFCi, "refcbt" }, \ - { XFS_BTNUM_RCBAGi, "rcbagbt" } + { XFS_BTNUM_RCBAGi, "rcbagbt" }, \ + { XFS_BTNUM_RTRMAPi, "rtrmapbt" } struct xfs_name { const unsigned char *name; ^ permalink raw reply related [flat|nested] 565+ messages in thread
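As a quick illustration of the new magic number from the patch above: read most significant byte first, 0x4d415052 spells 'MAPR', which is what the comment next to the define says. The sketch below decodes the value and compares it against four bytes in big-endian order. The sketch_ helpers are hypothetical and stand in for the kernel's own cpu_to_be32()/xfs_verify_magic() machinery, which is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_RTRMAP_CRC_MAGIC	0x4d415052u	/* value taken from the patch */

/* Write the four magic bytes, most significant first, as a C string. */
static void
sketch_magic_to_str(uint32_t magic, char out[5])
{
	int i;

	assert(out);
	for (i = 0; i < 4; i++)
		out[i] = (char)((magic >> (24 - 8 * i)) & 0xff);
	out[4] = '\0';
}

/* Compare four big-endian bytes, as they would sit on disk, to the magic. */
static bool
sketch_magic_matches(const uint8_t disk[4])
{
	uint32_t v;

	assert(disk);
	v = ((uint32_t)disk[0] << 24) | ((uint32_t)disk[1] << 16) |
	    ((uint32_t)disk[2] << 8) | (uint32_t)disk[3];
	return v == SKETCH_RTRMAP_CRC_MAGIC;
}
```

Assembling the value byte by byte keeps the sketch independent of host endianness, which is the same property the on-disk __be32 type gives the real code.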
* [PATCH 06/41] xfs: prepare rmap functions to deal with rtrmapbt 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 02/41] xfs: introduce realtime rmap btree definitions Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 07/41] xfs: add a realtime flag to the rmap update log redo items Darrick J. Wong ` (35 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Prepare the high-level rmap functions to deal with the new realtime rmapbt and its slightly different conventions. Provide the ability to talk to either rmapbt or rtrmapbt formats from the same high level code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rmap.c | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 66 insertions(+) diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c index bce30dd66d6..be611b54a6c 100644 --- a/libxfs/xfs_rmap.c +++ b/libxfs/xfs_rmap.c @@ -23,6 +23,7 @@ #include "xfs_inode.h" #include "xfs_ag.h" #include "xfs_health.h" +#include "xfs_rtgroup.h" struct kmem_cache *xfs_rmap_intent_cache; @@ -261,12 +262,73 @@ xfs_rmap_check_perag_irec( return NULL; } +static inline xfs_failaddr_t +xfs_rmap_check_rtgroup_irec( + struct xfs_rtgroup *rtg, + const struct xfs_rmap_irec *irec) +{ + struct xfs_mount *mp = rtg->rtg_mount; + bool is_inode; + bool is_unwritten; + bool is_bmbt; + bool is_attr; + + if (irec->rm_blockcount == 0) + return __this_address; + + if (irec->rm_owner == XFS_RMAP_OWN_FS) { + if (irec->rm_startblock != 0) + return __this_address; + if (irec->rm_blockcount != mp->m_sb.sb_rextsize) + return __this_address; + if (irec->rm_offset != 0) + return __this_address; + } else { + if (!xfs_verify_rgbext(rtg, irec->rm_startblock, + irec->rm_blockcount)) + return 
__this_address; + } + + if (!(xfs_verify_ino(mp, irec->rm_owner) || + (irec->rm_owner <= XFS_RMAP_OWN_FS && + irec->rm_owner >= XFS_RMAP_OWN_MIN))) + return __this_address; + + /* Check flags. */ + is_inode = !XFS_RMAP_NON_INODE_OWNER(irec->rm_owner); + is_bmbt = irec->rm_flags & XFS_RMAP_BMBT_BLOCK; + is_attr = irec->rm_flags & XFS_RMAP_ATTR_FORK; + is_unwritten = irec->rm_flags & XFS_RMAP_UNWRITTEN; + + if (!is_inode && irec->rm_owner != XFS_RMAP_OWN_FS) + return __this_address; + + if (!is_inode && irec->rm_offset != 0) + return __this_address; + + if (is_bmbt || is_attr) + return __this_address; + + if (is_unwritten && !is_inode) + return __this_address; + + /* Check for a valid fork offset, if applicable. */ + if (is_inode && + !xfs_verify_fileext(mp, irec->rm_offset, irec->rm_blockcount)) + return __this_address; + + return NULL; +} + /* Simple checks for rmap records. */ xfs_failaddr_t xfs_rmap_check_irec( struct xfs_btree_cur *cur, const struct xfs_rmap_irec *irec) { + if (cur->bc_btnum == XFS_BTNUM_RTRMAP) + return xfs_rmap_check_rtgroup_irec(cur->bc_ino.rtg, irec); + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) return xfs_rmap_check_perag_irec(cur->bc_mem.pag, irec); return xfs_rmap_check_perag_irec(cur->bc_ag.pag, irec); @@ -283,6 +345,10 @@ xfs_rmap_complain_bad_rec( if (cur->bc_flags & XFS_BTREE_IN_MEMORY) xfs_warn(mp, "In-Memory Reverse Mapping BTree record corruption detected at %pS!", fa); + else if (cur->bc_btnum == XFS_BTNUM_RTRMAP) + xfs_warn(mp, + "RT Reverse Mapping BTree record corruption in rtgroup %u detected at %pS!", + cur->bc_ino.rtg->rtg_rgno, fa); else xfs_warn(mp, "Reverse Mapping BTree record corruption in AG %d detected at %pS!", ^ permalink raw reply related [flat|nested] 565+ messages in thread
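The record checks in the patch above are easier to follow as a decision list, so here is a simplified standalone sketch of the same ordering. It is only an approximation: the range checks done by xfs_verify_ino(), xfs_verify_rgbext() and xfs_verify_fileext() are left out, the flag bit values and the owner code are placeholders rather than the real XFS constants, and treating an owner with the top bit set as a non-inode owner is an assumption about how XFS_RMAP_NON_INODE_OWNER behaves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real definitions live in the XFS headers. */
#define SKETCH_OWN_FS		((uint64_t)-3)	/* assumed special owner code */
#define SKETCH_FLAG_ATTR_FORK	(1u << 0)	/* placeholder bit values */
#define SKETCH_FLAG_BMBT_BLOCK	(1u << 1)
#define SKETCH_FLAG_UNWRITTEN	(1u << 2)

struct sketch_rmap {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint64_t	owner;
	uint64_t	offset;
	unsigned int	flags;
};

/* Return true if the record passes the simplified realtime rmap checks. */
static bool
sketch_rt_rmap_ok(const struct sketch_rmap *r, uint32_t rextsize)
{
	bool is_inode;

	assert(r);
	if (r->blockcount == 0)
		return false;

	/* The OWN_FS record must cover exactly the first rt extent. */
	if (r->owner == SKETCH_OWN_FS &&
	    (r->startblock != 0 || r->blockcount != rextsize || r->offset != 0))
		return false;

	/* Assumption: special owners are the ones that look negative. */
	is_inode = (int64_t)r->owner >= 0;

	/* OWN_FS is the only non-inode owner allowed on the rt volume. */
	if (!is_inode && r->owner != SKETCH_OWN_FS)
		return false;
	if (!is_inode && r->offset != 0)
		return false;

	/* The rt volume holds only file data: no bmbt blocks, no attr forks. */
	if (r->flags & (SKETCH_FLAG_BMBT_BLOCK | SKETCH_FLAG_ATTR_FORK))
		return false;

	/* Only file mappings can be unwritten. */
	if ((r->flags & SKETCH_FLAG_UNWRITTEN) && !is_inode)
		return false;

	return true;
}
```

The notable difference from the per-AG checker is that the bmbt and attr fork flags are rejected outright, since neither kind of block can live on the realtime device.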
* [PATCH 07/41] xfs: add a realtime flag to the rmap update log redo items 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 06/41] xfs: prepare rmap functions to deal with rtrmapbt Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 05/41] xfs: add realtime rmap btree operations Darrick J. Wong ` (34 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Extend the rmap update (RUI) log items with a new realtime flag that indicates that the updates apply against the realtime rmapbt. We'll wire up the actual rmap code later. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/defer_item.c | 19 +++++++++++++++++++ libxfs/xfs_defer.c | 1 + libxfs/xfs_defer.h | 1 + libxfs/xfs_log_format.h | 4 +++- libxfs/xfs_refcount.c | 4 ++-- libxfs/xfs_rmap.c | 38 ++++++++++++++++++++++++++++++++------ libxfs/xfs_rmap.h | 10 +++++++--- 7 files changed, 65 insertions(+), 12 deletions(-) diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c index 99899e6b617..9a4196f7cc0 100644 --- a/libxfs/defer_item.c +++ b/libxfs/defer_item.c @@ -26,6 +26,7 @@ #include "libxfs.h" #include "xfs_ag.h" #include "xfs_swapext.h" +#include "xfs_rtgroup.h" /* Dummy defer item ops, since we don't do logging. 
*/ @@ -228,6 +229,11 @@ xfs_rmap_update_diff_items( ra = container_of(a, struct xfs_rmap_intent, ri_list); rb = container_of(b, struct xfs_rmap_intent, ri_list); + ASSERT(ra->ri_realtime == rb->ri_realtime); + + if (ra->ri_realtime) + return ra->ri_rtg->rtg_rgno - rb->ri_rtg->rtg_rgno; + return ra->ri_pag->pag_agno - rb->ri_pag->pag_agno; } @@ -264,6 +270,14 @@ xfs_rmap_update_get_group( { xfs_agnumber_t agno; + if (ri->ri_realtime) { + xfs_rgnumber_t rgno; + + rgno = xfs_rtb_to_rgno(mp, ri->ri_bmap.br_startblock); + ri->ri_rtg = xfs_rtgroup_get(mp, rgno); + return; + } + agno = XFS_FSB_TO_AGNO(mp, ri->ri_bmap.br_startblock); ri->ri_pag = xfs_perag_get(mp, agno); xfs_perag_bump_intents(ri->ri_pag); @@ -274,6 +288,11 @@ static inline void xfs_rmap_update_put_group( struct xfs_rmap_intent *ri) { + if (ri->ri_realtime) { + xfs_rtgroup_put(ri->ri_rtg); + return; + } + xfs_perag_drop_intents(ri->ri_pag); xfs_perag_put(ri->ri_pag); } diff --git a/libxfs/xfs_defer.c b/libxfs/xfs_defer.c index c148ed38eb0..9dbab9ac955 100644 --- a/libxfs/xfs_defer.c +++ b/libxfs/xfs_defer.c @@ -182,6 +182,7 @@ static const struct xfs_defer_op_type *defer_op_types[] = { [XFS_DEFER_OPS_TYPE_BMAP] = &xfs_bmap_update_defer_type, [XFS_DEFER_OPS_TYPE_REFCOUNT] = &xfs_refcount_update_defer_type, [XFS_DEFER_OPS_TYPE_RMAP] = &xfs_rmap_update_defer_type, + [XFS_DEFER_OPS_TYPE_RMAP_RT] = &xfs_rmap_update_defer_type, [XFS_DEFER_OPS_TYPE_FREE] = &xfs_extent_free_defer_type, [XFS_DEFER_OPS_TYPE_FREE_RT] = &xfs_extent_free_defer_type, [XFS_DEFER_OPS_TYPE_AGFL_FREE] = &xfs_agfl_free_defer_type, diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h index 52198c7124c..89c279185ce 100644 --- a/libxfs/xfs_defer.h +++ b/libxfs/xfs_defer.h @@ -17,6 +17,7 @@ enum xfs_defer_ops_type { XFS_DEFER_OPS_TYPE_BMAP, XFS_DEFER_OPS_TYPE_REFCOUNT, XFS_DEFER_OPS_TYPE_RMAP, + XFS_DEFER_OPS_TYPE_RMAP_RT, XFS_DEFER_OPS_TYPE_FREE, XFS_DEFER_OPS_TYPE_AGFL_FREE, XFS_DEFER_OPS_TYPE_FREE_RT, diff --git a/libxfs/xfs_log_format.h 
b/libxfs/xfs_log_format.h index f3c8257a754..3a23282d6e6 100644 --- a/libxfs/xfs_log_format.h +++ b/libxfs/xfs_log_format.h @@ -746,11 +746,13 @@ struct xfs_map_extent { #define XFS_RMAP_EXTENT_ATTR_FORK (1U << 31) #define XFS_RMAP_EXTENT_BMBT_BLOCK (1U << 30) #define XFS_RMAP_EXTENT_UNWRITTEN (1U << 29) +#define XFS_RMAP_EXTENT_REALTIME (1U << 28) #define XFS_RMAP_EXTENT_FLAGS (XFS_RMAP_EXTENT_TYPE_MASK | \ XFS_RMAP_EXTENT_ATTR_FORK | \ XFS_RMAP_EXTENT_BMBT_BLOCK | \ - XFS_RMAP_EXTENT_UNWRITTEN) + XFS_RMAP_EXTENT_UNWRITTEN | \ + XFS_RMAP_EXTENT_REALTIME) /* * This is the structure used to lay out an rui log item in the diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c index 5cb2132d9ac..1590bb285ed 100644 --- a/libxfs/xfs_refcount.c +++ b/libxfs/xfs_refcount.c @@ -1888,7 +1888,7 @@ xfs_refcount_alloc_cow_extent( __xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len); /* Add rmap entry */ - xfs_rmap_alloc_extent(tp, fsb, len, XFS_RMAP_OWN_COW); + xfs_rmap_alloc_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW); } /* Forget a CoW staging event in the refcount btree. 
*/ @@ -1904,7 +1904,7 @@ xfs_refcount_free_cow_extent( return; /* Remove rmap entry */ - xfs_rmap_free_extent(tp, fsb, len, XFS_RMAP_OWN_COW); + xfs_rmap_free_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW); __xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len); } diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c index be611b54a6c..3700d702631 100644 --- a/libxfs/xfs_rmap.c +++ b/libxfs/xfs_rmap.c @@ -2653,6 +2653,12 @@ xfs_rmap_finish_one( xfs_agblock_t bno; bool unwritten; + if (ri->ri_realtime) { + /* coming in a subsequent patch */ + ASSERT(0); + return -EFSCORRUPTED; + } + bno = XFS_FSB_TO_AGBNO(mp, ri->ri_bmap.br_startblock); trace_xfs_rmap_deferred(mp, ri); @@ -2725,10 +2731,12 @@ __xfs_rmap_add( struct xfs_trans *tp, enum xfs_rmap_intent_type type, uint64_t owner, + bool isrt, int whichfork, struct xfs_bmbt_irec *bmap) { struct xfs_rmap_intent *ri; + enum xfs_defer_ops_type optype; ri = kmem_cache_alloc(xfs_rmap_intent_cache, GFP_NOFS | __GFP_NOFAIL); INIT_LIST_HEAD(&ri->ri_list); @@ -2736,11 +2744,24 @@ __xfs_rmap_add( ri->ri_owner = owner; ri->ri_whichfork = whichfork; ri->ri_bmap = *bmap; + ri->ri_realtime = isrt; + + /* + * Deferred rmap updates for the realtime and data sections must use + * separate transactions to finish deferred work because updates to + * realtime metadata files can lock AGFs to allocate btree blocks and + * we don't want that mixing with the AGF locks taken to finish data + * section updates. + */ + if (isrt) + optype = XFS_DEFER_OPS_TYPE_RMAP_RT; + else + optype = XFS_DEFER_OPS_TYPE_RMAP; trace_xfs_rmap_defer(tp->t_mountp, ri); xfs_rmap_update_get_group(tp->t_mountp, ri); - xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_RMAP, &ri->ri_list); + xfs_defer_add(tp, optype, &ri->ri_list); } /* Map an extent into a file. 
*/ @@ -2752,6 +2773,7 @@ xfs_rmap_map_extent( struct xfs_bmbt_irec *PREV) { enum xfs_rmap_intent_type type = XFS_RMAP_MAP; + bool isrt = xfs_ifork_is_realtime(ip, whichfork); if (!xfs_rmap_update_is_needed(tp->t_mountp, whichfork)) return; @@ -2759,7 +2781,7 @@ xfs_rmap_map_extent( if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip)) type = XFS_RMAP_MAP_SHARED; - __xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV); + __xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV); } /* Unmap an extent out of a file. */ @@ -2771,6 +2793,7 @@ xfs_rmap_unmap_extent( struct xfs_bmbt_irec *PREV) { enum xfs_rmap_intent_type type = XFS_RMAP_UNMAP; + bool isrt = xfs_ifork_is_realtime(ip, whichfork); if (!xfs_rmap_update_is_needed(tp->t_mountp, whichfork)) return; @@ -2778,7 +2801,7 @@ xfs_rmap_unmap_extent( if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip)) type = XFS_RMAP_UNMAP_SHARED; - __xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV); + __xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV); } /* @@ -2796,6 +2819,7 @@ xfs_rmap_convert_extent( struct xfs_bmbt_irec *PREV) { enum xfs_rmap_intent_type type = XFS_RMAP_CONVERT; + bool isrt = xfs_ifork_is_realtime(ip, whichfork); if (!xfs_rmap_update_is_needed(mp, whichfork)) return; @@ -2803,13 +2827,14 @@ xfs_rmap_convert_extent( if (whichfork != XFS_ATTR_FORK && xfs_is_reflink_inode(ip)) type = XFS_RMAP_CONVERT_SHARED; - __xfs_rmap_add(tp, type, ip->i_ino, whichfork, PREV); + __xfs_rmap_add(tp, type, ip->i_ino, isrt, whichfork, PREV); } /* Schedule the creation of an rmap for non-file data. */ void xfs_rmap_alloc_extent( struct xfs_trans *tp, + bool isrt, xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner) @@ -2824,13 +2849,14 @@ xfs_rmap_alloc_extent( bmap.br_startoff = 0; bmap.br_state = XFS_EXT_NORM; - __xfs_rmap_add(tp, XFS_RMAP_ALLOC, owner, XFS_DATA_FORK, &bmap); + __xfs_rmap_add(tp, XFS_RMAP_ALLOC, owner, isrt, XFS_DATA_FORK, &bmap); } /* Schedule the deletion of an rmap for non-file data. 
*/ void xfs_rmap_free_extent( struct xfs_trans *tp, + bool isrt, xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner) @@ -2845,7 +2871,7 @@ xfs_rmap_free_extent( bmap.br_startoff = 0; bmap.br_state = XFS_EXT_NORM; - __xfs_rmap_add(tp, XFS_RMAP_FREE, owner, XFS_DATA_FORK, &bmap); + __xfs_rmap_add(tp, XFS_RMAP_FREE, owner, isrt, XFS_DATA_FORK, &bmap); } /* Compare rmap records. Returns -1 if a < b, 1 if a > b, and 0 if equal. */ diff --git a/libxfs/xfs_rmap.h b/libxfs/xfs_rmap.h index 54c969731cf..e98f37c39f2 100644 --- a/libxfs/xfs_rmap.h +++ b/libxfs/xfs_rmap.h @@ -173,7 +173,11 @@ struct xfs_rmap_intent { int ri_whichfork; uint64_t ri_owner; struct xfs_bmbt_irec ri_bmap; - struct xfs_perag *ri_pag; + union { + struct xfs_perag *ri_pag; + struct xfs_rtgroup *ri_rtg; + }; + bool ri_realtime; }; void xfs_rmap_update_get_group(struct xfs_mount *mp, @@ -187,9 +191,9 @@ void xfs_rmap_unmap_extent(struct xfs_trans *tp, struct xfs_inode *ip, void xfs_rmap_convert_extent(struct xfs_mount *mp, struct xfs_trans *tp, struct xfs_inode *ip, int whichfork, struct xfs_bmbt_irec *imap); -void xfs_rmap_alloc_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno, +void xfs_rmap_alloc_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner); -void xfs_rmap_free_extent(struct xfs_trans *tp, xfs_fsblock_t fsbno, +void xfs_rmap_free_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsbno, xfs_extlen_t len, uint64_t owner); void xfs_rmap_finish_one_cleanup(struct xfs_trans *tp, ^ permalink raw reply related [flat|nested] 565+ messages in thread
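To make the flag plumbing in the patch above concrete, here is a small standalone sketch of how the new realtime bit fits into the RUI extent flags and how the deferred-op type follows from isrt. The four high bit values are copied from the patch; the 0xFF type mask and every sketch_ name are assumptions for illustration. The group comparator uses an explicit three-way comparison instead of the subtraction used in the patch, purely to keep the sketch's return value obviously well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as they appear in the patch. */
#define SKETCH_EXTENT_ATTR_FORK		(1U << 31)
#define SKETCH_EXTENT_BMBT_BLOCK	(1U << 30)
#define SKETCH_EXTENT_UNWRITTEN		(1U << 29)
#define SKETCH_EXTENT_REALTIME		(1U << 28)
/* Assumption: the low byte carries the intent type. */
#define SKETCH_EXTENT_TYPE_MASK		0xFFu

#define SKETCH_EXTENT_FLAGS	(SKETCH_EXTENT_TYPE_MASK | \
				 SKETCH_EXTENT_ATTR_FORK | \
				 SKETCH_EXTENT_BMBT_BLOCK | \
				 SKETCH_EXTENT_UNWRITTEN | \
				 SKETCH_EXTENT_REALTIME)

enum sketch_defer_type {
	SKETCH_DEFER_RMAP,
	SKETCH_DEFER_RMAP_RT,
};

/* Build a flags word from an intent type plus the unwritten and rt bits. */
static uint32_t
sketch_encode_flags(unsigned int type, bool unwritten, bool isrt)
{
	uint32_t flags = type;

	assert(type <= SKETCH_EXTENT_TYPE_MASK);
	if (unwritten)
		flags |= SKETCH_EXTENT_UNWRITTEN;
	if (isrt)
		flags |= SKETCH_EXTENT_REALTIME;
	return flags;
}

/* A flags word is acceptable only if no unknown bits are set. */
static bool
sketch_flags_valid(uint32_t flags)
{
	return (flags & ~SKETCH_EXTENT_FLAGS) == 0;
}

/* Realtime and data-section updates are queued under separate op types. */
static enum sketch_defer_type
sketch_pick_optype(bool isrt)
{
	return isrt ? SKETCH_DEFER_RMAP_RT : SKETCH_DEFER_RMAP;
}

/* Order two intents of the same kind by their group number. */
static int
sketch_cmp_group(uint32_t a, uint32_t b)
{
	return (a > b) - (a < b);
}
```

Keeping the two op types separate mirrors the reasoning in the patch comment: realtime metadata updates may take AGF locks to allocate btree blocks, and those should not be interleaved with the AGF locks taken for data-section updates in the same batch of deferred work.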
* [PATCH 05/41] xfs: add realtime rmap btree operations 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 07/41] xfs: add a realtime flag to the rmap update log redo items Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 12/41] xfs: wire up rmap map and unmap to the realtime rmapbt Darrick J. Wong ` (33 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Implement the generic btree operations needed to manipulate rtrmap btree blocks. This is different from the regular rmapbt in that we allocate space from the filesystem at large, and are neither constrained to the free space nor any particular AG. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_btree.c | 112 +++++++++++++++++++ libxfs/xfs_btree.h | 5 + libxfs/xfs_imeta.c | 6 + libxfs/xfs_rtrmap_btree.c | 271 +++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 394 insertions(+) diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c index 8c101f55039..a89a05555b8 100644 --- a/libxfs/xfs_btree.c +++ b/libxfs/xfs_btree.c @@ -30,6 +30,9 @@ #include "xfs_btree_mem.h" #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_bmap.h" +#include "xfs_rmap.h" +#include "xfs_imeta.h" /* * Btree magic numbers. @@ -5586,3 +5589,112 @@ xfs_btree_goto_left_edge( return 0; } + +/* Allocate a block for an inode-rooted metadata btree. 
*/ +int +xfs_btree_alloc_imeta_block( + struct xfs_btree_cur *cur, + const union xfs_btree_ptr *start, + union xfs_btree_ptr *new, + int *stat) +{ + struct xfs_alloc_arg args = { + .mp = cur->bc_mp, + .tp = cur->bc_tp + }; + struct xfs_inode *ip = cur->bc_ino.ip; + struct xfs_trans *tp = cur->bc_tp; + int error; + + ASSERT(!XFS_NOT_DQATTACHED(cur->bc_mp, ip)); + + args.fsbno = tp->t_firstblock; + args.resv = XFS_AG_RESV_IMETA; + xfs_rmap_ino_bmbt_owner(&args.oinfo, ip->i_ino, cur->bc_ino.whichfork); + + if (args.fsbno == NULLFSBLOCK) { + args.fsbno = be64_to_cpu(start->l); + args.type = XFS_ALLOCTYPE_START_BNO; + /* + * Make sure there is sufficient room left in the AG to + * complete a full tree split for an extent insert. If + * we are converting the middle part of an extent then + * we may need space for two tree splits. + * + * We are relying on the caller to make the correct block + * reservation for this operation to succeed. If the + * reservation amount is insufficient then we may fail a + * block allocation here and corrupt the filesystem. + */ + args.minleft = tp->t_blk_res; + } else if (tp->t_flags & XFS_TRANS_LOWMODE) { + args.type = XFS_ALLOCTYPE_START_BNO; + } else { + args.type = XFS_ALLOCTYPE_NEAR_BNO; + } + + args.minlen = args.maxlen = args.prod = 1; + error = xfs_alloc_vextent(&args); + if (error) + goto error0; + + if (args.fsbno == NULLFSBLOCK && args.minleft) { + /* + * Could not find an AG with enough free space to satisfy + * a full btree split. Try again without minleft and if + * successful activate the lowspace algorithm. 
+ */ + args.fsbno = 0; + args.type = XFS_ALLOCTYPE_FIRST_AG; + args.minleft = 0; + error = xfs_alloc_vextent(&args); + if (error) + goto error0; + tp->t_flags |= XFS_TRANS_LOWMODE; + } + if (args.fsbno == NULLFSBLOCK) { + *stat = 0; + return 0; + } + ASSERT(args.len == 1); + + xfs_imeta_resv_alloc_extent(ip, &args); + cur->bc_ino.allocated++; + + new->l = cpu_to_be64(args.fsbno); + *stat = 1; + return 0; + + error0: + return error; +} + +/* Free a block from an inode-rooted metadata btree. */ +int +xfs_btree_free_imeta_block( + struct xfs_btree_cur *cur, + struct xfs_buf *bp) +{ + struct xfs_owner_info oinfo; + struct xfs_mount *mp = cur->bc_mp; + struct xfs_inode *ip = cur->bc_ino.ip; + struct xfs_trans *tp = cur->bc_tp; + struct xfs_perag *pag; + xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, xfs_buf_daddr(bp)); + xfs_agnumber_t agno = XFS_FSB_TO_AGNO(mp, fsbno); + xfs_agblock_t agbno = XFS_FSB_TO_AGBNO(mp, fsbno); + int error; + + ASSERT(!XFS_NOT_DQATTACHED(mp, ip)); + + xfs_rmap_ino_bmbt_owner(&oinfo, ip->i_ino, cur->bc_ino.whichfork); + pag = xfs_perag_get(mp, agno); + error = __xfs_free_extent(tp, pag, agbno, 1, &oinfo, XFS_AG_RESV_IMETA, + false); + xfs_perag_put(pag); + if (error) + return error; + + xfs_imeta_resv_free_extent(ip, tp, 1); + return 0; +} diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h index 72fd341eccd..9f7b6fc5439 100644 --- a/libxfs/xfs_btree.h +++ b/libxfs/xfs_btree.h @@ -762,4 +762,9 @@ void xfs_btree_destroy_cur_caches(void); int xfs_btree_goto_left_edge(struct xfs_btree_cur *cur); +int xfs_btree_alloc_imeta_block(struct xfs_btree_cur *cur, + const union xfs_btree_ptr *start, union xfs_btree_ptr *newp, + int *stat); +int xfs_btree_free_imeta_block(struct xfs_btree_cur *cur, struct xfs_buf *bp); + #endif /* __XFS_BTREE_H__ */ diff --git a/libxfs/xfs_imeta.c b/libxfs/xfs_imeta.c index 5429082bc03..5406c81217f 100644 --- a/libxfs/xfs_imeta.c +++ b/libxfs/xfs_imeta.c @@ -1301,6 +1301,9 @@ xfs_imeta_resv_alloc_extent( 
xfs_trans_mod_sb(args->tp, XFS_TRANS_SB_FDBLOCKS, -len); ip->i_nblocks += args->len; + xfs_trans_mod_dquot_byino(args->tp, ip, XFS_TRANS_DQ_BCOUNT, args->len); + + xfs_trans_log_inode(args->tp, ip, XFS_ILOG_CORE); } /* Free a block to the metadata file's reservation. */ @@ -1316,6 +1319,7 @@ xfs_imeta_resv_free_extent( trace_xfs_imeta_resv_free_extent(ip, len); ip->i_nblocks -= len; + xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -len); /* * Add the freed blocks back into the inode's delalloc reservation @@ -1336,6 +1340,8 @@ xfs_imeta_resv_free_extent( */ if (len) xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, len); + + xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); } /* Release a metadata file's space reservation. */ diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c index 99ccdffc30d..4dfd4fd1b1f 100644 --- a/libxfs/xfs_rtrmap_btree.c +++ b/libxfs/xfs_rtrmap_btree.c @@ -18,10 +18,12 @@ #include "xfs_alloc.h" #include "xfs_btree.h" #include "xfs_btree_staging.h" +#include "xfs_rmap.h" #include "xfs_rtrmap_btree.h" #include "xfs_trace.h" #include "xfs_cksum.h" #include "xfs_rtgroup.h" +#include "xfs_bmap.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -50,6 +52,182 @@ xfs_rtrmapbt_dup_cursor( return new; } +STATIC int +xfs_rtrmapbt_get_minrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + + return xfs_rtrmapbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes, + level == 0) / 2; + } + + return cur->bc_mp->m_rtrmap_mnr[level != 0]; +} + +STATIC int +xfs_rtrmapbt_get_maxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + + return xfs_rtrmapbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes, + level == 0); + } + + return cur->bc_mp->m_rtrmap_mxr[level != 0]; +} + +/* + * Convert the ondisk record's offset field into the ondisk key's offset field. 
+ * Fork and bmbt are significant parts of the rmap record key, but written + * status is merely a record attribute. + */ +static inline __be64 ondisk_rec_offset_to_key(const union xfs_btree_rec *rec) +{ + return rec->rmap.rm_offset & ~cpu_to_be64(XFS_RMAP_OFF_UNWRITTEN); +} + +STATIC void +xfs_rtrmapbt_init_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + key->rmap.rm_startblock = rec->rmap.rm_startblock; + key->rmap.rm_owner = rec->rmap.rm_owner; + key->rmap.rm_offset = ondisk_rec_offset_to_key(rec); +} + +STATIC void +xfs_rtrmapbt_init_high_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + uint64_t off; + int adj; + + adj = be32_to_cpu(rec->rmap.rm_blockcount) - 1; + + key->rmap.rm_startblock = rec->rmap.rm_startblock; + be32_add_cpu(&key->rmap.rm_startblock, adj); + key->rmap.rm_owner = rec->rmap.rm_owner; + key->rmap.rm_offset = ondisk_rec_offset_to_key(rec); + if (XFS_RMAP_NON_INODE_OWNER(be64_to_cpu(rec->rmap.rm_owner)) || + XFS_RMAP_IS_BMBT_BLOCK(be64_to_cpu(rec->rmap.rm_offset))) + return; + off = be64_to_cpu(key->rmap.rm_offset); + off = (XFS_RMAP_OFF(off) + adj) | (off & ~XFS_RMAP_OFF_MASK); + key->rmap.rm_offset = cpu_to_be64(off); +} + +STATIC void +xfs_rtrmapbt_init_rec_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_rec *rec) +{ + rec->rmap.rm_startblock = cpu_to_be32(cur->bc_rec.r.rm_startblock); + rec->rmap.rm_blockcount = cpu_to_be32(cur->bc_rec.r.rm_blockcount); + rec->rmap.rm_owner = cpu_to_be64(cur->bc_rec.r.rm_owner); + rec->rmap.rm_offset = cpu_to_be64( + xfs_rmap_irec_offset_pack(&cur->bc_rec.r)); +} + +STATIC void +xfs_rtrmapbt_init_ptr_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr) +{ + ptr->l = 0; +} + +/* + * Mask the appropriate parts of the ondisk key field for a key comparison. + * Fork and bmbt are significant parts of the rmap record key, but written + * status is merely a record attribute. 
+ */ +static inline uint64_t offset_keymask(uint64_t offset) +{ + return offset & ~XFS_RMAP_OFF_UNWRITTEN; +} + +STATIC int64_t +xfs_rtrmapbt_key_diff( + struct xfs_btree_cur *cur, + const union xfs_btree_key *key) +{ + struct xfs_rmap_irec *rec = &cur->bc_rec.r; + const struct xfs_rmap_key *kp = &key->rmap; + __u64 x, y; + int64_t d; + + d = (int64_t)be32_to_cpu(kp->rm_startblock) - rec->rm_startblock; + if (d) + return d; + + x = be64_to_cpu(kp->rm_owner); + y = rec->rm_owner; + if (x > y) + return 1; + else if (y > x) + return -1; + + x = offset_keymask(be64_to_cpu(kp->rm_offset)); + y = offset_keymask(xfs_rmap_irec_offset_pack(rec)); + if (x > y) + return 1; + else if (y > x) + return -1; + return 0; +} + +STATIC int64_t +xfs_rtrmapbt_diff_two_keys( + struct xfs_btree_cur *cur, + const union xfs_btree_key *k1, + const union xfs_btree_key *k2, + const union xfs_btree_key *mask) +{ + const struct xfs_rmap_key *kp1 = &k1->rmap; + const struct xfs_rmap_key *kp2 = &k2->rmap; + int64_t d; + __u64 x, y; + + /* Doesn't make sense to mask off the physical space part */ + ASSERT(!mask || mask->rmap.rm_startblock); + + d = (int64_t)be32_to_cpu(kp1->rm_startblock) - + be32_to_cpu(kp2->rm_startblock); + if (d) + return d; + + if (!mask || mask->rmap.rm_owner) { + x = be64_to_cpu(kp1->rm_owner); + y = be64_to_cpu(kp2->rm_owner); + if (x > y) + return 1; + else if (y > x) + return -1; + } + + if (!mask || mask->rmap.rm_offset) { + /* Doesn't make sense to allow offset but not owner */ + ASSERT(!mask || mask->rmap.rm_owner); + + x = offset_keymask(be64_to_cpu(kp1->rm_offset)); + y = offset_keymask(be64_to_cpu(kp2->rm_offset)); + if (x > y) + return 1; + else if (y > x) + return -1; + } + + return 0; +} + static xfs_failaddr_t xfs_rtrmapbt_verify( struct xfs_buf *bp) @@ -116,6 +294,86 @@ const struct xfs_buf_ops xfs_rtrmapbt_buf_ops = { .verify_struct = xfs_rtrmapbt_verify, }; +STATIC int +xfs_rtrmapbt_keys_inorder( + struct xfs_btree_cur *cur, + const union xfs_btree_key *k1, 
+ const union xfs_btree_key *k2) +{ + uint32_t x; + uint32_t y; + uint64_t a; + uint64_t b; + + x = be32_to_cpu(k1->rmap.rm_startblock); + y = be32_to_cpu(k2->rmap.rm_startblock); + if (x < y) + return 1; + else if (x > y) + return 0; + a = be64_to_cpu(k1->rmap.rm_owner); + b = be64_to_cpu(k2->rmap.rm_owner); + if (a < b) + return 1; + else if (a > b) + return 0; + a = offset_keymask(be64_to_cpu(k1->rmap.rm_offset)); + b = offset_keymask(be64_to_cpu(k2->rmap.rm_offset)); + if (a <= b) + return 1; + return 0; +} + +STATIC int +xfs_rtrmapbt_recs_inorder( + struct xfs_btree_cur *cur, + const union xfs_btree_rec *r1, + const union xfs_btree_rec *r2) +{ + uint32_t x; + uint32_t y; + uint64_t a; + uint64_t b; + + x = be32_to_cpu(r1->rmap.rm_startblock); + y = be32_to_cpu(r2->rmap.rm_startblock); + if (x < y) + return 1; + else if (x > y) + return 0; + a = be64_to_cpu(r1->rmap.rm_owner); + b = be64_to_cpu(r2->rmap.rm_owner); + if (a < b) + return 1; + else if (a > b) + return 0; + a = offset_keymask(be64_to_cpu(r1->rmap.rm_offset)); + b = offset_keymask(be64_to_cpu(r2->rmap.rm_offset)); + if (a <= b) + return 1; + return 0; +} + +STATIC enum xbtree_key_contig +xfs_rtrmapbt_keys_contiguous( + struct xfs_btree_cur *cur, + const union xfs_btree_key *key1, + const union xfs_btree_key *key2, + const union xfs_btree_key *mask) +{ + ASSERT(!mask || mask->rmap.rm_startblock); + + /* + * We only support checking contiguity of the physical space component. + * If any callers ever need more specificity than that, they'll have to + * implement it here. 
+ */ + ASSERT(!mask || (!mask->rmap.rm_owner && !mask->rmap.rm_offset)); + + return xbtree_key_contig(be32_to_cpu(key1->rmap.rm_startblock), + be32_to_cpu(key2->rmap.rm_startblock)); +} + const struct xfs_btree_ops xfs_rtrmapbt_ops = { .rec_len = sizeof(struct xfs_rmap_rec), .key_len = 2 * sizeof(struct xfs_rmap_key), @@ -124,7 +382,20 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = { XFS_BTREE_IROOT_RECORDS, .dup_cursor = xfs_rtrmapbt_dup_cursor, + .alloc_block = xfs_btree_alloc_imeta_block, + .free_block = xfs_btree_free_imeta_block, + .get_minrecs = xfs_rtrmapbt_get_minrecs, + .get_maxrecs = xfs_rtrmapbt_get_maxrecs, + .init_key_from_rec = xfs_rtrmapbt_init_key_from_rec, + .init_high_key_from_rec = xfs_rtrmapbt_init_high_key_from_rec, + .init_rec_from_cur = xfs_rtrmapbt_init_rec_from_cur, + .init_ptr_from_cur = xfs_rtrmapbt_init_ptr_from_cur, + .key_diff = xfs_rtrmapbt_key_diff, .buf_ops = &xfs_rtrmapbt_buf_ops, + .diff_two_keys = xfs_rtrmapbt_diff_two_keys, + .keys_inorder = xfs_rtrmapbt_keys_inorder, + .recs_inorder = xfs_rtrmapbt_recs_inorder, + .keys_contiguous = xfs_rtrmapbt_keys_contiguous, }; /* Initialize a new rt rmap btree cursor. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 12/41] xfs: wire up rmap map and unmap to the realtime rmapbt 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 05/41] xfs: add realtime rmap btree operations Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 15/41] xfs: allow queued realtime intents to drain before scrubbing Darrick J. Wong ` (32 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Connect the map and unmap reverse-mapping operations to the realtime rmapbt via the deferred operation callbacks. This enables us to perform rmap operations against the correct btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rmap.c | 80 +++++++++++++++++++++++++++++++------------------- libxfs/xfs_rtgroup.c | 9 ++++++ libxfs/xfs_rtgroup.h | 5 +++ 3 files changed, 63 insertions(+), 31 deletions(-) diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c index 3700d702631..74fb9197cbc 100644 --- a/libxfs/xfs_rmap.c +++ b/libxfs/xfs_rmap.c @@ -24,6 +24,7 @@ #include "xfs_ag.h" #include "xfs_health.h" #include "xfs_rtgroup.h" +#include "xfs_rtrmap_btree.h" struct kmem_cache *xfs_rmap_intent_cache; @@ -2590,13 +2591,14 @@ xfs_rmap_finish_one_cleanup( struct xfs_btree_cur *rcur, int error) { - struct xfs_buf *agbp; + struct xfs_buf *agbp = NULL; if (rcur == NULL) return; - agbp = rcur->bc_ag.agbp; + if (rcur->bc_btnum == XFS_BTNUM_RMAP) + agbp = rcur->bc_ag.agbp; xfs_btree_del_cursor(rcur, error); - if (error) + if (error && agbp) xfs_trans_brelse(tp, agbp); } @@ -2632,6 +2634,17 @@ __xfs_rmap_finish_intent( } } +/* Does this btree cursor match the given group object? 
*/ +static inline bool +xfs_rmap_is_wrong_cursor( + struct xfs_btree_cur *cur, + struct xfs_rmap_intent *ri) +{ + if (cur->bc_btnum == XFS_BTNUM_RTRMAP) + return cur->bc_ino.rtg != ri->ri_rtg; + return cur->bc_ag.pag != ri->ri_pag; +} + /* * Process one of the deferred rmap operations. We pass back the * btree cursor to maintain our lock on the rmapbt between calls. @@ -2645,24 +2658,24 @@ xfs_rmap_finish_one( struct xfs_rmap_intent *ri, struct xfs_btree_cur **pcur) { + struct xfs_owner_info oinfo; struct xfs_mount *mp = tp->t_mountp; struct xfs_btree_cur *rcur; struct xfs_buf *agbp = NULL; - int error = 0; - struct xfs_owner_info oinfo; xfs_agblock_t bno; bool unwritten; - - if (ri->ri_realtime) { - /* coming in a subsequent patch */ - ASSERT(0); - return -EFSCORRUPTED; - } - - bno = XFS_FSB_TO_AGBNO(mp, ri->ri_bmap.br_startblock); + int error = 0; trace_xfs_rmap_deferred(mp, ri); + if (ri->ri_realtime) { + xfs_rgnumber_t rgno; + + bno = xfs_rtb_to_rgbno(mp, ri->ri_bmap.br_startblock, &rgno); + } else { + bno = XFS_FSB_TO_AGBNO(mp, ri->ri_bmap.br_startblock); + } + if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_RMAP_FINISH_ONE)) return -EIO; @@ -2671,35 +2684,42 @@ xfs_rmap_finish_one( * the startblock, get one now. */ rcur = *pcur; - if (rcur != NULL && rcur->bc_ag.pag != ri->ri_pag) { + if (rcur != NULL && xfs_rmap_is_wrong_cursor(rcur, ri)) { xfs_rmap_finish_one_cleanup(tp, rcur, 0); rcur = NULL; *pcur = NULL; } if (rcur == NULL) { - /* - * Refresh the freelist before we start changing the - * rmapbt, because a shape change could cause us to - * allocate blocks. 
- */ - error = xfs_free_extent_fix_freelist(tp, ri->ri_pag, &agbp); - if (error) { - xfs_ag_mark_sick(ri->ri_pag, XFS_SICK_AG_AGFL); - return error; - } - if (XFS_IS_CORRUPT(tp->t_mountp, !agbp)) { - xfs_ag_mark_sick(ri->ri_pag, XFS_SICK_AG_AGFL); - return -EFSCORRUPTED; - } + if (ri->ri_realtime) { + xfs_rtgroup_lock(tp, ri->ri_rtg, XFS_RTGLOCK_RMAP); + rcur = xfs_rtrmapbt_init_cursor(mp, tp, ri->ri_rtg, + ri->ri_rtg->rtg_rmapip); + rcur->bc_ino.flags = 0; + } else { + /* + * Refresh the freelist before we start changing the + * rmapbt, because a shape change could cause us to + * allocate blocks. + */ + error = xfs_free_extent_fix_freelist(tp, ri->ri_pag, + &agbp); + if (error) { + xfs_ag_mark_sick(ri->ri_pag, XFS_SICK_AG_AGFL); + return error; + } + if (XFS_IS_CORRUPT(tp->t_mountp, !agbp)) { + xfs_ag_mark_sick(ri->ri_pag, XFS_SICK_AG_AGFL); + return -EFSCORRUPTED; + } - rcur = xfs_rmapbt_init_cursor(mp, tp, agbp, ri->ri_pag); + rcur = xfs_rmapbt_init_cursor(mp, tp, agbp, ri->ri_pag); + } } *pcur = rcur; xfs_rmap_ino_owner(&oinfo, ri->ri_owner, ri->ri_whichfork, ri->ri_bmap.br_startoff); unwritten = ri->ri_bmap.br_state == XFS_EXT_UNWRITTEN; - bno = XFS_FSB_TO_AGBNO(rcur->bc_mp, ri->ri_bmap.br_startblock); error = __xfs_rmap_finish_intent(rcur, ri->ri_type, bno, ri->ri_bmap.br_blockcount, &oinfo, unwritten); diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c index 97643fdcc7c..8018cd02e70 100644 --- a/libxfs/xfs_rtgroup.c +++ b/libxfs/xfs_rtgroup.c @@ -512,6 +512,12 @@ xfs_rtgroup_lock( xfs_rtbitmap_lock(tp, rtg->rtg_mount); else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) xfs_rtbitmap_lock_shared(rtg->rtg_mount, XFS_RBMLOCK_BITMAP); + + if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg->rtg_rmapip) { + xfs_ilock(rtg->rtg_rmapip, XFS_ILOCK_EXCL); + if (tp) + xfs_trans_ijoin(tp, rtg->rtg_rmapip, XFS_ILOCK_EXCL); + } } /* Unlock metadata inodes associated with this rt group. 
*/ @@ -524,6 +530,9 @@ xfs_rtgroup_unlock( ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) || !(rtglock_flags & XFS_RTGLOCK_BITMAP)); + if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg->rtg_rmapip) + xfs_iunlock(rtg->rtg_rmapip, XFS_ILOCK_EXCL); + if (rtglock_flags & XFS_RTGLOCK_BITMAP) xfs_rtbitmap_unlock(rtg->rtg_mount); else if (rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h index 1792a9ab3bb..3230dd03d8f 100644 --- a/libxfs/xfs_rtgroup.h +++ b/libxfs/xfs_rtgroup.h @@ -220,9 +220,12 @@ int xfs_rtgroup_init_secondary_super(struct xfs_mount *mp, xfs_rgnumber_t rgno, #define XFS_RTGLOCK_BITMAP (1U << 0) /* Lock the rt bitmap inode in shared mode */ #define XFS_RTGLOCK_BITMAP_SHARED (1U << 1) +/* Lock the rt rmap inode in exclusive mode */ +#define XFS_RTGLOCK_RMAP (1U << 2) #define XFS_RTGLOCK_ALL_FLAGS (XFS_RTGLOCK_BITMAP | \ - XFS_RTGLOCK_BITMAP_SHARED) + XFS_RTGLOCK_BITMAP_SHARED | \ + XFS_RTGLOCK_RMAP) void xfs_rtgroup_lock(struct xfs_trans *tp, struct xfs_rtgroup *rtg, unsigned int rtglock_flags); ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 15/41] xfs: allow queued realtime intents to drain before scrubbing 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 12/41] xfs: wire up rmap map and unmap to the realtime rmapbt Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 11/41] xfs: use realtime EFI to free extents when realtime rmap is enabled Darrick J. Wong ` (31 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When a writer thread executes a chain of log intent items for the realtime volume, the ILOCKs taken during each step are for each rt metadata file, not the entire rt volume itself. Although scrub takes all rt metadata ILOCKs, this isn't sufficient to guard against scrub checking the rt volume while that writer thread is in the middle of finishing a chain because there's no higher level locking primitive guarding the realtime volume. When there's a collision, cross-referencing between data structures (e.g. rtrmapbt and rtrefcountbt) yields false corruption events; if repair is running, this results in incorrect repairs, which is catastrophic. Fix this by adding to the mount structure the same drain that we use to protect scrub against concurrent AG updates, but this time for the realtime volume. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- include/xfs_mount.h | 5 +++++ libxfs/defer_item.c | 9 ++++++++- libxfs/xfs_rtgroup.c | 3 +++ libxfs/xfs_rtgroup.h | 9 +++++++++ 4 files changed, 25 insertions(+), 1 deletion(-) diff --git a/include/xfs_mount.h b/include/xfs_mount.h index a4d0ba70e83..ca79c420afb 100644 --- a/include/xfs_mount.h +++ b/include/xfs_mount.h @@ -311,6 +311,11 @@ struct xfs_drain { /* empty */ }; static inline void xfs_perag_bump_intents(struct xfs_perag *pag) { } static inline void xfs_perag_drop_intents(struct xfs_perag *pag) { } +struct xfs_rtgroup; + +static inline void xfs_rtgroup_bump_intents(struct xfs_rtgroup *rtg) { } +static inline void xfs_rtgroup_drop_intents(struct xfs_rtgroup *rtg) { } + #define xfs_drain_free(dr) ((void)0) #define xfs_drain_init(dr) ((void)0) diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c index 9a4196f7cc0..baf3b9e6204 100644 --- a/libxfs/defer_item.c +++ b/libxfs/defer_item.c @@ -91,6 +91,7 @@ xfs_extent_free_get_group( rgno = xfs_rtb_to_rgno(mp, xefi->xefi_startblock); xefi->xefi_rtg = xfs_rtgroup_get(mp, rgno); + xfs_rtgroup_bump_intents(xefi->xefi_rtg); return; } @@ -105,6 +106,7 @@ xfs_extent_free_put_group( struct xfs_extent_free_item *xefi) { if (xfs_efi_is_realtime(xefi)) { + xfs_rtgroup_drop_intents(xefi->xefi_rtg); xfs_rtgroup_put(xefi->xefi_rtg); return; } @@ -275,6 +277,7 @@ xfs_rmap_update_get_group( rgno = xfs_rtb_to_rgno(mp, ri->ri_bmap.br_startblock); ri->ri_rtg = xfs_rtgroup_get(mp, rgno); + xfs_rtgroup_bump_intents(ri->ri_rtg); return; } @@ -289,6 +292,7 @@ xfs_rmap_update_put_group( struct xfs_rmap_intent *ri) { if (ri->ri_realtime) { + xfs_rtgroup_drop_intents(ri->ri_rtg); xfs_rtgroup_put(ri->ri_rtg); return; } @@ -522,6 +526,7 @@ xfs_bmap_update_get_group( rgno = xfs_rtb_to_rgno(mp, bi->bi_bmap.br_startblock); bi->bi_rtg = xfs_rtgroup_get(mp, rgno); + xfs_rtgroup_bump_intents(bi->bi_rtg); } else { bi->bi_rtg = NULL; } @@ -548,8 +553,10 @@ xfs_bmap_update_put_group( struct xfs_bmap_intent *bi) { if 
(xfs_ifork_is_realtime(bi->bi_owner, bi->bi_whichfork)) { - if (xfs_has_rtgroups(bi->bi_owner->i_mount)) + if (xfs_has_rtgroups(bi->bi_owner->i_mount)) { + xfs_rtgroup_drop_intents(bi->bi_rtg); xfs_rtgroup_put(bi->bi_rtg); + } return; } diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c index 8018cd02e70..8c41869a61a 100644 --- a/libxfs/xfs_rtgroup.c +++ b/libxfs/xfs_rtgroup.c @@ -129,6 +129,8 @@ xfs_initialize_rtgroups( #ifdef __KERNEL__ /* Place kernel structure only init below this point. */ spin_lock_init(&rtg->rtg_state_lock); + xfs_drain_init(&rtg->rtg_intents); + #endif /* __KERNEL__ */ /* first new rtg is fully initialized */ @@ -180,6 +182,7 @@ xfs_free_rtgroups( spin_unlock(&mp->m_rtgroup_lock); ASSERT(rtg); XFS_IS_CORRUPT(rtg->rtg_mount, atomic_read(&rtg->rtg_ref) != 0); + xfs_drain_free(&rtg->rtg_intents); call_rcu(&rtg->rcu_head, __xfs_free_rtgroups); } diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h index 3230dd03d8f..1d41a2cac34 100644 --- a/libxfs/xfs_rtgroup.h +++ b/libxfs/xfs_rtgroup.h @@ -37,6 +37,15 @@ struct xfs_rtgroup { #ifdef __KERNEL__ /* -- kernel only structures below this line -- */ spinlock_t rtg_state_lock; + + /* + * We use xfs_drain to track the number of deferred log intent items + * that have been queued (but not yet processed) so that waiters (e.g. + * scrub) will not lock resources when other threads are in the middle + * of processing a chain of intent items only to find momentary + * inconsistencies. + */ + struct xfs_drain rtg_intents; #endif /* __KERNEL__ */ }; ^ permalink raw reply related [flat|nested] 565+ messages in thread
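Editorial note on the patch above: in this userspace copy the bump and drop helpers are empty stubs, so the counting itself is not visible in the diff. The sketch below only illustrates the idea stated in the rtg_intents comment: count intents that have been queued but not yet processed, so that a scrubber can tell when a chain is still in flight. The demo_* names are hypothetical, and the kernel implementation is assumed to also wake sleeping waiters, which this polling-only version does not attempt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical drain: a count of queued-but-unfinished intent items. */
struct demo_drain {
	atomic_uint	pending;
};

void demo_drain_init(struct demo_drain *dr)
{
	atomic_init(&dr->pending, 0);
}

/* Called when an intent item takes its reference to the group. */
void demo_bump_intents(struct demo_drain *dr)
{
	atomic_fetch_add(&dr->pending, 1);
}

/* Called when the intent item is finished and releases the group. */
void demo_drop_intents(struct demo_drain *dr)
{
	unsigned int	old = atomic_fetch_sub(&dr->pending, 1);

	/* A drop without a matching bump would be a caller bug. */
	assert(old > 0);
	(void)old;
}

/* A scrubber would hold off cross-referencing while this returns true. */
bool demo_drain_busy(struct demo_drain *dr)
{
	return atomic_load(&dr->pending) != 0;
}
```

The pairing mirrors the get_group/put_group hunks in defer_item.c: every bump sits next to an xfs_rtgroup_get() and every drop next to the matching xfs_rtgroup_put().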
* [PATCH 11/41] xfs: use realtime EFI to free extents when realtime rmap is enabled 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 15/41] xfs: allow queued realtime intents to drain before scrubbing Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 09/41] xfs: add metadata reservations for realtime rmap btrees Darrick J. Wong ` (30 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> When rmap is enabled, XFS expects a certain order of operations, which is: 1) remove the file mapping, 2) remove the reverse mapping, and then 3) free the blocks. xfs_bmap_del_extent_real tries to do 1 and 3 in the same transaction, which means that when rtrmap is enabled, we have to use realtime EFIs to maintain the expected order. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- libxfs/xfs_bmap.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-) diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c index c6cfef01eea..d0588d3fa70 100644 --- a/libxfs/xfs_bmap.c +++ b/libxfs/xfs_bmap.c @@ -5088,7 +5088,6 @@ xfs_bmap_del_extent_real( { xfs_fsblock_t del_endblock=0; /* first block past del */ xfs_fileoff_t del_endoff; /* first offset past del */ - int do_fx; /* free extent at end of routine */ int error; /* error return value */ int flags = 0;/* inode logging flags */ struct xfs_bmbt_irec got; /* current extent entry */ @@ -5102,6 +5101,8 @@ xfs_bmap_del_extent_real( uint qfield; /* quota field to update */ uint32_t state = xfs_bmap_fork_to_state(whichfork); struct xfs_bmbt_irec old; + bool isrt = xfs_ifork_is_realtime(ip, whichfork); + bool want_free = !(bflags & XFS_BMAPI_REMAP); mp = ip->i_mount; XFS_STATS_INC(mp, xs_del_exlist); @@ -5132,17 +5133,24 @@ xfs_bmap_del_extent_real( return -ENOSPC; flags = XFS_ILOG_CORE; - if (xfs_ifork_is_realtime(ip, whichfork)) { - if (!(bflags & XFS_BMAPI_REMAP)) { + if (isrt) { + /* + * Historically, we did not use EFIs to free realtime extents. + * However, when reverse mapping is enabled, we must maintain + * the same order of operations as the data device, which is: + * Remove the file mapping, remove the reverse mapping, and + * then free the blocks. This means that we must delay the + * freeing until after we've scheduled the rmap update. + */ + if (want_free && !xfs_has_rtrmapbt(mp)) { error = xfs_rtfree_blocks(tp, del->br_startblock, del->br_blockcount); if (error) goto done; + want_free = false; } - do_fx = 0; qfield = XFS_TRANS_DQ_RTBCOUNT; } else { - do_fx = 1; qfield = XFS_TRANS_DQ_BCOUNT; } nblks = del->br_blockcount; @@ -5297,7 +5305,7 @@ xfs_bmap_del_extent_real( /* * If we need to, add to list of extents to delete. 
*/ - if (do_fx && !(bflags & XFS_BMAPI_REMAP)) { + if (want_free) { if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { xfs_refcount_decrease_extent(tp, del); } else { @@ -5306,6 +5314,8 @@ xfs_bmap_del_extent_real( if ((bflags & XFS_BMAPI_NODISCARD) || del->br_state == XFS_EXT_UNWRITTEN) efi_flags |= XFS_FREE_EXTENT_SKIP_DISCARD; + if (isrt) + efi_flags |= XFS_FREE_EXTENT_REALTIME; xfs_free_extent_later(tp, del->br_startblock, del->br_blockcount, NULL, efi_flags); ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 09/41] xfs: add metadata reservations for realtime rmap btrees 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (9 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 11/41] xfs: use realtime EFI to free extents when realtime rmap is enabled Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 13/41] xfs: create routine to allocate and initialize a realtime rmap btree inode Darrick J. Wong ` (29 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Reserve some free blocks so that we will always have enough free blocks in the data volume to handle expansion of the realtime rmap btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rtrmap_btree.c | 39 +++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtrmap_btree.h | 2 ++ 2 files changed, 41 insertions(+) diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c index 85608c813b4..d45f711ce06 100644 --- a/libxfs/xfs_rtrmap_btree.c +++ b/libxfs/xfs_rtrmap_btree.c @@ -606,3 +606,42 @@ xfs_rtrmapbt_create_path( *pathp = path; return 0; } + +/* Calculate the rtrmap btree size for some records. */ +static unsigned long long +xfs_rtrmapbt_calc_size( + struct xfs_mount *mp, + unsigned long long len) +{ + return xfs_btree_calc_size(mp->m_rtrmap_mnr, len); +} + +/* + * Calculate the maximum rmap btree size. + */ +static unsigned long long +xfs_rtrmapbt_max_size( + struct xfs_mount *mp, + xfs_rtblock_t rtblocks) +{ + /* Bail out if we're uninitialized, which can happen in mkfs. */ + if (mp->m_rtrmap_mxr[0] == 0) + return 0; + + return xfs_rtrmapbt_calc_size(mp, rtblocks); +} + +/* + * Figure out how many blocks to reserve and how many are used by this btree. 
+ */ +xfs_filblks_t +xfs_rtrmapbt_calc_reserves( + struct xfs_mount *mp) +{ + if (!xfs_has_rtrmapbt(mp)) + return 0; + + /* 1/64th (~1.5%) of the space, and enough for 1 record per block. */ + return max_t(xfs_filblks_t, mp->m_sb.sb_rgblocks >> 6, + xfs_rtrmapbt_max_size(mp, mp->m_sb.sb_rgblocks)); +} diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h index 26e2445f5d6..63e667d0d76 100644 --- a/libxfs/xfs_rtrmap_btree.h +++ b/libxfs/xfs_rtrmap_btree.h @@ -84,4 +84,6 @@ void xfs_rtrmapbt_destroy_cur_cache(void); int xfs_rtrmapbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, struct xfs_imeta_path **pathp); +xfs_filblks_t xfs_rtrmapbt_calc_reserves(struct xfs_mount *mp); + #endif /* __XFS_RTRMAP_BTREE_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
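Editorial note on the patch above: the reservation is the larger of 1/64th of the group and the worst-case btree size for one record per block. The arithmetic can be sketched as follows. The demo_* names are hypothetical, demo_btree_calc_size() is an assumed approximation of what xfs_btree_calc_size() computes rather than a copy of it, and the feature and uninitialized-geometry checks from the patch are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Round-up division. */
static inline uint64_t demo_howmany(uint64_t n, uint64_t d)
{
	return (n + d - 1) / d;
}

/*
 * Blocks needed to index @records when every block holds only the minimum
 * number of entries: limits[0] per leaf, limits[1] per interior node.
 */
uint64_t demo_btree_calc_size(const unsigned int limits[2], uint64_t records)
{
	uint64_t	level_blocks;
	uint64_t	blocks;

	/* limits[1] must exceed 1 or the loop below would never shrink. */
	assert(limits[0] > 0 && limits[1] > 1);

	level_blocks = demo_howmany(records, limits[0]);
	blocks = level_blocks;
	while (level_blocks > 1) {
		level_blocks = demo_howmany(level_blocks, limits[1]);
		blocks += level_blocks;
	}
	return blocks;
}

/* max(1/64th of the group, worst-case btree for one record per block). */
uint64_t demo_rtrmapbt_calc_reserves(const unsigned int minrecs[2],
				     uint64_t rgblocks)
{
	uint64_t	pct = rgblocks >> 6;
	uint64_t	worst = demo_btree_calc_size(minrecs, rgblocks);

	return pct > worst ? pct : worst;
}
```

With a small fanout the btree term dominates; with a realistic fanout the 1/64th floor tends to win, which matches the "~1.5%" remark in the patch comment.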
* [PATCH 13/41] xfs: create routine to allocate and initialize a realtime rmap btree inode 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (10 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 09/41] xfs: add metadata reservations for realtime rmap btrees Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 08/41] xfs: add realtime reverse map inode to superblock Darrick J. Wong ` (28 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a library routine to allocate and initialize an empty realtime rmapbt inode. We'll use this for growfs, mkfs, and repair. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rtrmap_btree.c | 41 +++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtrmap_btree.h | 5 +++++ 2 files changed, 46 insertions(+) diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c index f92815b08a2..38f0b8567a6 100644 --- a/libxfs/xfs_rtrmap_btree.c +++ b/libxfs/xfs_rtrmap_btree.c @@ -865,3 +865,44 @@ xfs_iflush_rtrmap( xfs_rtrmapbt_to_disk(ip->i_mount, ifp->if_broot, ifp->if_broot_bytes, dfp, XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); } + +/* + * Create a realtime rmap btree inode. + * + * Regardless of the return value, the caller must clean up @ic. If a new + * inode is returned through *ipp, the caller must finish setting up the incore + * inode and release it. 
+ */ +int +xfs_rtrmapbt_create( + struct xfs_trans **tpp, + struct xfs_imeta_path *path, + struct xfs_imeta_update *upd, + struct xfs_inode **ipp) +{ + struct xfs_mount *mp = (*tpp)->t_mountp; + struct xfs_ifork *ifp; + struct xfs_inode *ip; + int error; + + *ipp = NULL; + + error = xfs_imeta_create(tpp, path, S_IFREG, 0, &ip, upd); + if (error) + return error; + + ifp = &ip->i_df; + ifp->if_format = XFS_DINODE_FMT_RMAP; + ASSERT(ifp->if_broot_bytes == 0); + ASSERT(ifp->if_bytes == 0); + + /* Initialize the empty incore btree root. */ + xfs_iroot_alloc(ip, XFS_DATA_FORK, + xfs_rtrmap_broot_space_calc(mp, 0, 0)); + xfs_btree_init_block(mp, ifp->if_broot, &xfs_rtrmapbt_ops, 0, 0, + ip->i_ino); + xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE | XFS_ILOG_DBROOT); + + *ipp = ip; + return 0; +} diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h index 6917a31bfe0..046a6081673 100644 --- a/libxfs/xfs_rtrmap_btree.h +++ b/libxfs/xfs_rtrmap_btree.h @@ -198,4 +198,9 @@ void xfs_rtrmapbt_to_disk(struct xfs_mount *mp, struct xfs_btree_block *rblock, unsigned int dblocklen); void xfs_iflush_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip); +struct xfs_imeta_update; + +int xfs_rtrmapbt_create(struct xfs_trans **tpp, struct xfs_imeta_path *path, + struct xfs_imeta_update *ic, struct xfs_inode **ipp); + #endif /* __XFS_RTRMAP_BTREE_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 08/41] xfs: add realtime reverse map inode to superblock 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (11 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 13/41] xfs: create routine to allocate and initialize a realtime rmap btree inode Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 10/41] xfs: wire up a new inode fork type for the realtime rmap Darrick J. Wong ` (27 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a metadir path to select the realtime rmap btree inode and load it at mount time. The rtrmapbt inode will have a unique extent format code, which means that we also have to update the inode validation and flush routines to look for it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/init.c | 8 ++++++++ libxfs/xfs_format.h | 6 ++++-- libxfs/xfs_inode_buf.c | 6 ++++++ libxfs/xfs_inode_fork.c | 9 +++++++++ libxfs/xfs_rtgroup.h | 3 +++ libxfs/xfs_rtrmap_btree.c | 33 +++++++++++++++++++++++++++++++++ libxfs/xfs_rtrmap_btree.h | 4 ++++ 7 files changed, 67 insertions(+), 2 deletions(-) diff --git a/libxfs/init.c b/libxfs/init.c index 4ce0aca9796..6f549996b1e 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -1026,6 +1026,14 @@ libxfs_mount( void libxfs_rtmount_destroy(xfs_mount_t *mp) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + + for_each_rtgroup(mp, rgno, rtg) { + if (rtg->rtg_rmapip) + libxfs_imeta_irele(rtg->rtg_rmapip); + rtg->rtg_rmapip = NULL; + } if (mp->m_rsumip) libxfs_imeta_irele(mp->m_rsumip); if (mp->m_rbmip) diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index fb727e1e407..babe5d3fabb 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1009,7 +1009,8 @@ enum xfs_dinode_fmt { XFS_DINODE_FMT_LOCAL, /* bulk data */ XFS_DINODE_FMT_EXTENTS, /* struct 
xfs_bmbt_rec */ XFS_DINODE_FMT_BTREE, /* struct xfs_bmdr_block */ - XFS_DINODE_FMT_UUID /* added long ago, but never used */ + XFS_DINODE_FMT_UUID, /* added long ago, but never used */ + XFS_DINODE_FMT_RMAP, /* reverse mapping btree */ }; #define XFS_INODE_FORMAT_STR \ @@ -1017,7 +1018,8 @@ enum xfs_dinode_fmt { { XFS_DINODE_FMT_LOCAL, "local" }, \ { XFS_DINODE_FMT_EXTENTS, "extent" }, \ { XFS_DINODE_FMT_BTREE, "btree" }, \ - { XFS_DINODE_FMT_UUID, "uuid" } + { XFS_DINODE_FMT_UUID, "uuid" }, \ + { XFS_DINODE_FMT_RMAP, "rmap" } /* * Max values for extnum and aextnum. diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c index b5d4e5dd7ca..3aaa1988fb1 100644 --- a/libxfs/xfs_inode_buf.c +++ b/libxfs/xfs_inode_buf.c @@ -405,6 +405,12 @@ xfs_dinode_verify_fork( if (di_nextents > max_extents) return __this_address; break; + case XFS_DINODE_FMT_RMAP: + if (!xfs_has_rtrmapbt(mp)) + return __this_address; + if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))) + return __this_address; + break; default: return __this_address; } diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c index 58e5ab45e42..b441328cc9c 100644 --- a/libxfs/xfs_inode_fork.c +++ b/libxfs/xfs_inode_fork.c @@ -257,6 +257,11 @@ xfs_iformat_data_fork( return xfs_iformat_extents(ip, dip, XFS_DATA_FORK); case XFS_DINODE_FMT_BTREE: return xfs_iformat_btree(ip, dip, XFS_DATA_FORK); + case XFS_DINODE_FMT_RMAP: + if (!xfs_has_rtrmapbt(ip->i_mount)) + return -EFSCORRUPTED; + ASSERT(0); /* to be implemented later */ + return -EFSCORRUPTED; default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, sizeof(*dip), __this_address); @@ -637,6 +642,10 @@ xfs_iflush_fork( } break; + case XFS_DINODE_FMT_RMAP: + ASSERT(0); /* to be implemented later */ + break; + default: ASSERT(0); break; diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h index 3c9572677f7..1792a9ab3bb 100644 --- a/libxfs/xfs_rtgroup.h +++ b/libxfs/xfs_rtgroup.h @@ -20,6 +20,9 @@ struct xfs_rtgroup { /* for rcu-safe 
freeing */ struct rcu_head rcu_head; + /* reverse mapping btree inode */ + struct xfs_inode *rtg_rmapip; + /* Number of blocks in this group */ xfs_rgblock_t rtg_blockcount; diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c index 4dfd4fd1b1f..85608c813b4 100644 --- a/libxfs/xfs_rtrmap_btree.c +++ b/libxfs/xfs_rtrmap_btree.c @@ -24,6 +24,7 @@ #include "xfs_cksum.h" #include "xfs_rtgroup.h" #include "xfs_bmap.h" +#include "xfs_imeta.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -473,6 +474,7 @@ xfs_rtrmapbt_commit_staged_btree( int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + ASSERT(ifake->if_fork->if_format == XFS_DINODE_FMT_RMAP); /* * Free any resources hanging off the real fork, then shallow-copy the @@ -573,3 +575,34 @@ xfs_rtrmapbt_compute_maxlevels( /* Add one level to handle the inode root level. */ mp->m_rtrmap_maxlevels = min(d_maxlevels, r_maxlevels) + 1; } + +#define XFS_RTRMAP_NAMELEN 17 + +/* Create the metadata directory path for an rtrmap btree inode. 
*/ +int +xfs_rtrmapbt_create_path( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + struct xfs_imeta_path **pathp) +{ + struct xfs_imeta_path *path; + char *fname; + int error; + + error = xfs_imeta_create_file_path(mp, 2, &path); + if (error) + return error; + + fname = kmalloc(XFS_RTRMAP_NAMELEN, GFP_KERNEL); + if (!fname) { + xfs_imeta_free_path(path); + return -ENOMEM; + } + + snprintf(fname, XFS_RTRMAP_NAMELEN, "%u.rmap", rgno); + path->im_path[0] = "realtime"; + path->im_path[1] = fname; + path->im_dynamicmask = 0x2; + *pathp = path; + return 0; +} diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h index 7380c04e770..26e2445f5d6 100644 --- a/libxfs/xfs_rtrmap_btree.h +++ b/libxfs/xfs_rtrmap_btree.h @@ -11,6 +11,7 @@ struct xfs_btree_cur; struct xfs_mount; struct xbtree_ifakeroot; struct xfs_rtgroup; +struct xfs_imeta_path; /* rmaps only exist on crc enabled filesystems */ #define XFS_RTRMAP_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN @@ -80,4 +81,7 @@ unsigned int xfs_rtrmapbt_maxlevels_ondisk(void); int __init xfs_rtrmapbt_init_cur_cache(void); void xfs_rtrmapbt_destroy_cur_cache(void); +int xfs_rtrmapbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, + struct xfs_imeta_path **pathp); + #endif /* __XFS_RTRMAP_BTREE_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
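Editorial note on the patch above: the second path component is formatted as "<rgno>.rmap" into a 17-byte buffer. A 32-bit group number needs at most 10 decimal digits, so the longest name is 15 characters plus the terminating NUL, which fits. The sketch below checks that arithmetic. The demo_* names are hypothetical, and it reports truncation explicitly, which the patch has no need to do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same size as XFS_RTRMAP_NAMELEN in the patch. */
#define DEMO_RTRMAP_NAMELEN	17

/*
 * Format the per-group file name.  Returns the name length on success or
 * -1 if the buffer was too small to hold the whole name.
 */
int demo_rtrmap_name(char *buf, size_t len, uint32_t rgno)
{
	int	n;

	assert(buf != NULL);

	n = snprintf(buf, len, "%u.rmap", (unsigned int)rgno);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

The resulting metadata directory path has two components, "realtime" followed by this name; only the second is dynamically allocated, which is what im_dynamicmask = 0x2 records.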
* [PATCH 10/41] xfs: wire up a new inode fork type for the realtime rmap 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (12 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 08/41] xfs: add realtime reverse map inode to superblock Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 14/41] xfs: report realtime rmap btree corruption errors to the health system Darrick J. Wong ` (26 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Plumb in the pieces we need to embed the root of the realtime rmap btree in an inode's data fork, complete with new fork type and on-disk interpretation functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_format.h | 8 ++ libxfs/xfs_inode_fork.c | 8 +- libxfs/xfs_rtrmap_btree.c | 220 +++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtrmap_btree.h | 112 +++++++++++++++++++++++ 4 files changed, 345 insertions(+), 3 deletions(-) diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index babe5d3fabb..a2b8d8ee8af 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1736,6 +1736,14 @@ typedef __be32 xfs_rmap_ptr_t; */ #define XFS_RTRMAP_CRC_MAGIC 0x4d415052 /* 'MAPR' */ +/* + * rtrmap root header, on-disk form only. 
+ */ +struct xfs_rtrmap_root { + __be16 bb_level; /* 0 is a leaf */ + __be16 bb_numrecs; /* current # of data records */ +}; + /* inode-based btree pointer type */ typedef __be64 xfs_rtrmap_ptr_t; diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c index b441328cc9c..386f23b2954 100644 --- a/libxfs/xfs_inode_fork.c +++ b/libxfs/xfs_inode_fork.c @@ -25,6 +25,7 @@ #include "xfs_errortag.h" #include "xfs_health.h" #include "xfs_symlink_remote.h" +#include "xfs_rtrmap_btree.h" struct kmem_cache *xfs_ifork_cache; @@ -260,8 +261,7 @@ xfs_iformat_data_fork( case XFS_DINODE_FMT_RMAP: if (!xfs_has_rtrmapbt(ip->i_mount)) return -EFSCORRUPTED; - ASSERT(0); /* to be implemented later */ - return -EFSCORRUPTED; + return xfs_iformat_rtrmap(ip, dip); default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, sizeof(*dip), __this_address); @@ -643,7 +643,9 @@ xfs_iflush_fork( break; case XFS_DINODE_FMT_RMAP: - ASSERT(0); /* to be implemented later */ + ASSERT(whichfork == XFS_DATA_FORK); + if (iip->ili_fields & brootflag[whichfork]) + xfs_iflush_rtrmap(ip, dip); break; default: diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c index d45f711ce06..f92815b08a2 100644 --- a/libxfs/xfs_rtrmap_btree.c +++ b/libxfs/xfs_rtrmap_btree.c @@ -83,6 +83,39 @@ xfs_rtrmapbt_get_maxrecs( return cur->bc_mp->m_rtrmap_mxr[level != 0]; } +/* Calculate number of records in the ondisk realtime rmap btree inode root. */ +unsigned int +xfs_rtrmapbt_droot_maxrecs( + unsigned int blocklen, + bool leaf) +{ + blocklen -= sizeof(struct xfs_rtrmap_root); + + if (leaf) + return blocklen / sizeof(struct xfs_rmap_rec); + return blocklen / (2 * sizeof(struct xfs_rmap_key) + + sizeof(xfs_rtrmap_ptr_t)); +} + +/* + * Get the maximum records we could store in the on-disk format. 
+ * + * For non-root nodes this is equivalent to xfs_rtrmapbt_get_maxrecs, but + * for the root node this checks the available space in the dinode fork + * so that we can resize the in-memory buffer to match it. After a + * resize to the maximum size this function returns the same value + * as xfs_rtrmapbt_get_maxrecs for the root node, too. + */ +STATIC int +xfs_rtrmapbt_get_dmaxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level != cur->bc_nlevels - 1) + return cur->bc_mp->m_rtrmap_mxr[level != 0]; + return xfs_rtrmapbt_droot_maxrecs(cur->bc_ino.forksize, level == 0); +} + /* * Convert the ondisk record's offset field into the ondisk key's offset field. * Fork and bmbt are significant parts of the rmap record key, but written @@ -375,6 +408,64 @@ xfs_rtrmapbt_keys_contiguous( be32_to_cpu(key2->rmap.rm_startblock)); } +/* Move the rtrmap btree root from one incore buffer to another. */ +static void +xfs_rtrmapbt_broot_move( + struct xfs_inode *ip, + int whichfork, + struct xfs_btree_block *dst_broot, + size_t dst_bytes, + struct xfs_btree_block *src_broot, + size_t src_bytes, + unsigned int level, + unsigned int numrecs) +{ + struct xfs_mount *mp = ip->i_mount; + void *dptr; + void *sptr; + + ASSERT(xfs_rtrmap_droot_space(src_broot) <= + xfs_inode_fork_size(ip, whichfork)); + + /* + * We always have to move the pointers because they are not butted + * against the btree block header. + */ + if (numrecs && level > 0) { + sptr = xfs_rtrmap_broot_ptr_addr(mp, src_broot, 1, src_bytes); + dptr = xfs_rtrmap_broot_ptr_addr(mp, dst_broot, 1, dst_bytes); + memmove(dptr, sptr, numrecs * sizeof(xfs_fsblock_t)); + } + + if (src_broot == dst_broot) + return; + + /* + * If the root is being totally relocated, we have to migrate the block + * header and the keys/records that come after it. 
+ */ + memcpy(dst_broot, src_broot, XFS_RTRMAP_BLOCK_LEN); + + if (!numrecs) + return; + + if (level == 0) { + sptr = xfs_rtrmap_rec_addr(src_broot, 1); + dptr = xfs_rtrmap_rec_addr(dst_broot, 1); + memcpy(dptr, sptr, numrecs * sizeof(struct xfs_rmap_rec)); + } else { + sptr = xfs_rtrmap_key_addr(src_broot, 1); + dptr = xfs_rtrmap_key_addr(dst_broot, 1); + memcpy(dptr, sptr, numrecs * 2 * sizeof(struct xfs_rmap_key)); + } +} + +static const struct xfs_ifork_broot_ops xfs_rtrmapbt_iroot_ops = { + .maxrecs = xfs_rtrmapbt_maxrecs, + .size = xfs_rtrmap_broot_space_calc, + .move = xfs_rtrmapbt_broot_move, +}; + const struct xfs_btree_ops xfs_rtrmapbt_ops = { .rec_len = sizeof(struct xfs_rmap_rec), .key_len = 2 * sizeof(struct xfs_rmap_key), @@ -387,6 +478,7 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = { .free_block = xfs_btree_free_imeta_block, .get_minrecs = xfs_rtrmapbt_get_minrecs, .get_maxrecs = xfs_rtrmapbt_get_maxrecs, + .get_dmaxrecs = xfs_rtrmapbt_get_dmaxrecs, .init_key_from_rec = xfs_rtrmapbt_init_key_from_rec, .init_high_key_from_rec = xfs_rtrmapbt_init_high_key_from_rec, .init_rec_from_cur = xfs_rtrmapbt_init_rec_from_cur, @@ -397,6 +489,7 @@ const struct xfs_btree_ops xfs_rtrmapbt_ops = { .keys_inorder = xfs_rtrmapbt_keys_inorder, .recs_inorder = xfs_rtrmapbt_recs_inorder, .keys_contiguous = xfs_rtrmapbt_keys_contiguous, + .iroot_ops = &xfs_rtrmapbt_iroot_ops, }; /* Initialize a new rt rmap btree cursor. */ @@ -645,3 +738,130 @@ xfs_rtrmapbt_calc_reserves( return max_t(xfs_filblks_t, mp->m_sb.sb_rgblocks >> 6, xfs_rtrmapbt_max_size(mp, mp->m_sb.sb_rgblocks)); } + +/* Convert on-disk form of btree root to in-memory form. 
*/ +STATIC void +xfs_rtrmapbt_from_disk( + struct xfs_inode *ip, + struct xfs_rtrmap_root *dblock, + unsigned int dblocklen, + struct xfs_btree_block *rblock) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_rmap_key *fkp; + __be64 *fpp; + struct xfs_rmap_key *tkp; + __be64 *tpp; + struct xfs_rmap_rec *frp; + struct xfs_rmap_rec *trp; + unsigned int rblocklen = xfs_rtrmap_broot_space(mp, dblock); + unsigned int numrecs; + unsigned int maxrecs; + + xfs_btree_init_block(mp, rblock, &xfs_rtrmapbt_ops, 0, 0, ip->i_ino); + + rblock->bb_level = dblock->bb_level; + rblock->bb_numrecs = dblock->bb_numrecs; + numrecs = be16_to_cpu(dblock->bb_numrecs); + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrmapbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrmap_droot_key_addr(dblock, 1); + tkp = xfs_rtrmap_key_addr(rblock, 1); + fpp = xfs_rtrmap_droot_ptr_addr(dblock, 1, maxrecs); + tpp = xfs_rtrmap_broot_ptr_addr(mp, rblock, 1, rblocklen); + memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrmap_droot_rec_addr(dblock, 1); + trp = xfs_rtrmap_rec_addr(rblock, 1); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* Load a realtime reverse mapping btree root in from disk. 
*/ +int +xfs_iformat_rtrmap( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_rtrmap_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + unsigned int numrecs; + unsigned int level; + int dsize; + + dsize = XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK); + numrecs = be16_to_cpu(dfp->bb_numrecs); + level = be16_to_cpu(dfp->bb_level); + + if (level > mp->m_rtrmap_maxlevels || + xfs_rtrmap_droot_space_calc(level, numrecs) > dsize) + return -EFSCORRUPTED; + + xfs_iroot_alloc(ip, XFS_DATA_FORK, + xfs_rtrmap_broot_space_calc(mp, level, numrecs)); + xfs_rtrmapbt_from_disk(ip, dfp, dsize, ifp->if_broot); + return 0; +} + +/* Convert in-memory form of btree root to on-disk form. */ +void +xfs_rtrmapbt_to_disk( + struct xfs_mount *mp, + struct xfs_btree_block *rblock, + unsigned int rblocklen, + struct xfs_rtrmap_root *dblock, + unsigned int dblocklen) +{ + struct xfs_rmap_key *fkp; + __be64 *fpp; + struct xfs_rmap_key *tkp; + __be64 *tpp; + struct xfs_rmap_rec *frp; + struct xfs_rmap_rec *trp; + unsigned int numrecs; + unsigned int maxrecs; + + ASSERT(rblock->bb_magic == cpu_to_be32(XFS_RTRMAP_CRC_MAGIC)); + ASSERT(uuid_equal(&rblock->bb_u.l.bb_uuid, &mp->m_sb.sb_meta_uuid)); + ASSERT(rblock->bb_u.l.bb_blkno == cpu_to_be64(XFS_BUF_DADDR_NULL)); + ASSERT(rblock->bb_u.l.bb_leftsib == cpu_to_be64(NULLFSBLOCK)); + ASSERT(rblock->bb_u.l.bb_rightsib == cpu_to_be64(NULLFSBLOCK)); + + dblock->bb_level = rblock->bb_level; + dblock->bb_numrecs = rblock->bb_numrecs; + numrecs = be16_to_cpu(rblock->bb_numrecs); + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrmapbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrmap_key_addr(rblock, 1); + tkp = xfs_rtrmap_droot_key_addr(dblock, 1); + fpp = xfs_rtrmap_broot_ptr_addr(mp, rblock, 1, rblocklen); + tpp = xfs_rtrmap_droot_ptr_addr(dblock, 1, maxrecs); + memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, 
sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrmap_rec_addr(rblock, 1); + trp = xfs_rtrmap_droot_rec_addr(dblock, 1); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* Flush a realtime reverse mapping btree root out to disk. */ +void +xfs_iflush_rtrmap( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_rtrmap_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + + ASSERT(ifp->if_broot != NULL); + ASSERT(ifp->if_broot_bytes > 0); + ASSERT(xfs_rtrmap_droot_space(ifp->if_broot) <= + xfs_inode_fork_size(ip, XFS_DATA_FORK)); + xfs_rtrmapbt_to_disk(ip->i_mount, ifp->if_broot, ifp->if_broot_bytes, + dfp, XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); +} diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h index 63e667d0d76..6917a31bfe0 100644 --- a/libxfs/xfs_rtrmap_btree.h +++ b/libxfs/xfs_rtrmap_btree.h @@ -27,6 +27,7 @@ void xfs_rtrmapbt_commit_staged_btree(struct xfs_btree_cur *cur, unsigned int xfs_rtrmapbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, bool leaf); void xfs_rtrmapbt_compute_maxlevels(struct xfs_mount *mp); +unsigned int xfs_rtrmapbt_droot_maxrecs(unsigned int blocklen, bool leaf); /* * Addresses of records, keys, and pointers within an incore rtrmapbt block. @@ -86,4 +87,115 @@ int xfs_rtrmapbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, xfs_filblks_t xfs_rtrmapbt_calc_reserves(struct xfs_mount *mp); +/* Addresses of key, pointers, and records within an ondisk rtrmapbt block. 
*/ + +static inline struct xfs_rmap_rec * +xfs_rtrmap_droot_rec_addr( + struct xfs_rtrmap_root *block, + unsigned int index) +{ + return (struct xfs_rmap_rec *) + ((char *)(block + 1) + + (index - 1) * sizeof(struct xfs_rmap_rec)); +} + +static inline struct xfs_rmap_key * +xfs_rtrmap_droot_key_addr( + struct xfs_rtrmap_root *block, + unsigned int index) +{ + return (struct xfs_rmap_key *) + ((char *)(block + 1) + + (index - 1) * 2 * sizeof(struct xfs_rmap_key)); +} + +static inline xfs_rtrmap_ptr_t * +xfs_rtrmap_droot_ptr_addr( + struct xfs_rtrmap_root *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrmap_ptr_t *) + ((char *)(block + 1) + + maxrecs * 2 * sizeof(struct xfs_rmap_key) + + (index - 1) * sizeof(xfs_rtrmap_ptr_t)); +} + +/* + * Address of pointers within the incore btree root. + * + * These are to be used when we know the size of the block and + * we don't have a cursor. + */ +static inline xfs_rtrmap_ptr_t * +xfs_rtrmap_broot_ptr_addr( + struct xfs_mount *mp, + struct xfs_btree_block *bb, + unsigned int index, + unsigned int block_size) +{ + return xfs_rtrmap_ptr_addr(bb, index, + xfs_rtrmapbt_maxrecs(mp, block_size, false)); +} + +/* + * Compute the space required for the incore btree root containing the given + * number of records. + */ +static inline size_t +xfs_rtrmap_broot_space_calc( + struct xfs_mount *mp, + unsigned int level, + unsigned int nrecs) +{ + size_t sz = XFS_RTRMAP_BLOCK_LEN; + + if (level > 0) + return sz + nrecs * (2 * sizeof(struct xfs_rmap_key) + + sizeof(xfs_rtrmap_ptr_t)); + return sz + nrecs * sizeof(struct xfs_rmap_rec); +} + +/* + * Compute the space required for the incore btree root given the ondisk + * btree root block. + */ +static inline size_t +xfs_rtrmap_broot_space(struct xfs_mount *mp, struct xfs_rtrmap_root *bb) +{ + return xfs_rtrmap_broot_space_calc(mp, be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +/* Compute the space required for the ondisk root block. 
*/ +static inline size_t +xfs_rtrmap_droot_space_calc( + unsigned int level, + unsigned int nrecs) +{ + size_t sz = sizeof(struct xfs_rtrmap_root); + + if (level > 0) + return sz + nrecs * (2 * sizeof(struct xfs_rmap_key) + + sizeof(xfs_rtrmap_ptr_t)); + return sz + nrecs * sizeof(struct xfs_rmap_rec); +} + +/* + * Compute the space required for the ondisk root block given an incore root + * block. + */ +static inline size_t +xfs_rtrmap_droot_space(struct xfs_btree_block *bb) +{ + return xfs_rtrmap_droot_space_calc(be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +int xfs_iformat_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip); +void xfs_rtrmapbt_to_disk(struct xfs_mount *mp, struct xfs_btree_block *rblock, + unsigned int rblocklen, struct xfs_rtrmap_root *dblock, + unsigned int dblocklen); +void xfs_iflush_rtrmap(struct xfs_inode *ip, struct xfs_dinode *dip); + #endif /* __XFS_RTRMAP_BTREE_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
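The root geometry helpers in the patch above are easier to check with concrete numbers. The sketch below restates xfs_rtrmapbt_droot_maxrecs, xfs_rtrmap_droot_space_calc and the offset arithmetic of xfs_rtrmap_droot_ptr_addr as plain userspace C. The byte sizes are assumptions on my part (a 4-byte root header from the two __be16 fields shown in the patch, 24-byte records, 20-byte keys, 8-byte pointers); please check them against xfs_format.h before relying on the results. The demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed on-disk sizes in bytes; verify against xfs_format.h. */
#define DEMO_ROOT_HDR_LEN	4	/* two __be16 fields: bb_level, bb_numrecs */
#define DEMO_RMAP_REC_LEN	24	/* assumed sizeof(struct xfs_rmap_rec) */
#define DEMO_RMAP_KEY_LEN	20	/* assumed sizeof(struct xfs_rmap_key) */
#define DEMO_RTRMAP_PTR_LEN	8	/* __be64 block pointer */

/* Mirrors xfs_rtrmapbt_droot_maxrecs(): entries that fit in the inode fork. */
static unsigned int demo_droot_maxrecs(unsigned int blocklen, int leaf)
{
	/* Like the original, this assumes blocklen is at least the header size. */
	blocklen -= DEMO_ROOT_HDR_LEN;

	if (leaf)
		return blocklen / DEMO_RMAP_REC_LEN;
	/* Overlapping btree: each node entry is a low key, a high key and a pointer. */
	return blocklen / (2 * DEMO_RMAP_KEY_LEN + DEMO_RTRMAP_PTR_LEN);
}

/* Mirrors xfs_rtrmap_droot_space_calc(): bytes occupied by an ondisk root. */
static size_t demo_droot_space_calc(unsigned int level, unsigned int nrecs)
{
	size_t sz = DEMO_ROOT_HDR_LEN;

	if (level > 0)
		return sz + nrecs * (2 * DEMO_RMAP_KEY_LEN + DEMO_RTRMAP_PTR_LEN);
	return sz + nrecs * DEMO_RMAP_REC_LEN;
}

/* Mirrors xfs_rtrmap_droot_ptr_addr(): byte offset of the 1-based pointer. */
static size_t demo_droot_ptr_offset(unsigned int index, unsigned int maxrecs)
{
	assert(index >= 1);
	return DEMO_ROOT_HDR_LEN +
	       (size_t)maxrecs * 2 * DEMO_RMAP_KEY_LEN +
	       (size_t)(index - 1) * DEMO_RTRMAP_PTR_LEN;
}
```

With these assumed sizes, a 176-byte data fork holds 7 leaf records or 3 node entries, and the first pointer of a node root sits at byte offset 124 no matter how many entries are live. Because the pointer array is placed after space for maxrecs keys, and maxrecs differs between the ondisk and incore roots, the pointers cannot be copied together with the keys; that seems to be the reason xfs_rtrmapbt_from_disk and xfs_rtrmapbt_to_disk copy keys and pointers separately.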
* [PATCH 14/41] xfs: report realtime rmap btree corruption errors to the health system 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (13 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 10/41] xfs: wire up a new inode fork type for the realtime rmap Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 18/41] xfs: create a shadow rmap btree during realtime rmap repair Darrick J. Wong ` (25 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Whenever we encounter corrupt realtime rmap btree blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_fs.h | 1 + libxfs/xfs_health.h | 4 +++- libxfs/xfs_inode_fork.c | 4 +++- libxfs/xfs_rtrmap_btree.c | 5 ++++- 4 files changed, 11 insertions(+), 3 deletions(-) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index 7e9d7d7bb40..5c557d5ff13 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -313,6 +313,7 @@ struct xfs_rtgroup_geometry { }; #define XFS_RTGROUP_GEOM_SICK_SUPER (1 << 0) /* superblock */ #define XFS_RTGROUP_GEOM_SICK_BITMAP (1 << 1) /* rtbitmap for this group */ +#define XFS_RTGROUP_GEOM_SICK_RMAPBT (1 << 2) /* reverse mappings */ /* * Structures for XFS_IOC_FSGROWFSDATA, XFS_IOC_FSGROWFSLOG & XFS_IOC_FSGROWFSRT diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h index 44137c4983f..d5976f6b0de 100644 --- a/libxfs/xfs_health.h +++ b/libxfs/xfs_health.h @@ -67,6 +67,7 @@ struct xfs_rtgroup; #define XFS_SICK_RT_BITMAP (1 << 0) /* realtime bitmap */ #define XFS_SICK_RT_SUMMARY (1 << 1) /* realtime summary */ #define XFS_SICK_RT_SUPER (1 << 2) /* rt group superblock */ +#define XFS_SICK_RT_RMAPBT (1 << 3) /* reverse mappings */ /* Observable health issues for AG metadata. 
*/ #define XFS_SICK_AG_SB (1 << 0) /* superblock */ @@ -104,7 +105,8 @@ struct xfs_rtgroup; #define XFS_SICK_RT_PRIMARY (XFS_SICK_RT_BITMAP | \ XFS_SICK_RT_SUMMARY | \ - XFS_SICK_RT_SUPER) + XFS_SICK_RT_SUPER | \ + XFS_SICK_RT_RMAPBT) #define XFS_SICK_AG_PRIMARY (XFS_SICK_AG_SB | \ XFS_SICK_AG_AGF | \ diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c index 386f23b2954..2b2a3fcab94 100644 --- a/libxfs/xfs_inode_fork.c +++ b/libxfs/xfs_inode_fork.c @@ -259,8 +259,10 @@ xfs_iformat_data_fork( case XFS_DINODE_FMT_BTREE: return xfs_iformat_btree(ip, dip, XFS_DATA_FORK); case XFS_DINODE_FMT_RMAP: - if (!xfs_has_rtrmapbt(ip->i_mount)) + if (!xfs_has_rtrmapbt(ip->i_mount)) { + xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE); return -EFSCORRUPTED; + } return xfs_iformat_rtrmap(ip, dip); default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c index 38f0b8567a6..b39ccba497a 100644 --- a/libxfs/xfs_rtrmap_btree.c +++ b/libxfs/xfs_rtrmap_btree.c @@ -25,6 +25,7 @@ #include "xfs_rtgroup.h" #include "xfs_bmap.h" #include "xfs_imeta.h" +#include "xfs_health.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -797,8 +798,10 @@ xfs_iformat_rtrmap( level = be16_to_cpu(dfp->bb_level); if (level > mp->m_rtrmap_maxlevels || - xfs_rtrmap_droot_space_calc(level, numrecs) > dsize) + xfs_rtrmap_droot_space_calc(level, numrecs) > dsize) { + xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE); return -EFSCORRUPTED; + } xfs_iroot_alloc(ip, XFS_DATA_FORK, xfs_rtrmap_broot_space_calc(mp, level, numrecs)); ^ permalink raw reply related [flat|nested] 565+ messages in thread
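As a rough illustration of how the new XFS_SICK_RT_RMAPBT bit composes with the existing realtime health flags, here is a small userspace sketch. The four bit values and the XFS_SICK_RT_PRIMARY mask are copied from the xfs_health.h hunk above; struct demo_rt_health and the demo_ helpers are hypothetical and do not correspond to the real health tracking functions. Note that the hunks shown in this patch mark the inode sick with XFS_SICK_INO_CORE; the sketch only demonstrates the mask arithmetic.

```c
#include <assert.h>

/* Bit values copied from the xfs_health.h hunk in this patch. */
#define XFS_SICK_RT_BITMAP	(1 << 0)	/* realtime bitmap */
#define XFS_SICK_RT_SUMMARY	(1 << 1)	/* realtime summary */
#define XFS_SICK_RT_SUPER	(1 << 2)	/* rt group superblock */
#define XFS_SICK_RT_RMAPBT	(1 << 3)	/* reverse mappings */

#define XFS_SICK_RT_PRIMARY	(XFS_SICK_RT_BITMAP | \
				 XFS_SICK_RT_SUMMARY | \
				 XFS_SICK_RT_SUPER | \
				 XFS_SICK_RT_RMAPBT)

/* Hypothetical tracker: which metadata has been checked, and which is sick. */
struct demo_rt_health {
	unsigned int	sick;
	unsigned int	checked;
};

/* Record that the metadata named by mask was examined and found corrupt. */
static void demo_mark_sick(struct demo_rt_health *h, unsigned int mask)
{
	assert((mask & ~XFS_SICK_RT_PRIMARY) == 0);
	h->sick |= mask;
	h->checked |= mask;
}

/* Record that the metadata named by mask was examined and found clean. */
static void demo_mark_healthy(struct demo_rt_health *h, unsigned int mask)
{
	assert((mask & ~XFS_SICK_RT_PRIMARY) == 0);
	h->sick &= ~mask;
	h->checked |= mask;
}

/* Nonzero if any primary realtime metadata is currently flagged sick. */
static int demo_has_primary_sickness(const struct demo_rt_health *h)
{
	return (h->sick & XFS_SICK_RT_PRIMARY) != 0;
}
```

Adding the new bit to the PRIMARY mask is what makes a corrupt rt rmap btree count as a primary metadata problem rather than a secondary one.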
* [PATCH 18/41] xfs: create a shadow rmap btree during realtime rmap repair 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (14 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 14/41] xfs: report realtime rmap btree corruption errors to the health system Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 21/41] xfs_db: support the realtime rmapbt Darrick J. Wong ` (24 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create an in-memory btree of rmap records instead of an array. This enables us to do live record collection instead of freezing the fs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfbtree.c | 2 + libxfs/xfs_btree.c | 2 + libxfs/xfs_btree.h | 1 libxfs/xfs_rmap.c | 6 ++ libxfs/xfs_rtrmap_btree.c | 122 +++++++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtrmap_btree.h | 9 +++ 6 files changed, 141 insertions(+), 1 deletion(-) diff --git a/libxfs/xfbtree.c b/libxfs/xfbtree.c index 6a2ef7cbe64..21a8ab46683 100644 --- a/libxfs/xfbtree.c +++ b/libxfs/xfbtree.c @@ -253,6 +253,8 @@ xfbtree_dup_cursor( if (cur->bc_mem.pag) ncur->bc_mem.pag = xfs_perag_bump(cur->bc_mem.pag); + if (cur->bc_mem.rtg) + ncur->bc_mem.rtg = xfs_rtgroup_bump(cur->bc_mem.rtg); return ncur; } diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c index a89a05555b8..49f6ce3661e 100644 --- a/libxfs/xfs_btree.c +++ b/libxfs/xfs_btree.c @@ -487,6 +487,8 @@ xfs_btree_del_cursor( if (cur->bc_flags & XFS_BTREE_IN_MEMORY) { if (cur->bc_mem.pag) xfs_perag_put(cur->bc_mem.pag); + if (cur->bc_mem.rtg) + xfs_rtgroup_put(cur->bc_mem.rtg); } kmem_cache_free(cur->bc_cache, cur); } diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h index 9f7b6fc5439..1d8656ca112 100644 --- a/libxfs/xfs_btree.h +++ b/libxfs/xfs_btree.h @@ -266,6 +266,7 @@ struct 
xfs_btree_cur_mem { struct xfbtree *xfbtree; struct xfs_buf *head_bp; struct xfs_perag *pag; + struct xfs_rtgroup *rtg; }; struct xfs_btree_level { diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c index 967a095c45d..5c118eb98b4 100644 --- a/libxfs/xfs_rmap.c +++ b/libxfs/xfs_rmap.c @@ -327,8 +327,12 @@ xfs_rmap_check_irec( struct xfs_btree_cur *cur, const struct xfs_rmap_irec *irec) { - if (cur->bc_btnum == XFS_BTNUM_RTRMAP) + if (cur->bc_btnum == XFS_BTNUM_RTRMAP) { + if (cur->bc_flags & XFS_BTREE_IN_MEMORY) + return xfs_rmap_check_rtgroup_irec(cur->bc_mem.rtg, + irec); return xfs_rmap_check_rtgroup_irec(cur->bc_ino.rtg, irec); + } if (cur->bc_flags & XFS_BTREE_IN_MEMORY) return xfs_rmap_check_perag_irec(cur->bc_mem.pag, irec); diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c index 05258472592..ea5b3db3b32 100644 --- a/libxfs/xfs_rtrmap_btree.c +++ b/libxfs/xfs_rtrmap_btree.c @@ -26,6 +26,9 @@ #include "xfs_bmap.h" #include "xfs_imeta.h" #include "xfs_health.h" +#include "xfile.h" +#include "xfbtree.h" +#include "xfs_btree_mem.h" static struct kmem_cache *xfs_rtrmapbt_cur_cache; @@ -554,6 +557,125 @@ xfs_rtrmapbt_stage_cursor( return cur; } +#ifdef CONFIG_XFS_IN_MEMORY_BTREE +/* + * Validate an in-memory realtime rmap btree block. Callers are allowed to + * generate an in-memory btree even if the ondisk feature is not enabled. 
+ */ +static xfs_failaddr_t +xfs_rtrmapbt_mem_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_mount; + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + xfs_failaddr_t fa; + unsigned int level; + + if (!xfs_verify_magic(bp, block->bb_magic)) + return __this_address; + + fa = xfs_btree_lblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN); + if (fa) + return fa; + + level = be16_to_cpu(block->bb_level); + if (xfs_has_rmapbt(mp)) { + if (level >= mp->m_rtrmap_maxlevels) + return __this_address; + } else { + if (level >= xfs_rtrmapbt_maxlevels_ondisk()) + return __this_address; + } + + return xfbtree_lblock_verify(bp, + xfs_rtrmapbt_maxrecs(mp, xfo_to_b(1), level == 0)); +} + +static void +xfs_rtrmapbt_mem_rw_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa = xfs_rtrmapbt_mem_verify(bp); + + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); +} + +/* skip crc checks on in-memory btrees to save time */ +static const struct xfs_buf_ops xfs_rtrmapbt_mem_buf_ops = { + .name = "xfs_rtrmapbt_mem", + .magic = { 0, cpu_to_be32(XFS_RTRMAP_CRC_MAGIC) }, + .verify_read = xfs_rtrmapbt_mem_rw_verify, + .verify_write = xfs_rtrmapbt_mem_rw_verify, + .verify_struct = xfs_rtrmapbt_mem_verify, +}; + +static const struct xfs_btree_ops xfs_rtrmapbt_mem_ops = { + .rec_len = sizeof(struct xfs_rmap_rec), + .key_len = 2 * sizeof(struct xfs_rmap_key), + .geom_flags = XFS_BTREE_CRC_BLOCKS | XFS_BTREE_OVERLAPPING | + XFS_BTREE_LONG_PTRS | XFS_BTREE_IN_MEMORY, + + .dup_cursor = xfbtree_dup_cursor, + .set_root = xfbtree_set_root, + .alloc_block = xfbtree_alloc_block, + .free_block = xfbtree_free_block, + .get_minrecs = xfbtree_get_minrecs, + .get_maxrecs = xfbtree_get_maxrecs, + .init_key_from_rec = xfs_rtrmapbt_init_key_from_rec, + .init_high_key_from_rec = xfs_rtrmapbt_init_high_key_from_rec, + .init_rec_from_cur = xfs_rtrmapbt_init_rec_from_cur, + .init_ptr_from_cur = xfbtree_init_ptr_from_cur, + .key_diff = xfs_rtrmapbt_key_diff, + .buf_ops = &xfs_rtrmapbt_mem_buf_ops, + 
.diff_two_keys = xfs_rtrmapbt_diff_two_keys, + .keys_inorder = xfs_rtrmapbt_keys_inorder, + .recs_inorder = xfs_rtrmapbt_recs_inorder, + .keys_contiguous = xfs_rtrmapbt_keys_contiguous, +}; + +/* Create a cursor for an in-memory btree. */ +struct xfs_btree_cur * +xfs_rtrmapbt_mem_cursor( + struct xfs_rtgroup *rtg, + struct xfs_trans *tp, + struct xfs_buf *head_bp, + struct xfbtree *xfbtree) +{ + struct xfs_btree_cur *cur; + struct xfs_mount *mp = rtg->rtg_mount; + + /* Overlapping btree; 2 keys per pointer. */ + cur = xfs_btree_alloc_cursor(mp, tp, XFS_BTNUM_RTRMAP, + &xfs_rtrmapbt_mem_ops, mp->m_rtrmap_maxlevels, + xfs_rtrmapbt_cur_cache); + cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_rmap_2); + cur->bc_mem.xfbtree = xfbtree; + cur->bc_mem.head_bp = head_bp; + cur->bc_nlevels = xfs_btree_mem_head_nlevels(head_bp); + + cur->bc_mem.rtg = xfs_rtgroup_bump(rtg); + return cur; +} + +int +xfs_rtrmapbt_mem_create( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + struct xfs_buftarg *target, + struct xfbtree **xfbtreep) +{ + struct xfbtree_config cfg = { + .btree_ops = &xfs_rtrmapbt_mem_ops, + .target = target, + .flags = XFBTREE_DIRECT_MAP, + .owner = rgno, + }; + + return xfbtree_create(mp, &cfg, xfbtreep); +} +#endif /* CONFIG_XFS_IN_MEMORY_BTREE */ + /* * Install a new rt reverse mapping btree root. Caller is responsible for * invalidating and freeing the old btree blocks. 
diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h index 1f0a6f9620e..ff60a2ca945 100644 --- a/libxfs/xfs_rtrmap_btree.h +++ b/libxfs/xfs_rtrmap_btree.h @@ -206,4 +206,13 @@ int xfs_rtrmapbt_create(struct xfs_trans **tpp, struct xfs_imeta_path *path, unsigned long long xfs_rtrmapbt_calc_size(struct xfs_mount *mp, unsigned long long len); +#ifdef CONFIG_XFS_IN_MEMORY_BTREE +struct xfbtree; +struct xfs_btree_cur *xfs_rtrmapbt_mem_cursor(struct xfs_rtgroup *rtg, + struct xfs_trans *tp, struct xfs_buf *mhead_bp, + struct xfbtree *xfbtree); +int xfs_rtrmapbt_mem_create(struct xfs_mount *mp, xfs_rgnumber_t rgno, + struct xfs_buftarg *target, struct xfbtree **xfbtreep); +#endif /* CONFIG_XFS_IN_MEMORY_BTREE */ + #endif /* __XFS_RTRMAP_BTREE_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
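The patch above has the in-memory cursor take a reference on the rtgroup when it is created (xfs_rtrmapbt_mem_cursor), take another when the cursor is duplicated (xfbtree_dup_cursor), and drop one when a cursor is deleted (xfs_btree_del_cursor). A minimal sketch of that pairing follows. All demo_ types and functions are hypothetical, the reference count is a plain int with no locking or atomics, and none of this reflects how xfs_rtgroup_bump or xfs_rtgroup_put are actually implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real structures carry far more state. */
struct demo_rtgroup {
	int			refcount;
};

struct demo_cursor {
	struct demo_rtgroup	*rtg;
};

/* Take an extra reference on a group the caller already holds. */
static struct demo_rtgroup *demo_rtgroup_bump(struct demo_rtgroup *rtg)
{
	assert(rtg->refcount > 0);
	rtg->refcount++;
	return rtg;
}

/* Drop one reference. */
static void demo_rtgroup_put(struct demo_rtgroup *rtg)
{
	assert(rtg->refcount > 0);
	rtg->refcount--;
}

/* Creating a cursor pins the group for the cursor's lifetime. */
static struct demo_cursor *demo_cursor_create(struct demo_rtgroup *rtg)
{
	struct demo_cursor *cur = calloc(1, sizeof(*cur));

	if (!cur)
		return NULL;
	cur->rtg = demo_rtgroup_bump(rtg);
	return cur;
}

/* A duplicate cursor needs its own reference, since it is deleted separately. */
static struct demo_cursor *demo_cursor_dup(const struct demo_cursor *cur)
{
	struct demo_cursor *ncur = calloc(1, sizeof(*ncur));

	if (!ncur)
		return NULL;
	if (cur->rtg)
		ncur->rtg = demo_rtgroup_bump(cur->rtg);
	return ncur;
}

/* Deleting a cursor releases exactly the reference that it took. */
static void demo_cursor_del(struct demo_cursor *cur)
{
	if (cur->rtg)
		demo_rtgroup_put(cur->rtg);
	free(cur);
}
```

The point of the sketch is only that every path that copies the rtg pointer into a cursor must be matched by a put on deletion, which is what the three small hunks in xfbtree.c, xfs_btree.c and xfs_rtrmap_btree.c appear to establish together.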
* [PATCH 21/41] xfs_db: support the realtime rmapbt 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (15 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 18/41] xfs: create a shadow rmap btree during realtime rmap repair Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 19/41] xfs: hook live realtime rmap operations during a repair operation Darrick J. Wong ` (23 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Wire up various parts of xfs_db for realtime rmap support. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/btblock.c | 3 ++ db/btdump.c | 64 ++++++++++++++++++++++++++++++++++++++++++++++ db/btheight.c | 5 ++++ libxfs/libxfs_api_defs.h | 1 + man/man8/xfs_db.8 | 3 +- 5 files changed, 75 insertions(+), 1 deletion(-) diff --git a/db/btblock.c b/db/btblock.c index 5cad166278d..70f6c3f6aed 100644 --- a/db/btblock.c +++ b/db/btblock.c @@ -147,6 +147,9 @@ block_to_bt( case TYP_RMAPBT: magic = crc ? XFS_RMAP_CRC_MAGIC : 0; break; + case TYP_RTRMAPBT: + magic = crc ? XFS_RTRMAP_CRC_MAGIC : 0; + break; case TYP_REFCBT: magic = crc ? 
XFS_REFC_CRC_MAGIC : 0; break; diff --git a/db/btdump.c b/db/btdump.c index 81642cde2b6..9c528e5a11a 100644 --- a/db/btdump.c +++ b/db/btdump.c @@ -441,6 +441,67 @@ dump_dabtree( return ret; } +static bool +is_btree_inode(void) +{ + struct xfs_dinode *dip; + + dip = iocur_top->data; + return dip->di_format == XFS_DINODE_FMT_RMAP; +} + +static int +dump_btree_inode( + bool dump_node_blocks) +{ + char *prefix; + struct xfs_dinode *dip; + struct xfs_rtrmap_root *rtrmap; + int level; + int numrecs; + int ret; + + dip = iocur_top->data; + switch (dip->di_format) { + case XFS_DINODE_FMT_RMAP: + prefix = "u3.rtrmapbt"; + rtrmap = (struct xfs_rtrmap_root *)XFS_DFORK_DPTR(dip); + level = be16_to_cpu(rtrmap->bb_level); + numrecs = be16_to_cpu(rtrmap->bb_numrecs); + break; + default: + dbprintf("Unknown metadata inode type %u\n", dip->di_format); + return 0; + } + + if (numrecs == 0) + return 0; + if (level > 0) { + if (dump_node_blocks) { + ret = eval("print %s.keys", prefix); + if (ret) + goto err; + ret = eval("print %s.ptrs", prefix); + if (ret) + goto err; + } + ret = eval("addr %s.ptrs[1]", prefix); + if (ret) + goto err; + ret = dump_btree_long(dump_node_blocks); + } else { + ret = eval("print %s.recs", prefix); + } + if (ret) + goto err; + + ret = eval("pop"); + return ret; +err: + eval("pop"); + return ret; +} + static int btdump_f( int argc, @@ -488,8 +549,11 @@ btdump_f( return dump_btree_short(iflag); case TYP_BMAPBTA: case TYP_BMAPBTD: + case TYP_RTRMAPBT: return dump_btree_long(iflag); case TYP_INODE: + if (is_btree_inode()) + return dump_btree_inode(iflag); return dump_inode(iflag, aflag); case TYP_ATTR: return dump_dabtree(iflag, crc ? 
&attr3_print : &attr_print); diff --git a/db/btheight.c b/db/btheight.c index 6643489c82c..25ce3400334 100644 --- a/db/btheight.c +++ b/db/btheight.c @@ -53,6 +53,11 @@ struct btmap { .maxlevels = libxfs_rmapbt_maxlevels_ondisk, .maxrecs = libxfs_rmapbt_maxrecs, }, + { + .tag = "rtrmapbt", + .maxlevels = libxfs_rtrmapbt_maxlevels_ondisk, + .maxrecs = libxfs_rtrmapbt_maxrecs, + }, }; static void diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index ae92c909265..0e284e515d8 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -245,6 +245,7 @@ #define xfs_rtgroup_update_super libxfs_rtgroup_update_super #define xfs_rtrmapbt_create_path libxfs_rtrmapbt_create_path #define xfs_rtrmapbt_droot_maxrecs libxfs_rtrmapbt_droot_maxrecs +#define xfs_rtrmapbt_maxlevels_ondisk libxfs_rtrmapbt_maxlevels_ondisk #define xfs_rtrmapbt_maxrecs libxfs_rtrmapbt_maxrecs #define xfs_sb_from_disk libxfs_sb_from_disk diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8 index 92d22cbc15f..2efa45297db 100644 --- a/man/man8/xfs_db.8 +++ b/man/man8/xfs_db.8 @@ -454,8 +454,9 @@ The supported btree types are: .IR finobt , .IR bmapbt , .IR refcountbt , +.IR rmapbt , and -.IR rmapbt . +.IR rtrmapbt . The magic value .I all can be used to walk through all btree types. ^ permalink raw reply related [flat|nested] 565+ messages in thread
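The block_to_bt hunk above selects XFS_RTRMAP_CRC_MAGIC only when CRCs are enabled, and an earlier patch in this series defines that value as 0x4d415052 with the comment 'MAPR'. The short sketch below shows how the numeric constant maps to those four characters when written most significant byte first, which is how a big-endian ondisk field would read in a hex dump. The demo_ names are hypothetical; only the magic value itself comes from the patches.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_RTRMAP_CRC_MAGIC	0x4d415052U	/* value from the xfs_format.h hunk */

/* Mirrors the selection in block_to_bt(): no magic without CRCs. */
static uint32_t demo_expected_magic(int crc_enabled)
{
	return crc_enabled ? DEMO_RTRMAP_CRC_MAGIC : 0;
}

/* Spell out the magic in big-endian byte order as a NUL-terminated string. */
static void demo_magic_to_string(uint32_t magic, char out[5])
{
	out[0] = (char)((magic >> 24) & 0xff);
	out[1] = (char)((magic >> 16) & 0xff);
	out[2] = (char)((magic >> 8) & 0xff);
	out[3] = (char)(magic & 0xff);
	out[4] = '\0';
}
```

Passing demo_expected_magic(1) through demo_magic_to_string yields the string "MAPR", matching the comment next to the constant.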
* [PATCH 19/41] xfs: hook live realtime rmap operations during a repair operation 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (16 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 21/41] xfs_db: support the realtime rmapbt Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 20/41] xfs_db: display the realtime rmap btree contents Darrick J. Wong ` (22 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Hook the regular realtime rmap code when an rtrmapbt repair operation is running so that we can unlock the AGF buffer to scan the filesystem and keep the in-memory btree up to date during the scan. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rmap.c | 39 ++++++++++++++++++++++++++++++++++----- libxfs/xfs_rmap.h | 6 ++++++ libxfs/xfs_rtgroup.c | 2 +- libxfs/xfs_rtgroup.h | 3 +++ 4 files changed, 44 insertions(+), 6 deletions(-) diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c index 5c118eb98b4..a8ba49a89cf 100644 --- a/libxfs/xfs_rmap.c +++ b/libxfs/xfs_rmap.c @@ -905,6 +905,7 @@ static inline void xfs_rmap_update_hook( struct xfs_trans *tp, struct xfs_perag *pag, + struct xfs_rtgroup *rtg, enum xfs_rmap_intent_type op, xfs_agblock_t startblock, xfs_extlen_t blockcount, @@ -921,6 +922,8 @@ xfs_rmap_update_hook( if (pag) xfs_hooks_call(&pag->pag_rmap_update_hooks, op, &p); + else if (rtg) + xfs_hooks_call(&rtg->rtg_rmap_update_hooks, op, &p); } } @@ -941,8 +944,28 @@ xfs_rmap_hook_del( { xfs_hooks_del(&pag->pag_rmap_update_hooks, &hook->update_hook); } + +# ifdef CONFIG_XFS_RT +/* Call the specified function during a rt reverse mapping update. 
*/ +int +xfs_rtrmap_hook_add( + struct xfs_rtgroup *rtg, + struct xfs_rmap_hook *hook) +{ + return xfs_hooks_add(&rtg->rtg_rmap_update_hooks, &hook->update_hook); +} + +/* Stop calling the specified function during a rt reverse mapping update. */ +void +xfs_rtrmap_hook_del( + struct xfs_rtgroup *rtg, + struct xfs_rmap_hook *hook) +{ + xfs_hooks_del(&rtg->rtg_rmap_update_hooks, &hook->update_hook); +} +# endif /* CONFIG_XFS_RT */ #else -# define xfs_rmap_update_hook(t, p, o, s, b, u, oi) do { } while(0) +# define xfs_rmap_update_hook(t, p, r, o, s, b, u, oi) do { } while(0) #endif /* CONFIG_XFS_LIVE_HOOKS */ /* @@ -965,7 +988,8 @@ xfs_rmap_free( return 0; cur = xfs_rmapbt_init_cursor(mp, tp, agbp, pag); - xfs_rmap_update_hook(tp, pag, XFS_RMAP_UNMAP, bno, len, false, oinfo); + xfs_rmap_update_hook(tp, pag, NULL, XFS_RMAP_UNMAP, bno, len, false, + oinfo); error = xfs_rmap_unmap(cur, bno, len, false, oinfo); xfs_btree_del_cursor(cur, error); @@ -1209,7 +1233,8 @@ xfs_rmap_alloc( return 0; cur = xfs_rmapbt_init_cursor(mp, tp, agbp, pag); - xfs_rmap_update_hook(tp, pag, XFS_RMAP_MAP, bno, len, false, oinfo); + xfs_rmap_update_hook(tp, pag, NULL, XFS_RMAP_MAP, bno, len, false, + oinfo); error = xfs_rmap_map(cur, bno, len, false, oinfo); xfs_btree_del_cursor(cur, error); @@ -2730,8 +2755,12 @@ xfs_rmap_finish_one( if (error) return error; - xfs_rmap_update_hook(tp, ri->ri_pag, ri->ri_type, bno, - ri->ri_bmap.br_blockcount, unwritten, &oinfo); + if (ri->ri_realtime) + xfs_rmap_update_hook(tp, NULL, ri->ri_rtg, ri->ri_type, bno, + ri->ri_bmap.br_blockcount, unwritten, &oinfo); + else + xfs_rmap_update_hook(tp, ri->ri_pag, NULL, ri->ri_type, bno, + ri->ri_bmap.br_blockcount, unwritten, &oinfo); return 0; } diff --git a/libxfs/xfs_rmap.h b/libxfs/xfs_rmap.h index 9d0aaa16f55..36d071b3b44 100644 --- a/libxfs/xfs_rmap.h +++ b/libxfs/xfs_rmap.h @@ -279,6 +279,12 @@ void xfs_rmap_hook_enable(void); int xfs_rmap_hook_add(struct xfs_perag *pag, struct xfs_rmap_hook *hook); void 
xfs_rmap_hook_del(struct xfs_perag *pag, struct xfs_rmap_hook *hook); + +# ifdef CONFIG_XFS_RT +int xfs_rtrmap_hook_add(struct xfs_rtgroup *rtg, struct xfs_rmap_hook *hook); +void xfs_rtrmap_hook_del(struct xfs_rtgroup *rtg, struct xfs_rmap_hook *hook); +# endif /* CONFIG_XFS_RT */ + #endif #endif /* __XFS_RMAP_H__ */ diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c index 8c41869a61a..f5f981609b4 100644 --- a/libxfs/xfs_rtgroup.c +++ b/libxfs/xfs_rtgroup.c @@ -130,7 +130,7 @@ xfs_initialize_rtgroups( /* Place kernel structure only init below this point. */ spin_lock_init(&rtg->rtg_state_lock); xfs_drain_init(&rtg->rtg_intents); - + xfs_hooks_init(&rtg->rtg_rmap_update_hooks); #endif /* __KERNEL__ */ /* first new rtg is fully initialized */ diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h index 1d41a2cac34..4e9b9098f2f 100644 --- a/libxfs/xfs_rtgroup.h +++ b/libxfs/xfs_rtgroup.h @@ -46,6 +46,9 @@ struct xfs_rtgroup { * inconsistencies. */ struct xfs_drain rtg_intents; + + /* Hook to feed rt rmapbt updates to an active online repair. */ + struct xfs_hooks rtg_rmap_update_hooks; #endif /* __KERNEL__ */ }; ^ permalink raw reply related [flat|nested] 565+ messages in thread
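The dispatch that patch 19 adds to xfs_rmap_update_hook() can be sketched in isolation. This is a minimal userspace model, not the kernel code: every demo_* name is hypothetical, and the fixed-size callback array only stands in for the real xfs_hooks machinery, whose internals are not shown in this thread. It illustrates the one rule the patch introduces: the caller passes either a perag or an rtgroup, and the update goes to that group's hook chain.

```c
#include <stddef.h>

/* Hypothetical stand-ins; this is NOT the real xfs_hooks API. */
enum demo_rmap_op { DEMO_RMAP_MAP, DEMO_RMAP_UNMAP };

struct demo_rmap_update {
	unsigned int	startblock;
	unsigned int	blockcount;
};

typedef void (*demo_hook_fn)(enum demo_rmap_op op,
			     const struct demo_rmap_update *p, void *priv);

#define DEMO_MAX_HOOKS	4

struct demo_hooks {
	demo_hook_fn	fn[DEMO_MAX_HOOKS];
	void		*priv[DEMO_MAX_HOOKS];
	int		nr;
};

/* One hook chain per allocation group and one per realtime group. */
struct demo_perag   { struct demo_hooks rmap_update_hooks; };
struct demo_rtgroup { struct demo_hooks rmap_update_hooks; };

/* Register a callback; returns 0 on success, -1 if the chain is full. */
static int demo_hooks_add(struct demo_hooks *h, demo_hook_fn fn, void *priv)
{
	if (h->nr >= DEMO_MAX_HOOKS)
		return -1;
	h->fn[h->nr] = fn;
	h->priv[h->nr] = priv;
	h->nr++;
	return 0;
}

/* Invoke every registered callback in registration order. */
static void demo_hooks_call(struct demo_hooks *h, enum demo_rmap_op op,
			    const struct demo_rmap_update *p)
{
	for (int i = 0; i < h->nr; i++)
		h->fn[i](op, p, h->priv[i]);
}

/* Route the update to the perag chain if pag is set, else to the rtgroup. */
static void demo_rmap_update_hook(struct demo_perag *pag,
				  struct demo_rtgroup *rtg,
				  enum demo_rmap_op op,
				  unsigned int startblock,
				  unsigned int blockcount)
{
	struct demo_rmap_update p = {
		.startblock	= startblock,
		.blockcount	= blockcount,
	};

	if (pag)
		demo_hooks_call(&pag->rmap_update_hooks, op, &p);
	else if (rtg)
		demo_hooks_call(&rtg->rmap_update_hooks, op, &p);
}

/* Example callback: keep a running total of mapped blocks. */
static void demo_count_blocks(enum demo_rmap_op op,
			      const struct demo_rmap_update *p, void *priv)
{
	long *total = priv;

	if (op == DEMO_RMAP_MAP)
		*total += p->blockcount;
	else
		*total -= p->blockcount;
}
```

A repair-style consumer would register demo_count_blocks on the rtgroup chain only; updates routed through the perag chain then never reach it, which is the separation the extra rtg argument provides.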
* [PATCH 20/41] xfs_db: display the realtime rmap btree contents 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (17 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 19/41] xfs: hook live realtime rmap operations during a repair operation Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 16/41] xfs: scrub the realtime rmapbt Darrick J. Wong ` (21 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Implement all the code we need to dump rtrmapbt contents, starting from the root inode. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/bmroot.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++ db/bmroot.h | 2 + db/btblock.c | 100 +++++++++++++++++++++++++++++++ db/btblock.h | 5 ++ db/field.c | 11 +++ db/field.h | 5 ++ db/inode.c | 90 +++++++++++++++++++++++++++- db/inode.h | 3 + db/type.c | 5 ++ db/type.h | 1 libxfs/libxfs_api_defs.h | 5 ++ man/man8/xfs_db.8 | 60 ++++++++++++++++++- 12 files changed, 432 insertions(+), 4 deletions(-) diff --git a/db/bmroot.c b/db/bmroot.c index 7ef07da181e..19490bd2499 100644 --- a/db/bmroot.c +++ b/db/bmroot.c @@ -24,6 +24,13 @@ static int bmrootd_key_offset(void *obj, int startoff, int idx); static int bmrootd_ptr_count(void *obj, int startoff); static int bmrootd_ptr_offset(void *obj, int startoff, int idx); +static int rtrmaproot_rec_count(void *obj, int startoff); +static int rtrmaproot_rec_offset(void *obj, int startoff, int idx); +static int rtrmaproot_key_count(void *obj, int startoff); +static int rtrmaproot_key_offset(void *obj, int startoff, int idx); +static int rtrmaproot_ptr_count(void *obj, int startoff); +static int rtrmaproot_ptr_offset(void *obj, int startoff, int idx); + #define OFF(f) bitize(offsetof(xfs_bmdr_block_t, bb_ ## f)) const field_t bmroota_flds[] = { { "level", 
FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE }, @@ -54,6 +61,20 @@ const field_t bmrootd_key_flds[] = { { NULL } }; +/* realtime rmap btree root */ +const field_t rtrmaproot_flds[] = { + { "level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE }, + { "numrecs", FLDT_UINT16D, OI(OFF(numrecs)), C1, 0, TYP_NONE }, + { "recs", FLDT_RTRMAPBTREC, rtrmaproot_rec_offset, rtrmaproot_rec_count, + FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE }, + { "keys", FLDT_RTRMAPBTKEY, rtrmaproot_key_offset, rtrmaproot_key_count, + FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE }, + { "ptrs", FLDT_RTRMAPBTPTR, rtrmaproot_ptr_offset, rtrmaproot_ptr_count, + FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_RTRMAPBT }, + { NULL } +}; +#undef OFF + static int bmroota_key_count( void *obj, @@ -241,3 +262,131 @@ bmrootd_size( dip = obj; return bitize((int)XFS_DFORK_DSIZE(dip, mp)); } + +/* realtime rmap root */ +static int +rtrmaproot_rec_count( + void *obj, + int startoff) +{ + struct xfs_rtrmap_root *block; +#ifdef DEBUG + struct xfs_dinode *dip = obj; +#endif + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + block = (struct xfs_rtrmap_root *)((char *)obj + byteize(startoff)); + ASSERT((char *)block == XFS_DFORK_DPTR(dip)); + if (be16_to_cpu(block->bb_level) > 0) + return 0; + return be16_to_cpu(block->bb_numrecs); +} + +static int +rtrmaproot_rec_offset( + void *obj, + int startoff, + int idx) +{ + struct xfs_rtrmap_root *block; + struct xfs_rmap_rec *kp; + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + block = (struct xfs_rtrmap_root *)((char *)obj + byteize(startoff)); + ASSERT(be16_to_cpu(block->bb_level) == 0); + kp = xfs_rtrmap_droot_rec_addr(block, idx); + return bitize((int)((char *)kp - (char *)block)); +} + +static int +rtrmaproot_key_count( + void *obj, + int startoff) +{ + struct xfs_rtrmap_root *block; +#ifdef DEBUG + struct xfs_dinode *dip = obj; +#endif + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); 
+ block = (struct xfs_rtrmap_root *)((char *)obj + byteize(startoff)); + ASSERT((char *)block == XFS_DFORK_DPTR(dip)); + if (be16_to_cpu(block->bb_level) == 0) + return 0; + return be16_to_cpu(block->bb_numrecs); +} + +static int +rtrmaproot_key_offset( + void *obj, + int startoff, + int idx) +{ + struct xfs_rtrmap_root *block; + struct xfs_rmap_key *kp; + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + block = (struct xfs_rtrmap_root *)((char *)obj + byteize(startoff)); + ASSERT(be16_to_cpu(block->bb_level) > 0); + kp = xfs_rtrmap_droot_key_addr(block, idx); + return bitize((int)((char *)kp - (char *)block)); +} + +static int +rtrmaproot_ptr_count( + void *obj, + int startoff) +{ + struct xfs_rtrmap_root *block; +#ifdef DEBUG + struct xfs_dinode *dip = obj; +#endif + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + block = (struct xfs_rtrmap_root *)((char *)obj + byteize(startoff)); + ASSERT((char *)block == XFS_DFORK_DPTR(dip)); + if (be16_to_cpu(block->bb_level) == 0) + return 0; + return be16_to_cpu(block->bb_numrecs); +} + +static int +rtrmaproot_ptr_offset( + void *obj, + int startoff, + int idx) +{ + struct xfs_rtrmap_root *block; + xfs_rtrmap_ptr_t *pp; + struct xfs_dinode *dip; + int dmxr; + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + dip = obj; + block = (struct xfs_rtrmap_root *)((char *)obj + byteize(startoff)); + ASSERT(be16_to_cpu(block->bb_level) > 0); + dmxr = libxfs_rtrmapbt_droot_maxrecs(XFS_DFORK_DSIZE(dip, mp), false); + pp = xfs_rtrmap_droot_ptr_addr(block, idx, dmxr); + return bitize((int)((char *)pp - (char *)block)); +} + +int +rtrmaproot_size( + void *obj, + int startoff, + int idx) +{ + struct xfs_dinode *dip; + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + ASSERT(idx == 0); + dip = obj; + return bitize((int)XFS_DFORK_DSIZE(dip, mp)); +} diff --git a/db/bmroot.h b/db/bmroot.h index a1274cf6a94..a2c5cfb18f0 100644 --- a/db/bmroot.h +++ b/db/bmroot.h 
@@ -8,6 +8,8 @@ extern const struct field bmroota_flds[]; extern const struct field bmroota_key_flds[]; extern const struct field bmrootd_flds[]; extern const struct field bmrootd_key_flds[]; +extern const struct field rtrmaproot_flds[]; extern int bmroota_size(void *obj, int startoff, int idx); extern int bmrootd_size(void *obj, int startoff, int idx); +extern int rtrmaproot_size(void *obj, int startoff, int idx); diff --git a/db/btblock.c b/db/btblock.c index d5be6adb734..5cad166278d 100644 --- a/db/btblock.c +++ b/db/btblock.c @@ -92,6 +92,12 @@ static struct xfs_db_btree { sizeof(struct xfs_rmap_rec), sizeof(__be32), }, + { XFS_RTRMAP_CRC_MAGIC, + XFS_BTREE_LBLOCK_CRC_LEN, + 2 * sizeof(struct xfs_rmap_key), + sizeof(struct xfs_rmap_rec), + sizeof(__be64), + }, { XFS_REFC_CRC_MAGIC, XFS_BTREE_SBLOCK_CRC_LEN, sizeof(struct xfs_refcount_key), @@ -813,6 +819,100 @@ const field_t rmapbt_rec_flds[] = { { NULL } }; +/* realtime RMAP btree blocks */ +const field_t rtrmapbt_crc_hfld[] = { + { "", FLDT_RTRMAPBT_CRC, OI(0), C1, 0, TYP_NONE }, + { NULL } +}; + +#define OFF(f) bitize(offsetof(struct xfs_btree_block, bb_ ## f)) +const field_t rtrmapbt_crc_flds[] = { + { "magic", FLDT_UINT32X, OI(OFF(magic)), C1, 0, TYP_NONE }, + { "level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE }, + { "numrecs", FLDT_UINT16D, OI(OFF(numrecs)), C1, 0, TYP_NONE }, + { "leftsib", FLDT_DFSBNO, OI(OFF(u.l.bb_leftsib)), C1, 0, TYP_RTRMAPBT }, + { "rightsib", FLDT_DFSBNO, OI(OFF(u.l.bb_rightsib)), C1, 0, TYP_RTRMAPBT }, + { "bno", FLDT_DFSBNO, OI(OFF(u.l.bb_blkno)), C1, 0, TYP_RTRMAPBT }, + { "lsn", FLDT_UINT64X, OI(OFF(u.l.bb_lsn)), C1, 0, TYP_NONE }, + { "uuid", FLDT_UUID, OI(OFF(u.l.bb_uuid)), C1, 0, TYP_NONE }, + { "owner", FLDT_INO, OI(OFF(u.l.bb_owner)), C1, 0, TYP_NONE }, + { "crc", FLDT_CRC, OI(OFF(u.l.bb_crc)), C1, 0, TYP_NONE }, + { "recs", FLDT_RTRMAPBTREC, btblock_rec_offset, btblock_rec_count, + FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE }, + { "keys", FLDT_RTRMAPBTKEY, 
btblock_key_offset, btblock_key_count, + FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE }, + { "ptrs", FLDT_RTRMAPBTPTR, btblock_ptr_offset, btblock_key_count, + FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_RTRMAPBT }, + { NULL } +}; +#undef OFF + +#define KOFF(f) bitize(offsetof(struct xfs_rmap_key, rm_ ## f)) + +#define RTRMAPBK_STARTBLOCK_BITOFF 0 +#define RTRMAPBK_OWNER_BITOFF (RTRMAPBK_STARTBLOCK_BITOFF + RMAPBT_STARTBLOCK_BITLEN) +#define RTRMAPBK_ATTRFLAG_BITOFF (RTRMAPBK_OWNER_BITOFF + RMAPBT_OWNER_BITLEN) +#define RTRMAPBK_BMBTFLAG_BITOFF (RTRMAPBK_ATTRFLAG_BITOFF + RMAPBT_ATTRFLAG_BITLEN) +#define RTRMAPBK_EXNTFLAG_BITOFF (RTRMAPBK_BMBTFLAG_BITOFF + RMAPBT_BMBTFLAG_BITLEN) +#define RTRMAPBK_UNUSED_OFFSET_BITOFF (RTRMAPBK_EXNTFLAG_BITOFF + RMAPBT_EXNTFLAG_BITLEN) +#define RTRMAPBK_OFFSET_BITOFF (RTRMAPBK_UNUSED_OFFSET_BITOFF + RMAPBT_UNUSED_OFFSET_BITLEN) + +#define HI_KOFF(f) bitize(sizeof(struct xfs_rmap_key) + offsetof(struct xfs_rmap_key, rm_ ## f)) + +#define RTRMAPBK_STARTBLOCKHI_BITOFF (bitize(sizeof(struct xfs_rmap_key))) +#define RTRMAPBK_OWNERHI_BITOFF (RTRMAPBK_STARTBLOCKHI_BITOFF + RMAPBT_STARTBLOCK_BITLEN) +#define RTRMAPBK_ATTRFLAGHI_BITOFF (RTRMAPBK_OWNERHI_BITOFF + RMAPBT_OWNER_BITLEN) +#define RTRMAPBK_BMBTFLAGHI_BITOFF (RTRMAPBK_ATTRFLAGHI_BITOFF + RMAPBT_ATTRFLAG_BITLEN) +#define RTRMAPBK_EXNTFLAGHI_BITOFF (RTRMAPBK_BMBTFLAGHI_BITOFF + RMAPBT_BMBTFLAG_BITLEN) +#define RTRMAPBK_UNUSED_OFFSETHI_BITOFF (RTRMAPBK_EXNTFLAGHI_BITOFF + RMAPBT_EXNTFLAG_BITLEN) +#define RTRMAPBK_OFFSETHI_BITOFF (RTRMAPBK_UNUSED_OFFSETHI_BITOFF + RMAPBT_UNUSED_OFFSET_BITLEN) + +const field_t rtrmapbt_key_flds[] = { + { "startblock", FLDT_RGBLOCK, OI(KOFF(startblock)), C1, 0, TYP_DATA }, + { "owner", FLDT_INT64D, OI(KOFF(owner)), C1, 0, TYP_NONE }, + { "offset", FLDT_RFILEOFFD, OI(RTRMAPBK_OFFSET_BITOFF), C1, 0, TYP_NONE }, + { "attrfork", FLDT_RATTRFORKFLG, OI(RTRMAPBK_ATTRFLAG_BITOFF), C1, 0, + TYP_NONE }, + { "bmbtblock", FLDT_RBMBTFLG, 
OI(RTRMAPBK_BMBTFLAG_BITOFF), C1, 0, + TYP_NONE }, + { "startblock_hi", FLDT_RGBLOCK, OI(HI_KOFF(startblock)), C1, 0, TYP_DATA }, + { "owner_hi", FLDT_INT64D, OI(HI_KOFF(owner)), C1, 0, TYP_NONE }, + { "offset_hi", FLDT_RFILEOFFD, OI(RTRMAPBK_OFFSETHI_BITOFF), C1, 0, TYP_NONE }, + { "attrfork_hi", FLDT_RATTRFORKFLG, OI(RTRMAPBK_ATTRFLAGHI_BITOFF), C1, 0, + TYP_NONE }, + { "bmbtblock_hi", FLDT_RBMBTFLG, OI(RTRMAPBK_BMBTFLAGHI_BITOFF), C1, 0, + TYP_NONE }, + { NULL } +}; +#undef HI_KOFF +#undef KOFF + +#define ROFF(f) bitize(offsetof(struct xfs_rmap_rec, rm_ ## f)) + +#define RTRMAPBT_STARTBLOCK_BITOFF 0 +#define RTRMAPBT_BLOCKCOUNT_BITOFF (RTRMAPBT_STARTBLOCK_BITOFF + RMAPBT_STARTBLOCK_BITLEN) +#define RTRMAPBT_OWNER_BITOFF (RTRMAPBT_BLOCKCOUNT_BITOFF + RMAPBT_BLOCKCOUNT_BITLEN) +#define RTRMAPBT_ATTRFLAG_BITOFF (RTRMAPBT_OWNER_BITOFF + RMAPBT_OWNER_BITLEN) +#define RTRMAPBT_BMBTFLAG_BITOFF (RTRMAPBT_ATTRFLAG_BITOFF + RMAPBT_ATTRFLAG_BITLEN) +#define RTRMAPBT_EXNTFLAG_BITOFF (RTRMAPBT_BMBTFLAG_BITOFF + RMAPBT_BMBTFLAG_BITLEN) +#define RTRMAPBT_UNUSED_OFFSET_BITOFF (RTRMAPBT_EXNTFLAG_BITOFF + RMAPBT_EXNTFLAG_BITLEN) +#define RTRMAPBT_OFFSET_BITOFF (RTRMAPBT_UNUSED_OFFSET_BITOFF + RMAPBT_UNUSED_OFFSET_BITLEN) + +const field_t rtrmapbt_rec_flds[] = { + { "startblock", FLDT_RGBLOCK, OI(RTRMAPBT_STARTBLOCK_BITOFF), C1, 0, TYP_DATA }, + { "blockcount", FLDT_EXTLEN, OI(RTRMAPBT_BLOCKCOUNT_BITOFF), C1, 0, TYP_NONE }, + { "owner", FLDT_INT64D, OI(RTRMAPBT_OWNER_BITOFF), C1, 0, TYP_NONE }, + { "offset", FLDT_RFILEOFFD, OI(RTRMAPBT_OFFSET_BITOFF), C1, 0, TYP_NONE }, + { "extentflag", FLDT_REXTFLG, OI(RTRMAPBT_EXNTFLAG_BITOFF), C1, 0, + TYP_NONE }, + { "attrfork", FLDT_RATTRFORKFLG, OI(RTRMAPBT_ATTRFLAG_BITOFF), C1, 0, + TYP_NONE }, + { "bmbtblock", FLDT_RBMBTFLG, OI(RTRMAPBT_BMBTFLAG_BITOFF), C1, 0, + TYP_NONE }, + { NULL } +}; +#undef ROFF + /* refcount btree blocks */ const field_t refcbt_crc_hfld[] = { { "", FLDT_REFCBT_CRC, OI(0), C1, 0, TYP_NONE }, diff --git 
a/db/btblock.h b/db/btblock.h index 4168c9e2e15..b4013ea8073 100644 --- a/db/btblock.h +++ b/db/btblock.h @@ -53,6 +53,11 @@ extern const struct field rmapbt_crc_hfld[]; extern const struct field rmapbt_key_flds[]; extern const struct field rmapbt_rec_flds[]; +extern const struct field rtrmapbt_crc_flds[]; +extern const struct field rtrmapbt_crc_hfld[]; +extern const struct field rtrmapbt_key_flds[]; +extern const struct field rtrmapbt_rec_flds[]; + extern const struct field refcbt_crc_flds[]; extern const struct field refcbt_crc_hfld[]; extern const struct field refcbt_key_flds[]; diff --git a/db/field.c b/db/field.c index 4a6a4cf51c3..b3efbb5698d 100644 --- a/db/field.c +++ b/db/field.c @@ -184,6 +184,17 @@ const ftattr_t ftattrtab[] = { { FLDT_RMAPBTREC, "rmapbtrec", fp_sarray, (char *)rmapbt_rec_flds, SI(bitsz(struct xfs_rmap_rec)), 0, NULL, rmapbt_rec_flds }, + { FLDT_RTRMAPBT_CRC, "rtrmapbt", NULL, (char *)rtrmapbt_crc_flds, btblock_size, + FTARG_SIZE, NULL, rtrmapbt_crc_flds }, + { FLDT_RTRMAPBTKEY, "rtrmapbtkey", fp_sarray, (char *)rtrmapbt_key_flds, + SI(bitize(2 * sizeof(struct xfs_rmap_key))), 0, NULL, rtrmapbt_key_flds }, + { FLDT_RTRMAPBTPTR, "rtrmapbtptr", fp_num, "%llu", + SI(bitsz(xfs_rtrmap_ptr_t)), 0, fa_dfsbno, NULL }, + { FLDT_RTRMAPBTREC, "rtrmapbtrec", fp_sarray, (char *)rtrmapbt_rec_flds, + SI(bitsz(struct xfs_rmap_rec)), 0, NULL, rtrmapbt_rec_flds }, + { FLDT_RTRMAPROOT, "rtrmaproot", NULL, (char *)rtrmaproot_flds, rtrmaproot_size, + FTARG_SIZE, NULL, rtrmaproot_flds }, + { FLDT_REFCBT_CRC, "refcntbt", NULL, (char *)refcbt_crc_flds, btblock_size, FTARG_SIZE, NULL, refcbt_crc_flds }, { FLDT_REFCBTKEY, "refcntbtkey", fp_sarray, (char *)refcbt_key_flds, diff --git a/db/field.h b/db/field.h index e9c6142f282..db3e13d3927 100644 --- a/db/field.h +++ b/db/field.h @@ -83,6 +83,11 @@ typedef enum fldt { FLDT_RMAPBTKEY, FLDT_RMAPBTPTR, FLDT_RMAPBTREC, + FLDT_RTRMAPBT_CRC, + FLDT_RTRMAPBTKEY, + FLDT_RTRMAPBTPTR, + FLDT_RTRMAPBTREC, + FLDT_RTRMAPROOT, 
FLDT_REFCBT_CRC, FLDT_REFCBTKEY, FLDT_REFCBTPTR, diff --git a/db/inode.c b/db/inode.c index 0b9dc617ba9..460d99175ab 100644 --- a/db/inode.c +++ b/db/inode.c @@ -17,6 +17,7 @@ #include "bit.h" #include "output.h" #include "init.h" +#include "libfrog/bitmap.h" static int inode_a_bmbt_count(void *obj, int startoff); static int inode_a_bmx_count(void *obj, int startoff); @@ -47,6 +48,7 @@ static int inode_u_muuid_count(void *obj, int startoff); static int inode_u_sfdir2_count(void *obj, int startoff); static int inode_u_sfdir3_count(void *obj, int startoff); static int inode_u_symlink_count(void *obj, int startoff); +static int inode_u_rtrmapbt_count(void *obj, int startoff); static const cmdinfo_t inode_cmd = { "inode", NULL, inode_f, 0, 1, 1, "[inode#]", @@ -230,6 +232,8 @@ const field_t inode_u_flds[] = { { "sfdir3", FLDT_DIR3SF, NULL, inode_u_sfdir3_count, FLD_COUNT, TYP_NONE }, { "symlink", FLDT_CHARNS, NULL, inode_u_symlink_count, FLD_COUNT, TYP_NONE }, + { "rtrmapbt", FLDT_RTRMAPROOT, NULL, inode_u_rtrmapbt_count, FLD_COUNT, + TYP_NONE }, { NULL } }; @@ -243,7 +247,7 @@ const field_t inode_a_flds[] = { }; static const char *dinode_fmt_name[] = - { "dev", "local", "extents", "btree", "uuid" }; + { "dev", "local", "extents", "btree", "uuid", "rmap" }; static const int dinode_fmt_name_size = sizeof(dinode_fmt_name) / sizeof(dinode_fmt_name[0]); @@ -633,9 +637,74 @@ inode_init(void) add_command(&inode_cmd); } +static struct bitmap *rmap_inodes; + +static inline int +set_rtgroup_rmap_inode( + struct xfs_mount *mp, + xfs_rgnumber_t rgno) +{ + struct xfs_imeta_path *path; + xfs_ino_t rtino; + int error; + + if (!xfs_has_rtrmapbt(mp)) + return 0; + + error = -libxfs_rtrmapbt_create_path(mp, rgno, &path); + if (error) + return error; + + error = -libxfs_imeta_lookup(mp, path, &rtino); + libxfs_imeta_free_path(path); + if (error) + return error; + + if (rtino == NULLFSINO) + return EFSCORRUPTED; + + return bitmap_set(rmap_inodes, rtino, 1); +} + +int 
+init_rtmeta_inode_bitmaps( + struct xfs_mount *mp) +{ + xfs_rgnumber_t rgno; + int error; + + if (rmap_inodes) + return 0; + + error = bitmap_alloc(&rmap_inodes); + if (error) + return error; + + for (rgno = 0; rgno < mp->m_sb.sb_rgcount; rgno++) { + int err2 = set_rtgroup_rmap_inode(mp, rgno); + if (err2 && !error) + error = err2; + } + + return error; +} + +bool is_rtrmap_inode(xfs_ino_t ino) +{ + return bitmap_test(rmap_inodes, ino, 1); +} + typnm_t inode_next_type(void) { + int error; + + error = init_rtmeta_inode_bitmaps(mp); + if (error) { + dbprintf(_("error %d setting up rt metadata inode bitmaps\n"), + error); + } + switch (iocur_top->mode & S_IFMT) { case S_IFDIR: return TYP_DIR2; @@ -655,8 +724,9 @@ inode_next_type(void) iocur_top->ino == mp->m_sb.sb_gquotino || iocur_top->ino == mp->m_sb.sb_pquotino) return TYP_DQBLK; - else - return TYP_DATA; + else if (is_rtrmap_inode(iocur_top->ino)) + return TYP_RTRMAPBT; + return TYP_DATA; default: return TYP_NONE; } @@ -790,6 +860,20 @@ inode_u_sfdir3_count( xfs_has_ftype(mp); } +static int +inode_u_rtrmapbt_count( + void *obj, + int startoff) +{ + struct xfs_dinode *dip; + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + dip = obj; + ASSERT((char *)XFS_DFORK_DPTR(dip) - (char *)dip == byteize(startoff)); + return dip->di_format == XFS_DINODE_FMT_RMAP; +} + int inode_u_size( void *obj, diff --git a/db/inode.h b/db/inode.h index 31a2ebbba6a..a47b0575a15 100644 --- a/db/inode.h +++ b/db/inode.h @@ -23,3 +23,6 @@ extern int inode_size(void *obj, int startoff, int idx); extern int inode_u_size(void *obj, int startoff, int idx); extern void xfs_inode_set_crc(struct xfs_buf *); extern void set_cur_inode(xfs_ino_t ino); + +int init_rtmeta_inode_bitmaps(struct xfs_mount *mp); +bool is_rtrmap_inode(xfs_ino_t ino); diff --git a/db/type.c b/db/type.c index 2091b4ac8b1..1dfc33ffb44 100644 --- a/db/type.c +++ b/db/type.c @@ -51,6 +51,7 @@ static const typ_t __typtab[] = { { TYP_BNOBT, "bnobt", 
handle_struct, bnobt_hfld, NULL, TYP_F_NO_CRC_OFF }, { TYP_CNTBT, "cntbt", handle_struct, cntbt_hfld, NULL, TYP_F_NO_CRC_OFF }, { TYP_RMAPBT, NULL }, + { TYP_RTRMAPBT, NULL }, { TYP_REFCBT, NULL }, { TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF }, { TYP_DIR2, "dir2", handle_struct, dir2_hfld, NULL, TYP_F_NO_CRC_OFF }, @@ -91,6 +92,8 @@ static const typ_t __typtab_crc[] = { &xfs_cntbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF }, { TYP_RMAPBT, "rmapbt", handle_struct, rmapbt_crc_hfld, &xfs_rmapbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF }, + { TYP_RTRMAPBT, "rtrmapbt", handle_struct, rtrmapbt_crc_hfld, + &xfs_rtrmapbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF }, { TYP_REFCBT, "refcntbt", handle_struct, refcbt_crc_hfld, &xfs_refcountbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF }, { TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF }, @@ -141,6 +144,8 @@ static const typ_t __typtab_spcrc[] = { &xfs_cntbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF }, { TYP_RMAPBT, "rmapbt", handle_struct, rmapbt_crc_hfld, &xfs_rmapbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF }, + { TYP_RTRMAPBT, "rtrmapbt", handle_struct, rtrmapbt_crc_hfld, + &xfs_rtrmapbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF }, { TYP_REFCBT, "refcntbt", handle_struct, refcbt_crc_hfld, &xfs_refcountbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF }, { TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF }, diff --git a/db/type.h b/db/type.h index e7f0ecc1768..c98f3640202 100644 --- a/db/type.h +++ b/db/type.h @@ -20,6 +20,7 @@ typedef enum typnm TYP_BNOBT, TYP_CNTBT, TYP_RMAPBT, + TYP_RTRMAPBT, TYP_REFCBT, TYP_DATA, TYP_DIR2, diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 0443423fb5a..ae92c909265 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -165,6 +165,7 @@ #define xfs_imeta_create_space_res libxfs_imeta_create_space_res #define xfs_imeta_end_update libxfs_imeta_end_update #define xfs_imeta_ensure_dirpath libxfs_imeta_ensure_dirpath +#define xfs_imeta_free_path libxfs_imeta_free_path #define 
xfs_imeta_iget libxfs_imeta_iget #define xfs_imeta_irele libxfs_imeta_irele #define xfs_imeta_link libxfs_imeta_link @@ -242,6 +243,10 @@ #define xfs_rtfree_extent libxfs_rtfree_extent #define xfs_rtgroup_update_secondary_sbs libxfs_rtgroup_update_secondary_sbs #define xfs_rtgroup_update_super libxfs_rtgroup_update_super +#define xfs_rtrmapbt_create_path libxfs_rtrmapbt_create_path +#define xfs_rtrmapbt_droot_maxrecs libxfs_rtrmapbt_droot_maxrecs +#define xfs_rtrmapbt_maxrecs libxfs_rtrmapbt_maxrecs + #define xfs_sb_from_disk libxfs_sb_from_disk #define xfs_sb_quota_from_disk libxfs_sb_quota_from_disk #define xfs_sb_read_secondary libxfs_sb_read_secondary diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8 index bcb4c871827..92d22cbc15f 100644 --- a/man/man8/xfs_db.8 +++ b/man/man8/xfs_db.8 @@ -1101,7 +1101,7 @@ The possible data types are: .BR agf ", " agfl ", " agi ", " attr ", " bmapbta ", " bmapbtd , .BR bnobt ", " cntbt ", " data ", " dir ", " dir2 ", " dqblk , .BR inobt ", " inode ", " log ", " refcntbt ", " rmapbt ", " rtbitmap , -.BR rtsummary ", " sb ", " symlink " and " text . +.BR rtsummary ", " sb ", " symlink ", " rtrmapbt ", and " text . See the TYPES section below for more information on these data types. .TP .BI "timelimit [" OPTIONS ] @@ -2263,6 +2263,64 @@ block number within the allocation group to the next level in the Btree. .PD .RE .TP +.B rtrmapbt +There is one reverse mapping Btree for each realtime group. +The +.BR startblock " and " +.B blockcount +fields are 32 bits wide and record blocks within a realtime group. +The root of this Btree is the reverse-mapping inode, which is recorded in the +metadata directory. +Blocks are linked to sibling left and right blocks at each level, as well as by +pointers from parent to child blocks. +Each block has the following fields: +.RS 1.4i +.PD 0 +.TP 1.2i +.B magic +RTRMAP block magic number, 0x4d415052 ('MAPR'). +.TP +.B level +level number of this block, 0 is a leaf. 
+.TP +.B numrecs +number of data entries in the block. +.TP +.B leftsib +left (logically lower) sibling block, 0 if none. +.TP +.B rightsib +right (logically higher) sibling block, 0 if none. +.TP +.B recs +[leaf blocks only] array of reference count records. Each record contains +.BR startblock , +.BR blockcount , +.BR owner , +.BR offset , +.BR attr_fork , +.BR bmbt_block , +and +.BR unwritten . +.TP +.B keys +[non-leaf blocks only] array of double-key records. The first ("low") key +contains the first value of each block in the level below this one. The second +("high") key contains the largest key that can be used to identify any record +in the subtree. Each record contains +.BR startblock , +.BR owner , +.BR offset , +.BR attr_fork , +and +.BR bmbt_block . +.TP +.B ptrs +[non-leaf blocks only] array of child block pointers. Each pointer is a +block number within the allocation group to the next level in the Btree. +.PD +.RE +.TP .B rtbitmap If the filesystem has a realtime subvolume, then the .B rbmino ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 16/41] xfs: scrub the realtime rmapbt 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (18 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 20/41] xfs_db: display the realtime rmap btree contents Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 17/41] xfs: online repair of the realtime rmap btree Darrick J. Wong ` (20 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Check the realtime reverse mapping btree against the rtbitmap, and modify the rtbitmap scrub to check against the rtrmapbt. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_fs.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index 5c557d5ff13..8547ba85c55 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -744,9 +744,10 @@ struct xfs_scrub_metadata { #define XFS_SCRUB_TYPE_HEALTHY 27 /* everything checked out ok */ #define XFS_SCRUB_TYPE_RGSUPER 28 /* realtime superblock */ #define XFS_SCRUB_TYPE_RGBITMAP 29 /* realtime group bitmap */ +#define XFS_SCRUB_TYPE_RTRMAPBT 30 /* rtgroup reverse mapping btree */ /* Number of scrub subcommands. */ -#define XFS_SCRUB_TYPE_NR 30 +#define XFS_SCRUB_TYPE_NR 31 /* i: Repair this metadata. */ #define XFS_SCRUB_IFLAG_REPAIR (1u << 0) ^ permalink raw reply related [flat|nested] 565+ messages in thread
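Patch 16 raises XFS_SCRUB_TYPE_NR along with the new type because the count doubles as an upper bound when a scrub request is validated. A rough sketch of such a bounds check follows. The two constants mirror the values in the xfs_fs.h hunk; the function name and the choice of -ENOENT as the rejection code are assumptions for illustration and are not taken from this thread.

```c
#include <errno.h>

/* Values mirrored from the xfs_fs.h hunk in the patch. */
#define DEMO_SCRUB_TYPE_RTRMAPBT	30
#define DEMO_SCRUB_TYPE_NR		31

/*
 * Hypothetical validator: reject any scrub type at or beyond the count.
 * The error code is an assumption, not confirmed by the patch.
 */
static int demo_validate_scrub_type(unsigned int sm_type)
{
	if (sm_type >= DEMO_SCRUB_TYPE_NR)
		return -ENOENT;
	return 0;
}
```

Had the count stayed at 30, a check of this shape would have rejected type 30, so the new scrubber could never be reached.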
* [PATCH 17/41] xfs: online repair of the realtime rmap btree 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (19 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 16/41] xfs: scrub the realtime rmapbt Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 22/41] xfs_db: support rudimentary checks of the rtrmap btree Darrick J. Wong ` (19 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Repair the realtime rmap btree while mounted. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rmap.c | 2 +- libxfs/xfs_rmap.h | 2 ++ libxfs/xfs_rtrmap_btree.c | 2 +- libxfs/xfs_rtrmap_btree.h | 3 +++ 4 files changed, 7 insertions(+), 2 deletions(-) diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c index 74fb9197cbc..967a095c45d 100644 --- a/libxfs/xfs_rmap.c +++ b/libxfs/xfs_rmap.c @@ -263,7 +263,7 @@ xfs_rmap_check_perag_irec( return NULL; } -static inline xfs_failaddr_t +inline xfs_failaddr_t xfs_rmap_check_rtgroup_irec( struct xfs_rtgroup *rtg, const struct xfs_rmap_irec *irec) diff --git a/libxfs/xfs_rmap.h b/libxfs/xfs_rmap.h index e98f37c39f2..9d0aaa16f55 100644 --- a/libxfs/xfs_rmap.h +++ b/libxfs/xfs_rmap.h @@ -215,6 +215,8 @@ xfs_failaddr_t xfs_rmap_btrec_to_irec(const union xfs_btree_rec *rec, struct xfs_rmap_irec *irec); xfs_failaddr_t xfs_rmap_check_perag_irec(struct xfs_perag *pag, const struct xfs_rmap_irec *irec); +xfs_failaddr_t xfs_rmap_check_rtgroup_irec(struct xfs_rtgroup *rtg, + const struct xfs_rmap_irec *irec); xfs_failaddr_t xfs_rmap_check_irec(struct xfs_btree_cur *cur, const struct xfs_rmap_irec *irec); diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c index b39ccba497a..05258472592 100644 --- a/libxfs/xfs_rtrmap_btree.c +++ b/libxfs/xfs_rtrmap_btree.c @@ -702,7 +702,7 @@ 
xfs_rtrmapbt_create_path( } /* Calculate the rtrmap btree size for some records. */ -static unsigned long long +unsigned long long xfs_rtrmapbt_calc_size( struct xfs_mount *mp, unsigned long long len) diff --git a/libxfs/xfs_rtrmap_btree.h b/libxfs/xfs_rtrmap_btree.h index 046a6081673..1f0a6f9620e 100644 --- a/libxfs/xfs_rtrmap_btree.h +++ b/libxfs/xfs_rtrmap_btree.h @@ -203,4 +203,7 @@ struct xfs_imeta_update; int xfs_rtrmapbt_create(struct xfs_trans **tpp, struct xfs_imeta_path *path, struct xfs_imeta_update *ic, struct xfs_inode **ipp); +unsigned long long xfs_rtrmapbt_calc_size(struct xfs_mount *mp, + unsigned long long len); + #endif /* __XFS_RTRMAP_BTREE_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
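Patch 17 makes xfs_rtrmapbt_calc_size() non-static, presumably so that repair code can estimate how many blocks a rebuilt btree needs for a given record count. Its body is not shown in the hunk, so what follows is only a generic sketch of that kind of estimate under stated assumptions: every block is packed full, leaf_max and node_max are hypothetical per-block capacities, and the special case of a root stored in an inode is ignored.

```c
/* Round-up division for 64-bit counts. */
static unsigned long long demo_howmany(unsigned long long n,
				       unsigned long long per)
{
	return (n + per - 1) / per;
}

/*
 * Hypothetical estimate of the blocks needed to index nr_records.
 * leaf_max is the number of records per leaf block and node_max the
 * number of pointers per interior block; both must be at least 2.
 */
static unsigned long long demo_btree_calc_size(unsigned long long nr_records,
					       unsigned int leaf_max,
					       unsigned int node_max)
{
	unsigned long long level_blocks = demo_howmany(nr_records, leaf_max);
	unsigned long long blocks = level_blocks;

	/* Add one interior level at a time until a single block remains. */
	while (level_blocks > 1) {
		level_blocks = demo_howmany(level_blocks, node_max);
		blocks += level_blocks;
	}
	return blocks;
}
```

For example, 100 records with 10 records per leaf and 5 pointers per node need 10 leaves, 2 interior blocks above them, and 1 block at the top, for 13 blocks in total.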
* [PATCH 22/41] xfs_db: support rudimentary checks of the rtrmap btree 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (20 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 17/41] xfs: online repair of the realtime rmap btree Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 27/41] libxfs: dirty buffers should be marked uptodate too Darrick J. Wong ` (18 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Perform some fairly superficial checks of the rtrmap btree. We'll do more sophisticated checks in xfs_repair, but provide enough of a spot-check here that we can do simple things. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/check.c | 203 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++- db/inode.c | 25 +++++++ db/inode.h | 1 3 files changed, 223 insertions(+), 6 deletions(-) diff --git a/db/check.c b/db/check.c index 0e1fb3cf8f1..79f26b0e789 100644 --- a/db/check.c +++ b/db/check.c @@ -20,6 +20,7 @@ #include "init.h" #include "malloc.h" #include "dir2.h" +#include "inode.h" typedef enum { IS_USER_QUOTA, IS_PROJECT_QUOTA, IS_GROUP_QUOTA, @@ -57,6 +58,7 @@ typedef enum { DBM_RLDATA, DBM_COWDATA, DBM_RTSB, + DBM_BTRTRMAP, DBM_NDBM } dbm_t; @@ -71,6 +73,7 @@ typedef struct inodata { xfs_ino_t ino; struct inodata *parent; char *name; + xfs_rgnumber_t rgno; /* only if rtgroup metadata inode */ } inodata_t; #define MIN_INODATA_HASH_SIZE 256 #define MAX_INODATA_HASH_SIZE 65536 @@ -189,6 +192,7 @@ static const char *typename[] = { "rldata", "cowdata", "rtsb", + "btrtrmap", NULL }; @@ -341,6 +345,9 @@ static void process_rtbitmap(blkmap_t *blkmap); static void process_rtsummary(blkmap_t *blkmap); static xfs_ino_t process_sf_dir_v2(struct xfs_dinode *dip, int *dot, int *dotdot, inodata_t *id); +static void 
process_rtrmap(struct inodata *id, + struct xfs_dinode *dip, + xfs_rfsblock_t *toti); static void quota_add(xfs_dqid_t *p, xfs_dqid_t *g, xfs_dqid_t *u, int dq, xfs_qcnt_t bc, xfs_qcnt_t ic, xfs_qcnt_t rc); @@ -366,6 +373,12 @@ static void scanfunc_bmap(struct xfs_btree_block *block, xfs_rfsblock_t *toti, xfs_extnum_t *nex, blkmap_t **blkmapp, int isroot, typnm_t btype); +static void scanfunc_rtrmap(struct xfs_btree_block *block, + int level, dbm_t type, xfs_fsblock_t bno, + inodata_t *id, xfs_rfsblock_t *totd, + xfs_rfsblock_t *toti, xfs_extnum_t *nex, + blkmap_t **blkmapp, int isroot, + typnm_t btype); static void scanfunc_bno(struct xfs_btree_block *block, int level, xfs_agf_t *agf, xfs_agblock_t bno, int isroot); @@ -839,12 +852,21 @@ blockget_f( xfs_agnumber_t agno; int oldprefix; int sbyell; + int error; if (dbmap) { dbprintf(_("already have block usage information\n")); return 0; } + error = init_rtmeta_inode_bitmaps(mp); + if (error) { + dbprintf(_("error %d setting up rt metadata inode bitmaps\n"), + error); + exitcode = 3; + return 0; + } + if (!init(argc, argv)) { if (serious_error) exitcode = 3; @@ -1101,6 +1123,7 @@ blocktrash_f( (1 << DBM_QUOTA) | (1 << DBM_RTBITMAP) | (1 << DBM_RTSUM) | + (1 << DBM_BTRTRMAP) | (1 << DBM_SYMLINK) | (1 << DBM_BTFINO) | (1 << DBM_BTRMAP) | @@ -2840,7 +2863,7 @@ process_inode( 0 /* type 15 unused */ }; static char *fmtnames[] = { - "dev", "local", "extents", "btree", "uuid" + "dev", "local", "extents", "btree", "uuid", "rmap" }; ino = XFS_AGINO_TO_INO(mp, be32_to_cpu(agf->agf_seqno), agino); @@ -2906,11 +2929,20 @@ process_inode( be32_to_cpu(dip->di_next_unlinked), ino); error++; } - /* - * di_mode is a 16-bit uint so no need to check the < 0 case - */ + + /* Check that mode and data fork format match. 
*/ mode = be16_to_cpu(dip->di_mode); - if ((((mode & S_IFMT) >> 12) > 15) || + if (is_rtrmap_inode(ino)) { + if (!S_ISREG(mode) || dip->di_format != XFS_DINODE_FMT_RMAP) { + if (v) + dbprintf( + _("bad format %d for rtrmap inode %lld type %#o\n"), + dip->di_format, (long long)ino, + mode & S_IFMT); + error++; + return; + } + } else if ((((mode & S_IFMT) >> 12) > 15) || (!(okfmts[(mode & S_IFMT) >> 12] & (1 << dip->di_format)))) { if (v) dbprintf(_("bad format %d for inode %lld type %#o\n"), @@ -2983,6 +3015,9 @@ process_inode( blkmap = blkmap_alloc(dnextents); if (!xfs_has_metadir(mp)) addlink_inode(id); + } else if (is_rtrmap_inode(id->ino)) { + type = DBM_BTRTRMAP; + blkmap = blkmap_alloc(be32_to_cpu(dip->di_nextents)); } else type = DBM_DATA; @@ -3014,6 +3049,10 @@ process_inode( process_btinode(id, dip, type, &totdblocks, &totiblocks, &nextents, &blkmap, XFS_DATA_FORK); break; + case XFS_DINODE_FMT_RMAP: + id->rgno = rtgroup_for_rtrmap_ino(mp, id->ino); + process_rtrmap(id, dip, &totiblocks); + break; } if (dip->di_forkoff) { sbversion |= XFS_SB_VERSION_ATTRBIT; @@ -3039,6 +3078,7 @@ process_inode( case DBM_RTBITMAP: case DBM_RTSUM: case DBM_SYMLINK: + case DBM_BTRTRMAP: case DBM_UNKNOWN: bc = totdblocks + totiblocks + atotdblocks + atotiblocks; @@ -3786,6 +3826,79 @@ process_rtsummary( } } +static void +process_rtrmap( + struct inodata *id, + struct xfs_dinode *dip, + xfs_rfsblock_t *toti) +{ + xfs_extnum_t nex = 0; + xfs_rfsblock_t totd = 0; + struct xfs_rtrmap_root *dib; + int whichfork = XFS_DATA_FORK; + int i; + int maxrecs; + xfs_rtrmap_ptr_t *pp; + + if (id->rgno == NULLRGNUMBER) { + dbprintf( + _("rt group for rmap ino %lld not found\n"), + id->ino); + error++; + return; + } + + dib = (struct xfs_rtrmap_root *)XFS_DFORK_PTR(dip, whichfork); + if (be16_to_cpu(dib->bb_level) >= mp->m_rtrmap_maxlevels) { + if (!sflag || id->ilist) + dbprintf(_("level for ino %lld rtrmap root too " + "large (%u)\n"), + id->ino, + be16_to_cpu(dib->bb_level)); + error++; + 
return; + } + maxrecs = libxfs_rtrmapbt_droot_maxrecs( + XFS_DFORK_SIZE(dip, mp, whichfork), + dib->bb_level == 0); + if (be16_to_cpu(dib->bb_numrecs) > maxrecs) { + if (!sflag || id->ilist) + dbprintf(_("numrecs for ino %lld rtrmap root too " + "large (%u)\n"), + id->ino, + be16_to_cpu(dib->bb_numrecs)); + error++; + return; + } + if (be16_to_cpu(dib->bb_level) == 0) { + struct xfs_rmap_rec *rp; + xfs_fsblock_t lastblock; + + rp = xfs_rtrmap_droot_rec_addr(dib, 1); + lastblock = 0; + for (i = 0; i < be16_to_cpu(dib->bb_numrecs); i++) { + if (be32_to_cpu(rp[i].rm_startblock) < lastblock) { + dbprintf(_( + "out-of-order rtrmap btree record %d (%u %u) root\n"), + i, be32_to_cpu(rp[i].rm_startblock), + be32_to_cpu(rp[i].rm_blockcount)); + } else { + lastblock = be32_to_cpu(rp[i].rm_startblock) + + be32_to_cpu(rp[i].rm_blockcount); + } + } + return; + } else { + pp = xfs_rtrmap_droot_ptr_addr(dib, 1, maxrecs); + for (i = 0; i < be16_to_cpu(dib->bb_numrecs); i++) + scan_lbtree(get_unaligned_be64(&pp[i]), + be16_to_cpu(dib->bb_level), + scanfunc_rtrmap, DBM_BTRTRMAP, + id, &totd, toti, + &nex, NULL, 1, TYP_RTRMAPBT); + } +} + static xfs_ino_t process_sf_dir_v2( struct xfs_dinode *dip, @@ -4884,6 +4997,86 @@ scanfunc_rmap( TYP_RMAPBT); } +static void +scanfunc_rtrmap( + struct xfs_btree_block *block, + int level, + dbm_t type, + xfs_fsblock_t bno, + inodata_t *id, + xfs_rfsblock_t *totd, + xfs_rfsblock_t *toti, + xfs_extnum_t *nex, + blkmap_t **blkmapp, + int isroot, + typnm_t btype) +{ + xfs_agblock_t agbno; + xfs_agnumber_t agno; + int i; + xfs_rtrmap_ptr_t *pp; + struct xfs_rmap_rec *rp; + xfs_fsblock_t lastblock; + + agno = XFS_FSB_TO_AGNO(mp, bno); + agbno = XFS_FSB_TO_AGBNO(mp, bno); + if (be32_to_cpu(block->bb_magic) != XFS_RTRMAP_CRC_MAGIC) { + dbprintf(_("bad magic # %#x in rtrmapbt block %u/%u\n"), + be32_to_cpu(block->bb_magic), agno, bno); + serious_error++; + return; + } + if (be16_to_cpu(block->bb_level) != level) { + if (!sflag) + dbprintf(_("expected level 
%d got %d in rtrmapbt block " + "%u/%u\n"), + level, be16_to_cpu(block->bb_level), agno, bno); + error++; + } + set_dbmap(agno, agbno, 1, type, agno, agbno); + set_inomap(agno, agbno, 1, id); + (*toti)++; + if (level == 0) { + if (be16_to_cpu(block->bb_numrecs) > mp->m_rtrmap_mxr[0] || + (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_rtrmap_mnr[0])) { + dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in " + "rtrmapbt block %u/%u\n"), + be16_to_cpu(block->bb_numrecs), mp->m_rtrmap_mnr[0], + mp->m_rtrmap_mxr[0], agno, bno); + serious_error++; + return; + } + rp = xfs_rtrmap_rec_addr(block, 1); + lastblock = 0; + for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++) { + if (be32_to_cpu(rp[i].rm_startblock) < lastblock) { + dbprintf(_( + "out-of-order rtrmap btree record %d (%u %u) block %u/%u l %llu\n"), + i, be32_to_cpu(rp[i].rm_startblock), + be32_to_cpu(rp[i].rm_blockcount), + agno, bno, lastblock); + } else { + lastblock = be32_to_cpu(rp[i].rm_startblock) + + be32_to_cpu(rp[i].rm_blockcount); + } + } + return; + } + if (be16_to_cpu(block->bb_numrecs) > mp->m_rtrmap_mxr[1] || + (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_rtrmap_mnr[1])) { + dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in rtrmapbt " + "block %u/%u\n"), + be16_to_cpu(block->bb_numrecs), mp->m_rtrmap_mnr[1], + mp->m_rtrmap_mxr[1], agno, bno); + serious_error++; + return; + } + pp = xfs_rtrmap_ptr_addr(block, 1, mp->m_rtrmap_mxr[1]); + for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++) + scan_lbtree(be64_to_cpu(pp[i]), level, scanfunc_rtrmap, type, id, + totd, toti, nex, blkmapp, 0, btype); +} + static void scanfunc_refcnt( struct xfs_btree_block *block, diff --git a/db/inode.c b/db/inode.c index 460d99175ab..2d28eae4dad 100644 --- a/db/inode.c +++ b/db/inode.c @@ -637,7 +637,12 @@ inode_init(void) add_command(&inode_cmd); } -static struct bitmap *rmap_inodes; +struct rtgroup_inodes { + xfs_ino_t rmap_ino; +}; + +static struct rtgroup_inodes *rtgroup_inodes; +static struct 
bitmap *rmap_inodes; static inline int set_rtgroup_rmap_inode( @@ -663,6 +668,7 @@ set_rtgroup_rmap_inode( if (rtino == NULLFSINO) return EFSCORRUPTED; + rtgroup_inodes[rgno].rmap_ino = rtino; return bitmap_set(rmap_inodes, rtino, 1); } @@ -676,6 +682,11 @@ init_rtmeta_inode_bitmaps( if (rmap_inodes) return 0; + rtgroup_inodes = calloc(mp->m_sb.sb_rgcount, + sizeof(struct rtgroup_inodes)); + if (!rtgroup_inodes) + return ENOMEM; + error = bitmap_alloc(&rmap_inodes); if (error) return error; @@ -694,6 +705,18 @@ bool is_rtrmap_inode(xfs_ino_t ino) return bitmap_test(rmap_inodes, ino, 1); } +xfs_rgnumber_t rtgroup_for_rtrmap_ino(struct xfs_mount *mp, xfs_ino_t ino) +{ + unsigned int i; + + for (i = 0; i < mp->m_sb.sb_rgcount; i++) { + if (rtgroup_inodes[i].rmap_ino == ino) + return i; + } + + return NULLRGNUMBER; +} + typnm_t inode_next_type(void) { diff --git a/db/inode.h b/db/inode.h index a47b0575a15..04e606abed3 100644 --- a/db/inode.h +++ b/db/inode.h @@ -26,3 +26,4 @@ extern void set_cur_inode(xfs_ino_t ino); int init_rtmeta_inode_bitmaps(struct xfs_mount *mp); bool is_rtrmap_inode(xfs_ino_t ino); +xfs_rgnumber_t rtgroup_for_rtrmap_ino(struct xfs_mount *mp, xfs_ino_t ino); ^ permalink raw reply related [flat|nested] 565+ messages in thread
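Editor's note: the record-ordering spot-check that this patch adds to both process_rtrmap() and scanfunc_rtrmap() can be sketched in isolation as below. This is only a minimal illustration, not code from xfsprogs: struct sketch_rmap_rec and sketch_count_out_of_order() are hypothetical names, and the fields are assumed to be already converted to host byte order (the on-disk records are big-endian, hence the be32_to_cpu() calls in the patch).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-endian stand-in for the on-disk rmap record. */
struct sketch_rmap_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
};

/*
 * Count records that begin before the end of the last accepted record.
 * As in the patch, the running end only advances when a record is in
 * order, so one bad record does not hide later good ones.
 */
static int
sketch_count_out_of_order(
	const struct sketch_rmap_rec	*rp,
	int				numrecs)
{
	uint64_t			lastblock = 0;
	int				bad = 0;
	int				i;

	assert(rp != NULL || numrecs == 0);

	for (i = 0; i < numrecs; i++) {
		if (rp[i].startblock < lastblock)
			bad++;
		else
			lastblock = (uint64_t)rp[i].startblock +
					rp[i].blockcount;
	}
	return bad;
}
```

Note that this is a superficial check by design: it treats any record that starts inside the previous extent as out of order, which matches the "fairly superficial" intent stated in the commit message.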
* [PATCH 27/41] libxfs: dirty buffers should be marked uptodate too 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (21 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 22/41] xfs_db: support rudimentary checks of the rtrmap btree Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 25/41] libfrog: enable scrubbng of the realtime rmap Darrick J. Wong ` (17 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> I started fuzz-testing the realtime rmap feature with a very large number of realtime allocation groups. There were so many rt groups that repair had to rebuild /realtime in the metadata directory tree, and that directory was big enough to spur the creation of a block format directory. Unfortunately, repair then walks both directory trees to look for unconnected files. This part of phase 6 emits CRC errors on the newly created buffers for the /realtime directory, declares the directory to be garbage, and moves all the rt rmap inodes to /lost+found, resulting in a corrupt fs. Poking around in gdb, I noticed that the buffer contents were indeed zero, and that UPTODATE was not set. This was very strange, until I added a watch on bp->b_flags to watch for accesses. It turns out that xfs_repair's prefetch code will _get a buffer and zero the contents if UPTODATE is not set. The directory tree code in libxfs will also _get a buffer, initialize it, and log it to the coordinating transaction, which in this case is the transaction used to reconnect the rmap btree inodes to /realtime. At no point does any of that code ever set UPTODATE on the buffer, which is why prefetch zaps the contents. Hence change both buffer dirtying functions to set UPTODATE, since a dirty buffer is by definition at least as recent as whatever's on disk. 
Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/rdwr.c | 2 +- libxfs/trans.c | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c index 1c91f557f41..fd9579f3bc7 100644 --- a/libxfs/rdwr.c +++ b/libxfs/rdwr.c @@ -963,7 +963,7 @@ libxfs_buf_mark_dirty( */ bp->b_error = 0; bp->b_flags &= ~LIBXFS_B_STALE; - bp->b_flags |= LIBXFS_B_DIRTY; + bp->b_flags |= LIBXFS_B_DIRTY | LIBXFS_B_UPTODATE; } /* Prepare a buffer to be sent to the MRU list. */ diff --git a/libxfs/trans.c b/libxfs/trans.c index 06d3655c33b..93c281c321b 100644 --- a/libxfs/trans.c +++ b/libxfs/trans.c @@ -715,6 +715,7 @@ libxfs_trans_dirty_buf( ASSERT(bp->b_transp == tp); ASSERT(bip != NULL); + bp->b_flags |= LIBXFS_B_UPTODATE; tp->t_flags |= XFS_TRANS_DIRTY; set_bit(XFS_LI_DIRTY, &bip->bli_item.li_flags); } ^ permalink raw reply related [flat|nested] 565+ messages in thread
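Editor's note: the failure mode described in this commit message can be modelled with a toy buffer, which may make the one-line fix easier to follow. Everything below is a hypothetical sketch: the SKETCH_B_* flags stand in for the LIBXFS_B_* flags, and sketch_buf_get() only imitates the behaviour the message attributes to the prefetch code (zeroing a buffer whose UPTODATE flag is clear). It is not the libxfs implementation.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag bits standing in for the LIBXFS_B_* flags. */
#define SKETCH_B_DIRTY		(1u << 0)
#define SKETCH_B_STALE		(1u << 1)
#define SKETCH_B_UPTODATE	(1u << 2)

struct sketch_buf {
	unsigned int	flags;
	unsigned char	data[16];
};

/* Imitates a "get" path that zeroes contents it believes are uninitialized. */
static void
sketch_buf_get(struct sketch_buf *bp)
{
	if (!(bp->flags & SKETCH_B_UPTODATE))
		memset(bp->data, 0, sizeof(bp->data));
}

/* Behaviour before the patch: dirty, but not marked uptodate. */
static void
sketch_mark_dirty_old(struct sketch_buf *bp)
{
	bp->flags &= ~SKETCH_B_STALE;
	bp->flags |= SKETCH_B_DIRTY;
}

/* Behaviour after the patch: a dirty buffer is also uptodate. */
static void
sketch_mark_dirty_new(struct sketch_buf *bp)
{
	bp->flags &= ~SKETCH_B_STALE;
	bp->flags |= SKETCH_B_DIRTY | SKETCH_B_UPTODATE;
}

/* Returns 1 if freshly initialized contents survive a later get, else 0. */
static int
sketch_contents_survive(void (*mark_dirty)(struct sketch_buf *))
{
	struct sketch_buf	bp = { 0 };

	assert(mark_dirty != NULL);

	memset(bp.data, 0xAB, sizeof(bp.data));	/* caller initializes the block */
	mark_dirty(&bp);			/* ...and dirties it */
	sketch_buf_get(&bp);			/* a second user grabs the buffer */
	return bp.data[0] == 0xAB;
}
```

In this model, sketch_contents_survive(sketch_mark_dirty_old) reports that the contents were wiped, while the post-patch variant keeps them, which mirrors the zeroed /realtime directory blocks described above.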
* [PATCH 25/41] libfrog: enable scrubbng of the realtime rmap 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (22 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 27/41] libxfs: dirty buffers should be marked uptodate too Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 24/41] xfs_db: make fsmap query the realtime reverse mapping tree Darrick J. Wong ` (16 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a new entry so that we can scrub the rtrmapbt. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libfrog/scrub.c | 5 +++++ scrub/repair.c | 1 + 2 files changed, 6 insertions(+) diff --git a/libfrog/scrub.c b/libfrog/scrub.c index 7efb7ecfbd0..6f12ec72b22 100644 --- a/libfrog/scrub.c +++ b/libfrog/scrub.c @@ -159,6 +159,11 @@ const struct xfrog_scrub_descr xfrog_scrubbers[XFS_SCRUB_TYPE_NR] = { .descr = "realtime group bitmap", .group = XFROG_SCRUB_GROUP_RTGROUP, }, + [XFS_SCRUB_TYPE_RTRMAPBT] = { + .name = "rtrmapbt", + .descr = "realtime reverse mapping btree", + .group = XFROG_SCRUB_GROUP_RTGROUP, + }, }; #undef DEP diff --git a/scrub/repair.c b/scrub/repair.c index 10db103c87f..79a15f907a1 100644 --- a/scrub/repair.c +++ b/scrub/repair.c @@ -400,6 +400,7 @@ repair_item_difficulty( switch (scrub_type) { case XFS_SCRUB_TYPE_RMAPBT: + case XFS_SCRUB_TYPE_RTRMAPBT: ret |= REPAIR_DIFFICULTY_SECONDARY; break; case XFS_SCRUB_TYPE_SB: ^ permalink raw reply related [flat|nested] 565+ messages in thread
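Editor's note: the scrubber table extended by this patch is an array indexed by scrub type and filled in with designated initializers, so adding a new type is a matter of adding one entry. A minimal sketch of that pattern follows. All identifiers are hypothetical stand-ins (the real table is xfrog_scrubbers[] keyed by XFS_SCRUB_TYPE_*), and apart from the rtrmapbt entry the names and descriptions here are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scrub types; the real ones are XFS_SCRUB_TYPE_* constants. */
enum sketch_scrub_type {
	SKETCH_TYPE_RMAPBT,
	SKETCH_TYPE_RGBITMAP,
	SKETCH_TYPE_RTRMAPBT,
	SKETCH_TYPE_NR
};

enum sketch_scrub_group {
	SKETCH_GROUP_AG,
	SKETCH_GROUP_RTGROUP
};

struct sketch_scrub_descr {
	const char		*name;
	const char		*descr;
	enum sketch_scrub_group	group;
};

/* Designated initializers keep each entry next to the type it describes. */
static const struct sketch_scrub_descr sketch_scrubbers[SKETCH_TYPE_NR] = {
	[SKETCH_TYPE_RMAPBT] = {
		.name	= "rmapbt",
		.descr	= "reverse mapping btree",
		.group	= SKETCH_GROUP_AG,
	},
	[SKETCH_TYPE_RGBITMAP] = {
		.name	= "rgbitmap",
		.descr	= "realtime group bitmap",
		.group	= SKETCH_GROUP_RTGROUP,
	},
	[SKETCH_TYPE_RTRMAPBT] = {
		.name	= "rtrmapbt",
		.descr	= "realtime reverse mapping btree",
		.group	= SKETCH_GROUP_RTGROUP,
	},
};

/* Look a scrubber up by name; returns its type index, or -1 if unknown. */
static int
sketch_find_scrubber(const char *name)
{
	int	i;

	assert(name != NULL);

	for (i = 0; i < SKETCH_TYPE_NR; i++) {
		if (sketch_scrubbers[i].name &&
		    strcmp(sketch_scrubbers[i].name, name) == 0)
			return i;
	}
	return -1;
}
```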
* [PATCH 24/41] xfs_db: make fsmap query the realtime reverse mapping tree 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (23 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 25/41] libfrog: enable scrubbng of the realtime rmap Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 29/41] xfs_repair: use realtime rmap btree data to check block types Darrick J. Wong ` (15 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Extend the 'fsmap' debugger command to support querying the realtime rmap btree via a new -r argument. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/fsmap.c | 135 +++++++++++++++++++++++++++++++++++++++++++++- libxfs/libxfs_api_defs.h | 2 + 2 files changed, 133 insertions(+), 4 deletions(-) diff --git a/db/fsmap.c b/db/fsmap.c index 7fd42df2a1c..8d06f3638d6 100644 --- a/db/fsmap.c +++ b/db/fsmap.c @@ -102,6 +102,120 @@ fsmap( } } +static int +fsmap_rt_fn( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct fsmap_info *info = priv; + + dbprintf(_("%llu: %u/%u len %u owner %lld offset %llu bmbt %d attrfork %d extflag %d\n"), + info->nr, cur->bc_ino.rtg->rtg_rgno, rec->rm_startblock, + rec->rm_blockcount, rec->rm_owner, rec->rm_offset, + !!(rec->rm_flags & XFS_RMAP_BMBT_BLOCK), + !!(rec->rm_flags & XFS_RMAP_ATTR_FORK), + !!(rec->rm_flags & XFS_RMAP_UNWRITTEN)); + info->nr++; + + return 0; +} + +static void +fsmap_rt( + xfs_fsblock_t start_fsb, + xfs_fsblock_t end_fsb) +{ + struct fsmap_info info; + xfs_daddr_t eofs; + struct xfs_rmap_irec low; + struct xfs_rmap_irec high; + struct xfs_btree_cur *bt_cur; + struct xfs_rtgroup *rtg; + xfs_rgnumber_t start_rg; + xfs_rgnumber_t end_rg; + int error; + + if (mp->m_sb.sb_rblocks == 0) + return; + + eofs = XFS_FSB_TO_BB(mp, 
mp->m_sb.sb_rblocks); + if (XFS_FSB_TO_DADDR(mp, end_fsb) >= eofs) + end_fsb = XFS_DADDR_TO_FSB(mp, eofs - 1); + + low.rm_startblock = xfs_rtb_to_rgbno(mp, start_fsb, &start_rg); + low.rm_owner = 0; + low.rm_offset = 0; + low.rm_flags = 0; + high.rm_startblock = -1U; + high.rm_owner = ULLONG_MAX; + high.rm_offset = ULLONG_MAX; + high.rm_flags = XFS_RMAP_ATTR_FORK | XFS_RMAP_BMBT_BLOCK | + XFS_RMAP_UNWRITTEN; + + end_rg = xfs_rtb_to_rgno(mp, end_fsb); + + info.nr = 0; + for_each_rtgroup_range(mp, start_rg, end_rg, rtg) { + struct xfs_inode *ip; + struct xfs_imeta_path *path; + xfs_ino_t ino; + xfs_rgnumber_t rgno; + + if (rtg->rtg_rgno == end_rg) + high.rm_startblock = xfs_rtb_to_rgbno(mp, end_fsb, + &rgno); + + error = -libxfs_rtrmapbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) { + dbprintf( + _("Cannot create path to rtgroup %u rmap inode\n"), + rtg->rtg_rgno); + libxfs_rtgroup_put(rtg); + return; + } + + error = -libxfs_imeta_lookup(mp, path, &ino); + libxfs_imeta_free_path(path); + if (error) { + dbprintf(_("Cannot look up rtgroup %u rmap inode\n"), + rtg->rtg_rgno); + libxfs_rtgroup_put(rtg); + return; + } + + error = -libxfs_imeta_iget(mp, ino, XFS_DIR3_FT_REG_FILE, &ip); + if (error) { + dbprintf(_("Cannot load rtgroup %u rmap inode\n"), + rtg->rtg_rgno); + libxfs_rtgroup_put(rtg); + return; + } + + bt_cur = libxfs_rtrmapbt_init_cursor(mp, NULL, rtg, ip); + if (!bt_cur) { + libxfs_imeta_irele(ip); + libxfs_rtgroup_put(rtg); + dbprintf(_("Not enough memory.\n")); + return; + } + + error = -libxfs_rmap_query_range(bt_cur, &low, &high, + fsmap_rt_fn, &info); + libxfs_btree_del_cursor(bt_cur, error); + libxfs_imeta_irele(ip); + if (error) { + libxfs_rtgroup_put(rtg); + dbprintf(_("Error %d while querying rt fsmap btree.\n"), + error); + return; + } + + if (rtg->rtg_rgno == start_rg) + low.rm_startblock = 0; + } +} + static int fsmap_f( int argc, @@ -111,14 +225,18 @@ fsmap_f( int c; xfs_fsblock_t start_fsb = 0; xfs_fsblock_t end_fsb = NULLFSBLOCK; + bool 
isrt = false; if (!xfs_has_rmapbt(mp)) { dbprintf(_("Filesystem does not support reverse mapping btree.\n")); return 0; } - while ((c = getopt(argc, argv, "")) != EOF) { + while ((c = getopt(argc, argv, "r")) != EOF) { switch (c) { + case 'r': + isrt = true; + break; default: dbprintf(_("Bad option for fsmap command.\n")); return 0; @@ -141,14 +259,23 @@ fsmap_f( } } - fsmap(start_fsb, end_fsb); + if (argc > optind + 2) { + exitcode = 1; + dbprintf(_("Too many arguments to fsmap.\n")); + return 0; + } + + if (isrt) + fsmap_rt(start_fsb, end_fsb); + else + fsmap(start_fsb, end_fsb); return 0; } static const cmdinfo_t fsmap_cmd = - { "fsmap", NULL, fsmap_f, 0, 2, 0, - N_("[start_fsb] [end_fsb]"), + { "fsmap", NULL, fsmap_f, 0, -1, 0, + N_("[-r] [start_fsb] [end_fsb]"), N_("display reverse mapping(s)"), NULL }; void diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 0e284e515d8..f59f9aa2060 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -241,11 +241,13 @@ #define xfs_rtsummary_wordcount libxfs_rtsummary_wordcount #define xfs_rtfree_extent libxfs_rtfree_extent +#define xfs_rtgroup_put libxfs_rtgroup_put #define xfs_rtgroup_update_secondary_sbs libxfs_rtgroup_update_secondary_sbs #define xfs_rtgroup_update_super libxfs_rtgroup_update_super #define xfs_rtrmapbt_create_path libxfs_rtrmapbt_create_path #define xfs_rtrmapbt_droot_maxrecs libxfs_rtrmapbt_droot_maxrecs #define xfs_rtrmapbt_maxlevels_ondisk libxfs_rtrmapbt_maxlevels_ondisk +#define xfs_rtrmapbt_init_cursor libxfs_rtrmapbt_init_cursor #define xfs_rtrmapbt_maxrecs libxfs_rtrmapbt_maxrecs #define xfs_sb_from_disk libxfs_sb_from_disk ^ permalink raw reply related [flat|nested] 565+ messages in thread
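Editor's note: fsmap_rt() walks every realtime group between the start and end of the requested range, and only the first and last groups get a clamped query window; groups in between are queried in full. The sketch below isolates that windowing logic. The names are hypothetical, and the (group number, block within group) pairs are assumed to have been computed already by whatever the real xfs_rtb_to_rgbno() and xfs_rtb_to_rgno() helpers do.

```c
#include <assert.h>
#include <stdint.h>

/* Query window within a single realtime group (hypothetical type). */
struct sketch_window {
	uint32_t	low;	/* first group block to report */
	uint32_t	high;	/* last group block to report */
};

/*
 * Compute the window for group rgno when the overall query runs from
 * (start_rg, start_rgbno) to (end_rg, end_rgbno).  Only the first group
 * honours the starting block and only the last group honours the ending
 * block, mirroring how the patch resets low.rm_startblock to 0 after the
 * first group and sets high.rm_startblock only for the last one.
 */
static struct sketch_window
sketch_group_window(
	uint32_t	rgno,
	uint32_t	start_rg,
	uint32_t	start_rgbno,
	uint32_t	end_rg,
	uint32_t	end_rgbno)
{
	struct sketch_window	w;

	assert(start_rg <= rgno && rgno <= end_rg);

	w.low = (rgno == start_rg) ? start_rgbno : 0;
	w.high = (rgno == end_rg) ? end_rgbno : UINT32_MAX;
	return w;
}
```

For a query spanning groups 0 through 2, the middle group's window is the whole group, whereas a query confined to a single group is clamped at both ends.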
* [PATCH 29/41] xfs_repair: use realtime rmap btree data to check block types 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (24 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 24/41] xfs_db: make fsmap query the realtime reverse mapping tree Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 28/41] xfs_repair: flag suspect long-format btree blocks Darrick J. Wong ` (14 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Use the realtime rmap btree to pre-populate the block type information so that when repair iterates the primary metadata, we can confirm the block type. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dinode.c | 163 +++++++++++++++++++++++ repair/scan.c | 390 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- repair/scan.h | 34 +++++ 3 files changed, 577 insertions(+), 10 deletions(-) diff --git a/repair/dinode.c b/repair/dinode.c index 36cb5cfcc70..93ef89272fd 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -777,6 +777,153 @@ get_agino_buf( * first, one utility routine for each type of inode */ +/* + * return 1 if inode should be cleared, 0 otherwise + */ +static int +process_rtrmap( + struct xfs_mount *mp, + xfs_agnumber_t agno, + xfs_agino_t ino, + struct xfs_dinode *dip, + int type, + int *dirty, + xfs_rfsblock_t *tot, + uint64_t *nex, + blkmap_t **blkmapp, + int check_dups) +{ + struct xfs_rmap_irec oldkey; + struct xfs_rmap_irec key; + struct rmap_priv priv; + struct xfs_rtrmap_root *dib; + xfs_rtrmap_ptr_t *pp; + struct xfs_rmap_key *kp; + struct xfs_rmap_rec *rp; + char *forkname = get_forkname(XFS_DATA_FORK); + xfs_ino_t lino; + xfs_fsblock_t bno; + size_t droot_sz; + int i; + int level; + int numrecs; + int dmxr; + int suspect = 0; + int error; + + /* We rebuild the rtrmapbt, so no 
need to process blocks again. */ + if (check_dups) { + *tot = be64_to_cpu(dip->di_nblocks); + return 0; + } + + lino = XFS_AGINO_TO_INO(mp, agno, ino); + + /* This rmap btree inode must be a metadata inode. */ + if (!(dip->di_flags2 & be64_to_cpu(XFS_DIFLAG2_METADATA))) { + do_warn( +_("rtrmap inode %" PRIu64 " not flagged as metadata\n"), + lino); + return 1; + } + + memset(&priv.high_key, 0xFF, sizeof(priv.high_key)); + priv.high_key.rm_blockcount = 0; + priv.agcnts = NULL; + priv.last_rec.rm_owner = XFS_RMAP_OWN_UNKNOWN; + + dib = (struct xfs_rtrmap_root *)XFS_DFORK_PTR(dip, XFS_DATA_FORK); + *tot = 0; + *nex = 0; + + level = be16_to_cpu(dib->bb_level); + numrecs = be16_to_cpu(dib->bb_numrecs); + + if (level > mp->m_rtrmap_maxlevels) { + do_warn( +_("bad level %d in inode %" PRIu64 " rtrmap btree root block\n"), + level, lino); + return 1; + } + + /* + * use rtroot/dfork_dsize since the root block is in the data fork + */ + droot_sz = xfs_rtrmap_droot_space_calc(level, numrecs); + if (droot_sz > XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK)) { + do_warn( +_("computed size of rtrmapbt root (%zu bytes) is greater than space in " + "inode %" PRIu64 " %s fork\n"), + droot_sz, lino, forkname); + return 1; + } + + if (level == 0) { + rp = xfs_rtrmap_droot_rec_addr(dib, 1); + error = process_rtrmap_reclist(mp, rp, numrecs, + &priv.last_rec, NULL, "rtrmapbt root"); + if (error) { + rmap_avoid_check(); + return 1; + } + return 0; + } + + dmxr = libxfs_rtrmapbt_droot_maxrecs( + XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK), false); + pp = xfs_rtrmap_droot_ptr_addr(dib, 1, dmxr); + + /* check for in-order keys */ + for (i = 0; i < numrecs; i++) { + kp = xfs_rtrmap_droot_key_addr(dib, i + 1); + + key.rm_flags = 0; + key.rm_startblock = be32_to_cpu(kp->rm_startblock); + key.rm_owner = be64_to_cpu(kp->rm_owner); + if (libxfs_rmap_irec_offset_unpack(be64_to_cpu(kp->rm_offset), + &key)) { + /* Look for impossible flags. 
*/ + do_warn( +_("invalid flags in key %u of rtrmap root ino %" PRIu64 "\n"), + i, lino); + suspect++; + continue; + } + if (i == 0) { + oldkey = key; + continue; + } + if (rmap_diffkeys(&oldkey, &key) > 0) { + do_warn( +_("out of order key %u in rtrmap root ino %" PRIu64 "\n"), + i, lino); + suspect++; + continue; + } + oldkey = key; + } + + /* probe keys */ + for (i = 0; i < numrecs; i++) { + bno = get_unaligned_be64(&pp[i]); + + if (!libxfs_verify_fsbno(mp, bno)) { + do_warn( +_("bad rtrmap btree ptr 0x%" PRIx64 " in ino %" PRIu64 "\n"), + bno, lino); + return 1; + } + + if (scan_lbtree(bno, level, scan_rtrmapbt, + type, XFS_DATA_FORK, lino, tot, nex, blkmapp, + NULL, 0, 1, check_dups, XFS_RTRMAP_CRC_MAGIC, + &priv, &xfs_rtrmapbt_buf_ops)) + return 1; + } + + return suspect ? 1 : 0; +} + /* * return 1 if inode should be cleared, 0 otherwise */ @@ -1553,7 +1700,7 @@ static int check_dinode_mode_format( struct xfs_dinode *dinoc) { - if (dinoc->di_format >= XFS_DINODE_FMT_UUID) + if (dinoc->di_format == XFS_DINODE_FMT_UUID) return -1; /* FMT_UUID is not used */ switch (dinode_fmt(dinoc)) { @@ -1568,8 +1715,13 @@ check_dinode_mode_format( dinoc->di_format > XFS_DINODE_FMT_BTREE) ? -1 : 0; case S_IFREG: - return (dinoc->di_format < XFS_DINODE_FMT_EXTENTS || - dinoc->di_format > XFS_DINODE_FMT_BTREE) ? 
-1 : 0; + switch (dinoc->di_format) { + case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_EXTENTS: + case XFS_DINODE_FMT_BTREE: + return 0; + } + return -1; case S_IFLNK: return (dinoc->di_format < XFS_DINODE_FMT_LOCAL || @@ -1983,6 +2135,10 @@ process_inode_data_fork( totblocks, nextents, dblkmap, XFS_DATA_FORK, check_dups, zap_metadata); break; + case XFS_DINODE_FMT_RMAP: + err = process_rtrmap(mp, agno, ino, dino, type, dirty, + totblocks, nextents, dblkmap, check_dups); + break; case XFS_DINODE_FMT_DEV: err = 0; break; @@ -2042,6 +2198,7 @@ _("would have tried to rebuild inode %"PRIu64" data fork\n"), XFS_DATA_FORK, 0, zap_metadata); break; case XFS_DINODE_FMT_DEV: + case XFS_DINODE_FMT_RMAP: err = 0; break; default: diff --git a/repair/scan.c b/repair/scan.c index 4573c25a577..09ca037f47d 100644 --- a/repair/scan.c +++ b/repair/scan.c @@ -957,13 +957,6 @@ _("unknown block (%d,%d-%d) mismatch on %s tree, state - %d,%" PRIx64 "\n"), } } -struct rmap_priv { - struct aghdr_cnts *agcnts; - struct xfs_rmap_irec high_key; - struct xfs_rmap_irec last_rec; - xfs_agblock_t nr_blocks; -}; - static bool rmap_in_order( xfs_agblock_t b, @@ -1365,6 +1358,389 @@ _("out of order key %u in %s btree block (%u/%u)\n"), rmap_avoid_check(); } +int +process_rtrmap_reclist( + struct xfs_mount *mp, + struct xfs_rmap_rec *rp, + int numrecs, + struct xfs_rmap_irec *last_rec, + struct xfs_rmap_irec *high_key, + const char *name) +{ + int suspect = 0; + int i; + struct xfs_rmap_irec oldkey; + struct xfs_rmap_irec key; + + for (i = 0; i < numrecs; i++) { + xfs_rgblock_t b, end; + xfs_extlen_t len; + uint64_t owner, offset; + + b = be32_to_cpu(rp[i].rm_startblock); + len = be32_to_cpu(rp[i].rm_blockcount); + owner = be64_to_cpu(rp[i].rm_owner); + offset = be64_to_cpu(rp[i].rm_offset); + + key.rm_flags = 0; + key.rm_startblock = b; + key.rm_blockcount = len; + key.rm_owner = owner; + if (libxfs_rmap_irec_offset_unpack(offset, &key)) { + /* Look for impossible flags. 
*/ + do_warn( +_("invalid flags in record %u of %s\n"), + i, name); + suspect++; + continue; + } + + + end = key.rm_startblock + key.rm_blockcount; + + /* Make sure startblock & len make sense. */ + if (b >= mp->m_sb.sb_rgblocks) { + do_warn( +_("invalid start block %llu in record %u of %s\n"), + (unsigned long long)b, i, name); + suspect++; + continue; + } + if (len == 0 || end - 1 >= mp->m_sb.sb_rgblocks) { + do_warn( +_("invalid length %llu in record %u of %s\n"), + (unsigned long long)len, i, name); + suspect++; + continue; + } + + /* We only store file data and superblocks in the rtrmap. */ + if (XFS_RMAP_NON_INODE_OWNER(owner) && + owner != XFS_RMAP_OWN_FS) { + do_warn( +_("invalid owner %lld in record %u of %s\n"), + (long long int)owner, i, name); + suspect++; + continue; + } + + /* Look for impossible record field combinations. */ + if (key.rm_flags & XFS_RMAP_KEY_FLAGS) { + do_warn( +_("record %d cannot have attr fork/key flags in %s\n"), + i, name); + suspect++; + continue; + } + + /* Check for out of order records. */ + if (i == 0) + oldkey = key; + else { + if (rmap_diffkeys(&oldkey, &key) > 0) + do_warn( +_("out-of-order record %d (%llu %"PRId64" %"PRIu64" %llu) in %s\n"), + i, (unsigned long long)b, owner, offset, + (unsigned long long)len, name); + else + oldkey = key; + } + + /* Is this mergeable with the previous record? */ + if (rmaps_are_mergeable(last_rec, &key)) { + do_warn( +_("record %d in %s should be merged with previous record\n"), + i, name); + last_rec->rm_blockcount += key.rm_blockcount; + } else + *last_rec = key; + + /* Check that we don't go past the high key. 
*/ + key.rm_startblock += key.rm_blockcount - 1; + key.rm_offset += key.rm_blockcount - 1; + key.rm_blockcount = 0; + if (high_key && rmap_diffkeys(&key, high_key) > 0) { + do_warn( +_("record %d greater than high key of %s\n"), + i, name); + suspect++; + } + } + + return suspect; +} + +int +scan_rtrmapbt( + struct xfs_btree_block *block, + int level, + int type, + int whichfork, + xfs_fsblock_t fsbno, + xfs_ino_t ino, + xfs_rfsblock_t *tot, + uint64_t *nex, + blkmap_t **blkmapp, + bmap_cursor_t *bm_cursor, + int suspect, + int isroot, + int check_dups, + int *dirty, + uint64_t magic, + void *priv) +{ + const char *name = "rtrmap"; + char rootname[256]; + int i; + xfs_rtrmap_ptr_t *pp; + struct xfs_rmap_rec *rp; + struct rmap_priv *rmap_priv = priv; + int hdr_errors = 0; + int numrecs; + int state; + struct xfs_rmap_key *kp; + struct xfs_rmap_irec oldkey; + struct xfs_rmap_irec key; + xfs_agnumber_t agno; + xfs_agblock_t agbno; + int error; + + agno = XFS_FSB_TO_AGNO(mp, fsbno); + agbno = XFS_FSB_TO_AGBNO(mp, fsbno); + + /* If anything here is bad, just bail. 
*/ + if (be32_to_cpu(block->bb_magic) != magic) { + do_warn( +_("bad magic # %#x in inode %" PRIu64 " %s block %" PRIu64 "\n"), + be32_to_cpu(block->bb_magic), ino, name, fsbno); + return 1; + } + if (be16_to_cpu(block->bb_level) != level) { + do_warn( +_("expected level %d got %d in inode %" PRIu64 ", %s block %" PRIu64 "\n"), + level, be16_to_cpu(block->bb_level), + ino, name, fsbno); + return(1); + } + + /* verify owner */ + if (be64_to_cpu(block->bb_u.l.bb_owner) != ino) { + do_warn( +_("expected owner inode %" PRIu64 ", got %llu, %s block %" PRIu64 "\n"), + ino, + (unsigned long long)be64_to_cpu(block->bb_u.l.bb_owner), + name, fsbno); + return 1; + } + /* verify block number */ + if (be64_to_cpu(block->bb_u.l.bb_blkno) != + XFS_FSB_TO_DADDR(mp, fsbno)) { + do_warn( +_("expected block %" PRIu64 ", got %llu, %s block %" PRIu64 "\n"), + XFS_FSB_TO_DADDR(mp, fsbno), + (unsigned long long)be64_to_cpu(block->bb_u.l.bb_blkno), + name, fsbno); + return 1; + } + /* verify uuid */ + if (platform_uuid_compare(&block->bb_u.l.bb_uuid, + &mp->m_sb.sb_meta_uuid) != 0) { + do_warn( +_("wrong FS UUID, %s block %" PRIu64 "\n"), + name, fsbno); + return 1; + } + + /* + * Check for btree blocks multiply claimed. We're going to regenerate + * the rtrmap anyway, so mark the blocks as metadata so they get freed. + */ + state = get_bmap(agno, agbno); + if (!(state == XR_E_UNKNOWN || state == XR_E_INUSE1)) { + do_warn( +_("%s btree block claimed (state %d), agno %d, bno %d, suspect %d\n"), + name, state, agno, agbno, suspect); + suspect++; + goto out; + } + set_bmap(agno, agbno, XR_E_METADATA); + + numrecs = be16_to_cpu(block->bb_numrecs); + + /* + * All realtime rmap btree blocks are freed for a fully empty + * filesystem, thus they are counted towards the free data + * block counter. The root lives in an inode and is thus not + * counted. 
+ */ + (*tot)++; + + if (level == 0) { + if (numrecs > mp->m_rtrmap_mxr[0]) { + numrecs = mp->m_rtrmap_mxr[0]; + hdr_errors++; + } + if (isroot == 0 && numrecs < mp->m_rtrmap_mnr[0]) { + numrecs = mp->m_rtrmap_mnr[0]; + hdr_errors++; + } + + if (hdr_errors) { + do_warn( +_("bad btree nrecs (%u, min=%u, max=%u) in bt%s block %u/%u\n"), + be16_to_cpu(block->bb_numrecs), + mp->m_rtrmap_mnr[0], mp->m_rtrmap_mxr[0], + name, agno, agbno); + suspect++; + } + + rp = xfs_rtrmap_rec_addr(block, 1); + snprintf(rootname, 256, "%s btree block %u/%u", name, agno, agbno); + error = process_rtrmap_reclist(mp, rp, numrecs, + &rmap_priv->last_rec, &rmap_priv->high_key, + rootname); + if (error) + suspect++; + goto out; + } + + /* + * interior record + */ + pp = xfs_rtrmap_ptr_addr(block, 1, mp->m_rtrmap_mxr[1]); + + if (numrecs > mp->m_rtrmap_mxr[1]) { + numrecs = mp->m_rtrmap_mxr[1]; + hdr_errors++; + } + if (isroot == 0 && numrecs < mp->m_rtrmap_mnr[1]) { + numrecs = mp->m_rtrmap_mnr[1]; + hdr_errors++; + } + + /* + * don't pass bogus tree flag down further if this block + * looked ok. bail out if two levels in a row look bad. + */ + if (hdr_errors) { + do_warn( +_("bad btree nrecs (%u, min=%u, max=%u) in bt%s block %u/%u\n"), + be16_to_cpu(block->bb_numrecs), + mp->m_rtrmap_mnr[1], mp->m_rtrmap_mxr[1], + name, agno, agbno); + if (suspect) + goto out; + suspect++; + } else if (suspect) { + suspect = 0; + } + + /* check the node's high keys */ + for (i = 0; !isroot && i < numrecs; i++) { + kp = xfs_rtrmap_high_key_addr(block, i + 1); + + key.rm_flags = 0; + key.rm_startblock = be32_to_cpu(kp->rm_startblock); + key.rm_owner = be64_to_cpu(kp->rm_owner); + if (libxfs_rmap_irec_offset_unpack(be64_to_cpu(kp->rm_offset), + &key)) { + /* Look for impossible flags. 
*/ + do_warn( +_("invalid flags in key %u of %s btree block %u/%u\n"), + i, name, agno, agbno); + suspect++; + continue; + } + if (rmap_diffkeys(&key, &rmap_priv->high_key) > 0) { + do_warn( +_("key %d greater than high key of block (%u/%u) in %s tree\n"), + i, agno, agbno, name); + suspect++; + } + } + + /* check for in-order keys */ + for (i = 0; i < numrecs; i++) { + kp = xfs_rtrmap_key_addr(block, i + 1); + + key.rm_flags = 0; + key.rm_startblock = be32_to_cpu(kp->rm_startblock); + key.rm_owner = be64_to_cpu(kp->rm_owner); + if (libxfs_rmap_irec_offset_unpack(be64_to_cpu(kp->rm_offset), + &key)) { + /* Look for impossible flags. */ + do_warn( +_("invalid flags in key %u of %s btree block %u/%u\n"), + i, name, agno, agbno); + suspect++; + continue; + } + if (i == 0) { + oldkey = key; + continue; + } + if (rmap_diffkeys(&oldkey, &key) > 0) { + do_warn( +_("out of order key %u in %s btree block (%u/%u)\n"), + i, name, agno, agbno); + suspect++; + } + oldkey = key; + } + + for (i = 0; i < numrecs; i++) { + xfs_fsblock_t pbno = be64_to_cpu(pp[i]); + + /* + * XXX - put sibling detection right here. + * we know our sibling chain is good. So as we go, + * we check the entry before and after each entry. + * If either of the entries references a different block, + * check the sibling pointer. If there's a sibling + * pointer mismatch, try and extract as much data + * as possible. + */ + kp = xfs_rtrmap_high_key_addr(block, i + 1); + rmap_priv->high_key.rm_flags = 0; + rmap_priv->high_key.rm_startblock = + be32_to_cpu(kp->rm_startblock); + rmap_priv->high_key.rm_owner = + be64_to_cpu(kp->rm_owner); + if (libxfs_rmap_irec_offset_unpack(be64_to_cpu(kp->rm_offset), + &rmap_priv->high_key)) { + /* Look for impossible flags. 
*/ + do_warn( +_("invalid flags in high key %u of %s btree block %u/%u\n"), + i, name, agno, agbno); + suspect++; + continue; + } + + if (!libxfs_verify_fsbno(mp, pbno)) { + do_warn( +_("bad %s btree ptr 0x%llx in ino %" PRIu64 "\n"), + name, (unsigned long long)pbno, ino); + return 1; + } + + error = scan_lbtree(pbno, level, scan_rtrmapbt, + type, whichfork, ino, tot, nex, blkmapp, + bm_cursor, suspect, 0, check_dups, magic, + rmap_priv, &xfs_rtrmapbt_buf_ops); + if (error) { + suspect++; + goto out; + } + } + +out: + if (hdr_errors || suspect) { + rmap_avoid_check(); + return 1; + } + return 0; +} + struct refc_priv { struct xfs_refcount_irec last_rec; xfs_agblock_t nr_blocks; diff --git a/repair/scan.h b/repair/scan.h index aeaf9f1a7f4..a624c882734 100644 --- a/repair/scan.h +++ b/repair/scan.h @@ -66,4 +66,38 @@ scan_ags( struct xfs_mount *mp, int scan_threads); +struct rmap_priv { + struct aghdr_cnts *agcnts; + struct xfs_rmap_irec high_key; + struct xfs_rmap_irec last_rec; + xfs_agblock_t nr_blocks; +}; + +int +process_rtrmap_reclist( + struct xfs_mount *mp, + struct xfs_rmap_rec *rp, + int numrecs, + struct xfs_rmap_irec *last_rec, + struct xfs_rmap_irec *high_key, + const char *name); + +int scan_rtrmapbt( + struct xfs_btree_block *block, + int level, + int type, + int whichfork, + xfs_fsblock_t bno, + xfs_ino_t ino, + xfs_rfsblock_t *tot, + uint64_t *nex, + struct blkmap **blkmapp, + bmap_cursor_t *bm_cursor, + int suspect, + int isroot, + int check_dups, + int *dirty, + uint64_t magic, + void *priv); + #endif /* _XR_SCAN_H */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
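For readers following the scan_rtrmapbt() hunk above, the header checks can be sketched on their own. This is a reviewer's illustration, not code from the patch: struct nrecs_limits, clamp_numrecs() and note_header_state() are hypothetical names standing in for the mp->m_rtrmap_mnr/m_rtrmap_mxr lookups and the open-coded hdr_errors/suspect handling. It assumes only what the hunk shows: numrecs is clamped to the per-level maximum, non-root blocks are also clamped to the minimum, and a second consecutive bad level aborts the walk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one level of mp->m_rtrmap_mnr[] / m_rtrmap_mxr[]. */
struct nrecs_limits {
	unsigned int	min;
	unsigned int	max;
};

/*
 * Clamp an on-disk record count into [min, max], bumping *hdr_errors once
 * per violated bound.  The root block is exempt from the minimum.
 */
static unsigned int
clamp_numrecs(
	const struct nrecs_limits	*lim,
	bool				isroot,
	unsigned int			ondisk,
	int				*hdr_errors)
{
	unsigned int			numrecs = ondisk;

	if (numrecs > lim->max) {
		numrecs = lim->max;
		(*hdr_errors)++;
	}
	if (!isroot && numrecs < lim->min) {
		numrecs = lim->min;
		(*hdr_errors)++;
	}
	return numrecs;
}

/*
 * Interior-block policy from the hunk: a bad header on top of an already
 * suspect parent means "bail out"; a bad header alone marks the walk
 * suspect; a good header clears an inherited suspicion.
 * Returns true if the caller should stop descending.
 */
static bool
note_header_state(
	int	hdr_errors,
	int	*suspect)
{
	if (hdr_errors) {
		if (*suspect)
			return true;
		(*suspect)++;
	} else if (*suspect) {
		*suspect = 0;
	}
	return false;
}
```

Keeping the clamp separate from the suspect bookkeeping makes it easier to see that the warning in the patch prints the raw be16_to_cpu(block->bb_numrecs) value while the record loops run on the clamped count.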
* [PATCH 28/41] xfs_repair: flag suspect long-format btree blocks 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (25 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 29/41] xfs_repair: use realtime rmap btree data to check block types Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 26/41] xfs_spaceman: report health status of the realtime rmap btree Darrick J. Wong ` (13 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Pass a "suspect" counter through scan_lbtree just like we do for short-format btree blocks, and increment its value when we encounter blocks with bad CRCs or outright corruption. This makes it so that repair actually catches bmbt blocks with bad crcs or other verifier errors. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dinode.c | 2 +- repair/scan.c | 15 ++++++++++++--- repair/scan.h | 3 +++ 3 files changed, 16 insertions(+), 4 deletions(-) diff --git a/repair/dinode.c b/repair/dinode.c index 2674dac605e..36cb5cfcc70 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -872,7 +872,7 @@ _("bad bmap btree ptr 0x%" PRIx64 " in ino %" PRIu64 "\n"), if (scan_lbtree(get_unaligned_be64(&pp[i]), level, scan_bmapbt, type, whichfork, lino, tot, nex, blkmapp, - &cursor, 1, check_dups, magic, + &cursor, 0, 1, check_dups, magic, (void *)zap_metadata, &xfs_bmbt_buf_ops)) return(1); /* diff --git a/repair/scan.c b/repair/scan.c index 7a377222c91..4573c25a577 100644 --- a/repair/scan.c +++ b/repair/scan.c @@ -136,6 +136,7 @@ scan_lbtree( xfs_extnum_t *nex, blkmap_t **blkmapp, bmap_cursor_t *bm_cursor, + int suspect, int isroot, int check_dups, int *dirty, @@ -148,6 +149,7 @@ scan_lbtree( xfs_extnum_t *nex, blkmap_t **blkmapp, bmap_cursor_t *bm_cursor, + int suspect, int isroot, int check_dups, uint64_t magic, 
@@ -167,6 +169,12 @@ scan_lbtree( XFS_FSB_TO_AGBNO(mp, root)); return(1); } + if (bp->b_error == -EFSBADCRC || bp->b_error == -EFSCORRUPTED) { + do_warn(_("btree block %d/%d is suspect, error %d\n"), + XFS_FSB_TO_AGNO(mp, root), + XFS_FSB_TO_AGBNO(mp, root), bp->b_error); + suspect++; + } /* * only check for bad CRC here - caller will determine if there @@ -182,7 +190,7 @@ scan_lbtree( err = (*func)(XFS_BUF_TO_BLOCK(bp), nlevels - 1, type, whichfork, root, ino, tot, nex, blkmapp, - bm_cursor, isroot, check_dups, &dirty, + bm_cursor, suspect, isroot, check_dups, &dirty, magic, priv); ASSERT(dirty == 0 || (dirty && !no_modify)); @@ -209,6 +217,7 @@ scan_bmapbt( xfs_extnum_t *nex, blkmap_t **blkmapp, bmap_cursor_t *bm_cursor, + int suspect, int isroot, int check_dups, int *dirty, @@ -514,7 +523,7 @@ _("bad bmap btree ptr 0x%llx in ino %" PRIu64 "\n"), err = scan_lbtree(be64_to_cpu(pp[i]), level, scan_bmapbt, type, whichfork, ino, tot, nex, blkmapp, - bm_cursor, 0, check_dups, magic, priv, + bm_cursor, suspect, 0, check_dups, magic, priv, &xfs_bmbt_buf_ops); if (err) return(1); @@ -582,7 +591,7 @@ _("bad fwd (right) sibling pointer (saw %" PRIu64 " should be NULLFSBLOCK)\n" be64_to_cpu(pkey[numrecs - 1].br_startoff); } - return(0); + return suspect > 0 ? 1 : 0; } static void diff --git a/repair/scan.h b/repair/scan.h index 4da788becbe..aeaf9f1a7f4 100644 --- a/repair/scan.h +++ b/repair/scan.h @@ -23,6 +23,7 @@ int scan_lbtree( xfs_extnum_t *nex, struct blkmap **blkmapp, bmap_cursor_t *bm_cursor, + int suspect, int isroot, int check_dups, int *dirty, @@ -35,6 +36,7 @@ int scan_lbtree( xfs_extnum_t *nex, struct blkmap **blkmapp, bmap_cursor_t *bm_cursor, + int suspect, int isroot, int check_dups, uint64_t magic, @@ -52,6 +54,7 @@ int scan_bmapbt( xfs_extnum_t *nex, struct blkmap **blkmapp, bmap_cursor_t *bm_cursor, + int suspect, int isroot, int check_dups, int *dirty, ^ permalink raw reply related [flat|nested] 565+ messages in thread
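The idea in this patch, threading a "suspect" counter down the long-format btree walk by value, can be shown with a toy model. The sketch below is illustrative only: struct toy_block and toy_scan() are made-up names, the bad_crc field stands in for the bp->b_error == -EFSBADCRC / -EFSCORRUPTED test, and a two-way fan-out replaces the real pointer array. It assumes, as the last scan.c hunk suggests, that a block reports failure when it finishes with a nonzero suspect count.

```c
#include <assert.h>
#include <stddef.h>

/* Toy btree block: at most two children, NULL when absent. */
struct toy_block {
	int			bad_crc;	/* stands in for the b_error check */
	struct toy_block	*child[2];
};

/*
 * Walk the toy tree.  "suspect" is passed by value, so a block inherits
 * its ancestors' suspicion but never leaks its own count to siblings.
 * Returns 1 if this subtree should be treated as damaged, 0 otherwise.
 */
static int
toy_scan(
	const struct toy_block	*blk,
	int			suspect)
{
	size_t			i;

	if (blk->bad_crc)
		suspect++;

	for (i = 0; i < 2; i++) {
		if (blk->child[i] && toy_scan(blk->child[i], suspect))
			return 1;
	}

	return suspect > 0 ? 1 : 0;
}
```

Passing the counter by value rather than by pointer is what lets each caller decide for itself whether a bad child is fatal, which matches how the patch adds a plain int parameter to scan_lbtree() and scan_bmapbt().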
* [PATCH 26/41] xfs_spaceman: report health status of the realtime rmap btree 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (26 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 28/41] xfs_repair: flag suspect long-format btree blocks Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 30/41] xfs_repair: create a new set of incore rmap information for rt groups Darrick J. Wong ` (12 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add reporting of the rt rmap btree health to spaceman. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- spaceman/health.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/spaceman/health.c b/spaceman/health.c index 928d92abb8c..950610d9770 100644 --- a/spaceman/health.c +++ b/spaceman/health.c @@ -39,6 +39,11 @@ static bool has_reflink(const struct xfs_fsop_geom *g) return g->flags & XFS_FSOP_GEOM_FLAGS_REFLINK; } +static bool has_rtrmapbt(const struct xfs_fsop_geom *g) +{ + return g->rtblocks > 0 && (g->flags & XFS_FSOP_GEOM_FLAGS_RMAPBT); +} + struct flag_map { unsigned int mask; bool (*has_fn)(const struct xfs_fsop_geom *g); @@ -143,6 +148,11 @@ static const struct flag_map rtgroup_flags[] = { .mask = XFS_RTGROUP_GEOM_SICK_BITMAP, .descr = "realtime bitmap", }, + { + .mask = XFS_RTGROUP_GEOM_SICK_RMAPBT, + .descr = "realtime reverse mappings btree", + .has_fn = has_rtrmapbt, + }, {0}, }; ^ permalink raw reply related [flat|nested] 565+ messages in thread
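As a rough picture of how the new flag_map entry is meant to behave, here is a self-contained sketch of a table-driven health report with an optional feature predicate. Everything prefixed toy_ or TOY_ is hypothetical; the real structures are struct xfs_fsop_geom and struct flag_map, and the real bit values are not reproduced here. The sketch assumes the reporting loop skips any entry whose has_fn says the feature is absent, which is how the existing has_reflink-style predicates appear to be used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical geometry and flag values, for illustration only. */
struct toy_geom {
	unsigned long long	rtblocks;
	unsigned int		flags;
};
#define TOY_GEOM_FLAG_RMAPBT	(1U << 0)
#define TOY_SICK_BITMAP		(1U << 0)
#define TOY_SICK_RMAPBT		(1U << 1)

struct toy_flag_map {
	unsigned int	mask;
	bool		(*has_fn)(const struct toy_geom *g);
	const char	*descr;
};

/* An rt rmap btree only exists with an rt volume and the rmapbt feature. */
static bool
toy_has_rtrmapbt(const struct toy_geom *g)
{
	return g->rtblocks > 0 && (g->flags & TOY_GEOM_FLAG_RMAPBT);
}

static const struct toy_flag_map toy_rtgroup_flags[] = {
	{
		.mask	= TOY_SICK_BITMAP,
		.descr	= "realtime bitmap",
	},
	{
		.mask	= TOY_SICK_RMAPBT,
		.descr	= "realtime reverse mappings btree",
		.has_fn	= toy_has_rtrmapbt,
	},
	{0},
};

/* Print every applicable sick structure; return how many were reported. */
static int
toy_report(const struct toy_geom *g, unsigned int sick)
{
	const struct toy_flag_map	*f;
	int				bad = 0;

	for (f = toy_rtgroup_flags; f->mask != 0; f++) {
		if (f->has_fn && !f->has_fn(g))
			continue;	/* feature not present, nothing to say */
		if (sick & f->mask) {
			printf("%s: unhealthy\n", f->descr);
			bad++;
		}
	}
	return bad;
}
```

With this shape, a filesystem without a realtime volume never prints an rt rmap line even if the kernel were to set the sick bit, because the predicate filters the entry out first.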
* [PATCH 30/41] xfs_repair: create a new set of incore rmap information for rt groups 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (27 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 26/41] xfs_spaceman: report health status of the realtime rmap btree Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:19 ` [PATCH 23/41] xfs_db: copy the realtime rmap btree Darrick J. Wong ` (11 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a parallel set of "xfs_ag_rmap" structures to cache information about reverse mappings for the realtime groups. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 3 + repair/agbtree.c | 5 +- repair/dinode.c | 2 - repair/rmap.c | 144 +++++++++++++++++++++++++++++++++++++--------- repair/rmap.h | 7 +- 5 files changed, 127 insertions(+), 34 deletions(-) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index f59f9aa2060..2a07a717215 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -241,6 +241,7 @@ #define xfs_rtsummary_wordcount libxfs_rtsummary_wordcount #define xfs_rtfree_extent libxfs_rtfree_extent +#define xfs_rtgroup_get libxfs_rtgroup_get #define xfs_rtgroup_put libxfs_rtgroup_put #define xfs_rtgroup_update_secondary_sbs libxfs_rtgroup_update_secondary_sbs #define xfs_rtgroup_update_super libxfs_rtgroup_update_super @@ -249,6 +250,8 @@ #define xfs_rtrmapbt_maxlevels_ondisk libxfs_rtrmapbt_maxlevels_ondisk #define xfs_rtrmapbt_init_cursor libxfs_rtrmapbt_init_cursor #define xfs_rtrmapbt_maxrecs libxfs_rtrmapbt_maxrecs +#define xfs_rtrmapbt_mem_create libxfs_rtrmapbt_mem_create +#define xfs_rtrmapbt_mem_cursor libxfs_rtrmapbt_mem_cursor #define xfs_sb_from_disk libxfs_sb_from_disk #define xfs_sb_quota_from_disk libxfs_sb_quota_from_disk diff 
--git a/repair/agbtree.c b/repair/agbtree.c index 26e282d57c8..e340e9cfc04 100644 --- a/repair/agbtree.c +++ b/repair/agbtree.c @@ -637,7 +637,7 @@ init_rmapbt_cursor( /* Compute how many blocks we'll need. */ error = -libxfs_btree_bload_compute_geometry(btr->cur, &btr->bload, - rmap_record_count(sc->mp, agno)); + rmap_record_count(sc->mp, false, agno)); if (error) do_error( _("Unable to compute rmap btree geometry, error %d.\n"), error); @@ -654,7 +654,8 @@ build_rmap_tree( { int error; - error = rmap_init_mem_cursor(sc->mp, NULL, agno, &btr->rmapbt_cursor); + error = rmap_init_mem_cursor(sc->mp, NULL, false, agno, + &btr->rmapbt_cursor); if (error) do_error( _("Insufficient memory to construct rmap cursor.\n")); diff --git a/repair/dinode.c b/repair/dinode.c index 93ef89272fd..6f44261907e 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -657,7 +657,7 @@ _("illegal state %d in block map %" PRIu64 "\n"), } } if (collect_rmaps && !zap_metadata) /* && !check_dups */ - rmap_add_rec(mp, ino, whichfork, &irec); + rmap_add_rec(mp, ino, whichfork, &irec, false); *tot += irec.br_blockcount; } error = 0; diff --git a/repair/rmap.c b/repair/rmap.c index db85b747e53..9550377df16 100644 --- a/repair/rmap.c +++ b/repair/rmap.c @@ -25,7 +25,7 @@ # define dbg_printf(f, a...) 
#endif -/* per-AG rmap object anchor */ +/* allocation group (AG or rtgroup) rmap object anchor */ struct xfs_ag_rmap { struct xfbtree *ar_xfbtree; /* rmap observations */ struct xfs_slab *ar_agbtree_rmaps; /* rmaps for rebuilt ag btrees */ @@ -35,9 +35,17 @@ struct xfs_ag_rmap { }; static struct xfs_ag_rmap *ag_rmaps; +static struct xfs_ag_rmap *rg_rmaps; bool rmapbt_suspect; static bool refcbt_suspect; +static struct xfs_ag_rmap *rmaps_for_group(bool isrt, unsigned int group) +{ + if (isrt) + return &rg_rmaps[group]; + return &ag_rmaps[group]; +} + static inline int rmap_compare(const void *a, const void *b) { return libxfs_rmap_compare(a, b); @@ -78,6 +86,45 @@ rmaps_destroy( xfile_destroy(xfile); } +/* Initialize the in-memory rmap btree for collecting realtime rmap records. */ +STATIC void +rmaps_init_rt( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + struct xfs_ag_rmap *ag_rmap) +{ + struct xfile *xfile; + struct xfs_buftarg *target; + unsigned long long maxbytes; + int error; + + if (!xfs_has_realtime(mp)) + return; + + /* + * Each rtgroup rmap btree file can consume the entire data device, + * even if the metadata space reservation will be smaller than that. + */ + maxbytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_dblocks); + error = xfile_create(mp, maxbytes, "rtrmapbt repair", &xfile); + if (error) + goto nomem; + + error = -libxfs_alloc_memory_buftarg(mp, xfile, &target); + if (error) + goto nomem; + + error = -libxfs_rtrmapbt_mem_create(mp, rgno, target, + &ag_rmap->ar_xfbtree); + if (error) + goto nomem; + + return; +nomem: + do_error( +_("Insufficient memory while allocating realtime reverse mapping btree.")); +} + /* Initialize the in-memory rmap btree for collecting per-AG rmap records. 
*/ STATIC void rmaps_init_ag( @@ -138,6 +185,13 @@ rmaps_init( for (i = 0; i < mp->m_sb.sb_agcount; i++) rmaps_init_ag(mp, i, &ag_rmaps[i]); + + rg_rmaps = calloc(mp->m_sb.sb_rgcount, sizeof(struct xfs_ag_rmap)); + if (!rg_rmaps) + do_error(_("couldn't allocate per-rtgroup reverse map roots\n")); + + for (i = 0; i < mp->m_sb.sb_rgcount; i++) + rmaps_init_rt(mp, i, &rg_rmaps[i]); } /* @@ -152,6 +206,11 @@ rmaps_free( if (!rmap_needs_work(mp)) return; + for (i = 0; i < mp->m_sb.sb_rgcount; i++) + rmaps_destroy(mp, &rg_rmaps[i]); + free(rg_rmaps); + rg_rmaps = NULL; + for (i = 0; i < mp->m_sb.sb_agcount; i++) rmaps_destroy(mp, &ag_rmaps[i]); free(ag_rmaps); @@ -187,26 +246,38 @@ int rmap_init_mem_cursor( struct xfs_mount *mp, struct xfs_trans *tp, + bool isrt, xfs_agnumber_t agno, struct rmap_mem_cur *rmcur) { struct xfbtree *xfbt; - struct xfs_perag *pag; + struct xfs_perag *pag = NULL; + struct xfs_rtgroup *rtg = NULL; int error; - xfbt = ag_rmaps[agno].ar_xfbtree; + xfbt = rmaps_for_group(isrt, agno)->ar_xfbtree; error = -xfbtree_head_read_buf(xfbt, tp, &rmcur->mhead_bp); if (error) return error; - pag = libxfs_perag_get(mp, agno); - rmcur->mcur = libxfs_rmapbt_mem_cursor(pag, tp, rmcur->mhead_bp, xfbt); + if (isrt) { + rtg = libxfs_rtgroup_get(mp, agno); + rmcur->mcur = libxfs_rtrmapbt_mem_cursor(rtg, tp, + rmcur->mhead_bp, xfbt); + } else { + pag = libxfs_perag_get(mp, agno); + rmcur->mcur = libxfs_rmapbt_mem_cursor(pag, tp, + rmcur->mhead_bp, xfbt); + } error = -libxfs_btree_goto_left_edge(rmcur->mcur); if (error) rmap_free_mem_cursor(tp, rmcur, error); - libxfs_perag_put(pag); + if (pag) + libxfs_perag_put(pag); + if (rtg) + libxfs_rtgroup_put(rtg); return error; } @@ -251,6 +322,7 @@ rmap_get_mem_rec( static void rmap_add_mem_rec( struct xfs_mount *mp, + bool isrt, xfs_agnumber_t agno, struct xfs_rmap_irec *rmap) { @@ -259,12 +331,12 @@ rmap_add_mem_rec( struct xfs_trans *tp; int error; - xfbt = ag_rmaps[agno].ar_xfbtree; + xfbt = rmaps_for_group(isrt, 
agno)->ar_xfbtree; error = -libxfs_trans_alloc_empty(mp, &tp); if (error) do_error(_("allocating tx for in-memory rmap update\n")); - error = rmap_init_mem_cursor(mp, tp, agno, &rmcur); + error = rmap_init_mem_cursor(mp, tp, isrt, agno, &rmcur); if (error) do_error(_("reading in-memory rmap btree head\n")); @@ -289,7 +361,8 @@ rmap_add_rec( struct xfs_mount *mp, xfs_ino_t ino, int whichfork, - struct xfs_bmbt_irec *irec) + struct xfs_bmbt_irec *irec, + bool isrt) { struct xfs_rmap_irec rmap; xfs_agnumber_t agno; @@ -298,11 +371,19 @@ rmap_add_rec( if (!rmap_needs_work(mp)) return; - agno = XFS_FSB_TO_AGNO(mp, irec->br_startblock); - agbno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock); - ASSERT(agno != NULLAGNUMBER); - ASSERT(agno < mp->m_sb.sb_agcount); - ASSERT(agbno + irec->br_blockcount <= mp->m_sb.sb_agblocks); + if (isrt) { + xfs_rgnumber_t rgno; + + agbno = xfs_rtb_to_rgbno(mp, irec->br_startblock, &rgno); + agno = rgno; + ASSERT(agbno + irec->br_blockcount <= mp->m_sb.sb_rblocks); + } else { + agno = XFS_FSB_TO_AGNO(mp, irec->br_startblock); + agbno = XFS_FSB_TO_AGBNO(mp, irec->br_startblock); + ASSERT(agno != NULLAGNUMBER); + ASSERT(agno < mp->m_sb.sb_agcount); + ASSERT(agbno + irec->br_blockcount <= mp->m_sb.sb_agblocks); + } ASSERT(ino != NULLFSINO); ASSERT(whichfork == XFS_DATA_FORK || whichfork == XFS_ATTR_FORK); @@ -316,7 +397,7 @@ rmap_add_rec( if (irec->br_state == XFS_EXT_UNWRITTEN) rmap.rm_flags |= XFS_RMAP_UNWRITTEN; - rmap_add_mem_rec(mp, agno, &rmap); + rmap_add_mem_rec(mp, isrt, agno, &rmap); } /* add a raw rmap; these will be merged later */ @@ -343,7 +424,7 @@ __rmap_add_raw_rec( rmap.rm_startblock = agbno; rmap.rm_blockcount = len; - rmap_add_mem_rec(mp, agno, &rmap); + rmap_add_mem_rec(mp, false, agno, &rmap); } /* @@ -412,6 +493,7 @@ rmap_add_agbtree_mapping( .rm_blockcount = len, }; struct xfs_perag *pag; + struct xfs_ag_rmap *x; if (!rmap_needs_work(mp)) return 0; @@ -420,7 +502,8 @@ rmap_add_agbtree_mapping( 
assert(libxfs_verify_agbext(pag, agbno, len)); libxfs_perag_put(pag); - return slab_add(ag_rmaps[agno].ar_agbtree_rmaps, &rmap); + x = rmaps_for_group(false, agno); + return slab_add(x->ar_agbtree_rmaps, &rmap); } static int @@ -536,7 +619,7 @@ rmap_commit_agbtree_mappings( struct xfs_buf *agflbp = NULL; struct xfs_trans *tp; __be32 *agfl_bno, *b; - struct xfs_ag_rmap *ag_rmap = &ag_rmaps[agno]; + struct xfs_ag_rmap *ag_rmap = rmaps_for_group(false, agno); struct bitmap *own_ag_bitmap = NULL; int error = 0; @@ -799,7 +882,7 @@ refcount_emit( int error; struct xfs_slab *rlslab; - rlslab = ag_rmaps[agno].ar_refcount_items; + rlslab = rmaps_for_group(false, agno)->ar_refcount_items; ASSERT(nr_rmaps > 0); dbg_printf("REFL: agno=%u pblk=%u, len=%u -> refcount=%zu\n", @@ -933,12 +1016,12 @@ compute_refcounts( if (!xfs_has_reflink(mp)) return 0; - if (ag_rmaps[agno].ar_xfbtree == NULL) + if (rmaps_for_group(false, agno)->ar_xfbtree == NULL) return 0; - nr_rmaps = rmap_record_count(mp, agno); + nr_rmaps = rmap_record_count(mp, false, agno); - error = rmap_init_mem_cursor(mp, NULL, agno, &rmcur); + error = rmap_init_mem_cursor(mp, NULL, false, agno, &rmcur); if (error) return error; @@ -1043,16 +1126,17 @@ count_btree_records( uint64_t rmap_record_count( struct xfs_mount *mp, + bool isrt, xfs_agnumber_t agno) { struct rmap_mem_cur rmcur; uint64_t nr = 0; int error; - if (ag_rmaps[agno].ar_xfbtree == NULL) + if (rmaps_for_group(isrt, agno)->ar_xfbtree == NULL) return 0; - error = rmap_init_mem_cursor(mp, NULL, agno, &rmcur); + error = rmap_init_mem_cursor(mp, NULL, isrt, agno, &rmcur); if (error) do_error(_("%s while reading in-memory rmap btree\n"), strerror(error)); @@ -1168,7 +1252,7 @@ rmaps_verify_btree( } /* Create cursors to rmap structures */ - error = rmap_init_mem_cursor(mp, NULL, agno, &rm_cur); + error = rmap_init_mem_cursor(mp, NULL, false, agno, &rm_cur); if (error) { do_warn(_("Not enough memory to check reverse mappings.\n")); return; @@ -1488,7 +1572,9 @@ 
refcount_record_count( struct xfs_mount *mp, xfs_agnumber_t agno) { - return slab_count(ag_rmaps[agno].ar_refcount_items); + struct xfs_ag_rmap *x = rmaps_for_group(false, agno); + + return slab_count(x->ar_refcount_items); } /* @@ -1499,7 +1585,9 @@ init_refcount_cursor( xfs_agnumber_t agno, struct xfs_slab_cursor **cur) { - return init_slab_cursor(ag_rmaps[agno].ar_refcount_items, NULL, cur); + struct xfs_ag_rmap *x = rmaps_for_group(false, agno); + + return init_slab_cursor(x->ar_refcount_items, NULL, cur); } /* @@ -1700,5 +1788,5 @@ rmap_store_agflcount( if (!rmap_needs_work(mp)) return; - ag_rmaps[agno].ar_flcount = count; + rmaps_for_group(false, agno)->ar_flcount = count; } diff --git a/repair/rmap.h b/repair/rmap.h index cb6c32af62c..008b96a38f4 100644 --- a/repair/rmap.h +++ b/repair/rmap.h @@ -15,7 +15,7 @@ extern void rmaps_init(struct xfs_mount *); extern void rmaps_free(struct xfs_mount *); void rmap_add_rec(struct xfs_mount *mp, xfs_ino_t ino, int whichfork, - struct xfs_bmbt_irec *irec); + struct xfs_bmbt_irec *irec, bool realtime); void rmap_add_bmbt_rec(struct xfs_mount *mp, xfs_ino_t ino, int whichfork, xfs_fsblock_t fsbno); bool rmaps_are_mergeable(struct xfs_rmap_irec *r1, struct xfs_rmap_irec *r2); @@ -26,7 +26,8 @@ int rmap_add_agbtree_mapping(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner); int rmap_commit_agbtree_mappings(struct xfs_mount *mp, xfs_agnumber_t agno); -uint64_t rmap_record_count(struct xfs_mount *mp, xfs_agnumber_t agno); +uint64_t rmap_record_count(struct xfs_mount *mp, bool isrt, + xfs_agnumber_t agno); extern void rmap_avoid_check(void); void rmaps_verify_btree(struct xfs_mount *mp, xfs_agnumber_t agno); @@ -54,7 +55,7 @@ struct rmap_mem_cur { }; int rmap_init_mem_cursor(struct xfs_mount *mp, struct xfs_trans *tp, - xfs_agnumber_t agno, struct rmap_mem_cur *rmcur); + bool isrt, xfs_agnumber_t agno, struct rmap_mem_cur *rmcur); void rmap_free_mem_cursor(struct xfs_trans *tp, 
struct rmap_mem_cur *rmcur, int error); int rmap_get_mem_rec(struct rmap_mem_cur *rmcur, struct xfs_rmap_irec *irec); ^ permalink raw reply related [flat|nested] 565+ messages in thread
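The core of this patch is the rmaps_for_group() selector over two parallel arrays. A stripped-down sketch of that pattern follows; the toy_ names and the nr_records field are invented for illustration and do not correspond to fields of struct xfs_ag_rmap. One detail in the sketch is my own choice rather than something taken from the patch: since calloc() with a zero count is permitted to return NULL, the toy init only treats NULL as an allocation failure when the count is nonzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical per-group anchor; the real one holds an xfbtree and slabs. */
struct toy_group_rmap {
	unsigned long long	nr_records;
};

static struct toy_group_rmap	*toy_ag_rmaps;	/* one per data AG */
static struct toy_group_rmap	*toy_rg_rmaps;	/* one per rt group */

/* Pick the right array; callers pass the same (isrt, group) pair everywhere. */
static struct toy_group_rmap *
toy_rmaps_for_group(bool isrt, unsigned int group)
{
	if (isrt)
		return &toy_rg_rmaps[group];
	return &toy_ag_rmaps[group];
}

/* Allocate both arrays.  Returns 0 on success, -1 on allocation failure. */
static int
toy_rmaps_init(unsigned int agcount, unsigned int rgcount)
{
	toy_ag_rmaps = calloc(agcount, sizeof(*toy_ag_rmaps));
	if (!toy_ag_rmaps && agcount > 0)
		return -1;

	toy_rg_rmaps = calloc(rgcount, sizeof(*toy_rg_rmaps));
	if (!toy_rg_rmaps && rgcount > 0) {
		free(toy_ag_rmaps);
		toy_ag_rmaps = NULL;
		return -1;
	}
	return 0;
}

/* Tear down in the reverse order of setup. */
static void
toy_rmaps_free(void)
{
	free(toy_rg_rmaps);
	toy_rg_rmaps = NULL;
	free(toy_ag_rmaps);
	toy_ag_rmaps = NULL;
}
```

Routing every access through one helper is what lets the patch convert the existing ag_rmaps[agno] users mechanically to rmaps_for_group(false, agno) while the realtime callers pass true.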
* [PATCH 23/41] xfs_db: copy the realtime rmap btree 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (28 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 30/41] xfs_repair: create a new set of incore rmap information for rt groups Darrick J. Wong @ 2022-12-30 22:19 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 36/41] xfs_repair: rebuild " Darrick J. Wong ` (10 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:19 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Copy the realtime rmapbt when we're metadumping the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/metadump.c | 125 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 125 insertions(+) diff --git a/db/metadump.c b/db/metadump.c index 15e3b302d61..e8663e11a3f 100644 --- a/db/metadump.c +++ b/db/metadump.c @@ -597,6 +597,54 @@ copy_rmap_btree( return scan_btree(agno, root, levels, TYP_RMAPBT, agf, scanfunc_rmapbt); } +static int +scanfunc_rtrmapbt( + struct xfs_btree_block *block, + xfs_agnumber_t agno, + xfs_agblock_t agbno, + int level, + typnm_t btype, + void *arg) +{ + xfs_rtrmap_ptr_t *pp; + int i; + int numrecs; + + if (level == 0) + return 1; + + numrecs = be16_to_cpu(block->bb_numrecs); + if (numrecs > mp->m_rtrmap_mxr[1]) { + if (show_warnings) + print_warning("invalid numrecs (%u) in %s block %u/%u", + numrecs, typtab[btype].name, agno, agbno); + return 1; + } + + pp = xfs_rtrmap_ptr_addr(block, 1, mp->m_rtrmap_mxr[1]); + for (i = 0; i < numrecs; i++) { + xfs_agnumber_t pagno; + xfs_agblock_t pbno; + + pagno = XFS_FSB_TO_AGNO(mp, get_unaligned_be64(&pp[i])); + pbno = XFS_FSB_TO_AGBNO(mp, get_unaligned_be64(&pp[i])); + + if (pbno == 0 || pbno > mp->m_sb.sb_agblocks || + pagno > mp->m_sb.sb_agcount) { + if (show_warnings) + print_warning("invalid block number (%u/%u) " + "in inode %llu %s block %u/%u", + 
pagno, pbno, (long long)cur_ino, + typtab[btype].name, agno, agbno); + continue; + } + if (!scan_btree(pagno, pbno, level, btype, arg, + scanfunc_rtrmapbt)) + return 0; + } + return 1; +} + static int scanfunc_refcntbt( struct xfs_btree_block *block, @@ -2336,6 +2384,80 @@ process_exinode( whichfork), nex, itype, is_meta); } +static int +process_rtrmap( + struct xfs_dinode *dip, + typnm_t itype) +{ + struct xfs_rtrmap_root *dib; + int i; + xfs_rtrmap_ptr_t *pp; + int level; + int nrecs; + int maxrecs; + int whichfork; + typnm_t btype; + + if (itype == TYP_ATTR && show_warnings) { + print_warning("ignoring rtrmapbt root in inode %llu attr fork", + (long long)cur_ino); + return 1; + } + + whichfork = XFS_DATA_FORK; + btype = TYP_RTRMAPBT; + + dib = (struct xfs_rtrmap_root *)XFS_DFORK_PTR(dip, whichfork); + level = be16_to_cpu(dib->bb_level); + nrecs = be16_to_cpu(dib->bb_numrecs); + + if (level > mp->m_rtrmap_maxlevels) { + if (show_warnings) + print_warning("invalid level (%u) in inode %lld %s " + "root", level, (long long)cur_ino, + typtab[btype].name); + return 1; + } + + if (level == 0) + return 1; + + maxrecs = libxfs_rtrmapbt_droot_maxrecs( + XFS_DFORK_SIZE(dip, mp, whichfork), + false); + if (nrecs > maxrecs) { + if (show_warnings) + print_warning("invalid numrecs (%u) in inode %lld %s " + "root", nrecs, (long long)cur_ino, + typtab[btype].name); + return 1; + } + + pp = xfs_rtrmap_droot_ptr_addr(dib, 1, maxrecs); + for (i = 0; i < nrecs; i++) { + xfs_agnumber_t ag; + xfs_agblock_t bno; + + ag = XFS_FSB_TO_AGNO(mp, get_unaligned_be64(&pp[i])); + bno = XFS_FSB_TO_AGBNO(mp, get_unaligned_be64(&pp[i])); + + if (bno == 0 || bno > mp->m_sb.sb_agblocks || + ag > mp->m_sb.sb_agcount) { + if (show_warnings) + print_warning("invalid block number (%u/%u) " + "in inode %llu %s root", ag, + bno, (long long)cur_ino, + typtab[btype].name); + continue; + } + + if (!scan_btree(ag, bno, level, btype, &itype, + scanfunc_rtrmapbt)) + return 0; + } + return 1; +} + static int 
process_inode_data( struct xfs_dinode *dip, @@ -2380,6 +2502,9 @@ process_inode_data( case XFS_DINODE_FMT_BTREE: return process_btinode(dip, itype); + + case XFS_DINODE_FMT_RMAP: + return process_rtrmap(dip, itype); } return 1; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
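Both scanfunc_rtrmapbt() and process_rtrmap() split each 64-bit btree pointer into an AG number and an AG block before deciding whether to follow it. The sketch below illustrates that step in isolation. It assumes the usual shift-and-mask encoding behind XFS_FSB_TO_AGNO and XFS_FSB_TO_AGBNO, uses a hypothetical struct toy_sb in place of the real superblock, and copies the comparison operators from the patch as posted rather than proposing different bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the superblock geometry. */
struct toy_sb {
	uint32_t	agcount;	/* number of allocation groups */
	uint32_t	agblocks;	/* blocks per allocation group */
	uint8_t		agblklog;	/* log2 of agblocks, rounded up */
};

/* High bits of a filesystem block number select the AG. */
static uint32_t
toy_fsb_to_agno(const struct toy_sb *sb, uint64_t fsb)
{
	return (uint32_t)(fsb >> sb->agblklog);
}

/* Low agblklog bits give the block offset inside that AG. */
static uint32_t
toy_fsb_to_agbno(const struct toy_sb *sb, uint64_t fsb)
{
	return (uint32_t)(fsb & ((1ULL << sb->agblklog) - 1));
}

/*
 * Mirror of the pointer sanity test in the metadump hunks: reject block 0
 * and anything beyond the AG size or AG count.
 */
static bool
toy_ptr_plausible(const struct toy_sb *sb, uint64_t fsb)
{
	uint32_t	ag = toy_fsb_to_agno(sb, fsb);
	uint32_t	bno = toy_fsb_to_agbno(sb, fsb);

	if (bno == 0 || bno > sb->agblocks || ag > sb->agcount)
		return false;
	return true;
}
```

In the patch a pointer that fails this test is only warned about and skipped with continue, so metadump keeps copying whatever other subtrees it can reach.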
* [PATCH 36/41] xfs_repair: rebuild the realtime rmap btree 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (29 preceding siblings ...) 2022-12-30 22:19 ` [PATCH 23/41] xfs_db: copy the realtime rmap btree Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 32/41] xfs_repair: refactor realtime inode check Darrick J. Wong ` (9 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Rebuild the realtime rmap btree file from the reverse mapping records we gathered from walking the inodes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 4 + repair/Makefile | 1 repair/phase6.c | 131 ++++++++++++++++++++++++ repair/rmap.h | 1 repair/rtrmap_repair.c | 253 ++++++++++++++++++++++++++++++++++++++++++++++ repair/xfs_repair.c | 8 + 6 files changed, 396 insertions(+), 2 deletions(-) create mode 100644 repair/rtrmap_repair.c diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 2a07a717215..ee864911e5e 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -228,6 +228,7 @@ #define xfs_rmap_irec_offset_unpack libxfs_rmap_irec_offset_unpack #define xfs_rmap_lookup_le libxfs_rmap_lookup_le #define xfs_rmap_lookup_le_range libxfs_rmap_lookup_le_range +#define xfs_rmap_map_extent libxfs_rmap_map_extent #define xfs_rmap_map_raw libxfs_rmap_map_raw #define xfs_rmap_query_all libxfs_rmap_query_all #define xfs_rmap_query_range libxfs_rmap_query_range @@ -245,6 +246,8 @@ #define xfs_rtgroup_put libxfs_rtgroup_put #define xfs_rtgroup_update_secondary_sbs libxfs_rtgroup_update_secondary_sbs #define xfs_rtgroup_update_super libxfs_rtgroup_update_super +#define xfs_rtrmapbt_commit_staged_btree libxfs_rtrmapbt_commit_staged_btree +#define xfs_rtrmapbt_create libxfs_rtrmapbt_create #define 
xfs_rtrmapbt_create_path libxfs_rtrmapbt_create_path #define xfs_rtrmapbt_droot_maxrecs libxfs_rtrmapbt_droot_maxrecs #define xfs_rtrmapbt_maxlevels_ondisk libxfs_rtrmapbt_maxlevels_ondisk @@ -252,6 +255,7 @@ #define xfs_rtrmapbt_maxrecs libxfs_rtrmapbt_maxrecs #define xfs_rtrmapbt_mem_create libxfs_rtrmapbt_mem_create #define xfs_rtrmapbt_mem_cursor libxfs_rtrmapbt_mem_cursor +#define xfs_rtrmapbt_stage_cursor libxfs_rtrmapbt_stage_cursor #define xfs_sb_from_disk libxfs_sb_from_disk #define xfs_sb_quota_from_disk libxfs_sb_quota_from_disk diff --git a/repair/Makefile b/repair/Makefile index 250c86cca2d..c7e09732800 100644 --- a/repair/Makefile +++ b/repair/Makefile @@ -70,6 +70,7 @@ CFILES = \ rcbag.c \ rmap.c \ rt.c \ + rtrmap_repair.c \ sb.c \ scan.c \ slab.c \ diff --git a/repair/phase6.c b/repair/phase6.c index 1dbd600915d..d5381e1eddc 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -18,6 +18,8 @@ #include "dinode.h" #include "progress.h" #include "versions.h" +#include "slab.h" +#include "rmap.h" static xfs_ino_t orphanage_ino; @@ -1033,6 +1035,121 @@ _("Couldn't find realtime summary parent, error %d\n"), libxfs_irele(ip); } +static void +ensure_rtgroup_rmapbt( + struct xfs_rtgroup *rtg) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_trans *tp; + struct xfs_imeta_path *path; + struct xfs_inode *ip; + struct xfs_imeta_update upd; + xfs_ino_t ino; + int error; + + if (!xfs_has_rtrmapbt(mp)) + return; + + ino = rtgroup_rmap_ino(rtg); + if (no_modify) { + if (ino == NULLFSINO) + do_warn(_("would reset rtgroup %u rmap btree\n"), + rtg->rtg_rgno); + return; + } + + if (ino == NULLFSINO) + do_warn(_("resetting rtgroup %u rmap btree\n"), + rtg->rtg_rgno); + + error = -libxfs_rtrmapbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + do_error( +_("Couldn't create rtgroup %u rmap file path, err %d\n"), + rtg->rtg_rgno, error); + + error = ensure_imeta_dirpath(mp, path); + if (error) + do_error( +_("Couldn't create rtgroup %u metadata directory, 
error %d\n"), + rtg->rtg_rgno, error); + + error = -libxfs_imeta_start_update(mp, path, &upd); + if (error) + do_error( +_("Couldn't find rtgroup %u rmap inode parent, error %d\n"), + rtg->rtg_rgno, error); + + /* Create a transaction for whatever work we end up doing. */ + error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_imeta_create, + libxfs_imeta_create_space_res(mp), 0, 0, &tp); + if (error) + do_error( +_("Couldn't prepare to attach rtgroup %u rmap inode, error %d\n"), + rtg->rtg_rgno, error); + + if (ino != NULLFSINO) { + /* + * We're still hanging on to our old inode, so reconnect it to + * the metadata directory tree. + */ + error = -libxfs_imeta_iget(mp, ino, XFS_DIR3_FT_REG_FILE, &ip); + if (error) { + do_warn( +_("Couldn't iget rtgroup %u rmap inode 0x%llx, error %d\n"), + rtg->rtg_rgno, (unsigned long long)ino, + error); + goto zap; + } + + error = -libxfs_imeta_link(tp, path, ip, &upd); + if (error) + do_error( +_("Failed to link rtgroup %u rmapbt inode 0x%llx, error %d\n"), + rtg->rtg_rgno, + (unsigned long long)ip->i_ino, + error); + + set_nlink(VFS_I(ip), 1); + ip->i_df.if_format = XFS_DINODE_FMT_RMAP; + libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE); + } else { +zap: + /* + * The rtrmap inode was bad or gone, so just make a new one + * and give our reference to the rtgroup structure. + */ + error = -libxfs_rtrmapbt_create(&tp, path, &upd, &ip); + if (error) + do_error( +_("Couldn't create rtgroup %u rmap inode, error %d\n"), + rtg->rtg_rgno, error); + } + + error = -libxfs_trans_commit(tp); + if (error) + do_error( +_("Couldn't commit new rtgroup %u rmap inode %llu, error %d\n"), + rtg->rtg_rgno, + (unsigned long long)ip->i_ino, + error); + + /* Mark the inode in use. */ + mark_ino_inuse(mp, ip->i_ino, S_IFREG, upd.dp->i_ino); + mark_ino_metadata(mp, ip->i_ino); + libxfs_imeta_end_update(mp, &upd, error); + + /* Copy our incore rmap data to the ondisk rmap inode. 
*/ + error = populate_rtgroup_rmapbt(rtg, ip); + if (error) + do_error( +_("rtgroup %u rmap btree could not be rebuilt, error %d\n"), + rtg->rtg_rgno, error); + + libxfs_imeta_free_path(path); + libxfs_imeta_irele(ip); +} + /* Initialize a root directory. */ static int init_fs_root_dir( @@ -3684,6 +3801,18 @@ traverse_ags( do_inode_prefetch(mp, ag_stride, traverse_function, false, true); } +static void +reset_rt_metadata_inodes( + struct xfs_mount *mp) +{ + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + + for_each_rtgroup(mp, rgno, rtg) { + ensure_rtgroup_rmapbt(rtg); + } +} + void phase6(xfs_mount_t *mp) { @@ -3747,6 +3876,8 @@ phase6(xfs_mount_t *mp) } } + reset_rt_metadata_inodes(mp); + if (!no_modify) { do_log( _(" - resetting contents of realtime bitmap and summary inodes\n")); diff --git a/repair/rmap.h b/repair/rmap.h index c7c09046a8a..64a85b32341 100644 --- a/repair/rmap.h +++ b/repair/rmap.h @@ -64,5 +64,6 @@ int rmap_get_mem_rec(struct rmap_mem_cur *rmcur, struct xfs_rmap_irec *irec); bool is_rtrmap_inode(xfs_ino_t ino); xfs_ino_t rtgroup_rmap_ino(struct xfs_rtgroup *rtg); +int populate_rtgroup_rmapbt(struct xfs_rtgroup *rtg, struct xfs_inode *ip); #endif /* RMAP_H_ */ diff --git a/repair/rtrmap_repair.c b/repair/rtrmap_repair.c new file mode 100644 index 00000000000..d9a0083e298 --- /dev/null +++ b/repair/rtrmap_repair.c @@ -0,0 +1,253 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. 
Wong <djwong@kernel.org> + */ +#include <libxfs.h> +#include "btree.h" +#include "err_protos.h" +#include "libxlog.h" +#include "incore.h" +#include "globals.h" +#include "dinode.h" +#include "slab.h" +#include "rmap.h" +#include "bulkload.h" + +/* Ported routines from fs/xfs/scrub/rtrmap_repair.c */ + +/* + * Realtime Reverse Mapping (RTRMAPBT) Repair + * ========================================== + * + * Gather all the rmap records for the inode and fork we're fixing, reset the + * incore fork, then recreate the btree. + */ +struct xrep_rtrmap { + struct rmap_mem_cur btree_cursor; + + /* New fork. */ + struct bulkload new_fork_info; + struct xfs_btree_bload rtrmap_bload; + + struct repair_ctx *sc; + struct xfs_rtgroup *rtg; +}; + +/* Retrieve rtrmapbt data for bulk load. */ +STATIC int +xrep_rtrmap_get_records( + struct xfs_btree_cur *cur, + unsigned int idx, + struct xfs_btree_block *block, + unsigned int nr_wanted, + void *priv) +{ + struct xrep_rtrmap *rr = priv; + union xfs_btree_rec *block_rec; + unsigned int loaded; + int ret; + + for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { + ret = rmap_get_mem_rec(&rr->btree_cursor, &cur->bc_rec.r); + if (ret < 0) + return ret; + if (ret == 0) + do_error( + _("ran out of records while rebuilding rt rmap btree\n")); + + block_rec = libxfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + } + + return loaded; +} + +/* Feed one of the new btree blocks to the bulk loader. */ +STATIC int +xrep_rtrmap_claim_block( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr, + void *priv) +{ + struct xrep_rtrmap *rr = priv; + + return bulkload_claim_block(cur, &rr->new_fork_info, ptr); +} + +/* Figure out how much space we need to create the incore btree root block. 
*/ +STATIC size_t +xrep_rtrmap_iroot_size( + struct xfs_btree_cur *cur, + unsigned int level, + unsigned int nr_this_level, + void *priv) +{ + return xfs_rtrmap_broot_space_calc(cur->bc_mp, level, nr_this_level); +} + +/* Reserve new btree blocks and bulk load all the rtrmap records. */ +STATIC int +xrep_rtrmap_btree_load( + struct xrep_rtrmap *rr, + struct xfs_btree_cur *rtrmap_cur) +{ + struct repair_ctx *sc = rr->sc; + int error; + + rr->rtrmap_bload.get_records = xrep_rtrmap_get_records; + rr->rtrmap_bload.claim_block = xrep_rtrmap_claim_block; + rr->rtrmap_bload.iroot_size = xrep_rtrmap_iroot_size; + bulkload_estimate_inode_slack(sc->mp, &rr->rtrmap_bload); + + /* Compute how many blocks we'll need. */ + error = -libxfs_btree_bload_compute_geometry(rtrmap_cur, + &rr->rtrmap_bload, + rmap_record_count(sc->mp, true, rr->rtg->rtg_rgno)); + if (error) + return error; + + /* + * Guess how many blocks we're going to need to rebuild an entire rtrmap + * from the number of extents we found, and pump up our transaction to + * have sufficient block reservation. + */ + error = -libxfs_trans_reserve_more(sc->tp, rr->rtrmap_bload.nr_blocks, + 0); + if (error) + return error; + + /* + * Reserve the space we'll need for the new btree. Drop the cursor + * while we do this because that can roll the transaction and cursors + * can't handle that. + */ + error = bulkload_alloc_blocks(&rr->new_fork_info, + rr->rtrmap_bload.nr_blocks); + if (error) + return error; + + /* Add all observed rtrmap records. */ + error = rmap_init_mem_cursor(rr->sc->mp, sc->tp, true, + rr->rtg->rtg_rgno, &rr->btree_cursor); + if (error) + return error; + error = -libxfs_btree_bload(rtrmap_cur, &rr->rtrmap_bload, rr); + rmap_free_mem_cursor(sc->tp, &rr->btree_cursor, error); + return error; +} + +/* Update the inode counters. 
*/ +STATIC int +xrep_rtrmap_reset_counters( + struct xrep_rtrmap *rr) +{ + struct repair_ctx *sc = rr->sc; + + /* + * Update the inode block counts to reflect the btree we just + * generated. + */ + sc->ip->i_nblocks = rr->new_fork_info.ifake.if_blocks; + libxfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE); + + /* Quotas don't exist so we're done. */ + return 0; +} + +/* + * Use the collected rmap information to stage a new rt rmap btree. If this is + * successful we'll return with the new btree root information logged to the + * repair transaction but not yet committed. + */ +static int +xrep_rtrmap_build_new_tree( + struct xrep_rtrmap *rr) +{ + struct xfs_owner_info oinfo; + struct xfs_btree_cur *cur; + struct repair_ctx *sc = rr->sc; + struct xbtree_ifakeroot *ifake = &rr->new_fork_info.ifake; + int error; + + /* + * Prepare to construct the new fork by initializing the new btree + * structure and creating a fake ifork in the ifakeroot structure. + */ + libxfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, XFS_DATA_FORK); + bulkload_init_inode(&rr->new_fork_info, sc, XFS_DATA_FORK, &oinfo); + cur = libxfs_rtrmapbt_stage_cursor(sc->mp, rr->rtg, sc->ip, ifake); + + /* + * Figure out the size and format of the new fork, then fill it with + * all the rtrmap records we've found. Join the inode to the + * transaction so that we can roll the transaction while holding the + * inode locked. + */ + libxfs_trans_ijoin(sc->tp, sc->ip, 0); + ifake->if_fork->if_format = XFS_DINODE_FMT_RMAP; + error = xrep_rtrmap_btree_load(rr, cur); + if (error) + goto err_cur; + + /* + * Install the new fork in the inode. After this point the old mapping + * data are no longer accessible and the new tree is live. We delete + * the cursor immediately after committing the staged root because the + * staged fork might be in extents format. + */ + libxfs_rtrmapbt_commit_staged_btree(cur, sc->tp); + libxfs_btree_del_cursor(cur, 0); + + /* Reset the inode counters now that we've changed the fork. 
*/ + error = xrep_rtrmap_reset_counters(rr); + if (error) + goto err_newbt; + + /* Dispose of any unused blocks and the accounting information. */ + bulkload_destroy(&rr->new_fork_info, error); + + return -libxfs_trans_roll_inode(&sc->tp, sc->ip); +err_cur: + if (cur) + libxfs_btree_del_cursor(cur, error); +err_newbt: + bulkload_destroy(&rr->new_fork_info, error); + return error; +} + +/* Store the realtime reverse-mappings in the rtrmapbt. */ +int +populate_rtgroup_rmapbt( + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct repair_ctx sc = { + .mp = rtg->rtg_mount, + .ip = ip, + }; + struct xrep_rtrmap rr = { + .sc = &sc, + .rtg = rtg, + }; + struct xfs_mount *mp = rtg->rtg_mount; + int error; + + if (!xfs_has_rtrmapbt(mp)) + return 0; + + error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, + &sc.tp); + if (error) + return error; + + error = xrep_rtrmap_build_new_tree(&rr); + if (error) + goto out_cancel; + + return -libxfs_trans_commit(sc.tp); + +out_cancel: + libxfs_trans_cancel(sc.tp); + return error; +} diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c index 13e1f2deccf..78129218877 100644 --- a/repair/xfs_repair.c +++ b/repair/xfs_repair.c @@ -1381,13 +1381,17 @@ main(int argc, char **argv) rcbagbt_destroy_cur_cache(); /* - * Done with the block usage maps, toss them... + * Done with the block usage maps, toss them. Realtime metadata aren't + * rebuilt until phase 6, so we have to keep them around. */ - rmaps_free(mp); + if (mp->m_sb.sb_rblocks == 0) + rmaps_free(mp); free_bmaps(mp); if (!bad_ino_btree) { phase6(mp); + if (mp->m_sb.sb_rblocks != 0) + rmaps_free(mp); phase_end(mp, 6); phase7(mp, phase2_threads); ^ permalink raw reply related [flat|nested] 565+ messages in thread
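For readers following the rebuild patch above, the record-fetching callback handed to the bulk loader can be sketched in isolation. This is only an illustration under stated assumptions: `struct mem_cursor`, `mem_cursor_next` and this `get_records` are hypothetical stand-ins rather than the libxfs or xfs_repair interfaces, records are reduced to plain integers, and slots are indexed from zero. The patch treats running out of records as fatal via do_error(); the sketch returns -1 instead so the behaviour can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory cursor over the records observed during the scan. */
struct mem_cursor {
	const uint64_t	*recs;	/* observed records, already sorted */
	size_t		nr;	/* how many records exist */
	size_t		pos;	/* next record to hand out */
};

/* Return 1 and fill *out if a record is available, 0 at the end. */
int
mem_cursor_next(struct mem_cursor *cur, uint64_t *out)
{
	if (cur->pos >= cur->nr)
		return 0;
	*out = cur->recs[cur->pos++];
	return 1;
}

/*
 * Copy nr_wanted records into block[idx], block[idx + 1], and so on.  The
 * loader sized the tree from the record count, so running dry early means
 * the count and the cursor disagree; report -1 rather than a short load.
 */
int
get_records(struct mem_cursor *cur, uint64_t *block, unsigned int idx,
	    unsigned int nr_wanted)
{
	unsigned int	loaded;

	assert(cur != NULL && block != NULL);

	for (loaded = 0; loaded < nr_wanted; loaded++, idx++) {
		if (!mem_cursor_next(cur, &block[idx]))
			return -1;
	}
	return (int)loaded;
}
```

The caller asks for exactly as many records as fit in the block it is filling, so a short read can only mean the geometry computation and the record source are out of sync.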
* [PATCH 32/41] xfs_repair: refactor realtime inode check 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (30 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 36/41] xfs_repair: rebuild " Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 33/41] xfs_repair: find and mark the rtrmapbt inodes Darrick J. Wong ` (8 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Refactor the realtime bitmap and summary checks into a helper function. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dinode.c | 84 ++++++++++++++++++++++++++----------------------------- 1 file changed, 39 insertions(+), 45 deletions(-) diff --git a/repair/dinode.c b/repair/dinode.c index 28eb639bbb8..3e55434c849 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -1736,6 +1736,39 @@ check_dinode_mode_format( return 0; /* invalid modes are checked elsewhere */ } +static int +process_check_rt_inode( + struct xfs_mount *mp, + struct xfs_dinode *dinoc, + xfs_ino_t lino, + int *type, + int *dirty, + int expected_type, + const char *tag) +{ + xfs_extnum_t dnextents = xfs_dfork_data_extents(dinoc); + + if (*type != expected_type) { + do_warn( +_("%s inode %" PRIu64 " has bad type 0x%x, "), + tag, lino, dinode_fmt(dinoc)); + if (!no_modify) { + do_warn(_("resetting to regular file\n")); + change_dinode_fmt(dinoc, S_IFREG); + *dirty = 1; + } else { + do_warn(_("would reset to regular file\n")); + } + } + if (mp->m_sb.sb_rblocks == 0 && dnextents != 0) { + do_warn( +_("bad # of extents (%" PRIu64 ") for %s inode %" PRIu64 "\n"), + dnextents, tag, lino); + return 1; + } + return 0; +} + /* * If inode is a superblock inode, does type check to make sure is it valid. * Returns 0 if it's valid, non-zero if it needs to be cleared. 
@@ -1749,8 +1782,6 @@ process_check_sb_inodes( int *type, int *dirty) { - xfs_extnum_t dnextents; - if (lino == mp->m_sb.sb_rootino) { if (*type != XR_INO_DIR) { do_warn(_("root inode %" PRIu64 " has bad type 0x%x\n"), @@ -1792,49 +1823,12 @@ process_check_sb_inodes( } return 0; } - dnextents = xfs_dfork_data_extents(dinoc); - if (lino == mp->m_sb.sb_rsumino) { - if (*type != XR_INO_RTSUM) { - do_warn( -_("realtime summary inode %" PRIu64 " has bad type 0x%x, "), - lino, dinode_fmt(dinoc)); - if (!no_modify) { - do_warn(_("resetting to regular file\n")); - change_dinode_fmt(dinoc, S_IFREG); - *dirty = 1; - } else { - do_warn(_("would reset to regular file\n")); - } - } - if (mp->m_sb.sb_rblocks == 0 && dnextents != 0) { - do_warn( -_("bad # of extents (%" PRIu64 ") for realtime summary inode %" PRIu64 "\n"), - dnextents, lino); - return 1; - } - return 0; - } - if (lino == mp->m_sb.sb_rbmino) { - if (*type != XR_INO_RTBITMAP) { - do_warn( -_("realtime bitmap inode %" PRIu64 " has bad type 0x%x, "), - lino, dinode_fmt(dinoc)); - if (!no_modify) { - do_warn(_("resetting to regular file\n")); - change_dinode_fmt(dinoc, S_IFREG); - *dirty = 1; - } else { - do_warn(_("would reset to regular file\n")); - } - } - if (mp->m_sb.sb_rblocks == 0 && dnextents != 0) { - do_warn( -_("bad # of extents (%" PRIu64 ") for realtime bitmap inode %" PRIu64 "\n"), - dnextents, lino); - return 1; - } - return 0; - } + if (lino == mp->m_sb.sb_rsumino) + return process_check_rt_inode(mp, dinoc, lino, type, dirty, + XR_INO_RTSUM, _("realtime summary")); + if (lino == mp->m_sb.sb_rbmino) + return process_check_rt_inode(mp, dinoc, lino, type, dirty, + XR_INO_RTBITMAP, _("realtime bitmap")); return 0; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
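The shape of the refactoring in the patch above, two near-identical checks folded into one helper that takes the expected type and a label, can be shown with a small standalone sketch. Everything here is an assumption made for illustration: `enum ino_type`, `struct check_ctx` and `check_rt_inode` are hypothetical and do not match the xfs_repair types or signatures, and the on-disk mode rewrite is reduced to setting a dirty flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for repair's XR_INO_* type codes. */
enum ino_type { INO_DATA, INO_RTBITMAP, INO_RTSUM };

struct check_ctx {
	bool		no_modify;	/* dry run, like xfs_repair -n */
	uint64_t	rblocks;	/* realtime volume size; 0 if absent */
};

/*
 * Shared check for a realtime metadata inode.  Returns 1 if the inode
 * should be cleared, 0 otherwise; sets *dirty when a fix was applied.
 */
int
check_rt_inode(const struct check_ctx *ctx, uint64_t ino, enum ino_type type,
	       uint64_t nextents, enum ino_type expected, const char *tag,
	       int *dirty)
{
	assert(ctx != NULL && tag != NULL && dirty != NULL);

	if (type != expected) {
		fprintf(stderr, "%s inode %llu has bad type %d, ", tag,
			(unsigned long long)ino, (int)type);
		if (!ctx->no_modify) {
			fprintf(stderr, "resetting to regular file\n");
			*dirty = 1;
		} else {
			fprintf(stderr, "would reset to regular file\n");
		}
	}

	/* Without a realtime volume these inodes cannot map any extents. */
	if (ctx->rblocks == 0 && nextents != 0) {
		fprintf(stderr, "bad # of extents (%llu) for %s inode %llu\n",
			(unsigned long long)nextents, tag,
			(unsigned long long)ino);
		return 1;
	}
	return 0;
}
```

Each caller then shrinks to a single call that differs only in the expected type and the tag, which is what lets a later patch add a third inode class with one more call.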
* [PATCH 33/41] xfs_repair: find and mark the rtrmapbt inodes 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (31 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 32/41] xfs_repair: refactor realtime inode check Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 34/41] xfs_repair: check existing realtime rmapbt entries against observed rmaps Darrick J. Wong ` (7 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make sure that we find the realtime rmapbt inodes and mark them appropriately, just in case we find a rogue inode claiming to be an rtrmap, or garbage in the metadata directory tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dino_chunks.c | 13 ++++++ repair/dinode.c | 34 ++++++++++++++- repair/dir2.c | 4 ++ repair/incore.h | 1 repair/rmap.c | 111 +++++++++++++++++++++++++++++++++++++++++++++++++- repair/rmap.h | 5 ++ repair/scan.c | 8 ++-- 7 files changed, 167 insertions(+), 9 deletions(-) diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c index c68d92a4d88..277f21c6936 100644 --- a/repair/dino_chunks.c +++ b/repair/dino_chunks.c @@ -15,6 +15,8 @@ #include "versions.h" #include "prefetch.h" #include "progress.h" +#include "slab.h" +#include "rmap.h" /* * validates inode block or chunk, returns # of good inodes @@ -1014,6 +1016,17 @@ process_inode_chunk( _("would clear realtime summary inode %" PRIu64 "\n"), ino); } + } else if (is_rtrmap_inode(ino)) { + rmap_avoid_check(mp); + if (!no_modify) { + do_warn( + _("cleared realtime rmap inode %" PRIu64 "\n"), + ino); + } else { + do_warn( + _("would clear realtime rmap inode %" PRIu64 "\n"), + ino); + } } else if (!no_modify) { do_warn(_("cleared inode %" PRIu64 "\n"), ino); diff --git a/repair/dinode.c b/repair/dinode.c index 3e55434c849..782a36172ad 
100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -153,6 +153,9 @@ clear_dinode(xfs_mount_t *mp, struct xfs_dinode *dino, xfs_ino_t ino_num) clear_dinode_core(mp, dino, ino_num); clear_dinode_unlinked(mp, dino); + if (is_rtrmap_inode(ino_num)) + rmap_avoid_check(mp); + /* and clear the forks */ memset(XFS_DFORK_DPTR(dino), 0, XFS_LITINO(mp)); return; @@ -823,13 +826,22 @@ process_rtrmap( lino = XFS_AGINO_TO_INO(mp, agno, ino); - /* This rmap btree inode must be a metadata inode. */ + /* + * This rmap btree inode must be a metadata inode reachable via + * /realtime/$rgno.rmap in the metadata directory tree. + */ if (!(dip->di_flags2 & be64_to_cpu(XFS_DIFLAG2_METADATA))) { do_warn( _("rtrmap inode %" PRIu64 " not flagged as metadata\n"), lino); return 1; } + if (type != XR_INO_RTRMAP) { + do_warn( +_("rtrmap inode %" PRIu64 " was not found in the metadata directory tree\n"), + lino); + return 1; + } memset(&priv.high_key, 0xFF, sizeof(priv.high_key)); priv.high_key.rm_blockcount = 0; @@ -867,7 +879,7 @@ _("computed size of rtrmapbt root (%zu bytes) is greater than space in " error = process_rtrmap_reclist(mp, rp, numrecs, &priv.last_rec, NULL, "rtrmapbt root"); if (error) { - rmap_avoid_check(); + rmap_avoid_check(mp); return 1; } return 0; @@ -1829,6 +1841,9 @@ process_check_sb_inodes( if (lino == mp->m_sb.sb_rbmino) return process_check_rt_inode(mp, dinoc, lino, type, dirty, XR_INO_RTBITMAP, _("realtime bitmap")); + if (is_rtrmap_inode(lino)) + return process_check_rt_inode(mp, dinoc, lino, type, dirty, + XR_INO_RTRMAP, _("realtime rmap btree")); return 0; } @@ -1926,6 +1941,18 @@ _("realtime summary inode %" PRIu64 " has bad size %" PRId64 " (should be %d)\n" } break; + case XR_INO_RTRMAP: + /* + * if we have no rmapbt, any inode claiming + * to be a realtime rmap btree file is bogus + */ + if (!xfs_has_rmapbt(mp)) { + do_warn( +_("found inode %" PRIu64 " claiming to be a rtrmapbt file, but rmapbt is disabled\n"), lino); + return 1; + } + break; + default: break; } @@
-3046,6 +3073,8 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"), type = XR_INO_GQUOTA; else if (lino == mp->m_sb.sb_pquotino) type = XR_INO_PQUOTA; + else if (is_rtrmap_inode(lino)) + type = XR_INO_RTRMAP; else type = XR_INO_DATA; break; @@ -3151,6 +3180,7 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "), case XR_INO_UQUOTA: case XR_INO_GQUOTA: case XR_INO_PQUOTA: + case XR_INO_RTRMAP: /* * This inode was recognized as being filesystem * metadata, so preserve the inode and its contents for diff --git a/repair/dir2.c b/repair/dir2.c index e1fb195df34..4c59ad071de 100644 --- a/repair/dir2.c +++ b/repair/dir2.c @@ -15,6 +15,8 @@ #include "da_util.h" #include "prefetch.h" #include "progress.h" +#include "slab.h" +#include "rmap.h" /* * Known bad inode list. These are seen when the leaf and node @@ -153,6 +155,8 @@ is_meta_ino( reason = _("realtime bitmap"); else if (lino == mp->m_sb.sb_rsumino) reason = _("realtime summary"); + else if (is_rtrmap_inode(lino)) + reason = _("realtime rmap"); else if (lino == mp->m_sb.sb_uquotino) reason = _("user quota"); else if (lino == mp->m_sb.sb_gquotino) diff --git a/repair/incore.h b/repair/incore.h index c31b778a0fb..3c0e4ea2b29 100644 --- a/repair/incore.h +++ b/repair/incore.h @@ -224,6 +224,7 @@ int count_bcnt_extents(xfs_agnumber_t); #define XR_INO_UQUOTA 12 /* user quota inode */ #define XR_INO_GQUOTA 13 /* group quota inode */ #define XR_INO_PQUOTA 14 /* project quota inode */ +#define XR_INO_RTRMAP 15 /* realtime rmap */ /* inode allocation tree */ diff --git a/repair/rmap.c b/repair/rmap.c index 9550377df16..4d7ed98ad17 100644 --- a/repair/rmap.c +++ b/repair/rmap.c @@ -32,6 +32,12 @@ struct xfs_ag_rmap { int ar_flcount; /* agfl entries from leftover */ /* agbt allocations */ struct xfs_slab *ar_refcount_items; /* refcount items, p4-5 */ + + /* + * inumber of the rmap btree for this rtgroup. This can be set to + * NULLFSINO to signal to phase 6 to link a new inode into the metadir. 
+ */ + xfs_ino_t rg_rmap_ino; }; static struct xfs_ag_rmap *ag_rmaps; @@ -39,6 +45,9 @@ static struct xfs_ag_rmap *rg_rmaps; bool rmapbt_suspect; static bool refcbt_suspect; +/* Bitmap of rt group rmap inodes reachable via /realtime/$rgno.rmap. */ +static struct bitmap *rmap_inodes; + static struct xfs_ag_rmap *rmaps_for_group(bool isrt, unsigned int group) { if (isrt) @@ -119,6 +128,7 @@ rmaps_init_rt( if (error) goto nomem; + ag_rmap->rg_rmap_ino = NULLFSINO; return; nomem: do_error( @@ -167,6 +177,79 @@ rmaps_init_ag( _("Insufficient memory while allocating realtime reverse mapping btree.")); } +static inline int +set_rtgroup_rmap_inode( + struct xfs_mount *mp, + xfs_rgnumber_t rgno) +{ + struct xfs_imeta_path *path; + struct xfs_ag_rmap *ar = rmaps_for_group(true, rgno); + xfs_ino_t ino; + int error; + + if (!xfs_has_rtrmapbt(mp)) + return 0; + + error = -libxfs_rtrmapbt_create_path(mp, rgno, &path); + if (error) + return error; + + error = -libxfs_imeta_lookup(mp, path, &ino); + libxfs_imeta_free_path(path); + if (error) + return error; + + if (ino == NULLFSINO || bitmap_test(rmap_inodes, ino, 1)) + return EFSCORRUPTED; + + error = bitmap_set(rmap_inodes, ino, 1); + if (error) + return error; + + ar->rg_rmap_ino = ino; + return 0; +} + +static void +discover_rtgroup_inodes( + struct xfs_mount *mp) +{ + xfs_rgnumber_t rgno; + int error; + + error = bitmap_alloc(&rmap_inodes); + if (error) + goto out; + + for (rgno = 0; rgno < mp->m_sb.sb_rgcount; rgno++) { + int err2 = set_rtgroup_rmap_inode(mp, rgno); + if (err2 && !error) + error = err2; + } + +out: + if (error == EFSCORRUPTED) + do_warn( + _("corruption in metadata directory tree while discovering rt group inodes\n")); + if (error) + do_warn( + _("couldn't discover rt group inodes, err %d\n"), + error); +} + +static inline void +free_rtmeta_inode_bitmaps(void) +{ + bitmap_free(&rmap_inodes); +} + +bool is_rtrmap_inode(xfs_ino_t ino) +{ + if (!rmap_inodes) + return false; + return bitmap_test(rmap_inodes, 
ino, 1); +} + /* * Initialize per-AG reverse map data. */ @@ -192,6 +275,8 @@ rmaps_init( for (i = 0; i < mp->m_sb.sb_rgcount; i++) rmaps_init_rt(mp, i, &rg_rmaps[i]); + + discover_rtgroup_inodes(mp); } /* @@ -206,6 +291,8 @@ rmaps_free( if (!rmap_needs_work(mp)) return; + free_rtmeta_inode_bitmaps(); + for (i = 0; i < mp->m_sb.sb_rgcount; i++) rmaps_destroy(mp, &rg_rmaps[i]); free(rg_rmaps); @@ -1152,11 +1239,22 @@ rmap_record_count( } /* - * Disable the refcount btree check. + * Disable the rmap btree check. */ void -rmap_avoid_check(void) +rmap_avoid_check( + struct xfs_mount *mp) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + + for_each_rtgroup(mp, rgno, rtg) { + struct xfs_ag_rmap *ar = rmaps_for_group(true, rtg->rtg_rgno); + + ar->rg_rmap_ino = NULLFSINO; + } + + bitmap_clear(rmap_inodes, 0, XFS_MAXINUMBER); rmapbt_suspect = true; } @@ -1790,3 +1888,12 @@ rmap_store_agflcount( rmaps_for_group(false, agno)->ar_flcount = count; } + +xfs_ino_t +rtgroup_rmap_ino( + struct xfs_rtgroup *rtg) +{ + struct xfs_ag_rmap *ar = rmaps_for_group(true, rtg->rtg_rgno); + + return ar->rg_rmap_ino; +} diff --git a/repair/rmap.h b/repair/rmap.h index 008b96a38f4..0cb5759086b 100644 --- a/repair/rmap.h +++ b/repair/rmap.h @@ -28,7 +28,7 @@ int rmap_commit_agbtree_mappings(struct xfs_mount *mp, xfs_agnumber_t agno); uint64_t rmap_record_count(struct xfs_mount *mp, bool isrt, xfs_agnumber_t agno); -extern void rmap_avoid_check(void); +extern void rmap_avoid_check(struct xfs_mount *mp); void rmaps_verify_btree(struct xfs_mount *mp, xfs_agnumber_t agno); extern int64_t rmap_diffkeys(struct xfs_rmap_irec *kp1, @@ -60,4 +60,7 @@ void rmap_free_mem_cursor(struct xfs_trans *tp, struct rmap_mem_cur *rmcur, int error); int rmap_get_mem_rec(struct rmap_mem_cur *rmcur, struct xfs_rmap_irec *irec); +bool is_rtrmap_inode(xfs_ino_t ino); +xfs_ino_t rtgroup_rmap_ino(struct xfs_rtgroup *rtg); + #endif /* RMAP_H_ */ diff --git a/repair/scan.c b/repair/scan.c index 09ca037f47d..40e8007e698 
100644 --- a/repair/scan.c +++ b/repair/scan.c @@ -1355,7 +1355,7 @@ _("out of order key %u in %s btree block (%u/%u)\n"), out: if (suspect) - rmap_avoid_check(); + rmap_avoid_check(mp); } int @@ -1735,7 +1735,7 @@ _("bad %s btree ptr 0x%llx in ino %" PRIu64 "\n"), out: if (hdr_errors || suspect) { - rmap_avoid_check(); + rmap_avoid_check(mp); return 1; } return 0; @@ -2816,7 +2816,7 @@ validate_agf( if (levels == 0 || levels > mp->m_rmap_maxlevels) { do_warn(_("bad levels %u for rmapbt root, agno %d\n"), levels, agno); - rmap_avoid_check(); + rmap_avoid_check(mp); } bno = be32_to_cpu(agf->agf_roots[XFS_BTNUM_RMAP]); @@ -2831,7 +2831,7 @@ validate_agf( } else { do_warn(_("bad agbno %u for rmapbt root, agno %d\n"), bno, agno); - rmap_avoid_check(); + rmap_avoid_check(mp); } } ^ permalink raw reply related [flat|nested] 565+ messages in thread
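The discovery step in the patch above looks up one rmap inode per rtgroup, rejects a missing or already-claimed inode number as corruption, and keeps going so that only the first error is reported. A minimal sketch of that logic follows, assuming a fixed-size array in place of repair's bitmap; `NULL_INO`, `ERR_CORRUPT`, `ERR_NOSPACE`, `struct rmap_ino_set` and the function names are all hypothetical stand-ins rather than xfsprogs definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NULL_INO	UINT64_MAX	/* stand-in for "lookup found nothing" */
#define MAX_GROUPS	16

/* Hypothetical error codes, not the errno values repair uses. */
enum { ERR_CORRUPT = 1, ERR_NOSPACE = 2 };

struct rmap_ino_set {
	uint64_t	inos[MAX_GROUPS];
	size_t		count;
};

bool
set_contains(const struct rmap_ino_set *s, uint64_t ino)
{
	for (size_t i = 0; i < s->count; i++)
		if (s->inos[i] == ino)
			return true;
	return false;
}

/* Claim one group's inode; a missing or duplicate inode is corruption. */
int
claim_group_inode(struct rmap_ino_set *s, uint64_t ino)
{
	if (ino == NULL_INO || set_contains(s, ino))
		return ERR_CORRUPT;
	if (s->count == MAX_GROUPS)
		return ERR_NOSPACE;
	s->inos[s->count++] = ino;
	return 0;
}

/* Walk every group, remembering only the first error but not stopping. */
int
discover_group_inodes(struct rmap_ino_set *s, const uint64_t *looked_up,
		      size_t ngroups)
{
	int	error = 0;

	assert(s != NULL && looked_up != NULL);

	s->count = 0;
	for (size_t g = 0; g < ngroups; g++) {
		int err2 = claim_group_inode(s, looked_up[g]);

		if (err2 && !error)
			error = err2;
	}
	return error;
}
```

Continuing past a bad group means the healthy groups still get their inodes marked, so later phases only have to recreate the ones that were actually lost.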
* [PATCH 34/41] xfs_repair: check existing realtime rmapbt entries against observed rmaps 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (32 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 33/41] xfs_repair: find and mark the rtrmapbt inodes Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 38/41] xfs_repair: reserve per-AG space while rebuilding rt metadata Darrick J. Wong ` (6 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Once we've finished collecting reverse mapping observations from the metadata scan, check those observations against the realtime rmap btree (particularly if we're in -n mode) to detect rtrmapbt problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/phase4.c | 12 +++ repair/rmap.c | 221 ++++++++++++++++++++++++++++++++++++++++++------------- repair/rmap.h | 2 3 files changed, 182 insertions(+), 53 deletions(-) diff --git a/repair/phase4.c b/repair/phase4.c index cfdea1460e5..b0cb805f30c 100644 --- a/repair/phase4.c +++ b/repair/phase4.c @@ -155,6 +155,16 @@ check_rmap_btrees( rmaps_verify_btree(wq->wq_ctx, agno); } +static void +check_rtrmap_btrees( + struct workqueue *wq, + xfs_agnumber_t agno, + void *arg) +{ + rmap_add_fixed_rtgroup_rec(wq->wq_ctx, agno); + rtrmaps_verify_btree(wq->wq_ctx, agno); +} + static void compute_ag_refcounts( struct workqueue*wq, @@ -207,6 +217,8 @@ process_rmap_data( create_work_queue(&wq, mp, platform_nproc()); for (i = 0; i < mp->m_sb.sb_agcount; i++) queue_work(&wq, check_rmap_btrees, i, NULL); + for (i = 0; i < mp->m_sb.sb_rgcount; i++) + queue_work(&wq, check_rtrmap_btrees, i, NULL); destroy_work_queue(&wq); if (!xfs_has_reflink(mp)) diff --git a/repair/rmap.c b/repair/rmap.c index 4d7ed98ad17..9795f0ec577 100644 --- a/repair/rmap.c +++ b/repair/rmap.c @@ 
-675,6 +675,26 @@ rmap_add_fixed_ag_rec( } } +/* Add this realtime group's fixed metadata to the incore data. */ +void +rmap_add_fixed_rtgroup_rec( + struct xfs_mount *mp, + xfs_rgnumber_t rgno) +{ + struct xfs_rmap_irec rmap = { + .rm_startblock = 0, + .rm_blockcount = mp->m_sb.sb_rextsize, + .rm_owner = XFS_RMAP_OWN_FS, + .rm_offset = 0, + .rm_flags = 0, + }; + + if (!rmap_needs_work(mp)) + return; + + rmap_add_mem_rec(mp, true, rgno, &rmap); +} + /* * Copy the per-AG btree reverse-mapping data into the rmapbt. * @@ -1324,62 +1344,25 @@ rmap_is_good( #undef NEXTP #undef NEXTL -/* - * Compare the observed reverse mappings against what's in the ag btree. - */ -void -rmaps_verify_btree( - struct xfs_mount *mp, - xfs_agnumber_t agno) +static int +rmap_compare_records( + struct rmap_mem_cur *rm_cur, + struct xfs_btree_cur *bt_cur, + unsigned int group) { - struct rmap_mem_cur rm_cur; struct xfs_rmap_irec rm_rec; struct xfs_rmap_irec tmp; - struct xfs_btree_cur *bt_cur = NULL; - struct xfs_buf *agbp = NULL; - struct xfs_perag *pag = NULL; int have; int error; - if (!xfs_has_rmapbt(mp) || add_rmapbt) - return; - if (rmapbt_suspect) { - if (no_modify && agno == 0) - do_warn(_("would rebuild corrupt rmap btrees.\n")); - return; - } - - /* Create cursors to rmap structures */ - error = rmap_init_mem_cursor(mp, NULL, false, agno, &rm_cur); - if (error) { - do_warn(_("Not enough memory to check reverse mappings.\n")); - return; - } - - pag = libxfs_perag_get(mp, agno); - error = -libxfs_alloc_read_agf(pag, NULL, 0, &agbp); - if (error) { - do_warn(_("Could not read AGF %u to check rmap btree.\n"), - agno); - goto err_pag; - } - - /* Leave the per-ag data "uninitialized" since we rewrite it later */ - pag->pagf_init = 0; - - bt_cur = libxfs_rmapbt_init_cursor(mp, NULL, agbp, pag); - if (!bt_cur) { - do_warn(_("Not enough memory to check reverse mappings.\n")); - goto err_agf; - } - - while ((error = rmap_get_mem_rec(&rm_cur, &rm_rec)) == 1) { + while ((error = 
rmap_get_mem_rec(rm_cur, &rm_rec)) == 1) { error = rmap_lookup(bt_cur, &rm_rec, &tmp, &have); if (error) { do_warn( _("Could not read reverse-mapping record for (%u/%u).\n"), - agno, rm_rec.rm_startblock); - goto err_cur; + group, + rm_rec.rm_startblock); + return error; } /* @@ -1394,15 +1377,15 @@ _("Could not read reverse-mapping record for (%u/%u).\n"), if (error) { do_warn( _("Could not read reverse-mapping record for (%u/%u).\n"), - agno, rm_rec.rm_startblock); - goto err_cur; + group, rm_rec.rm_startblock); + return error; } } if (!have) { do_warn( _("Missing reverse-mapping record for (%u/%u) %slen %u owner %"PRId64" \ %s%soff %"PRIu64"\n"), - agno, rm_rec.rm_startblock, + group, rm_rec.rm_startblock, (rm_rec.rm_flags & XFS_RMAP_UNWRITTEN) ? _("unwritten ") : "", rm_rec.rm_blockcount, @@ -1415,12 +1398,12 @@ _("Missing reverse-mapping record for (%u/%u) %slen %u owner %"PRId64" \ continue; } - /* Compare each refcount observation against the btree's */ + /* Compare each rmap observation against the btree's */ if (!rmap_is_good(&rm_rec, &tmp)) { do_warn( _("Incorrect reverse-mapping: saw (%u/%u) %slen %u owner %"PRId64" %s%soff \ %"PRIu64"; should be (%u/%u) %slen %u owner %"PRId64" %s%soff %"PRIu64"\n"), - agno, tmp.rm_startblock, + group, tmp.rm_startblock, (tmp.rm_flags & XFS_RMAP_UNWRITTEN) ? _("unwritten ") : "", tmp.rm_blockcount, @@ -1430,7 +1413,7 @@ _("Incorrect reverse-mapping: saw (%u/%u) %slen %u owner %"PRId64" %s%soff \ (tmp.rm_flags & XFS_RMAP_BMBT_BLOCK) ? _("bmbt ") : "", tmp.rm_offset, - agno, rm_rec.rm_startblock, + group, rm_rec.rm_startblock, (rm_rec.rm_flags & XFS_RMAP_UNWRITTEN) ? _("unwritten ") : "", rm_rec.rm_blockcount, @@ -1443,8 +1426,61 @@ _("Incorrect reverse-mapping: saw (%u/%u) %slen %u owner %"PRId64" %s%soff \ } } + return error; +} + +/* + * Compare the observed reverse mappings against what's in the ag btree. 
+ */ +void +rmaps_verify_btree( + struct xfs_mount *mp, + xfs_agnumber_t agno) +{ + struct rmap_mem_cur rm_cur; + struct xfs_btree_cur *bt_cur = NULL; + struct xfs_buf *agbp = NULL; + struct xfs_perag *pag = NULL; + int error; + + if (!xfs_has_rmapbt(mp) || add_rmapbt) + return; + if (rmapbt_suspect) { + if (no_modify && agno == 0) + do_warn(_("would rebuild corrupt rmap btrees.\n")); + return; + } + + /* Create cursors to rmap structures */ + error = rmap_init_mem_cursor(mp, NULL, false, agno, &rm_cur); + if (error) { + do_warn(_("Not enough memory to check reverse mappings.\n")); + return; + } + + pag = libxfs_perag_get(mp, agno); + error = -libxfs_alloc_read_agf(pag, NULL, 0, &agbp); + if (error) { + do_warn(_("Could not read AGF %u to check rmap btree.\n"), + agno); + goto err_pag; + } + + /* Leave the per-ag data "uninitialized" since we rewrite it later */ + pag->pagf_init = 0; + + bt_cur = libxfs_rmapbt_init_cursor(mp, NULL, agbp, pag); + if (!bt_cur) { + do_warn(_("Not enough memory to check reverse mappings.\n")); + goto err_agf; + } + + error = rmap_compare_records(&rm_cur, bt_cur, agno); + if (error) + goto err_cur; + err_cur: - libxfs_btree_del_cursor(bt_cur, XFS_BTREE_NOERROR); + libxfs_btree_del_cursor(bt_cur, error); err_agf: libxfs_buf_relse(agbp); err_pag: @@ -1452,6 +1488,85 @@ _("Incorrect reverse-mapping: saw (%u/%u) %slen %u owner %"PRId64" %s%soff \ rmap_free_mem_cursor(NULL, &rm_cur, error); } +/* + * Compare the observed reverse mappings against what's in the rtgroup btree. 
+ */ +void +rtrmaps_verify_btree( + struct xfs_mount *mp, + xfs_rgnumber_t rgno) +{ + struct rmap_mem_cur rm_cur; + struct xfs_btree_cur *bt_cur = NULL; + struct xfs_rtgroup *rtg = NULL; + struct xfs_ag_rmap *ar = rmaps_for_group(true, rgno); + struct xfs_inode *ip = NULL; + int error; + + if (!xfs_has_rmapbt(mp) || add_rmapbt) + return; + if (rmapbt_suspect) { + if (no_modify && rgno == 0) + do_warn(_("would rebuild corrupt rmap btrees.\n")); + return; + } + + /* Create cursors to rmap structures */ + error = rmap_init_mem_cursor(mp, NULL, true, rgno, &rm_cur); + if (error) { + do_warn(_("Not enough memory to check reverse mappings.\n")); + return; + } + + rtg = libxfs_rtgroup_get(mp, rgno); + if (!rtg) { + do_warn(_("Could not load rtgroup %u.\n"), rgno); + goto err_rcur; + } + + error = -libxfs_imeta_iget(mp, ar->rg_rmap_ino, XFS_DIR3_FT_REG_FILE, + &ip); + if (error) { + do_warn( +_("Could not load rtgroup %u rmap inode, error %d.\n"), + rgno, error); + goto err_rtg; + } + + if (ip->i_df.if_format != XFS_DINODE_FMT_RMAP) { + do_warn( +_("rtgroup %u rmap inode has wrong format 0x%x, expected 0x%x\n"), + rgno, ip->i_df.if_format, + XFS_DINODE_FMT_RMAP); + goto err_ino; + } + + if (xfs_inode_has_attr_fork(ip)) { + do_warn( +_("rtgroup %u rmap inode should not have extended attributes\n"), rgno); + goto err_ino; + } + + bt_cur = libxfs_rtrmapbt_init_cursor(mp, NULL, rtg, ip); + if (!bt_cur) { + do_warn(_("Not enough memory to check reverse mappings.\n")); + goto err_ino; + } + + error = rmap_compare_records(&rm_cur, bt_cur, rgno); + if (error) + goto err_cur; + +err_cur: + libxfs_btree_del_cursor(bt_cur, error); +err_ino: + libxfs_imeta_irele(ip); +err_rtg: + libxfs_rtgroup_put(rtg); +err_rcur: + rmap_free_mem_cursor(NULL, &rm_cur, error); +} + /* * Compare the key fields of two rmap records -- positive if key1 > key2, * negative if key1 < key2, and zero if equal. 
diff --git a/repair/rmap.h b/repair/rmap.h index 0cb5759086b..c7c09046a8a 100644 --- a/repair/rmap.h +++ b/repair/rmap.h @@ -21,6 +21,7 @@ void rmap_add_bmbt_rec(struct xfs_mount *mp, xfs_ino_t ino, int whichfork, bool rmaps_are_mergeable(struct xfs_rmap_irec *r1, struct xfs_rmap_irec *r2); void rmap_add_fixed_ag_rec(struct xfs_mount *mp, xfs_agnumber_t agno); +void rmap_add_fixed_rtgroup_rec(struct xfs_mount *mp, xfs_rgnumber_t rgno); int rmap_add_agbtree_mapping(struct xfs_mount *mp, xfs_agnumber_t agno, xfs_agblock_t agbno, xfs_extlen_t len, uint64_t owner); @@ -30,6 +31,7 @@ uint64_t rmap_record_count(struct xfs_mount *mp, bool isrt, xfs_agnumber_t agno); extern void rmap_avoid_check(struct xfs_mount *mp); void rmaps_verify_btree(struct xfs_mount *mp, xfs_agnumber_t agno); +void rtrmaps_verify_btree(struct xfs_mount *mp, xfs_rgnumber_t rgno); extern int64_t rmap_diffkeys(struct xfs_rmap_irec *kp1, struct xfs_rmap_irec *kp2); ^ permalink raw reply related [flat|nested] 565+ messages in thread
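The comparison loop factored out in the patch above walks the observed records and checks each one against the ondisk btree, warning about records that are missing and records that disagree. The sketch below shows that loop against a sorted array instead of a btree cursor. It is a simplification under stated assumptions: `struct rmap_rec`, `lookup_le`, `rec_is_good` and `compare_records` are hypothetical, the match test is plain equality on three fields (which may be stricter than what repair's rmap_is_good accepts), and the retry without the offset key is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct rmap_rec {
	uint32_t	start;	/* first block of the mapping */
	uint32_t	len;	/* length in blocks */
	uint64_t	owner;
};

struct verify_result {
	unsigned int	missing;	/* observed, but not in the tree */
	unsigned int	incorrect;	/* in the tree, but different */
};

/* Binary search for the record with the largest start <= the given start. */
const struct rmap_rec *
lookup_le(const struct rmap_rec *tree, size_t nr, uint32_t start)
{
	const struct rmap_rec	*found = NULL;
	size_t			lo = 0, hi = nr;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (tree[mid].start <= start) {
			found = &tree[mid];
			lo = mid + 1;
		} else {
			hi = mid;
		}
	}
	return found;
}

/* Simplified match test: all three fields must be identical. */
bool
rec_is_good(const struct rmap_rec *obs, const struct rmap_rec *ondisk)
{
	return obs->start == ondisk->start && obs->len == ondisk->len &&
	       obs->owner == ondisk->owner;
}

struct verify_result
compare_records(const struct rmap_rec *observed, size_t nr_obs,
		const struct rmap_rec *tree, size_t nr_tree)
{
	struct verify_result	res = { 0, 0 };

	for (size_t i = 0; i < nr_obs; i++) {
		const struct rmap_rec *found =
			lookup_le(tree, nr_tree, observed[i].start);

		/* Nothing at or before this block, or it ends too early. */
		if (!found ||
		    (uint64_t)found->start + found->len <= observed[i].start) {
			res.missing++;
			continue;
		}
		if (!rec_is_good(&observed[i], found))
			res.incorrect++;
	}
	return res;
}
```

Because the loop only needs a lookup primitive and a group number for its messages, the same body can serve both the per-AG tree and the per-rtgroup tree, which is the point of the refactoring.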
* [PATCH 38/41] xfs_repair: reserve per-AG space while rebuilding rt metadata 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (33 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 34/41] xfs_repair: check existing realtime rmapbt entries against observed rmaps Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 35/41] xfs_repair: always check realtime file mappings against incore info Darrick J. Wong ` (5 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Realtime metadata btrees can consume quite a bit of space on a full filesystem. Since the metadata are just regular files, we need to make the per-AG reservations to avoid overfilling any of the AGs while rebuilding metadata. This avoids the situation where a filesystem comes straight from repair and immediately trips over not having enough space in an AG. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- include/libxfs.h | 1 + repair/phase6.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 48 insertions(+) diff --git a/include/libxfs.h b/include/libxfs.h index 0b255e2c104..b1e499569ac 100644 --- a/include/libxfs.h +++ b/include/libxfs.h @@ -88,6 +88,7 @@ struct iomap; #include "xfs_rtbitmap.h" #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_ag_resv.h" #ifndef ARRAY_SIZE #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0])) diff --git a/repair/phase6.c b/repair/phase6.c index d5381e1eddc..8828f7f72b9 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -3813,10 +3813,43 @@ reset_rt_metadata_inodes( } } +static int +reserve_ag_blocks( + struct xfs_mount *mp) +{ + struct xfs_perag *pag; + xfs_agnumber_t agno; + int error = 0; + int err2; + + mp->m_finobt_nores = false; + + for_each_perag(mp, agno, pag) { + err2 = -libxfs_ag_resv_init(pag, NULL); + if (err2 && !error) + error = err2; + } + + return error; +} + +static void +unreserve_ag_blocks( + struct xfs_mount *mp) +{ + struct xfs_perag *pag; + xfs_agnumber_t agno; + + for_each_perag(mp, agno, pag) + libxfs_ag_resv_free(pag); +} + void phase6(xfs_mount_t *mp) { ino_tree_node_t *irec; + bool reserve_perag; + int error; int i; orphanage_ino = 0; @@ -3854,6 +3887,17 @@ phase6(xfs_mount_t *mp) do_warn(_("would reinitialize metadata root directory\n")); } + reserve_perag = xfs_has_realtime(mp) && !no_modify; + if (reserve_perag) { + error = reserve_ag_blocks(mp); + if (error) { + if (error != ENOSPC) + do_warn( + _("could not reserve per-AG space to rebuild realtime metadata")); + reserve_perag = false; + } + } + if (need_rbmino) { if (!no_modify) { if (need_rbmino > 0) @@ -3892,6 +3936,9 @@ _(" - resetting contents of realtime bitmap and summary inodes\n")); } } + if (reserve_perag) + unreserve_ag_blocks(mp); + reattach_metadir_quota_inodes(mp); mark_standalone_inodes(mp); ^ permalink raw reply related [flat|nested] 565+ messages in thread
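The loop in reserve_ag_blocks above keeps going after a failure and reports only the first error it saw, so that one full AG does not stop the remaining AGs from getting their reservations. A minimal sketch of that pattern in isolation follows. It assumes a hypothetical reserve_fn callback in place of libxfs_ag_resv_init, and none of these names exist in xfsprogs:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-group hook; stands in for libxfs_ag_resv_init. */
typedef int (*reserve_fn)(unsigned int group);

/*
 * Try to reserve space in every group.  Keep iterating after a failure so
 * that later groups are still attempted, but return the first error seen.
 */
int
reserve_all_groups(
	unsigned int	ngroups,
	reserve_fn	reserve_one)
{
	unsigned int	g;
	int		error = 0;
	int		err2;

	assert(reserve_one != NULL);

	for (g = 0; g < ngroups; g++) {
		err2 = reserve_one(g);
		if (err2 && !error)
			error = err2;
	}

	return error;
}

/* Illustrative hook: group 1 is out of space, group 3 hits an I/O error. */
int
demo_reserve(
	unsigned int	group)
{
	if (group == 1)
		return ENOSPC;
	if (group == 3)
		return EIO;
	return 0;
}
```

With the illustrative hook, a five-group run reports ENOSPC (the first failure) even though group 3 also failed, which is why the phase6 caller can special-case ENOSPC when deciding whether to warn.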
* [PATCH 35/41] xfs_repair: always check realtime file mappings against incore info 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (34 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 38/41] xfs_repair: reserve per-AG space while rebuilding rt metadata Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 31/41] xfs_repair: collect relatime reverse-mapping data for refcount/rmap tree rebuilding Darrick J. Wong ` (4 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Curiously, the xfs_repair code that processes data fork mappings of realtime files doesn't actually compare the mappings against the incore state map during the !check_dups phase (aka phase 3). As a result, we lose the opportunity to clear damaged realtime data forks before we get to crosslinked file checking in phase 4, which results in ondisk metadata errors calling do_error, which aborts repair. Split the process_rt_rec_state code into two functions: one to check the mapping, and another to update the incore state. The first one can be called to help us decide if we're going to zap the fork, and the second one updates the incore state if we decide to keep the fork. We already do this for regular data files. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- repair/dinode.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++----- 1 file changed, 80 insertions(+), 8 deletions(-) diff --git a/repair/dinode.c b/repair/dinode.c index 782a36172ad..b2c27984671 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -219,7 +219,7 @@ _("data fork in rt ino %" PRIu64 " claims dup rt extent," return 0; } -static int +static void process_rt_rec_state( struct xfs_mount *mp, xfs_ino_t ino, @@ -263,11 +263,78 @@ _("data fork in rt inode %" PRIu64 " found invalid rt extent %"PRIu64" state %d set_rtbmap(ext, zap_metadata ? XR_E_METADATA : XR_E_INUSE); break; + case XR_E_BAD_STATE: + do_error( +_("bad state in rt extent map %" PRIu64 "\n"), + ext); case XR_E_METADATA: + case XR_E_FS_MAP: + case XR_E_INO: + case XR_E_INUSE_FS: + break; + case XR_E_INUSE: + case XR_E_MULT: + set_rtbmap(ext, XR_E_MULT); + break; + case XR_E_FREE1: + default: do_error( +_("illegal state %d in rt extent %" PRIu64 "\n"), + state, ext); + } + b += mp->m_sb.sb_rextsize; + } while (b < irec->br_startblock + irec->br_blockcount); +} + +/* + * Checks the realtime file's data mapping against in-core extent info, and + * complains if there are discrepancies. Returns 0 if good, 1 if bad. + */ +static int +check_rt_rec_state( + struct xfs_mount *mp, + xfs_ino_t ino, + struct xfs_bmbt_irec *irec) +{ + xfs_fsblock_t b = irec->br_startblock; + xfs_rtblock_t ext; + int state; + + do { + ext = (xfs_rtblock_t)b / mp->m_sb.sb_rextsize; + state = get_rtbmap(ext); + + if ((b % mp->m_sb.sb_rextsize) != 0) { + /* + * We are midway through a partially written extent. + * If we don't find the state that gets set in the + * other clause of this loop body, then we have a + * partially *mapped* rt extent and should complain. 
+ */ + if (state != XR_E_INUSE && state != XR_E_FREE) { + do_warn( +_("data fork in rt inode %" PRIu64 " found invalid rt extent %"PRIu64" state %d at rt block %"PRIu64"\n"), + ino, ext, state, b); + return 1; + } + + b = roundup(b, mp->m_sb.sb_rextsize); + continue; + } + + /* + * This is the start of an rt extent. Complain if there are + * conflicting states. We'll set the state elsewhere. + */ + switch (state) { + case XR_E_FREE: + case XR_E_UNKNOWN: + break; + case XR_E_METADATA: + do_warn( _("data fork in rt inode %" PRIu64 " found metadata file block %" PRIu64 " in rt bmap\n"), ino, ext); - break; + return 1; case XR_E_BAD_STATE: do_error( _("bad state in rt extent map %" PRIu64 "\n"), @@ -275,12 +342,12 @@ _("bad state in rt extent map %" PRIu64 "\n"), case XR_E_FS_MAP: case XR_E_INO: case XR_E_INUSE_FS: - do_error( + do_warn( _("data fork in rt inode %" PRIu64 " found rt metadata extent %" PRIu64 " in rt bmap\n"), ino, ext); + return 1; case XR_E_INUSE: case XR_E_MULT: - set_rtbmap(ext, XR_E_MULT); do_warn( _("data fork in rt inode %" PRIu64 " claims used rt extent %" PRIu64 "\n"), ino, b); @@ -341,13 +408,18 @@ _("inode %" PRIu64 " - bad rt extent overflows - start %" PRIu64 ", " return 1; } - if (check_dups) - bad = process_rt_rec_dups(mp, ino, irec); - else - bad = process_rt_rec_state(mp, ino, zap_metadata, irec); + bad = check_rt_rec_state(mp, ino, irec); if (bad) return bad; + if (check_dups) { + bad = process_rt_rec_dups(mp, ino, irec); + if (bad) + return bad; + } else { + process_rt_rec_state(mp, ino, zap_metadata, irec); + } + /* * bump up the block counter */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
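The split described in this patch (one function that only inspects the incore state, another that only updates it) can be sketched on its own. The following is a simplified illustration, not the xfs_repair code: the state enum, struct mapping, and all three function names are hypothetical, and the partially-mapped-extent handling from check_rt_rec_state is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the XR_E_* states; not the real enum. */
enum ext_state {
	EXT_FREE = 0,
	EXT_INUSE,
	EXT_METADATA,
	EXT_MULT,
};

struct mapping {
	uint64_t	startblock;
	uint64_t	blockcount;
};

/* Pass 1: inspect only.  Returns 0 if every extent is free, 1 otherwise. */
int
check_mapping(
	const enum ext_state	*states,
	uint64_t		rextsize,
	const struct mapping	*map)
{
	uint64_t		ext, last;

	assert(rextsize > 0 && map->blockcount > 0);

	last = (map->startblock + map->blockcount - 1) / rextsize;
	for (ext = map->startblock / rextsize; ext <= last; ext++) {
		if (states[ext] != EXT_FREE)
			return 1;
	}
	return 0;
}

/* Pass 2: record the claim.  It cannot fail, so it returns void. */
void
apply_mapping(
	enum ext_state		*states,
	uint64_t		rextsize,
	const struct mapping	*map)
{
	uint64_t		ext, last;

	assert(rextsize > 0 && map->blockcount > 0);

	last = (map->startblock + map->blockcount - 1) / rextsize;
	for (ext = map->startblock / rextsize; ext <= last; ext++) {
		switch (states[ext]) {
		case EXT_FREE:
			states[ext] = EXT_INUSE;
			break;
		case EXT_INUSE:
		case EXT_MULT:
			states[ext] = EXT_MULT;
			break;
		case EXT_METADATA:
			break;
		}
	}
}

/* Caller: decide first, mutate second.  Returns 1 if the mapping is bad. */
int
process_mapping(
	enum ext_state		*states,
	uint64_t		rextsize,
	const struct mapping	*map)
{
	if (check_mapping(states, rextsize, map))
		return 1;
	apply_mapping(states, rextsize, map);
	return 0;
}
```

Because the check pass never touches the state array, a caller that decides to zap the fork leaves the incore map exactly as it found it, which is the property the commit message is after.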
* [PATCH 31/41] xfs_repair: collect realtime reverse-mapping data for refcount/rmap tree rebuilding 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (35 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 35/41] xfs_repair: always check realtime file mappings against incore info Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 37/41] xfs_repair: rebuild the bmap btree for realtime files Darrick J. Wong ` (3 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Collect reverse-mapping data for realtime files so that we can later check and rebuild the reference count tree and the reverse mapping tree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dinode.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/repair/dinode.c b/repair/dinode.c index 6f44261907e..28eb639bbb8 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -350,6 +350,10 @@ _("inode %" PRIu64 " - bad rt extent overflows - start %" PRIu64 ", " */ *tot += irec->br_blockcount; + /* Record mapping data for the realtime rmap. */ + if (collect_rmaps && !zap_metadata && !check_dups) + rmap_add_rec(mp, ino, XFS_DATA_FORK, irec, true); + return 0; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 37/41] xfs_repair: rebuild the bmap btree for realtime files 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (36 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 31/41] xfs_repair: collect relatime reverse-mapping data for refcount/rmap tree rebuilding Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 41/41] mkfs: create the realtime rmap inode Darrick J. Wong ` (2 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Use the realtime rmap btree information to rebuild an inode's data fork when appropriate. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/bmap_repair.c | 122 ++++++++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 118 insertions(+), 4 deletions(-) diff --git a/repair/bmap_repair.c b/repair/bmap_repair.c index 5e41934b543..06ed86a0b4b 100644 --- a/repair/bmap_repair.c +++ b/repair/bmap_repair.c @@ -212,6 +212,113 @@ xrep_bmap_scan_ag( return error; } +/* Check for any obvious errors or conflicts in the file mapping. */ +STATIC int +xrep_bmap_check_rtfork_rmap( + struct repair_ctx *sc, + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec) +{ + /* xattr extents are never stored on realtime devices */ + if (rec->rm_flags & XFS_RMAP_ATTR_FORK) + return EFSCORRUPTED; + + /* bmbt blocks are never stored on realtime devices */ + if (rec->rm_flags & XFS_RMAP_BMBT_BLOCK) + return EFSCORRUPTED; + + /* Data extents for non-rt files are never stored on the rt device. */ + if (!XFS_IS_REALTIME_INODE(sc->ip)) + return EFSCORRUPTED; + + /* Check the file offsets and physical extents. */ + if (!xfs_verify_fileext(sc->mp, rec->rm_offset, rec->rm_blockcount)) + return EFSCORRUPTED; + + /* Check that this fits in the rt volume. 
*/ + if (!xfs_verify_rgbext(cur->bc_ino.rtg, rec->rm_startblock, + rec->rm_blockcount)) + return EFSCORRUPTED; + + return 0; +} + +/* Record realtime extents that belong to this inode's fork. */ +STATIC int +xrep_bmap_walk_rtrmap( + struct xfs_btree_cur *cur, + const struct xfs_rmap_irec *rec, + void *priv) +{ + struct xrep_bmap *rb = priv; + int error = 0; + + /* Skip extents which are not owned by this inode and fork. */ + if (rec->rm_owner != rb->sc->ip->i_ino) + return 0; + + error = xrep_bmap_check_rtfork_rmap(rb->sc, cur, rec); + if (error) + return error; + + /* + * Record all blocks allocated to this file even if the extent isn't + * for the fork we're rebuilding so that we can reset di_nblocks later. + */ + rb->nblocks += rec->rm_blockcount; + + /* If this rmap isn't for the fork we want, we're done. */ + if (rb->whichfork == XFS_DATA_FORK && + (rec->rm_flags & XFS_RMAP_ATTR_FORK)) + return 0; + if (rb->whichfork == XFS_ATTR_FORK && + !(rec->rm_flags & XFS_RMAP_ATTR_FORK)) + return 0; + + return xrep_bmap_from_rmap(rb, rec->rm_offset, rec->rm_startblock, + rec->rm_blockcount, + rec->rm_flags & XFS_RMAP_UNWRITTEN); +} + +/* + * Scan the realtime reverse mappings to build the new extent map. The rt rmap + * inodes must be loaded from disk explicitly here, since we have not yet + * validated the metadata directory tree but do not wish to throw away user + * data unnecessarily. 
+ */ +STATIC int +xrep_bmap_scan_rt( + struct xrep_bmap *rb, + struct xfs_rtgroup *rtg) +{ + struct repair_ctx *sc = rb->sc; + struct xfs_mount *mp = sc->mp; + struct xfs_btree_cur *cur; + struct xfs_inode *ip; + struct xfs_imeta_path *path; + xfs_ino_t ino; + int error; + + error = -libxfs_rtrmapbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + return error; + + error = -libxfs_imeta_lookup(mp, path, &ino); + libxfs_imeta_free_path(path); + if (error) + return error; + + error = -libxfs_imeta_iget(mp, ino, XFS_DIR3_FT_REG_FILE, &ip); + if (error) + return error; + + cur = libxfs_rtrmapbt_init_cursor(mp, sc->tp, rtg, ip); + error = -libxfs_rmap_query_all(cur, xrep_bmap_walk_rtrmap, rb); + libxfs_btree_del_cursor(cur, error); + libxfs_imeta_irele(ip); + return error; +} + /* * Collect block mappings for this fork of this inode and decide if we have * enough space to rebuild. Caller is responsible for cleaning up the list if @@ -222,9 +329,20 @@ xrep_bmap_find_mappings( struct xrep_bmap *rb) { struct xfs_perag *pag; + struct xfs_rtgroup *rtg; xfs_agnumber_t agno; + xfs_rgnumber_t rgno; int error; + /* Iterate the rtrmaps for extents. */ + for_each_rtgroup(rb->sc->mp, rgno, rtg) { + error = xrep_bmap_scan_rt(rb, rtg); + if (error) { + libxfs_rtgroup_put(rtg); + return error; + } + } + /* Iterate the rmaps for extents. */ for_each_perag(rb->sc->mp, agno, pag) { error = xrep_bmap_scan_ag(rb, pag); @@ -564,10 +682,6 @@ xrep_bmap_check_inputs( return EINVAL; } - /* Don't know how to rebuild realtime data forks. */ - if (XFS_IS_REALTIME_INODE(sc->ip)) - return EOPNOTSUPP; - return 0; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
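The walker in this patch follows a common shape: a generic record iterator calls back into a function that filters by owner, counts every block the file owns, and only then filters by fork. A self-contained sketch of that shape is below. The record layout, flag value, and function names are all made up for illustration, and the corruption checks from xrep_bmap_check_rtfork_rmap are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RMAP_ATTR_FORK	(1U << 0)	/* illustrative flag value */

struct rmap_rec {
	uint64_t	owner;
	uint64_t	offset;
	uint64_t	startblock;
	uint64_t	blockcount;
	unsigned int	flags;
};

struct rebuild_ctx {
	uint64_t	ino;		/* inode being rebuilt */
	int		want_attr_fork;	/* 0 = data fork, 1 = attr fork */
	uint64_t	nblocks;	/* all blocks owned by ino */
	uint64_t	nextents;	/* extents kept for the wanted fork */
};

typedef int (*walk_fn)(const struct rmap_rec *rec, void *priv);

/* Generic iterator: stop at the first nonzero return from the callback. */
int
walk_records(
	const struct rmap_rec	*recs,
	size_t			nr,
	walk_fn			fn,
	void			*priv)
{
	size_t			i;
	int			error;

	assert(fn != NULL);

	for (i = 0; i < nr; i++) {
		error = fn(&recs[i], priv);
		if (error)
			return error;
	}
	return 0;
}

/* Callback: skip other owners, count blocks, then filter by fork. */
int
collect_for_inode(
	const struct rmap_rec	*rec,
	void			*priv)
{
	struct rebuild_ctx	*ctx = priv;
	int			is_attr;

	if (rec->owner != ctx->ino)
		return 0;

	/* Counted even if the extent belongs to the other fork. */
	ctx->nblocks += rec->blockcount;

	is_attr = !!(rec->flags & RMAP_ATTR_FORK);
	if (is_attr != ctx->want_attr_fork)
		return 0;

	ctx->nextents++;
	return 0;
}
```

Counting blocks before the fork filter is what lets the caller reset the inode's total block count later, regardless of which fork is being rebuilt.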
* [PATCH 41/41] mkfs: create the realtime rmap inode 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (37 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 37/41] xfs_repair: rebuild the bmap btree for realtime files Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 39/41] xfs_repair: allow sysadmins to add realtime reverse mapping indexes Darrick J. Wong 2022-12-30 22:20 ` [PATCH 40/41] xfs_scrub: retest metadata across scrub groups after a repair Darrick J. Wong 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a realtime rmapbt inode if we format the fs with realtime and rmap. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/init.c | 7 ---- mkfs/proto.c | 62 ++++++++++++++++++++++++++++++++++++++ mkfs/xfs_mkfs.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++++--- 3 files changed, 147 insertions(+), 12 deletions(-) diff --git a/libxfs/init.c b/libxfs/init.c index 6f549996b1e..aa94c87ccd4 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -455,13 +455,6 @@ rtmount_init( return -1; } - if (xfs_has_rmapbt(mp)) { - fprintf(stderr, - _("%s: Reverse mapping btree not compatible with realtime device. Please try a newer xfsprogs.\n"), - progname); - return -1; - } - if (mp->m_rtdev_targp->bt_bdev == 0 && !xfs_is_debugger(mp)) { fprintf(stderr, _("%s: filesystem has a realtime subvolume\n"), progname); diff --git a/mkfs/proto.c b/mkfs/proto.c index e734269864e..36af61ed5c0 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -813,6 +813,60 @@ rtsummary_create( libxfs_imeta_end_update(mp, &upd, 0); } +/* Create the realtime rmap btree inode. 
*/ +static void +rtrmapbt_create( + struct xfs_rtgroup *rtg) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_imeta_update upd; + struct xfs_rmap_irec rmap = { + .rm_startblock = 0, + .rm_blockcount = mp->m_sb.sb_rextsize, + .rm_owner = XFS_RMAP_OWN_FS, + .rm_offset = 0, + .rm_flags = 0, + }; + struct xfs_imeta_path *path; + struct xfs_trans *tp; + struct xfs_btree_cur *cur; + int error; + + error = -libxfs_rtrmapbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + fail( _("rtrmap inode path creation failed"), error); + + error = -libxfs_imeta_ensure_dirpath(mp, path); + if (error) + fail(_("rtgroup directory allocation failed"), error); + + error = -libxfs_imeta_start_update(mp, path, &upd); + if (error) + res_failed(error); + + error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_imeta_create, + libxfs_imeta_create_space_res(mp), 0, 0, &tp); + if (error) + res_failed(error); + + error = -libxfs_rtrmapbt_create(&tp, path, &upd, &rtg->rtg_rmapip); + if (error) + fail(_("rtrmap inode creation failed"), error); + + cur = libxfs_rtrmapbt_init_cursor(mp, tp, rtg, rtg->rtg_rmapip); + error = -libxfs_rmap_map_raw(cur, &rmap); + libxfs_btree_del_cursor(cur, error); + if (error) + fail(_("rtrmapbt initialization failed"), error); + + error = -libxfs_trans_commit(tp); + if (error) + fail(_("rtrmapbt commit failed"), error); + + libxfs_imeta_end_update(mp, &upd, 0); + libxfs_imeta_free_path(path); +} + /* Initialize block headers of rt free space files. 
*/ static int init_rtblock_headers( @@ -1046,9 +1100,17 @@ static void rtinit( struct xfs_mount *mp) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + rtbitmap_create(mp); rtsummary_create(mp); + for_each_rtgroup(mp, rgno, rtg) { + if (xfs_has_rtrmapbt(mp)) + rtrmapbt_create(rtg); + } + rtbitmap_init(mp); rtsummary_init(mp); if (xfs_has_rtgroups(mp)) diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c index 4f96e436d32..eebcade7d1a 100644 --- a/mkfs/xfs_mkfs.c +++ b/mkfs/xfs_mkfs.c @@ -2392,12 +2392,18 @@ _("reflink not supported with realtime devices\n")); } cli->sb_feat.reflink = false; - if (cli->sb_feat.rmapbt && cli_opt_set(&mopts, M_RMAPBT)) { - fprintf(stderr, -_("rmapbt not supported with realtime devices\n")); - usage(); + if (!cli->sb_feat.rtgroups && cli->sb_feat.rmapbt) { + if (cli_opt_set(&mopts, M_RMAPBT) && + cli_opt_set(&ropts, R_RTGROUPS)) { + fprintf(stderr, +_("rmapbt not supported on realtime devices without rtgroups feature\n")); + usage(); + } else if (cli_opt_set(&mopts, M_RMAPBT)) { + cli->sb_feat.rtgroups = true; + } else { + cli->sb_feat.rmapbt = false; + } } - cli->sb_feat.rmapbt = false; } if ((cli->fsx.fsx_xflags & FS_XFLAG_COWEXTSIZE) && @@ -4500,6 +4506,77 @@ cfgfile_parse( cli->cfgfile); } +static inline void +prealloc_fail( + struct xfs_mount *mp, + int error, + xfs_filblks_t ask, + const char *tag) +{ + if (error == ENOSPC) + fprintf(stderr, + _("%s: cannot handle expansion of %s; need %llu free blocks, have %llu\n"), + progname, tag, (unsigned long long)ask, + (unsigned long long)mp->m_sb.sb_fdblocks); + else + fprintf(stderr, + _("%s: error %d while checking free space for %s\n"), + progname, error, tag); + exit(1); +} + +/* + * Make sure there's enough space on the data device to handle realtime + * metadata btree expansions. 
+ */ +static void +check_rt_meta_prealloc( + struct xfs_mount *mp) +{ + struct xfs_perag *pag; + struct xfs_rtgroup *rtg; + xfs_agnumber_t agno; + xfs_rgnumber_t rgno; + xfs_filblks_t ask; + int error; + + /* + * First create all the per-AG reservations, since they take from the + * free block count. Each AG should start with enough free space for + * the per-AG reservation. + */ + mp->m_finobt_nores = false; + + for_each_perag(mp, agno, pag) { + error = -libxfs_ag_resv_init(pag, NULL); + if (error && error != ENOSPC) { + fprintf(stderr, + _("%s: error %d while checking AG free space for realtime metadata\n"), + progname, error); + exit(1); + } + } + + /* Realtime metadata btree inode */ + for_each_rtgroup(mp, rgno, rtg) { + ask = libxfs_rtrmapbt_calc_reserves(mp); + error = -libxfs_imeta_resv_init_inode(rtg->rtg_rmapip, ask); + if (error) + prealloc_fail(mp, error, ask, _("realtime rmap btree")); + } + + /* Unreserve the realtime metadata reservations. */ + for_each_rtgroup(mp, rgno, rtg) { + libxfs_imeta_resv_free_inode(rtg->rtg_rmapip); + } + + /* Unreserve the per-AG reservations. */ + for_each_perag(mp, agno, pag) + libxfs_ag_resv_free(pag); + + mp->m_finobt_nores = false; +} + int main( int argc, @@ -4873,6 +4950,9 @@ main( */ check_root_ino(mp); + /* Make sure we can handle space preallocations of rt metadata btrees */ + check_rt_meta_prealloc(mp); + /* * Re-write multiple secondary superblocks with rootinode field set */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
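The option handling added to mkfs/xfs_mkfs.c in this patch resolves a conflict between two feature bits depending on which of them the user set explicitly. The decision table is easier to see outside of the mkfs option plumbing. This is a sketch under assumptions: struct rt_feat, resolve_rt_rmap, and the two "explicit" booleans are hypothetical stand-ins for cli->sb_feat and the cli_opt_set() calls.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two cli->sb_feat bits involved. */
struct rt_feat {
	bool	rmapbt;
	bool	rtgroups;
};

/*
 * Resolve rmapbt on a realtime filesystem that lacks rtgroups:
 *  - both options given explicitly: reject the combination;
 *  - only rmapbt given explicitly: turn rtgroups on to satisfy it;
 *  - rmapbt merely on by default: quietly turn it off.
 * Returns 0 on success or EINVAL for the rejected combination.
 */
int
resolve_rt_rmap(
	struct rt_feat	*feat,
	bool		rmapbt_explicit,
	bool		rtgroups_explicit)
{
	assert(feat != NULL);

	if (feat->rtgroups || !feat->rmapbt)
		return 0;

	if (rmapbt_explicit && rtgroups_explicit)
		return EINVAL;

	if (rmapbt_explicit)
		feat->rtgroups = true;
	else
		feat->rmapbt = false;
	return 0;
}
```

The asymmetry is deliberate in the patch: an explicit request for rmapbt is honored by enabling its prerequisite, whereas a default is dropped without complaint.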
* [PATCH 39/41] xfs_repair: allow sysadmins to add realtime reverse mapping indexes 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (38 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 41/41] mkfs: create the realtime rmap inode Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 40/41] xfs_scrub: retest metadata across scrub groups after a repair Darrick J. Wong 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Allow the sysadmin to use xfs_repair to upgrade an existing filesystem to support the reverse mapping btree index for realtime volumes. This is needed for online fsck. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 4 ++ repair/phase2.c | 92 ++++++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 91 insertions(+), 5 deletions(-) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index ee864911e5e..2e7529cec54 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -56,6 +56,7 @@ #define xfs_btree_bload libxfs_btree_bload #define xfs_btree_bload_compute_geometry libxfs_btree_bload_compute_geometry #define xfs_btree_calc_size libxfs_btree_calc_size +#define xfs_btree_compute_maxlevels libxfs_btree_compute_maxlevels #define xfs_btree_decrement libxfs_btree_decrement #define xfs_btree_del_cursor libxfs_btree_del_cursor #define xfs_btree_delete libxfs_btree_delete @@ -171,6 +172,8 @@ #define xfs_imeta_link libxfs_imeta_link #define xfs_imeta_lookup libxfs_imeta_lookup #define xfs_imeta_mount libxfs_imeta_mount +#define xfs_imeta_resv_free_inode libxfs_imeta_resv_free_inode +#define xfs_imeta_resv_init_inode libxfs_imeta_resv_init_inode #define xfs_imeta_set_metaflag libxfs_imeta_set_metaflag #define xfs_imeta_start_update libxfs_imeta_start_update #define xfs_imeta_unlink 
libxfs_imeta_unlink @@ -246,6 +249,7 @@ #define xfs_rtgroup_put libxfs_rtgroup_put #define xfs_rtgroup_update_secondary_sbs libxfs_rtgroup_update_secondary_sbs #define xfs_rtgroup_update_super libxfs_rtgroup_update_super +#define xfs_rtrmapbt_calc_reserves libxfs_rtrmapbt_calc_reserves #define xfs_rtrmapbt_commit_staged_btree libxfs_rtrmapbt_commit_staged_btree #define xfs_rtrmapbt_create libxfs_rtrmapbt_create #define xfs_rtrmapbt_create_path libxfs_rtrmapbt_create_path diff --git a/repair/phase2.c b/repair/phase2.c index 707fe5ca519..35c1214be9a 100644 --- a/repair/phase2.c +++ b/repair/phase2.c @@ -264,9 +264,8 @@ set_rmapbt( exit(0); } - if (xfs_has_realtime(mp)) { - printf( - _("Reverse mapping btree feature not supported with realtime.\n")); + if (xfs_has_realtime(mp) && !xfs_has_rtgroups(mp)) { + printf(_("Reverse mapping btree requires realtime groups.\n")); exit(0); } @@ -284,6 +283,11 @@ set_rmapbt( printf(_("Adding reverse mapping btrees to filesystem.\n")); new_sb->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_RMAPBT; new_sb->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR; + + /* Quota counts will be wrong once we add the rmap inodes. */ + if (xfs_has_realtime(mp)) + quotacheck_skip(); + return true; } @@ -409,6 +413,55 @@ check_free_space( return avail > GIGABYTES(10, mp->m_sb.sb_blocklog); } +/* + * Reserve space to handle rt rmap btree expansion. + * + * If the rmap inode for this group already exists, we assume that we're adding + * some other feature. Note that we have not validated the metadata directory + * tree, so we must perform the lookup by hand and abort the upgrade if there + * are errors. Otherwise, the amount of space needed to handle a new maximally + * sized rmap btree is added to @new_resv. 
+ */ +static int +reserve_rtrmap_inode( + struct xfs_rtgroup *rtg, + xfs_rfsblock_t *new_resv) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_imeta_path *path; + xfs_ino_t ino; + xfs_filblks_t ask; + int error; + + if (!xfs_has_rtrmapbt(mp)) + return 0; + + error = -libxfs_rtrmapbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + return error; + + ask = libxfs_rtrmapbt_calc_reserves(mp); + + error = -libxfs_imeta_lookup(mp, path, &ino); + libxfs_imeta_free_path(path); + if (error == EFSCORRUPTED) { + if (ask > mp->m_sb.sb_fdblocks) + return ENOSPC; + + *new_resv += ask; + return 0; + } + if (error) + return error; + + error = -libxfs_imeta_iget(mp, ino, XFS_DIR3_FT_REG_FILE, + &rtg->rtg_rmapip); + if (error) + return error; + + return -libxfs_imeta_resv_init_inode(rtg->rtg_rmapip, ask); +} + static void check_fs_free_space( struct xfs_mount *mp, @@ -416,7 +469,10 @@ check_fs_free_space( struct xfs_sb *new_sb) { struct xfs_perag *pag; + struct xfs_rtgroup *rtg; + xfs_rfsblock_t new_resv = 0; xfs_agnumber_t agno; + xfs_rgnumber_t rgno; int error; /* Make sure we have enough space for per-AG reservations. */ @@ -492,15 +548,41 @@ check_fs_free_space( libxfs_trans_cancel(tp); } + /* Realtime metadata btree inodes */ + for_each_rtgroup(mp, rgno, rtg) { + error = reserve_rtrmap_inode(rtg, &new_resv); + if (error == ENOSPC) { + printf( +_("Not enough free space would remain for rtgroup %u rmap inode.\n"), + rtg->rtg_rgno); + exit(0); + } + if (error) + do_error( +_("Error %d while checking rtgroup %u rmap inode space reservation.\n"), + error, rtg->rtg_rgno); + } + /* * Would the post-upgrade filesystem have enough free space on the data - * device after making per-AG reservations? + * device after making per-AG reservations and reserving rt metadata + * inode blocks?
*/ - if (!check_free_space(mp, mp->m_sb.sb_fdblocks, mp->m_sb.sb_dblocks)) { + if (new_resv > mp->m_sb.sb_fdblocks || + !check_free_space(mp, mp->m_sb.sb_fdblocks, mp->m_sb.sb_dblocks)) { printf(_("Filesystem will be low on space after upgrade.\n")); exit(1); } + /* Unreserve the realtime metadata reservations. */ + for_each_rtgroup(mp, rgno, rtg) { + if (rtg->rtg_rmapip) { + libxfs_imeta_resv_free_inode(rtg->rtg_rmapip); + libxfs_imeta_irele(rtg->rtg_rmapip); + rtg->rtg_rmapip = NULL; + } + } + /* * Release the per-AG reservations and mark the per-AG structure as * uninitialized so that we don't trip over stale cached counters ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 40/41] xfs_scrub: retest metadata across scrub groups after a repair 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. Wong ` (39 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 39/41] xfs_repair: allow sysadmins to add realtime reverse mapping indexes Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Certain types of metadata have dependencies that cross scrub groups. For example, after repairing the part of the realtime bitmap corresponding to a realtime group, we potentially need to rebuild the realtime summary to reflect the new bitmap contents. The rtsummary is a separate scrub group (metafiles) from the rgbitmap (rtgroup), which means that the rtsummary repairs must be tracked by a separate scrub_item. Create the necessary dependency table and code to make these kinds of cross-group validations possible. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- scrub/phase4.c | 43 ++++++++++++++++++++ scrub/repair.c | 123 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ scrub/repair.h | 5 ++ 3 files changed, 171 insertions(+) diff --git a/scrub/phase4.c b/scrub/phase4.c index 74fcc55b379..2d0a448e268 100644 --- a/scrub/phase4.c +++ b/scrub/phase4.c @@ -42,6 +42,47 @@ struct repair_list_schedule { bool made_progress; }; +/* + * After a successful repair, schedule any additional revalidations needed in + * other scrub groups.
+ */ +static int +revalidate_across_groups( + struct scrub_ctx *ctx, + const struct action_item *old_aitem, + struct repair_list_schedule *rls) +{ + struct action_list alist; + int error; + + action_list_init(&alist); + + error = action_item_schedule_revalidation(ctx, old_aitem, &alist); + if (error) { + rls->aborted = true; + return error; + } + + if (action_list_empty(&alist)) + return 0; + + pthread_mutex_unlock(&rls->lock); + error = action_list_revalidate(ctx, &alist); + pthread_mutex_lock(&rls->lock); + + if (error) + rls->aborted = true; + else + rls->made_progress = true; + + /* + * Merge the action items into the scrub context for freeing, even if + * there was an error. + */ + action_list_merge(&rls->requeue_list, &alist); + return error; +} + /* Try to repair as many things on our list as we can. */ static void repair_list_worker( @@ -89,6 +130,8 @@ repair_list_worker( action_list_add(&rls->requeue_list, aitem); break; case TR_REPAIRED: + revalidate_across_groups(ctx, aitem, rls); + /* Item is clean. Free it. */ free(aitem); break; diff --git a/scrub/repair.c b/scrub/repair.c index 79a15f907a1..3e00db7a2fd 100644 --- a/scrub/repair.c +++ b/scrub/repair.c @@ -42,6 +42,15 @@ static const unsigned int repair_deps[XFS_SCRUB_TYPE_NR] = { DEP(XFS_SCRUB_TYPE_PQUOTA), [XFS_SCRUB_TYPE_RTSUM] = DEP(XFS_SCRUB_TYPE_RTBITMAP), }; + +/* + * Data dependencies that cross scrub groups. When we repair a metadata object + * of the given type (e.g. rtgroup bitmaps), we want to trigger a revalidation + * of the specified objects (e.g. rt summary file). 
+ */ +static const unsigned int cross_group_recheck[XFS_SCRUB_TYPE_NR] = { + [XFS_SCRUB_TYPE_RGBITMAP] = DEP(XFS_SCRUB_TYPE_RTSUM), +}; #undef DEP /* @@ -781,3 +790,117 @@ repair_item_to_action_item( *aitemp = aitem; return 0; } + +static int +schedule_cross_group_recheck( + struct scrub_ctx *ctx, + unsigned int recheck_mask, + struct action_list *new_items) +{ + unsigned int scrub_type; + + foreach_scrub_type(scrub_type) { + struct action_item *aitem; + + if (!(recheck_mask & (1U << scrub_type))) + continue; + + switch (xfrog_scrubbers[scrub_type].group) { + case XFROG_SCRUB_GROUP_METAFILES: + /* + * XXX gcc fortify gets confused on the memset in + * scrub_item_init_fs if we hoist this allocation to a + * helper function. + */ + aitem = malloc(sizeof(struct action_item)); + if (!aitem) { + int error = errno; + + str_liberror(ctx, error, + _("creating repair revalidation action item")); + return error; + } + + INIT_LIST_HEAD(&aitem->list); + aitem->sri.sri_revalidate = true; + + scrub_item_init_fs(&aitem->sri); + scrub_item_schedule(&aitem->sri, scrub_type); + action_list_add(new_items, aitem); + break; + default: + /* We don't support any other groups yet. */ + assert(false); + continue; + } + } + + return 0; +} + +/* + * After a successful repair, schedule revalidation of metadata outside of this + * scrub item's group. 
+ */ +int +action_item_schedule_revalidation( + struct scrub_ctx *ctx, + const struct action_item *old_aitem, + struct action_list *new_items) +{ + struct action_item *aitem, *n; + unsigned int scrub_type; + int error = 0; + + foreach_scrub_type(scrub_type) { + unsigned int mask; + + if (!(old_aitem->sri.sri_selected & (1U << scrub_type))) + continue; + mask = cross_group_recheck[scrub_type]; + if (!mask) + continue; + + error = schedule_cross_group_recheck(ctx, mask, new_items); + if (error) + goto bad; + } + + return 0; +bad: + list_for_each_entry_safe(aitem, n, &new_items->list, list) { + list_del(&aitem->list); + free(aitem); + } + return error; +} + +/* + * Revalidate all items scheduled for a recheck, and drop the ones that are + * clean. + */ +int +action_list_revalidate( + struct scrub_ctx *ctx, + struct action_list *alist) +{ + struct action_item *aitem, *n; + int error; + + list_for_each_entry_safe(aitem, n, &alist->list, list) { + error = scrub_item_check(ctx, &aitem->sri); + if (error) + return error; + + if (repair_item_count_needsrepair(&aitem->sri) > 0) { + aitem->sri.sri_revalidate = false; + continue; + } + + /* Metadata are clean, delete from list. */ + list_del(&aitem->list); + free(aitem); + } + + return 0; +} diff --git a/scrub/repair.h b/scrub/repair.h index c4b9b5799e2..f90ac16b13f 100644 --- a/scrub/repair.h +++ b/scrub/repair.h @@ -50,6 +50,11 @@ enum tryrepair_outcome { int action_item_try_repair(struct scrub_ctx *ctx, struct action_item *aitem, enum tryrepair_outcome *outcome); +int action_item_schedule_revalidation(struct scrub_ctx *ctx, + const struct action_item *old_aitem, + struct action_list *new_items); +int action_list_revalidate(struct scrub_ctx *sc, struct action_list *alist); + void repair_item_mustfix(struct scrub_item *sri, struct scrub_item *fix_now); /* Primary metadata is corrupt */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
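The cross_group_recheck table in this patch maps each scrub type to a bitmask of other types that need rechecking after a repair. The lookup can be sketched in a few lines. The enum below is a made-up three-entry subset rather than the real XFS_SCRUB_TYPE_* list, and recheck_mask_for() is a hypothetical helper that does not appear in scrub/repair.c:

```c
#include <assert.h>

/* Illustrative subset of scrub types; not the real XFS_SCRUB_TYPE_* values. */
enum scrub_type {
	ST_RTBITMAP = 0,
	ST_RTSUM,
	ST_RGBITMAP,
	ST_NR,
};

#define DEP(x)	(1U << (x))

/* Repairing the type on the left triggers a recheck of the mask on the right. */
static const unsigned int cross_group_recheck[ST_NR] = {
	[ST_RGBITMAP]	= DEP(ST_RTSUM),
};

/*
 * Given a bitmask of scrub types that were just repaired, return the bitmask
 * of scrub types in other groups that should be revalidated.
 */
unsigned int
recheck_mask_for(
	unsigned int	repaired_mask)
{
	unsigned int	type;
	unsigned int	mask = 0;

	assert(repaired_mask < DEP(ST_NR));

	for (type = 0; type < ST_NR; type++) {
		if (repaired_mask & DEP(type))
			mask |= cross_group_recheck[type];
	}
	return mask;
}
```

Types with no entry in the table default to a zero mask through the designated initializer, so adding a new dependency later is a one-line table change.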
* [PATCHSET v1.0 0/4] libxfs: file write utility refactoring
  2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
@ 2022-12-30 22:20 ` Darrick J. Wong
From: Darrick J. Wong @ 2022-12-30 22:20 UTC
To: djwong, cem; +Cc: linux-xfs

Hi all,

Refactor the parts of mkfs and xfs_repair that open-code the process of
mapping disk space into files and writing data into them.  This will help
primarily with resetting of the realtime metadata, but is also used for
protofiles.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=bmap-utils
---
 include/libxfs.h |    6 ++-
 libxfs/util.c    |  109 ++++++++++++++++++++++++++++++++++++++++---------
 mkfs/proto.c     |  120 ++++++++++--------------------------------------------
 repair/phase6.c  |   74 +++++----------------------------
 4 files changed, 125 insertions(+), 184 deletions(-)
* [PATCH 4/4] mkfs: use file write helper to populate files 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] libxfs: file write utility refactoring Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 2/4] mkfs: use libxfs_alloc_file_space for rtinit Darrick J. Wong ` (2 subsequent siblings) 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Use the file write helper to write files into the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/libxfs.h | 2 ++ libxfs/util.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++ mkfs/proto.c | 26 ++++---------------- 3 files changed, 76 insertions(+), 21 deletions(-) diff --git a/include/libxfs.h b/include/libxfs.h index d4985a5769f..0949bbd39a5 100644 --- a/include/libxfs.h +++ b/include/libxfs.h @@ -169,6 +169,8 @@ extern int libxfs_log_header(char *, uuid_t *, int, int, int, xfs_lsn_t, extern int libxfs_alloc_file_space(struct xfs_inode *ip, xfs_off_t offset, xfs_off_t len, int alloc_type); +extern int libxfs_file_write(struct xfs_trans *tp, struct xfs_inode *ip, + void *buf, size_t len, bool logit); /* XXX: this is messy and needs fixing */ #ifndef __LIBXFS_INTERNAL_XFS_H__ diff --git a/libxfs/util.c b/libxfs/util.c index bb6867c21af..5643da72570 100644 --- a/libxfs/util.c +++ b/libxfs/util.c @@ -534,3 +534,72 @@ libxfs_imeta_ensure_dirpath( return error == -EEXIST ? 0 : error; } + +/* + * Write a buffer to a file on the data device. We assume there are no holes + * and no unwritten extents. 
+ */ +int +libxfs_file_write( + struct xfs_trans *tp, + struct xfs_inode *ip, + void *buf, + size_t len, + bool logit) +{ + struct xfs_bmbt_irec map; + struct xfs_mount *mp = ip->i_mount; + struct xfs_buf *bp; + xfs_fileoff_t bno = 0; + xfs_fileoff_t end_bno = XFS_B_TO_FSB(mp, len); + size_t count; + size_t bcount; + int nmap; + int error = 0; + + /* Write up to 1MB at a time. */ + while (bno < end_bno) { + xfs_filblks_t maplen; + + maplen = min(end_bno - bno, XFS_B_TO_FSBT(mp, 1048576)); + nmap = 1; + error = libxfs_bmapi_read(ip, bno, maplen, &map, &nmap, 0); + if (error) + return error; + if (nmap != 1) + return -ENOSPC; + + if (map.br_startblock == HOLESTARTBLOCK || + map.br_state == XFS_EXT_UNWRITTEN) + return -EINVAL; + + error = libxfs_trans_get_buf(tp, mp->m_dev, + XFS_FSB_TO_DADDR(mp, map.br_startblock), + XFS_FSB_TO_BB(mp, map.br_blockcount), + 0, &bp); + if (error) + break; + bp->b_ops = NULL; + + count = min(len, XFS_FSB_TO_B(mp, map.br_blockcount)); + memmove(bp->b_addr, buf, count); + bcount = BBTOB(bp->b_length); + if (count < bcount) + memset((char *)bp->b_addr + count, 0, bcount - count); + + if (tp) { + libxfs_trans_log_buf(tp, bp, 0, bcount - 1); + } else { + libxfs_buf_mark_dirty(bp); + libxfs_buf_relse(bp); + } + if (error) + break; + + buf += count; + len -= count; + bno += map.br_blockcount; + } + + return error; +} diff --git a/mkfs/proto.c b/mkfs/proto.c index c62918a2f7d..96eab25da45 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -244,16 +244,12 @@ writefile( { struct xfs_bmbt_irec map; struct xfs_mount *mp; - struct xfs_buf *bp; - xfs_daddr_t d; xfs_extlen_t nb; int nmap; int error; mp = ip->i_mount; if (len > 0) { - int bcount; - nb = XFS_B_TO_FSB(mp, len); nmap = 1; error = -libxfs_bmapi_write(tp, ip, 0, nb, 0, nb, &map, &nmap); @@ -263,30 +259,18 @@ writefile( progname); exit(1); } - if (error) { + if (error) fail(_("error allocating space for a file"), error); - } if (nmap != 1) { fprintf(stderr, _("%s: cannot allocate space for 
file\n"), progname); exit(1); } - d = XFS_FSB_TO_DADDR(mp, map.br_startblock); - error = -libxfs_trans_get_buf(NULL, mp->m_dev, d, - nb << mp->m_blkbb_log, 0, &bp); - if (error) { - fprintf(stderr, - _("%s: cannot allocate buffer for file\n"), - progname); - exit(1); - } - memmove(bp->b_addr, buf, len); - bcount = BBTOB(bp->b_length); - if (len < bcount) - memset((char *)bp->b_addr + len, 0, bcount - len); - libxfs_buf_mark_dirty(bp); - libxfs_buf_relse(bp); + + error = -libxfs_file_write(tp, ip, buf, len, false); + if (error) + fail(_("error writing file"), error); } ip->i_disk_size = len; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
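The copy loop in libxfs_file_write can be hard to follow inside a diff, so here is a simplified sketch of its shape: write a byte buffer into whole blocks, a bounded number of blocks per pass, and zero the unused tail of the last block. This is an illustration under stated assumptions, not the helper itself: the destination is plain memory rather than buffer-cache buffers, there is no extent mapping or transaction, and the names `demo_chunked_write`, `blksz`, and `chunk_blks` are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy len bytes from src into a destination made of whole blocks of
 * blksz bytes, at most chunk_blks blocks per pass, and zero whatever is
 * left of the final block.  Returns the number of blocks touched.
 */
size_t
demo_chunked_write(
	unsigned char		*dst,
	const unsigned char	*src,
	size_t			len,
	size_t			blksz,
	size_t			chunk_blks)
{
	size_t			total_blks;
	size_t			blk = 0;

	assert(blksz > 0 && chunk_blks > 0);
	total_blks = (len + blksz - 1) / blksz;

	while (blk < total_blks) {
		size_t		nblks = total_blks - blk;
		size_t		space, count;

		if (nblks > chunk_blks)
			nblks = chunk_blks;
		space = nblks * blksz;
		count = len < space ? len : space;

		memcpy(dst, src, count);
		if (count < space)
			memset(dst + count, 0, space - count);

		/* The source advances by bytes copied, the cursor by blocks. */
		dst += space;
		src += count;
		len -= count;
		blk += nblks;
	}
	return total_blks;
}
```

The sketch keeps two cursors, one in bytes for the source and one in blocks for the destination, which mirrors how the patch advances `buf`/`len` by `count` but `bno` by `map.br_blockcount`.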
* [PATCH 2/4] mkfs: use libxfs_alloc_file_space for rtinit 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] libxfs: file write utility refactoring Darrick J. Wong 2022-12-30 22:20 ` [PATCH 4/4] mkfs: use file write helper to populate files Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/4] xfs_repair: use libxfs_alloc_file_space to reallocate rt metadata Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/4] libxfs: resync libxfs_alloc_file_space interface with the kernel Darrick J. Wong 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Since xfs_bmapi_write can now zero newly allocated blocks, use it to initialize the realtime inodes instead of open coding this. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- mkfs/proto.c | 92 +++++++++++----------------------------------------------- 1 file changed, 17 insertions(+), 75 deletions(-) diff --git a/mkfs/proto.c b/mkfs/proto.c index b11b7fa5f95..c62918a2f7d 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -916,43 +916,14 @@ static void rtbitmap_init( struct xfs_mount *mp) { - struct xfs_bmbt_irec map[XFS_BMAP_MAX_NMAP]; - struct xfs_trans *tp; - struct xfs_bmbt_irec *ep; - xfs_fileoff_t bno; - uint blocks; - int i; - int nmap; int error; - blocks = mp->m_sb.sb_rbmblocks + - XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1; - error = -libxfs_trans_alloc_rollable(mp, blocks, &tp); + error = -libxfs_alloc_file_space(mp->m_rbmip, 0, + mp->m_sb.sb_rbmblocks << mp->m_sb.sb_blocklog, + XFS_BMAPI_ZERO); if (error) - res_failed(error); - - libxfs_trans_ijoin(tp, mp->m_rbmip, 0); - bno = 0; - while (bno < mp->m_sb.sb_rbmblocks) { - nmap = XFS_BMAP_MAX_NMAP; - error = -libxfs_bmapi_write(tp, mp->m_rbmip, bno, - (xfs_extlen_t)(mp->m_sb.sb_rbmblocks - bno), - 0, mp->m_sb.sb_rbmblocks, map, &nmap); - if (error) - fail(_("Allocation of the realtime bitmap failed"), - error); - - for (i = 
0, ep = map; i < nmap; i++, ep++) { - libxfs_device_zero(mp->m_ddev_targp, - XFS_FSB_TO_DADDR(mp, ep->br_startblock), - XFS_FSB_TO_BB(mp, ep->br_blockcount)); - bno += ep->br_blockcount; - } - } - - error = -libxfs_trans_commit(tp); - if (error) - fail(_("Block allocation of the realtime bitmap inode failed"), + fail( + _("Block allocation of the realtime bitmap inode failed"), error); if (xfs_has_rtgroups(mp)) { @@ -968,44 +939,13 @@ static void rtsummary_init( struct xfs_mount *mp) { - struct xfs_bmbt_irec map[XFS_BMAP_MAX_NMAP]; - struct xfs_trans *tp; - struct xfs_bmbt_irec *ep; - xfs_fileoff_t bno; - xfs_extlen_t nsumblocks; - uint blocks; - int i; - int nmap; int error; - nsumblocks = mp->m_rsumsize >> mp->m_sb.sb_blocklog; - blocks = nsumblocks + XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1; - error = -libxfs_trans_alloc_rollable(mp, blocks, &tp); + error = -libxfs_alloc_file_space(mp->m_rsumip, 0, mp->m_rsumsize, + XFS_BMAPI_ZERO); if (error) - res_failed(error); - libxfs_trans_ijoin(tp, mp->m_rsumip, 0); - - bno = 0; - while (bno < nsumblocks) { - nmap = XFS_BMAP_MAX_NMAP; - error = -libxfs_bmapi_write(tp, mp->m_rsumip, bno, - (xfs_extlen_t)(nsumblocks - bno), - 0, nsumblocks, map, &nmap); - if (error) - fail(_("Allocation of the realtime summary failed"), - error); - - for (i = 0, ep = map; i < nmap; i++, ep++) { - libxfs_device_zero(mp->m_ddev_targp, - XFS_FSB_TO_DADDR(mp, ep->br_startblock), - XFS_FSB_TO_BB(mp, ep->br_blockcount)); - bno += ep->br_blockcount; - } - } - - error = -libxfs_trans_commit(tp); - if (error) - fail(_("Block allocation of the realtime summary inode failed"), + fail( + _("Block allocation of the realtime summary inode failed"), error); if (xfs_has_rtgroups(mp)) { @@ -1111,12 +1051,14 @@ rtinit( rtrmapbt_create(rtg); } - rtbitmap_init(mp); - rtsummary_init(mp); - if (xfs_has_rtgroups(mp)) - rtfreesp_init_groups(mp); - else - rtfreesp_init(mp); + if (mp->m_sb.sb_rbmblocks) { + rtbitmap_init(mp); + rtsummary_init(mp); + if 
(xfs_has_rtgroups(mp)) + rtfreesp_init_groups(mp); + else + rtfreesp_init(mp); + } } static long ^ permalink raw reply related [flat|nested] 565+ messages in thread
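One detail a reviewer might want to look at in rtbitmap_init is the length argument, which is computed as `sb_rbmblocks << sb_blocklog`. Whether that shift can wrap depends on the types of the two superblock fields, which this patch does not show. The sketch below assumes a 32-bit block count and shows the widening cast that would make the conversion safe regardless; `demo_blocks_to_bytes` is a hypothetical name, and this is not a claim about how xfsprogs actually behaves.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a 32-bit block count to a byte length.  Widening to 64 bits
 * before the shift keeps the result from wrapping at 4GiB.
 */
int64_t
demo_blocks_to_bytes(uint32_t nblocks, uint8_t blocklog)
{
	/* Keep the result comfortably inside a signed 64-bit value. */
	assert(blocklog < 31);
	return (int64_t)nblocks << blocklog;
}
```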
* [PATCH 3/4] xfs_repair: use libxfs_alloc_file_space to reallocate rt metadata 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] libxfs: file write utility refactoring Darrick J. Wong 2022-12-30 22:20 ` [PATCH 4/4] mkfs: use file write helper to populate files Darrick J. Wong 2022-12-30 22:20 ` [PATCH 2/4] mkfs: use libxfs_alloc_file_space for rtinit Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/4] libxfs: resync libxfs_alloc_file_space interface with the kernel Darrick J. Wong 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Now that libxfs_alloc_file_space can allocate and zero blocks, use it to repair the realtime metadata instead of open-coding all this. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/phase6.c | 74 +++++++------------------------------------------------ 1 file changed, 10 insertions(+), 64 deletions(-) diff --git a/repair/phase6.c b/repair/phase6.c index 8828f7f72b9..890bb20bce1 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -695,13 +695,8 @@ mk_rbmino( struct xfs_imeta_update upd; struct xfs_trans *tp; struct xfs_inode *ip; - struct xfs_bmbt_irec *ep; int i; - int nmap; int error; - xfs_fileoff_t bno; - struct xfs_bmbt_irec map[XFS_BMAP_MAX_NMAP]; - uint blocks; error = ensure_imeta_dirpath(mp, &XFS_IMETA_RTBITMAP); if (error) @@ -744,36 +739,15 @@ _("Couldn't find realtime bitmap parent, error %d\n"), * then allocate blocks for file and fill with zeroes (stolen * from mkfs) */ - blocks = mp->m_sb.sb_rbmblocks + - XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1; - error = -libxfs_trans_alloc_rollable(mp, blocks, &tp); - if (error) - res_failed(error); - - libxfs_trans_ijoin(tp, ip, 0); - bno = 0; - while (bno < mp->m_sb.sb_rbmblocks) { - nmap = XFS_BMAP_MAX_NMAP; - error = -libxfs_bmapi_write(tp, ip, bno, - (xfs_extlen_t)(mp->m_sb.sb_rbmblocks - bno), - 0, 
mp->m_sb.sb_rbmblocks, map, &nmap); + if (mp->m_sb.sb_rbmblocks) { + error = -libxfs_alloc_file_space(ip, 0, + mp->m_sb.sb_rbmblocks << mp->m_sb.sb_blocklog, + XFS_BMAPI_ZERO); if (error) { do_error( - _("couldn't allocate realtime bitmap, error = %d\n"), + _("allocation of the realtime bitmap failed, error = %d\n"), error); } - for (i = 0, ep = map; i < nmap; i++, ep++) { - libxfs_device_zero(mp->m_ddev_targp, - XFS_FSB_TO_DADDR(mp, ep->br_startblock), - XFS_FSB_TO_BB(mp, ep->br_blockcount)); - bno += ep->br_blockcount; - } - } - error = -libxfs_trans_commit(tp); - if (error) { - do_error( - _("allocation of the realtime bitmap failed, error = %d\n"), - error); } libxfs_irele(ip); } @@ -951,14 +925,8 @@ mk_rsumino( struct xfs_imeta_update upd; struct xfs_trans *tp; struct xfs_inode *ip; - struct xfs_bmbt_irec *ep; int i; - int nmap; int error; - int nsumblocks; - xfs_fileoff_t bno; - struct xfs_bmbt_irec map[XFS_BMAP_MAX_NMAP]; - uint blocks; error = ensure_imeta_dirpath(mp, &XFS_IMETA_RTSUMMARY); if (error) @@ -1001,36 +969,14 @@ _("Couldn't find realtime summary parent, error %d\n"), * then allocate blocks for file and fill with zeroes (stolen * from mkfs) */ - nsumblocks = mp->m_rsumsize >> mp->m_sb.sb_blocklog; - blocks = nsumblocks + XFS_BM_MAXLEVELS(mp, XFS_DATA_FORK) - 1; - error = -libxfs_trans_alloc_rollable(mp, blocks, &tp); - if (error) - res_failed(error); - - libxfs_trans_ijoin(tp, ip, 0); - bno = 0; - while (bno < nsumblocks) { - nmap = XFS_BMAP_MAX_NMAP; - error = -libxfs_bmapi_write(tp, ip, bno, - (xfs_extlen_t)(nsumblocks - bno), - 0, nsumblocks, map, &nmap); + if (mp->m_rsumsize) { + error = -libxfs_alloc_file_space(ip, 0, mp->m_rsumsize, + XFS_BMAPI_ZERO); if (error) { do_error( - _("couldn't allocate realtime summary inode, error = %d\n"), - error); - } - for (i = 0, ep = map; i < nmap; i++, ep++) { - libxfs_device_zero(mp->m_ddev_targp, - XFS_FSB_TO_DADDR(mp, ep->br_startblock), - XFS_FSB_TO_BB(mp, ep->br_blockcount)); - bno += 
ep->br_blockcount; - } - } - error = -libxfs_trans_commit(tp); - if (error) { - do_error( _("allocation of the realtime summary ino failed, error = %d\n"), - error); + error); + } } libxfs_irele(ip); } ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 1/4] libxfs: resync libxfs_alloc_file_space interface with the kernel 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] libxfs: file write utility refactoring Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 3/4] xfs_repair: use libxfs_alloc_file_space to reallocate rt metadata Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make the userspace xfs_alloc_file_space behave (more or less) like the kernel version, at least as far as the interface goes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/libxfs.h | 4 ++-- libxfs/util.c | 40 +++++++++++++++++++--------------------- mkfs/proto.c | 2 +- 3 files changed, 22 insertions(+), 24 deletions(-) diff --git a/include/libxfs.h b/include/libxfs.h index b1e499569ac..d4985a5769f 100644 --- a/include/libxfs.h +++ b/include/libxfs.h @@ -167,8 +167,8 @@ extern int libxfs_log_header(char *, uuid_t *, int, int, int, xfs_lsn_t, /* Shared utility routines */ -extern int libxfs_alloc_file_space (struct xfs_inode *, xfs_off_t, - xfs_off_t, int, int); +extern int libxfs_alloc_file_space(struct xfs_inode *ip, xfs_off_t offset, + xfs_off_t len, int alloc_type); /* XXX: this is messy and needs fixing */ #ifndef __LIBXFS_INTERNAL_XFS_H__ diff --git a/libxfs/util.c b/libxfs/util.c index e8397fdc341..bb6867c21af 100644 --- a/libxfs/util.c +++ b/libxfs/util.c @@ -179,25 +179,23 @@ libxfs_mod_incore_sb( */ int libxfs_alloc_file_space( - xfs_inode_t *ip, - xfs_off_t offset, - xfs_off_t len, - int alloc_type, - int attr_flags) + struct xfs_inode *ip, + xfs_off_t offset, + xfs_off_t len, + int alloc_type) { - xfs_mount_t *mp; - xfs_off_t count; - xfs_filblks_t datablocks; - xfs_filblks_t allocated_fsb; - xfs_filblks_t allocatesize_fsb; - xfs_bmbt_irec_t *imapp; - xfs_bmbt_irec_t imaps[1]; - int reccount; - uint resblks; - 
xfs_fileoff_t startoffset_fsb; - xfs_trans_t *tp; - int xfs_bmapi_flags; - int error; + struct xfs_bmbt_irec imaps[1]; + struct xfs_bmbt_irec *imapp; + struct xfs_mount *mp; + struct xfs_trans *tp; + xfs_off_t count; + xfs_filblks_t datablocks; + xfs_filblks_t allocated_fsb; + xfs_filblks_t allocatesize_fsb; + int reccount; + uint resblks; + xfs_fileoff_t startoffset_fsb; + int error; if (len <= 0) return -EINVAL; @@ -206,7 +204,6 @@ libxfs_alloc_file_space( error = 0; imapp = &imaps[0]; reccount = 1; - xfs_bmapi_flags = alloc_type ? XFS_BMAPI_PREALLOC : 0; mp = ip->i_mount; startoffset_fsb = XFS_B_TO_FSBT(mp, offset); allocatesize_fsb = XFS_B_TO_FSB(mp, count); @@ -227,8 +224,9 @@ libxfs_alloc_file_space( } xfs_trans_ijoin(tp, ip, 0); - error = xfs_bmapi_write(tp, ip, startoffset_fsb, allocatesize_fsb, - xfs_bmapi_flags, 0, imapp, &reccount); + error = xfs_bmapi_write(tp, ip, startoffset_fsb, + allocatesize_fsb, alloc_type, resblks, + imapp, &reccount); if (error) goto error0; diff --git a/mkfs/proto.c b/mkfs/proto.c index 36af61ed5c0..b11b7fa5f95 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -179,7 +179,7 @@ rsvfile( int error; xfs_trans_t *tp; - error = -libxfs_alloc_file_space(ip, 0, llen, 1, 0); + error = -libxfs_alloc_file_space(ip, 0, llen, XFS_BMAPI_PREALLOC); if (error) { fail(_("error reserving space for a file"), error); ^ permalink raw reply related [flat|nested] 565+ messages in thread
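libxfs_alloc_file_space converts its byte arguments to filesystem blocks with two different macros: XFS_B_TO_FSBT for the starting offset (truncate) and XFS_B_TO_FSB for the length (round up). A minimal sketch of the two roundings follows, using hypothetical `demo_` helpers that take the block size as a log2 value instead of a mount structure.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte offset down to a block number (truncating). */
uint64_t
demo_b_to_fsbt(uint64_t bytes, unsigned int blocklog)
{
	return bytes >> blocklog;
}

/* Round a byte count up to a whole number of blocks. */
uint64_t
demo_b_to_fsb(uint64_t bytes, unsigned int blocklog)
{
	uint64_t	blkmask = ((uint64_t)1 << blocklog) - 1;

	/* Adding the mask must not wrap. */
	assert(bytes <= UINT64_MAX - blkmask);
	return (bytes + blkmask) >> blocklog;
}
```

Rounding the start down and the length up means the allocated block range always covers the requested byte range, at the cost of touching up to one extra partial block at each end.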
* [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device
  2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
@ 2022-12-30 22:20 ` Darrick J. Wong
From: Darrick J. Wong @ 2022-12-30 22:20 UTC
To: djwong, cem; +Cc: linux-xfs

Hi all,

This patchset enables use of the file data block sharing feature (i.e.
reflink) on the realtime device.  It follows the same basic sequence as
the realtime rmap series -- first a few cleanups; then widening of the
API parameters; and introduction of the new btree format and inode fork
format.  Next comes enabling CoW and remapping for the rt device; new
scrub, repair, and health reporting code; and at the end we implement
some code to lengthen write requests so that rt extents are always CoWed
fully.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-reflink

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-reflink

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-reflink

xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=realtime-reflink
---
 db/bmroot.c                   |  148 ++++++++
 db/bmroot.h                   |    2 
 db/btblock.c                  |   53 +++
 db/btblock.h                  |    5 
 db/btdump.c                   |   11 +
 db/btheight.c                 |    5 
 db/check.c                    |  326 ++++++++++++++---
 db/field.c                    |   13 +
 db/field.h                    |    5 
 db/inode.c                    |   76 ++++
 db/inode.h                    |    2 
 db/metadump.c                 |  125 ++++++
 db/type.c                     |    5 
 db/type.h                     |    1 
 include/libxfs.h              |    1 
 include/xfs_mount.h           |    9 
 libfrog/scrub.c               |    5 
 libxfs/Makefile               |    2 
 libxfs/defer_item.c           |   20 +
 libxfs/init.c                 |   14 +
 libxfs/libxfs_api_defs.h      |   12 +
 libxfs/xfs_bmap.c             |   31 +-
 libxfs/xfs_btree.c            |    6 
 libxfs/xfs_btree.h            |   12 -
 libxfs/xfs_defer.c            |    1 
 libxfs/xfs_defer.h            |    1 
 libxfs/xfs_format.h           |   25 +
 libxfs/xfs_fs.h               |    4 
 libxfs/xfs_health.h           |    4 
 libxfs/xfs_inode_buf.c        |   34 +-
 libxfs/xfs_inode_fork.c       |   13 +
 libxfs/xfs_log_format.h       |    5 
 libxfs/xfs_refcount.c         |  314 ++++++++++++----
 libxfs/xfs_refcount.h         |   25 +
 libxfs/xfs_rmap.c             |   14 +
 libxfs/xfs_rtgroup.c          |   10 +
 libxfs/xfs_rtgroup.h          |    8 
 libxfs/xfs_rtrefcount_btree.c |  809 +++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_rtrefcount_btree.h |  195 ++++++++++
 libxfs/xfs_rtrmap_btree.c     |   28 +
 libxfs/xfs_sb.c               |    8 
 libxfs/xfs_shared.h           |    2 
 libxfs/xfs_trans_inode.c      |   14 +
 libxfs/xfs_trans_resv.c       |   25 +
 libxfs/xfs_types.h            |    5 
 man/man8/xfs_db.8             |   49 ++
 mkfs/proto.c                  |   44 ++
 mkfs/xfs_mkfs.c               |   79 ++++
 repair/Makefile               |    1 
 repair/agbtree.c              |    4 
 repair/dino_chunks.c          |   11 +
 repair/dinode.c               |  266 ++++++++++++-
 repair/dir2.c                 |    2 
 repair/incore.h               |    1 
 repair/phase2.c               |   75 ++++
 repair/phase4.c               |   30 +-
 repair/phase6.c               |  115 ++++++
 repair/rmap.c                 |  327 ++++++++++++++---
 repair/rmap.h                 |   17 +
 repair/rtrefcount_repair.c    |  248 +++++++++++++
 repair/scan.c                 |  324 ++++++++++++++++
 repair/scan.h                 |   33 ++
 scrub/repair.c                |    1 
 spaceman/health.c             |   10 +
 64 files changed, 3792 insertions(+), 278 deletions(-)
 create mode 100644 libxfs/xfs_rtrefcount_btree.c
 create mode 100644 libxfs/xfs_rtrefcount_btree.h
 create mode 100644 repair/rtrefcount_repair.c
* [PATCH 02/41] xfs: namespace the maximum length/refcount symbols 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 01/41] xfs: introduce realtime refcount btree definitions Darrick J. Wong ` (39 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Actually namespace these variables properly, so that readers can tell that this is an XFS symbol, and that it's for the refcount functionality. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_format.h | 4 ++-- libxfs/xfs_refcount.c | 18 +++++++++--------- repair/rmap.c | 4 ++-- repair/scan.c | 2 +- 4 files changed, 14 insertions(+), 14 deletions(-) diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index c78fe8e78b8..c49a946e79f 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1790,8 +1790,8 @@ struct xfs_refcount_key { __be32 rc_startblock; /* starting block number */ }; -#define MAXREFCOUNT ((xfs_nlink_t)~0U) -#define MAXREFCEXTLEN ((xfs_extlen_t)~0U) +#define XFS_REFC_REFCOUNT_MAX ((xfs_nlink_t)~0U) +#define XFS_REFC_LEN_MAX ((xfs_extlen_t)~0U) /* btree pointer type */ typedef __be32 xfs_refcount_ptr_t; diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c index 614525e12e3..e1d8b3c07bd 100644 --- a/libxfs/xfs_refcount.c +++ b/libxfs/xfs_refcount.c @@ -125,7 +125,7 @@ xfs_refcount_check_perag_irec( struct xfs_perag *pag, const struct xfs_refcount_irec *irec) { - if (irec->rc_blockcount == 0 || irec->rc_blockcount > MAXREFCEXTLEN) + if (irec->rc_blockcount == 0 || irec->rc_blockcount > XFS_REFC_LEN_MAX) return __this_address; if (!xfs_refcount_check_domain(irec)) @@ -135,7 +135,7 @@ xfs_refcount_check_perag_irec( if (!xfs_verify_agbext(pag, irec->rc_startblock, irec->rc_blockcount)) return __this_address; - if 
(irec->rc_refcount == 0 || irec->rc_refcount > MAXREFCOUNT) + if (irec->rc_refcount == 0 || irec->rc_refcount > XFS_REFC_REFCOUNT_MAX) return __this_address; return NULL; @@ -859,9 +859,9 @@ xfs_refc_merge_refcount( const struct xfs_refcount_irec *irec, enum xfs_refc_adjust_op adjust) { - /* Once a record hits MAXREFCOUNT, it is pinned there forever */ - if (irec->rc_refcount == MAXREFCOUNT) - return MAXREFCOUNT; + /* Once a record hits XFS_REFC_REFCOUNT_MAX, it is pinned forever */ + if (irec->rc_refcount == XFS_REFC_REFCOUNT_MAX) + return XFS_REFC_REFCOUNT_MAX; return irec->rc_refcount + adjust; } @@ -904,7 +904,7 @@ xfs_refc_want_merge_center( * hence we need to catch u32 addition overflows here. */ ulen += cleft->rc_blockcount + right->rc_blockcount; - if (ulen >= MAXREFCEXTLEN) + if (ulen >= XFS_REFC_LEN_MAX) return false; *ulenp = ulen; @@ -939,7 +939,7 @@ xfs_refc_want_merge_left( * hence we need to catch u32 addition overflows here. */ ulen += cleft->rc_blockcount; - if (ulen >= MAXREFCEXTLEN) + if (ulen >= XFS_REFC_LEN_MAX) return false; return true; @@ -973,7 +973,7 @@ xfs_refc_want_merge_right( * hence we need to catch u32 addition overflows here. */ ulen += cright->rc_blockcount; - if (ulen >= MAXREFCEXTLEN) + if (ulen >= XFS_REFC_LEN_MAX) return false; return true; @@ -1200,7 +1200,7 @@ xfs_refcount_adjust_extents( * Adjust the reference count and either update the tree * (incr) or free the blocks (decr). 
*/ - if (ext.rc_refcount == MAXREFCOUNT) + if (ext.rc_refcount == XFS_REFC_REFCOUNT_MAX) goto skip; ext.rc_refcount += adj; trace_xfs_refcount_modify_extent(cur, &ext); diff --git a/repair/rmap.c b/repair/rmap.c index 9795f0ec577..15a3e2ecaec 100644 --- a/repair/rmap.c +++ b/repair/rmap.c @@ -996,8 +996,8 @@ refcount_emit( agno, agbno, len, nr_rmaps); rlrec.rc_startblock = agbno; rlrec.rc_blockcount = len; - if (nr_rmaps > MAXREFCOUNT) - nr_rmaps = MAXREFCOUNT; + if (nr_rmaps > XFS_REFC_REFCOUNT_MAX) + nr_rmaps = XFS_REFC_REFCOUNT_MAX; rlrec.rc_refcount = nr_rmaps; rlrec.rc_domain = XFS_REFC_DOMAIN_SHARED; diff --git a/repair/scan.c b/repair/scan.c index 40e8007e698..0ff8afccedc 100644 --- a/repair/scan.c +++ b/repair/scan.c @@ -1893,7 +1893,7 @@ _("extent (%u/%u) len %u claimed, state is %d\n"), break; } } - } else if (nr < 2 || nr > MAXREFCOUNT) { + } else if (nr < 2 || nr > XFS_REFC_REFCOUNT_MAX) { do_warn( _("invalid reference count %u in record %u of %s btree block %u/%u\n"), nr, i, name, agno, bno); ^ permalink raw reply related [flat|nested] 565+ messages in thread
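Two behaviours in the hunks above rely on the renamed limits: a reference count that reaches the maximum is pinned there, and a merge is refused when the combined extent length would reach the length limit. A standalone sketch of both checks is below. The `DEMO_` constants and `demo_` functions are stand-ins for illustration, assuming 32-bit counters as the `~0U` definitions suggest.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_REFCOUNT_MAX	((uint32_t)~0U)
#define DEMO_LEN_MAX		((uint32_t)~0U)

/* Apply +1/-1 to a refcount; a record at the maximum stays pinned there. */
uint32_t
demo_refcount_adjust(uint32_t refcount, int adjust)
{
	assert(adjust == 1 || adjust == -1);
	if (refcount == DEMO_REFCOUNT_MAX)
		return DEMO_REFCOUNT_MAX;
	return refcount + adjust;
}

/* Sum two extent lengths in 64 bits so a 32-bit wrap cannot hide. */
bool
demo_can_merge(uint32_t left_len, uint32_t right_len)
{
	uint64_t	ulen = (uint64_t)left_len + right_len;

	return ulen < DEMO_LEN_MAX;
}
```

Pinning at the maximum trades accuracy for safety: such a record can never be decremented back to zero and freed while something might still reference it.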
* [PATCH 01/41] xfs: introduce realtime refcount btree definitions 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong 2022-12-30 22:20 ` [PATCH 02/41] xfs: namespace the maximum length/refcount symbols Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 06/41] xfs: prepare refcount functions to deal with rtrefcountbt Darrick J. Wong ` (38 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add new realtime refcount btree definitions. The realtime refcount btree will be rooted from a hidden inode, but has its own shape and therefore needs to have most of its own separate types. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_btree.h | 1 + libxfs/xfs_format.h | 6 ++++++ libxfs/xfs_types.h | 5 +++-- 3 files changed, 10 insertions(+), 2 deletions(-) diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h index 1d8656ca112..48dbb681cf3 100644 --- a/libxfs/xfs_btree.h +++ b/libxfs/xfs_btree.h @@ -65,6 +65,7 @@ union xfs_btree_rec { #define XFS_BTNUM_REFC ((xfs_btnum_t)XFS_BTNUM_REFCi) #define XFS_BTNUM_RCBAG ((xfs_btnum_t)XFS_BTNUM_RCBAGi) #define XFS_BTNUM_RTRMAP ((xfs_btnum_t)XFS_BTNUM_RTRMAPi) +#define XFS_BTNUM_RTREFC ((xfs_btnum_t)XFS_BTNUM_RTREFCi) struct xfs_btree_ops; uint32_t xfs_btree_magic(struct xfs_mount *mp, const struct xfs_btree_ops *ops); diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index a2b8d8ee8af..c78fe8e78b8 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1796,6 +1796,12 @@ struct xfs_refcount_key { /* btree pointer type */ typedef __be32 xfs_refcount_ptr_t; +/* + * Realtime Reference Count btree format definitions + * + * This is a btree for reference count records for realtime volumes + */ +#define XFS_RTREFC_CRC_MAGIC 0x52434e54 /* 'RCNT' */ /* * BMAP Btree format definitions diff 
--git a/libxfs/xfs_types.h b/libxfs/xfs_types.h index e6a4f4a7d00..92c60a9d586 100644 --- a/libxfs/xfs_types.h +++ b/libxfs/xfs_types.h @@ -126,7 +126,7 @@ typedef enum { typedef enum { XFS_BTNUM_BNOi, XFS_BTNUM_CNTi, XFS_BTNUM_RMAPi, XFS_BTNUM_BMAPi, XFS_BTNUM_INOi, XFS_BTNUM_FINOi, XFS_BTNUM_REFCi, XFS_BTNUM_RCBAGi, - XFS_BTNUM_RTRMAPi, XFS_BTNUM_MAX + XFS_BTNUM_RTRMAPi, XFS_BTNUM_RTREFCi, XFS_BTNUM_MAX } xfs_btnum_t; #define XFS_BTNUM_STRINGS \ @@ -138,7 +138,8 @@ typedef enum { { XFS_BTNUM_FINOi, "finobt" }, \ { XFS_BTNUM_REFCi, "refcbt" }, \ { XFS_BTNUM_RCBAGi, "rcbagbt" }, \ - { XFS_BTNUM_RTRMAPi, "rtrmapbt" } + { XFS_BTNUM_RTRMAPi, "rtrmapbt" }, \ + { XFS_BTNUM_RTREFCi, "rtrefcbt" } struct xfs_name { const unsigned char *name; ^ permalink raw reply related [flat|nested] 565+ messages in thread
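The comment next to XFS_RTREFC_CRC_MAGIC says the value 0x52434e54 spells 'RCNT'. That is easy to confirm by reading the 32-bit value one byte at a time, most significant byte first. The helpers below are hypothetical and exist only to show the decoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unpack a 32-bit magic, most significant byte first, into a C string. */
void
demo_magic_to_str(uint32_t magic, char out[5])
{
	out[0] = (char)((magic >> 24) & 0xff);
	out[1] = (char)((magic >> 16) & 0xff);
	out[2] = (char)((magic >> 8) & 0xff);
	out[3] = (char)(magic & 0xff);
	out[4] = '\0';
}

/* Return nonzero if the magic spells the given four-character tag. */
int
demo_magic_matches(uint32_t magic, const char *tag)
{
	char	buf[5];

	assert(tag != NULL);
	demo_magic_to_str(magic, buf);
	return strcmp(buf, tag) == 0;
}
```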
* [PATCH 06/41] xfs: prepare refcount functions to deal with rtrefcountbt 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong 2022-12-30 22:20 ` [PATCH 02/41] xfs: namespace the maximum length/refcount symbols Darrick J. Wong 2022-12-30 22:20 ` [PATCH 01/41] xfs: introduce realtime refcount btree definitions Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 07/41] xfs: add a realtime flag to the refcount update log redo items Darrick J. Wong ` (37 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Prepare the high-level refcount functions to deal with the new realtime refcountbt and its slightly different conventions. Provide the ability to talk to either refcountbt or rtrefcountbt formats from the same high level code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_refcount.c | 79 ++++++++++++++++++++++++++++++++++++++++--------- 1 file changed, 64 insertions(+), 15 deletions(-) diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c index e1d8b3c07bd..248761ca1dd 100644 --- a/libxfs/xfs_refcount.c +++ b/libxfs/xfs_refcount.c @@ -23,6 +23,7 @@ #include "xfs_rmap.h" #include "xfs_ag.h" #include "xfs_health.h" +#include "xfs_rtgroup.h" struct kmem_cache *xfs_refcount_intent_cache; @@ -39,6 +40,16 @@ STATIC int __xfs_refcount_cow_alloc(struct xfs_btree_cur *rcur, STATIC int __xfs_refcount_cow_free(struct xfs_btree_cur *rcur, xfs_agblock_t agbno, xfs_extlen_t aglen); +/* Return the maximum startblock number of the refcountbt. */ +static inline xfs_agblock_t +xrefc_max_startblock( + struct xfs_btree_cur *cur) +{ + if (cur->bc_btnum == XFS_BTNUM_RTREFC) + return cur->bc_mp->m_sb.sb_rgblocks; + return cur->bc_mp->m_sb.sb_agblocks; +} + /* * Look up the first record less than or equal to [bno, len] in the btree * given by cur. 
@@ -141,12 +152,35 @@ xfs_refcount_check_perag_irec( return NULL; } +static inline xfs_failaddr_t +xfs_refcount_check_rtgroup_irec( + struct xfs_rtgroup *rtg, + const struct xfs_refcount_irec *irec) +{ + if (irec->rc_blockcount == 0 || irec->rc_blockcount > XFS_REFC_LEN_MAX) + return __this_address; + + if (!xfs_refcount_check_domain(irec)) + return __this_address; + + /* check for valid extent range, including overflow */ + if (!xfs_verify_rgbext(rtg, irec->rc_startblock, irec->rc_blockcount)) + return __this_address; + + if (irec->rc_refcount == 0 || irec->rc_refcount > XFS_REFC_REFCOUNT_MAX) + return __this_address; + + return NULL; +} + /* Simple checks for refcount records. */ xfs_failaddr_t xfs_refcount_check_irec( struct xfs_btree_cur *cur, const struct xfs_refcount_irec *irec) { + if (cur->bc_btnum == XFS_BTNUM_RTREFC) + return xfs_refcount_check_rtgroup_irec(cur->bc_ino.rtg, irec); return xfs_refcount_check_perag_irec(cur->bc_ag.pag, irec); } @@ -158,9 +192,15 @@ xfs_refcount_complain_bad_rec( { struct xfs_mount *mp = cur->bc_mp; - xfs_warn(mp, + if (cur->bc_btnum == XFS_BTNUM_RTREFC) { + xfs_warn(mp, + "RT Refcount BTree record corruption in rtgroup %u detected at %pS!", + cur->bc_ino.rtg->rtg_rgno, fa); + } else { + xfs_warn(mp, "Refcount BTree record corruption in AG %d detected at %pS!", cur->bc_ag.pag->pag_agno, fa); + } xfs_warn(mp, "Start block 0x%x, block count 0x%x, references 0x%x", irec->rc_startblock, irec->rc_blockcount, irec->rc_refcount); @@ -1053,6 +1093,15 @@ xfs_refcount_merge_extents( return 0; } +static inline struct xbtree_refc * +xrefc_btree_state( + struct xfs_btree_cur *cur) +{ + if (cur->bc_btnum == XFS_BTNUM_RTREFC) + return &cur->bc_ino.refc; + return &cur->bc_ag.refc; +} + /* * XXX: This is a pretty hand-wavy estimate. The penalty for guessing * true incorrectly is a shutdown FS; the penalty for guessing false @@ -1070,25 +1119,25 @@ xfs_refcount_still_have_space( * to handle each of the shape changes to the refcount btree. 
*/ overhead = xfs_allocfree_block_count(cur->bc_mp, - cur->bc_ag.refc.shape_changes); - overhead += cur->bc_mp->m_refc_maxlevels; + xrefc_btree_state(cur)->shape_changes); + overhead += cur->bc_maxlevels; overhead *= cur->bc_mp->m_sb.sb_blocksize; /* * Only allow 2 refcount extent updates per transaction if the * refcount continue update "error" has been injected. */ - if (cur->bc_ag.refc.nr_ops > 2 && + if (xrefc_btree_state(cur)->nr_ops > 2 && XFS_TEST_ERROR(false, cur->bc_mp, XFS_ERRTAG_REFCOUNT_CONTINUE_UPDATE)) return false; - if (cur->bc_ag.refc.nr_ops == 0) + if (xrefc_btree_state(cur)->nr_ops == 0) return true; else if (overhead > cur->bc_tp->t_log_res) return false; return cur->bc_tp->t_log_res - overhead > - cur->bc_ag.refc.nr_ops * XFS_REFCOUNT_ITEM_OVERHEAD; + xrefc_btree_state(cur)->nr_ops * XFS_REFCOUNT_ITEM_OVERHEAD; } /* @@ -1123,7 +1172,7 @@ xfs_refcount_adjust_extents( if (error) goto out_error; if (!found_rec || ext.rc_domain != XFS_REFC_DOMAIN_SHARED) { - ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks; + ext.rc_startblock = xrefc_max_startblock(cur); ext.rc_blockcount = 0; ext.rc_refcount = 0; ext.rc_domain = XFS_REFC_DOMAIN_SHARED; @@ -1147,7 +1196,7 @@ xfs_refcount_adjust_extents( * Either cover the hole (increment) or * delete the range (decrement). 
*/ - cur->bc_ag.refc.nr_ops++; + xrefc_btree_state(cur)->nr_ops++; if (tmp.rc_refcount) { error = xfs_refcount_insert(cur, &tmp, &found_tmp); @@ -1204,7 +1253,7 @@ xfs_refcount_adjust_extents( goto skip; ext.rc_refcount += adj; trace_xfs_refcount_modify_extent(cur, &ext); - cur->bc_ag.refc.nr_ops++; + xrefc_btree_state(cur)->nr_ops++; if (ext.rc_refcount > 1) { error = xfs_refcount_update(cur, &ext); if (error) @@ -1287,7 +1336,7 @@ xfs_refcount_adjust( if (shape_changed) shape_changes++; if (shape_changes) - cur->bc_ag.refc.shape_changes++; + xrefc_btree_state(cur)->shape_changes++; /* Now that we've taken care of the ends, adjust the middle extents */ error = xfs_refcount_adjust_extents(cur, agbno, aglen, adj); @@ -1379,8 +1428,8 @@ xfs_refcount_finish_one( */ rcur = *pcur; if (rcur != NULL && rcur->bc_ag.pag != ri->ri_pag) { - nr_ops = rcur->bc_ag.refc.nr_ops; - shape_changes = rcur->bc_ag.refc.shape_changes; + nr_ops = xrefc_btree_state(rcur)->nr_ops; + shape_changes = xrefc_btree_state(rcur)->shape_changes; xfs_refcount_finish_one_cleanup(tp, rcur, 0); rcur = NULL; *pcur = NULL; @@ -1392,8 +1441,8 @@ xfs_refcount_finish_one( return error; rcur = xfs_refcountbt_init_cursor(mp, tp, agbp, ri->ri_pag); - rcur->bc_ag.refc.nr_ops = nr_ops; - rcur->bc_ag.refc.shape_changes = shape_changes; + xrefc_btree_state(rcur)->nr_ops = nr_ops; + xrefc_btree_state(rcur)->shape_changes = shape_changes; } *pcur = rcur; @@ -1688,7 +1737,7 @@ xfs_refcount_adjust_cow_extents( goto out_error; } if (!found_rec) { - ext.rc_startblock = cur->bc_mp->m_sb.sb_agblocks; + ext.rc_startblock = xrefc_max_startblock(cur); ext.rc_blockcount = 0; ext.rc_refcount = 0; ext.rc_domain = XFS_REFC_DOMAIN_COW; ^ permalink raw reply related [flat|nested] 565+ messages in thread
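A note on the pattern used in patch 06: the high-level refcount code never looks at bc_ag or bc_ino directly any more; it asks a small helper which per-cursor state (and which group size) applies. The following is only a minimal standalone sketch of that dispatch, not the libxfs code. Every demo_* name and the flattened cursor layout are assumptions made up for illustration; the real cursor keeps this state in bc_ag.refc and bc_ino.refc.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the XFS types; NOT the real definitions. */
enum demo_btnum { DEMO_BTNUM_REFC, DEMO_BTNUM_RTREFC };

/* Mirrors the shape of struct xbtree_refc from the patch. */
struct demo_refc_state {
	unsigned int	nr_ops;		/* # record updates */
	unsigned int	shape_changes;	/* # of extent splits */
};

struct demo_cursor {
	enum demo_btnum		btnum;
	uint32_t		agblocks;	/* stands in for sb_agblocks */
	uint32_t		rgblocks;	/* stands in for sb_rgblocks */
	struct demo_refc_state	ag_refc;	/* stands in for bc_ag.refc */
	struct demo_refc_state	ino_refc;	/* stands in for bc_ino.refc */
};

/* Pick the state block that belongs to this kind of cursor. */
static inline struct demo_refc_state *
demo_refc_state(struct demo_cursor *cur)
{
	if (cur->btnum == DEMO_BTNUM_RTREFC)
		return &cur->ino_refc;
	return &cur->ag_refc;
}

/* Pick the "one past the last block" value for this kind of cursor. */
static inline uint32_t
demo_max_startblock(const struct demo_cursor *cur)
{
	if (cur->btnum == DEMO_BTNUM_RTREFC)
		return cur->rgblocks;
	return cur->agblocks;
}

/* Shared caller: it does not care which btree it is updating. */
static inline void
demo_note_update(struct demo_cursor *cur, bool shape_changed)
{
	demo_refc_state(cur)->nr_ops++;
	if (shape_changed)
		demo_refc_state(cur)->shape_changes++;
}
```

The point of routing every access through one helper is that callers such as the space-estimate and adjust paths stay identical for both btrees; only the helper knows about the two layouts.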
* [PATCH 07/41] xfs: add a realtime flag to the refcount update log redo items 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 06/41] xfs: prepare refcount functions to deal with rtrefcountbt Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 05/41] xfs: add realtime refcount btree operations Darrick J. Wong ` (36 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Extend the refcount update (CUI) log items with a new realtime flag that indicates that the updates apply against the realtime refcountbt. We'll wire up the actual refcount code later. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/defer_item.c | 20 ++++++ libxfs/xfs_bmap.c | 10 ++- libxfs/xfs_defer.c | 1 libxfs/xfs_defer.h | 1 libxfs/xfs_log_format.h | 5 +- libxfs/xfs_refcount.c | 155 +++++++++++++++++++++++++++++++++++------------ libxfs/xfs_refcount.h | 18 +++-- 7 files changed, 157 insertions(+), 53 deletions(-) diff --git a/libxfs/defer_item.c b/libxfs/defer_item.c index baf3b9e6204..df66f54138f 100644 --- a/libxfs/defer_item.c +++ b/libxfs/defer_item.c @@ -365,6 +365,11 @@ xfs_refcount_update_diff_items( ra = container_of(a, struct xfs_refcount_intent, ri_list); rb = container_of(b, struct xfs_refcount_intent, ri_list); + ASSERT(ra->ri_realtime == rb->ri_realtime); + + if (ra->ri_realtime) + return ra->ri_rtg->rtg_rgno - rb->ri_rtg->rtg_rgno; + return ra->ri_pag->pag_agno - rb->ri_pag->pag_agno; } @@ -401,6 +406,15 @@ xfs_refcount_update_get_group( { xfs_agnumber_t agno; + if (ri->ri_realtime) { + xfs_rgnumber_t rgno; + + rgno = xfs_rtb_to_rgno(mp, ri->ri_startblock); + ri->ri_rtg = xfs_rtgroup_get(mp, rgno); + xfs_rtgroup_bump_intents(ri->ri_rtg); + return; + } + agno = XFS_FSB_TO_AGNO(mp, 
ri->ri_startblock); ri->ri_pag = xfs_perag_get(mp, agno); xfs_perag_bump_intents(ri->ri_pag); @@ -411,6 +425,12 @@ static inline void xfs_refcount_update_put_group( struct xfs_refcount_intent *ri) { + if (ri->ri_realtime) { + xfs_rtgroup_drop_intents(ri->ri_rtg); + xfs_rtgroup_put(ri->ri_rtg); + return; + } + xfs_perag_drop_intents(ri->ri_pag); xfs_perag_put(ri->ri_pag); } diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c index d0588d3fa70..6795f070214 100644 --- a/libxfs/xfs_bmap.c +++ b/libxfs/xfs_bmap.c @@ -4523,8 +4523,9 @@ xfs_bmapi_write( * the refcount btree for orphan recovery. */ if (whichfork == XFS_COW_FORK) - xfs_refcount_alloc_cow_extent(tp, bma.blkno, - bma.length); + xfs_refcount_alloc_cow_extent(tp, + XFS_IS_REALTIME_INODE(ip), + bma.blkno, bma.length); } /* Deal with the allocated space we found. */ @@ -4690,7 +4691,8 @@ xfs_bmapi_convert_delalloc( *seq = READ_ONCE(ifp->if_seq); if (whichfork == XFS_COW_FORK) - xfs_refcount_alloc_cow_extent(tp, bma.blkno, bma.length); + xfs_refcount_alloc_cow_extent(tp, XFS_IS_REALTIME_INODE(ip), + bma.blkno, bma.length); error = xfs_bmap_btree_to_extents(tp, ip, bma.cur, &bma.logflags, whichfork); @@ -5307,7 +5309,7 @@ xfs_bmap_del_extent_real( */ if (want_free) { if (xfs_is_reflink_inode(ip) && whichfork == XFS_DATA_FORK) { - xfs_refcount_decrease_extent(tp, del); + xfs_refcount_decrease_extent(tp, isrt, del); } else { unsigned int efi_flags = 0; diff --git a/libxfs/xfs_defer.c b/libxfs/xfs_defer.c index 9dbab9ac955..1d8bf2f6f65 100644 --- a/libxfs/xfs_defer.c +++ b/libxfs/xfs_defer.c @@ -181,6 +181,7 @@ static struct kmem_cache *xfs_defer_pending_cache; static const struct xfs_defer_op_type *defer_op_types[] = { [XFS_DEFER_OPS_TYPE_BMAP] = &xfs_bmap_update_defer_type, [XFS_DEFER_OPS_TYPE_REFCOUNT] = &xfs_refcount_update_defer_type, + [XFS_DEFER_OPS_TYPE_REFCOUNT_RT] = &xfs_refcount_update_defer_type, [XFS_DEFER_OPS_TYPE_RMAP] = &xfs_rmap_update_defer_type, [XFS_DEFER_OPS_TYPE_RMAP_RT] = 
&xfs_rmap_update_defer_type, [XFS_DEFER_OPS_TYPE_FREE] = &xfs_extent_free_defer_type, diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h index 89c279185ce..8564777c4c4 100644 --- a/libxfs/xfs_defer.h +++ b/libxfs/xfs_defer.h @@ -16,6 +16,7 @@ struct xfs_defer_capture; enum xfs_defer_ops_type { XFS_DEFER_OPS_TYPE_BMAP, XFS_DEFER_OPS_TYPE_REFCOUNT, + XFS_DEFER_OPS_TYPE_REFCOUNT_RT, XFS_DEFER_OPS_TYPE_RMAP, XFS_DEFER_OPS_TYPE_RMAP_RT, XFS_DEFER_OPS_TYPE_FREE, diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h index 3a23282d6e6..66cfcafae9b 100644 --- a/libxfs/xfs_log_format.h +++ b/libxfs/xfs_log_format.h @@ -800,7 +800,10 @@ struct xfs_phys_extent { /* Type codes are taken directly from enum xfs_refcount_intent_type. */ #define XFS_REFCOUNT_EXTENT_TYPE_MASK 0xFF -#define XFS_REFCOUNT_EXTENT_FLAGS (XFS_REFCOUNT_EXTENT_TYPE_MASK) +#define XFS_REFCOUNT_EXTENT_REALTIME (1U << 31) + +#define XFS_REFCOUNT_EXTENT_FLAGS (XFS_REFCOUNT_EXTENT_TYPE_MASK | \ + XFS_REFCOUNT_EXTENT_REALTIME) /* * This is the structure used to lay out a cui log item in the diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c index 248761ca1dd..960dbb401bd 100644 --- a/libxfs/xfs_refcount.c +++ b/libxfs/xfs_refcount.c @@ -1140,6 +1140,28 @@ xfs_refcount_still_have_space( xrefc_btree_state(cur)->nr_ops * XFS_REFCOUNT_ITEM_OVERHEAD; } +/* Schedule an extent free. */ +static void +xrefc_free_extent( + struct xfs_btree_cur *cur, + struct xfs_refcount_irec *rec) +{ + xfs_fsblock_t fsbno; + unsigned int flags = 0; + + if (cur->bc_btnum == XFS_BTNUM_RTREFC) { + flags |= XFS_FREE_EXTENT_REALTIME; + fsbno = xfs_rgbno_to_rtb(cur->bc_mp, cur->bc_ino.rtg->rtg_rgno, + rec->rc_startblock); + } else { + fsbno = XFS_AGB_TO_FSB(cur->bc_mp, cur->bc_ag.pag->pag_agno, + rec->rc_startblock); + } + + xfs_free_extent_later(cur->bc_tp, fsbno, rec->rc_blockcount, NULL, + flags); +} + /* * Adjust the refcounts of middle extents. 
At this point we should have * split extents that crossed the adjustment range; merged with adjacent @@ -1156,7 +1178,6 @@ xfs_refcount_adjust_extents( struct xfs_refcount_irec ext, tmp; int error; int found_rec, found_tmp; - xfs_fsblock_t fsbno; /* Merging did all the work already. */ if (*aglen == 0) @@ -1209,11 +1230,7 @@ xfs_refcount_adjust_extents( goto out_error; } } else { - fsbno = XFS_AGB_TO_FSB(cur->bc_mp, - cur->bc_ag.pag->pag_agno, - tmp.rc_startblock); - xfs_free_extent_later(cur->bc_tp, fsbno, - tmp.rc_blockcount, NULL, 0); + xrefc_free_extent(cur, &tmp); } (*agbno) += tmp.rc_blockcount; @@ -1269,11 +1286,7 @@ xfs_refcount_adjust_extents( } goto advloop; } else { - fsbno = XFS_AGB_TO_FSB(cur->bc_mp, - cur->bc_ag.pag->pag_agno, - ext.rc_startblock); - xfs_free_extent_later(cur->bc_tp, fsbno, - ext.rc_blockcount, NULL, 0); + xrefc_free_extent(cur, &ext); } skip: @@ -1357,19 +1370,31 @@ xfs_refcount_finish_one_cleanup( struct xfs_btree_cur *rcur, int error) { - struct xfs_buf *agbp; + struct xfs_buf *agbp = NULL; if (rcur == NULL) return; - agbp = rcur->bc_ag.agbp; + if (rcur->bc_btnum == XFS_BTNUM_REFC) + agbp = rcur->bc_ag.agbp; xfs_btree_del_cursor(rcur, error); - if (error) + if (agbp) xfs_trans_brelse(tp, agbp); } +/* Does this btree cursor match the given AG? */ +static inline bool +xfs_refcount_is_wrong_cursor( + struct xfs_btree_cur *cur, + struct xfs_refcount_intent *ri) +{ + if (cur->bc_btnum == XFS_BTNUM_RTREFC) + return cur->bc_ino.rtg != ri->ri_rtg; + return cur->bc_ag.pag != ri->ri_pag; +} + /* * Set up a continuation a deferred refcount operation by updating the intent. - * Checks to make sure we're not going to run off the end of the AG. + * Checks to make sure we're not going to run off the end of the AG or rtgroup. 
*/ static inline int xfs_refcount_continue_op( @@ -1378,19 +1403,35 @@ xfs_refcount_continue_op( xfs_agblock_t new_agbno) { struct xfs_mount *mp = cur->bc_mp; - struct xfs_perag *pag = cur->bc_ag.pag; - if (XFS_IS_CORRUPT(mp, !xfs_verify_agbext(pag, new_agbno, - ri->ri_blockcount))) { - xfs_btree_mark_sick(cur); - return -EFSCORRUPTED; + if (ri->ri_realtime) { + struct xfs_rtgroup *rtg = ri->ri_rtg; + + if (XFS_IS_CORRUPT(mp, !xfs_verify_rgbext(rtg, new_agbno, + ri->ri_blockcount))) { + xfs_btree_mark_sick(cur); + return -EFSCORRUPTED; + } + + ri->ri_startblock = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, new_agbno); + + ASSERT(xfs_verify_rtbext(mp, ri->ri_startblock, ri->ri_blockcount)); + ASSERT(rtg->rtg_rgno == xfs_rtb_to_rgno(mp, ri->ri_startblock)); + } else { + struct xfs_perag *pag = cur->bc_ag.pag; + + if (XFS_IS_CORRUPT(mp, !xfs_verify_agbext(pag, new_agbno, + ri->ri_blockcount))) { + xfs_btree_mark_sick(cur); + return -EFSCORRUPTED; + } + + ri->ri_startblock = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno); + + ASSERT(xfs_verify_fsbext(mp, ri->ri_startblock, ri->ri_blockcount)); + ASSERT(pag->pag_agno == XFS_FSB_TO_AGNO(mp, ri->ri_startblock)); } - ri->ri_startblock = XFS_AGB_TO_FSB(mp, pag->pag_agno, new_agbno); - - ASSERT(xfs_verify_fsbext(mp, ri->ri_startblock, ri->ri_blockcount)); - ASSERT(pag->pag_agno == XFS_FSB_TO_AGNO(mp, ri->ri_startblock)); - return 0; } @@ -1415,10 +1456,16 @@ xfs_refcount_finish_one( unsigned long nr_ops = 0; int shape_changes = 0; - bno = XFS_FSB_TO_AGBNO(mp, ri->ri_startblock); - trace_xfs_refcount_deferred(mp, ri); + if (ri->ri_realtime) { + xfs_rgnumber_t rgno; + + bno = xfs_rtb_to_rgbno(mp, ri->ri_startblock, &rgno); + } else { + bno = XFS_FSB_TO_AGBNO(mp, ri->ri_startblock); + } + if (XFS_TEST_ERROR(false, mp, XFS_ERRTAG_REFCOUNT_FINISH_ONE)) return -EIO; @@ -1427,7 +1474,7 @@ xfs_refcount_finish_one( * the startblock, get one now. 
*/ rcur = *pcur; - if (rcur != NULL && rcur->bc_ag.pag != ri->ri_pag) { + if (rcur != NULL && xfs_refcount_is_wrong_cursor(rcur, ri)) { nr_ops = xrefc_btree_state(rcur)->nr_ops; shape_changes = xrefc_btree_state(rcur)->shape_changes; xfs_refcount_finish_one_cleanup(tp, rcur, 0); @@ -1435,12 +1482,19 @@ xfs_refcount_finish_one( *pcur = NULL; } if (rcur == NULL) { - error = xfs_alloc_read_agf(ri->ri_pag, tp, - XFS_ALLOC_FLAG_FREEING, &agbp); - if (error) - return error; + if (ri->ri_realtime) { + /* coming in a later patch */ + ASSERT(0); + return -EFSCORRUPTED; + } else { + error = xfs_alloc_read_agf(ri->ri_pag, tp, + XFS_ALLOC_FLAG_FREEING, &agbp); + if (error) + return error; - rcur = xfs_refcountbt_init_cursor(mp, tp, agbp, ri->ri_pag); + rcur = xfs_refcountbt_init_cursor(mp, tp, agbp, + ri->ri_pag); + } xrefc_btree_state(rcur)->nr_ops = nr_ops; xrefc_btree_state(rcur)->shape_changes = shape_changes; } @@ -1491,10 +1545,12 @@ static void __xfs_refcount_add( struct xfs_trans *tp, enum xfs_refcount_intent_type type, + bool isrt, xfs_fsblock_t startblock, xfs_extlen_t blockcount) { struct xfs_refcount_intent *ri; + enum xfs_defer_ops_type optype; ri = kmem_cache_alloc(xfs_refcount_intent_cache, GFP_NOFS | __GFP_NOFAIL); @@ -1502,11 +1558,24 @@ __xfs_refcount_add( ri->ri_type = type; ri->ri_startblock = startblock; ri->ri_blockcount = blockcount; + ri->ri_realtime = isrt; trace_xfs_refcount_defer(tp->t_mountp, ri); + /* + * Deferred refcount updates for the realtime and data sections must + * use separate transactions to finish deferred work because updates to + * realtime metadata files can lock AGFs to allocate btree blocks and + * we don't want that mixing with the AGF locks taken to finish data + * section updates. 
+ */ + if (isrt) + optype = XFS_DEFER_OPS_TYPE_REFCOUNT_RT; + else + optype = XFS_DEFER_OPS_TYPE_REFCOUNT; + xfs_refcount_update_get_group(tp->t_mountp, ri); - xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_REFCOUNT, &ri->ri_list); + xfs_defer_add(tp, optype, &ri->ri_list); } /* @@ -1515,12 +1584,13 @@ __xfs_refcount_add( void xfs_refcount_increase_extent( struct xfs_trans *tp, + bool isrt, struct xfs_bmbt_irec *PREV) { if (!xfs_has_reflink(tp->t_mountp)) return; - __xfs_refcount_add(tp, XFS_REFCOUNT_INCREASE, PREV->br_startblock, + __xfs_refcount_add(tp, XFS_REFCOUNT_INCREASE, isrt, PREV->br_startblock, PREV->br_blockcount); } @@ -1530,12 +1600,13 @@ xfs_refcount_increase_extent( void xfs_refcount_decrease_extent( struct xfs_trans *tp, + bool isrt, struct xfs_bmbt_irec *PREV) { if (!xfs_has_reflink(tp->t_mountp)) return; - __xfs_refcount_add(tp, XFS_REFCOUNT_DECREASE, PREV->br_startblock, + __xfs_refcount_add(tp, XFS_REFCOUNT_DECREASE, isrt, PREV->br_startblock, PREV->br_blockcount); } @@ -1891,6 +1962,7 @@ __xfs_refcount_cow_free( void xfs_refcount_alloc_cow_extent( struct xfs_trans *tp, + bool isrt, xfs_fsblock_t fsb, xfs_extlen_t len) { @@ -1899,16 +1971,17 @@ xfs_refcount_alloc_cow_extent( if (!xfs_has_reflink(mp)) return; - __xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, fsb, len); + __xfs_refcount_add(tp, XFS_REFCOUNT_ALLOC_COW, isrt, fsb, len); /* Add rmap entry */ - xfs_rmap_alloc_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW); + xfs_rmap_alloc_extent(tp, isrt, fsb, len, XFS_RMAP_OWN_COW); } /* Forget a CoW staging event in the refcount btree. 
*/ void xfs_refcount_free_cow_extent( struct xfs_trans *tp, + bool isrt, xfs_fsblock_t fsb, xfs_extlen_t len) { @@ -1918,8 +1991,8 @@ xfs_refcount_free_cow_extent( return; /* Remove rmap entry */ - xfs_rmap_free_extent(tp, false, fsb, len, XFS_RMAP_OWN_COW); - __xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, fsb, len); + xfs_rmap_free_extent(tp, isrt, fsb, len, XFS_RMAP_OWN_COW); + __xfs_refcount_add(tp, XFS_REFCOUNT_FREE_COW, isrt, fsb, len); } struct xfs_refcount_recovery { @@ -2025,7 +2098,7 @@ xfs_refcount_recover_cow_leftovers( /* Free the orphan record */ fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno, rr->rr_rrec.rc_startblock); - xfs_refcount_free_cow_extent(tp, fsb, + xfs_refcount_free_cow_extent(tp, false, fsb, rr->rr_rrec.rc_blockcount); /* Free the block. */ diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h index 7713bb908bd..4e725d723e8 100644 --- a/libxfs/xfs_refcount.h +++ b/libxfs/xfs_refcount.h @@ -56,10 +56,14 @@ enum xfs_refcount_intent_type { struct xfs_refcount_intent { struct list_head ri_list; - struct xfs_perag *ri_pag; + union { + struct xfs_perag *ri_pag; + struct xfs_rtgroup *ri_rtg; + }; enum xfs_refcount_intent_type ri_type; xfs_extlen_t ri_blockcount; xfs_fsblock_t ri_startblock; + bool ri_realtime; }; /* Check that the refcount is appropriate for the record domain. 
*/ @@ -77,9 +81,9 @@ xfs_refcount_check_domain( void xfs_refcount_update_get_group(struct xfs_mount *mp, struct xfs_refcount_intent *ri); -void xfs_refcount_increase_extent(struct xfs_trans *tp, +void xfs_refcount_increase_extent(struct xfs_trans *tp, bool isrt, struct xfs_bmbt_irec *irec); -void xfs_refcount_decrease_extent(struct xfs_trans *tp, +void xfs_refcount_decrease_extent(struct xfs_trans *tp, bool isrt, struct xfs_bmbt_irec *irec); extern void xfs_refcount_finish_one_cleanup(struct xfs_trans *tp, @@ -91,10 +95,10 @@ extern int xfs_refcount_find_shared(struct xfs_btree_cur *cur, xfs_agblock_t agbno, xfs_extlen_t aglen, xfs_agblock_t *fbno, xfs_extlen_t *flen, bool find_end_of_shared); -void xfs_refcount_alloc_cow_extent(struct xfs_trans *tp, xfs_fsblock_t fsb, - xfs_extlen_t len); -void xfs_refcount_free_cow_extent(struct xfs_trans *tp, xfs_fsblock_t fsb, - xfs_extlen_t len); +void xfs_refcount_alloc_cow_extent(struct xfs_trans *tp, bool isrt, + xfs_fsblock_t fsb, xfs_extlen_t len); +void xfs_refcount_free_cow_extent(struct xfs_trans *tp, bool isrt, + xfs_fsblock_t fsb, xfs_extlen_t len); extern int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp, struct xfs_perag *pag); ^ permalink raw reply related [flat|nested] 565+ messages in thread
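A note on the flag layout added in patch 07: the low byte of the extent flags carries the intent type and bit 31 marks a realtime update. The sketch below shows how such a word could be packed, validated and unpacked. The two mask values are copied from the xfs_log_format.h hunk above; the demo_* helpers are hypothetical and are not part of libxfs, and the "reject unknown bits" check is an assumption about how a consumer would use the combined mask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the xfs_log_format.h hunk; names are illustrative. */
#define DEMO_EXTENT_TYPE_MASK	0xFFu
#define DEMO_EXTENT_REALTIME	(1u << 31)
#define DEMO_EXTENT_FLAGS	(DEMO_EXTENT_TYPE_MASK | DEMO_EXTENT_REALTIME)

/* Combine an intent type code and the realtime bit into one flags word. */
static inline uint32_t
demo_cui_pack(uint32_t type, bool isrt)
{
	uint32_t	flags = type & DEMO_EXTENT_TYPE_MASK;

	if (isrt)
		flags |= DEMO_EXTENT_REALTIME;
	return flags;
}

/* A flags word is acceptable only if no unknown bits are set. */
static inline bool
demo_cui_flags_valid(uint32_t flags)
{
	return (flags & ~DEMO_EXTENT_FLAGS) == 0;
}

/* Extract the intent type code. */
static inline uint32_t
demo_cui_type(uint32_t flags)
{
	return flags & DEMO_EXTENT_TYPE_MASK;
}

/* Does this update target the realtime refcount btree? */
static inline bool
demo_cui_is_realtime(uint32_t flags)
{
	return (flags & DEMO_EXTENT_REALTIME) != 0;
}
```

Keeping the realtime bit at the top of the word leaves the type codes untouched, so a flags word without the bit reads exactly as it did before the change.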
* [PATCH 05/41] xfs: add realtime refcount btree operations 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 07/41] xfs: add a realtime flag to the refcount update log redo items Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 03/41] xfs: define the on-disk realtime refcount btree format Darrick J. Wong ` (35 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Implement the generic btree operations needed to manipulate rtrefcount btree blocks. This is different from the regular refcountbt in that we allocate space from the filesystem at large, and are neither constrained to the free space nor any particular AG. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rtrefcount_btree.c | 148 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 148 insertions(+) diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c index b0b21b886ac..ad2f94e5231 100644 --- a/libxfs/xfs_rtrefcount_btree.c +++ b/libxfs/xfs_rtrefcount_btree.c @@ -19,6 +19,7 @@ #include "xfs_btree.h" #include "xfs_btree_staging.h" #include "xfs_rtrefcount_btree.h" +#include "xfs_refcount.h" #include "xfs_trace.h" #include "xfs_cksum.h" #include "xfs_rtgroup.h" @@ -51,6 +52,106 @@ xfs_rtrefcountbt_dup_cursor( return new; } +STATIC int +xfs_rtrefcountbt_get_minrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct xfs_ifork *ifp = xfs_btree_ifork_ptr(cur); + + return xfs_rtrefcountbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes, + level == 0) / 2; + } + + return cur->bc_mp->m_rtrefc_mnr[level != 0]; +} + +STATIC int +xfs_rtrefcountbt_get_maxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level == cur->bc_nlevels - 1) { + struct xfs_ifork *ifp 
= xfs_btree_ifork_ptr(cur); + + return xfs_rtrefcountbt_maxrecs(cur->bc_mp, ifp->if_broot_bytes, + level == 0); + } + + return cur->bc_mp->m_rtrefc_mxr[level != 0]; +} + +STATIC void +xfs_rtrefcountbt_init_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + key->refc.rc_startblock = rec->refc.rc_startblock; +} + +STATIC void +xfs_rtrefcountbt_init_high_key_from_rec( + union xfs_btree_key *key, + const union xfs_btree_rec *rec) +{ + __u32 x; + + x = be32_to_cpu(rec->refc.rc_startblock); + x += be32_to_cpu(rec->refc.rc_blockcount) - 1; + key->refc.rc_startblock = cpu_to_be32(x); +} + +STATIC void +xfs_rtrefcountbt_init_rec_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_rec *rec) +{ + const struct xfs_refcount_irec *irec = &cur->bc_rec.rc; + uint32_t start; + + start = xfs_refcount_encode_startblock(irec->rc_startblock, + irec->rc_domain); + rec->refc.rc_startblock = cpu_to_be32(start); + rec->refc.rc_blockcount = cpu_to_be32(cur->bc_rec.rc.rc_blockcount); + rec->refc.rc_refcount = cpu_to_be32(cur->bc_rec.rc.rc_refcount); +} + +STATIC void +xfs_rtrefcountbt_init_ptr_from_cur( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr) +{ + ptr->l = 0; +} + +STATIC int64_t +xfs_rtrefcountbt_key_diff( + struct xfs_btree_cur *cur, + const union xfs_btree_key *key) +{ + const struct xfs_refcount_key *kp = &key->refc; + const struct xfs_refcount_irec *irec = &cur->bc_rec.rc; + uint32_t start; + + start = xfs_refcount_encode_startblock(irec->rc_startblock, + irec->rc_domain); + return (int64_t)be32_to_cpu(kp->rc_startblock) - start; +} + +STATIC int64_t +xfs_rtrefcountbt_diff_two_keys( + struct xfs_btree_cur *cur, + const union xfs_btree_key *k1, + const union xfs_btree_key *k2, + const union xfs_btree_key *mask) +{ + ASSERT(!mask || mask->refc.rc_startblock); + + return (int64_t)be32_to_cpu(k1->refc.rc_startblock) - + be32_to_cpu(k2->refc.rc_startblock); +} + static xfs_failaddr_t xfs_rtrefcountbt_verify( struct xfs_buf *bp) @@ -117,6 
+218,40 @@ const struct xfs_buf_ops xfs_rtrefcountbt_buf_ops = { .verify_struct = xfs_rtrefcountbt_verify, }; +STATIC int +xfs_rtrefcountbt_keys_inorder( + struct xfs_btree_cur *cur, + const union xfs_btree_key *k1, + const union xfs_btree_key *k2) +{ + return be32_to_cpu(k1->refc.rc_startblock) < + be32_to_cpu(k2->refc.rc_startblock); +} + +STATIC int +xfs_rtrefcountbt_recs_inorder( + struct xfs_btree_cur *cur, + const union xfs_btree_rec *r1, + const union xfs_btree_rec *r2) +{ + return be32_to_cpu(r1->refc.rc_startblock) + + be32_to_cpu(r1->refc.rc_blockcount) <= + be32_to_cpu(r2->refc.rc_startblock); +} + +STATIC enum xbtree_key_contig +xfs_rtrefcountbt_keys_contiguous( + struct xfs_btree_cur *cur, + const union xfs_btree_key *key1, + const union xfs_btree_key *key2, + const union xfs_btree_key *mask) +{ + ASSERT(!mask || mask->refc.rc_startblock); + + return xbtree_key_contig(be32_to_cpu(key1->refc.rc_startblock), + be32_to_cpu(key2->refc.rc_startblock)); +} + const struct xfs_btree_ops xfs_rtrefcountbt_ops = { .rec_len = sizeof(struct xfs_refcount_rec), .key_len = sizeof(struct xfs_refcount_key), @@ -124,7 +259,20 @@ const struct xfs_btree_ops xfs_rtrefcountbt_ops = { XFS_BTREE_CRC_BLOCKS | XFS_BTREE_IROOT_RECORDS, .dup_cursor = xfs_rtrefcountbt_dup_cursor, + .alloc_block = xfs_btree_alloc_imeta_block, + .free_block = xfs_btree_free_imeta_block, + .get_minrecs = xfs_rtrefcountbt_get_minrecs, + .get_maxrecs = xfs_rtrefcountbt_get_maxrecs, + .init_key_from_rec = xfs_rtrefcountbt_init_key_from_rec, + .init_high_key_from_rec = xfs_rtrefcountbt_init_high_key_from_rec, + .init_rec_from_cur = xfs_rtrefcountbt_init_rec_from_cur, + .init_ptr_from_cur = xfs_rtrefcountbt_init_ptr_from_cur, + .key_diff = xfs_rtrefcountbt_key_diff, .buf_ops = &xfs_rtrefcountbt_buf_ops, + .diff_two_keys = xfs_rtrefcountbt_diff_two_keys, + .keys_inorder = xfs_rtrefcountbt_keys_inorder, + .recs_inorder = xfs_rtrefcountbt_recs_inorder, + .keys_contiguous = xfs_rtrefcountbt_keys_contiguous, }; 
/* Initialize a new rt refcount btree cursor. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
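A note on the key arithmetic in patch 05: the high key of a record is its last block (start + count - 1), two records are in order when the first one ends at or before the start of the second, and key differences are computed in 64 bits so that the subtraction of two 32-bit values cannot wrap. The sketch below restates those three rules on host-endian integers. It is an illustration only: the demo_* names are made up, the big-endian conversions (be32_to_cpu/cpu_to_be32) and the domain encoding of the start block are deliberately left out, and the 32-bit additions mirror the patch rather than adding overflow checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-endian stand-in for a refcount record; not the on-disk format. */
struct demo_refc_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
};

/* High key: the last block covered by the record. */
static inline uint32_t
demo_high_key(const struct demo_refc_rec *rec)
{
	return rec->startblock + rec->blockcount - 1;
}

/* Records are in order if r1 ends at or before the start of r2. */
static inline bool
demo_recs_inorder(const struct demo_refc_rec *r1,
		  const struct demo_refc_rec *r2)
{
	return r1->startblock + r1->blockcount <= r2->startblock;
}

/* Keys are in order if k1 sorts strictly before k2. */
static inline bool
demo_keys_inorder(uint32_t k1, uint32_t k2)
{
	return k1 < k2;
}

/* Signed 64-bit difference of two 32-bit keys; cannot wrap. */
static inline int64_t
demo_diff_two_keys(uint32_t k1, uint32_t k2)
{
	return (int64_t)k1 - k2;
}
```

Widening to int64_t before subtracting is what lets the result carry a sign even when the two keys are far apart in the 32-bit range.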
* [PATCH 03/41] xfs: define the on-disk realtime refcount btree format 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 05/41] xfs: add realtime refcount btree operations Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 09/41] xfs: add metadata reservations for realtime refcount btree Darrick J. Wong ` (34 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Start filling out the rtrefcount btree implementation. Start with the on-disk btree format; add everything needed to read, write and manipulate refcount btree blocks. This prepares the way for connecting the btree operations implementation. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- include/libxfs.h | 1 include/xfs_mount.h | 9 + libxfs/Makefile | 2 libxfs/init.c | 6 + libxfs/xfs_btree.c | 6 + libxfs/xfs_btree.h | 11 + libxfs/xfs_format.h | 3 libxfs/xfs_rtrefcount_btree.c | 309 +++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtrefcount_btree.h | 71 +++++++++ libxfs/xfs_sb.c | 8 + libxfs/xfs_shared.h | 2 11 files changed, 423 insertions(+), 5 deletions(-) create mode 100644 libxfs/xfs_rtrefcount_btree.c create mode 100644 libxfs/xfs_rtrefcount_btree.h diff --git a/include/libxfs.h b/include/libxfs.h index 0949bbd39a5..2e9395a35b6 100644 --- a/include/libxfs.h +++ b/include/libxfs.h @@ -80,6 +80,7 @@ struct iomap; #include "xfs_rmap_btree.h" #include "xfs_rmap.h" #include "xfs_refcount_btree.h" +#include "xfs_rtrefcount_btree.h" #include "xfs_refcount.h" #include "xfs_btree_staging.h" #include "xfs_symlink_remote.h" diff --git a/include/xfs_mount.h b/include/xfs_mount.h index ca79c420afb..c886247d785 100644 --- a/include/xfs_mount.h +++ b/include/xfs_mount.h @@ -84,11 +84,14 @@ typedef struct xfs_mount { uint 
m_rtrmap_mnr[2]; /* min rtrmap btree records */ uint m_refc_mxr[2]; /* max refc btree records */ uint m_refc_mnr[2]; /* min refc btree records */ + unsigned int m_rtrefc_mxr[2]; /* max rtrefc btree records */ + unsigned int m_rtrefc_mnr[2]; /* min rtrefc btree records */ uint m_alloc_maxlevels; /* max alloc btree levels */ uint m_bm_maxlevels[2]; /* max bmap btree levels */ uint m_rmap_maxlevels; /* max rmap btree levels */ uint m_rtrmap_maxlevels; /* max rtrmap btree level */ uint m_refc_maxlevels; /* max refc btree levels */ + unsigned int m_rtrefc_maxlevels; /* max rtrefc btree level */ unsigned int m_agbtree_maxlevels; /* max level of all AG btrees */ unsigned int m_rtbtree_maxlevels; /* max level of all rt btrees */ xfs_extlen_t m_ag_prealloc_blocks; /* reserved ag blocks */ @@ -229,6 +232,12 @@ static inline bool xfs_has_rtrmapbt(struct xfs_mount *mp) xfs_has_rmapbt(mp); } +static inline bool xfs_has_rtreflink(struct xfs_mount *mp) +{ + return xfs_has_metadir(mp) && xfs_has_realtime(mp) && + xfs_has_reflink(mp); +} + /* Kernel mount features that we don't support */ #define __XFS_UNSUPP_FEAT(name) \ static inline bool xfs_has_ ## name (struct xfs_mount *mp) \ diff --git a/libxfs/Makefile b/libxfs/Makefile index 2c636816082..0b8b2248457 100644 --- a/libxfs/Makefile +++ b/libxfs/Makefile @@ -54,6 +54,7 @@ HFILES = \ xfs_quota_defs.h \ xfs_refcount.h \ xfs_refcount_btree.h \ + xfs_rtrefcount_btree.h \ xfs_rmap.h \ xfs_rmap_btree.h \ xfs_rtbitmap.h \ @@ -110,6 +111,7 @@ CFILES = cache.c \ xfs_log_rlimit.c \ xfs_refcount.c \ xfs_refcount_btree.c \ + xfs_rtrefcount_btree.c \ xfs_rmap.c \ xfs_rmap_btree.c \ xfs_rtbitmap.c \ diff --git a/libxfs/init.c b/libxfs/init.c index aa94c87ccd4..cda8c92ab4f 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -776,7 +776,10 @@ static inline void xfs_rtbtree_compute_maxlevels( struct xfs_mount *mp) { - mp->m_rtbtree_maxlevels = mp->m_rtrmap_maxlevels; + unsigned int levels; + + levels = max(mp->m_rtrmap_maxlevels, 
mp->m_rtrefc_maxlevels); + mp->m_rtbtree_maxlevels = levels; } /* Compute maximum possible height of all btrees. */ @@ -791,6 +794,7 @@ libxfs_compute_all_maxlevels( xfs_rmapbt_compute_maxlevels(mp); xfs_rtrmapbt_compute_maxlevels(mp); xfs_refcountbt_compute_maxlevels(mp); + xfs_rtrefcountbt_compute_maxlevels(mp); xfs_agbtree_compute_maxlevels(mp); xfs_rtbtree_compute_maxlevels(mp); diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c index 49f6ce3661e..92ebc6ec42a 100644 --- a/libxfs/xfs_btree.c +++ b/libxfs/xfs_btree.c @@ -33,6 +33,7 @@ #include "xfs_bmap.h" #include "xfs_rmap.h" #include "xfs_imeta.h" +#include "xfs_rtrefcount_btree.h" /* * Btree magic numbers. @@ -1384,6 +1385,7 @@ xfs_btree_set_refs( xfs_buf_set_ref(bp, XFS_RMAP_BTREE_REF); break; case XFS_BTNUM_REFC: + case XFS_BTNUM_RTREFC: xfs_buf_set_ref(bp, XFS_REFC_BTREE_REF); break; case XFS_BTNUM_RCBAG: @@ -5544,6 +5546,9 @@ xfs_btree_init_cur_caches(void) if (error) goto err; error = xfs_rtrmapbt_init_cur_cache(); + if (error) + goto err; + error = xfs_rtrefcountbt_init_cur_cache(); if (error) goto err; @@ -5563,6 +5568,7 @@ xfs_btree_destroy_cur_caches(void) xfs_rmapbt_destroy_cur_cache(); xfs_refcountbt_destroy_cur_cache(); xfs_rtrmapbt_destroy_cur_cache(); + xfs_rtrefcountbt_destroy_cur_cache(); } /* Move the btree cursor before the first record. */ diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h index 48dbb681cf3..3a4e6285902 100644 --- a/libxfs/xfs_btree.h +++ b/libxfs/xfs_btree.h @@ -226,6 +226,11 @@ union xfs_btree_irec { struct xfs_refcount_irec rc; }; +struct xbtree_refc { + unsigned int nr_ops; /* # record updates */ + unsigned int shape_changes; /* # of extent splits */ +}; + /* Per-AG btree information. 
*/ struct xfs_btree_cur_ag { struct xfs_perag *pag; @@ -234,10 +239,7 @@ struct xfs_btree_cur_ag { struct xbtree_afakeroot *afake; /* for staging cursor */ }; union { - struct { - unsigned int nr_ops; /* # record updates */ - unsigned int shape_changes; /* # of extent splits */ - } refc; + struct xbtree_refc refc; struct { bool active; /* allocation cursor state */ } abt; @@ -258,6 +260,7 @@ struct xfs_btree_cur_ino { /* For extent swap, ignore owner check in verifier */ #define XFS_BTCUR_BMBT_INVALID_OWNER (1 << 1) + struct xbtree_refc refc; }; /* In-memory btree information */ diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index c49a946e79f..d2270f95bfb 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1803,6 +1803,9 @@ typedef __be32 xfs_refcount_ptr_t; */ #define XFS_RTREFC_CRC_MAGIC 0x52434e54 /* 'RCNT' */ +/* inode-rooted btree pointer type */ +typedef __be64 xfs_rtrefcount_ptr_t; + /* * BMAP Btree format definitions * diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c new file mode 100644 index 00000000000..b0b21b886ac --- /dev/null +++ b/libxfs/xfs_rtrefcount_btree.c @@ -0,0 +1,309 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#include "libxfs_priv.h" +#include "xfs_fs.h" +#include "xfs_shared.h" +#include "xfs_format.h" +#include "xfs_log_format.h" +#include "xfs_trans_resv.h" +#include "xfs_bit.h" +#include "xfs_sb.h" +#include "xfs_mount.h" +#include "xfs_defer.h" +#include "xfs_inode.h" +#include "xfs_trans.h" +#include "xfs_alloc.h" +#include "xfs_btree.h" +#include "xfs_btree_staging.h" +#include "xfs_rtrefcount_btree.h" +#include "xfs_trace.h" +#include "xfs_cksum.h" +#include "xfs_rtgroup.h" +#include "xfs_rtbitmap.h" + +static struct kmem_cache *xfs_rtrefcountbt_cur_cache; + +/* + * Realtime Reference Count btree. 
+ * + * This is a btree used to track the owner(s) of a given extent in the realtime + * device. See the comments in xfs_refcount_btree.c for more information. + * + * This tree is basically the same as the regular refcount btree except that + * it's rooted in an inode. + */ + +static struct xfs_btree_cur * +xfs_rtrefcountbt_dup_cursor( + struct xfs_btree_cur *cur) +{ + struct xfs_btree_cur *new; + + new = xfs_rtrefcountbt_init_cursor(cur->bc_mp, cur->bc_tp, + cur->bc_ino.rtg, cur->bc_ino.ip); + + /* Copy the flags values since init cursor doesn't get them. */ + new->bc_ino.flags = cur->bc_ino.flags; + + return new; +} + +static xfs_failaddr_t +xfs_rtrefcountbt_verify( + struct xfs_buf *bp) +{ + struct xfs_mount *mp = bp->b_target->bt_mount; + struct xfs_btree_block *block = XFS_BUF_TO_BLOCK(bp); + xfs_failaddr_t fa; + int level; + + if (!xfs_verify_magic(bp, block->bb_magic)) + return __this_address; + + if (!xfs_has_reflink(mp)) + return __this_address; + fa = xfs_btree_lblock_v5hdr_verify(bp, XFS_RMAP_OWN_UNKNOWN); + if (fa) + return fa; + level = be16_to_cpu(block->bb_level); + if (level > mp->m_rtrefc_maxlevels) + return __this_address; + + return xfs_btree_lblock_verify(bp, mp->m_rtrefc_mxr[level != 0]); +} + +static void +xfs_rtrefcountbt_read_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + if (!xfs_btree_lblock_verify_crc(bp)) + xfs_verifier_error(bp, -EFSBADCRC, __this_address); + else { + fa = xfs_rtrefcountbt_verify(bp); + if (fa) + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + } + + if (bp->b_error) + trace_xfs_btree_corrupt(bp, _RET_IP_); +} + +static void +xfs_rtrefcountbt_write_verify( + struct xfs_buf *bp) +{ + xfs_failaddr_t fa; + + fa = xfs_rtrefcountbt_verify(bp); + if (fa) { + trace_xfs_btree_corrupt(bp, _RET_IP_); + xfs_verifier_error(bp, -EFSCORRUPTED, fa); + return; + } + xfs_btree_lblock_calc_crc(bp); + +} + +const struct xfs_buf_ops xfs_rtrefcountbt_buf_ops = { + .name = "xfs_rtrefcountbt", + .magic = { 0, 
cpu_to_be32(XFS_RTREFC_CRC_MAGIC) }, + .verify_read = xfs_rtrefcountbt_read_verify, + .verify_write = xfs_rtrefcountbt_write_verify, + .verify_struct = xfs_rtrefcountbt_verify, +}; + +const struct xfs_btree_ops xfs_rtrefcountbt_ops = { + .rec_len = sizeof(struct xfs_refcount_rec), + .key_len = sizeof(struct xfs_refcount_key), + .geom_flags = XFS_BTREE_LONG_PTRS | XFS_BTREE_ROOT_IN_INODE | + XFS_BTREE_CRC_BLOCKS | XFS_BTREE_IROOT_RECORDS, + + .dup_cursor = xfs_rtrefcountbt_dup_cursor, + .buf_ops = &xfs_rtrefcountbt_buf_ops, +}; + +/* Initialize a new rt refcount btree cursor. */ +static struct xfs_btree_cur * +xfs_rtrefcountbt_init_common( + struct xfs_mount *mp, + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct xfs_btree_cur *cur; + + ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL)); + + cur = xfs_btree_alloc_cursor(mp, tp, XFS_BTNUM_RTREFC, + &xfs_rtrefcountbt_ops, mp->m_rtrefc_maxlevels, + xfs_rtrefcountbt_cur_cache); + cur->bc_statoff = XFS_STATS_CALC_INDEX(xs_refcbt_2); + + cur->bc_ino.ip = ip; + cur->bc_ino.allocated = 0; + cur->bc_ino.flags = 0; + cur->bc_ino.refc.nr_ops = 0; + cur->bc_ino.refc.shape_changes = 0; + + cur->bc_ino.rtg = xfs_rtgroup_bump(rtg); + return cur; +} + +/* Allocate a new rt refcount btree cursor. */ +struct xfs_btree_cur * +xfs_rtrefcountbt_init_cursor( + struct xfs_mount *mp, + struct xfs_trans *tp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct xfs_btree_cur *cur; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + + cur = xfs_rtrefcountbt_init_common(mp, tp, rtg, ip); + cur->bc_nlevels = be16_to_cpu(ifp->if_broot->bb_level) + 1; + cur->bc_ino.forksize = xfs_inode_fork_size(ip, XFS_DATA_FORK); + cur->bc_ino.whichfork = XFS_DATA_FORK; + return cur; +} + +/* Create a new rt refcount btree cursor with a fake root for staging. 
*/ +struct xfs_btree_cur * +xfs_rtrefcountbt_stage_cursor( + struct xfs_mount *mp, + struct xfs_rtgroup *rtg, + struct xfs_inode *ip, + struct xbtree_ifakeroot *ifake) +{ + struct xfs_btree_cur *cur; + + cur = xfs_rtrefcountbt_init_common(mp, NULL, rtg, ip); + cur->bc_nlevels = ifake->if_levels; + cur->bc_ino.forksize = ifake->if_fork_size; + cur->bc_ino.whichfork = -1; + xfs_btree_stage_ifakeroot(cur, ifake, NULL); + return cur; +} + +/* + * Install a new rt refcount btree root. Caller is responsible for + * invalidating and freeing the old btree blocks. + */ +void +xfs_rtrefcountbt_commit_staged_btree( + struct xfs_btree_cur *cur, + struct xfs_trans *tp) +{ + struct xbtree_ifakeroot *ifake = cur->bc_ino.ifake; + struct xfs_ifork *ifp; + int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; + + ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + + /* + * Free any resources hanging off the real fork, then shallow-copy the + * staging fork's contents into the real fork to transfer everything + * we just built. + */ + ifp = xfs_ifork_ptr(cur->bc_ino.ip, XFS_DATA_FORK); + xfs_idestroy_fork(ifp); + memcpy(ifp, ifake->if_fork, sizeof(struct xfs_ifork)); + + xfs_trans_log_inode(tp, cur->bc_ino.ip, flags); + xfs_btree_commit_ifakeroot(cur, tp, XFS_DATA_FORK, + &xfs_rtrefcountbt_ops); +} + +/* Calculate number of records in a realtime refcount btree block. */ +static inline unsigned int +xfs_rtrefcountbt_block_maxrecs( + unsigned int blocklen, + bool leaf) +{ + + if (leaf) + return blocklen / sizeof(struct xfs_refcount_rec); + return blocklen / (sizeof(struct xfs_refcount_key) + + sizeof(xfs_rtrefcount_ptr_t)); +} + +/* + * Calculate the number of records in a realtime refcount btree block. + */ +unsigned int +xfs_rtrefcountbt_maxrecs( + struct xfs_mount *mp, + unsigned int blocklen, + bool leaf) +{ + blocklen -= XFS_RTREFCOUNT_BLOCK_LEN; + return xfs_rtrefcountbt_block_maxrecs(blocklen, leaf); +} + +/* Compute the max possible height for realtime refcount btrees. 
*/ +unsigned int +xfs_rtrefcountbt_maxlevels_ondisk(void) +{ + unsigned int minrecs[2]; + unsigned int blocklen; + + blocklen = XFS_MIN_CRC_BLOCKSIZE - XFS_BTREE_LBLOCK_CRC_LEN; + + minrecs[0] = xfs_rtrefcountbt_block_maxrecs(blocklen, true) / 2; + minrecs[1] = xfs_rtrefcountbt_block_maxrecs(blocklen, false) / 2; + + /* We need at most one record for every block in an rt group. */ + return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_RGBLOCKS); +} + +int __init +xfs_rtrefcountbt_init_cur_cache(void) +{ + xfs_rtrefcountbt_cur_cache = kmem_cache_create("xfs_rtrefcountbt_cur", + xfs_btree_cur_sizeof( + xfs_rtrefcountbt_maxlevels_ondisk()), + 0, 0, NULL); + + if (!xfs_rtrefcountbt_cur_cache) + return -ENOMEM; + return 0; +} + +void +xfs_rtrefcountbt_destroy_cur_cache(void) +{ + kmem_cache_destroy(xfs_rtrefcountbt_cur_cache); + xfs_rtrefcountbt_cur_cache = NULL; +} + +/* Compute the maximum height of a realtime refcount btree. */ +void +xfs_rtrefcountbt_compute_maxlevels( + struct xfs_mount *mp) +{ + unsigned int d_maxlevels, r_maxlevels; + + if (!xfs_has_rtreflink(mp)) { + mp->m_rtrefc_maxlevels = 0; + return; + } + + /* + * The realtime refcountbt lives on the data device, which means that + * its maximum height is constrained by the size of the data device and + * the height required to store one refcount record for each rtextent + * in an rt group. + */ + d_maxlevels = xfs_btree_space_to_height(mp->m_rtrefc_mnr, + mp->m_sb.sb_dblocks); + r_maxlevels = xfs_btree_compute_maxlevels(mp->m_rtrefc_mnr, + xfs_rtb_to_rtxt(mp, mp->m_sb.sb_rgblocks)); + + /* Add one level to handle the inode root level. */ + mp->m_rtrefc_maxlevels = min(d_maxlevels, r_maxlevels) + 1; +} diff --git a/libxfs/xfs_rtrefcount_btree.h b/libxfs/xfs_rtrefcount_btree.h new file mode 100644 index 00000000000..d10ebdcf772 --- /dev/null +++ b/libxfs/xfs_rtrefcount_btree.h @@ -0,0 +1,71 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. 
+ * Author: Darrick J. Wong <djwong@kernel.org> + */ +#ifndef __XFS_RTREFCOUNT_BTREE_H__ +#define __XFS_RTREFCOUNT_BTREE_H__ + +struct xfs_buf; +struct xfs_btree_cur; +struct xfs_mount; +struct xbtree_ifakeroot; +struct xfs_rtgroup; + +/* refcounts only exist on crc enabled filesystems */ +#define XFS_RTREFCOUNT_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN + +struct xfs_btree_cur *xfs_rtrefcountbt_init_cursor(struct xfs_mount *mp, + struct xfs_trans *tp, struct xfs_rtgroup *rtg, + struct xfs_inode *ip); +struct xfs_btree_cur *xfs_rtrefcountbt_stage_cursor(struct xfs_mount *mp, + struct xfs_rtgroup *rtg, struct xfs_inode *ip, + struct xbtree_ifakeroot *ifake); +void xfs_rtrefcountbt_commit_staged_btree(struct xfs_btree_cur *cur, + struct xfs_trans *tp); +unsigned int xfs_rtrefcountbt_maxrecs(struct xfs_mount *mp, + unsigned int blocklen, bool leaf); +void xfs_rtrefcountbt_compute_maxlevels(struct xfs_mount *mp); + +/* + * Addresses of records, keys, and pointers within an incore rtrefcountbt block. 
+ * + * (note that some of these may appear unused, but they are used in userspace) + */ +static inline struct xfs_refcount_rec * +xfs_rtrefcount_rec_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_refcount_rec *) + ((char *)block + XFS_RTREFCOUNT_BLOCK_LEN + + (index - 1) * sizeof(struct xfs_refcount_rec)); +} + +static inline struct xfs_refcount_key * +xfs_rtrefcount_key_addr( + struct xfs_btree_block *block, + unsigned int index) +{ + return (struct xfs_refcount_key *) + ((char *)block + XFS_RTREFCOUNT_BLOCK_LEN + + (index - 1) * sizeof(struct xfs_refcount_key)); +} + +static inline xfs_rtrefcount_ptr_t * +xfs_rtrefcount_ptr_addr( + struct xfs_btree_block *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrefcount_ptr_t *) + ((char *)block + XFS_RTREFCOUNT_BLOCK_LEN + + maxrecs * sizeof(struct xfs_refcount_key) + + (index - 1) * sizeof(xfs_rtrefcount_ptr_t)); +} + +unsigned int xfs_rtrefcountbt_maxlevels_ondisk(void); +int __init xfs_rtrefcountbt_init_cur_cache(void); +void xfs_rtrefcountbt_destroy_cur_cache(void); + +#endif /* __XFS_RTREFCOUNT_BTREE_H__ */ diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c index 816a3915ae6..59aa501f117 100644 --- a/libxfs/xfs_sb.c +++ b/libxfs/xfs_sb.c @@ -26,6 +26,7 @@ #include "xfs_swapext.h" #include "xfs_rtgroup.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" /* * Physical superblock buffer manipulations. Shared with libxfs in userspace. 
@@ -1073,6 +1074,13 @@ xfs_sb_mount_common( mp->m_refc_mnr[0] = mp->m_refc_mxr[0] / 2; mp->m_refc_mnr[1] = mp->m_refc_mxr[1] / 2; + mp->m_rtrefc_mxr[0] = xfs_rtrefcountbt_maxrecs(mp, sbp->sb_blocksize, + true); + mp->m_rtrefc_mxr[1] = xfs_rtrefcountbt_maxrecs(mp, sbp->sb_blocksize, + false); + mp->m_rtrefc_mnr[0] = mp->m_rtrefc_mxr[0] / 2; + mp->m_rtrefc_mnr[1] = mp->m_rtrefc_mxr[1] / 2; + mp->m_bsize = XFS_FSB_TO_BB(mp, 1); mp->m_alloc_set_aside = xfs_alloc_set_aside(mp); mp->m_ag_max_usable = xfs_alloc_ag_max_usable(mp); diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h index 31c577a9429..a1bfc98c47a 100644 --- a/libxfs/xfs_shared.h +++ b/libxfs/xfs_shared.h @@ -42,6 +42,7 @@ extern const struct xfs_buf_ops xfs_rtbitmap_buf_ops; extern const struct xfs_buf_ops xfs_rtsummary_buf_ops; extern const struct xfs_buf_ops xfs_rtbuf_ops; extern const struct xfs_buf_ops xfs_rtsb_buf_ops; +extern const struct xfs_buf_ops xfs_rtrefcountbt_buf_ops; extern const struct xfs_buf_ops xfs_rtrmapbt_buf_ops; extern const struct xfs_buf_ops xfs_sb_buf_ops; extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops; @@ -56,6 +57,7 @@ extern const struct xfs_btree_ops xfs_bmbt_ops; extern const struct xfs_btree_ops xfs_refcountbt_ops; extern const struct xfs_btree_ops xfs_rmapbt_ops; extern const struct xfs_btree_ops xfs_rtrmapbt_ops; +extern const struct xfs_btree_ops xfs_rtrefcountbt_ops; /* log size calculation functions */ int xfs_log_calc_unit_res(struct xfs_mount *mp, int unit_bytes); ^ permalink raw reply related [flat|nested] 565+ messages in thread
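The record-capacity arithmetic in the patch above (xfs_rtrefcountbt_maxrecs and xfs_rtrefcount_ptr_addr) can be sketched in isolation. This is only an illustration, not the libxfs code: the sketch_* names are hypothetical, and the byte sizes are assumptions (a 72-byte long-format CRC block header, a 12-byte refcount record, a 4-byte key and an 8-byte pointer) standing in for XFS_BTREE_LBLOCK_CRC_LEN and the sizeof() expressions the patch uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed on-disk sizes in bytes; the real code uses sizeof() and macros. */
enum {
	SKETCH_LBLOCK_CRC_LEN	= 72,	/* assumed long-format CRC header */
	SKETCH_REFC_REC_LEN	= 12,	/* assumed struct xfs_refcount_rec */
	SKETCH_REFC_KEY_LEN	= 4,	/* assumed struct xfs_refcount_key */
	SKETCH_REFC_PTR_LEN	= 8,	/* __be64 pointer, per the patch */
};

/* Records (leaf) or key/pointer pairs (node) that fit in one block. */
static unsigned int
sketch_rtrefcountbt_maxrecs(unsigned int blocksize, bool leaf)
{
	unsigned int blocklen;

	assert(blocksize > SKETCH_LBLOCK_CRC_LEN);
	blocklen = blocksize - SKETCH_LBLOCK_CRC_LEN;

	if (leaf)
		return blocklen / SKETCH_REFC_REC_LEN;
	return blocklen / (SKETCH_REFC_KEY_LEN + SKETCH_REFC_PTR_LEN);
}

/*
 * Byte offset of the 1-based pointer slot in a node block: all keys are
 * packed first, then the pointers follow.
 */
static unsigned int
sketch_rtrefcount_ptr_offset(unsigned int index, unsigned int maxrecs)
{
	assert(index >= 1);
	return SKETCH_LBLOCK_CRC_LEN +
	       maxrecs * SKETCH_REFC_KEY_LEN +
	       (index - 1) * SKETCH_REFC_PTR_LEN;
}
```

Under these assumed sizes a 4096-byte block has 4024 usable bytes, so leaf and node blocks happen to hold the same number of entries, because a key plus a pointer is as large as one record.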
* [PATCH 09/41] xfs: add metadata reservations for realtime refcount btree 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 03/41] xfs: define the on-disk realtime refcount btree format Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 04/41] xfs: realtime refcount btree transaction reservations Darrick J. Wong ` (33 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Reserve some free blocks so that we will always have enough free blocks in the data volume to handle expansion of the realtime refcount btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rtrefcount_btree.c | 39 +++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtrefcount_btree.h | 2 ++ 2 files changed, 41 insertions(+) diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c index c69b4296d57..a5146550b20 100644 --- a/libxfs/xfs_rtrefcount_btree.c +++ b/libxfs/xfs_rtrefcount_btree.c @@ -488,3 +488,42 @@ xfs_rtrefcountbt_create_path( *pathp = path; return 0; } + +/* Calculate the rtrefcount btree size for some records. */ +static unsigned long long +xfs_rtrefcountbt_calc_size( + struct xfs_mount *mp, + unsigned long long len) +{ + return xfs_btree_calc_size(mp->m_rtrefc_mnr, len); +} + +/* + * Calculate the maximum refcount btree size. + */ +static unsigned long long +xfs_rtrefcountbt_max_size( + struct xfs_mount *mp, + xfs_rtblock_t rtblocks) +{ + /* Bail out if we're uninitialized, which can happen in mkfs. */ + if (mp->m_rtrefc_mxr[0] == 0) + return 0; + + return xfs_rtrefcountbt_calc_size(mp, rtblocks); +} + +/* + * Figure out how many blocks to reserve and how many are used by this btree. + * We need enough space to hold one record for every rt extent in the rtgroup. 
+ */ +xfs_filblks_t +xfs_rtrefcountbt_calc_reserves( + struct xfs_mount *mp) +{ + if (!xfs_has_rtreflink(mp)) + return 0; + + return xfs_rtrefcountbt_max_size(mp, + xfs_rtb_to_rtxt(mp, mp->m_sb.sb_rgblocks)); +} diff --git a/libxfs/xfs_rtrefcount_btree.h b/libxfs/xfs_rtrefcount_btree.h index 1f3f590c68e..ffda0b063bc 100644 --- a/libxfs/xfs_rtrefcount_btree.h +++ b/libxfs/xfs_rtrefcount_btree.h @@ -72,4 +72,6 @@ void xfs_rtrefcountbt_destroy_cur_cache(void); int xfs_rtrefcountbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, struct xfs_imeta_path **pathp); +xfs_filblks_t xfs_rtrefcountbt_calc_reserves(struct xfs_mount *mp); + #endif /* __XFS_RTREFCOUNT_BTREE_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
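The reservation in the patch above boils down to asking how many btree blocks are needed to hold one record per rt extent in an rtgroup when every block is only minimally full. A minimal sketch of that sizing follows, assuming xfs_btree_calc_size sums the blocks needed at each level; the sketch_* names are hypothetical and the loop is a reading of the helper's purpose, not a copy of it.

```c
#include <assert.h>

/* Ceiling division written so that it cannot overflow. */
static unsigned long long
sketch_howmany(unsigned long long n, unsigned long long d)
{
	return n / d + (n % d != 0);
}

/*
 * Worst-case block count for a btree holding @records records, where
 * limits[0] is the minimum records per leaf block and limits[1] is the
 * minimum key/pointer pairs per node block.
 */
static unsigned long long
sketch_btree_calc_size(const unsigned int limits[2],
		       unsigned long long records)
{
	unsigned long long level_blocks;
	unsigned long long blocks;

	assert(limits[0] >= 1 && limits[1] >= 2);

	level_blocks = sketch_howmany(records, limits[0]);
	blocks = level_blocks;

	/* Add one level of node blocks at a time until a single root remains. */
	while (level_blocks > 1) {
		level_blocks = sketch_howmany(level_blocks, limits[1]);
		blocks += level_blocks;
	}
	return blocks;
}
```

With a hypothetical minimum of 167 entries per block, one million records need 5989 leaf blocks, 36 node blocks and one root.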
* [PATCH 04/41] xfs: realtime refcount btree transaction reservations 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 09/41] xfs: add metadata reservations for realtime refcount btree Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 08/41] xfs: add realtime refcount btree inode to metadata directory Darrick J. Wong ` (32 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make sure that there's enough log reservation to handle mapping and unmapping realtime extents. We have to reserve enough space to handle a split in the rtrefcountbt to add the record and a second split in the regular refcountbt to record the rtrefcountbt split. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_trans_resv.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-) diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c index 42322a42058..7279b06fcb5 100644 --- a/libxfs/xfs_trans_resv.c +++ b/libxfs/xfs_trans_resv.c @@ -89,6 +89,14 @@ xfs_refcountbt_block_count( return num_ops * (2 * mp->m_refc_maxlevels - 1); } +static unsigned int +xfs_rtrefcountbt_block_count( + struct xfs_mount *mp, + unsigned int num_ops) +{ + return num_ops * (2 * mp->m_rtrefc_maxlevels - 1); +} + /* * Logging inodes is really tricksy. They are logged in memory format, * which means that what we write into the log doesn't directly translate into @@ -256,10 +264,13 @@ xfs_rtalloc_block_count( * Compute the log reservation required to handle the refcount update * transaction. Refcount updates are always done via deferred log items. 
* - * This is calculated as: + * This is calculated as the max of: * Data device refcount updates (t1): * the agfs of the ags containing the blocks: nr_ops * sector size * the refcount btrees: nr_ops * 1 trees * (2 * max depth - 1) * block size + * Realtime refcount updates (t2): + * the rt refcount inode + * the rtrefcount btrees: nr_ops * 1 trees * (2 * max depth - 1) * block size */ static unsigned int xfs_calc_refcountbt_reservation( @@ -267,12 +278,20 @@ xfs_calc_refcountbt_reservation( unsigned int nr_ops) { unsigned int blksz = XFS_FSB_TO_B(mp, 1); + unsigned int t1, t2 = 0; if (!xfs_has_reflink(mp)) return 0; - return xfs_calc_buf_res(nr_ops, mp->m_sb.sb_sectsize) + - xfs_calc_buf_res(xfs_refcountbt_block_count(mp, nr_ops), blksz); + t1 = xfs_calc_buf_res(nr_ops, mp->m_sb.sb_sectsize) + + xfs_calc_buf_res(xfs_refcountbt_block_count(mp, nr_ops), blksz); + + if (xfs_has_realtime(mp)) + t2 = xfs_calc_inode_res(mp, 1) + + xfs_calc_buf_res(xfs_rtrefcountbt_block_count(mp, nr_ops), + blksz); + + return max(t1, t2); } /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
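The max(t1, t2) computation described in the comment above can be worked through with plain integers. This sketch makes several assumptions: the per-buffer log overhead and the inode reservation are passed in as hypothetical parameters rather than derived the way xfs_calc_buf_res and xfs_calc_inode_res derive them, and every sketch_* name is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs; the real code reads these from struct xfs_mount. */
struct sketch_resv_geom {
	unsigned int	sectsize;	  /* sector size in bytes */
	unsigned int	blocksize;	  /* fs block size in bytes */
	unsigned int	buf_overhead;	  /* assumed log overhead per buffer */
	unsigned int	inode_res;	  /* assumed cost of logging one inode */
	unsigned int	refc_maxlevels;	  /* data device refcountbt height */
	unsigned int	rtrefc_maxlevels; /* rt refcountbt height */
	bool		has_realtime;
};

/* Blocks touched by @num_ops updates: num_ops * (2 * max depth - 1). */
static unsigned int
sketch_btree_block_count(unsigned int maxlevels, unsigned int num_ops)
{
	assert(maxlevels >= 1);
	return num_ops * (2 * maxlevels - 1);
}

/* Log space for @nbufs buffers of @size bytes each. */
static unsigned int
sketch_buf_res(unsigned int nbufs, unsigned int size, unsigned int overhead)
{
	return nbufs * (size + overhead);
}

/* Reserve for whichever is larger: a data device or a realtime update. */
static unsigned int
sketch_refcount_resv(const struct sketch_resv_geom *g, unsigned int nr_ops)
{
	unsigned int t1, t2 = 0;

	/* t1: one AGF sector per op plus the refcount btree blocks. */
	t1 = sketch_buf_res(nr_ops, g->sectsize, g->buf_overhead) +
	     sketch_buf_res(sketch_btree_block_count(g->refc_maxlevels, nr_ops),
			    g->blocksize, g->buf_overhead);

	/* t2: the rt refcount inode plus the rtrefcount btree blocks. */
	if (g->has_realtime)
		t2 = g->inode_res +
		     sketch_buf_res(sketch_btree_block_count(g->rtrefc_maxlevels,
							     nr_ops),
				    g->blocksize, g->buf_overhead);

	return t1 > t2 ? t1 : t2;
}
```

Taking the maximum rather than the sum fits a transaction that updates either the data device tree or the realtime tree, not both at once; that reading is an inference from the comment's "max of" wording.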
* [PATCH 08/41] xfs: add realtime refcount btree inode to metadata directory 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 04/41] xfs: realtime refcount btree transaction reservations Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 15/41] xfs: allow inodes to have the realtime and reflink flags Darrick J. Wong ` (31 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Add a metadir path to select the realtime refcount btree inode and load it at mount time. The rtrefcountbt inode will have a unique extent format code, which means that we also have to update the inode validation and flush routines to look for it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/init.c | 4 ++++ libxfs/xfs_bmap.c | 8 ++++++-- libxfs/xfs_format.h | 4 +++- libxfs/xfs_inode_buf.c | 6 ++++++ libxfs/xfs_inode_fork.c | 9 +++++++++ libxfs/xfs_rtgroup.h | 3 +++ libxfs/xfs_rtrefcount_btree.c | 33 +++++++++++++++++++++++++++++++++ libxfs/xfs_rtrefcount_btree.h | 4 ++++ 8 files changed, 68 insertions(+), 3 deletions(-) diff --git a/libxfs/init.c b/libxfs/init.c index cda8c92ab4f..40ebbbce39d 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -1027,6 +1027,10 @@ libxfs_rtmount_destroy(xfs_mount_t *mp) xfs_rgnumber_t rgno; for_each_rtgroup(mp, rgno, rtg) { + if (rtg->rtg_refcountip) + libxfs_imeta_irele(rtg->rtg_refcountip); + rtg->rtg_refcountip = NULL; + if (rtg->rtg_rmapip) libxfs_imeta_irele(rtg->rtg_rmapip); rtg->rtg_rmapip = NULL; diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c index 6795f070214..4bf5ce838a9 100644 --- a/libxfs/xfs_bmap.c +++ b/libxfs/xfs_bmap.c @@ -5142,9 +5142,13 @@ xfs_bmap_del_extent_real( * the same order of operations as the data device, which is: * Remove the file 
mapping, remove the reverse mapping, and * then free the blocks. This means that we must delay the - * freeing until after we've scheduled the rmap update. + * freeing until after we've scheduled the rmap update. If + * realtime reflink is enabled, use deferred refcount intent + * items to decide what to do with the extent, just like we do + * for the data device. */ - if (want_free && !xfs_has_rtrmapbt(mp)) { + if (want_free && !xfs_has_rtrmapbt(mp) && + !xfs_has_rtreflink(mp)) { error = xfs_rtfree_blocks(tp, del->br_startblock, del->br_blockcount); if (error) diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index d2270f95bfb..20af5b730d6 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1011,6 +1011,7 @@ enum xfs_dinode_fmt { XFS_DINODE_FMT_BTREE, /* struct xfs_bmdr_block */ XFS_DINODE_FMT_UUID, /* added long ago, but never used */ XFS_DINODE_FMT_RMAP, /* reverse mapping btree */ + XFS_DINODE_FMT_REFCOUNT, /* reference count btree */ }; #define XFS_INODE_FORMAT_STR \ @@ -1019,7 +1020,8 @@ enum xfs_dinode_fmt { { XFS_DINODE_FMT_EXTENTS, "extent" }, \ { XFS_DINODE_FMT_BTREE, "btree" }, \ { XFS_DINODE_FMT_UUID, "uuid" }, \ - { XFS_DINODE_FMT_RMAP, "rmap" } + { XFS_DINODE_FMT_RMAP, "rmap" }, \ + { XFS_DINODE_FMT_REFCOUNT, "refcount" } /* * Max values for extnum and aextnum. 
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c index 3aaa1988fb1..004dafdf1bd 100644 --- a/libxfs/xfs_inode_buf.c +++ b/libxfs/xfs_inode_buf.c @@ -411,6 +411,12 @@ xfs_dinode_verify_fork( if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))) return __this_address; break; + case XFS_DINODE_FMT_REFCOUNT: + if (!xfs_has_rtreflink(mp)) + return __this_address; + if (!(dip->di_flags2 & cpu_to_be64(XFS_DIFLAG2_METADATA))) + return __this_address; + break; default: return __this_address; } diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c index 2b2a3fcab94..f7a168e0625 100644 --- a/libxfs/xfs_inode_fork.c +++ b/libxfs/xfs_inode_fork.c @@ -264,6 +264,11 @@ xfs_iformat_data_fork( return -EFSCORRUPTED; } return xfs_iformat_rtrmap(ip, dip); + case XFS_DINODE_FMT_REFCOUNT: + if (!xfs_has_rtreflink(ip->i_mount)) + return -EFSCORRUPTED; + ASSERT(0); /* to be implemented later */ + return -EFSCORRUPTED; default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, sizeof(*dip), __this_address); @@ -650,6 +655,10 @@ xfs_iflush_fork( xfs_iflush_rtrmap(ip, dip); break; + case XFS_DINODE_FMT_REFCOUNT: + ASSERT(0); /* to be implemented later */ + break; + default: ASSERT(0); break; diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h index 4e9b9098f2f..0f400f133d8 100644 --- a/libxfs/xfs_rtgroup.h +++ b/libxfs/xfs_rtgroup.h @@ -23,6 +23,9 @@ struct xfs_rtgroup { /* reverse mapping btree inode */ struct xfs_inode *rtg_rmapip; + /* refcount btree inode */ + struct xfs_inode *rtg_refcountip; + /* Number of blocks in this group */ xfs_rgblock_t rtg_blockcount; diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c index ad2f94e5231..c69b4296d57 100644 --- a/libxfs/xfs_rtrefcount_btree.c +++ b/libxfs/xfs_rtrefcount_btree.c @@ -24,6 +24,7 @@ #include "xfs_cksum.h" #include "xfs_rtgroup.h" #include "xfs_rtbitmap.h" +#include "xfs_imeta.h" static struct kmem_cache *xfs_rtrefcountbt_cur_cache; @@ -352,6 +353,7 @@ 
xfs_rtrefcountbt_commit_staged_btree( int flags = XFS_ILOG_CORE | XFS_ILOG_DBROOT; ASSERT(cur->bc_flags & XFS_BTREE_STAGING); + ASSERT(ifake->if_fork->if_format == XFS_DINODE_FMT_REFCOUNT); /* * Free any resources hanging off the real fork, then shallow-copy the @@ -455,3 +457,34 @@ xfs_rtrefcountbt_compute_maxlevels( /* Add one level to handle the inode root level. */ mp->m_rtrefc_maxlevels = min(d_maxlevels, r_maxlevels) + 1; } + +#define XFS_RTREFC_NAMELEN 21 + +/* Create the metadata directory path for an rtrefcount btree inode. */ +int +xfs_rtrefcountbt_create_path( + struct xfs_mount *mp, + xfs_rgnumber_t rgno, + struct xfs_imeta_path **pathp) +{ + struct xfs_imeta_path *path; + char *fname; + int error; + + error = xfs_imeta_create_file_path(mp, 2, &path); + if (error) + return error; + + fname = kmalloc(XFS_RTREFC_NAMELEN, GFP_KERNEL); + if (!fname) { + xfs_imeta_free_path(path); + return -ENOMEM; + } + + snprintf(fname, XFS_RTREFC_NAMELEN, "%u.refcount", rgno); + path->im_path[0] = "realtime"; + path->im_path[1] = fname; + path->im_dynamicmask = 0x2; + *pathp = path; + return 0; +} diff --git a/libxfs/xfs_rtrefcount_btree.h b/libxfs/xfs_rtrefcount_btree.h index d10ebdcf772..1f3f590c68e 100644 --- a/libxfs/xfs_rtrefcount_btree.h +++ b/libxfs/xfs_rtrefcount_btree.h @@ -11,6 +11,7 @@ struct xfs_btree_cur; struct xfs_mount; struct xbtree_ifakeroot; struct xfs_rtgroup; +struct xfs_imeta_path; /* refcounts only exist on crc enabled filesystems */ #define XFS_RTREFCOUNT_BLOCK_LEN XFS_BTREE_LBLOCK_CRC_LEN @@ -68,4 +69,7 @@ unsigned int xfs_rtrefcountbt_maxlevels_ondisk(void); int __init xfs_rtrefcountbt_init_cur_cache(void); void xfs_rtrefcountbt_destroy_cur_cache(void); +int xfs_rtrefcountbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, + struct xfs_imeta_path **pathp); + #endif /* __XFS_RTREFCOUNT_BTREE_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
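The "%u.refcount" file name built in xfs_rtrefcountbt_create_path above fits in XFS_RTREFC_NAMELEN (21) bytes even for the largest 32-bit group number: ten digits, nine characters of suffix and a terminating NUL come to 20 bytes. A small sketch that checks this, using a hypothetical helper name and userspace snprintf rather than the kernel's:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_RTREFC_NAMELEN	21	/* mirrors XFS_RTREFC_NAMELEN */

/*
 * Format the per-rtgroup file name into @buf.  Returns the name length,
 * or -1 if the name was truncated or formatting failed.
 */
static int
sketch_rtrefcount_name(char *buf, size_t buflen, uint32_t rgno)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, buflen, "%u.refcount", (unsigned int)rgno);
	if (n < 0 || (size_t)n >= buflen)
		return -1;
	return n;
}
```

For group 7 this yields "7.refcount", which the patch then places under the "realtime" metadata directory as the second path component.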
* [PATCH 15/41] xfs: allow inodes to have the realtime and reflink flags 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 08/41] xfs: add realtime refcount btree inode to metadata directory Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 12/41] xfs: create routine to allocate and initialize a realtime refcount btree inode Darrick J. Wong ` (30 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Now that we can share blocks between realtime files, allow this combination. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_inode_buf.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c index 004dafdf1bd..b2e47c3adca 100644 --- a/libxfs/xfs_inode_buf.c +++ b/libxfs/xfs_inode_buf.c @@ -672,7 +672,8 @@ xfs_dinode_verify( return __this_address; /* don't let reflink and realtime mix */ - if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags & XFS_DIFLAG_REALTIME)) + if ((flags2 & XFS_DIFLAG2_REFLINK) && (flags & XFS_DIFLAG_REALTIME) && + !xfs_has_rtreflink(mp)) return __this_address; /* COW extent size hint validation */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
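The relaxed verifier check in the patch above reduces to a three-input predicate. The sketch below is illustrative only: the flag bit positions are assumptions standing in for XFS_DIFLAG_REALTIME and XFS_DIFLAG2_REFLINK, and the has_rtreflink argument stands in for xfs_has_rtreflink(mp).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit values; the real definitions live in xfs_format.h. */
#define SKETCH_DIFLAG_REALTIME	(1U << 0)
#define SKETCH_DIFLAG2_REFLINK	(1ULL << 1)

/*
 * Returns true if the flag combination is acceptable.  A reflinked
 * realtime inode is only valid when the filesystem has rt reflink.
 */
static bool
sketch_reflink_rt_flags_ok(uint16_t flags, uint64_t flags2, bool has_rtreflink)
{
	bool reflink = (flags2 & SKETCH_DIFLAG2_REFLINK) != 0;
	bool realtime = (flags & SKETCH_DIFLAG_REALTIME) != 0;

	return !(reflink && realtime && !has_rtreflink);
}
```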
* [PATCH 12/41] xfs: create routine to allocate and initialize a realtime refcount btree inode 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (9 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 15/41] xfs: allow inodes to have the realtime and reflink flags Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 11/41] xfs: wire up realtime refcount btree cursors Darrick J. Wong ` (29 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Create a library routine to allocate and initialize an empty realtime refcountbt inode. We'll use this for growfs, mkfs, and repair. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_rtrefcount_btree.c | 41 +++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtrefcount_btree.h | 6 ++++++ 2 files changed, 47 insertions(+) diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c index 2c9fbab5159..5e9930d315c 100644 --- a/libxfs/xfs_rtrefcount_btree.c +++ b/libxfs/xfs_rtrefcount_btree.c @@ -763,3 +763,44 @@ xfs_iflush_rtrefcount( ifp->if_broot_bytes, dfp, XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); } + +/* + * Create a realtime refcount btree inode. + * + * Regardless of the return value, the caller must clean up @ic. If a new + * inode is returned through *ipp, the caller must finish setting up the incore + * inode and release it. 
+ */ +int +xfs_rtrefcountbt_create( + struct xfs_trans **tpp, + struct xfs_imeta_path *path, + struct xfs_imeta_update *upd, + struct xfs_inode **ipp) +{ + struct xfs_mount *mp = (*tpp)->t_mountp; + struct xfs_ifork *ifp; + struct xfs_inode *ip; + int error; + + *ipp = NULL; + + error = xfs_imeta_create(tpp, path, S_IFREG, 0, &ip, upd); + if (error) + return error; + + ifp = &ip->i_df; + ifp->if_format = XFS_DINODE_FMT_REFCOUNT; + ASSERT(ifp->if_broot_bytes == 0); + ASSERT(ifp->if_bytes == 0); + + /* Initialize the empty incore btree root. */ + xfs_iroot_alloc(ip, XFS_DATA_FORK, + xfs_rtrefcount_broot_space_calc(mp, 0, 0)); + xfs_btree_init_block(ip->i_mount, ifp->if_broot, &xfs_rtrefcountbt_ops, + 0, 0, ip->i_ino); + xfs_trans_log_inode(*tpp, ip, XFS_ILOG_CORE | XFS_ILOG_DBROOT); + + *ipp = ip; + return 0; +} diff --git a/libxfs/xfs_rtrefcount_btree.h b/libxfs/xfs_rtrefcount_btree.h index d2fe2004568..86a547529c9 100644 --- a/libxfs/xfs_rtrefcount_btree.h +++ b/libxfs/xfs_rtrefcount_btree.h @@ -186,4 +186,10 @@ void xfs_rtrefcountbt_to_disk(struct xfs_mount *mp, struct xfs_rtrefcount_root *dblock, int dblocklen); void xfs_iflush_rtrefcount(struct xfs_inode *ip, struct xfs_dinode *dip); +struct xfs_imeta_update; + +int xfs_rtrefcountbt_create(struct xfs_trans **tpp, + struct xfs_imeta_path *path, struct xfs_imeta_update *ic, + struct xfs_inode **ipp); + #endif /* __XFS_RTREFCOUNT_BTREE_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 11/41] xfs: wire up realtime refcount btree cursors 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (10 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 12/41] xfs: create routine to allocate and initialize a realtime refcount btree inode Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 14/41] xfs: compute rtrmap btree max levels when reflink enabled Darrick J. Wong ` (28 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Wire up realtime refcount btree cursors wherever they're needed throughout the code base. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_refcount.c | 7 ++++--- libxfs/xfs_rtgroup.c | 10 ++++++++++ libxfs/xfs_rtgroup.h | 5 ++++- 3 files changed, 18 insertions(+), 4 deletions(-) diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c index 960dbb401bd..5bc68407215 100644 --- a/libxfs/xfs_refcount.c +++ b/libxfs/xfs_refcount.c @@ -24,6 +24,7 @@ #include "xfs_ag.h" #include "xfs_health.h" #include "xfs_rtgroup.h" +#include "xfs_rtrefcount_btree.h" struct kmem_cache *xfs_refcount_intent_cache; @@ -1483,9 +1484,9 @@ xfs_refcount_finish_one( } if (rcur == NULL) { if (ri->ri_realtime) { - /* coming in a later patch */ - ASSERT(0); - return -EFSCORRUPTED; + xfs_rtgroup_lock(tp, ri->ri_rtg, XFS_RTGLOCK_REFCOUNT); + rcur = xfs_rtrefcountbt_init_cursor(mp, tp, ri->ri_rtg, + ri->ri_rtg->rtg_refcountip); } else { error = xfs_alloc_read_agf(ri->ri_pag, tp, XFS_ALLOC_FLAG_FREEING, &agbp); diff --git a/libxfs/xfs_rtgroup.c b/libxfs/xfs_rtgroup.c index f5f981609b4..d6a52084b3c 100644 --- a/libxfs/xfs_rtgroup.c +++ b/libxfs/xfs_rtgroup.c @@ -521,6 +521,13 @@ xfs_rtgroup_lock( if (tp) xfs_trans_ijoin(tp, rtg->rtg_rmapip, XFS_ILOCK_EXCL); } + + if ((rtglock_flags & XFS_RTGLOCK_REFCOUNT) && 
rtg->rtg_refcountip) { + xfs_ilock(rtg->rtg_refcountip, XFS_ILOCK_EXCL); + if (tp) + xfs_trans_ijoin(tp, rtg->rtg_refcountip, + XFS_ILOCK_EXCL); + } } /* Unlock metadata inodes associated with this rt group. */ @@ -533,6 +540,9 @@ xfs_rtgroup_unlock( ASSERT(!(rtglock_flags & XFS_RTGLOCK_BITMAP_SHARED) || !(rtglock_flags & XFS_RTGLOCK_BITMAP)); + if ((rtglock_flags & XFS_RTGLOCK_REFCOUNT) && rtg->rtg_refcountip) + xfs_iunlock(rtg->rtg_refcountip, XFS_ILOCK_EXCL); + if ((rtglock_flags & XFS_RTGLOCK_RMAP) && rtg->rtg_rmapip) xfs_iunlock(rtg->rtg_rmapip, XFS_ILOCK_EXCL); diff --git a/libxfs/xfs_rtgroup.h b/libxfs/xfs_rtgroup.h index 0f400f133d8..4f0358d6345 100644 --- a/libxfs/xfs_rtgroup.h +++ b/libxfs/xfs_rtgroup.h @@ -237,10 +237,13 @@ int xfs_rtgroup_init_secondary_super(struct xfs_mount *mp, xfs_rgnumber_t rgno, #define XFS_RTGLOCK_BITMAP_SHARED (1U << 1) /* Lock the rt rmap inode in exclusive mode */ #define XFS_RTGLOCK_RMAP (1U << 2) +/* Lock the rt refcount inode in exclusive mode */ +#define XFS_RTGLOCK_REFCOUNT (1U << 3) #define XFS_RTGLOCK_ALL_FLAGS (XFS_RTGLOCK_BITMAP | \ XFS_RTGLOCK_BITMAP_SHARED | \ - XFS_RTGLOCK_RMAP) + XFS_RTGLOCK_RMAP | \ + XFS_RTGLOCK_REFCOUNT) void xfs_rtgroup_lock(struct xfs_trans *tp, struct xfs_rtgroup *rtg, unsigned int rtglock_flags); ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 14/41] xfs: compute rtrmap btree max levels when reflink enabled
From: Darrick J. Wong @ 2022-12-30 22:20 UTC
To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Compute the maximum possible height of the realtime rmap btree when
reflink is enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_rtrmap_btree.c |   28 ++++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/libxfs/xfs_rtrmap_btree.c b/libxfs/xfs_rtrmap_btree.c
index ea5b3db3b32..1c8fe661e69 100644
--- a/libxfs/xfs_rtrmap_btree.c
+++ b/libxfs/xfs_rtrmap_btree.c
@@ -734,6 +734,7 @@ xfs_rtrmapbt_maxrecs(
 unsigned int
 xfs_rtrmapbt_maxlevels_ondisk(void)
 {
+	unsigned long long	max_dblocks;
 	unsigned int		minrecs[2];
 	unsigned int		blocklen;
 
@@ -742,8 +743,20 @@ xfs_rtrmapbt_maxlevels_ondisk(void)
 	minrecs[0] = xfs_rtrmapbt_block_maxrecs(blocklen, true) / 2;
 	minrecs[1] = xfs_rtrmapbt_block_maxrecs(blocklen, false) / 2;
 
-	/* We need at most one record for every block in an rt group. */
-	return xfs_btree_compute_maxlevels(minrecs, XFS_MAX_RGBLOCKS);
+	/*
+	 * Compute the asymptotic maxlevels for an rtrmapbt on any rtreflink fs.
+	 *
+	 * On a reflink filesystem, each block in an rtgroup can have up to
+	 * 2^32 (per the refcount record format) owners, which means that
+	 * theoretically we could face up to 2^64 rmap records. However, we're
+	 * likely to run out of blocks in the data device long before that
+	 * happens, which means that we must compute the max height based on
+	 * what the btree will look like if it consumes almost all the blocks
+	 * in the data device due to maximal sharing factor.
+	 */
+	max_dblocks = -1U; /* max ag count */
+	max_dblocks *= XFS_MAX_CRC_AG_BLOCKS;
+	return xfs_btree_space_to_height(minrecs, max_dblocks);
 }
 
 int __init
@@ -782,9 +795,20 @@ xfs_rtrmapbt_compute_maxlevels(
 	 * maximum height is constrained by the size of the data device and
 	 * the height required to store one rmap record for each block in an
 	 * rt group.
+	 *
+	 * On a reflink filesystem, each rt block can have up to 2^32 (per the
+	 * refcount record format) owners, which means that theoretically we
+	 * could face up to 2^64 rmap records. This makes the computation of
+	 * maxlevels based on record count meaningless, so we only consider the
+	 * size of the data device.
 	 */
 	d_maxlevels = xfs_btree_space_to_height(mp->m_rtrmap_mnr,
 				mp->m_sb.sb_dblocks);
+	if (xfs_has_rtreflink(mp)) {
+		mp->m_rtrmap_maxlevels = d_maxlevels + 1;
+		return;
+	}
+
 	r_maxlevels = xfs_btree_compute_maxlevels(mp->m_rtrmap_mnr,
 				mp->m_sb.sb_rgblocks);
 

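To see why a record-count bound stops being useful once sharing is allowed, it may help to look at how a worst-case height falls out of a record count. The sketch below is a simplified stand-in, not the libxfs implementation: it assumes every block holds only its minimum number of records and counts levels until one root block remains. The demo_ names are hypothetical.

```c
/* ceil(a / b), for b > 0 */
static unsigned long long
demo_howmany(unsigned long long a, unsigned long long b)
{
	return a / b + (a % b != 0);
}

/*
 * Worst-case btree height for nrecords records when every leaf holds
 * leaf_minrecs records and every node holds node_minrecs pointers.
 * Both limits must be at least 2 or the loop cannot converge.
 */
static unsigned int
demo_btree_levels(
	unsigned int		leaf_minrecs,
	unsigned int		node_minrecs,
	unsigned long long	nrecords)
{
	unsigned long long	blocks = demo_howmany(nrecords, leaf_minrecs);
	unsigned int		height = 1;

	/* Each pass adds one level of node blocks above the current level. */
	while (blocks > 1) {
		blocks = demo_howmany(blocks, node_minrecs);
		height++;
	}
	return height;
}
```

With ten records per block, 100 records need two levels and 101 need three. The height grows only logarithmically, but a bound of 2^64 records would still describe a tree with far more blocks than the data device could ever hold, which is the reason the patch gives for bounding the height by device space instead.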
* [PATCH 17/41] xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files
From: Darrick J. Wong @ 2022-12-30 22:20 UTC
To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Currently, we (ab)use xfs_get_extsz_hint so that it always returns a
nonzero value for realtime files. This apparently was done to disable
delayed allocation for realtime files.

However, once we enable realtime reflink, we can also turn on the
alwayscow flag to force CoW writes to realtime files. In this case, the
logic will incorrectly send the write through the delalloc write path.

Fix this by adjusting the logic slightly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_bmap.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 4bf5ce838a9..16f0683caef 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -6421,9 +6421,8 @@ xfs_get_extsz_hint(
 	 * No point in aligning allocations if we need to COW to actually
 	 * write to them.
 	 */
-	if (xfs_is_always_cow_inode(ip))
-		return 0;
-	if ((ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize)
+	if (!xfs_is_always_cow_inode(ip) &&
+	    (ip->i_diflags & XFS_DIFLAG_EXTSIZE) && ip->i_extsize)
 		return ip->i_extsize;
 	if (XFS_IS_REALTIME_INODE(ip))
 		return ip->i_mount->m_sb.sb_rextsize;

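The behavioural change in this hunk is easiest to see by modelling the decision before and after. This is a sketch with a mock inode, not the libxfs code: struct demo_inode and both functions are hypothetical, the booleans stand in for the real predicates, and the trailing return 0 is assumed from the upstream function rather than shown in the hunk.

```c
#include <stdbool.h>
#include <stdint.h>

struct demo_inode {
	bool		always_cow;	/* models xfs_is_always_cow_inode() */
	bool		extsize_flag;	/* models XFS_DIFLAG_EXTSIZE being set */
	bool		realtime;	/* models XFS_IS_REALTIME_INODE() */
	uint32_t	extsize;	/* models ip->i_extsize */
	uint32_t	rextsize;	/* models sb_rextsize */
};

/* Before the patch: alwayscow short-circuits everything, even for rt files. */
static uint32_t
demo_extsz_hint_old(const struct demo_inode *ip)
{
	if (ip->always_cow)
		return 0;
	if (ip->extsize_flag && ip->extsize)
		return ip->extsize;
	if (ip->realtime)
		return ip->rextsize;
	return 0;
}

/* After the patch: alwayscow only suppresses the per-inode extsize hint. */
static uint32_t
demo_extsz_hint_new(const struct demo_inode *ip)
{
	if (!ip->always_cow && ip->extsize_flag && ip->extsize)
		return ip->extsize;
	if (ip->realtime)
		return ip->rextsize;
	return 0;
}
```

For a realtime alwayscow inode the old logic returns 0 while the new logic falls through to the rt extent size, so the hint stays nonzero and the write stays off the delalloc path, as the commit message describes. Non-realtime alwayscow inodes still get 0 from both versions.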
* [PATCH 16/41] xfs: refcover CoW leftovers in the realtime volume 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (13 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 17/41] xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 13/41] xfs: update rmap to allow cow staging extents in the rt rmap Darrick J. Wong ` (25 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Scan the realtime refcount tree at mount time to get rid of leftover CoW staging extents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_refcount.c | 63 ++++++++++++++++++++++++++++++++++++++----------- libxfs/xfs_refcount.h | 5 +++- 2 files changed, 53 insertions(+), 15 deletions(-) diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c index 5bc68407215..b96472a2fe2 100644 --- a/libxfs/xfs_refcount.c +++ b/libxfs/xfs_refcount.c @@ -2035,14 +2035,15 @@ xfs_refcount_recover_extent( } /* Find and remove leftover CoW reservations. 
*/ -int -xfs_refcount_recover_cow_leftovers( +static int +xfs_refcount_recover_group_cow_leftovers( struct xfs_mount *mp, - struct xfs_perag *pag) + struct xfs_perag *pag, + struct xfs_rtgroup *rtg) { struct xfs_trans *tp; struct xfs_btree_cur *cur; - struct xfs_buf *agbp; + struct xfs_buf *agbp = NULL; struct xfs_refcount_recovery *rr, *n; struct list_head debris; union xfs_btree_irec low; @@ -2052,7 +2053,12 @@ xfs_refcount_recover_cow_leftovers( /* reflink filesystems mustn't have AGs larger than 2^31-1 blocks */ BUILD_BUG_ON(XFS_MAX_CRC_AG_BLOCKS >= XFS_REFC_COWFLAG); - if (mp->m_sb.sb_agblocks > XFS_MAX_CRC_AG_BLOCKS) + if (pag && mp->m_sb.sb_agblocks > XFS_MAX_CRC_AG_BLOCKS) + return -EOPNOTSUPP; + + /* rtreflink filesystems can't have rtgroups larger than 2^31-1 blocks */ + BUILD_BUG_ON(XFS_MAX_RGBLOCKS >= XFS_REFC_COWFLAG); + if (rtg && mp->m_sb.sb_rgblocks >= XFS_MAX_RGBLOCKS) return -EOPNOTSUPP; INIT_LIST_HEAD(&debris); @@ -2071,10 +2077,16 @@ xfs_refcount_recover_cow_leftovers( if (error) return error; - error = xfs_alloc_read_agf(pag, tp, 0, &agbp); - if (error) - goto out_trans; - cur = xfs_refcountbt_init_cursor(mp, tp, agbp, pag); + if (rtg) { + xfs_rtgroup_lock(NULL, rtg, XFS_RTGLOCK_REFCOUNT); + cur = xfs_rtrefcountbt_init_cursor(mp, tp, rtg, + rtg->rtg_refcountip); + } else { + error = xfs_alloc_read_agf(pag, tp, 0, &agbp); + if (error) + goto out_trans; + cur = xfs_refcountbt_init_cursor(mp, tp, agbp, pag); + } /* Find all the leftover CoW staging extents. 
*/ memset(&low, 0, sizeof(low)); @@ -2084,7 +2096,10 @@ xfs_refcount_recover_cow_leftovers( error = xfs_btree_query_range(cur, &low, &high, xfs_refcount_recover_extent, &debris); xfs_btree_del_cursor(cur, error); - xfs_trans_brelse(tp, agbp); + if (agbp) + xfs_trans_brelse(tp, agbp); + else + xfs_rtgroup_unlock(rtg, XFS_RTGLOCK_REFCOUNT); xfs_trans_cancel(tp); if (error) goto out_free; @@ -2097,14 +2112,18 @@ xfs_refcount_recover_cow_leftovers( goto out_free; /* Free the orphan record */ - fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno, - rr->rr_rrec.rc_startblock); - xfs_refcount_free_cow_extent(tp, false, fsb, + if (rtg) + fsb = xfs_rgbno_to_rtb(mp, rtg->rtg_rgno, + rr->rr_rrec.rc_startblock); + else + fsb = XFS_AGB_TO_FSB(mp, pag->pag_agno, + rr->rr_rrec.rc_startblock); + xfs_refcount_free_cow_extent(tp, rtg != NULL, fsb, rr->rr_rrec.rc_blockcount); /* Free the block. */ xfs_free_extent_later(tp, fsb, rr->rr_rrec.rc_blockcount, NULL, - 0); + rtg != NULL ? XFS_FREE_EXTENT_REALTIME : 0); error = xfs_trans_commit(tp); if (error) @@ -2126,6 +2145,22 @@ xfs_refcount_recover_cow_leftovers( return error; } +int +xfs_refcount_recover_cow_leftovers( + struct xfs_mount *mp, + struct xfs_perag *pag) +{ + return xfs_refcount_recover_group_cow_leftovers(mp, pag, NULL); +} + +int +xfs_refcount_recover_rtcow_leftovers( + struct xfs_mount *mp, + struct xfs_rtgroup *rtg) +{ + return xfs_refcount_recover_group_cow_leftovers(mp, NULL, rtg); +} + /* * Scan part of the keyspace of the refcount records and tell us if the area * has no records, is fully mapped by records, or is partially filled. 
diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h index 4e725d723e8..c7907119d10 100644 --- a/libxfs/xfs_refcount.h +++ b/libxfs/xfs_refcount.h @@ -12,6 +12,7 @@ struct xfs_perag; struct xfs_btree_cur; struct xfs_bmbt_irec; struct xfs_refcount_irec; +struct xfs_rtgroup; extern int xfs_refcount_lookup_le(struct xfs_btree_cur *cur, enum xfs_refc_domain domain, xfs_agblock_t bno, int *stat); @@ -99,8 +100,10 @@ void xfs_refcount_alloc_cow_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsb, xfs_extlen_t len); void xfs_refcount_free_cow_extent(struct xfs_trans *tp, bool isrt, xfs_fsblock_t fsb, xfs_extlen_t len); -extern int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp, +int xfs_refcount_recover_cow_leftovers(struct xfs_mount *mp, struct xfs_perag *pag); +int xfs_refcount_recover_rtcow_leftovers(struct xfs_mount *mp, + struct xfs_rtgroup *rtg); /* * While we're adjusting the refcounts records of an extent, we have ^ permalink raw reply related [flat|nested] 565+ messages in thread
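The two BUILD_BUG_ON checks in this patch guard the same invariant: a group must be small enough that no valid block number collides with the CoW flag. A rough sketch of why follows, assuming the flag is the top bit of the 32-bit start block, which is what the "2^31-1 blocks" comments imply. DEMO_COWFLAG and the demo_ helpers are hypothetical and only illustrate the encoding.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: top bit of a 32-bit start block marks a CoW staging extent. */
#define DEMO_COWFLAG	(1U << 31)

/* A group is only usable if no block number can reach the flag bit. */
static bool
demo_group_size_ok(uint32_t group_blocks)
{
	return group_blocks < DEMO_COWFLAG;
}

/* Fold the "CoW staging" domain into the start block. */
static uint32_t
demo_encode_start(uint32_t bno, bool cow)
{
	return cow ? (bno | DEMO_COWFLAG) : bno;
}

/* Split an encoded start block back into block number and domain. */
static uint32_t
demo_decode_start(uint32_t encoded, bool *cow)
{
	*cow = (encoded & DEMO_COWFLAG) != 0;
	return encoded & ~DEMO_COWFLAG;
}
```

If a group were allowed 2^31 or more blocks, a plain shared-extent record for a high block number would be indistinguishable from a CoW staging record, and the mount-time scan could not tell which records are leftovers.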
* [PATCH 13/41] xfs: update rmap to allow cow staging extents in the rt rmap
From: Darrick J. Wong @ 2022-12-30 22:20 UTC
To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Don't error out on CoW staging extent records when realtime reflink is
enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_rmap.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/libxfs/xfs_rmap.c b/libxfs/xfs_rmap.c
index a8ba49a89cf..1e2798ec709 100644
--- a/libxfs/xfs_rmap.c
+++ b/libxfs/xfs_rmap.c
@@ -273,6 +273,7 @@ xfs_rmap_check_rtgroup_irec(
 	bool				is_unwritten;
 	bool				is_bmbt;
 	bool				is_attr;
+	bool				is_cow;
 
 	if (irec->rm_blockcount == 0)
 		return __this_address;
@@ -284,6 +285,12 @@ xfs_rmap_check_rtgroup_irec(
 			return __this_address;
 		if (irec->rm_offset != 0)
 			return __this_address;
+	} else if (irec->rm_owner == XFS_RMAP_OWN_COW) {
+		if (!xfs_has_rtreflink(mp))
+			return __this_address;
+		if (!xfs_verify_rgbext(rtg, irec->rm_startblock,
+					    irec->rm_blockcount))
+			return __this_address;
 	} else {
 		if (!xfs_verify_rgbext(rtg, irec->rm_startblock,
 					    irec->rm_blockcount))
@@ -300,8 +307,10 @@ xfs_rmap_check_rtgroup_irec(
 	is_bmbt = irec->rm_flags & XFS_RMAP_BMBT_BLOCK;
 	is_attr = irec->rm_flags & XFS_RMAP_ATTR_FORK;
 	is_unwritten = irec->rm_flags & XFS_RMAP_UNWRITTEN;
+	is_cow = xfs_has_rtreflink(mp) &&
+		 irec->rm_owner == XFS_RMAP_OWN_COW;
 
-	if (!is_inode && irec->rm_owner != XFS_RMAP_OWN_FS)
+	if (!is_inode && !is_cow && irec->rm_owner != XFS_RMAP_OWN_FS)
 		return __this_address;
 
 	if (!is_inode && irec->rm_offset != 0)
@@ -313,6 +322,9 @@ xfs_rmap_check_rtgroup_irec(
 	if (is_unwritten && !is_inode)
 		return __this_address;
 
+	if (is_unwritten && is_cow)
+		return __this_address;
+
 	/* Check for a valid fork offset, if applicable. */
 	if (is_inode &&
 	    !xfs_verify_fileext(mp, irec->rm_offset, irec->rm_blockcount))

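The owner and flag rules that this patch adds to the record checker can be distilled into a small predicate. This is a reduced model under stated assumptions, not the libxfs function: the enum, struct, and demo_rmap_ok() are hypothetical, the extent-range and fork-offset checks are left out, and "is_inode" is modelled as a plain owner class rather than derived from the owner number.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_owner {
	DEMO_OWN_INODE,		/* a regular file owns the extent */
	DEMO_OWN_FS,		/* static filesystem metadata */
	DEMO_OWN_COW,		/* CoW staging extent */
	DEMO_OWN_OTHER,		/* any other special owner */
};

struct demo_rmap {
	enum demo_owner	owner;
	uint64_t	offset;		/* file offset, only valid for inodes */
	uint32_t	blockcount;
	bool		unwritten;
};

/* Returns true if the record passes the owner/flag checks from the patch. */
static bool
demo_rmap_ok(const struct demo_rmap *r, bool has_rtreflink)
{
	bool	is_inode = r->owner == DEMO_OWN_INODE;
	bool	is_cow = has_rtreflink && r->owner == DEMO_OWN_COW;

	if (r->blockcount == 0)
		return false;
	/* Non-inode owners must be FS metadata or, with rtreflink, CoW. */
	if (!is_inode && !is_cow && r->owner != DEMO_OWN_FS)
		return false;
	/* Only file mappings carry a file offset. */
	if (!is_inode && r->offset != 0)
		return false;
	if (r->unwritten && !is_inode)
		return false;
	/* Redundant in this model, kept to mirror the check the patch adds. */
	if (r->unwritten && is_cow)
		return false;
	return true;
}
```

A CoW staging record is accepted only when rtreflink is enabled, and never with a file offset or the unwritten flag. Without rtreflink the same record is rejected, which matches the behaviour before the patch.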
* [PATCH 10/41] xfs: wire up a new inode fork type for the realtime refcount 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (15 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 13/41] xfs: update rmap to allow cow staging extents in the rt rmap Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 24/41] xfs_db: display the realtime refcount btree contents Darrick J. Wong ` (23 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Plumb in the pieces we need to embed the root of the realtime refcount btree in an inode's data fork, complete with new fork type and on-disk interpretation functions. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_format.h | 8 + libxfs/xfs_inode_fork.c | 8 + libxfs/xfs_rtrefcount_btree.c | 236 +++++++++++++++++++++++++++++++++++++++++ libxfs/xfs_rtrefcount_btree.h | 112 +++++++++++++++++++ 4 files changed, 361 insertions(+), 3 deletions(-) diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h index 20af5b730d6..17be73c4522 100644 --- a/libxfs/xfs_format.h +++ b/libxfs/xfs_format.h @@ -1805,6 +1805,14 @@ typedef __be32 xfs_refcount_ptr_t; */ #define XFS_RTREFC_CRC_MAGIC 0x52434e54 /* 'RCNT' */ +/* + * rt refcount root header, on-disk form only. 
+ */ +struct xfs_rtrefcount_root { + __be16 bb_level; /* 0 is a leaf */ + __be16 bb_numrecs; /* current # of data records */ +}; + /* inode-rooted btree pointer type */ typedef __be64 xfs_rtrefcount_ptr_t; diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c index f7a168e0625..b019cbeb5fd 100644 --- a/libxfs/xfs_inode_fork.c +++ b/libxfs/xfs_inode_fork.c @@ -26,6 +26,7 @@ #include "xfs_health.h" #include "xfs_symlink_remote.h" #include "xfs_rtrmap_btree.h" +#include "xfs_rtrefcount_btree.h" struct kmem_cache *xfs_ifork_cache; @@ -267,8 +268,7 @@ xfs_iformat_data_fork( case XFS_DINODE_FMT_REFCOUNT: if (!xfs_has_rtreflink(ip->i_mount)) return -EFSCORRUPTED; - ASSERT(0); /* to be implemented later */ - return -EFSCORRUPTED; + return xfs_iformat_rtrefcount(ip, dip); default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip, sizeof(*dip), __this_address); @@ -656,7 +656,9 @@ xfs_iflush_fork( break; case XFS_DINODE_FMT_REFCOUNT: - ASSERT(0); /* to be implemented later */ + ASSERT(whichfork == XFS_DATA_FORK); + if (iip->ili_fields & brootflag[whichfork]) + xfs_iflush_rtrefcount(ip, dip); break; default: diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c index a5146550b20..2c9fbab5159 100644 --- a/libxfs/xfs_rtrefcount_btree.c +++ b/libxfs/xfs_rtrefcount_btree.c @@ -83,6 +83,41 @@ xfs_rtrefcountbt_get_maxrecs( return cur->bc_mp->m_rtrefc_mxr[level != 0]; } +/* + * Calculate number of records in a realtime refcount btree inode root. + */ +unsigned int +xfs_rtrefcountbt_droot_maxrecs( + unsigned int blocklen, + bool leaf) +{ + blocklen -= sizeof(struct xfs_rtrefcount_root); + + if (leaf) + return blocklen / sizeof(struct xfs_refcount_rec); + return blocklen / (2 * sizeof(struct xfs_refcount_key) + + sizeof(xfs_rtrefcount_ptr_t)); +} + +/* + * Get the maximum records we could store in the on-disk format. 
+ * + * For non-root nodes this is equivalent to xfs_rtrefcountbt_get_maxrecs, but + * for the root node this checks the available space in the dinode fork so that + * we can resize the in-memory buffer to match it. After a resize to the + * maximum size this function returns the same value as + * xfs_rtrefcountbt_get_maxrecs for the root node, too. + */ +STATIC int +xfs_rtrefcountbt_get_dmaxrecs( + struct xfs_btree_cur *cur, + int level) +{ + if (level != cur->bc_nlevels - 1) + return cur->bc_mp->m_rtrefc_mxr[level != 0]; + return xfs_rtrefcountbt_droot_maxrecs(cur->bc_ino.forksize, level == 0); +} + STATIC void xfs_rtrefcountbt_init_key_from_rec( union xfs_btree_key *key, @@ -253,6 +288,68 @@ xfs_rtrefcountbt_keys_contiguous( be32_to_cpu(key2->refc.rc_startblock)); } +/* Move the rt refcount btree root from one incore buffer to another. */ +static void +xfs_rtrefcountbt_broot_move( + struct xfs_inode *ip, + int whichfork, + struct xfs_btree_block *dst_broot, + size_t dst_bytes, + struct xfs_btree_block *src_broot, + size_t src_bytes, + unsigned int level, + unsigned int numrecs) +{ + struct xfs_mount *mp = ip->i_mount; + void *dptr; + void *sptr; + + ASSERT(xfs_rtrefcount_droot_space(src_broot) <= + xfs_inode_fork_size(ip, whichfork)); + + /* + * We always have to move the pointers because they are not butted + * against the btree block header. + */ + if (numrecs && level > 0) { + sptr = xfs_rtrefcount_broot_ptr_addr(mp, src_broot, 1, + src_bytes); + dptr = xfs_rtrefcount_broot_ptr_addr(mp, dst_broot, 1, + dst_bytes); + memmove(dptr, sptr, numrecs * sizeof(xfs_fsblock_t)); + } + + if (src_broot == dst_broot) + return; + + /* + * If the root is being totally relocated, we have to migrate the block + * header and the keys/records that come after it. 
+ */ + memcpy(dst_broot, src_broot, XFS_RTREFCOUNT_BLOCK_LEN); + + if (!numrecs) + return; + + if (level == 0) { + sptr = xfs_rtrefcount_rec_addr(src_broot, 1); + dptr = xfs_rtrefcount_rec_addr(dst_broot, 1); + memcpy(dptr, sptr, + numrecs * sizeof(struct xfs_refcount_rec)); + } else { + sptr = xfs_rtrefcount_key_addr(src_broot, 1); + dptr = xfs_rtrefcount_key_addr(dst_broot, 1); + memcpy(dptr, sptr, + numrecs * sizeof(struct xfs_refcount_key)); + } +} + +static const struct xfs_ifork_broot_ops xfs_rtrefcountbt_iroot_ops = { + .maxrecs = xfs_rtrefcountbt_maxrecs, + .size = xfs_rtrefcount_broot_space_calc, + .move = xfs_rtrefcountbt_broot_move, +}; + const struct xfs_btree_ops xfs_rtrefcountbt_ops = { .rec_len = sizeof(struct xfs_refcount_rec), .key_len = sizeof(struct xfs_refcount_key), @@ -264,6 +361,7 @@ const struct xfs_btree_ops xfs_rtrefcountbt_ops = { .free_block = xfs_btree_free_imeta_block, .get_minrecs = xfs_rtrefcountbt_get_minrecs, .get_maxrecs = xfs_rtrefcountbt_get_maxrecs, + .get_dmaxrecs = xfs_rtrefcountbt_get_dmaxrecs, .init_key_from_rec = xfs_rtrefcountbt_init_key_from_rec, .init_high_key_from_rec = xfs_rtrefcountbt_init_high_key_from_rec, .init_rec_from_cur = xfs_rtrefcountbt_init_rec_from_cur, @@ -274,6 +372,7 @@ const struct xfs_btree_ops xfs_rtrefcountbt_ops = { .keys_inorder = xfs_rtrefcountbt_keys_inorder, .recs_inorder = xfs_rtrefcountbt_recs_inorder, .keys_contiguous = xfs_rtrefcountbt_keys_contiguous, + .iroot_ops = &xfs_rtrefcountbt_iroot_ops, }; /* Initialize a new rt refcount btree cursor. */ @@ -527,3 +626,140 @@ xfs_rtrefcountbt_calc_reserves( return xfs_rtrefcountbt_max_size(mp, xfs_rtb_to_rtxt(mp, mp->m_sb.sb_rgblocks)); } + +/* + * Convert on-disk form of btree root to in-memory form. 
+ */ +STATIC void +xfs_rtrefcountbt_from_disk( + struct xfs_inode *ip, + struct xfs_rtrefcount_root *dblock, + int dblocklen, + struct xfs_btree_block *rblock) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_refcount_key *fkp; + __be64 *fpp; + struct xfs_refcount_key *tkp; + __be64 *tpp; + struct xfs_refcount_rec *frp; + struct xfs_refcount_rec *trp; + unsigned int numrecs; + unsigned int maxrecs; + unsigned int rblocklen; + + rblocklen = xfs_rtrefcount_broot_space(mp, dblock); + + xfs_btree_init_block(mp, rblock, &xfs_rtrefcountbt_ops, 0, 0, + ip->i_ino); + + rblock->bb_level = dblock->bb_level; + rblock->bb_numrecs = dblock->bb_numrecs; + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrefcountbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrefcount_droot_key_addr(dblock, 1); + tkp = xfs_rtrefcount_key_addr(rblock, 1); + fpp = xfs_rtrefcount_droot_ptr_addr(dblock, 1, maxrecs); + tpp = xfs_rtrefcount_broot_ptr_addr(mp, rblock, 1, rblocklen); + numrecs = be16_to_cpu(dblock->bb_numrecs); + memcpy(tkp, fkp, 2 * sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrefcount_droot_rec_addr(dblock, 1); + trp = xfs_rtrefcount_rec_addr(rblock, 1); + numrecs = be16_to_cpu(dblock->bb_numrecs); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* Load a realtime reference count btree root in from disk. 
*/ +int +xfs_iformat_rtrefcount( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_mount *mp = ip->i_mount; + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_rtrefcount_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + unsigned int numrecs; + unsigned int level; + int dsize; + + dsize = XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK); + numrecs = be16_to_cpu(dfp->bb_numrecs); + level = be16_to_cpu(dfp->bb_level); + + if (level > mp->m_rtrefc_maxlevels || + xfs_rtrefcount_droot_space_calc(level, numrecs) > dsize) + return -EFSCORRUPTED; + + xfs_iroot_alloc(ip, XFS_DATA_FORK, + xfs_rtrefcount_broot_space_calc(mp, level, numrecs)); + xfs_rtrefcountbt_from_disk(ip, dfp, dsize, ifp->if_broot); + return 0; +} + +/* + * Convert in-memory form of btree root to on-disk form. + */ +void +xfs_rtrefcountbt_to_disk( + struct xfs_mount *mp, + struct xfs_btree_block *rblock, + int rblocklen, + struct xfs_rtrefcount_root *dblock, + int dblocklen) +{ + struct xfs_refcount_key *fkp; + __be64 *fpp; + struct xfs_refcount_key *tkp; + __be64 *tpp; + struct xfs_refcount_rec *frp; + struct xfs_refcount_rec *trp; + unsigned int maxrecs; + unsigned int numrecs; + + ASSERT(rblock->bb_magic == cpu_to_be32(XFS_RTREFC_CRC_MAGIC)); + ASSERT(uuid_equal(&rblock->bb_u.l.bb_uuid, &mp->m_sb.sb_meta_uuid)); + ASSERT(rblock->bb_u.l.bb_blkno == cpu_to_be64(XFS_BUF_DADDR_NULL)); + ASSERT(rblock->bb_u.l.bb_leftsib == cpu_to_be64(NULLFSBLOCK)); + ASSERT(rblock->bb_u.l.bb_rightsib == cpu_to_be64(NULLFSBLOCK)); + + dblock->bb_level = rblock->bb_level; + dblock->bb_numrecs = rblock->bb_numrecs; + + if (be16_to_cpu(rblock->bb_level) > 0) { + maxrecs = xfs_rtrefcountbt_droot_maxrecs(dblocklen, false); + fkp = xfs_rtrefcount_key_addr(rblock, 1); + tkp = xfs_rtrefcount_droot_key_addr(dblock, 1); + fpp = xfs_rtrefcount_broot_ptr_addr(mp, rblock, 1, rblocklen); + tpp = xfs_rtrefcount_droot_ptr_addr(dblock, 1, maxrecs); + numrecs = be16_to_cpu(rblock->bb_numrecs); + memcpy(tkp, fkp, 2 * 
sizeof(*fkp) * numrecs); + memcpy(tpp, fpp, sizeof(*fpp) * numrecs); + } else { + frp = xfs_rtrefcount_rec_addr(rblock, 1); + trp = xfs_rtrefcount_droot_rec_addr(dblock, 1); + numrecs = be16_to_cpu(rblock->bb_numrecs); + memcpy(trp, frp, sizeof(*frp) * numrecs); + } +} + +/* Flush a realtime reference count btree root out to disk. */ +void +xfs_iflush_rtrefcount( + struct xfs_inode *ip, + struct xfs_dinode *dip) +{ + struct xfs_ifork *ifp = xfs_ifork_ptr(ip, XFS_DATA_FORK); + struct xfs_rtrefcount_root *dfp = XFS_DFORK_PTR(dip, XFS_DATA_FORK); + + ASSERT(ifp->if_broot != NULL); + ASSERT(ifp->if_broot_bytes > 0); + ASSERT(xfs_rtrefcount_droot_space(ifp->if_broot) <= + xfs_inode_fork_size(ip, XFS_DATA_FORK)); + xfs_rtrefcountbt_to_disk(ip->i_mount, ifp->if_broot, + ifp->if_broot_bytes, dfp, + XFS_DFORK_SIZE(dip, ip->i_mount, XFS_DATA_FORK)); +} diff --git a/libxfs/xfs_rtrefcount_btree.h b/libxfs/xfs_rtrefcount_btree.h index ffda0b063bc..d2fe2004568 100644 --- a/libxfs/xfs_rtrefcount_btree.h +++ b/libxfs/xfs_rtrefcount_btree.h @@ -27,6 +27,7 @@ void xfs_rtrefcountbt_commit_staged_btree(struct xfs_btree_cur *cur, unsigned int xfs_rtrefcountbt_maxrecs(struct xfs_mount *mp, unsigned int blocklen, bool leaf); void xfs_rtrefcountbt_compute_maxlevels(struct xfs_mount *mp); +unsigned int xfs_rtrefcountbt_droot_maxrecs(unsigned int blocklen, bool leaf); /* * Addresses of records, keys, and pointers within an incore rtrefcountbt block. @@ -74,4 +75,115 @@ int xfs_rtrefcountbt_create_path(struct xfs_mount *mp, xfs_rgnumber_t rgno, xfs_filblks_t xfs_rtrefcountbt_calc_reserves(struct xfs_mount *mp); +/* Addresses of key, pointers, and records within an ondisk rtrefcount block. 
*/ + +static inline struct xfs_refcount_rec * +xfs_rtrefcount_droot_rec_addr( + struct xfs_rtrefcount_root *block, + unsigned int index) +{ + return (struct xfs_refcount_rec *) + ((char *)(block + 1) + + (index - 1) * sizeof(struct xfs_refcount_rec)); +} + +static inline struct xfs_refcount_key * +xfs_rtrefcount_droot_key_addr( + struct xfs_rtrefcount_root *block, + unsigned int index) +{ + return (struct xfs_refcount_key *) + ((char *)(block + 1) + + (index - 1) * sizeof(struct xfs_refcount_key)); +} + +static inline xfs_rtrefcount_ptr_t * +xfs_rtrefcount_droot_ptr_addr( + struct xfs_rtrefcount_root *block, + unsigned int index, + unsigned int maxrecs) +{ + return (xfs_rtrefcount_ptr_t *) + ((char *)(block + 1) + + maxrecs * sizeof(struct xfs_refcount_key) + + (index - 1) * sizeof(xfs_rtrefcount_ptr_t)); +} + +/* + * Address of pointers within the incore btree root. + * + * These are to be used when we know the size of the block and + * we don't have a cursor. + */ +static inline xfs_rtrefcount_ptr_t * +xfs_rtrefcount_broot_ptr_addr( + struct xfs_mount *mp, + struct xfs_btree_block *bb, + unsigned int index, + unsigned int block_size) +{ + return xfs_rtrefcount_ptr_addr(bb, index, + xfs_rtrefcountbt_maxrecs(mp, block_size, false)); +} + +/* + * Compute the space required for the incore btree root containing the given + * number of records. + */ +static inline size_t +xfs_rtrefcount_broot_space_calc( + struct xfs_mount *mp, + unsigned int level, + unsigned int nrecs) +{ + size_t sz = XFS_RTREFCOUNT_BLOCK_LEN; + + if (level > 0) + return sz + nrecs * (sizeof(struct xfs_refcount_key) + + sizeof(xfs_rtrefcount_ptr_t)); + return sz + nrecs * sizeof(struct xfs_refcount_rec); +} + +/* + * Compute the space required for the incore btree root given the ondisk + * btree root block. 
+ */ +static inline size_t +xfs_rtrefcount_broot_space(struct xfs_mount *mp, struct xfs_rtrefcount_root *bb) +{ + return xfs_rtrefcount_broot_space_calc(mp, be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +/* Compute the space required for the ondisk root block. */ +static inline size_t +xfs_rtrefcount_droot_space_calc( + unsigned int level, + unsigned int nrecs) +{ + size_t sz = sizeof(struct xfs_rtrefcount_root); + + if (level > 0) + return sz + nrecs * (sizeof(struct xfs_refcount_key) + + sizeof(xfs_rtrefcount_ptr_t)); + return sz + nrecs * sizeof(struct xfs_refcount_rec); +} + +/* + * Compute the space required for the ondisk root block given an incore root + * block. + */ +static inline size_t +xfs_rtrefcount_droot_space(struct xfs_btree_block *bb) +{ + return xfs_rtrefcount_droot_space_calc(be16_to_cpu(bb->bb_level), + be16_to_cpu(bb->bb_numrecs)); +} + +int xfs_iformat_rtrefcount(struct xfs_inode *ip, struct xfs_dinode *dip); +void xfs_rtrefcountbt_to_disk(struct xfs_mount *mp, + struct xfs_btree_block *rblock, int rblocklen, + struct xfs_rtrefcount_root *dblock, int dblocklen); +void xfs_iflush_rtrefcount(struct xfs_inode *ip, struct xfs_dinode *dip); + #endif /* __XFS_RTREFCOUNT_BTREE_H__ */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
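The sanity check in xfs_iformat_rtrefcount boils down to "does a root with this level and record count fit in the data fork?". Below is a stand-alone sketch of that arithmetic, mirroring xfs_rtrefcount_droot_space_calc from the patch. The demo_ types are hypothetical host-endian stand-ins for the big-endian on-disk structures (a 4-byte root header, 12-byte records, 4-byte keys, 8-byte pointers), so it shows sizes only and says nothing about byte order.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-endian mirrors of the on-disk layouts. */
struct demo_root_hdr {
	uint16_t	level;		/* 0 is a leaf */
	uint16_t	numrecs;	/* current number of records */
};

struct demo_refcount_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint32_t	refcount;
};

struct demo_refcount_key {
	uint32_t	startblock;
};

typedef uint64_t demo_ptr_t;

/* Bytes needed for an on-disk root holding nrecs entries at this level. */
static size_t
demo_droot_space(unsigned int level, unsigned int nrecs)
{
	size_t	sz = sizeof(struct demo_root_hdr);

	if (level > 0)
		return sz + nrecs * (sizeof(struct demo_refcount_key) +
				     sizeof(demo_ptr_t));
	return sz + nrecs * sizeof(struct demo_refcount_rec);
}

/* Reject roots that are too tall or too big for the inode fork. */
static bool
demo_droot_fits(
	unsigned int	level,
	unsigned int	nrecs,
	unsigned int	maxlevels,
	size_t		fork_size)
{
	if (level > maxlevels)
		return false;
	return demo_droot_space(level, nrecs) <= fork_size;
}
```

A leaf root with ten records needs 4 + 10 * 12 = 124 bytes. A node root with ten entries also needs 124 bytes in this model, since a key plus a pointer comes to the same 12 bytes as one record. Anything that would overflow the fork is treated as corruption instead of being loaded.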
* [PATCH 24/41] xfs_db: display the realtime refcount btree contents 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (16 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 10/41] xfs: wire up a new inode fork type for the realtime refcount Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 23/41] libfrog: enable scrubbing of the realtime refcount data Darrick J. Wong ` (22 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Implement all the code we need to dump rtrefcountbt contents, starting from the root inode. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/bmroot.c | 148 ++++++++++++++++++++++++++++++++++++++++++++++ db/bmroot.h | 2 + db/btblock.c | 50 ++++++++++++++++ db/btblock.h | 5 ++ db/field.c | 13 ++++ db/field.h | 5 ++ db/inode.c | 62 +++++++++++++++++++ db/inode.h | 1 db/type.c | 5 ++ db/type.h | 1 libxfs/libxfs_api_defs.h | 5 ++ man/man8/xfs_db.8 | 48 +++++++++++++++ 12 files changed, 343 insertions(+), 2 deletions(-) diff --git a/db/bmroot.c b/db/bmroot.c index 19490bd2499..cb334aa4583 100644 --- a/db/bmroot.c +++ b/db/bmroot.c @@ -31,6 +31,13 @@ static int rtrmaproot_key_offset(void *obj, int startoff, int idx); static int rtrmaproot_ptr_count(void *obj, int startoff); static int rtrmaproot_ptr_offset(void *obj, int startoff, int idx); +static int rtrefcroot_rec_count(void *obj, int startoff); +static int rtrefcroot_rec_offset(void *obj, int startoff, int idx); +static int rtrefcroot_key_count(void *obj, int startoff); +static int rtrefcroot_key_offset(void *obj, int startoff, int idx); +static int rtrefcroot_ptr_count(void *obj, int startoff); +static int rtrefcroot_ptr_offset(void *obj, int startoff, int idx); + #define OFF(f) bitize(offsetof(xfs_bmdr_block_t, bb_ ## f)) const field_t bmroota_flds[] = { { 
"level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE }, @@ -73,6 +80,19 @@ const field_t rtrmaproot_flds[] = { FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_RTRMAPBT }, { NULL } }; + +/* realtime refcount btree root */ +const field_t rtrefcroot_flds[] = { + { "level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE }, + { "numrecs", FLDT_UINT16D, OI(OFF(numrecs)), C1, 0, TYP_NONE }, + { "recs", FLDT_RTREFCBTREC, rtrefcroot_rec_offset, rtrefcroot_rec_count, + FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE }, + { "keys", FLDT_RTREFCBTKEY, rtrefcroot_key_offset, rtrefcroot_key_count, + FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_NONE }, + { "ptrs", FLDT_RTREFCBTPTR, rtrefcroot_ptr_offset, rtrefcroot_ptr_count, + FLD_ARRAY|FLD_ABASE1|FLD_COUNT|FLD_OFFSET, TYP_RTREFCBT }, + { NULL } +}; #undef OFF static int @@ -390,3 +410,131 @@ rtrmaproot_size( dip = obj; return bitize((int)XFS_DFORK_DSIZE(dip, mp)); } + +/* realtime refcount root */ +static int +rtrefcroot_rec_count( + void *obj, + int startoff) +{ + struct xfs_rtrefcount_root *block; +#ifdef DEBUG + struct xfs_dinode *dip = obj; +#endif + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + block = (struct xfs_rtrefcount_root *)((char *)obj + byteize(startoff)); + ASSERT((char *)block == XFS_DFORK_DPTR(dip)); + if (be16_to_cpu(block->bb_level) > 0) + return 0; + return be16_to_cpu(block->bb_numrecs); +} + +static int +rtrefcroot_rec_offset( + void *obj, + int startoff, + int idx) +{ + struct xfs_rtrefcount_root *block; + struct xfs_refcount_rec *kp; + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + block = (struct xfs_rtrefcount_root *)((char *)obj + byteize(startoff)); + ASSERT(be16_to_cpu(block->bb_level) == 0); + kp = xfs_rtrefcount_droot_rec_addr(block, idx); + return bitize((int)((char *)kp - (char *)block)); +} + +static int +rtrefcroot_key_count( + void *obj, + int startoff) +{ + struct xfs_rtrefcount_root *block; +#ifdef DEBUG + struct xfs_dinode *dip = obj; 
+#endif + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + block = (struct xfs_rtrefcount_root *)((char *)obj + byteize(startoff)); + ASSERT((char *)block == XFS_DFORK_DPTR(dip)); + if (be16_to_cpu(block->bb_level) == 0) + return 0; + return be16_to_cpu(block->bb_numrecs); +} + +static int +rtrefcroot_key_offset( + void *obj, + int startoff, + int idx) +{ + struct xfs_rtrefcount_root *block; + struct xfs_refcount_key *kp; + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + block = (struct xfs_rtrefcount_root *)((char *)obj + byteize(startoff)); + ASSERT(be16_to_cpu(block->bb_level) > 0); + kp = xfs_rtrefcount_droot_key_addr(block, idx); + return bitize((int)((char *)kp - (char *)block)); +} + +static int +rtrefcroot_ptr_count( + void *obj, + int startoff) +{ + struct xfs_rtrefcount_root *block; +#ifdef DEBUG + struct xfs_dinode *dip = obj; +#endif + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + block = (struct xfs_rtrefcount_root *)((char *)obj + byteize(startoff)); + ASSERT((char *)block == XFS_DFORK_DPTR(dip)); + if (be16_to_cpu(block->bb_level) == 0) + return 0; + return be16_to_cpu(block->bb_numrecs); +} + +static int +rtrefcroot_ptr_offset( + void *obj, + int startoff, + int idx) +{ + struct xfs_rtrefcount_root *block; + xfs_rtrefcount_ptr_t *pp; + struct xfs_dinode *dip; + int dmxr; + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + dip = obj; + block = (struct xfs_rtrefcount_root *)((char *)obj + byteize(startoff)); + ASSERT(be16_to_cpu(block->bb_level) > 0); + dmxr = libxfs_rtrefcountbt_droot_maxrecs(XFS_DFORK_DSIZE(dip, mp), false); + pp = xfs_rtrefcount_droot_ptr_addr(block, idx, dmxr); + return bitize((int)((char *)pp - (char *)block)); +} + +int +rtrefcroot_size( + void *obj, + int startoff, + int idx) +{ + struct xfs_dinode *dip; + + ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + ASSERT(idx == 0); + dip = obj; + return 
bitize((int)XFS_DFORK_DSIZE(dip, mp)); +} diff --git a/db/bmroot.h b/db/bmroot.h index a2c5cfb18f0..70bc8483cd8 100644 --- a/db/bmroot.h +++ b/db/bmroot.h @@ -9,7 +9,9 @@ extern const struct field bmroota_key_flds[]; extern const struct field bmrootd_flds[]; extern const struct field bmrootd_key_flds[]; extern const struct field rtrmaproot_flds[]; +extern const struct field rtrefcroot_flds[]; extern int bmroota_size(void *obj, int startoff, int idx); extern int bmrootd_size(void *obj, int startoff, int idx); extern int rtrmaproot_size(void *obj, int startoff, int idx); +extern int rtrefcroot_size(void *obj, int startoff, int idx); diff --git a/db/btblock.c b/db/btblock.c index 70f6c3f6aed..0a581593a59 100644 --- a/db/btblock.c +++ b/db/btblock.c @@ -104,6 +104,12 @@ static struct xfs_db_btree { sizeof(struct xfs_refcount_rec), sizeof(__be32), }, + { XFS_RTREFC_CRC_MAGIC, + XFS_BTREE_LBLOCK_CRC_LEN, + sizeof(struct xfs_refcount_key), + sizeof(struct xfs_refcount_rec), + sizeof(__be64), + }, { 0, }, }; @@ -962,3 +968,47 @@ const field_t refcbt_rec_flds[] = { { NULL } }; #undef ROFF + +/* realtime refcount btree blocks */ +const field_t rtrefcbt_crc_hfld[] = { + { "", FLDT_RTREFCBT_CRC, OI(0), C1, 0, TYP_NONE }, + { NULL } +}; + +#define OFF(f) bitize(offsetof(struct xfs_btree_block, bb_ ## f)) +const field_t rtrefcbt_crc_flds[] = { + { "magic", FLDT_UINT32X, OI(OFF(magic)), C1, 0, TYP_NONE }, + { "level", FLDT_UINT16D, OI(OFF(level)), C1, 0, TYP_NONE }, + { "numrecs", FLDT_UINT16D, OI(OFF(numrecs)), C1, 0, TYP_NONE }, + { "leftsib", FLDT_DFSBNO, OI(OFF(u.l.bb_leftsib)), C1, 0, TYP_RTREFCBT }, + { "rightsib", FLDT_DFSBNO, OI(OFF(u.l.bb_rightsib)), C1, 0, TYP_RTREFCBT }, + { "bno", FLDT_DFSBNO, OI(OFF(u.l.bb_blkno)), C1, 0, TYP_REFCBT }, + { "lsn", FLDT_UINT64X, OI(OFF(u.l.bb_lsn)), C1, 0, TYP_NONE }, + { "uuid", FLDT_UUID, OI(OFF(u.l.bb_uuid)), C1, 0, TYP_NONE }, + { "owner", FLDT_INO, OI(OFF(u.l.bb_owner)), C1, 0, TYP_NONE }, + { "crc", FLDT_CRC, OI(OFF(u.l.bb_crc)), 
C1, 0, TYP_NONE }, + { "recs", FLDT_RTREFCBTREC, btblock_rec_offset, btblock_rec_count, + FLD_ARRAY | FLD_ABASE1 | FLD_COUNT | FLD_OFFSET, TYP_NONE }, + { "keys", FLDT_RTREFCBTKEY, btblock_key_offset, btblock_key_count, + FLD_ARRAY | FLD_ABASE1 | FLD_COUNT | FLD_OFFSET, TYP_NONE }, + { "ptrs", FLDT_RTREFCBTPTR, btblock_ptr_offset, btblock_key_count, + FLD_ARRAY | FLD_ABASE1 | FLD_COUNT | FLD_OFFSET, TYP_RTREFCBT }, + { NULL } +}; +#undef OFF + +const field_t rtrefcbt_key_flds[] = { + { "startblock", FLDT_RGBLOCK, OI(REFCNTBT_STARTBLOCK_BITOFF), C1, 0, TYP_DATA }, + { "cowflag", FLDT_CCOWFLG, OI(REFCNTBT_COWFLAG_BITOFF), C1, 0, TYP_DATA }, + { NULL } +}; + +#define ROFF(f) bitize(offsetof(struct xfs_refcount_rec, rc_ ## f)) +const field_t rtrefcbt_rec_flds[] = { + { "startblock", FLDT_RGBLOCK, OI(REFCNTBT_STARTBLOCK_BITOFF), C1, 0, TYP_DATA }, + { "blockcount", FLDT_EXTLEN, OI(ROFF(blockcount)), C1, 0, TYP_NONE }, + { "refcount", FLDT_UINT32D, OI(ROFF(refcount)), C1, 0, TYP_DATA }, + { "cowflag", FLDT_CCOWFLG, OI(REFCNTBT_COWFLAG_BITOFF), C1, 0, TYP_DATA }, + { NULL } +}; +#undef ROFF diff --git a/db/btblock.h b/db/btblock.h index b4013ea8073..5bbe857a7ef 100644 --- a/db/btblock.h +++ b/db/btblock.h @@ -63,4 +63,9 @@ extern const struct field refcbt_crc_hfld[]; extern const struct field refcbt_key_flds[]; extern const struct field refcbt_rec_flds[]; +extern const struct field rtrefcbt_crc_flds[]; +extern const struct field rtrefcbt_crc_hfld[]; +extern const struct field rtrefcbt_key_flds[]; +extern const struct field rtrefcbt_rec_flds[]; + extern int btblock_size(void *obj, int startoff, int idx); diff --git a/db/field.c b/db/field.c index b3efbb5698d..5e26f19e9d4 100644 --- a/db/field.c +++ b/db/field.c @@ -204,6 +204,19 @@ const ftattr_t ftattrtab[] = { { FLDT_REFCBTREC, "refcntbtrec", fp_sarray, (char *)refcbt_rec_flds, SI(bitsz(struct xfs_refcount_rec)), 0, NULL, refcbt_rec_flds }, + { FLDT_RTREFCBT_CRC, "rtrefcntbt", NULL, (char *)rtrefcbt_crc_flds, + 
btblock_size, FTARG_SIZE, NULL, rtrefcbt_crc_flds }, + { FLDT_RTREFCBTKEY, "rtrefcntbtkey", fp_sarray, + (char *)rtrefcbt_key_flds, SI(bitsz(struct xfs_refcount_key)), 0, + NULL, rtrefcbt_key_flds }, + { FLDT_RTREFCBTPTR, "rtrefcntbtptr", fp_num, "%u", + SI(bitsz(xfs_rtrefcount_ptr_t)), 0, fa_dfsbno, NULL }, + { FLDT_RTREFCBTREC, "rtrefcntbtrec", fp_sarray, + (char *)rtrefcbt_rec_flds, SI(bitsz(struct xfs_refcount_rec)), 0, + NULL, rtrefcbt_rec_flds }, + { FLDT_RTREFCROOT, "rtrefcroot", NULL, (char *)rtrefcroot_flds, + rtrefcroot_size, FTARG_SIZE, NULL, rtrefcroot_flds }, + /* CRC field */ { FLDT_CRC, "crc", fp_crc, "%#x (%s)", SI(bitsz(uint32_t)), 0, NULL, NULL }, diff --git a/db/field.h b/db/field.h index db3e13d3927..06fa5272b26 100644 --- a/db/field.h +++ b/db/field.h @@ -92,6 +92,11 @@ typedef enum fldt { FLDT_REFCBTKEY, FLDT_REFCBTPTR, FLDT_REFCBTREC, + FLDT_RTREFCBT_CRC, + FLDT_RTREFCBTKEY, + FLDT_RTREFCBTPTR, + FLDT_RTREFCBTREC, + FLDT_RTREFCROOT, /* CRC field type */ FLDT_CRC, diff --git a/db/inode.c b/db/inode.c index 2d28eae4dad..af56e615e08 100644 --- a/db/inode.c +++ b/db/inode.c @@ -49,6 +49,7 @@ static int inode_u_sfdir2_count(void *obj, int startoff); static int inode_u_sfdir3_count(void *obj, int startoff); static int inode_u_symlink_count(void *obj, int startoff); static int inode_u_rtrmapbt_count(void *obj, int startoff); +static int inode_u_rtrefcbt_count(void *obj, int startoff); static const cmdinfo_t inode_cmd = { "inode", NULL, inode_f, 0, 1, 1, "[inode#]", @@ -234,6 +235,8 @@ const field_t inode_u_flds[] = { TYP_NONE }, { "rtrmapbt", FLDT_RTRMAPROOT, NULL, inode_u_rtrmapbt_count, FLD_COUNT, TYP_NONE }, + { "rtrefcbt", FLDT_RTREFCROOT, NULL, inode_u_rtrefcbt_count, FLD_COUNT, + TYP_NONE }, { NULL } }; @@ -247,7 +250,7 @@ const field_t inode_a_flds[] = { }; static const char *dinode_fmt_name[] = - { "dev", "local", "extents", "btree", "uuid", "rmap" }; + { "dev", "local", "extents", "btree", "uuid", "rmap", "refcount" }; static const int 
dinode_fmt_name_size = sizeof(dinode_fmt_name) / sizeof(dinode_fmt_name[0]); @@ -643,6 +646,7 @@ struct rtgroup_inodes { static struct rtgroup_inodes *rtgroup_inodes; static struct bitmap *rmap_inodes; +static struct bitmap *refcount_inodes; static inline int set_rtgroup_rmap_inode( @@ -672,6 +676,33 @@ set_rtgroup_rmap_inode( return bitmap_set(rmap_inodes, rtino, 1); } +static inline int +set_rtgroup_refcount_inode( + struct xfs_mount *mp, + xfs_rgnumber_t rgno) +{ + struct xfs_imeta_path *path; + xfs_ino_t rtino; + int error; + + if (!xfs_has_rtreflink(mp)) + return 0; + + error = -libxfs_rtrefcountbt_create_path(mp, rgno, &path); + if (error) + return error; + + error = -libxfs_imeta_lookup(mp, path, &rtino); + libxfs_imeta_free_path(path); + if (error) + return error; + + if (rtino == NULLFSINO) + return EFSCORRUPTED; + + return bitmap_set(refcount_inodes, rtino, 1); +} + int init_rtmeta_inode_bitmaps( struct xfs_mount *mp) @@ -691,10 +722,17 @@ init_rtmeta_inode_bitmaps( if (error) return error; + error = bitmap_alloc(&refcount_inodes); + if (error) + return error; + for (rgno = 0; rgno < mp->m_sb.sb_rgcount; rgno++) { int err2 = set_rtgroup_rmap_inode(mp, rgno); if (err2 && !error) error = err2; + err2 = set_rtgroup_refcount_inode(mp, rgno); + if (err2 && !error) + error = err2; } return error; @@ -717,6 +755,11 @@ xfs_rgnumber_t rtgroup_for_rtrmap_ino(struct xfs_mount *mp, xfs_ino_t ino) return NULLRGNUMBER; } +bool is_rtrefcount_inode(xfs_ino_t ino) +{ + return bitmap_test(refcount_inodes, ino, 1); +} + typnm_t inode_next_type(void) { @@ -749,6 +792,9 @@ inode_next_type(void) return TYP_DQBLK; else if (is_rtrmap_inode(iocur_top->ino)) return TYP_RTRMAPBT; + else if (is_rtrefcount_inode(iocur_top->ino)) + return TYP_RTREFCBT; + return TYP_DATA; default: return TYP_NONE; @@ -897,6 +943,20 @@ inode_u_rtrmapbt_count( return dip->di_format == XFS_DINODE_FMT_RMAP; } +static int +inode_u_rtrefcbt_count( + void *obj, + int startoff) +{ + struct xfs_dinode *dip; + + 
ASSERT(bitoffs(startoff) == 0); + ASSERT(obj == iocur_top->data); + dip = obj; + ASSERT((char *)XFS_DFORK_DPTR(dip) - (char *)dip == byteize(startoff)); + return dip->di_format == XFS_DINODE_FMT_REFCOUNT; +} + int inode_u_size( void *obj, diff --git a/db/inode.h b/db/inode.h index 04e606abed3..666bb5201ea 100644 --- a/db/inode.h +++ b/db/inode.h @@ -27,3 +27,4 @@ extern void set_cur_inode(xfs_ino_t ino); int init_rtmeta_inode_bitmaps(struct xfs_mount *mp); bool is_rtrmap_inode(xfs_ino_t ino); xfs_rgnumber_t rtgroup_for_rtrmap_ino(struct xfs_mount *mp, xfs_ino_t ino); +bool is_rtrefcount_inode(xfs_ino_t ino); diff --git a/db/type.c b/db/type.c index 1dfc33ffb44..324f416a49c 100644 --- a/db/type.c +++ b/db/type.c @@ -53,6 +53,7 @@ static const typ_t __typtab[] = { { TYP_RMAPBT, NULL }, { TYP_RTRMAPBT, NULL }, { TYP_REFCBT, NULL }, + { TYP_RTREFCBT, NULL }, { TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF }, { TYP_DIR2, "dir2", handle_struct, dir2_hfld, NULL, TYP_F_NO_CRC_OFF }, { TYP_DQBLK, "dqblk", handle_struct, dqblk_hfld, NULL, TYP_F_NO_CRC_OFF }, @@ -96,6 +97,8 @@ static const typ_t __typtab_crc[] = { &xfs_rtrmapbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF }, { TYP_REFCBT, "refcntbt", handle_struct, refcbt_crc_hfld, &xfs_refcountbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF }, + { TYP_RTREFCBT, "rtrefcntbt", handle_struct, rtrefcbt_crc_hfld, + &xfs_rtrefcountbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF }, { TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF }, { TYP_DIR2, "dir3", handle_struct, dir3_hfld, &xfs_dir3_db_buf_ops, TYP_F_CRC_FUNC, xfs_dir3_set_crc }, @@ -148,6 +151,8 @@ static const typ_t __typtab_spcrc[] = { &xfs_rtrmapbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF }, { TYP_REFCBT, "refcntbt", handle_struct, refcbt_crc_hfld, &xfs_refcountbt_buf_ops, XFS_BTREE_SBLOCK_CRC_OFF }, + { TYP_RTREFCBT, "rtrefcntbt", handle_struct, rtrefcbt_crc_hfld, + &xfs_rtrefcountbt_buf_ops, XFS_BTREE_LBLOCK_CRC_OFF }, { TYP_DATA, "data", handle_block, NULL, NULL, TYP_F_NO_CRC_OFF 
}, { TYP_DIR2, "dir3", handle_struct, dir3_hfld, &xfs_dir3_db_buf_ops, TYP_F_CRC_FUNC, xfs_dir3_set_crc }, diff --git a/db/type.h b/db/type.h index c98f3640202..a2488a663db 100644 --- a/db/type.h +++ b/db/type.h @@ -22,6 +22,7 @@ typedef enum typnm TYP_RMAPBT, TYP_RTRMAPBT, TYP_REFCBT, + TYP_RTREFCBT, TYP_DATA, TYP_DIR2, TYP_DQBLK, diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 2e7529cec54..0ac00fca337 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -249,6 +249,11 @@ #define xfs_rtgroup_put libxfs_rtgroup_put #define xfs_rtgroup_update_secondary_sbs libxfs_rtgroup_update_secondary_sbs #define xfs_rtgroup_update_super libxfs_rtgroup_update_super + +#define xfs_rtrefcountbt_create_path libxfs_rtrefcountbt_create_path +#define xfs_rtrefcountbt_droot_maxrecs libxfs_rtrefcountbt_droot_maxrecs +#define xfs_rtrefcountbt_maxrecs libxfs_rtrefcountbt_maxrecs + #define xfs_rtrmapbt_calc_reserves libxfs_rtrmapbt_calc_reserves #define xfs_rtrmapbt_commit_staged_btree libxfs_rtrmapbt_commit_staged_btree #define xfs_rtrmapbt_create libxfs_rtrmapbt_create diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8 index 2efa45297db..a277ea5e668 100644 --- a/man/man8/xfs_db.8 +++ b/man/man8/xfs_db.8 @@ -1102,7 +1102,7 @@ The possible data types are: .BR agf ", " agfl ", " agi ", " attr ", " bmapbta ", " bmapbtd , .BR bnobt ", " cntbt ", " data ", " dir ", " dir2 ", " dqblk , .BR inobt ", " inode ", " log ", " refcntbt ", " rmapbt ", " rtbitmap , -.BR rtsummary ", " sb ", " symlink ", " rtrmapbt ", and " text . +.BR rtsummary ", " sb ", " symlink ", " rtrmapbt ", " rtrefcbt ", and " text . See the TYPES section below for more information on these data types. .TP .BI "timelimit [" OPTIONS ] @@ -2210,6 +2210,52 @@ block number within the allocation group to the next level in the Btree. .PD .RE .TP +.B rtrefcbt +There is one reference count Btree for the entire realtime device. 
The +.BR startblock " and " +.B blockcount +fields are 32 bits wide and record block counts within a realtime group. +The root of this Btree is the realtime refcount inode, which is recorded in the +metadata directory. +Blocks are linked to sibling left and right blocks at each level, as well as by +pointers from parent to child blocks. +Each block has the following fields: +.RS 1.4i +.PD 0 +.TP 1.2i +.B magic +RTREFC block magic number, 0x52434e54 ('RCNT'). +.TP +.B level +level number of this block, 0 is a leaf. +.TP +.B numrecs +number of data entries in the block. +.TP +.B leftsib +left (logically lower) sibling block, 0 if none. +.TP +.B rightsib +right (logically higher) sibling block, 0 if none. +.TP +.B recs +[leaf blocks only] array of reference count records. Each record contains +.BR startblock , +.BR blockcount , +and +.BR refcount . +.TP +.B keys +[non-leaf blocks only] array of key records. These are the first value +of each block in the level below this one. Each record contains +.BR startblock . +.TP +.B ptrs +[non-leaf blocks only] array of child block pointers. Each pointer is a +block number within the allocation group to the next level in the Btree. +.PD +.RE +.TP .B rmapbt There is one set of filesystem blocks forming the reverse mapping Btree for each allocation group. The root block of this Btree is designated by the ^ permalink raw reply related [flat|nested] 565+ messages in thread
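For readers following the xfs_db count helpers in the patch above: the inode-rooted block exposes records only when it is a leaf, and keys plus pointers only when it is a node. Below is a minimal sketch of that rule. The struct and the sketch_* names are hypothetical stand-ins, not the real struct xfs_rtrefcount_root or any libxfs API.

```c
#include <stdint.h>

/*
 * Hypothetical, simplified on-disk root header: two big-endian 16-bit
 * fields, stored as raw bytes so the sketch does not depend on host
 * endianness.  This is not the real struct xfs_rtrefcount_root.
 */
struct sketch_root {
	uint8_t	level[2];
	uint8_t	numrecs[2];
};

/* Decode a big-endian 16-bit value. */
static unsigned int sketch_be16(const uint8_t b[2])
{
	return ((unsigned int)b[0] << 8) | b[1];
}

/* Records exist only in a leaf root (level 0). */
static unsigned int sketch_rec_count(const struct sketch_root *r)
{
	return sketch_be16(r->level) > 0 ? 0 : sketch_be16(r->numrecs);
}

/* Keys (and the matching pointers) exist only in a node root (level > 0). */
static unsigned int sketch_key_count(const struct sketch_root *r)
{
	return sketch_be16(r->level) == 0 ? 0 : sketch_be16(r->numrecs);
}
```

The pointer count follows the key count, which is presumably why the patch's ptr and key helpers have the same body.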
* [PATCH 23/41] libfrog: enable scrubbing of the realtime refcount data
  2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong
                   ` (17 preceding siblings ...)
  2022-12-30 22:20 ` [PATCH 24/41] xfs_db: display the realtime refcount btree contents Darrick J. Wong
@ 2022-12-30 22:20 ` Darrick J. Wong
  2022-12-30 22:20 ` [PATCH 21/41] xfs: scrub the realtime refcount btree Darrick J. Wong
                   ` (21 subsequent siblings)
  40 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add a new entry so that we can scrub the rtrefcountbt.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/scrub.c |    5 +++++
 scrub/repair.c  |    1 +
 2 files changed, 6 insertions(+)

diff --git a/libfrog/scrub.c b/libfrog/scrub.c
index 6f12ec72b22..c3cf5312f80 100644
--- a/libfrog/scrub.c
+++ b/libfrog/scrub.c
@@ -164,6 +164,11 @@ const struct xfrog_scrub_descr xfrog_scrubbers[XFS_SCRUB_TYPE_NR] = {
 		.descr	= "realtime reverse mapping btree",
 		.group	= XFROG_SCRUB_GROUP_RTGROUP,
 	},
+	[XFS_SCRUB_TYPE_RTREFCBT] = {
+		.name	= "rtrefcountbt",
+		.descr	= "realtime reference count btree",
+		.group	= XFROG_SCRUB_GROUP_RTGROUP,
+	},
 };
 #undef DEP
 
diff --git a/scrub/repair.c b/scrub/repair.c
index 3e00db7a2fd..cd652dc85a1 100644
--- a/scrub/repair.c
+++ b/scrub/repair.c
@@ -424,6 +424,7 @@ repair_item_difficulty(
 		case XFS_SCRUB_TYPE_RTBITMAP:
 		case XFS_SCRUB_TYPE_RTSUM:
 		case XFS_SCRUB_TYPE_RGSUPER:
+		case XFS_SCRUB_TYPE_RTREFCBT:
 			ret |= REPAIR_DIFFICULTY_PRIMARY;
 			break;
 		}

^ permalink raw reply related	[flat|nested] 565+ messages in thread
* [PATCH 21/41] xfs: scrub the realtime refcount btree
  2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong
                   ` (18 preceding siblings ...)
  2022-12-30 22:20 ` [PATCH 23/41] libfrog: enable scrubbing of the realtime refcount data Darrick J. Wong
@ 2022-12-30 22:20 ` Darrick J. Wong
  2022-12-30 22:20 ` [PATCH 18/41] xfs: apply rt extent alignment constraints to CoW extsize hint Darrick J. Wong
                   ` (20 subsequent siblings)
  40 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add code to scrub realtime refcount btrees.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_fs.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 5819576a51a..453b0861225 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -746,9 +746,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_RGSUPER	28	/* realtime superblock */
 #define XFS_SCRUB_TYPE_RGBITMAP	29	/* realtime group bitmap */
 #define XFS_SCRUB_TYPE_RTRMAPBT	30	/* rtgroup reverse mapping btree */
+#define XFS_SCRUB_TYPE_RTREFCBT	31	/* realtime reference count btree */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	31
+#define XFS_SCRUB_TYPE_NR	32
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1u << 0)

^ permalink raw reply related	[flat|nested] 565+ messages in thread
* [PATCH 18/41] xfs: apply rt extent alignment constraints to CoW extsize hint
  2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong
                   ` (19 preceding siblings ...)
  2022-12-30 22:20 ` [PATCH 21/41] xfs: scrub the realtime refcount btree Darrick J. Wong
@ 2022-12-30 22:20 ` Darrick J. Wong
  2022-12-30 22:20 ` [PATCH 20/41] xfs: report realtime refcount btree corruption errors to the health system Darrick J. Wong
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The copy-on-write extent size hint is subject to the same alignment
constraints as the regular extent size hint.  Since we're in the process
of adding reflink (and therefore CoW) to the realtime device, we must
apply the same scattered rextsize alignment validation strategies to
both hints to deal with the possibility of rextsize changing.

Therefore, fix the inode validator to perform rextsize alignment checks
on regular realtime files, and to remove misaligned directory hints.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_inode_buf.c   |   25 ++++++++++++++++++++-----
 libxfs/xfs_trans_inode.c |   14 ++++++++++++++
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index b2e47c3adca..ba4df981bd0 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -828,11 +828,29 @@ xfs_inode_validate_cowextsize(
 	bool				rt_flag;
 	bool				hint_flag;
 	uint32_t			cowextsize_bytes;
+	uint32_t			blocksize_bytes;
 
 	rt_flag = (flags & XFS_DIFLAG_REALTIME);
 	hint_flag = (flags2 & XFS_DIFLAG2_COWEXTSIZE);
 	cowextsize_bytes = XFS_FSB_TO_B(mp, cowextsize);
 
+	/*
+	 * Similar to extent size hints, a directory can be configured to
+	 * propagate realtime status and a CoW extent size hint to newly
+	 * created files even if there is no realtime device, and the hints on
+	 * disk can become misaligned if the sysadmin changes the rt extent
+	 * size while adding the realtime device.
+	 *
+	 * Therefore, we can only enforce the rextsize alignment check against
+	 * regular realtime files, and rely on callers to decide when alignment
+	 * checks are appropriate, and fix things up as needed.
+	 */
+
+	if (rt_flag)
+		blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
+	else
+		blocksize_bytes = mp->m_sb.sb_blocksize;
+
 	if (hint_flag && !xfs_has_reflink(mp))
 		return __this_address;
 
@@ -846,16 +864,13 @@ xfs_inode_validate_cowextsize(
 	if (mode && !hint_flag && cowextsize != 0)
 		return __this_address;
 
-	if (hint_flag && rt_flag)
-		return __this_address;
-
-	if (cowextsize_bytes % mp->m_sb.sb_blocksize)
+	if (cowextsize_bytes % blocksize_bytes)
 		return __this_address;
 
 	if (cowextsize > XFS_MAX_BMBT_EXTLEN)
 		return __this_address;
 
-	if (cowextsize > mp->m_sb.sb_agblocks / 2)
+	if (!rt_flag && cowextsize > mp->m_sb.sb_agblocks / 2)
 		return __this_address;
 
 	return NULL;
diff --git a/libxfs/xfs_trans_inode.c b/libxfs/xfs_trans_inode.c
index e2d5d3efaab..04d5449e3bc 100644
--- a/libxfs/xfs_trans_inode.c
+++ b/libxfs/xfs_trans_inode.c
@@ -157,6 +157,20 @@ xfs_trans_log_inode(
 		flags |= XFS_ILOG_CORE;
 	}
 
+	/*
+	 * Inode verifiers do not check that the CoW extent size hint is an
+	 * integer multiple of the rt extent size on a directory with both
+	 * rtinherit and cowextsize flags set.  If we're logging a directory
+	 * that is misconfigured in this way, clear the hint.
+	 */
+	if ((ip->i_diflags & XFS_DIFLAG_RTINHERIT) &&
+	    (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) &&
+	    (ip->i_cowextsize % ip->i_mount->m_sb.sb_rextsize) > 0) {
+		ip->i_diflags2 &= ~XFS_DIFLAG2_COWEXTSIZE;
+		ip->i_cowextsize = 0;
+		flags |= XFS_ILOG_CORE;
+	}
+
 	/*
 	 * Record the specific change for fdatasync optimisation. This allows
 	 * fdatasync to skip log forces for inodes that are only timestamp

^ permalink raw reply related	[flat|nested] 565+ messages in thread
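To make the alignment rule in the validator above concrete, here is a minimal sketch of just the modulo check: a realtime file's CoW hint must be a multiple of the rt extent size, while any other file only needs block alignment. The struct and function names are hypothetical simplifications; the real validator also performs the other checks visible in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for the superblock geometry fields. */
struct sketch_geom {
	uint32_t blocksize;	/* filesystem block size, in bytes */
	uint32_t rextsize;	/* rt extent size, in filesystem blocks */
};

/*
 * Return true if a CoW extent size hint (in filesystem blocks) is a
 * multiple of the allocation unit: the rt extent size for realtime
 * files, the block size otherwise.  Assumes blocksize and rextsize
 * are both nonzero.
 */
static bool sketch_cowextsize_aligned(const struct sketch_geom *g,
				      uint32_t cowextsize, bool rt_flag)
{
	uint64_t hint_bytes = (uint64_t)cowextsize * g->blocksize;
	uint64_t unit_bytes = g->blocksize;

	if (rt_flag)
		unit_bytes = (uint64_t)g->rextsize * g->blocksize;

	return hint_bytes % unit_bytes == 0;
}
```

With 4096-byte blocks and a 4-block rt extent, a hint of 8 blocks passes for a realtime file and a hint of 6 blocks does not, although 6 blocks is fine for a file on the data device.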
* [PATCH 20/41] xfs: report realtime refcount btree corruption errors to the health system 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (20 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 18/41] xfs: apply rt extent alignment constraints to CoW extsize hint Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 19/41] xfs: enable extent size hints for CoW operations Darrick J. Wong ` (18 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Whenever we encounter corrupt realtime refcount btree blocks, we should report that to the health monitoring system for later reporting. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_fs.h | 1 + libxfs/xfs_health.h | 4 +++- libxfs/xfs_inode_fork.c | 4 +++- libxfs/xfs_rtrefcount_btree.c | 5 ++++- 4 files changed, 11 insertions(+), 3 deletions(-) diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h index 8547ba85c55..5819576a51a 100644 --- a/libxfs/xfs_fs.h +++ b/libxfs/xfs_fs.h @@ -314,6 +314,7 @@ struct xfs_rtgroup_geometry { #define XFS_RTGROUP_GEOM_SICK_SUPER (1 << 0) /* superblock */ #define XFS_RTGROUP_GEOM_SICK_BITMAP (1 << 1) /* rtbitmap for this group */ #define XFS_RTGROUP_GEOM_SICK_RMAPBT (1 << 2) /* reverse mappings */ +#define XFS_RTGROUP_GEOM_SICK_REFCNTBT (1 << 3) /* reference counts */ /* * Structures for XFS_IOC_FSGROWFSDATA, XFS_IOC_FSGROWFSLOG & XFS_IOC_FSGROWFSRT diff --git a/libxfs/xfs_health.h b/libxfs/xfs_health.h index d5976f6b0de..13128216754 100644 --- a/libxfs/xfs_health.h +++ b/libxfs/xfs_health.h @@ -68,6 +68,7 @@ struct xfs_rtgroup; #define XFS_SICK_RT_SUMMARY (1 << 1) /* realtime summary */ #define XFS_SICK_RT_SUPER (1 << 2) /* rt group superblock */ #define XFS_SICK_RT_RMAPBT (1 << 3) /* reverse mappings */ +#define XFS_SICK_RT_REFCNTBT (1 << 4) /* reference 
counts */ /* Observable health issues for AG metadata. */ #define XFS_SICK_AG_SB (1 << 0) /* superblock */ @@ -106,7 +107,8 @@ struct xfs_rtgroup; #define XFS_SICK_RT_PRIMARY (XFS_SICK_RT_BITMAP | \ XFS_SICK_RT_SUMMARY | \ XFS_SICK_RT_SUPER | \ - XFS_SICK_RT_RMAPBT) + XFS_SICK_RT_RMAPBT | \ + XFS_SICK_RT_REFCNTBT) #define XFS_SICK_AG_PRIMARY (XFS_SICK_AG_SB | \ XFS_SICK_AG_AGF | \ diff --git a/libxfs/xfs_inode_fork.c b/libxfs/xfs_inode_fork.c index b019cbeb5fd..4bdcfcda234 100644 --- a/libxfs/xfs_inode_fork.c +++ b/libxfs/xfs_inode_fork.c @@ -266,8 +266,10 @@ xfs_iformat_data_fork( } return xfs_iformat_rtrmap(ip, dip); case XFS_DINODE_FMT_REFCOUNT: - if (!xfs_has_rtreflink(ip->i_mount)) + if (!xfs_has_rtreflink(ip->i_mount)) { + xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE); return -EFSCORRUPTED; + } return xfs_iformat_rtrefcount(ip, dip); default: xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, diff --git a/libxfs/xfs_rtrefcount_btree.c b/libxfs/xfs_rtrefcount_btree.c index 5e9930d315c..537287c1696 100644 --- a/libxfs/xfs_rtrefcount_btree.c +++ b/libxfs/xfs_rtrefcount_btree.c @@ -25,6 +25,7 @@ #include "xfs_rtgroup.h" #include "xfs_rtbitmap.h" #include "xfs_imeta.h" +#include "xfs_health.h" static struct kmem_cache *xfs_rtrefcountbt_cur_cache; @@ -691,8 +692,10 @@ xfs_iformat_rtrefcount( level = be16_to_cpu(dfp->bb_level); if (level > mp->m_rtrefc_maxlevels || - xfs_rtrefcount_droot_space_calc(level, numrecs) > dsize) + xfs_rtrefcount_droot_space_calc(level, numrecs) > dsize) { + xfs_inode_mark_sick(ip, XFS_SICK_INO_CORE); return -EFSCORRUPTED; + } xfs_iroot_alloc(ip, XFS_DATA_FORK, xfs_rtrefcount_broot_space_calc(mp, level, numrecs)); ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 19/41] xfs: enable extent size hints for CoW operations
  2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong
                   ` (21 preceding siblings ...)
  2022-12-30 22:20 ` [PATCH 20/41] xfs: report realtime refcount btree corruption errors to the health system Darrick J. Wong
@ 2022-12-30 22:20 ` Darrick J. Wong
  2022-12-30 22:20 ` [PATCH 22/41] xfs: online repair of the realtime refcount btree Darrick J. Wong
                   ` (17 subsequent siblings)
  40 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Wire up the copy-on-write extent size hint for realtime files, and
connect it to the rt allocator so that we avoid fragmentation on rt
filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_bmap.c |    8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 16f0683caef..d5842e3b4f6 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -6444,7 +6444,13 @@ xfs_get_cowextsz_hint(
 	a = 0;
 	if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
 		a = ip->i_cowextsize;
-	b = xfs_get_extsz_hint(ip);
+	if (XFS_IS_REALTIME_INODE(ip)) {
+		b = 0;
+		if (ip->i_diflags & XFS_DIFLAG_EXTSIZE)
+			b = ip->i_extsize;
+	} else {
+		b = xfs_get_extsz_hint(ip);
+	}
 
 	a = max(a, b);
 	if (a == 0)

^ permalink raw reply related	[flat|nested] 565+ messages in thread
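The hint selection in the hunk above can be sketched in isolation as follows. Everything here is a simplified model: the struct, the sketch_* names and the caller-supplied fallback are hypothetical. The stubbed regular helper assumes that realtime files fall back to the rt extent size when no explicit hint is set, which is my reading of why the patch bypasses it for realtime inodes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified inode: only the fields the hint logic reads. */
struct sketch_inode {
	bool		realtime;	/* stands in for XFS_IS_REALTIME_INODE() */
	bool		has_extsize;	/* stands in for XFS_DIFLAG_EXTSIZE */
	bool		has_cowextsize;	/* stands in for XFS_DIFLAG2_COWEXTSIZE */
	uint32_t	extsize;	/* in filesystem blocks */
	uint32_t	cowextsize;	/* in filesystem blocks */
	uint32_t	rextsize;	/* rt extent size, in filesystem blocks */
};

/*
 * Assumed behaviour of the regular extent size hint helper: an explicit
 * hint wins, otherwise realtime files fall back to the rt extent size.
 */
static uint32_t sketch_extsz_hint(const struct sketch_inode *ip)
{
	if (ip->has_extsize && ip->extsize)
		return ip->extsize;
	return ip->realtime ? ip->rextsize : 0;
}

/*
 * CoW hint selection modelled on the hunk above: take the larger of the
 * CoW hint and the extent size hint, and use the caller's fallback when
 * neither is set.
 */
static uint32_t sketch_cowextsz_hint(const struct sketch_inode *ip,
				     uint32_t fallback)
{
	uint32_t a = ip->has_cowextsize ? ip->cowextsize : 0;
	uint32_t b;

	if (ip->realtime)
		b = ip->has_extsize ? ip->extsize : 0;
	else
		b = sketch_extsz_hint(ip);

	if (b > a)
		a = b;
	return a ? a : fallback;
}
```

Under these assumptions a realtime file with no hints at all gets the fallback value rather than the rt extent size, and an explicit extent size hint still takes part in the max().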
* [PATCH 22/41] xfs: online repair of the realtime refcount btree
  2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong
                   ` (22 preceding siblings ...)
  2022-12-30 22:20 ` [PATCH 19/41] xfs: enable extent size hints for CoW operations Darrick J. Wong
@ 2022-12-30 22:20 ` Darrick J. Wong
  2022-12-30 22:20 ` [PATCH 26/41] xfs_db: widen block type mask to 64 bits Darrick J. Wong
                   ` (16 subsequent siblings)
  40 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Port the data device's refcount btree repair code to the realtime
refcount btree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_refcount.c |    2 +-
 libxfs/xfs_refcount.h |    2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/libxfs/xfs_refcount.c b/libxfs/xfs_refcount.c
index b96472a2fe2..c286bb3e3b5 100644
--- a/libxfs/xfs_refcount.c
+++ b/libxfs/xfs_refcount.c
@@ -153,7 +153,7 @@ xfs_refcount_check_perag_irec(
 	return NULL;
 }
 
-static inline xfs_failaddr_t
+inline xfs_failaddr_t
 xfs_refcount_check_rtgroup_irec(
 	struct xfs_rtgroup		*rtg,
 	const struct xfs_refcount_irec	*irec)
diff --git a/libxfs/xfs_refcount.h b/libxfs/xfs_refcount.h
index c7907119d10..790d7fe9e67 100644
--- a/libxfs/xfs_refcount.h
+++ b/libxfs/xfs_refcount.h
@@ -132,6 +132,8 @@ extern void xfs_refcount_btrec_to_irec(const union xfs_btree_rec *rec,
 		struct xfs_refcount_irec *irec);
 xfs_failaddr_t xfs_refcount_check_perag_irec(struct xfs_perag *pag,
 		const struct xfs_refcount_irec *irec);
+xfs_failaddr_t xfs_refcount_check_rtgroup_irec(struct xfs_rtgroup *rtg,
+		const struct xfs_refcount_irec *irec);
 xfs_failaddr_t xfs_refcount_check_irec(struct xfs_btree_cur *cur,
 		const struct xfs_refcount_irec *irec);
 extern int xfs_refcount_insert(struct xfs_btree_cur *cur,

^ permalink raw reply related	[flat|nested] 565+ messages in thread
* [PATCH 26/41] xfs_db: widen block type mask to 64 bits 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (23 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 22/41] xfs: online repair of the realtime refcount btree Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 29/41] xfs_spaceman: report health of the realtime refcount btree Darrick J. Wong ` (15 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> We're about to enlarge enum dbm beyond 32 items, so we need to widen the block type mask to 64 bits to avoid problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/check.c | 72 ++++++++++++++++++++++++++++++------------------------------ 1 file changed, 36 insertions(+), 36 deletions(-) diff --git a/db/check.c b/db/check.c index 79f26b0e789..6c92b961283 100644 --- a/db/check.c +++ b/db/check.c @@ -282,9 +282,9 @@ static void check_set_rdbmap(xfs_rfsblock_t bno, xfs_extlen_t len, dbm_t type1, dbm_t type2); static void check_summary(void); static void checknot_dbmap(xfs_agnumber_t agno, xfs_agblock_t agbno, - xfs_extlen_t len, int typemask); + xfs_extlen_t len, uint64_t typemask); static void checknot_rdbmap(xfs_rfsblock_t bno, xfs_extlen_t len, - int typemask); + uint64_t typemask); static void dir_hash_add(xfs_dahash_t hash, xfs_dir2_dataptr_t addr); static void dir_hash_check(inodata_t *id, int v); @@ -904,7 +904,7 @@ blockget_f( if (!tflag) { /* are we in test mode, faking out freespace? 
*/ for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) checknot_dbmap(agno, 0, mp->m_sb.sb_agblocks, - (1 << DBM_UNKNOWN) | (1 << DBM_FREE1)); + (1ULL << DBM_UNKNOWN) | (1ULL << DBM_FREE1)); } for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) check_linkcounts(agno); @@ -912,7 +912,7 @@ blockget_f( checknot_rdbmap(0, (xfs_extlen_t)(mp->m_sb.sb_rextents * mp->m_sb.sb_rextsize), - 1 << DBM_UNKNOWN); + 1ULL << DBM_UNKNOWN); check_summary(); } if (mp->m_sb.sb_icount != icount) { @@ -1083,7 +1083,7 @@ blocktrash_f( int c; int count; int done; - int goodmask; + uint64_t goodmask; int i; ltab_t *lentab; int lentablen; @@ -1095,7 +1095,7 @@ blocktrash_f( xfs_rfsblock_t randb; uint seed; int sopt; - int tmask; + uint64_t tmask; bool this_block = false; int bit_offset = -1; @@ -1108,27 +1108,27 @@ blocktrash_f( seed = (unsigned int)(now.tv_sec ^ now.tv_usec); sopt = 0; tmask = 0; - goodmask = (1 << DBM_AGF) | - (1 << DBM_AGFL) | - (1 << DBM_AGI) | - (1 << DBM_ATTR) | - (1 << DBM_BTBMAPA) | - (1 << DBM_BTBMAPD) | - (1 << DBM_BTBNO) | - (1 << DBM_BTCNT) | - (1 << DBM_BTINO) | - (1 << DBM_DIR) | - (1 << DBM_INODE) | - (1 << DBM_LOG) | - (1 << DBM_QUOTA) | - (1 << DBM_RTBITMAP) | - (1 << DBM_RTSUM) | - (1 << DBM_BTRTRMAP) | - (1 << DBM_SYMLINK) | - (1 << DBM_BTFINO) | - (1 << DBM_BTRMAP) | - (1 << DBM_BTREFC) | - (1 << DBM_SB); + goodmask = (1ULL << DBM_AGF) | + (1ULL << DBM_AGFL) | + (1ULL << DBM_AGI) | + (1ULL << DBM_ATTR) | + (1ULL << DBM_BTBMAPA) | + (1ULL << DBM_BTBMAPD) | + (1ULL << DBM_BTBNO) | + (1ULL << DBM_BTCNT) | + (1ULL << DBM_BTINO) | + (1ULL << DBM_DIR) | + (1ULL << DBM_INODE) | + (1ULL << DBM_LOG) | + (1ULL << DBM_QUOTA) | + (1ULL << DBM_RTBITMAP) | + (1ULL << DBM_RTSUM) | + (1ULL << DBM_BTRTRMAP) | + (1ULL << DBM_SYMLINK) | + (1ULL << DBM_BTFINO) | + (1ULL << DBM_BTRMAP) | + (1ULL << DBM_BTREFC) | + (1ULL << DBM_SB); while ((c = getopt(argc, argv, "0123n:o:s:t:x:y:z")) != EOF) { switch (c) { case '0': @@ -1174,11 +1174,11 @@ blocktrash_f( if (strcmp(typename[i], 
optarg) == 0) break; } - if (!typename[i] || (((1 << i) & goodmask) == 0)) { + if (!typename[i] || (((1ULL << i) & goodmask) == 0)) { dbprintf(_("bad blocktrash type %s\n"), optarg); return 0; } - tmask |= 1 << i; + tmask |= 1ULL << i; break; case 'x': min = (int)strtol(optarg, &p, 0); @@ -1217,7 +1217,7 @@ blocktrash_f( return 0; } if (tmask == 0) - tmask = goodmask & ~((1 << DBM_LOG) | (1 << DBM_SB)); + tmask = goodmask & ~((1ULL << DBM_LOG) | (1ULL << DBM_SB)); lentab = xmalloc(sizeof(ltab_t)); lentab->min = lentab->max = min; lentablen = 1; @@ -1242,7 +1242,7 @@ blocktrash_f( for (agbno = 0, p = dbmap[agno]; agbno < mp->m_sb.sb_agblocks; agbno++, p++) { - if ((1 << *p) & tmask) + if ((1ULL << *p) & tmask) blocks++; } } @@ -1259,7 +1259,7 @@ blocktrash_f( for (agbno = 0, p = dbmap[agno]; agbno < mp->m_sb.sb_agblocks; agbno++, p++) { - if (!((1 << *p) & tmask)) + if (!((1ULL << *p) & tmask)) continue; if (bi++ < randb) continue; @@ -1802,7 +1802,7 @@ checknot_dbmap( xfs_agnumber_t agno, xfs_agblock_t agbno, xfs_extlen_t len, - int typemask) + uint64_t typemask) { xfs_extlen_t i; char *p; @@ -1810,7 +1810,7 @@ checknot_dbmap( if (!check_range(agno, agbno, len)) return; for (i = 0, p = &dbmap[agno][agbno]; i < len; i++, p++) { - if ((1 << *p) & typemask) { + if ((1ULL << *p) & typemask) { if (!sflag || CHECK_BLISTA(agno, agbno + i)) dbprintf(_("block %u/%u type %s not expected\n"), agno, agbno + i, typename[(dbm_t)*p]); @@ -1823,7 +1823,7 @@ static void checknot_rdbmap( xfs_rfsblock_t bno, xfs_extlen_t len, - int typemask) + uint64_t typemask) { xfs_extlen_t i; char *p; @@ -1831,7 +1831,7 @@ checknot_rdbmap( if (!check_rrange(bno, len)) return; for (i = 0, p = &dbmap[mp->m_sb.sb_agcount][bno]; i < len; i++, p++) { - if ((1 << *p) & typemask) { + if ((1ULL << *p) & typemask) { if (!sflag || CHECK_BLIST(bno + i)) dbprintf(_("rtblock %llu type %s not expected\n"), bno + i, typename[(dbm_t)*p]); ^ permalink raw reply related [flat|nested] 565+ messages in thread
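As background for the widening above: in C the literal 1 has type int, so "1 << n" overflows a signed 32-bit int at n == 31 and is undefined once n reaches the width of int. Building each bit from a 64-bit constant avoids both problems. A minimal sketch, with hypothetical helper names that are not part of xfs_db:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of xfs_db): build the mask bit for a
 * block type index.  The constant is widened to 64 bits before the
 * shift, so indexes 31 through 63 are well defined.
 */
static inline uint64_t sketch_type_bit(unsigned int type)
{
	assert(type < 64);
	return 1ULL << type;
}

/* Return nonzero if the given block type is selected by the mask. */
static inline int sketch_type_in_mask(uint64_t mask, unsigned int type)
{
	return (sketch_type_bit(type) & mask) != 0;
}
```

The mask variable has to be widened along with the constant, which is why the patch changes the typemask, goodmask and tmask declarations as well as every shift.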
* [PATCH 29/41] xfs_spaceman: report health of the realtime refcount btree 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (24 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 26/41] xfs_db: widen block type mask to 64 bits Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 32/41] xfs_repair: find and mark the rtrefcountbt inode Darrick J. Wong ` (14 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Report the health of the realtime reference count btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- spaceman/health.c | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/spaceman/health.c b/spaceman/health.c index 950610d9770..6114be5704c 100644 --- a/spaceman/health.c +++ b/spaceman/health.c @@ -44,6 +44,11 @@ static bool has_rtrmapbt(const struct xfs_fsop_geom *g) return g->rtblocks > 0 && (g->flags & XFS_FSOP_GEOM_FLAGS_RMAPBT); } +static bool has_rtreflink(const struct xfs_fsop_geom *g) +{ + return g->rtblocks > 0 && (g->flags & XFS_FSOP_GEOM_FLAGS_REFLINK); +} + struct flag_map { unsigned int mask; bool (*has_fn)(const struct xfs_fsop_geom *g); @@ -153,6 +158,11 @@ static const struct flag_map rtgroup_flags[] = { .descr = "realtime reverse mappings btree", .has_fn = has_rtrmapbt, }, + { + .mask = XFS_RTGROUP_GEOM_SICK_REFCNTBT, + .descr = "realtime reference count btree", + .has_fn = has_rtreflink, + }, {0}, }; ^ permalink raw reply related [flat|nested] 565+ messages in thread
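The health table in this patch pairs each sickness bit with a predicate that says whether the feature exists on the filesystem at all, so that absent structures are never reported. A rough, self-contained sketch of that pattern is below. All `demo_`/`DEMO_` names and bit values are invented for illustration and do not match the real xfs_fsop_geom or XFS_RTGROUP_GEOM_SICK_* definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the filesystem geometry. */
struct demo_geom {
	uint64_t	rtblocks;
	uint32_t	flags;
};

#define DEMO_GEOM_FLAG_RMAPBT	(1U << 0)
#define DEMO_GEOM_FLAG_REFLINK	(1U << 1)
#define DEMO_SICK_RMAPBT	(1U << 0)
#define DEMO_SICK_REFCNTBT	(1U << 1)

static bool demo_has_rtrmapbt(const struct demo_geom *g)
{
	return g->rtblocks > 0 && (g->flags & DEMO_GEOM_FLAG_RMAPBT);
}

static bool demo_has_rtreflink(const struct demo_geom *g)
{
	return g->rtblocks > 0 && (g->flags & DEMO_GEOM_FLAG_REFLINK);
}

struct demo_flag_map {
	unsigned int	mask;
	bool		(*has_fn)(const struct demo_geom *g);
	const char	*descr;
};

/* Table terminated by an all-zero entry, as in the patch. */
static const struct demo_flag_map demo_rtgroup_flags[] = {
	{
		.mask = DEMO_SICK_RMAPBT,
		.descr = "realtime reverse mappings btree",
		.has_fn = demo_has_rtrmapbt,
	},
	{
		.mask = DEMO_SICK_REFCNTBT,
		.descr = "realtime reference count btree",
		.has_fn = demo_has_rtreflink,
	},
	{0},
};

/* Count table entries that apply to this filesystem and are marked sick. */
static unsigned int demo_count_sick(const struct demo_geom *g,
				    unsigned int sick)
{
	const struct demo_flag_map *f;
	unsigned int n = 0;

	for (f = demo_rtgroup_flags; f->mask != 0; f++) {
		if (f->has_fn && !f->has_fn(g))
			continue;	/* feature not present, skip */
		if (sick & f->mask)
			n++;
	}
	return n;
}
```

With this layout, adding a new health item is a table entry plus one predicate; the reporting loop stays unchanged.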
* [PATCH 32/41] xfs_repair: find and mark the rtrefcountbt inode 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (25 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 29/41] xfs_spaceman: report health of the realtime refcount btree Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 30/41] xfs_repair: allow CoW staging extents in the realtime rmap records Darrick J. Wong ` (13 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Make sure that we find the realtime refcountbt inode and mark it appropriately, just in case we find a rogue inode claiming to be an rtrefcount, or just plain garbage in the superblock field. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dino_chunks.c | 11 ++++++ repair/dinode.c | 29 ++++++++++++++++- repair/dir2.c | 2 + repair/incore.h | 1 + repair/rmap.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++- repair/rmap.h | 3 +- repair/scan.c | 8 ++--- 7 files changed, 133 insertions(+), 8 deletions(-) diff --git a/repair/dino_chunks.c b/repair/dino_chunks.c index 277f21c6936..fe394cd637c 100644 --- a/repair/dino_chunks.c +++ b/repair/dino_chunks.c @@ -1027,6 +1027,17 @@ process_inode_chunk( _("would clear realtime rmap inode %" PRIu64 "\n"), ino); } + } else if (is_rtrefcount_ino(ino)) { + refcount_avoid_check(mp); + if (!no_modify) { + do_warn( + _("cleared realtime refcount inode %" PRIu64 "\n"), + ino); + } else { + do_warn( + _("would clear realtime refcount inode %" PRIu64 "\n"), + ino); + } } else if (!no_modify) { do_warn(_("cleared inode %" PRIu64 "\n"), ino); diff --git a/repair/dinode.c b/repair/dinode.c index 7722d7762d2..1322f31d47e 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -156,6 +156,9 @@ clear_dinode(xfs_mount_t *mp, struct xfs_dinode *dino, xfs_ino_t ino_num) if 
(is_rtrmap_inode(ino_num)) rmap_avoid_check(mp); + if (is_rtrefcount_ino(ino_num)) + refcount_avoid_check(mp); + /* and clear the forks */ memset(XFS_DFORK_DPTR(dino), 0, XFS_LITINO(mp)); return; @@ -1067,6 +1070,12 @@ _("rtrefcount inode %" PRIu64 " not flagged as metadata\n"), lino); return 1; } + if (type != XR_INO_RTREFC) { + do_warn( +_("rtrefcount inode %" PRIu64 " was not found in the metadata directory tree\n"), + lino); + return 1; + } priv.rgno = rtgroup_for_rtrefcount_inode(mp, ino); if (priv.rgno == NULLRGNUMBER) { @@ -1107,7 +1116,7 @@ _("computed size of rtrefcountbt root (%zu bytes) is greater than space in " error = process_rtrefc_reclist(mp, rp, numrecs, &priv, "rtrefcountbt root"); if (error) { - refcount_avoid_check(); + refcount_avoid_check(mp); return 1; } return 0; @@ -2063,6 +2072,9 @@ process_check_sb_inodes( if (is_rtrmap_inode(lino)) return process_check_rt_inode(mp, dinoc, lino, type, dirty, XR_INO_RTRMAP, _("realtime rmap btree")); + if (is_rtrefcount_ino(lino)) + return process_check_rt_inode(mp, dinoc, lino, type, dirty, + XR_INO_RTREFC, _("realtime refcount btree")); return 0; } @@ -2172,6 +2184,18 @@ _("found inode %" PRIu64 " claiming to be a rtrmapbt file, but rmapbt is disable } break; + case XR_INO_RTREFC: + /* + * if we have no refcountbt, any inode claiming + * to be a real-time file is bogus + */ + if (!xfs_has_reflink(mp)) { + do_warn( +_("found inode %" PRIu64 " claiming to be a rtrefcountbt file, but reflink is disabled\n"), lino); + return 1; + } + break; + default: break; } @@ -3299,6 +3323,8 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"), type = XR_INO_PQUOTA; else if (is_rtrmap_inode(lino)) type = XR_INO_RTRMAP; + else if (is_rtrefcount_ino(lino)) + type = XR_INO_RTREFC; else type = XR_INO_DATA; break; @@ -3405,6 +3431,7 @@ _("Bad CoW extent size %u on inode %" PRIu64 ", "), case XR_INO_GQUOTA: case XR_INO_PQUOTA: case XR_INO_RTRMAP: + case XR_INO_RTREFC: /* * This inode was recognized as being 
filesystem * metadata, so preserve the inode and its contents for diff --git a/repair/dir2.c b/repair/dir2.c index 4c59ad071de..6f6933f91d5 100644 --- a/repair/dir2.c +++ b/repair/dir2.c @@ -157,6 +157,8 @@ is_meta_ino( reason = _("realtime summary"); else if (is_rtrmap_inode(lino)) reason = _("realtime rmap"); + else if (is_rtrefcount_ino(lino)) + reason = _("realtime refcount"); else if (lino == mp->m_sb.sb_uquotino) reason = _("user quota"); else if (lino == mp->m_sb.sb_gquotino) diff --git a/repair/incore.h b/repair/incore.h index 3c0e4ea2b29..aaf5b5b55ba 100644 --- a/repair/incore.h +++ b/repair/incore.h @@ -225,6 +225,7 @@ int count_bcnt_extents(xfs_agnumber_t); #define XR_INO_GQUOTA 13 /* group quota inode */ #define XR_INO_PQUOTA 14 /* project quota inode */ #define XR_INO_RTRMAP 15 /* realtime rmap */ +#define XR_INO_RTREFC 16 /* realtime refcount */ /* inode allocation tree */ diff --git a/repair/rmap.c b/repair/rmap.c index 69954b448ed..9394720a1b6 100644 --- a/repair/rmap.c +++ b/repair/rmap.c @@ -38,6 +38,12 @@ struct xfs_ag_rmap { * NULLFSINO to signal to phase 6 to link a new inode into the metadir. */ xfs_ino_t rg_rmap_ino; + + /* + * inumber of the refcount btree for this rtgroup. This can be set to + * NULLFSINO to signal to phase 6 to link a new inode into the metadir. + */ + xfs_ino_t rg_refcount_ino; }; static struct xfs_ag_rmap *ag_rmaps; @@ -48,6 +54,9 @@ static bool refcbt_suspect; /* Bitmap of rt group rmap inodes reachable via /realtime/$rgno.rmap. */ static struct bitmap *rmap_inodes; +/* Bitmap of rt group refcount inodes reachable via /realtime/$rgno.refcount. 
*/ +static struct bitmap *refcount_inodes; + static struct xfs_ag_rmap *rmaps_for_group(bool isrt, unsigned int group) { if (isrt) @@ -129,6 +138,7 @@ rmaps_init_rt( goto nomem; ag_rmap->rg_rmap_ino = NULLFSINO; + ag_rmap->rg_refcount_ino = NULLFSINO; return; nomem: do_error( @@ -210,6 +220,39 @@ set_rtgroup_rmap_inode( return 0; } +static inline int +set_rtgroup_refcount_inode( + struct xfs_mount *mp, + xfs_rgnumber_t rgno) +{ + struct xfs_imeta_path *path; + struct xfs_ag_rmap *ar = rmaps_for_group(true, rgno); + xfs_ino_t ino; + int error; + + if (!xfs_has_rtreflink(mp)) + return 0; + + error = -libxfs_rtrefcountbt_create_path(mp, rgno, &path); + if (error) + return error; + + error = -libxfs_imeta_lookup(mp, path, &ino); + libxfs_imeta_free_path(path); + if (error) + return error; + + if (ino == NULLFSINO || bitmap_test(refcount_inodes, ino, 1)) + return EFSCORRUPTED; + + error = bitmap_set(refcount_inodes, ino, 1); + if (error) + return error; + + ar->rg_refcount_ino = ino; + return 0; +} + static void discover_rtgroup_inodes( struct xfs_mount *mp) @@ -221,10 +264,20 @@ discover_rtgroup_inodes( if (error) goto out; + error = bitmap_alloc(&refcount_inodes); + if (error) { + bitmap_free(&rmap_inodes); + goto out; + } + for (rgno = 0; rgno < mp->m_sb.sb_rgcount; rgno++) { int err2 = set_rtgroup_rmap_inode(mp, rgno); if (err2 && !error) error = err2; + + err2 = set_rtgroup_refcount_inode(mp, rgno); + if (err2 && !error) + error = err2; } out: @@ -240,6 +293,7 @@ discover_rtgroup_inodes( static inline void free_rtmeta_inode_bitmaps(void) { + bitmap_free(&refcount_inodes); bitmap_free(&rmap_inodes); } @@ -255,10 +309,28 @@ rtgroup_for_rtrefcount_inode( struct xfs_mount *mp, xfs_ino_t ino) { - /* This will be implemented later. 
*/ + xfs_rgnumber_t rgno; + + if (!refcount_inodes) + return NULLRGNUMBER; + + for (rgno = 0; rgno < mp->m_sb.sb_rgcount; rgno++) { + if (rg_rmaps[rgno].rg_refcount_ino == ino) + return rgno; + } + return NULLRGNUMBER; } +bool +is_rtrefcount_ino( + xfs_ino_t ino) +{ + if (!refcount_inodes) + return false; + return bitmap_test(refcount_inodes, ino, 1); +} + /* * Initialize per-AG reverse map data. */ @@ -1816,8 +1888,19 @@ init_refcount_cursor( * Disable the refcount btree check. */ void -refcount_avoid_check(void) +refcount_avoid_check( + struct xfs_mount *mp) { + struct xfs_rtgroup *rtg; + xfs_rgnumber_t rgno; + + for_each_rtgroup(mp, rgno, rtg) { + struct xfs_ag_rmap *ar = rmaps_for_group(true, rtg->rtg_rgno); + + ar->rg_refcount_ino = NULLFSINO; + } + + bitmap_clear(refcount_inodes, 0, XFS_MAXINUMBER); refcbt_suspect = true; } diff --git a/repair/rmap.h b/repair/rmap.h index 83331c825ec..4f49b19062c 100644 --- a/repair/rmap.h +++ b/repair/rmap.h @@ -41,7 +41,7 @@ extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec, extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t); uint64_t refcount_record_count(struct xfs_mount *mp, xfs_agnumber_t agno); extern int init_refcount_cursor(xfs_agnumber_t, struct xfs_slab_cursor **); -extern void refcount_avoid_check(void); +extern void refcount_avoid_check(struct xfs_mount *mp); void check_refcounts(struct xfs_mount *mp, xfs_agnumber_t agno); extern void record_inode_reflink_flag(struct xfs_mount *, struct xfs_dinode *, @@ -68,5 +68,6 @@ int populate_rtgroup_rmapbt(struct xfs_rtgroup *rtg, struct xfs_inode *ip); xfs_rgnumber_t rtgroup_for_rtrefcount_inode(struct xfs_mount *mp, xfs_ino_t ino); +bool is_rtrefcount_ino(xfs_ino_t ino); #endif /* RMAP_H_ */ diff --git a/repair/scan.c b/repair/scan.c index 0a37137f019..50ae662d73b 100644 --- a/repair/scan.c +++ b/repair/scan.c @@ -1985,7 +1985,7 @@ _("extent (%u/%u) len %u claimed, state is %d\n"), libxfs_perag_put(pag); out: if (suspect) - refcount_avoid_check(); 
+ refcount_avoid_check(mp); return; } @@ -2275,7 +2275,7 @@ _("%s btree block claimed (state %d), agno %d, agbno %d, suspect %d\n"), } out: if (suspect) { - refcount_avoid_check(); + refcount_avoid_check(mp); return 1; } @@ -3138,7 +3138,7 @@ validate_agf( if (levels == 0 || levels > mp->m_refc_maxlevels) { do_warn(_("bad levels %u for refcountbt root, agno %d\n"), levels, agno); - refcount_avoid_check(); + refcount_avoid_check(mp); } bno = be32_to_cpu(agf->agf_refcount_root); @@ -3156,7 +3156,7 @@ validate_agf( } else { do_warn(_("bad agbno %u for refcntbt root, agno %d\n"), bno, agno); - refcount_avoid_check(); + refcount_avoid_check(mp); } } ^ permalink raw reply related [flat|nested] 565+ messages in thread
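The bookkeeping in this patch boils down to three operations: remember one refcount inode per rtgroup while rejecting null or duplicate claims, map an inode number back to its group, and forget everything once the refcount data is declared suspect. A simplified sketch under stated assumptions: it uses a small fixed array instead of repair's bitmap and per-group structures, and every `demo_` name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NULLINO	((uint64_t)-1)
#define DEMO_NULLRGNO	((uint32_t)-1)
#define DEMO_MAX_GROUPS	8

/* Hypothetical per-filesystem table: one refcount inode per rtgroup. */
struct demo_rtmeta {
	uint32_t	rgcount;
	uint64_t	refcount_ino[DEMO_MAX_GROUPS];
};

static void demo_init(struct demo_rtmeta *m, uint32_t rgcount)
{
	uint32_t rgno;

	m->rgcount = rgcount <= DEMO_MAX_GROUPS ? rgcount : DEMO_MAX_GROUPS;
	for (rgno = 0; rgno < DEMO_MAX_GROUPS; rgno++)
		m->refcount_ino[rgno] = DEMO_NULLINO;
}

/* Map an inode number back to its rtgroup, or DEMO_NULLRGNO. */
static uint32_t demo_group_for_ino(const struct demo_rtmeta *m, uint64_t ino)
{
	uint32_t rgno;

	if (ino == DEMO_NULLINO)
		return DEMO_NULLRGNO;
	for (rgno = 0; rgno < m->rgcount; rgno++) {
		if (m->refcount_ino[rgno] == ino)
			return rgno;
	}
	return DEMO_NULLRGNO;
}

/* Record a group's refcount inode; -1 if null, out of range, or a duplicate. */
static int demo_set_ino(struct demo_rtmeta *m, uint32_t rgno, uint64_t ino)
{
	if (rgno >= m->rgcount || ino == DEMO_NULLINO)
		return -1;
	if (demo_group_for_ino(m, ino) != DEMO_NULLRGNO)
		return -1;	/* already claimed by some group */
	m->refcount_ino[rgno] = ino;
	return 0;
}

/* Forget every recorded inode, as happens once the data is suspect. */
static void demo_forget_all(struct demo_rtmeta *m)
{
	uint32_t rgno;

	for (rgno = 0; rgno < m->rgcount; rgno++)
		m->refcount_ino[rgno] = DEMO_NULLINO;
}
```

The linear scan is fine for a sketch; the patch keeps a bitmap alongside the per-group field so that the membership test does not need to walk every group.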
* [PATCH 30/41] xfs_repair: allow CoW staging extents in the realtime rmap records 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (26 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 32/41] xfs_repair: find and mark the rtrefcountbt inode Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 28/41] xfs_db: copy the realtime refcount btree Darrick J. Wong ` (12 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Don't flag the rt rmap btree as having errors if there are CoW staging extent records in it and the filesystem supports reflink. Leftover staging extents will be reported when we scan the rt refcount btree, in a future patch. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/scan.c | 17 ++++++++++++++--- 1 file changed, 14 insertions(+), 3 deletions(-) diff --git a/repair/scan.c b/repair/scan.c index 0ff8afccedc..c9209ebc3d7 100644 --- a/repair/scan.c +++ b/repair/scan.c @@ -1414,9 +1414,20 @@ _("invalid length %llu in record %u of %s\n"), continue; } - /* We only store file data and superblocks in the rtrmap. */ - if (XFS_RMAP_NON_INODE_OWNER(owner) && - owner != XFS_RMAP_OWN_FS) { + /* + * We only store file data, COW data, and superblocks in the + * rtrmap. + */ + if (owner == XFS_RMAP_OWN_COW) { + if (!xfs_has_reflink(mp)) { + do_warn( +_("invalid CoW staging extent in record %u of %s\n"), + i, name); + suspect++; + continue; + } + } else if (XFS_RMAP_NON_INODE_OWNER(owner) && + owner != XFS_RMAP_OWN_FS) { do_warn( _("invalid owner %lld in record %u of %s\n"), (long long int)owner, i, name); ^ permalink raw reply related [flat|nested] 565+ messages in thread
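The owner check in this patch is a three-way decision: a CoW owner is acceptable only when reflink is enabled, any other special (non-inode) owner is acceptable only if it is the filesystem owner, and everything else is an ordinary inode owner. A small sketch of that decision as a pure function follows. The `DEMO_` owner codes and the top-bit test are assumptions chosen for the sketch; consult the real XFS_RMAP_OWN_* definitions rather than relying on these values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical special-owner codes with the top bit set. */
#define DEMO_OWN_FS	((uint64_t)-3)
#define DEMO_OWN_COW	((uint64_t)-9)

enum demo_owner_verdict {
	DEMO_OWNER_OK,
	DEMO_OWNER_BAD_COW,	/* CoW staging record without reflink */
	DEMO_OWNER_BAD,		/* some other special owner */
};

/* Assumed convention: special owners have the top bit set. */
static inline bool demo_non_inode_owner(uint64_t owner)
{
	return (owner >> 63) != 0;
}

/* Classify the owner of one realtime rmap record. */
static enum demo_owner_verdict demo_check_rt_owner(uint64_t owner,
						   bool has_reflink)
{
	if (owner == DEMO_OWN_COW)
		return has_reflink ? DEMO_OWNER_OK : DEMO_OWNER_BAD_COW;
	if (demo_non_inode_owner(owner) && owner != DEMO_OWN_FS)
		return DEMO_OWNER_BAD;
	return DEMO_OWNER_OK;
}
```

Testing the CoW case first keeps the existing non-inode-owner check untouched, which is how the patch structures its if/else chain.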
* [PATCH 28/41] xfs_db: copy the realtime refcount btree 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (27 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 30/41] xfs_repair: allow CoW staging extents in the realtime rmap records Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 31/41] xfs_repair: use realtime refcount btree data to check block types Darrick J. Wong ` (11 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Copy the realtime refcountbt when we're metadumping the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/metadump.c | 125 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 125 insertions(+) diff --git a/db/metadump.c b/db/metadump.c index e8663e11a3f..b4549117117 100644 --- a/db/metadump.c +++ b/db/metadump.c @@ -717,6 +717,54 @@ copy_refcount_btree( return scan_btree(agno, root, levels, TYP_REFCBT, agf, scanfunc_refcntbt); } +static int +scanfunc_rtrefcbt( + struct xfs_btree_block *block, + xfs_agnumber_t agno, + xfs_agblock_t agbno, + int level, + typnm_t btype, + void *arg) +{ + xfs_rtrefcount_ptr_t *pp; + int i; + int numrecs; + + if (level == 0) + return 1; + + numrecs = be16_to_cpu(block->bb_numrecs); + if (numrecs > mp->m_rtrefc_mxr[1]) { + if (show_warnings) + print_warning("invalid numrecs (%u) in %s block %u/%u", + numrecs, typtab[btype].name, agno, agbno); + return 1; + } + + pp = xfs_rtrefcount_ptr_addr(block, 1, mp->m_rtrefc_mxr[1]); + for (i = 0; i < numrecs; i++) { + xfs_agnumber_t pagno; + xfs_agblock_t pbno; + + pagno = XFS_FSB_TO_AGNO(mp, get_unaligned_be64(&pp[i])); + pbno = XFS_FSB_TO_AGBNO(mp, get_unaligned_be64(&pp[i])); + + if (pbno == 0 || pbno > mp->m_sb.sb_agblocks || + pagno > mp->m_sb.sb_agcount) { + if (show_warnings) + print_warning("invalid 
block number (%u/%u) " + "in inode %llu %s block %u/%u", + pagno, pbno, (long long)cur_ino, + typtab[btype].name, agno, agbno); + continue; + } + if (!scan_btree(pagno, pbno, level, btype, arg, + scanfunc_rtrefcbt)) + return 0; + } + return 1; +} + /* filename and extended attribute obfuscation routines */ struct name_ent { @@ -2458,6 +2506,80 @@ process_rtrmap( return 1; } +static int +process_rtrefc( + struct xfs_dinode *dip, + typnm_t itype) +{ + struct xfs_rtrefcount_root *dib; + int i; + xfs_rtrefcount_ptr_t *pp; + int level; + int nrecs; + int maxrecs; + int whichfork; + typnm_t btype; + + if (itype == TYP_ATTR && show_warnings) { + print_warning("ignoring rtrefcbt root in inode %llu attr fork", + (long long)cur_ino); + return 1; + } + + whichfork = XFS_DATA_FORK; + btype = TYP_RTREFCBT; + + dib = (struct xfs_rtrefcount_root *)XFS_DFORK_PTR(dip, whichfork); + level = be16_to_cpu(dib->bb_level); + nrecs = be16_to_cpu(dib->bb_numrecs); + + if (level > mp->m_rtrefc_maxlevels) { + if (show_warnings) + print_warning("invalid level (%u) in inode %lld %s " + "root", level, (long long)cur_ino, + typtab[btype].name); + return 1; + } + + if (level == 0) + return 1; + + maxrecs = libxfs_rtrefcountbt_droot_maxrecs( + XFS_DFORK_SIZE(dip, mp, whichfork), + false); + if (nrecs > maxrecs) { + if (show_warnings) + print_warning("invalid numrecs (%u) in inode %lld %s " + "root", nrecs, (long long)cur_ino, + typtab[btype].name); + return 1; + } + + pp = xfs_rtrefcount_droot_ptr_addr(dib, 1, maxrecs); + for (i = 0; i < nrecs; i++) { + xfs_agnumber_t ag; + xfs_agblock_t bno; + + ag = XFS_FSB_TO_AGNO(mp, get_unaligned_be64(&pp[i])); + bno = XFS_FSB_TO_AGBNO(mp, get_unaligned_be64(&pp[i])); + + if (bno == 0 || bno > mp->m_sb.sb_agblocks || + ag > mp->m_sb.sb_agcount) { + if (show_warnings) + print_warning("invalid block number (%u/%u) " + "in inode %llu %s root", ag, + bno, (long long)cur_ino, + typtab[btype].name); + continue; + } + + if (!scan_btree(ag, bno, level, btype, &itype, 
+ scanfunc_rtrefcbt)) + return 0; + } + return 1; +} + static int process_inode_data( struct xfs_dinode *dip, @@ -2505,6 +2627,9 @@ process_inode_data( case XFS_DINODE_FMT_RMAP: return process_rtrmap(dip, itype); + + case XFS_DINODE_FMT_REFCOUNT: + return process_rtrefc(dip, itype); } return 1; } ^ permalink raw reply related [flat|nested] 565+ messages in thread
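Both the root and the interior-block walkers in this patch split each 64-bit btree pointer into an AG number and an AG block number and skip pointers that fall outside the filesystem. A minimal sketch of that split-and-check step is below. The `demo_` names and the geometry struct are hypothetical, and the sketch deliberately uses strict `>=` bounds, which is tighter than the `>` comparisons the patch carries over from the existing metadump code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry; agblklog is log2 of agblocks rounded up. */
struct demo_geom {
	uint32_t	agcount;
	uint32_t	agblocks;
	uint8_t		agblklog;
};

/* Upper bits of a filesystem block number select the AG. */
static inline uint32_t demo_fsb_to_agno(const struct demo_geom *g,
					uint64_t fsb)
{
	return (uint32_t)(fsb >> g->agblklog);
}

/* Lower agblklog bits are the block offset within the AG. */
static inline uint32_t demo_fsb_to_agbno(const struct demo_geom *g,
					 uint64_t fsb)
{
	return (uint32_t)(fsb & ((1ULL << g->agblklog) - 1));
}

/* Return true if a btree pointer lands on a plausible block. */
static bool demo_ptr_ok(const struct demo_geom *g, uint64_t fsb)
{
	uint32_t agno = demo_fsb_to_agno(g, fsb);
	uint32_t agbno = demo_fsb_to_agbno(g, fsb);

	/* Block 0 of an AG holds the superblock, never a btree block. */
	return agbno != 0 && agbno < g->agblocks && agno < g->agcount;
}
```

A walker built on this would skip a bad pointer with a warning and continue with the next one, as the patch does, instead of aborting the whole dump.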
* [PATCH 31/41] xfs_repair: use realtime refcount btree data to check block types 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (28 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 28/41] xfs_db: copy the realtime refcount btree Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 25/41] xfs_db: support the realtime refcountbt Darrick J. Wong ` (10 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Use the realtime refcount btree to pre-populate the block type information so that when repair iterates the primary metadata, we can confirm the block type. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dinode.c | 152 ++++++++++++++++++++++++++++ repair/rmap.c | 9 ++ repair/rmap.h | 3 + repair/scan.c | 299 ++++++++++++++++++++++++++++++++++++++++++++++++++++++- repair/scan.h | 33 ++++++ 5 files changed, 490 insertions(+), 6 deletions(-) diff --git a/repair/dinode.c b/repair/dinode.c index b2c27984671..7722d7762d2 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -274,6 +274,8 @@ _("bad state in rt extent map %" PRIu64 "\n"), break; case XR_E_INUSE: case XR_E_MULT: + if (xfs_has_rtreflink(mp)) + break; set_rtbmap(ext, XR_E_MULT); break; case XR_E_FREE1: @@ -348,6 +350,8 @@ _("data fork in rt inode %" PRIu64 " found rt metadata extent %" PRIu64 " in rt return 1; case XR_E_INUSE: case XR_E_MULT: + if (xfs_has_rtreflink(mp)) + break; do_warn( _("data fork in rt inode %" PRIu64 " claims used rt extent %" PRIu64 "\n"), ino, b); @@ -1012,6 +1016,148 @@ _("bad rtrmap btree ptr 0x%" PRIx64 " in ino %" PRIu64 "\n"), return suspect ? 
1 : 0; } +/* + * return 1 if inode should be cleared, 0 otherwise + */ +static int +process_rtrefc( + struct xfs_mount *mp, + xfs_agnumber_t agno, + xfs_agino_t ino, + struct xfs_dinode *dip, + int type, + int *dirty, + xfs_rfsblock_t *tot, + uint64_t *nex, + blkmap_t **blkmapp, + int check_dups) +{ + struct refc_priv priv = { .nr_blocks = 0 }; + struct xfs_rtrefcount_root *dib; + xfs_rtrefcount_ptr_t *pp; + struct xfs_refcount_key *kp; + struct xfs_refcount_rec *rp; + char *forkname = get_forkname(XFS_DATA_FORK); + xfs_rgblock_t oldkey, key; + xfs_ino_t lino; + xfs_fsblock_t bno; + size_t droot_sz; + int i; + int level; + int numrecs; + int dmxr; + int suspect = 0; + int error; + + /* We rebuild the rtrefcountbt, so no need to process blocks again. */ + if (check_dups) { + *tot = be64_to_cpu(dip->di_nblocks); + return 0; + } + + lino = XFS_AGINO_TO_INO(mp, agno, ino); + + /* + * This refcount btree inode must be a metadata inode reachable via + * /realtime/$rgno.refcount in the metadata directory tree. 
+ */ + if (!(dip->di_flags2 & be64_to_cpu(XFS_DIFLAG2_METADATA))) { + do_warn( +_("rtrefcount inode %" PRIu64 " not flagged as metadata\n"), + lino); + return 1; + } + + priv.rgno = rtgroup_for_rtrefcount_inode(mp, ino); + if (priv.rgno == NULLRGNUMBER) { + do_warn( +_("could not associate refcount inode %" PRIu64 " with any rtgroup\n"), + lino); + return 1; + } + + dib = (struct xfs_rtrefcount_root *)XFS_DFORK_PTR(dip, XFS_DATA_FORK); + *tot = 0; + *nex = 0; + + level = be16_to_cpu(dib->bb_level); + numrecs = be16_to_cpu(dib->bb_numrecs); + + if (level > mp->m_rtrefc_maxlevels) { + do_warn( +_("bad level %d in inode %" PRIu64 " rtrefcount btree root block\n"), + level, lino); + return 1; + } + + /* + * use rtroot/dfork_dsize since the root block is in the data fork + */ + droot_sz = xfs_rtrefcount_droot_space_calc(level, numrecs); + if (droot_sz > XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK)) { + do_warn( +_("computed size of rtrefcountbt root (%zu bytes) is greater than space in " + "inode %" PRIu64 " %s fork\n"), + droot_sz, lino, forkname); + return 1; + } + + if (level == 0) { + rp = xfs_rtrefcount_droot_rec_addr(dib, 1); + error = process_rtrefc_reclist(mp, rp, numrecs, + &priv, "rtrefcountbt root"); + if (error) { + refcount_avoid_check(); + return 1; + } + return 0; + } + + dmxr = libxfs_rtrefcountbt_droot_maxrecs( + XFS_DFORK_SIZE(dip, mp, XFS_DATA_FORK), false); + pp = xfs_rtrefcount_droot_ptr_addr(dib, 1, dmxr); + + /* check for in-order keys */ + for (i = 0; i < numrecs; i++) { + kp = xfs_rtrefcount_droot_key_addr(dib, i + 1); + + key = be32_to_cpu(kp->rc_startblock); + if (i == 0) { + oldkey = key; + continue; + } + if (key < oldkey) { + do_warn( +_("out of order key %u in rtrefcount root ino %" PRIu64 "\n"), + i, lino); + suspect++; + continue; + } + oldkey = key; + } + + /* probe keys */ + for (i = 0; i < numrecs; i++) { + bno = get_unaligned_be64(&pp[i]); + + if (!libxfs_verify_fsbno(mp, bno)) { + do_warn( +_("bad rtrefcount btree ptr 0x%" PRIx64 " in ino 
%" PRIu64 "\n"), + bno, lino); + return 1; + } + + if (scan_lbtree(bno, level, scan_rtrefcbt, + type, XFS_DATA_FORK, lino, tot, nex, blkmapp, + NULL, 0, 1, check_dups, XFS_RTREFC_CRC_MAGIC, + &priv, &xfs_rtrefcountbt_buf_ops)) + return 1; + } + + *tot = priv.nr_blocks; + return suspect ? 1 : 0; +} + /* * return 1 if inode should be cleared, 0 otherwise */ @@ -1807,6 +1953,7 @@ check_dinode_mode_format( case XFS_DINODE_FMT_RMAP: case XFS_DINODE_FMT_EXTENTS: case XFS_DINODE_FMT_BTREE: + case XFS_DINODE_FMT_REFCOUNT: return 0; } return -1; @@ -2236,6 +2383,10 @@ process_inode_data_fork( err = process_rtrmap(mp, agno, ino, dino, type, dirty, totblocks, nextents, dblkmap, check_dups); break; + case XFS_DINODE_FMT_REFCOUNT: + err = process_rtrefc(mp, agno, ino, dino, type, dirty, + totblocks, nextents, dblkmap, check_dups); + break; case XFS_DINODE_FMT_DEV: err = 0; break; @@ -2296,6 +2447,7 @@ _("would have tried to rebuild inode %"PRIu64" data fork\n"), break; case XFS_DINODE_FMT_DEV: case XFS_DINODE_FMT_RMAP: + case XFS_DINODE_FMT_REFCOUNT: err = 0; break; default: diff --git a/repair/rmap.c b/repair/rmap.c index 15a3e2ecaec..69954b448ed 100644 --- a/repair/rmap.c +++ b/repair/rmap.c @@ -250,6 +250,15 @@ bool is_rtrmap_inode(xfs_ino_t ino) return bitmap_test(rmap_inodes, ino, 1); } +xfs_rgnumber_t +rtgroup_for_rtrefcount_inode( + struct xfs_mount *mp, + xfs_ino_t ino) +{ + /* This will be implemented later. */ + return NULLRGNUMBER; +} + /* * Initialize per-AG reverse map data. 
*/ diff --git a/repair/rmap.h b/repair/rmap.h index 64a85b32341..83331c825ec 100644 --- a/repair/rmap.h +++ b/repair/rmap.h @@ -66,4 +66,7 @@ bool is_rtrmap_inode(xfs_ino_t ino); xfs_ino_t rtgroup_rmap_ino(struct xfs_rtgroup *rtg); int populate_rtgroup_rmapbt(struct xfs_rtgroup *rtg, struct xfs_inode *ip); +xfs_rgnumber_t rtgroup_for_rtrefcount_inode(struct xfs_mount *mp, + xfs_ino_t ino); + #endif /* RMAP_H_ */ diff --git a/repair/scan.c b/repair/scan.c index c9209ebc3d7..0a37137f019 100644 --- a/repair/scan.c +++ b/repair/scan.c @@ -1752,12 +1752,6 @@ _("bad %s btree ptr 0x%llx in ino %" PRIu64 "\n"), return 0; } -struct refc_priv { - struct xfs_refcount_irec last_rec; - xfs_agblock_t nr_blocks; -}; - - static void scan_refcbt( struct xfs_btree_block *block, @@ -1995,6 +1989,299 @@ _("extent (%u/%u) len %u claimed, state is %d\n"), return; } + +int +process_rtrefc_reclist( + struct xfs_mount *mp, + struct xfs_refcount_rec *rp, + int numrecs, + struct refc_priv *refc_priv, + const char *name) +{ + xfs_rtblock_t lastblock = 0; + xfs_rtblock_t rtbno, next_rtbno; + int state; + int suspect = 0; + int i; + + for (i = 0; i < numrecs; i++) { + enum xfs_refc_domain domain; + xfs_rgblock_t b, rgbno, end; + xfs_extlen_t len; + xfs_nlink_t nr; + + b = rgbno = be32_to_cpu(rp[i].rc_startblock); + len = be32_to_cpu(rp[i].rc_blockcount); + nr = be32_to_cpu(rp[i].rc_refcount); + + if (b & XFS_REFC_COWFLAG) { + domain = XFS_REFC_DOMAIN_COW; + rgbno &= ~XFS_REFC_COWFLAG; + } else { + domain = XFS_REFC_DOMAIN_SHARED; + } + + if (domain == XFS_REFC_DOMAIN_COW && nr != 1) { + do_warn( +_("leftover rt CoW extent has incorrect refcount in record %u of %s\n"), + i, name); + suspect++; + } + if (nr == 1) { + if (domain != XFS_REFC_DOMAIN_COW) { + do_warn( +_("leftover rt CoW extent has invalid startblock in record %u of %s\n"), + i, name); + suspect++; + } + } + end = rgbno + len; + + rtbno = xfs_rgbno_to_rtb(mp, refc_priv->rgno, rgbno); + if (!libxfs_verify_rtbno(mp, rtbno)) { + 
do_warn( +_("invalid start block %llu in record %u of %s\n"), + (unsigned long long)b, i, name); + suspect++; + continue; + } + + next_rtbno = xfs_rgbno_to_rtb(mp, refc_priv->rgno, end); + if (len == 0 || end <= rgbno || + !libxfs_verify_rtbno(mp, next_rtbno - 1)) { + do_warn( +_("invalid length %llu in record %u of %s\n"), + (unsigned long long)len, i, name); + suspect++; + continue; + } + + if (nr == 1) { + xfs_rtxnum_t rtx, next_rtx; + + rtx = xfs_rtb_to_rtxt(mp, rtbno); + next_rtx = xfs_rtb_to_rtxt(mp, next_rtbno); + for (; rtx < next_rtx; rtx++) { + state = get_rtbmap(rtx); + switch (state) { + case XR_E_UNKNOWN: + case XR_E_COW: + do_warn( +_("leftover CoW rtextent (%llu)\n"), + (unsigned long long)rtx); + suspect++; + set_rtbmap(rtx, XR_E_FREE); + break; + default: + do_warn( +_("rtextent (%llu) claimed, state is %d\n"), + (unsigned long long)rtx, state); + suspect++; + break; + } + } + } else if (nr < 2 || nr > XFS_REFC_REFCOUNT_MAX) { + do_warn( +_("invalid rt reference count %u in record %u of %s\n"), + nr, i, name); + suspect++; + continue; + } + + if (b && b <= lastblock) { + do_warn(_( +"out-of-order %s btree record %d (%llu %llu) in %s\n"), + name, i, (unsigned long long)b, + (unsigned long long)len, name); + suspect++; + } else { + lastblock = end - 1; + } + + /* Is this record mergeable with the last one? */ + if (refc_priv->last_rec.rc_domain == domain && + refc_priv->last_rec.rc_startblock + + refc_priv->last_rec.rc_blockcount == rgbno && + refc_priv->last_rec.rc_refcount == nr) { + do_warn( +_("record %d of %s tree should be merged with previous record\n"), + i, name); + suspect++; + refc_priv->last_rec.rc_blockcount += len; + } else { + refc_priv->last_rec.rc_domain = domain; + refc_priv->last_rec.rc_startblock = rgbno; + refc_priv->last_rec.rc_blockcount = len; + refc_priv->last_rec.rc_refcount = nr; + } + + /* XXX: probably want to mark the reflinked areas? 
*/ + } + + return suspect; +} + +int +scan_rtrefcbt( + struct xfs_btree_block *block, + int level, + int type, + int whichfork, + xfs_fsblock_t fsbno, + xfs_ino_t ino, + xfs_rfsblock_t *tot, + uint64_t *nex, + struct blkmap **blkmapp, + bmap_cursor_t *bm_cursor, + int suspect, + int isroot, + int check_dups, + int *dirty, + uint64_t magic, + void *priv) +{ + const char *name = "rtrefcount"; + char rootname[256]; + int i; + xfs_rtrefcount_ptr_t *pp; + struct xfs_refcount_rec *rp; + struct refc_priv *refc_priv = priv; + int hdr_errors = 0; + int numrecs; + int state; + xfs_agnumber_t agno; + xfs_agblock_t agbno; + int error; + + agno = XFS_FSB_TO_AGNO(mp, fsbno); + agbno = XFS_FSB_TO_AGBNO(mp, fsbno); + + if (magic != XFS_RTREFC_CRC_MAGIC) { + name = "(unknown)"; + hdr_errors++; + suspect++; + goto out; + } + + if (be32_to_cpu(block->bb_magic) != magic) { + do_warn(_("bad magic # %#x in %s btree block %d/%d\n"), + be32_to_cpu(block->bb_magic), name, agno, + agbno); + hdr_errors++; + if (suspect) + goto out; + } + + if (be16_to_cpu(block->bb_level) != level) { + do_warn(_("expected level %d got %d in %s btree block %d/%d\n"), + level, be16_to_cpu(block->bb_level), name, + agno, agbno); + hdr_errors++; + if (suspect) + goto out; + } + + refc_priv->nr_blocks++; + + /* + * Check for btree blocks multiply claimed. We're going to regenerate + * the btree anyway, so mark the blocks as metadata so they get freed. 
+ */ + state = get_bmap(agno, agbno); + if (!(state == XR_E_UNKNOWN || state == XR_E_INUSE1)) { + do_warn( +_("%s btree block claimed (state %d), agno %d, agbno %d, suspect %d\n"), + name, state, agno, agbno, suspect); + goto out; + } + set_bmap(agno, agbno, XR_E_METADATA); + + numrecs = be16_to_cpu(block->bb_numrecs); + if (level == 0) { + if (numrecs > mp->m_rtrefc_mxr[0]) { + numrecs = mp->m_rtrefc_mxr[0]; + hdr_errors++; + } + if (isroot == 0 && numrecs < mp->m_rtrefc_mnr[0]) { + numrecs = mp->m_rtrefc_mnr[0]; + hdr_errors++; + } + + if (hdr_errors) { + do_warn( + _("bad btree nrecs (%u, min=%u, max=%u) in %s btree block %u/%u\n"), + be16_to_cpu(block->bb_numrecs), + mp->m_rtrefc_mnr[0], + mp->m_rtrefc_mxr[0], name, agno, agbno); + suspect++; + } + + rp = xfs_rtrefcount_rec_addr(block, 1); + snprintf(rootname, 256, "%s btree block %u/%u", name, agno, + agbno); + error = process_rtrefc_reclist(mp, rp, numrecs, refc_priv, + rootname); + if (error) + suspect++; + goto out; + } + + /* + * interior record + */ + pp = xfs_rtrefcount_ptr_addr(block, 1, mp->m_rtrefc_mxr[1]); + + if (numrecs > mp->m_rtrefc_mxr[1]) { + numrecs = mp->m_rtrefc_mxr[1]; + hdr_errors++; + } + if (isroot == 0 && numrecs < mp->m_rtrefc_mnr[1]) { + numrecs = mp->m_rtrefc_mnr[1]; + hdr_errors++; + } + + /* + * don't pass bogus tree flag down further if this block + * looked ok. bail out if two levels in a row look bad. 
+ */ + if (hdr_errors) { + do_warn( + _("bad btree nrecs (%u, min=%u, max=%u) in %s btree block %u/%u\n"), + be16_to_cpu(block->bb_numrecs), + mp->m_rtrefc_mnr[1], mp->m_rtrefc_mxr[1], name, + agno, agbno); + if (suspect) + goto out; + suspect++; + } else if (suspect) { + suspect = 0; + } + + for (i = 0; i < numrecs; i++) { + xfs_fsblock_t pbno = be64_to_cpu(pp[i]); + + if (!libxfs_verify_fsbno(mp, pbno)) { + do_warn( + _("bad btree pointer (%u) in %sbt block %u/%u\n"), + agbno, name, agno, agbno); + suspect++; + return 0; + } + + scan_lbtree(pbno, level, scan_rtrefcbt, type, whichfork, ino, + tot, nex, blkmapp, bm_cursor, suspect, 0, + check_dups, magic, refc_priv, + &xfs_rtrefcountbt_buf_ops); + } +out: + if (suspect) { + refcount_avoid_check(); + return 1; + } + + return 0; +} + /* * The following helpers are to help process and validate individual on-disk * inode btree records. We have two possible inode btrees with slightly diff --git a/repair/scan.h b/repair/scan.h index a624c882734..1643a2397ae 100644 --- a/repair/scan.h +++ b/repair/scan.h @@ -100,4 +100,37 @@ int scan_rtrmapbt( uint64_t magic, void *priv); +struct refc_priv { + struct xfs_refcount_irec last_rec; + xfs_agblock_t nr_blocks; + xfs_rgnumber_t rgno; +}; + +int +process_rtrefc_reclist( + struct xfs_mount *mp, + struct xfs_refcount_rec *rp, + int numrecs, + struct refc_priv *refc_priv, + const char *name); + +int +scan_rtrefcbt( + struct xfs_btree_block *block, + int level, + int type, + int whichfork, + xfs_fsblock_t bno, + xfs_ino_t ino, + xfs_rfsblock_t *tot, + uint64_t *nex, + struct blkmap **blkmapp, + bmap_cursor_t *bm_cursor, + int suspect, + int isroot, + int check_dups, + int *dirty, + uint64_t magic, + void *priv); + #endif /* _XR_SCAN_H */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
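The record-list checker in this patch applies a handful of per-record rules: a CoW staging record must have a refcount of exactly 1, a shared record must have a refcount of at least 2, records must not go backwards, and two adjacent records with the same domain and refcount should have been merged. A compact sketch of those rules follows. It is a simplification with hypothetical `demo_` names: the domain is a plain bool rather than a flag bit in the start block, ordering is only compared within one domain, and there is no rt bitmap state tracking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory refcount record. */
struct demo_refc_rec {
	uint32_t	start;
	uint32_t	len;
	uint32_t	refcount;
	bool		cow;	/* true: CoW staging domain */
};

/* Two records are mergeable if they abut and agree on domain and count. */
static bool demo_mergeable(const struct demo_refc_rec *prev,
			   const struct demo_refc_rec *rec)
{
	return prev->cow == rec->cow &&
	       (uint64_t)prev->start + prev->len == rec->start &&
	       prev->refcount == rec->refcount;
}

/* Return the number of problems found in a record list. */
static unsigned int demo_check_reclist(const struct demo_refc_rec *recs,
				       size_t nr)
{
	const struct demo_refc_rec *prev = NULL;
	uint64_t last_end = 0;
	unsigned int problems = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		const struct demo_refc_rec *r = &recs[i];

		if (r->len == 0) {
			problems++;	/* empty extent, skip it entirely */
			continue;
		}
		/* CoW staging extents have refcount 1, shared ones >= 2. */
		if (r->cow ? r->refcount != 1 : r->refcount < 2)
			problems++;
		/* Within a domain, records must not overlap or go backwards. */
		if (prev && prev->cow == r->cow && r->start < last_end)
			problems++;
		if (prev && demo_mergeable(prev, r))
			problems++;

		prev = r;
		last_end = (uint64_t)r->start + r->len;
	}
	return problems;
}
```

Returning a problem count rather than stopping at the first error mirrors how the patch accumulates `suspect` so that one pass reports every bad record.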
* [PATCH 25/41] xfs_db: support the realtime refcountbt
  2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong
                   ` (29 preceding siblings ...)
  2022-12-30 22:20 ` [PATCH 31/41] xfs_repair: use realtime refcount btree data to check block types Darrick J. Wong
@ 2022-12-30 22:20 ` Darrick J. Wong
  2022-12-30 22:20 ` [PATCH 27/41] xfs_db: support rudimentary checks of the rtrefcount btree Darrick J. Wong
                   ` (9 subsequent siblings)
  40 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Wire up various parts of xfs_db for realtime refcount support.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/btblock.c             |    3 +++
 db/btdump.c              |   11 ++++++++++-
 db/btheight.c            |    5 +++++
 libxfs/libxfs_api_defs.h |    1 +
 man/man8/xfs_db.8        |    1 +
 5 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/db/btblock.c b/db/btblock.c
index 0a581593a59..6ea146ba62a 100644
--- a/db/btblock.c
+++ b/db/btblock.c
@@ -159,6 +159,9 @@ block_to_bt(
 	case TYP_REFCBT:
 		magic = crc ? XFS_REFC_CRC_MAGIC : 0;
 		break;
+	case TYP_RTREFCBT:
+		magic = crc ? XFS_RTREFC_CRC_MAGIC : 0;
+		break;
 	default:
 		ASSERT(0);
 	}
diff --git a/db/btdump.c b/db/btdump.c
index 9c528e5a11a..31f32a8f7a5 100644
--- a/db/btdump.c
+++ b/db/btdump.c
@@ -447,7 +447,8 @@ is_btree_inode(void)
 	struct xfs_dinode	*dip;
 
 	dip = iocur_top->data;
-	return dip->di_format == XFS_DINODE_FMT_RMAP;
+	return dip->di_format == XFS_DINODE_FMT_RMAP ||
+	       dip->di_format == XFS_DINODE_FMT_REFCOUNT;
 }
 
 static int
@@ -457,6 +458,7 @@ dump_btree_inode(
 	char			*prefix;
 	struct xfs_dinode	*dip;
 	struct xfs_rtrmap_root	*rtrmap;
+	struct xfs_rtrefcount_root *rtrefc;
 	int			level;
 	int			numrecs;
 	int			ret;
@@ -469,6 +471,12 @@ dump_btree_inode(
 		level = be16_to_cpu(rtrmap->bb_level);
 		numrecs = be16_to_cpu(rtrmap->bb_numrecs);
 		break;
+	case XFS_DINODE_FMT_REFCOUNT:
+		prefix = "u3.rtrefcbt";
+		rtrefc = (struct xfs_rtrefcount_root *)XFS_DFORK_DPTR(dip);
+		level = be16_to_cpu(rtrefc->bb_level);
+		numrecs = be16_to_cpu(rtrefc->bb_numrecs);
+		break;
 	default:
 		dbprintf("Unknown metadata inode type %u\n", dip->di_format);
 		return 0;
@@ -550,6 +558,7 @@ btdump_f(
 	case TYP_BMAPBTA:
 	case TYP_BMAPBTD:
 	case TYP_RTRMAPBT:
+	case TYP_RTREFCBT:
 		return dump_btree_long(iflag);
 	case TYP_INODE:
 		if (is_btree_inode())
diff --git a/db/btheight.c b/db/btheight.c
index 25ce3400334..9dd21ddae9a 100644
--- a/db/btheight.c
+++ b/db/btheight.c
@@ -58,6 +58,11 @@ struct btmap {
 		.maxlevels	= libxfs_rtrmapbt_maxlevels_ondisk,
 		.maxrecs	= libxfs_rtrmapbt_maxrecs,
 	},
+	{
+		.tag		= "rtrefcountbt",
+		.maxlevels	= libxfs_rtrefcountbt_maxlevels_ondisk,
+		.maxrecs	= libxfs_rtrefcountbt_maxrecs,
+	},
 };
 
 static void
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 0ac00fca337..a1c6efd5ca9 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -252,6 +252,7 @@
 
 #define xfs_rtrefcountbt_create_path		libxfs_rtrefcountbt_create_path
 #define xfs_rtrefcountbt_droot_maxrecs		libxfs_rtrefcountbt_droot_maxrecs
+#define xfs_rtrefcountbt_maxlevels_ondisk	libxfs_rtrefcountbt_maxlevels_ondisk
 #define xfs_rtrefcountbt_maxrecs		libxfs_rtrefcountbt_maxrecs
 
 #define xfs_rtrmapbt_calc_reserves		libxfs_rtrmapbt_calc_reserves
diff --git a/man/man8/xfs_db.8 b/man/man8/xfs_db.8
index a277ea5e668..a694c8ed916 100644
--- a/man/man8/xfs_db.8
+++ b/man/man8/xfs_db.8
@@ -455,6 +455,7 @@ The supported btree types are:
 .IR bmapbt ,
 .IR refcountbt ,
 .IR rmapbt ,
+.IR rtrefcountbt ,
 and
 .IR rtrmapbt .
 The magic value

^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 27/41] xfs_db: support rudimentary checks of the rtrefcount btree 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (30 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 25/41] xfs_db: support the realtime refcountbt Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 34/41] xfs_repair: check existing realtime refcountbt entries against observed refcounts Darrick J. Wong ` (8 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Perform some fairly superficial checks of the rtrefcount btree. We'll do more sophisticated checks in xfs_repair, but provide enough of a spot-check here that we can do simple things. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- db/check.c | 254 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-- db/inode.c | 14 +++ db/inode.h | 1 3 files changed, 261 insertions(+), 8 deletions(-) diff --git a/db/check.c b/db/check.c index 6c92b961283..3a55c152eb1 100644 --- a/db/check.c +++ b/db/check.c @@ -59,6 +59,8 @@ typedef enum { DBM_COWDATA, DBM_RTSB, DBM_BTRTRMAP, + DBM_BTRTREFC, + DBM_RLRTDATA, DBM_NDBM } dbm_t; @@ -193,6 +195,8 @@ static const char *typename[] = { "cowdata", "rtsb", "btrtrmap", + "btrtrefc", + "rlrtdata", NULL }; @@ -268,7 +272,7 @@ static void check_linkcounts(xfs_agnumber_t agno); static int check_range(xfs_agnumber_t agno, xfs_agblock_t agbno, xfs_extlen_t len); static void check_rdbmap(xfs_rfsblock_t bno, xfs_extlen_t len, - dbm_t type); + dbm_t type, bool is_reflink); static int check_rinomap(xfs_rfsblock_t bno, xfs_extlen_t len, xfs_ino_t c_ino); static void check_rootdir(void); @@ -348,6 +352,9 @@ static xfs_ino_t process_sf_dir_v2(struct xfs_dinode *dip, int *dot, static void process_rtrmap(struct inodata *id, struct xfs_dinode *dip, xfs_rfsblock_t *toti); +static void 
process_rtrefc(struct inodata *id, + struct xfs_dinode *dip, + xfs_rfsblock_t *toti); static void quota_add(xfs_dqid_t *p, xfs_dqid_t *g, xfs_dqid_t *u, int dq, xfs_qcnt_t bc, xfs_qcnt_t ic, xfs_qcnt_t rc); @@ -379,6 +386,12 @@ static void scanfunc_rtrmap(struct xfs_btree_block *block, xfs_rfsblock_t *toti, xfs_extnum_t *nex, blkmap_t **blkmapp, int isroot, typnm_t btype); +static void scanfunc_rtrefc(struct xfs_btree_block *block, + int level, dbm_t type, xfs_fsblock_t bno, + inodata_t *id, xfs_rfsblock_t *totd, + xfs_rfsblock_t *toti, xfs_extnum_t *nex, + blkmap_t **blkmapp, int isroot, + typnm_t btype); static void scanfunc_bno(struct xfs_btree_block *block, int level, xfs_agf_t *agf, xfs_agblock_t bno, int isroot); @@ -1128,6 +1141,7 @@ blocktrash_f( (1ULL << DBM_BTFINO) | (1ULL << DBM_BTRMAP) | (1ULL << DBM_BTREFC) | + (1ULL << DBM_BTRTREFC) | (1ULL << DBM_SB); while ((c = getopt(argc, argv, "0123n:o:s:t:x:y:z")) != EOF) { switch (c) { @@ -1562,7 +1576,8 @@ static void check_rdbmap( xfs_rfsblock_t bno, xfs_extlen_t len, - dbm_t type) + dbm_t type, + bool ignore_reflink) { xfs_extlen_t i; char *p; @@ -1574,6 +1589,9 @@ check_rdbmap( error++; break; } + if (ignore_reflink && (*p == DBM_UNKNOWN || *p == DBM_RTDATA || + *p == DBM_RLRTDATA)) + continue; if ((dbm_t)*p != type) { if (!sflag || CHECK_BLIST(bno + i)) dbprintf(_("rtblock %llu expected type %s got " @@ -1600,6 +1618,8 @@ check_rinomap( bno, bno + len - 1, c_ino); return 0; } + if (xfs_has_rtreflink(mp)) + return 0; for (i = 0, rval = 1, idp = &inomap[mp->m_sb.sb_agcount][bno]; i < len; i++, idp++) { @@ -1740,6 +1760,26 @@ check_set_dbmap( } } +/* + * We don't check the accuracy of reference counts -- all we do is ensure + * that a data block never crosses with non-data blocks. repair can check + * those kinds of things. + * + * So with that in mind, if we're setting a block to be data or rldata, + * don't complain so long as the block is currently unknown, data, or rldata. 
+ * Don't let blocks downgrade from rldata -> data. + */ +static bool +is_rtreflink( + dbm_t type2) +{ + if (!xfs_has_rtreflink(mp)) + return false; + if (type2 == DBM_RTDATA || type2 == DBM_RLRTDATA) + return true; + return false; +} + static void check_set_rdbmap( xfs_rfsblock_t bno, @@ -1753,7 +1793,7 @@ check_set_rdbmap( if (!check_rrange(bno, len)) return; - check_rdbmap(bno, len, type1); + check_rdbmap(bno, len, type1, is_rtreflink(type2)); mayprint = verbose | blist_size; for (i = 0, p = &dbmap[mp->m_sb.sb_agcount][bno]; i < len; i++, p++) { if (!rdbmap_boundscheck(bno + i)) { @@ -2863,7 +2903,7 @@ process_inode( 0 /* type 15 unused */ }; static char *fmtnames[] = { - "dev", "local", "extents", "btree", "uuid", "rmap" + "dev", "local", "extents", "btree", "uuid", "rmap", "refcount" }; ino = XFS_AGINO_TO_INO(mp, be32_to_cpu(agf->agf_seqno), agino); @@ -2942,6 +2982,16 @@ process_inode( error++; return; } + } else if (is_rtrefcount_inode(ino)) { + if (!S_ISREG(mode) || dip->di_format != XFS_DINODE_FMT_REFCOUNT) { + if (v) + dbprintf( + _("bad format %d for rtrefc inode %lld type %#o\n"), + dip->di_format, (long long)ino, + mode & S_IFMT); + error++; + return; + } } else if ((((mode & S_IFMT) >> 12) > 15) || (!(okfmts[(mode & S_IFMT) >> 12] & (1 << dip->di_format)))) { if (v) @@ -3018,6 +3068,9 @@ process_inode( } else if (is_rtrmap_inode(id->ino)) { type = DBM_BTRTRMAP; blkmap = blkmap_alloc(be32_to_cpu(dip->di_nextents)); + } else if (is_rtrefcount_inode(id->ino)) { + type = DBM_BTRTREFC; + blkmap = blkmap_alloc(be32_to_cpu(dip->di_nextents)); } else type = DBM_DATA; @@ -3053,6 +3106,10 @@ process_inode( id->rgno = rtgroup_for_rtrmap_ino(mp, id->ino); process_rtrmap(id, dip, &totiblocks); break; + case XFS_DINODE_FMT_REFCOUNT: + id->rgno = rtgroup_for_rtrefcount_ino(mp, id->ino); + process_rtrefc(id, dip, &totiblocks); + break; } if (dip->di_forkoff) { sbversion |= XFS_SB_VERSION_ATTRBIT; @@ -3079,6 +3136,7 @@ process_inode( case DBM_RTSUM: case DBM_SYMLINK: 
case DBM_BTRTRMAP: + case DBM_BTRTREFC: case DBM_UNKNOWN: bc = totdblocks + totiblocks + atotdblocks + atotiblocks; @@ -3883,8 +3941,7 @@ process_rtrmap( i, be32_to_cpu(rp[i].rm_startblock), be32_to_cpu(rp[i].rm_startblock)); } else { - lastblock = be32_to_cpu(rp[i].rm_startblock) + - be32_to_cpu(rp[i].rm_blockcount); + lastblock = be32_to_cpu(rp[i].rm_startblock); } } return; @@ -3899,6 +3956,79 @@ process_rtrmap( } } +static void +process_rtrefc( + struct inodata *id, + struct xfs_dinode *dip, + xfs_rfsblock_t *toti) +{ + xfs_extnum_t nex = 0; + xfs_rfsblock_t totd = 0; + struct xfs_rtrefcount_root *dib; + int whichfork = XFS_DATA_FORK; + int i; + int maxrecs; + xfs_rtrefcount_ptr_t *pp; + + if (id->rgno == NULLRGNUMBER) { + dbprintf( + _("rt group for refcount ino %lld not found\n"), + id->ino); + error++; + return; + } + + dib = (struct xfs_rtrefcount_root *)XFS_DFORK_PTR(dip, whichfork); + if (be16_to_cpu(dib->bb_level) >= mp->m_rtrefc_maxlevels) { + if (!sflag || id->ilist) + dbprintf(_("level for ino %lld rtrefc root too " + "large (%u)\n"), + id->ino, + be16_to_cpu(dib->bb_level)); + error++; + return; + } + maxrecs = libxfs_rtrefcountbt_droot_maxrecs( + XFS_DFORK_SIZE(dip, mp, whichfork), + dib->bb_level == 0); + if (be16_to_cpu(dib->bb_numrecs) > maxrecs) { + if (!sflag || id->ilist) + dbprintf(_("numrecs for ino %lld rtrefc root too " + "large (%u)\n"), + id->ino, + be16_to_cpu(dib->bb_numrecs)); + error++; + return; + } + if (be16_to_cpu(dib->bb_level) == 0) { + struct xfs_refcount_rec *rp; + xfs_fsblock_t lastblock; + + rp = xfs_rtrefcount_droot_rec_addr(dib, 1); + lastblock = 0; + for (i = 0; i < be16_to_cpu(dib->bb_numrecs); i++) { + if (be32_to_cpu(rp[i].rc_startblock) < lastblock) { + dbprintf(_( + "out-of-order rtrefc btree record %d (%u %u) root\n"), + i, be32_to_cpu(rp[i].rc_startblock), + be32_to_cpu(rp[i].rc_startblock)); + } else { + lastblock = be32_to_cpu(rp[i].rc_startblock) + + be32_to_cpu(rp[i].rc_blockcount); + } + } + return; + } else 
{ + pp = xfs_rtrefcount_droot_ptr_addr(dib, 1, maxrecs); + for (i = 0; i < be16_to_cpu(dib->bb_numrecs); i++) + scan_lbtree(get_unaligned_be64(&pp[i]), + be16_to_cpu(dib->bb_level), + scanfunc_rtrefc, DBM_BTRTREFC, + id, &totd, toti, + &nex, NULL, 1, TYP_RTREFCBT); + } +} + static xfs_ino_t process_sf_dir_v2( struct xfs_dinode *dip, @@ -5056,8 +5186,7 @@ scanfunc_rtrmap( be32_to_cpu(rp[i].rm_blockcount), agno, bno, lastblock); } else { - lastblock = be32_to_cpu(rp[i].rm_startblock) + - be32_to_cpu(rp[i].rm_blockcount); + lastblock = be32_to_cpu(rp[i].rm_startblock); } } return; @@ -5173,6 +5302,115 @@ scanfunc_refcnt( TYP_REFCBT); } +static void +scanfunc_rtrefc( + struct xfs_btree_block *block, + int level, + dbm_t type, + xfs_fsblock_t bno, + inodata_t *id, + xfs_rfsblock_t *totd, + xfs_rfsblock_t *toti, + xfs_extnum_t *nex, + blkmap_t **blkmapp, + int isroot, + typnm_t btype) +{ + xfs_agblock_t agbno; + xfs_agnumber_t agno; + int i; + xfs_rtrefcount_ptr_t *pp; + struct xfs_refcount_rec *rp; + xfs_rtblock_t lastblock; + + agno = XFS_FSB_TO_AGNO(mp, bno); + agbno = XFS_FSB_TO_AGBNO(mp, bno); + if (be32_to_cpu(block->bb_magic) != XFS_RTREFC_CRC_MAGIC) { + dbprintf(_("bad magic # %#x in rtrefcbt block %u/%u\n"), + be32_to_cpu(block->bb_magic), agno, agbno); + serious_error++; + return; + } + if (be16_to_cpu(block->bb_level) != level) { + if (!sflag) + dbprintf(_("expected level %d got %d in rtrefcntbt block " + "%u/%u\n"), + level, be16_to_cpu(block->bb_level), agno, agbno); + error++; + } + set_dbmap(agno, agbno, 1, type, agno, agbno); + set_inomap(agno, agbno, 1, id); + (*toti)++; + if (level == 0) { + if (be16_to_cpu(block->bb_numrecs) > mp->m_rtrefc_mxr[0] || + (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_rtrefc_mnr[0])) { + dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in " + "rtrefcntbt block %u/%u\n"), + be16_to_cpu(block->bb_numrecs), mp->m_rtrefc_mnr[0], + mp->m_rtrefc_mxr[0], agno, agbno); + serious_error++; + return; + } + rp = 
xfs_rtrefcount_rec_addr(block, 1); + lastblock = 0; + for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++) { + xfs_rtblock_t rtbno; + + if (be32_to_cpu(rp[i].rc_refcount) == 1) { + xfs_fsblock_t bno; + char *msg; + + bno = be32_to_cpu(rp[i].rc_startblock); + if (bno & XFS_REFC_COWFLAG) { + bno &= ~XFS_REFC_COWFLAG; + msg = _( + "leftover rt CoW extent (%lu) len %u\n"); + } else { + msg = _( + "leftover rt CoW extent at unexpected address (%lu) len %lu\n"); + } + dbprintf(msg, + agbno, + be32_to_cpu(rp[i].rc_blockcount)); + rtbno = xfs_rgbno_to_rtb(mp, id->rgno, bno); + set_rdbmap(rtbno, + be32_to_cpu(rp[i].rc_blockcount), + DBM_COWDATA); + } else { + rtbno = xfs_rgbno_to_rtb(mp, id->rgno, + be32_to_cpu(rp[i].rc_startblock)); + set_rdbmap(rtbno, + be32_to_cpu(rp[i].rc_blockcount), + DBM_RLRTDATA); + } + if (be32_to_cpu(rp[i].rc_startblock) < lastblock) { + dbprintf(_( + "out-of-order rt refcnt btree record %d (%llu %llu) block %llu\n"), + i, be32_to_cpu(rp[i].rc_startblock), + be32_to_cpu(rp[i].rc_startblock), + bno); + } else { + lastblock = be32_to_cpu(rp[i].rc_startblock) + + be32_to_cpu(rp[i].rc_blockcount); + } + } + return; + } + if (be16_to_cpu(block->bb_numrecs) > mp->m_rtrefc_mxr[1] || + (isroot == 0 && be16_to_cpu(block->bb_numrecs) < mp->m_rtrefc_mnr[1])) { + dbprintf(_("bad btree nrecs (%u, min=%u, max=%u) in rtrefcntbt " + "block %u/%u\n"), + be16_to_cpu(block->bb_numrecs), mp->m_rtrefc_mnr[1], + mp->m_rtrefc_mxr[1], agno, agbno); + serious_error++; + return; + } + pp = xfs_rtrefcount_ptr_addr(block, 1, mp->m_rtrefc_mxr[1]); + for (i = 0; i < be16_to_cpu(block->bb_numrecs); i++) + scan_lbtree(be64_to_cpu(pp[i]), level, scanfunc_rtrefc, + type, id, totd, toti, nex, blkmapp, 0, btype); +} + static void set_dbmap( xfs_agnumber_t agno, diff --git a/db/inode.c b/db/inode.c index af56e615e08..ffcf25d0c70 100644 --- a/db/inode.c +++ b/db/inode.c @@ -642,6 +642,7 @@ inode_init(void) struct rtgroup_inodes { xfs_ino_t rmap_ino; + xfs_ino_t refcount_ino; }; static 
struct rtgroup_inodes *rtgroup_inodes; @@ -700,6 +701,7 @@ set_rtgroup_refcount_inode( if (rtino == NULLFSINO) return EFSCORRUPTED; + rtgroup_inodes[rgno].refcount_ino = rtino; return bitmap_set(refcount_inodes, rtino, 1); } @@ -760,6 +762,18 @@ bool is_rtrefcount_inode(xfs_ino_t ino) return bitmap_test(refcount_inodes, ino, 1); } +xfs_rgnumber_t rtgroup_for_rtrefcount_ino(struct xfs_mount *mp, xfs_ino_t ino) +{ + unsigned int i; + + for (i = 0; i < mp->m_sb.sb_rgcount; i++) { + if (rtgroup_inodes[i].refcount_ino == ino) + return i; + } + + return NULLRGNUMBER; +} + typnm_t inode_next_type(void) { diff --git a/db/inode.h b/db/inode.h index 666bb5201ea..c789017e0c8 100644 --- a/db/inode.h +++ b/db/inode.h @@ -28,3 +28,4 @@ int init_rtmeta_inode_bitmaps(struct xfs_mount *mp); bool is_rtrmap_inode(xfs_ino_t ino); xfs_rgnumber_t rtgroup_for_rtrmap_ino(struct xfs_mount *mp, xfs_ino_t ino); bool is_rtrefcount_inode(xfs_ino_t ino); +xfs_rgnumber_t rtgroup_for_rtrefcount_ino(struct xfs_mount *mp, xfs_ino_t ino); ^ permalink raw reply related [flat|nested] 565+ messages in thread
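The comment above is_rtreflink in this patch describes a relaxed rule for realtime data blocks: when rtreflink is enabled and a block is being claimed as data or shared data, the check does not complain as long as the block is currently unknown, data, or shared data, and a shared block should not be downgraded to plain data. The sketch below restates that rule as two small pure functions. The enum and function names are hypothetical stand-ins for the DBM_* types and the check_rdbmap/check_set_rdbmap pair. The merge function illustrates the "no downgrade" sentence from the comment only; it is not a copy of code from the patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for DBM_UNKNOWN, DBM_RTDATA, DBM_RLRTDATA, etc. */
enum blk_type {
	BLK_UNKNOWN,
	BLK_RTDATA,	/* plain realtime data */
	BLK_RLRTDATA,	/* shared (reflinked) realtime data */
	BLK_METADATA,	/* anything that is not file data */
};

static bool
is_rt_data(enum blk_type t)
{
	return t == BLK_UNKNOWN || t == BLK_RTDATA || t == BLK_RLRTDATA;
}

/*
 * May a block whose current type is @cur be claimed as @want, given that
 * the caller expected to find @expected there?  With rtreflink enabled,
 * data and shared-data claims accept any data-like current type; in every
 * other case the current type must match the expectation exactly.
 */
bool
rt_claim_ok(enum blk_type cur, enum blk_type expected, enum blk_type want,
	    bool has_rtreflink)
{
	bool	relaxed = has_rtreflink &&
			  (want == BLK_RTDATA || want == BLK_RLRTDATA);

	if (relaxed && is_rt_data(cur))
		return true;
	return cur == expected;
}

/* Pick the type to record, never downgrading shared data to plain data. */
enum blk_type
rt_merge_type(enum blk_type cur, enum blk_type want)
{
	if (cur == BLK_RLRTDATA && want == BLK_RTDATA)
		return BLK_RLRTDATA;
	return want;
}
```

Keeping the rule in pure functions like this makes the intent easy to test, although xfs_db itself operates directly on its dbmap byte arrays.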
* [PATCH 34/41] xfs_repair: check existing realtime refcountbt entries against observed refcounts 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (31 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 27/41] xfs_db: support rudimentary checks of the rtrefcount btree Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 36/41] xfs_repair: rebuild the realtime refcount btree Darrick J. Wong ` (7 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Once we've finished collecting reverse mapping observations from the metadata scan, check those observations against the realtime refcount btree (particularly if we're in -n mode) to detect rtrefcountbt problems. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 1 repair/agbtree.c | 2 repair/phase4.c | 11 +++ repair/rmap.c | 200 +++++++++++++++++++++++++++++++++++----------- repair/rmap.h | 4 + 5 files changed, 170 insertions(+), 48 deletions(-) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index a1c6efd5ca9..7f52993aee4 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -252,6 +252,7 @@ #define xfs_rtrefcountbt_create_path libxfs_rtrefcountbt_create_path #define xfs_rtrefcountbt_droot_maxrecs libxfs_rtrefcountbt_droot_maxrecs +#define xfs_rtrefcountbt_init_cursor libxfs_rtrefcountbt_init_cursor #define xfs_rtrefcountbt_maxlevels_ondisk libxfs_rtrefcountbt_maxlevels_ondisk #define xfs_rtrefcountbt_maxrecs libxfs_rtrefcountbt_maxrecs diff --git a/repair/agbtree.c b/repair/agbtree.c index e340e9cfc04..1eabce0104f 100644 --- a/repair/agbtree.c +++ b/repair/agbtree.c @@ -738,7 +738,7 @@ build_refcount_tree( { int error; - error = init_refcount_cursor(agno, &btr->slab_cursor); + error = init_refcount_cursor(false, agno, 
&btr->slab_cursor); if (error) do_error( _("Insufficient memory to construct refcount cursor.\n")); diff --git a/repair/phase4.c b/repair/phase4.c index e90533689e0..8d97b63b2ce 100644 --- a/repair/phase4.c +++ b/repair/phase4.c @@ -219,6 +219,15 @@ check_refcount_btrees( check_refcounts(wq->wq_ctx, agno); } +static void +check_rt_refcount_btrees( + struct workqueue *wq, + xfs_agnumber_t agno, + void *arg) +{ + check_rtrefcounts(wq->wq_ctx, agno); +} + static void process_rmap_data( struct xfs_mount *mp) @@ -251,6 +260,8 @@ process_rmap_data( queue_work(&wq, process_inode_reflink_flags, i, NULL); queue_work(&wq, check_refcount_btrees, i, NULL); } + for (i = 0; i < mp->m_sb.sb_rgcount; i++) + queue_work(&wq, check_rt_refcount_btrees, i, NULL); destroy_work_queue(&wq); } diff --git a/repair/rmap.c b/repair/rmap.c index 85fc05945c6..21062e4ac49 100644 --- a/repair/rmap.c +++ b/repair/rmap.c @@ -1883,10 +1883,11 @@ refcount_record_count( */ int init_refcount_cursor( + bool isrt, xfs_agnumber_t agno, struct xfs_slab_cursor **cur) { - struct xfs_ag_rmap *x = rmaps_for_group(false, agno); + struct xfs_ag_rmap *x = rmaps_for_group(isrt, agno); return init_slab_cursor(x->ar_refcount_items, NULL, cur); } @@ -1911,56 +1912,18 @@ refcount_avoid_check( refcbt_suspect = true; } -/* - * Compare the observed reference counts against what's in the ag btree. 
- */ -void -check_refcounts( - struct xfs_mount *mp, +static int +check_refcount_records( + struct xfs_slab_cursor *rl_cur, + struct xfs_btree_cur *bt_cur, xfs_agnumber_t agno) { struct xfs_refcount_irec tmp; - struct xfs_slab_cursor *rl_cur; - struct xfs_btree_cur *bt_cur = NULL; - struct xfs_buf *agbp = NULL; - struct xfs_perag *pag = NULL; struct xfs_refcount_irec *rl_rec; - int have; int i; + int have; int error; - if (!xfs_has_reflink(mp) || add_reflink) - return; - if (refcbt_suspect) { - if (no_modify && agno == 0) - do_warn(_("would rebuild corrupt refcount btrees.\n")); - return; - } - - /* Create cursors to refcount structures */ - error = init_refcount_cursor(agno, &rl_cur); - if (error) { - do_warn(_("Not enough memory to check refcount data.\n")); - return; - } - - pag = libxfs_perag_get(mp, agno); - error = -libxfs_alloc_read_agf(pag, NULL, 0, &agbp); - if (error) { - do_warn(_("Could not read AGF %u to check refcount btree.\n"), - agno); - goto err_pag; - } - - /* Leave the per-ag data "uninitialized" since we rewrite it later */ - pag->pagf_init = 0; - - bt_cur = libxfs_refcountbt_init_cursor(mp, NULL, agbp, pag); - if (!bt_cur) { - do_warn(_("Not enough memory to check refcount data.\n")); - goto err_agf; - } - rl_rec = pop_slab_cursor(rl_cur); while (rl_rec) { /* Look for a refcount record in the btree */ @@ -1971,7 +1934,7 @@ check_refcounts( do_warn( _("Could not read reference count record for (%u/%u).\n"), agno, rl_rec->rc_startblock); - goto err_cur; + return error; } if (!have) { do_warn( @@ -1986,7 +1949,7 @@ _("Missing reference count record for (%u/%u) len %u count %u\n"), do_warn( _("Could not read reference count record for (%u/%u).\n"), agno, rl_rec->rc_startblock); - goto err_cur; + return error; } if (!i) { do_warn( @@ -2016,6 +1979,63 @@ _("Incorrect reference count: saw (%u/%u) len %u nlinks %u; should be (%u/%u) le rl_rec = pop_slab_cursor(rl_cur); } + return 0; +} + +/* + * Compare the observed reference counts against what's in 
the ag btree. + */ +void +check_refcounts( + struct xfs_mount *mp, + xfs_agnumber_t agno) +{ + struct xfs_slab_cursor *rl_cur; + struct xfs_btree_cur *bt_cur = NULL; + struct xfs_buf *agbp = NULL; + struct xfs_perag *pag = NULL; + int error; + + if (!xfs_has_reflink(mp) || add_reflink) + return; + if (refcbt_suspect) { + if (no_modify && agno == 0) + do_warn(_("would rebuild corrupt refcount btrees.\n")); + return; + } + + /* Create cursors to refcount structures */ + error = init_refcount_cursor(false, agno, &rl_cur); + if (error) { + do_warn(_("Not enough memory to check refcount data.\n")); + return; + } + + pag = libxfs_perag_get(mp, agno); + error = -libxfs_alloc_read_agf(pag, NULL, 0, &agbp); + if (error) { + do_warn( +_("Could not read AGF %u to check refcount btree.\n"), + agno); + goto err_pag; + } + + /* + * Leave the per-ag data "uninitialized" since we rewrite it + * later. + */ + pag->pagf_init = 0; + + bt_cur = libxfs_refcountbt_init_cursor(mp, NULL, agbp, pag); + if (!bt_cur) { + do_warn(_("Not enough memory to check refcount data.\n")); + goto err_agf; + } + + error = check_refcount_records(rl_cur, bt_cur, agno); + if (error) + goto err_cur; + err_cur: libxfs_btree_del_cursor(bt_cur, error); err_agf: @@ -2025,6 +2045,94 @@ _("Incorrect reference count: saw (%u/%u) len %u nlinks %u; should be (%u/%u) le free_slab_cursor(&rl_cur); } +/* + * Compare the observed reference counts against what's in the ondisk btree. 
+ */ +void +check_rtrefcounts( + struct xfs_mount *mp, + xfs_rgnumber_t rgno) +{ + struct xfs_slab_cursor *rl_cur; + struct xfs_btree_cur *bt_cur = NULL; + struct xfs_rtgroup *rtg = NULL; + struct xfs_inode *ip = NULL; + struct xfs_ag_rmap *ar = rmaps_for_group(true, rgno); + int error; + + if (!xfs_has_reflink(mp) || add_reflink) + return; + if (refcbt_suspect) { + if (no_modify && rgno == 0) + do_warn(_("would rebuild corrupt refcount btrees.\n")); + return; + } + if (mp->m_sb.sb_rblocks == 0) { + if (rmap_record_count(mp, true, rgno) != 0) + do_error(_("realtime refcounts but no rtdev?\n")); + return; + } + + /* Create cursors to refcount structures */ + error = init_refcount_cursor(true, rgno, &rl_cur); + if (error) { + do_warn(_("Not enough memory to check refcount data.\n")); + return; + } + + rtg = libxfs_rtgroup_get(mp, rgno); + if (!rtg) { + do_warn(_("Could not load rtgroup %u.\n"), rgno); + goto err_rcur; + } + + error = -libxfs_imeta_iget(mp, ar->rg_refcount_ino, + XFS_DIR3_FT_REG_FILE, &ip); + if (error) { + do_warn( +_("Cannot load rtgroup %u refcount inode 0x%llx, error %d.\n"), + rgno, + (unsigned long long)ar->rg_refcount_ino, + error); + goto err_rtg; + } + + if (ip->i_df.if_format != XFS_DINODE_FMT_REFCOUNT) { + do_warn( +_("rtgroup %u refcount inode has wrong format 0x%x, expected 0x%x\n"), + rgno, + ip->i_df.if_format, + XFS_DINODE_FMT_REFCOUNT); + goto err_ino; + } + + if (xfs_inode_has_attr_fork(ip)) { + do_warn( +_("rtgroup %u refcount inode should not have extended attributes\n"), + rgno); + goto err_ino; + } + + bt_cur = libxfs_rtrefcountbt_init_cursor(mp, NULL, rtg, ip); + if (!bt_cur) { + do_warn(_("Not enough memory to check refcount data.\n")); + goto err_ino; + } + + error = check_refcount_records(rl_cur, bt_cur, rgno); + if (error) + goto err_cur; + +err_cur: + libxfs_btree_del_cursor(bt_cur, error); +err_ino: + libxfs_imeta_irele(ip); +err_rtg: + libxfs_rtgroup_put(rtg); +err_rcur: + free_slab_cursor(&rl_cur); +} + /* * Regenerate 
the AGFL so that we don't run out of it while rebuilding the * rmap btree. If skip_rmapbt is true, don't update the rmapbt (most probably diff --git a/repair/rmap.h b/repair/rmap.h index 4d20d90812b..9e7a4968588 100644 --- a/repair/rmap.h +++ b/repair/rmap.h @@ -40,9 +40,11 @@ extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec, int compute_refcounts(struct xfs_mount *mp, bool isrt, xfs_agnumber_t agno); uint64_t refcount_record_count(struct xfs_mount *mp, xfs_agnumber_t agno); -extern int init_refcount_cursor(xfs_agnumber_t, struct xfs_slab_cursor **); +int init_refcount_cursor(bool isrt, xfs_agnumber_t agno, + struct xfs_slab_cursor **pcur); extern void refcount_avoid_check(struct xfs_mount *mp); void check_refcounts(struct xfs_mount *mp, xfs_agnumber_t agno); +void check_rtrefcounts(struct xfs_mount *mp, xfs_rgnumber_t rgno); extern void record_inode_reflink_flag(struct xfs_mount *, struct xfs_dinode *, xfs_agnumber_t, xfs_agino_t, xfs_ino_t); ^ permalink raw reply related [flat|nested] 565+ messages in thread
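The patch above walks the observed refcount records and, for each one, looks up the on-disk btree record at or before the same start block, then compares start, length, and count. The same idea can be sketched over two sorted arrays, with a binary search standing in for the btree less-than-or-equal lookup. Everything here (struct refc_rec, refc_lookup_le, refc_count_mismatches) is a hypothetical simplification: the real code uses slab and btree cursors, and it reports missing and incorrect records separately rather than returning a single count.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified refcount record. */
struct refc_rec {
	uint32_t	startblock;
	uint32_t	blockcount;
	uint32_t	refcount;
};

/*
 * Return the last record in the sorted array @recs whose startblock is
 * <= @bno, or NULL if there is none.  This imitates a btree "lookup le".
 */
const struct refc_rec *
refc_lookup_le(const struct refc_rec *recs, size_t nr, uint32_t bno)
{
	size_t	lo = 0, hi = nr;

	while (lo < hi) {
		size_t	mid = lo + (hi - lo) / 2;

		if (recs[mid].startblock <= bno)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo ? &recs[lo - 1] : NULL;
}

/*
 * Count observed records that have no exactly matching on-disk record.
 * Both arrays must be sorted by startblock.
 */
size_t
refc_count_mismatches(const struct refc_rec *observed, size_t nr_obs,
		      const struct refc_rec *ondisk, size_t nr_disk)
{
	size_t	bad = 0;
	size_t	i;

	for (i = 0; i < nr_obs; i++) {
		const struct refc_rec	*d;

		d = refc_lookup_le(ondisk, nr_disk, observed[i].startblock);
		if (!d ||
		    d->startblock != observed[i].startblock ||
		    d->blockcount != observed[i].blockcount ||
		    d->refcount != observed[i].refcount)
			bad++;
	}
	return bad;
}
```

A nonzero result corresponds to the cases where xfs_repair would warn about a missing or incorrect reference count record for the group.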
* [PATCH 36/41] xfs_repair: rebuild the realtime refcount btree 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (32 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 34/41] xfs_repair: check existing realtime refcountbt entries against observed refcounts Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 35/41] xfs_repair: reject unwritten shared extents Darrick J. Wong ` (6 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Use the collected reference count information to rebuild the btree. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 4 + repair/Makefile | 1 repair/agbtree.c | 2 repair/phase6.c | 115 ++++++++++++++++++++ repair/rmap.c | 12 ++ repair/rmap.h | 5 + repair/rtrefcount_repair.c | 248 ++++++++++++++++++++++++++++++++++++++++++++ 7 files changed, 384 insertions(+), 3 deletions(-) create mode 100644 repair/rtrefcount_repair.c diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 7f52993aee4..5c7396a53a6 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -250,11 +250,15 @@ #define xfs_rtgroup_update_secondary_sbs libxfs_rtgroup_update_secondary_sbs #define xfs_rtgroup_update_super libxfs_rtgroup_update_super +#define xfs_rtrefcountbt_absolute_maxlevels libxfs_rtrefcountbt_absolute_maxlevels +#define xfs_rtrefcountbt_commit_staged_btree libxfs_rtrefcountbt_commit_staged_btree +#define xfs_rtrefcountbt_create libxfs_rtrefcountbt_create #define xfs_rtrefcountbt_create_path libxfs_rtrefcountbt_create_path #define xfs_rtrefcountbt_droot_maxrecs libxfs_rtrefcountbt_droot_maxrecs #define xfs_rtrefcountbt_init_cursor libxfs_rtrefcountbt_init_cursor #define xfs_rtrefcountbt_maxlevels_ondisk libxfs_rtrefcountbt_maxlevels_ondisk #define xfs_rtrefcountbt_maxrecs 
libxfs_rtrefcountbt_maxrecs +#define xfs_rtrefcountbt_stage_cursor libxfs_rtrefcountbt_stage_cursor #define xfs_rtrmapbt_calc_reserves libxfs_rtrmapbt_calc_reserves #define xfs_rtrmapbt_commit_staged_btree libxfs_rtrmapbt_commit_staged_btree diff --git a/repair/Makefile b/repair/Makefile index c7e09732800..0f2b05c4532 100644 --- a/repair/Makefile +++ b/repair/Makefile @@ -70,6 +70,7 @@ CFILES = \ rcbag.c \ rmap.c \ rt.c \ + rtrefcount_repair.c \ rtrmap_repair.c \ sb.c \ scan.c \ diff --git a/repair/agbtree.c b/repair/agbtree.c index 1eabce0104f..80d6d6710ce 100644 --- a/repair/agbtree.c +++ b/repair/agbtree.c @@ -721,7 +721,7 @@ init_refc_cursor( /* Compute how many blocks we'll need. */ error = -libxfs_btree_bload_compute_geometry(btr->cur, &btr->bload, - refcount_record_count(sc->mp, agno)); + refcount_record_count(sc->mp, false, agno)); if (error) do_error( _("Unable to compute refcount btree geometry, error %d.\n"), error); diff --git a/repair/phase6.c b/repair/phase6.c index 890bb20bce1..bb123919932 100644 --- a/repair/phase6.c +++ b/repair/phase6.c @@ -1096,6 +1096,120 @@ _("rtgroup %u rmap btree could not be rebuilt, error %d\n"), libxfs_imeta_irele(ip); } +static void +ensure_rtgroup_refcountbt( + struct xfs_rtgroup *rtg) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_trans *tp; + struct xfs_imeta_path *path; + struct xfs_inode *ip; + struct xfs_imeta_update upd; + xfs_ino_t ino; + int error; + + if (!xfs_has_rtreflink(mp)) + return; + + ino = rtgroup_refcount_ino(rtg); + if (no_modify) { + if (ino == NULLFSINO) + do_warn(_("would reset rtgroup %u refcount btree\n"), + rtg->rtg_rgno); + return; + } + + if (ino == NULLFSINO) + do_warn(_("resetting rtgroup %u refcount btree\n"), + rtg->rtg_rgno); + + error = -libxfs_rtrefcountbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + do_error( +_("Couldn't create rtgroup %u refcount file path, err %d\n"), + rtg->rtg_rgno, error); + + error = ensure_imeta_dirpath(mp, path); + if (error) + do_error( 
+_("Couldn't create rtgroup %u metadata directory, error %d\n"),
+				rtg->rtg_rgno, error);
+
+	error = -libxfs_imeta_start_update(mp, path, &upd);
+	if (error)
+		do_error(
+_("Couldn't find rtgroup %u refcountbt parent, error %d\n"),
+				rtg->rtg_rgno, error);
+
+	/* Create a transaction for whatever work we end up doing. */
+	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_imeta_create,
+			libxfs_imeta_create_space_res(mp), 0, 0, &tp);
+	if (error)
+		do_error(
+_("Couldn't prepare to attach rtgroup %u refcountbt inode, error %d\n"),
+				rtg->rtg_rgno, error);
+
+	if (ino != NULLFSINO) {
+		/*
+		 * We're still hanging on to our old inode, so try to grab it
+		 * so that we can reconnect it to the metadata directory tree.
+		 */
+		error = -libxfs_imeta_iget(mp, ino, XFS_DIR3_FT_REG_FILE, &ip);
+		if (error) {
+			do_warn(
+_("Couldn't iget rtgroup %u refcountbt inode 0x%llx, error %d\n"),
+					rtg->rtg_rgno, (unsigned long long)ino,
+					error);
+			goto zap;
+		}
+
+		error = -libxfs_imeta_link(tp, path, ip, &upd);
+		if (error)
+			do_error(
+_("Failed to link rtgroup %u refcountbt inode 0x%llx, error %d\n"),
+					rtg->rtg_rgno, (unsigned long long)ino,
+					error);
+
+		set_nlink(VFS_I(ip), 1);
+		ip->i_df.if_format = XFS_DINODE_FMT_REFCOUNT;
+		libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
+	} else {
+zap:
+		/*
+		 * The rtrefcount inode was bad or gone, so just make a new one
+		 * and give our reference to the rtgroup structure.
+		 */
+		error = -libxfs_rtrefcountbt_create(&tp, path, &upd, &ip);
+		if (error)
+			do_error(
+_("Couldn't create rtgroup %u refcountbt inode, error %d\n"),
+					rtg->rtg_rgno, error);
+	}
+
+	error = -libxfs_trans_commit(tp);
+	if (error)
+		do_error(
+_("Couldn't commit new rtgroup %u refcountbt inode %llu, error %d\n"),
+				rtg->rtg_rgno, (unsigned long long)ip->i_ino,
+				error);
+
+	/* Mark the inode in use.
*/ + mark_ino_inuse(mp, ip->i_ino, S_IFREG, upd.dp->i_ino); + mark_ino_metadata(mp, ip->i_ino); + libxfs_imeta_end_update(mp, &upd, error); + + /* Copy our incore refcount data to the ondisk refcount inode. */ + error = populate_rtgroup_refcountbt(rtg, ip); + if (error) + do_error( +_("rtgroup %u refcount btree could not be rebuilt, error %d\n"), + rtg->rtg_rgno, error); + + libxfs_imeta_free_path(path); + libxfs_imeta_irele(ip); +} + /* Initialize a root directory. */ static int init_fs_root_dir( @@ -3756,6 +3870,7 @@ reset_rt_metadata_inodes( for_each_rtgroup(mp, rgno, rtg) { ensure_rtgroup_rmapbt(rtg); + ensure_rtgroup_refcountbt(rtg); } } diff --git a/repair/rmap.c b/repair/rmap.c index 21062e4ac49..0c7faa9ad08 100644 --- a/repair/rmap.c +++ b/repair/rmap.c @@ -1871,9 +1871,10 @@ _("Unable to fix reflink flag on inode %"PRIu64".\n"), uint64_t refcount_record_count( struct xfs_mount *mp, + bool isrt, xfs_agnumber_t agno) { - struct xfs_ag_rmap *x = rmaps_for_group(false, agno); + struct xfs_ag_rmap *x = rmaps_for_group(isrt, agno); return slab_count(x->ar_refcount_items); } @@ -2219,3 +2220,12 @@ rtgroup_rmap_ino( return ar->rg_rmap_ino; } + +xfs_ino_t +rtgroup_refcount_ino( + struct xfs_rtgroup *rtg) +{ + struct xfs_ag_rmap *ar = rmaps_for_group(true, rtg->rtg_rgno); + + return ar->rg_refcount_ino; +} diff --git a/repair/rmap.h b/repair/rmap.h index 9e7a4968588..7e23e650246 100644 --- a/repair/rmap.h +++ b/repair/rmap.h @@ -39,7 +39,8 @@ extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec, struct xfs_rmap_irec *key); int compute_refcounts(struct xfs_mount *mp, bool isrt, xfs_agnumber_t agno); -uint64_t refcount_record_count(struct xfs_mount *mp, xfs_agnumber_t agno); +uint64_t refcount_record_count(struct xfs_mount *mp, bool isrt, + xfs_agnumber_t agno); int init_refcount_cursor(bool isrt, xfs_agnumber_t agno, struct xfs_slab_cursor **pcur); extern void refcount_avoid_check(struct xfs_mount *mp); @@ -71,5 +72,7 @@ int populate_rtgroup_rmapbt(struct 
xfs_rtgroup *rtg, struct xfs_inode *ip); xfs_rgnumber_t rtgroup_for_rtrefcount_inode(struct xfs_mount *mp, xfs_ino_t ino); bool is_rtrefcount_ino(xfs_ino_t ino); +xfs_ino_t rtgroup_refcount_ino(struct xfs_rtgroup *rtg); +int populate_rtgroup_refcountbt(struct xfs_rtgroup *rtg, struct xfs_inode *ip); #endif /* RMAP_H_ */ diff --git a/repair/rtrefcount_repair.c b/repair/rtrefcount_repair.c new file mode 100644 index 00000000000..04f61f1bbc4 --- /dev/null +++ b/repair/rtrefcount_repair.c @@ -0,0 +1,248 @@ +// SPDX-License-Identifier: GPL-2.0-or-later +/* + * Copyright (C) 2022 Oracle. All Rights Reserved. + * Author: Darrick J. Wong <djwong@kernel.org> + */ +#include <libxfs.h> +#include "btree.h" +#include "err_protos.h" +#include "libxlog.h" +#include "incore.h" +#include "globals.h" +#include "dinode.h" +#include "slab.h" +#include "rmap.h" +#include "bulkload.h" + +/* + * Realtime Reference Count (RTREFCBT) Repair + * ========================================== + * + * Gather all the reference count records for the realtime device, reset the + * incore fork, then recreate the btree. + */ +struct xrep_rtrefc { + /* rtrefcbt slab cursor */ + struct xfs_slab_cursor *slab_cursor; + + /* New fork. */ + struct bulkload new_fork_info; + struct xfs_btree_bload rtrefc_bload; + + struct repair_ctx *sc; + struct xfs_rtgroup *rtg; +}; + +/* Retrieve rtrefc data for bulk load. 
*/ +STATIC int +xrep_rtrefc_get_records( + struct xfs_btree_cur *cur, + unsigned int idx, + struct xfs_btree_block *block, + unsigned int nr_wanted, + void *priv) +{ + struct xfs_refcount_irec *rec; + struct xrep_rtrefc *rc = priv; + union xfs_btree_rec *block_rec; + unsigned int loaded; + + for (loaded = 0; loaded < nr_wanted; loaded++, idx++) { + rec = pop_slab_cursor(rc->slab_cursor); + memcpy(&cur->bc_rec.rc, rec, sizeof(struct xfs_refcount_irec)); + + block_rec = libxfs_btree_rec_addr(cur, idx, block); + cur->bc_ops->init_rec_from_cur(cur, block_rec); + } + + return loaded; +} + +/* Feed one of the new btree blocks to the bulk loader. */ +STATIC int +xrep_rtrefc_claim_block( + struct xfs_btree_cur *cur, + union xfs_btree_ptr *ptr, + void *priv) +{ + struct xrep_rtrefc *rr = priv; + + return bulkload_claim_block(cur, &rr->new_fork_info, ptr); +} + +/* Figure out how much space we need to create the incore btree root block. */ +STATIC size_t +xrep_rtrefc_iroot_size( + struct xfs_btree_cur *cur, + unsigned int level, + unsigned int nr_this_level, + void *priv) +{ + return xfs_rtrefcount_broot_space_calc(cur->bc_mp, level, + nr_this_level); +} + +/* Reserve new btree blocks and bulk load all the rtrmap records. */ +STATIC int +xrep_rtrefc_btree_load( + struct xrep_rtrefc *rr, + struct xfs_btree_cur *rtrmap_cur) +{ + struct repair_ctx *sc = rr->sc; + int error; + + rr->rtrefc_bload.get_records = xrep_rtrefc_get_records; + rr->rtrefc_bload.claim_block = xrep_rtrefc_claim_block; + rr->rtrefc_bload.iroot_size = xrep_rtrefc_iroot_size; + bulkload_estimate_inode_slack(sc->mp, &rr->rtrefc_bload); + + /* Compute how many blocks we'll need. 
*/ + error = -libxfs_btree_bload_compute_geometry(rtrmap_cur, + &rr->rtrefc_bload, + refcount_record_count(sc->mp, true, rr->rtg->rtg_rgno)); + if (error) + return error; + + /* + * Guess how many blocks we're going to need to rebuild an entire + * rtrefcountbt from the number of extents we found, and pump up our + * transaction to have sufficient block reservation. + */ + error = -libxfs_trans_reserve_more(sc->tp, rr->rtrefc_bload.nr_blocks, + 0); + if (error) + return error; + + /* + * Reserve the space we'll need for the new btree. Drop the cursor + * while we do this because that can roll the transaction and cursors + * can't handle that. + */ + error = bulkload_alloc_blocks(&rr->new_fork_info, + rr->rtrefc_bload.nr_blocks); + if (error) + return error; + + /* Add all observed rtrmap records. */ + error = init_refcount_cursor(true, rr->rtg->rtg_rgno, &rr->slab_cursor); + if (error) + return error; + error = -libxfs_btree_bload(rtrmap_cur, &rr->rtrefc_bload, rr); + free_slab_cursor(&rr->slab_cursor); + return error; +} + +/* Update the inode counters. */ +STATIC int +xrep_rtrefc_reset_counters( + struct xrep_rtrefc *rr) +{ + struct repair_ctx *sc = rr->sc; + + /* + * Update the inode block counts to reflect the btree we just + * generated. + */ + sc->ip->i_nblocks = rr->new_fork_info.ifake.if_blocks; + libxfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE); + + /* Quotas don't exist so we're done. */ + return 0; +} + +/* + * Use the collected rmap information to stage a new rt refcount btree. If + * this is successful we'll return with the new btree root information logged + * to the repair transaction but not yet committed. 
+ */ +static int +xrep_rtrefc_build_new_tree( + struct xrep_rtrefc *rr) +{ + struct xfs_owner_info oinfo; + struct xfs_btree_cur *cur; + struct repair_ctx *sc = rr->sc; + struct xbtree_ifakeroot *ifake = &rr->new_fork_info.ifake; + int error; + + /* + * Prepare to construct the new fork by initializing the new btree + * structure and creating a fake ifork in the ifakeroot structure. + */ + libxfs_rmap_ino_bmbt_owner(&oinfo, sc->ip->i_ino, XFS_DATA_FORK); + bulkload_init_inode(&rr->new_fork_info, sc, XFS_DATA_FORK, &oinfo); + cur = libxfs_rtrefcountbt_stage_cursor(sc->mp, rr->rtg, sc->ip, ifake); + + /* + * Figure out the size and format of the new fork, then fill it with + * all the rtrmap records we've found. Join the inode to the + * transaction so that we can roll the transaction while holding the + * inode locked. + */ + libxfs_trans_ijoin(sc->tp, sc->ip, 0); + ifake->if_fork->if_format = XFS_DINODE_FMT_REFCOUNT; + error = xrep_rtrefc_btree_load(rr, cur); + if (error) + goto err_cur; + + /* + * Install the new fork in the inode. After this point the old mapping + * data are no longer accessible and the new tree is live. We delete + * the cursor immediately after committing the staged root because the + * staged fork might be in extents format. + */ + libxfs_rtrefcountbt_commit_staged_btree(cur, sc->tp); + libxfs_btree_del_cursor(cur, 0); + + /* Reset the inode counters now that we've changed the fork. */ + error = xrep_rtrefc_reset_counters(rr); + if (error) + goto err_newbt; + + /* Dispose of any unused blocks and the accounting information. */ + bulkload_destroy(&rr->new_fork_info, error); + + return -libxfs_trans_roll_inode(&sc->tp, sc->ip); +err_cur: + if (cur) + libxfs_btree_del_cursor(cur, error); +err_newbt: + bulkload_destroy(&rr->new_fork_info, error); + return error; +} + +/* Store the realtime reference counts in the rtrefcbt. 
*/ +int +populate_rtgroup_refcountbt( + struct xfs_rtgroup *rtg, + struct xfs_inode *ip) +{ + struct repair_ctx sc = { + .mp = rtg->rtg_mount, + .ip = ip, + }; + struct xrep_rtrefc rr = { + .sc = &sc, + .rtg = rtg, + }; + struct xfs_mount *mp = rtg->rtg_mount; + int error; + + if (!xfs_has_rtreflink(mp)) + return 0; + + error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, + &sc.tp); + if (error) + return error; + + error = xrep_rtrefc_build_new_tree(&rr); + if (error) + goto out_cancel; + + return -libxfs_trans_commit(sc.tp); + +out_cancel: + libxfs_trans_cancel(sc.tp); + return error; +} ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 35/41] xfs_repair: reject unwritten shared extents 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (33 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 36/41] xfs_repair: rebuild the realtime refcount btree Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 33/41] xfs_repair: compute refcount data for the realtime groups Darrick J. Wong ` (5 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> We don't allow sharing of unwritten extents, which means that repair should reject an unwritten extent if someone else has already claimed the space. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dinode.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/repair/dinode.c b/repair/dinode.c index 1322f31d47e..1ef93d7aa8e 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -277,7 +277,8 @@ _("bad state in rt extent map %" PRIu64 "\n"), break; case XR_E_INUSE: case XR_E_MULT: - if (xfs_has_rtreflink(mp)) + if (xfs_has_rtreflink(mp) && + irec->br_state == XFS_EXT_NORM) break; set_rtbmap(ext, XR_E_MULT); break; @@ -353,8 +354,14 @@ _("data fork in rt inode %" PRIu64 " found rt metadata extent %" PRIu64 " in rt return 1; case XR_E_INUSE: case XR_E_MULT: - if (xfs_has_rtreflink(mp)) - break; + if (xfs_has_rtreflink(mp)) { + if (irec->br_state == XFS_EXT_NORM) + break; + do_warn( +_("data fork in rt inode %" PRIu64 " claims shared unwritten rt extent %" PRIu64 "\n"), + ino, b); + return 1; + } do_warn( _("data fork in rt inode %" PRIu64 " claims used rt extent %" PRIu64 "\n"), ino, b); @@ -671,8 +678,14 @@ _("%s fork in inode %" PRIu64 " claims metadata block %" PRIu64 "\n"), case XR_E_INUSE: case XR_E_MULT: if (type == XR_INO_DATA && - xfs_has_reflink(mp)) - break; + xfs_has_reflink(mp)) { 
+ if (irec.br_state == XFS_EXT_NORM) + break; + do_warn( +_("%s fork in %s inode %" PRIu64 " claims shared unwritten block %" PRIu64 "\n"), + forkname, ftype, ino, b); + goto done; + } do_warn( _("%s fork in %s inode %" PRIu64 " claims used block %" PRIu64 "\n"), forkname, ftype, ino, b); ^ permalink raw reply related [flat|nested] 565+ messages in thread
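The rule in patch 35 comes down to a small decision table: a block that already has an owner may be claimed again only when reflink is enabled and the new extent is written. Here is a minimal standalone sketch of that table. Every name in it (the enums and check_block_claim) is a hypothetical stand-in for illustration, not repair's XR_E_* states or its actual functions, and it leaves out the fork-type check that the data-device hunk also applies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for XFS_EXT_NORM and XFS_EXT_UNWRITTEN. */
enum ext_state { EXT_NORM, EXT_UNWRITTEN };

/* Hypothetical stand-ins for repair's per-block states. */
enum blk_state { BLK_FREE, BLK_INUSE, BLK_MULT };

enum claim_verdict {
	CLAIM_FIRST_OWNER,	/* nobody owned the block before this claim */
	CLAIM_SHARED,		/* written extent sharing an owned block */
	CLAIM_BAD_UNWRITTEN,	/* unwritten extent over an owned block */
	CLAIM_BAD_USED,		/* second claim on a non-reflink filesystem */
};

/* Decide what to do with a fork mapping that claims one block. */
static enum claim_verdict
check_block_claim(bool has_reflink, enum blk_state prev, enum ext_state state)
{
	assert(prev == BLK_FREE || prev == BLK_INUSE || prev == BLK_MULT);

	if (prev == BLK_FREE)
		return CLAIM_FIRST_OWNER;
	if (!has_reflink)
		return CLAIM_BAD_USED;
	/* Sharing is only legal for written extents. */
	if (state == EXT_NORM)
		return CLAIM_SHARED;
	return CLAIM_BAD_UNWRITTEN;
}
```

In this sketch an unwritten claim over an owned block is always an error, with or without reflink. That matches the patch, where the new check only changes which warning gets printed before bailing out.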
* [PATCH 33/41] xfs_repair: compute refcount data for the realtime groups 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (34 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 35/41] xfs_repair: reject unwritten shared extents Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 39/41] xfs_repair: allow sysadmins to add realtime reflink Darrick J. Wong ` (4 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> At the end of phase 4, compute reference count information for realtime groups from the realtime rmap information collected, just like we do for AGs in the data section. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/phase4.c | 19 ++++++++++++++++++- repair/rmap.c | 17 ++++++++++++----- repair/rmap.h | 2 +- 3 files changed, 31 insertions(+), 7 deletions(-) diff --git a/repair/phase4.c b/repair/phase4.c index b0cb805f30c..e90533689e0 100644 --- a/repair/phase4.c +++ b/repair/phase4.c @@ -173,13 +173,28 @@ compute_ag_refcounts( { int error; - error = compute_refcounts(wq->wq_ctx, agno); + error = compute_refcounts(wq->wq_ctx, false, agno); if (error) do_error( _("%s while computing reference count records.\n"), strerror(error)); } +static void +compute_rt_refcounts( + struct workqueue*wq, + xfs_agnumber_t rgno, + void *arg) +{ + int error; + + error = compute_refcounts(wq->wq_ctx, true, rgno); + if (error) + do_error( +_("%s while computing realtime reference count records.\n"), + strerror(error)); +} + static void process_inode_reflink_flags( struct workqueue *wq, @@ -227,6 +242,8 @@ process_rmap_data( create_work_queue(&wq, mp, platform_nproc()); for (i = 0; i < mp->m_sb.sb_agcount; i++) queue_work(&wq, compute_ag_refcounts, i, NULL); + for (i = 0; i < mp->m_sb.sb_rgcount; i++) + queue_work(&wq, compute_rt_refcounts, i, 
NULL); destroy_work_queue(&wq); create_work_queue(&wq, mp, platform_nproc()); diff --git a/repair/rmap.c b/repair/rmap.c index 9394720a1b6..85fc05945c6 100644 --- a/repair/rmap.c +++ b/repair/rmap.c @@ -137,6 +137,11 @@ rmaps_init_rt( if (error) goto nomem; + error = init_slab(&ag_rmap->ar_refcount_items, + sizeof(struct xfs_refcount_irec)); + if (error) + goto nomem; + ag_rmap->rg_rmap_ino = NULLFSINO; ag_rmap->rg_refcount_ino = NULLFSINO; return; @@ -1061,6 +1066,7 @@ mark_reflink_inodes( static void refcount_emit( struct xfs_mount *mp, + bool isrt, xfs_agnumber_t agno, xfs_agblock_t agbno, xfs_extlen_t len, @@ -1070,7 +1076,7 @@ refcount_emit( int error; struct xfs_slab *rlslab; - rlslab = rmaps_for_group(false, agno)->ar_refcount_items; + rlslab = rmaps_for_group(isrt, agno)->ar_refcount_items; ASSERT(nr_rmaps > 0); dbg_printf("REFL: agno=%u pblk=%u, len=%u -> refcount=%zu\n", @@ -1189,6 +1195,7 @@ refcount_push_rmaps_at( int compute_refcounts( struct xfs_mount *mp, + bool isrt, xfs_agnumber_t agno) { struct rcbag *rcstack; @@ -1204,12 +1211,12 @@ compute_refcounts( if (!xfs_has_reflink(mp)) return 0; - if (rmaps_for_group(false, agno)->ar_xfbtree == NULL) + if (rmaps_for_group(isrt, agno)->ar_xfbtree == NULL) return 0; - nr_rmaps = rmap_record_count(mp, false, agno); + nr_rmaps = rmap_record_count(mp, isrt, agno); - error = rmap_init_mem_cursor(mp, NULL, false, agno, &rmcur); + error = rmap_init_mem_cursor(mp, NULL, isrt, agno, &rmcur); if (error) return error; @@ -1266,7 +1273,7 @@ compute_refcounts( ASSERT(nbno > cbno); if (rcbag_count(rcstack) != old_stack_height) { if (old_stack_height > 1) { - refcount_emit(mp, agno, cbno, + refcount_emit(mp, isrt, agno, cbno, nbno - cbno, old_stack_height); } diff --git a/repair/rmap.h b/repair/rmap.h index 4f49b19062c..4d20d90812b 100644 --- a/repair/rmap.h +++ b/repair/rmap.h @@ -38,7 +38,7 @@ extern int64_t rmap_diffkeys(struct xfs_rmap_irec *kp1, extern void rmap_high_key_from_rec(struct xfs_rmap_irec *rec, struct 
xfs_rmap_irec *key); -extern int compute_refcounts(struct xfs_mount *, xfs_agnumber_t); +int compute_refcounts(struct xfs_mount *mp, bool isrt, xfs_agnumber_t agno); uint64_t refcount_record_count(struct xfs_mount *mp, xfs_agnumber_t agno); extern int init_refcount_cursor(xfs_agnumber_t, struct xfs_slab_cursor **); extern void refcount_avoid_check(struct xfs_mount *mp); ^ permalink raw reply related [flat|nested] 565+ messages in thread
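For readers who haven't looked at how refcount records fall out of rmap records, here is a rough standalone sketch of the idea behind patch 33. It is not repair's algorithm: compute_refcounts() walks an in-memory rmap btree with an rcbag, whereas this toy version sorts extent edges and sweeps across them. It does follow the same emit rule as the hunk above, which records a segment only when the owner count changes and the old count was greater than one. All of the types and names below are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Simplified reverse mapping: one owner claims [start, start + len). */
struct rmap_ext { unsigned int start; unsigned int len; };

/* Simplified refcount record: count owners share [start, start + len). */
struct refc_ext { unsigned int start; unsigned int len; unsigned int count; };

/* An extent start (+1) or end (-1) at a given block. */
struct edge { unsigned int pos; int delta; };

static int
edge_cmp(const void *a, const void *b)
{
	const struct edge *x = a, *y = b;

	if (x->pos != y->pos)
		return x->pos < y->pos ? -1 : 1;
	return x->delta - y->delta;	/* ends sort before starts */
}

/*
 * Emit one record per run of blocks with more than one owner.  Returns the
 * number of records stored in @out, or -1 if memory or @max_out ran out.
 */
static int
sketch_compute_refcounts(const struct rmap_ext *rm, size_t nr,
		struct refc_ext *out, size_t max_out)
{
	struct edge *edges;
	size_t i, nedges = 0, nout = 0;
	unsigned int seg_start = 0;
	int height = 0;

	if (nr == 0)
		return 0;
	edges = malloc(2 * nr * sizeof(*edges));
	if (!edges)
		return -1;
	for (i = 0; i < nr; i++) {
		assert(rm[i].len > 0);
		edges[nedges++] = (struct edge){ rm[i].start, 1 };
		edges[nedges++] = (struct edge){ rm[i].start + rm[i].len, -1 };
	}
	qsort(edges, nedges, sizeof(*edges), edge_cmp);

	i = 0;
	while (i < nedges) {
		unsigned int pos = edges[i].pos;
		int old_height = height;

		/* Apply every edge at this block before deciding anything. */
		while (i < nedges && edges[i].pos == pos)
			height += edges[i++].delta;
		if (height == old_height)
			continue;	/* same owner count, keep extending */
		if (old_height > 1) {
			if (nout == max_out) {
				free(edges);
				return -1;
			}
			out[nout++] = (struct refc_ext){ seg_start,
					pos - seg_start,
					(unsigned int)old_height };
		}
		seg_start = pos;
	}
	free(edges);
	return (int)nout;
}
```

With three mappings covering blocks [0,10), [5,15) and [5,8), the sweep produces two records: blocks 5-7 with three owners and blocks 8-9 with two. Segments with a single owner produce nothing, since only shared space needs a refcount record.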
* [PATCH 39/41] xfs_repair: allow sysadmins to add realtime reflink 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (35 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 33/41] xfs_repair: compute refcount data for the realtime groups Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 38/41] xfs_repair: validate CoW extent size hint on rtinherit directories Darrick J. Wong ` (3 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Allow the sysadmin to use xfs_repair to upgrade an existing filesystem to support the realtime reference count btree, and therefore reflink on realtime volumes. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/libxfs_api_defs.h | 1 + repair/phase2.c | 75 +++++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 74 insertions(+), 2 deletions(-) diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h index 5c7396a53a6..63607cf2b94 100644 --- a/libxfs/libxfs_api_defs.h +++ b/libxfs/libxfs_api_defs.h @@ -251,6 +251,7 @@ #define xfs_rtgroup_update_super libxfs_rtgroup_update_super #define xfs_rtrefcountbt_absolute_maxlevels libxfs_rtrefcountbt_absolute_maxlevels +#define xfs_rtrefcountbt_calc_reserves libxfs_rtrefcountbt_calc_reserves #define xfs_rtrefcountbt_commit_staged_btree libxfs_rtrefcountbt_commit_staged_btree #define xfs_rtrefcountbt_create libxfs_rtrefcountbt_create #define xfs_rtrefcountbt_create_path libxfs_rtrefcountbt_create_path diff --git a/repair/phase2.c b/repair/phase2.c index 35c1214be9a..ded281e4b88 100644 --- a/repair/phase2.c +++ b/repair/phase2.c @@ -242,14 +242,19 @@ set_reflink( exit(0); } - if (xfs_has_realtime(mp)) { - printf(_("Reflink feature not supported with realtime.\n")); + if (xfs_has_realtime(mp) && !xfs_has_rtgroups(mp)) { + printf(_("Reference count 
btree requires realtime groups.\n")); exit(0); } printf(_("Adding reflink support to filesystem.\n")); new_sb->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_REFLINK; new_sb->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR; + + /* Quota counts will be wrong once we add the refcount inodes. */ + if (xfs_has_realtime(mp)) + quotacheck_skip(); + return true; } @@ -462,6 +467,55 @@ reserve_rtrmap_inode( return -libxfs_imeta_resv_init_inode(rtg->rtg_rmapip, ask); } +/* + * Reserve space to handle rt refcount btree expansion. + * + * If the refcount inode for this group already exists, we assume that we're + * adding some other feature. Note that we have not validated the metadata + * directory tree, so we must perform the lookup by hand and abort the upgrade + * if there are errors. If the inode does not exist, the amount of space + * needed to handle a new maximally sized refcount btree is added to @new_resv. + */ +static int +reserve_rtrefcount_inode( + struct xfs_rtgroup *rtg, + xfs_rfsblock_t *new_resv) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_imeta_path *path; + xfs_ino_t ino; + xfs_filblks_t ask; + int error; + + if (!xfs_has_rtreflink(mp)) + return 0; + + error = -libxfs_rtrefcountbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + return error; + + ask = libxfs_rtrefcountbt_calc_reserves(mp); + + error = -libxfs_imeta_lookup(mp, path, &ino); + libxfs_imeta_free_path(path); + if (error == EFSCORRUPTED) { + if (ask > mp->m_sb.sb_fdblocks) + return ENOSPC; + + *new_resv += ask; + return 0; + } + if (error) + return error; + + error = -libxfs_imeta_iget(mp, ino, XFS_DIR3_FT_REG_FILE, + &rtg->rtg_refcountip); + if (error) + return error; + + return -libxfs_imeta_resv_init_inode(rtg->rtg_refcountip, ask); +} + static void check_fs_free_space( struct xfs_mount *mp, @@ -561,6 +615,18 @@ _("Not enough free space would remain for rtgroup %u rmap inode.\n"), do_error( _("Error %d while checking rtgroup %u rmap inode space reservation.\n"), 
rtg->rtg_rgno, error); + + error = reserve_rtrefcount_inode(rtg, &new_resv); + if (error == ENOSPC) { + printf( +_("Not enough free space would remain for rtgroup %u refcount inode.\n"), + rtg->rtg_rgno); + exit(0); + } + if (error) + do_error( +_("Error %d while checking rtgroup %u refcount inode space reservation.\n"), + rtg->rtg_rgno, error); } /* @@ -581,6 +647,11 @@ _("Error %d while checking rtgroup %u rmap inode space reservation.\n"), libxfs_imeta_irele(rtg->rtg_rmapip); rtg->rtg_rmapip = NULL; } + if (rtg->rtg_refcountip) { + libxfs_imeta_resv_free_inode(rtg->rtg_refcountip); + libxfs_imeta_irele(rtg->rtg_refcountip); + rtg->rtg_refcountip = NULL; + } } /* ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 38/41] xfs_repair: validate CoW extent size hint on rtinherit directories 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (36 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 39/41] xfs_repair: allow sysadmins to add realtime reflink Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 37/41] xfs_repair: allow realtime files to have the reflink flag set Darrick J. Wong ` (2 subsequent siblings) 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> XFS allows a sysadmin to change the rt extent size when adding a rt section to a filesystem after formatting. If there are any directories with both a cowextsize hint and rtinherit set, the hint could become misaligned with the new rextsize. Offer to fix the problem if we're in modify mode and the verifier didn't trip. If we're in dry run mode, we let the kernel fix it. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- repair/dinode.c | 64 +++++++++++++++++++++++++++++++++++++------------------ 1 file changed, 43 insertions(+), 21 deletions(-) diff --git a/repair/dinode.c b/repair/dinode.c index 09cef18f2e9..db049415af4 100644 --- a/repair/dinode.c +++ b/repair/dinode.c @@ -2795,6 +2795,47 @@ _("Bad extent size hint %u on inode %" PRIu64 ", "), } } +static void +validate_cowextsize( + struct xfs_mount *mp, + struct xfs_dinode *dino, + xfs_ino_t lino, + int *dirty) +{ + uint16_t flags = be16_to_cpu(dino->di_flags); + uint64_t flags2 = be64_to_cpu(dino->di_flags2); + unsigned int value = be32_to_cpu(dino->di_cowextsize); + bool misaligned = false; + bool bad; + + /* + * XFS allows a sysadmin to change the rt extent size when adding a + * rt section to a filesystem after formatting. 
If there are any + * directories with both a cowextsize hint and rtinherit set, the + * hint could become misaligned with the new rextsize. + */ + if ((flags2 & XFS_DIFLAG2_COWEXTSIZE) && + (flags & XFS_DIFLAG_RTINHERIT) && + value % mp->m_sb.sb_rextsize > 0) + misaligned = true; + + /* Complain if the verifier fails. */ + bad = libxfs_inode_validate_cowextsize(mp, value, + be16_to_cpu(dino->di_mode), flags, flags2) != NULL; + if (bad || misaligned) { + do_warn( +_("Bad CoW extent size hint %u on inode %" PRIu64 ", "), + be32_to_cpu(dino->di_cowextsize), lino); + if (!no_modify) { + do_warn(_("resetting to zero\n")); + dino->di_flags2 &= ~cpu_to_be64(XFS_DIFLAG2_COWEXTSIZE); + dino->di_cowextsize = 0; + *dirty = 1; + } else + do_warn(_("would reset to zero\n")); + } +} + /* * returns 0 if the inode is ok, 1 if the inode is corrupt * check_dups can be set to 1 *only* when called by the @@ -3372,27 +3413,8 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"), validate_extsize(mp, dino, lino, dirty); - /* - * Only (regular files and directories) with COWEXTSIZE flags - * set can have extsize set. - */ - if (dino->di_version >= 3 && - libxfs_inode_validate_cowextsize(mp, - be32_to_cpu(dino->di_cowextsize), - be16_to_cpu(dino->di_mode), - be16_to_cpu(dino->di_flags), - be64_to_cpu(dino->di_flags2)) != NULL) { - do_warn( -_("Bad CoW extent size %u on inode %" PRIu64 ", "), - be32_to_cpu(dino->di_cowextsize), lino); - if (!no_modify) { - do_warn(_("resetting to zero\n")); - dino->di_flags2 &= ~cpu_to_be64(XFS_DIFLAG2_COWEXTSIZE); - dino->di_cowextsize = 0; - *dirty = 1; - } else - do_warn(_("would reset to zero\n")); - } + if (dino->di_version >= 3) + validate_cowextsize(mp, dino, lino, dirty); /* nsec fields cannot be larger than 1 billion */ check_nsec("atime", lino, dino, &dino->di_atime, dirty); ^ permalink raw reply related [flat|nested] 565+ messages in thread
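The extra misalignment test in validate_cowextsize() is simple modular arithmetic on top of two flag checks. The sketch below pulls it out on its own. The flag macros and the function name are hypothetical stand-ins rather than the real XFS_DIFLAG_* values, and the libxfs verifier call that the patch also runs is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bits standing in for XFS_DIFLAG_RTINHERIT and
 * XFS_DIFLAG2_COWEXTSIZE; the real values differ. */
#define SK_FLAG_RTINHERIT	(1U << 0)
#define SK_FLAG2_COWEXTSIZE	(1ULL << 0)

/*
 * A directory that hands its CoW hint down to realtime files needs a hint
 * that is a whole number of rt extents.  Both values are in fs blocks.
 */
static bool
sketch_cowextsize_misaligned(uint16_t flags, uint64_t flags2,
		uint32_t hint, uint32_t rextsize)
{
	assert(rextsize > 0);

	return (flags2 & SK_FLAG2_COWEXTSIZE) &&
	       (flags & SK_FLAG_RTINHERIT) &&
	       hint % rextsize != 0;
}
```

As an example of the scenario in the commit message, a hint of 12 blocks is fine while the rt extent size is 4 blocks, but becomes misaligned if a realtime section is later added with an extent size of 8 blocks.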
* [PATCH 37/41] xfs_repair: allow realtime files to have the reflink flag set
  2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong
                     ` (37 preceding siblings ...)
  2022-12-30 22:20   ` [PATCH 38/41] xfs_repair: validate CoW extent size hint on rtinherit directories Darrick J. Wong
@ 2022-12-30 22:20   ` Darrick J. Wong
  2022-12-30 22:20   ` [PATCH 40/41] mkfs: validate CoW extent size hint when rtinherit is set Darrick J. Wong
  2022-12-30 22:20   ` [PATCH 41/41] mkfs: enable reflink on the realtime device Darrick J. Wong
  40 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that we allow reflink on the realtime volume, allow that combination
of inode flags if the feature's enabled.  Note that we now allow inodes
to have rtinherit even if there's no realtime volume, since the kernel
has never restricted that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/dinode.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/repair/dinode.c b/repair/dinode.c
index 1ef93d7aa8e..09cef18f2e9 100644
--- a/repair/dinode.c
+++ b/repair/dinode.c
@@ -3186,7 +3186,8 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 		}
 
 		if ((flags2 & XFS_DIFLAG2_REFLINK) &&
-		    (flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT))) {
+		    !xfs_has_rtreflink(mp) &&
+		    (flags & XFS_DIFLAG_REALTIME)) {
 			if (!uncertain) {
 				do_warn(
 	_("Cannot have a reflinked realtime inode %" PRIu64 "\n"),
@@ -3218,7 +3219,8 @@ _("bad (negative) size %" PRId64 " on inode %" PRIu64 "\n"),
 		}
 
 		if ((flags2 & XFS_DIFLAG2_COWEXTSIZE) &&
-		    (flags & (XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT))) {
+		    !xfs_has_rtreflink(mp) &&
+		    (flags & XFS_DIFLAG_REALTIME)) {
 			if (!uncertain) {
 				do_warn(
 	_("Cannot have CoW extent size hint on a realtime inode %" PRIu64 "\n"),

^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 40/41] mkfs: validate CoW extent size hint when rtinherit is set 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (38 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 37/41] xfs_repair: allow realtime files to have the reflink flag set Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 41/41] mkfs: enable reflink on the realtime device Darrick J. Wong 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Extent size hints exist to nudge the behavior of the file data block allocator towards trying to make aligned allocations. Therefore, it doesn't make sense to allow a hint that isn't a multiple of the fundamental allocation unit for a given file. This means that if the sysadmin is formatting with rtinherit set on the root dir, validate_cowextsize_hint needs to check the hint value on a simulated realtime file to make sure that it's correct. This hasn't been necessary in the past since one cannot have a CoW hint without a reflink filesystem, and we previously didn't allow rt reflink filesystems. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- mkfs/xfs_mkfs.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c index eebcade7d1a..dcce3d0136e 100644 --- a/mkfs/xfs_mkfs.c +++ b/mkfs/xfs_mkfs.c @@ -2669,6 +2669,26 @@ _("illegal CoW extent size hint %lld, must be less than %u.\n"), min(XFS_MAX_BMBT_EXTLEN, mp->m_sb.sb_agblocks / 2)); usage(); } + + /* + * If the value is to be passed on to realtime files, revalidate with + * a realtime file so that we know the hint and flag that get passed on + * to realtime files will be correct. 
+ */ + if (!(cli->fsx.fsx_xflags & FS_XFLAG_RTINHERIT)) + return; + + fa = libxfs_inode_validate_cowextsize(mp, cli->fsx.fsx_cowextsize, + S_IFREG, XFS_DIFLAG_REALTIME, flags2); + + if (fa) { + fprintf(stderr, +_("illegal CoW extent size hint %lld, must be less than %u and a multiple of %u. %p\n"), + (long long)cli->fsx.fsx_cowextsize, + min(XFS_MAX_BMBT_EXTLEN, mp->m_sb.sb_agblocks / 2), + mp->m_sb.sb_rextsize, fa); + usage(); + } } /* Complain if this filesystem is not a supported configuration. */ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 41/41] mkfs: enable reflink on the realtime device 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong ` (39 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 40/41] mkfs: validate CoW extent size hint when rtinherit is set Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 40 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Allow the creation of filesystems with both reflink and realtime volumes enabled. For now we don't support a realtime extent size > 1. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/init.c | 4 ++-- mkfs/proto.c | 44 +++++++++++++++++++++++++++++++++++++++++ mkfs/xfs_mkfs.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++++++----- 3 files changed, 100 insertions(+), 7 deletions(-) diff --git a/libxfs/init.c b/libxfs/init.c index 40ebbbce39d..a4023f78655 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -448,9 +448,9 @@ rtmount_init( if (mp->m_sb.sb_rblocks == 0) return 0; - if (xfs_has_reflink(mp)) { + if (xfs_has_reflink(mp) && mp->m_sb.sb_rextsize > 1) { fprintf(stderr, - _("%s: Reflink not compatible with realtime device. Please try a newer xfsprogs.\n"), + _("%s: Reflink not compatible with realtime extent size > 1. Please try a newer xfsprogs.\n"), progname); return -1; } diff --git a/mkfs/proto.c b/mkfs/proto.c index 96eab25da45..c98568ca507 100644 --- a/mkfs/proto.c +++ b/mkfs/proto.c @@ -851,6 +851,48 @@ rtrmapbt_create( libxfs_imeta_free_path(path); } +/* Create the realtime refcount btree inode. 
*/ +static void +rtrefcountbt_create( + struct xfs_rtgroup *rtg) +{ + struct xfs_mount *mp = rtg->rtg_mount; + struct xfs_imeta_update upd; + struct xfs_imeta_path *path; + struct xfs_trans *tp; + int error; + + error = -libxfs_rtrefcountbt_create_path(mp, rtg->rtg_rgno, &path); + if (error) + fail( _("rtrefcount inode path creation failed"), error); + + error = -libxfs_imeta_ensure_dirpath(mp, path); + if (error) + fail(_("rtgroup allocation failed"), + error); + + error = -libxfs_imeta_start_update(mp, path, &upd); + if (error) + res_failed(error); + + error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_imeta_create, + libxfs_imeta_create_space_res(mp), 0, 0, &tp); + if (error) + res_failed(error); + + error = -libxfs_rtrefcountbt_create(&tp, path, &upd, + &rtg->rtg_refcountip); + if (error) + fail(_("rtrefcount inode creation failed"), error); + + error = -libxfs_trans_commit(tp); + if (error) + fail(_("rtrefcountbt commit failed"), error); + + libxfs_imeta_end_update(mp, &upd, 0); + libxfs_imeta_free_path(path); +} + /* Initialize block headers of rt free space files. 
*/ static int init_rtblock_headers( @@ -1033,6 +1075,8 @@ rtinit( for_each_rtgroup(mp, rgno, rtg) { if (xfs_has_rtrmapbt(mp)) rtrmapbt_create(rtg); + if (xfs_has_rtreflink(mp)) + rtrefcountbt_create(rtg); } if (mp->m_sb.sb_rbmblocks) { diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c index dcce3d0136e..e406fa6a5ea 100644 --- a/mkfs/xfs_mkfs.c +++ b/mkfs/xfs_mkfs.c @@ -2385,12 +2385,36 @@ _("inode btree counters not supported without finobt support\n")); } if (cli->xi->rtname) { - if (cli->sb_feat.reflink && cli_opt_set(&mopts, M_REFLINK)) { - fprintf(stderr, -_("reflink not supported with realtime devices\n")); - usage(); + if (cli->rtextsize && cli->sb_feat.reflink) { + if (cli_opt_set(&mopts, M_REFLINK)) { + fprintf(stderr, +_("reflink not supported on realtime devices with rt extent size specified\n")); + usage(); + } + cli->sb_feat.reflink = false; + } + if (cli->blocksize < XFS_MIN_RTEXTSIZE && cli->sb_feat.reflink) { + if (cli_opt_set(&mopts, M_REFLINK)) { + fprintf(stderr, +_("reflink not supported on realtime devices with blocksize %d < %d\n"), + cli->blocksize, + XFS_MIN_RTEXTSIZE); + usage(); + } + cli->sb_feat.reflink = false; + } + if (!cli->sb_feat.rtgroups && cli->sb_feat.reflink) { + if (cli_opt_set(&mopts, M_REFLINK) && + cli_opt_set(&ropts, R_RTGROUPS)) { + fprintf(stderr, +_("reflink not supported on realtime devices without rtgroups feature\n")); + usage(); + } else if (cli_opt_set(&mopts, M_REFLINK)) { + cli->sb_feat.rtgroups = true; + } else { + cli->sb_feat.reflink = false; + } } - cli->sb_feat.reflink = false; if (!cli->sb_feat.rtgroups && cli->sb_feat.rmapbt) { if (cli_opt_set(&mopts, M_RMAPBT) && @@ -2558,6 +2582,19 @@ validate_rtextsize( usage(); } cfg->rtextblocks = (xfs_extlen_t)(rtextbytes >> cfg->blocklog); + } else if (cli->sb_feat.reflink && cli->xi->rtname) { + /* + * reflink doesn't support rt extent size > 1FSB yet, so set + * an extent size of 1FSB. Make sure we still satisfy the + * minimum rt extent size. 
+ */ + if (cfg->blocksize < XFS_MIN_RTEXTSIZE) { + fprintf(stderr, + _("reflink not supported on rt volume with blocksize %d\n"), + cfg->blocksize); + usage(); + } + cfg->rtextblocks = 1; } else { /* * If realtime extsize has not been specified by the user, @@ -2589,6 +2626,12 @@ validate_rtextsize( } } ASSERT(cfg->rtextblocks); + + if (cli->sb_feat.reflink && cfg->rtblocks > 0 && cfg->rtextblocks > 1) { + fprintf(stderr, +_("reflink not supported on realtime with extent sizes > 1\n")); + usage(); + } } /* Validate the incoming extsize hint. */ @@ -4583,10 +4626,16 @@ check_rt_meta_prealloc( error = -libxfs_imeta_resv_init_inode(rtg->rtg_rmapip, ask); if (error) prealloc_fail(mp, error, ask, _("realtime rmap btree")); + + ask = libxfs_rtrefcountbt_calc_reserves(mp); + error = -libxfs_imeta_resv_init_inode(rtg->rtg_refcountip, ask); + if (error) + prealloc_fail(mp, error, ask, _("realtime refcount btree")); } /* Unreserve the realtime metadata reservations. */ for_each_rtgroup(mp, rgno, rtg) { + libxfs_imeta_resv_free_inode(rtg->rtg_refcountip); libxfs_imeta_resv_free_inode(rtg->rtg_rmapip); } ^ permalink raw reply related [flat|nested] 565+ messages in thread
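The reflink/rtgroups handling in the mkfs hunk above is a three-way decision. It is a hard conflict only when the user explicitly set both options; otherwise an explicit reflink request pulls rtgroups in, and a defaulted reflink is quietly dropped. A minimal sketch of that decision follows. The struct, enum, and function names are hypothetical stand-ins for the mkfs cli state and are not taken from xfsprogs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant mkfs cli state. */
struct feat_request {
	bool reflink;            /* reflink currently enabled */
	bool rtgroups;           /* rtgroups currently enabled */
	bool reflink_explicit;   /* user specified -m reflink=... */
	bool rtgroups_explicit;  /* user specified -r rtgroups=... */
};

enum resolve_result { RESOLVE_OK, RESOLVE_CONFLICT };

/*
 * Mirror the shape of the check in the patch: reflink on a realtime
 * device needs rtgroups.  Only an explicit request for both settings
 * is an error (mkfs would print usage and exit at that point).
 */
static enum resolve_result
resolve_rt_reflink(struct feat_request *req)
{
	assert(req != NULL);

	if (req->rtgroups || !req->reflink)
		return RESOLVE_OK;

	if (req->reflink_explicit && req->rtgroups_explicit)
		return RESOLVE_CONFLICT;

	if (req->reflink_explicit)
		req->rtgroups = true;   /* honor the explicit reflink request */
	else
		req->reflink = false;   /* reflink was only a default; drop it */

	return RESOLVE_OK;
}
```

The same pattern is applied to rmapbt and rtgroups in the context lines that follow the hunk.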
* [PATCHSET v1.0 0/3] libxfs: reflink with large realtime extents
  2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
                   ` (29 preceding siblings ...)
  2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong
@ 2022-12-30 22:20 ` Darrick J. Wong
  2022-12-30 22:20   ` [PATCH 3/3] mkfs: enable reflink with realtime extent sizes > 1 Darrick J. Wong
                     ` (2 more replies)
  2022-12-30 22:20 ` [PATCHSET v1.0 0/1] libxfs: enable quota for realtime voluems Darrick J. Wong
                   ` (8 subsequent siblings)
  39 siblings, 3 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

Hi all,

Now that we've landed support for reflink on the realtime device for
cases where the rt extent size is the same as the fs block size, enhance
the reflink code further to support cases where the rt extent size is a
power-of-two multiple of the fs block size.  This enables us to do data
block sharing (for example) for much larger allocation units by dirtying
pagecache around shared extents and expanding writeback to write back
shared extents fully.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-reflink-extsize

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-reflink-extsize

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-reflink-extsize
---
 libxfs/init.c          |    7 -------
 libxfs/xfs_bmap.c      |   22 ++++++++++++++++++++++
 libxfs/xfs_inode_buf.c |   20 ++++++--------------
 mkfs/xfs_mkfs.c        |   37 -------------------------------------
 4 files changed, 28 insertions(+), 58 deletions(-)

^ permalink raw reply	[flat|nested] 565+ messages in thread
* [PATCH 3/3] mkfs: enable reflink with realtime extent sizes > 1 2022-12-30 22:20 ` [PATCHSET v1.0 0/3] libxfs: reflink with large realtime extents Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/3] xfs: enable extent size hints for CoW when rtextsize " Darrick J. Wong 2022-12-30 22:20 ` [PATCH 2/3] xfs: fix integer overflow when validating extent size hints Darrick J. Wong 2 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Allow creation of filesystems with reflink enabled and realtime extent size larger than 1 block. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/init.c | 7 ------- mkfs/xfs_mkfs.c | 37 ------------------------------------- 2 files changed, 44 deletions(-) diff --git a/libxfs/init.c b/libxfs/init.c index a4023f78655..c04a30bb829 100644 --- a/libxfs/init.c +++ b/libxfs/init.c @@ -448,13 +448,6 @@ rtmount_init( if (mp->m_sb.sb_rblocks == 0) return 0; - if (xfs_has_reflink(mp) && mp->m_sb.sb_rextsize > 1) { - fprintf(stderr, - _("%s: Reflink not compatible with realtime extent size > 1. 
Please try a newer xfsprogs.\n"), - progname); - return -1; - } - if (mp->m_rtdev_targp->bt_bdev == 0 && !xfs_is_debugger(mp)) { fprintf(stderr, _("%s: filesystem has a realtime subvolume\n"), progname); diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c index e406fa6a5ea..db828deadfb 100644 --- a/mkfs/xfs_mkfs.c +++ b/mkfs/xfs_mkfs.c @@ -2385,24 +2385,6 @@ _("inode btree counters not supported without finobt support\n")); } if (cli->xi->rtname) { - if (cli->rtextsize && cli->sb_feat.reflink) { - if (cli_opt_set(&mopts, M_REFLINK)) { - fprintf(stderr, -_("reflink not supported on realtime devices with rt extent size specified\n")); - usage(); - } - cli->sb_feat.reflink = false; - } - if (cli->blocksize < XFS_MIN_RTEXTSIZE && cli->sb_feat.reflink) { - if (cli_opt_set(&mopts, M_REFLINK)) { - fprintf(stderr, -_("reflink not supported on realtime devices with blocksize %d < %d\n"), - cli->blocksize, - XFS_MIN_RTEXTSIZE); - usage(); - } - cli->sb_feat.reflink = false; - } if (!cli->sb_feat.rtgroups && cli->sb_feat.reflink) { if (cli_opt_set(&mopts, M_REFLINK) && cli_opt_set(&ropts, R_RTGROUPS)) { @@ -2582,19 +2564,6 @@ validate_rtextsize( usage(); } cfg->rtextblocks = (xfs_extlen_t)(rtextbytes >> cfg->blocklog); - } else if (cli->sb_feat.reflink && cli->xi->rtname) { - /* - * reflink doesn't support rt extent size > 1FSB yet, so set - * an extent size of 1FSB. Make sure we still satisfy the - * minimum rt extent size. - */ - if (cfg->blocksize < XFS_MIN_RTEXTSIZE) { - fprintf(stderr, - _("reflink not supported on rt volume with blocksize %d\n"), - cfg->blocksize); - usage(); - } - cfg->rtextblocks = 1; } else { /* * If realtime extsize has not been specified by the user, @@ -2626,12 +2595,6 @@ validate_rtextsize( } } ASSERT(cfg->rtextblocks); - - if (cli->sb_feat.reflink && cfg->rtblocks > 0 && cfg->rtextblocks > 1) { - fprintf(stderr, -_("reflink not supported on realtime with extent sizes > 1\n")); - usage(); - } } /* Validate the incoming extsize hint. 
*/ ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 1/3] xfs: enable extent size hints for CoW when rtextsize > 1 2022-12-30 22:20 ` [PATCHSET v1.0 0/3] libxfs: reflink with large realtime extents Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/3] mkfs: enable reflink with realtime extent sizes > 1 Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 2/3] xfs: fix integer overflow when validating extent size hints Darrick J. Wong 2 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> CoW extent size hints are not allowed on filesystems that have large realtime extents because we only want to perform the minimum required amount of write-around (aka write amplification) for shared extents. On filesystems where rtextsize > 1, allocations can only be done in units of full rt extents, which means that we can only map an entire rt extent's worth of blocks into the data fork. Hole punch requests become conversions to unwritten if the request isn't aligned properly. Because a copy-write fundamentally requires remapping, this means that we also can only do copy-writes of a full rt extent. This is too expensive for large hint sizes, since it's all or nothing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- libxfs/xfs_bmap.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c index d5842e3b4f6..b8fe093f0f3 100644 --- a/libxfs/xfs_bmap.c +++ b/libxfs/xfs_bmap.c @@ -6445,6 +6445,28 @@ xfs_get_cowextsz_hint( if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE) a = ip->i_cowextsize; if (XFS_IS_REALTIME_INODE(ip)) { + /* + * For realtime files, the realtime extent is the fundamental + * unit of allocation. This means that data sharing and CoW + * remapping can only be done in those units. 
For filesystems + * where the extent size is larger than one block, write + * requests that are not aligned to an extent boundary employ + * an unshare-around strategy to ensure that all pages for a + * shared extent are fully dirtied. + * + * Because the remapping alignment requirement applies equally + * to all CoW writes, any regular overwrites that could be + * turned (by a speculative CoW preallocation) into a CoW write + * must either employ this dirty-around strategy, or be smart + * enough to ignore the CoW fork mapping unless the entire + * extent is dirty or becomes shared by writeback time. Doing + * the first would dramatically increase write amplification, + * and the second would require deeper insight into the state + * of the page cache during a writeback request. For now, we + * ignore the hint. + */ + if (ip->i_mount->m_sb.sb_rextsize > 1) + return ip->i_mount->m_sb.sb_rextsize; b = 0; if (ip->i_diflags & XFS_DIFLAG_EXTSIZE) b = ip->i_extsize; ^ permalink raw reply related [flat|nested] 565+ messages in thread
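The early return added to xfs_get_cowextsz_hint() in the patch above can be modelled outside libxfs. The sketch below is a simplified toy, not the libxfs function: the struct, the function name, and the fallback default of 32 blocks are all assumptions made for illustration, and the non-realtime branch is reduced to taking the larger of the two hints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the inode fields the hint reads. */
struct toy_inode {
	bool     is_realtime;
	uint32_t rextsize;    /* rt extent size, in fs blocks */
	uint32_t cowextsize;  /* CoW extent size hint, 0 if unset */
	uint32_t extsize;     /* data extent size hint, 0 if unset */
};

/* Assumed fallback value, for illustration only. */
#define TOY_DEFAULT_COWEXTSZ_HINT 32

static uint32_t
toy_cowextsz_hint(const struct toy_inode *ip)
{
	uint32_t hint;

	assert(ip != NULL);

	/*
	 * As in the patch: with rt extents larger than one block, CoW
	 * remapping happens in whole rt extents, so any configured hint
	 * is ignored and the rt extent size is returned directly.
	 */
	if (ip->is_realtime && ip->rextsize > 1)
		return ip->rextsize;

	/* Otherwise use the larger of the two hints, or the default. */
	hint = ip->cowextsize > ip->extsize ? ip->cowextsize : ip->extsize;
	return hint ? hint : TOY_DEFAULT_COWEXTSZ_HINT;
}
```

With rextsize = 4 and a CoW hint of 64 blocks, this model returns 4, so speculative CoW preallocation never exceeds the rt extent being remapped.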
* [PATCH 2/3] xfs: fix integer overflow when validating extent size hints 2022-12-30 22:20 ` [PATCHSET v1.0 0/3] libxfs: reflink with large realtime extents Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/3] mkfs: enable reflink with realtime extent sizes > 1 Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/3] xfs: enable extent size hints for CoW when rtextsize " Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: djwong, cem; +Cc: linux-xfs From: Darrick J. Wong <djwong@kernel.org> Both file extent size hints are stored as 32-bit quantities, in units of filesystem blocks. As part of validating the hints, we convert these quantities to bytes to ensure that the hint is congruent with the file's allocation size. The maximum possible hint value is 2097151 (aka XFS_MAX_BMBT_EXTLEN). If the file allocation unit is larger than 2048, the unit conversion will exceed 32 bits in size, which overflows the uint32_t used to store the value used in the comparison. This isn't a problem for files on the data device since the hint will always be a multiple of the block size. However, this is a problem for realtime files because the rtextent size can be any integer number of fs blocks, and truncation of upper bits changes the outcome of division. Eliminate the overflow by performing the congruency check in units of blocks, not bytes. Otherwise, we get errors like this: $ truncate -s 500T /tmp/a $ mkfs.xfs -f -N /tmp/a -d extszinherit=2097151,rtinherit=1 -r extsize=28k illegal extent size hint 2097151, must be less than 2097151 and a multiple of 7. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- libxfs/xfs_inode_buf.c | 20 ++++++-------------- 1 file changed, 6 insertions(+), 14 deletions(-) diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c index ba4df981bd0..866cc187769 100644 --- a/libxfs/xfs_inode_buf.c +++ b/libxfs/xfs_inode_buf.c @@ -737,13 +737,11 @@ xfs_inode_validate_extsize( bool rt_flag; bool hint_flag; bool inherit_flag; - uint32_t extsize_bytes; - uint32_t blocksize_bytes; + uint32_t alloc_unit = 1; rt_flag = (flags & XFS_DIFLAG_REALTIME); hint_flag = (flags & XFS_DIFLAG_EXTSIZE); inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT); - extsize_bytes = XFS_FSB_TO_B(mp, extsize); /* * This comment describes a historic gap in this verifier function. @@ -772,9 +770,7 @@ xfs_inode_validate_extsize( */ if (rt_flag) - blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize); - else - blocksize_bytes = mp->m_sb.sb_blocksize; + alloc_unit = mp->m_sb.sb_rextsize; if ((hint_flag || inherit_flag) && !(S_ISDIR(mode) || S_ISREG(mode))) return __this_address; @@ -792,7 +788,7 @@ xfs_inode_validate_extsize( if (mode && !(hint_flag || inherit_flag) && extsize != 0) return __this_address; - if (extsize_bytes % blocksize_bytes) + if (extsize % alloc_unit) return __this_address; if (extsize > XFS_MAX_BMBT_EXTLEN) @@ -827,12 +823,10 @@ xfs_inode_validate_cowextsize( { bool rt_flag; bool hint_flag; - uint32_t cowextsize_bytes; - uint32_t blocksize_bytes; + uint32_t alloc_unit = 1; rt_flag = (flags & XFS_DIFLAG_REALTIME); hint_flag = (flags2 & XFS_DIFLAG2_COWEXTSIZE); - cowextsize_bytes = XFS_FSB_TO_B(mp, cowextsize); /* * Similar to extent size hints, a directory can be configured to @@ -847,9 +841,7 @@ xfs_inode_validate_cowextsize( */ if (rt_flag) - blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize); - else - blocksize_bytes = mp->m_sb.sb_blocksize; + alloc_unit = mp->m_sb.sb_rextsize; if (hint_flag && !xfs_has_reflink(mp)) return __this_address; @@ -864,7 +856,7 @@ xfs_inode_validate_cowextsize( if (mode && !hint_flag && 
cowextsize != 0) return __this_address; - if (cowextsize_bytes % blocksize_bytes) + if (cowextsize % alloc_unit) return __this_address; if (cowextsize > XFS_MAX_BMBT_EXTLEN) ^ permalink raw reply related [flat|nested] 565+ messages in thread
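The overflow described in the commit message above can be reproduced with plain integer arithmetic. The sketch below compares the old byte-based congruency check with the block-based one. The function names are hypothetical, and a 4096-byte block size is assumed so that the 28k rt extent from the mkfs example is 7 blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Old-style check: convert the hint and the allocation unit to bytes
 * and store each result in a 32-bit variable, which silently drops
 * the upper bits of a large hint.
 */
static bool
hint_aligned_bytes32(uint32_t hint_fsb, uint32_t alloc_unit_fsb,
		     uint32_t blocksize)
{
	uint32_t hint_bytes = (uint32_t)((uint64_t)hint_fsb * blocksize);
	uint32_t unit_bytes = (uint32_t)((uint64_t)alloc_unit_fsb * blocksize);

	assert(unit_bytes != 0);
	return hint_bytes % unit_bytes == 0;
}

/* New-style check: compare in units of fs blocks; nothing can overflow. */
static bool
hint_aligned_blocks(uint32_t hint_fsb, uint32_t alloc_unit_fsb)
{
	assert(alloc_unit_fsb != 0);
	return hint_fsb % alloc_unit_fsb == 0;
}
```

For a hint of 2097151 blocks and an allocation unit of 7 blocks, 2097151 = 7 * 299593, so the hint is a valid multiple. The byte conversion gives 2097151 * 4096 = 0x1FFFFF000, which truncates to 0xFFFFF000, and that value is no longer a multiple of 28672. This is why mkfs rejected the hint in the example.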
* [PATCHSET v1.0 0/1] libxfs: enable quota for realtime voluems
  2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
                   ` (30 preceding siblings ...)
  2022-12-30 22:20 ` [PATCHSET v1.0 0/3] libxfs: reflink with large realtime extents Darrick J. Wong
@ 2022-12-30 22:20 ` Darrick J. Wong
  2022-12-30 22:20   ` [PATCH 1/1] xfs_quota: report warning limits for realtime space quotas Darrick J. Wong
  2022-12-30 22:20 ` [PATCHSET 0/1] fstests: test upgrading older features Darrick J. Wong
                   ` (7 subsequent siblings)
  39 siblings, 1 reply; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

Hi all,

At some point, I realized that I've refactored enough of the quota code
in XFS that I should evaluate whether or not quota actually works on
realtime volumes.  It turns out that with two exceptions, it actually
does seem to work properly!  There are three broken pieces that I've
found so far: chown doesn't work, the quota accounting goes wrong when
the rt bitmap changes size, and the VFS quota ioctls don't report the
realtime warning counts or limits.  Hence this series fixes two things
in XFS and re-enables rt quota after a break of a couple decades.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-quotas

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-quotas

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-quotas
---
 include/xqm.h |    5 ++++-
 quota/state.c |    1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

^ permalink raw reply	[flat|nested] 565+ messages in thread
* [PATCH 1/1] xfs_quota: report warning limits for realtime space quotas
  2022-12-30 22:20 ` [PATCHSET v1.0 0/1] libxfs: enable quota for realtime voluems Darrick J. Wong
@ 2022-12-30 22:20   ` Darrick J. Wong
  0 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw)
  To: djwong, cem; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Report the number of warnings that a user will get for exceeding the
soft limit of a realtime volume.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 include/xqm.h |    5 ++++-
 quota/state.c |    1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/xqm.h b/include/xqm.h
index 573441db986..045af9b67fd 100644
--- a/include/xqm.h
+++ b/include/xqm.h
@@ -184,7 +184,10 @@ struct fs_quota_statv {
 	__s32			qs_rtbtimelimit;/* limit for rt blks timer */
 	__u16			qs_bwarnlimit;	/* limit for num warnings */
 	__u16			qs_iwarnlimit;	/* limit for num warnings */
-	__u64			qs_pad2[8];	/* for future proofing */
+	__u16			qs_rtbwarnlimit;/* limit for rt blks warnings */
+	__u16			qs_pad3;
+	__u32			qs_pad4;
+	__u64			qs_pad2[7];	/* for future proofing */
 };
 
 #endif	/* __XQM_H__ */
diff --git a/quota/state.c b/quota/state.c
index 260ef51db18..43fb700f9a7 100644
--- a/quota/state.c
+++ b/quota/state.c
@@ -244,6 +244,7 @@ state_quotafile_stat(
 	state_warnlimit(fp, XFS_INODE_QUOTA, sv->qs_iwarnlimit);
 
 	state_timelimit(fp, XFS_RTBLOCK_QUOTA, sv->qs_rtbtimelimit);
+	state_warnlimit(fp, XFS_RTBLOCK_QUOTA, sv->qs_rtbwarnlimit);
 }
 
 static void

^ permalink raw reply related	[flat|nested] 565+ messages in thread
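The xqm.h hunk above carves the new field out of the first reserved 64-bit pad slot, so the size of the structure and the offsets of the existing fields should not change. The sketch below checks that property for the tail of the structure only. The two struct names are hypothetical, the fields before qs_rtbtimelimit are omitted, and fixed-width C types stand in for the kernel's __s32/__u16/__u32/__u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tail of the structure before the patch (fields shown in the diff only). */
struct statv_tail_old {
	int32_t  qs_rtbtimelimit;
	uint16_t qs_bwarnlimit;
	uint16_t qs_iwarnlimit;
	uint64_t qs_pad2[8];
};

/* Tail after the patch: one u64 of padding becomes u16 + u16 + u32. */
struct statv_tail_new {
	int32_t  qs_rtbtimelimit;
	uint16_t qs_bwarnlimit;
	uint16_t qs_iwarnlimit;
	uint16_t qs_rtbwarnlimit;
	uint16_t qs_pad3;
	uint32_t qs_pad4;
	uint64_t qs_pad2[7];
};

/* The new field must not change the overall size seen by old callers. */
_Static_assert(sizeof(struct statv_tail_old) == sizeof(struct statv_tail_new),
	       "padding split must preserve structure size");

/* Byte offset at which the new warning limit lands. */
static size_t
rtbwarnlimit_offset(void)
{
	return offsetof(struct statv_tail_new, qs_rtbwarnlimit);
}
```

Because qs_rtbwarnlimit sits exactly where the old qs_pad2[0] began, a caller built against the older header would read that location as padding rather than misreading any existing field.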
* [PATCHSET 0/1] fstests: test upgrading older features
  2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
                   ` (31 preceding siblings ...)
  2022-12-30 22:20 ` [PATCHSET v1.0 0/1] libxfs: enable quota for realtime voluems Darrick J. Wong
@ 2022-12-30 22:20 ` Darrick J. Wong
  2022-12-30 22:20   ` [PATCH 1/1] xfs: test upgrading old features Darrick J. Wong
  2022-12-30 22:20 ` [PATCHSET v1.0 0/9] fstests: test XFS metadata directories Darrick J. Wong
                   ` (6 subsequent siblings)
  39 siblings, 1 reply; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw)
  To: zlang, djwong; +Cc: linux-xfs, fstests, guan

Hi all,

Here is a general regression test to make sure that we can invoke the
xfs_repair feature to add new features to V5 filesystems without errors.
There are already targeted functionality tests for inobtcount and
bigtime; this new one exists as a general upgrade exerciser.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=upgrade-older-features

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=upgrade-older-features
---
 tests/xfs/769     |  248 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/769.out |    2 
 2 files changed, 250 insertions(+)
 create mode 100755 tests/xfs/769
 create mode 100644 tests/xfs/769.out

^ permalink raw reply	[flat|nested] 565+ messages in thread
* [PATCH 1/1] xfs: test upgrading old features 2022-12-30 22:20 ` [PATCHSET 0/1] fstests: test upgrading older features Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2023-03-06 15:56 ` Zorro Lang 0 siblings, 1 reply; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Test the ability to add older v5 features. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/769 | 248 +++++++++++++++++++++++++++++++++++++++++++++++++++++ tests/xfs/769.out | 2 2 files changed, 250 insertions(+) create mode 100755 tests/xfs/769 create mode 100644 tests/xfs/769.out diff --git a/tests/xfs/769 b/tests/xfs/769 new file mode 100755 index 0000000000..7613048f52 --- /dev/null +++ b/tests/xfs/769 @@ -0,0 +1,248 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 769 +# +# Test upgrading filesystems with new features. +# +. ./common/preamble +_begin_fstest auto mkfs repair + +# Import common functions. +. ./common/filter +. ./common/populate + +# real QA test starts here +_supported_fs xfs + +test -w /dev/ttyprintk || _notrun "test requires writable /dev/ttyprintk" +_require_check_dmesg +_require_scratch_nocheck +_require_scratch_xfs_crc + +# Does repair know how to add a particular feature to a filesystem? +check_repair_upgrade() +{ + $XFS_REPAIR_PROG -c "$1=narf" 2>&1 | \ + grep -q 'unknown option' && return 1 + return 0 +} + +# Are we configured for realtime? 
+rt_configured() +{ + test "$USE_EXTERNAL" = "yes" && test -n "$SCRATCH_RTDEV" +} + +# Compute the MKFS_OPTIONS string for a particular feature upgrade test +compute_mkfs_options() +{ + local m_opts="" + local caller_options="$MKFS_OPTIONS" + + for feat in "${FEATURES[@]}"; do + local feat_state="${FEATURE_STATE["${feat}"]}" + + if echo "$caller_options" | grep -E -w -q "${feat}=[0-9]*"; then + # Change the caller's options + caller_options="$(echo "$caller_options" | \ + sed -e "s/\([^[:alnum:]]\)${feat}=[0-9]*/\1${feat}=${feat_state}/g")" + else + # Add it to our list of new mkfs flags + m_opts="${feat}=${feat_state},${m_opts}" + fi + done + + test -n "$m_opts" && m_opts=" -m $m_opts" + + echo "$caller_options$m_opts" +} + +# Log the start of an upgrade. +function upgrade_start_message() +{ + local feat="$1" + + echo "Add $feat to filesystem" +} + +# Find dmesg log messages since we started a particular upgrade test +function dmesg_since_feature_upgrade_start() +{ + local feat_logmsg="$(upgrade_start_message "$1")" + + # search the dmesg log of last run of $seqnum for possible failures + # use sed \cregexpc address type, since $seqnum contains "/" + dmesg | \ + tac | \ + sed -ne "0,\#run fstests $seqnum at $date_time#p" | \ + sed -ne "0,\#${feat_logmsg}#p" | \ + tac +} + +# Did the mount fail because this feature is not supported? +function feature_unsupported() +{ + local feat="$1" + + dmesg_since_feature_upgrade_start "$feat" | \ + grep -q 'has unknown.*features' +} + +# Exercise the scratch fs +function scratch_fsstress() +{ + echo moo > $SCRATCH_MNT/sample.txt + $FSSTRESS_PROG -n $((TIME_FACTOR * 1000)) -p $((LOAD_FACTOR * 4)) \ + -d $SCRATCH_MNT/data >> $seqres.full +} + +# Exercise the filesystem a little bit and emit a manifest. +function pre_exercise() +{ + local feat="$1" + + _try_scratch_mount &> $tmp.mount + res=$? + # If the kernel doesn't support the filesystem even after a + # fresh format, skip the rest of the upgrade test quietly. 
+ if [ $res -eq 32 ] && feature_unsupported "$feat"; then + echo "mount failed due to unsupported feature $feat" >> $seqres.full + return 1 + fi + if [ $res -ne 0 ]; then + cat $tmp.mount + echo "mount failed with $res before upgrading to $feat" | \ + tee -a $seqres.full + return 1 + fi + + scratch_fsstress + find $SCRATCH_MNT -type f -print0 | xargs -r -0 md5sum > $tmp.manifest + _scratch_unmount + return 0 +} + +# Check the manifest and exercise the filesystem more +function post_exercise() +{ + local feat="$1" + + _try_scratch_mount &> $tmp.mount + res=$? + # If the kernel doesn't support the filesystem even after a + # fresh format, skip the rest of the upgrade test quietly. + if [ $res -eq 32 ] && feature_unsupported "$feat"; then + echo "mount failed due to unsupported feature $feat" >> $seqres.full + return 1 + fi + if [ $res -ne 0 ]; then + cat $tmp.mount + echo "mount failed with $res after upgrading to $feat" | \ + tee -a $seqres.full + return 1 + fi + + md5sum --quiet -c $tmp.manifest || \ + echo "fs contents ^^^ changed after adding $feat" + + iam="check" _check_scratch_fs || \ + echo "scratch fs check failed after adding $feat" + + # Try to mount the fs in case the check unmounted it + _try_scratch_mount &>> $seqres.full + + scratch_fsstress + + iam="check" _check_scratch_fs || \ + echo "scratch fs check failed after exercising $feat" + + # Try to unmount the fs in case the check didn't + _scratch_unmount &>> $seqres.full + return 0 +} + +# Create a list of fs features in the order that support for them was added +# to the kernel driver. For each feature upgrade test, we enable all the +# features that came before it and none of the ones after, which means we're +# testing incremental migrations. We start each run with a clean fs so that +# errors and unsatisfied requirements (log size, root ino position, etc) in one +# upgrade don't spread failure to the rest of the tests. 
+FEATURES=() +if rt_configured; then + check_repair_upgrade finobt && FEATURES+=("finobt") + check_repair_upgrade inobtcount && FEATURES+=("inobtcount") + check_repair_upgrade bigtime && FEATURES+=("bigtime") +else + check_repair_upgrade finobt && FEATURES+=("finobt") + check_repair_upgrade rmapbt && FEATURES+=("rmapbt") + check_repair_upgrade reflink && FEATURES+=("reflink") + check_repair_upgrade inobtcount && FEATURES+=("inobtcount") + check_repair_upgrade bigtime && FEATURES+=("bigtime") +fi + +test "${#FEATURES[@]}" -eq 0 && \ + _notrun "xfs_repair does not know how to add V5 features" + +declare -A FEATURE_STATE +for f in "${FEATURES[@]}"; do + FEATURE_STATE["$f"]=0 +done + +for feat in "${FEATURES[@]}"; do + echo "-----------------------" >> $seqres.full + + upgrade_start_message "$feat" | tee -a $seqres.full /dev/ttyprintk > /dev/null + + opts="$(compute_mkfs_options)" + echo "mkfs.xfs $opts" >> $seqres.full + + # Format filesystem + MKFS_OPTIONS="$opts" _scratch_mkfs &>> $seqres.full + res=$? + outcome="mkfs returns $res for $feat upgrade test" + echo "$outcome" >> $seqres.full + if [ $res -ne 0 ]; then + echo "$outcome" + continue + fi + + # Create some files to make things interesting. + pre_exercise "$feat" || break + + # Upgrade the fs + _scratch_xfs_repair -c "${feat}=1" &> $tmp.upgrade + res=$? + cat $tmp.upgrade >> $seqres.full + grep -q "^Adding" $tmp.upgrade || \ + echo "xfs_repair ignored command to add $feat" + + outcome="xfs_repair returns $res while adding $feat" + echo "$outcome" >> $seqres.full + if [ $res -ne 0 ]; then + # Couldn't upgrade filesystem, move on to the next feature. + FEATURE_STATE["$feat"]=1 + continue + fi + + # Make sure repair runs cleanly afterwards + _scratch_xfs_repair -n &>> $seqres.full + res=$? + outcome="xfs_repair -n returns $res after adding $feat" + echo "$outcome" >> $seqres.full + if [ $res -ne 0 ]; then + echo "$outcome" + fi + + # Make sure we can still exercise the filesystem. 
+ post_exercise "$feat" || break + + # Update feature state for next run + FEATURE_STATE["$feat"]=1 +done + +# success, all done +echo Silence is golden. +status=0 +exit diff --git a/tests/xfs/769.out b/tests/xfs/769.out new file mode 100644 index 0000000000..332432db97 --- /dev/null +++ b/tests/xfs/769.out @@ -0,0 +1,2 @@ +QA output created by 769 +Silence is golden. ^ permalink raw reply related [flat|nested] 565+ messages in thread
* Re: [PATCH 1/1] xfs: test upgrading old features
  2022-12-30 22:20   ` [PATCH 1/1] xfs: test upgrading old features Darrick J. Wong
@ 2023-03-06 15:56     ` Zorro Lang
  2023-03-06 16:41       ` Darrick J. Wong
  0 siblings, 1 reply; 565+ messages in thread
From: Zorro Lang @ 2023-03-06 15:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, fstests

On Fri, Dec 30, 2022 at 02:20:29PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Test the ability to add older v5 features.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  tests/xfs/769     |  248 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/769.out |    2 
>  2 files changed, 250 insertions(+)
>  create mode 100755 tests/xfs/769
>  create mode 100644 tests/xfs/769.out
> 
> 
> diff --git a/tests/xfs/769 b/tests/xfs/769
> new file mode 100755
> index 0000000000..7613048f52
> --- /dev/null
> +++ b/tests/xfs/769
> @@ -0,0 +1,248 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +# Copyright (c) 2022 Oracle.  All Rights Reserved.
> +#
> +# FS QA Test No. 769
> +#
> +# Test upgrading filesystems with new features.
> +#
> +. ./common/preamble
> +_begin_fstest auto mkfs repair
> +
> +# Import common functions.
> +. ./common/filter
> +. ./common/populate
> +
> +# real QA test starts here
> +_supported_fs xfs
> +
> +test -w /dev/ttyprintk || _notrun "test requires writable /dev/ttyprintk"

Hi Darrick,

I'm not sure why /dev/ttyprintk is necessary. I think sometimes we might
not have this driver, but do have /dev/kmsg. Can /dev/kmsg be a
replacement for ttyprintk?

Thanks,
Zorro

> +_require_check_dmesg
> +_require_scratch_nocheck
> +_require_scratch_xfs_crc
> +
> +# Does repair know how to add a particular feature to a filesystem?
> +check_repair_upgrade()
> +{
> +	$XFS_REPAIR_PROG -c "$1=narf" 2>&1 | \
> +		grep -q 'unknown option' && return 1
> +	return 0
> +}
> +
> +# Are we configured for realtime?
> +rt_configured() > +{ > + test "$USE_EXTERNAL" = "yes" && test -n "$SCRATCH_RTDEV" > +} > + > +# Compute the MKFS_OPTIONS string for a particular feature upgrade test > +compute_mkfs_options() > +{ > + local m_opts="" > + local caller_options="$MKFS_OPTIONS" > + > + for feat in "${FEATURES[@]}"; do > + local feat_state="${FEATURE_STATE["${feat}"]}" > + > + if echo "$caller_options" | grep -E -w -q "${feat}=[0-9]*"; then > + # Change the caller's options > + caller_options="$(echo "$caller_options" | \ > + sed -e "s/\([^[:alnum:]]\)${feat}=[0-9]*/\1${feat}=${feat_state}/g")" > + else > + # Add it to our list of new mkfs flags > + m_opts="${feat}=${feat_state},${m_opts}" > + fi > + done > + > + test -n "$m_opts" && m_opts=" -m $m_opts" > + > + echo "$caller_options$m_opts" > +} > + > +# Log the start of an upgrade. > +function upgrade_start_message() > +{ > + local feat="$1" > + > + echo "Add $feat to filesystem" > +} > + > +# Find dmesg log messages since we started a particular upgrade test > +function dmesg_since_feature_upgrade_start() > +{ > + local feat_logmsg="$(upgrade_start_message "$1")" > + > + # search the dmesg log of last run of $seqnum for possible failures > + # use sed \cregexpc address type, since $seqnum contains "/" > + dmesg | \ > + tac | \ > + sed -ne "0,\#run fstests $seqnum at $date_time#p" | \ > + sed -ne "0,\#${feat_logmsg}#p" | \ > + tac > +} > + > +# Did the mount fail because this feature is not supported? > +function feature_unsupported() > +{ > + local feat="$1" > + > + dmesg_since_feature_upgrade_start "$feat" | \ > + grep -q 'has unknown.*features' > +} > + > +# Exercise the scratch fs > +function scratch_fsstress() > +{ > + echo moo > $SCRATCH_MNT/sample.txt > + $FSSTRESS_PROG -n $((TIME_FACTOR * 1000)) -p $((LOAD_FACTOR * 4)) \ > + -d $SCRATCH_MNT/data >> $seqres.full > +} > + > +# Exercise the filesystem a little bit and emit a manifest. 
> +function pre_exercise() > +{ > + local feat="$1" > + > + _try_scratch_mount &> $tmp.mount > + res=$? > + # If the kernel doesn't support the filesystem even after a > + # fresh format, skip the rest of the upgrade test quietly. > + if [ $res -eq 32 ] && feature_unsupported "$feat"; then > + echo "mount failed due to unsupported feature $feat" >> $seqres.full > + return 1 > + fi > + if [ $res -ne 0 ]; then > + cat $tmp.mount > + echo "mount failed with $res before upgrading to $feat" | \ > + tee -a $seqres.full > + return 1 > + fi > + > + scratch_fsstress > + find $SCRATCH_MNT -type f -print0 | xargs -r -0 md5sum > $tmp.manifest > + _scratch_unmount > + return 0 > +} > + > +# Check the manifest and exercise the filesystem more > +function post_exercise() > +{ > + local feat="$1" > + > + _try_scratch_mount &> $tmp.mount > + res=$? > + # If the kernel doesn't support the filesystem even after a > + # fresh format, skip the rest of the upgrade test quietly. > + if [ $res -eq 32 ] && feature_unsupported "$feat"; then > + echo "mount failed due to unsupported feature $feat" >> $seqres.full > + return 1 > + fi > + if [ $res -ne 0 ]; then > + cat $tmp.mount > + echo "mount failed with $res after upgrading to $feat" | \ > + tee -a $seqres.full > + return 1 > + fi > + > + md5sum --quiet -c $tmp.manifest || \ > + echo "fs contents ^^^ changed after adding $feat" > + > + iam="check" _check_scratch_fs || \ > + echo "scratch fs check failed after adding $feat" > + > + # Try to mount the fs in case the check unmounted it > + _try_scratch_mount &>> $seqres.full > + > + scratch_fsstress > + > + iam="check" _check_scratch_fs || \ > + echo "scratch fs check failed after exercising $feat" > + > + # Try to unmount the fs in case the check didn't > + _scratch_unmount &>> $seqres.full > + return 0 > +} > + > +# Create a list of fs features in the order that support for them was added > +# to the kernel driver. 
For each feature upgrade test, we enable all the > +# features that came before it and none of the ones after, which means we're > +# testing incremental migrations. We start each run with a clean fs so that > +# errors and unsatisfied requirements (log size, root ino position, etc) in one > +# upgrade don't spread failure to the rest of the tests. > +FEATURES=() > +if rt_configured; then > + check_repair_upgrade finobt && FEATURES+=("finobt") > + check_repair_upgrade inobtcount && FEATURES+=("inobtcount") > + check_repair_upgrade bigtime && FEATURES+=("bigtime") > +else > + check_repair_upgrade finobt && FEATURES+=("finobt") > + check_repair_upgrade rmapbt && FEATURES+=("rmapbt") > + check_repair_upgrade reflink && FEATURES+=("reflink") > + check_repair_upgrade inobtcount && FEATURES+=("inobtcount") > + check_repair_upgrade bigtime && FEATURES+=("bigtime") > +fi > + > +test "${#FEATURES[@]}" -eq 0 && \ > + _notrun "xfs_repair does not know how to add V5 features" > + > +declare -A FEATURE_STATE > +for f in "${FEATURES[@]}"; do > + FEATURE_STATE["$f"]=0 > +done > + > +for feat in "${FEATURES[@]}"; do > + echo "-----------------------" >> $seqres.full > + > + upgrade_start_message "$feat" | tee -a $seqres.full /dev/ttyprintk > /dev/null > + > + opts="$(compute_mkfs_options)" > + echo "mkfs.xfs $opts" >> $seqres.full > + > + # Format filesystem > + MKFS_OPTIONS="$opts" _scratch_mkfs &>> $seqres.full > + res=$? > + outcome="mkfs returns $res for $feat upgrade test" > + echo "$outcome" >> $seqres.full > + if [ $res -ne 0 ]; then > + echo "$outcome" > + continue > + fi > + > + # Create some files to make things interesting. > + pre_exercise "$feat" || break > + > + # Upgrade the fs > + _scratch_xfs_repair -c "${feat}=1" &> $tmp.upgrade > + res=$? 
> + cat $tmp.upgrade >> $seqres.full > + grep -q "^Adding" $tmp.upgrade || \ > + echo "xfs_repair ignored command to add $feat" > + > + outcome="xfs_repair returns $res while adding $feat" > + echo "$outcome" >> $seqres.full > + if [ $res -ne 0 ]; then > + # Couldn't upgrade filesystem, move on to the next feature. > + FEATURE_STATE["$feat"]=1 > + continue > + fi > + > + # Make sure repair runs cleanly afterwards > + _scratch_xfs_repair -n &>> $seqres.full > + res=$? > + outcome="xfs_repair -n returns $res after adding $feat" > + echo "$outcome" >> $seqres.full > + if [ $res -ne 0 ]; then > + echo "$outcome" > + fi > + > + # Make sure we can still exercise the filesystem. > + post_exercise "$feat" || break > + > + # Update feature state for next run > + FEATURE_STATE["$feat"]=1 > +done > + > +# success, all done > +echo Silence is golden. > +status=0 > +exit > diff --git a/tests/xfs/769.out b/tests/xfs/769.out > new file mode 100644 > index 0000000000..332432db97 > --- /dev/null > +++ b/tests/xfs/769.out > @@ -0,0 +1,2 @@ > +QA output created by 769 > +Silence is golden. > ^ permalink raw reply [flat|nested] 565+ messages in thread
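A note on compute_mkfs_options in the patch quoted above: for each feature, the function either rewrites a flag that the caller's MKFS_OPTIONS already names, or collects the flag into a new -m argument. The sketch below reproduces that logic outside fstests so it can be run by hand. It is an illustration under assumptions, not part of the patch: the sample MKFS_OPTIONS value, the feature list, and the feature_state lookup (standing in for the FEATURE_STATE array) are all made up for the example.

```shell
# Standalone sketch of the option rewriting done by compute_mkfs_options in
# the quoted test. The feature list, the states and the sample MKFS_OPTIONS
# are illustrative assumptions; the test's associative array is replaced by
# a small lookup function so the sketch has no fstests dependencies.

FEATURES="finobt inobtcount bigtime"
MKFS_OPTIONS="-m crc=1,bigtime=1"	# pretend the caller preset bigtime

# Hypothetical stand-in for FEATURE_STATE: only finobt was upgraded so far.
feature_state()
{
	case "$1" in
	finobt)	echo 1 ;;
	*)	echo 0 ;;
	esac
}

compute_mkfs_options()
{
	local m_opts=""
	local caller_options="$MKFS_OPTIONS"
	local feat
	local feat_state

	for feat in $FEATURES; do
		feat_state="$(feature_state "$feat")"

		if echo "$caller_options" | grep -E -w -q "${feat}=[0-9]*"; then
			# Already named by the caller: rewrite the value in place.
			caller_options="$(echo "$caller_options" | \
				sed -e "s/\([^[:alnum:]]\)${feat}=[0-9]*/\1${feat}=${feat_state}/g")"
		else
			# Not mentioned: prepend it to the new -m list.
			m_opts="${feat}=${feat_state},${m_opts}"
		fi
	done

	test -n "$m_opts" && m_opts=" -m $m_opts"

	echo "$caller_options$m_opts"
}

compute_mkfs_options
```

With these sample inputs the sketch prints "-m crc=1,bigtime=0 -m inobtcount=0,finobt=1,": bigtime is rewritten in place, and the other two features land in a second -m argument. The trailing comma is there because the list is built by prepending, as in the quoted code.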
* Re: [PATCH 1/1] xfs: test upgrading old features
  2023-03-06 15:56 ` Zorro Lang
@ 2023-03-06 16:41 ` Darrick J. Wong
  2023-03-06 16:54 ` Zorro Lang
  0 siblings, 1 reply; 565+ messages in thread
From: Darrick J. Wong @ 2023-03-06 16:41 UTC (permalink / raw)
  To: Zorro Lang; +Cc: linux-xfs, fstests

On Mon, Mar 06, 2023 at 11:56:11PM +0800, Zorro Lang wrote:
> On Fri, Dec 30, 2022 at 02:20:29PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> >
> > Test the ability to add older v5 features.
> >
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  tests/xfs/769 | 248 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/xfs/769.out | 2
> >  2 files changed, 250 insertions(+)
> >  create mode 100755 tests/xfs/769
> >  create mode 100644 tests/xfs/769.out
> >
> >
> > diff --git a/tests/xfs/769 b/tests/xfs/769
> > new file mode 100755
> > index 0000000000..7613048f52
> > --- /dev/null
> > +++ b/tests/xfs/769
> > @@ -0,0 +1,248 @@
> > +#! /bin/bash
> > +# SPDX-License-Identifier: GPL-2.0-or-later
> > +# Copyright (c) 2022 Oracle. All Rights Reserved.
> > +#
> > +# FS QA Test No. 769
> > +#
> > +# Test upgrading filesystems with new features.
> > +#
> > +. ./common/preamble
> > +_begin_fstest auto mkfs repair
> > +
> > +# Import common functions.
> > +. ./common/filter
> > +. ./common/populate
> > +
> > +# real QA test starts here
> > +_supported_fs xfs
> > +
> > +test -w /dev/ttyprintk || _notrun "test requires writable /dev/ttyprintk"
>
> Hi Darrick,
>
> I'm not sure why /dev/ttyprintk is necessary. I think sometimes we might not
> have this driver, but has /dev/kmsg. Can /dev/kmsg be a replacement of
> ttyprintk ?

The kernel logging here is for debugging purposes -- if the upgrade
corrupts the fs and the kernel splats, whoever triages the failure can
have at least some clue as to where things went wrong.

I think the _notrun line should go away though.

--D

> Thanks,
> Zorro

^ permalink raw reply [flat|nested] 565+ messages in thread
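The debugging markers mentioned in this reply are what dmesg_since_feature_upgrade_start in the quoted test keys on: it reverses the log, keeps everything up to the most recent marker, and reverses it back. Below is a minimal sketch of that idiom on canned input. The log lines and the log_since_marker name are invented for illustration, and the 0,/re/ address form assumes GNU sed.

```shell
# Sketch of the "everything since the last marker" idiom used by
# dmesg_since_feature_upgrade_start in the quoted test. sample_log stands in
# for dmesg output; its contents are invented for illustration.
# Assumes GNU sed (for the 0,/re/ address form) and tac.

sample_log()
{
	cat << 'EOF'
run fstests xfs/769 at 2023-03-06 08:00:00
Add finobt to filesystem
XFS (sda4): Mounting V5 Filesystem
Add bigtime to filesystem
XFS (sda4): Superblock has unknown incompatible features
EOF
}

# Print the log lines from the most recent occurrence of marker $1 onward.
log_since_marker()
{
	# Reverse the log so the newest line comes first, keep everything up
	# to and including the first (that is, most recent) marker line, then
	# restore chronological order. \#...# is used as the regex delimiter
	# so that markers containing "/" need no escaping; a marker containing
	# "#" would need a different delimiter.
	tac | sed -ne "0,\#$1#p" | tac
}

sample_log | log_since_marker "Add bigtime to filesystem"
```

On the sample input only the last two lines survive, so a later grep for 'has unknown.*features' sees messages from the bigtime attempt and nothing from the earlier finobt run.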
* Re: [PATCH 1/1] xfs: test upgrading old features 2023-03-06 16:41 ` Darrick J. Wong @ 2023-03-06 16:54 ` Zorro Lang 2023-03-06 23:14 ` Darrick J. Wong 0 siblings, 1 reply; 565+ messages in thread From: Zorro Lang @ 2023-03-06 16:54 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs, fstests On Mon, Mar 06, 2023 at 08:41:55AM -0800, Darrick J. Wong wrote: > On Mon, Mar 06, 2023 at 11:56:11PM +0800, Zorro Lang wrote: > > On Fri, Dec 30, 2022 at 02:20:29PM -0800, Darrick J. Wong wrote: > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > Test the ability to add older v5 features. > > > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > > --- > > > tests/xfs/769 | 248 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > tests/xfs/769.out | 2 > > > 2 files changed, 250 insertions(+) > > > create mode 100755 tests/xfs/769 > > > create mode 100644 tests/xfs/769.out > > > > > > > > > diff --git a/tests/xfs/769 b/tests/xfs/769 > > > new file mode 100755 > > > index 0000000000..7613048f52 > > > --- /dev/null > > > +++ b/tests/xfs/769 > > > @@ -0,0 +1,248 @@ > > > +#! /bin/bash > > > +# SPDX-License-Identifier: GPL-2.0-or-later > > > +# Copyright (c) 2022 Oracle. All Rights Reserved. > > > +# > > > +# FS QA Test No. 769 > > > +# > > > +# Test upgrading filesystems with new features. > > > +# > > > +. ./common/preamble > > > +_begin_fstest auto mkfs repair > > > + > > > +# Import common functions. > > > +. ./common/filter > > > +. ./common/populate > > > + > > > +# real QA test starts here > > > +_supported_fs xfs > > > + > > > +test -w /dev/ttyprintk || _notrun "test requires writable /dev/ttyprintk" > > > > Hi Darrick, > > > > I'm not sure why /dev/ttyprintk is necessary. I think sometimes we might not > > have this driver, but has /dev/kmsg. Can /dev/kmsg be a replacement of > > ttyprintk ? 
>
> The kernel logging here is for debugging purposes -- if the upgrade
> corrupts the fs and the kernel splats, whoever triages the failure can
> have at least some clue as to where things went wrong.
>
> I think the _notrun line should go away though.

If only for debugging, better to let this case keep running without ttyprintk.
Or it'll _notrun on many systems without ttyprintk driver.

There's a log_dmesg() helper in common/rc, the /dev/kmsg might be more often.
As you've called *_require_check_dmesg*, it helps to check kmsg file. So
how about use it? Or making a helper to write messages to ttyprintk, then
fallback to kmsg if ttyprintk is missing ?

Thanks,
Zorro

>
> --D
>
> > Thanks,
> > Zorro
> >
> >
> > > +_require_check_dmesg
> > > +_require_scratch_nocheck
> > > +_require_scratch_xfs_crc
> > > +
> > > +# Does repair know how to add a particular feature to a filesystem?
> > > +check_repair_upgrade()
> > > +{
> > > +	$XFS_REPAIR_PROG -c "$1=narf" 2>&1 | \
> > > +		grep -q 'unknown option' && return 1
> > > +	return 0
> > > +}
> > > +
> > > +# Are we configured for realtime?
^ permalink raw reply [flat|nested] 565+ messages in thread
* Re: [PATCH 1/1] xfs: test upgrading old features 2023-03-06 16:54 ` Zorro Lang @ 2023-03-06 23:14 ` Darrick J. Wong 0 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2023-03-06 23:14 UTC (permalink / raw) To: Zorro Lang; +Cc: linux-xfs, fstests On Tue, Mar 07, 2023 at 12:54:02AM +0800, Zorro Lang wrote: > On Mon, Mar 06, 2023 at 08:41:55AM -0800, Darrick J. Wong wrote: > > On Mon, Mar 06, 2023 at 11:56:11PM +0800, Zorro Lang wrote: > > > On Fri, Dec 30, 2022 at 02:20:29PM -0800, Darrick J. Wong wrote: > > > > From: Darrick J. Wong <djwong@kernel.org> > > > > > > > > Test the ability to add older v5 features. > > > > > > > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > > > > --- > > > > tests/xfs/769 | 248 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > > > > tests/xfs/769.out | 2 > > > > 2 files changed, 250 insertions(+) > > > > create mode 100755 tests/xfs/769 > > > > create mode 100644 tests/xfs/769.out > > > > > > > > > > > > diff --git a/tests/xfs/769 b/tests/xfs/769 > > > > new file mode 100755 > > > > index 0000000000..7613048f52 > > > > --- /dev/null > > > > +++ b/tests/xfs/769 > > > > @@ -0,0 +1,248 @@ > > > > +#! /bin/bash > > > > +# SPDX-License-Identifier: GPL-2.0-or-later > > > > +# Copyright (c) 2022 Oracle. All Rights Reserved. > > > > +# > > > > +# FS QA Test No. 769 > > > > +# > > > > +# Test upgrading filesystems with new features. > > > > +# > > > > +. ./common/preamble > > > > +_begin_fstest auto mkfs repair > > > > + > > > > +# Import common functions. > > > > +. ./common/filter > > > > +. ./common/populate > > > > + > > > > +# real QA test starts here > > > > +_supported_fs xfs > > > > + > > > > +test -w /dev/ttyprintk || _notrun "test requires writable /dev/ttyprintk" > > > > > > Hi Darrick, > > > > > > I'm not sure why /dev/ttyprintk is necessary. I think sometimes we might not > > > have this driver, but has /dev/kmsg. Can /dev/kmsg be a replacement of > > > ttyprintk ? 
> >
> > The kernel logging here is for debugging purposes -- if the upgrade
> > corrupts the fs and the kernel splats, whoever triages the failure can
> > have at least some clue as to where things went wrong.
> >
> > I think the _notrun line should go away though.
>
> If only for debugging, better to let this case keep running without ttyprintk.
> Or it'll _notrun on many systems without ttyprintk driver.
>
> There's a log_dmesg() helper in common/rc, the /dev/kmsg might be more often.
> As you've called *_require_check_dmesg*, it helps to check kmsg file. So
> how about use it? Or making a helper to write messages to ttyprintk, then
> fallback to kmsg if ttyprintk is missing ?

Yeah, I can hoist the ttyprintk usage into a helper that emits to
ttyprintk or kmsg depending on what's available.

--D

> Thanks,
> Zorro
>
> >
> > --D
> >
> > > Thanks,
> > > Zorro
> > >
> > >
> > > > +_require_check_dmesg
> > > > +_require_scratch_nocheck
> > > > +_require_scratch_xfs_crc
> > > > +
> > > > +# Does repair know how to add a particular feature to a filesystem?
> > > > +check_repair_upgrade()
> > > > +{
> > > > +	$XFS_REPAIR_PROG -c "$1=narf" 2>&1 | \
> > > > +		grep -q 'unknown option' && return 1
> > > > +	return 0
> > > > +}
> > > > +
> > > > +# Are we configured for realtime?
^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCHSET v1.0 0/9] fstests: test XFS metadata directories 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (32 preceding siblings ...) 2022-12-30 22:20 ` [PATCHSET 0/1] fstests: test upgrading older features Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/9] xfs/{030,033,178}: forcibly disable metadata directory trees Darrick J. Wong ` (8 more replies) 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: support metadump to external devices Darrick J. Wong ` (5 subsequent siblings) 39 siblings, 9 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan Hi all, Adjust fstests as needed to support the XFS metadata directory feature, and add some new tests for online fsck and fuzz testing of the ondisk metadata. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. 
--D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=metadir xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=metadir fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=metadir xfsdocs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=metadir --- common/filter | 7 +++- common/repair | 4 ++ common/xfs | 90 +++++++++++++++++++++++++++++++++++++++++++++++++++- tests/xfs/007 | 16 +++++---- tests/xfs/030 | 1 + tests/xfs/033 | 1 + tests/xfs/050 | 1 + tests/xfs/122.out | 1 + tests/xfs/153 | 1 + tests/xfs/1546 | 37 +++++++++++++++++++++ tests/xfs/1546.out | 4 ++ tests/xfs/1547 | 37 +++++++++++++++++++++ tests/xfs/1547.out | 4 ++ tests/xfs/1548 | 37 +++++++++++++++++++++ tests/xfs/1548.out | 4 ++ tests/xfs/1549 | 38 ++++++++++++++++++++++ tests/xfs/1549.out | 4 ++ tests/xfs/1550 | 37 +++++++++++++++++++++ tests/xfs/1550.out | 4 ++ tests/xfs/1551 | 37 +++++++++++++++++++++ tests/xfs/1551.out | 4 ++ tests/xfs/1552 | 37 +++++++++++++++++++++ tests/xfs/1552.out | 4 ++ tests/xfs/1553 | 38 ++++++++++++++++++++++ tests/xfs/1553.out | 4 ++ tests/xfs/1562 | 9 +---- tests/xfs/1563 | 9 +---- tests/xfs/1564 | 9 +---- tests/xfs/1565 | 9 +---- tests/xfs/1566 | 9 +---- tests/xfs/1567 | 9 +---- tests/xfs/1568 | 9 +---- tests/xfs/1569 | 9 +---- tests/xfs/178 | 1 + tests/xfs/206 | 3 +- tests/xfs/299 | 1 + tests/xfs/330 | 6 +++ tests/xfs/509 | 21 +++++++++++- tests/xfs/529 | 5 +-- tests/xfs/530 | 6 +-- tests/xfs/769 | 2 + 41 files changed, 491 insertions(+), 78 deletions(-) create mode 100755 tests/xfs/1546 create mode 100644 tests/xfs/1546.out create mode 100755 tests/xfs/1547 create mode 100644 tests/xfs/1547.out create mode 100755 tests/xfs/1548 create mode 100644 tests/xfs/1548.out create mode 100755 tests/xfs/1549 create mode 100644 tests/xfs/1549.out create mode 100755 tests/xfs/1550 create mode 100644 
tests/xfs/1550.out create mode 100755 tests/xfs/1551 create mode 100644 tests/xfs/1551.out create mode 100755 tests/xfs/1552 create mode 100644 tests/xfs/1552.out create mode 100755 tests/xfs/1553 create mode 100644 tests/xfs/1553.out ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 3/9] xfs/{030,033,178}: forcibly disable metadata directory trees 2022-12-30 22:20 ` [PATCHSET v1.0 0/9] fstests: test XFS metadata directories Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/9] xfs/122: fix metadirino Darrick J. Wong ` (7 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> The golden output for these tests encodes the xfs_repair output when we fuzz various parts of the filesystem. With metadata directory trees enabled, however, the golden output changes dramatically to reflect reconstruction of the metadata directory tree. To avoid regressions, add a helper to force metadata directories off via MKFS_OPTIONS. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/xfs | 13 +++++++++++++ tests/xfs/030 | 1 + tests/xfs/033 | 1 + tests/xfs/178 | 1 + 4 files changed, 16 insertions(+) diff --git a/common/xfs b/common/xfs index dafbd1b874..0f69d3eb18 100644 --- a/common/xfs +++ b/common/xfs @@ -1770,3 +1770,16 @@ _scratch_xfs_find_metafile() echo "${selector}" return 0 } + +# Force metadata directories off. 
+_scratch_xfs_force_no_metadir() +{ + if echo "$MKFS_OPTIONS" | grep -q 'metadir='; then + MKFS_OPTIONS="$(echo "$MKFS_OPTIONS" | sed -e 's/metadir=\([01]\)/metadir=0/g')" + return + fi + + if grep -q 'metadir=' $MKFS_XFS_PROG; then + MKFS_OPTIONS="-m metadir=0 $MKFS_OPTIONS" + fi +} diff --git a/tests/xfs/030 b/tests/xfs/030 index 201a901579..a62ea4fab3 100755 --- a/tests/xfs/030 +++ b/tests/xfs/030 @@ -50,6 +50,7 @@ _supported_fs xfs _require_scratch _require_no_large_scratch_dev +_scratch_xfs_force_no_metadir DSIZE="-dsize=100m,agcount=6" diff --git a/tests/xfs/033 b/tests/xfs/033 index ef5dc4fa36..e886c15082 100755 --- a/tests/xfs/033 +++ b/tests/xfs/033 @@ -53,6 +53,7 @@ _supported_fs xfs _require_scratch _require_no_large_scratch_dev +_scratch_xfs_force_no_metadir # devzero blows away 512byte blocks, so make 512byte inodes (at least) _scratch_mkfs_xfs | _filter_mkfs 2>$tmp.mkfs >/dev/null diff --git a/tests/xfs/178 b/tests/xfs/178 index a65197cde3..72b4d347fd 100755 --- a/tests/xfs/178 +++ b/tests/xfs/178 @@ -45,6 +45,7 @@ _supported_fs xfs # fix filesystem, new mkfs.xfs will be fine. _require_scratch +_scratch_xfs_force_no_metadir _scratch_mkfs_xfs | _filter_mkfs 2>$tmp.mkfs test "${PIPESTATUS[0]}" -eq 0 || _fail "mkfs failed!" ^ permalink raw reply related [flat|nested] 565+ messages in thread
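The option rewrite done by the helper in this patch can be tried in isolation. What follows is only a minimal sketch under stated assumptions: `force_no_metadir` is a hypothetical stand-in that takes the option string as an argument instead of editing `$MKFS_OPTIONS`, and it leaves out the `grep` against `$MKFS_XFS_PROG` that the real helper uses to decide whether mkfs.xfs knows about metadir at all.

```shell
#!/bin/bash
# Hypothetical stand-in for the option rewrite in _scratch_xfs_force_no_metadir.
# It takes the option string as an argument instead of editing $MKFS_OPTIONS,
# and it skips the mkfs.xfs capability probe that the real helper performs.
force_no_metadir() {
	local opts="$1"

	if echo "$opts" | grep -q 'metadir='; then
		# An explicit setting exists: rewrite metadir=0 or metadir=1 to metadir=0.
		echo "$opts" | sed -e 's/metadir=[01]/metadir=0/g'
		return
	fi

	# No explicit setting: prepend one and keep the caller's other options.
	echo "-m metadir=0 $opts"
}

force_no_metadir "-m metadir=1,crc=1 -d agcount=4"
force_no_metadir "-d agcount=4"
```

In the first call only the metadir value changes and the neighbouring `crc=1` suboption is left alone; in the second call a new `-m metadir=0` is placed in front of the existing options.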
* [PATCH 1/9] xfs/122: fix metadirino 2022-12-30 22:20 ` [PATCHSET v1.0 0/9] fstests: test XFS metadata directories Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/9] xfs/{030,033,178}: forcibly disable metadata directory trees Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 2/9] various: fix finding metadata inode numbers when metadir is enabled Darrick J. Wong ` (6 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Fix xfs/122 to work properly with metadirino. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/122.out | 1 + 1 file changed, 1 insertion(+) diff --git a/tests/xfs/122.out b/tests/xfs/122.out index 21549db7fd..eee6c1ee6d 100644 --- a/tests/xfs/122.out +++ b/tests/xfs/122.out @@ -35,6 +35,7 @@ offsetof(xfs_sb_t, sb_logsunit) = 196 offsetof(xfs_sb_t, sb_lsn) = 240 offsetof(xfs_sb_t, sb_magicnum) = 0 offsetof(xfs_sb_t, sb_meta_uuid) = 248 +offsetof(xfs_sb_t, sb_metadirino) = 264 offsetof(xfs_sb_t, sb_pquotino) = 232 offsetof(xfs_sb_t, sb_qflags) = 176 offsetof(xfs_sb_t, sb_rblocks) = 16 ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 2/9] various: fix finding metadata inode numbers when metadir is enabled 2022-12-30 22:20 ` [PATCHSET v1.0 0/9] fstests: test XFS metadata directories Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/9] xfs/{030,033,178}: forcibly disable metadata directory trees Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/9] xfs/122: fix metadirino Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2023-03-06 16:41 ` Zorro Lang 2022-12-30 22:20 ` [PATCH 4/9] common/repair: patch up repair sb inode value complaints Darrick J. Wong ` (5 subsequent siblings) 8 siblings, 1 reply; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> There are a number of tests that use xfs_db to examine the contents of metadata inodes to check correct functioning. The logic is scattered everywhere and won't work with metadata directory trees, so make a shared helper to find these inodes. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- common/xfs | 32 ++++++++++++++++++++++++++++++-- tests/xfs/007 | 16 +++++++++------- tests/xfs/1562 | 9 ++------- tests/xfs/1563 | 9 ++------- tests/xfs/1564 | 9 ++------- tests/xfs/1565 | 9 ++------- tests/xfs/1566 | 9 ++------- tests/xfs/1567 | 9 ++------- tests/xfs/1568 | 9 ++------- tests/xfs/1569 | 9 ++------- tests/xfs/529 | 5 ++--- tests/xfs/530 | 6 ++---- 12 files changed, 59 insertions(+), 72 deletions(-) diff --git a/common/xfs b/common/xfs index 8b365ad18b..dafbd1b874 100644 --- a/common/xfs +++ b/common/xfs @@ -1396,7 +1396,7 @@ _scratch_get_bmx_prefix() { _scratch_get_iext_count() { - local ino=$1 + local selector=$1 local whichfork=$2 local field="" @@ -1411,7 +1411,7 @@ _scratch_get_iext_count() return 1 esac - _scratch_xfs_get_metadata_field $field "inode $ino" + _scratch_xfs_get_metadata_field $field "$selector" } # @@ -1742,3 +1742,31 @@ _require_xfs_scratch_atomicswap() _notrun "atomicswap dependencies not supported by scratch filesystem type: $FSTYP" _scratch_unmount } + +# Find a metadata file within an xfs filesystem. The sole argument is the +# name of the field within the superblock. +_scratch_xfs_find_metafile() +{ + local metafile="$1" + local selector= + + if ! 
_check_scratch_xfs_features METADIR > /dev/null; then + sb_field="$(_scratch_xfs_get_sb_field "$metafile")" + if echo "$sb_field" | grep -q -w 'not found'; then + return 1 + fi + selector="inode $sb_field" + else + case "${metafile}" in + "rootino") selector="path /";; + "uquotino") selector="path -m /quota/user";; + "gquotino") selector="path -m /quota/group";; + "pquotino") selector="path -m /quota/project";; + "rbmino") selector="path -m /realtime/bitmap";; + "rsumino") selector="path -m /realtime/summary";; + esac + fi + + echo "${selector}" + return 0 +} diff --git a/tests/xfs/007 b/tests/xfs/007 index 4f864100fd..6d6d828b13 100755 --- a/tests/xfs/007 +++ b/tests/xfs/007 @@ -22,6 +22,11 @@ _require_xfs_quota _scratch_mkfs_xfs | _filter_mkfs > /dev/null 2> $tmp.mkfs . $tmp.mkfs +get_qfile_nblocks() { + local selector="$(_scratch_xfs_find_metafile "$1")" + _scratch_xfs_db -c "$selector" -c "p core.nblocks" +} + do_test() { qino_1=$1 @@ -31,12 +36,9 @@ do_test() echo "*** umount" _scratch_unmount - QINO_1=`_scratch_xfs_get_sb_field $qino_1` - QINO_2=`_scratch_xfs_get_sb_field $qino_2` - echo "*** Usage before quotarm ***" - _scratch_xfs_db -c "inode $QINO_1" -c "p core.nblocks" - _scratch_xfs_db -c "inode $QINO_2" -c "p core.nblocks" + get_qfile_nblocks $qino_1 + get_qfile_nblocks $qino_2 _qmount echo "*** turn off $off_opts quotas" @@ -66,8 +68,8 @@ do_test() _scratch_unmount echo "*** Usage after quotarm ***" - _scratch_xfs_db -c "inode $QINO_1" -c "p core.nblocks" - _scratch_xfs_db -c "inode $QINO_2" -c "p core.nblocks" + get_qfile_nblocks $qino_1 + get_qfile_nblocks $qino_2 } # Test user and group first diff --git a/tests/xfs/1562 b/tests/xfs/1562 index 015209eeb2..1e5b6881ee 100755 --- a/tests/xfs/1562 +++ b/tests/xfs/1562 @@ -27,13 +27,8 @@ echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 echo "Fuzz rtbitmap" -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.bitmap') -if [ -n "$is_metadir" ]; 
then - path=('path -m /realtime/0.bitmap') -else - path=('sb' 'addr rbmino') -fi -_scratch_xfs_fuzz_metadata '' 'online' "${path[@]}" 'dblock 0' >> $seqres.full +path="$(_scratch_xfs_find_metafile rbmino)" +_scratch_xfs_fuzz_metadata '' 'online' "$path" 'dblock 0' >> $seqres.full echo "Done fuzzing rtbitmap" # success, all done diff --git a/tests/xfs/1563 b/tests/xfs/1563 index 2be0870a3d..a9da78106d 100755 --- a/tests/xfs/1563 +++ b/tests/xfs/1563 @@ -27,13 +27,8 @@ echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 echo "Fuzz rtsummary" -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.summary') -if [ -n "$is_metadir" ]; then - path=('path -m /realtime/0.summary') -else - path=('sb' 'addr rsumino') -fi -_scratch_xfs_fuzz_metadata '' 'online' "${path[@]}" 'dblock 0' >> $seqres.full +path="$(_scratch_xfs_find_metafile rsumino)" +_scratch_xfs_fuzz_metadata '' 'online' "$path" 'dblock 0' >> $seqres.full echo "Done fuzzing rtsummary" # success, all done diff --git a/tests/xfs/1564 b/tests/xfs/1564 index c0d10ff0e9..4482861d50 100755 --- a/tests/xfs/1564 +++ b/tests/xfs/1564 @@ -27,13 +27,8 @@ echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 echo "Fuzz rtbitmap" -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.bitmap') -if [ -n "$is_metadir" ]; then - path=('path -m /realtime/0.bitmap') -else - path=('sb' 'addr rbmino') -fi -_scratch_xfs_fuzz_metadata '' 'offline' "${path[@]}" 'dblock 0' >> $seqres.full +path="$(_scratch_xfs_find_metafile rbmino)" +_scratch_xfs_fuzz_metadata '' 'offline' "$path" 'dblock 0' >> $seqres.full echo "Done fuzzing rtbitmap" # success, all done diff --git a/tests/xfs/1565 b/tests/xfs/1565 index 6b4186fb3c..c43ccd848e 100755 --- a/tests/xfs/1565 +++ b/tests/xfs/1565 @@ -27,13 +27,8 @@ echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 echo "Fuzz rtsummary" 
-is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.summary') -if [ -n "$is_metadir" ]; then - path=('path -m /realtime/0.summary') -else - path=('sb' 'addr rsumino') -fi -_scratch_xfs_fuzz_metadata '' 'offline' "${path[@]}" 'dblock 0' >> $seqres.full +path="$(_scratch_xfs_find_metafile rsumino)" +_scratch_xfs_fuzz_metadata '' 'offline' "$path" 'dblock 0' >> $seqres.full echo "Done fuzzing rtsummary" # success, all done diff --git a/tests/xfs/1566 b/tests/xfs/1566 index 8d0f61ae10..aad4fafb15 100755 --- a/tests/xfs/1566 +++ b/tests/xfs/1566 @@ -28,13 +28,8 @@ echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 echo "Fuzz rtbitmap" -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.bitmap') -if [ -n "$is_metadir" ]; then - path=('path -m /realtime/0.bitmap') -else - path=('sb' 'addr rbmino') -fi -_scratch_xfs_fuzz_metadata '' 'both' "${path[@]}" 'dblock 0' >> $seqres.full +path="$(_scratch_xfs_find_metafile rbmino)" +_scratch_xfs_fuzz_metadata '' 'both' "$path" 'dblock 0' >> $seqres.full echo "Done fuzzing rtbitmap" # success, all done diff --git a/tests/xfs/1567 b/tests/xfs/1567 index 7dc2012b67..ff782fc239 100755 --- a/tests/xfs/1567 +++ b/tests/xfs/1567 @@ -28,13 +28,8 @@ echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 echo "Fuzz rtsummary" -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.summary') -if [ -n "$is_metadir" ]; then - path=('path -m /realtime/0.summary') -else - path=('sb' 'addr rsumino') -fi -_scratch_xfs_fuzz_metadata '' 'both' "${path[@]}" 'dblock 0' >> $seqres.full +path="$(_scratch_xfs_find_metafile rsumino)" +_scratch_xfs_fuzz_metadata '' 'both' "$path" 'dblock 0' >> $seqres.full echo "Done fuzzing rtsummary" # success, all done diff --git a/tests/xfs/1568 b/tests/xfs/1568 index c80640ef97..e2a28df58a 100755 --- a/tests/xfs/1568 +++ b/tests/xfs/1568 @@ -27,13 +27,8 @@ echo "Format and populate" 
_scratch_populate_cached nofill > $seqres.full 2>&1 echo "Fuzz rtbitmap" -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.bitmap') -if [ -n "$is_metadir" ]; then - path=('path -m /realtime/0.bitmap') -else - path=('sb' 'addr rbmino') -fi -_scratch_xfs_fuzz_metadata '' 'none' "${path[@]}" 'dblock 0' >> $seqres.full +path="$(_scratch_xfs_find_metafile rbmino)" +_scratch_xfs_fuzz_metadata '' 'none' "$path" 'dblock 0' >> $seqres.full echo "Done fuzzing rtbitmap" # success, all done diff --git a/tests/xfs/1569 b/tests/xfs/1569 index e303f08ff5..dcb07440e8 100755 --- a/tests/xfs/1569 +++ b/tests/xfs/1569 @@ -27,13 +27,8 @@ echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 echo "Fuzz rtsummary" -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.summary') -if [ -n "$is_metadir" ]; then - path=('path -m /realtime/0.summary') -else - path=('sb' 'addr rsumino') -fi -_scratch_xfs_fuzz_metadata '' 'none' "${path[@]}" 'dblock 0' >> $seqres.full +path="$(_scratch_xfs_find_metafile rsumino)" +_scratch_xfs_fuzz_metadata '' 'none' "$path" 'dblock 0' >> $seqres.full echo "Done fuzzing rtsummary" # success, all done diff --git a/tests/xfs/529 b/tests/xfs/529 index 83d24da0ac..e10af6753b 100755 --- a/tests/xfs/529 +++ b/tests/xfs/529 @@ -159,9 +159,8 @@ done _scratch_unmount >> $seqres.full echo "Verify uquota inode's extent count" -uquotino=$(_scratch_xfs_get_metadata_field 'uquotino' 'sb 0') - -nextents=$(_scratch_get_iext_count $uquotino data || \ +selector="$(_scratch_xfs_find_metafile uquotino)" +nextents=$(_scratch_get_iext_count "$selector" data || \ _fail "Unable to obtain inode fork's extent count") if (( $nextents > 10 )); then echo "Extent count overflow check failed: nextents = $nextents" diff --git a/tests/xfs/530 b/tests/xfs/530 index 56f5e7ebdb..cb8c2e3978 100755 --- a/tests/xfs/530 +++ b/tests/xfs/530 @@ -104,10 +104,8 @@ _scratch_unmount >> $seqres.full echo "Verify rbmino's 
and rsumino's extent count" for rtino in rbmino rsumino; do - ino=$(_scratch_xfs_get_metadata_field $rtino "sb 0") - echo "$rtino = $ino" >> $seqres.full - - nextents=$(_scratch_get_iext_count $ino data || \ + selector="$(_scratch_xfs_find_metafile "$rtino")" + nextents=$(_scratch_get_iext_count "$selector" data || \ _fail "Unable to obtain inode fork's extent count") if (( $nextents > 10 )); then echo "Extent count overflow check failed: nextents = $nextents" ^ permalink raw reply related [flat|nested] 565+ messages in thread
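The selector choice made by `_scratch_xfs_find_metafile` in this patch can be sketched without a filesystem. This is an illustration only: `find_metafile_selector` is a hypothetical name, the metadir feature check and the superblock lookup are replaced by plain arguments, and the `*)` branch that rejects unknown names is an assumption (the patch itself has no default branch and would echo an empty selector).

```shell
#!/bin/bash
# Hypothetical, self-contained imitation of the selector choice made by
# _scratch_xfs_find_metafile. The metadir check and the superblock lookup
# are replaced by plain arguments so that no filesystem is needed.
#   $1: "1" if the filesystem has metadata directories, else "0"
#   $2: superblock field name, e.g. uquotino
#   $3: inode number read from the superblock (used only without metadir)
find_metafile_selector() {
	local has_metadir="$1"
	local metafile="$2"
	local sb_ino="$3"
	local selector=

	if [ "$has_metadir" != "1" ]; then
		# Classic layout: the superblock field holds the inode number.
		selector="inode $sb_ino"
	else
		# Metadir layout: address the file by its path in the metadata tree.
		case "$metafile" in
		"rootino")	selector="path /";;
		"uquotino")	selector="path -m /quota/user";;
		"gquotino")	selector="path -m /quota/group";;
		"pquotino")	selector="path -m /quota/project";;
		"rbmino")	selector="path -m /realtime/bitmap";;
		"rsumino")	selector="path -m /realtime/summary";;
		# Assumption: reject unknown names; the patch has no default branch.
		*)		return 1;;
		esac
	fi

	echo "$selector"
}

find_metafile_selector 0 uquotino 131
find_metafile_selector 1 uquotino
find_metafile_selector 1 bogus || echo "no selector for bogus"
```

Either way the result is a single xfs_db command string, which is why callers such as `get_qfile_nblocks` in xfs/007 can pass it as one quoted `-c` argument.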
* Re: [PATCH 2/9] various: fix finding metadata inode numbers when metadir is enabled 2022-12-30 22:20 ` [PATCH 2/9] various: fix finding metadata inode numbers when metadir is enabled Darrick J. Wong @ 2023-03-06 16:41 ` Zorro Lang 0 siblings, 0 replies; 565+ messages in thread From: Zorro Lang @ 2023-03-06 16:41 UTC (permalink / raw) To: Darrick J. Wong; +Cc: linux-xfs, fstests On Fri, Dec 30, 2022 at 02:20:32PM -0800, Darrick J. Wong wrote: > From: Darrick J. Wong <djwong@kernel.org> > > There are a number of tests that use xfs_db to examine the contents of > metadata inodes to check correct functioning. The logic is scattered > everywhere and won't work with metadata directory trees, so make a > shared helper to find these inodes. > > Signed-off-by: Darrick J. Wong <djwong@kernel.org> > --- > common/xfs | 32 ++++++++++++++++++++++++++++++-- > tests/xfs/007 | 16 +++++++++------- > tests/xfs/1562 | 9 ++------- > tests/xfs/1563 | 9 ++------- > tests/xfs/1564 | 9 ++------- > tests/xfs/1565 | 9 ++------- > tests/xfs/1566 | 9 ++------- > tests/xfs/1567 | 9 ++------- > tests/xfs/1568 | 9 ++------- > tests/xfs/1569 | 9 ++------- These case names were temporary; I renamed them when I merged them, so this patch needs to be rebased. 
Sorry for this trouble :) Thanks, Zorro > tests/xfs/529 | 5 ++--- > tests/xfs/530 | 6 ++---- > 12 files changed, 59 insertions(+), 72 deletions(-) > > > diff --git a/common/xfs b/common/xfs > index 8b365ad18b..dafbd1b874 100644 > --- a/common/xfs > +++ b/common/xfs > @@ -1396,7 +1396,7 @@ _scratch_get_bmx_prefix() { > > _scratch_get_iext_count() > { > - local ino=$1 > + local selector=$1 > local whichfork=$2 > local field="" > > @@ -1411,7 +1411,7 @@ _scratch_get_iext_count() > return 1 > esac > > - _scratch_xfs_get_metadata_field $field "inode $ino" > + _scratch_xfs_get_metadata_field $field "$selector" > } > > # > @@ -1742,3 +1742,31 @@ _require_xfs_scratch_atomicswap() > _notrun "atomicswap dependencies not supported by scratch filesystem type: $FSTYP" > _scratch_unmount > } > + > +# Find a metadata file within an xfs filesystem. The sole argument is the > +# name of the field within the superblock. > +_scratch_xfs_find_metafile() > +{ > + local metafile="$1" > + local selector= > + > + if ! _check_scratch_xfs_features METADIR > /dev/null; then > + sb_field="$(_scratch_xfs_get_sb_field "$metafile")" > + if echo "$sb_field" | grep -q -w 'not found'; then > + return 1 > + fi > + selector="inode $sb_field" > + else > + case "${metafile}" in > + "rootino") selector="path /";; > + "uquotino") selector="path -m /quota/user";; > + "gquotino") selector="path -m /quota/group";; > + "pquotino") selector="path -m /quota/project";; > + "rbmino") selector="path -m /realtime/bitmap";; > + "rsumino") selector="path -m /realtime/summary";; > + esac > + fi > + > + echo "${selector}" > + return 0 > +} > diff --git a/tests/xfs/007 b/tests/xfs/007 > index 4f864100fd..6d6d828b13 100755 > --- a/tests/xfs/007 > +++ b/tests/xfs/007 > @@ -22,6 +22,11 @@ _require_xfs_quota > _scratch_mkfs_xfs | _filter_mkfs > /dev/null 2> $tmp.mkfs > . 
$tmp.mkfs > > +get_qfile_nblocks() { > + local selector="$(_scratch_xfs_find_metafile "$1")" > + _scratch_xfs_db -c "$selector" -c "p core.nblocks" > +} > + > do_test() > { > qino_1=$1 > @@ -31,12 +36,9 @@ do_test() > echo "*** umount" > _scratch_unmount > > - QINO_1=`_scratch_xfs_get_sb_field $qino_1` > - QINO_2=`_scratch_xfs_get_sb_field $qino_2` > - > echo "*** Usage before quotarm ***" > - _scratch_xfs_db -c "inode $QINO_1" -c "p core.nblocks" > - _scratch_xfs_db -c "inode $QINO_2" -c "p core.nblocks" > + get_qfile_nblocks $qino_1 > + get_qfile_nblocks $qino_2 > > _qmount > echo "*** turn off $off_opts quotas" > @@ -66,8 +68,8 @@ do_test() > _scratch_unmount > > echo "*** Usage after quotarm ***" > - _scratch_xfs_db -c "inode $QINO_1" -c "p core.nblocks" > - _scratch_xfs_db -c "inode $QINO_2" -c "p core.nblocks" > + get_qfile_nblocks $qino_1 > + get_qfile_nblocks $qino_2 > } > > # Test user and group first > diff --git a/tests/xfs/1562 b/tests/xfs/1562 > index 015209eeb2..1e5b6881ee 100755 > --- a/tests/xfs/1562 > +++ b/tests/xfs/1562 > @@ -27,13 +27,8 @@ echo "Format and populate" > _scratch_populate_cached nofill > $seqres.full 2>&1 > > echo "Fuzz rtbitmap" > -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.bitmap') > -if [ -n "$is_metadir" ]; then > - path=('path -m /realtime/0.bitmap') > -else > - path=('sb' 'addr rbmino') > -fi > -_scratch_xfs_fuzz_metadata '' 'online' "${path[@]}" 'dblock 0' >> $seqres.full > +path="$(_scratch_xfs_find_metafile rbmino)" > +_scratch_xfs_fuzz_metadata '' 'online' "$path" 'dblock 0' >> $seqres.full > echo "Done fuzzing rtbitmap" > > # success, all done > diff --git a/tests/xfs/1563 b/tests/xfs/1563 > index 2be0870a3d..a9da78106d 100755 > --- a/tests/xfs/1563 > +++ b/tests/xfs/1563 > @@ -27,13 +27,8 @@ echo "Format and populate" > _scratch_populate_cached nofill > $seqres.full 2>&1 > > echo "Fuzz rtsummary" > -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m 
/realtime/0.summary') > -if [ -n "$is_metadir" ]; then > - path=('path -m /realtime/0.summary') > -else > - path=('sb' 'addr rsumino') > -fi > -_scratch_xfs_fuzz_metadata '' 'online' "${path[@]}" 'dblock 0' >> $seqres.full > +path="$(_scratch_xfs_find_metafile rsumino)" > +_scratch_xfs_fuzz_metadata '' 'online' "$path" 'dblock 0' >> $seqres.full > echo "Done fuzzing rtsummary" > > # success, all done > diff --git a/tests/xfs/1564 b/tests/xfs/1564 > index c0d10ff0e9..4482861d50 100755 > --- a/tests/xfs/1564 > +++ b/tests/xfs/1564 > @@ -27,13 +27,8 @@ echo "Format and populate" > _scratch_populate_cached nofill > $seqres.full 2>&1 > > echo "Fuzz rtbitmap" > -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.bitmap') > -if [ -n "$is_metadir" ]; then > - path=('path -m /realtime/0.bitmap') > -else > - path=('sb' 'addr rbmino') > -fi > -_scratch_xfs_fuzz_metadata '' 'offline' "${path[@]}" 'dblock 0' >> $seqres.full > +path="$(_scratch_xfs_find_metafile rbmino)" > +_scratch_xfs_fuzz_metadata '' 'offline' "$path" 'dblock 0' >> $seqres.full > echo "Done fuzzing rtbitmap" > > # success, all done > diff --git a/tests/xfs/1565 b/tests/xfs/1565 > index 6b4186fb3c..c43ccd848e 100755 > --- a/tests/xfs/1565 > +++ b/tests/xfs/1565 > @@ -27,13 +27,8 @@ echo "Format and populate" > _scratch_populate_cached nofill > $seqres.full 2>&1 > > echo "Fuzz rtsummary" > -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.summary') > -if [ -n "$is_metadir" ]; then > - path=('path -m /realtime/0.summary') > -else > - path=('sb' 'addr rsumino') > -fi > -_scratch_xfs_fuzz_metadata '' 'offline' "${path[@]}" 'dblock 0' >> $seqres.full > +path="$(_scratch_xfs_find_metafile rsumino)" > +_scratch_xfs_fuzz_metadata '' 'offline' "$path" 'dblock 0' >> $seqres.full > echo "Done fuzzing rtsummary" > > # success, all done > diff --git a/tests/xfs/1566 b/tests/xfs/1566 > index 8d0f61ae10..aad4fafb15 100755 > --- a/tests/xfs/1566 > +++ 
b/tests/xfs/1566 > @@ -28,13 +28,8 @@ echo "Format and populate" > _scratch_populate_cached nofill > $seqres.full 2>&1 > > echo "Fuzz rtbitmap" > -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.bitmap') > -if [ -n "$is_metadir" ]; then > - path=('path -m /realtime/0.bitmap') > -else > - path=('sb' 'addr rbmino') > -fi > -_scratch_xfs_fuzz_metadata '' 'both' "${path[@]}" 'dblock 0' >> $seqres.full > +path="$(_scratch_xfs_find_metafile rbmino)" > +_scratch_xfs_fuzz_metadata '' 'both' "$path" 'dblock 0' >> $seqres.full > echo "Done fuzzing rtbitmap" > > # success, all done > diff --git a/tests/xfs/1567 b/tests/xfs/1567 > index 7dc2012b67..ff782fc239 100755 > --- a/tests/xfs/1567 > +++ b/tests/xfs/1567 > @@ -28,13 +28,8 @@ echo "Format and populate" > _scratch_populate_cached nofill > $seqres.full 2>&1 > > echo "Fuzz rtsummary" > -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.summary') > -if [ -n "$is_metadir" ]; then > - path=('path -m /realtime/0.summary') > -else > - path=('sb' 'addr rsumino') > -fi > -_scratch_xfs_fuzz_metadata '' 'both' "${path[@]}" 'dblock 0' >> $seqres.full > +path="$(_scratch_xfs_find_metafile rsumino)" > +_scratch_xfs_fuzz_metadata '' 'both' "$path" 'dblock 0' >> $seqres.full > echo "Done fuzzing rtsummary" > > # success, all done > diff --git a/tests/xfs/1568 b/tests/xfs/1568 > index c80640ef97..e2a28df58a 100755 > --- a/tests/xfs/1568 > +++ b/tests/xfs/1568 > @@ -27,13 +27,8 @@ echo "Format and populate" > _scratch_populate_cached nofill > $seqres.full 2>&1 > > echo "Fuzz rtbitmap" > -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.bitmap') > -if [ -n "$is_metadir" ]; then > - path=('path -m /realtime/0.bitmap') > -else > - path=('sb' 'addr rbmino') > -fi > -_scratch_xfs_fuzz_metadata '' 'none' "${path[@]}" 'dblock 0' >> $seqres.full > +path="$(_scratch_xfs_find_metafile rbmino)" > +_scratch_xfs_fuzz_metadata '' 'none' "$path" 'dblock 0' >> 
$seqres.full > echo "Done fuzzing rtbitmap" > > # success, all done > diff --git a/tests/xfs/1569 b/tests/xfs/1569 > index e303f08ff5..dcb07440e8 100755 > --- a/tests/xfs/1569 > +++ b/tests/xfs/1569 > @@ -27,13 +27,8 @@ echo "Format and populate" > _scratch_populate_cached nofill > $seqres.full 2>&1 > > echo "Fuzz rtsummary" > -is_metadir=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime/0.summary') > -if [ -n "$is_metadir" ]; then > - path=('path -m /realtime/0.summary') > -else > - path=('sb' 'addr rsumino') > -fi > -_scratch_xfs_fuzz_metadata '' 'none' "${path[@]}" 'dblock 0' >> $seqres.full > +path="$(_scratch_xfs_find_metafile rsumino)" > +_scratch_xfs_fuzz_metadata '' 'none' "$path" 'dblock 0' >> $seqres.full > echo "Done fuzzing rtsummary" > > # success, all done > diff --git a/tests/xfs/529 b/tests/xfs/529 > index 83d24da0ac..e10af6753b 100755 > --- a/tests/xfs/529 > +++ b/tests/xfs/529 > @@ -159,9 +159,8 @@ done > _scratch_unmount >> $seqres.full > > echo "Verify uquota inode's extent count" > -uquotino=$(_scratch_xfs_get_metadata_field 'uquotino' 'sb 0') > - > -nextents=$(_scratch_get_iext_count $uquotino data || \ > +selector="$(_scratch_xfs_find_metafile uquotino)" > +nextents=$(_scratch_get_iext_count "$selector" data || \ > _fail "Unable to obtain inode fork's extent count") > if (( $nextents > 10 )); then > echo "Extent count overflow check failed: nextents = $nextents" > diff --git a/tests/xfs/530 b/tests/xfs/530 > index 56f5e7ebdb..cb8c2e3978 100755 > --- a/tests/xfs/530 > +++ b/tests/xfs/530 > @@ -104,10 +104,8 @@ _scratch_unmount >> $seqres.full > > echo "Verify rbmino's and rsumino's extent count" > for rtino in rbmino rsumino; do > - ino=$(_scratch_xfs_get_metadata_field $rtino "sb 0") > - echo "$rtino = $ino" >> $seqres.full > - > - nextents=$(_scratch_get_iext_count $ino data || \ > + selector="$(_scratch_xfs_find_metafile "$rtino")" > + nextents=$(_scratch_get_iext_count "$selector" data || \ > _fail "Unable to obtain inode 
fork's extent count") > if (( $nextents > 10 )); then > echo "Extent count overflow check failed: nextents = $nextents" > ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 4/9] common/repair: patch up repair sb inode value complaints 2022-12-30 22:20 ` [PATCHSET v1.0 0/9] fstests: test XFS metadata directories Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 2/9] various: fix finding metadata inode numbers when metadir is enabled Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 6/9] xfs/{050,144,153,299,330}: update quota reports to leave out metadir files Darrick J. Wong ` (4 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Now that we've refactored xfs_repair to be more consistent in how it reports unexpected superblock inode pointer values, we have to fix up the fstests repair filters to emulate the old golden output. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/repair | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/common/repair b/common/repair index 8945d0028c..c3afcfb3e6 100644 --- a/common/repair +++ b/common/repair @@ -28,6 +28,10 @@ _filter_repair() perl -ne ' # for sb /- agno = / && next; # remove each AG line (variable number) +s/realtime bitmap inode pointer/realtime bitmap ino pointer/; +s/sb realtime bitmap inode value/sb realtime bitmap inode/; +s/realtime summary inode pointer/realtime summary ino pointer/; +s/sb realtime summary inode value/sb realtime summary inode/; s/(pointer to) (\d+)/\1 INO/; # Changed inode output in 5.5.0 s/sb root inode value /sb root inode /; ^ permalink raw reply related [flat|nested] 565+ messages in thread
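The four substitutions added to `_filter_repair` in this patch map the new xfs_repair wording back to the wording the existing golden outputs expect. A rough sketch of that mapping follows, with two assumptions stated up front: `filter_rt_inode_msgs` is a hypothetical name, and sed is swapped in for the perl used by the real filter. The input fragments are made up to contain the matched phrases; they are not verbatim xfs_repair output.

```shell
#!/bin/bash
# Hypothetical stand-in for the four substitutions added to _filter_repair.
# The patch uses perl; sed is swapped in here to keep the sketch small.
filter_rt_inode_msgs() {
	sed -e 's/realtime bitmap inode pointer/realtime bitmap ino pointer/' \
	    -e 's/sb realtime bitmap inode value/sb realtime bitmap inode/' \
	    -e 's/realtime summary inode pointer/realtime summary ino pointer/' \
	    -e 's/sb realtime summary inode value/sb realtime summary inode/'
}

# Made-up input fragments, not verbatim xfs_repair output.
printf '%s\n' \
	'realtime bitmap inode pointer' \
	'sb realtime summary inode value 130' | filter_rt_inode_msgs
```

Text that contains none of the four phrases passes through unchanged, so the rewrite is harmless for output from an xfs_repair that still uses the old wording.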
* [PATCH 6/9] xfs/{050,144,153,299,330}: update quota reports to leave out metadir files 2022-12-30 22:20 ` [PATCHSET v1.0 0/9] fstests: test XFS metadata directories Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 4/9] common/repair: patch up repair sb inode value complaints Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 9/9] xfs: create fuzz tests for metadata directories Darrick J. Wong ` (3 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Remove the metadata directory tree directories from the quota reporting in these tests so that we don't regress the golden output. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/filter | 7 +++++-- common/xfs | 23 +++++++++++++++++++++++ tests/xfs/050 | 1 + tests/xfs/153 | 1 + tests/xfs/299 | 1 + tests/xfs/330 | 6 +++++- 6 files changed, 36 insertions(+), 3 deletions(-) diff --git a/common/filter b/common/filter index 3e3fea7ea0..49c6859992 100644 --- a/common/filter +++ b/common/filter @@ -618,11 +618,14 @@ _filter_getcap() # Filter user/group/project id numbers out of quota reports, and standardize # the block counts to use filesystem block size. Callers must set the id and -# bsize variables before calling this function. +# bsize variables before calling this function. The qhidden_rootfiles variable +# (by default zero) is the number of root files to filter out of the inode +# count part of the quota report. 
_filter_quota_report() { test -n "$id" || echo "id must be set" test -n "$bsize" || echo "block size must be set" + test -n "$qhidden_rootfiles" || qhidden_rootfiles=0 tr -s '[:space:]' | \ perl -npe ' @@ -630,7 +633,7 @@ _filter_quota_report() s/^\#0 \d+ /[ROOT] 0 /g; s/6 days/7 days/g' | perl -npe ' - $val = 0; + $val = '"$qhidden_rootfiles"'; if ($ENV{'LARGE_SCRATCH_DEV'}) { $val = $ENV{'NUM_SPACE_FILES'}; } diff --git a/common/xfs b/common/xfs index 0f69d3eb18..99e377631b 100644 --- a/common/xfs +++ b/common/xfs @@ -1783,3 +1783,26 @@ _scratch_xfs_force_no_metadir() MKFS_OPTIONS="-m metadir=0 $MKFS_OPTIONS" fi } + +# Decide if a mount filesystem has metadata directory trees. +_xfs_mount_has_metadir() { + local mount="$1" + + # spaceman (and its info command) predate metadir + test ! -e "$XFS_SPACEMAN_PROG" && return 1 + $XFS_SPACEMAN_PROG -c "info" "$mount" | grep -q 'metadir=1' +} + +# Compute the number of files in the metadata directory tree. +_xfs_calc_metadir_files() { + local mount="$1" + + if ! 
_xfs_mount_has_metadir "$mount"; then + echo 0 + return + fi + + local regfiles="$($XFS_IO_PROG -c 'bulkstat' "$mount" | grep '^bs_ino' | wc -l)" + local metafiles="$($XFS_IO_PROG -c 'bulkstat -m' "$mount" 2>&1 | grep '^bs_ino' | wc -l)" + echo $((metafiles - regfiles)) +} diff --git a/tests/xfs/050 b/tests/xfs/050 index 2220e47016..64fbaf687d 100755 --- a/tests/xfs/050 +++ b/tests/xfs/050 @@ -34,6 +34,7 @@ _require_xfs_quota _scratch_mkfs >/dev/null 2>&1 _scratch_mount bsize=$(_get_file_block_size $SCRATCH_MNT) +qhidden_rootfiles=$(_xfs_calc_metadir_files $SCRATCH_MNT) _scratch_unmount bsoft=$(( 200 * $bsize )) diff --git a/tests/xfs/153 b/tests/xfs/153 index dbe26b6803..fc64bf734a 100755 --- a/tests/xfs/153 +++ b/tests/xfs/153 @@ -39,6 +39,7 @@ _require_test_program "vfs/mount-idmapped" _scratch_mkfs >/dev/null 2>&1 _scratch_mount bsize=$(_get_file_block_size $SCRATCH_MNT) +qhidden_rootfiles=$(_xfs_calc_metadir_files $SCRATCH_MNT) _scratch_unmount bsoft=$(( 200 * $bsize )) diff --git a/tests/xfs/299 b/tests/xfs/299 index 4b9df3c6aa..2167c492c4 100755 --- a/tests/xfs/299 +++ b/tests/xfs/299 @@ -159,6 +159,7 @@ _qmount_option "uquota,gquota,pquota" _qmount bsize=$(_get_file_block_size $SCRATCH_MNT) +qhidden_rootfiles=$(_xfs_calc_metadir_files $SCRATCH_MNT) bsoft=$(( 100 * $bsize )) bhard=$(( 500 * $bsize )) diff --git a/tests/xfs/330 b/tests/xfs/330 index c6e74e67e8..e919ccc1ca 100755 --- a/tests/xfs/330 +++ b/tests/xfs/330 @@ -26,7 +26,10 @@ _require_nobody do_repquota() { - repquota $SCRATCH_MNT | grep -E '^(fsgqa|root|nobody)' | sort -r + repquota $SCRATCH_MNT | grep -E '^(fsgqa|root|nobody)' | sort -r | \ + perl -npe ' + $val = '"$qhidden_rootfiles"'; + s/(^root\s+--\s+\S+\s+\S+\s+\S+\s+)(\S+)/$1@{[$2 - $val]}/g' } rm -f "$seqres.full" @@ -35,6 +38,7 @@ echo "Format and mount" _scratch_mkfs > "$seqres.full" 2>&1 export MOUNT_OPTIONS="-o usrquota,grpquota $MOUNT_OPTIONS" _scratch_mount >> "$seqres.full" 2>&1 +qhidden_rootfiles=$(_xfs_calc_metadir_files 
$SCRATCH_MNT) quotacheck -u -g $SCRATCH_MNT 2> /dev/null quotaon $SCRATCH_MNT 2> /dev/null ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 9/9] xfs: create fuzz tests for metadata directories 2022-12-30 22:20 ` [PATCHSET v1.0 0/9] fstests: test XFS metadata directories Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 6/9] xfs/{050,144,153,299,330}: update quota reports to leave out metadir files Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 7/9] xfs/769: add metadir upgrade to test matrix Darrick J. Wong ` (2 subsequent siblings) 8 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Create fuzz tests to make sure that all the validation works for metadata directories and subdirectories. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/xfs | 22 ++++++++++++++++++++++ tests/xfs/1546 | 37 +++++++++++++++++++++++++++++++++++++ tests/xfs/1546.out | 4 ++++ tests/xfs/1547 | 37 +++++++++++++++++++++++++++++++++++++ tests/xfs/1547.out | 4 ++++ tests/xfs/1548 | 37 +++++++++++++++++++++++++++++++++++++ tests/xfs/1548.out | 4 ++++ tests/xfs/1549 | 38 ++++++++++++++++++++++++++++++++++++++ tests/xfs/1549.out | 4 ++++ tests/xfs/1550 | 37 +++++++++++++++++++++++++++++++++++++ tests/xfs/1550.out | 4 ++++ tests/xfs/1551 | 37 +++++++++++++++++++++++++++++++++++++ tests/xfs/1551.out | 4 ++++ tests/xfs/1552 | 37 +++++++++++++++++++++++++++++++++++++ tests/xfs/1552.out | 4 ++++ tests/xfs/1553 | 38 ++++++++++++++++++++++++++++++++++++++ tests/xfs/1553.out | 4 ++++ 17 files changed, 352 insertions(+) create mode 100755 tests/xfs/1546 create mode 100644 tests/xfs/1546.out create mode 100755 tests/xfs/1547 create mode 100644 tests/xfs/1547.out create mode 100755 tests/xfs/1548 create mode 100644 tests/xfs/1548.out create mode 100755 tests/xfs/1549 create mode 100644 tests/xfs/1549.out create mode 100755 tests/xfs/1550 create mode 100644 tests/xfs/1550.out create mode 100755 tests/xfs/1551 create mode 
100644 tests/xfs/1551.out create mode 100755 tests/xfs/1552 create mode 100644 tests/xfs/1552.out create mode 100755 tests/xfs/1553 create mode 100644 tests/xfs/1553.out diff --git a/common/xfs b/common/xfs index 99e377631b..77af8a6d60 100644 --- a/common/xfs +++ b/common/xfs @@ -1806,3 +1806,25 @@ _xfs_calc_metadir_files() { local metafiles="$($XFS_IO_PROG -c 'bulkstat -m' "$mount" 2>&1 | grep '^bs_ino' | wc -l)" echo $((metafiles - regfiles)) } + +_require_xfs_mkfs_metadir() +{ + _scratch_mkfs_xfs_supported -m metadir=1 >/dev/null 2>&1 || \ + _notrun "mkfs.xfs doesn't have metadir features" +} + +_require_xfs_scratch_metadir() +{ + _require_xfs_mkfs_metadir + _require_scratch + + _scratch_mkfs -m metadir=1 &> /dev/null + _require_scratch_xfs_features METADIR + _try_scratch_mount + res=$? + if [ $res -ne 0 ]; then + _notrun "mounting with metadir not supported by filesystem type: $FSTYP" + else + _scratch_unmount + fi +} diff --git a/tests/xfs/1546 b/tests/xfs/1546 new file mode 100755 index 0000000000..5b48463abe --- /dev/null +++ b/tests/xfs/1546 @@ -0,0 +1,37 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1546 +# +# Populate a XFS filesystem and fuzz every metadir root field. +# Use xfs_scrub to fix the corruption. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_scrub dangerous_online_repair + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. 
./common/fuzzy + +# real QA test starts here +_supported_fs xfs +_require_xfs_scratch_metadir +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /') + +echo "Fuzz metadir root" +_scratch_xfs_fuzz_metadata '' 'online' 'path -m /' >> $seqres.full +echo "Done fuzzing metadir root" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1546.out b/tests/xfs/1546.out new file mode 100644 index 0000000000..b72891a758 --- /dev/null +++ b/tests/xfs/1546.out @@ -0,0 +1,4 @@ +QA output created by 1546 +Format and populate +Fuzz metadir root +Done fuzzing metadir root diff --git a/tests/xfs/1547 b/tests/xfs/1547 new file mode 100755 index 0000000000..ff86bc657e --- /dev/null +++ b/tests/xfs/1547 @@ -0,0 +1,37 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1547 +# +# Populate a XFS filesystem and fuzz every metadir root field. +# Use xfs_repair to fix the corruption. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_scrub dangerous_repair + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. 
./common/fuzzy + +# real QA test starts here +_supported_fs xfs +_require_xfs_scratch_metadir +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /') + +echo "Fuzz metadir root" +_scratch_xfs_fuzz_metadata '' 'offline' 'path -m /' >> $seqres.full +echo "Done fuzzing metadir root" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1547.out b/tests/xfs/1547.out new file mode 100644 index 0000000000..983cc01343 --- /dev/null +++ b/tests/xfs/1547.out @@ -0,0 +1,4 @@ +QA output created by 1547 +Format and populate +Fuzz metadir root +Done fuzzing metadir root diff --git a/tests/xfs/1548 b/tests/xfs/1548 new file mode 100755 index 0000000000..1f29dfda3b --- /dev/null +++ b/tests/xfs/1548 @@ -0,0 +1,37 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1548 +# +# Populate a XFS filesystem and fuzz every metadir root field. +# Do not fix the filesystem, to test metadata verifiers. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_norepair + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. 
./common/fuzzy + +# real QA test starts here +_supported_fs xfs +_require_xfs_scratch_metadir +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /') + +echo "Fuzz metadir root" +_scratch_xfs_fuzz_metadata '' 'none' 'path -m /' >> $seqres.full +echo "Done fuzzing metadir root" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1548.out b/tests/xfs/1548.out new file mode 100644 index 0000000000..9e395bb059 --- /dev/null +++ b/tests/xfs/1548.out @@ -0,0 +1,4 @@ +QA output created by 1548 +Format and populate +Fuzz metadir root +Done fuzzing metadir root diff --git a/tests/xfs/1549 b/tests/xfs/1549 new file mode 100755 index 0000000000..865023f218 --- /dev/null +++ b/tests/xfs/1549 @@ -0,0 +1,38 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1549 +# +# Populate a XFS filesystem and fuzz every metadir root field. +# Try online repair and, if necessary, offline repair, +# to test the most likely usage pattern. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_bothrepair + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. 
./common/fuzzy + +# real QA test starts here +_supported_fs xfs +_require_xfs_scratch_metadir +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /') + +echo "Fuzz metadir root" +_scratch_xfs_fuzz_metadata '' 'both' 'path -m /' >> $seqres.full +echo "Done fuzzing metadir root" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1549.out b/tests/xfs/1549.out new file mode 100644 index 0000000000..22b3d215e3 --- /dev/null +++ b/tests/xfs/1549.out @@ -0,0 +1,4 @@ +QA output created by 1549 +Format and populate +Fuzz metadir root +Done fuzzing metadir root diff --git a/tests/xfs/1550 b/tests/xfs/1550 new file mode 100755 index 0000000000..62219e65fc --- /dev/null +++ b/tests/xfs/1550 @@ -0,0 +1,37 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1550 +# +# Populate a XFS filesystem and fuzz every metadir subdir field. +# Use xfs_scrub to fix the corruption. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_scrub dangerous_online_repair + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. 
./common/fuzzy + +# real QA test starts here +_supported_fs xfs +_require_xfs_scratch_metadir +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime') + +echo "Fuzz metadir subdir" +_scratch_xfs_fuzz_metadata '' 'online' 'path -m /realtime' >> $seqres.full +echo "Done fuzzing metadir subdir" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1550.out b/tests/xfs/1550.out new file mode 100644 index 0000000000..7694cd670b --- /dev/null +++ b/tests/xfs/1550.out @@ -0,0 +1,4 @@ +QA output created by 1550 +Format and populate +Fuzz metadir subdir +Done fuzzing metadir subdir diff --git a/tests/xfs/1551 b/tests/xfs/1551 new file mode 100755 index 0000000000..f101529364 --- /dev/null +++ b/tests/xfs/1551 @@ -0,0 +1,37 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1551 +# +# Populate a XFS filesystem and fuzz every metadir subdir field. +# Use xfs_repair to fix the corruption. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_scrub dangerous_repair + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. 
./common/fuzzy + +# real QA test starts here +_supported_fs xfs +_require_xfs_scratch_metadir +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime') + +echo "Fuzz metadir subdir" +_scratch_xfs_fuzz_metadata '' 'offline' 'path -m /realtime' >> $seqres.full +echo "Done fuzzing metadir subdir" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1551.out b/tests/xfs/1551.out new file mode 100644 index 0000000000..4c3360d08b --- /dev/null +++ b/tests/xfs/1551.out @@ -0,0 +1,4 @@ +QA output created by 1551 +Format and populate +Fuzz metadir subdir +Done fuzzing metadir subdir diff --git a/tests/xfs/1552 b/tests/xfs/1552 new file mode 100755 index 0000000000..ab3b89ec40 --- /dev/null +++ b/tests/xfs/1552 @@ -0,0 +1,37 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1552 +# +# Populate a XFS filesystem and fuzz every metadir subdir field. +# Do not fix the filesystem, to test metadata verifiers. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_norepair + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. 
./common/fuzzy + +# real QA test starts here +_supported_fs xfs +_require_xfs_scratch_metadir +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime') + +echo "Fuzz metadir subdir" +_scratch_xfs_fuzz_metadata '' 'none' 'path -m /realtime' >> $seqres.full +echo "Done fuzzing metadir subdir" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1552.out b/tests/xfs/1552.out new file mode 100644 index 0000000000..6636b1b656 --- /dev/null +++ b/tests/xfs/1552.out @@ -0,0 +1,4 @@ +QA output created by 1552 +Format and populate +Fuzz metadir subdir +Done fuzzing metadir subdir diff --git a/tests/xfs/1553 b/tests/xfs/1553 new file mode 100755 index 0000000000..6acbacbe16 --- /dev/null +++ b/tests/xfs/1553 @@ -0,0 +1,38 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1553 +# +# Populate a XFS filesystem and fuzz every metadir subdir field. +# Try online repair and, if necessary, offline repair, +# to test the most likely usage pattern. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_bothrepair + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. 
./common/fuzzy + +# real QA test starts here +_supported_fs xfs +_require_xfs_scratch_metadir +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" 'path -m /realtime') + +echo "Fuzz metadir subdir" +_scratch_xfs_fuzz_metadata '' 'both' 'path -m /realtime' >> $seqres.full +echo "Done fuzzing metadir subdir" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1553.out b/tests/xfs/1553.out new file mode 100644 index 0000000000..0298fcfddb --- /dev/null +++ b/tests/xfs/1553.out @@ -0,0 +1,4 @@ +QA output created by 1553 +Format and populate +Fuzz metadir subdir +Done fuzzing metadir subdir ^ permalink raw reply related [flat|nested] 565+ messages in thread
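The eight new tests in the patch above differ only in the xfs_db path being fuzzed and the repair mode passed to _scratch_xfs_fuzz_metadata. As a reading aid, the matrix can be sketched as a loop; print_metadir_fuzz_matrix is a hypothetical helper and is not part of fstests.

```shell
# Hypothetical helper: enumerate the test number, fuzz target, and repair
# mode for each of the new tests, in the order the patch creates them.
print_metadir_fuzz_matrix() {
	local seq=1546
	local target mode

	for target in '/' '/realtime'; do
		for mode in online offline none both; do
			echo "xfs/$seq: path -m $target ($mode)"
			seq=$((seq + 1))
		done
	done
}

print_metadir_fuzz_matrix
```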
* [PATCH 7/9] xfs/769: add metadir upgrade to test matrix
From: Darrick J. Wong @ 2022-12-30 22:20 UTC
To: zlang, djwong; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Add metadata directory trees to the features that this test will try to
upgrade.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/769 |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/xfs/769 b/tests/xfs/769
index 7613048f52..624dd2a338 100755
--- a/tests/xfs/769
+++ b/tests/xfs/769
@@ -174,12 +174,14 @@ if rt_configured; then
 	check_repair_upgrade finobt && FEATURES+=("finobt")
 	check_repair_upgrade inobtcount && FEATURES+=("inobtcount")
 	check_repair_upgrade bigtime && FEATURES+=("bigtime")
+	check_repair_upgrade metadir && FEATURES+=("metadir")
 else
 	check_repair_upgrade finobt && FEATURES+=("finobt")
 	check_repair_upgrade rmapbt && FEATURES+=("rmapbt")
 	check_repair_upgrade reflink && FEATURES+=("reflink")
 	check_repair_upgrade inobtcount && FEATURES+=("inobtcount")
 	check_repair_upgrade bigtime && FEATURES+=("bigtime")
+	check_repair_upgrade metadir && FEATURES+=("metadir")
 fi
 
 test "${#FEATURES[@]}" -eq 0 && \
* [PATCH 5/9] xfs/206: update for metadata directory support
From: Darrick J. Wong @ 2022-12-30 22:20 UTC
To: zlang, djwong; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Filter 'metadir=' out of the golden output so that metadata directories
don't cause this test to regress.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/206 |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/xfs/206 b/tests/xfs/206
index cb346b6dc9..c181d7dd3e 100755
--- a/tests/xfs/206
+++ b/tests/xfs/206
@@ -64,7 +64,8 @@ mkfs_filter()
 	-e "s/\(sunit=\)\([0-9]* blks,\)/\10 blks,/" \
 	-e "s/, lazy-count=[0-9]//" \
 	-e "/.*crc=/d" \
-	-e "/^Default configuration/d"
+	-e "/^Default configuration/d" \
+	-e "/metadir=.*/d"
 }
 
 # mkfs slightly smaller than that, small log for speed.
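Note that the new sed expression deletes every line that contains 'metadir=', not just the token itself. A small sketch of that behaviour, using invented sample lines rather than real mkfs.xfs output; drop_metadir_lines is a hypothetical name.

```shell
# Hypothetical helper containing only the expression added to mkfs_filter.
drop_metadir_lines() {
	sed -e "/metadir=.*/d"
}

# Invented sample lines (assumption: real mkfs.xfs output is laid out
# differently); the middle line is removed as a whole.
printf '%s\n' \
	'first line of sample output' \
	'         =   metadir=1' \
	'last line of sample output' | drop_metadir_lines
```

Dropping the whole line keeps the golden output identical whether or not the mkfs binary knows about the feature, provided nothing else the test cares about shares that line.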
* [PATCH 8/9] xfs/509: adjust inumbers accounting for metadata directories
From: Darrick J. Wong @ 2022-12-30 22:20 UTC
To: zlang, djwong; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

The INUMBERS ioctl exports data from the inode btree directly -- the
number of inodes it reports is taken from ir_freemask and includes all
the files in the metadata directory tree.  BULKSTAT, on the other hand,
only reports non-metadata files.  When metadir is enabled, this will
(eventually) cause a discrepancy in the inode counts that is large
enough to exceed the tolerances, thereby causing a test failure.

Correct this by counting the files in the metadata directory and
subtracting that from the INUMBERS totals.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/509 |   21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/tests/xfs/509 b/tests/xfs/509
index d04dfbbfba..b87abef964 100755
--- a/tests/xfs/509
+++ b/tests/xfs/509
@@ -91,13 +91,13 @@ inumbers_count()
 	bstat_versions | while read v_tag v_flag; do
 		echo -n "inumbers all($v_tag): "
 		nr=$(inumbers_fs $SCRATCH_MNT $v_flag)
-		_within_tolerance "inumbers" $nr $expect $tolerance -v
+		_within_tolerance "inumbers" $((nr - METADATA_FILES)) $expect $tolerance -v
 
 		local agcount=$(_xfs_mount_agcount $SCRATCH_MNT)
 		for batchsize in 71 2 1; do
 			echo -n "inumbers $batchsize($v_tag): "
 			nr=$(inumbers_ag $agcount $batchsize $SCRATCH_MNT $v_flag)
-			_within_tolerance "inumbers" $nr $expect $tolerance -v
+			_within_tolerance "inumbers" $((nr - METADATA_FILES)) $expect $tolerance -v
 		done
 	done
 }
@@ -143,9 +143,26 @@ _supported_fs xfs
 DIRCOUNT=8
 INOCOUNT=$((2048 / DIRCOUNT))
 
+# Count everything in the metadata directory tree.
+count_metadir_files() {
+	local metadirs=('/realtime' '/quota')
+	local db_args=('-f')
+
+	for m in "${metadirs[@]}"; do
+		db_args+=('-c' "ls -m $m")
+	done
+
+	local ret=$(_scratch_xfs_db "${db_args[@]}" 2>/dev/null | grep regular | wc -l)
+	test -z "$ret" && ret=0
+	echo $ret
+}
+
 _scratch_mkfs "-d agcount=$DIRCOUNT" >> $seqres.full 2>&1 || _fail "mkfs failed"
 _scratch_mount
 
+METADATA_FILES=$(count_metadir_files)
+echo "found $METADATA_FILES metadata files" >> $seqres.full
+
 # Figure out if we have v5 bulkstat/inumbers ioctls.
 has_v5=
 bs_root_out="$($XFS_IO_PROG -c 'bulkstat_single root' $SCRATCH_MNT 2>>$seqres.full)"
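The accounting in the patch above boils down to counting the 'regular' entries in a directory listing and subtracting that count from the INUMBERS total. A rough sketch with stand-in data: the listing below is invented and only assumed to resemble what 'ls -m' prints in xfs_db, the total is made up, and count_regular_entries is a hypothetical name.

```shell
# Hypothetical helper: count listing lines that mention "regular", the
# same way count_metadir_files post-processes the xfs_db output.
count_regular_entries() {
	local ret

	ret=$(grep regular | wc -l)
	test -z "$ret" && ret=0
	echo $ret
}

# Invented stand-in for xfs_db directory listings (format is an assumption).
sample_listing='1 129 directory .
2 130 regular 0.bitmap
3 131 regular 0.summary'

METADATA_FILES=$(echo "$sample_listing" | count_regular_entries)
nr=70	# pretend INUMBERS total, chosen for illustration

# This is the value the patched test hands to _within_tolerance.
echo $((nr - METADATA_FILES))
```

On a filesystem without metadir the listing commands produce nothing useful, the count comes out as zero, and the subtraction leaves the INUMBERS total unchanged.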
* [PATCHSET v1.0 0/4] fstests: support metadump to external devices
From: Darrick J. Wong @ 2022-12-30 22:20 UTC
To: zlang, djwong; +Cc: linux-xfs, fstests, guan

Hi all,

This series modifies fstests to take advantage of the fact that
xfs_metadump and xfs_mdrestore now support capturing the contents of an
external log in a metadump, and restoring it on the other end.  The
first part of this series refactors and cleans up the common code a bit,
and the rest adds the actual support.  Once this is merged, we'll be
able to cache metadumps of populated filesystems with external log
devices, which will enable faster fuzz testing.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=metadump-external-devices

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=metadump-external-devices
---
 common/ext4     |   17 ++++++++++++-
 common/fuzzy    |    7 +++++
 common/populate |   72 ++++++++++++++++++++++++++++---------------------------
 common/xfs      |   39 ++++++++++++++++++++++++++++--
 4 files changed, 95 insertions(+), 40 deletions(-)
* [PATCH 3/4] common/ext4: reformat external logs during mdrestore operations 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: support metadump to external devices Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 2/4] common/xfs: wipe " Darrick J. Wong ` (2 subsequent siblings) 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> The e2image file format doesn't support the capture of external log devices, which means that mdrestore ought to reformat the external log to get the restored filesystem to work again. The common/populate code could already do this, so push it to the common ext4 helper. While we're at it, fix the uncareful usage of SCRATCH_LOGDEV in the populate code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/ext4 | 17 ++++++++++++++++- common/populate | 16 ++-------------- 2 files changed, 18 insertions(+), 15 deletions(-) diff --git a/common/ext4 b/common/ext4 index 3dcbfe17c9..5171b8df68 100644 --- a/common/ext4 +++ b/common/ext4 @@ -134,7 +134,8 @@ _ext4_mdrestore() { local metadump="$1" local device="$2" - shift; shift + local logdev="$3" + shift; shift; shift local options="$@" # If we're configured for compressed dumps and there isn't already an @@ -148,6 +149,20 @@ _ext4_mdrestore() test -r "$metadump" || return 1 $E2IMAGE_PROG $options -r "${metadump}" "${SCRATCH_DEV}" + res=$? + test $res -ne 0 && return $res + + # ext4 cannot e2image external logs, so we have to reformat the log + # device to match the restored fs + if [ "${logdev}" != "none" ]; then + local fsuuid="$($DUMPE2FS_PROG -h "${SCRATCH_DEV}" 2>/dev/null | \ + grep 'Journal UUID:' | \ + sed -e 's/Journal UUID:[[:space:]]*//g')" + $MKFS_EXT4_PROG -O journal_dev "${logdev}" \ + -F -U "${fsuuid}" + res=$? 
+ fi + return $res } # this test requires the ext4 kernel support crc feature on scratch device diff --git a/common/populate b/common/populate index 08c4bdc151..095e771d67 100644 --- a/common/populate +++ b/common/populate @@ -912,20 +912,8 @@ _scratch_populate_restore_cached() { return $? ;; "ext2"|"ext3"|"ext4") - _ext4_mdrestore "${metadump}" "${SCRATCH_DEV}" - ret=$? - test $ret -ne 0 && return $ret - - # ext4 cannot e2image external logs, so we have to reformat - # the scratch device to match the restored fs - if [ -n "${SCRATCH_LOGDEV}" ]; then - local fsuuid="$($DUMPE2FS_PROG -h "${SCRATCH_DEV}" 2>/dev/null | \ - grep 'Journal UUID:' | \ - sed -e 's/Journal UUID:[[:space:]]*//g')" - $MKFS_EXT4_PROG -O journal_dev "${SCRATCH_LOGDEV}" \ - -F -U "${fsuuid}" - fi - return 0 + _ext4_mdrestore "${metadump}" "${SCRATCH_DEV}" "${logdev}" + return $? ;; esac return 1 ^ permalink raw reply related [flat|nested] 565+ messages in thread
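The journal UUID lookup that the patch above moves into _ext4_mdrestore is a grep plus sed pipeline over the superblock header dump. A minimal sketch of just that pipeline, fed with invented text instead of real dumpe2fs output; extract_journal_uuid is a hypothetical name and both UUIDs are placeholders.

```shell
# Hypothetical helper: the same grep/sed pair the patch applies to the
# output of "$DUMPE2FS_PROG -h".
extract_journal_uuid() {
	grep 'Journal UUID:' | \
		sed -e 's/Journal UUID:[[:space:]]*//g'
}

# Invented header text (assumption: only the "Journal UUID:" label matters).
sample_header='Filesystem UUID:          11111111-2222-3333-4444-555555555555
Journal UUID:             aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee'

echo "$sample_header" | extract_journal_uuid
```

The extracted value is what the helper passes to mkfs with -U, so that the freshly formatted journal device carries the UUID that the restored filesystem expects.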
* [PATCH 2/4] common/xfs: wipe external logs during mdrestore operations 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: support metadump to external devices Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/4] common/ext4: reformat external logs during mdrestore operations Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/4] common/populate: refactor caching of metadumps to a helper Darrick J. Wong 2022-12-30 22:20 ` [PATCH 4/4] common/xfs: capture external logs during metadump/mdrestore Darrick J. Wong 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> The XFS metadump file format doesn't support the capture of external log devices, which means that mdrestore ought to wipe the external log and run xfs_repair to rewrite the log device as needed to get the restored filesystem to work again. The common/populate code could already do this, so push it to the common xfs helper. While we're at it, fix the uncareful usage of SCRATCH_LOGDEV in the populate code. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/fuzzy | 7 ++++++- common/populate | 19 ++++++------------- common/xfs | 21 +++++++++++++++++++-- 3 files changed, 31 insertions(+), 16 deletions(-) diff --git a/common/fuzzy b/common/fuzzy index ef54f2fe2c..7034ff8c42 100644 --- a/common/fuzzy +++ b/common/fuzzy @@ -297,7 +297,12 @@ __scratch_xfs_fuzz_unmount() __scratch_xfs_fuzz_mdrestore() { __scratch_xfs_fuzz_unmount - _xfs_mdrestore "${POPULATE_METADUMP}" "${SCRATCH_DEV}" || \ + + local logdev=none + [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \ + logdev=$SCRATCH_LOGDEV + + _xfs_mdrestore "${POPULATE_METADUMP}" "${SCRATCH_DEV}" "${logdev}" || \ _fail "${POPULATE_METADUMP}: Could not find metadump to restore?" 
} diff --git a/common/populate b/common/populate index 8db7acefb6..08c4bdc151 100644 --- a/common/populate +++ b/common/populate @@ -902,21 +902,14 @@ _scratch_populate_cache_tag() { _scratch_populate_restore_cached() { local metadump="$1" + local logdev=none + [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \ + logdev=$SCRATCH_LOGDEV + case "${FSTYP}" in "xfs") - _xfs_mdrestore "${metadump}" "${SCRATCH_DEV}" - res=$? - test $res -ne 0 && return $res - - # Cached images should have been unmounted cleanly, so if - # there's an external log we need to wipe it and run repair to - # format it to match this filesystem. - if [ -n "${SCRATCH_LOGDEV}" ]; then - $WIPEFS_PROG -a "${SCRATCH_LOGDEV}" - _scratch_xfs_repair - res=$? - fi - return $res + _xfs_mdrestore "${metadump}" "${SCRATCH_DEV}" "${logdev}" + return $? ;; "ext2"|"ext3"|"ext4") _ext4_mdrestore "${metadump}" "${SCRATCH_DEV}" diff --git a/common/xfs b/common/xfs index 77af8a6d60..29130fabbc 100644 --- a/common/xfs +++ b/common/xfs @@ -682,7 +682,8 @@ _xfs_metadump() { _xfs_mdrestore() { local metadump="$1" local device="$2" - shift; shift + local logdev="$3" + shift; shift; shift local options="$@" # If we're configured for compressed dumps and there isn't already an @@ -696,6 +697,18 @@ _xfs_mdrestore() { test -r "$metadump" || return 1 $XFS_MDRESTORE_PROG $options "${metadump}" "${device}" + res=$? + test $res -ne 0 && return $res + + # Cached images should have been unmounted cleanly, so if there's an + # external log we need to wipe it and run repair to format it to match + # this filesystem. + if [ "${logdev}" != "none" ]; then + $WIPEFS_PROG -a "${logdev}" + _scratch_xfs_repair + res=$? + fi + return $res } # Snapshot the metadata on the scratch device @@ -717,7 +730,11 @@ _scratch_xfs_mdrestore() local metadump=$1 shift - _xfs_mdrestore "$metadump" "$SCRATCH_DEV" "$@" + local logdev=none + [ "$USE_EXTERNAL" = yes -a ! 
-z "$SCRATCH_LOGDEV" ] && \ + logdev=$SCRATCH_LOGDEV + + _xfs_mdrestore "$metadump" "$SCRATCH_DEV" "$logdev" "$@" } # run xfs_check and friends on a FS. ^ permalink raw reply related [flat|nested] 565+ messages in thread
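The patch above repeats the same test in every caller to decide whether an external log device is in play. A minimal sketch of that decision as a standalone helper follows. The name `pick_external_dev` is hypothetical and is not part of fstests; the string "none" is the sentinel that the patched `_xfs_mdrestore` compares against.

```shell
# Hypothetical helper, not part of fstests: print the external device to use,
# or the literal string "none" when external devices are disabled or unset.
pick_external_dev() {
	local use_external="$1"	# value of USE_EXTERNAL
	local dev="$2"		# value of SCRATCH_LOGDEV (may be empty)

	if [ "$use_external" = yes ] && [ -n "$dev" ]; then
		echo "$dev"
	else
		echo none
	fi
}

# Falls back to "none" when the fstests variables are not set at all.
logdev="$(pick_external_dev "${USE_EXTERNAL:-no}" "${SCRATCH_LOGDEV:-}")"
echo "logdev=$logdev"
```

Passing a sentinel instead of an empty string keeps the positional arguments of `_xfs_mdrestore` stable even when there is no external log.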
* [PATCH 1/4] common/populate: refactor caching of metadumps to a helper 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: support metadump to external devices Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/4] common/ext4: reformat external logs during mdrestore operations Darrick J. Wong 2022-12-30 22:20 ` [PATCH 2/4] common/xfs: wipe " Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 4/4] common/xfs: capture external logs during metadump/mdrestore Darrick J. Wong 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Hoist out of _scratch_populate_cached all the code that we use to save a metadump of the populated filesystem. We're going to make this more involved for XFS in the next few patches so that we can take advantage of the new support for external devices in metadump/mdrestore. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/populate | 37 ++++++++++++++++++++++++++++--------- 1 file changed, 28 insertions(+), 9 deletions(-) diff --git a/common/populate b/common/populate index 29ea637ecb..8db7acefb6 100644 --- a/common/populate +++ b/common/populate @@ -938,6 +938,31 @@ _scratch_populate_restore_cached() { return 1 } +# Take a metadump of the scratch filesystem and cache it for later. +_scratch_populate_save_metadump() +{ + local metadump_file="$1" + + case "${FSTYP}" in + "xfs") + local logdev=none + [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \ + logdev=$SCRATCH_LOGDEV + + _xfs_metadump "$metadump_file" "$SCRATCH_DEV" "$logdev" \ + compress + res=$? + ;; + "ext2"|"ext3"|"ext4") + _ext4_metadump "${SCRATCH_DEV}" "${metadump_file}" compress + res=$? + ;; + *) + _fail "Don't know how to save a ${FSTYP} filesystem." + esac + return $res +} + # Populate a scratch FS from scratch or from a cached image. 
_scratch_populate_cached() { local meta_descr="$(_scratch_populate_cache_tag "$@")" @@ -961,26 +986,20 @@ _scratch_populate_cached() { # Oh well, just create one from scratch _scratch_mkfs - echo "${meta_descr}" > "${populate_metadump_descr}" case "${FSTYP}" in "xfs") _scratch_xfs_populate $@ _scratch_xfs_populate_check - - local logdev=none - [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \ - logdev=$SCRATCH_LOGDEV - - _xfs_metadump "$POPULATE_METADUMP" "$SCRATCH_DEV" "$logdev" \ - compress ;; "ext2"|"ext3"|"ext4") _scratch_ext4_populate $@ _scratch_ext4_populate_check - _ext4_metadump "${SCRATCH_DEV}" "${POPULATE_METADUMP}" compress ;; *) _fail "Don't know how to populate a ${FSTYP} filesystem." ;; esac + + _scratch_populate_save_metadump "${POPULATE_METADUMP}" && \ + echo "${meta_descr}" > "${populate_metadump_descr}" } ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 4/4] common/xfs: capture external logs during metadump/mdrestore 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: support metadump to external devices Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 1/4] common/populate: refactor caching of metadumps to a helper Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> If xfs_metadump supports the -x switch to capture the contents of external log devices and there is an external log device, add the option to the command line to enable preservation. Similarly, if xfs_mdrestore supports the -l switch and there's an external scratch log, pass the option so that we can restore log contents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/xfs | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/common/xfs b/common/xfs index 29130fabbc..36e02413db 100644 --- a/common/xfs +++ b/common/xfs @@ -667,9 +667,20 @@ _xfs_metadump() { shift; shift; shift; shift local options="$@" test -z "$options" && options="-a -o" + local metadump_has_dash_x + + # Does metadump support capturing from external devices? + $XFS_METADUMP_PROG --help 2>&1 | grep -q -- '-[a-zA-Z]*[wW]x' && \ + metadump_has_dash_x=1 if [ "$logdev" != "none" ]; then options="$options -l $logdev" + + # Tell metadump to capture the log device + if [ -n "$metadump_has_dash_x" ]; then + options="$options -x" + unset metadump_has_dash_x + fi fi $XFS_METADUMP_PROG $options "$device" "$metadump" @@ -696,6 +707,13 @@ _xfs_mdrestore() { fi test -r "$metadump" || return 1 + # Does mdrestore support restoring to external log devices? If so, + # restore to it, and do not wipe it afterwards. 
+ if [ "$logdev" != "none" ] && $XFS_MDRESTORE_PROG --help 2>&1 | grep -q -- '-l logdev'; then + options="$options -l $logdev" + logdev="none" + fi + $XFS_MDRESTORE_PROG $options "${metadump}" "${device}" res=$? test $res -ne 0 && return $res ^ permalink raw reply related [flat|nested] 565+ messages in thread
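The probing logic in this patch can be sketched in isolation: grep the tool's help text for a switch, and only if it is present pass the log device and skip the wipe. Everything below is an assumption made for illustration: `fake_mdrestore_help`, `help_mentions` and `plan_restore` are hypothetical names, the usage text is invented, and the sketch uses a fixed-string match (`grep -F`) where the patch uses a plain regex.

```shell
# Stand-in for "$XFS_MDRESTORE_PROG --help". The usage text is made up for
# this sketch; real xfs_mdrestore output may differ.
fake_mdrestore_help() {
	echo "Usage: fake_mdrestore [-V] [-g] [-i] [-l logdev] source target" >&2
}

# Return 0 if the help text printed by command "$1" contains the fixed
# string "$2". The "--" stops grep from parsing "-l logdev" as an option.
help_mentions() {
	"$1" 2>&1 | grep -q -F -- "$2"
}

# Decide which options to pass to the restore tool and whether the external
# log still has to be wiped afterwards.
plan_restore() {
	local helpcmd="$1" logdev="$2"
	local options="" wipe="$logdev"

	if [ "$logdev" != "none" ] && help_mentions "$helpcmd" "-l logdev"; then
		options="-l $logdev"
		wipe="none"	# the tool restores the log itself
	fi
	echo "options=[$options] wipe=$wipe"
}

plan_restore fake_mdrestore_help /dev/fake-log
```

With the invented usage text this prints `options=[-l /dev/fake-log] wipe=none`; a tool whose help lacks the switch would leave `wipe` set to the log device, which matches the wipe-and-repair fallback from patch 2/4.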
* [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (34 preceding siblings ...) 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: support metadump to external devices Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 03/12] xfs/206: update mkfs filtering for rt groups feature Darrick J. Wong ` (11 more replies) 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong ` (3 subsequent siblings) 39 siblings, 12 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan Hi all, Right now, the realtime section uses a single pair of metadata inodes to store the free space information. This presents a scalability problem since every thread trying to allocate or free rt extents has to lock these files. It would be very useful if we could begin to tackle these problems by sharding the realtime section, so this series creates the notion of realtime groups, which are similar to allocation groups on the data section. While we're at it, define a superblock to be stamped into the start of each rt section. This enables utilities such as blkid to identify block devices containing realtime sections, and helpfully avoids the situation where a file extent can cross an rtgroup boundary. The biggest advantage of rtgroups will become evident later when we get to adding rmap and reflink to the realtime volume, since the geometry constraints are the same for rt groups and AGs. Hence we can reuse all that code directly. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome.
--D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-groups xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-groups fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-groups --- common/fuzzy | 33 +++++++++++++++---- common/populate | 12 ++++++- common/xfs | 83 ++++++++++++++++++++++++++++++++++++++++------- src/punch-alternating.c | 28 +++++++++++++++- tests/xfs/114 | 4 ++ tests/xfs/122 | 2 + tests/xfs/122.out | 8 +++++ tests/xfs/146 | 2 + tests/xfs/185 | 2 + tests/xfs/187 | 3 +- tests/xfs/206 | 3 +- tests/xfs/271 | 3 +- tests/xfs/341 | 4 +- tests/xfs/449 | 6 +++ tests/xfs/556 | 16 ++++++--- tests/xfs/800 | 2 + tests/xfs/840 | 2 + tests/xfs/841 | 2 + 18 files changed, 176 insertions(+), 39 deletions(-) ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 03/12] xfs/206: update mkfs filtering for rt groups feature 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 01/12] xfs/122: update for rtgroups Darrick J. Wong ` (10 subsequent siblings) 11 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Filter out the new mkfs lines that show the rtgroup information, since this test is heavily dependent on old mkfs output. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/206 | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/xfs/206 b/tests/xfs/206 index c181d7dd3e..904d53deb0 100755 --- a/tests/xfs/206 +++ b/tests/xfs/206 @@ -65,7 +65,8 @@ mkfs_filter() -e "s/, lazy-count=[0-9]//" \ -e "/.*crc=/d" \ -e "/^Default configuration/d" \ - -e "/metadir=.*/d" + -e "/metadir=.*/d" \ + -e '/rgcount=/d' } # mkfs slightly smaller than that, small log for speed. ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 01/12] xfs/122: update for rtgroups 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong 2022-12-30 22:20 ` [PATCH 03/12] xfs/206: update mkfs filtering for rt groups feature Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 02/12] punch-alternating: detect xfs realtime files with large allocation units Darrick J. Wong ` (9 subsequent siblings) 11 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Add our new metadata for realtime allocation groups to the ondisk checking. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/122.out | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/tests/xfs/122.out b/tests/xfs/122.out index eee6c1ee6d..01376180cc 100644 --- a/tests/xfs/122.out +++ b/tests/xfs/122.out @@ -44,6 +44,9 @@ offsetof(xfs_sb_t, sb_rbmino) = 64 offsetof(xfs_sb_t, sb_rextents) = 24 offsetof(xfs_sb_t, sb_rextsize) = 80 offsetof(xfs_sb_t, sb_rextslog) = 125 +offsetof(xfs_sb_t, sb_rgblklog) = 280 +offsetof(xfs_sb_t, sb_rgblocks) = 272 +offsetof(xfs_sb_t, sb_rgcount) = 276 offsetof(xfs_sb_t, sb_rootino) = 56 offsetof(xfs_sb_t, sb_rrmapino) = 264 offsetof(xfs_sb_t, sb_rsumino) = 72 @@ -112,9 +115,11 @@ sizeof(struct xfs_refcount_key) = 4 sizeof(struct xfs_refcount_rec) = 12 sizeof(struct xfs_rmap_key) = 20 sizeof(struct xfs_rmap_rec) = 24 +sizeof(struct xfs_rtgroup_geometry) = 128 sizeof(struct xfs_rtrmap_key) = 24 sizeof(struct xfs_rtrmap_rec) = 32 sizeof(struct xfs_rtrmap_root) = 4 +sizeof(struct xfs_rtsb) = 104 sizeof(struct xfs_rud_log_format) = 16 sizeof(struct xfs_rui_log_format) = 16 sizeof(struct xfs_scrub_metadata) = 64 ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 02/12] punch-alternating: detect xfs realtime files with large allocation units 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong 2022-12-30 22:20 ` [PATCH 03/12] xfs/206: update mkfs filtering for rt groups feature Darrick J. Wong 2022-12-30 22:20 ` [PATCH 01/12] xfs/122: update for rtgroups Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 10/12] xfs/27[46],xfs/556: fix tests to deal with rtgroups output in bmap/fsmap commands Darrick J. Wong ` (8 subsequent siblings) 11 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> For files on the XFS realtime volume, it's possible that the file allocation unit (aka the minimum size we have to punch to deallocate file blocks) could be greater than a single fs block. This utility assumed that it's always possible to punch a single fs block, but for these types of files, all that does is zero the page cache. While that's what most *user applications* want, fstests uses punching to fragment file mapping metadata and/or fragment free space, so adapt this utility for that purpose by detecting realtime files. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- src/punch-alternating.c | 28 +++++++++++++++++++++++++++- tests/xfs/114 | 4 ++++ tests/xfs/146 | 2 +- tests/xfs/187 | 3 ++- tests/xfs/341 | 4 ++-- 5 files changed, 36 insertions(+), 5 deletions(-) diff --git a/src/punch-alternating.c b/src/punch-alternating.c index 18dd215197..d2bb4b6a22 100644 --- a/src/punch-alternating.c +++ b/src/punch-alternating.c @@ -20,6 +20,28 @@ void usage(char *cmd) exit(1); } +/* Compute the file allocation unit size for an XFS file. 
*/ +static int detect_xfs_alloc_unit(int fd) +{ + struct fsxattr fsx; + struct xfs_fsop_geom fsgeom; + int ret; + + ret = ioctl(fd, XFS_IOC_FSGEOMETRY, &fsgeom); + if (ret) + return -1; + + ret = ioctl(fd, XFS_IOC_FSGETXATTR, &fsx); + if (ret) + return -1; + + ret = fsgeom.blocksize; + if (fsx.fsx_xflags & XFS_XFLAG_REALTIME) + ret *= fsgeom.rtextsize; + + return ret; +} + int main(int argc, char *argv[]) { struct stat s; @@ -82,7 +104,11 @@ int main(int argc, char *argv[]) goto err; sz = s.st_size; - blksz = sf.f_bsize; + c = detect_xfs_alloc_unit(fd); + if (c > 0) + blksz = c; + else + blksz = sf.f_bsize; mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE; for (offset = start_offset * blksz; diff --git a/tests/xfs/114 b/tests/xfs/114 index 0e8a0529ab..7ecb4d217c 100755 --- a/tests/xfs/114 +++ b/tests/xfs/114 @@ -49,6 +49,10 @@ $XFS_IO_PROG -f \ -c "pwrite -S 0x68 -b 1048576 0 $len2" \ $SCRATCH_MNT/f2 >> $seqres.full +# The arguments to punch-alternating must be specified in units of file +# allocation units, so we divide the argument by $file_blksz. We already +# verified that $blksz is congruent with $file_blksz, so the fpunch parameters +# will always align with the file allocation unit. 
$here/src/punch-alternating -o $((16 * blksz / file_blksz)) \ -s $((blksz / file_blksz)) \ -i $((blksz * 2 / file_blksz)) \ diff --git a/tests/xfs/146 b/tests/xfs/146 index 123bdff59f..c1ef5e7e1b 100755 --- a/tests/xfs/146 +++ b/tests/xfs/146 @@ -68,7 +68,7 @@ _xfs_force_bdev realtime $SCRATCH_MNT # Allocate some stuff at the start, to force the first falloc of the ouch file # to happen somewhere in the middle of the rt volume $XFS_IO_PROG -f -c 'falloc 0 64m' "$SCRATCH_MNT/b" -$here/src/punch-alternating -i $((rextblks * 2)) -s $((rextblks)) "$SCRATCH_MNT/b" +$here/src/punch-alternating "$SCRATCH_MNT/b" avail="$(df -P "$SCRATCH_MNT" | awk 'END {print $4}')"1 toobig="$((avail * 2))" diff --git a/tests/xfs/187 b/tests/xfs/187 index 7c34d8e630..14c3b37670 100755 --- a/tests/xfs/187 +++ b/tests/xfs/187 @@ -132,7 +132,8 @@ $XFS_IO_PROG -f -c "truncate $required_sz" -c "falloc 0 $remap_sz" $SCRATCH_MNT/ # Punch out every other extent of the last two sections, to fragment free space. frag_sz=$((remap_sz * 3)) punch_off=$((bigfile_sz - frag_sz)) -$here/src/punch-alternating $SCRATCH_MNT/bigfile -o $((punch_off / fsbsize)) -i $((rtextsize_blks * 2)) -s $rtextsize_blks +rtextsize_bytes=$((fsbsize * rtextsize_blks)) +$here/src/punch-alternating $SCRATCH_MNT/bigfile -o $((punch_off / rtextsize_bytes)) # Make sure we have some free rtextents. 
free_rtx=$($XFS_IO_PROG -c 'statfs' $SCRATCH_MNT | grep statfs.f_bavail | awk '{print $3}') diff --git a/tests/xfs/341 b/tests/xfs/341 index 1f734c9015..7d2842b579 100755 --- a/tests/xfs/341 +++ b/tests/xfs/341 @@ -43,8 +43,8 @@ len=$((blocks * rtextsz)) echo "Create some files" $XFS_IO_PROG -f -R -c "falloc 0 $len" -c "pwrite -S 0x68 -b 1048576 0 $len" $SCRATCH_MNT/f1 >> $seqres.full $XFS_IO_PROG -f -R -c "falloc 0 $len" -c "pwrite -S 0x68 -b 1048576 0 $len" $SCRATCH_MNT/f2 >> $seqres.full -$here/src/punch-alternating -i $((2 * rtextsz_blks)) -s $rtextsz_blks $SCRATCH_MNT/f1 >> "$seqres.full" -$here/src/punch-alternating -i $((2 * rtextsz_blks)) -s $rtextsz_blks $SCRATCH_MNT/f2 >> "$seqres.full" +$here/src/punch-alternating $SCRATCH_MNT/f1 >> "$seqres.full" +$here/src/punch-alternating $SCRATCH_MNT/f2 >> "$seqres.full" echo garbage > $SCRATCH_MNT/f3 ino=$(stat -c '%i' $SCRATCH_MNT/f3) _scratch_unmount ^ permalink raw reply related [flat|nested] 565+ messages in thread
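The arithmetic behind this patch is small enough to sketch on its own: the allocation unit is one fs block for ordinary files and one realtime extent (rtextsize fs blocks) for realtime files, and the offsets handed to punch-alternating are then counted in those units. `alloc_unit_bytes` is a hypothetical helper and the numbers are arbitrary examples, not values taken from any test.

```shell
# Allocation unit in bytes: one fs block for ordinary files, one realtime
# extent (rtextsize_blks fs blocks) for realtime files.
alloc_unit_bytes() {
	local blocksize="$1" rtextsize_blks="$2" is_realtime="$3"

	if [ "$is_realtime" = 1 ]; then
		echo $((blocksize * rtextsize_blks))
	else
		echo "$blocksize"
	fi
}

# Arbitrary example: 4 KiB fs blocks, 16-block rt extents, realtime file.
unit=$(alloc_unit_bytes 4096 16 1)

# A byte offset converted to the unit count that the -o argument expects.
punch_off_bytes=$((unit * 10))
echo "unit=$unit offset_arg=$((punch_off_bytes / unit))"
```

For these inputs the sketch prints `unit=65536 offset_arg=10`, which mirrors how the xfs/187 hunk divides `punch_off` by `rtextsize_bytes` rather than by `fsbsize`.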
* [PATCH 10/12] xfs/27[46],xfs/556: fix tests to deal with rtgroups output in bmap/fsmap commands 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 02/12] punch-alternating: detect xfs realtime files with large allocation units Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 07/12] xfs/449: update test to know about xfs_db -R Darrick J. Wong ` (7 subsequent siblings) 11 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Fix these tests to deal with the xfs_io bmap and fsmap commands printing out realtime group numbers if the feature is enabled. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/xfs | 4 ++++ tests/xfs/271 | 3 ++- tests/xfs/556 | 16 ++++++++++------ 3 files changed, 16 insertions(+), 7 deletions(-) diff --git a/common/xfs b/common/xfs index ccdcf45d0d..6089a05d0e 100644 --- a/common/xfs +++ b/common/xfs @@ -486,6 +486,10 @@ _xfs_has_feature() feat="rtextents" feat_regex="[1-9][0-9]*" ;; + "rtgroups") + feat="rgcount" + feat_regex="[1-9][0-9]*" + ;; esac local answer="$($XFS_INFO_PROG "$fs" 2>&1 | grep -E -w -c "$feat=$feat_regex")" diff --git a/tests/xfs/271 b/tests/xfs/271 index d67ac4d6c1..74e2c822c1 100755 --- a/tests/xfs/271 +++ b/tests/xfs/271 @@ -31,6 +31,7 @@ _scratch_mkfs > "$seqres.full" 2>&1 _scratch_mount agcount=$(_xfs_mount_agcount $SCRATCH_MNT) +rgcount=$(_xfs_mount_rgcount $SCRATCH_MNT) # mkfs lays out btree root blocks in the order bnobt, cntbt, inobt, finobt, # rmapbt, refcountbt, and then allocates AGFL blocks. Since GETFSMAP has the @@ -48,7 +49,7 @@ cat $TEST_DIR/fsmap >> $seqres.full echo "Check AG header" | tee -a $seqres.full grep 'static fs metadata[[:space:]]*[0-9]*[[:space:]]*(0\.\.' 
$TEST_DIR/fsmap | tee -a $seqres.full > $TEST_DIR/testout -_within_tolerance "AG header count" $(wc -l < $TEST_DIR/testout) $agcount 0 -v +_within_tolerance "AG header count" $(wc -l < $TEST_DIR/testout) $((agcount + rgcount)) 0 -v echo "Check freesp/rmap btrees" | tee -a $seqres.full grep 'per-AG metadata[[:space:]]*[0-9]*[[:space:]]*([0-9]*\.\.' $TEST_DIR/fsmap | tee -a $seqres.full > $TEST_DIR/testout diff --git a/tests/xfs/556 b/tests/xfs/556 index 66908a5410..72343e8625 100755 --- a/tests/xfs/556 +++ b/tests/xfs/556 @@ -47,16 +47,20 @@ victim=$SCRATCH_MNT/a file_blksz=$(_get_file_block_size $SCRATCH_MNT) $XFS_IO_PROG -f -c "pwrite -S 0x58 0 $((4 * file_blksz))" -c "fsync" $victim >> $seqres.full unset errordev -_xfs_is_realtime_file $victim && errordev="RT" + +awk_len_prog='{print $6}' +if _xfs_is_realtime_file $victim; then + if ! _xfs_has_feature $SCRATCH_MNT rtgroups; then + awk_len_prog='{print $4}' + fi + errordev="RT" +fi bmap_str="$($XFS_IO_PROG -c "bmap -elpv" $victim | grep "^[[:space:]]*0:")" echo "$errordev:$bmap_str" >> $seqres.full phys="$(echo "$bmap_str" | $AWK_PROG '{print $3}')" -if [ "$errordev" = "RT" ]; then - len="$(echo "$bmap_str" | $AWK_PROG '{print $4}')" -else - len="$(echo "$bmap_str" | $AWK_PROG '{print $6}')" -fi +len="$(echo "$bmap_str" | $AWK_PROG "$awk_len_prog")" + fs_blksz=$(_get_block_size $SCRATCH_MNT) echo "file_blksz:$file_blksz:fs_blksz:$fs_blksz" >> $seqres.full kernel_sectors_per_fs_block=$((fs_blksz / 512)) ^ permalink raw reply related [flat|nested] 565+ messages in thread
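The new "rtgroups" case in `_xfs_has_feature` relies on a word-bounded regex so that a zero group count does not register as the feature being enabled. A rough sketch of that match follows; `has_feature_line` is a hypothetical function and the sample text is synthetic, not real xfs_info output.

```shell
# Return 0 if the text in "$1" contains a whole-word match of "$2=$3".
# With the regex [1-9][0-9]*, "rgcount=0" does not count as enabled.
has_feature_line() {
	local info_text="$1" feat="$2" feat_regex="$3"
	local count

	# grep -c prints 0 and exits nonzero when nothing matches, so the
	# exit status is ignored and only the printed count is examined.
	count="$(echo "$info_text" | grep -E -w -c "$feat=$feat_regex")" || true
	[ "$count" -ne 0 ]
}

# Synthetic sample line, for illustration only.
sample="realtime =fake extsz=4096 rgcount=4"
if has_feature_line "$sample" rgcount "[1-9][0-9]*"; then
	echo "rtgroups enabled"
else
	echo "rtgroups disabled"
fi
```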
* [PATCH 07/12] xfs/449: update test to know about xfs_db -R 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 10/12] xfs/27[46],xfs/556: fix tests to deal with rtgroups output in bmap/fsmap commands Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 11/12] common/xfs: capture realtime devices during metadump/mdrestore Darrick J. Wong ` (6 subsequent siblings) 11 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> The realtime groups feature added a -R flag to xfs_db so that users can pass in the realtime device. Since we've now modified the _scratch_xfs_db to use this facility, we can update the test to do exact comparisons of the xfs_db info command against the mkfs output. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/449 | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/tests/xfs/449 b/tests/xfs/449 index 5374bf2f85..66c443e994 100755 --- a/tests/xfs/449 +++ b/tests/xfs/449 @@ -32,7 +32,11 @@ echo DB >> $seqres.full cat $tmp.dbinfo >> $seqres.full # xfs_db doesn't take a rtdev argument, so it reports "realtime=external". # mkfs does, so make a quick substitution -diff -u <(cat $tmp.mkfs | sed -e 's/realtime =\/.*extsz=/realtime =external extsz=/g') $tmp.dbinfo +if $XFS_DB_PROG --help 2>&1 | grep -q -- '-R rtdev'; then + diff -u $tmp.mkfs $tmp.dbinfo +else + diff -u <(cat $tmp.mkfs | sed -e 's/realtime =\/.*extsz=/realtime =external extsz=/g') $tmp.dbinfo +fi _scratch_mount ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 11/12] common/xfs: capture realtime devices during metadump/mdrestore 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 07/12] xfs/449: update test to know about xfs_db -R Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 05/12] common: filter rtgroups when we're disabling metadir Darrick J. Wong ` (5 subsequent siblings) 11 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> If xfs_metadump supports the -x and -R switches to capture the contents of realtime devices and there is a realtime device, add the option to the command line to enable preservation. Similarly, if xfs_mdrestore supports the -R switch and there's an external scratch rtdev, pass the option so that we can restore rtdev contents. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/fuzzy | 6 +++++- common/populate | 12 ++++++++++-- common/xfs | 48 +++++++++++++++++++++++++++++++++++++++--------- 3 files changed, 54 insertions(+), 12 deletions(-) diff --git a/common/fuzzy b/common/fuzzy index 7034ff8c42..7f96384402 100644 --- a/common/fuzzy +++ b/common/fuzzy @@ -302,7 +302,11 @@ __scratch_xfs_fuzz_mdrestore() [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \ logdev=$SCRATCH_LOGDEV - _xfs_mdrestore "${POPULATE_METADUMP}" "${SCRATCH_DEV}" "${logdev}" || \ + local rtdev=none + [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_RTDEV" ] && \ + rtdev=$SCRATCH_RTDEV + + _xfs_mdrestore "${POPULATE_METADUMP}" "${SCRATCH_DEV}" "${logdev}" "${rtdev}" || \ _fail "${POPULATE_METADUMP}: Could not find metadump to restore?" 
} diff --git a/common/populate b/common/populate index 095e771d67..c0bbbc3f3b 100644 --- a/common/populate +++ b/common/populate @@ -908,7 +908,11 @@ _scratch_populate_restore_cached() { case "${FSTYP}" in "xfs") - _xfs_mdrestore "${metadump}" "${SCRATCH_DEV}" "${logdev}" + local rtdev=none + [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_RTDEV" ] && \ + rtdev=$SCRATCH_RTDEV + + _xfs_mdrestore "${metadump}" "${SCRATCH_DEV}" "${logdev}" "${rtdev}" return $? ;; "ext2"|"ext3"|"ext4") @@ -930,8 +934,12 @@ _scratch_populate_save_metadump() [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \ logdev=$SCRATCH_LOGDEV + local rtdev=none + [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_RTDEV" ] && \ + rtdev=$SCRATCH_RTDEV + _xfs_metadump "$metadump_file" "$SCRATCH_DEV" "$logdev" \ - compress + "$rtdev" compress res=$? ;; "ext2"|"ext3"|"ext4") diff --git a/common/xfs b/common/xfs index 6089a05d0e..a37284068f 100644 --- a/common/xfs +++ b/common/xfs @@ -679,15 +679,20 @@ _xfs_metadump() { local metadump="$1" local device="$2" local logdev="$3" - local compressopt="$4" - shift; shift; shift; shift + local rtdev="$4" + local compressopt="$5" + shift; shift; shift; shift; shift local options="$@" test -z "$options" && options="-a -o" local metadump_has_dash_x + local metadump_has_dash_R # Does metadump support capturing from external devices? $XFS_METADUMP_PROG --help 2>&1 | grep -q -- '-[a-zA-Z]*[wW]x' && \ metadump_has_dash_x=1 + # Does metadump support capturing realtime devices? 
+ $XFS_METADUMP_PROG --help 2>&1 | grep -q -- '-R rtdev' && \ + metadump_has_dash_R=1 if [ "$logdev" != "none" ]; then options="$options -l $logdev" @@ -699,6 +704,17 @@ _xfs_metadump() { fi fi + # Capture the realtime device, if possible + if [ "$rtdev" != "none" ] && [ -n "$metadump_has_dash_R" ]; then + options="$options -R $rtdev" + + # Tell metadump to capture the rt device + if [ -n "$metadump_has_dash_x" ]; then + options="$options -x" + unset metadump_has_dash_x + fi + fi + $XFS_METADUMP_PROG $options "$device" "$metadump" res=$? [ "$compressopt" = "compress" ] && [ -n "$DUMP_COMPRESSOR" ] && @@ -710,7 +726,8 @@ _xfs_mdrestore() { local metadump="$1" local device="$2" local logdev="$3" - shift; shift; shift + local rtdev="$4" + shift; shift; shift; shift local options="$@" # If we're configured for compressed dumps and there isn't already an @@ -730,6 +747,11 @@ _xfs_mdrestore() { logdev="none" fi + # Does mdrestore support restoring to realtime devices? + if [ "$rtdev" != "none" ] && $XFS_MDRESTORE_PROG --help 2>&1 | grep -q -- '-R rtdev'; then + options="$options -R $rtdev" + fi + $XFS_MDRESTORE_PROG $options "${metadump}" "${device}" res=$? test $res -ne 0 && return $res @@ -750,12 +772,16 @@ _scratch_xfs_metadump() { local metadump=$1 shift + local logdev=none - [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \ logdev=$SCRATCH_LOGDEV - _xfs_metadump "$metadump" "$SCRATCH_DEV" "$logdev" nocompress "$@" + local rtdev=none + [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_RTDEV" ] && \ + rtdev=$SCRATCH_RTDEV + + _xfs_metadump "$metadump" "$SCRATCH_DEV" "$logdev" "$rtdev" nocompress "$@" } # Restore snapshotted metadata on the scratch device @@ -768,7 +794,11 @@ _scratch_xfs_mdrestore() [ "$USE_EXTERNAL" = yes -a ! -z "$SCRATCH_LOGDEV" ] && \ logdev=$SCRATCH_LOGDEV - _xfs_mdrestore "$metadump" "$SCRATCH_DEV" "$logdev" "$@" + local rtdev=none + [ "$USE_EXTERNAL" = yes -a ! 
-z "$SCRATCH_RTDEV" ] && \ + rtdev=$SCRATCH_RTDEV + + _xfs_mdrestore "$metadump" "$SCRATCH_DEV" "$logdev" "$rtdev" "$@" } # run xfs_check and friends on a FS. @@ -895,7 +925,7 @@ _check_xfs_filesystem() if [ "$ok" -ne 1 ] && [ "$DUMP_CORRUPT_FS" = "1" ]; then local flatdev="$(basename "$device")" _xfs_metadump "$seqres.$flatdev.check.md" "$device" "$logdev" \ - compress >> $seqres.full + "$rtdev" compress >> $seqres.full fi # Optionally test the index rebuilding behavior. @@ -928,7 +958,7 @@ _check_xfs_filesystem() if [ "$rebuild_ok" -ne 1 ] && [ "$DUMP_CORRUPT_FS" = "1" ]; then local flatdev="$(basename "$device")" _xfs_metadump "$seqres.$flatdev.rebuild.md" "$device" \ - "$logdev" compress >> $seqres.full + "$logdev" "$rtdev" compress >> $seqres.full fi fi @@ -1009,7 +1039,7 @@ _check_xfs_filesystem() if [ "$orebuild_ok" -ne 1 ] && [ "$DUMP_CORRUPT_FS" = "1" ]; then local flatdev="$(basename "$device")" _xfs_metadump "$seqres.$flatdev.orebuild.md" "$device" \ - "$logdev" compress >> $seqres.full + "$logdev" "$rtdev" compress >> $seqres.full fi fi ^ permalink raw reply related [flat|nested] 565+ messages in thread
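The option assembly in `_xfs_metadump` after this patch can be read as follows: add `-l` for an external log, add `-R` for a realtime device if the tool supports it, and add `-x` at most once. A minimal sketch of that flow is below, with the capability probes replaced by plain arguments. `build_metadump_options` is a hypothetical name and the device paths are placeholders.

```shell
# Build a metadump option string. has_x and has_R are non-empty when the
# tool is assumed to support the -x and -R switches respectively.
build_metadump_options() {
	local logdev="$1" rtdev="$2" has_x="$3" has_R="$4"
	local options="-a -o"

	if [ "$logdev" != "none" ]; then
		options="$options -l $logdev"
		if [ -n "$has_x" ]; then
			options="$options -x"
			has_x=""	# already added, do not add it twice
		fi
	fi

	if [ "$rtdev" != "none" ] && [ -n "$has_R" ]; then
		options="$options -R $rtdev"
		if [ -n "$has_x" ]; then
			options="$options -x"
			has_x=""
		fi
	fi

	echo "$options"
}

build_metadump_options /dev/fake-log /dev/fake-rt 1 1
```

The example call prints `-a -o -l /dev/fake-log -x -R /dev/fake-rt`. Clearing `has_x` after the first use plays the same role as the `unset metadump_has_dash_x` lines in the patch.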
* [PATCH 05/12] common: filter rtgroups when we're disabling metadir 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 11/12] common/xfs: capture realtime devices during metadump/mdrestore Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 08/12] xfs/122: update for rtbitmap headers Darrick J. Wong ` (4 subsequent siblings) 11 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> If we're forcing a filesystem to be created without the metadir feature, we should forcibly disable rtgroups as well. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/xfs | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/common/xfs b/common/xfs index 0d1e0ec4bc..ccdcf45d0d 100644 --- a/common/xfs +++ b/common/xfs @@ -1821,6 +1821,10 @@ _scratch_xfs_find_metafile() # Force metadata directories off. _scratch_xfs_force_no_metadir() { + if echo "$MKFS_OPTIONS" | grep -q 'rtgroups='; then + MKFS_OPTIONS="$(echo "$MKFS_OPTIONS" | sed -e 's/rtgroups=\([01]\)/rtgroups=0/g')" + fi + if echo "$MKFS_OPTIONS" | grep -q 'metadir='; then MKFS_OPTIONS="$(echo "$MKFS_OPTIONS" | sed -e 's/metadir=\([01]\)/metadir=0/g')" return ^ permalink raw reply related [flat|nested] 565+ messages in thread
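The rewrite of MKFS_OPTIONS in this patch is a plain sed substitution that turns an explicit `rtgroups=1` into `rtgroups=0`. A small sketch of the same idea, generalized over the option name, is below. `force_option_off` is a hypothetical helper and the option string in the example is illustrative only; it is not claimed to be a valid mkfs.xfs command line.

```shell
# Rewrite every "<name>=0" or "<name>=1" in an option string to "<name>=0".
# Other options are passed through untouched.
force_option_off() {
	local opts="$1" name="$2"

	echo "$opts" | sed -e "s/$name=[01]/$name=0/g"
}

# Illustrative option string only.
force_option_off "-r rtgroups=1 -m metadir=1" rtgroups
```

The example prints `-r rtgroups=0 -m metadir=1`. Note that, like the patch, this only flips an option that is already spelled out; it does not append one that is missing.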
* [PATCH 08/12] xfs/122: update for rtbitmap headers 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 05/12] common: filter rtgroups when we're disabling metadir Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 06/12] xfs/185: update for rtgroups Darrick J. Wong ` (3 subsequent siblings) 11 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/122.out | 1 + 1 file changed, 1 insertion(+) diff --git a/tests/xfs/122.out b/tests/xfs/122.out index 01376180cc..336618cf7a 100644 --- a/tests/xfs/122.out +++ b/tests/xfs/122.out @@ -115,6 +115,7 @@ sizeof(struct xfs_refcount_key) = 4 sizeof(struct xfs_refcount_rec) = 12 sizeof(struct xfs_rmap_key) = 20 sizeof(struct xfs_rmap_rec) = 24 +sizeof(struct xfs_rtbuf_blkinfo) = 48 sizeof(struct xfs_rtgroup_geometry) = 128 sizeof(struct xfs_rtrmap_key) = 24 sizeof(struct xfs_rtrmap_rec) = 32 ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 06/12] xfs/185: update for rtgroups 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 08/12] xfs/122: update for rtbitmap headers Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 04/12] common: pass the realtime device to xfs_db when possible Darrick J. Wong ` (2 subsequent siblings) 11 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Send the fallocate results to seqres.full, since it doesn't matter if the call fails as long as we get the layout that we wanted. This test already has code to check the layout, so there's no point in failing on random ENOSPC errors. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/185 | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tests/xfs/185 b/tests/xfs/185 index abeb052580..04770fd6c9 100755 --- a/tests/xfs/185 +++ b/tests/xfs/185 @@ -100,7 +100,7 @@ test "$ddbytes" -lt "$((rtbytes + (10 * rtextsize) ))" || \ # easy because fallocate for the first rt file always starts allocating at # physical offset zero. alloc_rtx="$((rtbytes / rtextsize))" -$XFS_IO_PROG -c "falloc 0 $((alloc_rtx * rtextsize))" $rtfile +$XFS_IO_PROG -c "falloc 0 $((alloc_rtx * rtextsize))" $rtfile &>> $seqres.full expected_end="$(( (alloc_rtx * rtextsize - 1) / 512 ))" ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 04/12] common: pass the realtime device to xfs_db when possible 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 06/12] xfs/185: update for rtgroups Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 09/12] xfs/122: udpate test to pick up rtword/suminfo ondisk unions Darrick J. Wong 2022-12-30 22:20 ` [PATCH 12/12] common/fuzzy: adapt the scrub stress tests to support rtgroups Darrick J. Wong 11 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Teach xfstests to pass the realtime device to xfs_db when it supports that option. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/xfs | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/common/xfs b/common/xfs index 36e02413db..0d1e0ec4bc 100644 --- a/common/xfs +++ b/common/xfs @@ -281,10 +281,10 @@ _xfs_check() { OPTS=" " DBOPTS=" " - USAGE="Usage: xfs_check [-fsvV] [-l logdev] [-i ino]... [-b bno]... special" + USAGE="Usage: xfs_check [-fsvV] [-l logdev] [-R rtdev] [-i ino]... [-b bno]... special" OPTIND=1 - while getopts "b:fi:l:stvV" c; do + while getopts "b:fi:l:stvVR:" c; do case $c in s) OPTS=$OPTS"-s ";; t) OPTS=$OPTS"-t ";; @@ -296,12 +296,14 @@ _xfs_check() V) $XFS_DB_PROG -p xfs_check -V return $? ;; + R) DBOPTS="$DBOPTS -R $OPTARG";; esac done set -- extra $@ shift $OPTIND case $# in - 1) ${XFS_DB_PROG}${DBOPTS} -F -i -p xfs_check -c "check$OPTS" $1 + 1) echo "${XFS_DB_PROG}${DBOPTS} -F -i -p xfs_check -c check$OPTS $1" >> /dev/ttyprintk + ${XFS_DB_PROG}${DBOPTS} -F -i -p xfs_check -c "check$OPTS" $1 status=$? ;; 2) echo $USAGE 1>&1 @@ -339,6 +341,11 @@ _scratch_xfs_db_options() SCRATCH_OPTIONS="" [ "$USE_EXTERNAL" = yes -a ! 
-z "$SCRATCH_LOGDEV" ] && \ SCRATCH_OPTIONS="-l$SCRATCH_LOGDEV" + if [ "$USE_EXTERNAL" = yes ] && [ ! -z "$SCRATCH_RTDEV" ]; then + $XFS_DB_PROG --help 2>&1 | grep -q -- '-R rtdev' || \ + _notrun 'xfs_db does not support rt devices' + SCRATCH_OPTIONS="$SCRATCH_OPTIONS -R$SCRATCH_RTDEV" + fi echo $SCRATCH_OPTIONS $* $SCRATCH_DEV } @@ -403,6 +410,11 @@ _scratch_xfs_check() SCRATCH_OPTIONS="-l $SCRATCH_LOGDEV" [ "$LARGE_SCRATCH_DEV" = yes ] && \ SCRATCH_OPTIONS=$SCRATCH_OPTIONS" -t" + if [ "$USE_EXTERNAL" = yes ] && [ ! -z "$SCRATCH_RTDEV" ]; then + $XFS_DB_PROG --help 2>&1 | grep -q -- '-R rtdev' || \ + _notrun 'xfs_db does not support rt devices' + SCRATCH_OPTIONS="$SCRATCH_OPTIONS -R$SCRATCH_RTDEV" + fi _xfs_check $SCRATCH_OPTIONS $* $SCRATCH_DEV } ^ permalink raw reply related [flat|nested] 565+ messages in thread
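The option-building logic in this patch can be sketched in isolation as follows. This is a minimal illustration only, not the fstests code itself: fake_xfs_db_help is a hypothetical stand-in for "$XFS_DB_PROG --help" (the real usage text may differ), build_db_options is an invented name, and the sketch returns an error where the patch calls _notrun.

```shell
#!/bin/sh
# Hypothetical stand-in for "$XFS_DB_PROG --help"; the usage text is assumed.
fake_xfs_db_help() {
    echo "Usage: xfs_db [-ifFrxV] [-p prog] [-l logdev] [-R rtdev] [-c cmd]... device"
}

# Build the device options roughly the way _scratch_xfs_db_options does
# after this patch: add -R<rtdev> only when external devices are in use,
# an rt device is configured, and the tool advertises '-R rtdev'.
build_db_options() {
    use_external="$1"
    logdev="$2"
    rtdev="$3"
    opts=""

    if [ "$use_external" = yes ] && [ -n "$logdev" ]; then
        opts="-l$logdev"
    fi
    if [ "$use_external" = yes ] && [ -n "$rtdev" ]; then
        # Feature detection by grepping the help text, as in the patch.
        if ! fake_xfs_db_help 2>&1 | grep -q -- '-R rtdev'; then
            echo "xfs_db does not support rt devices" >&2
            return 1
        fi
        opts="$opts -R$rtdev"
    fi
    # Unquoted on purpose so a leading blank is dropped.
    echo $opts
}

build_db_options yes /dev/sdb1 /dev/sdc1
```

The "--" before the pattern matters: without it, grep would try to parse '-R rtdev' as one of its own options.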
* [PATCH 09/12] xfs/122: update test to pick up rtword/suminfo ondisk unions 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong ` (9 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 04/12] common: pass the realtime device to xfs_db when possible Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 12/12] common/fuzzy: adapt the scrub stress tests to support rtgroups Darrick J. Wong 11 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Update this test to check that the ondisk unions for rt bitmap word and rt summary counts are always the correct size. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/122 | 2 +- tests/xfs/122.out | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/tests/xfs/122 b/tests/xfs/122 index e616f1987d..fe6c1e36e9 100755 --- a/tests/xfs/122 +++ b/tests/xfs/122 @@ -187,7 +187,7 @@ echo 'int main(int argc, char *argv[]) {' >>$cprog # cat /usr/include/xfs/xfs*.h | indent |\ _attribute_filter |\ -grep -E '(} *xfs_.*_t|^struct xfs_[a-z0-9_]*$)' |\ +grep -E '(} *xfs_.*_t|^(union|struct) xfs_[a-z0-9_]*$)' |\ grep -E -v -f $tmp.ignore |\ sed -e 's/^.*}[[:space:]]*//g' -e 's/;.*$//g' -e 's/_t, /_t\n/g' |\ sort | uniq |\ diff --git a/tests/xfs/122.out b/tests/xfs/122.out index 336618cf7a..1379c7b3b5 100644 --- a/tests/xfs/122.out +++ b/tests/xfs/122.out @@ -128,6 +128,8 @@ sizeof(struct xfs_swap_extent) = 64 sizeof(struct xfs_sxd_log_format) = 16 sizeof(struct xfs_sxi_log_format) = 80 sizeof(struct xfs_unmount_log_format) = 8 +sizeof(union xfs_rtword_ondisk) = 4 +sizeof(union xfs_suminfo_ondisk) = 4 sizeof(xfs_agf_t) = 224 sizeof(xfs_agfl_t) = 36 sizeof(xfs_agi_t) = 344 ^ permalink raw reply related [flat|nested] 565+ messages in thread
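The effect of the widened pattern can be checked on a few sample declaration lines. This is a rough sketch: the input lines are made up for illustration, filter_decls is a hypothetical wrapper name, and it assumes that the "indent" stage of the real pipeline leaves each "union xfs_..." tag alone on its own line.

```shell
#!/bin/sh
# The pattern from the patch, wrapped in a helper for reuse.
filter_decls() {
    grep -E '(} *xfs_.*_t|^(union|struct) xfs_[a-z0-9_]*$)'
}

# Made-up sample input: two bare tags, one typedef tail, and two lines
# that should be rejected (trailing brace, unrelated declaration).
printf '%s\n' \
    'struct xfs_rmap_key' \
    'union xfs_rtword_ondisk' \
    '} xfs_agf_t;' \
    'union xfs_rtword_ondisk {' \
    'int foo;' | filter_decls
```

Only the first three lines survive the filter. With the old pattern, which anchored on "^struct" alone, the bare union tag would have been dropped and its sizeof() line would never have reached the generated C program.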
* [PATCH 12/12] common/fuzzy: adapt the scrub stress tests to support rtgroups 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong ` (10 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 09/12] xfs/122: udpate test to pick up rtword/suminfo ondisk unions Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 11 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Adapt the scrub stress testing framework to support checking realtime group metadata. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/fuzzy | 27 ++++++++++++++++++++++----- common/xfs | 9 +++++++++ tests/xfs/800 | 2 +- tests/xfs/840 | 2 +- tests/xfs/841 | 2 +- 5 files changed, 34 insertions(+), 8 deletions(-) diff --git a/common/fuzzy b/common/fuzzy index 7f96384402..baba05de0a 100644 --- a/common/fuzzy +++ b/common/fuzzy @@ -817,8 +817,10 @@ __stress_one_scrub_loop() { local scrub_tgt="$3" local scrub_startat="$4" local start_agno="$5" - shift; shift; shift; shift; shift + local start_rgno="$6" + shift; shift; shift; shift; shift; shift local agcount="$(_xfs_mount_agcount $SCRATCH_MNT)" + local rgcount="$(_xfs_mount_rgcount $SCRATCH_MNT)" local xfs_io_args=() for arg in "$@"; do @@ -831,6 +833,12 @@ __stress_one_scrub_loop() { local ag_arg="$(echo "$arg" | sed -e "s|%agno%|$agno|g")" xfs_io_args+=('-c' "$ag_arg") done + elif echo "$arg" | grep -q -w '%rgno%'; then + # Substitute the rtgroup number + for ((rgno = start_rgno; rgno < rgcount; rgno++)); do + local rg_arg="$(echo "$arg" | sed -e "s|%rgno%|$rgno|g")" + xfs_io_args+=('-c' "$rg_arg") + done else xfs_io_args+=('-c' "$arg") fi @@ -1201,7 +1209,9 @@ _scratch_xfs_stress_scrub_cleanup() { __stress_scrub_check_commands() { local scrub_tgt="$1" local start_agno="$2" - shift; shift + local start_rgno="$3" + shift; shift; shift + local 
rgcount="$(_xfs_mount_rgcount $SCRATCH_MNT)" local cooked_tgt="$scrub_tgt" case "$scrub_tgt" in @@ -1231,6 +1241,10 @@ __stress_scrub_check_commands() { cooked_arg="$(echo "$cooked_arg" | sed -e 's/^repair/repair -R/g')" fi cooked_arg="$(echo "$cooked_arg" | sed -e "s/%agno%/$start_agno/g")" + if echo "$cooked_arg" | grep -q -w '%rgno%'; then + test "$rgcount" -eq 0 && continue + cooked_arg="$(echo "$cooked_arg" | sed -e "s/%rgno%/$start_rgno/g")" + fi testio=`$XFS_IO_PROG -x -c "$cooked_arg" "$cooked_tgt" 2>&1` echo $testio | grep -q "Unknown type" && \ _notrun "xfs_io scrub subcommand support is missing" @@ -1256,6 +1270,7 @@ __stress_scrub_check_commands() { # in a separate loop. If zero -i options are specified, do not run. # Callers must check each of these commands (via _require_xfs_io_command) # before calling here. +# -R For %rgno% substitution, start with this rtgroup instead of rtgroup 0. # -r Run fsstress for this amount of time, then remount the fs ro or rw. # The default is to run fsstress continuously with no remount, unless # XFS_SCRUB_STRESS_REMOUNT_PERIOD is set. @@ -1301,6 +1316,7 @@ _scratch_xfs_stress_scrub() { local remount_period="${XFS_SCRUB_STRESS_REMOUNT_PERIOD}" local stress_tgt="${XFS_SCRUB_STRESS_TARGET:-default}" local start_agno=0 + local start_rgno=0 __SCRUB_STRESS_FREEZE_PID="" __SCRUB_STRESS_REMOUNT_LOOP="" @@ -1308,12 +1324,13 @@ _scratch_xfs_stress_scrub() { touch "$runningfile" OPTIND=1 - while getopts "a:fi:r:s:S:t:w:x:X:" c; do + while getopts "a:fi:r:R:s:S:t:w:x:X:" c; do case "$c" in a) start_agno="$OPTARG";; f) freeze=yes;; i) io_args+=("$OPTARG");; r) remount_period="$OPTARG";; + R) start_rgno="$OPTARG";; s) one_scrub_args+=("$OPTARG");; S) xfs_scrub_args+=("$OPTARG");; t) scrub_tgt="$OPTARG";; @@ -1324,7 +1341,7 @@ _scratch_xfs_stress_scrub() { esac done - __stress_scrub_check_commands "$scrub_tgt" "$start_agno" \ + __stress_scrub_check_commands "$scrub_tgt" "$start_agno" "$start_rgno" \ "${one_scrub_args[@]}" if ! 
command -v "__stress_scrub_${exerciser}_loop" &>/dev/null; then @@ -1372,7 +1389,7 @@ _scratch_xfs_stress_scrub() { if [ "${#one_scrub_args[@]}" -gt 0 ]; then __stress_one_scrub_loop "$end" "$runningfile" "$scrub_tgt" \ - "$scrub_startat" "$start_agno" \ + "$scrub_startat" "$start_agno" "$start_rgno" \ "${one_scrub_args[@]}" & fi diff --git a/common/xfs b/common/xfs index a37284068f..f451dfb8ae 100644 --- a/common/xfs +++ b/common/xfs @@ -1524,6 +1524,15 @@ _xfs_mount_agcount() $XFS_INFO_PROG "$1" | sed -n "s/^.*agcount=\([[:digit:]]*\).*/\1/p" } +# Find rtgroup count of mounted filesystem +_xfs_mount_rgcount() +{ + local rtgroups="$($XFS_INFO_PROG "$1" | grep rgcount= | sed -e 's/^.*rgcount=\([0-9]*\).*$/\1/g')" + + test -z "$rtgroups" && rtgroups=0 + echo "$rtgroups" +} + # Wipe the superblock of each XFS AGs _try_wipe_scratch_xfs() { diff --git a/tests/xfs/800 b/tests/xfs/800 index cbcfb5f5a6..a2542be5ec 100755 --- a/tests/xfs/800 +++ b/tests/xfs/800 @@ -32,7 +32,7 @@ _require_xfs_stress_scrub _scratch_mkfs > "$seqres.full" 2>&1 _scratch_mount _require_xfs_has_feature "$SCRATCH_MNT" realtime -_scratch_xfs_stress_scrub -s "scrub rtbitmap" +_scratch_xfs_stress_scrub -s "scrub rtbitmap" -s "scrub rgbitmap %rgno%" # success, all done echo Silence is golden diff --git a/tests/xfs/840 b/tests/xfs/840 index fff41c5b8a..b9ed0a55b3 100755 --- a/tests/xfs/840 +++ b/tests/xfs/840 @@ -39,7 +39,7 @@ alloc_unit=$(_get_file_block_size $SCRATCH_MNT) scratchfile=$SCRATCH_MNT/file touch $scratchfile $XFS_IO_PROG -x -c 'inject force_repair' $SCRATCH_MNT -__stress_scrub_check_commands "$scratchfile" "" 'repair bmapbtd' +__stress_scrub_check_commands "$scratchfile" "" "" 'repair bmapbtd' # Compute the number of extent records needed to guarantee btree format, # assuming 16 bytes for each ondisk extent record diff --git a/tests/xfs/841 b/tests/xfs/841 index f743454971..5831961f5f 100755 --- a/tests/xfs/841 +++ b/tests/xfs/841 @@ -39,7 +39,7 @@ scratchfile=$SCRATCH_MNT/file mkdir 
$scratchdir touch $scratchfile $XFS_IO_PROG -x -c 'inject force_repair' $SCRATCH_MNT -__stress_scrub_check_commands "$scratchdir" "" 'repair directory' +__stress_scrub_check_commands "$scratchdir" "" "" 'repair directory' # Create a 2-dirblock directory total_size=$((alloc_unit * 2)) ^ permalink raw reply related [flat|nested] 565+ messages in thread
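The two pieces this patch adds, counting rtgroups and expanding %rgno% into one scrub command per group, can be sketched on their own. This is a simplified illustration under stated assumptions: parse_rgcount reads xfs_info-style text from stdin instead of running $XFS_INFO_PROG, expand_rgno prints one command per line instead of appending to the xfs_io_args array, and both function names are invented here.

```shell
#!/bin/sh
# Extract rgcount from xfs_info-style text on stdin; print 0 when the
# field is absent (for example, a filesystem without rtgroups).
parse_rgcount() {
    rtgroups="$(grep rgcount= | sed -e 's/^.*rgcount=\([0-9]*\).*$/\1/g')"
    test -z "$rtgroups" && rtgroups=0
    echo "$rtgroups"
}

# Expand one scrub argument. If it contains the %rgno% token, emit one
# copy per rtgroup from start_rgno up to rgcount - 1; otherwise pass it
# through unchanged.
expand_rgno() {
    arg="$1"
    start_rgno="$2"
    rgcount="$3"

    if echo "$arg" | grep -q -w '%rgno%'; then
        rgno="$start_rgno"
        while [ "$rgno" -lt "$rgcount" ]; do
            echo "$arg" | sed -e "s|%rgno%|$rgno|g"
            rgno=$((rgno + 1))
        done
    else
        echo "$arg"
    fi
}

expand_rgno 'scrub rtbitmap' 0 3
expand_rgno 'scrub rgbitmap %rgno%' 0 3
```

With an rgcount of 0 the loop body never runs, so a %rgno% command expands to nothing. That matches the patch's choice to skip such commands in __stress_scrub_check_commands when the filesystem has no rtgroups.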
* [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (35 preceding siblings ...) 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 03/13] xfs: race fsstress with realtime rmap btree scrub and repair Darrick J. Wong ` (12 more replies) 2022-12-30 22:20 ` [PATCHSET v1.0 00/10] fstests: reflink on the realtime device Darrick J. Wong ` (2 subsequent siblings) 39 siblings, 13 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan Hi all, Fix a few regressions in fstests when rmap is enabled on the realtime volume. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. 
--D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-rmap xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-rmap fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-rmap xfsdocs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=realtime-rmap --- common/populate | 36 +++++++++++++++++++++++++++++++++++- common/xfs | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++ tests/xfs/104 | 1 + tests/xfs/122.out | 5 ++--- tests/xfs/1528 | 41 +++++++++++++++++++++++++++++++++++++++++ tests/xfs/1528.out | 4 ++++ tests/xfs/1529 | 40 ++++++++++++++++++++++++++++++++++++++++ tests/xfs/1529.out | 4 ++++ tests/xfs/272 | 2 +- tests/xfs/276 | 2 +- tests/xfs/277 | 2 +- tests/xfs/291 | 3 ++- tests/xfs/332 | 6 +----- tests/xfs/332.out | 2 -- tests/xfs/333 | 45 --------------------------------------------- tests/xfs/333.out | 6 ------ tests/xfs/337 | 2 +- tests/xfs/338 | 21 ++++++++++++++++----- tests/xfs/339 | 5 +++-- tests/xfs/340 | 15 ++++++++++----- tests/xfs/341 | 12 ++++-------- tests/xfs/341.out | 1 - tests/xfs/342 | 4 ++-- tests/xfs/343 | 2 ++ tests/xfs/406 | 6 ++++-- tests/xfs/407 | 6 ++++-- tests/xfs/408 | 7 +++++-- tests/xfs/409 | 7 +++++-- tests/xfs/443 | 9 +++++---- tests/xfs/481 | 6 ++++-- tests/xfs/482 | 7 +++++-- tests/xfs/769 | 29 ++++++++++++++++++++++++++++- tests/xfs/781 | 42 ++++++++++++++++++++++++++++++++++++++++++ tests/xfs/781.out | 2 ++ tests/xfs/817 | 42 ++++++++++++++++++++++++++++++++++++++++++ tests/xfs/817.out | 2 ++ tests/xfs/821 | 42 ++++++++++++++++++++++++++++++++++++++++++ tests/xfs/821.out | 2 ++ 38 files changed, 414 insertions(+), 107 deletions(-) create mode 100755 tests/xfs/1528 create mode 100644 tests/xfs/1528.out create mode 100755 tests/xfs/1529 create mode 100644 tests/xfs/1529.out delete mode 100755 tests/xfs/333 delete mode 100644 
tests/xfs/333.out create mode 100755 tests/xfs/781 create mode 100644 tests/xfs/781.out create mode 100755 tests/xfs/817 create mode 100644 tests/xfs/817.out create mode 100755 tests/xfs/821 create mode 100644 tests/xfs/821.out ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 03/13] xfs: race fsstress with realtime rmap btree scrub and repair 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 04/13] xfs/769: add rtrmapbt upgrade to test matrix Darrick J. Wong ` (11 subsequent siblings) 12 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Race checking and rebuilding realtime rmap btrees with fsstress. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/781 | 42 ++++++++++++++++++++++++++++++++++++++++++ tests/xfs/781.out | 2 ++ tests/xfs/817 | 42 ++++++++++++++++++++++++++++++++++++++++++ tests/xfs/817.out | 2 ++ tests/xfs/821 | 42 ++++++++++++++++++++++++++++++++++++++++++ tests/xfs/821.out | 2 ++ 6 files changed, 132 insertions(+) create mode 100755 tests/xfs/781 create mode 100644 tests/xfs/781.out create mode 100755 tests/xfs/817 create mode 100644 tests/xfs/817.out create mode 100755 tests/xfs/821 create mode 100644 tests/xfs/821.out diff --git a/tests/xfs/781 b/tests/xfs/781 new file mode 100755 index 0000000000..938777952f --- /dev/null +++ b/tests/xfs/781 @@ -0,0 +1,42 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2022 Oracle. Inc. All Rights Reserved. +# +# FS QA Test No. 781 +# +# Race fsstress and rtrmapbt repair for a while to see if we crash or livelock. +# +. ./common/preamble +_begin_fstest online_repair dangerous_fsstress_repair + +_cleanup() { + _scratch_xfs_stress_scrub_cleanup &> /dev/null + cd / + rm -r -f $tmp.* +} +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/fuzzy +. ./common/inject +. 
./common/xfs + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_scratch +_require_xfs_stress_online_repair + +_scratch_mkfs > "$seqres.full" 2>&1 +_scratch_mount +_require_xfs_has_feature "$SCRATCH_MNT" realtime +_require_xfs_has_feature "$SCRATCH_MNT" rmapbt +_xfs_force_bdev realtime $SCRATCH_MNT + +_scratch_xfs_stress_online_repair -s "repair rtrmapbt %rgno%" + +# success, all done +echo Silence is golden +status=0 +exit diff --git a/tests/xfs/781.out b/tests/xfs/781.out new file mode 100644 index 0000000000..e7f74cf644 --- /dev/null +++ b/tests/xfs/781.out @@ -0,0 +1,2 @@ +QA output created by 781 +Silence is golden diff --git a/tests/xfs/817 b/tests/xfs/817 new file mode 100755 index 0000000000..88d0a18e8d --- /dev/null +++ b/tests/xfs/817 @@ -0,0 +1,42 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2022 Oracle. Inc. All Rights Reserved. +# +# FS QA Test No. 817 +# +# Race fsstress and rtrmapbt scrub for a while to see if we crash or livelock. +# +. ./common/preamble +_begin_fstest scrub dangerous_fsstress_scrub + +_cleanup() { + _scratch_xfs_stress_scrub_cleanup &> /dev/null + cd / + rm -r -f $tmp.* +} +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/fuzzy +. ./common/inject +. 
./common/xfs + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_scratch +_require_xfs_stress_scrub + +_scratch_mkfs > "$seqres.full" 2>&1 +_scratch_mount +_require_xfs_has_feature "$SCRATCH_MNT" realtime +_require_xfs_has_feature "$SCRATCH_MNT" rmapbt +_xfs_force_bdev realtime $SCRATCH_MNT + +_scratch_xfs_stress_scrub -s "scrub rtrmapbt %rgno%" + +# success, all done +echo Silence is golden +status=0 +exit diff --git a/tests/xfs/817.out b/tests/xfs/817.out new file mode 100644 index 0000000000..86920a4fc6 --- /dev/null +++ b/tests/xfs/817.out @@ -0,0 +1,2 @@ +QA output created by 817 +Silence is golden diff --git a/tests/xfs/821 b/tests/xfs/821 new file mode 100755 index 0000000000..45b999e3b5 --- /dev/null +++ b/tests/xfs/821 @@ -0,0 +1,42 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2022 Oracle. Inc. All Rights Reserved. +# +# FS QA Test No. 821 +# +# Race fsstress and realtime bitmap repair for a while to see if we crash or +# livelock. +# +. ./common/preamble +_begin_fstest online_repair dangerous_fsstress_repair + +_cleanup() { + _scratch_xfs_stress_scrub_cleanup &> /dev/null + cd / + rm -r -f $tmp.* +} +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/fuzzy +. ./common/inject +. ./common/xfs + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_scratch +_require_xfs_stress_online_repair + +_scratch_mkfs > "$seqres.full" 2>&1 +_scratch_mount +_require_xfs_has_feature "$SCRATCH_MNT" realtime +_xfs_force_bdev realtime $SCRATCH_MNT + +_scratch_xfs_stress_online_repair -s "repair rtbitmap" -s "repair rgbitmap %rgno%" + +# success, all done +echo Silence is golden +status=0 +exit diff --git a/tests/xfs/821.out b/tests/xfs/821.out new file mode 100644 index 0000000000..17994b8627 --- /dev/null +++ b/tests/xfs/821.out @@ -0,0 +1,2 @@ +QA output created by 821 +Silence is golden ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 04/13] xfs/769: add rtrmapbt upgrade to test matrix 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong 2022-12-30 22:20 ` [PATCH 03/13] xfs: race fsstress with realtime rmap btree scrub and repair Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 05/13] xfs/122: update for rtgroups-based realtime rmap btrees Darrick J. Wong ` (10 subsequent siblings) 12 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Add realtime reverse mapping btrees to the features that this test will try to upgrade. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/769 | 29 ++++++++++++++++++++++++++++- 1 file changed, 28 insertions(+), 1 deletion(-) diff --git a/tests/xfs/769 b/tests/xfs/769 index 624dd2a338..ccc3ea10bc 100755 --- a/tests/xfs/769 +++ b/tests/xfs/769 @@ -35,11 +35,36 @@ rt_configured() test "$USE_EXTERNAL" = "yes" && test -n "$SCRATCH_RTDEV" } +need_rtgroups() +{ + local feat="$1" + + # if realtime isn't configured, we don't need rt groups + rt_configured || return 1 + + # rt rmap btrees require rt groups but rt groups cannot be added to + # an existing filesystem, so we must force it on at mkfs time + test "${FEATURE_STATE["rmapbt"]}" -eq 1 && return 0 + test "$feat" = "rmapbt" && return 0 + + return 1 +} + # Compute the MKFS_OPTIONS string for a particular feature upgrade test compute_mkfs_options() { + local feat="$1" local m_opts="" local caller_options="$MKFS_OPTIONS" + local rtgroups + + need_rtgroups "$feat" && rtgroups=1 + if echo "$caller_options" | grep -q 'rtgroups='; then + test -z "$rtgroups" && rtgroups=0 + caller_options="$(echo "$caller_options" | sed -e 's/rtgroups=*[0-9]*/rtgroups='$rtgroups'/g')" + elif [ -n "$rtgroups" ]; then + caller_options="$caller_options -r rtgroups=$rtgroups" + fi for feat in "${FEATURES[@]}"; do 
local feat_state="${FEATURE_STATE["${feat}"]}" @@ -171,10 +196,12 @@ function post_exercise() # upgrade don't spread failure to the rest of the tests. FEATURES=() if rt_configured; then + # rmap wasn't added to rt devices until after metadir check_repair_upgrade finobt && FEATURES+=("finobt") check_repair_upgrade inobtcount && FEATURES+=("inobtcount") check_repair_upgrade bigtime && FEATURES+=("bigtime") check_repair_upgrade metadir && FEATURES+=("metadir") + check_repair_upgrade rmapbt && FEATURES+=("rmapbt") else check_repair_upgrade finobt && FEATURES+=("finobt") check_repair_upgrade rmapbt && FEATURES+=("rmapbt") @@ -197,7 +224,7 @@ for feat in "${FEATURES[@]}"; do upgrade_start_message "$feat" | tee -a $seqres.full /dev/ttyprintk > /dev/null - opts="$(compute_mkfs_options)" + opts="$(compute_mkfs_options "$feat")" echo "mkfs.xfs $opts" >> $seqres.full # Format filesystem ^ permalink raw reply related [flat|nested] 565+ messages in thread
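The rtgroups handling added to compute_mkfs_options can be read as a small pure function over the option string. The sketch below pulls out just that part under a hypothetical name, set_rtgroups_opt, with the "do we need rtgroups" decision passed in as the second argument ("1" or empty) rather than computed by need_rtgroups.

```shell
#!/bin/sh
# Rewrite or append the rtgroups= mkfs option.
#   $1: caller-supplied mkfs options
#   $2: "1" if rtgroups must be forced on, empty otherwise
set_rtgroups_opt() {
    caller_options="$1"
    rtgroups="$2"

    if echo "$caller_options" | grep -q 'rtgroups='; then
        # The caller already mentions rtgroups: override the value,
        # defaulting to 0 when the feature is not required.
        test -z "$rtgroups" && rtgroups=0
        caller_options="$(echo "$caller_options" | \
            sed -e 's/rtgroups=*[0-9]*/rtgroups='"$rtgroups"'/g')"
    elif [ -n "$rtgroups" ]; then
        # Not mentioned yet: append it only when it is required.
        caller_options="$caller_options -r rtgroups=$rtgroups"
    fi
    echo "$caller_options"
}

set_rtgroups_opt '-r rtgroups=1,extsize=64k' ''
set_rtgroups_opt '-m crc=1' '1'
```

The first call turns a caller-supplied rtgroups=1 into rtgroups=0, and the second appends "-r rtgroups=1". This follows the commit's reasoning that rt groups cannot be added to an existing filesystem, so they have to be decided at mkfs time.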
* [PATCH 05/13] xfs/122: update for rtgroups-based realtime rmap btrees 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong 2022-12-30 22:20 ` [PATCH 03/13] xfs: race fsstress with realtime rmap btree scrub and repair Darrick J. Wong 2022-12-30 22:20 ` [PATCH 04/13] xfs/769: add rtrmapbt upgrade to test matrix Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 07/13] xfs/341: update test for rtgroup-based rmap Darrick J. Wong ` (9 subsequent siblings) 12 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Now that we've redesigned realtime rmap to require that the rt section be sharded into allocation groups of no more than 2^31 blocks, we've reduced the size of the ondisk structures and therefore need to update this test. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/122.out | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/xfs/122.out b/tests/xfs/122.out index 53eff0027e..e0801f9660 100644 --- a/tests/xfs/122.out +++ b/tests/xfs/122.out @@ -116,8 +116,8 @@ sizeof(struct xfs_rmap_key) = 20 sizeof(struct xfs_rmap_rec) = 24 sizeof(struct xfs_rtbuf_blkinfo) = 48 sizeof(struct xfs_rtgroup_geometry) = 128 -sizeof(struct xfs_rtrmap_key) = 24 -sizeof(struct xfs_rtrmap_rec) = 32 +sizeof(struct xfs_rtrmap_key) = 20 +sizeof(struct xfs_rtrmap_rec) = 24 sizeof(struct xfs_rtrmap_root) = 4 sizeof(struct xfs_rtsb) = 104 sizeof(struct xfs_rud_log_format) = 16 ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 07/13] xfs/341: update test for rtgroup-based rmap 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 05/13] xfs/122: update for rtgroups-based realtime rmap btrees Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 06/13] xfs: fix various problems with fsmap detecting the data device Darrick J. Wong ` (8 subsequent siblings) 12 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Now that we're sharding the realtime volume into multiple allocation groups, update this test to reflect the new reality. The realtime rmap btree record and key sizes have shrunk, and we can't guarantee that a quick file write actually hits the same rt group as the one we fuzzed, so eliminate the file write test since we're really only curious if xfs_repair will fix the problem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/341 | 10 +++------- tests/xfs/341.out | 1 - 2 files changed, 3 insertions(+), 8 deletions(-) diff --git a/tests/xfs/341 b/tests/xfs/341 index 8861e751a9..561054f0bd 100755 --- a/tests/xfs/341 +++ b/tests/xfs/341 @@ -32,10 +32,10 @@ blksz="$(_get_block_size $SCRATCH_MNT)" rtextsz_blks=$((rtextsz / blksz)) # inode core size is at least 176 bytes; btree header is 56 bytes; -# rtrmap record is 32 bytes; and rtrmap key/pointer are 56 bytes. +# rtrmap record is 24 bytes; and rtrmap key/pointer are 48 bytes. 
i_core_size="$(_xfs_get_inode_core_bytes $SCRATCH_MNT)" -i_ptrs=$(( (isize - i_core_size) / 56 )) -bt_recs=$(( (blksz - 56) / 32 )) +i_ptrs=$(( (isize - i_core_size) / 48 )) +bt_recs=$(( (blksz - 56) / 24 )) blocks=$((i_ptrs * bt_recs + 1)) len=$((blocks * rtextsz)) @@ -57,10 +57,6 @@ _scratch_xfs_db -x -c 'path -m /realtime/0.rmap' \ -c "write u3.rtrmapbt.ptrs[1] $fsbno" -c 'p' >> $seqres.full _scratch_mount -echo "Try to create more files" -$XFS_IO_PROG -f -R -c "pwrite -S 0x68 0 9999" $SCRATCH_MNT/f5 >> $seqres.full 2>&1 -test -e $SCRATCH_MNT/f5 && echo "should not have been able to write f5" - echo "Repair fs" _scratch_unmount 2>&1 | _filter_scratch _repair_scratch_fs >> $seqres.full 2>&1 diff --git a/tests/xfs/341.out b/tests/xfs/341.out index 75a5bc6c61..580d788954 100644 --- a/tests/xfs/341.out +++ b/tests/xfs/341.out @@ -2,6 +2,5 @@ QA output created by 341 Format and mount Create some files Corrupt fs -Try to create more files Repair fs Try to create more files (again) ^ permalink raw reply related [flat|nested] 565+ messages in thread
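To see what the size change does to the test's arithmetic, here is the same calculation run with sample numbers. The inode size (512), inode core size (176), and block size (4096) are assumptions picked for illustration; the real test queries them from the scratch filesystem. rtrmap_blocks is a hypothetical helper name.

```shell
#!/bin/sh
# Number of rt extents to map, following the formula in xfs/341:
# one more than (pointers in the inode root) * (records per leaf).
#   $1 inode size       $2 inode core size   $3 block size
#   $4 key/pointer size $5 record size
rtrmap_blocks() {
    isize="$1"
    i_core_size="$2"
    blksz="$3"
    keyptr="$4"
    rec="$5"

    i_ptrs=$(( (isize - i_core_size) / keyptr ))
    bt_recs=$(( (blksz - 56) / rec ))   # 56-byte btree block header
    echo $(( i_ptrs * bt_recs + 1 ))
}

# New sizes: 48-byte key/pointer, 24-byte record.
rtrmap_blocks 512 176 4096 48 24
# Old sizes: 56-byte key/pointer, 32-byte record.
rtrmap_blocks 512 176 4096 56 32
```

Under these assumed sizes the new layout gives 7 root pointers and 168 records per block, so 1177 extents, where the old layout gave 6 and 126, so 757. Smaller records mean the test has to write more extents to overflow the inode root.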
* [PATCH 06/13] xfs: fix various problems with fsmap detecting the data device 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 07/13] xfs/341: update test for rtgroup-based rmap Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 01/13] xfs: fix tests that try to access the realtime rmap inode Darrick J. Wong ` (7 subsequent siblings) 12 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Various tests of realtime rmap functionality assumed that the data device could be picked out from the GETFSMAP output by looking for static fs metadata. This is no longer true, since rtgroups have a static superblock header at the start, so update these tests. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/272 | 2 +- tests/xfs/276 | 2 +- tests/xfs/277 | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/tests/xfs/272 b/tests/xfs/272 index 7ed8b95122..42b4a2edb5 100755 --- a/tests/xfs/272 +++ b/tests/xfs/272 @@ -57,7 +57,7 @@ cat $TEST_DIR/bmap | while read ext offrange colon blockrange ag agrange total c done echo "Check device field of FS metadata and regular file" -data_dev=$(grep 'static fs metadata' $TEST_DIR/fsmap | head -n 1 | awk '{print $2}') +data_dev=$(grep 'inode btree' $TEST_DIR/fsmap | head -n 1 | awk '{print $2}') rt_dev=$(grep "${ino}[[:space:]]*[0-9]*\.\.[0-9]*" $TEST_DIR/fsmap | head -n 1 | awk '{print $2}') test "${data_dev}" = "${rt_dev}" || echo "data ${data_dev} realtime ${rt_dev}?" 
diff --git a/tests/xfs/276 b/tests/xfs/276 index 8cc486752a..a05ca1961d 100755 --- a/tests/xfs/276 +++ b/tests/xfs/276 @@ -61,7 +61,7 @@ cat $TEST_DIR/bmap | while read ext offrange colon blockrange ag agrange total c done echo "Check device field of FS metadata and realtime file" -data_dev=$(grep 'static fs metadata' $TEST_DIR/fsmap | head -n 1 | awk '{print $2}') +data_dev=$(grep 'inode btree' $TEST_DIR/fsmap | head -n 1 | awk '{print $2}') rt_dev=$(grep "${ino}[[:space:]]*[0-9]*\.\.[0-9]*[[:space:]]*[0-9]*$" $TEST_DIR/fsmap | head -n 1 | awk '{print $2}') test "${data_dev}" != "${rt_dev}" || echo "data ${data_dev} realtime ${rt_dev}?" diff --git a/tests/xfs/277 b/tests/xfs/277 index 03208ef233..eff54a2a50 100755 --- a/tests/xfs/277 +++ b/tests/xfs/277 @@ -38,7 +38,7 @@ $XFS_IO_PROG -c 'fsmap -v' $SCRATCH_MNT >> $seqres.full $XFS_IO_PROG -c 'fsmap -v' $SCRATCH_MNT | tr '[]()' ' ' > $TEST_DIR/fsmap echo "Check device field of FS metadata and journalling log" -data_dev=$(grep 'static fs metadata' $TEST_DIR/fsmap | head -n 1 | awk '{print $2}') +data_dev=$(grep 'inode btree' $TEST_DIR/fsmap | head -n 1 | awk '{print $2}') journal_dev=$(grep 'journalling log' $TEST_DIR/fsmap | head -n 1 | awk '{print $2}') test "${data_dev}" = "${journal_dev}" || echo "data ${data_dev} journal ${journal_dev}?" ^ permalink raw reply related [flat|nested] 565+ messages in thread
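The reason for switching the grep key can be shown on a toy fsmap listing. Everything in the sample below is invented for illustration: the device numbers, the block ranges, and in particular the ordering that puts the realtime device first. Only the owner strings and the tr/grep/head/awk pipeline come from the tests. first_dev and sample_fsmap are hypothetical names.

```shell
#!/bin/sh
# Invented "xfs_io -c 'fsmap -v'" style records. 8:32 plays the rt
# device, which now carries its own static superblock header; 8:16
# plays the data device.
sample_fsmap() {
cat <<'EOF'
 0: 8:32 [0..7]: static fs metadata 0 (0..7) 8
 1: 8:16 [0..7]: static fs metadata 0 (0..7) 8
 2: 8:16 [16..23]: inode btree 0 (16..23) 8
EOF
}

# Device field of the first record whose owner matches $1, using the
# same pipeline as the tests.
first_dev() {
    sample_fsmap | tr '[]()' ' ' | grep "$1" | head -n 1 | awk '{print $2}'
}

first_dev 'static fs metadata'
first_dev 'inode btree'
```

In this made-up listing the old key returns the rt device, while 'inode btree' can only match a data device record, presumably because inode btrees live only on the data device.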
* [PATCH 01/13] xfs: fix tests that try to access the realtime rmap inode 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 06/13] xfs: fix various problems with fsmap detecting the data device Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 02/13] fuzz: for fuzzing the rtrmapbt, find the path to the rt rmap btree file Darrick J. Wong ` (6 subsequent siblings) 12 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> The realtime rmap tests were added to fstests a long time ago. Since they were added, we decided to create a metadata file directory structure instead of adding more fields to the superblock. Therefore, fix all the tests that try to access these paths. While we're at it, fix xfs/409 to run the *online* scrub program like it's supposed to. xfs/408 is the fuzzer for xfs_repair testing. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/xfs | 18 ++++++++++++++++++ tests/xfs/122.out | 1 - tests/xfs/333 | 45 --------------------------------------------- tests/xfs/333.out | 6 ------ tests/xfs/337 | 2 +- tests/xfs/338 | 21 ++++++++++++++++----- tests/xfs/339 | 5 +++-- tests/xfs/340 | 15 ++++++++++----- tests/xfs/341 | 2 +- tests/xfs/342 | 4 ++-- 10 files changed, 51 insertions(+), 68 deletions(-) delete mode 100755 tests/xfs/333 delete mode 100644 tests/xfs/333.out diff --git a/common/xfs b/common/xfs index f451dfb8ae..aea2b678c8 100644 --- a/common/xfs +++ b/common/xfs @@ -1922,3 +1922,21 @@ _require_xfs_scratch_metadir() _scratch_unmount fi } + +# Resolve a metadata directory tree path and return the inode number. 
+_scratch_metadir_lookup() { + local res="$(_scratch_xfs_db -c "ls -i -m $1")" + test "${PIPESTATUS[0]}" -eq 0 && echo "$res" +} + +# Figure out which directory entry we have to change to update the rtrmap +# inode pointer. Assumes the /realtime directory is a short format dir. +_scratch_find_rt_metadir_entry() { + local sfkey="$(_scratch_xfs_db -c 'path -m /realtime' -c print | \ + grep "\"$1\"" | \ + sed -e 's/.name.*$//g' -e 's/\[/\\[/g' -e 's/\]/\\]/g' )" + test -n "$sfkey" || return 1 + _scratch_xfs_db -c 'path -m /realtime' -c print | \ + grep "${sfkey}.inumber" | awk '{print $1}' + return 0 +} diff --git a/tests/xfs/122.out b/tests/xfs/122.out index 1379c7b3b5..53eff0027e 100644 --- a/tests/xfs/122.out +++ b/tests/xfs/122.out @@ -48,7 +48,6 @@ offsetof(xfs_sb_t, sb_rgblklog) = 280 offsetof(xfs_sb_t, sb_rgblocks) = 272 offsetof(xfs_sb_t, sb_rgcount) = 276 offsetof(xfs_sb_t, sb_rootino) = 56 -offsetof(xfs_sb_t, sb_rrmapino) = 264 offsetof(xfs_sb_t, sb_rsumino) = 72 offsetof(xfs_sb_t, sb_sectlog) = 121 offsetof(xfs_sb_t, sb_sectsize) = 102 diff --git a/tests/xfs/333 b/tests/xfs/333 deleted file mode 100755 index 728c518402..0000000000 --- a/tests/xfs/333 +++ /dev/null @@ -1,45 +0,0 @@ -#! /bin/bash -# SPDX-License-Identifier: GPL-2.0 -# Copyright (c) 2016, Oracle and/or its affiliates. All Rights Reserved. -# -# FS QA Test No. 333 -# -# Set rrmapino to another inode on an non-rt rmap fs and see if repair fixes it. -# -. ./common/preamble -_begin_fstest auto quick rmap realtime - -# Import common functions. -. 
./common/filter - -# real QA test starts here -_supported_fs xfs -_require_xfs_scratch_rmapbt -_disable_dmesg_check - -rm -f "$seqres.full" - -unset SCRATCH_RTDEV - -echo "Format and mount" -_scratch_mkfs > "$seqres.full" 2>&1 -rrmapino="$(_scratch_xfs_db -c 'sb 0' -c 'p rrmapino' 2>&1)" -test "${rrmapino}" = "field rrmapino not found" && _notrun "realtime rmapbt not supported" -_scratch_mount - -echo "Create some files" -$XFS_IO_PROG -f -c "pwrite -S 0x68 0 9999" $SCRATCH_MNT/f1 >> $seqres.full -$XFS_IO_PROG -f -c "pwrite -S 0x68 0 9999" $SCRATCH_MNT/f2 >> $seqres.full -echo garbage > $SCRATCH_MNT/f3 -ino=$(stat -c '%i' $SCRATCH_MNT/f3) -_scratch_unmount - -echo "Corrupt fs" -_scratch_xfs_db -x -c 'sb 0' -c "write rrmapino $ino" >> $seqres.full -_try_scratch_mount 2>&1 | _filter_error_mount - -echo "Test done, mount should have failed" - -# success, all done -status=0 -exit diff --git a/tests/xfs/333.out b/tests/xfs/333.out deleted file mode 100644 index b3c698750f..0000000000 --- a/tests/xfs/333.out +++ /dev/null @@ -1,6 +0,0 @@ -QA output created by 333 -Format and mount -Create some files -Corrupt fs -mount: Structure needs cleaning -Test done, mount should have failed diff --git a/tests/xfs/337 b/tests/xfs/337 index f74baae9b0..9ea8587b27 100755 --- a/tests/xfs/337 +++ b/tests/xfs/337 @@ -53,7 +53,7 @@ echo "+ check fs" _scratch_xfs_repair -n >> $seqres.full 2>&1 || echo "xfs_repair should not fail" echo "+ corrupt image" -_scratch_xfs_db -x -c "sb" -c "addr rrmapino" -c "addr u3.rtrmapbt.ptrs[1]" \ +_scratch_xfs_db -x -c "path -m /realtime/0.rmap" -c "addr u3.rtrmapbt.ptrs[1]" \ -c "stack" -c "blocktrash -x 4096 -y 4096 -n 8 -3 -z" \ >> $seqres.full 2>&1 diff --git a/tests/xfs/338 b/tests/xfs/338 index 9f36150c7e..9d41a83ec2 100755 --- a/tests/xfs/338 +++ b/tests/xfs/338 @@ -29,13 +29,24 @@ $XFS_IO_PROG -f -R -c "pwrite -S 0x68 0 9999" $SCRATCH_MNT/f2 >> $seqres.full _scratch_unmount echo "Corrupt fs" -_scratch_xfs_db -x -c 'sb 0' -c 'addr rrmapino' \ - -c 
'write core.nlinkv2 0' -c 'write core.mode 0' -c 'sb 0' \ - -c 'write rrmapino 0' >> $seqres.full -_try_scratch_mount >> $seqres.full 2>&1 && echo "mount should have failed" +rtrmap_sfentry="$(_scratch_find_rt_metadir_entry 0.rmap)" +test -n "$rtrmap_sfentry" || _fail "Could not find rtrmap metadir entry?" +_scratch_xfs_db -x -c 'path -m /realtime/0.rmap' \ + -c 'write core.nlinkv2 0' -c 'write core.mode 0' \ + -c 'path -m /realtime' \ + -c "write $rtrmap_sfentry 0" >> $seqres.full +if _try_scratch_mount >> $seqres.full 2>&1; then + echo "mount should have failed" + _scratch_unmount +else + # If the verifiers are working properly, the mount will fail because + # we fuzzed the metadata root directory. This causes loud complaints + # to dmesg, so we want to ignore those. + _disable_dmesg_check +fi echo "Repair fs" -_scratch_unmount 2>&1 | _filter_scratch +_scratch_unmount 2>&1 | _filter_scratch | _filter_ending_dot _repair_scratch_fs >> $seqres.full 2>&1 echo "Try to create more files (again)" diff --git a/tests/xfs/339 b/tests/xfs/339 index 3e0b4d97ab..24a90d0ba3 100755 --- a/tests/xfs/339 +++ b/tests/xfs/339 @@ -31,7 +31,8 @@ ln $SCRATCH_MNT/f3 $SCRATCH_MNT/f4 _scratch_unmount echo "Corrupt fs" -rrmapino=`_scratch_xfs_get_sb_field rrmapino` +rrmapino=$(_scratch_metadir_lookup /realtime/0.rmap) +test -n "$rrmapino" || _fail "Could not find rtrmap inode?" 
_scratch_xfs_set_metadata_field "u3.sfdir3.list[3].inumber.i4" $rrmapino \ 'sb 0' 'addr rootino' >> $seqres.full _scratch_mount @@ -43,7 +44,7 @@ echo "Try to create more files" $XFS_IO_PROG -f -R -c "pwrite -S 0x68 0 9999" $SCRATCH_MNT/f5 >> $seqres.full 2>&1 echo "Repair fs" -_scratch_unmount 2>&1 | _filter_scratch +_scratch_unmount 2>&1 | _filter_scratch | _filter_ending_dot _repair_scratch_fs >> $seqres.full 2>&1 echo "Try to create more files (again)" diff --git a/tests/xfs/340 b/tests/xfs/340 index 2c0014513e..1236f6520f 100755 --- a/tests/xfs/340 +++ b/tests/xfs/340 @@ -31,16 +31,21 @@ ino=$(stat -c '%i' $SCRATCH_MNT/f3) _scratch_unmount echo "Corrupt fs" -rrmapino=$(_scratch_xfs_get_sb_field rrmapino) -_scratch_xfs_db -x -c "inode $rrmapino" \ +rtrmap_sfentry="$(_scratch_find_rt_metadir_entry 0.rmap)" +test -n "$rtrmap_sfentry" || _fail "Could not find rtrmap metadir entry?" +rrmapino=$(_scratch_metadir_lookup /realtime/0.rmap) +test -n "$rrmapino" || _fail "Could not find rtrmap inode?" 
+_scratch_xfs_db -x -c "path -m /realtime/0.rmap" \ -c 'write core.format 2' -c 'write core.size 0' \ - -c 'write core.nblocks 0' -c 'sb 0' -c 'addr rootino' \ + -c 'write core.nblocks 0' \ + -c 'sb 0' -c 'addr rootino' \ -c "write u3.sfdir3.list[2].inumber.i4 $rrmapino" \ - -c 'sb 0' -c "write rrmapino $ino" >> $seqres.full + -c 'path -m /realtime' \ + -c "write $rtrmap_sfentry $ino" >> $seqres.full _try_scratch_mount >> $seqres.full 2>&1 && echo "mount should have failed" echo "Repair fs" -_scratch_unmount 2>&1 | _filter_scratch +_scratch_unmount 2>&1 | _filter_scratch | _filter_ending_dot _repair_scratch_fs >> $seqres.full 2>&1 echo "Try to create more files (again)" diff --git a/tests/xfs/341 b/tests/xfs/341 index 7d2842b579..8861e751a9 100755 --- a/tests/xfs/341 +++ b/tests/xfs/341 @@ -53,7 +53,7 @@ echo "Corrupt fs" fsbno=$(_scratch_xfs_db -c "inode $ino" -c 'bmap' | grep 'flag 0' | head -n 1 | \ sed -e 's/^.*startblock \([0-9]*\) .*$/\1/g') -_scratch_xfs_db -x -c 'sb 0' -c 'addr rrmapino' \ +_scratch_xfs_db -x -c 'path -m /realtime/0.rmap' \ -c "write u3.rtrmapbt.ptrs[1] $fsbno" -c 'p' >> $seqres.full _scratch_mount diff --git a/tests/xfs/342 b/tests/xfs/342 index 538c8987ef..f29bd874e9 100755 --- a/tests/xfs/342 +++ b/tests/xfs/342 @@ -47,9 +47,9 @@ ino=$(stat -c '%i' $SCRATCH_MNT/f3) _scratch_unmount echo "Corrupt fs" -_scratch_xfs_db -c 'sb 0' -c 'addr rrmapino' -c 'p u3.rtrmapbt.ptrs[1]' >> $seqres.full +_scratch_xfs_db -c 'path -m /realtime/0.rmap' -c 'p u3.rtrmapbt.ptrs[1]' >> $seqres.full -fsbno=$(_scratch_xfs_db -c 'sb 0' -c 'addr rrmapino' \ +fsbno=$(_scratch_xfs_db -c 'path -m /realtime/0.rmap' \ -c 'p u3.rtrmapbt.ptrs[1]' | sed -e 's/^.*://g') _scratch_xfs_db -x -c "inode $ino" -c "write u3.bmx[0].startblock $fsbno" >> $seqres.full _scratch_mount ^ permalink raw reply related [flat|nested] 565+ messages in thread
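A minimal sketch of how the _scratch_find_rt_metadir_entry helper in this patch turns a directory entry name into the xfs_db field path of its inode number. The sample_print function is a hypothetical stand-in for the output of `xfs_db -c 'path -m /realtime' -c print`; its field values are made up, and find_entry_key is an illustrative name, not part of fstests. Like the helper, it assumes a short-format directory.

```shell
# Illustrative "print" output for a short-format directory. Only the
# "field = value" layout matters here; the values are invented.
sample_print() {
	cat <<'EOF'
u3.sfdir3.hdr.count = 2
u3.sfdir3.list[0].namelen = 6
u3.sfdir3.list[0].name = "0.rmap"
u3.sfdir3.list[0].inumber.i4 = 133
u3.sfdir3.list[1].namelen = 6
u3.sfdir3.list[1].name = "1.rmap"
u3.sfdir3.list[1].inumber.i4 = 134
EOF
}

# Same pipeline as the helper: find the ".name" line for the entry, strip
# it back to the list[N] prefix, escape the brackets so grep treats them
# literally, then look up the matching ".inumber" field.
find_entry_key() {
	local sfkey
	sfkey="$(sample_print | grep "\"$1\"" | \
		sed -e 's/.name.*$//g' -e 's/\[/\\[/g' -e 's/\]/\\]/g')"
	test -n "$sfkey" || return 1
	sample_print | grep "${sfkey}.inumber" | awk '{print $1}'
}

find_entry_key 1.rmap
```

The printed field path is what the tests then pass to the xfs_db "write" command to redirect the directory entry to another inode.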
* [PATCH 02/13] fuzz: for fuzzing the rtrmapbt, find the path to the rt rmap btree file 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 01/13] xfs: fix tests that try to access the realtime rmap inode Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 09/13] xfs: skip tests if formatting small filesystem fails Darrick J. Wong ` (5 subsequent siblings) 12 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> The fs population code creates a realtime rmap btree in /some/ realtime group with at least two levels. This rmapbt file isn't necessarily the one for group 0, so we need to find it programmatically. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/xfs | 33 +++++++++++++++++++++++++++++++++ tests/xfs/406 | 6 ++++-- tests/xfs/407 | 6 ++++-- tests/xfs/408 | 7 +++++-- tests/xfs/409 | 7 +++++-- tests/xfs/481 | 6 ++++-- tests/xfs/482 | 7 +++++-- 7 files changed, 60 insertions(+), 12 deletions(-) diff --git a/common/xfs b/common/xfs index aea2b678c8..63eff39d47 100644 --- a/common/xfs +++ b/common/xfs @@ -1814,6 +1814,39 @@ _scratch_xfs_find_agbtree_height() { return 1 } +# Find us the path to the inode containing a realtime btree with a specific +# height. 
+_scratch_xfs_find_rgbtree_height() { + local bt_type="$1" + local bt_height="$2" + local rgcount=$(_xfs_mount_rgcount $SCRATCH_DEV) + local path + local path_format + local bt_prefix + + case "${bt_type}" in + "rmap") + path_format="/realtime/%u.rmap" + bt_prefix="u3.rtrmapbt" + ;; + *) + _fail "Don't know about rt btree ${bt_type}" + ;; + esac + + for ((rgno = 0; rgno < rgcount; rgno++)); do + path="$(printf "${path_format}" "${rgno}")" + bt_level=$(_scratch_xfs_db -c "path -m ${path}" -c "p ${bt_prefix}.level" | awk '{print $3}') + # "level" is the actual level within the btree + if [ "${bt_level}" -eq "$((bt_height - 1))" ]; then + echo "${path}" + return 0 + fi + done + + return 1 +} + _require_xfs_mkfs_atomicswap() { # atomicswap can be activated on rmap or reflink filesystems. diff --git a/tests/xfs/406 b/tests/xfs/406 index 78db18077c..8c5570886b 100755 --- a/tests/xfs/406 +++ b/tests/xfs/406 @@ -26,10 +26,12 @@ _require_scratch_xfs_fuzz_fields echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 -inode_ver=$(_scratch_xfs_get_metadata_field "core.version" 'sb 0' 'addr rrmapino') +path="$(_scratch_xfs_find_rgbtree_height 'rmap' 2)" || \ + _fail "could not find two-level rtrmapbt" +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" "path -m $path") echo "Fuzz rtrmapbt recs" -_scratch_xfs_fuzz_metadata '' 'offline' 'sb 0' 'addr rrmapino' "addr u${inode_ver}.rtrmapbt.ptrs[1]" >> $seqres.full +_scratch_xfs_fuzz_metadata '' 'offline' "path -m $path" "addr u${inode_ver}.rtrmapbt.ptrs[1]" >> $seqres.full echo "Done fuzzing rtrmapbt recs" # success, all done diff --git a/tests/xfs/407 b/tests/xfs/407 index 5a43775b55..2460ea336c 100755 --- a/tests/xfs/407 +++ b/tests/xfs/407 @@ -26,10 +26,12 @@ _require_scratch_xfs_fuzz_fields echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 -inode_ver=$(_scratch_xfs_get_metadata_field "core.version" 'sb 0' 'addr rrmapino') +path="$(_scratch_xfs_find_rgbtree_height 
'rmap' 1)" || \ + _fail "could not find two-level rtrmapbt" +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" "path -m $path") echo "Fuzz rtrmapbt recs" -_scratch_xfs_fuzz_metadata '' 'online' 'sb 0' 'addr rrmapino' "addr u${inode_ver}.rtrmapbt.ptrs[1]" >> $seqres.full +_scratch_xfs_fuzz_metadata '' 'online' "path -m $path" "addr u${inode_ver}.rtrmapbt.ptrs[1]" >> $seqres.full echo "Done fuzzing rtrmapbt recs" # success, all done diff --git a/tests/xfs/408 b/tests/xfs/408 index 8049d6bead..3bed3824e8 100755 --- a/tests/xfs/408 +++ b/tests/xfs/408 @@ -4,7 +4,7 @@ # # FS QA Test No. 408 # -# Populate a XFS filesystem and fuzz every rtrmapbt keyptr field. +# Populate a XFS filesystem and fuzz every rtrmapbt key/pointer field. # Use xfs_repair to fix the corruption. # . ./common/preamble @@ -26,8 +26,11 @@ _require_scratch_xfs_fuzz_fields echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 +path="$(_scratch_xfs_find_rgbtree_height 'rmap' 2)" || \ + _fail "could not find two-level rtrmapbt" + echo "Fuzz rtrmapbt keyptrs" -_scratch_xfs_fuzz_metadata '(rtrmapbt)' 'offline' 'sb 0' 'addr rrmapino' >> $seqres.full +_scratch_xfs_fuzz_metadata '(rtrmapbt)' 'offline' "path -m $path" >> $seqres.full echo "Done fuzzing rtrmapbt keyptrs" # success, all done diff --git a/tests/xfs/409 b/tests/xfs/409 index adac95fea8..ce66175c6e 100755 --- a/tests/xfs/409 +++ b/tests/xfs/409 @@ -4,7 +4,7 @@ # # FS QA Test No. 409 # -# Populate a XFS filesystem and fuzz every rtrmapbt keyptr field. +# Populate a XFS filesystem and fuzz every rtrmapbt key/pointer field. # Use xfs_scrub to fix the corruption. # . 
./common/preamble @@ -26,8 +26,11 @@ _require_scratch_xfs_fuzz_fields echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 +path="$(_scratch_xfs_find_rgbtree_height 'rmap' 2)" || \ + _fail "could not find two-level rtrmapbt" + echo "Fuzz rtrmapbt keyptrs" -_scratch_xfs_fuzz_metadata '(rtrmapbt)' 'offline' 'sb 0' 'addr rrmapino' >> $seqres.full +_scratch_xfs_fuzz_metadata '(rtrmapbt)' 'online' "path -m $path" >> $seqres.full echo "Done fuzzing rtrmapbt keyptrs" # success, all done diff --git a/tests/xfs/481 b/tests/xfs/481 index 48c7580ccb..d303f2c27d 100755 --- a/tests/xfs/481 +++ b/tests/xfs/481 @@ -27,10 +27,12 @@ _disable_dmesg_check echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 -inode_ver=$(_scratch_xfs_get_metadata_field "core.version" 'sb 0' 'addr rrmapino') +path="$(_scratch_xfs_find_rgbtree_height 'rmap' 2)" || \ + _fail "could not find two-level rtrmapbt" +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" "path -m $path") echo "Fuzz rtrmapbt recs" -_scratch_xfs_fuzz_metadata '' 'none' 'sb 0' 'addr rrmapino' "addr u${inode_ver}.rtrmapbt.ptrs[1]" >> $seqres.full +_scratch_xfs_fuzz_metadata '' 'none' "path -m $path" "addr u${inode_ver}.rtrmapbt.ptrs[1]" >> $seqres.full echo "Done fuzzing rtrmapbt recs" # success, all done diff --git a/tests/xfs/482 b/tests/xfs/482 index 0192b94cc0..32a3012154 100755 --- a/tests/xfs/482 +++ b/tests/xfs/482 @@ -4,7 +4,7 @@ # # FS QA Test No. 482 # -# Populate a XFS filesystem and fuzz every rtrmapbt keyptr field. +# Populate a XFS filesystem and fuzz every rtrmapbt key/pointer field. # Do not fix the filesystem, to test metadata verifiers. . 
./common/preamble @@ -27,8 +27,11 @@ _disable_dmesg_check echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 +path="$(_scratch_xfs_find_rgbtree_height 'rmap' 2)" || \ + _fail "could not find two-level rtrmapbt" + echo "Fuzz rtrmapbt keyptrs" -_scratch_xfs_fuzz_metadata '(rtrmapbt)' 'offline' 'sb 0' 'addr rrmapino' >> $seqres.full +_scratch_xfs_fuzz_metadata '(rtrmapbt)' 'offline' "path -m $path" >> $seqres.full echo "Done fuzzing rtrmapbt keyptrs" # success, all done ^ permalink raw reply related [flat|nested] 565+ messages in thread
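A small sketch of the search loop in _scratch_xfs_find_rgbtree_height, assuming the per-group root levels have already been read. The levels array is hypothetical sample data standing in for the xfs_db queries, and find_rgbtree_height is an illustrative name. The point it shows is the off-by-one: the "level" field of a btree root is zero-based, so a tree of height 2 has a root at level 1.

```shell
# Hypothetical root levels for rt groups 0..3; a real run would query
# each /realtime/N.rmap inode with xfs_db instead.
levels=(0 0 1 0)

# Print the metadir path of the first group whose btree has the requested
# height, or return 1 if there is none.
find_rgbtree_height() {
	local bt_height="$1"
	local rgno

	for ((rgno = 0; rgno < ${#levels[@]}; rgno++)); do
		# "level" is zero-based, height is one-based
		if [ "${levels[rgno]}" -eq "$((bt_height - 1))" ]; then
			printf '/realtime/%u.rmap\n' "$rgno"
			return 0
		fi
	done
	return 1
}

find_rgbtree_height 2
```

With this sample data, asking for height 1 would return group 0, which holds a single-block btree with no key/pointer block to fuzz.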
* [PATCH 09/13] xfs: skip tests if formatting small filesystem fails 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 02/13] fuzz: for fuzzing the rtrmapbt, find the path to the rt rmap btree file Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 10/13] xfs/443: use file allocation unit, not dbsize Darrick J. Wong ` (4 subsequent siblings) 12 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> There are a few tests that try to exercise XFS functionality with an unusually small (< 500MB) filesystem. Formatting can fail if the test configuration also specifies a very large realtime device because mkfs hits ENOSPC when allocating the realtime metadata. The test proceeds anyway (which causes an immediate mount failure) so we might as well skip these. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/104 | 1 + tests/xfs/291 | 3 ++- 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/tests/xfs/104 b/tests/xfs/104 index d16f46d8e4..c3d1d18a58 100755 --- a/tests/xfs/104 +++ b/tests/xfs/104 @@ -16,6 +16,7 @@ _create_scratch() { echo "*** mkfs" _scratch_mkfs_xfs $@ | tee -a $seqres.full | _filter_mkfs 2>$tmp.mkfs + test "${PIPESTATUS[0]}" -eq 0 || _notrun "formatting small scratch fs failed" . 
$tmp.mkfs echo "*** mount" diff --git a/tests/xfs/291 b/tests/xfs/291 index 600dcb2eba..70e5f51cee 100755 --- a/tests/xfs/291 +++ b/tests/xfs/291 @@ -18,7 +18,8 @@ _require_command "$XFS_MDRESTORE_PROG" "xfs_mdrestore" # real QA test starts here _require_scratch logblks=$(_scratch_find_xfs_min_logblocks -n size=16k -d size=133m) -_scratch_mkfs_xfs -n size=16k -l size=${logblks}b -d size=133m >> $seqres.full 2>&1 +_scratch_mkfs_xfs -n size=16k -l size=${logblks}b -d size=133m >> $seqres.full 2>&1 || \ + _notrun "formatting small scratch fs failed" _scratch_mount # First we cause very badly fragmented freespace, then ^ permalink raw reply related [flat|nested] 565+ messages in thread
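A sketch of why the xfs/104 hunk checks ${PIPESTATUS[0]} rather than $?. Without pipefail, the exit status of a pipeline is that of its last command, so a failed formatter is hidden behind a successful filter. fake_mkfs and format_or_skip are hypothetical stand-ins for _scratch_mkfs_xfs and the test's _create_scratch function; this assumes bash.

```shell
# Hypothetical stand-in for the formatter: prints some output and exits
# with the status passed as its first argument.
fake_mkfs() {
	echo "meta-data=/dev/sdX isize=512"
	return "$1"
}

# Run the formatter through a filter pipeline, then act on the status of
# the first command in that pipeline.
format_or_skip() {
	fake_mkfs "$1" | tee /dev/null | cat > /dev/null
	# PIPESTATUS has to be read immediately after the pipeline, before
	# any other command overwrites it.
	if [ "${PIPESTATUS[0]}" -ne 0 ]; then
		echo "notrun: formatting small scratch fs failed"
		return 0
	fi
	echo "formatted"
}

format_or_skip 1
```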
* [PATCH 10/13] xfs/443: use file allocation unit, not dbsize 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 09/13] xfs: skip tests if formatting small filesystem fails Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 12/13] populate: check that we created a realtime rmap btree of the given height Darrick J. Wong ` (3 subsequent siblings) 12 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> We can only punch in units of file allocation boundaries, so update this test to use that instead of the fs blocksize. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/443 | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/tests/xfs/443 b/tests/xfs/443 index 56828decae..ab3cda59f3 100755 --- a/tests/xfs/443 +++ b/tests/xfs/443 @@ -40,14 +40,15 @@ _scratch_mount file1=$SCRATCH_MNT/file1 file2=$SCRATCH_MNT/file2 +file_blksz=$(_get_file_block_size $SCRATCH_MNT) # The goal is run an extent swap where one of the associated files has the # minimum number of extents to remain in btree format. First, create a couple # files with large enough extent counts (200 or so should be plenty) to ensure # btree format on the largest possible inode size filesystems. -$XFS_IO_PROG -fc "falloc 0 $((400 * dbsize))" $file1 +$XFS_IO_PROG -fc "falloc 0 $((400 * file_blksz))" $file1 $here/src/punch-alternating $file1 -$XFS_IO_PROG -fc "falloc 0 $((400 * dbsize))" $file2 +$XFS_IO_PROG -fc "falloc 0 $((400 * file_blksz))" $file2 $here/src/punch-alternating $file2 # Now run an extent swap at every possible extent count down to 0. Depending on @@ -55,12 +56,12 @@ $here/src/punch-alternating $file2 # btree format. 
for i in $(seq 1 2 399); do # punch one extent from the tmpfile and swap - $XFS_IO_PROG -c "fpunch $((i * dbsize)) $dbsize" $file2 + $XFS_IO_PROG -c "fpunch $((i * file_blksz)) $file_blksz" $file2 $XFS_IO_PROG -c "swapext $file2" $file1 # punch the same extent from the old fork (now in file2) to resync the # extent counts and repeat - $XFS_IO_PROG -c "fpunch $((i * dbsize)) $dbsize" $file2 + $XFS_IO_PROG -c "fpunch $((i * file_blksz)) $file_blksz" $file2 done # sanity check that no extents are left over ^ permalink raw reply related [flat|nested] 565+ messages in thread
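A sketch of the offset arithmetic in the xfs/443 loop, assuming the file allocation unit is passed in directly instead of being read with _get_file_block_size. punch_offsets is an illustrative name. Each output line is the "offset length" pair handed to fpunch; both are multiples of the allocation unit, which is what lets the punch succeed on a realtime file whose extent size is larger than one fs block.

```shell
# Print the "offset length" pairs that the test would punch, one per
# odd-numbered allocation unit up to and including $last.
punch_offsets() {
	local file_blksz="$1"
	local last="$2"
	local i

	for i in $(seq 1 2 "$last"); do
		echo "$((i * file_blksz)) $file_blksz"
	done
}

# Example: a 16k allocation unit (say, a 4-block rt extent of 4k blocks).
punch_offsets 16384 5
```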
* [PATCH 12/13] populate: check that we created a realtime rmap btree of the given height 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 10/13] xfs/443: use file allocation unit, not dbsize Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 13/13] fuzzy: create missing fuzz tests for rt rmap btrees Darrick J. Wong ` (2 subsequent siblings) 12 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Make sure that we actually create an rt rmap btree of the desired height somewhere in the filesystem. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/populate | 34 ++++++++++++++++++++++++++++++++++ 1 file changed, 34 insertions(+) diff --git a/common/populate b/common/populate index 7d57cd1287..6a05177e6d 100644 --- a/common/populate +++ b/common/populate @@ -631,6 +631,37 @@ __populate_check_xfs_agbtree_height() { return 1 } +# Check that there's at least one rt btree with multiple levels +__populate_check_xfs_rgbtree_height() { + local bt_type="$1" + local rgcount=$(_scratch_xfs_db -c 'sb 0' -c 'p rgcount' | awk '{print $3}') + local path + local path_format + local bt_prefix + + case "${bt_type}" in + "rmap") + path_format="/realtime/%u.rmap" + bt_prefix="u3.rtrmapbt" + ;; + *) + _fail "Don't know about rt btree ${bt_type}" + ;; + esac + + for ((rgno = 0; rgno < rgcount; rgno++)); do + path="$(printf "${path_format}" "${rgno}")" + bt_level=$(_scratch_xfs_db -c "path -m ${path}" -c "p ${bt_prefix}.level" | awk '{print $3}') + # "level" is the actual level within the btree + if [ "${bt_level}" -gt 0 ]; then + return 0 + fi + done + + __populate_fail "Failed to create rt ${bt_type} of sufficient height!" 
+ return 1 +} + # Check that populate created all the types of files we wanted _scratch_xfs_populate_check() { _scratch_mount @@ -654,6 +685,7 @@ _scratch_xfs_populate_check() { is_finobt=$(_xfs_has_feature "$SCRATCH_MNT" finobt -v) is_rmapbt=$(_xfs_has_feature "$SCRATCH_MNT" rmapbt -v) is_reflink=$(_xfs_has_feature "$SCRATCH_MNT" reflink -v) + is_rt="$(_xfs_get_rtextents "$SCRATCH_MNT")" blksz="$(stat -f -c '%s' "${SCRATCH_MNT}")" dblksz="$(_xfs_get_dir_blocksize "$SCRATCH_MNT")" @@ -684,6 +716,8 @@ _scratch_xfs_populate_check() { test $is_finobt -ne 0 && __populate_check_xfs_agbtree_height "fino" test $is_rmapbt -ne 0 && __populate_check_xfs_agbtree_height "rmap" test $is_reflink -ne 0 && __populate_check_xfs_agbtree_height "refcnt" + test $is_rmapbt -ne 0 && test $is_rt -gt 0 && \ + __populate_check_xfs_rgbtree_height "rmap" } # Check data fork format of ext4 file ^ permalink raw reply related [flat|nested] 565+ messages in thread
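A sketch of the two small pieces that __populate_check_xfs_rgbtree_height relies on: pulling the value out of an xfs_db "field = value" line with awk, and accepting the filesystem as soon as any group has a root level above zero. db_value and has_multilevel_btree are illustrative names, and the sample input is made up.

```shell
# xfs_db prints fields as "name = value"; the value is the third
# whitespace-separated word.
db_value() {
	awk '{print $3}'
}

# Succeed if any of the given root levels belongs to a multi-level btree.
has_multilevel_btree() {
	local level

	for level in "$@"; do
		[ "$level" -gt 0 ] && return 0
	done
	return 1
}

echo "rgcount = 4" | db_value
has_multilevel_btree 0 0 1 0 && echo "found a multi-level rt btree"
```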
* [PATCH 13/13] fuzzy: create missing fuzz tests for rt rmap btrees 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong ` (9 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 12/13] populate: check that we created a realtime rmap btree of the given height Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 11/13] populate: adjust rtrmap calculations for rtgroups Darrick J. Wong 2022-12-30 22:20 ` [PATCH 08/13] xfs/3{43,32}: adapt tests for rt extent size greater than 1 Darrick J. Wong 12 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Back when I first created the fuzz tests for the realtime rmap btree, I forgot a couple of things. Add tests to fuzz rtrmap btree leaf records, and node keys. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/1528 | 41 +++++++++++++++++++++++++++++++++++++++++ tests/xfs/1528.out | 4 ++++ tests/xfs/1529 | 40 ++++++++++++++++++++++++++++++++++++++++ tests/xfs/1529.out | 4 ++++ tests/xfs/407 | 2 +- 5 files changed, 90 insertions(+), 1 deletion(-) create mode 100755 tests/xfs/1528 create mode 100644 tests/xfs/1528.out create mode 100755 tests/xfs/1529 create mode 100644 tests/xfs/1529.out diff --git a/tests/xfs/1528 b/tests/xfs/1528 new file mode 100755 index 0000000000..b2e1193ebd --- /dev/null +++ b/tests/xfs/1528 @@ -0,0 +1,41 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1528 +# +# Populate a XFS filesystem and fuzz every rtrmapbt record field. +# Try online repair and, if necessary, offline repair, +# to test the most likely usage pattern. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_bothrepair realtime + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. 
./common/populate +. ./common/fuzzy + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_xfs_scratch_rmapbt +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +path="$(_scratch_xfs_find_rgbtree_height 'rmap' 2)" || \ + _fail "could not find two-level rtrmapbt" +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" "path -m $path") + +echo "Fuzz rtrmapbt recs" +_scratch_xfs_fuzz_metadata '' 'both' "path -m $path" "addr u${inode_ver}.rtrmapbt.ptrs[1]" >> $seqres.full +echo "Done fuzzing rtrmapbt recs" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1528.out b/tests/xfs/1528.out new file mode 100644 index 0000000000..b51b640c40 --- /dev/null +++ b/tests/xfs/1528.out @@ -0,0 +1,4 @@ +QA output created by 1528 +Format and populate +Fuzz rtrmapbt recs +Done fuzzing rtrmapbt recs diff --git a/tests/xfs/1529 b/tests/xfs/1529 new file mode 100755 index 0000000000..91a673c049 --- /dev/null +++ b/tests/xfs/1529 @@ -0,0 +1,40 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1529 +# +# Populate a XFS filesystem and fuzz every rtrmapbt keyptr field. +# Try online repair and, if necessary, offline repair, +# to test the most likely usage pattern. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_bothrepair realtime + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. 
./common/fuzzy + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_xfs_scratch_rmapbt +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +path="$(_scratch_xfs_find_rgbtree_height 'rmap' 2)" || \ + _fail "could not find two-level rtrmapbt" + +echo "Fuzz rtrmapbt keyptrs" +_scratch_xfs_fuzz_metadata '(rtrmapbt)' 'offline' "path -m $path" >> $seqres.full +echo "Done fuzzing rtrmapbt keyptrs" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1529.out b/tests/xfs/1529.out new file mode 100644 index 0000000000..808fcc957f --- /dev/null +++ b/tests/xfs/1529.out @@ -0,0 +1,4 @@ +QA output created by 1529 +Format and populate +Fuzz rtrmapbt keyptrs +Done fuzzing rtrmapbt keyptrs diff --git a/tests/xfs/407 b/tests/xfs/407 index 2460ea336c..bd439105e2 100755 --- a/tests/xfs/407 +++ b/tests/xfs/407 @@ -26,7 +26,7 @@ _require_scratch_xfs_fuzz_fields echo "Format and populate" _scratch_populate_cached nofill > $seqres.full 2>&1 -path="$(_scratch_xfs_find_rgbtree_height 'rmap' 1)" || \ +path="$(_scratch_xfs_find_rgbtree_height 'rmap' 2)" || \ _fail "could not find two-level rtrmapbt" inode_ver=$(_scratch_xfs_get_metadata_field "core.version" "path -m $path") ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 11/13] populate: adjust rtrmap calculations for rtgroups 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong ` (10 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 13/13] fuzzy: create missing fuzz tests for rt rmap btrees Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 08/13] xfs/3{43,32}: adapt tests for rt extent size greater than 1 Darrick J. Wong 12 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Now that we've sharded the realtime volume and created per-group rmap btrees, we need to adjust downward the size of rtrmapbt records since the block counts are now 32-bit instead of 64-bit. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/populate | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/common/populate b/common/populate index c0bbbc3f3b..7d57cd1287 100644 --- a/common/populate +++ b/common/populate @@ -379,7 +379,7 @@ _scratch_xfs_populate() { is_rt="$(_xfs_get_rtextents "$SCRATCH_MNT")" if [ $is_rmapbt -gt 0 ] && [ $is_rt -gt 0 ]; then echo "+ rtrmapbt btree" - nr="$((blksz * 2 / 32))" + nr="$((blksz * 2 / 24))" $XFS_IO_PROG -R -f -c 'truncate 0' "${SCRATCH_MNT}/RTRMAPBT" __populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/RTRMAPBT" fi ^ permalink raw reply related [flat|nested] 565+ messages in thread
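A worked version of the record count in this hunk, assuming a 4096-byte block size for the example. The divisor is the on-disk record size; the xfs/122.out hunk elsewhere in this thread lists sizeof(struct xfs_rtrmap_rec) as 24. records_for_two_blocks is an illustrative name.

```shell
# Number of records needed to fill roughly two btree blocks, which is the
# target the populate code uses to force a multi-level btree.
records_for_two_blocks() {
	local blksz="$1"
	local recsz="$2"

	echo "$((blksz * 2 / recsz))"
}

records_for_two_blocks 4096 32	# old divisor
records_for_two_blocks 4096 24	# new divisor
```

With the smaller record size, more records fit in each block, so the populate code has to create more mappings (341 instead of 256 in this example) to spill into a second level.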
* [PATCH 08/13] xfs/3{43,32}: adapt tests for rt extent size greater than 1 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong ` (11 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 11/13] populate: adjust rtrmap calculations for rtgroups Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 12 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Both of these tests for the realtime volume can fail when the rt extent size is larger than a single block. 332 is a read-write functionality test that encodes md5sum in the output, so we need to skip it if $blksz isn't congruent with the extent size, because the fcollapse call will fail. 343 is a test of the rmap btree, so the fix here is simpler -- make $blksz the file allocation unit, and get rid of the md5sum in the golden output. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/332 | 6 +----- tests/xfs/332.out | 2 -- tests/xfs/343 | 2 ++ 3 files changed, 3 insertions(+), 7 deletions(-) diff --git a/tests/xfs/332 b/tests/xfs/332 index a2d37ee905..c1ac87adcb 100755 --- a/tests/xfs/332 +++ b/tests/xfs/332 @@ -28,7 +28,7 @@ rm -f "$seqres.full" echo "Format and mount" _scratch_mkfs > "$seqres.full" 2>&1 _scratch_mount -blksz=65536 +blksz=$(_get_file_block_size $SCRATCH_MNT) # 65536 blocks=16 len=$((blocks * blksz)) @@ -45,10 +45,6 @@ $XFS_IO_PROG -c "fpunch $blksz $blksz" \ -c "fcollapse $((9 * blksz)) $blksz" \ -c "finsert $((10 * blksz)) $blksz" $SCRATCH_MNT/f1 >> $seqres.full -echo "Check file" -md5sum $SCRATCH_MNT/f1 | _filter_scratch -od -tx1 -Ad -c $SCRATCH_MNT/f1 >> $seqres.full - echo "Unmount" _scratch_unmount diff --git a/tests/xfs/332.out b/tests/xfs/332.out index 9beff7cc37..3a7ca95b40 100644 --- a/tests/xfs/332.out +++ b/tests/xfs/332.out @@ -2,8 +2,6 @@ QA output created by 332 Format and mount Create some 
files Manipulate file -Check file -e45c5707fcf6817e914ffb6ce37a0ac7 SCRATCH_MNT/f1 Unmount Try a regular fsmap Try a bad fsmap diff --git a/tests/xfs/343 b/tests/xfs/343 index bffcc7d9ac..fe461847ed 100755 --- a/tests/xfs/343 +++ b/tests/xfs/343 @@ -31,6 +31,8 @@ blksz=65536 blocks=16 len=$((blocks * blksz)) +_require_congruent_file_oplen $SCRATCH_MNT $blksz + echo "Create some files" $XFS_IO_PROG -f -R -c "falloc 0 $len" -c "pwrite -S 0x68 -b 1048576 0 $len" $SCRATCH_MNT/f1 >> $seqres.full ^ permalink raw reply related [flat|nested] 565+ messages in thread
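A sketch of the congruence condition that these tests care about, assuming it reduces to "the operation length is a whole multiple of the file allocation unit". is_congruent_oplen is an illustrative name and is not the fstests helper itself; the 28k allocation unit is just an example of an rt extent size that does not divide 64k.

```shell
# Succeed if an operation of $oplen bytes covers a whole number of file
# allocation units of $file_blksz bytes.
is_congruent_oplen() {
	local file_blksz="$1"
	local oplen="$2"

	[ "$((oplen % file_blksz))" -eq 0 ]
}

# A hardcoded 64k length works with a 4k allocation unit...
is_congruent_oplen 4096 65536 && echo "4k unit: ok"
# ...but not with a 28k one, so such a test would have to be skipped.
is_congruent_oplen 28672 65536 || echo "28k unit: not congruent"
```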
* [PATCHSET v1.0 00/10] fstests: reflink on the realtime device 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (36 preceding siblings ...) 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 01/10] xfs/122: update fields for realtime reflink Darrick J. Wong ` (9 more replies) 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: reflink with large realtime extents Darrick J. Wong 2022-12-30 22:20 ` [PATCHSET v1.0 0/1] fstests: functional tests for rt quota Darrick J. Wong 39 siblings, 10 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan Hi all, This patchset enables use of the file data block sharing feature (i.e. reflink) on the realtime device. It follows the same basic sequence as the realtime rmap series -- first a few cleanups; then widening of the API parameters; and introduction of the new btree format and inode fork format. Next comes enabling CoW and remapping for the rt device; new scrub, repair, and health reporting code; and at the end we implement some code to lengthen write requests so that rt extents are always CoWed fully. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. 
--D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-reflink xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-reflink fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-reflink xfsdocs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=realtime-reflink --- common/populate | 26 ++++++++++++++++++--- common/xfs | 15 ++++++++++++ tests/generic/331 | 12 ++++++++- tests/generic/331.out | 2 +- tests/xfs/122.out | 3 ++ tests/xfs/131 | 48 -------------------------------------- tests/xfs/131.out | 5 ---- tests/xfs/1538 | 41 ++++++++++++++++++++++++++++++++ tests/xfs/1538.out | 4 +++ tests/xfs/1539 | 41 ++++++++++++++++++++++++++++++++ tests/xfs/1539.out | 4 +++ tests/xfs/1540 | 41 ++++++++++++++++++++++++++++++++ tests/xfs/1540.out | 4 +++ tests/xfs/1541 | 42 +++++++++++++++++++++++++++++++++ tests/xfs/1541.out | 4 +++ tests/xfs/1542 | 41 ++++++++++++++++++++++++++++++++ tests/xfs/1542.out | 4 +++ tests/xfs/1543 | 40 ++++++++++++++++++++++++++++++++ tests/xfs/1543.out | 4 +++ tests/xfs/1544 | 40 ++++++++++++++++++++++++++++++++ tests/xfs/1544.out | 4 +++ tests/xfs/1545 | 41 ++++++++++++++++++++++++++++++++ tests/xfs/1545.out | 4 +++ tests/xfs/240 | 13 +++++++++- tests/xfs/240.out | 2 +- tests/xfs/243 | 5 ++++ tests/xfs/272 | 40 +++++++++++++++++++++----------- tests/xfs/274 | 62 ++++++++++++++++++++++++++++++++++--------------- tests/xfs/769 | 3 ++ tests/xfs/818 | 43 ++++++++++++++++++++++++++++++++++ tests/xfs/818.out | 2 ++ tests/xfs/819 | 43 ++++++++++++++++++++++++++++++++++ tests/xfs/819.out | 2 ++ 33 files changed, 590 insertions(+), 95 deletions(-) delete mode 100755 tests/xfs/131 delete mode 100644 tests/xfs/131.out create mode 100755 tests/xfs/1538 create mode 100644 tests/xfs/1538.out create mode 100755 tests/xfs/1539 create mode 100644 tests/xfs/1539.out create mode 
100755 tests/xfs/1540 create mode 100644 tests/xfs/1540.out create mode 100755 tests/xfs/1541 create mode 100644 tests/xfs/1541.out create mode 100755 tests/xfs/1542 create mode 100644 tests/xfs/1542.out create mode 100755 tests/xfs/1543 create mode 100644 tests/xfs/1543.out create mode 100755 tests/xfs/1544 create mode 100644 tests/xfs/1544.out create mode 100755 tests/xfs/1545 create mode 100644 tests/xfs/1545.out create mode 100755 tests/xfs/818 create mode 100644 tests/xfs/818.out create mode 100755 tests/xfs/819 create mode 100644 tests/xfs/819.out ^ permalink raw reply [flat|nested] 565+ messages in thread
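The last step named in the cover letter, lengthening write requests so that rt extents are always CoWed fully, comes down to rounding a write range outward to rt extent boundaries. The following is only a sketch of that arithmetic, assuming byte units; the helper name `widen_to_rtext` is hypothetical and this is not the kernel implementation.

```shell
#!/bin/bash
# Hypothetical helper: widen [offset, offset + len) so that it starts and
# ends on multiples of the rt extent size.  All values are in bytes.
widen_to_rtext() {
	local offset="$1" len="$2" rextsize="$3"
	# Round the start down and the end up to an rt extent boundary.
	local start=$(( offset / rextsize * rextsize ))
	local end=$(( (offset + len + rextsize - 1) / rextsize * rextsize ))
	echo "$start $(( end - start ))"
}

# A 100-byte write at offset 5000 with a 4096-byte rt extent
widen_to_rtext 5000 100 4096	# prints "4096 4096"
```

The larger the rt extent size, the more a small write is widened, which is presumably why the large-extent case gets its own follow-up patchset.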
* [PATCH 01/10] xfs/122: update fields for realtime reflink 2022-12-30 22:20 ` [PATCHSET v1.0 00/10] fstests: reflink on the realtime device Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 02/10] common/populate: create realtime refcount btree Darrick J. Wong ` (8 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Add a few more ondisk structures for realtime reflink. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/122.out | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tests/xfs/122.out b/tests/xfs/122.out index e0801f9660..3239a655f9 100644 --- a/tests/xfs/122.out +++ b/tests/xfs/122.out @@ -116,6 +116,9 @@ sizeof(struct xfs_rmap_key) = 20 sizeof(struct xfs_rmap_rec) = 24 sizeof(struct xfs_rtbuf_blkinfo) = 48 sizeof(struct xfs_rtgroup_geometry) = 128 +sizeof(struct xfs_rtrefcount_key) = 4 +sizeof(struct xfs_rtrefcount_rec) = 12 +sizeof(struct xfs_rtrefcount_root) = 4 sizeof(struct xfs_rtrmap_key) = 20 sizeof(struct xfs_rtrmap_rec) = 24 sizeof(struct xfs_rtrmap_root) = 4 ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 02/10] common/populate: create realtime refcount btree 2022-12-30 22:20 ` [PATCHSET v1.0 00/10] fstests: reflink on the realtime device Darrick J. Wong 2022-12-30 22:20 ` [PATCH 01/10] xfs/122: update fields for realtime reflink Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 03/10] xfs: create fuzz tests for the " Darrick J. Wong ` (7 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Populate a realtime refcount btree when we're creating a sample fs. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/populate | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-) diff --git a/common/populate b/common/populate index 6a05177e6d..734e31345f 100644 --- a/common/populate +++ b/common/populate @@ -367,16 +367,30 @@ _scratch_xfs_populate() { rm -f "${dir}/${f}" done - # Reverse-mapping btree + is_rt="$(_xfs_get_rtextents "$SCRATCH_MNT")" is_rmapbt="$(_xfs_has_feature "$SCRATCH_MNT" rmapbt -v)" + is_reflink="$(_xfs_has_feature "$SCRATCH_MNT" reflink -v)" + + # Reverse-mapping btree if [ $is_rmapbt -gt 0 ]; then echo "+ rmapbt btree" nr="$((blksz * 2 / 24))" __populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/RMAPBT" fi + # Realtime Reference-count btree comes before the rtrmapbt so that + # the refcount entries are created in rtgroup 0. 
+ if [ $is_reflink -gt 0 ] && [ $is_rt -gt 0 ]; then + echo "+ rtreflink btree" + rt_blksz=$(_xfs_get_rtextsize "$SCRATCH_MNT") + nr="$((rt_blksz * 2 / 12))" + $XFS_IO_PROG -R -f -c 'truncate 0' "${SCRATCH_MNT}/RTREFCOUNTBT" + __populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/RTREFCOUNTBT" + $XFS_IO_PROG -R -f -c 'truncate 0' "${SCRATCH_MNT}/RTREFCOUNTBT2" + cp --reflink=always "${SCRATCH_MNT}/RTREFCOUNTBT" "${SCRATCH_MNT}/RTREFCOUNTBT2" + fi + # Realtime Reverse-mapping btree - is_rt="$(_xfs_get_rtextents "$SCRATCH_MNT")" if [ $is_rmapbt -gt 0 ] && [ $is_rt -gt 0 ]; then echo "+ rtrmapbt btree" nr="$((blksz * 2 / 24))" @@ -385,7 +399,6 @@ _scratch_xfs_populate() { fi # Reference-count btree - is_reflink="$(_xfs_has_feature "$SCRATCH_MNT" reflink -v)" if [ $is_reflink -gt 0 ]; then echo "+ reflink btree" nr="$((blksz * 2 / 12))" @@ -403,6 +416,7 @@ _scratch_xfs_populate() { __populate_fragment_file "${SCRATCH_MNT}/RMAPBT" __populate_fragment_file "${SCRATCH_MNT}/RTRMAPBT" __populate_fragment_file "${SCRATCH_MNT}/REFCOUNTBT" + __populate_fragment_file "${SCRATCH_MNT}/RTREFCOUNTBT" umount "${SCRATCH_MNT}" } @@ -644,6 +658,10 @@ __populate_check_xfs_rgbtree_height() { path_format="/realtime/%u.rmap" bt_prefix="u3.rtrmapbt" ;; + "refcnt") + path_format="/realtime/%u.refcount" + bt_prefix="u3.rtrefcbt" + ;; *) _fail "Don't know about rt btree ${bt_type}" ;; @@ -718,6 +736,8 @@ _scratch_xfs_populate_check() { test $is_reflink -ne 0 && __populate_check_xfs_agbtree_height "refcnt" test $is_rmapbt -ne 0 && test $is_rt -gt 0 && \ __populate_check_xfs_rgbtree_height "rmap" + test $is_reflink -ne 0 && test $is_rt -gt 0 && \ + __populate_check_xfs_rgbtree_height "refcnt" } # Check data fork format of ext4 file ^ permalink raw reply related [flat|nested] 565+ messages in thread
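The `nr` computations in this patch divide two blocks' worth of bytes by the ondisk record size (12 for refcount records, 24 for rmap records). My reading, which the patch does not state, is that this is meant to overflow a single leaf block so that the resulting btree is at least two levels tall. A small sketch of the arithmetic, using a hypothetical helper name and ignoring block header overhead:

```shell
#!/bin/bash
# Hypothetical helper mirroring the "blksz * 2 / recsize" arithmetic:
# how many fixed-size records fit in two blocks' worth of space.
records_for_two_blocks() {
	local blksz="$1" recsize="$2"
	echo $(( blksz * 2 / recsize ))
}

records_for_two_blocks 4096 12	# refcount-sized records: prints 682
records_for_two_blocks 4096 24	# rmap-sized records: prints 341
```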
* [PATCH 03/10] xfs: create fuzz tests for the realtime refcount btree 2022-12-30 22:20 ` [PATCHSET v1.0 00/10] fstests: reflink on the realtime device Darrick J. Wong 2022-12-30 22:20 ` [PATCH 01/10] xfs/122: update fields for realtime reflink Darrick J. Wong 2022-12-30 22:20 ` [PATCH 02/10] common/populate: create realtime refcount btree Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 06/10] xfs: race fsstress with realtime refcount btree scrub and repair Darrick J. Wong ` (6 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Create fuzz tests for the realtime refcount btree record and key/ptr blocks. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/xfs | 4 ++++ tests/xfs/1538 | 41 +++++++++++++++++++++++++++++++++++++++++ tests/xfs/1538.out | 4 ++++ tests/xfs/1539 | 41 +++++++++++++++++++++++++++++++++++++++++ tests/xfs/1539.out | 4 ++++ tests/xfs/1540 | 41 +++++++++++++++++++++++++++++++++++++++++ tests/xfs/1540.out | 4 ++++ tests/xfs/1541 | 42 ++++++++++++++++++++++++++++++++++++++++++ tests/xfs/1541.out | 4 ++++ tests/xfs/1542 | 41 +++++++++++++++++++++++++++++++++++++++++ tests/xfs/1542.out | 4 ++++ tests/xfs/1543 | 40 ++++++++++++++++++++++++++++++++++++++++ tests/xfs/1543.out | 4 ++++ tests/xfs/1544 | 40 ++++++++++++++++++++++++++++++++++++++++ tests/xfs/1544.out | 4 ++++ tests/xfs/1545 | 41 +++++++++++++++++++++++++++++++++++++++++ tests/xfs/1545.out | 4 ++++ 17 files changed, 363 insertions(+) create mode 100755 tests/xfs/1538 create mode 100644 tests/xfs/1538.out create mode 100755 tests/xfs/1539 create mode 100644 tests/xfs/1539.out create mode 100755 tests/xfs/1540 create mode 100644 tests/xfs/1540.out create mode 100755 tests/xfs/1541 create mode 100644 tests/xfs/1541.out create mode 100755 tests/xfs/1542 create mode 100644 
tests/xfs/1542.out create mode 100755 tests/xfs/1543 create mode 100644 tests/xfs/1543.out create mode 100755 tests/xfs/1544 create mode 100644 tests/xfs/1544.out create mode 100755 tests/xfs/1545 create mode 100644 tests/xfs/1545.out diff --git a/common/xfs b/common/xfs index 63eff39d47..7b7b3a35b5 100644 --- a/common/xfs +++ b/common/xfs @@ -1829,6 +1829,10 @@ _scratch_xfs_find_rgbtree_height() { path_format="/realtime/%u.rmap" bt_prefix="u3.rtrmapbt" ;; + "refcnt") + path_format="/realtime/%u.refcount" + bt_prefix="u3.rtrefcbt" + ;; *) _fail "Don't know about rt btree ${bt_type}" ;; diff --git a/tests/xfs/1538 b/tests/xfs/1538 new file mode 100755 index 0000000000..e62bf49b29 --- /dev/null +++ b/tests/xfs/1538 @@ -0,0 +1,41 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1538 +# +# Populate a XFS filesystem and fuzz every rtrefcountbt record field. +# Use xfs_scrub to fix the corruption. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_scrub dangerous_online_repair realtime + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. ./common/fuzzy +. 
./common/reflink + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_scratch_reflink +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +path="$(_scratch_xfs_find_rgbtree_height 'refcnt' 2)" || \ + _fail "could not find two-level rtrefcountbt" +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" "path -m $path") + +echo "Fuzz rtrefcountbt recs" +_scratch_xfs_fuzz_metadata '' 'online' "path -m $path" "addr u${inode_ver}.rtrefcbt.ptrs[1]" >> $seqres.full +echo "Done fuzzing rtrefcountbt recs" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1538.out b/tests/xfs/1538.out new file mode 100644 index 0000000000..968cfd6ef9 --- /dev/null +++ b/tests/xfs/1538.out @@ -0,0 +1,4 @@ +QA output created by 1538 +Format and populate +Fuzz rtrefcountbt recs +Done fuzzing rtrefcountbt recs diff --git a/tests/xfs/1539 b/tests/xfs/1539 new file mode 100755 index 0000000000..36cef96a91 --- /dev/null +++ b/tests/xfs/1539 @@ -0,0 +1,41 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1539 +# +# Populate a XFS filesystem and fuzz every rtrefcountbt record field. +# Use xfs_repair to fix the corruption. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_scrub dangerous_repair realtime + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. ./common/fuzzy +. 
./common/reflink + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_scratch_reflink +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +path="$(_scratch_xfs_find_rgbtree_height 'refcnt' 2)" || \ + _fail "could not find two-level rtrefcountbt" +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" "path -m $path") + +echo "Fuzz rtrefcountbt recs" +_scratch_xfs_fuzz_metadata '' 'offline' "path -m $path" "addr u${inode_ver}.rtrefcbt.ptrs[1]" >> $seqres.full +echo "Done fuzzing rtrefcountbt recs" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1539.out b/tests/xfs/1539.out new file mode 100644 index 0000000000..aa3a963dc2 --- /dev/null +++ b/tests/xfs/1539.out @@ -0,0 +1,4 @@ +QA output created by 1539 +Format and populate +Fuzz rtrefcountbt recs +Done fuzzing rtrefcountbt recs diff --git a/tests/xfs/1540 b/tests/xfs/1540 new file mode 100755 index 0000000000..fa08d3fb54 --- /dev/null +++ b/tests/xfs/1540 @@ -0,0 +1,41 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1540 +# +# Populate a XFS filesystem and fuzz every rtrefcountbt record field. +# Do not fix the filesystem, to test metadata verifiers. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_norepair realtime + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. ./common/fuzzy +. 
./common/reflink + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_scratch_reflink +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +path="$(_scratch_xfs_find_rgbtree_height 'refcnt' 2)" || \ + _fail "could not find two-level rtrefcountbt" +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" "path -m $path") + +echo "Fuzz rtrefcountbt recs" +_scratch_xfs_fuzz_metadata '' 'none' "path -m $path" "addr u${inode_ver}.rtrefcbt.ptrs[1]" >> $seqres.full +echo "Done fuzzing rtrefcountbt recs" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1540.out b/tests/xfs/1540.out new file mode 100644 index 0000000000..37f3311837 --- /dev/null +++ b/tests/xfs/1540.out @@ -0,0 +1,4 @@ +QA output created by 1540 +Format and populate +Fuzz rtrefcountbt recs +Done fuzzing rtrefcountbt recs diff --git a/tests/xfs/1541 b/tests/xfs/1541 new file mode 100755 index 0000000000..ecf6fdc56c --- /dev/null +++ b/tests/xfs/1541 @@ -0,0 +1,42 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1541 +# +# Populate a XFS filesystem and fuzz every rtrefcountbt record field. +# Try online repair and, if necessary, offline repair, +# to test the most likely usage pattern. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_bothrepair realtime + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. ./common/fuzzy +. 
./common/reflink + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_scratch_reflink +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +path="$(_scratch_xfs_find_rgbtree_height 'refcnt' 2)" || \ + _fail "could not find two-level rtrefcountbt" +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" "path -m $path") + +echo "Fuzz rtrefcountbt recs" +_scratch_xfs_fuzz_metadata '' 'both' "path -m $path" "addr u${inode_ver}.rtrefcbt.ptrs[1]" >> $seqres.full +echo "Done fuzzing rtrefcountbt recs" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1541.out b/tests/xfs/1541.out new file mode 100644 index 0000000000..35a9b73471 --- /dev/null +++ b/tests/xfs/1541.out @@ -0,0 +1,4 @@ +QA output created by 1541 +Format and populate +Fuzz rtrefcountbt recs +Done fuzzing rtrefcountbt recs diff --git a/tests/xfs/1542 b/tests/xfs/1542 new file mode 100755 index 0000000000..37ef8a2b4c --- /dev/null +++ b/tests/xfs/1542 @@ -0,0 +1,41 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1542 +# +# Populate a XFS filesystem and fuzz every rtrefcountbt key/ptr field. +# Use xfs_scrub to fix the corruption. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_scrub dangerous_online_repair realtime + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. ./common/fuzzy +. 
./common/reflink + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_scratch_reflink +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +path="$(_scratch_xfs_find_rgbtree_height 'refcnt' 2)" || \ + _fail "could not find two-level rtrefcountbt" +inode_ver=$(_scratch_xfs_get_metadata_field "core.version" "path -m $path") + +echo "Fuzz rtrefcountbt keyptrs" +_scratch_xfs_fuzz_metadata '(rtrefcbt)' 'online' "path -m $path" >> $seqres.full +echo "Done fuzzing rtrefcountbt keyptrs" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1542.out b/tests/xfs/1542.out new file mode 100644 index 0000000000..55d820b4b1 --- /dev/null +++ b/tests/xfs/1542.out @@ -0,0 +1,4 @@ +QA output created by 1542 +Format and populate +Fuzz rtrefcountbt keyptrs +Done fuzzing rtrefcountbt keyptrs diff --git a/tests/xfs/1543 b/tests/xfs/1543 new file mode 100755 index 0000000000..0acd3203e3 --- /dev/null +++ b/tests/xfs/1543 @@ -0,0 +1,40 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1543 +# +# Populate a XFS filesystem and fuzz every rtrefcountbt key/ptr field. +# Use xfs_repair to fix the corruption. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_scrub dangerous_repair realtime + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. ./common/fuzzy +. 
./common/reflink + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_scratch_reflink +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +path="$(_scratch_xfs_find_rgbtree_height 'refcnt' 2)" || \ + _fail "could not find two-level rtrefcountbt" + +echo "Fuzz rtrefcountbt keyptrs" +_scratch_xfs_fuzz_metadata '(rtrefcbt)' 'offline' "path -m $path" >> $seqres.full +echo "Done fuzzing rtrefcountbt keyptrs" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1543.out b/tests/xfs/1543.out new file mode 100644 index 0000000000..e7afa10744 --- /dev/null +++ b/tests/xfs/1543.out @@ -0,0 +1,4 @@ +QA output created by 1543 +Format and populate +Fuzz rtrefcountbt keyptrs +Done fuzzing rtrefcountbt keyptrs diff --git a/tests/xfs/1544 b/tests/xfs/1544 new file mode 100755 index 0000000000..165f96f6a4 --- /dev/null +++ b/tests/xfs/1544 @@ -0,0 +1,40 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1544 +# +# Populate a XFS filesystem and fuzz every rtrefcountbt key/ptr field. +# Do not fix the filesystem, to test metadata verifiers. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_norepair realtime + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. ./common/fuzzy +. 
./common/reflink + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_scratch_reflink +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +path="$(_scratch_xfs_find_rgbtree_height 'refcnt' 2)" || \ + _fail "could not find two-level rtrefcountbt" + +echo "Fuzz rtrefcountbt keyptrs" +_scratch_xfs_fuzz_metadata '(rtrefcbt)' 'none' "path -m $path" >> $seqres.full +echo "Done fuzzing rtrefcountbt keyptrs" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1544.out b/tests/xfs/1544.out new file mode 100644 index 0000000000..b39532c160 --- /dev/null +++ b/tests/xfs/1544.out @@ -0,0 +1,4 @@ +QA output created by 1544 +Format and populate +Fuzz rtrefcountbt keyptrs +Done fuzzing rtrefcountbt keyptrs diff --git a/tests/xfs/1545 b/tests/xfs/1545 new file mode 100755 index 0000000000..a467662f2f --- /dev/null +++ b/tests/xfs/1545 @@ -0,0 +1,41 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 1545 +# +# Populate a XFS filesystem and fuzz every rtrefcountbt key/ptr field. +# Try online repair and, if necessary, offline repair, +# to test the most likely usage pattern. + +. ./common/preamble +_begin_fstest dangerous_fuzzers dangerous_bothrepair realtime + +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/populate +. ./common/fuzzy +. 
./common/reflink + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_scratch_reflink +_require_scratch_xfs_fuzz_fields +_disable_dmesg_check + +echo "Format and populate" +_scratch_populate_cached nofill > $seqres.full 2>&1 + +path="$(_scratch_xfs_find_rgbtree_height 'refcnt' 2)" || \ + _fail "could not find two-level rtrefcountbt" + +echo "Fuzz rtrefcountbt keyptrs" +_scratch_xfs_fuzz_metadata '(rtrefcbt)' 'both' "path -m $path" >> $seqres.full +echo "Done fuzzing rtrefcountbt keyptrs" + +# success, all done +status=0 +exit diff --git a/tests/xfs/1545.out b/tests/xfs/1545.out new file mode 100644 index 0000000000..982a0d64df --- /dev/null +++ b/tests/xfs/1545.out @@ -0,0 +1,4 @@ +QA output created by 1545 +Format and populate +Fuzz rtrefcountbt keyptrs +Done fuzzing rtrefcountbt keyptrs ^ permalink raw reply related [flat|nested] 565+ messages in thread
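The eight new tests form a two-by-four matrix: two fuzz targets (record blocks, key/ptr blocks) crossed with four repair strategies (online, offline, none, both). The sketch below just tabulates that layout; the function name is hypothetical and nothing like it exists in fstests.

```shell
#!/bin/bash
# Hypothetical lookup table summarizing tests 1538-1545 from this patch:
# prints "<fuzz target> <repair strategy>" for a given test number.
fuzz_test_params() {
	local testnum="$1"
	local modes=(online offline none both)
	local idx=$(( testnum - 1538 ))

	if [ "$idx" -lt 0 ] || [ "$idx" -gt 7 ]; then
		echo "unknown"
		return 1
	fi

	# The first four tests fuzz records, the last four fuzz keys/ptrs.
	local target="recs"
	if [ "$idx" -ge 4 ]; then
		target="keyptrs"
	fi
	echo "$target ${modes[idx % 4]}"
}

for n in 1538 1539 1540 1541 1542 1543 1544 1545; do
	echo "xfs/$n: $(fuzz_test_params "$n")"
done
```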
* [PATCH 06/10] xfs: race fsstress with realtime refcount btree scrub and repair 2022-12-30 22:20 ` [PATCHSET v1.0 00/10] fstests: reflink on the realtime device Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 03/10] xfs: create fuzz tests for the " Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 09/10] generic/331,xfs/240: support files that skip delayed allocation Darrick J. Wong ` (5 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Race checking and rebuilding realtime refcount btrees with fsstress. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/818 | 43 +++++++++++++++++++++++++++++++++++++++++++ tests/xfs/818.out | 2 ++ tests/xfs/819 | 43 +++++++++++++++++++++++++++++++++++++++++++ tests/xfs/819.out | 2 ++ 4 files changed, 90 insertions(+) create mode 100755 tests/xfs/818 create mode 100644 tests/xfs/818.out create mode 100755 tests/xfs/819 create mode 100644 tests/xfs/819.out diff --git a/tests/xfs/818 b/tests/xfs/818 new file mode 100755 index 0000000000..aabe636750 --- /dev/null +++ b/tests/xfs/818 @@ -0,0 +1,43 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2022 Oracle. Inc. All Rights Reserved. +# +# FS QA Test No. 818 +# +# Race fsstress and rt refcount btree scrub for a while to see if we crash or +# livelock. +# +. ./common/preamble +_begin_fstest scrub dangerous_fsstress_scrub + +_cleanup() { + _scratch_xfs_stress_scrub_cleanup &> /dev/null + cd / + rm -r -f $tmp.* +} +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/fuzzy +. ./common/inject +. 
./common/xfs + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_scratch +_require_xfs_stress_scrub + +_scratch_mkfs > "$seqres.full" 2>&1 +_scratch_mount +_require_xfs_has_feature "$SCRATCH_MNT" realtime +_require_xfs_has_feature "$SCRATCH_MNT" reflink +_xfs_force_bdev realtime $SCRATCH_MNT + +_scratch_xfs_stress_scrub -s "scrub rtrefcountbt %rgno%" + +# success, all done +echo Silence is golden +status=0 +exit diff --git a/tests/xfs/818.out b/tests/xfs/818.out new file mode 100644 index 0000000000..cb0997862e --- /dev/null +++ b/tests/xfs/818.out @@ -0,0 +1,2 @@ +QA output created by 818 +Silence is golden diff --git a/tests/xfs/819 b/tests/xfs/819 new file mode 100755 index 0000000000..e302ed1fdc --- /dev/null +++ b/tests/xfs/819 @@ -0,0 +1,43 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0 +# Copyright (c) 2022 Oracle. Inc. All Rights Reserved. +# +# FS QA Test No. 819 +# +# Race fsstress and rt refcount btree repair for a while to see if we crash or +# livelock. +# +. ./common/preamble +_begin_fstest online_repair dangerous_fsstress_repair + +_cleanup() { + _scratch_xfs_stress_scrub_cleanup &> /dev/null + cd / + rm -r -f $tmp.* +} +_register_cleanup "_cleanup" BUS + +# Import common functions. +. ./common/filter +. ./common/fuzzy +. ./common/inject +. 
./common/xfs + +# real QA test starts here +_supported_fs xfs +_require_realtime +_require_scratch +_require_xfs_stress_online_repair + +_scratch_mkfs > "$seqres.full" 2>&1 +_scratch_mount +_require_xfs_has_feature "$SCRATCH_MNT" realtime +_require_xfs_has_feature "$SCRATCH_MNT" reflink +_xfs_force_bdev realtime $SCRATCH_MNT + +_scratch_xfs_stress_online_repair -s "repair rtrefcountbt %rgno%" + +# success, all done +echo Silence is golden +status=0 +exit diff --git a/tests/xfs/819.out b/tests/xfs/819.out new file mode 100644 index 0000000000..f5df7622a7 --- /dev/null +++ b/tests/xfs/819.out @@ -0,0 +1,2 @@ +QA output created by 819 +Silence is golden ^ permalink raw reply related [flat|nested] 565+ messages in thread
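The `%rgno%` token in the scrub and repair commands looks like a placeholder for a realtime group number. How the fstests stress helpers expand it is not shown in this patch, so the following is only a sketch of one plausible expansion, with a hypothetical function name and a made-up group count.

```shell
#!/bin/bash
# Hypothetical helper: print one command per realtime group by replacing
# the %rgno% placeholder with each group number from 0 to rgcount - 1.
expand_rgno() {
	local template="$1" rgcount="$2"
	local rgno

	for (( rgno = 0; rgno < rgcount; rgno++ )); do
		echo "$template" | sed -e "s/%rgno%/${rgno}/g"
	done
}

# Three groups is an arbitrary number chosen for illustration.
expand_rgno "scrub rtrefcountbt %rgno%" 3
```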
* [PATCH 09/10] generic/331,xfs/240: support files that skip delayed allocation 2022-12-30 22:20 ` [PATCHSET v1.0 00/10] fstests: reflink on the realtime device Darrick J. Wong ` (3 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 06/10] xfs: race fsstress with realtime refcount btree scrub and repair Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 05/10] xfs/243: don't run when realtime storage is the default Darrick J. Wong ` (4 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> The goal of this test is to ensure that log recovery finishes a copy on write operation in the event of temporary media errors. It's important that the test observe some sort of IO error once we switch the scratch device to fail all IOs, but regrettably the test encoded the specific behavior of XFS and btrfs when the test was written -- the aio write to the page cache doesn't have to touch the disk and succeeds, and the fdatasync flushes things to disk and hits the IO error. However, this is not how things work on the XFS realtime device. There is no delalloc on realtime, so the aio write allocates an unwritten extent to stage the write. The allocation fails due to EIO, so it's the write call that fails. Therefore, all we need to do is to detect an IO error at any point between the write and the fdatasync call to be satisfied that the test does what we want to do. Signed-off-by: Darrick J. 
Wong <djwong@kernel.org> --- tests/generic/331 | 12 ++++++++++-- tests/generic/331.out | 2 +- tests/xfs/240 | 13 +++++++++++-- tests/xfs/240.out | 2 +- 4 files changed, 23 insertions(+), 6 deletions(-) diff --git a/tests/generic/331 b/tests/generic/331 index 492abedf76..8c665ce4fc 100755 --- a/tests/generic/331 +++ b/tests/generic/331 @@ -59,9 +59,17 @@ echo "CoW and unmount" $XFS_IO_PROG -f -c "pwrite -S 0x63 $bufsize 1" $testdir/file2 >> $seqres.full $XFS_IO_PROG -f -c "pwrite -S 0x63 -b $bufsize 0 $filesize" $TEST_DIR/moo >> $seqres.full sync + +# If the filesystem supports delalloc, then the fdatasync will report an IO +# error. If the write goes directly to disk, then aiocp will return nonzero. +unset write_failed _dmerror_load_error_table -$AIO_TEST -b $bufsize $TEST_DIR/moo $testdir/file2 >> $seqres.full -$XFS_IO_PROG -c "fdatasync" $testdir/file2 +$AIO_TEST -b $bufsize $TEST_DIR/moo $testdir/file2 &>> $seqres.full || \ + write_failed=1 +$XFS_IO_PROG -c "fdatasync" $testdir/file2 2>&1 | grep -q 'Input.output error' && \ + write_failed=1 +test -n "$write_failed" && echo "write failed" + _dmerror_load_working_table _dmerror_unmount _dmerror_mount diff --git a/tests/generic/331.out b/tests/generic/331.out index adbf841d00..d8ccea704b 100644 --- a/tests/generic/331.out +++ b/tests/generic/331.out @@ -5,7 +5,7 @@ Compare files 1886e67cf8783e89ce6ddc5bb09a3944 SCRATCH_MNT/test-331/file1 1886e67cf8783e89ce6ddc5bb09a3944 SCRATCH_MNT/test-331/file2 CoW and unmount -fdatasync: Input/output error +write failed Compare files 1886e67cf8783e89ce6ddc5bb09a3944 SCRATCH_MNT/test-331/file1 d94b0ab13385aba594411c174b1cc13c SCRATCH_MNT/test-331/file2 diff --git a/tests/xfs/240 b/tests/xfs/240 index a65c270d23..cabe309201 100755 --- a/tests/xfs/240 +++ b/tests/xfs/240 @@ -66,8 +66,17 @@ $XFS_IO_PROG -f -c "pwrite -S 0x63 $bufsize 1" $testdir/file2 >> $seqres.full $XFS_IO_PROG -f -c "pwrite -S 0x63 -b $bufsize 0 $filesize" $TEST_DIR/moo >> $seqres.full sync 
_dmerror_load_error_table -$AIO_TEST -b $bufsize $TEST_DIR/moo $testdir/file2 >> $seqres.full -$XFS_IO_PROG -c "fdatasync" $testdir/file2 + +# If the filesystem supports delalloc, then the fdatasync will report an IO +# error. If the write goes directly to disk, then aiocp will return nonzero. +unset write_failed +_dmerror_load_error_table +$AIO_TEST -b $bufsize $TEST_DIR/moo $testdir/file2 &>> $seqres.full || \ + write_failed=1 +$XFS_IO_PROG -c "fdatasync" $testdir/file2 2>&1 | grep -q 'Input.output error' && \ + write_failed=1 +test -n "$write_failed" && echo "write failed" + _dmerror_load_working_table _dmerror_unmount _dmerror_mount diff --git a/tests/xfs/240.out b/tests/xfs/240.out index 1a22e8a389..00bb116e5c 100644 --- a/tests/xfs/240.out +++ b/tests/xfs/240.out @@ -5,7 +5,7 @@ Compare files 1886e67cf8783e89ce6ddc5bb09a3944 SCRATCH_MNT/test-240/file1 1886e67cf8783e89ce6ddc5bb09a3944 SCRATCH_MNT/test-240/file2 CoW and unmount -fdatasync: Input/output error +write failed Compare files 1886e67cf8783e89ce6ddc5bb09a3944 SCRATCH_MNT/test-240/file1 d94b0ab13385aba594411c174b1cc13c SCRATCH_MNT/test-240/file2 ^ permalink raw reply related [flat|nested] 565+ messages in thread
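The detection logic described in this commit message, treating an IO error at either the write or the fdatasync step as success, can be sketched in isolation. The function below is hypothetical: it takes the write's exit status and the fdatasync output as plain arguments so that it runs without a dm-error device.

```shell
#!/bin/bash
# Hypothetical stand-in for the test logic: report "write failed" if
# either the write step returned nonzero or the sync step printed EIO.
check_write_failed() {
	local write_rc="$1" sync_output="$2"
	local write_failed=""

	if [ "$write_rc" -ne 0 ]; then
		write_failed=1
	fi
	if echo "$sync_output" | grep -q 'Input.output error'; then
		write_failed=1
	fi

	# Quote the expansion: an unquoted empty variable would leave
	# "test -n" with a single argument, which always evaluates true.
	if [ -n "$write_failed" ]; then
		echo "write failed"
	fi
}

check_write_failed 1 ""					# no delalloc: the write fails
check_write_failed 0 "fdatasync: Input/output error"	# delalloc: the flush fails
check_write_failed 0 ""					# no error seen: prints nothing
```

Both failure paths produce the same "write failed" line, so a single golden output covers filesystems with and without delayed allocation.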
* [PATCH 05/10] xfs/243: don't run when realtime storage is the default 2022-12-30 22:20 ` [PATCHSET v1.0 00/10] fstests: reflink on the realtime device Darrick J. Wong ` (4 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 09/10] generic/331,xfs/240: support files that skip delayed allocation Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 04/10] xfs/27[24]: adapt for checking files on the realtime volume Darrick J. Wong ` (3 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Realtime volumes don't support delayed allocation, so don't run this test when the mkfs configuration specifies realtime creation by default. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/243 | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/tests/xfs/243 b/tests/xfs/243 index 514fa35667..dda4a0c223 100755 --- a/tests/xfs/243 +++ b/tests/xfs/243 @@ -38,6 +38,11 @@ echo "Format and mount" _scratch_mkfs > $seqres.full 2>&1 _scratch_mount >> $seqres.full 2>&1 +# XFS does not support delayed allocation on realtime volumes (even for COW) +# so skip this test on those platforms. +$XFS_IO_PROG -c 'stat -v' $SCRATCH_MNT | grep -q "xflags.*rt-inherit" && + _notrun "delalloc not used for CoW on realtime device" + testdir=$SCRATCH_MNT/test-$seq mkdir $testdir ^ permalink raw reply related [flat|nested] 565+ messages in thread
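The skip condition in this patch is a grep over the verbose stat output of the scratch mount point. A self-contained sketch of the same check follows; the helper name and the sample input are hypothetical, and real xfs_io output may be laid out differently.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the given "stat -v" text reports the
# rt-inherit flag, using the same regular expression as the patch.
has_rt_inherit() {
	echo "$1" | grep -q "xflags.*rt-inherit"
}

# Illustrative input only, not captured from a real filesystem.
sample="fsxattr.xflags = 0x100 [rt-inherit]"
if has_rt_inherit "$sample"; then
	echo "skip: delalloc not used for CoW on realtime device"
fi
```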
* [PATCH 04/10] xfs/27[24]: adapt for checking files on the realtime volume 2022-12-30 22:20 ` [PATCHSET v1.0 00/10] fstests: reflink on the realtime device Darrick J. Wong ` (5 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 05/10] xfs/243: don't run when realtime storage is the default Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 08/10] xfs/769: add rtreflink upgrade to test matrix Darrick J. Wong ` (2 subsequent siblings) 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Adapt both tests to behave properly if the two files being tested are on the realtime volume. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/272 | 40 +++++++++++++++++++++++++------------ tests/xfs/274 | 62 ++++++++++++++++++++++++++++++++++++++++----------------- 2 files changed, 70 insertions(+), 32 deletions(-) diff --git a/tests/xfs/272 b/tests/xfs/272 index 42b4a2edb5..2d7fc57d55 100755 --- a/tests/xfs/272 +++ b/tests/xfs/272 @@ -40,26 +40,40 @@ $here/src/punch-alternating $SCRATCH_MNT/urk >> $seqres.full ino=$(stat -c '%i' $SCRATCH_MNT/urk) echo "Get fsmap" | tee -a $seqres.full -$XFS_IO_PROG -c 'fsmap -v' $SCRATCH_MNT >> $seqres.full $XFS_IO_PROG -c 'fsmap -v' $SCRATCH_MNT | tr '[]()' ' ' > $TEST_DIR/fsmap +cat $TEST_DIR/fsmap >> $seqres.full echo "Get bmap" | tee -a $seqres.full -$XFS_IO_PROG -c 'bmap -v' $SCRATCH_MNT/urk >> $seqres.full $XFS_IO_PROG -c 'bmap -v' $SCRATCH_MNT/urk | grep '^[[:space:]]*[0-9]*:' | grep -v 'hole' | tr '[]()' ' ' > $TEST_DIR/bmap +cat $TEST_DIR/bmap >> $seqres.full echo "Check bmap and fsmap" | tee -a $seqres.full -cat $TEST_DIR/bmap | while read ext offrange colon blockrange ag agrange total crap; do - qstr="^[[:space:]]*[0-9]*:[[:space:]]*[0-9]*:[0-9]*[[:space:]]*${blockrange} 
:[[:space:]]*${ino}[[:space:]]*${offrange}[[:space:]]*${ag}[[:space:]]*${agrange}[[:space:]]*${total}$" - echo "${qstr}" >> $seqres.full - grep "${qstr}" $TEST_DIR/fsmap >> $seqres.full - found=$(grep -c "${qstr}" $TEST_DIR/fsmap) - test $found -eq 1 || echo "Unexpected output for offset ${offrange}." -done +if $XFS_IO_PROG -c 'stat -v' $SCRATCH_MNT/urk | grep -q realtime; then + # file on rt volume + cat $TEST_DIR/bmap | while read ext offrange colon rtblockrange total crap; do + qstr="^[[:space:]]*[0-9]*:[[:space:]]*[0-9]*:[0-9]*[[:space:]]*${rtblockrange} :[[:space:]]*${ino}[[:space:]]*${offrange}[[:space:]]*${total}$" + echo "${qstr}" >> $seqres.full + grep "${qstr}" $TEST_DIR/fsmap >> $seqres.full + found=$(grep -c "${qstr}" $TEST_DIR/fsmap) + test $found -eq 1 || echo "Unexpected output for offset ${offrange}." + done -echo "Check device field of FS metadata and regular file" -data_dev=$(grep 'inode btree' $TEST_DIR/fsmap | head -n 1 | awk '{print $2}') -rt_dev=$(grep "${ino}[[:space:]]*[0-9]*\.\.[0-9]*" $TEST_DIR/fsmap | head -n 1 | awk '{print $2}') -test "${data_dev}" = "${rt_dev}" || echo "data ${data_dev} realtime ${rt_dev}?" + echo "Check device field of FS metadata and regular file" +else + # file on data volume + cat $TEST_DIR/bmap | while read ext offrange colon blockrange ag agrange total crap; do + qstr="^[[:space:]]*[0-9]*:[[:space:]]*[0-9]*:[0-9]*[[:space:]]*${blockrange} :[[:space:]]*${ino}[[:space:]]*${offrange}[[:space:]]*${ag}[[:space:]]*${agrange}[[:space:]]*${total}$" + echo "${qstr}" >> $seqres.full + grep "${qstr}" $TEST_DIR/fsmap >> $seqres.full + found=$(grep -c "${qstr}" $TEST_DIR/fsmap) + test $found -eq 1 || echo "Unexpected output for offset ${offrange}." 
+ done + + echo "Check device field of FS metadata and regular file" + data_dev=$(grep 'inode btree' $TEST_DIR/fsmap | head -n 1 | awk '{print $2}') + rt_dev=$(grep "${ino}[[:space:]]*[0-9]*\.\.[0-9]*" $TEST_DIR/fsmap | head -n 1 | awk '{print $2}') + test "${data_dev}" = "${rt_dev}" || echo "data ${data_dev} realtime ${rt_dev}?" +fi # success, all done status=0 diff --git a/tests/xfs/274 b/tests/xfs/274 index dcaea68804..25dd0c3f74 100755 --- a/tests/xfs/274 +++ b/tests/xfs/274 @@ -40,34 +40,58 @@ _cp_reflink $SCRATCH_MNT/f1 $SCRATCH_MNT/f2 ino=$(stat -c '%i' $SCRATCH_MNT/f1) echo "Get fsmap" | tee -a $seqres.full -$XFS_IO_PROG -c 'fsmap -v' $SCRATCH_MNT >> $seqres.full $XFS_IO_PROG -c 'fsmap -v' $SCRATCH_MNT | tr '[]()' ' ' > $TEST_DIR/fsmap +cat $TEST_DIR/fsmap >> $seqres.full echo "Get f1 bmap" | tee -a $seqres.full -$XFS_IO_PROG -c 'bmap -v' $SCRATCH_MNT/f1 >> $seqres.full $XFS_IO_PROG -c 'bmap -v' $SCRATCH_MNT/f1 | grep '^[[:space:]]*[0-9]*:' | grep -v 'hole' | tr '[]()' ' ' > $TEST_DIR/bmap +cat $TEST_DIR/bmap >> $seqres.full -echo "Check f1 bmap and fsmap" | tee -a $seqres.full -cat $TEST_DIR/bmap | while read ext offrange colon blockrange ag agrange total crap; do - qstr="^[[:space:]]*[0-9]*:[[:space:]]*[0-9]*:[0-9]*[[:space:]]*${blockrange} :[[:space:]]*${ino}[[:space:]]*${offrange}[[:space:]]*${ag}[[:space:]]*${agrange}[[:space:]]*${total} 0100000$" - echo "${qstr}" >> $seqres.full - grep "${qstr}" $TEST_DIR/fsmap >> $seqres.full - found=$(grep -c "${qstr}" $TEST_DIR/fsmap) - test $found -eq 1 || echo "Unexpected output for offset ${offrange}." -done +if _xfs_is_realtime_file $SCRATCH_MNT/f1 && ! 
_xfs_has_feature $SCRATCH_MNT rtgroups; then + # file on rt volume + echo "Check f1 bmap and fsmap" | tee -a $seqres.full + cat $TEST_DIR/bmap | while read ext offrange colon rtblockrange total crap; do + qstr="^[[:space:]]*[0-9]*:[[:space:]]*[0-9]*:[0-9]*[[:space:]]*${rtblockrange} :[[:space:]]*${ino}[[:space:]]*${offrange}[[:space:]]*${total} 0100000$" + echo "${qstr}" >> $seqres.full + grep "${qstr}" $TEST_DIR/fsmap >> $seqres.full + found=$(grep -c "${qstr}" $TEST_DIR/fsmap) + test $found -eq 1 || echo "Unexpected output for offset ${offrange}." + done +else + # file on data volume + echo "Check f1 bmap and fsmap" | tee -a $seqres.full + cat $TEST_DIR/bmap | while read ext offrange colon blockrange ag agrange total crap; do + qstr="^[[:space:]]*[0-9]*:[[:space:]]*[0-9]*:[0-9]*[[:space:]]*${blockrange} :[[:space:]]*${ino}[[:space:]]*${offrange}[[:space:]]*${ag}[[:space:]]*${agrange}[[:space:]]*${total} 0100000$" + echo "${qstr}" >> $seqres.full + grep "${qstr}" $TEST_DIR/fsmap >> $seqres.full + found=$(grep -c "${qstr}" $TEST_DIR/fsmap) + test $found -eq 1 || echo "Unexpected output for offset ${offrange}." + done +fi echo "Get f2 bmap" | tee -a $seqres.full -$XFS_IO_PROG -c 'bmap -v' $SCRATCH_MNT/f2 >> $seqres.full $XFS_IO_PROG -c 'bmap -v' $SCRATCH_MNT/f2 | grep '^[[:space:]]*[0-9]*:' | grep -v 'hole' | tr '[]()' ' ' > $TEST_DIR/bmap +cat $TEST_DIR/bmap >> $seqres.full -echo "Check f2 bmap and fsmap" | tee -a $seqres.full -cat $TEST_DIR/bmap | while read ext offrange colon blockrange ag agrange total crap; do - qstr="^[[:space:]]*[0-9]*:[[:space:]]*[0-9]*:[0-9]*[[:space:]]*${blockrange} :[[:space:]]*${ino}[[:space:]]*${offrange}[[:space:]]*${ag}[[:space:]]*${agrange}[[:space:]]*${total} 0100000$" - echo "${qstr}" >> $seqres.full - grep "${qstr}" $TEST_DIR/fsmap >> $seqres.full - found=$(grep -c "${qstr}" $TEST_DIR/fsmap) - test $found -eq 1 || echo "Unexpected output for offset ${offrange}." -done +if _xfs_is_realtime_file $SCRATCH_MNT/f2 && ! 
_xfs_has_feature $SCRATCH_MNT rtgroups; then + echo "Check f2 bmap and fsmap" | tee -a $seqres.full + cat $TEST_DIR/bmap | while read ext offrange colon rtblockrange total crap; do + qstr="^[[:space:]]*[0-9]*:[[:space:]]*[0-9]*:[0-9]*[[:space:]]*${rtblockrange} :[[:space:]]*${ino}[[:space:]]*${offrange}[[:space:]]*${total} 0100000$" + echo "${qstr}" >> $seqres.full + grep "${qstr}" $TEST_DIR/fsmap >> $seqres.full + found=$(grep -c "${qstr}" $TEST_DIR/fsmap) + test $found -eq 1 || echo "Unexpected output for offset ${offrange}." + done +else + echo "Check f2 bmap and fsmap" | tee -a $seqres.full + cat $TEST_DIR/bmap | while read ext offrange colon blockrange ag agrange total crap; do + qstr="^[[:space:]]*[0-9]*:[[:space:]]*[0-9]*:[0-9]*[[:space:]]*${blockrange} :[[:space:]]*${ino}[[:space:]]*${offrange}[[:space:]]*${ag}[[:space:]]*${agrange}[[:space:]]*${total} 0100000$" + echo "${qstr}" >> $seqres.full + grep "${qstr}" $TEST_DIR/fsmap >> $seqres.full + found=$(grep -c "${qstr}" $TEST_DIR/fsmap) + test $found -eq 1 || echo "Unexpected output for offset ${offrange}." + done +fi # success, all done status=0 ^ permalink raw reply related [flat|nested] 565+ messages in thread
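The reason the loops above need two variants is that the `read` field list has to match the number of columns in the bmap output. A small sketch of that, assuming made-up sample lines (real `xfs_io -c 'bmap -v'` output may differ in spacing and values) and two hypothetical helpers, `parse_data` and `parse_rt`:

```shell
# Made-up bmap lines for illustration only.
# Data volume: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
data_line=' 0: [0..7]: 96..103 0 (96..103) 8'
# Realtime volume: no AG and AG-OFFSET columns.
rt_line=' 0: [0..7]: 96..103 8'

# Hypothetical helper: split a data-volume line the way the tests do.
parse_data() {
	echo "$1" | tr '[]()' ' ' | \
	while read ext offrange colon blockrange ag agrange total crap; do
		echo "off=$offrange blocks=$blockrange ag=$ag total=$total"
	done
}

# Hypothetical helper: split a realtime line, which has fewer fields.
parse_rt() {
	echo "$1" | tr '[]()' ' ' | \
	while read ext offrange colon rtblockrange total crap; do
		echo "off=$offrange blocks=$rtblockrange total=$total"
	done
}

parse_data "$data_line"
parse_rt "$rt_line"
```

Feeding the realtime line to the data-volume field list would leave `agrange` and `total` empty, which is why the query string built from them would never match the fsmap output.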
* [PATCH 08/10] xfs/769: add rtreflink upgrade to test matrix 2022-12-30 22:20 ` [PATCHSET v1.0 00/10] fstests: reflink on the realtime device Darrick J. Wong ` (6 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 04/10] xfs/27[24]: adapt for checking files on the realtime volume Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 10/10] common/xfs: fix _xfs_get_file_block_size when rtinherit is set and no rt section Darrick J. Wong 2022-12-30 22:20 ` [PATCH 07/10] xfs: remove xfs/131 now that we allow reflink on realtime volumes Darrick J. Wong 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Add realtime reflink to the features that this test will try to upgrade. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/769 | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tests/xfs/769 b/tests/xfs/769 index ccc3ea10bc..72863a6e83 100755 --- a/tests/xfs/769 +++ b/tests/xfs/769 @@ -196,12 +196,13 @@ function post_exercise() # upgrade don't spread failure to the rest of the tests. FEATURES=() if rt_configured; then - # rmap wasn't added to rt devices until after metadir + # reflink & rmap weren't added to rt devices until after metadir check_repair_upgrade finobt && FEATURES+=("finobt") check_repair_upgrade inobtcount && FEATURES+=("inobtcount") check_repair_upgrade bigtime && FEATURES+=("bigtime") check_repair_upgrade metadir && FEATURES+=("metadir") check_repair_upgrade rmapbt && FEATURES+=("rmapbt") + check_repair_upgrade reflink && FEATURES+=("reflink") else check_repair_upgrade finobt && FEATURES+=("finobt") check_repair_upgrade rmapbt && FEATURES+=("rmapbt") ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 10/10] common/xfs: fix _xfs_get_file_block_size when rtinherit is set and no rt section 2022-12-30 22:20 ` [PATCHSET v1.0 00/10] fstests: reflink on the realtime device Darrick J. Wong ` (7 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 08/10] xfs/769: add rtreflink upgrade to test matrix Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 07/10] xfs: remove xfs/131 now that we allow reflink on realtime volumes Darrick J. Wong 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> It's possible for the sysadmin to set rtinherit on the directory tree even if there isn't a realtime section attached to the filesystem. When this is the case, the realtime flag is /not/ passed to new files, and file data is written to the data device. The file allocation unit for the file is the fs blocksize, and it is not correct to use the rt extent size. fstests can be fooled into doing the incorrect thing if the test runner puts '-d rtinherit=1 -r extsize=28k' into MKFS_OPTIONS without configuring a realtime device. This causes many tests to do the wrong thing because they think they must operate on units of 28k (and not 4k). Fix this. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/xfs | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/common/xfs b/common/xfs index 7b7b3a35b5..546853247c 100644 --- a/common/xfs +++ b/common/xfs @@ -207,6 +207,8 @@ _xfs_get_file_block_size() { local path="$1" + # If rtinherit or realtime are not set on the path, then all files + # will be created on the data device. if ! ($XFS_IO_PROG -c "stat -v" "$path" 2>&1 | grep -E -q '(rt-inherit|realtime)'); then _get_block_size "$path" return @@ -217,6 +219,15 @@ _xfs_get_file_block_size() while !
$XFS_INFO_PROG "$path" &>/dev/null && [ "$path" != "/" ]; do path="$(dirname "$path")" done + + # If there's no realtime section, the rtinherit and rextsize settings + # are irrelevant -- all files are created on the data device. + if $XFS_INFO_PROG "$path" | grep -q 'realtime =none'; then + _get_block_size "$path" + return + fi + + # Otherwise, report the rt extent size. _xfs_get_rtextsize "$path" } ^ permalink raw reply related [flat|nested] 565+ messages in thread
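The three-way decision that _xfs_get_file_block_size makes after this patch can be sketched as a pure function. This is an illustration under assumptions, not the helper itself: `pick_file_alloc_unit` is a hypothetical name, it takes the tool output as text instead of running xfs_io or xfs_info, and the sample strings are made up. The grep patterns are the ones from the patch.

```shell
# Hypothetical pure-function version of the decision.
# $1: text of 'stat -v', $2: text of xfs_info,
# $3: fs block size, $4: rt extent size (both in bytes).
pick_file_alloc_unit() {
	stat_out="$1"; info_out="$2"; fsblock="$3"; rtext="$4"

	# Neither rtinherit nor realtime set: files land on the data device.
	if ! echo "$stat_out" | grep -E -q '(rt-inherit|realtime)'; then
		echo "$fsblock"
		return
	fi

	# No realtime section: the flags and rextsize are irrelevant.
	if echo "$info_out" | grep -q 'realtime =none'; then
		echo "$fsblock"
		return
	fi

	# Otherwise the rt extent size is the allocation unit.
	echo "$rtext"
}

# Made-up sample inputs; real tool output may be formatted differently.
flags_rt='fsxattr.xflags = 0x100 [rt-inherit]'
flags_none='fsxattr.xflags = 0x0 []'
info_nort='realtime =none extsz=28672 blocks=0, rtextents=0'
info_rt='realtime =/dev/sdX extsz=28672 blocks=1000, rtextents=100'

pick_file_alloc_unit "$flags_none" "$info_rt" 4096 28672
pick_file_alloc_unit "$flags_rt" "$info_nort" 4096 28672
pick_file_alloc_unit "$flags_rt" "$info_rt" 4096 28672
```

The middle call is the case the commit message describes: rtinherit and a 28k extsize in MKFS_OPTIONS, but no realtime device, so the answer has to be the 4k block size.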
* [PATCH 07/10] xfs: remove xfs/131 now that we allow reflink on realtime volumes 2022-12-30 22:20 ` [PATCHSET v1.0 00/10] fstests: reflink on the realtime device Darrick J. Wong ` (8 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 10/10] common/xfs: fix _xfs_get_file_block_size when rtinherit is set and no rt section Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 9 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Remove this test, since we now support reflink on the rt volume. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/131 | 48 ------------------------------------------------ tests/xfs/131.out | 5 ----- 2 files changed, 53 deletions(-) delete mode 100755 tests/xfs/131 delete mode 100644 tests/xfs/131.out diff --git a/tests/xfs/131 b/tests/xfs/131 deleted file mode 100755 index 879e2dc6e8..0000000000 --- a/tests/xfs/131 +++ /dev/null @@ -1,48 +0,0 @@ -#! /bin/bash -# SPDX-License-Identifier: GPL-2.0 -# Copyright (c) 2015, Oracle and/or its affiliates. All Rights Reserved. -# -# FS QA Test No. 131 -# -# Ensure that we can't reflink realtime files. -# -. ./common/preamble -_begin_fstest auto quick clone realtime - -# Override the default cleanup function. -_cleanup() -{ - cd / - umount $SCRATCH_MNT > /dev/null 2>&1 - rm -rf $tmp.* $testdir $metadump_file -} - -# Import common functions. -. ./common/filter -. 
./common/reflink - -# real QA test starts here -_supported_fs xfs -_require_realtime -_require_scratch_reflink -_require_cp_reflink - -echo "Format and mount scratch device" -_scratch_mkfs >> $seqres.full -_scratch_mount - -testdir=$SCRATCH_MNT/test-$seq -mkdir $testdir - -echo "Create the original file blocks" -blksz=65536 -$XFS_IO_PROG -R -f -c "truncate $blksz" $testdir/file1 - -echo "Reflink every block" -_cp_reflink $testdir/file1 $testdir/file2 2>&1 | _filter_scratch - -test -s $testdir/file2 && _fail "Should not be able to reflink a realtime file." - -# success, all done -status=0 -exit diff --git a/tests/xfs/131.out b/tests/xfs/131.out deleted file mode 100644 index 3c0186f0c7..0000000000 --- a/tests/xfs/131.out +++ /dev/null @@ -1,5 +0,0 @@ -QA output created by 131 -Format and mount scratch device -Create the original file blocks -Reflink every block -cp: failed to clone 'SCRATCH_MNT/test-131/file2' from 'SCRATCH_MNT/test-131/file1': Invalid argument ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCHSET v1.0 0/4] fstests: reflink with large realtime extents 2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong ` (37 preceding siblings ...) 2022-12-30 22:20 ` [PATCHSET v1.0 00/10] fstests: reflink on the realtime device Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/4] xfs: make sure that CoW will write around when rextsize > 1 Darrick J. Wong ` (3 more replies) 2022-12-30 22:20 ` [PATCHSET v1.0 0/1] fstests: functional tests for rt quota Darrick J. Wong 39 siblings, 4 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan Hi all, Now that we've landed support for reflink on the realtime device for cases where the rt extent size is the same as the fs block size, enhance the reflink code further to support cases where the rt extent size is a power-of-two multiple of the fs block size. This enables us to do data block sharing (for example) for much larger allocation units by dirtying pagecache around shared extents and expanding writeback to write back shared extents fully. If you're going to start using this mess, you probably ought to just pull from my git trees, which are linked below. This is an extraordinary way to destroy everything. Enjoy! Comments and questions are, as always, welcome. 
--D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-reflink-extsize xfsprogs git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-reflink-extsize fstests git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-reflink-extsize --- common/rc | 23 +++++++ common/reflink | 27 +++++++++ tests/generic/145 | 1 tests/generic/147 | 1 tests/generic/261 | 1 tests/generic/262 | 1 tests/generic/303 | 8 ++- tests/generic/331 | 1 tests/generic/353 | 3 + tests/generic/517 | 1 tests/generic/657 | 1 tests/generic/658 | 1 tests/generic/659 | 1 tests/generic/660 | 1 tests/generic/663 | 1 tests/generic/664 | 1 tests/generic/665 | 1 tests/generic/670 | 1 tests/generic/672 | 1 tests/xfs/1212 | 1 tests/xfs/180 | 1 tests/xfs/182 | 1 tests/xfs/184 | 1 tests/xfs/192 | 1 tests/xfs/200 | 1 tests/xfs/204 | 1 tests/xfs/208 | 1 tests/xfs/315 | 1 tests/xfs/326 | 6 ++ tests/xfs/420 | 3 + tests/xfs/421 | 3 + tests/xfs/919 | 163 +++++++++++++++++++++++++++++++++++++++++++++++++++++ tests/xfs/919.out | 84 +++++++++++++++++++++++++++ 33 files changed, 342 insertions(+), 2 deletions(-) create mode 100755 tests/xfs/919 create mode 100644 tests/xfs/919.out ^ permalink raw reply [flat|nested] 565+ messages in thread
* [PATCH 1/4] xfs: make sure that CoW will write around when rextsize > 1 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: reflink with large realtime extents Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 2/4] xfs: skip cowextsize hint fragmentation tests on realtime volumes Darrick J. Wong ` (2 subsequent siblings) 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Make sure that CoW triggers the intended copy-around behavior when we write a tiny amount to the middle of a large rt extent. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/xfs/919 | 163 +++++++++++++++++++++++++++++++++++++++++++++++++++++ tests/xfs/919.out | 84 +++++++++++++++++++++++++++ 2 files changed, 247 insertions(+) create mode 100755 tests/xfs/919 create mode 100644 tests/xfs/919.out diff --git a/tests/xfs/919 b/tests/xfs/919 new file mode 100755 index 0000000000..45bb42f91f --- /dev/null +++ b/tests/xfs/919 @@ -0,0 +1,163 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 919 +# +# Make sure that copy on write actually does the intended write-around when we +# stage a tiny modification to a large shared realtime extent. We should never +# end up with multiple rt extents mapped to the same region. +# +. ./common/preamble +_begin_fstest auto quick clone realtime + +# Import common functions. +. ./common/filter +. 
./common/reflink + +# real QA test starts here +_supported_fs xfs +_require_xfs_io_command "fpunch" +_require_xfs_io_command "fzero" +_require_xfs_io_command "fcollapse" +_require_xfs_io_command "finsert" +_require_xfs_io_command "funshare" +_require_realtime +_require_scratch_reflink + +rtextsz=262144 +filesz=$((rtextsz * 3)) + +echo "Format filesystem and populate" +_scratch_mkfs -m reflink=1 -r extsize=$rtextsz > $seqres.full +_scratch_mount >> $seqres.full + +# Force all our files to be on the realtime device +_xfs_force_bdev realtime $SCRATCH_MNT + +check_file() { + $XFS_IO_PROG -c fsync -c 'bmap -elpv' $1 >> $seqres.full + md5sum $SCRATCH_MNT/a | _filter_scratch + md5sum $1 | _filter_scratch +} + +rtextsz_got=$(_xfs_get_rtextsize "$SCRATCH_MNT") +test $rtextsz_got -eq $rtextsz || \ + _notrun "got rtextsize $rtextsz_got, wanted $rtextsz" + +_pwrite_byte 0x59 0 $filesz $SCRATCH_MNT/a >> $seqres.full +sync +md5sum $SCRATCH_MNT/a | _filter_scratch +$XFS_IO_PROG -c 'bmap -elpv' $SCRATCH_MNT/a >> $seqres.full + +echo "pwrite 1 byte in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/b +_pwrite_byte 0x00 345678 1 $SCRATCH_MNT/b >> $seqres.full +check_file $SCRATCH_MNT/b + +echo "mwrite 1 byte in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/c +$XFS_IO_PROG -c "mmap -rw 0 $filesz" -c "mwrite -S 0x00 345678 1" -c msync $SCRATCH_MNT/c +check_file $SCRATCH_MNT/c + +echo "fzero 1 byte in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/d +$XFS_IO_PROG -c "fzero 345678 1" $SCRATCH_MNT/d +check_file $SCRATCH_MNT/d + +echo "fpunch 1 byte in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/e +$XFS_IO_PROG -c "fpunch 345678 1" $SCRATCH_MNT/e +check_file $SCRATCH_MNT/e + +echo "funshare 1 byte in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/f +$XFS_IO_PROG -c "funshare 345678 1" $SCRATCH_MNT/f +check_file $SCRATCH_MNT/f + +echo "pwrite 1 
block in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/g +_pwrite_byte 0x00 327680 65536 $SCRATCH_MNT/g >> $seqres.full +check_file $SCRATCH_MNT/g + +echo "mwrite 1 block in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/h +$XFS_IO_PROG -c "mmap -rw 0 $filesz" -c "mwrite -S 0x00 327680 65536" -c msync $SCRATCH_MNT/h +check_file $SCRATCH_MNT/h + +echo "fzero 1 block in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/i +$XFS_IO_PROG -c "fzero 327680 65536" $SCRATCH_MNT/i +check_file $SCRATCH_MNT/i + +echo "fpunch 1 block in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/j +$XFS_IO_PROG -c "fpunch 327680 65536" $SCRATCH_MNT/j +check_file $SCRATCH_MNT/j + +echo "funshare 1 block in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/k +$XFS_IO_PROG -c "funshare 327680 65536" $SCRATCH_MNT/k +check_file $SCRATCH_MNT/k + +echo "pwrite 1 extent in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/l +_pwrite_byte 0x00 262144 262144 $SCRATCH_MNT/l >> $seqres.full +check_file $SCRATCH_MNT/l + +echo "mwrite 1 extent in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/m +$XFS_IO_PROG -c "mmap -rw 0 $filesz" -c "mwrite -S 0x00 262144 262144" -c msync $SCRATCH_MNT/m +check_file $SCRATCH_MNT/m + +echo "fzero 1 extent in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/n +$XFS_IO_PROG -c "fzero 262144 262144" $SCRATCH_MNT/n +check_file $SCRATCH_MNT/n + +echo "fpunch 1 extent in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/o +$XFS_IO_PROG -c "fpunch 262144 262144" $SCRATCH_MNT/o +check_file $SCRATCH_MNT/o + +echo "funshare 1 extent in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/p +$XFS_IO_PROG -c "funshare 262144 262144" $SCRATCH_MNT/p +check_file $SCRATCH_MNT/p + +echo "fcollapse 1 extent in the middle" | tee -a 
$seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/q +$XFS_IO_PROG -c "fcollapse 262144 262144" $SCRATCH_MNT/q +check_file $SCRATCH_MNT/q + +echo "finsert 1 extent in the middle" | tee -a $seqres.full +_cp_reflink $SCRATCH_MNT/a $SCRATCH_MNT/r +$XFS_IO_PROG -c "finsert 262144 262144" $SCRATCH_MNT/r +check_file $SCRATCH_MNT/r + +echo "copy unwritten blocks in large rtext" | tee -a $seqres.full +$XFS_IO_PROG -f -c "falloc 0 $filesz" -c 'pwrite -S 0x59 345678 1' $SCRATCH_MNT/s >> $seqres.full +$XFS_IO_PROG -c 'bmap -elpv' $SCRATCH_MNT/s >> $seqres.full +_cp_reflink $SCRATCH_MNT/s $SCRATCH_MNT/t +$XFS_IO_PROG -c 'bmap -elpv' $SCRATCH_MNT/s >> $seqres.full +$XFS_IO_PROG -f -c 'pwrite -S 0x59 1048576 1' $SCRATCH_MNT/s >> $seqres.full +$XFS_IO_PROG -f -c 'pwrite -S 0x59 1048576 1' $SCRATCH_MNT/t >> $seqres.full +check_file $SCRATCH_MNT/s +check_file $SCRATCH_MNT/t + +echo "test writing to shared unwritten extent" | tee -a $seqres.full +$XFS_IO_PROG -c 'bmap -elpv' $SCRATCH_MNT/s >> $seqres.full +_cp_reflink $SCRATCH_MNT/s $SCRATCH_MNT/u +$XFS_IO_PROG -c 'bmap -elpv' $SCRATCH_MNT/s >> $seqres.full +$XFS_IO_PROG -f -c 'pwrite -S 0x59 345678 1' $SCRATCH_MNT/u >> $seqres.full +check_file $SCRATCH_MNT/u + +echo "Remount and recheck" | tee -a $seqres.full +md5sum $SCRATCH_MNT/a | _filter_scratch +for i in b c d e f g h i j k l m n o p q r s t u; do + check_file $SCRATCH_MNT/$i | grep -v SCRATCH_MNT.a +done + +# success, all done +status=0 +exit diff --git a/tests/xfs/919.out b/tests/xfs/919.out new file mode 100644 index 0000000000..6ab3f70d17 --- /dev/null +++ b/tests/xfs/919.out @@ -0,0 +1,84 @@ +QA output created by 919 +Format filesystem and populate +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +pwrite 1 byte in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +ea0b05f13c8cce703accaffe56d59bd3 SCRATCH_MNT/b +mwrite 1 byte in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +ea0b05f13c8cce703accaffe56d59bd3 SCRATCH_MNT/c +fzero 1 byte in the middle 
+924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +ea0b05f13c8cce703accaffe56d59bd3 SCRATCH_MNT/d +fpunch 1 byte in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +ea0b05f13c8cce703accaffe56d59bd3 SCRATCH_MNT/e +funshare 1 byte in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/f +pwrite 1 block in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +2a508e23efc80e468efa7004fd8a1839 SCRATCH_MNT/g +mwrite 1 block in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +2a508e23efc80e468efa7004fd8a1839 SCRATCH_MNT/h +fzero 1 block in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +2a508e23efc80e468efa7004fd8a1839 SCRATCH_MNT/i +fpunch 1 block in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +2a508e23efc80e468efa7004fd8a1839 SCRATCH_MNT/j +funshare 1 block in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/k +pwrite 1 extent in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +352abe71b0d40f194b9d701750b0d7f3 SCRATCH_MNT/l +mwrite 1 extent in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +352abe71b0d40f194b9d701750b0d7f3 SCRATCH_MNT/m +fzero 1 extent in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +352abe71b0d40f194b9d701750b0d7f3 SCRATCH_MNT/n +fpunch 1 extent in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +352abe71b0d40f194b9d701750b0d7f3 SCRATCH_MNT/o +funshare 1 extent in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/p +fcollapse 1 extent in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +b0581637c15320958874ef3f082111da SCRATCH_MNT/q +finsert 1 extent in the middle +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +a7359d0c100367c2cd430be334dffbd3 SCRATCH_MNT/r +copy unwritten blocks in large rtext +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +bff4e0a70430429c92d6139065e6949b 
SCRATCH_MNT/s +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +bff4e0a70430429c92d6139065e6949b SCRATCH_MNT/t +test writing to shared unwritten extent +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +bff4e0a70430429c92d6139065e6949b SCRATCH_MNT/u +Remount and recheck +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/a +ea0b05f13c8cce703accaffe56d59bd3 SCRATCH_MNT/b +ea0b05f13c8cce703accaffe56d59bd3 SCRATCH_MNT/c +ea0b05f13c8cce703accaffe56d59bd3 SCRATCH_MNT/d +ea0b05f13c8cce703accaffe56d59bd3 SCRATCH_MNT/e +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/f +2a508e23efc80e468efa7004fd8a1839 SCRATCH_MNT/g +2a508e23efc80e468efa7004fd8a1839 SCRATCH_MNT/h +2a508e23efc80e468efa7004fd8a1839 SCRATCH_MNT/i +2a508e23efc80e468efa7004fd8a1839 SCRATCH_MNT/j +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/k +352abe71b0d40f194b9d701750b0d7f3 SCRATCH_MNT/l +352abe71b0d40f194b9d701750b0d7f3 SCRATCH_MNT/m +352abe71b0d40f194b9d701750b0d7f3 SCRATCH_MNT/n +352abe71b0d40f194b9d701750b0d7f3 SCRATCH_MNT/o +924a97fdaa2ab30e2768081469e728a7 SCRATCH_MNT/p +b0581637c15320958874ef3f082111da SCRATCH_MNT/q +a7359d0c100367c2cd430be334dffbd3 SCRATCH_MNT/r +bff4e0a70430429c92d6139065e6949b SCRATCH_MNT/s +bff4e0a70430429c92d6139065e6949b SCRATCH_MNT/t +bff4e0a70430429c92d6139065e6949b SCRATCH_MNT/u ^ permalink raw reply related [flat|nested] 565+ messages in thread
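For anyone checking the offsets in xfs/919: with rtextsz=262144 and a three-extent file, every "in the middle" operation is chosen to fall inside the second rt extent. The arithmetic below is a sketch of the rt-extent-aligned range that covers each write. `rtext_round_out` is a hypothetical helper, and the rounding shown is an assumption about what "write-around" has to cover, not a description of the kernel implementation.

```shell
rtextsz=262144

# Hypothetical helper: print the start and end byte offsets of the
# rt-extent-aligned range covering a write of $2 bytes at offset $1.
rtext_round_out() {
	start=$(( $1 / rtextsz * rtextsz ))
	end=$(( ($1 + $2 + rtextsz - 1) / rtextsz * rtextsz ))
	echo "$start $end"
}

rtext_round_out 345678 1        # the 1-byte cases
rtext_round_out 327680 65536    # the 65536-byte "block" cases
rtext_round_out 262144 262144   # the whole-extent cases
```

All three print the same range, 262144 to 524288, which is why the golden output expects the first and last rt extents of each clone to stay byte-identical to SCRATCH_MNT/a.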
* [PATCH 2/4] xfs: skip cowextsize hint fragmentation tests on realtime volumes 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: reflink with large realtime extents Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/4] xfs: make sure that CoW will write around when rextsize > 1 Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCH 4/4] generic/303: avoid test failures on weird rt extent sizes Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/4] misc: add more congruent oplen testing Darrick J. Wong 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> The XFS CoW extent size hint is ignored on realtime filesystems when the rt extent size is set to a unit larger than a single filesystem block because it is assumed that the larger allocation unit is the administrator's sole and mandatory anti-fragmentation strategy. As such, we can skip these tests. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/reflink | 27 +++++++++++++++++++++++++++ tests/xfs/180 | 1 + tests/xfs/182 | 1 + tests/xfs/184 | 1 + tests/xfs/192 | 1 + tests/xfs/200 | 1 + tests/xfs/204 | 1 + tests/xfs/208 | 1 + tests/xfs/315 | 1 + tests/xfs/326 | 6 ++++++ 10 files changed, 41 insertions(+) diff --git a/common/reflink b/common/reflink index 22adc4449b..1082642e4e 100644 --- a/common/reflink +++ b/common/reflink @@ -521,3 +521,30 @@ _sweave_reflink_rainbow_delalloc() { _pwrite_byte 0x62 $((blksz * i)) $blksz $dfile.chk done } + +# Require that the COW extent size hint can actually be used to combat +# fragmentation on the scratch filesystem. This is (so far) true for any +# filesystem except for the ones where the realtime extent size is larger +# than one fs block, for it is assumed that setting a rt extent size is the +# preferred fragmentation avoidance strategy.
+_require_scratch_cowextsize_useful() { + local testfile=$SCRATCH_MNT/hascowextsize + local param="${1:-1m}" + + rm -f $testfile + touch $testfile + local before="$($XFS_IO_PROG -c 'cowextsize' $testfile)" + + $XFS_IO_PROG -c "cowextsize $param" $testfile + local after="$($XFS_IO_PROG -c 'cowextsize' $testfile)" + rm -f $testfile + + test "$before" != "$after" || \ + _notrun "setting cowextsize to $param had no effect" + + local fileblocksize=$(_get_file_block_size $SCRATCH_MNT) + local fsblocksize=$(_get_block_size $SCRATCH_MNT) + + test $fsblocksize -eq $fileblocksize || \ + _notrun "XFS does not support cowextsize when rt extsize ($fileblocksize) > 1FSB ($fsblocksize)" +} diff --git a/tests/xfs/180 b/tests/xfs/180 index cfea2020ce..06b4b69d52 100755 --- a/tests/xfs/180 +++ b/tests/xfs/180 @@ -37,6 +37,7 @@ nr=128 filesize=$((blksz * nr)) bufnr=16 bufsize=$((blksz * bufnr)) +_require_scratch_cowextsize_useful $bufsize _require_fs_space $SCRATCH_MNT $((filesize / 1024 * 3 * 5 / 4)) real_blksz=$(_get_block_size $testdir) diff --git a/tests/xfs/182 b/tests/xfs/182 index 511aca6f2d..7c0713b248 100755 --- a/tests/xfs/182 +++ b/tests/xfs/182 @@ -39,6 +39,7 @@ nr=128 filesize=$((blksz * nr)) bufnr=16 bufsize=$((blksz * bufnr)) +_require_scratch_cowextsize_useful $bufsize _require_fs_space $SCRATCH_MNT $((filesize / 1024 * 3 * 5 / 4)) real_blksz=$(_get_block_size $testdir) diff --git a/tests/xfs/184 b/tests/xfs/184 index 3bdd86addf..a0dc2741f5 100755 --- a/tests/xfs/184 +++ b/tests/xfs/184 @@ -39,6 +39,7 @@ nr=128 filesize=$((blksz * nr)) bufnr=16 bufsize=$((blksz * bufnr)) +_require_scratch_cowextsize_useful $bufsize _require_fs_space $SCRATCH_MNT $((filesize / 1024 * 3 * 5 / 4)) real_blksz=$(_get_block_size $testdir) diff --git a/tests/xfs/192 b/tests/xfs/192 index eb577f15fc..daa4fcb1e0 100755 --- a/tests/xfs/192 +++ b/tests/xfs/192 @@ -41,6 +41,7 @@ nr=128 filesize=$((blksz * nr)) bufnr=16 bufsize=$((blksz * bufnr)) +_require_scratch_cowextsize_useful $bufsize 
_require_fs_space $SCRATCH_MNT $((filesize / 1024 * 3 * 5 / 4)) real_blksz=$(_get_block_size $testdir) diff --git a/tests/xfs/200 b/tests/xfs/200 index b51b9a54f5..8eb54b5755 100755 --- a/tests/xfs/200 +++ b/tests/xfs/200 @@ -41,6 +41,7 @@ nr=128 filesize=$((blksz * nr)) bufnr=16 bufsize=$((blksz * bufnr)) +_require_scratch_cowextsize_useful $bufsize _require_fs_space $SCRATCH_MNT $((filesize / 1024 * 3 * 5 / 4)) real_blksz=$(_get_block_size $testdir) diff --git a/tests/xfs/204 b/tests/xfs/204 index 7d6b79a86d..85b7ed7cd9 100755 --- a/tests/xfs/204 +++ b/tests/xfs/204 @@ -43,6 +43,7 @@ nr=128 filesize=$((blksz * nr)) bufnr=16 bufsize=$((blksz * bufnr)) +_require_scratch_cowextsize_useful $bufsize _require_fs_space $SCRATCH_MNT $((filesize / 1024 * 3 * 5 / 4)) real_blksz=$(_get_block_size $testdir) diff --git a/tests/xfs/208 b/tests/xfs/208 index 9a71b74f6f..3a4a3e4df1 100755 --- a/tests/xfs/208 +++ b/tests/xfs/208 @@ -40,6 +40,7 @@ nr=128 filesize=$((blksz * nr)) bufnr=16 bufsize=$((blksz * bufnr)) +_require_scratch_cowextsize_useful $bufsize _require_fs_space $SCRATCH_MNT $((filesize / 1024 * 3 * 5 / 4)) real_blksz=$(_get_file_block_size $testdir) diff --git a/tests/xfs/315 b/tests/xfs/315 index 9f6b39c8cc..3a618a3680 100755 --- a/tests/xfs/315 +++ b/tests/xfs/315 @@ -38,6 +38,7 @@ echo "Format filesystem" _scratch_mkfs >/dev/null 2>&1 _scratch_mount >> $seqres.full _require_congruent_file_oplen $SCRATCH_MNT $blksz +_require_scratch_cowextsize_useful $sz $XFS_IO_PROG -c "cowextsize $sz" $SCRATCH_MNT diff --git a/tests/xfs/326 b/tests/xfs/326 index ac620fc433..a3fed8b6ac 100755 --- a/tests/xfs/326 +++ b/tests/xfs/326 @@ -55,6 +55,12 @@ $XFS_IO_PROG -c "cowextsize $sz" $SCRATCH_MNT # staging extent for an unshared extent and trips over the injected error. _require_no_xfs_always_cow +# This test uses a very large cowextszhint to manipulate the COW fork to +# contain a large unwritten extent before injecting the error. 
XFS ignores +# cowextsize when the realtime extent size is greater than 1FSB, so this test +# cannot set up the preconditions for the test. +_require_scratch_cowextsize_useful $sz + echo "Create files" _pwrite_byte 0x66 0 $sz $SCRATCH_MNT/file1 >> $seqres.full _cp_reflink $SCRATCH_MNT/file1 $SCRATCH_MNT/file2 ^ permalink raw reply related [flat|nested] 565+ messages in thread
* [PATCH 4/4] generic/303: avoid test failures on weird rt extent sizes
  2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: reflink with large realtime extents Darrick J. Wong
  2022-12-30 22:20   ` [PATCH 1/4] xfs: make sure that CoW will write around when rextsize > 1 Darrick J. Wong
  2022-12-30 22:20   ` [PATCH 2/4] xfs: skip cowextsize hint fragmentation tests on realtime volumes Darrick J. Wong
@ 2022-12-30 22:20   ` Darrick J. Wong
  2022-12-30 22:20   ` [PATCH 3/4] misc: add more congruent oplen testing Darrick J. Wong
  3 siblings, 0 replies; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:20 UTC
  To: zlang, djwong; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Fix this test to skip the high offset reflink test if (on XFS) the rt
extent size isn't congruent with the chosen target offset.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/rc         |   23 +++++++++++++++++++++++
 tests/generic/303 |    8 +++++++-
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/common/rc b/common/rc
index cfe765de2e..3c30a444fe 100644
--- a/common/rc
+++ b/common/rc
@@ -4488,6 +4488,29 @@ _get_file_block_size()
 	esac
 }
 
+_test_congruent_file_oplen()
+{
+	local file="$1"
+	local alloc_unit=$(_get_file_block_size "$file")
+	local oplen="$2"
+
+	case $FSTYP in
+	nfs*|cifs|9p|virtiofs|ceph|glusterfs|overlay|pvfs2)
+		# Network filesystems don't know about (or tell the client
+		# about) the underlying file allocation unit and they generally
+		# pass the file IO request to the underlying filesystem, so we
+		# don't have anything to check here.
+		return
+		;;
+	esac
+
+	if [ $alloc_unit -gt $oplen ]; then
+		return 1
+	fi
+	test $((oplen % alloc_unit)) -eq 0 || return 1
+	return 0
+}
+
 # Given a file path and a byte length of a file operation under test, ensure
 # that the length is an integer multiple of the file's allocation unit size.
 # In other words, skip the test unless (oplen ≡ alloc_unit mod 0).  This is
diff --git a/tests/generic/303 b/tests/generic/303
index 95679569e4..ef88d2357b 100755
--- a/tests/generic/303
+++ b/tests/generic/303
@@ -48,7 +48,13 @@ echo "Reflink past maximum file size in dest file (should fail)"
 _reflink_range $testdir/file1 0 $testdir/file5 4611686018427322368 $len >> $seqres.full
 
 echo "Reflink high offset to low offset"
-_reflink_range $testdir/file1 $bigoff_64k $testdir/file6 1048576 65535 >> $seqres.full
+oplen=1048576
+if _test_congruent_file_oplen $testdir $oplen; then
+	_reflink_range $testdir/file1 $bigoff_64k $testdir/file6 $oplen 65535 >> $seqres.full
+else
+	# If we can't do the ficlonerange test, fake it in the output file
+	$XFS_IO_PROG -f -c 'pwrite -S 0x61 1114110 1' $testdir/file6 >> $seqres.full
+fi
 
 echo "Reflink past source file EOF (should fail)"
 _reflink_range $testdir/file2 524288 $testdir/file7 0 1048576 >> $seqres.full
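The offset used by the fallback pwrite in generic/303 follows from the arguments of the reflink call it replaces. The sketch below reproduces that arithmetic. It assumes that `_reflink_range` takes its arguments in the order source file, source offset, destination file, destination offset, length; the variable names are made up for illustration and do not appear in the patch.

```bash
#!/bin/bash
# Sketch only: reproduce the arithmetic behind the fallback pwrite offset.
# The two values are copied from the _reflink_range call in the patch; the
# variable names are hypothetical.
dest_off=1048576	# $oplen, assumed to be the destination offset
len=65535		# length of the cloned range

# A successful clone would leave file6 ending at dest_off + len bytes.
expected_size=$((dest_off + len))

# Writing a single byte at the last offset produces a file of the same size.
last_byte=$((expected_size - 1))

echo "expected size: $expected_size"
echo "fallback pwrite offset: $last_byte"
```

Under that assumption, the one-byte write at 1114110 leaves file6 the same length it would have had after the clone, which is presumably what "fake it in the output file" refers to.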
* [PATCH 3/4] misc: add more congruent oplen testing 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: reflink with large realtime extents Darrick J. Wong ` (2 preceding siblings ...) 2022-12-30 22:20 ` [PATCH 4/4] generic/303: avoid test failures on weird rt extent sizes Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 3 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Do more checking for file allocation operation op length congruency. This prevents tests from failing with EINVAL when the realtime extent size is something weird like 28k or 1GB. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- tests/generic/145 | 1 + tests/generic/147 | 1 + tests/generic/261 | 1 + tests/generic/262 | 1 + tests/generic/331 | 1 + tests/generic/353 | 3 ++- tests/generic/517 | 1 + tests/generic/657 | 1 + tests/generic/658 | 1 + tests/generic/659 | 1 + tests/generic/660 | 1 + tests/generic/663 | 1 + tests/generic/664 | 1 + tests/generic/665 | 1 + tests/generic/670 | 1 + tests/generic/672 | 1 + tests/xfs/1212 | 1 + tests/xfs/420 | 3 +++ tests/xfs/421 | 3 +++ 19 files changed, 24 insertions(+), 1 deletion(-) diff --git a/tests/generic/145 b/tests/generic/145 index f213f53be8..81fc5f6c2f 100755 --- a/tests/generic/145 +++ b/tests/generic/145 @@ -36,6 +36,7 @@ mkdir $testdir echo "Create the original files" blksz=65536 +_require_congruent_file_oplen $TEST_DIR $blksz _pwrite_byte 0x61 0 $blksz $testdir/file1 >> $seqres.full _pwrite_byte 0x62 $blksz $blksz $testdir/file1 >> $seqres.full _pwrite_byte 0x63 $((blksz * 2)) $blksz $testdir/file1 >> $seqres.full diff --git a/tests/generic/147 b/tests/generic/147 index 113800944b..bb17bb1c0b 100755 --- a/tests/generic/147 +++ b/tests/generic/147 @@ -35,6 +35,7 @@ mkdir $testdir echo "Create the original files" blksz=65536 +_require_congruent_file_oplen $TEST_DIR $blksz _pwrite_byte 0x61 0 
$blksz $testdir/file1 >> $seqres.full _pwrite_byte 0x62 $blksz $blksz $testdir/file1 >> $seqres.full _pwrite_byte 0x63 $((blksz * 2)) $blksz $testdir/file1 >> $seqres.full diff --git a/tests/generic/261 b/tests/generic/261 index 93c1c349b1..deb360288e 100755 --- a/tests/generic/261 +++ b/tests/generic/261 @@ -29,6 +29,7 @@ testdir=$SCRATCH_MNT/test-$seq mkdir $testdir blksz=65536 +_require_congruent_file_oplen $SCRATCH_MNT $blksz nr=5 filesize=$((blksz * nr)) diff --git a/tests/generic/262 b/tests/generic/262 index 46e88f8731..f296e37e02 100755 --- a/tests/generic/262 +++ b/tests/generic/262 @@ -29,6 +29,7 @@ testdir=$SCRATCH_MNT/test-$seq mkdir $testdir blksz=65536 +_require_congruent_file_oplen $SCRATCH_MNT $blksz nr=4 filesize=$((blksz * nr)) diff --git a/tests/generic/331 b/tests/generic/331 index 8c665ce4fc..9b6801e16f 100755 --- a/tests/generic/331 +++ b/tests/generic/331 @@ -38,6 +38,7 @@ testdir=$SCRATCH_MNT/test-$seq mkdir $testdir blksz=65536 +_require_congruent_file_oplen $SCRATCH_MNT $blksz nr=640 bufnr=128 filesize=$((blksz * nr)) diff --git a/tests/generic/353 b/tests/generic/353 index 9a1471bd81..94c9ac2273 100755 --- a/tests/generic/353 +++ b/tests/generic/353 @@ -29,7 +29,8 @@ _require_xfs_io_command "fiemap" _scratch_mkfs > /dev/null 2>&1 _scratch_mount -blocksize=64k +blocksize=65536 +_require_congruent_file_oplen $SCRATCH_MNT $blocksize file1="$SCRATCH_MNT/file1" file2="$SCRATCH_MNT/file2" diff --git a/tests/generic/517 b/tests/generic/517 index cf3031ed2d..229358d06b 100755 --- a/tests/generic/517 +++ b/tests/generic/517 @@ -21,6 +21,7 @@ _require_scratch_dedupe _scratch_mkfs >>$seqres.full 2>&1 _scratch_mount +_require_congruent_file_oplen $SCRATCH_MNT 65536 # The first byte with a value of 0xae starts at an offset (512Kb + 100) which is # not a multiple of the block size. 
diff --git a/tests/generic/657 b/tests/generic/657 index e0fecd544c..9f4673dda3 100755 --- a/tests/generic/657 +++ b/tests/generic/657 @@ -30,6 +30,7 @@ mkdir $testdir echo "Create the original files" blksz=65536 +_require_congruent_file_oplen $SCRATCH_MNT $blksz nr=64 filesize=$((blksz * nr)) _pwrite_byte 0x61 0 $filesize $testdir/file1 >> $seqres.full diff --git a/tests/generic/658 b/tests/generic/658 index a5cbadaaa5..e9519c25e2 100755 --- a/tests/generic/658 +++ b/tests/generic/658 @@ -31,6 +31,7 @@ mkdir $testdir echo "Create the original files" blksz=65536 +_require_congruent_file_oplen $SCRATCH_MNT $blksz nr=64 filesize=$((blksz * nr)) _weave_reflink_regular $blksz $nr $testdir/file1 $testdir/file3 >> $seqres.full diff --git a/tests/generic/659 b/tests/generic/659 index ccc2d7950d..05436edfab 100755 --- a/tests/generic/659 +++ b/tests/generic/659 @@ -31,6 +31,7 @@ mkdir $testdir echo "Create the original files" blksz=65536 +_require_congruent_file_oplen $SCRATCH_MNT $blksz nr=64 filesize=$((blksz * nr)) _weave_reflink_unwritten $blksz $nr $testdir/file1 $testdir/file3 >> $seqres.full diff --git a/tests/generic/660 b/tests/generic/660 index bc17dc5e59..52b0d1ea9e 100755 --- a/tests/generic/660 +++ b/tests/generic/660 @@ -31,6 +31,7 @@ mkdir $testdir echo "Create the original files" blksz=65536 +_require_congruent_file_oplen $SCRATCH_MNT $blksz nr=64 filesize=$((blksz * nr)) _weave_reflink_holes $blksz $nr $testdir/file1 $testdir/file3 >> $seqres.full diff --git a/tests/generic/663 b/tests/generic/663 index 658a5b7004..692c77b745 100755 --- a/tests/generic/663 +++ b/tests/generic/663 @@ -32,6 +32,7 @@ mkdir $testdir echo "Create the original files" blksz=65536 +_require_congruent_file_oplen $SCRATCH_MNT $blksz nr=64 filesize=$((blksz * nr)) _sweave_reflink_regular $blksz $nr $testdir/file1 $testdir/file3 >> $seqres.full diff --git a/tests/generic/664 b/tests/generic/664 index 3009101fdc..40fb8c6d92 100755 --- a/tests/generic/664 +++ b/tests/generic/664 @@ 
-34,6 +34,7 @@ mkdir $testdir echo "Create the original files" blksz=65536 +_require_congruent_file_oplen $SCRATCH_MNT $blksz nr=64 filesize=$((blksz * nr)) _sweave_reflink_unwritten $blksz $nr $testdir/file1 $testdir/file3 >> $seqres.full diff --git a/tests/generic/665 b/tests/generic/665 index 86ba578720..ee511755e6 100755 --- a/tests/generic/665 +++ b/tests/generic/665 @@ -34,6 +34,7 @@ mkdir $testdir echo "Create the original files" blksz=65536 +_require_congruent_file_oplen $SCRATCH_MNT $blksz nr=64 filesize=$((blksz * nr)) _sweave_reflink_holes $blksz $nr $testdir/file1 $testdir/file3 >> $seqres.full diff --git a/tests/generic/670 b/tests/generic/670 index 67de167405..80f9fe6d4f 100755 --- a/tests/generic/670 +++ b/tests/generic/670 @@ -31,6 +31,7 @@ mkdir $testdir loops=512 nr_loops=$((loops - 1)) blksz=65536 +_require_congruent_file_oplen $SCRATCH_MNT $blksz echo "Initialize files" echo >> $seqres.full diff --git a/tests/generic/672 b/tests/generic/672 index 9e3a97ec5e..0710a04294 100755 --- a/tests/generic/672 +++ b/tests/generic/672 @@ -30,6 +30,7 @@ mkdir $testdir loops=1024 nr_loops=$((loops - 1)) blksz=65536 +_require_congruent_file_oplen $SCRATCH_MNT $blksz echo "Initialize files" echo >> $seqres.full diff --git a/tests/xfs/1212 b/tests/xfs/1212 index d2292d65a2..655cd14021 100755 --- a/tests/xfs/1212 +++ b/tests/xfs/1212 @@ -32,6 +32,7 @@ _require_xfs_io_error_injection "bmap_finish_one" _scratch_mkfs >> $seqres.full _scratch_mount +_require_congruent_file_oplen $SCRATCH_MNT 65536 # Create original file _pwrite_byte 0x58 0 1m $SCRATCH_MNT/a >> $seqres.full diff --git a/tests/xfs/420 b/tests/xfs/420 index d38772c9d9..51f87bc304 100755 --- a/tests/xfs/420 +++ b/tests/xfs/420 @@ -69,6 +69,9 @@ exercise_lseek() { } blksz=65536 +# Golden output encodes SEEK_HOLE/DATA output, which depends on COW only +# happening on $blksz granularity +_require_congruent_file_oplen $SCRATCH_MNT $blksz nr=8 filesize=$((blksz * nr)) diff --git a/tests/xfs/421 
b/tests/xfs/421 index 027ae47c21..429333e349 100755 --- a/tests/xfs/421 +++ b/tests/xfs/421 @@ -51,6 +51,9 @@ testdir=$SCRATCH_MNT/test-$seq mkdir $testdir blksz=65536 +# Golden output encodes SEEK_HOLE/DATA output, which depends on COW only +# happening on $blksz granularity +_require_congruent_file_oplen $SCRATCH_MNT $blksz nr=8 filesize=$((blksz * nr)) ^ permalink raw reply related [flat|nested] 565+ messages in thread
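To see why rt extent sizes such as 28k and 1GB are the problem cases for these 64k-based tests, the check can be tried outside fstests. In the sketch below, `is_congruent` is a hypothetical stand-in that follows the logic of `_test_congruent_file_oplen` from this series but takes the allocation unit as an argument instead of querying a filesystem.

```bash
#!/bin/bash
# Hypothetical stand-in for the congruency test; not part of fstests.
# Succeeds only if oplen is at least one allocation unit long and is an
# exact multiple of it.
is_congruent() {
	local alloc_unit="$1"
	local oplen="$2"

	[ "$alloc_unit" -le "$oplen" ] || return 1
	[ $((oplen % alloc_unit)) -eq 0 ]
}

oplen=65536	# the blksz that most of the patched tests use
for unit in 4096 28672 65536 1073741824; do
	if is_congruent "$unit" "$oplen"; then
		echo "alloc unit $unit: run"
	else
		echo "alloc unit $unit: skip"
	fi
done
```

A 28k unit fails because 65536 is not a multiple of 28672, and a 1GB unit fails because it is larger than the operation length; the 4k and 64k units pass.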
* [PATCHSET v1.0 0/1] fstests: functional tests for rt quota
  2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
                   ` (38 preceding siblings ...)
  2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: reflink with large realtime extents Darrick J. Wong
@ 2022-12-30 22:20 ` Darrick J. Wong
  2022-12-30 22:20   ` [PATCH 1/1] xfs: regression testing of quota on the realtime device Darrick J. Wong
  39 siblings, 1 reply; 565+ messages in thread
From: Darrick J. Wong @ 2022-12-30 22:20 UTC
  To: zlang, djwong; +Cc: linux-xfs, fstests, guan

Hi all,

The sole patch in this series sets up functional testing for quota on the
xfs realtime device.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=realtime-quotas

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=realtime-quotas

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=realtime-quotas
---
 common/quota      |   30 ++++++++++
 tests/xfs/767     |  167 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/767.out |   41 +++++++++++++
 3 files changed, 238 insertions(+)
 create mode 100755 tests/xfs/767
 create mode 100644 tests/xfs/767.out
* [PATCH 1/1] xfs: regression testing of quota on the realtime device 2022-12-30 22:20 ` [PATCHSET v1.0 0/1] fstests: functional tests for rt quota Darrick J. Wong @ 2022-12-30 22:20 ` Darrick J. Wong 0 siblings, 0 replies; 565+ messages in thread From: Darrick J. Wong @ 2022-12-30 22:20 UTC (permalink / raw) To: zlang, djwong; +Cc: linux-xfs, fstests, guan From: Darrick J. Wong <djwong@kernel.org> Make sure that quota accounting and enforcement work correctly for realtime volumes on XFS. Signed-off-by: Darrick J. Wong <djwong@kernel.org> --- common/quota | 30 ++++++++++ tests/xfs/767 | 167 +++++++++++++++++++++++++++++++++++++++++++++++++++++ tests/xfs/767.out | 41 +++++++++++++ 3 files changed, 238 insertions(+) create mode 100755 tests/xfs/767 create mode 100644 tests/xfs/767.out diff --git a/common/quota b/common/quota index 96b8d04424..f4c528c836 100644 --- a/common/quota +++ b/common/quota @@ -117,6 +117,36 @@ _require_xfs_quota_acct_enabled() _notrun "$qtype: accounting not enabled on $fsname filesystem." } +# Decide if the mounted filesystem supports realtime quotas. +_require_rtquota() +{ + local dev="$1" + test -z "$dev" && dev="$TEST_DEV" + local rtdev="$2" + test -z "$rtdev" && rtdev="$TEST_RTDEV" + + test "$FSTYP" = "xfs" || \ + _notrun "Realtime quota only supported on xfs" + + [ -n "$XFS_QUOTA_PROG" ] || \ + _notrun "xfs_quota user tool not installed" + + $here/src/feature -q $dev || \ + _notrun "Installed kernel does not support XFS quotas" + + test -b "$rtdev" || \ + _notrun "No realtime device supplied?" + + test "$USE_EXTERNAL" = "yes" || \ + _notrun "Realtime requires USE_EXTERNAL='yes'" + + $here/src/feature -U $dev || \ + $here/src/feature -G $dev || \ + $here/src/feature -P $dev || \ + _notrun "Mounted rt filesystem does not have quotas enabled" + +} + # # checks that xfs_quota can operate on foreign (non-xfs) filesystems # Skips check on xfs filesystems, old xfs_quota is fine there. 
diff --git a/tests/xfs/767 b/tests/xfs/767 new file mode 100755 index 0000000000..f30321d7fc --- /dev/null +++ b/tests/xfs/767 @@ -0,0 +1,167 @@ +#! /bin/bash +# SPDX-License-Identifier: GPL-2.0-or-later +# Copyright (c) 2022 Oracle. All Rights Reserved. +# +# FS QA Test No. 767 +# +# Functional testing for realtime quotas. + +. ./common/preamble +_begin_fstest auto quick quota realtime + +# Import common functions. +. ./common/quota +. ./common/filter + +# real QA test starts here +_supported_fs xfs +_require_test_program "punch-alternating" +_require_scratch +_require_user + +echo "Format filesystem" | tee -a $seqres.full +_scratch_mkfs > $seqres.full +_qmount_option 'usrquota' +_qmount +_require_rtquota $SCRATCH_DEV $SCRATCH_RTDEV + +# Make sure all our files are on the rt device +_xfs_force_bdev realtime $SCRATCH_MNT +chmod a+rwx $SCRATCH_MNT + +# Record rt geometry +bmbt_blksz=$(_get_block_size $SCRATCH_MNT) +file_blksz=$(_get_file_block_size $SCRATCH_MNT) +rextsize=$((file_blksz / bmbt_blksz)) +echo "bmbt_blksz $bmbt_blksz" >> $seqres.full +echo "file_blksz $file_blksz" >> $seqres.full +echo "rextsize $rextsize" >> $seqres.full + +note() { + echo -e "\n$@" | tee -a $seqres.full +} + +# Report on the user's block and rt block usage, soft limit, hard limit, and +# warning count for rt volumes +report_rtusage() { + local user="$1" + local timeout_arg="$2" + local print_timeout=0 + + test -z "$user" && user=$qa_user + test -n "$timeout_arg" && print_timeout=1 + + $XFS_QUOTA_PROG -c "quota -u -r -n -N $user" $SCRATCH_MNT | \ + sed -e 's/ days/_days/g' >> $seqres.full + + $XFS_QUOTA_PROG -c "quota -u -r -n -N $user" $SCRATCH_MNT | \ + sed -e 's/ days/_days/g' | \ + awk -v user=$user -v print_timeout=$print_timeout -v file_blksz=$file_blksz \ + '{printf("%s[real] %d %d %d %d %s\n", user, $2 * 1024 / file_blksz, $3 * 1024 / file_blksz, $4 * 1024 / file_blksz, $5, print_timeout ? 
$6 : "---");}' +} + +note "Write 128rx to root" +$XFS_IO_PROG -f -c "pwrite 0 $((128 * file_blksz))" $SCRATCH_MNT/file1 > /dev/null +chmod a+r $SCRATCH_MNT/file1 +sync +report_rtusage 0 + +note "Write 64rx to root, 4444, and 5555." +$XFS_IO_PROG -f -c "pwrite 0 $((64 * file_blksz))" $SCRATCH_MNT/file3.5555 > /dev/null +chown 5555 $SCRATCH_MNT/file3.5555 +$XFS_IO_PROG -f -c "pwrite 0 $((64 * file_blksz))" $SCRATCH_MNT/file3.4444 > /dev/null +chown 4444 $SCRATCH_MNT/file3.4444 +$XFS_IO_PROG -f -c "pwrite 0 $((64 * file_blksz))" $SCRATCH_MNT/file3 > /dev/null +sync +report_rtusage 0 +report_rtusage 4444 +report_rtusage 5555 + +note "Move 64rx from root to 5555" +chown 5555 $SCRATCH_MNT/file3 +report_rtusage 0 +report_rtusage 4444 +report_rtusage 5555 + +note "Move 64rx from 5555 to 4444" +chown 4444 $SCRATCH_MNT/file3 +report_rtusage 0 +report_rtusage 4444 +report_rtusage 5555 + +note "Set hard limit of 1024rx and check enforcement" +$XFS_QUOTA_PROG -x -c "limit -u rtbhard=$((1024 * file_blksz)) $qa_user" $SCRATCH_MNT +su $qa_user -c "$XFS_IO_PROG -f -c 'pwrite 0 $((2048 * file_blksz))' $SCRATCH_MNT/file2" +report_rtusage + +note "Set soft limit of 512rx and check timelimit enforcement" +rm -f $SCRATCH_MNT/file2 $SCRATCH_MNT/file2.1 +$XFS_QUOTA_PROG -x -c "limit -u rtbsoft=$((512 * file_blksz)) rtbhard=0 $qa_user" $SCRATCH_MNT +$XFS_QUOTA_PROG -x -c "timer -u -r -d 2" $SCRATCH_MNT +$XFS_QUOTA_PROG -x -c 'state -u' $SCRATCH_MNT >> $seqres.full + +su $qa_user -c "$XFS_IO_PROG -f -c 'pwrite 0 $((512 * file_blksz))' $SCRATCH_MNT/file2" > /dev/null +report_rtusage + +overflow=$(date +%s) +su $qa_user -c "$XFS_IO_PROG -f -c 'pwrite -b $file_blksz 0 $file_blksz' $SCRATCH_MNT/file2.1" > /dev/null +report_rtusage +sleep 1 +echo "Try again after 1s" +su $qa_user -c "$XFS_IO_PROG -f -c 'pwrite -b $file_blksz $file_blksz $file_blksz' $SCRATCH_MNT/file2.1" > /dev/null +report_rtusage +sleep 2 +echo "Try again after 3s" +su $qa_user -c "$XFS_IO_PROG -f -c 'pwrite -b $file_blksz 
$((2 * file_blksz)) $file_blksz' $SCRATCH_MNT/file2.1" > /dev/null +report_rtusage + +note "Extend time limits and warnings" +rm -f $SCRATCH_MNT/file2 $SCRATCH_MNT/file2.1 +$XFS_QUOTA_PROG -x -c "limit -u rtbsoft=$((512 * file_blksz)) rtbhard=0 $qa_user" $SCRATCH_MNT +$XFS_QUOTA_PROG -x -c "timer -u -r -d 49h" $SCRATCH_MNT +$XFS_QUOTA_PROG -x -c 'state -u' $SCRATCH_MNT >> $seqres.full + +su $qa_user -c "$XFS_IO_PROG -f -c 'pwrite 0 $((512 * file_blksz))' $SCRATCH_MNT/file2" > /dev/null +report_rtusage $qa_user want_timeout + +su $qa_user -c "$XFS_IO_PROG -f -c 'pwrite -b $file_blksz 0 $file_blksz' $SCRATCH_MNT/file2.1" > /dev/null +report_rtusage $qa_user want_timeout + +$XFS_QUOTA_PROG -x -c "timer -u -r 73h $qa_user" $SCRATCH_MNT + +su $qa_user -c "$XFS_IO_PROG -f -c 'pwrite -b $file_blksz $file_blksz $file_blksz' $SCRATCH_MNT/file2.1" > /dev/null +report_rtusage $qa_user want_timeout + +note "Test quota applied to bmbt" + +# Testing quota enforcement for bmbt shape changes is tricky. The block +# reservation will be for enough blocks to handle the maximal btree split. +# This is (approximately) 9 blocks no matter the size of the existing extent +# map structure, so we set the hard limit to one more than this quantity. +# +# However, that means that we need to make a file of at least twice that size +# to ensure that we create enough extent records even in the rextsize==1 case +# where punching doesn't just create unwritten records. +# +# Unfortunately, it's very difficult to predict when exactly the EDQUOT will +# come down, so we just look for the error message. 
+extent_records=$(( (25 * bmbt_blksz) / 16)) +echo "extent_records $extent_records" >> $seqres.full + +rm -f $SCRATCH_MNT/file2 +$XFS_QUOTA_PROG -x -c "limit -u rtbsoft=0 rtbhard=0 $qa_user" $SCRATCH_MNT +$XFS_QUOTA_PROG -x -c "limit -u bhard=$((bmbt_blksz * 10)) bsoft=0 $qa_user" $SCRATCH_MNT +$XFS_QUOTA_PROG -x -c 'state -u' $SCRATCH_MNT >> $seqres.full +$XFS_IO_PROG -f -c "pwrite -S 0x58 -b 64m 0 $((extent_records * file_blksz))" $SCRATCH_MNT/file2 > /dev/null +sync +chown $qa_user $SCRATCH_MNT/file2 +$here/src/punch-alternating $SCRATCH_MNT/file2 2>&1 | _filter_scratch + +$XFS_QUOTA_PROG -c "quota -u -r -n -N $qa_user" -c "quota -u -b -n -N $qa_user" $SCRATCH_MNT >> $seqres.full +$XFS_IO_PROG -c "bmap -e -l -p -v" $SCRATCH_MNT/file2 >> $seqres.full + +# success, all done +$XFS_QUOTA_PROG -x -c 'report -a -u' -c 'report -a -u -r' $SCRATCH_MNT >> $seqres.full +ls -latr $SCRATCH_MNT >> $seqres.full +status=0 +exit diff --git a/tests/xfs/767.out b/tests/xfs/767.out new file mode 100644 index 0000000000..bff7c0f44c --- /dev/null +++ b/tests/xfs/767.out @@ -0,0 +1,41 @@ +QA output created by 767 +Format filesystem + +Write 128rx to root +0[real] 128 0 0 0 --- + +Write 64rx to root, 4444, and 5555. 
+0[real] 192 0 0 0 --- +4444[real] 64 0 0 0 --- +5555[real] 64 0 0 0 --- + +Move 64rx from root to 5555 +0[real] 128 0 0 0 --- +4444[real] 64 0 0 0 --- +5555[real] 128 0 0 0 --- + +Move 64rx from 5555 to 4444 +0[real] 128 0 0 0 --- +4444[real] 128 0 0 0 --- +5555[real] 64 0 0 0 --- + +Set hard limit of 1024rx and check enforcement +pwrite: Disk quota exceeded +fsgqa[real] 1024 0 1024 0 --- + +Set soft limit of 512rx and check timelimit enforcement +fsgqa[real] 512 512 0 0 --- +fsgqa[real] 513 512 0 0 --- +Try again after 1s +fsgqa[real] 514 512 0 0 --- +Try again after 3s +pwrite: Disk quota exceeded +fsgqa[real] 514 512 0 0 --- + +Extend time limits and warnings +fsgqa[real] 512 512 0 0 [--------] +fsgqa[real] 513 512 0 0 [2_days] +fsgqa[real] 514 512 0 0 [3_days] + +Test quota applied to bmbt +SCRATCH_MNT/file2: Disk quota exceeded ^ permalink raw reply related [flat|nested] 565+ messages in thread
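The unit conversion inside `report_rtusage` appears to be what lets the golden output stay the same across rt extent sizes. Below is a minimal sketch of that conversion, assuming (as the `* 1024` in the awk script implies) that xfs_quota reports usage in 1 KiB units. The helper name `kib_to_rtx` and the sample extent sizes are hypothetical and are not taken from the test.

```bash
#!/bin/bash
# Hypothetical helper mirroring the awk expression in report_rtusage:
# convert a count of 1 KiB quota blocks into rt extents (file blocks).
kib_to_rtx() {
	local kib="$1"
	local file_blksz="$2"	# bytes per rt extent

	echo $((kib * 1024 / file_blksz))
}

# Writing 128 file blocks should be reported as 128 for each extent size.
for file_blksz in 4096 28672 65536; do
	kib=$((128 * file_blksz / 1024))	# what quota would account, in KiB
	echo "file_blksz $file_blksz: $(kib_to_rtx "$kib" "$file_blksz") rx"
done
```

Because both the write sizes and the reported usage are scaled by `file_blksz`, a line such as `0[real] 128 0 0 0 ---` can be matched without the test knowing the extent size in advance.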
end of thread, other threads:[~2023-03-06 23:15 UTC | newest]

Thread overview: 565+ messages
2022-12-30 21:14 [NYE DELUGE 3/4] xfs: modernize the realtime volume Darrick J. Wong
2022-12-30 22:17 ` [PATCHSET v1.0 00/20] xfs: hoist inode operations to libxfs Darrick J. Wong
2022-12-30 22:17   ` [PATCH 01/20] xfs: move inode copy-on-write predicates to xfs_inode.[ch] Darrick J. Wong
2022-12-30 22:17   ` [PATCH 03/20] xfs: hoist inode flag conversion functions to libxfs Darrick J. Wong
2022-12-30 22:17   ` [PATCH 04/20] xfs: hoist project id get/set " Darrick J. Wong
2022-12-30 22:17   ` [PATCH 02/20] xfs: hoist extent size helpers " Darrick J. Wong
2022-12-30 22:17   ` [PATCH 05/20] xfs: pack icreate initialization parameters into a separate structure Darrick J. Wong
2022-12-30 22:17   ` [PATCH 10/20] xfs: push xfs_icreate_args creation out of xfs_create* Darrick J. Wong
2022-12-30 22:17   ` [PATCH 11/20] xfs: wrap inode creation dqalloc calls Darrick J. Wong
2022-12-30 22:17   ` [PATCH 07/20] xfs: use xfs_trans_ichgtime to set times when allocating inode Darrick J. Wong
2022-12-30 22:17   ` [PATCH 06/20] xfs: implement atime updates in xfs_trans_ichgtime Darrick J. Wong
2022-12-30 22:17   ` [PATCH 08/20] xfs: split new inode creation into two pieces Darrick J. Wong
2022-12-30 22:17   ` [PATCH 09/20] xfs: hoist new inode initialization functions to libxfs Darrick J. Wong
2022-12-30 22:17   ` [PATCH 12/20] xfs: hoist xfs_iunlink " Darrick J. Wong
2022-12-30 22:17   ` [PATCH 16/20] xfs: hoist inode free function " Darrick J. Wong
2022-12-30 22:17   ` [PATCH 15/20] xfs: create libxfs helper to link an existing inode into a directory Darrick J. Wong
2022-12-30 22:17   ` [PATCH 18/20] xfs: create libxfs helper to exchange two directory entries Darrick J. Wong
2022-12-30 22:17   ` [PATCH 14/20] xfs: create libxfs helper to link a new inode into a directory Darrick J. Wong
2022-12-30 22:17   ` [PATCH 13/20] xfs: hoist xfs_{bump,drop}link to libxfs Darrick J. Wong
2022-12-30 22:17   ` [PATCH 17/20] xfs: create libxfs helper to remove an existing inode/name from a directory Darrick J. Wong
2022-12-30 22:17   ` [PATCH 20/20] xfs: get rid of cross_rename Darrick J. Wong
2022-12-30 22:17   ` [PATCH 19/20] xfs: create libxfs helper to rename two directory entries Darrick J. Wong
2022-12-30 22:17 ` [PATCHSET v1.0 00/23] xfs: metadata inode directories Darrick J. Wong
2022-12-30 22:17   ` [PATCH 02/23] xfs: create imeta abstractions to get and set metadata inodes Darrick J. Wong
2022-12-30 22:17   ` [PATCH 01/23] xfs: don't use the incore struct xfs_sb for offsets into struct xfs_dsb Darrick J. Wong
2022-12-30 22:17   ` [PATCH 09/23] xfs: load metadata directory root at mount time Darrick J. Wong
2022-12-30 22:17   ` [PATCH 05/23] xfs: convert all users to xfs_imeta_log Darrick J. Wong
2022-12-30 22:17   ` [PATCH 06/23] xfs: iget for metadata inodes Darrick J. Wong
2022-12-30 22:17   ` [PATCH 04/23] xfs: refactor the v4 group/project inode pointer switch Darrick J. Wong
2022-12-30 22:17   ` [PATCH 08/23] xfs: update imeta transaction reservations for metadir Darrick J. Wong
2022-12-30 22:17   ` [PATCH 07/23] xfs: define the on-disk format for the metadir feature Darrick J. Wong
2022-12-30 22:17   ` [PATCH 03/23] xfs: create transaction reservations for metadata inode operations Darrick J. Wong
2022-12-30 22:17   ` [PATCH 10/23] xfs: convert metadata inode lookup keys to use paths Darrick J. Wong
2022-12-30 22:17   ` [PATCH 15/23] xfs: hide metadata inodes from everyone because they are special Darrick J. Wong
2022-12-30 22:17   ` [PATCH 13/23] xfs: ensure metadata directory paths exist before creating files Darrick J. Wong
2022-12-30 22:17   ` [PATCH 14/23] xfs: disable the agi rotor for metadata inodes Darrick J. Wong
2022-12-30 22:17   ` [PATCH 12/23] xfs: read and write metadata inode directory Darrick J. Wong
2022-12-30 22:17   ` [PATCH 16/23] xfs: advertise metadata directory feature Darrick J. Wong
2022-12-30 22:17   ` [PATCH 11/23] xfs: enforce metadata inode flag Darrick J. Wong
2022-12-30 22:17   ` [PATCH 23/23] xfs: enable metadata directory feature Darrick J. Wong
2022-12-30 22:17   ` [PATCH 22/23] xfs: don't check secondary super inode pointers when metadir enabled Darrick J. Wong
2022-12-30 22:17   ` [PATCH 19/23] xfs: record health problems with the metadata directory Darrick J. Wong
2022-12-30 22:17   ` [PATCH 20/23] xfs: scrub metadata directories Darrick J. Wong
2022-12-30 22:17   ` [PATCH 21/23] xfs: teach nlink scrubber to deal with metadata directory roots Darrick J. Wong
2022-12-30 22:17   ` [PATCH 18/23] xfs: enable creation of dynamically allocated metadir path structures Darrick J. Wong
2022-12-30 22:17   ` [PATCH 17/23] xfs: allow bulkstat to return metadata directories Darrick J. Wong
2022-12-30 22:17 ` [PATCHSET v1.0 00/14] xfs: refactor btrees to support records in inode root Darrick J. Wong
2022-12-30 22:17   ` [PATCH 03/14] xfs: refactor creation of bmap btree roots Darrick J. Wong
2022-12-30 22:17   ` [PATCH 02/14] xfs: refactor the allocation and freeing of incore inode fork " Darrick J. Wong
2022-12-30 22:17   ` [PATCH 01/14] xfs: replace shouty XFS_BM{BT,DR} macros Darrick J. Wong
2022-12-30 22:17   ` [PATCH 05/14] xfs: hoist the code that moves the incore inode fork broot memory Darrick J. Wong
2022-12-30 22:17   ` [PATCH 04/14] xfs: fix a sloppy memory handling bug in xfs_iroot_realloc Darrick J. Wong
2022-12-30 22:17   ` [PATCH 06/14] xfs: move the zero records logic into xfs_bmap_broot_space_calc Darrick J. Wong
2022-12-30 22:17   ` [PATCH 13/14] xfs: support storing records in the inode core root Darrick J. Wong
2022-12-30 22:17   ` [PATCH 07/14] xfs: rearrange xfs_iroot_realloc a bit Darrick J. Wong
2022-12-30 22:17   ` [PATCH 10/14] xfs: support leaves in the incore btree root block in xfs_iroot_realloc Darrick J. Wong
2022-12-30 22:17   ` [PATCH 12/14] xfs: hoist the node iroot update code out of xfs_btree_kill_iroot Darrick J. Wong
2022-12-30 22:17   ` [PATCH 11/14] xfs: hoist the node iroot update code out of xfs_btree_new_iroot Darrick J. Wong
2022-12-30 22:17   ` [PATCH 09/14] xfs: generalize the btree root reallocation function Darrick J. Wong
2022-12-30 22:17   ` [PATCH 08/14] xfs: standardize the btree maxrecs function parameters Darrick J. Wong
2022-12-30 22:17   ` [PATCH 14/14] xfs: update btree keys correctly when _insrec splits an inode root block Darrick J. Wong
2022-12-30 22:17 ` [PATCHSET v1.0 00/11] xfs: clean up realtime type usage Darrick J. Wong
2022-12-30 22:17   ` [PATCH 05/11] xfs: make sure maxlen is still congruent with prod when rounding down Darrick J. Wong
2022-12-30 22:17   ` [PATCH 04/11] xfs: rt stubs should return negative errnos when rt disabled Darrick J. Wong
2022-12-30 22:17   ` [PATCH 01/11] xfs: refactor realtime scrubbing context management Darrick J. Wong
2022-12-30 22:17   ` [PATCH 02/11] xfs: bump max fsgeom struct version Darrick J. Wong
2022-12-30 22:17   ` [PATCH 03/11] xfs: prevent rt growfs when quota is enabled Darrick J. Wong
2022-12-30 22:17   ` [PATCH 08/11] xfs: convert rt bitmap/summary block numbers to xfs_fileoff_t Darrick J. Wong
2022-12-30 22:17   ` [PATCH 11/11] xfs: convert rt extent numbers to xfs_rtxnum_t Darrick J. Wong
2022-12-30 22:17   ` [PATCH 06/11] xfs: move the xfs_rtbitmap.c declarations to xfs_rtbitmap.h Darrick J. Wong
2022-12-30 22:17   ` [PATCH 09/11] xfs: convert rt bitmap extent lengths to xfs_rtbxlen_t Darrick J. Wong
2022-12-30 22:17   ` [PATCH 10/11] xfs: rename xfs_verify_rtext to xfs_verify_rtbext Darrick J. Wong
2022-12-30 22:17   ` [PATCH 07/11] xfs: convert xfs_extlen_t to xfs_rtxlen_t in the rt allocator Darrick J. Wong
2022-12-30 22:17 ` [PATCHSET v1.0 0/7] xfs: refactor rt extent unit conversions Darrick J. Wong
2022-12-30 22:17   ` [PATCH 1/7] xfs: create a helper to convert rtextents to rtblocks Darrick J. Wong
2022-12-30 22:17   ` [PATCH 2/7] xfs: create a helper to compute leftovers of realtime extents Darrick J. Wong
2022-12-30 22:17   ` [PATCH 5/7] xfs: convert do_div calls to xfs_rtb_to_rtx helper calls Darrick J. Wong
2022-12-30 22:17   ` [PATCH 4/7] xfs: create helpers to convert rt block numbers to rt extent numbers Darrick J. Wong
2022-12-30 22:17   ` [PATCH 3/7] xfs: create a helper to compute leftovers of realtime extents Darrick J. Wong
2022-12-30 22:17   ` [PATCH 7/7] xfs: use shifting and masking when converting rt extents, if possible Darrick J. Wong
2022-12-30 22:17   ` [PATCH 6/7] xfs: create rt extent rounding helpers for realtime extent blocks Darrick J. Wong
2022-12-30 22:17 ` [PATCHSET v1.0 0/8] xfs: refactor rtbitmap/summary macros Darrick J. Wong
2022-12-30 22:17   ` [PATCH 1/8] xfs: convert the rtbitmap block and bit macros to static inline functions Darrick J. Wong
2022-12-30 22:17   ` [PATCH 8/8] xfs: use accessor functions for summary info words Darrick J. Wong
2022-12-30 22:17   ` [PATCH 2/8] xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros Darrick J. Wong
2022-12-30 22:17   ` [PATCH 7/8] xfs: create helpers for rtsummary block/wordcount computations Darrick J. Wong
2022-12-30 22:17   ` [PATCH 3/8] xfs: convert open-coded xfs_rtword_t pointer accesses to helper Darrick J. Wong
2022-12-30 22:17   ` [PATCH 5/8] xfs: create helpers for rtbitmap block/wordcount computations Darrick J. Wong
2022-12-30 22:17   ` [PATCH 6/8] xfs: use accessor functions for bitmap words Darrick J. Wong
2022-12-30 22:17   ` [PATCH 4/8] xfs: convert rt summary macros to helpers Darrick J. Wong
2022-12-30 22:17 ` [PATCHSET v1.0 0/3] xfs: refactor realtime meta inode locking Darrick J. Wong
2022-12-30 22:17   ` [PATCH 1/3] xfs: use separate lock classes for realtime metadata inode ILOCKs Darrick J. Wong
2022-12-30 22:17   ` [PATCH 3/3] xfs: remove XFS_ILOCK_RT* Darrick J. Wong
2022-12-30 22:17   ` [PATCH 2/3] xfs: refactor realtime inode locking Darrick J. Wong
2022-12-30 22:17 ` [PATCHSET v1.0 00/22] xfsprogs: shard the realtime section Darrick J. Wong
2022-12-30 22:17   ` [PATCH 01/22] xfs: create incore realtime group structures Darrick J. Wong
2022-12-30 22:17   ` [PATCH 05/22] xfs: write secondary realtime superblocks to disk Darrick J. Wong
2022-12-30 22:17   ` [PATCH 06/22] xfs: grow the realtime section when realtime groups are enabled Darrick J. Wong
2022-12-30 22:17   ` [PATCH 03/22] xfs: check the realtime superblock at mount time Darrick J. Wong
2022-12-30 22:17   ` [PATCH 04/22] xfs: update primary realtime super every time we update the primary fs super Darrick J. Wong
2022-12-30 22:17   ` [PATCH 07/22] xfs: always update secondary rt supers when we update secondary fs supers Darrick J. Wong
2022-12-30 22:17   ` [PATCH 08/22] xfs: export realtime group geometry via XFS_FSOP_GEOM Darrick J. Wong
2022-12-30 22:17   ` [PATCH 02/22] xfs: define the format of rt groups Darrick J. Wong
2022-12-30 22:17   ` [PATCH 11/22] xfs: record rt group superblock errors in the health system Darrick J. Wong
2022-12-30 22:17   ` [PATCH 09/22] xfs: check that rtblock extents do not overlap with the rt group metadata Darrick J. Wong
2022-12-30 22:17   ` [PATCH 14/22] xfs: add block headers to realtime bitmap blocks Darrick J. Wong
2022-12-30 22:17   ` [PATCH 13/22] xfs: export the geometry of realtime groups to userspace Darrick J. Wong
2022-12-30 22:17   ` [PATCH 10/22] xfs: add frextents to the lazysbcounters when rtgroups enabled Darrick J. Wong
2022-12-30 22:17   ` [PATCH 12/22] xfs: define locking primitives for realtime groups Darrick J. Wong
2022-12-30 22:17   ` [PATCH 15/22] xfs: encode the rtbitmap in little endian format Darrick J. Wong
2022-12-30 22:17   ` [PATCH 21/22] xfs: scrub each rtgroup's portion of the rtbitmap separately Darrick J. Wong
2022-12-30 22:17   ` [PATCH 17/22] xfs: encode the rtsummary in big endian format Darrick J. Wong
2022-12-30 22:17   ` [PATCH 19/22] xfs: scrub the realtime group superblock Darrick J.
Wong 2022-12-30 22:17 ` [PATCH 16/22] xfs: add block headers to realtime summary blocks Darrick J. Wong 2022-12-30 22:17 ` [PATCH 20/22] xfs: repair secondary realtime group superblocks Darrick J. Wong 2022-12-30 22:17 ` [PATCH 22/22] xfs: enable realtime group feature Darrick J. Wong 2022-12-30 22:17 ` [PATCH 18/22] xfs: store rtgroup information with a bmap intent Darrick J. Wong 2022-12-30 22:17 ` [PATCHSET v1.0 0/3] xfsprogs: enable FITRIM for the realtime section Darrick J. Wong 2022-12-30 22:17 ` [PATCH 1/3] xfs: hoist data device FITRIM AG iteration to a separate function Darrick J. Wong 2022-12-30 22:17 ` [PATCH 2/3] xfs: convert xfs_trim_extents to use perag iteration macros Darrick J. Wong 2022-12-30 22:17 ` [PATCH 3/3] xfs: enable FITRIM on the realtime device Darrick J. Wong 2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: enable in-core block reservation for rt metadata Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/2] xfs: allow inode-based btrees to reserve space in the data device Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/2] xfs: simplify xfs_ag_resv_free signature Darrick J. Wong 2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: extent free log intent cleanups Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/2] xfs: clean up extent free log intent item tracepoint callsites Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/2] xfs: convert "skip_discard" to a proper flags bitset Darrick J. Wong 2022-12-30 22:18 ` [PATCHSET v1.0 0/2] xfs: widen EFI format to support rt Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/2] xfs: support logging EFIs for realtime extents Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/2] xfs: support error injection when freeing rt extents Darrick J. Wong 2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: rmap log intent cleanups Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/5] xfs: attach rtgroup objects to btree cursors Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/5] xfs: give rmap btree cursor error tracepoints their own class Darrick J. 
Wong 2022-12-30 22:18 ` [PATCH 3/5] xfs: prepare rmap btree tracepoints for widening Darrick J. Wong 2022-12-30 22:18 ` [PATCH 5/5] xfs: remove xfs_trans_set_rmap_flags Darrick J. Wong 2022-12-30 22:18 ` [PATCH 4/5] xfs: clean up rmap log intent item tracepoint callsites Darrick J. Wong 2022-12-30 22:18 ` [PATCHSET v1.0 00/38] xfs: realtime reverse-mapping support Darrick J. Wong 2022-12-30 22:18 ` [PATCH 06/38] xfs: add realtime rmap btree operations Darrick J. Wong 2022-12-30 22:18 ` [PATCH 01/38] xfs: prepare rmap btree cursor tracepoints for realtime Darrick J. Wong 2022-12-30 22:18 ` [PATCH 05/38] xfs: realtime rmap btree transaction reservations Darrick J. Wong 2022-12-30 22:18 ` [PATCH 04/38] xfs: define the on-disk realtime rmap btree format Darrick J. Wong 2022-12-30 22:18 ` [PATCH 03/38] xfs: introduce realtime rmap btree definitions Darrick J. Wong 2022-12-30 22:18 ` [PATCH 02/38] xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions Darrick J. Wong 2022-12-30 22:18 ` [PATCH 07/38] xfs: prepare rmap functions to deal with rtrmapbt Darrick J. Wong 2022-12-30 22:18 ` [PATCH 10/38] xfs: add realtime rmap btree block detection to log recovery Darrick J. Wong 2022-12-30 22:18 ` [PATCH 12/38] xfs: add realtime reverse map inode to metadata directory Darrick J. Wong 2022-12-30 22:18 ` [PATCH 13/38] xfs: add metadata reservations for realtime rmap btrees Darrick J. Wong 2022-12-30 22:18 ` [PATCH 09/38] xfs: support recovering rmap intent items targetting realtime extents Darrick J. Wong 2022-12-30 22:18 ` [PATCH 11/38] xfs: attach dquots to rt metadata files when starting quota Darrick J. Wong 2022-12-30 22:18 ` [PATCH 08/38] xfs: add a realtime flag to the rmap update log redo items Darrick J. Wong 2022-12-30 22:18 ` [PATCH 14/38] xfs: wire up a new inode fork type for the realtime rmap Darrick J. Wong 2022-12-30 22:18 ` [PATCH 18/38] xfs: rearrange xfs_fsmap.c a little bit Darrick J. 
Wong 2022-12-30 22:18 ` [PATCH 17/38] xfs: create routine to allocate and initialize a realtime rmap btree inode Darrick J. Wong 2022-12-30 22:18 ` [PATCH 20/38] xfs: fix integer overflows in the fsmap rtbitmap backend Darrick J. Wong 2022-12-30 22:18 ` [PATCH 15/38] xfs: use realtime EFI to free extents when realtime rmap is enabled Darrick J. Wong 2022-12-30 22:18 ` [PATCH 19/38] xfs: wire up getfsmap to the realtime reverse mapping btree Darrick J. Wong 2022-12-30 22:18 ` [PATCH 16/38] xfs: wire up rmap map and unmap to the realtime rmapbt Darrick J. Wong 2022-12-30 22:18 ` [PATCH 24/38] xfs: report realtime rmap btree corruption errors to the health system Darrick J. Wong 2022-12-30 22:18 ` [PATCH 23/38] xfs: add realtime rmap btree when adding rt volume Darrick J. Wong 2022-12-30 22:18 ` [PATCH 26/38] xfs: allow queued realtime intents to drain before scrubbing Darrick J. Wong 2022-12-30 22:18 ` [PATCH 22/38] xfs: check that the rtrmapbt maxlevels doesn't increase when growing fs Darrick J. Wong 2022-12-30 22:18 ` [PATCH 21/38] xfs: fix getfsmap reporting past the last rt extent Darrick J. Wong 2022-12-30 22:18 ` [PATCH 27/38] xfs: scrub the realtime rmapbt Darrick J. Wong 2022-12-30 22:18 ` [PATCH 25/38] xfs: fix scrub tracepoints when inode-rooted btrees are involved Darrick J. Wong 2022-12-30 22:18 ` [PATCH 29/38] xfs: cross-reference the realtime rmapbt Darrick J. Wong 2022-12-30 22:18 ` [PATCH 31/38] xfs: walk the rt reverse mapping tree when rebuilding rmap Darrick J. Wong 2022-12-30 22:18 ` [PATCH 33/38] xfs: repair inodes that have realtime extents Darrick J. Wong 2022-12-30 22:18 ` [PATCH 34/38] xfs: online repair of realtime bitmaps for a realtime group Darrick J. Wong 2022-12-30 22:18 ` [PATCH 30/38] xfs: scan rt rmap when we're doing an intense rmap check of bmbt mappings Darrick J. Wong 2022-12-30 22:18 ` [PATCH 28/38] xfs: cross-reference realtime bitmap to realtime rmapbt scrubber Darrick J. 
Wong 2022-12-30 22:18 ` [PATCH 32/38] xfs: online repair of realtime file bmaps Darrick J. Wong 2022-12-30 22:18 ` [PATCH 35/38] xfs: online repair of the realtime rmap btree Darrick J. Wong 2022-12-30 22:18 ` [PATCH 36/38] xfs: create a shadow rmap btree during realtime rmap repair Darrick J. Wong 2022-12-30 22:18 ` [PATCH 38/38] xfs: enable realtime rmap btree Darrick J. Wong 2022-12-30 22:18 ` [PATCH 37/38] xfs: hook live realtime rmap operations during a repair operation Darrick J. Wong 2022-12-30 22:18 ` [PATCHSET v1.0 0/5] xfs: refcount log intent cleanups Darrick J. Wong 2022-12-30 22:18 ` [PATCH 3/5] xfs: prepare refcount btree tracepoints for widening Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/5] xfs: give refcount btree cursor error tracepoints their own class Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/5] xfs: create specialized classes for refcount tracepoints Darrick J. Wong 2022-12-30 22:18 ` [PATCH 4/5] xfs: clean up refcount log intent item tracepoint callsites Darrick J. Wong 2022-12-30 22:18 ` [PATCH 5/5] xfs: remove xfs_trans_set_refcount_flags Darrick J. Wong 2022-12-30 22:18 ` [PATCHSET v1.0 00/42] xfs: reflink on the realtime device Darrick J. Wong 2022-12-30 22:18 ` [PATCH 03/42] xfs: namespace the maximum length/refcount symbols Darrick J. Wong 2022-12-30 22:18 ` [PATCH 02/42] xfs: introduce realtime refcount btree definitions Darrick J. Wong 2022-12-30 22:18 ` [PATCH 01/42] xfs: prepare refcount btree cursor tracepoints for realtime Darrick J. Wong 2022-12-30 22:18 ` [PATCH 05/42] xfs: realtime refcount btree transaction reservations Darrick J. Wong 2022-12-30 22:18 ` [PATCH 06/42] xfs: add realtime refcount btree operations Darrick J. Wong 2022-12-30 22:18 ` [PATCH 04/42] xfs: define the on-disk realtime refcount btree format Darrick J. Wong 2022-12-30 22:18 ` [PATCH 10/42] xfs: add realtime refcount btree block detection to log recovery Darrick J. 
Wong 2022-12-30 22:18 ` [PATCH 11/42] xfs: add realtime refcount btree inode to metadata directory Darrick J. Wong 2022-12-30 22:18 ` [PATCH 09/42] xfs: support recovering refcount intent items targetting realtime extents Darrick J. Wong 2022-12-30 22:18 ` [PATCH 12/42] xfs: add metadata reservations for realtime refcount btree Darrick J. Wong 2022-12-30 22:18 ` [PATCH 07/42] xfs: prepare refcount functions to deal with rtrefcountbt Darrick J. Wong 2022-12-30 22:18 ` [PATCH 08/42] xfs: add a realtime flag to the refcount update log redo items Darrick J. Wong 2022-12-30 22:18 ` [PATCH 13/42] xfs: wire up a new inode fork type for the realtime refcount Darrick J. Wong 2022-12-30 22:18 ` [PATCH 20/42] xfs: enable sharing of realtime file blocks Darrick J. Wong 2022-12-30 22:18 ` [PATCH 19/42] xfs: enable CoW for realtime data Darrick J. Wong 2022-12-30 22:18 ` [PATCH 15/42] xfs: create routine to allocate and initialize a realtime refcount btree inode Darrick J. Wong 2022-12-30 22:18 ` [PATCH 16/42] xfs: update rmap to allow cow staging extents in the rt rmap Darrick J. Wong 2022-12-30 22:18 ` [PATCH 17/42] xfs: compute rtrmap btree max levels when reflink enabled Darrick J. Wong 2022-12-30 22:18 ` [PATCH 18/42] xfs: refactor reflink quota updates Darrick J. Wong 2022-12-30 22:18 ` [PATCH 14/42] xfs: wire up realtime refcount btree cursors Darrick J. Wong 2022-12-30 22:18 ` [PATCH 21/42] xfs: allow inodes to have the realtime and reflink flags Darrick J. Wong 2022-12-30 22:18 ` [PATCH 27/42] xfs: add realtime refcount btree when adding rt volume Darrick J. Wong 2022-12-30 22:18 ` [PATCH 23/42] xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files Darrick J. Wong 2022-12-30 22:18 ` [PATCH 24/42] xfs: apply rt extent alignment constraints to CoW extsize hint Darrick J. Wong 2022-12-30 22:18 ` [PATCH 26/42] xfs: check that the rtrefcount maxlevels doesn't increase when growing fs Darrick J. 
Wong 2022-12-30 22:18 ` [PATCH 28/42] xfs: report realtime refcount btree corruption errors to the health system Darrick J. Wong 2022-12-30 22:18 ` [PATCH 25/42] xfs: enable extent size hints for CoW operations Darrick J. Wong 2022-12-30 22:18 ` [PATCH 22/42] xfs: refcover CoW leftovers in the realtime volume Darrick J. Wong 2022-12-30 22:18 ` [PATCH 31/42] xfs: allow overlapping rtrmapbt records for shared data extents Darrick J. Wong 2022-12-30 22:18 ` [PATCH 34/42] xfs: detect and repair misaligned rtinherit directory cowextsize hints Darrick J. Wong 2022-12-30 22:18 ` [PATCH 35/42] xfs: don't flag quota rt block usage on rtreflink filesystems Darrick J. Wong 2022-12-30 22:18 ` [PATCH 33/42] xfs: allow dquot rt block count to exceed rt blocks on reflink fs Darrick J. Wong 2022-12-30 22:18 ` [PATCH 29/42] xfs: scrub the realtime refcount btree Darrick J. Wong 2022-12-30 22:18 ` [PATCH 30/42] xfs: cross-reference checks with the rt " Darrick J. Wong 2022-12-30 22:18 ` [PATCH 32/42] xfs: check reference counts of gaps between rt refcount records Darrick J. Wong 2022-12-30 22:18 ` [PATCH 39/42] xfs: online repair of the realtime refcount btree Darrick J. Wong 2022-12-30 22:18 ` [PATCH 42/42] xfs: enable realtime reflink Darrick J. Wong 2022-12-30 22:18 ` [PATCH 36/42] xfs: check new rtbitmap records against rt refcount btree Darrick J. Wong 2022-12-30 22:18 ` [PATCH 40/42] xfs: repair inodes that have a refcount btree in the data fork Darrick J. Wong 2022-12-30 22:18 ` [PATCH 37/42] xfs: walk the rt reference count tree when rebuilding rmap Darrick J. Wong 2022-12-30 22:18 ` [PATCH 41/42] xfs: fix cow forks for realtime files Darrick J. Wong 2022-12-30 22:18 ` [PATCH 38/42] xfs: capture realtime CoW staging extents when rebuilding rt rmapbt Darrick J. Wong 2022-12-30 22:18 ` [PATCHSET v1.0 0/9] xfs: reflink with large realtime extents Darrick J. Wong 2022-12-30 22:18 ` [PATCH 4/9] xfs: forcibly convert unwritten blocks within an rt extent before sharing Darrick J. 
Wong 2022-12-30 22:18 ` [PATCH 2/9] iomap: set up for COWing around pages Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/9] vfs: explicitly pass the block size to the remap prep function Darrick J. Wong 2022-12-30 22:18 ` [PATCH 3/9] xfs: enable CoW when rt extent size is larger than 1 block Darrick J. Wong 2022-12-30 22:18 ` [PATCH 5/9] xfs: extend writeback requests to handle rt cow correctly Darrick J. Wong 2022-12-30 22:18 ` [PATCH 6/9] xfs: enable extent size hints for CoW when rtextsize > 1 Darrick J. Wong 2022-12-30 22:18 ` [PATCH 8/9] xfs: fix integer overflow when validating extent size hints Darrick J. Wong 2022-12-30 22:18 ` [PATCH 7/9] xfs: allow reflink on the rt volume when extent size is larger than 1 rt block Darrick J. Wong 2022-12-30 22:18 ` [PATCH 9/9] xfs: support realtime reflink with an extent size that isn't a power of 2 Darrick J. Wong 2022-12-30 22:18 ` [PATCHSET v1.0 0/3] xfs: enable quota for realtime voluems Darrick J. Wong 2022-12-30 22:18 ` [PATCH 3/3] xfs: enable realtime quota again Darrick J. Wong 2022-12-30 22:18 ` [PATCH 1/3] xfs: fix chown with rt quota Darrick J. Wong 2022-12-30 22:18 ` [PATCH 2/3] xfs: fix rt growfs quota accounting Darrick J. Wong 2022-12-30 22:19 ` [PATCHSET 0/4] xfs_repair: add other v5 features to filesystems Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/4] xfs_repair: check free space requirements before allowing upgrades Darrick J. Wong 2022-12-30 22:19 ` [PATCH 4/4] xfs_repair: allow sysadmins to add reverse mapping indexes Darrick J. Wong 2022-12-30 22:19 ` [PATCH 2/4] xfs_repair: allow sysadmins to add free inode btree indexes Darrick J. Wong 2022-12-30 22:19 ` [PATCH 3/4] xfs_repair: allow sysadmins to add reflink Darrick J. Wong 2022-12-30 22:19 ` [PATCHSET v1.0 00/26] libxfs: hoist inode operations to libxfs Darrick J. Wong 2022-12-30 22:19 ` [PATCH 03/26] xfs: hoist project id get/set functions " Darrick J. Wong 2022-12-30 22:19 ` [PATCH 01/26] xfs: hoist extent size helpers " Darrick J. 
Wong 2022-12-30 22:19 ` [PATCH 02/26] xfs: hoist inode flag conversion functions " Darrick J. Wong 2022-12-30 22:19 ` [PATCH 06/26] libxfs: pack icreate initialization parameters into a separate structure Darrick J. Wong 2022-12-30 22:19 ` [PATCH 05/26] libxfs: pass IGET flags through to xfs_iread Darrick J. Wong 2022-12-30 22:19 ` [PATCH 04/26] libxfs: put all the inode functions in a single file Darrick J. Wong 2022-12-30 22:19 ` [PATCH 10/26] libxfs: when creating a file in a directory, set the project id based on the parent Darrick J. Wong 2022-12-30 22:19 ` [PATCH 09/26] libxfs: set access time when creating files Darrick J. Wong 2022-12-30 22:19 ` [PATCH 07/26] libxfs: implement access timestamp updates in ichgtime Darrick J. Wong 2022-12-30 22:19 ` [PATCH 08/26] libxfs: rearrange libxfs_trans_ichgtime call when creating inodes Darrick J. Wong 2022-12-30 22:19 ` [PATCH 11/26] libxfs: pass flags2 from parent to child when creating files Darrick J. Wong 2022-12-30 22:19 ` [PATCH 12/26] libxfs: split new inode creation into two pieces Darrick J. Wong 2022-12-30 22:19 ` [PATCH 14/26] libxfs: remove libxfs_dir_ialloc Darrick J. Wong 2022-12-30 22:19 ` [PATCH 13/26] libxfs: backport inode init code from the kernel Darrick J. Wong 2022-12-30 22:19 ` [PATCH 17/26] xfs: hoist xfs_{bump,drop}link to libxfs Darrick J. Wong 2022-12-30 22:19 ` [PATCH 15/26] xfs: hoist new inode initialization functions " Darrick J. Wong 2022-12-30 22:19 ` [PATCH 22/26] xfs: create libxfs helper to exchange two directory entries Darrick J. Wong 2022-12-30 22:19 ` [PATCH 21/26] xfs: create libxfs helper to remove an existing inode/name from a directory Darrick J. Wong 2022-12-30 22:19 ` [PATCH 20/26] xfs: hoist inode free function to libxfs Darrick J. Wong 2022-12-30 22:19 ` [PATCH 19/26] xfs: create libxfs helper to link an existing inode into a directory Darrick J. Wong 2022-12-30 22:19 ` [PATCH 18/26] xfs: create libxfs helper to link a new " Darrick J. 
Wong 2022-12-30 22:19 ` [PATCH 16/26] xfs: hoist xfs_iunlink to libxfs Darrick J. Wong 2022-12-30 22:19 ` [PATCH 23/26] xfs: create libxfs helper to rename two directory entries Darrick J. Wong 2022-12-30 22:19 ` [PATCH 25/26] xfs_repair: use library functions to reset root/rbm/rsum inodes Darrick J. Wong 2022-12-30 22:19 ` [PATCH 26/26] xfs_repair: use library functions for orphanage creation Darrick J. Wong 2022-12-30 22:19 ` [PATCH 24/26] xfs: don't use the incore struct xfs_sb for offsets into struct xfs_dsb Darrick J. Wong 2022-12-30 22:19 ` [PATCHSET v1.0 00/46] libxfs: metadata inode directories Darrick J. Wong 2022-12-30 22:19 ` [PATCH 04/46] libxfs: convert all users to libxfs_imeta_create Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/46] xfs: create transaction reservations for metadata inode operations Darrick J. Wong 2022-12-30 22:19 ` [PATCH 03/46] mkfs: clean up the rtinit() function Darrick J. Wong 2022-12-30 22:19 ` [PATCH 01/46] xfs: create imeta abstractions to get and set metadata inodes Darrick J. Wong 2022-12-30 22:19 ` [PATCH 05/46] mkfs: break up the rest of the rtinit() function Darrick J. Wong 2022-12-30 22:19 ` [PATCH 12/46] xfs: read and write metadata inode directory Darrick J. Wong 2022-12-30 22:19 ` [PATCH 08/46] xfs: update imeta transaction reservations for metadir Darrick J. Wong 2022-12-30 22:19 ` [PATCH 06/46] libxfs: iget for metadata inodes Darrick J. Wong 2022-12-30 22:19 ` [PATCH 11/46] xfs: enforce metadata inode flag Darrick J. Wong 2022-12-30 22:19 ` [PATCH 09/46] xfs: load metadata directory root at mount time Darrick J. Wong 2022-12-30 22:19 ` [PATCH 07/46] xfs: define the on-disk format for the metadir feature Darrick J. Wong 2022-12-30 22:19 ` [PATCH 10/46] xfs: convert metadata inode lookup keys to use paths Darrick J. Wong 2022-12-30 22:19 ` [PATCH 14/46] xfs: disable the agi rotor for metadata inodes Darrick J. Wong 2022-12-30 22:19 ` [PATCH 19/46] libfrog: report metadata directories in the geometry report Darrick J. 
Wong 2022-12-30 22:19 ` [PATCH 15/46] xfs: advertise metadata directory feature Darrick J. Wong 2022-12-30 22:19 ` [PATCH 20/46] xfs_db: basic xfs_check support for metadir Darrick J. Wong 2022-12-30 22:19 ` [PATCH 18/46] xfs: record health problems with the metadata directory Darrick J. Wong 2022-12-30 22:19 ` [PATCH 17/46] xfs: enable creation of dynamically allocated metadir path structures Darrick J. Wong 2022-12-30 22:19 ` [PATCH 13/46] xfs: ensure metadata directory paths exist before creating files Darrick J. Wong 2022-12-30 22:19 ` [PATCH 16/46] xfs: allow bulkstat to return metadata directories Darrick J. Wong 2022-12-30 22:19 ` [PATCH 24/46] xfs_db: mask superblock fields when metadir feature is enabled Darrick J. Wong 2022-12-30 22:19 ` [PATCH 23/46] xfs_db: support metadata directories in the path command Darrick J. Wong 2022-12-30 22:19 ` [PATCH 26/46] xfs_scrub: scan metadata directories during phase 3 Darrick J. Wong 2022-12-30 22:19 ` [PATCH 25/46] xfs_io: support the bulkstat metadata directory flag Darrick J. Wong 2022-12-30 22:19 ` [PATCH 21/46] xfs_db: report metadir support for version command Darrick J. Wong 2022-12-30 22:19 ` [PATCH 22/46] xfs_db: don't obfuscate metadata directories and attributes Darrick J. Wong 2022-12-30 22:19 ` [PATCH 27/46] xfs_repair: don't zero the incore secondary super when zeroing Darrick J. Wong 2022-12-30 22:19 ` [PATCH 28/46] xfs_repair: refactor metadata inode tagging Darrick J. Wong 2022-12-30 22:19 ` [PATCH 33/46] xfs_repair: check metadata inode flag Darrick J. Wong 2022-12-30 22:19 ` [PATCH 29/46] xfs_repair: refactor fixing dotdot Darrick J. Wong 2022-12-30 22:19 ` [PATCH 34/46] xfs_repair: rebuild the metadata directory Darrick J. Wong 2022-12-30 22:19 ` [PATCH 31/46] xfs_repair: refactor root directory initialization Darrick J. Wong 2022-12-30 22:19 ` [PATCH 35/46] xfs_repair: don't let metadata and regular files mix Darrick J. 
Wong 2022-12-30 22:19 ` [PATCH 30/46] xfs_repair: refactor marking of metadata inodes Darrick J. Wong 2022-12-30 22:19 ` [PATCH 32/46] xfs_repair: refactor grabbing realtime " Darrick J. Wong 2022-12-30 22:19 ` [PATCH 39/46] xfs_repair: adjust keep_fsinos to handle metadata directories Darrick J. Wong 2022-12-30 22:19 ` [PATCH 42/46] xfs_repair: drop all the metadata directory files during pass 4 Darrick J. Wong 2022-12-30 22:19 ` [PATCH 38/46] xfs_repair: mark space used by metadata files Darrick J. Wong 2022-12-30 22:19 ` [PATCH 43/46] xfs_repair: truncate and unmark orphaned metadata inodes Darrick J. Wong 2022-12-30 22:19 ` [PATCH 36/46] xfs_repair: update incore metadata state whenever we create new files Darrick J. Wong 2022-12-30 22:19 ` [PATCH 40/46] xfs_repair: metadata dirs are never plausible root dirs Darrick J. Wong 2022-12-30 22:19 ` [PATCH 37/46] xfs_repair: pass private data pointer to scan_lbtree Darrick J. Wong 2022-12-30 22:19 ` [PATCH 41/46] xfs_repair: reattach quota inodes to metadata directory Darrick J. Wong 2022-12-30 22:19 ` [PATCH 45/46] mkfs.xfs: enable metadata directories Darrick J. Wong 2022-12-30 22:19 ` [PATCH 46/46] mkfs: add a utility to generate protofiles Darrick J. Wong 2022-12-30 22:19 ` [PATCH 44/46] xfs_repair: allow sysadmins to add metadata directories Darrick J. Wong 2022-12-30 22:19 ` [PATCHSET v1.0 00/10] libxfs: refactor rt extent unit conversions Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/10] xfs: create a helper to compute leftovers of realtime extents Darrick J. Wong 2022-12-30 22:19 ` [PATCH 01/10] xfs: create a helper to convert rtextents to rtblocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 05/10] xfs: create helpers to convert rt block numbers to rt extent numbers Darrick J. Wong 2022-12-30 22:19 ` [PATCH 03/10] xfs: create a helper to compute leftovers of realtime extents Darrick J. Wong 2022-12-30 22:19 ` [PATCH 04/10] libfrog: move 64-bit division wrappers to libfrog Darrick J. 
Wong 2022-12-30 22:19 ` [PATCH 06/10] xfs: convert do_div calls to xfs_rtb_to_rtx helper calls Darrick J. Wong 2022-12-30 22:19 ` [PATCH 09/10] xfs_repair: convert utility to use new rt extent helpers and types Darrick J. Wong 2022-12-30 22:19 ` [PATCH 08/10] xfs: use shifting and masking when converting rt extents, if possible Darrick J. Wong 2022-12-30 22:19 ` [PATCH 10/10] mkfs: convert utility to use new rt extent helpers and types Darrick J. Wong 2022-12-30 22:19 ` [PATCH 07/10] xfs: create rt extent rounding helpers for realtime extent blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCHSET v1.0 0/9] libxfs: refactor rtbitmap/summary macros Darrick J. Wong 2022-12-30 22:19 ` [PATCH 3/9] xfs: convert open-coded xfs_rtword_t pointer accesses to helper Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/9] xfs: convert the rtbitmap block and bit macros to static inline functions Darrick J. Wong 2022-12-30 22:19 ` [PATCH 4/9] xfs: convert rt summary macros to helpers Darrick J. Wong 2022-12-30 22:19 ` [PATCH 2/9] xfs: remove XFS_BLOCKWSIZE and XFS_BLOCKWMASK macros Darrick J. Wong 2022-12-30 22:19 ` [PATCH 5/9] xfs: create helpers for rtbitmap block/wordcount computations Darrick J. Wong 2022-12-30 22:19 ` [PATCH 7/9] xfs: create helpers for rtsummary " Darrick J. Wong 2022-12-30 22:19 ` [PATCH 8/9] xfs: use accessor functions for summary info words Darrick J. Wong 2022-12-30 22:19 ` [PATCH 9/9] misc: use m_blockwsize instead of sb_blocksize for rt blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 6/9] xfs: use accessor functions for bitmap words Darrick J. Wong 2022-12-30 22:19 ` [PATCHSET v1.0 0/8] xfs_db: debug realtime geometry Darrick J. Wong 2022-12-30 22:19 ` [PATCH 2/8] xfs_db: report the device associated with each io cursor Darrick J. Wong 2022-12-30 22:19 ` [PATCH 5/8] xfs_db: access arbitrary realtime blocks and extents Darrick J. Wong 2022-12-30 22:19 ` [PATCH 6/8] xfs_db: enable conversion of rt space units Darrick J. 
Wong 2022-12-30 22:19 ` [PATCH 3/8] xfs_db: make the daddr command target the realtime device Darrick J. Wong 2022-12-30 22:19 ` [PATCH 4/8] xfs_db: access realtime file blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/8] xfs_db: support passing the realtime device to the debugger Darrick J. Wong 2022-12-30 22:19 ` [PATCH 8/8] xfs_db: convert rtsummary geometry Darrick J. Wong 2022-12-30 22:19 ` [PATCH 7/8] xfs_db: convert rtbitmap geometry Darrick J. Wong 2022-12-30 22:19 ` [PATCHSET v1.0 0/5] xfs_metadump: support external devices Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/5] xfs_db: allow selecting logdev blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 3/5] xfs_db: metadump external log devices Darrick J. Wong 2022-12-30 22:19 ` [PATCH 5/5] xfs_mdrestore: fix missed progress reporting Darrick J. Wong 2022-12-30 22:19 ` [PATCH 4/5] xfs_mdrestore: restore log contents to external log devices Darrick J. Wong 2022-12-30 22:19 ` [PATCH 2/5] xfs_db: allow setting current address to log blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCHSET v1.0 00/45] libxfs: shard the realtime section Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/45] xfs: define the format of rt groups Darrick J. Wong 2022-12-30 22:19 ` [PATCH 01/45] xfs: create incore realtime group structures Darrick J. Wong 2022-12-30 22:19 ` [PATCH 05/45] xfs: grow the realtime section when realtime groups are enabled Darrick J. Wong 2022-12-30 22:19 ` [PATCH 08/45] xfs: add frextents to the lazysbcounters when rtgroups enabled Darrick J. Wong 2022-12-30 22:19 ` [PATCH 03/45] xfs: update primary realtime super every time we update the primary fs super Darrick J. Wong 2022-12-30 22:19 ` [PATCH 07/45] xfs: check that rtblock extents do not overlap with the rt group metadata Darrick J. Wong 2022-12-30 22:19 ` [PATCH 06/45] xfs: export realtime group geometry via XFS_FSOP_GEOM Darrick J. Wong 2022-12-30 22:19 ` [PATCH 04/45] xfs: write secondary realtime superblocks to disk Darrick J. 
Wong 2022-12-30 22:19 ` [PATCH 09/45] xfs: record rt group superblock errors in the health system Darrick J. Wong 2022-12-30 22:19 ` [PATCH 10/45] xfs: define locking primitives for realtime groups Darrick J. Wong 2022-12-30 22:19 ` [PATCH 15/45] xfs: encode the rtsummary in big endian format Darrick J. Wong 2022-12-30 22:19 ` [PATCH 13/45] xfs: encode the rtbitmap in little " Darrick J. Wong 2022-12-30 22:19 ` [PATCH 11/45] xfs: export the geometry of realtime groups to userspace Darrick J. Wong 2022-12-30 22:19 ` [PATCH 14/45] xfs: add block headers to realtime summary blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 16/45] xfs: store rtgroup information with a bmap intent Darrick J. Wong 2022-12-30 22:19 ` [PATCH 12/45] xfs: add block headers to realtime bitmap blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 17/45] xfs: scrub the realtime group superblock Darrick J. Wong 2022-12-30 22:19 ` [PATCH 19/45] xfs: scrub the rtbitmap by group Darrick J. Wong 2022-12-30 22:19 ` [PATCH 22/45] xfs_repair: improve rtbitmap discrepancy reporting Darrick J. Wong 2022-12-30 22:19 ` [PATCH 25/45] xfs_db: listify the definition of enum typnm Darrick J. Wong 2022-12-30 22:19 ` [PATCH 23/45] xfs_repair: repair rtbitmap block headers Darrick J. Wong 2022-12-30 22:19 ` [PATCH 21/45] xfs_repair: support realtime groups Darrick J. Wong 2022-12-30 22:19 ` [PATCH 24/45] xfs_repair: repair rtsummary block headers Darrick J. Wong 2022-12-30 22:19 ` [PATCH 18/45] xfs: repair secondary realtime group superblocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 20/45] libfrog: report rt groups in output Darrick J. Wong 2022-12-30 22:19 ` [PATCH 26/45] xfs_db: support dumping realtime superblocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 31/45] xfs_db: report rtgroups via version command Darrick J. Wong 2022-12-30 22:19 ` [PATCH 27/45] xfs_db: support changing the label and uuid of rt superblocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 28/45] xfs_db: listify the definition of dbm_t Darrick J. 
Wong 2022-12-30 22:19 ` [PATCH 32/45] xfs_db: metadump realtime devices Darrick J. Wong 2022-12-30 22:19 ` [PATCH 30/45] xfs_db: enable conversion of rt space units Darrick J. Wong 2022-12-30 22:19 ` [PATCH 29/45] xfs_db: implement check for rt superblocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 35/45] xfs_mdrestore: restore rt group superblocks to realtime device Darrick J. Wong 2022-12-30 22:19 ` [PATCH 38/45] xfs_io: display rt group in verbose bmap output Darrick J. Wong 2022-12-30 22:19 ` [PATCH 33/45] xfs_db: dump rt bitmap blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 40/45] xfs_spaceman: report on realtime group health Darrick J. Wong 2022-12-30 22:19 ` [PATCH 36/45] xfs_io: add a command to display allocation group information Darrick J. Wong 2022-12-30 22:19 ` [PATCH 34/45] xfs_db: dump rt summary blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 39/45] xfs_io: display rt group in verbose fsmap output Darrick J. Wong 2022-12-30 22:19 ` [PATCH 37/45] xfs_io: add a command to display realtime group information Darrick J. Wong 2022-12-30 22:19 ` [PATCH 43/45] mkfs: add headers to realtime bitmap blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 44/45] mkfs: add headers to realtime summary blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 45/45] mkfs: format realtime groups Darrick J. Wong 2022-12-30 22:19 ` [PATCH 41/45] xfs_scrub: scrub realtime allocation group metadata Darrick J. Wong 2022-12-30 22:19 ` [PATCH 42/45] xfs_scrub: fstrim each rtgroup in parallel Darrick J. Wong 2022-12-30 22:19 ` [PATCHSET v1.0 0/3] libxfs: widen EFI format to support rt Darrick J. Wong 2022-12-30 22:19 ` [PATCH 2/3] xfs_logprint: report realtime EFIs Darrick J. Wong 2022-12-30 22:19 ` [PATCH 1/3] xfs: support logging EFIs for realtime extents Darrick J. Wong 2022-12-30 22:19 ` [PATCH 3/3] xfs: support error injection when freeing rt extents Darrick J. Wong 2022-12-30 22:19 ` [PATCHSET v1.0 00/41] libxfs: realtime reverse-mapping support Darrick J. 
Wong 2022-12-30 22:19 ` [PATCH 04/41] xfs: realtime rmap btree transaction reservations Darrick J. Wong 2022-12-30 22:19 ` [PATCH 01/41] xfs: simplify the xfs_rmap_{alloc,free}_extent calling conventions Darrick J. Wong 2022-12-30 22:19 ` [PATCH 03/41] xfs: define the on-disk realtime rmap btree format Darrick J. Wong 2022-12-30 22:19 ` [PATCH 02/41] xfs: introduce realtime rmap btree definitions Darrick J. Wong 2022-12-30 22:19 ` [PATCH 06/41] xfs: prepare rmap functions to deal with rtrmapbt Darrick J. Wong 2022-12-30 22:19 ` [PATCH 07/41] xfs: add a realtime flag to the rmap update log redo items Darrick J. Wong 2022-12-30 22:19 ` [PATCH 05/41] xfs: add realtime rmap btree operations Darrick J. Wong 2022-12-30 22:19 ` [PATCH 12/41] xfs: wire up rmap map and unmap to the realtime rmapbt Darrick J. Wong 2022-12-30 22:19 ` [PATCH 15/41] xfs: allow queued realtime intents to drain before scrubbing Darrick J. Wong 2022-12-30 22:19 ` [PATCH 11/41] xfs: use realtime EFI to free extents when realtime rmap is enabled Darrick J. Wong 2022-12-30 22:19 ` [PATCH 09/41] xfs: add metadata reservations for realtime rmap btrees Darrick J. Wong 2022-12-30 22:19 ` [PATCH 13/41] xfs: create routine to allocate and initialize a realtime rmap btree inode Darrick J. Wong 2022-12-30 22:19 ` [PATCH 08/41] xfs: add realtime reverse map inode to superblock Darrick J. Wong 2022-12-30 22:19 ` [PATCH 10/41] xfs: wire up a new inode fork type for the realtime rmap Darrick J. Wong 2022-12-30 22:19 ` [PATCH 14/41] xfs: report realtime rmap btree corruption errors to the health system Darrick J. Wong 2022-12-30 22:19 ` [PATCH 18/41] xfs: create a shadow rmap btree during realtime rmap repair Darrick J. Wong 2022-12-30 22:19 ` [PATCH 21/41] xfs_db: support the realtime rmapbt Darrick J. Wong 2022-12-30 22:19 ` [PATCH 19/41] xfs: hook live realtime rmap operations during a repair operation Darrick J. Wong 2022-12-30 22:19 ` [PATCH 20/41] xfs_db: display the realtime rmap btree contents Darrick J. 
Wong 2022-12-30 22:19 ` [PATCH 16/41] xfs: scrub the realtime rmapbt Darrick J. Wong 2022-12-30 22:19 ` [PATCH 17/41] xfs: online repair of the realtime rmap btree Darrick J. Wong 2022-12-30 22:19 ` [PATCH 22/41] xfs_db: support rudimentary checks of the rtrmap btree Darrick J. Wong 2022-12-30 22:19 ` [PATCH 27/41] libxfs: dirty buffers should be marked uptodate too Darrick J. Wong 2022-12-30 22:19 ` [PATCH 25/41] libfrog: enable scrubbng of the realtime rmap Darrick J. Wong 2022-12-30 22:19 ` [PATCH 24/41] xfs_db: make fsmap query the realtime reverse mapping tree Darrick J. Wong 2022-12-30 22:19 ` [PATCH 29/41] xfs_repair: use realtime rmap btree data to check block types Darrick J. Wong 2022-12-30 22:19 ` [PATCH 28/41] xfs_repair: flag suspect long-format btree blocks Darrick J. Wong 2022-12-30 22:19 ` [PATCH 26/41] xfs_spaceman: report health status of the realtime rmap btree Darrick J. Wong 2022-12-30 22:19 ` [PATCH 30/41] xfs_repair: create a new set of incore rmap information for rt groups Darrick J. Wong 2022-12-30 22:19 ` [PATCH 23/41] xfs_db: copy the realtime rmap btree Darrick J. Wong 2022-12-30 22:20 ` [PATCH 36/41] xfs_repair: rebuild " Darrick J. Wong 2022-12-30 22:20 ` [PATCH 32/41] xfs_repair: refactor realtime inode check Darrick J. Wong 2022-12-30 22:20 ` [PATCH 33/41] xfs_repair: find and mark the rtrmapbt inodes Darrick J. Wong 2022-12-30 22:20 ` [PATCH 34/41] xfs_repair: check existing realtime rmapbt entries against observed rmaps Darrick J. Wong 2022-12-30 22:20 ` [PATCH 38/41] xfs_repair: reserve per-AG space while rebuilding rt metadata Darrick J. Wong 2022-12-30 22:20 ` [PATCH 35/41] xfs_repair: always check realtime file mappings against incore info Darrick J. Wong 2022-12-30 22:20 ` [PATCH 31/41] xfs_repair: collect relatime reverse-mapping data for refcount/rmap tree rebuilding Darrick J. Wong 2022-12-30 22:20 ` [PATCH 37/41] xfs_repair: rebuild the bmap btree for realtime files Darrick J. 
Wong 2022-12-30 22:20 ` [PATCH 41/41] mkfs: create the realtime rmap inode Darrick J. Wong 2022-12-30 22:20 ` [PATCH 39/41] xfs_repair: allow sysadmins to add realtime reverse mapping indexes Darrick J. Wong 2022-12-30 22:20 ` [PATCH 40/41] xfs_scrub: retest metadata across scrub groups after a repair Darrick J. Wong 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] libxfs: file write utility refactoring Darrick J. Wong 2022-12-30 22:20 ` [PATCH 4/4] mkfs: use file write helper to populate files Darrick J. Wong 2022-12-30 22:20 ` [PATCH 2/4] mkfs: use libxfs_alloc_file_space for rtinit Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/4] xfs_repair: use libxfs_alloc_file_space to reallocate rt metadata Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/4] libxfs: resync libxfs_alloc_file_space interface with the kernel Darrick J. Wong 2022-12-30 22:20 ` [PATCHSET v1.0 00/41] libxfs: reflink on the realtime device Darrick J. Wong 2022-12-30 22:20 ` [PATCH 02/41] xfs: namespace the maximum length/refcount symbols Darrick J. Wong 2022-12-30 22:20 ` [PATCH 01/41] xfs: introduce realtime refcount btree definitions Darrick J. Wong 2022-12-30 22:20 ` [PATCH 06/41] xfs: prepare refcount functions to deal with rtrefcountbt Darrick J. Wong 2022-12-30 22:20 ` [PATCH 07/41] xfs: add a realtime flag to the refcount update log redo items Darrick J. Wong 2022-12-30 22:20 ` [PATCH 05/41] xfs: add realtime refcount btree operations Darrick J. Wong 2022-12-30 22:20 ` [PATCH 03/41] xfs: define the on-disk realtime refcount btree format Darrick J. Wong 2022-12-30 22:20 ` [PATCH 09/41] xfs: add metadata reservations for realtime refcount btree Darrick J. Wong 2022-12-30 22:20 ` [PATCH 04/41] xfs: realtime refcount btree transaction reservations Darrick J. Wong 2022-12-30 22:20 ` [PATCH 08/41] xfs: add realtime refcount btree inode to metadata directory Darrick J. Wong 2022-12-30 22:20 ` [PATCH 15/41] xfs: allow inodes to have the realtime and reflink flags Darrick J. 
Wong 2022-12-30 22:20 ` [PATCH 12/41] xfs: create routine to allocate and initialize a realtime refcount btree inode Darrick J. Wong 2022-12-30 22:20 ` [PATCH 11/41] xfs: wire up realtime refcount btree cursors Darrick J. Wong 2022-12-30 22:20 ` [PATCH 14/41] xfs: compute rtrmap btree max levels when reflink enabled Darrick J. Wong 2022-12-30 22:20 ` [PATCH 17/41] xfs: fix xfs_get_extsz_hint behavior with realtime alwayscow files Darrick J. Wong 2022-12-30 22:20 ` [PATCH 16/41] xfs: refcover CoW leftovers in the realtime volume Darrick J. Wong 2022-12-30 22:20 ` [PATCH 13/41] xfs: update rmap to allow cow staging extents in the rt rmap Darrick J. Wong 2022-12-30 22:20 ` [PATCH 10/41] xfs: wire up a new inode fork type for the realtime refcount Darrick J. Wong 2022-12-30 22:20 ` [PATCH 24/41] xfs_db: display the realtime refcount btree contents Darrick J. Wong 2022-12-30 22:20 ` [PATCH 23/41] libfrog: enable scrubbing of the realtime refcount data Darrick J. Wong 2022-12-30 22:20 ` [PATCH 21/41] xfs: scrub the realtime refcount btree Darrick J. Wong 2022-12-30 22:20 ` [PATCH 18/41] xfs: apply rt extent alignment constraints to CoW extsize hint Darrick J. Wong 2022-12-30 22:20 ` [PATCH 20/41] xfs: report realtime refcount btree corruption errors to the health system Darrick J. Wong 2022-12-30 22:20 ` [PATCH 19/41] xfs: enable extent size hints for CoW operations Darrick J. Wong 2022-12-30 22:20 ` [PATCH 22/41] xfs: online repair of the realtime refcount btree Darrick J. Wong 2022-12-30 22:20 ` [PATCH 26/41] xfs_db: widen block type mask to 64 bits Darrick J. Wong 2022-12-30 22:20 ` [PATCH 29/41] xfs_spaceman: report health of the realtime refcount btree Darrick J. Wong 2022-12-30 22:20 ` [PATCH 32/41] xfs_repair: find and mark the rtrefcountbt inode Darrick J. Wong 2022-12-30 22:20 ` [PATCH 30/41] xfs_repair: allow CoW staging extents in the realtime rmap records Darrick J. Wong 2022-12-30 22:20 ` [PATCH 28/41] xfs_db: copy the realtime refcount btree Darrick J. 
Wong 2022-12-30 22:20 ` [PATCH 31/41] xfs_repair: use realtime refcount btree data to check block types Darrick J. Wong 2022-12-30 22:20 ` [PATCH 25/41] xfs_db: support the realtime refcountbt Darrick J. Wong 2022-12-30 22:20 ` [PATCH 27/41] xfs_db: support rudimentary checks of the rtrefcount btree Darrick J. Wong 2022-12-30 22:20 ` [PATCH 34/41] xfs_repair: check existing realtime refcountbt entries against observed refcounts Darrick J. Wong 2022-12-30 22:20 ` [PATCH 36/41] xfs_repair: rebuild the realtime refcount btree Darrick J. Wong 2022-12-30 22:20 ` [PATCH 35/41] xfs_repair: reject unwritten shared extents Darrick J. Wong 2022-12-30 22:20 ` [PATCH 33/41] xfs_repair: compute refcount data for the realtime groups Darrick J. Wong 2022-12-30 22:20 ` [PATCH 39/41] xfs_repair: allow sysadmins to add realtime reflink Darrick J. Wong 2022-12-30 22:20 ` [PATCH 38/41] xfs_repair: validate CoW extent size hint on rtinherit directories Darrick J. Wong 2022-12-30 22:20 ` [PATCH 37/41] xfs_repair: allow realtime files to have the reflink flag set Darrick J. Wong 2022-12-30 22:20 ` [PATCH 40/41] mkfs: validate CoW extent size hint when rtinherit is set Darrick J. Wong 2022-12-30 22:20 ` [PATCH 41/41] mkfs: enable reflink on the realtime device Darrick J. Wong 2022-12-30 22:20 ` [PATCHSET v1.0 0/3] libxfs: reflink with large realtime extents Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/3] mkfs: enable reflink with realtime extent sizes > 1 Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/3] xfs: enable extent size hints for CoW when rtextsize " Darrick J. Wong 2022-12-30 22:20 ` [PATCH 2/3] xfs: fix integer overflow when validating extent size hints Darrick J. Wong 2022-12-30 22:20 ` [PATCHSET v1.0 0/1] libxfs: enable quota for realtime voluems Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/1] xfs_quota: report warning limits for realtime space quotas Darrick J. Wong 2022-12-30 22:20 ` [PATCHSET 0/1] fstests: test upgrading older features Darrick J. 
Wong 2022-12-30 22:20 ` [PATCH 1/1] xfs: test upgrading old features Darrick J. Wong 2023-03-06 15:56 ` Zorro Lang 2023-03-06 16:41 ` Darrick J. Wong 2023-03-06 16:54 ` Zorro Lang 2023-03-06 23:14 ` Darrick J. Wong 2022-12-30 22:20 ` [PATCHSET v1.0 0/9] fstests: test XFS metadata directories Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/9] xfs/{030,033,178}: forcibly disable metadata directory trees Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/9] xfs/122: fix metadirino Darrick J. Wong 2022-12-30 22:20 ` [PATCH 2/9] various: fix finding metadata inode numbers when metadir is enabled Darrick J. Wong 2023-03-06 16:41 ` Zorro Lang 2022-12-30 22:20 ` [PATCH 4/9] common/repair: patch up repair sb inode value complaints Darrick J. Wong 2022-12-30 22:20 ` [PATCH 6/9] xfs/{050,144,153,299,330}: update quota reports to leave out metadir files Darrick J. Wong 2022-12-30 22:20 ` [PATCH 9/9] xfs: create fuzz tests for metadata directories Darrick J. Wong 2022-12-30 22:20 ` [PATCH 7/9] xfs/769: add metadir upgrade to test matrix Darrick J. Wong 2022-12-30 22:20 ` [PATCH 5/9] xfs/206: update for metadata directory support Darrick J. Wong 2022-12-30 22:20 ` [PATCH 8/9] xfs/509: adjust inumbers accounting for metadata directories Darrick J. Wong 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: support metadump to external devices Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/4] common/ext4: reformat external logs during mdrestore operations Darrick J. Wong 2022-12-30 22:20 ` [PATCH 2/4] common/xfs: wipe " Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/4] common/populate: refactor caching of metadumps to a helper Darrick J. Wong 2022-12-30 22:20 ` [PATCH 4/4] common/xfs: capture external logs during metadump/mdrestore Darrick J. Wong 2022-12-30 22:20 ` [PATCHSET v1.0 00/12] xfsprogs: shard the realtime section Darrick J. Wong 2022-12-30 22:20 ` [PATCH 03/12] xfs/206: update mkfs filtering for rt groups feature Darrick J. 
Wong 2022-12-30 22:20 ` [PATCH 01/12] xfs/122: update for rtgroups Darrick J. Wong 2022-12-30 22:20 ` [PATCH 02/12] punch-alternating: detect xfs realtime files with large allocation units Darrick J. Wong 2022-12-30 22:20 ` [PATCH 10/12] xfs/27[46],xfs/556: fix tests to deal with rtgroups output in bmap/fsmap commands Darrick J. Wong 2022-12-30 22:20 ` [PATCH 07/12] xfs/449: update test to know about xfs_db -R Darrick J. Wong 2022-12-30 22:20 ` [PATCH 11/12] common/xfs: capture realtime devices during metadump/mdrestore Darrick J. Wong 2022-12-30 22:20 ` [PATCH 05/12] common: filter rtgroups when we're disabling metadir Darrick J. Wong 2022-12-30 22:20 ` [PATCH 08/12] xfs/122: update for rtbitmap headers Darrick J. Wong 2022-12-30 22:20 ` [PATCH 06/12] xfs/185: update for rtgroups Darrick J. Wong 2022-12-30 22:20 ` [PATCH 04/12] common: pass the realtime device to xfs_db when possible Darrick J. Wong 2022-12-30 22:20 ` [PATCH 09/12] xfs/122: udpate test to pick up rtword/suminfo ondisk unions Darrick J. Wong 2022-12-30 22:20 ` [PATCH 12/12] common/fuzzy: adapt the scrub stress tests to support rtgroups Darrick J. Wong 2022-12-30 22:20 ` [PATCHSET v1.0 00/13] fstests: fixes for realtime rmap Darrick J. Wong 2022-12-30 22:20 ` [PATCH 03/13] xfs: race fsstress with realtime rmap btree scrub and repair Darrick J. Wong 2022-12-30 22:20 ` [PATCH 04/13] xfs/769: add rtrmapbt upgrade to test matrix Darrick J. Wong 2022-12-30 22:20 ` [PATCH 05/13] xfs/122: update for rtgroups-based realtime rmap btrees Darrick J. Wong 2022-12-30 22:20 ` [PATCH 07/13] xfs/341: update test for rtgroup-based rmap Darrick J. Wong 2022-12-30 22:20 ` [PATCH 06/13] xfs: fix various problems with fsmap detecting the data device Darrick J. Wong 2022-12-30 22:20 ` [PATCH 01/13] xfs: fix tests that try to access the realtime rmap inode Darrick J. Wong 2022-12-30 22:20 ` [PATCH 02/13] fuzz: for fuzzing the rtrmapbt, find the path to the rt rmap btree file Darrick J. 
Wong 2022-12-30 22:20 ` [PATCH 09/13] xfs: skip tests if formatting small filesystem fails Darrick J. Wong 2022-12-30 22:20 ` [PATCH 10/13] xfs/443: use file allocation unit, not dbsize Darrick J. Wong 2022-12-30 22:20 ` [PATCH 12/13] populate: check that we created a realtime rmap btree of the given height Darrick J. Wong 2022-12-30 22:20 ` [PATCH 13/13] fuzzy: create missing fuzz tests for rt rmap btrees Darrick J. Wong 2022-12-30 22:20 ` [PATCH 11/13] populate: adjust rtrmap calculations for rtgroups Darrick J. Wong 2022-12-30 22:20 ` [PATCH 08/13] xfs/3{43,32}: adapt tests for rt extent size greater than 1 Darrick J. Wong 2022-12-30 22:20 ` [PATCHSET v1.0 00/10] fstests: reflink on the realtime device Darrick J. Wong 2022-12-30 22:20 ` [PATCH 01/10] xfs/122: update fields for realtime reflink Darrick J. Wong 2022-12-30 22:20 ` [PATCH 02/10] common/populate: create realtime refcount btree Darrick J. Wong 2022-12-30 22:20 ` [PATCH 03/10] xfs: create fuzz tests for the " Darrick J. Wong 2022-12-30 22:20 ` [PATCH 06/10] xfs: race fsstress with realtime refcount btree scrub and repair Darrick J. Wong 2022-12-30 22:20 ` [PATCH 09/10] generic/331,xfs/240: support files that skip delayed allocation Darrick J. Wong 2022-12-30 22:20 ` [PATCH 05/10] xfs/243: don't run when realtime storage is the default Darrick J. Wong 2022-12-30 22:20 ` [PATCH 04/10] xfs/27[24]: adapt for checking files on the realtime volume Darrick J. Wong 2022-12-30 22:20 ` [PATCH 08/10] xfs/769: add rtreflink upgrade to test matrix Darrick J. Wong 2022-12-30 22:20 ` [PATCH 10/10] common/xfs: fix _xfs_get_file_block_size when rtinherit is set and no rt section Darrick J. Wong 2022-12-30 22:20 ` [PATCH 07/10] xfs: remove xfs/131 now that we allow reflink on realtime volumes Darrick J. Wong 2022-12-30 22:20 ` [PATCHSET v1.0 0/4] fstests: reflink with large realtime extents Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/4] xfs: make sure that CoW will write around when rextsize > 1 Darrick J. 
Wong 2022-12-30 22:20 ` [PATCH 2/4] xfs: skip cowextsize hint fragmentation tests on realtime volumes Darrick J. Wong 2022-12-30 22:20 ` [PATCH 4/4] generic/303: avoid test failures on weird rt extent sizes Darrick J. Wong 2022-12-30 22:20 ` [PATCH 3/4] misc: add more congruent oplen testing Darrick J. Wong 2022-12-30 22:20 ` [PATCHSET v1.0 0/1] fstests: functional tests for rt quota Darrick J. Wong 2022-12-30 22:20 ` [PATCH 1/1] xfs: regression testing of quota on the realtime device Darrick J. Wong