* [PATCHBOMB v13.1] xfs: directory parent pointers
@ 2024-04-10  0:36 Darrick J. Wong
  2024-04-10  0:44 ` [PATCHSET v13.1 1/9] xfs: design documentation for online fsck, part 2 Darrick J. Wong
                   ` (8 more replies)
  0 siblings, 9 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:36 UTC
  To: Christoph Hellwig; +Cc: linux-xfs

Hi everyone,

Christoph and I have been working to get the directory parent pointers
patchset into shape for merging in the next kernel cycle.  This v13
release contains what I hope are the last ondisk format changes -- we've
gone back to parent pointers being xattrs attached to the child, wherein
the attr name is the dirent name, and the attr value is a handle to the
parent directory.

We've solved the pptr lookup uniqueness problem by forcing all
XFS_ATTR_PARENT attr lookups to be done on the name and value; avoided
namehash collisions on container farms by adjusting the hash function
slightly; and avoided the entire log incompat feature flag mess by
defining a permanent incompat feature for parent pointers and using
totally separate attr log item opcodes.

With that, I think this is finally ready to go.

Full versions are here:
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=vectorized-scrub
https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=vectorized-scrub

--D




* [PATCHSET v13.1 1/9] xfs: design documentation for online fsck, part 2
  2024-04-10  0:36 [PATCHBOMB v13.1] xfs: directory parent pointers Darrick J. Wong
@ 2024-04-10  0:44 ` Darrick J. Wong
  2024-04-10  0:46   ` [PATCH 1/4] docs: update the parent pointers documentation to the final version Darrick J. Wong
                     ` (3 more replies)
  2024-04-10  0:44 ` [PATCHSET v13.1 2/9] xfs: retain ILOCK during directory updates Darrick J. Wong
                   ` (7 subsequent siblings)
  8 siblings, 4 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:44 UTC
  To: djwong; +Cc: hch, linux-xfs

Hi all,

This series updates the design documentation for online fsck to reflect
the final design of the parent pointers feature as well as the
implementation of online fsck for the new metadata.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=online-fsck-design
---
Commits in this patchset:
 * docs: update the parent pointers documentation to the final version
 * docs: update online directory and parent pointer repair sections
 * docs: update offline parent pointer repair strategy
 * docs: describe xfs directory tree online fsck
---
 .../filesystems/xfs/xfs-online-fsck-design.rst     |  354 +++++++++++++++-----
 1 file changed, 266 insertions(+), 88 deletions(-)



* [PATCHSET v13.1 2/9] xfs: retain ILOCK during directory updates
  2024-04-10  0:36 [PATCHBOMB v13.1] xfs: directory parent pointers Darrick J. Wong
  2024-04-10  0:44 ` [PATCHSET v13.1 1/9] xfs: design documentation for online fsck, part 2 Darrick J. Wong
@ 2024-04-10  0:44 ` Darrick J. Wong
  2024-04-10  0:47   ` [PATCH 1/7] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5 Darrick J. Wong
                     ` (6 more replies)
  2024-04-10  0:44 ` [PATCHSET v13.1 3/9] xfs: shrink struct xfs_da_args Darrick J. Wong
                   ` (6 subsequent siblings)
  8 siblings, 7 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:44 UTC
  To: djwong
  Cc: Catherine Hoang, Allison Henderson, catherine.hoang, hch,
	allison.henderson, linux-xfs

Hi all,

This series changes the directory update code to retain the ILOCK on all
files involved in a rename until the end of the operation.  The upcoming
parent pointers patchset applies parent pointers in a separate chained
update from the actual directory update, which is why it is now
necessary to keep the ILOCK instead of dropping it after the first
transaction in the chain.

As a side effect, we no longer need to hold the IOLOCK during an rmapbt
scan of inodes to serialize the scan with ongoing directory updates.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=retain-ilock-during-dir-ops

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=retain-ilock-during-dir-ops
---
Commits in this patchset:
 * xfs: Increase XFS_DEFER_OPS_NR_INODES to 5
 * xfs: Increase XFS_QM_TRANS_MAXDQS to 5
 * xfs: Hold inode locks in xfs_ialloc
 * xfs: Hold inode locks in xfs_trans_alloc_dir
 * xfs: Hold inode locks in xfs_rename
 * xfs: don't pick up IOLOCK during rmapbt repair scan
 * xfs: unlock new repair tempfiles after creation
---
 fs/xfs/libxfs/xfs_defer.c  |    6 ++-
 fs/xfs/libxfs/xfs_defer.h  |    8 +++-
 fs/xfs/scrub/rmap_repair.c |   16 -------
 fs/xfs/scrub/tempfile.c    |    2 +
 fs/xfs/xfs_dquot.c         |   41 ++++++++++++++++++
 fs/xfs/xfs_dquot.h         |    1 
 fs/xfs/xfs_inode.c         |   98 ++++++++++++++++++++++++++++++++------------
 fs/xfs/xfs_inode.h         |    2 +
 fs/xfs/xfs_qm.c            |    4 +-
 fs/xfs/xfs_qm.h            |    2 -
 fs/xfs/xfs_symlink.c       |    6 ++-
 fs/xfs/xfs_trans.c         |    9 +++-
 fs/xfs/xfs_trans_dquot.c   |   15 ++++---
 13 files changed, 156 insertions(+), 54 deletions(-)



* [PATCHSET v13.1 3/9] xfs: shrink struct xfs_da_args
  2024-04-10  0:36 [PATCHBOMB v13.1] xfs: directory parent pointers Darrick J. Wong
  2024-04-10  0:44 ` [PATCHSET v13.1 1/9] xfs: design documentation for online fsck, part 2 Darrick J. Wong
  2024-04-10  0:44 ` [PATCHSET v13.1 2/9] xfs: retain ILOCK during directory updates Darrick J. Wong
@ 2024-04-10  0:44 ` Darrick J. Wong
  2024-04-10  0:49   ` [PATCH 1/4] xfs: remove XFS_DA_OP_REMOVE Darrick J. Wong
                     ` (3 more replies)
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
                   ` (5 subsequent siblings)
  8 siblings, 4 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:44 UTC
  To: djwong; +Cc: hch, linux-xfs

Hi all,

Let's clean out some unused flags and fields from struct xfs_da_args.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=shrink-dirattr-args

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=shrink-dirattr-args
---
Commits in this patchset:
 * xfs: remove XFS_DA_OP_REMOVE
 * xfs: remove XFS_DA_OP_NOTIME
 * xfs: rename xfs_da_args.attr_flags
 * xfs: rearrange xfs_da_args a bit to use less space
---
 fs/xfs/libxfs/xfs_attr.c     |    9 ++++-----
 fs/xfs/libxfs/xfs_attr.h     |    1 -
 fs/xfs/libxfs/xfs_da_btree.h |   30 ++++++++++++++----------------
 fs/xfs/scrub/attr.c          |    1 -
 fs/xfs/scrub/attr_repair.c   |    2 +-
 fs/xfs/xfs_ioctl.c           |    6 +++---
 fs/xfs/xfs_trace.h           |    6 +++---
 fs/xfs/xfs_xattr.c           |    2 +-
 8 files changed, 26 insertions(+), 31 deletions(-)



* [PATCHSET v13.1 4/9] xfs: improve extended attribute validation
  2024-04-10  0:36 [PATCHBOMB v13.1] xfs: directory parent pointers Darrick J. Wong
                   ` (2 preceding siblings ...)
  2024-04-10  0:44 ` [PATCHSET v13.1 3/9] xfs: shrink struct xfs_da_args Darrick J. Wong
@ 2024-04-10  0:45 ` Darrick J. Wong
  2024-04-10  0:50   ` [PATCH 01/12] xfs: attr fork iext must be loaded before calling xfs_attr_is_leaf Darrick J. Wong
                     ` (11 more replies)
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                   ` (4 subsequent siblings)
  8 siblings, 12 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:45 UTC
  To: djwong; +Cc: Christoph Hellwig, hch, linux-xfs

Hi all,

Prior to introducing parent pointer extended attributes, let's spend
some time cleaning up the attr code and strengthening the validation
that it performs on attrs coming in from the disk.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=improve-attr-validation

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=improve-attr-validation
---
Commits in this patchset:
 * xfs: attr fork iext must be loaded before calling xfs_attr_is_leaf
 * xfs: require XFS_SB_FEAT_INCOMPAT_LOG_XATTRS for attr log intent item recovery
 * xfs: use an XFS_OPSTATE_ flag for detecting if logged xattrs are available
 * xfs: check opcode and iovec count match in xlog_recover_attri_commit_pass2
 * xfs: fix missing check for invalid attr flags
 * xfs: restructure xfs_attr_complete_op a bit
 * xfs: use helpers to extract xattr op from opflags
 * xfs: validate recovered name buffers when recovering xattr items
 * xfs: always set args->value in xfs_attri_item_recover
 * xfs: use local variables for name and value length in _attri_commit_pass2
 * xfs: refactor name/length checks in xfs_attri_validate
 * xfs: enforce one namespace per attribute
---
 fs/xfs/libxfs/xfs_attr.c      |   41 ++++++++-
 fs/xfs/libxfs/xfs_attr.h      |    9 ++
 fs/xfs/libxfs/xfs_attr_leaf.c |    7 +
 fs/xfs/libxfs/xfs_da_format.h |    5 +
 fs/xfs/scrub/attr.c           |   25 +++--
 fs/xfs/scrub/attr_repair.c    |    4 -
 fs/xfs/xfs_attr_item.c        |  192 ++++++++++++++++++++++++++++++++---------
 fs/xfs/xfs_attr_list.c        |   18 +++-
 fs/xfs/xfs_mount.c            |   16 +++
 fs/xfs/xfs_mount.h            |    6 +
 fs/xfs/xfs_xattr.c            |    3 -
 11 files changed, 258 insertions(+), 68 deletions(-)



* [PATCHSET v13.1 5/9] xfs: Parent Pointers
  2024-04-10  0:36 [PATCHBOMB v13.1] xfs: directory parent pointers Darrick J. Wong
                   ` (3 preceding siblings ...)
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
@ 2024-04-10  0:45 ` Darrick J. Wong
  2024-04-10  0:53   ` [PATCH 01/32] xfs: rearrange xfs_attr_match parameters Darrick J. Wong
                     ` (31 more replies)
  2024-04-10  0:45 ` [PATCHSET v13.1 6/9] xfs: scrubbing for " Darrick J. Wong
                   ` (3 subsequent siblings)
  8 siblings, 32 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:45 UTC
  To: djwong
  Cc: Darrick J. Wong, Allison Henderson, Christoph Hellwig,
	Mark Tinguely, Dave Chinner, catherine.hoang, hch,
	allison.henderson, linux-xfs

Hi all,

This is the latest version of the parent pointer attributes for xfs.  The
goal of this patch set is to add a parent pointer attribute to each
inode.  The attribute name contains the dirent name, while the attribute
value contains the parent inode number and generation.  This feature
will enable future optimizations for online scrub, shrink, nfs handles,
verity, or any other feature that could make use of quickly deriving an
inode's path from the mount point.

Directory parent pointers are stored as namespaced extended attributes
of a file.  Because parent pointers are an indivisible tuple of
(dirent_name, parent_ino, parent_gen) we cannot use the usual attr name
lookup functions to find a parent pointer.  This is solvable by
introducing a new lookup mode that checks both the name and the value of
the xattr.
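
To illustrate why a name-only lookup cannot work, here is a minimal
userspace sketch in Python.  It is not kernel code: the ParentPtr class
and the two lookup helpers are hypothetical names made up for this
example.  It assumes a file that is hardlinked under the same name in two
directories, so that two attrs share one name and only a match on name
and value picks out a single parent pointer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParentPtr:
    """Hypothetical model of one parent pointer xattr."""
    name: bytes        # attr name: the dirent name
    parent_ino: int    # attr value, part 1: parent directory inode number
    parent_gen: int    # attr value, part 2: parent directory generation


def lookup_by_name(attrs, name):
    """Ordinary attr lookup: match on the name only."""
    return [a for a in attrs if a.name == name]


def lookup_by_name_value(attrs, name, parent_ino, parent_gen):
    """Name-value lookup: match on the whole (name, ino, gen) tuple."""
    wanted = ParentPtr(name, parent_ino, parent_gen)
    return [a for a in attrs if a == wanted]


# One file hardlinked as "data" in two different directories.
attrs = [ParentPtr(b"data", 128, 1), ParentPtr(b"data", 256, 7)]

ambiguous = lookup_by_name(attrs, b"data")              # two candidates
unique = lookup_by_name_value(attrs, b"data", 256, 7)   # exactly one
```

The name-only lookup returns both attrs, which is the uniqueness problem;
the name-value lookup returns only the pointer for directory 256.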

Therefore, introduce this new name-value lookup mode that's gated on the
XFS_ATTR_PARENT namespace.  This requires the introduction of new
opcodes for the extended attribute update log intent items, which
actually means that the parent pointers feature (itself an INCOMPAT
feature) does not depend on the LOGGED_XATTRS log incompat feature bit.

To reduce collisions on the dirent names of parent pointers, introduce a
new attr hash mode that is the dir2 namehash of the dirent name xor'd
with the parent inode number.
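
A rough Python sketch of the idea follows.  The name_hash() function here
is a simple rotate-and-xor stand-in, not the actual dir2 namehash, and
folding the 64-bit inode number into 32 bits by xoring its two halves is
an assumption made for this example.  All function names are
hypothetical.

```python
MASK32 = 0xFFFFFFFF


def rol32(value, shift):
    """Rotate a 32-bit value left by shift bits."""
    value &= MASK32
    return ((value << shift) | (value >> (32 - shift))) & MASK32


def name_hash(name):
    """Stand-in rotate-and-xor name hash (an assumption, not the dir2 hash)."""
    h = 0
    for byte in name:
        h = rol32(h, 7) ^ byte
    return h


def pptr_hash(name, parent_ino):
    """Name hash xor'd with the parent inode number, folded to 32 bits."""
    folded = (parent_ino & MASK32) ^ (parent_ino >> 32)
    return name_hash(name) ^ folded


# The same dirent name under two different parent directories.
h1 = pptr_hash(b"a", 128)
h2 = pptr_hash(b"a", 129)
```

With a plain name hash, both parent pointers named "a" would land on the
same hash value; mixing in the parent inode number spreads them apart.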

At this point, Allison has moved on to other things, so I've merged her
patchset into djwong-dev for merging.

Updates since v12 [djwong]:

Rebase on 6.9-rc and update the online fsck design document.
Redesign the ondisk format to use the name-value lookups to get us back
to the point where the attr is (dirent_name -> parent_ino/gen).

Updates since v11 [djwong]:

Rebase on 6.4-rc and make some tweaks and bugfixes to enable the repair
prototypes.  Merge with djwong-dev and make online repair actually work.

Updates since v10 [djwong]:

Merge in the ondisk format changes to get rid of the diroffset conflicts
with the parent pointer repair code, rebase the entire series with the
attr vlookup changes first, and merge all the other random fixes.

Updates since v9:

Reordered patches 2 and 3 to be 6 and 7

xfs: Add xfs_verify_pptr
   moved parent pointer validators to xfs_parent

xfs: Add parent pointer ioctl
   Extra validation checks for fs id
   added missing release for the inode
   use GFP_KERNEL flags for malloc/realloc
   reworked ioctl to use pptr listenty and flex array

NEW
   xfs: don't remove the attr fork when parent pointers are enabled

NEW
   directory lookups should return diroffsets too

NEW
   xfs: move/add parent pointer validators to xfs_parent

Updates since v8:

xfs: parent pointer attribute creation
   Fix xfs_parent_init to release log assist on alloc fail
   Add slab cache for xfs_parent_defer
   Fix xfs_create to release after unlock
   Add xfs_parent_start and xfs_parent_finish wrappers
   removed unused xfs_parent_name_irec and xfs_init_parent_name_irec

xfs: add parent attributes to link
   Start/finish wrapper updates
   Fix xfs_link to disallow reservationless quotas

xfs: add parent attributes to symlink
   Fix xfs_symlink to release after unlock
   Start/finish wrapper updates

xfs: remove parent pointers in unlink
   Start/finish wrapper updates
   Add missing parent free

xfs: Add parent pointers to rename
   Start/finish wrapper updates
   Fix rename to only grab logged xattr once
   Fix xfs_rename to disallow reservationless quotas
   Fix double unlock on dqattach fail
   Move parent frees to out_release_wip

xfs: Add parent pointers to xfs_cross_rename
   Hoist parent pointers into rename

Questions, comments, and feedback appreciated!

Thanks all!
Allison

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=pptrs

xfsdocs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-documentation.git/log/?h=pptrs
---
Commits in this patchset:
 * xfs: rearrange xfs_attr_match parameters
 * xfs: check the flags earlier in xfs_attr_match
 * xfs: move xfs_attr_defer_add to xfs_attr_item.c
 * xfs: create a separate hashname function for extended attributes
 * xfs: add parent pointer support to attribute code
 * xfs: define parent pointer ondisk extended attribute format
 * xfs: allow xattr matching on name and value for local/sf attrs
 * xfs: allow logged xattr operations if parent pointers are enabled
 * xfs: log parent pointer xattr removal operations
 * xfs: log parent pointer xattr setting operations
 * xfs: log parent pointer xattr replace operations
 * xfs: record inode generation in xattr update log intent items
 * xfs: Expose init_xattrs in xfs_create_tmpfile
 * xfs: add parent pointer validator functions
 * xfs: extend transaction reservations for parent attributes
 * xfs: create a hashname function for parent pointers
 * xfs: parent pointer attribute creation
 * xfs: add parent attributes to link
 * xfs: add parent attributes to symlink
 * xfs: remove parent pointers in unlink
 * xfs: Add parent pointers to rename
 * xfs: Add parent pointers to xfs_cross_rename
 * xfs: Filter XFS_ATTR_PARENT for getfattr
 * xfs: pass the attr value to put_listent when possible
 * xfs: move handle ioctl code to xfs_handle.c
 * xfs: split out handle management helpers a bit
 * xfs: Add parent pointer ioctls
 * xfs: don't remove the attr fork when parent pointers are enabled
 * xfs: Add the parent pointer support to the superblock version 5.
 * xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res
 * xfs: drop compatibility minimum log size computations for reflink
 * xfs: enable parent pointers
---
 fs/xfs/Makefile                 |    3 
 fs/xfs/libxfs/xfs_attr.c        |   92 ++--
 fs/xfs/libxfs/xfs_attr.h        |   23 +
 fs/xfs/libxfs/xfs_attr_leaf.c   |   81 +++
 fs/xfs/libxfs/xfs_attr_sf.h     |    1 
 fs/xfs/libxfs/xfs_da_btree.h    |    4 
 fs/xfs/libxfs/xfs_da_format.h   |   25 +
 fs/xfs/libxfs/xfs_format.h      |    4 
 fs/xfs/libxfs/xfs_fs.h          |   79 +++
 fs/xfs/libxfs/xfs_log_format.h  |   25 +
 fs/xfs/libxfs/xfs_log_rlimit.c  |   46 ++
 fs/xfs/libxfs/xfs_ondisk.h      |    6 
 fs/xfs/libxfs/xfs_parent.c      |  296 +++++++++++++
 fs/xfs/libxfs/xfs_parent.h      |   99 ++++
 fs/xfs/libxfs/xfs_sb.c          |    4 
 fs/xfs/libxfs/xfs_trans_resv.c  |  326 ++++++++++++--
 fs/xfs/libxfs/xfs_trans_space.c |  121 +++++
 fs/xfs/libxfs/xfs_trans_space.h |   25 +
 fs/xfs/scrub/attr.c             |   15 +
 fs/xfs/scrub/dir_repair.c       |    2 
 fs/xfs/scrub/orphanage.c        |    5 
 fs/xfs/scrub/parent_repair.c    |    3 
 fs/xfs/scrub/symlink_repair.c   |    2 
 fs/xfs/scrub/tempfile.c         |    2 
 fs/xfs/xfs_attr_item.c          |  320 +++++++++++++-
 fs/xfs/xfs_attr_item.h          |   12 +
 fs/xfs/xfs_attr_list.c          |   13 -
 fs/xfs/xfs_handle.c             |  906 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_handle.h             |   33 +
 fs/xfs/xfs_inode.c              |  218 ++++++++-
 fs/xfs/xfs_inode.h              |    2 
 fs/xfs/xfs_ioctl.c              |  594 --------------------------
 fs/xfs/xfs_ioctl.h              |   28 -
 fs/xfs/xfs_ioctl32.c            |    1 
 fs/xfs/xfs_iops.c               |   15 +
 fs/xfs/xfs_super.c              |   14 +
 fs/xfs/xfs_symlink.c            |   30 +
 fs/xfs/xfs_trace.c              |    1 
 fs/xfs/xfs_trace.h              |   95 ++++
 fs/xfs/xfs_xattr.c              |   13 +
 fs/xfs/xfs_xattr.h              |    2 
 41 files changed, 2763 insertions(+), 823 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_parent.c
 create mode 100644 fs/xfs/libxfs/xfs_parent.h
 create mode 100644 fs/xfs/libxfs/xfs_trans_space.c
 create mode 100644 fs/xfs/xfs_handle.c
 create mode 100644 fs/xfs/xfs_handle.h



* [PATCHSET v13.1 6/9] xfs: scrubbing for parent pointers
  2024-04-10  0:36 [PATCHBOMB v13.1] xfs: directory parent pointers Darrick J. Wong
                   ` (4 preceding siblings ...)
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
@ 2024-04-10  0:45 ` Darrick J. Wong
  2024-04-10  1:02   ` [PATCH 1/7] xfs: check dirents have " Darrick J. Wong
                     ` (6 more replies)
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
                   ` (2 subsequent siblings)
  8 siblings, 7 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:45 UTC
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Hi all,

Teach online fsck to use parent pointers to assist in checking
directories, parent pointers, extended attributes, and link counts.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-pptrs

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-pptrs
---
Commits in this patchset:
 * xfs: check dirents have parent pointers
 * xfs: deferred scrub of dirents
 * xfs: scrub parent pointers
 * xfs: deferred scrub of parent pointers
 * xfs: walk directory parent pointers to determine backref count
 * xfs: check parent pointer xattrs when scrubbing
 * xfs: salvage parent pointers when rebuilding xattr structures
---
 fs/xfs/Makefile              |    2 
 fs/xfs/libxfs/xfs_parent.c   |   22 +
 fs/xfs/libxfs/xfs_parent.h   |    5 
 fs/xfs/scrub/attr.c          |    8 
 fs/xfs/scrub/attr_repair.c   |   34 ++
 fs/xfs/scrub/common.h        |    1 
 fs/xfs/scrub/dir.c           |  342 +++++++++++++++++++++
 fs/xfs/scrub/nlinks.c        |   82 +++++
 fs/xfs/scrub/nlinks_repair.c |    2 
 fs/xfs/scrub/parent.c        |  678 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/readdir.c       |   78 +++++
 fs/xfs/scrub/readdir.h       |    3 
 fs/xfs/scrub/trace.c         |    1 
 fs/xfs/scrub/trace.h         |  103 ++++++
 14 files changed, 1348 insertions(+), 13 deletions(-)



* [PATCHSET v13.1 7/9] xfs: online repair for parent pointers
  2024-04-10  0:36 [PATCHBOMB v13.1] xfs: directory parent pointers Darrick J. Wong
                   ` (5 preceding siblings ...)
  2024-04-10  0:45 ` [PATCHSET v13.1 6/9] xfs: scrubbing for " Darrick J. Wong
@ 2024-04-10  0:45 ` Darrick J. Wong
  2024-04-10  1:03   ` [PATCH 01/14] xfs: add xattr setname and removename functions for internal users Darrick J. Wong
                     ` (13 more replies)
  2024-04-10  0:46 ` [PATCHSET v13.1 8/9] xfs: detect and correct directory tree problems Darrick J. Wong
  2024-04-10  0:46 ` [PATCHSET v13.1 9/9] xfs: vectorize scrub kernel calls Darrick J. Wong
  8 siblings, 14 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:45 UTC
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Hi all,

This series implements online repair for directory parent pointer
metadata.  The checking half is fairly straightforward -- for each
outgoing directory link (forward or backwards), grab the inode at the
other end, and confirm that there's a corresponding link.  If we can't
grab an inode or lock it, we'll save that link for a slower loop that
cycles all the locks, confirms the continued existence of the link, and
rechecks the link if it's actually still there.
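
The checking half described above could be modeled roughly as follows.
This is a toy Python sketch, not the scrub code: the dictionaries stand
in for ondisk directories and parent pointer xattrs, the busy set stands
in for inodes that cannot be grabbed or locked on the fast pass, and
check_links() is a hypothetical name.

```python
def check_links(dirents, pptrs, busy=frozenset()):
    """Return (bad_links, deferred_links) for a toy filesystem.

    dirents: {dir_ino: {name: child_ino}}         forward links
    pptrs:   {child_ino: {(parent_ino, name)}}    backward links
    busy:    inodes that cannot be grabbed on the fast pass
    """
    bad, deferred = [], []

    # Forward: every dirent needs a matching parent pointer on the child.
    for dir_ino, entries in dirents.items():
        for name, child_ino in entries.items():
            link = (dir_ino, name, child_ino)
            if child_ino in busy:
                deferred.append(link)   # save it for the slow loop
            elif (dir_ino, name) not in pptrs.get(child_ino, set()):
                bad.append(link)

    # Backward: every parent pointer needs a matching dirent in the parent.
    for child_ino, pointers in pptrs.items():
        for parent_ino, name in sorted(pointers):
            link = (parent_ino, name, child_ino)
            if parent_ino in busy:
                deferred.append(link)
            elif dirents.get(parent_ino, {}).get(name) != child_ino:
                bad.append(link)

    return bad, deferred


dirents = {128: {"a": 131, "b": 132}}
pptrs = {131: {(128, "a")}, 132: {(128, "stale")}}

bad, deferred = check_links(dirents, pptrs, busy={132})
```

In this model the slow loop would simply call check_links() again on the
deferred links once the busy inodes can be locked, after confirming that
each link still exists.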

Repairs are a bit more involved -- for directories, we walk the entire
filesystem to rebuild the dirents from parent pointer information.
Parent pointer repairs do the same walk but rebuild the pptrs from the
dirent information, with the added twist that they duplicate all the
xattrs so that they can use the atomic extent swapping code to commit
the repairs atomically.
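
As a rough illustration of the two rebuild directions, here is a small
Python model.  It only shows the scan itself; it does not model the
temporary file, the xattr duplication, or the atomic swap, and both
function names are hypothetical.

```python
def rebuild_dirents(dir_ino, pptrs):
    """Directory repair: scan every file's parent pointers for this dir.

    pptrs: {child_ino: {(parent_ino, name)}}
    """
    rebuilt = {}
    for child_ino, pointers in pptrs.items():
        for parent_ino, name in pointers:
            if parent_ino == dir_ino:
                rebuilt[name] = child_ino
    return rebuilt


def rebuild_pptrs(child_ino, dirents):
    """Parent pointer repair: scan every directory for links to this file.

    dirents: {dir_ino: {name: child_ino}}
    """
    rebuilt = set()
    for dir_ino, entries in dirents.items():
        for name, ino in entries.items():
            if ino == child_ino:
                rebuilt.add((dir_ino, name))
    return rebuilt


pptrs = {131: {(128, "a")}, 132: {(128, "b"), (140, "c")}}
dirents = {128: {"a": 131, "b": 132}, 140: {"c": 132}}

new_dir_128 = rebuild_dirents(128, pptrs)
new_pptrs_132 = rebuild_pptrs(132, dirents)
```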

This introduces an added twist to the xattr repair code -- we use dirent
hooks to detect a colliding update to the pptr data while we're not
holding the ILOCKs; if one is detected, we restart the xattr salvaging
process but this time hold all the ILOCKs until the end of the scan.

For offline repair, the phase6 directory connectivity scan generates an
index of all the expected parent pointers in the filesystem.  Then it
walks each file and compares the parent pointers attached to that file
against the index generated, and resyncs the results as necessary.
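
The offline strategy can be sketched in the same toy model.  This is a
hedged Python illustration rather than the xfs_repair implementation;
build_index() and resync_pptrs() are hypothetical names, and the index is
assumed to fit in a plain dictionary.

```python
def build_index(dirents):
    """Index the expected parent pointers of every file from the dirents.

    dirents: {dir_ino: {name: child_ino}}
    Returns  {child_ino: {(dir_ino, name)}}
    """
    index = {}
    for dir_ino, entries in dirents.items():
        for name, child_ino in entries.items():
            index.setdefault(child_ino, set()).add((dir_ino, name))
    return index


def resync_pptrs(expected, actual):
    """Return (to_add, to_remove) for one file's parent pointers."""
    return expected - actual, actual - expected


dirents = {128: {"a": 131, "b": 132}}
on_disk = {131: {(128, "a")}, 132: {(128, "old")}}

index = build_index(dirents)
fixes = {ino: resync_pptrs(index.get(ino, set()), ptrs)
         for ino, ptrs in on_disk.items()}
```

File 131 needs no changes, while file 132 gains the pointer for "b" and
loses the stale pointer for "old".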

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=repair-pptrs

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=repair-pptrs
---
Commits in this patchset:
 * xfs: add xattr setname and removename functions for internal users
 * xfs: add raw parent pointer apis to support repair
 * xfs: repair directories by scanning directory parent pointers
 * xfs: implement live updates for directory repairs
 * xfs: replay unlocked parent pointer updates that accrue during xattr repair
 * xfs: repair directory parent pointers by scanning for dirents
 * xfs: implement live updates for parent pointer repairs
 * xfs: remove pointless unlocked assertion
 * xfs: split xfs_bmap_add_attrfork into two pieces
 * xfs: add a per-leaf block callback to xchk_xattr_walk
 * xfs: actually rebuild the parent pointer xattrs
 * xfs: adapt the orphanage code to handle parent pointers
 * xfs: repair link count of nondirectories after rebuilding parent pointers
 * xfs: inode repair should ensure there's an attr fork to store parent pointers
---
 fs/xfs/libxfs/xfs_attr.c     |  230 +++++++
 fs/xfs/libxfs/xfs_attr.h     |    4 
 fs/xfs/libxfs/xfs_bmap.c     |   38 -
 fs/xfs/libxfs/xfs_bmap.h     |    3 
 fs/xfs/libxfs/xfs_dir2.c     |    2 
 fs/xfs/libxfs/xfs_dir2.h     |    2 
 fs/xfs/libxfs/xfs_parent.c   |   64 ++
 fs/xfs/libxfs/xfs_parent.h   |    6 
 fs/xfs/scrub/attr.c          |    2 
 fs/xfs/scrub/attr_repair.c   |  459 +++++++++++++++
 fs/xfs/scrub/attr_repair.h   |    4 
 fs/xfs/scrub/dir_repair.c    |  564 +++++++++++++++++-
 fs/xfs/scrub/findparent.c    |   12 
 fs/xfs/scrub/findparent.h    |   10 
 fs/xfs/scrub/inode_repair.c  |   41 +
 fs/xfs/scrub/listxattr.c     |   10 
 fs/xfs/scrub/listxattr.h     |    4 
 fs/xfs/scrub/nlinks.c        |    3 
 fs/xfs/scrub/orphanage.c     |   38 +
 fs/xfs/scrub/orphanage.h     |    3 
 fs/xfs/scrub/parent.c        |    7 
 fs/xfs/scrub/parent_repair.c | 1301 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.c         |    2 
 fs/xfs/scrub/trace.h         |  115 ++++
 24 files changed, 2816 insertions(+), 108 deletions(-)



* [PATCHSET v13.1 8/9] xfs: detect and correct directory tree problems
  2024-04-10  0:36 [PATCHBOMB v13.1] xfs: directory parent pointers Darrick J. Wong
                   ` (6 preceding siblings ...)
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
@ 2024-04-10  0:46 ` Darrick J. Wong
  2024-04-10  1:07   ` [PATCH 1/4] xfs: teach online scrub to find directory tree structure problems Darrick J. Wong
                     ` (3 more replies)
  2024-04-10  0:46 ` [PATCHSET v13.1 9/9] xfs: vectorize scrub kernel calls Darrick J. Wong
  8 siblings, 4 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:46 UTC
  To: djwong; +Cc: hch, linux-xfs

Hi all,

Historically, checking the tree-ness of the directory tree structure has
not been complete.  Cycles of subdirectories break the tree properties,
as do subdirectories with multiple parents.  It's easy enough for DFS to
detect problems as long as one of the participants is reachable from the
root, but this technique cannot find unconnected cycles.

Directory parent pointers change that, because we can discover all of
these problems from a simple walk from a subdirectory towards the root.
For each child we start with, if the walk terminates without reaching
the root, we know the path is disconnected and ought to be attached to
the lost and found.  If the walk arrives back at the child it started
from, we know this is a cycle and can delete an incoming edge.  If we
find multiple paths to the root, we know to delete an incoming edge.

Even better, once we've finished walking paths, we've identified the
good ones and know which other path(s) to remove.
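
The upward walk can be modeled in a few lines of Python.  This is only a
sketch under simplifying assumptions: the parents dictionary stands in
for the parent pointers of each directory, ROOT is a made-up root inode
number, and walk_paths() is a hypothetical name.  It reports any repeated
directory on a path as a cycle.

```python
ROOT = 128  # hypothetical root directory inode number


def walk_paths(start, parents):
    """Classify every upward path from start.

    parents: {dir_ino: [parent_ino, ...]} taken from parent pointers.
    Returns a list of (outcome, path) tuples, where outcome is one of
    "ok", "disconnected", or "cycle".
    """
    results = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        here = path[-1]
        if here == ROOT:
            results.append(("ok", path))
            continue
        ups = parents.get(here, [])
        if not ups:
            # Walk ended without reaching the root.
            results.append(("disconnected", path))
            continue
        for up in ups:
            if up in path:
                # We walked back onto a directory already on this path.
                results.append(("cycle", path + [up]))
            else:
                stack.append(path + [up])
    return results


parents = {
    200: [128],
    300: [200, 400],   # one good parent, one dangling parent
    400: [],
    500: [501],        # 500 and 501 form an unconnected cycle
    501: [500],
    600: [200, 700],   # two distinct paths to the root
    700: [128],
}
```

A path classified as "ok" is a keeper.  A directory with more than one
"ok" path, or with any "cycle" or "disconnected" path, is a candidate
for having an incoming edge removed or for being attached to the lost
and found.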

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-directory-tree

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=scrub-directory-tree

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=scrub-directory-tree
---
Commits in this patchset:
 * xfs: teach online scrub to find directory tree structure problems
 * xfs: invalidate dirloop scrub path data when concurrent updates happen
 * xfs: report directory tree corruption in the health information
 * xfs: fix corruptions in the directory tree
---
 fs/xfs/Makefile               |    2 
 fs/xfs/libxfs/xfs_fs.h        |    4 
 fs/xfs/libxfs/xfs_health.h    |    4 
 fs/xfs/scrub/common.h         |    1 
 fs/xfs/scrub/dirtree.c        |  979 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/dirtree.h        |  178 +++++++
 fs/xfs/scrub/dirtree_repair.c |  821 ++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/health.c         |    1 
 fs/xfs/scrub/ino_bitmap.h     |   37 ++
 fs/xfs/scrub/orphanage.c      |    6 
 fs/xfs/scrub/orphanage.h      |    8 
 fs/xfs/scrub/repair.h         |    4 
 fs/xfs/scrub/scrub.c          |    7 
 fs/xfs/scrub/scrub.h          |    1 
 fs/xfs/scrub/stats.c          |    1 
 fs/xfs/scrub/trace.c          |    4 
 fs/xfs/scrub/trace.h          |  272 +++++++++++
 fs/xfs/scrub/xfarray.h        |    1 
 fs/xfs/xfs_health.c           |    1 
 fs/xfs/xfs_inode.c            |    2 
 fs/xfs/xfs_inode.h            |    1 
 21 files changed, 2331 insertions(+), 4 deletions(-)
 create mode 100644 fs/xfs/scrub/dirtree.c
 create mode 100644 fs/xfs/scrub/dirtree.h
 create mode 100644 fs/xfs/scrub/dirtree_repair.c
 create mode 100644 fs/xfs/scrub/ino_bitmap.h



* [PATCHSET v13.1 9/9] xfs: vectorize scrub kernel calls
  2024-04-10  0:36 [PATCHBOMB v13.1] xfs: directory parent pointers Darrick J. Wong
                   ` (7 preceding siblings ...)
  2024-04-10  0:46 ` [PATCHSET v13.1 8/9] xfs: detect and correct directory tree problems Darrick J. Wong
@ 2024-04-10  0:46 ` Darrick J. Wong
  2024-04-10  1:08   ` [PATCH 1/3] xfs: reduce the rate of cond_resched calls inside scrub Darrick J. Wong
                     ` (2 more replies)
  8 siblings, 3 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:46 UTC
  To: djwong; +Cc: hch, linux-xfs

Hi all,

Create a vectorized version of the metadata scrub and repair ioctl, and
adapt xfs_scrub to use that.  This is an experiment to measure overhead
and to try refactoring xfs_scrub.

If you're going to start using this code, I strongly recommend pulling
from my git trees, which are linked below.

This has been running on the djcloud for months with no problems.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=vectorized-scrub

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=vectorized-scrub

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=vectorized-scrub
---
Commits in this patchset:
 * xfs: reduce the rate of cond_resched calls inside scrub
 * xfs: introduce vectored scrub mode
 * xfs: only iget the file once when doing vectored scrub-by-handle
---
 fs/xfs/libxfs/xfs_fs.h   |   40 +++++++++++
 fs/xfs/scrub/common.h    |   25 -------
 fs/xfs/scrub/scrub.c     |  168 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/scrub.h     |   64 ++++++++++++++++++
 fs/xfs/scrub/trace.h     |   78 +++++++++++++++++++++
 fs/xfs/scrub/xfarray.c   |   10 +--
 fs/xfs/scrub/xfarray.h   |    3 +
 fs/xfs/scrub/xfile.c     |    2 -
 fs/xfs/scrub/xfs_scrub.h |    2 +
 fs/xfs/xfs_ioctl.c       |   50 ++++++++++++++
 10 files changed, 410 insertions(+), 32 deletions(-)


^ permalink raw reply	[flat|nested] 234+ messages in thread

* [PATCH 1/4] docs: update the parent pointers documentation to the final version
  2024-04-10  0:44 ` [PATCHSET v13.1 1/9] xfs: design documentation for online fsck, part 2 Darrick J. Wong
@ 2024-04-10  0:46   ` Darrick J. Wong
  2024-04-10  4:40     ` Christoph Hellwig
  2024-04-10  0:46   ` [PATCH 2/4] docs: update online directory and parent pointer repair sections Darrick J. Wong
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:46 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that we've decided on the ondisk format of parent pointers, update
the documentation to reflect that.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 .../filesystems/xfs/xfs-online-fsck-design.rst     |   94 +++++++++++---------
 1 file changed, 53 insertions(+), 41 deletions(-)


diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 74a8e42c74bd0..1e3211d12247d 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -4465,10 +4465,10 @@ reconstruction of filesystem space metadata.
 The parent pointer feature, however, makes total directory reconstruction
 possible.
 
-XFS parent pointers include the dirent name and location of the entry within
-the parent directory.
+XFS parent pointers contain the information needed to identify the
+corresponding directory entry in the parent directory.
 In other words, child files use extended attributes to store pointers to
-parents in the form ``(parent_inum, parent_gen, dirent_pos) → (dirent_name)``.
+parents in the form ``(dirent_name) → (parent_inum, parent_gen)``.
 The directory checking process can be strengthened to ensure that the target of
 each dirent also contains a parent pointer pointing back to the dirent.
 Likewise, each parent pointer can be checked by ensuring that the target of
@@ -4476,8 +4476,6 @@ each parent pointer is a directory and that it contains a dirent matching
 the parent pointer.
 Both online and offline repair can use this strategy.
 
-**Note**: The ondisk format of parent pointers is not yet finalized.
-
 +--------------------------------------------------------------------------+
 | **Historical Sidebar**:                                                  |
 +--------------------------------------------------------------------------+
@@ -4519,8 +4517,58 @@ Both online and offline repair can use this strategy.
 | Chandan increased the maximum extent counts of both data and attribute   |
 | forks, thereby ensuring that the extended attribute structure can grow   |
 | to handle the maximum hardlink count of any file.                        |
+|                                                                          |
+| For this second effort, the ondisk parent pointer format as originally   |
+| proposed was ``(parent_inum, parent_gen, dirent_pos) → (dirent_name)``.  |
+| The format was changed during development to eliminate the requirement   |
+| of repair tools needing to ensure that the ``dirent_pos`` field          |
+| always matched when reconstructing a directory.                          |
+|                                                                          |
+| There were a few other ways to have solved that problem:                 |
+|                                                                          |
+| 1. The field could be designated advisory, since the other three values  |
+|    are sufficient to find the entry in the parent.                       |
+|    However, this makes indexed key lookup impossible while repairs are   |
+|    ongoing.                                                              |
+|                                                                          |
+| 2. We could allow creating directory entries at specified offsets, which |
+|    solves the referential integrity problem but runs the risk that       |
+|    dirent creation will fail due to conflicts with the free space in the |
+|    directory.                                                            |
+|                                                                          |
+|    These conflicts could be resolved by appending the directory entry    |
+|    and amending the xattr code to support updating an xattr key and      |
+|    reindexing the dabtree, though this would have to be performed with   |
+|    the parent directory still locked.                                    |
+|                                                                          |
+| 3. Same as above, but remove the old parent pointer entry and add a new  |
+|    one atomically.                                                       |
+|                                                                          |
+| 4. Change the ondisk xattr format to                                     |
+|    ``(parent_inum, name) → (parent_gen)``, which would provide the attr  |
+|    name uniqueness that we require, without forcing repair code to       |
+|    update the dirent position.                                           |
+|    Unfortunately, this requires changes to the xattr code to support     |
+|    attr names as long as 263 bytes.                                      |
+|                                                                          |
+| 5. Change the ondisk xattr format to ``(parent_inum, hash(name)) →       |
+|    (name, parent_gen)``.                                                 |
+|    If the hash is sufficiently resistant to collisions (e.g. sha256)     |
+|    then this should provide the attr name uniqueness that we require.    |
+|    Names shorter than 247 bytes could be stored directly.                |
+|                                                                          |
+| 6. Change the ondisk xattr format to ``(dirent_name) → (parent_ino,      |
+|    parent_gen)``.  This format doesn't require any of the complicated    |
+|    nested name hashing of the previous suggestions.  However, it was     |
+|    discovered that multiple hardlinks to the same inode with the same    |
+|    filename caused performance problems with hashed xattr lookups, so    |
+|    the parent inumber is now xor'd into the hash index.                  |
+|                                                                          |
+| In the end, it was decided that solution #6 was the most compact and the |
+| most performant.  A new hash function was designed for parent pointers.  |
 +--------------------------------------------------------------------------+
 
+
 Case Study: Repairing Directories with Parent Pointers
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -4569,42 +4617,6 @@ The proposed patchset is the
 <https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-online-dir-repair>`_
 series.
 
-**Unresolved Question**: How will repair ensure that the ``dirent_pos`` fields
-match in the reconstructed directory?
-
-*Answer*: There are a few ways to solve this problem:
-
-1. The field could be designated advisory, since the other three values are
-   sufficient to find the entry in the parent.
-   However, this makes indexed key lookup impossible while repairs are ongoing.
-
-2. We could allow creating directory entries at specified offsets, which solves
-   the referential integrity problem but runs the risk that dirent creation
-   will fail due to conflicts with the free space in the directory.
-
-   These conflicts could be resolved by appending the directory entry and
-   amending the xattr code to support updating an xattr key and reindexing the
-   dabtree, though this would have to be performed with the parent directory
-   still locked.
-
-3. Same as above, but remove the old parent pointer entry and add a new one
-   atomically.
-
-4. Change the ondisk xattr format to ``(parent_inum, name) → (parent_gen)``,
-   which would provide the attr name uniqueness that we require, without
-   forcing repair code to update the dirent position.
-   Unfortunately, this requires changes to the xattr code to support attr
-   names as long as 263 bytes.
-
-5. Change the ondisk xattr format to ``(parent_inum, hash(name)) →
-   (name, parent_gen)``.
-   If the hash is sufficiently resistant to collisions (e.g. sha256) then
-   this should provide the attr name uniqueness that we require.
-   Names shorter than 247 bytes could be stored directly.
-
-Discussion is ongoing under the `parent pointers patch deluge
-<https://www.spinics.net/lists/linux-xfs/msg69397.html>`_.
-
 Case Study: Repairing Parent Pointers
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 


^ permalink raw reply related	[flat|nested] 234+ messages in thread
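
Editorial sketch of option 6 from the sidebar in the patch above: the attr
name is the dirent name, the attr value is the parent handle, and the parent
inumber is folded into the lookup hash so that many hardlinks with the same
name land on different hash values.  Everything here is hypothetical:
struct pptr_sketch, toy_name_hash() and toy_pptr_hash() are made-up names, and
FNV-1a is only a stand-in for the real name hash.  The patch says the inumber
is xor'd into the hash index but does not show how, so folding both 32-bit
halves is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of one parent pointer: name -> (inum, gen). */
struct pptr_sketch {
	const char	*name;		/* dirent name, used as the attr name */
	uint64_t	parent_inum;	/* attr value: parent inode number */
	uint32_t	parent_gen;	/* attr value: parent generation */
};

/* Stand-in string hash (FNV-1a); NOT the hash XFS uses for names. */
static uint32_t toy_name_hash(const char *name)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; name[i] != '\0'; i++) {
		h ^= (unsigned char)name[i];
		h *= 16777619u;
	}
	return h;
}

/* Fold both halves of the parent inumber into the name hash (assumed). */
static uint32_t toy_pptr_hash(const struct pptr_sketch *pp)
{
	assert(pp != NULL && pp->name != NULL);

	return toy_name_hash(pp->name) ^
	       (uint32_t)(pp->parent_inum >> 32) ^
	       (uint32_t)pp->parent_inum;
}
```

With a plain name hash, every hardlink called "file" would collide no matter
which directory holds it.  Mixing in the parent inumber spreads those entries
out, while the generation stays out of the hash because it is not needed to
tell two parents apart.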

* [PATCH 2/4] docs: update online directory and parent pointer repair sections
  2024-04-10  0:44 ` [PATCHSET v13.1 1/9] xfs: design documentation for online fsck, part 2 Darrick J. Wong
  2024-04-10  0:46   ` [PATCH 1/4] docs: update the parent pointers documentation to the final version Darrick J. Wong
@ 2024-04-10  0:46   ` Darrick J. Wong
  2024-04-10  4:40     ` Christoph Hellwig
  2024-04-10  0:47   ` [PATCH 3/4] docs: update offline parent pointer repair strategy Darrick J. Wong
  2024-04-10  0:47   ` [PATCH 4/4] docs: describe xfs directory tree online fsck Darrick J. Wong
  3 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:46 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Update the case studies of online directory and parent pointer
reconstruction to reflect what they actually do in the final version.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 .../filesystems/xfs/xfs-online-fsck-design.rst     |   55 +++++++++++---------
 1 file changed, 29 insertions(+), 26 deletions(-)


diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 1e3211d12247d..1ea4e59c9cdbd 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -4576,8 +4576,9 @@ Directory rebuilding uses a :ref:`coordinated inode scan <iscan>` and
 a :ref:`directory entry live update hook <liveupdate>` as follows:
 
 1. Set up a temporary directory for generating the new directory structure,
-   an xfblob for storing entry names, and an xfarray for stashing directory
-   updates.
+   an xfblob for storing entry names, and an xfarray for stashing the fixed
+   size fields involved in a directory update: ``(child inumber, add vs.
+   remove, name cookie, ftype)``.
 
 2. Set up an inode scanner and hook into the directory entry code to receive
    updates on directory operations.
@@ -4586,35 +4587,34 @@ a :ref:`directory entry live update hook <liveupdate>` as follows:
    pointer references the directory of interest.
    If so:
 
-   a. Stash an addname entry for this dirent in the xfarray for later.
+   a. Stash the parent pointer name and an addname entry for this dirent in the
+      xfblob and xfarray, respectively.
 
-   b. When finished scanning that file, flush the stashed updates to the
-      temporary directory.
+   b. When finished scanning that file or the kernel memory consumption exceeds
+      a threshold, flush the stashed updates to the temporary directory.
 
 4. For each live directory update received via the hook, decide if the child
    has already been scanned.
    If so:
 
-   a. Stash an addname or removename entry for this dirent update in the
-      xfarray for later.
+   a. Stash the parent pointer name and an addname or removename entry for this
+      dirent update in the xfblob and xfarray for later.
       We cannot write directly to the temporary directory because hook
       functions are not allowed to modify filesystem metadata.
       Instead, we stash updates in the xfarray and rely on the scanner thread
       to apply the stashed updates to the temporary directory.
 
-5. When the scan is complete, atomically exchange the contents of the temporary
+5. When the scan is complete, replay any stashed entries in the xfarray.
+
+6. When the scan is complete, atomically exchange the contents of the temporary
    directory and the directory being repaired.
    The temporary directory now contains the damaged directory structure.
 
-6. Reap the temporary directory.
-
-7. Update the dirent position field of parent pointers as necessary.
-   This may require the queuing of a substantial number of xattr log intent
-   items.
+7. Reap the temporary directory.
 
 The proposed patchset is the
 `parent pointers directory repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-online-dir-repair>`_
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-fsck>`_
 series.
 
 Case Study: Repairing Parent Pointers
@@ -4624,8 +4624,9 @@ Online reconstruction of a file's parent pointer information works similarly to
 directory reconstruction:
 
 1. Set up a temporary file for generating a new extended attribute structure,
-   an `xfblob<xfblob>` for storing parent pointer names, and an xfarray for
-   stashing parent pointer updates.
+   an xfblob for storing parent pointer names, and an xfarray for stashing the
+   fixed size fields involved in a parent pointer update: ``(parent inumber,
+   parent generation, add vs. remove, name cookie)``.
 
 2. Set up an inode scanner and hook into the directory entry code to receive
    updates on directory operations.
@@ -4634,34 +4635,36 @@ directory reconstruction:
    dirent references the file of interest.
    If so:
 
-   a. Stash an addpptr entry for this parent pointer in the xfblob and xfarray
-      for later.
+   a. Stash the dirent name and an addpptr entry for this parent pointer in the
+      xfblob and xfarray, respectively.
 
-   b. When finished scanning the directory, flush the stashed updates to the
-      temporary directory.
+   b. When finished scanning the directory or the kernel memory consumption
+      exceeds a threshold, flush the stashed updates to the temporary file.
 
 4. For each live directory update received via the hook, decide if the parent
    has already been scanned.
    If so:
 
-   a. Stash an addpptr or removepptr entry for this dirent update in the
-      xfarray for later.
+   a. Stash the dirent name and an addpptr or removepptr entry for this dirent
+      update in the xfblob and xfarray for later.
       We cannot write parent pointers directly to the temporary file because
       hook functions are not allowed to modify filesystem metadata.
       Instead, we stash updates in the xfarray and rely on the scanner thread
       to apply the stashed parent pointer updates to the temporary file.
 
-5. Copy all non-parent pointer extended attributes to the temporary file.
+5. When the scan is complete, replay any stashed entries in the xfarray.
 
-6. When the scan is complete, atomically exchange the mappings of the attribute
+6. Copy all non-parent pointer extended attributes to the temporary file.
+
+7. When the scan is complete, atomically exchange the mappings of the attribute
    forks of the temporary file and the file being repaired.
    The temporary file now contains the damaged extended attribute structure.
 
-7. Reap the temporary file.
+8. Reap the temporary file.
 
 The proposed patchset is the
 `parent pointers repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-online-parent-repair>`_
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-fsck>`_
 series.
 
 Digression: Offline Checking of Parent Pointers


^ permalink raw reply related	[flat|nested] 234+ messages in thread
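
Editorial sketch of the stashing scheme described in the patch above: the
variable-length name goes into a blob store, and the fixed-size fields (child
inumber, add vs. remove, name cookie, ftype) go into an array record that
refers to the name by cookie.  This is a userspace toy, not the kernel's
xfblob or xfarray API.  struct stash, stash_update() and stash_name() are
hypothetical names, the fixed buffer sizes are arbitrary, and using the blob
byte offset as the cookie is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLOB_BYTES	256
#define MAX_STASHED	8

enum stash_action { STASH_ADDNAME, STASH_REMOVENAME };

/* Fixed-size part of a stashed update; the name lives in the blob. */
struct stash_rec {
	uint64_t		child_inum;
	enum stash_action	action;
	uint32_t		name_cookie;	/* offset of the name in blob[] */
	uint8_t			ftype;
};

struct stash {
	char			blob[BLOB_BYTES];	/* stand-in for the xfblob */
	uint32_t		blob_used;
	struct stash_rec	recs[MAX_STASHED];	/* stand-in for the xfarray */
	unsigned int		nr_recs;
};

/* Store a name and its fixed-size record; returns 0, or -1 if full. */
static int stash_update(struct stash *s, uint64_t child_inum,
			enum stash_action action, const char *name,
			uint8_t ftype)
{
	size_t len = strlen(name) + 1;

	assert(s != NULL);
	if (s->nr_recs == MAX_STASHED || s->blob_used + len > BLOB_BYTES)
		return -1;	/* caller would flush to the temporary dir */

	memcpy(s->blob + s->blob_used, name, len);
	s->recs[s->nr_recs++] = (struct stash_rec){
		.child_inum = child_inum,
		.action = action,
		.name_cookie = s->blob_used,
		.ftype = ftype,
	};
	s->blob_used += (uint32_t)len;
	return 0;
}

/* Look up the name that a stashed record refers to. */
static const char *stash_name(const struct stash *s,
			      const struct stash_rec *rec)
{
	return s->blob + rec->name_cookie;
}
```

The -1 return loosely models the "memory consumption exceeds a threshold"
case: the scanner thread would flush what it has to the temporary directory
and then carry on stashing.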

* [PATCH 3/4] docs: update offline parent pointer repair strategy
  2024-04-10  0:44 ` [PATCHSET v13.1 1/9] xfs: design documentation for online fsck, part 2 Darrick J. Wong
  2024-04-10  0:46   ` [PATCH 1/4] docs: update the parent pointers documentation to the final version Darrick J. Wong
  2024-04-10  0:46   ` [PATCH 2/4] docs: update online directory and parent pointer repair sections Darrick J. Wong
@ 2024-04-10  0:47   ` Darrick J. Wong
  2024-04-10  4:40     ` Christoph Hellwig
  2024-04-10  0:47   ` [PATCH 4/4] docs: describe xfs directory tree online fsck Darrick J. Wong
  3 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:47 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now update how xfs_repair checks and repairs parent pointer info.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 .../filesystems/xfs/xfs-online-fsck-design.rst     |   81 +++++++++++++++-----
 1 file changed, 60 insertions(+), 21 deletions(-)


diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 1ea4e59c9cdbd..70e3e629d8b3f 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -4675,26 +4675,56 @@ files are erased long before directory tree connectivity checks are performed.
 Parent pointer checks are therefore a second pass to be added to the existing
 connectivity checks:
 
-1. After the set of surviving files has been established (i.e. phase 6),
+1. After the set of surviving files has been established (phase 6),
    walk the surviving directories of each AG in the filesystem.
    This is already performed as part of the connectivity checks.
 
-2. For each directory entry found, record the name in an xfblob, and store
-   ``(child_ag_inum, parent_inum, parent_gen, dirent_pos)`` tuples in a
-   per-AG in-memory slab.
+2. For each directory entry found,
+
+   a. If the name has already been stored in the xfblob, then use that cookie
+      and skip the next step.
+
+   b. Otherwise, record the name in an xfblob, and remember the xfblob cookie.
+      Unique mappings are critical for
+
+      1. Deduplicating names to reduce memory usage, and
+
+      2. Creating a stable sort key for the parent pointer indexes so that the
+         parent pointer validation described below will work.
+
+   c. Store ``(child_ag_inum, parent_inum, parent_gen, name_hash, name_len,
+      name_cookie)`` tuples in a per-AG in-memory slab.  The ``name_hash``
+      referenced in this section is the regular directory entry name hash, not
+      the specialized one used for parent pointer xattrs.
 
 3. For each AG in the filesystem,
 
-   a. Sort the per-AG tuples in order of child_ag_inum, parent_inum, and
-      dirent_pos.
+   a. Sort the per-AG tuple set in order of ``child_ag_inum``, ``parent_inum``,
+      ``name_hash``, and ``name_cookie``.
+      Having a single ``name_cookie`` for each ``name`` is critical for
+      handling the uncommon case of a directory containing multiple hardlinks
+      to the same file where all the names hash to the same value.
 
    b. For each inode in the AG,
 
       1. Scan the inode for parent pointers.
-         Record the names in a per-file xfblob, and store ``(parent_inum,
-         parent_gen, dirent_pos)`` tuples in a per-file slab.
+         For each parent pointer found,
 
-      2. Sort the per-file tuples in order of parent_inum, and dirent_pos.
+         a. Validate the ondisk parent pointer.
+            If validation fails, move on to the next parent pointer in the
+            file.
+
+         b. If the name has already been stored in the xfblob, then use that
+            cookie and skip the next step.
+
+         c. Record the name in a per-file xfblob, and remember the xfblob
+            cookie.
+
+         d. Store ``(parent_inum, parent_gen, name_hash, name_len,
+            name_cookie)`` tuples in a per-file slab.
+
+      2. Sort the per-file tuples in order of ``parent_inum``, ``name_hash``,
+         and ``name_cookie``.
 
       3. Position one slab cursor at the start of the inode's records in the
          per-AG tuple slab.
@@ -4703,28 +4733,37 @@ connectivity checks:
 
       4. Position a second slab cursor at the start of the per-file tuple slab.
 
-      5. Iterate the two cursors in lockstep, comparing the parent_ino and
-         dirent_pos fields of the records under each cursor.
+      5. Iterate the two cursors in lockstep, comparing the ``parent_inum``,
+         ``name_hash``, and ``name_cookie`` fields of the records under each
+         cursor:
 
-         a. Tuples in the per-AG list but not the per-file list are missing and
-            need to be written to the inode.
+         a. If the per-AG cursor is at a lower point in the keyspace than the
+            per-file cursor, then the per-AG cursor points to a missing parent
+            pointer.
+            Add the parent pointer to the inode and advance the per-AG
+            cursor.
 
-         b. Tuples in the per-file list but not the per-AG list are dangling
-            and need to be removed from the inode.
+         b. If the per-file cursor is at a lower point in the keyspace than
+            the per-AG cursor, then the per-file cursor points to a dangling
+            parent pointer.
+            Remove the parent pointer from the inode and advance the per-file
+            cursor.
 
-         c. For tuples in both lists, update the parent_gen and name components
-            of the parent pointer if necessary.
+         c. Otherwise, both cursors point at the same parent pointer.
+            Update the ``parent_gen`` component if necessary.
+            Advance both cursors.
 
 4. Move on to examining link counts, as we do today.
 
 The proposed patchset is the
 `offline parent pointers repair
-<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-repair>`_
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-fsck>`_
 series.
 
-Rebuilding directories from parent pointers in offline repair is very
-challenging because it currently uses a single-pass scan of the filesystem
-during phase 3 to decide which files are corrupt enough to be zapped.
+Rebuilding directories from parent pointers in offline repair would be very
+challenging because xfs_repair currently uses two single-pass scans of the
+filesystem during phases 3 and 4 to decide which files are corrupt enough to be
+zapped.
 This scan would have to be converted into a multi-pass scan:
 
 1. The first pass of the scan zaps corrupt inodes, forks, and attributes


^ permalink raw reply related	[flat|nested] 234+ messages in thread
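
Editorial sketch of step 3b5 from the patch above: two sorted tuple lists are
walked in lockstep, and the comparison result decides whether a parent pointer
is missing, dangling, or present in both.  The names pptr_key, merge_result,
key_cmp() and walk_lockstep() are hypothetical, and the sketch only counts
outcomes instead of modifying an inode.  It assumes both lists take their
name cookies from the same name store, so that equal names compare equal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sort key shared by the per-AG (dirent) and per-file (pptr) tuples. */
struct pptr_key {
	uint64_t	parent_inum;
	uint32_t	name_hash;
	uint64_t	name_cookie;
	uint32_t	parent_gen;	/* payload, not part of the sort key */
};

struct merge_result {
	unsigned int	missing;	/* dirent exists, pptr absent: add it */
	unsigned int	dangling;	/* pptr exists, dirent absent: remove it */
	unsigned int	gen_fixed;	/* same key, stale parent_gen: update it */
	unsigned int	matched;	/* records found in both lists */
};

static int key_cmp(const struct pptr_key *a, const struct pptr_key *b)
{
	if (a->parent_inum != b->parent_inum)
		return a->parent_inum < b->parent_inum ? -1 : 1;
	if (a->name_hash != b->name_hash)
		return a->name_hash < b->name_hash ? -1 : 1;
	if (a->name_cookie != b->name_cookie)
		return a->name_cookie < b->name_cookie ? -1 : 1;
	return 0;
}

/* Both arrays must already be sorted by key_cmp(). */
static struct merge_result
walk_lockstep(const struct pptr_key *ag, size_t nr_ag,
	      const struct pptr_key *file, size_t nr_file)
{
	struct merge_result res = { 0 };
	size_t i = 0, j = 0;

	assert(ag != NULL || nr_ag == 0);
	assert(file != NULL || nr_file == 0);

	while (i < nr_ag || j < nr_file) {
		int cmp;

		if (j == nr_file)
			cmp = -1;	/* only per-AG records remain */
		else if (i == nr_ag)
			cmp = 1;	/* only per-file records remain */
		else
			cmp = key_cmp(&ag[i], &file[j]);

		if (cmp < 0) {
			res.missing++;		/* step 3b5a */
			i++;
		} else if (cmp > 0) {
			res.dangling++;		/* step 3b5b */
			j++;
		} else {
			if (ag[i].parent_gen != file[j].parent_gen)
				res.gen_fixed++;	/* step 3b5c */
			res.matched++;
			i++;
			j++;
		}
	}
	return res;
}
```

Because each cursor only ever moves forward, the whole comparison is a single
linear pass over both lists once they are sorted.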

* [PATCH 4/4] docs: describe xfs directory tree online fsck
  2024-04-10  0:44 ` [PATCHSET v13.1 1/9] xfs: design documentation for online fsck, part 2 Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-04-10  0:47   ` [PATCH 3/4] docs: update offline parent pointer repair strategy Darrick J. Wong
@ 2024-04-10  0:47   ` Darrick J. Wong
  2024-04-10  4:40     ` Christoph Hellwig
  3 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:47 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

I've added a scrubber that checks the directory tree structure and fixes
any problems; describe this in the design documentation.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 .../filesystems/xfs/xfs-online-fsck-design.rst     |  124 ++++++++++++++++++++
 1 file changed, 124 insertions(+)


diff --git a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
index 70e3e629d8b3f..12aa638408304 100644
--- a/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
+++ b/Documentation/filesystems/xfs/xfs-online-fsck-design.rst
@@ -4785,6 +4785,130 @@ This scan would have to be converted into a multi-pass scan:
 
 This code has not yet been constructed.
 
+.. _dirtree:
+
+Case Study: Directory Tree Structure
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+As mentioned earlier, the filesystem directory tree is supposed to be a
+directed acyclic graph structure.
+However, each node in this graph is a separate ``xfs_inode`` object with its
+own locks, which makes validating the tree qualities difficult.
+Fortunately, non-directories are allowed to have multiple parents and cannot
+have children, so only directories need to be scanned.
+Directories typically constitute 5-10% of the files in a filesystem, which
+reduces the amount of work dramatically.
+
+If the directory tree could be frozen, it would be easy to discover cycles and
+disconnected regions by running a depth (or breadth) first search downwards
+from the root directory and marking a bitmap for each directory found.
+At any point in the walk, trying to set an already set bit means there is a
+cycle.
+After the scan completes, XORing the marked inode bitmap with the inode
+allocation bitmap reveals disconnected inodes.
+However, one of online repair's design goals is to avoid locking the entire
+filesystem unless it's absolutely necessary.
+Directory tree updates can move subtrees across the scanner wavefront on a live
+filesystem, so the bitmap algorithm cannot be applied.
+
+Directory parent pointers enable an incremental approach to validation of the
+tree structure.
+Instead of using one thread to scan the entire filesystem, multiple threads can
+walk from individual subdirectories upwards towards the root.
+For this to work, all directory entries and parent pointers must be internally
+consistent, each directory entry must have a parent pointer, and the link
+counts of all directories must be correct.
+Each scanner thread must be able to take the IOLOCK of an alleged parent
+directory while holding the IOLOCK of the child directory to prevent either
+directory from being moved within the tree.
+This is not possible since the VFS does not take the IOLOCK of a child
+subdirectory when moving that subdirectory, so instead the scanner stabilizes
+the parent -> child relationship by taking the ILOCKs and installing a dirent
+update hook to detect changes.
+
+The scanning process uses a dirent hook to detect changes to the directories
+mentioned in the scan data.
+The scan works as follows:
+
+1. For each subdirectory in the filesystem,
+
+   a. For each parent pointer of that subdirectory,
+
+      1. Create a path object for that parent pointer, and mark the
+         subdirectory inode number in the path object's bitmap.
+
+      2. Record the parent pointer name and inode number in a path structure.
+
+      3. If the alleged parent is the subdirectory being scrubbed, the path is
+         a cycle.
+         Mark the path for deletion and repeat step 1a with the next
+         subdirectory parent pointer.
+
+      4. Try to mark the alleged parent inode number in a bitmap in the path
+         object.
+         If the bit is already set, then there is a cycle in the directory
+         tree.
+         Mark the path as a cycle and repeat step 1a with the next subdirectory
+         parent pointer.
+
+      5. Load the alleged parent.
+         If the alleged parent is not a linked directory, abort the scan
+         because the parent pointer information is inconsistent.
+
+      6. For each parent pointer of this alleged ancestor directory,
+
+         a. Record the parent pointer name and inode number in the path object
+            if no parent has been set for that level.
+
+         b. If an ancestor has more than one parent, mark the path as corrupt.
+            Repeat step 1a with the next subdirectory parent pointer.
+
+         c. Repeat steps 1a3-1a6 for the ancestor identified in step 1a6a.
+            This repeats until the directory tree root is reached or no parents
+            are found.
+
+      7. If the walk terminates at the root directory, mark the path as ok.
+
+      8. If the walk terminates without reaching the root, mark the path as
+         disconnected.
+
+2. If the directory entry update hook triggers, check all paths already found
+   by the scan.
+   If the entry matches part of a path, mark that path and the scan stale.
+   When the scanner thread sees that the scan has been marked stale, it deletes
+   all scan data and starts over.
+
+Repairing the directory tree works as follows:
+
+1. Walk each path of the target subdirectory.
+
+   a. Corrupt paths and cycle paths are counted as suspect.
+
+   b. Paths already marked for deletion are counted as bad.
+
+   c. Paths that reached the root are counted as good.
+
+2. If the subdirectory is either the root directory or has zero link count,
+   delete all incoming directory entries in the immediate parents.
+   Repairs are complete.
+
+3. If the subdirectory has exactly one path, set the dotdot entry to the
+   parent and exit.
+
+4. If the subdirectory has at least one good path, delete all the other
+   incoming directory entries in the immediate parents.
+
+5. If the subdirectory has no good paths and more than one suspect path, delete
+   all the other incoming directory entries in the immediate parents.
+
+6. If the subdirectory has zero paths, attach it to the lost and found.
+
+The proposed patches are in the
+`directory tree repair
+<https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git/log/?h=scrub-directory-tree>`_
+series.
+
+
 .. _orphanage:
 
 The Orphanage


^ permalink raw reply related	[flat|nested] 234+ messages in thread
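
Editorial sketch of the upward walk described in the patch above: start at a
subdirectory, follow alleged parents towards the root, and mark each inode in
a per-path bitmap so that revisiting one reveals a cycle.  This is a heavily
simplified model.  It assumes each directory has at most one parent stored in
a flat array, and it leaves out locking, the dirent update hook, and the
stale-scan restart entirely.  walk_to_root(), NR_DIRS and the path_outcome
values are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_DIRS		8
#define NO_PARENT	(-1)

enum path_outcome { PATH_OK, PATH_CYCLE, PATH_DISCONNECTED };

/*
 * parent[i] is the single alleged parent of directory i, or NO_PARENT.
 * Hypothetical flat model: real directories carry parent pointer xattrs.
 */
static enum path_outcome
walk_to_root(const int parent[NR_DIRS], int root, int start)
{
	bool seen[NR_DIRS] = { false };	/* the path object's bitmap */
	int cur = start;

	assert(start >= 0 && start < NR_DIRS);
	assert(root >= 0 && root < NR_DIRS);

	seen[cur] = true;
	while (cur != root) {
		int next = parent[cur];

		if (next == NO_PARENT)
			return PATH_DISCONNECTED;	/* ran out of parents */
		assert(next >= 0 && next < NR_DIRS);
		if (seen[next])
			return PATH_CYCLE;	/* bit already set */
		seen[next] = true;
		cur = next;
	}
	return PATH_OK;		/* walk terminated at the root */
}
```

A directory that names itself as its parent is caught by the same bitmap
test, which corresponds to the self-reference check in step 1a3.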

* [PATCH 1/7] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5
  2024-04-10  0:44 ` [PATCHSET v13.1 2/9] xfs: retain ILOCK during directory updates Darrick J. Wong
@ 2024-04-10  0:47   ` Darrick J. Wong
  2024-04-10  4:41     ` Christoph Hellwig
  2024-04-10  0:48   ` [PATCH 2/7] xfs: Increase XFS_QM_TRANS_MAXDQS " Darrick J. Wong
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:47 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, Catherine Hoang, catherine.hoang, hch,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Renames that generate parent pointer updates can join up to 5
inodes locked in sorted order.  So we need to increase the
number of defer ops inodes and relock them in the same way.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
[djwong: have one sorting function]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_defer.c |    6 +++++-
 fs/xfs/libxfs/xfs_defer.h |    8 +++++++-
 fs/xfs/xfs_inode.c        |   27 ++++++++++++++++++---------
 fs/xfs/xfs_inode.h        |    2 ++
 4 files changed, 32 insertions(+), 11 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 061cc01245a91..4a078e07e1a0a 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -1092,7 +1092,11 @@ xfs_defer_ops_continue(
 	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
 
 	/* Lock the captured resources to the new transaction. */
-	if (dfc->dfc_held.dr_inos == 2)
+	if (dfc->dfc_held.dr_inos > 2) {
+		xfs_sort_inodes(dfc->dfc_held.dr_ip, dfc->dfc_held.dr_inos);
+		xfs_lock_inodes(dfc->dfc_held.dr_ip, dfc->dfc_held.dr_inos,
+				XFS_ILOCK_EXCL);
+	} else if (dfc->dfc_held.dr_inos == 2)
 		xfs_lock_two_inodes(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL,
 				    dfc->dfc_held.dr_ip[1], XFS_ILOCK_EXCL);
 	else if (dfc->dfc_held.dr_inos == 1)
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 81cca60d70a3b..8b338031e487c 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -77,7 +77,13 @@ extern const struct xfs_defer_op_type xfs_exchmaps_defer_type;
 /*
  * Deferred operation item relogging limits.
  */
-#define XFS_DEFER_OPS_NR_INODES	2	/* join up to two inodes */
+
+/*
+ * Rename w/ parent pointers can require up to 5 inodes with deferred ops to
+ * be joined to the transaction: src_dp, target_dp, src_ip, target_ip, and wip.
+ * These inodes are locked in sorted order by their inode numbers
+ */
+#define XFS_DEFER_OPS_NR_INODES	5
 #define XFS_DEFER_OPS_NR_BUFS	2	/* join up to two buffers */
 
 /* Resources that must be held across a transaction roll. */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 03dcb4ac04312..efd040094753f 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -418,7 +418,7 @@ xfs_lock_inumorder(
  * lock more than one at a time, lockdep will report false positives saying we
  * have violated locking orders.
  */
-static void
+void
 xfs_lock_inodes(
 	struct xfs_inode	**ips,
 	int			inodes,
@@ -2802,7 +2802,7 @@ xfs_sort_for_rename(
 	struct xfs_inode	**i_tab,/* out: sorted array of inodes */
 	int			*num_inodes)  /* in/out: inodes in array */
 {
-	int			i, j;
+	int			i;
 
 	ASSERT(*num_inodes == __XFS_SORT_INODES);
 	memset(i_tab, 0, *num_inodes * sizeof(struct xfs_inode *));
@@ -2824,17 +2824,26 @@ xfs_sort_for_rename(
 		i_tab[i++] = wip;
 	*num_inodes = i;
 
+	xfs_sort_inodes(i_tab, *num_inodes);
+}
+
+void
+xfs_sort_inodes(
+	struct xfs_inode	**i_tab,
+	unsigned int		num_inodes)
+{
+	int			i, j;
+
+	ASSERT(num_inodes <= __XFS_SORT_INODES);
+
 	/*
 	 * Sort the elements via bubble sort.  (Remember, there are at
 	 * most 5 elements to sort, so this is adequate.)
 	 */
-	for (i = 0; i < *num_inodes; i++) {
-		for (j = 1; j < *num_inodes; j++) {
-			if (i_tab[j]->i_ino < i_tab[j-1]->i_ino) {
-				struct xfs_inode *temp = i_tab[j];
-				i_tab[j] = i_tab[j-1];
-				i_tab[j-1] = temp;
-			}
+	for (i = 0; i < num_inodes; i++) {
+		for (j = 1; j < num_inodes; j++) {
+			if (i_tab[j]->i_ino < i_tab[j-1]->i_ino)
+				swap(i_tab[j], i_tab[j - 1]);
 		}
 	}
 }
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index c74c48bc09453..a6da1ab8ab136 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -627,6 +627,8 @@ int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
 void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
 void xfs_iunlock2_remapping(struct xfs_inode *ip1, struct xfs_inode *ip2);
 void xfs_bumplink(struct xfs_trans *tp, struct xfs_inode *ip);
+void xfs_lock_inodes(struct xfs_inode **ips, int inodes, uint lock_mode);
+void xfs_sort_inodes(struct xfs_inode **i_tab, unsigned int num_inodes);
 
 static inline bool
 xfs_inode_unlinked_incomplete(



* [PATCH 2/7] xfs: Increase XFS_QM_TRANS_MAXDQS to 5
  2024-04-10  0:44 ` [PATCHSET v13.1 2/9] xfs: retain ILOCK during directory updates Darrick J. Wong
  2024-04-10  0:47   ` [PATCH 1/7] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5 Darrick J. Wong
@ 2024-04-10  0:48   ` Darrick J. Wong
  2024-04-10  4:41     ` Christoph Hellwig
  2024-04-10  0:48   ` [PATCH 3/7] xfs: Hold inode locks in xfs_ialloc Darrick J. Wong
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:48 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

With parent pointers enabled, a rename operation can update up to 5
inodes: src_dp, target_dp, src_ip, target_ip and wip.  This causes
their dquots to be attached to the transaction chain, so we need
to increase XFS_QM_TRANS_MAXDQS.  This patch also adds a helper
function xfs_dqlockn to lock an arbitrary number of dquots.
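
As a rough illustration of the helper's shape, the sketch below sorts
a NULL-terminated array of dquot references by id and then "locks"
them in that order.  Everything here is a userspace assumption:
struct sketch_dquot, struct sketch_dqtrx and sketch_dqlockn() are
hypothetical names, qsort() stands in for the kernel's sort(), and
recording an order stands in for taking a mutex.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAXDQS	5	/* mirrors the new XFS_QM_TRANS_MAXDQS */

/* Hypothetical stand-ins for struct xfs_dquot and struct xfs_dqtrx. */
struct sketch_dquot {
	unsigned int		id;		/* quota id, the sort key */
	int			lock_order;	/* order the mock lock was taken */
};

struct sketch_dqtrx {
	struct sketch_dquot	*dquot;		/* NULL marks the first unused slot */
};

/* Compare by id without subtracting, so large unsigned ids cannot wrap. */
static int
sketch_dqtrx_cmp(
	const void		*a,
	const void		*b)
{
	const struct sketch_dqtrx	*qa = a;
	const struct sketch_dqtrx	*qb = b;

	if (qa->dquot->id > qb->dquot->id)
		return 1;
	if (qa->dquot->id < qb->dquot->id)
		return -1;
	return 0;
}

/* Sort the used slots by id, then lock in that order; returns the count. */
static unsigned int
sketch_dqlockn(
	struct sketch_dqtrx	*q)
{
	unsigned int		i, nr = 0;

	while (nr < SKETCH_MAXDQS && q[nr].dquot != NULL)
		nr++;
	if (nr == 0)
		return 0;

	qsort(q, nr, sizeof(q[0]), sketch_dqtrx_cmp);

	for (i = 0; i < nr; i++)
		q[i].dquot->lock_order = (int)i;
	return nr;
}
```

The point of the id ordering is the same as for inodes: any two
transactions that need an overlapping set of dquots acquire them in
one global order.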

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_dquot.c       |   41 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_dquot.h       |    1 +
 fs/xfs/xfs_qm.h          |    2 +-
 fs/xfs/xfs_trans_dquot.c |   15 ++++++++++-----
 4 files changed, 53 insertions(+), 6 deletions(-)


diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index c98cb468c3578..13aba84bd64af 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1371,6 +1371,47 @@ xfs_dqlock2(
 	}
 }
 
+static int
+xfs_dqtrx_cmp(
+	const void		*a,
+	const void		*b)
+{
+	const struct xfs_dqtrx	*qa = a;
+	const struct xfs_dqtrx	*qb = b;
+
+	if (qa->qt_dquot->q_id > qb->qt_dquot->q_id)
+		return 1;
+	if (qa->qt_dquot->q_id < qb->qt_dquot->q_id)
+		return -1;
+	return 0;
+}
+
+void
+xfs_dqlockn(
+	struct xfs_dqtrx	*q)
+{
+	unsigned int		i;
+
+	BUILD_BUG_ON(XFS_QM_TRANS_MAXDQS > MAX_LOCKDEP_SUBCLASSES);
+
+	/* Sort in order of dquot id, do not allow duplicates */
+	for (i = 0; i < XFS_QM_TRANS_MAXDQS && q[i].qt_dquot != NULL; i++) {
+		unsigned int	j;
+
+		for (j = 0; j < i; j++)
+			ASSERT(q[i].qt_dquot != q[j].qt_dquot);
+	}
+	if (i == 0)
+		return;
+
+	sort(q, i, sizeof(struct xfs_dqtrx), xfs_dqtrx_cmp, NULL);
+
+	mutex_lock(&q[0].qt_dquot->q_qlock);
+	for (i = 1; i < XFS_QM_TRANS_MAXDQS && q[i].qt_dquot != NULL; i++)
+		mutex_lock_nested(&q[i].qt_dquot->q_qlock,
+				XFS_QLOCK_NESTED + i - 1);
+}
+
 int __init
 xfs_qm_init(void)
 {
diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
index 956272d9b302f..677bb2dc9ac91 100644
--- a/fs/xfs/xfs_dquot.h
+++ b/fs/xfs/xfs_dquot.h
@@ -223,6 +223,7 @@ int		xfs_qm_dqget_uncached(struct xfs_mount *mp,
 void		xfs_qm_dqput(struct xfs_dquot *dqp);
 
 void		xfs_dqlock2(struct xfs_dquot *, struct xfs_dquot *);
+void		xfs_dqlockn(struct xfs_dqtrx *q);
 
 void		xfs_dquot_set_prealloc_limits(struct xfs_dquot *);
 
diff --git a/fs/xfs/xfs_qm.h b/fs/xfs/xfs_qm.h
index f5993012bf98f..6e09dfcd13e25 100644
--- a/fs/xfs/xfs_qm.h
+++ b/fs/xfs/xfs_qm.h
@@ -136,7 +136,7 @@ enum {
 	XFS_QM_TRANS_PRJ,
 	XFS_QM_TRANS_DQTYPES
 };
-#define XFS_QM_TRANS_MAXDQS		2
+#define XFS_QM_TRANS_MAXDQS		5
 struct xfs_dquot_acct {
 	struct xfs_dqtrx	dqs[XFS_QM_TRANS_DQTYPES][XFS_QM_TRANS_MAXDQS];
 };
diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index 577b535a595cb..b368e13424c4f 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -379,24 +379,29 @@ xfs_trans_mod_dquot(
 
 /*
  * Given an array of dqtrx structures, lock all the dquots associated and join
- * them to the transaction, provided they have been modified.  We know that the
- * highest number of dquots of one type - usr, grp and prj - involved in a
- * transaction is 3 so we don't need to make this very generic.
+ * them to the transaction, provided they have been modified.
  */
 STATIC void
 xfs_trans_dqlockedjoin(
 	struct xfs_trans	*tp,
 	struct xfs_dqtrx	*q)
 {
+	unsigned int		i;
 	ASSERT(q[0].qt_dquot != NULL);
 	if (q[1].qt_dquot == NULL) {
 		xfs_dqlock(q[0].qt_dquot);
 		xfs_trans_dqjoin(tp, q[0].qt_dquot);
-	} else {
-		ASSERT(XFS_QM_TRANS_MAXDQS == 2);
+	} else if (q[2].qt_dquot == NULL) {
 		xfs_dqlock2(q[0].qt_dquot, q[1].qt_dquot);
 		xfs_trans_dqjoin(tp, q[0].qt_dquot);
 		xfs_trans_dqjoin(tp, q[1].qt_dquot);
+	} else {
+		xfs_dqlockn(q);
+		for (i = 0; i < XFS_QM_TRANS_MAXDQS; i++) {
+			if (q[i].qt_dquot == NULL)
+				break;
+			xfs_trans_dqjoin(tp, q[i].qt_dquot);
+		}
 	}
 }
 



* [PATCH 3/7] xfs: Hold inode locks in xfs_ialloc
  2024-04-10  0:44 ` [PATCHSET v13.1 2/9] xfs: retain ILOCK during directory updates Darrick J. Wong
  2024-04-10  0:47   ` [PATCH 1/7] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5 Darrick J. Wong
  2024-04-10  0:48   ` [PATCH 2/7] xfs: Increase XFS_QM_TRANS_MAXDQS " Darrick J. Wong
@ 2024-04-10  0:48   ` Darrick J. Wong
  2024-04-10  4:41     ` Christoph Hellwig
  2024-04-10  0:48   ` [PATCH 4/7] xfs: Hold inode locks in xfs_trans_alloc_dir Darrick J. Wong
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:48 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, Catherine Hoang, catherine.hoang, hch,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Modify xfs_ialloc to hold locks after return.  The caller is now
responsible for unlocking the inode manually.  We will need this later
to hold locks across parent pointer operations.
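
The change in lock ownership can be modelled with a toy transaction
that remembers which lock flags to drop at commit time.  This is only
a sketch of the convention the diff relies on (joining with
XFS_ILOCK_EXCL hands the lock to the transaction, joining with 0
leaves it with the caller); the sketch_* types and functions are
hypothetical and do not exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ILOCK_EXCL	0x1u
#define SKETCH_MAX_JOINED	4

/* Hypothetical inode: just a bitmask of lock modes currently held. */
struct sketch_inode {
	unsigned int		held;
};

/* Hypothetical transaction: joined inodes plus the flags to drop at commit. */
struct sketch_trans {
	struct sketch_inode	*ip[SKETCH_MAX_JOINED];
	unsigned int		unlock_flags[SKETCH_MAX_JOINED];
	unsigned int		nr;
};

static void
sketch_ilock(struct sketch_inode *ip, unsigned int flags)
{
	ip->held |= flags;
}

static void
sketch_iunlock(struct sketch_inode *ip, unsigned int flags)
{
	ip->held &= ~flags;
}

/* lock_flags == 0 means the caller keeps ownership of the lock. */
static void
sketch_ijoin(
	struct sketch_trans	*tp,
	struct sketch_inode	*ip,
	unsigned int		lock_flags)
{
	assert(tp->nr < SKETCH_MAX_JOINED);
	tp->ip[tp->nr] = ip;
	tp->unlock_flags[tp->nr] = lock_flags;
	tp->nr++;
}

/* Commit drops only the locks that were handed over at join time. */
static void
sketch_commit(struct sketch_trans *tp)
{
	unsigned int		i;

	for (i = 0; i < tp->nr; i++) {
		if (tp->unlock_flags[i])
			sketch_iunlock(tp->ip[i], tp->unlock_flags[i]);
	}
	tp->nr = 0;
}
```

With the second form the inode is still locked after commit, so every
success and error path in the caller needs its own explicit unlock,
which is what the added xfs_iunlock calls in this patch provide.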

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
[djwong: hold the parent ilocked across transaction rolls too]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c   |   12 +++++++++---
 fs/xfs/xfs_qm.c      |    4 +++-
 fs/xfs/xfs_symlink.c |    6 ++++--
 3 files changed, 16 insertions(+), 6 deletions(-)


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index efd040094753f..2ec005e6c1dab 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -747,6 +747,8 @@ xfs_inode_inherit_flags2(
 /*
  * Initialise a newly allocated inode and return the in-core inode to the
  * caller locked exclusively.
+ *
+ * Caller is responsible for unlocking the inode manually upon return
  */
 int
 xfs_init_new_inode(
@@ -873,7 +875,7 @@ xfs_init_new_inode(
 	/*
 	 * Log the new values stuffed into the inode.
 	 */
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, 0);
 	xfs_trans_log_inode(tp, ip, flags);
 
 	/* now that we have an i_mode we can setup the inode structure */
@@ -1101,8 +1103,7 @@ xfs_create(
 	 * the transaction cancel unlocking dp so don't do it explicitly in the
 	 * error path.
 	 */
-	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
-	unlock_dp_on_error = false;
+	xfs_trans_ijoin(tp, dp, 0);
 
 	error = xfs_dir_createname(tp, dp, name, ip->i_ino,
 					resblks - XFS_IALLOC_SPACE_RES(mp));
@@ -1151,6 +1152,8 @@ xfs_create(
 	xfs_qm_dqrele(pdqp);
 
 	*ipp = ip;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
 	return 0;
 
  out_trans_cancel:
@@ -1162,6 +1165,7 @@ xfs_create(
 	 * transactions and deadlocks from xfs_inactive.
 	 */
 	if (ip) {
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		xfs_finish_inode_setup(ip);
 		xfs_irele(ip);
 	}
@@ -1247,6 +1251,7 @@ xfs_create_tmpfile(
 	xfs_qm_dqrele(pdqp);
 
 	*ipp = ip;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return 0;
 
  out_trans_cancel:
@@ -1258,6 +1263,7 @@ xfs_create_tmpfile(
 	 * transactions and deadlocks from xfs_inactive.
 	 */
 	if (ip) {
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		xfs_finish_inode_setup(ip);
 		xfs_irele(ip);
 	}
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 0f4cf4170c357..47120b745c47f 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -836,8 +836,10 @@ xfs_qm_qino_alloc(
 		ASSERT(xfs_is_shutdown(mp));
 		xfs_alert(mp, "%s failed (error %d)!", __func__, error);
 	}
-	if (need_alloc)
+	if (need_alloc) {
+		xfs_iunlock(*ipp, XFS_ILOCK_EXCL);
 		xfs_finish_inode_setup(*ipp);
+	}
 	return error;
 }
 
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index fb060aaf6d40f..85ef56fdd7dfe 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -172,8 +172,7 @@ xfs_symlink(
 	 * the transaction cancel unlocking dp so don't do it explicitly in the
 	 * error path.
 	 */
-	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
-	unlock_dp_on_error = false;
+	xfs_trans_ijoin(tp, dp, 0);
 
 	/*
 	 * Also attach the dquot(s) to it, if applicable.
@@ -215,6 +214,8 @@ xfs_symlink(
 	xfs_qm_dqrele(pdqp);
 
 	*ipp = ip;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
 	return 0;
 
 out_trans_cancel:
@@ -226,6 +227,7 @@ xfs_symlink(
 	 * transactions and deadlocks from xfs_inactive.
 	 */
 	if (ip) {
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		xfs_finish_inode_setup(ip);
 		xfs_irele(ip);
 	}



* [PATCH 4/7] xfs: Hold inode locks in xfs_trans_alloc_dir
  2024-04-10  0:44 ` [PATCHSET v13.1 2/9] xfs: retain ILOCK during directory updates Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-04-10  0:48   ` [PATCH 3/7] xfs: Hold inode locks in xfs_ialloc Darrick J. Wong
@ 2024-04-10  0:48   ` Darrick J. Wong
  2024-04-10  4:41     ` Christoph Hellwig
  2024-04-10  0:48   ` [PATCH 5/7] xfs: Hold inode locks in xfs_rename Darrick J. Wong
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:48 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, Catherine Hoang, catherine.hoang, hch,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Modify xfs_trans_alloc_dir to hold locks after return.  The caller is
now responsible for unlocking the inodes manually.  We will need this
later to hold locks across parent pointer operations.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c |   14 ++++++++++++--
 fs/xfs/xfs_trans.c |    9 +++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 2ec005e6c1dab..36e1012e156a1 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1368,10 +1368,15 @@ xfs_link(
 	if (xfs_has_wsync(mp) || xfs_has_dirsync(mp))
 		xfs_trans_set_sync(tp);
 
-	return xfs_trans_commit(tp);
+	error = xfs_trans_commit(tp);
+	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
+	xfs_iunlock(sip, XFS_ILOCK_EXCL);
+	return error;
 
  error_return:
 	xfs_trans_cancel(tp);
+	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
+	xfs_iunlock(sip, XFS_ILOCK_EXCL);
  std_return:
 	if (error == -ENOSPC && nospace_error)
 		error = nospace_error;
@@ -2781,15 +2786,20 @@ xfs_remove(
 
 	error = xfs_trans_commit(tp);
 	if (error)
-		goto std_return;
+		goto out_unlock;
 
 	if (is_dir && xfs_inode_is_filestream(ip))
 		xfs_filestream_deassociate(ip);
 
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
 	return 0;
 
  out_trans_cancel:
 	xfs_trans_cancel(tp);
+ out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
  std_return:
 	return error;
 }
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 7350640059cc6..50d878d78a5e1 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1430,6 +1430,8 @@ xfs_trans_alloc_ichange(
  * The caller must ensure that the on-disk dquots attached to this inode have
  * already been allocated and initialized.  The ILOCKs will be dropped when the
  * transaction is committed or cancelled.
+ *
+ * Caller is responsible for unlocking the inodes manually upon return
  */
 int
 xfs_trans_alloc_dir(
@@ -1460,8 +1462,8 @@ xfs_trans_alloc_dir(
 
 	xfs_lock_two_inodes(dp, XFS_ILOCK_EXCL, ip, XFS_ILOCK_EXCL);
 
-	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, dp, 0);
+	xfs_trans_ijoin(tp, ip, 0);
 
 	error = xfs_qm_dqattach_locked(dp, false);
 	if (error) {
@@ -1484,6 +1486,9 @@ xfs_trans_alloc_dir(
 	if (error == -EDQUOT || error == -ENOSPC) {
 		if (!retried) {
 			xfs_trans_cancel(tp);
+			xfs_iunlock(dp, XFS_ILOCK_EXCL);
+			if (dp != ip)
+				xfs_iunlock(ip, XFS_ILOCK_EXCL);
 			xfs_blockgc_free_quota(dp, 0);
 			retried = true;
 			goto retry;



* [PATCH 5/7] xfs: Hold inode locks in xfs_rename
  2024-04-10  0:44 ` [PATCHSET v13.1 2/9] xfs: retain ILOCK during directory updates Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-04-10  0:48   ` [PATCH 4/7] xfs: Hold inode locks in xfs_trans_alloc_dir Darrick J. Wong
@ 2024-04-10  0:48   ` Darrick J. Wong
  2024-04-10  4:42     ` Christoph Hellwig
  2024-04-10  0:49   ` [PATCH 6/7] xfs: don't pick up IOLOCK during rmapbt repair scan Darrick J. Wong
  2024-04-10  0:49   ` [PATCH 7/7] xfs: unlock new repair tempfiles after creation Darrick J. Wong
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:48 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, Catherine Hoang, catherine.hoang, hch,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Modify xfs_rename to hold all inode locks across a rename operation.
We will need this later when we add parent pointers.
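
The unlock helper added below walks the inode table backwards and
skips empty slots and adjacent repeats.  A standalone sketch of that
loop, using a hypothetical struct sketch_inode with a counter in
place of a real lock, might look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inode: lock_count models how many times it is locked. */
struct sketch_inode {
	int			lock_count;
};

/*
 * Unlock every distinct inode in the table, last to first.  NULL slots
 * and an entry identical to its predecessor are skipped so that no
 * inode is unlocked twice.  Returns the number of unlocks performed.
 */
static unsigned int
sketch_iunlock_rename(
	struct sketch_inode	**tab,
	int			nr)
{
	unsigned int		unlocked = 0;
	int			i;

	for (i = nr - 1; i >= 0; i--) {
		if (!tab[i] || (i > 0 && tab[i] == tab[i - 1]))
			continue;
		tab[i]->lock_count--;
		unlocked++;
	}
	return unlocked;
}
```

Note that the duplicate check only compares neighbouring slots, so it
depends on the table being sorted (or otherwise grouping repeats
together) before it is passed in.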

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c |   45 +++++++++++++++++++++++++++++++++------------
 1 file changed, 33 insertions(+), 12 deletions(-)


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 36e1012e156a1..2aec7ab59aeb7 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2804,6 +2804,21 @@ xfs_remove(
 	return error;
 }
 
+static inline void
+xfs_iunlock_rename(
+	struct xfs_inode	**i_tab,
+	int			num_inodes)
+{
+	int			i;
+
+	for (i = num_inodes - 1; i >= 0; i--) {
+		/* Skip duplicate inodes if src and target dps are the same */
+		if (!i_tab[i] || (i > 0 && i_tab[i] == i_tab[i - 1]))
+			continue;
+		xfs_iunlock(i_tab[i], XFS_ILOCK_EXCL);
+	}
+}
+
 /*
  * Enter all inodes for a rename transaction into a sorted array.
  */
@@ -3113,8 +3128,10 @@ xfs_rename(
 	 * Attach the dquots to the inodes
 	 */
 	error = xfs_qm_vop_rename_dqattach(inodes);
-	if (error)
-		goto out_trans_cancel;
+	if (error) {
+		xfs_trans_cancel(tp);
+		goto out_release_wip;
+	}
 
 	/*
 	 * Lock all the participating inodes. Depending upon whether
@@ -3125,18 +3142,16 @@ xfs_rename(
 	xfs_lock_inodes(inodes, num_inodes, XFS_ILOCK_EXCL);
 
 	/*
-	 * Join all the inodes to the transaction. From this point on,
-	 * we can rely on either trans_commit or trans_cancel to unlock
-	 * them.
+	 * Join all the inodes to the transaction.
 	 */
-	xfs_trans_ijoin(tp, src_dp, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, src_dp, 0);
 	if (new_parent)
-		xfs_trans_ijoin(tp, target_dp, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, src_ip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, target_dp, 0);
+	xfs_trans_ijoin(tp, src_ip, 0);
 	if (target_ip)
-		xfs_trans_ijoin(tp, target_ip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, target_ip, 0);
 	if (wip)
-		xfs_trans_ijoin(tp, wip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, wip, 0);
 
 	/*
 	 * If we are using project inheritance, we only allow renames
@@ -3150,10 +3165,13 @@ xfs_rename(
 	}
 
 	/* RENAME_EXCHANGE is unique from here on. */
-	if (flags & RENAME_EXCHANGE)
-		return xfs_cross_rename(tp, src_dp, src_name, src_ip,
+	if (flags & RENAME_EXCHANGE) {
+		error = xfs_cross_rename(tp, src_dp, src_name, src_ip,
 					target_dp, target_name, target_ip,
 					spaceres);
+		xfs_iunlock_rename(inodes, num_inodes);
+		return error;
+	}
 
 	/*
 	 * Try to reserve quota to handle an expansion of the target directory.
@@ -3167,6 +3185,7 @@ xfs_rename(
 		if (error == -EDQUOT || error == -ENOSPC) {
 			if (!retried) {
 				xfs_trans_cancel(tp);
+				xfs_iunlock_rename(inodes, num_inodes);
 				xfs_blockgc_free_quota(target_dp, 0);
 				retried = true;
 				goto retry;
@@ -3393,12 +3412,14 @@ xfs_rename(
 		xfs_dir_update_hook(src_dp, wip, 1, src_name);
 
 	error = xfs_finish_rename(tp);
+	xfs_iunlock_rename(inodes, num_inodes);
 	if (wip)
 		xfs_irele(wip);
 	return error;
 
 out_trans_cancel:
 	xfs_trans_cancel(tp);
+	xfs_iunlock_rename(inodes, num_inodes);
 out_release_wip:
 	if (wip)
 		xfs_irele(wip);



* [PATCH 6/7] xfs: don't pick up IOLOCK during rmapbt repair scan
  2024-04-10  0:44 ` [PATCHSET v13.1 2/9] xfs: retain ILOCK during directory updates Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-04-10  0:48   ` [PATCH 5/7] xfs: Hold inode locks in xfs_rename Darrick J. Wong
@ 2024-04-10  0:49   ` Darrick J. Wong
  2024-04-10  4:42     ` Christoph Hellwig
  2024-04-10  0:49   ` [PATCH 7/7] xfs: unlock new repair tempfiles after creation Darrick J. Wong
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:49 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that we've fixed the directory operations to hold the ILOCK until
they're finished with rmapbt updates for directory shape changes, we no
longer need to take this lock when scanning directories for rmapbt
records.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/rmap_repair.c |   16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)


diff --git a/fs/xfs/scrub/rmap_repair.c b/fs/xfs/scrub/rmap_repair.c
index e8e07b683eab6..25acd69614c2c 100644
--- a/fs/xfs/scrub/rmap_repair.c
+++ b/fs/xfs/scrub/rmap_repair.c
@@ -578,23 +578,9 @@ xrep_rmap_scan_inode(
 	struct xrep_rmap	*rr,
 	struct xfs_inode	*ip)
 {
-	unsigned int		lock_mode = 0;
+	unsigned int		lock_mode = xrep_rmap_scan_ilock(ip);
 	int			error;
 
-	/*
-	 * Directory updates (create/link/unlink/rename) drop the directory's
-	 * ILOCK before finishing any rmapbt updates associated with directory
-	 * shape changes.  For this scan to coordinate correctly with the live
-	 * update hook, we must take the only lock (i_rwsem) that is held all
-	 * the way to dir op completion.  This will get fixed by the parent
-	 * pointer patchset.
-	 */
-	if (S_ISDIR(VFS_I(ip)->i_mode)) {
-		lock_mode = XFS_IOLOCK_SHARED;
-		xfs_ilock(ip, lock_mode);
-	}
-	lock_mode |= xrep_rmap_scan_ilock(ip);
-
 	/* Check the data fork. */
 	error = xrep_rmap_scan_ifork(rr, ip, XFS_DATA_FORK);
 	if (error)



* [PATCH 7/7] xfs: unlock new repair tempfiles after creation
  2024-04-10  0:44 ` [PATCHSET v13.1 2/9] xfs: retain ILOCK during directory updates Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-04-10  0:49   ` [PATCH 6/7] xfs: don't pick up IOLOCK during rmapbt repair scan Darrick J. Wong
@ 2024-04-10  0:49   ` Darrick J. Wong
  2024-04-10  4:42     ` Christoph Hellwig
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:49 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

After creation, drop the ILOCK on temporary files that have been created
to stage a repair.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/tempfile.c |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c
index c72e447eb8ec3..6f39504a216ea 100644
--- a/fs/xfs/scrub/tempfile.c
+++ b/fs/xfs/scrub/tempfile.c
@@ -153,6 +153,7 @@ xrep_tempfile_create(
 	xfs_qm_dqrele(pdqp);
 
 	/* Finish setting up the incore / vfs context. */
+	xfs_iunlock(sc->tempip, XFS_ILOCK_EXCL);
 	xfs_setup_iops(sc->tempip);
 	xfs_finish_inode_setup(sc->tempip);
 
@@ -168,6 +169,7 @@ xrep_tempfile_create(
 	 * transactions and deadlocks from xfs_inactive.
 	 */
 	if (sc->tempip) {
+		xfs_iunlock(sc->tempip, XFS_ILOCK_EXCL);
 		xfs_finish_inode_setup(sc->tempip);
 		xchk_irele(sc, sc->tempip);
 	}



* [PATCH 1/4] xfs: remove XFS_DA_OP_REMOVE
  2024-04-10  0:44 ` [PATCHSET v13.1 3/9] xfs: shrink struct xfs_da_args Darrick J. Wong
@ 2024-04-10  0:49   ` Darrick J. Wong
  2024-04-10  4:43     ` Christoph Hellwig
  2024-04-10  0:49   ` [PATCH 2/4] xfs: remove XFS_DA_OP_NOTIME Darrick J. Wong
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:49 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Nobody checks this flag, so get rid of it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.h     |    1 -
 fs/xfs/libxfs/xfs_da_btree.h |    6 ++----
 2 files changed, 2 insertions(+), 5 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index e4f55008552b4..670ab2a613fc6 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -590,7 +590,6 @@ xfs_attr_init_add_state(struct xfs_da_args *args)
 static inline enum xfs_delattr_state
 xfs_attr_init_remove_state(struct xfs_da_args *args)
 {
-	args->op_flags |= XFS_DA_OP_REMOVE;
 	if (xfs_attr_is_shortform(args->dp))
 		return XFS_DAS_SF_REMOVE;
 	if (xfs_attr_is_leaf(args->dp))
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index 7a004786ee0a2..76e764080d994 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -91,9 +91,8 @@ typedef struct xfs_da_args {
 #define XFS_DA_OP_OKNOENT	(1u << 3) /* lookup op, ENOENT ok, else die */
 #define XFS_DA_OP_CILOOKUP	(1u << 4) /* lookup returns CI name if found */
 #define XFS_DA_OP_NOTIME	(1u << 5) /* don't update inode timestamps */
-#define XFS_DA_OP_REMOVE	(1u << 6) /* this is a remove operation */
-#define XFS_DA_OP_RECOVERY	(1u << 7) /* Log recovery operation */
-#define XFS_DA_OP_LOGGED	(1u << 8) /* Use intent items to track op */
+#define XFS_DA_OP_RECOVERY	(1u << 6) /* Log recovery operation */
+#define XFS_DA_OP_LOGGED	(1u << 7) /* Use intent items to track op */
 
 #define XFS_DA_OP_FLAGS \
 	{ XFS_DA_OP_JUSTCHECK,	"JUSTCHECK" }, \
@@ -102,7 +101,6 @@ typedef struct xfs_da_args {
 	{ XFS_DA_OP_OKNOENT,	"OKNOENT" }, \
 	{ XFS_DA_OP_CILOOKUP,	"CILOOKUP" }, \
 	{ XFS_DA_OP_NOTIME,	"NOTIME" }, \
-	{ XFS_DA_OP_REMOVE,	"REMOVE" }, \
 	{ XFS_DA_OP_RECOVERY,	"RECOVERY" }, \
 	{ XFS_DA_OP_LOGGED,	"LOGGED" }
 



* [PATCH 2/4] xfs: remove XFS_DA_OP_NOTIME
  2024-04-10  0:44 ` [PATCHSET v13.1 3/9] xfs: shrink struct xfs_da_args Darrick J. Wong
  2024-04-10  0:49   ` [PATCH 1/4] xfs: remove XFS_DA_OP_REMOVE Darrick J. Wong
@ 2024-04-10  0:49   ` Darrick J. Wong
  2024-04-10  4:44     ` Christoph Hellwig
  2024-04-10  0:50   ` [PATCH 3/4] xfs: rename xfs_da_args.attr_flags Darrick J. Wong
  2024-04-10  0:50   ` [PATCH 4/4] xfs: rearrange xfs_da_args a bit to use less space Darrick J. Wong
  3 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:49 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The only user of this flag sets it prior to an xfs_attr_get_ilocked
call, which doesn't update anything.  Get rid of the flag.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c     |    5 ++---
 fs/xfs/libxfs/xfs_da_btree.h |    6 ++----
 fs/xfs/scrub/attr.c          |    1 -
 3 files changed, 4 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 05d22c5e38855..30e6084122d8b 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -365,7 +365,7 @@ xfs_attr_try_sf_addname(
 	 * Commit the shortform mods, and we're done.
 	 * NOTE: this is also the error path (EEXIST, etc).
 	 */
-	if (!error && !(args->op_flags & XFS_DA_OP_NOTIME))
+	if (!error)
 		xfs_trans_ichgtime(args->trans, dp, XFS_ICHGTIME_CHG);
 
 	if (xfs_has_wsync(dp->i_mount))
@@ -1033,8 +1033,7 @@ xfs_attr_set(
 	if (xfs_has_wsync(mp))
 		xfs_trans_set_sync(args->trans);
 
-	if (!(args->op_flags & XFS_DA_OP_NOTIME))
-		xfs_trans_ichgtime(args->trans, dp, XFS_ICHGTIME_CHG);
+	xfs_trans_ichgtime(args->trans, dp, XFS_ICHGTIME_CHG);
 
 	/*
 	 * Commit the last in the sequence of transactions.
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index 76e764080d994..b04a3290ffacc 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -90,9 +90,8 @@ typedef struct xfs_da_args {
 #define XFS_DA_OP_ADDNAME	(1u << 2) /* this is an add operation */
 #define XFS_DA_OP_OKNOENT	(1u << 3) /* lookup op, ENOENT ok, else die */
 #define XFS_DA_OP_CILOOKUP	(1u << 4) /* lookup returns CI name if found */
-#define XFS_DA_OP_NOTIME	(1u << 5) /* don't update inode timestamps */
-#define XFS_DA_OP_RECOVERY	(1u << 6) /* Log recovery operation */
-#define XFS_DA_OP_LOGGED	(1u << 7) /* Use intent items to track op */
+#define XFS_DA_OP_RECOVERY	(1u << 5) /* Log recovery operation */
+#define XFS_DA_OP_LOGGED	(1u << 6) /* Use intent items to track op */
 
 #define XFS_DA_OP_FLAGS \
 	{ XFS_DA_OP_JUSTCHECK,	"JUSTCHECK" }, \
@@ -100,7 +99,6 @@ typedef struct xfs_da_args {
 	{ XFS_DA_OP_ADDNAME,	"ADDNAME" }, \
 	{ XFS_DA_OP_OKNOENT,	"OKNOENT" }, \
 	{ XFS_DA_OP_CILOOKUP,	"CILOOKUP" }, \
-	{ XFS_DA_OP_NOTIME,	"NOTIME" }, \
 	{ XFS_DA_OP_RECOVERY,	"RECOVERY" }, \
 	{ XFS_DA_OP_LOGGED,	"LOGGED" }
 
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 8853e4d0eee3d..5b855d7c98211 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -173,7 +173,6 @@ xchk_xattr_actor(
 	void			*priv)
 {
 	struct xfs_da_args		args = {
-		.op_flags		= XFS_DA_OP_NOTIME,
 		.attr_filter		= attr_flags & XFS_ATTR_NSP_ONDISK_MASK,
 		.geo			= sc->mp->m_attr_geo,
 		.whichfork		= XFS_ATTR_FORK,



* [PATCH 3/4] xfs: rename xfs_da_args.attr_flags
  2024-04-10  0:44 ` [PATCHSET v13.1 3/9] xfs: shrink struct xfs_da_args Darrick J. Wong
  2024-04-10  0:49   ` [PATCH 1/4] xfs: remove XFS_DA_OP_REMOVE Darrick J. Wong
  2024-04-10  0:49   ` [PATCH 2/4] xfs: remove XFS_DA_OP_NOTIME Darrick J. Wong
@ 2024-04-10  0:50   ` Darrick J. Wong
  2024-04-10  5:01     ` Christoph Hellwig
  2024-04-10  0:50   ` [PATCH 4/4] xfs: rearrange xfs_da_args a bit to use less space Darrick J. Wong
  3 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:50 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

This field only ever contains XATTR_{CREATE,REPLACE}, so let's change
the name of the field to make the field and its values consistent.
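
For context, the two values gate the set path as described by the
comments in xfs_attr_set ("Pure create fails if the attr already
exists", "Pure replace fails if no existing attr to replace").  The
sketch below restates that decision as a standalone function.  The
SKETCH_* names, the enum values and sketch_attr_set_check() are
hypothetical, and the error codes are placeholders rather than the
errnos the kernel returns.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical copies of the two flag bits carried in xattr_flags. */
#define SKETCH_XATTR_CREATE	0x1u	/* fail if the attr already exists */
#define SKETCH_XATTR_REPLACE	0x2u	/* fail if the attr does not exist */

enum sketch_attr_op {
	SKETCH_OP_SET,		/* add a new attr */
	SKETCH_OP_REPLACE,	/* overwrite an existing attr */
};

enum sketch_attr_err {
	SKETCH_OK = 0,
	SKETCH_ERR_EXISTS,	/* pure create, but attr is present */
	SKETCH_ERR_MISSING,	/* pure replace, but attr is absent */
};

/* Decide which operation to queue, or why the request must fail. */
static enum sketch_attr_err
sketch_attr_set_check(
	unsigned int		xattr_flags,
	bool			exists,
	enum sketch_attr_op	*op)
{
	if (exists) {
		if (xattr_flags & SKETCH_XATTR_CREATE)
			return SKETCH_ERR_EXISTS;
		*op = SKETCH_OP_REPLACE;
		return SKETCH_OK;
	}

	if (xattr_flags & SKETCH_XATTR_REPLACE)
		return SKETCH_ERR_MISSING;
	*op = SKETCH_OP_SET;
	return SKETCH_OK;
}
```

With neither bit set the call behaves as "create or replace", which is
why the field only ever needs to carry these two values.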

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c     |    4 ++--
 fs/xfs/libxfs/xfs_da_btree.h |    2 +-
 fs/xfs/scrub/attr_repair.c   |    2 +-
 fs/xfs/xfs_ioctl.c           |    6 +++---
 fs/xfs/xfs_trace.h           |    6 +++---
 fs/xfs/xfs_xattr.c           |    2 +-
 6 files changed, 11 insertions(+), 11 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 30e6084122d8b..5efbbb60f0069 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -1008,7 +1008,7 @@ xfs_attr_set(
 		}
 
 		/* Pure create fails if the attr already exists */
-		if (args->attr_flags & XATTR_CREATE)
+		if (args->xattr_flags & XATTR_CREATE)
 			goto out_trans_cancel;
 		xfs_attr_defer_add(args, XFS_ATTRI_OP_FLAGS_REPLACE);
 		break;
@@ -1018,7 +1018,7 @@ xfs_attr_set(
 			goto out_trans_cancel;
 
 		/* Pure replace fails if no existing attr to replace. */
-		if (args->attr_flags & XATTR_REPLACE)
+		if (args->xattr_flags & XATTR_REPLACE)
 			goto out_trans_cancel;
 		xfs_attr_defer_add(args, XFS_ATTRI_OP_FLAGS_SET);
 		break;
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index b04a3290ffacc..e585d0fa9caea 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -60,7 +60,7 @@ typedef struct xfs_da_args {
 	void		*value;		/* set of bytes (maybe contain NULLs) */
 	int		valuelen;	/* length of value */
 	unsigned int	attr_filter;	/* XFS_ATTR_{ROOT,SECURE,INCOMPLETE} */
-	unsigned int	attr_flags;	/* XATTR_{CREATE,REPLACE} */
+	unsigned int	xattr_flags;	/* XATTR_{CREATE,REPLACE} */
 	xfs_dahash_t	hashval;	/* hash value of name */
 	xfs_ino_t	inumber;	/* input/output inode number */
 	struct xfs_inode *dp;		/* directory inode to manipulate */
diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
index 7b4318764d030..8192f9044c4a9 100644
--- a/fs/xfs/scrub/attr_repair.c
+++ b/fs/xfs/scrub/attr_repair.c
@@ -557,7 +557,7 @@ xrep_xattr_insert_rec(
 	struct xfs_da_args		args = {
 		.dp			= rx->sc->tempip,
 		.attr_filter		= key->flags,
-		.attr_flags		= XATTR_CREATE,
+		.xattr_flags		= XATTR_CREATE,
 		.namelen		= key->namelen,
 		.valuelen		= key->valuelen,
 		.owner			= rx->sc->ip->i_ino,
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d2fc710d2d506..39bdd1034ffab 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -362,7 +362,7 @@ xfs_attr_filter(
 }
 
 static unsigned int
-xfs_attr_flags(
+xfs_xattr_flags(
 	u32			ioc_flags)
 {
 	if (ioc_flags & XFS_IOC_ATTR_CREATE)
@@ -476,7 +476,7 @@ xfs_attrmulti_attr_get(
 	struct xfs_da_args	args = {
 		.dp		= XFS_I(inode),
 		.attr_filter	= xfs_attr_filter(flags),
-		.attr_flags	= xfs_attr_flags(flags),
+		.xattr_flags	= xfs_xattr_flags(flags),
 		.name		= name,
 		.namelen	= strlen(name),
 		.valuelen	= *len,
@@ -510,7 +510,7 @@ xfs_attrmulti_attr_set(
 	struct xfs_da_args	args = {
 		.dp		= XFS_I(inode),
 		.attr_filter	= xfs_attr_filter(flags),
-		.attr_flags	= xfs_attr_flags(flags),
+		.xattr_flags	= xfs_xattr_flags(flags),
 		.name		= name,
 		.namelen	= strlen(name),
 	};
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index ba7b01a390c00..e9cf9430ce259 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -2000,7 +2000,7 @@ DECLARE_EVENT_CLASS(xfs_attr_class,
 		__field(int, valuelen)
 		__field(xfs_dahash_t, hashval)
 		__field(unsigned int, attr_filter)
-		__field(unsigned int, attr_flags)
+		__field(unsigned int, xattr_flags)
 		__field(uint32_t, op_flags)
 	),
 	TP_fast_assign(
@@ -2012,7 +2012,7 @@ DECLARE_EVENT_CLASS(xfs_attr_class,
 		__entry->valuelen = args->valuelen;
 		__entry->hashval = args->hashval;
 		__entry->attr_filter = args->attr_filter;
-		__entry->attr_flags = args->attr_flags;
+		__entry->xattr_flags = args->xattr_flags;
 		__entry->op_flags = args->op_flags;
 	),
 	TP_printk("dev %d:%d ino 0x%llx name %.*s namelen %d valuelen %d "
@@ -2026,7 +2026,7 @@ DECLARE_EVENT_CLASS(xfs_attr_class,
 		  __entry->hashval,
 		  __print_flags(__entry->attr_filter, "|",
 				XFS_ATTR_FILTER_FLAGS),
-		   __print_flags(__entry->attr_flags, "|",
+		   __print_flags(__entry->xattr_flags, "|",
 				{ XATTR_CREATE,		"CREATE" },
 				{ XATTR_REPLACE,	"REPLACE" }),
 		  __print_flags(__entry->op_flags, "|", XFS_DA_OP_FLAGS))
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 4ebf7052eb673..9b29973424b45 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -124,7 +124,7 @@ xfs_xattr_set(const struct xattr_handler *handler,
 	struct xfs_da_args	args = {
 		.dp		= XFS_I(inode),
 		.attr_filter	= handler->flags,
-		.attr_flags	= flags,
+		.xattr_flags	= flags,
 		.name		= name,
 		.namelen	= strlen(name),
 		.value		= (void *)value,



* [PATCH 4/4] xfs: rearrange xfs_da_args a bit to use less space
  2024-04-10  0:44 ` [PATCHSET v13.1 3/9] xfs: shrink struct xfs_da_args Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-04-10  0:50   ` [PATCH 3/4] xfs: rename xfs_da_args.attr_flags Darrick J. Wong
@ 2024-04-10  0:50   ` Darrick J. Wong
  2024-04-10  5:02     ` Christoph Hellwig
  3 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:50 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

A few notes about struct xfs_da_args:

The XFS_ATTR_* flags only go up as far as XFS_ATTR_INCOMPLETE, which
means that attr_filter could be a u8 field.

The XATTR_* flags only have two values, which means that xattr_flags
could be shrunk to a u8.

I've reduced the number of XFS_DA_OP_* flags down to the point where
op_flags would also fit into a u8.

filetype has 7 bytes of slack after it, which is wasteful.

namelen will never be greater than MAXNAMELEN, which is 256.  This field
could be reduced to a short.

Rearrange the fields in xfs_da_args to waste less space.  This reduces
the structure size from 136 bytes to 128.  Later when we add extra
fields to support parent pointer replacement, this will only bloat the
structure to 144 bytes, instead of 168.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_btree.h |   22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index e585d0fa9caea..47485f5edae86 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -54,17 +54,21 @@ enum xfs_dacmp {
  */
 typedef struct xfs_da_args {
 	struct xfs_da_geometry *geo;	/* da block geometry */
-	const uint8_t		*name;		/* string (maybe not NULL terminated) */
-	int		namelen;	/* length of string (maybe no NULL) */
-	uint8_t		filetype;	/* filetype of inode for directories */
+	const uint8_t	*name;		/* string (maybe not NULL terminated) */
 	void		*value;		/* set of bytes (maybe contain NULLs) */
-	int		valuelen;	/* length of value */
-	unsigned int	attr_filter;	/* XFS_ATTR_{ROOT,SECURE,INCOMPLETE} */
-	unsigned int	xattr_flags;	/* XATTR_{CREATE,REPLACE} */
-	xfs_dahash_t	hashval;	/* hash value of name */
-	xfs_ino_t	inumber;	/* input/output inode number */
 	struct xfs_inode *dp;		/* directory inode to manipulate */
 	struct xfs_trans *trans;	/* current trans (changes over time) */
+
+	xfs_ino_t	inumber;	/* input/output inode number */
+	xfs_ino_t	owner;		/* inode that owns the dir/attr data */
+
+	int		valuelen;	/* length of value */
+	uint8_t		filetype;	/* filetype of inode for directories */
+	uint8_t		op_flags;	/* operation flags */
+	uint8_t		attr_filter;	/* XFS_ATTR_{ROOT,SECURE,INCOMPLETE} */
+	uint8_t		xattr_flags;	/* XATTR_{CREATE,REPLACE} */
+	short		namelen;	/* length of string (maybe no NULL) */
+	xfs_dahash_t	hashval;	/* hash value of name */
 	xfs_extlen_t	total;		/* total blocks needed, for 1st bmap */
 	int		whichfork;	/* data or attribute fork */
 	xfs_dablk_t	blkno;		/* blkno of attr leaf of interest */
@@ -77,9 +81,7 @@ typedef struct xfs_da_args {
 	xfs_dablk_t	rmtblkno2;	/* remote attr value starting blkno */
 	int		rmtblkcnt2;	/* remote attr value block count */
 	int		rmtvaluelen2;	/* remote attr value length in bytes */
-	uint32_t	op_flags;	/* operation flags */
 	enum xfs_dacmp	cmpresult;	/* name compare result for lookups */
-	xfs_ino_t	owner;		/* inode that owns the dir/attr data */
 } xfs_da_args_t;
 
 /*


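The padding argument in the commit message above can be illustrated with a
small standalone sketch.  The two structs below are hypothetical stand-ins,
not the real xfs_da_args; they borrow a handful of field names only to show
how putting pointers and 64-bit fields first, and then packing the narrow
fields next to each other, shrinks a structure.  Exact sizes depend on the
ABI, so the self-check only asserts that the packed layout is smaller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical "before" layout: narrow fields interleaved with wide ones. */
struct demo_args_loose {
	const uint8_t	*name;
	int		namelen;
	uint8_t		filetype;	/* may be followed by padding */
	void		*value;
	int		valuelen;
	unsigned int	attr_filter;
	unsigned int	xattr_flags;
	uint32_t	hashval;
	uint64_t	inumber;
	uint32_t	op_flags;	/* may be followed by padding */
	uint64_t	owner;
};

/* Hypothetical "after" layout: wide fields first, narrow fields packed. */
struct demo_args_packed {
	const uint8_t	*name;
	void		*value;
	uint64_t	inumber;
	uint64_t	owner;
	int		valuelen;
	uint8_t		filetype;
	uint8_t		op_flags;
	uint8_t		attr_filter;
	uint8_t		xattr_flags;
	short		namelen;
	uint32_t	hashval;
};

/* Bytes saved by the rearrangement on the current ABI. */
static inline size_t demo_args_bytes_saved(void)
{
	return sizeof(struct demo_args_loose) -
	       sizeof(struct demo_args_packed);
}

/* The packed layout must never be larger than the loose one. */
static inline void demo_args_selfcheck(void)
{
	assert(sizeof(struct demo_args_packed) <
	       sizeof(struct demo_args_loose));
}
```

On a typical LP64 target the two toy structs should come out at 64 and 48
bytes respectively; for the real structure, pahole output is the
authoritative check.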

* [PATCH 01/12] xfs: attr fork iext must be loaded before calling xfs_attr_is_leaf
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
@ 2024-04-10  0:50   ` Darrick J. Wong
  2024-04-10  5:04     ` Christoph Hellwig
  2024-04-10  0:50   ` [PATCH 02/12] xfs: require XFS_SB_FEAT_INCOMPAT_LOG_XATTRS for attr log intent item recovery Darrick J. Wong
                     ` (10 subsequent siblings)
  11 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:50 UTC (permalink / raw)
  To: djwong; +Cc: Christoph Hellwig, hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Christoph noticed that the xfs_attr_is_leaf in xfs_attr_get_ilocked can
access the incore extent tree of the attr fork, but nothing in the
xfs_attr_get path guarantees that the incore tree is actually loaded.

Most of the time it is, but seeing as xfs_attr_is_leaf ignores the
return value of xfs_iext_get_extent I guess we've been making choices
based on random stack contents and nobody's complained?

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c |   17 +++++++++++++++++
 fs/xfs/xfs_attr_item.c   |   42 ++++++++++++++++++++++++++++++++++++------
 fs/xfs/xfs_attr_list.c   |    7 +++++++
 3 files changed, 60 insertions(+), 6 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 5efbbb60f0069..cbc9a1b1c72d3 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -87,6 +87,8 @@ xfs_attr_is_leaf(
 	struct xfs_iext_cursor	icur;
 	struct xfs_bmbt_irec	imap;
 
+	ASSERT(!xfs_need_iread_extents(ifp));
+
 	if (ifp->if_nextents != 1 || ifp->if_format != XFS_DINODE_FMT_EXTENTS)
 		return false;
 
@@ -224,11 +226,21 @@ int
 xfs_attr_get_ilocked(
 	struct xfs_da_args	*args)
 {
+	int			error;
+
 	xfs_assert_ilocked(args->dp, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL);
 
 	if (!xfs_inode_hasattr(args->dp))
 		return -ENOATTR;
 
+	/*
+	 * The incore attr fork iext tree must be loaded for xfs_attr_is_leaf
+	 * to work correctly.
+	 */
+	error = xfs_iread_extents(args->trans, args->dp, XFS_ATTR_FORK);
+	if (error)
+		return error;
+
 	if (args->dp->i_af.if_format == XFS_DINODE_FMT_LOCAL)
 		return xfs_attr_shortform_getvalue(args);
 	if (xfs_attr_is_leaf(args->dp))
@@ -870,6 +882,11 @@ xfs_attr_lookup(
 		return -ENOATTR;
 	}
 
+	/* Prerequisite for xfs_attr_is_leaf */
+	error = xfs_iread_extents(args->trans, args->dp, XFS_ATTR_FORK);
+	if (error)
+		return error;
+
 	if (xfs_attr_is_leaf(dp)) {
 		error = xfs_attr_leaf_hasname(args, &bp);
 
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index d460347056945..541455731618b 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -498,6 +498,25 @@ xfs_attri_validate(
 	return xfs_verify_ino(mp, attrp->alfi_ino);
 }
 
+static int
+xfs_attri_iread_extents(
+	struct xfs_inode		*ip)
+{
+	struct xfs_trans		*tp;
+	int				error;
+
+	error = xfs_trans_alloc_empty(ip->i_mount, &tp);
+	if (error)
+		return error;
+
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	error = xfs_iread_extents(tp, ip, XFS_ATTR_FORK);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_trans_cancel(tp);
+
+	return error;
+}
+
 static inline struct xfs_attr_intent *
 xfs_attri_recover_work(
 	struct xfs_mount		*mp,
@@ -508,13 +527,22 @@ xfs_attri_recover_work(
 {
 	struct xfs_attr_intent		*attr;
 	struct xfs_da_args		*args;
+	struct xfs_inode		*ip;
 	int				local;
 	int				error;
 
-	error = xlog_recover_iget(mp,  attrp->alfi_ino, ipp);
+	error = xlog_recover_iget(mp,  attrp->alfi_ino, &ip);
 	if (error)
 		return ERR_PTR(error);
 
+	if (xfs_inode_has_attr_fork(ip)) {
+		error = xfs_attri_iread_extents(ip);
+		if (error) {
+			xfs_irele(ip);
+			return ERR_PTR(error);
+		}
+	}
+
 	attr = kzalloc(sizeof(struct xfs_attr_intent) +
 			sizeof(struct xfs_da_args), GFP_KERNEL | __GFP_NOFAIL);
 	args = (struct xfs_da_args *)(attr + 1);
@@ -531,7 +559,7 @@ xfs_attri_recover_work(
 	attr->xattri_nameval = xfs_attri_log_nameval_get(nv);
 	ASSERT(attr->xattri_nameval);
 
-	args->dp = *ipp;
+	args->dp = ip;
 	args->geo = mp->m_attr_geo;
 	args->whichfork = XFS_ATTR_FORK;
 	args->name = nv->name.i_addr;
@@ -561,6 +589,7 @@ xfs_attri_recover_work(
 	}
 
 	xfs_defer_add_item(dfp, &attr->xattri_list);
+	*ipp = ip;
 	return attr;
 }
 
@@ -615,16 +644,17 @@ xfs_attr_recover_work(
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 				&attrip->attri_format,
 				sizeof(attrip->attri_format));
-	if (error) {
-		xfs_trans_cancel(tp);
-		goto out_unlock;
-	}
+	if (error)
+		goto out_cancel;
 
 	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
 out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_irele(ip);
 	return error;
+out_cancel:
+	xfs_trans_cancel(tp);
+	goto out_unlock;
 }
 
 /* Re-log an intent item to push the log tail forward. */
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 6a621f016f040..97c8f3dcfb89d 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -544,6 +544,7 @@ xfs_attr_list_ilocked(
 	struct xfs_attr_list_context	*context)
 {
 	struct xfs_inode		*dp = context->dp;
+	int				error;
 
 	xfs_assert_ilocked(dp, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL);
 
@@ -554,6 +555,12 @@ xfs_attr_list_ilocked(
 		return 0;
 	if (dp->i_af.if_format == XFS_DINODE_FMT_LOCAL)
 		return xfs_attr_shortform_list(context);
+
+	/* Prerequisite for xfs_attr_is_leaf */
+	error = xfs_iread_extents(NULL, dp, XFS_ATTR_FORK);
+	if (error)
+		return error;
+
 	if (xfs_attr_is_leaf(dp))
 		return xfs_attr_leaf_list(context);
 	return xfs_attr_node_list(context);



* [PATCH 02/12] xfs: require XFS_SB_FEAT_INCOMPAT_LOG_XATTRS for attr log intent item recovery
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
  2024-04-10  0:50   ` [PATCH 01/12] xfs: attr fork iext must be loaded before calling xfs_attr_is_leaf Darrick J. Wong
@ 2024-04-10  0:50   ` Darrick J. Wong
  2024-04-10  5:04     ` Christoph Hellwig
  2024-04-10  0:51   ` [PATCH 03/12] xfs: use an XFS_OPSTATE_ flag for detecting if logged xattrs are available Darrick J. Wong
                     ` (9 subsequent siblings)
  11 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:50 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The XFS_SB_FEAT_INCOMPAT_LOG_XATTRS feature bit protects a filesystem
from old kernels that do not know how to recover extended attribute log
intent items.  Make this check mandatory instead of a debugging assert.

Fixes: fd920008784ea ("xfs: Set up infrastructure for log attribute replay")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c |    5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 541455731618b..dfe7039dac989 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -469,6 +469,9 @@ xfs_attri_validate(
 	unsigned int			op = attrp->alfi_op_flags &
 					     XFS_ATTRI_OP_FLAGS_TYPE_MASK;
 
+	if (!xfs_sb_version_haslogxattrs(&mp->m_sb))
+		return false;
+
 	if (attrp->__pad != 0)
 		return false;
 
@@ -570,8 +573,6 @@ xfs_attri_recover_work(
 			 XFS_DA_OP_LOGGED;
 	args->owner = args->dp->i_ino;
 
-	ASSERT(xfs_sb_version_haslogxattrs(&mp->m_sb));
-
 	switch (attr->xattri_op_flags) {
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:



* [PATCH 03/12] xfs: use an XFS_OPSTATE_ flag for detecting if logged xattrs are available
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
  2024-04-10  0:50   ` [PATCH 01/12] xfs: attr fork iext must be loaded before calling xfs_attr_is_leaf Darrick J. Wong
  2024-04-10  0:50   ` [PATCH 02/12] xfs: require XFS_SB_FEAT_INCOMPAT_LOG_XATTRS for attr log intent item recovery Darrick J. Wong
@ 2024-04-10  0:51   ` Darrick J. Wong
  2024-04-10  5:05     ` Christoph Hellwig
  2024-04-10  0:51   ` [PATCH 04/12] xfs: check opcode and iovec count match in xlog_recover_attri_commit_pass2 Darrick J. Wong
                     ` (8 subsequent siblings)
  11 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:51 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Per reviewer request, use an OPSTATE flag (+ helpers) to decide if
logged xattrs are enabled, instead of querying the xfs_sb.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c |    2 +-
 fs/xfs/xfs_mount.c     |   16 ++++++++++++++++
 fs/xfs/xfs_mount.h     |    6 +++++-
 fs/xfs/xfs_xattr.c     |    3 ++-
 4 files changed, 24 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index dfe7039dac989..e5e7ddbc594b9 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -469,7 +469,7 @@ xfs_attri_validate(
 	unsigned int			op = attrp->alfi_op_flags &
 					     XFS_ATTRI_OP_FLAGS_TYPE_MASK;
 
-	if (!xfs_sb_version_haslogxattrs(&mp->m_sb))
+	if (!xfs_is_using_logged_xattrs(mp))
 		return false;
 
 	if (attrp->__pad != 0)
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index d37ba10f5fa33..a8a4b338985af 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -230,6 +230,13 @@ xfs_readsb(
 	mp->m_features |= xfs_sb_version_to_features(sbp);
 	xfs_reinit_percpu_counters(mp);
 
+	/*
+	 * If logged xattrs are enabled after log recovery finishes, then set
+	 * the opstate so that log recovery will work properly.
+	 */
+	if (xfs_sb_version_haslogxattrs(&mp->m_sb))
+		xfs_set_using_logged_xattrs(mp);
+
 	/* no need to be quiet anymore, so reset the buf ops */
 	bp->b_ops = &xfs_sb_buf_ops;
 
@@ -828,6 +835,15 @@ xfs_mountfs(
 		goto out_inodegc_shrinker;
 	}
 
+	/*
+	 * If logged xattrs are still enabled after log recovery finishes, then
+	 * they'll be available until unmount.  Otherwise, turn them off.
+	 */
+	if (xfs_sb_version_haslogxattrs(&mp->m_sb))
+		xfs_set_using_logged_xattrs(mp);
+	else
+		xfs_clear_using_logged_xattrs(mp);
+
 	/* Enable background inode inactivation workers. */
 	xfs_inodegc_start(mp);
 	xfs_blockgc_start(mp);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index b022e5120dc42..ffdf354b72437 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -416,6 +416,8 @@ __XFS_HAS_FEAT(nouuid, NOUUID)
 #define XFS_OPSTATE_QUOTACHECK_RUNNING	10
 /* Do we want to clear log incompat flags? */
 #define XFS_OPSTATE_UNSET_LOG_INCOMPAT	11
+/* Filesystem can use logged extended attributes */
+#define XFS_OPSTATE_USE_LARP		12
 
 #define __XFS_IS_OPSTATE(name, NAME) \
 static inline bool xfs_is_ ## name (struct xfs_mount *mp) \
@@ -444,6 +446,7 @@ __XFS_IS_OPSTATE(quotacheck_running, QUOTACHECK_RUNNING)
 # define xfs_is_quotacheck_running(mp)	(false)
 #endif
 __XFS_IS_OPSTATE(done_with_log_incompat, UNSET_LOG_INCOMPAT)
+__XFS_IS_OPSTATE(using_logged_xattrs, USE_LARP)
 
 static inline bool
 xfs_should_warn(struct xfs_mount *mp, long nr)
@@ -463,7 +466,8 @@ xfs_should_warn(struct xfs_mount *mp, long nr)
 	{ (1UL << XFS_OPSTATE_WARNED_SHRINK),		"wshrink" }, \
 	{ (1UL << XFS_OPSTATE_WARNED_LARP),		"wlarp" }, \
 	{ (1UL << XFS_OPSTATE_QUOTACHECK_RUNNING),	"quotacheck" }, \
-	{ (1UL << XFS_OPSTATE_UNSET_LOG_INCOMPAT),	"unset_log_incompat" }
+	{ (1UL << XFS_OPSTATE_UNSET_LOG_INCOMPAT),	"unset_log_incompat" }, \
+	{ (1UL << XFS_OPSTATE_USE_LARP),		"logged_xattrs" }
 
 /*
  * Max and min values for mount-option defined I/O
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 9b29973424b45..514179a8d2a7f 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -31,7 +31,7 @@ xfs_attr_grab_log_assist(
 	int			error = 0;
 
 	/* xattr update log intent items are already enabled */
-	if (xfs_sb_version_haslogxattrs(&mp->m_sb))
+	if (xfs_is_using_logged_xattrs(mp))
 		return 0;
 
 	/*
@@ -48,6 +48,7 @@ xfs_attr_grab_log_assist(
 			XFS_SB_FEAT_INCOMPAT_LOG_XATTRS);
 	if (error)
 		return error;
+	xfs_set_using_logged_xattrs(mp);
 
 	xfs_warn_mount(mp, XFS_OPSTATE_WARNED_LARP,
  "EXPERIMENTAL logged extended attributes feature in use. Use at your own risk!");


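For readers unfamiliar with the opstate pattern, here is a minimal userspace
sketch of a macro that generates is/set/clear helpers for one state bit.
Everything prefixed demo_ or DEMO_ is hypothetical; the kernel helpers operate
on struct xfs_mount with atomic bitops, whereas this sketch uses plain loads
and stores on a toy struct for brevity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct xfs_mount; only the state word matters. */
struct demo_mount {
	unsigned long	opstate;
};

/* Bit number of the state flag, mirroring the value used in the patch. */
#define DEMO_OPSTATE_USE_LARP	12

/* Generate is/set/clear helpers for a single state bit (non-atomic). */
#define DEMO_IS_OPSTATE(name, NAME) \
static inline bool demo_is_ ## name(const struct demo_mount *mp) \
{ \
	return (mp->opstate & (1UL << DEMO_OPSTATE_ ## NAME)) != 0; \
} \
static inline void demo_set_ ## name(struct demo_mount *mp) \
{ \
	mp->opstate |= 1UL << DEMO_OPSTATE_ ## NAME; \
} \
static inline void demo_clear_ ## name(struct demo_mount *mp) \
{ \
	mp->opstate &= ~(1UL << DEMO_OPSTATE_ ## NAME); \
}

DEMO_IS_OPSTATE(using_logged_xattrs, USE_LARP)

/* Walk the flag through clear -> set -> clear and check each step. */
static inline void demo_opstate_selfcheck(void)
{
	struct demo_mount mp = { .opstate = 0 };

	assert(!demo_is_using_logged_xattrs(&mp));
	demo_set_using_logged_xattrs(&mp);
	assert(demo_is_using_logged_xattrs(&mp));
	demo_clear_using_logged_xattrs(&mp);
	assert(!demo_is_using_logged_xattrs(&mp));
}
```

The point of the pattern is that callers ask the mount for its current
runtime state rather than re-deriving it from the superblock each time.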

* [PATCH 04/12] xfs: check opcode and iovec count match in xlog_recover_attri_commit_pass2
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-04-10  0:51   ` [PATCH 03/12] xfs: use an XFS_OPSTATE_ flag for detecting if logged xattrs are available Darrick J. Wong
@ 2024-04-10  0:51   ` Darrick J. Wong
  2024-04-10  5:05     ` Christoph Hellwig
  2024-04-10  0:51   ` [PATCH 05/12] xfs: fix missing check for invalid attr flags Darrick J. Wong
                     ` (7 subsequent siblings)
  11 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:51 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Check that the number of recovered log iovecs matches what the xattri
opcode expects.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c |   27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)


diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index e5e7ddbc594b9..d3559e6b24b7d 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -737,6 +737,7 @@ xlog_recover_attri_commit_pass2(
 	const void			*attr_value = NULL;
 	const void			*attr_name;
 	size_t				len;
+	unsigned int			op;
 
 	attri_formatp = item->ri_buf[0].i_addr;
 	attr_name = item->ri_buf[1].i_addr;
@@ -755,6 +756,32 @@ xlog_recover_attri_commit_pass2(
 		return -EFSCORRUPTED;
 	}
 
+	/* Check the number of log iovecs makes sense for the op code. */
+	op = attri_formatp->alfi_op_flags & XFS_ATTRI_OP_FLAGS_TYPE_MASK;
+	switch (op) {
+	case XFS_ATTRI_OP_FLAGS_SET:
+	case XFS_ATTRI_OP_FLAGS_REPLACE:
+		/* Log item, attr name, attr value */
+		if (item->ri_total != 3) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					     attri_formatp, len);
+			return -EFSCORRUPTED;
+		}
+		break;
+	case XFS_ATTRI_OP_FLAGS_REMOVE:
+		/* Log item, attr name */
+		if (item->ri_total != 2) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					     attri_formatp, len);
+			return -EFSCORRUPTED;
+		}
+		break;
+	default:
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+				     attri_formatp, len);
+		return -EFSCORRUPTED;
+	}
+
 	/* Validate the attr name */
 	if (item->ri_buf[1].i_len !=
 			xlog_calc_iovec_len(attri_formatp->alfi_name_len)) {


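The rule enforced by the patch above can be restated as a small predicate.
This is only a sketch: the DEMO_ opcodes are hypothetical and their values are
illustrative rather than the ondisk ones, and the real code reports corruption
through XFS_CORRUPTION_ERROR instead of returning a bool.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical opcodes; the values are illustrative, not the ondisk ones. */
enum demo_attri_op {
	DEMO_ATTRI_OP_SET = 1,
	DEMO_ATTRI_OP_REMOVE,
	DEMO_ATTRI_OP_REPLACE,
};

/* Return true if the iovec count is what the opcode requires. */
static inline bool demo_attri_iovecs_ok(unsigned int op, int total)
{
	switch (op) {
	case DEMO_ATTRI_OP_SET:
	case DEMO_ATTRI_OP_REPLACE:
		/* log item, attr name, attr value */
		return total == 3;
	case DEMO_ATTRI_OP_REMOVE:
		/* log item, attr name */
		return total == 2;
	default:
		/* unknown opcode: treat as corruption */
		return false;
	}
}

/* Exercise one accepted and one rejected count per opcode class. */
static inline void demo_attri_selfcheck(void)
{
	assert(demo_attri_iovecs_ok(DEMO_ATTRI_OP_SET, 3));
	assert(!demo_attri_iovecs_ok(DEMO_ATTRI_OP_SET, 2));
	assert(demo_attri_iovecs_ok(DEMO_ATTRI_OP_REMOVE, 2));
	assert(!demo_attri_iovecs_ok(DEMO_ATTRI_OP_REMOVE, 3));
}
```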

* [PATCH 05/12] xfs: fix missing check for invalid attr flags
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-04-10  0:51   ` [PATCH 04/12] xfs: check opcode and iovec count match in xlog_recover_attri_commit_pass2 Darrick J. Wong
@ 2024-04-10  0:51   ` Darrick J. Wong
  2024-04-10  5:07     ` Christoph Hellwig
  2024-04-10  0:51   ` [PATCH 06/12] xfs: restructure xfs_attr_complete_op a bit Darrick J. Wong
                     ` (6 subsequent siblings)
  11 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:51 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The xattr scrubber doesn't check for undefined flags in shortform attr
entries.  Therefore, define a mask XFS_ATTR_ONDISK_MASK that has all
possible XFS_ATTR_* flags in it, and use that to check for unknown bits
in xchk_xattr_actor.

Refactor the check in the dabtree scanner function to use the new mask
as well.  The redundant checks need to be in place because the dabtree
check examines the hash mappings and therefore needs to decode the attr
leaf entries to compute the namehash.  This happens before the walk of
the xattr entries themselves.

Fixes: ae0506eba78fd ("xfs: check used space of shortform xattr structures")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_format.h |    5 +++++
 fs/xfs/scrub/attr.c           |   13 +++++++++----
 2 files changed, 14 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index aac3fe0396140..ecd0616f5776a 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -719,8 +719,13 @@ struct xfs_attr3_leafblock {
 #define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
 #define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
 #define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
+
 #define XFS_ATTR_NSP_ONDISK_MASK	(XFS_ATTR_ROOT | XFS_ATTR_SECURE)
 
+#define XFS_ATTR_ONDISK_MASK	(XFS_ATTR_NSP_ONDISK_MASK | \
+				 XFS_ATTR_LOCAL | \
+				 XFS_ATTR_INCOMPLETE)
+
 #define XFS_ATTR_NAMESPACE_STR \
 	{ XFS_ATTR_LOCAL,	"local" }, \
 	{ XFS_ATTR_ROOT,	"root" }, \
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 5b855d7c98211..5ca79af47e81e 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -192,6 +192,11 @@ xchk_xattr_actor(
 	if (xchk_should_terminate(sc, &error))
 		return error;
 
+	if (attr_flags & ~XFS_ATTR_ONDISK_MASK) {
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, args.blkno);
+		return -ECANCELED;
+	}
+
 	if (attr_flags & XFS_ATTR_INCOMPLETE) {
 		/* Incomplete attr key, just mark the inode for preening. */
 		xchk_ino_set_preen(sc, ip->i_ino);
@@ -481,7 +486,6 @@ xchk_xattr_rec(
 	xfs_dahash_t			hash;
 	int				nameidx;
 	int				hdrsize;
-	unsigned int			badflags;
 	int				error;
 
 	ASSERT(blk->magic == XFS_ATTR_LEAF_MAGIC);
@@ -511,10 +515,11 @@ xchk_xattr_rec(
 
 	/* Retrieve the entry and check it. */
 	hash = be32_to_cpu(ent->hashval);
-	badflags = ~(XFS_ATTR_LOCAL | XFS_ATTR_ROOT | XFS_ATTR_SECURE |
-			XFS_ATTR_INCOMPLETE);
-	if ((ent->flags & badflags) != 0)
+	if (ent->flags & ~XFS_ATTR_ONDISK_MASK) {
 		xchk_da_set_corrupt(ds, level);
+		return 0;
+	}
+
 	if (ent->flags & XFS_ATTR_LOCAL) {
 		lentry = (struct xfs_attr_leaf_name_local *)
 				(((char *)bp->b_addr) + nameidx);


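The "reject any bit outside the known mask" idiom used in both hunks above can
be sketched in isolation.  The DEMO_ bit positions below are assumptions made
for illustration and should not be read as the ondisk definitions; the
authoritative ones live in xfs_da_format.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define DEMO_ATTR_LOCAL		(1u << 0)
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_INCOMPLETE	(1u << 7)

/* Namespace bits only. */
#define DEMO_ATTR_NSP_MASK	(DEMO_ATTR_ROOT | DEMO_ATTR_SECURE)

/* Every bit that is allowed to appear in an ondisk flags byte. */
#define DEMO_ATTR_ONDISK_MASK	(DEMO_ATTR_NSP_MASK | \
				 DEMO_ATTR_LOCAL | \
				 DEMO_ATTR_INCOMPLETE)

/* A flags byte is valid only if no bit outside the mask is set. */
static inline bool demo_attr_flags_valid(uint8_t flags)
{
	return (flags & ~DEMO_ATTR_ONDISK_MASK) == 0;
}

/* Known bits pass; a bit outside the mask is flagged. */
static inline void demo_attr_flags_selfcheck(void)
{
	assert(demo_attr_flags_valid(0));
	assert(demo_attr_flags_valid(DEMO_ATTR_ROOT | DEMO_ATTR_LOCAL));
	assert(!demo_attr_flags_valid(1u << 3));
}
```

Defining the mask once means a newly added flag only has to be listed in one
place for both checkers to accept it.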

* [PATCH 06/12] xfs: restructure xfs_attr_complete_op a bit
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-04-10  0:51   ` [PATCH 05/12] xfs: fix missing check for invalid attr flags Darrick J. Wong
@ 2024-04-10  0:51   ` Darrick J. Wong
  2024-04-10  5:07     ` Christoph Hellwig
  2024-04-10  0:52   ` [PATCH 07/12] xfs: use helpers to extract xattr op from opflags Darrick J. Wong
                     ` (5 subsequent siblings)
  11 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:51 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Eliminate the local variable from this function so that we can
streamline things a bit later when we add the PPTR_REPLACE op code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index cbc9a1b1c72d3..fda9acb81585d 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -432,14 +432,13 @@ xfs_attr_complete_op(
 	enum xfs_delattr_state	replace_state)
 {
 	struct xfs_da_args	*args = attr->xattri_da_args;
-	bool			do_replace = args->op_flags & XFS_DA_OP_REPLACE;
+
+	if (!(args->op_flags & XFS_DA_OP_REPLACE))
+		replace_state = XFS_DAS_DONE;
 
 	args->op_flags &= ~XFS_DA_OP_REPLACE;
 	args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
-	if (do_replace)
-		return replace_state;
-
-	return XFS_DAS_DONE;
+	return replace_state;
 }
 
 static int


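As a sanity check on the restructuring above, the old and new shapes of the
function can be compared side by side in a toy model.  The demo_ types and the
flag value are hypothetical, and the attr_filter update is left out because it
is identical in both versions.

```c
#include <assert.h>

#define DEMO_OP_REPLACE	(1u << 0)	/* hypothetical flag value */

enum demo_state { DEMO_DAS_DONE, DEMO_DAS_REPLACE_NEXT };

struct demo_args {
	unsigned int	op_flags;
};

/* Old shape: remember the flag in a local, then pick the return value. */
static inline enum demo_state
demo_complete_op_old(struct demo_args *args, enum demo_state replace_state)
{
	int do_replace = (args->op_flags & DEMO_OP_REPLACE) != 0;

	args->op_flags &= ~DEMO_OP_REPLACE;
	if (do_replace)
		return replace_state;
	return DEMO_DAS_DONE;
}

/* New shape: overwrite replace_state up front, so no local is needed. */
static inline enum demo_state
demo_complete_op_new(struct demo_args *args, enum demo_state replace_state)
{
	if (!(args->op_flags & DEMO_OP_REPLACE))
		replace_state = DEMO_DAS_DONE;

	args->op_flags &= ~DEMO_OP_REPLACE;
	return replace_state;
}

/* Both shapes must agree with and without the replace flag set. */
static inline void demo_complete_op_selfcheck(void)
{
	unsigned int flags[] = { 0, DEMO_OP_REPLACE };

	for (int i = 0; i < 2; i++) {
		struct demo_args a = { .op_flags = flags[i] };
		struct demo_args b = { .op_flags = flags[i] };

		assert(demo_complete_op_old(&a, DEMO_DAS_REPLACE_NEXT) ==
		       demo_complete_op_new(&b, DEMO_DAS_REPLACE_NEXT));
		assert(a.op_flags == b.op_flags);
	}
}
```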

* [PATCH 07/12] xfs: use helpers to extract xattr op from opflags
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-04-10  0:51   ` [PATCH 06/12] xfs: restructure xfs_attr_complete_op a bit Darrick J. Wong
@ 2024-04-10  0:52   ` Darrick J. Wong
  2024-04-10  5:07     ` Christoph Hellwig
  2024-04-10  0:52   ` [PATCH 08/12] xfs: validate recovered name buffers when recovering xattr items Darrick J. Wong
                     ` (4 subsequent siblings)
  11 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:52 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create helper functions to extract the xattr op from the ondisk xattri
log item and the incore attr intent item.  These will get more use in
the patches that follow.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.h |    5 +++++
 fs/xfs/xfs_attr_item.c   |   16 ++++++++++------
 2 files changed, 15 insertions(+), 6 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 670ab2a613fc6..04ae01ab9a5d8 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -529,6 +529,11 @@ struct xfs_attr_intent {
 	struct xfs_bmbt_irec		xattri_map;
 };
 
+static inline unsigned int
+xfs_attr_intent_op(const struct xfs_attr_intent *attr)
+{
+	return attr->xattri_op_flags & XFS_ATTRI_OP_FLAGS_TYPE_MASK;
+}
 
 /*========================================================================
  * Function prototypes for the kernel.
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index d3559e6b24b7d..b4c2dcb4581bc 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -308,6 +308,12 @@ xfs_attrd_item_intent(
 	return &ATTRD_ITEM(lip)->attrd_attrip->attri_item;
 }
 
+static inline unsigned int
+xfs_attr_log_item_op(const struct xfs_attri_log_format *attrp)
+{
+	return attrp->alfi_op_flags & XFS_ATTRI_OP_FLAGS_TYPE_MASK;
+}
+
 /* Log an attr to the intent item. */
 STATIC void
 xfs_attr_log_item(
@@ -466,8 +472,7 @@ xfs_attri_validate(
 	struct xfs_mount		*mp,
 	struct xfs_attri_log_format	*attrp)
 {
-	unsigned int			op = attrp->alfi_op_flags &
-					     XFS_ATTRI_OP_FLAGS_TYPE_MASK;
+	unsigned int			op = xfs_attr_log_item_op(attrp);
 
 	if (!xfs_is_using_logged_xattrs(mp))
 		return false;
@@ -551,8 +556,7 @@ xfs_attri_recover_work(
 	args = (struct xfs_da_args *)(attr + 1);
 
 	attr->xattri_da_args = args;
-	attr->xattri_op_flags = attrp->alfi_op_flags &
-						XFS_ATTRI_OP_FLAGS_TYPE_MASK;
+	attr->xattri_op_flags = xfs_attr_log_item_op(attrp);
 
 	/*
 	 * We're reconstructing the deferred work state structure from the
@@ -573,7 +577,7 @@ xfs_attri_recover_work(
 			 XFS_DA_OP_LOGGED;
 	args->owner = args->dp->i_ino;
 
-	switch (attr->xattri_op_flags) {
+	switch (xfs_attr_intent_op(attr)) {
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
 		args->value = nv->value.i_addr;
@@ -757,7 +761,7 @@ xlog_recover_attri_commit_pass2(
 	}
 
 	/* Check the number of log iovecs makes sense for the op code. */
-	op = attri_formatp->alfi_op_flags & XFS_ATTRI_OP_FLAGS_TYPE_MASK;
+	op = xfs_attr_log_item_op(attri_formatp);
 	switch (op) {
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:



* [PATCH 08/12] xfs: validate recovered name buffers when recovering xattr items
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-04-10  0:52   ` [PATCH 07/12] xfs: use helpers to extract xattr op from opflags Darrick J. Wong
@ 2024-04-10  0:52   ` Darrick J. Wong
  2024-04-10  5:08     ` Christoph Hellwig
  2024-04-10  0:52   ` [PATCH 09/12] xfs: always set args->value in xfs_attri_item_recover Darrick J. Wong
                     ` (3 subsequent siblings)
  11 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:52 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Strengthen the xattri log item recovery code by checking that we
actually have the required name and newname buffers for whatever
operation we're replaying.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c |   58 +++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 47 insertions(+), 11 deletions(-)


diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index b4c2dcb4581bc..ebd6e98d9c661 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -741,22 +741,20 @@ xlog_recover_attri_commit_pass2(
 	const void			*attr_value = NULL;
 	const void			*attr_name;
 	size_t				len;
-	unsigned int			op;
-
-	attri_formatp = item->ri_buf[0].i_addr;
-	attr_name = item->ri_buf[1].i_addr;
+	unsigned int			op, i = 0;
 
 	/* Validate xfs_attri_log_format before the large memory allocation */
 	len = sizeof(struct xfs_attri_log_format);
-	if (item->ri_buf[0].i_len != len) {
+	if (item->ri_buf[i].i_len != len) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 				item->ri_buf[0].i_addr, item->ri_buf[0].i_len);
 		return -EFSCORRUPTED;
 	}
 
+	attri_formatp = item->ri_buf[i].i_addr;
 	if (!xfs_attri_validate(mp, attri_formatp)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-				item->ri_buf[0].i_addr, item->ri_buf[0].i_len);
+				attri_formatp, len);
 		return -EFSCORRUPTED;
 	}
 
@@ -785,31 +783,69 @@ xlog_recover_attri_commit_pass2(
 				     attri_formatp, len);
 		return -EFSCORRUPTED;
 	}
+	i++;
 
 	/* Validate the attr name */
-	if (item->ri_buf[1].i_len !=
+	if (item->ri_buf[i].i_len !=
 			xlog_calc_iovec_len(attri_formatp->alfi_name_len)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-				item->ri_buf[0].i_addr, item->ri_buf[0].i_len);
+				attri_formatp, len);
 		return -EFSCORRUPTED;
 	}
 
+	attr_name = item->ri_buf[i].i_addr;
 	if (!xfs_attr_namecheck(attr_name, attri_formatp->alfi_name_len)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-				item->ri_buf[1].i_addr, item->ri_buf[1].i_len);
+				attri_formatp, len);
 		return -EFSCORRUPTED;
 	}
+	i++;
 
 	/* Validate the attr value, if present */
 	if (attri_formatp->alfi_value_len != 0) {
-		if (item->ri_buf[2].i_len != xlog_calc_iovec_len(attri_formatp->alfi_value_len)) {
+		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(attri_formatp->alfi_value_len)) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					item->ri_buf[0].i_addr,
 					item->ri_buf[0].i_len);
 			return -EFSCORRUPTED;
 		}
 
-		attr_value = item->ri_buf[2].i_addr;
+		attr_value = item->ri_buf[i].i_addr;
+		i++;
+	}
+
+	/*
+	 * Make sure we got the correct number of buffers for the operation
+	 * that we just loaded.
+	 */
+	if (i != item->ri_total) {
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+				attri_formatp, len);
+		return -EFSCORRUPTED;
+	}
+
+	switch (op) {
+	case XFS_ATTRI_OP_FLAGS_REMOVE:
+		/* Regular remove operations operate only on names. */
+		if (attr_value != NULL || attri_formatp->alfi_value_len != 0) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					     attri_formatp, len);
+			return -EFSCORRUPTED;
+		}
+		fallthrough;
+	case XFS_ATTRI_OP_FLAGS_SET:
+	case XFS_ATTRI_OP_FLAGS_REPLACE:
+		/*
+		 * Regular xattr set/remove/replace operations require a name
+		 * and do not take a newname.  Values are optional for set and
+		 * replace.
+		 */
+		if (attr_name == NULL || attri_formatp->alfi_name_len == 0) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					     attri_formatp, len);
+			return -EFSCORRUPTED;
+		}
+		break;
 	}
 
 	/*



* [PATCH 09/12] xfs: always set args->value in xfs_attri_item_recover
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-04-10  0:52   ` [PATCH 08/12] xfs: validate recovered name buffers when recovering xattr items Darrick J. Wong
@ 2024-04-10  0:52   ` Darrick J. Wong
  2024-04-10  5:08     ` Christoph Hellwig
  2024-04-10  0:52   ` [PATCH 10/12] xfs: use local variables for name and value length in _attri_commit_pass2 Darrick J. Wong
                     ` (2 subsequent siblings)
  11 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:52 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Always set args->value to the recovered value buffer.  This reduces the
amount of code in the switch statement, and hence the amount of thinking
that I have to do.  We validated the recovered buffers, supposedly.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index ebd6e98d9c661..8a13e2840692c 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -572,6 +572,8 @@ xfs_attri_recover_work(
 	args->name = nv->name.i_addr;
 	args->namelen = nv->name.i_len;
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
+	args->value = nv->value.i_addr;
+	args->valuelen = nv->value.i_len;
 	args->attr_filter = attrp->alfi_attr_filter & XFS_ATTRI_FILTER_MASK;
 	args->op_flags = XFS_DA_OP_RECOVERY | XFS_DA_OP_OKNOENT |
 			 XFS_DA_OP_LOGGED;
@@ -580,8 +582,6 @@ xfs_attri_recover_work(
 	switch (xfs_attr_intent_op(attr)) {
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
-		args->value = nv->value.i_addr;
-		args->valuelen = nv->value.i_len;
 		args->total = xfs_attr_calc_size(args, &local);
 		if (xfs_inode_hasattr(args->dp))
 			attr->xattri_dela_state = xfs_attr_init_replace_state(args);



* [PATCH 10/12] xfs: use local variables for name and value length in _attri_commit_pass2
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-04-10  0:52   ` [PATCH 09/12] xfs: always set args->value in xfs_attri_item_recover Darrick J. Wong
@ 2024-04-10  0:52   ` Darrick J. Wong
  2024-04-10  5:08     ` Christoph Hellwig
  2024-04-10  0:53   ` [PATCH 11/12] xfs: refactor name/length checks in xfs_attri_validate Darrick J. Wong
  2024-04-10  0:53   ` [PATCH 12/12] xfs: enforce one namespace per attribute Darrick J. Wong
  11 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:52 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

We're about to start using tagged unions in the xattr log format, so
create a bunch of local variables in the recovery function so we only
have to decode the log item fields once.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c |   25 ++++++++++++++-----------
 1 file changed, 14 insertions(+), 11 deletions(-)


diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 8a13e2840692c..59723e5f483e2 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -738,9 +738,11 @@ xlog_recover_attri_commit_pass2(
 	struct xfs_attri_log_item       *attrip;
 	struct xfs_attri_log_format     *attri_formatp;
 	struct xfs_attri_log_nameval	*nv;
-	const void			*attr_value = NULL;
 	const void			*attr_name;
+	const void			*attr_value = NULL;
 	size_t				len;
+	unsigned int			name_len = 0;
+	unsigned int			value_len = 0;
 	unsigned int			op, i = 0;
 
 	/* Validate xfs_attri_log_format before the large memory allocation */
@@ -769,6 +771,8 @@ xlog_recover_attri_commit_pass2(
 					     attri_formatp, len);
 			return -EFSCORRUPTED;
 		}
+		name_len = attri_formatp->alfi_name_len;
+		value_len = attri_formatp->alfi_value_len;
 		break;
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
 		/* Log item, attr name */
@@ -777,6 +781,7 @@ xlog_recover_attri_commit_pass2(
 					     attri_formatp, len);
 			return -EFSCORRUPTED;
 		}
+		name_len = attri_formatp->alfi_name_len;
 		break;
 	default:
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
@@ -786,15 +791,14 @@ xlog_recover_attri_commit_pass2(
 	i++;
 
 	/* Validate the attr name */
-	if (item->ri_buf[i].i_len !=
-			xlog_calc_iovec_len(attri_formatp->alfi_name_len)) {
+	if (item->ri_buf[i].i_len != xlog_calc_iovec_len(name_len)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 				attri_formatp, len);
 		return -EFSCORRUPTED;
 	}
 
 	attr_name = item->ri_buf[i].i_addr;
-	if (!xfs_attr_namecheck(attr_name, attri_formatp->alfi_name_len)) {
+	if (!xfs_attr_namecheck(attr_name, name_len)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 				attri_formatp, len);
 		return -EFSCORRUPTED;
@@ -802,8 +806,8 @@ xlog_recover_attri_commit_pass2(
 	i++;
 
 	/* Validate the attr value, if present */
-	if (attri_formatp->alfi_value_len != 0) {
-		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(attri_formatp->alfi_value_len)) {
+	if (value_len != 0) {
+		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(value_len)) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					item->ri_buf[0].i_addr,
 					item->ri_buf[0].i_len);
@@ -827,7 +831,7 @@ xlog_recover_attri_commit_pass2(
 	switch (op) {
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
 		/* Regular remove operations operate only on names. */
-		if (attr_value != NULL || attri_formatp->alfi_value_len != 0) {
+		if (attr_value != NULL || value_len != 0) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					     attri_formatp, len);
 			return -EFSCORRUPTED;
@@ -840,7 +844,7 @@ xlog_recover_attri_commit_pass2(
 		 * and do not take a newname.  Values are optional for set and
 		 * replace.
 		 */
-		if (attr_name == NULL || attri_formatp->alfi_name_len == 0) {
+		if (attr_name == NULL || name_len == 0) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					     attri_formatp, len);
 			return -EFSCORRUPTED;
@@ -853,9 +857,8 @@ xlog_recover_attri_commit_pass2(
 	 * name/value buffer to the recovered incore log item and drop our
 	 * reference.
 	 */
-	nv = xfs_attri_log_nameval_alloc(attr_name,
-			attri_formatp->alfi_name_len, attr_value,
-			attri_formatp->alfi_value_len);
+	nv = xfs_attri_log_nameval_alloc(attr_name, name_len,
+			attr_value, value_len);
 
 	attrip = xfs_attri_init(mp, nv);
 	memcpy(&attrip->attri_format, attri_formatp, len);



* [PATCH 11/12] xfs: refactor name/length checks in xfs_attri_validate
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-04-10  0:52   ` [PATCH 10/12] xfs: use local variables for name and value length in _attri_commit_pass2 Darrick J. Wong
@ 2024-04-10  0:53   ` Darrick J. Wong
  2024-04-10  5:09     ` Christoph Hellwig
  2024-04-10  0:53   ` [PATCH 12/12] xfs: enforce one namespace per attribute Darrick J. Wong
  11 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:53 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move the name and length checks into the attr op switch statement so
that we can perform more specific checks of the value length.  Over the
next few patches we're going to add new attr op flags with different
validation requirements.

While we're at it, remove the incorrect comment.
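
The per-operation checks can be sketched outside the kernel roughly as
follows.  This is an illustration only: the demo_* names and the
DEMO_*_MAX limits are assumptions standing in for the XFS_ATTRI_OP_FLAGS_*
opcodes, XATTR_NAME_MAX and XATTR_SIZE_MAX.

```c
#include <stdbool.h>

/* Hypothetical opcodes and limits, for illustration only. */
enum demo_op { DEMO_OP_SET, DEMO_OP_REPLACE, DEMO_OP_REMOVE };

#define DEMO_NAME_MAX	255u
#define DEMO_VALUE_MAX	65536u

/* Each opcode carries its own rules for the name and value lengths. */
static bool
demo_validate(int op, unsigned int name_len, unsigned int value_len)
{
	switch (op) {
	case DEMO_OP_SET:
	case DEMO_OP_REPLACE:
		/* A value is optional but bounded; a name is required. */
		if (value_len > DEMO_VALUE_MAX)
			return false;
		if (name_len == 0 || name_len > DEMO_NAME_MAX)
			return false;
		return true;
	case DEMO_OP_REMOVE:
		/* Removal operates on the name alone. */
		if (value_len != 0)
			return false;
		if (name_len == 0 || name_len > DEMO_NAME_MAX)
			return false;
		return true;
	default:
		/* Unknown opcodes are treated as corruption. */
		return false;
	}
}
```

Keeping the checks inside each case costs a little duplication, but a
new opcode with different length rules can then be added without
touching the existing cases.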

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c |   19 +++++++++++--------
 1 file changed, 11 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 59723e5f483e2..5ad14be760adc 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -486,23 +486,26 @@ xfs_attri_validate(
 	if (attrp->alfi_attr_filter & ~XFS_ATTRI_FILTER_MASK)
 		return false;
 
-	/* alfi_op_flags should be either a set or remove */
 	switch (op) {
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
+		if (attrp->alfi_value_len > XATTR_SIZE_MAX)
+			return false;
+		if (attrp->alfi_name_len == 0 ||
+		    attrp->alfi_name_len > XATTR_NAME_MAX)
+			return false;
+		break;
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
+		if (attrp->alfi_value_len != 0)
+			return false;
+		if (attrp->alfi_name_len == 0 ||
+		    attrp->alfi_name_len > XATTR_NAME_MAX)
+			return false;
 		break;
 	default:
 		return false;
 	}
 
-	if (attrp->alfi_value_len > XATTR_SIZE_MAX)
-		return false;
-
-	if ((attrp->alfi_name_len > XATTR_NAME_MAX) ||
-	    (attrp->alfi_name_len == 0))
-		return false;
-
 	return xfs_verify_ino(mp, attrp->alfi_ino);
 }
 



* [PATCH 12/12] xfs: enforce one namespace per attribute
  2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-04-10  0:53   ` [PATCH 11/12] xfs: refactor name/length checks in xfs_attri_validate Darrick J. Wong
@ 2024-04-10  0:53   ` Darrick J. Wong
  2024-04-10  5:09     ` Christoph Hellwig
  11 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:53 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a standardized helper function to enforce one namespace bit per
extended attribute, and refactor all the open-coded hweight logic.  This
function is not a static inline to avoid porting hassles in userspace.
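
For readers without the kernel headers at hand, the "at most one
namespace bit" rule can be sketched in portable C.  The flag values below
are hypothetical, not the ondisk XFS_ATTR_* bits, and the sketch swaps
hweight32() for an equivalent bit trick.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the ondisk values. */
#define DEMO_FLAG_LOCAL		(1u << 0)	/* not a namespace bit */
#define DEMO_NSP_ROOT		(1u << 1)
#define DEMO_NSP_SECURE		(1u << 2)
#define DEMO_NSP_MASK		(DEMO_NSP_ROOT | DEMO_NSP_SECURE)

/* Returns true if zero or one namespace bits are set in flags. */
static bool
demo_check_namespace(unsigned int flags)
{
	unsigned int	nsp = flags & DEMO_NSP_MASK;

	/*
	 * nsp & (nsp - 1) clears the lowest set bit, so the result is
	 * zero exactly when nsp has no bits or a single bit set.
	 */
	return (nsp & (nsp - 1)) == 0;
}
```

No namespace bit at all is accepted here, mirroring the "< 2" comparison
in the helper added by this patch.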

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c      |   15 +++++++++++++++
 fs/xfs/libxfs/xfs_attr.h      |    4 +++-
 fs/xfs/libxfs/xfs_attr_leaf.c |    7 ++++++-
 fs/xfs/scrub/attr.c           |   12 +++++-------
 fs/xfs/scrub/attr_repair.c    |    4 +---
 fs/xfs/xfs_attr_item.c        |   10 ++++++++--
 fs/xfs/xfs_attr_list.c        |   11 +++++++----
 7 files changed, 45 insertions(+), 18 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index fda9acb81585d..426a41b43f641 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -275,6 +275,8 @@ xfs_attr_get(
 
 	if (xfs_is_shutdown(args->dp->i_mount))
 		return -EIO;
+	if (!xfs_attr_namecheck(args->attr_filter, args->name, args->namelen))
+		return -EFSCORRUPTED;
 
 	if (!args->owner)
 		args->owner = args->dp->i_ino;
@@ -950,6 +952,8 @@ xfs_attr_set(
 
 	if (xfs_is_shutdown(dp->i_mount))
 		return -EIO;
+	if (!xfs_attr_namecheck(args->attr_filter, args->name, args->namelen))
+		return -EFSCORRUPTED;
 
 	error = xfs_qm_dqattach(dp);
 	if (error)
@@ -1530,12 +1534,23 @@ xfs_attr_node_get(
 	return error;
 }
 
+/* Enforce that there is at most one namespace bit per attr. */
+inline bool xfs_attr_check_namespace(unsigned int attr_flags)
+{
+	return hweight32(attr_flags & XFS_ATTR_NSP_ONDISK_MASK) < 2;
+}
+
 /* Returns true if the attribute entry name is valid. */
 bool
 xfs_attr_namecheck(
+	unsigned int	attr_flags,
 	const void	*name,
 	size_t		length)
 {
+	/* Only one namespace bit allowed. */
+	if (!xfs_attr_check_namespace(attr_flags))
+		return false;
+
 	/*
 	 * MAXNAMELEN includes the trailing null, but (name/length) leave it
 	 * out, so use >= for the length check.
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 04ae01ab9a5d8..3813f7ae626a2 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -552,7 +552,9 @@ int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
-bool xfs_attr_namecheck(const void *name, size_t length);
+bool xfs_attr_check_namespace(unsigned int attr_flags);
+bool xfs_attr_namecheck(unsigned int attr_flags, const void *name,
+		size_t length);
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
 			 unsigned int *total);
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 0e0faa19d4da6..7929caf2052f7 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -949,6 +949,11 @@ xfs_attr_shortform_to_leaf(
 		nargs.hashval = xfs_da_hashname(sfe->nameval,
 						sfe->namelen);
 		nargs.attr_filter = sfe->flags & XFS_ATTR_NSP_ONDISK_MASK;
+		if (!xfs_attr_check_namespace(sfe->flags)) {
+			xfs_da_mark_sick(args);
+			error = -EFSCORRUPTED;
+			goto out;
+		}
 		error = xfs_attr3_leaf_lookup_int(bp, &nargs); /* set a->index */
 		ASSERT(error == -ENOATTR);
 		error = xfs_attr3_leaf_add(bp, &nargs);
@@ -1062,7 +1067,7 @@ xfs_attr_shortform_verify(
 		 * one namespace flag per xattr, so we can just count the
 		 * bits (i.e. hweight) here.
 		 */
-		if (hweight8(sfep->flags & XFS_ATTR_NSP_ONDISK_MASK) > 1)
+		if (!xfs_attr_check_namespace(sfep->flags))
 			return __this_address;
 
 		sfep = next_sfep;
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 5ca79af47e81e..fdff9be408186 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -203,14 +203,8 @@ xchk_xattr_actor(
 		return 0;
 	}
 
-	/* Only one namespace bit allowed. */
-	if (hweight32(attr_flags & XFS_ATTR_NSP_ONDISK_MASK) > 1) {
-		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, args.blkno);
-		return -ECANCELED;
-	}
-
 	/* Does this name make sense? */
-	if (!xfs_attr_namecheck(name, namelen)) {
+	if (!xfs_attr_namecheck(attr_flags, name, namelen)) {
 		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, args.blkno);
 		return -ECANCELED;
 	}
@@ -519,6 +513,10 @@ xchk_xattr_rec(
 		xchk_da_set_corrupt(ds, level);
 		return 0;
 	}
+	if (!xfs_attr_check_namespace(ent->flags)) {
+		xchk_da_set_corrupt(ds, level);
+		return 0;
+	}
 
 	if (ent->flags & XFS_ATTR_LOCAL) {
 		lentry = (struct xfs_attr_leaf_name_local *)
diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
index 8192f9044c4a9..7228758c2da1a 100644
--- a/fs/xfs/scrub/attr_repair.c
+++ b/fs/xfs/scrub/attr_repair.c
@@ -123,12 +123,10 @@ xrep_xattr_want_salvage(
 		return false;
 	if (namelen > XATTR_NAME_MAX || namelen <= 0)
 		return false;
-	if (!xfs_attr_namecheck(name, namelen))
+	if (!xfs_attr_namecheck(attr_flags, name, namelen))
 		return false;
 	if (valuelen > XATTR_SIZE_MAX || valuelen < 0)
 		return false;
-	if (hweight32(attr_flags & XFS_ATTR_NSP_ONDISK_MASK) > 1)
-		return false;
 	return true;
 }
 
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 5ad14be760adc..4d4fb804c0016 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -486,6 +486,10 @@ xfs_attri_validate(
 	if (attrp->alfi_attr_filter & ~XFS_ATTRI_FILTER_MASK)
 		return false;
 
+	if (!xfs_attr_check_namespace(attrp->alfi_attr_filter &
+				      XFS_ATTR_NSP_ONDISK_MASK))
+		return false;
+
 	switch (op) {
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
@@ -629,7 +633,8 @@ xfs_attr_recover_work(
 	 */
 	attrp = &attrip->attri_format;
 	if (!xfs_attri_validate(mp, attrp) ||
-	    !xfs_attr_namecheck(nv->name.i_addr, nv->name.i_len))
+	    !xfs_attr_namecheck(attrp->alfi_attr_filter, nv->name.i_addr,
+				nv->name.i_len))
 		return -EFSCORRUPTED;
 
 	attr = xfs_attri_recover_work(mp, dfp, attrp, &ip, nv);
@@ -801,7 +806,8 @@ xlog_recover_attri_commit_pass2(
 	}
 
 	attr_name = item->ri_buf[i].i_addr;
-	if (!xfs_attr_namecheck(attr_name, name_len)) {
+	if (!xfs_attr_namecheck(attri_formatp->alfi_attr_filter, attr_name,
+				name_len)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 				attri_formatp, len);
 		return -EFSCORRUPTED;
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 97c8f3dcfb89d..903ed46c68872 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -82,7 +82,8 @@ xfs_attr_shortform_list(
 	     (dp->i_af.if_bytes + sf->count * 16) < context->bufsize)) {
 		for (i = 0, sfe = xfs_attr_sf_firstentry(sf); i < sf->count; i++) {
 			if (XFS_IS_CORRUPT(context->dp->i_mount,
-					   !xfs_attr_namecheck(sfe->nameval,
+					   !xfs_attr_namecheck(sfe->flags,
+							       sfe->nameval,
 							       sfe->namelen))) {
 				xfs_dirattr_mark_sick(context->dp, XFS_ATTR_FORK);
 				return -EFSCORRUPTED;
@@ -122,7 +123,8 @@ xfs_attr_shortform_list(
 	for (i = 0, sfe = xfs_attr_sf_firstentry(sf); i < sf->count; i++) {
 		if (unlikely(
 		    ((char *)sfe < (char *)sf) ||
-		    ((char *)sfe >= ((char *)sf + dp->i_af.if_bytes)))) {
+		    ((char *)sfe >= ((char *)sf + dp->i_af.if_bytes)) ||
+		    !xfs_attr_check_namespace(sfe->flags))) {
 			XFS_CORRUPTION_ERROR("xfs_attr_shortform_list",
 					     XFS_ERRLEVEL_LOW,
 					     context->dp->i_mount, sfe,
@@ -177,7 +179,7 @@ xfs_attr_shortform_list(
 			cursor->offset = 0;
 		}
 		if (XFS_IS_CORRUPT(context->dp->i_mount,
-				   !xfs_attr_namecheck(sbp->name,
+				   !xfs_attr_namecheck(sbp->flags, sbp->name,
 						       sbp->namelen))) {
 			xfs_dirattr_mark_sick(context->dp, XFS_ATTR_FORK);
 			error = -EFSCORRUPTED;
@@ -502,7 +504,8 @@ xfs_attr3_leaf_list_int(
 		}
 
 		if (XFS_IS_CORRUPT(context->dp->i_mount,
-				   !xfs_attr_namecheck(name, namelen))) {
+				   !xfs_attr_namecheck(entry->flags, name,
+						       namelen))) {
 			xfs_dirattr_mark_sick(context->dp, XFS_ATTR_FORK);
 			return -EFSCORRUPTED;
 		}



* [PATCH 01/32] xfs: rearrange xfs_attr_match parameters
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
@ 2024-04-10  0:53   ` Darrick J. Wong
  2024-04-10  5:10     ` Christoph Hellwig
  2024-04-10  0:54   ` [PATCH 02/32] xfs: check the flags earlier in xfs_attr_match Darrick J. Wong
                     ` (30 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:53 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Rearrange the parameters to this function so that they match the order
of attr listent: attr_flags -> name -> namelen -> value -> valuelen.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr_leaf.c |   23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 7929caf2052f7..53ef784e3049e 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -509,9 +509,9 @@ xfs_attr3_leaf_read(
 static bool
 xfs_attr_match(
 	struct xfs_da_args	*args,
-	uint8_t			namelen,
-	unsigned char		*name,
-	int			flags)
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen)
 {
 
 	if (args->namelen != namelen)
@@ -521,12 +521,12 @@ xfs_attr_match(
 
 	/* Recovery ignores the INCOMPLETE flag. */
 	if ((args->op_flags & XFS_DA_OP_RECOVERY) &&
-	    args->attr_filter == (flags & XFS_ATTR_NSP_ONDISK_MASK))
+	    args->attr_filter == (attr_flags & XFS_ATTR_NSP_ONDISK_MASK))
 		return true;
 
 	/* All remaining matches need to be filtered by INCOMPLETE state. */
 	if (args->attr_filter !=
-	    (flags & (XFS_ATTR_NSP_ONDISK_MASK | XFS_ATTR_INCOMPLETE)))
+	    (attr_flags & (XFS_ATTR_NSP_ONDISK_MASK | XFS_ATTR_INCOMPLETE)))
 		return false;
 	return true;
 }
@@ -745,8 +745,8 @@ xfs_attr_sf_findname(
 	for (sfe = xfs_attr_sf_firstentry(sf);
 	     sfe < xfs_attr_sf_endptr(sf);
 	     sfe = xfs_attr_sf_nextentry(sfe)) {
-		if (xfs_attr_match(args, sfe->namelen, sfe->nameval,
-				sfe->flags))
+		if (xfs_attr_match(args, sfe->flags, sfe->nameval,
+					sfe->namelen))
 			return sfe;
 	}
 
@@ -2442,15 +2442,16 @@ xfs_attr3_leaf_lookup_int(
  */
 		if (entry->flags & XFS_ATTR_LOCAL) {
 			name_loc = xfs_attr3_leaf_name_local(leaf, probe);
-			if (!xfs_attr_match(args, name_loc->namelen,
-					name_loc->nameval, entry->flags))
+			if (!xfs_attr_match(args, entry->flags,
+						name_loc->nameval,
+						name_loc->namelen))
 				continue;
 			args->index = probe;
 			return -EEXIST;
 		} else {
 			name_rmt = xfs_attr3_leaf_name_remote(leaf, probe);
-			if (!xfs_attr_match(args, name_rmt->namelen,
-					name_rmt->name, entry->flags))
+			if (!xfs_attr_match(args, entry->flags, name_rmt->name,
+						name_rmt->namelen))
 				continue;
 			args->index = probe;
 			args->rmtvaluelen = be32_to_cpu(name_rmt->valuelen);



* [PATCH 02/32] xfs: check the flags earlier in xfs_attr_match
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
  2024-04-10  0:53   ` [PATCH 01/32] xfs: rearrange xfs_attr_match parameters Darrick J. Wong
@ 2024-04-10  0:54   ` Darrick J. Wong
  2024-04-10  0:54   ` [PATCH 03/32] xfs: move xfs_attr_defer_add to xfs_attr_item.c Darrick J. Wong
                     ` (29 subsequent siblings)
  31 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:54 UTC (permalink / raw)
  To: djwong
  Cc: Christoph Hellwig, catherine.hoang, hch, allison.henderson, linux-xfs

From: Christoph Hellwig <hch@lst.de>

Checking the flags match is much cheaper than a memcmp, so do it early
on in xfs_attr_match, and also add a little helper to calculate the
match mask right under the comment explaining the logic for it.
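
The ordering argument can be seen in a small standalone sketch.  The
struct, flag values and function names here are assumptions made for
illustration; only the shape of the comparison follows the patch.

```c
#include <stdbool.h>
#include <string.h>

#define DEMO_NSP_MASK	0x06u	/* hypothetical namespace bits */
#define DEMO_INCOMPLETE	0x80u	/* hypothetical incomplete bit */

/* Hypothetical cut-down lookup arguments. */
struct demo_args {
	const unsigned char	*name;
	unsigned int		namelen;
	unsigned int		attr_filter;
	bool			recovery;
};

/* Recovery ignores the incomplete bit; everything else compares it. */
static unsigned int
demo_match_mask(const struct demo_args *args)
{
	if (args->recovery)
		return DEMO_NSP_MASK;
	return DEMO_NSP_MASK | DEMO_INCOMPLETE;
}

static bool
demo_match(const struct demo_args *args, unsigned int attr_flags,
	   const unsigned char *name, unsigned int namelen)
{
	unsigned int	mask = demo_match_mask(args);

	/* Cheap integer comparisons first... */
	if (args->namelen != namelen)
		return false;
	if ((args->attr_filter & mask) != (attr_flags & mask))
		return false;
	/* ...and the byte-by-byte comparison only if those passed. */
	return memcmp(args->name, name, namelen) == 0;
}
```

Computing the mask once also folds the two separate flag comparisons of
the old code into a single masked equality test.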

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr_leaf.c |   19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 53ef784e3049e..9cb3a5d1c07d1 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -506,6 +506,13 @@ xfs_attr3_leaf_read(
  * INCOMPLETE flag will not be set in attr->attr_filter, but rather
  * XFS_DA_OP_RECOVERY will be set in args->op_flags.
  */
+static inline unsigned int xfs_attr_match_mask(const struct xfs_da_args *args)
+{
+	if (args->op_flags & XFS_DA_OP_RECOVERY)
+		return XFS_ATTR_NSP_ONDISK_MASK;
+	return XFS_ATTR_NSP_ONDISK_MASK | XFS_ATTR_INCOMPLETE;
+}
+
 static bool
 xfs_attr_match(
 	struct xfs_da_args	*args,
@@ -513,21 +520,15 @@ xfs_attr_match(
 	const unsigned char	*name,
 	unsigned int		namelen)
 {
+	unsigned int		mask = xfs_attr_match_mask(args);
 
 	if (args->namelen != namelen)
 		return false;
+	if ((args->attr_filter & mask) != (attr_flags & mask))
+		return false;
 	if (memcmp(args->name, name, namelen) != 0)
 		return false;
 
-	/* Recovery ignores the INCOMPLETE flag. */
-	if ((args->op_flags & XFS_DA_OP_RECOVERY) &&
-	    args->attr_filter == (attr_flags & XFS_ATTR_NSP_ONDISK_MASK))
-		return true;
-
-	/* All remaining matches need to be filtered by INCOMPLETE state. */
-	if (args->attr_filter !=
-	    (attr_flags & (XFS_ATTR_NSP_ONDISK_MASK | XFS_ATTR_INCOMPLETE)))
-		return false;
 	return true;
 }
 



* [PATCH 03/32] xfs: move xfs_attr_defer_add to xfs_attr_item.c
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
  2024-04-10  0:53   ` [PATCH 01/32] xfs: rearrange xfs_attr_match parameters Darrick J. Wong
  2024-04-10  0:54   ` [PATCH 02/32] xfs: check the flags earlier in xfs_attr_match Darrick J. Wong
@ 2024-04-10  0:54   ` Darrick J. Wong
  2024-04-10  5:11     ` Christoph Hellwig
  2024-04-10  0:54   ` [PATCH 04/32] xfs: create a separate hashname function for extended attributes Darrick J. Wong
                     ` (28 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:54 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move the code that adds the incore xfs_attr_item deferred work data to a
transaction so that it lives with the ATTRI log item code.  This means
that the upper level extended attribute code no longer has to know about
the inner workings of the ATTRI log items.
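
The layering can be pictured with a small sketch in which callers only
see an abstract operation and the log item code owns the translation to
a log opcode.  All names and opcode values below are hypothetical and
chosen for illustration.

```c
/* What the upper level attr code sees: an abstract operation. */
enum demo_defer_op {
	DEMO_DEFER_SET,
	DEMO_DEFER_REMOVE,
	DEMO_DEFER_REPLACE,
};

/* Hypothetical log opcodes, private to the log item code. */
#define DEMO_LOG_OP_SET		1u
#define DEMO_LOG_OP_REMOVE	2u
#define DEMO_LOG_OP_REPLACE	3u

/* Translate the abstract operation into the opcode that gets logged. */
static unsigned int
demo_log_opcode(enum demo_defer_op op)
{
	switch (op) {
	case DEMO_DEFER_SET:
		return DEMO_LOG_OP_SET;
	case DEMO_DEFER_REMOVE:
		return DEMO_LOG_OP_REMOVE;
	case DEMO_DEFER_REPLACE:
		return DEMO_LOG_OP_REPLACE;
	}
	return 0;	/* not reached for valid enum values */
}
```

With the mapping kept in one place, a later change to the logged opcodes
would not need to touch the callers.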

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c |   37 +++----------------------------------
 fs/xfs/xfs_attr_item.c   |   30 ++++++++++++++++++++++++++++++
 fs/xfs/xfs_attr_item.h   |    8 ++++++++
 3 files changed, 41 insertions(+), 34 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 426a41b43f641..03df79f63f674 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -903,37 +903,6 @@ xfs_attr_lookup(
 	return error;
 }
 
-static void
-xfs_attr_defer_add(
-	struct xfs_da_args	*args,
-	unsigned int		op_flags)
-{
-
-	struct xfs_attr_intent	*new;
-
-	new = kmem_cache_zalloc(xfs_attr_intent_cache,
-			GFP_KERNEL | __GFP_NOFAIL);
-	new->xattri_op_flags = op_flags;
-	new->xattri_da_args = args;
-
-	switch (op_flags) {
-	case XFS_ATTRI_OP_FLAGS_SET:
-		new->xattri_dela_state = xfs_attr_init_add_state(args);
-		break;
-	case XFS_ATTRI_OP_FLAGS_REPLACE:
-		new->xattri_dela_state = xfs_attr_init_replace_state(args);
-		break;
-	case XFS_ATTRI_OP_FLAGS_REMOVE:
-		new->xattri_dela_state = xfs_attr_init_remove_state(args);
-		break;
-	default:
-		ASSERT(0);
-	}
-
-	xfs_defer_add(args->trans, &new->xattri_list, &xfs_attr_defer_type);
-	trace_xfs_attr_defer_add(new->xattri_dela_state, args->dp);
-}
-
 /*
  * Note: If args->value is NULL the attribute will be removed, just like the
  * Linux ->setattr API.
@@ -1023,14 +992,14 @@ xfs_attr_set(
 	case -EEXIST:
 		if (!args->value) {
 			/* if no value, we are performing a remove operation */
-			xfs_attr_defer_add(args, XFS_ATTRI_OP_FLAGS_REMOVE);
+			xfs_attr_defer_add(args, XFS_ATTR_DEFER_REMOVE);
 			break;
 		}
 
 		/* Pure create fails if the attr already exists */
 		if (args->xattr_flags & XATTR_CREATE)
 			goto out_trans_cancel;
-		xfs_attr_defer_add(args, XFS_ATTRI_OP_FLAGS_REPLACE);
+		xfs_attr_defer_add(args, XFS_ATTR_DEFER_REPLACE);
 		break;
 	case -ENOATTR:
 		/* Can't remove what isn't there. */
@@ -1040,7 +1009,7 @@ xfs_attr_set(
 		/* Pure replace fails if no existing attr to replace. */
 		if (args->xattr_flags & XATTR_REPLACE)
 			goto out_trans_cancel;
-		xfs_attr_defer_add(args, XFS_ATTRI_OP_FLAGS_SET);
+		xfs_attr_defer_add(args, XFS_ATTR_DEFER_SET);
 		break;
 	default:
 		goto out_trans_cancel;
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 4d4fb804c0016..04aa2c68d5e56 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -723,6 +723,36 @@ xfs_attr_create_done(
 	return &attrdp->attrd_item;
 }
 
+void
+xfs_attr_defer_add(
+	struct xfs_da_args	*args,
+	enum xfs_attr_defer_op	op)
+{
+	struct xfs_attr_intent	*new;
+
+	new = kmem_cache_zalloc(xfs_attr_intent_cache,
+			GFP_NOFS | __GFP_NOFAIL);
+	new->xattri_da_args = args;
+
+	switch (op) {
+	case XFS_ATTR_DEFER_SET:
+		new->xattri_op_flags = XFS_ATTRI_OP_FLAGS_SET;
+		new->xattri_dela_state = xfs_attr_init_add_state(args);
+		break;
+	case XFS_ATTR_DEFER_REPLACE:
+		new->xattri_op_flags = XFS_ATTRI_OP_FLAGS_REPLACE;
+		new->xattri_dela_state = xfs_attr_init_replace_state(args);
+		break;
+	case XFS_ATTR_DEFER_REMOVE:
+		new->xattri_op_flags = XFS_ATTRI_OP_FLAGS_REMOVE;
+		new->xattri_dela_state = xfs_attr_init_remove_state(args);
+		break;
+	}
+
+	xfs_defer_add(args->trans, &new->xattri_list, &xfs_attr_defer_type);
+	trace_xfs_attr_defer_add(new->xattri_dela_state, args->dp);
+}
+
 const struct xfs_defer_op_type xfs_attr_defer_type = {
 	.name		= "attr",
 	.max_items	= 1,
diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
index 3280a79302876..c32b669b0e16a 100644
--- a/fs/xfs/xfs_attr_item.h
+++ b/fs/xfs/xfs_attr_item.h
@@ -51,4 +51,12 @@ struct xfs_attrd_log_item {
 extern struct kmem_cache	*xfs_attri_cache;
 extern struct kmem_cache	*xfs_attrd_cache;
 
+enum xfs_attr_defer_op {
+	XFS_ATTR_DEFER_SET,
+	XFS_ATTR_DEFER_REMOVE,
+	XFS_ATTR_DEFER_REPLACE,
+};
+
+void xfs_attr_defer_add(struct xfs_da_args *args, enum xfs_attr_defer_op op);
+
 #endif	/* __XFS_ATTR_ITEM_H__ */



* [PATCH 04/32] xfs: create a separate hashname function for extended attributes
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-04-10  0:54   ` [PATCH 03/32] xfs: move xfs_attr_defer_add to xfs_attr_item.c Darrick J. Wong
@ 2024-04-10  0:54   ` Darrick J. Wong
  2024-04-10  5:11     ` Christoph Hellwig
  2024-04-10  0:54   ` [PATCH 05/32] xfs: add parent pointer support to attribute code Darrick J. Wong
                     ` (27 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:54 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a separate function to compute name hashvalues for extended
attributes.  When we get to parent pointers we'll be altering the rules
so that metadump obfuscation doesn't turn heinous.
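
A minimal userspace sketch of the call shape this patch introduces, and of
why callers now compute the hash last: once the hash helper is handed the
namespace flags and the value, those fields have to be filled in before it
runs.  struct demo_args, toy_namehash() and every demo_ name are
hypothetical stand-ins, not kernel code; only the argument list mirrors
xfs_attr_hashval().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_da_args; only the hashed fields. */
struct demo_args {
	const uint8_t	*name;
	int		namelen;
	const void	*value;
	int		valuelen;
	unsigned int	attr_filter;
	uint32_t	hashval;
};

/* Toy name hash, NOT xfs_da_hashname(); any deterministic function works. */
static uint32_t toy_namehash(const uint8_t *name, int namelen)
{
	uint32_t	hash = 0;
	int		i;

	for (i = 0; i < namelen; i++)
		hash = ((hash << 7) | (hash >> 25)) ^ name[i];
	return hash;
}

/*
 * Same argument list as xfs_attr_hashval().  Flags and value are accepted
 * but ignored for now, so a later namespace can change the rules without
 * touching any caller.
 */
static uint32_t demo_attr_hashval(unsigned int attr_flags,
		const uint8_t *name, int namelen,
		const void *value, int valuelen)
{
	(void)attr_flags;
	(void)value;
	(void)valuelen;
	return toy_namehash(name, namelen);
}

/* Call only after name, value and attr_filter have all been set. */
static void demo_attr_sethash(struct demo_args *args)
{
	assert(args->name != NULL);
	args->hashval = demo_attr_hashval(args->attr_filter, args->name,
			args->namelen, args->value, args->valuelen);
}
```

With this shape, two args structures that differ only in value still hash
identically today, which is the behaviour the patch preserves.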

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c      |   28 ++++++++++++++++++++++++++--
 fs/xfs/libxfs/xfs_attr.h      |   14 ++++++++++++++
 fs/xfs/libxfs/xfs_attr_leaf.c |    3 +--
 fs/xfs/scrub/attr.c           |   11 ++++++++---
 fs/xfs/xfs_attr_item.c        |    2 +-
 fs/xfs/xfs_attr_list.c        |    5 ++++-
 6 files changed, 54 insertions(+), 9 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 03df79f63f674..30988d60162c7 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -282,7 +282,7 @@ xfs_attr_get(
 		args->owner = args->dp->i_ino;
 	args->geo = args->dp->i_mount->m_attr_geo;
 	args->whichfork = XFS_ATTR_FORK;
-	args->hashval = xfs_da_hashname(args->name, args->namelen);
+	xfs_attr_sethash(args);
 
 	/* Entirely possible to look up a name which doesn't exist */
 	args->op_flags = XFS_DA_OP_OKNOENT;
@@ -417,6 +417,30 @@ xfs_attr_sf_addname(
 	return error;
 }
 
+/* Compute the hash value for a user/root/secure extended attribute */
+xfs_dahash_t
+xfs_attr_hashname(
+	const uint8_t		*name,
+	int			namelen)
+{
+	return xfs_da_hashname(name, namelen);
+}
+
+/* Compute the hash value for any extended attribute from any namespace. */
+xfs_dahash_t
+xfs_attr_hashval(
+	struct xfs_mount	*mp,
+	unsigned int		attr_flags,
+	const uint8_t		*name,
+	int			namelen,
+	const void		*value,
+	int			valuelen)
+{
+	ASSERT(xfs_attr_check_namespace(attr_flags));
+
+	return xfs_attr_hashname(name, namelen);
+}
+
 /*
  * Handle the state change on completion of a multi-state attr operation.
  *
@@ -932,7 +956,7 @@ xfs_attr_set(
 		args->owner = args->dp->i_ino;
 	args->geo = mp->m_attr_geo;
 	args->whichfork = XFS_ATTR_FORK;
-	args->hashval = xfs_da_hashname(args->name, args->namelen);
+	xfs_attr_sethash(args);
 
 	/*
 	 * We have no control over the attribute names that userspace passes us
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 3813f7ae626a2..df91c94d5bf5c 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -620,6 +620,20 @@ xfs_attr_init_replace_state(struct xfs_da_args *args)
 	return xfs_attr_init_add_state(args);
 }
 
+xfs_dahash_t xfs_attr_hashname(const uint8_t *name, int namelen);
+
+xfs_dahash_t xfs_attr_hashval(struct xfs_mount *mp, unsigned int attr_flags,
+		const uint8_t *name, int namelen, const void *value,
+		int valuelen);
+
+/* Set the hash value for any extended attribute from any namespace. */
+static inline void xfs_attr_sethash(struct xfs_da_args *args)
+{
+	args->hashval = xfs_attr_hashval(args->dp->i_mount, args->attr_filter,
+					 args->name, args->namelen,
+					 args->value, args->valuelen);
+}
+
 extern struct kmem_cache *xfs_attr_intent_cache;
 int __init xfs_attr_intent_init_cache(void);
 void xfs_attr_intent_destroy_cache(void);
diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 9cb3a5d1c07d1..490608bbed7ad 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -947,14 +947,13 @@ xfs_attr_shortform_to_leaf(
 		nargs.namelen = sfe->namelen;
 		nargs.value = &sfe->nameval[nargs.namelen];
 		nargs.valuelen = sfe->valuelen;
-		nargs.hashval = xfs_da_hashname(sfe->nameval,
-						sfe->namelen);
 		nargs.attr_filter = sfe->flags & XFS_ATTR_NSP_ONDISK_MASK;
 		if (!xfs_attr_check_namespace(sfe->flags)) {
 			xfs_da_mark_sick(args);
 			error = -EFSCORRUPTED;
 			goto out;
 		}
+		xfs_attr_sethash(&nargs);
 		error = xfs_attr3_leaf_lookup_int(bp, &nargs); /* set a->index */
 		ASSERT(error == -ENOATTR);
 		error = xfs_attr3_leaf_add(bp, &nargs);
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index fdff9be408186..d5c2e73be8623 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -179,7 +179,6 @@ xchk_xattr_actor(
 		.dp			= ip,
 		.name			= name,
 		.namelen		= namelen,
-		.hashval		= xfs_da_hashname(name, namelen),
 		.trans			= sc->tp,
 		.valuelen		= valuelen,
 		.owner			= ip->i_ino,
@@ -230,6 +229,7 @@ xchk_xattr_actor(
 
 	args.value = ab->value;
 
+	xfs_attr_sethash(&args);
 	error = xfs_attr_get_ilocked(&args);
 	/* ENODATA means the hash lookup failed and the attr is bad */
 	if (error == -ENODATA)
@@ -525,7 +525,10 @@ xchk_xattr_rec(
 			xchk_da_set_corrupt(ds, level);
 			goto out;
 		}
-		calc_hash = xfs_da_hashname(lentry->nameval, lentry->namelen);
+		calc_hash = xfs_attr_hashval(mp, ent->flags, lentry->nameval,
+					     lentry->namelen,
+					     lentry->nameval + lentry->namelen,
+					     be16_to_cpu(lentry->valuelen));
 	} else {
 		rentry = (struct xfs_attr_leaf_name_remote *)
 				(((char *)bp->b_addr) + nameidx);
@@ -533,7 +536,9 @@ xchk_xattr_rec(
 			xchk_da_set_corrupt(ds, level);
 			goto out;
 		}
-		calc_hash = xfs_da_hashname(rentry->name, rentry->namelen);
+		calc_hash = xfs_attr_hashval(mp, ent->flags, rentry->name,
+					     rentry->namelen, NULL,
+					     be32_to_cpu(rentry->valuelen));
 	}
 	if (calc_hash != hash)
 		xchk_da_set_corrupt(ds, level);
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 04aa2c68d5e56..8f91016fc3cf8 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -578,13 +578,13 @@ xfs_attri_recover_work(
 	args->whichfork = XFS_ATTR_FORK;
 	args->name = nv->name.i_addr;
 	args->namelen = nv->name.i_len;
-	args->hashval = xfs_da_hashname(args->name, args->namelen);
 	args->value = nv->value.i_addr;
 	args->valuelen = nv->value.i_len;
 	args->attr_filter = attrp->alfi_attr_filter & XFS_ATTRI_FILTER_MASK;
 	args->op_flags = XFS_DA_OP_RECOVERY | XFS_DA_OP_OKNOENT |
 			 XFS_DA_OP_LOGGED;
 	args->owner = args->dp->i_ino;
+	xfs_attr_sethash(args);
 
 	switch (xfs_attr_intent_op(attr)) {
 	case XFS_ATTRI_OP_FLAGS_SET:
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 903ed46c68872..9bc4b5322539a 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -135,12 +135,15 @@ xfs_attr_shortform_list(
 		}
 
 		sbp->entno = i;
-		sbp->hash = xfs_da_hashname(sfe->nameval, sfe->namelen);
 		sbp->name = sfe->nameval;
 		sbp->namelen = sfe->namelen;
 		/* These are bytes, and both on-disk, don't endian-flip */
 		sbp->valuelen = sfe->valuelen;
 		sbp->flags = sfe->flags;
+		sbp->hash = xfs_attr_hashval(dp->i_mount, sfe->flags,
+					     sfe->nameval, sfe->namelen,
+					     sfe->nameval + sfe->namelen,
+					     sfe->valuelen);
 		sfe = xfs_attr_sf_nextentry(sfe);
 		sbp++;
 		nsbuf++;



* [PATCH 05/32] xfs: add parent pointer support to attribute code
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-04-10  0:54   ` [PATCH 04/32] xfs: create a separate hashname function for extended attributes Darrick J. Wong
@ 2024-04-10  0:54   ` Darrick J. Wong
  2024-04-10  5:11     ` Christoph Hellwig
  2024-04-10  0:55   ` [PATCH 06/32] xfs: define parent pointer ondisk extended attribute format Darrick J. Wong
                     ` (26 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:54 UTC (permalink / raw)
  To: djwong
  Cc: Mark Tinguely, Dave Chinner, Allison Henderson, catherine.hoang,
	hch, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Add the new parent attribute type.  XFS_ATTR_PARENT is used only for
parent pointer entries; like XFS_ATTR_ROOT, it uses reserved blocks.
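
As a rough illustration of how the new bit sits beside the existing
namespace flags, the sketch below copies the bit positions from the
xfs_da_format.h hunk in this patch.  The DEMO_ names and both helpers are
hypothetical, and the "at most one namespace bit" rule is an assumption
made for the example rather than something this patch states.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions as defined in the xfs_da_format.h hunk of this patch. */
#define DEMO_ATTR_LOCAL		(1u << 0)
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_PARENT	(1u << 3)
#define DEMO_ATTR_INCOMPLETE	(1u << 7)

/* Namespace mask after this patch: root, secure and parent. */
#define DEMO_ATTR_NSP_MASK	(DEMO_ATTR_ROOT | \
				 DEMO_ATTR_SECURE | \
				 DEMO_ATTR_PARENT)

/* Strip everything except the namespace bits from a flags byte. */
static unsigned int demo_attr_namespace(unsigned int flags)
{
	return flags & DEMO_ATTR_NSP_MASK;
}

/* Assumed rule: an attr belongs to at most one namespace. */
static bool demo_attr_namespace_ok(unsigned int flags)
{
	unsigned int	nsp = demo_attr_namespace(flags);

	/* True when zero or exactly one bit is set. */
	return (nsp & (nsp - 1)) == 0;
}
```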

Signed-off-by: Mark Tinguely <mark.tinguely@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_format.h  |    9 +++++++--
 fs/xfs/libxfs/xfs_log_format.h |    1 +
 fs/xfs/xfs_trace.h             |    3 ++-
 3 files changed, 10 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index ecd0616f5776a..0c80f7ab9475a 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -714,13 +714,17 @@ struct xfs_attr3_leafblock {
 #define	XFS_ATTR_LOCAL_BIT	0	/* attr is stored locally */
 #define	XFS_ATTR_ROOT_BIT	1	/* limit access to trusted attrs */
 #define	XFS_ATTR_SECURE_BIT	2	/* limit access to secure attrs */
+#define	XFS_ATTR_PARENT_BIT	3	/* parent pointer attrs */
 #define	XFS_ATTR_INCOMPLETE_BIT	7	/* attr in middle of create/delete */
 #define XFS_ATTR_LOCAL		(1u << XFS_ATTR_LOCAL_BIT)
 #define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
 #define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
+#define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
 #define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
 
-#define XFS_ATTR_NSP_ONDISK_MASK	(XFS_ATTR_ROOT | XFS_ATTR_SECURE)
+#define XFS_ATTR_NSP_ONDISK_MASK	(XFS_ATTR_ROOT | \
+					 XFS_ATTR_SECURE | \
+					 XFS_ATTR_PARENT)
 
 #define XFS_ATTR_ONDISK_MASK	(XFS_ATTR_NSP_ONDISK_MASK | \
 				 XFS_ATTR_LOCAL | \
@@ -729,7 +733,8 @@ struct xfs_attr3_leafblock {
 #define XFS_ATTR_NAMESPACE_STR \
 	{ XFS_ATTR_LOCAL,	"local" }, \
 	{ XFS_ATTR_ROOT,	"root" }, \
-	{ XFS_ATTR_SECURE,	"secure" }
+	{ XFS_ATTR_SECURE,	"secure" }, \
+	{ XFS_ATTR_PARENT,	"parent" }
 
 /*
  * Alignment for namelist and valuelist entries (since they are mixed
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index accba2acd623d..020aebd101432 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -1034,6 +1034,7 @@ struct xfs_icreate_log {
  */
 #define XFS_ATTRI_FILTER_MASK		(XFS_ATTR_ROOT | \
 					 XFS_ATTR_SECURE | \
+					 XFS_ATTR_PARENT | \
 					 XFS_ATTR_INCOMPLETE)
 
 /*
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index e9cf9430ce259..e6cbdffb14f64 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -91,7 +91,8 @@ struct xfs_exchrange;
 #define XFS_ATTR_FILTER_FLAGS \
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
 	{ XFS_ATTR_SECURE,	"SECURE" }, \
-	{ XFS_ATTR_INCOMPLETE,	"INCOMPLETE" }
+	{ XFS_ATTR_INCOMPLETE,	"INCOMPLETE" }, \
+	{ XFS_ATTR_PARENT,	"PARENT" }
 
 DECLARE_EVENT_CLASS(xfs_attr_list_class,
 	TP_PROTO(struct xfs_attr_list_context *ctx),



* [PATCH 06/32] xfs: define parent pointer ondisk extended attribute format
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-04-10  0:54   ` [PATCH 05/32] xfs: add parent pointer support to attribute code Darrick J. Wong
@ 2024-04-10  0:55   ` Darrick J. Wong
  2024-04-10  5:12     ` Christoph Hellwig
  2024-04-10  0:55   ` [PATCH 07/32] xfs: allow xattr matching on name and value for local/sf attrs Darrick J. Wong
                     ` (25 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:55 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

We need to define the parent pointer attribute format before we start
adding support for it into all the code that needs to use it. The EA
format we will use encodes the following information:

        name={dirent name}
        value={parent inumber, parent inode generation}
        hash=xfs_dir2_hashname(dirent name) ^ (parent_inumber)

The inode/gen gives all the information we need to reliably identify the
parent without requiring child->parent lock ordering, and allows
userspace to do pathname component level reconstruction without the
kernel ever needing to verify the parent itself as part of ioctl calls.

By using the name-value lookup mode in the extended attribute code to
match parent pointers using both the xattr name and value, we can
identify the exact parent pointer EA we need to modify/remove in
rename/unlink operations without searching the entire EA space.

By storing the dirent name, we have enough information to be able to
validate and reconstruct damaged directory trees.  Earlier iterations of
this patchset encoded the directory offset in the parent pointer key,
but this format required repair to keep that in sync across directory
rebuilds, which is unnecessary complexity.
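
A small userspace sketch of the encoding described above.  The 12-byte
value layout follows struct xfs_parent_rec from this patch; toy_dirhash()
is a hypothetical stand-in for the directory name hash, and folding the
64-bit inumber into 32 bits by XORing its halves is an assumption of the
sketch, since the description only says "^ (parent_inumber)".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same layout as struct xfs_parent_rec: 8-byte inumber, 4-byte generation. */
struct demo_parent_rec {
	uint8_t	p_ino[8];	/* big-endian parent inode number */
	uint8_t	p_gen[4];	/* big-endian parent generation */
};
_Static_assert(sizeof(struct demo_parent_rec) == 12, "ondisk size");

/* Encode the xattr value for one parent pointer. */
static void demo_parent_rec_init(struct demo_parent_rec *rec,
		uint64_t ino, uint32_t gen)
{
	int	i;

	for (i = 0; i < 8; i++)
		rec->p_ino[i] = (uint8_t)(ino >> (56 - 8 * i));
	for (i = 0; i < 4; i++)
		rec->p_gen[i] = (uint8_t)(gen >> (24 - 8 * i));
}

/* Toy stand-in for the directory name hash; NOT xfs_dir2_hashname(). */
static uint32_t toy_dirhash(const char *name)
{
	uint32_t	hash = 0;
	size_t		i, len = strlen(name);

	for (i = 0; i < len; i++)
		hash = ((hash << 7) | (hash >> 25)) ^ (uint8_t)name[i];
	return hash;
}

/*
 * hash = dirhash(dirent name) ^ parent inumber.  The 64-to-32-bit fold
 * is an assumption of this sketch.
 */
static uint32_t demo_parent_hash(const char *name, uint64_t parent_ino)
{
	return toy_dirhash(name) ^ (uint32_t)(parent_ino >> 32) ^
	       (uint32_t)parent_ino;
}
```

Mixing the parent inumber into the hash means that a file hardlinked under
the same name from two different directories gets two different hash
values, so the two parent pointers do not pile up on one hash bucket.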

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_format.h |   13 +++++++++++++
 fs/xfs/libxfs/xfs_ondisk.h    |    1 +
 2 files changed, 14 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 0c80f7ab9475a..1395ad1937c53 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -890,4 +890,17 @@ static inline unsigned int xfs_dir2_dirblock_bytes(struct xfs_sb *sbp)
 xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
 				      struct xfs_da3_blkinfo *hdr3);
 
+/*
+ * Parent pointer attribute format definition
+ *
+ * The xattr name contains the dirent name.
+ * The xattr value encodes the parent inode number and generation to ease
+ * opening parents by handle.
+ * The xattr hashval is xfs_dir2_hashname() ^ p_ino
+ */
+struct xfs_parent_rec {
+	__be64	p_ino;
+	__be32	p_gen;
+} __packed;
+
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h
index 81885a6a028ed..25952ef584eee 100644
--- a/fs/xfs/libxfs/xfs_ondisk.h
+++ b/fs/xfs/libxfs/xfs_ondisk.h
@@ -119,6 +119,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, offset,		1);
 	XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, name,		3);
 	XFS_CHECK_STRUCT_SIZE(xfs_dir2_sf_hdr_t,		10);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_rec,		12);
 
 	/* log structures */
 	XFS_CHECK_STRUCT_SIZE(struct xfs_buf_log_format,	88);



* [PATCH 07/32] xfs: allow xattr matching on name and value for local/sf attrs
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-04-10  0:55   ` [PATCH 06/32] xfs: define parent pointer ondisk extended attribute format Darrick J. Wong
@ 2024-04-10  0:55   ` Darrick J. Wong
  2024-04-10  5:16     ` Christoph Hellwig
  2024-04-10  0:55   ` [PATCH 08/32] xfs: allow logged xattr operations if parent pointers are enabled Darrick J. Wong
                     ` (24 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:55 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Teach the xattr matching code to compare both the name and the value
when the attr being examined is in the XFS_ATTR_PARENT namespace.  This
only works with shortform and local attributes.  Only parent pointers
need this functionality and parent pointers cannot be remote xattrs, so
this limitation is ok for now.
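
The matching rule can be sketched in userspace as follows.  struct
demo_key and demo_attr_match() are hypothetical; the flag value and the
12-byte record size come from earlier patches in this series.  The extra
NULL and length checks on the caller's value are hardening added for the
sketch, not behaviour taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ATTR_PARENT	(1u << 3)	/* XFS_ATTR_PARENT */
#define DEMO_PARENT_REC_SIZE	12		/* sizeof(struct xfs_parent_rec) */

/* Hypothetical lookup key: what the caller is searching for. */
struct demo_key {
	unsigned int	attr_filter;
	const void	*name;
	size_t		namelen;
	const void	*value;
	size_t		valuelen;
};

/* Compare one candidate attr (flags, name, value) against the key. */
static bool demo_attr_match(const struct demo_key *key, unsigned int flags,
		const void *name, size_t namelen,
		const void *value, size_t valuelen)
{
	/* The candidate must be in the namespace the caller asked for. */
	if ((flags & DEMO_ATTR_PARENT) != (key->attr_filter & DEMO_ATTR_PARENT))
		return false;
	if (key->namelen != namelen || memcmp(key->name, name, namelen) != 0)
		return false;

	/* Ordinary xattrs match on name alone. */
	if (!(flags & DEMO_ATTR_PARENT))
		return true;

	/* Parent pointers never use remote values (value == NULL). */
	if (value == NULL || valuelen != DEMO_PARENT_REC_SIZE)
		return false;

	/* Sketch-only hardening of the caller-supplied value. */
	if (key->value == NULL || key->valuelen != valuelen)
		return false;

	return memcmp(key->value, value, valuelen) == 0;
}
```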

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr_leaf.c |   44 +++++++++++++++++++++++++++++++++++++----
 1 file changed, 40 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 490608bbed7ad..7d74ade47d8f1 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -513,12 +513,33 @@ static inline unsigned int xfs_attr_match_mask(const struct xfs_da_args *args)
 	return XFS_ATTR_NSP_ONDISK_MASK | XFS_ATTR_INCOMPLETE;
 }
 
+static inline bool
+xfs_attr_parent_match(
+	const struct xfs_da_args	*args,
+	const void			*value,
+	unsigned int			valuelen)
+{
+	ASSERT(args->value != NULL);
+
+	/* Parent pointers do not use remote values */
+	if (!value)
+		return false;
+
+	/* The only value we support is a parent rec. */
+	if (valuelen != sizeof(struct xfs_parent_rec))
+		return false;
+
+	return memcmp(args->value, value, valuelen) == 0;
+}
+
 static bool
 xfs_attr_match(
 	struct xfs_da_args	*args,
 	unsigned int		attr_flags,
 	const unsigned char	*name,
-	unsigned int		namelen)
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen)
 {
 	unsigned int		mask = xfs_attr_match_mask(args);
 
@@ -529,6 +550,9 @@ xfs_attr_match(
 	if (memcmp(args->name, name, namelen) != 0)
 		return false;
 
+	if (attr_flags & XFS_ATTR_PARENT)
+		return xfs_attr_parent_match(args, value, valuelen);
+
 	return true;
 }
 
@@ -538,6 +562,13 @@ xfs_attr_copy_value(
 	unsigned char		*value,
 	int			valuelen)
 {
+	/*
+	 * Parent pointer lookups require the caller to specify the name and
+	 * value, so don't copy anything.
+	 */
+	if (args->attr_filter & XFS_ATTR_PARENT)
+		return 0;
+
 	/*
 	 * No copy if all we have to do is get the length
 	 */
@@ -747,7 +778,9 @@ xfs_attr_sf_findname(
 	     sfe < xfs_attr_sf_endptr(sf);
 	     sfe = xfs_attr_sf_nextentry(sfe)) {
 		if (xfs_attr_match(args, sfe->flags, sfe->nameval,
-					sfe->namelen))
+					sfe->namelen,
+					&sfe->nameval[sfe->namelen],
+					sfe->valuelen))
 			return sfe;
 	}
 
@@ -2444,14 +2477,17 @@ xfs_attr3_leaf_lookup_int(
 			name_loc = xfs_attr3_leaf_name_local(leaf, probe);
 			if (!xfs_attr_match(args, entry->flags,
 						name_loc->nameval,
-						name_loc->namelen))
+						name_loc->namelen,
+						&name_loc->nameval[name_loc->namelen],
+						be16_to_cpu(name_loc->valuelen)))
 				continue;
 			args->index = probe;
 			return -EEXIST;
 		} else {
 			name_rmt = xfs_attr3_leaf_name_remote(leaf, probe);
 			if (!xfs_attr_match(args, entry->flags, name_rmt->name,
-						name_rmt->namelen))
+						name_rmt->namelen, NULL,
+						be32_to_cpu(name_rmt->valuelen)))
 				continue;
 			args->index = probe;
 			args->rmtvaluelen = be32_to_cpu(name_rmt->valuelen);



* [PATCH 08/32] xfs: allow logged xattr operations if parent pointers are enabled
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-04-10  0:55   ` [PATCH 07/32] xfs: allow xattr matching on name and value for local/sf attrs Darrick J. Wong
@ 2024-04-10  0:55   ` Darrick J. Wong
  2024-04-10  5:18     ` Christoph Hellwig
  2024-04-10  0:55   ` [PATCH 09/32] xfs: log parent pointer xattr removal operations Darrick J. Wong
                     ` (23 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:55 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Don't trip this assertion about attr log items if we have parent
pointers enabled.  Parent pointers are an incompat feature that doesn't
use any of the functionality protected by
XFS_SB_FEAT_INCOMPAT_LOG_XATTRS, which is why this is ok.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 8f91016fc3cf8..c509bf841949f 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -474,7 +474,7 @@ xfs_attri_validate(
 {
 	unsigned int			op = xfs_attr_log_item_op(attrp);
 
-	if (!xfs_is_using_logged_xattrs(mp))
+	if (!xfs_is_using_logged_xattrs(mp) && !xfs_has_parent(mp))
 		return false;
 
 	if (attrp->__pad != 0)



* [PATCH 09/32] xfs: log parent pointer xattr removal operations
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-04-10  0:55   ` [PATCH 08/32] xfs: allow logged xattr operations if parent pointers are enabled Darrick J. Wong
@ 2024-04-10  0:55   ` Darrick J. Wong
  2024-04-10  5:18     ` Christoph Hellwig
  2024-04-10  0:56   ` [PATCH 10/32] xfs: log parent pointer xattr setting operations Darrick J. Wong
                     ` (22 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:55 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The parent pointer code needs to do a deferred parent pointer remove
operation with the xattr log intent code.  Declare a new logged xattr
opcode and push it through the log.
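
A simplified sketch of the extra checks a recovered PPTR_REMOVE intent has
to pass, modelled on the xfs_attri_validate() hunk below.  The opcode
numbers and the parent flag value come from this series; struct
demo_attri, the DEMO_ names, the 65536-byte size limit and the handling of
plain REMOVE are assumptions made so the example is self-contained.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_OP_SET		1
#define DEMO_OP_REMOVE		2
#define DEMO_OP_REPLACE		3
#define DEMO_OP_PPTR_REMOVE	5
#define DEMO_ATTR_PARENT	(1u << 3)
#define DEMO_XATTR_SIZE_MAX	65536	/* assumed value of XATTR_SIZE_MAX */

/* Hypothetical, trimmed-down view of the logged attr intent. */
struct demo_attri {
	unsigned int	op;
	unsigned int	attr_filter;
	unsigned int	value_len;
};

static bool demo_attri_validate(bool has_parent, const struct demo_attri *it)
{
	switch (it->op) {
	case DEMO_OP_PPTR_REMOVE:
		/* Only valid on filesystems with parent pointers. */
		if (!has_parent)
			return false;
		/* The value identifies which parent pointer to remove. */
		if (it->value_len == 0)
			return false;
		/* Must target the parent pointer namespace. */
		if (!(it->attr_filter & DEMO_ATTR_PARENT))
			return false;
		/* fall through */
	case DEMO_OP_SET:
	case DEMO_OP_REPLACE:
		if (it->value_len > DEMO_XATTR_SIZE_MAX)
			return false;
		break;
	case DEMO_OP_REMOVE:
		/* Assumption: a plain remove carries no value. */
		if (it->value_len != 0)
			return false;
		break;
	default:
		return false;
	}
	return true;
}
```

The point of the separate opcode is visible in the first case: unlike a
plain remove, a parent pointer remove is only meaningful when it carries
the value that says which parent to drop.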

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_log_format.h |    1 +
 fs/xfs/xfs_attr_item.c         |   53 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_attr_item.h         |    2 ++
 3 files changed, 56 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 020aebd101432..52dcee4b3abe6 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -1026,6 +1026,7 @@ struct xfs_icreate_log {
 #define XFS_ATTRI_OP_FLAGS_SET		1	/* Set the attribute */
 #define XFS_ATTRI_OP_FLAGS_REMOVE	2	/* Remove the attribute */
 #define XFS_ATTRI_OP_FLAGS_REPLACE	3	/* Replace the attribute */
+#define XFS_ATTRI_OP_FLAGS_PPTR_REMOVE	5	/* Remove parent pointer */
 #define XFS_ATTRI_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
 
 /*
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index c509bf841949f..5cce8a9863862 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -491,6 +491,14 @@ xfs_attri_validate(
 		return false;
 
 	switch (op) {
+	case XFS_ATTRI_OP_FLAGS_PPTR_REMOVE:
+		if (!xfs_has_parent(mp))
+			return false;
+		if (attrp->alfi_value_len == 0)
+			return false;
+		if (!(attrp->alfi_attr_filter & XFS_ATTR_PARENT))
+			return false;
+		fallthrough;
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
 		if (attrp->alfi_value_len > XATTR_SIZE_MAX)
@@ -595,6 +603,7 @@ xfs_attri_recover_work(
 		else
 			attr->xattri_dela_state = xfs_attr_init_add_state(args);
 		break;
+	case XFS_ATTRI_OP_FLAGS_PPTR_REMOVE:
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
 		attr->xattri_dela_state = xfs_attr_init_remove_state(args);
 		break;
@@ -753,6 +762,36 @@ xfs_attr_defer_add(
 	trace_xfs_attr_defer_add(new->xattri_dela_state, args->dp);
 }
 
+void
+xfs_attr_defer_parent(
+	struct xfs_da_args	*args,
+	enum xfs_attr_defer_op	op)
+{
+	struct xfs_attr_intent	*new;
+
+	ASSERT(xfs_has_parent(args->dp->i_mount));
+	ASSERT(args->attr_filter & XFS_ATTR_PARENT);
+	ASSERT(args->op_flags & XFS_DA_OP_LOGGED);
+
+	new = kmem_cache_zalloc(xfs_attr_intent_cache, GFP_NOFS | __GFP_NOFAIL);
+	new->xattri_da_args = args;
+
+	switch (op) {
+	case XFS_ATTR_DEFER_SET:
+	case XFS_ATTR_DEFER_REPLACE:
+		/* will be added in subsequent patches */
+		ASSERT(0);
+		break;
+	case XFS_ATTR_DEFER_REMOVE:
+		new->xattri_op_flags = XFS_ATTRI_OP_FLAGS_PPTR_REMOVE;
+		new->xattri_dela_state = xfs_attr_init_remove_state(args);
+		break;
+	}
+
+	xfs_defer_add(args->trans, &new->xattri_list, &xfs_attr_defer_type);
+	trace_xfs_attr_defer_add(new->xattri_dela_state, args->dp);
+}
+
 const struct xfs_defer_op_type xfs_attr_defer_type = {
 	.name		= "attr",
 	.max_items	= 1,
@@ -801,6 +840,16 @@ xlog_recover_attri_commit_pass2(
 	/* Check the number of log iovecs makes sense for the op code. */
 	op = xfs_attr_log_item_op(attri_formatp);
 	switch (op) {
+	case XFS_ATTRI_OP_FLAGS_PPTR_REMOVE:
+		/* Log item, attr name, attr value */
+		if (item->ri_total != 3) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					     attri_formatp, len);
+			return -EFSCORRUPTED;
+		}
+		name_len = attri_formatp->alfi_name_len;
+		value_len = attri_formatp->alfi_value_len;
+		break;
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
 		/* Log item, attr name, attr value */
@@ -876,12 +925,16 @@ xlog_recover_attri_commit_pass2(
 			return -EFSCORRUPTED;
 		}
 		fallthrough;
+	case XFS_ATTRI_OP_FLAGS_PPTR_REMOVE:
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
 		/*
 		 * Regular xattr set/remove/replace operations require a name
 		 * and do not take a newname.  Values are optional for set and
 		 * replace.
+		 *
+		 * Name-value remove operations must have a name, do not
+		 * take a newname, and can take a value.
 		 */
 		if (attr_name == NULL || name_len == 0) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
index c32b669b0e16a..f9efd674fd062 100644
--- a/fs/xfs/xfs_attr_item.h
+++ b/fs/xfs/xfs_attr_item.h
@@ -58,5 +58,7 @@ enum xfs_attr_defer_op {
 };
 
 void xfs_attr_defer_add(struct xfs_da_args *args, enum xfs_attr_defer_op op);
+void xfs_attr_defer_parent(struct xfs_da_args *args,
+		enum xfs_attr_defer_op op);
 
 #endif	/* __XFS_ATTR_ITEM_H__ */



* [PATCH 10/32] xfs: log parent pointer xattr setting operations
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-04-10  0:55   ` [PATCH 09/32] xfs: log parent pointer xattr removal operations Darrick J. Wong
@ 2024-04-10  0:56   ` Darrick J. Wong
  2024-04-10  0:56   ` [PATCH 11/32] xfs: log parent pointer xattr replace operations Darrick J. Wong
                     ` (21 subsequent siblings)
  31 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:56 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The parent pointer code needs to do a deferred parent pointer set
operation with the xattr log intent code.  Declare a new logged xattr
opcode and push it through the log.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_log_format.h |    1 +
 fs/xfs/xfs_attr_item.c         |    9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 52dcee4b3abe6..96732a212507e 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -1026,6 +1026,7 @@ struct xfs_icreate_log {
 #define XFS_ATTRI_OP_FLAGS_SET		1	/* Set the attribute */
 #define XFS_ATTRI_OP_FLAGS_REMOVE	2	/* Remove the attribute */
 #define XFS_ATTRI_OP_FLAGS_REPLACE	3	/* Replace the attribute */
+#define XFS_ATTRI_OP_FLAGS_PPTR_SET	4	/* Set parent pointer */
 #define XFS_ATTRI_OP_FLAGS_PPTR_REMOVE	5	/* Remove parent pointer */
 #define XFS_ATTRI_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
 
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 5cce8a9863862..d89495990f03b 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -491,6 +491,7 @@ xfs_attri_validate(
 		return false;
 
 	switch (op) {
+	case XFS_ATTRI_OP_FLAGS_PPTR_SET:
 	case XFS_ATTRI_OP_FLAGS_PPTR_REMOVE:
 		if (!xfs_has_parent(mp))
 			return false;
@@ -595,6 +596,7 @@ xfs_attri_recover_work(
 	xfs_attr_sethash(args);
 
 	switch (xfs_attr_intent_op(attr)) {
+	case XFS_ATTRI_OP_FLAGS_PPTR_SET:
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
 		args->total = xfs_attr_calc_size(args, &local);
@@ -778,6 +780,9 @@ xfs_attr_defer_parent(
 
 	switch (op) {
 	case XFS_ATTR_DEFER_SET:
+		new->xattri_op_flags = XFS_ATTRI_OP_FLAGS_PPTR_SET;
+		new->xattri_dela_state = xfs_attr_init_add_state(args);
+		break;
 	case XFS_ATTR_DEFER_REPLACE:
 		/* will be added in subsequent patches */
 		ASSERT(0);
@@ -841,6 +846,7 @@ xlog_recover_attri_commit_pass2(
 	op = xfs_attr_log_item_op(attri_formatp);
 	switch (op) {
 	case XFS_ATTRI_OP_FLAGS_PPTR_REMOVE:
+	case XFS_ATTRI_OP_FLAGS_PPTR_SET:
 		/* Log item, attr name, attr value */
 		if (item->ri_total != 3) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
@@ -926,6 +932,7 @@ xlog_recover_attri_commit_pass2(
 		}
 		fallthrough;
 	case XFS_ATTRI_OP_FLAGS_PPTR_REMOVE:
+	case XFS_ATTRI_OP_FLAGS_PPTR_SET:
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
 		/*
@@ -933,7 +940,7 @@ xlog_recover_attri_commit_pass2(
 		 * and do not take a newname.  Values are optional for set and
 		 * replace.
 		 *
-		 * Name-value remove operations must have a name, do not
+		 * Name-value set/remove operations must have a name, do not
 		 * take a newname, and can take a value.
 		 */
 		if (attr_name == NULL || name_len == 0) {



* [PATCH 11/32] xfs: log parent pointer xattr replace operations
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-04-10  0:56   ` [PATCH 10/32] xfs: log parent pointer xattr setting operations Darrick J. Wong
@ 2024-04-10  0:56   ` Darrick J. Wong
  2024-04-10  5:26     ` Christoph Hellwig
  2024-04-10  0:56   ` [PATCH 12/32] xfs: record inode generation in xattr update log intent items Darrick J. Wong
                     ` (20 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:56 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

The parent pointer code needs to do a deferred parent pointer replace
operation with the xattr log intent code.  Declare a new logged xattr
opcode and push it through the log.

(Formerly titled "xfs: Add new name to attri/d" and described as
follows:

This patch adds two new fields to the attri/d.  They are nname and
nnamelen.  This will be used for parent pointer updates since a
rename operation may cause the parent pointer to update both the
name and value.  So we need to carry both the new name as well as
the target name in the attri/d.)
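
A sketch of the hand-off between the two phases of a replace, under the
assumption that the operation first removes the old (name, value) pair and
then adds the new one.  struct demo_args is a hypothetical cut-down of
struct xfs_da_args with the new_* fields this patch adds.  The function
mirrors xfs_attr_update_pptr_replace_args() from the diff below, minus the
hash recalculation, which is left out to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of struct xfs_da_args. */
struct demo_args {
	const char	*name;		/* old dirent name */
	int		namelen;
	const void	*value;		/* old parent record */
	int		valuelen;
	const char	*new_name;	/* dirent name after the rename */
	int		new_namelen;
	const void	*new_value;	/* parent record after the rename */
	int		new_valuelen;
};

/*
 * Once the removal phase is done, promote the new name and value to the
 * canonical fields so the add phase operates on them.  The kernel helper
 * also recomputes the hash at this point.
 */
static void demo_update_pptr_replace_args(struct demo_args *args)
{
	assert(args->new_namelen > 0);
	args->name = args->new_name;
	args->namelen = args->new_namelen;
	args->value = args->new_value;
	args->valuelen = args->new_valuelen;
}
```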

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: reworked to handle new disk format]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c       |   19 ++++
 fs/xfs/libxfs/xfs_attr.h       |    4 -
 fs/xfs/libxfs/xfs_da_btree.h   |    4 +
 fs/xfs/libxfs/xfs_log_format.h |   20 ++++
 fs/xfs/xfs_attr_item.c         |  193 ++++++++++++++++++++++++++++++++++++----
 fs/xfs/xfs_attr_item.h         |    2 
 6 files changed, 218 insertions(+), 24 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 30988d60162c7..6e47c493bf9e2 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -441,6 +441,23 @@ xfs_attr_hashval(
 	return xfs_attr_hashname(name, namelen);
 }
 
+/*
+ * PPTR_REPLACE operations require the caller to set the old and new names and
+ * values explicitly.  Update the canonical fields to the new name and value
+ * here now that the removal phase has finished.
+ */
+static void
+xfs_attr_update_pptr_replace_args(
+	struct xfs_da_args	*args)
+{
+	ASSERT(args->new_namelen > 0);
+	args->name = args->new_name;
+	args->namelen = args->new_namelen;
+	args->value = args->new_value;
+	args->valuelen = args->new_valuelen;
+	xfs_attr_sethash(args);
+}
+
 /*
  * Handle the state change on completion of a multi-state attr operation.
  *
@@ -461,6 +478,8 @@ xfs_attr_complete_op(
 
 	if (!(args->op_flags & XFS_DA_OP_REPLACE))
 		replace_state = XFS_DAS_DONE;
+	else if (xfs_attr_intent_op(attr) == XFS_ATTRI_OP_FLAGS_PPTR_REPLACE)
+		xfs_attr_update_pptr_replace_args(args);
 
 	args->op_flags &= ~XFS_DA_OP_REPLACE;
 	args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index df91c94d5bf5c..d63305fc54155 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -510,8 +510,8 @@ struct xfs_attr_intent {
 	struct xfs_da_args		*xattri_da_args;
 
 	/*
-	 * Shared buffer containing the attr name and value so that the logging
-	 * code can share large memory buffers between log items.
+	 * Shared buffer containing the attr name, new name, and value so that
+	 * the logging code can share large memory buffers between log items.
 	 */
 	struct xfs_attri_log_nameval	*xattri_nameval;
 
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index 47485f5edae86..8d7a38fe2a5c0 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -55,7 +55,9 @@ enum xfs_dacmp {
 typedef struct xfs_da_args {
 	struct xfs_da_geometry *geo;	/* da block geometry */
 	const uint8_t	*name;		/* string (maybe not NULL terminated) */
+	const uint8_t	*new_name;	/* new attr name */
 	void		*value;		/* set of bytes (maybe contain NULLs) */
+	void		*new_value;	/* new xattr value (may contain NULLs) */
 	struct xfs_inode *dp;		/* directory inode to manipulate */
 	struct xfs_trans *trans;	/* current trans (changes over time) */
 
@@ -63,11 +65,13 @@ typedef struct xfs_da_args {
 	xfs_ino_t	owner;		/* inode that owns the dir/attr data */
 
 	int		valuelen;	/* length of value */
+	int		new_valuelen;	/* length of new_value */
 	uint8_t		filetype;	/* filetype of inode for directories */
 	uint8_t		op_flags;	/* operation flags */
 	uint8_t		attr_filter;	/* XFS_ATTR_{ROOT,SECURE,INCOMPLETE} */
 	uint8_t		xattr_flags;	/* XATTR_{CREATE,REPLACE} */
 	short		namelen;	/* length of string (maybe no NULL) */
+	short		new_namelen;	/* length of new attr name */
 	xfs_dahash_t	hashval;	/* hash value of name */
 	xfs_extlen_t	total;		/* total blocks needed, for 1st bmap */
 	int		whichfork;	/* data or attribute fork */
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 96732a212507e..632dd97324557 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -115,11 +115,13 @@ struct xfs_unmount_log_format {
 #define XLOG_REG_TYPE_BUD_FORMAT	26
 #define XLOG_REG_TYPE_ATTRI_FORMAT	27
 #define XLOG_REG_TYPE_ATTRD_FORMAT	28
-#define XLOG_REG_TYPE_ATTR_NAME	29
+#define XLOG_REG_TYPE_ATTR_NAME		29
 #define XLOG_REG_TYPE_ATTR_VALUE	30
 #define XLOG_REG_TYPE_XMI_FORMAT	31
 #define XLOG_REG_TYPE_XMD_FORMAT	32
-#define XLOG_REG_TYPE_MAX		32
+#define XLOG_REG_TYPE_ATTR_NEWNAME	33
+#define XLOG_REG_TYPE_ATTR_NEWVALUE	34
+#define XLOG_REG_TYPE_MAX		34
 
 /*
  * Flags to log operation header
@@ -1028,6 +1030,7 @@ struct xfs_icreate_log {
 #define XFS_ATTRI_OP_FLAGS_REPLACE	3	/* Replace the attribute */
 #define XFS_ATTRI_OP_FLAGS_PPTR_SET	4	/* Set parent pointer */
 #define XFS_ATTRI_OP_FLAGS_PPTR_REMOVE	5	/* Remove parent pointer */
+#define XFS_ATTRI_OP_FLAGS_PPTR_REPLACE	6	/* Replace parent pointer */
 #define XFS_ATTRI_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
 
 /*
@@ -1050,7 +1053,18 @@ struct xfs_attri_log_format {
 	uint64_t	alfi_id;	/* attri identifier */
 	uint64_t	alfi_ino;	/* the inode for this attr operation */
 	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
-	uint32_t	alfi_name_len;	/* attr name length */
+	union {
+		uint32_t	alfi_name_len;	/* attr name length */
+		struct {
+			/*
+			 * For PPTR_REPLACE, these are the lengths of the old
+			 * and new attr names.  The new and old values must
+			 * have the same length.
+			 */
+			uint16_t	alfi_old_name_len;
+			uint16_t	alfi_new_name_len;
+		};
+	};
 	uint32_t	alfi_value_len;	/* attr value length */
 	uint32_t	alfi_attr_filter;/* attr filter flags */
 };
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index d89495990f03b..8d33294217aca 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -73,8 +73,12 @@ static inline struct xfs_attri_log_nameval *
 xfs_attri_log_nameval_alloc(
 	const void			*name,
 	unsigned int			name_len,
+	const void			*new_name,
+	unsigned int			new_name_len,
 	const void			*value,
-	unsigned int			value_len)
+	unsigned int			value_len,
+	const void			*new_value,
+	unsigned int			new_value_len)
 {
 	struct xfs_attri_log_nameval	*nv;
 
@@ -83,15 +87,26 @@ xfs_attri_log_nameval_alloc(
 	 * this. But kvmalloc() utterly sucks, so we use our own version.
 	 */
 	nv = xlog_kvmalloc(sizeof(struct xfs_attri_log_nameval) +
-					name_len + value_len);
+					name_len + new_name_len + value_len +
+					new_value_len);
 
 	nv->name.i_addr = nv + 1;
 	nv->name.i_len = name_len;
 	nv->name.i_type = XLOG_REG_TYPE_ATTR_NAME;
 	memcpy(nv->name.i_addr, name, name_len);
 
+	if (new_name_len) {
+		nv->new_name.i_addr = nv->name.i_addr + name_len;
+		nv->new_name.i_len = new_name_len;
+		memcpy(nv->new_name.i_addr, new_name, new_name_len);
+	} else {
+		nv->new_name.i_addr = NULL;
+		nv->new_name.i_len = 0;
+	}
+	nv->new_name.i_type = XLOG_REG_TYPE_ATTR_NEWNAME;
+
 	if (value_len) {
-		nv->value.i_addr = nv->name.i_addr + name_len;
+		nv->value.i_addr = nv->name.i_addr + name_len + new_name_len;
 		nv->value.i_len = value_len;
 		memcpy(nv->value.i_addr, value, value_len);
 	} else {
@@ -100,6 +115,17 @@ xfs_attri_log_nameval_alloc(
 	}
 	nv->value.i_type = XLOG_REG_TYPE_ATTR_VALUE;
 
+	if (new_value_len) {
+		nv->new_value.i_addr = nv->name.i_addr + name_len +
+						new_name_len + value_len;
+		nv->new_value.i_len = new_value_len;
+		memcpy(nv->new_value.i_addr, new_value, new_value_len);
+	} else {
+		nv->new_value.i_addr = NULL;
+		nv->new_value.i_len = 0;
+	}
+	nv->new_value.i_type = XLOG_REG_TYPE_ATTR_NEWVALUE;
+
 	refcount_set(&nv->refcount, 1);
 	return nv;
 }
@@ -145,11 +171,20 @@ xfs_attri_item_size(
 	*nbytes += sizeof(struct xfs_attri_log_format) +
 			xlog_calc_iovec_len(nv->name.i_len);
 
-	if (!nv->value.i_len)
-		return;
+	if (nv->new_name.i_len) {
+		*nvecs += 1;
+		*nbytes += xlog_calc_iovec_len(nv->new_name.i_len);
+	}
 
-	*nvecs += 1;
-	*nbytes += xlog_calc_iovec_len(nv->value.i_len);
+	if (nv->value.i_len) {
+		*nvecs += 1;
+		*nbytes += xlog_calc_iovec_len(nv->value.i_len);
+	}
+
+	if (nv->new_value.i_len) {
+		*nvecs += 1;
+		*nbytes += xlog_calc_iovec_len(nv->new_value.i_len);
+	}
 }
 
 /*
@@ -179,15 +214,28 @@ xfs_attri_item_format(
 	ASSERT(nv->name.i_len > 0);
 	attrip->attri_format.alfi_size++;
 
+	if (nv->new_name.i_len > 0)
+		attrip->attri_format.alfi_size++;
+
 	if (nv->value.i_len > 0)
 		attrip->attri_format.alfi_size++;
 
+	if (nv->new_value.i_len > 0)
+		attrip->attri_format.alfi_size++;
+
 	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
 			&attrip->attri_format,
 			sizeof(struct xfs_attri_log_format));
 	xlog_copy_from_iovec(lv, &vecp, &nv->name);
+
+	if (nv->new_name.i_len > 0)
+		xlog_copy_from_iovec(lv, &vecp, &nv->new_name);
+
 	if (nv->value.i_len > 0)
 		xlog_copy_from_iovec(lv, &vecp, &nv->value);
+
+	if (nv->new_value.i_len > 0)
+		xlog_copy_from_iovec(lv, &vecp, &nv->new_value);
 }
 
 /*
@@ -333,7 +381,17 @@ xfs_attr_log_item(
 	ASSERT(!(attr->xattri_op_flags & ~XFS_ATTRI_OP_FLAGS_TYPE_MASK));
 	attrp->alfi_op_flags = attr->xattri_op_flags;
 	attrp->alfi_value_len = attr->xattri_nameval->value.i_len;
-	attrp->alfi_name_len = attr->xattri_nameval->name.i_len;
+
+	if (xfs_attr_log_item_op(attrp) == XFS_ATTRI_OP_FLAGS_PPTR_REPLACE) {
+		ASSERT(attr->xattri_nameval->value.i_len ==
+		       attr->xattri_nameval->new_value.i_len);
+
+		attrp->alfi_old_name_len = attr->xattri_nameval->name.i_len;
+		attrp->alfi_new_name_len = attr->xattri_nameval->new_name.i_len;
+	} else {
+		attrp->alfi_name_len = attr->xattri_nameval->name.i_len;
+	}
+
 	ASSERT(!(attr->xattri_da_args->attr_filter & ~XFS_ATTRI_FILTER_MASK));
 	attrp->alfi_attr_filter = attr->xattri_da_args->attr_filter;
 }
@@ -374,8 +432,11 @@ xfs_attr_create_intent(
 		 * Transfer our reference to the name/value buffer to the
 		 * deferred work state structure.
 		 */
-		attr->xattri_nameval = xfs_attri_log_nameval_alloc(args->name,
-				args->namelen, args->value, args->valuelen);
+		attr->xattri_nameval = xfs_attri_log_nameval_alloc(
+				args->name, args->namelen,
+				args->new_name, args->new_namelen,
+				args->value, args->valuelen,
+				args->new_value, args->new_valuelen);
 	}
 
 	attrip = xfs_attri_init(mp, attr->xattri_nameval);
@@ -477,9 +538,6 @@ xfs_attri_validate(
 	if (!xfs_is_using_logged_xattrs(mp) && !xfs_has_parent(mp))
 		return false;
 
-	if (attrp->__pad != 0)
-		return false;
-
 	if (attrp->alfi_op_flags & ~XFS_ATTRI_OP_FLAGS_TYPE_MASK)
 		return false;
 
@@ -515,6 +573,21 @@ xfs_attri_validate(
 		    attrp->alfi_name_len > XATTR_NAME_MAX)
 			return false;
 		break;
+	case XFS_ATTRI_OP_FLAGS_PPTR_REPLACE:
+		if (!xfs_has_parent(mp))
+			return false;
+		if (attrp->alfi_old_name_len == 0 ||
+		    attrp->alfi_old_name_len > XATTR_NAME_MAX)
+			return false;
+		if (attrp->alfi_new_name_len == 0 ||
+		    attrp->alfi_new_name_len > XATTR_NAME_MAX)
+			return false;
+		if (attrp->alfi_value_len == 0 ||
+		    attrp->alfi_value_len > XATTR_SIZE_MAX)
+			return false;
+		if (!(attrp->alfi_attr_filter & XFS_ATTR_PARENT))
+			return false;
+		break;
 	default:
 		return false;
 	}
@@ -587,8 +660,12 @@ xfs_attri_recover_work(
 	args->whichfork = XFS_ATTR_FORK;
 	args->name = nv->name.i_addr;
 	args->namelen = nv->name.i_len;
+	args->new_name = nv->new_name.i_addr;
+	args->new_namelen = nv->new_name.i_len;
 	args->value = nv->value.i_addr;
 	args->valuelen = nv->value.i_len;
+	args->new_value = nv->new_value.i_addr;
+	args->new_valuelen = nv->new_value.i_len;
 	args->attr_filter = attrp->alfi_attr_filter & XFS_ATTRI_FILTER_MASK;
 	args->op_flags = XFS_DA_OP_RECOVERY | XFS_DA_OP_OKNOENT |
 			 XFS_DA_OP_LOGGED;
@@ -597,6 +674,7 @@ xfs_attri_recover_work(
 
 	switch (xfs_attr_intent_op(attr)) {
 	case XFS_ATTRI_OP_FLAGS_PPTR_SET:
+	case XFS_ATTRI_OP_FLAGS_PPTR_REPLACE:
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
 		args->total = xfs_attr_calc_size(args, &local);
@@ -706,7 +784,14 @@ xfs_attr_relog_intent(
 	new_attrp->alfi_ino = old_attrp->alfi_ino;
 	new_attrp->alfi_op_flags = old_attrp->alfi_op_flags;
 	new_attrp->alfi_value_len = old_attrp->alfi_value_len;
-	new_attrp->alfi_name_len = old_attrp->alfi_name_len;
+
+	if (xfs_attr_log_item_op(old_attrp) == XFS_ATTRI_OP_FLAGS_PPTR_REPLACE) {
+		new_attrp->alfi_new_name_len = old_attrp->alfi_new_name_len;
+		new_attrp->alfi_old_name_len = old_attrp->alfi_old_name_len;
+	} else {
+		new_attrp->alfi_name_len = old_attrp->alfi_name_len;
+	}
+
 	new_attrp->alfi_attr_filter = old_attrp->alfi_attr_filter;
 
 	return &new_attrip->attri_item;
@@ -784,8 +869,10 @@ xfs_attr_defer_parent(
 		new->xattri_dela_state = xfs_attr_init_add_state(args);
 		break;
 	case XFS_ATTR_DEFER_REPLACE:
-		/* will be added in subsequent patches */
-		ASSERT(0);
+		ASSERT(args->new_valuelen == args->valuelen);
+
+		new->xattri_op_flags = XFS_ATTRI_OP_FLAGS_PPTR_REPLACE;
+		new->xattri_dela_state = xfs_attr_init_replace_state(args);
 		break;
 	case XFS_ATTR_DEFER_REMOVE:
 		new->xattri_op_flags = XFS_ATTRI_OP_FLAGS_PPTR_REMOVE;
@@ -822,9 +909,13 @@ xlog_recover_attri_commit_pass2(
 	struct xfs_attri_log_nameval	*nv;
 	const void			*attr_name;
 	const void			*attr_value = NULL;
+	const void			*attr_new_name = NULL;
+	const void			*attr_new_value = NULL;
 	size_t				len;
 	unsigned int			name_len = 0;
 	unsigned int			value_len = 0;
+	unsigned int			new_name_len = 0;
+	unsigned int			new_value_len = 0;
 	unsigned int			op, i = 0;
 
 	/* Validate xfs_attri_log_format before the large memory allocation */
@@ -876,6 +967,20 @@ xlog_recover_attri_commit_pass2(
 		}
 		name_len = attri_formatp->alfi_name_len;
 		break;
+	case XFS_ATTRI_OP_FLAGS_PPTR_REPLACE:
+		/*
+		 * Log item, attr name, new attr name, attr value, new attr
+		 * value
+		 */
+		if (item->ri_total != 5) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					     attri_formatp, len);
+			return -EFSCORRUPTED;
+		}
+		name_len = attri_formatp->alfi_old_name_len;
+		new_name_len = attri_formatp->alfi_new_name_len;
+		new_value_len = value_len = attri_formatp->alfi_value_len;
+		break;
 	default:
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 				     attri_formatp, len);
@@ -899,12 +1004,31 @@ xlog_recover_attri_commit_pass2(
 	}
 	i++;
 
+	/* Validate the new attr name */
+	if (new_name_len > 0) {
+		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(new_name_len)) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					item->ri_buf[i].i_addr,
+					item->ri_buf[i].i_len);
+			return -EFSCORRUPTED;
+		}
+
+		attr_new_name = item->ri_buf[i].i_addr;
+		if (!xfs_attr_namecheck(attri_formatp->alfi_attr_filter,
+					attr_new_name, new_name_len)) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					item->ri_buf[i].i_addr,
+					item->ri_buf[i].i_len);
+			return -EFSCORRUPTED;
+		}
+		i++;
+	}
+
 	/* Validate the attr value, if present */
 	if (value_len != 0) {
 		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(value_len)) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-					item->ri_buf[0].i_addr,
-					item->ri_buf[0].i_len);
+					attri_formatp, len);
 			return -EFSCORRUPTED;
 		}
 
@@ -912,6 +1036,18 @@ xlog_recover_attri_commit_pass2(
 		i++;
 	}
 
+	/* Validate the new attr value, if present */
+	if (new_value_len != 0) {
+		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(new_value_len)) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					attri_formatp, len);
+			return -EFSCORRUPTED;
+		}
+
+		attr_new_value = item->ri_buf[i].i_addr;
+		i++;
+	}
+
 	/*
 	 * Make sure we got the correct number of buffers for the operation
 	 * that we just loaded.
@@ -949,6 +1085,23 @@ xlog_recover_attri_commit_pass2(
 			return -EFSCORRUPTED;
 		}
 		break;
+	case XFS_ATTRI_OP_FLAGS_PPTR_REPLACE:
+		/*
+		 * Name-value replace operations require the caller to
+		 * specify the old and new names and values explicitly.
+		 * Values are optional.
+		 */
+		if (attr_name == NULL || name_len == 0) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					     attri_formatp, len);
+			return -EFSCORRUPTED;
+		}
+		if (attr_new_name == NULL || new_name_len == 0) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					     attri_formatp, len);
+			return -EFSCORRUPTED;
+		}
+		break;
 	}
 
 	/*
@@ -957,7 +1110,9 @@ xlog_recover_attri_commit_pass2(
 	 * reference.
 	 */
 	nv = xfs_attri_log_nameval_alloc(attr_name, name_len,
-			attr_value, value_len);
+			attr_new_name, new_name_len,
+			attr_value, value_len,
+			attr_new_value, new_value_len);
 
 	attrip = xfs_attri_init(mp, nv);
 	memcpy(&attrip->attri_format, attri_formatp, len);
diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
index f9efd674fd062..d5e4658f711d1 100644
--- a/fs/xfs/xfs_attr_item.h
+++ b/fs/xfs/xfs_attr_item.h
@@ -13,7 +13,9 @@ struct kmem_zone;
 
 struct xfs_attri_log_nameval {
 	struct xfs_log_iovec	name;
+	struct xfs_log_iovec	new_name;	/* PPTR_REPLACE only */
 	struct xfs_log_iovec	value;
+	struct xfs_log_iovec	new_value;	/* PPTR_REPLACE only */
 	refcount_t		refcount;
 
 	/* name and value follow the end of this struct */



* [PATCH 12/32] xfs: record inode generation in xattr update log intent items
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-04-10  0:56   ` [PATCH 11/32] xfs: log parent pointer xattr replace operations Darrick J. Wong
@ 2024-04-10  0:56   ` Darrick J. Wong
  2024-04-10  5:27     ` Christoph Hellwig
  2024-04-10  0:56   ` [PATCH 13/32] xfs: Expose init_xattrs in xfs_create_tmpfile Darrick J. Wong
                     ` (19 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:56 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

For parent pointer updates, record the i_generation of the file that is
being updated so that log recovery can refuse to replay the update if
the inode's generation no longer matches.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
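Reviewer note: reusing the old __pad field for alfi_igen leaves the
ondisk size of the log format unchanged, and recovery only compares
generations for the parent pointer opcodes.  The sketch below is a
userspace approximation of both points.  The demo_* names are
hypothetical, the struct is copied by hand from this series, and the -1
return merely stands in for -EFSCORRUPTED.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values as they appear in the xfs_log_format.h hunks of this series. */
#define DEMO_OP_REPLACE		3
#define DEMO_OP_PPTR_SET	4
#define DEMO_OP_PPTR_REMOVE	5
#define DEMO_OP_PPTR_REPLACE	6
#define DEMO_OP_TYPE_MASK	0xFF

/* Hand-copied, hypothetical mirror of struct xfs_attri_log_format. */
struct demo_attri_log_format {
	uint16_t	alfi_type;
	uint16_t	alfi_size;
	uint32_t	alfi_igen;	/* formerly __pad */
	uint64_t	alfi_id;
	uint64_t	alfi_ino;
	uint32_t	alfi_op_flags;
	union {
		uint32_t	alfi_name_len;
		struct {
			uint16_t	alfi_old_name_len;
			uint16_t	alfi_new_name_len;
		};
	};
	uint32_t	alfi_value_len;
	uint32_t	alfi_attr_filter;
};

/* Neither the union nor the renamed pad field changes the layout. */
static_assert(sizeof(struct demo_attri_log_format) == 40,
	      "log format size changed");
static_assert(offsetof(struct demo_attri_log_format, alfi_igen) == 4,
	      "alfi_igen must sit where __pad used to be");
static_assert(offsetof(struct demo_attri_log_format, alfi_value_len) == 32,
	      "name length union must stay four bytes wide");

/* Return 0 if replay may proceed, -1 if the generation does not match. */
int
demo_check_igen(
	const struct demo_attri_log_format	*attrp,
	uint32_t				inode_gen)
{
	switch (attrp->alfi_op_flags & DEMO_OP_TYPE_MASK) {
	case DEMO_OP_PPTR_SET:
	case DEMO_OP_PPTR_REMOVE:
	case DEMO_OP_PPTR_REPLACE:
		if (attrp->alfi_igen != inode_gen)
			return -1;
		break;
	default:
		/* Ordinary xattr ops do not record a generation. */
		break;
	}
	return 0;
}
```

Non-pptr opcodes are deliberately not checked here, matching the switch
added to xfs_attri_recover_work in this patch.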
 fs/xfs/libxfs/xfs_log_format.h |    2 +-
 fs/xfs/xfs_attr_item.c         |   26 ++++++++++++++++++++++++--
 2 files changed, 25 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 632dd97324557..3e6682ed656b3 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -1049,7 +1049,7 @@ struct xfs_icreate_log {
 struct xfs_attri_log_format {
 	uint16_t	alfi_type;	/* attri log item type */
 	uint16_t	alfi_size;	/* size of this item */
-	uint32_t	__pad;		/* pad to 64 bit aligned */
+	uint32_t	alfi_igen;	/* generation of alfi_ino for pptr ops */
 	uint64_t	alfi_id;	/* attri identifier */
 	uint64_t	alfi_ino;	/* the inode for this attr operation */
 	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 8d33294217aca..be8660a0b55ff 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -382,14 +382,22 @@ xfs_attr_log_item(
 	attrp->alfi_op_flags = attr->xattri_op_flags;
 	attrp->alfi_value_len = attr->xattri_nameval->value.i_len;
 
-	if (xfs_attr_log_item_op(attrp) == XFS_ATTRI_OP_FLAGS_PPTR_REPLACE) {
+	switch (xfs_attr_log_item_op(attrp)) {
+	case XFS_ATTRI_OP_FLAGS_PPTR_REPLACE:
 		ASSERT(attr->xattri_nameval->value.i_len ==
 		       attr->xattri_nameval->new_value.i_len);
 
+		attrp->alfi_igen = VFS_I(attr->xattri_da_args->dp)->i_generation;
 		attrp->alfi_old_name_len = attr->xattri_nameval->name.i_len;
 		attrp->alfi_new_name_len = attr->xattri_nameval->new_name.i_len;
-	} else {
+		break;
+	case XFS_ATTRI_OP_FLAGS_PPTR_REMOVE:
+	case XFS_ATTRI_OP_FLAGS_PPTR_SET:
+		attrp->alfi_igen = VFS_I(attr->xattri_da_args->dp)->i_generation;
+		fallthrough;
+	default:
 		attrp->alfi_name_len = attr->xattri_nameval->name.i_len;
+		break;
 	}
 
 	ASSERT(!(attr->xattri_da_args->attr_filter & ~XFS_ATTRI_FILTER_MASK));
@@ -632,6 +640,19 @@ xfs_attri_recover_work(
 	if (error)
 		return ERR_PTR(error);
 
+	switch (xfs_attr_log_item_op(attrp)) {
+	case XFS_ATTRI_OP_FLAGS_PPTR_SET:
+	case XFS_ATTRI_OP_FLAGS_PPTR_REPLACE:
+	case XFS_ATTRI_OP_FLAGS_PPTR_REMOVE:
+		if (VFS_I(ip)->i_generation != attrp->alfi_igen) {
+			xfs_irele(ip);
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					attrp, sizeof(*attrp));
+			return ERR_PTR(-EFSCORRUPTED);
+		}
+		break;
+	}
+
 	if (xfs_inode_has_attr_fork(ip)) {
 		error = xfs_attri_iread_extents(ip);
 		if (error) {
@@ -782,6 +803,7 @@ xfs_attr_relog_intent(
 	new_attrp = &new_attrip->attri_format;
 
 	new_attrp->alfi_ino = old_attrp->alfi_ino;
+	new_attrp->alfi_igen = old_attrp->alfi_igen;
 	new_attrp->alfi_op_flags = old_attrp->alfi_op_flags;
 	new_attrp->alfi_value_len = old_attrp->alfi_value_len;
 



* [PATCH 13/32] xfs: Expose init_xattrs in xfs_create_tmpfile
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (11 preceding siblings ...)
  2024-04-10  0:56   ` [PATCH 12/32] xfs: record inode generation in xattr update log intent items Darrick J. Wong
@ 2024-04-10  0:56   ` Darrick J. Wong
  2024-04-10  5:28     ` Christoph Hellwig
  2024-04-10  0:57   ` [PATCH 14/32] xfs: add parent pointer validator functions Darrick J. Wong
                     ` (18 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:56 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Temporary files are used as part of rename operations and will need
their attr forks initialized for parent pointers.  Expose the
init_xattrs parameter of xfs_create_tmpfile so that callers can request
that the fork be initialized.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c |    5 +++--
 fs/xfs/xfs_inode.h |    2 +-
 fs/xfs/xfs_iops.c  |    2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 2aec7ab59aeb7..c079114b97ecf 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1184,6 +1184,7 @@ xfs_create_tmpfile(
 	struct mnt_idmap	*idmap,
 	struct xfs_inode	*dp,
 	umode_t			mode,
+	bool			init_xattrs,
 	struct xfs_inode	**ipp)
 {
 	struct xfs_mount	*mp = dp->i_mount;
@@ -1224,7 +1225,7 @@ xfs_create_tmpfile(
 	error = xfs_dialloc(&tp, dp->i_ino, mode, &ino);
 	if (!error)
 		error = xfs_init_new_inode(idmap, tp, dp, ino, mode,
-				0, 0, prid, false, &ip);
+				0, 0, prid, init_xattrs, &ip);
 	if (error)
 		goto out_trans_cancel;
 
@@ -3036,7 +3037,7 @@ xfs_rename_alloc_whiteout(
 	int			error;
 
 	error = xfs_create_tmpfile(idmap, dp, S_IFCHR | WHITEOUT_MODE,
-				   &tmpfile);
+				   false, &tmpfile);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index a6da1ab8ab136..04a91e312993b 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -522,7 +522,7 @@ int		xfs_create(struct mnt_idmap *idmap,
 			   umode_t mode, dev_t rdev, bool need_xattr,
 			   struct xfs_inode **ipp);
 int		xfs_create_tmpfile(struct mnt_idmap *idmap,
-			   struct xfs_inode *dp, umode_t mode,
+			   struct xfs_inode *dp, umode_t mode, bool init_xattrs,
 			   struct xfs_inode **ipp);
 int		xfs_remove(struct xfs_inode *dp, struct xfs_name *name,
 			   struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 7f0c840f0fd2f..273bc30fd2bad 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -201,7 +201,7 @@ xfs_generic_create(
 				xfs_create_need_xattr(dir, default_acl, acl),
 				&ip);
 	} else {
-		error = xfs_create_tmpfile(idmap, XFS_I(dir), mode, &ip);
+		error = xfs_create_tmpfile(idmap, XFS_I(dir), mode, false, &ip);
 	}
 	if (unlikely(error))
 		goto out_free_acl;



* [PATCH 14/32] xfs: add parent pointer validator functions
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (12 preceding siblings ...)
  2024-04-10  0:56   ` [PATCH 13/32] xfs: Expose init_xattrs in xfs_create_tmpfile Darrick J. Wong
@ 2024-04-10  0:57   ` Darrick J. Wong
  2024-04-10  5:31     ` Christoph Hellwig
  2024-04-10  0:57   ` [PATCH 15/32] xfs: extend transaction reservations for parent attributes Darrick J. Wong
                     ` (17 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:57 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Attribute names of parent pointers are not strings, so we need to
modify xfs_attr_namecheck to verify parent pointer records when the
XFS_ATTR_PARENT flag is set.  At the same time, we need to validate attr
values during log recovery if the xattr is really a parent pointer.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: move functions to xfs_parent.c, adjust for new disk format]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
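Reviewer note: the two validators in this patch reduce to a few
mount-independent rules plus an inode number check.  The userspace
sketch below covers only the mount-independent part.  The demo_* names
and the flag value are hypothetical, the 12-byte record size is taken
from the comment in xfs_parent.c, and the xfs_verify_dir_ino step is
omitted because it needs filesystem geometry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAXNAMELEN		256		/* assumed to mirror MAXNAMELEN */
#define DEMO_ATTR_INCOMPLETE	(1u << 7)	/* hypothetical flag value */
#define DEMO_PARENT_REC_SIZE	12		/* per the xfs_parent.c comment */

/* A parent pointer name is a dirent name: bounded, no slashes, no nulls. */
bool
demo_parent_namecheck(
	unsigned int	attr_flags,
	const void	*name,
	size_t		length)
{
	/* Parent pointers are always logged, so never incomplete. */
	if (attr_flags & DEMO_ATTR_INCOMPLETE)
		return false;
	if (length >= DEMO_MAXNAMELEN)
		return false;
	return !memchr(name, '/', length) && !memchr(name, 0, length);
}

/* A parent pointer value must be exactly one parent record. */
bool
demo_parent_valuecheck(
	const void	*value,
	size_t		valuelen)
{
	if (valuelen != DEMO_PARENT_REC_SIZE)
		return false;
	if (value == NULL)
		return false;
	return true;
}
```

Ordinary xattr names only reject embedded nulls, which is why the
XFS_ATTR_PARENT case needs its own branch in xfs_attr_namecheck.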
 fs/xfs/Makefile            |    1 
 fs/xfs/libxfs/xfs_attr.c   |    5 ++
 fs/xfs/libxfs/xfs_parent.c |   92 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_parent.h |   15 +++++++
 fs/xfs/xfs_attr_item.c     |   15 +++++++
 5 files changed, 128 insertions(+)
 create mode 100644 fs/xfs/libxfs/xfs_parent.c
 create mode 100644 fs/xfs/libxfs/xfs_parent.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 4e1eb3b6dbc45..4956ea9a307b8 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -42,6 +42,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_inode_buf.o \
 				   xfs_log_rlimit.o \
 				   xfs_ag_resv.o \
+				   xfs_parent.o \
 				   xfs_rmap.o \
 				   xfs_rmap_btree.o \
 				   xfs_refcount.o \
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 6e47c493bf9e2..41de6a135d907 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -26,6 +26,7 @@
 #include "xfs_trace.h"
 #include "xfs_attr_item.h"
 #include "xfs_xattr.h"
+#include "xfs_parent.h"
 
 struct kmem_cache		*xfs_attr_intent_cache;
 
@@ -1570,6 +1571,10 @@ xfs_attr_namecheck(
 	if (length >= MAXNAMELEN)
 		return false;
 
+	/* Parent pointers have their own validation. */
+	if (attr_flags & XFS_ATTR_PARENT)
+		return xfs_parent_namecheck(attr_flags, name, length);
+
 	/* There shouldn't be any nulls here */
 	return !memchr(name, 0, length);
 }
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
new file mode 100644
index 0000000000000..5961fa8c85615
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -0,0 +1,92 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022-2024 Oracle.
+ * All rights reserved.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_format.h"
+#include "xfs_da_format.h"
+#include "xfs_log_format.h"
+#include "xfs_shared.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_inode.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_trans.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_attr_sf.h"
+#include "xfs_bmap.h"
+#include "xfs_defer.h"
+#include "xfs_log.h"
+#include "xfs_xattr.h"
+#include "xfs_parent.h"
+#include "xfs_trans_space.h"
+
+/*
+ * Parent pointer attribute handling.
+ *
+ * Because the attribute name is a filename component, it will never be longer
+ * than 255 bytes and must not contain nulls or slashes.  These are roughly the
+ * same constraints that apply to attribute names.
+ *
+ * The attribute value must always be a struct xfs_parent_rec.  This means the
+ * attribute will never be in remote format because 12 bytes is nowhere near
+ * xfs_attr_leaf_entsize_local_max() (~75% of block size).
+ *
+ * Creating a new parent attribute will always create a new attribute - there
+ * should never, ever be an existing attribute in the tree for a new inode.
+ * ENOSPC behavior is problematic - creating the inode without the parent
+ * pointer is effectively a corruption, so we allow parent attribute creation
+ * to dip into the reserve block pool to avoid unexpected ENOSPC errors from
+ * occurring.
+ */
+
+/* Return true if parent pointer attr name is valid. */
+bool
+xfs_parent_namecheck(
+	unsigned int			attr_flags,
+	const void			*name,
+	size_t				length)
+{
+	/*
+	 * Parent pointers always use logged operations, so there should never
+	 * be incomplete xattrs.
+	 */
+	if (attr_flags & XFS_ATTR_INCOMPLETE)
+		return false;
+
+	return xfs_dir2_namecheck(name, length);
+}
+
+/* Return true if parent pointer attr value is valid. */
+bool
+xfs_parent_valuecheck(
+	struct xfs_mount		*mp,
+	const void			*value,
+	size_t				valuelen)
+{
+	const struct xfs_parent_rec	*rec = value;
+
+	if (!xfs_has_parent(mp))
+		return false;
+
+	/* The xattr value must be a parent record. */
+	if (valuelen != sizeof(struct xfs_parent_rec))
+		return false;
+
+	/* The parent record must be local. */
+	if (value == NULL)
+		return false;
+
+	/* The parent inumber must be valid. */
+	if (!xfs_verify_dir_ino(mp, be64_to_cpu(rec->p_ino)))
+		return false;
+
+	return true;
+}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
new file mode 100644
index 0000000000000..ef8aff8607801
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -0,0 +1,15 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022-2024 Oracle.
+ * All Rights Reserved.
+ */
+#ifndef	__XFS_PARENT_H__
+#define	__XFS_PARENT_H__
+
+/* Metadata validators */
+bool xfs_parent_namecheck(unsigned int attr_flags, const void *name,
+		size_t length);
+bool xfs_parent_valuecheck(struct xfs_mount *mp, const void *value,
+		size_t valuelen);
+
+#endif /* __XFS_PARENT_H__ */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index be8660a0b55ff..84c63b9668ad9 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -27,6 +27,7 @@
 #include "xfs_error.h"
 #include "xfs_log_priv.h"
 #include "xfs_log_recover.h"
+#include "xfs_parent.h"
 
 struct kmem_cache		*xfs_attri_cache;
 struct kmem_cache		*xfs_attrd_cache;
@@ -1055,6 +1056,13 @@ xlog_recover_attri_commit_pass2(
 		}
 
 		attr_value = item->ri_buf[i].i_addr;
+		if ((attri_formatp->alfi_attr_filter & XFS_ATTR_PARENT) &&
+		    !xfs_parent_valuecheck(mp, attr_value, value_len)) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					item->ri_buf[i].i_addr,
+					item->ri_buf[i].i_len);
+			return -EFSCORRUPTED;
+		}
 		i++;
 	}
 
@@ -1067,6 +1075,13 @@ xlog_recover_attri_commit_pass2(
 		}
 
 		attr_new_value = item->ri_buf[i].i_addr;
+		if ((attri_formatp->alfi_attr_filter & XFS_ATTR_PARENT) &&
+		    !xfs_parent_valuecheck(mp, attr_new_value, new_value_len)) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					item->ri_buf[i].i_addr,
+					item->ri_buf[i].i_len);
+			return -EFSCORRUPTED;
+		}
 		i++;
 	}
 



* [PATCH 15/32] xfs: extend transaction reservations for parent attributes
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (13 preceding siblings ...)
  2024-04-10  0:57   ` [PATCH 14/32] xfs: add parent pointer validator functions Darrick J. Wong
@ 2024-04-10  0:57   ` Darrick J. Wong
  2024-04-10  5:31     ` Christoph Hellwig
  2024-04-10  0:57   ` [PATCH 16/32] xfs: create a hashname function for parent pointers Darrick J. Wong
                     ` (16 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:57 UTC (permalink / raw)
  To: djwong
  Cc: Dave Chinner, Allison Henderson, catherine.hoang, hch,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

We need to add, remove or modify parent pointer attributes during
create/link/unlink/rename operations atomically with the dirents in the
parent directories being modified. This means they need to be modified
in the same transaction as the parent directories, and so we need to add
the required space for the attribute modifications to the transaction
reservations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: fix indenting errors, adjust for new log format]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
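Reviewer note: the per-intent overhead added by this patch is simple
arithmetic over the log format header, the parent record, and the
longest possible name.  The sketch below reproduces that arithmetic in
userspace under three assumptions that are not taken from this patch:
the format header is 40 bytes, the parent record is 12 bytes, and
xlog_calc_iovec_len() rounds a length up to a multiple of four bytes.
All demo_* names are hypothetical.

```c
/* Assumed sizes; see the note above. */
#define DEMO_ATTRI_FORMAT_SIZE	40u	/* sizeof(struct xfs_attri_log_format) */
#define DEMO_PARENT_REC_SIZE	12u	/* sizeof(struct xfs_parent_rec) */
#define DEMO_MAXNAMELEN		256u	/* MAXNAMELEN */

/* Round a region length up to four bytes (assumed iovec alignment). */
static inline unsigned int
demo_iovec_len(
	unsigned int	len)
{
	return (len + 3u) & ~3u;
}

/* One header, one parent record, one name: link and unlink intents. */
unsigned int
demo_pptr_link_overhead(void)
{
	return DEMO_ATTRI_FORMAT_SIZE +
			demo_iovec_len(DEMO_PARENT_REC_SIZE) +
			demo_iovec_len(DEMO_MAXNAMELEN - 1);
}

/* A replace intent carries the old and the new name/value pair. */
unsigned int
demo_pptr_replace_overhead(void)
{
	return DEMO_ATTRI_FORMAT_SIZE +
			2 * demo_iovec_len(DEMO_PARENT_REC_SIZE) +
			2 * demo_iovec_len(DEMO_MAXNAMELEN - 1);
}
```

Under those assumptions a link or unlink intent costs 308 bytes and a
replace intent costs 576 bytes, before any of the other reservation
components are added.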
 fs/xfs/libxfs/xfs_trans_resv.c |  326 ++++++++++++++++++++++++++++++++++------
 1 file changed, 274 insertions(+), 52 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 6cd45e8c118da..6dbe6e7251e7c 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -20,6 +20,9 @@
 #include "xfs_qm.h"
 #include "xfs_trans_space.h"
 #include "xfs_rtbitmap.h"
+#include "xfs_attr_item.h"
+#include "xfs_log.h"
+#include "xfs_da_format.h"
 
 #define _ALLOC	true
 #define _FREE	false
@@ -422,29 +425,110 @@ xfs_calc_itruncate_reservation_minlogsize(
 	return xfs_calc_itruncate_reservation(mp, true);
 }
 
+static inline unsigned int xfs_calc_pptr_link_overhead(void)
+{
+	return sizeof(struct xfs_attri_log_format) +
+			xlog_calc_iovec_len(sizeof(struct xfs_parent_rec)) +
+			xlog_calc_iovec_len(MAXNAMELEN - 1);
+}
+static inline unsigned int xfs_calc_pptr_unlink_overhead(void)
+{
+	return sizeof(struct xfs_attri_log_format) +
+			xlog_calc_iovec_len(sizeof(struct xfs_parent_rec)) +
+			xlog_calc_iovec_len(MAXNAMELEN - 1);
+}
+static inline unsigned int xfs_calc_pptr_replace_overhead(void)
+{
+	return sizeof(struct xfs_attri_log_format) +
+			xlog_calc_iovec_len(sizeof(struct xfs_parent_rec)) +
+			xlog_calc_iovec_len(MAXNAMELEN - 1) +
+			xlog_calc_iovec_len(sizeof(struct xfs_parent_rec)) +
+			xlog_calc_iovec_len(MAXNAMELEN - 1);
+}
+
 /*
  * In renaming a files we can modify:
  *    the five inodes involved: 5 * inode size
  *    the two directory btrees: 2 * (max depth + v2) * dir block size
  *    the two directory bmap btrees: 2 * max depth * block size
  * And the bmap_finish transaction can free dir and bmap blocks (two sets
- *	of bmap blocks) giving:
+ *	of bmap blocks) giving (t2):
  *    the agf for the ags in which the blocks live: 3 * sector size
  *    the agfl for the ags in which the blocks live: 3 * sector size
  *    the superblock for the free block count: sector size
  *    the allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size
+ * If parent pointers are enabled (t3), then each transaction in the chain
+ *    must be capable of setting or removing the extended attribute
+ *    containing the parent information.  It must also be able to handle
+ *    the three xattr intent items that track the progress of the parent
+ *    pointer update.
  */
 STATIC uint
 xfs_calc_rename_reservation(
 	struct xfs_mount	*mp)
 {
-	return XFS_DQUOT_LOGRES(mp) +
-		max((xfs_calc_inode_res(mp, 5) +
-		     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
-				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 3),
-				      XFS_FSB_TO_B(mp, 1))));
+	unsigned int		overhead = XFS_DQUOT_LOGRES(mp);
+	struct xfs_trans_resv	*resp = M_RES(mp);
+	unsigned int		t1, t2, t3 = 0;
+
+	t1 = xfs_calc_inode_res(mp, 5) +
+	     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
+			XFS_FSB_TO_B(mp, 1));
+
+	t2 = xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
+	     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 3),
+			XFS_FSB_TO_B(mp, 1));
+
+	if (xfs_has_parent(mp)) {
+		unsigned int	rename_overhead, exchange_overhead;
+
+		t3 = max(resp->tr_attrsetm.tr_logres,
+			 resp->tr_attrrm.tr_logres);
+
+		/*
+		 * For a standard rename, the three xattr intent log items
+		 * are (1) replacing the pptr for the source file; (2)
+		 * removing the pptr on the dest file; and (3) adding a
+		 * pptr for the whiteout file in the src dir.
+		 *
+		 * For a RENAME_EXCHANGE, there are two xattr intent
+		 * items to replace the pptr for both src and dest
+		 * files.  Link counts don't change and there is no
+		 * whiteout.
+		 *
+		 * In the worst case we can end up relogging all log
+		 * intent items to allow the log tail to move ahead, so
+		 * they become overhead added to each transaction in a
+		 * processing chain.
+		 */
+		rename_overhead = xfs_calc_pptr_replace_overhead() +
+				  xfs_calc_pptr_unlink_overhead() +
+				  xfs_calc_pptr_link_overhead();
+		exchange_overhead = 2 * xfs_calc_pptr_replace_overhead();
+
+		overhead += max(rename_overhead, exchange_overhead);
+	}
+
+	return overhead + max3(t1, t2, t3);
+}
+
+static inline unsigned int
+xfs_rename_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	/* One for the rename, one more for freeing blocks */
+	unsigned int		ret = XFS_RENAME_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to remove or add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += max(resp->tr_attrsetm.tr_logcount,
+			   resp->tr_attrrm.tr_logcount);
+
+	return ret;
 }
 
 /*
@@ -461,6 +545,23 @@ xfs_calc_iunlink_remove_reservation(
 	       2 * M_IGEO(mp)->inode_cluster_size;
 }
 
+static inline unsigned int
+xfs_link_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_LINK_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrsetm.tr_logcount;
+
+	return ret;
+}
+
 /*
  * For creating a link to an inode:
  *    the parent directory inode: inode size
@@ -477,14 +578,23 @@ STATIC uint
 xfs_calc_link_reservation(
 	struct xfs_mount	*mp)
 {
-	return XFS_DQUOT_LOGRES(mp) +
-		xfs_calc_iunlink_remove_reservation(mp) +
-		max((xfs_calc_inode_res(mp, 2) +
-		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
-				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 1),
-				      XFS_FSB_TO_B(mp, 1))));
+	unsigned int		overhead = XFS_DQUOT_LOGRES(mp);
+	struct xfs_trans_resv	*resp = M_RES(mp);
+	unsigned int		t1, t2, t3 = 0;
+
+	overhead += xfs_calc_iunlink_remove_reservation(mp);
+	t1 = xfs_calc_inode_res(mp, 2) +
+	     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1));
+	t2 = xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
+	     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 1),
+			      XFS_FSB_TO_B(mp, 1));
+
+	if (xfs_has_parent(mp)) {
+		t3 = resp->tr_attrsetm.tr_logres;
+		overhead += xfs_calc_pptr_link_overhead();
+	}
+
+	return overhead + max3(t1, t2, t3);
 }
 
 /*
@@ -499,6 +609,23 @@ xfs_calc_iunlink_add_reservation(xfs_mount_t *mp)
 			M_IGEO(mp)->inode_cluster_size;
 }
 
+static inline unsigned int
+xfs_remove_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_REMOVE_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to remove one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrrm.tr_logcount;
+
+	return ret;
+}
+
 /*
  * For removing a directory entry we can modify:
  *    the parent directory inode: inode size
@@ -515,14 +642,24 @@ STATIC uint
 xfs_calc_remove_reservation(
 	struct xfs_mount	*mp)
 {
-	return XFS_DQUOT_LOGRES(mp) +
-		xfs_calc_iunlink_add_reservation(mp) +
-		max((xfs_calc_inode_res(mp, 2) +
-		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
-				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 2),
-				      XFS_FSB_TO_B(mp, 1))));
+	unsigned int		overhead = XFS_DQUOT_LOGRES(mp);
+	struct xfs_trans_resv	*resp = M_RES(mp);
+	unsigned int		t1, t2, t3 = 0;
+
+	overhead += xfs_calc_iunlink_add_reservation(mp);
+
+	t1 = xfs_calc_inode_res(mp, 2) +
+	     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1));
+	t2 = xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
+	     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 2),
+			      XFS_FSB_TO_B(mp, 1));
+
+	if (xfs_has_parent(mp)) {
+		t3 = resp->tr_attrrm.tr_logres;
+		overhead += xfs_calc_pptr_unlink_overhead();
+	}
+
+	return overhead + max3(t1, t2, t3);
 }
 
 /*
@@ -571,12 +708,40 @@ xfs_calc_icreate_resv_alloc(
 		xfs_calc_finobt_res(mp);
 }
 
+static inline unsigned int
+xfs_icreate_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_CREATE_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrsetm.tr_logcount;
+
+	return ret;
+}
+
 STATIC uint
-xfs_calc_icreate_reservation(xfs_mount_t *mp)
+xfs_calc_icreate_reservation(
+	struct xfs_mount	*mp)
 {
-	return XFS_DQUOT_LOGRES(mp) +
-		max(xfs_calc_icreate_resv_alloc(mp),
-		    xfs_calc_create_resv_modify(mp));
+	struct xfs_trans_resv	*resp = M_RES(mp);
+	unsigned int		overhead = XFS_DQUOT_LOGRES(mp);
+	unsigned int		t1, t2, t3 = 0;
+
+	t1 = xfs_calc_icreate_resv_alloc(mp);
+	t2 = xfs_calc_create_resv_modify(mp);
+
+	if (xfs_has_parent(mp)) {
+		t3 = resp->tr_attrsetm.tr_logres;
+		overhead += xfs_calc_pptr_link_overhead();
+	}
+
+	return overhead + max3(t1, t2, t3);
 }
 
 STATIC uint
@@ -589,6 +754,23 @@ xfs_calc_create_tmpfile_reservation(
 	return res + xfs_calc_iunlink_add_reservation(mp);
 }
 
+static inline unsigned int
+xfs_mkdir_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_MKDIR_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrsetm.tr_logcount;
+
+	return ret;
+}
+
 /*
  * Making a new directory is the same as creating a new file.
  */
@@ -599,6 +781,22 @@ xfs_calc_mkdir_reservation(
 	return xfs_calc_icreate_reservation(mp);
 }
 
+static inline unsigned int
+xfs_symlink_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_SYMLINK_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrsetm.tr_logcount;
+
+	return ret;
+}
 
 /*
  * Making a new symplink is the same as creating a new file, but
@@ -911,6 +1109,52 @@ xfs_calc_sb_reservation(
 	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
 }
 
+/*
+ * Namespace reservations.
+ *
+ * These get tricky when parent pointers are enabled as we have attribute
+ * modifications occurring from within these transactions. Rather than confuse
+ * each of these reservation calculations with the conditional attribute
+ * reservations, add them here in a clear and concise manner. This requires that
+ * the attribute reservations have already been calculated.
+ *
+ * Note that we only include the static attribute reservation here; the runtime
+ * reservation will have to be modified by the size of the attributes being
+ * added/removed/modified. See the comments on the attribute reservation
+ * calculations for more details.
+ */
+STATIC void
+xfs_calc_namespace_reservations(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	ASSERT(resp->tr_attrsetm.tr_logres > 0);
+
+	resp->tr_rename.tr_logres = xfs_calc_rename_reservation(mp);
+	resp->tr_rename.tr_logcount = xfs_rename_log_count(mp, resp);
+	resp->tr_rename.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_link.tr_logres = xfs_calc_link_reservation(mp);
+	resp->tr_link.tr_logcount = xfs_link_log_count(mp, resp);
+	resp->tr_link.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_remove.tr_logres = xfs_calc_remove_reservation(mp);
+	resp->tr_remove.tr_logcount = xfs_remove_log_count(mp, resp);
+	resp->tr_remove.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_symlink.tr_logres = xfs_calc_symlink_reservation(mp);
+	resp->tr_symlink.tr_logcount = xfs_symlink_log_count(mp, resp);
+	resp->tr_symlink.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_create.tr_logres = xfs_calc_icreate_reservation(mp);
+	resp->tr_create.tr_logcount = xfs_icreate_log_count(mp, resp);
+	resp->tr_create.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp);
+	resp->tr_mkdir.tr_logcount = xfs_mkdir_log_count(mp, resp);
+	resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+}
+
 void
 xfs_trans_resv_calc(
 	struct xfs_mount	*mp,
@@ -930,35 +1174,11 @@ xfs_trans_resv_calc(
 	resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
 	resp->tr_itruncate.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
-	resp->tr_rename.tr_logres = xfs_calc_rename_reservation(mp);
-	resp->tr_rename.tr_logcount = XFS_RENAME_LOG_COUNT;
-	resp->tr_rename.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
-	resp->tr_link.tr_logres = xfs_calc_link_reservation(mp);
-	resp->tr_link.tr_logcount = XFS_LINK_LOG_COUNT;
-	resp->tr_link.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
-	resp->tr_remove.tr_logres = xfs_calc_remove_reservation(mp);
-	resp->tr_remove.tr_logcount = XFS_REMOVE_LOG_COUNT;
-	resp->tr_remove.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
-	resp->tr_symlink.tr_logres = xfs_calc_symlink_reservation(mp);
-	resp->tr_symlink.tr_logcount = XFS_SYMLINK_LOG_COUNT;
-	resp->tr_symlink.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
-	resp->tr_create.tr_logres = xfs_calc_icreate_reservation(mp);
-	resp->tr_create.tr_logcount = XFS_CREATE_LOG_COUNT;
-	resp->tr_create.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
 	resp->tr_create_tmpfile.tr_logres =
 			xfs_calc_create_tmpfile_reservation(mp);
 	resp->tr_create_tmpfile.tr_logcount = XFS_CREATE_TMPFILE_LOG_COUNT;
 	resp->tr_create_tmpfile.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
-	resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp);
-	resp->tr_mkdir.tr_logcount = XFS_MKDIR_LOG_COUNT;
-	resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
 	resp->tr_ifree.tr_logres = xfs_calc_ifree_reservation(mp);
 	resp->tr_ifree.tr_logcount = XFS_INACTIVE_LOG_COUNT;
 	resp->tr_ifree.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
@@ -988,6 +1208,8 @@ xfs_trans_resv_calc(
 	resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT;
 	resp->tr_qm_dqalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
+	xfs_calc_namespace_reservations(mp, resp);
+
 	/*
 	 * The following transactions are logged in logical format with
 	 * a default log count.


^ permalink raw reply related	[flat|nested] 234+ messages in thread

* [PATCH 16/32] xfs: create a hashname function for parent pointers
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (14 preceding siblings ...)
  2024-04-10  0:57   ` [PATCH 15/32] xfs: extend transaction reservations for parent attributes Darrick J. Wong
@ 2024-04-10  0:57   ` Darrick J. Wong
  2024-04-10  5:33     ` Christoph Hellwig
  2024-04-10  0:57   ` [PATCH 17/32] xfs: parent pointer attribute creation Darrick J. Wong
                     ` (15 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:57 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Although directory entry and parent pointer recordsets look very similar
(name -> ino), there's one major difference between them: a file can be
hardlinked from multiple parent directories with the same filename.
This is common in shared container environments where a base directory
tree might be hardlink-copied multiple times.  IOWs the same 'ls'
program might be hardlinked to multiple /srv/*/bin/ls paths.

We don't want parent pointer operations to bog down on hash collisions
between the same dirent name, so create a special hash function that
mixes in the parent directory inode number.
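
As a rough userspace illustration of the mixing step (not the kernel
code): the sketch below XORs both 32-bit halves of the parent inode
number into a name hash.  The name hash here is 32-bit FNV-1a, used
purely as a stand-in for the directory name hash, and all demo_* names
are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in name hash (32-bit FNV-1a); this is NOT the XFS dirent hash. */
static uint32_t demo_name_hash(const uint8_t *name, size_t len)
{
	uint32_t	hash = 2166136261u;
	size_t		i;

	for (i = 0; i < len; i++) {
		hash ^= name[i];
		hash *= 16777619u;
	}
	return hash;
}

/* Mix the upper and lower halves of the parent inode number into the hash. */
static uint32_t demo_pptr_hash(const uint8_t *name, size_t len,
			       uint64_t parent_ino)
{
	uint32_t	hash = demo_name_hash(name, len);

	hash ^= (uint32_t)(parent_ino >> 32);
	hash ^= (uint32_t)parent_ino;
	return hash;
}

static void demo_pptr_hash_selfcheck(void)
{
	const uint8_t	name[] = { 'l', 's' };

	/* Same dirent name under two different parents: hashes now differ. */
	assert(demo_pptr_hash(name, 2, 128) != demo_pptr_hash(name, 2, 129));
	/* Inode number zero mixes in nothing. */
	assert(demo_pptr_hash(name, 2, 0) == demo_name_hash(name, 2));
}
```

With a plain name hash, every "ls" hardlink would land on the same hash
value; with the parent inode number mixed in, links from different
directories spread out.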

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c   |    3 +++
 fs/xfs/libxfs/xfs_parent.c |   49 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_parent.h |    5 ++++
 fs/xfs/scrub/attr.c        |    4 ++++
 4 files changed, 61 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 41de6a135d907..99930472e59da 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -439,6 +439,9 @@ xfs_attr_hashval(
 {
 	ASSERT(xfs_attr_check_namespace(attr_flags));
 
+	if (attr_flags & XFS_ATTR_PARENT)
+		return xfs_parent_hashattr(mp, name, namelen, value, valuelen);
+
 	return xfs_attr_hashname(name, namelen);
 }
 
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 5961fa8c85615..d24104821a090 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -90,3 +90,52 @@ xfs_parent_valuecheck(
 
 	return true;
 }
+
+/* Compute the attribute name hash for a parent pointer. */
+xfs_dahash_t
+xfs_parent_hashval(
+	struct xfs_mount		*mp,
+	const uint8_t			*name,
+	int				namelen,
+	xfs_ino_t			parent_ino)
+{
+	struct xfs_name			xname = {
+		.name			= name,
+		.len			= namelen,
+	};
+	xfs_dahash_t			ret;
+
+	/*
+	 * Use the same dirent name hash as would be used on the directory, but
+	 * mix in the parent inode number.
+	 */
+	ret = xfs_dir2_hashname(mp, &xname);
+	ret ^= upper_32_bits(parent_ino);
+	ret ^= lower_32_bits(parent_ino);
+	return ret;
+}
+
+/* Compute the attribute name hash from the xattr components. */
+xfs_dahash_t
+xfs_parent_hashattr(
+	struct xfs_mount		*mp,
+	const uint8_t			*name,
+	int				namelen,
+	const void			*value,
+	int				valuelen)
+{
+	const struct xfs_parent_rec	*rec = value;
+
+	/* Requires a local attr value in xfs_parent_rec format */
+	if (valuelen != sizeof(struct xfs_parent_rec)) {
+		ASSERT(valuelen == sizeof(struct xfs_parent_rec));
+		return 0;
+	}
+
+	if (!value) {
+		ASSERT(value != NULL);
+		return 0;
+	}
+
+	return xfs_parent_hashval(mp, name, namelen, be64_to_cpu(rec->p_ino));
+}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index ef8aff8607801..6a4028871b72a 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -12,4 +12,9 @@ bool xfs_parent_namecheck(unsigned int attr_flags, const void *name,
 bool xfs_parent_valuecheck(struct xfs_mount *mp, const void *value,
 		size_t valuelen);
 
+xfs_dahash_t xfs_parent_hashval(struct xfs_mount *mp, const uint8_t *name,
+		int namelen, xfs_ino_t parent_ino);
+xfs_dahash_t xfs_parent_hashattr(struct xfs_mount *mp, const uint8_t *name,
+		int namelen, const void *value, int valuelen);
+
 #endif /* __XFS_PARENT_H__ */
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index d5c2e73be8623..fe51a17661831 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -536,6 +536,10 @@ xchk_xattr_rec(
 			xchk_da_set_corrupt(ds, level);
 			goto out;
 		}
+		if (ent->flags & XFS_ATTR_PARENT) {
+			xchk_da_set_corrupt(ds, level);
+			goto out;
+		}
 		calc_hash = xfs_attr_hashval(mp, ent->flags, rentry->name,
 					     rentry->namelen, NULL,
 					     be32_to_cpu(rentry->valuelen));


^ permalink raw reply related	[flat|nested] 234+ messages in thread

* [PATCH 17/32] xfs: parent pointer attribute creation
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (15 preceding siblings ...)
  2024-04-10  0:57   ` [PATCH 16/32] xfs: create a hashname function for parent pointers Darrick J. Wong
@ 2024-04-10  0:57   ` Darrick J. Wong
  2024-04-10  5:44     ` Christoph Hellwig
  2024-04-10  0:58   ` [PATCH 18/32] xfs: add parent attributes to link Darrick J. Wong
                     ` (14 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:57 UTC (permalink / raw)
  To: djwong
  Cc: Dave Chinner, Allison Henderson, catherine.hoang, hch,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Add a parent pointer attribute during xfs_create, and add subroutines to
initialize the attribute arguments.
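
As an illustration of the record that gets stored, here is a userspace
sketch of packing a parent inode number and generation in big-endian
byte order, mirroring the cpu_to_be64/cpu_to_be32 conversions in
xfs_parent_rec_init below.  It assumes the record is just those two
fields laid out back to back (12 bytes); the demo_* names and the byte
array representation are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PPTR_REC_LEN	12	/* 8-byte inode number + 4-byte generation */

/* Store ino and gen most-significant byte first, as an ondisk record would. */
static void demo_parent_rec_pack(uint8_t rec[DEMO_PPTR_REC_LEN],
				 uint64_t ino, uint32_t gen)
{
	int	i;

	for (i = 0; i < 8; i++)
		rec[i] = (uint8_t)(ino >> (56 - 8 * i));
	for (i = 0; i < 4; i++)
		rec[8 + i] = (uint8_t)(gen >> (24 - 8 * i));
}

/* Recover the inode number from the first eight bytes. */
static uint64_t demo_parent_rec_ino(const uint8_t rec[DEMO_PPTR_REC_LEN])
{
	uint64_t	ino = 0;
	int		i;

	for (i = 0; i < 8; i++)
		ino = (ino << 8) | rec[i];
	return ino;
}

/* Recover the generation from the last four bytes. */
static uint32_t demo_parent_rec_gen(const uint8_t rec[DEMO_PPTR_REC_LEN])
{
	uint32_t	gen = 0;
	int		i;

	for (i = 0; i < 4; i++)
		gen = (gen << 8) | rec[8 + i];
	return gen;
}

static void demo_parent_rec_selfcheck(void)
{
	uint8_t	rec[DEMO_PPTR_REC_LEN];

	demo_parent_rec_pack(rec, 0x0102030405060708ULL, 0x0A0B0C0DU);
	assert(rec[0] == 0x01 && rec[7] == 0x08);
	assert(rec[8] == 0x0A && rec[11] == 0x0D);
	assert(demo_parent_rec_ino(rec) == 0x0102030405060708ULL);
	assert(demo_parent_rec_gen(rec) == 0x0A0B0C0DU);
}
```

Packing byte by byte keeps the sketch independent of host endianness,
which is the property the kernel conversions provide.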

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: shorten names, adjust to new format, set init_xattrs for parent
pointers]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile                 |    1 +
 fs/xfs/libxfs/xfs_parent.c      |   68 +++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_parent.h      |   65 +++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_trans_space.c |   52 ++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_trans_space.h |    9 +++--
 fs/xfs/scrub/tempfile.c         |    2 +
 fs/xfs/xfs_inode.c              |   32 +++++++++++++++---
 fs/xfs/xfs_iops.c               |   15 ++++++++-
 fs/xfs/xfs_super.c              |   10 ++++++
 fs/xfs/xfs_xattr.c              |    2 +
 fs/xfs/xfs_xattr.h              |    2 +
 11 files changed, 245 insertions(+), 13 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_trans_space.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 4956ea9a307b8..0c1a0b67af93c 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -51,6 +51,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_symlink_remote.o \
 				   xfs_trans_inode.o \
 				   xfs_trans_resv.o \
+				   xfs_trans_space.o \
 				   xfs_types.o \
 				   )
 # xfs_rtbitmap is shared with libxfs
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index d24104821a090..8875b4790112e 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -27,6 +27,10 @@
 #include "xfs_xattr.h"
 #include "xfs_parent.h"
 #include "xfs_trans_space.h"
+#include "xfs_attr_item.h"
+#include "xfs_health.h"
+
+struct kmem_cache		*xfs_parent_args_cache;
 
 /*
  * Parent pointer attribute handling.
@@ -139,3 +143,67 @@ xfs_parent_hashattr(
 
 	return xfs_parent_hashval(mp, name, namelen, be64_to_cpu(rec->p_ino));
 }
+
+/*
+ * Initialize the parent pointer arguments structure.  Caller must have zeroed
+ * the contents of @args.  @tp is only required for updates.
+ */
+static void
+xfs_parent_da_args_init(
+	struct xfs_da_args	*args,
+	struct xfs_trans	*tp,
+	struct xfs_parent_rec	*rec,
+	struct xfs_inode	*child,
+	xfs_ino_t		owner,
+	const struct xfs_name	*parent_name)
+{
+	args->geo = child->i_mount->m_attr_geo;
+	args->whichfork = XFS_ATTR_FORK;
+	args->attr_filter = XFS_ATTR_PARENT;
+	args->op_flags = XFS_DA_OP_LOGGED | XFS_DA_OP_OKNOENT;
+	args->trans = tp;
+	args->dp = child;
+	args->owner = owner;
+	args->name = parent_name->name;
+	args->namelen = parent_name->len;
+	args->value = rec;
+	args->valuelen = sizeof(struct xfs_parent_rec);
+	xfs_attr_sethash(args);
+}
+
+/* Make sure the incore state is ready for a parent pointer query/update. */
+static inline int
+xfs_parent_iread_extents(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*child)
+{
+	/* Parent pointers require that the attr fork must exist. */
+	if (XFS_IS_CORRUPT(child->i_mount, !xfs_inode_has_attr_fork(child))) {
+		xfs_inode_mark_sick(child, XFS_SICK_INO_PARENT);
+		return -EFSCORRUPTED;
+	}
+
+	return xfs_iread_extents(tp, child, XFS_ATTR_FORK);
+}
+
+/* Add a parent pointer to reflect a dirent addition. */
+int
+xfs_parent_addname(
+	struct xfs_trans	*tp,
+	struct xfs_parent_args	*ppargs,
+	struct xfs_inode	*dp,
+	const struct xfs_name	*parent_name,
+	struct xfs_inode	*child)
+{
+	int			error;
+
+	error = xfs_parent_iread_extents(tp, child);
+	if (error)
+		return error;
+
+	xfs_inode_to_parent_rec(&ppargs->rec, dp);
+	xfs_parent_da_args_init(&ppargs->args, tp, &ppargs->rec, child,
+			child->i_ino, parent_name);
+	xfs_attr_defer_parent(&ppargs->args, XFS_ATTR_DEFER_SET);
+	return 0;
+}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 6a4028871b72a..6de24e3ef318c 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -17,4 +17,69 @@ xfs_dahash_t xfs_parent_hashval(struct xfs_mount *mp, const uint8_t *name,
 xfs_dahash_t xfs_parent_hashattr(struct xfs_mount *mp, const uint8_t *name,
 		int namelen, const void *value, int valuelen);
 
+/* Initializes a xfs_parent_rec to be stored as an attribute value. */
+static inline void
+xfs_parent_rec_init(
+	struct xfs_parent_rec	*rec,
+	xfs_ino_t		ino,
+	uint32_t		gen)
+{
+	rec->p_ino = cpu_to_be64(ino);
+	rec->p_gen = cpu_to_be32(gen);
+}
+
+/* Initializes a xfs_parent_rec from a directory inode for use as an attr value. */
+static inline void
+xfs_inode_to_parent_rec(
+	struct xfs_parent_rec	*rec,
+	const struct xfs_inode	*dp)
+{
+	xfs_parent_rec_init(rec, dp->i_ino, VFS_IC(dp)->i_generation);
+}
+
+extern struct kmem_cache	*xfs_parent_args_cache;
+
+/*
+ * Parent pointer information needed to pass around the deferred xattr update
+ * machinery.
+ */
+struct xfs_parent_args {
+	struct xfs_parent_rec	rec;
+	struct xfs_da_args	args;
+};
+
+/*
+ * Start a parent pointer update by allocating the context object we need to
+ * perform a parent pointer update.
+ */
+static inline int
+xfs_parent_start(
+	struct xfs_mount	*mp,
+	struct xfs_parent_args	**ppargsp)
+{
+	if (!xfs_has_parent(mp)) {
+		*ppargsp = NULL;
+		return 0;
+	}
+
+	*ppargsp = kmem_cache_zalloc(xfs_parent_args_cache, GFP_KERNEL);
+	if (!*ppargsp)
+		return -ENOMEM;
+	return 0;
+}
+
+/* Finish a parent pointer update by freeing the context object. */
+static inline void
+xfs_parent_finish(
+	struct xfs_mount	*mp,
+	struct xfs_parent_args	*ppargs)
+{
+	if (ppargs)
+		kmem_cache_free(xfs_parent_args_cache, ppargs);
+}
+
+int xfs_parent_addname(struct xfs_trans *tp, struct xfs_parent_args *ppargs,
+		struct xfs_inode *dp, const struct xfs_name *parent_name,
+		struct xfs_inode *child);
+
 #endif /* __XFS_PARENT_H__ */
diff --git a/fs/xfs/libxfs/xfs_trans_space.c b/fs/xfs/libxfs/xfs_trans_space.c
new file mode 100644
index 0000000000000..90532c3fa2053
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_trans_space.c
@@ -0,0 +1,52 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2000,2005 Silicon Graphics, Inc.
+ * All Rights Reserved.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_da_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_da_btree.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_trans_space.h"
+
+/* Calculate the disk space required to add a parent pointer. */
+unsigned int
+xfs_parent_calc_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		namelen)
+{
+	/*
+	 * Parent pointers are always the first attr in an attr tree, and never
+	 * larger than a block.
+	 */
+	return XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK) +
+	       XFS_NEXTENTADD_SPACE_RES(mp, namelen, XFS_ATTR_FORK);
+}
+
+unsigned int
+xfs_create_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		namelen)
+{
+	unsigned int		ret;
+
+	ret = XFS_IALLOC_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp, namelen);
+	if (xfs_has_parent(mp))
+		ret += xfs_parent_calc_space_res(mp, namelen);
+
+	return ret;
+}
+
+unsigned int
+xfs_mkdir_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		namelen)
+{
+	return xfs_create_space_res(mp, namelen);
+}
diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h
index 9640fc232c147..6cda87153b38c 100644
--- a/fs/xfs/libxfs/xfs_trans_space.h
+++ b/fs/xfs/libxfs/xfs_trans_space.h
@@ -80,8 +80,6 @@
 /* This macro is not used - see inline code in xfs_attr_set */
 #define	XFS_ATTRSET_SPACE_RES(mp, v)	\
 	(XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK) + XFS_B_TO_FSB(mp, v))
-#define	XFS_CREATE_SPACE_RES(mp,nl)	\
-	(XFS_IALLOC_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl))
 #define	XFS_DIOSTRAT_SPACE_RES(mp, v)	\
 	(XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK) + (v))
 #define	XFS_GROWFS_SPACE_RES(mp)	\
@@ -90,8 +88,6 @@
 	((b) + XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK))
 #define	XFS_LINK_SPACE_RES(mp,nl)	\
 	XFS_DIRENTER_SPACE_RES(mp,nl)
-#define	XFS_MKDIR_SPACE_RES(mp,nl)	\
-	(XFS_IALLOC_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl))
 #define	XFS_QM_DQALLOC_SPACE_RES(mp)	\
 	(XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK) + \
 	 XFS_DQUOT_CLUSTER_SIZE_FSB)
@@ -106,5 +102,10 @@
 #define XFS_IFREE_SPACE_RES(mp)		\
 	(xfs_has_finobt(mp) ? M_IGEO(mp)->inobt_maxlevels : 0)
 
+unsigned int xfs_parent_calc_space_res(struct xfs_mount *mp,
+		unsigned int namelen);
+
+unsigned int xfs_create_space_res(struct xfs_mount *mp, unsigned int namelen);
+unsigned int xfs_mkdir_space_res(struct xfs_mount *mp, unsigned int namelen);
 
 #endif	/* __XFS_TRANS_SPACE_H__ */
diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c
index 6f39504a216ea..ddbcccb3dba13 100644
--- a/fs/xfs/scrub/tempfile.c
+++ b/fs/xfs/scrub/tempfile.c
@@ -71,7 +71,7 @@ xrep_tempfile_create(
 		return error;
 
 	if (is_dir) {
-		resblks = XFS_MKDIR_SPACE_RES(mp, 0);
+		resblks = xfs_mkdir_space_res(mp, 0);
 		tres = &M_RES(mp)->tr_mkdir;
 	} else {
 		resblks = XFS_IALLOC_SPACE_RES(mp);
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index c079114b97ecf..ebef2767a86bd 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -40,6 +40,8 @@
 #include "xfs_log_priv.h"
 #include "xfs_health.h"
 #include "xfs_pnfs.h"
+#include "xfs_parent.h"
+#include "xfs_xattr.h"
 
 struct kmem_cache *xfs_inode_cache;
 
@@ -1016,7 +1018,7 @@ xfs_dir_hook_setup(
 int
 xfs_create(
 	struct mnt_idmap	*idmap,
-	xfs_inode_t		*dp,
+	struct xfs_inode	*dp,
 	struct xfs_name		*name,
 	umode_t			mode,
 	dev_t			rdev,
@@ -1028,7 +1030,7 @@ xfs_create(
 	struct xfs_inode	*ip = NULL;
 	struct xfs_trans	*tp = NULL;
 	int			error;
-	bool                    unlock_dp_on_error = false;
+	bool			unlock_dp_on_error = false;
 	prid_t			prid;
 	struct xfs_dquot	*udqp = NULL;
 	struct xfs_dquot	*gdqp = NULL;
@@ -1036,6 +1038,7 @@ xfs_create(
 	struct xfs_trans_res	*tres;
 	uint			resblks;
 	xfs_ino_t		ino;
+	struct xfs_parent_args	*ppargs;
 
 	trace_xfs_create(dp, name);
 
@@ -1057,13 +1060,17 @@ xfs_create(
 		return error;
 
 	if (is_dir) {
-		resblks = XFS_MKDIR_SPACE_RES(mp, name->len);
+		resblks = xfs_mkdir_space_res(mp, name->len);
 		tres = &M_RES(mp)->tr_mkdir;
 	} else {
-		resblks = XFS_CREATE_SPACE_RES(mp, name->len);
+		resblks = xfs_create_space_res(mp, name->len);
 		tres = &M_RES(mp)->tr_create;
 	}
 
+	error = xfs_parent_start(mp, &ppargs);
+	if (error)
+		goto out_release_dquots;
+
 	/*
 	 * Initially assume that the file does not exist and
 	 * reserve the resources for that case.  If that is not
@@ -1079,7 +1086,7 @@ xfs_create(
 				resblks, &tp);
 	}
 	if (error)
-		goto out_release_dquots;
+		goto out_parent;
 
 	xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
 	unlock_dp_on_error = true;
@@ -1122,6 +1129,16 @@ xfs_create(
 		xfs_bumplink(tp, dp);
 	}
 
+	/*
+	 * If we have parent pointers, we need to add the attribute containing
+	 * the parent information now.
+	 */
+	if (ppargs) {
+		error = xfs_parent_addname(tp, ppargs, dp, name, ip);
+		if (error)
+			goto out_trans_cancel;
+	}
+
 	/*
 	 * Create ip with a reference from dp, and add '.' and '..' references
 	 * if it's a directory.
@@ -1154,6 +1171,7 @@ xfs_create(
 	*ipp = ip;
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+	xfs_parent_finish(mp, ppargs);
 	return 0;
 
  out_trans_cancel:
@@ -1169,6 +1187,8 @@ xfs_create(
 		xfs_finish_inode_setup(ip);
 		xfs_irele(ip);
 	}
+ out_parent:
+	xfs_parent_finish(mp, ppargs);
  out_release_dquots:
 	xfs_qm_dqrele(udqp);
 	xfs_qm_dqrele(gdqp);
@@ -3037,7 +3057,7 @@ xfs_rename_alloc_whiteout(
 	int			error;
 
 	error = xfs_create_tmpfile(idmap, dp, S_IFCHR | WHITEOUT_MODE,
-				   false, &tmpfile);
+			xfs_has_parent(dp->i_mount), &tmpfile);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 273bc30fd2bad..a363af4d0bead 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -157,6 +157,8 @@ xfs_create_need_xattr(
 	if (dir->i_sb->s_security)
 		return true;
 #endif
+	if (xfs_has_parent(XFS_I(dir)->i_mount))
+		return true;
 	return false;
 }
 
@@ -201,7 +203,18 @@ xfs_generic_create(
 				xfs_create_need_xattr(dir, default_acl, acl),
 				&ip);
 	} else {
-		error = xfs_create_tmpfile(idmap, XFS_I(dir), mode, false, &ip);
+		bool	init_xattrs = false;
+
+		/*
+		 * If this temporary file will be linkable, set up the file
+		 * with an attr fork to receive a parent pointer.
+		 */
+		if (!(tmpfile->f_flags & O_EXCL) &&
+		    xfs_has_parent(XFS_I(dir)->i_mount))
+			init_xattrs = true;
+
+		error = xfs_create_tmpfile(idmap, XFS_I(dir), mode,
+				init_xattrs, &ip);
 	}
 	if (unlikely(error))
 		goto out_free_acl;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 5c9ba974252d1..84f37e8474da2 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -44,6 +44,7 @@
 #include "xfs_dahash_test.h"
 #include "xfs_rtbitmap.h"
 #include "xfs_exchmaps_item.h"
+#include "xfs_parent.h"
 #include "scrub/stats.h"
 #include "scrub/rcbag_btree.h"
 
@@ -2202,8 +2203,16 @@ xfs_init_caches(void)
 	if (!xfs_xmi_cache)
 		goto out_destroy_xmd_cache;
 
+	xfs_parent_args_cache = kmem_cache_create("xfs_parent_args",
+					     sizeof(struct xfs_parent_args),
+					     0, 0, NULL);
+	if (!xfs_parent_args_cache)
+		goto out_destroy_xmi_cache;
+
 	return 0;
 
+ out_destroy_xmi_cache:
+	kmem_cache_destroy(xfs_xmi_cache);
  out_destroy_xmd_cache:
 	kmem_cache_destroy(xfs_xmd_cache);
  out_destroy_iul_cache:
@@ -2264,6 +2273,7 @@ xfs_destroy_caches(void)
 	 * destroy caches.
 	 */
 	rcu_barrier();
+	kmem_cache_destroy(xfs_parent_args_cache);
 	kmem_cache_destroy(xfs_xmd_cache);
 	kmem_cache_destroy(xfs_xmi_cache);
 	kmem_cache_destroy(xfs_iunlink_cache);
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 514179a8d2a7f..85e886ee20e03 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -24,7 +24,7 @@
  * Get permission to use log-assisted atomic exchange of file extents.
  * Callers must not be running any transactions or hold any ILOCKs.
  */
-static inline int
+int
 xfs_attr_grab_log_assist(
 	struct xfs_mount	*mp)
 {
diff --git a/fs/xfs/xfs_xattr.h b/fs/xfs/xfs_xattr.h
index cec766cad26cd..f097002d06571 100644
--- a/fs/xfs/xfs_xattr.h
+++ b/fs/xfs/xfs_xattr.h
@@ -7,6 +7,8 @@
 #define __XFS_XATTR_H__
 
 int xfs_attr_change(struct xfs_da_args *args);
+int xfs_attr_grab_log_assist(struct xfs_mount *mp);
+void xfs_attr_rele_log_assist(struct xfs_mount *mp);
 
 extern const struct xattr_handler * const xfs_xattr_handlers[];
 



* [PATCH 18/32] xfs: add parent attributes to link
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (16 preceding siblings ...)
  2024-04-10  0:57   ` [PATCH 17/32] xfs: parent pointer attribute creation Darrick J. Wong
@ 2024-04-10  0:58   ` Darrick J. Wong
  2024-04-10  5:45     ` Christoph Hellwig
  2024-04-10  0:58   ` [PATCH 19/32] xfs: add parent attributes to symlink Darrick J. Wong
                     ` (13 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:58 UTC (permalink / raw)
  To: djwong
  Cc: Dave Chinner, Allison Henderson, catherine.hoang, hch,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch modifies xfs_link to add a parent pointer to the inode.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: minor rebase fixes]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_trans_space.c |   14 +++++++++++++
 fs/xfs/libxfs/xfs_trans_space.h |    3 +--
 fs/xfs/scrub/dir_repair.c       |    2 +-
 fs/xfs/scrub/orphanage.c        |    2 +-
 fs/xfs/xfs_inode.c              |   43 ++++++++++++++++++++++++++++++++++-----
 5 files changed, 54 insertions(+), 10 deletions(-)
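
The reservation rule added to xfs_link() can be sketched in isolation as
a small userspace model.  This is an illustration under assumptions, not
kernel code: toy_trans_alloc() and toy_link() are hypothetical names, the
block counts are made up, and the fallback behaviour of the allocator is
modelled only as far as the nospace_error handling in the diff implies.
The one rule taken from the patch is that, with parent pointers enabled,
the link fails with the saved error instead of continuing without a
reservation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Toy stand-in for the transaction allocator: try the full reservation
 * first, then fall back to zero blocks and record why in *nospace_error.
 */
static int
toy_trans_alloc(
	unsigned int	free_blocks,
	unsigned int	*resblks,
	int		*nospace_error)
{
	assert(resblks && nospace_error);

	*nospace_error = 0;
	if (*resblks > free_blocks) {
		*nospace_error = -ENOSPC;
		*resblks = 0;		/* reservationless fallback */
	}
	return 0;
}

/* Hypothetical model of the link path; returns 0 or a negative errno. */
static int
toy_link(
	bool		has_parent,
	unsigned int	free_blocks,
	unsigned int	resblks)
{
	int		nospace_error;
	int		error;

	error = toy_trans_alloc(free_blocks, &resblks, &nospace_error);
	if (error)
		return error;

	/* With parent pointers, refuse to continue without a reservation. */
	if (has_parent && nospace_error)
		return nospace_error;

	return 0;
}
```

In this model a full filesystem still permits a hard link when parent
pointers are off, and returns -ENOSPC when they are on.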


diff --git a/fs/xfs/libxfs/xfs_trans_space.c b/fs/xfs/libxfs/xfs_trans_space.c
index 90532c3fa2053..cf775750120e8 100644
--- a/fs/xfs/libxfs/xfs_trans_space.c
+++ b/fs/xfs/libxfs/xfs_trans_space.c
@@ -50,3 +50,17 @@ xfs_mkdir_space_res(
 {
 	return xfs_create_space_res(mp, namelen);
 }
+
+unsigned int
+xfs_link_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		namelen)
+{
+	unsigned int		ret;
+
+	ret = XFS_DIRENTER_SPACE_RES(mp, namelen);
+	if (xfs_has_parent(mp))
+		ret += xfs_parent_calc_space_res(mp, namelen);
+
+	return ret;
+}
diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h
index 6cda87153b38c..5539634009fb2 100644
--- a/fs/xfs/libxfs/xfs_trans_space.h
+++ b/fs/xfs/libxfs/xfs_trans_space.h
@@ -86,8 +86,6 @@
 	(2 * (mp)->m_alloc_maxlevels)
 #define	XFS_GROWFSRT_SPACE_RES(mp,b)	\
 	((b) + XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK))
-#define	XFS_LINK_SPACE_RES(mp,nl)	\
-	XFS_DIRENTER_SPACE_RES(mp,nl)
 #define	XFS_QM_DQALLOC_SPACE_RES(mp)	\
 	(XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK) + \
 	 XFS_DQUOT_CLUSTER_SIZE_FSB)
@@ -107,5 +105,6 @@ unsigned int xfs_parent_calc_space_res(struct xfs_mount *mp,
 
 unsigned int xfs_create_space_res(struct xfs_mount *mp, unsigned int namelen);
 unsigned int xfs_mkdir_space_res(struct xfs_mount *mp, unsigned int namelen);
+unsigned int xfs_link_space_res(struct xfs_mount *mp, unsigned int namelen);
 
 #endif	/* __XFS_TRANS_SPACE_H__ */
diff --git a/fs/xfs/scrub/dir_repair.c b/fs/xfs/scrub/dir_repair.c
index 38957da26b94a..575397aef1f7a 100644
--- a/fs/xfs/scrub/dir_repair.c
+++ b/fs/xfs/scrub/dir_repair.c
@@ -704,7 +704,7 @@ xrep_dir_replay_update(
 	uint				resblks;
 	int				error;
 
-	resblks = XFS_LINK_SPACE_RES(mp, xname->len);
+	resblks = xfs_link_space_res(mp, xname->len);
 	error = xchk_trans_alloc(rd->sc, resblks);
 	if (error)
 		return error;
diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
index 885b7d478a0ab..5e2c3546f2e95 100644
--- a/fs/xfs/scrub/orphanage.c
+++ b/fs/xfs/scrub/orphanage.c
@@ -326,7 +326,7 @@ xrep_adoption_trans_alloc(
 
 	/* Compute the worst case space reservation that we need. */
 	adopt->sc = sc;
-	adopt->orphanage_blkres = XFS_LINK_SPACE_RES(mp, MAXNAMELEN);
+	adopt->orphanage_blkres = xfs_link_space_res(mp, MAXNAMELEN);
 	if (S_ISDIR(VFS_I(sc->ip)->i_mode))
 		child_blkres = XFS_RENAME_SPACE_RES(mp, xfs_name_dotdot.len);
 	adopt->child_blkres = child_blkres;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index ebef2767a86bd..4a3fbd8d33099 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1298,14 +1298,15 @@ xfs_create_tmpfile(
 
 int
 xfs_link(
-	xfs_inode_t		*tdp,
-	xfs_inode_t		*sip,
+	struct xfs_inode	*tdp,
+	struct xfs_inode	*sip,
 	struct xfs_name		*target_name)
 {
-	xfs_mount_t		*mp = tdp->i_mount;
-	xfs_trans_t		*tp;
+	struct xfs_mount	*mp = tdp->i_mount;
+	struct xfs_trans	*tp;
 	int			error, nospace_error = 0;
 	int			resblks;
+	struct xfs_parent_args	*ppargs;
 
 	trace_xfs_link(tdp, target_name);
 
@@ -1324,11 +1325,25 @@ xfs_link(
 	if (error)
 		goto std_return;
 
-	resblks = XFS_LINK_SPACE_RES(mp, target_name->len);
+	error = xfs_parent_start(mp, &ppargs);
+	if (error)
+		goto std_return;
+
+	resblks = xfs_link_space_res(mp, target_name->len);
 	error = xfs_trans_alloc_dir(tdp, &M_RES(mp)->tr_link, sip, &resblks,
 			&tp, &nospace_error);
 	if (error)
-		goto std_return;
+		goto out_parent;
+
+	/*
+	 * We don't allow reservationless or quotaless hardlinking when parent
+	 * pointers are enabled because we can't back out if the xattrs must
+	 * grow.
+	 */
+	if (ppargs && nospace_error) {
+		error = nospace_error;
+		goto error_return;
+	}
 
 	/*
 	 * If we are using project inheritance, we only allow hard link
@@ -1379,6 +1394,19 @@ xfs_link(
 	xfs_trans_log_inode(tp, tdp, XFS_ILOG_CORE);
 
 	xfs_bumplink(tp, sip);
+
+	/*
+	 * If we have parent pointers, we now need to add the parent record to
+	 * the attribute fork of the inode.  If this is the initial parent
+	 * attribute, we need to create it correctly; otherwise, we can just
+	 * add the parent to the inode.
+	 */
+	if (ppargs) {
+		error = xfs_parent_addname(tp, ppargs, tdp, target_name, sip);
+		if (error)
+			goto error_return;
+	}
+
 	xfs_dir_update_hook(tdp, sip, 1, target_name);
 
 	/*
@@ -1392,12 +1420,15 @@ xfs_link(
 	error = xfs_trans_commit(tp);
 	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
 	xfs_iunlock(sip, XFS_ILOCK_EXCL);
+	xfs_parent_finish(mp, ppargs);
 	return error;
 
  error_return:
 	xfs_trans_cancel(tp);
 	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
 	xfs_iunlock(sip, XFS_ILOCK_EXCL);
+ out_parent:
+	xfs_parent_finish(mp, ppargs);
  std_return:
 	if (error == -ENOSPC && nospace_error)
 		error = nospace_error;



* [PATCH 19/32] xfs: add parent attributes to symlink
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (17 preceding siblings ...)
  2024-04-10  0:58   ` [PATCH 18/32] xfs: add parent attributes to link Darrick J. Wong
@ 2024-04-10  0:58   ` Darrick J. Wong
  2024-04-10  5:45     ` Christoph Hellwig
  2024-04-10  0:58   ` [PATCH 20/32] xfs: remove parent pointers in unlink Darrick J. Wong
                     ` (12 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:58 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch modifies xfs_symlink to add a parent pointer to the inode.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: minor rebase fixups]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_trans_space.c |   17 +++++++++++++++++
 fs/xfs/libxfs/xfs_trans_space.h |    4 ++--
 fs/xfs/scrub/symlink_repair.c   |    2 +-
 fs/xfs/xfs_symlink.c            |   30 +++++++++++++++++++++++++-----
 4 files changed, 45 insertions(+), 8 deletions(-)
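
The change to the inline-target test in xfs_symlink() can be modelled as
follows.  This is a rough userspace sketch: TOY_LITINO and
TOY_BLOCK_PAYLOAD are invented constants standing in for XFS_LITINO(mp)
and the per-block capacity used by xfs_symlink_blocks(), and the toy_*
functions are hypothetical.  It shows only the decision from the diff: a
target that would otherwise fit in the inode literal area is still
budgeted remote blocks once parent pointers are enabled.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_LITINO		336	/* hypothetical literal area, bytes */
#define TOY_BLOCK_PAYLOAD	4040	/* hypothetical target bytes per block */

/* Number of blocks needed to hold a remote symlink target. */
static unsigned int
toy_symlink_blocks(
	unsigned int	pathlen)
{
	return (pathlen + TOY_BLOCK_PAYLOAD - 1) / TOY_BLOCK_PAYLOAD;
}

/* Blocks to reserve for the target, mirroring the test in the diff. */
static unsigned int
toy_symlink_fs_blocks(
	bool		has_parent,
	unsigned int	pathlen)
{
	assert(pathlen > 0);

	/* Only skip the reservation if no attr fork will share the inode. */
	if (pathlen <= TOY_LITINO && !has_parent)
		return 0;
	return toy_symlink_blocks(pathlen);
}
```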


diff --git a/fs/xfs/libxfs/xfs_trans_space.c b/fs/xfs/libxfs/xfs_trans_space.c
index cf775750120e8..c8adda82debe0 100644
--- a/fs/xfs/libxfs/xfs_trans_space.c
+++ b/fs/xfs/libxfs/xfs_trans_space.c
@@ -64,3 +64,20 @@ xfs_link_space_res(
 
 	return ret;
 }
+
+unsigned int
+xfs_symlink_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		namelen,
+	unsigned int		fsblocks)
+{
+	unsigned int		ret;
+
+	ret = XFS_IALLOC_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp, namelen) +
+			fsblocks;
+
+	if (xfs_has_parent(mp))
+		ret += xfs_parent_calc_space_res(mp, namelen);
+
+	return ret;
+}
diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h
index 5539634009fb2..354ad1d6e18d6 100644
--- a/fs/xfs/libxfs/xfs_trans_space.h
+++ b/fs/xfs/libxfs/xfs_trans_space.h
@@ -95,8 +95,6 @@
 	XFS_DIRREMOVE_SPACE_RES(mp)
 #define	XFS_RENAME_SPACE_RES(mp,nl)	\
 	(XFS_DIRREMOVE_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl))
-#define	XFS_SYMLINK_SPACE_RES(mp,nl,b)	\
-	(XFS_IALLOC_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl) + (b))
 #define XFS_IFREE_SPACE_RES(mp)		\
 	(xfs_has_finobt(mp) ? M_IGEO(mp)->inobt_maxlevels : 0)
 
@@ -106,5 +104,7 @@ unsigned int xfs_parent_calc_space_res(struct xfs_mount *mp,
 unsigned int xfs_create_space_res(struct xfs_mount *mp, unsigned int namelen);
 unsigned int xfs_mkdir_space_res(struct xfs_mount *mp, unsigned int namelen);
 unsigned int xfs_link_space_res(struct xfs_mount *mp, unsigned int namelen);
+unsigned int xfs_symlink_space_res(struct xfs_mount *mp, unsigned int namelen,
+		unsigned int fsblocks);
 
 #endif	/* __XFS_TRANS_SPACE_H__ */
diff --git a/fs/xfs/scrub/symlink_repair.c b/fs/xfs/scrub/symlink_repair.c
index 178304959535a..c8b5a5b878ac9 100644
--- a/fs/xfs/scrub/symlink_repair.c
+++ b/fs/xfs/scrub/symlink_repair.c
@@ -421,7 +421,7 @@ xrep_symlink_rebuild(
 	 * unlikely.
 	 */
 	fs_blocks = xfs_symlink_blocks(sc->mp, target_len);
-	resblks = XFS_SYMLINK_SPACE_RES(sc->mp, target_len, fs_blocks);
+	resblks = xfs_symlink_space_res(sc->mp, target_len, fs_blocks);
 	error = xfs_trans_reserve_quota_nblks(sc->tp, sc->tempip, resblks, 0,
 			true);
 	if (error)
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 85ef56fdd7dfe..17aee806ec2e1 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -25,6 +25,8 @@
 #include "xfs_error.h"
 #include "xfs_health.h"
 #include "xfs_symlink_remote.h"
+#include "xfs_parent.h"
+#include "xfs_defer.h"
 
 int
 xfs_readlink(
@@ -100,6 +102,7 @@ xfs_symlink(
 	struct xfs_dquot	*pdqp = NULL;
 	uint			resblks;
 	xfs_ino_t		ino;
+	struct xfs_parent_args	*ppargs;
 
 	*ipp = NULL;
 
@@ -130,18 +133,24 @@ xfs_symlink(
 
 	/*
 	 * The symlink will fit into the inode data fork?
-	 * There can't be any attributes so we get the whole variable part.
+	 * If there are no parent pointers, then there won't be any attributes.
+	 * So we get the whole variable part, and do not need to reserve extra
+	 * blocks.  Otherwise, we need to reserve the blocks.
 	 */
-	if (pathlen <= XFS_LITINO(mp))
+	if (pathlen <= XFS_LITINO(mp) && !xfs_has_parent(mp))
 		fs_blocks = 0;
 	else
 		fs_blocks = xfs_symlink_blocks(mp, pathlen);
-	resblks = XFS_SYMLINK_SPACE_RES(mp, link_name->len, fs_blocks);
+	resblks = xfs_symlink_space_res(mp, link_name->len, fs_blocks);
+
+	error = xfs_parent_start(mp, &ppargs);
+	if (error)
+		goto out_release_dquots;
 
 	error = xfs_trans_alloc_icreate(mp, &M_RES(mp)->tr_symlink, udqp, gdqp,
 			pdqp, resblks, &tp);
 	if (error)
-		goto out_release_dquots;
+		goto out_parent;
 
 	xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
 	unlock_dp_on_error = true;
@@ -161,7 +170,7 @@ xfs_symlink(
 	if (!error)
 		error = xfs_init_new_inode(idmap, tp, dp, ino,
 				S_IFLNK | (mode & ~S_IFMT), 1, 0, prid,
-				false, &ip);
+				xfs_has_parent(mp), &ip);
 	if (error)
 		goto out_trans_cancel;
 
@@ -195,6 +204,14 @@ xfs_symlink(
 		goto out_trans_cancel;
 	xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
+
+	/* Add parent pointer for the new symlink. */
+	if (ppargs) {
+		error = xfs_parent_addname(tp, ppargs, dp, link_name, ip);
+		if (error)
+			goto out_trans_cancel;
+	}
+
 	xfs_dir_update_hook(dp, ip, 1, link_name);
 
 	/*
@@ -216,6 +233,7 @@ xfs_symlink(
 	*ipp = ip;
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+	xfs_parent_finish(mp, ppargs);
 	return 0;
 
 out_trans_cancel:
@@ -231,6 +249,8 @@ xfs_symlink(
 		xfs_finish_inode_setup(ip);
 		xfs_irele(ip);
 	}
+out_parent:
+	xfs_parent_finish(mp, ppargs);
 out_release_dquots:
 	xfs_qm_dqrele(udqp);
 	xfs_qm_dqrele(gdqp);



* [PATCH 20/32] xfs: remove parent pointers in unlink
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (18 preceding siblings ...)
  2024-04-10  0:58   ` [PATCH 19/32] xfs: add parent attributes to symlink Darrick J. Wong
@ 2024-04-10  0:58   ` Darrick J. Wong
  2024-04-10  5:45     ` Christoph Hellwig
  2024-04-10  0:58   ` [PATCH 21/32] xfs: Add parent pointers to rename Darrick J. Wong
                     ` (11 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:58 UTC (permalink / raw)
  To: djwong
  Cc: Dave Chinner, Allison Henderson, catherine.hoang, hch,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch removes the parent pointer attribute during unlink.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: adjust to new ondisk format, minor rebase fixes]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_parent.c      |   22 ++++++++++++++++++++++
 fs/xfs/libxfs/xfs_parent.h      |    3 +++
 fs/xfs/libxfs/xfs_trans_space.c |   13 +++++++++++++
 fs/xfs/libxfs/xfs_trans_space.h |    3 +--
 fs/xfs/xfs_inode.c              |   27 +++++++++++++++++++++------
 5 files changed, 60 insertions(+), 8 deletions(-)
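
The start/finish lifecycle that xfs_remove() gains here can be sketched
in userspace.  The sketch assumes, as the "if (ppargs)" checks in the
diff suggest, that xfs_parent_start() yields NULL args when the feature
is off and that xfs_parent_finish() tolerates NULL.  All toy_* names and
the fail_step knob are hypothetical, and the transaction and locking
steps are reduced to comments.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct toy_ppargs {
	unsigned long long	parent_ino;
};

static int toy_live_args;	/* outstanding allocations, for checking */

/* Model of the start helper: NULL args mean "feature disabled". */
static int
toy_parent_start(
	bool			has_parent,
	struct toy_ppargs	**ppargsp)
{
	assert(ppargsp);

	*ppargsp = NULL;
	if (!has_parent)
		return 0;
	*ppargsp = calloc(1, sizeof(**ppargsp));
	if (!*ppargsp)
		return -ENOMEM;
	toy_live_args++;
	return 0;
}

/* Model of the finish helper: safe to call with NULL. */
static void
toy_parent_finish(
	struct toy_ppargs	*ppargs)
{
	if (!ppargs)
		return;
	free(ppargs);
	toy_live_args--;
}

/* Hypothetical remove path; fail_step selects a failing stage (0 = none). */
static int
toy_remove(
	bool			has_parent,
	int			fail_step)
{
	struct toy_ppargs	*ppargs;
	int			error;

	error = toy_parent_start(has_parent, &ppargs);
	if (error)
		goto std_return;

	if (fail_step == 1) {		/* transaction allocation failed */
		error = -EIO;
		goto out_parent;
	}

	if (ppargs && fail_step == 2) {	/* scheduling the removal failed */
		error = -EIO;
		goto out_trans_cancel;
	}

	toy_parent_finish(ppargs);
	return 0;

out_trans_cancel:
	/* the real code cancels the transaction and unlocks here */
out_parent:
	toy_parent_finish(ppargs);
std_return:
	return error;
}
```

Every exit taken after a successful start passes through exactly one
toy_parent_finish() call, which is the property the new out_parent label
is there to preserve.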


diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 8875b4790112e..0ddaab08d722d 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -207,3 +207,25 @@ xfs_parent_addname(
 	xfs_attr_defer_parent(&ppargs->args, XFS_ATTR_DEFER_SET);
 	return 0;
 }
+
+/* Remove a parent pointer to reflect a dirent removal. */
+int
+xfs_parent_removename(
+	struct xfs_trans	*tp,
+	struct xfs_parent_args	*ppargs,
+	struct xfs_inode	*dp,
+	const struct xfs_name	*parent_name,
+	struct xfs_inode	*child)
+{
+	int			error;
+
+	error = xfs_parent_iread_extents(tp, child);
+	if (error)
+		return error;
+
+	xfs_inode_to_parent_rec(&ppargs->rec, dp);
+	xfs_parent_da_args_init(&ppargs->args, tp, &ppargs->rec, child,
+			child->i_ino, parent_name);
+	xfs_attr_defer_parent(&ppargs->args, XFS_ATTR_DEFER_REMOVE);
+	return 0;
+}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 6de24e3ef318c..4a7fd48c226a4 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -81,5 +81,8 @@ xfs_parent_finish(
 int xfs_parent_addname(struct xfs_trans *tp, struct xfs_parent_args *ppargs,
 		struct xfs_inode *dp, const struct xfs_name *parent_name,
 		struct xfs_inode *child);
+int xfs_parent_removename(struct xfs_trans *tp, struct xfs_parent_args *ppargs,
+		struct xfs_inode *dp, const struct xfs_name *parent_name,
+		struct xfs_inode *child);
 
 #endif /* __XFS_PARENT_H__ */
diff --git a/fs/xfs/libxfs/xfs_trans_space.c b/fs/xfs/libxfs/xfs_trans_space.c
index c8adda82debe0..df729e4f1a4c9 100644
--- a/fs/xfs/libxfs/xfs_trans_space.c
+++ b/fs/xfs/libxfs/xfs_trans_space.c
@@ -81,3 +81,16 @@ xfs_symlink_space_res(
 
 	return ret;
 }
+
+unsigned int
+xfs_remove_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		namelen)
+{
+	unsigned int		ret = XFS_DIRREMOVE_SPACE_RES(mp);
+
+	if (xfs_has_parent(mp))
+		ret += xfs_parent_calc_space_res(mp, namelen);
+
+	return ret;
+}
diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h
index 354ad1d6e18d6..a4490813c56f1 100644
--- a/fs/xfs/libxfs/xfs_trans_space.h
+++ b/fs/xfs/libxfs/xfs_trans_space.h
@@ -91,8 +91,6 @@
 	 XFS_DQUOT_CLUSTER_SIZE_FSB)
 #define	XFS_QM_QINOCREATE_SPACE_RES(mp)	\
 	XFS_IALLOC_SPACE_RES(mp)
-#define	XFS_REMOVE_SPACE_RES(mp)	\
-	XFS_DIRREMOVE_SPACE_RES(mp)
 #define	XFS_RENAME_SPACE_RES(mp,nl)	\
 	(XFS_DIRREMOVE_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl))
 #define XFS_IFREE_SPACE_RES(mp)		\
@@ -106,5 +104,6 @@ unsigned int xfs_mkdir_space_res(struct xfs_mount *mp, unsigned int namelen);
 unsigned int xfs_link_space_res(struct xfs_mount *mp, unsigned int namelen);
 unsigned int xfs_symlink_space_res(struct xfs_mount *mp, unsigned int namelen,
 		unsigned int fsblocks);
+unsigned int xfs_remove_space_res(struct xfs_mount *mp, unsigned int namelen);
 
 #endif	/* __XFS_TRANS_SPACE_H__ */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 4a3fbd8d33099..492d8d1055e9e 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2720,16 +2720,17 @@ xfs_iunpin_wait(
  */
 int
 xfs_remove(
-	xfs_inode_t             *dp,
+	struct xfs_inode	*dp,
 	struct xfs_name		*name,
-	xfs_inode_t		*ip)
+	struct xfs_inode	*ip)
 {
-	xfs_mount_t		*mp = dp->i_mount;
-	xfs_trans_t             *tp = NULL;
+	struct xfs_mount	*mp = dp->i_mount;
+	struct xfs_trans	*tp = NULL;
 	int			is_dir = S_ISDIR(VFS_I(ip)->i_mode);
 	int			dontcare;
 	int                     error = 0;
 	uint			resblks;
+	struct xfs_parent_args	*ppargs;
 
 	trace_xfs_remove(dp, name);
 
@@ -2746,6 +2747,10 @@ xfs_remove(
 	if (error)
 		goto std_return;
 
+	error = xfs_parent_start(mp, &ppargs);
+	if (error)
+		goto std_return;
+
 	/*
 	 * We try to get the real space reservation first, allowing for
 	 * directory btree deletion(s) implying possible bmap insert(s).  If we
@@ -2757,12 +2762,12 @@ xfs_remove(
 	 * the directory code can handle a reservationless update and we don't
 	 * want to prevent a user from trying to free space by deleting things.
 	 */
-	resblks = XFS_REMOVE_SPACE_RES(mp);
+	resblks = xfs_remove_space_res(mp, name->len);
 	error = xfs_trans_alloc_dir(dp, &M_RES(mp)->tr_remove, ip, &resblks,
 			&tp, &dontcare);
 	if (error) {
 		ASSERT(error != -ENOSPC);
-		goto std_return;
+		goto out_parent;
 	}
 
 	/*
@@ -2822,6 +2827,13 @@ xfs_remove(
 		goto out_trans_cancel;
 	}
 
+	/* Remove parent pointer. */
+	if (ppargs) {
+		error = xfs_parent_removename(tp, ppargs, dp, name, ip);
+		if (error)
+			goto out_trans_cancel;
+	}
+
 	/*
 	 * Drop the link from dp to ip, and if ip was a directory, remove the
 	 * '.' and '..' references since we freed the directory.
@@ -2845,6 +2857,7 @@ xfs_remove(
 
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+	xfs_parent_finish(mp, ppargs);
 	return 0;
 
  out_trans_cancel:
@@ -2852,6 +2865,8 @@ xfs_remove(
  out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+ out_parent:
+	xfs_parent_finish(mp, ppargs);
  std_return:
 	return error;
 }



* [PATCH 21/32] xfs: Add parent pointers to rename
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (19 preceding siblings ...)
  2024-04-10  0:58   ` [PATCH 20/32] xfs: remove parent pointers in unlink Darrick J. Wong
@ 2024-04-10  0:58   ` Darrick J. Wong
  2024-04-10  5:46     ` Christoph Hellwig
  2024-04-10  0:59   ` [PATCH 22/32] xfs: Add parent pointers to xfs_cross_rename Darrick J. Wong
                     ` (10 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:58 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch removes the old parent pointer attribute during the rename
operation, and re-adds the updated parent pointer.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: adjust to new ondisk format]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_parent.c      |   30 +++++++++++++++
 fs/xfs/libxfs/xfs_parent.h      |    6 +++
 fs/xfs/libxfs/xfs_trans_space.c |   25 ++++++++++++
 fs/xfs/libxfs/xfs_trans_space.h |    6 ++-
 fs/xfs/scrub/orphanage.c        |    3 +
 fs/xfs/scrub/parent_repair.c    |    3 +
 fs/xfs/xfs_inode.c              |   80 ++++++++++++++++++++++++++++++++++++---
 7 files changed, 142 insertions(+), 11 deletions(-)
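
The arithmetic in xfs_rename_space_res() can be worked through with a
toy model.  The three TOY_* costs are invented constants; the real values
depend on filesystem geometry and name length, and toy_parent_res()
ignores its argument for simplicity.  The structure follows the diff
literally, including the target_exists term being added outside the
xfs_has_parent() branch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-operation block costs. */
#define TOY_DIRREMOVE_RES	3
#define TOY_DIRENTER_RES	5
#define TOY_PPTR_RES		4

/* Stand-in for the parent pointer reservation; constant in this model. */
static unsigned int
toy_parent_res(
	unsigned int	namelen)
{
	(void)namelen;
	return TOY_PPTR_RES;
}

/* Mirrors the shape of xfs_rename_space_res() from the diff. */
static unsigned int
toy_rename_space_res(
	bool		has_parent,
	unsigned int	src_namelen,
	bool		target_exists,
	unsigned int	target_namelen,
	bool		has_whiteout)
{
	unsigned int	ret;

	assert(target_namelen > 0);

	/* Remove the old dirent, add the new one. */
	ret = TOY_DIRREMOVE_RES + TOY_DIRENTER_RES;

	if (has_parent) {
		/* A whiteout left at the source gets its own pptr. */
		if (has_whiteout)
			ret += toy_parent_res(src_namelen);
		/* Replacing the source pptr is budgeted as two updates. */
		ret += 2 * toy_parent_res(target_namelen);
	}

	/* An existing target loses its pptr. */
	if (target_exists)
		ret += toy_parent_res(target_namelen);

	return ret;
}
```

With these made-up costs, a plain rename reserves 8 blocks without parent
pointers and 16 with them; adding both a whiteout and an existing target
brings the total to 24.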


diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 0ddaab08d722d..86c808157294e 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -229,3 +229,33 @@ xfs_parent_removename(
 	xfs_attr_defer_parent(&ppargs->args, XFS_ATTR_DEFER_REMOVE);
 	return 0;
 }
+
+/* Replace one parent pointer with another to reflect a rename. */
+int
+xfs_parent_replacename(
+	struct xfs_trans	*tp,
+	struct xfs_parent_args	*ppargs,
+	struct xfs_inode	*old_dp,
+	const struct xfs_name	*old_name,
+	struct xfs_inode	*new_dp,
+	const struct xfs_name	*new_name,
+	struct xfs_inode	*child)
+{
+	int			error;
+
+	error = xfs_parent_iread_extents(tp, child);
+	if (error)
+		return error;
+
+	xfs_inode_to_parent_rec(&ppargs->rec, old_dp);
+	xfs_parent_da_args_init(&ppargs->args, tp, &ppargs->rec, child,
+			child->i_ino, old_name);
+
+	xfs_inode_to_parent_rec(&ppargs->new_rec, new_dp);
+	ppargs->args.new_name = new_name->name;
+	ppargs->args.new_namelen = new_name->len;
+	ppargs->args.new_value = &ppargs->new_rec;
+	ppargs->args.new_valuelen = sizeof(struct xfs_parent_rec);
+	xfs_attr_defer_parent(&ppargs->args, XFS_ATTR_DEFER_REPLACE);
+	return 0;
+}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 4a7fd48c226a4..768633b313671 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -45,6 +45,7 @@ extern struct kmem_cache	*xfs_parent_args_cache;
  */
 struct xfs_parent_args {
 	struct xfs_parent_rec	rec;
+	struct xfs_parent_rec	new_rec;
 	struct xfs_da_args	args;
 };
 
@@ -84,5 +85,10 @@ int xfs_parent_addname(struct xfs_trans *tp, struct xfs_parent_args *ppargs,
 int xfs_parent_removename(struct xfs_trans *tp, struct xfs_parent_args *ppargs,
 		struct xfs_inode *dp, const struct xfs_name *parent_name,
 		struct xfs_inode *child);
+int xfs_parent_replacename(struct xfs_trans *tp,
+		struct xfs_parent_args *ppargs,
+		struct xfs_inode *old_dp, const struct xfs_name *old_name,
+		struct xfs_inode *new_dp, const struct xfs_name *new_name,
+		struct xfs_inode *child);
 
 #endif /* __XFS_PARENT_H__ */
diff --git a/fs/xfs/libxfs/xfs_trans_space.c b/fs/xfs/libxfs/xfs_trans_space.c
index df729e4f1a4c9..b9dc3752f702c 100644
--- a/fs/xfs/libxfs/xfs_trans_space.c
+++ b/fs/xfs/libxfs/xfs_trans_space.c
@@ -94,3 +94,28 @@ xfs_remove_space_res(
 
 	return ret;
 }
+
+unsigned int
+xfs_rename_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		src_namelen,
+	bool			target_exists,
+	unsigned int		target_namelen,
+	bool			has_whiteout)
+{
+	unsigned int		ret;
+
+	ret = XFS_DIRREMOVE_SPACE_RES(mp) +
+			XFS_DIRENTER_SPACE_RES(mp, target_namelen);
+
+	if (xfs_has_parent(mp)) {
+		if (has_whiteout)
+			ret += xfs_parent_calc_space_res(mp, src_namelen);
+		ret += 2 * xfs_parent_calc_space_res(mp, target_namelen);
+	}
+
+	if (target_exists)
+		ret += xfs_parent_calc_space_res(mp, target_namelen);
+
+	return ret;
+}
diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h
index a4490813c56f1..1155ff2d37e29 100644
--- a/fs/xfs/libxfs/xfs_trans_space.h
+++ b/fs/xfs/libxfs/xfs_trans_space.h
@@ -91,8 +91,6 @@
 	 XFS_DQUOT_CLUSTER_SIZE_FSB)
 #define	XFS_QM_QINOCREATE_SPACE_RES(mp)	\
 	XFS_IALLOC_SPACE_RES(mp)
-#define	XFS_RENAME_SPACE_RES(mp,nl)	\
-	(XFS_DIRREMOVE_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl))
 #define XFS_IFREE_SPACE_RES(mp)		\
 	(xfs_has_finobt(mp) ? M_IGEO(mp)->inobt_maxlevels : 0)
 
@@ -106,4 +104,8 @@ unsigned int xfs_symlink_space_res(struct xfs_mount *mp, unsigned int namelen,
 		unsigned int fsblocks);
 unsigned int xfs_remove_space_res(struct xfs_mount *mp, unsigned int namelen);
 
+unsigned int xfs_rename_space_res(struct xfs_mount *mp,
+		unsigned int src_namelen, bool target_exists,
+		unsigned int target_namelen, bool has_whiteout);
+
 #endif	/* __XFS_TRANS_SPACE_H__ */
diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
index 5e2c3546f2e95..94bcc2799188f 100644
--- a/fs/xfs/scrub/orphanage.c
+++ b/fs/xfs/scrub/orphanage.c
@@ -328,7 +328,8 @@ xrep_adoption_trans_alloc(
 	adopt->sc = sc;
 	adopt->orphanage_blkres = xfs_link_space_res(mp, MAXNAMELEN);
 	if (S_ISDIR(VFS_I(sc->ip)->i_mode))
-		child_blkres = XFS_RENAME_SPACE_RES(mp, xfs_name_dotdot.len);
+		child_blkres = xfs_rename_space_res(mp, 0, false,
+						    xfs_name_dotdot.len, false);
 	adopt->child_blkres = child_blkres;
 
 	/*
diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
index ebb5791bf839e..63590e1b35060 100644
--- a/fs/xfs/scrub/parent_repair.c
+++ b/fs/xfs/scrub/parent_repair.c
@@ -171,7 +171,8 @@ xrep_parent_reset_dotdot(
 	 * Reserve more space just in case we have to expand the dir.  We're
 	 * allowed to exceed quota to repair inconsistent metadata.
 	 */
-	spaceres = XFS_RENAME_SPACE_RES(sc->mp, xfs_name_dotdot.len);
+	spaceres = xfs_rename_space_res(sc->mp, 0, false, xfs_name_dotdot.len,
+			false);
 	error = xfs_trans_reserve_more_inode(sc->tp, sc->ip, spaceres, 0,
 			true);
 	if (error)
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 492d8d1055e9e..ea619f5140739 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -3147,6 +3147,9 @@ xfs_rename(
 	struct xfs_trans	*tp;
 	struct xfs_inode	*wip = NULL;		/* whiteout inode */
 	struct xfs_inode	*inodes[__XFS_SORT_INODES];
+	struct xfs_parent_args	*src_ppargs = NULL;
+	struct xfs_parent_args	*tgt_ppargs = NULL;
+	struct xfs_parent_args	*wip_ppargs = NULL;
 	int			i;
 	int			num_inodes = __XFS_SORT_INODES;
 	bool			new_parent = (src_dp != target_dp);
@@ -3178,9 +3181,26 @@ xfs_rename(
 	xfs_sort_for_rename(src_dp, target_dp, src_ip, target_ip, wip,
 				inodes, &num_inodes);
 
+	error = xfs_parent_start(mp, &src_ppargs);
+	if (error)
+		goto out_release_wip;
+
+	if (wip) {
+		error = xfs_parent_start(mp, &wip_ppargs);
+		if (error)
+			goto out_src_ppargs;
+	}
+
+	if (target_ip) {
+		error = xfs_parent_start(mp, &tgt_ppargs);
+		if (error)
+			goto out_wip_ppargs;
+	}
+
 retry:
 	nospace_error = 0;
-	spaceres = XFS_RENAME_SPACE_RES(mp, target_name->len);
+	spaceres = xfs_rename_space_res(mp, src_name->len, target_ip != NULL,
+			target_name->len, wip != NULL);
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_rename, spaceres, 0, 0, &tp);
 	if (error == -ENOSPC) {
 		nospace_error = error;
@@ -3189,7 +3209,17 @@ xfs_rename(
 				&tp);
 	}
 	if (error)
-		goto out_release_wip;
+		goto out_tgt_ppargs;
+
+	/*
+	 * We don't allow reservationless renaming when parent pointers are
+	 * enabled because we can't back out if the xattrs must grow.
+	 */
+	if (src_ppargs && nospace_error) {
+		error = nospace_error;
+		xfs_trans_cancel(tp);
+		goto out_tgt_ppargs;
+	}
 
 	/*
 	 * Attach the dquots to the inodes
@@ -3197,7 +3227,7 @@ xfs_rename(
 	error = xfs_qm_vop_rename_dqattach(inodes);
 	if (error) {
 		xfs_trans_cancel(tp);
-		goto out_release_wip;
+		goto out_tgt_ppargs;
 	}
 
 	/*
@@ -3266,6 +3296,15 @@ xfs_rename(
 			goto out_trans_cancel;
 	}
 
+	/*
+	 * We don't allow quotaless renaming when parent pointers are enabled
+	 * because we can't back out if the xattrs must grow.
+	 */
+	if (src_ppargs && nospace_error) {
+		error = nospace_error;
+		goto out_trans_cancel;
+	}
+
 	/*
 	 * Check for expected errors before we dirty the transaction
 	 * so we can return an error without a transaction abort.
@@ -3458,6 +3497,28 @@ xfs_rename(
 	if (error)
 		goto out_trans_cancel;
 
+	/* Schedule parent pointer updates. */
+	if (wip_ppargs) {
+		error = xfs_parent_addname(tp, wip_ppargs, src_dp, src_name,
+				wip);
+		if (error)
+			goto out_trans_cancel;
+	}
+
+	if (src_ppargs) {
+		error = xfs_parent_replacename(tp, src_ppargs, src_dp,
+				src_name, target_dp, target_name, src_ip);
+		if (error)
+			goto out_trans_cancel;
+	}
+
+	if (tgt_ppargs) {
+		error = xfs_parent_removename(tp, tgt_ppargs, target_dp,
+				target_name, target_ip);
+		if (error)
+			goto out_trans_cancel;
+	}
+
 	xfs_trans_ichgtime(tp, src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE);
 	if (new_parent)
@@ -3479,14 +3540,19 @@ xfs_rename(
 		xfs_dir_update_hook(src_dp, wip, 1, src_name);
 
 	error = xfs_finish_rename(tp);
-	xfs_iunlock_rename(inodes, num_inodes);
-	if (wip)
-		xfs_irele(wip);
-	return error;
+	nospace_error = 0;
+	goto out_unlock;
 
 out_trans_cancel:
 	xfs_trans_cancel(tp);
+out_unlock:
 	xfs_iunlock_rename(inodes, num_inodes);
+out_tgt_ppargs:
+	xfs_parent_finish(mp, tgt_ppargs);
+out_wip_ppargs:
+	xfs_parent_finish(mp, wip_ppargs);
+out_src_ppargs:
+	xfs_parent_finish(mp, src_ppargs);
 out_release_wip:
 	if (wip)
 		xfs_irele(wip);



* [PATCH 22/32] xfs: Add parent pointers to xfs_cross_rename
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (20 preceding siblings ...)
  2024-04-10  0:58   ` [PATCH 21/32] xfs: Add parent pointers to rename Darrick J. Wong
@ 2024-04-10  0:59   ` Darrick J. Wong
  2024-04-10  5:46     ` Christoph Hellwig
  2024-04-10  0:59   ` [PATCH 23/32] xfs: Filter XFS_ATTR_PARENT for getfattr Darrick J. Wong
                     ` (9 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:59 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Cross renames are handled separately from standard renames, and
need different handling to update the parent attributes correctly.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c |   33 +++++++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 8 deletions(-)
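
The effect of the two xfs_parent_replacename() calls in a cross rename
can be shown with a small userspace model.  struct toy_pptr and the toy_*
functions are hypothetical; a parent pointer is reduced to one (parent
inode number, dirent name) pair per child, and the deferred xattr
machinery is replaced by a direct update.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy parent pointer: which directory holds the child, under what name. */
struct toy_pptr {
	unsigned long long	parent_ino;
	char			name[32];
};

/* Replace one (parent, name) pair with another on a single child. */
static void
toy_replacename(
	struct toy_pptr		*pp,
	unsigned long long	old_dp,
	const char		*old_name,
	unsigned long long	new_dp,
	const char		*new_name)
{
	/* The old record is identified by both its name and its parent. */
	assert(pp->parent_ino == old_dp);
	assert(strcmp(pp->name, old_name) == 0);

	pp->parent_ino = new_dp;
	snprintf(pp->name, sizeof(pp->name), "%s", new_name);
}

/* Exchange two dirents: each child takes over the other's slot. */
static void
toy_cross_rename(
	unsigned long long	dp1,
	const char		*name1,
	struct toy_pptr		*ip1,
	unsigned long long	dp2,
	const char		*name2,
	struct toy_pptr		*ip2)
{
	toy_replacename(ip1, dp1, name1, dp2, name2);
	toy_replacename(ip2, dp2, name2, dp1, name1);
}
```

After the exchange, each child's parent pointer names the directory and
dirent that previously belonged to the other child, matching the swapped
argument order of the two calls in the diff.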


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index ea619f5140739..766cbb8b7be51 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2971,15 +2971,17 @@ xfs_cross_rename(
 	struct xfs_inode	*dp1,
 	struct xfs_name		*name1,
 	struct xfs_inode	*ip1,
+	struct xfs_parent_args	*ip1_ppargs,
 	struct xfs_inode	*dp2,
 	struct xfs_name		*name2,
 	struct xfs_inode	*ip2,
+	struct xfs_parent_args	*ip2_ppargs,
 	int			spaceres)
 {
-	int		error = 0;
-	int		ip1_flags = 0;
-	int		ip2_flags = 0;
-	int		dp2_flags = 0;
+	int			error = 0;
+	int			ip1_flags = 0;
+	int			ip2_flags = 0;
+	int			dp2_flags = 0;
 
 	/* Swap inode number for dirent in first parent */
 	error = xfs_dir_replace(tp, dp1, name1, ip2->i_ino, spaceres);
@@ -3048,6 +3050,21 @@ xfs_cross_rename(
 		}
 	}
 
+	/* Schedule parent pointer replacements */
+	if (ip1_ppargs) {
+		error = xfs_parent_replacename(tp, ip1_ppargs, dp1, name1, dp2,
+				name2, ip1);
+		if (error)
+			goto out_trans_abort;
+	}
+
+	if (ip2_ppargs) {
+		error = xfs_parent_replacename(tp, ip2_ppargs, dp2, name2, dp1,
+				name1, ip2);
+		if (error)
+			goto out_trans_abort;
+	}
+
 	if (ip1_flags) {
 		xfs_trans_ichgtime(tp, ip1, ip1_flags);
 		xfs_trans_log_inode(tp, ip1, XFS_ILOG_CORE);
@@ -3264,10 +3281,10 @@ xfs_rename(
 	/* RENAME_EXCHANGE is unique from here on. */
 	if (flags & RENAME_EXCHANGE) {
 		error = xfs_cross_rename(tp, src_dp, src_name, src_ip,
-					target_dp, target_name, target_ip,
-					spaceres);
-		xfs_iunlock_rename(inodes, num_inodes);
-		return error;
+				src_ppargs, target_dp, target_name, target_ip,
+				tgt_ppargs, spaceres);
+		nospace_error = 0;
+		goto out_unlock;
 	}
 
 	/*



* [PATCH 23/32] xfs: Filter XFS_ATTR_PARENT for getfattr
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (21 preceding siblings ...)
  2024-04-10  0:59   ` [PATCH 22/32] xfs: Add parent pointers to xfs_cross_rename Darrick J. Wong
@ 2024-04-10  0:59   ` Darrick J. Wong
  2024-04-10  5:51     ` Christoph Hellwig
  2024-04-10  0:59   ` [PATCH 24/32] xfs: pass the attr value to put_listent when possible Darrick J. Wong
                     ` (8 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:59 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Parent pointers returned to the getfattr tool cause errors since
the tool cannot parse parent pointers.  Fix this by filtering
parent pointers out of xfs_xattr_put_listent.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Inspired-by: Andrey Albershteyn <aalbersh@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: change this to XFS_ATTR_PRIVATE_NSP_MASK per fsverity patchset]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_format.h |    3 +++
 fs/xfs/xfs_xattr.c            |   10 ++++++++++
 2 files changed, 13 insertions(+)
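
The filtering idea can be sketched in plain C as follows.  This is only
an illustration under assumptions: the DEMO_* names and bit values are
made up for the example and are not taken from xfs_da_format.h.  A
listing loop skips every attr whose flags intersect the private
namespace mask.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative namespace bits; the values are assumptions for this demo. */
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_PARENT	(1u << 3)

/* Namespaces that must never be listed to userspace. */
#define DEMO_ATTR_PRIVATE_NSP_MASK	(DEMO_ATTR_PARENT)

struct demo_attr {
	unsigned int	flags;
	const char	*name;
};

/* Count the attrs a listing would expose, skipping private namespaces. */
static size_t
demo_count_listed(
	const struct demo_attr	*attrs,
	size_t			nr)
{
	size_t			listed = 0;
	size_t			i;

	for (i = 0; i < nr; i++) {
		/* Same test shape as the early return in put_listent. */
		if (attrs[i].flags & DEMO_ATTR_PRIVATE_NSP_MASK)
			continue;
		listed++;
	}
	return listed;
}
```

Using a mask rather than testing one flag directly means a later
private namespace can be hidden by extending the mask alone.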


diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 1395ad1937c53..ebde6eb1da65d 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -726,6 +726,9 @@ struct xfs_attr3_leafblock {
 					 XFS_ATTR_SECURE | \
 					 XFS_ATTR_PARENT)
 
+/* Private attr namespaces not exposed to userspace */
+#define XFS_ATTR_PRIVATE_NSP_MASK	(XFS_ATTR_PARENT)
+
 #define XFS_ATTR_ONDISK_MASK	(XFS_ATTR_NSP_ONDISK_MASK | \
 				 XFS_ATTR_LOCAL | \
 				 XFS_ATTR_INCOMPLETE)
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 85e886ee20e03..00b591f6c5ca1 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -20,6 +20,12 @@
 
 #include <linux/posix_acl_xattr.h>
 
+/*
+ * This file defines functions to work with externally visible extended
+ * attributes, such as those in user, system, or security namespaces.  They
+ * should not be used for internally used attributes.  Consider xfs_attr.c.
+ */
+
 /*
  * Get permission to use log-assisted atomic exchange of file extents.
  * Callers must not be running any transactions or hold any ILOCKs.
@@ -215,6 +221,10 @@ xfs_xattr_put_listent(
 
 	ASSERT(context->count >= 0);
 
+	/* Don't expose private xattr namespaces. */
+	if (flags & XFS_ATTR_PRIVATE_NSP_MASK)
+		return;
+
 	if (flags & XFS_ATTR_ROOT) {
 #ifdef CONFIG_XFS_POSIX_ACL
 		if (namelen == SGI_ACL_FILE_SIZE &&



* [PATCH 24/32] xfs: pass the attr value to put_listent when possible
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (22 preceding siblings ...)
  2024-04-10  0:59   ` [PATCH 23/32] xfs: Filter XFS_ATTR_PARENT for getfattr Darrick J. Wong
@ 2024-04-10  0:59   ` Darrick J. Wong
  2024-04-10  5:51     ` Christoph Hellwig
  2024-04-10  1:00   ` [PATCH 25/32] xfs: move handle ioctl code to xfs_handle.c Darrick J. Wong
                     ` (7 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  0:59 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Pass the attr value to put_listent when we have local xattrs or
shortform xattrs.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.h    |    5 +++--
 fs/xfs/libxfs/xfs_attr_sf.h |    1 +
 fs/xfs/xfs_attr_list.c      |    8 +++++++-
 fs/xfs/xfs_ioctl.c          |    1 +
 fs/xfs/xfs_xattr.c          |    1 +
 5 files changed, 13 insertions(+), 3 deletions(-)
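
For readers unfamiliar with the layout, the value pointer in this patch
is derived from the name: local and shortform entries store the value
bytes immediately after the name bytes in one buffer.  A small
userspace sketch of that addressing, using a hypothetical
demo_sf_entry struct rather than the real ondisk structures:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy entry: name bytes immediately followed by value bytes. */
struct demo_sf_entry {
	uint8_t		namelen;
	uint8_t		valuelen;
	uint8_t		nameval[16];
};

/* Fill an entry; returns -1 if name plus value do not fit. */
static int
demo_entry_init(
	struct demo_sf_entry	*e,
	const char		*name,
	const char		*value)
{
	size_t			nlen = strlen(name);
	size_t			vlen = strlen(value);

	if (nlen + vlen > sizeof(e->nameval))
		return -1;
	e->namelen = (uint8_t)nlen;
	e->valuelen = (uint8_t)vlen;
	memcpy(e->nameval, name, nlen);
	memcpy(e->nameval + nlen, value, vlen);
	return 0;
}

/* The value starts right after the name, as in &nameval[namelen]. */
static const void *
demo_entry_value(
	const struct demo_sf_entry	*e)
{
	return &e->nameval[e->namelen];
}
```

Remote values live outside the entry, which is why the leaf listing
code in the patch passes NULL for them; a callback should therefore
treat a NULL value as "not available here".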


diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index d63305fc54155..cb5ca37000848 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -47,8 +47,9 @@ struct xfs_attrlist_cursor_kern {
 
 
 /* void; state communicated via *context */
-typedef void (*put_listent_func_t)(struct xfs_attr_list_context *, int,
-			      unsigned char *, int, int);
+typedef void (*put_listent_func_t)(struct xfs_attr_list_context *context,
+		int flags, unsigned char *name, int namelen, void *value,
+		int valuelen);
 
 struct xfs_attr_list_context {
 	struct xfs_trans	*tp;
diff --git a/fs/xfs/libxfs/xfs_attr_sf.h b/fs/xfs/libxfs/xfs_attr_sf.h
index bc44222230248..73bdc0e556825 100644
--- a/fs/xfs/libxfs/xfs_attr_sf.h
+++ b/fs/xfs/libxfs/xfs_attr_sf.h
@@ -16,6 +16,7 @@ typedef struct xfs_attr_sf_sort {
 	uint8_t		flags;		/* flags bits (see xfs_attr_leaf.h) */
 	xfs_dahash_t	hash;		/* this entry's hash value */
 	unsigned char	*name;		/* name value, pointer into buffer */
+	void		*value;
 } xfs_attr_sf_sort_t;
 
 #define XFS_ATTR_SF_ENTSIZE_MAX			/* max space for name&value */ \
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 9bc4b5322539a..5c947e5ce8b88 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -92,6 +92,7 @@ xfs_attr_shortform_list(
 					     sfe->flags,
 					     sfe->nameval,
 					     (int)sfe->namelen,
+					     &sfe->nameval[sfe->namelen],
 					     (int)sfe->valuelen);
 			/*
 			 * Either search callback finished early or
@@ -138,6 +139,7 @@ xfs_attr_shortform_list(
 		sbp->name = sfe->nameval;
 		sbp->namelen = sfe->namelen;
 		/* These are bytes, and both on-disk, don't endian-flip */
+		sbp->value = &sfe->nameval[sfe->namelen];
 		sbp->valuelen = sfe->valuelen;
 		sbp->flags = sfe->flags;
 		sbp->hash = xfs_attr_hashval(dp->i_mount, sfe->flags,
@@ -192,6 +194,7 @@ xfs_attr_shortform_list(
 				     sbp->flags,
 				     sbp->name,
 				     sbp->namelen,
+				     sbp->value,
 				     sbp->valuelen);
 		if (context->seen_enough)
 			break;
@@ -479,6 +482,7 @@ xfs_attr3_leaf_list_int(
 	 */
 	for (; i < ichdr.count; entry++, i++) {
 		char *name;
+		void *value;
 		int namelen, valuelen;
 
 		if (be32_to_cpu(entry->hashval) != cursor->hashval) {
@@ -496,6 +500,7 @@ xfs_attr3_leaf_list_int(
 			name_loc = xfs_attr3_leaf_name_local(leaf, i);
 			name = name_loc->nameval;
 			namelen = name_loc->namelen;
+			value = &name_loc->nameval[name_loc->namelen];
 			valuelen = be16_to_cpu(name_loc->valuelen);
 		} else {
 			xfs_attr_leaf_name_remote_t *name_rmt;
@@ -503,6 +508,7 @@ xfs_attr3_leaf_list_int(
 			name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
 			name = name_rmt->name;
 			namelen = name_rmt->namelen;
+			value = NULL;
 			valuelen = be32_to_cpu(name_rmt->valuelen);
 		}
 
@@ -513,7 +519,7 @@ xfs_attr3_leaf_list_int(
 			return -EFSCORRUPTED;
 		}
 		context->put_listent(context, entry->flags,
-					      name, namelen, valuelen);
+					      name, namelen, value, valuelen);
 		if (context->seen_enough)
 			break;
 		cursor->offset++;
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 39bdd1034ffab..d56e5c6876eee 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -310,6 +310,7 @@ xfs_ioc_attr_put_listent(
 	int			flags,
 	unsigned char		*name,
 	int			namelen,
+	void			*value,
 	int			valuelen)
 {
 	struct xfs_attrlist	*alist = context->buffer;
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 00b591f6c5ca1..1d57e204c850f 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -214,6 +214,7 @@ xfs_xattr_put_listent(
 	int		flags,
 	unsigned char	*name,
 	int		namelen,
+	void		*value,
 	int		valuelen)
 {
 	char *prefix;



* [PATCH 25/32] xfs: move handle ioctl code to xfs_handle.c
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (23 preceding siblings ...)
  2024-04-10  0:59   ` [PATCH 24/32] xfs: pass the attr value to put_listent when possible Darrick J. Wong
@ 2024-04-10  1:00   ` Darrick J. Wong
  2024-04-10  5:52     ` Christoph Hellwig
  2024-04-10  1:00   ` [PATCH 26/32] xfs: split out handle management helpers a bit Darrick J. Wong
                     ` (6 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:00 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move the handle management code (and the attrmulti code that uses it) to
xfs_handle.c.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile      |    1 
 fs/xfs/xfs_handle.c  |  617 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_handle.h  |   28 ++
 fs/xfs/xfs_ioctl.c   |  591 ------------------------------------------------
 fs/xfs/xfs_ioctl.h   |   28 --
 fs/xfs/xfs_ioctl32.c |    1 
 6 files changed, 648 insertions(+), 618 deletions(-)
 create mode 100644 fs/xfs/xfs_handle.c
 create mode 100644 fs/xfs/xfs_handle.h
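
One of the moved helpers, xfs_ioc_attr_put_listent, packs its output
buffer from both ends: an offset array grows up from the header while
entries are carved off the tail.  The sketch below models that scheme
in userspace.  All demo_* names are hypothetical, and the header uses a
flexible array, so the "top of array" arithmetic is adapted for that
layout and is not copied from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct demo_list {
	int32_t		count;
	int32_t		more;
	int32_t		offset[];	/* grows upward from the header */
};

struct demo_ent {
	uint32_t	valuelen;
	char		name[];		/* NUL-terminated */
};

struct demo_ctx {
	void		*buffer;	/* must be 4-byte aligned */
	int		bufsize;
	int		firstu;		/* lowest byte used by entries */
	int		count;
	int		seen_enough;
};

static int demo_round_up4(int x)
{
	return (x + 3) & ~3;
}

static void
demo_ctx_init(struct demo_ctx *ctx, void *buf, int bufsize)
{
	memset(buf, 0, (size_t)bufsize);
	memset(ctx, 0, sizeof(*ctx));
	ctx->buffer = buf;
	ctx->bufsize = bufsize & ~3;	/* round down to 4 bytes */
	ctx->firstu = ctx->bufsize;
}

/* Add one entry at the tail, or flag the list as full. */
static void
demo_put(struct demo_ctx *ctx, const char *name, int namelen, int valuelen)
{
	struct demo_list	*list = ctx->buffer;
	struct demo_ent		*ent;
	/* End of the offset slot this entry would occupy. */
	int			arraytop = (int)(sizeof(*list) +
				(ctx->count + 1) * sizeof(list->offset[0]));

	ctx->firstu -= demo_round_up4((int)offsetof(struct demo_ent, name) +
			namelen + 1);
	if (ctx->firstu < arraytop) {
		list->more = 1;
		ctx->seen_enough = 1;
		return;
	}

	ent = (struct demo_ent *)((char *)ctx->buffer + ctx->firstu);
	ent->valuelen = (uint32_t)valuelen;
	memcpy(ent->name, name, (size_t)namelen);
	ent->name[namelen] = 0;
	list->offset[ctx->count++] = ctx->firstu;
	list->count = ctx->count;
}
```

With a 64-byte buffer, a six-byte name consumes 12 bytes at the tail
(4 for valuelen, 6 for the name, 1 for the NUL, rounded up to 4), so
the first entry lands at offset 52.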


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 0c1a0b67af93c..c969b11ce0f47 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -78,6 +78,7 @@ xfs-y				+= xfs_aops.o \
 				   xfs_fsmap.o \
 				   xfs_fsops.o \
 				   xfs_globals.o \
+				   xfs_handle.o \
 				   xfs_health.o \
 				   xfs_icache.o \
 				   xfs_ioctl.o \
diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
new file mode 100644
index 0000000000000..a0015dc8cff1a
--- /dev/null
+++ b/fs/xfs/xfs_handle.c
@@ -0,0 +1,617 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * All rights reserved.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_shared.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_inode.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_trans.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr.h"
+#include "xfs_ioctl.h"
+#include "xfs_parent.h"
+#include "xfs_da_btree.h"
+#include "xfs_handle.h"
+#include "xfs_health.h"
+#include "xfs_icache.h"
+#include "xfs_export.h"
+#include "xfs_xattr.h"
+#include "xfs_acl.h"
+
+#include <linux/namei.h>
+
+/*
+ * xfs_find_handle maps from userspace xfs_fsop_handlereq structure to
+ * a file or fs handle.
+ *
+ * XFS_IOC_PATH_TO_FSHANDLE
+ *    returns fs handle for a mount point or path within that mount point
+ * XFS_IOC_FD_TO_HANDLE
+ *    returns full handle for a FD opened in user space
+ * XFS_IOC_PATH_TO_HANDLE
+ *    returns full handle for a path
+ */
+int
+xfs_find_handle(
+	unsigned int		cmd,
+	xfs_fsop_handlereq_t	*hreq)
+{
+	int			hsize;
+	xfs_handle_t		handle;
+	struct inode		*inode;
+	struct fd		f = {NULL};
+	struct path		path;
+	int			error;
+	struct xfs_inode	*ip;
+
+	if (cmd == XFS_IOC_FD_TO_HANDLE) {
+		f = fdget(hreq->fd);
+		if (!f.file)
+			return -EBADF;
+		inode = file_inode(f.file);
+	} else {
+		error = user_path_at(AT_FDCWD, hreq->path, 0, &path);
+		if (error)
+			return error;
+		inode = d_inode(path.dentry);
+	}
+	ip = XFS_I(inode);
+
+	/*
+	 * We can only generate handles for inodes residing on a XFS filesystem,
+	 * and only for regular files, directories or symbolic links.
+	 */
+	error = -EINVAL;
+	if (inode->i_sb->s_magic != XFS_SB_MAGIC)
+		goto out_put;
+
+	error = -EBADF;
+	if (!S_ISREG(inode->i_mode) &&
+	    !S_ISDIR(inode->i_mode) &&
+	    !S_ISLNK(inode->i_mode))
+		goto out_put;
+
+
+	memcpy(&handle.ha_fsid, ip->i_mount->m_fixedfsid, sizeof(xfs_fsid_t));
+
+	if (cmd == XFS_IOC_PATH_TO_FSHANDLE) {
+		/*
+		 * This handle only contains an fsid, zero the rest.
+		 */
+		memset(&handle.ha_fid, 0, sizeof(handle.ha_fid));
+		hsize = sizeof(xfs_fsid_t);
+	} else {
+		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
+					sizeof(handle.ha_fid.fid_len);
+		handle.ha_fid.fid_pad = 0;
+		handle.ha_fid.fid_gen = inode->i_generation;
+		handle.ha_fid.fid_ino = ip->i_ino;
+		hsize = sizeof(xfs_handle_t);
+	}
+
+	error = -EFAULT;
+	if (copy_to_user(hreq->ohandle, &handle, hsize) ||
+	    copy_to_user(hreq->ohandlen, &hsize, sizeof(__s32)))
+		goto out_put;
+
+	error = 0;
+
+ out_put:
+	if (cmd == XFS_IOC_FD_TO_HANDLE)
+		fdput(f);
+	else
+		path_put(&path);
+	return error;
+}
+
+/*
+ * No need to do permission checks on the various pathname components
+ * as the handle operations are privileged.
+ */
+STATIC int
+xfs_handle_acceptable(
+	void			*context,
+	struct dentry		*dentry)
+{
+	return 1;
+}
+
+/*
+ * Convert userspace handle data into a dentry.
+ */
+struct dentry *
+xfs_handle_to_dentry(
+	struct file		*parfilp,
+	void __user		*uhandle,
+	u32			hlen)
+{
+	xfs_handle_t		handle;
+	struct xfs_fid64	fid;
+
+	/*
+	 * Only allow handle opens under a directory.
+	 */
+	if (!S_ISDIR(file_inode(parfilp)->i_mode))
+		return ERR_PTR(-ENOTDIR);
+
+	if (hlen != sizeof(xfs_handle_t))
+		return ERR_PTR(-EINVAL);
+	if (copy_from_user(&handle, uhandle, hlen))
+		return ERR_PTR(-EFAULT);
+	if (handle.ha_fid.fid_len !=
+	    sizeof(handle.ha_fid) - sizeof(handle.ha_fid.fid_len))
+		return ERR_PTR(-EINVAL);
+
+	memset(&fid, 0, sizeof(struct fid));
+	fid.ino = handle.ha_fid.fid_ino;
+	fid.gen = handle.ha_fid.fid_gen;
+
+	return exportfs_decode_fh(parfilp->f_path.mnt, (struct fid *)&fid, 3,
+			FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG,
+			xfs_handle_acceptable, NULL);
+}
+
+STATIC struct dentry *
+xfs_handlereq_to_dentry(
+	struct file		*parfilp,
+	xfs_fsop_handlereq_t	*hreq)
+{
+	return xfs_handle_to_dentry(parfilp, hreq->ihandle, hreq->ihandlen);
+}
+
+int
+xfs_open_by_handle(
+	struct file		*parfilp,
+	xfs_fsop_handlereq_t	*hreq)
+{
+	const struct cred	*cred = current_cred();
+	int			error;
+	int			fd;
+	int			permflag;
+	struct file		*filp;
+	struct inode		*inode;
+	struct dentry		*dentry;
+	fmode_t			fmode;
+	struct path		path;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	dentry = xfs_handlereq_to_dentry(parfilp, hreq);
+	if (IS_ERR(dentry))
+		return PTR_ERR(dentry);
+	inode = d_inode(dentry);
+
+	/* Restrict xfs_open_by_handle to directories & regular files. */
+	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))) {
+		error = -EPERM;
+		goto out_dput;
+	}
+
+#if BITS_PER_LONG != 32
+	hreq->oflags |= O_LARGEFILE;
+#endif
+
+	permflag = hreq->oflags;
+	fmode = OPEN_FMODE(permflag);
+	if ((!(permflag & O_APPEND) || (permflag & O_TRUNC)) &&
+	    (fmode & FMODE_WRITE) && IS_APPEND(inode)) {
+		error = -EPERM;
+		goto out_dput;
+	}
+
+	if ((fmode & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
+		error = -EPERM;
+		goto out_dput;
+	}
+
+	/* Can't write directories. */
+	if (S_ISDIR(inode->i_mode) && (fmode & FMODE_WRITE)) {
+		error = -EISDIR;
+		goto out_dput;
+	}
+
+	fd = get_unused_fd_flags(0);
+	if (fd < 0) {
+		error = fd;
+		goto out_dput;
+	}
+
+	path.mnt = parfilp->f_path.mnt;
+	path.dentry = dentry;
+	filp = dentry_open(&path, hreq->oflags, cred);
+	dput(dentry);
+	if (IS_ERR(filp)) {
+		put_unused_fd(fd);
+		return PTR_ERR(filp);
+	}
+
+	if (S_ISREG(inode->i_mode)) {
+		filp->f_flags |= O_NOATIME;
+		filp->f_mode |= FMODE_NOCMTIME;
+	}
+
+	fd_install(fd, filp);
+	return fd;
+
+ out_dput:
+	dput(dentry);
+	return error;
+}
+
+int
+xfs_readlink_by_handle(
+	struct file		*parfilp,
+	xfs_fsop_handlereq_t	*hreq)
+{
+	struct dentry		*dentry;
+	__u32			olen;
+	int			error;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	dentry = xfs_handlereq_to_dentry(parfilp, hreq);
+	if (IS_ERR(dentry))
+		return PTR_ERR(dentry);
+
+	/* Restrict this handle operation to symlinks only. */
+	if (!d_is_symlink(dentry)) {
+		error = -EINVAL;
+		goto out_dput;
+	}
+
+	if (copy_from_user(&olen, hreq->ohandlen, sizeof(__u32))) {
+		error = -EFAULT;
+		goto out_dput;
+	}
+
+	error = vfs_readlink(dentry, hreq->ohandle, olen);
+
+ out_dput:
+	dput(dentry);
+	return error;
+}
+
+/*
+ * Format an attribute and copy it out to the user's buffer.
+ * Take care to check values and protect against them changing later,
+ * we may be reading them directly out of a user buffer.
+ */
+static void
+xfs_ioc_attr_put_listent(
+	struct xfs_attr_list_context *context,
+	int			flags,
+	unsigned char		*name,
+	int			namelen,
+	void			*value,
+	int			valuelen)
+{
+	struct xfs_attrlist	*alist = context->buffer;
+	struct xfs_attrlist_ent	*aep;
+	int			arraytop;
+
+	ASSERT(!context->seen_enough);
+	ASSERT(context->count >= 0);
+	ASSERT(context->count < (ATTR_MAX_VALUELEN/8));
+	ASSERT(context->firstu >= sizeof(*alist));
+	ASSERT(context->firstu <= context->bufsize);
+
+	/*
+	 * Only list entries in the right namespace.
+	 */
+	if (context->attr_filter != (flags & XFS_ATTR_NSP_ONDISK_MASK))
+		return;
+
+	arraytop = sizeof(*alist) +
+			context->count * sizeof(alist->al_offset[0]);
+
+	/* decrement by the actual bytes used by the attr */
+	context->firstu -= round_up(offsetof(struct xfs_attrlist_ent, a_name) +
+			namelen + 1, sizeof(uint32_t));
+	if (context->firstu < arraytop) {
+		trace_xfs_attr_list_full(context);
+		alist->al_more = 1;
+		context->seen_enough = 1;
+		return;
+	}
+
+	aep = context->buffer + context->firstu;
+	aep->a_valuelen = valuelen;
+	memcpy(aep->a_name, name, namelen);
+	aep->a_name[namelen] = 0;
+	alist->al_offset[context->count++] = context->firstu;
+	alist->al_count = context->count;
+	trace_xfs_attr_list_add(context);
+}
+
+static unsigned int
+xfs_attr_filter(
+	u32			ioc_flags)
+{
+	if (ioc_flags & XFS_IOC_ATTR_ROOT)
+		return XFS_ATTR_ROOT;
+	if (ioc_flags & XFS_IOC_ATTR_SECURE)
+		return XFS_ATTR_SECURE;
+	return 0;
+}
+
+static unsigned int
+xfs_xattr_flags(
+	u32			ioc_flags)
+{
+	if (ioc_flags & XFS_IOC_ATTR_CREATE)
+		return XATTR_CREATE;
+	if (ioc_flags & XFS_IOC_ATTR_REPLACE)
+		return XATTR_REPLACE;
+	return 0;
+}
+
+int
+xfs_ioc_attr_list(
+	struct xfs_inode		*dp,
+	void __user			*ubuf,
+	size_t				bufsize,
+	int				flags,
+	struct xfs_attrlist_cursor __user *ucursor)
+{
+	struct xfs_attr_list_context	context = { };
+	struct xfs_attrlist		*alist;
+	void				*buffer;
+	int				error;
+
+	if (bufsize < sizeof(struct xfs_attrlist) ||
+	    bufsize > XFS_XATTR_LIST_MAX)
+		return -EINVAL;
+
+	/*
+	 * Reject flags, only allow namespaces.
+	 */
+	if (flags & ~(XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE))
+		return -EINVAL;
+	if (flags == (XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE))
+		return -EINVAL;
+
+	/*
+	 * Validate the cursor.
+	 */
+	if (copy_from_user(&context.cursor, ucursor, sizeof(context.cursor)))
+		return -EFAULT;
+	if (context.cursor.pad1 || context.cursor.pad2)
+		return -EINVAL;
+	if (!context.cursor.initted &&
+	    (context.cursor.hashval || context.cursor.blkno ||
+	     context.cursor.offset))
+		return -EINVAL;
+
+	buffer = kvzalloc(bufsize, GFP_KERNEL);
+	if (!buffer)
+		return -ENOMEM;
+
+	/*
+	 * Initialize the output buffer.
+	 */
+	context.dp = dp;
+	context.resynch = 1;
+	context.attr_filter = xfs_attr_filter(flags);
+	context.buffer = buffer;
+	context.bufsize = round_down(bufsize, sizeof(uint32_t));
+	context.firstu = context.bufsize;
+	context.put_listent = xfs_ioc_attr_put_listent;
+
+	alist = context.buffer;
+	alist->al_count = 0;
+	alist->al_more = 0;
+	alist->al_offset[0] = context.bufsize;
+
+	error = xfs_attr_list(&context);
+	if (error)
+		goto out_free;
+
+	if (copy_to_user(ubuf, buffer, bufsize) ||
+	    copy_to_user(ucursor, &context.cursor, sizeof(context.cursor)))
+		error = -EFAULT;
+out_free:
+	kvfree(buffer);
+	return error;
+}
+
+int
+xfs_attrlist_by_handle(
+	struct file		*parfilp,
+	struct xfs_fsop_attrlist_handlereq __user *p)
+{
+	struct xfs_fsop_attrlist_handlereq al_hreq;
+	struct dentry		*dentry;
+	int			error = -ENOMEM;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+	if (copy_from_user(&al_hreq, p, sizeof(al_hreq)))
+		return -EFAULT;
+
+	dentry = xfs_handlereq_to_dentry(parfilp, &al_hreq.hreq);
+	if (IS_ERR(dentry))
+		return PTR_ERR(dentry);
+
+	error = xfs_ioc_attr_list(XFS_I(d_inode(dentry)), al_hreq.buffer,
+				  al_hreq.buflen, al_hreq.flags, &p->pos);
+	dput(dentry);
+	return error;
+}
+
+static int
+xfs_attrmulti_attr_get(
+	struct inode		*inode,
+	unsigned char		*name,
+	unsigned char		__user *ubuf,
+	uint32_t		*len,
+	uint32_t		flags)
+{
+	struct xfs_da_args	args = {
+		.dp		= XFS_I(inode),
+		.attr_filter	= xfs_attr_filter(flags),
+		.xattr_flags	= xfs_xattr_flags(flags),
+		.name		= name,
+		.namelen	= strlen(name),
+		.valuelen	= *len,
+	};
+	int			error;
+
+	if (*len > XFS_XATTR_SIZE_MAX)
+		return -EINVAL;
+
+	error = xfs_attr_get(&args);
+	if (error)
+		goto out_kfree;
+
+	*len = args.valuelen;
+	if (copy_to_user(ubuf, args.value, args.valuelen))
+		error = -EFAULT;
+
+out_kfree:
+	kvfree(args.value);
+	return error;
+}
+
+static int
+xfs_attrmulti_attr_set(
+	struct inode		*inode,
+	unsigned char		*name,
+	const unsigned char	__user *ubuf,
+	uint32_t		len,
+	uint32_t		flags)
+{
+	struct xfs_da_args	args = {
+		.dp		= XFS_I(inode),
+		.attr_filter	= xfs_attr_filter(flags),
+		.xattr_flags	= xfs_xattr_flags(flags),
+		.name		= name,
+		.namelen	= strlen(name),
+	};
+	int			error;
+
+	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
+		return -EPERM;
+
+	if (ubuf) {
+		if (len > XFS_XATTR_SIZE_MAX)
+			return -EINVAL;
+		args.value = memdup_user(ubuf, len);
+		if (IS_ERR(args.value))
+			return PTR_ERR(args.value);
+		args.valuelen = len;
+	}
+
+	error = xfs_attr_change(&args);
+	if (!error && (flags & XFS_IOC_ATTR_ROOT))
+		xfs_forget_acl(inode, name);
+	kfree(args.value);
+	return error;
+}
+
+int
+xfs_ioc_attrmulti_one(
+	struct file		*parfilp,
+	struct inode		*inode,
+	uint32_t		opcode,
+	void __user		*uname,
+	void __user		*value,
+	uint32_t		*len,
+	uint32_t		flags)
+{
+	unsigned char		*name;
+	int			error;
+
+	if ((flags & XFS_IOC_ATTR_ROOT) && (flags & XFS_IOC_ATTR_SECURE))
+		return -EINVAL;
+
+	name = strndup_user(uname, MAXNAMELEN);
+	if (IS_ERR(name))
+		return PTR_ERR(name);
+
+	switch (opcode) {
+	case ATTR_OP_GET:
+		error = xfs_attrmulti_attr_get(inode, name, value, len, flags);
+		break;
+	case ATTR_OP_REMOVE:
+		value = NULL;
+		*len = 0;
+		fallthrough;
+	case ATTR_OP_SET:
+		error = mnt_want_write_file(parfilp);
+		if (error)
+			break;
+		error = xfs_attrmulti_attr_set(inode, name, value, *len, flags);
+		mnt_drop_write_file(parfilp);
+		break;
+	default:
+		error = -EINVAL;
+		break;
+	}
+
+	kfree(name);
+	return error;
+}
+
+int
+xfs_attrmulti_by_handle(
+	struct file		*parfilp,
+	void			__user *arg)
+{
+	int			error;
+	xfs_attr_multiop_t	*ops;
+	xfs_fsop_attrmulti_handlereq_t am_hreq;
+	struct dentry		*dentry;
+	unsigned int		i, size;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+	if (copy_from_user(&am_hreq, arg, sizeof(xfs_fsop_attrmulti_handlereq_t)))
+		return -EFAULT;
+
+	/* overflow check */
+	if (am_hreq.opcount >= INT_MAX / sizeof(xfs_attr_multiop_t))
+		return -E2BIG;
+
+	dentry = xfs_handlereq_to_dentry(parfilp, &am_hreq.hreq);
+	if (IS_ERR(dentry))
+		return PTR_ERR(dentry);
+
+	error = -E2BIG;
+	size = am_hreq.opcount * sizeof(xfs_attr_multiop_t);
+	if (!size || size > 16 * PAGE_SIZE)
+		goto out_dput;
+
+	ops = memdup_user(am_hreq.ops, size);
+	if (IS_ERR(ops)) {
+		error = PTR_ERR(ops);
+		goto out_dput;
+	}
+
+	error = 0;
+	for (i = 0; i < am_hreq.opcount; i++) {
+		ops[i].am_error = xfs_ioc_attrmulti_one(parfilp,
+				d_inode(dentry), ops[i].am_opcode,
+				ops[i].am_attrname, ops[i].am_attrvalue,
+				&ops[i].am_length, ops[i].am_flags);
+	}
+
+	if (copy_to_user(am_hreq.ops, ops, size))
+		error = -EFAULT;
+
+	kfree(ops);
+ out_dput:
+	dput(dentry);
+	return error;
+}
diff --git a/fs/xfs/xfs_handle.h b/fs/xfs/xfs_handle.h
new file mode 100644
index 0000000000000..e39eaf4689da9
--- /dev/null
+++ b/fs/xfs/xfs_handle.h
@@ -0,0 +1,28 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * All rights reserved.
+ */
+#ifndef	__XFS_HANDLE_H__
+#define	__XFS_HANDLE_H__
+
+int xfs_attrlist_by_handle(struct file *parfilp,
+		struct xfs_fsop_attrlist_handlereq __user *p);
+int xfs_attrmulti_by_handle(struct file *parfilp, void __user *arg);
+
+int xfs_find_handle(unsigned int cmd, struct xfs_fsop_handlereq *hreq);
+int xfs_open_by_handle(struct file *parfilp, struct xfs_fsop_handlereq *hreq);
+int xfs_readlink_by_handle(struct file *parfilp,
+		struct xfs_fsop_handlereq *hreq);
+
+int xfs_ioc_attrmulti_one(struct file *parfilp, struct inode *inode,
+		uint32_t opcode, void __user *uname, void __user *value,
+		uint32_t *len, uint32_t flags);
+int xfs_ioc_attr_list(struct xfs_inode *dp, void __user *ubuf,
+		      size_t bufsize, int flags,
+		      struct xfs_attrlist_cursor __user *ucursor);
+
+struct dentry *xfs_handle_to_dentry(struct file *parfilp, void __user *uhandle,
+		u32 hlen);
+
+#endif	/* __XFS_HANDLE_H__ */
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d56e5c6876eee..7b347cdd28785 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -23,11 +23,9 @@
 #include "xfs_fsops.h"
 #include "xfs_discard.h"
 #include "xfs_quota.h"
-#include "xfs_export.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_trans.h"
-#include "xfs_acl.h"
 #include "xfs_btree.h"
 #include <linux/fsmap.h>
 #include "xfs_fsmap.h"
@@ -37,601 +35,14 @@
 #include "xfs_health.h"
 #include "xfs_reflink.h"
 #include "xfs_ioctl.h"
-#include "xfs_xattr.h"
 #include "xfs_rtbitmap.h"
 #include "xfs_file.h"
 #include "xfs_exchrange.h"
+#include "xfs_handle.h"
 
 #include <linux/mount.h>
-#include <linux/namei.h>
 #include <linux/fileattr.h>
 
-/*
- * xfs_find_handle maps from userspace xfs_fsop_handlereq structure to
- * a file or fs handle.
- *
- * XFS_IOC_PATH_TO_FSHANDLE
- *    returns fs handle for a mount point or path within that mount point
- * XFS_IOC_FD_TO_HANDLE
- *    returns full handle for a FD opened in user space
- * XFS_IOC_PATH_TO_HANDLE
- *    returns full handle for a path
- */
-int
-xfs_find_handle(
-	unsigned int		cmd,
-	xfs_fsop_handlereq_t	*hreq)
-{
-	int			hsize;
-	xfs_handle_t		handle;
-	struct inode		*inode;
-	struct fd		f = {NULL};
-	struct path		path;
-	int			error;
-	struct xfs_inode	*ip;
-
-	if (cmd == XFS_IOC_FD_TO_HANDLE) {
-		f = fdget(hreq->fd);
-		if (!f.file)
-			return -EBADF;
-		inode = file_inode(f.file);
-	} else {
-		error = user_path_at(AT_FDCWD, hreq->path, 0, &path);
-		if (error)
-			return error;
-		inode = d_inode(path.dentry);
-	}
-	ip = XFS_I(inode);
-
-	/*
-	 * We can only generate handles for inodes residing on a XFS filesystem,
-	 * and only for regular files, directories or symbolic links.
-	 */
-	error = -EINVAL;
-	if (inode->i_sb->s_magic != XFS_SB_MAGIC)
-		goto out_put;
-
-	error = -EBADF;
-	if (!S_ISREG(inode->i_mode) &&
-	    !S_ISDIR(inode->i_mode) &&
-	    !S_ISLNK(inode->i_mode))
-		goto out_put;
-
-
-	memcpy(&handle.ha_fsid, ip->i_mount->m_fixedfsid, sizeof(xfs_fsid_t));
-
-	if (cmd == XFS_IOC_PATH_TO_FSHANDLE) {
-		/*
-		 * This handle only contains an fsid, zero the rest.
-		 */
-		memset(&handle.ha_fid, 0, sizeof(handle.ha_fid));
-		hsize = sizeof(xfs_fsid_t);
-	} else {
-		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-					sizeof(handle.ha_fid.fid_len);
-		handle.ha_fid.fid_pad = 0;
-		handle.ha_fid.fid_gen = inode->i_generation;
-		handle.ha_fid.fid_ino = ip->i_ino;
-		hsize = sizeof(xfs_handle_t);
-	}
-
-	error = -EFAULT;
-	if (copy_to_user(hreq->ohandle, &handle, hsize) ||
-	    copy_to_user(hreq->ohandlen, &hsize, sizeof(__s32)))
-		goto out_put;
-
-	error = 0;
-
- out_put:
-	if (cmd == XFS_IOC_FD_TO_HANDLE)
-		fdput(f);
-	else
-		path_put(&path);
-	return error;
-}
-
-/*
- * No need to do permission checks on the various pathname components
- * as the handle operations are privileged.
- */
-STATIC int
-xfs_handle_acceptable(
-	void			*context,
-	struct dentry		*dentry)
-{
-	return 1;
-}
-
-/*
- * Convert userspace handle data into a dentry.
- */
-struct dentry *
-xfs_handle_to_dentry(
-	struct file		*parfilp,
-	void __user		*uhandle,
-	u32			hlen)
-{
-	xfs_handle_t		handle;
-	struct xfs_fid64	fid;
-
-	/*
-	 * Only allow handle opens under a directory.
-	 */
-	if (!S_ISDIR(file_inode(parfilp)->i_mode))
-		return ERR_PTR(-ENOTDIR);
-
-	if (hlen != sizeof(xfs_handle_t))
-		return ERR_PTR(-EINVAL);
-	if (copy_from_user(&handle, uhandle, hlen))
-		return ERR_PTR(-EFAULT);
-	if (handle.ha_fid.fid_len !=
-	    sizeof(handle.ha_fid) - sizeof(handle.ha_fid.fid_len))
-		return ERR_PTR(-EINVAL);
-
-	memset(&fid, 0, sizeof(struct fid));
-	fid.ino = handle.ha_fid.fid_ino;
-	fid.gen = handle.ha_fid.fid_gen;
-
-	return exportfs_decode_fh(parfilp->f_path.mnt, (struct fid *)&fid, 3,
-			FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG,
-			xfs_handle_acceptable, NULL);
-}
-
-STATIC struct dentry *
-xfs_handlereq_to_dentry(
-	struct file		*parfilp,
-	xfs_fsop_handlereq_t	*hreq)
-{
-	return xfs_handle_to_dentry(parfilp, hreq->ihandle, hreq->ihandlen);
-}
-
-int
-xfs_open_by_handle(
-	struct file		*parfilp,
-	xfs_fsop_handlereq_t	*hreq)
-{
-	const struct cred	*cred = current_cred();
-	int			error;
-	int			fd;
-	int			permflag;
-	struct file		*filp;
-	struct inode		*inode;
-	struct dentry		*dentry;
-	fmode_t			fmode;
-	struct path		path;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	dentry = xfs_handlereq_to_dentry(parfilp, hreq);
-	if (IS_ERR(dentry))
-		return PTR_ERR(dentry);
-	inode = d_inode(dentry);
-
-	/* Restrict xfs_open_by_handle to directories & regular files. */
-	if (!(S_ISREG(inode->i_mode) || S_ISDIR(inode->i_mode))) {
-		error = -EPERM;
-		goto out_dput;
-	}
-
-#if BITS_PER_LONG != 32
-	hreq->oflags |= O_LARGEFILE;
-#endif
-
-	permflag = hreq->oflags;
-	fmode = OPEN_FMODE(permflag);
-	if ((!(permflag & O_APPEND) || (permflag & O_TRUNC)) &&
-	    (fmode & FMODE_WRITE) && IS_APPEND(inode)) {
-		error = -EPERM;
-		goto out_dput;
-	}
-
-	if ((fmode & FMODE_WRITE) && IS_IMMUTABLE(inode)) {
-		error = -EPERM;
-		goto out_dput;
-	}
-
-	/* Can't write directories. */
-	if (S_ISDIR(inode->i_mode) && (fmode & FMODE_WRITE)) {
-		error = -EISDIR;
-		goto out_dput;
-	}
-
-	fd = get_unused_fd_flags(0);
-	if (fd < 0) {
-		error = fd;
-		goto out_dput;
-	}
-
-	path.mnt = parfilp->f_path.mnt;
-	path.dentry = dentry;
-	filp = dentry_open(&path, hreq->oflags, cred);
-	dput(dentry);
-	if (IS_ERR(filp)) {
-		put_unused_fd(fd);
-		return PTR_ERR(filp);
-	}
-
-	if (S_ISREG(inode->i_mode)) {
-		filp->f_flags |= O_NOATIME;
-		filp->f_mode |= FMODE_NOCMTIME;
-	}
-
-	fd_install(fd, filp);
-	return fd;
-
- out_dput:
-	dput(dentry);
-	return error;
-}
-
-int
-xfs_readlink_by_handle(
-	struct file		*parfilp,
-	xfs_fsop_handlereq_t	*hreq)
-{
-	struct dentry		*dentry;
-	__u32			olen;
-	int			error;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-
-	dentry = xfs_handlereq_to_dentry(parfilp, hreq);
-	if (IS_ERR(dentry))
-		return PTR_ERR(dentry);
-
-	/* Restrict this handle operation to symlinks only. */
-	if (!d_is_symlink(dentry)) {
-		error = -EINVAL;
-		goto out_dput;
-	}
-
-	if (copy_from_user(&olen, hreq->ohandlen, sizeof(__u32))) {
-		error = -EFAULT;
-		goto out_dput;
-	}
-
-	error = vfs_readlink(dentry, hreq->ohandle, olen);
-
- out_dput:
-	dput(dentry);
-	return error;
-}
-
-/*
- * Format an attribute and copy it out to the user's buffer.
- * Take care to check values and protect against them changing later,
- * we may be reading them directly out of a user buffer.
- */
-static void
-xfs_ioc_attr_put_listent(
-	struct xfs_attr_list_context *context,
-	int			flags,
-	unsigned char		*name,
-	int			namelen,
-	void			*value,
-	int			valuelen)
-{
-	struct xfs_attrlist	*alist = context->buffer;
-	struct xfs_attrlist_ent	*aep;
-	int			arraytop;
-
-	ASSERT(!context->seen_enough);
-	ASSERT(context->count >= 0);
-	ASSERT(context->count < (ATTR_MAX_VALUELEN/8));
-	ASSERT(context->firstu >= sizeof(*alist));
-	ASSERT(context->firstu <= context->bufsize);
-
-	/*
-	 * Only list entries in the right namespace.
-	 */
-	if (context->attr_filter != (flags & XFS_ATTR_NSP_ONDISK_MASK))
-		return;
-
-	arraytop = sizeof(*alist) +
-			context->count * sizeof(alist->al_offset[0]);
-
-	/* decrement by the actual bytes used by the attr */
-	context->firstu -= round_up(offsetof(struct xfs_attrlist_ent, a_name) +
-			namelen + 1, sizeof(uint32_t));
-	if (context->firstu < arraytop) {
-		trace_xfs_attr_list_full(context);
-		alist->al_more = 1;
-		context->seen_enough = 1;
-		return;
-	}
-
-	aep = context->buffer + context->firstu;
-	aep->a_valuelen = valuelen;
-	memcpy(aep->a_name, name, namelen);
-	aep->a_name[namelen] = 0;
-	alist->al_offset[context->count++] = context->firstu;
-	alist->al_count = context->count;
-	trace_xfs_attr_list_add(context);
-}
-
-static unsigned int
-xfs_attr_filter(
-	u32			ioc_flags)
-{
-	if (ioc_flags & XFS_IOC_ATTR_ROOT)
-		return XFS_ATTR_ROOT;
-	if (ioc_flags & XFS_IOC_ATTR_SECURE)
-		return XFS_ATTR_SECURE;
-	return 0;
-}
-
-static unsigned int
-xfs_xattr_flags(
-	u32			ioc_flags)
-{
-	if (ioc_flags & XFS_IOC_ATTR_CREATE)
-		return XATTR_CREATE;
-	if (ioc_flags & XFS_IOC_ATTR_REPLACE)
-		return XATTR_REPLACE;
-	return 0;
-}
-
-int
-xfs_ioc_attr_list(
-	struct xfs_inode		*dp,
-	void __user			*ubuf,
-	size_t				bufsize,
-	int				flags,
-	struct xfs_attrlist_cursor __user *ucursor)
-{
-	struct xfs_attr_list_context	context = { };
-	struct xfs_attrlist		*alist;
-	void				*buffer;
-	int				error;
-
-	if (bufsize < sizeof(struct xfs_attrlist) ||
-	    bufsize > XFS_XATTR_LIST_MAX)
-		return -EINVAL;
-
-	/*
-	 * Reject flags, only allow namespaces.
-	 */
-	if (flags & ~(XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE))
-		return -EINVAL;
-	if (flags == (XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE))
-		return -EINVAL;
-
-	/*
-	 * Validate the cursor.
-	 */
-	if (copy_from_user(&context.cursor, ucursor, sizeof(context.cursor)))
-		return -EFAULT;
-	if (context.cursor.pad1 || context.cursor.pad2)
-		return -EINVAL;
-	if (!context.cursor.initted &&
-	    (context.cursor.hashval || context.cursor.blkno ||
-	     context.cursor.offset))
-		return -EINVAL;
-
-	buffer = kvzalloc(bufsize, GFP_KERNEL);
-	if (!buffer)
-		return -ENOMEM;
-
-	/*
-	 * Initialize the output buffer.
-	 */
-	context.dp = dp;
-	context.resynch = 1;
-	context.attr_filter = xfs_attr_filter(flags);
-	context.buffer = buffer;
-	context.bufsize = round_down(bufsize, sizeof(uint32_t));
-	context.firstu = context.bufsize;
-	context.put_listent = xfs_ioc_attr_put_listent;
-
-	alist = context.buffer;
-	alist->al_count = 0;
-	alist->al_more = 0;
-	alist->al_offset[0] = context.bufsize;
-
-	error = xfs_attr_list(&context);
-	if (error)
-		goto out_free;
-
-	if (copy_to_user(ubuf, buffer, bufsize) ||
-	    copy_to_user(ucursor, &context.cursor, sizeof(context.cursor)))
-		error = -EFAULT;
-out_free:
-	kvfree(buffer);
-	return error;
-}
-
-STATIC int
-xfs_attrlist_by_handle(
-	struct file		*parfilp,
-	struct xfs_fsop_attrlist_handlereq __user *p)
-{
-	struct xfs_fsop_attrlist_handlereq al_hreq;
-	struct dentry		*dentry;
-	int			error = -ENOMEM;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-	if (copy_from_user(&al_hreq, p, sizeof(al_hreq)))
-		return -EFAULT;
-
-	dentry = xfs_handlereq_to_dentry(parfilp, &al_hreq.hreq);
-	if (IS_ERR(dentry))
-		return PTR_ERR(dentry);
-
-	error = xfs_ioc_attr_list(XFS_I(d_inode(dentry)), al_hreq.buffer,
-				  al_hreq.buflen, al_hreq.flags, &p->pos);
-	dput(dentry);
-	return error;
-}
-
-static int
-xfs_attrmulti_attr_get(
-	struct inode		*inode,
-	unsigned char		*name,
-	unsigned char		__user *ubuf,
-	uint32_t		*len,
-	uint32_t		flags)
-{
-	struct xfs_da_args	args = {
-		.dp		= XFS_I(inode),
-		.attr_filter	= xfs_attr_filter(flags),
-		.xattr_flags	= xfs_xattr_flags(flags),
-		.name		= name,
-		.namelen	= strlen(name),
-		.valuelen	= *len,
-	};
-	int			error;
-
-	if (*len > XFS_XATTR_SIZE_MAX)
-		return -EINVAL;
-
-	error = xfs_attr_get(&args);
-	if (error)
-		goto out_kfree;
-
-	*len = args.valuelen;
-	if (copy_to_user(ubuf, args.value, args.valuelen))
-		error = -EFAULT;
-
-out_kfree:
-	kvfree(args.value);
-	return error;
-}
-
-static int
-xfs_attrmulti_attr_set(
-	struct inode		*inode,
-	unsigned char		*name,
-	const unsigned char	__user *ubuf,
-	uint32_t		len,
-	uint32_t		flags)
-{
-	struct xfs_da_args	args = {
-		.dp		= XFS_I(inode),
-		.attr_filter	= xfs_attr_filter(flags),
-		.xattr_flags	= xfs_xattr_flags(flags),
-		.name		= name,
-		.namelen	= strlen(name),
-	};
-	int			error;
-
-	if (IS_IMMUTABLE(inode) || IS_APPEND(inode))
-		return -EPERM;
-
-	if (ubuf) {
-		if (len > XFS_XATTR_SIZE_MAX)
-			return -EINVAL;
-		args.value = memdup_user(ubuf, len);
-		if (IS_ERR(args.value))
-			return PTR_ERR(args.value);
-		args.valuelen = len;
-	}
-
-	error = xfs_attr_change(&args);
-	if (!error && (flags & XFS_IOC_ATTR_ROOT))
-		xfs_forget_acl(inode, name);
-	kfree(args.value);
-	return error;
-}
-
-int
-xfs_ioc_attrmulti_one(
-	struct file		*parfilp,
-	struct inode		*inode,
-	uint32_t		opcode,
-	void __user		*uname,
-	void __user		*value,
-	uint32_t		*len,
-	uint32_t		flags)
-{
-	unsigned char		*name;
-	int			error;
-
-	if ((flags & XFS_IOC_ATTR_ROOT) && (flags & XFS_IOC_ATTR_SECURE))
-		return -EINVAL;
-
-	name = strndup_user(uname, MAXNAMELEN);
-	if (IS_ERR(name))
-		return PTR_ERR(name);
-
-	switch (opcode) {
-	case ATTR_OP_GET:
-		error = xfs_attrmulti_attr_get(inode, name, value, len, flags);
-		break;
-	case ATTR_OP_REMOVE:
-		value = NULL;
-		*len = 0;
-		fallthrough;
-	case ATTR_OP_SET:
-		error = mnt_want_write_file(parfilp);
-		if (error)
-			break;
-		error = xfs_attrmulti_attr_set(inode, name, value, *len, flags);
-		mnt_drop_write_file(parfilp);
-		break;
-	default:
-		error = -EINVAL;
-		break;
-	}
-
-	kfree(name);
-	return error;
-}
-
-STATIC int
-xfs_attrmulti_by_handle(
-	struct file		*parfilp,
-	void			__user *arg)
-{
-	int			error;
-	xfs_attr_multiop_t	*ops;
-	xfs_fsop_attrmulti_handlereq_t am_hreq;
-	struct dentry		*dentry;
-	unsigned int		i, size;
-
-	if (!capable(CAP_SYS_ADMIN))
-		return -EPERM;
-	if (copy_from_user(&am_hreq, arg, sizeof(xfs_fsop_attrmulti_handlereq_t)))
-		return -EFAULT;
-
-	/* overflow check */
-	if (am_hreq.opcount >= INT_MAX / sizeof(xfs_attr_multiop_t))
-		return -E2BIG;
-
-	dentry = xfs_handlereq_to_dentry(parfilp, &am_hreq.hreq);
-	if (IS_ERR(dentry))
-		return PTR_ERR(dentry);
-
-	error = -E2BIG;
-	size = am_hreq.opcount * sizeof(xfs_attr_multiop_t);
-	if (!size || size > 16 * PAGE_SIZE)
-		goto out_dput;
-
-	ops = memdup_user(am_hreq.ops, size);
-	if (IS_ERR(ops)) {
-		error = PTR_ERR(ops);
-		goto out_dput;
-	}
-
-	error = 0;
-	for (i = 0; i < am_hreq.opcount; i++) {
-		ops[i].am_error = xfs_ioc_attrmulti_one(parfilp,
-				d_inode(dentry), ops[i].am_opcode,
-				ops[i].am_attrname, ops[i].am_attrvalue,
-				&ops[i].am_length, ops[i].am_flags);
-	}
-
-	if (copy_to_user(am_hreq.ops, ops, size))
-		error = -EFAULT;
-
-	kfree(ops);
- out_dput:
-	dput(dentry);
-	return error;
-}
-
 /* Return 0 on success or positive error */
 int
 xfs_fsbulkstat_one_fmt(
diff --git a/fs/xfs/xfs_ioctl.h b/fs/xfs/xfs_ioctl.h
index 38be600b5e1e8..12124946f347e 100644
--- a/fs/xfs/xfs_ioctl.h
+++ b/fs/xfs/xfs_ioctl.h
@@ -14,34 +14,6 @@ int
 xfs_ioc_swapext(
 	xfs_swapext_t	*sxp);
 
-extern int
-xfs_find_handle(
-	unsigned int		cmd,
-	xfs_fsop_handlereq_t	*hreq);
-
-extern int
-xfs_open_by_handle(
-	struct file		*parfilp,
-	xfs_fsop_handlereq_t	*hreq);
-
-extern int
-xfs_readlink_by_handle(
-	struct file		*parfilp,
-	xfs_fsop_handlereq_t	*hreq);
-
-int xfs_ioc_attrmulti_one(struct file *parfilp, struct inode *inode,
-		uint32_t opcode, void __user *uname, void __user *value,
-		uint32_t *len, uint32_t flags);
-int xfs_ioc_attr_list(struct xfs_inode *dp, void __user *ubuf,
-		      size_t bufsize, int flags,
-		      struct xfs_attrlist_cursor __user *ucursor);
-
-extern struct dentry *
-xfs_handle_to_dentry(
-	struct file		*parfilp,
-	void __user		*uhandle,
-	u32			hlen);
-
 extern int
 xfs_fileattr_get(
 	struct dentry		*dentry,
diff --git a/fs/xfs/xfs_ioctl32.c b/fs/xfs/xfs_ioctl32.c
index ee35eea1ecce6..b64785dc4354e 100644
--- a/fs/xfs/xfs_ioctl32.c
+++ b/fs/xfs/xfs_ioctl32.c
@@ -24,6 +24,7 @@
 #include "xfs_ioctl32.h"
 #include "xfs_trace.h"
 #include "xfs_sb.h"
+#include "xfs_handle.h"
 
 #define  _NATIVE_IOC(cmd, type) \
 	  _IOC(_IOC_DIR(cmd), _IOC_TYPE(cmd), _IOC_NR(cmd), sizeof(type))



* [PATCH 26/32] xfs: split out handle management helpers a bit
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (24 preceding siblings ...)
  2024-04-10  1:00   ` [PATCH 25/32] xfs: move handle ioctl code to xfs_handle.c Darrick J. Wong
@ 2024-04-10  1:00   ` Darrick J. Wong
  2024-04-10  5:56     ` Christoph Hellwig
  2024-04-10  1:00   ` [PATCH 27/32] xfs: Add parent pointer ioctls Darrick J. Wong
                     ` (5 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:00 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Split out the functions that generate file/fs handles and map them back
into dentries, in preparation for the GETPARENTS ioctls added in the
next patch.
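
For readers following along, here is a minimal userspace sketch of what
the two new helpers fill in.  The demo_* types and functions are
hypothetical stand-ins, not the kernel definitions; the field types are
assumed from how the patch uses them, and the fsid is passed in directly
rather than read from a mount structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for xfs_fsid/xfs_fid/xfs_handle (layout assumed). */
struct demo_fsid {
	uint32_t	val[2];
};

struct demo_fid {
	uint16_t	fid_len;	/* length of remainder */
	uint16_t	fid_pad;
	uint32_t	fid_gen;
	uint64_t	fid_ino;
};

struct demo_handle {
	struct demo_fsid	ha_fsid;
	struct demo_fid		ha_fid;
};

/* File handle: fsid plus a fully populated fid; returns the full size. */
static size_t
demo_filehandle_init(const uint32_t fsid[2], uint64_t ino, uint32_t gen,
		     struct demo_handle *handle)
{
	assert(handle != NULL);
	memcpy(&handle->ha_fsid, fsid, sizeof(struct demo_fsid));

	handle->ha_fid.fid_len = sizeof(struct demo_fid) -
				 sizeof(handle->ha_fid.fid_len);
	handle->ha_fid.fid_pad = 0;
	handle->ha_fid.fid_gen = gen;
	handle->ha_fid.fid_ino = ino;

	return sizeof(struct demo_handle);
}

/* Fs handle: only the fsid is meaningful, so the fid is zeroed. */
static size_t
demo_fshandle_init(const uint32_t fsid[2], struct demo_handle *handle)
{
	assert(handle != NULL);
	memcpy(&handle->ha_fsid, fsid, sizeof(struct demo_fsid));
	memset(&handle->ha_fid, 0, sizeof(handle->ha_fid));

	return sizeof(struct demo_fsid);
}
```

The return value is the number of bytes that the caller should copy
out, which is why the fs handle variant reports only the fsid size.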

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_fs.h |    4 ++
 fs/xfs/xfs_handle.c    |   92 ++++++++++++++++++++++++++++++++----------------
 2 files changed, 64 insertions(+), 32 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 7486dcba8c218..51aa4774f57a2 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -633,7 +633,9 @@ typedef struct xfs_fsop_attrmulti_handlereq {
 /*
  * per machine unique filesystem identifier types.
  */
-typedef struct { __u32 val[2]; } xfs_fsid_t; /* file system id type */
+typedef struct xfs_fsid {
+	__u32	val[2];			/* file system id type */
+} xfs_fsid_t;
 
 typedef struct xfs_fid {
 	__u16	fid_len;		/* length of remainder	*/
diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
index a0015dc8cff1a..abeca486a2c91 100644
--- a/fs/xfs/xfs_handle.c
+++ b/fs/xfs/xfs_handle.c
@@ -30,6 +30,35 @@
 
 #include <linux/namei.h>
 
+static size_t
+xfs_filehandle_init(
+	struct xfs_mount	*mp,
+	xfs_ino_t		ino,
+	uint32_t		gen,
+	struct xfs_handle	*handle)
+{
+	memcpy(&handle->ha_fsid, mp->m_fixedfsid, sizeof(struct xfs_fsid));
+
+	handle->ha_fid.fid_len = sizeof(struct xfs_fid) -
+				 sizeof(handle->ha_fid.fid_len);
+	handle->ha_fid.fid_pad = 0;
+	handle->ha_fid.fid_gen = gen;
+	handle->ha_fid.fid_ino = ino;
+
+	return sizeof(struct xfs_handle);
+}
+
+static size_t
+xfs_fshandle_init(
+	struct xfs_mount	*mp,
+	struct xfs_handle	*handle)
+{
+	memcpy(&handle->ha_fsid, mp->m_fixedfsid, sizeof(struct xfs_fsid));
+	memset(&handle->ha_fid, 0, sizeof(handle->ha_fid));
+
+	return sizeof(struct xfs_fsid);
+}
+
 /*
  * xfs_find_handle maps from userspace xfs_fsop_handlereq structure to
  * a file or fs handle.
@@ -84,20 +113,11 @@ xfs_find_handle(
 
 	memcpy(&handle.ha_fsid, ip->i_mount->m_fixedfsid, sizeof(xfs_fsid_t));
 
-	if (cmd == XFS_IOC_PATH_TO_FSHANDLE) {
-		/*
-		 * This handle only contains an fsid, zero the rest.
-		 */
-		memset(&handle.ha_fid, 0, sizeof(handle.ha_fid));
-		hsize = sizeof(xfs_fsid_t);
-	} else {
-		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
-					sizeof(handle.ha_fid.fid_len);
-		handle.ha_fid.fid_pad = 0;
-		handle.ha_fid.fid_gen = inode->i_generation;
-		handle.ha_fid.fid_ino = ip->i_ino;
-		hsize = sizeof(xfs_handle_t);
-	}
+	if (cmd == XFS_IOC_PATH_TO_FSHANDLE)
+		hsize = xfs_fshandle_init(ip->i_mount, &handle);
+	else
+		hsize = xfs_filehandle_init(ip->i_mount, ip->i_ino,
+				inode->i_generation, &handle);
 
 	error = -EFAULT;
 	if (copy_to_user(hreq->ohandle, &handle, hsize) ||
@@ -126,6 +146,32 @@ xfs_handle_acceptable(
 	return 1;
 }
 
+/* Convert handle already copied to kernel space into a dentry. */
+static struct dentry *
+xfs_khandle_to_dentry(
+	struct file		*file,
+	struct xfs_handle	*handle)
+{
+	struct xfs_fid64        fid = {
+		.ino		= handle->ha_fid.fid_ino,
+		.gen		= handle->ha_fid.fid_gen,
+	};
+
+	/*
+	 * Only allow handle opens under a directory.
+	 */
+	if (!S_ISDIR(file_inode(file)->i_mode))
+		return ERR_PTR(-ENOTDIR);
+
+	if (handle->ha_fid.fid_len !=
+	    sizeof(handle->ha_fid) - sizeof(handle->ha_fid.fid_len))
+		return ERR_PTR(-EINVAL);
+
+	return exportfs_decode_fh(file->f_path.mnt, (struct fid *)&fid, 3,
+			FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG,
+			xfs_handle_acceptable, NULL);
+}
+
 /*
  * Convert userspace handle data into a dentry.
  */
@@ -136,29 +182,13 @@ xfs_handle_to_dentry(
 	u32			hlen)
 {
 	xfs_handle_t		handle;
-	struct xfs_fid64	fid;
-
-	/*
-	 * Only allow handle opens under a directory.
-	 */
-	if (!S_ISDIR(file_inode(parfilp)->i_mode))
-		return ERR_PTR(-ENOTDIR);
 
 	if (hlen != sizeof(xfs_handle_t))
 		return ERR_PTR(-EINVAL);
 	if (copy_from_user(&handle, uhandle, hlen))
 		return ERR_PTR(-EFAULT);
-	if (handle.ha_fid.fid_len !=
-	    sizeof(handle.ha_fid) - sizeof(handle.ha_fid.fid_len))
-		return ERR_PTR(-EINVAL);
 
-	memset(&fid, 0, sizeof(struct fid));
-	fid.ino = handle.ha_fid.fid_ino;
-	fid.gen = handle.ha_fid.fid_gen;
-
-	return exportfs_decode_fh(parfilp->f_path.mnt, (struct fid *)&fid, 3,
-			FILEID_INO32_GEN | XFS_FILEID_TYPE_64FLAG,
-			xfs_handle_acceptable, NULL);
+	return xfs_khandle_to_dentry(parfilp, &handle);
 }
 
 STATIC struct dentry *



* [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (25 preceding siblings ...)
  2024-04-10  1:00   ` [PATCH 26/32] xfs: split out handle management helpers a bit Darrick J. Wong
@ 2024-04-10  1:00   ` Darrick J. Wong
  2024-04-10  6:04     ` Christoph Hellwig
  2024-04-12 17:39     ` Darrick J. Wong
  2024-04-10  1:00   ` [PATCH 28/32] xfs: don't remove the attr fork when parent pointers are enabled Darrick J. Wong
                     ` (4 subsequent siblings)
  31 siblings, 2 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:00 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

This patch adds a pair of new file ioctls to retrieve the parent pointers
of a given file.  Both return the same results: XFS_IOC_GETPARENTS
operates on the file descriptor passed to ioctl(), whereas
XFS_IOC_GETPARENTS_BY_HANDLE lets the caller specify a file handle for
the file whose parents are wanted.
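
To illustrate the record buffer layout, here is a rough userspace sketch
that packs and walks records in an ordinary memory buffer.  It does not
call the ioctl.  The demo_* names are hypothetical mirrors of the
structures in this patch (the 26-byte packed record header matches the
size check added to xfs_ondisk.h), and the handle field types are
assumed.  The sketch also assumes at least one record is present.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flattened mirror of struct xfs_handle (types assumed). */
struct demo_handle {
	uint32_t	fsid[2];
	uint16_t	fid_len;
	uint16_t	fid_pad;
	uint32_t	fid_gen;
	uint64_t	fid_ino;
};

/* Hypothetical mirror of struct xfs_getparents_rec. */
struct demo_rec {
	struct demo_handle	gpr_parent;	/* handle to parent */
	uint16_t		gpr_reclen;	/* length of entire record */
	char			gpr_name[];	/* NUL-terminated name */
} __attribute__((packed));

/* Header plus name plus NUL, rounded up to a 4-byte boundary. */
static unsigned int demo_rec_sizeof(unsigned int namelen)
{
	unsigned int raw = sizeof(struct demo_rec) + namelen + 1;

	return (raw + 3) & ~3u;
}

/* Append one record at *used; returns 0, or -1 if it does not fit. */
static int demo_put_rec(char *buf, unsigned int bufsize, unsigned int *used,
			uint64_t ino, uint32_t gen, const char *name)
{
	unsigned int namelen, reclen;
	struct demo_rec *rec;

	assert(buf && used && name);
	namelen = strlen(name);
	reclen = demo_rec_sizeof(namelen);
	if (reclen > bufsize || *used > bufsize - reclen)
		return -1;

	rec = (struct demo_rec *)(buf + *used);
	memset(rec, 0, reclen);
	rec->gpr_parent.fid_len = sizeof(struct demo_handle) -
				  offsetof(struct demo_handle, fid_pad);
	rec->gpr_parent.fid_gen = gen;
	rec->gpr_parent.fid_ino = ino;
	rec->gpr_reclen = reclen;
	memcpy(rec->gpr_name, name, namelen + 1);
	*used += reclen;
	return 0;
}

/* Stretch the last record to the end of the buffer so a walker stops. */
static void demo_expand_lastrec(char *buf, unsigned int bufsize,
				unsigned int last_off)
{
	struct demo_rec *rec = (struct demo_rec *)(buf + last_off);

	rec->gpr_reclen = bufsize - last_off;
}

/* Same stepping rule as xfs_getparents_next_rec() in this patch. */
static struct demo_rec *demo_next_rec(char *buf, unsigned int bufsize,
				      struct demo_rec *rec)
{
	char *next = (char *)rec + rec->gpr_reclen;

	if (next >= buf + bufsize)
		return NULL;
	return (struct demo_rec *)next;
}

/* Count records, assuming the buffer holds at least one. */
static unsigned int demo_count_recs(char *buf, unsigned int bufsize)
{
	struct demo_rec *rec = (struct demo_rec *)buf;
	unsigned int n = 0;

	while (rec) {
		n++;
		rec = demo_next_rec(buf, bufsize, rec);
	}
	return n;
}
```

Expanding the last record is what lets the walker terminate purely on
gpr_reclen, without a separate record count in the request structure.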

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
[djwong: adjust to new ondisk format, split ioctls]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_fs.h     |   73 ++++++++++++
 fs/xfs/libxfs/xfs_ondisk.h |    5 +
 fs/xfs/libxfs/xfs_parent.c |   35 ++++++
 fs/xfs/libxfs/xfs_parent.h |    5 +
 fs/xfs/xfs_handle.c        |  259 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_handle.h        |    5 +
 fs/xfs/xfs_ioctl.c         |    6 +
 fs/xfs/xfs_trace.c         |    1 
 fs/xfs/xfs_trace.h         |   92 ++++++++++++++++
 9 files changed, 480 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 51aa4774f57a2..fa28c18e521bf 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -840,6 +840,77 @@ struct xfs_commit_range {
 					 XFS_EXCHANGE_RANGE_DRY_RUN | \
 					 XFS_EXCHANGE_RANGE_FILE1_WRITTEN)
 
+/* Iterating parent pointers of files. */
+
+/* target was the root directory */
+#define XFS_GETPARENTS_OFLAG_ROOT	(1U << 0)
+
+/* Cursor is done iterating pptrs */
+#define XFS_GETPARENTS_OFLAG_DONE	(1U << 1)
+
+#define XFS_GETPARENTS_OFLAGS_ALL	(XFS_GETPARENTS_OFLAG_ROOT | \
+					 XFS_GETPARENTS_OFLAG_DONE)
+
+#define XFS_GETPARENTS_IFLAGS_ALL	(0)
+
+struct xfs_getparents_rec {
+	struct xfs_handle	gpr_parent; /* Handle to parent */
+	__u16			gpr_reclen; /* Length of entire record */
+	char			gpr_name[]; /* Null-terminated filename */
+} __packed;
+
+/* Iterate through this file's directory parent pointers */
+struct xfs_getparents {
+	/*
+	 * Structure to track progress in iterating the parent pointers.
+	 * Must be initialized to zeroes before the first ioctl call, and
+	 * not touched by callers after that.
+	 */
+	struct xfs_attrlist_cursor	gp_cursor;
+
+	/* Input flags: XFS_GETPARENTS_IFLAG* */
+	__u16				gp_iflags;
+
+	/* Output flags: XFS_GETPARENTS_OFLAG* */
+	__u16				gp_oflags;
+
+	/* Size of the gp_buffer in bytes */
+	__u32				gp_bufsize;
+
+	/* Must be set to zero */
+	__u64				__pad;
+
+	/* Pointer to a buffer in which to place xfs_getparents_rec */
+	__u64				gp_buffer;
+};
+
+static inline struct xfs_getparents_rec *
+xfs_getparents_first_rec(struct xfs_getparents *gp)
+{
+	return (struct xfs_getparents_rec *)(uintptr_t)gp->gp_buffer;
+}
+
+static inline struct xfs_getparents_rec *
+xfs_getparents_next_rec(struct xfs_getparents *gp,
+			struct xfs_getparents_rec *gpr)
+{
+	char *next = ((char *)gpr + gpr->gpr_reclen);
+	char *end = (char *)(uintptr_t)(gp->gp_buffer + gp->gp_bufsize);
+
+	if (next >= end)
+		return NULL;
+
+	return (struct xfs_getparents_rec *)next;
+}
+
+/* Iterate through this file handle's directory parent pointers. */
+struct xfs_getparents_by_handle {
+	/* Handle to file whose parents we want. */
+	struct xfs_handle		gph_handle;
+
+	struct xfs_getparents		gph_request;
+};
+
 /*
  * ioctl commands that are used by Linux filesystems
  */
@@ -875,6 +946,8 @@ struct xfs_commit_range {
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
 #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
 #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
+#define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_getparents)
+#define XFS_IOC_GETPARENTS_BY_HANDLE _IOWR('X', 63, struct xfs_getparents_by_handle)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h
index 25952ef584eee..34c972113d997 100644
--- a/fs/xfs/libxfs/xfs_ondisk.h
+++ b/fs/xfs/libxfs/xfs_ondisk.h
@@ -156,6 +156,11 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_OFFSET(struct xfs_efi_log_format_32, efi_extents,	16);
 	XFS_CHECK_OFFSET(struct xfs_efi_log_format_64, efi_extents,	16);
 
+	/* parent pointer ioctls */
+	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents_rec,	26);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents,		40);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents_by_handle,	64);
+
 	/*
 	 * The v5 superblock format extended several v4 header structures with
 	 * additional data. While new fields are only accessible on v5
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 86c808157294e..db8cfad0b968e 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -259,3 +259,38 @@ xfs_parent_replacename(
 	xfs_attr_defer_parent(&ppargs->args, XFS_ATTR_DEFER_REPLACE);
 	return 0;
 }
+
+/*
+ * Extract parent pointer information from any xattr into @parent_ino/gen.
+ * The last two parameters can be NULL pointers.
+ *
+ * Returns 1 if this is a valid parent pointer; 0 if this is not a parent
+ * pointer xattr at all; or -EFSCORRUPTED for garbage.
+ */
+int
+xfs_parent_from_xattr(
+	struct xfs_mount	*mp,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen,
+	xfs_ino_t		*parent_ino,
+	uint32_t		*parent_gen)
+{
+	const struct xfs_parent_rec	*rec = value;
+
+	if (!(attr_flags & XFS_ATTR_PARENT))
+		return 0;
+
+	if (!xfs_parent_namecheck(attr_flags, name, namelen))
+		return -EFSCORRUPTED;
+	if (!xfs_parent_valuecheck(mp, value, valuelen))
+		return -EFSCORRUPTED;
+
+	if (parent_ino)
+		*parent_ino = be64_to_cpu(rec->p_ino);
+	if (parent_gen)
+		*parent_gen = be32_to_cpu(rec->p_gen);
+	return 1;
+}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 768633b313671..3003ab496f854 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -91,4 +91,9 @@ int xfs_parent_replacename(struct xfs_trans *tp,
 		struct xfs_inode *new_dp, const struct xfs_name *new_name,
 		struct xfs_inode *child);
 
+int xfs_parent_from_xattr(struct xfs_mount *mp, unsigned int attr_flags,
+		const unsigned char *name, unsigned int namelen,
+		const void *value, unsigned int valuelen,
+		xfs_ino_t *parent_ino, uint32_t *parent_gen);
+
 #endif /* __XFS_PARENT_H__ */
diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
index abeca486a2c91..833b0d7d8bea1 100644
--- a/fs/xfs/xfs_handle.c
+++ b/fs/xfs/xfs_handle.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * Copyright (c) 2022-2024 Oracle.
  * All rights reserved.
  */
 #include "xfs.h"
@@ -645,3 +646,261 @@ xfs_attrmulti_by_handle(
 	dput(dentry);
 	return error;
 }
+
+struct xfs_getparents_ctx {
+	struct xfs_attr_list_context	context;
+	struct xfs_getparents_by_handle	gph;
+
+	/* File to target */
+	struct xfs_inode		*ip;
+
+	/* Internal buffer where we format records */
+	void				*krecords;
+
+	/* Last record filled out */
+	struct xfs_getparents_rec	*lastrec;
+
+	unsigned int			count;
+};
+
+static inline unsigned int
+xfs_getparents_rec_sizeof(
+	unsigned int		namelen)
+{
+	return round_up(sizeof(struct xfs_getparents_rec) + namelen + 1,
+			sizeof(uint32_t));
+}
+
+static void
+xfs_getparents_put_listent(
+	struct xfs_attr_list_context	*context,
+	int				flags,
+	unsigned char			*name,
+	int				namelen,
+	void				*value,
+	int				valuelen)
+{
+	struct xfs_getparents_ctx	*gpx =
+		container_of(context, struct xfs_getparents_ctx, context);
+	struct xfs_inode		*ip = context->dp;
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_getparents		*gp = &gpx->gph.gph_request;
+	struct xfs_getparents_rec	*gpr = gpx->krecords + context->firstu;
+	unsigned short			reclen = xfs_getparents_rec_sizeof(namelen);
+	xfs_ino_t			ino;
+	uint32_t			gen;
+	int				ret;
+
+	ret = xfs_parent_from_xattr(mp, flags, name, namelen, value, valuelen,
+			&ino, &gen);
+	if (ret < 0) {
+		xfs_inode_mark_sick(ip, XFS_SICK_INO_PARENT);
+		context->seen_enough = -EFSCORRUPTED;
+		return;
+	}
+	if (ret != 1)
+		return;
+
+	/*
+	 * We found a parent pointer, but we've filled up the buffer.  Signal
+	 * to the caller that we did /not/ reach the end of the parent pointer
+	 * recordset.
+	 */
+	if (context->firstu > context->bufsize - reclen) {
+		context->seen_enough = 1;
+		return;
+	}
+
+	/* Format the parent pointer into our internal record buffer. */
+	gpr->gpr_reclen = reclen;
+	xfs_filehandle_init(mp, ino, gen, &gpr->gpr_parent);
+	memcpy(gpr->gpr_name, name, namelen);
+	gpr->gpr_name[namelen] = 0;
+
+	trace_xfs_getparents_put_listent(ip, gp, context, gpr);
+
+	context->firstu += reclen;
+	gpx->count++;
+	gpx->lastrec = gpr;
+}
+
+/* Expand the last record to fill the rest of the caller's buffer. */
+static inline void
+xfs_getparents_expand_lastrec(
+	struct xfs_getparents_ctx	*gpx)
+{
+	struct xfs_getparents		*gp = &gpx->gph.gph_request;
+	struct xfs_getparents_rec	*gpr = gpx->lastrec;
+
+	if (!gpx->lastrec)
+		gpr = gpx->krecords;
+
+	gpr->gpr_reclen = gp->gp_bufsize - ((void *)gpr - gpx->krecords);
+
+	trace_xfs_getparents_expand_lastrec(gpx->ip, gp, &gpx->context, gpr);
+}
+
+static inline void __user *u64_to_uptr(u64 val)
+{
+	return (void __user *)(uintptr_t)val;
+}
+
+/* Retrieve the parent pointers for a given inode. */
+STATIC int
+xfs_getparents(
+	struct xfs_getparents_ctx	*gpx)
+{
+	struct xfs_getparents		*gp = &gpx->gph.gph_request;
+	struct xfs_inode		*ip = gpx->ip;
+	struct xfs_mount		*mp = ip->i_mount;
+	size_t				bufsize;
+	int				error;
+
+	/* Check size of buffer requested by user */
+	if (gp->gp_bufsize > XFS_XATTR_LIST_MAX)
+		return -ENOMEM;
+	if (gp->gp_bufsize < xfs_getparents_rec_sizeof(1))
+		return -EINVAL;
+
+	if (gp->gp_iflags & ~XFS_GETPARENTS_IFLAGS_ALL)
+		return -EINVAL;
+	if (gp->__pad)
+		return -EINVAL;
+
+	bufsize = round_down(gp->gp_bufsize, sizeof(uint32_t));
+	gpx->krecords = kvzalloc(bufsize, GFP_KERNEL);
+	if (!gpx->krecords) {
+		bufsize = min(bufsize, PAGE_SIZE);
+		gpx->krecords = kvzalloc(bufsize, GFP_KERNEL);
+		if (!gpx->krecords)
+			return -ENOMEM;
+	}
+
+	gpx->context.dp = ip;
+	gpx->context.resynch = 1;
+	gpx->context.put_listent = xfs_getparents_put_listent;
+	gpx->context.bufsize = bufsize;
+	/* firstu is used to track the bytes filled in the buffer */
+	gpx->context.firstu = 0;
+
+	/* Copy the cursor provided by caller */
+	memcpy(&gpx->context.cursor, &gp->gp_cursor,
+			sizeof(struct xfs_attrlist_cursor));
+	gpx->count = 0;
+	gp->gp_oflags = 0;
+
+	trace_xfs_getparents_begin(ip, gp, &gpx->context.cursor);
+
+	error = xfs_attr_list(&gpx->context);
+	if (error)
+		goto out_free_buf;
+	if (gpx->context.seen_enough < 0) {
+		error = gpx->context.seen_enough;
+		goto out_free_buf;
+	}
+	xfs_getparents_expand_lastrec(gpx);
+
+	/* Update the caller with the current cursor position */
+	memcpy(&gp->gp_cursor, &gpx->context.cursor,
+			sizeof(struct xfs_attrlist_cursor));
+
+	/* Is this the root directory? */
+	if (ip->i_ino == mp->m_sb.sb_rootino)
+		gp->gp_oflags |= XFS_GETPARENTS_OFLAG_ROOT;
+
+	if (gpx->context.seen_enough == 0) {
+		/*
+		 * If we did not run out of buffer space, then we reached the
+		 * end of the pptr recordset, so set the DONE flag.
+		 */
+		gp->gp_oflags |= XFS_GETPARENTS_OFLAG_DONE;
+	} else if (gpx->count == 0) {
+		/*
+		 * If we ran out of buffer space before copying any parent
+		 * pointers at all, the caller's buffer was too short.  Tell
+		 * userspace that, erm, the message is too long.
+		 */
+		error = -EMSGSIZE;
+		goto out_free_buf;
+	}
+
+	trace_xfs_getparents_end(ip, gp, &gpx->context.cursor);
+
+	ASSERT(gpx->context.firstu <= gpx->gph.gph_request.gp_bufsize);
+
+	/* Copy the records to userspace. */
+	if (copy_to_user(u64_to_uptr(gpx->gph.gph_request.gp_buffer),
+				gpx->krecords, gpx->context.firstu))
+		error = -EFAULT;
+
+out_free_buf:
+	kvfree(gpx->krecords);
+	gpx->krecords = NULL;
+	return error;
+}
+
+/* Retrieve the parents of this file and pass them back to userspace. */
+int
+xfs_ioc_getparents(
+	struct file			*file,
+	struct xfs_getparents __user	*ureq)
+{
+	struct xfs_getparents_ctx	gpx = {
+		.ip			= XFS_I(file_inode(file)),
+	};
+	struct xfs_getparents		*kreq = &gpx.gph.gph_request;
+	struct xfs_mount		*mp = gpx.ip->i_mount;
+	int				error;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+	if (!xfs_has_parent(mp))
+		return -EOPNOTSUPP;
+	if (copy_from_user(kreq, ureq, sizeof(*kreq)))
+		return -EFAULT;
+
+	error = xfs_getparents(&gpx);
+	if (error)
+		return error;
+
+	if (copy_to_user(ureq, kreq, sizeof(*kreq)))
+		return -EFAULT;
+
+	return 0;
+}
+
+/* Retrieve the parents of this file handle and pass them back to userspace. */
+int
+xfs_ioc_getparents_by_handle(
+	struct file			*file,
+	struct xfs_getparents_by_handle __user	*ureq)
+{
+	struct xfs_getparents_ctx	gpx = { };
+	struct xfs_inode		*ip = XFS_I(file_inode(file));
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_getparents_by_handle	*kreq = &gpx.gph;
+	struct dentry			*dentry;
+	int				error;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+	if (!xfs_has_parent(mp))
+		return -EOPNOTSUPP;
+	if (copy_from_user(kreq, ureq, sizeof(*kreq)))
+		return -EFAULT;
+
+	dentry = xfs_khandle_to_dentry(file, &kreq->gph_handle);
+	if (IS_ERR(dentry))
+		return PTR_ERR(dentry);
+
+	gpx.ip = XFS_I(dentry->d_inode);
+	error = xfs_getparents(&gpx);
+	dput(dentry);
+	if (error)
+		return error;
+
+	if (copy_to_user(ureq, kreq, sizeof(*kreq)))
+		return -EFAULT;
+
+	return 0;
+}
diff --git a/fs/xfs/xfs_handle.h b/fs/xfs/xfs_handle.h
index e39eaf4689da9..6799a86d8565c 100644
--- a/fs/xfs/xfs_handle.h
+++ b/fs/xfs/xfs_handle.h
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /*
  * Copyright (c) 2000-2005 Silicon Graphics, Inc.
+ * Copyright (c) 2022-2024 Oracle.
  * All rights reserved.
  */
 #ifndef	__XFS_HANDLE_H__
@@ -25,4 +26,8 @@ int xfs_ioc_attr_list(struct xfs_inode *dp, void __user *ubuf,
 struct dentry *xfs_handle_to_dentry(struct file *parfilp, void __user *uhandle,
 		u32 hlen);
 
+int xfs_ioc_getparents(struct file *file, struct xfs_getparents __user *arg);
+int xfs_ioc_getparents_by_handle(struct file *file,
+		struct xfs_getparents_by_handle __user *arg);
+
 #endif	/* __XFS_HANDLE_H__ */
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 7b347cdd28785..c7a15b5f33aa4 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -35,6 +35,7 @@
 #include "xfs_health.h"
 #include "xfs_reflink.h"
 #include "xfs_ioctl.h"
+#include "xfs_xattr.h"
 #include "xfs_rtbitmap.h"
 #include "xfs_file.h"
 #include "xfs_exchrange.h"
@@ -1542,7 +1543,10 @@ xfs_file_ioctl(
 
 	case XFS_IOC_FSGETXATTRA:
 		return xfs_ioc_fsgetxattra(ip, arg);
-
+	case XFS_IOC_GETPARENTS:
+		return xfs_ioc_getparents(filp, arg);
+	case XFS_IOC_GETPARENTS_BY_HANDLE:
+		return xfs_ioc_getparents_by_handle(filp, arg);
 	case XFS_IOC_GETBMAP:
 	case XFS_IOC_GETBMAPA:
 	case XFS_IOC_GETBMAPX:
diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
index cf92a3bd56c79..9c7fbaae2717d 100644
--- a/fs/xfs/xfs_trace.c
+++ b/fs/xfs/xfs_trace.c
@@ -41,6 +41,7 @@
 #include "xfs_bmap.h"
 #include "xfs_exchmaps.h"
 #include "xfs_exchrange.h"
+#include "xfs_parent.h"
 
 /*
  * We include this last to have the helpers above available for the trace
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index e6cbdffb14f64..4438b62a8c562 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -87,6 +87,9 @@ struct xfs_bmap_intent;
 struct xfs_exchmaps_intent;
 struct xfs_exchmaps_req;
 struct xfs_exchrange;
+struct xfs_getparents;
+struct xfs_parent_irec;
+struct xfs_attrlist_cursor_kern;
 
 #define XFS_ATTR_FILTER_FLAGS \
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
@@ -5158,6 +5161,95 @@ TRACE_EVENT(xfs_exchmaps_delta_nextents,
 		  __entry->d_nexts1, __entry->d_nexts2)
 );
 
+DECLARE_EVENT_CLASS(xfs_getparents_rec_class,
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_getparents *ppi,
+		 const struct xfs_attr_list_context *context,
+	         const struct xfs_getparents_rec *pptr),
+	TP_ARGS(ip, ppi, context, pptr),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, firstu)
+		__field(unsigned short, reclen)
+		__field(unsigned int, bufsize)
+		__field(xfs_ino_t, parent_ino)
+		__field(unsigned int, parent_gen)
+		__string(name, pptr->gpr_name)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->firstu = context->firstu;
+		__entry->reclen = pptr->gpr_reclen;
+		__entry->bufsize = ppi->gp_bufsize;
+		__entry->parent_ino = pptr->gpr_parent.ha_fid.fid_ino;
+		__entry->parent_gen = pptr->gpr_parent.ha_fid.fid_gen;
+		__assign_str(name, pptr->gpr_name);
+	),
+	TP_printk("dev %d:%d ino 0x%llx firstu %u reclen %u bufsize %u parent_ino 0x%llx parent_gen 0x%x name '%s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->firstu,
+		  __entry->reclen,
+		  __entry->bufsize,
+		  __entry->parent_ino,
+		  __entry->parent_gen,
+		  __get_str(name))
+)
+#define DEFINE_XFS_GETPARENTS_REC_EVENT(name) \
+DEFINE_EVENT(xfs_getparents_rec_class, name, \
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_getparents *ppi, \
+		 const struct xfs_attr_list_context *context, \
+	         const struct xfs_getparents_rec *pptr), \
+	TP_ARGS(ip, ppi, context, pptr))
+DEFINE_XFS_GETPARENTS_REC_EVENT(xfs_getparents_put_listent);
+DEFINE_XFS_GETPARENTS_REC_EVENT(xfs_getparents_expand_lastrec);
+
+DECLARE_EVENT_CLASS(xfs_getparents_class,
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_getparents *ppi,
+		 const struct xfs_attrlist_cursor_kern *cur),
+	TP_ARGS(ip, ppi, cur),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned short, iflags)
+		__field(unsigned short, oflags)
+		__field(unsigned int, bufsize)
+		__field(unsigned int, hashval)
+		__field(unsigned int, blkno)
+		__field(unsigned int, offset)
+		__field(int, initted)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->iflags = ppi->gp_iflags;
+		__entry->oflags = ppi->gp_oflags;
+		__entry->bufsize = ppi->gp_bufsize;
+		__entry->hashval = cur->hashval;
+		__entry->blkno = cur->blkno;
+		__entry->offset = cur->offset;
+		__entry->initted = cur->initted;
+	),
+	TP_printk("dev %d:%d ino 0x%llx iflags 0x%x oflags 0x%x bufsize %u cur_init? %d hashval 0x%x blkno %u offset %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->iflags,
+		  __entry->oflags,
+		  __entry->bufsize,
+		  __entry->initted,
+		  __entry->hashval,
+		  __entry->blkno,
+		  __entry->offset)
+)
+#define DEFINE_XFS_GETPARENTS_EVENT(name) \
+DEFINE_EVENT(xfs_getparents_class, name, \
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_getparents *ppi, \
+		 const struct xfs_attrlist_cursor_kern *cur), \
+	TP_ARGS(ip, ppi, cur))
+DEFINE_XFS_GETPARENTS_EVENT(xfs_getparents_begin);
+DEFINE_XFS_GETPARENTS_EVENT(xfs_getparents_end);
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH



* [PATCH 28/32] xfs: don't remove the attr fork when parent pointers are enabled
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (26 preceding siblings ...)
  2024-04-10  1:00   ` [PATCH 27/32] xfs: Add parent pointer ioctls Darrick J. Wong
@ 2024-04-10  1:00   ` Darrick J. Wong
  2024-04-10  6:04     ` Christoph Hellwig
  2024-04-10  1:01   ` [PATCH 29/32] xfs: Add the parent pointer support to the superblock version 5 Darrick J. Wong
                     ` (3 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:00 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

When an inode is removed, removing its last attribute may also cause the
attribute fork to be removed.  This transaction gets flushed to the log,
but if the system goes down before we can inactivate the symlink, log
recovery tries to inactivate this inode (since it is on the unlinked
list), and the verifier trips over the remote value and leaks it.

Hence we ended up with a file in this odd state on a "clean" mount.  The
"obvious" fix is to prohibit erasure of the attr fork to avoid tripping
over the verifiers when pptrs are enabled.
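
As a rough illustration of the condition this patch changes, here is a
standalone sketch.  It is not the kernel code: struct sketch_sf_state and
sketch_should_remove_attr_fork() are hypothetical names, and each field
merely stands in for one of the tests made in xfs_attr_sf_removename().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state that the removal path consults. */
struct sketch_sf_state {
	size_t	totsize;		/* bytes left in the shortform attr fork */
	size_t	hdr_size;		/* size of an empty shortform header */
	bool	has_attr2;		/* attr2 feature enabled */
	bool	data_fork_btree;	/* data fork is in btree format */
	bool	adding_or_replacing;	/* ADDNAME or REPLACE op in progress */
	bool	has_parent;		/* parent pointers enabled */
};

/*
 * Return true if the now-empty attr fork may be torn down.  With parent
 * pointers enabled, the fork is always kept.
 */
static bool
sketch_should_remove_attr_fork(
	const struct sketch_sf_state	*s)
{
	assert(s != NULL);

	return s->totsize == s->hdr_size &&
	       s->has_attr2 &&
	       !s->data_fork_btree &&
	       !s->adding_or_replacing &&
	       !s->has_parent;
}
```

With every other condition satisfied, setting has_parent flips the result
from true to false, which is the whole effect of the two hunks below.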

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr_leaf.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index 7d74ade47d8f1..6eacf3cb7ca0b 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -887,7 +887,8 @@ xfs_attr_sf_removename(
 	 */
 	if (totsize == sizeof(struct xfs_attr_sf_hdr) && xfs_has_attr2(mp) &&
 	    (dp->i_df.if_format != XFS_DINODE_FMT_BTREE) &&
-	    !(args->op_flags & (XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE))) {
+	    !(args->op_flags & (XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE)) &&
+	    !xfs_has_parent(mp)) {
 		xfs_attr_fork_remove(dp, args->trans);
 	} else {
 		xfs_idata_realloc(dp, -size, XFS_ATTR_FORK);
@@ -896,7 +897,8 @@ xfs_attr_sf_removename(
 		ASSERT(totsize > sizeof(struct xfs_attr_sf_hdr) ||
 				(args->op_flags & XFS_DA_OP_ADDNAME) ||
 				!xfs_has_attr2(mp) ||
-				dp->i_df.if_format == XFS_DINODE_FMT_BTREE);
+				dp->i_df.if_format == XFS_DINODE_FMT_BTREE ||
+				xfs_has_parent(mp));
 		xfs_trans_log_inode(args->trans, dp,
 					XFS_ILOG_CORE | XFS_ILOG_ADATA);
 	}



* [PATCH 29/32] xfs: Add the parent pointer support to the superblock version 5.
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (27 preceding siblings ...)
  2024-04-10  1:00   ` [PATCH 28/32] xfs: don't remove the attr fork when parent pointers are enabled Darrick J. Wong
@ 2024-04-10  1:01   ` Darrick J. Wong
  2024-04-10  6:05     ` Christoph Hellwig
  2024-04-10  1:01   ` [PATCH 30/32] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
                     ` (2 subsequent siblings)
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:01 UTC (permalink / raw)
  To: djwong
  Cc: Mark Tinguely, Dave Chinner, Allison Henderson, Darrick J. Wong,
	catherine.hoang, hch, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Add the parent pointer superblock flag so that we can actually mount
filesystems with this feature enabled.
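
For readers following along, the flow from the ondisk bit to the geometry
flag can be sketched as below.  The two flag values are copied from this
patch; everything prefixed with sketch_/SKETCH_ is a hypothetical
stand-in (the real XFS_FEAT_PARENT value and struct xfs_sb are not shown
here), so treat it as an illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values copied from the patch. */
#define XFS_SB_FEAT_INCOMPAT_PARENT	(1 << 7)
#define XFS_FSOP_GEOM_FLAGS_PARENT	(1U << 30)

/* Hypothetical in-memory feature bit; the real value is not shown here. */
#define SKETCH_FEAT_PARENT		(1ULL << 0)

/* Hypothetical, heavily trimmed superblock. */
struct sketch_sb {
	uint32_t	sb_features_incompat;
};

/* Translate the ondisk incompat bit into an in-memory feature bit. */
static uint64_t
sketch_sb_to_features(
	const struct sketch_sb	*sbp)
{
	uint64_t		features = 0;

	assert(sbp != NULL);
	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_PARENT)
		features |= SKETCH_FEAT_PARENT;
	return features;
}

/* Report the feature to userspace through the geometry flags. */
static uint32_t
sketch_geometry_flags(
	uint64_t		features)
{
	uint32_t		flags = 0;

	if (features & SKETCH_FEAT_PARENT)
		flags |= XFS_FSOP_GEOM_FLAGS_PARENT;
	return flags;
}
```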

Signed-off-by: Mark Tinguely <mark.tinguely@oracle.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_format.h |    1 +
 fs/xfs/libxfs/xfs_fs.h     |    2 ++
 fs/xfs/libxfs/xfs_sb.c     |    4 ++++
 fs/xfs/xfs_super.c         |    4 ++++
 4 files changed, 11 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index f1818c54af6f8..b457e457e1f71 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -374,6 +374,7 @@ xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4) /* needs xfs_repair */
 #define XFS_SB_FEAT_INCOMPAT_NREXT64	(1 << 5)  /* large extent counters */
 #define XFS_SB_FEAT_INCOMPAT_EXCHRANGE	(1 << 6)  /* exchangerange supported */
+#define XFS_SB_FEAT_INCOMPAT_PARENT	(1 << 7)  /* parent pointers */
 #define XFS_SB_FEAT_INCOMPAT_ALL \
 		(XFS_SB_FEAT_INCOMPAT_FTYPE | \
 		 XFS_SB_FEAT_INCOMPAT_SPINODES | \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index fa28c18e521bf..90e1d0cc04e4b 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -241,6 +241,8 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_NREXT64	(1 << 23) /* large extent counters */
 #define XFS_FSOP_GEOM_FLAGS_EXCHANGE_RANGE (1 << 24) /* exchange range */
 
+#define XFS_FSOP_GEOM_FLAGS_PARENT	(1U << 30) /* parent pointers */
+
 /*
  * Minimum and maximum sizes need for growth checks.
  *
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index c350e259b6855..09e4bf949bf88 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -178,6 +178,8 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_NREXT64;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_EXCHRANGE)
 		features |= XFS_FEAT_EXCHANGE_RANGE;
+	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_PARENT)
+		features |= XFS_FEAT_PARENT;
 
 	return features;
 }
@@ -1254,6 +1256,8 @@ xfs_fs_geometry(
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_BIGTIME;
 	if (xfs_has_inobtcounts(mp))
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_INOBTCNT;
+	if (xfs_has_parent(mp))
+		geo->flags |= XFS_FSOP_GEOM_FLAGS_PARENT;
 	if (xfs_has_sector(mp)) {
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_SECTOR;
 		geo->logsectsize = sbp->sb_logsectsize;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 84f37e8474da2..14a7f74b20dbb 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1733,6 +1733,10 @@ xfs_fs_fill_super(
 		xfs_warn(mp,
 	"EXPERIMENTAL exchange-range feature enabled. Use at your own risk!");
 
+	if (xfs_has_parent(mp))
+		xfs_warn(mp,
+	"EXPERIMENTAL parent pointer feature enabled. Use at your own risk!");
+
 	error = xfs_mountfs(mp);
 	if (error)
 		goto out_filestream_unmount;



* [PATCH 30/32] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (28 preceding siblings ...)
  2024-04-10  1:01   ` [PATCH 29/32] xfs: Add the parent pointer support to the superblock version 5 Darrick J. Wong
@ 2024-04-10  1:01   ` Darrick J. Wong
  2024-04-10  6:05     ` Christoph Hellwig
  2024-04-10  1:01   ` [PATCH 31/32] xfs: drop compatibility minimum log size computations for reflink Darrick J. Wong
  2024-04-10  1:01   ` [PATCH 32/32] xfs: enable parent pointers Darrick J. Wong
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:01 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Dave and I were discussing some recent test regressions as a result of
me turning on nrext64=1 on realtime filesystems, when we noticed that
the minimum log size of a 32M filesystem jumped from 954 blocks to 4287
blocks.

Digging through xfs_log_calc_max_attrsetm_res, Dave noticed that @size
contains the maximum estimated amount of space needed for a local format
xattr, in bytes, but we feed this quantity to XFS_NEXTENTADD_SPACE_RES,
which requires units of blocks.  This has resulted in an overestimation
of the minimum log size over the years.

We should nominally correct this, but there's a backwards compatibility
problem -- if we enable the fix now, the minimum log size will decrease.
If a corrected mkfs formats a filesystem with this new smaller log size,
a user will encounter mount failures on an uncorrected kernel due to the
larger minimum log size computations there.

Therefore, only turn the fix on for filesystems with parent pointers,
because that feature hadn't been merged upstream at all when this issue
was discovered.
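
To make the unit mix-up concrete, here is a small standalone sketch.  It
is a simplified model, not the kernel macros: sketch_space_res() is a
hypothetical stand-in for a helper that expects a block count, and the
constants 100 and 4 are made-up numbers chosen only to keep the
arithmetic easy to follow.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to whole filesystem blocks. */
static uint64_t
sketch_bytes_to_blocks(
	uint64_t	bytes,
	uint32_t	blocksize)
{
	assert(blocksize > 0);
	return (bytes + blocksize - 1) / blocksize;
}

/*
 * Hypothetical reservation helper that expects @nblocks in units of
 * blocks: each group of @per_group blocks costs @cost blocks.
 */
static uint64_t
sketch_space_res(
	uint64_t	nblocks,
	uint64_t	per_group,
	uint64_t	cost)
{
	assert(per_group > 0);
	return ((nblocks + per_group - 1) / per_group) * cost;
}

/* Buggy form: a byte count is passed where blocks are expected. */
static uint64_t
sketch_res_buggy(
	uint64_t	size_bytes)
{
	return sketch_space_res(size_bytes, 100, 4);
}

/* Fixed form: convert bytes to blocks before asking for the reservation. */
static uint64_t
sketch_res_fixed(
	uint64_t	size_bytes,
	uint32_t	blocksize)
{
	return sketch_space_res(sketch_bytes_to_blocks(size_bytes, blocksize),
			100, 4);
}
```

With a 1000 byte attr and 4096 byte blocks, the buggy form asks for 40
blocks in this model while the fixed form asks for 4.  The real numbers
differ, but the direction of the error (an overestimate) is the same.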

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_log_rlimit.c |   32 ++++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_log_rlimit.c b/fs/xfs/libxfs/xfs_log_rlimit.c
index 9975b93a7412d..3518d5e21df03 100644
--- a/fs/xfs/libxfs/xfs_log_rlimit.c
+++ b/fs/xfs/libxfs/xfs_log_rlimit.c
@@ -16,6 +16,29 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_trace.h"
 
+/*
+ * Shortly after enabling the large extents count feature in 2023, longstanding
+ * bugs were found in the code that computes the minimum log size.  Luckily,
+ * the bugs resulted in over-estimates of that size, so there's no impact to
+ * existing users.  However, we don't want to reduce the minimum log size
+ * because that can create the situation where a newer mkfs writes a new
+ * filesystem that an older kernel won't mount.
+ *
+ * Therefore, we only may correct the computation starting with filesystem
+ * features that didn't exist in 2023.  In other words, only turn this on if
+ * the filesystem has parent pointers.
+ *
+ * This function can be called before the XFS_HAS_* flags have been set up,
+ * (e.g. mkfs) so we must check the ondisk superblock.
+ */
+static inline bool
+xfs_want_minlogsize_fixes(
+	struct xfs_sb	*sb)
+{
+	return xfs_sb_is_v5(sb) &&
+	       xfs_sb_has_incompat_feature(sb, XFS_SB_FEAT_INCOMPAT_PARENT);
+}
+
 /*
  * Calculate the maximum length in bytes that would be required for a local
  * attribute value as large attributes out of line are not logged.
@@ -31,6 +54,15 @@ xfs_log_calc_max_attrsetm_res(
 	       MAXNAMELEN - 1;
 	nblks = XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK);
 	nblks += XFS_B_TO_FSB(mp, size);
+
+	/*
+	 * If the feature set is new enough, correct a unit conversion error in
+	 * the xattr transaction reservation code that resulted in oversized
+	 * minimum log size computations.
+	 */
+	if (xfs_want_minlogsize_fixes(&mp->m_sb))
+		size = XFS_B_TO_FSB(mp, size);
+
 	nblks += XFS_NEXTENTADD_SPACE_RES(mp, size, XFS_ATTR_FORK);
 
 	return  M_RES(mp)->tr_attrsetm.tr_logres +



* [PATCH 31/32] xfs: drop compatibility minimum log size computations for reflink
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (29 preceding siblings ...)
  2024-04-10  1:01   ` [PATCH 30/32] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
@ 2024-04-10  1:01   ` Darrick J. Wong
  2024-04-10  6:06     ` Christoph Hellwig
  2024-04-10  1:01   ` [PATCH 32/32] xfs: enable parent pointers Darrick J. Wong
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:01 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Let's also drop the oversized minimum log computations for reflink and
rmap that were the result of bugs introduced many years ago.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_log_rlimit.c |   14 ++++++++++++++
 1 file changed, 14 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_log_rlimit.c b/fs/xfs/libxfs/xfs_log_rlimit.c
index 3518d5e21df03..d3bd6a86c8fe9 100644
--- a/fs/xfs/libxfs/xfs_log_rlimit.c
+++ b/fs/xfs/libxfs/xfs_log_rlimit.c
@@ -24,6 +24,11 @@
  * because that can create the situation where a newer mkfs writes a new
  * filesystem that an older kernel won't mount.
  *
+ * Several years prior, we also discovered that the transaction reservations
+ * for rmap and reflink operations were unnecessarily large.  That was fixed,
+ * but the minimum log size computation was left alone to avoid the
+ * compatibility problems noted above.  Fix that too.
+ *
  * Therefore, we only may correct the computation starting with filesystem
  * features that didn't exist in 2023.  In other words, only turn this on if
  * the filesystem has parent pointers.
@@ -80,6 +85,15 @@ xfs_log_calc_trans_resv_for_minlogblocks(
 {
 	unsigned int		rmap_maxlevels = mp->m_rmap_maxlevels;
 
+	/*
+	 * If the feature set is new enough, drop the oversized minimum log
+	 * size computation introduced by the original reflink code.
+	 */
+	if (xfs_want_minlogsize_fixes(&mp->m_sb)) {
+		xfs_trans_resv_calc(mp, resv);
+		return;
+	}
+
 	/*
 	 * In the early days of rmap+reflink, we always set the rmap maxlevels
 	 * to 9 even if the AG was small enough that it would never grow to



* [PATCH 32/32] xfs: enable parent pointers
  2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
                     ` (30 preceding siblings ...)
  2024-04-10  1:01   ` [PATCH 31/32] xfs: drop compatibility minimum log size computations for reflink Darrick J. Wong
@ 2024-04-10  1:01   ` Darrick J. Wong
  2024-04-10  6:06     ` Christoph Hellwig
  31 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:01 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add parent pointers to the list of supported features.
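
The effect of adding the bit to the supported mask can be sketched as
follows.  Only the four incompat bits whose values appear in this series
are modelled; the SKETCH_ masks and sketch_has_unknown_incompat() are
hypothetical simplifications of XFS_SB_FEAT_INCOMPAT_ALL and the
unknown-feature check, not the kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit values as they appear in xfs_format.h in this series. */
#define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR	(1 << 4)
#define XFS_SB_FEAT_INCOMPAT_NREXT64		(1 << 5)
#define XFS_SB_FEAT_INCOMPAT_EXCHRANGE		(1 << 6)
#define XFS_SB_FEAT_INCOMPAT_PARENT		(1 << 7)

/* Hypothetical, trimmed "supported" masks before and after this patch. */
#define SKETCH_INCOMPAT_BEFORE \
		(XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR | \
		 XFS_SB_FEAT_INCOMPAT_NREXT64 | \
		 XFS_SB_FEAT_INCOMPAT_EXCHRANGE)
#define SKETCH_INCOMPAT_AFTER \
		(SKETCH_INCOMPAT_BEFORE | XFS_SB_FEAT_INCOMPAT_PARENT)

/* Hypothetical, heavily trimmed superblock. */
struct sketch_sb {
	uint32_t	sb_features_incompat;
};

/* True if the superblock sets an incompat bit outside @supported. */
static bool
sketch_has_unknown_incompat(
	const struct sketch_sb	*sbp,
	uint32_t		supported)
{
	assert(sbp != NULL);
	return (sbp->sb_features_incompat & ~supported) != 0;
}
```

A superblock with the PARENT bit set trips the check against the "before"
mask and passes against the "after" mask.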

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_format.h |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index b457e457e1f71..61f51becff4f7 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -382,7 +382,8 @@ xfs_sb_has_ro_compat_feature(
 		 XFS_SB_FEAT_INCOMPAT_BIGTIME | \
 		 XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR | \
 		 XFS_SB_FEAT_INCOMPAT_NREXT64 | \
-		 XFS_SB_FEAT_INCOMPAT_EXCHRANGE)
+		 XFS_SB_FEAT_INCOMPAT_EXCHRANGE | \
+		 XFS_SB_FEAT_INCOMPAT_PARENT)
 
 #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
 static inline bool



* [PATCH 1/7] xfs: check dirents have parent pointers
  2024-04-10  0:45 ` [PATCHSET v13.1 6/9] xfs: scrubbing for " Darrick J. Wong
@ 2024-04-10  1:02   ` Darrick J. Wong
  2024-04-10  6:12     ` Christoph Hellwig
  2024-04-10  1:02   ` [PATCH 2/7] xfs: deferred scrub of dirents Darrick J. Wong
                     ` (5 subsequent siblings)
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:02 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If the fs has parent pointers, we need to check that each child dirent
points to a file that has a parent pointer pointing back at us.
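
The invariant being checked can be shown with a toy model.  None of the
toy_ names below exist in XFS; they only illustrate "the child must carry
a pointer (dirent name -> parent inode and generation) back to the
directory being scrubbed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model of one parent pointer stored on a child file. */
struct toy_pptr {
	uint64_t		parent_ino;
	uint32_t		parent_gen;
	const char		*name;	/* dirent name in the parent */
};

/* Hypothetical toy model of a child file and its parent pointers. */
struct toy_child {
	uint64_t		ino;
	const struct toy_pptr	*pptrs;
	size_t			nr_pptrs;
};

/* Does @child carry a parent pointer matching (name, ino, gen)? */
static bool
toy_child_has_pptr(
	const struct toy_child	*child,
	uint64_t		parent_ino,
	uint32_t		parent_gen,
	const char		*name)
{
	size_t			i;

	assert(child != NULL && name != NULL);

	for (i = 0; i < child->nr_pptrs; i++) {
		const struct toy_pptr	*p = &child->pptrs[i];

		if (p->parent_ino == parent_ino &&
		    p->parent_gen == parent_gen &&
		    strcmp(p->name, name) == 0)
			return true;
	}
	return false;
}
```

A dirent for which this lookup fails corresponds to the -ENOATTR case in
xchk_dir_parent_pointer() below, which flags a cross-referencing
corruption.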

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_parent.c |   22 +++++++++
 fs/xfs/libxfs/xfs_parent.h |    5 ++
 fs/xfs/scrub/dir.c         |  112 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 138 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index db8cfad0b968e..5898dc1ebff02 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -294,3 +294,25 @@ xfs_parent_from_xattr(
 		*parent_gen = be32_to_cpu(rec->p_gen);
 	return 1;
 }
+
+/*
+ * Look up a parent pointer record (@parent_name -> @pptr) of @ip.
+ *
+ * Caller must hold at least ILOCK_SHARED.  The scratchpad need not be
+ * initialized.
+ *
+ * Returns 0 if the pointer is found, -ENOATTR if there is no match, or a
+ * negative errno.
+ */
+int
+xfs_parent_lookup(
+	struct xfs_trans		*tp,
+	struct xfs_inode		*ip,
+	const struct xfs_name		*parent_name,
+	struct xfs_parent_rec		*pptr,
+	struct xfs_da_args		*scratch)
+{
+	memset(scratch, 0, sizeof(struct xfs_da_args));
+	xfs_parent_da_args_init(scratch, tp, pptr, ip, ip->i_ino, parent_name);
+	return xfs_attr_get_ilocked(scratch);
+}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 3003ab496f854..b063312a61acb 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -96,4 +96,9 @@ int xfs_parent_from_xattr(struct xfs_mount *mp, unsigned int attr_flags,
 		const void *value, unsigned int valuelen,
 		xfs_ino_t *parent_ino, uint32_t *parent_gen);
 
+/* Repair functions */
+int xfs_parent_lookup(struct xfs_trans *tp, struct xfs_inode *ip,
+		const struct xfs_name *name, struct xfs_parent_rec *pptr,
+		struct xfs_da_args *scratch);
+
 #endif /* __XFS_PARENT_H__ */
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 3fe6ffcf9c062..e11d73eb89352 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -16,6 +16,8 @@
 #include "xfs_dir2.h"
 #include "xfs_dir2_priv.h"
 #include "xfs_health.h"
+#include "xfs_attr.h"
+#include "xfs_parent.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/dabtree.h"
@@ -41,6 +43,14 @@ xchk_setup_directory(
 
 /* Directories */
 
+struct xchk_dir {
+	struct xfs_scrub	*sc;
+
+	/* information for parent pointer validation. */
+	struct xfs_parent_rec	pptr_rec;
+	struct xfs_da_args	pptr_args;
+};
+
 /* Scrub a directory entry. */
 
 /* Check that an inode's mode matches a given XFS_DIR3_FT_* type. */
@@ -63,6 +73,90 @@ xchk_dir_check_ftype(
 		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
 }
 
+/*
+ * Try to lock a child file for checking parent pointers.  Returns the inode
+ * flags for the locks we now hold, or zero if we failed.
+ */
+STATIC unsigned int
+xchk_dir_lock_child(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip)
+{
+	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_SHARED))
+		return 0;
+
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED)) {
+		xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+		return 0;
+	}
+
+	if (!xfs_inode_has_attr_fork(ip) || !xfs_need_iread_extents(&ip->i_af))
+		return XFS_IOLOCK_SHARED | XFS_ILOCK_SHARED;
+
+	xfs_iunlock(ip, XFS_ILOCK_SHARED);
+
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) {
+		xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+		return 0;
+	}
+
+	return XFS_IOLOCK_SHARED | XFS_ILOCK_EXCL;
+}
+
+/* Check the backwards link (parent pointer) associated with this dirent. */
+STATIC int
+xchk_dir_parent_pointer(
+	struct xchk_dir		*sd,
+	const struct xfs_name	*name,
+	struct xfs_inode	*ip)
+{
+	struct xfs_scrub	*sc = sd->sc;
+	int			error;
+
+	xfs_inode_to_parent_rec(&sd->pptr_rec, sc->ip);
+	error = xfs_parent_lookup(sc->tp, ip, name, &sd->pptr_rec,
+			&sd->pptr_args);
+	if (error == -ENOATTR)
+		xchk_fblock_xref_set_corrupt(sc, XFS_DATA_FORK, 0);
+
+	return 0;
+}
+
+/* Look for a parent pointer matching this dirent, if the child isn't busy. */
+STATIC int
+xchk_dir_check_pptr_fast(
+	struct xchk_dir		*sd,
+	xfs_dir2_dataptr_t	dapos,
+	const struct xfs_name	*name,
+	struct xfs_inode	*ip)
+{
+	struct xfs_scrub	*sc = sd->sc;
+	unsigned int		lockmode;
+	int			error;
+
+	/* dot and dotdot entries do not have parent pointers */
+	if (xfs_dir2_samename(name, &xfs_name_dot) ||
+	    xfs_dir2_samename(name, &xfs_name_dotdot))
+		return 0;
+
+	/* No self-referential non-dot or dotdot dirents. */
+	if (ip == sc->ip) {
+		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		return -ECANCELED;
+	}
+
+	/* Try to lock the inode. */
+	lockmode = xchk_dir_lock_child(sc, ip);
+	if (!lockmode) {
+		xchk_set_incomplete(sc);
+		return -ECANCELED;
+	}
+
+	error = xchk_dir_parent_pointer(sd, name, ip);
+	xfs_iunlock(ip, lockmode);
+	return error;
+}
+
 /*
  * Scrub a single directory entry.
  *
@@ -80,6 +174,7 @@ xchk_dir_actor(
 {
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_inode	*ip;
+	struct xchk_dir		*sd = priv;
 	xfs_ino_t		lookup_ino;
 	xfs_dablk_t		offset;
 	int			error = 0;
@@ -146,6 +241,14 @@ xchk_dir_actor(
 		goto out;
 
 	xchk_dir_check_ftype(sc, offset, ip, name->type);
+
+	if (xfs_has_parent(mp)) {
+		error = xchk_dir_check_pptr_fast(sd, dapos, name, ip);
+		if (error)
+			goto out_rele;
+	}
+
+out_rele:
 	xchk_irele(sc, ip);
 out:
 	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
@@ -767,6 +870,7 @@ int
 xchk_directory(
 	struct xfs_scrub	*sc)
 {
+	struct xchk_dir		*sd;
 	int			error;
 
 	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
@@ -799,8 +903,14 @@ xchk_directory(
 	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
 		return 0;
 
+	sd = kvzalloc(sizeof(struct xchk_dir), XCHK_GFP_FLAGS);
+	if (!sd)
+		return -ENOMEM;
+	sd->sc = sc;
+
 	/* Look up every name in this directory by hash. */
-	error = xchk_dir_walk(sc, sc->ip, xchk_dir_actor, NULL);
+	error = xchk_dir_walk(sc, sc->ip, xchk_dir_actor, sd);
+	kvfree(sd);
 	if (error && error != -ECANCELED)
 		return error;
 



* [PATCH 2/7] xfs: deferred scrub of dirents
  2024-04-10  0:45 ` [PATCHSET v13.1 6/9] xfs: scrubbing for " Darrick J. Wong
  2024-04-10  1:02   ` [PATCH 1/7] xfs: check dirents have " Darrick J. Wong
@ 2024-04-10  1:02   ` Darrick J. Wong
  2024-04-10  6:13     ` Christoph Hellwig
  2024-04-10  1:02   ` [PATCH 3/7] xfs: scrub parent pointers Darrick J. Wong
                     ` (4 subsequent siblings)
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:02 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If the trylock-based parent pointer check fails, retain those dirents
and check them at the end.  This may involve dropping the locks on the
directory being scanned, in which case the deferred dirents must be
revalidated before we check them.
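
The overall shape of the two-pass scheme is roughly the following.  This
is a toy model with hypothetical names (toy_dirent, toy_scan and friends);
it uses a fixed-size array instead of xfarray/xfblob, pretends the slow
path always gets its lock eventually, and leaves out revalidation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEFERRED	16

/* Hypothetical toy dirent. */
struct toy_dirent {
	int			id;
	bool			locked_by_other; /* trylock would fail now */
	bool			checked;
};

/* Hypothetical scan state holding the dirents we could not check yet. */
struct toy_scan {
	struct toy_dirent	*deferred[TOY_MAX_DEFERRED];
	size_t			nr_deferred;
};

/*
 * Fast path: check the dirent now if the trylock succeeds, otherwise
 * remember it for later.  Returns false only if the deferral list is full.
 */
static bool
toy_check_fast(
	struct toy_scan		*scan,
	struct toy_dirent	*de)
{
	assert(scan != NULL && de != NULL);

	if (!de->locked_by_other) {
		de->checked = true;
		return true;
	}
	if (scan->nr_deferred == TOY_MAX_DEFERRED)
		return false;
	scan->deferred[scan->nr_deferred++] = de;
	return true;
}

/* Slow path: revisit every deferred dirent.  Returns how many were checked. */
static size_t
toy_finish_deferred(
	struct toy_scan		*scan)
{
	size_t			i, done = 0;

	assert(scan != NULL);

	for (i = 0; i < scan->nr_deferred; i++) {
		struct toy_dirent	*de = scan->deferred[i];

		de->locked_by_other = false;	/* lock eventually obtained */
		de->checked = true;
		done++;
	}
	scan->nr_deferred = 0;
	return done;
}
```

The point of deferring is that the main walk never blocks on a contended
child; only the leftovers pay for the lock cycling.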

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/dir.c     |  234 +++++++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/scrub/readdir.c |   78 ++++++++++++++++
 fs/xfs/scrub/readdir.h |    3 +
 fs/xfs/scrub/trace.h   |   34 +++++++
 4 files changed, 346 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index e11d73eb89352..62474d0557c41 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -24,6 +24,10 @@
 #include "scrub/readdir.h"
 #include "scrub/health.h"
 #include "scrub/repair.h"
+#include "scrub/trace.h"
+#include "scrub/xfile.h"
+#include "scrub/xfarray.h"
+#include "scrub/xfblob.h"
 
 /* Set us up to scrub directories. */
 int
@@ -43,12 +47,37 @@ xchk_setup_directory(
 
 /* Directories */
 
+/* Deferred directory entry that we saved for later. */
+struct xchk_dirent {
+	/* Cookie for retrieval of the dirent name. */
+	xfblob_cookie		name_cookie;
+
+	/* Child inode number. */
+	xfs_ino_t		ino;
+
+	/* Length of the pptr name. */
+	uint8_t			namelen;
+};
+
 struct xchk_dir {
 	struct xfs_scrub	*sc;
 
 	/* information for parent pointer validation. */
 	struct xfs_parent_rec	pptr_rec;
 	struct xfs_da_args	pptr_args;
+
+	/* Fixed-size array of xchk_dirent structures. */
+	struct xfarray		*dir_entries;
+
+	/* Blobs containing dirent names. */
+	struct xfblob		*dir_names;
+
+	/* If we've cycled the ILOCK, we must revalidate deferred dirents. */
+	bool			need_revalidate;
+
+	/* Name buffer for dirent revalidation. */
+	struct xfs_name		xname;
+	uint8_t			namebuf[MAXNAMELEN];
 };
 
 /* Scrub a directory entry. */
@@ -148,8 +177,26 @@ xchk_dir_check_pptr_fast(
 	/* Try to lock the inode. */
 	lockmode = xchk_dir_lock_child(sc, ip);
 	if (!lockmode) {
-		xchk_set_incomplete(sc);
-		return -ECANCELED;
+		struct xchk_dirent	save_de = {
+			.namelen	= name->len,
+			.ino		= ip->i_ino,
+		};
+
+		/* Couldn't lock the inode, so save the dirent for later. */
+		trace_xchk_dir_defer(sc->ip, name, ip->i_ino);
+
+		error = xfblob_storename(sd->dir_names, &save_de.name_cookie,
+				name);
+		if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0,
+					&error))
+			return error;
+
+		error = xfarray_append(sd->dir_entries, &save_de);
+		if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0,
+					&error))
+			return error;
+
+		return 0;
 	}
 
 	error = xchk_dir_parent_pointer(sd, name, ip);
@@ -865,6 +912,142 @@ xchk_directory_blocks(
 	return error;
 }
 
+/*
+ * Revalidate a dirent that we collected in the past but couldn't check because
+ * of lock contention.  Returns 0 if the dirent is still valid, -ENOENT if it
+ * has gone away on us, or a negative errno.
+ */
+STATIC int
+xchk_dir_revalidate_dirent(
+	struct xchk_dir		*sd,
+	const struct xfs_name	*xname,
+	xfs_ino_t		ino)
+{
+	struct xfs_scrub	*sc = sd->sc;
+	xfs_ino_t		child_ino;
+	int			error;
+
+	/*
+	 * Look up the directory entry.  If we get -ENOENT, the directory entry
+	 * went away and there's nothing to revalidate.  Return any other
+	 * error.
+	 */
+	error = xchk_dir_lookup(sc, sc->ip, xname, &child_ino);
+	if (error)
+		return error;
+
+	/* The inode number changed, nothing to revalidate. */
+	if (ino != child_ino)
+		return -ENOENT;
+
+	return 0;
+}
+
+/*
+ * Check a directory entry's parent pointers the slow way, which means we cycle
+ * locks a bunch and put up with revalidation until we get it done.
+ */
+STATIC int
+xchk_dir_slow_dirent(
+	struct xchk_dir		*sd,
+	struct xchk_dirent	*dirent,
+	const struct xfs_name	*xname)
+{
+	struct xfs_scrub	*sc = sd->sc;
+	struct xfs_inode	*ip;
+	unsigned int		lockmode;
+	int			error;
+
+	/* Check that the deferred dirent still exists. */
+	if (sd->need_revalidate) {
+		error = xchk_dir_revalidate_dirent(sd, xname, dirent->ino);
+		if (error == -ENOENT)
+			return 0;
+		if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0,
+					&error))
+			return error;
+	}
+
+	error = xchk_iget(sc, dirent->ino, &ip);
+	if (error == -EINVAL || error == -ENOENT) {
+		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		return 0;
+	}
+	if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0, &error))
+		return error;
+
+	/*
+	 * If we can grab both IOLOCK and ILOCK of the alleged child, we can
+	 * proceed with the validation.
+	 */
+	lockmode = xchk_dir_lock_child(sc, ip);
+	if (lockmode) {
+		trace_xchk_dir_slowpath(sc->ip, xname, ip->i_ino);
+		goto check_pptr;
+	}
+
+	/*
+	 * We couldn't lock the child file.  Drop all the locks and try to
+	 * get them again, one at a time.
+	 */
+	xchk_iunlock(sc, sc->ilock_flags);
+	sd->need_revalidate = true;
+
+	trace_xchk_dir_ultraslowpath(sc->ip, xname, ip->i_ino);
+
+	error = xchk_dir_trylock_for_pptrs(sc, ip, &lockmode);
+	if (error)
+		goto out_rele;
+
+	/* Revalidate, since we just cycled the locks. */
+	error = xchk_dir_revalidate_dirent(sd, xname, dirent->ino);
+	if (error == -ENOENT) {
+		error = 0;
+		goto out_unlock;
+	}
+	if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0, &error))
+		goto out_unlock;
+
+check_pptr:
+	error = xchk_dir_parent_pointer(sd, xname, ip);
+out_unlock:
+	xfs_iunlock(ip, lockmode);
+out_rele:
+	xchk_irele(sc, ip);
+	return error;
+}
+
+/* Check all the dirents that we deferred the first time around. */
+STATIC int
+xchk_dir_finish_slow_dirents(
+	struct xchk_dir		*sd)
+{
+	xfarray_idx_t		array_cur;
+	int			error;
+
+	foreach_xfarray_idx(sd->dir_entries, array_cur) {
+		struct xchk_dirent	dirent;
+
+		if (sd->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+			return 0;
+
+		error = xfarray_load(sd->dir_entries, array_cur, &dirent);
+		if (error)
+			return error;
+
+		error = xfblob_loadname(sd->dir_names, dirent.name_cookie,
+				&sd->xname, dirent.namelen);
+		if (error)
+			return error;
+
+		error = xchk_dir_slow_dirent(sd, &dirent, &sd->xname);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
 /* Scrub a whole directory. */
 int
 xchk_directory(
@@ -907,11 +1090,56 @@ xchk_directory(
 	if (!sd)
 		return -ENOMEM;
 	sd->sc = sc;
+	sd->xname.name = sd->namebuf;
+
+	if (xfs_has_parent(sc->mp)) {
+		char		*descr;
+
+		/*
+		 * Set up some staging memory for dirents that we can't check
+		 * due to locking contention.
+		 */
+		descr = xchk_xfile_ino_descr(sc, "slow directory entries");
+		error = xfarray_create(descr, 0, sizeof(struct xchk_dirent),
+				&sd->dir_entries);
+		kfree(descr);
+		if (error)
+			goto out_sd;
+
+		descr = xchk_xfile_ino_descr(sc, "slow directory entry names");
+		error = xfblob_create(descr, &sd->dir_names);
+		kfree(descr);
+		if (error)
+			goto out_entries;
+	}
 
 	/* Look up every name in this directory by hash. */
 	error = xchk_dir_walk(sc, sc->ip, xchk_dir_actor, sd);
+	if (error == -ECANCELED)
+		error = 0;
+	if (error)
+		goto out_names;
+
+	if (xfs_has_parent(sc->mp)) {
+		error = xchk_dir_finish_slow_dirents(sd);
+		if (error == -ETIMEDOUT) {
+			/* Couldn't grab a lock, scrub was marked incomplete */
+			error = 0;
+			goto out_names;
+		}
+		if (error)
+			goto out_names;
+	}
+
+out_names:
+	if (sd->dir_names)
+		xfblob_destroy(sd->dir_names);
+out_entries:
+	if (sd->dir_entries)
+		xfarray_destroy(sd->dir_entries);
+out_sd:
 	kvfree(sd);
-	if (error && error != -ECANCELED)
+	if (error)
 		return error;
 
 	/* If the dir is clean, it is clearly not zapped. */
diff --git a/fs/xfs/scrub/readdir.c b/fs/xfs/scrub/readdir.c
index 028690761c629..28a94c78b0b19 100644
--- a/fs/xfs/scrub/readdir.c
+++ b/fs/xfs/scrub/readdir.c
@@ -18,6 +18,7 @@
 #include "xfs_trans.h"
 #include "xfs_error.h"
 #include "scrub/scrub.h"
+#include "scrub/common.h"
 #include "scrub/readdir.h"
 
 /* Call a function for every entry in a shortform directory. */
@@ -380,3 +381,80 @@ xchk_dir_lookup(
 		*ino = args.inumber;
 	return error;
 }
+
+/*
+ * Try to grab the IOLOCK and ILOCK of sc->ip and ip, returning @ip's lock
+ * state.  The caller may have a transaction, so we must use trylock for both
+ * IOLOCKs.
+ */
+static inline unsigned int
+xchk_dir_trylock_both(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip)
+{
+	if (!xchk_ilock_nowait(sc, XFS_IOLOCK_EXCL))
+		return 0;
+
+	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_SHARED))
+		goto parent_iolock;
+
+	xchk_ilock(sc, XFS_ILOCK_EXCL);
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL))
+		goto parent_ilock;
+
+	return XFS_IOLOCK_SHARED | XFS_ILOCK_EXCL;
+
+parent_ilock:
+	xchk_iunlock(sc, XFS_ILOCK_EXCL);
+	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+parent_iolock:
+	xchk_iunlock(sc, XFS_IOLOCK_EXCL);
+	return 0;
+}
+
+/*
+ * Try for a limited time to grab the IOLOCK and ILOCK of both the scrub target
+ * (@sc->ip) and the inode at the other end (@ip) of a directory or parent
+ * pointer link so that we can check that link.
+ *
+ * We do not know ahead of time that the directory tree is /not/ corrupt, so we
+ * cannot use the "lock two inode" functions because we do not know that there
+ * is not a racing thread trying to take the locks in opposite order.  First
+ * take IOLOCK_EXCL of the scrub target, and then try to take IOLOCK_SHARED
+ * of @ip to synchronize with the VFS.  Next, take ILOCK_EXCL of the scrub
+ * target and @ip to synchronize with XFS.
+ *
+ * If the trylocks succeed, *lockmode will be set to the locks held for @ip;
+ * @sc->ilock_flags will be set for the locks held for @sc->ip; and zero will
+ * be returned.  If not, returns -EDEADLOCK to try again; or -ETIMEDOUT if
+ * XCHK_TRY_HARDER was set.  Returns -EINTR if the process has been killed.
+ */
+int
+xchk_dir_trylock_for_pptrs(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip,
+	unsigned int		*lockmode)
+{
+	unsigned int		nr;
+	int			error = 0;
+
+	ASSERT(sc->ilock_flags == 0);
+
+	for (nr = 0; nr < HZ; nr++) {
+		*lockmode = xchk_dir_trylock_both(sc, ip);
+		if (*lockmode)
+			return 0;
+
+		if (xchk_should_terminate(sc, &error))
+			return error;
+
+		delay(1);
+	}
+
+	if (sc->flags & XCHK_TRY_HARDER) {
+		xchk_set_incomplete(sc);
+		return -ETIMEDOUT;
+	}
+
+	return -EDEADLOCK;
+}
diff --git a/fs/xfs/scrub/readdir.h b/fs/xfs/scrub/readdir.h
index 55787f4df123f..da501877a64dd 100644
--- a/fs/xfs/scrub/readdir.h
+++ b/fs/xfs/scrub/readdir.h
@@ -16,4 +16,7 @@ int xchk_dir_walk(struct xfs_scrub *sc, struct xfs_inode *dp,
 int xchk_dir_lookup(struct xfs_scrub *sc, struct xfs_inode *dp,
 		const struct xfs_name *name, xfs_ino_t *ino);
 
+int xchk_dir_trylock_for_pptrs(struct xfs_scrub *sc, struct xfs_inode *ip,
+		unsigned int *lockmode);
+
 #endif /* __XFS_SCRUB_READDIR_H__ */
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 814db1d1747a0..4db762480b8d4 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -1511,6 +1511,40 @@ DEFINE_EVENT(xchk_nlinks_diff_class, name, \
 	TP_ARGS(mp, ip, live))
 DEFINE_SCRUB_NLINKS_DIFF_EVENT(xchk_nlinks_compare_inode);
 
+DECLARE_EVENT_CLASS(xchk_pptr_class,
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_name *name,
+		 xfs_ino_t far_ino),
+	TP_ARGS(ip, name, far_ino),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, name->len)
+		__field(xfs_ino_t, far_ino)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->namelen = name->len;
+		memcpy(__get_str(name), name->name, name->len);
+		__entry->far_ino = far_ino;
+	),
+	TP_printk("dev %d:%d ino 0x%llx name '%.*s' far_ino 0x%llx",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->namelen,
+		  __get_str(name),
+		  __entry->far_ino)
+)
+#define DEFINE_XCHK_PPTR_EVENT(name) \
+DEFINE_EVENT(xchk_pptr_class, name, \
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_name *name, \
+		 xfs_ino_t far_ino), \
+	TP_ARGS(ip, name, far_ino))
+DEFINE_XCHK_PPTR_EVENT(xchk_dir_defer);
+DEFINE_XCHK_PPTR_EVENT(xchk_dir_slowpath);
+DEFINE_XCHK_PPTR_EVENT(xchk_dir_ultraslowpath);
+
 /* repair tracepoints */
 #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
 



* [PATCH 3/7] xfs: scrub parent pointers
  2024-04-10  0:45 ` [PATCHSET v13.1 6/9] xfs: scrubbing for " Darrick J. Wong
  2024-04-10  1:02   ` [PATCH 1/7] xfs: check dirents have " Darrick J. Wong
  2024-04-10  1:02   ` [PATCH 2/7] xfs: deferred scrub of dirents Darrick J. Wong
@ 2024-04-10  1:02   ` Darrick J. Wong
  2024-04-10  6:13     ` Christoph Hellwig
  2024-04-10  1:02   ` [PATCH 4/7] xfs: deferred scrub of " Darrick J. Wong
                     ` (3 subsequent siblings)
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:02 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

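Check parent pointers for real now: visit each parent pointer of a file
and confirm that the directory it references has a dirent pointing
forward to that file.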

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/parent.c |  367 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 367 insertions(+)
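
The check described above can be sketched in userspace C.  This is only
a toy model under stated assumptions: the toy_* types and functions are
hypothetical stand-ins, not XFS structures, and locking, generation
numbers, and the dotdot cross-check are left out.  It shows the core
rule: every parent pointer must be backed by a dirent of the same name
in the named directory, and for a non-directory the number of parent
pointers must equal the link count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-ins; none of these are XFS types. */
struct toy_dirent {
	const char		*name;
	unsigned long long	ino;
};

struct toy_dir {
	unsigned long long	ino;
	const struct toy_dirent	*ents;
	size_t			nr_ents;
};

struct toy_pptr {
	const char		*name;		/* dirent name in the parent */
	unsigned long long	parent_ino;	/* directory holding the dirent */
};

/* Find a directory by inode number, or NULL if there is none. */
static const struct toy_dir *
toy_find_dir(const struct toy_dir *dirs, size_t nr_dirs,
		unsigned long long ino)
{
	for (size_t i = 0; i < nr_dirs; i++)
		if (dirs[i].ino == ino)
			return &dirs[i];
	return NULL;
}

/* Does @dp contain a dirent @name that points at @child_ino? */
static bool
toy_dirent_matches(const struct toy_dir *dp, const char *name,
		unsigned long long child_ino)
{
	assert(dp && name);

	for (size_t i = 0; i < dp->nr_ents; i++)
		if (!strcmp(dp->ents[i].name, name))
			return dp->ents[i].ino == child_ino;
	return false;
}

/*
 * Return true if every parent pointer of @child_ino is backed by a dirent
 * and the pointer count equals the link count (non-directory rule).
 */
static bool
toy_check_pptrs(const struct toy_dir *dirs, size_t nr_dirs,
		unsigned long long child_ino, unsigned int nlink,
		const struct toy_pptr *pptrs, size_t nr_pptrs)
{
	for (size_t i = 0; i < nr_pptrs; i++) {
		const struct toy_dir *dp;

		/* No self-referential parent pointers. */
		if (pptrs[i].parent_ino == child_ino)
			return false;

		dp = toy_find_dir(dirs, nr_dirs, pptrs[i].parent_ino);
		if (!dp || !toy_dirent_matches(dp, pptrs[i].name, child_ino))
			return false;
	}
	return nr_pptrs == nlink;
}
```

A file hardlinked as "a" and "b" in one directory passes only when it
carries exactly those two parent pointers and has a link count of two.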


diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index acb6282c3d148..966e106c1fe6d 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -15,11 +15,15 @@
 #include "xfs_icache.h"
 #include "xfs_dir2.h"
 #include "xfs_dir2_priv.h"
+#include "xfs_attr.h"
+#include "xfs_parent.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/readdir.h"
 #include "scrub/tempfile.h"
 #include "scrub/repair.h"
+#include "scrub/listxattr.h"
+#include "scrub/trace.h"
 
 /* Set us up to scrub parents. */
 int
@@ -197,6 +201,366 @@ xchk_parent_validate(
 	return error;
 }
 
+/*
+ * Checking of Parent Pointers
+ * ===========================
+ *
+ * On filesystems with directory parent pointers, we check the referential
+ * integrity by visiting each parent pointer of a child file and checking that
+ * the directory referenced by the pointer actually has a dirent pointing
+ * forward to the child file.
+ */
+
+struct xchk_pptrs {
+	struct xfs_scrub	*sc;
+
+	/* How many parent pointers did we find at the end? */
+	unsigned long long	pptrs_found;
+
+	/* Parent of this directory. */
+	xfs_ino_t		parent_ino;
+};
+
+/* Does this parent pointer match the dotdot entry? */
+STATIC int
+xchk_parent_scan_dotdot(
+	struct xfs_scrub		*sc,
+	struct xfs_inode		*ip,
+	unsigned int			attr_flags,
+	const unsigned char		*name,
+	unsigned int			namelen,
+	const void			*value,
+	unsigned int			valuelen,
+	void				*priv)
+{
+	struct xchk_pptrs		*pp = priv;
+	xfs_ino_t			parent_ino;
+	int				ret;
+
+	ret = xfs_parent_from_xattr(sc->mp, attr_flags, name, namelen,
+			value, valuelen, &parent_ino, NULL);
+	if (ret < 0)
+		return ret;
+
+	if (ret == 1 && pp->parent_ino == parent_ino)
+		return -ECANCELED;
+
+	return 0;
+}
+
+/* Look up the dotdot entry so that we can check it as we walk the pptrs. */
+STATIC int
+xchk_parent_pptr_and_dotdot(
+	struct xchk_pptrs	*pp)
+{
+	struct xfs_scrub	*sc = pp->sc;
+	int			error;
+
+	/* Look up '..' */
+	error = xchk_dir_lookup(sc, sc->ip, &xfs_name_dotdot, &pp->parent_ino);
+	if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+		return error;
+	if (!xfs_verify_dir_ino(sc->mp, pp->parent_ino)) {
+		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		return 0;
+	}
+
+	/* Is this the root dir?  Then '..' must point to itself. */
+	if (sc->ip == sc->mp->m_rootip) {
+		if (sc->ip->i_ino != pp->parent_ino)
+			xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		return 0;
+	}
+
+	/*
+	 * If this is now an unlinked directory, the dotdot value is
+	 * meaningless as long as it points to a valid inode.
+	 */
+	if (VFS_I(sc->ip)->i_nlink == 0)
+		return 0;
+
+	if (pp->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		return 0;
+
+	/* Otherwise, walk the pptrs again, and check. */
+	error = xchk_xattr_walk(sc, sc->ip, xchk_parent_scan_dotdot, pp);
+	if (error == -ECANCELED) {
+		/* Found a parent pointer that matches dotdot. */
+		return 0;
+	}
+	if (!error || error == -EFSCORRUPTED) {
+		/* Found a broken parent pointer or no match. */
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return 0;
+	}
+	return error;
+}
+
+/*
+ * Try to lock a parent directory for checking dirents.  Returns the inode
+ * flags for the locks we now hold, or zero if we failed.
+ */
+STATIC unsigned int
+xchk_parent_lock_dir(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp)
+{
+	if (!xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED))
+		return 0;
+
+	if (!xfs_ilock_nowait(dp, XFS_ILOCK_SHARED)) {
+		xfs_iunlock(dp, XFS_IOLOCK_SHARED);
+		return 0;
+	}
+
+	if (!xfs_need_iread_extents(&dp->i_df))
+		return XFS_IOLOCK_SHARED | XFS_ILOCK_SHARED;
+
+	xfs_iunlock(dp, XFS_ILOCK_SHARED);
+
+	if (!xfs_ilock_nowait(dp, XFS_ILOCK_EXCL)) {
+		xfs_iunlock(dp, XFS_IOLOCK_SHARED);
+		return 0;
+	}
+
+	return XFS_IOLOCK_SHARED | XFS_ILOCK_EXCL;
+}
+
+/* Check the forward link (dirent) associated with this parent pointer. */
+STATIC int
+xchk_parent_dirent(
+	struct xchk_pptrs	*pp,
+	const struct xfs_name	*xname,
+	struct xfs_inode	*dp)
+{
+	struct xfs_scrub	*sc = pp->sc;
+	xfs_ino_t		child_ino;
+	int			error;
+
+	/*
+	 * Use the name attached to this parent pointer to look up the
+	 * directory entry in the alleged parent.
+	 */
+	error = xchk_dir_lookup(sc, dp, xname, &child_ino);
+	if (error == -ENOENT) {
+		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return 0;
+	}
+	if (!xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0, &error))
+		return error;
+
+	/* Does the inode number match? */
+	if (child_ino != sc->ip->i_ino) {
+		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return 0;
+	}
+
+	return 0;
+}
+
+/* Try to grab a parent directory. */
+STATIC int
+xchk_parent_iget(
+	struct xchk_pptrs	*pp,
+	const struct xfs_parent_rec	*pptr,
+	struct xfs_inode	**dpp)
+{
+	struct xfs_scrub	*sc = pp->sc;
+	struct xfs_inode	*ip;
+	xfs_ino_t		parent_ino = be64_to_cpu(pptr->p_ino);
+	int			error;
+
+	/* Validate inode number. */
+	error = xfs_dir_ino_validate(sc->mp, parent_ino);
+	if (error) {
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return -ECANCELED;
+	}
+
+	error = xchk_iget(sc, parent_ino, &ip);
+	if (error == -EINVAL || error == -ENOENT) {
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return -ECANCELED;
+	}
+	if (!xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0, &error))
+		return error;
+
+	/* The parent must be a directory. */
+	if (!S_ISDIR(VFS_I(ip)->i_mode)) {
+		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		goto out_rele;
+	}
+
+	/* Validate generation number. */
+	if (VFS_I(ip)->i_generation != be32_to_cpu(pptr->p_gen)) {
+		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		goto out_rele;
+	}
+
+	*dpp = ip;
+	return 0;
+out_rele:
+	xchk_irele(sc, ip);
+	return 0;
+}
+
+/*
+ * Walk an xattr of a file.  If this xattr is a parent pointer, follow it up
+ * to a parent directory and check that the parent has a dirent pointing back
+ * to us.
+ */
+STATIC int
+xchk_parent_scan_attr(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen,
+	void			*priv)
+{
+	struct xfs_name		xname = {
+		.name		= name,
+		.len		= namelen,
+	};
+	struct xchk_pptrs	*pp = priv;
+	struct xfs_inode	*dp = NULL;
+	const struct xfs_parent_rec *pptr_rec = value;
+	xfs_ino_t		parent_ino;
+	unsigned int		lockmode;
+	int			ret;
+
+	ret = xfs_parent_from_xattr(sc->mp, attr_flags, name, namelen,
+			value, valuelen, &parent_ino, NULL);
+	if (ret < 0) {
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return -ECANCELED;
+	}
+	if (ret != 1)
+		return 0;
+
+	/* No self-referential parent pointers. */
+	if (parent_ino == sc->ip->i_ino) {
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return -ECANCELED;
+	}
+
+	pp->pptrs_found++;
+
+	ret = xchk_parent_iget(pp, pptr_rec, &dp);
+	if (ret)
+		return ret;
+	if (!dp)
+		return 0;
+
+	/* Try to lock the inode. */
+	lockmode = xchk_parent_lock_dir(sc, dp);
+	if (!lockmode) {
+		xchk_set_incomplete(sc);
+		ret = -ECANCELED;
+		goto out_rele;
+	}
+
+	ret = xchk_parent_dirent(pp, &xname, dp);
+	if (ret)
+		goto out_unlock;
+
+out_unlock:
+	xfs_iunlock(dp, lockmode);
+out_rele:
+	xchk_irele(sc, dp);
+	return ret;
+}
+
+/*
+ * Compare the number of parent pointers to the link count.  For
+ * non-directories these should be the same.  For unlinked directories the
+ * count should be zero; for linked directories, it should be nonzero.
+ */
+STATIC int
+xchk_parent_count_pptrs(
+	struct xchk_pptrs	*pp)
+{
+	struct xfs_scrub	*sc = pp->sc;
+
+	if (S_ISDIR(VFS_I(sc->ip)->i_mode)) {
+		if (sc->ip == sc->mp->m_rootip)
+			pp->pptrs_found++;
+
+		if (VFS_I(sc->ip)->i_nlink == 0 && pp->pptrs_found > 0)
+			xchk_ino_set_corrupt(sc, sc->ip->i_ino);
+		else if (VFS_I(sc->ip)->i_nlink > 0 &&
+			 pp->pptrs_found == 0)
+			xchk_ino_set_corrupt(sc, sc->ip->i_ino);
+	} else {
+		if (VFS_I(sc->ip)->i_nlink != pp->pptrs_found)
+			xchk_ino_set_corrupt(sc, sc->ip->i_ino);
+	}
+
+	return 0;
+}
+
+/* Check parent pointers of a file. */
+STATIC int
+xchk_parent_pptr(
+	struct xfs_scrub	*sc)
+{
+	struct xchk_pptrs	*pp;
+	int			error;
+
+	pp = kvzalloc(sizeof(struct xchk_pptrs), XCHK_GFP_FLAGS);
+	if (!pp)
+		return -ENOMEM;
+	pp->sc = sc;
+
+	error = xchk_xattr_walk(sc, sc->ip, xchk_parent_scan_attr, pp);
+	if (error == -ECANCELED) {
+		error = 0;
+		goto out_pp;
+	}
+	if (error)
+		goto out_pp;
+
+	if (pp->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		goto out_pp;
+
+	/*
+	 * For subdirectories, make sure the dotdot entry references the same
+	 * inode as the parent pointers.
+	 *
+	 * If we're scanning a /consistent/ directory, there should only be
+	 * one parent pointer, and it should point to the same directory as
+	 * the dotdot entry.
+	 *
+	 * However, a corrupt directory tree might feature a subdirectory with
+	 * multiple parents.  The directory loop scanner is responsible for
+	 * correcting that kind of problem, so for now we only validate that
+	 * the dotdot entry matches /one/ of the parents.
+	 */
+	if (S_ISDIR(VFS_I(sc->ip)->i_mode)) {
+		error = xchk_parent_pptr_and_dotdot(pp);
+		if (error)
+			goto out_pp;
+	}
+
+	if (pp->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		goto out_pp;
+
+	/*
+	 * Complain if the number of parent pointers doesn't match the link
+	 * count.  This could be a sign of missing parent pointers (or an
+	 * incorrect link count).
+	 */
+	error = xchk_parent_count_pptrs(pp);
+	if (error)
+		goto out_pp;
+
+out_pp:
+	kvfree(pp);
+	return error;
+}
+
 /* Scrub a parent pointer. */
 int
 xchk_parent(
@@ -206,6 +570,9 @@ xchk_parent(
 	xfs_ino_t		parent_ino;
 	int			error = 0;
 
+	if (xfs_has_parent(mp))
+		return xchk_parent_pptr(sc);
+
 	/*
 	 * If we're a directory, check that the '..' link points up to
 	 * a directory that has one entry pointing to us.



* [PATCH 4/7] xfs: deferred scrub of parent pointers
  2024-04-10  0:45 ` [PATCHSET v13.1 6/9] xfs: scrubbing for " Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-04-10  1:02   ` [PATCH 3/7] xfs: scrub parent pointers Darrick J. Wong
@ 2024-04-10  1:02   ` Darrick J. Wong
  2024-04-10  6:14     ` Christoph Hellwig
  2024-04-10  1:03   ` [PATCH 5/7] xfs: walk directory parent pointers to determine backref count Darrick J. Wong
                     ` (2 subsequent siblings)
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:02 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If the trylock-based dirent check fails, retain those parent pointers
and check them at the end.  This may involve dropping the locks on the
file being scanned, in which case the deferred parent pointers must be
revalidated before they are checked.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile       |    2 
 fs/xfs/scrub/parent.c |  264 ++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/scrub/trace.h  |    3 +
 3 files changed, 261 insertions(+), 8 deletions(-)
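
The defer-then-retry flow can be sketched as a small userspace model.
Everything named toy_* is hypothetical, and the "lock" is just a counter
of how many trylock attempts will still fail.  The real code also
revalidates the saved parent pointer after cycling locks and can return
-EDEADLOCK so that the whole scrub is retried; this sketch only shows
the fast path, the deferral, and the bounded slow path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEFERRED	16
#define TOY_MAX_RETRIES		4

/* Hypothetical parent directory; @busy_for counts failed trylocks left. */
struct toy_parent {
	unsigned int	busy_for;
	bool		checked;
};

struct toy_scan {
	struct toy_parent	*deferred[TOY_MAX_DEFERRED];
	size_t			nr_deferred;
	bool			need_revalidate;
	bool			incomplete;
};

/* Trylock stand-in: fails while the parent is still contended. */
static bool
toy_trylock(struct toy_parent *dp)
{
	if (dp->busy_for > 0) {
		dp->busy_for--;
		return false;
	}
	return true;
}

/* Fast path: check now if the lock is free, else remember it for later. */
static void
toy_scan_pptr(struct toy_scan *ts, struct toy_parent *dp)
{
	if (toy_trylock(dp)) {
		dp->checked = true;
		return;
	}
	assert(ts->nr_deferred < TOY_MAX_DEFERRED);
	ts->deferred[ts->nr_deferred++] = dp;
}

/* Slow path: retry each deferred parent a bounded number of times. */
static void
toy_finish_deferred(struct toy_scan *ts)
{
	for (size_t i = 0; i < ts->nr_deferred; i++) {
		struct toy_parent *dp = ts->deferred[i];
		unsigned int nr;

		if (toy_trylock(dp)) {
			dp->checked = true;
			continue;
		}

		/* We would drop our own locks here, so revalidate later. */
		ts->need_revalidate = true;
		for (nr = 0; nr < TOY_MAX_RETRIES; nr++) {
			if (toy_trylock(dp)) {
				dp->checked = true;
				break;
			}
		}
		if (!dp->checked) {
			/* Give up and report the scan as incomplete. */
			ts->incomplete = true;
			break;
		}
	}
	ts->nr_deferred = 0;
}
```

The retry bound stands in for the HZ-iteration loop in
xchk_dir_trylock_for_pptrs(); a parent that stays contended past the
bound leaves the scan marked incomplete rather than blocking forever.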


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index c969b11ce0f47..af99a455ce4db 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -177,6 +177,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   scrub.o \
 				   symlink.o \
 				   xfarray.o \
+				   xfblob.o \
 				   xfile.o \
 				   )
 
@@ -218,7 +219,6 @@ xfs-y				+= $(addprefix scrub/, \
 				   rmap_repair.o \
 				   symlink_repair.o \
 				   tempfile.o \
-				   xfblob.o \
 				   )
 
 xfs-$(CONFIG_XFS_RT)		+= $(addprefix scrub/, \
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index 966e106c1fe6d..cb50e89e7ee64 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -23,6 +23,9 @@
 #include "scrub/tempfile.h"
 #include "scrub/repair.h"
 #include "scrub/listxattr.h"
+#include "scrub/xfile.h"
+#include "scrub/xfarray.h"
+#include "scrub/xfblob.h"
 #include "scrub/trace.h"
 
 /* Set us up to scrub parents. */
@@ -211,6 +214,18 @@ xchk_parent_validate(
  * forward to the child file.
  */
 
+/* Deferred parent pointer entry that we saved for later. */
+struct xchk_pptr {
+	/* Cookie for retrieval of the pptr name. */
+	xfblob_cookie		name_cookie;
+
+	/* Parent pointer record. */
+	struct xfs_parent_rec	pptr_rec;
+
+	/* Length of the pptr name. */
+	uint8_t			namelen;
+};
+
 struct xchk_pptrs {
 	struct xfs_scrub	*sc;
 
@@ -219,6 +234,22 @@ struct xchk_pptrs {
 
 	/* Parent of this directory. */
 	xfs_ino_t		parent_ino;
+
+	/* Fixed-size array of xchk_pptr structures. */
+	struct xfarray		*pptr_entries;
+
+	/* Blobs containing parent pointer names. */
+	struct xfblob		*pptr_names;
+
+	/* Scratch buffer for scanning pptr xattrs */
+	struct xfs_da_args	pptr_args;
+
+	/* If we've cycled the ILOCK, we must revalidate all deferred pptrs. */
+	bool			need_revalidate;
+
+	/* Name buffer */
+	struct xfs_name		xname;
+	char			namebuf[MAXNAMELEN];
 };
 
 /* Does this parent pointer match the dotdot entry? */
@@ -457,8 +488,25 @@ xchk_parent_scan_attr(
 	/* Try to lock the inode. */
 	lockmode = xchk_parent_lock_dir(sc, dp);
 	if (!lockmode) {
-		xchk_set_incomplete(sc);
-		ret = -ECANCELED;
+		struct xchk_pptr	save_pp = {
+			.pptr_rec	= *pptr_rec, /* struct copy */
+			.namelen	= namelen,
+		};
+
+		/* Couldn't lock the inode, so save the pptr for later. */
+		trace_xchk_parent_defer(sc->ip, &xname, dp->i_ino);
+
+		ret = xfblob_storename(pp->pptr_names, &save_pp.name_cookie,
+				&xname);
+		if (!xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0,
+					&ret))
+			goto out_rele;
+
+		ret = xfarray_append(pp->pptr_entries, &save_pp);
+		if (!xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0,
+					&ret))
+			goto out_rele;
+
 		goto out_rele;
 	}
 
@@ -473,6 +521,159 @@ xchk_parent_scan_attr(
 	return ret;
 }
 
+/*
+ * Revalidate a parent pointer that we collected in the past but couldn't check
+ * because of lock contention.  Returns 0 if the parent pointer is still valid,
+ * -ENOENT if it has gone away on us, or a negative errno.
+ */
+STATIC int
+xchk_parent_revalidate_pptr(
+	struct xchk_pptrs		*pp,
+	const struct xfs_name		*xname,
+	struct xfs_parent_rec		*pptr)
+{
+	struct xfs_scrub		*sc = pp->sc;
+	int				error;
+
+	error = xfs_parent_lookup(sc->tp, sc->ip, xname, pptr, &pp->pptr_args);
+	if (error == -ENOATTR) {
+		/* Parent pointer went away, nothing to revalidate. */
+		return -ENOENT;
+	}
+
+	return error;
+}
+
+/*
+ * Check a parent pointer the slow way, which means we cycle locks a bunch
+ * and put up with revalidation until we get it done.
+ */
+STATIC int
+xchk_parent_slow_pptr(
+	struct xchk_pptrs	*pp,
+	const struct xfs_name	*xname,
+	struct xfs_parent_rec	*pptr)
+{
+	struct xfs_scrub	*sc = pp->sc;
+	struct xfs_inode	*dp = NULL;
+	unsigned int		lockmode;
+	int			error;
+
+	/* Check that the deferred parent pointer still exists. */
+	if (pp->need_revalidate) {
+		error = xchk_parent_revalidate_pptr(pp, xname, pptr);
+		if (error == -ENOENT)
+			return 0;
+		if (!xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0,
+					&error))
+			return error;
+	}
+
+	error = xchk_parent_iget(pp, pptr, &dp);
+	if (error)
+		return error;
+	if (!dp)
+		return 0;
+
+	/*
+	 * If we can grab both IOLOCK and ILOCK of the alleged parent, we
+	 * can proceed with the validation.
+	 */
+	lockmode = xchk_parent_lock_dir(sc, dp);
+	if (lockmode) {
+		trace_xchk_parent_slowpath(sc->ip, xname, dp->i_ino);
+		goto check_dirent;
+	}
+
+	/*
+	 * We couldn't lock the parent dir.  Drop all the locks and try to
+	 * get them again, one at a time.
+	 */
+	xchk_iunlock(sc, sc->ilock_flags);
+	pp->need_revalidate = true;
+
+	trace_xchk_parent_ultraslowpath(sc->ip, xname, dp->i_ino);
+
+	error = xchk_dir_trylock_for_pptrs(sc, dp, &lockmode);
+	if (error)
+		goto out_rele;
+
+	/* Revalidate the parent pointer now that we cycled locks. */
+	error = xchk_parent_revalidate_pptr(pp, xname, pptr);
+	if (error == -ENOENT) {
+		error = 0;
+		goto out_unlock;
+	}
+	if (!xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0, &error))
+		goto out_unlock;
+
+check_dirent:
+	error = xchk_parent_dirent(pp, xname, dp);
+out_unlock:
+	xfs_iunlock(dp, lockmode);
+out_rele:
+	xchk_irele(sc, dp);
+	return error;
+}
+
+/* Check all the parent pointers that we deferred the first time around. */
+STATIC int
+xchk_parent_finish_slow_pptrs(
+	struct xchk_pptrs	*pp)
+{
+	xfarray_idx_t		array_cur;
+	int			error;
+
+	foreach_xfarray_idx(pp->pptr_entries, array_cur) {
+		struct xchk_pptr	pptr;
+
+		if (pp->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+			return 0;
+
+		error = xfarray_load(pp->pptr_entries, array_cur, &pptr);
+		if (error)
+			return error;
+
+		error = xfblob_loadname(pp->pptr_names, pptr.name_cookie,
+				&pp->xname, pptr.namelen);
+		if (error)
+			return error;
+
+		error = xchk_parent_slow_pptr(pp, &pp->xname, &pptr.pptr_rec);
+		if (error)
+			return error;
+	}
+
+	/* Empty out both xfiles now that we've checked everything. */
+	xfarray_truncate(pp->pptr_entries);
+	xfblob_truncate(pp->pptr_names);
+	return 0;
+}
+
+/* Count the number of parent pointers. */
+STATIC int
+xchk_parent_count_pptr(
+	struct xfs_scrub		*sc,
+	struct xfs_inode		*ip,
+	unsigned int			attr_flags,
+	const unsigned char		*name,
+	unsigned int			namelen,
+	const void			*value,
+	unsigned int			valuelen,
+	void				*priv)
+{
+	struct xchk_pptrs		*pp = priv;
+	int				ret;
+
+	ret = xfs_parent_from_xattr(sc->mp, attr_flags, name, namelen,
+			value, valuelen, NULL, NULL);
+	if (ret != 1)
+		return ret;
+
+	pp->pptrs_found++;
+	return 0;
+}
+
 /*
  * Compare the number of parent pointers to the link count.  For
  * non-directories these should be the same.  For unlinked directories the
@@ -483,6 +684,23 @@ xchk_parent_count_pptrs(
 	struct xchk_pptrs	*pp)
 {
 	struct xfs_scrub	*sc = pp->sc;
+	int			error;
+
+	/*
+	 * If we cycled the ILOCK while cross-checking parent pointers with
+	 * dirents, then we need to recalculate the number of parent pointers.
+	 */
+	if (pp->need_revalidate) {
+		pp->pptrs_found = 0;
+		error = xchk_xattr_walk(sc, sc->ip, xchk_parent_count_pptr, pp);
+		if (error == -EFSCORRUPTED) {
+			/* Found a bad parent pointer */
+			xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
+			return 0;
+		}
+		if (error)
+			return error;
+	}
 
 	if (S_ISDIR(VFS_I(sc->ip)->i_mode)) {
 		if (sc->ip == sc->mp->m_rootip)
@@ -507,23 +725,51 @@ xchk_parent_pptr(
 	struct xfs_scrub	*sc)
 {
 	struct xchk_pptrs	*pp;
+	char			*descr;
 	int			error;
 
 	pp = kvzalloc(sizeof(struct xchk_pptrs), XCHK_GFP_FLAGS);
 	if (!pp)
 		return -ENOMEM;
 	pp->sc = sc;
+	pp->xname.name = pp->namebuf;
+
+	/*
+	 * Set up some staging memory for parent pointers that we can't check
+	 * due to locking contention.
+	 */
+	descr = xchk_xfile_ino_descr(sc, "slow parent pointer entries");
+	error = xfarray_create(descr, 0, sizeof(struct xchk_pptr),
+			&pp->pptr_entries);
+	kfree(descr);
+	if (error)
+		goto out_pp;
+
+	descr = xchk_xfile_ino_descr(sc, "slow parent pointer names");
+	error = xfblob_create(descr, &pp->pptr_names);
+	kfree(descr);
+	if (error)
+		goto out_entries;
 
 	error = xchk_xattr_walk(sc, sc->ip, xchk_parent_scan_attr, pp);
 	if (error == -ECANCELED) {
 		error = 0;
-		goto out_pp;
+		goto out_names;
 	}
 	if (error)
-		goto out_pp;
+		goto out_names;
+
+	error = xchk_parent_finish_slow_pptrs(pp);
+	if (error == -ETIMEDOUT) {
+		/* Couldn't grab a lock, scrub was marked incomplete */
+		error = 0;
+		goto out_names;
+	}
+	if (error)
+		goto out_names;
 
 	if (pp->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
-		goto out_pp;
+		goto out_names;
 
 	/*
 	 * For subdirectories, make sure the dotdot entry references the same
@@ -541,7 +787,7 @@ xchk_parent_pptr(
 	if (S_ISDIR(VFS_I(sc->ip)->i_mode)) {
 		error = xchk_parent_pptr_and_dotdot(pp);
 		if (error)
-			goto out_pp;
+			goto out_names;
 	}
 
 	if (pp->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
@@ -554,8 +800,12 @@ xchk_parent_pptr(
 	 */
 	error = xchk_parent_count_pptrs(pp);
 	if (error)
-		goto out_pp;
+		goto out_names;
 
+out_names:
+	xfblob_destroy(pp->pptr_names);
+out_entries:
+	xfarray_destroy(pp->pptr_entries);
 out_pp:
 	kvfree(pp);
 	return error;
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 4db762480b8d4..97a106519b531 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -1544,6 +1544,9 @@ DEFINE_EVENT(xchk_pptr_class, name, \
 DEFINE_XCHK_PPTR_EVENT(xchk_dir_defer);
 DEFINE_XCHK_PPTR_EVENT(xchk_dir_slowpath);
 DEFINE_XCHK_PPTR_EVENT(xchk_dir_ultraslowpath);
+DEFINE_XCHK_PPTR_EVENT(xchk_parent_defer);
+DEFINE_XCHK_PPTR_EVENT(xchk_parent_slowpath);
+DEFINE_XCHK_PPTR_EVENT(xchk_parent_ultraslowpath);
 
 /* repair tracepoints */
 #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)



* [PATCH 5/7] xfs: walk directory parent pointers to determine backref count
  2024-04-10  0:45 ` [PATCHSET v13.1 6/9] xfs: scrubbing for " Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-04-10  1:02   ` [PATCH 4/7] xfs: deferred scrub of " Darrick J. Wong
@ 2024-04-10  1:03   ` Darrick J. Wong
  2024-04-10  6:14     ` Christoph Hellwig
  2024-04-10  1:03   ` [PATCH 6/7] xfs: check parent pointer xattrs when scrubbing Darrick J. Wong
  2024-04-10  1:03   ` [PATCH 7/7] xfs: salvage parent pointers when rebuilding xattr structures Darrick J. Wong
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:03 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If the filesystem has parent pointers enabled, walk the parent pointers
of subdirectories to determine the true backref count.  In theory each
subdir should have a single parent reachable via dotdot, but in the case
of (corrupt) subdirs with multiple parents, we need to keep the link
counts high enough that the directory loop detector will be able to
correct the multiple-parent problem.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/common.h        |    1 +
 fs/xfs/scrub/nlinks.c        |   82 +++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/scrub/nlinks_repair.c |    2 +
 fs/xfs/scrub/parent.c        |   61 +++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.c         |    1 +
 fs/xfs/scrub/trace.h         |   28 ++++++++++++++
 6 files changed, 174 insertions(+), 1 deletion(-)
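
A toy illustration of why the backref count is taken from parent
pointers rather than from dotdot.  The toy_* names are hypothetical and
the root directory special case is omitted.  A (corrupt) subdirectory
with two parent pointers contributes a backref to both parents, whereas
counting dotdot alone would credit only one of them.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_INO	8

/* Hypothetical subdirectory: its parent pointers and its dotdot target. */
struct toy_subdir {
	unsigned int		dotdot_ino;
	const unsigned int	*pptr_inos;
	size_t			nr_pptrs;
};

/*
 * Count backrefs (child directories pointing back up at a parent).  With
 * parent pointers, every pptr counts; without, only dotdot is available.
 */
static void
toy_count_backrefs(const struct toy_subdir *dirs, size_t nr_dirs,
		int has_parent, unsigned int backrefs[TOY_MAX_INO])
{
	for (size_t i = 0; i < nr_dirs; i++) {
		if (!has_parent) {
			assert(dirs[i].dotdot_ino < TOY_MAX_INO);
			backrefs[dirs[i].dotdot_ino]++;
			continue;
		}
		for (size_t j = 0; j < dirs[i].nr_pptrs; j++) {
			assert(dirs[i].pptr_inos[j] < TOY_MAX_INO);
			backrefs[dirs[i].pptr_inos[j]]++;
		}
	}
}
```

Keeping the second parent's count nonzero is what leaves the extra link
visible for the directory loop detector to correct later.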


diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 89f7bbec887ed..e00466f404829 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -212,6 +212,7 @@ static inline bool xchk_skip_xref(struct xfs_scrub_metadata *sm)
 }
 
 bool xchk_dir_looks_zapped(struct xfs_inode *dp);
+bool xchk_pptr_looks_zapped(struct xfs_inode *ip);
 
 #ifdef CONFIG_XFS_ONLINE_REPAIR
 /* Decide if a repair is required. */
diff --git a/fs/xfs/scrub/nlinks.c b/fs/xfs/scrub/nlinks.c
index fcb9c473f372e..a733e4e178de4 100644
--- a/fs/xfs/scrub/nlinks.c
+++ b/fs/xfs/scrub/nlinks.c
@@ -18,6 +18,7 @@
 #include "xfs_dir2.h"
 #include "xfs_dir2_priv.h"
 #include "xfs_ag.h"
+#include "xfs_parent.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/repair.h"
@@ -29,6 +30,7 @@
 #include "scrub/trace.h"
 #include "scrub/readdir.h"
 #include "scrub/tempfile.h"
+#include "scrub/listxattr.h"
 
 /*
  * Live Inode Link Count Checking
@@ -272,12 +274,17 @@ xchk_nlinks_collect_dirent(
 	 * number of parents of the root directory.
 	 *
 	 * Otherwise, increment the number of backrefs pointing back to ino.
+	 *
+	 * If the filesystem has parent pointers, we walk the pptrs to
+	 * determine the backref count.
 	 */
 	if (dotdot) {
 		if (dp == sc->mp->m_rootip)
 			error = xchk_nlinks_update_incore(xnc, ino, 1, 0, 0);
-		else
+		else if (!xfs_has_parent(sc->mp))
 			error = xchk_nlinks_update_incore(xnc, ino, 0, 1, 0);
+		else
+			error = 0;
 		if (error)
 			goto out_unlock;
 	}
@@ -314,6 +321,58 @@ xchk_nlinks_collect_dirent(
 	return error;
 }
 
+/* Bump the backref count for the inode referenced by this parent pointer. */
+STATIC int
+xchk_nlinks_collect_pptr(
+	struct xfs_scrub		*sc,
+	struct xfs_inode		*ip,
+	unsigned int			attr_flags,
+	const unsigned char		*name,
+	unsigned int			namelen,
+	const void			*value,
+	unsigned int			valuelen,
+	void				*priv)
+{
+	struct xfs_name			xname = {
+		.name			= name,
+		.len			= namelen,
+	};
+	struct xchk_nlink_ctrs		*xnc = priv;
+	const struct xfs_parent_rec	*pptr_rec = value;
+	xfs_ino_t			parent_ino;
+	int				ret;
+
+	/* Update the shadow link counts if we haven't already failed. */
+
+	if (xchk_iscan_aborted(&xnc->collect_iscan)) {
+		ret = -ECANCELED;
+		goto out_incomplete;
+	}
+
+	ret = xfs_parent_from_xattr(sc->mp, attr_flags, name, namelen,
+			value, valuelen, &parent_ino, NULL);
+	if (ret != 1)
+		return ret;
+
+	trace_xchk_nlinks_collect_pptr(sc->mp, ip, &xname, pptr_rec);
+
+	mutex_lock(&xnc->lock);
+
+	ret = xchk_nlinks_update_incore(xnc, parent_ino, 0, 1, 0);
+	if (ret)
+		goto out_unlock;
+
+	mutex_unlock(&xnc->lock);
+	return 0;
+
+out_unlock:
+	mutex_unlock(&xnc->lock);
+	xchk_iscan_abort(&xnc->collect_iscan);
+out_incomplete:
+	xchk_set_incomplete(sc);
+	return ret;
+}
+
 /* Walk a directory to bump the observed link counts of the children. */
 STATIC int
 xchk_nlinks_collect_dir(
@@ -360,6 +419,27 @@ xchk_nlinks_collect_dir(
 	if (error)
 		goto out_abort;
 
+	/* Walk the parent pointers to get real backref counts. */
+	if (xfs_has_parent(sc->mp)) {
+		/*
+		 * If the extended attributes look as though they have been
+		 * zapped by the inode record repair code, we cannot scan for
+		 * parent pointers.
+		 */
+		if (xchk_pptr_looks_zapped(dp)) {
+			error = -EBUSY;
+			goto out_unlock;
+		}
+
+		error = xchk_xattr_walk(sc, dp, xchk_nlinks_collect_pptr, xnc);
+		if (error == -ECANCELED) {
+			error = 0;
+			goto out_unlock;
+		}
+		if (error)
+			goto out_abort;
+	}
+
 	xchk_iscan_mark_visited(&xnc->collect_iscan, dp);
 	goto out_unlock;
 
diff --git a/fs/xfs/scrub/nlinks_repair.c b/fs/xfs/scrub/nlinks_repair.c
index 83f8637bb08fd..78d0f650fe897 100644
--- a/fs/xfs/scrub/nlinks_repair.c
+++ b/fs/xfs/scrub/nlinks_repair.c
@@ -18,6 +18,8 @@
 #include "xfs_ialloc.h"
 #include "xfs_sb.h"
 #include "xfs_ag.h"
+#include "xfs_dir2.h"
+#include "xfs_parent.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/repair.h"
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index cb50e89e7ee64..57b49fbf97a30 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -866,3 +866,64 @@ xchk_parent(
 
 	return error;
 }
+
+/*
+ * Decide if this file's extended attributes (and therefore its parent
+ * pointers) have been zapped to satisfy the inode and ifork verifiers.
+ * Checking and repairing should be postponed until the extended attribute
+ * structure is fixed.
+ */
+bool
+xchk_pptr_looks_zapped(
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct inode		*inode = VFS_I(ip);
+
+	ASSERT(xfs_has_parent(mp));
+
+	/*
+	 * Temporary files that cannot be linked into the directory tree do not
+	 * have attr forks because they cannot ever have parents.
+	 */
+	if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
+		return false;
+
+	/*
+	 * Directory tree roots do not have parents, so the expected outcome
+	 * of a parent pointer scan is always the empty set.  It's safe to scan
+	 * them even if the attr fork was zapped.
+	 */
+	if (ip == mp->m_rootip)
+		return false;
+
+	/*
+	 * Metadata inodes are all rooted in the superblock and do not have
+	 * any parents.  Hence the attr fork will not be initialized, but
+	 * there are no parent pointers that might have been zapped.
+	 */
+	if (xfs_is_metadata_inode(ip))
+		return false;
+
+	/*
+	 * Linked and linkable non-rootdir files should always have an
+	 * attribute fork because that is where parent pointers are
+	 * stored.  If the fork is absent, something is amiss.
+	 */
+	if (!xfs_inode_has_attr_fork(ip))
+		return true;
+
+	/* Repair zapped this file's attr fork a short time ago */
+	if (xfs_ifork_zapped(ip, XFS_ATTR_FORK))
+		return true;
+
+	/*
+	 * If the dinode repair found a bad attr fork, it will reset the fork
+	 * to extents format with zero records and wait for the bmapbta
+	 * scrubber to reconstruct the block mappings.  The extended attribute
+	 * structure always contains some content when parent pointers are
+	 * enabled, so this is a clear sign of a zapped attr fork.
+	 */
+	return ip->i_af.if_format == XFS_DINODE_FMT_EXTENTS &&
+	       ip->i_af.if_nextents == 0;
+}
diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
index b2ce7b22cad34..4a8cc2c98d997 100644
--- a/fs/xfs/scrub/trace.c
+++ b/fs/xfs/scrub/trace.c
@@ -19,6 +19,7 @@
 #include "xfs_da_format.h"
 #include "xfs_dir2.h"
 #include "xfs_rmap.h"
+#include "xfs_parent.h"
 #include "scrub/scrub.h"
 #include "scrub/xfile.h"
 #include "scrub/xfarray.h"
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 97a106519b531..3e726610b9e32 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -26,6 +26,7 @@ struct xchk_iscan;
 struct xchk_nlink;
 struct xchk_fscounters;
 struct xfs_rmap_update_params;
+struct xfs_parent_rec;
 
 /*
  * ftrace's __print_symbolic requires that all enum values be wrapped in the
@@ -1363,6 +1364,33 @@ TRACE_EVENT(xchk_nlinks_collect_dirent,
 		  __get_str(name))
 );
 
+TRACE_EVENT(xchk_nlinks_collect_pptr,
+	TP_PROTO(struct xfs_mount *mp, struct xfs_inode *dp,
+		 const struct xfs_name *name,
+		 const struct xfs_parent_rec *pptr),
+	TP_ARGS(mp, dp, name, pptr),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, dir)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, name->len)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->dir = dp->i_ino;
+		__entry->ino = be64_to_cpu(pptr->p_ino);
+		__entry->namelen = name->len;
+		memcpy(__get_str(name), name->name, name->len);
+	),
+	TP_printk("dev %d:%d dir 0x%llx -> ino 0x%llx name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->dir,
+		  __entry->ino,
+		  __entry->namelen,
+		  __get_str(name))
+);
+
 TRACE_EVENT(xchk_nlinks_collect_metafile,
 	TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino),
 	TP_ARGS(mp, ino),
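
As a reading aid for xchk_pptr_looks_zapped() above, the order of its checks can be modelled in user space roughly as follows.  This is only a sketch: struct demo_inode and every field in it are made-up stand-ins for the inode state that the kernel function consults, not kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified attr fork formats. */
enum demo_fork_format {
	DEMO_FMT_LOCAL,
	DEMO_FMT_EXTENTS,
	DEMO_FMT_BTREE,
};

/* Hypothetical summary of the inode state that the check looks at. */
struct demo_inode {
	unsigned int		nlink;		/* link count */
	bool			linkable;	/* unlinked, but may be linked later */
	bool			is_root;	/* directory tree root */
	bool			is_metadata;	/* rooted in the superblock */
	bool			has_attr_fork;
	bool			attr_fork_zapped; /* flagged by inode repair */
	enum demo_fork_format	attr_format;
	unsigned int		attr_nextents;
};

/* Return true if a parent pointer scan of @ip should be postponed. */
static bool
demo_pptr_looks_zapped(
	const struct demo_inode	*ip)
{
	assert(ip != NULL);

	/* Unlinkable temporary files can never have parents. */
	if (ip->nlink == 0 && !ip->linkable)
		return false;

	/* Tree roots and metadata files have no parents either. */
	if (ip->is_root || ip->is_metadata)
		return false;

	/* Everything else needs an intact attr fork to hold parent pointers. */
	if (!ip->has_attr_fork || ip->attr_fork_zapped)
		return true;

	/* An extents-format fork with zero extents cannot hold any. */
	return ip->attr_format == DEMO_FMT_EXTENTS &&
	       ip->attr_nextents == 0;
}
```

The early exits come first because those files never carry parent pointers, so a missing or empty attr fork says nothing about whether repair zapped them.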



* [PATCH 6/7] xfs: check parent pointer xattrs when scrubbing
  2024-04-10  0:45 ` [PATCHSET v13.1 6/9] xfs: scrubbing for " Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-04-10  1:03   ` [PATCH 5/7] xfs: walk directory parent pointers to determine backref count Darrick J. Wong
@ 2024-04-10  1:03   ` Darrick J. Wong
  2024-04-10  6:14     ` Christoph Hellwig
  2024-04-10  1:03   ` [PATCH 7/7] xfs: salvage parent pointers when rebuilding xattr structures Darrick J. Wong
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:03 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Check parent pointer xattrs as part of scrubbing xattrs.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/attr.c |    8 ++++++++
 1 file changed, 8 insertions(+)
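
For readers who want a feel for what a parent pointer value check involves, here is a small user-space sketch.  It assumes the xattr value is a fixed-size record holding a big-endian parent inode number and generation (the field names follow the tracepoints in this series).  The struct, the helpers, and the max_ino bound are all hypothetical and are not the kernel's xfs_parent_valuecheck.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the parent pointer xattr value: a big-endian
 * parent inode number followed by a big-endian generation number.
 */
struct demo_parent_rec {
	uint8_t		p_ino[8];
	uint8_t		p_gen[4];
};

/* Store @v at @p in big-endian byte order. */
static void
demo_put_be64(uint8_t *p, uint64_t v)
{
	assert(p != NULL);
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Load a big-endian 64-bit value from @p. */
static uint64_t
demo_get_be64(const uint8_t *p)
{
	uint64_t	v = 0;

	assert(p != NULL);
	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Return true if @value looks like a parent pointer record.  @max_ino is an
 * assumed upper bound that stands in for a real inode number verifier.
 */
static bool
demo_pptr_valuecheck(const void *value, size_t valuelen, uint64_t max_ino)
{
	struct demo_parent_rec	rec;
	uint64_t		ino;

	/* The value must be present and exactly one record long. */
	if (value == NULL || valuelen != sizeof(rec))
		return false;

	memcpy(&rec, value, sizeof(rec));
	ino = demo_get_be64(rec.p_ino);
	return ino != 0 && ino <= max_ino;
}
```

A scrubber built on a check like this can flag the attr fork as corrupt and stop walking as soon as one record fails, which is the shape of the hunk below.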


diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index fe51a17661831..b91234bbd58aa 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -17,6 +17,7 @@
 #include "xfs_attr.h"
 #include "xfs_attr_leaf.h"
 #include "xfs_attr_sf.h"
+#include "xfs_parent.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/dabtree.h"
@@ -208,6 +209,13 @@ xchk_xattr_actor(
 		return -ECANCELED;
 	}
 
+	/* Check parent pointer record. */
+	if ((attr_flags & XFS_ATTR_PARENT) &&
+	    !xfs_parent_valuecheck(sc->mp, value, valuelen)) {
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, args.blkno);
+		return -ECANCELED;
+	}
+
 	/*
 	 * Local and shortform xattr values are stored in the attr leaf block,
 	 * so we don't need to retrieve the value from a remote block to detect



* [PATCH 7/7] xfs: salvage parent pointers when rebuilding xattr structures
  2024-04-10  0:45 ` [PATCHSET v13.1 6/9] xfs: scrubbing for " Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-04-10  1:03   ` [PATCH 6/7] xfs: check parent pointer xattrs when scrubbing Darrick J. Wong
@ 2024-04-10  1:03   ` Darrick J. Wong
  2024-04-10  6:15     ` Christoph Hellwig
  6 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:03 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

When we're salvaging extended attributes, make sure we validate the ones
that claim to be parent pointers before adding them to the salvage pile.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/attr_repair.c |   34 +++++++++++++++++++++++++---------
 fs/xfs/scrub/trace.h       |   38 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+), 9 deletions(-)
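
The name handling in xrep_xattr_salvage_key below differs between parent pointers and every other attr namespace.  A minimal sketch of that length decision, using a hypothetical helper name and plain user-space types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Decide how many bytes of a salvaged attr name to keep.  Returns 0 if the
 * attribute should be ignored.  demo_salvage_namelen is a made-up name for
 * illustration only.
 */
static size_t
demo_salvage_namelen(const unsigned char *name, size_t namelen, bool is_pptr)
{
	size_t	i = 0;

	assert(name != NULL);

	/* The patch keeps the whole name for parent pointers. */
	if (is_pptr)
		return namelen;

	/* Other attr names are truncated at the first NUL byte. */
	while (i < namelen && name[i] != 0)
		i++;
	return i;
}
```

For example, a 13-byte name "user.foo\0junk" is kept as 8 bytes for an ordinary attr but as all 13 bytes when the parent flag is set; in the patch, the parent pointer value still has to pass xfs_parent_valuecheck in xrep_xattr_want_salvage before it reaches the salvage pile.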


diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
index 7228758c2da1a..091cef077cdde 100644
--- a/fs/xfs/scrub/attr_repair.c
+++ b/fs/xfs/scrub/attr_repair.c
@@ -28,6 +28,7 @@
 #include "xfs_exchmaps.h"
 #include "xfs_exchrange.h"
 #include "xfs_acl.h"
+#include "xfs_parent.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -127,6 +128,9 @@ xrep_xattr_want_salvage(
 		return false;
 	if (valuelen > XATTR_SIZE_MAX || valuelen < 0)
 		return false;
+	if (attr_flags & XFS_ATTR_PARENT)
+		return xfs_parent_valuecheck(rx->sc->mp, value, valuelen);
+
 	return true;
 }
 
@@ -154,14 +158,21 @@ xrep_xattr_salvage_key(
 	 * Truncate the name to the first character that would trip namecheck.
 	 * If we no longer have a name after that, ignore this attribute.
 	 */
-	while (i < namelen && name[i] != 0)
-		i++;
-	if (i == 0)
-		return 0;
-	key.namelen = i;
+	if (flags & XFS_ATTR_PARENT) {
+		key.namelen = namelen;
 
-	trace_xrep_xattr_salvage_rec(rx->sc->ip, flags, name, key.namelen,
-			valuelen);
+		trace_xrep_xattr_salvage_pptr(rx->sc->ip, flags, name,
+				key.namelen, value, valuelen);
+	} else {
+		while (i < namelen && name[i] != 0)
+			i++;
+		if (i == 0)
+			return 0;
+		key.namelen = i;
+
+		trace_xrep_xattr_salvage_rec(rx->sc->ip, flags, name,
+				key.namelen, valuelen);
+	}
 
 	error = xfblob_store(rx->xattr_blobs, &key.name_cookie, name,
 			key.namelen);
@@ -596,8 +607,13 @@ xrep_xattr_insert_rec(
 
 	ab->name[key->namelen] = 0;
 
-	trace_xrep_xattr_insert_rec(rx->sc->tempip, key->flags, ab->name,
-			key->namelen, key->valuelen);
+	if (key->flags & XFS_ATTR_PARENT)
+		trace_xrep_xattr_insert_pptr(rx->sc->tempip, key->flags,
+				ab->name, key->namelen, ab->value,
+				key->valuelen);
+	else
+		trace_xrep_xattr_insert_rec(rx->sc->tempip, key->flags,
+				ab->name, key->namelen, key->valuelen);
 
 	/*
 	 * xfs_attr_set creates and commits its own transaction.  If the attr
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 3e726610b9e32..4b968df3d840c 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -2540,6 +2540,44 @@ DEFINE_EVENT(xrep_xattr_salvage_class, name, \
 DEFINE_XREP_XATTR_SALVAGE_EVENT(xrep_xattr_salvage_rec);
 DEFINE_XREP_XATTR_SALVAGE_EVENT(xrep_xattr_insert_rec);
 
+DECLARE_EVENT_CLASS(xrep_pptr_salvage_class,
+	TP_PROTO(struct xfs_inode *ip, unsigned int flags, const void *name,
+		 unsigned int namelen, const void *value, unsigned int valuelen),
+	TP_ARGS(ip, flags, name, namelen, value, valuelen),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_ino_t, parent_ino)
+		__field(unsigned int, parent_gen)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, namelen)
+	),
+	TP_fast_assign(
+		const struct xfs_parent_rec	*rec = value;
+
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->parent_ino = be64_to_cpu(rec->p_ino);
+		__entry->parent_gen = be32_to_cpu(rec->p_gen);
+		__entry->namelen = namelen;
+		memcpy(__get_str(name), name, namelen);
+	),
+	TP_printk("dev %d:%d ino 0x%llx parent_ino 0x%llx parent_gen 0x%x name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->parent_ino,
+		  __entry->parent_gen,
+		  __entry->namelen,
+		  __get_str(name))
+)
+#define DEFINE_XREP_PPTR_SALVAGE_EVENT(name) \
+DEFINE_EVENT(xrep_pptr_salvage_class, name, \
+	TP_PROTO(struct xfs_inode *ip, unsigned int flags, const void *name, \
+		 unsigned int namelen, const void *value, unsigned int valuelen), \
+	TP_ARGS(ip, flags, name, namelen, value, valuelen))
+DEFINE_XREP_PPTR_SALVAGE_EVENT(xrep_xattr_salvage_pptr);
+DEFINE_XREP_PPTR_SALVAGE_EVENT(xrep_xattr_insert_pptr);
+
 TRACE_EVENT(xrep_xattr_class,
 	TP_PROTO(struct xfs_inode *ip, struct xfs_inode *arg_ip),
 	TP_ARGS(ip, arg_ip),



* [PATCH 01/14] xfs: add xattr setname and removename functions for internal users
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
@ 2024-04-10  1:03   ` Darrick J. Wong
  2024-04-10  6:18     ` Christoph Hellwig
  2024-04-10  1:04   ` [PATCH 02/14] xfs: add raw parent pointer apis to support repair Darrick J. Wong
                     ` (12 subsequent siblings)
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:03 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add a couple of internal xattr functions to set or remove attr names
from the xattr structures.  The upcoming parent pointer and fsverity
patchsets will want the ability to set and clear xattrs with a fully
initialized xfs_da_args structure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c   |  193 ++++++++++++++++++++++++++++++++++++++++----
 fs/xfs/libxfs/xfs_attr.h   |    3 +
 fs/xfs/scrub/attr_repair.c |   17 +++-
 3 files changed, 191 insertions(+), 22 deletions(-)
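
The lookup-then-defer decision in xfs_attr_setname and xfs_attr_removename can be summarized as a pure function of the lookup result and the caller's flags.  The sketch below is a user-space model with placeholder constants (DEMO_EEXIST, DEMO_ENOATTR, DEMO_XATTR_CREATE) and invented function names; it does not touch any xattr structures.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values; the kernel uses its own errno and flag definitions. */
#define DEMO_EEXIST		17
#define DEMO_ENOATTR		61
#define DEMO_XATTR_CREATE	0x1

enum demo_attr_op {
	DEMO_OP_NONE,		/* nothing queued */
	DEMO_OP_SET,		/* name absent: add it */
	DEMO_OP_REPLACE,	/* name present: replace the value */
};

/*
 * Model of the setname decision: given the result of the name lookup, pick
 * the deferred operation or return the error that cancels the transaction.
 */
static int
demo_setname_plan(int lookup_error, unsigned int xattr_flags,
		  enum demo_attr_op *op)
{
	assert(op != NULL);

	*op = DEMO_OP_NONE;
	switch (lookup_error) {
	case -DEMO_EEXIST:
		/* Pure create fails if the attr already exists. */
		if (xattr_flags & DEMO_XATTR_CREATE)
			return -DEMO_EEXIST;
		*op = DEMO_OP_REPLACE;
		return 0;
	case -DEMO_ENOATTR:
		*op = DEMO_OP_SET;
		return 0;
	default:
		return lookup_error;
	}
}

/* Model of the removename decision: only an existing name can be removed. */
static int
demo_removename_plan(int lookup_error)
{
	return lookup_error == -DEMO_EEXIST ? 0 : lookup_error;
}
```

In the patch, the same decision additionally picks between xfs_attr_defer_parent and xfs_attr_defer_add depending on whether XFS_ATTR_PARENT is set in the attr filter; the model leaves that out.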


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 99930472e59da..83f8cf551816a 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -950,6 +950,44 @@ xfs_attr_lookup(
 	return error;
 }
 
+/*
+ * Before updating xattrs, add an attribute fork if the inode doesn't have one.
+ * (inode must not be locked when we call this routine)
+ */
+static int
+xfs_attr_ensure_fork(
+	struct xfs_da_args	*args,
+	bool			rsvd)
+{
+	int			sf_size;
+
+	if (xfs_inode_has_attr_fork(args->dp))
+		return 0;
+
+	sf_size = sizeof(struct xfs_attr_sf_hdr) +
+			xfs_attr_sf_entsize_byname(args->namelen,
+						   args->valuelen);
+
+	return xfs_bmap_add_attrfork(args->dp, sf_size, rsvd);
+}
+
+/*
+ * Before updating xattrs, make sure we can handle adding to the extent count.
+ * There must be a transaction and the ILOCK must be held.
+ */
+static int
+xfs_attr_ensure_iext(
+	struct xfs_da_args	*args,
+	int			nr)
+{
+	int			error;
+
+	error = xfs_iext_count_may_overflow(args->dp, XFS_ATTR_FORK, nr);
+	if (error == -EFBIG)
+		return xfs_iext_count_upgrade(args->trans, args->dp, nr);
+	return error;
+}
+
 /*
  * Note: If args->value is NULL the attribute will be removed, just like the
  * Linux ->setattr API.
@@ -994,19 +1032,9 @@ xfs_attr_set(
 		XFS_STATS_INC(mp, xs_attr_set);
 		args->total = xfs_attr_calc_size(args, &local);
 
-		/*
-		 * If the inode doesn't have an attribute fork, add one.
-		 * (inode must not be locked when we call this routine)
-		 */
-		if (xfs_inode_has_attr_fork(dp) == 0) {
-			int sf_size = sizeof(struct xfs_attr_sf_hdr) +
-				xfs_attr_sf_entsize_byname(args->namelen,
-						args->valuelen);
-
-			error = xfs_bmap_add_attrfork(dp, sf_size, rsvd);
-			if (error)
-				return error;
-		}
+		error = xfs_attr_ensure_fork(args, rsvd);
+		if (error)
+			return error;
 
 		if (!local)
 			rmt_blks = xfs_attr3_rmt_blocks(mp, args->valuelen);
@@ -1025,11 +1053,8 @@ xfs_attr_set(
 		return error;
 
 	if (args->value || xfs_inode_hasattr(dp)) {
-		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
+		error = xfs_attr_ensure_iext(args,
 				XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
-		if (error == -EFBIG)
-			error = xfs_iext_count_upgrade(args->trans, dp,
-					XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
 		if (error)
 			goto out_trans_cancel;
 	}
@@ -1086,6 +1111,140 @@ xfs_attr_set(
 	goto out_unlock;
 }
 
+/*
+ * Ensure that the xattr structure maps @args->name to @args->value.
+ *
+ * The caller must have initialized @args, attached dquots, and must not hold
+ * any ILOCKs.  Only XATTR_CREATE may be specified in @args->xattr_flags.
+ * Reserved data blocks may be used if @rsvd is set.
+ *
+ * Returns -EEXIST if XATTR_CREATE was specified and the name already exists.
+ */
+int
+xfs_attr_setname(
+	struct xfs_da_args	*args,
+	bool			rsvd)
+{
+	struct xfs_inode	*dp = args->dp;
+	struct xfs_mount	*mp = dp->i_mount;
+	struct xfs_trans_res	tres;
+	unsigned int		total;
+	int			rmt_extents = 0;
+	int			error, local;
+
+	ASSERT(!(args->xattr_flags & XATTR_REPLACE));
+	ASSERT(!args->trans);
+
+	args->total = xfs_attr_calc_size(args, &local);
+
+	error = xfs_attr_ensure_fork(args, rsvd);
+	if (error)
+		return error;
+
+	if (!local)
+		rmt_extents = XFS_IEXT_ATTR_MANIP_CNT(
+				xfs_attr3_rmt_blocks(mp, args->valuelen));
+
+	xfs_init_attr_trans(args, &tres, &total);
+	error = xfs_trans_alloc_inode(dp, &tres, total, 0, rsvd, &args->trans);
+	if (error)
+		return error;
+
+	error = xfs_attr_ensure_iext(args, rmt_extents);
+	if (error)
+		goto out_trans_cancel;
+
+	error = xfs_attr_lookup(args);
+	switch (error) {
+	case -EEXIST:
+		/* Pure create fails if the attr already exists */
+		if (args->xattr_flags & XATTR_CREATE)
+			goto out_trans_cancel;
+		if (args->attr_filter & XFS_ATTR_PARENT)
+			xfs_attr_defer_parent(args, XFS_ATTR_DEFER_REPLACE);
+		else
+			xfs_attr_defer_add(args, XFS_ATTR_DEFER_REPLACE);
+		break;
+	case -ENOATTR:
+		if (args->attr_filter & XFS_ATTR_PARENT)
+			xfs_attr_defer_parent(args, XFS_ATTR_DEFER_SET);
+		else
+			xfs_attr_defer_add(args, XFS_ATTR_DEFER_SET);
+		break;
+	default:
+		goto out_trans_cancel;
+	}
+
+	xfs_trans_ichgtime(args->trans, dp, XFS_ICHGTIME_CHG);
+	xfs_trans_log_inode(args->trans, dp, XFS_ILOG_CORE);
+	error = xfs_trans_commit(args->trans);
+out_unlock:
+	args->trans = NULL;
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+	return error;
+
+out_trans_cancel:
+	xfs_trans_cancel(args->trans);
+	goto out_unlock;
+}
+
+/*
+ * Ensure that the xattr structure does not map @args->name to @args->value.
+ *
+ * The caller must have initialized @args, attached dquots, and must not hold
+ * any ILOCKs.  Reserved data blocks may be used if @rsvd is set.
+ *
+ * Returns -ENOATTR if the name did not already exist.
+ */
+int
+xfs_attr_removename(
+	struct xfs_da_args	*args,
+	bool			rsvd)
+{
+	struct xfs_inode	*dp = args->dp;
+	struct xfs_mount	*mp = dp->i_mount;
+	struct xfs_trans_res	tres;
+	unsigned int		total;
+	int			rmt_extents;
+	int			error;
+
+	ASSERT(!args->trans);
+
+	rmt_extents = XFS_IEXT_ATTR_MANIP_CNT(
+				xfs_attr3_rmt_blocks(mp, XFS_XATTR_SIZE_MAX));
+
+	xfs_init_attr_trans(args, &tres, &total);
+	error = xfs_trans_alloc_inode(dp, &tres, total, 0, rsvd, &args->trans);
+	if (error)
+		return error;
+
+	if (xfs_inode_hasattr(dp)) {
+		error = xfs_attr_ensure_iext(args, rmt_extents);
+		if (error)
+			goto out_trans_cancel;
+	}
+
+	error = xfs_attr_lookup(args);
+	if (error != -EEXIST)
+		goto out_trans_cancel;
+
+	if (args->attr_filter & XFS_ATTR_PARENT)
+		xfs_attr_defer_parent(args, XFS_ATTR_DEFER_REMOVE);
+	else
+		xfs_attr_defer_add(args, XFS_ATTR_DEFER_REMOVE);
+	xfs_trans_ichgtime(args->trans, dp, XFS_ICHGTIME_CHG);
+	xfs_trans_log_inode(args->trans, dp, XFS_ILOG_CORE);
+	error = xfs_trans_commit(args->trans);
+out_unlock:
+	args->trans = NULL;
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+	return error;
+
+out_trans_cancel:
+	xfs_trans_cancel(args->trans);
+	goto out_unlock;
+}
+
 /*========================================================================
  * External routines when attribute list is inside the inode
  *========================================================================*/
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index cb5ca37000848..d51001c5809fe 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -560,6 +560,9 @@ int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
 			 unsigned int *total);
 
+int xfs_attr_setname(struct xfs_da_args *args, bool rsvd);
+int xfs_attr_removename(struct xfs_da_args *args, bool rsvd);
+
 /*
  * Check to see if the attr should be upgraded from non-existent or shortform to
  * single-leaf-block attribute list.
diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
index 091cef077cdde..a3a98051df0fb 100644
--- a/fs/xfs/scrub/attr_repair.c
+++ b/fs/xfs/scrub/attr_repair.c
@@ -570,6 +570,9 @@ xrep_xattr_insert_rec(
 		.namelen		= key->namelen,
 		.valuelen		= key->valuelen,
 		.owner			= rx->sc->ip->i_ino,
+		.geo			= rx->sc->mp->m_attr_geo,
+		.whichfork		= XFS_ATTR_FORK,
+		.op_flags		= XFS_DA_OP_OKNOENT,
 	};
 	struct xchk_xattr_buf		*ab = rx->sc->buf;
 	int				error;
@@ -607,19 +610,23 @@ xrep_xattr_insert_rec(
 
 	ab->name[key->namelen] = 0;
 
-	if (key->flags & XFS_ATTR_PARENT)
+	if (key->flags & XFS_ATTR_PARENT) {
 		trace_xrep_xattr_insert_pptr(rx->sc->tempip, key->flags,
 				ab->name, key->namelen, ab->value,
 				key->valuelen);
-	else
+		args.op_flags |= XFS_DA_OP_LOGGED;
+	} else {
 		trace_xrep_xattr_insert_rec(rx->sc->tempip, key->flags,
 				ab->name, key->namelen, key->valuelen);
+	}
 
 	/*
-	 * xfs_attr_set creates and commits its own transaction.  If the attr
-	 * already exists, we'll just drop it during the rebuild.
+	 * xfs_attr_setname creates and commits its own transaction.  If the
+	 * attr already exists, we'll just drop it during the rebuild.  Don't
+	 * use reserved blocks because we can abort the repair with ENOSPC.
 	 */
-	error = xfs_attr_set(&args);
+	xfs_attr_sethash(&args);
+	error = xfs_attr_setname(&args, false);
 	if (error == -EEXIST)
 		error = 0;
 



* [PATCH 02/14] xfs: add raw parent pointer apis to support repair
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
  2024-04-10  1:03   ` [PATCH 01/14] xfs: add xattr setname and removename functions for internal users Darrick J. Wong
@ 2024-04-10  1:04   ` Darrick J. Wong
  2024-04-10  6:18     ` Christoph Hellwig
  2024-04-10  1:04   ` [PATCH 03/14] xfs: repair directories by scanning directory parent pointers Darrick J. Wong
                     ` (11 subsequent siblings)
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:04 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add a couple of utility functions to set or remove parent pointers from
a file.  These functions will be used by repair code, hence they skip
the xattr logging that regular parent pointer updates use.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_dir2.c   |    2 +
 fs/xfs/libxfs/xfs_dir2.h   |    2 +
 fs/xfs/libxfs/xfs_parent.c |   64 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_parent.h |    6 ++++
 4 files changed, 72 insertions(+), 2 deletions(-)
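
To illustrate the intended semantics of the set/unset pair, here is a toy in-memory model in which a parent pointer is identified by both its name and its parent handle, and both operations sanity-check their arguments first.  Everything here (struct demo_file, the error values, the name rules in demo_pptr_sane) is an assumption made for the sketch and not the libxfs implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAXNAMELEN	256	/* assumed name length limit */
#define DEMO_MAX_PPTRS	8	/* arbitrary capacity for the sketch */

/* Placeholder error codes; the kernel uses -EFSCORRUPTED and -ENOATTR. */
enum { DEMO_ECORRUPT = 1, DEMO_ENOATTR, DEMO_ENOSPC };

struct demo_pptr {
	char		name[DEMO_MAXNAMELEN];
	size_t		namelen;
	uint64_t	parent_ino;
	uint32_t	parent_gen;
};

struct demo_file {
	struct demo_pptr	pptrs[DEMO_MAX_PPTRS];
	size_t			nr;
};

/* Reject names and parent inode numbers that cannot be valid. */
static bool
demo_pptr_sane(const char *name, size_t namelen, uint64_t parent_ino)
{
	if (name == NULL || namelen == 0 || namelen >= DEMO_MAXNAMELEN)
		return false;
	if (memchr(name, '/', namelen) || memchr(name, '\0', namelen))
		return false;
	return parent_ino != 0;
}

/* Find the slot matching both the name and the parent handle, or -1. */
static long
demo_pptr_find(const struct demo_file *f, const char *name, size_t namelen,
	       uint64_t ino, uint32_t gen)
{
	for (size_t i = 0; i < f->nr; i++) {
		const struct demo_pptr	*p = &f->pptrs[i];

		if (p->namelen == namelen && p->parent_ino == ino &&
		    p->parent_gen == gen && !memcmp(p->name, name, namelen))
			return (long)i;
	}
	return -1;
}

/* Ensure that @f maps (@name -> @ino/@gen). */
static int
demo_pptr_set(struct demo_file *f, const char *name, size_t namelen,
	      uint64_t ino, uint32_t gen)
{
	struct demo_pptr	*p;

	assert(f != NULL);
	if (!demo_pptr_sane(name, namelen, ino))
		return -DEMO_ECORRUPT;
	if (demo_pptr_find(f, name, namelen, ino, gen) >= 0)
		return 0;	/* already mapped; nothing to change here */
	if (f->nr == DEMO_MAX_PPTRS)
		return -DEMO_ENOSPC;

	p = &f->pptrs[f->nr++];
	memcpy(p->name, name, namelen);
	p->namelen = namelen;
	p->parent_ino = ino;
	p->parent_gen = gen;
	return 0;
}

/* Ensure that @f does not map (@name -> @ino/@gen). */
static int
demo_pptr_unset(struct demo_file *f, const char *name, size_t namelen,
		uint64_t ino, uint32_t gen)
{
	long	idx;

	assert(f != NULL);
	if (!demo_pptr_sane(name, namelen, ino))
		return -DEMO_ECORRUPT;
	idx = demo_pptr_find(f, name, namelen, ino, gen);
	if (idx < 0)
		return -DEMO_ENOATTR;
	f->pptrs[idx] = f->pptrs[--f->nr];	/* fill the hole with the last slot */
	return 0;
}
```

Because the key is the (name, parent) pair, the same name may appear under two different parents, as happens with hard links that share a name in different directories.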


diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 9da99fa20c759..7634344dc5153 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -434,7 +434,7 @@ int
 xfs_dir_removename(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*dp,
-	struct xfs_name		*name,
+	const struct xfs_name	*name,
 	xfs_ino_t		ino,
 	xfs_extlen_t		total)		/* bmap's total block count */
 {
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index eb3a5c35025b5..b580a78bcf4fc 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -58,7 +58,7 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t *inum,
 				struct xfs_name *ci_name);
 extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
-				struct xfs_name *name, xfs_ino_t ino,
+				const struct xfs_name *name, xfs_ino_t ino,
 				xfs_extlen_t tot);
 extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t inum,
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 5898dc1ebff02..2b6ed8c1ee152 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -316,3 +316,67 @@ xfs_parent_lookup(
 	xfs_parent_da_args_init(scratch, tp, pptr, ip, ip->i_ino, parent_name);
 	return xfs_attr_get_ilocked(scratch);
 }
+
+/* Sanity-check a parent pointer before we try to perform repairs. */
+static inline bool
+xfs_parent_sanity_check(
+	struct xfs_mount		*mp,
+	const struct xfs_name		*parent_name,
+	const struct xfs_parent_rec	*pptr)
+{
+	if (!xfs_parent_namecheck(XFS_ATTR_PARENT, parent_name->name,
+				parent_name->len))
+		return false;
+
+	if (!xfs_parent_valuecheck(mp, pptr, sizeof(*pptr)))
+		return false;
+
+	return true;
+}
+
+
+/*
+ * Attach the parent pointer (@parent_name -> @pptr) to @ip immediately.
+ * Caller must not have a transaction or hold the ILOCK.  This is for
+ * specialized repair functions only.  The scratchpad need not be initialized.
+ */
+int
+xfs_parent_set(
+	struct xfs_inode	*ip,
+	xfs_ino_t		owner,
+	const struct xfs_name	*parent_name,
+	struct xfs_parent_rec	*pptr,
+	struct xfs_da_args	*scratch)
+{
+	if (!xfs_parent_sanity_check(ip->i_mount, parent_name, pptr)) {
+		ASSERT(0);
+		return -EFSCORRUPTED;
+	}
+
+	memset(scratch, 0, sizeof(struct xfs_da_args));
+	xfs_parent_da_args_init(scratch, NULL, pptr, ip, owner, parent_name);
+	return xfs_attr_setname(scratch, true);
+}
+
+/*
+ * Remove the parent pointer (@parent_name -> @pptr) from @ip immediately.
+ * Caller must not have a transaction or hold the ILOCK.  This is for
+ * specialized repair functions only.  The scratchpad need not be initialized.
+ */
+int
+xfs_parent_unset(
+	struct xfs_inode		*ip,
+	xfs_ino_t			owner,
+	const struct xfs_name		*parent_name,
+	struct xfs_parent_rec		*pptr,
+	struct xfs_da_args		*scratch)
+{
+	if (!xfs_parent_sanity_check(ip->i_mount, parent_name, pptr)) {
+		ASSERT(0);
+		return -EFSCORRUPTED;
+	}
+
+	memset(scratch, 0, sizeof(struct xfs_da_args));
+	xfs_parent_da_args_init(scratch, NULL, pptr, ip, owner, parent_name);
+	return xfs_attr_removename(scratch, true);
+}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index b063312a61acb..0312f70217fb5 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -100,5 +100,11 @@ int xfs_parent_from_xattr(struct xfs_mount *mp, unsigned int attr_flags,
 int xfs_parent_lookup(struct xfs_trans *tp, struct xfs_inode *ip,
 		const struct xfs_name *name, struct xfs_parent_rec *pptr,
 		struct xfs_da_args *scratch);
+int xfs_parent_set(struct xfs_inode *ip, xfs_ino_t owner,
+		const struct xfs_name *name, struct xfs_parent_rec *pptr,
+		struct xfs_da_args *scratch);
+int xfs_parent_unset(struct xfs_inode *ip, xfs_ino_t owner,
+		const struct xfs_name *name, struct xfs_parent_rec *pptr,
+		struct xfs_da_args *scratch);
 
 #endif /* __XFS_PARENT_H__ */



* [PATCH 03/14] xfs: repair directories by scanning directory parent pointers
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
  2024-04-10  1:03   ` [PATCH 01/14] xfs: add xattr setname and removename functions for internal users Darrick J. Wong
  2024-04-10  1:04   ` [PATCH 02/14] xfs: add raw parent pointer apis to support repair Darrick J. Wong
@ 2024-04-10  1:04   ` Darrick J. Wong
  2024-04-10  6:19     ` Christoph Hellwig
  2024-04-10  1:04   ` [PATCH 04/14] xfs: implement live updates for directory repairs Darrick J. Wong
                     ` (10 subsequent siblings)
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:04 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

For filesystems with parent pointers, scan the entire filesystem looking
for parent pointers that target the directory we're rebuilding instead
of trying to salvage whatever we can from the directory data blocks.
This will be more robust than salvaging, but there's more code to come.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/dir_repair.c |  344 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 338 insertions(+), 6 deletions(-)
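
The core of the scan is a filter: a parent pointer found on some child file becomes a replacement dirent only if it names the directory being rebuilt, with a matching generation.  A user-space sketch of that filter over a flat array follows; struct demo_pptr, struct demo_dirent, and demo_collect_dirents are hypothetical names, and the real code walks xattrs under an inode scan rather than an array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One parent pointer found on some child file (hypothetical layout). */
struct demo_pptr {
	uint64_t	child_ino;	/* file that owns the parent pointer */
	uint64_t	parent_ino;	/* directory it claims as parent */
	uint32_t	parent_gen;	/* generation of that directory */
	const char	*name;		/* dirent name in the parent */
};

/* A replacement directory entry to stash for later replay. */
struct demo_dirent {
	const char	*name;
	uint64_t	ino;
};

/*
 * Turn every parent pointer that targets (@dir_ino, @dir_gen) into a dirent.
 * Returns the number of dirents written to @out, at most @out_max.
 */
static size_t
demo_collect_dirents(const struct demo_pptr *pptrs, size_t nr,
		     uint64_t dir_ino, uint32_t dir_gen,
		     struct demo_dirent *out, size_t out_max)
{
	size_t	found = 0;

	assert(pptrs != NULL && out != NULL);

	for (size_t i = 0; i < nr && found < out_max; i++) {
		/* Skip the directory under repair itself. */
		if (pptrs[i].child_ino == dir_ino)
			continue;

		/* Ignore pointers to another dir or a stale generation. */
		if (pptrs[i].parent_ino != dir_ino ||
		    pptrs[i].parent_gen != dir_gen)
			continue;

		out[found].name = pptrs[i].name;
		out[found].ino = pptrs[i].child_ino;
		found++;
	}
	return found;
}
```

Checking the generation as well as the inode number keeps a pointer to a previous incarnation of the same inode number from being resurrected as a dirent.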


diff --git a/fs/xfs/scrub/dir_repair.c b/fs/xfs/scrub/dir_repair.c
index 575397aef1f7a..d7b84d69510a4 100644
--- a/fs/xfs/scrub/dir_repair.c
+++ b/fs/xfs/scrub/dir_repair.c
@@ -28,6 +28,7 @@
 #include "xfs_exchmaps.h"
 #include "xfs_exchrange.h"
 #include "xfs_ag.h"
+#include "xfs_parent.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -43,6 +44,7 @@
 #include "scrub/reap.h"
 #include "scrub/findparent.h"
 #include "scrub/orphanage.h"
+#include "scrub/listxattr.h"
 
 /*
  * Directory Repair
@@ -57,6 +59,15 @@
  * being repaired and the temporary directory, and will later become important
  * for parent pointer scanning.
  *
+ * If parent pointers are enabled on this filesystem, we instead reconstruct
+ * the directory by visiting each parent pointer of each file in the filesystem
+ * and translating the relevant parent pointer records into dirents.  In this
+ * case, it is advantageous to stash all directory entries created from parent
+ * pointers for a single child file before replaying them into the temporary
+ * directory.  To save memory, the live filesystem scan reuses the findparent
+ * fields.  Directory repair chooses either parent pointer scanning or
+ * directory entry salvaging, but not both.
+ *
  * Directory entries added to the temporary directory do not elevate the link
  * counts of the inodes found.  When salvaging completes, the remaining stashed
  * entries are replayed to the temporary directory.  An atomic mapping exchange
@@ -112,7 +123,15 @@ struct xrep_dir {
 
 	/*
 	 * Information used to scan the filesystem to find the inumber of the
-	 * dotdot entry for this directory.
+	 * dotdot entry for this directory.  For directory salvaging when
+	 * parent pointers are not enabled, we use the findparent_* functions
+	 * on this object and access only the parent_ino field directly.
+	 *
+	 * When parent pointers are enabled, however, the pptr scanner uses the
+	 * iscan, hooks, lock, and parent_ino fields of this object directly.
+	 * @pscan.lock coordinates access to dir_entries, dir_names,
+	 * parent_ino, subdirs, dirents, and args.  This reduces the memory
+	 * requirements of this structure.
 	 */
 	struct xrep_parent_scan_info pscan;
 
@@ -763,28 +782,35 @@ xrep_dir_replay_updates(
 	int			error;
 
 	/* Add all the salvaged dirents to the temporary directory. */
+	mutex_lock(&rd->pscan.lock);
 	foreach_xfarray_idx(rd->dir_entries, array_cur) {
 		struct xrep_dirent	dirent;
 
 		error = xfarray_load(rd->dir_entries, array_cur, &dirent);
 		if (error)
-			return error;
+			goto out_unlock;
 
 		error = xfblob_loadname(rd->dir_names, dirent.name_cookie,
 				&rd->xname, dirent.namelen);
 		if (error)
-			return error;
+			goto out_unlock;
 		rd->xname.type = dirent.ftype;
+		mutex_unlock(&rd->pscan.lock);
 
 		error = xrep_dir_replay_update(rd, &rd->xname, &dirent);
 		if (error)
 			return error;
+		mutex_lock(&rd->pscan.lock);
 	}
 
 	/* Empty out both arrays now that we've added the entries. */
 	xfarray_truncate(rd->dir_entries);
 	xfblob_truncate(rd->dir_names);
+	mutex_unlock(&rd->pscan.lock);
 	return 0;
+out_unlock:
+	mutex_unlock(&rd->pscan.lock);
+	return error;
 }
 
 /*
@@ -995,6 +1021,266 @@ xrep_dir_salvage_entries(
 }
 
 
+/*
+ * Examine a parent pointer of a file.  If it leads us back to the directory
+ * that we're rebuilding, create an incore dirent from the parent pointer and
+ * stash it.
+ */
+STATIC int
+xrep_dir_scan_pptr(
+	struct xfs_scrub		*sc,
+	struct xfs_inode		*ip,
+	unsigned int			attr_flags,
+	const unsigned char		*name,
+	unsigned int			namelen,
+	const void			*value,
+	unsigned int			valuelen,
+	void				*priv)
+{
+	struct xfs_name			xname = {
+		.name			= name,
+		.len			= namelen,
+		.type			= xfs_mode_to_ftype(VFS_I(ip)->i_mode),
+	};
+	xfs_ino_t			parent_ino;
+	uint32_t			parent_gen;
+	struct xrep_dir			*rd = priv;
+	int				ret;
+
+	/*
+	 * Ignore parent pointers that point back to a different dir, list the
+	 * wrong generation number, or are invalid.
+	 */
+	ret = xfs_parent_from_xattr(sc->mp, attr_flags, name, namelen,
+			value, valuelen, &parent_ino, &parent_gen);
+	if (ret != 1)
+		return ret;
+
+	if (parent_ino != sc->ip->i_ino ||
+	    parent_gen != VFS_I(sc->ip)->i_generation)
+		return 0;
+
+	mutex_lock(&rd->pscan.lock);
+	ret = xrep_dir_stash_createname(rd, &xname, ip->i_ino);
+	mutex_unlock(&rd->pscan.lock);
+	return ret;
+}
+
+/*
+ * If this child dirent points to the directory being repaired, remember that
+ * fact so that we can reset the dotdot entry if necessary.
+ */
+STATIC int
+xrep_dir_scan_dirent(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp,
+	xfs_dir2_dataptr_t	dapos,
+	const struct xfs_name	*name,
+	xfs_ino_t		ino,
+	void			*priv)
+{
+	struct xrep_dir		*rd = priv;
+
+	/* Dirent doesn't point to this directory. */
+	if (ino != rd->sc->ip->i_ino)
+		return 0;
+
+	/* Ignore garbage inum. */
+	if (!xfs_verify_dir_ino(rd->sc->mp, ino))
+		return 0;
+
+	/* No weird looking names. */
+	if (name->len >= MAXNAMELEN || name->len <= 0)
+		return 0;
+
+	/* Don't pick up dot or dotdot entries; we only want child dirents. */
+	if (xfs_dir2_samename(name, &xfs_name_dotdot) ||
+	    xfs_dir2_samename(name, &xfs_name_dot))
+		return 0;
+
+	trace_xrep_dir_stash_createname(sc->tempip, &xfs_name_dotdot,
+			dp->i_ino);
+
+	xrep_findparent_scan_found(&rd->pscan, dp->i_ino);
+	return 0;
+}
+
+/*
+ * Decide if we want to look for child dirents or parent pointers in this file.
+ * Skip the dir being repaired and any files being used to stage repairs.
+ */
+static inline bool
+xrep_dir_want_scan(
+	struct xrep_dir		*rd,
+	const struct xfs_inode	*ip)
+{
+	return ip != rd->sc->ip && !xrep_is_tempfile(ip);
+}
+
+/*
+ * Take ILOCK on a file that we want to scan.
+ *
+ * Select ILOCK_EXCL if the file is a directory with an unloaded data bmbt or
+ * has an unloaded attr bmbt.  Otherwise, take ILOCK_SHARED.
+ */
+static inline unsigned int
+xrep_dir_scan_ilock(
+	struct xrep_dir		*rd,
+	struct xfs_inode	*ip)
+{
+	uint			lock_mode = XFS_ILOCK_SHARED;
+
+	/* Need to take the shared ILOCK to advance the iscan cursor. */
+	if (!xrep_dir_want_scan(rd, ip))
+		goto lock;
+
+	if (S_ISDIR(VFS_I(ip)->i_mode) && xfs_need_iread_extents(&ip->i_df)) {
+		lock_mode = XFS_ILOCK_EXCL;
+		goto lock;
+	}
+
+	if (xfs_inode_has_attr_fork(ip) && xfs_need_iread_extents(&ip->i_af))
+		lock_mode = XFS_ILOCK_EXCL;
+
+lock:
+	xfs_ilock(ip, lock_mode);
+	return lock_mode;
+}
+
+/*
+ * Scan this file for relevant child dirents or parent pointers that point to
+ * the directory we're rebuilding.
+ */
+STATIC int
+xrep_dir_scan_file(
+	struct xrep_dir		*rd,
+	struct xfs_inode	*ip)
+{
+	unsigned int		lock_mode;
+	int			error = 0;
+
+	lock_mode = xrep_dir_scan_ilock(rd, ip);
+
+	if (!xrep_dir_want_scan(rd, ip))
+		goto scan_done;
+
+	/*
+	 * If the extended attributes look as though they have been zapped by
+	 * the inode record repair code, we cannot scan for parent pointers.
+	 */
+	if (xchk_pptr_looks_zapped(ip)) {
+		error = -EBUSY;
+		goto scan_done;
+	}
+
+	error = xchk_xattr_walk(rd->sc, ip, xrep_dir_scan_pptr, rd);
+	if (error)
+		goto scan_done;
+
+	if (S_ISDIR(VFS_I(ip)->i_mode)) {
+		/*
+		 * If the directory looks as though it has been zapped by the
+		 * inode record repair code, we cannot scan for child dirents.
+		 */
+		if (xchk_dir_looks_zapped(ip)) {
+			error = -EBUSY;
+			goto scan_done;
+		}
+
+		error = xchk_dir_walk(rd->sc, ip, xrep_dir_scan_dirent, rd);
+		if (error)
+			goto scan_done;
+	}
+
+scan_done:
+	xchk_iscan_mark_visited(&rd->pscan.iscan, ip);
+	xfs_iunlock(ip, lock_mode);
+	return error;
+}
+
+/*
+ * Scan all files in the filesystem for parent pointers that we can turn into
+ * replacement dirents, and a dirent that we can use to set the dotdot pointer.
+ */
+STATIC int
+xrep_dir_scan_dirtree(
+	struct xrep_dir		*rd)
+{
+	struct xfs_scrub	*sc = rd->sc;
+	struct xfs_inode	*ip;
+	int			error;
+
+	/* Roots of directory trees are their own parents. */
+	if (sc->ip == sc->mp->m_rootip)
+		xrep_findparent_scan_found(&rd->pscan, sc->ip->i_ino);
+
+	/*
+	 * Filesystem scans are time consuming.  Drop the directory ILOCK and
+	 * all other resources for the duration of the scan and hope for the
+	 * best.  The live update hooks will keep our scan information up to
+	 * date even though we've dropped the locks.
+	 */
+	xchk_trans_cancel(sc);
+	if (sc->ilock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL))
+		xchk_iunlock(sc, sc->ilock_flags & (XFS_ILOCK_SHARED |
+						    XFS_ILOCK_EXCL));
+	error = xchk_trans_alloc_empty(sc);
+	if (error)
+		return error;
+
+	while ((error = xchk_iscan_iter(&rd->pscan.iscan, &ip)) == 1) {
+		bool		flush;
+
+		error = xrep_dir_scan_file(rd, ip);
+		xchk_irele(sc, ip);
+		if (error)
+			break;
+
+		/* Flush stashed dirent updates to constrain memory usage. */
+		mutex_lock(&rd->pscan.lock);
+		flush = xrep_dir_want_flush_stashed(rd);
+		mutex_unlock(&rd->pscan.lock);
+		if (flush) {
+			xchk_trans_cancel(sc);
+
+			error = xrep_tempfile_iolock_polled(sc);
+			if (error)
+				break;
+
+			error = xrep_dir_replay_updates(rd);
+			xrep_tempfile_iounlock(sc);
+			if (error)
+				break;
+
+			error = xchk_trans_alloc_empty(sc);
+			if (error)
+				break;
+		}
+
+		if (xchk_should_terminate(sc, &error))
+			break;
+	}
+	xchk_iscan_iter_finish(&rd->pscan.iscan);
+	if (error) {
+		/*
+		 * If we couldn't grab an inode that was busy with a state
+		 * change, change the error code so that we exit to userspace
+		 * as quickly as possible.
+		 */
+		if (error == -EBUSY)
+			return -ECANCELED;
+		return error;
+	}
+
+	/*
+	 * Cancel the empty transaction so that we can (later) use the atomic
+	 * file mapping exchange functions to lock files and commit the new
+	 * directory.
+	 */
+	xchk_trans_cancel(rd->sc);
+	return 0;
+}
+
 /*
  * Free all the directory blocks and reset the data fork.  The caller must
  * join the inode to the transaction.  This function returns with the inode
@@ -1194,6 +1480,45 @@ xrep_dir_set_nlink(
 	return 0;
 }
 
+/*
+ * Finish replaying stashed dirent updates, allocate a transaction for
+ * exchanging data fork mappings, and take the ILOCKs of both directories
+ * before we commit the new directory structure.
+ */
+STATIC int
+xrep_dir_finalize_tempdir(
+	struct xrep_dir		*rd)
+{
+	struct xfs_scrub	*sc = rd->sc;
+	int			error;
+
+	if (!xfs_has_parent(sc->mp))
+		return xrep_tempexch_trans_alloc(sc, XFS_DATA_FORK, &rd->tx);
+
+	/*
+	 * Repair relies on the ILOCK to quiesce all possible dirent updates.
+	 * Replay all queued dirent updates into the tempdir before exchanging
+	 * the contents, even if that means dropping the ILOCKs and the
+	 * transaction.
+	 */
+	do {
+		error = xrep_dir_replay_updates(rd);
+		if (error)
+			return error;
+
+		error = xrep_tempexch_trans_alloc(sc, XFS_DATA_FORK, &rd->tx);
+		if (error)
+			return error;
+
+		if (xfarray_length(rd->dir_entries) == 0)
+			break;
+
+		xchk_trans_cancel(sc);
+		xrep_tempfile_iunlock_both(sc);
+	} while (!xchk_should_terminate(sc, &error));
+	return error;
+}
+
 /* Exchange the temporary directory's data fork with the one being repaired. */
 STATIC int
 xrep_dir_swap(
@@ -1296,8 +1621,12 @@ xrep_dir_rebuild_tree(
 	if (error)
 		return error;
 
-	/* Allocate transaction and ILOCK the scrub file and the temp file. */
-	error = xrep_tempexch_trans_alloc(sc, XFS_DATA_FORK, &rd->tx);
+	/*
+	 * Allocate transaction, lock inodes, and make sure that we've replayed
+	 * all the stashed dirent updates to the tempdir.  After this point,
+	 * we're ready to exchange data fork mappings.
+	 */
+	error = xrep_dir_finalize_tempdir(rd);
 	if (error)
 		return error;
 
@@ -1482,7 +1811,10 @@ xrep_directory(
 	if (error)
 		return error;
 
-	error = xrep_dir_salvage_entries(rd);
+	if (xfs_has_parent(sc->mp))
+		error = xrep_dir_scan_dirtree(rd);
+	else
+		error = xrep_dir_salvage_entries(rd);
 	if (error)
 		goto out_teardown;
 


^ permalink raw reply related	[flat|nested] 234+ messages in thread

* [PATCH 04/14] xfs: implement live updates for directory repairs
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-04-10  1:04   ` [PATCH 03/14] xfs: repair directories by scanning directory parent pointers Darrick J. Wong
@ 2024-04-10  1:04   ` Darrick J. Wong
  2024-04-10  6:19     ` Christoph Hellwig
  2024-04-10  1:04   ` [PATCH 05/14] xfs: replay unlocked parent pointer updates that accrue during xattr repair Darrick J. Wong
                     ` (9 subsequent siblings)
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:04 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

While we're scanning the filesystem for parent pointers that we can turn
into dirents, we cannot hold the IOLOCK or ILOCK of the directory being
repaired.  Therefore, we need to set up a dirent hook so that we can
keep the temporary directory up to date with the rest of the filesystem.
Hence we add the ability to *remove* entries from the temporary dir.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/dir_repair.c |  220 +++++++++++++++++++++++++++++++++++++++++----
 fs/xfs/scrub/findparent.c |   10 +-
 fs/xfs/scrub/findparent.h |   10 ++
 fs/xfs/scrub/trace.h      |    2 
 4 files changed, 219 insertions(+), 23 deletions(-)
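
The stash-and-replay idea described in the commit message can be sketched
in user space as follows.  This is only a toy model under stated
assumptions: every toy_* name is hypothetical, fixed-size arrays stand in
for the xfarray/xfblob stash, and nothing here is kernel API.  It shows
why an action field is needed once entries can be removed as well as
added: the replay step applies queued updates in arrival order and
insists that a removal matches both the name and the inode number.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for XREP_DIRENT_ADD and XREP_DIRENT_REMOVE. */
enum toy_action { TOY_DIRENT_ADD = 1, TOY_DIRENT_REMOVE = 2 };

#define TOY_MAX		16
#define TOY_NAMELEN	32

struct toy_dirent {
	char		name[TOY_NAMELEN];
	uint64_t	ino;
	uint8_t		action;		/* only meaningful in the stash */
};

struct toy_repair {
	struct toy_dirent	stash[TOY_MAX];	/* queued updates, in order */
	int			nr_stashed;
	struct toy_dirent	tempdir[TOY_MAX]; /* directory being rebuilt */
	int			nr_dirents;
};

/* Queue an update; returns 0, or -1 if full or the name is too long. */
static int toy_stash(struct toy_repair *rd, uint8_t action, const char *name,
		     uint64_t ino)
{
	struct toy_dirent	*d;

	if (rd->nr_stashed == TOY_MAX || strlen(name) >= TOY_NAMELEN)
		return -1;
	d = &rd->stash[rd->nr_stashed++];
	strcpy(d->name, name);
	d->ino = ino;
	d->action = action;
	return 0;
}

/* Find a name in the tempdir; returns its index or -1. */
static int toy_lookup(const struct toy_repair *rd, const char *name)
{
	for (int i = 0; i < rd->nr_dirents; i++)
		if (!strcmp(rd->tempdir[i].name, name))
			return i;
	return -1;
}

/* Apply every queued update in arrival order, then empty the stash. */
static int toy_replay(struct toy_repair *rd)
{
	for (int i = 0; i < rd->nr_stashed; i++) {
		const struct toy_dirent	*d = &rd->stash[i];
		int			idx = toy_lookup(rd, d->name);

		switch (d->action) {
		case TOY_DIRENT_ADD:
			if (idx >= 0 || rd->nr_dirents == TOY_MAX)
				return -1;	/* duplicate or no room */
			rd->tempdir[rd->nr_dirents++] = *d;
			break;
		case TOY_DIRENT_REMOVE:
			if (idx < 0 || rd->tempdir[idx].ino != d->ino)
				return -1;	/* must match exactly */
			rd->tempdir[idx] = rd->tempdir[--rd->nr_dirents];
			break;
		default:
			return -1;
		}
	}
	rd->nr_stashed = 0;
	return 0;
}
```

A caller would queue updates with toy_stash() as they are observed and
call toy_replay() whenever it is safe to modify the temporary directory.
Unlike the patch, which only cross-checks the tempdir under DEBUG, the
toy always validates each update before applying it.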


diff --git a/fs/xfs/scrub/dir_repair.c b/fs/xfs/scrub/dir_repair.c
index d7b84d69510a4..24c46211d9243 100644
--- a/fs/xfs/scrub/dir_repair.c
+++ b/fs/xfs/scrub/dir_repair.c
@@ -85,6 +85,12 @@
  * other threads.
  */
 
+/* Create a dirent in the tempdir. */
+#define XREP_DIRENT_ADD		(1)
+
+/* Remove a dirent from the tempdir. */
+#define XREP_DIRENT_REMOVE	(2)
+
 /* Directory entry to be restored in the new directory. */
 struct xrep_dirent {
 	/* Cookie for retrieval of the dirent name. */
@@ -98,6 +104,9 @@ struct xrep_dirent {
 
 	/* File type of the dirent. */
 	uint8_t			ftype;
+
+	/* XREP_DIRENT_{ADD,REMOVE} */
+	uint8_t			action;
 };
 
 /*
@@ -339,6 +348,7 @@ xrep_dir_stash_createname(
 	xfs_ino_t		ino)
 {
 	struct xrep_dirent	dirent = {
+		.action		= XREP_DIRENT_ADD,
 		.ino		= ino,
 		.namelen	= name->len,
 		.ftype		= name->type,
@@ -354,6 +364,33 @@ xrep_dir_stash_createname(
 	return xfarray_append(rd->dir_entries, &dirent);
 }
 
+/*
+ * Remember that we want to remove a dirent from the tempdir.  These stashed
+ * actions will be replayed later.
+ */
+STATIC int
+xrep_dir_stash_removename(
+	struct xrep_dir		*rd,
+	const struct xfs_name	*name,
+	xfs_ino_t		ino)
+{
+	struct xrep_dirent	dirent = {
+		.action		= XREP_DIRENT_REMOVE,
+		.ino		= ino,
+		.namelen	= name->len,
+		.ftype		= name->type,
+	};
+	int			error;
+
+	trace_xrep_dir_stash_removename(rd->sc->tempip, name, ino);
+
+	error = xfblob_storename(rd->dir_names, &dirent.name_cookie, name);
+	if (error)
+		return error;
+
+	return xfarray_append(rd->dir_entries, &dirent);
+}
+
 /* Allocate an in-core record to hold entries while we rebuild the dir data. */
 STATIC int
 xrep_dir_salvage_entry(
@@ -705,6 +742,43 @@ xrep_dir_replay_createname(
 	return xfs_dir2_node_addname(&rd->args);
 }
 
+/* Replay a stashed removename onto the temporary directory. */
+STATIC int
+xrep_dir_replay_removename(
+	struct xrep_dir		*rd,
+	const struct xfs_name	*name,
+	xfs_extlen_t		total)
+{
+	struct xfs_inode	*dp = rd->args.dp;
+	bool			is_block, is_leaf;
+	int			error;
+
+	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
+
+	xrep_dir_init_args(rd, dp, name);
+	rd->args.op_flags = 0;
+	rd->args.total = total;
+
+	trace_xrep_dir_replay_removename(dp, name, 0);
+
+	if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL)
+		return xfs_dir2_sf_removename(&rd->args);
+
+	error = xfs_dir2_isblock(&rd->args, &is_block);
+	if (error)
+		return error;
+	if (is_block)
+		return xfs_dir2_block_removename(&rd->args);
+
+	error = xfs_dir2_isleaf(&rd->args, &is_leaf);
+	if (error)
+		return error;
+	if (is_leaf)
+		return xfs_dir2_leaf_removename(&rd->args);
+
+	return xfs_dir2_node_removename(&rd->args);
+}
+
 /*
  * Add this stashed incore directory entry to the temporary directory.
  * The caller must hold the tempdir's IOLOCK, must not hold any ILOCKs, and
@@ -732,26 +806,64 @@ xrep_dir_replay_update(
 	xrep_tempfile_ilock(rd->sc);
 	xfs_trans_ijoin(rd->sc->tp, rd->sc->tempip, 0);
 
-	/*
-	 * Create a replacement dirent in the temporary directory.  Note that
-	 * _createname doesn't check for existing entries.  There shouldn't be
-	 * any in the temporary dir, but we'll verify this in debug mode.
-	 */
+	switch (dirent->action) {
+	case XREP_DIRENT_ADD:
+		/*
+		 * Create a replacement dirent in the temporary directory.
+		 * Note that _createname doesn't check for existing entries.
+		 * There shouldn't be any in the temporary dir, but we'll
+		 * verify this in debug mode.
+		 */
 #ifdef DEBUG
-	error = xchk_dir_lookup(rd->sc, rd->sc->tempip, xname, &ino);
-	if (error != -ENOENT) {
-		ASSERT(error != -ENOENT);
+		error = xchk_dir_lookup(rd->sc, rd->sc->tempip, xname, &ino);
+		if (error != -ENOENT) {
+			ASSERT(error != -ENOENT);
+			goto out_cancel;
+		}
+#endif
+
+		error = xrep_dir_replay_createname(rd, xname, dirent->ino,
+				resblks);
+		if (error)
+			goto out_cancel;
+
+		if (xname->type == XFS_DIR3_FT_DIR)
+			rd->subdirs++;
+		rd->dirents++;
+		break;
+	case XREP_DIRENT_REMOVE:
+		/*
+		 * Remove a dirent from the temporary directory.  Note that
+		 * _removename doesn't check the inode target of the existing
+		 * entry.  There should be a perfect match in the temporary
+		 * dir, but we'll verify this in debug mode.
+		 */
+#ifdef DEBUG
+		error = xchk_dir_lookup(rd->sc, rd->sc->tempip, xname, &ino);
+		if (error) {
+			ASSERT(error != 0);
+			goto out_cancel;
+		}
+		if (ino != dirent->ino) {
+			ASSERT(ino == dirent->ino);
+			error = -EIO;
+			goto out_cancel;
+		}
+#endif
+
+		error = xrep_dir_replay_removename(rd, xname, resblks);
+		if (error)
+			goto out_cancel;
+
+		if (xname->type == XFS_DIR3_FT_DIR)
+			rd->subdirs--;
+		rd->dirents--;
+		break;
+	default:
+		ASSERT(0);
+		error = -EIO;
 		goto out_cancel;
 	}
-#endif
-
-	error = xrep_dir_replay_createname(rd, xname, dirent->ino, resblks);
-	if (error)
-		goto out_cancel;
-
-	if (xname->type == XFS_DIR3_FT_DIR)
-		rd->subdirs++;
-	rd->dirents++;
 
 	/* Commit and unlock. */
 	error = xrep_trans_commit(rd->sc);
@@ -1281,6 +1393,71 @@ xrep_dir_scan_dirtree(
 	return 0;
 }
 
+/*
+ * Capture dirent updates being made by other threads which are relevant to the
+ * directory being repaired.
+ */
+STATIC int
+xrep_dir_live_update(
+	struct notifier_block		*nb,
+	unsigned long			action,
+	void				*data)
+{
+	struct xfs_dir_update_params	*p = data;
+	struct xrep_dir			*rd;
+	struct xfs_scrub		*sc;
+	int				error = 0;
+
+	rd = container_of(nb, struct xrep_dir, pscan.dhook.dirent_hook.nb);
+	sc = rd->sc;
+
+	/*
+	 * This thread updated a child dirent in the directory that we're
+	 * rebuilding.  Stash the update for replay against the temporary
+	 * directory.
+	 */
+	if (p->dp->i_ino == sc->ip->i_ino &&
+	    xchk_iscan_want_live_update(&rd->pscan.iscan, p->ip->i_ino)) {
+		mutex_lock(&rd->pscan.lock);
+		if (p->delta > 0)
+			error = xrep_dir_stash_createname(rd, p->name,
+					p->ip->i_ino);
+		else
+			error = xrep_dir_stash_removename(rd, p->name,
+					p->ip->i_ino);
+		mutex_unlock(&rd->pscan.lock);
+		if (error)
+			goto out_abort;
+	}
+
+	/*
+	 * This thread updated another directory's child dirent that points to
+	 * the directory that we're rebuilding, so remember the new dotdot
+	 * target.
+	 */
+	if (p->ip->i_ino == sc->ip->i_ino &&
+	    xchk_iscan_want_live_update(&rd->pscan.iscan, p->dp->i_ino)) {
+		if (p->delta > 0) {
+			trace_xrep_dir_stash_createname(sc->tempip,
+					&xfs_name_dotdot,
+					p->dp->i_ino);
+
+			xrep_findparent_scan_found(&rd->pscan, p->dp->i_ino);
+		} else {
+			trace_xrep_dir_stash_removename(sc->tempip,
+					&xfs_name_dotdot,
+					rd->pscan.parent_ino);
+
+			xrep_findparent_scan_found(&rd->pscan, NULLFSINO);
+		}
+	}
+
+	return NOTIFY_DONE;
+out_abort:
+	xchk_iscan_abort(&rd->pscan.iscan);
+	return NOTIFY_DONE;
+}
+
 /*
  * Free all the directory blocks and reset the data fork.  The caller must
  * join the inode to the transaction.  This function returns with the inode
@@ -1630,6 +1807,9 @@ xrep_dir_rebuild_tree(
 	if (error)
 		return error;
 
+	if (xchk_iscan_aborted(&rd->pscan.iscan))
+		return -ECANCELED;
+
 	/*
 	 * Exchange the tempdir's data fork with the file being repaired.  This
 	 * recreates the transaction and re-takes the ILOCK in the scrub
@@ -1685,7 +1865,11 @@ xrep_dir_setup_scan(
 	if (error)
 		goto out_xfarray;
 
-	error = xrep_findparent_scan_start(sc, &rd->pscan);
+	if (xfs_has_parent(sc->mp))
+		error = __xrep_findparent_scan_start(sc, &rd->pscan,
+				xrep_dir_live_update);
+	else
+		error = xrep_findparent_scan_start(sc, &rd->pscan);
 	if (error)
 		goto out_xfblob;
 
diff --git a/fs/xfs/scrub/findparent.c b/fs/xfs/scrub/findparent.c
index 712dd73e4789f..c78422ad757bf 100644
--- a/fs/xfs/scrub/findparent.c
+++ b/fs/xfs/scrub/findparent.c
@@ -238,9 +238,10 @@ xrep_findparent_live_update(
  * will be called when there is a dotdot update for the inode being repaired.
  */
 int
-xrep_findparent_scan_start(
+__xrep_findparent_scan_start(
 	struct xfs_scrub		*sc,
-	struct xrep_parent_scan_info	*pscan)
+	struct xrep_parent_scan_info	*pscan,
+	notifier_fn_t			custom_fn)
 {
 	int				error;
 
@@ -262,7 +263,10 @@ xrep_findparent_scan_start(
 	 * ILOCK, which means that any in-progress inode updates will finish
 	 * before we can scan the inode.
 	 */
-	xfs_dir_hook_setup(&pscan->dhook, xrep_findparent_live_update);
+	if (custom_fn)
+		xfs_dir_hook_setup(&pscan->dhook, custom_fn);
+	else
+		xfs_dir_hook_setup(&pscan->dhook, xrep_findparent_live_update);
 	error = xfs_dir_hook_add(sc->mp, &pscan->dhook);
 	if (error)
 		goto out_iscan;
diff --git a/fs/xfs/scrub/findparent.h b/fs/xfs/scrub/findparent.h
index 501f99d3164ed..d998c7a88152c 100644
--- a/fs/xfs/scrub/findparent.h
+++ b/fs/xfs/scrub/findparent.h
@@ -24,8 +24,14 @@ struct xrep_parent_scan_info {
 	bool			lookup_parent;
 };
 
-int xrep_findparent_scan_start(struct xfs_scrub *sc,
-		struct xrep_parent_scan_info *pscan);
+int __xrep_findparent_scan_start(struct xfs_scrub *sc,
+		struct xrep_parent_scan_info *pscan,
+		notifier_fn_t custom_fn);
+static inline int xrep_findparent_scan_start(struct xfs_scrub *sc,
+		struct xrep_parent_scan_info *pscan)
+{
+	return __xrep_findparent_scan_start(sc, pscan, NULL);
+}
 int xrep_findparent_scan(struct xrep_parent_scan_info *pscan);
 void xrep_findparent_scan_teardown(struct xrep_parent_scan_info *pscan);
 
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 4b968df3d840c..64db413b18884 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -2692,6 +2692,8 @@ DEFINE_XREP_DIRENT_EVENT(xrep_dir_salvage_entry);
 DEFINE_XREP_DIRENT_EVENT(xrep_dir_stash_createname);
 DEFINE_XREP_DIRENT_EVENT(xrep_dir_replay_createname);
 DEFINE_XREP_DIRENT_EVENT(xrep_adoption_reparent);
+DEFINE_XREP_DIRENT_EVENT(xrep_dir_stash_removename);
+DEFINE_XREP_DIRENT_EVENT(xrep_dir_replay_removename);
 
 DECLARE_EVENT_CLASS(xrep_adoption_class,
 	TP_PROTO(struct xfs_inode *dp, struct xfs_inode *ip, bool moved),


^ permalink raw reply related	[flat|nested] 234+ messages in thread

* [PATCH 05/14] xfs: replay unlocked parent pointer updates that accrue during xattr repair
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
                     ` (3 preceding siblings ...)
  2024-04-10  1:04   ` [PATCH 04/14] xfs: implement live updates for directory repairs Darrick J. Wong
@ 2024-04-10  1:04   ` Darrick J. Wong
  2024-04-10  6:19     ` Christoph Hellwig
  2024-04-10  1:05   ` [PATCH 06/14] xfs: repair directory parent pointers by scanning for dirents Darrick J. Wong
                     ` (8 subsequent siblings)
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:04 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

There are a few places where the extended attribute repair code drops
the ILOCK to apply stashed xattrs to the temporary file.  Although
setxattr and removexattr are still locked out because we retain our hold
on the IOLOCK, this doesn't prevent renames from updating parent
pointers, because the VFS doesn't take i_rwsem on children that are
being moved.

Therefore, set up a dirent hook to capture parent pointer updates for
this file, and replay the updates against the temporary file before
exchanging the attr forks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/attr_repair.c |  438 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.h       |   73 +++++++
 2 files changed, 509 insertions(+), 2 deletions(-)
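
The "replay until quiescent" pattern used before the attr fork exchange
can be sketched in user space as follows.  This is a single-threaded toy
under stated assumptions: the toy_* names are hypothetical, plain
counters stand in for the stashed update arrays, a boolean stands in for
holding both ILOCKs, and concurrent renames are simulated by a helper
that can only queue work while the lock is not held.  The early-exit
check that the kernel loop makes via xchk_should_terminate() is left
out.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_xattr_repair {
	int	pending;	/* stashed pptr updates not yet replayed */
	int	applied;	/* updates applied to the tempfile */
	int	racing;		/* renames that can still sneak in */
	bool	locked;		/* models holding both ILOCKs */
};

/* Model a concurrent rename: it can only queue work while unlocked. */
static void toy_concurrent_rename(struct toy_xattr_repair *rx)
{
	if (!rx->locked && rx->racing > 0) {
		rx->racing--;
		rx->pending++;
	}
}

/* Replay everything queued so far; runs without the lock held. */
static void toy_replay_updates(struct toy_xattr_repair *rx)
{
	rx->applied += rx->pending;
	rx->pending = 0;
}

/*
 * Loop until the lock is held and nothing is left to replay.  Returns the
 * number of passes taken; the lock is still held on return.
 */
static int toy_finalize(struct toy_xattr_repair *rx)
{
	int	passes = 0;

	for (;;) {
		passes++;
		toy_replay_updates(rx);
		toy_concurrent_rename(rx);	/* race window before locking */
		rx->locked = true;
		if (rx->pending == 0)
			return passes;		/* quiesced */
		rx->locked = false;		/* drop the lock, try again */
	}
}
```

The design point is that the emptiness check happens only after the lock
is taken, so an update that arrives between the replay and the lock is
never lost; it simply costs one more pass.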


diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
index a3a98051df0fb..9cf002bc18042 100644
--- a/fs/xfs/scrub/attr_repair.c
+++ b/fs/xfs/scrub/attr_repair.c
@@ -96,6 +96,52 @@ struct xrep_xattr {
 
 	/* Number of attributes that we are salvaging. */
 	unsigned long long	attrs_found;
+
+	/* Can we flush stashed attrs to the tempfile? */
+	bool			can_flush;
+
+	/* Did the live update fail, and hence the repair is now out of date? */
+	bool			live_update_aborted;
+
+	/* Lock protecting parent pointer updates */
+	struct mutex		lock;
+
+	/* Fixed-size array of xrep_xattr_pptr structures. */
+	struct xfarray		*pptr_recs;
+
+	/* Blobs containing parent pointer names. */
+	struct xfblob		*pptr_names;
+
+	/* Hook to capture parent pointer updates. */
+	struct xfs_dir_hook	dhook;
+
+	/* Scratch buffer for capturing parent pointers. */
+	struct xfs_da_args	pptr_args;
+
+	/* Name buffer */
+	struct xfs_name		xname;
+	char			namebuf[MAXNAMELEN];
+};
+
+/* Create a parent pointer in the tempfile. */
+#define XREP_XATTR_PPTR_ADD	(1)
+
+/* Remove a parent pointer from the tempfile. */
+#define XREP_XATTR_PPTR_REMOVE	(2)
+
+/* A stashed parent pointer update. */
+struct xrep_xattr_pptr {
+	/* Cookie for retrieval of the pptr name. */
+	xfblob_cookie		name_cookie;
+
+	/* Parent pointer record. */
+	struct xfs_parent_rec	pptr_rec;
+
+	/* Length of the pptr name. */
+	uint8_t			namelen;
+
+	/* XREP_XATTR_PPTR_{ADD,REMOVE} */
+	uint8_t			action;
 };
 
 /* Set up to recreate the extended attributes. */
@@ -103,6 +149,9 @@ int
 xrep_setup_xattr(
 	struct xfs_scrub	*sc)
 {
+	if (xfs_has_parent(sc->mp))
+		xchk_fsgates_enable(sc, XCHK_FSGATES_DIRENTS);
+
 	return xrep_tempfile_create(sc, S_IFREG);
 }
 
@@ -713,11 +762,122 @@ xrep_xattr_want_flush_stashed(
 {
 	unsigned long long	bytes;
 
+	if (!rx->can_flush)
+		return false;
+
 	bytes = xfarray_bytes(rx->xattr_records) +
 		xfblob_bytes(rx->xattr_blobs);
 	return bytes > XREP_XATTR_MAX_STASH_BYTES;
 }
 
+/*
+ * Did we observe rename changing parent pointer xattrs while we were flushing
+ * salvaged attrs?
+ */
+static inline bool
+xrep_xattr_saw_pptr_conflict(
+	struct xrep_xattr	*rx)
+{
+	bool			ret;
+
+	ASSERT(rx->can_flush);
+
+	if (!xfs_has_parent(rx->sc->mp))
+		return false;
+
+	xfs_assert_ilocked(rx->sc->ip, XFS_ILOCK_EXCL);
+
+	mutex_lock(&rx->lock);
+	ret = xfarray_bytes(rx->pptr_recs) > 0;
+	mutex_unlock(&rx->lock);
+
+	return ret;
+}
+
+/*
+ * Reset the entire repair state back to initial conditions, now that we've
+ * detected a parent pointer update to the attr structure while we were
+ * flushing salvaged attrs.  See the locking notes in dir_repair.c for more
+ * information on why this is all necessary.
+ */
+STATIC int
+xrep_xattr_full_reset(
+	struct xrep_xattr	*rx)
+{
+	struct xfs_scrub	*sc = rx->sc;
+	struct xfs_attr_sf_hdr	*hdr;
+	struct xfs_ifork	*ifp = &sc->tempip->i_af;
+	int			error;
+
+	trace_xrep_xattr_full_reset(sc->ip, sc->tempip);
+
+	/* The temporary file's data fork had better not be in btree format. */
+	if (sc->tempip->i_df.if_format == XFS_DINODE_FMT_BTREE) {
+		ASSERT(0);
+		return -EIO;
+	}
+
+	/*
+	 * We begin in transaction context with sc->ip ILOCKed but not joined
+	 * to the transaction.  To reset to the initial state, we must hold
+	 * sc->ip's ILOCK to prevent rename from updating parent pointer
+	 * information and the tempfile's ILOCK to clear its contents.
+	 */
+	xchk_iunlock(rx->sc, XFS_ILOCK_EXCL);
+	xrep_tempfile_ilock_both(sc);
+	xfs_trans_ijoin(sc->tp, sc->ip, 0);
+	xfs_trans_ijoin(sc->tp, sc->tempip, 0);
+
+	/*
+	 * Free all the blocks of the attr fork of the temp file, and reset
+	 * it back to local format.
+	 */
+	if (xfs_ifork_has_extents(&sc->tempip->i_af)) {
+		error = xrep_reap_ifork(sc, sc->tempip, XFS_ATTR_FORK);
+		if (error)
+			return error;
+
+		ASSERT(ifp->if_bytes == 0);
+		ifp->if_format = XFS_DINODE_FMT_LOCAL;
+		xfs_idata_realloc(sc->tempip, sizeof(*hdr), XFS_ATTR_FORK);
+	}
+
+	/* Reinitialize the attr fork to an empty shortform structure. */
+	hdr = ifp->if_data;
+	memset(hdr, 0, sizeof(*hdr));
+	hdr->totsize = cpu_to_be16(sizeof(*hdr));
+	xfs_trans_log_inode(sc->tp, sc->tempip, XFS_ILOG_CORE | XFS_ILOG_ADATA);
+
+	/*
+	 * Roll this transaction to commit our reset ondisk.  The tempfile
+	 * should no longer be joined to the transaction, so we drop its ILOCK.
+	 * This should leave us in transaction context with sc->ip ILOCKed but
+	 * not joined to the transaction.
+	 */
+	error = xrep_roll_trans(sc);
+	if (error)
+		return error;
+	xrep_tempfile_iunlock(sc);
+
+	/*
+	 * Erase any accumulated parent pointer updates now that we've erased
+	 * the tempfile's attr fork.  We're resetting the entire repair state
+	 * back to where we were initially, except now we won't flush salvaged
+	 * xattrs until the very end.
+	 */
+	mutex_lock(&rx->lock);
+	xfarray_truncate(rx->pptr_recs);
+	xfblob_truncate(rx->pptr_names);
+	mutex_unlock(&rx->lock);
+
+	rx->can_flush = false;
+	rx->attrs_found = 0;
+
+	ASSERT(xfarray_bytes(rx->xattr_records) == 0);
+	ASSERT(xfblob_bytes(rx->xattr_blobs) == 0);
+	return 0;
+}
+
 /* Extract as many attribute keys and values as we can. */
 STATIC int
 xrep_xattr_recover(
@@ -732,6 +892,7 @@ xrep_xattr_recover(
 	int			nmap;
 	int			error;
 
+restart:
 	/*
 	 * Iterate each xattr leaf block in the attr fork to scan them for any
 	 * attributes that we might salvage.
@@ -770,6 +931,14 @@ xrep_xattr_recover(
 				error = xrep_xattr_flush_stashed(rx);
 				if (error)
 					return error;
+
+				if (xrep_xattr_saw_pptr_conflict(rx)) {
+					error = xrep_xattr_full_reset(rx);
+					if (error)
+						return error;
+
+					goto restart;
+				}
 			}
 		}
 	}
@@ -929,6 +1098,180 @@ xrep_xattr_salvage_attributes(
 	return xrep_xattr_flush_stashed(rx);
 }
 
+/*
+ * Replay this stashed incore parent pointer update against the temporary
+ * file.  The caller must hold the tempfile's IOLOCK, must not hold any ILOCKs,
+ * and must not be in transaction context.
+ */
+STATIC int
+xrep_xattr_replay_pptr_update(
+	struct xrep_xattr		*rx,
+	const struct xfs_name		*xname,
+	struct xrep_xattr_pptr		*pptr)
+{
+	struct xfs_scrub		*sc = rx->sc;
+	int				error;
+
+	switch (pptr->action) {
+	case XREP_XATTR_PPTR_ADD:
+		/* Create parent pointer. */
+		trace_xrep_xattr_replay_parentadd(sc->tempip, xname,
+				&pptr->pptr_rec);
+
+		error = xfs_parent_set(sc->tempip, sc->ip->i_ino, xname,
+				&pptr->pptr_rec, &rx->pptr_args);
+		ASSERT(error != -EEXIST);
+		return error;
+	case XREP_XATTR_PPTR_REMOVE:
+		/* Remove parent pointer. */
+		trace_xrep_xattr_replay_parentremove(sc->tempip, xname,
+				&pptr->pptr_rec);
+
+		error = xfs_parent_unset(sc->tempip, sc->ip->i_ino, xname,
+				&pptr->pptr_rec, &rx->pptr_args);
+		ASSERT(error != -ENOATTR);
+		return error;
+	}
+
+	ASSERT(0);
+	return -EIO;
+}
+
+/*
+ * Flush stashed parent pointer updates that have been recorded by the scanner.
+ * This is done to reduce the memory requirements of the xattr rebuild, since
+ * files can have a lot of hardlinks and the fs can be busy.
+ *
+ * Caller must not hold transactions or ILOCKs.  Caller must hold the tempfile
+ * IOLOCK.
+ */
+STATIC int
+xrep_xattr_replay_pptr_updates(
+	struct xrep_xattr	*rx)
+{
+	xfarray_idx_t		array_cur;
+	int			error;
+
+	mutex_lock(&rx->lock);
+	foreach_xfarray_idx(rx->pptr_recs, array_cur) {
+		struct xrep_xattr_pptr	pptr;
+
+		error = xfarray_load(rx->pptr_recs, array_cur, &pptr);
+		if (error)
+			goto out_unlock;
+
+		error = xfblob_loadname(rx->pptr_names, pptr.name_cookie,
+				&rx->xname, pptr.namelen);
+		if (error)
+			goto out_unlock;
+		mutex_unlock(&rx->lock);
+
+		error = xrep_xattr_replay_pptr_update(rx, &rx->xname, &pptr);
+		if (error)
+			return error;
+
+		mutex_lock(&rx->lock);
+	}
+
+	/* Empty out both arrays now that we've added the entries. */
+	xfarray_truncate(rx->pptr_recs);
+	xfblob_truncate(rx->pptr_names);
+	mutex_unlock(&rx->lock);
+	return 0;
+out_unlock:
+	mutex_unlock(&rx->lock);
+	return error;
+}
+
+/*
+ * Remember that we want to create a parent pointer in the tempfile.  These
+ * stashed actions will be replayed later.
+ */
+STATIC int
+xrep_xattr_stash_parentadd(
+	struct xrep_xattr	*rx,
+	const struct xfs_name	*name,
+	const struct xfs_inode	*dp)
+{
+	struct xrep_xattr_pptr	pptr = {
+		.action		= XREP_XATTR_PPTR_ADD,
+		.namelen	= name->len,
+	};
+	int			error;
+
+	trace_xrep_xattr_stash_parentadd(rx->sc->tempip, dp, name);
+
+	xfs_inode_to_parent_rec(&pptr.pptr_rec, dp);
+	error = xfblob_storename(rx->pptr_names, &pptr.name_cookie, name);
+	if (error)
+		return error;
+
+	return xfarray_append(rx->pptr_recs, &pptr);
+}
+
+/*
+ * Remember that we want to remove a parent pointer from the tempfile.  These
+ * stashed actions will be replayed later.
+ */
+STATIC int
+xrep_xattr_stash_parentremove(
+	struct xrep_xattr	*rx,
+	const struct xfs_name	*name,
+	const struct xfs_inode	*dp)
+{
+	struct xrep_xattr_pptr	pptr = {
+		.action		= XREP_XATTR_PPTR_REMOVE,
+		.namelen	= name->len,
+	};
+	int			error;
+
+	trace_xrep_xattr_stash_parentremove(rx->sc->tempip, dp, name);
+
+	xfs_inode_to_parent_rec(&pptr.pptr_rec, dp);
+	error = xfblob_storename(rx->pptr_names, &pptr.name_cookie, name);
+	if (error)
+		return error;
+
+	return xfarray_append(rx->pptr_recs, &pptr);
+}
+
+/*
+ * Capture dirent updates being made by other threads.  We will have to replay
+ * the parent pointer updates before exchanging attr forks.
+ */
+STATIC int
+xrep_xattr_live_dirent_update(
+	struct notifier_block		*nb,
+	unsigned long			action,
+	void				*data)
+{
+	struct xfs_dir_update_params	*p = data;
+	struct xrep_xattr		*rx;
+	struct xfs_scrub		*sc;
+	int				error;
+
+	rx = container_of(nb, struct xrep_xattr, dhook.dirent_hook.nb);
+	sc = rx->sc;
+
+	/*
+	 * This thread updated a dirent that points to the file that we're
+	 * repairing, so stash the update for replay against the temporary
+	 * file.
+	 */
+	if (p->ip->i_ino != sc->ip->i_ino)
+		return NOTIFY_DONE;
+
+	mutex_lock(&rx->lock);
+	if (p->delta > 0)
+		error = xrep_xattr_stash_parentadd(rx, p->name, p->dp);
+	else
+		error = xrep_xattr_stash_parentremove(rx, p->name, p->dp);
+	if (error)
+		rx->live_update_aborted = true;
+	mutex_unlock(&rx->lock);
+	return NOTIFY_DONE;
+}
+
 /*
  * Prepare both inodes' attribute forks for an exchange.  Promote the tempfile
  * from short format to leaf format, and if the file being repaired has a short
@@ -1032,6 +1375,45 @@ xrep_xattr_swap(
 	return xrep_tempexch_contents(sc, tx);
 }
 
+/*
+ * Finish replaying stashed parent pointer updates, allocate a transaction for
+ * exchanging extent mappings, and take the ILOCKs of both files before we
+ * commit the new extended attribute structure.
+ */
+STATIC int
+xrep_xattr_finalize_tempfile(
+	struct xrep_xattr	*rx)
+{
+	struct xfs_scrub	*sc = rx->sc;
+	int			error;
+
+	if (!xfs_has_parent(sc->mp))
+		return xrep_tempexch_trans_alloc(sc, XFS_ATTR_FORK, &rx->tx);
+
+	/*
+	 * Repair relies on the ILOCK to quiesce all possible xattr updates.
+	 * Replay all queued parent pointer updates into the tempfile before
+	 * exchanging the contents, even if that means dropping the ILOCKs and
+	 * the transaction.
+	 */
+	do {
+		error = xrep_xattr_replay_pptr_updates(rx);
+		if (error)
+			return error;
+
+		error = xrep_tempexch_trans_alloc(sc, XFS_ATTR_FORK, &rx->tx);
+		if (error)
+			return error;
+
+		if (xfarray_length(rx->pptr_recs) == 0)
+			break;
+
+		xchk_trans_cancel(sc);
+		xrep_tempfile_iunlock_both(sc);
+	} while (!xchk_should_terminate(sc, &error));
+	return error;
+}
+
 /*
  * Exchange the new extended attribute data (which we created in the tempfile)
  * with the file being repaired.
@@ -1084,8 +1466,12 @@ xrep_xattr_rebuild_tree(
 	if (error)
 		return error;
 
-	/* Allocate exchange transaction and lock both inodes. */
-	error = xrep_tempexch_trans_alloc(rx->sc, XFS_ATTR_FORK, &rx->tx);
+	/*
+	 * Allocate transaction, lock inodes, and make sure that we've replayed
+	 * all the stashed parent pointer updates to the temp file.  After this
+	 * point, we're ready to exchange attr fork mappings.
+	 */
+	error = xrep_xattr_finalize_tempfile(rx);
 	if (error)
 		return error;
 
@@ -1126,8 +1512,15 @@ STATIC void
 xrep_xattr_teardown(
 	struct xrep_xattr	*rx)
 {
+	if (xfs_has_parent(rx->sc->mp))
+		xfs_dir_hook_del(rx->sc->mp, &rx->dhook);
+	if (rx->pptr_names)
+		xfblob_destroy(rx->pptr_names);
+	if (rx->pptr_recs)
+		xfarray_destroy(rx->pptr_recs);
 	xfblob_destroy(rx->xattr_blobs);
 	xfarray_destroy(rx->xattr_records);
+	mutex_destroy(&rx->lock);
 	kfree(rx);
 }
 
@@ -1146,6 +1539,10 @@ xrep_xattr_setup_scan(
 	if (!rx)
 		return -ENOMEM;
 	rx->sc = sc;
+	rx->can_flush = true;
+	rx->xname.name = rx->namebuf;
+
+	mutex_init(&rx->lock);
 
 	/*
 	 * Allocate enough memory to handle loading local attr values from the
@@ -1173,11 +1570,43 @@ xrep_xattr_setup_scan(
 	if (error)
 		goto out_keys;
 
+	if (xfs_has_parent(sc->mp)) {
+		ASSERT(sc->flags & XCHK_FSGATES_DIRENTS);
+
+		descr = xchk_xfile_ino_descr(sc,
+				"xattr retained parent pointer entries");
+		error = xfarray_create(descr, 0,
+				sizeof(struct xrep_xattr_pptr),
+				&rx->pptr_recs);
+		kfree(descr);
+		if (error)
+			goto out_values;
+
+		descr = xchk_xfile_ino_descr(sc,
+				"xattr retained parent pointer names");
+		error = xfblob_create(descr, &rx->pptr_names);
+		kfree(descr);
+		if (error)
+			goto out_pprecs;
+
+		xfs_dir_hook_setup(&rx->dhook, xrep_xattr_live_dirent_update);
+		error = xfs_dir_hook_add(sc->mp, &rx->dhook);
+		if (error)
+			goto out_ppnames;
+	}
+
 	*rxp = rx;
 	return 0;
+out_ppnames:
+	xfblob_destroy(rx->pptr_names);
+out_pprecs:
+	xfarray_destroy(rx->pptr_recs);
+out_values:
+	xfblob_destroy(rx->xattr_blobs);
 out_keys:
 	xfarray_destroy(rx->xattr_records);
 out_rx:
+	mutex_destroy(&rx->lock);
 	kfree(rx);
 	return error;
 }
@@ -1214,6 +1643,11 @@ xrep_xattr(
 	if (error)
 		goto out_scan;
 
+	if (rx->live_update_aborted) {
+		error = -EIO;
+		goto out_scan;
+	}
+
 	/* Last chance to abort before we start committing fixes. */
 	if (xchk_should_terminate(sc, &error))
 		goto out_scan;
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 64db413b18884..68532f686eeb1 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -2602,6 +2602,43 @@ DEFINE_EVENT(xrep_xattr_class, name, \
 	TP_ARGS(ip, arg_ip))
 DEFINE_XREP_XATTR_EVENT(xrep_xattr_rebuild_tree);
 DEFINE_XREP_XATTR_EVENT(xrep_xattr_reset_fork);
+DEFINE_XREP_XATTR_EVENT(xrep_xattr_full_reset);
+
+DECLARE_EVENT_CLASS(xrep_xattr_pptr_scan_class,
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_inode *dp,
+		 const struct xfs_name *name),
+	TP_ARGS(ip, dp, name),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_ino_t, parent_ino)
+		__field(unsigned int, parent_gen)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, name->len)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->parent_ino = dp->i_ino;
+		__entry->parent_gen = VFS_IC(dp)->i_generation;
+		__entry->namelen = name->len;
+		memcpy(__get_str(name), name->name, name->len);
+	),
+	TP_printk("dev %d:%d ino 0x%llx parent_ino 0x%llx parent_gen 0x%x name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->parent_ino,
+		  __entry->parent_gen,
+		  __entry->namelen,
+		  __get_str(name))
+)
+#define DEFINE_XREP_XATTR_PPTR_SCAN_EVENT(name) \
+DEFINE_EVENT(xrep_xattr_pptr_scan_class, name, \
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_inode *dp, \
+		 const struct xfs_name *name), \
+	TP_ARGS(ip, dp, name))
+DEFINE_XREP_XATTR_PPTR_SCAN_EVENT(xrep_xattr_stash_parentadd);
+DEFINE_XREP_XATTR_PPTR_SCAN_EVENT(xrep_xattr_stash_parentremove);
 
 TRACE_EVENT(xrep_dir_recover_dirblock,
 	TP_PROTO(struct xfs_inode *dp, xfs_dablk_t dabno, uint32_t magic,
@@ -2748,6 +2785,42 @@ DEFINE_XREP_PARENT_SALVAGE_EVENT(xrep_dir_salvaged_parent);
 DEFINE_XREP_PARENT_SALVAGE_EVENT(xrep_findparent_dirent);
 DEFINE_XREP_PARENT_SALVAGE_EVENT(xrep_findparent_from_dcache);
 
+DECLARE_EVENT_CLASS(xrep_pptr_class,
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_name *name,
+		 const struct xfs_parent_rec *pptr),
+	TP_ARGS(ip, name, pptr),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_ino_t, parent_ino)
+		__field(unsigned int, parent_gen)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, name->len)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->parent_ino = be64_to_cpu(pptr->p_ino);
+		__entry->parent_gen = be32_to_cpu(pptr->p_gen);
+		__entry->namelen = name->len;
+		memcpy(__get_str(name), name->name, name->len);
+	),
+	TP_printk("dev %d:%d ino 0x%llx parent_ino 0x%llx parent_gen 0x%x name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->parent_ino,
+		  __entry->parent_gen,
+		  __entry->namelen,
+		  __get_str(name))
+)
+#define DEFINE_XREP_PPTR_EVENT(name) \
+DEFINE_EVENT(xrep_pptr_class, name, \
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_name *name, \
+		 const struct xfs_parent_rec *pptr), \
+	TP_ARGS(ip, name, pptr))
+DEFINE_XREP_PPTR_EVENT(xrep_xattr_replay_parentadd);
+DEFINE_XREP_PPTR_EVENT(xrep_xattr_replay_parentremove);
+
 TRACE_EVENT(xrep_nlinks_set_record,
 	TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino,
 		 const struct xchk_nlink *obs),



* [PATCH 06/14] xfs: repair directory parent pointers by scanning for dirents
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
                     ` (4 preceding siblings ...)
  2024-04-10  1:04   ` [PATCH 05/14] xfs: replay unlocked parent pointer updates that accrue during xattr repair Darrick J. Wong
@ 2024-04-10  1:05   ` Darrick J. Wong
  2024-04-10  6:20     ` Christoph Hellwig
  2024-04-10  1:05   ` [PATCH 07/14] xfs: implement live updates for parent pointer repairs Darrick J. Wong
                     ` (7 subsequent siblings)
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:05 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If parent pointers are enabled on the filesystem, we can rebuild a
file's entire set of parent pointers by walking every directory in the
filesystem looking for dirents that we can turn into parent pointers.
Once we have a full incore dataset, we'll figure out what to do with it,
but that's for a subsequent patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
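As a rough illustration of the scan-and-stash approach described in the
commit message above, here is a minimal user-space sketch.  It is not
the kernel code: every type, constant, and function name below is
hypothetical, the "temp file" is modeled as a plain array, and locking
is left out entirely.  It only shows the shape of the idea: filter
dirents that point at the file under repair, skip dot and dotdot, stash
the results, and flush the stash once it fills up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
#define MAX_STASH	4	/* flush threshold (the patch uses a byte budget) */
#define MAX_PPTRS	32	/* capacity of the modeled temp file */
#define NAME_MAX_LEN	255

struct dirent_rec {
	uint64_t	parent_ino;	/* directory holding the entry */
	const char	*name;		/* entry name */
	uint64_t	child_ino;	/* inode the entry points to */
};

struct pptr {
	uint64_t	parent_ino;
	char		name[NAME_MAX_LEN + 1];
};

struct rebuild {
	uint64_t	target_ino;		/* file being repaired */
	struct pptr	stash[MAX_STASH];	/* pending updates */
	size_t		nr_stashed;
	struct pptr	tempfile[MAX_PPTRS];	/* models the temp file */
	size_t		nr_committed;
	unsigned int	nr_flushes;
};

/* Move every stashed record into the modeled temp file. */
static int rebuild_flush(struct rebuild *rb)
{
	if (rb->nr_stashed == 0)
		return 0;
	if (rb->nr_committed + rb->nr_stashed > MAX_PPTRS)
		return -1;

	memcpy(&rb->tempfile[rb->nr_committed], rb->stash,
	       rb->nr_stashed * sizeof(struct pptr));
	rb->nr_committed += rb->nr_stashed;
	rb->nr_stashed = 0;
	rb->nr_flushes++;
	return 0;
}

/* Turn one dirent into a stashed parent pointer if it is relevant. */
static int rebuild_scan_dirent(struct rebuild *rb, const struct dirent_rec *de)
{
	struct pptr	*p;
	size_t		len;
	int		error;

	/* Dirent doesn't point to the file being repaired. */
	if (de->child_ino != rb->target_ino)
		return 0;

	/* No weird looking names. */
	len = strlen(de->name);
	if (len == 0 || len > NAME_MAX_LEN)
		return -1;

	/* Only child dirents become parent pointers. */
	if (!strcmp(de->name, ".") || !strcmp(de->name, ".."))
		return 0;

	/* Flush first if the stash is full, to bound memory usage. */
	if (rb->nr_stashed == MAX_STASH) {
		error = rebuild_flush(rb);
		if (error)
			return error;
	}

	p = &rb->stash[rb->nr_stashed++];
	p->parent_ino = de->parent_ino;
	memcpy(p->name, de->name, len + 1);
	return 0;
}

/* Walk all dirents, then flush whatever is still stashed. */
static int rebuild_scan(struct rebuild *rb, const struct dirent_rec *des,
			size_t nr)
{
	size_t	i;
	int	error;

	for (i = 0; i < nr; i++) {
		error = rebuild_scan_dirent(rb, &des[i]);
		if (error)
			return error;
	}
	return rebuild_flush(rb);
}
```

A caller would zero-initialize a struct rebuild, set target_ino, and
pass an array of dirent_rec to rebuild_scan().  The real patch flushes
between files rather than between dirents, and must drop and retake
locks around each flush; the sketch ignores both of those details.
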
 fs/xfs/scrub/parent_repair.c |  414 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.h         |   36 ++++
 2 files changed, 447 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
index 63590e1b35060..b4084a9f0e9c8 100644
--- a/fs/xfs/scrub/parent_repair.c
+++ b/fs/xfs/scrub/parent_repair.c
@@ -24,6 +24,7 @@
 #include "xfs_trans_space.h"
 #include "xfs_health.h"
 #include "xfs_exchmaps.h"
+#include "xfs_parent.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -34,6 +35,9 @@
 #include "scrub/readdir.h"
 #include "scrub/tempfile.h"
 #include "scrub/orphanage.h"
+#include "scrub/xfile.h"
+#include "scrub/xfarray.h"
+#include "scrub/xfblob.h"
 
 /*
  * Repairing The Directory Parent Pointer
@@ -49,14 +53,61 @@
  * See the section on locking issues in dir_repair.c for more information about
  * conflicts with the VFS.  The findparent code wll keep our incore parent
  * inode up to date.
+ *
+ * If parent pointers are enabled, we instead reconstruct the parent pointer
+ * information by visiting every directory entry of every directory in the
+ * system and translating the relevant dirents into parent pointers.  In this
+ * case, it is advantageous to stash all parent pointers created from dirents
+ * from a single parent file before replaying them into the temporary file.  To
+ * save memory, the live filesystem scan reuses the findparent object.  Parent
+ * pointer repair chooses either directory scanning or findparent, but not
+ * both.
+ *
+ * When salvaging completes, the remaining stashed entries are replayed to the
+ * temporary file.  All non-parent pointer extended attributes are copied to
+ * the temporary file's extended attributes.  An atomic extent swap is used to
+ * commit the new directory blocks to the directory being repaired.  This will
+ * disrupt attrmulti cursors.
  */
 
+/* A stashed parent pointer update. */
+struct xrep_pptr {
+	/* Cookie for retrieval of the pptr name. */
+	xfblob_cookie		name_cookie;
+
+	/* Parent pointer record. */
+	struct xfs_parent_rec	pptr_rec;
+
+	/* Length of the pptr name. */
+	uint8_t			namelen;
+};
+
+/*
+ * Stash up to 8 pages of recovered parent pointers in pptr_recs and
+ * pptr_names before we write them to the temp file.
+ */
+#define XREP_PARENT_MAX_STASH_BYTES	(PAGE_SIZE * 8)
+
 struct xrep_parent {
 	struct xfs_scrub	*sc;
 
+	/* Fixed-size array of xrep_pptr structures. */
+	struct xfarray		*pptr_recs;
+
+	/* Blobs containing parent pointer names. */
+	struct xfblob		*pptr_names;
+
 	/*
 	 * Information used to scan the filesystem to find the inumber of the
-	 * dotdot entry for this directory.
+	 * dotdot entry for this directory.  On filesystems without parent
+	 * pointers, we use the findparent_* functions on this object and
+	 * access only the parent_ino field directly.
+	 *
+	 * When parent pointers are enabled, the directory entry scanner uses
+	 * the iscan, hooks, and lock fields of this object directly.
+	 * @pscan.lock coordinates access to pptr_recs, pptr_names, pptr, and
+	 * pptr_scratch.  This reduces the memory requirements of this
+	 * structure.
 	 */
 	struct xrep_parent_scan_info pscan;
 
@@ -66,6 +117,9 @@ struct xrep_parent {
 	/* Directory entry name, plus the trailing null. */
 	struct xfs_name		xname;
 	unsigned char		namebuf[MAXNAMELEN];
+
+	/* Scratch buffer for scanning pptr xattrs */
+	struct xfs_da_args	pptr_args;
 };
 
 /* Tear down all the incore stuff we created. */
@@ -74,6 +128,12 @@ xrep_parent_teardown(
 	struct xrep_parent	*rp)
 {
 	xrep_findparent_scan_teardown(&rp->pscan);
+	if (rp->pptr_names)
+		xfblob_destroy(rp->pptr_names);
+	rp->pptr_names = NULL;
+	if (rp->pptr_recs)
+		xfarray_destroy(rp->pptr_recs);
+	rp->pptr_recs = NULL;
 }
 
 /* Set up for a parent repair. */
@@ -82,6 +142,7 @@ xrep_setup_parent(
 	struct xfs_scrub	*sc)
 {
 	struct xrep_parent	*rp;
+	int			error;
 
 	xchk_fsgates_enable(sc, XCHK_FSGATES_DIRENTS);
 
@@ -92,6 +153,10 @@ xrep_setup_parent(
 	rp->xname.name = rp->namebuf;
 	sc->buf = rp;
 
+	error = xrep_tempfile_create(sc, S_IFREG);
+	if (error)
+		return error;
+
 	return xrep_orphanage_try_create(sc);
 }
 
@@ -147,6 +212,307 @@ xrep_parent_find_dotdot(
 	return error;
 }
 
+/*
+ * Add this stashed incore parent pointer to the temporary file.
+ * The caller must hold the tempdir's IOLOCK, must not hold any ILOCKs, and
+ * must not be in transaction context.
+ */
+STATIC int
+xrep_parent_replay_update(
+	struct xrep_parent	*rp,
+	const struct xfs_name	*xname,
+	struct xrep_pptr	*pptr)
+{
+	struct xfs_scrub	*sc = rp->sc;
+
+	/* Create parent pointer. */
+	trace_xrep_parent_replay_parentadd(sc->tempip, xname, &pptr->pptr_rec);
+
+	return xfs_parent_set(sc->tempip, sc->ip->i_ino, xname,
+			&pptr->pptr_rec, &rp->pptr_args);
+}
+
+/*
+ * Flush stashed parent pointer updates that have been recorded by the scanner.
+ * This is done to reduce the memory requirements of the parent pointer
+ * rebuild, since files can have a lot of hardlinks and the fs can be busy.
+ *
+ * Caller must not hold transactions or ILOCKs.  Caller must hold the tempfile
+ * IOLOCK.
+ */
+STATIC int
+xrep_parent_replay_updates(
+	struct xrep_parent	*rp)
+{
+	xfarray_idx_t		array_cur;
+	int			error;
+
+	mutex_lock(&rp->pscan.lock);
+	foreach_xfarray_idx(rp->pptr_recs, array_cur) {
+		struct xrep_pptr	pptr;
+
+		error = xfarray_load(rp->pptr_recs, array_cur, &pptr);
+		if (error)
+			goto out_unlock;
+
+		error = xfblob_loadname(rp->pptr_names, pptr.name_cookie,
+				&rp->xname, pptr.namelen);
+		if (error)
+			goto out_unlock;
+		rp->xname.len = pptr.namelen;
+		mutex_unlock(&rp->pscan.lock);
+
+		error = xrep_parent_replay_update(rp, &rp->xname, &pptr);
+		if (error)
+			return error;
+
+		mutex_lock(&rp->pscan.lock);
+	}
+
+	/* Empty out both arrays now that we've added the entries. */
+	xfarray_truncate(rp->pptr_recs);
+	xfblob_truncate(rp->pptr_names);
+	mutex_unlock(&rp->pscan.lock);
+	return 0;
+out_unlock:
+	mutex_unlock(&rp->pscan.lock);
+	return error;
+}
+
+/*
+ * Remember that we want to create a parent pointer in the tempfile.  These
+ * stashed actions will be replayed later.
+ */
+STATIC int
+xrep_parent_stash_parentadd(
+	struct xrep_parent	*rp,
+	const struct xfs_name	*name,
+	const struct xfs_inode	*dp)
+{
+	struct xrep_pptr	pptr = {
+		.namelen	= name->len,
+	};
+	int			error;
+
+	trace_xrep_parent_stash_parentadd(rp->sc->tempip, dp, name);
+
+	xfs_inode_to_parent_rec(&pptr.pptr_rec, dp);
+	error = xfblob_storename(rp->pptr_names, &pptr.name_cookie, name);
+	if (error)
+		return error;
+
+	return xfarray_append(rp->pptr_recs, &pptr);
+}
+
+/*
+ * Examine an entry of a directory.  If this dirent leads us back to the file
+ * whose parent pointers we're rebuilding, add a pptr to the temporary
+ * directory.
+ */
+STATIC int
+xrep_parent_scan_dirent(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp,
+	xfs_dir2_dataptr_t	dapos,
+	const struct xfs_name	*name,
+	xfs_ino_t		ino,
+	void			*priv)
+{
+	struct xrep_parent	*rp = priv;
+	int			error;
+
+	/* Dirent doesn't point to this directory. */
+	if (ino != rp->sc->ip->i_ino)
+		return 0;
+
+	/* No weird looking names. */
+	if (name->len == 0 || !xfs_dir2_namecheck(name->name, name->len))
+		return -EFSCORRUPTED;
+
+	/* No mismatching ftypes. */
+	if (name->type != xfs_mode_to_ftype(VFS_I(sc->ip)->i_mode))
+		return -EFSCORRUPTED;
+
+	/* Don't pick up dot or dotdot entries; we only want child dirents. */
+	if (xfs_dir2_samename(name, &xfs_name_dotdot) ||
+	    xfs_dir2_samename(name, &xfs_name_dot))
+		return 0;
+
+	/*
+	 * Transform this dirent into a parent pointer and queue it for later
+	 * addition to the temporary file.
+	 */
+	mutex_lock(&rp->pscan.lock);
+	error = xrep_parent_stash_parentadd(rp, name, dp);
+	mutex_unlock(&rp->pscan.lock);
+	return error;
+}
+
+/*
+ * Decide if we want to look for dirents in this directory.  Skip the file
+ * being repaired and any files being used to stage repairs.
+ */
+static inline bool
+xrep_parent_want_scan(
+	struct xrep_parent	*rp,
+	const struct xfs_inode	*ip)
+{
+	return ip != rp->sc->ip && !xrep_is_tempfile(ip);
+}
+
+/*
+ * Take ILOCK on a file that we want to scan.
+ *
+ * Select ILOCK_EXCL if the file is a directory with an unloaded data bmbt.
+ * Otherwise, take ILOCK_SHARED.
+ */
+static inline unsigned int
+xrep_parent_scan_ilock(
+	struct xrep_parent	*rp,
+	struct xfs_inode	*ip)
+{
+	uint			lock_mode = XFS_ILOCK_SHARED;
+
+	/* Still need to take the shared ILOCK to advance the iscan cursor. */
+	if (!xrep_parent_want_scan(rp, ip))
+		goto lock;
+
+	if (S_ISDIR(VFS_I(ip)->i_mode) && xfs_need_iread_extents(&ip->i_df)) {
+		lock_mode = XFS_ILOCK_EXCL;
+		goto lock;
+	}
+
+lock:
+	xfs_ilock(ip, lock_mode);
+	return lock_mode;
+}
+
+/*
+ * Scan this file for relevant child dirents that point to the file whose
+ * parent pointers we're rebuilding.
+ */
+STATIC int
+xrep_parent_scan_file(
+	struct xrep_parent	*rp,
+	struct xfs_inode	*ip)
+{
+	unsigned int		lock_mode;
+	int			error = 0;
+
+	lock_mode = xrep_parent_scan_ilock(rp, ip);
+
+	if (!xrep_parent_want_scan(rp, ip))
+		goto scan_done;
+
+	if (S_ISDIR(VFS_I(ip)->i_mode)) {
+		/*
+		 * If the directory looks as though it has been zapped by the
+		 * inode record repair code, we cannot scan for child dirents.
+		 */
+		if (xchk_dir_looks_zapped(ip)) {
+			error = -EBUSY;
+			goto scan_done;
+		}
+
+		error = xchk_dir_walk(rp->sc, ip, xrep_parent_scan_dirent, rp);
+		if (error)
+			goto scan_done;
+	}
+
+scan_done:
+	xchk_iscan_mark_visited(&rp->pscan.iscan, ip);
+	xfs_iunlock(ip, lock_mode);
+	return error;
+}
+
+/* Decide if we've stashed too much pptr data in memory. */
+static inline bool
+xrep_parent_want_flush_stashed(
+	struct xrep_parent	*rp)
+{
+	unsigned long long	bytes;
+
+	bytes = xfarray_bytes(rp->pptr_recs) + xfblob_bytes(rp->pptr_names);
+	return bytes > XREP_PARENT_MAX_STASH_BYTES;
+}
+
+/*
+ * Scan all directories in the filesystem to look for dirents that we can turn
+ * into parent pointers.
+ */
+STATIC int
+xrep_parent_scan_dirtree(
+	struct xrep_parent	*rp)
+{
+	struct xfs_scrub	*sc = rp->sc;
+	struct xfs_inode	*ip;
+	int			error;
+
+	/*
+	 * Filesystem scans are time consuming.  Drop the file ILOCK and all
+	 * other resources for the duration of the scan and hope for the best.
+	 * The live update hooks will keep our scan information up to date.
+	 */
+	xchk_trans_cancel(sc);
+	if (sc->ilock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL))
+		xchk_iunlock(sc, sc->ilock_flags & (XFS_ILOCK_SHARED |
+						    XFS_ILOCK_EXCL));
+	error = xchk_trans_alloc_empty(sc);
+	if (error)
+		return error;
+
+	while ((error = xchk_iscan_iter(&rp->pscan.iscan, &ip)) == 1) {
+		bool		flush;
+
+		error = xrep_parent_scan_file(rp, ip);
+		xchk_irele(sc, ip);
+		if (error)
+			break;
+
+		/* Flush stashed pptr updates to constrain memory usage. */
+		mutex_lock(&rp->pscan.lock);
+		flush = xrep_parent_want_flush_stashed(rp);
+		mutex_unlock(&rp->pscan.lock);
+		if (flush) {
+			xchk_trans_cancel(sc);
+
+			error = xrep_tempfile_iolock_polled(sc);
+			if (error)
+				break;
+
+			error = xrep_parent_replay_updates(rp);
+			xrep_tempfile_iounlock(sc);
+			if (error)
+				break;
+
+			error = xchk_trans_alloc_empty(sc);
+			if (error)
+				break;
+		}
+
+		if (xchk_should_terminate(sc, &error))
+			break;
+	}
+	xchk_iscan_iter_finish(&rp->pscan.iscan);
+	if (error) {
+		/*
+		 * If we couldn't grab an inode that was busy with a state
+		 * change, change the error code so that we exit to userspace
+		 * as quickly as possible.
+		 */
+		if (error == -EBUSY)
+			return -ECANCELED;
+		return error;
+	}
+
+	/*
+	 * Cancel the empty transaction so that we can (later) use the atomic
+	 * extent swap helpers to lock files and commit the new directory.
+	 */
+	xchk_trans_cancel(rp->sc);
+	return 0;
+}
+
 /* Reset a directory's dotdot entry, if needed. */
 STATIC int
 xrep_parent_reset_dotdot(
@@ -298,8 +664,39 @@ xrep_parent_setup_scan(
 	struct xrep_parent	*rp)
 {
 	struct xfs_scrub	*sc = rp->sc;
+	char			*descr;
+	int			error;
 
-	return xrep_findparent_scan_start(sc, &rp->pscan);
+	if (!xfs_has_parent(sc->mp))
+		return xrep_findparent_scan_start(sc, &rp->pscan);
+
+	/* Set up some staging memory for logging parent pointer updates. */
+	descr = xchk_xfile_ino_descr(sc, "parent pointer entries");
+	error = xfarray_create(descr, 0, sizeof(struct xrep_pptr),
+			&rp->pptr_recs);
+	kfree(descr);
+	if (error)
+		return error;
+
+	descr = xchk_xfile_ino_descr(sc, "parent pointer names");
+	error = xfblob_create(descr, &rp->pptr_names);
+	kfree(descr);
+	if (error)
+		goto out_recs;
+
+	error = xrep_findparent_scan_start(sc, &rp->pscan);
+	if (error)
+		goto out_names;
+
+	return 0;
+
+out_names:
+	xfblob_destroy(rp->pptr_names);
+	rp->pptr_names = NULL;
+out_recs:
+	xfarray_destroy(rp->pptr_recs);
+	rp->pptr_recs = NULL;
+	return error;
 }
 
 int
@@ -309,11 +706,22 @@ xrep_parent(
 	struct xrep_parent	*rp = sc->buf;
 	int			error;
 
+	/*
+	 * When the parent pointers feature is enabled, repairs are committed
+	 * by atomically committing a new xattr structure and reaping the old
+	 * attr fork.  Reaping requires rmap to be enabled.
+	 */
+	if (xfs_has_parent(sc->mp) && !xfs_has_rmapbt(sc->mp))
+		return -EOPNOTSUPP;
+
 	error = xrep_parent_setup_scan(rp);
 	if (error)
 		return error;
 
-	error = xrep_parent_find_dotdot(rp);
+	if (xfs_has_parent(sc->mp))
+		error = xrep_parent_scan_dirtree(rp);
+	else
+		error = xrep_parent_find_dotdot(rp);
 	if (error)
 		goto out_teardown;
 
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 68532f686eeb1..10c2a8d10058b 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -2820,6 +2820,42 @@ DEFINE_EVENT(xrep_pptr_class, name, \
 	TP_ARGS(ip, name, pptr))
 DEFINE_XREP_PPTR_EVENT(xrep_xattr_replay_parentadd);
 DEFINE_XREP_PPTR_EVENT(xrep_xattr_replay_parentremove);
+DEFINE_XREP_PPTR_EVENT(xrep_parent_replay_parentadd);
+
+DECLARE_EVENT_CLASS(xrep_pptr_scan_class,
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_inode *dp,
+		 const struct xfs_name *name),
+	TP_ARGS(ip, dp, name),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_ino_t, parent_ino)
+		__field(unsigned int, parent_gen)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, name->len)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->parent_ino = dp->i_ino;
+		__entry->parent_gen = VFS_IC(dp)->i_generation;
+		__entry->namelen = name->len;
+		memcpy(__get_str(name), name->name, name->len);
+	),
+	TP_printk("dev %d:%d ino 0x%llx parent_ino 0x%llx parent_gen 0x%x name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->parent_ino,
+		  __entry->parent_gen,
+		  __entry->namelen,
+		  __get_str(name))
+)
+#define DEFINE_XREP_PPTR_SCAN_EVENT(name) \
+DEFINE_EVENT(xrep_pptr_scan_class, name, \
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_inode *dp, \
+		 const struct xfs_name *name), \
+	TP_ARGS(ip, dp, name))
+DEFINE_XREP_PPTR_SCAN_EVENT(xrep_parent_stash_parentadd);
 
 TRACE_EVENT(xrep_nlinks_set_record,
 	TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino,



* [PATCH 07/14] xfs: implement live updates for parent pointer repairs
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
                     ` (5 preceding siblings ...)
  2024-04-10  1:05   ` [PATCH 06/14] xfs: repair directory parent pointers by scanning for dirents Darrick J. Wong
@ 2024-04-10  1:05   ` Darrick J. Wong
  2024-04-10  6:20     ` Christoph Hellwig
  2024-04-10  1:05   ` [PATCH 08/14] xfs: remove pointless unlocked assertion Darrick J. Wong
                     ` (6 subsequent siblings)
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:05 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

While we're scanning the filesystem for dirents that we can turn into
parent pointers, we cannot hold the IOLOCK or ILOCK of the file being
repaired.  Therefore, we need to set up a dirent hook so that we can
keep the temporary file's parent pointers up to date with the rest of
the filesystem.  Hence we add the ability to *remove* pptrs from the
temporary file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
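To illustrate why removals have to be stashed alongside additions, here
is a small user-space sketch of a hook that queues tagged updates and a
replay step that applies them in order.  All names are hypothetical and
nothing here is kernel API.  In particular, the scan cursor is modeled
as a single inode number on the assumption that directories below it
have already been scanned, which is a simplification of whatever the
real iscan code tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; none of these are kernel APIs. */
enum pptr_action {
	PPTR_ADD	= 1,
	PPTR_REMOVE	= 2,
};

#define MAX_UPDATES	16
#define MAX_PPTRS	16

struct update {
	enum pptr_action	action;
	uint64_t		parent_ino;
	char			name[32];
};

struct repair {
	uint64_t	target_ino;	/* file being repaired */
	uint64_t	scan_cursor;	/* assumed: inodes below this were scanned */
	struct update	queue[MAX_UPDATES];	/* stashed live updates */
	size_t		nr_queued;
	struct update	pptrs[MAX_PPTRS];	/* models the temp file */
	size_t		nr_pptrs;
	bool		aborted;
};

/* Called for every dirent change; delta > 0 is a link, otherwise an unlink. */
static void repair_dirent_hook(struct repair *rp, uint64_t parent_ino,
			       const char *name, uint64_t child_ino, int delta)
{
	struct update	*u;

	/* Not about the file we're repairing. */
	if (child_ino != rp->target_ino)
		return;

	/* The scan hasn't reached this directory yet; it will see the change. */
	if (parent_ino >= rp->scan_cursor)
		return;

	/* If we can't record the update, the whole repair must be abandoned. */
	if (rp->nr_queued == MAX_UPDATES ||
	    strlen(name) >= sizeof(rp->queue[0].name)) {
		rp->aborted = true;
		return;
	}

	u = &rp->queue[rp->nr_queued++];
	u->action = delta > 0 ? PPTR_ADD : PPTR_REMOVE;
	u->parent_ino = parent_ino;
	strcpy(u->name, name);
}

/* Find a parent pointer in the modeled temp file, or return -1. */
static long repair_find(const struct repair *rp, const struct update *u)
{
	size_t	i;

	for (i = 0; i < rp->nr_pptrs; i++) {
		if (rp->pptrs[i].parent_ino == u->parent_ino &&
		    !strcmp(rp->pptrs[i].name, u->name))
			return (long)i;
	}
	return -1;
}

/* Apply the queued updates to the modeled temp file, in arrival order. */
static int repair_replay(struct repair *rp)
{
	size_t	i;

	for (i = 0; i < rp->nr_queued; i++) {
		const struct update	*u = &rp->queue[i];
		long			idx = repair_find(rp, u);

		switch (u->action) {
		case PPTR_ADD:
			if (idx >= 0)
				break;
			if (rp->nr_pptrs == MAX_PPTRS)
				return -1;
			rp->pptrs[rp->nr_pptrs++] = *u;
			break;
		case PPTR_REMOVE:
			if (idx < 0)
				break;
			/* Overwrite the victim with the last entry. */
			rp->nr_pptrs--;
			rp->pptrs[idx] = rp->pptrs[rp->nr_pptrs];
			break;
		default:
			return -1;
		}
	}
	rp->nr_queued = 0;
	return 0;
}
```

If a name is linked and then unlinked while the scan is running, the
queue holds an add followed by a remove, and replaying them in order
leaves no parent pointer behind.  Without the remove action the temp
file would keep a stale entry.
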
 fs/xfs/scrub/parent_repair.c |  103 ++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/scrub/trace.h         |    2 +
 2 files changed, 100 insertions(+), 5 deletions(-)


diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
index b4084a9f0e9c8..311bc7990d7c7 100644
--- a/fs/xfs/scrub/parent_repair.c
+++ b/fs/xfs/scrub/parent_repair.c
@@ -70,6 +70,12 @@
  * disrupt attrmulti cursors.
  */
 
+/* Create a parent pointer in the tempfile. */
+#define XREP_PPTR_ADD		(1)
+
+/* Remove a parent pointer from the tempfile. */
+#define XREP_PPTR_REMOVE	(2)
+
 /* A stashed parent pointer update. */
 struct xrep_pptr {
 	/* Cookie for retrieval of the pptr name. */
@@ -80,6 +86,9 @@ struct xrep_pptr {
 
 	/* Length of the pptr name. */
 	uint8_t			namelen;
+
+	/* XREP_PPTR_{ADD,REMOVE} */
+	uint8_t			action;
 };
 
 /*
@@ -225,11 +234,25 @@ xrep_parent_replay_update(
 {
 	struct xfs_scrub	*sc = rp->sc;
 
-	/* Create parent pointer. */
-	trace_xrep_parent_replay_parentadd(sc->tempip, xname, &pptr->pptr_rec);
+	switch (pptr->action) {
+	case XREP_PPTR_ADD:
+		/* Create parent pointer. */
+		trace_xrep_parent_replay_parentadd(sc->tempip, xname,
+				&pptr->pptr_rec);
 
-	return xfs_parent_set(sc->tempip, sc->ip->i_ino, xname,
-			&pptr->pptr_rec, &rp->pptr_args);
+		return xfs_parent_set(sc->tempip, sc->ip->i_ino, xname,
+				&pptr->pptr_rec, &rp->pptr_args);
+	case XREP_PPTR_REMOVE:
+		/* Remove parent pointer. */
+		trace_xrep_parent_replay_parentremove(sc->tempip, xname,
+				&pptr->pptr_rec);
+
+		return xfs_parent_unset(sc->tempip, sc->ip->i_ino, xname,
+				&pptr->pptr_rec, &rp->pptr_args);
+	}
+
+	ASSERT(0);
+	return -EIO;
 }
 
 /*
@@ -290,6 +313,7 @@ xrep_parent_stash_parentadd(
 	const struct xfs_inode	*dp)
 {
 	struct xrep_pptr	pptr = {
+		.action		= XREP_PPTR_ADD,
 		.namelen	= name->len,
 	};
 	int			error;
@@ -304,6 +328,32 @@ xrep_parent_stash_parentadd(
 	return xfarray_append(rp->pptr_recs, &pptr);
 }
 
+/*
+ * Remember that we want to remove a parent pointer from the tempfile.  These
+ * stashed actions will be replayed later.
+ */
+STATIC int
+xrep_parent_stash_parentremove(
+	struct xrep_parent	*rp,
+	const struct xfs_name	*name,
+	const struct xfs_inode	*dp)
+{
+	struct xrep_pptr	pptr = {
+		.action		= XREP_PPTR_REMOVE,
+		.namelen	= name->len,
+	};
+	int			error;
+
+	trace_xrep_parent_stash_parentremove(rp->sc->tempip, dp, name);
+
+	xfs_inode_to_parent_rec(&pptr.pptr_rec, dp);
+	error = xfblob_storename(rp->pptr_names, &pptr.name_cookie, name);
+	if (error)
+		return error;
+
+	return xfarray_append(rp->pptr_recs, &pptr);
+}
+
 /*
  * Examine an entry of a directory.  If this dirent leads us back to the file
  * whose parent pointers we're rebuilding, add a pptr to the temporary
@@ -513,6 +563,48 @@ xrep_parent_scan_dirtree(
 	return 0;
 }
 
+/*
+ * Capture dirent updates being made by other threads which are relevant to the
+ * file being repaired.
+ */
+STATIC int
+xrep_parent_live_update(
+	struct notifier_block		*nb,
+	unsigned long			action,
+	void				*data)
+{
+	struct xfs_dir_update_params	*p = data;
+	struct xrep_parent		*rp;
+	struct xfs_scrub		*sc;
+	int				error;
+
+	rp = container_of(nb, struct xrep_parent, pscan.dhook.dirent_hook.nb);
+	sc = rp->sc;
+
+	/*
+	 * This thread updated a dirent that points to the file that we're
+	 * repairing, so stash the update for replay against the temporary
+	 * file.
+	 */
+	if (p->ip->i_ino == sc->ip->i_ino &&
+	    xchk_iscan_want_live_update(&rp->pscan.iscan, p->dp->i_ino)) {
+		mutex_lock(&rp->pscan.lock);
+		if (p->delta > 0)
+			error = xrep_parent_stash_parentadd(rp, p->name, p->dp);
+		else
+			error = xrep_parent_stash_parentremove(rp, p->name,
+					p->dp);
+		mutex_unlock(&rp->pscan.lock);
+		if (error)
+			goto out_abort;
+	}
+
+	return NOTIFY_DONE;
+out_abort:
+	xchk_iscan_abort(&rp->pscan.iscan);
+	return NOTIFY_DONE;
+}
+
 /* Reset a directory's dotdot entry, if needed. */
 STATIC int
 xrep_parent_reset_dotdot(
@@ -684,7 +776,8 @@ xrep_parent_setup_scan(
 	if (error)
 		goto out_recs;
 
-	error = xrep_findparent_scan_start(sc, &rp->pscan);
+	error = __xrep_findparent_scan_start(sc, &rp->pscan,
+			xrep_parent_live_update);
 	if (error)
 		goto out_names;
 
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 10c2a8d10058b..3e0cd482379c6 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -2821,6 +2821,7 @@ DEFINE_EVENT(xrep_pptr_class, name, \
 DEFINE_XREP_PPTR_EVENT(xrep_xattr_replay_parentadd);
 DEFINE_XREP_PPTR_EVENT(xrep_xattr_replay_parentremove);
 DEFINE_XREP_PPTR_EVENT(xrep_parent_replay_parentadd);
+DEFINE_XREP_PPTR_EVENT(xrep_parent_replay_parentremove);
 
 DECLARE_EVENT_CLASS(xrep_pptr_scan_class,
 	TP_PROTO(struct xfs_inode *ip, const struct xfs_inode *dp,
@@ -2856,6 +2857,7 @@ DEFINE_EVENT(xrep_pptr_scan_class, name, \
 		 const struct xfs_name *name), \
 	TP_ARGS(ip, dp, name))
 DEFINE_XREP_PPTR_SCAN_EVENT(xrep_parent_stash_parentadd);
+DEFINE_XREP_PPTR_SCAN_EVENT(xrep_parent_stash_parentremove);
 
 TRACE_EVENT(xrep_nlinks_set_record,
 	TP_PROTO(struct xfs_mount *mp, xfs_ino_t ino,



* [PATCH 08/14] xfs: remove pointless unlocked assertion
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
                     ` (6 preceding siblings ...)
  2024-04-10  1:05   ` [PATCH 07/14] xfs: implement live updates for parent pointer repairs Darrick J. Wong
@ 2024-04-10  1:05   ` Darrick J. Wong
  2024-04-10  6:20     ` Christoph Hellwig
  2024-04-10  1:06   ` [PATCH 09/14] xfs: split xfs_bmap_add_attrfork into two pieces Darrick J. Wong
                     ` (5 subsequent siblings)
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:05 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Remove this assertion about the inode not having an attr fork from
xfs_bmap_add_attrfork because the function handles that case just fine.
Weirder still, the function actually /requires/ the caller not to hold
the ILOCK, which means that its accesses are not stabilized.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_bmap.c |    2 --
 1 file changed, 2 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 59b8b9dc29ccf..55a64a72771f2 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1041,8 +1041,6 @@ xfs_bmap_add_attrfork(
 	int			logflags;	/* logging flags */
 	int			error;		/* error return value */
 
-	ASSERT(xfs_inode_has_attr_fork(ip) == 0);
-
 	mp = ip->i_mount;
 	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
 



* [PATCH 09/14] xfs: split xfs_bmap_add_attrfork into two pieces
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
                     ` (7 preceding siblings ...)
  2024-04-10  1:05   ` [PATCH 08/14] xfs: remove pointless unlocked assertion Darrick J. Wong
@ 2024-04-10  1:06   ` Darrick J. Wong
  2024-04-10  6:21     ` Christoph Hellwig
  2024-04-10  1:06   ` [PATCH 10/14] xfs: add a per-leaf block callback to xchk_xattr_walk Darrick J. Wong
                     ` (4 subsequent siblings)
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:06 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Split this function into two pieces -- one to make the actual changes to
the inode core to add the attr fork, and another one to deal with
getting the transaction and locking the inodes.

The next couple of patches will need this to be split into two.  One
patch implements committing new parent pointer recordsets to damaged
files.  If one file has an attr fork and the other does not, we have to
create the missing attr fork before the atomic swap transaction, and can
use the behavior encoded in the current xfs_bmap_add_attrfork.

The second patch adapts /lost+found adoptions to handle parent pointers
correctly.  The adoption process will add a parent pointer to a child
that is being moved to /lost+found, but this requires that the attr fork
already exists.  We don't know if we're actually going to commit the
adoption until we've already reserved a transaction and taken the
ILOCKs, which means that we must have a way to bypass the start of the
current xfs_bmap_add_attrfork.

Therefore, create xfs_attr_add_fork as the helper that creates a
transaction and takes locks; and make xfs_bmap_add_attrfork the function
that updates the inode core and allocates the incore attr fork.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
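The split described in the commit message follows a common pattern: a
core function that assumes the caller already holds the lock, plus a
thin wrapper that takes the lock and handles the "nothing to do" case.
The user-space sketch below shows only that pattern.  The names are
hypothetical, the lock is modeled as a boolean flag, and there is no
transaction handling at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of an inode; not the kernel's struct xfs_inode. */
struct inode {
	bool	locked;		/* stands in for holding ILOCK_EXCL */
	bool	has_attr_fork;
	int	forkoff;	/* stands in for the inode core update */
};

/*
 * Core piece: the caller must hold the lock and the inode must not have
 * an attr fork yet.  Callers that already hold the lock use this directly.
 */
static int inode_add_attrfork_locked(struct inode *ip, int size)
{
	assert(ip->locked);

	if (ip->has_attr_fork)
		return -EEXIST;

	ip->has_attr_fork = true;
	ip->forkoff = size;
	return 0;
}

/*
 * Wrapper piece: the caller must not hold the lock.  Take the lock, do
 * nothing if the fork already exists, otherwise call the core piece.
 */
static int inode_ensure_attrfork(struct inode *ip, int size)
{
	int	error = 0;

	assert(!ip->locked);

	ip->locked = true;
	if (!ip->has_attr_fork)
		error = inode_add_attrfork_locked(ip, size);
	ip->locked = false;
	return error;
}
```

An unlocked caller uses inode_ensure_attrfork(), while a caller that
has already taken the lock for other reasons calls
inode_add_attrfork_locked() after checking has_attr_fork itself.
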
 fs/xfs/libxfs/xfs_attr.c |   39 ++++++++++++++++++++++++++++++++++++++-
 fs/xfs/libxfs/xfs_bmap.c |   36 ++++++++++--------------------------
 fs/xfs/libxfs/xfs_bmap.h |    3 ++-
 3 files changed, 50 insertions(+), 28 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 83f8cf551816a..94f5acaad6e09 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -950,6 +950,43 @@ xfs_attr_lookup(
 	return error;
 }
 
+STATIC int
+xfs_attr_add_fork(
+	struct xfs_inode	*ip,		/* incore inode pointer */
+	int			size,		/* space new attribute needs */
+	int			rsvd)		/* xact may use reserved blks */
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_trans	*tp;		/* transaction pointer */
+	unsigned int		blks;		/* space reservation */
+	int			error;		/* error return value */
+
+	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
+
+	blks = XFS_ADDAFORK_SPACE_RES(mp);
+
+	error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_addafork, blks, 0,
+			rsvd, &tp);
+	if (error)
+		return error;
+
+	if (xfs_inode_has_attr_fork(ip))
+		goto trans_cancel;
+
+	error = xfs_bmap_add_attrfork(tp, ip, size, rsvd);
+	if (error)
+		goto trans_cancel;
+
+	error = xfs_trans_commit(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return error;
+
+trans_cancel:
+	xfs_trans_cancel(tp);
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	return error;
+}
+
 /*
  * Before updating xattrs, add an attribute fork if the inode doesn't have.
  * (inode must not be locked when we call this routine)
@@ -968,7 +1005,7 @@ xfs_attr_ensure_fork(
 			xfs_attr_sf_entsize_byname(args->namelen,
 						   args->valuelen);
 
-	return xfs_bmap_add_attrfork(args->dp, sf_size, rsvd);
+	return xfs_attr_add_fork(args->dp, sf_size, rsvd);
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index 55a64a72771f2..b01d477d1cfbc 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -1025,38 +1025,29 @@ xfs_bmap_set_attrforkoff(
 }
 
 /*
- * Convert inode from non-attributed to attributed.
- * Must not be in a transaction, ip must not be locked.
+ * Convert inode from non-attributed to attributed.  Caller must hold the
+ * ILOCK_EXCL and the file cannot have an attr fork.
  */
 int						/* error code */
 xfs_bmap_add_attrfork(
-	xfs_inode_t		*ip,		/* incore inode pointer */
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,		/* incore inode pointer */
 	int			size,		/* space new attribute needs */
 	int			rsvd)		/* xact may use reserved blks */
 {
-	xfs_mount_t		*mp;		/* mount structure */
-	xfs_trans_t		*tp;		/* transaction pointer */
-	int			blks;		/* space reservation */
+	struct xfs_mount	*mp = tp->t_mountp;
 	int			version = 1;	/* superblock attr version */
 	int			logflags;	/* logging flags */
 	int			error;		/* error return value */
 
-	mp = ip->i_mount;
+	xfs_assert_ilocked(ip, XFS_ILOCK_EXCL);
 	ASSERT(!XFS_NOT_DQATTACHED(mp, ip));
-
-	blks = XFS_ADDAFORK_SPACE_RES(mp);
-
-	error = xfs_trans_alloc_inode(ip, &M_RES(mp)->tr_addafork, blks, 0,
-			rsvd, &tp);
-	if (error)
-		return error;
-	if (xfs_inode_has_attr_fork(ip))
-		goto trans_cancel;
+	ASSERT(!xfs_inode_has_attr_fork(ip));
 
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 	error = xfs_bmap_set_attrforkoff(ip, size, &version);
 	if (error)
-		goto trans_cancel;
+		return error;
 
 	xfs_ifork_init_attr(ip, XFS_DINODE_FMT_EXTENTS, 0);
 	logflags = 0;
@@ -1077,7 +1068,7 @@ xfs_bmap_add_attrfork(
 	if (logflags)
 		xfs_trans_log_inode(tp, ip, logflags);
 	if (error)
-		goto trans_cancel;
+		return error;
 	if (!xfs_has_attr(mp) ||
 	   (!xfs_has_attr2(mp) && version == 2)) {
 		bool log_sb = false;
@@ -1096,14 +1087,7 @@ xfs_bmap_add_attrfork(
 			xfs_log_sb(tp);
 	}
 
-	error = xfs_trans_commit(tp);
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	return error;
-
-trans_cancel:
-	xfs_trans_cancel(tp);
-	xfs_iunlock(ip, XFS_ILOCK_EXCL);
-	return error;
+	return 0;
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_bmap.h b/fs/xfs/libxfs/xfs_bmap.h
index 32fb2a455c294..e98849eb9bbae 100644
--- a/fs/xfs/libxfs/xfs_bmap.h
+++ b/fs/xfs/libxfs/xfs_bmap.h
@@ -176,7 +176,8 @@ int	xfs_bmap_longest_free_extent(struct xfs_perag *pag,
 void	xfs_trim_extent(struct xfs_bmbt_irec *irec, xfs_fileoff_t bno,
 		xfs_filblks_t len);
 unsigned int xfs_bmap_compute_attr_offset(struct xfs_mount *mp);
-int	xfs_bmap_add_attrfork(struct xfs_inode *ip, int size, int rsvd);
+int	xfs_bmap_add_attrfork(struct xfs_trans *tp, struct xfs_inode *ip,
+		int size, int rsvd);
 void	xfs_bmap_local_to_extents_empty(struct xfs_trans *tp,
 		struct xfs_inode *ip, int whichfork);
 int xfs_bmap_local_to_extents(struct xfs_trans *tp, struct xfs_inode *ip,



* [PATCH 10/14] xfs: add a per-leaf block callback to xchk_xattr_walk
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
                     ` (8 preceding siblings ...)
  2024-04-10  1:06   ` [PATCH 09/14] xfs: split xfs_bmap_add_attrfork into two pieces Darrick J. Wong
@ 2024-04-10  1:06   ` Darrick J. Wong
  2024-04-10  6:22     ` Christoph Hellwig
  2024-04-10  1:06   ` [PATCH 11/14] xfs: actually rebuild the parent pointer xattrs Darrick J. Wong
                     ` (3 subsequent siblings)
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:06 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add a second callback function to xchk_xattr_walk so that we can do
something in between attr leaf blocks.  This will be used by the next
patch to see if we should flush cached parent pointer updates to
constrain memory usage.
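
As an illustration only (not part of this patch), the two-callback walk
can be modelled in userspace C.  Everything named toy_* is hypothetical;
the sketch only assumes what the diff below shows: the per-attr callback
runs for each entry, the optional per-leaf callback runs after each leaf
block has been processed, and any nonzero return stops the walk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types, loosely shaped like the scrub ones. */
typedef int (*toy_attr_fn)(int attr, void *priv);
typedef int (*toy_leaf_fn)(void *priv);

/* Hypothetical leaf block: just an array of attr ids. */
struct toy_leaf {
	const int	*attrs;
	size_t		nr;
};

/* Walk every attr in every leaf; call leaf_fn (if any) between leaves. */
static int toy_xattr_walk(const struct toy_leaf *leaves, size_t nr_leaves,
		toy_attr_fn attr_fn, toy_leaf_fn leaf_fn, void *priv)
{
	size_t	i, j;
	int	error;

	for (i = 0; i < nr_leaves; i++) {
		for (j = 0; j < leaves[i].nr; j++) {
			error = attr_fn(leaves[i].attrs[j], priv);
			if (error)
				return error;
		}

		/* Done with this leaf; let the caller flush or bail out. */
		if (leaf_fn) {
			error = leaf_fn(priv);
			if (error)
				return error;
		}
	}
	return 0;
}

/* Example callbacks that only count how often they were invoked. */
struct toy_counts {
	int	attrs;
	int	leaves;
};

static int toy_count_attr(int attr, void *priv)
{
	struct toy_counts	*c = priv;

	(void)attr;
	c->attrs++;
	return 0;
}

static int toy_count_leaf(void *priv)
{
	struct toy_counts	*c = priv;

	c->leaves++;
	return 0;
}
```

Passing NULL for the leaf callback gives the old behavior, which is what
every existing caller does in this patch.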

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/attr.c       |    2 +-
 fs/xfs/scrub/dir_repair.c |    2 +-
 fs/xfs/scrub/listxattr.c  |   10 +++++++++-
 fs/xfs/scrub/listxattr.h  |    4 +++-
 fs/xfs/scrub/nlinks.c     |    3 ++-
 fs/xfs/scrub/parent.c     |    7 ++++---
 6 files changed, 20 insertions(+), 8 deletions(-)


diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index b91234bbd58aa..cba09ad9d45dc 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -661,7 +661,7 @@ xchk_xattr(
 	 * iteration, which doesn't really follow the usual buffer
 	 * locking order.
 	 */
-	error = xchk_xattr_walk(sc, sc->ip, xchk_xattr_actor, NULL);
+	error = xchk_xattr_walk(sc, sc->ip, xchk_xattr_actor, NULL, NULL);
 	if (!xchk_fblock_process_error(sc, XFS_ATTR_FORK, 0, &error))
 		return error;
 
diff --git a/fs/xfs/scrub/dir_repair.c b/fs/xfs/scrub/dir_repair.c
index 24c46211d9243..a8932d97ae886 100644
--- a/fs/xfs/scrub/dir_repair.c
+++ b/fs/xfs/scrub/dir_repair.c
@@ -1285,7 +1285,7 @@ xrep_dir_scan_file(
 		goto scan_done;
 	}
 
-	error = xchk_xattr_walk(rd->sc, ip, xrep_dir_scan_pptr, rd);
+	error = xchk_xattr_walk(rd->sc, ip, xrep_dir_scan_pptr, NULL, rd);
 	if (error)
 		goto scan_done;
 
diff --git a/fs/xfs/scrub/listxattr.c b/fs/xfs/scrub/listxattr.c
index cbe5911ecbbcf..256ff7700c942 100644
--- a/fs/xfs/scrub/listxattr.c
+++ b/fs/xfs/scrub/listxattr.c
@@ -221,6 +221,7 @@ xchk_xattr_walk_node(
 	struct xfs_scrub		*sc,
 	struct xfs_inode		*ip,
 	xchk_xattr_fn			attr_fn,
+	xchk_xattrleaf_fn		leaf_fn,
 	void				*priv)
 {
 	struct xfs_attr3_icleaf_hdr	leafhdr;
@@ -252,6 +253,12 @@ xchk_xattr_walk_node(
 
 		xfs_trans_brelse(sc->tp, leaf_bp);
 
+		if (leaf_fn) {
+			error = leaf_fn(sc, priv);
+			if (error)
+				goto out_bitmap;
+		}
+
 		/* Make sure we haven't seen this new leaf already. */
 		len = 1;
 		if (xdab_bitmap_test(&seen_dablks, leafhdr.forw, &len)) {
@@ -288,6 +295,7 @@ xchk_xattr_walk(
 	struct xfs_scrub	*sc,
 	struct xfs_inode	*ip,
 	xchk_xattr_fn		attr_fn,
+	xchk_xattrleaf_fn	leaf_fn,
 	void			*priv)
 {
 	int			error;
@@ -308,5 +316,5 @@ xchk_xattr_walk(
 	if (xfs_attr_is_leaf(ip))
 		return xchk_xattr_walk_leaf(sc, ip, attr_fn, priv);
 
-	return xchk_xattr_walk_node(sc, ip, attr_fn, priv);
+	return xchk_xattr_walk_node(sc, ip, attr_fn, leaf_fn, priv);
 }
diff --git a/fs/xfs/scrub/listxattr.h b/fs/xfs/scrub/listxattr.h
index 48fe89d05946b..703cfb7b14cfd 100644
--- a/fs/xfs/scrub/listxattr.h
+++ b/fs/xfs/scrub/listxattr.h
@@ -11,7 +11,9 @@ typedef int (*xchk_xattr_fn)(struct xfs_scrub *sc, struct xfs_inode *ip,
 		unsigned int namelen, const void *value, unsigned int valuelen,
 		void *priv);
 
+typedef int (*xchk_xattrleaf_fn)(struct xfs_scrub *sc, void *priv);
+
 int xchk_xattr_walk(struct xfs_scrub *sc, struct xfs_inode *ip,
-		xchk_xattr_fn attr_fn, void *priv);
+		xchk_xattr_fn attr_fn, xchk_xattrleaf_fn leaf_fn, void *priv);
 
 #endif /* __XFS_SCRUB_LISTXATTR_H__ */
diff --git a/fs/xfs/scrub/nlinks.c b/fs/xfs/scrub/nlinks.c
index a733e4e178de4..124a8d6b7a04d 100644
--- a/fs/xfs/scrub/nlinks.c
+++ b/fs/xfs/scrub/nlinks.c
@@ -431,7 +431,8 @@ xchk_nlinks_collect_dir(
 			goto out_unlock;
 		}
 
-		error = xchk_xattr_walk(sc, dp, xchk_nlinks_collect_pptr, xnc);
+		error = xchk_xattr_walk(sc, dp, xchk_nlinks_collect_pptr, NULL,
+				xnc);
 		if (error == -ECANCELED) {
 			error = 0;
 			goto out_unlock;
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index 57b49fbf97a30..8f92a54c6d981 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -314,7 +314,7 @@ xchk_parent_pptr_and_dotdot(
 		return 0;
 
 	/* Otherwise, walk the pptrs again, and check. */
-	error = xchk_xattr_walk(sc, sc->ip, xchk_parent_scan_dotdot, pp);
+	error = xchk_xattr_walk(sc, sc->ip, xchk_parent_scan_dotdot, NULL, pp);
 	if (error == -ECANCELED) {
 		/* Found a parent pointer that matches dotdot. */
 		return 0;
@@ -692,7 +692,8 @@ xchk_parent_count_pptrs(
 	 */
 	if (pp->need_revalidate) {
 		pp->pptrs_found = 0;
-		error = xchk_xattr_walk(sc, sc->ip, xchk_parent_count_pptr, pp);
+		error = xchk_xattr_walk(sc, sc->ip, xchk_parent_count_pptr,
+				NULL, pp);
 		if (error == -EFSCORRUPTED) {
 			/* Found a bad parent pointer */
 			xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
@@ -751,7 +752,7 @@ xchk_parent_pptr(
 	if (error)
 		goto out_entries;
 
-	error = xchk_xattr_walk(sc, sc->ip, xchk_parent_scan_attr, pp);
+	error = xchk_xattr_walk(sc, sc->ip, xchk_parent_scan_attr, NULL, pp);
 	if (error == -ECANCELED) {
 		error = 0;
 		goto out_names;



* [PATCH 11/14] xfs: actually rebuild the parent pointer xattrs
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
                     ` (9 preceding siblings ...)
  2024-04-10  1:06   ` [PATCH 10/14] xfs: add a per-leaf block callback to xchk_xattr_walk Darrick J. Wong
@ 2024-04-10  1:06   ` Darrick J. Wong
  2024-04-10  6:22     ` Christoph Hellwig
  2024-04-10  1:06   ` [PATCH 12/14] xfs: adapt the orphanage code to handle parent pointers Darrick J. Wong
                     ` (2 subsequent siblings)
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:06 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Once we've assembled all the parent pointers for a file, we need to
commit the new dataset atomically to that file.  Parent pointer records
are embedded in the xattr structure, which means that we must write a
new extended attribute structure, again, atomically.  Therefore, we must
copy the non-parent-pointer attributes from the file being repaired into
the temporary file's extended attributes and then call the atomic extent
swap mechanism to exchange the blocks.
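
As an illustration only (not part of this patch), the overall commit
strategy can be modelled in userspace C.  The toy_* names, the
fixed-size record array, and the -ENOSPC limit are all assumptions made
for the sketch; the real code stashes records in xfarray/xfblob
structures and exchanges attr fork mappings rather than struct contents.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_RECS	8

/* Hypothetical xattr record: a name plus a "this is a pptr" flag. */
struct toy_xattr {
	const char	*name;
	int		is_parent;
};

/* Hypothetical attr fork: a small fixed array of records. */
struct toy_fork {
	struct toy_xattr	recs[TOY_MAX_RECS];
	size_t			nr;
};

static int toy_fork_add(struct toy_fork *f, const char *name, int is_parent)
{
	if (f->nr >= TOY_MAX_RECS)
		return -ENOSPC;
	f->recs[f->nr].name = name;
	f->recs[f->nr].is_parent = is_parent;
	f->nr++;
	return 0;
}

/* Copy everything that is not a parent pointer from @src into @tmp. */
static int toy_copy_non_pptrs(const struct toy_fork *src, struct toy_fork *tmp)
{
	size_t	i;
	int	error;

	for (i = 0; i < src->nr; i++) {
		if (src->recs[i].is_parent)
			continue;
		error = toy_fork_add(tmp, src->recs[i].name, 0);
		if (error)
			return error;
	}
	return 0;
}

/* Exchange the two forks in one step, modelling the atomic swap. */
static void toy_exchange(struct toy_fork *a, struct toy_fork *b)
{
	struct toy_fork	scratch = *a;

	*a = *b;
	*b = scratch;
}
```

After the exchange the repaired file holds the rebuilt parent pointers
plus its original non-pptr attrs, and the temporary file holds the old
(possibly damaged) contents, which can then be discarded.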

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c     |    2 
 fs/xfs/libxfs/xfs_attr.h     |    1 
 fs/xfs/scrub/attr_repair.c   |    4 
 fs/xfs/scrub/attr_repair.h   |    4 
 fs/xfs/scrub/findparent.c    |    2 
 fs/xfs/scrub/parent_repair.c |  706 +++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/scrub/trace.h         |    2 
 7 files changed, 698 insertions(+), 23 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 94f5acaad6e09..76d40daa02cff 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -950,7 +950,7 @@ xfs_attr_lookup(
 	return error;
 }
 
-STATIC int
+int
 xfs_attr_add_fork(
 	struct xfs_inode	*ip,		/* incore inode pointer */
 	int			size,		/* space new attribute needs */
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index d51001c5809fe..2a0ef4f633e2d 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -643,5 +643,6 @@ int __init xfs_attr_intent_init_cache(void);
 void xfs_attr_intent_destroy_cache(void);
 
 int xfs_attr_sf_totsize(struct xfs_inode *dp);
+int xfs_attr_add_fork(struct xfs_inode *ip, int size, int rsvd);
 
 #endif	/* __XFS_ATTR_H__ */
diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
index 9cf002bc18042..e06d00ea828b3 100644
--- a/fs/xfs/scrub/attr_repair.c
+++ b/fs/xfs/scrub/attr_repair.c
@@ -1032,7 +1032,7 @@ xrep_xattr_reset_fork(
  * fork.  The caller must ILOCK the tempfile and join it to the transaction.
  * This function returns with the inode joined to a clean scrub transaction.
  */
-STATIC int
+int
 xrep_xattr_reset_tempfile_fork(
 	struct xfs_scrub	*sc)
 {
@@ -1338,7 +1338,7 @@ xrep_xattr_swap_prep(
 }
 
 /* Exchange the temporary file's attribute fork with the one being repaired. */
-STATIC int
+int
 xrep_xattr_swap(
 	struct xfs_scrub	*sc,
 	struct xrep_tempexch	*tx)
diff --git a/fs/xfs/scrub/attr_repair.h b/fs/xfs/scrub/attr_repair.h
index 0a9ffa7cfa906..979729bd4a5f8 100644
--- a/fs/xfs/scrub/attr_repair.h
+++ b/fs/xfs/scrub/attr_repair.h
@@ -6,6 +6,10 @@
 #ifndef __XFS_SCRUB_ATTR_REPAIR_H__
 #define __XFS_SCRUB_ATTR_REPAIR_H__
 
+struct xrep_tempexch;
+
+int xrep_xattr_swap(struct xfs_scrub *sc, struct xrep_tempexch *tx);
 int xrep_xattr_reset_fork(struct xfs_scrub *sc);
+int xrep_xattr_reset_tempfile_fork(struct xfs_scrub *sc);
 
 #endif /* __XFS_SCRUB_ATTR_REPAIR_H__ */
diff --git a/fs/xfs/scrub/findparent.c b/fs/xfs/scrub/findparent.c
index c78422ad757bf..01766041ba2cd 100644
--- a/fs/xfs/scrub/findparent.c
+++ b/fs/xfs/scrub/findparent.c
@@ -24,6 +24,7 @@
 #include "xfs_trans_space.h"
 #include "xfs_health.h"
 #include "xfs_exchmaps.h"
+#include "xfs_parent.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -33,6 +34,7 @@
 #include "scrub/findparent.h"
 #include "scrub/readdir.h"
 #include "scrub/tempfile.h"
+#include "scrub/listxattr.h"
 
 /*
  * Finding the Parent of a Directory
diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
index 311bc7990d7c7..02554c99d231f 100644
--- a/fs/xfs/scrub/parent_repair.c
+++ b/fs/xfs/scrub/parent_repair.c
@@ -25,6 +25,8 @@
 #include "xfs_health.h"
 #include "xfs_exchmaps.h"
 #include "xfs_parent.h"
+#include "xfs_attr.h"
+#include "xfs_bmap.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -34,10 +36,13 @@
 #include "scrub/findparent.h"
 #include "scrub/readdir.h"
 #include "scrub/tempfile.h"
+#include "scrub/tempexch.h"
 #include "scrub/orphanage.h"
 #include "scrub/xfile.h"
 #include "scrub/xfarray.h"
 #include "scrub/xfblob.h"
+#include "scrub/attr_repair.h"
+#include "scrub/listxattr.h"
 
 /*
  * Repairing The Directory Parent Pointer
@@ -65,9 +70,9 @@
  *
  * When salvaging completes, the remaining stashed entries are replayed to the
  * temporary file.  All non-parent pointer extended attributes are copied to
- * the temporary file's extended attributes.  An atomic extent swap is used to
- * commit the new directory blocks to the directory being repaired.  This will
- * disrupt attrmulti cursors.
+ * the temporary file's extended attributes.  An atomic file mapping exchange
+ * is used to commit the new xattr blocks to the file being repaired.  This
+ * will disrupt attrmulti cursors.
  */
 
 /* Create a parent pointer in the tempfile. */
@@ -106,6 +111,23 @@ struct xrep_parent {
 	/* Blobs containing parent pointer names. */
 	struct xfblob		*pptr_names;
 
+	/* xattr keys */
+	struct xfarray		*xattr_records;
+
+	/* xattr values */
+	struct xfblob		*xattr_blobs;
+
+	/* Scratch buffers for saving extended attributes */
+	unsigned char		*xattr_name;
+	void			*xattr_value;
+	unsigned int		xattr_value_sz;
+
+	/*
+	 * Information used to exchange the attr fork mappings, if the fs
+	 * supports parent pointers.
+	 */
+	struct xrep_tempexch	tx;
+
 	/*
 	 * Information used to scan the filesystem to find the inumber of the
 	 * dotdot entry for this directory.  On filesystems without parent
@@ -117,6 +139,8 @@ struct xrep_parent {
 	 * @pscan.lock coordinates access to pptr_recs, pptr_names, pptr, and
 	 * pptr_scratch.  This reduces the memory requirements of this
 	 * structure.
+	 *
+	 * The lock also controls access to xattr_records and xattr_blobs(?)
 	 */
 	struct xrep_parent_scan_info pscan;
 
@@ -129,14 +153,48 @@ struct xrep_parent {
 
 	/* Scratch buffer for scanning pptr xattrs */
 	struct xfs_da_args	pptr_args;
+
+	/* Have we seen any live updates of parent pointers recently? */
+	bool			saw_pptr_updates;
 };
 
+struct xrep_parent_xattr {
+	/* Cookie for retrieval of the xattr name. */
+	xfblob_cookie		name_cookie;
+
+	/* Cookie for retrieval of the xattr value. */
+	xfblob_cookie		value_cookie;
+
+	/* XFS_ATTR_* flags */
+	int			flags;
+
+	/* Length of the value and name. */
+	uint32_t		valuelen;
+	uint16_t		namelen;
+};
+
+/*
+ * Stash up to 8 pages of attrs in xattr_records/xattr_blobs before we write
+ * them to the temp file.
+ */
+#define XREP_PARENT_XATTR_MAX_STASH_BYTES	(PAGE_SIZE * 8)
+
 /* Tear down all the incore stuff we created. */
 static void
 xrep_parent_teardown(
 	struct xrep_parent	*rp)
 {
 	xrep_findparent_scan_teardown(&rp->pscan);
+	kvfree(rp->xattr_name);
+	rp->xattr_name = NULL;
+	kvfree(rp->xattr_value);
+	rp->xattr_value = NULL;
+	if (rp->xattr_blobs)
+		xfblob_destroy(rp->xattr_blobs);
+	rp->xattr_blobs = NULL;
+	if (rp->xattr_records)
+		xfarray_destroy(rp->xattr_records);
+	rp->xattr_records = NULL;
 	if (rp->pptr_names)
 		xfblob_destroy(rp->pptr_names);
 	rp->pptr_names = NULL;
@@ -556,10 +614,11 @@ xrep_parent_scan_dirtree(
 	}
 
 	/*
-	 * Cancel the empty transaction so that we can (later) use the atomic
-	 * extent swap helpers to lock files and commit the new directory.
+	 * Retake sc->ip's ILOCK now that we're done flushing stashed parent
+	 * pointers.  We end this function with an empty transaction and the
+	 * ILOCK.
 	 */
-	xchk_trans_cancel(rp->sc);
+	xchk_ilock(rp->sc, XFS_ILOCK_EXCL);
 	return 0;
 }
 
@@ -594,6 +653,8 @@ xrep_parent_live_update(
 		else
 			error = xrep_parent_stash_parentremove(rp, p->name,
 					p->dp);
+		if (!error)
+			rp->saw_pptr_updates = true;
 		mutex_unlock(&rp->pscan.lock);
 		if (error)
 			goto out_abort;
@@ -648,6 +709,52 @@ xrep_parent_reset_dotdot(
 	return xfs_trans_roll(&sc->tp);
 }
 
+/* Pass back the parent inumber if this is a parent pointer */
+STATIC int
+xrep_parent_lookup_pptr(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen,
+	void			*priv)
+{
+	xfs_ino_t		*inop = priv;
+	xfs_ino_t		parent_ino;
+	int			ret;
+
+	ret = xfs_parent_from_xattr(sc->mp, attr_flags, name, namelen,
+			value, valuelen, &parent_ino, NULL);
+	if (ret != 1)
+		return ret;
+
+	*inop = parent_ino;
+	return -ECANCELED;
+}
+
+/*
+ * Find the first parent of the scrub target by walking parent pointers for
+ * the purpose of deciding if we're going to move it to the orphanage.
+ * We don't care if the attr fork is zapped.
+ */
+STATIC int
+xrep_parent_lookup_pptrs(
+	struct xfs_scrub	*sc,
+	xfs_ino_t		*inop)
+{
+	int			error;
+
+	*inop = NULLFSINO;
+
+	error = xchk_xattr_walk(sc, sc->ip, xrep_parent_lookup_pptr, NULL,
+			inop);
+	if (error && error != -ECANCELED)
+		return error;
+	return 0;
+}
+
 /*
  * Move the current file to the orphanage.
  *
@@ -664,14 +771,26 @@ xrep_parent_move_to_orphanage(
 	xfs_ino_t		orig_parent, new_parent;
 	int			error;
 
-	/*
-	 * We are about to drop the ILOCK on sc->ip to lock the orphanage and
-	 * prepare for the adoption.  Therefore, look up the old dotdot entry
-	 * for sc->ip so that we can compare it after we re-lock sc->ip.
-	 */
-	error = xchk_dir_lookup(sc, sc->ip, &xfs_name_dotdot, &orig_parent);
-	if (error)
-		return error;
+	if (S_ISDIR(VFS_I(sc->ip)->i_mode)) {
+		/*
+		 * We are about to drop the ILOCK on sc->ip to lock the
+		 * orphanage and prepare for the adoption.  Therefore, look up
+		 * the old dotdot entry for sc->ip so that we can compare it
+		 * after we re-lock sc->ip.
+		 */
+		error = xchk_dir_lookup(sc, sc->ip, &xfs_name_dotdot,
+				&orig_parent);
+		if (error)
+			return error;
+	} else {
+		/*
+		 * We haven't dropped the ILOCK since we committed the new
+		 * xattr structure (and hence the new parent pointer records),
+		 * which means that the file cannot have been moved in the
+		 * directory tree, and there are no parents.
+		 */
+		orig_parent = NULLFSINO;
+	}
 
 	/*
 	 * Drop the ILOCK on the scrub target and commit the transaction.
@@ -704,9 +823,14 @@ xrep_parent_move_to_orphanage(
 	 * Now that we've reacquired the ILOCK on sc->ip, look up the dotdot
 	 * entry again.  If the parent changed or the child was unlinked while
 	 * the child directory was unlocked, we don't need to move the child to
-	 * the orphanage after all.
+	 * the orphanage after all.  For a non-directory, we have to scan for
+	 * the first parent pointer to see if one has been added.
 	 */
-	error = xchk_dir_lookup(sc, sc->ip, &xfs_name_dotdot, &new_parent);
+	if (S_ISDIR(VFS_I(sc->ip)->i_mode))
+		error = xchk_dir_lookup(sc, sc->ip, &xfs_name_dotdot,
+				&new_parent);
+	else
+		error = xrep_parent_lookup_pptrs(sc, &new_parent);
 	if (error)
 		return error;
 
@@ -733,6 +857,488 @@ xrep_parent_move_to_orphanage(
 	return 0;
 }
 
+/* Ensure that the xattr value buffer is large enough. */
+STATIC int
+xrep_parent_alloc_xattr_value(
+	struct xrep_parent	*rp,
+	size_t			bufsize)
+{
+	void			*new_val;
+
+	if (rp->xattr_value_sz >= bufsize)
+		return 0;
+
+	if (rp->xattr_value) {
+		kvfree(rp->xattr_value);
+		rp->xattr_value = NULL;
+		rp->xattr_value_sz = 0;
+	}
+
+	new_val = kvmalloc(bufsize, XCHK_GFP_FLAGS);
+	if (!new_val)
+		return -ENOMEM;
+
+	rp->xattr_value = new_val;
+	rp->xattr_value_sz = bufsize;
+	return 0;
+}
+
+/* Retrieve the (remote) value of a non-pptr xattr. */
+STATIC int
+xrep_parent_fetch_xattr_remote(
+	struct xrep_parent	*rp,
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	unsigned int		valuelen)
+{
+	struct xfs_scrub	*sc = rp->sc;
+	struct xfs_da_args	args = {
+		.attr_filter	= attr_flags & XFS_ATTR_NSP_ONDISK_MASK,
+		.geo		= sc->mp->m_attr_geo,
+		.whichfork	= XFS_ATTR_FORK,
+		.dp		= ip,
+		.name		= name,
+		.namelen	= namelen,
+		.trans		= sc->tp,
+		.valuelen	= valuelen,
+		.owner		= ip->i_ino,
+	};
+	int			error;
+
+	/*
+	 * If we need a larger value buffer, try to allocate one.  If that
+	 * fails, return with -EDEADLOCK to try harder.
+	 */
+	error = xrep_parent_alloc_xattr_value(rp, valuelen);
+	if (error == -ENOMEM)
+		return -EDEADLOCK;
+	if (error)
+		return error;
+
+	args.value = rp->xattr_value;
+	xfs_attr_sethash(&args);
+	return xfs_attr_get_ilocked(&args);
+}
+
+/* Stash non-pptr attributes for later replay into the temporary file. */
+STATIC int
+xrep_parent_stash_xattr(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen,
+	void			*priv)
+{
+	struct xrep_parent_xattr key = {
+		.valuelen	= valuelen,
+		.namelen	= namelen,
+		.flags		= attr_flags & XFS_ATTR_NSP_ONDISK_MASK,
+	};
+	struct xrep_parent	*rp = priv;
+	int			error;
+
+	if (attr_flags & (XFS_ATTR_INCOMPLETE | XFS_ATTR_PARENT))
+		return 0;
+
+	if (!value) {
+		error = xrep_parent_fetch_xattr_remote(rp, ip, attr_flags,
+				name, namelen, valuelen);
+		if (error)
+			return error;
+
+		value = rp->xattr_value;
+	}
+
+	trace_xrep_parent_stash_xattr(rp->sc->tempip, key.flags, (void *)name,
+			key.namelen, key.valuelen);
+
+	error = xfblob_store(rp->xattr_blobs, &key.name_cookie, name,
+			key.namelen);
+	if (error)
+		return error;
+
+	error = xfblob_store(rp->xattr_blobs, &key.value_cookie, value,
+			key.valuelen);
+	if (error)
+		return error;
+
+	return xfarray_append(rp->xattr_records, &key);
+}
+
+/* Insert one xattr key/value. */
+STATIC int
+xrep_parent_insert_xattr(
+	struct xrep_parent		*rp,
+	const struct xrep_parent_xattr	*key)
+{
+	struct xfs_da_args		args = {
+		.dp			= rp->sc->tempip,
+		.attr_filter		= key->flags,
+		.namelen		= key->namelen,
+		.valuelen		= key->valuelen,
+		.owner			= rp->sc->ip->i_ino,
+		.geo			= rp->sc->mp->m_attr_geo,
+		.whichfork		= XFS_ATTR_FORK,
+		.op_flags		= XFS_DA_OP_OKNOENT,
+	};
+	int				error;
+
+	ASSERT(!(key->flags & XFS_ATTR_PARENT));
+
+	/*
+	 * Grab pointers to the scrub buffer so that we can use them to insert
+	 * attrs into the temp file.
+	 */
+	args.name = rp->xattr_name;
+	args.value = rp->xattr_value;
+
+	/*
+	 * The attribute name is stored near the end of the in-core buffer,
+	 * though we reserve one more byte to ensure null termination.
+	 */
+	rp->xattr_name[XATTR_NAME_MAX] = 0;
+
+	error = xfblob_load(rp->xattr_blobs, key->name_cookie, rp->xattr_name,
+			key->namelen);
+	if (error)
+		return error;
+
+	error = xfblob_free(rp->xattr_blobs, key->name_cookie);
+	if (error)
+		return error;
+
+	error = xfblob_load(rp->xattr_blobs, key->value_cookie, args.value,
+			key->valuelen);
+	if (error)
+		return error;
+
+	error = xfblob_free(rp->xattr_blobs, key->value_cookie);
+	if (error)
+		return error;
+
+	rp->xattr_name[key->namelen] = 0;
+
+	trace_xrep_parent_insert_xattr(rp->sc->tempip, key->flags,
+			rp->xattr_name, key->namelen, key->valuelen);
+
+	xfs_attr_sethash(&args);
+	return xfs_attr_setname(&args, false);
+}
+
+/*
+ * Periodically flush salvaged attributes to the temporary file.  This is done
+ * to reduce the memory requirements of the xattr rebuild because files can
+ * contain millions of attributes.
+ */
+STATIC int
+xrep_parent_flush_xattrs(
+	struct xrep_parent	*rp)
+{
+	xfarray_idx_t		array_cur;
+	int			error;
+
+	/*
+	 * Entering this function, the scrub context has a reference to the
+	 * inode being repaired, the temporary file, and the empty scrub
+	 * transaction that we created for the xattr scan.  We hold ILOCK_EXCL
+	 * on the inode being repaired.
+	 *
+	 * To constrain kernel memory use, we occasionally flush salvaged
+	 * xattrs from the xfarray and xfblob structures into the temporary
+	 * file in preparation for exchanging the xattr structures at the end.
+	 * Updating the temporary file requires a transaction, so we cancel the
+	 * empty scrub transaction and drop the ILOCK so that xfs_attr_set can
+	 * allocate whatever transaction it wants.
+	 *
+	 * We still hold IOLOCK_EXCL on the inode being repaired, which
+	 * prevents anyone from adding xattrs (or parent pointers) while we're
+	 * flushing.
+	 */
+	xchk_trans_cancel(rp->sc);
+	xchk_iunlock(rp->sc, XFS_ILOCK_EXCL);
+
+	/*
+	 * Take the IOLOCK of the temporary file while we modify xattrs.  This
+	 * isn't strictly required because the temporary file is never revealed
+	 * to userspace, but we follow the same locking rules.  We still hold
+	 * sc->ip's IOLOCK.
+	 */
+	error = xrep_tempfile_iolock_polled(rp->sc);
+	if (error)
+		return error;
+
+	/* Add all the salvaged attrs to the temporary file. */
+	foreach_xfarray_idx(rp->xattr_records, array_cur) {
+		struct xrep_parent_xattr	key;
+
+		error = xfarray_load(rp->xattr_records, array_cur, &key);
+		if (error)
+			return error;
+
+		error = xrep_parent_insert_xattr(rp, &key);
+		if (error)
+			return error;
+	}
+
+	/* Empty out both arrays now that we've added the entries. */
+	xfarray_truncate(rp->xattr_records);
+	xfblob_truncate(rp->xattr_blobs);
+
+	xrep_tempfile_iounlock(rp->sc);
+
+	/* Recreate the empty transaction and relock the inode. */
+	error = xchk_trans_alloc_empty(rp->sc);
+	if (error)
+		return error;
+	xchk_ilock(rp->sc, XFS_ILOCK_EXCL);
+	return 0;
+}
+
+/* Decide if we've stashed too much xattr data in memory. */
+static inline bool
+xrep_parent_want_flush_xattrs(
+	struct xrep_parent	*rp)
+{
+	unsigned long long	bytes;
+
+	bytes = xfarray_bytes(rp->xattr_records) +
+		xfblob_bytes(rp->xattr_blobs);
+	return bytes > XREP_PARENT_XATTR_MAX_STASH_BYTES;
+}
+
+/* Flush staged attributes to the temporary file if we're over the limit. */
+STATIC int
+xrep_parent_try_flush_xattrs(
+	struct xfs_scrub	*sc,
+	void			*priv)
+{
+	struct xrep_parent	*rp = priv;
+	int			error;
+
+	if (!xrep_parent_want_flush_xattrs(rp))
+		return 0;
+
+	error = xrep_parent_flush_xattrs(rp);
+	if (error)
+		return error;
+
+	/*
+	 * If there were any parent pointer updates to the xattr structure
+	 * while we dropped the ILOCK, the xattr structure is now stale.
+	 * Signal to the attr copy process that we need to start over, but
+	 * this time without opportunistic attr flushing.
+	 *
+	 * This is unlikely to happen, so we're ok with restarting the copy.
+	 */
+	mutex_lock(&rp->pscan.lock);
+	if (rp->saw_pptr_updates)
+		error = -ESTALE;
+	mutex_unlock(&rp->pscan.lock);
+	return error;
+}
+
+/* Copy all the non-pptr extended attributes into the temporary file. */
+STATIC int
+xrep_parent_copy_xattrs(
+	struct xrep_parent	*rp)
+{
+	struct xfs_scrub	*sc = rp->sc;
+	int			error;
+
+	/*
+	 * Clear the pptr updates flag.  We hold sc->ip ILOCKed, so there
+	 * can't be any parent pointer updates in progress.
+	 */
+	mutex_lock(&rp->pscan.lock);
+	rp->saw_pptr_updates = false;
+	mutex_unlock(&rp->pscan.lock);
+
+	/* Copy xattrs, stopping periodically to flush the incore buffers. */
+	error = xchk_xattr_walk(sc, sc->ip, xrep_parent_stash_xattr,
+			xrep_parent_try_flush_xattrs, rp);
+	if (error && error != -ESTALE)
+		return error;
+
+	if (error == -ESTALE) {
+		/*
+		 * The xattr copy collided with a parent pointer update.
+		 * Restart the copy, but this time hold the ILOCK all the way
+		 * to the end to lock out any directory parent pointer updates.
+		 */
+		error = xchk_xattr_walk(sc, sc->ip, xrep_parent_stash_xattr,
+				NULL, rp);
+		if (error)
+			return error;
+	}
+
+	/* Flush any remaining stashed xattrs to the temporary file. */
+	if (xfarray_bytes(rp->xattr_records) == 0)
+		return 0;
+
+	return xrep_parent_flush_xattrs(rp);
+}
+
+/*
+ * Ensure that @sc->ip and @sc->tempip both have attribute forks before we head
+ * into the attr fork exchange transaction.  All files on a filesystem with
+ * parent pointers must have an attr fork because the parent pointer code does
+ * not itself add attribute forks.
+ *
+ * Note: Unlinkable unlinked files don't need one, but the overhead of having
+ * an unnecessary attr fork is not justified by the additional code complexity
+ * that would be needed to track that state correctly.
+ */
+STATIC int
+xrep_parent_ensure_attr_fork(
+	struct xrep_parent	*rp)
+{
+	struct xfs_scrub	*sc = rp->sc;
+	int			error;
+
+	error = xfs_attr_add_fork(sc->tempip,
+			sizeof(struct xfs_attr_sf_hdr), 1);
+	if (error)
+		return error;
+	return xfs_attr_add_fork(sc->ip, sizeof(struct xfs_attr_sf_hdr), 1);
+}
+
+/*
+ * Finish replaying stashed parent pointer updates, allocate a transaction for
+ * exchanging extent mappings, and take the ILOCKs of both files before we
+ * commit the new attribute structure.
+ */
+STATIC int
+xrep_parent_finalize_tempfile(
+	struct xrep_parent	*rp)
+{
+	struct xfs_scrub	*sc = rp->sc;
+	int			error;
+
+	/*
+	 * Repair relies on the ILOCK to quiesce all possible xattr updates.
+	 * Replay all queued parent pointer updates into the tempfile before
+	 * exchanging the contents, even if that means dropping the ILOCKs and
+	 * the transaction.
+	 */
+	do {
+		error = xrep_parent_replay_updates(rp);
+		if (error)
+			return error;
+
+		error = xrep_parent_ensure_attr_fork(rp);
+		if (error)
+			return error;
+
+		error = xrep_tempexch_trans_alloc(sc, XFS_ATTR_FORK, &rp->tx);
+		if (error)
+			return error;
+
+		if (xfarray_length(rp->pptr_recs) == 0)
+			break;
+
+		xchk_trans_cancel(sc);
+		xrep_tempfile_iunlock_both(sc);
+	} while (!xchk_should_terminate(sc, &error));
+	return error;
+}
+
+/*
+ * Replay all the stashed parent pointers into the temporary file, copy all
+ * the non-pptr xattrs from the file being repaired into the temporary file,
+ * and exchange the attr fork contents atomically.
+ */
+STATIC int
+xrep_parent_rebuild_pptrs(
+	struct xrep_parent	*rp)
+{
+	struct xfs_scrub	*sc = rp->sc;
+	xfs_ino_t		parent_ino = NULLFSINO;
+	int			error;
+
+	/*
+	 * Copy non-pptr xattrs from the file being repaired into the
+	 * temporary file's xattr structure.  We hold sc->ip's IOLOCK, which
+	 * prevents setxattr/removexattr calls from occurring, but renames
+	 * update the parent pointers without holding IOLOCK.  If we detect
+	 * stale attr structures, we restart the scan but only flush at the
+	 * end.
+	 */
+	error = xrep_parent_copy_xattrs(rp);
+	if (error)
+		return error;
+
+	/*
+	 * Cancel the empty transaction that we used to walk and copy attrs,
+	 * and drop the ILOCK so that we can take the IOLOCK on the temporary
+	 * file.  We still hold sc->ip's IOLOCK.
+	 */
+	xchk_trans_cancel(sc);
+	xchk_iunlock(sc, XFS_ILOCK_EXCL);
+
+	error = xrep_tempfile_iolock_polled(sc);
+	if (error)
+		return error;
+
+	/*
+	 * Allocate transaction, lock inodes, and make sure that we've replayed
+	 * all the stashed pptr updates to the tempfile.  After this point,
+	 * we're ready to exchange the attr fork mappings.
+	 */
+	error = xrep_parent_finalize_tempfile(rp);
+	if (error)
+		return error;
+
+	/* Last chance to abort before we start committing pptr fixes. */
+	if (xchk_should_terminate(sc, &error))
+		return error;
+
+	if (xchk_iscan_aborted(&rp->pscan.iscan))
+		return -ECANCELED;
+
+	/*
+	 * Exchange the attr fork contents and junk the old attr fork contents,
+	 * which are now in the tempfile.
+	 */
+	error = xrep_xattr_swap(sc, &rp->tx);
+	if (error)
+		return error;
+	error = xrep_xattr_reset_tempfile_fork(sc);
+	if (error)
+		return error;
+
+	/*
+	 * Roll to get a transaction without any inodes joined to it.  Then we
+	 * can drop the tempfile's ILOCK and IOLOCK before doing more work on
+	 * the scrub target file.
+	 */
+	error = xfs_trans_roll(&sc->tp);
+	if (error)
+		return error;
+	xrep_tempfile_iunlock(sc);
+	xrep_tempfile_iounlock(sc);
+
+	/*
+	 * We've committed the new parent pointers.  Find at least one parent
+	 * so that we can decide if we're moving this file to the orphanage.
+	 * For this purpose, root directories are their own parents.
+	 */
+	if (sc->ip == sc->mp->m_rootip) {
+		xrep_findparent_scan_found(&rp->pscan, sc->ip->i_ino);
+	} else {
+		error = xrep_parent_lookup_pptrs(sc, &parent_ino);
+		if (error)
+			return error;
+		if (parent_ino != NULLFSINO)
+			xrep_findparent_scan_found(&rp->pscan, parent_ino);
+	}
+	return 0;
+}
+
 /*
  * Commit the new parent pointer structure (currently only the dotdot entry) to
  * the file that we're repairing.
@@ -741,13 +1347,24 @@ STATIC int
 xrep_parent_rebuild_tree(
 	struct xrep_parent	*rp)
 {
+	int			error;
+
+	if (xfs_has_parent(rp->sc->mp)) {
+		error = xrep_parent_rebuild_pptrs(rp);
+		if (error)
+			return error;
+	}
+
 	if (rp->pscan.parent_ino == NULLFSINO) {
 		if (xrep_orphanage_can_adopt(rp->sc))
 			return xrep_parent_move_to_orphanage(rp);
 		return -EFSCORRUPTED;
 	}
 
-	return xrep_parent_reset_dotdot(rp);
+	if (S_ISDIR(VFS_I(rp->sc->ip)->i_mode))
+		return xrep_parent_reset_dotdot(rp);
+
+	return 0;
 }
 
 /* Set up the filesystem scan so we can look for parents. */
@@ -757,18 +1374,39 @@ xrep_parent_setup_scan(
 {
 	struct xfs_scrub	*sc = rp->sc;
 	char			*descr;
+	struct xfs_da_geometry	*geo = sc->mp->m_attr_geo;
+	int			max_len;
 	int			error;
 
 	if (!xfs_has_parent(sc->mp))
 		return xrep_findparent_scan_start(sc, &rp->pscan);
 
+	/* Buffers for copying non-pptr attrs to the tempfile */
+	rp->xattr_name = kvmalloc(XATTR_NAME_MAX + 1, XCHK_GFP_FLAGS);
+	if (!rp->xattr_name)
+		return -ENOMEM;
+
+	/*
+	 * Allocate enough memory to handle loading local attr values from the
+	 * xfblob data while flushing stashed attrs to the temporary file.
+	 * We only realloc the buffer when salvaging remote attr values, so
+	 * TRY_HARDER means we allocate the maximal attr value size.
+	 */
+	if (sc->flags & XCHK_TRY_HARDER)
+		max_len = XATTR_SIZE_MAX;
+	else
+		max_len = xfs_attr_leaf_entsize_local_max(geo->blksize);
+	error = xrep_parent_alloc_xattr_value(rp, max_len);
+	if (error)
+		goto out_xattr_name;
+
 	/* Set up some staging memory for logging parent pointer updates. */
 	descr = xchk_xfile_ino_descr(sc, "parent pointer entries");
 	error = xfarray_create(descr, 0, sizeof(struct xrep_pptr),
 			&rp->pptr_recs);
 	kfree(descr);
 	if (error)
-		return error;
+		goto out_xattr_value;
 
 	descr = xchk_xfile_ino_descr(sc, "parent pointer names");
 	error = xfblob_create(descr, &rp->pptr_names);
@@ -776,19 +1414,47 @@ xrep_parent_setup_scan(
 	if (error)
 		goto out_recs;
 
+	/* Set up some storage for copying attrs before the mapping exchange */
+	descr = xchk_xfile_ino_descr(sc,
+				"parent pointer retained xattr entries");
+	error = xfarray_create(descr, 0, sizeof(struct xrep_parent_xattr),
+			&rp->xattr_records);
+	kfree(descr);
+	if (error)
+		goto out_names;
+
+	descr = xchk_xfile_ino_descr(sc,
+				"parent pointer retained xattr values");
+	error = xfblob_create(descr, &rp->xattr_blobs);
+	kfree(descr);
+	if (error)
+		goto out_attr_keys;
+
 	error = __xrep_findparent_scan_start(sc, &rp->pscan,
 			xrep_parent_live_update);
 	if (error)
-		goto out_names;
+		goto out_attr_values;
 
 	return 0;
 
+out_attr_values:
+	xfblob_destroy(rp->xattr_blobs);
+	rp->xattr_blobs = NULL;
+out_attr_keys:
+	xfarray_destroy(rp->xattr_records);
+	rp->xattr_records = NULL;
 out_names:
 	xfblob_destroy(rp->pptr_names);
 	rp->pptr_names = NULL;
 out_recs:
 	xfarray_destroy(rp->pptr_recs);
 	rp->pptr_recs = NULL;
+out_xattr_value:
+	kvfree(rp->xattr_value);
+	rp->xattr_value = NULL;
+out_xattr_name:
+	kvfree(rp->xattr_name);
+	rp->xattr_name = NULL;
 	return error;
 }
 
@@ -818,7 +1484,7 @@ xrep_parent(
 	if (error)
 		goto out_teardown;
 
-	/* Last chance to abort before we start committing fixes. */
+	/* Last chance to abort before we start committing dotdot fixes. */
 	if (xchk_should_terminate(sc, &error))
 		goto out_teardown;
 
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 3e0cd482379c6..ecfaa4b88910f 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -2539,6 +2539,8 @@ DEFINE_EVENT(xrep_xattr_salvage_class, name, \
 	TP_ARGS(ip, flags, name, namelen, valuelen))
 DEFINE_XREP_XATTR_SALVAGE_EVENT(xrep_xattr_salvage_rec);
 DEFINE_XREP_XATTR_SALVAGE_EVENT(xrep_xattr_insert_rec);
+DEFINE_XREP_XATTR_SALVAGE_EVENT(xrep_parent_stash_xattr);
+DEFINE_XREP_XATTR_SALVAGE_EVENT(xrep_parent_insert_xattr);
 
 DECLARE_EVENT_CLASS(xrep_pptr_salvage_class,
 	TP_PROTO(struct xfs_inode *ip, unsigned int flags, const void *name,



* [PATCH 12/14] xfs: adapt the orphanage code to handle parent pointers
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
                     ` (10 preceding siblings ...)
  2024-04-10  1:06   ` [PATCH 11/14] xfs: actually rebuild the parent pointer xattrs Darrick J. Wong
@ 2024-04-10  1:06   ` Darrick J. Wong
  2024-04-10  6:23     ` Christoph Hellwig
  2024-04-10  1:07   ` [PATCH 13/14] xfs: repair link count of nondirectories after rebuilding " Darrick J. Wong
  2024-04-10  1:07   ` [PATCH 14/14] xfs: inode repair should ensure there's an attr fork to store " Darrick J. Wong
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:06 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Adapt the orphanage's adoption code to update the child file's parent
pointers as part of the reparenting process.  Also ensure that the child
has an attr fork to receive the parent pointer update, since the runtime
code assumes one exists.
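
As a rough illustration of the space estimate involved, the sketch below
models how large a shortform attr fork must be to hold one parent pointer:
a header plus one entry whose name is the dirent name and whose value is the
parent handle.  The sketch_* names and struct layouts are simplified
stand-ins assumed for this example; they are not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for the shortform attr header. */
struct sketch_attr_sf_hdr {
	uint16_t	totsize;
	uint8_t		count;
	uint8_t		padding;
};

/* Assumed stand-in for one shortform entry; name and value bytes follow. */
struct sketch_attr_sf_entry {
	uint8_t		namelen;
	uint8_t		valuelen;
	uint8_t		flags;
};

/* Assumed stand-in for the parent handle stored as the attr value. */
struct sketch_parent_rec {
	uint64_t	p_ino;
	uint32_t	p_gen;
} __attribute__((packed));

/* Bytes used by one shortform entry with the given name and value sizes. */
static size_t
sketch_sf_entsize(size_t namelen, size_t valuelen)
{
	return sizeof(struct sketch_attr_sf_entry) + namelen + valuelen;
}

/* Bytes to request when adding an attr fork ahead of one pptr update. */
static size_t
sketch_adoption_attr_sizeof(size_t dirent_namelen)
{
	return sizeof(struct sketch_attr_sf_hdr) +
	       sketch_sf_entsize(dirent_namelen,
				 sizeof(struct sketch_parent_rec));
}
```

The estimate depends only on the sum of the name and value lengths, so the
order in which the two lengths are passed does not change the result.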

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/orphanage.c |   38 ++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/orphanage.h |    3 +++
 fs/xfs/scrub/scrub.c     |    2 ++
 3 files changed, 43 insertions(+)


diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
index 94bcc2799188f..b2f905924d0d8 100644
--- a/fs/xfs/scrub/orphanage.c
+++ b/fs/xfs/scrub/orphanage.c
@@ -19,6 +19,8 @@
 #include "xfs_icache.h"
 #include "xfs_bmap.h"
 #include "xfs_bmap_btree.h"
+#include "xfs_parent.h"
+#include "xfs_attr_sf.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/repair.h"
@@ -330,6 +332,8 @@ xrep_adoption_trans_alloc(
 	if (S_ISDIR(VFS_I(sc->ip)->i_mode))
 		child_blkres = xfs_rename_space_res(mp, 0, false,
 						    xfs_name_dotdot.len, false);
+	if (xfs_has_parent(mp))
+		child_blkres += XFS_ADDAFORK_SPACE_RES(mp);
 	adopt->child_blkres = child_blkres;
 
 	/*
@@ -503,6 +507,19 @@ xrep_adoption_zap_dcache(
 	dput(d_orphanage);
 }
 
+/*
+ * If we have to add an attr fork ahead of a parent pointer update, how much
+ * space should we ask for?
+ */
+static inline int
+xrep_adoption_attr_sizeof(
+	const struct xrep_adoption	*adopt)
+{
+	return sizeof(struct xfs_attr_sf_hdr) +
+		xfs_attr_sf_entsize_byname(sizeof(struct xfs_parent_rec),
+					   adopt->xname->len);
+}
+
 /*
  * Move the current file to the orphanage under the computed name.
  *
@@ -524,6 +541,19 @@ xrep_adoption_move(
 	if (error)
 		return error;
 
+	/*
+	 * If this filesystem has parent pointers, ensure that the file being
+	 * moved to the orphanage has an attribute fork.  This is required
+	 * because the parent pointer code does not itself add attr forks.
+	 */
+	if (!xfs_inode_has_attr_fork(sc->ip) && xfs_has_parent(sc->mp)) {
+		int sf_size = xrep_adoption_attr_sizeof(adopt);
+
+		error = xfs_bmap_add_attrfork(sc->tp, sc->ip, sf_size, true);
+		if (error)
+			return error;
+	}
+
 	/* Create the new name in the orphanage. */
 	error = xfs_dir_createname(sc->tp, sc->orphanage, adopt->xname,
 			sc->ip->i_ino, adopt->orphanage_blkres);
@@ -548,6 +578,14 @@ xrep_adoption_move(
 			return error;
 	}
 
+	/* Add a parent pointer from the file back to the lost+found. */
+	if (xfs_has_parent(sc->mp)) {
+		error = xfs_parent_addname(sc->tp, &adopt->ppargs,
+				sc->orphanage, adopt->xname, sc->ip);
+		if (error)
+			return error;
+	}
+
 	/*
 	 * Notify dirent hooks that we moved the file to /lost+found, and
 	 * finish all the deferred work so that we know the adoption is fully
diff --git a/fs/xfs/scrub/orphanage.h b/fs/xfs/scrub/orphanage.h
index 319179ab788d3..beb6b686784e6 100644
--- a/fs/xfs/scrub/orphanage.h
+++ b/fs/xfs/scrub/orphanage.h
@@ -54,6 +54,9 @@ struct xrep_adoption {
 	/* Name used for the adoption. */
 	struct xfs_name		*xname;
 
+	/* Parent pointer context tracking */
+	struct xfs_parent_args	ppargs;
+
 	/* Block reservations for orphanage and child (if directory). */
 	unsigned int		orphanage_blkres;
 	unsigned int		child_blkres;
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index ebb06838c31be..7b1f1abdc7a98 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -19,6 +19,8 @@
 #include "xfs_rmap.h"
 #include "xfs_exchrange.h"
 #include "xfs_exchmaps.h"
+#include "xfs_dir2.h"
+#include "xfs_parent.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/trace.h"



* [PATCH 13/14] xfs: repair link count of nondirectories after rebuilding parent pointers
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
                     ` (11 preceding siblings ...)
  2024-04-10  1:06   ` [PATCH 12/14] xfs: adapt the orphanage code to handle parent pointers Darrick J. Wong
@ 2024-04-10  1:07   ` Darrick J. Wong
  2024-04-10  6:22     ` Christoph Hellwig
  2024-04-10  1:07   ` [PATCH 14/14] xfs: inode repair should ensure there's an attr fork to store " Darrick J. Wong
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:07 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Since the parent pointer scrubber does not exhaustively search the
filesystem for missing parent pointers, it doesn't have a good way to
determine that there are pointers missing from an otherwise uncorrupt
xattr structure.  Instead, for nondirectories it employs a heuristic of
comparing the file link count to the number of parent pointers found.

However, we don't want this heuristic flagging a false corruption after
a repair has actually scanned the entire filesystem to rebuild the
parent pointers.  Therefore, reset the file link count in this one case
because we actually know the correct link count.
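
The reconciliation described above can be sketched in a toy model: given
the number of parent pointers counted after the rebuild, fix up unlinked
list membership and then set the link count, clamped to a pinned maximum.
All sketch_* names, the struct, and the SKETCH_NLINK_PINNED value are
assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed cap for the link count; the real constant may differ. */
#define SKETCH_NLINK_PINNED	(~0U)

/* Toy inode state: just the two fields the reconciliation touches. */
struct sketch_inode {
	uint32_t	nlink;
	bool		on_unlinked_list;
};

/*
 * Reset a nondirectory's link count from the number of parent pointers.
 * Returns true if anything changed and the inode would need to be logged.
 */
static bool
sketch_set_nondir_nlink(struct sketch_inode *ip, unsigned long long parents)
{
	bool	dirty = false;

	if (parents > 0 && ip->on_unlinked_list) {
		/* Parents exist, so the file must leave the unlinked list. */
		ip->on_unlinked_list = false;
		dirty = true;
	} else if (parents == 0 && !ip->on_unlinked_list) {
		/* No parents at all, so the file belongs on the list. */
		ip->on_unlinked_list = true;
		dirty = true;
	}

	if (ip->nlink != parents) {
		/* Clamp so an absurd count cannot overflow the field. */
		ip->nlink = parents < SKETCH_NLINK_PINNED ?
				(uint32_t)parents : SKETCH_NLINK_PINNED;
		dirty = true;
	}
	return dirty;
}
```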

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/parent_repair.c |  104 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 104 insertions(+)


diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
index 02554c99d231f..d9ab5b85deb2d 100644
--- a/fs/xfs/scrub/parent_repair.c
+++ b/fs/xfs/scrub/parent_repair.c
@@ -27,6 +27,7 @@
 #include "xfs_parent.h"
 #include "xfs_attr.h"
 #include "xfs_bmap.h"
+#include "xfs_ag.h"
 #include "scrub/xfs_scrub.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
@@ -156,6 +157,9 @@ struct xrep_parent {
 
 	/* Have we seen any live updates of parent pointers recently? */
 	bool			saw_pptr_updates;
+
+	/* Number of parents we found after all other repairs */
+	unsigned long long	parents;
 };
 
 struct xrep_parent_xattr {
@@ -1367,6 +1371,99 @@ xrep_parent_rebuild_tree(
 	return 0;
 }
 
+/* Count the number of parent pointers. */
+STATIC int
+xrep_parent_count_pptr(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen,
+	void			*priv)
+{
+	struct xrep_parent	*rp = priv;
+	int			ret;
+
+	ret = xfs_parent_from_xattr(sc->mp, attr_flags, name, namelen,
+			value, valuelen, NULL, NULL);
+	if (ret != 1)
+		return ret;
+
+	rp->parents++;
+	return 0;
+}
+
+/*
+ * After all parent pointer rebuilding and adoption activity completes, reset
+ * the link count of this nondirectory, having scanned the fs to rebuild all
+ * parent pointers.
+ */
+STATIC int
+xrep_parent_set_nondir_nlink(
+	struct xrep_parent	*rp)
+{
+	struct xfs_scrub	*sc = rp->sc;
+	struct xfs_inode	*ip = sc->ip;
+	struct xfs_perag	*pag;
+	bool			joined = false;
+	int			error;
+
+	/* Count parent pointers so we can reset the file link count. */
+	rp->parents = 0;
+	error = xchk_xattr_walk(sc, ip, xrep_parent_count_pptr, NULL, rp);
+	if (error)
+		return error;
+
+	if (rp->parents > 0 && xfs_inode_on_unlinked_list(ip)) {
+		xfs_trans_ijoin(sc->tp, sc->ip, 0);
+		joined = true;
+
+		/*
+		 * The file is on the unlinked list but we found parents.
+		 * Remove the file from the unlinked list.
+		 */
+		pag = xfs_perag_get(sc->mp, XFS_INO_TO_AGNO(sc->mp, ip->i_ino));
+		if (!pag) {
+			ASSERT(0);
+			return -EFSCORRUPTED;
+		}
+
+		error = xfs_iunlink_remove(sc->tp, pag, ip);
+		xfs_perag_put(pag);
+		if (error)
+			return error;
+	} else if (rp->parents == 0 && !xfs_inode_on_unlinked_list(ip)) {
+		xfs_trans_ijoin(sc->tp, sc->ip, 0);
+		joined = true;
+
+		/*
+		 * The file is not on the unlinked list but we found no
+		 * parents.  Add the file to the unlinked list.
+		 */
+		error = xfs_iunlink(sc->tp, ip);
+		if (error)
+			return error;
+	}
+
+	/* Set the correct link count. */
+	if (VFS_I(ip)->i_nlink != rp->parents) {
+		if (!joined) {
+			xfs_trans_ijoin(sc->tp, sc->ip, 0);
+			joined = true;
+		}
+
+		set_nlink(VFS_I(ip), min_t(unsigned long long, rp->parents,
+					   XFS_NLINK_PINNED));
+	}
+
+	/* Log the inode to keep it moving forward if we dirtied anything. */
+	if (joined)
+		xfs_trans_log_inode(sc->tp, ip, XFS_ILOG_CORE);
+	return 0;
+}
+
 /* Set up the filesystem scan so we can look for parents. */
 STATIC int
 xrep_parent_setup_scan(
@@ -1491,6 +1588,13 @@ xrep_parent(
 	error = xrep_parent_rebuild_tree(rp);
 	if (error)
 		goto out_teardown;
+	if (xfs_has_parent(sc->mp) && !S_ISDIR(VFS_I(sc->ip)->i_mode)) {
+		error = xrep_parent_set_nondir_nlink(rp);
+		if (error)
+			goto out_teardown;
+	}
+
+	error = xrep_defer_finish(sc);
 
 out_teardown:
 	xrep_parent_teardown(rp);



* [PATCH 14/14] xfs: inode repair should ensure there's an attr fork to store parent pointers
  2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
                     ` (12 preceding siblings ...)
  2024-04-10  1:07   ` [PATCH 13/14] xfs: repair link count of nondirectories after rebuilding " Darrick J. Wong
@ 2024-04-10  1:07   ` Darrick J. Wong
  2024-04-10  6:24     ` Christoph Hellwig
  13 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:07 UTC (permalink / raw)
  To: djwong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The runtime parent pointer update code expects that any file being moved
around the directory tree already has an attr fork.  However, if we had
to rebuild an inode core record, there's a chance that we zeroed forkoff
as part of getting the inode to pass the iget verifiers.

Therefore, if we performed any repairs on an inode core, ensure that the
inode has a nonzero forkoff before unlocking the inode.
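
The decision of whether an attr fork must be added can be sketched as a
single predicate over a few facts about the inode.  This is a toy model:
the struct and its field names are assumptions for illustration, and the
order of the checks simply follows the description in this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy summary of the inode and filesystem facts the check consults. */
struct sketch_inode_state {
	bool		fs_has_parent;	/* parent pointer feature enabled */
	uint32_t	nlink;		/* current link count */
	bool		linkable;	/* unlinked, but may be linked later */
	bool		is_root;	/* this is the root directory */
	bool		is_metadata;	/* rooted in the superblock */
	bool		has_attr_fork;	/* attr fork already present */
};

/* Does inode repair need to add an attr fork for a future parent pointer? */
static bool
sketch_needs_attr_fork(const struct sketch_inode_state *st)
{
	if (!st->fs_has_parent)
		return false;
	/* Unlinked files that can never be linked in have no parents. */
	if (st->nlink == 0 && !st->linkable)
		return false;
	/* Neither the root directory nor metadata inodes have parents. */
	if (st->is_root || st->is_metadata)
		return false;
	/* Nothing to do if the fork already exists. */
	return !st->has_attr_fork;
}
```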

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/inode_repair.c |   41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)


diff --git a/fs/xfs/scrub/inode_repair.c b/fs/xfs/scrub/inode_repair.c
index e3b74ea50fdef..daf9f1ee7c2cb 100644
--- a/fs/xfs/scrub/inode_repair.c
+++ b/fs/xfs/scrub/inode_repair.c
@@ -1736,6 +1736,44 @@ xrep_inode_extsize(
 	}
 }
 
+/* Ensure this file has an attr fork if it needs to hold a parent pointer. */
+STATIC int
+xrep_inode_pptr(
+	struct xfs_scrub	*sc)
+{
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_inode	*ip = sc->ip;
+	struct inode		*inode = VFS_I(ip);
+
+	if (!xfs_has_parent(mp))
+		return 0;
+
+	/*
+	 * Unlinked inodes that cannot be added to the directory tree will not
+	 * have a parent pointer.
+	 */
+	if (inode->i_nlink == 0 && !(inode->i_state & I_LINKABLE))
+		return 0;
+
+	/* The root directory doesn't have a parent pointer. */
+	if (ip == mp->m_rootip)
+		return 0;
+
+	/*
+	 * Metadata inodes are rooted in the superblock and do not have any
+	 * parents.
+	 */
+	if (xfs_is_metadata_inode(ip))
+		return 0;
+
+	/* Inode already has an attr fork; no further work possible here. */
+	if (xfs_inode_has_attr_fork(ip))
+		return 0;
+
+	return xfs_bmap_add_attrfork(sc->tp, ip,
+			sizeof(struct xfs_attr_sf_hdr), true);
+}
+
 /* Fix any irregularities in an inode that the verifiers don't catch. */
 STATIC int
 xrep_inode_problems(
@@ -1744,6 +1782,9 @@ xrep_inode_problems(
 	int			error;
 
 	error = xrep_inode_blockcounts(sc);
+	if (error)
+		return error;
+	error = xrep_inode_pptr(sc);
 	if (error)
 		return error;
 	xrep_inode_timestamps(sc->ip);



* [PATCH 1/4] xfs: teach online scrub to find directory tree structure problems
  2024-04-10  0:46 ` [PATCHSET v13.1 8/9] xfs: detect and correct directory tree problems Darrick J. Wong
@ 2024-04-10  1:07   ` Darrick J. Wong
  2024-04-10  7:21     ` Christoph Hellwig
  2024-04-10  1:07   ` [PATCH 2/4] xfs: invalidate dirloop scrub path data when concurrent updates happen Darrick J. Wong
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:07 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a new scrubber that detects corruptions within the directory tree
structure itself.  It can detect directories with multiple parents;
loops within the directory tree; and directory loops not accessible from
the root.
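
The upward walk can be illustrated with a toy model in which each directory
records at most one parent in an array.  Starting from the directory being
checked, the walk follows parents until it reaches the root, returns to the
starting directory, revisits a directory already seen on this path, or runs
out of parents.  Every sketch_* name and outcome label here is an assumption
for illustration; the real scrubber works on inodes, takes locks, and
revalidates against concurrent updates, none of which is modelled.

```c
#include <stdbool.h>

#define SKETCH_NR_DIRS		8
#define SKETCH_NO_PARENT	(-1)

enum sketch_outcome {
	SKETCH_PATH_OK,			/* reached the root */
	SKETCH_PATH_SELF_ANCESTOR,	/* start is its own ancestor */
	SKETCH_PATH_LOOP,		/* loop somewhere above start */
	SKETCH_PATH_DISCONNECTED,	/* ran out of parents */
};

/* Walk upwards from @start, following one parent per directory. */
static enum sketch_outcome
sketch_walk_up(const int parent[SKETCH_NR_DIRS], int root, int start)
{
	bool	seen[SKETCH_NR_DIRS] = { false };
	int	cur = start;

	/* Toy shortcut: the root is trivially connected to itself. */
	if (start == root)
		return SKETCH_PATH_OK;

	for (;;) {
		int	up = parent[cur];

		if (up == SKETCH_NO_PARENT)
			return SKETCH_PATH_DISCONNECTED;

		/* Remember every directory that this path has stepped from. */
		seen[cur] = true;

		if (up == root)
			return SKETCH_PATH_OK;
		if (up == start)
			return SKETCH_PATH_SELF_ANCESTOR;
		if (seen[up])
			return SKETCH_PATH_LOOP;
		cur = up;
	}
}
```

Each iteration marks a previously unseen directory, so the walk ends after
at most SKETCH_NR_DIRS steps even when the tree contains cycles.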

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile           |    1 
 fs/xfs/libxfs/xfs_fs.h    |    3 
 fs/xfs/scrub/common.h     |    1 
 fs/xfs/scrub/dirtree.c    |  789 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/dirtree.h    |  129 +++++++
 fs/xfs/scrub/ino_bitmap.h |   37 ++
 fs/xfs/scrub/scrub.c      |    7 
 fs/xfs/scrub/scrub.h      |    1 
 fs/xfs/scrub/stats.c      |    1 
 fs/xfs/scrub/trace.c      |    4 
 fs/xfs/scrub/trace.h      |  190 +++++++++++
 fs/xfs/scrub/xfarray.h    |    1 
 12 files changed, 1162 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/dirtree.c
 create mode 100644 fs/xfs/scrub/dirtree.h
 create mode 100644 fs/xfs/scrub/ino_bitmap.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index af99a455ce4db..8ec0dd257a984 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -163,6 +163,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   common.o \
 				   dabtree.o \
 				   dir.o \
+				   dirtree.o \
 				   fscounters.o \
 				   health.o \
 				   ialloc.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 90e1d0cc04e4b..da0f427a09730 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -720,9 +720,10 @@ struct xfs_scrub_metadata {
 #define XFS_SCRUB_TYPE_QUOTACHECK 25	/* quota counters */
 #define XFS_SCRUB_TYPE_NLINKS	26	/* inode link counts */
 #define XFS_SCRUB_TYPE_HEALTHY	27	/* everything checked out ok */
+#define XFS_SCRUB_TYPE_DIRTREE	28	/* directory tree structure */
 
 /* Number of scrub subcommands. */
-#define XFS_SCRUB_TYPE_NR	28
+#define XFS_SCRUB_TYPE_NR	29
 
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1u << 0)
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index e00466f404829..39465e39dc5fd 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -92,6 +92,7 @@ int xchk_setup_directory(struct xfs_scrub *sc);
 int xchk_setup_xattr(struct xfs_scrub *sc);
 int xchk_setup_symlink(struct xfs_scrub *sc);
 int xchk_setup_parent(struct xfs_scrub *sc);
+int xchk_setup_dirtree(struct xfs_scrub *sc);
 #ifdef CONFIG_XFS_RT
 int xchk_setup_rtbitmap(struct xfs_scrub *sc);
 int xchk_setup_rtsummary(struct xfs_scrub *sc);
diff --git a/fs/xfs/scrub/dirtree.c b/fs/xfs/scrub/dirtree.c
new file mode 100644
index 0000000000000..2461e525b3d70
--- /dev/null
+++ b/fs/xfs/scrub/dirtree.c
@@ -0,0 +1,789 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2023-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_attr.h"
+#include "xfs_parent.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/bitmap.h"
+#include "scrub/ino_bitmap.h"
+#include "scrub/xfile.h"
+#include "scrub/xfarray.h"
+#include "scrub/xfblob.h"
+#include "scrub/listxattr.h"
+#include "scrub/trace.h"
+#include "scrub/dirtree.h"
+
+/*
+ * Directory Tree Structure Validation
+ * ===================================
+ *
+ * Validating the tree qualities of the directory tree structure can be
+ * difficult.  If the tree is frozen, running a depth (or breadth) first search
+ * and marking a bitmap suffices to determine if there is a cycle.  XORing the
+ * mark bitmap with the inode bitmap afterwards tells us if there are
+ * disconnected cycles.  If the tree is not frozen, directory updates can move
+ * subtrees across the scanner wavefront, which complicates the design greatly.
+ *
+ * Directory parent pointers change that by enabling an incremental approach to
+ * validation of the tree structure.  Instead of using one thread to scan the
+ * entire filesystem, we instead can have multiple threads walking individual
+ * subdirectories upwards to the root.  In a perfect world, the IOLOCK would
+ * suffice to stabilize two directories in a parent -> child relationship.
+ * Unfortunately, the VFS does not take the IOLOCK when moving a child
+ * subdirectory, so we instead synchronize on ILOCK and use dirent update hooks
+ * to detect a race.  If a race occurs in a path, we restart the scan.
+ *
+ * If the walk terminates without reaching the root, we know the path is
+ * disconnected and ought to be attached to the lost and found.  If on the walk
+ * we find the same subdir that we're scanning, we know this is a cycle and
+ * should delete an incoming edge.  If we find multiple paths to the root, we
+ * know to delete an incoming edge.
+ *
+ * There are two big hitches with this approach: first, all file link counts
+ * must be correct to prevent other writers from doing the wrong thing with the
+ * directory tree structure.  Second, because we're walking upwards in a tree
+ * of arbitrary depth, we cannot hold all the ILOCKs.  Instead, we will use a
+ * directory update hook to invalidate the scan results if one of the paths
+ * we've scanned has changed.
+ */
+
+/* Clean up the dirtree checking resources. */
+STATIC void
+xchk_dirtree_buf_cleanup(
+	void			*buf)
+{
+	struct xchk_dirtree	*dl = buf;
+	struct xchk_dirpath	*path, *n;
+
+	xchk_dirtree_for_each_path_safe(dl, path, n) {
+		list_del_init(&path->list);
+		xino_bitmap_destroy(&path->seen_inodes);
+		kfree(path);
+	}
+
+	xfblob_destroy(dl->path_names);
+	xfarray_destroy(dl->path_steps);
+	mutex_destroy(&dl->lock);
+}
+
+/* Set us up to look for directory loops. */
+int
+xchk_setup_dirtree(
+	struct xfs_scrub	*sc)
+{
+	struct xchk_dirtree	*dl;
+	char			*descr;
+	int			error;
+
+	dl = kvzalloc(sizeof(struct xchk_dirtree), XCHK_GFP_FLAGS);
+	if (!dl)
+		return -ENOMEM;
+	dl->sc = sc;
+	dl->xname.name = dl->namebuf;
+	INIT_LIST_HEAD(&dl->path_list);
+	dl->root_ino = NULLFSINO;
+
+	mutex_init(&dl->lock);
+
+	descr = xchk_xfile_ino_descr(sc, "dirtree path steps");
+	error = xfarray_create(descr, 0, sizeof(struct xchk_dirpath_step),
+			&dl->path_steps);
+	kfree(descr);
+	if (error)
+		goto out_dl;
+
+	descr = xchk_xfile_ino_descr(sc, "dirtree path names");
+	error = xfblob_create(descr, &dl->path_names);
+	kfree(descr);
+	if (error)
+		goto out_steps;
+
+	error = xchk_setup_inode_contents(sc, 0);
+	if (error)
+		goto out_names;
+
+	sc->buf = dl;
+	sc->buf_cleanup = xchk_dirtree_buf_cleanup;
+	return 0;
+
+out_names:
+	xfblob_destroy(dl->path_names);
+out_steps:
+	xfarray_destroy(dl->path_steps);
+out_dl:
+	mutex_destroy(&dl->lock);
+	kvfree(dl);
+	return error;
+}
+
+/*
+ * Add the parent pointer described by @name and @pptr to the given path as a new
+ * step.  Returns -ELNRNG if the path is too deep.
+ */
+STATIC int
+xchk_dirpath_append(
+	struct xchk_dirtree		*dl,
+	struct xfs_inode		*ip,
+	struct xchk_dirpath		*path,
+	const struct xfs_name		*name,
+	const struct xfs_parent_rec	*pptr)
+{
+	struct xchk_dirpath_step	step = {
+		.pptr_rec		= *pptr, /* struct copy */
+		.name_len		= name->len,
+	};
+	int				error;
+
+	/*
+	 * If this path is more than 2 billion steps long, this directory tree
+	 * is too far gone to fix.
+	 */
+	if (path->nr_steps >= XFS_MAXLINK)
+		return -ELNRNG;
+
+	error = xfblob_storename(dl->path_names, &step.name_cookie, name);
+	if (error)
+		return error;
+
+	error = xino_bitmap_set(&path->seen_inodes, ip->i_ino);
+	if (error)
+		return error;
+
+	error = xfarray_append(dl->path_steps, &step);
+	if (error)
+		return error;
+
+	path->nr_steps++;
+	return 0;
+}
+
+/*
+ * Create an xchk_dirpath for each parent pointer of the directory that we're
+ * scanning.  For each path created, we will eventually try to walk towards the
+ * root with the goal of deleting all parents except for one that leads to the
+ * root.
+ *
+ * Returns -EFSCORRUPTED to signal that the inode being scanned has a corrupt
+ * parent pointer and hence there's no point in continuing; or -ENOSR if there
+ * are too many parent pointers for this directory.
+ */
+STATIC int
+xchk_dirtree_create_path(
+	struct xfs_scrub		*sc,
+	struct xfs_inode		*ip,
+	unsigned int			attr_flags,
+	const unsigned char		*name,
+	unsigned int			namelen,
+	const void			*value,
+	unsigned int			valuelen,
+	void				*priv)
+{
+	struct xfs_name			xname = {
+		.name			= name,
+		.len			= namelen,
+	};
+	struct xchk_dirtree		*dl = priv;
+	struct xchk_dirpath		*path;
+	const struct xfs_parent_rec	*rec = value;
+	int				ret;
+
+	ret = xfs_parent_from_xattr(sc->mp, attr_flags, name, namelen,
+			value, valuelen, NULL, NULL);
+	if (ret != 1)
+		return ret;
+
+	/*
+	 * If there are more than 2 billion actual parent pointers for this
+	 * subdirectory, this fs is too far gone to fix.
+	 */
+	if (dl->nr_paths >= XFS_MAXLINK)
+		return -ENOSR;
+
+	trace_xchk_dirtree_create_path(sc, ip, dl->nr_paths, &xname, rec);
+
+	/*
+	 * Create a new xchk_dirpath structure to remember this parent pointer
+	 * and record the first name step.
+	 */
+	path = kmalloc(sizeof(struct xchk_dirpath), XCHK_GFP_FLAGS);
+	if (!path)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&path->list);
+	xino_bitmap_init(&path->seen_inodes);
+	path->nr_steps = 0;
+	path->outcome = XCHK_DIRPATH_SCANNING;
+
+	ret = xchk_dirpath_append(dl, sc->ip, path, &xname, rec);
+	if (ret)
+		goto out_path;
+
+	path->first_step = xfarray_length(dl->path_steps) - 1;
+	path->second_step = XFARRAY_NULLIDX;
+	path->path_nr = dl->nr_paths;
+
+	list_add_tail(&path->list, &dl->path_list);
+	dl->nr_paths++;
+	return 0;
+out_path:
+	kfree(path);
+	return ret;
+}
+
+/*
+ * Validate that the first step of this path still has a corresponding
+ * parent pointer in @sc->ip.  We probably dropped @sc->ip's ILOCK while
+ * walking towards the root, which is why this is necessary.
+ *
+ * This function has a side effect of loading the first parent pointer of this
+ * path into the parent pointer scratch pad.  This prepares us to walk up the
+ * directory tree towards the root.  Returns -ESTALE if the scan data is now
+ * out of date.
+ */
+STATIC int
+xchk_dirpath_revalidate(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirpath		*path)
+{
+	struct xfs_scrub		*sc = dl->sc;
+	int				error;
+
+	/*
+	 * Look up the parent pointer that corresponds to the start of this
+	 * path.  If the parent pointer has disappeared on us, dump all the
+	 * scan results and try again.
+	 */
+	error = xfs_parent_lookup(sc->tp, sc->ip, &dl->xname, &dl->pptr_rec,
+			&dl->pptr_args);
+	if (error == -ENOATTR) {
+		trace_xchk_dirpath_disappeared(dl->sc, sc->ip, path->path_nr,
+				path->first_step, &dl->xname, &dl->pptr_rec);
+		dl->stale = true;
+		return -ESTALE;
+	}
+
+	return error;
+}
+
+/*
+ * Walk the parent pointers of a directory at the end of a path and record
+ * the parent that we find in @dl->xname/pptr_rec.
+ */
+STATIC int
+xchk_dirpath_find_next_step(
+	struct xfs_scrub		*sc,
+	struct xfs_inode		*ip,
+	unsigned int			attr_flags,
+	const unsigned char		*name,
+	unsigned int			namelen,
+	const void			*value,
+	unsigned int			valuelen,
+	void				*priv)
+{
+	struct xchk_dirtree		*dl = priv;
+	const struct xfs_parent_rec	*rec = value;
+	int				ret;
+
+	ret = xfs_parent_from_xattr(sc->mp, attr_flags, name, namelen,
+			value, valuelen, NULL, NULL);
+	if (ret != 1)
+		return ret;
+
+	/*
+	 * If we've already set @dl->pptr_rec, then this directory has multiple
+	 * parents.  Signal this back to the caller via -EMLINK.
+	 */
+	if (dl->parents_found > 0)
+		return -EMLINK;
+
+	dl->parents_found++;
+	memcpy(dl->namebuf, name, namelen);
+	dl->xname.len = namelen;
+	dl->pptr_rec = *rec; /* struct copy */
+	return 0;
+}
+
+/* Set and log the outcome of a path walk. */
+static inline void
+xchk_dirpath_set_outcome(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirpath		*path,
+	enum xchk_dirpath_outcome	outcome)
+{
+	trace_xchk_dirpath_set_outcome(dl->sc, path->path_nr, path->nr_steps,
+			outcome);
+
+	path->outcome = outcome;
+}
+
+/*
+ * Scan the directory at the end of this path for its parent directory link.
+ * If we find one, extend the path.  Returns -ESTALE if the scan data is out
+ * of date.  Returns -EFSCORRUPTED if the parent pointer is bad; or -ELNRNG if
+ * the path got too deep.
+ */
+STATIC int
+xchk_dirpath_step_up(
+	struct xchk_dirtree	*dl,
+	struct xchk_dirpath	*path)
+{
+	struct xfs_scrub	*sc = dl->sc;
+	struct xfs_inode	*dp;
+	xfs_ino_t		parent_ino = be64_to_cpu(dl->pptr_rec.p_ino);
+	unsigned int		lock_mode;
+	int			error;
+
+	/* Grab and lock the parent directory. */
+	error = xchk_iget(sc, parent_ino, &dp);
+	if (error)
+		return error;
+
+	lock_mode = xfs_ilock_attr_map_shared(dp);
+	mutex_lock(&dl->lock);
+
+	if (dl->stale) {
+		error = -ESTALE;
+		goto out_scanlock;
+	}
+
+	/* We've reached the root directory; the path is ok. */
+	if (parent_ino == dl->root_ino) {
+		xchk_dirpath_set_outcome(dl, path, XCHK_DIRPATH_OK);
+		error = 0;
+		goto out_scanlock;
+	}
+
+	/*
+	 * The inode being scanned is its own distant ancestor!  Get rid of
+	 * this path.
+	 */
+	if (parent_ino == sc->ip->i_ino) {
+		xchk_dirpath_set_outcome(dl, path, XCHK_DIRPATH_DELETE);
+		error = 0;
+		goto out_scanlock;
+	}
+
+	/*
+	 * We've seen this inode before during the path walk.  There's a loop
+	 * above us in the directory tree.  This probably means that we cannot
+	 * continue, but let's keep walking paths to get a full picture.
+	 */
+	if (xino_bitmap_test(&path->seen_inodes, parent_ino)) {
+		xchk_dirpath_set_outcome(dl, path, XCHK_DIRPATH_LOOP);
+		error = 0;
+		goto out_scanlock;
+	}
+
+	/* The handle encoded in the parent pointer must match. */
+	if (VFS_I(dp)->i_generation != be32_to_cpu(dl->pptr_rec.p_gen)) {
+		trace_xchk_dirpath_badgen(dl->sc, dp, path->path_nr,
+				path->nr_steps, &dl->xname, &dl->pptr_rec);
+		error = -EFSCORRUPTED;
+		goto out_scanlock;
+	}
+
+	/* Parent pointer must point up to a directory. */
+	if (!S_ISDIR(VFS_I(dp)->i_mode)) {
+		trace_xchk_dirpath_nondir_parent(dl->sc, dp, path->path_nr,
+				path->nr_steps, &dl->xname, &dl->pptr_rec);
+		error = -EFSCORRUPTED;
+		goto out_scanlock;
+	}
+
+	/* Parent cannot be an unlinked directory. */
+	if (VFS_I(dp)->i_nlink == 0) {
+		trace_xchk_dirpath_unlinked_parent(dl->sc, dp, path->path_nr,
+				path->nr_steps, &dl->xname, &dl->pptr_rec);
+		error = -EFSCORRUPTED;
+		goto out_scanlock;
+	}
+
+	/*
+	 * If the extended attributes look as though they have been zapped by
+	 * the inode record repair code, we cannot scan for parent pointers.
+	 */
+	if (xchk_pptr_looks_zapped(dp)) {
+		error = -EBUSY;
+		xchk_set_incomplete(sc);
+		goto out_scanlock;
+	}
+
+	/*
+	 * Walk the parent pointers of @dp to find its parent directory, which
+	 * is the next step in our walk.  If we find that @dp has exactly one
+	 * parent, the parent pointer information will be stored in
+	 * @dl->pptr_rec.  This prepares us for the next step of the walk.
+	 */
+	mutex_unlock(&dl->lock);
+	dl->parents_found = 0;
+	error = xchk_xattr_walk(sc, dp, xchk_dirpath_find_next_step, NULL, dl);
+	mutex_lock(&dl->lock);
+	if (error == -EFSCORRUPTED || error == -EMLINK ||
+	    (!error && dl->parents_found == 0)) {
+		/*
+		 * Further up the directory tree from @sc->ip, we found a
+		 * corrupt parent pointer, multiple parent pointers while
+		 * finding this directory's parent, or zero parents despite
+		 * having a nonzero link count.  Keep looking for other paths.
+		 */
+		xchk_dirpath_set_outcome(dl, path, XCHK_DIRPATH_CORRUPT);
+		error = 0;
+		goto out_scanlock;
+	}
+	if (error)
+		goto out_scanlock;
+
+	if (dl->stale) {
+		error = -ESTALE;
+		goto out_scanlock;
+	}
+
+	trace_xchk_dirpath_found_next_step(sc, dp, path->path_nr,
+			path->nr_steps, &dl->xname, &dl->pptr_rec);
+
+	/* Append to the path steps */
+	error = xchk_dirpath_append(dl, dp, path, &dl->xname, &dl->pptr_rec);
+	if (error)
+		goto out_scanlock;
+
+	if (path->second_step == XFARRAY_NULLIDX)
+		path->second_step = xfarray_length(dl->path_steps) - 1;
+
+out_scanlock:
+	mutex_unlock(&dl->lock);
+	xfs_iunlock(dp, lock_mode);
+	xchk_irele(sc, dp);
+	return error;
+}
+
+/*
+ * Walk the directory tree upwards towards what is hopefully the root
+ * directory, recording path steps as we go.  The current path components are
+ * stored in dl->pptr_rec and dl->xname.
+ *
+ * Returns -ESTALE if the scan data are out of date.  Returns -EFSCORRUPTED
+ * only if the direct parent pointer of @sc->ip associated with this path is
+ * corrupt.
+ */
+STATIC int
+xchk_dirpath_walk_upwards(
+	struct xchk_dirtree	*dl,
+	struct xchk_dirpath	*path)
+{
+	struct xfs_scrub	*sc = dl->sc;
+	int			error;
+
+	ASSERT(sc->ilock_flags & XFS_ILOCK_EXCL);
+
+	/* Reload the start of this path and make sure it's still there. */
+	error = xchk_dirpath_revalidate(dl, path);
+	if (error)
+		return error;
+
+	trace_xchk_dirpath_walk_upwards(sc, sc->ip, path->path_nr, &dl->xname,
+			&dl->pptr_rec);
+
+	/*
+	 * The inode being scanned is its own direct ancestor!
+	 * Get rid of this path.
+	 */
+	if (be64_to_cpu(dl->pptr_rec.p_ino) == sc->ip->i_ino) {
+		xchk_dirpath_set_outcome(dl, path, XCHK_DIRPATH_DELETE);
+		return 0;
+	}
+
+	/*
+	 * Drop ILOCK_EXCL on the inode being scanned.  We still hold
+	 * IOLOCK_EXCL on it, so it cannot move around or be renamed.
+	 *
+	 * Beyond this point we're walking up the directory tree, which means
+	 * that we can acquire and drop the ILOCK on an alias of sc->ip.  The
+	 * ILOCK state is no longer tracked in the scrub context.  Hence we
+	 * must drop @sc->ip's ILOCK during the walk.
+	 */
+	mutex_unlock(&dl->lock);
+	xchk_iunlock(sc, XFS_ILOCK_EXCL);
+
+	/*
+	 * Take the first step in the walk towards the root by checking the
+	 * start of this path, which is a direct parent pointer of @sc->ip.
+	 * If we see any kind of error here (including corruptions), the parent
+	 * pointer of @sc->ip is corrupt.  Stop the whole scan.
+	 */
+	error = xchk_dirpath_step_up(dl, path);
+	if (error) {
+		xchk_ilock(sc, XFS_ILOCK_EXCL);
+		mutex_lock(&dl->lock);
+		return error;
+	}
+
+	/*
+	 * Take steps upward from the second step in this path towards the
+	 * root.  If we hit corruption errors here, there's a problem
+	 * *somewhere* in the path, but we don't need to stop scanning.
+	 */
+	while (!error && path->outcome == XCHK_DIRPATH_SCANNING)
+		error = xchk_dirpath_step_up(dl, path);
+
+	/* Retake the locks we had, mark paths, etc. */
+	xchk_ilock(sc, XFS_ILOCK_EXCL);
+	mutex_lock(&dl->lock);
+	if (error == -EFSCORRUPTED) {
+		xchk_dirpath_set_outcome(dl, path, XCHK_DIRPATH_CORRUPT);
+		error = 0;
+	}
+	if (!error && dl->stale)
+		return -ESTALE;
+	return error;
+}
+
+/* Delete all the collected path information. */
+STATIC void
+xchk_dirtree_reset(
+	void			*buf)
+{
+	struct xchk_dirtree	*dl = buf;
+	struct xchk_dirpath	*path, *n;
+
+	ASSERT(dl->sc->ilock_flags & XFS_ILOCK_EXCL);
+
+	xchk_dirtree_for_each_path_safe(dl, path, n) {
+		list_del_init(&path->list);
+		xino_bitmap_destroy(&path->seen_inodes);
+		kfree(path);
+	}
+	dl->nr_paths = 0;
+
+	xfarray_truncate(dl->path_steps);
+	xfblob_truncate(dl->path_names);
+
+	dl->stale = false;
+}
+
+/*
+ * Load the name/pptr from the first step in this path into @dl->pptr_rec and
+ * @dl->xname.
+ */
+STATIC int
+xchk_dirtree_load_path(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirpath		*path)
+{
+	struct xchk_dirpath_step	step;
+	int				error;
+
+	error = xfarray_load(dl->path_steps, path->first_step, &step);
+	if (error)
+		return error;
+
+	error = xfblob_loadname(dl->path_names, step.name_cookie, &dl->xname,
+			step.name_len);
+	if (error)
+		return error;
+
+	dl->pptr_rec = step.pptr_rec; /* struct copy */
+	return 0;
+}
+
+/*
+ * For each parent pointer of this subdir, trace a path upwards towards the
+ * root directory and record what we find.  Returns 0 for success;
+ * -EFSCORRUPTED if walking the parent pointers of @sc->ip failed, -ELNRNG if a
+ * path was too deep; -ENOSR if there were too many parent pointers; or
+ * a negative errno.
+ */
+STATIC int
+xchk_dirtree_find_paths_to_root(
+	struct xchk_dirtree	*dl)
+{
+	struct xfs_scrub	*sc = dl->sc;
+	struct xchk_dirpath	*path;
+	int			error = 0;
+
+	do {
+		if (xchk_should_terminate(sc, &error))
+			return error;
+
+		xchk_dirtree_reset(dl);
+
+		/*
+		 * If the extended attributes look as though they have been
+		 * zapped by the inode record repair code, we cannot scan for
+		 * parent pointers.
+		 */
+		if (xchk_pptr_looks_zapped(sc->ip)) {
+			xchk_set_incomplete(sc);
+			return -EBUSY;
+		}
+
+		/*
+		 * Create path walk contexts for each parent of the directory
+		 * that is being scanned.  Directories are supposed to have
+		 * only one parent, but this is how we detect multiple parents.
+		 */
+		error = xchk_xattr_walk(sc, sc->ip, xchk_dirtree_create_path,
+				NULL, dl);
+		if (error)
+			return error;
+
+		xchk_dirtree_for_each_path(dl, path) {
+			/* Load path components into dl->pptr_rec/xname */
+			error = xchk_dirtree_load_path(dl, path);
+			if (error)
+				return error;
+
+			/*
+			 * Try to walk up each path to the root.  This enables
+			 * us to find directory loops in ancestors, and the
+			 * like.
+			 */
+			error = xchk_dirpath_walk_upwards(dl, path);
+			if (error == -EFSCORRUPTED) {
+				/*
+				 * A parent pointer of @sc->ip is bad; don't
+				 * bother continuing.
+				 */
+				break;
+			}
+			if (error == -ESTALE) {
+				/* This had better be an invalidation. */
+				ASSERT(dl->stale);
+				break;
+			}
+			if (error)
+				return error;
+		}
+	} while (dl->stale);
+
+	return error;
+}
+
+/*
+ * Figure out what to do with the paths we tried to find.  Do not call this
+ * if the scan results are stale.
+ */
+STATIC void
+xchk_dirtree_evaluate(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirtree_outcomes	*oc)
+{
+	struct xchk_dirpath		*path;
+
+	ASSERT(!dl->stale);
+
+	/* Scan the paths we have to decide what to do. */
+	memset(oc, 0, sizeof(struct xchk_dirtree_outcomes));
+	xchk_dirtree_for_each_path(dl, path) {
+		trace_xchk_dirpath_evaluate_path(dl->sc, path->path_nr,
+				path->nr_steps, path->outcome);
+
+		switch (path->outcome) {
+		case XCHK_DIRPATH_SCANNING:
+			/* shouldn't get here */
+			ASSERT(0);
+			break;
+		case XCHK_DIRPATH_DELETE:
+			/* This one is already going away. */
+			oc->bad++;
+			break;
+		case XCHK_DIRPATH_CORRUPT:
+		case XCHK_DIRPATH_LOOP:
+			/* Couldn't find the end of this path. */
+			oc->suspect++;
+			break;
+		case XCHK_DIRPATH_STALE:
+			/* shouldn't get here either */
+			ASSERT(0);
+			break;
+		case XCHK_DIRPATH_OK:
+			/* This path got all the way to the root. */
+			oc->good++;
+			break;
+		}
+	}
+
+	trace_xchk_dirtree_evaluate(dl, oc);
+}
+
+/* Look for directory loops. */
+int
+xchk_dirtree(
+	struct xfs_scrub		*sc)
+{
+	struct xchk_dirtree_outcomes	oc;
+	struct xchk_dirtree		*dl = sc->buf;
+	int				error;
+
+	/*
+	 * Nondirectories do not point downwards to other files, so they cannot
+	 * cause a cycle in the directory tree.
+	 */
+	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
+		return -ENOENT;
+
+	ASSERT(xfs_has_parent(sc->mp));
+
+	/* Find the root of the directory tree. */
+	dl->root_ino = sc->mp->m_rootip->i_ino;
+
+	trace_xchk_dirtree_start(sc->ip, sc->sm, 0);
+
+	mutex_lock(&dl->lock);
+
+	/* Trace each parent pointer's path to the root. */
+	error = xchk_dirtree_find_paths_to_root(dl);
+	if (error == -EFSCORRUPTED || error == -ELNRNG || error == -ENOSR) {
+		/*
+		 * Don't bother walking the paths if the xattr structure or the
+		 * parent pointers are corrupt; this scan cannot be completed
+		 * without full information.
+		 */
+		xchk_ino_xref_set_corrupt(sc, sc->ip->i_ino);
+		error = 0;
+		goto out_scanlock;
+	}
+	if (error == -EBUSY) {
+		/*
+		 * We couldn't scan some directory's parent pointers because
+		 * the attr fork looked like it had been zapped.  The
+		 * scan was marked incomplete, so no further error code
+		 * is necessary.
+		 */
+		error = 0;
+		goto out_scanlock;
+	}
+	if (error)
+		goto out_scanlock;
+
+	/* Assess what we found in our path evaluation. */
+	xchk_dirtree_evaluate(dl, &oc);
+	if (xchk_dirtree_parentless(dl)) {
+		if (oc.good || oc.bad || oc.suspect)
+			xchk_ino_set_corrupt(sc, sc->ip->i_ino);
+	} else {
+		if (oc.bad || oc.good + oc.suspect != 1)
+			xchk_ino_set_corrupt(sc, sc->ip->i_ino);
+		if (oc.suspect)
+			xchk_ino_xref_set_corrupt(sc, sc->ip->i_ino);
+	}
+
+out_scanlock:
+	mutex_unlock(&dl->lock);
+	trace_xchk_dirtree_done(sc->ip, sc->sm, error);
+	return error;
+}
diff --git a/fs/xfs/scrub/dirtree.h b/fs/xfs/scrub/dirtree.h
new file mode 100644
index 0000000000000..50fefd64ae508
--- /dev/null
+++ b/fs/xfs/scrub/dirtree.h
@@ -0,0 +1,129 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2023-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_SCRUB_DIRTREE_H__
+#define __XFS_SCRUB_DIRTREE_H__
+
+/*
+ * Each of these represents one parent pointer path step in a chain going
+ * up towards the directory tree root.  These are stored inside an xfarray.
+ */
+struct xchk_dirpath_step {
+	/* Directory entry name associated with this parent link. */
+	xfblob_cookie		name_cookie;
+	unsigned int		name_len;
+
+	/* Handle of the parent directory. */
+	struct xfs_parent_rec	pptr_rec;
+};
+
+enum xchk_dirpath_outcome {
+	XCHK_DIRPATH_SCANNING = 0,	/* still being put together */
+	XCHK_DIRPATH_DELETE,		/* delete this path */
+	XCHK_DIRPATH_CORRUPT,		/* corruption detected in path */
+	XCHK_DIRPATH_LOOP,		/* cycle detected further up */
+	XCHK_DIRPATH_STALE,		/* path is stale */
+	XCHK_DIRPATH_OK,		/* path reaches the root */
+};
+
+/*
+ * Each of these represents one parent pointer path out of the directory being
+ * scanned.  These exist in-core, and hopefully there aren't more than a
+ * handful of them.
+ */
+struct xchk_dirpath {
+	struct list_head	list;
+
+	/* Index of the first step in this path. */
+	xfarray_idx_t		first_step;
+
+	/* Index of the second step in this path. */
+	xfarray_idx_t		second_step;
+
+	/* Inodes seen while walking this path. */
+	struct xino_bitmap	seen_inodes;
+
+	/* Number of steps in this path. */
+	unsigned int		nr_steps;
+
+	/* Which path is this? */
+	unsigned int		path_nr;
+
+	/* What did we conclude from following this path? */
+	enum xchk_dirpath_outcome outcome;
+};
+
+struct xchk_dirtree_outcomes {
+	/* Number of XCHK_DIRPATH_DELETE */
+	unsigned int		bad;
+
+	/* Number of XCHK_DIRPATH_CORRUPT or XCHK_DIRPATH_LOOP */
+	unsigned int		suspect;
+
+	/* Number of XCHK_DIRPATH_OK */
+	unsigned int		good;
+};
+
+struct xchk_dirtree {
+	struct xfs_scrub	*sc;
+
+	/* Root inode that we're looking for. */
+	xfs_ino_t		root_ino;
+
+	/* Scratch buffer for scanning pptr xattrs */
+	struct xfs_parent_rec	pptr_rec;
+	struct xfs_da_args	pptr_args;
+
+	/* Name buffer */
+	struct xfs_name		xname;
+	char			namebuf[MAXNAMELEN];
+
+	/* lock for everything below here */
+	struct mutex		lock;
+
+	/*
+	 * All path steps observed during this scan.  The path steps for a
+	 * particular pathwalk are recorded in sequential order in the
+	 * xfarray.  A pathwalk ends either with a step pointing to the root
+	 * directory (success) or pointing to NULLFSINO (loop detected, empty
+	 * dir detected, etc).
+	 */
+	struct xfarray		*path_steps;
+
+	/* All names observed during this scan. */
+	struct xfblob		*path_names;
+
+	/* All paths being tracked by this scanner. */
+	struct list_head	path_list;
+
+	/* Number of paths in path_list. */
+	unsigned int		nr_paths;
+
+	/* Number of parents found by a pptr scan. */
+	unsigned int		parents_found;
+
+	/* Have the path data been invalidated by a concurrent update? */
+	bool			stale:1;
+};
+
+#define xchk_dirtree_for_each_path_safe(dl, path, n) \
+	list_for_each_entry_safe((path), (n), &(dl)->path_list, list)
+
+#define xchk_dirtree_for_each_path(dl, path) \
+	list_for_each_entry((path), &(dl)->path_list, list)
+
+static inline bool
+xchk_dirtree_parentless(const struct xchk_dirtree *dl)
+{
+	struct xfs_scrub	*sc = dl->sc;
+
+	if (sc->ip == sc->mp->m_rootip)
+		return true;
+	if (VFS_I(sc->ip)->i_nlink == 0)
+		return true;
+	return false;
+}
+
+#endif /* __XFS_SCRUB_DIRTREE_H__ */
diff --git a/fs/xfs/scrub/ino_bitmap.h b/fs/xfs/scrub/ino_bitmap.h
new file mode 100644
index 0000000000000..1300833679abf
--- /dev/null
+++ b/fs/xfs/scrub/ino_bitmap.h
@@ -0,0 +1,37 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2023-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_SCRUB_INO_BITMAP_H__
+#define __XFS_SCRUB_INO_BITMAP_H__
+
+/* Bitmaps, but type-checked for xfs_ino_t */
+
+struct xino_bitmap {
+	struct xbitmap64	inobitmap;
+};
+
+static inline void xino_bitmap_init(struct xino_bitmap *bitmap)
+{
+	xbitmap64_init(&bitmap->inobitmap);
+}
+
+static inline void xino_bitmap_destroy(struct xino_bitmap *bitmap)
+{
+	xbitmap64_destroy(&bitmap->inobitmap);
+}
+
+static inline int xino_bitmap_set(struct xino_bitmap *bitmap, xfs_ino_t ino)
+{
+	return xbitmap64_set(&bitmap->inobitmap, ino, 1);
+}
+
+static inline int xino_bitmap_test(struct xino_bitmap *bitmap, xfs_ino_t ino)
+{
+	uint64_t	len = 1;
+
+	return xbitmap64_test(&bitmap->inobitmap, ino, &len);
+}
+
+#endif	/* __XFS_SCRUB_INO_BITMAP_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 7b1f1abdc7a98..8f1431db77395 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -436,6 +436,13 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.scrub	= xchk_health_record,
 		.repair = xrep_notsupported,
 	},
+	[XFS_SCRUB_TYPE_DIRTREE] = {	/* directory tree structure */
+		.type	= ST_INODE,
+		.setup	= xchk_setup_dirtree,
+		.scrub	= xchk_dirtree,
+		.has	= xfs_has_parent,
+		.repair	= xrep_notsupported,
+	},
 };
 
 static int
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 54a4242bc79cf..3910270471462 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -185,6 +185,7 @@ int xchk_directory(struct xfs_scrub *sc);
 int xchk_xattr(struct xfs_scrub *sc);
 int xchk_symlink(struct xfs_scrub *sc);
 int xchk_parent(struct xfs_scrub *sc);
+int xchk_dirtree(struct xfs_scrub *sc);
 #ifdef CONFIG_XFS_RT
 int xchk_rtbitmap(struct xfs_scrub *sc);
 int xchk_rtsummary(struct xfs_scrub *sc);
diff --git a/fs/xfs/scrub/stats.c b/fs/xfs/scrub/stats.c
index 42cafbed94ac6..7996c23354763 100644
--- a/fs/xfs/scrub/stats.c
+++ b/fs/xfs/scrub/stats.c
@@ -79,6 +79,7 @@ static const char *name_map[XFS_SCRUB_TYPE_NR] = {
 	[XFS_SCRUB_TYPE_FSCOUNTERS]	= "fscounters",
 	[XFS_SCRUB_TYPE_QUOTACHECK]	= "quotacheck",
 	[XFS_SCRUB_TYPE_NLINKS]		= "nlinks",
+	[XFS_SCRUB_TYPE_DIRTREE]	= "dirtree",
 };
 
 /* Format the scrub stats into a text buffer, similar to pcp style. */
diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
index 4a8cc2c98d997..4470ad0533b81 100644
--- a/fs/xfs/scrub/trace.c
+++ b/fs/xfs/scrub/trace.c
@@ -28,6 +28,10 @@
 #include "scrub/orphanage.h"
 #include "scrub/nlinks.h"
 #include "scrub/fscounters.h"
+#include "scrub/bitmap.h"
+#include "scrub/ino_bitmap.h"
+#include "scrub/xfblob.h"
+#include "scrub/dirtree.h"
 
 /* Figure out which block the btree cursor was pointing to. */
 static inline xfs_fsblock_t
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index ecfaa4b88910f..c474bcd7d54b7 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -27,6 +27,9 @@ struct xchk_nlink;
 struct xchk_fscounters;
 struct xfs_rmap_update_params;
 struct xfs_parent_rec;
+enum xchk_dirpath_outcome;
+struct xchk_dirtree;
+struct xchk_dirtree_outcomes;
 
 /*
  * ftrace's __print_symbolic requires that all enum values be wrapped in the
@@ -65,6 +68,7 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_FSCOUNTERS);
 TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_QUOTACHECK);
 TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_NLINKS);
 TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_HEALTHY);
+TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_DIRTREE);
 
 #define XFS_SCRUB_TYPE_STRINGS \
 	{ XFS_SCRUB_TYPE_PROBE,		"probe" }, \
@@ -94,7 +98,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_HEALTHY);
 	{ XFS_SCRUB_TYPE_FSCOUNTERS,	"fscounters" }, \
 	{ XFS_SCRUB_TYPE_QUOTACHECK,	"quotacheck" }, \
 	{ XFS_SCRUB_TYPE_NLINKS,	"nlinks" }, \
-	{ XFS_SCRUB_TYPE_HEALTHY,	"healthy" }
+	{ XFS_SCRUB_TYPE_HEALTHY,	"healthy" }, \
+	{ XFS_SCRUB_TYPE_DIRTREE,	"dirtree" }
 
 #define XFS_SCRUB_FLAG_STRINGS \
 	{ XFS_SCRUB_IFLAG_REPAIR,		"repair" }, \
@@ -171,6 +176,8 @@ DEFINE_EVENT(xchk_class, name, \
 DEFINE_SCRUB_EVENT(xchk_start);
 DEFINE_SCRUB_EVENT(xchk_done);
 DEFINE_SCRUB_EVENT(xchk_deadlock_retry);
+DEFINE_SCRUB_EVENT(xchk_dirtree_start);
+DEFINE_SCRUB_EVENT(xchk_dirtree_done);
 DEFINE_SCRUB_EVENT(xrep_attempt);
 DEFINE_SCRUB_EVENT(xrep_done);
 
@@ -1576,6 +1583,187 @@ DEFINE_XCHK_PPTR_EVENT(xchk_parent_defer);
 DEFINE_XCHK_PPTR_EVENT(xchk_parent_slowpath);
 DEFINE_XCHK_PPTR_EVENT(xchk_parent_ultraslowpath);
 
+DECLARE_EVENT_CLASS(xchk_dirtree_class,
+	TP_PROTO(struct xfs_scrub *sc, struct xfs_inode *ip,
+		 unsigned int path_nr, const struct xfs_name *name,
+		 const struct xfs_parent_rec *pptr),
+	TP_ARGS(sc, ip, path_nr, name, pptr),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, path_nr)
+		__field(xfs_ino_t, child_ino)
+		__field(unsigned int, child_gen)
+		__field(xfs_ino_t, parent_ino)
+		__field(unsigned int, parent_gen)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, name->len)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->path_nr = path_nr;
+		__entry->child_ino = ip->i_ino;
+		__entry->child_gen = VFS_I(ip)->i_generation;
+		__entry->parent_ino = be64_to_cpu(pptr->p_ino);
+		__entry->parent_gen = be32_to_cpu(pptr->p_gen);
+		__entry->namelen = name->len;
+		memcpy(__get_str(name), name->name, name->len);
+	),
+	TP_printk("dev %d:%d path %u child_ino 0x%llx child_gen 0x%x parent_ino 0x%llx parent_gen 0x%x name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->path_nr,
+		  __entry->child_ino,
+		  __entry->child_gen,
+		  __entry->parent_ino,
+		  __entry->parent_gen,
+		  __entry->namelen,
+		  __get_str(name))
+);
+#define DEFINE_XCHK_DIRTREE_EVENT(name) \
+DEFINE_EVENT(xchk_dirtree_class, name, \
+	TP_PROTO(struct xfs_scrub *sc, struct xfs_inode *ip, \
+		 unsigned int path_nr, const struct xfs_name *name, \
+		 const struct xfs_parent_rec *pptr), \
+	TP_ARGS(sc, ip, path_nr, name, pptr))
+DEFINE_XCHK_DIRTREE_EVENT(xchk_dirtree_create_path);
+DEFINE_XCHK_DIRTREE_EVENT(xchk_dirpath_walk_upwards);
+
+DECLARE_EVENT_CLASS(xchk_dirpath_class,
+	TP_PROTO(struct xfs_scrub *sc, struct xfs_inode *ip,
+		 unsigned int path_nr, unsigned int step_nr,
+		 const struct xfs_name *name,
+		 const struct xfs_parent_rec *pptr),
+	TP_ARGS(sc, ip, path_nr, step_nr, name, pptr),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, path_nr)
+		__field(unsigned int, step_nr)
+		__field(xfs_ino_t, child_ino)
+		__field(unsigned int, child_gen)
+		__field(xfs_ino_t, parent_ino)
+		__field(unsigned int, parent_gen)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, name->len)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->path_nr = path_nr;
+		__entry->step_nr = step_nr;
+		__entry->child_ino = ip->i_ino;
+		__entry->child_gen = VFS_I(ip)->i_generation;
+		__entry->parent_ino = be64_to_cpu(pptr->p_ino);
+		__entry->parent_gen = be32_to_cpu(pptr->p_gen);
+		__entry->namelen = name->len;
+		memcpy(__get_str(name), name->name, name->len);
+	),
+	TP_printk("dev %d:%d path %u step %u child_ino 0x%llx child_gen 0x%x parent_ino 0x%llx parent_gen 0x%x name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->path_nr,
+		  __entry->step_nr,
+		  __entry->child_ino,
+		  __entry->child_gen,
+		  __entry->parent_ino,
+		  __entry->parent_gen,
+		  __entry->namelen,
+		  __get_str(name))
+);
+#define DEFINE_XCHK_DIRPATH_EVENT(name) \
+DEFINE_EVENT(xchk_dirpath_class, name, \
+	TP_PROTO(struct xfs_scrub *sc, struct xfs_inode *ip, \
+		 unsigned int path_nr, unsigned int step_nr, \
+		 const struct xfs_name *name, \
+		 const struct xfs_parent_rec *pptr), \
+	TP_ARGS(sc, ip, path_nr, step_nr, name, pptr))
+DEFINE_XCHK_DIRPATH_EVENT(xchk_dirpath_disappeared);
+DEFINE_XCHK_DIRPATH_EVENT(xchk_dirpath_badgen);
+DEFINE_XCHK_DIRPATH_EVENT(xchk_dirpath_nondir_parent);
+DEFINE_XCHK_DIRPATH_EVENT(xchk_dirpath_unlinked_parent);
+DEFINE_XCHK_DIRPATH_EVENT(xchk_dirpath_found_next_step);
+
+TRACE_DEFINE_ENUM(XCHK_DIRPATH_SCANNING);
+TRACE_DEFINE_ENUM(XCHK_DIRPATH_DELETE);
+TRACE_DEFINE_ENUM(XCHK_DIRPATH_CORRUPT);
+TRACE_DEFINE_ENUM(XCHK_DIRPATH_LOOP);
+TRACE_DEFINE_ENUM(XCHK_DIRPATH_STALE);
+TRACE_DEFINE_ENUM(XCHK_DIRPATH_OK);
+
+#define XCHK_DIRPATH_OUTCOME_STRINGS \
+	{ XCHK_DIRPATH_SCANNING,	"scanning" }, \
+	{ XCHK_DIRPATH_DELETE,		"delete" }, \
+	{ XCHK_DIRPATH_CORRUPT,		"corrupt" }, \
+	{ XCHK_DIRPATH_LOOP,		"loop" }, \
+	{ XCHK_DIRPATH_STALE,		"stale" }, \
+	{ XCHK_DIRPATH_OK,		"ok" }
+
+DECLARE_EVENT_CLASS(xchk_dirpath_outcome_class,
+	TP_PROTO(struct xfs_scrub *sc, unsigned long long path_nr,
+		 unsigned int nr_steps,
+		 unsigned int outcome),
+	TP_ARGS(sc, path_nr, nr_steps, outcome),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long long, path_nr)
+		__field(unsigned int, nr_steps)
+		__field(unsigned int, outcome)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->path_nr = path_nr;
+		__entry->nr_steps = nr_steps;
+		__entry->outcome = outcome;
+	),
+	TP_printk("dev %d:%d path %llu steps %u outcome %s",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->path_nr,
+		  __entry->nr_steps,
+		  __print_symbolic(__entry->outcome, XCHK_DIRPATH_OUTCOME_STRINGS))
+);
+#define DEFINE_XCHK_DIRPATH_OUTCOME_EVENT(name) \
+DEFINE_EVENT(xchk_dirpath_outcome_class, name, \
+	TP_PROTO(struct xfs_scrub *sc, unsigned long long path_nr, \
+		 unsigned int nr_steps, \
+		 unsigned int outcome), \
+	TP_ARGS(sc, path_nr, nr_steps, outcome))
+DEFINE_XCHK_DIRPATH_OUTCOME_EVENT(xchk_dirpath_set_outcome);
+DEFINE_XCHK_DIRPATH_OUTCOME_EVENT(xchk_dirpath_evaluate_path);
+
+DECLARE_EVENT_CLASS(xchk_dirtree_evaluate_class,
+	TP_PROTO(const struct xchk_dirtree *dl,
+		 const struct xchk_dirtree_outcomes *oc),
+	TP_ARGS(dl, oc),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_ino_t, rootino)
+		__field(unsigned int, nr_paths)
+		__field(unsigned int, bad)
+		__field(unsigned int, suspect)
+		__field(unsigned int, good)
+	),
+	TP_fast_assign(
+		__entry->dev = dl->sc->mp->m_super->s_dev;
+		__entry->ino = dl->sc->ip->i_ino;
+		__entry->rootino = dl->root_ino;
+		__entry->nr_paths = dl->nr_paths;
+		__entry->bad = oc->bad;
+		__entry->suspect = oc->suspect;
+		__entry->good = oc->good;
+	),
+	TP_printk("dev %d:%d ino 0x%llx rootino 0x%llx nr_paths %u bad %u suspect %u good %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->rootino,
+		  __entry->nr_paths,
+		  __entry->bad,
+		  __entry->suspect,
+		  __entry->good)
+);
+#define DEFINE_XCHK_DIRTREE_EVALUATE_EVENT(name) \
+DEFINE_EVENT(xchk_dirtree_evaluate_class, name, \
+	TP_PROTO(const struct xchk_dirtree *dl, \
+		 const struct xchk_dirtree_outcomes *oc), \
+	TP_ARGS(dl, oc))
+DEFINE_XCHK_DIRTREE_EVALUATE_EVENT(xchk_dirtree_evaluate);
+
 /* repair tracepoints */
 #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
 
diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h
index 3b10a58e9f146..8f54c8fc888fa 100644
--- a/fs/xfs/scrub/xfarray.h
+++ b/fs/xfs/scrub/xfarray.h
@@ -8,6 +8,7 @@
 
 /* xfile array index type, along with cursor initialization */
 typedef uint64_t		xfarray_idx_t;
+#define XFARRAY_NULLIDX		((__force xfarray_idx_t)-1ULL)
 #define XFARRAY_CURSOR_INIT	((__force xfarray_idx_t)0)
 
 /* Iterate each index of an xfile array. */
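
For readers following the outcome accounting in xchk_dirtree_evaluate and
the final corruption test in xchk_dirtree, here is a minimal userspace
sketch of that logic.  Everything prefixed demo_/DEMO_ is a hypothetical
stand-in invented for illustration and is not part of the patch; the
sketch only assumes the tallying rules and the two-branch test visible in
the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for enum xchk_dirpath_outcome. */
enum demo_outcome {
	DEMO_SCANNING = 0,
	DEMO_DELETE,
	DEMO_CORRUPT,
	DEMO_LOOP,
	DEMO_STALE,
	DEMO_OK,
};

/* Hypothetical stand-in for struct xchk_dirtree_outcomes. */
struct demo_tally {
	unsigned int	bad;
	unsigned int	suspect;
	unsigned int	good;
};

/* Tally path outcomes the way the patch's evaluate step does. */
static struct demo_tally
demo_evaluate(const enum demo_outcome *paths, size_t nr)
{
	struct demo_tally	oc = { 0 };
	size_t			i;

	assert(paths != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		switch (paths[i]) {
		case DEMO_DELETE:
			oc.bad++;
			break;
		case DEMO_CORRUPT:
		case DEMO_LOOP:
			oc.suspect++;
			break;
		case DEMO_OK:
			oc.good++;
			break;
		default:
			/* SCANNING and STALE should never be evaluated. */
			break;
		}
	}
	return oc;
}

/*
 * A parentless directory (the root, or an unlinked one) must have no
 * paths at all.  Any other directory must have exactly one path that
 * is good or suspect, and no paths slated for deletion.
 */
static bool
demo_is_corrupt(const struct demo_tally *oc, bool parentless)
{
	if (parentless)
		return oc->good || oc->bad || oc->suspect;
	return oc->bad || oc->good + oc->suspect != 1;
}
```

With a single DEMO_OK path, a linked non-root directory is considered
fine, whereas the same tally on a parentless directory is flagged.  The
separate xref-corrupt marking for suspect paths is left out of the sketch.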


^ permalink raw reply related	[flat|nested] 234+ messages in thread

* [PATCH 2/4] xfs: invalidate dirloop scrub path data when concurrent updates happen
  2024-04-10  0:46 ` [PATCHSET v13.1 8/9] xfs: detect and correct directory tree problems Darrick J. Wong
  2024-04-10  1:07   ` [PATCH 1/4] xfs: teach online scrub to find directory tree structure problems Darrick J. Wong
@ 2024-04-10  1:07   ` Darrick J. Wong
  2024-04-10  7:21     ` Christoph Hellwig
  2024-04-10  1:08   ` [PATCH 3/4] xfs: report directory tree corruption in the health information Darrick J. Wong
  2024-04-10  1:08   ` [PATCH 4/4] xfs: fix corruptions in the directory tree Darrick J. Wong
  3 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:07 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add a dirent update hook so that we can detect directory tree updates
that affect any of the paths found by this scrubber and force it to
rescan.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/dirtree.c |  160 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/dirtree.h |   20 ++++++
 fs/xfs/scrub/trace.h   |   65 ++++++++++++++++++++
 3 files changed, 244 insertions(+), 1 deletion(-)
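
As a reading aid, the per-step staleness test that the hook performs can
be sketched in plain userspace C as below.  The demo_ types and function
are hypothetical stand-ins invented for illustration (the patch itself
loads steps from an xfarray and names from an xfblob), and the sketch
omits the seen_inodes bitmap shortcut that the patch checks first.  It
only assumes the matching rule in xchk_dirpath_step_is_stale: an update
invalidates a path when its child, its parent, and its dirent name all
match one recorded step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one recorded path step. */
struct demo_step {
	uint64_t	parent_ino;	/* directory holding the dirent */
	const char	*name;		/* dirent name in that directory */
	size_t		name_len;
};

/* Hypothetical stand-in for a dirent update notification. */
struct demo_update {
	uint64_t	parent_ino;
	uint64_t	child_ino;
	const char	*name;
	size_t		name_len;
};

/*
 * Walk the steps from the scanned inode upwards.  The child of step 0
 * is the scanned inode; the child of every later step is the parent
 * recorded in the step before it.
 */
static bool
demo_path_is_stale(uint64_t scan_ino, const struct demo_step *steps,
		   size_t nr_steps, const struct demo_update *up)
{
	uint64_t	child_ino = scan_ino;
	size_t		i;

	assert(steps != NULL || nr_steps == 0);

	for (i = 0; i < nr_steps; i++) {
		const struct demo_step	*s = &steps[i];

		if (up->child_ino == child_ino &&
		    up->parent_ino == s->parent_ino &&
		    up->name_len == s->name_len &&
		    memcmp(up->name, s->name, s->name_len) == 0)
			return true;

		/* This step's parent is the next step's child. */
		child_ino = s->parent_ino;
	}
	return false;
}
```

Comparing name lengths before the bytes mirrors the patch, which avoids
loading the stored name unless the cheaper checks already match.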


diff --git a/fs/xfs/scrub/dirtree.c b/fs/xfs/scrub/dirtree.c
index 2461e525b3d70..1d41dc9a4d00f 100644
--- a/fs/xfs/scrub/dirtree.c
+++ b/fs/xfs/scrub/dirtree.c
@@ -70,6 +70,9 @@ xchk_dirtree_buf_cleanup(
 	struct xchk_dirtree	*dl = buf;
 	struct xchk_dirpath	*path, *n;
 
+	if (dl->scan_ino != NULLFSINO)
+		xfs_dir_hook_del(dl->sc->mp, &dl->dhook);
+
 	xchk_dirtree_for_each_path_safe(dl, path, n) {
 		list_del_init(&path->list);
 		xino_bitmap_destroy(&path->seen_inodes);
@@ -90,13 +93,17 @@ xchk_setup_dirtree(
 	char			*descr;
 	int			error;
 
+	xchk_fsgates_enable(sc, XCHK_FSGATES_DIRENTS);
+
 	dl = kvzalloc(sizeof(struct xchk_dirtree), XCHK_GFP_FLAGS);
 	if (!dl)
 		return -ENOMEM;
 	dl->sc = sc;
 	dl->xname.name = dl->namebuf;
+	dl->hook_xname.name = dl->hook_namebuf;
 	INIT_LIST_HEAD(&dl->path_list);
 	dl->root_ino = NULLFSINO;
+	dl->scan_ino = NULLFSINO;
 
 	mutex_init(&dl->lock);
 
@@ -552,6 +559,133 @@ xchk_dirpath_walk_upwards(
 	return error;
 }
 
+/*
+ * Decide if this path step has been touched by this live update.  Returns
+ * 1 for yes, 0 for no, or a negative errno.
+ */
+STATIC int
+xchk_dirpath_step_is_stale(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirpath		*path,
+	unsigned int			step_nr,
+	xfarray_idx_t			step_idx,
+	struct xfs_dir_update_params	*p,
+	xfs_ino_t			*cursor)
+{
+	struct xchk_dirpath_step	step;
+	xfs_ino_t			child_ino = *cursor;
+	int				error;
+
+	error = xfarray_load(dl->path_steps, step_idx, &step);
+	if (error)
+		return error;
+	*cursor = be64_to_cpu(step.pptr_rec.p_ino);
+
+	/*
+	 * If the parent and child being updated are not the ones mentioned in
+	 * this path step, the scan data is still ok.
+	 */
+	if (p->ip->i_ino != child_ino || p->dp->i_ino != *cursor)
+		return 0;
+
+	/*
+	 * If the dirent name lengths or byte sequences are different, the scan
+	 * data is still ok.
+	 */
+	if (p->name->len != step.name_len)
+		return 0;
+
+	error = xfblob_loadname(dl->path_names, step.name_cookie,
+			&dl->hook_xname, step.name_len);
+	if (error)
+		return error;
+
+	if (memcmp(dl->hook_xname.name, p->name->name, p->name->len) != 0)
+		return 0;
+
+	/* Exact match, scan data is out of date. */
+	trace_xchk_dirpath_changed(dl->sc, path->path_nr, step_nr, p->dp,
+			p->ip, p->name);
+	return 1;
+}
+
+/*
+ * Decide if this path has been touched by this live update.  Returns 1 for
+ * yes, 0 for no, or a negative errno.
+ */
+STATIC int
+xchk_dirpath_is_stale(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirpath		*path,
+	struct xfs_dir_update_params	*p)
+{
+	xfs_ino_t			cursor = dl->scan_ino;
+	xfarray_idx_t			idx = path->first_step;
+	unsigned int			i;
+	int				ret;
+
+	/*
+	 * The child being updated has not been seen by this path at all; this
+	 * path cannot be stale.
+	 */
+	if (!xino_bitmap_test(&path->seen_inodes, p->ip->i_ino))
+		return 0;
+
+	ret = xchk_dirpath_step_is_stale(dl, path, 0, idx, p, &cursor);
+	if (ret != 0)
+		return ret;
+
+	for (i = 1, idx = path->second_step; i < path->nr_steps; i++, idx++) {
+		ret = xchk_dirpath_step_is_stale(dl, path, i, idx, p, &cursor);
+		if (ret != 0)
+			return ret;
+	}
+
+	return 0;
+}
+
+/*
+ * Decide if a directory update from the regular filesystem touches any of the
+ * paths we've scanned, and invalidate the scan data if true.
+ */
+STATIC int
+xchk_dirtree_live_update(
+	struct notifier_block		*nb,
+	unsigned long			action,
+	void				*data)
+{
+	struct xfs_dir_update_params	*p = data;
+	struct xchk_dirtree		*dl;
+	struct xchk_dirpath		*path;
+	int				ret;
+
+	dl = container_of(nb, struct xchk_dirtree, dhook.dirent_hook.nb);
+
+	trace_xchk_dirtree_live_update(dl->sc, p->dp, action, p->ip, p->delta,
+			p->name);
+
+	mutex_lock(&dl->lock);
+
+	if (dl->stale || dl->aborted)
+		goto out_unlock;
+
+	xchk_dirtree_for_each_path(dl, path) {
+		ret = xchk_dirpath_is_stale(dl, path, p);
+		if (ret < 0) {
+			dl->aborted = true;
+			break;
+		}
+		if (ret == 1) {
+			dl->stale = true;
+			break;
+		}
+	}
+
+out_unlock:
+	mutex_unlock(&dl->lock);
+	return NOTIFY_DONE;
+}
+
 /* Delete all the collected path information. */
 STATIC void
 xchk_dirtree_reset(
@@ -667,6 +801,8 @@ xchk_dirtree_find_paths_to_root(
 			}
 			if (error)
 				return error;
+			if (dl->aborted)
+				return 0;
 		}
 	} while (dl->stale);
 
@@ -738,11 +874,28 @@ xchk_dirtree(
 
 	ASSERT(xfs_has_parent(sc->mp));
 
-	/* Find the root of the directory tree. */
+	/*
+	 * Find the root of the directory tree.  Remember which directory to
+	 * scan, because the hook doesn't detach until after sc->ip gets
+	 * released during teardown.
+	 */
 	dl->root_ino = sc->mp->m_rootip->i_ino;
+	dl->scan_ino = sc->ip->i_ino;
 
 	trace_xchk_dirtree_start(sc->ip, sc->sm, 0);
 
+	/*
+	 * Hook into the directory entry code so that we can capture updates to
+	 * paths that we have already scanned.  The scanner thread takes each
+	 * directory's ILOCK, which means that any in-progress directory update
+	 * will finish before we can scan the directory.
+	 */
+	ASSERT(sc->flags & XCHK_FSGATES_DIRENTS);
+	xfs_dir_hook_setup(&dl->dhook, xchk_dirtree_live_update);
+	error = xfs_dir_hook_add(sc->mp, &dl->dhook);
+	if (error)
+		goto out;
+
 	mutex_lock(&dl->lock);
 
 	/* Trace each parent pointer's path to the root. */
@@ -769,6 +922,10 @@ xchk_dirtree(
 	}
 	if (error)
 		goto out_scanlock;
+	if (dl->aborted) {
+		xchk_set_incomplete(sc);
+		goto out_scanlock;
+	}
 
 	/* Assess what we found in our path evaluation. */
 	xchk_dirtree_evaluate(dl, &oc);
@@ -784,6 +941,7 @@ xchk_dirtree(
 
 out_scanlock:
 	mutex_unlock(&dl->lock);
+out:
 	trace_xchk_dirtree_done(sc->ip, sc->sm, error);
 	return error;
 }
diff --git a/fs/xfs/scrub/dirtree.h b/fs/xfs/scrub/dirtree.h
index 50fefd64ae508..2ddbcf43c2915 100644
--- a/fs/xfs/scrub/dirtree.h
+++ b/fs/xfs/scrub/dirtree.h
@@ -72,6 +72,13 @@ struct xchk_dirtree {
 	/* Root inode that we're looking for. */
 	xfs_ino_t		root_ino;
 
+	/*
+	 * This is the inode that we're scanning.  The live update hook can
+	 * continue to be called after xchk_teardown drops sc->ip but before
+	 * it calls buf_cleanup, so we keep a copy.
+	 */
+	xfs_ino_t		scan_ino;
+
 	/* Scratch buffer for scanning pptr xattrs */
 	struct xfs_parent_rec	pptr_rec;
 	struct xfs_da_args	pptr_args;
@@ -80,9 +87,19 @@ struct xchk_dirtree {
 	struct xfs_name		xname;
 	char			namebuf[MAXNAMELEN];
 
+	/*
+	 * Hook into directory updates so that we can receive live updates
+	 * from other writer threads.
+	 */
+	struct xfs_dir_hook	dhook;
+
 	/* lock for everything below here */
 	struct mutex		lock;
 
+	/* buffer for the live update functions to use for dirent names */
+	struct xfs_name		hook_xname;
+	unsigned char		hook_namebuf[MAXNAMELEN];
+
 	/*
 	 * All path steps observed during this scan.  Each of the path
 	 * steps for a particular pathwalk are recorded in sequential
@@ -106,6 +123,9 @@ struct xchk_dirtree {
 
 	/* Have the path data been invalidated by a concurrent update? */
 	bool			stale:1;
+
+	/* Has the scan been aborted? */
+	bool			aborted:1;
 };
 
 #define xchk_dirtree_for_each_path_safe(dl, path, n) \
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index c474bcd7d54b7..509b6f4fd0cd3 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -1764,6 +1764,71 @@ DEFINE_EVENT(xchk_dirtree_evaluate_class, name, \
 	TP_ARGS(dl, oc))
 DEFINE_XCHK_DIRTREE_EVALUATE_EVENT(xchk_dirtree_evaluate);
 
+TRACE_EVENT(xchk_dirpath_changed,
+	TP_PROTO(struct xfs_scrub *sc, unsigned int path_nr,
+		 unsigned int step_nr, const struct xfs_inode *dp,
+		 const struct xfs_inode *ip, const struct xfs_name *xname),
+	TP_ARGS(sc, path_nr, step_nr, dp, ip, xname),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, path_nr)
+		__field(unsigned int, step_nr)
+		__field(xfs_ino_t, child_ino)
+		__field(xfs_ino_t, parent_ino)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, xname->len)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->path_nr = path_nr;
+		__entry->step_nr = step_nr;
+		__entry->child_ino = ip->i_ino;
+		__entry->parent_ino = dp->i_ino;
+		__entry->namelen = xname->len;
+		memcpy(__get_str(name), xname->name, xname->len);
+	),
+	TP_printk("dev %d:%d path %u step %u child_ino 0x%llx parent_ino 0x%llx name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->path_nr,
+		  __entry->step_nr,
+		  __entry->child_ino,
+		  __entry->parent_ino,
+		  __entry->namelen,
+		  __get_str(name))
+);
+
+TRACE_EVENT(xchk_dirtree_live_update,
+	TP_PROTO(struct xfs_scrub *sc, const struct xfs_inode *dp,
+		 int action, const struct xfs_inode *ip, int delta,
+		 const struct xfs_name *xname),
+	TP_ARGS(sc, dp, action, ip, delta, xname),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, parent_ino)
+		__field(int, action)
+		__field(xfs_ino_t, child_ino)
+		__field(int, delta)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, xname->len)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->parent_ino = dp->i_ino;
+		__entry->action = action;
+		__entry->child_ino = ip->i_ino;
+		__entry->delta = delta;
+		__entry->namelen = xname->len;
+		memcpy(__get_str(name), xname->name, xname->len);
+	),
+	TP_printk("dev %d:%d parent_ino 0x%llx child_ino 0x%llx nlink_delta %d name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->parent_ino,
+		  __entry->child_ino,
+		  __entry->delta,
+		  __entry->namelen,
+		  __get_str(name))
+);
+
 /* repair tracepoints */
 #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
 


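A note on the staleness check in the patch above: xchk_dirpath_step_is_stale
walks a recorded path upward from the scanned inode and flags the path only
when a live update names exactly the same (parent, name, child) link as one
of the steps.  The following is a minimal userspace sketch of that idea, not
the kernel code.  The types and the helper name (struct demo_step, struct
demo_update, demo_path_is_stale) are hypothetical, names are modeled as
NUL-terminated strings for brevity, and the seen_inodes pre-filter is left
out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one recorded path step. */
struct demo_step {
	uint64_t	parent_ino;	/* parent directory named by this step */
	const char	*name;		/* dirent name inside that parent */
};

/* Hypothetical stand-in for a live directory update. */
struct demo_update {
	uint64_t	parent_ino;	/* directory being modified */
	uint64_t	child_ino;	/* inode being linked or unlinked */
	const char	*name;		/* dirent name being changed */
};

/*
 * Walk upward from scan_ino.  Return true if the update matches the
 * (parent, name, child) link recorded in any step of the path.
 */
static bool
demo_path_is_stale(uint64_t scan_ino, const struct demo_step *steps,
		   size_t nr_steps, const struct demo_update *up)
{
	uint64_t	child = scan_ino;
	size_t		i;

	assert(steps != NULL || nr_steps == 0);

	for (i = 0; i < nr_steps; i++) {
		uint64_t	parent = steps[i].parent_ino;

		/* Different inodes or a different name: this step is fine. */
		if (up->child_ino == child && up->parent_ino == parent &&
		    strcmp(up->name, steps[i].name) == 0)
			return true;

		/* The next step describes how this parent is linked. */
		child = parent;
	}

	return false;
}
```

An update to an unrelated directory, or to a different name in one of the
directories on the path, leaves the sketch's result false, which mirrors the
early "return 0" cases in the patch.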

* [PATCH 3/4] xfs: report directory tree corruption in the health information
  2024-04-10  0:46 ` [PATCHSET v13.1 8/9] xfs: detect and correct directory tree problems Darrick J. Wong
  2024-04-10  1:07   ` [PATCH 1/4] xfs: teach online scrub to find directory tree structure problems Darrick J. Wong
  2024-04-10  1:07   ` [PATCH 2/4] xfs: invalidate dirloop scrub path data when concurrent updates happen Darrick J. Wong
@ 2024-04-10  1:08   ` Darrick J. Wong
  2024-04-10  7:23     ` Christoph Hellwig
  2024-04-10  1:08   ` [PATCH 4/4] xfs: fix corruptions in the directory tree Darrick J. Wong
  3 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:08 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Report directories that are the source of corruption in the directory
tree.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
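For readers unfamiliar with the health reporting tables touched below, here
is a minimal userspace sketch of how a zero-terminated map can translate an
internal sick mask into the mask reported through bulkstat.  It is an
illustration under stated assumptions, not the kernel implementation: every
demo_/DEMO_ name is hypothetical, the DIRTREE and BS_SICK_PARENT bit values
are copied from this patch, and the internal PARENT bit value is assumed.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_SICK_INO_PARENT	(1u << 7)	/* assumed value, illustration only */
#define DEMO_SICK_INO_DIRTREE	(1u << 13)	/* same bit as XFS_SICK_INO_DIRTREE */
#define DEMO_BS_SICK_PARENT	(1u << 7)	/* same bit as XFS_BS_SICK_PARENT */
#define DEMO_BS_SICK_DIRTREE	(1u << 8)	/* same bit as XFS_BS_SICK_DIRTREE */

struct demo_sick_map {
	uint32_t	sick_mask;	/* internal health flag */
	uint32_t	ioctl_mask;	/* flag reported to userspace */
};

/* Zero-terminated translation table, in the style of ino_map[]. */
static const struct demo_sick_map demo_ino_map[] = {
	{ DEMO_SICK_INO_PARENT,		DEMO_BS_SICK_PARENT },
	{ DEMO_SICK_INO_DIRTREE,	DEMO_BS_SICK_DIRTREE },
	{ 0, 0 },
};

/* Translate internal sick bits into the userspace-visible bits. */
static uint32_t
demo_report_sick(uint32_t sick)
{
	const struct demo_sick_map	*m;
	uint32_t			out = 0;

	for (m = demo_ino_map; m->sick_mask; m++) {
		/* Every real entry must map to some reportable bit. */
		assert(m->ioctl_mask != 0);
		if (sick & m->sick_mask)
			out |= m->ioctl_mask;
	}

	return out;
}
```

Bits with no table entry (for example an internal-only flag) are silently
dropped by the sketch, which is why a new scrub type needs a new map entry
before userspace can see it.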
 fs/xfs/libxfs/xfs_fs.h     |    1 +
 fs/xfs/libxfs/xfs_health.h |    4 +++-
 fs/xfs/scrub/health.c      |    1 +
 fs/xfs/xfs_health.c        |    1 +
 4 files changed, 6 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index da0f427a09730..f2d2c1db18e53 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -412,6 +412,7 @@ struct xfs_bulkstat {
 #define XFS_BS_SICK_XATTR	(1 << 5)  /* extended attributes */
 #define XFS_BS_SICK_SYMLINK	(1 << 6)  /* symbolic link remote target */
 #define XFS_BS_SICK_PARENT	(1 << 7)  /* parent pointers */
+#define XFS_BS_SICK_DIRTREE	(1 << 8)  /* directory tree structure */
 
 /*
  * Project quota id helpers (previously projid was 16bit only
diff --git a/fs/xfs/libxfs/xfs_health.h b/fs/xfs/libxfs/xfs_health.h
index 3c64b5f9bd681..b0edb4288e592 100644
--- a/fs/xfs/libxfs/xfs_health.h
+++ b/fs/xfs/libxfs/xfs_health.h
@@ -95,6 +95,7 @@ struct xfs_da_args;
 
 /* Don't propagate sick status to ag health summary during inactivation */
 #define XFS_SICK_INO_FORGET	(1 << 12)
+#define XFS_SICK_INO_DIRTREE	(1 << 13)  /* directory tree structure */
 
 /* Primary evidence of health problems in a given group. */
 #define XFS_SICK_FS_PRIMARY	(XFS_SICK_FS_COUNTERS | \
@@ -125,7 +126,8 @@ struct xfs_da_args;
 				 XFS_SICK_INO_DIR | \
 				 XFS_SICK_INO_XATTR | \
 				 XFS_SICK_INO_SYMLINK | \
-				 XFS_SICK_INO_PARENT)
+				 XFS_SICK_INO_PARENT | \
+				 XFS_SICK_INO_DIRTREE)
 
 #define XFS_SICK_INO_ZAPPED	(XFS_SICK_INO_BMBTD_ZAPPED | \
 				 XFS_SICK_INO_BMBTA_ZAPPED | \
diff --git a/fs/xfs/scrub/health.c b/fs/xfs/scrub/health.c
index 9020a6bef7f14..b712a8bd34f54 100644
--- a/fs/xfs/scrub/health.c
+++ b/fs/xfs/scrub/health.c
@@ -108,6 +108,7 @@ static const struct xchk_health_map type_to_health_flag[XFS_SCRUB_TYPE_NR] = {
 	[XFS_SCRUB_TYPE_FSCOUNTERS]	= { XHG_FS,  XFS_SICK_FS_COUNTERS },
 	[XFS_SCRUB_TYPE_QUOTACHECK]	= { XHG_FS,  XFS_SICK_FS_QUOTACHECK },
 	[XFS_SCRUB_TYPE_NLINKS]		= { XHG_FS,  XFS_SICK_FS_NLINKS },
+	[XFS_SCRUB_TYPE_DIRTREE]	= { XHG_INO, XFS_SICK_INO_DIRTREE },
 };
 
 /* Return the health status mask for this scrub type. */
diff --git a/fs/xfs/xfs_health.c b/fs/xfs/xfs_health.c
index b39f959146bc1..10f116d093a22 100644
--- a/fs/xfs/xfs_health.c
+++ b/fs/xfs/xfs_health.c
@@ -470,6 +470,7 @@ static const struct ioctl_sick_map ino_map[] = {
 	{ XFS_SICK_INO_BMBTA_ZAPPED,	XFS_BS_SICK_BMBTA },
 	{ XFS_SICK_INO_DIR_ZAPPED,	XFS_BS_SICK_DIR },
 	{ XFS_SICK_INO_SYMLINK_ZAPPED,	XFS_BS_SICK_SYMLINK },
+	{ XFS_SICK_INO_DIRTREE,	XFS_BS_SICK_DIRTREE },
 	{ 0, 0 },
 };
 



* [PATCH 4/4] xfs: fix corruptions in the directory tree
  2024-04-10  0:46 ` [PATCHSET v13.1 8/9] xfs: detect and correct directory tree problems Darrick J. Wong
                     ` (2 preceding siblings ...)
  2024-04-10  1:08   ` [PATCH 3/4] xfs: report directory tree corruption in the health information Darrick J. Wong
@ 2024-04-10  1:08   ` Darrick J. Wong
  2024-04-10  7:23     ` Christoph Hellwig
  3 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:08 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Repair corruptions in the directory tree itself.  Cycles are broken by
removing an incoming parent -> child link.  Multiply-owned directories
are fixed by pruning the extra parent -> child links.  Disconnected
subtrees are reconnected to the lost and found.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
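The repair policy described above can be summarized as a small decision
function.  What follows is a minimal userspace sketch that mirrors the order
of the checks in xrep_dirtree_decide_fate from this patch.  The enum, the
struct, and the function name (demo_fate, demo_outcomes, demo_decide_fate)
are hypothetical and exist only for illustration; the kernel code rewrites
per-path outcomes instead of returning a verdict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical verdicts, one per branch of the decision. */
enum demo_fate {
	DEMO_DELETE_ALL,	/* parentless dir: no path may survive */
	DEMO_KEEP_ONLY_PATH,	/* exactly one path: keep it */
	DEMO_ADOPT,		/* no paths: move to lost+found */
	DEMO_DO_NOTHING,	/* no paths and no orphanage available */
	DEMO_KEEP_ONE_GOOD,	/* too many parents, one good path kept */
	DEMO_KEEP_ONE_SUSPECT,	/* too many parents, none of them good */
};

/* Hypothetical stand-in for the good/suspect path counters. */
struct demo_outcomes {
	unsigned int	good;		/* paths that reach the root */
	unsigned int	suspect;	/* corrupt or looping paths */
};

static enum demo_fate
demo_decide_fate(bool parentless, const struct demo_outcomes *oc,
		 bool have_orphanage)
{
	assert(oc != NULL);

	/* Parentless directories should not have any paths at all. */
	if (parentless)
		return DEMO_DELETE_ALL;

	/* One path is exactly the number of paths we want. */
	if (oc->good + oc->suspect == 1)
		return DEMO_KEEP_ONLY_PATH;

	/* Zero paths: reattach to the orphanage if there is one. */
	if (oc->good + oc->suspect == 0)
		return have_orphanage ? DEMO_ADOPT : DEMO_DO_NOTHING;

	/* Too many parents: prefer a good path over a suspect one. */
	if (oc->good > 0)
		return DEMO_KEEP_ONE_GOOD;
	return DEMO_KEEP_ONE_SUSPECT;
}
```

Note that a single suspect path is kept rather than deleted, because removing
it would leave the directory with no parent at all.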
 fs/xfs/Makefile               |    1 
 fs/xfs/scrub/dirtree.c        |   38 ++
 fs/xfs/scrub/dirtree.h        |   29 +
 fs/xfs/scrub/dirtree_repair.c |  821 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/orphanage.c      |    6 
 fs/xfs/scrub/orphanage.h      |    8 
 fs/xfs/scrub/repair.h         |    4 
 fs/xfs/scrub/scrub.c          |    2 
 fs/xfs/scrub/trace.h          |   23 +
 fs/xfs/xfs_inode.c            |    2 
 fs/xfs/xfs_inode.h            |    1 
 11 files changed, 927 insertions(+), 8 deletions(-)
 create mode 100644 fs/xfs/scrub/dirtree_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 8ec0dd257a984..d1ce1213797be 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -204,6 +204,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   bmap_repair.o \
 				   cow_repair.o \
 				   dir_repair.o \
+				   dirtree_repair.o \
 				   findparent.o \
 				   fscounters_repair.o \
 				   ialloc_repair.o \
diff --git a/fs/xfs/scrub/dirtree.c b/fs/xfs/scrub/dirtree.c
index 1d41dc9a4d00f..8c467f33b487d 100644
--- a/fs/xfs/scrub/dirtree.c
+++ b/fs/xfs/scrub/dirtree.c
@@ -26,6 +26,8 @@
 #include "scrub/xfblob.h"
 #include "scrub/listxattr.h"
 #include "scrub/trace.h"
+#include "scrub/repair.h"
+#include "scrub/orphanage.h"
 #include "scrub/dirtree.h"
 
 /*
@@ -95,6 +97,12 @@ xchk_setup_dirtree(
 
 	xchk_fsgates_enable(sc, XCHK_FSGATES_DIRENTS);
 
+	if (xchk_could_repair(sc)) {
+		error = xrep_setup_dirtree(sc);
+		if (error)
+			return error;
+	}
+
 	dl = kvzalloc(sizeof(struct xchk_dirtree), XCHK_GFP_FLAGS);
 	if (!dl)
 		return -ENOMEM;
@@ -104,6 +112,7 @@ xchk_setup_dirtree(
 	INIT_LIST_HEAD(&dl->path_list);
 	dl->root_ino = NULLFSINO;
 	dl->scan_ino = NULLFSINO;
+	dl->parent_ino = NULLFSINO;
 
 	mutex_init(&dl->lock);
 
@@ -142,7 +151,7 @@ xchk_setup_dirtree(
  * Add the parent pointer described by @dl->pptr to the given path as a new
  * step.  Returns -ELNRNG if the path is too deep.
  */
-STATIC int
+int
 xchk_dirpath_append(
 	struct xchk_dirtree		*dl,
 	struct xfs_inode		*ip,
@@ -603,6 +612,22 @@ xchk_dirpath_step_is_stale(
 	if (memcmp(dl->hook_xname.name, p->name->name, p->name->len) != 0)
 		return 0;
 
+	/*
+	 * If the update comes from the repair code itself, walk the state
+	 * machine forward.
+	 */
+	if (p->ip->i_ino == dl->scan_ino &&
+	    path->outcome == XREP_DIRPATH_ADOPTING) {
+		xchk_dirpath_set_outcome(dl, path, XREP_DIRPATH_ADOPTED);
+		return 0;
+	}
+
+	if (p->ip->i_ino == dl->scan_ino &&
+	    path->outcome == XREP_DIRPATH_DELETING) {
+		xchk_dirpath_set_outcome(dl, path, XREP_DIRPATH_DELETED);
+		return 0;
+	}
+
 	/* Exact match, scan data is out of date. */
 	trace_xchk_dirpath_changed(dl->sc, path->path_nr, step_nr, p->dp,
 			p->ip, p->name);
@@ -741,7 +766,7 @@ xchk_dirtree_load_path(
  * path was too deep; -ENOSR if there were too many parent pointers; or
  * a negative errno.
  */
-STATIC int
+int
 xchk_dirtree_find_paths_to_root(
 	struct xchk_dirtree	*dl)
 {
@@ -813,7 +838,7 @@ xchk_dirtree_find_paths_to_root(
  * Figure out what to do with the paths we tried to find.  Do not call this
  * if the scan results are stale.
  */
-STATIC void
+void
 xchk_dirtree_evaluate(
 	struct xchk_dirtree		*dl,
 	struct xchk_dirtree_outcomes	*oc)
@@ -850,6 +875,13 @@ xchk_dirtree_evaluate(
 			/* This path got all the way to the root. */
 			oc->good++;
 			break;
+		case XREP_DIRPATH_DELETING:
+		case XREP_DIRPATH_DELETED:
+		case XREP_DIRPATH_ADOPTING:
+		case XREP_DIRPATH_ADOPTED:
+			/* These should not be in progress! */
+			ASSERT(0);
+			break;
 		}
 	}
 
diff --git a/fs/xfs/scrub/dirtree.h b/fs/xfs/scrub/dirtree.h
index 2ddbcf43c2915..1e1686365c61c 100644
--- a/fs/xfs/scrub/dirtree.h
+++ b/fs/xfs/scrub/dirtree.h
@@ -26,6 +26,11 @@ enum xchk_dirpath_outcome {
 	XCHK_DIRPATH_LOOP,		/* cycle detected further up */
 	XCHK_DIRPATH_STALE,		/* path is stale */
 	XCHK_DIRPATH_OK,		/* path reaches the root */
+
+	XREP_DIRPATH_DELETING,		/* path is being deleted */
+	XREP_DIRPATH_DELETED,		/* path has been deleted */
+	XREP_DIRPATH_ADOPTING,		/* path is being adopted */
+	XREP_DIRPATH_ADOPTED,		/* path has been adopted */
 };
 
 /*
@@ -64,6 +69,9 @@ struct xchk_dirtree_outcomes {
 
 	/* Number of XCHK_DIRPATH_OK */
 	unsigned int		good;
+
+	/* Directory needs to be added to lost+found */
+	bool			needs_adoption;
 };
 
 struct xchk_dirtree {
@@ -79,6 +87,14 @@ struct xchk_dirtree {
 	 */
 	xfs_ino_t		scan_ino;
 
+	/*
+	 * If we start deleting redundant paths to this subdirectory, this is
+	 * the inode number of the surviving parent and the dotdot entry will
+	 * be set to this value.  If the value is NULLFSINO, then use @root_ino
+	 * as a stand-in until the orphanage can adopt the subdirectory.
+	 */
+	xfs_ino_t		parent_ino;
+
 	/* Scratch buffer for scanning pptr xattrs */
 	struct xfs_parent_rec	pptr_rec;
 	struct xfs_da_args	pptr_args;
@@ -87,12 +103,18 @@ struct xchk_dirtree {
 	struct xfs_name		xname;
 	char			namebuf[MAXNAMELEN];
 
+	/* Information for reparenting this directory. */
+	struct xrep_adoption	adoption;
+
 	/*
 	 * Hook into directory updates so that we can receive live updates
 	 * from other writer threads.
 	 */
 	struct xfs_dir_hook	dhook;
 
+	/* Parent pointer update arguments. */
+	struct xfs_parent_args	ppargs;
+
 	/* lock for everything below here */
 	struct mutex		lock;
 
@@ -146,4 +168,11 @@ xchk_dirtree_parentless(const struct xchk_dirtree *dl)
 	return false;
 }
 
+int xchk_dirtree_find_paths_to_root(struct xchk_dirtree *dl);
+int xchk_dirpath_append(struct xchk_dirtree *dl, struct xfs_inode *ip,
+		struct xchk_dirpath *path, const struct xfs_name *name,
+		const struct xfs_parent_rec *pptr);
+void xchk_dirtree_evaluate(struct xchk_dirtree *dl,
+		struct xchk_dirtree_outcomes *oc);
+
 #endif /* __XFS_SCRUB_DIRTREE_H__ */
diff --git a/fs/xfs/scrub/dirtree_repair.c b/fs/xfs/scrub/dirtree_repair.c
new file mode 100644
index 0000000000000..5c04e70ba9518
--- /dev/null
+++ b/fs/xfs/scrub/dirtree_repair.c
@@ -0,0 +1,821 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (c) 2023-2024 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_trans_space.h"
+#include "xfs_mount.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_attr.h"
+#include "xfs_parent.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/bitmap.h"
+#include "scrub/ino_bitmap.h"
+#include "scrub/xfile.h"
+#include "scrub/xfarray.h"
+#include "scrub/xfblob.h"
+#include "scrub/listxattr.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+#include "scrub/orphanage.h"
+#include "scrub/dirtree.h"
+#include "scrub/readdir.h"
+
+/*
+ * Directory Tree Structure Repairs
+ * ================================
+ *
+ * If we decide that the directory being scanned is participating in a
+ * directory loop, the only change we can make is to remove directory entries
+ * pointing down to @sc->ip.  If that leaves it with no parents, the directory
+ * should be adopted by the orphanage.
+ */
+
+/* Set up to repair directory loops. */
+int
+xrep_setup_dirtree(
+	struct xfs_scrub	*sc)
+{
+	return xrep_orphanage_try_create(sc);
+}
+
+/* Change the outcome of this path. */
+static inline void
+xrep_dirpath_set_outcome(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirpath		*path,
+	enum xchk_dirpath_outcome	outcome)
+{
+	trace_xrep_dirpath_set_outcome(dl->sc, path->path_nr, path->nr_steps,
+			outcome);
+
+	path->outcome = outcome;
+}
+
+/* Delete all paths. */
+STATIC void
+xrep_dirtree_delete_all_paths(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirtree_outcomes	*oc)
+{
+	struct xchk_dirpath		*path;
+
+	xchk_dirtree_for_each_path(dl, path) {
+		switch (path->outcome) {
+		case XCHK_DIRPATH_CORRUPT:
+		case XCHK_DIRPATH_LOOP:
+			oc->suspect--;
+			oc->bad++;
+			xrep_dirpath_set_outcome(dl, path, XCHK_DIRPATH_DELETE);
+			break;
+		case XCHK_DIRPATH_OK:
+			oc->good--;
+			oc->bad++;
+			xrep_dirpath_set_outcome(dl, path, XCHK_DIRPATH_DELETE);
+			break;
+		default:
+			break;
+		}
+	}
+
+	ASSERT(oc->suspect == 0);
+	ASSERT(oc->good == 0);
+}
+
+/* Since this is the surviving path, set the dotdot entry to this value. */
+STATIC void
+xrep_dirpath_retain_parent(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirpath		*path)
+{
+	struct xchk_dirpath_step	step;
+	int				error;
+
+	error = xfarray_load(dl->path_steps, path->first_step, &step);
+	if (error)
+		return;
+
+	dl->parent_ino = be64_to_cpu(step.pptr_rec.p_ino);
+}
+
+/* Find the one surviving path so we know how to set dotdot. */
+STATIC void
+xrep_dirtree_find_surviving_path(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirtree_outcomes	*oc)
+{
+	struct xchk_dirpath		*path;
+	bool				foundit = false;
+
+	xchk_dirtree_for_each_path(dl, path) {
+		switch (path->outcome) {
+		case XCHK_DIRPATH_CORRUPT:
+		case XCHK_DIRPATH_LOOP:
+		case XCHK_DIRPATH_OK:
+			if (!foundit) {
+				xrep_dirpath_retain_parent(dl, path);
+				foundit = true;
+				continue;
+			}
+			ASSERT(foundit == false);
+			break;
+		default:
+			break;
+		}
+	}
+
+	ASSERT(oc->suspect + oc->good == 1);
+}
+
+/* Delete all paths except for the one good one. */
+STATIC void
+xrep_dirtree_keep_one_good_path(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirtree_outcomes	*oc)
+{
+	struct xchk_dirpath		*path;
+	bool				foundit = false;
+
+	xchk_dirtree_for_each_path(dl, path) {
+		switch (path->outcome) {
+		case XCHK_DIRPATH_CORRUPT:
+		case XCHK_DIRPATH_LOOP:
+			oc->suspect--;
+			oc->bad++;
+			xrep_dirpath_set_outcome(dl, path, XCHK_DIRPATH_DELETE);
+			break;
+		case XCHK_DIRPATH_OK:
+			if (!foundit) {
+				xrep_dirpath_retain_parent(dl, path);
+				foundit = true;
+				continue;
+			}
+			oc->good--;
+			oc->bad++;
+			xrep_dirpath_set_outcome(dl, path, XCHK_DIRPATH_DELETE);
+			break;
+		default:
+			break;
+		}
+	}
+
+	ASSERT(oc->suspect == 0);
+	ASSERT(oc->good < 2);
+}
+
+/* Delete all paths except for one suspect one. */
+STATIC void
+xrep_dirtree_keep_one_suspect_path(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirtree_outcomes	*oc)
+{
+	struct xchk_dirpath		*path;
+	bool				foundit = false;
+
+	xchk_dirtree_for_each_path(dl, path) {
+		switch (path->outcome) {
+		case XCHK_DIRPATH_CORRUPT:
+		case XCHK_DIRPATH_LOOP:
+			if (!foundit) {
+				xrep_dirpath_retain_parent(dl, path);
+				foundit = true;
+				continue;
+			}
+			oc->suspect--;
+			oc->bad++;
+			xrep_dirpath_set_outcome(dl, path, XCHK_DIRPATH_DELETE);
+			break;
+		case XCHK_DIRPATH_OK:
+			ASSERT(0);
+			break;
+		default:
+			break;
+		}
+	}
+
+	ASSERT(oc->suspect == 1);
+	ASSERT(oc->good == 0);
+}
+
+/*
+ * Figure out what to do with the paths we tried to find.  The verdicts are
+ * recorded in each path's outcome and in @oc.
+ */
+STATIC void
+xrep_dirtree_decide_fate(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirtree_outcomes	*oc)
+{
+	xchk_dirtree_evaluate(dl, oc);
+
+	/* Parentless directories should not have any paths at all. */
+	if (xchk_dirtree_parentless(dl)) {
+		xrep_dirtree_delete_all_paths(dl, oc);
+		return;
+	}
+
+	/* One path is exactly the number of paths we want. */
+	if (oc->good + oc->suspect == 1) {
+		xrep_dirtree_find_surviving_path(dl, oc);
+		return;
+	}
+
+	/* Zero paths means we should reattach the subdir to the orphanage. */
+	if (oc->good + oc->suspect == 0) {
+		if (dl->sc->orphanage)
+			oc->needs_adoption = true;
+		return;
+	}
+
+	/*
+	 * Otherwise, this subdirectory has too many parents.  If there's at
+	 * least one good path, keep it and delete the others.
+	 */
+	if (oc->good > 0) {
+		xrep_dirtree_keep_one_good_path(dl, oc);
+		return;
+	}
+
+	/*
+	 * There are no good paths and there are too many suspect paths.
+	 * Keep the first suspect path and delete the rest.
+	 */
+	xrep_dirtree_keep_one_suspect_path(dl, oc);
+}
+
+/*
+ * Load the first step of this path into @step and @dl->xname/pptr_rec
+ * for later repair work.
+ */
+STATIC int
+xrep_dirtree_prep_path(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirpath		*path,
+	struct xchk_dirpath_step	*step)
+{
+	int				error;
+
+	error = xfarray_load(dl->path_steps, path->first_step, step);
+	if (error)
+		return error;
+
+	error = xfblob_loadname(dl->path_names, step->name_cookie, &dl->xname,
+			step->name_len);
+	if (error)
+		return error;
+
+	dl->pptr_rec = step->pptr_rec; /* struct copy */
+	return 0;
+}
+
+/* Delete the VFS dentry for a removed child. */
+STATIC int
+xrep_dirtree_purge_dentry(
+	struct xchk_dirtree	*dl,
+	struct xfs_inode	*dp,
+	const struct xfs_name	*name)
+{
+	struct qstr		qname = QSTR_INIT(name->name, name->len);
+	struct dentry		*parent_dentry, *child_dentry;
+	int			error = 0;
+
+	/*
+	 * Find the dentry for the parent directory.  If there isn't one, we're
+	 * done.  Caller already holds i_rwsem for parent and child.
+	 */
+	parent_dentry = d_find_alias(VFS_I(dp));
+	if (!parent_dentry)
+		return 0;
+
+	/* The VFS thinks the parent is a directory, right? */
+	if (!d_is_dir(parent_dentry)) {
+		ASSERT(d_is_dir(parent_dentry));
+		error = -EFSCORRUPTED;
+		goto out_dput_parent;
+	}
+
+	/*
+	 * Try to find the dirent pointing to the child.  If there isn't one,
+	 * we're done.
+	 */
+	qname.hash = full_name_hash(parent_dentry, name->name, name->len);
+	child_dentry = d_lookup(parent_dentry, &qname);
+	if (!child_dentry) {
+		error = 0;
+		goto out_dput_parent;
+	}
+
+	trace_xrep_dirtree_delete_child(dp->i_mount, child_dentry);
+
+	/* Child is not a directory?  We're screwed. */
+	if (!d_is_dir(child_dentry)) {
+		ASSERT(d_is_dir(child_dentry));
+		error = -EFSCORRUPTED;
+		goto out_dput_child;
+	}
+
+	/* Replace the child dentry with a negative one. */
+	d_delete(child_dentry);
+
+out_dput_child:
+	dput(child_dentry);
+out_dput_parent:
+	dput(parent_dentry);
+	return error;
+}
+
+/*
+ * Prepare to delete a link by taking the IOLOCK of the parent and the child
+ * (scrub target).  Caller must hold IOLOCK_EXCL on @sc->ip.  Returns 0 if we
+ * took both locks, or a negative errno if we couldn't lock the parent in time.
+ */
+static inline int
+xrep_dirtree_unlink_iolock(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp)
+{
+	int			error;
+
+	ASSERT(sc->ilock_flags & XFS_IOLOCK_EXCL);
+
+	if (xfs_ilock_nowait(dp, XFS_IOLOCK_EXCL))
+		return 0;
+
+	xchk_iunlock(sc, XFS_IOLOCK_EXCL);
+	do {
+		xfs_ilock(dp, XFS_IOLOCK_EXCL);
+		if (xchk_ilock_nowait(sc, XFS_IOLOCK_EXCL))
+			break;
+		xfs_iunlock(dp, XFS_IOLOCK_EXCL);
+
+		if (xchk_should_terminate(sc, &error)) {
+			xchk_ilock(sc, XFS_IOLOCK_EXCL);
+			return error;
+		}
+
+		delay(1);
+	} while (1);
+
+	return 0;
+}
+
+/*
+ * Remove a link from the directory tree and update the dcache.  Returns
+ * -ESTALE if the scan data are now out of date.
+ */
+STATIC int
+xrep_dirtree_unlink(
+	struct xchk_dirtree		*dl,
+	struct xfs_inode		*dp,
+	struct xchk_dirpath		*path,
+	struct xchk_dirpath_step	*step)
+{
+	struct xfs_scrub		*sc = dl->sc;
+	struct xfs_mount		*mp = sc->mp;
+	xfs_ino_t			dotdot_ino;
+	xfs_ino_t			parent_ino = dl->parent_ino;
+	unsigned int			resblks;
+	int				dontcare;
+	int				error;
+
+	/* Take IOLOCK_EXCL of the parent and child. */
+	error = xrep_dirtree_unlink_iolock(sc, dp);
+	if (error)
+		return error;
+
+	/*
+	 * Create the transaction that we need to sever the path.  Ignore
+	 * EDQUOT and ENOSPC being returned via nospace_error because the
+	 * directory code can handle a reservationless update.
+	 */
+	resblks = xfs_remove_space_res(mp, step->name_len);
+	error = xfs_trans_alloc_dir(dp, &M_RES(mp)->tr_remove, sc->ip,
+			&resblks, &sc->tp, &dontcare);
+	if (error)
+		goto out_iolock;
+
+	/*
+	 * Cancel if someone invalidated the paths while we were trying to get
+	 * the ILOCK.
+	 */
+	mutex_lock(&dl->lock);
+	if (dl->stale) {
+		mutex_unlock(&dl->lock);
+		error = -ESTALE;
+		goto out_trans_cancel;
+	}
+	xrep_dirpath_set_outcome(dl, path, XREP_DIRPATH_DELETING);
+	mutex_unlock(&dl->lock);
+
+	trace_xrep_dirtree_delete_path(dl->sc, sc->ip, path->path_nr,
+			&dl->xname, &dl->pptr_rec);
+
+	/*
+	 * Decide if we need to reset the dotdot entry.  Rules:
+	 *
+	 * - If there's a surviving parent, we want dotdot to point there.
+	 * - If we don't have any surviving parents, then point dotdot at the
+	 *   root dir.
+	 * - If dotdot is already set to the value we want, pass in NULLFSINO
+	 *   for no change necessary.
+	 *
+	 * Do this /before/ we dirty anything, in case the dotdot lookup
+	 * fails.
+	 */
+	error = xchk_dir_lookup(sc, sc->ip, &xfs_name_dotdot, &dotdot_ino);
+	if (error)
+		goto out_trans_cancel;
+	if (parent_ino == NULLFSINO)
+		parent_ino = dl->root_ino;
+	if (dotdot_ino == parent_ino)
+		parent_ino = NULLFSINO;
+
+	/* Drop the link from sc->ip's dotdot entry.  */
+	error = xfs_droplink(sc->tp, dp);
+	if (error)
+		goto out_trans_cancel;
+
+	/* Reset the dotdot entry to a surviving parent. */
+	if (parent_ino != NULLFSINO) {
+		error = xfs_dir_replace(sc->tp, sc->ip, &xfs_name_dotdot,
+				parent_ino, 0);
+		if (error)
+			goto out_trans_cancel;
+	}
+
+	/* Drop the link from dp to sc->ip. */
+	error = xfs_droplink(sc->tp, sc->ip);
+	if (error)
+		goto out_trans_cancel;
+
+	error = xfs_dir_removename(sc->tp, dp, &dl->xname, sc->ip->i_ino,
+			resblks);
+	if (error) {
+		ASSERT(error != -ENOENT);
+		goto out_trans_cancel;
+	}
+
+	if (xfs_has_parent(sc->mp)) {
+		error = xfs_parent_removename(sc->tp, &dl->ppargs, dp,
+				&dl->xname, sc->ip);
+		if (error)
+			goto out_trans_cancel;
+	}
+
+	/*
+	 * Notify dirent hooks that we removed the bad link, invalidate the
+	 * dcache, and commit the repair.
+	 */
+	xfs_dir_update_hook(dp, sc->ip, -1, &dl->xname);
+	error = xrep_dirtree_purge_dentry(dl, dp, &dl->xname);
+	if (error)
+		goto out_trans_cancel;
+
+	error = xrep_trans_commit(sc);
+	goto out_ilock;
+
+out_trans_cancel:
+	xchk_trans_cancel(sc);
+out_ilock:
+	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+out_iolock:
+	xfs_iunlock(dp, XFS_IOLOCK_EXCL);
+	return error;
+}
+
+/*
+ * Delete a directory entry that points to this directory.  Returns -ESTALE
+ * if the scan data are now out of date.
+ */
+STATIC int
+xrep_dirtree_delete_path(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirpath		*path)
+{
+	struct xchk_dirpath_step	step;
+	struct xfs_scrub		*sc = dl->sc;
+	struct xfs_inode		*dp;
+	int				error;
+
+	/*
+	 * Load the parent pointer and directory inode for this path, then
+	 * drop the scan lock, the ILOCK, and the transaction so that
+	 * xrep_dirtree_unlink can reserve the proper transaction.  This sets
+	 * up @dl->xname for the deletion.
+	 */
+	error = xrep_dirtree_prep_path(dl, path, &step);
+	if (error)
+		return error;
+
+	error = xchk_iget(sc, be64_to_cpu(step.pptr_rec.p_ino), &dp);
+	if (error)
+		return error;
+
+	mutex_unlock(&dl->lock);
+	xchk_trans_cancel(sc);
+	xchk_iunlock(sc, XFS_ILOCK_EXCL);
+
+	/* Delete the directory link and release the parent. */
+	error = xrep_dirtree_unlink(dl, dp, path, &step);
+	xchk_irele(sc, dp);
+
+	/*
+	 * Retake all the resources we had at the beginning even if the repair
+	 * failed or the scan data are now stale.  This keeps things simple for
+	 * the caller.
+	 */
+	xchk_trans_alloc_empty(sc);
+	xchk_ilock(sc, XFS_ILOCK_EXCL);
+	mutex_lock(&dl->lock);
+
+	if (!error && dl->stale)
+		error = -ESTALE;
+	return error;
+}
+
+/* Add a new path to represent our in-progress adoption. */
+STATIC int
+xrep_dirtree_create_adoption_path(
+	struct xchk_dirtree		*dl)
+{
+	struct xfs_scrub		*sc = dl->sc;
+	struct xchk_dirpath		*path;
+	int				error;
+
+	/*
+	 * We should have capped the number of paths at XFS_MAXLINK-1 in the
+	 * scanner.
+	 */
+	if (dl->nr_paths > XFS_MAXLINK) {
+		ASSERT(dl->nr_paths <= XFS_MAXLINK);
+		return -EFSCORRUPTED;
+	}
+
+	/*
+	 * Create a new xchk_dirpath structure to remember this parent pointer
+	 * and record the first name step.
+	 */
+	path = kmalloc(sizeof(struct xchk_dirpath), XCHK_GFP_FLAGS);
+	if (!path)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&path->list);
+	xino_bitmap_init(&path->seen_inodes);
+	path->nr_steps = 0;
+	path->outcome = XREP_DIRPATH_ADOPTING;
+
+	/*
+	 * Record the new link that we just created in the orphanage.  Because
+	 * adoption is the last repair that we perform, we don't bother filling
+	 * in the path all the way back to the root.
+	 */
+	xfs_inode_to_parent_rec(&dl->pptr_rec, sc->orphanage);
+
+	error = xino_bitmap_set(&path->seen_inodes, sc->orphanage->i_ino);
+	if (error)
+		goto out_path;
+
+	trace_xrep_dirtree_create_adoption(sc, sc->ip, dl->nr_paths,
+			&dl->xname, &dl->pptr_rec);
+
+	error = xchk_dirpath_append(dl, sc->ip, path, &dl->xname,
+			&dl->pptr_rec);
+	if (error)
+		goto out_path;
+
+	path->first_step = xfarray_length(dl->path_steps) - 1;
+	path->second_step = XFARRAY_NULLIDX;
+	path->path_nr = dl->nr_paths;
+
+	list_add_tail(&path->list, &dl->path_list);
+	dl->nr_paths++;
+	return 0;
+
+out_path:
+	kfree(path);
+	return error;
+}
+
+/*
+ * Prepare to move a file to the orphanage by taking the IOLOCK of the
+ * orphanage and the child (scrub target).  Caller must hold IOLOCK_EXCL on
+ * @sc->ip.  Returns 0 if we took both locks, or a negative errno if we
+ * couldn't lock the orphanage in time.
+ */
+static inline int
+xrep_dirtree_adopt_iolock(
+	struct xfs_scrub	*sc)
+{
+	int			error;
+
+	ASSERT(sc->ilock_flags & XFS_IOLOCK_EXCL);
+
+	if (xrep_orphanage_ilock_nowait(sc, XFS_IOLOCK_EXCL))
+		return 0;
+
+	xchk_iunlock(sc, XFS_IOLOCK_EXCL);
+	do {
+		xrep_orphanage_ilock(sc, XFS_IOLOCK_EXCL);
+		if (xchk_ilock_nowait(sc, XFS_IOLOCK_EXCL))
+			break;
+		xrep_orphanage_iunlock(sc, XFS_IOLOCK_EXCL);
+
+		if (xchk_should_terminate(sc, &error)) {
+			xchk_ilock(sc, XFS_IOLOCK_EXCL);
+			return error;
+		}
+
+		delay(1);
+	} while (1);
+
+	return 0;
+}
+
+/*
+ * Reattach this orphaned directory to the orphanage.  Do not call this with
+ * any resources held.  Returns -ESTALE if the scan data have become out of
+ * date.
+ */
+STATIC int
+xrep_dirtree_adopt(
+	struct xchk_dirtree		*dl)
+{
+	struct xfs_scrub		*sc = dl->sc;
+	int				error;
+
+	/* Take the IOLOCK of the orphanage and the scrub target. */
+	error = xrep_dirtree_adopt_iolock(sc);
+	if (error)
+		return error;
+
+	/*
+	 * Set up for an adoption.  The directory tree fixer runs after the
+	 * link counts have been corrected.  Therefore, we must bump the
+	 * child's link count since there will be no further opportunity to fix
+	 * errors.
+	 */
+	error = xrep_adoption_trans_alloc(sc, &dl->adoption);
+	if (error)
+		goto out_iolock;
+	dl->adoption.bump_child_nlink = true;
+
+	/* Figure out what name we're going to use here. */
+	error = xrep_adoption_compute_name(&dl->adoption, &dl->xname);
+	if (error)
+		goto out_trans;
+
+	/*
+	 * Now that we have a proposed name for the orphanage entry, create
+	 * a faux path so that the live update hook will see it.
+	 */
+	mutex_lock(&dl->lock);
+	if (dl->stale) {
+		mutex_unlock(&dl->lock);
+		error = -ESTALE;
+		goto out_trans;
+	}
+	error = xrep_dirtree_create_adoption_path(dl);
+	mutex_unlock(&dl->lock);
+	if (error)
+		goto out_trans;
+
+	/* Reparent the directory. */
+	error = xrep_adoption_move(&dl->adoption);
+	if (error)
+		goto out_trans;
+
+	/*
+	 * Commit the name and release all inode locks except for the scrub
+	 * target's IOLOCK.
+	 */
+	error = xrep_trans_commit(sc);
+	goto out_ilock;
+
+out_trans:
+	xchk_trans_cancel(sc);
+out_ilock:
+	xchk_iunlock(sc, XFS_ILOCK_EXCL);
+	xrep_orphanage_iunlock(sc, XFS_ILOCK_EXCL);
+out_iolock:
+	xrep_orphanage_iunlock(sc, XFS_IOLOCK_EXCL);
+	return error;
+}
+
+/*
+ * This newly orphaned directory needs to be adopted by the orphanage.
+ * Make this happen.
+ */
+STATIC int
+xrep_dirtree_move_to_orphanage(
+	struct xchk_dirtree		*dl)
+{
+	struct xfs_scrub		*sc = dl->sc;
+	int				error;
+
+	/*
+	 * Start by dropping all the resources that we hold so that we can grab
+	 * all the resources that we need for the adoption.
+	 */
+	mutex_unlock(&dl->lock);
+	xchk_trans_cancel(sc);
+	xchk_iunlock(sc, XFS_ILOCK_EXCL);
+
+	/* Perform the adoption. */
+	error = xrep_dirtree_adopt(dl);
+
+	/*
+	 * Retake all the resources we had at the beginning even if the repair
+	 * failed or the scan data are now stale.  This keeps things simple for
+	 * the caller.
+	 */
+	xchk_trans_alloc_empty(sc);
+	xchk_ilock(sc, XFS_ILOCK_EXCL);
+	mutex_lock(&dl->lock);
+
+	if (!error && dl->stale)
+		error = -ESTALE;
+	return error;
+}
+
+/*
+ * Try to fix all the problems.  Returns -ESTALE if the scan data have become
+ * out of date.
+ */
+STATIC int
+xrep_dirtree_fix_problems(
+	struct xchk_dirtree		*dl,
+	struct xchk_dirtree_outcomes	*oc)
+{
+	struct xchk_dirpath		*path;
+	int				error;
+
+	/* Delete all the paths we don't want. */
+	xchk_dirtree_for_each_path(dl, path) {
+		if (path->outcome != XCHK_DIRPATH_DELETE)
+			continue;
+
+		error = xrep_dirtree_delete_path(dl, path);
+		if (error)
+			return error;
+	}
+
+	/* Reparent this directory to the orphanage. */
+	if (oc->needs_adoption) {
+		if (xrep_orphanage_can_adopt(dl->sc))
+			return xrep_dirtree_move_to_orphanage(dl);
+		return -EFSCORRUPTED;
+	}
+
+	return 0;
+}
+
+/* Fix directory loops involving this directory. */
+int
+xrep_dirtree(
+	struct xfs_scrub		*sc)
+{
+	struct xchk_dirtree		*dl = sc->buf;
+	struct xchk_dirtree_outcomes	oc;
+	int				error;
+
+	/*
+	 * Prepare to fix the directory tree by retaking the scan lock.  The
+	 * order of resource acquisition is still IOLOCK -> transaction ->
+	 * ILOCK -> scan lock.
+	 */
+	mutex_lock(&dl->lock);
+	do {
+		/*
+		 * Decide what we're going to do, then do it.  An -ESTALE
+		 * return here means the scan results are invalid and we have
+		 * to walk again.
+		 */
+		if (!dl->stale) {
+			xrep_dirtree_decide_fate(dl, &oc);
+
+			trace_xrep_dirtree_decided_fate(dl, &oc);
+
+			error = xrep_dirtree_fix_problems(dl, &oc);
+			if (error != -ESTALE)
+				break;
+		}
+		error = xchk_dirtree_find_paths_to_root(dl);
+		if (error == -ELNRNG || error == -ENOSR)
+			error = -EFSCORRUPTED;
+	} while (!error);
+	mutex_unlock(&dl->lock);
+
+	return error;
+}
diff --git a/fs/xfs/scrub/orphanage.c b/fs/xfs/scrub/orphanage.c
index b2f905924d0d8..b1c6c60ee1da6 100644
--- a/fs/xfs/scrub/orphanage.c
+++ b/fs/xfs/scrub/orphanage.c
@@ -570,6 +570,12 @@ xrep_adoption_move(
 		xfs_bumplink(sc->tp, sc->orphanage);
 	xfs_trans_log_inode(sc->tp, sc->orphanage, XFS_ILOG_CORE);
 
+	/* Bump the link count of the child. */
+	if (adopt->bump_child_nlink) {
+		xfs_bumplink(sc->tp, sc->ip);
+		xfs_trans_log_inode(sc->tp, sc->ip, XFS_ILOG_CORE);
+	}
+
 	/* Replace the dotdot entry if the child is a subdirectory. */
 	if (isdir) {
 		error = xfs_dir_replace(sc->tp, sc->ip, &xfs_name_dotdot,
diff --git a/fs/xfs/scrub/orphanage.h b/fs/xfs/scrub/orphanage.h
index beb6b686784e6..7c7a2e7d81dbd 100644
--- a/fs/xfs/scrub/orphanage.h
+++ b/fs/xfs/scrub/orphanage.h
@@ -60,6 +60,14 @@ struct xrep_adoption {
 	/* Block reservations for orphanage and child (if directory). */
 	unsigned int		orphanage_blkres;
 	unsigned int		child_blkres;
+
+	/*
+	 * Does the caller want us to bump the child link count?  This is not
+	 * needed when reattaching files that have become disconnected but have
+	 * nlink > 1.  It is necessary when changing the directory tree
+	 * structure.
+	 */
+	bool			bump_child_nlink:1;
 };
 
 bool xrep_orphanage_can_adopt(struct xfs_scrub *sc);
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 622eb486a16fb..0e0dc2bf985c2 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -95,6 +95,7 @@ int xrep_setup_directory(struct xfs_scrub *sc);
 int xrep_setup_parent(struct xfs_scrub *sc);
 int xrep_setup_nlinks(struct xfs_scrub *sc);
 int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *resblks);
+int xrep_setup_dirtree(struct xfs_scrub *sc);
 
 /* Repair setup functions */
 int xrep_setup_ag_allocbt(struct xfs_scrub *sc);
@@ -132,6 +133,7 @@ int xrep_xattr(struct xfs_scrub *sc);
 int xrep_directory(struct xfs_scrub *sc);
 int xrep_parent(struct xfs_scrub *sc);
 int xrep_symlink(struct xfs_scrub *sc);
+int xrep_dirtree(struct xfs_scrub *sc);
 
 #ifdef CONFIG_XFS_RT
 int xrep_rtbitmap(struct xfs_scrub *sc);
@@ -205,6 +207,7 @@ xrep_setup_nothing(
 #define xrep_setup_directory		xrep_setup_nothing
 #define xrep_setup_parent		xrep_setup_nothing
 #define xrep_setup_nlinks		xrep_setup_nothing
+#define xrep_setup_dirtree		xrep_setup_nothing
 
 #define xrep_setup_inode(sc, imap)	((void)0)
 
@@ -239,6 +242,7 @@ static inline int xrep_setup_symlink(struct xfs_scrub *sc, unsigned int *x)
 #define xrep_directory			xrep_notsupported
 #define xrep_parent			xrep_notsupported
 #define xrep_symlink			xrep_notsupported
+#define xrep_dirtree			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 8f1431db77395..e813b66b603a1 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -441,7 +441,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.setup	= xchk_setup_dirtree,
 		.scrub	= xchk_dirtree,
 		.has	= xfs_has_parent,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_dirtree,
 	},
 };
 
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 509b6f4fd0cd3..b3756722bee1d 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -1685,6 +1685,10 @@ TRACE_DEFINE_ENUM(XCHK_DIRPATH_CORRUPT);
 TRACE_DEFINE_ENUM(XCHK_DIRPATH_LOOP);
 TRACE_DEFINE_ENUM(XCHK_DIRPATH_STALE);
 TRACE_DEFINE_ENUM(XCHK_DIRPATH_OK);
+TRACE_DEFINE_ENUM(XREP_DIRPATH_DELETING);
+TRACE_DEFINE_ENUM(XREP_DIRPATH_DELETED);
+TRACE_DEFINE_ENUM(XREP_DIRPATH_ADOPTING);
+TRACE_DEFINE_ENUM(XREP_DIRPATH_ADOPTED);
 
 #define XCHK_DIRPATH_OUTCOME_STRINGS \
 	{ XCHK_DIRPATH_SCANNING,	"scanning" }, \
@@ -1692,7 +1696,11 @@ TRACE_DEFINE_ENUM(XCHK_DIRPATH_OK);
 	{ XCHK_DIRPATH_CORRUPT,		"corrupt" }, \
 	{ XCHK_DIRPATH_LOOP,		"loop" }, \
 	{ XCHK_DIRPATH_STALE,		"stale" }, \
-	{ XCHK_DIRPATH_OK,		"ok" }
+	{ XCHK_DIRPATH_OK,		"ok" }, \
+	{ XREP_DIRPATH_DELETING,	"deleting" }, \
+	{ XREP_DIRPATH_DELETED,		"deleted" }, \
+	{ XREP_DIRPATH_ADOPTING,	"adopting" }, \
+	{ XREP_DIRPATH_ADOPTED,		"adopted" }
 
 DECLARE_EVENT_CLASS(xchk_dirpath_outcome_class,
 	TP_PROTO(struct xfs_scrub *sc, unsigned long long path_nr,
@@ -1738,6 +1746,7 @@ DECLARE_EVENT_CLASS(xchk_dirtree_evaluate_class,
 		__field(unsigned int, bad)
 		__field(unsigned int, suspect)
 		__field(unsigned int, good)
+		__field(bool, needs_adoption)
 	),
 	TP_fast_assign(
 		__entry->dev = dl->sc->mp->m_super->s_dev;
@@ -1747,15 +1756,17 @@ DECLARE_EVENT_CLASS(xchk_dirtree_evaluate_class,
 		__entry->bad = oc->bad;
 		__entry->suspect = oc->suspect;
 		__entry->good = oc->good;
+		__entry->needs_adoption = oc->needs_adoption ? 1 : 0;
 	),
-	TP_printk("dev %d:%d ino 0x%llx rootino 0x%llx nr_paths %u bad %u suspect %u good %u",
+	TP_printk("dev %d:%d ino 0x%llx rootino 0x%llx nr_paths %u bad %u suspect %u good %u adopt? %d",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->rootino,
 		  __entry->nr_paths,
 		  __entry->bad,
 		  __entry->suspect,
-		  __entry->good)
+		  __entry->good,
+		  __entry->needs_adoption)
 );
 #define DEFINE_XCHK_DIRTREE_EVALUATE_EVENT(name) \
 DEFINE_EVENT(xchk_dirtree_evaluate_class, name, \
@@ -3181,6 +3192,7 @@ DEFINE_REPAIR_DENTRY_EVENT(xrep_adoption_check_child);
 DEFINE_REPAIR_DENTRY_EVENT(xrep_adoption_check_alias);
 DEFINE_REPAIR_DENTRY_EVENT(xrep_adoption_check_dentry);
 DEFINE_REPAIR_DENTRY_EVENT(xrep_adoption_invalidate_child);
+DEFINE_REPAIR_DENTRY_EVENT(xrep_dirtree_delete_child);
 
 TRACE_EVENT(xrep_symlink_salvage_target,
 	TP_PROTO(struct xfs_inode *ip, char *target, unsigned int targetlen),
@@ -3483,6 +3495,11 @@ TRACE_EVENT(xrep_iunlink_commit_bucket,
 		  __entry->agino)
 );
 
+DEFINE_XCHK_DIRPATH_OUTCOME_EVENT(xrep_dirpath_set_outcome);
+DEFINE_XCHK_DIRTREE_EVENT(xrep_dirtree_delete_path);
+DEFINE_XCHK_DIRTREE_EVENT(xrep_dirtree_create_adoption);
+DEFINE_XCHK_DIRTREE_EVALUATE_EVENT(xrep_dirtree_decided_fate);
+
 #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */
 
 #endif /* _TRACE_XFS_SCRUB_TRACE_H */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 766cbb8b7be51..ad9162b023ba2 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -892,7 +892,7 @@ xfs_init_new_inode(
  * link count to go to zero, move the inode to AGI unlinked list so that it can
  * be freed when the last active reference goes away via xfs_inactive().
  */
-static int			/* error */
+int
 xfs_droplink(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*ip)
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 04a91e312993b..9fd4d29a57137 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -626,6 +626,7 @@ void xfs_end_io(struct work_struct *work);
 int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
 void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
 void xfs_iunlock2_remapping(struct xfs_inode *ip1, struct xfs_inode *ip2);
+int xfs_droplink(struct xfs_trans *tp, struct xfs_inode *ip);
 void xfs_bumplink(struct xfs_trans *tp, struct xfs_inode *ip);
 void xfs_lock_inodes(struct xfs_inode **ips, int inodes, uint lock_mode);
 void xfs_sort_inodes(struct xfs_inode **i_tab, unsigned int num_inodes);



* [PATCH 1/3] xfs: reduce the rate of cond_resched calls inside scrub
  2024-04-10  0:46 ` [PATCHSET v13.1 9/9] xfs: vectorize scrub kernel calls Darrick J. Wong
@ 2024-04-10  1:08   ` Darrick J. Wong
  2024-04-10 14:55     ` Christoph Hellwig
  2024-04-10  1:08   ` [PATCH 2/3] xfs: introduce vectored scrub mode Darrick J. Wong
  2024-04-10  1:09   ` [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle Darrick J. Wong
  2 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:08 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

We really don't want to call cond_resched every single time we go
through a loop in scrub -- there may be billions of records, and probing
into the scheduler itself has overhead.  Reduce this overhead by only
calling cond_resched 10x per second; and add a counter so that we only
check jiffies once every 100 records or so.

Surprisingly, this reduces scrub-only fstests runtime by about 2%.  I
used the bmapinflate xfs_db command to produce a billion-extent file and
this stupid gadget reduced the scrub runtime by about 4%.

From a stupid microbenchmark of calling these things 1 billion times, I
estimate that cond_resched costs about 5.5ns per call; jiffies costs
about 0.3ns per read; and fatal_signal_pending costs about 0.4ns per
call.
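
For readers who want to poke at the amortization scheme outside the
kernel, here is a rough userspace sketch of the same idea.  Everything in
it is made up for illustration: fake_jiffies stands in for the kernel's
jiffies counter, nr_yields counts the places where the kernel code would
call cond_resched, and the relax_* names are hypothetical.  It does not
model the fatal signal check or jiffies wraparound.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's jiffies; the caller advances it. */
static unsigned long fake_jiffies;
/* Counts how many times the sketch would have yielded to the scheduler. */
static unsigned long nr_yields;

#define FAKE_HZ		1000
#define RELAX_INTERVAL	(FAKE_HZ / 10)	/* yield at most 10x per second */
#define RELAX_BATCH	100		/* look at the clock every 100 calls */

struct relax_sketch {
	unsigned long	next_resched;	/* earliest tick for the next yield */
	unsigned int	resched_nr;	/* calls since the last clock check */
};

static void relax_init(struct relax_sketch *r)
{
	r->next_resched = fake_jiffies + RELAX_INTERVAL;
	r->resched_nr = 0;
}

/* Returns true if this particular call yielded. */
static bool relax_maybe_yield(struct relax_sketch *r)
{
	/* Cheap path: most calls only bump a counter. */
	if (++r->resched_nr < RELAX_BATCH)
		return false;
	r->resched_nr = 0;

	/* Slower path: read the clock and see if the interval has passed. */
	if (r->next_resched > fake_jiffies)
		return false;

	nr_yields++;	/* the kernel code calls cond_resched() here */
	r->next_resched = fake_jiffies + RELAX_INTERVAL;
	return true;
}
```

With these constants, a loop that calls relax_maybe_yield once per record
reads the clock on every 100th record and yields only if at least a tenth
of a second (in fake ticks) has gone by since the last yield.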

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/common.h  |   25 -------------------
 fs/xfs/scrub/scrub.c   |    1 +
 fs/xfs/scrub/scrub.h   |   64 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/xfarray.c |   10 ++++----
 fs/xfs/scrub/xfarray.h |    3 ++
 fs/xfs/scrub/xfile.c   |    2 +-
 6 files changed, 74 insertions(+), 31 deletions(-)


diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 39465e39dc5fd..3d5f1f6b4b7bf 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -6,31 +6,6 @@
 #ifndef __XFS_SCRUB_COMMON_H__
 #define __XFS_SCRUB_COMMON_H__
 
-/*
- * We /could/ terminate a scrub/repair operation early.  If we're not
- * in a good place to continue (fatal signal, etc.) then bail out.
- * Note that we're careful not to make any judgements about *error.
- */
-static inline bool
-xchk_should_terminate(
-	struct xfs_scrub	*sc,
-	int			*error)
-{
-	/*
-	 * If preemption is disabled, we need to yield to the scheduler every
-	 * few seconds so that we don't run afoul of the soft lockup watchdog
-	 * or RCU stall detector.
-	 */
-	cond_resched();
-
-	if (fatal_signal_pending(current)) {
-		if (*error == 0)
-			*error = -EINTR;
-		return true;
-	}
-	return false;
-}
-
 int xchk_trans_alloc(struct xfs_scrub *sc, uint resblks);
 int xchk_trans_alloc_empty(struct xfs_scrub *sc);
 void xchk_trans_cancel(struct xfs_scrub *sc);
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index e813b66b603a1..4a81f828f9f13 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -620,6 +620,7 @@ xfs_scrub_metadata(
 	sc->sm = sm;
 	sc->ops = &meta_scrub_ops[sm->sm_type];
 	sc->sick_mask = xchk_health_mask_for_scrub_type(sm->sm_type);
+	sc->relax = INIT_XCHK_RELAX;
 retry_op:
 	/*
 	 * When repairs are allowed, prevent freezing or readonly remount while
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 3910270471462..4e7e3edb6350c 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -8,6 +8,49 @@
 
 struct xfs_scrub;
 
+struct xchk_relax {
+	unsigned long	next_resched;
+	unsigned int	resched_nr;
+	bool		interruptible;
+};
+
+/* Yield to the scheduler at most 10x per second. */
+#define XCHK_RELAX_NEXT		(jiffies + (HZ / 10))
+
+#define INIT_XCHK_RELAX	\
+	(struct xchk_relax){ \
+		.next_resched	= XCHK_RELAX_NEXT, \
+		.resched_nr	= 0, \
+		.interruptible	= true, \
+	}
+
+/*
+ * Relax during a scrub operation and exit if there's a fatal signal pending.
+ *
+ * If preemption is disabled, we need to yield to the scheduler every now and
+ * then so that we don't run afoul of the soft lockup watchdog or RCU stall
+ * detector.  cond_resched calls are somewhat expensive (~5ns) so we want to
+ * ratelimit this to 10x per second.  Amortize the cost of the other checks by
+ * only doing it once every 100 calls.
+ */
+static inline int xchk_maybe_relax(struct xchk_relax *widget)
+{
+	/* Amortize the cost of scheduling and checking signals. */
+	if (likely(++widget->resched_nr < 100))
+		return 0;
+	widget->resched_nr = 0;
+
+	if (unlikely(widget->next_resched <= jiffies)) {
+		cond_resched();
+		widget->next_resched = XCHK_RELAX_NEXT;
+	}
+
+	if (widget->interruptible && fatal_signal_pending(current))
+		return -EINTR;
+
+	return 0;
+}
+
 /*
  * Standard flags for allocating memory within scrub.  NOFS context is
  * configured by the process allocation scope.  Scrub and repair must be able
@@ -123,6 +166,9 @@ struct xfs_scrub {
 	 */
 	unsigned int			sick_mask;
 
+	/* next time we want to cond_resched() */
+	struct xchk_relax		relax;
+
 	/* State tracking for single-AG operations. */
 	struct xchk_ag			sa;
 };
@@ -167,6 +213,24 @@ struct xfs_scrub_subord *xchk_scrub_create_subord(struct xfs_scrub *sc,
 		unsigned int subtype);
 void xchk_scrub_free_subord(struct xfs_scrub_subord *sub);
 
+/*
+ * We /could/ terminate a scrub/repair operation early.  If we're not
+ * in a good place to continue (fatal signal, etc.) then bail out.
+ * Note that we're careful not to make any judgements about *error.
+ */
+static inline bool
+xchk_should_terminate(
+	struct xfs_scrub	*sc,
+	int			*error)
+{
+	if (xchk_maybe_relax(&sc->relax)) {
+		if (*error == 0)
+			*error = -EINTR;
+		return true;
+	}
+	return false;
+}
+
 /* Metadata scrubbers */
 int xchk_tester(struct xfs_scrub *sc);
 int xchk_superblock(struct xfs_scrub *sc);
diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index b65cd3fc5ac9b..9185ae7088d49 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -7,9 +7,9 @@
 #include "xfs_fs.h"
 #include "xfs_shared.h"
 #include "xfs_format.h"
+#include "scrub/scrub.h"
 #include "scrub/xfile.h"
 #include "scrub/xfarray.h"
-#include "scrub/scrub.h"
 #include "scrub/trace.h"
 
 /*
@@ -486,6 +486,9 @@ xfarray_sortinfo_alloc(
 
 	xfarray_sortinfo_lo(si)[0] = 0;
 	xfarray_sortinfo_hi(si)[0] = array->nr - 1;
+	si->relax = INIT_XCHK_RELAX;
+	if (flags & XFARRAY_SORT_KILLABLE)
+		si->relax.interruptible = false;
 
 	trace_xfarray_sort(si, nr_bytes);
 	*infop = si;
@@ -503,10 +506,7 @@ xfarray_sort_terminated(
 	 * few seconds so that we don't run afoul of the soft lockup watchdog
 	 * or RCU stall detector.
 	 */
-	cond_resched();
-
-	if ((si->flags & XFARRAY_SORT_KILLABLE) &&
-	    fatal_signal_pending(current)) {
+	if (xchk_maybe_relax(&si->relax)) {
 		if (*error == 0)
 			*error = -EINTR;
 		return true;
diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h
index 8f54c8fc888fa..5eeeeed13ae24 100644
--- a/fs/xfs/scrub/xfarray.h
+++ b/fs/xfs/scrub/xfarray.h
@@ -127,6 +127,9 @@ struct xfarray_sortinfo {
 	/* XFARRAY_SORT_* flags; see below. */
 	unsigned int		flags;
 
+	/* next time we want to cond_resched() */
+	struct xchk_relax	relax;
+
 	/* Cache a folio here for faster scanning for pivots */
 	struct folio		*folio;
 
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 4e254a0ba0036..d848222f802ba 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -10,9 +10,9 @@
 #include "xfs_log_format.h"
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
+#include "scrub/scrub.h"
 #include "scrub/xfile.h"
 #include "scrub/xfarray.h"
-#include "scrub/scrub.h"
 #include "scrub/trace.h"
 #include <linux/shmem_fs.h>
 



* [PATCH 2/3] xfs: introduce vectored scrub mode
  2024-04-10  0:46 ` [PATCHSET v13.1 9/9] xfs: vectorize scrub kernel calls Darrick J. Wong
  2024-04-10  1:08   ` [PATCH 1/3] xfs: reduce the rate of cond_resched calls inside scrub Darrick J. Wong
@ 2024-04-10  1:08   ` Darrick J. Wong
  2024-04-10 15:00     ` Christoph Hellwig
  2024-04-10  1:09   ` [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle Darrick J. Wong
  2 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:08 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Introduce a variant on XFS_SCRUB_METADATA that allows for a vectored
mode.  The caller specifies the principal metadata object that they want
to scrub (allocation group, inode, etc.) once, followed by an array of
scrub types they want called on that object.  The kernel runs the scrub
operations and writes the output flags and errno code to the
corresponding array element.
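
To make the calling convention a little more concrete, here is a rough
userspace sketch of how a caller might size and fill such a request.  The
*_sketch structs only mirror the layout proposed in this patch and are
not the real uapi definitions; scrubv_sizeof_sketch and
scrubv_alloc_sketch are hypothetical helpers, and the scrub type codes
are whatever the caller passes in.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors struct xfs_scrub_vec from this patch; illustration only. */
struct scrub_vec_sketch {
	uint32_t	sv_type;	/* scrub type code */
	uint32_t	sv_flags;	/* in/out flags */
	int32_t		sv_ret;		/* 0 or a negative error code */
	uint32_t	sv_reserved;	/* must be zero */
};

/* Mirrors struct xfs_scrub_vec_head from this patch; illustration only. */
struct scrub_vec_head_sketch {
	uint64_t	svh_ino;
	uint32_t	svh_gen;
	uint32_t	svh_agno;
	uint32_t	svh_flags;
	uint16_t	svh_rest_us;
	uint16_t	svh_nr;
	uint64_t	svh_reserved;
	struct scrub_vec_sketch	svh_vecs[];
};

/* Bytes needed for a head followed by nr vectors. */
static size_t scrubv_sizeof_sketch(unsigned int nr)
{
	return sizeof(struct scrub_vec_head_sketch) +
		nr * sizeof(struct scrub_vec_sketch);
}

/*
 * Allocate a zeroed request naming one inode and nr scrub types.  calloc
 * keeps every reserved field at zero.  Caller frees with free().
 */
static struct scrub_vec_head_sketch *
scrubv_alloc_sketch(uint64_t ino, uint32_t gen, const uint32_t *types,
		    uint16_t nr)
{
	struct scrub_vec_head_sketch	*head;
	uint16_t			i;

	head = calloc(1, scrubv_sizeof_sketch(nr));
	if (!head)
		return NULL;

	head->svh_ino = ino;
	head->svh_gen = gen;
	head->svh_nr = nr;
	for (i = 0; i < nr; i++)
		head->svh_vecs[i].sv_type = types[i];
	return head;
}
```

Note that the ioctl handler in this patch rejects requests whose total
size exceeds PAGE_SIZE with -ENOMEM, so a real caller would have to keep
the vector count small enough to fit.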

A new pseudo scrub type BARRIER is introduced to force the kernel to
return to userspace if any corruptions have been found when scrubbing
the previous scrub types in the array.  This enables userspace to
schedule, for example, the sequence:

 1. data fork
 2. barrier
 3. directory

If the data fork scrub is clean, then the kernel will perform the
directory scrub.  If not, the barrier in step 2 will set its return value
to -ECANCELED and return to userspace without running the directory scrub.
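
The barrier behaviour can be modelled in a few lines of userspace C.  This
is only a sketch of the control flow in this patch, not the kernel code:
the vec_sketch struct, the SKETCH_* constants, run_vectors, and the
fake_scrub stub (which pretends that type 1 is corrupt) are all invented
for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants standing in for the real type and flag codes. */
#define SKETCH_TYPE_BARRIER	(UINT32_MAX)
#define SKETCH_OFLAG_CORRUPT	(1u << 1)

struct vec_sketch {
	uint32_t	type;
	uint32_t	flags;
	int32_t		ret;
};

typedef int (*scrub_fn)(uint32_t type, uint32_t *flags);

/* Demo stub: pretends that type 1 is corrupt and everything else is clean. */
static int fake_scrub(uint32_t type, uint32_t *flags)
{
	*flags = (type == 1) ? SKETCH_OFLAG_CORRUPT : 0;
	return 0;
}

/* Did any vector before the barrier fail in a way the barrier cares about? */
static bool previous_failures(const struct vec_sketch *vecs,
			      const struct vec_sketch *barrier)
{
	const struct vec_sketch	*v;

	for (v = vecs; v < barrier; v++) {
		if (v->type == SKETCH_TYPE_BARRIER)
			continue;
		/* Runtime errors count, except the "please retry" ones. */
		if (v->ret && v->ret != -EBUSY && v->ret != -ENOENT &&
		    v->ret != -EUSERS)
			return true;
		/* Any out-flag matching the barrier's mask also counts. */
		if (v->flags & barrier->flags)
			return true;
	}
	return false;
}

/* Returns the index at which processing stopped, or nr if all items ran. */
static unsigned int run_vectors(struct vec_sketch *vecs, unsigned int nr,
				scrub_fn fn)
{
	unsigned int	i;

	for (i = 0; i < nr; i++) {
		if (vecs[i].type == SKETCH_TYPE_BARRIER) {
			if (previous_failures(vecs, &vecs[i])) {
				vecs[i].ret = -ECANCELED;
				break;
			}
			continue;
		}
		vecs[i].ret = fn(vecs[i].type, &vecs[i].flags);
	}
	return i;
}
```

Feeding this the three-item sequence above with fake_scrub stops at the
barrier whenever the first item reports corruption, and runs the third
item otherwise.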

When running fstests in "rebuild all metadata after each test" mode, I
observed a 10% reduction in runtime due to fewer transitions across the
system call boundary.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_fs.h   |   40 +++++++++++++++++
 fs/xfs/scrub/scrub.c     |  106 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.h     |   78 +++++++++++++++++++++++++++++++++-
 fs/xfs/scrub/xfs_scrub.h |    2 +
 fs/xfs/xfs_ioctl.c       |   50 ++++++++++++++++++++++
 5 files changed, 275 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index f2d2c1db18e53..9dba51d29ecd0 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -726,6 +726,15 @@ struct xfs_scrub_metadata {
 /* Number of scrub subcommands. */
 #define XFS_SCRUB_TYPE_NR	29
 
+/*
+ * This special type code only applies to the vectored scrub implementation.
+ *
+ * If any of the previous scrub vectors recorded runtime errors or have
+ * sv_flags bits set that match the OFLAG bits in the barrier vector's
+ * sv_flags, set the barrier's sv_ret to -ECANCELED and return to userspace.
+ */
+#define XFS_SCRUB_TYPE_BARRIER	(-1U)
+
 /* i: Repair this metadata. */
 #define XFS_SCRUB_IFLAG_REPAIR		(1u << 0)
 
@@ -770,6 +779,36 @@ struct xfs_scrub_metadata {
 				 XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED)
 #define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
 
+/* Vectored scrub calls to reduce the number of kernel transitions. */
+
+struct xfs_scrub_vec {
+	__u32 sv_type;		/* XFS_SCRUB_TYPE_* */
+	__u32 sv_flags;		/* XFS_SCRUB_FLAGS_* */
+	__s32 sv_ret;		/* 0 or a negative error code */
+	__u32 sv_reserved;	/* must be zero */
+};
+
+/* Vectored metadata scrub control structure. */
+struct xfs_scrub_vec_head {
+	__u64 svh_ino;		/* inode number. */
+	__u32 svh_gen;		/* inode generation. */
+	__u32 svh_agno;		/* ag number. */
+	__u32 svh_flags;	/* XFS_SCRUB_VEC_FLAGS_* */
+	__u16 svh_rest_us;	/* wait this much time between vector items */
+	__u16 svh_nr;		/* number of svh_vecs */
+	__u64 svh_reserved;	/* must be zero */
+
+	struct xfs_scrub_vec svh_vecs[];
+};
+
+#define XFS_SCRUB_VEC_FLAGS_ALL		(0)
+
+static inline size_t sizeof_xfs_scrub_vec(unsigned int nr)
+{
+	return sizeof(struct xfs_scrub_vec_head) +
+		nr * sizeof(struct xfs_scrub_vec);
+}
+
 /*
  * ioctl limits
  */
@@ -949,6 +988,7 @@ struct xfs_getparents_by_handle {
 #define XFS_IOC_FREE_EOFBLOCKS	_IOR ('X', 58, struct xfs_fs_eofblocks)
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
 #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
+#define XFS_IOC_SCRUBV_METADATA	_IOWR('X', 60, struct xfs_scrub_vec_head)
 #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
 #define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_getparents)
 #define XFS_IOC_GETPARENTS_BY_HANDLE _IOWR('X', 63, struct xfs_getparents_by_handle)
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 4a81f828f9f13..cab34823f3c91 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -21,6 +21,7 @@
 #include "xfs_exchmaps.h"
 #include "xfs_dir2.h"
 #include "xfs_parent.h"
+#include "xfs_icache.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/trace.h"
@@ -724,3 +725,108 @@ xfs_scrub_metadata(
 	run.retries++;
 	goto retry_op;
 }
+
+/* Decide if there have been any scrub failures up to this point. */
+static inline bool
+xfs_scrubv_previous_failures(
+	struct xfs_mount		*mp,
+	struct xfs_scrub_vec_head	*vhead,
+	struct xfs_scrub_vec		*barrier_vec)
+{
+	struct xfs_scrub_vec		*v;
+	__u32				failmask;
+
+	failmask = barrier_vec->sv_flags & XFS_SCRUB_FLAGS_OUT;
+
+	for (v = vhead->svh_vecs; v < barrier_vec; v++) {
+		if (v->sv_type == XFS_SCRUB_TYPE_BARRIER)
+			continue;
+
+		/*
+		 * Runtime errors count as a previous failure, except the ones
+		 * used to ask userspace to retry.
+		 */
+		if (v->sv_ret && v->sv_ret != -EBUSY && v->sv_ret != -ENOENT &&
+		    v->sv_ret != -EUSERS)
+			return true;
+
+		/*
+		 * If any of the out-flags on the scrub vector match the mask
+		 * that was set on the barrier vector, that's a previous fail.
+		 */
+		if (v->sv_flags & failmask)
+			return true;
+	}
+
+	return false;
+}
+
+/* Vectored scrub implementation to reduce ioctl calls. */
+int
+xfs_scrubv_metadata(
+	struct file			*file,
+	struct xfs_scrub_vec_head	*vhead)
+{
+	struct xfs_inode		*ip_in = XFS_I(file_inode(file));
+	struct xfs_mount		*mp = ip_in->i_mount;
+	struct xfs_scrub_vec		*v;
+	unsigned int			i;
+	int				error = 0;
+
+	BUILD_BUG_ON(sizeof(struct xfs_scrub_vec_head) ==
+		     sizeof(struct xfs_scrub_metadata));
+	BUILD_BUG_ON(XFS_IOC_SCRUB_METADATA == XFS_IOC_SCRUBV_METADATA);
+
+	trace_xchk_scrubv_start(ip_in, vhead);
+
+	if (vhead->svh_flags & ~XFS_SCRUB_VEC_FLAGS_ALL)
+		return -EINVAL;
+	for (i = 0, v = vhead->svh_vecs; i < vhead->svh_nr; i++, v++) {
+		if (v->sv_reserved)
+			return -EINVAL;
+		if (v->sv_type == XFS_SCRUB_TYPE_BARRIER &&
+		    (v->sv_flags & ~XFS_SCRUB_FLAGS_OUT))
+			return -EINVAL;
+
+		trace_xchk_scrubv_item(mp, vhead, v);
+	}
+
+	/* Run all the scrubbers. */
+	for (i = 0, v = vhead->svh_vecs; i < vhead->svh_nr; i++, v++) {
+		struct xfs_scrub_metadata	sm = {
+			.sm_type	= v->sv_type,
+			.sm_flags	= v->sv_flags,
+			.sm_ino		= vhead->svh_ino,
+			.sm_gen		= vhead->svh_gen,
+			.sm_agno	= vhead->svh_agno,
+		};
+
+		if (v->sv_type == XFS_SCRUB_TYPE_BARRIER) {
+			if (xfs_scrubv_previous_failures(mp, vhead, v)) {
+				v->sv_ret = -ECANCELED;
+				trace_xchk_scrubv_barrier_fail(mp, vhead, v);
+				break;
+			}
+
+			continue;
+		}
+
+		v->sv_ret = xfs_scrub_metadata(file, &sm);
+		v->sv_flags = sm.sm_flags;
+
+		if (vhead->svh_rest_us) {
+			ktime_t		expires;
+
+			expires = ktime_add_ns(ktime_get(),
+					vhead->svh_rest_us * 1000);
+			set_current_state(TASK_KILLABLE);
+			schedule_hrtimeout(&expires, HRTIMER_MODE_ABS);
+		}
+		if (fatal_signal_pending(current)) {
+			error = -EINTR;
+			break;
+		}
+	}
+
+	return error;
+}
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index b3756722bee1d..17557e635e68c 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -69,6 +69,7 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_QUOTACHECK);
 TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_NLINKS);
 TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_HEALTHY);
 TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_DIRTREE);
+TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_BARRIER);
 
 #define XFS_SCRUB_TYPE_STRINGS \
 	{ XFS_SCRUB_TYPE_PROBE,		"probe" }, \
@@ -99,7 +100,8 @@ TRACE_DEFINE_ENUM(XFS_SCRUB_TYPE_DIRTREE);
 	{ XFS_SCRUB_TYPE_QUOTACHECK,	"quotacheck" }, \
 	{ XFS_SCRUB_TYPE_NLINKS,	"nlinks" }, \
 	{ XFS_SCRUB_TYPE_HEALTHY,	"healthy" }, \
-	{ XFS_SCRUB_TYPE_DIRTREE,	"dirtree" }
+	{ XFS_SCRUB_TYPE_DIRTREE,	"dirtree" }, \
+	{ XFS_SCRUB_TYPE_BARRIER,	"barrier" }
 
 #define XFS_SCRUB_FLAG_STRINGS \
 	{ XFS_SCRUB_IFLAG_REPAIR,		"repair" }, \
@@ -208,6 +210,80 @@ DEFINE_EVENT(xchk_fsgate_class, name, \
 DEFINE_SCRUB_FSHOOK_EVENT(xchk_fsgates_enable);
 DEFINE_SCRUB_FSHOOK_EVENT(xchk_fsgates_disable);
 
+DECLARE_EVENT_CLASS(xchk_vector_head_class,
+	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_vec_head *vhead),
+	TP_ARGS(ip, vhead),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_ino_t, inum)
+		__field(unsigned int, gen)
+		__field(unsigned int, flags)
+		__field(unsigned short, rest_us)
+		__field(unsigned short, nr_vecs)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->agno = vhead->svh_agno;
+		__entry->inum = vhead->svh_ino;
+		__entry->gen = vhead->svh_gen;
+		__entry->flags = vhead->svh_flags;
+		__entry->rest_us = vhead->svh_rest_us;
+		__entry->nr_vecs = vhead->svh_nr;
+	),
+	TP_printk("dev %d:%d ino 0x%llx agno 0x%x inum 0x%llx gen 0x%x flags 0x%x rest_us %u nr_vecs %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->agno,
+		  __entry->inum,
+		  __entry->gen,
+		  __entry->flags,
+		  __entry->rest_us,
+		  __entry->nr_vecs)
+)
+#define DEFINE_SCRUBV_HEAD_EVENT(name) \
+DEFINE_EVENT(xchk_vector_head_class, name, \
+	TP_PROTO(struct xfs_inode *ip, struct xfs_scrub_vec_head *vhead), \
+	TP_ARGS(ip, vhead))
+
+DEFINE_SCRUBV_HEAD_EVENT(xchk_scrubv_start);
+
+DECLARE_EVENT_CLASS(xchk_vector_class,
+	TP_PROTO(struct xfs_mount *mp, struct xfs_scrub_vec_head *vhead,
+		 struct xfs_scrub_vec *v),
+	TP_ARGS(mp, vhead, v),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned int, vec_nr)
+		__field(unsigned int, vec_type)
+		__field(unsigned int, vec_flags)
+		__field(int, vec_ret)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->vec_nr = v - vhead->svh_vecs;
+		__entry->vec_type = v->sv_type;
+		__entry->vec_flags = v->sv_flags;
+		__entry->vec_ret = v->sv_ret;
+	),
+	TP_printk("dev %d:%d vec[%u] type %s flags %s ret %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->vec_nr,
+		  __print_symbolic(__entry->vec_type, XFS_SCRUB_TYPE_STRINGS),
+		  __print_flags(__entry->vec_flags, "|", XFS_SCRUB_FLAG_STRINGS),
+		  __entry->vec_ret)
+)
+#define DEFINE_SCRUBV_EVENT(name) \
+DEFINE_EVENT(xchk_vector_class, name, \
+	TP_PROTO(struct xfs_mount *mp, struct xfs_scrub_vec_head *vhead, \
+		 struct xfs_scrub_vec *v), \
+	TP_ARGS(mp, vhead, v))
+
+DEFINE_SCRUBV_EVENT(xchk_scrubv_barrier_fail);
+DEFINE_SCRUBV_EVENT(xchk_scrubv_item);
+
 TRACE_EVENT(xchk_op_error,
 	TP_PROTO(struct xfs_scrub *sc, xfs_agnumber_t agno,
 		 xfs_agblock_t bno, int error, void *ret_ip),
diff --git a/fs/xfs/scrub/xfs_scrub.h b/fs/xfs/scrub/xfs_scrub.h
index a39befa743ce0..61d010f19f003 100644
--- a/fs/xfs/scrub/xfs_scrub.h
+++ b/fs/xfs/scrub/xfs_scrub.h
@@ -8,8 +8,10 @@
 
 #ifndef CONFIG_XFS_ONLINE_SCRUB
 # define xfs_scrub_metadata(file, sm)	(-ENOTTY)
+# define xfs_scrubv_metadata(file, vhead)	(-ENOTTY)
 #else
 int xfs_scrub_metadata(struct file *file, struct xfs_scrub_metadata *sm);
+int xfs_scrubv_metadata(struct file *file, struct xfs_scrub_vec_head *vhead);
 #endif /* CONFIG_XFS_ONLINE_SCRUB */
 
 #endif	/* __XFS_SCRUB_H__ */
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index c7a15b5f33aa4..4bc74274a8af7 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1241,6 +1241,54 @@ xfs_ioc_setlabel(
 	return error;
 }
 
+STATIC int
+xfs_ioc_scrubv_metadata(
+	struct file			*filp,
+	void				__user *arg)
+{
+	struct xfs_scrub_vec_head	__user *uhead = arg;
+	struct xfs_scrub_vec_head	head;
+	struct xfs_scrub_vec_head	*vhead;
+	size_t				bytes;
+	int				error;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	if (copy_from_user(&head, uhead, sizeof(head)))
+		return -EFAULT;
+
+	if (head.svh_reserved)
+		return -EINVAL;
+
+	bytes = sizeof_xfs_scrub_vec(head.svh_nr);
+	if (bytes > PAGE_SIZE)
+		return -ENOMEM;
+	vhead = kvmalloc(bytes, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
+	if (!vhead)
+		return -ENOMEM;
+	memcpy(vhead, &head, sizeof(struct xfs_scrub_vec_head));
+
+	if (copy_from_user(&vhead->svh_vecs, &uhead->svh_vecs,
+				head.svh_nr * sizeof(struct xfs_scrub_vec))) {
+		error = -EFAULT;
+		goto err_free;
+	}
+
+	error = xfs_scrubv_metadata(filp, vhead);
+	if (error)
+		goto err_free;
+
+	if (copy_to_user(uhead, vhead, bytes)) {
+		error = -EFAULT;
+		goto err_free;
+	}
+
+err_free:
+	kvfree(vhead);
+	return error;
+}
+
 static inline int
 xfs_fs_eofblocks_from_user(
 	struct xfs_fs_eofblocks		*src,
@@ -1555,6 +1603,8 @@ xfs_file_ioctl(
 	case FS_IOC_GETFSMAP:
 		return xfs_ioc_getfsmap(ip, arg);
 
+	case XFS_IOC_SCRUBV_METADATA:
+		return xfs_ioc_scrubv_metadata(filp, arg);
 	case XFS_IOC_SCRUB_METADATA:
 		return xfs_ioc_scrub_metadata(filp, arg);
 

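The ioctl handler in the patch above copies in the fixed-size header first,
uses the element count in it to size the whole request, and refuses anything
larger than a page before allocating.  A minimal userspace sketch of that size
check follows.  The demo_vec and demo_vec_head layouts, the 4096-byte limit
standing in for PAGE_SIZE, and all demo_* names are assumptions made for
illustration; they are not the real xfs_scrub_vec_head definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the scrub vector structures. */
struct demo_vec {
	uint32_t	type;
	uint32_t	flags;
	int32_t		ret;
	uint32_t	reserved;
};

struct demo_vec_head {
	uint64_t	ino;
	uint32_t	gen;
	uint16_t	nr;		/* number of entries in vecs[] */
	uint16_t	reserved;	/* must be zero */
	struct demo_vec	vecs[];		/* flexible array member */
};

/* Assumed request size limit, standing in for PAGE_SIZE. */
#define DEMO_MAX_BYTES	4096u

/* Bytes needed for a header followed by @nr vectors. */
static size_t demo_sizeof_vec(unsigned int nr)
{
	return sizeof(struct demo_vec_head) +
	       (size_t)nr * sizeof(struct demo_vec);
}

/*
 * Validate a header that has already been copied in.  On success, store
 * the full request size in @bytes so the caller can allocate and copy
 * the vectors.
 */
static int demo_check_request(const struct demo_vec_head *head, size_t *bytes)
{
	size_t	need;

	if (head->reserved)
		return -EINVAL;

	need = demo_sizeof_vec(head->nr);
	if (need > DEMO_MAX_BYTES)
		return -ENOMEM;

	*bytes = need;
	return 0;
}
```

Bounding the size before allocating means a caller cannot make the kernel
allocate an arbitrarily large buffer simply by passing a large count.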

^ permalink raw reply related	[flat|nested] 234+ messages in thread

* [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle
  2024-04-10  0:46 ` [PATCHSET v13.1 9/9] xfs: vectorize scrub kernel calls Darrick J. Wong
  2024-04-10  1:08   ` [PATCH 1/3] xfs: reduce the rate of cond_resched calls inside scrub Darrick J. Wong
  2024-04-10  1:08   ` [PATCH 2/3] xfs: introduce vectored scrub mode Darrick J. Wong
@ 2024-04-10  1:09   ` Darrick J. Wong
  2024-04-10 15:12     ` Christoph Hellwig
  2 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10  1:09 UTC (permalink / raw)
  To: djwong; +Cc: hch, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If a program wants us to perform a scrub on a file handle and the fd
passed to ioctl() is not the file referenced in the handle, iget the
file once and pass it into the scrub code.  This amortizes the untrusted
iget lookup over /all/ the scrubbers mentioned in the scrubv call.

When running fstests in "rebuild all metadata after each test" mode, I
observed a 10% reduction in runtime on account of avoiding repeated
inobt lookups.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/scrub.c |   61 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 61 insertions(+)


diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index cab34823f3c91..f1a17f986b6cf 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -761,6 +761,31 @@ xfs_scrubv_previous_failures(
 	return false;
 }
 
+/*
+ * If the caller provided us with a nonzero inode number that isn't the ioctl
+ * file, try to grab a reference to it to eliminate all further untrusted inode
+ * lookups.  If we can't get the inode, let each scrub function try again.
+ */
+STATIC struct xfs_inode *
+xchk_scrubv_open_by_handle(
+	struct xfs_mount		*mp,
+	const struct xfs_scrub_vec_head	*vhead)
+{
+	struct xfs_inode		*ip;
+	int				error;
+
+	error = xfs_iget(mp, NULL, vhead->svh_ino, XFS_IGET_UNTRUSTED, 0, &ip);
+	if (error)
+		return NULL;
+
+	if (VFS_I(ip)->i_generation != vhead->svh_gen) {
+		xfs_irele(ip);
+		return NULL;
+	}
+
+	return ip;
+}
+
 /* Vectored scrub implementation to reduce ioctl calls. */
 int
 xfs_scrubv_metadata(
@@ -769,7 +794,9 @@ xfs_scrubv_metadata(
 {
 	struct xfs_inode		*ip_in = XFS_I(file_inode(file));
 	struct xfs_mount		*mp = ip_in->i_mount;
+	struct xfs_inode		*handle_ip = NULL;
 	struct xfs_scrub_vec		*v;
+	bool				set_dontcache = false;
 	unsigned int			i;
 	int				error = 0;
 
@@ -788,9 +815,28 @@ xfs_scrubv_metadata(
 		    (v->sv_flags & ~XFS_SCRUB_FLAGS_OUT))
 			return -EINVAL;
 
+		/*
+		 * If we detect at least one inode-type scrub, we might
+		 * consider setting dontcache at the end.
+		 */
+		if (v->sv_type < XFS_SCRUB_TYPE_NR &&
+		    meta_scrub_ops[v->sv_type].type == ST_INODE)
+			set_dontcache = true;
+
 		trace_xchk_scrubv_item(mp, vhead, v);
 	}
 
+	/*
+	 * If the caller wants us to do a scrub-by-handle and the file used to
+	 * call the ioctl is not the same file, load the incore inode and pin
+	 * it across all the scrubv actions to avoid repeated UNTRUSTED
+	 * lookups.  The reference is not passed to deeper layers of scrub
+	 * because each scrubber gets to decide its own strategy for getting an
+	 * inode.
+	 */
+	if (vhead->svh_ino && vhead->svh_ino != ip_in->i_ino)
+		handle_ip = xchk_scrubv_open_by_handle(mp, vhead);
+
 	/* Run all the scrubbers. */
 	for (i = 0, v = vhead->svh_vecs; i < vhead->svh_nr; i++, v++) {
 		struct xfs_scrub_metadata	sm = {
@@ -814,6 +860,10 @@ xfs_scrubv_metadata(
 		v->sv_ret = xfs_scrub_metadata(file, &sm);
 		v->sv_flags = sm.sm_flags;
 
+		/* Leave the inode in memory if something's wrong with it. */
+		if (xchk_needs_repair(&sm))
+			set_dontcache = false;
+
 		if (vhead->svh_rest_us) {
 			ktime_t		expires;
 
@@ -828,5 +878,16 @@ xfs_scrubv_metadata(
 		}
 	}
 
+	/*
+	 * If we're holding the only reference to an inode opened via handle
+	 * and the scan was clean, mark it dontcache so that we don't pollute
+	 * the cache.
+	 */
+	if (handle_ip) {
+		if (set_dontcache &&
+		    atomic_read(&VFS_I(handle_ip)->i_count) == 1)
+			d_mark_dontcache(VFS_I(handle_ip));
+		xfs_irele(handle_ip);
+	}
 	return error;
 }

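The amortization described in the commit message can be modelled outside the
kernel.  In the sketch below, each checker still takes and drops its own
reference, but holding one extra reference across the whole batch keeps the
object cached, so only the first lookup takes the slow path.  This is a
deliberately simplified model: demo_inode, demo_iget, demo_irele and the
"evicted as soon as the last reference is dropped" behaviour are assumptions
for illustration, not the XFS inode cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory inode; "cached" while anyone holds a reference. */
struct demo_inode {
	unsigned long	ino;
	int		refcount;
};

/* Counts how often a lookup had to take the slow (uncached) path. */
static unsigned int demo_slow_lookups;

static struct demo_inode *demo_iget(struct demo_inode *ip)
{
	if (ip->refcount == 0)
		demo_slow_lookups++;	/* cache miss: expensive validation */
	ip->refcount++;
	return ip;
}

static void demo_irele(struct demo_inode *ip)
{
	assert(ip->refcount > 0);
	ip->refcount--;
}

/* Each checker grabs and releases its own reference. */
static void demo_check_one(struct demo_inode *ip)
{
	struct demo_inode	*ref = demo_iget(ip);

	/* ... examine the inode here ... */
	demo_irele(ref);
}

/* Run @nr checkers, optionally pinning the inode across all of them. */
static void demo_check_many(struct demo_inode *ip, unsigned int nr, int pin)
{
	struct demo_inode	*pinned = pin ? demo_iget(ip) : NULL;
	unsigned int		i;

	for (i = 0; i < nr; i++)
		demo_check_one(ip);

	if (pinned)
		demo_irele(pinned);
}
```

With pinning, a batch of N checkers costs one slow lookup in this model
instead of N, which is the effect the patch is after.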

^ permalink raw reply related	[flat|nested] 234+ messages in thread

* Re: [PATCH 1/4] docs: update the parent pointers documentation to the final version
  2024-04-10  0:46   ` [PATCH 1/4] docs: update the parent pointers documentation to the final version Darrick J. Wong
@ 2024-04-10  4:40     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 2/4] docs: update online directory and parent pointer repair sections
  2024-04-10  0:46   ` [PATCH 2/4] docs: update online directory and parent pointer repair sections Darrick J. Wong
@ 2024-04-10  4:40     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/4] docs: update offline parent pointer repair strategy
  2024-04-10  0:47   ` [PATCH 3/4] docs: update offline parent pointer repair strategy Darrick J. Wong
@ 2024-04-10  4:40     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 4/4] docs: describe xfs directory tree online fsck
  2024-04-10  0:47   ` [PATCH 4/4] docs: describe xfs directory tree online fsck Darrick J. Wong
@ 2024-04-10  4:40     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:40 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 1/7] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5
  2024-04-10  0:47   ` [PATCH 1/7] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5 Darrick J. Wong
@ 2024-04-10  4:41     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, Catherine Hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 2/7] xfs: Increase XFS_QM_TRANS_MAXDQS to 5
  2024-04-10  0:48   ` [PATCH 2/7] xfs: Increase XFS_QM_TRANS_MAXDQS " Darrick J. Wong
@ 2024-04-10  4:41     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/7] xfs: Hold inode locks in xfs_ialloc
  2024-04-10  0:48   ` [PATCH 3/7] xfs: Hold inode locks in xfs_ialloc Darrick J. Wong
@ 2024-04-10  4:41     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, Catherine Hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 4/7] xfs: Hold inode locks in xfs_trans_alloc_dir
  2024-04-10  0:48   ` [PATCH 4/7] xfs: Hold inode locks in xfs_trans_alloc_dir Darrick J. Wong
@ 2024-04-10  4:41     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:41 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, Catherine Hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 5/7] xfs: Hold inode locks in xfs_rename
  2024-04-10  0:48   ` [PATCH 5/7] xfs: Hold inode locks in xfs_rename Darrick J. Wong
@ 2024-04-10  4:42     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, Catherine Hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 6/7] xfs: don't pick up IOLOCK during rmapbt repair scan
  2024-04-10  0:49   ` [PATCH 6/7] xfs: don't pick up IOLOCK during rmapbt repair scan Darrick J. Wong
@ 2024-04-10  4:42     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 7/7] xfs: unlock new repair tempfiles after creation
  2024-04-10  0:49   ` [PATCH 7/7] xfs: unlock new repair tempfiles after creation Darrick J. Wong
@ 2024-04-10  4:42     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:42 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 1/4] xfs: remove XFS_DA_OP_REMOVE
  2024-04-10  0:49   ` [PATCH 1/4] xfs: remove XFS_DA_OP_REMOVE Darrick J. Wong
@ 2024-04-10  4:43     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:43 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 2/4] xfs: remove XFS_DA_OP_NOTIME
  2024-04-10  0:49   ` [PATCH 2/4] xfs: remove XFS_DA_OP_NOTIME Darrick J. Wong
@ 2024-04-10  4:44     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  4:44 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/4] xfs: rename xfs_da_args.attr_flags
  2024-04-10  0:50   ` [PATCH 3/4] xfs: rename xfs_da_args.attr_flags Darrick J. Wong
@ 2024-04-10  5:01     ` Christoph Hellwig
  2024-04-10 20:55       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:01 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

On Tue, Apr 09, 2024 at 05:50:07PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This field only ever contains XATTR_{CREATE,REPLACE}, so let's change
> the name of the field to make the field and its values consistent.

So, these flags only get passed to xfs_attr_set through xfs_attr_change
and xfs_attr_setname, which means we should probably just pass them
directly as in my patch (against your whole stack) below.

Also I suspect we should audit all the internal callers to see whether
they should ever replace an existing attr, as I guess most don't.  (And
xfs_attr_change really should be folded into xfs_attr_set; the split is
confusing as hell.)
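
The calling convention suggested here, with the create/replace flags passed as
a function argument rather than stored in the args structure, can be sketched
in a few lines of standalone C.  Everything named demo_* is hypothetical; the
flag values mirror XATTR_CREATE and XATTR_REPLACE, and ENODATA stands in for
the kernel's ENOATTR.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define DEMO_XATTR_CREATE	0x1	/* fail if the attr already exists */
#define DEMO_XATTR_REPLACE	0x2	/* fail if the attr does not exist */

/* Hypothetical single-attribute store. */
struct demo_attr {
	bool	exists;
	int	value;
};

/*
 * Set @attr to @value.  The create/replace policy arrives as a plain
 * argument, so callers that never replace can simply pass 0 or
 * DEMO_XATTR_CREATE without touching a shared args structure.
 */
static int demo_attr_set(struct demo_attr *attr, int value,
			 unsigned int xattr_flags)
{
	if (attr->exists) {
		/* Pure create fails if the attr already exists. */
		if (xattr_flags & DEMO_XATTR_CREATE)
			return -EEXIST;
	} else {
		/* Pure replace fails if there is nothing to replace. */
		if (xattr_flags & DEMO_XATTR_REPLACE)
			return -ENODATA;
	}

	attr->exists = true;
	attr->value = value;
	return 0;
}
```

Passing 0 means "create or replace", which matches what the internal callers
in the diff below do when they pass 0.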

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index b98d2a908452a0..38d1f4d10baa3b 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -1034,7 +1034,8 @@ xfs_attr_ensure_iext(
  */
 int
 xfs_attr_set(
-	struct xfs_da_args	*args)
+	struct xfs_da_args	*args,
+	uint8_t			xattr_flags)
 {
 	struct xfs_inode	*dp = args->dp;
 	struct xfs_mount	*mp = dp->i_mount;
@@ -1109,7 +1110,7 @@ xfs_attr_set(
 		}
 
 		/* Pure create fails if the attr already exists */
-		if (args->xattr_flags & XATTR_CREATE)
+		if (xattr_flags & XATTR_CREATE)
 			goto out_trans_cancel;
 		xfs_attr_defer_add(args, XFS_ATTR_DEFER_REPLACE);
 		break;
@@ -1119,7 +1120,7 @@ xfs_attr_set(
 			goto out_trans_cancel;
 
 		/* Pure replace fails if no existing attr to replace. */
-		if (args->xattr_flags & XATTR_REPLACE)
+		if (xattr_flags & XATTR_REPLACE)
 			goto out_trans_cancel;
 		xfs_attr_defer_add(args, XFS_ATTR_DEFER_SET);
 		break;
@@ -1155,7 +1156,7 @@ xfs_attr_set(
  * Ensure that the xattr structure maps @args->name to @args->value.
  *
  * The caller must have initialized @args, attached dquots, and must not hold
- * any ILOCKs.  Only XATTR_CREATE may be specified in @args->xattr_flags.
+ * any ILOCKs.  Only XATTR_CREATE may be specified in @xattr_flags.
  * Reserved data blocks may be used if @rsvd is set.
  *
  * Returns -EEXIST if XATTR_CREATE was specified and the name already exists.
@@ -1163,6 +1164,7 @@ xfs_attr_set(
 int
 xfs_attr_setname(
 	struct xfs_da_args	*args,
+	uint8_t			xattr_flags,
 	bool			rsvd)
 {
 	struct xfs_inode	*dp = args->dp;
@@ -1172,7 +1174,7 @@ xfs_attr_setname(
 	int			rmt_extents = 0;
 	int			error, local;
 
-	ASSERT(!(args->xattr_flags & XATTR_REPLACE));
+	ASSERT(!(xattr_flags & ~XATTR_CREATE));
 	ASSERT(!args->trans);
 
 	args->total = xfs_attr_calc_size(args, &local);
@@ -1198,7 +1200,7 @@ xfs_attr_setname(
 	switch (error) {
 	case -EEXIST:
 		/* Pure create fails if the attr already exists */
-		if (args->xattr_flags & XATTR_CREATE)
+		if (xattr_flags & XATTR_CREATE)
 			goto out_trans_cancel;
 		if (args->attr_filter & XFS_ATTR_PARENT)
 			xfs_attr_defer_parent(args, XFS_ATTR_DEFER_REPLACE);
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 2a0ef4f633e2d1..b90e04c3e64f60 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -550,7 +550,7 @@ int xfs_inode_hasattr(struct xfs_inode *ip);
 bool xfs_attr_is_leaf(struct xfs_inode *ip);
 int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
-int xfs_attr_set(struct xfs_da_args *args);
+int xfs_attr_set(struct xfs_da_args *args, uint8_t xattr_flags);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
 bool xfs_attr_check_namespace(unsigned int attr_flags);
@@ -560,7 +560,7 @@ int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
 			 unsigned int *total);
 
-int xfs_attr_setname(struct xfs_da_args *args, bool rsvd);
+int xfs_attr_setname(struct xfs_da_args *args, uint8_t xattr_flags, bool rsvd);
 int xfs_attr_removename(struct xfs_da_args *args, bool rsvd);
 
 /*
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index 8d7a38fe2a5c07..354d5d65043e43 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -69,7 +69,6 @@ typedef struct xfs_da_args {
 	uint8_t		filetype;	/* filetype of inode for directories */
 	uint8_t		op_flags;	/* operation flags */
 	uint8_t		attr_filter;	/* XFS_ATTR_{ROOT,SECURE,INCOMPLETE} */
-	uint8_t		xattr_flags;	/* XATTR_{CREATE,REPLACE} */
 	short		namelen;	/* length of string (maybe no NULL) */
 	short		new_namelen;	/* length of new attr name */
 	xfs_dahash_t	hashval;	/* hash value of name */
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 2b6ed8c1ee1522..c5422f714fcc72 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -355,7 +355,7 @@ xfs_parent_set(
 
 	memset(scratch, 0, sizeof(struct xfs_da_args));
 	xfs_parent_da_args_init(scratch, NULL, pptr, ip, owner, parent_name);
-	return xfs_attr_setname(scratch, true);
+	return xfs_attr_setname(scratch, 0, true);
 }
 
 /*
diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
index e06d00ea828b3e..8863eef5a0b87b 100644
--- a/fs/xfs/scrub/attr_repair.c
+++ b/fs/xfs/scrub/attr_repair.c
@@ -615,7 +615,6 @@ xrep_xattr_insert_rec(
 	struct xfs_da_args		args = {
 		.dp			= rx->sc->tempip,
 		.attr_filter		= key->flags,
-		.xattr_flags		= XATTR_CREATE,
 		.namelen		= key->namelen,
 		.valuelen		= key->valuelen,
 		.owner			= rx->sc->ip->i_ino,
@@ -675,7 +674,7 @@ xrep_xattr_insert_rec(
 	 * use reserved blocks because we can abort the repair with ENOSPC.
 	 */
 	xfs_attr_sethash(&args);
-	error = xfs_attr_setname(&args, false);
+	error = xfs_attr_setname(&args, XATTR_CREATE, false);
 	if (error == -EEXIST)
 		error = 0;
 
diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
index cf79cbcda3ecb4..1bc05efa344036 100644
--- a/fs/xfs/scrub/parent_repair.c
+++ b/fs/xfs/scrub/parent_repair.c
@@ -1031,7 +1031,7 @@ xrep_parent_insert_xattr(
 			rp->xattr_name, key->namelen, key->valuelen);
 
 	xfs_attr_sethash(&args);
-	return xfs_attr_setname(&args, false);
+	return xfs_attr_setname(&args, 0, false);
 }
 
 /*
diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
index 4bf69c9c088e28..1aaf3dc64bcbc1 100644
--- a/fs/xfs/xfs_acl.c
+++ b/fs/xfs/xfs_acl.c
@@ -203,7 +203,7 @@ __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
 		xfs_acl_to_disk(args.value, acl);
 	}
 
-	error = xfs_attr_change(&args);
+	error = xfs_attr_change(&args, 0);
 	kvfree(args.value);
 
 	/*
diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
index 833b0d7d8bea1c..e3f54817b91557 100644
--- a/fs/xfs/xfs_handle.c
+++ b/fs/xfs/xfs_handle.c
@@ -492,7 +492,6 @@ xfs_attrmulti_attr_get(
 	struct xfs_da_args	args = {
 		.dp		= XFS_I(inode),
 		.attr_filter	= xfs_attr_filter(flags),
-		.xattr_flags	= xfs_xattr_flags(flags),
 		.name		= name,
 		.namelen	= strlen(name),
 		.valuelen	= *len,
@@ -526,7 +525,6 @@ xfs_attrmulti_attr_set(
 	struct xfs_da_args	args = {
 		.dp		= XFS_I(inode),
 		.attr_filter	= xfs_attr_filter(flags),
-		.xattr_flags	= xfs_xattr_flags(flags),
 		.name		= name,
 		.namelen	= strlen(name),
 	};
@@ -544,7 +542,7 @@ xfs_attrmulti_attr_set(
 		args.valuelen = len;
 	}
 
-	error = xfs_attr_change(&args);
+	error = xfs_attr_change(&args, xfs_xattr_flags(flags));
 	if (!error && (flags & XFS_IOC_ATTR_ROOT))
 		xfs_forget_acl(inode, name);
 	kfree(args.value);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index c4f9c7eec83590..d374be9f8a6e3e 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -64,7 +64,7 @@ xfs_initxattrs(
 			.value		= xattr->value,
 			.valuelen	= xattr->value_len,
 		};
-		error = xfs_attr_change(&args);
+		error = xfs_attr_change(&args, 0);
 		if (error < 0)
 			break;
 	}
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index dc074240ad239f..1292d69087dc0c 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -2131,7 +2131,6 @@ DECLARE_EVENT_CLASS(xfs_attr_class,
 		__field(int, valuelen)
 		__field(xfs_dahash_t, hashval)
 		__field(unsigned int, attr_filter)
-		__field(unsigned int, xattr_flags)
 		__field(uint32_t, op_flags)
 	),
 	TP_fast_assign(
@@ -2143,11 +2142,10 @@ DECLARE_EVENT_CLASS(xfs_attr_class,
 		__entry->valuelen = args->valuelen;
 		__entry->hashval = args->hashval;
 		__entry->attr_filter = args->attr_filter;
-		__entry->xattr_flags = args->xattr_flags;
 		__entry->op_flags = args->op_flags;
 	),
 	TP_printk("dev %d:%d ino 0x%llx name %.*s namelen %d valuelen %d "
-		  "hashval 0x%x filter %s flags %s op_flags %s",
+		  "hashval 0x%x filter %s op_flags %s",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->namelen,
@@ -2157,9 +2155,6 @@ DECLARE_EVENT_CLASS(xfs_attr_class,
 		  __entry->hashval,
 		  __print_flags(__entry->attr_filter, "|",
 				XFS_ATTR_FILTER_FLAGS),
-		   __print_flags(__entry->xattr_flags, "|",
-				{ XATTR_CREATE,		"CREATE" },
-				{ XATTR_REPLACE,	"REPLACE" }),
 		  __print_flags(__entry->op_flags, "|", XFS_DA_OP_FLAGS))
 )
 
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 1d57e204c850ff..69fa7b89c68972 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -80,7 +80,8 @@ xfs_attr_want_log_assist(
  */
 int
 xfs_attr_change(
-	struct xfs_da_args	*args)
+	struct xfs_da_args	*args,
+	uint8_t			xattr_flags)
 {
 	struct xfs_mount	*mp = args->dp->i_mount;
 	int			error;
@@ -95,7 +96,7 @@ xfs_attr_change(
 		args->op_flags |= XFS_DA_OP_LOGGED;
 	}
 
-	return xfs_attr_set(args);
+	return xfs_attr_set(args, xattr_flags);
 }
 
 
@@ -131,7 +132,6 @@ xfs_xattr_set(const struct xattr_handler *handler,
 	struct xfs_da_args	args = {
 		.dp		= XFS_I(inode),
 		.attr_filter	= handler->flags,
-		.xattr_flags	= flags,
 		.name		= name,
 		.namelen	= strlen(name),
 		.value		= (void *)value,
@@ -139,7 +139,7 @@ xfs_xattr_set(const struct xattr_handler *handler,
 	};
 	int			error;
 
-	error = xfs_attr_change(&args);
+	error = xfs_attr_change(&args, flags);
 	if (!error && (handler->flags & XFS_ATTR_ROOT))
 		xfs_forget_acl(inode, name);
 	return error;
diff --git a/fs/xfs/xfs_xattr.h b/fs/xfs/xfs_xattr.h
index f097002d06571f..79c0040cc904b4 100644
--- a/fs/xfs/xfs_xattr.h
+++ b/fs/xfs/xfs_xattr.h
@@ -6,7 +6,7 @@
 #ifndef __XFS_XATTR_H__
 #define __XFS_XATTR_H__
 
-int xfs_attr_change(struct xfs_da_args *args);
+int xfs_attr_change(struct xfs_da_args *args, uint8_t xattr_flags);
 int xfs_attr_grab_log_assist(struct xfs_mount *mp);
 void xfs_attr_rele_log_assist(struct xfs_mount *mp);
 

^ permalink raw reply related	[flat|nested] 234+ messages in thread

* Re: [PATCH 4/4] xfs: rearrange xfs_da_args a bit to use less space
  2024-04-10  0:50   ` [PATCH 4/4] xfs: rearrange xfs_da_args a bit to use less space Darrick J. Wong
@ 2024-04-10  5:02     ` Christoph Hellwig
  2024-04-10 20:56       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:02 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

On Tue, Apr 09, 2024 at 05:50:23PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> A few notes about struct xfs_da_args:
> 
> The XFS_ATTR_* flags only go up as far as XFS_ATTR_INCOMPLETE, which
> means that attr_filter could be a u8 field.
> 
> The XATTR_* flags only have two values, which means that xattr_flags
> could be shrunk to a u8.
> 
> I've reduced the number of XFS_DA_OP_* flags down to the point where
> op_flags would also fit into a u8.
> 
> filetype has 7 bytes of slack after it, which is wasteful.
> 
> namelen will never be greater than MAXNAMELEN, which is 256.  This field
> could be reduced to a short.
> 
> Rearrange the fields in xfs_da_args to waste less space.  This reduces
> the structure size from 136 bytes to 128.  Later when we add extra
> fields to support parent pointer replacement, this will only bloat the
> structure to 144 bytes, instead of 168.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

Eventually we should probably split this up; a lot of fields are
used only by the attr set code, and a few less only by dir vs attr.
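
The space saving in the quoted commit message comes from field ordering: small
fields placed between pointer-sized ones each drag alignment padding along
with them, while grouping the small fields together lets them share one padded
slot.  A standalone sketch with two hypothetical structures (these are not the
real xfs_da_args layout, and the exact sizes depend on the ABI):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Small fields interleaved with pointers: padding after each uint8_t. */
struct demo_args_loose {
	const void	*name;
	uint8_t		filetype;
	void		*value;
	uint8_t		op_flags;
	void		*trans;
	uint8_t		attr_filter;
};

/* Same fields, with the small ones grouped at the end. */
struct demo_args_tight {
	const void	*name;
	void		*value;
	void		*trans;
	uint8_t		filetype;
	uint8_t		op_flags;
	uint8_t		attr_filter;
};

/* Bytes saved by the rearrangement on the ABI this is compiled for. */
static size_t demo_bytes_saved(void)
{
	return sizeof(struct demo_args_loose) -
	       sizeof(struct demo_args_tight);
}
```

On a typical 64-bit ABI with 8-byte pointer alignment the grouped layout is
noticeably smaller; on an ABI with no alignment requirement the two would be
the same size.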


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 01/12] xfs: attr fork iext must be loaded before calling xfs_attr_is_leaf
  2024-04-10  0:50   ` [PATCH 01/12] xfs: attr fork iext must be loaded before calling xfs_attr_is_leaf Darrick J. Wong
@ 2024-04-10  5:04     ` Christoph Hellwig
  2024-04-10 20:58       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs

On Tue, Apr 09, 2024 at 05:50:38PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Christoph noticed that the xfs_attr_is_leaf in xfs_attr_get_ilocked can
> access the incore extent tree of the attr fork, but nothing in the
> xfs_attr_get path guarantees that the incore tree is actually loaded.
> 
> Most of the time it is, but seeing as xfs_attr_is_leaf ignores the
> return value of xfs_iext_get_extent I guess we've been making choices
> based on random stack contents and nobody's complained?

Yes, I'm kinda puzzled.

Note that the dir code actually reads the extents in its
is_leaf/is_block helpers.  But given how the attr code is structured,
that would thread through a lot of code, so it might not be worth it.

Reviewed-by: Christoph Hellwig <hch@lst.de>
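
The hazard discussed above is the general one of using an output parameter
without checking whether the lookup that fills it succeeded.  A minimal
standalone sketch, loosely modelled on the is_leaf test; the demo_fork and
demo_extent types and the "extents pointer is NULL when not loaded"
convention are assumptions for illustration only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_extent {
	unsigned long	startoff;
	unsigned long	blockcount;
};

/* Hypothetical fork: nextents is known even if extents are not loaded. */
struct demo_fork {
	const struct demo_extent	*extents;	/* NULL if not loaded */
	size_t				nextents;
};

/* Returns false and leaves *out untouched if the extent is unavailable. */
static bool demo_get_extent(const struct demo_fork *fork, size_t idx,
			    struct demo_extent *out)
{
	if (!fork->extents || idx >= fork->nextents)
		return false;
	*out = fork->extents[idx];
	return true;
}

static bool demo_is_leaf(const struct demo_fork *fork)
{
	struct demo_extent	ext;

	if (fork->nextents != 1)
		return false;

	/* Without this check, ext would hold whatever was on the stack. */
	if (!demo_get_extent(fork, 0, &ext))
		return false;

	return ext.startoff == 0 && ext.blockcount == 1;
}
```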

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 02/12] xfs: require XFS_SB_FEAT_INCOMPAT_LOG_XATTRS for attr log intent item recovery
  2024-04-10  0:50   ` [PATCH 02/12] xfs: require XFS_SB_FEAT_INCOMPAT_LOG_XATTRS for attr log intent item recovery Darrick J. Wong
@ 2024-04-10  5:04     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 03/12] xfs: use an XFS_OPSTATE_ flag for detecting if logged xattrs are available
  2024-04-10  0:51   ` [PATCH 03/12] xfs: use an XFS_OPSTATE_ flag for detecting if logged xattrs are available Darrick J. Wong
@ 2024-04-10  5:05     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

On Tue, Apr 09, 2024 at 05:51:09PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Per reviewer request, use an OPSTATE flag (+ helpers) to decide if
> logged xattrs are enabled, instead of querying the xfs_sb.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 04/12] xfs: check opcode and iovec count match in xlog_recover_attri_commit_pass2
  2024-04-10  0:51   ` [PATCH 04/12] xfs: check opcode and iovec count match in xlog_recover_attri_commit_pass2 Darrick J. Wong
@ 2024-04-10  5:05     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

On Tue, Apr 09, 2024 at 05:51:25PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Check that the number of recovered log iovecs is what the xattri
> opcode is expecting.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 05/12] xfs: fix missing check for invalid attr flags
  2024-04-10  0:51   ` [PATCH 05/12] xfs: fix missing check for invalid attr flags Darrick J. Wong
@ 2024-04-10  5:07     ` Christoph Hellwig
  2024-04-10 21:04       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

On Tue, Apr 09, 2024 at 05:51:41PM -0700, Darrick J. Wong wrote:
> +#define XFS_ATTR_ONDISK_MASK	(XFS_ATTR_NSP_ONDISK_MASK | \
> +				 XFS_ATTR_LOCAL | \
> +				 XFS_ATTR_INCOMPLETE)

Note that XFS_ATTR_LOCAL and XFS_ATTR_INCOMPLETE are not valid for
short form directories.  Should we check for that somewhere as well?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>
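
The mask in the quoted hunk supports the usual "reject any bit outside the
known set" check, and the question above amounts to asking for a narrower mask
in the short-form case.  A standalone sketch of both checks follows.  The
DEMO_* bit values and both helper functions are assumptions for illustration
and should not be read as the on-disk definitions or as the code in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits only. */
#define DEMO_ATTR_LOCAL		(1u << 0)
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_INCOMPLETE	(1u << 7)

#define DEMO_ATTR_NSP_MASK	(DEMO_ATTR_ROOT | DEMO_ATTR_SECURE)
#define DEMO_ATTR_ONDISK_MASK	(DEMO_ATTR_NSP_MASK | \
				 DEMO_ATTR_LOCAL | \
				 DEMO_ATTR_INCOMPLETE)

/* General check: no bits outside the known on-disk set. */
static bool demo_attr_flags_valid(uint8_t flags)
{
	return (flags & ~DEMO_ATTR_ONDISK_MASK) == 0;
}

/*
 * Stricter check for the short-form case raised in the review: only
 * namespace bits are acceptable, so LOCAL and INCOMPLETE are rejected.
 */
static bool demo_sf_attr_flags_valid(uint8_t flags)
{
	return (flags & ~DEMO_ATTR_NSP_MASK) == 0;
}
```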

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 06/12] xfs: restructure xfs_attr_complete_op a bit
  2024-04-10  0:51   ` [PATCH 06/12] xfs: restructure xfs_attr_complete_op a bit Darrick J. Wong
@ 2024-04-10  5:07     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

On Tue, Apr 09, 2024 at 05:51:56PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Eliminate the local variable from this function so that we can
> streamline things a bit later when we add the PPTR_REPLACE op code.

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 07/12] xfs: use helpers to extract xattr op from opflags
  2024-04-10  0:52   ` [PATCH 07/12] xfs: use helpers to extract xattr op from opflags Darrick J. Wong
@ 2024-04-10  5:07     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 08/12] xfs: validate recovered name buffers when recovering xattr items
  2024-04-10  0:52   ` [PATCH 08/12] xfs: validate recovered name buffers when recovering xattr items Darrick J. Wong
@ 2024-04-10  5:08     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 09/12] xfs: always set args->value in xfs_attri_item_recover
  2024-04-10  0:52   ` [PATCH 09/12] xfs: always set args->value in xfs_attri_item_recover Darrick J. Wong
@ 2024-04-10  5:08     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 10/12] xfs: use local variables for name and value length in _attri_commit_pass2
  2024-04-10  0:52   ` [PATCH 10/12] xfs: use local variables for name and value length in _attri_commit_pass2 Darrick J. Wong
@ 2024-04-10  5:08     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 11/12] xfs: refactor name/length checks in xfs_attri_validate
  2024-04-10  0:53   ` [PATCH 11/12] xfs: refactor name/length checks in xfs_attri_validate Darrick J. Wong
@ 2024-04-10  5:09     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 12/12] xfs: enforce one namespace per attribute
  2024-04-10  0:53   ` [PATCH 12/12] xfs: enforce one namespace per attribute Darrick J. Wong
@ 2024-04-10  5:09     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 01/32] xfs: rearrange xfs_attr_match parameters
  2024-04-10  0:53   ` [PATCH 01/32] xfs: rearrange xfs_attr_match parameters Darrick J. Wong
@ 2024-04-10  5:10     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:10 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 03/32] xfs: move xfs_attr_defer_add to xfs_attr_item.c
  2024-04-10  0:54   ` [PATCH 03/32] xfs: move xfs_attr_defer_add to xfs_attr_item.c Darrick J. Wong
@ 2024-04-10  5:11     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 04/32] xfs: create a separate hashname function for extended attributes
  2024-04-10  0:54   ` [PATCH 04/32] xfs: create a separate hashname function for extended attributes Darrick J. Wong
@ 2024-04-10  5:11     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:11 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 05/32] xfs: add parent pointer support to attribute code
  2024-04-10  0:54   ` [PATCH 05/32] xfs: add parent pointer support to attribute code Darrick J. Wong
@ 2024-04-10  5:11     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:11 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Mark Tinguely, Dave Chinner, Allison Henderson, catherine.hoang,
	hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 06/32] xfs: define parent pointer ondisk extended attribute format
  2024-04-10  0:55   ` [PATCH 06/32] xfs: define parent pointer ondisk extended attribute format Darrick J. Wong
@ 2024-04-10  5:12     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:12 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 07/32] xfs: allow xattr matching on name and value for local/sf attrs
  2024-04-10  0:55   ` [PATCH 07/32] xfs: allow xattr matching on name and value for local/sf attrs Darrick J. Wong
@ 2024-04-10  5:16     ` Christoph Hellwig
  2024-04-10 21:13       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:16 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

On Tue, Apr 09, 2024 at 05:55:20PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Add a new XFS_DA_OP_PARENT flag to signal that the caller wants to look

The flag doesn't actually exist; the match is done on the
XFS_ATTR_PARENT namespace.

>  
> @@ -2444,14 +2477,17 @@ xfs_attr3_leaf_lookup_int(
>  			name_loc = xfs_attr3_leaf_name_local(leaf, probe);
>  			if (!xfs_attr_match(args, entry->flags,
>  						name_loc->nameval,
> -						name_loc->namelen))
> +						name_loc->namelen,
> +						&name_loc->nameval[name_loc->namelen],
> +						be16_to_cpu(name_loc->valuelen)))

If we'd switch from the odd pre-existing three-tab indent to the normal
two-tab indent we'd avoid the overly long line here.

>  				continue;
>  			args->index = probe;
>  			return -EEXIST;
>  		} else {
>  			name_rmt = xfs_attr3_leaf_name_remote(leaf, probe);
>  			if (!xfs_attr_match(args, entry->flags, name_rmt->name,
> -						name_rmt->namelen))
> +						name_rmt->namelen, NULL,
> +						be32_to_cpu(name_rmt->valuelen)))

... and here.

The remote side might also benefit from a local variable to store the
endian swapped version of the valuelen instead of calculating it twice.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 08/32] xfs: allow logged xattr operations if parent pointers are enabled
  2024-04-10  0:55   ` [PATCH 08/32] xfs: allow logged xattr operations if parent pointers are enabled Darrick J. Wong
@ 2024-04-10  5:18     ` Christoph Hellwig
  2024-04-10 21:18       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

On Tue, Apr 09, 2024 at 05:55:35PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Don't trip this assertion about attr log items if we have parent
> pointers enabled.  Parent pointers are an incompat feature that doesn't
> use any of the functionality protected by
> XFS_SB_FEAT_INCOMPAT_LOG_XATTRS, which is why this is ok.

I'd move the checks into the switch on op below, so that we check the log
attrs feature for the "normal" logged attrs and the parent pointers flag
for the parent pointer ops.


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 09/32] xfs: log parent pointer xattr removal operations
  2024-04-10  0:55   ` [PATCH 09/32] xfs: log parent pointer xattr removal operations Darrick J. Wong
@ 2024-04-10  5:18     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 11/32] xfs: log parent pointer xattr replace operations
  2024-04-10  0:56   ` [PATCH 11/32] xfs: log parent pointer xattr replace operations Darrick J. Wong
@ 2024-04-10  5:26     ` Christoph Hellwig
  2024-04-10 23:07       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

On Tue, Apr 09, 2024 at 05:56:22PM -0700, Darrick J. Wong wrote:
> From: Allison Henderson <allison.henderson@oracle.com>
> 
> The parent pointer code needs to do a deferred parent pointer replace
> operation with the xattr log intent code.  Declare a new logged xattr
> opcode and push it through the log.
> 
> (Formerly titled "xfs: Add new name to attri/d" and described as
> follows:

I don't think this history is very important.  That being said,
I suspect this and the previous two patches should be combined into
a single one adding the on-disk formats for parent pointers, and the
commit log could use a complete rewrite saying that it a

> +			return false;
> +		if (attrp->alfi_old_name_len == 0 ||
> +		    attrp->alfi_old_name_len > XATTR_NAME_MAX)
> +			return false;
> +		if (attrp->alfi_new_name_len == 0 ||
> +		    attrp->alfi_new_name_len > XATTR_NAME_MAX)
> +			return false;

Given that we have four copies of this (arguably simple) check,
should we grow a helper for it?
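
A minimal sketch of such a helper, for illustration only.  The function
name is hypothetical and not taken from the patch, and XATTR_NAME_MAX is
defined locally (255, as in the uapi limits header) so that the fragment
stands alone:

```c
#include <stdbool.h>

/* Assumption: same value as XATTR_NAME_MAX in <linux/limits.h>. */
#ifndef XATTR_NAME_MAX
#define XATTR_NAME_MAX	255
#endif

/* Hypothetical helper: a logged xattr name must be 1..XATTR_NAME_MAX bytes. */
static inline bool
attri_namelen_ok(unsigned int namelen)
{
	return namelen > 0 && namelen <= XATTR_NAME_MAX;
}
```

Each open-coded pair of comparisons would then collapse to something like
"if (!attri_namelen_ok(attrp->alfi_old_name_len)) return false;".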

> +		if (attrp->alfi_value_len == 0 ||
> +		    attrp->alfi_value_len > XATTR_SIZE_MAX)
> +			return false;

All parent pointer attrs must be sized for exactly the parent_rec,
so we should probably check for that explicitly?
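
As a sketch of the explicit check, assuming the value is a packed ondisk
record holding a 64-bit inode number plus a 32-bit generation.  The
struct below is a stand-in for the real record and the helper name is
hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the ondisk parent pointer record; the layout is an assumption. */
struct parent_rec {
	uint64_t	p_ino;	/* parent inode number (big-endian on disk) */
	uint32_t	p_gen;	/* parent generation (big-endian on disk) */
} __attribute__((packed));

/* Hypothetical helper: a pptr value must be exactly one record long. */
static inline bool
pptr_valuelen_ok(unsigned int valuelen)
{
	return valuelen == sizeof(struct parent_rec);
}
```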

> +	if (xfs_attr_log_item_op(old_attrp) == XFS_ATTRI_OP_FLAGS_PPTR_REPLACE) {

Please avoid the overly long line here.

>  
> +	/* Validate the new attr name */
> +	if (new_name_len > 0) {
> +		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(new_name_len)) {

.. and here.

And while we're at it, maybe factor the checking for valid xattr
name and value log iovecs into a little helper instead of duplicating
them a few times?


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 12/32] xfs: record inode generation in xattr update log intent items
  2024-04-10  0:56   ` [PATCH 12/32] xfs: record inode generation in xattr update log intent items Darrick J. Wong
@ 2024-04-10  5:27     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

> -	if (xfs_attr_log_item_op(attrp) == XFS_ATTRI_OP_FLAGS_PPTR_REPLACE) {
> +	switch (xfs_attr_log_item_op(attrp)) {
> +	case XFS_ATTRI_OP_FLAGS_PPTR_REPLACE:
>  		ASSERT(attr->xattri_nameval->value.i_len ==
>  		       attr->xattri_nameval->new_value.i_len);
>  
> +		attrp->alfi_igen = VFS_I(attr->xattri_da_args->dp)->i_generation;

Please avoid the overly long lines (maybe xattri_da_args needs a shorter
name or we want a local variable for it?)


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 13/32] xfs: Expose init_xattrs in xfs_create_tmpfile
  2024-04-10  0:56   ` [PATCH 13/32] xfs: Expose init_xattrs in xfs_create_tmpfile Darrick J. Wong
@ 2024-04-10  5:28     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:28 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 14/32] xfs: add parent pointer validator functions
  2024-04-10  0:57   ` [PATCH 14/32] xfs: add parent pointer validator functions Darrick J. Wong
@ 2024-04-10  5:31     ` Christoph Hellwig
  2024-04-10 18:53       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:31 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

On Tue, Apr 09, 2024 at 05:57:09PM -0700, Darrick J. Wong wrote:
> From: Allison Henderson <allison.henderson@oracle.com>
> 
> Attribute names of parent pointers are not strings.

They are now.  The rest of the commit log also doesn't match the code
anymore.  The code itself looks good, though.


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 15/32] xfs: extend transaction reservations for parent attributes
  2024-04-10  0:57   ` [PATCH 15/32] xfs: extend transaction reservations for parent attributes Darrick J. Wong
@ 2024-04-10  5:31     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:31 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dave Chinner, Allison Henderson, catherine.hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 16/32] xfs: create a hashname function for parent pointers
  2024-04-10  0:57   ` [PATCH 16/32] xfs: create a hashname function for parent pointers Darrick J. Wong
@ 2024-04-10  5:33     ` Christoph Hellwig
  2024-04-10 21:39       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

> +	/*
> +	 * Use the same dirent name hash as would be used on the directory, but
> +	 * mix in the parent inode number.
> +	 */
> +	ret = xfs_dir2_hashname(mp, &xname);
> +	ret ^= upper_32_bits(parent_ino);
> +	ret ^= lower_32_bits(parent_ino);
> +	return ret;

Totally superficial nit, but wouldn't this read a little nicer as:

	return xfs_dir2_hashname(mp, &xname) ^
		lower_32_bits(parent_ino) ^
		upper_32_bits(parent_ino);

?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 17/32] xfs: parent pointer attribute creation
  2024-04-10  0:57   ` [PATCH 17/32] xfs: parent pointer attribute creation Darrick J. Wong
@ 2024-04-10  5:44     ` Christoph Hellwig
  2024-04-10 21:50       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:44 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dave Chinner, Allison Henderson, catherine.hoang, hch, linux-xfs


One thing that might be worth documenting in a comment or at least
the commit log is why we have this three phase split between
allocating the daargs, doing all the work and freeing it.

As far as I can tell that is because the da_args need to be around
until transaction commit because xfs_attr_intent has a pointer to
the da_args and not a full copy.  So unless the attrs are on stack
they need to be freed after transaction commit, and as the normal
dir operation args are not on the stack we don't want to add the
attr one to the stack here.  We could probably allocate the da_args
in the main parent pointer helpers, but that would require a NOFAIL
allocation and maybe lead to odd calling conventions, but maybe
someone directly involved can further refine that reasoning.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 18/32] xfs: add parent attributes to link
  2024-04-10  0:58   ` [PATCH 18/32] xfs: add parent attributes to link Darrick J. Wong
@ 2024-04-10  5:45     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:45 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dave Chinner, Allison Henderson, catherine.hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 19/32] xfs: add parent attributes to symlink
  2024-04-10  0:58   ` [PATCH 19/32] xfs: add parent attributes to symlink Darrick J. Wong
@ 2024-04-10  5:45     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:45 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 20/32] xfs: remove parent pointers in unlink
  2024-04-10  0:58   ` [PATCH 20/32] xfs: remove parent pointers in unlink Darrick J. Wong
@ 2024-04-10  5:45     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:45 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Dave Chinner, Allison Henderson, catherine.hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 21/32] xfs: Add parent pointers to rename
  2024-04-10  0:58   ` [PATCH 21/32] xfs: Add parent pointers to rename Darrick J. Wong
@ 2024-04-10  5:46     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 22/32] xfs: Add parent pointers to xfs_cross_rename
  2024-04-10  0:59   ` [PATCH 22/32] xfs: Add parent pointers to xfs_cross_rename Darrick J. Wong
@ 2024-04-10  5:46     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:46 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 23/32] xfs: Filter XFS_ATTR_PARENT for getfattr
  2024-04-10  0:59   ` [PATCH 23/32] xfs: Filter XFS_ATTR_PARENT for getfattr Darrick J. Wong
@ 2024-04-10  5:51     ` Christoph Hellwig
  2024-04-10 21:58       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

On Tue, Apr 09, 2024 at 05:59:30PM -0700, Darrick J. Wong wrote:
> From: Allison Henderson <allison.henderson@oracle.com>
> 
> Parent pointers returned to the get_fattr tool cause errors since
> the tool cannot parse parent pointers.  Fix this by filtering parent
> parent pointers from xfs_xattr_put_listent.

With the new format returning the attrs should not cause parsing errors.
OTOH we now have duplicate names, which means a get operation based on
the name can't actually work in that case.

I'd also argue that parent pointers are internal enough that they
should not be exposed through the normal xattr interfaces.

> +/*
> + * This file defines functions to work with externally visible extended
> + * attributes, such as those in user, system, or security namespaces.  They
> + * should not be used for internally used attributes.  Consider xfs_attr.c.
> + */

As long as xfs_attr_change and xfs_attr_grab_log_assist are in xfs_xattr.c
that is not actually true.  However I think they should be moved to
xfs_attr.c (and in case of xfs_attr_change merged into xfs_attr_set)
to make this comment true.

However I'd make it part of the top of file comment above the include
statements.  And please add it in a separate commit as it has nothing
to do with the other changes here.


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 24/32] xfs: pass the attr value to put_listent when possible
  2024-04-10  0:59   ` [PATCH 24/32] xfs: pass the attr value to put_listent when possible Darrick J. Wong
@ 2024-04-10  5:51     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:51 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 25/32] xfs: move handle ioctl code to xfs_handle.c
  2024-04-10  1:00   ` [PATCH 25/32] xfs: move handle ioctl code to xfs_handle.c Darrick J. Wong
@ 2024-04-10  5:52     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 26/32] xfs: split out handle management helpers a bit
  2024-04-10  1:00   ` [PATCH 26/32] xfs: split out handle management helpers a bit Darrick J. Wong
@ 2024-04-10  5:56     ` Christoph Hellwig
  2024-04-10 22:01       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  5:56 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

> +	handle->ha_fid.fid_len = sizeof(struct xfs_fid) -
> +				 sizeof(handle->ha_fid.fid_len);

If we clean this up anyway, maybe add a helper for the above calculation
and share it with xfs_khandle_to_dentry?
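
Something along these lines, as a rough sketch.  The struct is a local
stand-in that mirrors the fid layout for illustration, and the helper
name is made up:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in mirroring struct xfs_fid; field layout assumed for illustration. */
struct fid {
	uint16_t	fid_len;	/* length of the remainder */
	uint16_t	fid_pad;
	uint32_t	fid_gen;	/* generation number */
	uint64_t	fid_ino;	/* inode number */
};

/* Hypothetical helper: number of bytes in the fid after the fid_len field. */
static inline size_t
fid_payload_len(void)
{
	return sizeof(struct fid) - sizeof(((struct fid *)NULL)->fid_len);
}
```

Both the code that fills in a handle and the code that validates one
could then compare against the same value.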

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-10  1:00   ` [PATCH 27/32] xfs: Add parent pointer ioctls Darrick J. Wong
@ 2024-04-10  6:04     ` Christoph Hellwig
  2024-04-10 23:34       ` Darrick J. Wong
  2024-04-12 17:39     ` Darrick J. Wong
  1 sibling, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

Maybe replace the subject with 'add parent pointer listing ioctls' ?

On Tue, Apr 09, 2024 at 06:00:33PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This patch adds a pair of new file ioctls to retrieve the parent pointer
> of a given inode.  They both return the same results, but one operates
> on the file descriptor passed to ioctl() whereas the other allows the
> caller to specify a file handle for which the caller wants results.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> [djwong: adjust to new ondisk format, split ioctls]
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>

Note that the first signoff should always be from the patch author,
as recorded in the From line.

> +	/* Size of the gp_buffer in bytes */
> +	__u32				gp_bufsize;
> +
> +	/* Must be set to zero */
> +	__u64				__pad;

We don't really need this as padding.  If you want to keep it for
extensibility (although I can't really think of anything to use it
for in the future) it should probably be renamed to gp_reserved.

> +static inline struct xfs_getparents_rec *
> +xfs_getparents_next_rec(struct xfs_getparents *gp,
> +			struct xfs_getparents_rec *gpr)
> +{
> +	char *next = ((char *)gpr + gpr->gpr_reclen);
> +	char *end = (char *)(uintptr_t)(gp->gp_buffer + gp->gp_bufsize);
> +
> +	if (next >= end)
> +		return NULL;
> +
> +	return (struct xfs_getparents_rec *)next;

We rely on void pointer arithmetic everywhere in the kernel and
xfsprogs, so maybe use that here and avoid the need for the cast
at the end?
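
Roughly like this, using simplified stand-in structures that carry only
the fields the helper touches (all names here are placeholders, not the
ones from the patch).  Arithmetic on void pointers is a GNU C extension
that gcc and clang accept by default:

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; the real structures have more fields. */
struct gp_rec {
	uint16_t	gpr_reclen;	/* length of the entire record */
	char		gpr_name[];	/* null-terminated name */
};

struct gp_head {
	uint64_t	gp_buffer;	/* userspace pointer to the record buffer */
	uint32_t	gp_bufsize;	/* size of the buffer in bytes */
};

/* Return the record after @gpr, or NULL if it would start past the buffer. */
static inline struct gp_rec *
gp_next_rec(struct gp_head *gp, struct gp_rec *gpr)
{
	void *next = (void *)gpr + gpr->gpr_reclen;
	void *end = (void *)(uintptr_t)(gp->gp_buffer + gp->gp_bufsize);

	if (next >= end)
		return NULL;

	/* void * converts implicitly, so no cast is needed here */
	return next;
}
```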

> + */
> +int
> +xfs_parent_from_xattr(
> +	struct xfs_mount	*mp,
> +	unsigned int		attr_flags,
> +	const unsigned char	*name,
> +	unsigned int		namelen,
> +	const void		*value,
> +	unsigned int		valuelen,
> +	xfs_ino_t		*parent_ino,
> +	uint32_t		*parent_gen)
> +{
> +	const struct xfs_parent_rec	*rec = value;
> +
> +	if (!(attr_flags & XFS_ATTR_PARENT))
> +		return 0;

I wonder if this check should move to the callers.  That makes the
calling conventions a lot simpler, and I think it probably makes
the code a bit easier to follow as well.  But I'm not entirely sure
either and open for arguments.

> +static inline unsigned int
> +xfs_getparents_rec_sizeof(
> +	unsigned int		namelen)
> +{
> +	return round_up(sizeof(struct xfs_getparents_rec) + namelen + 1,
> +			sizeof(uint32_t));
> +}

As we marked the xfs_getparents_rec as __packed we shouldn't really
need the alignment here.  Or if we align, it should be to 8 bytes,
in which case we don't need to pack it.

> +	unsigned short			reclen = xfs_getparents_rec_sizeof(namelen);

Please avoid the overly long line here.

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 28/32] xfs: don't remove the attr fork when parent pointers are enabled
  2024-04-10  1:00   ` [PATCH 28/32] xfs: don't remove the attr fork when parent pointers are enabled Darrick J. Wong
@ 2024-04-10  6:04     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:04 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 29/32] xfs: Add the parent pointer support to the superblock version 5.
  2024-04-10  1:01   ` [PATCH 29/32] xfs: Add the parent pointer support to the superblock version 5 Darrick J. Wong
@ 2024-04-10  6:05     ` Christoph Hellwig
  2024-04-10 22:06       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:05 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Mark Tinguely, Dave Chinner, Allison Henderson, Darrick J. Wong,
	catherine.hoang, hch, linux-xfs

On Tue, Apr 09, 2024 at 06:01:04PM -0700, Darrick J. Wong wrote:
> From: Allison Henderson <allison.henderson@oracle.com>
> 
> Add the parent pointer superblock flag so that we can actually mount
> filesystems with this feature enabled.

The subject reads a little weird.  What about

"add an incompat feature bit for parent pointers" ?


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 30/32] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res
  2024-04-10  1:01   ` [PATCH 30/32] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
@ 2024-04-10  6:05     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:05 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 31/32] xfs: drop compatibility minimum log size computations for reflink
  2024-04-10  1:01   ` [PATCH 31/32] xfs: drop compatibility minimum log size computations for reflink Darrick J. Wong
@ 2024-04-10  6:06     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 32/32] xfs: enable parent pointers
  2024-04-10  1:01   ` [PATCH 32/32] xfs: enable parent pointers Darrick J. Wong
@ 2024-04-10  6:06     ` Christoph Hellwig
  2024-04-10 22:11       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:06 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

On Tue, Apr 09, 2024 at 06:01:51PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Add parent pointers to the list of supported features.

Any reason to split this from actually adding the feature bit?

Otherwise looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 1/7] xfs: check dirents have parent pointers
  2024-04-10  1:02   ` [PATCH 1/7] xfs: check dirents have " Darrick J. Wong
@ 2024-04-10  6:12     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:12 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 2/7] xfs: deferred scrub of dirents
  2024-04-10  1:02   ` [PATCH 2/7] xfs: deferred scrub of dirents Darrick J. Wong
@ 2024-04-10  6:13     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/7] xfs: scrub parent pointers
  2024-04-10  1:02   ` [PATCH 3/7] xfs: scrub parent pointers Darrick J. Wong
@ 2024-04-10  6:13     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:13 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 4/7] xfs: deferred scrub of parent pointers
  2024-04-10  1:02   ` [PATCH 4/7] xfs: deferred scrub of " Darrick J. Wong
@ 2024-04-10  6:14     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 5/7] xfs: walk directory parent pointers to determine backref count
  2024-04-10  1:03   ` [PATCH 5/7] xfs: walk directory parent pointers to determine backref count Darrick J. Wong
@ 2024-04-10  6:14     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 6/7] xfs: check parent pointer xattrs when scrubbing
  2024-04-10  1:03   ` [PATCH 6/7] xfs: check parent pointer xattrs when scrubbing Darrick J. Wong
@ 2024-04-10  6:14     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:14 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 7/7] xfs: salvage parent pointers when rebuilding xattr structures
  2024-04-10  1:03   ` [PATCH 7/7] xfs: salvage parent pointers when rebuilding xattr structures Darrick J. Wong
@ 2024-04-10  6:15     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:15 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 01/14] xfs: add xattr setname and removename functions for internal users
  2024-04-10  1:03   ` [PATCH 01/14] xfs: add xattr setname and removename functions for internal users Darrick J. Wong
@ 2024-04-10  6:18     ` Christoph Hellwig
  2024-04-10 22:18       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

> +static int
> +xfs_attr_ensure_iext(
> +	struct xfs_da_args	*args,
> +	int			nr)
> +{
> +	int			error;
> +
> +	error = xfs_iext_count_may_overflow(args->dp, XFS_ATTR_FORK, nr);
> +	if (error == -EFBIG)
> +		return xfs_iext_count_upgrade(args->trans, args->dp, nr);
> +	return error;
> +}

I'd rather get my consolidation of these merged instead of adding
a wrapper like this.  Just waiting for my RT delalloc and your
exchrange series to hit for-next to resend it.

> +/*
> + * Ensure that the xattr structure maps @args->name to @args->value.
> + *
> + * The caller must have initialized @args, attached dquots, and must not hold
> + * any ILOCKs.  Only XATTR_CREATE may be specified in @args->xattr_flags.
> + * Reserved data blocks may be used if @rsvd is set.
> + *
> + * Returns -EEXIST if XATTR_CREATE was specified and the name already exists.
> + */
> +int
> +xfs_attr_setname(

Is there any case where we do not want to pass XATTR_CREATE, that
is replace an existing attribute when there is one?

> +int
> +xfs_attr_removename(
> +	struct xfs_da_args	*args,
> +	bool			rsvd)
> +{

Is there a good reason to have a separate remove helper and not
overload a NULL value like we do for the normal xattr interface?


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 02/14] xfs: add raw parent pointer apis to support repair
  2024-04-10  1:04   ` [PATCH 02/14] xfs: add raw parent pointer apis to support repair Darrick J. Wong
@ 2024-04-10  6:18     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 03/14] xfs: repair directories by scanning directory parent pointers
  2024-04-10  1:04   ` [PATCH 03/14] xfs: repair directories by scanning directory parent pointers Darrick J. Wong
@ 2024-04-10  6:19     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 04/14] xfs: implement live updates for directory repairs
  2024-04-10  1:04   ` [PATCH 04/14] xfs: implement live updates for directory repairs Darrick J. Wong
@ 2024-04-10  6:19     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 05/14] xfs: replay unlocked parent pointer updates that accrue during xattr repair
  2024-04-10  1:04   ` [PATCH 05/14] xfs: replay unlocked parent pointer updates that accrue during xattr repair Darrick J. Wong
@ 2024-04-10  6:19     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:19 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 06/14] xfs: repair directory parent pointers by scanning for dirents
  2024-04-10  1:05   ` [PATCH 06/14] xfs: repair directory parent pointers by scanning for dirents Darrick J. Wong
@ 2024-04-10  6:20     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 07/14] xfs: implement live updates for parent pointer repairs
  2024-04-10  1:05   ` [PATCH 07/14] xfs: implement live updates for parent pointer repairs Darrick J. Wong
@ 2024-04-10  6:20     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 08/14] xfs: remove pointless unlocked assertion
  2024-04-10  1:05   ` [PATCH 08/14] xfs: remove pointless unlocked assertion Darrick J. Wong
@ 2024-04-10  6:20     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:20 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 09/14] xfs: split xfs_bmap_add_attrfork into two pieces
  2024-04-10  1:06   ` [PATCH 09/14] xfs: split xfs_bmap_add_attrfork into two pieces Darrick J. Wong
@ 2024-04-10  6:21     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 10/14] xfs: add a per-leaf block callback to xchk_xattr_walk
  2024-04-10  1:06   ` [PATCH 10/14] xfs: add a per-leaf block callback to xchk_xattr_walk Darrick J. Wong
@ 2024-04-10  6:22     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 11/14] xfs: actually rebuild the parent pointer xattrs
  2024-04-10  1:06   ` [PATCH 11/14] xfs: actually rebuild the parent pointer xattrs Darrick J. Wong
@ 2024-04-10  6:22     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 13/14] xfs: repair link count of nondirectories after rebuilding parent pointers
  2024-04-10  1:07   ` [PATCH 13/14] xfs: repair link count of nondirectories after rebuilding " Darrick J. Wong
@ 2024-04-10  6:22     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 12/14] xfs: adapt the orphanage code to handle parent pointers
  2024-04-10  1:06   ` [PATCH 12/14] xfs: adapt the orphanage code to handle parent pointers Darrick J. Wong
@ 2024-04-10  6:23     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 14/14] xfs: inode repair should ensure there's an attr fork to store parent pointers
  2024-04-10  1:07   ` [PATCH 14/14] xfs: inode repair should ensure there's an attr fork to store " Darrick J. Wong
@ 2024-04-10  6:24     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  6:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 1/4] xfs: teach online scrub to find directory tree structure problems
  2024-04-10  1:07   ` [PATCH 1/4] xfs: teach online scrub to find directory tree structure problems Darrick J. Wong
@ 2024-04-10  7:21     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  7:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 2/4] xfs: invalidate dirloop scrub path data when concurrent updates happen
  2024-04-10  1:07   ` [PATCH 2/4] xfs: invalidate dirloop scrub path data when concurrent updates happen Darrick J. Wong
@ 2024-04-10  7:21     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  7:21 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/4] xfs: report directory tree corruption in the health information
  2024-04-10  1:08   ` [PATCH 3/4] xfs: report directory tree corruption in the health information Darrick J. Wong
@ 2024-04-10  7:23     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  7:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 4/4] xfs: fix corruptions in the directory tree
  2024-04-10  1:08   ` [PATCH 4/4] xfs: fix corruptions in the directory tree Darrick J. Wong
@ 2024-04-10  7:23     ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10  7:23 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 1/3] xfs: reduce the rate of cond_resched calls inside scrub
  2024-04-10  1:08   ` [PATCH 1/3] xfs: reduce the rate of cond_resched calls inside scrub Darrick J. Wong
@ 2024-04-10 14:55     ` Christoph Hellwig
  2024-04-10 22:19       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10 14:55 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

On Tue, Apr 09, 2024 at 06:08:38PM -0700, Darrick J. Wong wrote:
> Surprisingly, this reduces scrub-only fstests runtime by about 2%.  I
> used the bmapinflate xfs_db command to produce a billion-extent file and
> this stupid gadget reduced the scrub runtime by about 4%.

I wish the scheduler maintainers would just finish sorting out the
preemption models mess and kill cond_resched() and we wouldn't need this.

But until then:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 2/3] xfs: introduce vectored scrub mode
  2024-04-10  1:08   ` [PATCH 2/3] xfs: introduce vectored scrub mode Darrick J. Wong
@ 2024-04-10 15:00     ` Christoph Hellwig
  2024-04-11  0:59       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10 15:00 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

On Tue, Apr 09, 2024 at 06:08:54PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Introduce a variant on XFS_SCRUB_METADATA that allows for a vectored
> mode.  The caller specifies the principal metadata object that they want
> to scrub (allocation group, inode, etc.) once, followed by an array of
> scrub types they want called on that object.  The kernel runs the scrub
> operations and writes the output flags and errno code to the
> corresponding array element.
> 
> A new pseudo scrub type BARRIER is introduced to force the kernel to
> return to userspace if any corruptions have been found when scrubbing
> the previous scrub types in the array.  This enables userspace to
> schedule, for example, the sequence:
> 
>  1. data fork
>  2. barrier
>  3. directory
> 
> If the data fork scrub is clean, then the kernel will perform the
> directory scrub.  If not, the barrier in 2 will exit back to userspace.
> 
> When running fstests in "rebuild all metadata after each test" mode, I
> observed a 10% reduction in runtime due to fewer transitions across the
> system call boundary.

Just curious: what is the benefit over having a scruball $OBJECT interface
where the above order is encoded in the kernel instead of in the
scrub tool?
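
For reference, the barrier behaviour described in the commit message
above can be sketched in plain userspace C roughly as follows.  This is
only an illustration of the semantics as described there, not the kernel
code: every demo_* name, the callback type, and the return convention
are made up for this example.

```c
#include <stddef.h>

/* Hypothetical scrub types; only the barrier has special meaning. */
enum demo_scrub_type {
	DEMO_SCRUB_DATA_FORK,
	DEMO_SCRUB_DIRECTORY,
	DEMO_SCRUB_BARRIER,
};

/* One element of the (hypothetical) vector passed in by the caller. */
struct demo_scrub_vec {
	enum demo_scrub_type	type;
	int			corrupt;	/* out: nonzero if a problem was found */
	int			ran;		/* out: nonzero if this scrubber ran */
};

/* Hypothetical scrubber callback: returns nonzero if corruption is found. */
typedef int (*demo_scrub_fn)(enum demo_scrub_type type);

/* Sample callbacks so the sketch can be exercised. */
static int demo_all_clean(enum demo_scrub_type type)
{
	(void)type;
	return 0;
}

static int demo_bad_data_fork(enum demo_scrub_type type)
{
	return type == DEMO_SCRUB_DATA_FORK;
}

/*
 * Run each vector element in order.  A barrier stops the walk if any
 * earlier element reported corruption.  Returns the index at which
 * processing stopped (== nr if everything was processed).
 */
static size_t
demo_scrubv(
	struct demo_scrub_vec	*vecs,
	size_t			nr,
	demo_scrub_fn		scrub)
{
	int			seen_corrupt = 0;
	size_t			i;

	for (i = 0; i < nr; i++) {
		if (vecs[i].type == DEMO_SCRUB_BARRIER) {
			if (seen_corrupt)
				break;	/* hand control back to the caller */
			continue;
		}

		vecs[i].ran = 1;
		vecs[i].corrupt = scrub(vecs[i].type);
		if (vecs[i].corrupt)
			seen_corrupt = 1;
	}
	return i;
}
```

With the { data fork, barrier, directory } sequence, a clean data fork
lets the directory scrubber run, while a corrupt one stops the walk at
the barrier and the directory element is never touched.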

> +	BUILD_BUG_ON(sizeof(struct xfs_scrub_vec_head) ==
> +		     sizeof(struct xfs_scrub_metadata));
> +	BUILD_BUG_ON(XFS_IOC_SCRUB_METADATA == XFS_IOC_SCRUBV_METADATA);

What is the point of these BUILD_BUG_ONs?

> +	if (copy_from_user(&head, uhead, sizeof(head)))
> +		return -EFAULT;
> +
> +	if (head.svh_reserved)
> +		return -EINVAL;
> +
> +	bytes = sizeof_xfs_scrub_vec(head.svh_nr);
> +	if (bytes > PAGE_SIZE)
> +		return -ENOMEM;
> +	vhead = kvmalloc(bytes, GFP_KERNEL | __GFP_RETRY_MAYFAIL);

Why __GFP_RETRY_MAYFAIL and not just a plain GFP_KERNEL?

> +	if (!vhead)
> +		return -ENOMEM;
> +	memcpy(vhead, &head, sizeof(struct xfs_scrub_vec_head));
> +
> +	if (copy_from_user(&vhead->svh_vecs, &uhead->svh_vecs,
> +				head.svh_nr * sizeof(struct xfs_scrub_vec))) {

This should probably use array_size to better deal with overflows.

And maybe it should use an indirection for the vecs so that we can
simply do a memdup_user to copy the entire array to kernel space?
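
To illustrate the overflow concern: the open-coded nr * sizeof()
multiplication can wrap for a large enough count.  A minimal userspace
sketch of a saturating multiply is below.  I believe the kernel's
array_size() helper behaves similarly (saturating to SIZE_MAX so that
the subsequent allocation or copy fails), but demo_array_size() here is
a stand-in written for this example, not the kernel implementation.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Multiply an element count by an element size, returning SIZE_MAX
 * instead of a wrapped-around value if the product does not fit.
 */
static size_t
demo_array_size(
	size_t		nr,
	size_t		elem_size)
{
	if (elem_size != 0 && nr > SIZE_MAX / elem_size)
		return SIZE_MAX;	/* would overflow: saturate */
	return nr * elem_size;
}
```

A caller would then reject or fail on SIZE_MAX rather than copying a
truncated number of bytes.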


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle
  2024-04-10  1:09   ` [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle Darrick J. Wong
@ 2024-04-10 15:12     ` Christoph Hellwig
  2024-04-11  1:15       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-10 15:12 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: hch, linux-xfs

> +	/*
> +	 * If the caller wants us to do a scrub-by-handle and the file used to
> +	 * call the ioctl is not the same file, load the incore inode and pin
> +	 * it across all the scrubv actions to avoid repeated UNTRUSTED
> +	 * lookups.  The reference is not passed to deeper layers of scrub
> +	 * because each scrubber gets to decide its own strategy for getting an
> +	 * inode.
> +	 */
> +	if (vhead->svh_ino && vhead->svh_ino != ip_in->i_ino)
> +		handle_ip = xchk_scrubv_open_by_handle(mp, vhead);

Oh.  So we read the inode, keep a reference to it, but still hit the
inode cache every time.  A little non-obvious and not perfect for
performance, but based on your numbers probably good enough.

Curious: what is the reason the scrubbers want/need different ways to
get at the inode?

> +	/*
> +	 * If we're holding the only reference to an inode opened via handle
> +	 * and the scan was clean, mark it dontcache so that we don't pollute
> +	 * the cache.
> +	 */
> +	if (handle_ip) {
> +		if (set_dontcache &&
> +		    atomic_read(&VFS_I(handle_ip)->i_count) == 1)
> +			d_mark_dontcache(VFS_I(handle_ip));
> +		xfs_irele(handle_ip);
> +	}

This looks a little weird to me.  Can't we simply use XFS_IGET_DONTCACHE
at iget time and then clear I_DONTCACHE here if we want to keep the
inode around?  Given that we only set the uncached flag from
XFS_IGET_DONTCACHE on a cache miss, we won't have set
DCACHE_DONTCACHE anywhere (and don't really care about the dentries to
start with).

But why do we care about keeping the inodes with errors in memory
here, but not elsewhere?  Maybe this can be explained in an expanded
comment.


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 14/32] xfs: add parent pointer validator functions
  2024-04-10  5:31     ` Christoph Hellwig
@ 2024-04-10 18:53       ` Darrick J. Wong
  2024-04-11  3:25         ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 18:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

On Tue, Apr 09, 2024 at 10:31:10PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 05:57:09PM -0700, Darrick J. Wong wrote:
> > From: Allison Henderson <allison.henderson@oracle.com>
> > 
> > Attribute names of parent pointers are not strings.
> 
> They are now.  The rest of the commit log also doesn't match the code
> anymore.  The code itself looks good, though.

How about this, then:

    xfs: add parent pointer validator functions

    The attr name of a parent pointer is a string, and the attr value of a
    parent pointer is (more or less) a file handle.  So we need to modify
    attr_namecheck to verify the parent pointer name, and add a
    xfs_parent_valuecheck function to sanitize the handle.  At the same
    time, we need to validate attr values during log recovery if the xattr
    is really a parent pointer.

    Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
    Reviewed-by: Darrick J. Wong <djwong@kernel.org>
    [djwong: move functions to xfs_parent.c, adjust for new disk format]
    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/4] xfs: rename xfs_da_args.attr_flags
  2024-04-10  5:01     ` Christoph Hellwig
@ 2024-04-10 20:55       ` Darrick J. Wong
  2024-04-11  0:00         ` Darrick J. Wong
  2024-04-11  3:26         ` Christoph Hellwig
  0 siblings, 2 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 20:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: hch, linux-xfs

On Tue, Apr 09, 2024 at 10:01:55PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 05:50:07PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > This field only ever contains XATTR_{CREATE,REPLACE}, so let's change
> > the name of the field to make the field and its values consistent.
> 
> So, these flags only get passed to xfs_attr_set through xfs_attr_change
> and xfs_attr_setname, which means we should probably just pass them
> directly as in my patch (against your whole stack) below.

Want me to reflow this through the tree, or just tack it on the end
after (perhaps?) "xfs: fix corruptions in the directory tree" ?

> Also I suspect we should do an audit of all the internal callers
> if they should ever replace an existing attr, as I guess most
> don't.  (and xfs_attr_change really should be folded into xfs_attr_set,
> the split is confusing as hell).

I imagine a lot of the security stuff with magic xattrs probably only
ever creates xattrs, but I would bet that some of these subsystems
actually *want* the upsert behavior -- "the frob for this file should be
$foo, make it so".
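
To spell out the three behaviours being discussed (pure create, pure
replace, and upsert when neither flag is passed), here is a small
userspace sketch.  The flag values match the uapi XATTR_CREATE and
XATTR_REPLACE definitions, but the single-slot demo_attr "store" and
demo_attr_set() are invented for illustration and have nothing to do
with the real xfs_attr_set() internals.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Same values as XATTR_CREATE / XATTR_REPLACE in <linux/xattr.h>. */
#define DEMO_XATTR_CREATE	0x1	/* fail if the attr already exists */
#define DEMO_XATTR_REPLACE	0x2	/* fail if the attr does not exist */

/* Hypothetical one-attribute store, for illustration only. */
struct demo_attr {
	bool		present;
	char		value[32];
};

/*
 * Set the attr value.  With no flags this is an upsert: create the
 * attr if it is missing, overwrite it if it is present.
 */
static int
demo_attr_set(
	struct demo_attr	*attr,
	const char		*value,
	unsigned int		xattr_flags)
{
	/* Pure create fails if the attr already exists */
	if (attr->present && (xattr_flags & DEMO_XATTR_CREATE))
		return -EEXIST;

	/* Pure replace fails if there is no existing attr to replace */
	if (!attr->present && (xattr_flags & DEMO_XATTR_REPLACE))
		return -ENODATA;

	strncpy(attr->value, value, sizeof(attr->value) - 1);
	attr->value[sizeof(attr->value) - 1] = '\0';
	attr->present = true;
	return 0;
}
```

Passing the flags as a function argument, as in the patch below, keeps
this policy decision visible at each call site.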

--D

> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index b98d2a908452a0..38d1f4d10baa3b 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -1034,7 +1034,8 @@ xfs_attr_ensure_iext(
>   */
>  int
>  xfs_attr_set(
> -	struct xfs_da_args	*args)
> +	struct xfs_da_args	*args,
> +	uint8_t			xattr_flags)
>  {
>  	struct xfs_inode	*dp = args->dp;
>  	struct xfs_mount	*mp = dp->i_mount;
> @@ -1109,7 +1110,7 @@ xfs_attr_set(
>  		}
>  
>  		/* Pure create fails if the attr already exists */
> -		if (args->xattr_flags & XATTR_CREATE)
> +		if (xattr_flags & XATTR_CREATE)
>  			goto out_trans_cancel;
>  		xfs_attr_defer_add(args, XFS_ATTR_DEFER_REPLACE);
>  		break;
> @@ -1119,7 +1120,7 @@ xfs_attr_set(
>  			goto out_trans_cancel;
>  
>  		/* Pure replace fails if no existing attr to replace. */
> -		if (args->xattr_flags & XATTR_REPLACE)
> +		if (xattr_flags & XATTR_REPLACE)
>  			goto out_trans_cancel;
>  		xfs_attr_defer_add(args, XFS_ATTR_DEFER_SET);
>  		break;
> @@ -1155,7 +1156,7 @@ xfs_attr_set(
>   * Ensure that the xattr structure maps @args->name to @args->value.
>   *
>   * The caller must have initialized @args, attached dquots, and must not hold
> - * any ILOCKs.  Only XATTR_CREATE may be specified in @args->xattr_flags.
> + * any ILOCKs.  Only XATTR_CREATE may be specified in @xattr_flags.
>   * Reserved data blocks may be used if @rsvd is set.
>   *
>   * Returns -EEXIST if XATTR_CREATE was specified and the name already exists.
> @@ -1163,6 +1164,7 @@ xfs_attr_set(
>  int
>  xfs_attr_setname(
>  	struct xfs_da_args	*args,
> +	uint8_t			xattr_flags,
>  	bool			rsvd)
>  {
>  	struct xfs_inode	*dp = args->dp;
> @@ -1172,7 +1174,7 @@ xfs_attr_setname(
>  	int			rmt_extents = 0;
>  	int			error, local;
>  
> -	ASSERT(!(args->xattr_flags & XATTR_REPLACE));
> +	ASSERT(!(xattr_flags & ~XATTR_CREATE));
>  	ASSERT(!args->trans);
>  
>  	args->total = xfs_attr_calc_size(args, &local);
> @@ -1198,7 +1200,7 @@ xfs_attr_setname(
>  	switch (error) {
>  	case -EEXIST:
>  		/* Pure create fails if the attr already exists */
> -		if (args->xattr_flags & XATTR_CREATE)
> +		if (xattr_flags & XATTR_CREATE)
>  			goto out_trans_cancel;
>  		if (args->attr_filter & XFS_ATTR_PARENT)
>  			xfs_attr_defer_parent(args, XFS_ATTR_DEFER_REPLACE);
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 2a0ef4f633e2d1..b90e04c3e64f60 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -550,7 +550,7 @@ int xfs_inode_hasattr(struct xfs_inode *ip);
>  bool xfs_attr_is_leaf(struct xfs_inode *ip);
>  int xfs_attr_get_ilocked(struct xfs_da_args *args);
>  int xfs_attr_get(struct xfs_da_args *args);
> -int xfs_attr_set(struct xfs_da_args *args);
> +int xfs_attr_set(struct xfs_da_args *args, uint8_t xattr_flags);
>  int xfs_attr_set_iter(struct xfs_attr_intent *attr);
>  int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
>  bool xfs_attr_check_namespace(unsigned int attr_flags);
> @@ -560,7 +560,7 @@ int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>  void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
>  			 unsigned int *total);
>  
> -int xfs_attr_setname(struct xfs_da_args *args, bool rsvd);
> +int xfs_attr_setname(struct xfs_da_args *args, uint8_t xattr_flags, bool rsvd);
>  int xfs_attr_removename(struct xfs_da_args *args, bool rsvd);
>  
>  /*
> diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
> index 8d7a38fe2a5c07..354d5d65043e43 100644
> --- a/fs/xfs/libxfs/xfs_da_btree.h
> +++ b/fs/xfs/libxfs/xfs_da_btree.h
> @@ -69,7 +69,6 @@ typedef struct xfs_da_args {
>  	uint8_t		filetype;	/* filetype of inode for directories */
>  	uint8_t		op_flags;	/* operation flags */
>  	uint8_t		attr_filter;	/* XFS_ATTR_{ROOT,SECURE,INCOMPLETE} */
> -	uint8_t		xattr_flags;	/* XATTR_{CREATE,REPLACE} */
>  	short		namelen;	/* length of string (maybe no NULL) */
>  	short		new_namelen;	/* length of new attr name */
>  	xfs_dahash_t	hashval;	/* hash value of name */
> diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
> index 2b6ed8c1ee1522..c5422f714fcc72 100644
> --- a/fs/xfs/libxfs/xfs_parent.c
> +++ b/fs/xfs/libxfs/xfs_parent.c
> @@ -355,7 +355,7 @@ xfs_parent_set(
>  
>  	memset(scratch, 0, sizeof(struct xfs_da_args));
>  	xfs_parent_da_args_init(scratch, NULL, pptr, ip, owner, parent_name);
> -	return xfs_attr_setname(scratch, true);
> +	return xfs_attr_setname(scratch, 0, true);
>  }
>  
>  /*
> diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
> index e06d00ea828b3e..8863eef5a0b87b 100644
> --- a/fs/xfs/scrub/attr_repair.c
> +++ b/fs/xfs/scrub/attr_repair.c
> @@ -615,7 +615,6 @@ xrep_xattr_insert_rec(
>  	struct xfs_da_args		args = {
>  		.dp			= rx->sc->tempip,
>  		.attr_filter		= key->flags,
> -		.xattr_flags		= XATTR_CREATE,
>  		.namelen		= key->namelen,
>  		.valuelen		= key->valuelen,
>  		.owner			= rx->sc->ip->i_ino,
> @@ -675,7 +674,7 @@ xrep_xattr_insert_rec(
>  	 * use reserved blocks because we can abort the repair with ENOSPC.
>  	 */
>  	xfs_attr_sethash(&args);
> -	error = xfs_attr_setname(&args, false);
> +	error = xfs_attr_setname(&args, XATTR_CREATE, false);
>  	if (error == -EEXIST)
>  		error = 0;
>  
> diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
> index cf79cbcda3ecb4..1bc05efa344036 100644
> --- a/fs/xfs/scrub/parent_repair.c
> +++ b/fs/xfs/scrub/parent_repair.c
> @@ -1031,7 +1031,7 @@ xrep_parent_insert_xattr(
>  			rp->xattr_name, key->namelen, key->valuelen);
>  
>  	xfs_attr_sethash(&args);
> -	return xfs_attr_setname(&args, false);
> +	return xfs_attr_setname(&args, 0, false);
>  }
>  
>  /*
> diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> index 4bf69c9c088e28..1aaf3dc64bcbc1 100644
> --- a/fs/xfs/xfs_acl.c
> +++ b/fs/xfs/xfs_acl.c
> @@ -203,7 +203,7 @@ __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
>  		xfs_acl_to_disk(args.value, acl);
>  	}
>  
> -	error = xfs_attr_change(&args);
> +	error = xfs_attr_change(&args, 0);
>  	kvfree(args.value);
>  
>  	/*
> diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
> index 833b0d7d8bea1c..e3f54817b91557 100644
> --- a/fs/xfs/xfs_handle.c
> +++ b/fs/xfs/xfs_handle.c
> @@ -492,7 +492,6 @@ xfs_attrmulti_attr_get(
>  	struct xfs_da_args	args = {
>  		.dp		= XFS_I(inode),
>  		.attr_filter	= xfs_attr_filter(flags),
> -		.xattr_flags	= xfs_xattr_flags(flags),
>  		.name		= name,
>  		.namelen	= strlen(name),
>  		.valuelen	= *len,
> @@ -526,7 +525,6 @@ xfs_attrmulti_attr_set(
>  	struct xfs_da_args	args = {
>  		.dp		= XFS_I(inode),
>  		.attr_filter	= xfs_attr_filter(flags),
> -		.xattr_flags	= xfs_xattr_flags(flags),
>  		.name		= name,
>  		.namelen	= strlen(name),
>  	};
> @@ -544,7 +542,7 @@ xfs_attrmulti_attr_set(
>  		args.valuelen = len;
>  	}
>  
> -	error = xfs_attr_change(&args);
> +	error = xfs_attr_change(&args, xfs_xattr_flags(flags));
>  	if (!error && (flags & XFS_IOC_ATTR_ROOT))
>  		xfs_forget_acl(inode, name);
>  	kfree(args.value);
> diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> index c4f9c7eec83590..d374be9f8a6e3e 100644
> --- a/fs/xfs/xfs_iops.c
> +++ b/fs/xfs/xfs_iops.c
> @@ -64,7 +64,7 @@ xfs_initxattrs(
>  			.value		= xattr->value,
>  			.valuelen	= xattr->value_len,
>  		};
> -		error = xfs_attr_change(&args);
> +		error = xfs_attr_change(&args, 0);
>  		if (error < 0)
>  			break;
>  	}
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index dc074240ad239f..1292d69087dc0c 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -2131,7 +2131,6 @@ DECLARE_EVENT_CLASS(xfs_attr_class,
>  		__field(int, valuelen)
>  		__field(xfs_dahash_t, hashval)
>  		__field(unsigned int, attr_filter)
> -		__field(unsigned int, xattr_flags)
>  		__field(uint32_t, op_flags)
>  	),
>  	TP_fast_assign(
> @@ -2143,11 +2142,10 @@ DECLARE_EVENT_CLASS(xfs_attr_class,
>  		__entry->valuelen = args->valuelen;
>  		__entry->hashval = args->hashval;
>  		__entry->attr_filter = args->attr_filter;
> -		__entry->xattr_flags = args->xattr_flags;
>  		__entry->op_flags = args->op_flags;
>  	),
>  	TP_printk("dev %d:%d ino 0x%llx name %.*s namelen %d valuelen %d "
> -		  "hashval 0x%x filter %s flags %s op_flags %s",
> +		  "hashval 0x%x filter %s op_flags %s",
>  		  MAJOR(__entry->dev), MINOR(__entry->dev),
>  		  __entry->ino,
>  		  __entry->namelen,
> @@ -2157,9 +2155,6 @@ DECLARE_EVENT_CLASS(xfs_attr_class,
>  		  __entry->hashval,
>  		  __print_flags(__entry->attr_filter, "|",
>  				XFS_ATTR_FILTER_FLAGS),
> -		   __print_flags(__entry->xattr_flags, "|",
> -				{ XATTR_CREATE,		"CREATE" },
> -				{ XATTR_REPLACE,	"REPLACE" }),
>  		  __print_flags(__entry->op_flags, "|", XFS_DA_OP_FLAGS))
>  )
>  
> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> index 1d57e204c850ff..69fa7b89c68972 100644
> --- a/fs/xfs/xfs_xattr.c
> +++ b/fs/xfs/xfs_xattr.c
> @@ -80,7 +80,8 @@ xfs_attr_want_log_assist(
>   */
>  int
>  xfs_attr_change(
> -	struct xfs_da_args	*args)
> +	struct xfs_da_args	*args,
> +	uint8_t			xattr_flags)
>  {
>  	struct xfs_mount	*mp = args->dp->i_mount;
>  	int			error;
> @@ -95,7 +96,7 @@ xfs_attr_change(
>  		args->op_flags |= XFS_DA_OP_LOGGED;
>  	}
>  
> -	return xfs_attr_set(args);
> +	return xfs_attr_set(args, xattr_flags);
>  }
>  
>  
> @@ -131,7 +132,6 @@ xfs_xattr_set(const struct xattr_handler *handler,
>  	struct xfs_da_args	args = {
>  		.dp		= XFS_I(inode),
>  		.attr_filter	= handler->flags,
> -		.xattr_flags	= flags,
>  		.name		= name,
>  		.namelen	= strlen(name),
>  		.value		= (void *)value,
> @@ -139,7 +139,7 @@ xfs_xattr_set(const struct xattr_handler *handler,
>  	};
>  	int			error;
>  
> -	error = xfs_attr_change(&args);
> +	error = xfs_attr_change(&args, flags);
>  	if (!error && (handler->flags & XFS_ATTR_ROOT))
>  		xfs_forget_acl(inode, name);
>  	return error;
> diff --git a/fs/xfs/xfs_xattr.h b/fs/xfs/xfs_xattr.h
> index f097002d06571f..79c0040cc904b4 100644
> --- a/fs/xfs/xfs_xattr.h
> +++ b/fs/xfs/xfs_xattr.h
> @@ -6,7 +6,7 @@
>  #ifndef __XFS_XATTR_H__
>  #define __XFS_XATTR_H__
>  
> -int xfs_attr_change(struct xfs_da_args *args);
> +int xfs_attr_change(struct xfs_da_args *args, uint8_t xattr_flags);
>  int xfs_attr_grab_log_assist(struct xfs_mount *mp);
>  void xfs_attr_rele_log_assist(struct xfs_mount *mp);
>  
> 

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 4/4] xfs: rearrange xfs_da_args a bit to use less space
  2024-04-10  5:02     ` Christoph Hellwig
@ 2024-04-10 20:56       ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 20:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: hch, linux-xfs

On Tue, Apr 09, 2024 at 10:02:45PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 05:50:23PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > A few notes about struct xfs_da_args:
> > 
> > The XFS_ATTR_* flags only go up as far as XFS_ATTR_INCOMPLETE, which
> > means that attr_filter could be a u8 field.
> > 
> > The XATTR_* flags only have two values, which means that xattr_flags
> > could be shrunk to a u8.
> > 
> > I've reduced the number of XFS_DA_OP_* flags down to the point where
> > op_flags would also fit into a u8.
> > 
> > filetype has 7 bytes of slack after it, which is wasteful.
> > 
> > namelen will never be greater than MAXNAMELEN, which is 256.  This field
> > could be reduced to a short.
> > 
> > Rearrange the fields in xfs_da_args to waste less space.  This reduces
> > the structure size from 136 bytes to 128.  Later when we add extra
> > fields to support parent pointer replacement, this will only bloat the
> > structure to 144 bytes, instead of 168.
> 
> Looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Eventually we should probably split this up, a lot of fields are
> used only by the attr set code, and a few less only by dir vs attr.

Agreed, though we're veering dangerously close to object inheritance.

But it would be useful for code analysis if dir operations would pass in
an xfs_dir_op structure containing a much smaller xfs_da_args and
likewise for xattrs.
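
As an aside, the space savings in the quoted commit message come from
ordinary struct padding rules.  A toy sketch, using made-up structures
that only borrow a few field names from xfs_da_args:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with small fields scattered between large ones. */
struct demo_args_sparse {
	uint8_t		filetype;	/* followed by padding up to the pointer */
	const void	*name;
	uint8_t		op_flags;	/* followed by padding up to the int */
	int		total;
	uint8_t		attr_filter;	/* followed by tail padding */
};

/* Same fields, largest alignment first, small fields grouped together. */
struct demo_args_packed {
	const void	*name;
	int		total;
	uint8_t		filetype;
	uint8_t		op_flags;
	uint8_t		attr_filter;
};

/* Number of bytes saved by the reordering on the current ABI. */
static size_t
demo_bytes_saved(void)
{
	return sizeof(struct demo_args_sparse) -
	       sizeof(struct demo_args_packed);
}
```

On x86-64 the first layout should come out at 32 bytes and the second
at 16; other ABIs will differ in the exact numbers but not in the
principle.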

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 01/12] xfs: attr fork iext must be loaded before calling xfs_attr_is_leaf
  2024-04-10  5:04     ` Christoph Hellwig
@ 2024-04-10 20:58       ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 20:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Christoph Hellwig, linux-xfs

On Tue, Apr 09, 2024 at 10:04:11PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 05:50:38PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Christoph noticed that the xfs_attr_is_leaf in xfs_attr_get_ilocked can
> > access the incore extent tree of the attr fork, but nothing in the
> > xfs_attr_get path guarantees that the incore tree is actually loaded.
> > 
> > Most of the time it is, but seeing as xfs_attr_is_leaf ignores the
> > return value of xfs_iext_get_extent I guess we've been making choices
> > based on random stack contents and nobody's complained?
> 
> Yes, I'm kinda puzzled.

I suspect that most of the time we get lucky and *someone* has read in
the attr fork or created it or whatever.

> Note that the dir code actually reads the extents in their
> is_leaf/is_block helpers.  But given how the attr code is structured
> that would thread through a lot of code so it might not be worth it.

<nod> But it would be more consistent...

> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 05/12] xfs: fix missing check for invalid attr flags
  2024-04-10  5:07     ` Christoph Hellwig
@ 2024-04-10 21:04       ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 21:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: hch, linux-xfs

On Tue, Apr 09, 2024 at 10:07:10PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 05:51:41PM -0700, Darrick J. Wong wrote:
> > +#define XFS_ATTR_ONDISK_MASK	(XFS_ATTR_NSP_ONDISK_MASK | \
> > +				 XFS_ATTR_LOCAL | \
> > +				 XFS_ATTR_INCOMPLETE)
> 
> Note that XFS_ATTR_LOCAL and XFS_ATTR_INCOMPLETE are not valid for
> short form directories.  Should we check for that somewhere as well?

Good point, xchk_xattr_check_sf should be flagging those too:

	/*
	 * Shortform entries do not set LOCAL or INCOMPLETE, so the only
	 * valid flag bits here are for namespaces.
	 */
	if (sfe->flags & ~XFS_ATTR_NSP_ONDISK_MASK) {
		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
		break;
	}

I'll tack that on the end.
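
As a standalone sketch of the two masks being discussed: the bit values
and DEMO_ names below are illustrative assumptions, not the definitions
from the XFS headers.  Leaf entries accept namespace bits plus LOCAL and
INCOMPLETE, whereas shortform entries accept namespace bits only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real ones live in the XFS headers. */
#define DEMO_ATTR_LOCAL		(1u << 0)
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_PARENT	(1u << 3)
#define DEMO_ATTR_INCOMPLETE	(1u << 7)

#define DEMO_NSP_MASK	(DEMO_ATTR_ROOT | DEMO_ATTR_SECURE | DEMO_ATTR_PARENT)
#define DEMO_LEAF_MASK	(DEMO_NSP_MASK | DEMO_ATTR_LOCAL | DEMO_ATTR_INCOMPLETE)

/* Leaf entries may carry namespace bits plus LOCAL and INCOMPLETE. */
static bool demo_leaf_flags_ok(uint8_t flags)
{
	return (flags & ~DEMO_LEAF_MASK) == 0;
}

/* Shortform entries may carry namespace bits only. */
static bool demo_sf_flags_ok(uint8_t flags)
{
	return (flags & ~DEMO_NSP_MASK) == 0;
}
```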

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

* Re: [PATCH 07/32] xfs: allow xattr matching on name and value for local/sf attrs
  2024-04-10  5:16     ` Christoph Hellwig
@ 2024-04-10 21:13       ` Darrick J. Wong
  2024-04-11  3:28         ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 21:13 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

On Tue, Apr 09, 2024 at 10:16:58PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 05:55:20PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Add a new XFS_DA_OP_PARENT flag to signal that the caller wants to look
> 
> The flag doesn't actually exist, the match is done on the
> XFS_ATTR_PARENT namespaces.

How about:

"xfs: allow xattr matching on name and value for local/sf pptr attrs

"If a file is hardlinked with the same name but from multiple parents,
the parent pointers will all have the same dirent name (== attr name)
but with different parent_ino/parent_gen values.  To disambiguate, we
need to be able to match on both the attr name and the attr value.  This
is in contrast to regular xattrs, which are matched only on name.

"Therefore, plumb in the ability to match shortform and local attrs on
name and value in the XFS_ATTR_PARENT namespace.  Parent pointer attr
values are never large enough to be stored in a remote attr, so we
can reject these cases as corruption."
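
The matching rule in that commit message can be sketched on its own,
assuming a simplified key: struct demo_key and demo_attr_match are
hypothetical and do not mirror struct xfs_da_args or xfs_attr_match.
Regular attrs match on name alone, while parent pointers must also agree
on the value bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup key; not the layout of struct xfs_da_args. */
struct demo_key {
	bool		is_parent;	/* key is in the parent pointer namespace */
	const void	*name;
	size_t		namelen;
	const void	*value;
	size_t		valuelen;
};

static bool
demo_attr_match(
	const struct demo_key	*key,
	const void		*name,
	size_t			namelen,
	const void		*value,
	size_t			valuelen)
{
	assert(key != NULL);

	if (key->namelen != namelen ||
	    memcmp(key->name, name, namelen) != 0)
		return false;

	/* Regular attrs are identified by name alone. */
	if (!key->is_parent)
		return true;

	/* Parent pointers must also agree on the value bytes. */
	if (value == NULL || key->valuelen != valuelen)
		return false;
	return memcmp(key->value, value, valuelen) == 0;
}
```

Passing a NULL value for a parent pointer key never matches, which
corresponds to treating a remote value as a non-match in this sketch.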

> >  
> > @@ -2444,14 +2477,17 @@ xfs_attr3_leaf_lookup_int(
> >  			name_loc = xfs_attr3_leaf_name_local(leaf, probe);
> >  			if (!xfs_attr_match(args, entry->flags,
> >  						name_loc->nameval,
> > -						name_loc->namelen))
> > +						name_loc->namelen,
> > +						&name_loc->nameval[name_loc->namelen],
> > +						be16_to_cpu(name_loc->valuelen)))
> 
> If we'd switch from the odd pre-existing three-tab indent to the normal
> two-tab indent we'd avoid the overly long line here.
> 
> >  				continue;
> >  			args->index = probe;
> >  			return -EEXIST;
> >  		} else {
> >  			name_rmt = xfs_attr3_leaf_name_remote(leaf, probe);
> >  			if (!xfs_attr_match(args, entry->flags, name_rmt->name,
> > -						name_rmt->namelen))
> > +						name_rmt->namelen, NULL,
> > +						be32_to_cpu(name_rmt->valuelen)))
> 
> ... and here.

Believe it or not that's what vim autoformat does by default.
But yes, I'll switch to two-tab indents to avoid the overflow.

> The remote side might also benefit from a local variable to store the
> endian swapped version of the valuelen instead of calculating it twice.

Ok.

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

* Re: [PATCH 08/32] xfs: allow logged xattr operations if parent pointers are enabled
  2024-04-10  5:18     ` Christoph Hellwig
@ 2024-04-10 21:18       ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 21:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

On Tue, Apr 09, 2024 at 10:18:17PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 05:55:35PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Don't trip this assertion about attr log items if we have parent
> > pointers enabled.  Parent pointers are an incompat feature that doesn't
> > use any of the functionality protected by
> > XFS_SB_FEAT_INCOMPAT_LOG_XATTRS, which is why this is ok.
> 
> I'd move the checks into the switch on op below, so that we check the log
> attrs feature for the "normal" logged attrs and the parent pointers flag
> for the parent pointer ops.

Good idea.

--D

* Re: [PATCH 16/32] xfs: create a hashname function for parent pointers
  2024-04-10  5:33     ` Christoph Hellwig
@ 2024-04-10 21:39       ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 21:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

On Tue, Apr 09, 2024 at 10:33:45PM -0700, Christoph Hellwig wrote:
> > +	/*
> > +	 * Use the same dirent name hash as would be used on the directory, but
> > +	 * mix in the parent inode number.
> > +	 */
> > +	ret = xfs_dir2_hashname(mp, &xname);
> > +	ret ^= upper_32_bits(parent_ino);
> > +	ret ^= lower_32_bits(parent_ino);
> > +	return ret;
> 
> Totally superficial nit, but wouldn't this read a little nicer as:
> 
> 	return xfs_dir2_hashname(mp, &xname) ^
> 		lower_32_bits(parent_ino) ^
> 		upper_32_bits(parent_ino);
> 
> ?

Yeah, will change.
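
A small standalone sketch of the mixing step, assuming the dirent name
hash has already been computed and is passed in as name_hash.  The
demo_ helpers are hypothetical; only the XOR of both halves of the
parent inode number is taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

static inline uint32_t demo_upper_32(uint64_t v) { return (uint32_t)(v >> 32); }
static inline uint32_t demo_lower_32(uint64_t v) { return (uint32_t)v; }

/*
 * Mix a 64-bit parent inode number into an already computed 32-bit
 * dirent name hash by XORing in both halves.
 */
static uint32_t demo_parent_hash(uint32_t name_hash, uint64_t parent_ino)
{
	return name_hash ^ demo_lower_32(parent_ino) ^ demo_upper_32(parent_ino);
}
```

XOR is commutative, so the single-expression form and the three-statement
form compute the same value.  Two inode numbers can still produce the
same hash (equal upper and lower halves cancel out), so the hash only
spreads entries; it does not identify them.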

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

* Re: [PATCH 17/32] xfs: parent pointer attribute creation
  2024-04-10  5:44     ` Christoph Hellwig
@ 2024-04-10 21:50       ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 21:50 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Dave Chinner, Allison Henderson, catherine.hoang, hch, linux-xfs

On Tue, Apr 09, 2024 at 10:44:56PM -0700, Christoph Hellwig wrote:
> 
> One thing that might be worth documenting in a comment or at least
> the commit log is why we have this three phase split between
> allocating the daargs, doing all the work and freeing it.
> 
> As far as I can tell that is because the da_args need to be around
> until transaction commit because xfs_attr_intent has a pointer to
> the da_args and not a full copy.

Correct.  I think the xfs_attr_intent could make its own full copy and
iterate on that, but that would break the existing behavior that the
xfs_attr_set caller can look at the end state of the xfs_attr_set
operation.

AFAICT the only xfs_attr_* callers that care about the end state at all
are xfs_attr_lookup functions (because they might want to get the
value).  I don't *think* any of the xfs_attr_set callers actually care.
The log intent creation function will snapshot the
name/value/oldname/newvalue buffers, so we're already doing large(ish)
allocations deep in transaction context.

On the other hand, the current code has fewer copies to make because we
"know" that the da_args has to persist until transaction commit
because...

>                                   So unless the attrs are on stack
> they need to be free after transaction commit, and as the normal
> dir operation args are not on the stack we don't want to add the
> attr one to the stack here.  We could probably allocate the da_args
> in the main parent pointer helpers, but that would require a NOFAIL
> allocation and maybe lead to odd calling conventions, but maybe
> someone directly involved can further refine that reasoning.

...the da_args (for the parent pointer op) is already allocated as part
of xfs_parent_args in xfs_parent_start.  If the kernel supported alloca
(HAHAAHA) then we wouldn't need the xfs_parent_args_cache.

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

* Re: [PATCH 23/32] xfs: Filter XFS_ATTR_PARENT for getfattr
  2024-04-10  5:51     ` Christoph Hellwig
@ 2024-04-10 21:58       ` Darrick J. Wong
  2024-04-11  3:29         ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 21:58 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

On Tue, Apr 09, 2024 at 10:51:18PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 05:59:30PM -0700, Darrick J. Wong wrote:
> > From: Allison Henderson <allison.henderson@oracle.com>
> > 
> > Parent pointers returned to the get_fattr tool cause errors since
> > the tool cannot parse parent pointers.  Fix this by filtering parent
> > pointers from xfs_xattr_put_listent.
> 
> With the new format returning the attrs should not cause parsing errors.
> OTOH we now have duplicate names, which means a get operation based on
> the name can't actually work in that case.
> 
> I'd also argue that parent pointers are internal enough that they
> should not be exposed through the normal xattr interfaces.

Yeah, I probably should change the commit message to:

"xfs: don't return XFS_ATTR_PARENT attributes via listxattr

"Parent pointers are internal filesystem metadata.  They're not intended
to be directly visible to userspace, so filter them out of
xfs_xattr_put_listent so that they don't appear in listxattr."

> > +/*
> > + * This file defines functions to work with externally visible extended
> > + * attributes, such as those in user, system, or security namespaces.  They
> > + * should not be used for internally used attributes.  Consider xfs_attr.c.
> > + */
> 
> As long as xfs_attr_change and xfs_attr_grab_log_assist are in xfs_xattr.c
> that is not actually true.  However I think they should be moved to
> xfs_attr.c (and in case of xfs_attr_change merged into xfs_attr_set)
> to make this comment true.

I don't want to hoist all the larp enabling jun^Wmachinery to libxfs and
then have to stub that out in userspace.  I'd rather get rid of larp
mode entirely, after which point xfs_attr_change becomes a trivial
helper that can be collapsed.

> However I'd make it part of the top of file comment above the include
> statements.  And please add it in a separate commit as it has nothing
> to do with the other changes here.

Or just get rid of the comment entirely?  It came from the verity
series.

--D

* Re: [PATCH 26/32] xfs: split out handle management helpers a bit
  2024-04-10  5:56     ` Christoph Hellwig
@ 2024-04-10 22:01       ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 22:01 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

On Tue, Apr 09, 2024 at 10:56:06PM -0700, Christoph Hellwig wrote:
> > +	handle->ha_fid.fid_len = sizeof(struct xfs_fid) -
> > +				 sizeof(handle->ha_fid.fid_len);
> 
> If we clean this up anyway, maybe add a helper for the above calculation
> and share it with xfs_khandle_to_dentry?

Done:

static inline size_t
xfs_filehandle_fid_len(void)
{
	struct xfs_handle	*handle = NULL;

	return sizeof(struct xfs_fid) - sizeof(handle->ha_fid.fid_len);
}

static inline size_t
xfs_filehandle_init(
	struct xfs_mount	*mp,
	xfs_ino_t		ino,
	uint32_t		gen,
	struct xfs_handle	*handle)
{
	memcpy(&handle->ha_fsid, mp->m_fixedfsid, sizeof(struct xfs_fsid));

	handle->ha_fid.fid_len = xfs_filehandle_fid_len();
	...
}
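
The helper relies on sizeof not evaluating its operand, so naming a
member through a NULL pointer is safe.  A self-contained sketch of the
same idiom follows; struct demo_fid and struct demo_handle are
hypothetical layouts chosen for the example, not the real handle
structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts, loosely modelled on a length-prefixed file id. */
struct demo_fid {
	uint16_t	fid_len;	/* bytes that follow this field */
	uint16_t	fid_pad;
	uint32_t	fid_gen;
	uint64_t	fid_ino;
};

struct demo_handle {
	uint64_t	ha_fsid;
	struct demo_fid	ha_fid;
};

/*
 * sizeof does not evaluate its operand, so naming a member through a
 * NULL pointer here never dereferences anything.
 */
static inline size_t demo_fid_len(void)
{
	struct demo_handle	*handle = NULL;

	return sizeof(struct demo_fid) - sizeof(handle->ha_fid.fid_len);
}
```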

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

* Re: [PATCH 29/32] xfs: Add the parent pointer support to the superblock version 5.
  2024-04-10  6:05     ` Christoph Hellwig
@ 2024-04-10 22:06       ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 22:06 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Mark Tinguely, Dave Chinner, Allison Henderson, Darrick J. Wong,
	catherine.hoang, hch, linux-xfs

On Tue, Apr 09, 2024 at 11:05:12PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 06:01:04PM -0700, Darrick J. Wong wrote:
> > From: Allison Henderson <allison.henderson@oracle.com>
> > 
> > Add the parent pointer superblock flag so that we can actually mount
> > filesystems with this feature enabled.
> 
> The subject reads a little weird.  What about
> 
> "add a incompat feature bit for parent pointers" ?

Changed to:

"xfs: add a incompat feature bit for parent pointers

"Create an incompat feature bit and a fs geometry flag so that we can
enable the feature in the ondisk superblock and advertise its existence
to userspace."

Also I'll change the bit to 1<<25; I was using 1U<<30 to avoid
collisions with the other patchsets.

--D

* Re: [PATCH 32/32] xfs: enable parent pointers
  2024-04-10  6:06     ` Christoph Hellwig
@ 2024-04-10 22:11       ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 22:11 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

On Tue, Apr 09, 2024 at 11:06:33PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 06:01:51PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Add parent pointers to the list of supported features.
> 
> Any reason to split this from actually adding the feature bit?

Bisection.  If we don't add the bit to XFS_SB_FEAT_INCOMPAT_ALL, then
the kernel won't mount the filesystem.
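
To make the bisection argument concrete, here is a minimal sketch of an
"unknown incompat bits" check.  The bit assignments and DEMO_ names are
illustrative assumptions; only the rule (refuse to mount when the
superblock sets a bit outside the supported mask) comes from the
discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit assignments; not the real superblock values. */
#define DEMO_INCOMPAT_FTYPE	(1u << 0)
#define DEMO_INCOMPAT_PARENT	(1u << 7)

/* The "supported" mask before and after the enabling patch. */
#define DEMO_INCOMPAT_ALL_OLD	(DEMO_INCOMPAT_FTYPE)
#define DEMO_INCOMPAT_ALL_NEW	(DEMO_INCOMPAT_FTYPE | DEMO_INCOMPAT_PARENT)

/* A mount is refused if the superblock sets any bit the code does not know. */
static bool demo_can_mount(uint32_t sb_incompat, uint32_t supported)
{
	return (sb_incompat & ~supported) == 0;
}
```

With the old mask, a filesystem carrying the parent pointer bit is
rejected at every intermediate commit, so partially applied code is
never exercised against such a filesystem.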

Nowadays I think we could define the xfs_has_foo helpers at the start
and only add the superblock feature bit and the code that sets
XFS_FEAT_FOO in the final patch.

BUuut.... this patchset predates the xfs_has_foo helpers and I haven't
fully adjusted to that yet.

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

* Re: [PATCH 01/14] xfs: add xattr setname and removename functions for internal users
  2024-04-10  6:18     ` Christoph Hellwig
@ 2024-04-10 22:18       ` Darrick J. Wong
  2024-04-11  3:32         ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 22:18 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

On Tue, Apr 09, 2024 at 11:18:03PM -0700, Christoph Hellwig wrote:
> > +static int
> > +xfs_attr_ensure_iext(
> > +	struct xfs_da_args	*args,
> > +	int			nr)
> > +{
> > +	int			error;
> > +
> > +	error = xfs_iext_count_may_overflow(args->dp, XFS_ATTR_FORK, nr);
> > +	if (error == -EFBIG)
> > +		return xfs_iext_count_upgrade(args->trans, args->dp, nr);
> > +	return error;
> > +}
> 
> I'd rather get my consolidation of these merged instead of adding
> a wrapper like this.  Just waiting for my RT delalloc and your
> exchrange series to hit for-next to resend it.

Yeah, I made a mental note to scrub this function out if your patch wins
the race.

> > +/*
> > + * Ensure that the xattr structure maps @args->name to @args->value.
> > + *
> > + * The caller must have initialized @args, attached dquots, and must not hold
> > + * any ILOCKs.  Only XATTR_CREATE may be specified in @args->xattr_flags.
> > + * Reserved data blocks may be used if @rsvd is set.
> > + *
> > + * Returns -EEXIST if XATTR_CREATE was specified and the name already exists.
> > + */
> > +int
> > +xfs_attr_setname(
> 
> Is there any case where we do not want to pass XATTR_CREATE, that
> is replace an existing attribute when there is one?

Yes, verity setup will use xfs_attr_setname to upsert a merkle tree
block into the attr structure and obliterate stale blocks that might
already have been there.
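
The difference between pure create, pure replace, and upsert can be
sketched as a small decision function.  demo_set_check is hypothetical;
the DEMO_ flag values restate XATTR_CREATE and XATTR_REPLACE for the
demo, and -ENODATA stands in for -ENOATTR.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Same meanings as XATTR_CREATE/XATTR_REPLACE; restated for the demo. */
#define DEMO_XATTR_CREATE	0x1
#define DEMO_XATTR_REPLACE	0x2

/*
 * Decide what a set operation should do, given whether the name already
 * exists.  Returns 0 to proceed, or a negative errno.
 */
static int demo_set_check(bool exists, int xattr_flags)
{
	if (exists && (xattr_flags & DEMO_XATTR_CREATE))
		return -EEXIST;		/* pure create, name already there */
	if (!exists && (xattr_flags & DEMO_XATTR_REPLACE))
		return -ENODATA;	/* pure replace, nothing to replace */
	return 0;			/* flags == 0 behaves as an upsert */
}
```

A caller that wants upsert behaviour passes no flags and gets 0 whether
or not the name already exists.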

> > +int
> > +xfs_attr_removename(
> > +	struct xfs_da_args	*args,
> > +	bool			rsvd)
> > +{
> 
> Is there a good reason to have a separate remove helper and not
> overload a NULL value like we do for the normal xattr interface?

xfs_repair uses xfs_parent_unset -> xfs_attr_removename to erase any
XFS_ATTR_PARENT attribute that doesn't validate, so it needs to be able
to pass in a non-NULL value.  Perhaps I'll add a comment about that,
since this isn't the first time this has come up.

Come to think of it you can't removename a remote parent value, so I
guess in that bad case xfs_repair will have to drop the entire attr
structure <frown>.

/*
 * Ensure that the xattr structure does not map @args->name to @args->value.
 * @args->value must be set for XFS_ATTR_PARENT removal (e.g. xfs_repair).
 *
 * The caller must have initialized @args, attached dquots, and must not hold
 * any ILOCKs.  Reserved data blocks may be used if @rsvd is set.
 *
 * Returns -ENOATTR if the name did not already exist.
 */


--D

* Re: [PATCH 1/3] xfs: reduce the rate of cond_resched calls inside scrub
  2024-04-10 14:55     ` Christoph Hellwig
@ 2024-04-10 22:19       ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 22:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: hch, linux-xfs

On Wed, Apr 10, 2024 at 07:55:35AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 06:08:38PM -0700, Darrick J. Wong wrote:
> > Surprisingly, this reduces scrub-only fstests runtime by about 2%.  I
> > used the bmapinflate xfs_db command to produce a billion-extent file and
> > this stupid gadget reduced the scrub runtime by about 4%.
> 
> I wish the scheduler maintainers would just finish sorting out the
> preemption models mess and kill cond_resched() and we wouldn't need this.

Yeah, I heard that might not happen or ... something.

> But until then:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

* Re: [PATCH 11/32] xfs: log parent pointer xattr replace operations
  2024-04-10  5:26     ` Christoph Hellwig
@ 2024-04-10 23:07       ` Darrick J. Wong
  2024-04-11  3:35         ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 23:07 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

On Tue, Apr 09, 2024 at 10:26:07PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 05:56:22PM -0700, Darrick J. Wong wrote:
> > From: Allison Henderson <allison.henderson@oracle.com>
> > 
> > The parent pointer code needs to do a deferred parent pointer replace
> > operation with the xattr log intent code.  Declare a new logged xattr
> > opcode and push it through the log.
> > 
> > (Formerly titled "xfs: Add new name to attri/d" and described as
> > follows:
> 
> I don't think this history is very important.  That being said,
> I suspect this and the previous two patches should be combined into
> a single one adding the on-disk formats for parent pointers, and the
> commit log could use a complete rewrite saying that it a

I combined the three patches into this:

    xfs: create attr log item opcodes and formats for parent pointers

    Make the necessary alterations to the extended attribute log intent item
    ondisk format so that we can log parent pointer operations.  This
    requires the creation of new opcodes specific to parent pointers, and a
    new four-argument replace operation to handle renames.  At this point
    this part of the patchset has changed so much from what Allison originally
    wrote that I no longer think her SoB applies.

    Signed-off-by: Darrick J. Wong <djwong@kernel.org>

and about time, I was getting real irritated at having to iterate back
and forth across those. ;)

> > +			return false;
> > +		if (attrp->alfi_old_name_len == 0 ||
> > +		    attrp->alfi_old_name_len > XATTR_NAME_MAX)
> > +			return false;
> > +		if (attrp->alfi_new_name_len == 0 ||
> > +		    attrp->alfi_new_name_len > XATTR_NAME_MAX)
> > +			return false;
> 
> Given that we have four copies of this (arguably simple) check,
> should we grow a helper for it?

static inline bool
xfs_attri_validate_namelen(unsigned int namelen)
{
	return namelen > 0 && namelen <= XATTR_NAME_MAX;
}

Done.

> > +		if (attrp->alfi_value_len == 0 ||
> > +		    attrp->alfi_value_len > XATTR_SIZE_MAX)
> > +			return false;
> 
> All parent pointer attrs must be sized for exactly the parent_rec,
> so we should probably check for that explicitly?

Done.
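
A sketch of what an exact-size check could look like next to the name
length helper.  struct demo_parent_rec and the DEMO_/demo_ names are
hypothetical; the 12-byte record (inode number plus generation) is an
assumption made for the example, and DEMO_NAME_MAX is assumed to mirror
XATTR_NAME_MAX.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NAME_MAX	255	/* assumed to mirror XATTR_NAME_MAX */

/* Hypothetical fixed-size parent record: inode number plus generation. */
struct demo_parent_rec {
	uint8_t	ino[8];
	uint8_t	gen[4];
};

static inline bool demo_validate_namelen(unsigned int namelen)
{
	return namelen > 0 && namelen <= DEMO_NAME_MAX;
}

/* Parent pointer values are fixed size, so anything else is rejected. */
static inline bool demo_validate_pptr_valuelen(unsigned int valuelen)
{
	return valuelen == sizeof(struct demo_parent_rec);
}
```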

> > +	if (xfs_attr_log_item_op(old_attrp) == XFS_ATTRI_OP_FLAGS_PPTR_REPLACE) {
> 
> Please avoid the overly long line here.

I've turned that into a switch()

> >  
> > +	/* Validate the new attr name */
> > +	if (new_name_len > 0) {
> > +		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(new_name_len)) {
> 
> .. and here.
> 
> And while we're at it, maybe factor the checking for valid xattr
> name and value log iovecs into little helper instead of duplicating
> them a few times?

Ok, I'll add that as a prep patch.

--D

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-10  6:04     ` Christoph Hellwig
@ 2024-04-10 23:34       ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-10 23:34 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

On Tue, Apr 09, 2024 at 11:04:08PM -0700, Christoph Hellwig wrote:
> Maybe replace the subject with 'add parent pointer listing ioctls' ?
> 
> On Tue, Apr 09, 2024 at 06:00:33PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > This patch adds a pair of new file ioctls to retrieve the parent pointer
> > of a given inode.  They both return the same results, but one operates
> > on the file descriptor passed to ioctl() whereas the other allows the
> > caller to specify a file handle for which the caller wants results.
> > 
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > [djwong: adjust to new ondisk format, split ioctls]
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> 
> Note that the first signoff should always be from the patch author,
> as recorded in the From line.

Yeah.  At this point the ioctl is so much different from Allison's
original version that it doesn't make much sense to keep her as the
patch author or sob person.

> > +	/* Size of the gp_buffer in bytes */
> > +	__u32				gp_bufsize;
> > +
> > +	/* Must be set to zero */
> > +	__u64				__pad;
> 
> We don't really need this as padding.  If you want to keep it for
> extensibility (although I can't really think of anything to use it
> for in the future) it should probably be renamed to gp_reserved;

Eh, I'll keep it, just in case.  The getparents_by_handle aligns nicely
with a single cacheline. :P

> > +static inline struct xfs_getparents_rec *
> > +xfs_getparents_next_rec(struct xfs_getparents *gp,
> > +			struct xfs_getparents_rec *gpr)
> > +{
> > +	char *next = ((char *)gpr + gpr->gpr_reclen);
> > +	char *end = (char *)(uintptr_t)(gp->gp_buffer + gp->gp_bufsize);
> > +
> > +	if (next >= end)
> > +		return NULL;
> > +
> > +	return (struct xfs_getparents_rec *)next;
> 
> We rely on void pointer arithmetics everywhere in the kernel and
> xfsprogs, so maybe use that here and avoid the need for the cast
> at the end?

Hopefully our downstream users also have compilers that allow void
pointer arithmetic. ;)
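
For userspace consumers built without GNU extensions, the same walk can
be written with char pointer arithmetic, which is standard C.  This is
a sketch with a hypothetical record header (struct demo_rec), not the
xfs_getparents_rec layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variable-length record header. */
struct demo_rec {
	uint32_t	reclen;		/* total bytes in this record */
	char		name[];		/* NUL-terminated name follows */
};

/*
 * Step to the next record using char pointer arithmetic, which is
 * standard C; arithmetic on void pointers is a GNU extension.
 */
static struct demo_rec *
demo_next_rec(
	void		*buf,
	size_t		bufsize,
	struct demo_rec	*rec)
{
	char		*next;
	char		*end = (char *)buf + bufsize;

	/* A zero-length record would make a caller loop forever. */
	assert(rec->reclen > 0);

	next = (char *)rec + rec->reclen;
	if (next >= end)
		return NULL;
	return (struct demo_rec *)next;
}
```

The cast on the return value remains, but nothing here depends on
sizeof(void) being treated as 1.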

> > + */
> > +int
> > +xfs_parent_from_xattr(
> > +	struct xfs_mount	*mp,
> > +	unsigned int		attr_flags,
> > +	const unsigned char	*name,
> > +	unsigned int		namelen,
> > +	const void		*value,
> > +	unsigned int		valuelen,
> > +	xfs_ino_t		*parent_ino,
> > +	uint32_t		*parent_gen)
> > +{
> > +	const struct xfs_parent_rec	*rec = value;
> > +
> > +	if (!(attr_flags & XFS_ATTR_PARENT))
> > +		return 0;
> 
> I wonder if this check should move to the callers.  That makes the
> calling conventions a lot simpler, and I think it probably makes
> the code a bit easier to follow as well.  But I'm not entirely sure
> either and open for arguments.

Yeah, on further thought I don't like the 0/1 return value convention
and will change that to require callers to screen for ATTR_PARENT.

> > +static inline unsigned int
> > +xfs_getparents_rec_sizeof(
> > +	unsigned int		namelen)
> > +{
> > +	return round_up(sizeof(struct xfs_getparents_rec) + namelen + 1,
> > +			sizeof(uint32_t));
> > +}
> 
> As we marked the xfs_getparents_rec as __packed we shouldn't really
> need the alignment here.  Or if we align, it should be to 8 bytes,
> in which case we don't need to pack it.

Let's align it to u64; everything else is.
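
A sketch of the u64-aligned record size calculation.  The header layout
(struct demo_rec_hdr) and the DEMO_ROUND_UP macro are assumptions for
the example; the kernel has its own round_up().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round x up to the next multiple of a, where a is a power of two. */
#define DEMO_ROUND_UP(x, a)	(((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* Hypothetical fixed header that precedes each name. */
struct demo_rec_hdr {
	uint64_t	ino;
	uint32_t	gen;
	uint32_t	reclen;
};

/* Header, name, and trailing NUL, padded so the next record is u64-aligned. */
static inline size_t demo_rec_sizeof(size_t namelen)
{
	return DEMO_ROUND_UP(sizeof(struct demo_rec_hdr) + namelen + 1,
			     sizeof(uint64_t));
}
```

With a 16-byte header, a 7-byte name fits in 24 bytes and an 8-byte
name needs 32, so every record starts on an 8-byte boundary.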

> > +	unsigned short			reclen = xfs_getparents_rec_sizeof(namelen);
> 
> Please avoid the overly long line here.

Fixed.

> Otherwise looks good:
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Thanks!

--D

* Re: [PATCH 3/4] xfs: rename xfs_da_args.attr_flags
  2024-04-10 20:55       ` Darrick J. Wong
@ 2024-04-11  0:00         ` Darrick J. Wong
  2024-04-11  3:26         ` Christoph Hellwig
  1 sibling, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-11  0:00 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: hch, linux-xfs

On Wed, Apr 10, 2024 at 01:55:28PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 09, 2024 at 10:01:55PM -0700, Christoph Hellwig wrote:
> > On Tue, Apr 09, 2024 at 05:50:07PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > This field only ever contains XATTR_{CREATE,REPLACE}, so let's change
> > > the name of the field to make the field and its values consistent.
> > 
> > So, these flags only get passed to xfs_attr_set through xfs_attr_change
> > and xfs_attr_setname, which means we should probably just pass them
> > directly as in my patch (against your whole stack) below.
> 
> Want me to reflow this through the tree, or just tack it on the end
> after (perhaps?) "xfs: fix corruptions in the directory tree" ?

Ugh, no, that got messy so I just tacked it on the end. :)

Also I changed the uint8_t parameter to int because the XATTR_* flags
mostly come from the VFS and that's what it passes us in
xattr_handler::set().

--D

> > Also I suspect we should do an audit of all the internal callers
> > if they should ever replace an existing attr, as I guess most
> > don't.  (and xfs_attr_change really should be folded into xfs_attr_set,
> > the split is confusing as hell).
> 
> I imagine a lot of the security stuff with magic xattrs probably only
> ever creates xattrs, but I would bet that some of these subsystems
> actually *want* the upsert behavior -- "the frob for this file should be
> $foo, make it so".
> 
> --D
> 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index b98d2a908452a0..38d1f4d10baa3b 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -1034,7 +1034,8 @@ xfs_attr_ensure_iext(
> >   */
> >  int
> >  xfs_attr_set(
> > -	struct xfs_da_args	*args)
> > +	struct xfs_da_args	*args,
> > +	uint8_t			xattr_flags)
> >  {
> >  	struct xfs_inode	*dp = args->dp;
> >  	struct xfs_mount	*mp = dp->i_mount;
> > @@ -1109,7 +1110,7 @@ xfs_attr_set(
> >  		}
> >  
> >  		/* Pure create fails if the attr already exists */
> > -		if (args->xattr_flags & XATTR_CREATE)
> > +		if (xattr_flags & XATTR_CREATE)
> >  			goto out_trans_cancel;
> >  		xfs_attr_defer_add(args, XFS_ATTR_DEFER_REPLACE);
> >  		break;
> > @@ -1119,7 +1120,7 @@ xfs_attr_set(
> >  			goto out_trans_cancel;
> >  
> >  		/* Pure replace fails if no existing attr to replace. */
> > -		if (args->xattr_flags & XATTR_REPLACE)
> > +		if (xattr_flags & XATTR_REPLACE)
> >  			goto out_trans_cancel;
> >  		xfs_attr_defer_add(args, XFS_ATTR_DEFER_SET);
> >  		break;
> > @@ -1155,7 +1156,7 @@ xfs_attr_set(
> >   * Ensure that the xattr structure maps @args->name to @args->value.
> >   *
> >   * The caller must have initialized @args, attached dquots, and must not hold
> > - * any ILOCKs.  Only XATTR_CREATE may be specified in @args->xattr_flags.
> > + * any ILOCKs.  Only XATTR_CREATE may be specified in @xattr_flags.
> >   * Reserved data blocks may be used if @rsvd is set.
> >   *
> >   * Returns -EEXIST if XATTR_CREATE was specified and the name already exists.
> > @@ -1163,6 +1164,7 @@ xfs_attr_set(
> >  int
> >  xfs_attr_setname(
> >  	struct xfs_da_args	*args,
> > +	uint8_t			xattr_flags,
> >  	bool			rsvd)
> >  {
> >  	struct xfs_inode	*dp = args->dp;
> > @@ -1172,7 +1174,7 @@ xfs_attr_setname(
> >  	int			rmt_extents = 0;
> >  	int			error, local;
> >  
> > -	ASSERT(!(args->xattr_flags & XATTR_REPLACE));
> > +	ASSERT(!(xattr_flags & ~XATTR_CREATE));
> >  	ASSERT(!args->trans);
> >  
> >  	args->total = xfs_attr_calc_size(args, &local);
> > @@ -1198,7 +1200,7 @@ xfs_attr_setname(
> >  	switch (error) {
> >  	case -EEXIST:
> >  		/* Pure create fails if the attr already exists */
> > -		if (args->xattr_flags & XATTR_CREATE)
> > +		if (xattr_flags & XATTR_CREATE)
> >  			goto out_trans_cancel;
> >  		if (args->attr_filter & XFS_ATTR_PARENT)
> >  			xfs_attr_defer_parent(args, XFS_ATTR_DEFER_REPLACE);
> > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > index 2a0ef4f633e2d1..b90e04c3e64f60 100644
> > --- a/fs/xfs/libxfs/xfs_attr.h
> > +++ b/fs/xfs/libxfs/xfs_attr.h
> > @@ -550,7 +550,7 @@ int xfs_inode_hasattr(struct xfs_inode *ip);
> >  bool xfs_attr_is_leaf(struct xfs_inode *ip);
> >  int xfs_attr_get_ilocked(struct xfs_da_args *args);
> >  int xfs_attr_get(struct xfs_da_args *args);
> > -int xfs_attr_set(struct xfs_da_args *args);
> > +int xfs_attr_set(struct xfs_da_args *args, uint8_t xattr_flags);
> >  int xfs_attr_set_iter(struct xfs_attr_intent *attr);
> >  int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
> >  bool xfs_attr_check_namespace(unsigned int attr_flags);
> > @@ -560,7 +560,7 @@ int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
> >  void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
> >  			 unsigned int *total);
> >  
> > -int xfs_attr_setname(struct xfs_da_args *args, bool rsvd);
> > +int xfs_attr_setname(struct xfs_da_args *args, uint8_t xattr_flags, bool rsvd);
> >  int xfs_attr_removename(struct xfs_da_args *args, bool rsvd);
> >  
> >  /*
> > diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
> > index 8d7a38fe2a5c07..354d5d65043e43 100644
> > --- a/fs/xfs/libxfs/xfs_da_btree.h
> > +++ b/fs/xfs/libxfs/xfs_da_btree.h
> > @@ -69,7 +69,6 @@ typedef struct xfs_da_args {
> >  	uint8_t		filetype;	/* filetype of inode for directories */
> >  	uint8_t		op_flags;	/* operation flags */
> >  	uint8_t		attr_filter;	/* XFS_ATTR_{ROOT,SECURE,INCOMPLETE} */
> > -	uint8_t		xattr_flags;	/* XATTR_{CREATE,REPLACE} */
> >  	short		namelen;	/* length of string (maybe no NULL) */
> >  	short		new_namelen;	/* length of new attr name */
> >  	xfs_dahash_t	hashval;	/* hash value of name */
> > diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
> > index 2b6ed8c1ee1522..c5422f714fcc72 100644
> > --- a/fs/xfs/libxfs/xfs_parent.c
> > +++ b/fs/xfs/libxfs/xfs_parent.c
> > @@ -355,7 +355,7 @@ xfs_parent_set(
> >  
> >  	memset(scratch, 0, sizeof(struct xfs_da_args));
> >  	xfs_parent_da_args_init(scratch, NULL, pptr, ip, owner, parent_name);
> > -	return xfs_attr_setname(scratch, true);
> > +	return xfs_attr_setname(scratch, 0, true);
> >  }
> >  
> >  /*
> > diff --git a/fs/xfs/scrub/attr_repair.c b/fs/xfs/scrub/attr_repair.c
> > index e06d00ea828b3e..8863eef5a0b87b 100644
> > --- a/fs/xfs/scrub/attr_repair.c
> > +++ b/fs/xfs/scrub/attr_repair.c
> > @@ -615,7 +615,6 @@ xrep_xattr_insert_rec(
> >  	struct xfs_da_args		args = {
> >  		.dp			= rx->sc->tempip,
> >  		.attr_filter		= key->flags,
> > -		.xattr_flags		= XATTR_CREATE,
> >  		.namelen		= key->namelen,
> >  		.valuelen		= key->valuelen,
> >  		.owner			= rx->sc->ip->i_ino,
> > @@ -675,7 +674,7 @@ xrep_xattr_insert_rec(
> >  	 * use reserved blocks because we can abort the repair with ENOSPC.
> >  	 */
> >  	xfs_attr_sethash(&args);
> > -	error = xfs_attr_setname(&args, false);
> > +	error = xfs_attr_setname(&args, XATTR_CREATE, false);
> >  	if (error == -EEXIST)
> >  		error = 0;
> >  
> > diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
> > index cf79cbcda3ecb4..1bc05efa344036 100644
> > --- a/fs/xfs/scrub/parent_repair.c
> > +++ b/fs/xfs/scrub/parent_repair.c
> > @@ -1031,7 +1031,7 @@ xrep_parent_insert_xattr(
> >  			rp->xattr_name, key->namelen, key->valuelen);
> >  
> >  	xfs_attr_sethash(&args);
> > -	return xfs_attr_setname(&args, false);
> > +	return xfs_attr_setname(&args, 0, false);
> >  }
> >  
> >  /*
> > diff --git a/fs/xfs/xfs_acl.c b/fs/xfs/xfs_acl.c
> > index 4bf69c9c088e28..1aaf3dc64bcbc1 100644
> > --- a/fs/xfs/xfs_acl.c
> > +++ b/fs/xfs/xfs_acl.c
> > @@ -203,7 +203,7 @@ __xfs_set_acl(struct inode *inode, struct posix_acl *acl, int type)
> >  		xfs_acl_to_disk(args.value, acl);
> >  	}
> >  
> > -	error = xfs_attr_change(&args);
> > +	error = xfs_attr_change(&args, 0);
> >  	kvfree(args.value);
> >  
> >  	/*
> > diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
> > index 833b0d7d8bea1c..e3f54817b91557 100644
> > --- a/fs/xfs/xfs_handle.c
> > +++ b/fs/xfs/xfs_handle.c
> > @@ -492,7 +492,6 @@ xfs_attrmulti_attr_get(
> >  	struct xfs_da_args	args = {
> >  		.dp		= XFS_I(inode),
> >  		.attr_filter	= xfs_attr_filter(flags),
> > -		.xattr_flags	= xfs_xattr_flags(flags),
> >  		.name		= name,
> >  		.namelen	= strlen(name),
> >  		.valuelen	= *len,
> > @@ -526,7 +525,6 @@ xfs_attrmulti_attr_set(
> >  	struct xfs_da_args	args = {
> >  		.dp		= XFS_I(inode),
> >  		.attr_filter	= xfs_attr_filter(flags),
> > -		.xattr_flags	= xfs_xattr_flags(flags),
> >  		.name		= name,
> >  		.namelen	= strlen(name),
> >  	};
> > @@ -544,7 +542,7 @@ xfs_attrmulti_attr_set(
> >  		args.valuelen = len;
> >  	}
> >  
> > -	error = xfs_attr_change(&args);
> > +	error = xfs_attr_change(&args, xfs_xattr_flags(flags));
> >  	if (!error && (flags & XFS_IOC_ATTR_ROOT))
> >  		xfs_forget_acl(inode, name);
> >  	kfree(args.value);
> > diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
> > index c4f9c7eec83590..d374be9f8a6e3e 100644
> > --- a/fs/xfs/xfs_iops.c
> > +++ b/fs/xfs/xfs_iops.c
> > @@ -64,7 +64,7 @@ xfs_initxattrs(
> >  			.value		= xattr->value,
> >  			.valuelen	= xattr->value_len,
> >  		};
> > -		error = xfs_attr_change(&args);
> > +		error = xfs_attr_change(&args, 0);
> >  		if (error < 0)
> >  			break;
> >  	}
> > diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> > index dc074240ad239f..1292d69087dc0c 100644
> > --- a/fs/xfs/xfs_trace.h
> > +++ b/fs/xfs/xfs_trace.h
> > @@ -2131,7 +2131,6 @@ DECLARE_EVENT_CLASS(xfs_attr_class,
> >  		__field(int, valuelen)
> >  		__field(xfs_dahash_t, hashval)
> >  		__field(unsigned int, attr_filter)
> > -		__field(unsigned int, xattr_flags)
> >  		__field(uint32_t, op_flags)
> >  	),
> >  	TP_fast_assign(
> > @@ -2143,11 +2142,10 @@ DECLARE_EVENT_CLASS(xfs_attr_class,
> >  		__entry->valuelen = args->valuelen;
> >  		__entry->hashval = args->hashval;
> >  		__entry->attr_filter = args->attr_filter;
> > -		__entry->xattr_flags = args->xattr_flags;
> >  		__entry->op_flags = args->op_flags;
> >  	),
> >  	TP_printk("dev %d:%d ino 0x%llx name %.*s namelen %d valuelen %d "
> > -		  "hashval 0x%x filter %s flags %s op_flags %s",
> > +		  "hashval 0x%x filter %s op_flags %s",
> >  		  MAJOR(__entry->dev), MINOR(__entry->dev),
> >  		  __entry->ino,
> >  		  __entry->namelen,
> > @@ -2157,9 +2155,6 @@ DECLARE_EVENT_CLASS(xfs_attr_class,
> >  		  __entry->hashval,
> >  		  __print_flags(__entry->attr_filter, "|",
> >  				XFS_ATTR_FILTER_FLAGS),
> > -		   __print_flags(__entry->xattr_flags, "|",
> > -				{ XATTR_CREATE,		"CREATE" },
> > -				{ XATTR_REPLACE,	"REPLACE" }),
> >  		  __print_flags(__entry->op_flags, "|", XFS_DA_OP_FLAGS))
> >  )
> >  
> > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > index 1d57e204c850ff..69fa7b89c68972 100644
> > --- a/fs/xfs/xfs_xattr.c
> > +++ b/fs/xfs/xfs_xattr.c
> > @@ -80,7 +80,8 @@ xfs_attr_want_log_assist(
> >   */
> >  int
> >  xfs_attr_change(
> > -	struct xfs_da_args	*args)
> > +	struct xfs_da_args	*args,
> > +	uint8_t			xattr_flags)
> >  {
> >  	struct xfs_mount	*mp = args->dp->i_mount;
> >  	int			error;
> > @@ -95,7 +96,7 @@ xfs_attr_change(
> >  		args->op_flags |= XFS_DA_OP_LOGGED;
> >  	}
> >  
> > -	return xfs_attr_set(args);
> > +	return xfs_attr_set(args, xattr_flags);
> >  }
> >  
> >  
> > @@ -131,7 +132,6 @@ xfs_xattr_set(const struct xattr_handler *handler,
> >  	struct xfs_da_args	args = {
> >  		.dp		= XFS_I(inode),
> >  		.attr_filter	= handler->flags,
> > -		.xattr_flags	= flags,
> >  		.name		= name,
> >  		.namelen	= strlen(name),
> >  		.value		= (void *)value,
> > @@ -139,7 +139,7 @@ xfs_xattr_set(const struct xattr_handler *handler,
> >  	};
> >  	int			error;
> >  
> > -	error = xfs_attr_change(&args);
> > +	error = xfs_attr_change(&args, flags);
> >  	if (!error && (handler->flags & XFS_ATTR_ROOT))
> >  		xfs_forget_acl(inode, name);
> >  	return error;
> > diff --git a/fs/xfs/xfs_xattr.h b/fs/xfs/xfs_xattr.h
> > index f097002d06571f..79c0040cc904b4 100644
> > --- a/fs/xfs/xfs_xattr.h
> > +++ b/fs/xfs/xfs_xattr.h
> > @@ -6,7 +6,7 @@
> >  #ifndef __XFS_XATTR_H__
> >  #define __XFS_XATTR_H__
> >  
> > -int xfs_attr_change(struct xfs_da_args *args);
> > +int xfs_attr_change(struct xfs_da_args *args, uint8_t xattr_flags);
> >  int xfs_attr_grab_log_assist(struct xfs_mount *mp);
> >  void xfs_attr_rele_log_assist(struct xfs_mount *mp);
> >  
> > 
> 

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 2/3] xfs: introduce vectored scrub mode
  2024-04-10 15:00     ` Christoph Hellwig
@ 2024-04-11  0:59       ` Darrick J. Wong
  2024-04-11  3:38         ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-11  0:59 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: hch, linux-xfs

On Wed, Apr 10, 2024 at 08:00:11AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 09, 2024 at 06:08:54PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > Introduce a variant on XFS_SCRUB_METADATA that allows for a vectored
> > mode.  The caller specifies the principal metadata object that they want
> > to scrub (allocation group, inode, etc.) once, followed by an array of
> > scrub types they want called on that object.  The kernel runs the scrub
> > operations and writes the output flags and errno code to the
> > corresponding array element.
> > 
> > A new pseudo scrub type BARRIER is introduced to force the kernel to
> > return to userspace if any corruptions have been found when scrubbing
> > the previous scrub types in the array.  This enables userspace to
> > schedule, for example, the sequence:
> > 
> >  1. data fork
> >  2. barrier
> >  3. directory
> > 
> > If the data fork scrub is clean, then the kernel will perform the
> > directory scrub.  If not, the barrier in 2 will exit back to userspace.
> > 
> > When running fstests in "rebuild all metadata after each test" mode, I
> > observed a 10% reduction in runtime due to fewer transitions across the
> > system call boundary.
> 
> Just curious: what is the benefit over having a scruball $OBJECT interface
> where the above order is encoded in the kernel instead of in the
> scrub tool?

I thought about designing this interface that way, where userspace
passes a pointer to an empty buffer, and the kernel formats that with
xfs_scrub_vecs that tell userspace what it scrubbed and what the outcome
was.  I didn't like that, because now the kernel has to have a way to
communicate that the buffer needed to have been at least X size, even
though for our cases XFS_SCRUB_TYPE_NR + 2 would always be enough.

Better, I thought, to let userspace figure out what it wants to run, and
tell that explicitly to the kernel, and then the kernel can just do
that.  The downside is that now we need the barriers.
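
A minimal userspace sketch of the barrier behaviour described above, for
illustration only.  Every name here (demo_scrub_vec, demo_run_scrubv, the
DEMO_* constants, the two fake scrubbers) is hypothetical and is not the
structure or code from the patch; it only models "run the vector in order,
and stop at a barrier if anything before it was corrupt".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
enum demo_scrub_type {
	DEMO_SCRUB_DATAFORK,
	DEMO_SCRUB_DIR,
	DEMO_SCRUB_BARRIER,	/* pseudo type: stop if anything earlier was corrupt */
};

#define DEMO_OFLAG_CORRUPT	(1U << 0)

struct demo_scrub_vec {
	enum demo_scrub_type	type;
	unsigned int		oflags;	/* output flags for this element */
	int			ret;	/* output errno for this element */
};

/* A scrubber callback returns the output flags for one scrub type. */
typedef unsigned int (*demo_scrub_fn)(enum demo_scrub_type type);

/* Pretend every object is clean. */
unsigned int demo_scrub_all_clean(enum demo_scrub_type type)
{
	(void)type;
	return 0;
}

/* Pretend the data fork is corrupt and everything else is clean. */
unsigned int demo_scrub_bad_datafork(enum demo_scrub_type type)
{
	return type == DEMO_SCRUB_DATAFORK ? DEMO_OFLAG_CORRUPT : 0;
}

/*
 * Walk the vector in order.  Returns the index at which processing stopped:
 * nr if every element ran, or the index of the barrier that bailed out.
 */
size_t demo_run_scrubv(struct demo_scrub_vec *vecs, size_t nr,
		       demo_scrub_fn scrub)
{
	int	seen_corruption = 0;
	size_t	i;

	assert(vecs != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (vecs[i].type == DEMO_SCRUB_BARRIER) {
			if (seen_corruption)
				return i;	/* back to "userspace" */
			continue;
		}
		vecs[i].oflags = scrub(vecs[i].type);
		vecs[i].ret = 0;
		if (vecs[i].oflags & DEMO_OFLAG_CORRUPT)
			seen_corruption = 1;
	}
	return nr;
}
```

With the sequence { data fork, barrier, directory }, the directory element is
only reached when the data fork scrubber reports no corruption.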

> > +	BUILD_BUG_ON(sizeof(struct xfs_scrub_vec_head) ==
> > +		     sizeof(struct xfs_scrub_metadata));
> > +	BUILD_BUG_ON(XFS_IOC_SCRUB_METADATA == XFS_IOC_SCRUBV_METADATA);
> 
> What is the point of these BUILD_BUG_ONs?

Reusing the same ioctl number instead of burning another one.  It's not
really necessary I suppose.

> > +	if (copy_from_user(&head, uhead, sizeof(head)))
> > +		return -EFAULT;
> > +
> > +	if (head.svh_reserved)
> > +		return -EINVAL;
> > +
> > +	bytes = sizeof_xfs_scrub_vec(head.svh_nr);
> > +	if (bytes > PAGE_SIZE)
> > +		return -ENOMEM;
> > +	vhead = kvmalloc(bytes, GFP_KERNEL | __GFP_RETRY_MAYFAIL);
> 
> Why __GFP_RETRY_MAYFAIL and not just a plain GFP_KERNEL?

Hmm.  At one point I convinced myself this was correct because it would
retry if the allocation failed but could still just fail.  But I guess
it tries "really" hard (so says memory-allocation.rst) sooo yeah
GFP_KERNEL it is then.

> > +	if (!vhead)
> > +		return -ENOMEM;
> > +	memcpy(vhead, &head, sizeof(struct xfs_scrub_vec_head));
> > +
> > +	if (copy_from_user(&vhead->svh_vecs, &uhead->svh_vecs,
> > +				head.svh_nr * sizeof(struct xfs_scrub_vec))) {
> 
> This should probably use array_size to better deal with overflows.

Yep.
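
For reference, a userspace sketch of the kind of overflow-safe sizing that
array_size() is meant to give: the product saturates to SIZE_MAX instead of
wrapping.  The demo_* helpers are hypothetical imitations, not the kernel
macros, and the sketch assumes a GCC- or Clang-compatible compiler for the
__builtin_*_overflow builtins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating multiply: returns SIZE_MAX if n * elem_size would overflow. */
size_t demo_array_size(size_t n, size_t elem_size)
{
	size_t	bytes;

	if (__builtin_mul_overflow(n, elem_size, &bytes))
		return SIZE_MAX;
	return bytes;
}

/*
 * Compute header + nr * vec_size, refusing anything that overflows or
 * exceeds the caller's limit.  Returns 0 and fills *out on success,
 * -1 otherwise.
 */
int demo_scrubv_bytes(size_t hdr_size, size_t nr, size_t vec_size,
		      size_t limit, size_t *out)
{
	size_t	body = demo_array_size(nr, vec_size);
	size_t	total;

	assert(out != NULL);

	if (body == SIZE_MAX)
		return -1;
	if (__builtin_add_overflow(hdr_size, body, &total))
		return -1;
	if (total > limit)
		return -1;

	*out = total;
	return 0;
}
```

A huge element count then fails the size check instead of wrapping around to
a small byte count that would slip past a "bytes > PAGE_SIZE" style test.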

> And maybe it should use an indirection for the vecs so that we can
> simply do a memdup_user to copy the entire array to kernel space?

Hmmm.  That's worth considering.  Heck, userspace is already declaring a
fugly structure like this:

struct scrubv_head {
	struct xfs_scrub_vec_head	head;
	struct xfs_scrub_vec		__vecs[XFS_SCRUB_TYPE_NR + 2];
};

With an indirection, the pointers become explicit rather than assuming that
nobody will silently reorder the fields here.  That alone is worth it.
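
A rough sketch of what the indirection could look like from userspace,
assuming the header carries the address of the vector array in a 64-bit
field.  The demo_* structures and field names are invented for illustration
and are not the layout from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-scrub-type element. */
struct demo_scrub_vec {
	uint32_t	type;
	uint32_t	flags;
	int32_t		ret;
	uint32_t	reserved;
};

/* Hypothetical header that points at the array instead of embedding it. */
struct demo_scrub_vec_head {
	uint64_t	ino;
	uint16_t	nr;		/* number of elements in the array */
	uint16_t	pad[3];
	uint64_t	vectors;	/* userspace address of the array */
};

/* Record where the caller's array lives; no layout assumptions needed. */
void demo_scrubv_attach(struct demo_scrub_vec_head *head,
			struct demo_scrub_vec *vecs, uint16_t nr)
{
	assert(head != NULL);
	assert(vecs != NULL || nr == 0);

	head->nr = nr;
	head->vectors = (uint64_t)(uintptr_t)vecs;
}

/* Recover the array pointer from the header. */
struct demo_scrub_vec *demo_scrubv_vecs(const struct demo_scrub_vec_head *head)
{
	return (struct demo_scrub_vec *)(uintptr_t)head->vectors;
}
```

The array can then live anywhere (stack, heap, a static buffer) and be copied
in one go, rather than relying on it sitting immediately after the header.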

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle
  2024-04-10 15:12     ` Christoph Hellwig
@ 2024-04-11  1:15       ` Darrick J. Wong
  2024-04-11  3:49         ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-11  1:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: hch, linux-xfs

On Wed, Apr 10, 2024 at 08:12:16AM -0700, Christoph Hellwig wrote:
> > +	/*
> > +	 * If the caller wants us to do a scrub-by-handle and the file used to
> > +	 * call the ioctl is not the same file, load the incore inode and pin
> > +	 * it across all the scrubv actions to avoid repeated UNTRUSTED
> > +	 * lookups.  The reference is not passed to deeper layers of scrub
> > +	 * because each scrubber gets to decide its own strategy for getting an
> > +	 * inode.
> > +	 */
> > +	if (vhead->svh_ino && vhead->svh_ino != ip_in->i_ino)
> > +		handle_ip = xchk_scrubv_open_by_handle(mp, vhead);
> 
> Oh.  So we read the inode, keep a reference to it, but still hit the
> inode cache every time.  A little non-obvious and not perfect for
> performance, but based on your numbers probably good enough.
> 
> Curious: what is the reason the scrubbers want/need different ways to
> get at the inode?

I don't remember the exact reason why we don't pass this ip into
xfs_scrub_metadata, but iirc the inode scrub setup functions react
differently (from the bmap/dir/attr/symlink scrubbers) when iget
failures occur.

Also this way xfs_scrub_metadata owns the refcount to whatever inode it
picks up, and can do whatever it wants with that reference.

> > +	/*
> > +	 * If we're holding the only reference to an inode opened via handle
> > +	 * and the scan was clean, mark it dontcache so that we don't pollute
> > +	 * the cache.
> > +	 */
> > +	if (handle_ip) {
> > +		if (set_dontcache &&
> > +		    atomic_read(&VFS_I(handle_ip)->i_count) == 1)
> > +			d_mark_dontcache(VFS_I(handle_ip));
> > +		xfs_irele(handle_ip);
> > +	}
> 
> This looks a little weird to me.  Can't we simply use XFS_IGET_DONTCACHE
> at iget time and then clear I_DONTCACHE here if we want to keep the
> inode around?

Not anymore, because other threads can mess around with the dontcache
state (yay fsdax access path changes!!) while we are scrubbing the
inode.

>                Given that we only set the uncached flag from
> XFS_IGET_DONTCACHE on a cache miss, we won't have set
> DCACHE_DONTCACHE anywhere (and don't really care about the dentries to
> start with).
> 
> But why do we care about keeping the inodes with errors in memory
> here, but not elsewhere?

We actually do, but it's not obvious...

> Maybe this can be explained in an expanded comment.

...because this bit here is basically the same as xchk_irele, but we
don't have a xfs_scrub object to pass in, so it's opencoded.  I could
pull this logic out into:

void xfs_scrub_irele(struct xfs_inode *ip)
{
	if (atomic_read(&VFS_I(ip)->i_count) == 1) {
		/*
		 * If this is the last reference to the inode and the caller
		 * permits it, set DONTCACHE to avoid thrashing.
		 */
		d_mark_dontcache(VFS_I(ip));
	}
	xfs_irele(ip);
}

and change xchk_irele to:

void
xchk_irele(
	struct xfs_scrub	*sc,
	struct xfs_inode	*ip)
{
	if (sc->tp) {
		spin_lock(&VFS_I(ip)->i_lock);
		VFS_I(ip)->i_state &= ~I_DONTCACHE;
		spin_unlock(&VFS_I(ip)->i_lock);
		xfs_irele(ip);
		return;
	}

	xfs_scrub_irele(ip);
}

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 14/32] xfs: add parent pointer validator functions
  2024-04-10 18:53       ` Darrick J. Wong
@ 2024-04-11  3:25         ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-11  3:25 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Allison Henderson, catherine.hoang, hch, linux-xfs

On Wed, Apr 10, 2024 at 11:53:12AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 09, 2024 at 10:31:10PM -0700, Christoph Hellwig wrote:
> > On Tue, Apr 09, 2024 at 05:57:09PM -0700, Darrick J. Wong wrote:
> > > From: Allison Henderson <allison.henderson@oracle.com>
> > > 
> > > Attribute names of parent pointers are not strings.
> > 
> > They are now.  The rest of the commit log also doesn't match the code
> > anymore.  The code itself looks good, though.
> 
> How about this, then:

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/4] xfs: rename xfs_da_args.attr_flags
  2024-04-10 20:55       ` Darrick J. Wong
  2024-04-11  0:00         ` Darrick J. Wong
@ 2024-04-11  3:26         ` Christoph Hellwig
  2024-04-11  4:15           ` Darrick J. Wong
  1 sibling, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-11  3:26 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, hch, linux-xfs

On Wed, Apr 10, 2024 at 01:55:28PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 09, 2024 at 10:01:55PM -0700, Christoph Hellwig wrote:
> > On Tue, Apr 09, 2024 at 05:50:07PM -0700, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@kernel.org>
> > > 
> > > This field only ever contains XATTR_{CREATE,REPLACE}, so let's change
> > > the name of the field to make the field and its values consistent.
> > 
> > So, these flags only get passed to xfs_attr_set through xfs_attr_change
> > and xfs_attr_setname, which means we should probably just pass them
> > directly as in my patch (against your whole stack) below.
> 
> Want me to reflow this through the tree, or just tack it on the end
> after (perhaps?) "xfs: fix corruptions in the directory tree" ?

If it makes your life easier feel free to add it at the end.


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 07/32] xfs: allow xattr matching on name and value for local/sf attrs
  2024-04-10 21:13       ` Darrick J. Wong
@ 2024-04-11  3:28         ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-11  3:28 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, catherine.hoang, hch, allison.henderson, linux-xfs

On Wed, Apr 10, 2024 at 02:13:10PM -0700, Darrick J. Wong wrote:
> How about:
> 
> "xfs: allow xattr matching on name and value for local/sf pptr attrs

How about: "match on the attr value as well for parent pointers"

for the subject?

The commit message body looks good to me.


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 23/32] xfs: Filter XFS_ATTR_PARENT for getfattr
  2024-04-10 21:58       ` Darrick J. Wong
@ 2024-04-11  3:29         ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-11  3:29 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Allison Henderson, catherine.hoang, hch, linux-xfs

On Wed, Apr 10, 2024 at 02:58:27PM -0700, Darrick J. Wong wrote:
> "xfs: don't return XFS_ATTR_PARENT attributes via listxattr
> 
> "Parent pointers are internal filesystem metadata.  They're not intended
> to be directly visible to userspace, so filter them out of
> xfs_xattr_put_listent so that they don't appear in listxattr."

Looks good.

> > However I'd make it part of the top of file comment above the include
> > statements.  And please add it in a separate commit as it has nothing
> > to do with the other changes here.
> 
> Or just get rid of the comment entirely?  It came from the verity
> series.

Fine with me.


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 01/14] xfs: add xattr setname and removename functions for internal users
  2024-04-10 22:18       ` Darrick J. Wong
@ 2024-04-11  3:32         ` Christoph Hellwig
  2024-04-11  4:30           ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-11  3:32 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, catherine.hoang, hch, allison.henderson, linux-xfs

On Wed, Apr 10, 2024 at 03:18:44PM -0700, Darrick J. Wong wrote:
> > Is there a good reason to have a separate remove helper and not
> > overload a NULL value like we do for the normal xattr interface?
> 
> xfs_repair uses xfs_parent_unset -> xfs_attr_removename to erase any
> XFS_ATTR_PARENT attribute that doesn't validate, so it needs to be able
> to pass in a non-NULL value.  Perhaps I'll add a comment about that,
> since this isn't the first time this has come up.
> 
> Come to think of it you can't removename a remote parent value, so I
> guess in that bad case xfs_repair will have to drop the entire attr
> structure <frown>.

Maybe we'll need to fix that.  How about you leave the xattr_flags in
place for now, and then I (or you if you really want) replace it with
a new enum argument:

enum xfs_attr_change {
	XFS_ATTR_CREATE,
	XFS_ATTR_REPLACE,
	XFS_ATTR_CREATE_OR_REPLACE,
	XFS_ATTR_REMOVE,
};

and we pass that to xfs_attr_set and what is currently xfs_attr_setname
(which btw is a name that feels really odd).  That way repair can
also use the libxfs attr helpers with a value match for parent pointers?
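
To make the proposal concrete, here is a small sketch of how a single
operation enum could drive the "does the attr already exist?" decision.  The
demo_* names are hypothetical, and the error mapping is an assumption
borrowed from the usual setxattr/removexattr conventions (EEXIST for create
on an existing attr, ENODATA for replace or remove on a missing one), not
something taken from the patches.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical operation selector, modelled on the enum proposed above. */
enum demo_attr_change {
	DEMO_ATTR_CREATE,		/* must not exist yet */
	DEMO_ATTR_REPLACE,		/* must already exist */
	DEMO_ATTR_CREATE_OR_REPLACE,	/* either is fine */
	DEMO_ATTR_REMOVE,		/* must already exist */
};

/*
 * Decide whether an operation may proceed given the lookup result.
 * Returns 0 to proceed or a negative errno.
 */
int demo_attr_check(enum demo_attr_change op, bool exists)
{
	switch (op) {
	case DEMO_ATTR_CREATE:
		return exists ? -EEXIST : 0;
	case DEMO_ATTR_REPLACE:
	case DEMO_ATTR_REMOVE:
		return exists ? 0 : -ENODATA;
	case DEMO_ATTR_CREATE_OR_REPLACE:
		return 0;
	}
	return -EINVAL;
}
```

The point of the single enum is that removal no longer has to be signalled
by a NULL value, so a caller can remove with a value to match against.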


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 11/32] xfs: log parent pointer xattr replace operations
  2024-04-10 23:07       ` Darrick J. Wong
@ 2024-04-11  3:35         ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-11  3:35 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Allison Henderson, catherine.hoang, hch, linux-xfs

On Wed, Apr 10, 2024 at 04:07:24PM -0700, Darrick J. Wong wrote:
>     xfs: create attr log item opcodes and formats for parent pointers
> 
>     Make the necessary alterations to the extended attribute log intent item
>     ondisk format so that we can log parent pointer operations.  This
>     requires the creation of new opcodes specific to parent pointers, and a
>     new four-argument replace operation to handle renames.  At this point
>     this part of the patchset has changed so much from what Allison originally
>     wrote that I no longer think her SoB applies.

Sounds good.


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 2/3] xfs: introduce vectored scrub mode
  2024-04-11  0:59       ` Darrick J. Wong
@ 2024-04-11  3:38         ` Christoph Hellwig
  2024-04-11  4:31           ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-11  3:38 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, hch, linux-xfs

On Wed, Apr 10, 2024 at 05:59:41PM -0700, Darrick J. Wong wrote:
> I thought about designing this interface that way, where userspace
> passes a pointer to an empty buffer, and the kernel formats that with
> xfs_scrub_vecs that tell userspace what it scrubbed and what the outcome
> was.  I didn't like that, because now the kernel has to have a way to
> communicate that the buffer needed to have been at least X size, even
> though for our cases XFS_SCRUB_TYPE_NR + 2 would always be enough.
> 
> Better, I thought, to let userspace figure out what it wants to run, and
> tell that explicitly to the kernel, and then the kernel can just do
> that.  The downside is that now we need the barriers.

And the downside is that userspace needs to know about all the passes
and dependencies.  Which I guess it does anyway due to the older
scrub interface, but maybe that's worth documenting?

> 
> > > +	BUILD_BUG_ON(sizeof(struct xfs_scrub_vec_head) ==
> > > +		     sizeof(struct xfs_scrub_metadata));
> > > +	BUILD_BUG_ON(XFS_IOC_SCRUB_METADATA == XFS_IOC_SCRUBV_METADATA);
> > 
> > What is the point of these BUILD_BUG_ONs?
> 
> Reusing the same ioctl number instead of burning another one.  It's not
> really necessary I suppose.

I find reusing the numbers really confusing even if it does work due
to the size encoding.  If you're fine with getting rid of it I'm all
for it.
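
For readers unfamiliar with the size encoding: the sketch below mimics the
common asm-generic ioctl layout (2 direction bits, 14 size bits, 8 type bits,
8 number bits) with hypothetical demo_* helpers, and shows why two ioctls can
share a number as long as their argument structures differ in size.  The
type, number and sizes in the usage note are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_IOC_SIZEBITS	14
#define DEMO_IOC_READWRITE	3U	/* read | write direction bits */

/* Build a command value the way the generic _IOWR() layout does. */
uint32_t demo_iowr(uint8_t type, uint8_t nr, size_t arg_size)
{
	assert(arg_size < ((size_t)1 << DEMO_IOC_SIZEBITS));

	return (DEMO_IOC_READWRITE << 30) |
	       ((uint32_t)arg_size << 16) |
	       ((uint32_t)type << 8) |
	       (uint32_t)nr;
}

/* Extract the argument size that was folded into the command value. */
size_t demo_ioc_size(uint32_t cmd)
{
	return (cmd >> 16) & (((uint32_t)1 << DEMO_IOC_SIZEBITS) - 1);
}
```

Here demo_iowr('X', 60, 64) and demo_iowr('X', 60, 40) yield different command
values, which is what a BUILD_BUG_ON on equal sizes would be guarding; with
equal sizes the two commands would collide.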


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle
  2024-04-11  1:15       ` Darrick J. Wong
@ 2024-04-11  3:49         ` Christoph Hellwig
  2024-04-11  4:41           ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-11  3:49 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, Ira Weiny

On Wed, Apr 10, 2024 at 06:15:02PM -0700, Darrick J. Wong wrote:
> > This looks a little weird to me.  Can't we simply use XFS_IGET_DONTCACHE
> > at iget time and then clear I_DONTCACHE here if we want to keep the
> > inode around?
> 
> Not anymore, because other threads can mess around with the dontcache
> state (yay fsdax access path changes!!) while we are scrubbing the
> inode.

You mean xfs_ioctl_setattr_prepare_dax?  Oh lovely, a completely
undocumented d_mark_dontcache in a completely non-obvious place.

It seems to have appeared in
commit e4f9ba20d3b8c2b86ec71f326882e1a3c4e47953
Author: Ira Weiny <ira.weiny@intel.com>
Date:   Thu Apr 30 07:41:38 2020 -0700

    fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()

without any explanation either.  And I can't see any reason why
we'd prevent inodes and dentries from being cached after DAX mode
switches to start with.  I can only guess, maybe the commit thinks
d_mark_dontcache is about data caching?

> 
> >                Given that we only set the uncached flag from
> > XFS_IGET_DONTCACHE on a cache miss, we won't have set
> > DCACHE_DONTCACHE anywhere (and don't really care about the dentries to
> > start with).
> > 
> > But why do we care about keeping the inodes with errors in memory
> > here, but not elsewhere?
> 
> > We actually do, but it's not obvious...
> 
> > Maybe this can be explained in an expanded comment.
> 
> ...because this bit here is basically the same as xchk_irele, but we
> don't have a xfs_scrub object to pass in, so it's opencoded.  I could
> pull this logic out into:

Eww, I hadn't seen xchk_irele before.  To me it looks like
I_DONTCACHE/d_mark_dontcache is really the wrong vehicle here.

I'd instead have a XFS_IGET_SCRUB, which will set an XFS_ISCRUB or
whatever flag on a cache miss.  Any cache hit without XFS_IGET_SCRUB
will clear it.

->drop_inode then always returns true for XFS_ISCRUB inodes unless
in a transaction.  Talking about the in transaction part - why do
we drop inodes in the transaction in scrub, but not elsewhere?
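
A toy model of the flag lifecycle suggested above, purely to pin down the
rules being proposed.  The demo_* names are hypothetical, the "cache" is a
single boolean, and the drop decision ignores everything a real ->drop_inode
would also have to consider for non-scrub inodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical incore inode state. */
struct demo_inode {
	bool	cached;		/* already present in the inode cache? */
	bool	iscrub;		/* loaded only on behalf of scrub? */
};

/*
 * Look up an inode.  A cache miss from a scrub lookup sets the flag; any
 * cache hit from a non-scrub lookup clears it.
 */
void demo_iget(struct demo_inode *ip, bool scrub_lookup)
{
	assert(ip != NULL);

	if (!ip->cached) {
		ip->cached = true;
		ip->iscrub = scrub_lookup;
		return;
	}
	if (!scrub_lookup)
		ip->iscrub = false;
}

/* Drop immediately only for scrub-only inodes outside a transaction. */
bool demo_drop_inode(const struct demo_inode *ip, bool in_transaction)
{
	return ip->iscrub && !in_transaction;
}
```

In this model an inode that anyone else looks up while scrub holds it loses
the flag and is cached normally afterwards.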


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/4] xfs: rename xfs_da_args.attr_flags
  2024-04-11  3:26         ` Christoph Hellwig
@ 2024-04-11  4:15           ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-11  4:15 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: hch, linux-xfs

On Wed, Apr 10, 2024 at 08:26:46PM -0700, Christoph Hellwig wrote:
> On Wed, Apr 10, 2024 at 01:55:28PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 09, 2024 at 10:01:55PM -0700, Christoph Hellwig wrote:
> > > On Tue, Apr 09, 2024 at 05:50:07PM -0700, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@kernel.org>
> > > > 
> > > > This field only ever contains XATTR_{CREATE,REPLACE}, so let's change
> > > > the name of the field to make the field and its values consistent.
> > > 
> > > So, these flags only get passed to xfs_attr_set through xfs_attr_change
> > > and xfs_attr_setname, which means we should probably just pass them
> > > directly as in my patch (against your whole stack) below.
> > 
> > Want me to reflow this through the tree, or just tack it on the end
> > after (perhaps?) "xfs: fix corruptions in the directory tree" ?
> 
> If it makes your life easier feel free to add it at the end.

It does, and done!

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 01/14] xfs: add xattr setname and removename functions for internal users
  2024-04-11  3:32         ` Christoph Hellwig
@ 2024-04-11  4:30           ` Darrick J. Wong
  2024-04-11  4:50             ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-11  4:30 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: catherine.hoang, hch, allison.henderson, linux-xfs

On Wed, Apr 10, 2024 at 08:32:44PM -0700, Christoph Hellwig wrote:
> On Wed, Apr 10, 2024 at 03:18:44PM -0700, Darrick J. Wong wrote:
> > > Is there a good reason to have a separate remove helper and not
> > > overload a NULL value like we do for the normal xattr interface?
> > 
> > xfs_repair uses xfs_parent_unset -> xfs_attr_removename to erase any
> > XFS_ATTR_PARENT attribute that doesn't validate, so it needs to be able
> > to pass in a non-NULL value.  Perhaps I'll add a comment about that,
> > since this isn't the first time this has come up.
> > 
> > Come to think of it you can't removename a remote parent value, so I
> > guess in that bad case xfs_repair will have to drop the entire attr
> > structure <frown>.
> 
> Maybe we'll need to fix that.  How about you leave the xattr_flags in
> > place for now, and then I (or you if you really want) replace it with
> a new enum argument:
> 
> enum xfs_attr_change {
> 	XFS_ATTR_CREATE,
> 	XFS_ATTR_REPLACE,
> 	XFS_ATTR_CREATE_OR_REPLACE,
> 	XFS_ATTR_REMOVE,
> };

Heh, I almost did that:

enum xfs_attr_change {
	XAC_CREATE	= XATTR_CREATE,
	XAC_REPLACE	= XATTR_REPLACE,
	XAC_UPSERT,
	XAC_REMOVE,
};

(500 patches from now when I get around to removing xattr_flags & making
it a parameter.)

> and we pass that to xfs_attr_set and what is currently xfs_attr_setname
> (which btw is a name that feels really odd).  That way repair can
> also use the libxfs attr helpers with a value match for parent pointers?

I think this is a good idea.  Maybe even worth rebasing through the
tree.

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 2/3] xfs: introduce vectored scrub mode
  2024-04-11  3:38         ` Christoph Hellwig
@ 2024-04-11  4:31           ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-11  4:31 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: hch, linux-xfs

On Wed, Apr 10, 2024 at 08:38:38PM -0700, Christoph Hellwig wrote:
> On Wed, Apr 10, 2024 at 05:59:41PM -0700, Darrick J. Wong wrote:
> > I thought about designing this interface that way, where userspace
> > passes a pointer to an empty buffer, and the kernel formats that with
> > xfs_scrub_vecs that tell userspace what it scrubbed and what the outcome
> > was.  I didn't like that, because now the kernel has to have a way to
> > communicate that the buffer needed to have been at least X size, even
> > though for our cases XFS_SCRUB_TYPE_NR + 2 would always be enough.
> > 
> > Better, I thought, to let userspace figure out what it wants to run, and
> > tell that explicitly to the kernel, and then the kernel can just do
> > that.  The downside is that now we need the barriers.
> 
> And the downside is that userspace needs to know about all the passes
> and dependencies.  Which I guess it does anyway due to the older
> scrub interface, but maybe that's worth documenting?

Yes, that's correct that userspace would have needed to know all that
anyway.  I'll summarize this conversation in the commit message.

> > 
> > > > +	BUILD_BUG_ON(sizeof(struct xfs_scrub_vec_head) ==
> > > > +		     sizeof(struct xfs_scrub_metadata));
> > > > +	BUILD_BUG_ON(XFS_IOC_SCRUB_METADATA == XFS_IOC_SCRUBV_METADATA);
> > > 
> > > What is the point of these BUILD_BUG_ONs?
> > 
> > Reusing the same ioctl number instead of burning another one.  It's not
> > really necessary I suppose.
> 
> I find reusing the numbers really confusing even if it does work due
> to the size encoding.  If you're fine with getting rid of it I'm all
> for it.

Done.

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle
  2024-04-11  3:49         ` Christoph Hellwig
@ 2024-04-11  4:41           ` Darrick J. Wong
  2024-04-11  4:52             ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-11  4:41 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, Ira Weiny

On Wed, Apr 10, 2024 at 08:49:39PM -0700, Christoph Hellwig wrote:
> On Wed, Apr 10, 2024 at 06:15:02PM -0700, Darrick J. Wong wrote:
> > > This looks a little weird to me.  Can't we simply use XFS_IGET_DONTCACHE
> > > at iget time and then clear I_DONTCACHE here if we want to keep the
> > > inode around?
> > 
> > Not anymore, because other threads can mess around with the dontcache
> > state (yay fsdax access path changes!!) while we are scrubbing the
> > inode.
> 
> You mean xfs_ioctl_setattr_prepare_dax?  Oh lovely, a completely
> undocumented d_mark_dontcache in a completely non-obvious place.
> 
> It seems to have appeared in
> commit e4f9ba20d3b8c2b86ec71f326882e1a3c4e47953
> Author: Ira Weiny <ira.weiny@intel.com>
> Date:   Thu Apr 30 07:41:38 2020 -0700
> 
>     fs/xfs: Update xfs_ioctl_setattr_dax_invalidate()
> 
> without any explanation either.  And I can't see any reason why
> we'd prevent inodes and dentries from being cached after DAX mode
> switches to start with.  I can only guess, maybe the commit thinks
> d_mark_dontcache is about data caching?

It's the horrible way that fsdax "supports" switching the address ops
and i_mapping contents at runtime -- set the ondisk iflag, mark the
inode/dentry for immediate expulsion, wait for reclaim to eat the inode,
then reload it and *presto* new incore iflag and state!

(It's gross but I don't know of a better way to drain i_mapping and
change address ops and at this point I'm hoping I just plain forget all
that pmem stuff. :P)

> > 
> > >                Given that we only set the uncached flag from
> > > XFS_IGET_DONTCACHE on a cache miss, we won't have set
> > > DCACHE_DONTCACHE anywhere (and don't really care about the dentries to
> > > start with).
> > > 
> > > But why do we care about keeping the inodes with errors in memory
> > > here, but not elsewhere?
> > 
> > We actually do, but it's not obvious...
> > 
> > > Maybe this can be explained in an expanded comment.
> > 
> > ...because this bit here is basically the same as xchk_irele, but we
> > don't have a xfs_scrub object to pass in, so it's opencoded.  I could
> > pull this logic out into:
> 
> Eww, I hadn't seen xchk_irele before.  To me it looks like
> I_DONTCACHE/d_mark_dontcache is really the wrong vehicle here.
> 
> I'd instead have a XFS_IGET_SCRUB, which will set an XFS_ISCRUB or
> whatever flag on a cache miss.  Any cache hit without XFS_IGET_SCRUB
> will clear it.
> 
> ->drop_inode then always returns true for XFS_ISCRUB inodes unless
> in a transaction.

How does it determine that we're in a transaction?  We just stopped
storing transactions in current->journal_info due to problems with
nested transactions and ext4 assuming that it can blind deref that.

>                    Talking about the in transaction part - why do
> we drop inodes in the transaction in scrub, but not elsewhere?

One example is:

Alloc transaction -> lock rmap btree for repairs -> iscan filesystem to
find rmap records -> iget/irele.

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 01/14] xfs: add xattr setname and removename functions for internal users
  2024-04-11  4:30           ` Darrick J. Wong
@ 2024-04-11  4:50             ` Christoph Hellwig
  0 siblings, 0 replies; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-11  4:50 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, catherine.hoang, hch, allison.henderson, linux-xfs

On Wed, Apr 10, 2024 at 09:30:48PM -0700, Darrick J. Wong wrote:
> Heh, I almost did that:
> 
> enum xfs_attr_change {
> 	XAC_CREATE	= XATTR_CREATE,
> 	XAC_REPLACE	= XATTR_REPLACE,
> 	XAC_UPSERT,
> 	XAC_REMOVE,
> };
> 
> (500 patches from now when I get around to removing xattr_flags & making
> it a parameter.)

Heh. Reusing the XATTR_* values is ok, but I doubt it's really worth
the effort given that just one caller actually uses them.


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle
  2024-04-11  4:41           ` Darrick J. Wong
@ 2024-04-11  4:52             ` Christoph Hellwig
  2024-04-11  4:56               ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-11  4:52 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, Ira Weiny

> How does it determine that we're in a transaction?  We just stopped
> storing transactions in current->journal_info due to problems with
> nested transactions and ext4 assuming that it can blind deref that.

Oh, I was looking at an old tree and missed that.

Well, someone needs to own it, it's just not just ext4 but could us.
> 
> >                    Talking about the in transaction part - why do
> > we drop inodes in the transaction in scrub, but not elsewhere?
> 
> One example is:
> 
> Alloc transaction -> lock rmap btree for repairs -> iscan filesystem to
> find rmap records -> iget/irele.

So this is just the magic empty transaction?


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle
  2024-04-11  4:52             ` Christoph Hellwig
@ 2024-04-11  4:56               ` Darrick J. Wong
  2024-04-11  5:02                 ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-11  4:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, Ira Weiny

On Wed, Apr 10, 2024 at 09:52:41PM -0700, Christoph Hellwig wrote:
> > How does it determine that we're in a transaction?  We just stopped
> > storing transactions in current->journal_info due to problems with
> > nested transactions and ext4 assuming that it can blind deref that.
> 
> Oh, I was looking at an old tree and missed that.

It's not in my tree but I did ... oh crap that already got committed; I
need to rip out that part of xrep_trans_alloc_hook_dummy now.

> Well, someone needs to own it, it's just not just ext4 but could us.

Er... I don't understand this?        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

> > >                    Talking about the in transaction part - why do
> > > we drop inodes in the transaction in scrub, but not elsewhere?
> > 
> > One example is:
> > 
> > Alloc transaction -> lock rmap btree for repairs -> iscan filesystem to
> > find rmap records -> iget/irele.
> 
> So this is just the magic empty transaction?

No, that's the fully featured repair transaction that will eventually be
used to write/commit the new rmap tree.

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle
  2024-04-11  4:56               ` Darrick J. Wong
@ 2024-04-11  5:02                 ` Christoph Hellwig
  2024-04-11  5:21                   ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-11  5:02 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, Ira Weiny

On Wed, Apr 10, 2024 at 09:56:45PM -0700, Darrick J. Wong wrote:
> > Well, someone needs to own it, it's just not just ext4 but could us.
> 
> Er... I don't understand this?        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If we set current->journal and take a page fault we could not just
recurse into ext4 but into any fs including XFS.  And everyone
blindly dereferences it as only one fs can own it.

> > > Alloc transaction -> lock rmap btree for repairs -> iscan filesystem to
> > > find rmap records -> iget/irele.
> > 
> > So this is just the magic empty transaction?
> 
> No, that's the fully featured repair transaction that will eventually be
> used to write/commit the new rmap tree.

That seems a bit dangerous to me.  I guess we rely on the code inside
the transaction context to never race with unmount as lack of SB_ACTIVE
will make the VFS ignore the dontcache flag.


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle
  2024-04-11  5:02                 ` Christoph Hellwig
@ 2024-04-11  5:21                   ` Darrick J. Wong
  2024-04-11 14:02                     ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-11  5:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, Ira Weiny

On Wed, Apr 10, 2024 at 10:02:23PM -0700, Christoph Hellwig wrote:
> On Wed, Apr 10, 2024 at 09:56:45PM -0700, Darrick J. Wong wrote:
> > > Well, someone needs to own it, it's just not just ext4 but could us.
> > 
> > Er... I don't understand this?        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
> If we set current->journal and take a page fault we could not just
> recurse into ext4 but into any fs including XFS.  And everyone
> blindly dereferences it as only one fs can own it.

Well back before we ripped it out I had said that XFS should just set
current->journal to 1 to prevent memory corruption but then Jan Kara
noted that ext4 changes its behavior wrt jbd2 if it sees nonzero
current->journal.  That's why Dave dropped it entirely.

> > > > Alloc transaction -> lock rmap btree for repairs -> iscan filesystem to
> > > > find rmap records -> iget/irele.
> > > 
> > > So this is just the magic empty transaction?
> > 
> > No, that's the fully featured repair transaction that will eventually be
> > used to write/commit the new rmap tree.
> 
> That seems a bit dangerous to me.  I guess we rely on the code inside
> the transaction context to never race with unmount as lack of SB_ACTIVE
> will make the VFS ignore the dontcache flag.

That and we have an open fd to call the ioctl so any unmount will fail,
and we can't enter scrub if unmount already started.

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle
  2024-04-11  5:21                   ` Darrick J. Wong
@ 2024-04-11 14:02                     ` Christoph Hellwig
  2024-04-12  0:21                       ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-11 14:02 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Christoph Hellwig, linux-xfs, Ira Weiny

On Wed, Apr 10, 2024 at 10:21:07PM -0700, Darrick J. Wong wrote:
> On Wed, Apr 10, 2024 at 10:02:23PM -0700, Christoph Hellwig wrote:
> > On Wed, Apr 10, 2024 at 09:56:45PM -0700, Darrick J. Wong wrote:
> > > > Well, someone needs to own it, it's just not just ext4 but could us.
> > > 
> > > Er... I don't understand this?        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > 
> > If we set current->journal and take a page fault we could not just
> > recurse into ext4 but into any fs including XFS.  And everyone
> > blindly dereferences it as only one fs can own it.
> 
> Well back before we ripped it out I had said that XFS should just set
> current->journal to 1 to prevent memory corruption but then Jan Kara
> noted that ext4 changes its behavior wrt jbd2 if it sees nonzero
> current->journal.  That's why Dave dropped it entirely.

If you are in a fs context you own current->journal_info.  But you
also must make sure to not copy from and especially to user to not
recurse into another file system.  A per-thread field can't work any
other way.  So what ext4 is doing here is perfectly fine.  What XFS
did was to set current->journal_info and then cause page faults, which
is not ok.  I'm glad we fixed it.

> > That seems a bit dangerous to me.  I guess we rely on the code inside
> > the transaction context to never race with unmount as lack of SB_ACTIVE
> > will make the VFS ignore the dontcache flag.
> 
> That and we have an open fd to call the ioctl so any unmount will fail,
> and we can't enter scrub if unmount already started.

Indeed.

So I'm still confused on why this new code keeps the inode around if an
error happened, but xchk_irele does not.  What is the benefit of keeping
the inode around here?  Why does it not apply to xchk_irele?

I also don't understand how d_mark_dontcache in
xfs_ioctl_setattr_prepare_dax is supposed to work.  It'll make the inode
go away quicker than without, but it can't force the inode by itself.

I'm also lost on the interaction of that with the scrub inodes due to
both above.  I'd still expect any scrub iget to set uncached for
a cache miss.  If we then need to keep the inode around in transaction
context we just keep it.  What is the benefit of playing racing
games with i_count to delay setting the dontcache flag until irele?
And why does the DAX mess matter for that?

Maybe I'm just thick and this is all obvious, but then it needs to
be documented in detailed comments.

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle
  2024-04-11 14:02                     ` Christoph Hellwig
@ 2024-04-12  0:21                       ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-12  0:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-xfs, Ira Weiny

On Thu, Apr 11, 2024 at 07:02:07AM -0700, Christoph Hellwig wrote:
> On Wed, Apr 10, 2024 at 10:21:07PM -0700, Darrick J. Wong wrote:
> > On Wed, Apr 10, 2024 at 10:02:23PM -0700, Christoph Hellwig wrote:
> > > On Wed, Apr 10, 2024 at 09:56:45PM -0700, Darrick J. Wong wrote:
> > > > > Well, someone needs to own it, it's just not just ext4 but could us.
> > > > 
> > > > Er... I don't understand this?        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> > > 
> > > If we set current->journal and take a page fault we could not just
> > > recurse into ext4 but into any fs including XFS.  And everyone
> > > blindly dereferences it as only one fs can own it.
> > 
> > Well back before we ripped it out I had said that XFS should just set
> > current->journal to 1 to prevent memory corruption but then Jan Kara
> > noted that ext4 changes its behavior wrt jbd2 if it sees nonzero
> > current->journal.  That's why Dave dropped it entirely.
> 
> If you are in a fs context you own current->journal_info.  But you
> also must make sure to not copy from and especially to user to not
> recurse into another file system.  A per-thread field can't work any
> other way.  So what ext4 is doing here is perfectly fine.  What XFS
> did was to set current->journal_info and then cause page faults, which
> is not ok.  I'm glad we fixed it.
> 
> > > That seems a bit dangerous to me.  I guess we rely on the code inside
> > > the transaction context to never race with unmount as lack of SB_ACTIVE
> > > will make the VFS ignore the dontcache flag.
> > 
> > That and we have an open fd to call the ioctl so any unmount will fail,
> > and we can't enter scrub if unmount already started.
> 
> Indeed.
> 
> So I'm still confused on why this new code keeps the inode around if an
> error happened, but xchk_irele does not.  What is the benefit of keeping
> the inode around here?  Why does it not apply to xchk_irele?

OH!  Crap, I forgot that some years ago (after the creation of the
vectorized scrub patch) I cleaned up that behavior -- previously scrub
actually did play games with clearing dontcache if the inode was sick.

Then Dave pointed out that we could just change reclaim not to purge the
incore inode (and hence preserve the health state) until unmount or the
fs goes down, and clear I_DONTCACHE any time we notice bad metadata.
Hopefully the incore inode then survives long enough that anyone
scanning for filesystem health status will still see the badness state.

Therefore, we don't need the set_dontcache variable in this patch:

	/*
	 * If we're holding the only reference to an inode opened via
	 * handle, mark it dontcache so that we don't pollute the cache.
	 */
	if (handle_ip) {
		if (atomic_read(&VFS_I(handle_ip)->i_count) == 1)
			d_mark_dontcache(VFS_I(handle_ip));
		xfs_irele(handle_ip);
	}

> I also don't understand how d_mark_dontcache in
> xfs_ioctl_setattr_prepare_dax is supposed to work.  It'll make the inode
> go away quicker than without, but it can't force the inode by itself.

That's correct.  You can set the ondisk fsdax iflag and then wait
centuries for the incore fsdax to catch up.  I think this is a very
marginal design, but thankfully the intended design is that you set
daxinherit on the parent dir or mount with dax=always and all new files
just come up with both fsdax flags set.

> I'm also lost on the interaction of that with the scrub inodes due to
> both above.  I'd still expect any scrub iget to set uncached for
> a cache miss.  If we then need to keep the inode around in transaction
> context we just keep it.  What is the benefit of playing racing
> games with i_count to delay setting the dontcache flag until irele?

One thing I don't like about XFS_IGET_DONTCACHE is that a concurrent
iget call without DONTCACHE won't clear the state from the inode, which
can lead to unnecessary evictions.  This racy thing means we only set it
if we think the inode isn't in use anywhere else.

At this point, though, I think we could add XFS_IGET_DONTCACHE to all
the iget calls and drop the irele dontcache thing.
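
To make the difference between the two policies concrete, here is a toy
userspace model.  It is only a sketch: struct toy_inode and every toy_*
helper are made up for illustration, none of this is the kernel's iget or
irele code, and locking, reclaim, and the dentry side are ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for an incore inode; NOT the kernel's struct inode. */
struct toy_inode {
	int	i_count;	/* active references */
	bool	cached;		/* present in the (toy) inode cache */
	bool	dontcache;	/* evict on last release */
};

/* Grab a reference.  The flag is only applied on a cache miss. */
static void toy_iget(struct toy_inode *ip, bool want_dontcache)
{
	if (!ip->cached) {
		ip->cached = true;
		ip->dontcache = want_dontcache;
	}
	/* Cache hit: a caller without DONTCACHE does not clear the flag. */
	ip->i_count++;
}

/* Drop a reference; a dontcache inode is evicted on the last release. */
static void toy_irele(struct toy_inode *ip)
{
	assert(ip->i_count > 0);
	if (--ip->i_count == 0 && ip->dontcache) {
		ip->cached = false;
		ip->dontcache = false;
	}
}

/* Release-time policy: only mark dontcache if we hold the sole reference. */
static void toy_scrub_irele(struct toy_inode *ip)
{
	if (ip->i_count == 1)
		ip->dontcache = true;
	toy_irele(ip);
}

/* Policy A: scrub sets the flag at iget; a regular user shares the inode. */
bool toy_policy_a_shared(void)
{
	struct toy_inode ino = { 0 };

	toy_iget(&ino, true);	/* scrub, cache miss */
	toy_iget(&ino, false);	/* regular user, cache hit */
	toy_irele(&ino);	/* scrub done */
	toy_irele(&ino);	/* regular user done */
	return ino.cached;
}

/* Policy B: scrub decides at release time; a regular user shares the inode. */
bool toy_policy_b_shared(void)
{
	struct toy_inode ino = { 0 };

	toy_iget(&ino, false);	/* scrub, cache miss */
	toy_iget(&ino, false);	/* regular user, cache hit */
	toy_scrub_irele(&ino);	/* i_count is 2, so the flag stays clear */
	toy_irele(&ino);	/* regular user done */
	return ino.cached;
}

/* Policy B again, but scrub is the only holder of the inode. */
bool toy_policy_b_alone(void)
{
	struct toy_inode ino = { 0 };

	toy_iget(&ino, false);
	toy_scrub_irele(&ino);	/* sole reference: mark and evict */
	return ino.cached;
}
```

In this model the shared inode is evicted under policy A even though a
regular user touched it, whereas policy B leaves it cached and only evicts
when scrub was the sole holder.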

> And why does the DAX mess matter for that?

fsdax doesn't matter, it merely slurped up the functionality and now
we're tangled up in it.

> Maybe I'm just thick and this is all obvious, but then it needs to
> be documented in detailed comments.

No, it's ... very twisty and weird.  I wish dontcache had remained our
private xfs thing.

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-10  1:00   ` [PATCH 27/32] xfs: Add parent pointer ioctls Darrick J. Wong
  2024-04-10  6:04     ` Christoph Hellwig
@ 2024-04-12 17:39     ` Darrick J. Wong
  2024-04-14  5:18       ` Christoph Hellwig
  1 sibling, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-12 17:39 UTC (permalink / raw)
  To: Allison Henderson, catherine.hoang, hch, linux-xfs

On Tue, Apr 09, 2024 at 06:00:33PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> This patch adds a pair of new file ioctls to retrieve the parent pointer
> of a given inode.  They both return the same results, but one operates
> on the file descriptor passed to ioctl() whereas the other allows the
> caller to specify a file handle for which the caller wants results.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> [djwong: adjust to new ondisk format, split ioctls]
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_fs.h     |   73 ++++++++++++
>  fs/xfs/libxfs/xfs_ondisk.h |    5 +
>  fs/xfs/libxfs/xfs_parent.c |   35 ++++++
>  fs/xfs/libxfs/xfs_parent.h |    5 +
>  fs/xfs/xfs_handle.c        |  259 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_handle.h        |    5 +
>  fs/xfs/xfs_ioctl.c         |    6 +
>  fs/xfs/xfs_trace.c         |    1 
>  fs/xfs/xfs_trace.h         |   92 ++++++++++++++++
>  9 files changed, 480 insertions(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index 51aa4774f57a2..fa28c18e521bf 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -840,6 +840,77 @@ struct xfs_commit_range {
>  					 XFS_EXCHANGE_RANGE_DRY_RUN | \
>  					 XFS_EXCHANGE_RANGE_FILE1_WRITTEN)
>  
> +/* Iterating parent pointers of files. */
> +
> +/* target was the root directory */
> +#define XFS_GETPARENTS_OFLAG_ROOT	(1U << 0)
> +
> +/* Cursor is done iterating pptrs */
> +#define XFS_GETPARENTS_OFLAG_DONE	(1U << 1)
> +
> +#define XFS_GETPARENTS_OFLAGS_ALL	(XFS_GETPARENTS_OFLAG_ROOT | \
> +					 XFS_GETPARENTS_OFLAG_DONE)
> +
> +#define XFS_GETPARENTS_IFLAGS_ALL	(0)
> +
> +struct xfs_getparents_rec {
> +	struct xfs_handle	gpr_parent; /* Handle to parent */
> +	__u16			gpr_reclen; /* Length of entire record */
> +	char			gpr_name[]; /* Null-terminated filename */
> +} __packed;
> +
> +/* Iterate through this file's directory parent pointers */
> +struct xfs_getparents {
> +	/*
> +	 * Structure to track progress in iterating the parent pointers.
> +	 * Must be initialized to zeroes before the first ioctl call, and
> +	 * not touched by callers after that.
> +	 */
> +	struct xfs_attrlist_cursor	gp_cursor;
> +
> +	/* Input flags: XFS_GETPARENTS_IFLAG* */
> +	__u16				gp_iflags;
> +
> +	/* Output flags: XFS_GETPARENTS_OFLAG* */
> +	__u16				gp_oflags;
> +
> +	/* Size of the gp_buffer in bytes */
> +	__u32				gp_bufsize;
> +
> +	/* Must be set to zero */
> +	__u64				__pad;
> +
> +	/* Pointer to a buffer in which to place xfs_getparents_rec */
> +	__u64				gp_buffer;
> +};
> +
> +static inline struct xfs_getparents_rec *
> +xfs_getparents_first_rec(struct xfs_getparents *gp)
> +{
> +	return (struct xfs_getparents_rec *)(uintptr_t)gp->gp_buffer;
> +}
> +
> +static inline struct xfs_getparents_rec *
> +xfs_getparents_next_rec(struct xfs_getparents *gp,
> +			struct xfs_getparents_rec *gpr)
> +{
> +	char *next = ((char *)gpr + gpr->gpr_reclen);
> +	char *end = (char *)(uintptr_t)(gp->gp_buffer + gp->gp_bufsize);
> +
> +	if (next >= end)
> +		return NULL;
> +
> +	return (struct xfs_getparents_rec *)next;
> +}
> +
> +/* Iterate through this file handle's directory parent pointers. */
> +struct xfs_getparents_by_handle {
> +	/* Handle to file whose parents we want. */
> +	struct xfs_handle		gph_handle;
> +
> +	struct xfs_getparents		gph_request;
> +};
> +
>  /*
>   * ioctl commands that are used by Linux filesystems
>   */
> @@ -875,6 +946,8 @@ struct xfs_commit_range {
>  /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
>  #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
>  #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
> +#define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_getparents)
> +#define XFS_IOC_GETPARENTS_BY_HANDLE _IOWR('X', 63, struct xfs_getparents_by_handle)
>  
>  /*
>   * ioctl commands that replace IRIX syssgi()'s
> diff --git a/fs/xfs/libxfs/xfs_ondisk.h b/fs/xfs/libxfs/xfs_ondisk.h
> index 25952ef584eee..34c972113d997 100644
> --- a/fs/xfs/libxfs/xfs_ondisk.h
> +++ b/fs/xfs/libxfs/xfs_ondisk.h
> @@ -156,6 +156,11 @@ xfs_check_ondisk_structs(void)
>  	XFS_CHECK_OFFSET(struct xfs_efi_log_format_32, efi_extents,	16);
>  	XFS_CHECK_OFFSET(struct xfs_efi_log_format_64, efi_extents,	16);
>  
> +	/* parent pointer ioctls */
> +	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents_rec,	26);
> +	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents,		40);
> +	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents_by_handle,	64);
> +
>  	/*
>  	 * The v5 superblock format extended several v4 header structures with
>  	 * additional data. While new fields are only accessible on v5
> diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
> index 86c808157294e..db8cfad0b968e 100644
> --- a/fs/xfs/libxfs/xfs_parent.c
> +++ b/fs/xfs/libxfs/xfs_parent.c
> @@ -259,3 +259,38 @@ xfs_parent_replacename(
>  	xfs_attr_defer_parent(&ppargs->args, XFS_ATTR_DEFER_REPLACE);
>  	return 0;
>  }
> +
> +/*
> + * Extract parent pointer information from any xattr into @parent_ino/gen.
> + * The last two parameters can be NULL pointers.
> + *
> + * Returns 1 if this is a valid parent pointer; 0 if this is not a parent
> + * pointer xattr at all; or -EFSCORRUPTED for garbage.
> + */
> +int
> +xfs_parent_from_xattr(
> +	struct xfs_mount	*mp,
> +	unsigned int		attr_flags,
> +	const unsigned char	*name,
> +	unsigned int		namelen,
> +	const void		*value,
> +	unsigned int		valuelen,
> +	xfs_ino_t		*parent_ino,
> +	uint32_t		*parent_gen)
> +{
> +	const struct xfs_parent_rec	*rec = value;
> +
> +	if (!(attr_flags & XFS_ATTR_PARENT))
> +		return 0;
> +
> +	if (!xfs_parent_namecheck(attr_flags, name, namelen))
> +		return -EFSCORRUPTED;
> +	if (!xfs_parent_valuecheck(mp, value, valuelen))
> +		return -EFSCORRUPTED;
> +
> +	if (parent_ino)
> +		*parent_ino = be64_to_cpu(rec->p_ino);
> +	if (parent_gen)
> +		*parent_gen = be32_to_cpu(rec->p_gen);
> +	return 1;
> +}
> diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
> index 768633b313671..3003ab496f854 100644
> --- a/fs/xfs/libxfs/xfs_parent.h
> +++ b/fs/xfs/libxfs/xfs_parent.h
> @@ -91,4 +91,9 @@ int xfs_parent_replacename(struct xfs_trans *tp,
>  		struct xfs_inode *new_dp, const struct xfs_name *new_name,
>  		struct xfs_inode *child);
>  
> +int xfs_parent_from_xattr(struct xfs_mount *mp, unsigned int attr_flags,
> +		const unsigned char *name, unsigned int namelen,
> +		const void *value, unsigned int valuelen,
> +		xfs_ino_t *parent_ino, uint32_t *parent_gen);
> +
>  #endif /* __XFS_PARENT_H__ */
> diff --git a/fs/xfs/xfs_handle.c b/fs/xfs/xfs_handle.c
> index abeca486a2c91..833b0d7d8bea1 100644
> --- a/fs/xfs/xfs_handle.c
> +++ b/fs/xfs/xfs_handle.c
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * Copyright (c) 2000-2005 Silicon Graphics, Inc.
> + * Copyright (c) 2022-2024 Oracle.
>   * All rights reserved.
>   */
>  #include "xfs.h"
> @@ -645,3 +646,261 @@ xfs_attrmulti_by_handle(
>  	dput(dentry);
>  	return error;
>  }
> +
> +struct xfs_getparents_ctx {
> +	struct xfs_attr_list_context	context;
> +	struct xfs_getparents_by_handle	gph;
> +
> +	/* File to target */
> +	struct xfs_inode		*ip;
> +
> +	/* Internal buffer where we format records */
> +	void				*krecords;
> +
> +	/* Last record filled out */
> +	struct xfs_getparents_rec	*lastrec;
> +
> +	unsigned int			count;
> +};
> +
> +static inline unsigned int
> +xfs_getparents_rec_sizeof(
> +	unsigned int		namelen)
> +{
> +	return round_up(sizeof(struct xfs_getparents_rec) + namelen + 1,
> +			sizeof(uint32_t));
> +}
> +
> +static void
> +xfs_getparents_put_listent(
> +	struct xfs_attr_list_context	*context,
> +	int				flags,
> +	unsigned char			*name,
> +	int				namelen,
> +	void				*value,
> +	int				valuelen)
> +{
> +	struct xfs_getparents_ctx	*gpx =
> +		container_of(context, struct xfs_getparents_ctx, context);
> +	struct xfs_inode		*ip = context->dp;
> +	struct xfs_mount		*mp = ip->i_mount;
> +	struct xfs_getparents		*gp = &gpx->gph.gph_request;
> +	struct xfs_getparents_rec	*gpr = gpx->krecords + context->firstu;
> +	unsigned short			reclen = xfs_getparents_rec_sizeof(namelen);
> +	xfs_ino_t			ino;
> +	uint32_t			gen;
> +	int				ret;
> +
> +	ret = xfs_parent_from_xattr(mp, flags, name, namelen, value, valuelen,
> +			&ino, &gen);
> +	if (ret < 0) {
> +		xfs_inode_mark_sick(ip, XFS_SICK_INO_PARENT);
> +		context->seen_enough = -EFSCORRUPTED;
> +		return;
> +	}
> +	if (ret != 1)
> +		return;
> +
> +	/*
> +	 * We found a parent pointer, but we've filled up the buffer.  Signal
> +	 * to the caller that we did /not/ reach the end of the parent pointer
> +	 * recordset.
> +	 */
> +	if (context->firstu > context->bufsize - reclen) {
> +		context->seen_enough = 1;
> +		return;
> +	}
> +
> +	/* Format the parent pointer directly into the caller buffer. */
> +	gpr->gpr_reclen = reclen;
> +	xfs_filehandle_init(mp, ino, gen, &gpr->gpr_parent);
> +	memcpy(gpr->gpr_name, name, namelen);
> +	gpr->gpr_name[namelen] = 0;
> +
> +	trace_xfs_getparents_put_listent(ip, gp, context, gpr);
> +
> +	context->firstu += reclen;
> +	gpx->count++;
> +	gpx->lastrec = gpr;
> +}
> +
> +/* Expand the last record to fill the rest of the caller's buffer. */
> +static inline void
> +xfs_getparents_expand_lastrec(
> +	struct xfs_getparents_ctx	*gpx)
> +{
> +	struct xfs_getparents		*gp = &gpx->gph.gph_request;
> +	struct xfs_getparents_rec	*gpr = gpx->lastrec;
> +
> +	if (!gpx->lastrec)
> +		gpr = gpx->krecords;
> +
> +	gpr->gpr_reclen = gp->gp_bufsize - ((void *)gpr - gpx->krecords);
> +
> +	trace_xfs_getparents_expand_lastrec(gpx->ip, gp, &gpx->context, gpr);
> +}
> +
> +static inline void __user *u64_to_uptr(u64 val)
> +{
> +	return (void __user *)(uintptr_t)val;
> +}
> +
> +/* Retrieve the parent pointers for a given inode. */
> +STATIC int
> +xfs_getparents(
> +	struct xfs_getparents_ctx	*gpx)
> +{
> +	struct xfs_getparents		*gp = &gpx->gph.gph_request;
> +	struct xfs_inode		*ip = gpx->ip;
> +	struct xfs_mount		*mp = ip->i_mount;
> +	size_t				bufsize;
> +	int				error;
> +
> +	/* Check size of buffer requested by user */
> +	if (gp->gp_bufsize > XFS_XATTR_LIST_MAX)
> +		return -ENOMEM;
> +	if (gp->gp_bufsize < xfs_getparents_rec_sizeof(1))
> +		return -EINVAL;
> +
> +	if (gp->gp_iflags & ~XFS_GETPARENTS_IFLAGS_ALL)
> +		return -EINVAL;
> +	if (gp->__pad)
> +		return -EINVAL;
> +
> +	bufsize = round_down(gp->gp_bufsize, sizeof(uint32_t));
> +	gpx->krecords = kvzalloc(bufsize, GFP_KERNEL);
> +	if (!gpx->krecords) {
> +		bufsize = min(bufsize, PAGE_SIZE);
> +		gpx->krecords = kvzalloc(bufsize, GFP_KERNEL);
> +		if (!gpx->krecords)
> +			return -ENOMEM;
> +	}
> +
> +	gpx->context.dp = ip;
> +	gpx->context.resynch = 1;
> +	gpx->context.put_listent = xfs_getparents_put_listent;
> +	gpx->context.bufsize = bufsize;
> +	/* firstu is used to track the bytes filled in the buffer */
> +	gpx->context.firstu = 0;
> +
> +	/* Copy the cursor provided by caller */
> +	memcpy(&gpx->context.cursor, &gp->gp_cursor,
> +			sizeof(struct xfs_attrlist_cursor));
> +	gpx->count = 0;
> +	gp->gp_oflags = 0;
> +
> +	trace_xfs_getparents_begin(ip, gp, &gpx->context.cursor);
> +
> +	error = xfs_attr_list(&gpx->context);
> +	if (error)
> +		goto out_free_buf;
> +	if (gpx->context.seen_enough < 0) {
> +		error = gpx->context.seen_enough;
> +		goto out_free_buf;
> +	}
> +	xfs_getparents_expand_lastrec(gpx);
> +
> +	/* Update the caller with the current cursor position */
> +	memcpy(&gp->gp_cursor, &gpx->context.cursor,
> +			sizeof(struct xfs_attrlist_cursor));
> +
> +	/* Is this the root directory? */
> +	if (ip->i_ino == mp->m_sb.sb_rootino)
> +		gp->gp_oflags |= XFS_GETPARENTS_OFLAG_ROOT;
> +
> +	if (gpx->context.seen_enough == 0) {
> +		/*
> +		 * If we did not run out of buffer space, then we reached the
> +		 * end of the pptr recordset, so set the DONE flag.
> +		 */
> +		gp->gp_oflags |= XFS_GETPARENTS_OFLAG_DONE;
> +	} else if (gpx->count == 0) {
> +		/*
> +		 * If we ran out of buffer space before copying any parent
> +		 * pointers at all, the caller's buffer was too short.  Tell
> +		 * userspace that, erm, the message is too long.
> +		 */
> +		error = -EMSGSIZE;
> +		goto out_free_buf;
> +	}
> +
> +	trace_xfs_getparents_end(ip, gp, &gpx->context.cursor);
> +
> +	ASSERT(gpx->context.firstu <= gpx->gph.gph_request.gp_bufsize);
> +
> +	/* Copy the records to userspace. */
> +	if (copy_to_user(u64_to_uptr(gpx->gph.gph_request.gp_buffer),
> +				gpx->krecords, gpx->context.firstu))
> +		error = -EFAULT;
> +
> +out_free_buf:
> +	kvfree(gpx->krecords);
> +	gpx->krecords = NULL;
> +	return error;
> +}
> +
> +/* Retrieve the parents of this file and pass them back to userspace. */
> +int
> +xfs_ioc_getparents(
> +	struct file			*file,
> +	struct xfs_getparents __user	*ureq)
> +{
> +	struct xfs_getparents_ctx	gpx = {
> +		.ip			= XFS_I(file_inode(file)),
> +	};
> +	struct xfs_getparents		*kreq = &gpx.gph.gph_request;
> +	struct xfs_mount		*mp = gpx.ip->i_mount;
> +	int				error;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +	if (!xfs_has_parent(mp))
> +		return -EOPNOTSUPP;
> +	if (copy_from_user(kreq, ureq, sizeof(*kreq)))
> +		return -EFAULT;
> +
> +	error = xfs_getparents(&gpx);
> +	if (error)
> +		return error;
> +
> +	if (copy_to_user(ureq, kreq, sizeof(*kreq)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +/* Retrieve the parents of this file handle and pass them back to userspace. */
> +int
> +xfs_ioc_getparents_by_handle(
> +	struct file			*file,
> +	struct xfs_getparents_by_handle __user	*ureq)
> +{
> +	struct xfs_getparents_ctx	gpx = { };
> +	struct xfs_inode		*ip = XFS_I(file_inode(file));
> +	struct xfs_mount		*mp = ip->i_mount;
> +	struct xfs_getparents_by_handle	*kreq = &gpx.gph;
> +	struct dentry			*dentry;
> +	int				error;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +	if (!xfs_has_parent(mp))
> +		return -EOPNOTSUPP;
> +	if (copy_from_user(kreq, ureq, sizeof(*kreq)))
> +		return -EFAULT;
> +
> +	dentry = xfs_khandle_to_dentry(file, &kreq->gph_handle);

I noticed a couple of things while doing more testing here -- first,
xfs_khandle_to_dentry doesn't check that the handle fsid actually
matches this filesystem, and AFAICT *nothing* actually checks that.
So I guess that's a longstanding weakness of handle validation, and we
probably haven't gotten any reports because what's the chance that
you'll get lucky with an ino/gen from a different filesystem?

(Pretty good what with golden images proliferating, I'd say...)

The second thing is that exportfs_decode_fh does too much work here --
if the handle references a directory, it'll walk up the directory tree
to the root to try to reconnect the dentry paths.  For GETPARENTS we
don't care about that since we're not doing anything with dentries.
Walking upwards in the directory tree is extra work that doesn't change
the results.

Worse yet, if there's a loop in the directory tree due to dotdot
damage or whatnot, this can livelock the system.  This is unfortunate
for xfs_scrub because it'll use GETPARENTS to try to report the path of
a file that it wants to repair ... but it might not have checked those
parents.  So I really don't want GETPARENTS to be creating a bunch of
dentries and reconnecting paths and whatnot.

What we really want, I think, is some basic handle validation and then a
call to something like xfs_nfs_get_inode.


	if (!S_ISDIR(VFS_I(ip)->i_mode))
		return -ENOTDIR;

	if (memcmp(&handle->ha_fsid, mp->m_fixedfsid, sizeof(struct xfs_fsid)))
		return -ESTALE;

	if (handle->ha_fid.fid_len != xfs_filehandle_fid_len())
		return -EINVAL;

	inode = xfs_nfs_get_inode(mp->m_super, handle->ha_fid.fid_ino,
			handle->ha_fid.fid_gen);
	if (IS_ERR(inode))
		return PTR_ERR(inode);

	gpx.ip = XFS_I(inode);
	error = xfs_getparents(&gpx);

And yes, I'll add a big comment explaining why we don't use the regular
handle functions here.

--D

> +	if (IS_ERR(dentry))
> +		return PTR_ERR(dentry);
> +
> +	gpx.ip = XFS_I(dentry->d_inode);
> +	error = xfs_getparents(&gpx);
> +	dput(dentry);
> +	if (error)
> +		return error;
> +
> +	if (copy_to_user(ureq, kreq, sizeof(*kreq)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> diff --git a/fs/xfs/xfs_handle.h b/fs/xfs/xfs_handle.h
> index e39eaf4689da9..6799a86d8565c 100644
> --- a/fs/xfs/xfs_handle.h
> +++ b/fs/xfs/xfs_handle.h
> @@ -1,6 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  /*
>   * Copyright (c) 2000-2005 Silicon Graphics, Inc.
> + * Copyright (c) 2022-2024 Oracle.
>   * All rights reserved.
>   */
>  #ifndef	__XFS_HANDLE_H__
> @@ -25,4 +26,8 @@ int xfs_ioc_attr_list(struct xfs_inode *dp, void __user *ubuf,
>  struct dentry *xfs_handle_to_dentry(struct file *parfilp, void __user *uhandle,
>  		u32 hlen);
>  
> +int xfs_ioc_getparents(struct file *file, struct xfs_getparents __user *arg);
> +int xfs_ioc_getparents_by_handle(struct file *file,
> +		struct xfs_getparents_by_handle __user *arg);
> +
>  #endif	/* __XFS_HANDLE_H__ */
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 7b347cdd28785..c7a15b5f33aa4 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -35,6 +35,7 @@
>  #include "xfs_health.h"
>  #include "xfs_reflink.h"
>  #include "xfs_ioctl.h"
> +#include "xfs_xattr.h"
>  #include "xfs_rtbitmap.h"
>  #include "xfs_file.h"
>  #include "xfs_exchrange.h"
> @@ -1542,7 +1543,10 @@ xfs_file_ioctl(
>  
>  	case XFS_IOC_FSGETXATTRA:
>  		return xfs_ioc_fsgetxattra(ip, arg);
> -
> +	case XFS_IOC_GETPARENTS:
> +		return xfs_ioc_getparents(filp, arg);
> +	case XFS_IOC_GETPARENTS_BY_HANDLE:
> +		return xfs_ioc_getparents_by_handle(filp, arg);
>  	case XFS_IOC_GETBMAP:
>  	case XFS_IOC_GETBMAPA:
>  	case XFS_IOC_GETBMAPX:
> diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
> index cf92a3bd56c79..9c7fbaae2717d 100644
> --- a/fs/xfs/xfs_trace.c
> +++ b/fs/xfs/xfs_trace.c
> @@ -41,6 +41,7 @@
>  #include "xfs_bmap.h"
>  #include "xfs_exchmaps.h"
>  #include "xfs_exchrange.h"
> +#include "xfs_parent.h"
>  
>  /*
>   * We include this last to have the helpers above available for the trace
> diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
> index e6cbdffb14f64..4438b62a8c562 100644
> --- a/fs/xfs/xfs_trace.h
> +++ b/fs/xfs/xfs_trace.h
> @@ -87,6 +87,9 @@ struct xfs_bmap_intent;
>  struct xfs_exchmaps_intent;
>  struct xfs_exchmaps_req;
>  struct xfs_exchrange;
> +struct xfs_getparents;
> +struct xfs_parent_irec;
> +struct xfs_attrlist_cursor_kern;
>  
>  #define XFS_ATTR_FILTER_FLAGS \
>  	{ XFS_ATTR_ROOT,	"ROOT" }, \
> @@ -5158,6 +5161,95 @@ TRACE_EVENT(xfs_exchmaps_delta_nextents,
>  		  __entry->d_nexts1, __entry->d_nexts2)
>  );
>  
> +DECLARE_EVENT_CLASS(xfs_getparents_rec_class,
> +	TP_PROTO(struct xfs_inode *ip, const struct xfs_getparents *ppi,
> +		 const struct xfs_attr_list_context *context,
> +	         const struct xfs_getparents_rec *pptr),
> +	TP_ARGS(ip, ppi, context, pptr),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_ino_t, ino)
> +		__field(unsigned int, firstu)
> +		__field(unsigned short, reclen)
> +		__field(unsigned int, bufsize)
> +		__field(xfs_ino_t, parent_ino)
> +		__field(unsigned int, parent_gen)
> +		__string(name, pptr->gpr_name)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = ip->i_mount->m_super->s_dev;
> +		__entry->ino = ip->i_ino;
> +		__entry->firstu = context->firstu;
> +		__entry->reclen = pptr->gpr_reclen;
> +		__entry->bufsize = ppi->gp_bufsize;
> +		__entry->parent_ino = pptr->gpr_parent.ha_fid.fid_ino;
> +		__entry->parent_gen = pptr->gpr_parent.ha_fid.fid_gen;
> +		__assign_str(name, pptr->gpr_name);
> +	),
> +	TP_printk("dev %d:%d ino 0x%llx firstu %u reclen %u bufsize %u parent_ino 0x%llx parent_gen 0x%x name '%s'",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->ino,
> +		  __entry->firstu,
> +		  __entry->reclen,
> +		  __entry->bufsize,
> +		  __entry->parent_ino,
> +		  __entry->parent_gen,
> +		  __get_str(name))
> +)
> +#define DEFINE_XFS_GETPARENTS_REC_EVENT(name) \
> +DEFINE_EVENT(xfs_getparents_rec_class, name, \
> +	TP_PROTO(struct xfs_inode *ip, const struct xfs_getparents *ppi, \
> +		 const struct xfs_attr_list_context *context, \
> +	         const struct xfs_getparents_rec *pptr), \
> +	TP_ARGS(ip, ppi, context, pptr))
> +DEFINE_XFS_GETPARENTS_REC_EVENT(xfs_getparents_put_listent);
> +DEFINE_XFS_GETPARENTS_REC_EVENT(xfs_getparents_expand_lastrec);
> +
> +DECLARE_EVENT_CLASS(xfs_getparents_class,
> +	TP_PROTO(struct xfs_inode *ip, const struct xfs_getparents *ppi,
> +		 const struct xfs_attrlist_cursor_kern *cur),
> +	TP_ARGS(ip, ppi, cur),
> +	TP_STRUCT__entry(
> +		__field(dev_t, dev)
> +		__field(xfs_ino_t, ino)
> +		__field(unsigned short, iflags)
> +		__field(unsigned short, oflags)
> +		__field(unsigned int, bufsize)
> +		__field(unsigned int, hashval)
> +		__field(unsigned int, blkno)
> +		__field(unsigned int, offset)
> +		__field(int, initted)
> +	),
> +	TP_fast_assign(
> +		__entry->dev = ip->i_mount->m_super->s_dev;
> +		__entry->ino = ip->i_ino;
> +		__entry->iflags = ppi->gp_iflags;
> +		__entry->oflags = ppi->gp_oflags;
> +		__entry->bufsize = ppi->gp_bufsize;
> +		__entry->hashval = cur->hashval;
> +		__entry->blkno = cur->blkno;
> +		__entry->offset = cur->offset;
> +		__entry->initted = cur->initted;
> +	),
> +	TP_printk("dev %d:%d ino 0x%llx iflags 0x%x oflags 0x%x bufsize %u cur_init? %d hashval 0x%x blkno %u offset %u",
> +		  MAJOR(__entry->dev), MINOR(__entry->dev),
> +		  __entry->ino,
> +		  __entry->iflags,
> +		  __entry->oflags,
> +		  __entry->bufsize,
> +		  __entry->initted,
> +		  __entry->hashval,
> +		  __entry->blkno,
> +		  __entry->offset)
> +)
> +#define DEFINE_XFS_GETPARENTS_EVENT(name) \
> +DEFINE_EVENT(xfs_getparents_class, name, \
> +	TP_PROTO(struct xfs_inode *ip, const struct xfs_getparents *ppi, \
> +		 const struct xfs_attrlist_cursor_kern *cur), \
> +	TP_ARGS(ip, ppi, cur))
> +DEFINE_XFS_GETPARENTS_EVENT(xfs_getparents_begin);
> +DEFINE_XFS_GETPARENTS_EVENT(xfs_getparents_end);
> +
>  #endif /* _TRACE_XFS_H */
>  
>  #undef TRACE_INCLUDE_PATH
> 
> 

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-12 17:39     ` Darrick J. Wong
@ 2024-04-14  5:18       ` Christoph Hellwig
  2024-04-15 19:40         ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-14  5:18 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, hch, linux-xfs

[full quote deleted.  It took me about a minute of scrolling to find
the actual contents, *sigh*]

On Fri, Apr 12, 2024 at 10:39:57AM -0700, Darrick J. Wong wrote:
> I noticed a couple of things while doing more testing here -- first,
> xfs_khandle_to_dentry doesn't check that the handle fsid actually
> matches this filesystem, and AFAICT *nothing* actually checks that.

Yes.  Userspace better have resolved that, as the ioctl only works
on the given file system, so libhandle has to resolve it before
even calling the ioctl.

> So I guess that's a longstanding weakness of handle validation, and we
> probably haven't gotten any reports because what's the chance that
> you'll get lucky with an ino/gen from a different filesystem?

Not really, see above.

> The second thing is that exportfs_decode_fh does too much work here --
> if the handle references a directory, it'll walk up the directory tree
> to the root to try to reconnect the dentry paths.  For GETPARENTS we
> don't care about that since we're not doing anything with dentries.
> Walking upwards in the directory tree is extra work that doesn't change
> the results.

In theory no one cares as all operations work just fine with disconnected
dentries, and exportfs_decode_fh doesn't do these checks unless the
acceptable parameter is passed to it.  The real question is why we (which
in this case means 15 years younger me) decided back then we want this
checking for XFS handle operations?  I can't really think of one
right now..


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-14  5:18       ` Christoph Hellwig
@ 2024-04-15 19:40         ` Darrick J. Wong
  2024-04-16  4:47           ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-15 19:40 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Allison Henderson, catherine.hoang, linux-xfs

On Sun, Apr 14, 2024 at 07:18:16AM +0200, Christoph Hellwig wrote:
> [full quote deleted.  It took me about a minute of scrolling to find
> the actual contents, *sigh*]
> 
> On Fri, Apr 12, 2024 at 10:39:57AM -0700, Darrick J. Wong wrote:
> > I noticed a couple of things while doing more testing here -- first,
> > xfs_khandle_to_dentry doesn't check that the handle fsid actually
> > matches this filesystem, and AFAICT *nothing* actually checks that.
> 
> Yes.  Userspace better have resolved that, as the ioctl only works
> on the given file system, so libhandle has to resolve it before
> even calling the ioctl.

True, libhandle is a very nice wrapper for the kernel ioctls.  I wish
Linux projects did that more often.  But suppose you're calling the
ioctls directly without libhandle and mess it up?

> > So I guess that's a longstanding weakness of handle validation, and we
> > probably haven't gotten any reports because what's the chance that
> > you'll get lucky with an ino/gen from a different filesystem?
> 
> Not really, see above.
> 
> > The second thing is that exportfs_decode_fh does too much work here --
> > if the handle references a directory, it'll walk up the directory tree
> > to the root to try to reconnect the dentry paths.  For GETPARENTS we
> > don't care about that since we're not doing anything with dentries.
> > Walking upwards in the directory tree is extra work that doesn't change
> > the results.
> 
> In theory no one cares as all operations work just fine with disconnected
> dentries, and exportfs_decode_fh doesn't do these checks unless the
> acceptable parameter is passed to it.  The real question is why we (which
> in this case means 15 years younger me) decided back then we want this
> checking for XFS handle operations?  I can't really think of one
> right now..

Me neither.  Though at this point there are a lot of filesystems that
implement ->get_parent, so I think removing XFS's will need a discussion
at least on linux-xfs, if not fsdevel.  In the meantime, getparents can
do minimal validation + iget for now and if it makes sense to port it
back to xfs_khandle_to_dentry, I can do that easily.

(FWIW turning off reconnection would likely fix some of the annoying
behaviors of xfs_scrub where it tries to open a dir to scan it and then
sprays dmesg with errors from unrelated parents as it stumbles over
reconnection only to fail the open, at which point it falls back to
scrubbing by handle anyway.)

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-15 19:40         ` Darrick J. Wong
@ 2024-04-16  4:47           ` Christoph Hellwig
  2024-04-16 16:50             ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-16  4:47 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Allison Henderson, catherine.hoang, linux-xfs

On Mon, Apr 15, 2024 at 12:40:36PM -0700, Darrick J. Wong wrote:
> True, libhandle is a very nice wrapper for the kernel ioctls.  I wish
> Linux projects did that more often.  But suppose you're calling the
> ioctls directly without libhandle and mess it up?

Then you get different inodes back.  Not really any different from
pointing your path name based code to the wrong fs or directory,
is it?

> > In theory no one cares as all operations work just fine with disconnected
> > dentries, and exportfs_decode_fh doesn't do these checks unless the
> > acceptable parameter is passed to it.  The real question is why we (which
> > in this case means 15 years younger me) decided back then we want this
> > checking for XFS handle operations?  I can't really think of one
> > right now..
> 
> Me neither.  Though at this point there are a lot of filesystems that
> implement ->get_parent, so I think removing XFS's will need a discussion
> at least on linux-xfs, if not fsdevel.  In the meantime, getparents can
> do minimal validation + iget for now and if it makes sense to port it
> back to xfs_khandle_to_dentry, I can do that easily.

Uhh, I'm not advocating for removing ->get_parent at all.  We actually
do need that for security on NFS, where the file handles are used
underneath pathname-based operations.

And it turns out my previous analysis wasn't quite spot on.  The
exportfs code always reconnects directories, because we basically
have to, not connecting them would make the VFS locking scheme
not work.

But as we never generate the file handles that encode the parent
we already never connect files to their parent directory anyway.


OTOH we should be able to optimize ->get_parent a bit with parent
pointers, as we can find the name in the parent directory for
a directory instead of doing linear scans in the parent directory.
(for non-directory files we currently don't fully connect anyway)

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-16  4:47           ` Christoph Hellwig
@ 2024-04-16 16:50             ` Darrick J. Wong
  2024-04-16 16:54               ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-16 16:50 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Allison Henderson, catherine.hoang, linux-xfs

On Tue, Apr 16, 2024 at 06:47:16AM +0200, Christoph Hellwig wrote:
> On Mon, Apr 15, 2024 at 12:40:36PM -0700, Darrick J. Wong wrote:
> > True, libhandle is a very nice wrapper for the kernel ioctls.  I wish
> > Linux projects did that more often.  But suppose you're calling the
> > ioctls directly without libhandle and mess it up?
> 
> Then you get different inodes back.  Not really any different from
> pointing your path name based code to the wrong fs or directory,
> is it?

I suppose not.  But why bother setting the fsid at all, then?

> > > In theory no one cares as all operations work just fine with disconnected
> > > dentries, and exportfs_decode_fh doesn't do these checks unless the
> > > acceptable parameter is passed to it.  The real question is why we (which
> > > in this case means 15 years younger me) decided back then we want this
> > > checking for XFS handle operations?  I can't really think of one
> > > right now..
> > 
> > Me neither.  Though at this point there are a lot of filesystems that
> > implement ->get_parent, so I think removing XFS's will need a discussion
> > at least on linux-xfs, if not fsdevel.  In the meantime, getparents can
> > do minimal validation + iget for now and if it makes sense to port it
> > back to xfs_khandle_to_dentry, I can do that easily.
> 
> Uhh, I'm not advocating for removing ->get_parent at all.  We actually
> do need that for security on NFS, where the file handles are used
> underneath pathname-based operations.

Ahh, I wasn't aware of that, beyond a sense that "a lot of
NFS-exportable fses do this, so there's likely a general desire for this
to be wired up."

> And it turns out my previous analysis wasn't quite spot on.  The
> exportfs code always reconnects directories, because we basically
> have to, not connecting them would make the VFS locking scheme
> not work.

Noted.

> But as we never generate the file handles that encode the parent
> we already never connect files to their parent directory anyway.

I pondered whether or not we should encode parent info in a regular
file's handle.  Would that result in an invalid handle if the file gets
moved to another directory?  That doesn't seem to fit with the behavior
that fds remain attached to the file even if it gets moved/deleted.

> OTOH we should be able to optimize ->get_parent a bit with parent
> pointers, as we can find the name in the parent directory for
> a directory instead of doing linear scans in the parent directory.
> (for non-directory files we currently don't fully connect anyway)

<nod> But does exportfs actually want parent info for a nondirectory?
There aren't any stubs or XXX/FIXME comments, and I've never heard any
calls (at least on fsdevel) for that functionality.

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-16 16:50             ` Darrick J. Wong
@ 2024-04-16 16:54               ` Christoph Hellwig
  2024-04-16 18:52                 ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-16 16:54 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Allison Henderson, catherine.hoang, linux-xfs

On Tue, Apr 16, 2024 at 09:50:56AM -0700, Darrick J. Wong wrote:
> On Tue, Apr 16, 2024 at 06:47:16AM +0200, Christoph Hellwig wrote:
> > On Mon, Apr 15, 2024 at 12:40:36PM -0700, Darrick J. Wong wrote:
> > > True, libhandle is a very nice wrapper for the kernel ioctls.  I wish
> > > Linux projects did that more often.  But suppose you're calling the
> > > ioctls directly without libhandle and mess it up?
> > 
> > > Then you get different inodes back.  Not really any different from
> > pointing your path name based code to the wrong fs or directory,
> > is it?
> 
> I suppose not.  But why bother setting the fsid at all, then?

I suspect that's a leftover from IRIX where the by handle operations
weren't ioctls tied to a specific file system.

> > But as we never generate the file handles that encode the parent
> > we already never connect files to their parent directory anyway.
> 
> I pondered whether or not we should encode parent info in a regular
> file's handle.

We shouldn't.  It's really an NFS thing.


> Would that result in an invalid handle if the file gets
> moved to another directory?

Yes.

> That doesn't seem to fit with the behavior
> that fds remain attached to the file even if it gets moved/deleted.

Exactly.

> 
> > OTOH we should be able to optimize ->get_parent a bit with parent
> > pointers, as we can find the name in the parent directory for
> > a directory instead of doing linear scans in the parent directory.
> > (for non-directory files we currently don't fully connect anyway)
> 
> <nod> But does exportfs actually want parent info for a nondirectory?
> There aren't any stubs or XXX/FIXME comments, and I've never heard any
> calls (at least on fsdevel) for that functionality.

It doesn't.  It would avoid having disconnected dentries, but
disconnected non-directory dentries aren't really a problem.

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-16 16:54               ` Christoph Hellwig
@ 2024-04-16 18:52                 ` Darrick J. Wong
  2024-04-16 19:01                   ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-16 18:52 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, Allison Henderson, catherine.hoang, linux-xfs

On Tue, Apr 16, 2024 at 09:54:14AM -0700, Christoph Hellwig wrote:
> On Tue, Apr 16, 2024 at 09:50:56AM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 16, 2024 at 06:47:16AM +0200, Christoph Hellwig wrote:
> > > On Mon, Apr 15, 2024 at 12:40:36PM -0700, Darrick J. Wong wrote:
> > > > True, libhandle is a very nice wrapper for the kernel ioctls.  I wish
> > > > Linux projects did that more often.  But suppose you're calling the
> > > > ioctls directly without libhandle and mess it up?
> > > 
> > > > Then you get different inodes back.  Not really any different from
> > > pointing your path name based code to the wrong fs or directory,
> > > is it?
> > 
> > I suppose not.  But why bother setting the fsid at all, then?
> 
> I suspect that's a leftover from IRIX where the by handle operations
> weren't ioctls tied to a specific file system.

Oh, so on Irix a program could call the kernel with *only* the handle
and no fd?  I wasn't aware of that, but most of my exposure to Irix was
wowwwing over the 3D File Explorer in _Jurassic Park_ and later an old
Indigo that someone donated to the high school. ;)

Ok, I'll drop the fsid checking code entirely.

> > > But as we never generate the file handles that encode the parent
> > > we already never connect files to their parent directory anyway.
> > 
> > I pondered whether or not we should encode parent info in a regular
> > file's handle.
> 
> We shouldn't.  It's really an NFS thing.
> 
> 
> > Would that result in an invalid handle if the file gets
> > moved to another directory?
> 
> Yes.
> 
> > That doesn't seem to fit with the behavior
> > that fds remain attached to the file even if it gets moved/deleted.
> 
> Exactly.

<nod>

> > 
> > > OTOH we should be able to optimize ->get_parent a bit with parent
> > > pointers, as we can find the name in the parent directory for
> > > a directory instead of doing linear scans in the parent directory.
> > > (for non-directory files we currently don't fully connect anyway)
> > 
> > <nod> But does exportfs actually want parent info for a nondirectory?
> > There aren't any stubs or XXX/FIXME comments, and I've never heard any
> > calls (at least on fsdevel) for that functionality.
> 
> It doesn't.  It would avoid having disconnected dentries, but
> disconnected non-directory dentries aren't really a problem.

For directories, I think the dotdot lookup is much cheaper than scanning
the attrs to find the first nongarbage XFS_ATTR_PARENT entry.

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-16 18:52                 ` Darrick J. Wong
@ 2024-04-16 19:01                   ` Christoph Hellwig
  2024-04-16 19:07                     ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-16 19:01 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Christoph Hellwig, Allison Henderson,
	catherine.hoang, linux-xfs

On Tue, Apr 16, 2024 at 11:52:09AM -0700, Darrick J. Wong wrote:
> > > <nod> But does exportfs actually want parent info for a nondirectory?
> > > There aren't any stubs or XXX/FIXME comments, and I've never heard any
> > > calls (at least on fsdevel) for that functionality.
> > 
> > It doesn't.  It would avoid having disconnected dentries, but
> > disconnected non-directory dentries aren't really a problem.
> 
> For directories, I think the dotdot lookup is much cheaper than scanning
> the attrs to find the first nongarbage XFS_ATTR_PARENT entry.

It is.

But I was confused again, it's been a while since I worked on that code..

We do the full reconnection for non-directories if NFSD asks for it (the
XFS or VFS handle code won't hit this because our acceptable callback
always returns true).   That code does a readdir on the parent and
returns the name when it finds the inode number.  For files without
a crazy number of hardlinks, just looking over the parent pointers would
be a lot more efficient for that.
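
The two lookup strategies contrasted above can be sketched in userspace
C.  This is a toy model under stated assumptions, not the exportfs or XFS
implementation: the structures and both function names are hypothetical,
and real parent pointers store a handle to the parent rather than a bare
inode number.  The scan costs one step per entry in the parent directory,
while the parent-pointer walk costs one step per hardlink of the child:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical directory entry: (inode number, name) pair. */
struct toy_dirent {
	uint64_t		ino;
	const char		*name;
};

/* Hypothetical parent directory holding an array of entries. */
struct toy_parent_dir {
	uint64_t		ino;
	const struct toy_dirent	*ents;
	size_t			nr_ents;
};

/* Hypothetical parent pointer stored on the child: (parent, name). */
struct toy_pptr {
	uint64_t		parent_ino;
	const char		*name;
};

/* Readdir-style lookup: cost grows with the size of the parent directory. */
static const char *toy_name_by_scan(const struct toy_parent_dir *parent,
				    uint64_t child_ino)
{
	for (size_t i = 0; i < parent->nr_ents; i++)
		if (parent->ents[i].ino == child_ino)
			return parent->ents[i].name;
	return NULL;
}

/* Parent-pointer lookup: cost grows with the child's hardlink count. */
static const char *toy_name_by_pptr(const struct toy_pptr *pptrs, size_t nr,
				    uint64_t parent_ino)
{
	for (size_t i = 0; i < nr; i++)
		if (pptrs[i].parent_ino == parent_ino)
			return pptrs[i].name;
	return NULL;
}
```

For a singly linked file in a directory with a million entries, the
second function inspects one record where the first may inspect them all.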


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-16 19:01                   ` Christoph Hellwig
@ 2024-04-16 19:07                     ` Darrick J. Wong
  2024-04-16 19:14                       ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-16 19:07 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, Allison Henderson, catherine.hoang, linux-xfs

On Tue, Apr 16, 2024 at 12:01:52PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 16, 2024 at 11:52:09AM -0700, Darrick J. Wong wrote:
> > > > <nod> But does exportfs actually want parent info for a nondirectory?
> > > > There aren't any stubs or XXX/FIXME comments, and I've never heard any
> > > > calls (at least on fsdevel) for that functionality.
> > > 
> > > It doesn't.  It would avoid having disconnected dentries, but
> > > disconnected non-directory dentries aren't really a problem.
> > 
> > For directories, I think the dotdot lookup is much cheaper than scanning
> > the attrs to find the first nongarbage XFS_ATTR_PARENT entry.
> 
> It is.
> 
> But I was confused again, it's been a while since I worked on that code..
> 
> We do the full reconnection for non-directories if NFSD asks for it (the
> XFS or VFS handle code won't hit this because our acceptable callback
> always returns true).   That code does a readdir on the parent and
> returns the name when it finds the inode number.  For files without
> a crazy number of hardlinks, just looking over the parent pointers would
> be a lot more efficient for that.

Ohhhh, does that happen outside of XFS then?  No wonder I couldn't find
what you were talking about.  Ok I'll go look some more.

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-16 19:07                     ` Darrick J. Wong
@ 2024-04-16 19:14                       ` Christoph Hellwig
  2024-04-17  5:22                         ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-16 19:14 UTC (permalink / raw)
  To: Darrick J. Wong
  Cc: Christoph Hellwig, Christoph Hellwig, Allison Henderson,
	catherine.hoang, linux-xfs

On Tue, Apr 16, 2024 at 12:07:33PM -0700, Darrick J. Wong wrote:
> Ohhhh, does that happen outside of XFS then?  No wonder I couldn't find
> what you were talking about.  Ok I'll go look some more.

Yes. get_name() in fs/exportfs/expfs.c.


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-16 19:14                       ` Christoph Hellwig
@ 2024-04-17  5:22                         ` Darrick J. Wong
  2024-04-17  5:29                           ` Christoph Hellwig
  0 siblings, 1 reply; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-17  5:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Christoph Hellwig, Allison Henderson, catherine.hoang, linux-xfs

On Tue, Apr 16, 2024 at 12:14:24PM -0700, Christoph Hellwig wrote:
> On Tue, Apr 16, 2024 at 12:07:33PM -0700, Darrick J. Wong wrote:
> > Ohhhh, does that happen outside of XFS then?  No wonder I couldn't find
> > what you were talking about.  Ok I'll go look some more.
> 
> Yes. get_name() in fs/exportfs/expfs.c.

Hmm.  Implementing a custom ->get_name for pptrs would work well for
child files with low hardlink counts.  Certainly there are probably a
lot more large directories than files that are hardlinked many many
times.  At what point does it become cheaper to scan the directory?

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-17  5:22                         ` Darrick J. Wong
@ 2024-04-17  5:29                           ` Christoph Hellwig
  2024-04-17  5:55                             ` Darrick J. Wong
  0 siblings, 1 reply; 234+ messages in thread
From: Christoph Hellwig @ 2024-04-17  5:29 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, catherine.hoang, linux-xfs

On Tue, Apr 16, 2024 at 10:22:45PM -0700, Darrick J. Wong wrote:
> On Tue, Apr 16, 2024 at 12:14:24PM -0700, Christoph Hellwig wrote:
> > On Tue, Apr 16, 2024 at 12:07:33PM -0700, Darrick J. Wong wrote:
> > > Ohhhh, does that happen outside of XFS then?  No wonder I couldn't find
> > > what you were talking about.  Ok I'll go look some more.
> > 
> > Yes. get_name() in fs/exportfs/expfs.c.
> 
> Hmm.  Implementing a custom ->get_name for pptrs would work well for
> child files with low hardlink counts.  Certainly there are probably a
> lot more large directories than files that are hardlinked many many
> times.  At what point does it become cheaper to scan the directory?

Note that despite my previous confusion get_name is also called for
directories to find the actual name they have in their parent.

An easy conservative choice would be to always look at the parent
pointers for nlink==1.
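
As a rough sketch of that conservative choice (a hypothetical helper, not
an existing kernel function; the enum and function names are assumptions
made for illustration), the decision reduces to a single test on the
link count:

```c
#include <stdint.h>

/* Hypothetical strategies for finding a child's name in its parent. */
enum toy_name_strategy {
	TOY_SCAN_PARENT_DIR,		/* default readdir-style scan */
	TOY_WALK_PARENT_POINTERS,	/* read the child's parent pointers */
};

/*
 * Conservative policy: with exactly one link there is exactly one
 * parent pointer to read, so that path cannot lose to a directory
 * scan.  Anything else falls back to the existing scan.
 */
static enum toy_name_strategy toy_pick_strategy(uint32_t nlink)
{
	return nlink == 1 ? TOY_WALK_PARENT_POINTERS : TOY_SCAN_PARENT_DIR;
}
```

Where the crossover lies for files with more links is left open here,
just as it is in the discussion.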

All of that is for later, I don't want to delay the parent pointers
series even further.


^ permalink raw reply	[flat|nested] 234+ messages in thread

* Re: [PATCH 27/32] xfs: Add parent pointer ioctls
  2024-04-17  5:29                           ` Christoph Hellwig
@ 2024-04-17  5:55                             ` Darrick J. Wong
  0 siblings, 0 replies; 234+ messages in thread
From: Darrick J. Wong @ 2024-04-17  5:55 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Allison Henderson, catherine.hoang, linux-xfs

On Wed, Apr 17, 2024 at 07:29:18AM +0200, Christoph Hellwig wrote:
> On Tue, Apr 16, 2024 at 10:22:45PM -0700, Darrick J. Wong wrote:
> > On Tue, Apr 16, 2024 at 12:14:24PM -0700, Christoph Hellwig wrote:
> > > On Tue, Apr 16, 2024 at 12:07:33PM -0700, Darrick J. Wong wrote:
> > > > Ohhhh, does that happen outside of XFS then?  No wonder I couldn't find
> > > > what you were talking about.  Ok I'll go look some more.
> > > 
> > > Yes. get_name() in fs/exportfs/expfs.c.
> > 
> > Hmm.  Implementing a custom ->get_name for pptrs would work well for
> > child files with low hardlink counts.  Certainly there are probably a
> > lot more large directories than files that are hardlinked many many
> > times.  At what point does it become cheaper to scan the directory?
> 
> Note that despite my previous confusion get_name is also called for
> directories to find the actual name they have in their parent.
> 
> An easy conservative choice would be to always look at the parent
> pointers for nlink==1.
> 
> All of that is for later, I don't want to delay the parent pointers
> series even further.

It won't; I'll need to test the default ->get_name implementation so
that I can compare it to the new one, which means I'll have to figure
out /how/ to test that.

--D

^ permalink raw reply	[flat|nested] 234+ messages in thread

end of thread, other threads:[~2024-04-17  5:55 UTC | newest]

Thread overview: 234+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-10  0:36 [PATCHBOMB v13.1] xfs: directory parent pointers Darrick J. Wong
2024-04-10  0:44 ` [PATCHSET v13.1 1/9] xfs: design documentation for online fsck, part 2 Darrick J. Wong
2024-04-10  0:46   ` [PATCH 1/4] docs: update the parent pointers documentation to the final version Darrick J. Wong
2024-04-10  4:40     ` Christoph Hellwig
2024-04-10  0:46   ` [PATCH 2/4] docs: update online directory and parent pointer repair sections Darrick J. Wong
2024-04-10  4:40     ` Christoph Hellwig
2024-04-10  0:47   ` [PATCH 3/4] docs: update offline parent pointer repair strategy Darrick J. Wong
2024-04-10  4:40     ` Christoph Hellwig
2024-04-10  0:47   ` [PATCH 4/4] docs: describe xfs directory tree online fsck Darrick J. Wong
2024-04-10  4:40     ` Christoph Hellwig
2024-04-10  0:44 ` [PATCHSET v13.1 2/9] xfs: retain ILOCK during directory updates Darrick J. Wong
2024-04-10  0:47   ` [PATCH 1/7] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5 Darrick J. Wong
2024-04-10  4:41     ` Christoph Hellwig
2024-04-10  0:48   ` [PATCH 2/7] xfs: Increase XFS_QM_TRANS_MAXDQS " Darrick J. Wong
2024-04-10  4:41     ` Christoph Hellwig
2024-04-10  0:48   ` [PATCH 3/7] xfs: Hold inode locks in xfs_ialloc Darrick J. Wong
2024-04-10  4:41     ` Christoph Hellwig
2024-04-10  0:48   ` [PATCH 4/7] xfs: Hold inode locks in xfs_trans_alloc_dir Darrick J. Wong
2024-04-10  4:41     ` Christoph Hellwig
2024-04-10  0:48   ` [PATCH 5/7] xfs: Hold inode locks in xfs_rename Darrick J. Wong
2024-04-10  4:42     ` Christoph Hellwig
2024-04-10  0:49   ` [PATCH 6/7] xfs: don't pick up IOLOCK during rmapbt repair scan Darrick J. Wong
2024-04-10  4:42     ` Christoph Hellwig
2024-04-10  0:49   ` [PATCH 7/7] xfs: unlock new repair tempfiles after creation Darrick J. Wong
2024-04-10  4:42     ` Christoph Hellwig
2024-04-10  0:44 ` [PATCHSET v13.1 3/9] xfs: shrink struct xfs_da_args Darrick J. Wong
2024-04-10  0:49   ` [PATCH 1/4] xfs: remove XFS_DA_OP_REMOVE Darrick J. Wong
2024-04-10  4:43     ` Christoph Hellwig
2024-04-10  0:49   ` [PATCH 2/4] xfs: remove XFS_DA_OP_NOTIME Darrick J. Wong
2024-04-10  4:44     ` Christoph Hellwig
2024-04-10  0:50   ` [PATCH 3/4] xfs: rename xfs_da_args.attr_flags Darrick J. Wong
2024-04-10  5:01     ` Christoph Hellwig
2024-04-10 20:55       ` Darrick J. Wong
2024-04-11  0:00         ` Darrick J. Wong
2024-04-11  3:26         ` Christoph Hellwig
2024-04-11  4:15           ` Darrick J. Wong
2024-04-10  0:50   ` [PATCH 4/4] xfs: rearrange xfs_da_args a bit to use less space Darrick J. Wong
2024-04-10  5:02     ` Christoph Hellwig
2024-04-10 20:56       ` Darrick J. Wong
2024-04-10  0:45 ` [PATCHSET v13.1 4/9] xfs: improve extended attribute validation Darrick J. Wong
2024-04-10  0:50   ` [PATCH 01/12] xfs: attr fork iext must be loaded before calling xfs_attr_is_leaf Darrick J. Wong
2024-04-10  5:04     ` Christoph Hellwig
2024-04-10 20:58       ` Darrick J. Wong
2024-04-10  0:50   ` [PATCH 02/12] xfs: require XFS_SB_FEAT_INCOMPAT_LOG_XATTRS for attr log intent item recovery Darrick J. Wong
2024-04-10  5:04     ` Christoph Hellwig
2024-04-10  0:51   ` [PATCH 03/12] xfs: use an XFS_OPSTATE_ flag for detecting if logged xattrs are available Darrick J. Wong
2024-04-10  5:05     ` Christoph Hellwig
2024-04-10  0:51   ` [PATCH 04/12] xfs: check opcode and iovec count match in xlog_recover_attri_commit_pass2 Darrick J. Wong
2024-04-10  5:05     ` Christoph Hellwig
2024-04-10  0:51   ` [PATCH 05/12] xfs: fix missing check for invalid attr flags Darrick J. Wong
2024-04-10  5:07     ` Christoph Hellwig
2024-04-10 21:04       ` Darrick J. Wong
2024-04-10  0:51   ` [PATCH 06/12] xfs: restructure xfs_attr_complete_op a bit Darrick J. Wong
2024-04-10  5:07     ` Christoph Hellwig
2024-04-10  0:52   ` [PATCH 07/12] xfs: use helpers to extract xattr op from opflags Darrick J. Wong
2024-04-10  5:07     ` Christoph Hellwig
2024-04-10  0:52   ` [PATCH 08/12] xfs: validate recovered name buffers when recovering xattr items Darrick J. Wong
2024-04-10  5:08     ` Christoph Hellwig
2024-04-10  0:52   ` [PATCH 09/12] xfs: always set args->value in xfs_attri_item_recover Darrick J. Wong
2024-04-10  5:08     ` Christoph Hellwig
2024-04-10  0:52   ` [PATCH 10/12] xfs: use local variables for name and value length in _attri_commit_pass2 Darrick J. Wong
2024-04-10  5:08     ` Christoph Hellwig
2024-04-10  0:53   ` [PATCH 11/12] xfs: refactor name/length checks in xfs_attri_validate Darrick J. Wong
2024-04-10  5:09     ` Christoph Hellwig
2024-04-10  0:53   ` [PATCH 12/12] xfs: enforce one namespace per attribute Darrick J. Wong
2024-04-10  5:09     ` Christoph Hellwig
2024-04-10  0:45 ` [PATCHSET v13.1 5/9] xfs: Parent Pointers Darrick J. Wong
2024-04-10  0:53   ` [PATCH 01/32] xfs: rearrange xfs_attr_match parameters Darrick J. Wong
2024-04-10  5:10     ` Christoph Hellwig
2024-04-10  0:54   ` [PATCH 02/32] xfs: check the flags earlier in xfs_attr_match Darrick J. Wong
2024-04-10  0:54   ` [PATCH 03/32] xfs: move xfs_attr_defer_add to xfs_attr_item.c Darrick J. Wong
2024-04-10  5:11     ` Christoph Hellwig
2024-04-10  0:54   ` [PATCH 04/32] xfs: create a separate hashname function for extended attributes Darrick J. Wong
2024-04-10  5:11     ` Christoph Hellwig
2024-04-10  0:54   ` [PATCH 05/32] xfs: add parent pointer support to attribute code Darrick J. Wong
2024-04-10  5:11     ` Christoph Hellwig
2024-04-10  0:55   ` [PATCH 06/32] xfs: define parent pointer ondisk extended attribute format Darrick J. Wong
2024-04-10  5:12     ` Christoph Hellwig
2024-04-10  0:55   ` [PATCH 07/32] xfs: allow xattr matching on name and value for local/sf attrs Darrick J. Wong
2024-04-10  5:16     ` Christoph Hellwig
2024-04-10 21:13       ` Darrick J. Wong
2024-04-11  3:28         ` Christoph Hellwig
2024-04-10  0:55   ` [PATCH 08/32] xfs: allow logged xattr operations if parent pointers are enabled Darrick J. Wong
2024-04-10  5:18     ` Christoph Hellwig
2024-04-10 21:18       ` Darrick J. Wong
2024-04-10  0:55   ` [PATCH 09/32] xfs: log parent pointer xattr removal operations Darrick J. Wong
2024-04-10  5:18     ` Christoph Hellwig
2024-04-10  0:56   ` [PATCH 10/32] xfs: log parent pointer xattr setting operations Darrick J. Wong
2024-04-10  0:56   ` [PATCH 11/32] xfs: log parent pointer xattr replace operations Darrick J. Wong
2024-04-10  5:26     ` Christoph Hellwig
2024-04-10 23:07       ` Darrick J. Wong
2024-04-11  3:35         ` Christoph Hellwig
2024-04-10  0:56   ` [PATCH 12/32] xfs: record inode generation in xattr update log intent items Darrick J. Wong
2024-04-10  5:27     ` Christoph Hellwig
2024-04-10  0:56   ` [PATCH 13/32] xfs: Expose init_xattrs in xfs_create_tmpfile Darrick J. Wong
2024-04-10  5:28     ` Christoph Hellwig
2024-04-10  0:57   ` [PATCH 14/32] xfs: add parent pointer validator functions Darrick J. Wong
2024-04-10  5:31     ` Christoph Hellwig
2024-04-10 18:53       ` Darrick J. Wong
2024-04-11  3:25         ` Christoph Hellwig
2024-04-10  0:57   ` [PATCH 15/32] xfs: extend transaction reservations for parent attributes Darrick J. Wong
2024-04-10  5:31     ` Christoph Hellwig
2024-04-10  0:57   ` [PATCH 16/32] xfs: create a hashname function for parent pointers Darrick J. Wong
2024-04-10  5:33     ` Christoph Hellwig
2024-04-10 21:39       ` Darrick J. Wong
2024-04-10  0:57   ` [PATCH 17/32] xfs: parent pointer attribute creation Darrick J. Wong
2024-04-10  5:44     ` Christoph Hellwig
2024-04-10 21:50       ` Darrick J. Wong
2024-04-10  0:58   ` [PATCH 18/32] xfs: add parent attributes to link Darrick J. Wong
2024-04-10  5:45     ` Christoph Hellwig
2024-04-10  0:58   ` [PATCH 19/32] xfs: add parent attributes to symlink Darrick J. Wong
2024-04-10  5:45     ` Christoph Hellwig
2024-04-10  0:58   ` [PATCH 20/32] xfs: remove parent pointers in unlink Darrick J. Wong
2024-04-10  5:45     ` Christoph Hellwig
2024-04-10  0:58   ` [PATCH 21/32] xfs: Add parent pointers to rename Darrick J. Wong
2024-04-10  5:46     ` Christoph Hellwig
2024-04-10  0:59   ` [PATCH 22/32] xfs: Add parent pointers to xfs_cross_rename Darrick J. Wong
2024-04-10  5:46     ` Christoph Hellwig
2024-04-10  0:59   ` [PATCH 23/32] xfs: Filter XFS_ATTR_PARENT for getfattr Darrick J. Wong
2024-04-10  5:51     ` Christoph Hellwig
2024-04-10 21:58       ` Darrick J. Wong
2024-04-11  3:29         ` Christoph Hellwig
2024-04-10  0:59   ` [PATCH 24/32] xfs: pass the attr value to put_listent when possible Darrick J. Wong
2024-04-10  5:51     ` Christoph Hellwig
2024-04-10  1:00   ` [PATCH 25/32] xfs: move handle ioctl code to xfs_handle.c Darrick J. Wong
2024-04-10  5:52     ` Christoph Hellwig
2024-04-10  1:00   ` [PATCH 26/32] xfs: split out handle management helpers a bit Darrick J. Wong
2024-04-10  5:56     ` Christoph Hellwig
2024-04-10 22:01       ` Darrick J. Wong
2024-04-10  1:00   ` [PATCH 27/32] xfs: Add parent pointer ioctls Darrick J. Wong
2024-04-10  6:04     ` Christoph Hellwig
2024-04-10 23:34       ` Darrick J. Wong
2024-04-12 17:39     ` Darrick J. Wong
2024-04-14  5:18       ` Christoph Hellwig
2024-04-15 19:40         ` Darrick J. Wong
2024-04-16  4:47           ` Christoph Hellwig
2024-04-16 16:50             ` Darrick J. Wong
2024-04-16 16:54               ` Christoph Hellwig
2024-04-16 18:52                 ` Darrick J. Wong
2024-04-16 19:01                   ` Christoph Hellwig
2024-04-16 19:07                     ` Darrick J. Wong
2024-04-16 19:14                       ` Christoph Hellwig
2024-04-17  5:22                         ` Darrick J. Wong
2024-04-17  5:29                           ` Christoph Hellwig
2024-04-17  5:55                             ` Darrick J. Wong
2024-04-10  1:00   ` [PATCH 28/32] xfs: don't remove the attr fork when parent pointers are enabled Darrick J. Wong
2024-04-10  6:04     ` Christoph Hellwig
2024-04-10  1:01   ` [PATCH 29/32] xfs: Add the parent pointer support to the superblock version 5 Darrick J. Wong
2024-04-10  6:05     ` Christoph Hellwig
2024-04-10 22:06       ` Darrick J. Wong
2024-04-10  1:01   ` [PATCH 30/32] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
2024-04-10  6:05     ` Christoph Hellwig
2024-04-10  1:01   ` [PATCH 31/32] xfs: drop compatibility minimum log size computations for reflink Darrick J. Wong
2024-04-10  6:06     ` Christoph Hellwig
2024-04-10  1:01   ` [PATCH 32/32] xfs: enable parent pointers Darrick J. Wong
2024-04-10  6:06     ` Christoph Hellwig
2024-04-10 22:11       ` Darrick J. Wong
2024-04-10  0:45 ` [PATCHSET v13.1 6/9] xfs: scrubbing for parent pointers Darrick J. Wong
2024-04-10  1:02   ` [PATCH 1/7] xfs: check dirents have parent pointers Darrick J. Wong
2024-04-10  6:12     ` Christoph Hellwig
2024-04-10  1:02   ` [PATCH 2/7] xfs: deferred scrub of dirents Darrick J. Wong
2024-04-10  6:13     ` Christoph Hellwig
2024-04-10  1:02   ` [PATCH 3/7] xfs: scrub parent pointers Darrick J. Wong
2024-04-10  6:13     ` Christoph Hellwig
2024-04-10  1:02   ` [PATCH 4/7] xfs: deferred scrub of parent pointers Darrick J. Wong
2024-04-10  6:14     ` Christoph Hellwig
2024-04-10  1:03   ` [PATCH 5/7] xfs: walk directory parent pointers to determine backref count Darrick J. Wong
2024-04-10  6:14     ` Christoph Hellwig
2024-04-10  1:03   ` [PATCH 6/7] xfs: check parent pointer xattrs when scrubbing Darrick J. Wong
2024-04-10  6:14     ` Christoph Hellwig
2024-04-10  1:03   ` [PATCH 7/7] xfs: salvage parent pointers when rebuilding xattr structures Darrick J. Wong
2024-04-10  6:15     ` Christoph Hellwig
2024-04-10  0:45 ` [PATCHSET v13.1 7/9] xfs: online repair for parent pointers Darrick J. Wong
2024-04-10  1:03   ` [PATCH 01/14] xfs: add xattr setname and removename functions for internal users Darrick J. Wong
2024-04-10  6:18     ` Christoph Hellwig
2024-04-10 22:18       ` Darrick J. Wong
2024-04-11  3:32         ` Christoph Hellwig
2024-04-11  4:30           ` Darrick J. Wong
2024-04-11  4:50             ` Christoph Hellwig
2024-04-10  1:04   ` [PATCH 02/14] xfs: add raw parent pointer apis to support repair Darrick J. Wong
2024-04-10  6:18     ` Christoph Hellwig
2024-04-10  1:04   ` [PATCH 03/14] xfs: repair directories by scanning directory parent pointers Darrick J. Wong
2024-04-10  6:19     ` Christoph Hellwig
2024-04-10  1:04   ` [PATCH 04/14] xfs: implement live updates for directory repairs Darrick J. Wong
2024-04-10  6:19     ` Christoph Hellwig
2024-04-10  1:04   ` [PATCH 05/14] xfs: replay unlocked parent pointer updates that accrue during xattr repair Darrick J. Wong
2024-04-10  6:19     ` Christoph Hellwig
2024-04-10  1:05   ` [PATCH 06/14] xfs: repair directory parent pointers by scanning for dirents Darrick J. Wong
2024-04-10  6:20     ` Christoph Hellwig
2024-04-10  1:05   ` [PATCH 07/14] xfs: implement live updates for parent pointer repairs Darrick J. Wong
2024-04-10  6:20     ` Christoph Hellwig
2024-04-10  1:05   ` [PATCH 08/14] xfs: remove pointless unlocked assertion Darrick J. Wong
2024-04-10  6:20     ` Christoph Hellwig
2024-04-10  1:06   ` [PATCH 09/14] xfs: split xfs_bmap_add_attrfork into two pieces Darrick J. Wong
2024-04-10  6:21     ` Christoph Hellwig
2024-04-10  1:06   ` [PATCH 10/14] xfs: add a per-leaf block callback to xchk_xattr_walk Darrick J. Wong
2024-04-10  6:22     ` Christoph Hellwig
2024-04-10  1:06   ` [PATCH 11/14] xfs: actually rebuild the parent pointer xattrs Darrick J. Wong
2024-04-10  6:22     ` Christoph Hellwig
2024-04-10  1:06   ` [PATCH 12/14] xfs: adapt the orphanage code to handle parent pointers Darrick J. Wong
2024-04-10  6:23     ` Christoph Hellwig
2024-04-10  1:07   ` [PATCH 13/14] xfs: repair link count of nondirectories after rebuilding parent pointers Darrick J. Wong
2024-04-10  6:22     ` Christoph Hellwig
2024-04-10  1:07   ` [PATCH 14/14] xfs: inode repair should ensure there's an attr fork to store parent pointers Darrick J. Wong
2024-04-10  6:24     ` Christoph Hellwig
2024-04-10  0:46 ` [PATCHSET v13.1 8/9] xfs: detect and correct directory tree problems Darrick J. Wong
2024-04-10  1:07   ` [PATCH 1/4] xfs: teach online scrub to find directory tree structure problems Darrick J. Wong
2024-04-10  7:21     ` Christoph Hellwig
2024-04-10  1:07   ` [PATCH 2/4] xfs: invalidate dirloop scrub path data when concurrent updates happen Darrick J. Wong
2024-04-10  7:21     ` Christoph Hellwig
2024-04-10  1:08   ` [PATCH 3/4] xfs: report directory tree corruption in the health information Darrick J. Wong
2024-04-10  7:23     ` Christoph Hellwig
2024-04-10  1:08   ` [PATCH 4/4] xfs: fix corruptions in the directory tree Darrick J. Wong
2024-04-10  7:23     ` Christoph Hellwig
2024-04-10  0:46 ` [PATCHSET v13.1 9/9] xfs: vectorize scrub kernel calls Darrick J. Wong
2024-04-10  1:08   ` [PATCH 1/3] xfs: reduce the rate of cond_resched calls inside scrub Darrick J. Wong
2024-04-10 14:55     ` Christoph Hellwig
2024-04-10 22:19       ` Darrick J. Wong
2024-04-10  1:08   ` [PATCH 2/3] xfs: introduce vectored scrub mode Darrick J. Wong
2024-04-10 15:00     ` Christoph Hellwig
2024-04-11  0:59       ` Darrick J. Wong
2024-04-11  3:38         ` Christoph Hellwig
2024-04-11  4:31           ` Darrick J. Wong
2024-04-10  1:09   ` [PATCH 3/3] xfs: only iget the file once when doing vectored scrub-by-handle Darrick J. Wong
2024-04-10 15:12     ` Christoph Hellwig
2024-04-11  1:15       ` Darrick J. Wong
2024-04-11  3:49         ` Christoph Hellwig
2024-04-11  4:41           ` Darrick J. Wong
2024-04-11  4:52             ` Christoph Hellwig
2024-04-11  4:56               ` Darrick J. Wong
2024-04-11  5:02                 ` Christoph Hellwig
2024-04-11  5:21                   ` Darrick J. Wong
2024-04-11 14:02                     ` Christoph Hellwig
2024-04-12  0:21                       ` Darrick J. Wong
