linux-xfs.vger.kernel.org archive mirror
* [RFC DELUGE v9r2d1] xfs: Parent Pointers
@ 2023-02-16 20:06 Darrick J. Wong
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                   ` (25 more replies)
  0 siblings, 26 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:06 UTC (permalink / raw)
  To: allison.henderson; +Cc: linux-xfs

Hi everyone,

This deluge contains all of the additions to the parent pointers
patchset that I've been working on for the past month.  The kernel and
xfsprogs patchsets are based on Allison's v9r2 tag from last week;
the fstests patches are merely a part of my development tree.  To recap
Allison's cover letter:

"The goal of this patch set is to add a parent pointer attribute to each
inode.  The attribute name contains the parent inode, generation, and
directory offset, while the attribute value contains the file name.
This feature will enable future optimizations for online scrub, shrink,
nfs handles, verity, or any other feature that could make use of quickly
deriving an inode's path from the mount point."

The kernel branches start with a number of bug fixes that I need to get
fstests to pass.  I also restructured the kernel implementation of
GETPARENTS to cut the memory usage considerably.

For userspace, I cleaned up the xfsprogs patches so that libxfs-diff
shows no discrepancies with the kernel and cleaned up the parent pointer
usage code that I prototyped in 2017 so that it's less buggy and moldy.
I also rewired xfs_scrub to use GETPARENTS to report file paths of
corrupt files instead of inode numbers, since that part had bitrotted
badly.

With that out of the way, I implemented a prototype of online repairs
for directories and parent pointers.  This is only a proof of concept,
because I had already backported many many patches from part 1 of online
repair, and didn't feel like porting the parts needed to commit new
structures atomically and reap the old dir/xattr blocks.  IOWs, the
prototype scans the filesystem to build a parallel directory or xattr
structure, and then reports on any discrepancies between the two
versions.  Obviously this won't fix a corrupt directory tree, but it
enables us to test the repair code on a consistent filesystem to
demonstrate that it works.

Next, I implemented fully functional parent pointer checking and repair
for xfs_repair.  This was less hard than I guessed it would be because
the current design of phase 6 includes a walk of all directories.  From
the dirent data, we can build a per-AG index of all the parent pointers
for all the inodes in that AG, then walk all the inodes in that AG to
compare the lists.  As you might guess, this eats a fair amount of
memory, even with a rudimentary dirent name deduplication table to cut
down on memory usage.
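
To make the comparison step above a little more concrete, here is a rough
Python sketch.  It is not xfs_repair code: the tuple shapes, the function
names, and the use of sys.intern() as a stand-in for the dirent name
deduplication table are all assumptions made for illustration, and the
per-AG partitioning is left out.

```python
import sys
from collections import defaultdict


def build_pptr_index(dirents):
    """Build child_ino -> set of (parent_ino, name) from a directory walk.

    `dirents` is an iterable of hypothetical (parent_ino, name, child_ino)
    tuples, standing in for what a walk of all directories would yield.
    """
    index = defaultdict(set)
    for parent_ino, name, child_ino in dirents:
        # Interning keeps one shared copy of each distinct name string.
        # This is only a stand-in for a real name deduplication table.
        index[child_ino].add((parent_ino, sys.intern(name)))
    return index


def compare_pptrs(index, inode_pptrs):
    """Compare expected parent pointers against those found on each inode.

    `inode_pptrs` maps child_ino -> set of (parent_ino, name) as read from
    the inode's xattrs.  Returns (missing, extra), both keyed by child_ino.
    """
    missing, extra = {}, {}
    for ino in set(index) | set(inode_pptrs):
        expected = index.get(ino, set())
        found = inode_pptrs.get(ino, set())
        if expected - found:
            missing[ino] = expected - found
        if found - expected:
            extra[ino] = found - expected
    return missing, extra
```

In this sketch, anything in `missing` is a parent pointer that would need
to be added and anything in `extra` is one that would need to be removed.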

After that, I moved on to solving the major problem that I've been
having with the directory repair code, and that is the problem of
reconstructing dirents at the offsets specified by the parent pointers.
The details of the problem and how I dealt with it are captured in the
cover letter for those patches.  Suffice to say, we now encode the
dirent name in the parent pointer attrname (or a collision resistant
hash if it doesn't fit), which makes it possible to commit new
directories atomically.

The last part of this patchset reorganizes the XFS_IOC_GETPARENTS ioctl
to encode variable length parent pointer records in the caller's buffer.
The denser encodings mean that we can extract the parent list with fewer
kernel calls.

--D


* [PATCHSET v9r2d1 00/28] xfs: Parent Pointers
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
@ 2023-02-16 20:26 ` Darrick J. Wong
  2023-02-16 20:32   ` [PATCH 01/28] xfs: Add new name to attri/d Darrick J. Wong
                     ` (27 more replies)
  2023-02-16 20:26 ` [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers Darrick J. Wong
                   ` (24 subsequent siblings)
  25 siblings, 28 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:26 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, Catherine Hoang, Mark Tinguely,
	Darrick J. Wong, Dave Chinner, allison.henderson, linux-xfs

Hi all,

This is the latest parent pointer attributes patchset for xfs.
The goal of this patch set is to add a parent pointer attribute to each inode.
The attribute name contains the parent inode, generation, and directory
offset, while the attribute value contains the file name.  This feature will
enable future optimizations for online scrub, shrink, nfs handles, verity, or
any other feature that could make use of quickly deriving an inode's path from
the mount point.

This set can be viewed on github here
https://github.com/allisonhenderson/xfs/tree/xfs_new_pptrsv9_r2

And the corresponding xfsprogs code is here
https://github.com/allisonhenderson/xfsprogs/tree/xfsprogs_new_pptrs_v9_r2

This set has been tested with the below parent pointers tests
https://lore.kernel.org/fstests/20221012013812.82161-1-catherine.hoang@oracle.com/T/#t

Updates since v8:

xfs: parent pointer attribute creation
   Fix xfs_parent_init to release log assist on alloc fail
   Add slab cache for xfs_parent_defer
   Fix xfs_create to release after unlock
   Add xfs_parent_start and xfs_parent_finish wrappers
   Remove unused xfs_parent_name_irec and xfs_init_parent_name_irec

xfs: add parent attributes to link
   Start/finish wrapper updates
   Fix xfs_link to disallow reservationless quotas
   
xfs: add parent attributes to symlink
   Fix xfs_symlink to release after unlock
   Start/finish wrapper updates
   
xfs: remove parent pointers in unlink
   Start/finish wrapper updates
   Add missing parent free

xfs: Add parent pointers to rename
   Start/finish wrapper updates
   Fix rename to only grab logged xattr once
   Fix xfs_rename to disallow reservationless quotas
   Fix double unlock on dqattach fail
   Move parent frees to out_release_wip
   
xfs: Add parent pointers to xfs_cross_rename
   Hoist parent pointers into rename

Questions, comments, and feedback appreciated!

Thanks all!
Allison

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=pptrs
---
 fs/xfs/Makefile                 |    2 
 fs/xfs/libxfs/xfs_attr.c        |   71 +++++-
 fs/xfs/libxfs/xfs_attr.h        |   13 +
 fs/xfs/libxfs/xfs_da_btree.h    |    3 
 fs/xfs/libxfs/xfs_da_format.h   |   26 ++
 fs/xfs/libxfs/xfs_defer.c       |   28 ++
 fs/xfs/libxfs/xfs_defer.h       |    8 +
 fs/xfs/libxfs/xfs_dir2.c        |   21 +-
 fs/xfs/libxfs/xfs_dir2.h        |    7 -
 fs/xfs/libxfs/xfs_dir2_block.c  |    9 -
 fs/xfs/libxfs/xfs_dir2_leaf.c   |    8 +
 fs/xfs/libxfs/xfs_dir2_node.c   |    8 +
 fs/xfs/libxfs/xfs_dir2_sf.c     |    6 +
 fs/xfs/libxfs/xfs_format.h      |    4 
 fs/xfs/libxfs/xfs_fs.h          |   75 +++++++
 fs/xfs/libxfs/xfs_log_format.h  |    7 -
 fs/xfs/libxfs/xfs_log_rlimit.c  |   53 +++++
 fs/xfs/libxfs/xfs_parent.c      |  203 ++++++++++++++++++
 fs/xfs/libxfs/xfs_parent.h      |   84 +++++++
 fs/xfs/libxfs/xfs_sb.c          |    4 
 fs/xfs/libxfs/xfs_trans_resv.c  |  324 ++++++++++++++++++++++++----
 fs/xfs/libxfs/xfs_trans_space.h |    8 -
 fs/xfs/scrub/attr.c             |    4 
 fs/xfs/xfs_attr_item.c          |  142 ++++++++++--
 fs/xfs/xfs_attr_item.h          |    1 
 fs/xfs/xfs_attr_list.c          |   17 +
 fs/xfs/xfs_dquot.c              |   38 +++
 fs/xfs/xfs_dquot.h              |    1 
 fs/xfs/xfs_file.c               |    1 
 fs/xfs/xfs_inode.c              |  447 +++++++++++++++++++++++++++++++--------
 fs/xfs/xfs_inode.h              |    3 
 fs/xfs/xfs_ioctl.c              |  148 +++++++++++--
 fs/xfs/xfs_ioctl.h              |    2 
 fs/xfs/xfs_iops.c               |    3 
 fs/xfs/xfs_ondisk.h             |    4 
 fs/xfs/xfs_parent_utils.c       |  126 +++++++++++
 fs/xfs/xfs_parent_utils.h       |   11 +
 fs/xfs/xfs_qm.c                 |    4 
 fs/xfs/xfs_qm.h                 |    2 
 fs/xfs/xfs_super.c              |   14 +
 fs/xfs/xfs_symlink.c            |   58 ++++-
 fs/xfs/xfs_trans.c              |   13 +
 fs/xfs/xfs_trans_dquot.c        |   15 +
 fs/xfs/xfs_xattr.c              |    7 -
 fs/xfs/xfs_xattr.h              |    2 
 45 files changed, 1782 insertions(+), 253 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_parent.c
 create mode 100644 fs/xfs/libxfs/xfs_parent.h
 create mode 100644 fs/xfs/xfs_parent_utils.c
 create mode 100644 fs/xfs/xfs_parent_utils.h



* [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
@ 2023-02-16 20:26 ` Darrick J. Wong
  2023-02-16 20:40   ` [PATCH 1/3] xfs: directory lookups should return diroffsets too Darrick J. Wong
                     ` (2 more replies)
  2023-02-16 20:26 ` [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl Darrick J. Wong
                   ` (23 subsequent siblings)
  25 siblings, 3 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:26 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

This series contains the accumulated bug fixes from Darrick to make
fstests pass and online repair work.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-bugfixes

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-bugfixes
---
 fs/xfs/libxfs/xfs_attr.c       |   61 +++++++++-------------------------------
 fs/xfs/libxfs/xfs_attr.h       |    2 +
 fs/xfs/libxfs/xfs_attr_leaf.c  |    6 +++-
 fs/xfs/libxfs/xfs_dir2_block.c |    2 +
 fs/xfs/libxfs/xfs_dir2_leaf.c  |    2 +
 fs/xfs/libxfs/xfs_dir2_node.c  |    2 +
 fs/xfs/libxfs/xfs_dir2_sf.c    |    4 +++
 fs/xfs/libxfs/xfs_parent.c     |   44 +++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_parent.h     |    7 +++++
 9 files changed, 79 insertions(+), 51 deletions(-)



* [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
  2023-02-16 20:26 ` [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers Darrick J. Wong
@ 2023-02-16 20:26 ` Darrick J. Wong
  2023-02-16 20:40   ` [PATCH 1/4] xfs: fix multiple problems when doing getparents by handle Darrick J. Wong
                     ` (3 more replies)
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                   ` (22 subsequent siblings)
  25 siblings, 4 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:26 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

This series fixes a few bugs that I found in the XFS_IOC_GETPARENTS
implementation.  It also reworks the xfs_attr_list implementations to
provide an xattr value pointer when available, and finally it reworks
the whole implementation to take advantage of this and use less memory.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-ioctl
---
 fs/xfs/libxfs/xfs_attr.h    |    5 +
 fs/xfs/libxfs/xfs_attr_sf.h |    1 
 fs/xfs/libxfs/xfs_parent.c  |   40 +++++++--
 fs/xfs/libxfs/xfs_parent.h  |   21 ++++-
 fs/xfs/scrub/attr.c         |    8 ++
 fs/xfs/xfs_attr_list.c      |    8 ++
 fs/xfs/xfs_ioctl.c          |   40 ++++++---
 fs/xfs/xfs_parent_utils.c   |  184 ++++++++++++++++++++++++-------------------
 fs/xfs/xfs_parent_utils.h   |    4 -
 fs/xfs/xfs_trace.c          |    1 
 fs/xfs/xfs_trace.h          |   73 +++++++++++++++++
 fs/xfs/xfs_xattr.c          |    1 
 12 files changed, 272 insertions(+), 114 deletions(-)



* [PATCHSET v9r2d1 00/23] xfs: online fsck support patches
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (2 preceding siblings ...)
  2023-02-16 20:26 ` [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl Darrick J. Wong
@ 2023-02-16 20:27 ` Darrick J. Wong
  2023-02-16 20:42   ` [PATCH 01/23] xfs: manage inode DONTCACHE status at irele time Darrick J. Wong
                     ` (22 more replies)
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/7] xfs: online repair of directories Darrick J. Wong
                   ` (21 subsequent siblings)
  25 siblings, 23 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:27 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

These are all the patches that I needed to backport from the online fsck
patchset to start writing online fsck for parent pointers and
directories.

IOWs, we're blatantly copying things from the online repair part 1
megaseries; this is what online repair part 2 requires.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-online-fsck-backports

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-online-fsck-backports
---
 fs/xfs/Kconfig                 |   38 +++
 fs/xfs/Makefile                |   11 +
 fs/xfs/libxfs/xfs_dir2.c       |    6 
 fs/xfs/libxfs/xfs_dir2.h       |    1 
 fs/xfs/scrub/agheader_repair.c |   99 +++++---
 fs/xfs/scrub/bitmap.c          |  389 ++++++++++++++++++++------------
 fs/xfs/scrub/bitmap.h          |   35 ++-
 fs/xfs/scrub/bmap.c            |    6 
 fs/xfs/scrub/common.c          |  133 ++++++++---
 fs/xfs/scrub/common.h          |   10 +
 fs/xfs/scrub/dir.c             |  233 ++++++-------------
 fs/xfs/scrub/inode.c           |    6 
 fs/xfs/scrub/iscan.c           |  483 ++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/iscan.h           |   65 +++++
 fs/xfs/scrub/listxattr.c       |  314 ++++++++++++++++++++++++++
 fs/xfs/scrub/listxattr.h       |   17 +
 fs/xfs/scrub/parent.c          |  290 ++++++++++--------------
 fs/xfs/scrub/quota.c           |    9 -
 fs/xfs/scrub/readdir.c         |  375 +++++++++++++++++++++++++++++++
 fs/xfs/scrub/readdir.h         |   19 ++
 fs/xfs/scrub/repair.c          |  102 +++++---
 fs/xfs/scrub/rtbitmap.c        |   11 -
 fs/xfs/scrub/scrub.c           |   23 ++
 fs/xfs/scrub/scrub.h           |    7 +
 fs/xfs/scrub/tempfile.c        |  243 ++++++++++++++++++++
 fs/xfs/scrub/tempfile.h        |   29 ++
 fs/xfs/scrub/trace.c           |    5 
 fs/xfs/scrub/trace.h           |  272 +++++++++++++++++++++++
 fs/xfs/scrub/xfarray.c         |  394 +++++++++++++++++++++++++++++++++
 fs/xfs/scrub/xfarray.h         |   60 +++++
 fs/xfs/scrub/xfblob.c          |  176 +++++++++++++++
 fs/xfs/scrub/xfblob.h          |   27 ++
 fs/xfs/scrub/xfile.c           |  329 +++++++++++++++++++++++++++
 fs/xfs/scrub/xfile.h           |   60 +++++
 fs/xfs/xfs_buf.c               |    5 
 fs/xfs/xfs_buf.h               |   10 +
 fs/xfs/xfs_export.c            |    2 
 fs/xfs/xfs_hooks.c             |   94 ++++++++
 fs/xfs/xfs_hooks.h             |   72 ++++++
 fs/xfs/xfs_icache.c            |    3 
 fs/xfs/xfs_icache.h            |   11 +
 fs/xfs/xfs_inode.c             |  229 +++++++++++++++++++
 fs/xfs/xfs_inode.h             |   37 +++
 fs/xfs/xfs_itable.c            |    8 +
 fs/xfs/xfs_linux.h             |    1 
 fs/xfs/xfs_mount.h             |    2 
 fs/xfs/xfs_super.c             |    2 
 fs/xfs/xfs_symlink.c           |    1 
 48 files changed, 4112 insertions(+), 642 deletions(-)
 create mode 100644 fs/xfs/scrub/iscan.c
 create mode 100644 fs/xfs/scrub/iscan.h
 create mode 100644 fs/xfs/scrub/listxattr.c
 create mode 100644 fs/xfs/scrub/listxattr.h
 create mode 100644 fs/xfs/scrub/readdir.c
 create mode 100644 fs/xfs/scrub/readdir.h
 create mode 100644 fs/xfs/scrub/tempfile.c
 create mode 100644 fs/xfs/scrub/tempfile.h
 create mode 100644 fs/xfs/scrub/xfarray.c
 create mode 100644 fs/xfs/scrub/xfarray.h
 create mode 100644 fs/xfs/scrub/xfblob.c
 create mode 100644 fs/xfs/scrub/xfblob.h
 create mode 100644 fs/xfs/scrub/xfile.c
 create mode 100644 fs/xfs/scrub/xfile.h
 create mode 100644 fs/xfs/xfs_hooks.c
 create mode 100644 fs/xfs/xfs_hooks.h



* [PATCHSET v9r2d1 0/7] xfs: online repair of directories
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (3 preceding siblings ...)
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
@ 2023-02-16 20:27 ` Darrick J. Wong
  2023-02-16 20:48   ` [PATCH 1/7] xfs: pass directory offsets as part of the dirent hook data Darrick J. Wong
                     ` (6 more replies)
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers Darrick J. Wong
                   ` (20 subsequent siblings)
  25 siblings, 7 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:27 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

With this patchset, we implement online reconstruction of directories by
scanning the entire filesystem looking for parent pointer data.  This
mostly works, except for the part where we need to resync the diroffset
field of the parent pointers to match the new directory structure.

Fixing that is left as an open research question, with a few possible
solutions:

1. As part of committing the new directory, queue a bunch of parent
pointer updates to make those changes.

2. Leave them inconsistent and let the parent pointer repair fix it.

3. Change the ondisk format of parent pointers (and xattrs) so that we
can encode the full dirent name in the xattr name.

4. Change the ondisk format of parent pointers to encode a sha256 hash
of the dirent name in the xattr name.  This will work as long as nobody
breaks sha256.

Thoughts?  Note that the atomic swapext and block reaping code is NOT
ported for this PoC, so we do not commit any repairs.
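
For readers who want a picture of what the PoC does, here is a rough Python
sketch of the scan-and-compare idea.  It is not the kernel code; the tuple
layout and both function names are assumptions made for illustration.

```python
def rebuild_directory(dir_ino, all_pptrs):
    """Collect the dirents that `dir_ino` should contain.

    `all_pptrs` is an iterable of hypothetical
    (child_ino, parent_ino, diroffset, name) tuples gathered by scanning
    every inode in the filesystem.  Returns a dict of name -> child_ino.
    """
    rebuilt = {}
    for child_ino, parent_ino, _diroffset, name in all_pptrs:
        # Only parent pointers that name this directory contribute dirents.
        # The recorded diroffset is deliberately ignored here.
        if parent_ino == dir_ino:
            rebuilt[name] = child_ino
    return rebuilt


def report_discrepancies(existing, rebuilt):
    """Return the names on which the existing and rebuilt versions disagree."""
    return sorted(
        name
        for name in set(existing) | set(rebuilt)
        if existing.get(name) != rebuilt.get(name)
    )
```

Because the sketch ignores the recorded diroffset when it builds the new
name -> inode map, entries in a freshly written directory would generally
not land at the offsets stored in the parent pointers, which is the resync
problem described above.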

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-online-dir-repair

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-online-dir-repair
---
 fs/xfs/Makefile               |    1 
 fs/xfs/libxfs/xfs_da_format.h |   11 
 fs/xfs/libxfs/xfs_dir2.c      |    2 
 fs/xfs/libxfs/xfs_dir2.h      |    2 
 fs/xfs/libxfs/xfs_parent.c    |   47 +-
 fs/xfs/libxfs/xfs_parent.h    |   24 -
 fs/xfs/scrub/common.c         |   15 +
 fs/xfs/scrub/common.h         |   28 +
 fs/xfs/scrub/dir.c            |   11 
 fs/xfs/scrub/dir_repair.c     | 1129 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/readdir.c        |   12 
 fs/xfs/scrub/readdir.h        |    3 
 fs/xfs/scrub/repair.h         |   16 +
 fs/xfs/scrub/scrub.c          |    2 
 fs/xfs/scrub/tempfile.c       |   42 ++
 fs/xfs/scrub/tempfile.h       |    2 
 fs/xfs/scrub/trace.c          |    1 
 fs/xfs/scrub/trace.h          |   69 +++
 fs/xfs/xfs_inode.c            |   56 +-
 fs/xfs/xfs_inode.h            |    5 
 fs/xfs/xfs_symlink.c          |    4 
 21 files changed, 1428 insertions(+), 54 deletions(-)
 create mode 100644 fs/xfs/scrub/dir_repair.c



* [PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (4 preceding siblings ...)
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/7] xfs: online repair of directories Darrick J. Wong
@ 2023-02-16 20:27 ` Darrick J. Wong
  2023-02-16 20:49   ` [PATCH 1/2] xfs: scrub " Darrick J. Wong
  2023-02-16 20:50   ` [PATCH 2/2] xfs: deferred scrub of " Darrick J. Wong
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/3] xfs: online checking " Darrick J. Wong
                   ` (19 subsequent siblings)
  25 siblings, 2 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:27 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

Update the existing online parent pointer checker to confirm that the
corresponding directory entries also exist.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-online-parent-check

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-online-parent-check
---
 fs/xfs/Makefile            |    2 
 fs/xfs/libxfs/xfs_parent.c |   38 +++
 fs/xfs/libxfs/xfs_parent.h |   10 +
 fs/xfs/scrub/parent.c      |  529 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.h       |   66 +++++
 5 files changed, 644 insertions(+), 1 deletion(-)



* [PATCHSET v9r2d1 0/3] xfs: online checking of parent pointers
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (5 preceding siblings ...)
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers Darrick J. Wong
@ 2023-02-16 20:27 ` Darrick J. Wong
  2023-02-16 20:50   ` [PATCH 1/3] xfs: repair parent pointers by scanning directories Darrick J. Wong
                     ` (2 more replies)
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/2] xfs: online checking of directories Darrick J. Wong
                   ` (18 subsequent siblings)
  25 siblings, 3 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:27 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

With this patchset, we implement online repairs for parent pointers.
This is structured similarly to the directory repair code in that we
scan the entire filesystem looking for dirents and use them to
reconstruct the parent pointer information.

Note that the atomic swapext and block reaping code is NOT ported for
this PoC, so we do not commit any repairs.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-online-parent-repair

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-online-parent-repair
---
 fs/xfs/Makefile              |    1 
 fs/xfs/libxfs/xfs_parent.c   |   56 +++
 fs/xfs/libxfs/xfs_parent.h   |    8 
 fs/xfs/scrub/parent.c        |   10 +
 fs/xfs/scrub/parent_repair.c |  739 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h        |    4 
 fs/xfs/scrub/scrub.c         |    2 
 fs/xfs/scrub/trace.c         |    2 
 fs/xfs/scrub/trace.h         |   80 +++++
 fs/xfs/xfs_inode.h           |    6 
 10 files changed, 905 insertions(+), 3 deletions(-)
 create mode 100644 fs/xfs/scrub/parent_repair.c



* [PATCHSET v9r2d1 0/2] xfs: online checking of directories
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (6 preceding siblings ...)
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/3] xfs: online checking " Darrick J. Wong
@ 2023-02-16 20:28 ` Darrick J. Wong
  2023-02-16 20:51   ` [PATCH 1/2] xfs: check dirents have parent pointers Darrick J. Wong
  2023-02-16 20:51   ` [PATCH 2/2] xfs: deferred scrub of dirents Darrick J. Wong
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Darrick J. Wong
                   ` (17 subsequent siblings)
  25 siblings, 2 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:28 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

Update the existing online directory checker to confirm that the
corresponding parent pointers also exist.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-online-dir-check
---
 fs/xfs/scrub/dir.c   |  367 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.h |    2 
 2 files changed, 368 insertions(+), 1 deletion(-)



* [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (7 preceding siblings ...)
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/2] xfs: online checking of directories Darrick J. Wong
@ 2023-02-16 20:28 ` Darrick J. Wong
  2023-02-16 20:51   ` [PATCH 1/5] xfs: load secure hash algorithm for parent pointers Darrick J. Wong
                     ` (7 more replies)
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/3] xfs: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
                   ` (16 subsequent siblings)
  25 siblings, 8 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:28 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

As I've mentioned in past comments on the parent pointers patchset, the
proposed ondisk parent pointer format presents a major difficulty for
online directory repair.  This difficulty derives from encoding the
directory offset of the dirent that the parent pointer is mirroring.
Recall that parent pointers are stored in extended attributes:

    (parent_ino, parent_gen, diroffset) -> (dirent_name)

If the directory is rebuilt, the offsets of the new directory entries
must match the diroffset encoded in the parent pointer, or the
filesystem becomes inconsistent.  There are a few ways to solve this
problem.

One approach would be to augment the directory addname function to take
a diroffset and try to create the new entry at that offset.  This will
not work if the original directory became corrupt and the parent
pointers were written out with impossible diroffsets (e.g. overlapping).
Requiring matching diroffsets also prevents reorganization and
compaction of directories.

This could be remedied by recording the parent pointer diroffset updates
necessary to retain consistency, and using the logged parent pointer
replace function to rewrite parent pointers as necessary.  This is a
poor choice from a performance perspective because the logged xattr
updates must be committed in the same transaction that commits the new
directory structure.  If there are a large number of diroffset updates,
then the directory commit could take an even longer time.

Worse yet, if the logged xattr updates fill up the transaction, repair
will have no choice but to roll to a fresh transaction to continue
logging.  This breaks repair's policy that repairs should commit
atomically.  It may break the filesystem as well, since all files
involved are pinned until the delayed pptr xattr processing completes.
This is a completely bad engineering choice.

Note that the diroffset information is not used anywhere in the
directory lookup code.  Observe that the only information that we
require for a parent pointer is the inverse of a pre-ftype dirent,
since this is all we need to reconstruct a directory entry:

    (parent_ino, dirent_name) -> NULL

The xattr code supports xattrs with zero-length values, surprisingly.
The parent_gen field makes it easy to export parent handle information,
so it can be retained:

    (parent_ino, parent_gen, dirent_name) -> NULL

Moving the ondisk format to this format is very advantageous for repair
code.  Unfortunately, there is one hitch: xattr names cannot exceed 255
bytes due to ondisk format limitations.  We don't want to constrain the
length of dirent names, so instead we could use collision resistant
hashes to handle dirents with very long names:

    (parent_ino, parent_gen, sha512(dirent_name)) -> (dirent_name)

The first two patches implement this schema.  However, this encoding is
not maximally efficient, since many directory names are shorter than the
length of a sha512 hash.  The last three patches in the series bifurcate
the parent pointer ondisk format depending on context:

For dirent names shorter than 243 bytes:

    (parent_ino, parent_gen, dirent_name) -> NULL

For dirent names longer than 243 bytes:

    (parent_ino, parent_gen, dirent_name[0:178],
     sha512(child_gen, dirent_name)) -> (dirent_name[179:255])

The child file's generation number is mixed into the sha512 computation
to make it a little more difficult for unprivileged userspace to attempt
collisions.
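
As a rough illustration of the bifurcated format, here is a Python sketch
of the encoding.  It is not the kernel implementation.  The big-endian
8-byte inode number and 4-byte generation header, the way child_gen is fed
into the hash, and the choice to treat a name of exactly 243 bytes as
fitting inline are all assumptions; only the 255-byte xattr name limit and
the overall shape of the two cases come from the description above.

```python
import hashlib
import struct

XATTR_NAME_MAX = 255                        # ondisk xattr name limit cited above
HEADER = struct.Struct(">QI")               # assumed: 64-bit ino, 32-bit gen
INLINE_MAX = XATTR_NAME_MAX - HEADER.size   # 243 bytes left for the name
SHA512_LEN = hashlib.sha512().digest_size   # 64 bytes
PREFIX_LEN = INLINE_MAX - SHA512_LEN        # 179 bytes, i.e. dirent_name[0:178]


def encode_pptr(parent_ino, parent_gen, child_gen, dirent_name):
    """Return (xattr_name, xattr_value) for one parent pointer."""
    header = HEADER.pack(parent_ino, parent_gen)
    if len(dirent_name) <= INLINE_MAX:
        # Short name: the whole name lives in the xattr name, empty value.
        return header + dirent_name, b""
    # Long name: keep a prefix, append a hash that mixes in the child's
    # generation, and store the rest of the name in the xattr value.
    digest = hashlib.sha512(struct.pack(">I", child_gen) + dirent_name).digest()
    return header + dirent_name[:PREFIX_LEN] + digest, dirent_name[PREFIX_LEN:]


def decode_dirent_name(xattr_name, xattr_value):
    """Recover the dirent name from an encoded parent pointer."""
    body = xattr_name[HEADER.size:]
    if not xattr_value:
        return body
    return body[:PREFIX_LEN] + xattr_value
```

Under these assumptions a long name always produces an xattr name of
exactly 255 bytes (12 + 179 + 64), and the dirent name can still be
recovered in full from the name prefix plus the xattr value.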

A messier solution to this problem would be to extend the xattr ondisk
format to allow parent pointers to have xattr names up to 267 bytes.
This would likely involve redefining the ondisk namelen field to omit
the size of the parent ino/gen information and might be madness.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-name-in-attr-key

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-name-in-attr-key

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=pptrs-name-in-attr-key
---
 fs/xfs/Kconfig                 |    1 
 fs/xfs/libxfs/xfs_da_format.h  |   49 +++++++
 fs/xfs/libxfs/xfs_fs.h         |    4 -
 fs/xfs/libxfs/xfs_parent.c     |  265 ++++++++++++++++++++++++++++++++--------
 fs/xfs/libxfs/xfs_parent.h     |   48 +++++--
 fs/xfs/libxfs/xfs_trans_resv.c |    6 -
 fs/xfs/scrub/dir.c             |   16 ++
 fs/xfs/scrub/dir_repair.c      |   87 ++++---------
 fs/xfs/scrub/parent.c          |   51 +++++---
 fs/xfs/scrub/parent_repair.c   |   29 ++--
 fs/xfs/scrub/trace.h           |   48 ++-----
 fs/xfs/xfs_attr_item.c         |    4 -
 fs/xfs/xfs_inode.c             |   30 ++---
 fs/xfs/xfs_linux.h             |    1 
 fs/xfs/xfs_mount.c             |   13 ++
 fs/xfs/xfs_mount.h             |    3 
 fs/xfs/xfs_ondisk.h            |    6 +
 fs/xfs/xfs_parent_utils.c      |    4 -
 fs/xfs/xfs_sha512.h            |   42 ++++++
 fs/xfs/xfs_super.c             |    3 
 fs/xfs/xfs_symlink.c           |    3 
 21 files changed, 481 insertions(+), 232 deletions(-)
 create mode 100644 fs/xfs/xfs_sha512.h



* [PATCHSET v9r2d1 0/3] xfs: use flex arrays for XFS_IOC_GETPARENTS
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (8 preceding siblings ...)
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Darrick J. Wong
@ 2023-02-16 20:28 ` Darrick J. Wong
  2023-02-16 20:52   ` [PATCH 1/3] xfs: rename xfs_pptr_info to xfs_getparents Darrick J. Wong
                     ` (2 more replies)
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                   ` (15 subsequent siblings)
  25 siblings, 3 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:28 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

Change the XFS_IOC_GETPARENTS structure definitions so that we can pack
parent pointer information more densely, in a manner similar to the
attrlistmulti ioctl.  This also reduces the amount of memory that has to
be copied back to userspace if the buffer doesn't fill up.
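
To illustrate why variable-length records help, here is a small model
that compares fixed-size slots against densely packed records.  The
record layout (a 16-byte header followed by the name, padded to 8 bytes)
is invented for this example and is NOT the XFS_IOC_GETPARENTS structure
from the patches:

```python
import struct

NAME_MAX = 255
# Hypothetical record header: parent ino (u64), parent gen (u32),
# record length (u16), name length (u16).
HDR = struct.Struct("<QIHH")

def pack_fixed(records):
    """Every record reserves room for the longest possible name."""
    out = bytearray()
    for ino, gen, name in records:
        out += HDR.pack(ino, gen, HDR.size + NAME_MAX + 1, len(name))
        out += name.ljust(NAME_MAX + 1, b"\0")
    return bytes(out)

def pack_dense(records):
    """Each record is only as long as its name, rounded up to 8 bytes."""
    out = bytearray()
    for ino, gen, name in records:
        reclen = (HDR.size + len(name) + 7) & ~7
        out += HDR.pack(ino, gen, reclen, len(name))
        out += name.ljust(reclen - HDR.size, b"\0")
    return bytes(out)

def unpack_dense(buf):
    """Walk the buffer by following each record's length field."""
    pos, found = 0, []
    while pos < len(buf):
        ino, gen, reclen, namelen = HDR.unpack_from(buf, pos)
        name = buf[pos + HDR.size : pos + HDR.size + namelen]
        found.append((ino, gen, name))
        pos += reclen
    return found

records = [(128, 1, b"a.txt"), (131, 7, b"hardlink-to-a")]
fixed, dense = pack_fixed(records), pack_dense(records)
print(len(fixed), len(dense))   # prints: 544 56
```

With short names, the dense form needs about a tenth of the bytes in
this example, so far less has to be copied back when the buffer is
mostly empty.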

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-ioctl-flexarray

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-ioctl-flexarray

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=pptrs-ioctl-flexarray
---
 fs/xfs/libxfs/xfs_fs.h    |   75 ++++++++++++++++++++-------------------------
 fs/xfs/xfs_ioctl.c        |   67 +++++++++++++++++++++++++++++-----------
 fs/xfs/xfs_ondisk.h       |    4 +-
 fs/xfs/xfs_parent_utils.c |   57 +++++++++++++++++++++-------------
 fs/xfs/xfs_parent_utils.h |   11 ++++++-
 fs/xfs/xfs_trace.h        |   30 +++++++++---------
 6 files changed, 143 insertions(+), 101 deletions(-)



* [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (9 preceding siblings ...)
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/3] xfs: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
@ 2023-02-16 20:29 ` Darrick J. Wong
  2023-02-16 20:53   ` [PATCH 01/25] xfsprogs: Fix default superblock attr bits Darrick J. Wong
                     ` (24 more replies)
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: bug fixes before parent pointers Darrick J. Wong
                   ` (14 subsequent siblings)
  25 siblings, 25 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:29 UTC (permalink / raw)
  To: djwong
  Cc: Allison Henderson, Mark Tinguely, Allison Collins,
	Darrick J. Wong, Catherine Hoang, Dave Chinner,
	allison.henderson, linux-xfs

Hi all,

NOTE: Darrick has tweaked some of these patches to match the kernel
code.

This is the latest parent pointer attributes patch set for xfs.
The goal of this patch set is to add a parent pointer attribute to each inode.
The attribute name contains the parent inode, generation, and directory
offset, while the attribute value contains the file name.  This feature will
enable future optimizations for online scrub, shrink, nfs handles, verity, or
any other feature that could make use of quickly deriving an inode's path from
the mount point.

This set can be viewed on github here
https://github.com/allisonhenderson/xfs/tree/xfs_new_pptrsv9_r2

And the corresponding xfsprogs code is here
https://github.com/allisonhenderson/xfsprogs/tree/xfsprogs_new_pptrs_v9_r2

This set has been tested with the below parent pointers tests
https://lore.kernel.org/fstests/20221012013812.82161-1-catherine.hoang@oracle.com/T/#t

Updates since v8:

xfs: parent pointer attribute creation
   Fix xfs_parent_init to release log assist on alloc fail
   Add slab cache for xfs_parent_defer
   Fix xfs_create to release after unlock
   Add xfs_parent_start and xfs_parent_finish wrappers
   removed unused xfs_parent_name_irec and xfs_init_parent_name_irec

xfs: add parent attributes to link
   Start/finish wrapper updates
   Fix xfs_link to disallow reservationless quotas
   
xfs: add parent attributes to symlink
   Fix xfs_symlink to release after unlock
   Start/finish wrapper updates
   
xfs: remove parent pointers in unlink
   Start/finish wrapper updates
   Add missing parent free

xfs: Add parent pointers to rename
   Start/finish wrapper updates
   Fix rename to only grab logged xattr once
   Fix xfs_rename to disallow reservationless quotas
   Fix double unlock on dqattach fail
   Move parent frees to out_release_wip
   
xfs: Add parent pointers to xfs_cross_rename
   Hoist parent pointers into rename

Questions, comments, and feedback appreciated!

Thanks all!
Allison

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=pptrs
---
 db/attr.c                |    3 
 db/attrshort.c           |    3 
 include/handle.h         |    2 
 include/parent.h         |   25 ++
 io/parent.c              |  505 ++++++++++++++--------------------------------
 libfrog/fsgeom.c         |    4 
 libfrog/paths.c          |  136 ++++++++++++
 libfrog/paths.h          |   21 ++
 libhandle/Makefile       |    2 
 libhandle/handle.c       |    7 -
 libhandle/parent.c       |  361 +++++++++++++++++++++++++++++++++
 libxfs/Makefile          |    2 
 libxfs/libxfs_priv.h     |    5 
 libxfs/xfs_attr.c        |   71 ++++++
 libxfs/xfs_attr.h        |   13 +
 libxfs/xfs_da_btree.h    |    3 
 libxfs/xfs_da_format.h   |   26 ++
 libxfs/xfs_defer.c       |   28 ++-
 libxfs/xfs_defer.h       |    8 +
 libxfs/xfs_dir2.c        |   21 ++
 libxfs/xfs_dir2.h        |    7 -
 libxfs/xfs_dir2_block.c  |    9 -
 libxfs/xfs_dir2_leaf.c   |    8 +
 libxfs/xfs_dir2_node.c   |    8 +
 libxfs/xfs_dir2_sf.c     |    6 +
 libxfs/xfs_format.h      |    4 
 libxfs/xfs_fs.h          |   75 +++++++
 libxfs/xfs_log_format.h  |    7 -
 libxfs/xfs_log_rlimit.c  |   53 +++++
 libxfs/xfs_parent.c      |  204 +++++++++++++++++++
 libxfs/xfs_parent.h      |   84 ++++++++
 libxfs/xfs_sb.c          |    4 
 libxfs/xfs_trans_resv.c  |  322 +++++++++++++++++++++++++----
 libxfs/xfs_trans_space.h |    8 -
 logprint/log_redo.c      |  212 +++++++++++++++++--
 logprint/logprint.h      |    5 
 man/man3/xfsctl.3        |   55 +++++
 mkfs/proto.c             |   50 +++--
 mkfs/xfs_mkfs.c          |   31 ++-
 repair/attr_repair.c     |   19 +-
 repair/phase6.c          |   24 +-
 scrub/inodes.c           |   26 ++
 scrub/inodes.h           |    2 
 43 files changed, 1957 insertions(+), 512 deletions(-)
 create mode 100644 libhandle/parent.c
 create mode 100644 libxfs/xfs_parent.c
 create mode 100644 libxfs/xfs_parent.h



* [PATCHSET v9r2d1 0/6] xfsprogs: bug fixes before parent pointers
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (10 preceding siblings ...)
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
@ 2023-02-16 20:29 ` Darrick J. Wong
  2023-02-16 21:00   ` [PATCH 1/6] libxfs: initialize the slab cache for parent defer items Darrick J. Wong
                     ` (5 more replies)
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: tool fixes for parent pointers Darrick J. Wong
                   ` (13 subsequent siblings)
  25 siblings, 6 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:29 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

This series contains the accumulated bug fixes from Darrick to make
fstests pass and online repair work.  None of these are bug fixes for
parent pointers themselves.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-bugfixes

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-bugfixes
---
 include/libxfs.h        |    1 +
 libxfs/init.c           |    3 ++
 libxfs/xfs_attr.c       |   61 +++++++---------------------------
 libxfs/xfs_attr.h       |    7 ++--
 libxfs/xfs_attr_leaf.c  |    6 ++-
 libxfs/xfs_attr_sf.h    |    1 +
 libxfs/xfs_dir2_block.c |    2 +
 libxfs/xfs_dir2_leaf.c  |    2 +
 libxfs/xfs_dir2_node.c  |    2 +
 libxfs/xfs_dir2_sf.c    |    4 ++
 libxfs/xfs_parent.c     |   84 +++++++++++++++++++++++++++++++++++++++++------
 libxfs/xfs_parent.h     |   28 +++++++++++++++-
 12 files changed, 136 insertions(+), 65 deletions(-)



* [PATCHSET v9r2d1 0/6] xfsprogs: tool fixes for parent pointers
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (11 preceding siblings ...)
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: bug fixes before parent pointers Darrick J. Wong
@ 2023-02-16 20:29 ` Darrick J. Wong
  2023-02-16 21:01   ` [PATCH 1/6] xfs_scrub: don't report media errors for space with unknowable owner Darrick J. Wong
                     ` (5 more replies)
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl Darrick J. Wong
                   ` (12 subsequent siblings)
  25 siblings, 6 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:29 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

Here are a bunch of tooling changes for the parent pointers code.  The
only new feature here is to decode the parent pointer xattr name in
xfs_db so that we can interpret (and someday fuzz) them.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-toolfixes
---
 db/attr.c                |   31 ++++++++++++++++++++++
 db/attrshort.c           |   25 ++++++++++++++++++
 db/metadump.c            |   34 +++++++++++++++++-------
 libxfs/init.c            |    4 +++
 libxfs/libxfs_api_defs.h |    4 +++
 libxfs/util.c            |   14 ++++++++++
 mkfs/proto.c             |   65 +++++++++++++++++++++++++++++++++-------------
 scrub/phase6.c           |   13 ++++++++-
 8 files changed, 161 insertions(+), 29 deletions(-)



* [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (12 preceding siblings ...)
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: tool fixes for parent pointers Darrick J. Wong
@ 2023-02-16 20:29 ` Darrick J. Wong
  2023-02-16 21:03   ` [PATCH 01/10] xfs_scrub: revert unnecessary code from "implement the upper half of parent pointers" Darrick J. Wong
                     ` (9 more replies)
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/4] xfsprogs: offline fsck support patches Darrick J. Wong
                   ` (11 subsequent siblings)
  25 siblings, 10 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:29 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

Yikes.  The userspace parent pointers support code dates from 2017 and
is very very moldy.  This patchset moves the xfs_io filtering stuff back
to xfs_io.  It also moves the parent pointer support code to libfrog
because we don't want to expose things via libhandle until we're
absolutely sure that we want to do that.

(We probably want to do that some day.)

Finally, adapt xfs_scrub to use parent pointer information whenever it
has something to say about a file handle that it has open.
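
The idea of turning a file handle into a path can be sketched as a walk
up the parent pointers until the root directory is reached.  The
dictionary below stands in for the per-inode parent pointer xattrs; the
inode numbers, names, and the paths_for() helper are hypothetical and do
not reflect the libfrog API:

```python
ROOT_INO = 128  # arbitrary root inode number for this example

# child inode -> list of (parent inode, dirent name).
# A hard-linked file has more than one entry.
pptrs = {
    129: [(128, "home")],
    130: [(129, "alice")],
    131: [(130, "notes.txt"), (129, "shared-notes")],
}

def paths_for(ino, pptrs, root=ROOT_INO):
    """Return every path from the root to ino, one per chain of parents."""
    if ino == root:
        return [""]
    found = []
    for parent_ino, name in pptrs.get(ino, []):
        for prefix in paths_for(parent_ino, pptrs, root):
            found.append(prefix + "/" + name)
    return sorted(found)

print(paths_for(131, pptrs))
# prints: ['/home/alice/notes.txt', '/home/shared-notes']
```

This toy assumes the parent pointers are consistent; a corrupt
filesystem could contain loops, so real code would need to bound the
walk.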

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-use-getparents
---
 include/parent.h   |   25 ------
 io/parent.c        |  122 ++++++++++++++++++-----------
 libfrog/Makefile   |    2 
 libfrog/paths.c    |   39 ++++++++-
 libfrog/paths.h    |    8 +-
 libfrog/pptrs.c    |  219 ++++++++++++++++++++++------------------------------
 libfrog/pptrs.h    |   25 ++++++
 libhandle/Makefile |    2 
 scrub/common.c     |   21 +++++
 scrub/inodes.c     |   26 ------
 scrub/inodes.h     |    2 
 11 files changed, 256 insertions(+), 235 deletions(-)
 rename libhandle/parent.c => libfrog/pptrs.c (50%)
 create mode 100644 libfrog/pptrs.h



* [PATCHSET v9r2d1 0/4] xfsprogs: offline fsck support patches
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (13 preceding siblings ...)
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl Darrick J. Wong
@ 2023-02-16 20:30 ` Darrick J. Wong
  2023-02-16 21:06   ` [PATCH 1/4] libxfs: add xfile support Darrick J. Wong
                     ` (3 more replies)
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/3] xfsprogs: online repair of directories Darrick J. Wong
                   ` (10 subsequent siblings)
  25 siblings, 4 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:30 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

These are all the patches that I needed to port from the patchset that
backports various online fsck things to offline fsck so that I can start
writing offline fsck for parent pointers and directories.

IOWs, we're blatantly copying things from the online repair part 1
megaseries; this is what online repair part 2 requires.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-online-fsck-backports

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-online-fsck-backports
---
 configure.ac             |    3 +
 db/attr.c                |    2 
 db/metadump.c            |    2 
 include/builddefs.in     |    3 +
 libxfs/Makefile          |   14 +++
 libxfs/libxfs_api_defs.h |    1 
 libxfs/xfblob.c          |  148 +++++++++++++++++++++++++++++
 libxfs/xfblob.h          |   25 +++++
 libxfs/xfile.c           |  235 ++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfile.h           |   57 +++++++++++
 libxfs/xfs_dir2.c        |    6 +
 libxfs/xfs_dir2.h        |    1 
 m4/package_libcdev.m4    |   50 ++++++++++
 repair/attr_repair.c     |    6 +
 repair/phase6.c          |    4 -
 repair/xfs_repair.c      |   15 +++
 16 files changed, 563 insertions(+), 9 deletions(-)
 create mode 100644 libxfs/xfblob.c
 create mode 100644 libxfs/xfblob.h
 create mode 100644 libxfs/xfile.c
 create mode 100644 libxfs/xfile.h



* [PATCHSET v9r2d1 0/3] xfsprogs: online repair of directories
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (14 preceding siblings ...)
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/4] xfsprogs: offline fsck support patches Darrick J. Wong
@ 2023-02-16 20:30 ` Darrick J. Wong
  2023-02-16 21:07   ` [PATCH 1/3] xfs: shorten parent pointer function names Darrick J. Wong
                     ` (2 more replies)
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/1] xfsprogs: online checking of parent pointers Darrick J. Wong
                   ` (9 subsequent siblings)
  25 siblings, 3 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:30 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

With this patchset, we implement online reconstruction of directories by
scanning the entire filesystem looking for parent pointer data.  This
mostly works, except for the part where we need to resync the diroffset
field of the parent pointers to match the new directory structure.

Fixing that is left as an open research question, with a few possible
solutions:

1. As part of committing the new directory, queue a bunch of parent
pointer updates to make those changes.

2. Leave them inconsistent and let the parent pointer repair fix it.

3. Change the ondisk format of parent pointers (and xattrs) so that we
can encode the full dirent name in the xattr name.

4. Change the ondisk format of parent pointers to encode a sha256 hash
of the dirent name in the xattr name.  This will work as long as nobody
breaks sha256.

Thoughts?  Note that the atomic swapext and block reaping code is NOT
ported for this PoC, so we do not commit any repairs.
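
The scan described above can be modelled roughly as follows: walk every
file, pick out the parent pointers that name the directory under repair,
and collect them into a fresh set of entries.  This toy uses the
(parent_ino, parent_gen, diroffset) -> name layout with small integers
as offsets; the data and the rebuild_dir() helper are hypothetical.  It
also shows the diroffset problem, since the rebuilt directory hands out
new offsets:

```python
def rebuild_dir(dir_ino, dir_gen, all_files):
    """Rebuild a directory's entries from the parent pointers of all files.

    all_files maps child inode ->
        list of ((parent_ino, parent_gen, diroffset), name).
    Returns (entries, stale): entries maps new offset -> (name, child
    inode); stale lists parent pointers whose diroffset no longer matches.
    """
    hits = []
    for child_ino, pptr_list in all_files.items():
        for (p_ino, p_gen, diroffset), name in pptr_list:
            if (p_ino, p_gen) == (dir_ino, dir_gen):
                hits.append((name, child_ino, diroffset))

    entries, stale = {}, []
    for new_off, (name, child_ino, old_off) in enumerate(sorted(hits)):
        entries[new_off] = (name, child_ino)
        if new_off != old_off:
            stale.append((child_ino, name, old_off, new_off))
    return entries, stale

all_files = {
    200: [((128, 1, 0), "a")],
    201: [((128, 1, 5), "b")],   # offset 5: the old directory had holes
    202: [((999, 1, 0), "c")],   # belongs to a different directory
}
entries, stale = rebuild_dir(128, 1, all_files)
print(entries)  # prints: {0: ('a', 200), 1: ('b', 201)}
print(stale)    # prints: [(201, 'b', 5, 1)]
```

Every item in the stale list is a parent pointer that one of the four
options above would have to deal with.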

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-online-dir-repair

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-online-dir-repair
---
 libxfs/libxfs_api_defs.h |    3 +--
 libxfs/xfs_da_format.h   |   11 +++++++++++
 libxfs/xfs_dir2.c        |    2 +-
 libxfs/xfs_dir2.h        |    2 +-
 libxfs/xfs_parent.c      |   47 +++++++++++++++++++++++++++++++++++++++-------
 libxfs/xfs_parent.h      |   24 +++++++++++------------
 mkfs/proto.c             |   12 ++++++------
 7 files changed, 71 insertions(+), 30 deletions(-)



* [PATCHSET v9r2d1 0/1] xfsprogs: online checking of parent pointers
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (15 preceding siblings ...)
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/3] xfsprogs: online repair of directories Darrick J. Wong
@ 2023-02-16 20:30 ` Darrick J. Wong
  2023-02-16 21:07   ` [PATCH 1/1] xfs: deferred scrub " Darrick J. Wong
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/2] xfsprogs: online checking " Darrick J. Wong
                   ` (8 subsequent siblings)
  25 siblings, 1 reply; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:30 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

Update the existing online parent pointer checker to confirm that the
directory entries corresponding to each parent pointer also exist.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-online-parent-check

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-online-parent-check
---
 libxfs/xfs_parent.c |   38 ++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_parent.h |   10 ++++++++++
 2 files changed, 48 insertions(+)



* [PATCHSET v9r2d1 0/2] xfsprogs: online checking of parent pointers
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (16 preceding siblings ...)
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/1] xfsprogs: online checking of parent pointers Darrick J. Wong
@ 2023-02-16 20:30 ` Darrick J. Wong
  2023-02-16 21:08   ` [PATCH 1/2] xfs: repair parent pointers by scanning directories Darrick J. Wong
  2023-02-16 21:08   ` [PATCH 2/2] xfs: repair parent pointers with live scan hooks Darrick J. Wong
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/8] xfs_repair: support parent pointers Darrick J. Wong
                   ` (7 subsequent siblings)
  25 siblings, 2 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:30 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

With this patchset, we implement online repairs for parent pointers.
This is structured similarly to the directory repair code in that we
scan the entire filesystem looking for dirents and use them to
reconstruct the parent pointer information.

Note that the atomic swapext and block reaping code is NOT ported for
this PoC, so we do not commit any repairs.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-online-parent-repair

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-online-parent-repair
---
 include/xfs_inode.h |    6 +++++
 libxfs/xfs_parent.c |   56 +++++++++++++++++++++++++++++++++++++++++++++++++--
 libxfs/xfs_parent.h |    8 +++++++
 3 files changed, 68 insertions(+), 2 deletions(-)



* [PATCHSET v9r2d1 0/8] xfs_repair: support parent pointers
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (17 preceding siblings ...)
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/2] xfsprogs: online checking " Darrick J. Wong
@ 2023-02-16 20:31 ` Darrick J. Wong
  2023-02-16 21:08   ` [PATCH 1/8] xfs_repair: build a parent pointer index Darrick J. Wong
                     ` (7 more replies)
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/6] xfsprogs: encode parent pointer name in xattr key Darrick J. Wong
                   ` (6 subsequent siblings)
  25 siblings, 8 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:31 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

This patchset implements offline checking and repair for parent
pointers.  We do this rather expensively by constructing a (per-AG)
master list of parent pointers for inodes rooted in that AG.  Next, we
walk each inode of that AG, construct an index of that file's parent
pointers, and then compare the file index against the relevant part of
the master index.  From there we can sync the parent pointers as needed.
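
In outline, the comparison behaves like a set difference between what
the directories say and what each file's xattrs say.  The sketch below
models that with plain Python sets and ignores the per-AG partitioning;
the tuples, inode numbers, and check_file() helper are made up for
illustration and are not the data structures in repair/pptr.c:

```python
from collections import defaultdict

def build_master_index(dirents):
    """Index the parent pointers implied by directory entries, per child.

    dirents is an iterable of (parent_ino, parent_gen, name, child_ino).
    """
    master = defaultdict(set)
    for parent_ino, parent_gen, name, child_ino in dirents:
        master[child_ino].add((parent_ino, parent_gen, name))
    return master

def check_file(child_ino, file_pptrs, master):
    """Compare one file's parent pointers against the master index.

    Returns (to_add, to_remove): pointers the file is missing, and
    pointers that no directory entry backs up.
    """
    expected = master.get(child_ino, set())
    actual = set(file_pptrs)
    return expected - actual, actual - expected

dirents = [(128, 1, "a", 200), (128, 1, "b", 200), (129, 3, "c", 201)]
master = build_master_index(dirents)
to_add, to_remove = check_file(200, [(128, 1, "a"), (128, 1, "old")], master)
print(to_add)     # prints: {(128, 1, 'b')}
print(to_remove)  # prints: {(128, 1, 'old')}
```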

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-offline-repair
---
 libxfs/libxfs_api_defs.h |    7 
 libxfs/xfblob.c          |    9 
 libxfs/xfblob.h          |    2 
 repair/Makefile          |    6 
 repair/listxattr.c       |  283 ++++++++++++
 repair/listxattr.h       |   15 +
 repair/phase6.c          |   57 ++
 repair/pptr.c            | 1111 ++++++++++++++++++++++++++++++++++++++++++++++
 repair/pptr.h            |   17 +
 repair/strblobs.c        |  215 +++++++++
 repair/strblobs.h        |   22 +
 11 files changed, 1737 insertions(+), 7 deletions(-)
 create mode 100644 repair/listxattr.c
 create mode 100644 repair/listxattr.h
 create mode 100644 repair/pptr.c
 create mode 100644 repair/pptr.h
 create mode 100644 repair/strblobs.c
 create mode 100644 repair/strblobs.h



* [PATCHSET v9r2d1 0/6] xfsprogs: encode parent pointer name in xattr key
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (18 preceding siblings ...)
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/8] xfs_repair: support parent pointers Darrick J. Wong
@ 2023-02-16 20:31 ` Darrick J. Wong
  2023-02-16 21:10   ` [PATCH 1/6] libfrog: support the sha512 hash algorithm Darrick J. Wong
                     ` (5 more replies)
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/3] xfsprogs: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
                   ` (5 subsequent siblings)
  25 siblings, 6 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:31 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

As I've mentioned in past comments on the parent pointers patchset, the
proposed ondisk parent pointer format presents a major difficulty for
online directory repair.  This difficulty derives from encoding the
directory offset of the dirent that the parent pointer is mirroring.
Recall that parent pointers are stored in extended attributes:

    (parent_ino, parent_gen, diroffset) -> (dirent_name)

If the directory is rebuilt, the offsets of the new directory entries
must match the diroffset encoded in the parent pointer, or the
filesystem becomes inconsistent.  There are a few ways to solve this
problem.

One approach would be to augment the directory addname function to take
a diroffset and try to create the new entry at that offset.  This will
not work if the original directory became corrupt and the parent
pointers were written out with impossible diroffsets (e.g. overlapping).
Requiring matching diroffsets also prevents reorganization and
compaction of directories.

This could be remedied by recording the parent pointer diroffset updates
necessary to retain consistency, and using the logged parent pointer
replace function to rewrite parent pointers as necessary.  This is a
poor choice from a performance perspective because the logged xattr
updates must be committed in the same transaction that commits the new
directory structure.  If there are a large number of diroffset updates,
then the directory commit could take an even longer time.

Worse yet, if the logged xattr updates fill up the transaction, repair
will have no choice but to roll to a fresh transaction to continue
logging.  This breaks repair's policy that repairs should commit
atomically.  It may break the filesystem as well, since all files
involved are pinned until the delayed pptr xattr processing completes.
This is a completely bad engineering choice.

Note that the diroffset information is not used anywhere in the
directory lookup code.  Observe that the only information that we
require for a parent pointer is the inverse of a pre-ftype dirent,
since this is all we need to reconstruct a directory entry:

    (parent_ino, dirent_name) -> NULL

The xattr code supports xattrs with zero-length values, surprisingly.
The parent_gen field makes it easy to export parent handle information,
so it can be retained:

    (parent_ino, parent_gen, dirent_name) -> NULL

Moving the ondisk format to this format is very advantageous for repair
code.  Unfortunately, there is one hitch: xattr names cannot exceed 255
bytes due to ondisk format limitations.  We don't want to constrain the
length of dirent names, so instead we could use collision resistant
hashes to handle dirents with very long names:

    (parent_ino, parent_gen, sha512(dirent_name)) -> (dirent_name)

The first two patches implement this schema.  However, this encoding is
not maximally efficient, since many directory names are shorter than the
length of a sha512 hash.  The last three patches in the series bifurcate
the parent pointer ondisk format depending on context:

For dirent names shorter than 243 bytes:

    (parent_ino, parent_gen, dirent_name) -> NULL

For dirent names longer than 243 bytes:

    (parent_ino, parent_gen, dirent_name[0:178],
     sha512(child_gen, dirent_name)) -> (dirent_name[179:255])

The child file's generation number is mixed into the sha512 computation
to make it a little more difficult for unprivileged userspace to attempt
collisions.
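
A minimal sketch of the two-way encoding, to make the byte layout
concrete.  Several details here are assumptions rather than facts from
the patches: the header is packed as a big-endian 64-bit parent_ino plus
a 32-bit parent_gen, child_gen is mixed into the hash by prepending it
to the name, names of exactly 243 bytes are treated as inline, and the
NULL value is modelled as an empty byte string:

```python
import hashlib
import struct

XATTR_NAME_MAX = 255
HEADER = struct.Struct(">QI")   # assumed: be64 parent_ino, be32 parent_gen
INLINE_MAX = XATTR_NAME_MAX - HEADER.size                # 243
PREFIX_LEN = INLINE_MAX - hashlib.sha512().digest_size   # 179

def encode_pptr(parent_ino, parent_gen, child_gen, dirent_name):
    """Return (xattr_name, xattr_value) for one parent pointer."""
    head = HEADER.pack(parent_ino, parent_gen)
    if len(dirent_name) <= INLINE_MAX:
        # Short name: the whole dirent name lives in the xattr name.
        return head + dirent_name, b""
    # Long name: keep a prefix, append a hash of (child_gen, name), and
    # spill the rest of the name into the xattr value.
    digest = hashlib.sha512(
        struct.pack(">I", child_gen) + dirent_name).digest()
    return head + dirent_name[:PREFIX_LEN] + digest, dirent_name[PREFIX_LEN:]

def decode_name(xattr_name, xattr_value):
    """Recover the dirent name from an encoded parent pointer."""
    body = xattr_name[HEADER.size:]
    if not xattr_value:
        return body
    return body[:PREFIX_LEN] + xattr_value

long_name = b"n" * 200 + b"z" * 55
name, value = encode_pptr(128, 1, 7, long_name)
print(len(name), len(value))   # prints: 255 76
```

Either way the xattr name never exceeds 255 bytes, and the full dirent
name can be recovered from the name and value together.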

A messier solution to this problem would be to extend the xattr ondisk
format to allow parent pointers to have xattr names up to 267 bytes.
This would likely involve redefining the ondisk namelen field to omit
the size of the parent ino/gen information and might be madness.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-name-in-attr-key

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-name-in-attr-key

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=pptrs-name-in-attr-key
---
 db/attr.c                |   29 +++++
 db/attrshort.c           |   19 +++
 db/field.c               |    2 
 db/field.h               |    1 
 db/fprint.c              |   31 +++++
 db/fprint.h              |    2 
 db/metadump.c            |   39 +++----
 include/libxfs.h         |    1 
 io/crc32cselftest.c      |   22 ++++
 libfrog/Makefile         |   10 +-
 libfrog/sha512.c         |  249 +++++++++++++++++++++++++++++++++++++++++++
 libfrog/sha512.h         |   33 ++++++
 libfrog/sha512selftest.h |   86 +++++++++++++++
 libxfs/libxfs_api_defs.h |    2 
 libxfs/libxfs_priv.h     |    1 
 libxfs/xfs_da_format.h   |   49 ++++++++-
 libxfs/xfs_fs.h          |    4 -
 libxfs/xfs_parent.c      |  264 +++++++++++++++++++++++++++++++++++++---------
 libxfs/xfs_parent.h      |   48 ++++++--
 libxfs/xfs_trans_resv.c  |    6 +
 logprint/log_redo.c      |  124 ++++++++++++++++------
 logprint/logprint.h      |    3 -
 man/man3/xfsctl.3        |    1 
 man/man8/xfs_io.8        |    4 +
 mkfs/proto.c             |    7 +
 mkfs/xfs_mkfs.c          |    8 +
 repair/init.c            |    5 +
 repair/phase6.c          |   13 +-
 repair/pptr.c            |  199 +++++++++++++++++++++++------------
 repair/pptr.h            |    2 
 30 files changed, 1043 insertions(+), 221 deletions(-)
 create mode 100644 libfrog/sha512.c
 create mode 100644 libfrog/sha512.h
 create mode 100644 libfrog/sha512selftest.h



* [PATCHSET v9r2d1 0/3] xfsprogs: use flex arrays for XFS_IOC_GETPARENTS
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (19 preceding siblings ...)
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/6] xfsprogs: encode parent pointer name in xattr key Darrick J. Wong
@ 2023-02-16 20:31 ` Darrick J. Wong
  2023-02-16 21:12   ` [PATCH 1/3] xfs: rename xfs_pptr_info to xfs_getparents Darrick J. Wong
                     ` (2 more replies)
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/3] xfsprogs: turn on all available features Darrick J. Wong
                   ` (4 subsequent siblings)
  25 siblings, 3 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:31 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

Change the XFS_IOC_GETPARENTS structure definitions so that we can pack
parent pointer information more densely, in a manner similar to the
attrlistmulti ioctl.  This also reduces the amount of memory that has to
be copied back to userspace if the buffer doesn't fill up.
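
As a rough illustration of the dense-packing idea, here is a minimal userspace sketch.  It is not the XFS_IOC_GETPARENTS ABI: the names `demo_rec`, `demo_reclen` and `demo_pack` are hypothetical, and the field layout is an assumption made for the example.  Each record ends in a flexible array member holding the name, so a short name only consumes the bytes it needs, and only the used part of the buffer has to be copied out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variable-length record; NOT the real XFS_IOC_GETPARENTS layout. */
struct demo_rec {
	uint64_t	ino;		/* parent inode number */
	uint32_t	gen;		/* parent generation */
	uint16_t	reclen;		/* total bytes used by this record */
	uint16_t	namelen;	/* bytes in name[], no NUL terminator */
	unsigned char	name[];		/* flexible array member */
};

/* Round each record up to 8 bytes so the next header stays aligned. */
static size_t demo_reclen(size_t namelen)
{
	return (offsetof(struct demo_rec, name) + namelen + 7) & ~(size_t)7;
}

/*
 * Append one record at offset *used in buf.  Returns 0 on success or -1 if
 * the record does not fit; *used is advanced only on success.
 */
static int demo_pack(void *buf, size_t bufsize, size_t *used,
		     uint64_t ino, uint32_t gen, const char *name)
{
	size_t		namelen = strlen(name);
	size_t		reclen;
	struct demo_rec	hdr;
	unsigned char	*p;

	if (namelen > 255 || *used > bufsize)
		return -1;
	reclen = demo_reclen(namelen);
	if (reclen > bufsize - *used)
		return -1;

	hdr.ino = ino;
	hdr.gen = gen;
	hdr.reclen = (uint16_t)reclen;
	hdr.namelen = (uint16_t)namelen;

	p = (unsigned char *)buf + *used;
	memset(p, 0, reclen);		/* zero the padding bytes too */
	memcpy(p, &hdr, offsetof(struct demo_rec, name));
	memcpy(p + offsetof(struct demo_rec, name), name, namelen);
	*used += reclen;
	return 0;
}
```

A reader of such a buffer would walk it by adding each record's `reclen` to the current offset until it reaches the number of bytes used.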

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-ioctl-flexarray

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-ioctl-flexarray

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=pptrs-ioctl-flexarray
---
 io/parent.c       |   22 ++++++++--------
 libfrog/pptrs.c   |   44 ++++++++++++++++---------------
 libfrog/pptrs.h   |    4 +--
 libxfs/xfs_fs.h   |   75 +++++++++++++++++++++++------------------------------
 man/man3/xfsctl.3 |   16 ++++++-----
 5 files changed, 76 insertions(+), 85 deletions(-)



* [PATCHSET v9r2d1 0/3] xfsprogs: turn on all available features
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (20 preceding siblings ...)
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/3] xfsprogs: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
@ 2023-02-16 20:31 ` Darrick J. Wong
  2023-02-16 21:13   ` [PATCH 1/3] mkfs: enable large extent counts by default Darrick J. Wong
                     ` (2 more replies)
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
                   ` (3 subsequent siblings)
  25 siblings, 3 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:31 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

Enable reverse mapping, large extent counts, and parent pointers by default.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-mkfs-defaults
---
 man/man8/mkfs.xfs.8.in |   11 ++++++-----
 mkfs/xfs_mkfs.c        |    6 +++---
 2 files changed, 9 insertions(+), 8 deletions(-)



* [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (21 preceding siblings ...)
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/3] xfsprogs: turn on all available features Darrick J. Wong
@ 2023-02-16 20:32 ` Darrick J. Wong
  2023-02-16 21:13   ` [PATCH 01/14] xfs/122: update for " Darrick J. Wong
                     ` (13 more replies)
  2023-02-16 20:32 ` [PATCHSET v9r2 0/4] fstests: encode parent pointer name in xattr key Darrick J. Wong
                   ` (2 subsequent siblings)
  25 siblings, 14 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:32 UTC (permalink / raw)
  To: djwong, zlang
  Cc: Catherine Hoang, Allison Henderson, linux-xfs, fstests, guan

Hi all,

Adjust fstests as necessary to test the new xfs parent pointers feature.
At some point this section needs to grow some specific functionality
tests for repair and dumping.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=pptrs
---
 common/parent                        |  209 +++++++
 common/populate                      |   38 +
 common/rc                            |    7 
 common/xfs                           |   12 
 doc/group-names.txt                  |    1 
 src/popdir.pl                        |   11 
 tests/generic/050                    |    9 
 tests/generic/050.cfg                |    1 
 tests/generic/050.out.xfsquotaparent |   23 +
 tests/xfs/018                        |    7 
 tests/xfs/021                        |   15 -
 tests/xfs/021.cfg                    |    1 
 tests/xfs/021.out.default            |    0 
 tests/xfs/021.out.parent             |   64 ++
 tests/xfs/122.out                    |    4 
 tests/xfs/191                        |    7 
 tests/xfs/206                        |    3 
 tests/xfs/288                        |    7 
 tests/xfs/306                        |    9 
 tests/xfs/851                        |  116 ++++
 tests/xfs/851.out                    |   69 ++
 tests/xfs/852                        |   69 ++
 tests/xfs/852.out                    | 1002 ++++++++++++++++++++++++++++++++++
 tests/xfs/853                        |   85 +++
 tests/xfs/853.out                    |   14 
 25 files changed, 1772 insertions(+), 11 deletions(-)
 create mode 100644 common/parent
 create mode 100644 tests/generic/050.out.xfsquotaparent
 create mode 100644 tests/xfs/021.cfg
 rename tests/xfs/{021.out => 021.out.default} (100%)
 create mode 100644 tests/xfs/021.out.parent
 create mode 100755 tests/xfs/851
 create mode 100644 tests/xfs/851.out
 create mode 100755 tests/xfs/852
 create mode 100644 tests/xfs/852.out
 create mode 100755 tests/xfs/853
 create mode 100644 tests/xfs/853.out



* [PATCHSET v9r2 0/4] fstests: encode parent pointer name in xattr key
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (22 preceding siblings ...)
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
@ 2023-02-16 20:32 ` Darrick J. Wong
  2023-02-16 21:17   ` [PATCH 1/4] misc: adjust for parent pointers with namehashes Darrick J. Wong
                     ` (3 more replies)
  2023-02-16 20:32 ` [PATCHSET v9r2 0/1] fstests: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
  2023-02-17 20:02 ` [RFC DELUGE v9r2d1] xfs: Parent Pointers Allison Henderson
  25 siblings, 4 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:32 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

Hi all,

As I've mentioned in past comments on the parent pointers patchset, the
proposed ondisk parent pointer format presents a major difficulty for
online directory repair.  This difficulty derives from encoding the
directory offset of the dirent that the parent pointer is mirroring.
Recall that parent pointers are stored in extended attributes:

    (parent_ino, parent_gen, diroffset) -> (dirent_name)

If the directory is rebuilt, the offsets of the new directory entries
must match the diroffset encoded in the parent pointer, or the
filesystem becomes inconsistent.  There are a few ways to solve this
problem.
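
The consistency constraint described above can be sketched in a few lines of userspace C.  The types and the function `demo_pptr_matches` are hypothetical stand-ins, not kernel structures; the point is only that, with the diroffset stored in the parent pointer, a dirent that keeps its name but lands at a different offset after a rebuild no longer matches.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of (parent_ino, parent_gen, diroffset) -> name. */
struct demo_pptr {
	uint64_t	parent_ino;
	uint32_t	parent_gen;
	uint32_t	diroffset;
	const char	*name;
};

/* Hypothetical directory entry: where it lives in the directory, and its name. */
struct demo_dirent {
	uint32_t	offset;
	const char	*name;
};

/* A parent pointer is consistent only if both the offset and the name agree. */
static bool demo_pptr_matches(const struct demo_pptr *pp,
			      const struct demo_dirent *de)
{
	return pp->diroffset == de->offset && strcmp(pp->name, de->name) == 0;
}
```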

One approach would be to augment the directory addname function to take
a diroffset and try to create the new entry at that offset.  This will
not work if the original directory became corrupt and the parent
pointers were written out with impossible diroffsets (e.g. overlapping).
Requiring matching diroffsets also prevents reorganization and
compaction of directories.

This could be remedied by recording the parent pointer diroffset updates
necessary to retain consistency, and using the logged parent pointer
replace function to rewrite parent pointers as necessary.  This is a
poor choice from a performance perspective because the logged xattr
updates must be committed in the same transaction that commits the new
directory structure.  If there are a large number of diroffset updates,
then the directory commit could take an even longer time.

Worse yet, if the logged xattr updates fill up the transaction, repair
will have no choice but to roll to a fresh transaction to continue
logging.  This breaks repair's policy that repairs should commit
atomically.  It may break the filesystem as well, since all files
involved are pinned until the delayed pptr xattr processing completes.
This is a completely bad engineering choice.

Note that the diroffset information is not used anywhere in the
directory lookup code.  Observe that the only information that we
require for a parent pointer is the inverse of a pre-ftype dirent,
since this is all we need to reconstruct a directory entry:

    (parent_ino, dirent_name) -> NULL

The xattr code supports xattrs with zero-length values, surprisingly.
The parent_gen field makes it easy to export parent handle information,
so it can be retained:

    (parent_ino, parent_gen, dirent_name) -> NULL

Moving the ondisk format to this format is very advantageous for repair
code.  Unfortunately, there is one hitch: xattr names cannot exceed 255
bytes due to ondisk format limitations.  We don't want to constrain the
length of dirent names, so instead we could use collision resistant
hashes to handle dirents with very long names:

    (parent_ino, parent_gen, sha512(dirent_name)) -> (dirent_name)

The first two patches implement this schema.  However, this encoding is
not maximally efficient, since many directory names are shorter than the
length of a sha512 hash.  The last three patches in the series bifurcate
the parent pointer ondisk format depending on context:

For dirent names shorter than 243 bytes:

    (parent_ino, parent_gen, dirent_name) -> NULL

For dirent names longer than 243 bytes:

    (parent_ino, parent_gen, dirent_name[0:178],
     sha512(child_gen, dirent_name)) -> (dirent_name[179:255])

The child file's generation number is mixed into the sha512 computation
to make it a little more difficult for unprivileged userspace to attempt
collisions.
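
The size arithmetic behind the two formats can be sketched as follows.  This is an illustration under stated assumptions, not code from the patches: the `DEMO_*` constants and `demo_pptr_get_layout` are hypothetical names, the 12-byte ino/gen prefix is inferred from the 255 and 267 byte figures in this letter, and a name of exactly 243 bytes is assumed to take the unhashed form because 12 + 243 = 255 still fits.  No hashing is performed; only the resulting key and value lengths are computed.

```c
#include <stddef.h>

#define DEMO_XATTR_NAME_MAX	255	/* ondisk xattr name limit */
#define DEMO_INO_GEN_LEN	12	/* assumed: 8-byte ino + 4-byte gen */
#define DEMO_SHA512_LEN		64	/* a sha512 digest is 64 bytes */
/* Longest dirent name that fits in the key unhashed: 255 - 12 = 243. */
#define DEMO_INLINE_MAX		(DEMO_XATTR_NAME_MAX - DEMO_INO_GEN_LEN)
/* Name bytes kept in the key next to the hash: 243 - 64 = 179. */
#define DEMO_PREFIX_LEN		(DEMO_INLINE_MAX - DEMO_SHA512_LEN)

struct demo_pptr_layout {
	size_t	key_len;	/* bytes in the xattr name */
	size_t	value_len;	/* bytes in the xattr value */
	int	hashed;		/* nonzero if the key carries a hash */
};

/* Compute the xattr key/value sizes for a dirent name of namelen bytes. */
static struct demo_pptr_layout demo_pptr_get_layout(size_t namelen)
{
	struct demo_pptr_layout	l;

	if (namelen <= DEMO_INLINE_MAX) {
		/* (parent_ino, parent_gen, dirent_name) -> NULL */
		l.key_len = DEMO_INO_GEN_LEN + namelen;
		l.value_len = 0;
		l.hashed = 0;
	} else {
		/* (ino, gen, name prefix, hash) -> (rest of the name) */
		l.key_len = DEMO_INO_GEN_LEN + DEMO_PREFIX_LEN +
			    DEMO_SHA512_LEN;
		l.value_len = namelen - DEMO_PREFIX_LEN;
		l.hashed = 1;
	}
	return l;
}
```

Under these assumptions the hashed key is always exactly 255 bytes, and a maximal 255-byte dirent name leaves 76 bytes for the xattr value.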

A messier solution to this problem would be to extend the xattr ondisk
format to allow parent pointers to have xattr names up to 267 bytes.
This would likely involve redefining the ondisk namelen field to omit
the size of the parent ino/gen information and might be madness.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-name-in-attr-key

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-name-in-attr-key

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=pptrs-name-in-attr-key
---
 common/punch             |    8 ++++++++
 tests/xfs/021.out.parent |   22 ++++++++++------------
 tests/xfs/122.out        |    4 ++--
 3 files changed, 20 insertions(+), 14 deletions(-)



* [PATCHSET v9r2 0/1] fstests: use flex arrays for XFS_IOC_GETPARENTS
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (23 preceding siblings ...)
  2023-02-16 20:32 ` [PATCHSET v9r2 0/4] fstests: encode parent pointer name in xattr key Darrick J. Wong
@ 2023-02-16 20:32 ` Darrick J. Wong
  2023-02-16 21:18   ` [PATCH 1/1] xfs/122: adjust for flex-array XFS_IOC_GETPARENTS ioctl Darrick J. Wong
  2023-02-17 20:02 ` [RFC DELUGE v9r2d1] xfs: Parent Pointers Allison Henderson
  25 siblings, 1 reply; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:32 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

Hi all,

Change the XFS_IOC_GETPARENTS structure definitions so that we can pack
parent pointer information more densely, in a manner similar to the
attrlistmulti ioctl.  This also reduces the amount of memory that has to
be copied back to userspace if the buffer doesn't fill up.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-ioctl-flexarray

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-ioctl-flexarray

fstests git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=pptrs-ioctl-flexarray
---
 tests/xfs/122.out |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)



* [PATCH 01/28] xfs: Add new name to attri/d
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
@ 2023-02-16 20:32   ` Darrick J. Wong
  2023-02-16 20:33   ` [PATCH 02/28] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5 Darrick J. Wong
                     ` (26 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:32 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch adds two new fields to the attri/d: nname and nnamelen.
These will be used for parent pointer updates, since a rename
operation may cause the parent pointer to update both the name and
the value.  So we need to carry both the new name and the target
name in the attri/d.
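
For readers who want the buffer layout in isolation, the following userspace sketch mirrors the idea of carrying the name, the new name, and the value back to back in a single allocation.  It is a simplified stand-in, not the kernel code: `demo_vec`, `demo_nameval` and `demo_nameval_alloc` are hypothetical names, and malloc() replaces the kernel allocator.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical (address, length) descriptor, loosely like a log iovec. */
struct demo_vec {
	void	*addr;
	size_t	len;
};

/* Three descriptors followed in memory by name | new name | value. */
struct demo_nameval {
	struct demo_vec	name;
	struct demo_vec	nname;
	struct demo_vec	value;
};

/* Copy len bytes to dst and describe them; empty regions get a NULL addr. */
static void demo_vec_fill(struct demo_vec *v, char *dst, const void *src,
			  size_t len)
{
	v->len = len;
	v->addr = len ? dst : NULL;
	if (len)
		memcpy(dst, src, len);
}

static struct demo_nameval *demo_nameval_alloc(
	const void *name, size_t name_len,
	const void *nname, size_t nname_len,
	const void *value, size_t value_len)
{
	struct demo_nameval	*nv;
	char			*p;

	nv = malloc(sizeof(*nv) + name_len + nname_len + value_len);
	if (!nv)
		return NULL;

	p = (char *)(nv + 1);		/* payload starts after the header */
	demo_vec_fill(&nv->name, p, name, name_len);
	p += name_len;
	demo_vec_fill(&nv->nname, p, nname, nname_len);
	p += nname_len;
	demo_vec_fill(&nv->value, p, value, value_len);
	return nv;
}
```

When no new name is supplied, the value simply follows the name, which matches the layout used before this patch.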

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c       |   12 +++-
 fs/xfs/libxfs/xfs_attr.h       |    4 +
 fs/xfs/libxfs/xfs_da_btree.h   |    2 +
 fs/xfs/libxfs/xfs_log_format.h |    6 +-
 fs/xfs/xfs_attr_item.c         |  135 +++++++++++++++++++++++++++++++++-------
 fs/xfs/xfs_attr_item.h         |    1 
 6 files changed, 133 insertions(+), 27 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index e28d93d232de..b1dbed7655e8 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -423,6 +423,12 @@ xfs_attr_complete_op(
 	args->op_flags &= ~XFS_DA_OP_REPLACE;
 	if (do_replace) {
 		args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
+		if (args->new_namelen > 0) {
+			args->name = args->new_name;
+			args->namelen = args->new_namelen;
+			args->hashval = xfs_da_hashname(args->name,
+							args->namelen);
+		}
 		return replace_state;
 	}
 	return XFS_DAS_DONE;
@@ -922,9 +928,13 @@ xfs_attr_defer_replace(
 	struct xfs_da_args	*args)
 {
 	struct xfs_attr_intent	*new;
+	int			op_flag;
 	int			error = 0;
 
-	error = xfs_attr_intent_init(args, XFS_ATTRI_OP_FLAGS_REPLACE, &new);
+	op_flag = args->new_namelen == 0 ? XFS_ATTRI_OP_FLAGS_REPLACE :
+		  XFS_ATTRI_OP_FLAGS_NVREPLACE;
+
+	error = xfs_attr_intent_init(args, op_flag, &new);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 81be9b3e4004..3e81f3f48560 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -510,8 +510,8 @@ struct xfs_attr_intent {
 	struct xfs_da_args		*xattri_da_args;
 
 	/*
-	 * Shared buffer containing the attr name and value so that the logging
-	 * code can share large memory buffers between log items.
+	 * Shared buffer containing the attr name, new name, and value so that
+	 * the logging code can share large memory buffers between log items.
 	 */
 	struct xfs_attri_log_nameval	*xattri_nameval;
 
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index ffa3df5b2893..a4b29827603f 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -55,7 +55,9 @@ enum xfs_dacmp {
 typedef struct xfs_da_args {
 	struct xfs_da_geometry *geo;	/* da block geometry */
 	const uint8_t		*name;		/* string (maybe not NULL terminated) */
+	const uint8_t	*new_name;	/* new attr name */
 	int		namelen;	/* length of string (maybe no NULL) */
+	int		new_namelen;	/* new attr name len */
 	uint8_t		filetype;	/* filetype of inode for directories */
 	void		*value;		/* set of bytes (maybe contain NULLs) */
 	int		valuelen;	/* length of value */
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index f13e0809dc63..ae9c99762a24 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -117,7 +117,8 @@ struct xfs_unmount_log_format {
 #define XLOG_REG_TYPE_ATTRD_FORMAT	28
 #define XLOG_REG_TYPE_ATTR_NAME	29
 #define XLOG_REG_TYPE_ATTR_VALUE	30
-#define XLOG_REG_TYPE_MAX		30
+#define XLOG_REG_TYPE_ATTR_NNAME	31
+#define XLOG_REG_TYPE_MAX		31
 
 
 /*
@@ -957,6 +958,7 @@ struct xfs_icreate_log {
 #define XFS_ATTRI_OP_FLAGS_SET		1	/* Set the attribute */
 #define XFS_ATTRI_OP_FLAGS_REMOVE	2	/* Remove the attribute */
 #define XFS_ATTRI_OP_FLAGS_REPLACE	3	/* Replace the attribute */
+#define XFS_ATTRI_OP_FLAGS_NVREPLACE	4	/* Replace attr name and val */
 #define XFS_ATTRI_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
 
 /*
@@ -974,7 +976,7 @@ struct xfs_icreate_log {
 struct xfs_attri_log_format {
 	uint16_t	alfi_type;	/* attri log item type */
 	uint16_t	alfi_size;	/* size of this item */
-	uint32_t	__pad;		/* pad to 64 bit aligned */
+	uint32_t	alfi_nname_len;	/* attr new name length */
 	uint64_t	alfi_id;	/* attri identifier */
 	uint64_t	alfi_ino;	/* the inode for this attr operation */
 	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 2788a6f2edcd..95e9ecbb4a67 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -75,6 +75,8 @@ static inline struct xfs_attri_log_nameval *
 xfs_attri_log_nameval_alloc(
 	const void			*name,
 	unsigned int			name_len,
+	const void			*nname,
+	unsigned int			nname_len,
 	const void			*value,
 	unsigned int			value_len)
 {
@@ -85,15 +87,25 @@ xfs_attri_log_nameval_alloc(
 	 * this. But kvmalloc() utterly sucks, so we use our own version.
 	 */
 	nv = xlog_kvmalloc(sizeof(struct xfs_attri_log_nameval) +
-					name_len + value_len);
+					name_len + nname_len + value_len);
 
 	nv->name.i_addr = nv + 1;
 	nv->name.i_len = name_len;
 	nv->name.i_type = XLOG_REG_TYPE_ATTR_NAME;
 	memcpy(nv->name.i_addr, name, name_len);
 
+	if (nname_len) {
+		nv->nname.i_addr = nv->name.i_addr + name_len;
+		nv->nname.i_len = nname_len;
+		memcpy(nv->nname.i_addr, nname, nname_len);
+	} else {
+		nv->nname.i_addr = NULL;
+		nv->nname.i_len = 0;
+	}
+	nv->nname.i_type = XLOG_REG_TYPE_ATTR_NNAME;
+
 	if (value_len) {
-		nv->value.i_addr = nv->name.i_addr + name_len;
+		nv->value.i_addr = nv->name.i_addr + nname_len + name_len;
 		nv->value.i_len = value_len;
 		memcpy(nv->value.i_addr, value, value_len);
 	} else {
@@ -147,11 +159,15 @@ xfs_attri_item_size(
 	*nbytes += sizeof(struct xfs_attri_log_format) +
 			xlog_calc_iovec_len(nv->name.i_len);
 
-	if (!nv->value.i_len)
-		return;
+	if (nv->nname.i_len) {
+		*nvecs += 1;
+		*nbytes += xlog_calc_iovec_len(nv->nname.i_len);
+	}
 
-	*nvecs += 1;
-	*nbytes += xlog_calc_iovec_len(nv->value.i_len);
+	if (nv->value.i_len) {
+		*nvecs += 1;
+		*nbytes += xlog_calc_iovec_len(nv->value.i_len);
+	}
 }
 
 /*
@@ -181,6 +197,9 @@ xfs_attri_item_format(
 	ASSERT(nv->name.i_len > 0);
 	attrip->attri_format.alfi_size++;
 
+	if (nv->nname.i_len > 0)
+		attrip->attri_format.alfi_size++;
+
 	if (nv->value.i_len > 0)
 		attrip->attri_format.alfi_size++;
 
@@ -188,6 +207,10 @@ xfs_attri_item_format(
 			&attrip->attri_format,
 			sizeof(struct xfs_attri_log_format));
 	xlog_copy_from_iovec(lv, &vecp, &nv->name);
+
+	if (nv->nname.i_len > 0)
+		xlog_copy_from_iovec(lv, &vecp, &nv->nname);
+
 	if (nv->value.i_len > 0)
 		xlog_copy_from_iovec(lv, &vecp, &nv->value);
 }
@@ -374,6 +397,7 @@ xfs_attr_log_item(
 	attrp->alfi_op_flags = attr->xattri_op_flags;
 	attrp->alfi_value_len = attr->xattri_nameval->value.i_len;
 	attrp->alfi_name_len = attr->xattri_nameval->name.i_len;
+	attrp->alfi_nname_len = attr->xattri_nameval->nname.i_len;
 	ASSERT(!(attr->xattri_da_args->attr_filter & ~XFS_ATTRI_FILTER_MASK));
 	attrp->alfi_attr_filter = attr->xattri_da_args->attr_filter;
 }
@@ -415,7 +439,8 @@ xfs_attr_create_intent(
 		 * deferred work state structure.
 		 */
 		attr->xattri_nameval = xfs_attri_log_nameval_alloc(args->name,
-				args->namelen, args->value, args->valuelen);
+				args->namelen, args->new_name,
+				args->new_namelen, args->value, args->valuelen);
 	}
 
 	attrip = xfs_attri_init(mp, attr->xattri_nameval);
@@ -503,7 +528,8 @@ xfs_attri_validate(
 	unsigned int			op = attrp->alfi_op_flags &
 					     XFS_ATTRI_OP_FLAGS_TYPE_MASK;
 
-	if (attrp->__pad != 0)
+	if (attrp->alfi_op_flags != XFS_ATTRI_OP_FLAGS_NVREPLACE &&
+	    attrp->alfi_nname_len != 0)
 		return false;
 
 	if (attrp->alfi_op_flags & ~XFS_ATTRI_OP_FLAGS_TYPE_MASK)
@@ -517,6 +543,7 @@ xfs_attri_validate(
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
+	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
 		break;
 	default:
 		return false;
@@ -526,9 +553,14 @@ xfs_attri_validate(
 		return false;
 
 	if ((attrp->alfi_name_len > XATTR_NAME_MAX) ||
+	    (attrp->alfi_nname_len > XATTR_NAME_MAX) ||
 	    (attrp->alfi_name_len == 0))
 		return false;
 
+	if (op == XFS_ATTRI_OP_FLAGS_REMOVE &&
+	    attrp->alfi_value_len != 0)
+		return false;
+
 	return xfs_verify_ino(mp, attrp->alfi_ino);
 }
 
@@ -589,6 +621,8 @@ xfs_attri_item_recover(
 	args->whichfork = XFS_ATTR_FORK;
 	args->name = nv->name.i_addr;
 	args->namelen = nv->name.i_len;
+	args->new_name = nv->nname.i_addr;
+	args->new_namelen = nv->nname.i_len;
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 	args->attr_filter = attrp->alfi_attr_filter & XFS_ATTRI_FILTER_MASK;
 	args->op_flags = XFS_DA_OP_RECOVERY | XFS_DA_OP_OKNOENT |
@@ -599,6 +633,7 @@ xfs_attri_item_recover(
 	switch (attr->xattri_op_flags) {
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
+	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
 		args->value = nv->value.i_addr;
 		args->valuelen = nv->value.i_len;
 		args->total = xfs_attr_calc_size(args, &local);
@@ -688,6 +723,7 @@ xfs_attri_item_relog(
 	new_attrp->alfi_op_flags = old_attrp->alfi_op_flags;
 	new_attrp->alfi_value_len = old_attrp->alfi_value_len;
 	new_attrp->alfi_name_len = old_attrp->alfi_name_len;
+	new_attrp->alfi_nname_len = old_attrp->alfi_nname_len;
 	new_attrp->alfi_attr_filter = old_attrp->alfi_attr_filter;
 
 	xfs_trans_add_item(tp, &new_attrip->attri_item);
@@ -710,48 +746,102 @@ xlog_recover_attri_commit_pass2(
 	const void			*attr_value = NULL;
 	const void			*attr_name;
 	size_t				len;
-
-	attri_formatp = item->ri_buf[0].i_addr;
-	attr_name = item->ri_buf[1].i_addr;
+	const void			*attr_nname = NULL;
+	int				op, i = 0;
 
 	/* Validate xfs_attri_log_format before the large memory allocation */
 	len = sizeof(struct xfs_attri_log_format);
-	if (item->ri_buf[0].i_len != len) {
+	if (item->ri_buf[i].i_len != len) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-				item->ri_buf[0].i_addr, item->ri_buf[0].i_len);
+				item->ri_buf[i].i_addr, item->ri_buf[i].i_len);
 		return -EFSCORRUPTED;
 	}
 
+	attri_formatp = item->ri_buf[i].i_addr;
 	if (!xfs_attri_validate(mp, attri_formatp)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-				item->ri_buf[0].i_addr, item->ri_buf[0].i_len);
+				item->ri_buf[i].i_addr, item->ri_buf[i].i_len);
 		return -EFSCORRUPTED;
 	}
 
+	op = attri_formatp->alfi_op_flags & XFS_ATTRI_OP_FLAGS_TYPE_MASK;
+	switch (op) {
+	case XFS_ATTRI_OP_FLAGS_SET:
+	case XFS_ATTRI_OP_FLAGS_REPLACE:
+		if (item->ri_total != 3) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					     attri_formatp, len);
+			return -EFSCORRUPTED;
+		}
+		break;
+	case XFS_ATTRI_OP_FLAGS_REMOVE:
+		if (item->ri_total != 2) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					     attri_formatp, len);
+			return -EFSCORRUPTED;
+		}
+		break;
+	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
+		if (item->ri_total != 4) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					     attri_formatp, len);
+			return -EFSCORRUPTED;
+		}
+		break;
+	default:
+		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+				     attri_formatp, len);
+		return -EFSCORRUPTED;
+	}
+
+	i++;
 	/* Validate the attr name */
-	if (item->ri_buf[1].i_len !=
+	if (item->ri_buf[i].i_len !=
 			xlog_calc_iovec_len(attri_formatp->alfi_name_len)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-				item->ri_buf[0].i_addr, item->ri_buf[0].i_len);
+				attri_formatp, len);
 		return -EFSCORRUPTED;
 	}
 
+	attr_name = item->ri_buf[i].i_addr;
 	if (!xfs_attr_namecheck(attr_name, attri_formatp->alfi_name_len)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-				item->ri_buf[1].i_addr, item->ri_buf[1].i_len);
+				item->ri_buf[i].i_addr, item->ri_buf[i].i_len);
 		return -EFSCORRUPTED;
 	}
 
+	i++;
+	if (attri_formatp->alfi_nname_len) {
+		/* Validate the attr nname */
+		if (item->ri_buf[i].i_len !=
+		    xlog_calc_iovec_len(attri_formatp->alfi_nname_len)) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					item->ri_buf[i].i_addr,
+					item->ri_buf[i].i_len);
+			return -EFSCORRUPTED;
+		}
+
+		attr_nname = item->ri_buf[i].i_addr;
+		if (!xfs_attr_namecheck(attr_nname,
+				attri_formatp->alfi_nname_len)) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					item->ri_buf[i].i_addr,
+					item->ri_buf[i].i_len);
+			return -EFSCORRUPTED;
+		}
+		i++;
+	}
+
+
 	/* Validate the attr value, if present */
 	if (attri_formatp->alfi_value_len != 0) {
-		if (item->ri_buf[2].i_len != xlog_calc_iovec_len(attri_formatp->alfi_value_len)) {
+		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(attri_formatp->alfi_value_len)) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
-					item->ri_buf[0].i_addr,
-					item->ri_buf[0].i_len);
+					attri_formatp, len);
 			return -EFSCORRUPTED;
 		}
 
-		attr_value = item->ri_buf[2].i_addr;
+		attr_value = item->ri_buf[i].i_addr;
 	}
 
 	/*
@@ -760,7 +850,8 @@ xlog_recover_attri_commit_pass2(
 	 * reference.
 	 */
 	nv = xfs_attri_log_nameval_alloc(attr_name,
-			attri_formatp->alfi_name_len, attr_value,
+			attri_formatp->alfi_name_len, attr_nname,
+			attri_formatp->alfi_nname_len, attr_value,
 			attri_formatp->alfi_value_len);
 
 	attrip = xfs_attri_init(mp, nv);
diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
index 3280a7930287..24d4968dd6cc 100644
--- a/fs/xfs/xfs_attr_item.h
+++ b/fs/xfs/xfs_attr_item.h
@@ -13,6 +13,7 @@ struct kmem_zone;
 
 struct xfs_attri_log_nameval {
 	struct xfs_log_iovec	name;
+	struct xfs_log_iovec	nname;
 	struct xfs_log_iovec	value;
 	refcount_t		refcount;
 



* [PATCH 02/28] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
  2023-02-16 20:32   ` [PATCH 01/28] xfs: Add new name to attri/d Darrick J. Wong
@ 2023-02-16 20:33   ` Darrick J. Wong
  2023-02-16 20:33   ` [PATCH 03/28] xfs: Increase XFS_QM_TRANS_MAXDQS " Darrick J. Wong
                     ` (25 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:33 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, Catherine Hoang, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Renames that generate parent pointer updates can join up to 5
inodes locked in sorted order.  So we need to increase the
number of defer ops inodes and relock them in the same way.
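
The ordering requirement can be shown with a small userspace sketch.  The patch itself uses a bubble sort; the sketch below swaps in an insertion sort, which is just as adequate for at most five entries.  `demo_inode` and `demo_sort_by_ino` are hypothetical names, and the actual locking is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an inode; only the inode number matters here. */
struct demo_inode {
	uint64_t	ino;
};

/*
 * Sort an array of inode pointers by ascending inode number.  Taking locks
 * in this order every time means two tasks can never each hold a lock the
 * other needs next.
 */
static void demo_sort_by_ino(struct demo_inode **ips, size_t n)
{
	size_t	i, j;

	for (i = 1; i < n; i++) {
		struct demo_inode	*cur = ips[i];

		/* Shift larger entries right, then drop cur into the gap. */
		for (j = i; j > 0 && ips[j - 1]->ino > cur->ino; j--)
			ips[j] = ips[j - 1];
		ips[j] = cur;
	}
}
```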

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c |   28 ++++++++++++++++++++++++++--
 fs/xfs/libxfs/xfs_defer.h |    8 +++++++-
 fs/xfs/xfs_inode.c        |    2 +-
 fs/xfs/xfs_inode.h        |    1 +
 4 files changed, 35 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 5a321b783398..c0279b57e51d 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -820,13 +820,37 @@ xfs_defer_ops_continue(
 	struct xfs_trans		*tp,
 	struct xfs_defer_resources	*dres)
 {
-	unsigned int			i;
+	unsigned int			i, j;
+	struct xfs_inode		*sips[XFS_DEFER_OPS_NR_INODES];
+	struct xfs_inode		*temp;
 
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
 
 	/* Lock the captured resources to the new transaction. */
-	if (dfc->dfc_held.dr_inos == 2)
+	if (dfc->dfc_held.dr_inos > 2) {
+		/*
+		 * Renames with parent pointer updates can lock up to 5 inodes,
+		 * sorted by their inode number.  So we need to make sure they
+		 * are relocked in the same way.
+		 */
+		memset(sips, 0, sizeof(sips));
+		for (i = 0; i < dfc->dfc_held.dr_inos; i++)
+			sips[i] = dfc->dfc_held.dr_ip[i];
+
+		/* Bubble sort of at most 5 inodes */
+		for (i = 0; i < dfc->dfc_held.dr_inos; i++) {
+			for (j = 1; j < dfc->dfc_held.dr_inos; j++) {
+				if (sips[j]->i_ino < sips[j-1]->i_ino) {
+					temp = sips[j];
+					sips[j] = sips[j-1];
+					sips[j-1] = temp;
+				}
+			}
+		}
+
+		xfs_lock_inodes(sips, dfc->dfc_held.dr_inos, XFS_ILOCK_EXCL);
+	} else if (dfc->dfc_held.dr_inos == 2)
 		xfs_lock_two_inodes(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL,
 				    dfc->dfc_held.dr_ip[1], XFS_ILOCK_EXCL);
 	else if (dfc->dfc_held.dr_inos == 1)
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 114a3a4930a3..fdf6941f8f4d 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -70,7 +70,13 @@ extern const struct xfs_defer_op_type xfs_attr_defer_type;
 /*
  * Deferred operation item relogging limits.
  */
-#define XFS_DEFER_OPS_NR_INODES	2	/* join up to two inodes */
+
+/*
+ * Rename w/ parent pointers can require up to 5 inodes with deferred ops to
+ * be joined to the transaction: src_dp, target_dp, src_ip, target_ip, and wip.
+ * These inodes are locked in sorted order by their inode numbers
+ */
+#define XFS_DEFER_OPS_NR_INODES	5
 #define XFS_DEFER_OPS_NR_BUFS	2	/* join up to two buffers */
 
 /* Resources that must be held across a transaction roll. */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index d354ea2b74f9..27532053a67b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -447,7 +447,7 @@ xfs_lock_inumorder(
  * lock more than one at a time, lockdep will report false positives saying we
  * have violated locking orders.
  */
-static void
+void
 xfs_lock_inodes(
 	struct xfs_inode	**ips,
 	int			inodes,
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index fa780f08dc89..2eaed98af814 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -574,5 +574,6 @@ void xfs_end_io(struct work_struct *work);
 
 int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
 void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
+void xfs_lock_inodes(struct xfs_inode **ips, int inodes, uint lock_mode);
 
 #endif	/* __XFS_INODE_H__ */



* [PATCH 03/28] xfs: Increase XFS_QM_TRANS_MAXDQS to 5
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
  2023-02-16 20:32   ` [PATCH 01/28] xfs: Add new name to attri/d Darrick J. Wong
  2023-02-16 20:33   ` [PATCH 02/28] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5 Darrick J. Wong
@ 2023-02-16 20:33   ` Darrick J. Wong
  2023-02-16 20:33   ` [PATCH 04/28] xfs: Hold inode locks in xfs_ialloc Darrick J. Wong
                     ` (24 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:33 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

With parent pointers enabled, a rename operation can update up to 5
inodes: src_dp, target_dp, src_ip, target_ip and wip.  This causes
their dquots to be attached to the transaction chain, so we need
to increase XFS_QM_TRANS_MAXDQS.  This patch also adds a helper
function, xfs_dqlockn, to lock an arbitrary number of dquots.
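
For readers who want to see the locking idea in isolation, here is a
minimal userspace sketch of "sort by id, then lock in ascending order".
It is not the kernel code: the demo_* names, MAX_SLOTS and the use of
pthread mutexes are assumptions made for illustration only.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SLOTS	5	/* plays the role of XFS_QM_TRANS_MAXDQS */

/* Hypothetical stand-in for a dquot: an id plus the lock protecting it. */
struct demo_dquot {
	unsigned int	id;
	pthread_mutex_t	lock;
};

/* Hypothetical stand-in for struct xfs_dqtrx: one slot per attached dquot. */
struct demo_slot {
	struct demo_dquot	*dq;
};

static int
demo_slot_cmp(const void *a, const void *b)
{
	const struct demo_slot	*sa = a;
	const struct demo_slot	*sb = b;

	if (sa->dq->id > sb->dq->id)
		return 1;
	if (sa->dq->id < sb->dq->id)
		return -1;
	return 0;
}

/* Count used slots; the array ends at the first NULL slot or at MAX_SLOTS. */
static size_t
demo_count(const struct demo_slot *s)
{
	size_t	n = 0;

	while (n < MAX_SLOTS && s[n].dq != NULL)
		n++;
	return n;
}

/* Sort by id, then lock in ascending order.  Returns the number locked. */
static size_t
demo_lockn(struct demo_slot *s)
{
	size_t	n = demo_count(s);
	size_t	i, j;

	/* The same object must not appear twice, or we would self-deadlock. */
	for (i = 0; i < n; i++)
		for (j = 0; j < i; j++)
			assert(s[i].dq != s[j].dq);

	if (n == 0)
		return 0;

	qsort(s, n, sizeof(*s), demo_slot_cmp);
	for (i = 0; i < n; i++)
		pthread_mutex_lock(&s[i].dq->lock);
	return n;
}

/* Unlock in the reverse of the acquisition order. */
static void
demo_unlockn(struct demo_slot *s, size_t n)
{
	while (n-- > 0)
		pthread_mutex_unlock(&s[n].dq->lock);
}
```

Because every caller takes the locks in the same global order, two
threads locking overlapping sets cannot each hold a lock the other is
waiting for.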

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/xfs_dquot.c       |   38 ++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_dquot.h       |    1 +
 fs/xfs/xfs_qm.h          |    2 +-
 fs/xfs/xfs_trans_dquot.c |   15 ++++++++++-----
 4 files changed, 50 insertions(+), 6 deletions(-)


diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 8fb90da89787..9f311729c4c8 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1333,6 +1333,44 @@ xfs_dqlock2(
 	}
 }
 
+static int
+xfs_dqtrx_cmp(
+	const void		*a,
+	const void		*b)
+{
+	const struct xfs_dqtrx	*qa = a;
+	const struct xfs_dqtrx	*qb = b;
+
+	if (qa->qt_dquot->q_id > qb->qt_dquot->q_id)
+		return 1;
+	if (qa->qt_dquot->q_id < qb->qt_dquot->q_id)
+		return -1;
+	return 0;
+}
+
+void
+xfs_dqlockn(
+	struct xfs_dqtrx	*q)
+{
+	unsigned int		i;
+
+	/* Sort in order of dquot id, do not allow duplicates */
+	for (i = 0; i < XFS_QM_TRANS_MAXDQS && q[i].qt_dquot != NULL; i++) {
+		unsigned int	j;
+
+		for (j = 0; j < i; j++)
+			ASSERT(q[i].qt_dquot != q[j].qt_dquot);
+	}
+	if (i == 0)
+		return;
+
+	sort(q, i, sizeof(struct xfs_dqtrx), xfs_dqtrx_cmp, NULL);
+
+	mutex_lock(&q[0].qt_dquot->q_qlock);
+	for (i = 1; i < XFS_QM_TRANS_MAXDQS && q[i].qt_dquot != NULL; i++)
+		mutex_lock_nested(&q[i].qt_dquot->q_qlock, XFS_QLOCK_NESTED);
+}
+
 int __init
 xfs_qm_init(void)
 {
diff --git a/fs/xfs/xfs_dquot.h b/fs/xfs/xfs_dquot.h
index 80c8f851a2f3..dc7d0226242b 100644
--- a/fs/xfs/xfs_dquot.h
+++ b/fs/xfs/xfs_dquot.h
@@ -223,6 +223,7 @@ int		xfs_qm_dqget_uncached(struct xfs_mount *mp,
 void		xfs_qm_dqput(struct xfs_dquot *dqp);
 
 void		xfs_dqlock2(struct xfs_dquot *, struct xfs_dquot *);
+void		xfs_dqlockn(struct xfs_dqtrx *q);
 
 void		xfs_dquot_set_prealloc_limits(struct xfs_dquot *);
 
diff --git a/fs/xfs/xfs_qm.h b/fs/xfs/xfs_qm.h
index 9683f0457d19..c6ec88779356 100644
--- a/fs/xfs/xfs_qm.h
+++ b/fs/xfs/xfs_qm.h
@@ -120,7 +120,7 @@ enum {
 	XFS_QM_TRANS_PRJ,
 	XFS_QM_TRANS_DQTYPES
 };
-#define XFS_QM_TRANS_MAXDQS		2
+#define XFS_QM_TRANS_MAXDQS		5
 struct xfs_dquot_acct {
 	struct xfs_dqtrx	dqs[XFS_QM_TRANS_DQTYPES][XFS_QM_TRANS_MAXDQS];
 };
diff --git a/fs/xfs/xfs_trans_dquot.c b/fs/xfs/xfs_trans_dquot.c
index aa00cf67ad72..8a48175ea3a7 100644
--- a/fs/xfs/xfs_trans_dquot.c
+++ b/fs/xfs/xfs_trans_dquot.c
@@ -268,24 +268,29 @@ xfs_trans_mod_dquot(
 
 /*
  * Given an array of dqtrx structures, lock all the dquots associated and join
- * them to the transaction, provided they have been modified.  We know that the
- * highest number of dquots of one type - usr, grp and prj - involved in a
- * transaction is 3 so we don't need to make this very generic.
+ * them to the transaction, provided they have been modified.
  */
 STATIC void
 xfs_trans_dqlockedjoin(
 	struct xfs_trans	*tp,
 	struct xfs_dqtrx	*q)
 {
+	unsigned int		i;
 	ASSERT(q[0].qt_dquot != NULL);
 	if (q[1].qt_dquot == NULL) {
 		xfs_dqlock(q[0].qt_dquot);
 		xfs_trans_dqjoin(tp, q[0].qt_dquot);
-	} else {
-		ASSERT(XFS_QM_TRANS_MAXDQS == 2);
+	} else if (q[2].qt_dquot == NULL) {
 		xfs_dqlock2(q[0].qt_dquot, q[1].qt_dquot);
 		xfs_trans_dqjoin(tp, q[0].qt_dquot);
 		xfs_trans_dqjoin(tp, q[1].qt_dquot);
+	} else {
+		xfs_dqlockn(q);
+		for (i = 0; i < XFS_QM_TRANS_MAXDQS; i++) {
+			if (q[i].qt_dquot == NULL)
+				break;
+			xfs_trans_dqjoin(tp, q[i].qt_dquot);
+		}
 	}
 }
 



* [PATCH 04/28] xfs: Hold inode locks in xfs_ialloc
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 20:33   ` [PATCH 03/28] xfs: Increase XFS_QM_TRANS_MAXDQS " Darrick J. Wong
@ 2023-02-16 20:33   ` Darrick J. Wong
  2023-02-16 20:33   ` [PATCH 05/28] xfs: Hold inode locks in xfs_trans_alloc_dir Darrick J. Wong
                     ` (23 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:33 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, Catherine Hoang, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Modify xfs_ialloc to hold locks after return.  The caller will be
responsible for unlocking manually.  We will need this later to hold
locks across parent pointer operations.
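
As a rough illustration of the calling convention described above, the
following userspace sketch returns a new object locked and leaves the
unlock to the caller.  The demo_* names, the reference count and the
pthread mutex are assumptions for illustration, not XFS interfaces.
Note that the caller drops the lock before dropping its reference,
since releasing the last reference may free the object.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for an in-core inode. */
struct demo_inode {
	pthread_mutex_t	lock;
	int		refcount;
	int		*freed;	/* set to 1 when the object is freed */
};

/* Allocate an object and return it locked, with one reference held. */
static struct demo_inode *
demo_alloc_locked(int *freed)
{
	struct demo_inode	*ip = calloc(1, sizeof(*ip));

	if (!ip)
		return NULL;
	pthread_mutex_init(&ip->lock, NULL);
	ip->refcount = 1;
	ip->freed = freed;
	pthread_mutex_lock(&ip->lock);	/* still held when we return */
	return ip;
}

/* Drop a reference; the object goes away with the last one. */
static void
demo_release(struct demo_inode *ip)
{
	if (--ip->refcount == 0) {
		*ip->freed = 1;
		pthread_mutex_destroy(&ip->lock);
		free(ip);
	}
}

/* Caller-side cleanup: unlock first, then release the reference. */
static void
demo_unlock_and_release(struct demo_inode *ip)
{
	pthread_mutex_unlock(&ip->lock);
	demo_release(ip);
}
```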

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
---
 fs/xfs/xfs_inode.c   |    8 +++++++-
 fs/xfs/xfs_qm.c      |    4 +++-
 fs/xfs/xfs_symlink.c |    3 +++
 3 files changed, 13 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 27532053a67b..772e3f105b7b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -774,6 +774,8 @@ xfs_inode_inherit_flags2(
 /*
  * Initialise a newly allocated inode and return the in-core inode to the
  * caller locked exclusively.
+ *
+ * Caller is responsible for unlocking the inode manually upon return.
  */
 int
 xfs_init_new_inode(
@@ -899,7 +901,7 @@ xfs_init_new_inode(
 	/*
 	 * Log the new values stuffed into the inode.
 	 */
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, 0);
 	xfs_trans_log_inode(tp, ip, flags);
 
 	/* now that we have an i_mode we can setup the inode structure */
@@ -1076,6 +1078,7 @@ xfs_create(
 	xfs_qm_dqrele(pdqp);
 
 	*ipp = ip;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return 0;
 
  out_trans_cancel:
@@ -1089,6 +1092,7 @@ xfs_create(
 	if (ip) {
 		xfs_finish_inode_setup(ip);
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		xfs_irele(ip);
 	}
  out_release_dquots:
 	xfs_qm_dqrele(udqp);
@@ -1172,6 +1176,7 @@ xfs_create_tmpfile(
 	xfs_qm_dqrele(pdqp);
 
 	*ipp = ip;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return 0;
 
  out_trans_cancel:
@@ -1185,6 +1190,7 @@ xfs_create_tmpfile(
 	if (ip) {
 		xfs_finish_inode_setup(ip);
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		xfs_irele(ip);
 	}
  out_release_dquots:
 	xfs_qm_dqrele(udqp);
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index e2c542f6dcd4..fbecf54d3b44 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -826,8 +826,10 @@ xfs_qm_qino_alloc(
 		ASSERT(xfs_is_shutdown(mp));
 		xfs_alert(mp, "%s failed (error %d)!", __func__, error);
 	}
-	if (need_alloc)
+	if (need_alloc) {
 		xfs_finish_inode_setup(*ipp);
+		xfs_iunlock(*ipp, XFS_ILOCK_EXCL);
+	}
 	return error;
 }
 
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 8389f3ef88ef..d8e120913036 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -337,6 +337,7 @@ xfs_symlink(
 	xfs_qm_dqrele(pdqp);
 
 	*ipp = ip;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return 0;
 
 out_trans_cancel:
@@ -358,6 +359,8 @@ xfs_symlink(
 
 	if (unlock_dp_on_error)
 		xfs_iunlock(dp, XFS_ILOCK_EXCL);
+	if (ip)
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
 }
 



* [PATCH 05/28] xfs: Hold inode locks in xfs_trans_alloc_dir
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (3 preceding siblings ...)
  2023-02-16 20:33   ` [PATCH 04/28] xfs: Hold inode locks in xfs_ialloc Darrick J. Wong
@ 2023-02-16 20:33   ` Darrick J. Wong
  2023-02-16 20:34   ` [PATCH 06/28] xfs: Hold inode locks in xfs_rename Darrick J. Wong
                     ` (22 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:33 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, Catherine Hoang, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Modify xfs_trans_alloc_dir to hold locks after return.  The caller will
be responsible for unlocking manually.  We will need this later to hold
locks across parent pointer operations.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
---
 fs/xfs/xfs_inode.c |   14 ++++++++++++--
 fs/xfs/xfs_trans.c |    9 +++++++--
 2 files changed, 19 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 772e3f105b7b..e292688ee608 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1279,10 +1279,15 @@ xfs_link(
 	if (xfs_has_wsync(mp) || xfs_has_dirsync(mp))
 		xfs_trans_set_sync(tp);
 
-	return xfs_trans_commit(tp);
+	error = xfs_trans_commit(tp);
+	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
+	xfs_iunlock(sip, XFS_ILOCK_EXCL);
+	return error;
 
  error_return:
 	xfs_trans_cancel(tp);
+	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
+	xfs_iunlock(sip, XFS_ILOCK_EXCL);
  std_return:
 	if (error == -ENOSPC && nospace_error)
 		error = nospace_error;
@@ -2518,15 +2523,20 @@ xfs_remove(
 
 	error = xfs_trans_commit(tp);
 	if (error)
-		goto std_return;
+		goto out_unlock;
 
 	if (is_dir && xfs_inode_is_filestream(ip))
 		xfs_filestream_deassociate(ip);
 
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
 	return 0;
 
  out_trans_cancel:
 	xfs_trans_cancel(tp);
+ out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
  std_return:
 	return error;
 }
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 7bd16fbff534..43f4b0943f49 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1356,6 +1356,8 @@ xfs_trans_alloc_ichange(
  * The caller must ensure that the on-disk dquots attached to this inode have
  * already been allocated and initialized.  The ILOCKs will be dropped when the
  * transaction is committed or cancelled.
+ *
+ * Caller is responsible for unlocking the inodes manually upon return.
  */
 int
 xfs_trans_alloc_dir(
@@ -1386,8 +1388,8 @@ xfs_trans_alloc_dir(
 
 	xfs_lock_two_inodes(dp, XFS_ILOCK_EXCL, ip, XFS_ILOCK_EXCL);
 
-	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, dp, 0);
+	xfs_trans_ijoin(tp, ip, 0);
 
 	error = xfs_qm_dqattach_locked(dp, false);
 	if (error) {
@@ -1410,6 +1412,9 @@ xfs_trans_alloc_dir(
 	if (error == -EDQUOT || error == -ENOSPC) {
 		if (!retried) {
 			xfs_trans_cancel(tp);
+			xfs_iunlock(dp, XFS_ILOCK_EXCL);
+			if (dp != ip)
+				xfs_iunlock(ip, XFS_ILOCK_EXCL);
 			xfs_blockgc_free_quota(dp, 0);
 			retried = true;
 			goto retry;



* [PATCH 06/28] xfs: Hold inode locks in xfs_rename
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (4 preceding siblings ...)
  2023-02-16 20:33   ` [PATCH 05/28] xfs: Hold inode locks in xfs_trans_alloc_dir Darrick J. Wong
@ 2023-02-16 20:34   ` Darrick J. Wong
  2023-02-16 20:34   ` [PATCH 07/28] xfs: Expose init_xattrs in xfs_create_tmpfile Darrick J. Wong
                     ` (21 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:34 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, Catherine Hoang, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Modify xfs_rename to hold all inode locks across a rename operation.
We will need this later when we add parent pointers.
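
The unlock side of this can be sketched on its own.  Below is a small
userspace model of walking a sorted inode table backwards and unlocking
each distinct entry once, skipping NULL slots and adjacent duplicates.
The demo_* names and the integer "locked" flag are assumptions made
for illustration; the real code calls xfs_iunlock.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inode with a flag standing in for the ILOCK. */
struct demo_inode {
	int	locked;
};

/* Unlocking something that is not locked would be a bug. */
static void
demo_unlock(struct demo_inode *ip)
{
	assert(ip->locked == 1);
	ip->locked = 0;
}

/*
 * Unlock every distinct inode in a sorted table, last to first.
 * Returns the number of unlock calls made.
 */
static int
demo_unlock_sorted(struct demo_inode **tab, int n)
{
	int	i;
	int	unlocked = 0;

	for (i = n - 1; i >= 0; i--) {
		/* Skip empty slots and repeats of the previous entry. */
		if (!tab[i] || (i > 0 && tab[i] == tab[i - 1]))
			continue;
		demo_unlock(tab[i]);
		unlocked++;
	}
	return unlocked;
}
```

The duplicate check only compares neighbours, so it relies on the
table being sorted so that equal pointers end up adjacent.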

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
---
 fs/xfs/xfs_inode.c |   43 ++++++++++++++++++++++++++++++-------------
 1 file changed, 30 insertions(+), 13 deletions(-)


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e292688ee608..131abf84ea87 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2541,6 +2541,21 @@ xfs_remove(
 	return error;
 }
 
+static inline void
+xfs_iunlock_rename(
+	struct xfs_inode	**i_tab,
+	int			num_inodes)
+{
+	int			i;
+
+	for (i = num_inodes - 1; i >= 0; i--) {
+		/* Skip duplicate inodes if src and target dps are the same */
+		if (!i_tab[i] || (i > 0 && i_tab[i] == i_tab[i - 1]))
+			continue;
+		xfs_iunlock(i_tab[i], XFS_ILOCK_EXCL);
+	}
+}
+
 /*
  * Enter all inodes for a rename transaction into a sorted array.
  */
@@ -2839,18 +2854,16 @@ xfs_rename(
 	xfs_lock_inodes(inodes, num_inodes, XFS_ILOCK_EXCL);
 
 	/*
-	 * Join all the inodes to the transaction. From this point on,
-	 * we can rely on either trans_commit or trans_cancel to unlock
-	 * them.
+	 * Join all the inodes to the transaction.
 	 */
-	xfs_trans_ijoin(tp, src_dp, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, src_dp, 0);
 	if (new_parent)
-		xfs_trans_ijoin(tp, target_dp, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, src_ip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, target_dp, 0);
+	xfs_trans_ijoin(tp, src_ip, 0);
 	if (target_ip)
-		xfs_trans_ijoin(tp, target_ip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, target_ip, 0);
 	if (wip)
-		xfs_trans_ijoin(tp, wip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, wip, 0);
 
 	/*
 	 * If we are using project inheritance, we only allow renames
@@ -2864,10 +2877,12 @@ xfs_rename(
 	}
 
 	/* RENAME_EXCHANGE is unique from here on. */
-	if (flags & RENAME_EXCHANGE)
-		return xfs_cross_rename(tp, src_dp, src_name, src_ip,
+	if (flags & RENAME_EXCHANGE) {
+		error = xfs_cross_rename(tp, src_dp, src_name, src_ip,
 					target_dp, target_name, target_ip,
 					spaceres);
+		goto out_unlock;
+	}
 
 	/*
 	 * Try to reserve quota to handle an expansion of the target directory.
@@ -2881,6 +2896,7 @@ xfs_rename(
 		if (error == -EDQUOT || error == -ENOSPC) {
 			if (!retried) {
 				xfs_trans_cancel(tp);
+				xfs_iunlock_rename(inodes, num_inodes);
 				xfs_blockgc_free_quota(target_dp, 0);
 				retried = true;
 				goto retry;
@@ -3092,12 +3108,13 @@ xfs_rename(
 		xfs_trans_log_inode(tp, target_dp, XFS_ILOG_CORE);
 
 	error = xfs_finish_rename(tp);
-	if (wip)
-		xfs_irele(wip);
-	return error;
+
+	goto out_unlock;
 
 out_trans_cancel:
 	xfs_trans_cancel(tp);
+out_unlock:
+	xfs_iunlock_rename(inodes, num_inodes);
 out_release_wip:
 	if (wip)
 		xfs_irele(wip);



* [PATCH 07/28] xfs: Expose init_xattrs in xfs_create_tmpfile
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (5 preceding siblings ...)
  2023-02-16 20:34   ` [PATCH 06/28] xfs: Hold inode locks in xfs_rename Darrick J. Wong
@ 2023-02-16 20:34   ` Darrick J. Wong
  2023-02-16 20:34   ` [PATCH 08/28] xfs: get directory offset when adding directory name Darrick J. Wong
                     ` (20 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:34 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Tmp files are used as part of rename operations and will need attr forks
initialized for parent pointers.  Expose the init_xattrs parameter to
the calling function to initialize the fork.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c |    5 +++--
 fs/xfs/xfs_inode.h |    2 +-
 fs/xfs/xfs_iops.c  |    3 ++-
 3 files changed, 6 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 131abf84ea87..267d629a33d9 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1109,6 +1109,7 @@ xfs_create_tmpfile(
 	struct user_namespace	*mnt_userns,
 	struct xfs_inode	*dp,
 	umode_t			mode,
+	bool			init_xattrs,
 	struct xfs_inode	**ipp)
 {
 	struct xfs_mount	*mp = dp->i_mount;
@@ -1149,7 +1150,7 @@ xfs_create_tmpfile(
 	error = xfs_dialloc(&tp, dp->i_ino, mode, &ino);
 	if (!error)
 		error = xfs_init_new_inode(mnt_userns, tp, dp, ino, mode,
-				0, 0, prid, false, &ip);
+				0, 0, prid, init_xattrs, &ip);
 	if (error)
 		goto out_trans_cancel;
 
@@ -2750,7 +2751,7 @@ xfs_rename_alloc_whiteout(
 	int			error;
 
 	error = xfs_create_tmpfile(mnt_userns, dp, S_IFCHR | WHITEOUT_MODE,
-				   &tmpfile);
+				   false, &tmpfile);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 2eaed98af814..5735de32beeb 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -478,7 +478,7 @@ int		xfs_create(struct user_namespace *mnt_userns,
 			   umode_t mode, dev_t rdev, bool need_xattr,
 			   struct xfs_inode **ipp);
 int		xfs_create_tmpfile(struct user_namespace *mnt_userns,
-			   struct xfs_inode *dp, umode_t mode,
+			   struct xfs_inode *dp, umode_t mode, bool init_xattrs,
 			   struct xfs_inode **ipp);
 int		xfs_remove(struct xfs_inode *dp, struct xfs_name *name,
 			   struct xfs_inode *ip);
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index 515318dfbc38..45e66c961829 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -200,7 +200,8 @@ xfs_generic_create(
 				xfs_create_need_xattr(dir, default_acl, acl),
 				&ip);
 	} else {
-		error = xfs_create_tmpfile(mnt_userns, XFS_I(dir), mode, &ip);
+		error = xfs_create_tmpfile(mnt_userns, XFS_I(dir), mode, true,
+					   &ip);
 	}
 	if (unlikely(error))
 		goto out_free_acl;



* [PATCH 08/28] xfs: get directory offset when adding directory name
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (6 preceding siblings ...)
  2023-02-16 20:34   ` [PATCH 07/28] xfs: Expose init_xattrs in xfs_create_tmpfile Darrick J. Wong
@ 2023-02-16 20:34   ` Darrick J. Wong
  2023-02-16 20:35   ` [PATCH 09/28] xfs: get directory offset when removing " Darrick J. Wong
                     ` (19 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:34 UTC (permalink / raw)
  To: djwong
  Cc: Dave Chinner, Allison Henderson, Catherine Hoang,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Return the directory offset information when adding an entry to the
directory.

This offset will be used as the parent pointer offset in xfs_create,
xfs_symlink, xfs_link and xfs_rename.
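
For anyone unfamiliar with how a directory offset is packed into a
single 32-bit value, here is a simplified userspace model of the
conversion.  It assumes entries are 8-byte aligned, so the byte
position is shifted right by 3, and that the directory block size is a
power of two.  The demo_* names are illustrative stand-ins, not the
kernel helpers themselves.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DATA_ALIGN_LOG	3	/* assumed: entries are 8-byte aligned */

/* Hypothetical geometry: only the directory block size matters here. */
struct demo_geo {
	unsigned int	blklog;		/* log2 of the directory block size */
};

/* Byte position in the directory address space -> packed data pointer. */
static uint32_t
demo_byte_to_dataptr(uint64_t byte)
{
	return (uint32_t)(byte >> DEMO_DATA_ALIGN_LOG);
}

/* Packed data pointer -> byte position. */
static uint64_t
demo_dataptr_to_byte(uint32_t dp)
{
	return (uint64_t)dp << DEMO_DATA_ALIGN_LOG;
}

/* (directory block, offset within that block) -> packed data pointer. */
static uint32_t
demo_db_off_to_dataptr(const struct demo_geo *geo, uint32_t db, uint32_t off)
{
	return demo_byte_to_dataptr(((uint64_t)db << geo->blklog) + off);
}

/* Packed data pointer -> directory block number. */
static uint32_t
demo_dataptr_to_db(const struct demo_geo *geo, uint32_t dp)
{
	return (uint32_t)(demo_dataptr_to_byte(dp) >> geo->blklog);
}

/* Packed data pointer -> offset within the directory block. */
static uint32_t
demo_dataptr_to_off(const struct demo_geo *geo, uint32_t dp)
{
	uint64_t	mask = (1ULL << geo->blklog) - 1;

	return (uint32_t)(demo_dataptr_to_byte(dp) & mask);
}
```

With 4096-byte directory blocks, an entry at offset 64 in block 2 sits
at byte 8256, which packs to 1032 and unpacks back to (2, 64).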

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
---
 fs/xfs/libxfs/xfs_da_btree.h   |    1 +
 fs/xfs/libxfs/xfs_dir2.c       |    9 +++++++--
 fs/xfs/libxfs/xfs_dir2.h       |    2 +-
 fs/xfs/libxfs/xfs_dir2_block.c |    1 +
 fs/xfs/libxfs/xfs_dir2_leaf.c  |    2 ++
 fs/xfs/libxfs/xfs_dir2_node.c  |    2 ++
 fs/xfs/libxfs/xfs_dir2_sf.c    |    2 ++
 fs/xfs/xfs_inode.c             |    6 +++---
 fs/xfs/xfs_symlink.c           |    3 ++-
 9 files changed, 21 insertions(+), 7 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index a4b29827603f..90b86d00258f 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -81,6 +81,7 @@ typedef struct xfs_da_args {
 	int		rmtvaluelen2;	/* remote attr value length in bytes */
 	uint32_t	op_flags;	/* operation flags */
 	enum xfs_dacmp	cmpresult;	/* name compare result for lookups */
+	xfs_dir2_dataptr_t offset;	/* OUT: offset in directory */
 } xfs_da_args_t;
 
 /*
diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 92bac3373f1f..69a6561c22cc 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -257,7 +257,8 @@ xfs_dir_createname(
 	struct xfs_inode	*dp,
 	const struct xfs_name	*name,
 	xfs_ino_t		inum,		/* new entry inode number */
-	xfs_extlen_t		total)		/* bmap's total block count */
+	xfs_extlen_t		total,		/* bmap's total block count */
+	xfs_dir2_dataptr_t	*offset)	/* OUT entry's dir offset */
 {
 	struct xfs_da_args	*args;
 	int			rval;
@@ -312,6 +313,10 @@ xfs_dir_createname(
 		rval = xfs_dir2_node_addname(args);
 
 out_free:
+	/* return the location where this entry was placed in the parent */
+	if (offset)
+		*offset = args->offset;
+
 	kmem_free(args);
 	return rval;
 }
@@ -550,7 +555,7 @@ xfs_dir_canenter(
 	xfs_inode_t	*dp,
 	struct xfs_name	*name)		/* name of entry to add */
 {
-	return xfs_dir_createname(tp, dp, name, 0, 0);
+	return xfs_dir_createname(tp, dp, name, 0, 0, NULL);
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index dd39f17dd9a9..d96954478696 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -40,7 +40,7 @@ extern int xfs_dir_init(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_inode *pdp);
 extern int xfs_dir_createname(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t inum,
-				xfs_extlen_t tot);
+				xfs_extlen_t tot, xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t *inum,
 				struct xfs_name *ci_name);
diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
index 00f960a703b2..70aeab9d2a12 100644
--- a/fs/xfs/libxfs/xfs_dir2_block.c
+++ b/fs/xfs/libxfs/xfs_dir2_block.c
@@ -573,6 +573,7 @@ xfs_dir2_block_addname(
 	xfs_dir2_data_put_ftype(dp->i_mount, dep, args->filetype);
 	tagp = xfs_dir2_data_entry_tag_p(dp->i_mount, dep);
 	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
+	args->offset = xfs_dir2_byte_to_dataptr((char *)dep - (char *)hdr);
 	/*
 	 * Clean up the bestfree array and log the header, tail, and entry.
 	 */
diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index cb9e950a911d..9ab520b66547 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -870,6 +870,8 @@ xfs_dir2_leaf_addname(
 	xfs_dir2_data_put_ftype(dp->i_mount, dep, args->filetype);
 	tagp = xfs_dir2_data_entry_tag_p(dp->i_mount, dep);
 	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
+	args->offset = xfs_dir2_db_off_to_dataptr(args->geo, use_block,
+						(char *)dep - (char *)hdr);
 	/*
 	 * Need to scan fix up the bestfree table.
 	 */
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index 7a03aeb9f4c9..5a9513c036b8 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -1974,6 +1974,8 @@ xfs_dir2_node_addname_int(
 	xfs_dir2_data_put_ftype(dp->i_mount, dep, args->filetype);
 	tagp = xfs_dir2_data_entry_tag_p(dp->i_mount, dep);
 	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
+	args->offset = xfs_dir2_db_off_to_dataptr(args->geo, dbno,
+						  (char *)dep - (char *)hdr);
 	xfs_dir2_data_log_entry(args, dbp, dep);
 
 	/* Rescan the freespace and log the data block if needed. */
diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
index 8cd37e6e9d38..44bc4ba3da8a 100644
--- a/fs/xfs/libxfs/xfs_dir2_sf.c
+++ b/fs/xfs/libxfs/xfs_dir2_sf.c
@@ -485,6 +485,7 @@ xfs_dir2_sf_addname_easy(
 	memcpy(sfep->name, args->name, sfep->namelen);
 	xfs_dir2_sf_put_ino(mp, sfp, sfep, args->inumber);
 	xfs_dir2_sf_put_ftype(mp, sfep, args->filetype);
+	args->offset = xfs_dir2_byte_to_dataptr(offset);
 
 	/*
 	 * Update the header and inode.
@@ -575,6 +576,7 @@ xfs_dir2_sf_addname_hard(
 	memcpy(sfep->name, args->name, sfep->namelen);
 	xfs_dir2_sf_put_ino(mp, sfp, sfep, args->inumber);
 	xfs_dir2_sf_put_ftype(mp, sfep, args->filetype);
+	args->offset = xfs_dir2_byte_to_dataptr(offset);
 	sfp->count++;
 	if (args->inumber > XFS_DIR2_MAX_SHORT_INUM && !objchange)
 		sfp->i8count++;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 267d629a33d9..143de4202cf4 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1038,7 +1038,7 @@ xfs_create(
 	unlock_dp_on_error = false;
 
 	error = xfs_dir_createname(tp, dp, name, ip->i_ino,
-					resblks - XFS_IALLOC_SPACE_RES(mp));
+				   resblks - XFS_IALLOC_SPACE_RES(mp), NULL);
 	if (error) {
 		ASSERT(error != -ENOSPC);
 		goto out_trans_cancel;
@@ -1264,7 +1264,7 @@ xfs_link(
 	}
 
 	error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino,
-				   resblks);
+				   resblks, NULL);
 	if (error)
 		goto error_return;
 	xfs_trans_ichgtime(tp, tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
@@ -3001,7 +3001,7 @@ xfs_rename(
 		 * to account for the ".." reference from the new entry.
 		 */
 		error = xfs_dir_createname(tp, target_dp, target_name,
-					   src_ip->i_ino, spaceres);
+					   src_ip->i_ino, spaceres, NULL);
 		if (error)
 			goto out_trans_cancel;
 
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index d8e120913036..27a7d7c57015 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -314,7 +314,8 @@ xfs_symlink(
 	/*
 	 * Create the directory entry for the symlink.
 	 */
-	error = xfs_dir_createname(tp, dp, link_name, ip->i_ino, resblks);
+	error = xfs_dir_createname(tp, dp, link_name,
+			ip->i_ino, resblks, NULL);
 	if (error)
 		goto out_trans_cancel;
 	xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);



* [PATCH 09/28] xfs: get directory offset when removing directory name
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (7 preceding siblings ...)
  2023-02-16 20:34   ` [PATCH 08/28] xfs: get directory offset when adding directory name Darrick J. Wong
@ 2023-02-16 20:35   ` Darrick J. Wong
  2023-02-16 20:35   ` [PATCH 10/28] xfs: get directory offset when replacing a " Darrick J. Wong
                     ` (18 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:35 UTC (permalink / raw)
  To: djwong
  Cc: Mark Tinguely, Dave Chinner, Allison Henderson, Catherine Hoang,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Return the directory offset information when removing an entry from the
directory.

This offset will be used as the parent pointer offset in xfs_remove.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
---
 fs/xfs/libxfs/xfs_dir2.c       |    6 +++++-
 fs/xfs/libxfs/xfs_dir2.h       |    3 ++-
 fs/xfs/libxfs/xfs_dir2_block.c |    4 ++--
 fs/xfs/libxfs/xfs_dir2_leaf.c  |    5 +++--
 fs/xfs/libxfs/xfs_dir2_node.c  |    5 +++--
 fs/xfs/libxfs/xfs_dir2_sf.c    |    2 ++
 fs/xfs/xfs_inode.c             |    4 ++--
 7 files changed, 19 insertions(+), 10 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 69a6561c22cc..891c1f701f53 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -436,7 +436,8 @@ xfs_dir_removename(
 	struct xfs_inode	*dp,
 	struct xfs_name		*name,
 	xfs_ino_t		ino,
-	xfs_extlen_t		total)		/* bmap's total block count */
+	xfs_extlen_t		total,		/* bmap's total block count */
+	xfs_dir2_dataptr_t	*offset)	/* OUT: offset in directory */
 {
 	struct xfs_da_args	*args;
 	int			rval;
@@ -481,6 +482,9 @@ xfs_dir_removename(
 	else
 		rval = xfs_dir2_node_removename(args);
 out_free:
+	if (offset)
+		*offset = args->offset;
+
 	kmem_free(args);
 	return rval;
 }
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index d96954478696..0c2d7c0af78f 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -46,7 +46,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *ci_name);
 extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t ino,
-				xfs_extlen_t tot);
+				xfs_extlen_t tot,
+				xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t inum,
 				xfs_extlen_t tot);
diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
index 70aeab9d2a12..d36f3f1491da 100644
--- a/fs/xfs/libxfs/xfs_dir2_block.c
+++ b/fs/xfs/libxfs/xfs_dir2_block.c
@@ -810,9 +810,9 @@ xfs_dir2_block_removename(
 	/*
 	 * Point to the data entry using the leaf entry.
 	 */
+	args->offset = be32_to_cpu(blp[ent].address);
 	dep = (xfs_dir2_data_entry_t *)((char *)hdr +
-			xfs_dir2_dataptr_to_off(args->geo,
-						be32_to_cpu(blp[ent].address)));
+			xfs_dir2_dataptr_to_off(args->geo, args->offset));
 	/*
 	 * Mark the data entry's space free.
 	 */
diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index 9ab520b66547..b4a066259d97 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -1386,9 +1386,10 @@ xfs_dir2_leaf_removename(
 	 * Point to the leaf entry, use that to point to the data entry.
 	 */
 	lep = &leafhdr.ents[index];
-	db = xfs_dir2_dataptr_to_db(geo, be32_to_cpu(lep->address));
+	args->offset = be32_to_cpu(lep->address);
+	db = xfs_dir2_dataptr_to_db(args->geo, args->offset);
 	dep = (xfs_dir2_data_entry_t *)((char *)hdr +
-		xfs_dir2_dataptr_to_off(geo, be32_to_cpu(lep->address)));
+		xfs_dir2_dataptr_to_off(args->geo, args->offset));
 	needscan = needlog = 0;
 	oldbest = be16_to_cpu(bf[0].length);
 	ltp = xfs_dir2_leaf_tail_p(geo, leaf);
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index 5a9513c036b8..39cbdeafa0f6 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -1296,9 +1296,10 @@ xfs_dir2_leafn_remove(
 	/*
 	 * Extract the data block and offset from the entry.
 	 */
-	db = xfs_dir2_dataptr_to_db(geo, be32_to_cpu(lep->address));
+	args->offset = be32_to_cpu(lep->address);
+	db = xfs_dir2_dataptr_to_db(args->geo, args->offset);
 	ASSERT(dblk->blkno == db);
-	off = xfs_dir2_dataptr_to_off(geo, be32_to_cpu(lep->address));
+	off = xfs_dir2_dataptr_to_off(args->geo, args->offset);
 	ASSERT(dblk->index == off);
 
 	/*
diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
index 44bc4ba3da8a..b49578a547b3 100644
--- a/fs/xfs/libxfs/xfs_dir2_sf.c
+++ b/fs/xfs/libxfs/xfs_dir2_sf.c
@@ -969,6 +969,8 @@ xfs_dir2_sf_removename(
 								XFS_CMP_EXACT) {
 			ASSERT(xfs_dir2_sf_get_ino(mp, sfp, sfep) ==
 			       args->inumber);
+			args->offset = xfs_dir2_byte_to_dataptr(
+						xfs_dir2_sf_get_offset(sfep));
 			break;
 		}
 	}
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 143de4202cf4..e5ed8bdef9fe 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2508,7 +2508,7 @@ xfs_remove(
 	if (error)
 		goto out_trans_cancel;
 
-	error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks);
+	error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks, NULL);
 	if (error) {
 		ASSERT(error != -ENOENT);
 		goto out_trans_cancel;
@@ -3098,7 +3098,7 @@ xfs_rename(
 					spaceres);
 	else
 		error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
-					   spaceres);
+					   spaceres, NULL);
 
 	if (error)
 		goto out_trans_cancel;



* [PATCH 10/28] xfs: get directory offset when replacing a directory name
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (8 preceding siblings ...)
  2023-02-16 20:35   ` [PATCH 09/28] xfs: get directory offset when removing " Darrick J. Wong
@ 2023-02-16 20:35   ` Darrick J. Wong
  2023-02-16 20:35   ` [PATCH 11/28] xfs: add parent pointer support to attribute code Darrick J. Wong
                     ` (17 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:35 UTC (permalink / raw)
  To: djwong
  Cc: Mark Tinguely, Dave Chinner, Allison Henderson,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Return the directory offset information when replacing an entry in the
directory.

This offset will be used as the parent pointer offset in xfs_rename.
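
The calling convention above (an optional OUT pointer that callers may
pass as NULL) can be sketched in userspace as follows.  This is only an
illustration: struct toy_dirent and toy_dir_replace() are hypothetical
names, not kernel code, and the sketch chooses to write the offset only
when the replace succeeds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a directory entry and its offset cookie. */
struct toy_dirent {
	const char	*name;
	uint64_t	inumber;
	uint32_t	offset;		/* where the entry lives in the dir */
};

/*
 * Replace the inode number of the entry called @name.  If @offset is not
 * NULL and the replace succeeds, report where the entry lives.
 * Returns 0 on success or -1 if the name is not found.
 */
static int
toy_dir_replace(
	struct toy_dirent	*ents,
	size_t			nents,
	const char		*name,
	uint64_t		inum,
	uint32_t		*offset)	/* OUT: offset in directory */
{
	size_t			i;

	assert(ents != NULL && name != NULL);

	for (i = 0; i < nents; i++) {
		if (strcmp(ents[i].name, name) != 0)
			continue;
		ents[i].inumber = inum;
		if (offset)
			*offset = ents[i].offset;
		return 0;
	}
	return -1;
}
```

Callers that do not care about the offset pass NULL, which is what every
converted call site in this patch does for now.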

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_dir2.c       |    8 ++++++--
 fs/xfs/libxfs/xfs_dir2.h       |    2 +-
 fs/xfs/libxfs/xfs_dir2_block.c |    4 ++--
 fs/xfs/libxfs/xfs_dir2_leaf.c  |    1 +
 fs/xfs/libxfs/xfs_dir2_node.c  |    1 +
 fs/xfs/libxfs/xfs_dir2_sf.c    |    2 ++
 fs/xfs/xfs_inode.c             |   16 ++++++++--------
 7 files changed, 21 insertions(+), 13 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 891c1f701f53..c1a9394d7478 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -482,7 +482,7 @@ xfs_dir_removename(
 	else
 		rval = xfs_dir2_node_removename(args);
 out_free:
-	if (offset)
+	if (!rval && offset)
 		*offset = args->offset;
 
 	kmem_free(args);
@@ -498,7 +498,8 @@ xfs_dir_replace(
 	struct xfs_inode	*dp,
 	const struct xfs_name	*name,		/* name of entry to replace */
 	xfs_ino_t		inum,		/* new inode number */
-	xfs_extlen_t		total)		/* bmap's total block count */
+	xfs_extlen_t		total,		/* bmap's total block count */
+	xfs_dir2_dataptr_t	*offset)	/* OUT: offset in directory */
 {
 	struct xfs_da_args	*args;
 	int			rval;
@@ -546,6 +547,9 @@ xfs_dir_replace(
 	else
 		rval = xfs_dir2_node_replace(args);
 out_free:
+	if (offset)
+		*offset = args->offset;
+
 	kmem_free(args);
 	return rval;
 }
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index 0c2d7c0af78f..ff59f009d1fd 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -50,7 +50,7 @@ extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
 				xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t inum,
-				xfs_extlen_t tot);
+				xfs_extlen_t tot, xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_canenter(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name);
 
diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
index d36f3f1491da..0f3a03e87278 100644
--- a/fs/xfs/libxfs/xfs_dir2_block.c
+++ b/fs/xfs/libxfs/xfs_dir2_block.c
@@ -885,9 +885,9 @@ xfs_dir2_block_replace(
 	/*
 	 * Point to the data entry we need to change.
 	 */
+	args->offset = be32_to_cpu(blp[ent].address);
 	dep = (xfs_dir2_data_entry_t *)((char *)hdr +
-			xfs_dir2_dataptr_to_off(args->geo,
-						be32_to_cpu(blp[ent].address)));
+			xfs_dir2_dataptr_to_off(args->geo, args->offset));
 	ASSERT(be64_to_cpu(dep->inumber) != args->inumber);
 	/*
 	 * Change the inode number to the new value.
diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index b4a066259d97..fe75ffadace9 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -1523,6 +1523,7 @@ xfs_dir2_leaf_replace(
 	/*
 	 * Point to the data entry.
 	 */
+	args->offset = be32_to_cpu(lep->address);
 	dep = (xfs_dir2_data_entry_t *)
 	      ((char *)dbp->b_addr +
 	       xfs_dir2_dataptr_to_off(args->geo, be32_to_cpu(lep->address)));
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index 39cbdeafa0f6..53cd0d5d94f7 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -2242,6 +2242,7 @@ xfs_dir2_node_replace(
 		hdr = state->extrablk.bp->b_addr;
 		ASSERT(hdr->magic == cpu_to_be32(XFS_DIR2_DATA_MAGIC) ||
 		       hdr->magic == cpu_to_be32(XFS_DIR3_DATA_MAGIC));
+		args->offset = be32_to_cpu(leafhdr.ents[blk->index].address);
 		dep = (xfs_dir2_data_entry_t *)
 		      ((char *)hdr +
 		       xfs_dir2_dataptr_to_off(args->geo,
diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
index b49578a547b3..032c65804610 100644
--- a/fs/xfs/libxfs/xfs_dir2_sf.c
+++ b/fs/xfs/libxfs/xfs_dir2_sf.c
@@ -1107,6 +1107,8 @@ xfs_dir2_sf_replace(
 				xfs_dir2_sf_put_ino(mp, sfp, sfep,
 						args->inumber);
 				xfs_dir2_sf_put_ftype(mp, sfep, args->filetype);
+				args->offset = xfs_dir2_byte_to_dataptr(
+						  xfs_dir2_sf_get_offset(sfep));
 				break;
 			}
 		}
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index e5ed8bdef9fe..a896ee4c9680 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2489,7 +2489,7 @@ xfs_remove(
 		 */
 		if (dp->i_ino != tp->t_mountp->m_sb.sb_rootino) {
 			error = xfs_dir_replace(tp, ip, &xfs_name_dotdot,
-					tp->t_mountp->m_sb.sb_rootino, 0);
+					tp->t_mountp->m_sb.sb_rootino, 0, NULL);
 			if (error)
 				goto out_trans_cancel;
 		}
@@ -2644,12 +2644,12 @@ xfs_cross_rename(
 	int		dp2_flags = 0;
 
 	/* Swap inode number for dirent in first parent */
-	error = xfs_dir_replace(tp, dp1, name1, ip2->i_ino, spaceres);
+	error = xfs_dir_replace(tp, dp1, name1, ip2->i_ino, spaceres, NULL);
 	if (error)
 		goto out_trans_abort;
 
 	/* Swap inode number for dirent in second parent */
-	error = xfs_dir_replace(tp, dp2, name2, ip1->i_ino, spaceres);
+	error = xfs_dir_replace(tp, dp2, name2, ip1->i_ino, spaceres, NULL);
 	if (error)
 		goto out_trans_abort;
 
@@ -2663,7 +2663,7 @@ xfs_cross_rename(
 
 		if (S_ISDIR(VFS_I(ip2)->i_mode)) {
 			error = xfs_dir_replace(tp, ip2, &xfs_name_dotdot,
-						dp1->i_ino, spaceres);
+						dp1->i_ino, spaceres, NULL);
 			if (error)
 				goto out_trans_abort;
 
@@ -2687,7 +2687,7 @@ xfs_cross_rename(
 
 		if (S_ISDIR(VFS_I(ip1)->i_mode)) {
 			error = xfs_dir_replace(tp, ip1, &xfs_name_dotdot,
-						dp2->i_ino, spaceres);
+						dp2->i_ino, spaceres, NULL);
 			if (error)
 				goto out_trans_abort;
 
@@ -3022,7 +3022,7 @@ xfs_rename(
 		 * name at the destination directory, remove it first.
 		 */
 		error = xfs_dir_replace(tp, target_dp, target_name,
-					src_ip->i_ino, spaceres);
+					src_ip->i_ino, spaceres, NULL);
 		if (error)
 			goto out_trans_cancel;
 
@@ -3056,7 +3056,7 @@ xfs_rename(
 		 * directory.
 		 */
 		error = xfs_dir_replace(tp, src_ip, &xfs_name_dotdot,
-					target_dp->i_ino, spaceres);
+					target_dp->i_ino, spaceres, NULL);
 		ASSERT(error != -EEXIST);
 		if (error)
 			goto out_trans_cancel;
@@ -3095,7 +3095,7 @@ xfs_rename(
 	 */
 	if (wip)
 		error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino,
-					spaceres);
+					spaceres, NULL);
 	else
 		error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
 					   spaceres, NULL);



* [PATCH 11/28] xfs: add parent pointer support to attribute code
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (9 preceding siblings ...)
  2023-02-16 20:35   ` [PATCH 10/28] xfs: get directory offset when replacing a " Darrick J. Wong
@ 2023-02-16 20:35   ` Darrick J. Wong
  2023-02-16 20:35   ` [PATCH 12/28] xfs: define parent pointer xattr format Darrick J. Wong
                     ` (16 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:35 UTC (permalink / raw)
  To: djwong
  Cc: Mark Tinguely, Dave Chinner, Allison Henderson,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Add the new parent attribute type.  XFS_ATTR_PARENT is used only for
parent pointer entries; it uses reserved blocks like XFS_ATTR_ROOT.
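
As a rough userspace sketch of how the new flag bit combines with the
existing ones: the bit positions below mirror the ones used in this
patch, but the TOY_ names and both helper functions are made up for
illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions mirror the on-disk attr flag bits used in this patch. */
#define TOY_ATTR_LOCAL		(1u << 0)	/* attr is stored locally */
#define TOY_ATTR_ROOT		(1u << 1)	/* trusted namespace */
#define TOY_ATTR_SECURE		(1u << 2)	/* secure namespace */
#define TOY_ATTR_PARENT		(1u << 3)	/* parent pointer attrs */
#define TOY_ATTR_INCOMPLETE	(1u << 7)	/* mid create/delete */

/* Namespace bits that select which attr namespace an entry lives in. */
#define TOY_ATTR_NSP_ONDISK_MASK \
	(TOY_ATTR_ROOT | TOY_ATTR_SECURE | TOY_ATTR_PARENT)

static_assert(TOY_ATTR_NSP_ONDISK_MASK == 0x0e,
	      "namespace mask covers bits 1-3");

/* Root and parent pointer attrs may dip into the reserved block pool. */
static bool
toy_attr_uses_reserved_blocks(
	uint32_t	attr_filter)
{
	return (attr_filter & (TOY_ATTR_ROOT | TOY_ATTR_PARENT)) != 0;
}

/* Any bit outside the known set marks the entry as suspect. */
static bool
toy_attr_flags_valid(
	uint8_t		flags)
{
	uint8_t		badflags = (uint8_t)~(TOY_ATTR_LOCAL | TOY_ATTR_ROOT |
					      TOY_ATTR_SECURE |
					      TOY_ATTR_INCOMPLETE |
					      TOY_ATTR_PARENT);

	return (flags & badflags) == 0;
}
```

The second helper follows the same shape as the scrub check touched by
this patch: a flag set is accepted only if it contains known bits.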

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c       |    4 +++-
 fs/xfs/libxfs/xfs_da_format.h  |    5 ++++-
 fs/xfs/libxfs/xfs_log_format.h |    1 +
 fs/xfs/scrub/attr.c            |    2 +-
 4 files changed, 9 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index b1dbed7655e8..101823772bf9 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -976,11 +976,13 @@ xfs_attr_set(
 	struct xfs_inode	*dp = args->dp;
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_trans_res	tres;
-	bool			rsvd = (args->attr_filter & XFS_ATTR_ROOT);
+	bool			rsvd;
 	int			error, local;
 	int			rmt_blks = 0;
 	unsigned int		total;
 
+	rsvd = (args->attr_filter & (XFS_ATTR_ROOT | XFS_ATTR_PARENT)) != 0;
+
 	if (xfs_is_shutdown(dp->i_mount))
 		return -EIO;
 
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 25e2841084e1..3dc03968bba6 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -688,12 +688,15 @@ struct xfs_attr3_leafblock {
 #define	XFS_ATTR_LOCAL_BIT	0	/* attr is stored locally */
 #define	XFS_ATTR_ROOT_BIT	1	/* limit access to trusted attrs */
 #define	XFS_ATTR_SECURE_BIT	2	/* limit access to secure attrs */
+#define	XFS_ATTR_PARENT_BIT	3	/* parent pointer attrs */
 #define	XFS_ATTR_INCOMPLETE_BIT	7	/* attr in middle of create/delete */
 #define XFS_ATTR_LOCAL		(1u << XFS_ATTR_LOCAL_BIT)
 #define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
 #define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
+#define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
 #define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
-#define XFS_ATTR_NSP_ONDISK_MASK	(XFS_ATTR_ROOT | XFS_ATTR_SECURE)
+#define XFS_ATTR_NSP_ONDISK_MASK \
+			(XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)
 
 /*
  * Alignment for namelist and valuelist entries (since they are mixed
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index ae9c99762a24..727b5a858028 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -967,6 +967,7 @@ struct xfs_icreate_log {
  */
 #define XFS_ATTRI_FILTER_MASK		(XFS_ATTR_ROOT | \
 					 XFS_ATTR_SECURE | \
+					 XFS_ATTR_PARENT | \
 					 XFS_ATTR_INCOMPLETE)
 
 /*
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 31529b9bf389..9d2e33743ecd 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -441,7 +441,7 @@ xchk_xattr_rec(
 	/* Retrieve the entry and check it. */
 	hash = be32_to_cpu(ent->hashval);
 	badflags = ~(XFS_ATTR_LOCAL | XFS_ATTR_ROOT | XFS_ATTR_SECURE |
-			XFS_ATTR_INCOMPLETE);
+			XFS_ATTR_INCOMPLETE | XFS_ATTR_PARENT);
 	if ((ent->flags & badflags) != 0)
 		xchk_da_set_corrupt(ds, level);
 	if (ent->flags & XFS_ATTR_LOCAL) {



* [PATCH 12/28] xfs: define parent pointer xattr format
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (10 preceding siblings ...)
  2023-02-16 20:35   ` [PATCH 11/28] xfs: add parent pointer support to attribute code Darrick J. Wong
@ 2023-02-16 20:35   ` Darrick J. Wong
  2023-02-16 20:36   ` [PATCH 13/28] xfs: Add xfs_verify_pptr Darrick J. Wong
                     ` (15 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:35 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

We need to define the parent pointer attribute format before we start
adding support for it into all the code that needs to use it. The EA
format we will use encodes the following information:

        name={parent inode #, parent inode generation, dirent offset}
        value={dirent filename}

The inode/gen gives all the information we need to reliably identify the
parent without requiring child->parent lock ordering, and allows
userspace to do pathname component level reconstruction without the
kernel ever needing to verify the parent itself as part of ioctl calls.

By using the dirent offset in the EA name, we have a method of knowing
the exact parent pointer EA we need to modify/remove in rename/unlink
without an unbounded EA name search.

By keeping the dirent name in the value, we have enough information to
be able to validate and reconstruct damaged directory trees. While the
diroffset of a filename alone is not unique enough to identify the
child, the {diroffset,filename,child_inode} tuple is sufficient. That
is, if the diroffset gets reused and points to a different filename, we
can detect that from the contents of the EA. If a link of the same name
is created, then we can check whether it points at the same inode as the
parent EA we currently have.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_format.h |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 3dc03968bba6..b02b67f1999e 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -805,4 +805,29 @@ static inline unsigned int xfs_dir2_dirblock_bytes(struct xfs_sb *sbp)
 xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
 				      struct xfs_da3_blkinfo *hdr3);
 
+/*
+ * Parent pointer attribute format definition
+ *
+ * EA name encodes the parent inode number, generation and the offset of
+ * the dirent that points to the child inode. The EA value contains the
+ * same name as the dirent in the parent directory.
+ */
+struct xfs_parent_name_rec {
+	__be64  p_ino;
+	__be32  p_gen;
+	__be32  p_diroffset;
+};
+
+/*
+ * incore version of the above, also contains name pointers so callers
+ * can pass/obtain all the parent pointer information in a single structure
+ */
+struct xfs_parent_name_irec {
+	xfs_ino_t		p_ino;
+	uint32_t		p_gen;
+	xfs_dir2_dataptr_t	p_diroffset;
+	const char		*p_name;
+	uint8_t			p_namelen;
+};
+
 #endif /* __XFS_DA_FORMAT_H__ */



* [PATCH 13/28] xfs: Add xfs_verify_pptr
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (11 preceding siblings ...)
  2023-02-16 20:35   ` [PATCH 12/28] xfs: define parent pointer xattr format Darrick J. Wong
@ 2023-02-16 20:36   ` Darrick J. Wong
  2023-02-16 20:36   ` [PATCH 14/28] xfs: extend transaction reservations for parent attributes Darrick J. Wong
                     ` (14 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:36 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Attribute names of parent pointers are not strings, so we need to modify
xfs_attr_namecheck to verify parent pointer records when the
XFS_ATTR_PARENT flag is set.
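
The dispatch described above can be sketched in userspace like this.
Everything here is an assumption made for illustration: the toy_ names,
the 16-byte record length, the 256-byte string limit, and the simplified
inode check (nonzero and below a made-up maximum) stand in for whatever
the real verifiers do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_ATTR_PARENT		(1u << 3)
#define TOY_PPTR_NAME_LEN	16		/* ino + gen + diroffset */
#define TOY_MAXNAMELEN		256		/* assumed string name limit */
#define TOY_MAX_INO		((uint64_t)1 << 40)	/* made-up bound */

/* Binary record check: the big-endian inode number must look sane. */
static bool
toy_verify_pptr(const uint8_t *rec)
{
	uint64_t	ino = 0;
	int		i;

	for (i = 0; i < 8; i++)
		ino = (ino << 8) | rec[i];

	return ino != 0 && ino <= TOY_MAX_INO;
}

/* String check: bounded length and no embedded NUL bytes. */
static bool
toy_str_namecheck(const void *name, size_t length)
{
	if (length >= TOY_MAXNAMELEN)
		return false;
	return !memchr(name, 0, length);
}

/* Pick the verifier based on the namespace flag of the attr entry. */
static bool
toy_attr_namecheck(const void *name, size_t length, unsigned int flags)
{
	assert(name != NULL);

	if (flags & TOY_ATTR_PARENT) {
		if (length != TOY_PPTR_NAME_LEN)
			return false;
		return toy_verify_pptr(name);
	}

	return toy_str_namecheck(name, length);
}
```

The same 16 bytes can pass as a parent pointer name and fail as a string
name, so every caller has to hand the entry's flags to the checker.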

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c      |   47 ++++++++++++++++++++++++++++++++++++++---
 fs/xfs/libxfs/xfs_attr.h      |    3 ++-
 fs/xfs/libxfs/xfs_da_format.h |    8 +++++++
 fs/xfs/scrub/attr.c           |    2 +-
 fs/xfs/xfs_attr_item.c        |   11 ++++++----
 fs/xfs/xfs_attr_list.c        |   17 ++++++++++-----
 6 files changed, 74 insertions(+), 14 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 101823772bf9..711022742e34 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -1577,9 +1577,33 @@ xfs_attr_node_get(
 	return error;
 }
 
-/* Returns true if the attribute entry name is valid. */
-bool
-xfs_attr_namecheck(
+/*
+ * Verify parent pointer attribute is valid.
+ * Return true on success or false on failure
+ */
+STATIC bool
+xfs_verify_pptr(
+	struct xfs_mount			*mp,
+	const struct xfs_parent_name_rec	*rec)
+{
+	xfs_ino_t				p_ino;
+	xfs_dir2_dataptr_t			p_diroffset;
+
+	p_ino = be64_to_cpu(rec->p_ino);
+	p_diroffset = be32_to_cpu(rec->p_diroffset);
+
+	if (!xfs_verify_ino(mp, p_ino))
+		return false;
+
+	if (p_diroffset > XFS_DIR2_MAX_DATAPTR)
+		return false;
+
+	return true;
+}
+
+/* Returns true if the string attribute entry name is valid. */
+static bool
+xfs_str_attr_namecheck(
 	const void	*name,
 	size_t		length)
 {
@@ -1594,6 +1618,23 @@ xfs_attr_namecheck(
 	return !memchr(name, 0, length);
 }
 
+/* Returns true if the attribute entry name is valid. */
+bool
+xfs_attr_namecheck(
+	struct xfs_mount	*mp,
+	const void		*name,
+	size_t			length,
+	int			flags)
+{
+	if (flags & XFS_ATTR_PARENT) {
+		if (length != sizeof(struct xfs_parent_name_rec))
+			return false;
+		return xfs_verify_pptr(mp, (struct xfs_parent_name_rec *)name);
+	}
+
+	return xfs_str_attr_namecheck(name, length);
+}
+
 int __init
 xfs_attr_intent_init_cache(void)
 {
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 3e81f3f48560..b79dae788cfb 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -547,7 +547,8 @@ int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
-bool xfs_attr_namecheck(const void *name, size_t length);
+bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
+			int flags);
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
 			 unsigned int *total);
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index b02b67f1999e..75b13807145d 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -731,6 +731,14 @@ xfs_attr3_leaf_name(xfs_attr_leafblock_t *leafp, int idx)
 	return &((char *)leafp)[be16_to_cpu(entries[idx].nameidx)];
 }
 
+static inline int
+xfs_attr3_leaf_flags(xfs_attr_leafblock_t *leafp, int idx)
+{
+	struct xfs_attr_leaf_entry *entries = xfs_attr3_leaf_entryp(leafp);
+
+	return entries[idx].flags;
+}
+
 static inline xfs_attr_leaf_name_remote_t *
 xfs_attr3_leaf_name_remote(xfs_attr_leafblock_t *leafp, int idx)
 {
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 9d2e33743ecd..2a79a13cb600 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -129,7 +129,7 @@ xchk_xattr_listent(
 	}
 
 	/* Does this name make sense? */
-	if (!xfs_attr_namecheck(name, namelen)) {
+	if (!xfs_attr_namecheck(sx->sc->mp, name, namelen, flags)) {
 		xchk_fblock_set_corrupt(sx->sc, XFS_ATTR_FORK, args.blkno);
 		return;
 	}
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 95e9ecbb4a67..da807f286a09 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -593,7 +593,8 @@ xfs_attri_item_recover(
 	 */
 	attrp = &attrip->attri_format;
 	if (!xfs_attri_validate(mp, attrp) ||
-	    !xfs_attr_namecheck(nv->name.i_addr, nv->name.i_len))
+	    !xfs_attr_namecheck(mp, nv->name.i_addr, nv->name.i_len,
+				attrp->alfi_attr_filter))
 		return -EFSCORRUPTED;
 
 	error = xlog_recover_iget(mp,  attrp->alfi_ino, &ip);
@@ -804,7 +805,8 @@ xlog_recover_attri_commit_pass2(
 	}
 
 	attr_name = item->ri_buf[i].i_addr;
-	if (!xfs_attr_namecheck(attr_name, attri_formatp->alfi_name_len)) {
+	if (!xfs_attr_namecheck(mp, attr_name, attri_formatp->alfi_name_len,
+				attri_formatp->alfi_attr_filter)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 				item->ri_buf[i].i_addr, item->ri_buf[i].i_len);
 		return -EFSCORRUPTED;
@@ -822,8 +824,9 @@ xlog_recover_attri_commit_pass2(
 		}
 
 		attr_nname = item->ri_buf[i].i_addr;
-		if (!xfs_attr_namecheck(attr_nname,
-				attri_formatp->alfi_nname_len)) {
+		if (!xfs_attr_namecheck(mp, attr_nname,
+				attri_formatp->alfi_nname_len,
+				attri_formatp->alfi_attr_filter)) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					item->ri_buf[i].i_addr,
 					item->ri_buf[i].i_len);
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 99bbbe1a0e44..a51f7f13a352 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -58,9 +58,13 @@ xfs_attr_shortform_list(
 	struct xfs_attr_sf_sort		*sbuf, *sbp;
 	struct xfs_attr_shortform	*sf;
 	struct xfs_attr_sf_entry	*sfe;
+	struct xfs_mount		*mp;
 	int				sbsize, nsbuf, count, i;
 	int				error = 0;
 
+	ASSERT(context != NULL);
+	ASSERT(dp != NULL);
+	mp = dp->i_mount;
 	sf = (struct xfs_attr_shortform *)dp->i_af.if_u1.if_data;
 	ASSERT(sf != NULL);
 	if (!sf->hdr.count)
@@ -82,8 +86,9 @@ xfs_attr_shortform_list(
 	     (dp->i_af.if_bytes + sf->hdr.count * 16) < context->bufsize)) {
 		for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
 			if (XFS_IS_CORRUPT(context->dp->i_mount,
-					   !xfs_attr_namecheck(sfe->nameval,
-							       sfe->namelen)))
+					   !xfs_attr_namecheck(mp, sfe->nameval,
+							       sfe->namelen,
+							       sfe->flags)))
 				return -EFSCORRUPTED;
 			context->put_listent(context,
 					     sfe->flags,
@@ -174,8 +179,9 @@ xfs_attr_shortform_list(
 			cursor->offset = 0;
 		}
 		if (XFS_IS_CORRUPT(context->dp->i_mount,
-				   !xfs_attr_namecheck(sbp->name,
-						       sbp->namelen))) {
+				   !xfs_attr_namecheck(mp, sbp->name,
+						       sbp->namelen,
+						       sbp->flags))) {
 			error = -EFSCORRUPTED;
 			goto out;
 		}
@@ -465,7 +471,8 @@ xfs_attr3_leaf_list_int(
 		}
 
 		if (XFS_IS_CORRUPT(context->dp->i_mount,
-				   !xfs_attr_namecheck(name, namelen)))
+				   !xfs_attr_namecheck(mp, name, namelen,
+						       entry->flags)))
 			return -EFSCORRUPTED;
 		context->put_listent(context, entry->flags,
 					      name, namelen, valuelen);



* [PATCH 14/28] xfs: extend transaction reservations for parent attributes
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (12 preceding siblings ...)
  2023-02-16 20:36   ` [PATCH 13/28] xfs: Add xfs_verify_pptr Darrick J. Wong
@ 2023-02-16 20:36   ` Darrick J. Wong
  2023-02-16 20:36   ` [PATCH 15/28] xfs: parent pointer attribute creation Darrick J. Wong
                     ` (13 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:36 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

We need to add, remove or modify parent pointer attributes during
create/link/unlink/rename operations atomically with the dirents in the
parent directories being modified. This means they need to be modified
in the same transaction as the parent directories, and so we need to add
the required space for the attribute modifications to the transaction
reservations.
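
The shape of the resulting calculation can be sketched as below: a fixed
overhead is added to the largest of the per-transaction worst cases, and
the attr-related terms only apply when parent pointers are enabled.  The
structure, field names, and numbers are hypothetical and are not taken
from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs, all in bytes of log space. */
struct toy_resv_inputs {
	unsigned int	quota_overhead;		/* always reserved */
	unsigned int	dir_update;		/* t1: inode + dir blocks */
	unsigned int	block_free;		/* t2: freeing blocks */
	unsigned int	attr_update;		/* t3: xattr set/remove */
	unsigned int	intent_overhead;	/* relogged attr intents */
};

static unsigned int
toy_max3(unsigned int a, unsigned int b, unsigned int c)
{
	unsigned int	m = a > b ? a : b;

	return m > c ? m : c;
}

/*
 * Each transaction in the chain only does one of t1/t2/t3, so reserve
 * for the largest of them, plus overhead carried by every transaction.
 */
static unsigned int
toy_calc_reservation(
	const struct toy_resv_inputs	*in,
	bool				has_parent)
{
	unsigned int			overhead;
	unsigned int			t3 = 0;

	assert(in != NULL);

	overhead = in->quota_overhead;
	if (has_parent) {
		t3 = in->attr_update;
		overhead += in->intent_overhead;
	}

	return overhead + toy_max3(in->dir_update, in->block_free, t3);
}
```

With parent pointers disabled the attr terms drop out, so the result is
the same as the two-way maximum the old code computed.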

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_trans_resv.c |  324 ++++++++++++++++++++++++++++++++++------
 1 file changed, 272 insertions(+), 52 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 5b2f27cbdb80..93419956b9e5 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -19,6 +19,9 @@
 #include "xfs_trans.h"
 #include "xfs_qm.h"
 #include "xfs_trans_space.h"
+#include "xfs_attr_item.h"
+#include "xfs_log.h"
+#include "xfs_da_format.h"
 
 #define _ALLOC	true
 #define _FREE	false
@@ -420,29 +423,108 @@ xfs_calc_itruncate_reservation_minlogsize(
 	return xfs_calc_itruncate_reservation(mp, true);
 }
 
+static inline unsigned int xfs_calc_pptr_link_overhead(void)
+{
+	return sizeof(struct xfs_attri_log_format) +
+			xlog_calc_iovec_len(XATTR_NAME_MAX) +
+			xlog_calc_iovec_len(sizeof(struct xfs_parent_name_rec));
+}
+static inline unsigned int xfs_calc_pptr_unlink_overhead(void)
+{
+	return sizeof(struct xfs_attri_log_format) +
+			xlog_calc_iovec_len(sizeof(struct xfs_parent_name_rec));
+}
+static inline unsigned int xfs_calc_pptr_replace_overhead(void)
+{
+	return sizeof(struct xfs_attri_log_format) +
+			xlog_calc_iovec_len(XATTR_NAME_MAX) +
+			xlog_calc_iovec_len(XATTR_NAME_MAX) +
+			xlog_calc_iovec_len(sizeof(struct xfs_parent_name_rec));
+}
+
 /*
  * In renaming a files we can modify:
  *    the five inodes involved: 5 * inode size
  *    the two directory btrees: 2 * (max depth + v2) * dir block size
  *    the two directory bmap btrees: 2 * max depth * block size
  * And the bmap_finish transaction can free dir and bmap blocks (two sets
- *	of bmap blocks) giving:
+ *	of bmap blocks) giving (t2):
  *    the agf for the ags in which the blocks live: 3 * sector size
  *    the agfl for the ags in which the blocks live: 3 * sector size
  *    the superblock for the free block count: sector size
  *    the allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size
+ * If parent pointers are enabled (t3), then each transaction in the chain
+ *    must be capable of setting or removing the extended attribute
+ *    containing the parent information.  It must also be able to handle
+ *    the three xattr intent items that track the progress of the parent
+ *    pointer update.
  */
 STATIC uint
 xfs_calc_rename_reservation(
 	struct xfs_mount	*mp)
 {
-	return XFS_DQUOT_LOGRES(mp) +
-		max((xfs_calc_inode_res(mp, 5) +
-		     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
-				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 3),
-				      XFS_FSB_TO_B(mp, 1))));
+	unsigned int		overhead = XFS_DQUOT_LOGRES(mp);
+	struct xfs_trans_resv	*resp = M_RES(mp);
+	unsigned int		t1, t2, t3 = 0;
+
+	t1 = xfs_calc_inode_res(mp, 5) +
+	     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
+			XFS_FSB_TO_B(mp, 1));
+
+	t2 = xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
+	     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 3),
+			XFS_FSB_TO_B(mp, 1));
+
+	if (xfs_has_parent(mp)) {
+		unsigned int	rename_overhead, exchange_overhead;
+
+		t3 = max(resp->tr_attrsetm.tr_logres,
+			 resp->tr_attrrm.tr_logres);
+
+		/*
+		 * For a standard rename, the three xattr intent log items
+		 * are (1) replacing the pptr for the source file; (2)
+		 * removing the pptr on the dest file; and (3) adding a
+		 * pptr for the whiteout file in the src dir.
+		 *
+		 * For a RENAME_EXCHANGE, there are two xattr intent
+		 * items to replace the pptr for both src and dest
+		 * files.  Link counts don't change and there is no
+		 * whiteout.
+		 *
+		 * In the worst case we can end up relogging all log
+		 * intent items to allow the log tail to move ahead, so
+		 * they become overhead added to each transaction in a
+		 * processing chain.
+		 */
+		rename_overhead = xfs_calc_pptr_replace_overhead() +
+				  xfs_calc_pptr_unlink_overhead() +
+				  xfs_calc_pptr_link_overhead();
+		exchange_overhead = 2 * xfs_calc_pptr_replace_overhead();
+
+		overhead += max(rename_overhead, exchange_overhead);
+	}
+
+	return overhead + max3(t1, t2, t3);
+}
+
+static inline unsigned int
+xfs_rename_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	/* One for the rename, one more for freeing blocks */
+	unsigned int		ret = XFS_RENAME_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to remove or add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += max(resp->tr_attrsetm.tr_logcount,
+			   resp->tr_attrrm.tr_logcount);
+
+	return ret;
 }
 
 /*
@@ -459,6 +541,23 @@ xfs_calc_iunlink_remove_reservation(
 	       2 * M_IGEO(mp)->inode_cluster_size;
 }
 
+static inline unsigned int
+xfs_link_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_LINK_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrsetm.tr_logcount;
+
+	return ret;
+}
+
 /*
  * For creating a link to an inode:
  *    the parent directory inode: inode size
@@ -475,14 +574,23 @@ STATIC uint
 xfs_calc_link_reservation(
 	struct xfs_mount	*mp)
 {
-	return XFS_DQUOT_LOGRES(mp) +
-		xfs_calc_iunlink_remove_reservation(mp) +
-		max((xfs_calc_inode_res(mp, 2) +
-		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
-				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 1),
-				      XFS_FSB_TO_B(mp, 1))));
+	unsigned int            overhead = XFS_DQUOT_LOGRES(mp);
+	struct xfs_trans_resv   *resp = M_RES(mp);
+	unsigned int            t1, t2, t3 = 0;
+
+	overhead += xfs_calc_iunlink_remove_reservation(mp);
+	t1 = xfs_calc_inode_res(mp, 2) +
+	       xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1));
+	t2 = xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
+	     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 1),
+			      XFS_FSB_TO_B(mp, 1));
+
+	if (xfs_has_parent(mp)) {
+		t3 = resp->tr_attrsetm.tr_logres;
+		overhead += xfs_calc_pptr_link_overhead();
+	}
+
+	return overhead + max3(t1, t2, t3);
 }
 
 /*
@@ -497,6 +605,23 @@ xfs_calc_iunlink_add_reservation(xfs_mount_t *mp)
 			M_IGEO(mp)->inode_cluster_size;
 }
 
+static inline unsigned int
+xfs_remove_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_REMOVE_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to remove one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrrm.tr_logcount;
+
+	return ret;
+}
+
 /*
  * For removing a directory entry we can modify:
  *    the parent directory inode: inode size
@@ -513,14 +638,24 @@ STATIC uint
 xfs_calc_remove_reservation(
 	struct xfs_mount	*mp)
 {
-	return XFS_DQUOT_LOGRES(mp) +
-		xfs_calc_iunlink_add_reservation(mp) +
-		max((xfs_calc_inode_res(mp, 2) +
-		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
-				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 2),
-				      XFS_FSB_TO_B(mp, 1))));
+	unsigned int            overhead = XFS_DQUOT_LOGRES(mp);
+	struct xfs_trans_resv   *resp = M_RES(mp);
+	unsigned int            t1, t2, t3 = 0;
+
+	overhead += xfs_calc_iunlink_add_reservation(mp);
+
+	t1 = xfs_calc_inode_res(mp, 2) +
+	     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1));
+	t2 = xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
+	     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 2),
+			      XFS_FSB_TO_B(mp, 1));
+
+	if (xfs_has_parent(mp)) {
+		t3 = resp->tr_attrrm.tr_logres;
+		overhead += xfs_calc_pptr_unlink_overhead();
+	}
+
+	return overhead + max3(t1, t2, t3);
 }
 
 /*
@@ -569,12 +704,40 @@ xfs_calc_icreate_resv_alloc(
 		xfs_calc_finobt_res(mp);
 }
 
+static inline unsigned int
+xfs_icreate_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_CREATE_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrsetm.tr_logcount;
+
+	return ret;
+}
+
 STATIC uint
-xfs_calc_icreate_reservation(xfs_mount_t *mp)
+xfs_calc_icreate_reservation(
+	struct xfs_mount	*mp)
 {
-	return XFS_DQUOT_LOGRES(mp) +
-		max(xfs_calc_icreate_resv_alloc(mp),
-		    xfs_calc_create_resv_modify(mp));
+	struct xfs_trans_resv   *resp = M_RES(mp);
+	unsigned int		overhead = XFS_DQUOT_LOGRES(mp);
+	unsigned int		t1, t2, t3 = 0;
+
+	t1 = xfs_calc_icreate_resv_alloc(mp);
+	t2 = xfs_calc_create_resv_modify(mp);
+
+	if (xfs_has_parent(mp)) {
+		t3 = resp->tr_attrsetm.tr_logres;
+		overhead += xfs_calc_pptr_link_overhead();
+	}
+
+	return overhead + max3(t1, t2, t3);
 }
 
 STATIC uint
@@ -587,6 +750,23 @@ xfs_calc_create_tmpfile_reservation(
 	return res + xfs_calc_iunlink_add_reservation(mp);
 }
 
+static inline unsigned int
+xfs_mkdir_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_MKDIR_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrsetm.tr_logcount;
+
+	return ret;
+}
+
 /*
  * Making a new directory is the same as creating a new file.
  */
@@ -597,6 +777,22 @@ xfs_calc_mkdir_reservation(
 	return xfs_calc_icreate_reservation(mp);
 }
 
+static inline unsigned int
+xfs_symlink_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_SYMLINK_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrsetm.tr_logcount;
+
+	return ret;
+}
 
 /*
  * Making a new symplink is the same as creating a new file, but
@@ -909,6 +1105,52 @@ xfs_calc_sb_reservation(
 	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
 }
 
+/*
+ * Namespace reservations.
+ *
+ * These get tricky when parent pointers are enabled as we have attribute
+ * modifications occurring from within these transactions. Rather than confuse
+ * each of these reservation calculations with the conditional attribute
+ * reservations, add them here in a clear and concise manner. This requires that
+ * the attribute reservations have already been calculated.
+ *
+ * Note that we only include the static attribute reservation here; the runtime
+ * reservation will have to be modified by the size of the attributes being
+ * added/removed/modified. See the comments on the attribute reservation
+ * calculations for more details.
+ */
+STATIC void
+xfs_calc_namespace_reservations(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	ASSERT(resp->tr_attrsetm.tr_logres > 0);
+
+	resp->tr_rename.tr_logres = xfs_calc_rename_reservation(mp);
+	resp->tr_rename.tr_logcount = xfs_rename_log_count(mp, resp);
+	resp->tr_rename.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_link.tr_logres = xfs_calc_link_reservation(mp);
+	resp->tr_link.tr_logcount = xfs_link_log_count(mp, resp);
+	resp->tr_link.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_remove.tr_logres = xfs_calc_remove_reservation(mp);
+	resp->tr_remove.tr_logcount = xfs_remove_log_count(mp, resp);
+	resp->tr_remove.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_symlink.tr_logres = xfs_calc_symlink_reservation(mp);
+	resp->tr_symlink.tr_logcount = xfs_symlink_log_count(mp, resp);
+	resp->tr_symlink.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_create.tr_logres = xfs_calc_icreate_reservation(mp);
+	resp->tr_create.tr_logcount = xfs_icreate_log_count(mp, resp);
+	resp->tr_create.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp);
+	resp->tr_mkdir.tr_logcount = xfs_mkdir_log_count(mp, resp);
+	resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+}
+
 void
 xfs_trans_resv_calc(
 	struct xfs_mount	*mp,
@@ -928,35 +1170,11 @@ xfs_trans_resv_calc(
 	resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
 	resp->tr_itruncate.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
-	resp->tr_rename.tr_logres = xfs_calc_rename_reservation(mp);
-	resp->tr_rename.tr_logcount = XFS_RENAME_LOG_COUNT;
-	resp->tr_rename.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
-	resp->tr_link.tr_logres = xfs_calc_link_reservation(mp);
-	resp->tr_link.tr_logcount = XFS_LINK_LOG_COUNT;
-	resp->tr_link.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
-	resp->tr_remove.tr_logres = xfs_calc_remove_reservation(mp);
-	resp->tr_remove.tr_logcount = XFS_REMOVE_LOG_COUNT;
-	resp->tr_remove.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
-	resp->tr_symlink.tr_logres = xfs_calc_symlink_reservation(mp);
-	resp->tr_symlink.tr_logcount = XFS_SYMLINK_LOG_COUNT;
-	resp->tr_symlink.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
-	resp->tr_create.tr_logres = xfs_calc_icreate_reservation(mp);
-	resp->tr_create.tr_logcount = XFS_CREATE_LOG_COUNT;
-	resp->tr_create.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
 	resp->tr_create_tmpfile.tr_logres =
 			xfs_calc_create_tmpfile_reservation(mp);
 	resp->tr_create_tmpfile.tr_logcount = XFS_CREATE_TMPFILE_LOG_COUNT;
 	resp->tr_create_tmpfile.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
-	resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp);
-	resp->tr_mkdir.tr_logcount = XFS_MKDIR_LOG_COUNT;
-	resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
 	resp->tr_ifree.tr_logres = xfs_calc_ifree_reservation(mp);
 	resp->tr_ifree.tr_logcount = XFS_INACTIVE_LOG_COUNT;
 	resp->tr_ifree.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
@@ -986,6 +1204,8 @@ xfs_trans_resv_calc(
 	resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT;
 	resp->tr_qm_dqalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
+	xfs_calc_namespace_reservations(mp, resp);
+
 	/*
 	 * The following transactions are logged in logical format with
 	 * a default log count.
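
For readers following the arithmetic, here is a minimal userspace sketch of
the reservation pattern used above, assuming my reading of it is right: a
fixed overhead is always charged, and on top of that only the largest of the
alternative phases (t1, t2, t3) is reserved.  The struct and function names
(resv_model, model_logres, model_logcount) are hypothetical stand-ins and
not kernel interfaces; the numbers passed in are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one precomputed reservation (not struct xfs_trans_res). */
struct resv_model {
	unsigned int	logres;		/* log bytes per transaction */
	unsigned int	logcount;	/* pre-reserved transaction rolls */
};

static unsigned int
max3u(unsigned int a, unsigned int b, unsigned int c)
{
	unsigned int	m = a > b ? a : b;

	return m > c ? m : c;
}

/*
 * Model of "overhead + max3(t1, t2, t3)".  When parent pointers are on, the
 * attr reservation becomes a third candidate and the pptr overhead is added
 * to the fixed part.
 */
static unsigned int
model_logres(
	unsigned int		overhead,
	unsigned int		t1,
	unsigned int		t2,
	const struct resv_model	*attr,
	unsigned int		pptr_overhead,
	bool			has_parent)
{
	unsigned int		t3 = 0;

	if (has_parent) {
		/* The attr reservation must already have been calculated. */
		assert(attr->logres > 0);
		t3 = attr->logres;
		overhead += pptr_overhead;
	}
	return overhead + max3u(t1, t2, t3);
}

/* Model of the *_log_count() helpers: add the attr roll count when enabled. */
static unsigned int
model_logcount(
	unsigned int		base_count,
	const struct resv_model	*attr,
	bool			has_parent)
{
	return has_parent ? base_count + attr->logcount : base_count;
}
```

The assert mirrors the ASSERT() in xfs_calc_namespace_reservations(): the
namespace reservations can only be derived after the attr ones exist, which
is presumably why the call sits at the end of xfs_trans_resv_calc().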



* [PATCH 15/28] xfs: parent pointer attribute creation
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (13 preceding siblings ...)
  2023-02-16 20:36   ` [PATCH 14/28] xfs: extend transaction reservations for parent attributes Darrick J. Wong
@ 2023-02-16 20:36   ` Darrick J. Wong
  2023-02-16 20:36   ` [PATCH 16/28] xfs: add parent attributes to link Darrick J. Wong
                     ` (12 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:36 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Add the parent pointer attribute during xfs_create, and add subroutines to
initialize the attributes.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/Makefile               |    1 
 fs/xfs/libxfs/xfs_attr.c      |    4 +
 fs/xfs/libxfs/xfs_attr.h      |    4 +
 fs/xfs/libxfs/xfs_da_format.h |   12 ----
 fs/xfs/libxfs/xfs_parent.c    |  139 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_parent.h    |   57 +++++++++++++++++
 fs/xfs/xfs_inode.c            |   64 ++++++++++++++++---
 fs/xfs/xfs_super.c            |   10 +++
 fs/xfs/xfs_xattr.c            |    4 +
 fs/xfs/xfs_xattr.h            |    2 +
 10 files changed, 271 insertions(+), 26 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_parent.c
 create mode 100644 fs/xfs/libxfs/xfs_parent.h
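
As an illustration of what xfs_init_parent_name_rec() produces, here is a
small userspace sketch of the attribute name (the key).  It assumes the
on-disk record is a big-endian 64-bit inode number followed by a 32-bit
generation and a 32-bit directory offset, which is what the cpu_to_be64()
and cpu_to_be32() conversions in this patch suggest.  The names
pptr_key_model, put_be and pptr_key_init are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the key: be64 ino, be32 gen, be32 diroffset. */
struct pptr_key_model {
	uint8_t		bytes[16];
};

/* Store v most-significant byte first, independent of host endianness. */
static void
put_be(uint8_t *dst, uint64_t v, unsigned int nbytes)
{
	assert(nbytes >= 1 && nbytes <= 8);

	for (unsigned int i = 0; i < nbytes; i++)
		dst[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
}

static void
pptr_key_init(
	struct pptr_key_model	*key,
	uint64_t		parent_ino,
	uint32_t		parent_gen,
	uint32_t		diroffset)
{
	put_be(&key->bytes[0], parent_ino, 8);
	put_be(&key->bytes[8], parent_gen, 4);
	put_be(&key->bytes[12], diroffset, 4);
}
```

Because the key is stored in a fixed byte order, hashing it with
xfs_da_hashname() as xfs_parent_defer_add() does should give the same value
on any host; the dirent name travels separately as the attribute value.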


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 03135a1c31b6..e2b2cf50ffcf 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -40,6 +40,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_inode_fork.o \
 				   xfs_inode_buf.o \
 				   xfs_log_rlimit.o \
+				   xfs_parent.o \
 				   xfs_ag_resv.o \
 				   xfs_rmap.o \
 				   xfs_rmap_btree.o \
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 711022742e34..f68d41f0f998 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -886,7 +886,7 @@ xfs_attr_lookup(
 	return error;
 }
 
-static int
+int
 xfs_attr_intent_init(
 	struct xfs_da_args	*args,
 	unsigned int		op_flags,	/* op flag (set or remove) */
@@ -904,7 +904,7 @@ xfs_attr_intent_init(
 }
 
 /* Sets an attribute for an inode as a deferred operation */
-static int
+int
 xfs_attr_defer_add(
 	struct xfs_da_args	*args)
 {
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index b79dae788cfb..0cf23f5117ad 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -544,6 +544,7 @@ int xfs_inode_hasattr(struct xfs_inode *ip);
 bool xfs_attr_is_leaf(struct xfs_inode *ip);
 int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
+int xfs_attr_defer_add(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
@@ -552,7 +553,8 @@ bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
 			 unsigned int *total);
-
+int xfs_attr_intent_init(struct xfs_da_args *args, unsigned int op_flags,
+			 struct xfs_attr_intent  **attr);
 /*
  * Check to see if the attr should be upgraded from non-existent or shortform to
  * single-leaf-block attribute list.
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 75b13807145d..2db1cf97b2c8 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -826,16 +826,4 @@ struct xfs_parent_name_rec {
 	__be32  p_diroffset;
 };
 
-/*
- * incore version of the above, also contains name pointers so callers
- * can pass/obtain all the parent pointer information in a single structure
- */
-struct xfs_parent_name_irec {
-	xfs_ino_t		p_ino;
-	uint32_t		p_gen;
-	xfs_dir2_dataptr_t	p_diroffset;
-	const char		*p_name;
-	uint8_t			p_namelen;
-};
-
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
new file mode 100644
index 000000000000..6b6d415319e6
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Oracle, Inc.
+ * All rights reserved.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_format.h"
+#include "xfs_da_format.h"
+#include "xfs_log_format.h"
+#include "xfs_shared.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_inode.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_trans.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr_sf.h"
+#include "xfs_bmap.h"
+#include "xfs_defer.h"
+#include "xfs_log.h"
+#include "xfs_xattr.h"
+#include "xfs_parent.h"
+#include "xfs_trans_space.h"
+
+struct kmem_cache		*xfs_parent_intent_cache;
+
+/*
+ * Parent pointer attribute handling.
+ *
+ * Because the attribute value is a filename component, it will never be longer
+ * than 255 bytes. This means the attribute will always be a local format
+ * attribute, because xfs_attr_leaf_entsize_local_max() for v5 filesystems will
+ * always be larger than this (max is 75% of block size).
+ *
+ * Creating a new parent attribute will always create a new attribute - there
+ * should never, ever be an existing attribute in the tree for a new inode.
+ * ENOSPC behavior is problematic - creating the inode without the parent
+ * pointer is effectively a corruption, so we allow parent attribute creation
+ * to dip into the reserve block pool to avoid unexpected ENOSPC errors from
+ * occurring.
+ */
+
+
+/* Initializes an xfs_parent_name_rec to be stored as an attribute name */
+void
+xfs_init_parent_name_rec(
+	struct xfs_parent_name_rec	*rec,
+	struct xfs_inode		*ip,
+	uint32_t			p_diroffset)
+{
+	xfs_ino_t			p_ino = ip->i_ino;
+	uint32_t			p_gen = VFS_I(ip)->i_generation;
+
+	rec->p_ino = cpu_to_be64(p_ino);
+	rec->p_gen = cpu_to_be32(p_gen);
+	rec->p_diroffset = cpu_to_be32(p_diroffset);
+}
+
+int
+__xfs_parent_init(
+	struct xfs_mount		*mp,
+	struct xfs_parent_defer		**parentp)
+{
+	struct xfs_parent_defer		*parent;
+	int				error;
+
+	error = xfs_attr_grab_log_assist(mp);
+	if (error)
+		return error;
+
+	parent = kmem_cache_zalloc(xfs_parent_intent_cache, GFP_KERNEL);
+	if (!parent) {
+		xfs_attr_rele_log_assist(mp);
+		return -ENOMEM;
+	}
+
+	/* init parent da_args */
+	parent->args.geo = mp->m_attr_geo;
+	parent->args.whichfork = XFS_ATTR_FORK;
+	parent->args.attr_filter = XFS_ATTR_PARENT;
+	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED;
+	parent->args.name = (const uint8_t *)&parent->rec;
+	parent->args.namelen = sizeof(struct xfs_parent_name_rec);
+
+	*parentp = parent;
+	return 0;
+}
+
+int
+xfs_parent_defer_add(
+	struct xfs_trans	*tp,
+	struct xfs_parent_defer	*parent,
+	struct xfs_inode	*dp,
+	struct xfs_name		*parent_name,
+	xfs_dir2_dataptr_t	diroffset,
+	struct xfs_inode	*child)
+{
+	struct xfs_da_args	*args = &parent->args;
+
+	xfs_init_parent_name_rec(&parent->rec, dp, diroffset);
+	args->hashval = xfs_da_hashname(args->name, args->namelen);
+
+	args->trans = tp;
+	args->dp = child;
+	if (parent_name) {
+		parent->args.value = (void *)parent_name->name;
+		parent->args.valuelen = parent_name->len;
+	}
+
+	return xfs_attr_defer_add(args);
+}
+
+void
+__xfs_parent_cancel(
+	xfs_mount_t		*mp,
+	struct xfs_parent_defer *parent)
+{
+	xlog_drop_incompat_feat(mp->m_log);
+	kmem_cache_free(xfs_parent_intent_cache, parent);
+}
+
+unsigned int
+xfs_pptr_calc_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		namelen)
+{
+	/*
+	 * Pptrs are always the first attr in an attr tree, and never larger
+	 * than a block
+	 */
+	return XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK) +
+	       XFS_NEXTENTADD_SPACE_RES(mp, namelen, XFS_ATTR_FORK);
+}
+
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
new file mode 100644
index 000000000000..d5a8c8e52cb5
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Oracle, Inc.
+ * All Rights Reserved.
+ */
+#ifndef	__XFS_PARENT_H__
+#define	__XFS_PARENT_H__
+
+extern struct kmem_cache	*xfs_parent_intent_cache;
+
+/*
+ * Dynamically allocated structure used to wrap the data that is passed around
+ * the defer ops machinery.
+ */
+struct xfs_parent_defer {
+	struct xfs_parent_name_rec	rec;
+	struct xfs_da_args		args;
+};
+
+/*
+ * Parent pointer attribute prototypes
+ */
+void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
+			      struct xfs_inode *ip,
+			      uint32_t p_diroffset);
+int __xfs_parent_init(struct xfs_mount *mp, struct xfs_parent_defer **parentp);
+
+static inline int
+xfs_parent_start(
+	struct xfs_mount	*mp,
+	struct xfs_parent_defer	**pp)
+{
+	*pp = NULL;
+
+	if (xfs_has_parent(mp))
+		return __xfs_parent_init(mp, pp);
+	return 0;
+}
+
+int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_parent_defer *parent,
+			 struct xfs_inode *dp, struct xfs_name *parent_name,
+			 xfs_dir2_dataptr_t diroffset, struct xfs_inode *child);
+void __xfs_parent_cancel(struct xfs_mount *mp, struct xfs_parent_defer *parent);
+
+static inline void
+xfs_parent_finish(
+	struct xfs_mount	*mp,
+	struct xfs_parent_defer	*p)
+{
+	if (p)
+		__xfs_parent_cancel(mp, p);
+}
+
+unsigned int xfs_pptr_calc_space_res(struct xfs_mount *mp,
+				     unsigned int namelen);
+
+#endif	/* __XFS_PARENT_H__ */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index a896ee4c9680..ba488310ea9c 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -37,6 +37,8 @@
 #include "xfs_reflink.h"
 #include "xfs_ag.h"
 #include "xfs_log_priv.h"
+#include "xfs_parent.h"
+#include "xfs_xattr.h"
 
 struct kmem_cache *xfs_inode_cache;
 
@@ -946,10 +948,32 @@ xfs_bumplink(
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 }
 
+static unsigned int
+xfs_create_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		namelen)
+{
+	unsigned int		ret;
+
+	ret = XFS_IALLOC_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp, namelen);
+	if (xfs_has_parent(mp))
+		ret += xfs_pptr_calc_space_res(mp, namelen);
+
+	return ret;
+}
+
+static unsigned int
+xfs_mkdir_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		namelen)
+{
+	return xfs_create_space_res(mp, namelen);
+}
+
 int
 xfs_create(
 	struct user_namespace	*mnt_userns,
-	xfs_inode_t		*dp,
+	struct xfs_inode	*dp,
 	struct xfs_name		*name,
 	umode_t			mode,
 	dev_t			rdev,
@@ -961,7 +985,7 @@ xfs_create(
 	struct xfs_inode	*ip = NULL;
 	struct xfs_trans	*tp = NULL;
 	int			error;
-	bool                    unlock_dp_on_error = false;
+	bool			unlock_dp_on_error = false;
 	prid_t			prid;
 	struct xfs_dquot	*udqp = NULL;
 	struct xfs_dquot	*gdqp = NULL;
@@ -969,6 +993,8 @@ xfs_create(
 	struct xfs_trans_res	*tres;
 	uint			resblks;
 	xfs_ino_t		ino;
+	xfs_dir2_dataptr_t	diroffset;
+	struct xfs_parent_defer	*parent;
 
 	trace_xfs_create(dp, name);
 
@@ -988,13 +1014,17 @@ xfs_create(
 		return error;
 
 	if (is_dir) {
-		resblks = XFS_MKDIR_SPACE_RES(mp, name->len);
+		resblks = xfs_mkdir_space_res(mp, name->len);
 		tres = &M_RES(mp)->tr_mkdir;
 	} else {
-		resblks = XFS_CREATE_SPACE_RES(mp, name->len);
+		resblks = xfs_create_space_res(mp, name->len);
 		tres = &M_RES(mp)->tr_create;
 	}
 
+	error = xfs_parent_start(mp, &parent);
+	if (error)
+		goto out_release_dquots;
+
 	/*
 	 * Initially assume that the file does not exist and
 	 * reserve the resources for that case.  If that is not
@@ -1010,7 +1040,7 @@ xfs_create(
 				resblks, &tp);
 	}
 	if (error)
-		goto out_release_dquots;
+		goto out_parent;
 
 	xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
 	unlock_dp_on_error = true;
@@ -1020,6 +1050,7 @@ xfs_create(
 	 * entry pointing to them, but a directory also the "." entry
 	 * pointing to itself.
 	 */
+	init_xattrs = init_xattrs || xfs_has_parent(mp);
 	error = xfs_dialloc(&tp, dp->i_ino, mode, &ino);
 	if (!error)
 		error = xfs_init_new_inode(mnt_userns, tp, dp, ino, mode,
@@ -1034,11 +1065,11 @@ xfs_create(
 	 * the transaction cancel unlocking dp so don't do it explicitly in the
 	 * error path.
 	 */
-	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
-	unlock_dp_on_error = false;
+	xfs_trans_ijoin(tp, dp, 0);
 
 	error = xfs_dir_createname(tp, dp, name, ip->i_ino,
-				   resblks - XFS_IALLOC_SPACE_RES(mp), NULL);
+				   resblks - XFS_IALLOC_SPACE_RES(mp),
+				   &diroffset);
 	if (error) {
 		ASSERT(error != -ENOSPC);
 		goto out_trans_cancel;
@@ -1054,6 +1085,17 @@ xfs_create(
 		xfs_bumplink(tp, dp);
 	}
 
+	/*
+	 * If we have parent pointers, we need to add the attribute containing
+	 * the parent information now.
+	 */
+	if (parent) {
+		error = xfs_parent_defer_add(tp, parent, dp, name, diroffset,
+					     ip);
+		if (error)
+			goto out_trans_cancel;
+	}
+
 	/*
 	 * If this is a synchronous mount, make sure that the
 	 * create transaction goes to disk before returning to
@@ -1079,6 +1121,8 @@ xfs_create(
 
 	*ipp = ip;
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+	xfs_parent_finish(mp, parent);
 	return 0;
 
  out_trans_cancel:
@@ -1090,10 +1134,12 @@ xfs_create(
 	 * transactions and deadlocks from xfs_inactive.
 	 */
 	if (ip) {
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		xfs_finish_inode_setup(ip);
 		xfs_irele(ip);
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	}
+ out_parent:
+	xfs_parent_finish(mp, parent);
  out_release_dquots:
 	xfs_qm_dqrele(udqp);
 	xfs_qm_dqrele(gdqp);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 0c4b73e9b29d..6795761c31e0 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -41,6 +41,7 @@
 #include "xfs_attr_item.h"
 #include "xfs_xattr.h"
 #include "xfs_iunlink_item.h"
+#include "xfs_parent.h"
 
 #include <linux/magic.h>
 #include <linux/fs_context.h>
@@ -2115,8 +2116,16 @@ xfs_init_caches(void)
 	if (!xfs_iunlink_cache)
 		goto out_destroy_attri_cache;
 
+	xfs_parent_intent_cache = kmem_cache_create("xfs_parent_intent",
+					     sizeof(struct xfs_parent_defer),
+					     0, 0, NULL);
+	if (!xfs_parent_intent_cache)
+		goto out_destroy_iul_cache;
+
 	return 0;
 
+ out_destroy_iul_cache:
+	kmem_cache_destroy(xfs_iunlink_cache);
  out_destroy_attri_cache:
 	kmem_cache_destroy(xfs_attri_cache);
  out_destroy_attrd_cache:
@@ -2171,6 +2180,7 @@ xfs_destroy_caches(void)
 	 * destroy caches.
 	 */
 	rcu_barrier();
+	kmem_cache_destroy(xfs_parent_intent_cache);
 	kmem_cache_destroy(xfs_iunlink_cache);
 	kmem_cache_destroy(xfs_attri_cache);
 	kmem_cache_destroy(xfs_attrd_cache);
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 10aa1fd39d2b..8bb5f53a31fe 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -27,7 +27,7 @@
  * they must release the permission by calling xlog_drop_incompat_feat
  * when they're done.
  */
-static inline int
+int
 xfs_attr_grab_log_assist(
 	struct xfs_mount	*mp)
 {
@@ -61,7 +61,7 @@ xfs_attr_grab_log_assist(
 	return error;
 }
 
-static inline void
+void
 xfs_attr_rele_log_assist(
 	struct xfs_mount	*mp)
 {
diff --git a/fs/xfs/xfs_xattr.h b/fs/xfs/xfs_xattr.h
index 2b09133b1b9b..7e0a2f3bb7f8 100644
--- a/fs/xfs/xfs_xattr.h
+++ b/fs/xfs/xfs_xattr.h
@@ -7,6 +7,8 @@
 #define __XFS_XATTR_H__
 
 int xfs_attr_change(struct xfs_da_args *args);
+int xfs_attr_grab_log_assist(struct xfs_mount *mp);
+void xfs_attr_rele_log_assist(struct xfs_mount *mp);
 
 extern const struct xattr_handler *xfs_xattr_handlers[];
 



* [PATCH 16/28] xfs: add parent attributes to link
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (14 preceding siblings ...)
  2023-02-16 20:36   ` [PATCH 15/28] xfs: parent pointer attribute creation Darrick J. Wong
@ 2023-02-16 20:36   ` Darrick J. Wong
  2023-02-16 20:37   ` [PATCH 17/28] xfs: add parent attributes to symlink Darrick J. Wong
                     ` (11 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:36 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch modifies xfs_link to add a parent pointer to the inode.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_trans_space.h |    2 -
 fs/xfs/xfs_inode.c              |   60 ++++++++++++++++++++++++++++++++++-----
 2 files changed, 53 insertions(+), 9 deletions(-)
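
The control flow added to xfs_link() can be sketched in userspace as below.
This is a simplified model, not the kernel code: ctx_start() and ctx_finish()
are hypothetical stand-ins for xfs_parent_start() and xfs_parent_finish(),
and nospace_error stands in for the value that xfs_trans_alloc_dir() hands
back when it could only get a transaction without a block reservation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct pptr_ctx_model {
	int	unused;
};

static int live_ctx;	/* outstanding contexts, so leaks show up in tests */

/* Allocate a context only when the feature is on; NULL means "disabled". */
static int
ctx_start(bool has_parent, struct pptr_ctx_model **pp)
{
	*pp = NULL;
	if (!has_parent)
		return 0;

	*pp = calloc(1, sizeof(**pp));
	if (!*pp)
		return -ENOMEM;
	live_ctx++;
	return 0;
}

/* Safe to call with NULL, like the inline wrapper in xfs_parent.h. */
static void
ctx_finish(struct pptr_ctx_model *p)
{
	if (!p)
		return;
	assert(live_ctx > 0);
	live_ctx--;
	free(p);
}

static int
link_model(bool has_parent, int nospace_error)
{
	struct pptr_ctx_model	*parent;
	int			error;

	error = ctx_start(has_parent, &parent);
	if (error)
		return error;

	/* With parent pointers, a reservationless link is refused outright. */
	if (parent && nospace_error) {
		error = nospace_error;
		goto out_parent;
	}

	/* ... directory update and deferred attr add would go here ... */
	error = 0;

out_parent:
	ctx_finish(parent);
	return error;
}
```

The point of the sketch is the pairing: every exit taken after a successful
start passes through the finish call, and a NULL context doubles as the
"feature disabled" flag so callers need no separate xfs_has_parent() checks.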


diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h
index 87b31c69a773..f72207923ec2 100644
--- a/fs/xfs/libxfs/xfs_trans_space.h
+++ b/fs/xfs/libxfs/xfs_trans_space.h
@@ -84,8 +84,6 @@
 	(2 * (mp)->m_alloc_maxlevels)
 #define	XFS_GROWFSRT_SPACE_RES(mp,b)	\
 	((b) + XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK))
-#define	XFS_LINK_SPACE_RES(mp,nl)	\
-	XFS_DIRENTER_SPACE_RES(mp,nl)
 #define	XFS_MKDIR_SPACE_RES(mp,nl)	\
 	(XFS_IALLOC_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl))
 #define	XFS_QM_DQALLOC_SPACE_RES(mp)	\
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index ba488310ea9c..b4318df03b5c 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1247,16 +1247,32 @@ xfs_create_tmpfile(
 	return error;
 }
 
+static unsigned int
+xfs_link_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		namelen)
+{
+	unsigned int		ret;
+
+	ret = XFS_DIRENTER_SPACE_RES(mp, namelen);
+	if (xfs_has_parent(mp))
+		ret += xfs_pptr_calc_space_res(mp, namelen);
+
+	return ret;
+}
+
 int
 xfs_link(
-	xfs_inode_t		*tdp,
-	xfs_inode_t		*sip,
+	struct xfs_inode	*tdp,
+	struct xfs_inode	*sip,
 	struct xfs_name		*target_name)
 {
-	xfs_mount_t		*mp = tdp->i_mount;
-	xfs_trans_t		*tp;
+	struct xfs_mount	*mp = tdp->i_mount;
+	struct xfs_trans	*tp;
 	int			error, nospace_error = 0;
 	int			resblks;
+	xfs_dir2_dataptr_t	diroffset;
+	struct xfs_parent_defer	*parent = NULL;
 
 	trace_xfs_link(tdp, target_name);
 
@@ -1273,11 +1289,25 @@ xfs_link(
 	if (error)
 		goto std_return;
 
-	resblks = XFS_LINK_SPACE_RES(mp, target_name->len);
+	error = xfs_parent_start(mp, &parent);
+	if (error)
+		goto std_return;
+
+	resblks = xfs_link_space_res(mp, target_name->len);
 	error = xfs_trans_alloc_dir(tdp, &M_RES(mp)->tr_link, sip, &resblks,
 			&tp, &nospace_error);
 	if (error)
-		goto std_return;
+		goto out_parent;
+
+	/*
+	 * We don't allow reservationless or quotaless hardlinking when parent
+	 * pointers are enabled because we can't back out if the xattrs must
+	 * grow.
+	 */
+	if (parent && nospace_error) {
+		error = nospace_error;
+		goto error_return;
+	}
 
 	/*
 	 * If we are using project inheritance, we only allow hard link
@@ -1310,7 +1340,7 @@ xfs_link(
 	}
 
 	error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino,
-				   resblks, NULL);
+				   resblks, &diroffset);
 	if (error)
 		goto error_return;
 	xfs_trans_ichgtime(tp, tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
@@ -1318,6 +1348,19 @@ xfs_link(
 
 	xfs_bumplink(tp, sip);
 
+	/*
+	 * If we have parent pointers, we now need to add the parent record to
+	 * the attribute fork of the inode. If this is the initial parent
+	 * attribute, we need to create it correctly, otherwise we can just add
+	 * the parent to the inode.
+	 */
+	if (parent) {
+		error = xfs_parent_defer_add(tp, parent, tdp, target_name,
+					     diroffset, sip);
+		if (error)
+			goto error_return;
+	}
+
 	/*
 	 * If this is a synchronous mount, make sure that the
 	 * link transaction goes to disk before returning to
@@ -1329,12 +1372,15 @@ xfs_link(
 	error = xfs_trans_commit(tp);
 	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
 	xfs_iunlock(sip, XFS_ILOCK_EXCL);
+	xfs_parent_finish(mp, parent);
 	return error;
 
  error_return:
 	xfs_trans_cancel(tp);
 	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
 	xfs_iunlock(sip, XFS_ILOCK_EXCL);
+ out_parent:
+	xfs_parent_finish(mp, parent);
  std_return:
 	if (error == -ENOSPC && nospace_error)
 		error = nospace_error;



* [PATCH 17/28] xfs: add parent attributes to symlink
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (15 preceding siblings ...)
  2023-02-16 20:36   ` [PATCH 16/28] xfs: add parent attributes to link Darrick J. Wong
@ 2023-02-16 20:37   ` Darrick J. Wong
  2023-02-16 20:37   ` [PATCH 18/28] xfs: remove parent pointers in unlink Darrick J. Wong
                     ` (10 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:37 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch modifies xfs_symlink to add a parent pointer to the inode.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_trans_space.h |    2 -
 fs/xfs/xfs_symlink.c            |   58 ++++++++++++++++++++++++++++++++-------
 2 files changed, 48 insertions(+), 12 deletions(-)
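
A minimal sketch of the block accounting this patch changes in
xfs_symlink(), written as plain userspace C.  The parameters litino and
remote_blocks are hypothetical stand-ins for XFS_LITINO(mp) and the result
of xfs_symlink_blocks(); the other inputs stand in for the SPACE_RES macros
and xfs_pptr_calc_space_res().  All values are made-up numbers.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the fs_blocks decision: when can the target blocks be skipped? */
static unsigned int
model_symlink_fsblocks(
	unsigned int	pathlen,
	unsigned int	litino,
	unsigned int	remote_blocks,
	bool		has_parent)
{
	assert(pathlen > 0);

	/*
	 * Only skip the reservation when the target fits in the inode and
	 * no parent pointer attribute will be added to it.
	 */
	if (pathlen <= litino && !has_parent)
		return 0;
	return remote_blocks;
}

/* Model of xfs_symlink_space_res(): the old macro plus the pptr blocks. */
static unsigned int
model_symlink_space_res(
	unsigned int	ialloc_res,
	unsigned int	direnter_res,
	unsigned int	fsblocks,
	unsigned int	pptr_res,
	bool		has_parent)
{
	unsigned int	ret = ialloc_res + direnter_res + fsblocks;

	if (has_parent)
		ret += pptr_res;
	return ret;
}
```

With parent pointers enabled the model always reserves the target blocks,
even for a short target, matching the "!xfs_has_parent(mp)" condition added
to the pathlen check.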


diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h
index f72207923ec2..25a55650baf4 100644
--- a/fs/xfs/libxfs/xfs_trans_space.h
+++ b/fs/xfs/libxfs/xfs_trans_space.h
@@ -95,8 +95,6 @@
 	XFS_DIRREMOVE_SPACE_RES(mp)
 #define	XFS_RENAME_SPACE_RES(mp,nl)	\
 	(XFS_DIRREMOVE_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl))
-#define	XFS_SYMLINK_SPACE_RES(mp,nl,b)	\
-	(XFS_IALLOC_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl) + (b))
 #define XFS_IFREE_SPACE_RES(mp)		\
 	(xfs_has_finobt(mp) ? M_IGEO(mp)->inobt_maxlevels : 0)
 
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 27a7d7c57015..f305226109f0 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -23,6 +23,8 @@
 #include "xfs_trans.h"
 #include "xfs_ialloc.h"
 #include "xfs_error.h"
+#include "xfs_parent.h"
+#include "xfs_defer.h"
 
 /* ----- Kernel only functions below ----- */
 int
@@ -142,6 +144,23 @@ xfs_readlink(
 	return error;
 }
 
+static unsigned int
+xfs_symlink_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		namelen,
+	unsigned int		fsblocks)
+{
+	unsigned int		ret;
+
+	ret = XFS_IALLOC_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp, namelen) +
+			fsblocks;
+
+	if (xfs_has_parent(mp))
+		ret += xfs_pptr_calc_space_res(mp, namelen);
+
+	return ret;
+}
+
 int
 xfs_symlink(
 	struct user_namespace	*mnt_userns,
@@ -172,6 +191,8 @@ xfs_symlink(
 	struct xfs_dquot	*pdqp = NULL;
 	uint			resblks;
 	xfs_ino_t		ino;
+	xfs_dir2_dataptr_t      diroffset;
+	struct xfs_parent_defer *parent;
 
 	*ipp = NULL;
 
@@ -202,18 +223,24 @@ xfs_symlink(
 
 	/*
 	 * The symlink will fit into the inode data fork?
-	 * There can't be any attributes so we get the whole variable part.
+	 * If there are no parent pointers, then there won't be any attributes.
+	 * So we get the whole variable part, and do not need to reserve extra
+	 * blocks.  Otherwise, we need to reserve the blocks.
 	 */
-	if (pathlen <= XFS_LITINO(mp))
+	if (pathlen <= XFS_LITINO(mp) && !xfs_has_parent(mp))
 		fs_blocks = 0;
 	else
 		fs_blocks = xfs_symlink_blocks(mp, pathlen);
-	resblks = XFS_SYMLINK_SPACE_RES(mp, link_name->len, fs_blocks);
+	resblks = xfs_symlink_space_res(mp, link_name->len, fs_blocks);
+
+	error = xfs_parent_start(mp, &parent);
+	if (error)
+		goto out_release_dquots;
 
 	error = xfs_trans_alloc_icreate(mp, &M_RES(mp)->tr_symlink, udqp, gdqp,
 			pdqp, resblks, &tp);
 	if (error)
-		goto out_release_dquots;
+		goto out_parent;
 
 	xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
 	unlock_dp_on_error = true;
@@ -233,7 +260,7 @@ xfs_symlink(
 	if (!error)
 		error = xfs_init_new_inode(mnt_userns, tp, dp, ino,
 				S_IFLNK | (mode & ~S_IFMT), 1, 0, prid,
-				false, &ip);
+				xfs_has_parent(mp), &ip);
 	if (error)
 		goto out_trans_cancel;
 
@@ -244,8 +271,7 @@ xfs_symlink(
 	 * the transaction cancel unlocking dp so don't do it explicitly in the
 	 * error path.
 	 */
-	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
-	unlock_dp_on_error = false;
+	xfs_trans_ijoin(tp, dp, 0);
 
 	/*
 	 * Also attach the dquot(s) to it, if applicable.
@@ -315,12 +341,20 @@ xfs_symlink(
 	 * Create the directory entry for the symlink.
 	 */
 	error = xfs_dir_createname(tp, dp, link_name,
-			ip->i_ino, resblks, NULL);
+			ip->i_ino, resblks, &diroffset);
 	if (error)
 		goto out_trans_cancel;
 	xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
 
+	if (parent) {
+		error = xfs_parent_defer_add(tp, parent, dp, link_name,
+					     diroffset, ip);
+		if (error)
+			goto out_trans_cancel;
+	}
+
+
 	/*
 	 * If this is a synchronous mount, make sure that the
 	 * symlink transaction goes to disk before returning to
@@ -339,6 +373,8 @@ xfs_symlink(
 
 	*ipp = ip;
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+	xfs_parent_finish(mp, parent);
 	return 0;
 
 out_trans_cancel:
@@ -350,9 +386,12 @@ xfs_symlink(
 	 * transactions and deadlocks from xfs_inactive.
 	 */
 	if (ip) {
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 		xfs_finish_inode_setup(ip);
 		xfs_irele(ip);
 	}
+out_parent:
+	xfs_parent_finish(mp, parent);
 out_release_dquots:
 	xfs_qm_dqrele(udqp);
 	xfs_qm_dqrele(gdqp);
@@ -360,8 +399,7 @@ xfs_symlink(
 
 	if (unlock_dp_on_error)
 		xfs_iunlock(dp, XFS_ILOCK_EXCL);
-	if (ip)
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
+
 	return error;
 }
 



* [PATCH 18/28] xfs: remove parent pointers in unlink
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (16 preceding siblings ...)
  2023-02-16 20:37   ` [PATCH 17/28] xfs: add parent attributes to symlink Darrick J. Wong
@ 2023-02-16 20:37   ` Darrick J. Wong
  2023-02-16 20:37   ` [PATCH 19/28] xfs: Indent xfs_rename Darrick J. Wong
                     ` (9 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:37 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch removes the parent pointer attribute during unlink.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c        |    2 +-
 fs/xfs/libxfs/xfs_attr.h        |    1 +
 fs/xfs/libxfs/xfs_parent.c      |   17 ++++++++++++++++
 fs/xfs/libxfs/xfs_parent.h      |    5 +++++
 fs/xfs/libxfs/xfs_trans_space.h |    2 --
 fs/xfs/xfs_inode.c              |   42 +++++++++++++++++++++++++++++++++------
 6 files changed, 59 insertions(+), 10 deletions(-)
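
To show why xfs_parent_defer_remove() takes no name argument, here is a toy
userspace model of a child's parent pointers as a keyed table.  It assumes
the key is (parent inode, generation, directory offset) and the value is the
dirent name, as in the earlier patches of this series.  The types and
functions (child_model, pptr_add, pptr_remove, pptr_find) are hypothetical
and bear no relation to the real xattr code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_MAX_PPTRS	8
#define MODEL_NAME_MAX	255

struct pptr_model {
	uint64_t	parent_ino;
	uint32_t	parent_gen;
	uint32_t	diroffset;
	uint8_t		namelen;
	char		name[MODEL_NAME_MAX];	/* not NUL-terminated */
	int		in_use;
};

struct child_model {
	struct pptr_model	pptrs[MODEL_MAX_PPTRS];
};

/* Look a record up by key only. */
static struct pptr_model *
pptr_find(struct child_model *c, uint64_t ino, uint32_t gen, uint32_t off)
{
	assert(c != NULL);

	for (int i = 0; i < MODEL_MAX_PPTRS; i++) {
		struct pptr_model	*p = &c->pptrs[i];

		if (p->in_use && p->parent_ino == ino &&
		    p->parent_gen == gen && p->diroffset == off)
			return p;
	}
	return NULL;
}

/* Link/create: the key locates the slot, the name is stored as the value. */
static int
pptr_add(struct child_model *c, uint64_t ino, uint32_t gen, uint32_t off,
	 const char *name)
{
	size_t	len = strlen(name);

	if (len == 0 || len > MODEL_NAME_MAX)
		return -EINVAL;
	if (pptr_find(c, ino, gen, off))
		return -EEXIST;

	for (int i = 0; i < MODEL_MAX_PPTRS; i++) {
		struct pptr_model	*p = &c->pptrs[i];

		if (p->in_use)
			continue;
		p->parent_ino = ino;
		p->parent_gen = gen;
		p->diroffset = off;
		p->namelen = (uint8_t)len;
		memcpy(p->name, name, len);
		p->in_use = 1;
		return 0;
	}
	return -ENOSPC;
}

/* Unlink: the key alone identifies the record, so no name is needed. */
static int
pptr_remove(struct child_model *c, uint64_t ino, uint32_t gen, uint32_t off)
{
	struct pptr_model	*p = pptr_find(c, ino, gen, off);

	if (!p)
		return -ENOENT;
	p->in_use = 0;
	return 0;
}
```

Two hard links to the same file from one directory differ only in their
directory offset, so in this model they occupy separate slots and removing
one leaves the other in place.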


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index f68d41f0f998..a8db44728b11 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -946,7 +946,7 @@ xfs_attr_defer_replace(
 }
 
 /* Removes an attribute for an inode as a deferred operation */
-static int
+int
 xfs_attr_defer_remove(
 	struct xfs_da_args	*args)
 {
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 0cf23f5117ad..033005542b9e 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -545,6 +545,7 @@ bool xfs_attr_is_leaf(struct xfs_inode *ip);
 int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_defer_add(struct xfs_da_args *args);
+int xfs_attr_defer_remove(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 6b6d415319e6..245855a5f969 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -115,6 +115,23 @@ xfs_parent_defer_add(
 	return xfs_attr_defer_add(args);
 }
 
+int
+xfs_parent_defer_remove(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	struct xfs_parent_defer	*parent,
+	xfs_dir2_dataptr_t	diroffset,
+	struct xfs_inode	*child)
+{
+	struct xfs_da_args	*args = &parent->args;
+
+	xfs_init_parent_name_rec(&parent->rec, dp, diroffset);
+	args->trans = tp;
+	args->dp = child;
+	args->hashval = xfs_da_hashname(args->name, args->namelen);
+	return xfs_attr_defer_remove(args);
+}
+
 void
 __xfs_parent_cancel(
 	xfs_mount_t		*mp,
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index d5a8c8e52cb5..0f39d033d84e 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -40,6 +40,11 @@ xfs_parent_start(
 int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_parent_defer *parent,
 			 struct xfs_inode *dp, struct xfs_name *parent_name,
 			 xfs_dir2_dataptr_t diroffset, struct xfs_inode *child);
+int xfs_parent_defer_remove(struct xfs_trans *tp, struct xfs_inode *dp,
+			    struct xfs_parent_defer *parent,
+			    xfs_dir2_dataptr_t diroffset,
+			    struct xfs_inode *child);
+
 void __xfs_parent_cancel(struct xfs_mount *mp, struct xfs_parent_defer *parent);
 
 static inline void
diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h
index 25a55650baf4..b5ab6701e7fb 100644
--- a/fs/xfs/libxfs/xfs_trans_space.h
+++ b/fs/xfs/libxfs/xfs_trans_space.h
@@ -91,8 +91,6 @@
 	 XFS_DQUOT_CLUSTER_SIZE_FSB)
 #define	XFS_QM_QINOCREATE_SPACE_RES(mp)	\
 	XFS_IALLOC_SPACE_RES(mp)
-#define	XFS_REMOVE_SPACE_RES(mp)	\
-	XFS_DIRREMOVE_SPACE_RES(mp)
 #define	XFS_RENAME_SPACE_RES(mp,nl)	\
 	(XFS_DIRREMOVE_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl))
 #define XFS_IFREE_SPACE_RES(mp)		\
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index b4318df03b5c..7b34ca2de569 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2477,6 +2477,19 @@ xfs_iunpin_wait(
 		__xfs_iunpin_wait(ip);
 }
 
+static unsigned int
+xfs_remove_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		namelen)
+{
+	unsigned int		ret = XFS_DIRREMOVE_SPACE_RES(mp);
+
+	if (xfs_has_parent(mp))
+		ret += xfs_pptr_calc_space_res(mp, namelen);
+
+	return ret;
+}
+
 /*
  * Removing an inode from the namespace involves removing the directory entry
  * and dropping the link count on the inode. Removing the directory entry can
@@ -2506,16 +2519,18 @@ xfs_iunpin_wait(
  */
 int
 xfs_remove(
-	xfs_inode_t             *dp,
+	struct xfs_inode	*dp,
 	struct xfs_name		*name,
-	xfs_inode_t		*ip)
+	struct xfs_inode	*ip)
 {
-	xfs_mount_t		*mp = dp->i_mount;
-	xfs_trans_t             *tp = NULL;
+	struct xfs_mount	*mp = dp->i_mount;
+	struct xfs_trans	*tp = NULL;
 	int			is_dir = S_ISDIR(VFS_I(ip)->i_mode);
 	int			dontcare;
 	int                     error = 0;
 	uint			resblks;
+	xfs_dir2_dataptr_t	dir_offset;
+	struct xfs_parent_defer	*parent = NULL;
 
 	trace_xfs_remove(dp, name);
 
@@ -2530,6 +2545,10 @@ xfs_remove(
 	if (error)
 		goto std_return;
 
+	error = xfs_parent_start(mp, &parent);
+	if (error)
+		goto std_return;
+
 	/*
 	 * We try to get the real space reservation first, allowing for
 	 * directory btree deletion(s) implying possible bmap insert(s).  If we
@@ -2541,12 +2560,12 @@ xfs_remove(
 	 * the directory code can handle a reservationless update and we don't
 	 * want to prevent a user from trying to free space by deleting things.
 	 */
-	resblks = XFS_REMOVE_SPACE_RES(mp);
+	resblks = xfs_remove_space_res(mp, name->len);
 	error = xfs_trans_alloc_dir(dp, &M_RES(mp)->tr_remove, ip, &resblks,
 			&tp, &dontcare);
 	if (error) {
 		ASSERT(error != -ENOSPC);
-		goto std_return;
+		goto out_parent;
 	}
 
 	/*
@@ -2600,12 +2619,18 @@ xfs_remove(
 	if (error)
 		goto out_trans_cancel;
 
-	error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks, NULL);
+	error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks, &dir_offset);
 	if (error) {
 		ASSERT(error != -ENOENT);
 		goto out_trans_cancel;
 	}
 
+	if (parent) {
+		error = xfs_parent_defer_remove(tp, dp, parent, dir_offset, ip);
+		if (error)
+			goto out_trans_cancel;
+	}
+
 	/*
 	 * If this is a synchronous mount, make sure that the
 	 * remove transaction goes to disk before returning to
@@ -2623,6 +2648,7 @@ xfs_remove(
 
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+	xfs_parent_finish(mp, parent);
 	return 0;
 
  out_trans_cancel:
@@ -2630,6 +2656,8 @@ xfs_remove(
  out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+ out_parent:
+	xfs_parent_finish(mp, parent);
  std_return:
 	return error;
 }



* [PATCH 19/28] xfs: Indent xfs_rename
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (17 preceding siblings ...)
  2023-02-16 20:37   ` [PATCH 18/28] xfs: remove parent pointers in unlink Darrick J. Wong
@ 2023-02-16 20:37   ` Darrick J. Wong
  2023-02-16 20:37   ` [PATCH 20/28] xfs: Add parent pointers to rename Darrick J. Wong
                     ` (8 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:37 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Indent variables and parameters in xfs_rename in preparation for
parent pointer modifications.  Whitespace only, no functional
changes.  This makes the new code in later patches easier to review.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c |   39 ++++++++++++++++++++-------------------
 1 file changed, 20 insertions(+), 19 deletions(-)


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 7b34ca2de569..2d8f225cb57d 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2902,26 +2902,27 @@ xfs_rename_alloc_whiteout(
  */
 int
 xfs_rename(
-	struct user_namespace	*mnt_userns,
-	struct xfs_inode	*src_dp,
-	struct xfs_name		*src_name,
-	struct xfs_inode	*src_ip,
-	struct xfs_inode	*target_dp,
-	struct xfs_name		*target_name,
-	struct xfs_inode	*target_ip,
-	unsigned int		flags)
+	struct user_namespace		*mnt_userns,
+	struct xfs_inode		*src_dp,
+	struct xfs_name			*src_name,
+	struct xfs_inode		*src_ip,
+	struct xfs_inode		*target_dp,
+	struct xfs_name			*target_name,
+	struct xfs_inode		*target_ip,
+	unsigned int			flags)
 {
-	struct xfs_mount	*mp = src_dp->i_mount;
-	struct xfs_trans	*tp;
-	struct xfs_inode	*wip = NULL;		/* whiteout inode */
-	struct xfs_inode	*inodes[__XFS_SORT_INODES];
-	int			i;
-	int			num_inodes = __XFS_SORT_INODES;
-	bool			new_parent = (src_dp != target_dp);
-	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
-	int			spaceres;
-	bool			retried = false;
-	int			error, nospace_error = 0;
+	struct xfs_mount		*mp = src_dp->i_mount;
+	struct xfs_trans		*tp;
+	struct xfs_inode		*wip = NULL;	/* whiteout inode */
+	struct xfs_inode		*inodes[__XFS_SORT_INODES];
+	int				i;
+	int				num_inodes = __XFS_SORT_INODES;
+	bool				new_parent = (src_dp != target_dp);
+	bool				src_is_directory =
+						S_ISDIR(VFS_I(src_ip)->i_mode);
+	int				spaceres;
+	bool				retried = false;
+	int				error, nospace_error = 0;
 
 	trace_xfs_rename(src_dp, target_dp, src_name, target_name);
 



* [PATCH 20/28] xfs: Add parent pointers to rename
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (18 preceding siblings ...)
  2023-02-16 20:37   ` [PATCH 19/28] xfs: Indent xfs_rename Darrick J. Wong
@ 2023-02-16 20:37   ` Darrick J. Wong
  2023-02-16 20:38   ` [PATCH 21/28] xfs: Add parent pointers to xfs_cross_rename Darrick J. Wong
                     ` (7 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:37 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch removes the old parent pointer attribute during the rename
operation, and re-adds the updated parent pointer.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
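Not part of the commit: a small sketch of how the block reservation in
xfs_rename_space_res() below is assembled.  The cost function pptr_res()
is a made-up toy model (assumption), and dir_res stands in for the sum
of the directory remove and enter reservations; only the way the terms
are combined follows the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy cost model (assumption): blocks for one parent pointer xattr update. */
static unsigned int pptr_res(unsigned int namelen)
{
	return 2 + (namelen + 63) / 64;
}

/*
 * dir_res:         reservation for the directory entry changes themselves
 * src_has_pptr:    the renamed inode carries a parent pointer to replace
 * target_has_pptr: an existing target inode loses a parent pointer
 * have_whiteout:   a whiteout inode gains a pointer at the source name
 */
static unsigned int rename_space_res(unsigned int dir_res,
				     unsigned int src_namelen,
				     unsigned int target_namelen,
				     bool src_has_pptr,
				     bool target_has_pptr,
				     bool have_whiteout)
{
	unsigned int	ret = dir_res;

	assert(src_namelen > 0 && target_namelen > 0);

	if (src_has_pptr) {
		if (have_whiteout)
			ret += pptr_res(src_namelen);
		/* A replace is charged as one removal plus one addition. */
		ret += 2 * pptr_res(target_namelen);
	}
	if (target_has_pptr)
		ret += pptr_res(target_namelen);

	return ret;
}
```

With parent pointers disabled both flags are false and the result is
just dir_res, matching the old XFS_RENAME_SPACE_RES behaviour.
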
 fs/xfs/libxfs/xfs_attr.c        |    2 -
 fs/xfs/libxfs/xfs_attr.h        |    1 
 fs/xfs/libxfs/xfs_parent.c      |   47 ++++++++++++++--
 fs/xfs/libxfs/xfs_parent.h      |   24 +++++++-
 fs/xfs/libxfs/xfs_trans_space.h |    2 -
 fs/xfs/xfs_inode.c              |  117 ++++++++++++++++++++++++++++++++++++---
 6 files changed, 174 insertions(+), 19 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index a8db44728b11..57080ea4c869 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -923,7 +923,7 @@ xfs_attr_defer_add(
 }
 
 /* Sets an attribute for an inode as a deferred operation */
-static int
+int
 xfs_attr_defer_replace(
 	struct xfs_da_args	*args)
 {
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 033005542b9e..985761264d1f 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -546,6 +546,7 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_defer_add(struct xfs_da_args *args);
 int xfs_attr_defer_remove(struct xfs_da_args *args);
+int xfs_attr_defer_replace(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 245855a5f969..629762701952 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -64,22 +64,27 @@ xfs_init_parent_name_rec(
 int
 __xfs_parent_init(
 	struct xfs_mount		*mp,
+	bool				grab_log,
 	struct xfs_parent_defer		**parentp)
 {
 	struct xfs_parent_defer		*parent;
 	int				error;
 
-	error = xfs_attr_grab_log_assist(mp);
-	if (error)
-		return error;
+	if (grab_log) {
+		error = xfs_attr_grab_log_assist(mp);
+		if (error)
+			return error;
+	}
 
 	parent = kmem_cache_zalloc(xfs_parent_intent_cache, GFP_KERNEL);
 	if (!parent) {
-		xfs_attr_rele_log_assist(mp);
+		if (grab_log)
+			xfs_attr_rele_log_assist(mp);
 		return -ENOMEM;
 	}
 
 	/* init parent da_args */
+	parent->have_log = grab_log;
 	parent->args.geo = mp->m_attr_geo;
 	parent->args.whichfork = XFS_ATTR_FORK;
 	parent->args.attr_filter = XFS_ATTR_PARENT;
@@ -132,12 +137,44 @@ xfs_parent_defer_remove(
 	return xfs_attr_defer_remove(args);
 }
 
+
+int
+xfs_parent_defer_replace(
+	struct xfs_trans	*tp,
+	struct xfs_parent_defer	*new_parent,
+	struct xfs_inode	*old_dp,
+	xfs_dir2_dataptr_t	old_diroffset,
+	struct xfs_name		*parent_name,
+	struct xfs_inode	*new_dp,
+	xfs_dir2_dataptr_t	new_diroffset,
+	struct xfs_inode	*child)
+{
+	struct xfs_da_args	*args = &new_parent->args;
+
+	xfs_init_parent_name_rec(&new_parent->old_rec, old_dp, old_diroffset);
+	xfs_init_parent_name_rec(&new_parent->rec, new_dp, new_diroffset);
+	new_parent->args.name = (const uint8_t *)&new_parent->old_rec;
+	new_parent->args.namelen = sizeof(struct xfs_parent_name_rec);
+	new_parent->args.new_name = (const uint8_t *)&new_parent->rec;
+	new_parent->args.new_namelen = sizeof(struct xfs_parent_name_rec);
+	args->trans = tp;
+	args->dp = child;
+
+	ASSERT(parent_name != NULL);
+	new_parent->args.value = (void *)parent_name->name;
+	new_parent->args.valuelen = parent_name->len;
+
+	args->hashval = xfs_da_hashname(args->name, args->namelen);
+	return xfs_attr_defer_replace(args);
+}
+
 void
 __xfs_parent_cancel(
 	xfs_mount_t		*mp,
 	struct xfs_parent_defer *parent)
 {
-	xlog_drop_incompat_feat(mp->m_log);
+	if (parent->have_log)
+		xlog_drop_incompat_feat(mp->m_log);
 	kmem_cache_free(xfs_parent_intent_cache, parent);
 }
 
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 0f39d033d84e..039005883bb6 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -14,7 +14,9 @@ extern struct kmem_cache	*xfs_parent_intent_cache;
  */
 struct xfs_parent_defer {
 	struct xfs_parent_name_rec	rec;
+	struct xfs_parent_name_rec	old_rec;
 	struct xfs_da_args		args;
+	bool				have_log;
 };
 
 /*
@@ -23,7 +25,8 @@ struct xfs_parent_defer {
 void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
 			      struct xfs_inode *ip,
 			      uint32_t p_diroffset);
-int __xfs_parent_init(struct xfs_mount *mp, struct xfs_parent_defer **parentp);
+int __xfs_parent_init(struct xfs_mount *mp, bool grab_log,
+		struct xfs_parent_defer **parentp);
 
 static inline int
 xfs_parent_start(
@@ -33,13 +36,30 @@ xfs_parent_start(
 	*pp = NULL;
 
 	if (xfs_has_parent(mp))
-		return __xfs_parent_init(mp, pp);
+		return __xfs_parent_init(mp, true, pp);
+	return 0;
+}
+
+static inline int
+xfs_parent_start_locked(
+	struct xfs_mount	*mp,
+	struct xfs_parent_defer	**pp)
+{
+	*pp = NULL;
+
+	if (xfs_has_parent(mp))
+		return __xfs_parent_init(mp, false, pp);
 	return 0;
 }
 
 int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_parent_defer *parent,
 			 struct xfs_inode *dp, struct xfs_name *parent_name,
 			 xfs_dir2_dataptr_t diroffset, struct xfs_inode *child);
+int xfs_parent_defer_replace(struct xfs_trans *tp,
+		struct xfs_parent_defer *new_parent, struct xfs_inode *old_dp,
+		xfs_dir2_dataptr_t old_diroffset, struct xfs_name *parent_name,
+		struct xfs_inode *new_ip, xfs_dir2_dataptr_t new_diroffset,
+		struct xfs_inode *child);
 int xfs_parent_defer_remove(struct xfs_trans *tp, struct xfs_inode *dp,
 			    struct xfs_parent_defer *parent,
 			    xfs_dir2_dataptr_t diroffset,
diff --git a/fs/xfs/libxfs/xfs_trans_space.h b/fs/xfs/libxfs/xfs_trans_space.h
index b5ab6701e7fb..810610a14c4d 100644
--- a/fs/xfs/libxfs/xfs_trans_space.h
+++ b/fs/xfs/libxfs/xfs_trans_space.h
@@ -91,8 +91,6 @@
 	 XFS_DQUOT_CLUSTER_SIZE_FSB)
 #define	XFS_QM_QINOCREATE_SPACE_RES(mp)	\
 	XFS_IALLOC_SPACE_RES(mp)
-#define	XFS_RENAME_SPACE_RES(mp,nl)	\
-	(XFS_DIRREMOVE_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl))
 #define XFS_IFREE_SPACE_RES(mp)		\
 	(xfs_has_finobt(mp) ? M_IGEO(mp)->inobt_maxlevels : 0)
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 2d8f225cb57d..cdbd7df64ff0 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2871,7 +2871,7 @@ xfs_rename_alloc_whiteout(
 	int			error;
 
 	error = xfs_create_tmpfile(mnt_userns, dp, S_IFCHR | WHITEOUT_MODE,
-				   false, &tmpfile);
+				   xfs_has_parent(dp->i_mount), &tmpfile);
 	if (error)
 		return error;
 
@@ -2897,6 +2897,31 @@ xfs_rename_alloc_whiteout(
 	return 0;
 }
 
+static unsigned int
+xfs_rename_space_res(
+	struct xfs_mount	*mp,
+	struct xfs_name		*src_name,
+	struct xfs_parent_defer	*target_parent_ptr,
+	struct xfs_name		*target_name,
+	struct xfs_parent_defer	*new_parent_ptr,
+	struct xfs_inode	*wip)
+{
+	unsigned int		ret;
+
+	ret = XFS_DIRREMOVE_SPACE_RES(mp) +
+			XFS_DIRENTER_SPACE_RES(mp, target_name->len);
+
+	if (new_parent_ptr) {
+		if (wip)
+			ret += xfs_pptr_calc_space_res(mp, src_name->len);
+		ret += 2 * xfs_pptr_calc_space_res(mp, target_name->len);
+	}
+	if (target_parent_ptr)
+		ret += xfs_pptr_calc_space_res(mp, target_name->len);
+
+	return ret;
+}
+
 /*
  * xfs_rename
  */
@@ -2923,6 +2948,11 @@ xfs_rename(
 	int				spaceres;
 	bool				retried = false;
 	int				error, nospace_error = 0;
+	xfs_dir2_dataptr_t		new_diroffset;
+	xfs_dir2_dataptr_t		old_diroffset;
+	struct xfs_parent_defer		*src_ip_pptr = NULL;
+	struct xfs_parent_defer		*tgt_ip_pptr = NULL;
+	struct xfs_parent_defer		*wip_pptr = NULL;
 
 	trace_xfs_rename(src_dp, target_dp, src_name, target_name);
 
@@ -2947,9 +2977,26 @@ xfs_rename(
 	xfs_sort_for_rename(src_dp, target_dp, src_ip, target_ip, wip,
 				inodes, &num_inodes);
 
+	error = xfs_parent_start(mp, &src_ip_pptr);
+	if (error)
+		goto out_release_wip;
+
+	if (wip) {
+		error = xfs_parent_start_locked(mp, &wip_pptr);
+		if (error)
+			goto out_src_ip_pptr;
+	}
+
+	if (target_ip) {
+		error = xfs_parent_start_locked(mp, &tgt_ip_pptr);
+		if (error)
+			goto out_wip_pptr;
+	}
+
 retry:
 	nospace_error = 0;
-	spaceres = XFS_RENAME_SPACE_RES(mp, target_name->len);
+	spaceres = xfs_rename_space_res(mp, src_name, tgt_ip_pptr,
+			target_name, src_ip_pptr, wip);
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_rename, spaceres, 0, 0, &tp);
 	if (error == -ENOSPC) {
 		nospace_error = error;
@@ -2958,14 +3005,26 @@ xfs_rename(
 				&tp);
 	}
 	if (error)
-		goto out_release_wip;
+		goto out_tgt_ip_pptr;
+
+	/*
+	 * We don't allow reservationless renaming when parent pointers are
+	 * enabled because we can't back out if the xattrs must grow.
+	 */
+	if (src_ip_pptr && nospace_error) {
+		error = nospace_error;
+		xfs_trans_cancel(tp);
+		goto out_tgt_ip_pptr;
+	}
 
 	/*
 	 * Attach the dquots to the inodes
 	 */
 	error = xfs_qm_vop_rename_dqattach(inodes);
-	if (error)
-		goto out_trans_cancel;
+	if (error) {
+		xfs_trans_cancel(tp);
+		goto out_tgt_ip_pptr;
+	}
 
 	/*
 	 * Lock all the participating inodes. Depending upon whether
@@ -3032,6 +3091,15 @@ xfs_rename(
 			goto out_trans_cancel;
 	}
 
+	/*
+	 * We don't allow quotaless renaming when parent pointers are enabled
+	 * because we can't back out if the xattrs must grow.
+	 */
+	if (src_ip_pptr && nospace_error) {
+		error = nospace_error;
+		goto out_trans_cancel;
+	}
+
 	/*
 	 * Check for expected errors before we dirty the transaction
 	 * so we can return an error without a transaction abort.
@@ -3122,7 +3190,7 @@ xfs_rename(
 		 * to account for the ".." reference from the new entry.
 		 */
 		error = xfs_dir_createname(tp, target_dp, target_name,
-					   src_ip->i_ino, spaceres, NULL);
+					   src_ip->i_ino, spaceres, &new_diroffset);
 		if (error)
 			goto out_trans_cancel;
 
@@ -3143,7 +3211,7 @@ xfs_rename(
 		 * name at the destination directory, remove it first.
 		 */
 		error = xfs_dir_replace(tp, target_dp, target_name,
-					src_ip->i_ino, spaceres, NULL);
+					src_ip->i_ino, spaceres, &new_diroffset);
 		if (error)
 			goto out_trans_cancel;
 
@@ -3216,14 +3284,38 @@ xfs_rename(
 	 */
 	if (wip)
 		error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino,
-					spaceres, NULL);
+					spaceres, &old_diroffset);
 	else
 		error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
-					   spaceres, NULL);
+					   spaceres, &old_diroffset);
 
 	if (error)
 		goto out_trans_cancel;
 
+	if (wip_pptr) {
+		error = xfs_parent_defer_add(tp, wip_pptr,
+					     src_dp, src_name,
+					     old_diroffset, wip);
+		if (error)
+			goto out_trans_cancel;
+	}
+
+	if (src_ip_pptr) {
+		error = xfs_parent_defer_replace(tp, src_ip_pptr, src_dp,
+				old_diroffset, target_name, target_dp,
+				new_diroffset, src_ip);
+		if (error)
+			goto out_trans_cancel;
+	}
+
+	if (tgt_ip_pptr) {
+		error = xfs_parent_defer_remove(tp, target_dp,
+						tgt_ip_pptr,
+						new_diroffset, target_ip);
+		if (error)
+			goto out_trans_cancel;
+	}
+
 	xfs_trans_ichgtime(tp, src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE);
 	if (new_parent)
@@ -3237,6 +3329,13 @@ xfs_rename(
 	xfs_trans_cancel(tp);
 out_unlock:
 	xfs_iunlock_rename(inodes, num_inodes);
+out_tgt_ip_pptr:
+	xfs_parent_finish(mp, tgt_ip_pptr);
+out_wip_pptr:
+	xfs_parent_finish(mp, wip_pptr);
+out_src_ip_pptr:
+	xfs_parent_finish(mp, src_ip_pptr);
+
 out_release_wip:
 	if (wip)
 		xfs_irele(wip);



* [PATCH 21/28] xfs: Add parent pointers to xfs_cross_rename
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (19 preceding siblings ...)
  2023-02-16 20:37   ` [PATCH 20/28] xfs: Add parent pointers to rename Darrick J. Wong
@ 2023-02-16 20:38   ` Darrick J. Wong
  2023-02-16 20:38   ` [PATCH 22/28] xfs: Add the parent pointer support to the superblock version 5 Darrick J. Wong
                     ` (6 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:38 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Cross renames are handled separately from standard renames, and
need their own logic to update the parent pointer attributes correctly.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
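Not part of the commit: a userspace sketch of what the two
xfs_parent_defer_replace() calls below amount to.  The struct pptr
layout and the helpers are hypothetical simplifications (the inode
generation is left out, and the pointer is stored as a plain struct
rather than an xattr).  Because xfs_dir_replace() reuses the existing
directory slots, each child simply takes over the other child's
(parent, offset, name) triple.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified parent pointer: key fields plus the dirent name. */
struct pptr {
	uint64_t	parent_ino;
	uint32_t	diroffset;
	char		name[256];
};

/* Models a replace: the old record must match before it is swapped out. */
static int pptr_replace(struct pptr *child,
			uint64_t old_ino, uint32_t old_off,
			uint64_t new_ino, uint32_t new_off,
			const char *new_name)
{
	if (child->parent_ino != old_ino || child->diroffset != old_off)
		return -1;		/* old record not found */
	if (strlen(new_name) >= sizeof(child->name))
		return -1;		/* name does not fit */

	child->parent_ino = new_ino;
	child->diroffset = new_off;
	strcpy(child->name, new_name);
	return 0;
}

/* Exchange the parent pointers of two children, as a cross rename would. */
static int cross_rename_pptrs(struct pptr *ip1, struct pptr *ip2)
{
	struct pptr	old1 = *ip1;
	struct pptr	old2 = *ip2;
	int		error;

	assert(ip1 != ip2);

	error = pptr_replace(ip1, old1.parent_ino, old1.diroffset,
			     old2.parent_ino, old2.diroffset, old2.name);
	if (error)
		return error;

	return pptr_replace(ip2, old2.parent_ino, old2.diroffset,
			    old1.parent_ino, old1.diroffset, old1.name);
}
```

Note the symmetry: the "old" arguments of the second call are the "new"
arguments of the first, which is the same pattern the hunk in
xfs_cross_rename() uses for ip1 and ip2.
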
 fs/xfs/xfs_inode.c |   49 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 33 insertions(+), 16 deletions(-)


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index cdbd7df64ff0..6626aa7486f1 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2749,27 +2749,31 @@ xfs_finish_rename(
  */
 STATIC int
 xfs_cross_rename(
-	struct xfs_trans	*tp,
-	struct xfs_inode	*dp1,
-	struct xfs_name		*name1,
-	struct xfs_inode	*ip1,
-	struct xfs_inode	*dp2,
-	struct xfs_name		*name2,
-	struct xfs_inode	*ip2,
-	int			spaceres)
+	struct xfs_trans		*tp,
+	struct xfs_inode		*dp1,
+	struct xfs_name			*name1,
+	struct xfs_inode		*ip1,
+	struct xfs_parent_defer		*ip1_pptr,
+	struct xfs_inode		*dp2,
+	struct xfs_name			*name2,
+	struct xfs_inode		*ip2,
+	struct xfs_parent_defer		*ip2_pptr,
+	int				spaceres)
 {
-	int		error = 0;
-	int		ip1_flags = 0;
-	int		ip2_flags = 0;
-	int		dp2_flags = 0;
+	struct xfs_mount		*mp = dp1->i_mount;
+	int				error = 0;
+	int				ip1_flags = 0;
+	int				ip2_flags = 0;
+	int				dp2_flags = 0;
+	xfs_dir2_dataptr_t		new_diroffset, old_diroffset;
 
 	/* Swap inode number for dirent in first parent */
-	error = xfs_dir_replace(tp, dp1, name1, ip2->i_ino, spaceres, NULL);
+	error = xfs_dir_replace(tp, dp1, name1, ip2->i_ino, spaceres, &old_diroffset);
 	if (error)
 		goto out_trans_abort;
 
 	/* Swap inode number for dirent in second parent */
-	error = xfs_dir_replace(tp, dp2, name2, ip1->i_ino, spaceres, NULL);
+	error = xfs_dir_replace(tp, dp2, name2, ip1->i_ino, spaceres, &new_diroffset);
 	if (error)
 		goto out_trans_abort;
 
@@ -2830,6 +2834,18 @@ xfs_cross_rename(
 		}
 	}
 
+	if (xfs_has_parent(mp)) {
+		error = xfs_parent_defer_replace(tp, ip1_pptr, dp1,
+				old_diroffset, name2, dp2, new_diroffset, ip1);
+		if (error)
+			goto out_trans_abort;
+
+		error = xfs_parent_defer_replace(tp, ip2_pptr, dp2,
+				new_diroffset, name1, dp1, old_diroffset, ip2);
+		if (error)
+			goto out_trans_abort;
+	}
+
 	if (ip1_flags) {
 		xfs_trans_ichgtime(tp, ip1, ip1_flags);
 		xfs_trans_log_inode(tp, ip1, XFS_ILOG_CORE);
@@ -2844,6 +2860,7 @@ xfs_cross_rename(
 	}
 	xfs_trans_ichgtime(tp, dp1, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, dp1, XFS_ILOG_CORE);
+
 	return xfs_finish_rename(tp);
 
 out_trans_abort:
@@ -3060,8 +3077,8 @@ xfs_rename(
 	/* RENAME_EXCHANGE is unique from here on. */
 	if (flags & RENAME_EXCHANGE) {
 		error = xfs_cross_rename(tp, src_dp, src_name, src_ip,
-					target_dp, target_name, target_ip,
-					spaceres);
+				src_ip_pptr, target_dp, target_name, target_ip,
+				tgt_ip_pptr, spaceres);
 		goto out_unlock;
 	}
 



* [PATCH 22/28] xfs: Add the parent pointer support to the superblock version 5
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (20 preceding siblings ...)
  2023-02-16 20:38   ` [PATCH 21/28] xfs: Add parent pointers to xfs_cross_rename Darrick J. Wong
@ 2023-02-16 20:38   ` Darrick J. Wong
  2023-02-16 20:38   ` [PATCH 23/28] xfs: Add helper function xfs_attr_list_context_init Darrick J. Wong
                     ` (5 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:38 UTC (permalink / raw)
  To: djwong
  Cc: Mark Tinguely, Dave Chinner, Allison Henderson, Darrick J. Wong,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
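Not part of the commit: a sketch of why the new bit has to be added to
the "all known features" mask as well as defined.  Only the four
incompat bits visible in the hunk below are modelled, the names are
shortened, and can_mount() is a hypothetical helper, so treat this as an
illustration of the mask check rather than the real mount path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values as they appear in the xfs_format.h hunk of this patch. */
#define FEAT_INCOMPAT_BIGTIME		(1u << 3)
#define FEAT_INCOMPAT_NEEDSREPAIR	(1u << 4)
#define FEAT_INCOMPAT_NREXT64		(1u << 5)
#define FEAT_INCOMPAT_PARENT		(1u << 6)

/* Mask of features understood before this patch (lower bits omitted). */
#define FEAT_INCOMPAT_OLD_KNOWN \
	(FEAT_INCOMPAT_BIGTIME | FEAT_INCOMPAT_NEEDSREPAIR | \
	 FEAT_INCOMPAT_NREXT64)

/* Extend a known-features mask with the parent pointer bit. */
static uint32_t known_with_parent(uint32_t known)
{
	assert(!(known & FEAT_INCOMPAT_PARENT));
	return known | FEAT_INCOMPAT_PARENT;
}

/* A filesystem is mountable only if it sets no unknown incompat bits. */
static bool can_mount(uint32_t sb_features_incompat, uint32_t known)
{
	return (sb_features_incompat & ~known) == 0;
}
```

A superblock with the parent bit set is rejected against the old mask
and accepted against the extended one, which is the compatibility
behaviour an incompat flag is meant to give.
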
 fs/xfs/libxfs/xfs_format.h |    4 +++-
 fs/xfs/libxfs/xfs_fs.h     |    1 +
 fs/xfs/libxfs/xfs_sb.c     |    4 ++++
 fs/xfs/xfs_super.c         |    4 ++++
 4 files changed, 12 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index 371dc07233e0..f413819b2a8a 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -373,13 +373,15 @@ xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_BIGTIME	(1 << 3)	/* large timestamps */
 #define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4)	/* needs xfs_repair */
 #define XFS_SB_FEAT_INCOMPAT_NREXT64	(1 << 5)	/* large extent counters */
+#define XFS_SB_FEAT_INCOMPAT_PARENT	(1 << 6)	/* parent pointers */
 #define XFS_SB_FEAT_INCOMPAT_ALL \
 		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
 		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
 		 XFS_SB_FEAT_INCOMPAT_META_UUID| \
 		 XFS_SB_FEAT_INCOMPAT_BIGTIME| \
 		 XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR| \
-		 XFS_SB_FEAT_INCOMPAT_NREXT64)
+		 XFS_SB_FEAT_INCOMPAT_NREXT64| \
+		 XFS_SB_FEAT_INCOMPAT_PARENT)
 
 #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
 static inline bool
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 1cfd5bc6520a..b0b4d7a3aa15 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -237,6 +237,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_BIGTIME	(1 << 21) /* 64-bit nsec timestamps */
 #define XFS_FSOP_GEOM_FLAGS_INOBTCNT	(1 << 22) /* inobt btree counter */
 #define XFS_FSOP_GEOM_FLAGS_NREXT64	(1 << 23) /* large extent counters */
+#define XFS_FSOP_GEOM_FLAGS_PARENT	(1 << 24) /* parent pointers 	    */
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index 1eeecf2eb2a7..a59bf09495b1 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -173,6 +173,8 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_NEEDSREPAIR;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NREXT64)
 		features |= XFS_FEAT_NREXT64;
+	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_PARENT)
+		features |= XFS_FEAT_PARENT;
 
 	return features;
 }
@@ -1189,6 +1191,8 @@ xfs_fs_geometry(
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_BIGTIME;
 	if (xfs_has_inobtcounts(mp))
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_INOBTCNT;
+	if (xfs_has_parent(mp))
+		geo->flags |= XFS_FSOP_GEOM_FLAGS_PARENT;
 	if (xfs_has_sector(mp)) {
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_SECTOR;
 		geo->logsectsize = sbp->sb_logsectsize;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 6795761c31e0..0ac55d191f1f 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1664,6 +1664,10 @@ xfs_fs_fill_super(
 		xfs_warn(mp,
 	"EXPERIMENTAL Large extent counts feature in use. Use at your own risk!");
 
+	if (xfs_has_parent(mp))
+		xfs_alert(mp,
+	"EXPERIMENTAL parent pointer feature enabled. Use at your own risk!");
+
 	error = xfs_mountfs(mp);
 	if (error)
 		goto out_filestream_unmount;



* [PATCH 23/28] xfs: Add helper function xfs_attr_list_context_init
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (21 preceding siblings ...)
  2023-02-16 20:38   ` [PATCH 22/28] xfs: Add the parent pointer support to the superblock version 5 Darrick J. Wong
@ 2023-02-16 20:38   ` Darrick J. Wong
  2023-02-16 20:38   ` [PATCH 24/28] xfs: Filter XFS_ATTR_PARENT for getfattr Darrick J. Wong
                     ` (4 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:38 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch adds a helper function, xfs_ioc_attr_list_context_init, used
by xfs_ioc_attr_list.  The helper initializes the xfs_attr_list_context
structure that is passed to xfs_attr_list.  We will need this later to
call xfs_attr_list_ilocked when the inode is already locked.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
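Not part of the commit: a userspace sketch of the buffer setup that the
new helper factors out.  The struct definitions are cut-down stand-ins
(assumption) holding only the fields the sketch touches, and
round_down_pow2() is a local replacement for the kernel's round_down().
The caller is assumed to pass a buffer aligned for 32-bit access.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the list header that sits at the start of the buffer. */
struct attrlist {
	int32_t		al_count;	/* number of entries returned */
	int32_t		al_more;	/* nonzero if more attrs remain */
	int32_t		al_offset[1];	/* byte offsets of entries */
};

/* Stand-in for the fields of xfs_attr_list_context used here. */
struct list_context {
	char		*buffer;
	int		bufsize;
	int		firstu;		/* first used byte; entries grow down */
};

/* Round value down to a multiple of align, which must be a power of two. */
static int round_down_pow2(int value, int align)
{
	return value & ~(align - 1);
}

static int list_context_init(struct list_context *ctx, char *buffer,
			     int bufsize)
{
	struct attrlist	*alist;

	assert(bufsize >= (int)sizeof(struct attrlist));

	ctx->buffer = buffer;
	ctx->bufsize = round_down_pow2(bufsize, (int)sizeof(uint32_t));
	ctx->firstu = ctx->bufsize;

	/* Empty list: no entries yet, names will be packed from the end. */
	alist = (struct attrlist *)ctx->buffer;
	alist->al_count = 0;
	alist->al_more = 0;
	alist->al_offset[0] = ctx->bufsize;

	return 0;
}
```

Keeping this in one function means a second caller that already holds
the inode lock can prepare an identical context without duplicating the
field-by-field setup.
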
 fs/xfs/xfs_file.c  |    1 +
 fs/xfs/xfs_ioctl.c |   54 +++++++++++++++++++++++++++++++++++++---------------
 fs/xfs/xfs_ioctl.h |    2 ++
 3 files changed, 41 insertions(+), 16 deletions(-)


diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 595a5bcf46b9..9c09d32a6c9e 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -17,6 +17,7 @@
 #include "xfs_bmap_util.h"
 #include "xfs_dir2.h"
 #include "xfs_dir2_priv.h"
+#include "xfs_attr.h"
 #include "xfs_ioctl.h"
 #include "xfs_trace.h"
 #include "xfs_log.h"
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 736510bc241b..5cd5154d4d1e 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -369,6 +369,40 @@ xfs_attr_flags(
 	return 0;
 }
 
+/*
+ * Initializes an xfs_attr_list_context suitable for
+ * use by xfs_attr_list
+ */
+int
+xfs_ioc_attr_list_context_init(
+	struct xfs_inode		*dp,
+	char				*buffer,
+	int				bufsize,
+	int				flags,
+	struct xfs_attr_list_context	*context)
+{
+	struct xfs_attrlist		*alist;
+
+	/*
+	 * Initialize the output buffer.
+	 */
+	context->dp = dp;
+	context->resynch = 1;
+	context->attr_filter = xfs_attr_filter(flags);
+	context->buffer = buffer;
+	context->bufsize = round_down(bufsize, sizeof(uint32_t));
+	context->firstu = context->bufsize;
+	context->put_listent = xfs_ioc_attr_put_listent;
+
+	alist = context->buffer;
+	alist->al_count = 0;
+	alist->al_more = 0;
+	alist->al_offset[0] = context->bufsize;
+
+	return 0;
+}
+
+
 int
 xfs_ioc_attr_list(
 	struct xfs_inode		*dp,
@@ -378,7 +412,6 @@ xfs_ioc_attr_list(
 	struct xfs_attrlist_cursor __user *ucursor)
 {
 	struct xfs_attr_list_context	context = { };
-	struct xfs_attrlist		*alist;
 	void				*buffer;
 	int				error;
 
@@ -410,21 +443,10 @@ xfs_ioc_attr_list(
 	if (!buffer)
 		return -ENOMEM;
 
-	/*
-	 * Initialize the output buffer.
-	 */
-	context.dp = dp;
-	context.resynch = 1;
-	context.attr_filter = xfs_attr_filter(flags);
-	context.buffer = buffer;
-	context.bufsize = round_down(bufsize, sizeof(uint32_t));
-	context.firstu = context.bufsize;
-	context.put_listent = xfs_ioc_attr_put_listent;
-
-	alist = context.buffer;
-	alist->al_count = 0;
-	alist->al_more = 0;
-	alist->al_offset[0] = context.bufsize;
+	error = xfs_ioc_attr_list_context_init(dp, buffer, bufsize, flags,
+			&context);
+	if (error)
+		return error;
 
 	error = xfs_attr_list(&context);
 	if (error)
diff --git a/fs/xfs/xfs_ioctl.h b/fs/xfs/xfs_ioctl.h
index d4abba2c13c1..ca60e1c427a3 100644
--- a/fs/xfs/xfs_ioctl.h
+++ b/fs/xfs/xfs_ioctl.h
@@ -35,6 +35,8 @@ int xfs_ioc_attrmulti_one(struct file *parfilp, struct inode *inode,
 int xfs_ioc_attr_list(struct xfs_inode *dp, void __user *ubuf,
 		      size_t bufsize, int flags,
 		      struct xfs_attrlist_cursor __user *ucursor);
+int xfs_ioc_attr_list_context_init(struct xfs_inode *dp, char *buffer,
+		int bufsize, int flags, struct xfs_attr_list_context *context);
 
 extern struct dentry *
 xfs_handle_to_dentry(



* [PATCH 24/28] xfs: Filter XFS_ATTR_PARENT for getfattr
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (22 preceding siblings ...)
  2023-02-16 20:38   ` [PATCH 23/28] xfs: Add helper function xfs_attr_list_context_init Darrick J. Wong
@ 2023-02-16 20:38   ` Darrick J. Wong
  2023-02-16 20:39   ` [PATCH 25/28] xfs: Add parent pointer ioctl Darrick J. Wong
                     ` (3 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:38 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Parent pointers returned to the getfattr tool cause errors because
the tool cannot parse them.  Fix this by filtering parent pointers
out in xfs_xattr_put_listent.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/xfs_xattr.c |    3 +++
 1 file changed, 3 insertions(+)
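
The filtering idea can be sketched in userspace as follows.  This is a
minimal illustration, not the kernel code: the DEMO_* flag values, the
demo_list_ctx structure, and both functions are hypothetical stand-ins.
The listing callback returns early for any entry flagged as a parent
pointer, so that entry never reaches the caller's buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_PARENT	(1u << 3)

struct demo_list_ctx {
	int	count;		/* entries reported to the caller */
};

/* Called once per xattr; parent pointer entries are silently skipped. */
static void demo_put_listent(struct demo_list_ctx *ctx, unsigned int flags)
{
	assert(ctx != NULL);

	if (flags & DEMO_ATTR_PARENT)
		return;

	ctx->count++;
}

/* Feed an array of per-entry flags through the callback. */
static int demo_list(const unsigned int *flags, size_t nr)
{
	struct demo_list_ctx	ctx = { 0 };
	size_t			i;

	for (i = 0; i < nr; i++)
		demo_put_listent(&ctx, flags[i]);
	return ctx.count;
}
```

Given four entries of which two carry the parent flag, demo_list()
reports only the other two.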


diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 8bb5f53a31fe..ddc2db5d6f73 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -234,6 +234,9 @@ xfs_xattr_put_listent(
 
 	ASSERT(context->count >= 0);
 
+	if (flags & XFS_ATTR_PARENT)
+		return;
+
 	if (flags & XFS_ATTR_ROOT) {
 #ifdef CONFIG_XFS_POSIX_ACL
 		if (namelen == SGI_ACL_FILE_SIZE &&



* [PATCH 25/28] xfs: Add parent pointer ioctl
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (23 preceding siblings ...)
  2023-02-16 20:38   ` [PATCH 24/28] xfs: Filter XFS_ATTR_PARENT for getfattr Darrick J. Wong
@ 2023-02-16 20:39   ` Darrick J. Wong
  2023-02-16 20:39   ` [PATCH 26/28] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
                     ` (2 subsequent siblings)
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:39 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch adds a new file ioctl to retrieve the parent pointers of a
given inode.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/Makefile            |    1 
 fs/xfs/libxfs/xfs_fs.h     |   74 ++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_parent.c |   10 +++
 fs/xfs/libxfs/xfs_parent.h |    2 +
 fs/xfs/xfs_ioctl.c         |   94 ++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_ondisk.h        |    4 +
 fs/xfs/xfs_parent_utils.c  |  126 ++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_parent_utils.h  |   11 ++++
 8 files changed, 321 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/xfs_parent_utils.c
 create mode 100644 fs/xfs/xfs_parent_utils.h
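
For readers who want to see the calling convention, here is a rough
userspace sketch of the buffer layout and the resume loop described in
the ioctl comment.  It makes several assumptions: the demo_* structures
only mirror the field sizes from this patch (a 24-byte handle and a
16-byte cursor), and demo_getparents() is a mock standing in for the
real ioctl, with its own made-up use of the cursor.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAXNAMELEN	256
#define DEMO_OFLAG_DONE	(1u << 2)

/* Stand-ins mirroring the layout in this patch; not the real header. */
struct demo_parent_ptr {
	uint64_t	ino;
	uint32_t	gen;
	uint32_t	diroffset;
	uint64_t	rsvd;
	uint8_t		name[DEMO_MAXNAMELEN];
};

struct demo_pptr_info {
	uint8_t		handle[24];	/* sized like struct xfs_handle */
	uint32_t	cursor[4];	/* sized like xfs_attrlist_cursor */
	uint32_t	flags;
	uint32_t	reserved;
	uint32_t	ptrs_size;	/* capacity of parents[] */
	uint32_t	ptrs_used;	/* entries filled in (output) */
	uint64_t	reserved2[6];
	struct demo_parent_ptr parents[];
};

/* Header plus room for nr trailing records. */
static size_t demo_pptr_info_sizeof(uint32_t nr)
{
	return sizeof(struct demo_pptr_info) +
	       (size_t)nr * sizeof(struct demo_parent_ptr);
}

/* Mock ioctl: pretends the file has "total" parents; cursor[0] = progress. */
static int demo_getparents(struct demo_pptr_info *ppi, uint32_t total)
{
	uint32_t	next = ppi->cursor[0];
	uint32_t	n = 0;

	while (n < ppi->ptrs_size && next < total) {
		struct demo_parent_ptr	*pp = &ppi->parents[n++];

		memset(pp, 0, sizeof(*pp));
		pp->ino = 128 + next++;
	}
	ppi->cursor[0] = next;
	ppi->ptrs_used = n;
	if (next >= total)
		ppi->flags |= DEMO_OFLAG_DONE;
	return 0;
}

/* Caller side: zero the header once, then call until DONE is set. */
static uint32_t demo_count_parents(uint32_t total, uint32_t batch)
{
	struct demo_pptr_info	*ppi;
	uint32_t		seen = 0;

	if (batch == 0)
		return 0;
	ppi = calloc(1, demo_pptr_info_sizeof(batch));
	if (!ppi)
		return 0;

	ppi->ptrs_size = batch;
	do {
		if (demo_getparents(ppi, total))
			break;
		seen += ppi->ptrs_used;
	} while (!(ppi->flags & DEMO_OFLAG_DONE));

	free(ppi);
	return seen;
}
```

The stand-in structures come out at 280 and 104 bytes, matching the
sizes this patch checks in xfs_ondisk.h.  The caller never touches the
cursor between calls; it only zeroes it before the first one.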


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index e2b2cf50ffcf..42d0496fdad7 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
 				   xfs_mount.o \
 				   xfs_mru_cache.o \
 				   xfs_pwork.o \
+				   xfs_parent_utils.o \
 				   xfs_reflink.o \
 				   xfs_stats.o \
 				   xfs_super.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index b0b4d7a3aa15..9e59a1fdfb0c 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -752,6 +752,79 @@ struct xfs_scrub_metadata {
 				 XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED)
 #define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
 
+#define XFS_PPTR_MAXNAMELEN				256
+
+/* return parents of the handle, not the open fd */
+#define XFS_PPTR_IFLAG_HANDLE  (1U << 0)
+
+/* target was the root directory */
+#define XFS_PPTR_OFLAG_ROOT    (1U << 1)
+
+/* Cursor is done iterating pptrs */
+#define XFS_PPTR_OFLAG_DONE    (1U << 2)
+
+#define XFS_PPTR_FLAG_ALL      (XFS_PPTR_IFLAG_HANDLE | XFS_PPTR_OFLAG_ROOT | \
+				XFS_PPTR_OFLAG_DONE)
+
+/* Get an inode parent pointer through ioctl */
+struct xfs_parent_ptr {
+	__u64		xpp_ino;			/* Inode */
+	__u32		xpp_gen;			/* Inode generation */
+	__u32		xpp_diroffset;			/* Directory offset */
+	__u64		xpp_rsvd;			/* Reserved */
+	__u8		xpp_name[XFS_PPTR_MAXNAMELEN];	/* File name */
+};
+
+/* Iterate through an inode's parent pointers */
+struct xfs_pptr_info {
+	/* File handle, if XFS_PPTR_IFLAG_HANDLE is set */
+	struct xfs_handle		pi_handle;
+
+	/*
+	 * Structure to track progress in iterating the parent pointers.
+	 * Must be initialized to zeroes before the first ioctl call, and
+	 * not touched by callers after that.
+	 */
+	struct xfs_attrlist_cursor	pi_cursor;
+
+	/* Operational flags: XFS_PPTR_*FLAG* */
+	__u32				pi_flags;
+
+	/* Must be set to zero */
+	__u32				pi_reserved;
+
+	/* # of entries in array */
+	__u32				pi_ptrs_size;
+
+	/* # of entries filled in (output) */
+	__u32				pi_ptrs_used;
+
+	/* Must be set to zero */
+	__u64				pi_reserved2[6];
+
+	/*
+	 * An array of struct xfs_parent_ptr follows the header
+	 * information. Use xfs_ppinfo_to_pp() to access the
+	 * parent pointer array entries.
+	 */
+	struct xfs_parent_ptr		pi_parents[];
+};
+
+static inline size_t
+xfs_pptr_info_sizeof(int nr_ptrs)
+{
+	return sizeof(struct xfs_pptr_info) +
+	       (nr_ptrs * sizeof(struct xfs_parent_ptr));
+}
+
+static inline struct xfs_parent_ptr*
+xfs_ppinfo_to_pp(
+	struct xfs_pptr_info	*info,
+	int			idx)
+{
+	return &info->pi_parents[idx];
+}
+
 /*
  * ioctl limits
  */
@@ -797,6 +870,7 @@ struct xfs_scrub_metadata {
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
 #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
 #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
+#define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_parent_ptr)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 629762701952..9176adfaa9e8 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -29,6 +29,16 @@
 
 struct kmem_cache		*xfs_parent_intent_cache;
 
+/* Initializes a xfs_parent_ptr from an xfs_parent_name_rec */
+void
+xfs_init_parent_ptr(struct xfs_parent_ptr		*xpp,
+		    const struct xfs_parent_name_rec	*rec)
+{
+	xpp->xpp_ino = be64_to_cpu(rec->p_ino);
+	xpp->xpp_gen = be32_to_cpu(rec->p_gen);
+	xpp->xpp_diroffset = be32_to_cpu(rec->p_diroffset);
+}
+
 /*
  * Parent pointer attribute handling.
  *
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 039005883bb6..13040b9d8b08 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -25,6 +25,8 @@ struct xfs_parent_defer {
 void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
 			      struct xfs_inode *ip,
 			      uint32_t p_diroffset);
+void xfs_init_parent_ptr(struct xfs_parent_ptr *xpp,
+			 const struct xfs_parent_name_rec *rec);
 int __xfs_parent_init(struct xfs_mount *mp, bool grab_log,
 		struct xfs_parent_defer **parentp);
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 5cd5154d4d1e..df5a45b97f8f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -37,6 +37,7 @@
 #include "xfs_health.h"
 #include "xfs_reflink.h"
 #include "xfs_ioctl.h"
+#include "xfs_parent_utils.h"
 #include "xfs_xattr.h"
 
 #include <linux/mount.h>
@@ -1675,6 +1676,96 @@ xfs_ioc_scrub_metadata(
 	return 0;
 }
 
+/*
+ * IOCTL routine to get the parent pointers of an inode and return it to user
+ * space.  Caller must pass a buffer space containing a struct xfs_pptr_info,
+ * followed by a region large enough to contain an array of struct
+ * xfs_parent_ptr of a size specified in pi_ptrs_size.  If the inode contains
+ * more parent pointers than can fit in the buffer space, caller may re-call
+ * the function using the returned pi_cursor to resume iteration.  The
+ * number of xfs_parent_ptr returned will be stored in pi_ptrs_used.
+ *
+ * Returns 0 on success or non-zero on failure
+ */
+STATIC int
+xfs_ioc_get_parent_pointer(
+	struct file			*filp,
+	void				__user *arg)
+{
+	struct xfs_pptr_info		*ppi = NULL;
+	int				error = 0;
+	struct xfs_inode		*ip = XFS_I(file_inode(filp));
+	struct xfs_mount		*mp = ip->i_mount;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	/* Allocate an xfs_pptr_info to put the user data */
+	ppi = kmalloc(sizeof(struct xfs_pptr_info), 0);
+	if (!ppi)
+		return -ENOMEM;
+
+	/* Copy the data from the user */
+	error = copy_from_user(ppi, arg, sizeof(struct xfs_pptr_info));
+	if (error) {
+		error = -EFAULT;
+		goto out;
+	}
+
+	/* Check size of buffer requested by user */
+	if (xfs_pptr_info_sizeof(ppi->pi_ptrs_size) > XFS_XATTR_LIST_MAX) {
+		error = -ENOMEM;
+		goto out;
+	}
+
+	if (ppi->pi_flags & ~XFS_PPTR_FLAG_ALL) {
+		error = -EINVAL;
+		goto out;
+	}
+	ppi->pi_flags &= ~(XFS_PPTR_OFLAG_ROOT | XFS_PPTR_OFLAG_DONE);
+
+	/*
+	 * Now that we know how big the trailing buffer is, expand
+	 * our kernel xfs_pptr_info to be the same size
+	 */
+	ppi = krealloc(ppi, xfs_pptr_info_sizeof(ppi->pi_ptrs_size), 0);
+	if (!ppi)
+		return -ENOMEM;
+
+	if (ppi->pi_flags & XFS_PPTR_IFLAG_HANDLE) {
+		error = xfs_iget(mp, NULL, ppi->pi_handle.ha_fid.fid_ino,
+				0, 0, &ip);
+		if (error)
+			goto out;
+
+		if (VFS_I(ip)->i_generation != ppi->pi_handle.ha_fid.fid_gen) {
+			error = -EINVAL;
+			goto out;
+		}
+	}
+
+	if (ip->i_ino == mp->m_sb.sb_rootino)
+		ppi->pi_flags |= XFS_PPTR_OFLAG_ROOT;
+
+	/* Get the parent pointers */
+	error = xfs_attr_get_parent_pointer(ip, ppi);
+
+	if (error)
+		goto out;
+
+	/* Copy the parent pointers back to the user */
+	error = copy_to_user(arg, ppi,
+			xfs_pptr_info_sizeof(ppi->pi_ptrs_size));
+	if (error) {
+		error = -EFAULT;
+		goto out;
+	}
+
+out:
+	kmem_free(ppi);
+	return error;
+}
+
 int
 xfs_ioc_swapext(
 	xfs_swapext_t	*sxp)
@@ -1964,7 +2055,8 @@ xfs_file_ioctl(
 
 	case XFS_IOC_FSGETXATTRA:
 		return xfs_ioc_fsgetxattra(ip, arg);
-
+	case XFS_IOC_GETPARENTS:
+		return xfs_ioc_get_parent_pointer(filp, arg);
 	case XFS_IOC_GETBMAP:
 	case XFS_IOC_GETBMAPA:
 	case XFS_IOC_GETBMAPX:
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index 9737b5a9f405..6a6bd05c2a68 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -150,6 +150,10 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_OFFSET(struct xfs_efi_log_format_32, efi_extents,	16);
 	XFS_CHECK_OFFSET(struct xfs_efi_log_format_64, efi_extents,	16);
 
+	/* parent pointer ioctls */
+	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_ptr,            280);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_pptr_info,             104);
+
 	/*
 	 * The v5 superblock format extended several v4 header structures with
 	 * additional data. While new fields are only accessible on v5
diff --git a/fs/xfs/xfs_parent_utils.c b/fs/xfs/xfs_parent_utils.c
new file mode 100644
index 000000000000..771279731d42
--- /dev/null
+++ b/fs/xfs/xfs_parent_utils.c
@@ -0,0 +1,126 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Oracle, Inc.
+ * All rights reserved.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_shared.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_inode.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_trans.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr.h"
+#include "xfs_ioctl.h"
+#include "xfs_parent.h"
+#include "xfs_da_btree.h"
+#include "xfs_parent_utils.h"
+
+/*
+ * Get the parent pointers for a given inode
+ *
+ * Returns 0 on success and non zero on error
+ */
+int
+xfs_attr_get_parent_pointer(
+	struct xfs_inode		*ip,
+	struct xfs_pptr_info		*ppi)
+{
+
+	struct xfs_attrlist		*alist;
+	struct xfs_attrlist_ent		*aent;
+	struct xfs_parent_ptr		*xpp;
+	struct xfs_parent_name_rec	*xpnr;
+	char				*namebuf;
+	unsigned int			namebuf_size;
+	int				name_len, i, error = 0;
+	unsigned int			lock_mode, flags = XFS_ATTR_PARENT;
+	struct xfs_attr_list_context	context;
+
+	/* Allocate a buffer to store the attribute names */
+	namebuf_size = sizeof(struct xfs_attrlist) +
+		       (ppi->pi_ptrs_size) * sizeof(struct xfs_attrlist_ent);
+	namebuf = kvzalloc(namebuf_size, GFP_KERNEL);
+	if (!namebuf)
+		return -ENOMEM;
+
+	memset(&context, 0, sizeof(struct xfs_attr_list_context));
+	error = xfs_ioc_attr_list_context_init(ip, namebuf, namebuf_size, 0,
+			&context);
+	if (error)
+		goto out_kfree;
+
+	/* Copy the cursor provided by caller */
+	memcpy(&context.cursor, &ppi->pi_cursor,
+		sizeof(struct xfs_attrlist_cursor));
+	context.attr_filter = XFS_ATTR_PARENT;
+
+	lock_mode = xfs_ilock_attr_map_shared(ip);
+
+	error = xfs_attr_list_ilocked(&context);
+	if (error)
+		goto out_unlock;
+
+	alist = (struct xfs_attrlist *)namebuf;
+	for (i = 0; i < alist->al_count; i++) {
+		struct xfs_da_args args = {
+			.geo = ip->i_mount->m_attr_geo,
+			.whichfork = XFS_ATTR_FORK,
+			.dp = ip,
+			.namelen = sizeof(struct xfs_parent_name_rec),
+			.attr_filter = flags,
+		};
+
+		xpp = xfs_ppinfo_to_pp(ppi, i);
+		memset(xpp, 0, sizeof(struct xfs_parent_ptr));
+		aent = (struct xfs_attrlist_ent *)
+			&namebuf[alist->al_offset[i]];
+		xpnr = (struct xfs_parent_name_rec *)(aent->a_name);
+
+		if (aent->a_valuelen > XFS_PPTR_MAXNAMELEN) {
+			error = -EFSCORRUPTED;
+			goto out_unlock;
+		}
+		name_len = aent->a_valuelen;
+
+		args.name = (char *)xpnr;
+		args.hashval = xfs_da_hashname(args.name, args.namelen);
+		args.value = (unsigned char *)(xpp->xpp_name);
+		args.valuelen = name_len;
+
+		error = xfs_attr_get_ilocked(&args);
+		error = (error == -EEXIST ? 0 : error);
+		if (error) {
+			error = -EFSCORRUPTED;
+			goto out_unlock;
+		}
+
+		xfs_init_parent_ptr(xpp, xpnr);
+		if (!xfs_verify_ino(args.dp->i_mount, xpp->xpp_ino)) {
+			error = -EFSCORRUPTED;
+			goto out_unlock;
+		}
+	}
+	ppi->pi_ptrs_used = alist->al_count;
+	if (!alist->al_more)
+		ppi->pi_flags |= XFS_PPTR_OFLAG_DONE;
+
+	/* Update the caller with the current cursor position */
+	memcpy(&ppi->pi_cursor, &context.cursor,
+			sizeof(struct xfs_attrlist_cursor));
+
+out_unlock:
+	xfs_iunlock(ip, lock_mode);
+out_kfree:
+	kvfree(namebuf);
+
+	return error;
+}
+
diff --git a/fs/xfs/xfs_parent_utils.h b/fs/xfs/xfs_parent_utils.h
new file mode 100644
index 000000000000..ad60baee8b2a
--- /dev/null
+++ b/fs/xfs/xfs_parent_utils.h
@@ -0,0 +1,11 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Oracle, Inc.
+ * All rights reserved.
+ */
+#ifndef	__XFS_PARENT_UTILS_H__
+#define	__XFS_PARENT_UTILS_H__
+
+int xfs_attr_get_parent_pointer(struct xfs_inode *ip,
+				struct xfs_pptr_info *ppi);
+#endif	/* __XFS_PARENT_UTILS_H__ */



* [PATCH 26/28] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (24 preceding siblings ...)
  2023-02-16 20:39   ` [PATCH 25/28] xfs: Add parent pointer ioctl Darrick J. Wong
@ 2023-02-16 20:39   ` Darrick J. Wong
  2023-02-16 20:39   ` [PATCH 27/28] xfs: drop compatibility minimum log size computations for reflink Darrick J. Wong
  2023-02-16 20:39   ` [PATCH 28/28] xfs: add xfs_trans_mod_sb tracing Darrick J. Wong
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:39 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Dave and I were discussing some recent test regressions as a result of
me turning on nrext64=1 on realtime filesystems, when we noticed that
the minimum log size of a 32M filesystem jumped from 954 blocks to 4287
blocks.

Digging through xfs_log_calc_max_attrsetm_res, Dave noticed that @size
contains the maximum estimated amount of space needed for a local format
xattr, in bytes, but we feed this quantity to XFS_NEXTENTADD_SPACE_RES,
which requires units of blocks.  This has resulted in an overestimation
of the minimum log size over the years.

We should nominally correct this, but there's a backwards compatibility
problem -- if we enable it now, the minimum log size will decrease.  If
a corrected mkfs formats a filesystem with this new smaller log size, a
user will encounter mount failures on an uncorrected kernel due to the
larger minimum log size computations there.

However, the large extent counters feature is still EXPERIMENTAL, so we
can gate the correction on that feature (or any features that get added
after that) being enabled.  Any filesystem with nrext64 or any of the
as-yet-undefined feature bits turned on will be rejected by old
uncorrected kernels, so this should be safe even in the upgrade case.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_log_rlimit.c |   43 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
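
To make the unit mix-up concrete, here is a toy calculation.  It is only
a sketch: demo_b_to_fsb() rounds bytes up to blocks in the same spirit
as XFS_B_TO_FSB, while demo_space_res() is a hypothetical stand-in for a
reservation helper that expects a block count, and the numbers are made
up.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to whole filesystem blocks. */
static uint64_t demo_b_to_fsb(uint64_t bytes, unsigned int blocklog)
{
	uint64_t	mask;

	assert(blocklog < 64);
	mask = ((uint64_t)1 << blocklog) - 1;
	return (bytes + mask) >> blocklog;
}

/*
 * Hypothetical reservation helper that takes a block count: one unit of
 * overhead for every per_unit blocks, rounded up.
 */
static uint64_t demo_space_res(uint64_t nblocks, uint64_t per_unit)
{
	assert(per_unit > 0);
	return (nblocks + per_unit - 1) / per_unit;
}
```

With 4096-byte blocks, a 1000-byte xattr occupies one block, so
demo_space_res(demo_b_to_fsb(1000, 12), 100) is 1.  Passing the raw byte
count instead, demo_space_res(1000, 100), yields 10.  That inflation is
the kind of overestimate the commit message describes.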


diff --git a/fs/xfs/libxfs/xfs_log_rlimit.c b/fs/xfs/libxfs/xfs_log_rlimit.c
index 9975b93a7412..e5c606fb7a6a 100644
--- a/fs/xfs/libxfs/xfs_log_rlimit.c
+++ b/fs/xfs/libxfs/xfs_log_rlimit.c
@@ -16,6 +16,39 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_trace.h"
 
+/*
+ * Decide if the filesystem has the parent pointer feature or any feature
+ * added after that.
+ */
+static inline bool
+xfs_has_parent_or_newer_feature(
+	struct xfs_mount	*mp)
+{
+	if (!xfs_sb_is_v5(&mp->m_sb))
+		return false;
+
+	if (xfs_sb_has_compat_feature(&mp->m_sb, ~0))
+		return true;
+
+	if (xfs_sb_has_ro_compat_feature(&mp->m_sb,
+				~(XFS_SB_FEAT_RO_COMPAT_FINOBT |
+				 XFS_SB_FEAT_RO_COMPAT_RMAPBT |
+				 XFS_SB_FEAT_RO_COMPAT_REFLINK |
+				 XFS_SB_FEAT_RO_COMPAT_INOBTCNT)))
+		return true;
+
+	if (xfs_sb_has_incompat_feature(&mp->m_sb,
+				~(XFS_SB_FEAT_INCOMPAT_FTYPE |
+				 XFS_SB_FEAT_INCOMPAT_SPINODES |
+				 XFS_SB_FEAT_INCOMPAT_META_UUID |
+				 XFS_SB_FEAT_INCOMPAT_BIGTIME |
+				 XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR |
+				 XFS_SB_FEAT_INCOMPAT_NREXT64)))
+		return true;
+
+	return false;
+}
+
 /*
  * Calculate the maximum length in bytes that would be required for a local
  * attribute value as large attributes out of line are not logged.
@@ -31,6 +64,16 @@ xfs_log_calc_max_attrsetm_res(
 	       MAXNAMELEN - 1;
 	nblks = XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK);
 	nblks += XFS_B_TO_FSB(mp, size);
+
+	/*
+	 * Starting with the parent pointer feature, every new fs feature
+	 * corrects a unit conversion error in the xattr transaction
+	 * reservation code that resulted in oversized minimum log size
+	 * computations.
+	 */
+	if (xfs_has_parent_or_newer_feature(mp))
+		size = XFS_B_TO_FSB(mp, size);
+
 	nblks += XFS_NEXTENTADD_SPACE_RES(mp, size, XFS_ATTR_FORK);
 
 	return  M_RES(mp)->tr_attrsetm.tr_logres +



* [PATCH 27/28] xfs: drop compatibility minimum log size computations for reflink
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (25 preceding siblings ...)
  2023-02-16 20:39   ` [PATCH 26/28] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
@ 2023-02-16 20:39   ` Darrick J. Wong
  2023-02-16 20:39   ` [PATCH 28/28] xfs: add xfs_trans_mod_sb tracing Darrick J. Wong
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:39 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Having established that we can reduce the minimum log size computation
for filesystems with parent pointers or any newer feature, we should
also drop the compat minlogsize code that we added when we reduced the
transaction reservation size for rmap and reflink.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_log_rlimit.c |   10 ++++++++++
 1 file changed, 10 insertions(+)
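
The gate used here, xfs_has_parent_or_newer_feature() from the previous
patch, relies on a simple mask trick: invert the set of features that
predate the change and test whether any other bit is on.  A minimal
sketch of that pattern, with hypothetical DEMO_* bit assignments rather
than the real superblock feature flags:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments for illustration only. */
#define DEMO_FEAT_A	(1u << 0)
#define DEMO_FEAT_B	(1u << 1)
#define DEMO_FEAT_C	(1u << 2)
#define DEMO_FEAT_OLD	(DEMO_FEAT_A | DEMO_FEAT_B | DEMO_FEAT_C)

/* True if any bit outside the "old" set is on, defined yet or not. */
static bool demo_has_newer_feature(uint32_t features)
{
	return (features & ~DEMO_FEAT_OLD) != 0;
}

/* Usage: old-only feature sets fail the test, anything newer passes. */
void demo_selfcheck(void)
{
	assert(!demo_has_newer_feature(DEMO_FEAT_A | DEMO_FEAT_C));
	assert(demo_has_newer_feature(DEMO_FEAT_A | (1u << 3)));
}
```

Because the test is written against the inverse of the known-old bits,
it needs no update when future feature bits are defined.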


diff --git a/fs/xfs/libxfs/xfs_log_rlimit.c b/fs/xfs/libxfs/xfs_log_rlimit.c
index e5c606fb7a6a..74821c7fd0cc 100644
--- a/fs/xfs/libxfs/xfs_log_rlimit.c
+++ b/fs/xfs/libxfs/xfs_log_rlimit.c
@@ -91,6 +91,16 @@ xfs_log_calc_trans_resv_for_minlogblocks(
 {
 	unsigned int		rmap_maxlevels = mp->m_rmap_maxlevels;
 
+	/*
+	 * Starting with the parent pointer feature, every new fs feature
+	 * drops the oversized minimum log size computation introduced by the
+	 * original reflink code.
+	 */
+	if (xfs_has_parent_or_newer_feature(mp)) {
+		xfs_trans_resv_calc(mp, resv);
+		return;
+	}
+
 	/*
 	 * In the early days of rmap+reflink, we always set the rmap maxlevels
 	 * to 9 even if the AG was small enough that it would never grow to



* [PATCH 28/28] xfs: add xfs_trans_mod_sb tracing
  2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
                     ` (26 preceding siblings ...)
  2023-02-16 20:39   ` [PATCH 27/28] xfs: drop compatibility minimum log size computations for reflink Darrick J. Wong
@ 2023-02-16 20:39   ` Darrick J. Wong
  27 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:39 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Reservationless operations are not allowed with parent pointers because
the attr expansion may cause a shutdown if an operation is retried without
reservation and succeeds without enough space for the parent pointer.  Add
tracing to detect if this shutdown occurs.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/xfs_trans.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 43f4b0943f49..bfb7e87e7794 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -375,8 +375,10 @@ xfs_trans_mod_sb(
 		 */
 		if (delta < 0) {
 			tp->t_blk_res_used += (uint)-delta;
-			if (tp->t_blk_res_used > tp->t_blk_res)
+			if (tp->t_blk_res_used > tp->t_blk_res) {
+				xfs_err(mp, "URK blkres 0x%x used 0x%x", tp->t_blk_res, tp->t_blk_res_used);
 				xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+			}
 		} else if (delta > 0 && (tp->t_flags & XFS_TRANS_RES_FDBLKS)) {
 			int64_t	blkres_delta;
 



* [PATCH 1/3] xfs: directory lookups should return diroffsets too
  2023-02-16 20:26 ` [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers Darrick J. Wong
@ 2023-02-16 20:40   ` Darrick J. Wong
  2023-02-16 20:40   ` [PATCH 2/3] xfs: move/add parent pointer validators to xfs_parent Darrick J. Wong
  2023-02-16 20:40   ` [PATCH 3/3] xfs: don't remove the attr fork when parent pointers are enabled Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:40 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Teach the directory lookup functions to return the dir offset of the
dirent that they find.  Online fsck will use this when checking and
repairing filesystems.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_dir2_block.c |    2 ++
 fs/xfs/libxfs/xfs_dir2_leaf.c  |    2 ++
 fs/xfs/libxfs/xfs_dir2_node.c  |    2 ++
 fs/xfs/libxfs/xfs_dir2_sf.c    |    4 ++++
 4 files changed, 10 insertions(+)
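
As a rough illustration of what the lookups now store in args->offset,
the sketch below models the byte-to-dataptr conversion.  It assumes
8-byte dirent alignment, so a dataptr is the byte offset shifted right
by 3, and it takes the log2 of the directory block size as a plain
parameter.  The demo_* names are hypothetical, not the libxfs helpers.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DATA_ALIGN_LOG	3	/* assumption: 8-byte alignment */

/* Convert a byte offset in the directory data space to a dataptr. */
static uint32_t demo_byte_to_dataptr(uint64_t byte)
{
	return (uint32_t)(byte >> DEMO_DATA_ALIGN_LOG);
}

/* Convert (directory block number, offset in that block) to a dataptr. */
static uint32_t demo_db_off_to_dataptr(unsigned int dirblklog, uint32_t db,
				       uint32_t off)
{
	uint64_t	byte = ((uint64_t)db << dirblklog) + off;

	return demo_byte_to_dataptr(byte);
}

/* Usage: an entry 64 bytes into directory block 1 of a 4096-byte dir. */
void demo_selfcheck(void)
{
	assert(demo_byte_to_dataptr(64) == 8);
	assert(demo_db_off_to_dataptr(12, 1, 64) == 520);
}
```

The block and leaf/node hunks differ only in how they obtain the byte
offset: a single-block directory subtracts the block header address,
while the multi-block formats also fold in the data block number.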


diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
index 0f3a03e87278..24467e1a0d6f 100644
--- a/fs/xfs/libxfs/xfs_dir2_block.c
+++ b/fs/xfs/libxfs/xfs_dir2_block.c
@@ -749,6 +749,8 @@ xfs_dir2_block_lookup_int(
 		cmp = xfs_dir2_compname(args, dep->name, dep->namelen);
 		if (cmp != XFS_CMP_DIFFERENT && cmp != args->cmpresult) {
 			args->cmpresult = cmp;
+			args->offset = xfs_dir2_byte_to_dataptr(
+					(char *)dep - (char *)hdr);
 			*bpp = bp;
 			*entno = mid;
 			if (cmp == XFS_CMP_EXACT)
diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index fe75ffadace9..b7ea73b4f592 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -1300,6 +1300,8 @@ xfs_dir2_leaf_lookup_int(
 		cmp = xfs_dir2_compname(args, dep->name, dep->namelen);
 		if (cmp != XFS_CMP_DIFFERENT && cmp != args->cmpresult) {
 			args->cmpresult = cmp;
+			args->offset = xfs_dir2_db_off_to_dataptr(args->geo,
+					newdb, (char *)dep - (char *)dbp->b_addr);
 			*indexp = index;
 			/* case exact match: return the current buffer. */
 			if (cmp == XFS_CMP_EXACT) {
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index 53cd0d5d94f7..f8c01e8d885c 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -887,6 +887,8 @@ xfs_dir2_leafn_lookup_for_entry(
 			args->cmpresult = cmp;
 			args->inumber = be64_to_cpu(dep->inumber);
 			args->filetype = xfs_dir2_data_get_ftype(mp, dep);
+			args->offset = xfs_dir2_db_off_to_dataptr(args->geo,
+					newdb, (char *)dep - (char *)curbp->b_addr);
 			*indexp = index;
 			state->extravalid = 1;
 			state->extrablk.bp = curbp;
diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
index 032c65804610..f8670c56c7a6 100644
--- a/fs/xfs/libxfs/xfs_dir2_sf.c
+++ b/fs/xfs/libxfs/xfs_dir2_sf.c
@@ -889,6 +889,7 @@ xfs_dir2_sf_lookup(
 		args->inumber = dp->i_ino;
 		args->cmpresult = XFS_CMP_EXACT;
 		args->filetype = XFS_DIR3_FT_DIR;
+		args->offset = 1;
 		return -EEXIST;
 	}
 	/*
@@ -899,6 +900,7 @@ xfs_dir2_sf_lookup(
 		args->inumber = xfs_dir2_sf_get_parent_ino(sfp);
 		args->cmpresult = XFS_CMP_EXACT;
 		args->filetype = XFS_DIR3_FT_DIR;
+		args->offset = 2;
 		return -EEXIST;
 	}
 	/*
@@ -917,6 +919,8 @@ xfs_dir2_sf_lookup(
 			args->cmpresult = cmp;
 			args->inumber = xfs_dir2_sf_get_ino(mp, sfp, sfep);
 			args->filetype = xfs_dir2_sf_get_ftype(mp, sfep);
+			args->offset = xfs_dir2_byte_to_dataptr(
+						xfs_dir2_sf_get_offset(sfep));
 			if (cmp == XFS_CMP_EXACT)
 				return -EEXIST;
 			ci_sfep = sfep;



* [PATCH 2/3] xfs: move/add parent pointer validators to xfs_parent
  2023-02-16 20:26 ` [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers Darrick J. Wong
  2023-02-16 20:40   ` [PATCH 1/3] xfs: directory lookups should return diroffsets too Darrick J. Wong
@ 2023-02-16 20:40   ` Darrick J. Wong
  2023-02-16 20:40   ` [PATCH 3/3] xfs: don't remove the attr fork when parent pointers are enabled Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:40 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move the parent pointer xattr name validator to xfs_parent.c, and add a
new function to check the xattr value.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c   |   61 +++++++++-----------------------------------
 fs/xfs/libxfs/xfs_attr.h   |    2 +
 fs/xfs/libxfs/xfs_parent.c |   44 ++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_parent.h |    7 +++++
 4 files changed, 65 insertions(+), 49 deletions(-)
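
The shape of the two validators can be sketched in userspace like this.
Everything here is a simplified stand-in: the record is host-endian
(the on-disk record is big-endian), DEMO_NSP_MASK and DEMO_MAX_DATAPTR
are hypothetical values, and a caller-supplied max_ino replaces
xfs_verify_ino().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NSP_MASK		0x0eu		/* hypothetical namespace bits */
#define DEMO_MAX_DATAPTR	0x00ffffffu	/* hypothetical offset limit */
#define DEMO_MAXNAMELEN		256

struct demo_name_rec {
	uint64_t	ino;
	uint32_t	gen;
	uint32_t	diroffset;
};

/* Count set bits, like the kernel's hweight32(). */
static unsigned int demo_hweight32(uint32_t v)
{
	unsigned int	n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Name check: exact record length, one namespace bit, sane fields. */
static bool demo_parent_namecheck(const struct demo_name_rec *rec,
				  size_t reclen, unsigned int attr_flags,
				  uint64_t max_ino)
{
	if (reclen != sizeof(*rec))
		return false;
	if (demo_hweight32(attr_flags & DEMO_NSP_MASK) > 1)
		return false;
	if (rec->ino == 0 || rec->ino > max_ino)
		return false;
	return rec->diroffset <= DEMO_MAX_DATAPTR;
}

/* Value check: the value is a non-empty file name below the name limit. */
static bool demo_parent_valuecheck(const void *value, size_t valuelen)
{
	if (valuelen == 0 || valuelen >= DEMO_MAXNAMELEN)
		return false;
	return value != NULL;
}
```

Checking the length before reading any field matters because the name
buffer is only known to hold reclen bytes.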


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 57080ea4c869..3065dd622102 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -26,6 +26,7 @@
 #include "xfs_trace.h"
 #include "xfs_attr_item.h"
 #include "xfs_xattr.h"
+#include "xfs_parent.h"
 
 struct kmem_cache		*xfs_attr_intent_cache;
 
@@ -1577,62 +1578,26 @@ xfs_attr_node_get(
 	return error;
 }
 
-/*
- * Verify parent pointer attribute is valid.
- * Return true on success or false on failure
- */
-STATIC bool
-xfs_verify_pptr(
-	struct xfs_mount			*mp,
-	const struct xfs_parent_name_rec	*rec)
-{
-	xfs_ino_t				p_ino;
-	xfs_dir2_dataptr_t			p_diroffset;
-
-	p_ino = be64_to_cpu(rec->p_ino);
-	p_diroffset = be32_to_cpu(rec->p_diroffset);
-
-	if (!xfs_verify_ino(mp, p_ino))
-		return false;
-
-	if (p_diroffset > XFS_DIR2_MAX_DATAPTR)
-		return false;
-
-	return true;
-}
-
-/* Returns true if the string attribute entry name is valid. */
-static bool
-xfs_str_attr_namecheck(
-	const void	*name,
-	size_t		length)
-{
-	/*
-	 * MAXNAMELEN includes the trailing null, but (name/length) leave it
-	 * out, so use >= for the length check.
-	 */
-	if (length >= MAXNAMELEN)
-		return false;
-
-	/* There shouldn't be any nulls here */
-	return !memchr(name, 0, length);
-}
-
 /* Returns true if the attribute entry name is valid. */
 bool
 xfs_attr_namecheck(
 	struct xfs_mount	*mp,
 	const void		*name,
 	size_t			length,
-	int			flags)
+	unsigned int		flags)
 {
-	if (flags & XFS_ATTR_PARENT) {
-		if (length != sizeof(struct xfs_parent_name_rec))
-			return false;
-		return xfs_verify_pptr(mp, (struct xfs_parent_name_rec *)name);
-	}
+	if (flags & XFS_ATTR_PARENT)
+		return xfs_parent_namecheck(mp, name, length, flags);
 
-	return xfs_str_attr_namecheck(name, length);
+	/*
+	 * MAXNAMELEN includes the trailing null, but (name/length) leave it
+	 * out, so use >= for the length check.
+	 */
+	if (length >= MAXNAMELEN)
+		return false;
+
+	/* There shouldn't be any nulls here */
+	return !memchr(name, 0, length);
 }
 
 int __init
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 985761264d1f..d6d23cf19ade 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -551,7 +551,7 @@ int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
 bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
-			int flags);
+		unsigned int flags);
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
 			 unsigned int *total);
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 9176adfaa9e8..8cc264baf6c7 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -55,6 +55,50 @@ xfs_init_parent_ptr(struct xfs_parent_ptr		*xpp,
  * occurring.
  */
 
+/* Return true if parent pointer EA name is valid. */
+bool
+xfs_parent_namecheck(
+	struct xfs_mount			*mp,
+	const struct xfs_parent_name_rec	*rec,
+	size_t					reclen,
+	unsigned int				attr_flags)
+{
+	xfs_ino_t				p_ino;
+	xfs_dir2_dataptr_t			p_diroffset;
+
+	if (reclen != sizeof(struct xfs_parent_name_rec))
+		return false;
+
+	/* Only one namespace bit allowed. */
+	if (hweight32(attr_flags & XFS_ATTR_NSP_ONDISK_MASK) > 1)
+		return false;
+
+	p_ino = be64_to_cpu(rec->p_ino);
+	if (!xfs_verify_ino(mp, p_ino))
+		return false;
+
+	p_diroffset = be32_to_cpu(rec->p_diroffset);
+	if (p_diroffset > XFS_DIR2_MAX_DATAPTR)
+		return false;
+
+	return true;
+}
+
+/* Return true if parent pointer EA value is valid. */
+bool
+xfs_parent_valuecheck(
+	struct xfs_mount		*mp,
+	const void			*value,
+	size_t				valuelen)
+{
+	if (valuelen == 0 || valuelen >= MAXNAMELEN)
+		return false;
+
+	if (value == NULL)
+		return false;
+
+	return true;
+}
 
 /* Initializes a xfs_parent_name_rec to be stored as an attribute name */
 void
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 13040b9d8b08..4ffcb81d399c 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -8,6 +8,13 @@
 
 extern struct kmem_cache	*xfs_parent_intent_cache;
 
+/* Metadata validators */
+bool xfs_parent_namecheck(struct xfs_mount *mp,
+		const struct xfs_parent_name_rec *rec, size_t reclen,
+		unsigned int attr_flags);
+bool xfs_parent_valuecheck(struct xfs_mount *mp, const void *value,
+		size_t valuelen);
+
 /*
  * Dynamically allocd structure used to wrap the needed data to pass around
  * the defer ops machinery



* [PATCH 3/3] xfs: don't remove the attr fork when parent pointers are enabled
  2023-02-16 20:26 ` [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers Darrick J. Wong
  2023-02-16 20:40   ` [PATCH 1/3] xfs: directory lookups should return diroffsets too Darrick J. Wong
  2023-02-16 20:40   ` [PATCH 2/3] xfs: move/add parent pointer validators to xfs_parent Darrick J. Wong
@ 2023-02-16 20:40   ` Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:40 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

When running generic/388, I observed the following .out.bad output:

_check_xfs_filesystem: filesystem on /dev/sda4 is inconsistent (r)
*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
mismatch between format (2) and size (276) in symlink ino 37223730
bad data fork in symlink 37223730
would have cleared inode 37223730
        - agno = 2
        - agno = 3
mismatch between format (2) and size (276) in symlink ino 102725435
bad data fork in symlink 102725435
would have cleared inode 102725435
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
unknown block state, ag 1, blocks 458655-458655
unknown block state, ag 3, blocks 257772-257772
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 0
mismatch between format (2) and size (276) in symlink ino 102725435
bad data fork in symlink 102725435
would have cleared inode 102725435
mismatch between format (2) and size (276) in symlink ino 37223730
bad data fork in symlink 37223730
would have cleared inode 37223730
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
user quota id 0 has bcount 1140448, expected 1140446
user quota id 0 has icount 39892, expected 39890
No modify flag set, skipping filesystem flush and exiting.

Inode 37223730 is an unlinked remote-format symlink with no xattr fork.
According to the inode verifier and xfs_repair, this symlink ought to
have a local format data fork, since 276 bytes is small enough to fit in
the immediate area.

How did we get here?  fsstress removed the symlink, which removed the
last parent pointer xattr.  There were no other xattrs, so that removal
also removed the attr fork.  This transaction got flushed to the log,
but the system went down before we could inactivate the symlink.  Log
recovery tried to inactivate this inode (since it is on the unlinked
list) but the verifier tripped over the remote value and leaked it.

Hence we ended up with a file in this odd state on a "clean" mount.  The
"obvious" fix is to prohibit erasure of the attr fork to avoid tripping
over the verifiers when pptrs are enabled.
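
As a rough illustration of the condition being changed, here is a
standalone model of the "should we tear down the attr fork?" test in
xfs_attr_sf_removename.  The struct and function names are hypothetical
stand-ins, not kernel code; only the shape of the predicate follows the
diff below.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the state that the removal path inspects. */
struct sf_remove_state {
	size_t	totsize;		/* shortform attr size after the removal */
	size_t	hdr_size;		/* size of an empty shortform header */
	bool	has_attr2;		/* attr2 feature enabled */
	bool	data_fork_is_btree;	/* data fork is in btree format */
	bool	add_or_replace;		/* removal is part of an add/replace op */
	bool	has_parent;		/* parent pointer feature enabled */
};

/* Return true if the now-empty attr fork would be removed. */
bool
sf_should_remove_attr_fork(const struct sf_remove_state *s)
{
	return s->totsize == s->hdr_size &&
	       s->has_attr2 &&
	       !s->data_fork_is_btree &&
	       !s->add_or_replace &&
	       !s->has_parent;	/* the term this patch adds */
}
```

With has_parent set, the predicate is always false, so the empty attr
fork stays in place.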

I wonder if this could be reproduced with normal xattrs and (say) a
directory?  Maybe this fix should target /any/ symlink or directory?

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr_leaf.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index beee51ad75ce..e6c4c8b52a55 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -854,7 +854,8 @@ xfs_attr_sf_removename(
 	totsize -= size;
 	if (totsize == sizeof(xfs_attr_sf_hdr_t) && xfs_has_attr2(mp) &&
 	    (dp->i_df.if_format != XFS_DINODE_FMT_BTREE) &&
-	    !(args->op_flags & (XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE))) {
+	    !(args->op_flags & (XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE)) &&
+	    !xfs_has_parent(mp)) {
 		xfs_attr_fork_remove(dp, args->trans);
 	} else {
 		xfs_idata_realloc(dp, -size, XFS_ATTR_FORK);
@@ -863,7 +864,8 @@ xfs_attr_sf_removename(
 		ASSERT(totsize > sizeof(xfs_attr_sf_hdr_t) ||
 				(args->op_flags & XFS_DA_OP_ADDNAME) ||
 				!xfs_has_attr2(mp) ||
-				dp->i_df.if_format == XFS_DINODE_FMT_BTREE);
+				dp->i_df.if_format == XFS_DINODE_FMT_BTREE ||
+				xfs_has_parent(mp));
 		xfs_trans_log_inode(args->trans, dp,
 					XFS_ILOG_CORE | XFS_ILOG_ADATA);
 	}



* [PATCH 1/4] xfs: fix multiple problems when doing getparents by handle
  2023-02-16 20:26 ` [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl Darrick J. Wong
@ 2023-02-16 20:40   ` Darrick J. Wong
  2023-02-16 20:41   ` [PATCH 2/4] xfs: use kvalloc for the parent pointer info buffer Darrick J. Wong
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:40 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Fix a few problems in the file handle processing part of GETPARENTS.
First, we need to validate that the fsid of the handle matches the
filesystem that we're talking to.  Second, we can skip the iget if the
inode number matches the open file.  Third, if we are going to iget the
file, we need an UNTRUSTED lookup to guard against bogus inode numbers.
Finally, we mustn't leak any inodes that we iget.
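
The four steps can be sketched as a small userspace model.  Everything
named model_* is hypothetical and heavily simplified; in particular, the
-ENOENT returned for a failed lookup is an assumption of this sketch and
not a statement about what xfs_iget returns.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_inode {
	uint64_t	ino;
	uint32_t	gen;
	int		refcount;
};

struct model_handle {
	uint8_t		fsid[8];
	uint64_t	ino;
	uint32_t	gen;
};

struct model_fs {
	uint8_t			fsid[8];
	struct model_inode	*inodes;
	size_t			nr_inodes;
};

/* Stand-in for an untrusted iget: the inode number is checked, not assumed. */
static struct model_inode *
model_iget_untrusted(struct model_fs *fs, uint64_t ino)
{
	for (size_t i = 0; i < fs->nr_inodes; i++) {
		if (fs->inodes[i].ino == ino) {
			fs->inodes[i].refcount++;
			return &fs->inodes[i];
		}
	}
	return NULL;
}

static void
model_irele(struct model_inode *ip)
{
	ip->refcount--;
}

/* Resolve a handle relative to an open file; 0 on success, negative errno. */
int
model_getparents_by_handle(struct model_fs *fs, struct model_inode *file_ip,
		const struct model_handle *h, uint64_t *target_ino)
{
	struct model_inode	*call_ip = file_ip;
	int			error = 0;

	/* 1. The handle must belong to this filesystem. */
	if (memcmp(h->fsid, fs->fsid, sizeof(fs->fsid)))
		return -EINVAL;

	/* 2. Skip the lookup if the handle names the open file. */
	if (h->ino != file_ip->ino) {
		/* 3. Otherwise look it up without trusting the number. */
		call_ip = model_iget_untrusted(fs, h->ino);
		if (!call_ip)
			return -ENOENT;	/* assumed error code */
	}

	if (call_ip->gen != h->gen) {
		error = -EINVAL;
		goto out;
	}
	*target_ino = call_ip->ino;
out:
	/* 4. Drop any reference that we took ourselves. */
	if (call_ip != file_ip)
		model_irele(call_ip);
	return error;
}
```

Note that the release sits on the common exit path, so the generation
mismatch case cannot leak the reference either.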

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_ioctl.c |   30 +++++++++++++++++++++---------
 1 file changed, 21 insertions(+), 9 deletions(-)


diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index df5a45b97f8f..a1929b08c539 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1694,8 +1694,9 @@ xfs_ioc_get_parent_pointer(
 {
 	struct xfs_pptr_info		*ppi = NULL;
 	int				error = 0;
-	struct xfs_inode		*ip = XFS_I(file_inode(filp));
-	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_inode		*file_ip = XFS_I(file_inode(filp));
+	struct xfs_inode		*call_ip = file_ip;
+	struct xfs_mount		*mp = file_ip->i_mount;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -1733,23 +1734,32 @@ xfs_ioc_get_parent_pointer(
 		return -ENOMEM;
 
 	if (ppi->pi_flags & XFS_PPTR_IFLAG_HANDLE) {
-		error = xfs_iget(mp, NULL, ppi->pi_handle.ha_fid.fid_ino,
-				0, 0, &ip);
-		if (error)
+		struct xfs_handle	*hanp = &ppi->pi_handle;
+
+		if (memcmp(&hanp->ha_fsid, mp->m_fixedfsid,
+							sizeof(xfs_fsid_t))) {
+			error = -EINVAL;
 			goto out;
+		}
 
-		if (VFS_I(ip)->i_generation != ppi->pi_handle.ha_fid.fid_gen) {
+		if (hanp->ha_fid.fid_ino != file_ip->i_ino) {
+			error = xfs_iget(mp, NULL, hanp->ha_fid.fid_ino,
+					XFS_IGET_UNTRUSTED, 0, &call_ip);
+			if (error)
+				goto out;
+		}
+
+		if (VFS_I(call_ip)->i_generation != hanp->ha_fid.fid_gen) {
 			error = -EINVAL;
 			goto out;
 		}
 	}
 
-	if (ip->i_ino == mp->m_sb.sb_rootino)
+	if (call_ip->i_ino == mp->m_sb.sb_rootino)
 		ppi->pi_flags |= XFS_PPTR_OFLAG_ROOT;
 
 	/* Get the parent pointers */
-	error = xfs_attr_get_parent_pointer(ip, ppi);
-
+	error = xfs_attr_get_parent_pointer(call_ip, ppi);
 	if (error)
 		goto out;
 
@@ -1762,6 +1772,8 @@ xfs_ioc_get_parent_pointer(
 	}
 
 out:
+	if (call_ip != file_ip)
+		xfs_irele(call_ip);
 	kmem_free(ppi);
 	return error;
 }



* [PATCH 2/4] xfs: use kvalloc for the parent pointer info buffer
  2023-02-16 20:26 ` [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl Darrick J. Wong
  2023-02-16 20:40   ` [PATCH 1/4] xfs: fix multiple problems when doing getparents by handle Darrick J. Wong
@ 2023-02-16 20:41   ` Darrick J. Wong
  2023-02-16 20:41   ` [PATCH 3/4] xfs: pass the attr value to put_listent when possible Darrick J. Wong
  2023-02-16 20:41   ` [PATCH 4/4] xfs: replace the XFS_IOC_GETPARENTS backend Darrick J. Wong
  3 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:41 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

It's possible that userspace could call us with a large(ish) 64k buffer.
Use kvmalloc for this, so that the kernel doesn't have to find a
contiguous physical region.  Zero the reallocated buffer so that we don't
leak kernel contents to userspace.
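
To illustrate why the zeroing matters, here is a userspace analogy only:
realloc and memset stand in for kvrealloc with __GFP_ZERO, and
grow_zeroed is a hypothetical helper name.  The point is that the bytes
past the old size are otherwise uninitialized and would be copied back
out to the caller.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Grow a heap buffer from old_size to new_size bytes and zero the newly
 * added tail, so that stale heap contents can never be copied out.
 * Returns NULL on failure, in which case the original buffer is intact.
 */
void *
grow_zeroed(void *buf, size_t old_size, size_t new_size)
{
	unsigned char	*p = realloc(buf, new_size);

	if (!p)
		return NULL;
	if (new_size > old_size)
		memset(p + old_size, 0, new_size - old_size);
	return p;
}
```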

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_ioctl.c |    8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index a1929b08c539..19f71d6eb561 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1702,7 +1702,7 @@ xfs_ioc_get_parent_pointer(
 		return -EPERM;
 
 	/* Allocate an xfs_pptr_info to put the user data */
-	ppi = kmalloc(sizeof(struct xfs_pptr_info), 0);
+	ppi = kvmalloc(sizeof(struct xfs_pptr_info), GFP_KERNEL);
 	if (!ppi)
 		return -ENOMEM;
 
@@ -1729,7 +1729,9 @@ xfs_ioc_get_parent_pointer(
 	 * Now that we know how big the trailing buffer is, expand
 	 * our kernel xfs_pptr_info to be the same size
 	 */
-	ppi = krealloc(ppi, xfs_pptr_info_sizeof(ppi->pi_ptrs_size), 0);
+	ppi = kvrealloc(ppi, sizeof(struct xfs_pptr_info),
+			xfs_pptr_info_sizeof(ppi->pi_ptrs_size),
+			GFP_KERNEL | __GFP_ZERO);
 	if (!ppi)
 		return -ENOMEM;
 
@@ -1774,7 +1776,7 @@ xfs_ioc_get_parent_pointer(
 out:
 	if (call_ip != file_ip)
 		xfs_irele(call_ip);
-	kmem_free(ppi);
+	kvfree(ppi);
 	return error;
 }
 



* [PATCH 3/4] xfs: pass the attr value to put_listent when possible
  2023-02-16 20:26 ` [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl Darrick J. Wong
  2023-02-16 20:40   ` [PATCH 1/4] xfs: fix multiple problems when doing getparents by handle Darrick J. Wong
  2023-02-16 20:41   ` [PATCH 2/4] xfs: use kvalloc for the parent pointer info buffer Darrick J. Wong
@ 2023-02-16 20:41   ` Darrick J. Wong
  2023-02-16 20:41   ` [PATCH 4/4] xfs: replace the XFS_IOC_GETPARENTS backend Darrick J. Wong
  3 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:41 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Pass the attr value to put_listent when we have local xattrs or
shortform xattrs.
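
A minimal sketch of the new calling convention, using hypothetical
model_* types rather than the real xfs_attr_list_context: the lister
hands the callback a value pointer when the value is stored inline, and
NULL when it lives in remote blocks and would need a separate lookup.

```c
#include <stddef.h>

/* Hypothetical, simplified attr entry. */
struct model_attr {
	const char	*name;
	const void	*local_value;	/* NULL if the value is remote */
	int		valuelen;
};

typedef void (*model_put_listent_fn)(void *priv, const char *name,
		const void *value, int valuelen);

/* Walk the entries, passing each inline value straight to the callback. */
void
model_attr_list(const struct model_attr *attrs, size_t nr,
		model_put_listent_fn put_listent, void *priv)
{
	for (size_t i = 0; i < nr; i++)
		put_listent(priv, attrs[i].name, attrs[i].local_value,
				attrs[i].valuelen);
}

/* Example callback: count values that arrived without a second lookup. */
struct model_counts {
	int	with_value;
	int	need_lookup;
};

void
model_count_listent(void *priv, const char *name, const void *value,
		int valuelen)
{
	struct model_counts	*c = priv;

	(void)name;
	(void)valuelen;
	if (value)
		c->with_value++;
	else
		c->need_lookup++;
}
```

Callbacks that only care about names can ignore the extra argument,
which is why most of the put_listent implementations in this patch just
grow a parameter.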

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.h    |    5 +++--
 fs/xfs/libxfs/xfs_attr_sf.h |    1 +
 fs/xfs/scrub/attr.c         |    8 ++++++++
 fs/xfs/xfs_attr_list.c      |    8 +++++++-
 fs/xfs/xfs_ioctl.c          |    1 +
 fs/xfs/xfs_xattr.c          |    1 +
 6 files changed, 21 insertions(+), 3 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index d6d23cf19ade..02a20b948c8f 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -47,8 +47,9 @@ struct xfs_attrlist_cursor_kern {
 
 
 /* void; state communicated via *context */
-typedef void (*put_listent_func_t)(struct xfs_attr_list_context *, int,
-			      unsigned char *, int, int);
+typedef void (*put_listent_func_t)(struct xfs_attr_list_context *context,
+		int flags, unsigned char *name, int namelen, void *value,
+		int valuelen);
 
 struct xfs_attr_list_context {
 	struct xfs_trans	*tp;
diff --git a/fs/xfs/libxfs/xfs_attr_sf.h b/fs/xfs/libxfs/xfs_attr_sf.h
index 37578b369d9b..c6e259791bc3 100644
--- a/fs/xfs/libxfs/xfs_attr_sf.h
+++ b/fs/xfs/libxfs/xfs_attr_sf.h
@@ -24,6 +24,7 @@ typedef struct xfs_attr_sf_sort {
 	uint8_t		flags;		/* flags bits (see xfs_attr_leaf.h) */
 	xfs_dahash_t	hash;		/* this entry's hash value */
 	unsigned char	*name;		/* name value, pointer into buffer */
+	void		*value;
 } xfs_attr_sf_sort_t;
 
 #define XFS_ATTR_SF_ENTSIZE_MAX			/* max space for name&value */ \
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index 2a79a13cb600..00682006d0d3 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -109,6 +109,7 @@ xchk_xattr_listent(
 	int				flags,
 	unsigned char			*name,
 	int				namelen,
+	void				*value,
 	int				valuelen)
 {
 	struct xchk_xattr		*sx;
@@ -134,6 +135,13 @@ xchk_xattr_listent(
 		return;
 	}
 
+	/*
+	 * Shortform and local attrs don't require external lookups to retrieve
+	 * the value, so there's nothing else to check here.
+	 */
+	if (value)
+		return;
+
 	/*
 	 * Try to allocate enough memory to extrat the attr value.  If that
 	 * doesn't work, we overload the seen_enough variable to convey
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index a51f7f13a352..8e3891b96736 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -94,6 +94,7 @@ xfs_attr_shortform_list(
 					     sfe->flags,
 					     sfe->nameval,
 					     (int)sfe->namelen,
+					     &sfe->nameval[sfe->namelen],
 					     (int)sfe->valuelen);
 			/*
 			 * Either search callback finished early or
@@ -139,6 +140,7 @@ xfs_attr_shortform_list(
 		sbp->name = sfe->nameval;
 		sbp->namelen = sfe->namelen;
 		/* These are bytes, and both on-disk, don't endian-flip */
+		sbp->value = &sfe->nameval[sfe->namelen];
 		sbp->valuelen = sfe->valuelen;
 		sbp->flags = sfe->flags;
 		sfe = xfs_attr_sf_nextentry(sfe);
@@ -189,6 +191,7 @@ xfs_attr_shortform_list(
 				     sbp->flags,
 				     sbp->name,
 				     sbp->namelen,
+				     sbp->value,
 				     sbp->valuelen);
 		if (context->seen_enough)
 			break;
@@ -443,6 +446,7 @@ xfs_attr3_leaf_list_int(
 	 */
 	for (; i < ichdr.count; entry++, i++) {
 		char *name;
+		void *value;
 		int namelen, valuelen;
 
 		if (be32_to_cpu(entry->hashval) != cursor->hashval) {
@@ -460,6 +464,7 @@ xfs_attr3_leaf_list_int(
 			name_loc = xfs_attr3_leaf_name_local(leaf, i);
 			name = name_loc->nameval;
 			namelen = name_loc->namelen;
+			value = &name_loc->nameval[name_loc->namelen];
 			valuelen = be16_to_cpu(name_loc->valuelen);
 		} else {
 			xfs_attr_leaf_name_remote_t *name_rmt;
@@ -467,6 +472,7 @@ xfs_attr3_leaf_list_int(
 			name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
 			name = name_rmt->name;
 			namelen = name_rmt->namelen;
+			value = NULL;
 			valuelen = be32_to_cpu(name_rmt->valuelen);
 		}
 
@@ -475,7 +481,7 @@ xfs_attr3_leaf_list_int(
 						       entry->flags)))
 			return -EFSCORRUPTED;
 		context->put_listent(context, entry->flags,
-					      name, namelen, valuelen);
+					      name, namelen, value, valuelen);
 		if (context->seen_enough)
 			break;
 		cursor->offset++;
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 19f71d6eb561..e6d1e69c6d4a 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -308,6 +308,7 @@ xfs_ioc_attr_put_listent(
 	int			flags,
 	unsigned char		*name,
 	int			namelen,
+	void			*value,
 	int			valuelen)
 {
 	struct xfs_attrlist	*alist = context->buffer;
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index ddc2db5d6f73..85edd7e05fde 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -227,6 +227,7 @@ xfs_xattr_put_listent(
 	int		flags,
 	unsigned char	*name,
 	int		namelen,
+	void		*value,
 	int		valuelen)
 {
 	char *prefix;



* [PATCH 4/4] xfs: replace the XFS_IOC_GETPARENTS backend
  2023-02-16 20:26 ` [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 20:41   ` [PATCH 3/4] xfs: pass the attr value to put_listent when possible Darrick J. Wong
@ 2023-02-16 20:41   ` Darrick J. Wong
  3 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:41 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that xfs_attr_list can pass local xattr values to the put_listent
function, build a new version of the GETPARENTS backend that supplies a
custom put_listent function to format parent pointer info directly into
the caller's buffer.  This uses a lot less memory and obviates the
two-pass logic of listing the attr names and then fetching each value,
since parent pointers aren't supposed to have remote values anyway.
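
The single-pass idea can be modelled in userspace roughly as follows.
All model_* names and MODEL_CORRUPT are hypothetical, and the length
check inside the callback is an assumption of this sketch (the patch
relies on the validators elsewhere).  The callback writes each record
straight into the caller's buffer and uses seen_enough to report either
"buffer full" (positive) or an error (negative).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_NAMELEN	256
#define MODEL_CORRUPT	(-1)	/* stand-in for -EFSCORRUPTED */

struct model_pptr {
	uint64_t	ino;
	uint32_t	gen;
	char		name[MODEL_NAMELEN];
};

struct model_getparent_ctx {
	struct model_pptr	*out;		/* caller's buffer */
	unsigned int		size;		/* capacity, in records */
	unsigned int		used;		/* records filled so far */
	int			seen_enough;	/* 0 go on, 1 full, <0 error */
};

/* Format one parent pointer directly into the caller's buffer. */
void
model_getparent_listent(struct model_getparent_ctx *gp, uint64_t ino,
		uint32_t gen, const void *value, int valuelen)
{
	struct model_pptr	*p;

	/* A missing (remote) value or a bad length is treated as corruption. */
	if (!value || valuelen <= 0 || valuelen >= MODEL_NAMELEN) {
		gp->seen_enough = MODEL_CORRUPT;
		return;
	}

	/* Buffer full: tell the lister to stop; more records remain. */
	if (gp->used >= gp->size) {
		gp->seen_enough = 1;
		return;
	}

	p = &gp->out[gp->used++];
	p->ino = ino;
	p->gen = gen;
	memcpy(p->name, value, (size_t)valuelen);
	memset(p->name + valuelen, 0, sizeof(p->name) - (size_t)valuelen);
}
```

A caller that sees seen_enough == 0 after the walk knows it reached the
end of the recordset, which is how the DONE flag is derived below.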

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_parent.c |   40 +++++++---
 fs/xfs/libxfs/xfs_parent.h |   21 +++++
 fs/xfs/xfs_ioctl.c         |    5 -
 fs/xfs/xfs_parent_utils.c  |  184 ++++++++++++++++++++++++--------------------
 fs/xfs/xfs_parent_utils.h  |    4 -
 fs/xfs/xfs_trace.c         |    1 
 fs/xfs/xfs_trace.h         |   73 +++++++++++++++++
 7 files changed, 227 insertions(+), 101 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 8cc264baf6c7..179b9bebaf25 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -29,16 +29,6 @@
 
 struct kmem_cache		*xfs_parent_intent_cache;
 
-/* Initializes a xfs_parent_ptr from an xfs_parent_name_rec */
-void
-xfs_init_parent_ptr(struct xfs_parent_ptr		*xpp,
-		    const struct xfs_parent_name_rec	*rec)
-{
-	xpp->xpp_ino = be64_to_cpu(rec->p_ino);
-	xpp->xpp_gen = be32_to_cpu(rec->p_gen);
-	xpp->xpp_diroffset = be32_to_cpu(rec->p_diroffset);
-}
-
 /*
  * Parent pointer attribute handling.
  *
@@ -115,6 +105,36 @@ xfs_init_parent_name_rec(
 	rec->p_diroffset = cpu_to_be32(p_diroffset);
 }
 
+/*
+ * Convert an ondisk parent_name xattr to its incore format.  If @value is
+ * NULL, set @irec->p_namelen to zero and leave @irec->p_name untouched.
+ */
+void
+xfs_parent_irec_from_disk(
+	struct xfs_parent_name_irec	*irec,
+	const struct xfs_parent_name_rec *rec,
+	const void			*value,
+	int				valuelen)
+{
+	irec->p_ino = be64_to_cpu(rec->p_ino);
+	irec->p_gen = be32_to_cpu(rec->p_gen);
+	irec->p_diroffset = be32_to_cpu(rec->p_diroffset);
+
+	if (!value) {
+		irec->p_namelen = 0;
+		return;
+	}
+
+	ASSERT(valuelen > 0);
+	ASSERT(valuelen < MAXNAMELEN);
+
+	valuelen = min(valuelen, MAXNAMELEN);
+
+	irec->p_namelen = valuelen;
+	memcpy(irec->p_name, value, valuelen);
+	memset(&irec->p_name[valuelen], 0, sizeof(irec->p_name) - valuelen);
+}
+
 int
 __xfs_parent_init(
 	struct xfs_mount		*mp,
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 4ffcb81d399c..f4f5887d1133 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -15,6 +15,25 @@ bool xfs_parent_namecheck(struct xfs_mount *mp,
 bool xfs_parent_valuecheck(struct xfs_mount *mp, const void *value,
 		size_t valuelen);
 
+/*
+ * Incore version of a parent pointer, also contains dirent name so callers
+ * can pass/obtain all the parent pointer information in a single structure
+ */
+struct xfs_parent_name_irec {
+	/* Key fields for looking up a particular parent pointer. */
+	xfs_ino_t		p_ino;
+	uint32_t		p_gen;
+	xfs_dir2_dataptr_t	p_diroffset;
+
+	/* Attributes of a parent pointer. */
+	uint8_t			p_namelen;
+	unsigned char		p_name[MAXNAMELEN];
+};
+
+void xfs_parent_irec_from_disk(struct xfs_parent_name_irec *irec,
+		const struct xfs_parent_name_rec *rec,
+		const void *value, int valuelen);
+
 /*
  * Dynamically allocd structure used to wrap the needed data to pass around
  * the defer ops machinery
@@ -32,8 +51,6 @@ struct xfs_parent_defer {
 void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
 			      struct xfs_inode *ip,
 			      uint32_t p_diroffset);
-void xfs_init_parent_ptr(struct xfs_parent_ptr *xpp,
-			 const struct xfs_parent_name_rec *rec);
 int __xfs_parent_init(struct xfs_mount *mp, bool grab_log,
 		struct xfs_parent_defer **parentp);
 
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index e6d1e69c6d4a..4c36ddd19dbd 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1758,11 +1758,8 @@ xfs_ioc_get_parent_pointer(
 		}
 	}
 
-	if (call_ip->i_ino == mp->m_sb.sb_rootino)
-		ppi->pi_flags |= XFS_PPTR_OFLAG_ROOT;
-
 	/* Get the parent pointers */
-	error = xfs_attr_get_parent_pointer(call_ip, ppi);
+	error = xfs_getparent_pointers(call_ip, ppi);
 	if (error)
 		goto out;
 
diff --git a/fs/xfs/xfs_parent_utils.c b/fs/xfs/xfs_parent_utils.c
index 771279731d42..5ff7d38bc375 100644
--- a/fs/xfs/xfs_parent_utils.c
+++ b/fs/xfs/xfs_parent_utils.c
@@ -23,104 +23,122 @@
 #include "xfs_da_btree.h"
 #include "xfs_parent_utils.h"
 
-/*
- * Get the parent pointers for a given inode
- *
- * Returns 0 on success and non zero on error
- */
+struct xfs_getparent_ctx {
+	struct xfs_attr_list_context	context;
+	struct xfs_parent_name_irec	pptr_irec;
+	struct xfs_pptr_info		*ppi;
+};
+
+static void
+xfs_getparent_listent(
+	struct xfs_attr_list_context	*context,
+	int				flags,
+	unsigned char			*name,
+	int				namelen,
+	void				*value,
+	int				valuelen)
+{
+	struct xfs_getparent_ctx	*gp;
+	struct xfs_pptr_info		*ppi;
+	struct xfs_parent_ptr		*pptr;
+	struct xfs_parent_name_irec	*irec;
+	struct xfs_mount		*mp = context->dp->i_mount;
+
+	gp = container_of(context, struct xfs_getparent_ctx, context);
+	ppi = gp->ppi;
+	irec = &gp->pptr_irec;
+
+	/* Ignore non-parent xattrs */
+	if (!(flags & XFS_ATTR_PARENT))
+		return;
+
+	/*
+	 * Report corruption for xattrs with any other flag set, or for a
+	 * parent pointer that has a remote value.  The attr list functions
+	 * filtered any INCOMPLETE attrs for us.
+	 */
+	if (XFS_IS_CORRUPT(mp,
+			   hweight32(flags & XFS_ATTR_NSP_ONDISK_MASK) > 1) ||
+	    XFS_IS_CORRUPT(mp, value == NULL)) {
+		context->seen_enough = -EFSCORRUPTED;
+		return;
+	}
+
+	/*
+	 * We found a parent pointer, but we've filled up the buffer.  Signal
+	 * to the caller that we did /not/ reach the end of the parent pointer
+	 * recordset.
+	 */
+	if (ppi->pi_ptrs_used >= ppi->pi_ptrs_size) {
+		context->seen_enough = 1;
+		return;
+	}
+
+	xfs_parent_irec_from_disk(&gp->pptr_irec, (void *)name, value,
+			valuelen);
+
+	trace_xfs_getparent_listent(context->dp, ppi, irec);
+
+	/* Format the parent pointer directly into the caller buffer. */
+	pptr = &ppi->pi_parents[ppi->pi_ptrs_used++];
+	pptr->xpp_ino = irec->p_ino;
+	pptr->xpp_gen = irec->p_gen;
+	pptr->xpp_diroffset = irec->p_diroffset;
+	pptr->xpp_rsvd = 0;
+
+	memcpy(pptr->xpp_name, irec->p_name, irec->p_namelen);
+	memset(pptr->xpp_name + irec->p_namelen, 0,
+			sizeof(pptr->xpp_name) - irec->p_namelen);
+}
+
+/* Retrieve the parent pointers for a given inode. */
 int
-xfs_attr_get_parent_pointer(
+xfs_getparent_pointers(
 	struct xfs_inode		*ip,
 	struct xfs_pptr_info		*ppi)
 {
+	struct xfs_getparent_ctx	*gp;
+	int				error;
 
-	struct xfs_attrlist		*alist;
-	struct xfs_attrlist_ent		*aent;
-	struct xfs_parent_ptr		*xpp;
-	struct xfs_parent_name_rec	*xpnr;
-	char				*namebuf;
-	unsigned int			namebuf_size;
-	int				name_len, i, error = 0;
-	unsigned int			lock_mode, flags = XFS_ATTR_PARENT;
-	struct xfs_attr_list_context	context;
-
-	/* Allocate a buffer to store the attribute names */
-	namebuf_size = sizeof(struct xfs_attrlist) +
-		       (ppi->pi_ptrs_size) * sizeof(struct xfs_attrlist_ent);
-	namebuf = kvzalloc(namebuf_size, GFP_KERNEL);
-	if (!namebuf)
+	gp = kzalloc(sizeof(struct xfs_getparent_ctx), GFP_KERNEL);
+	if (!gp)
 		return -ENOMEM;
-
-	memset(&context, 0, sizeof(struct xfs_attr_list_context));
-	error = xfs_ioc_attr_list_context_init(ip, namebuf, namebuf_size, 0,
-			&context);
-	if (error)
-		goto out_kfree;
+	gp->ppi = ppi;
+	gp->context.dp = ip;
+	gp->context.resynch = 1;
+	gp->context.put_listent = xfs_getparent_listent;
+	gp->context.bufsize = 1; /* always init cursor */
 
 	/* Copy the cursor provided by caller */
-	memcpy(&context.cursor, &ppi->pi_cursor,
-		sizeof(struct xfs_attrlist_cursor));
-	context.attr_filter = XFS_ATTR_PARENT;
+	memcpy(&gp->context.cursor, &ppi->pi_cursor,
+			sizeof(struct xfs_attrlist_cursor));
+	ppi->pi_ptrs_used = 0;
 
-	lock_mode = xfs_ilock_attr_map_shared(ip);
+	trace_xfs_getparent_pointers(ip, ppi, &gp->context.cursor);
 
-	error = xfs_attr_list_ilocked(&context);
+	error = xfs_attr_list(&gp->context);
 	if (error)
-		goto out_unlock;
-
-	alist = (struct xfs_attrlist *)namebuf;
-	for (i = 0; i < alist->al_count; i++) {
-		struct xfs_da_args args = {
-			.geo = ip->i_mount->m_attr_geo,
-			.whichfork = XFS_ATTR_FORK,
-			.dp = ip,
-			.namelen = sizeof(struct xfs_parent_name_rec),
-			.attr_filter = flags,
-		};
-
-		xpp = xfs_ppinfo_to_pp(ppi, i);
-		memset(xpp, 0, sizeof(struct xfs_parent_ptr));
-		aent = (struct xfs_attrlist_ent *)
-			&namebuf[alist->al_offset[i]];
-		xpnr = (struct xfs_parent_name_rec *)(aent->a_name);
-
-		if (aent->a_valuelen > XFS_PPTR_MAXNAMELEN) {
-			error = -EFSCORRUPTED;
-			goto out_unlock;
-		}
-		name_len = aent->a_valuelen;
-
-		args.name = (char *)xpnr;
-		args.hashval = xfs_da_hashname(args.name, args.namelen),
-		args.value = (unsigned char *)(xpp->xpp_name);
-		args.valuelen = name_len;
-
-		error = xfs_attr_get_ilocked(&args);
-		error = (error == -EEXIST ? 0 : error);
-		if (error) {
-			error = -EFSCORRUPTED;
-			goto out_unlock;
-		}
-
-		xfs_init_parent_ptr(xpp, xpnr);
-		if (!xfs_verify_ino(args.dp->i_mount, xpp->xpp_ino)) {
-			error = -EFSCORRUPTED;
-			goto out_unlock;
-		}
+		goto out_free;
+	if (gp->context.seen_enough < 0) {
+		error = gp->context.seen_enough;
+		goto out_free;
 	}
-	ppi->pi_ptrs_used = alist->al_count;
-	if (!alist->al_more)
+
+	/* Is this the root directory? */
+	if (ip->i_ino == ip->i_mount->m_sb.sb_rootino)
+		ppi->pi_flags |= XFS_PPTR_OFLAG_ROOT;
+
+	/*
+	 * If we did not run out of buffer space, then we reached the end of
+	 * the pptr recordset, so set the DONE flag.
+	 */
+	if (gp->context.seen_enough == 0)
 		ppi->pi_flags |= XFS_PPTR_OFLAG_DONE;
 
 	/* Update the caller with the current cursor position */
-	memcpy(&ppi->pi_cursor, &context.cursor,
+	memcpy(&ppi->pi_cursor, &gp->context.cursor,
 			sizeof(struct xfs_attrlist_cursor));
-
-out_unlock:
-	xfs_iunlock(ip, lock_mode);
-out_kfree:
-	kvfree(namebuf);
-
+out_free:
+	kfree(gp);
 	return error;
 }
-
diff --git a/fs/xfs/xfs_parent_utils.h b/fs/xfs/xfs_parent_utils.h
index ad60baee8b2a..9936c74e6f96 100644
--- a/fs/xfs/xfs_parent_utils.h
+++ b/fs/xfs/xfs_parent_utils.h
@@ -6,6 +6,6 @@
 #ifndef	__XFS_PARENT_UTILS_H__
 #define	__XFS_PARENT_UTILS_H__
 
-int xfs_attr_get_parent_pointer(struct xfs_inode *ip,
-				struct xfs_pptr_info *ppi);
+int xfs_getparent_pointers(struct xfs_inode *ip, struct xfs_pptr_info *ppi);
+
 #endif	/* __XFS_PARENT_UTILS_H__ */
diff --git a/fs/xfs/xfs_trace.c b/fs/xfs/xfs_trace.c
index 8a5dc1538aa8..c1f339481697 100644
--- a/fs/xfs/xfs_trace.c
+++ b/fs/xfs/xfs_trace.c
@@ -36,6 +36,7 @@
 #include "xfs_error.h"
 #include <linux/iomap.h>
 #include "xfs_iomap.h"
+#include "xfs_parent.h"
 
 /*
  * We include this last to have the helpers above available for the trace
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 6b0e9ae7c513..959aff69822d 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -74,6 +74,9 @@ struct xfs_inobt_rec_incore;
 union xfs_btree_ptr;
 struct xfs_dqtrx;
 struct xfs_icwalk;
+struct xfs_pptr_info;
+struct xfs_parent_name_irec;
+struct xfs_attrlist_cursor_kern;
 
 #define XFS_ATTR_FILTER_FLAGS \
 	{ XFS_ATTR_ROOT,	"ROOT" }, \
@@ -4317,6 +4320,76 @@ TRACE_EVENT(xfs_force_shutdown,
 		__entry->line_num)
 );
 
+TRACE_EVENT(xfs_getparent_listent,
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_pptr_info *ppi,
+	         const struct xfs_parent_name_irec *irec),
+	TP_ARGS(ip, ppi, irec),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, pused)
+		__field(unsigned int, psize)
+		__field(xfs_ino_t, parent_ino)
+		__field(unsigned int, parent_gen)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, irec->p_namelen)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->pused = ppi->pi_ptrs_used;
+		__entry->psize = ppi->pi_ptrs_size;
+		__entry->parent_ino = irec->p_ino;
+		__entry->parent_gen = irec->p_gen;
+		__entry->namelen = irec->p_namelen;
+		memcpy(__get_str(name), irec->p_name, irec->p_namelen);
+	),
+	TP_printk("dev %d:%d ino 0x%llx pptr %u/%u: parent_ino 0x%llx parent_gen 0x%x name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->pused,
+		  __entry->psize,
+		  __entry->parent_ino,
+		  __entry->parent_gen,
+		  __entry->namelen,
+		  __get_str(name))
+);
+
+TRACE_EVENT(xfs_getparent_pointers,
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_pptr_info *ppi,
+		 const struct xfs_attrlist_cursor_kern *cur),
+	TP_ARGS(ip, ppi, cur),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, flags)
+		__field(unsigned int, psize)
+		__field(unsigned int, hashval)
+		__field(unsigned int, blkno)
+		__field(unsigned int, offset)
+		__field(int, initted)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->flags = ppi->pi_flags;
+		__entry->psize = ppi->pi_ptrs_size;
+		__entry->hashval = cur->hashval;
+		__entry->blkno = cur->blkno;
+		__entry->offset = cur->offset;
+		__entry->initted = cur->initted;
+	),
+	TP_printk("dev %d:%d ino 0x%llx flags 0x%x psize %u cur_init? %d hashval 0x%x blkno %u offset %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->flags,
+		  __entry->psize,
+		  __entry->initted,
+		  __entry->hashval,
+		  __entry->blkno,
+		  __entry->offset)
+);
+
 #endif /* _TRACE_XFS_H */
 
 #undef TRACE_INCLUDE_PATH



* [PATCH 01/23] xfs: manage inode DONTCACHE status at irele time
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
@ 2023-02-16 20:42   ` Darrick J. Wong
  2023-02-16 20:42   ` [PATCH 02/23] xfs: make checking directory dotdot entries more reliable Darrick J. Wong
                     ` (21 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:42 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Right now, there are statements scattered all over the online fsck
codebase about how we can't use XFS_IGET_DONTCACHE because of concerns
about scrub's unusual practice of releasing inodes with transactions
held.

However, iget is the wrong place to handle this -- the DONTCACHE state
doesn't matter at all until we try to *release* the inode, and here we
get things wrong in multiple ways:

First, if we /do/ have a transaction, we must NOT drop the inode,
because the inode could have dirty pages, dropping the inode will
trigger writeback, and writeback can trigger a nested transaction.

Second, if the inode already had an active reference and the DONTCACHE
flag set, the icache hit when scrub grabs another ref will not clear
DONTCACHE.  This is sort of by design, since DONTCACHE is now used to
initiate cache drops so that sysadmins can change a file's access mode
between pagecache and DAX.

Third, if we do actually have the last active reference to the inode, we
can set DONTCACHE to avoid polluting the cache.  This is the /one/ case
where we actually want that flag.

Create an xchk_irele helper to encode all that logic and switch the
online fsck code to use it.  Since this now means that nearly all
scrubbers use the same xfs_iget flags, we can wrap them too.
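
The three cases above boil down to a small decision table.  The sketch
below is a hypothetical userspace model of that policy, not the helper
itself; in particular, the "leave alone" outcome for an inode that
others still reference is my reading of the second case rather than
something the diff spells out.

```c
#include <stdbool.h>

enum model_irele_action {
	MODEL_CLEAR_DONTCACHE,	/* in a transaction: leave it to the LRU */
	MODEL_SET_DONTCACHE,	/* last reference, no transaction: drop it */
	MODEL_LEAVE_ALONE,	/* others hold references: don't touch */
};

/* Decide what to do with the DONTCACHE state when releasing an inode. */
enum model_irele_action
model_irele_policy(bool in_transaction, int active_refs)
{
	/* Dropping could trigger writeback, which may need a transaction. */
	if (in_transaction)
		return MODEL_CLEAR_DONTCACHE;

	/* We hold the only reference, so avoid polluting the cache. */
	if (active_refs == 1)
		return MODEL_SET_DONTCACHE;

	return MODEL_LEAVE_ALONE;
}
```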

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/common.c |   52 +++++++++++++++++++++++++++++++++++++++++++++----
 fs/xfs/scrub/common.h |    3 +++
 fs/xfs/scrub/dir.c    |    2 +-
 fs/xfs/scrub/parent.c |    9 ++++----
 fs/xfs/scrub/scrub.c  |    2 +-
 5 files changed, 57 insertions(+), 11 deletions(-)


diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 613260b04a3d..03039d4ade16 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -625,6 +625,16 @@ xchk_checkpoint_log(
 	return 0;
 }
 
+/* Verify that an inode is allocated ondisk, then return its cached inode. */
+int
+xchk_iget(
+	struct xfs_scrub	*sc,
+	xfs_ino_t		inum,
+	struct xfs_inode	**ipp)
+{
+	return xfs_iget(sc->mp, sc->tp, inum, XFS_IGET_UNTRUSTED, 0, ipp);
+}
+
 /*
  * Given an inode and the scrub control structure, grab either the
  * inode referenced in the control structure or the inode passed in.
@@ -649,8 +659,7 @@ xchk_get_inode(
 	/* Look up the inode, see if the generation number matches. */
 	if (xfs_internal_inum(mp, sc->sm->sm_ino))
 		return -ENOENT;
-	error = xfs_iget(mp, NULL, sc->sm->sm_ino,
-			XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE, 0, &ip);
+	error = xchk_iget(sc, sc->sm->sm_ino, &ip);
 	switch (error) {
 	case -ENOENT:
 		/* Inode doesn't exist, just bail out. */
@@ -672,7 +681,7 @@ xchk_get_inode(
 		 * that it no longer exists.
 		 */
 		error = xfs_imap(sc->mp, sc->tp, sc->sm->sm_ino, &imap,
-				XFS_IGET_UNTRUSTED | XFS_IGET_DONTCACHE);
+				XFS_IGET_UNTRUSTED);
 		if (error)
 			return -ENOENT;
 		error = -EFSCORRUPTED;
@@ -685,7 +694,7 @@ xchk_get_inode(
 		return error;
 	}
 	if (VFS_I(ip)->i_generation != sc->sm->sm_gen) {
-		xfs_irele(ip);
+		xchk_irele(sc, ip);
 		return -ENOENT;
 	}
 
@@ -693,6 +702,41 @@ xchk_get_inode(
 	return 0;
 }
 
+/* Release an inode, possibly dropping it in the process. */
+void
+xchk_irele(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip)
+{
+	if (current->journal_info != NULL) {
+		ASSERT(current->journal_info == sc->tp);
+
+		/*
+		 * If we are in a transaction, we /cannot/ drop the inode
+		 * ourselves, because the VFS will trigger writeback, which
+		 * can require a transaction.  Clear DONTCACHE to force the
+		 * inode to the LRU, where someone else can take care of
+		 * dropping it.
+		 *
+		 * Note that when we grabbed our reference to the inode, it
+		 * could have had an active ref and DONTCACHE set if a sysadmin
+		 * is trying to coerce a change in file access mode.  icache
+		 * hits do not clear DONTCACHE, so we must do it here.
+		 */
+		spin_lock(&VFS_I(ip)->i_lock);
+		VFS_I(ip)->i_state &= ~I_DONTCACHE;
+		spin_unlock(&VFS_I(ip)->i_lock);
+	} else if (atomic_read(&VFS_I(ip)->i_count) == 1) {
+		/*
+		 * If this is the last reference to the inode and the caller
+		 * permits it, set DONTCACHE to avoid thrashing.
+		 */
+		d_mark_dontcache(VFS_I(ip));
+	}
+
+	xfs_irele(ip);
+}
+
 /* Set us up to scrub a file's contents. */
 int
 xchk_setup_inode_contents(
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index b73648d81d23..1c4525d97939 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -136,6 +136,9 @@ int xchk_get_inode(struct xfs_scrub *sc);
 int xchk_setup_inode_contents(struct xfs_scrub *sc, unsigned int resblks);
 void xchk_buffer_recheck(struct xfs_scrub *sc, struct xfs_buf *bp);
 
+int xchk_iget(struct xfs_scrub *sc, xfs_ino_t inum, struct xfs_inode **ipp);
+void xchk_irele(struct xfs_scrub *sc, struct xfs_inode *ip);
+
 /*
  * Don't bother cross-referencing if we already found corruption or cross
  * referencing discrepancies.
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index d1b0f23c2c59..677b21c3c865 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -86,7 +86,7 @@ xchk_dir_check_ftype(
 			xfs_mode_to_ftype(VFS_I(ip)->i_mode));
 	if (ino_dtype != dtype)
 		xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset);
-	xfs_irele(ip);
+	xchk_irele(sdc->sc, ip);
 out:
 	return error;
 }
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index d8dff3fd8053..2696bb49324a 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -131,7 +131,6 @@ xchk_parent_validate(
 	xfs_ino_t		dnum,
 	bool			*try_again)
 {
-	struct xfs_mount	*mp = sc->mp;
 	struct xfs_inode	*dp = NULL;
 	xfs_nlink_t		expected_nlink;
 	xfs_nlink_t		nlink;
@@ -168,7 +167,7 @@ xchk_parent_validate(
 	 * -EFSCORRUPTED or -EFSBADCRC then the parent is corrupt which is a
 	 *  cross referencing error.  Any other error is an operational error.
 	 */
-	error = xfs_iget(mp, sc->tp, dnum, XFS_IGET_UNTRUSTED, 0, &dp);
+	error = xchk_iget(sc, dnum, &dp);
 	if (error == -EINVAL || error == -ENOENT) {
 		error = -EFSCORRUPTED;
 		xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error);
@@ -236,11 +235,11 @@ xchk_parent_validate(
 
 	/* Drat, parent changed.  Try again! */
 	if (dnum != dp->i_ino) {
-		xfs_irele(dp);
+		xchk_irele(sc, dp);
 		*try_again = true;
 		return 0;
 	}
-	xfs_irele(dp);
+	xchk_irele(sc, dp);
 
 	/*
 	 * '..' didn't change, so check that there was only one entry
@@ -253,7 +252,7 @@ xchk_parent_validate(
 out_unlock:
 	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
 out_rele:
-	xfs_irele(dp);
+	xchk_irele(sc, dp);
 out:
 	return error;
 }
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 07a7a75f987f..752cb4fbd26f 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -166,7 +166,7 @@ xchk_teardown(
 			xfs_iunlock(sc->ip, sc->ilock_flags);
 		if (sc->ip != ip_in &&
 		    !xfs_internal_inum(sc->mp, sc->ip->i_ino))
-			xfs_irele(sc->ip);
+			xchk_irele(sc, sc->ip);
 		sc->ip = NULL;
 	}
 	if (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR)



* [PATCH 02/23] xfs: make checking directory dotdot entries more reliable
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
  2023-02-16 20:42   ` [PATCH 01/23] xfs: manage inode DONTCACHE status at irele time Darrick J. Wong
@ 2023-02-16 20:42   ` Darrick J. Wong
  2023-02-16 20:42   ` [PATCH 03/23] xfs: xfs_iget in the directory scrubber needs to use UNTRUSTED Darrick J. Wong
                     ` (20 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:42 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The current directory parent scrubbing code gives up too easily.  It
waits a short while for the (alleged) parent directory's IOLOCK while
refusing to release the child directory's IOLOCK, and then bails out to
userspace.  Instead, we could just cycle both locks until we get both
or the calling process receives a fatal signal.

Note that because the usual sequence is to take IOLOCKs before grabbing
a transaction, we have to use the _nowait variants on both inodes to
avoid an ABBA deadlock.  Since parent pointer checking is the only place
in scrub that needs this kind of functionality, move it to parent.c as a
private function.

Furthermore, if the child directory's parent changes during the lock
cycling, we know that whoever moved the child has stamped the correct
parent into the dotdot entry, so we can conclude that the parent entry
is correct.

This eliminates an entire source of -EDEADLOCK-based "retry harder"
scrub executions.
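
The lock cycling can be sketched in userspace as follows.  This is an
illustrative model under stated assumptions, not the kernel function:
struct model_lock, model_lock_two and stop_after are hypothetical
names, the locks are plain C11 atomics rather than IOLOCKs, both locks
are exclusive, and the model starts with neither lock held (the kernel
code first drops the child's IOLOCK).

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical trylock-only lock; stands in for an inode IOLOCK. */
struct model_lock {
	atomic_bool	held;
};

static bool
model_trylock(struct model_lock *l)
{
	/* We got the lock only if it was previously free. */
	return !atomic_exchange(&l->held, true);
}

static void
model_unlock(struct model_lock *l)
{
	atomic_store(&l->held, false);
}

/* Caller-supplied check, like polling for a fatal signal. */
typedef bool (*should_stop_fn)(void *priv);

/*
 * Take @parent and @child without ever blocking while holding either.
 * Returns 0 with both held, or -EINTR with neither held.
 */
static int
model_lock_two(
	struct model_lock	*parent,
	struct model_lock	*child,
	should_stop_fn		should_stop,
	void			*priv)
{
	for (;;) {
		if (should_stop(priv))
			return -EINTR;

		if (model_trylock(parent)) {
			if (model_trylock(child))
				return 0;
			/* Back off fully so the other side can progress. */
			model_unlock(parent);
		}
		/* The kernel code sleeps for a tick here before retrying. */
	}
}

/* Example stop condition: give up after a fixed number of polls. */
static bool
stop_after(void *priv)
{
	int	*polls_left = priv;

	return (*polls_left)-- <= 0;
}
```

Because neither lock is ever waited for while the other is held, two
threads that want the same pair in opposite orders cannot deadlock;
the cost is that the loop may spin until the stop condition fires.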

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/common.c |   22 -----
 fs/xfs/scrub/common.h |    1 
 fs/xfs/scrub/parent.c |  203 +++++++++++++++++++++++--------------------------
 3 files changed, 97 insertions(+), 129 deletions(-)


diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 03039d4ade16..d523cbb2c90b 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -908,28 +908,6 @@ xchk_metadata_inode_forks(
 	return 0;
 }
 
-/*
- * Try to lock an inode in violation of the usual locking order rules.  For
- * example, trying to get the IOLOCK while in transaction context, or just
- * plain breaking AG-order or inode-order inode locking rules.  Either way,
- * the only way to avoid an ABBA deadlock is to use trylock and back off if
- * we can't.
- */
-int
-xchk_ilock_inverted(
-	struct xfs_inode	*ip,
-	uint			lock_mode)
-{
-	int			i;
-
-	for (i = 0; i < 20; i++) {
-		if (xfs_ilock_nowait(ip, lock_mode))
-			return 0;
-		delay(1);
-	}
-	return -EDEADLOCK;
-}
-
 /* Pause background reaping of resources. */
 void
 xchk_stop_reaping(
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 1c4525d97939..367f754c5cef 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -150,7 +150,6 @@ static inline bool xchk_skip_xref(struct xfs_scrub_metadata *sm)
 }
 
 int xchk_metadata_inode_forks(struct xfs_scrub *sc);
-int xchk_ilock_inverted(struct xfs_inode *ip, uint lock_mode);
 void xchk_stop_reaping(struct xfs_scrub *sc);
 void xchk_start_reaping(struct xfs_scrub *sc);
 
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index 2696bb49324a..0c23fd49716b 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -120,6 +120,48 @@ xchk_parent_count_parent_dentries(
 	return error;
 }
 
+/*
+ * Try to iolock the parent dir @dp in shared mode and the child dir @sc->ip
+ * exclusively.
+ */
+STATIC int
+xchk_parent_lock_two_dirs(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp)
+{
+	int			error = 0;
+
+	/* Callers shouldn't do this, but protect ourselves anyway. */
+	if (dp == sc->ip) {
+		ASSERT(dp != sc->ip);
+		return -EINVAL;
+	}
+
+	xfs_iunlock(sc->ip, sc->ilock_flags);
+	sc->ilock_flags = 0;
+	while (true) {
+		if (xchk_should_terminate(sc, &error))
+			return error;
+
+		/*
+		 * Normal XFS takes the IOLOCK before grabbing a transaction.
+		 * Scrub holds a transaction, which means that we can't block
+		 * on either IOLOCK.
+		 */
+		if (xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED)) {
+			if (xfs_ilock_nowait(sc->ip, XFS_IOLOCK_EXCL)) {
+				sc->ilock_flags = XFS_IOLOCK_EXCL;
+				break;
+			}
+			xfs_iunlock(dp, XFS_IOLOCK_SHARED);
+		}
+
+		delay(1);
+	}
+
+	return 0;
+}
+
 /*
  * Given the inode number of the alleged parent of the inode being
  * scrubbed, try to validate that the parent has exactly one directory
@@ -128,23 +170,20 @@ xchk_parent_count_parent_dentries(
 STATIC int
 xchk_parent_validate(
 	struct xfs_scrub	*sc,
-	xfs_ino_t		dnum,
-	bool			*try_again)
+	xfs_ino_t		parent_ino)
 {
 	struct xfs_inode	*dp = NULL;
 	xfs_nlink_t		expected_nlink;
 	xfs_nlink_t		nlink;
 	int			error = 0;
 
-	*try_again = false;
-
 	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
-		goto out;
+		return 0;
 
 	/* '..' must not point to ourselves. */
-	if (sc->ip->i_ino == dnum) {
+	if (sc->ip->i_ino == parent_ino) {
 		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
-		goto out;
+		return 0;
 	}
 
 	/*
@@ -154,106 +193,80 @@ xchk_parent_validate(
 	expected_nlink = VFS_I(sc->ip)->i_nlink == 0 ? 0 : 1;
 
 	/*
-	 * Grab this parent inode.  We release the inode before we
-	 * cancel the scrub transaction.  Since we're don't know a
-	 * priori that releasing the inode won't trigger eofblocks
-	 * cleanup (which allocates what would be a nested transaction)
-	 * if the parent pointer erroneously points to a file, we
-	 * can't use DONTCACHE here because DONTCACHE inodes can trigger
-	 * immediate inactive cleanup of the inode.
+	 * Grab the parent directory inode.  This must be released before we
+	 * cancel the scrub transaction.
 	 *
 	 * If _iget returns -EINVAL or -ENOENT then the parent inode number is
 	 * garbage and the directory is corrupt.  If the _iget returns
 	 * -EFSCORRUPTED or -EFSBADCRC then the parent is corrupt which is a
 	 *  cross referencing error.  Any other error is an operational error.
 	 */
-	error = xchk_iget(sc, dnum, &dp);
+	error = xchk_iget(sc, parent_ino, &dp);
 	if (error == -EINVAL || error == -ENOENT) {
 		error = -EFSCORRUPTED;
 		xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error);
-		goto out;
+		return error;
 	}
 	if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0, &error))
-		goto out;
+		return error;
 	if (dp == sc->ip || !S_ISDIR(VFS_I(dp)->i_mode)) {
 		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
 		goto out_rele;
 	}
 
 	/*
-	 * We prefer to keep the inode locked while we lock and search
-	 * its alleged parent for a forward reference.  If we can grab
-	 * the iolock, validate the pointers and we're done.  We must
-	 * use nowait here to avoid an ABBA deadlock on the parent and
-	 * the child inodes.
+	 * We prefer to keep the inode locked while we lock and search its
+	 * alleged parent for a forward reference.  If we can grab the iolock
+	 * of the alleged parent, then we can move ahead to counting dirents
+	 * and checking nlinks.
+	 *
+	 * However, if we fail to iolock the alleged parent while holding the
+	 * child iolock, we have no way to tell if a blocking lock() would
+	 * result in an ABBA deadlock.  Release the lock on the child, then
+	 * try to lock the alleged parent and trylock the child.
 	 */
-	if (xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED)) {
-		error = xchk_parent_count_parent_dentries(sc, dp, &nlink);
-		if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0,
-				&error))
+	if (!xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED)) {
+		error = xchk_parent_lock_two_dirs(sc, dp);
+		if (error)
+			goto out_rele;
+
+		/*
+		 * Now that we've locked out updates to the child directory,
+		 * re-sample the expected nlink and the '..' dirent.
+		 */
+		expected_nlink = VFS_I(sc->ip)->i_nlink == 0 ? 0 : 1;
+
+		error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot,
+				&parent_ino, NULL);
+		if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+			goto out_unlock;
+
+		/*
+		 * After relocking the child directory, the '..' entry points
+		 * to a different parent than before.  This means someone moved
+		 * the child elsewhere in the directory tree, which means that
+		 * the parent link is now correct and we're done.
+		 */
+		if (parent_ino != dp->i_ino)
 			goto out_unlock;
-		if (nlink != expected_nlink)
-			xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
-		goto out_unlock;
 	}
 
-	/*
-	 * The game changes if we get here.  We failed to lock the parent,
-	 * so we're going to try to verify both pointers while only holding
-	 * one lock so as to avoid deadlocking with something that's actually
-	 * trying to traverse down the directory tree.
-	 */
-	xfs_iunlock(sc->ip, sc->ilock_flags);
-	sc->ilock_flags = 0;
-	error = xchk_ilock_inverted(dp, XFS_IOLOCK_SHARED);
-	if (error)
-		goto out_rele;
-
-	/* Go looking for our dentry. */
+	/* Look for a directory entry in the parent pointing to the child. */
 	error = xchk_parent_count_parent_dentries(sc, dp, &nlink);
 	if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0, &error))
 		goto out_unlock;
 
-	/* Drop the parent lock, relock this inode. */
-	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
-	error = xchk_ilock_inverted(sc->ip, XFS_IOLOCK_EXCL);
-	if (error)
-		goto out_rele;
-	sc->ilock_flags = XFS_IOLOCK_EXCL;
-
 	/*
-	 * If we're an unlinked directory, the parent /won't/ have a link
-	 * to us.  Otherwise, it should have one link.  We have to re-set
-	 * it here because we dropped the lock on sc->ip.
-	 */
-	expected_nlink = VFS_I(sc->ip)->i_nlink == 0 ? 0 : 1;
-
-	/* Look up '..' to see if the inode changed. */
-	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
-	if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
-		goto out_rele;
-
-	/* Drat, parent changed.  Try again! */
-	if (dnum != dp->i_ino) {
-		xchk_irele(sc, dp);
-		*try_again = true;
-		return 0;
-	}
-	xchk_irele(sc, dp);
-
-	/*
-	 * '..' didn't change, so check that there was only one entry
-	 * for us in the parent.
+	 * Ensure that the parent has as many links to the child as the child
+	 * thinks it has to the parent.
 	 */
 	if (nlink != expected_nlink)
 		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
-	return error;
 
 out_unlock:
 	xfs_iunlock(dp, XFS_IOLOCK_SHARED);
 out_rele:
 	xchk_irele(sc, dp);
-out:
 	return error;
 }
 
@@ -263,10 +276,8 @@ xchk_parent(
 	struct xfs_scrub	*sc)
 {
 	struct xfs_mount	*mp = sc->mp;
-	xfs_ino_t		dnum;
-	bool			try_again;
-	int			tries = 0;
-	int			error = 0;
+	xfs_ino_t		parent_ino;
+	int			error;
 
 	/*
 	 * If we're a directory, check that the '..' link points up to
@@ -278,7 +289,7 @@ xchk_parent(
 	/* We're not a special inode, are we? */
 	if (!xfs_verify_dir_ino(mp, sc->ip->i_ino)) {
 		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
-		goto out;
+		return 0;
 	}
 
 	/*
@@ -292,42 +303,22 @@ xchk_parent(
 	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
 
 	/* Look up '..' */
-	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &dnum, NULL);
+	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &parent_ino,
+			NULL);
 	if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
-		goto out;
-	if (!xfs_verify_dir_ino(mp, dnum)) {
+		return error;
+	if (!xfs_verify_dir_ino(mp, parent_ino)) {
 		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
-		goto out;
+		return 0;
 	}
 
 	/* Is this the root dir?  Then '..' must point to itself. */
 	if (sc->ip == mp->m_rootip) {
 		if (sc->ip->i_ino != mp->m_sb.sb_rootino ||
-		    sc->ip->i_ino != dnum)
+		    sc->ip->i_ino != parent_ino)
 			xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
-		goto out;
+		return 0;
 	}
 
-	do {
-		error = xchk_parent_validate(sc, dnum, &try_again);
-		if (error)
-			goto out;
-	} while (try_again && ++tries < 20);
-
-	/*
-	 * We gave it our best shot but failed, so mark this scrub
-	 * incomplete.  Userspace can decide if it wants to try again.
-	 */
-	if (try_again && tries == 20)
-		xchk_set_incomplete(sc);
-out:
-	/*
-	 * If we failed to lock the parent inode even after a retry, just mark
-	 * this scrub incomplete and return.
-	 */
-	if ((sc->flags & XCHK_TRY_HARDER) && error == -EDEADLOCK) {
-		error = 0;
-		xchk_set_incomplete(sc);
-	}
-	return error;
+	return xchk_parent_validate(sc, parent_ino);
 }



* [PATCH 03/23] xfs: xfs_iget in the directory scrubber needs to use UNTRUSTED
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
  2023-02-16 20:42   ` [PATCH 01/23] xfs: manage inode DONTCACHE status at irele time Darrick J. Wong
  2023-02-16 20:42   ` [PATCH 02/23] xfs: make checking directory dotdot entries more reliable Darrick J. Wong
@ 2023-02-16 20:42   ` Darrick J. Wong
  2023-02-16 20:42   ` [PATCH 04/23] xfs: always check the existence of a dirent's child inode Darrick J. Wong
                     ` (19 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:42 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In commit 4b80ac64450f, we tried to strengthen the directory scrubber by
using the iget call to detect directory entries that point to
unallocated inodes.  Unfortunately, that commit neglected to pass
XFS_IGET_UNTRUSTED to xfs_iget, so we don't check the inode btree first.
If the inode number points to something that isn't even an inode
cluster, iget will throw corruption errors and return -EFSCORRUPTED,
which means that we fail to mark the directory corrupt.
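
The way the scrubber sorts iget results, as described in the code
comment that this patch touches, can be summarised with a small
classifier.  This is a hedged sketch only: classify_iget_error, the
iget_verdict enum and the MODEL_* error values are made up for
illustration and are not kernel definitions.

```c
#include <assert.h>
#include <errno.h>

/* Placeholder values standing in for the XFS-specific error codes. */
#define MODEL_EFSCORRUPTED	1001
#define MODEL_EFSBADCRC		1002

enum iget_verdict {
	IGET_OK,		/* inode was loaded */
	IGET_DIR_CORRUPT,	/* inode number is garbage: dir is corrupt */
	IGET_XREF_ERROR,	/* target inode is corrupt: xref problem */
	IGET_OPERATIONAL,	/* anything else, e.g. out of memory */
};

static enum iget_verdict
classify_iget_error(int error)
{
	switch (error) {
	case 0:
		return IGET_OK;
	case -EINVAL:
	case -ENOENT:
		return IGET_DIR_CORRUPT;
	case -MODEL_EFSCORRUPTED:
	case -MODEL_EFSBADCRC:
		return IGET_XREF_ERROR;
	default:
		return IGET_OPERATIONAL;
	}
}
```

The point of the untrusted lookup is to make a bad inode number land in
the IGET_DIR_CORRUPT bucket instead of surfacing as corruption of the
target.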

Fixes: 4b80ac64450f ("xfs: scrub should mark a directory corrupt if any entries cannot be iget'd")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/dir.c |   10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)


diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 677b21c3c865..ec0c73e0eb0c 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -59,19 +59,15 @@ xchk_dir_check_ftype(
 	}
 
 	/*
-	 * Grab the inode pointed to by the dirent.  We release the
-	 * inode before we cancel the scrub transaction.  Since we're
-	 * don't know a priori that releasing the inode won't trigger
-	 * eofblocks cleanup (which allocates what would be a nested
-	 * transaction), we can't use DONTCACHE here because DONTCACHE
-	 * inodes can trigger immediate inactive cleanup of the inode.
+	 * Grab the inode pointed to by the dirent.  Use UNTRUSTED here to
+	 * check the allocation status of the inode in the inode btrees.
 	 *
 	 * If _iget returns -EINVAL or -ENOENT then the child inode number is
 	 * garbage and the directory is corrupt.  If the _iget returns
 	 * -EFSCORRUPTED or -EFSBADCRC then the child is corrupt which is a
 	 *  cross referencing error.  Any other error is an operational error.
 	 */
-	error = xfs_iget(mp, sdc->sc->tp, inum, 0, 0, &ip);
+	error = xchk_iget(sdc->sc, inum, &ip);
 	if (error == -EINVAL || error == -ENOENT) {
 		error = -EFSCORRUPTED;
 		xchk_fblock_process_error(sdc->sc, XFS_DATA_FORK, 0, &error);



* [PATCH 04/23] xfs: always check the existence of a dirent's child inode
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 20:42   ` [PATCH 03/23] xfs: xfs_iget in the directory scrubber needs to use UNTRUSTED Darrick J. Wong
@ 2023-02-16 20:42   ` Darrick J. Wong
  2023-02-16 20:43   ` [PATCH 05/23] xfs: remove the for_each_xbitmap_ helpers Darrick J. Wong
                     ` (18 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:42 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

When we're scrubbing directory entries, we always need to iget the child
inode to make sure that the inode pointer points to a valid inode.  The
original directory scrub code (commit a5c46e5e8912) only set us up to do
this for ftype=1 filesystems, which is not sufficient; and then commit
4b80ac64450f made it worse by exempting the dot and dotdot entries.

Sorta-fixes: a5c46e5e8912 ("xfs: scrub directory metadata")
Sorta-fixes: 4b80ac64450f ("xfs: scrub should mark a directory corrupt if any entries cannot be iget'd")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/dir.c |   75 ++++++++++++++++++++--------------------------------
 1 file changed, 29 insertions(+), 46 deletions(-)


diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index ec0c73e0eb0c..8076e7620734 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -39,52 +39,28 @@ struct xchk_dir_ctx {
 };
 
 /* Check that an inode's mode matches a given DT_ type. */
-STATIC int
+STATIC void
 xchk_dir_check_ftype(
 	struct xchk_dir_ctx	*sdc,
 	xfs_fileoff_t		offset,
-	xfs_ino_t		inum,
+	struct xfs_inode	*ip,
 	int			dtype)
 {
 	struct xfs_mount	*mp = sdc->sc->mp;
-	struct xfs_inode	*ip;
 	int			ino_dtype;
-	int			error = 0;
 
 	if (!xfs_has_ftype(mp)) {
 		if (dtype != DT_UNKNOWN && dtype != DT_DIR)
 			xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
 					offset);
-		goto out;
+		return;
 	}
 
-	/*
-	 * Grab the inode pointed to by the dirent.  Use UNTRUSTED here to
-	 * check the allocation status of the inode in the inode btrees.
-	 *
-	 * If _iget returns -EINVAL or -ENOENT then the child inode number is
-	 * garbage and the directory is corrupt.  If the _iget returns
-	 * -EFSCORRUPTED or -EFSBADCRC then the child is corrupt which is a
-	 *  cross referencing error.  Any other error is an operational error.
-	 */
-	error = xchk_iget(sdc->sc, inum, &ip);
-	if (error == -EINVAL || error == -ENOENT) {
-		error = -EFSCORRUPTED;
-		xchk_fblock_process_error(sdc->sc, XFS_DATA_FORK, 0, &error);
-		goto out;
-	}
-	if (!xchk_fblock_xref_process_error(sdc->sc, XFS_DATA_FORK, offset,
-			&error))
-		goto out;
-
 	/* Convert mode to the DT_* values that dir_emit uses. */
 	ino_dtype = xfs_dir3_get_dtype(mp,
 			xfs_mode_to_ftype(VFS_I(ip)->i_mode));
 	if (ino_dtype != dtype)
 		xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset);
-	xchk_irele(sdc->sc, ip);
-out:
-	return error;
 }
 
 /*
@@ -105,17 +81,17 @@ xchk_dir_actor(
 	unsigned		type)
 {
 	struct xfs_mount	*mp;
+	struct xfs_inode	*dp;
 	struct xfs_inode	*ip;
 	struct xchk_dir_ctx	*sdc;
 	struct xfs_name		xname;
 	xfs_ino_t		lookup_ino;
 	xfs_dablk_t		offset;
-	bool			checked_ftype = false;
 	int			error = 0;
 
 	sdc = container_of(dir_iter, struct xchk_dir_ctx, dir_iter);
-	ip = sdc->sc->ip;
-	mp = ip->i_mount;
+	dp = sdc->sc->ip;
+	mp = dp->i_mount;
 	offset = xfs_dir2_db_to_da(mp->m_dir_geo,
 			xfs_dir2_dataptr_to_db(mp->m_dir_geo, pos));
 
@@ -136,11 +112,7 @@ xchk_dir_actor(
 
 	if (!strncmp(".", name, namelen)) {
 		/* If this is "." then check that the inum matches the dir. */
-		if (xfs_has_ftype(mp) && type != DT_DIR)
-			xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
-					offset);
-		checked_ftype = true;
-		if (ino != ip->i_ino)
+		if (ino != dp->i_ino)
 			xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
 					offset);
 	} else if (!strncmp("..", name, namelen)) {
@@ -148,11 +120,7 @@ xchk_dir_actor(
 		 * If this is ".." in the root inode, check that the inum
 		 * matches this dir.
 		 */
-		if (xfs_has_ftype(mp) && type != DT_DIR)
-			xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
-					offset);
-		checked_ftype = true;
-		if (ip->i_ino == mp->m_sb.sb_rootino && ino != ip->i_ino)
+		if (dp->i_ino == mp->m_sb.sb_rootino && ino != dp->i_ino)
 			xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
 					offset);
 	}
@@ -162,7 +130,7 @@ xchk_dir_actor(
 	xname.len = namelen;
 	xname.type = XFS_DIR3_FT_UNKNOWN;
 
-	error = xfs_dir_lookup(sdc->sc->tp, ip, &xname, &lookup_ino, NULL);
+	error = xfs_dir_lookup(sdc->sc->tp, dp, &xname, &lookup_ino, NULL);
 	/* ENOENT means the hash lookup failed and the dir is corrupt */
 	if (error == -ENOENT)
 		error = -EFSCORRUPTED;
@@ -174,12 +142,27 @@ xchk_dir_actor(
 		goto out;
 	}
 
-	/* Verify the file type.  This function absorbs error codes. */
-	if (!checked_ftype) {
-		error = xchk_dir_check_ftype(sdc, offset, lookup_ino, type);
-		if (error)
-			goto out;
+	/*
+	 * Grab the inode pointed to by the dirent.  Use UNTRUSTED here to
+	 * check the allocation status of the inode in the inode btrees.
+	 *
+	 * If _iget returns -EINVAL or -ENOENT then the child inode number is
+	 * garbage and the directory is corrupt.  If the _iget returns
+	 * -EFSCORRUPTED or -EFSBADCRC then the child is corrupt which is a
+	 *  cross referencing error.  Any other error is an operational error.
+	 */
+	error = xchk_iget(sdc->sc, ino, &ip);
+	if (error == -EINVAL || error == -ENOENT) {
+		error = -EFSCORRUPTED;
+		xchk_fblock_process_error(sdc->sc, XFS_DATA_FORK, 0, &error);
+		goto out;
 	}
+	if (!xchk_fblock_xref_process_error(sdc->sc, XFS_DATA_FORK, offset,
+			&error))
+		goto out;
+
+	xchk_dir_check_ftype(sdc, offset, ip, type);
+	xchk_irele(sdc->sc, ip);
 out:
 	/*
 	 * A negative error code returned here is supposed to cause the



* [PATCH 05/23] xfs: remove the for_each_xbitmap_ helpers
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (3 preceding siblings ...)
  2023-02-16 20:42   ` [PATCH 04/23] xfs: always check the existence of a dirent's child inode Darrick J. Wong
@ 2023-02-16 20:43   ` Darrick J. Wong
  2023-02-16 20:43   ` [PATCH 06/23] xfs: drop the _safe behavior from the xbitmap foreach macro Darrick J. Wong
                     ` (17 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:43 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Remove the for_each_xbitmap_ macros in favor of proper iterator
functions.  We'll soon be switching this data structure over to an
interval tree implementation, which means that we can't allow callers to
modify the bitmap during iteration without telling us.
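
A callback-based walk of this shape can be sketched in userspace.  The
names below (model_bitmap, model_bitmap_walk, take_blocks) are
hypothetical and the bitmap is a plain array of ranges rather than the
kernel's list; the sketch only illustrates the iterator contract, in
which a nonzero return stops the walk and is passed back to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* One run of set bits. */
struct model_range {
	uint64_t	start;
	uint64_t	len;
};

/* Hypothetical read-only bitmap: an array of non-overlapping runs. */
struct model_bitmap {
	const struct model_range	*ranges;
	size_t				nr;
};

typedef int (*model_walk_fn)(uint64_t start, uint64_t len, void *priv);

/* Call @fn for every run; stop at the first nonzero return value. */
static int
model_bitmap_walk(
	const struct model_bitmap	*bitmap,
	model_walk_fn			fn,
	void				*priv)
{
	int				error = 0;

	for (size_t i = 0; i < bitmap->nr; i++) {
		error = fn(bitmap->ranges[i].start, bitmap->ranges[i].len,
				priv);
		if (error)
			break;
	}
	return error;
}

/* Example callback: count off blocks until a quota is met. */
struct take_ctx {
	uint64_t	wanted;
	uint64_t	taken;
};

static int
take_blocks(
	uint64_t	start,
	uint64_t	len,
	void		*priv)
{
	struct take_ctx	*tc = priv;
	uint64_t	room = tc->wanted - tc->taken;

	(void)start;
	tc->taken += len < room ? len : room;

	/* -ECANCELED means "stop early, but nothing went wrong". */
	return tc->taken == tc->wanted ? -ECANCELED : 0;
}
```

Since the callback receives copies of start and len rather than a
pointer to the range, it has no way to edit the bitmap behind the
iterator's back, which is the property the commit message asks for.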

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/agheader_repair.c |   89 ++++++++++++++++++++---------------
 fs/xfs/scrub/bitmap.c          |   59 +++++++++++++++++++++++
 fs/xfs/scrub/bitmap.h          |   22 ++++++---
 fs/xfs/scrub/repair.c          |  102 ++++++++++++++++++++++------------------
 4 files changed, 179 insertions(+), 93 deletions(-)


diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index d75d82151eeb..26bce2f12b09 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -486,10 +486,11 @@ xrep_agfl_walk_rmap(
 /* Strike out the blocks that are cross-linked according to the rmapbt. */
 STATIC int
 xrep_agfl_check_extent(
-	struct xrep_agfl	*ra,
 	uint64_t		start,
-	uint64_t		len)
+	uint64_t		len,
+	void			*priv)
 {
+	struct xrep_agfl	*ra = priv;
 	xfs_agblock_t		agbno = XFS_FSB_TO_AGBNO(ra->sc->mp, start);
 	xfs_agblock_t		last_agbno = agbno + len - 1;
 	int			error;
@@ -537,7 +538,6 @@ xrep_agfl_collect_blocks(
 	struct xrep_agfl	ra;
 	struct xfs_mount	*mp = sc->mp;
 	struct xfs_btree_cur	*cur;
-	struct xbitmap_range	*br, *n;
 	int			error;
 
 	ra.sc = sc;
@@ -578,11 +578,7 @@ xrep_agfl_collect_blocks(
 
 	/* Strike out the blocks that are cross-linked. */
 	ra.rmap_cur = xfs_rmapbt_init_cursor(mp, sc->tp, agf_bp, sc->sa.pag);
-	for_each_xbitmap_extent(br, n, agfl_extents) {
-		error = xrep_agfl_check_extent(&ra, br->start, br->len);
-		if (error)
-			break;
-	}
+	error = xbitmap_walk(agfl_extents, xrep_agfl_check_extent, &ra);
 	xfs_btree_del_cursor(ra.rmap_cur, error);
 	if (error)
 		goto out_bmp;
@@ -628,6 +624,43 @@ xrep_agfl_update_agf(
 			XFS_AGF_FLFIRST | XFS_AGF_FLLAST | XFS_AGF_FLCOUNT);
 }
 
+struct xrep_agfl_fill {
+	struct xbitmap		used_extents;
+	struct xfs_scrub	*sc;
+	__be32			*agfl_bno;
+	xfs_agblock_t		flcount;
+	unsigned int		fl_off;
+};
+
+/* Fill the AGFL with whatever blocks are in this extent. */
+static int
+xrep_agfl_fill(
+	uint64_t		start,
+	uint64_t		len,
+	void			*priv)
+{
+	struct xrep_agfl_fill	*af = priv;
+	struct xfs_scrub	*sc = af->sc;
+	xfs_fsblock_t		fsbno = start;
+	int			error;
+
+	while (fsbno < start + len && af->fl_off < af->flcount)
+		af->agfl_bno[af->fl_off++] =
+				cpu_to_be32(XFS_FSB_TO_AGBNO(sc->mp, fsbno++));
+
+	trace_xrep_agfl_insert(sc->mp, sc->sa.pag->pag_agno,
+			XFS_FSB_TO_AGBNO(sc->mp, start), len);
+
+	error = xbitmap_set(&af->used_extents, start, fsbno - start);
+	if (error)
+		return error;
+
+	if (af->fl_off == af->flcount)
+		return -ECANCELED;
+
+	return 0;
+}
+
 /* Write out a totally new AGFL. */
 STATIC void
 xrep_agfl_init_header(
@@ -636,13 +669,12 @@ xrep_agfl_init_header(
 	struct xbitmap		*agfl_extents,
 	xfs_agblock_t		flcount)
 {
+	struct xrep_agfl_fill	af = {
+		.sc		= sc,
+		.flcount	= flcount,
+	};
 	struct xfs_mount	*mp = sc->mp;
-	__be32			*agfl_bno;
-	struct xbitmap_range	*br;
-	struct xbitmap_range	*n;
 	struct xfs_agfl		*agfl;
-	xfs_agblock_t		agbno;
-	unsigned int		fl_off;
 
 	ASSERT(flcount <= xfs_agfl_size(mp));
 
@@ -661,36 +693,15 @@ xrep_agfl_init_header(
 	 * blocks than fit in the AGFL, they will be freed in a subsequent
 	 * step.
 	 */
-	fl_off = 0;
-	agfl_bno = xfs_buf_to_agfl_bno(agfl_bp);
-	for_each_xbitmap_extent(br, n, agfl_extents) {
-		agbno = XFS_FSB_TO_AGBNO(mp, br->start);
-
-		trace_xrep_agfl_insert(mp, sc->sa.pag->pag_agno, agbno,
-				br->len);
-
-		while (br->len > 0 && fl_off < flcount) {
-			agfl_bno[fl_off] = cpu_to_be32(agbno);
-			fl_off++;
-			agbno++;
-
-			/*
-			 * We've now used br->start by putting it in the AGFL,
-			 * so bump br so that we don't reap the block later.
-			 */
-			br->start++;
-			br->len--;
-		}
-
-		if (br->len)
-			break;
-		list_del(&br->list);
-		kfree(br);
-	}
+	xbitmap_init(&af.used_extents);
+	af.agfl_bno = xfs_buf_to_agfl_bno(agfl_bp);
+	xbitmap_walk(agfl_extents, xrep_agfl_fill, &af);
+	xbitmap_disunion(agfl_extents, &af.used_extents);
 
 	/* Write new AGFL to disk. */
 	xfs_trans_buf_set_type(sc->tp, agfl_bp, XFS_BLFT_AGFL_BUF);
 	xfs_trans_log_buf(sc->tp, agfl_bp, 0, BBTOB(agfl_bp->b_length) - 1);
+	xbitmap_destroy(&af.used_extents);
 }
 
 /* Repair the AGFL. */
diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c
index a255f09e9f0a..d32ded56da90 100644
--- a/fs/xfs/scrub/bitmap.c
+++ b/fs/xfs/scrub/bitmap.c
@@ -13,6 +13,9 @@
 #include "scrub/scrub.h"
 #include "scrub/bitmap.h"
 
+#define for_each_xbitmap_extent(bex, n, bitmap) \
+	list_for_each_entry_safe((bex), (n), &(bitmap)->list, list)
+
 /*
  * Set a range of this bitmap.  Caller must ensure the range is not set.
  *
@@ -313,3 +316,59 @@ xbitmap_hweight(
 
 	return ret;
 }
+
+/* Call a function for every run of set bits in this bitmap. */
+int
+xbitmap_walk(
+	struct xbitmap		*bitmap,
+	xbitmap_walk_fn	fn,
+	void			*priv)
+{
+	struct xbitmap_range	*bex, *n;
+	int			error = 0;
+
+	for_each_xbitmap_extent(bex, n, bitmap) {
+		error = fn(bex->start, bex->len, priv);
+		if (error)
+			break;
+	}
+
+	return error;
+}
+
+struct xbitmap_walk_bits {
+	xbitmap_walk_bits_fn	fn;
+	void			*priv;
+};
+
+/* Walk all the bits in a run. */
+static int
+xbitmap_walk_bits_in_run(
+	uint64_t			start,
+	uint64_t			len,
+	void				*priv)
+{
+	struct xbitmap_walk_bits	*wb = priv;
+	uint64_t			i;
+	int				error = 0;
+
+	for (i = start; i < start + len; i++) {
+		error = wb->fn(i, wb->priv);
+		if (error)
+			break;
+	}
+
+	return error;
+}
+
+/* Call a function for every set bit in this bitmap. */
+int
+xbitmap_walk_bits(
+	struct xbitmap			*bitmap,
+	xbitmap_walk_bits_fn		fn,
+	void				*priv)
+{
+	struct xbitmap_walk_bits	wb = {.fn = fn, .priv = priv};
+
+	return xbitmap_walk(bitmap, xbitmap_walk_bits_in_run, &wb);
+}
diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h
index 900646b72de1..53601d281ffb 100644
--- a/fs/xfs/scrub/bitmap.h
+++ b/fs/xfs/scrub/bitmap.h
@@ -19,13 +19,6 @@ struct xbitmap {
 void xbitmap_init(struct xbitmap *bitmap);
 void xbitmap_destroy(struct xbitmap *bitmap);
 
-#define for_each_xbitmap_extent(bex, n, bitmap) \
-	list_for_each_entry_safe((bex), (n), &(bitmap)->list, list)
-
-#define for_each_xbitmap_block(b, bex, n, bitmap) \
-	list_for_each_entry_safe((bex), (n), &(bitmap)->list, list) \
-		for ((b) = (bex)->start; (b) < (bex)->start + (bex)->len; (b)++)
-
 int xbitmap_set(struct xbitmap *bitmap, uint64_t start, uint64_t len);
 int xbitmap_disunion(struct xbitmap *bitmap, struct xbitmap *sub);
 int xbitmap_set_btcur_path(struct xbitmap *bitmap,
@@ -34,4 +27,19 @@ int xbitmap_set_btblocks(struct xbitmap *bitmap,
 		struct xfs_btree_cur *cur);
 uint64_t xbitmap_hweight(struct xbitmap *bitmap);
 
+/*
+ * Return codes for the bitmap iterator functions are 0 to continue iterating,
+ * and non-zero to stop iterating.  Any non-zero value will be passed up to the
+ * iteration caller.  The special value -ECANCELED can be used to stop
+ * iteration, because neither bitmap iterator ever generates that error code on
+ * its own.  Callers must not modify the bitmap while walking it.
+ */
+typedef int (*xbitmap_walk_fn)(uint64_t start, uint64_t len, void *priv);
+int xbitmap_walk(struct xbitmap *bitmap, xbitmap_walk_fn fn,
+		void *priv);
+
+typedef int (*xbitmap_walk_bits_fn)(uint64_t bit, void *priv);
+int xbitmap_walk_bits(struct xbitmap *bitmap, xbitmap_walk_bits_fn fn,
+		void *priv);
+
 #endif	/* __XFS_SCRUB_BITMAP_H__ */
diff --git a/fs/xfs/scrub/repair.c b/fs/xfs/scrub/repair.c
index 4b92f9253ccd..e117ae06e438 100644
--- a/fs/xfs/scrub/repair.c
+++ b/fs/xfs/scrub/repair.c
@@ -443,6 +443,30 @@ xrep_init_btblock(
  * buffers associated with @bitmap.
  */
 
+static int
+xrep_invalidate_block(
+	uint64_t		fsbno,
+	void			*priv)
+{
+	struct xfs_scrub	*sc = priv;
+	struct xfs_buf		*bp;
+	int			error;
+
+	/* Skip AG headers and post-EOFS blocks */
+	if (!xfs_verify_fsbno(sc->mp, fsbno))
+		return 0;
+
+	error = xfs_buf_incore(sc->mp->m_ddev_targp,
+			XFS_FSB_TO_DADDR(sc->mp, fsbno),
+			XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK, &bp);
+	if (error)
+		return 0;
+
+	xfs_trans_bjoin(sc->tp, bp);
+	xfs_trans_binval(sc->tp, bp);
+	return 0;
+}
+
 /*
  * Invalidate buffers for per-AG btree blocks we're dumping.  This function
  * is not intended for use with file data repairs; we have bunmapi for that.
@@ -452,11 +476,6 @@ xrep_invalidate_blocks(
 	struct xfs_scrub	*sc,
 	struct xbitmap		*bitmap)
 {
-	struct xbitmap_range	*bmr;
-	struct xbitmap_range	*n;
-	struct xfs_buf		*bp;
-	xfs_fsblock_t		fsbno;
-
 	/*
 	 * For each block in each extent, see if there's an incore buffer for
 	 * exactly that block; if so, invalidate it.  The buffer cache only
@@ -465,23 +484,7 @@ xrep_invalidate_blocks(
 	 * because we never own those; and if we can't TRYLOCK the buffer we
 	 * assume it's owned by someone else.
 	 */
-	for_each_xbitmap_block(fsbno, bmr, n, bitmap) {
-		int		error;
-
-		/* Skip AG headers and post-EOFS blocks */
-		if (!xfs_verify_fsbno(sc->mp, fsbno))
-			continue;
-		error = xfs_buf_incore(sc->mp->m_ddev_targp,
-				XFS_FSB_TO_DADDR(sc->mp, fsbno),
-				XFS_FSB_TO_BB(sc->mp, 1), XBF_TRYLOCK, &bp);
-		if (error)
-			continue;
-
-		xfs_trans_bjoin(sc->tp, bp);
-		xfs_trans_binval(sc->tp, bp);
-	}
-
-	return 0;
+	return xbitmap_walk_bits(bitmap, xrep_invalidate_block, sc);
 }
 
 /* Ensure the freelist is the correct size. */
@@ -502,6 +505,15 @@ xrep_fix_freelist(
 			can_shrink ? 0 : XFS_ALLOC_FLAG_NOSHRINK);
 }
 
+/* Information about reaping extents after a repair. */
+struct xrep_reap_state {
+	struct xfs_scrub		*sc;
+
+	/* Reverse mapping owner and metadata reservation type. */
+	const struct xfs_owner_info	*oinfo;
+	enum xfs_ag_resv_type		resv;
+};
+
 /*
  * Put a block back on the AGFL.
  */
@@ -546,17 +558,23 @@ xrep_put_freelist(
 /* Dispose of a single block. */
 STATIC int
 xrep_reap_block(
-	struct xfs_scrub		*sc,
-	xfs_fsblock_t			fsbno,
-	const struct xfs_owner_info	*oinfo,
-	enum xfs_ag_resv_type		resv)
+	uint64_t			fsbno,
+	void				*priv)
 {
+	struct xrep_reap_state		*rs = priv;
+	struct xfs_scrub		*sc = rs->sc;
 	struct xfs_btree_cur		*cur;
 	struct xfs_buf			*agf_bp = NULL;
 	xfs_agblock_t			agbno;
 	bool				has_other_rmap;
 	int				error;
 
+	ASSERT(sc->ip != NULL ||
+	       XFS_FSB_TO_AGNO(sc->mp, fsbno) == sc->sa.pag->pag_agno);
+	trace_xrep_dispose_btree_extent(sc->mp,
+			XFS_FSB_TO_AGNO(sc->mp, fsbno),
+			XFS_FSB_TO_AGBNO(sc->mp, fsbno), 1);
+
 	agbno = XFS_FSB_TO_AGBNO(sc->mp, fsbno);
 	ASSERT(XFS_FSB_TO_AGNO(sc->mp, fsbno) == sc->sa.pag->pag_agno);
 
@@ -575,7 +593,8 @@ xrep_reap_block(
 	cur = xfs_rmapbt_init_cursor(sc->mp, sc->tp, agf_bp, sc->sa.pag);
 
 	/* Can we find any other rmappings? */
-	error = xfs_rmap_has_other_keys(cur, agbno, 1, oinfo, &has_other_rmap);
+	error = xfs_rmap_has_other_keys(cur, agbno, 1, rs->oinfo,
+			&has_other_rmap);
 	xfs_btree_del_cursor(cur, error);
 	if (error)
 		goto out_free;
@@ -595,11 +614,11 @@ xrep_reap_block(
 	 */
 	if (has_other_rmap)
 		error = xfs_rmap_free(sc->tp, agf_bp, sc->sa.pag, agbno,
-					1, oinfo);
-	else if (resv == XFS_AG_RESV_AGFL)
+					1, rs->oinfo);
+	else if (rs->resv == XFS_AG_RESV_AGFL)
 		error = xrep_put_freelist(sc, agbno);
 	else
-		error = xfs_free_extent(sc->tp, fsbno, 1, oinfo, resv);
+		error = xfs_free_extent(sc->tp, fsbno, 1, rs->oinfo, rs->resv);
 	if (agf_bp != sc->sa.agf_bp)
 		xfs_trans_brelse(sc->tp, agf_bp);
 	if (error)
@@ -623,26 +642,15 @@ xrep_reap_extents(
 	const struct xfs_owner_info	*oinfo,
 	enum xfs_ag_resv_type		type)
 {
-	struct xbitmap_range		*bmr;
-	struct xbitmap_range		*n;
-	xfs_fsblock_t			fsbno;
-	int				error = 0;
+	struct xrep_reap_state		rs = {
+		.sc			= sc,
+		.oinfo			= oinfo,
+		.resv			= type,
+	};
 
 	ASSERT(xfs_has_rmapbt(sc->mp));
 
-	for_each_xbitmap_block(fsbno, bmr, n, bitmap) {
-		ASSERT(sc->ip != NULL ||
-		       XFS_FSB_TO_AGNO(sc->mp, fsbno) == sc->sa.pag->pag_agno);
-		trace_xrep_dispose_btree_extent(sc->mp,
-				XFS_FSB_TO_AGNO(sc->mp, fsbno),
-				XFS_FSB_TO_AGBNO(sc->mp, fsbno), 1);
-
-		error = xrep_reap_block(sc, fsbno, oinfo, type);
-		if (error)
-			break;
-	}
-
-	return error;
+	return xbitmap_walk_bits(bitmap, xrep_reap_block, &rs);
 }
 
 /*



* [PATCH 06/23] xfs: drop the _safe behavior from the xbitmap foreach macro
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (4 preceding siblings ...)
  2023-02-16 20:43   ` [PATCH 05/23] xfs: remove the for_each_xbitmap_ helpers Darrick J. Wong
@ 2023-02-16 20:43   ` Darrick J. Wong
  2023-02-16 20:43   ` [PATCH 07/23] xfs: convert xbitmap to interval tree Darrick J. Wong
                     ` (16 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:43 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

It's not safe to edit bitmap intervals while we're iterating them with
for_each_xbitmap_extent.  None of the existing callers actually need
that ability anyway, so drop the safe variable.
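
The difference between the two iteration styles can be sketched in
userspace C.  This is only an illustration: struct range and the range_*
helpers are hypothetical stand-ins, not the kernel's list_head API or the
xbitmap code itself.  A plain walk reads the next pointer after the loop
body runs, so the body must not free the current node; a "safe" walk caches
the next pointer first, which is only needed by a teardown path.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for one bitmap extent; not the kernel type. */
struct range {
	struct range	*next;
	uint64_t	start;
	uint64_t	len;
};

/* Prepend a range; returns the new head, or NULL if allocation fails. */
static struct range *
range_push(struct range *head, uint64_t start, uint64_t len)
{
	struct range	*r;

	assert(len > 0);	/* extents are never empty */
	r = malloc(sizeof(*r));
	if (!r)
		return NULL;
	r->next = head;
	r->start = start;
	r->len = len;
	return r;
}

/*
 * Read-only walk.  The loop advances via r->next after the body has run,
 * so the body must not free or unlink r.
 */
static uint64_t
range_hweight(const struct range *head)
{
	const struct range	*r;
	uint64_t		ret = 0;

	for (r = head; r != NULL; r = r->next)
		ret += r->len;
	return ret;
}

/*
 * "Safe" walk.  Cache the next pointer before the body frees the current
 * node.  Returns the number of nodes freed.
 */
static unsigned int
range_destroy(struct range *head)
{
	struct range	*r, *n;
	unsigned int	freed = 0;

	for (r = head; r != NULL; r = n) {
		n = r->next;
		free(r);
		freed++;
	}
	return freed;
}
```

In this sketch only range_destroy needs the extra cursor, which mirrors how
the patch leaves the _safe iteration in xbitmap_destroy alone.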

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/bitmap.c |   17 ++++++++---------
 1 file changed, 8 insertions(+), 9 deletions(-)


diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c
index d32ded56da90..f8ebc4d61462 100644
--- a/fs/xfs/scrub/bitmap.c
+++ b/fs/xfs/scrub/bitmap.c
@@ -13,8 +13,9 @@
 #include "scrub/scrub.h"
 #include "scrub/bitmap.h"
 
-#define for_each_xbitmap_extent(bex, n, bitmap) \
-	list_for_each_entry_safe((bex), (n), &(bitmap)->list, list)
+/* Iterate each interval of a bitmap.  Do not change the bitmap. */
+#define for_each_xbitmap_extent(bex, bitmap) \
+	list_for_each_entry((bex), &(bitmap)->list, list)
 
 /*
  * Set a range of this bitmap.  Caller must ensure the range is not set.
@@ -46,10 +47,9 @@ void
 xbitmap_destroy(
 	struct xbitmap		*bitmap)
 {
-	struct xbitmap_range	*bmr;
-	struct xbitmap_range	*n;
+	struct xbitmap_range	*bmr, *n;
 
-	for_each_xbitmap_extent(bmr, n, bitmap) {
+	list_for_each_entry_safe(bmr, n, &bitmap->list, list) {
 		list_del(&bmr->list);
 		kfree(bmr);
 	}
@@ -308,10 +308,9 @@ xbitmap_hweight(
 	struct xbitmap		*bitmap)
 {
 	struct xbitmap_range	*bmr;
-	struct xbitmap_range	*n;
 	uint64_t		ret = 0;
 
-	for_each_xbitmap_extent(bmr, n, bitmap)
+	for_each_xbitmap_extent(bmr, bitmap)
 		ret += bmr->len;
 
 	return ret;
@@ -324,10 +323,10 @@ xbitmap_walk(
 	xbitmap_walk_fn	fn,
 	void			*priv)
 {
-	struct xbitmap_range	*bex, *n;
+	struct xbitmap_range	*bex;
 	int			error = 0;
 
-	for_each_xbitmap_extent(bex, n, bitmap) {
+	for_each_xbitmap_extent(bex, bitmap) {
 		error = fn(bex->start, bex->len, priv);
 		if (error)
 			break;



* [PATCH 07/23] xfs: convert xbitmap to interval tree
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (5 preceding siblings ...)
  2023-02-16 20:43   ` [PATCH 06/23] xfs: drop the _safe behavior from the xbitmap foreach macro Darrick J. Wong
@ 2023-02-16 20:43   ` Darrick J. Wong
  2023-02-16 20:43   ` [PATCH 08/23] xfs: port xbitmap_test Darrick J. Wong
                     ` (15 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:43 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Convert the xbitmap code to use interval trees instead of linked lists.
This reduces the amount of coding required to handle the disunion
operation and in the future will make it easier to set bits in arbitrary
order yet later be able to extract maximally sized extents, which we'll
need for rebuilding certain structures.  We define our own interval tree
type so that it can deal with 64-bit indices even on 32-bit machines.
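
The overlap cases that the clear operation has to handle can be sketched in
userspace C as follows.  This is a simplified model under stated
assumptions: it keeps sorted, non-overlapping closed intervals in a small
fixed array instead of an rbtree, and struct iv, struct ivset, and the
ivset_* helpers are hypothetical names that do not appear in the patch.

```c
#include <stdint.h>

#define IVSET_MAX	16

/* Closed interval [start, last], analogous to bn_start/bn_last. */
struct iv {
	uint64_t	start;
	uint64_t	last;
};

/* Sorted, non-overlapping intervals held in a fixed array. */
struct ivset {
	struct iv	v[IVSET_MAX];
	unsigned int	nr;
};

/* Append an interval; returns 0, or -1 if the array is full. */
static int
ivset_append(struct ivset *set, uint64_t start, uint64_t last)
{
	if (set->nr == IVSET_MAX)
		return -1;
	set->v[set->nr].start = start;
	set->v[set->nr].last = last;
	set->nr++;
	return 0;
}

/*
 * Clear [start, start + len - 1].  Caller must pass len > 0.  Returns 0 on
 * success, or -1 (leaving @set untouched) if a split would overflow.
 */
static int
ivset_clear(struct ivset *set, uint64_t start, uint64_t len)
{
	struct ivset	out = { .nr = 0 };
	uint64_t	last = start + len - 1;
	unsigned int	i;
	int		error = 0;

	for (i = 0; i < set->nr && !error; i++) {
		const struct iv	*iv = &set->v[i];

		if (iv->last < start || iv->start > last) {
			/* no overlap with the clearing range; keep it */
			error = ivset_append(&out, iv->start, iv->last);
			continue;
		}
		/* keep whatever sticks out to the left of the range */
		if (iv->start < start)
			error = ivset_append(&out, iv->start, start - 1);
		/* keep whatever sticks out to the right of the range */
		if (!error && iv->last > last)
			error = ivset_append(&out, last + 1, iv->last);
		/* anything else lies inside the range and is dropped */
	}
	if (error)
		return error;

	*set = out;
	return 0;
}
```

An interval that sticks out on both sides is the split case, which is the
only one that needs a new node; in the patch that is where xbitmap_clear
can return -ENOMEM.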

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/agheader_repair.c |   12 +
 fs/xfs/scrub/bitmap.c          |  323 ++++++++++++++++++++++------------------
 fs/xfs/scrub/bitmap.h          |   11 -
 3 files changed, 187 insertions(+), 159 deletions(-)


diff --git a/fs/xfs/scrub/agheader_repair.c b/fs/xfs/scrub/agheader_repair.c
index 26bce2f12b09..c22dc71fdd82 100644
--- a/fs/xfs/scrub/agheader_repair.c
+++ b/fs/xfs/scrub/agheader_repair.c
@@ -662,7 +662,7 @@ xrep_agfl_fill(
 }
 
 /* Write out a totally new AGFL. */
-STATIC void
+STATIC int
 xrep_agfl_init_header(
 	struct xfs_scrub	*sc,
 	struct xfs_buf		*agfl_bp,
@@ -675,6 +675,7 @@ xrep_agfl_init_header(
 	};
 	struct xfs_mount	*mp = sc->mp;
 	struct xfs_agfl		*agfl;
+	int			error;
 
 	ASSERT(flcount <= xfs_agfl_size(mp));
 
@@ -696,12 +697,15 @@ xrep_agfl_init_header(
 	xbitmap_init(&af.used_extents);
 	af.agfl_bno = xfs_buf_to_agfl_bno(agfl_bp),
 	xbitmap_walk(agfl_extents, xrep_agfl_fill, &af);
-	xbitmap_disunion(agfl_extents, &af.used_extents);
+	error = xbitmap_disunion(agfl_extents, &af.used_extents);
+	if (error)
+		return error;
 
 	/* Write new AGFL to disk. */
 	xfs_trans_buf_set_type(sc->tp, agfl_bp, XFS_BLFT_AGFL_BUF);
 	xfs_trans_log_buf(sc->tp, agfl_bp, 0, BBTOB(agfl_bp->b_length) - 1);
 	xbitmap_destroy(&af.used_extents);
+	return 0;
 }
 
 /* Repair the AGFL. */
@@ -754,7 +758,9 @@ xrep_agfl(
 	 * buffers until we know that part works.
 	 */
 	xrep_agfl_update_agf(sc, agf_bp, flcount);
-	xrep_agfl_init_header(sc, agfl_bp, &agfl_extents, flcount);
+	error = xrep_agfl_init_header(sc, agfl_bp, &agfl_extents, flcount);
+	if (error)
+		goto err;
 
 	/*
 	 * Ok, the AGFL should be ready to go now.  Roll the transaction to
diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c
index f8ebc4d61462..1b04d2ce020a 100644
--- a/fs/xfs/scrub/bitmap.c
+++ b/fs/xfs/scrub/bitmap.c
@@ -13,31 +13,160 @@
 #include "scrub/scrub.h"
 #include "scrub/bitmap.h"
 
+#include <linux/interval_tree_generic.h>
+
+struct xbitmap_node {
+	struct rb_node	bn_rbnode;
+
+	/* First set bit of this interval and subtree. */
+	uint64_t	bn_start;
+
+	/* Last set bit of this interval. */
+	uint64_t	bn_last;
+
+	/* Last set bit of this subtree.  Do not touch this. */
+	uint64_t	__bn_subtree_last;
+};
+
+/* Define our own interval tree type with uint64_t parameters. */
+
+#define START(node) ((node)->bn_start)
+#define LAST(node)  ((node)->bn_last)
+
+/*
+ * These functions are defined by the INTERVAL_TREE_DEFINE macro, but we'll
+ * forward-declare them anyway for clarity.
+ */
+static inline void
+xbitmap_tree_insert(struct xbitmap_node *node, struct rb_root_cached *root);
+
+static inline void
+xbitmap_tree_remove(struct xbitmap_node *node, struct rb_root_cached *root);
+
+static inline struct xbitmap_node *
+xbitmap_tree_iter_first(struct rb_root_cached *root, uint64_t start,
+			uint64_t last);
+
+static inline struct xbitmap_node *
+xbitmap_tree_iter_next(struct xbitmap_node *node, uint64_t start,
+		       uint64_t last);
+
+INTERVAL_TREE_DEFINE(struct xbitmap_node, bn_rbnode, uint64_t,
+		__bn_subtree_last, START, LAST, static inline, xbitmap_tree)
+
 /* Iterate each interval of a bitmap.  Do not change the bitmap. */
-#define for_each_xbitmap_extent(bex, bitmap) \
-	list_for_each_entry((bex), &(bitmap)->list, list)
-
-/*
- * Set a range of this bitmap.  Caller must ensure the range is not set.
- *
- * This is the logical equivalent of bitmap |= mask(start, len).
- */
+#define for_each_xbitmap_extent(bn, bitmap) \
+	for ((bn) = rb_entry_safe(rb_first(&(bitmap)->xb_root.rb_root), \
+				   struct xbitmap_node, bn_rbnode); \
+	     (bn) != NULL; \
+	     (bn) = rb_entry_safe(rb_next(&(bn)->bn_rbnode), \
+				   struct xbitmap_node, bn_rbnode))
+
+/* Clear a range of this bitmap. */
+int
+xbitmap_clear(
+	struct xbitmap		*bitmap,
+	uint64_t		start,
+	uint64_t		len)
+{
+	struct xbitmap_node	*bn;
+	struct xbitmap_node	*new_bn;
+	uint64_t		last = start + len - 1;
+
+	while ((bn = xbitmap_tree_iter_first(&bitmap->xb_root, start, last))) {
+		if (bn->bn_start < start && bn->bn_last > last) {
+			uint64_t	old_last = bn->bn_last;
+
+			/* overlaps with the entire clearing range */
+			xbitmap_tree_remove(bn, &bitmap->xb_root);
+			bn->bn_last = start - 1;
+			xbitmap_tree_insert(bn, &bitmap->xb_root);
+
+			/* add an extent */
+			new_bn = kmalloc(sizeof(struct xbitmap_node),
+					XCHK_GFP_FLAGS);
+			if (!new_bn)
+				return -ENOMEM;
+			new_bn->bn_start = last + 1;
+			new_bn->bn_last = old_last;
+			xbitmap_tree_insert(new_bn, &bitmap->xb_root);
+		} else if (bn->bn_start < start) {
+			/* overlaps with the left side of the clearing range */
+			xbitmap_tree_remove(bn, &bitmap->xb_root);
+			bn->bn_last = start - 1;
+			xbitmap_tree_insert(bn, &bitmap->xb_root);
+		} else if (bn->bn_last > last) {
+			/* overlaps with the right side of the clearing range */
+			xbitmap_tree_remove(bn, &bitmap->xb_root);
+			bn->bn_start = last + 1;
+			xbitmap_tree_insert(bn, &bitmap->xb_root);
+			break;
+		} else {
+			/* in the middle of the clearing range */
+			xbitmap_tree_remove(bn, &bitmap->xb_root);
+			kfree(bn);
+		}
+	}
+
+	return 0;
+}
+
+/* Set a range of this bitmap. */
 int
 xbitmap_set(
 	struct xbitmap		*bitmap,
 	uint64_t		start,
 	uint64_t		len)
 {
-	struct xbitmap_range	*bmr;
+	struct xbitmap_node	*left;
+	struct xbitmap_node	*right;
+	uint64_t		last = start + len - 1;
+	int			error;
 
-	bmr = kmalloc(sizeof(struct xbitmap_range), XCHK_GFP_FLAGS);
-	if (!bmr)
-		return -ENOMEM;
+	/* Is this whole range already set? */
+	left = xbitmap_tree_iter_first(&bitmap->xb_root, start, last);
+	if (left && left->bn_start <= start && left->bn_last >= last)
+		return 0;
 
-	INIT_LIST_HEAD(&bmr->list);
-	bmr->start = start;
-	bmr->len = len;
-	list_add_tail(&bmr->list, &bitmap->list);
+	/* Clear out everything in the range we want to set. */
+	error = xbitmap_clear(bitmap, start, len);
+	if (error)
+		return error;
+
+	/* Do we have a left-adjacent extent? */
+	left = xbitmap_tree_iter_first(&bitmap->xb_root, start - 1, start - 1);
+	ASSERT(!left || left->bn_last + 1 == start);
+
+	/* Do we have a right-adjacent extent? */
+	right = xbitmap_tree_iter_first(&bitmap->xb_root, last + 1, last + 1);
+	ASSERT(!right || right->bn_start == last + 1);
+
+	if (left && right) {
+		/* combine left and right adjacent extent */
+		xbitmap_tree_remove(left, &bitmap->xb_root);
+		xbitmap_tree_remove(right, &bitmap->xb_root);
+		left->bn_last = right->bn_last;
+		xbitmap_tree_insert(left, &bitmap->xb_root);
+		kfree(right);
+	} else if (left) {
+		/* combine with left extent */
+		xbitmap_tree_remove(left, &bitmap->xb_root);
+		left->bn_last = last;
+		xbitmap_tree_insert(left, &bitmap->xb_root);
+	} else if (right) {
+		/* combine with right extent */
+		xbitmap_tree_remove(right, &bitmap->xb_root);
+		right->bn_start = start;
+		xbitmap_tree_insert(right, &bitmap->xb_root);
+	} else {
+		/* add an extent */
+		left = kmalloc(sizeof(struct xbitmap_node), XCHK_GFP_FLAGS);
+		if (!left)
+			return -ENOMEM;
+		left->bn_start = start;
+		left->bn_last = last;
+		xbitmap_tree_insert(left, &bitmap->xb_root);
+	}
 
 	return 0;
 }
@@ -47,11 +176,11 @@ void
 xbitmap_destroy(
 	struct xbitmap		*bitmap)
 {
-	struct xbitmap_range	*bmr, *n;
+	struct xbitmap_node	*bn;
 
-	list_for_each_entry_safe(bmr, n, &bitmap->list, list) {
-		list_del(&bmr->list);
-		kfree(bmr);
+	while ((bn = xbitmap_tree_iter_first(&bitmap->xb_root, 0, -1ULL))) {
+		xbitmap_tree_remove(bn, &bitmap->xb_root);
+		kfree(bn);
 	}
 }
 
@@ -60,27 +189,7 @@ void
 xbitmap_init(
 	struct xbitmap		*bitmap)
 {
-	INIT_LIST_HEAD(&bitmap->list);
-}
-
-/* Compare two btree extents. */
-static int
-xbitmap_range_cmp(
-	void			*priv,
-	const struct list_head	*a,
-	const struct list_head	*b)
-{
-	struct xbitmap_range	*ap;
-	struct xbitmap_range	*bp;
-
-	ap = container_of(a, struct xbitmap_range, list);
-	bp = container_of(b, struct xbitmap_range, list);
-
-	if (ap->start > bp->start)
-		return 1;
-	if (ap->start < bp->start)
-		return -1;
-	return 0;
+	bitmap->xb_root = RB_ROOT_CACHED;
 }
 
 /*
@@ -97,118 +206,26 @@ xbitmap_range_cmp(
  *
  * This is the logical equivalent of bitmap &= ~sub.
  */
-#define LEFT_ALIGNED	(1 << 0)
-#define RIGHT_ALIGNED	(1 << 1)
 int
 xbitmap_disunion(
 	struct xbitmap		*bitmap,
 	struct xbitmap		*sub)
 {
-	struct list_head	*lp;
-	struct xbitmap_range	*br;
-	struct xbitmap_range	*new_br;
-	struct xbitmap_range	*sub_br;
-	uint64_t		sub_start;
-	uint64_t		sub_len;
-	int			state;
-	int			error = 0;
+	struct xbitmap_node	*bn;
+	int			error;
 
-	if (list_empty(&bitmap->list) || list_empty(&sub->list))
+	if (xbitmap_empty(bitmap) || xbitmap_empty(sub))
 		return 0;
-	ASSERT(!list_empty(&sub->list));
 
-	list_sort(NULL, &bitmap->list, xbitmap_range_cmp);
-	list_sort(NULL, &sub->list, xbitmap_range_cmp);
-
-	/*
-	 * Now that we've sorted both lists, we iterate bitmap once, rolling
-	 * forward through sub and/or bitmap as necessary until we find an
-	 * overlap or reach the end of either list.  We do not reset lp to the
-	 * head of bitmap nor do we reset sub_br to the head of sub.  The
-	 * list traversal is similar to merge sort, but we're deleting
-	 * instead.  In this manner we avoid O(n^2) operations.
-	 */
-	sub_br = list_first_entry(&sub->list, struct xbitmap_range,
-			list);
-	lp = bitmap->list.next;
-	while (lp != &bitmap->list) {
-		br = list_entry(lp, struct xbitmap_range, list);
-
-		/*
-		 * Advance sub_br and/or br until we find a pair that
-		 * intersect or we run out of extents.
-		 */
-		while (sub_br->start + sub_br->len <= br->start) {
-			if (list_is_last(&sub_br->list, &sub->list))
-				goto out;
-			sub_br = list_next_entry(sub_br, list);
-		}
-		if (sub_br->start >= br->start + br->len) {
-			lp = lp->next;
-			continue;
-		}
-
-		/* trim sub_br to fit the extent we have */
-		sub_start = sub_br->start;
-		sub_len = sub_br->len;
-		if (sub_br->start < br->start) {
-			sub_len -= br->start - sub_br->start;
-			sub_start = br->start;
-		}
-		if (sub_len > br->len)
-			sub_len = br->len;
-
-		state = 0;
-		if (sub_start == br->start)
-			state |= LEFT_ALIGNED;
-		if (sub_start + sub_len == br->start + br->len)
-			state |= RIGHT_ALIGNED;
-		switch (state) {
-		case LEFT_ALIGNED:
-			/* Coincides with only the left. */
-			br->start += sub_len;
-			br->len -= sub_len;
-			break;
-		case RIGHT_ALIGNED:
-			/* Coincides with only the right. */
-			br->len -= sub_len;
-			lp = lp->next;
-			break;
-		case LEFT_ALIGNED | RIGHT_ALIGNED:
-			/* Total overlap, just delete ex. */
-			lp = lp->next;
-			list_del(&br->list);
-			kfree(br);
-			break;
-		case 0:
-			/*
-			 * Deleting from the middle: add the new right extent
-			 * and then shrink the left extent.
-			 */
-			new_br = kmalloc(sizeof(struct xbitmap_range),
-					XCHK_GFP_FLAGS);
-			if (!new_br) {
-				error = -ENOMEM;
-				goto out;
-			}
-			INIT_LIST_HEAD(&new_br->list);
-			new_br->start = sub_start + sub_len;
-			new_br->len = br->start + br->len - new_br->start;
-			list_add(&new_br->list, &br->list);
-			br->len = sub_start - br->start;
-			lp = lp->next;
-			break;
-		default:
-			ASSERT(0);
-			break;
-		}
+	for_each_xbitmap_extent(bn, sub) {
+		error = xbitmap_clear(bitmap, bn->bn_start,
+				bn->bn_last - bn->bn_start + 1);
+		if (error)
+			return error;
 	}
 
-out:
-	return error;
+	return 0;
 }
-#undef LEFT_ALIGNED
-#undef RIGHT_ALIGNED
 
 /*
  * Record all btree blocks seen while iterating all records of a btree.
@@ -307,11 +324,11 @@ uint64_t
 xbitmap_hweight(
 	struct xbitmap		*bitmap)
 {
-	struct xbitmap_range	*bmr;
+	struct xbitmap_node	*bn;
 	uint64_t		ret = 0;
 
-	for_each_xbitmap_extent(bmr, bitmap)
-		ret += bmr->len;
+	for_each_xbitmap_extent(bn, bitmap)
+		ret += bn->bn_last - bn->bn_start + 1;
 
 	return ret;
 }
@@ -320,14 +337,14 @@ xbitmap_hweight(
 int
 xbitmap_walk(
 	struct xbitmap		*bitmap,
-	xbitmap_walk_fn	fn,
+	xbitmap_walk_fn		fn,
 	void			*priv)
 {
-	struct xbitmap_range	*bex;
+	struct xbitmap_node	*bn;
 	int			error = 0;
 
-	for_each_xbitmap_extent(bex, bitmap) {
-		error = fn(bex->start, bex->len, priv);
+	for_each_xbitmap_extent(bn, bitmap) {
+		error = fn(bn->bn_start, bn->bn_last - bn->bn_start + 1, priv);
 		if (error)
 			break;
 	}
@@ -371,3 +388,11 @@ xbitmap_walk_bits(
 
 	return xbitmap_walk(bitmap, xbitmap_walk_bits_in_run, &wb);
 }
+
+/* Does this bitmap have no bits set at all? */
+bool
+xbitmap_empty(
+	struct xbitmap		*bitmap)
+{
+	return bitmap->xb_root.rb_root.rb_node == NULL;
+}
diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h
index 53601d281ffb..7afd64a318d1 100644
--- a/fs/xfs/scrub/bitmap.h
+++ b/fs/xfs/scrub/bitmap.h
@@ -6,19 +6,14 @@
 #ifndef __XFS_SCRUB_BITMAP_H__
 #define __XFS_SCRUB_BITMAP_H__
 
-struct xbitmap_range {
-	struct list_head	list;
-	uint64_t		start;
-	uint64_t		len;
-};
-
 struct xbitmap {
-	struct list_head	list;
+	struct rb_root_cached	xb_root;
 };
 
 void xbitmap_init(struct xbitmap *bitmap);
 void xbitmap_destroy(struct xbitmap *bitmap);
 
+int xbitmap_clear(struct xbitmap *bitmap, uint64_t start, uint64_t len);
 int xbitmap_set(struct xbitmap *bitmap, uint64_t start, uint64_t len);
 int xbitmap_disunion(struct xbitmap *bitmap, struct xbitmap *sub);
 int xbitmap_set_btcur_path(struct xbitmap *bitmap,
@@ -42,4 +37,6 @@ typedef int (*xbitmap_walk_bits_fn)(uint64_t bit, void *priv);
 int xbitmap_walk_bits(struct xbitmap *bitmap, xbitmap_walk_bits_fn fn,
 		void *priv);
 
+bool xbitmap_empty(struct xbitmap *bitmap);
+
 #endif	/* __XFS_SCRUB_BITMAP_H__ */



* [PATCH 08/23] xfs: port xbitmap_test
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (6 preceding siblings ...)
  2023-02-16 20:43   ` [PATCH 07/23] xfs: convert xbitmap to interval tree Darrick J. Wong
@ 2023-02-16 20:43   ` Darrick J. Wong
  2023-02-16 20:44   ` [PATCH 09/23] xfs: ignore stale buffers when scanning the buffer cache Darrick J. Wong
                     ` (14 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:43 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>
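
The contract of the new test function, as read from the code in this patch,
can be sketched in userspace C.  The sketch assumes a sorted array of
closed intervals in place of the interval tree; struct iv and iv_test are
hypothetical names.  The return value says whether the bit at @start is
set, and *len is shortened to the length of the run that shares that state
whenever the run ends inside the queried range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Closed interval [start, last] of set bits; hypothetical type. */
struct iv {
	uint64_t	start;
	uint64_t	last;
};

/*
 * Is the bit at @start set?  @v must be sorted and non-overlapping, and
 * *len must be nonzero.  On return, *len covers only the leading run of
 * bits that have the same state as @start.
 */
static bool
iv_test(const struct iv *v, unsigned int nr, uint64_t start, uint64_t *len)
{
	uint64_t	last = start + *len - 1;
	unsigned int	i;

	/* Find the lowest interval that overlaps [start, last]. */
	for (i = 0; i < nr; i++)
		if (v[i].last >= start && v[i].start <= last)
			break;
	if (i == nr)
		return false;	/* whole range is clear; *len unchanged */

	if (v[i].start <= start) {
		/* @start is set; trim if the set run ends early */
		if (v[i].last < last)
			*len = v[i].last - start + 1;
		return true;
	}

	/* @start is clear; report the gap before the next set bit */
	*len = v[i].start - start;
	return false;
}
```

A caller can therefore step through a range run by run: call the function,
handle *len bits of one state, advance start by *len, and repeat.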

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile       |    2 +-
 fs/xfs/scrub/bitmap.c |   22 ++++++++++++++++++++++
 fs/xfs/scrub/bitmap.h |    2 ++
 3 files changed, 25 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 42d0496fdad7..2de5a71a2fa3 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -147,6 +147,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   agheader.o \
 				   alloc.o \
 				   attr.o \
+				   bitmap.o \
 				   bmap.o \
 				   btree.o \
 				   common.o \
@@ -170,7 +171,6 @@ xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
 ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
-				   bitmap.o \
 				   repair.o \
 				   )
 endif
diff --git a/fs/xfs/scrub/bitmap.c b/fs/xfs/scrub/bitmap.c
index 1b04d2ce020a..14caff0a28ce 100644
--- a/fs/xfs/scrub/bitmap.c
+++ b/fs/xfs/scrub/bitmap.c
@@ -396,3 +396,25 @@ xbitmap_empty(
 {
 	return bitmap->xb_root.rb_root.rb_node == NULL;
 }
+
+/* Is the start of the range set or clear?  And for how long? */
+bool
+xbitmap_test(
+	struct xbitmap		*bitmap,
+	uint64_t		start,
+	uint64_t		*len)
+{
+	struct xbitmap_node	*bn;
+	uint64_t		last = start + *len - 1;
+
+	bn = xbitmap_tree_iter_first(&bitmap->xb_root, start, last);
+	if (!bn)
+		return false;
+	if (bn->bn_start <= start) {
+		if (bn->bn_last < last)
+			*len = bn->bn_last - start + 1;
+		return true;
+	}
+	*len = bn->bn_start - start;
+	return false;
+}
diff --git a/fs/xfs/scrub/bitmap.h b/fs/xfs/scrub/bitmap.h
index 7afd64a318d1..dd492798b7af 100644
--- a/fs/xfs/scrub/bitmap.h
+++ b/fs/xfs/scrub/bitmap.h
@@ -39,4 +39,6 @@ int xbitmap_walk_bits(struct xbitmap *bitmap, xbitmap_walk_bits_fn fn,
 
 bool xbitmap_empty(struct xbitmap *bitmap);
 
+bool xbitmap_test(struct xbitmap *bitmap, uint64_t start, uint64_t *len);
+
 #endif	/* __XFS_SCRUB_BITMAP_H__ */



* [PATCH 09/23] xfs: ignore stale buffers when scanning the buffer cache
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (7 preceding siblings ...)
  2023-02-16 20:43   ` [PATCH 08/23] xfs: port xbitmap_test Darrick J. Wong
@ 2023-02-16 20:44   ` Darrick J. Wong
  2023-02-16 20:44   ` [PATCH 10/23] xfs: create a big array data structure Darrick J. Wong
                     ` (13 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:44 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

After an online repair, we need to invalidate buffers representing the
blocks from the old metadata that we're replacing.  It's possible that
parts of a tree that were previously cached in memory are no longer
accessible due to media failure or other corruption on interior nodes,
so repair figures out the old blocks from the reverse mapping data and
scans the buffer cache directly.

Unfortunately, the current buffer cache code triggers asserts if the
rhashtable lookup finds a non-stale buffer of a different length than
the key we searched for.  For regular operation this is desirable, but
for this repair procedure, we don't care since we're going to forcibly
stale the buffer anyway.  Add an internal lookup flag to avoid the
assert.
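
As a rough userspace sketch of the idea, assuming hypothetical sk_* names
and a counter in place of the kernel's ASSERT, a lookup comparison with an
opt-out flag might look like this.  It is not the xfs_buf code, only a
model of how a per-lookup flag silences the length-mismatch check while
leaving the comparison result unchanged.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag, playing the role of XBM_IGNORE_LENGTH_MISMATCH. */
#define SK_IGNORE_LENGTH_MISMATCH	(1U << 0)

/* Cached buffer: disk address, length, and whether it has been staled. */
struct sk_buf {
	int64_t		bn;
	int		len;
	bool		stale;
};

/* Lookup key, with room for per-lookup flags. */
struct sk_map {
	int64_t		bn;
	int		len;
	unsigned int	flags;
};

/* Counts the cases where the real code would trip an assertion. */
static unsigned int sk_complaints;

/* Returns 0 for an exact match and 1 to keep searching. */
static int
sk_buf_cmp(const struct sk_map *map, const struct sk_buf *bp)
{
	if (bp->bn != map->bn)
		return 1;

	if (bp->len != map->len) {
		/*
		 * A live buffer of a different length is unexpected during
		 * regular operation, but a cache scan does not care.
		 */
		if (!(map->flags & SK_IGNORE_LENGTH_MISMATCH) && !bp->stale)
			sk_complaints++;
		return 1;
	}
	return 0;
}
```

Either way the mismatched buffer is skipped; the flag only changes whether
the lookup complains about it.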

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_buf.c |    5 ++++-
 fs/xfs/xfs_buf.h |   10 ++++++++++
 2 files changed, 14 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 54c774af6e1c..a538501b652b 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -482,7 +482,8 @@ _xfs_buf_obj_cmp(
 		 * reallocating a busy extent. Skip this buffer and
 		 * continue searching for an exact match.
 		 */
-		ASSERT(bp->b_flags & XBF_STALE);
+		if (!(map->bm_flags & XBM_IGNORE_LENGTH_MISMATCH))
+			ASSERT(bp->b_flags & XBF_STALE);
 		return 1;
 	}
 	return 0;
@@ -683,6 +684,8 @@ xfs_buf_get_map(
 	int			error;
 	int			i;
 
+	if (flags & XBF_BCACHE_SCAN)
+		cmap.bm_flags |= XBM_IGNORE_LENGTH_MISMATCH;
 	for (i = 0; i < nmaps; i++)
 		cmap.bm_len += map[i].bm_len;
 
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 549c60942208..d6e8c3bab9f6 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -44,6 +44,11 @@ struct xfs_buf;
 #define _XBF_DELWRI_Q	 (1u << 22)/* buffer on a delwri queue */
 
 /* flags used only as arguments to access routines */
+/*
+ * We're scanning the buffer cache; do not warn about lookup mismatches.
+ * Only online repair should use this.
+ */
+#define XBF_BCACHE_SCAN	 (1u << 28)
 #define XBF_INCORE	 (1u << 29)/* lookup only, return if found in cache */
 #define XBF_TRYLOCK	 (1u << 30)/* lock requested, but do not wait */
 #define XBF_UNMAPPED	 (1u << 31)/* do not map the buffer */
@@ -67,6 +72,7 @@ typedef unsigned int xfs_buf_flags_t;
 	{ _XBF_KMEM,		"KMEM" }, \
 	{ _XBF_DELWRI_Q,	"DELWRI_Q" }, \
 	/* The following interface flags should never be set */ \
+	{ XBF_BCACHE_SCAN,	"BCACHE_SCAN" }, \
 	{ XBF_INCORE,		"INCORE" }, \
 	{ XBF_TRYLOCK,		"TRYLOCK" }, \
 	{ XBF_UNMAPPED,		"UNMAPPED" }
@@ -114,8 +120,12 @@ typedef struct xfs_buftarg {
 struct xfs_buf_map {
 	xfs_daddr_t		bm_bn;	/* block number for I/O */
 	int			bm_len;	/* size of I/O */
+	unsigned int		bm_flags;
 };
 
+/* Don't complain about live buffers with the wrong length during lookup. */
+#define XBM_IGNORE_LENGTH_MISMATCH	(1U << 0)
+
 #define DEFINE_SINGLE_BUF_MAP(map, blkno, numblk) \
 	struct xfs_buf_map (map) = { .bm_bn = (blkno), .bm_len = (numblk) };
 



* [PATCH 10/23] xfs: create a big array data structure
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (8 preceding siblings ...)
  2023-02-16 20:44   ` [PATCH 09/23] xfs: ignore stale buffers when scanning the buffer cache Darrick J. Wong
@ 2023-02-16 20:44   ` Darrick J. Wong
  2023-02-16 20:44   ` [PATCH 11/23] xfs: wrap ilock/iunlock operations on sc->ip Darrick J. Wong
                     ` (12 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:44 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a simple 'big array' data structure for storage of fixed-size
metadata records that will be used to reconstruct a btree index.  For
repair operations, the most important operations are append, iterate,
and sort.

Earlier implementations of the big array used linked lists and suffered
from severe problems -- pinning all records in kernel memory was not a
good idea and frequently led to OOM situations; random access was very
inefficient; and record overhead for the lists was unacceptably high at
40-60%.

Therefore, the big memory array relies on the 'xfile' abstraction, which
creates a memfd file and stores the records in page cache pages.  Since
the memfd is created in tmpfs, the memory pages can be pushed out to
disk if necessary and we have a built-in usage limit of 50% of physical
memory.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Kconfig         |    1 
 fs/xfs/Makefile        |    2 
 fs/xfs/scrub/trace.c   |    4 -
 fs/xfs/scrub/trace.h   |  123 ++++++++++++++++
 fs/xfs/scrub/xfarray.c |  370 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/xfarray.h |   58 ++++++++
 fs/xfs/scrub/xfile.c   |  318 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/xfile.h   |   58 ++++++++
 8 files changed, 933 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/xfarray.c
 create mode 100644 fs/xfs/scrub/xfarray.h
 create mode 100644 fs/xfs/scrub/xfile.c
 create mode 100644 fs/xfs/scrub/xfile.h
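
For readers who want to poke at the idea outside the kernel, here is a
rough userspace sketch of the append/load/iterate pattern described
above.  It is not the code in this patch: every demo_* name is
hypothetical, and an unlinked tmpfile() stands in for the shmem file
that backs a real xfile.  As in the patch, an all-zero record is
treated as unset and skipped during iteration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical userspace stand-in for struct xfarray. */
struct demo_array {
	FILE		*fp;		/* unlinked temp file backing the records */
	int		fd;		/* descriptor used for pread/pwrite */
	uint64_t	nr;		/* number of array elements */
	size_t		obj_size;	/* size of one record */
};

static int demo_array_create(struct demo_array *a, size_t obj_size)
{
	assert(obj_size > 0);

	a->fp = tmpfile();
	if (!a->fp)
		return -1;
	a->fd = fileno(a->fp);
	a->nr = 0;
	a->obj_size = obj_size;
	return 0;
}

static void demo_array_destroy(struct demo_array *a)
{
	fclose(a->fp);
}

/* Store a record at @idx; a short write counts as failure. */
static int demo_array_store(struct demo_array *a, uint64_t idx,
			    const void *rec)
{
	off_t	pos = (off_t)(idx * a->obj_size);

	if (pwrite(a->fd, rec, a->obj_size, pos) != (ssize_t)a->obj_size)
		return -1;
	if (idx + 1 > a->nr)
		a->nr = idx + 1;
	return 0;
}

static int demo_array_append(struct demo_array *a, const void *rec)
{
	return demo_array_store(a, a->nr, rec);
}

/* Load the record at @idx; holes in the file read back as zeroes. */
static int demo_array_load(struct demo_array *a, uint64_t idx, void *rec)
{
	off_t	pos = (off_t)(idx * a->obj_size);

	if (idx >= a->nr)
		return -1;
	if (pread(a->fd, rec, a->obj_size, pos) != (ssize_t)a->obj_size)
		return -1;
	return 0;
}

/* An all-zero record is considered unset. */
static int demo_is_null(const struct demo_array *a, const void *rec)
{
	const unsigned char	*p = rec;
	size_t			i;

	for (i = 0; i < a->obj_size; i++)
		if (p[i])
			return 0;
	return 1;
}

/* Fetch the next set record at or after *idx, then advance *idx. */
static int demo_array_load_next(struct demo_array *a, uint64_t *idx,
				void *rec)
{
	uint64_t	cur = *idx;

	do {
		if (demo_array_load(a, cur, rec))
			return -1;
		cur++;
	} while (demo_is_null(a, rec));

	*idx = cur;
	return 0;
}
```

A caller would create the array, append or store records, and then
loop on demo_array_load_next() with the cursor starting at zero until
it fails.  The sketch reads every slot during iteration; the patch
avoids that cost for sparse arrays by asking the backing file where the
next data is.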


diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index 9fac5ea8d0e4..7f12b40146b3 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -97,6 +97,7 @@ config XFS_ONLINE_SCRUB
 	bool "XFS online metadata check support"
 	default n
 	depends on XFS_FS
+	depends on TMPFS && SHMEM
 	help
 	  If you say Y here you will be able to check metadata on a
 	  mounted XFS filesystem.  This feature is intended to reduce
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 2de5a71a2fa3..b55b8ece7627 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -162,6 +162,8 @@ xfs-y				+= $(addprefix scrub/, \
 				   rmap.o \
 				   scrub.o \
 				   symlink.o \
+				   xfarray.o \
+				   xfile.o \
 				   )
 
 xfs-$(CONFIG_XFS_RT)		+= scrub/rtbitmap.o
diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
index b5f94676c37c..4a0385c97ea6 100644
--- a/fs/xfs/scrub/trace.c
+++ b/fs/xfs/scrub/trace.c
@@ -12,8 +12,10 @@
 #include "xfs_mount.h"
 #include "xfs_inode.h"
 #include "xfs_btree.h"
-#include "scrub/scrub.h"
 #include "xfs_ag.h"
+#include "scrub/scrub.h"
+#include "scrub/xfile.h"
+#include "scrub/xfarray.h"
 
 /* Figure out which block the btree cursor was pointing to. */
 static inline xfs_fsblock_t
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 93ece6df02e3..25086df0963c 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -16,6 +16,9 @@
 #include <linux/tracepoint.h>
 #include "xfs_bit.h"
 
+struct xfile;
+struct xfarray;
+
 /*
  * ftrace's __print_symbolic requires that all enum values be wrapped in the
  * TRACE_DEFINE_ENUM macro so that the enum value can be encoded in the ftrace
@@ -657,6 +660,126 @@ TRACE_EVENT(xchk_fscounters_within_range,
 		  __entry->old_value)
 )
 
+TRACE_EVENT(xfile_create,
+	TP_PROTO(struct xfs_mount *mp, struct xfile *xf),
+	TP_ARGS(mp, xf),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(unsigned long, ino)
+		__array(char, pathname, 256)
+	),
+	TP_fast_assign(
+		char		pathname[257];
+		char		*path;
+
+		__entry->dev = mp->m_super->s_dev;
+		__entry->ino = file_inode(xf->file)->i_ino;
+		memset(pathname, 0, sizeof(pathname));
+		path = file_path(xf->file, pathname, sizeof(pathname) - 1);
+		if (IS_ERR(path))
+			path = "(unknown)";
+		strncpy(__entry->pathname, path, sizeof(__entry->pathname));
+	),
+	TP_printk("dev %d:%d xfino 0x%lx path '%s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->pathname)
+);
+
+TRACE_EVENT(xfile_destroy,
+	TP_PROTO(struct xfile *xf),
+	TP_ARGS(xf),
+	TP_STRUCT__entry(
+		__field(unsigned long, ino)
+		__field(unsigned long long, bytes)
+		__field(loff_t, size)
+	),
+	TP_fast_assign(
+		struct xfile_stat	statbuf;
+		int			ret;
+
+		ret = xfile_stat(xf, &statbuf);
+		if (!ret) {
+			__entry->bytes = statbuf.bytes;
+			__entry->size = statbuf.size;
+		} else {
+			__entry->bytes = -1;
+			__entry->size = -1;
+		}
+		__entry->ino = file_inode(xf->file)->i_ino;
+	),
+	TP_printk("xfino 0x%lx mem_bytes 0x%llx isize 0x%llx",
+		  __entry->ino,
+		  __entry->bytes,
+		  __entry->size)
+);
+
+DECLARE_EVENT_CLASS(xfile_class,
+	TP_PROTO(struct xfile *xf, loff_t pos, unsigned long long bytecount),
+	TP_ARGS(xf, pos, bytecount),
+	TP_STRUCT__entry(
+		__field(unsigned long, ino)
+		__field(unsigned long long, bytes_used)
+		__field(loff_t, pos)
+		__field(loff_t, size)
+		__field(unsigned long long, bytecount)
+	),
+	TP_fast_assign(
+		struct xfile_stat	statbuf;
+		int			ret;
+
+		ret = xfile_stat(xf, &statbuf);
+		if (!ret) {
+			__entry->bytes_used = statbuf.bytes;
+			__entry->size = statbuf.size;
+		} else {
+			__entry->bytes_used = -1;
+			__entry->size = -1;
+		}
+		__entry->ino = file_inode(xf->file)->i_ino;
+		__entry->pos = pos;
+		__entry->bytecount = bytecount;
+	),
+	TP_printk("xfino 0x%lx mem_bytes 0x%llx pos 0x%llx bytecount 0x%llx isize 0x%llx",
+		  __entry->ino,
+		  __entry->bytes_used,
+		  __entry->pos,
+		  __entry->bytecount,
+		  __entry->size)
+);
+#define DEFINE_XFILE_EVENT(name) \
+DEFINE_EVENT(xfile_class, name, \
+	TP_PROTO(struct xfile *xf, loff_t pos, unsigned long long bytecount), \
+	TP_ARGS(xf, pos, bytecount))
+DEFINE_XFILE_EVENT(xfile_pread);
+DEFINE_XFILE_EVENT(xfile_pwrite);
+DEFINE_XFILE_EVENT(xfile_seek_data);
+
+TRACE_EVENT(xfarray_create,
+	TP_PROTO(struct xfarray *xfa, unsigned long long required_capacity),
+	TP_ARGS(xfa, required_capacity),
+	TP_STRUCT__entry(
+		__field(unsigned long, ino)
+		__field(uint64_t, max_nr)
+		__field(size_t, obj_size)
+		__field(int, obj_size_log)
+		__field(unsigned long long, required_capacity)
+	),
+	TP_fast_assign(
+		__entry->max_nr = xfa->max_nr;
+		__entry->obj_size = xfa->obj_size;
+		__entry->obj_size_log = xfa->obj_size_log;
+		__entry->ino = file_inode(xfa->xfile->file)->i_ino;
+		__entry->required_capacity = required_capacity;
+	),
+	TP_printk("xfino 0x%lx max_nr %llu reqd_nr %llu objsz %zu objszlog %d",
+		  __entry->ino,
+		  __entry->max_nr,
+		  __entry->required_capacity,
+		  __entry->obj_size,
+		  __entry->obj_size_log)
+);
+
 /* repair tracepoints */
 #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
 
diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
new file mode 100644
index 000000000000..8fdd7dd40193
--- /dev/null
+++ b/fs/xfs/scrub/xfarray.c
@@ -0,0 +1,370 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "scrub/xfile.h"
+#include "scrub/xfarray.h"
+#include "scrub/scrub.h"
+#include "scrub/trace.h"
+
+/*
+ * Large Arrays of Fixed-Size Records
+ * ==================================
+ *
+ * This memory array uses an xfile (which itself is a memfd "file") to store
+ * large numbers of fixed-size records in memory that can be paged out.  This
+ * puts less stress on the memory reclaim algorithms during an online repair
+ * because we don't have to pin so much memory.  However, array access is less
+ * direct than it would be in a regular memory array.  Access to the array is
+ * performed via indexed load and store methods, and an append method is
+ * provided for convenience.  Array elements can be unset, which sets them to
+ * all zeroes.  Unset entries are skipped during iteration, though direct loads
+ * will return a zeroed buffer.  Callers are responsible for concurrency
+ * control.
+ */
+
+/*
+ * Pointer to scratch space.  Because we can't access the xfile data directly,
+ * we allocate a small amount of memory on the end of the xfarray structure to
+ * buffer array items when we need space to store values temporarily.
+ */
+static inline void *xfarray_scratch(struct xfarray *array)
+{
+	return (array + 1);
+}
+
+/* Compute array index given an xfile offset. */
+static xfarray_idx_t
+xfarray_idx(
+	struct xfarray	*array,
+	loff_t		pos)
+{
+	if (array->obj_size_log >= 0)
+		return (xfarray_idx_t)pos >> array->obj_size_log;
+
+	return div_u64((xfarray_idx_t)pos, array->obj_size);
+}
+
+/* Compute xfile offset of array element. */
+static inline loff_t xfarray_pos(struct xfarray *array, xfarray_idx_t idx)
+{
+	if (array->obj_size_log >= 0)
+		return idx << array->obj_size_log;
+
+	return idx * array->obj_size;
+}
+
+/*
+ * Initialize a big memory array.  Array records cannot be larger than a
+ * page, and the array cannot span more bytes than the page cache supports.
+ * If @required_capacity is nonzero, the maximum array size will be set to this
+ * quantity and the array creation will fail if the underlying storage cannot
+ * support that many records.
+ */
+int
+xfarray_create(
+	struct xfs_mount	*mp,
+	const char		*description,
+	unsigned long long	required_capacity,
+	size_t			obj_size,
+	struct xfarray		**arrayp)
+{
+	struct xfarray		*array;
+	struct xfile		*xfile;
+	int			error;
+
+	ASSERT(obj_size < PAGE_SIZE);
+
+	error = xfile_create(mp, description, 0, &xfile);
+	if (error)
+		return error;
+
+	error = -ENOMEM;
+	array = kzalloc(sizeof(struct xfarray) + obj_size, XCHK_GFP_FLAGS);
+	if (!array)
+		goto out_xfile;
+
+	array->xfile = xfile;
+	array->obj_size = obj_size;
+
+	if (is_power_of_2(obj_size))
+		array->obj_size_log = ilog2(obj_size);
+	else
+		array->obj_size_log = -1;
+
+	array->max_nr = xfarray_idx(array, MAX_LFS_FILESIZE);
+	trace_xfarray_create(array, required_capacity);
+
+	if (required_capacity > 0) {
+		if (array->max_nr < required_capacity) {
+			error = -ENOMEM;
+			goto out_xfarray;
+		}
+		array->max_nr = required_capacity;
+	}
+
+	*arrayp = array;
+	return 0;
+
+out_xfarray:
+	kfree(array);
+out_xfile:
+	xfile_destroy(xfile);
+	return error;
+}
+
+/* Destroy the array. */
+void
+xfarray_destroy(
+	struct xfarray	*array)
+{
+	xfile_destroy(array->xfile);
+	kfree(array);
+}
+
+/* Load an element from the array. */
+int
+xfarray_load(
+	struct xfarray	*array,
+	xfarray_idx_t	idx,
+	void		*ptr)
+{
+	if (idx >= array->nr)
+		return -ENODATA;
+
+	return xfile_obj_load(array->xfile, ptr, array->obj_size,
+			xfarray_pos(array, idx));
+}
+
+/* Is this array element potentially unset? */
+static inline bool
+xfarray_is_unset(
+	struct xfarray	*array,
+	loff_t		pos)
+{
+	void		*temp = xfarray_scratch(array);
+	int		error;
+
+	if (array->unset_slots == 0)
+		return false;
+
+	error = xfile_obj_load(array->xfile, temp, array->obj_size, pos);
+	if (!error && xfarray_element_is_null(array, temp))
+		return true;
+
+	return false;
+}
+
+/*
+ * Unset an array element.  If @idx is the last element in the array, the
+ * array will be truncated.  Otherwise, the entry will be zeroed.
+ */
+int
+xfarray_unset(
+	struct xfarray	*array,
+	xfarray_idx_t	idx)
+{
+	void		*temp = xfarray_scratch(array);
+	loff_t		pos = xfarray_pos(array, idx);
+	int		error;
+
+	if (idx >= array->nr)
+		return -ENODATA;
+
+	if (idx == array->nr - 1) {
+		array->nr--;
+		return 0;
+	}
+
+	if (xfarray_is_unset(array, pos))
+		return 0;
+
+	memset(temp, 0, array->obj_size);
+	error = xfile_obj_store(array->xfile, temp, array->obj_size, pos);
+	if (error)
+		return error;
+
+	array->unset_slots++;
+	return 0;
+}
+
+/*
+ * Store an element in the array.  The element must not be completely zeroed,
+ * because those are considered unset sparse elements.
+ */
+int
+xfarray_store(
+	struct xfarray	*array,
+	xfarray_idx_t	idx,
+	const void	*ptr)
+{
+	int		ret;
+
+	if (idx >= array->max_nr)
+		return -EFBIG;
+
+	ASSERT(!xfarray_element_is_null(array, ptr));
+
+	ret = xfile_obj_store(array->xfile, ptr, array->obj_size,
+			xfarray_pos(array, idx));
+	if (ret)
+		return ret;
+
+	array->nr = max(array->nr, idx + 1);
+	return 0;
+}
+
+/* Is this array element NULL? */
+bool
+xfarray_element_is_null(
+	struct xfarray	*array,
+	const void	*ptr)
+{
+	return !memchr_inv(ptr, 0, array->obj_size);
+}
+
+/*
+ * Store an element anywhere in the array that is unset.  If there are no
+ * unset slots, append the element to the array.
+ */
+int
+xfarray_store_anywhere(
+	struct xfarray	*array,
+	const void	*ptr)
+{
+	void		*temp = xfarray_scratch(array);
+	loff_t		endpos = xfarray_pos(array, array->nr);
+	loff_t		pos;
+	int		error;
+
+	/* Find an unset slot to put it in. */
+	for (pos = 0;
+	     pos < endpos && array->unset_slots > 0;
+	     pos += array->obj_size) {
+		error = xfile_obj_load(array->xfile, temp, array->obj_size,
+				pos);
+		if (error || !xfarray_element_is_null(array, temp))
+			continue;
+
+		error = xfile_obj_store(array->xfile, ptr, array->obj_size,
+				pos);
+		if (error)
+			return error;
+
+		array->unset_slots--;
+		return 0;
+	}
+
+	/* No unset slots found; attach it on the end. */
+	array->unset_slots = 0;
+	return xfarray_append(array, ptr);
+}
+
+/* Return length of array. */
+uint64_t
+xfarray_length(
+	struct xfarray	*array)
+{
+	return array->nr;
+}
+
+/*
+ * Decide which array item we're going to read as part of a _load_next.
+ * @cur is the array index, and @pos is the file offset of that array index in
+ * the backing xfile.  Returns -ENODATA if we reach the end of the records.
+ *
+ * Reading from a hole in a sparse xfile causes page instantiation, so for
+ * iterating a (possibly sparse) array we need to figure out if the cursor is
+ * pointing at a totally uninitialized hole and move the cursor up if
+ * necessary.
+ */
+static inline int
+xfarray_find_data(
+	struct xfarray	*array,
+	xfarray_idx_t	*cur,
+	loff_t		*pos)
+{
+	unsigned int	pgoff = offset_in_page(*pos);
+	loff_t		end_pos = *pos + array->obj_size - 1;
+	loff_t		new_pos;
+
+	/*
+	 * If the current array record is not adjacent to a page boundary, we
+	 * are in the middle of the page.  We do not need to move the cursor.
+	 */
+	if (pgoff != 0 && pgoff + array->obj_size - 1 < PAGE_SIZE)
+		return 0;
+
+	/*
+	 * Call SEEK_DATA on the last byte in the record we're about to read.
+	 * If the record ends at (or crosses) the end of a page then we know
+	 * that the first byte of the record is backed by pages and don't need
+	 * to query it.  If instead the record begins at the start of the page
+	 * then we know that querying the last byte is just as good as querying
+	 * the first byte, since records cannot be larger than a page.
+	 *
+	 * If the call returns the same file offset, we know this record is
+	 * backed by real pages.  We do not need to move the cursor.
+	 */
+	new_pos = xfile_seek_data(array->xfile, end_pos);
+	if (new_pos == -ENXIO)
+		return -ENODATA;
+	if (new_pos < 0)
+		return new_pos;
+	if (new_pos == end_pos)
+		return 0;
+
+	/*
+	 * Otherwise, SEEK_DATA told us how far up to move the file pointer to
+	 * find more data.  Move the array index to the first record past the
+	 * byte offset we were given.
+	 */
+	new_pos = roundup_64(new_pos, array->obj_size);
+	*cur = xfarray_idx(array, new_pos);
+	*pos = xfarray_pos(array, *cur);
+	return 0;
+}
+
+/*
+ * Starting at *idx, fetch the next non-null array entry and advance the index
+ * to set up the next _load_next call.  Returns -ENODATA if we reach the end of
+ * the array.  Callers must set @*idx to XFARRAY_CURSOR_INIT before the first
+ * call to this function.
+ */
+int
+xfarray_load_next(
+	struct xfarray	*array,
+	xfarray_idx_t	*idx,
+	void		*rec)
+{
+	xfarray_idx_t	cur = *idx;
+	loff_t		pos = xfarray_pos(array, cur);
+	int		error;
+
+	do {
+		if (cur >= array->nr)
+			return -ENODATA;
+
+		/*
+		 * Ask the backing store for the location of next possible
+		 * written record, then retrieve that record.
+		 */
+		error = xfarray_find_data(array, &cur, &pos);
+		if (error)
+			return error;
+		error = xfarray_load(array, cur, rec);
+		if (error)
+			return error;
+
+		cur++;
+		pos += array->obj_size;
+	} while (xfarray_element_is_null(array, rec));
+
+	*idx = cur;
+	return 0;
+}
diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h
new file mode 100644
index 000000000000..26e2b594f121
--- /dev/null
+++ b/fs/xfs/scrub/xfarray.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_SCRUB_XFARRAY_H__
+#define __XFS_SCRUB_XFARRAY_H__
+
+/* xfile array index type, along with cursor initialization */
+typedef uint64_t		xfarray_idx_t;
+#define XFARRAY_CURSOR_INIT	((__force xfarray_idx_t)0)
+
+/* Iterate each index of an xfile array. */
+#define foreach_xfarray_idx(array, idx) \
+	for ((idx) = XFARRAY_CURSOR_INIT; \
+	     (idx) < xfarray_length(array); \
+	     (idx)++)
+
+struct xfarray {
+	/* Underlying file that backs the array. */
+	struct xfile	*xfile;
+
+	/* Number of array elements. */
+	xfarray_idx_t	nr;
+
+	/* Maximum possible array size. */
+	xfarray_idx_t	max_nr;
+
+	/* Number of unset slots in the array below @nr. */
+	uint64_t	unset_slots;
+
+	/* Size of an array element. */
+	size_t		obj_size;
+
+	/* log2 of array element size, if possible. */
+	int		obj_size_log;
+};
+
+int xfarray_create(struct xfs_mount *mp, const char *descr,
+		unsigned long long required_capacity, size_t obj_size,
+		struct xfarray **arrayp);
+void xfarray_destroy(struct xfarray *array);
+int xfarray_load(struct xfarray *array, xfarray_idx_t idx, void *ptr);
+int xfarray_unset(struct xfarray *array, xfarray_idx_t idx);
+int xfarray_store(struct xfarray *array, xfarray_idx_t idx, const void *ptr);
+int xfarray_store_anywhere(struct xfarray *array, const void *ptr);
+bool xfarray_element_is_null(struct xfarray *array, const void *ptr);
+
+/* Append an element to the array. */
+static inline int xfarray_append(struct xfarray *array, const void *ptr)
+{
+	return xfarray_store(array, array->nr, ptr);
+}
+
+uint64_t xfarray_length(struct xfarray *array);
+int xfarray_load_next(struct xfarray *array, xfarray_idx_t *idx, void *rec);
+
+#endif /* __XFS_SCRUB_XFARRAY_H__ */
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
new file mode 100644
index 000000000000..43455aa78243
--- /dev/null
+++ b/fs/xfs/scrub/xfile.c
@@ -0,0 +1,318 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_format.h"
+#include "scrub/xfile.h"
+#include "scrub/xfarray.h"
+#include "scrub/scrub.h"
+#include "scrub/trace.h"
+#include <linux/shmem_fs.h>
+
+/*
+ * Swappable Temporary Memory
+ * ==========================
+ *
+ * Online checking sometimes needs to be able to stage a large amount of data
+ * in memory.  This information might not fit in the available memory and it
+ * doesn't all need to be accessible at all times.  In other words, we want an
+ * indexed data buffer to store data that can be paged out.
+ *
+ * When CONFIG_TMPFS=y, shmemfs is enough of a filesystem to meet those
+ * requirements.  Therefore, the xfile mechanism uses an unlinked shmem file to
+ * store our staging data.  This file is not installed in the file descriptor
+ * table so that user programs cannot access the data, which means that the
+ * xfile must be freed with xfile_destroy.
+ *
+ * xfiles assume that the caller will handle all required concurrency
+ * management; standard vfs locks (freezer and inode) are not taken.  Reads
+ * and writes are satisfied directly from the page cache.
+ *
+ * NOTE: The current shmemfs implementation has a quirk that in-kernel reads
+ * of a hole cause a page to be mapped into the file.  If you are going to
+ * create a sparse xfile, please be careful about reading from uninitialized
+ * parts of the file.  These pages are !Uptodate and will eventually be
+ * reclaimed if not written, but in the short term this boosts memory
+ * consumption.
+ */
+
+/*
+ * xfiles must not be exposed to userspace and require upper layers to
+ * coordinate access to the one handle returned by the constructor, so
+ * establish a separate lock class for xfiles to avoid confusing lockdep.
+ */
+static struct lock_class_key xfile_i_mutex_key;
+
+/*
+ * Create an xfile of the given size.  The description will be used in the
+ * trace output.
+ */
+int
+xfile_create(
+	struct xfs_mount	*mp,
+	const char		*description,
+	loff_t			isize,
+	struct xfile		**xfilep)
+{
+	char			*fname;
+	struct xfile		*xf;
+	int			error = -ENOMEM;
+
+	xf = kmalloc(sizeof(struct xfile), XCHK_GFP_FLAGS);
+	if (!xf)
+		return -ENOMEM;
+
+	fname = kmalloc(MAXNAMELEN, XCHK_GFP_FLAGS);
+	if (!fname)
+		goto out_xfile;
+
+	snprintf(fname, MAXNAMELEN - 1, "XFS (%s): %s", mp->m_super->s_id,
+			description);
+	fname[MAXNAMELEN - 1] = 0;
+
+	xf->file = shmem_file_setup(fname, isize, 0);
+	if (!xf->file)
+		goto out_fname;
+	if (IS_ERR(xf->file)) {
+		error = PTR_ERR(xf->file);
+		goto out_fname;
+	}
+
+	/*
+	 * We want a large sparse file that we can pread, pwrite, and seek.
+	 * xfile users are responsible for keeping the xfile hidden away from
+	 * all other callers, so we skip timestamp updates and security checks.
+	 */
+	xf->file->f_mode |= FMODE_PREAD | FMODE_PWRITE | FMODE_NOCMTIME |
+			    FMODE_LSEEK;
+	xf->file->f_flags |= O_RDWR | O_LARGEFILE | O_NOATIME;
+	xf->file->f_inode->i_flags |= S_PRIVATE | S_NOCMTIME | S_NOATIME;
+
+	lockdep_set_class(&file_inode(xf->file)->i_rwsem, &xfile_i_mutex_key);
+
+	trace_xfile_create(mp, xf);
+
+	kfree(fname);
+	*xfilep = xf;
+	return 0;
+out_fname:
+	kfree(fname);
+out_xfile:
+	kfree(xf);
+	return error;
+}
+
+/* Close the file and release all resources. */
+void
+xfile_destroy(
+	struct xfile		*xf)
+{
+	struct inode		*inode = file_inode(xf->file);
+
+	trace_xfile_destroy(xf);
+
+	lockdep_set_class(&inode->i_rwsem, &inode->i_sb->s_type->i_mutex_key);
+	fput(xf->file);
+	kfree(xf);
+}
+
+/*
+ * Read a memory object directly from the xfile's page cache.  Unlike regular
+ * pread, we return -E2BIG and -EFBIG for reads that are too large or at too
+ * high an offset, instead of truncating the read.  Otherwise, we return
+ * bytes read or an error code, like regular pread.
+ */
+ssize_t
+xfile_pread(
+	struct xfile		*xf,
+	void			*buf,
+	size_t			count,
+	loff_t			pos)
+{
+	struct inode		*inode = file_inode(xf->file);
+	struct address_space	*mapping = inode->i_mapping;
+	struct page		*page = NULL;
+	ssize_t			read = 0;
+	unsigned int		pflags;
+	int			error = 0;
+
+	if (count > MAX_RW_COUNT)
+		return -E2BIG;
+	if (inode->i_sb->s_maxbytes - pos < count)
+		return -EFBIG;
+
+	trace_xfile_pread(xf, pos, count);
+
+	pflags = memalloc_nofs_save();
+	while (count > 0) {
+		void		*p, *kaddr;
+		unsigned int	len;
+
+		len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
+
+		/*
+		 * In-kernel reads of a shmem file cause it to allocate a page
+		 * if the mapping shows a hole.  Therefore, if we hit ENOMEM
+		 * we can continue by zeroing the caller's buffer.
+		 */
+		page = shmem_read_mapping_page_gfp(mapping, pos >> PAGE_SHIFT,
+				__GFP_NOWARN);
+		if (IS_ERR(page)) {
+			error = PTR_ERR(page);
+			if (error != -ENOMEM)
+				break;
+
+			memset(buf, 0, len);
+			goto advance;
+		}
+
+		if (PageUptodate(page)) {
+			/*
+			 * xfile pages must never be mapped into userspace, so
+			 * we skip the dcache flush.
+			 */
+			kaddr = kmap_local_page(page);
+			p = kaddr + offset_in_page(pos);
+			memcpy(buf, p, len);
+			kunmap_local(kaddr);
+		} else {
+			memset(buf, 0, len);
+		}
+		put_page(page);
+
+advance:
+		count -= len;
+		pos += len;
+		buf += len;
+		read += len;
+	}
+	memalloc_nofs_restore(pflags);
+
+	if (read > 0)
+		return read;
+	return error;
+}
+
+/*
+ * Write a memory object directly to the xfile's page cache.  Unlike regular
+ * pwrite, we return -E2BIG and -EFBIG for writes that are too large or at too
+ * high an offset, instead of truncating the write.  Otherwise, we return
+ * bytes written or an error code, like regular pwrite.
+ */
+ssize_t
+xfile_pwrite(
+	struct xfile		*xf,
+	const void		*buf,
+	size_t			count,
+	loff_t			pos)
+{
+	struct inode		*inode = file_inode(xf->file);
+	struct address_space	*mapping = inode->i_mapping;
+	const struct address_space_operations *aops = mapping->a_ops;
+	struct page		*page = NULL;
+	ssize_t			written = 0;
+	unsigned int		pflags;
+	int			error = 0;
+
+	if (count > MAX_RW_COUNT)
+		return -E2BIG;
+	if (inode->i_sb->s_maxbytes - pos < count)
+		return -EFBIG;
+
+	trace_xfile_pwrite(xf, pos, count);
+
+	pflags = memalloc_nofs_save();
+	while (count > 0) {
+		void		*fsdata = NULL;
+		void		*p, *kaddr;
+		unsigned int	len;
+		int		ret;
+
+		len = min_t(ssize_t, count, PAGE_SIZE - offset_in_page(pos));
+
+		/*
+		 * We call write_begin directly here to avoid all the freezer
+		 * protection lock-taking that happens in the normal path.
+		 * shmem doesn't support fs freeze, but lockdep doesn't know
+		 * that and will trip over that.
+		 */
+		error = aops->write_begin(NULL, mapping, pos, len, &page,
+				&fsdata);
+		if (error)
+			break;
+
+		/*
+		 * xfile pages must never be mapped into userspace, so we skip
+		 * the dcache flush.  If the page is not uptodate, zero it
+		 * before writing data.
+		 */
+		kaddr = kmap_local_page(page);
+		if (!PageUptodate(page)) {
+			memset(kaddr, 0, PAGE_SIZE);
+			SetPageUptodate(page);
+		}
+		p = kaddr + offset_in_page(pos);
+		memcpy(p, buf, len);
+		kunmap_local(kaddr);
+
+		ret = aops->write_end(NULL, mapping, pos, len, len, page,
+				fsdata);
+		if (ret < 0) {
+			error = ret;
+			break;
+		}
+
+		written += ret;
+		if (ret != len)
+			break;
+
+		count -= ret;
+		pos += ret;
+		buf += ret;
+	}
+	memalloc_nofs_restore(pflags);
+
+	if (written > 0)
+		return written;
+	return error;
+}
+
+/* Find the next written area in the xfile data for a given offset. */
+loff_t
+xfile_seek_data(
+	struct xfile		*xf,
+	loff_t			pos)
+{
+	loff_t			ret;
+
+	ret = vfs_llseek(xf->file, pos, SEEK_DATA);
+	trace_xfile_seek_data(xf, pos, ret);
+	return ret;
+}
+
+/* Query stat information for an xfile. */
+int
+xfile_stat(
+	struct xfile		*xf,
+	struct xfile_stat	*statbuf)
+{
+	struct kstat		ks;
+	int			error;
+
+	error = vfs_getattr_nosec(&xf->file->f_path, &ks,
+			STATX_SIZE | STATX_BLOCKS, AT_STATX_DONT_SYNC);
+	if (error)
+		return error;
+
+	statbuf->size = ks.size;
+	statbuf->bytes = ks.blocks << SECTOR_SHIFT;
+	return 0;
+}
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
new file mode 100644
index 000000000000..b37dba1961d8
--- /dev/null
+++ b/fs/xfs/scrub/xfile.h
@@ -0,0 +1,58 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_SCRUB_XFILE_H__
+#define __XFS_SCRUB_XFILE_H__
+
+struct xfile {
+	struct file		*file;
+};
+
+int xfile_create(struct xfs_mount *mp, const char *description, loff_t isize,
+		struct xfile **xfilep);
+void xfile_destroy(struct xfile *xf);
+
+ssize_t xfile_pread(struct xfile *xf, void *buf, size_t count, loff_t pos);
+ssize_t xfile_pwrite(struct xfile *xf, const void *buf, size_t count,
+		loff_t pos);
+
+/*
+ * Load an object.  Since we're treating this file as "memory", any error or
+ * short IO is treated as a failure to allocate memory.
+ */
+static inline int
+xfile_obj_load(struct xfile *xf, void *buf, size_t count, loff_t pos)
+{
+	ssize_t	ret = xfile_pread(xf, buf, count, pos);
+
+	if (ret < 0 || ret != count)
+		return -ENOMEM;
+	return 0;
+}
+
+/*
+ * Store an object.  Since we're treating this file as "memory", any error or
+ * short IO is treated as a failure to allocate memory.
+ */
+static inline int
+xfile_obj_store(struct xfile *xf, const void *buf, size_t count, loff_t pos)
+{
+	ssize_t	ret = xfile_pwrite(xf, buf, count, pos);
+
+	if (ret < 0 || ret != count)
+		return -ENOMEM;
+	return 0;
+}
+
+loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
+
+struct xfile_stat {
+	loff_t			size;
+	unsigned long long	bytes;
+};
+
+int xfile_stat(struct xfile *xf, struct xfile_stat *statbuf);
+
+#endif /* __XFS_SCRUB_XFILE_H__ */
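
The index/offset conversions in xfarray_idx() and xfarray_pos() above
use a shift when the record size is a power of two and fall back to
multiply/divide otherwise.  A small userspace model of that arithmetic,
including the round-up step that xfarray_find_data() applies after
SEEK_DATA, might look like the following.  The demo_* names are
hypothetical and plain C operators replace the kernel helpers
(is_power_of_2, ilog2, div_u64, roundup_64).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the record geometry kept in struct xfarray. */
struct demo_geom {
	uint64_t	obj_size;	/* size of one record, in bytes */
	int		obj_size_log;	/* log2(obj_size), or -1 */
};

static void demo_geom_init(struct demo_geom *g, uint64_t obj_size)
{
	assert(obj_size > 0);

	g->obj_size = obj_size;
	g->obj_size_log = -1;

	/* Only a power of two has exactly one bit set. */
	if ((obj_size & (obj_size - 1)) == 0) {
		g->obj_size_log = 0;
		while ((UINT64_C(1) << g->obj_size_log) < obj_size)
			g->obj_size_log++;
	}
}

/* Array index of the record containing byte offset @pos. */
static uint64_t demo_idx(const struct demo_geom *g, uint64_t pos)
{
	if (g->obj_size_log >= 0)
		return pos >> g->obj_size_log;
	return pos / g->obj_size;
}

/* Byte offset of the first byte of record @idx. */
static uint64_t demo_pos(const struct demo_geom *g, uint64_t idx)
{
	if (g->obj_size_log >= 0)
		return idx << g->obj_size_log;
	return idx * g->obj_size;
}

/* Index of the first record that starts at or after byte offset @pos. */
static uint64_t demo_idx_roundup(const struct demo_geom *g, uint64_t pos)
{
	uint64_t	rounded;

	rounded = (pos + g->obj_size - 1) / g->obj_size * g->obj_size;
	return demo_idx(g, rounded);
}
```

With 24-byte records, for example, a SEEK_DATA result of byte 50 lands
inside record 2, so the cursor moves up to record 3, which starts at
byte 72.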



* [PATCH 11/23] xfs: wrap ilock/iunlock operations on sc->ip
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (9 preceding siblings ...)
  2023-02-16 20:44   ` [PATCH 10/23] xfs: create a big array data structure Darrick J. Wong
@ 2023-02-16 20:44   ` Darrick J. Wong
  2023-02-16 20:44   ` [PATCH 12/23] xfs: port scrub inode scan from djwong-dev Darrick J. Wong
                     ` (11 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:44 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Scrub tracks the resources that it's holding onto in the xfs_scrub
structure.  This includes the inode being checked (if applicable) and
the lock state of that inode.  Replace the open-coded structure
manipulation with trivial helpers to eliminate sources of error.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/bmap.c     |    6 ++----
 fs/xfs/scrub/common.c   |   38 +++++++++++++++++++++++++++++++++-----
 fs/xfs/scrub/common.h   |    4 ++++
 fs/xfs/scrub/dir.c      |    3 +--
 fs/xfs/scrub/inode.c    |    6 ++----
 fs/xfs/scrub/parent.c   |   10 +++-------
 fs/xfs/scrub/quota.c    |    9 +++------
 fs/xfs/scrub/rtbitmap.c |   11 +++++------
 fs/xfs/scrub/scrub.c    |    2 +-
 9 files changed, 54 insertions(+), 35 deletions(-)
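
To illustrate why the wrappers help, here is a toy userspace model of
the bookkeeping.  It is only a sketch: the demo_* names and flag values
are made up, and a bitmask with assertions stands in for the real inode
locks.  The point is that the lock call and the update of the tracked
flags always happen together, so the two cannot drift apart.

```c
#include <assert.h>

/* Made-up flag values standing in for the XFS_*LOCK_EXCL flags. */
#define DEMO_IOLOCK_EXCL	(1u << 0)
#define DEMO_MMAPLOCK_EXCL	(1u << 1)
#define DEMO_ILOCK_EXCL		(1u << 2)

/* Toy inode: just records which locks are currently held. */
struct demo_inode {
	unsigned int	held;
};

/* Toy scrub context: the inode plus the lock state scrub believes it holds. */
struct demo_scrub {
	struct demo_inode	*ip;
	unsigned int		ilock_flags;
};

static void demo_inode_lock(struct demo_inode *ip, unsigned int flags)
{
	assert((ip->held & flags) == 0);	/* no double locking */
	ip->held |= flags;
}

static void demo_inode_unlock(struct demo_inode *ip, unsigned int flags)
{
	assert((ip->held & flags) == flags);	/* must hold what we drop */
	ip->held &= ~flags;
}

/* Take the locks and record them in the scrub context in one step. */
static void demo_ilock(struct demo_scrub *sc, unsigned int flags)
{
	sc->ilock_flags |= flags;
	demo_inode_lock(sc->ip, flags);
}

/* Drop the locks and clear them from the scrub context in one step. */
static void demo_iunlock(struct demo_scrub *sc, unsigned int flags)
{
	demo_inode_unlock(sc->ip, flags);
	sc->ilock_flags &= ~flags;
}
```

In this model, teardown can simply call demo_iunlock(sc,
sc->ilock_flags) and trust that the mask matches what is actually held.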


diff --git a/fs/xfs/scrub/bmap.c b/fs/xfs/scrub/bmap.c
index d50d0eab196a..bab3c5db437b 100644
--- a/fs/xfs/scrub/bmap.c
+++ b/fs/xfs/scrub/bmap.c
@@ -35,8 +35,7 @@ xchk_setup_inode_bmap(
 	if (error)
 		goto out;
 
-	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
-	xfs_ilock(sc->ip, sc->ilock_flags);
+	xchk_ilock(sc, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
 
 	/*
 	 * We don't want any ephemeral data fork updates sitting around
@@ -73,9 +72,8 @@ xchk_setup_inode_bmap(
 	error = xchk_trans_alloc(sc, 0);
 	if (error)
 		goto out;
-	sc->ilock_flags |= XFS_ILOCK_EXCL;
-	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
 
+	xchk_ilock(sc, XFS_ILOCK_EXCL);
 out:
 	/* scrub teardown will unlock and release the inode */
 	return error;
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index d523cbb2c90b..dc78e28a9447 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -750,19 +750,47 @@ xchk_setup_inode_contents(
 		return error;
 
 	/* Got the inode, lock it and we're ready to go. */
-	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
-	xfs_ilock(sc->ip, sc->ilock_flags);
+	xchk_ilock(sc, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
 	error = xchk_trans_alloc(sc, resblks);
 	if (error)
 		goto out;
-	sc->ilock_flags |= XFS_ILOCK_EXCL;
-	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
-
+	xchk_ilock(sc, XFS_ILOCK_EXCL);
 out:
 	/* scrub teardown will unlock and release the inode for us */
 	return error;
 }
 
+void
+xchk_ilock(
+	struct xfs_scrub	*sc,
+	unsigned int		ilock_flags)
+{
+	sc->ilock_flags |= ilock_flags;
+	xfs_ilock(sc->ip, ilock_flags);
+}
+
+bool
+xchk_ilock_nowait(
+	struct xfs_scrub	*sc,
+	unsigned int		ilock_flags)
+{
+	if (xfs_ilock_nowait(sc->ip, ilock_flags)) {
+		sc->ilock_flags |= ilock_flags;
+		return true;
+	}
+
+	return false;
+}
+
+void
+xchk_iunlock(
+	struct xfs_scrub	*sc,
+	unsigned int		ilock_flags)
+{
+	xfs_iunlock(sc->ip, ilock_flags);
+	sc->ilock_flags &= ~ilock_flags;
+}
+
 /*
  * Predicate that decides if we need to evaluate the cross-reference check.
  * If there was an error accessing the cross-reference btree, just delete
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 367f754c5cef..5286c263ff60 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -139,6 +139,10 @@ void xchk_buffer_recheck(struct xfs_scrub *sc, struct xfs_buf *bp);
 int xchk_iget(struct xfs_scrub *sc, xfs_ino_t inum, struct xfs_inode **ipp);
 void xchk_irele(struct xfs_scrub *sc, struct xfs_inode *ip);
 
+void xchk_ilock(struct xfs_scrub *sc, unsigned int ilock_flags);
+bool xchk_ilock_nowait(struct xfs_scrub *sc, unsigned int ilock_flags);
+void xchk_iunlock(struct xfs_scrub *sc, unsigned int ilock_flags);
+
 /*
  * Don't bother cross-referencing if we already found corruption or cross
  * referencing discrepancies.
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 8076e7620734..2a3107cc8ccb 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -840,8 +840,7 @@ xchk_directory(
 	 * _dir_lookup routines, which do their own ILOCK locking.
 	 */
 	oldpos = 0;
-	sc->ilock_flags &= ~XFS_ILOCK_EXCL;
-	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL);
+	xchk_iunlock(sc, XFS_ILOCK_EXCL);
 	while (true) {
 		error = xfs_readdir(sc->tp, sc->ip, &sdc.dir_iter, bufsize);
 		if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0,
diff --git a/fs/xfs/scrub/inode.c b/fs/xfs/scrub/inode.c
index 7a2f38e5202c..a0ee3ce35ed9 100644
--- a/fs/xfs/scrub/inode.c
+++ b/fs/xfs/scrub/inode.c
@@ -48,13 +48,11 @@ xchk_setup_inode(
 	}
 
 	/* Got the inode, lock it and we're ready to go. */
-	sc->ilock_flags = XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL;
-	xfs_ilock(sc->ip, sc->ilock_flags);
+	xchk_ilock(sc, XFS_IOLOCK_EXCL | XFS_MMAPLOCK_EXCL);
 	error = xchk_trans_alloc(sc, 0);
 	if (error)
 		goto out;
-	sc->ilock_flags |= XFS_ILOCK_EXCL;
-	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
+	xchk_ilock(sc, XFS_ILOCK_EXCL);
 
 out:
 	/* scrub teardown will unlock and release the inode for us */
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index 0c23fd49716b..8581a21bfbfd 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -137,8 +137,7 @@ xchk_parent_lock_two_dirs(
 		return -EINVAL;
 	}
 
-	xfs_iunlock(sc->ip, sc->ilock_flags);
-	sc->ilock_flags = 0;
+	xchk_iunlock(sc, sc->ilock_flags);
 	while (true) {
 		if (xchk_should_terminate(sc, &error))
 			return error;
@@ -149,10 +148,8 @@ xchk_parent_lock_two_dirs(
 		 * on either IOLOCK.
 		 */
 		if (xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED)) {
-			if (xfs_ilock_nowait(sc->ip, XFS_IOLOCK_EXCL)) {
-				sc->ilock_flags = XFS_IOLOCK_EXCL;
+			if (xchk_ilock_nowait(sc, XFS_IOLOCK_EXCL))
 				break;
-			}
 			xfs_iunlock(dp, XFS_IOLOCK_SHARED);
 		}
 
@@ -299,8 +296,7 @@ xchk_parent(
 	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
 	 * to drop the ILOCK here in order to do directory lookups.
 	 */
-	sc->ilock_flags &= ~(XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
-	xfs_iunlock(sc->ip, XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
+	xchk_iunlock(sc, XFS_ILOCK_EXCL | XFS_MMAPLOCK_EXCL);
 
 	/* Look up '..' */
 	error = xfs_dir_lookup(sc->tp, sc->ip, &xfs_name_dotdot, &parent_ino,
diff --git a/fs/xfs/scrub/quota.c b/fs/xfs/scrub/quota.c
index 9eeac8565394..d3db47a3dafb 100644
--- a/fs/xfs/scrub/quota.c
+++ b/fs/xfs/scrub/quota.c
@@ -57,8 +57,7 @@ xchk_setup_quota(
 	if (error)
 		return error;
 	sc->ip = xfs_quota_inode(sc->mp, dqtype);
-	xfs_ilock(sc->ip, XFS_ILOCK_EXCL);
-	sc->ilock_flags = XFS_ILOCK_EXCL;
+	xchk_ilock(sc, XFS_ILOCK_EXCL);
 	return 0;
 }
 
@@ -232,13 +231,11 @@ xchk_quota(
 	 * data fork we have to drop ILOCK_EXCL to use the regular dquot
 	 * functions.
 	 */
-	xfs_iunlock(sc->ip, sc->ilock_flags);
-	sc->ilock_flags = 0;
+	xchk_iunlock(sc, sc->ilock_flags);
 	sqi.sc = sc;
 	sqi.last_id = 0;
 	error = xfs_qm_dqiterate(mp, dqtype, xchk_quota_item, &sqi);
-	sc->ilock_flags = XFS_ILOCK_EXCL;
-	xfs_ilock(sc->ip, sc->ilock_flags);
+	xchk_ilock(sc, XFS_ILOCK_EXCL);
 	if (error == -ECANCELED)
 		error = 0;
 	if (!xchk_fblock_process_error(sc, XFS_DATA_FORK,
diff --git a/fs/xfs/scrub/rtbitmap.c b/fs/xfs/scrub/rtbitmap.c
index 0a3bde64c675..340964a35963 100644
--- a/fs/xfs/scrub/rtbitmap.c
+++ b/fs/xfs/scrub/rtbitmap.c
@@ -28,10 +28,9 @@ xchk_setup_rt(
 	if (error)
 		return error;
 
-	sc->ilock_flags = XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP;
 	sc->ip = sc->mp->m_rbmip;
-	xfs_ilock(sc->ip, sc->ilock_flags);
-
+	sc->ilock_flags = 0;
+	xchk_ilock(sc, XFS_ILOCK_EXCL | XFS_ILOCK_RTBITMAP);
 	return 0;
 }
 
@@ -141,8 +140,8 @@ xchk_rtsummary(
 	 * flags so that we don't mix up the inode state that @sc tracks.
 	 */
 	sc->ip = rsumip;
-	sc->ilock_flags = XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM;
-	xfs_ilock(sc->ip, sc->ilock_flags);
+	sc->ilock_flags = 0;
+	xchk_ilock(sc, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM);
 
 	/* Invoke the fork scrubber. */
 	error = xchk_metadata_inode_forks(sc);
@@ -153,7 +152,7 @@ xchk_rtsummary(
 	xchk_set_incomplete(sc);
 out:
 	/* Switch back to the rtbitmap inode and lock flags. */
-	xfs_iunlock(sc->ip, sc->ilock_flags);
+	xchk_iunlock(sc, XFS_ILOCK_EXCL | XFS_ILOCK_RTSUM);
 	sc->ilock_flags = old_ilock_flags;
 	sc->ip = old_ip;
 	return error;
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 752cb4fbd26f..6aedce9b67fc 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -163,7 +163,7 @@ xchk_teardown(
 	}
 	if (sc->ip) {
 		if (sc->ilock_flags)
-			xfs_iunlock(sc->ip, sc->ilock_flags);
+			xchk_iunlock(sc, sc->ilock_flags);
 		if (sc->ip != ip_in &&
 		    !xfs_internal_inum(sc->mp, sc->ip->i_ino))
 			xchk_irele(sc, sc->ip);


* [PATCH 12/23] xfs: port scrub inode scan from djwong-dev
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (10 preceding siblings ...)
  2023-02-16 20:44   ` [PATCH 11/23] xfs: wrap ilock/iunlock operations on sc->ip Darrick J. Wong
@ 2023-02-16 20:44   ` Darrick J. Wong
  2023-02-16 20:45   ` [PATCH 13/23] xfs: allow scrub to hook metadata updates in other writers Darrick J. Wong
                     ` (10 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:44 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Port the inode scanning code from djwong-dev so we can use it here.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
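The cursor protocol described in the "Live File Scan" comment in iscan.c
can be modelled in userspace.  This is only a rough sketch under stated
assumptions: every demo_* name is hypothetical, inode allocation is
reduced to a 64-bit mask, and the iscan mutex, the AGI buffer, and the
iget retry logic are left out entirely.

```c
/*
 * Userspace model of the iscan cursor protocol.  Every demo_* name is
 * hypothetical; the real code walks the inode btree while holding the AGI
 * buffer and protects these fields with a mutex.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_NR_INODES	64
#define DEMO_NULLINO	UINT64_MAX

struct demo_iscan {
	uint64_t	cursor_ino;	/* inode that will be examined next */
	uint64_t	visited_ino;	/* last inode known to be scanned */
};

static void demo_iscan_start(struct demo_iscan *iscan)
{
	iscan->cursor_ino = 0;
	iscan->visited_ino = 0;
}

/*
 * Move the cursor to the next allocated inode after the current one.
 * Returns 1 and sets *inop if there is an inode to examine, or 0 if the
 * scan has run out of inodes.
 */
static int demo_iscan_iter(struct demo_iscan *iscan, uint64_t allocmask,
			   uint64_t *inop)
{
	uint64_t	ino;

	if (iscan->cursor_ino == DEMO_NULLINO)
		return 0;

	for (ino = iscan->cursor_ino + 1; ino < DEMO_NR_INODES; ino++) {
		if (!(allocmask & (UINT64_C(1) << ino)))
			continue;

		/* Everything before the cursor is now considered scanned. */
		iscan->cursor_ino = ino;
		iscan->visited_ino = ino - 1;
		*inop = ino;
		return 1;
	}

	/* Scan finished; all live updates apply from now on. */
	iscan->cursor_ino = DEMO_NULLINO;
	iscan->visited_ino = DEMO_NULLINO;
	return 0;
}

/* The caller has finished collecting data from this inode. */
static void demo_iscan_mark_visited(struct demo_iscan *iscan, uint64_t ino)
{
	assert(ino != DEMO_NULLINO);
	iscan->visited_ino = ino;
}

/* Should a hook apply an update for @ino to the scanner's data? */
static bool demo_iscan_want_live_update(const struct demo_iscan *iscan,
					uint64_t ino)
{
	return iscan->visited_ino >= ino;
}
```

In this model a caller loops on demo_iscan_iter() until it returns 0 and
calls demo_iscan_mark_visited() after processing each inode; a hook then
applies an update only when demo_iscan_want_live_update() returns true,
i.e. only for inodes the scan has already passed.
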
 fs/xfs/Makefile      |    1 
 fs/xfs/scrub/iscan.c |  483 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/iscan.h |   65 +++++++
 fs/xfs/scrub/trace.c |    1 
 fs/xfs/scrub/trace.h |   73 ++++++++
 fs/xfs/xfs_icache.c  |    3 
 fs/xfs/xfs_icache.h  |   11 +
 7 files changed, 632 insertions(+), 5 deletions(-)
 create mode 100644 fs/xfs/scrub/iscan.c
 create mode 100644 fs/xfs/scrub/iscan.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index b55b8ece7627..69805b4ad79f 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -157,6 +157,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   health.o \
 				   ialloc.o \
 				   inode.o \
+				   iscan.o \
 				   parent.o \
 				   refcount.o \
 				   rmap.o \
diff --git a/fs/xfs/scrub/iscan.c b/fs/xfs/scrub/iscan.c
new file mode 100644
index 000000000000..8cf486dfde19
--- /dev/null
+++ b/fs/xfs/scrub/iscan.c
@@ -0,0 +1,483 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_inode.h"
+#include "xfs_btree.h"
+#include "xfs_ialloc.h"
+#include "xfs_ialloc_btree.h"
+#include "xfs_ag.h"
+#include "xfs_error.h"
+#include "xfs_bit.h"
+#include "xfs_icache.h"
+#include "scrub/scrub.h"
+#include "scrub/iscan.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+
+/*
+ * Live File Scan
+ * ==============
+ *
+ * Live file scans walk every inode in a live filesystem.  This is more or
+ * less like a regular iwalk, except that when we're advancing the scan cursor,
+ * we must ensure that inodes cannot be added or deleted anywhere between the
+ * old cursor value and the new cursor value.  If we're advancing the cursor
+ * by one inode, the caller must hold that inode; if we're finding the next
+ * inode to scan, we must grab the AGI and hold it until we've updated the
+ * scan cursor.
+ *
+ * Callers are expected to use this code to scan all files in the filesystem to
+ * construct a new metadata index of some kind.  The scan races against other
+ * live updates, which means there must be a provision to update the new index
+ * when updates are made to inodes that have already been scanned.  The iscan
+ * lock can be used in live update hook code to stop the scan and protect this
+ * data structure.
+ *
+ * To keep the new index up to date with other metadata updates being made to
+ * the live filesystem, it is assumed that the caller will add hooks as needed
+ * to be notified when a metadata update occurs.  The inode scanner must tell
+ * the hook code when an inode has been visited with xchk_iscan_mark_visited.
+ * Hook functions can use xchk_iscan_want_live_update to decide if the
+ * scanner's observations must be updated.
+ */
+
+/*
+ * Set the bits in @irec's free mask that correspond to the inodes before
+ * @agino so that we skip them.  This is how we restart an inode walk that was
+ * interrupted in the middle of an inode record.
+ */
+STATIC void
+xchk_iscan_adjust_start(
+	xfs_agino_t			agino,	/* starting inode of chunk */
+	struct xfs_inobt_rec_incore	*irec)	/* btree record */
+{
+	int				idx;	/* index into inode chunk */
+
+	idx = agino - irec->ir_startino;
+
+	irec->ir_free |= xfs_inobt_maskn(0, idx);
+	irec->ir_freecount = hweight64(irec->ir_free);
+}
+
+/*
+ * Set *cursor to the next allocated inode after whatever it's set to now.
+ * If there are no more inodes in this AG, cursor is set to NULLAGINO.
+ */
+STATIC int
+xchk_iscan_find_next(
+	struct xfs_scrub	*sc,
+	struct xfs_buf		*agi_bp,
+	struct xfs_perag	*pag,
+	xfs_agino_t		*cursor)
+{
+	struct xfs_inobt_rec_incore	rec;
+	struct xfs_btree_cur	*cur;
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_trans	*tp = sc->tp;
+	xfs_agnumber_t		agno = pag->pag_agno;
+	xfs_agino_t		lastino = NULLAGINO;
+	xfs_agino_t		first, last;
+	xfs_agino_t		agino = *cursor;
+	int			has_rec;
+	int			error;
+
+	/* If the cursor is beyond the end of this AG, move to the next one. */
+	xfs_agino_range(mp, agno, &first, &last);
+	if (agino > last) {
+		*cursor = NULLAGINO;
+		return 0;
+	}
+
+	/*
+	 * Look up the inode chunk for the current cursor position.  If there
+	 * is no chunk here, we want the next one.
+	 */
+	cur = xfs_inobt_init_cursor(mp, tp, agi_bp, pag, XFS_BTNUM_INO);
+	error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &has_rec);
+	if (!error && !has_rec)
+		error = xfs_btree_increment(cur, 0, &has_rec);
+	for (; !error; error = xfs_btree_increment(cur, 0, &has_rec)) {
+		/*
+		 * If we've run out of inobt records in this AG, move the
+		 * cursor on to the next AG and exit.  The caller can try
+		 * again with the next AG.
+		 */
+		if (!has_rec) {
+			*cursor = NULLAGINO;
+			break;
+		}
+
+		error = xfs_inobt_get_rec(cur, &rec, &has_rec);
+		if (error)
+			break;
+		if (!has_rec) {
+			error = -EFSCORRUPTED;
+			break;
+		}
+
+		/* Make sure that we always move forward. */
+		if (lastino != NULLAGINO &&
+		    XFS_IS_CORRUPT(mp, lastino >= rec.ir_startino)) {
+			error = -EFSCORRUPTED;
+			break;
+		}
+		lastino = rec.ir_startino + XFS_INODES_PER_CHUNK - 1;
+
+		/*
+		 * If this record only covers inodes that come before the
+		 * cursor, advance to the next record.
+		 */
+		if (rec.ir_startino + XFS_INODES_PER_CHUNK <= agino)
+			continue;
+
+		/*
+		 * If the incoming lookup put us in the middle of an inobt
+		 * record, mark it and the previous inodes "free" so that the
+		 * search for allocated inodes will start at the cursor.  Use
+		 * funny math to avoid overflowing the bit shift.
+		 */
+		if (agino >= rec.ir_startino)
+			xchk_iscan_adjust_start(agino + 1, &rec);
+
+		/*
+		 * If there are allocated inodes in this chunk, find them,
+		 * and update the cursor.
+		 */
+		if (rec.ir_freecount < XFS_INODES_PER_CHUNK) {
+			int	next = xfs_lowbit64(~rec.ir_free);
+
+			*cursor = rec.ir_startino + next;
+			break;
+		}
+	}
+
+	xfs_btree_del_cursor(cur, error);
+	return error;
+}
+
+/*
+ * Prepare to return agno/agino to the iscan caller by moving the lastino
+ * cursor to the previous inode.  Do this while we still hold the AGI so that
+ * no other threads can create or delete inodes in this AG.
+ */
+static inline void
+xchk_iscan_move_cursor(
+	struct xfs_scrub	*sc,
+	struct xchk_iscan	*iscan,
+	xfs_agnumber_t		agno,
+	xfs_agino_t		agino)
+{
+	struct xfs_mount	*mp = sc->mp;
+
+	mutex_lock(&iscan->lock);
+	iscan->cursor_ino = XFS_AGINO_TO_INO(mp, agno, agino);
+	iscan->__visited_ino = iscan->cursor_ino - 1;
+	trace_xchk_iscan_move_cursor(mp, iscan);
+	mutex_unlock(&iscan->lock);
+}
+
+/*
+ * The scan has run out of inodes.  Set the scan cursor and the visited inode
+ * marker to NULLFSINO under the iscan lock so that live update hooks will
+ * apply every update from now on.
+ */
+static inline void
+xchk_iscan_finish_scan(
+	struct xfs_scrub	*sc,
+	struct xchk_iscan	*iscan)
+{
+	struct xfs_mount	*mp = sc->mp;
+
+	mutex_lock(&iscan->lock);
+	iscan->cursor_ino = NULLFSINO;
+
+	/* All live updates will be applied from now on */
+	iscan->__visited_ino = NULLFSINO;
+
+	trace_xchk_iscan_move_cursor(mp, iscan);
+	mutex_unlock(&iscan->lock);
+}
+
+/*
+ * Advance ino to the next inode that the inobt thinks is allocated, being
+ * careful to jump to the next AG if we've reached the right end of this AG's
+ * inode btree.  Advancing ino effectively means that we've pushed the inode
+ * scan forward, so set the iscan cursor to (ino - 1) so that our live update
+ * predicates will track inode allocations in that part of the inode number
+ * key space once we release the AGI buffer.
+ *
+ * Returns 1 if there's a new inode to examine, 0 if we've run out of inodes,
+ * -ECANCELED if the live scan aborted, or the usual negative errno.
+ */
+STATIC int
+xchk_iscan_advance(
+	struct xfs_scrub	*sc,
+	struct xchk_iscan	*iscan,
+	struct xfs_buf		**agi_bpp)
+{
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_buf		*agi_bp;
+	struct xfs_perag	*pag;
+	xfs_agnumber_t		agno;
+	xfs_agino_t		agino;
+	int			ret;
+
+	ASSERT(iscan->cursor_ino >= iscan->__visited_ino);
+
+	do {
+		agno = XFS_INO_TO_AGNO(mp, iscan->cursor_ino);
+		pag = xfs_perag_get(mp, agno);
+		if (!pag) {
+			xchk_iscan_finish_scan(sc, iscan);
+			return 0;
+		}
+
+		ret = xfs_ialloc_read_agi(pag, sc->tp, &agi_bp);
+		if (ret)
+			goto out_pag;
+
+		agino = XFS_INO_TO_AGINO(mp, iscan->cursor_ino);
+		ret = xchk_iscan_find_next(sc, agi_bp, pag, &agino);
+		if (ret)
+			goto out_buf;
+
+		if (agino != NULLAGINO)
+			break;
+
+		xchk_iscan_move_cursor(sc, iscan, agno + 1, 0);
+		xfs_trans_brelse(sc->tp, agi_bp);
+		xfs_perag_put(pag);
+
+		if (xchk_iscan_aborted(iscan))
+			return -ECANCELED;
+	} while (1);
+
+	xchk_iscan_move_cursor(sc, iscan, agno, agino);
+	*agi_bpp = agi_bp;
+	xfs_perag_put(pag);
+	return 1;
+
+out_buf:
+	xfs_trans_brelse(sc->tp, agi_bp);
+out_pag:
+	xfs_perag_put(pag);
+	return ret;
+}
+
+/*
+ * Grabbing the inode failed, so we need to back up the scan and ask the caller
+ * to try to _advance the scan again.  Returns -EBUSY if we've run out of retry
+ * opportunities, -ECANCELED if the process has a fatal signal pending, or
+ * -EAGAIN if we should try again.
+ */
+STATIC int
+xchk_iscan_iget_retry(
+	struct xfs_mount	*mp,
+	struct xchk_iscan	*iscan,
+	bool			wait)
+{
+	ASSERT(iscan->cursor_ino == iscan->__visited_ino + 1);
+
+	if (!iscan->iget_timeout ||
+	    time_is_before_jiffies(iscan->__iget_deadline))
+		return -EBUSY;
+
+	if (wait) {
+		unsigned long	relax;
+
+		/*
+		 * Sleep for a period of time to let the rest of the system
+		 * catch up.  If we return early, someone sent a kill signal to
+		 * the calling process.
+		 */
+		relax = msecs_to_jiffies(iscan->iget_retry_delay);
+		trace_xchk_iscan_iget_retry_wait(mp, iscan);
+
+		if (schedule_timeout_killable(relax) ||
+		    xchk_iscan_aborted(iscan))
+			return -ECANCELED;
+	}
+
+	iscan->cursor_ino--;
+	return -EAGAIN;
+}
+
+/*
+ * Grab an inode as part of an inode scan.  While scanning this inode, the
+ * caller must ensure that no other threads can modify the inode until a call
+ * to xchk_iscan_mark_visited succeeds.
+ *
+ * Returns 0 and an incore inode; -EAGAIN if the caller should call
+ * xchk_iscan_advance again; -EBUSY if we couldn't grab an inode; -ECANCELED if
+ * there's a fatal signal pending; or some other negative errno.
+ */
+STATIC int
+xchk_iscan_iget(
+	struct xfs_scrub	*sc,
+	struct xchk_iscan	*iscan,
+	struct xfs_buf		*agi_bp,
+	struct xfs_inode	**ipp)
+{
+	struct xfs_mount	*mp = sc->mp;
+	int			error;
+
+	error = xfs_iget(sc->mp, sc->tp, iscan->cursor_ino, XFS_IGET_NORETRY, 0,
+			ipp);
+	xfs_trans_brelse(sc->tp, agi_bp);
+
+	trace_xchk_iscan_iget(mp, iscan, error);
+
+	if (error == -ENOENT || error == -EAGAIN) {
+		/*
+		 * It's possible that this inode has lost all of its links but
+		 * hasn't yet been inactivated.  If we don't have a transaction
+		 * or it's not writable, flush the inodegc workers and wait.
+		 * Otherwise, we have a dirty transaction in progress and the
+		 * best we can do is to queue the inodegc workers.
+		 */
+		if (!iscan->iget_nowait)
+			xfs_inodegc_flush(mp);
+		else
+			xfs_inodegc_push(mp);
+		return xchk_iscan_iget_retry(mp, iscan, true);
+	}
+
+	if (error == -EINVAL) {
+		/*
+		 * We thought the inode was allocated, but the inode btree
+		 * lookup failed, which means that it was freed since the last
+		 * time we advanced the cursor.  Back up and try again.  This
+		 * should never happen as we still hold the AGI buffer from the
+		 * inobt check, but we need to be careful about infinite loops.
+		 */
+		return xchk_iscan_iget_retry(mp, iscan, false);
+	}
+
+	return error;
+}
+
+/*
+ * Advance the inode scan cursor to the next allocated inode and return the
+ * incore inode structure associated with it.
+ *
+ * Returns 1 if there's a new inode to examine, 0 if we've run out of inodes,
+ * -ECANCELED if the live scan aborted, -EBUSY if the incore inode could not be
+ * grabbed, or the usual negative errno.
+ *
+ * If the function returns -EBUSY and the caller can handle skipping an inode,
+ * it may call this function again to continue the scan with the next allocated
+ * inode.
+ */
+int
+xchk_iscan_iter(
+	struct xfs_scrub	*sc,
+	struct xchk_iscan	*iscan,
+	struct xfs_inode	**ipp)
+{
+	int			ret;
+
+	if (iscan->iget_timeout)
+		iscan->__iget_deadline = jiffies +
+					 msecs_to_jiffies(iscan->iget_timeout);
+
+	do {
+		struct xfs_buf	*agi_bp = NULL;
+
+		ret = xchk_iscan_advance(sc, iscan, &agi_bp);
+		if (ret != 1)
+			return ret;
+
+		if (xchk_iscan_aborted(iscan)) {
+			xfs_trans_brelse(sc->tp, agi_bp);
+			ret = -ECANCELED;
+			break;
+		}
+
+		ret = xchk_iscan_iget(sc, iscan, agi_bp, ipp);
+	} while (ret == -EAGAIN);
+
+	if (!ret)
+		return 1;
+
+	return ret;
+}
+
+
+/* Release inode scan resources. */
+void
+xchk_iscan_finish(
+	struct xchk_iscan	*iscan)
+{
+	mutex_destroy(&iscan->lock);
+	iscan->cursor_ino = NULLFSINO;
+	iscan->__visited_ino = NULLFSINO;
+}
+
+/*
+ * Set ourselves up to start an inode scan.  If the @iget_timeout and
+ * @iget_retry_delay parameters are set, the scan will try to iget each inode
+ * for @iget_timeout milliseconds.  If an iget call indicates that the inode is
+ * waiting to be inactivated, the CPU will relax for @iget_retry_delay
+ * milliseconds after pushing the inactivation workers.
+ */
+void
+xchk_iscan_start(
+	struct xchk_iscan	*iscan,
+	unsigned int		iget_timeout,
+	unsigned int		iget_retry_delay)
+{
+	clear_bit(XCHK_ISCAN_OPSTATE_ABORTED, &iscan->__opstate);
+	iscan->iget_timeout = iget_timeout;
+	iscan->iget_retry_delay = iget_retry_delay;
+	iscan->__visited_ino = 0;
+	iscan->cursor_ino = 0;
+	mutex_init(&iscan->lock);
+}
+
+/*
+ * Mark this inode as having been visited.  Callers must hold a sufficiently
+ * exclusive lock on the inode to prevent concurrent modifications.
+ */
+void
+xchk_iscan_mark_visited(
+	struct xchk_iscan	*iscan,
+	struct xfs_inode	*ip)
+{
+	mutex_lock(&iscan->lock);
+	iscan->__visited_ino = ip->i_ino;
+	trace_xchk_iscan_visit(ip->i_mount, iscan);
+	mutex_unlock(&iscan->lock);
+}
+
+/*
+ * Do we need a live update for this inode?  This is true if the scanner thread
+ * has visited this inode and the scan hasn't been aborted due to errors.
+ * Callers must hold a sufficiently exclusive lock on the inode to prevent
+ * scanners from reading any inode metadata.
+ */
+bool
+xchk_iscan_want_live_update(
+	struct xchk_iscan	*iscan,
+	xfs_ino_t		ino)
+{
+	bool			ret;
+
+	if (xchk_iscan_aborted(iscan))
+		return false;
+
+	mutex_lock(&iscan->lock);
+	ret = iscan->__visited_ino >= ino;
+	mutex_unlock(&iscan->lock);
+
+	return ret;
+}
diff --git a/fs/xfs/scrub/iscan.h b/fs/xfs/scrub/iscan.h
new file mode 100644
index 000000000000..f10b71d9cec4
--- /dev/null
+++ b/fs/xfs/scrub/iscan.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_SCRUB_ISCAN_H__
+#define __XFS_SCRUB_ISCAN_H__
+
+struct xchk_iscan {
+	/* Lock to protect the scan cursor. */
+	struct mutex		lock;
+
+	/* This is the inode that will be examined next. */
+	xfs_ino_t		cursor_ino;
+
+	/*
+	 * This is the last inode that we've successfully scanned, either
+	 * because the caller scanned it, or we moved the cursor past an empty
+	 * part of the inode address space.  Scan callers should only use the
+	 * xchk_iscan_mark_visited function to modify this.
+	 */
+	xfs_ino_t		__visited_ino;
+
+	/* Operational state of the livescan. */
+	unsigned long		__opstate;
+
+	/* Give up on iterating @cursor_ino if we can't iget it by this time. */
+	unsigned long		__iget_deadline;
+
+	/* Amount of time (in ms) that we will try to iget an inode. */
+	unsigned int		iget_timeout;
+
+	/* Wait this many ms to retry an iget. */
+	unsigned int		iget_retry_delay;
+
+	/* True if we cannot allow iget to wait indefinitely. */
+	bool			iget_nowait:1;
+};
+
+/* Set if the scan has been aborted due to some event in the fs. */
+#define XCHK_ISCAN_OPSTATE_ABORTED	(1)
+
+static inline bool
+xchk_iscan_aborted(const struct xchk_iscan *iscan)
+{
+	return test_bit(XCHK_ISCAN_OPSTATE_ABORTED, &iscan->__opstate);
+}
+
+static inline void
+xchk_iscan_abort(struct xchk_iscan *iscan)
+{
+	set_bit(XCHK_ISCAN_OPSTATE_ABORTED, &iscan->__opstate);
+}
+
+void xchk_iscan_start(struct xchk_iscan *iscan, unsigned int iget_timeout,
+		unsigned int iget_retry_delay);
+void xchk_iscan_finish(struct xchk_iscan *iscan);
+
+int xchk_iscan_iter(struct xfs_scrub *sc, struct xchk_iscan *iscan,
+		struct xfs_inode **ipp);
+
+void xchk_iscan_mark_visited(struct xchk_iscan *iscan, struct xfs_inode *ip);
+bool xchk_iscan_want_live_update(struct xchk_iscan *iscan, xfs_ino_t ino);
+
+#endif /* __XFS_SCRUB_ISCAN_H__ */
diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
index 4a0385c97ea6..83e8a64c95d4 100644
--- a/fs/xfs/scrub/trace.c
+++ b/fs/xfs/scrub/trace.c
@@ -16,6 +16,7 @@
 #include "scrub/scrub.h"
 #include "scrub/xfile.h"
 #include "scrub/xfarray.h"
+#include "scrub/iscan.h"
 
 /* Figure out which block the btree cursor was pointing to. */
 static inline xfs_fsblock_t
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 25086df0963c..d97c9a40186a 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -18,6 +18,7 @@
 
 struct xfile;
 struct xfarray;
+struct xchk_iscan;
 
 /*
  * ftrace's __print_symbolic requires that all enum values be wrapped in the
@@ -780,6 +781,78 @@ TRACE_EVENT(xfarray_create,
 		  __entry->obj_size_log)
 );
 
+DECLARE_EVENT_CLASS(xchk_iscan_class,
+	TP_PROTO(struct xfs_mount *mp, struct xchk_iscan *iscan),
+	TP_ARGS(mp, iscan),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, cursor)
+		__field(xfs_ino_t, visited)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->cursor = iscan->cursor_ino;
+		__entry->visited = iscan->__visited_ino;
+	),
+	TP_printk("dev %d:%d iscan cursor 0x%llx visited 0x%llx",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->cursor, __entry->visited)
+)
+#define DEFINE_ISCAN_EVENT(name) \
+DEFINE_EVENT(xchk_iscan_class, name, \
+	TP_PROTO(struct xfs_mount *mp, struct xchk_iscan *iscan), \
+	TP_ARGS(mp, iscan))
+DEFINE_ISCAN_EVENT(xchk_iscan_move_cursor);
+DEFINE_ISCAN_EVENT(xchk_iscan_visit);
+
+TRACE_EVENT(xchk_iscan_iget,
+	TP_PROTO(struct xfs_mount *mp, struct xchk_iscan *iscan, int error),
+	TP_ARGS(mp, iscan, error),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, cursor)
+		__field(xfs_ino_t, visited)
+		__field(int, error)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->cursor = iscan->cursor_ino;
+		__entry->visited = iscan->__visited_ino;
+		__entry->error = error;
+	),
+	TP_printk("dev %d:%d iscan cursor 0x%llx visited 0x%llx error %d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->cursor, __entry->visited, __entry->error)
+);
+
+TRACE_EVENT(xchk_iscan_iget_retry_wait,
+	TP_PROTO(struct xfs_mount *mp, struct xchk_iscan *iscan),
+	TP_ARGS(mp, iscan),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, cursor)
+		__field(xfs_ino_t, visited)
+		__field(unsigned int, retry_delay)
+		__field(unsigned long, remaining)
+		__field(unsigned int, iget_timeout)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->cursor = iscan->cursor_ino;
+		__entry->visited = iscan->__visited_ino;
+		__entry->retry_delay = iscan->iget_retry_delay;
+		__entry->remaining = jiffies_to_msecs(iscan->__iget_deadline - jiffies);
+		__entry->iget_timeout = iscan->iget_timeout;
+	),
+	TP_printk("dev %d:%d iscan cursor 0x%llx visited 0x%llx remaining %lu timeout %u delay %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->cursor,
+		  __entry->visited,
+		  __entry->remaining,
+		  __entry->iget_timeout,
+		  __entry->retry_delay)
+);
+
 /* repair tracepoints */
 #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
 
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index ddeaccc04aec..0d58d7b0d8ac 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -767,7 +767,8 @@ xfs_iget(
 	return 0;
 
 out_error_or_again:
-	if (!(flags & XFS_IGET_INCORE) && error == -EAGAIN) {
+	if (!(flags & (XFS_IGET_INCORE | XFS_IGET_NORETRY)) &&
+	    error == -EAGAIN) {
 		delay(1);
 		goto again;
 	}
diff --git a/fs/xfs/xfs_icache.h b/fs/xfs/xfs_icache.h
index 6cd180721659..87910191a9dd 100644
--- a/fs/xfs/xfs_icache.h
+++ b/fs/xfs/xfs_icache.h
@@ -34,10 +34,13 @@ struct xfs_icwalk {
 /*
  * Flags for xfs_iget()
  */
-#define XFS_IGET_CREATE		0x1
-#define XFS_IGET_UNTRUSTED	0x2
-#define XFS_IGET_DONTCACHE	0x4
-#define XFS_IGET_INCORE		0x8	/* don't read from disk or reinit */
+#define XFS_IGET_CREATE		(1U << 0)
+#define XFS_IGET_UNTRUSTED	(1U << 1)
+#define XFS_IGET_DONTCACHE	(1U << 2)
+/* don't read from disk or reinit */
+#define XFS_IGET_INCORE		(1U << 3)
+/* Return -EAGAIN immediately if the inode is unavailable. */
+#define XFS_IGET_NORETRY	(1U << 4)
 
 int xfs_iget(struct xfs_mount *mp, struct xfs_trans *tp, xfs_ino_t ino,
 	     uint flags, uint lock_flags, xfs_inode_t **ipp);


* [PATCH 13/23] xfs: allow scrub to hook metadata updates in other writers
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (11 preceding siblings ...)
  2023-02-16 20:44   ` [PATCH 12/23] xfs: port scrub inode scan from djwong-dev Darrick J. Wong
@ 2023-02-16 20:45   ` Darrick J. Wong
  2023-02-16 20:45   ` [PATCH 14/23] xfs: allow blocking notifier chains with filesystem hooks Darrick J. Wong
                     ` (9 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:45 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Certain types of filesystem metadata can only be checked by scanning
every file in the entire filesystem.  Specific examples of this include
quota counts, file link counts, and reverse mappings of file extents.
Directory and parent pointer reconstruction may also fall into this
category.  File scanning is much trickier than scanning AG metadata
because we have to take inode locks in the same order as the rest of
[VX]FS, we can't be holding buffer locks when we do that, and scanning
the whole filesystem takes time.

Earlier versions of the online repair patchset relied heavily on
fsfreeze as a means to quiesce the filesystem so that we could take
locks in the proper order without worrying about concurrent updates from
other writers.  Reviewers of those patches opined that freezing the
entire fs to check and repair something was not sufficiently better than
unmounting to run fsck offline.  I don't agree with that 100%, but the
message was clear: find a way to repair things that minimizes the
quiet period where nobody can write to the filesystem.

Generally, building btree indexes online can be split into two phases: a
collection phase where we compute the records that will be put into the
new btree; and a construction phase, where we construct the physical
btree blocks and persist them.  While it's simple to hold resource locks
for the entirety of the two phases to ensure that the new index is
consistent with the rest of the system, we don't need to hold resource
locks during the collection phase if we have a means to receive live
updates of other work going on elsewhere in the system.

The goal of this patch, then, is to enable online fsck to learn about
metadata updates going on in other threads while it constructs a shadow
copy of the metadata records to verify or correct the real metadata.  To
minimize the overhead when online fsck isn't running, we use srcu
notifiers because they prioritize fast access to the notifier call chain
(particularly when the chain is empty) at a cost to configuring
notifiers.  Online fsck should be relatively infrequent, so this is
acceptable.

The intended usage model is fairly simple.  Code that modifies a
metadata structure of interest should declare a struct xfs_hooks in
some well defined place, and call xfs_hooks_call whenever an update
happens.  Online fsck code should define a struct xfs_hook (which wraps
a struct notifier_block) and use xfs_hooks_add to attach it to the
chain, along with a function to be called.  This function should
synchronize with the fsck scanner to update whatever in-memory data the
scanner is collecting.  When finished, xfs_hooks_del removes the
notifier from the chain and waits for any running calls to complete.

On the author's computer, calling an empty srcu notifier chain was
observed to have an overhead averaging ~40ns with a maximum of 60ns.
Adding a no-op notifier function increased the average to ~58ns and the
maximum to 66ns.  When the quotacheck live update notifier is attached,
the average increases to ~322ns with a max of 372ns to update scrub's
in-memory observation data, assuming no lock contention.

With jump labels enabled, calls to empty srcu notifier chains are elided
from the call sites when there are no hooks registered, which means that
the overhead is 0.36ns when fsck is not running.  For builds without
jump label support (all major architectures do support them), the
overhead of a no-op notifier call is less bad (on a many-cpu system)
than the atomic counter ops, so we make the hook switch itself a nop.

Note: This new code is also split out as a separate patch from its
initial user so that the author can move patches around his tree with
ease.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
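The usage model from the commit message can be sketched in userspace as
follows.  This is an illustration only, under stated assumptions: all
demo_* names are hypothetical, a plain singly linked list stands in for
the srcu notifier chain, and there is no locking or grace-period
handling of any kind.

```c
/*
 * Userspace sketch of the hook usage model.  All demo_* names are
 * hypothetical; the kernel code wraps an srcu notifier chain instead of
 * this unlocked linked list.
 */
#include <assert.h>
#include <stddef.h>

struct demo_hook;
typedef int (*demo_hook_fn)(struct demo_hook *hook, unsigned long action,
			    void *data);

/* One registered callback, embedded in the listener's own structure. */
struct demo_hook {
	demo_hook_fn		fn;
	struct demo_hook	*next;
};

/* Declared by the code that modifies the metadata of interest. */
struct demo_hooks {
	struct demo_hook	*head;
};

static void demo_hooks_init(struct demo_hooks *chain)
{
	chain->head = NULL;
}

/* Make it so a function gets called whenever the hook point is hit. */
static void demo_hooks_add(struct demo_hooks *chain, struct demo_hook *hook)
{
	assert(hook->fn != NULL);
	hook->next = chain->head;
	chain->head = hook;
}

/* Remove a previously installed hook; a no-op if it is not on the chain. */
static void demo_hooks_del(struct demo_hooks *chain, struct demo_hook *hook)
{
	struct demo_hook	**pp;

	for (pp = &chain->head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == hook) {
			*pp = hook->next;
			hook->next = NULL;
			return;
		}
	}
}

/* Called by the writer on every update; returns the number of calls made. */
static int demo_hooks_call(struct demo_hooks *chain, unsigned long action,
			   void *data)
{
	struct demo_hook	*hook;
	int			calls = 0;

	for (hook = chain->head; hook != NULL; hook = hook->next) {
		hook->fn(hook, action, data);
		calls++;
	}
	return calls;
}

/* The scanner side: a shadow counter that is kept current by the hook. */
struct demo_scan {
	struct demo_hook	hook;	/* first member, so a cast recovers us */
	long			shadow_count;
};

static int demo_scan_update(struct demo_hook *hook, unsigned long action,
			    void *data)
{
	struct demo_scan	*scan = (struct demo_scan *)hook;

	(void)action;
	scan->shadow_count += *(long *)data;
	return 0;
}
```

Keeping the hook as the first member of the listener structure mirrors
the BUILD_BUG_ON in xfs_hooks_add, which lets the callback recover its
own context from the pointer it is handed.
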
 fs/xfs/Kconfig     |    6 +++++
 fs/xfs/Makefile    |    2 ++
 fs/xfs/xfs_hooks.c |   53 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_hooks.h |   68 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_linux.h |    1 +
 5 files changed, 130 insertions(+)
 create mode 100644 fs/xfs/xfs_hooks.c
 create mode 100644 fs/xfs/xfs_hooks.h


diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index 7f12b40146b3..e99821c4c337 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -92,12 +92,18 @@ config XFS_RT
 	  See the xfs man page in section 5 for additional information.
 
 	  If unsure, say N.
+ 
+config XFS_LIVE_HOOKS
+	bool
+	select JUMP_LABEL if HAVE_ARCH_JUMP_LABEL
 
 config XFS_ONLINE_SCRUB
 	bool "XFS online metadata check support"
 	default n
 	depends on XFS_FS
 	depends on TMPFS && SHMEM
+	depends on SRCU
+	select XFS_LIVE_HOOKS
 	help
 	  If you say Y here you will be able to check metadata on a
 	  mounted XFS filesystem.  This feature is intended to reduce
diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 69805b4ad79f..64a3cc396e16 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -137,6 +137,8 @@ ifeq ($(CONFIG_MEMORY_FAILURE),y)
 xfs-$(CONFIG_FS_DAX)		+= xfs_notify_failure.o
 endif
 
+xfs-$(CONFIG_XFS_LIVE_HOOKS)	+= xfs_hooks.o
+
 # online scrub/repair
 ifeq ($(CONFIG_XFS_ONLINE_SCRUB),y)
 
diff --git a/fs/xfs/xfs_hooks.c b/fs/xfs/xfs_hooks.c
new file mode 100644
index 000000000000..3f958ece0dc0
--- /dev/null
+++ b/fs/xfs/xfs_hooks.c
@@ -0,0 +1,53 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_ag.h"
+#include "xfs_trace.h"
+
+/* Initialize a notifier chain. */
+void
+xfs_hooks_init(
+	struct xfs_hooks	*chain)
+{
+	srcu_init_notifier_head(&chain->head);
+}
+
+/* Make it so a function gets called whenever we hit a certain hook point. */
+int
+xfs_hooks_add(
+	struct xfs_hooks	*chain,
+	struct xfs_hook		*hook)
+{
+	ASSERT(hook->nb.notifier_call != NULL);
+	BUILD_BUG_ON(offsetof(struct xfs_hook, nb) != 0);
+
+	return srcu_notifier_chain_register(&chain->head, &hook->nb);
+}
+
+/* Remove a previously installed hook. */
+void
+xfs_hooks_del(
+	struct xfs_hooks	*chain,
+	struct xfs_hook		*hook)
+{
+	srcu_notifier_chain_unregister(&chain->head, &hook->nb);
+	rcu_barrier();
+}
+
+/* Call a hook.  Returns the NOTIFY_* value returned by the last hook. */
+int
+xfs_hooks_call(
+	struct xfs_hooks	*chain,
+	unsigned long		val,
+	void			*priv)
+{
+	return srcu_notifier_call_chain(&chain->head, val, priv);
+}
diff --git a/fs/xfs/xfs_hooks.h b/fs/xfs/xfs_hooks.h
new file mode 100644
index 000000000000..8dd5baffb84a
--- /dev/null
+++ b/fs/xfs/xfs_hooks.h
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef XFS_HOOKS_H_
+#define XFS_HOOKS_H_
+
+#ifdef CONFIG_XFS_LIVE_HOOKS
+struct xfs_hooks {
+	struct srcu_notifier_head	head;
+};
+#else
+struct xfs_hooks { /* empty */ };
+#endif
+
+/*
+ * If hooks and jump labels are enabled, we use jump labels (aka patching of
+ * the code segment) to avoid the minute overhead of calling an empty notifier
+ * chain when we know there are no callers.  If hooks are enabled without jump
+ * labels, hardwire the predicate to true because calling an empty srcu
+ * notifier chain isn't so expensive.
+ */
+#if defined(CONFIG_JUMP_LABEL) && defined(CONFIG_XFS_LIVE_HOOKS)
+# define DEFINE_STATIC_XFS_HOOK_SWITCH(name) \
+	static DEFINE_STATIC_KEY_FALSE(name)
+# define xfs_hooks_switch_on(name)	static_branch_inc(name)
+# define xfs_hooks_switch_off(name)	static_branch_dec(name)
+# define xfs_hooks_switched_on(name)	static_branch_unlikely(name)
+#elif defined(CONFIG_XFS_LIVE_HOOKS)
+# define DEFINE_STATIC_XFS_HOOK_SWITCH(name)
+# define xfs_hooks_switch_on(name)	((void)0)
+# define xfs_hooks_switch_off(name)	((void)0)
+# define xfs_hooks_switched_on(name)	(true)
+#else
+# define DEFINE_STATIC_XFS_HOOK_SWITCH(name)
+# define xfs_hooks_switch_on(name)	((void)0)
+# define xfs_hooks_switch_off(name)	((void)0)
+# define xfs_hooks_switched_on(name)	(false)
+#endif /* JUMP_LABEL && XFS_LIVE_HOOKS */
+
+#ifdef CONFIG_XFS_LIVE_HOOKS
+struct xfs_hook {
+	/* This must come at the start of the structure. */
+	struct notifier_block		nb;
+};
+
+typedef	int (*xfs_hook_fn_t)(struct xfs_hook *hook, unsigned long action,
+		void *data);
+
+void xfs_hooks_init(struct xfs_hooks *chain);
+int xfs_hooks_add(struct xfs_hooks *chain, struct xfs_hook *hook);
+void xfs_hooks_del(struct xfs_hooks *chain, struct xfs_hook *hook);
+int xfs_hooks_call(struct xfs_hooks *chain, unsigned long action,
+		void *priv);
+
+static inline void xfs_hook_setup(struct xfs_hook *hook, notifier_fn_t fn)
+{
+	hook->nb.notifier_call = fn;
+	hook->nb.priority = 0;
+}
+
+#else
+# define xfs_hooks_init(chain)			((void)0)
+# define xfs_hooks_call(chain, val, priv)	(NOTIFY_DONE)
+#endif
+
+#endif /* XFS_HOOKS_H_ */
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index f9878021e7d0..c05f7e309c3e 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -79,6 +79,7 @@ typedef __u32			xfs_nlink_t;
 #include "xfs_cksum.h"
 #include "xfs_buf.h"
 #include "xfs_message.h"
+#include "xfs_hooks.h"
 
 #ifdef __BIG_ENDIAN
 #define XFS_NATIVE_HOST 1



* [PATCH 14/23] xfs: allow blocking notifier chains with filesystem hooks
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (12 preceding siblings ...)
  2023-02-16 20:45   ` [PATCH 13/23] xfs: allow scrub to hook metadata updates in other writers Darrick J. Wong
@ 2023-02-16 20:45   ` Darrick J. Wong
  2023-02-16 20:45   ` [PATCH 15/23] xfs: streamline the directory iteration code for scrub Darrick J. Wong
                     ` (8 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:45 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Make it so that we can switch between notifier chain implementations for
testing purposes.  On the author's test system, calling an empty srcu
notifier chain cost about 19ns per call, vs. 4ns for a blocking notifier
chain.  Hm.  Might we actually want regular blocking notifiers?

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Kconfig     |   33 ++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_hooks.c |   41 +++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_hooks.h |    6 +++++-
 3 files changed, 78 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index e99821c4c337..4798a147fd9e 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -102,7 +102,6 @@ config XFS_ONLINE_SCRUB
 	default n
 	depends on XFS_FS
 	depends on TMPFS && SHMEM
-	depends on SRCU
 	select XFS_LIVE_HOOKS
 	help
 	  If you say Y here you will be able to check metadata on a
@@ -117,6 +116,38 @@ config XFS_ONLINE_SCRUB
 
 	  If unsure, say N.
 
+choice
+	prompt "XFS hook implementation"
+	depends on XFS_FS && XFS_LIVE_HOOKS && XFS_ONLINE_SCRUB
+	default XFS_LIVE_HOOKS_BLOCKING if HAVE_ARCH_JUMP_LABEL
+	default XFS_LIVE_HOOKS_SRCU if !HAVE_ARCH_JUMP_LABEL
+	help
+	  Select the notifier chain implementation used for filesystem hooks.
+
+config XFS_LIVE_HOOKS_SRCU
+	bool "SRCU notifier chains"
+	depends on SRCU
+	help
+	  Use SRCU notifier chains for filesystem hooks.  These have very low
+	  overhead for event initiators (the main filesystem) and higher
+	  overhead for chain modifiers (scrub waits for RCU grace).  This is
+	  the best option when jump labels are not supported or there are many
+	  CPUs in the system.
+
+	  This may cause problems with CPU hotplug invoking reclaim invoking
+	  XFS.
+
+config XFS_LIVE_HOOKS_BLOCKING
+	bool "Blocking notifier chains"
+	help
+	  Use blocking notifier chains for filesystem hooks.  These have medium
+	  overhead for event initiators (the main fs) and chain modifiers
+	  (scrub) due to their use of rwsems.  This is the best option when
+	  jump labels can be used to eliminate overhead for the filesystem when
+	  scrub is not running.
+
+endchoice
+
 config XFS_ONLINE_REPAIR
 	bool "XFS online metadata repair support"
 	default n
diff --git a/fs/xfs/xfs_hooks.c b/fs/xfs/xfs_hooks.c
index 3f958ece0dc0..653fc1f82516 100644
--- a/fs/xfs/xfs_hooks.c
+++ b/fs/xfs/xfs_hooks.c
@@ -12,6 +12,7 @@
 #include "xfs_ag.h"
 #include "xfs_trace.h"
 
+#if defined(CONFIG_XFS_LIVE_HOOKS_SRCU)
 /* Initialize a notifier chain. */
 void
 xfs_hooks_init(
@@ -51,3 +52,43 @@ xfs_hooks_call(
 {
 	return srcu_notifier_call_chain(&chain->head, val, priv);
 }
+#elif defined(CONFIG_XFS_LIVE_HOOKS_BLOCKING)
+/* Initialize a notifier chain. */
+void
+xfs_hooks_init(
+	struct xfs_hooks	*chain)
+{
+	BLOCKING_INIT_NOTIFIER_HEAD(&chain->head);
+}
+
+/* Make it so a function gets called whenever we hit a certain hook point. */
+int
+xfs_hooks_add(
+	struct xfs_hooks	*chain,
+	struct xfs_hook		*hook)
+{
+	ASSERT(hook->nb.notifier_call != NULL);
+	BUILD_BUG_ON(offsetof(struct xfs_hook, nb) != 0);
+
+	return blocking_notifier_chain_register(&chain->head, &hook->nb);
+}
+
+/* Remove a previously installed hook. */
+void
+xfs_hooks_del(
+	struct xfs_hooks	*chain,
+	struct xfs_hook		*hook)
+{
+	blocking_notifier_chain_unregister(&chain->head, &hook->nb);
+}
+
+/* Call a hook.  Returns the NOTIFY_* value returned by the last hook. */
+int
+xfs_hooks_call(
+	struct xfs_hooks	*chain,
+	unsigned long		val,
+	void			*priv)
+{
+	return blocking_notifier_call_chain(&chain->head, val, priv);
+}
+#endif /* CONFIG_XFS_LIVE_HOOKS_BLOCKING */
diff --git a/fs/xfs/xfs_hooks.h b/fs/xfs/xfs_hooks.h
index 8dd5baffb84a..a63f2276f614 100644
--- a/fs/xfs/xfs_hooks.h
+++ b/fs/xfs/xfs_hooks.h
@@ -6,10 +6,14 @@
 #ifndef XFS_HOOKS_H_
 #define XFS_HOOKS_H_
 
-#ifdef CONFIG_XFS_LIVE_HOOKS
+#if defined(CONFIG_XFS_LIVE_HOOKS_SRCU)
 struct xfs_hooks {
 	struct srcu_notifier_head	head;
 };
+#elif defined(CONFIG_XFS_LIVE_HOOKS_BLOCKING)
+struct xfs_hooks {
+	struct blocking_notifier_head	head;
+};
 #else
 struct xfs_hooks { /* empty */ };
 #endif



* [PATCH 15/23] xfs: streamline the directory iteration code for scrub
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (13 preceding siblings ...)
  2023-02-16 20:45   ` [PATCH 14/23] xfs: allow blocking notifier chains with filesystem hooks Darrick J. Wong
@ 2023-02-16 20:45   ` Darrick J. Wong
  2023-02-16 20:45   ` [PATCH 16/23] xfs: track file link count updates during live nlinks fsck Darrick J. Wong
                     ` (7 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:45 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Currently, online scrub reuses the xfs_readdir code to walk every entry
in a directory.  This isn't awesome for performance, since we end up
cycling the directory ILOCK needlessly and coding around the particular
quirks of the VFS dir_context interface.

Create a streamlined version of readdir that keeps the ILOCK (since the
walk function isn't going to copy stuff to userspace), skips a whole lot
of directory walk cursor checks (since we start at 0 and walk to the
end) and has a sane way to return error codes.
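
To illustrate the calling convention, here is a small userspace sketch
of a walk that invokes a callback for each entry and stops at the first
nonzero return.  The demo_* names and the in-memory entry array are
hypothetical; the real walker reads shortform, block, and leaf
directories from disk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical in-memory directory entry. */
struct demo_dirent {
	const char		*name;
	unsigned long long	ino;
};

typedef int (*demo_dirent_fn)(const struct demo_dirent *de, void *priv);

/*
 * Call fn for every entry.  A nonzero return from fn stops the walk and is
 * passed straight back to the caller, so errors are never squashed.
 */
static int demo_dir_walk(const struct demo_dirent *ents, size_t nr,
		demo_dirent_fn fn, void *priv)
{
	size_t	i;
	int	error;

	for (i = 0; i < nr; i++) {
		error = fn(&ents[i], priv);
		if (error)
			return error;
	}
	return 0;
}

/* Context for counting entries that point at one inode. */
struct demo_count_ctx {
	unsigned long long	ino;
	unsigned int		nlink;
};

/* Count matching entries; an empty name cancels the walk. */
static int demo_count_actor(const struct demo_dirent *de, void *priv)
{
	struct demo_count_ctx	*ctx = priv;

	if (de->name[0] == '\0')
		return -ECANCELED;
	if (de->ino == ctx->ino)
		ctx->nlink++;
	return 0;
}

/* Count the entries pointing at ino, or return a negative errno. */
static int demo_count_links(const struct demo_dirent *ents, size_t nr,
		unsigned long long ino, unsigned int *nlink)
{
	struct demo_count_ctx	ctx = { .ino = ino, .nlink = 0 };
	int			error;

	error = demo_dir_walk(ents, nr, demo_count_actor, &ctx);
	if (error)
		return error;
	*nlink = ctx.nlink;
	return 0;
}
```

Because the callback returns an int, the caller can tell a cancelled
walk apart from a completed one without a separate flag in the context
structure.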

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile        |    1 
 fs/xfs/scrub/dir.c     |  173 +++++++---------------
 fs/xfs/scrub/parent.c  |   90 +++---------
 fs/xfs/scrub/readdir.c |  375 ++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/readdir.h |   19 ++
 5 files changed, 473 insertions(+), 185 deletions(-)
 create mode 100644 fs/xfs/scrub/readdir.c
 create mode 100644 fs/xfs/scrub/readdir.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 64a3cc396e16..9a6ef1f7c27b 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -161,6 +161,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   inode.o \
 				   iscan.o \
 				   parent.o \
+				   readdir.o \
 				   refcount.o \
 				   rmap.o \
 				   scrub.o \
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 2a3107cc8ccb..46080134b408 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -18,6 +18,7 @@
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/dabtree.h"
+#include "scrub/readdir.h"
 
 /* Set us up to scrub directories. */
 int
@@ -31,115 +32,88 @@ xchk_setup_directory(
 
 /* Scrub a directory entry. */
 
-struct xchk_dir_ctx {
-	/* VFS fill-directory iterator */
-	struct dir_context	dir_iter;
-
-	struct xfs_scrub	*sc;
-};
-
-/* Check that an inode's mode matches a given DT_ type. */
+/* Check that an inode's mode matches a given XFS_DIR3_FT_* type. */
 STATIC void
 xchk_dir_check_ftype(
-	struct xchk_dir_ctx	*sdc,
+	struct xfs_scrub	*sc,
 	xfs_fileoff_t		offset,
 	struct xfs_inode	*ip,
-	int			dtype)
+	int			ftype)
 {
-	struct xfs_mount	*mp = sdc->sc->mp;
-	int			ino_dtype;
+	struct xfs_mount	*mp = sc->mp;
 
 	if (!xfs_has_ftype(mp)) {
-		if (dtype != DT_UNKNOWN && dtype != DT_DIR)
-			xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
-					offset);
+		if (ftype != XFS_DIR3_FT_UNKNOWN && ftype != XFS_DIR3_FT_DIR)
+			xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
 		return;
 	}
 
-	/* Convert mode to the DT_* values that dir_emit uses. */
-	ino_dtype = xfs_dir3_get_dtype(mp,
-			xfs_mode_to_ftype(VFS_I(ip)->i_mode));
-	if (ino_dtype != dtype)
-		xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset);
+	if (xfs_mode_to_ftype(VFS_I(ip)->i_mode) != ftype)
+		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
 }
 
 /*
  * Scrub a single directory entry.
  *
- * We use the VFS directory iterator (i.e. readdir) to call this
- * function for every directory entry in a directory.  Once we're here,
- * we check the inode number to make sure it's sane, then we check that
- * we can look up this filename.  Finally, we check the ftype.
+ * Check the inode number to make sure it's sane, then we check that we can
+ * look up this filename.  Finally, we check the ftype.
  */
-STATIC bool
+STATIC int
 xchk_dir_actor(
-	struct dir_context	*dir_iter,
-	const char		*name,
-	int			namelen,
-	loff_t			pos,
-	u64			ino,
-	unsigned		type)
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp,
+	xfs_dir2_dataptr_t	dapos,
+	const struct xfs_name	*name,
+	xfs_ino_t		ino,
+	void			*priv)
 {
-	struct xfs_mount	*mp;
-	struct xfs_inode	*dp;
+	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_inode	*ip;
-	struct xchk_dir_ctx	*sdc;
-	struct xfs_name		xname;
 	xfs_ino_t		lookup_ino;
 	xfs_dablk_t		offset;
 	int			error = 0;
 
-	sdc = container_of(dir_iter, struct xchk_dir_ctx, dir_iter);
-	dp = sdc->sc->ip;
-	mp = dp->i_mount;
 	offset = xfs_dir2_db_to_da(mp->m_dir_geo,
-			xfs_dir2_dataptr_to_db(mp->m_dir_geo, pos));
+			xfs_dir2_dataptr_to_db(mp->m_dir_geo, dapos));
 
-	if (xchk_should_terminate(sdc->sc, &error))
-		return !error;
+	if (xchk_should_terminate(sc, &error))
+		return error;
 
 	/* Does this inode number make sense? */
 	if (!xfs_verify_dir_ino(mp, ino)) {
-		xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset);
-		goto out;
+		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
+		return -ECANCELED;
 	}
 
 	/* Does this name make sense? */
-	if (!xfs_dir2_namecheck(name, namelen)) {
-		xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset);
-		goto out;
+	if (!xfs_dir2_namecheck(name->name, name->len)) {
+		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
+		return -ECANCELED;
 	}
 
-	if (!strncmp(".", name, namelen)) {
+	if (!strncmp(".", name->name, name->len)) {
 		/* If this is "." then check that the inum matches the dir. */
 		if (ino != dp->i_ino)
-			xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
-					offset);
-	} else if (!strncmp("..", name, namelen)) {
+			xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
+	} else if (!strncmp("..", name->name, name->len)) {
 		/*
 		 * If this is ".." in the root inode, check that the inum
 		 * matches this dir.
 		 */
 		if (dp->i_ino == mp->m_sb.sb_rootino && ino != dp->i_ino)
-			xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK,
-					offset);
+			xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
 	}
 
 	/* Verify that we can look up this name by hash. */
-	xname.name = name;
-	xname.len = namelen;
-	xname.type = XFS_DIR3_FT_UNKNOWN;
-
-	error = xfs_dir_lookup(sdc->sc->tp, dp, &xname, &lookup_ino, NULL);
+	error = xchk_dir_lookup(sc, dp, name, &lookup_ino);
 	/* ENOENT means the hash lookup failed and the dir is corrupt */
 	if (error == -ENOENT)
 		error = -EFSCORRUPTED;
-	if (!xchk_fblock_process_error(sdc->sc, XFS_DATA_FORK, offset,
-			&error))
+	if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, offset, &error))
 		goto out;
 	if (lookup_ino != ino) {
-		xchk_fblock_set_corrupt(sdc->sc, XFS_DATA_FORK, offset);
-		goto out;
+		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
+		return -ECANCELED;
 	}
 
 	/*
@@ -151,27 +125,21 @@ xchk_dir_actor(
 	 * -EFSCORRUPTED or -EFSBADCRC then the child is corrupt which is a
 	 *  cross referencing error.  Any other error is an operational error.
 	 */
-	error = xchk_iget(sdc->sc, ino, &ip);
+	error = xchk_iget(sc, ino, &ip);
 	if (error == -EINVAL || error == -ENOENT) {
 		error = -EFSCORRUPTED;
-		xchk_fblock_process_error(sdc->sc, XFS_DATA_FORK, 0, &error);
+		xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error);
 		goto out;
 	}
-	if (!xchk_fblock_xref_process_error(sdc->sc, XFS_DATA_FORK, offset,
-			&error))
+	if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, offset, &error))
 		goto out;
 
-	xchk_dir_check_ftype(sdc, offset, ip, type);
-	xchk_irele(sdc->sc, ip);
+	xchk_dir_check_ftype(sc, offset, ip, name->type);
+	xchk_irele(sc, ip);
 out:
-	/*
-	 * A negative error code returned here is supposed to cause the
-	 * dir_emit caller (xfs_readdir) to abort the directory iteration
-	 * and return zero to xchk_directory.
-	 */
-	if (error == 0 && sdc->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
-		return false;
-	return !error;
+	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+		return -ECANCELED;
+	return error;
 }
 
 /* Scrub a directory btree record. */
@@ -782,14 +750,7 @@ int
 xchk_directory(
 	struct xfs_scrub	*sc)
 {
-	struct xchk_dir_ctx	sdc = {
-		.dir_iter.actor = xchk_dir_actor,
-		.dir_iter.pos = 0,
-		.sc = sc,
-	};
-	size_t			bufsize;
-	loff_t			oldpos;
-	int			error = 0;
+	int			error;
 
 	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
 		return -ENOENT;
@@ -797,7 +758,7 @@ xchk_directory(
 	/* Plausible size? */
 	if (sc->ip->i_disk_size < xfs_dir2_sf_hdr_size(0)) {
 		xchk_ino_set_corrupt(sc, sc->ip->i_ino);
-		goto out;
+		return 0;
 	}
 
 	/* Check directory tree structure */
@@ -806,7 +767,7 @@ xchk_directory(
 		return error;
 
 	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
-		return error;
+		return 0;
 
 	/* Check the freespace. */
 	error = xchk_directory_blocks(sc);
@@ -814,43 +775,11 @@ xchk_directory(
 		return error;
 
 	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
-		return error;
+		return 0;
 
-	/*
-	 * Check that every dirent we see can also be looked up by hash.
-	 * Userspace usually asks for a 32k buffer, so we will too.
-	 */
-	bufsize = (size_t)min_t(loff_t, XFS_READDIR_BUFSIZE,
-			sc->ip->i_disk_size);
-
-	/*
-	 * Look up every name in this directory by hash.
-	 *
-	 * Use the xfs_readdir function to call xchk_dir_actor on
-	 * every directory entry in this directory.  In _actor, we check
-	 * the name, inode number, and ftype (if applicable) of the
-	 * entry.  xfs_readdir uses the VFS filldir functions to provide
-	 * iteration context.
-	 *
-	 * The VFS grabs a read or write lock via i_rwsem before it reads
-	 * or writes to a directory.  If we've gotten this far we've
-	 * already obtained IOLOCK_EXCL, which (since 4.10) is the same as
-	 * getting a write lock on i_rwsem.  Therefore, it is safe for us
-	 * to drop the ILOCK here in order to reuse the _readdir and
-	 * _dir_lookup routines, which do their own ILOCK locking.
-	 */
-	oldpos = 0;
-	xchk_iunlock(sc, XFS_ILOCK_EXCL);
-	while (true) {
-		error = xfs_readdir(sc->tp, sc->ip, &sdc.dir_iter, bufsize);
-		if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0,
-				&error))
-			goto out;
-		if (oldpos == sdc.dir_iter.pos)
-			break;
-		oldpos = sdc.dir_iter.pos;
-	}
-
-out:
+	/* Look up every name in this directory by hash. */
+	error = xchk_dir_walk(sc, sc->ip, xchk_dir_actor, NULL);
+	if (error == -ECANCELED)
+		error = 0;
 	return error;
 }
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index 8581a21bfbfd..d59184a59671 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -16,6 +16,7 @@
 #include "xfs_dir2_priv.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
+#include "scrub/readdir.h"
 
 /* Set us up to scrub parents. */
 int
@@ -30,39 +31,37 @@ xchk_setup_parent(
 /* Look for an entry in a parent pointing to this inode. */
 
 struct xchk_parent_ctx {
-	struct dir_context	dc;
 	struct xfs_scrub	*sc;
 	xfs_ino_t		ino;
 	xfs_nlink_t		nlink;
-	bool			cancelled;
 };
 
 /* Look for a single entry in a directory pointing to an inode. */
-STATIC bool
+STATIC int
 xchk_parent_actor(
-	struct dir_context	*dc,
-	const char		*name,
-	int			namelen,
-	loff_t			pos,
-	u64			ino,
-	unsigned		type)
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp,
+	xfs_dir2_dataptr_t	dapos,
+	const struct xfs_name	*name,
+	xfs_ino_t		ino,
+	void			*priv)
 {
-	struct xchk_parent_ctx	*spc;
+	struct xchk_parent_ctx	*spc = priv;
 	int			error = 0;
 
-	spc = container_of(dc, struct xchk_parent_ctx, dc);
+	/* Does this name make sense? */
+	if (!xfs_dir2_namecheck(name->name, name->len))
+		error = -EFSCORRUPTED;
+	if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0, &error))
+		return error;
+
 	if (spc->ino == ino)
 		spc->nlink++;
 
-	/*
-	 * If we're facing a fatal signal, bail out.  Store the cancellation
-	 * status separately because the VFS readdir code squashes error codes
-	 * into short directory reads.
-	 */
 	if (xchk_should_terminate(spc->sc, &error))
-		spc->cancelled = true;
+		return error;
 
-	return !error;
+	return 0;
 }
 
 /* Count the number of dentries in the parent dir that point to this inode. */
@@ -70,53 +69,14 @@ STATIC int
 xchk_parent_count_parent_dentries(
 	struct xfs_scrub	*sc,
 	struct xfs_inode	*parent,
-	xfs_nlink_t		*nlink)
+	struct xchk_parent_ctx	*spc)
 {
-	struct xchk_parent_ctx	spc = {
-		.dc.actor	= xchk_parent_actor,
-		.ino		= sc->ip->i_ino,
-		.sc		= sc,
-	};
-	size_t			bufsize;
-	loff_t			oldpos;
 	uint			lock_mode;
-	int			error = 0;
+	int			error;
 
-	/*
-	 * If there are any blocks, read-ahead block 0 as we're almost
-	 * certain to have the next operation be a read there.  This is
-	 * how we guarantee that the parent's extent map has been loaded,
-	 * if there is one.
-	 */
 	lock_mode = xfs_ilock_data_map_shared(parent);
-	if (parent->i_df.if_nextents > 0)
-		error = xfs_dir3_data_readahead(parent, 0, 0);
+	error = xchk_dir_walk(sc, parent, xchk_parent_actor, spc);
 	xfs_iunlock(parent, lock_mode);
-	if (error)
-		return error;
-
-	/*
-	 * Iterate the parent dir to confirm that there is
-	 * exactly one entry pointing back to the inode being
-	 * scanned.
-	 */
-	bufsize = (size_t)min_t(loff_t, XFS_READDIR_BUFSIZE,
-			parent->i_disk_size);
-	oldpos = 0;
-	while (true) {
-		error = xfs_readdir(sc->tp, parent, &spc.dc, bufsize);
-		if (error)
-			goto out;
-		if (spc.cancelled) {
-			error = -EAGAIN;
-			goto out;
-		}
-		if (oldpos == spc.dc.pos)
-			break;
-		oldpos = spc.dc.pos;
-	}
-	*nlink = spc.nlink;
-out:
 	return error;
 }
 
@@ -169,9 +129,13 @@ xchk_parent_validate(
 	struct xfs_scrub	*sc,
 	xfs_ino_t		parent_ino)
 {
+	struct xchk_parent_ctx	spc = {
+		.sc		= sc,
+		.ino		= sc->ip->i_ino,
+		.nlink		= 0,
+	};
 	struct xfs_inode	*dp = NULL;
 	xfs_nlink_t		expected_nlink;
-	xfs_nlink_t		nlink;
 	int			error = 0;
 
 	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
@@ -249,7 +213,7 @@ xchk_parent_validate(
 	}
 
 	/* Look for a directory entry in the parent pointing to the child. */
-	error = xchk_parent_count_parent_dentries(sc, dp, &nlink);
+	error = xchk_parent_count_parent_dentries(sc, dp, &spc);
 	if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0, &error))
 		goto out_unlock;
 
@@ -257,7 +221,7 @@ xchk_parent_validate(
 	 * Ensure that the parent has as many links to the child as the child
 	 * thinks it has to the parent.
 	 */
-	if (nlink != expected_nlink)
+	if (spc.nlink != expected_nlink)
 		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
 
 out_unlock:
diff --git a/fs/xfs/scrub/readdir.c b/fs/xfs/scrub/readdir.c
new file mode 100644
index 000000000000..7d1695e98cc6
--- /dev/null
+++ b/fs/xfs/scrub/readdir.c
@@ -0,0 +1,375 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_trace.h"
+#include "xfs_bmap.h"
+#include "xfs_trans.h"
+#include "xfs_error.h"
+#include "scrub/scrub.h"
+#include "scrub/readdir.h"
+
+/* Call a function for every entry in a shortform directory. */
+STATIC int
+xchk_dir_walk_sf(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp,
+	xchk_dirent_fn		dirent_fn,
+	void			*priv)
+{
+	struct xfs_name		name = {
+		.name		= ".",
+		.len		= 1,
+		.type		= XFS_DIR3_FT_DIR,
+	};
+	struct xfs_mount	*mp = dp->i_mount;
+	struct xfs_da_geometry	*geo = mp->m_dir_geo;
+	struct xfs_dir2_sf_entry *sfep;
+	struct xfs_dir2_sf_hdr	*sfp;
+	xfs_ino_t		ino;
+	xfs_dir2_dataptr_t	dapos;
+	unsigned int		i;
+	int			error;
+
+	ASSERT(dp->i_df.if_bytes == dp->i_disk_size);
+	ASSERT(dp->i_df.if_u1.if_data != NULL);
+
+	sfp = (struct xfs_dir2_sf_hdr *)dp->i_df.if_u1.if_data;
+
+	/* dot entry */
+	dapos = xfs_dir2_db_off_to_dataptr(geo, geo->datablk,
+			geo->data_entry_offset);
+
+	error = dirent_fn(sc, dp, dapos, &name, dp->i_ino, priv);
+	if (error)
+		return error;
+
+	/* dotdot entry */
+	dapos = xfs_dir2_db_off_to_dataptr(geo, geo->datablk,
+			geo->data_entry_offset +
+			xfs_dir2_data_entsize(mp, sizeof(".") - 1));
+	ino = xfs_dir2_sf_get_parent_ino(sfp);
+	name.name = "..";
+	name.len = 2;
+
+	error = dirent_fn(sc, dp, dapos, &name, ino, priv);
+	if (error)
+		return error;
+
+	/* iterate everything else */
+	sfep = xfs_dir2_sf_firstentry(sfp);
+	for (i = 0; i < sfp->count; i++) {
+		dapos = xfs_dir2_db_off_to_dataptr(geo, geo->datablk,
+				xfs_dir2_sf_get_offset(sfep));
+		ino = xfs_dir2_sf_get_ino(mp, sfp, sfep);
+		name.name = sfep->name;
+		name.len = sfep->namelen;
+		name.type = xfs_dir2_sf_get_ftype(mp, sfep);
+
+		error = dirent_fn(sc, dp, dapos, &name, ino, priv);
+		if (error)
+			return error;
+
+		sfep = xfs_dir2_sf_nextentry(mp, sfp, sfep);
+	}
+
+	return 0;
+}
+
+/* Call a function for every entry in a block directory. */
+STATIC int
+xchk_dir_walk_block(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp,
+	xchk_dirent_fn		dirent_fn,
+	void			*priv)
+{
+	struct xfs_mount	*mp = dp->i_mount;
+	struct xfs_da_geometry	*geo = mp->m_dir_geo;
+	struct xfs_buf		*bp;
+	unsigned int		off, next_off, end;
+	int			error;
+
+	error = xfs_dir3_block_read(sc->tp, dp, &bp);
+	if (error)
+		return error;
+
+	/* Walk each directory entry. */
+	end = xfs_dir3_data_end_offset(geo, bp->b_addr);
+	for (off = geo->data_entry_offset; off < end; off = next_off) {
+		struct xfs_name			name = { };
+		struct xfs_dir2_data_unused	*dup = bp->b_addr + off;
+		struct xfs_dir2_data_entry	*dep = bp->b_addr + off;
+		xfs_ino_t			ino;
+		xfs_dir2_dataptr_t		dapos;
+
+		/* Skip an empty entry. */
+		if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) {
+			next_off = off + be16_to_cpu(dup->length);
+			continue;
+		}
+
+		/* Otherwise, find the next entry and report it. */
+		next_off = off + xfs_dir2_data_entsize(mp, dep->namelen);
+		if (next_off > end)
+			break;
+
+		dapos = xfs_dir2_db_off_to_dataptr(geo, geo->datablk, off);
+		ino = be64_to_cpu(dep->inumber);
+		name.name = dep->name;
+		name.len = dep->namelen;
+		name.type = xfs_dir2_data_get_ftype(mp, dep);
+
+		error = dirent_fn(sc, dp, dapos, &name, ino, priv);
+		if (error)
+			break;
+	}
+
+	xfs_trans_brelse(sc->tp, bp);
+	return error;
+}
+
+/* Read a leaf-format directory buffer. */
+STATIC int
+xchk_read_leaf_dir_buf(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	struct xfs_da_geometry	*geo,
+	xfs_dir2_off_t		*curoff,
+	struct xfs_buf		**bpp)
+{
+	struct xfs_iext_cursor	icur;
+	struct xfs_bmbt_irec	map;
+	struct xfs_ifork	*ifp = xfs_ifork_ptr(dp, XFS_DATA_FORK);
+	xfs_dablk_t		last_da;
+	xfs_dablk_t		map_off;
+	xfs_dir2_off_t		new_off;
+
+	*bpp = NULL;
+
+	/*
+	 * Look for mapped directory blocks at or above the current offset.
+	 * Truncate down to the nearest directory block to start the scanning
+	 * operation.
+	 */
+	last_da = xfs_dir2_byte_to_da(geo, XFS_DIR2_LEAF_OFFSET);
+	map_off = xfs_dir2_db_to_da(geo, xfs_dir2_byte_to_db(geo, *curoff));
+
+	if (!xfs_iext_lookup_extent(dp, ifp, map_off, &icur, &map))
+		return 0;
+	if (map.br_startoff >= last_da)
+		return 0;
+	xfs_trim_extent(&map, map_off, last_da - map_off);
+
+	/* Read the directory block of that first mapping. */
+	new_off = xfs_dir2_da_to_byte(geo, map.br_startoff);
+	if (new_off > *curoff)
+		*curoff = new_off;
+
+	return xfs_dir3_data_read(tp, dp, map.br_startoff, 0, bpp);
+}
+
+/* Call a function for every entry in a leaf directory. */
+STATIC int
+xchk_dir_walk_leaf(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp,
+	xchk_dirent_fn		dirent_fn,
+	void			*priv)
+{
+	struct xfs_mount	*mp = dp->i_mount;
+	struct xfs_da_geometry	*geo = mp->m_dir_geo;
+	struct xfs_buf		*bp = NULL;
+	xfs_dir2_off_t		curoff = 0;
+	unsigned int		offset = 0;
+	int			error;
+
+	/* Iterate every directory offset in this directory. */
+	while (curoff < XFS_DIR2_LEAF_OFFSET) {
+		struct xfs_name			name = { };
+		struct xfs_dir2_data_unused	*dup;
+		struct xfs_dir2_data_entry	*dep;
+		xfs_ino_t			ino;
+		unsigned int			length;
+		xfs_dir2_dataptr_t		dapos;
+
+		/*
+		 * If we have no buffer, or we're off the end of the
+		 * current buffer, need to get another one.
+		 */
+		if (!bp || offset >= geo->blksize) {
+			if (bp) {
+				xfs_trans_brelse(sc->tp, bp);
+				bp = NULL;
+			}
+
+			error = xchk_read_leaf_dir_buf(sc->tp, dp, geo, &curoff,
+					&bp);
+			if (error || !bp)
+				break;
+
+			/*
+			 * Find our position in the block.
+			 */
+			offset = geo->data_entry_offset;
+			curoff += geo->data_entry_offset;
+		}
+
+		/* Skip an empty entry. */
+		dup = bp->b_addr + offset;
+		if (be16_to_cpu(dup->freetag) == XFS_DIR2_DATA_FREE_TAG) {
+			length = be16_to_cpu(dup->length);
+			offset += length;
+			curoff += length;
+			continue;
+		}
+
+		/* Otherwise, find the next entry and report it. */
+		dep = bp->b_addr + offset;
+		length = xfs_dir2_data_entsize(mp, dep->namelen);
+
+		dapos = xfs_dir2_byte_to_dataptr(curoff) & 0x7fffffff;
+		ino = be64_to_cpu(dep->inumber);
+		name.name = dep->name;
+		name.len = dep->namelen;
+		name.type = xfs_dir2_data_get_ftype(mp, dep);
+
+		error = dirent_fn(sc, dp, dapos, &name, ino, priv);
+		if (error)
+			break;
+
+		/* Advance to the next entry. */
+		offset += length;
+		curoff += length;
+	}
+
+	if (bp)
+		xfs_trans_brelse(sc->tp, bp);
+	return error;
+}
+
+/*
+ * Call a function for every entry in a directory.
+ *
+ * Callers must hold the ILOCK.  File types are XFS_DIR3_FT_*.
+ */
+int
+xchk_dir_walk(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp,
+	xchk_dirent_fn		dirent_fn,
+	void			*priv)
+{
+	struct xfs_da_args	args = {
+		.dp		= dp,
+		.geo		= dp->i_mount->m_dir_geo,
+		.trans		= sc->tp,
+	};
+	bool			isblock;
+	int			error;
+
+	if (xfs_is_shutdown(dp->i_mount))
+		return -EIO;
+
+	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
+	ASSERT(xfs_isilocked(dp, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
+
+	if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL)
+		return xchk_dir_walk_sf(sc, dp, dirent_fn, priv);
+
+	/* dir2 functions require that the data fork is loaded */
+	error = xfs_iread_extents(sc->tp, dp, XFS_DATA_FORK);
+	if (error)
+		return error;
+
+	error = xfs_dir2_isblock(&args, &isblock);
+	if (error)
+		return error;
+
+	if (isblock)
+		return xchk_dir_walk_block(sc, dp, dirent_fn, priv);
+
+	return xchk_dir_walk_leaf(sc, dp, dirent_fn, priv);
+}
+
+/*
+ * Look up the inode number for an exact name in a directory.
+ *
+ * Callers must hold the ILOCK.  File types are XFS_DIR3_FT_*.  Names are not
+ * checked for correctness.
+ */
+int
+xchk_dir_lookup(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp,
+	const struct xfs_name	*name,
+	xfs_ino_t		*ino)
+{
+	struct xfs_da_args	args = {
+		.dp		= dp,
+		.geo		= dp->i_mount->m_dir_geo,
+		.trans		= sc->tp,
+		.name		= name->name,
+		.namelen	= name->len,
+		.filetype	= name->type,
+		.hashval	= xfs_dir2_hashname(dp->i_mount, name),
+		.whichfork	= XFS_DATA_FORK,
+		.op_flags	= XFS_DA_OP_OKNOENT,
+	};
+	bool			isblock, isleaf;
+	int			error;
+
+	if (xfs_is_shutdown(dp->i_mount))
+		return -EIO;
+
+	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
+	ASSERT(xfs_isilocked(dp, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
+
+	if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL) {
+		error = xfs_dir2_sf_lookup(&args);
+		goto out_check_rval;
+	}
+
+	/* dir2 functions require that the data fork is loaded */
+	error = xfs_iread_extents(sc->tp, dp, XFS_DATA_FORK);
+	if (error)
+		return error;
+
+	error = xfs_dir2_isblock(&args, &isblock);
+	if (error)
+		return error;
+
+	if (isblock) {
+		error = xfs_dir2_block_lookup(&args);
+		goto out_check_rval;
+	}
+
+	error = xfs_dir2_isleaf(&args, &isleaf);
+	if (error)
+		return error;
+
+	if (isleaf) {
+		error = xfs_dir2_leaf_lookup(&args);
+		goto out_check_rval;
+	}
+
+	error = xfs_dir2_node_lookup(&args);
+
+out_check_rval:
+	if (error == -EEXIST)
+		error = 0;
+	if (!error)
+		*ino = args.inumber;
+	return error;
+}
diff --git a/fs/xfs/scrub/readdir.h b/fs/xfs/scrub/readdir.h
new file mode 100644
index 000000000000..7272f3bd28b4
--- /dev/null
+++ b/fs/xfs/scrub/readdir.h
@@ -0,0 +1,19 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_SCRUB_READDIR_H__
+#define __XFS_SCRUB_READDIR_H__
+
+typedef int (*xchk_dirent_fn)(struct xfs_scrub *sc, struct xfs_inode *dp,
+		xfs_dir2_dataptr_t dapos, const struct xfs_name *name,
+		xfs_ino_t ino, void *priv);
+
+int xchk_dir_walk(struct xfs_scrub *sc, struct xfs_inode *dp,
+		xchk_dirent_fn dirent_fn, void *priv);
+
+int xchk_dir_lookup(struct xfs_scrub *sc, struct xfs_inode *dp,
+		const struct xfs_name *name, xfs_ino_t *ino);
+
+#endif /* __XFS_SCRUB_READDIR_H__ */
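
For readers following along, here is a minimal userspace sketch of the
walk-and-callback pattern used by xchk_dir_walk_leaf above: step through a
buffer, skip records carrying a free tag, and hand every live record to a
caller-supplied function until it returns nonzero.  Everything named demo_*
is hypothetical and exists only for illustration; in particular, struct
demo_rec is NOT the XFS on-disk directory format, and the fixed 8-byte name
is an assumption made to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_FREE_TAG	0xffffu	/* stand-in for XFS_DIR2_DATA_FREE_TAG */

/* Hypothetical record layout; not the XFS on-disk format. */
struct demo_rec {
	uint16_t	freetag;	/* DEMO_FREE_TAG marks unused space */
	uint16_t	length;		/* bytes covered, including this header */
	uint32_t	ino;		/* meaningless for free records */
	char		name[8];	/* NUL-padded, at most 8 bytes */
};

/* Same shape as xchk_dirent_fn, reduced to what the demo needs. */
typedef int (*demo_dirent_fn)(size_t pos, const char *name, uint32_t ino,
		void *priv);

/* Call dirent_fn for every live record; stop at the first error. */
int
demo_walk(
	const void	*buf,
	size_t		size,
	demo_dirent_fn	dirent_fn,
	void		*priv)
{
	size_t		offset = 0;
	int		error = 0;

	while (offset + sizeof(struct demo_rec) <= size) {
		struct demo_rec	rec;

		memcpy(&rec, (const char *)buf + offset, sizeof(rec));

		/* A record shorter than its header would never advance. */
		if (rec.length < sizeof(rec))
			return -1;

		/* Skip free space, report everything else. */
		if (rec.freetag != DEMO_FREE_TAG) {
			error = dirent_fn(offset, rec.name, rec.ino, priv);
			if (error)
				break;
		}

		/* Advance to the next record. */
		offset += rec.length;
	}
	return error;
}

struct demo_count {
	unsigned int	nr;
	uint32_t	ino_sum;
};

/* Example callback: count live entries and sum their inode numbers. */
int
demo_count_fn(size_t pos, const char *name, uint32_t ino, void *priv)
{
	struct demo_count	*c = priv;

	(void)pos;
	(void)name;
	c->nr++;
	c->ino_sum += ino;
	return 0;
}

/* Two live records separated by a free region that spans two slots. */
int
demo_example(struct demo_count *c)
{
	struct demo_rec	recs[4] = {
		{ 0, sizeof(struct demo_rec), 128, "a" },
		{ DEMO_FREE_TAG, 2 * sizeof(struct demo_rec), 0, "" },
		{ 0, sizeof(struct demo_rec), 999, "stale" },	/* inside free space */
		{ 0, sizeof(struct demo_rec), 131, "b" },
	};

	return demo_walk(recs, sizeof(recs), demo_count_fn, c);
}
```

Calling demo_example() with a zeroed struct demo_count visits only the two
live records; the "stale" slot is never reported because the free record
ahead of it covers both slots.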



* [PATCH 16/23] xfs: track file link count updates during live nlinks fsck
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (14 preceding siblings ...)
  2023-02-16 20:45   ` [PATCH 15/23] xfs: streamline the directory iteration code for scrub Darrick J. Wong
@ 2023-02-16 20:45   ` Darrick J. Wong
  2023-02-16 20:46   ` [PATCH 17/23] xfs: connect in-memory btrees to xfiles Darrick J. Wong
                     ` (6 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:45 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create the necessary hooks in the file create/unlink/rename code so that
our live nlink scrub code can stay up to date with the rest of the
filesystem.  This will be the means to keep our shadow link count
information up to date while the scan runs in real time.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_dir2.c |    6 +
 fs/xfs/libxfs/xfs_dir2.h |    1 
 fs/xfs/scrub/common.c    |   20 ++++
 fs/xfs/scrub/common.h    |    2 
 fs/xfs/scrub/scrub.c     |   17 +++
 fs/xfs/scrub/scrub.h     |    3 +
 fs/xfs/scrub/trace.h     |   42 +++++++++
 fs/xfs/xfs_inode.c       |  226 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_inode.h       |   35 +++++++
 fs/xfs/xfs_mount.h       |    2 
 fs/xfs/xfs_super.c       |    2 
 fs/xfs/xfs_symlink.c     |    1 
 12 files changed, 357 insertions(+)


diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index c1a9394d7478..27e408d20d18 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -25,6 +25,12 @@ const struct xfs_name xfs_name_dotdot = {
 	.type	= XFS_DIR3_FT_DIR,
 };
 
+const struct xfs_name xfs_name_dot = {
+	.name	= (const unsigned char *)".",
+	.len	= 1,
+	.type	= XFS_DIR3_FT_DIR,
+};
+
 /*
  * Convert inode mode to directory entry filetype
  */
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index ff59f009d1fd..ac360c0b2fe7 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -22,6 +22,7 @@ struct xfs_dir3_icfree_hdr;
 struct xfs_dir3_icleaf_hdr;
 
 extern const struct xfs_name	xfs_name_dotdot;
+extern const struct xfs_name	xfs_name_dot;
 
 /*
  * Convert inode mode to directory entry filetype
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index dc78e28a9447..a4cfe5653880 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -961,3 +961,23 @@ xchk_start_reaping(
 	}
 	sc->flags &= ~XCHK_REAPING_DISABLED;
 }
+
+/*
+ * Enable filesystem hooks (i.e. runtime code patching) before starting a scrub
+ * operation.  Callers must not hold any locks that intersect with the CPU
+ * hotplug lock (e.g. writeback locks) because code patching must halt the CPUs
+ * to change kernel code.
+ */
+void
+xchk_fshooks_enable(
+	struct xfs_scrub	*sc,
+	unsigned int		scrub_fshooks)
+{
+	ASSERT(!(scrub_fshooks & ~XCHK_FSHOOKS_ALL));
+	ASSERT(!(sc->flags & scrub_fshooks));
+
+	if (scrub_fshooks & XCHK_FSHOOKS_DIRENTS)
+		xfs_dirent_hook_enable();
+
+	sc->flags |= scrub_fshooks;
+}
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 5286c263ff60..423a98c39fb6 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -157,4 +157,6 @@ int xchk_metadata_inode_forks(struct xfs_scrub *sc);
 void xchk_stop_reaping(struct xfs_scrub *sc);
 void xchk_start_reaping(struct xfs_scrub *sc);
 
+void xchk_fshooks_enable(struct xfs_scrub *sc, unsigned int scrub_fshooks);
+
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 6aedce9b67fc..871a72e22a8a 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -145,6 +145,21 @@ xchk_probe(
 
 /* Scrub setup and teardown */
 
+static inline void
+xchk_fshooks_disable(
+	struct xfs_scrub	*sc)
+{
+	if (!(sc->flags & XCHK_FSHOOKS_ALL))
+		return;
+
+	//trace_xchk_fshooks_disable(sc, sc->flags & XCHK_FSHOOKS_ALL);
+
+	if (sc->flags & XCHK_FSHOOKS_DIRENTS)
+		xfs_dirent_hook_disable();
+
+	sc->flags &= ~XCHK_FSHOOKS_ALL;
+}
+
 /* Free all the resources and finish the transactions. */
 STATIC int
 xchk_teardown(
@@ -177,6 +192,8 @@ xchk_teardown(
 		kvfree(sc->buf);
 		sc->buf = NULL;
 	}
+
+	xchk_fshooks_disable(sc);
 	return error;
 }
 
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index b4d391b4c938..484e5fb7fe7a 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -97,8 +97,11 @@ struct xfs_scrub {
 /* XCHK state flags grow up from zero, XREP state flags grown down from 2^31 */
 #define XCHK_TRY_HARDER		(1 << 0)  /* can't get resources, try again */
 #define XCHK_REAPING_DISABLED	(1 << 2)  /* background block reaping paused */
+#define XCHK_FSHOOKS_DIRENTS	(1 << 5)  /* link count live update enabled */
 #define XREP_ALREADY_FIXED	(1 << 31) /* checking our repair work */
 
+#define XCHK_FSHOOKS_ALL	(XCHK_FSHOOKS_DIRENTS)
+
 /* Metadata scrubbers */
 int xchk_tester(struct xfs_scrub *sc);
 int xchk_superblock(struct xfs_scrub *sc);
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index d97c9a40186a..979ee2789668 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -853,6 +853,48 @@ TRACE_EVENT(xchk_iscan_iget_retry_wait,
 		  __entry->retry_delay)
 );
 
+TRACE_DEFINE_ENUM(XFS_DIRENT_CHILD_DELTA);
+TRACE_DEFINE_ENUM(XFS_DIRENT_BACKREF_DELTA);
+TRACE_DEFINE_ENUM(XFS_DIRENT_SELF_DELTA);
+
+#define XFS_NLINK_DELTA_STRINGS \
+	{ XFS_DIRENT_CHILD_DELTA,	"->" }, \
+	{ XFS_DIRENT_BACKREF_DELTA,	"<-" }, \
+	{ XFS_DIRENT_SELF_DELTA,		"<>" }
+
+TRACE_EVENT(xchk_nlinks_live_update,
+	TP_PROTO(struct xfs_mount *mp, const struct xfs_inode *dp,
+		 int action, xfs_ino_t ino, int delta,
+		 const char *name, unsigned int namelen),
+	TP_ARGS(mp, dp, action, ino, delta, name, namelen),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, dir)
+		__field(int, action)
+		__field(xfs_ino_t, ino)
+		__field(int, delta)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, namelen)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->dir = dp ? dp->i_ino : NULLFSINO;
+		__entry->action = action;
+		__entry->ino = ino;
+		__entry->delta = delta;
+		__entry->namelen = namelen;
+		memcpy(__get_str(name), name, namelen);
+	),
+	TP_printk("dev %d:%d dir 0x%llx %s ino 0x%llx nlink_delta %d name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->dir,
+		  __print_symbolic(__entry->action, XFS_NLINK_DELTA_STRINGS),
+		  __entry->ino,
+		  __entry->delta,
+		  __entry->namelen,
+		  __get_str(name))
+);
+
 /* repair tracepoints */
 #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 6626aa7486f1..b17e4ba3622b 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -970,6 +970,117 @@ xfs_mkdir_space_res(
 	return xfs_create_space_res(mp, namelen);
 }
 
+#ifdef CONFIG_XFS_LIVE_HOOKS
+/*
+ * Use a static key here to reduce the overhead of directory live update hooks.
+ * If the compiler supports jump labels, the static branch will be replaced by
+ * a nop sled when there are no hook users.  Online fsck is currently the only
+ * caller, so this is a reasonable tradeoff.
+ *
+ * Note: Patching the kernel code requires taking the CPU hotplug lock.  Other
+ * parts of the kernel allocate memory with that lock held, which means that
+ * XFS callers cannot hold any locks that might be used by memory reclaim or
+ * writeback when calling the static_branch_{inc,dec} functions.
+ */
+DEFINE_STATIC_XFS_HOOK_SWITCH(xfs_dirents_hooks_switch);
+
+void
+xfs_dirent_hook_disable(void)
+{
+	xfs_hooks_switch_off(&xfs_dirents_hooks_switch);
+}
+
+void
+xfs_dirent_hook_enable(void)
+{
+	xfs_hooks_switch_on(&xfs_dirents_hooks_switch);
+}
+
+/* Call hooks for a directory update relating to a dot dirent update. */
+static inline void
+xfs_dirent_self_delta(
+	struct xfs_inode		*dp,
+	int				delta)
+{
+	if (xfs_hooks_switched_on(&xfs_dirents_hooks_switch)) {
+		struct xfs_dirent_update_params	p = {
+			.dp		= dp,
+			.ip		= dp,
+			.delta		= delta,
+			.name		= &xfs_name_dot,
+		};
+		struct xfs_mount	*mp = dp->i_mount;
+
+		xfs_hooks_call(&mp->m_dirent_update_hooks,
+				XFS_DIRENT_SELF_DELTA, &p);
+	}
+}
+
+/* Call hooks for a directory update relating to a dotdot dirent update. */
+static inline void
+xfs_dirent_backref_delta(
+	struct xfs_inode		*dp,
+	struct xfs_inode		*ip,
+	int				delta)
+{
+	if (xfs_hooks_switched_on(&xfs_dirents_hooks_switch)) {
+		struct xfs_dirent_update_params	p = {
+			.dp		= dp,
+			.ip		= ip,
+			.delta		= delta,
+			.name		= &xfs_name_dotdot,
+		};
+		struct xfs_mount	*mp = ip->i_mount;
+
+		xfs_hooks_call(&mp->m_dirent_update_hooks,
+				XFS_DIRENT_BACKREF_DELTA, &p);
+	}
+}
+
+/* Call hooks for a directory update relating to a dirent update. */
+void
+xfs_dirent_child_delta(
+	struct xfs_inode		*dp,
+	struct xfs_inode		*ip,
+	int				delta,
+	struct xfs_name			*name)
+{
+	if (xfs_hooks_switched_on(&xfs_dirents_hooks_switch)) {
+		struct xfs_dirent_update_params	p = {
+			.dp		= dp,
+			.ip		= ip,
+			.delta		= delta,
+			.name		= name,
+		};
+		struct xfs_mount	*mp = ip->i_mount;
+
+		xfs_hooks_call(&mp->m_dirent_update_hooks,
+				XFS_DIRENT_CHILD_DELTA, &p);
+	}
+}
+
+/* Call the specified function during a directory update. */
+int
+xfs_dirent_hook_add(
+	struct xfs_mount	*mp,
+	struct xfs_dirent_hook	*hook)
+{
+	return xfs_hooks_add(&mp->m_dirent_update_hooks, &hook->delta_hook);
+}
+
+/* Stop calling the specified function during a directory update. */
+void
+xfs_dirent_hook_del(
+	struct xfs_mount	*mp,
+	struct xfs_dirent_hook	*hook)
+{
+	xfs_hooks_del(&mp->m_dirent_update_hooks, &hook->delta_hook);
+}
+#else
+# define xfs_dirent_self_delta(dp, delta)		((void)0)
+# define xfs_dirent_backref_delta(dp, ip, delta)	((void)0)
+#endif /* CONFIG_XFS_LIVE_HOOKS */
+
 int
 xfs_create(
 	struct user_namespace	*mnt_userns,
@@ -1096,6 +1207,16 @@ xfs_create(
 			goto out_trans_cancel;
 	}
 
+	/*
+	 * Create ip with a reference from dp, and add '.' and '..' references
+	 * if it's a directory.
+	 */
+	xfs_dirent_child_delta(dp, ip, 1, name);
+	if (is_dir) {
+		xfs_dirent_self_delta(ip, 1);
+		xfs_dirent_backref_delta(dp, ip, 1);
+	}
+
 	/*
 	 * If this is a synchronous mount, make sure that the
 	 * create transaction goes to disk before returning to
@@ -1361,6 +1482,8 @@ xfs_link(
 			goto error_return;
 	}
 
+	xfs_dirent_child_delta(tdp, sip, 1, target_name);
+
 	/*
 	 * If this is a synchronous mount, make sure that the
 	 * link transaction goes to disk before returning to
@@ -2631,6 +2754,16 @@ xfs_remove(
 			goto out_trans_cancel;
 	}
 
+	/*
+	 * Drop the link from dp to ip, and if ip was a directory, remove the
+	 * '.' and '..' references since we freed the directory.
+	 */
+	xfs_dirent_child_delta(dp, ip, -1, name);
+	if (S_ISDIR(VFS_I(ip)->i_mode)) {
+		xfs_dirent_backref_delta(dp, ip, -1);
+		xfs_dirent_self_delta(ip, -1);
+	}
+
 	/*
 	 * If this is a synchronous mount, make sure that the
 	 * remove transaction goes to disk before returning to
@@ -2728,6 +2861,92 @@ xfs_sort_for_rename(
 	}
 }
 
+#ifdef CONFIG_XFS_LIVE_HOOKS
+/*
+ * Directory entry live update hooks are called with ILOCK_EXCL held on all
+ * inodes after we've committed to making all the directory updates.  Hence we
+ * do not have to call the hooks in *exactly* the same order as the rename and
+ * exchange code makes the actual updates.  This is fortunate because we can
+ * simplify things quite a bit, as long as we're careful to delete old dirents
+ * before creating new ones.
+ */
+static inline void
+xfs_exchange_call_nlink_hooks(
+	struct xfs_inode	*src_dp,
+	struct xfs_name		*src_name,
+	struct xfs_inode	*src_ip,
+	struct xfs_inode	*target_dp,
+	struct xfs_name		*target_name,
+	struct xfs_inode	*target_ip)
+{
+	/* Exchange files in the source directory. */
+	xfs_dirent_child_delta(src_dp, src_ip, -1, src_name);
+	xfs_dirent_child_delta(src_dp, target_ip, 1, src_name);
+
+	/* Exchange files in the target directory. */
+	xfs_dirent_child_delta(target_dp, target_ip, -1, target_name);
+	xfs_dirent_child_delta(target_dp, src_ip, 1, target_name);
+
+	/* If the source file is a dir, update its dotdot entry. */
+	if (S_ISDIR(VFS_I(src_ip)->i_mode)) {
+		xfs_dirent_backref_delta(src_dp, src_ip, -1);
+		xfs_dirent_backref_delta(target_dp, src_ip, 1);
+	}
+
+	/* If the target file is a dir, update its dotdot entry. */
+	if (S_ISDIR(VFS_I(target_ip)->i_mode)) {
+		xfs_dirent_backref_delta(target_dp, target_ip, -1);
+		xfs_dirent_backref_delta(src_dp, target_ip, 1);
+	}
+}
+
+static inline void
+xfs_rename_call_nlink_hooks(
+	struct xfs_inode	*src_dp,
+	struct xfs_name		*src_name,
+	struct xfs_inode	*src_ip,
+	struct xfs_inode	*target_dp,
+	struct xfs_name		*target_name,
+	struct xfs_inode	*target_ip,
+	struct xfs_inode	*wip)
+{
+	/*
+	 * If there's a target file, remove it from the target directory and
+	 * move the source file to the target directory.
+	 */
+	if (target_ip)
+		xfs_dirent_child_delta(target_dp, target_ip, -1, target_name);
+	xfs_dirent_child_delta(target_dp, src_ip, 1, target_name);
+
+	/*
+	 * Remove the source file from the source directory, and possibly move
+	 * the whiteout file into its place.
+	 */
+	xfs_dirent_child_delta(src_dp, src_ip, -1, src_name);
+	if (wip)
+		xfs_dirent_child_delta(src_dp, wip, 1, src_name);
+
+	/* If the source file is a dir, update its dotdot entry. */
+	if (S_ISDIR(VFS_I(src_ip)->i_mode)) {
+		xfs_dirent_backref_delta(src_dp, src_ip, -1);
+		xfs_dirent_backref_delta(target_dp, src_ip, 1);
+	}
+
+	/*
+	 * If the target file is a dir, drop the dot and dotdot entries because
+	 * we've dropped the last reference.
+	 */
+	if (target_ip && S_ISDIR(VFS_I(target_ip)->i_mode)) {
+		ASSERT(VFS_I(target_ip)->i_nlink == 0);
+		xfs_dirent_self_delta(target_ip, -1);
+		xfs_dirent_backref_delta(target_dp, target_ip, -1);
+	}
+}
+#else
+# define xfs_exchange_call_nlink_hooks(...)	((void)0)
+# define xfs_rename_call_nlink_hooks(...)	((void)0)
+#endif /* CONFIG_XFS_LIVE_HOOKS */
+
 static int
 xfs_finish_rename(
 	struct xfs_trans	*tp)
@@ -2861,6 +3080,9 @@ xfs_cross_rename(
 	xfs_trans_ichgtime(tp, dp1, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, dp1, XFS_ILOG_CORE);
 
+	if (xfs_hooks_switched_on(&xfs_dirents_hooks_switch))
+		xfs_exchange_call_nlink_hooks(dp1, name1, ip1, dp2, name2, ip2);
+
 	return xfs_finish_rename(tp);
 
 out_trans_abort:
@@ -3338,6 +3560,10 @@ xfs_rename(
 	if (new_parent)
 		xfs_trans_log_inode(tp, target_dp, XFS_ILOG_CORE);
 
+	if (xfs_hooks_switched_on(&xfs_dirents_hooks_switch))
+		xfs_rename_call_nlink_hooks(src_dp, src_name, src_ip,
+				target_dp, target_name, target_ip, wip);
+
 	error = xfs_finish_rename(tp);
 
 	goto out_unlock;
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 5735de32beeb..b7a16642a8c3 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -576,4 +576,39 @@ int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
 void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
 void xfs_lock_inodes(struct xfs_inode **ips, int inodes, uint lock_mode);
 
+/*
+ * Parameters for tracking bumplink and droplink operations.  The hook
+ * function arg parameter is one of these.
+ */
+enum xfs_dirent_update_type {
+	XFS_DIRENT_CHILD_DELTA,		/* parent pointing to child */
+	XFS_DIRENT_BACKREF_DELTA,		/* dotdot entries */
+	XFS_DIRENT_SELF_DELTA,		/* dot entries */
+};
+
+struct xfs_dirent_update_params {
+	const struct xfs_inode	*dp;
+	const struct xfs_inode	*ip;
+	const struct xfs_name	*name;
+	int			delta;
+};
+
+#ifdef CONFIG_XFS_LIVE_HOOKS
+void xfs_dirent_child_delta(struct xfs_inode *dp, struct xfs_inode *ip,
+		int delta, struct xfs_name *name);
+
+struct xfs_dirent_hook {
+	struct xfs_hook		delta_hook;
+};
+
+void xfs_dirent_hook_disable(void);
+void xfs_dirent_hook_enable(void);
+
+int xfs_dirent_hook_add(struct xfs_mount *mp, struct xfs_dirent_hook *hook);
+void xfs_dirent_hook_del(struct xfs_mount *mp, struct xfs_dirent_hook *hook);
+
+#else
+# define xfs_dirent_child_delta(dp, ip, delta, name)	((void)0)
+#endif /* CONFIG_XFS_LIVE_HOOKS */
+
 #endif	/* __XFS_INODE_H__ */
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 8aca2cc173ac..c08f55cc4f36 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -242,6 +242,8 @@ typedef struct xfs_mount {
 	unsigned int		*m_errortag;
 	struct xfs_kobj		m_errortag_kobj;
 #endif
+	/* Hook to feed file directory updates to an active online repair. */
+	struct xfs_hooks	m_dirent_update_hooks;
 } xfs_mount_t;
 
 #define M_IGEO(mp)		(&(mp)->m_ino_geo)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 0ac55d191f1f..0432a4a096e8 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1949,6 +1949,8 @@ static int xfs_init_fs_context(
 	mp->m_logbsize = -1;
 	mp->m_allocsize_log = 16; /* 64k */
 
+	xfs_hooks_init(&mp->m_dirent_update_hooks);
+
 	/*
 	 * Copy binary VFS mount flags we are interested in.
 	 */
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index f305226109f0..77427a50a760 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -354,6 +354,7 @@ xfs_symlink(
 			goto out_trans_cancel;
 	}
 
+	xfs_dirent_child_delta(dp, ip, 1, link_name);
 
 	/*
 	 * If this is a synchronous mount, make sure that the
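
To illustrate the dispatch pattern this patch introduces, here is a small
userspace sketch: a global switch guards a per-mount list of hooks, and
directory updates are reported as signed deltas tagged with an update type.
All demo_* names are hypothetical.  The plain integer switch is only a
stand-in for the static key (no code patching happens here), the singly
linked list is a stand-in for whatever struct xfs_hooks wraps, and the
tallying observer is an assumption about what a consumer might do, not the
nlinks scrubber itself.

```c
#include <stddef.h>

enum demo_update_type {
	DEMO_CHILD_DELTA,	/* parent pointing to child */
	DEMO_BACKREF_DELTA,	/* dotdot entries */
	DEMO_SELF_DELTA,	/* dot entries */
	DEMO_NR_TYPES,
};

struct demo_update {
	unsigned long	dir_ino;
	unsigned long	ino;
	int		delta;
};

struct demo_hook {
	struct demo_hook	*next;
	void			(*fn)(struct demo_hook *hook,
				      enum demo_update_type type,
				      const struct demo_update *p);
};

struct demo_mount {
	struct demo_hook	*hooks;
};

/* Stand-in for the static key: hooks only fire while this is nonzero. */
int demo_hooks_switch;

void demo_hook_enable(void)  { demo_hooks_switch++; }
void demo_hook_disable(void) { demo_hooks_switch--; }

void
demo_hook_add(struct demo_mount *mp, struct demo_hook *hook)
{
	hook->next = mp->hooks;
	mp->hooks = hook;
}

/* Cheap check first, then walk the hook list. */
void
demo_hooks_call(struct demo_mount *mp, enum demo_update_type type,
		const struct demo_update *p)
{
	struct demo_hook	*h;

	if (demo_hooks_switch <= 0)
		return;
	for (h = mp->hooks; h; h = h->next)
		h->fn(h, type, p);
}

/* Hypothetical observer that sums the deltas seen for each type. */
struct demo_tally {
	struct demo_hook	hook;	/* must stay first for the cast below */
	int			totals[DEMO_NR_TYPES];
};

void
demo_tally_fn(struct demo_hook *hook, enum demo_update_type type,
	      const struct demo_update *p)
{
	/* Kernel code would use container_of(); the cast relies on layout. */
	struct demo_tally	*t = (struct demo_tally *)hook;

	t->totals[type] += p->delta;
}

/*
 * Report a subdirectory being created (delta = 1) or removed (delta = -1):
 * one child link from the parent, plus the new directory's '.' and '..'.
 */
void
demo_dir_update(struct demo_mount *mp, unsigned long dir_ino,
		unsigned long ino, int delta)
{
	struct demo_update	child = { dir_ino, ino, delta };
	struct demo_update	self = { ino, ino, delta };

	demo_hooks_call(mp, DEMO_CHILD_DELTA, &child);
	demo_hooks_call(mp, DEMO_SELF_DELTA, &self);
	demo_hooks_call(mp, DEMO_BACKREF_DELTA, &child);
}
```

With the switch off, demo_dir_update() reports nothing even though a hook is
registered.  Once demo_hook_enable() has run, creating and then removing the
same subdirectory leaves every tally at zero, which is the invariant a live
link count scan depends on.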



* [PATCH 17/23] xfs: connect in-memory btrees to xfiles
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (15 preceding siblings ...)
  2023-02-16 20:45   ` [PATCH 16/23] xfs: track file link count updates during live nlinks fsck Darrick J. Wong
@ 2023-02-16 20:46   ` Darrick J. Wong
  2023-02-16 20:46   ` [PATCH 18/23] xfs: create temporary files and directories for online repair Darrick J. Wong
                     ` (5 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:46 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add to our stubbed-out in-memory btrees the ability to connect them with
an actual in-memory backing file (aka xfiles) and the necessary pieces
to track free space in the xfile and flush dirty xfbtree buffers on
demand, which we'll need for online repair.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/trace.h |    1 +
 fs/xfs/scrub/xfile.c |   11 +++++++++++
 fs/xfs/scrub/xfile.h |    2 ++
 3 files changed, 14 insertions(+)


diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 979ee2789668..4a6f0f1b0881 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -755,6 +755,7 @@ DEFINE_EVENT(xfile_class, name, \
 DEFINE_XFILE_EVENT(xfile_pread);
 DEFINE_XFILE_EVENT(xfile_pwrite);
 DEFINE_XFILE_EVENT(xfile_seek_data);
+DEFINE_XFILE_EVENT(xfile_discard);
 
 TRACE_EVENT(xfarray_create,
 	TP_PROTO(struct xfarray *xfa, unsigned long long required_capacity),
diff --git a/fs/xfs/scrub/xfile.c b/fs/xfs/scrub/xfile.c
index 43455aa78243..f9888b6dd728 100644
--- a/fs/xfs/scrub/xfile.c
+++ b/fs/xfs/scrub/xfile.c
@@ -285,6 +285,17 @@ xfile_pwrite(
 	return error;
 }
 
+/* Discard pages backing a range of the xfile. */
+void
+xfile_discard(
+	struct xfile		*xf,
+	loff_t			pos,
+	u64			count)
+{
+	trace_xfile_discard(xf, pos, count);
+	shmem_truncate_range(file_inode(xf->file), pos, pos + count - 1);
+}
+
 /* Find the next written area in the xfile data for a given offset. */
 loff_t
 xfile_seek_data(
diff --git a/fs/xfs/scrub/xfile.h b/fs/xfs/scrub/xfile.h
index b37dba1961d8..973c8fc37707 100644
--- a/fs/xfs/scrub/xfile.h
+++ b/fs/xfs/scrub/xfile.h
@@ -46,6 +46,8 @@ xfile_obj_store(struct xfile *xf, const void *buf, size_t count, loff_t pos)
 	return 0;
 }
 
+void xfile_discard(struct xfile *xf, loff_t pos, u64 count);
+int xfile_prealloc(struct xfile *xf, loff_t pos, u64 count);
 loff_t xfile_seek_data(struct xfile *xf, loff_t pos);
 
 struct xfile_stat {
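
A short sketch of the range arithmetic behind xfile_discard above.  The
patch passes "pos + count - 1" as the end of the range, i.e. the last byte
to discard rather than one past it.  The helpers below are hypothetical
(demo_* names) and assume a 4096-byte page; the second one additionally
assumes that only pages lying entirely inside the range could be freed
outright, which is an assumption of this example and not something the
patch states.

```c
#include <stdint.h>

#define DEMO_PAGE_SIZE	4096ULL		/* assumed page size */

/* Last byte covered by (pos, count); callers must pass count > 0. */
int64_t
demo_last_byte(int64_t pos, uint64_t count)
{
	return pos + (int64_t)count - 1;
}

/* Number of whole pages that lie entirely inside [pos, pos + count). */
uint64_t
demo_whole_pages(int64_t pos, uint64_t count)
{
	uint64_t	first;	/* first page starting at or after pos */
	uint64_t	end;	/* one past the last fully covered page */

	if (count == 0)
		return 0;

	first = ((uint64_t)pos + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
	end = ((uint64_t)pos + count) / DEMO_PAGE_SIZE;
	return end > first ? end - first : 0;
}
```

For example, discarding 8192 bytes at offset 4096 ends at byte 12287 and
covers two whole pages, while discarding 4096 bytes at offset 100 straddles
two pages and covers neither of them completely.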



* [PATCH 18/23] xfs: create temporary files and directories for online repair
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (16 preceding siblings ...)
  2023-02-16 20:46   ` [PATCH 17/23] xfs: connect in-memory btrees to xfiles Darrick J. Wong
@ 2023-02-16 20:46   ` Darrick J. Wong
  2023-02-16 20:46   ` [PATCH 19/23] xfs: hide private inodes from bulkstat and handle functions Darrick J. Wong
                     ` (4 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:46 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Teach the online repair code how to create temporary files or
directories.  These temporary files can be used to stage reconstructed
information until we're ready to perform an atomic extent swap to commit
the new metadata.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile         |    1 
 fs/xfs/scrub/common.c   |    1 
 fs/xfs/scrub/scrub.c    |    2 
 fs/xfs/scrub/scrub.h    |    4 +
 fs/xfs/scrub/tempfile.c |  230 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/tempfile.h |   27 ++++++
 fs/xfs/scrub/trace.h    |   33 +++++++
 fs/xfs/xfs_inode.c      |    3 -
 fs/xfs/xfs_inode.h      |    2 
 9 files changed, 301 insertions(+), 2 deletions(-)
 create mode 100644 fs/xfs/scrub/tempfile.c
 create mode 100644 fs/xfs/scrub/tempfile.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 9a6ef1f7c27b..2562c852db7f 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -178,6 +178,7 @@ ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
 				   repair.o \
+				   tempfile.o \
 				   )
 endif
 endif
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index a4cfe5653880..2874da088e8d 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -33,6 +33,7 @@
 #include "scrub/trace.h"
 #include "scrub/repair.h"
 #include "scrub/health.h"
+#include "scrub/tempfile.h"
 
 /* Common code for the metadata scrubbers. */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index 871a72e22a8a..a19ea7fdd510 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -22,6 +22,7 @@
 #include "scrub/trace.h"
 #include "scrub/repair.h"
 #include "scrub/health.h"
+#include "scrub/tempfile.h"
 
 /*
  * Online Scrub and Repair
@@ -193,6 +194,7 @@ xchk_teardown(
 		sc->buf = NULL;
 	}
 
+	xrep_tempfile_rele(sc);
 	xchk_fshooks_disable(sc);
 	return error;
 }
diff --git a/fs/xfs/scrub/scrub.h b/fs/xfs/scrub/scrub.h
index 484e5fb7fe7a..20e814a53850 100644
--- a/fs/xfs/scrub/scrub.h
+++ b/fs/xfs/scrub/scrub.h
@@ -80,6 +80,10 @@ struct xfs_scrub {
 	void				*buf;
 	uint				ilock_flags;
 
+	/* A temporary file on this filesystem, for staging new metadata. */
+	struct xfs_inode		*tempip;
+	uint				temp_ilock_flags;
+
 	/* See the XCHK/XREP state flags below. */
 	unsigned int			flags;
 
diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c
new file mode 100644
index 000000000000..8f80f1c2555c
--- /dev/null
+++ b/fs/xfs/scrub/tempfile.c
@@ -0,0 +1,230 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_inode.h"
+#include "xfs_ialloc.h"
+#include "xfs_quota.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_trans_space.h"
+#include "xfs_dir2.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/tempfile.h"
+
+/*
+ * Create a temporary file for reconstructing metadata, with the intention of
+ * atomically swapping the temporary file's contents with the file that's
+ * being repaired.
+ */
+int
+xrep_tempfile_create(
+	struct xfs_scrub	*sc,
+	uint16_t		mode)
+{
+	struct xfs_mount	*mp = sc->mp;
+	struct xfs_trans	*tp = NULL;
+	struct xfs_dquot	*udqp = NULL;
+	struct xfs_dquot	*gdqp = NULL;
+	struct xfs_dquot	*pdqp = NULL;
+	struct xfs_trans_res	*tres;
+	struct xfs_inode	*dp = mp->m_rootip;
+	xfs_ino_t		ino;
+	unsigned int		resblks;
+	bool			is_dir = S_ISDIR(mode);
+	int			error;
+
+	if (xfs_is_shutdown(mp))
+		return -EIO;
+	if (xfs_is_readonly(mp))
+		return -EROFS;
+
+	ASSERT(sc->tp == NULL);
+	ASSERT(sc->tempip == NULL);
+
+	/*
+	 * Make sure that we have allocated dquot(s) on disk.  The temporary
+	 * inode should be completely root owned so that we don't fail due to
+	 * quota limits.
+	 */
+	error = xfs_qm_vop_dqalloc(dp, GLOBAL_ROOT_UID, GLOBAL_ROOT_GID, 0,
+			XFS_QMOPT_QUOTALL, &udqp, &gdqp, &pdqp);
+	if (error)
+		return error;
+
+	if (is_dir) {
+		resblks = XFS_MKDIR_SPACE_RES(mp, 0);
+		tres = &M_RES(mp)->tr_mkdir;
+	} else {
+		resblks = XFS_IALLOC_SPACE_RES(mp);
+		tres = &M_RES(mp)->tr_create_tmpfile;
+	}
+
+	error = xfs_trans_alloc_icreate(mp, tres, udqp, gdqp, pdqp, resblks,
+			&tp);
+	if (error)
+		goto out_release_dquots;
+
+	/* Allocate inode, set up directory. */
+	error = xfs_dialloc(&tp, dp->i_ino, mode, &ino);
+	if (error)
+		goto out_trans_cancel;
+	error = xfs_init_new_inode(&init_user_ns, tp, dp, ino, mode, 0, 0,
+			0, false, &sc->tempip);
+	if (error)
+		goto out_trans_cancel;
+
+	/* Change the ownership of the inode to root. */
+	VFS_I(sc->tempip)->i_uid = GLOBAL_ROOT_UID;
+	VFS_I(sc->tempip)->i_gid = GLOBAL_ROOT_GID;
+	sc->tempip->i_diflags &= ~(XFS_DIFLAG_REALTIME | XFS_DIFLAG_RTINHERIT);
+	xfs_trans_log_inode(tp, sc->tempip, XFS_ILOG_CORE);
+
+	/*
+	 * Mark our temporary file as private so that LSMs and the ACL code
+	 * don't try to add their own metadata or reason about these files.
+	 * The file should never be exposed to userspace.
+	 */
+	VFS_I(sc->tempip)->i_flags |= S_PRIVATE;
+	VFS_I(sc->tempip)->i_opflags &= ~IOP_XATTR;
+
+	if (is_dir) {
+		error = xfs_dir_init(tp, sc->tempip, dp);
+		if (error)
+			goto out_trans_cancel;
+	}
+
+	/*
+	 * Attach the dquot(s) to the inode and modify them incore.
+	 * The ids of the inode couldn't have changed since the new
+	 * inode has been locked ever since it was created.
+	 */
+	xfs_qm_vop_create_dqattach(tp, sc->tempip, udqp, gdqp, pdqp);
+
+	/*
+	 * Put our temp file on the unlinked list so it's purged automatically.
+	 * Anything being reconstructed using this file must be atomically
+	 * swapped with the original file because the contents here will be
+	 * purged when the inode is dropped or log recovery cleans out the
+	 * unlinked list.
+	 */
+	error = xfs_iunlink(tp, sc->tempip);
+	if (error)
+		goto out_trans_cancel;
+
+	error = xfs_trans_commit(tp);
+	if (error)
+		goto out_release_inode;
+
+	trace_xrep_tempfile_create(sc);
+
+	xfs_qm_dqrele(udqp);
+	xfs_qm_dqrele(gdqp);
+	xfs_qm_dqrele(pdqp);
+
+	/* Finish setting up the incore / vfs context. */
+	xfs_setup_iops(sc->tempip);
+	xfs_finish_inode_setup(sc->tempip);
+
+	sc->temp_ilock_flags = 0;
+	return error;
+
+out_trans_cancel:
+	xfs_trans_cancel(tp);
+out_release_inode:
+	/*
+	 * Wait until after the current transaction is aborted to finish the
+	 * setup of the inode and release the inode.  This prevents recursive
+	 * transactions and deadlocks from xfs_inactive.
+	 */
+	if (sc->tempip) {
+		xfs_finish_inode_setup(sc->tempip);
+		xchk_irele(sc, sc->tempip);
+	}
+out_release_dquots:
+	xfs_qm_dqrele(udqp);
+	xfs_qm_dqrele(gdqp);
+	xfs_qm_dqrele(pdqp);
+
+	return error;
+}
+
+/* Take IOLOCK_EXCL on the temporary file, maybe. */
+bool
+xrep_tempfile_iolock_nowait(
+	struct xfs_scrub	*sc)
+{
+	if (xfs_ilock_nowait(sc->tempip, XFS_IOLOCK_EXCL)) {
+		sc->temp_ilock_flags |= XFS_IOLOCK_EXCL;
+		return true;
+	}
+
+	return false;
+}
+
+/* Release IOLOCK_EXCL on the temporary file. */
+void
+xrep_tempfile_iounlock(
+	struct xfs_scrub	*sc)
+{
+	xfs_iunlock(sc->tempip, XFS_IOLOCK_EXCL);
+	sc->temp_ilock_flags &= ~XFS_IOLOCK_EXCL;
+}
+
+/* Prepare the temporary file for metadata updates by grabbing ILOCK_EXCL. */
+void
+xrep_tempfile_ilock(
+	struct xfs_scrub	*sc)
+{
+	sc->temp_ilock_flags |= XFS_ILOCK_EXCL;
+	xfs_ilock(sc->tempip, XFS_ILOCK_EXCL);
+}
+
+/* Try to grab ILOCK_EXCL on the temporary file. */
+bool
+xrep_tempfile_ilock_nowait(
+	struct xfs_scrub	*sc)
+{
+	if (xfs_ilock_nowait(sc->tempip, XFS_ILOCK_EXCL)) {
+		sc->temp_ilock_flags |= XFS_ILOCK_EXCL;
+		return true;
+	}
+
+	return false;
+}
+
+/* Unlock ILOCK_EXCL on the temporary file after an update. */
+void
+xrep_tempfile_iunlock(
+	struct xfs_scrub	*sc)
+{
+	xfs_iunlock(sc->tempip, XFS_ILOCK_EXCL);
+	sc->temp_ilock_flags &= ~XFS_ILOCK_EXCL;
+}
+
+/* Release the temporary file. */
+void
+xrep_tempfile_rele(
+	struct xfs_scrub	*sc)
+{
+	if (!sc->tempip)
+		return;
+
+	if (sc->temp_ilock_flags) {
+		xfs_iunlock(sc->tempip, sc->temp_ilock_flags);
+		sc->temp_ilock_flags = 0;
+	}
+
+	xchk_irele(sc, sc->tempip);
+	sc->tempip = NULL;
+}
diff --git a/fs/xfs/scrub/tempfile.h b/fs/xfs/scrub/tempfile.h
new file mode 100644
index 000000000000..f00a9ce43a32
--- /dev/null
+++ b/fs/xfs/scrub/tempfile.h
@@ -0,0 +1,27 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_SCRUB_TEMPFILE_H__
+#define __XFS_SCRUB_TEMPFILE_H__
+
+#ifdef CONFIG_XFS_ONLINE_REPAIR
+int xrep_tempfile_create(struct xfs_scrub *sc, uint16_t mode);
+void xrep_tempfile_rele(struct xfs_scrub *sc);
+
+bool xrep_tempfile_iolock_nowait(struct xfs_scrub *sc);
+void xrep_tempfile_iounlock(struct xfs_scrub *sc);
+
+void xrep_tempfile_ilock(struct xfs_scrub *sc);
+bool xrep_tempfile_ilock_nowait(struct xfs_scrub *sc);
+void xrep_tempfile_iunlock(struct xfs_scrub *sc);
+#else
+static inline void xrep_tempfile_iolock_both(struct xfs_scrub *sc)
+{
+	xchk_ilock(sc, XFS_IOLOCK_EXCL);
+}
+# define xrep_tempfile_rele(sc)
+#endif /* CONFIG_XFS_ONLINE_REPAIR */
+
+#endif /* __XFS_SCRUB_TEMPFILE_H__ */
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 4a6f0f1b0881..0c27eb197f83 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -1149,6 +1149,39 @@ TRACE_EVENT(xrep_ialloc_insert,
 		  __entry->freemask)
 )
 
+TRACE_EVENT(xrep_tempfile_create,
+	TP_PROTO(struct xfs_scrub *sc),
+	TP_ARGS(sc),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, type)
+		__field(xfs_agnumber_t, agno)
+		__field(xfs_ino_t, inum)
+		__field(unsigned int, gen)
+		__field(unsigned int, flags)
+		__field(xfs_ino_t, temp_inum)
+	),
+	TP_fast_assign(
+		__entry->dev = sc->mp->m_super->s_dev;
+		__entry->ino = sc->file ? XFS_I(file_inode(sc->file))->i_ino : 0;
+		__entry->type = sc->sm->sm_type;
+		__entry->agno = sc->sm->sm_agno;
+		__entry->inum = sc->sm->sm_ino;
+		__entry->gen = sc->sm->sm_gen;
+		__entry->flags = sc->sm->sm_flags;
+		__entry->temp_inum = sc->tempip->i_ino;
+	),
+	TP_printk("dev %d:%d ino 0x%llx type %s inum 0x%llx gen 0x%x flags 0x%x temp_inum 0x%llx",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __print_symbolic(__entry->type, XFS_SCRUB_TYPE_STRINGS),
+		  __entry->inum,
+		  __entry->gen,
+		  __entry->flags,
+		  __entry->temp_inum)
+);
+
 #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */
 
 #endif /* _TRACE_XFS_SCRUB_TRACE_H */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index b17e4ba3622b..8ad646beee75 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -48,7 +48,6 @@ struct kmem_cache *xfs_inode_cache;
  */
 #define	XFS_ITRUNC_MAX_EXTENTS	2
 
-STATIC int xfs_iunlink(struct xfs_trans *, struct xfs_inode *);
 STATIC int xfs_iunlink_remove(struct xfs_trans *tp, struct xfs_perag *pag,
 	struct xfs_inode *);
 
@@ -2187,7 +2186,7 @@ xfs_iunlink_insert_inode(
  * We place the on-disk inode on a list in the AGI.  It will be pulled from this
  * list when the inode is freed.
  */
-STATIC int
+int
 xfs_iunlink(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*ip)
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index b7a16642a8c3..94a1490fb7b0 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -570,6 +570,8 @@ extern struct kmem_cache	*xfs_inode_cache;
 
 bool xfs_inode_needs_inactive(struct xfs_inode *ip);
 
+int xfs_iunlink(struct xfs_trans *tp, struct xfs_inode *ip);
+
 void xfs_end_io(struct work_struct *work);
 
 int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);



* [PATCH 19/23] xfs: hide private inodes from bulkstat and handle functions
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (17 preceding siblings ...)
  2023-02-16 20:46   ` [PATCH 18/23] xfs: create temporary files and directories for online repair Darrick J. Wong
@ 2023-02-16 20:46   ` Darrick J. Wong
  2023-02-16 20:46   ` [PATCH 20/23] xfs: create a blob array data structure Darrick J. Wong
                     ` (3 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:46 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

We're about to start adding functionality that uses internal inodes that
are private to XFS.  What this means is that userspace should never be
able to access any information about these files, and should not be able
to open these files by handle.  Callers are not allowed to link these
files into the directory tree, which should suffice to make these
private inodes actually private.
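
A minimal userspace sketch of the policy described above, not the kernel
code.  The toy_* names and the inode table are hypothetical stand-ins: the
handle lookup treats a private inode exactly like a stale generation, and
the bulkstat-like scan skips private inodes instead of reporting them.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an incore inode; not the kernel's struct inode. */
struct toy_inode {
	uint64_t	ino;
	uint32_t	gen;
	bool		is_private;	/* models IS_PRIVATE(inode) */
};

/* Find an inode by number, or NULL if the table has no such inode. */
const struct toy_inode *
toy_iget(const struct toy_inode *tbl, size_t nr, uint64_t ino)
{
	size_t	i;

	for (i = 0; i < nr; i++)
		if (tbl[i].ino == ino)
			return &tbl[i];
	return NULL;
}

/* Handle lookup: a stale generation and a private inode both look stale. */
int
toy_open_by_handle(const struct toy_inode *tbl, size_t nr, uint64_t ino,
		uint32_t gen, const struct toy_inode **ipp)
{
	const struct toy_inode	*ip = toy_iget(tbl, nr, ino);

	assert(ipp != NULL);
	if (!ip || ip->gen != gen || ip->is_private)
		return -ESTALE;
	*ipp = ip;
	return 0;
}

/* Bulkstat-like scan: copy out inode numbers, skipping private inodes. */
size_t
toy_bulkstat(const struct toy_inode *tbl, size_t nr, uint64_t *inos,
		size_t max)
{
	size_t	i, found = 0;

	for (i = 0; i < nr && found < max; i++) {
		if (tbl[i].is_private)
			continue;	/* never leak details to the caller */
		inos[found++] = tbl[i].ino;
	}
	return found;
}
```

Returning the same error for "wrong generation" and "private" means the
caller cannot tell from the result that a private inode exists at all.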

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_export.c |    2 +-
 fs/xfs/xfs_itable.c |    8 ++++++++
 2 files changed, 9 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_export.c b/fs/xfs/xfs_export.c
index 1064c2342876..b6ba96e0dd75 100644
--- a/fs/xfs/xfs_export.c
+++ b/fs/xfs/xfs_export.c
@@ -146,7 +146,7 @@ xfs_nfs_get_inode(
 		return ERR_PTR(error);
 	}
 
-	if (VFS_I(ip)->i_generation != generation) {
+	if (VFS_I(ip)->i_generation != generation || IS_PRIVATE(VFS_I(ip))) {
 		xfs_irele(ip);
 		return ERR_PTR(-ESTALE);
 	}
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index a1c2bcf65d37..7a967cc78010 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -86,6 +86,14 @@ xfs_bulkstat_one_int(
 	vfsuid = i_uid_into_vfsuid(mnt_userns, inode);
 	vfsgid = i_gid_into_vfsgid(mnt_userns, inode);
 
+	/* If this is a private inode, don't leak its details to userspace. */
+	if (IS_PRIVATE(inode)) {
+		xfs_iunlock(ip, XFS_ILOCK_SHARED);
+		xfs_irele(ip);
+		error = -EINVAL;
+		goto out_advance;
+	}
+
 	/* xfs_iget returns the following without needing
 	 * further change.
 	 */



* [PATCH 20/23] xfs: create a blob array data structure
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (18 preceding siblings ...)
  2023-02-16 20:46   ` [PATCH 19/23] xfs: hide private inodes from bulkstat and handle functions Darrick J. Wong
@ 2023-02-16 20:46   ` Darrick J. Wong
  2023-02-16 20:47   ` [PATCH 21/23] xfs: repair extended attributes Darrick J. Wong
                     ` (2 subsequent siblings)
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:46 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a simple 'blob array' data structure for storage of arbitrarily
sized metadata objects that will be used to reconstruct metadata.  For
the intended usage (temporarily storing extended attribute names and
values) we only have to support storing objects and retrieving them.
Use the xfile abstraction to store the attribute information in memory
that can be swapped out.
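
A minimal userspace sketch of the append-and-cookie idea, assuming a
growable heap buffer in place of the xfile.  All blob_* names and
BLOB_START are hypothetical; only the scheme (a small key header in front
of each blob, with the byte offset doubling as the retrieval cookie)
follows the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define BLOB_MAGIC	0xABAADDADu
#define BLOB_START	4096	/* hypothetical: offset 0 is never a valid cookie */

/* Header stored in front of every blob. */
struct blob_key {
	uint32_t	magic;
	uint32_t	size;	/* size of the blob, in bytes */
	uint64_t	offset;	/* byte offset of this key */
};

/* Stand-in for the xfile: a heap buffer that only grows. */
struct blob_store {
	unsigned char	*buf;
	size_t		cap;
	uint64_t	last_offset;
};

typedef uint64_t blob_cookie;

void
blob_store_init(struct blob_store *bs)
{
	bs->buf = NULL;
	bs->cap = 0;
	bs->last_offset = BLOB_START;
}

void
blob_store_destroy(struct blob_store *bs)
{
	free(bs->buf);
	bs->buf = NULL;
	bs->cap = 0;
}

/* Make sure the backing buffer covers [0, end). */
static int
blob_store_reserve(struct blob_store *bs, uint64_t end)
{
	unsigned char	*p;
	size_t		newcap = bs->cap ? bs->cap : 2 * BLOB_START;

	if (end <= bs->cap)
		return 0;
	while (newcap < end)
		newcap *= 2;
	p = realloc(bs->buf, newcap);
	if (!p)
		return -ENOMEM;
	memset(p + bs->cap, 0, newcap - bs->cap);
	bs->buf = p;
	bs->cap = newcap;
	return 0;
}

/* Append a blob and hand back its offset as the cookie. */
int
blob_store_put(struct blob_store *bs, blob_cookie *cookie, const void *ptr,
		uint32_t size)
{
	struct blob_key	key = {
		.magic	= BLOB_MAGIC,
		.size	= size,
		.offset	= bs->last_offset,
	};
	int		error;

	assert(bs->last_offset >= BLOB_START);
	error = blob_store_reserve(bs, bs->last_offset + sizeof(key) + size);
	if (error)
		return error;

	memcpy(bs->buf + bs->last_offset, &key, sizeof(key));
	memcpy(bs->buf + bs->last_offset + sizeof(key), ptr, size);
	*cookie = bs->last_offset;
	bs->last_offset += sizeof(key) + size;
	return 0;
}

/* Fetch a blob by cookie after validating the key stored at that offset. */
int
blob_store_get(struct blob_store *bs, blob_cookie cookie, void *ptr,
		uint32_t size)
{
	struct blob_key	key;

	if (cookie < BLOB_START || cookie > bs->last_offset ||
	    bs->last_offset - cookie < sizeof(key))
		return -ENODATA;

	memcpy(&key, bs->buf + cookie, sizeof(key));
	if (key.magic != BLOB_MAGIC || key.offset != cookie)
		return -ENODATA;
	if (size < key.size)
		return -EFBIG;

	memcpy(ptr, bs->buf + cookie + sizeof(key), key.size);
	return 0;
}
```

Storing the offset inside the key lets a lookup reject a cookie that points
into the middle of some other blob, since the recorded offset will not
match.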

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile       |    1 
 fs/xfs/scrub/xfblob.c |  152 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/xfblob.h |   25 ++++++++
 3 files changed, 178 insertions(+)
 create mode 100644 fs/xfs/scrub/xfblob.c
 create mode 100644 fs/xfs/scrub/xfblob.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 2562c852db7f..f2f3ab589c04 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -179,6 +179,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
 				   repair.o \
 				   tempfile.o \
+				   xfblob.o \
 				   )
 endif
 endif
diff --git a/fs/xfs/scrub/xfblob.c b/fs/xfs/scrub/xfblob.c
new file mode 100644
index 000000000000..1f89d7d13c59
--- /dev/null
+++ b/fs/xfs/scrub/xfblob.c
@@ -0,0 +1,152 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "scrub/scrub.h"
+#include "scrub/xfile.h"
+#include "scrub/xfarray.h"
+#include "scrub/xfblob.h"
+
+/*
+ * XFS Blob Storage
+ * ================
+ * Stores and retrieves blobs using an xfile.  Objects are appended to the file
+ * and the offset is returned as a magic cookie for retrieval.
+ */
+
+#define XB_KEY_MAGIC	0xABAADDAD
+struct xb_key {
+	uint32_t		xb_magic;  /* XB_KEY_MAGIC */
+	uint32_t		xb_size;   /* size of the blob, in bytes */
+	loff_t			xb_offset; /* byte offset of this key */
+	/* blob comes after here */
+} __packed;
+
+/* Initialize a blob storage object. */
+int
+xfblob_create(
+	struct xfs_mount	*mp,
+	const char		*description,
+	struct xfblob		**blobp)
+{
+	struct xfblob		*blob;
+	struct xfile		*xfile;
+	int			error;
+
+	error = xfile_create(mp, description, 0, &xfile);
+	if (error)
+		return error;
+
+	blob = kmalloc(sizeof(struct xfblob), XCHK_GFP_FLAGS);
+	if (!blob) {
+		error = -ENOMEM;
+		goto out_xfile;
+	}
+
+	blob->xfile = xfile;
+	blob->last_offset = PAGE_SIZE;
+
+	*blobp = blob;
+	return 0;
+
+out_xfile:
+	xfile_destroy(xfile);
+	return error;
+}
+
+/* Destroy a blob storage object. */
+void
+xfblob_destroy(
+	struct xfblob	*blob)
+{
+	xfile_destroy(blob->xfile);
+	kfree(blob);
+}
+
+/* Retrieve a blob. */
+int
+xfblob_load(
+	struct xfblob	*blob,
+	xfblob_cookie	cookie,
+	void		*ptr,
+	uint32_t	size)
+{
+	struct xb_key	key;
+	int		error;
+
+	error = xfile_obj_load(blob->xfile, &key, sizeof(key), cookie);
+	if (error)
+		return error;
+
+	if (key.xb_magic != XB_KEY_MAGIC || key.xb_offset != cookie) {
+		ASSERT(0);
+		return -ENODATA;
+	}
+	if (size < key.xb_size) {
+		ASSERT(0);
+		return -EFBIG;
+	}
+
+	return xfile_obj_load(blob->xfile, ptr, key.xb_size,
+			cookie + sizeof(key));
+}
+
+/* Store a blob. */
+int
+xfblob_store(
+	struct xfblob	*blob,
+	xfblob_cookie	*cookie,
+	const void	*ptr,
+	uint32_t	size)
+{
+	struct xb_key	key = {
+		.xb_offset = blob->last_offset,
+		.xb_magic = XB_KEY_MAGIC,
+		.xb_size = size,
+	};
+	loff_t		pos = blob->last_offset;
+	int		error;
+
+	error = xfile_obj_store(blob->xfile, &key, sizeof(key), pos);
+	if (error)
+		return error;
+
+	pos += sizeof(key);
+	error = xfile_obj_store(blob->xfile, ptr, size, pos);
+	if (error)
+		goto out_err;
+
+	*cookie = blob->last_offset;
+	blob->last_offset += sizeof(key) + size;
+	return 0;
+out_err:
+	xfile_discard(blob->xfile, blob->last_offset, sizeof(key));
+	return error;
+}
+
+/* Free a blob. */
+int
+xfblob_free(
+	struct xfblob	*blob,
+	xfblob_cookie	cookie)
+{
+	struct xb_key	key;
+	int		error;
+
+	error = xfile_obj_load(blob->xfile, &key, sizeof(key), cookie);
+	if (error)
+		return error;
+
+	if (key.xb_magic != XB_KEY_MAGIC || key.xb_offset != cookie) {
+		ASSERT(0);
+		return -ENODATA;
+	}
+
+	xfile_discard(blob->xfile, cookie, sizeof(key) + key.xb_size);
+	return 0;
+}
diff --git a/fs/xfs/scrub/xfblob.h b/fs/xfs/scrub/xfblob.h
new file mode 100644
index 000000000000..d1282810bb1d
--- /dev/null
+++ b/fs/xfs/scrub/xfblob.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_SCRUB_XFBLOB_H__
+#define __XFS_SCRUB_XFBLOB_H__
+
+struct xfblob {
+	struct xfile	*xfile;
+	loff_t		last_offset;
+};
+
+typedef loff_t		xfblob_cookie;
+
+int xfblob_create(struct xfs_mount *mp, const char *descr,
+		struct xfblob **blobp);
+void xfblob_destroy(struct xfblob *blob);
+int xfblob_load(struct xfblob *blob, xfblob_cookie cookie, void *ptr,
+		uint32_t size);
+int xfblob_store(struct xfblob *blob, xfblob_cookie *cookie, const void *ptr,
+		uint32_t size);
+int xfblob_free(struct xfblob *blob, xfblob_cookie cookie);
+
+#endif /* __XFS_SCRUB_XFBLOB_H__ */



* [PATCH 21/23] xfs: repair extended attributes
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (19 preceding siblings ...)
  2023-02-16 20:46   ` [PATCH 20/23] xfs: create a blob array data structure Darrick J. Wong
@ 2023-02-16 20:47   ` Darrick J. Wong
  2023-02-16 20:47   ` [PATCH 22/23] xfs: online repair of directories Darrick J. Wong
  2023-02-16 20:47   ` [PATCH 23/23] xfs: create an xattr iteration function for scrub Darrick J. Wong
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:47 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If the extended attributes look bad, try to sift through the rubble to
find whatever keys/values we can, stage a new attribute structure in a
temporary file and use the atomic extent swapping mechanism to commit
the results in bulk.
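
A toy model of the salvage, stage, and swap sequence, offered only as a
sketch.  The toy_* types are hypothetical, and "damage" is reduced to NULL
or empty names; the point is that the damaged structure is never edited in
place, and the file only sees the rebuilt one after a single exchange.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_ATTRS	8

/* Hypothetical attr fork: nothing more than a list of attribute names. */
struct toy_fork {
	const char	*names[TOY_MAX_ATTRS];
	size_t		nr;
};

struct toy_file {
	struct toy_fork	*attr_fork;
};

/* Treat NULL or empty names as damage; anything else is salvageable. */
bool
toy_name_ok(const char *name)
{
	return name != NULL && name[0] != '\0';
}

/* Copy every salvageable name from the damaged fork into the staging fork. */
size_t
toy_salvage(const struct toy_fork *bad, struct toy_fork *staged)
{
	size_t	i;

	staged->nr = 0;
	for (i = 0; i < bad->nr && i < TOY_MAX_ATTRS; i++)
		if (toy_name_ok(bad->names[i]))
			staged->names[staged->nr++] = bad->names[i];
	return staged->nr;
}

/* Commit in bulk: exchange forks so the file only ever sees a whole fork. */
void
toy_swap_forks(struct toy_file *file, struct toy_file *temp)
{
	struct toy_fork	*tmp;

	assert(file != NULL && temp != NULL);
	tmp = file->attr_fork;
	file->attr_fork = temp->attr_fork;
	temp->attr_fork = tmp;
}
```

After the swap the temporary file owns the damaged fork, so releasing the
temporary file is what disposes of the old blocks in this model.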

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/xfarray.c |   24 ++++++++++++++++++++++++
 fs/xfs/scrub/xfarray.h |    2 ++
 fs/xfs/scrub/xfblob.c  |   24 ++++++++++++++++++++++++
 fs/xfs/scrub/xfblob.h  |    2 ++
 4 files changed, 52 insertions(+)


diff --git a/fs/xfs/scrub/xfarray.c b/fs/xfs/scrub/xfarray.c
index 8fdd7dd40193..fccb3a3d9199 100644
--- a/fs/xfs/scrub/xfarray.c
+++ b/fs/xfs/scrub/xfarray.c
@@ -368,3 +368,27 @@ xfarray_load_next(
 	*idx = cur;
 	return 0;
 }
+
+/* How many bytes is this array consuming? */
+long long
+xfarray_bytes(
+	struct xfarray		*array)
+{
+	struct xfile_stat	statbuf;
+	int			error;
+
+	error = xfile_stat(array->xfile, &statbuf);
+	if (error)
+		return error;
+
+	return statbuf.bytes;
+}
+
+/* Empty the entire array. */
+void
+xfarray_truncate(
+	struct xfarray	*array)
+{
+	xfile_discard(array->xfile, 0, MAX_LFS_FILESIZE);
+	array->nr = 0;
+}
diff --git a/fs/xfs/scrub/xfarray.h b/fs/xfs/scrub/xfarray.h
index 26e2b594f121..8a3af0cecc3e 100644
--- a/fs/xfs/scrub/xfarray.h
+++ b/fs/xfs/scrub/xfarray.h
@@ -45,6 +45,8 @@ int xfarray_unset(struct xfarray *array, xfarray_idx_t idx);
 int xfarray_store(struct xfarray *array, xfarray_idx_t idx, const void *ptr);
 int xfarray_store_anywhere(struct xfarray *array, const void *ptr);
 bool xfarray_element_is_null(struct xfarray *array, const void *ptr);
+void xfarray_truncate(struct xfarray *array);
+long long xfarray_bytes(struct xfarray *array);
 
 /* Append an element to the array. */
 static inline int xfarray_append(struct xfarray *array, const void *ptr)
diff --git a/fs/xfs/scrub/xfblob.c b/fs/xfs/scrub/xfblob.c
index 1f89d7d13c59..2f89617a2db8 100644
--- a/fs/xfs/scrub/xfblob.c
+++ b/fs/xfs/scrub/xfblob.c
@@ -150,3 +150,27 @@ xfblob_free(
 	xfile_discard(blob->xfile, cookie, sizeof(key) + key.xb_size);
 	return 0;
 }
+
+/* How many bytes is this blob storage object consuming? */
+long long
+xfblob_bytes(
+	struct xfblob		*blob)
+{
+	struct xfile_stat	statbuf;
+	int			error;
+
+	error = xfile_stat(blob->xfile, &statbuf);
+	if (error)
+		return error;
+
+	return statbuf.bytes;
+}
+
+/* Drop all the blobs. */
+void
+xfblob_truncate(
+	struct xfblob	*blob)
+{
+	xfile_discard(blob->xfile, 0, MAX_LFS_FILESIZE);
+	blob->last_offset = PAGE_SIZE;
+}
diff --git a/fs/xfs/scrub/xfblob.h b/fs/xfs/scrub/xfblob.h
index d1282810bb1d..8a5738e1d568 100644
--- a/fs/xfs/scrub/xfblob.h
+++ b/fs/xfs/scrub/xfblob.h
@@ -21,5 +21,7 @@ int xfblob_load(struct xfblob *blob, xfblob_cookie cookie, void *ptr,
 int xfblob_store(struct xfblob *blob, xfblob_cookie *cookie, const void *ptr,
 		uint32_t size);
 int xfblob_free(struct xfblob *blob, xfblob_cookie cookie);
+long long xfblob_bytes(struct xfblob *blob);
+void xfblob_truncate(struct xfblob *blob);
 
 #endif /* __XFS_SCRUB_XFBLOB_H__ */



* [PATCH 22/23] xfs: online repair of directories
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (20 preceding siblings ...)
  2023-02-16 20:47   ` [PATCH 21/23] xfs: repair extended attributes Darrick J. Wong
@ 2023-02-16 20:47   ` Darrick J. Wong
  2023-02-16 20:47   ` [PATCH 23/23] xfs: create an xattr iteration function for scrub Darrick J. Wong
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:47 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If a directory looks like it's in bad shape, try to sift through the
rubble to find whatever directory entries we can, scan the directory
tree for the parent (if needed), stage the new directory contents in a
temporary file and use the atomic extent swapping mechanism to commit
the results in bulk.  As a side effect of this patch, directory
inactivation will be able to purge any leftover dir blocks.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/tempfile.c |   13 +++++++++++++
 fs/xfs/scrub/tempfile.h |    2 ++
 2 files changed, 15 insertions(+)


diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c
index 8f80f1c2555c..91875d4bb67f 100644
--- a/fs/xfs/scrub/tempfile.c
+++ b/fs/xfs/scrub/tempfile.c
@@ -228,3 +228,16 @@ xrep_tempfile_rele(
 	xchk_irele(sc, sc->tempip);
 	sc->tempip = NULL;
 }
+
+/* Decide if a given XFS inode is a temporary file for a repair. */
+bool
+xrep_is_tempfile(
+	const struct xfs_inode	*ip)
+{
+	const struct inode	*inode = &ip->i_vnode;
+
+	if (IS_PRIVATE(inode) && !(inode->i_opflags & IOP_XATTR))
+		return true;
+
+	return false;
+}
diff --git a/fs/xfs/scrub/tempfile.h b/fs/xfs/scrub/tempfile.h
index f00a9ce43a32..e2f493b5d3d9 100644
--- a/fs/xfs/scrub/tempfile.h
+++ b/fs/xfs/scrub/tempfile.h
@@ -16,11 +16,13 @@ void xrep_tempfile_iounlock(struct xfs_scrub *sc);
 void xrep_tempfile_ilock(struct xfs_scrub *sc);
 bool xrep_tempfile_ilock_nowait(struct xfs_scrub *sc);
 void xrep_tempfile_iunlock(struct xfs_scrub *sc);
+bool xrep_is_tempfile(const struct xfs_inode *ip);
 #else
 static inline void xrep_tempfile_iolock_both(struct xfs_scrub *sc)
 {
 	xchk_ilock(sc, XFS_IOLOCK_EXCL);
 }
+# define xrep_is_tempfile(ip)		(false)
 # define xrep_tempfile_rele(sc)
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 



* [PATCH 23/23] xfs: create an xattr iteration function for scrub
  2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
                     ` (21 preceding siblings ...)
  2023-02-16 20:47   ` [PATCH 22/23] xfs: online repair of directories Darrick J. Wong
@ 2023-02-16 20:47   ` Darrick J. Wong
  22 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:47 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a streamlined function to walk a file's xattrs, without all the
cursor management stuff in the regular listxattr.
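
A minimal userspace sketch of a cursor-free walk with loop detection,
assuming a small in-memory table of leaf blocks.  The toy_* names and
TOY_ECORRUPT are hypothetical stand-ins; the shape (call a function for
every entry, follow the forward sibling pointer, and fail if a block shows
up twice) is the part being illustrated.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_LEAVES	64
#define TOY_ECORRUPT	117	/* stand-in error code for this sketch */

struct toy_attr {
	const char	*name;
	const char	*value;
};

/* One leaf block: some entries plus the block number of the right sibling. */
struct toy_leaf {
	const struct toy_attr	*attrs;
	unsigned int		count;
	uint32_t		forw;	/* 0 means no right sibling */
};

typedef int (*toy_attr_fn)(const struct toy_attr *attr, void *priv);

/* Call fn for every attr, following sibling pointers from leaf 'first'. */
int
toy_attr_walk(const struct toy_leaf *leaves, uint32_t nr_leaves,
		uint32_t first, toy_attr_fn fn, void *priv)
{
	uint64_t	seen = 0;	/* one bit per leaf already visited */
	uint32_t	blkno = first;
	unsigned int	i;
	int		error;

	assert(fn != NULL);
	if (nr_leaves > TOY_MAX_LEAVES)
		return -EINVAL;

	for (;;) {
		/* A pointer off the end or back to a seen block is corruption. */
		if (blkno >= nr_leaves || (seen & (1ULL << blkno)))
			return -TOY_ECORRUPT;
		seen |= 1ULL << blkno;

		for (i = 0; i < leaves[blkno].count; i++) {
			error = fn(&leaves[blkno].attrs[i], priv);
			if (error)
				return error;
		}

		if (leaves[blkno].forw == 0)
			return 0;
		blkno = leaves[blkno].forw;
	}
}

/* Example callback: count the attributes that the walk visits. */
int
toy_count_attrs(const struct toy_attr *attr, void *priv)
{
	(void)attr;
	(*(unsigned int *)priv)++;
	return 0;
}
```

Because the walk keeps no resumable cursor, a nonzero return from the
callback simply aborts the whole walk and is passed back to the caller.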

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile          |    1 
 fs/xfs/scrub/listxattr.c |  314 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/listxattr.h |   17 ++
 3 files changed, 332 insertions(+)
 create mode 100644 fs/xfs/scrub/listxattr.c
 create mode 100644 fs/xfs/scrub/listxattr.h


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index f2f3ab589c04..6a30b145491d 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -160,6 +160,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   ialloc.o \
 				   inode.o \
 				   iscan.o \
+				   listxattr.o \
 				   parent.o \
 				   readdir.o \
 				   refcount.o \
diff --git a/fs/xfs/scrub/listxattr.c b/fs/xfs/scrub/listxattr.c
new file mode 100644
index 000000000000..94a76dee8a0a
--- /dev/null
+++ b/fs/xfs/scrub/listxattr.c
@@ -0,0 +1,314 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr.h"
+#include "xfs_attr_leaf.h"
+#include "xfs_attr_sf.h"
+#include "xfs_trans.h"
+#include "scrub/scrub.h"
+#include "scrub/bitmap.h"
+#include "scrub/listxattr.h"
+
+/* Call a function for every entry in a shortform xattr structure. */
+STATIC int
+xchk_xattr_walk_sf(
+	struct xfs_scrub		*sc,
+	struct xfs_inode		*ip,
+	xchk_xattr_fn			attr_fn,
+	void				*priv)
+{
+	struct xfs_attr_shortform	*sf;
+	struct xfs_attr_sf_entry	*sfe;
+	unsigned int			i;
+	int				error;
+
+	sf = (struct xfs_attr_shortform *)ip->i_af.if_u1.if_data;
+	for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
+		error = attr_fn(sc, ip, sfe->flags, sfe->nameval, sfe->namelen,
+				&sfe->nameval[sfe->namelen], sfe->valuelen,
+				priv);
+		if (error)
+			return error;
+
+		sfe = xfs_attr_sf_nextentry(sfe);
+	}
+
+	return 0;
+}
+
+/* Call a function for every entry in this xattr leaf block. */
+STATIC int
+xchk_xattr_walk_leaf_entries(
+	struct xfs_scrub		*sc,
+	struct xfs_inode		*ip,
+	xchk_xattr_fn			attr_fn,
+	struct xfs_buf			*bp,
+	void				*priv)
+{
+	struct xfs_attr3_icleaf_hdr	ichdr;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_attr_leafblock	*leaf = bp->b_addr;
+	struct xfs_attr_leaf_entry	*entry;
+	unsigned int			i;
+	int				error;
+
+	xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &ichdr, leaf);
+	entry = xfs_attr3_leaf_entryp(leaf);
+
+	for (i = 0; i < ichdr.count; entry++, i++) {
+		void			*value;
+		unsigned char		*name;
+		unsigned int		namelen, valuelen;
+
+		if (entry->flags & XFS_ATTR_LOCAL) {
+			struct xfs_attr_leaf_name_local		*name_loc;
+
+			name_loc = xfs_attr3_leaf_name_local(leaf, i);
+			name = name_loc->nameval;
+			namelen = name_loc->namelen;
+			value = &name_loc->nameval[name_loc->namelen];
+			valuelen = be16_to_cpu(name_loc->valuelen);
+		} else {
+			struct xfs_attr_leaf_name_remote	*name_rmt;
+
+			name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
+			name = name_rmt->name;
+			namelen = name_rmt->namelen;
+			value = NULL;
+			valuelen = be32_to_cpu(name_rmt->valuelen);
+		}
+
+		error = attr_fn(sc, ip, entry->flags, name, namelen, value,
+				valuelen, priv);
+		if (error)
+			return error;
+
+	}
+
+	return 0;
+}
+
+/*
+ * Call a function for every entry in a leaf-format xattr structure.  Avoid
+ * memory allocations for the loop detector since there's only one block.
+ */
+STATIC int
+xchk_xattr_walk_leaf(
+	struct xfs_scrub		*sc,
+	struct xfs_inode		*ip,
+	xchk_xattr_fn			attr_fn,
+	void				*priv)
+{
+	struct xfs_buf			*leaf_bp;
+	int				error;
+
+	error = xfs_attr3_leaf_read(sc->tp, ip, 0, &leaf_bp);
+	if (error)
+		return error;
+
+	error = xchk_xattr_walk_leaf_entries(sc, ip, attr_fn, leaf_bp, priv);
+	xfs_trans_brelse(sc->tp, leaf_bp);
+	return error;
+}
+
+/* Find the leftmost leaf in the xattr dabtree. */
+STATIC int
+xchk_xattr_find_leftmost_leaf(
+	struct xfs_scrub		*sc,
+	struct xfs_inode		*ip,
+	struct xbitmap			*seen_blocks,
+	struct xfs_buf			**leaf_bpp)
+{
+	struct xfs_da3_icnode_hdr	nodehdr;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_trans		*tp = sc->tp;
+	struct xfs_da_intnode		*node;
+	struct xfs_da_node_entry	*btree;
+	struct xfs_buf			*bp;
+	//xfs_failaddr_t			fa;
+	xfs_dablk_t			blkno = 0;
+	unsigned int			expected_level = 0;
+	int				error;
+
+	for (;;) {
+		uint64_t		len;
+		uint16_t		magic;
+
+		error = xfs_da3_node_read(tp, ip, blkno, &bp, XFS_ATTR_FORK);
+		if (error)
+			return error;
+
+		node = bp->b_addr;
+		magic = be16_to_cpu(node->hdr.info.magic);
+		if (magic == XFS_ATTR_LEAF_MAGIC ||
+		    magic == XFS_ATTR3_LEAF_MAGIC)
+			break;
+
+		error = -EFSCORRUPTED;
+		if (magic != XFS_DA_NODE_MAGIC &&
+		    magic != XFS_DA3_NODE_MAGIC)
+			goto out_buf;
+
+#if 0
+		fa = xfs_da3_node_header_check(bp, ip->i_ino);
+		if (fa)
+			goto out_buf;
+#endif
+
+		xfs_da3_node_hdr_from_disk(mp, &nodehdr, node);
+
+		if (nodehdr.count == 0 || nodehdr.level >= XFS_DA_NODE_MAXDEPTH)
+			goto out_buf;
+
+		/* Check the level from the root node. */
+		if (blkno == 0)
+			expected_level = nodehdr.level - 1;
+		else if (expected_level != nodehdr.level)
+			goto out_buf;
+		else
+			expected_level--;
+
+		/* Remember that we've seen this node. */
+		error = xbitmap_set(seen_blocks, blkno, 1);
+		if (error)
+			goto out_buf;
+
+		/* Find the next level towards the leaves of the dabtree. */
+		btree = nodehdr.btree;
+		blkno = be32_to_cpu(btree->before);
+		xfs_trans_brelse(tp, bp);
+
+		/* Make sure we haven't seen this new block already. */
+		len = 1;
+		if (xbitmap_test(seen_blocks, blkno, &len))
+			return -EFSCORRUPTED;
+	}
+
+	error = -EFSCORRUPTED;
+#if 0
+	fa = xfs_attr3_leaf_header_check(bp, ip->i_ino);
+	if (fa)
+		goto out_buf;
+#endif
+
+	if (expected_level != 0)
+		goto out_buf;
+
+	/* Remember that we've seen this leaf. */
+	error = xbitmap_set(seen_blocks, blkno, 1);
+	if (error)
+		goto out_buf;
+
+	*leaf_bpp = bp;
+	return 0;
+
+out_buf:
+	xfs_trans_brelse(tp, bp);
+	return error;
+}
+
+/* Call a function for every entry in a node-format xattr structure. */
+STATIC int
+xchk_xattr_walk_node(
+	struct xfs_scrub		*sc,
+	struct xfs_inode		*ip,
+	xchk_xattr_fn			attr_fn,
+	void				*priv)
+{
+	struct xfs_attr3_icleaf_hdr	leafhdr;
+	struct xbitmap			seen_blocks;
+	struct xfs_mount		*mp = sc->mp;
+	struct xfs_attr_leafblock	*leaf;
+	struct xfs_buf			*leaf_bp;
+	int				error;
+
+	xbitmap_init(&seen_blocks);
+
+	error = xchk_xattr_find_leftmost_leaf(sc, ip, &seen_blocks, &leaf_bp);
+	if (error)
+		goto out_bitmap;
+
+	for (;;) {
+		uint64_t	len = 1;
+
+		error = xchk_xattr_walk_leaf_entries(sc, ip, attr_fn, leaf_bp,
+				priv);
+		if (error)
+			goto out_leaf;
+
+		/* Find the right sibling of this leaf block. */
+		leaf = leaf_bp->b_addr;
+		xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
+		if (leafhdr.forw == 0)
+			goto out_leaf;
+
+		xfs_trans_brelse(sc->tp, leaf_bp);
+
+		/* Make sure we haven't seen this new leaf already. */
+		error = -EFSCORRUPTED;
+		if (xbitmap_test(&seen_blocks, leafhdr.forw, &len))
+			goto out_bitmap;
+
+		error = xfs_attr3_leaf_read(sc->tp, ip,
+				leafhdr.forw, &leaf_bp);
+		if (error)
+			goto out_bitmap;
+
+		/* Remember that we've seen this new leaf. */
+		error = xbitmap_set(&seen_blocks, leafhdr.forw, 1);
+		if (error)
+			goto out_leaf;
+	}
+
+out_leaf:
+	xfs_trans_brelse(sc->tp, leaf_bp);
+out_bitmap:
+	xbitmap_destroy(&seen_blocks);
+	return error;
+}
+
+/*
+ * Call a function for every extended attribute in a file.
+ *
+ * Callers must hold the ILOCK.  No validation or cursor restarts allowed.
+ * Returns -EFSCORRUPTED on any problem, including loops in the dabtree.
+ */
+int
+xchk_xattr_walk(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip,
+	xchk_xattr_fn		attr_fn,
+	void			*priv)
+{
+	int			error;
+
+	ASSERT(xfs_isilocked(ip, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
+
+	if (!xfs_inode_hasattr(ip))
+		return 0;
+
+	if (ip->i_af.if_format == XFS_DINODE_FMT_LOCAL)
+		return xchk_xattr_walk_sf(sc, ip, attr_fn, priv);
+
+	/* attr functions require that the attr fork is loaded */
+	error = xfs_iread_extents(sc->tp, ip, XFS_ATTR_FORK);
+	if (error)
+		return error;
+
+	if (xfs_attr_is_leaf(ip))
+		return xchk_xattr_walk_leaf(sc, ip, attr_fn, priv);
+
+	return xchk_xattr_walk_node(sc, ip, attr_fn, priv);
+}
diff --git a/fs/xfs/scrub/listxattr.h b/fs/xfs/scrub/listxattr.h
new file mode 100644
index 000000000000..97af8ca23324
--- /dev/null
+++ b/fs/xfs/scrub/listxattr.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_SCRUB_LISTXATTR_H__
+#define __XFS_SCRUB_LISTXATTR_H__
+
+typedef int (*xchk_xattr_fn)(struct xfs_scrub *sc, struct xfs_inode *ip,
+		unsigned int attr_flags, const unsigned char *name,
+		unsigned int namelen, const void *value, unsigned int valuelen,
+		void *priv);
+
+int xchk_xattr_walk(struct xfs_scrub *sc, struct xfs_inode *ip,
+		xchk_xattr_fn attr_fn, void *priv);
+
+#endif /* __XFS_SCRUB_LISTXATTR_H__ */



* [PATCH 1/7] xfs: pass directory offsets as part of the dirent hook data
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/7] xfs: online repair of directories Darrick J. Wong
@ 2023-02-16 20:48   ` Darrick J. Wong
  2023-02-16 20:48   ` [PATCH 2/7] xfs: pass diroffset back from xchk_dir_lookup Darrick J. Wong
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:48 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

When we're calling the dirent hooks about a directory entry update, be
sure to pass the diroffset associated with the change.  We're going to
need this in the next patch.
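
A minimal userspace sketch of a dirent update hook that carries the
directory offset.  Everything here is hypothetical (the toy_* names, the
single-listener hook, the fixed-size shadow table); it only shows one
plausible reason a listener would want the offset: it can track which
child occupies which slot without rereading the directory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_SHADOW_SLOTS	16

/* Hypothetical hook payload: who changed, how, and at which offset. */
struct toy_dirent_update {
	uint64_t	dir_ino;
	uint64_t	child_ino;
	int		delta;		/* +1 link added, -1 link removed */
	const char	*name;
	unsigned int	diroffset;	/* position of the entry in the dir */
};

typedef void (*toy_dirent_hook_fn)(const struct toy_dirent_update *p,
		void *priv);

struct toy_dirent_hook {
	toy_dirent_hook_fn	fn;
	void			*priv;
	bool			enabled;	/* models the hook switch */
};

/* Notify the listener, if any, about one directory entry change. */
void
toy_dirent_child_delta(struct toy_dirent_hook *hook, uint64_t dir_ino,
		uint64_t child_ino, int delta, const char *name,
		unsigned int diroffset)
{
	struct toy_dirent_update	p = {
		.dir_ino	= dir_ino,
		.child_ino	= child_ino,
		.delta		= delta,
		.name		= name,
		.diroffset	= diroffset,
	};

	if (!hook->enabled)
		return;
	assert(hook->fn != NULL);
	hook->fn(&p, hook->priv);
}

/* Example listener state: which child inode sits at each offset. */
struct toy_shadow_dir {
	uint64_t	child_at[TOY_SHADOW_SLOTS];
};

/* Example listener: keep the shadow table in sync with live updates. */
void
toy_shadow_update(const struct toy_dirent_update *p, void *priv)
{
	struct toy_shadow_dir	*sd = priv;

	if (p->diroffset >= TOY_SHADOW_SLOTS)
		return;
	if (p->delta > 0)
		sd->child_at[p->diroffset] = p->child_ino;
	else if (sd->child_at[p->diroffset] == p->child_ino)
		sd->child_at[p->diroffset] = 0;
}
```

A rename that replaces an existing entry shows up here as a removal
followed by an addition at the same offset, which matches how the rename
hooks in this patch pass the same diroffset to both calls.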

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c   |   40 +++++++++++++++++++++++++---------------
 fs/xfs/xfs_inode.h   |    5 +++--
 fs/xfs/xfs_symlink.c |    2 +-
 3 files changed, 29 insertions(+), 18 deletions(-)


diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 8ad646beee75..ce1f6d03c3a9 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1042,7 +1042,8 @@ xfs_dirent_child_delta(
 	struct xfs_inode		*dp,
 	struct xfs_inode		*ip,
 	int				delta,
-	struct xfs_name			*name)
+	struct xfs_name			*name,
+	unsigned int			diroffset)
 {
 	if (xfs_hooks_switched_on(&xfs_dirents_hooks_switch)) {
 		struct xfs_dirent_update_params	p = {
@@ -1050,6 +1051,7 @@ xfs_dirent_child_delta(
 			.ip		= ip,
 			.delta		= delta,
 			.name		= name,
+			.diroffset	= diroffset,
 		};
 		struct xfs_mount	*mp = ip->i_mount;
 
@@ -1210,7 +1212,7 @@ xfs_create(
 	 * Create ip with a reference from dp, and add '.' and '..' references
 	 * if it's a directory.
 	 */
-	xfs_dirent_child_delta(dp, ip, 1, name);
+	xfs_dirent_child_delta(dp, ip, 1, name, diroffset);
 	if (is_dir) {
 		xfs_dirent_self_delta(ip, 1);
 		xfs_dirent_backref_delta(dp, ip, 1);
@@ -1481,7 +1483,7 @@ xfs_link(
 			goto error_return;
 	}
 
-	xfs_dirent_child_delta(tdp, sip, 1, target_name);
+	xfs_dirent_child_delta(tdp, sip, 1, target_name, diroffset);
 
 	/*
 	 * If this is a synchronous mount, make sure that the
@@ -2757,7 +2759,7 @@ xfs_remove(
 	 * Drop the link from dp to ip, and if ip was a directory, remove the
 	 * '.' and '..' references since we freed the directory.
 	 */
-	xfs_dirent_child_delta(dp, ip, -1, name);
+	xfs_dirent_child_delta(dp, ip, -1, name, dir_offset);
 	if (S_ISDIR(VFS_I(ip)->i_mode)) {
 		xfs_dirent_backref_delta(dp, ip, -1);
 		xfs_dirent_self_delta(ip, -1);
@@ -2873,18 +2875,22 @@ static inline void
 xfs_exchange_call_nlink_hooks(
 	struct xfs_inode	*src_dp,
 	struct xfs_name		*src_name,
+	xfs_dir2_dataptr_t	src_diroffset,
 	struct xfs_inode	*src_ip,
 	struct xfs_inode	*target_dp,
 	struct xfs_name		*target_name,
+	xfs_dir2_dataptr_t	target_diroffset,
 	struct xfs_inode	*target_ip)
 {
 	/* Exchange files in the source directory. */
-	xfs_dirent_child_delta(src_dp, src_ip, -1, src_name);
-	xfs_dirent_child_delta(src_dp, target_ip, 1, src_name);
+	xfs_dirent_child_delta(src_dp, src_ip, -1, src_name, src_diroffset);
+	xfs_dirent_child_delta(src_dp, target_ip, 1, src_name, src_diroffset);
 
 	/* Exchange files in the target directory. */
-	xfs_dirent_child_delta(target_dp, target_ip, -1, target_name);
-	xfs_dirent_child_delta(target_dp, src_ip, 1, target_name);
+	xfs_dirent_child_delta(target_dp, target_ip, -1, target_name,
+			target_diroffset);
+	xfs_dirent_child_delta(target_dp, src_ip, 1, target_name,
+			target_diroffset);
 
 	/* If the source file is a dir, update its dotdot entry. */
 	if (S_ISDIR(VFS_I(src_ip)->i_mode)) {
@@ -2903,9 +2909,11 @@ static inline void
 xfs_rename_call_nlink_hooks(
 	struct xfs_inode	*src_dp,
 	struct xfs_name		*src_name,
+	xfs_dir2_dataptr_t	src_diroffset,
 	struct xfs_inode	*src_ip,
 	struct xfs_inode	*target_dp,
 	struct xfs_name		*target_name,
+	xfs_dir2_dataptr_t	target_diroffset,
 	struct xfs_inode	*target_ip,
 	struct xfs_inode	*wip)
 {
@@ -2914,16 +2922,16 @@ xfs_rename_call_nlink_hooks(
 	 * move the source file to the target directory.
 	 */
 	if (target_ip)
-		xfs_dirent_child_delta(target_dp, target_ip, -1, target_name);
-	xfs_dirent_child_delta(target_dp, src_ip, 1, target_name);
+		xfs_dirent_child_delta(target_dp, target_ip, -1, target_name, target_diroffset);
+	xfs_dirent_child_delta(target_dp, src_ip, 1, target_name, target_diroffset);
 
 	/*
 	 * Remove the source file from the source directory, and possibly move
 	 * the whiteout file into its place.
 	 */
-	xfs_dirent_child_delta(src_dp, src_ip, -1, src_name);
+	xfs_dirent_child_delta(src_dp, src_ip, -1, src_name, src_diroffset);
 	if (wip)
-		xfs_dirent_child_delta(src_dp, wip, 1, src_name);
+		xfs_dirent_child_delta(src_dp, wip, 1, src_name, src_diroffset);
 
 	/* If the source file is a dir, update its dotdot entry. */
 	if (S_ISDIR(VFS_I(src_ip)->i_mode)) {
@@ -3080,7 +3088,8 @@ xfs_cross_rename(
 	xfs_trans_log_inode(tp, dp1, XFS_ILOG_CORE);
 
 	if (xfs_hooks_switched_on(&xfs_dirents_hooks_switch))
-		xfs_exchange_call_nlink_hooks(dp1, name1, ip1, dp2, name2, ip2);
+		xfs_exchange_call_nlink_hooks(dp1, name1, old_diroffset, ip1,
+				dp2, name2, new_diroffset, ip2);
 
 	return xfs_finish_rename(tp);
 
@@ -3560,8 +3569,9 @@ xfs_rename(
 		xfs_trans_log_inode(tp, target_dp, XFS_ILOG_CORE);
 
 	if (xfs_hooks_switched_on(&xfs_dirents_hooks_switch))
-		xfs_rename_call_nlink_hooks(src_dp, src_name, src_ip,
-				target_dp, target_name, target_ip, wip);
+		xfs_rename_call_nlink_hooks(src_dp, src_name, old_diroffset,
+				src_ip, target_dp, target_name, new_diroffset,
+				target_ip, wip);
 
 	error = xfs_finish_rename(tp);
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 94a1490fb7b0..403b0f4cb5c0 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -592,12 +592,13 @@ struct xfs_dirent_update_params {
 	const struct xfs_inode	*dp;
 	const struct xfs_inode	*ip;
 	const struct xfs_name	*name;
+	unsigned int		diroffset;
 	int			delta;
 };
 
 #ifdef CONFIG_XFS_LIVE_HOOKS
 void xfs_dirent_child_delta(struct xfs_inode *dp, struct xfs_inode *ip,
-		int delta, struct xfs_name *name);
+		int delta, struct xfs_name *name, unsigned int diroffset);
 
 struct xfs_dirent_hook {
 	struct xfs_hook		delta_hook;
@@ -610,7 +611,7 @@ int xfs_dirent_hook_add(struct xfs_mount *mp, struct xfs_dirent_hook *hook);
 void xfs_dirent_hook_del(struct xfs_mount *mp, struct xfs_dirent_hook *hook);
 
 #else
-# define xfs_dirent_child_delta(dp, ip, delta, name)	((void)0)
+# define xfs_dirent_child_delta(dp, ip, delta, name, doff)	((void)0)
 #endif /* CONFIG_XFS_LIVE_HOOKS */
 
 #endif	/* __XFS_INODE_H__ */
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 77427a50a760..fdfaab466f5d 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -354,7 +354,7 @@ xfs_symlink(
 			goto out_trans_cancel;
 	}
 
-	xfs_dirent_child_delta(dp, ip, 1, link_name);
+	xfs_dirent_child_delta(dp, ip, 1, link_name, diroffset);
 
 	/*
 	 * If this is a synchronous mount, make sure that the



* [PATCH 2/7] xfs: pass diroffset back from xchk_dir_lookup
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/7] xfs: online repair of directories Darrick J. Wong
  2023-02-16 20:48   ` [PATCH 1/7] xfs: pass directory offsets as part of the dirent hook data Darrick J. Wong
@ 2023-02-16 20:48   ` Darrick J. Wong
  2023-02-16 20:48   ` [PATCH 3/7] xfs: shorten parent pointer function names Darrick J. Wong
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:48 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

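Pass the directory offset of the entry back from xchk_dir_lookup so that
scrub code can compare it against other metadata.  Callers that don't
need the offset can pass NULL for the new diroffsetp argument.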

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/dir.c     |    2 +-
 fs/xfs/scrub/readdir.c |   12 ++++++++++--
 fs/xfs/scrub/readdir.h |    3 ++-
 3 files changed, 13 insertions(+), 4 deletions(-)
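
For anyone skimming the diff below, here is a minimal userspace sketch of
the optional out-parameter convention that the patch applies to
xchk_dir_lookup.  It is not kernel code: the demo_* names, the flat array
standing in for a directory, and the value chosen for DEMO_NULL_DATAPTR
are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
typedef uint64_t demo_ino_t;
typedef uint32_t demo_dataptr_t;
#define DEMO_NULL_DATAPTR	((demo_dataptr_t)0)

struct demo_dirent {
	const char	*name;
	demo_ino_t	ino;
	demo_dataptr_t	offset;
};

/*
 * Look up @name in a flat array of entries.  @ino is mandatory; @offsetp is
 * optional and may be NULL for callers that only want the inode number.
 */
static int
demo_dir_lookup(
	const struct demo_dirent	*ents,
	size_t				nr,
	const char			*name,
	demo_ino_t			*ino,
	demo_dataptr_t			*offsetp)
{
	size_t				i;

	assert(ino != NULL);

	/* Give the caller a well-defined value even if the lookup fails. */
	if (offsetp)
		*offsetp = DEMO_NULL_DATAPTR;

	for (i = 0; i < nr; i++) {
		if (strcmp(ents[i].name, name) != 0)
			continue;

		/* Only write the outputs once the lookup has succeeded. */
		*ino = ents[i].ino;
		if (offsetp)
			*offsetp = ents[i].offset;
		return 0;
	}
	return -ENOENT;
}
```

The point of resetting the optional output up front is that a caller never
sees a stale offset after a failed lookup, while callers such as
xchk_dir_actor can keep passing NULL and ignore the offset entirely.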


diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 46080134b408..06783e4b95ad 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -105,7 +105,7 @@ xchk_dir_actor(
 	}
 
 	/* Verify that we can look up this name by hash. */
-	error = xchk_dir_lookup(sc, dp, name, &lookup_ino);
+	error = xchk_dir_lookup(sc, dp, name, &lookup_ino, NULL);
 	/* ENOENT means the hash lookup failed and the dir is corrupt */
 	if (error == -ENOENT)
 		error = -EFSCORRUPTED;
diff --git a/fs/xfs/scrub/readdir.c b/fs/xfs/scrub/readdir.c
index 7d1695e98cc6..0a53438975c3 100644
--- a/fs/xfs/scrub/readdir.c
+++ b/fs/xfs/scrub/readdir.c
@@ -314,7 +314,8 @@ xchk_dir_lookup(
 	struct xfs_scrub	*sc,
 	struct xfs_inode	*dp,
 	const struct xfs_name	*name,
-	xfs_ino_t		*ino)
+	xfs_ino_t		*ino,
+	xfs_dir2_dataptr_t	*diroffsetp)
 {
 	struct xfs_da_args	args = {
 		.dp		= dp,
@@ -326,10 +327,14 @@ xchk_dir_lookup(
 		.hashval	= xfs_dir2_hashname(dp->i_mount, name),
 		.whichfork	= XFS_DATA_FORK,
 		.op_flags	= XFS_DA_OP_OKNOENT,
+		.offset		= XFS_DIR2_NULL_DATAPTR,
 	};
 	bool			isblock, isleaf;
 	int			error;
 
+	if (diroffsetp)
+		*diroffsetp = XFS_DIR2_NULL_DATAPTR;
+
 	if (xfs_is_shutdown(dp->i_mount))
 		return -EIO;
 
@@ -369,7 +374,10 @@ xchk_dir_lookup(
 out_check_rval:
 	if (error == -EEXIST)
 		error = 0;
-	if (!error)
+	if (!error) {
 		*ino = args.inumber;
+		if (diroffsetp)
+			*diroffsetp = args.offset;
+	}
 	return error;
 }
diff --git a/fs/xfs/scrub/readdir.h b/fs/xfs/scrub/readdir.h
index 7272f3bd28b4..1a18bb59adb2 100644
--- a/fs/xfs/scrub/readdir.h
+++ b/fs/xfs/scrub/readdir.h
@@ -14,6 +14,7 @@ int xchk_dir_walk(struct xfs_scrub *sc, struct xfs_inode *dp,
 		xchk_dirent_fn dirent_fn, void *priv);
 
 int xchk_dir_lookup(struct xfs_scrub *sc, struct xfs_inode *dp,
-		const struct xfs_name *name, xfs_ino_t *ino);
+		const struct xfs_name *name, xfs_ino_t *ino,
+		xfs_dir2_dataptr_t *diroffsetp);
 
 #endif /* __XFS_SCRUB_READDIR_H__ */



* [PATCH 3/7] xfs: shorten parent pointer function names
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/7] xfs: online repair of directories Darrick J. Wong
  2023-02-16 20:48   ` [PATCH 1/7] xfs: pass directory offsets as part of the dirent hook data Darrick J. Wong
  2023-02-16 20:48   ` [PATCH 2/7] xfs: pass diroffset back from xchk_dir_lookup Darrick J. Wong
@ 2023-02-16 20:48   ` Darrick J. Wong
  2023-02-16 20:48   ` [PATCH 4/7] xfs: rearrange bits of the parent pointer apis for fsck Darrick J. Wong
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:48 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Shorten the function names and add brief comments to each, outlining
what they're supposed to be doing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_parent.c |   18 ++++++++++++------
 fs/xfs/libxfs/xfs_parent.h |   24 ++++++++++++------------
 fs/xfs/xfs_inode.c         |   16 ++++++++--------
 fs/xfs/xfs_symlink.c       |    2 +-
 4 files changed, 33 insertions(+), 27 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 179b9bebaf25..ec2bff195773 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -135,6 +135,10 @@ xfs_parent_irec_from_disk(
 	memset(&irec->p_name[valuelen], 0, sizeof(irec->p_name) - valuelen);
 }
 
+/*
+ * Allocate memory to control a logged parent pointer update as part of a
+ * dirent operation.
+ */
 int
 __xfs_parent_init(
 	struct xfs_mount		*mp,
@@ -170,12 +174,13 @@ __xfs_parent_init(
 	return 0;
 }
 
+/* Add a parent pointer to reflect a dirent addition. */
 int
-xfs_parent_defer_add(
+xfs_parent_add(
 	struct xfs_trans	*tp,
 	struct xfs_parent_defer	*parent,
 	struct xfs_inode	*dp,
-	struct xfs_name		*parent_name,
+	const struct xfs_name	*parent_name,
 	xfs_dir2_dataptr_t	diroffset,
 	struct xfs_inode	*child)
 {
@@ -194,8 +199,9 @@ xfs_parent_defer_add(
 	return xfs_attr_defer_add(args);
 }
 
+/* Remove a parent pointer to reflect a dirent removal. */
 int
-xfs_parent_defer_remove(
+xfs_parent_remove(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*dp,
 	struct xfs_parent_defer	*parent,
@@ -211,14 +217,14 @@ xfs_parent_defer_remove(
 	return xfs_attr_defer_remove(args);
 }
 
-
+/* Replace one parent pointer with another to reflect a rename. */
 int
-xfs_parent_defer_replace(
+xfs_parent_replace(
 	struct xfs_trans	*tp,
 	struct xfs_parent_defer	*new_parent,
 	struct xfs_inode	*old_dp,
 	xfs_dir2_dataptr_t	old_diroffset,
-	struct xfs_name		*parent_name,
+	const struct xfs_name	*parent_name,
 	struct xfs_inode	*new_dp,
 	xfs_dir2_dataptr_t	new_diroffset,
 	struct xfs_inode	*child)
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index f4f5887d1133..35854e968f1d 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -49,8 +49,9 @@ struct xfs_parent_defer {
  * Parent pointer attribute prototypes
  */
 void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
-			      struct xfs_inode *ip,
-			      uint32_t p_diroffset);
+		struct xfs_inode *ip, uint32_t p_diroffset);
+void xfs_init_parent_name_irec(struct xfs_parent_name_irec *irec,
+			       struct xfs_parent_name_rec *rec);
 int __xfs_parent_init(struct xfs_mount *mp, bool grab_log,
 		struct xfs_parent_defer **parentp);
 
@@ -78,18 +79,17 @@ xfs_parent_start_locked(
 	return 0;
 }
 
-int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_parent_defer *parent,
-			 struct xfs_inode *dp, struct xfs_name *parent_name,
-			 xfs_dir2_dataptr_t diroffset, struct xfs_inode *child);
-int xfs_parent_defer_replace(struct xfs_trans *tp,
+int xfs_parent_add(struct xfs_trans *tp, struct xfs_parent_defer *parent,
+		struct xfs_inode *dp, const struct xfs_name *parent_name,
+		xfs_dir2_dataptr_t diroffset, struct xfs_inode *child);
+int xfs_parent_replace(struct xfs_trans *tp,
 		struct xfs_parent_defer *new_parent, struct xfs_inode *old_dp,
-		xfs_dir2_dataptr_t old_diroffset, struct xfs_name *parent_name,
-		struct xfs_inode *new_ip, xfs_dir2_dataptr_t new_diroffset,
+		xfs_dir2_dataptr_t old_diroffset,
+		const struct xfs_name *parent_name, struct xfs_inode *new_ip,
+		xfs_dir2_dataptr_t new_diroffset, struct xfs_inode *child);
+int xfs_parent_remove(struct xfs_trans *tp, struct xfs_inode *dp,
+		struct xfs_parent_defer *parent, xfs_dir2_dataptr_t diroffset,
 		struct xfs_inode *child);
-int xfs_parent_defer_remove(struct xfs_trans *tp, struct xfs_inode *dp,
-			    struct xfs_parent_defer *parent,
-			    xfs_dir2_dataptr_t diroffset,
-			    struct xfs_inode *child);
 
 void __xfs_parent_cancel(struct xfs_mount *mp, struct xfs_parent_defer *parent);
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index ce1f6d03c3a9..09b0ac6b99cb 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1202,7 +1202,7 @@ xfs_create(
 	 * the parent information now.
 	 */
 	if (parent) {
-		error = xfs_parent_defer_add(tp, parent, dp, name, diroffset,
+		error = xfs_parent_add(tp, parent, dp, name, diroffset,
 					     ip);
 		if (error)
 			goto out_trans_cancel;
@@ -1477,7 +1477,7 @@ xfs_link(
 	 * the parent to the inode.
 	 */
 	if (parent) {
-		error = xfs_parent_defer_add(tp, parent, tdp, target_name,
+		error = xfs_parent_add(tp, parent, tdp, target_name,
 					     diroffset, sip);
 		if (error)
 			goto error_return;
@@ -2750,7 +2750,7 @@ xfs_remove(
 	}
 
 	if (parent) {
-		error = xfs_parent_defer_remove(tp, dp, parent, dir_offset, ip);
+		error = xfs_parent_remove(tp, dp, parent, dir_offset, ip);
 		if (error)
 			goto out_trans_cancel;
 	}
@@ -3061,12 +3061,12 @@ xfs_cross_rename(
 	}
 
 	if (xfs_has_parent(mp)) {
-		error = xfs_parent_defer_replace(tp, ip1_pptr, dp1,
+		error = xfs_parent_replace(tp, ip1_pptr, dp1,
 				old_diroffset, name2, dp2, new_diroffset, ip1);
 		if (error)
 			goto out_trans_abort;
 
-		error = xfs_parent_defer_replace(tp, ip2_pptr, dp2,
+		error = xfs_parent_replace(tp, ip2_pptr, dp2,
 				new_diroffset, name1, dp1, old_diroffset, ip2);
 		if (error)
 			goto out_trans_abort;
@@ -3540,7 +3540,7 @@ xfs_rename(
 		goto out_trans_cancel;
 
 	if (wip_pptr) {
-		error = xfs_parent_defer_add(tp, wip_pptr,
+		error = xfs_parent_add(tp, wip_pptr,
 					     src_dp, src_name,
 					     old_diroffset, wip);
 		if (error)
@@ -3548,7 +3548,7 @@ xfs_rename(
 	}
 
 	if (src_ip_pptr) {
-		error = xfs_parent_defer_replace(tp, src_ip_pptr, src_dp,
+		error = xfs_parent_replace(tp, src_ip_pptr, src_dp,
 				old_diroffset, target_name, target_dp,
 				new_diroffset, src_ip);
 		if (error)
@@ -3556,7 +3556,7 @@ xfs_rename(
 	}
 
 	if (tgt_ip_pptr) {
-		error = xfs_parent_defer_remove(tp, target_dp,
+		error = xfs_parent_remove(tp, target_dp,
 						tgt_ip_pptr,
 						new_diroffset, target_ip);
 		if (error)
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index fdfaab466f5d..63e68e832551 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -348,7 +348,7 @@ xfs_symlink(
 	xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
 
 	if (parent) {
-		error = xfs_parent_defer_add(tp, parent, dp, link_name,
+		error = xfs_parent_add(tp, parent, dp, link_name,
 					     diroffset, ip);
 		if (error)
 			goto out_trans_cancel;



* [PATCH 4/7] xfs: rearrange bits of the parent pointer apis for fsck
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/7] xfs: online repair of directories Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 20:48   ` [PATCH 3/7] xfs: shorten parent pointer function names Darrick J. Wong
@ 2023-02-16 20:48   ` Darrick J. Wong
  2023-02-16 20:49   ` [PATCH 5/7] xfs: reconstruct directories from parent pointers Darrick J. Wong
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:48 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

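Rearrange parts of the parent pointer API in preparation for the fsck
code: add xfs_parent_irec_to_disk to convert an incore parent pointer
record to its ondisk format, make xfs_init_parent_name_rec static, and
define XFS_DIR3_FTYPE_STR for decoding directory file types.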

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_format.h |   11 +++++++++++
 fs/xfs/libxfs/xfs_parent.c    |   29 ++++++++++++++++++++++++++++-
 fs/xfs/libxfs/xfs_parent.h    |    6 ++----
 3 files changed, 41 insertions(+), 5 deletions(-)
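
To illustrate what the new xfs_parent_irec_to_disk helper does, here is a
minimal userspace sketch of an incore-to-ondisk conversion with optional
value outputs.  It is not the kernel implementation: the demo_* structures
are simplified assumptions, and the big-endian fields are written byte by
byte instead of using cpu_to_be64 and cpu_to_be32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_MAXNAMELEN	256

/* Hypothetical incore record, loosely modeled on xfs_parent_name_irec. */
struct demo_pptr_irec {
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint32_t	p_diroffset;
	uint8_t		p_namelen;
	unsigned char	p_name[DEMO_MAXNAMELEN];
};

/* Hypothetical ondisk record: big-endian fields kept as raw bytes. */
struct demo_pptr_rec {
	uint8_t		p_ino[8];
	uint8_t		p_gen[4];
	uint8_t		p_diroffset[4];
};

/* Store @v at @p, most significant byte first. */
static void put_be32(uint8_t *p, uint32_t v)
{
	int		i;

	for (i = 0; i < 4; i++)
		p[i] = (uint8_t)(v >> (24 - 8 * i));
}

static void put_be64(uint8_t *p, uint64_t v)
{
	int		i;

	for (i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/*
 * Convert an incore record to the ondisk form.  @value and @valuelen are
 * optional; a NULL pointer means that output is skipped.  On entry,
 * *valuelen is the size of the @value buffer; on exit, the name length.
 */
static void
demo_pptr_irec_to_disk(
	struct demo_pptr_rec		*rec,
	void				*value,
	int				*valuelen,
	const struct demo_pptr_irec	*irec)
{
	put_be64(rec->p_ino, irec->p_ino);
	put_be32(rec->p_gen, irec->p_gen);
	put_be32(rec->p_diroffset, irec->p_diroffset);

	if (valuelen) {
		assert(*valuelen >= irec->p_namelen);
		*valuelen = irec->p_namelen;
	}

	if (value)
		memcpy(value, irec->p_name, irec->p_namelen);
}
```

The attribute name carries the fixed-size (inode, generation, offset) key
and the attribute value carries the dirent name, so a caller that only
needs the key for a lookup can pass NULL for both value arguments.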


diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 2db1cf97b2c8..c07b8166e8ff 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -159,6 +159,17 @@ struct xfs_da3_intnode {
 
 #define XFS_DIR3_FT_MAX			9
 
+#define XFS_DIR3_FTYPE_STR \
+	{ XFS_DIR3_FT_UNKNOWN,	"unknown" }, \
+	{ XFS_DIR3_FT_REG_FILE,	"file" }, \
+	{ XFS_DIR3_FT_DIR,	"directory" }, \
+	{ XFS_DIR3_FT_CHRDEV,	"char" }, \
+	{ XFS_DIR3_FT_BLKDEV,	"block" }, \
+	{ XFS_DIR3_FT_FIFO,	"fifo" }, \
+	{ XFS_DIR3_FT_SOCK,	"sock" }, \
+	{ XFS_DIR3_FT_SYMLINK,	"symlink" }, \
+	{ XFS_DIR3_FT_WHT,	"whiteout" }
+
 /*
  * Byte offset in data block and shortform entry.
  */
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index ec2bff195773..fe6d4d1a7d57 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -91,7 +91,7 @@ xfs_parent_valuecheck(
 }
 
 /* Initializes a xfs_parent_name_rec to be stored as an attribute name */
-void
+static inline void
 xfs_init_parent_name_rec(
 	struct xfs_parent_name_rec	*rec,
 	struct xfs_inode		*ip,
@@ -135,6 +135,33 @@ xfs_parent_irec_from_disk(
 	memset(&irec->p_name[valuelen], 0, sizeof(irec->p_name) - valuelen);
 }
 
+/*
+ * Convert an incore parent_name record to its ondisk format.  If @value or
+ * @valuelen are NULL, they will not be written to.
+ */
+void
+xfs_parent_irec_to_disk(
+	struct xfs_parent_name_rec	*rec,
+	void				*value,
+	int				*valuelen,
+	const struct xfs_parent_name_irec *irec)
+{
+	rec->p_ino = cpu_to_be64(irec->p_ino);
+	rec->p_gen = cpu_to_be32(irec->p_gen);
+	rec->p_diroffset = cpu_to_be32(irec->p_diroffset);
+
+	if (valuelen) {
+		ASSERT(*valuelen > 0);
+		ASSERT(*valuelen >= irec->p_namelen);
+		ASSERT(*valuelen < MAXNAMELEN);
+
+		*valuelen = irec->p_namelen;
+	}
+
+	if (value)
+		memcpy(value, irec->p_name, irec->p_namelen);
+}
+
 /*
  * Allocate memory to control a logged parent pointer update as part of a
  * dirent operation.
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 35854e968f1d..4eb92fb4b11b 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -33,6 +33,8 @@ struct xfs_parent_name_irec {
 void xfs_parent_irec_from_disk(struct xfs_parent_name_irec *irec,
 		const struct xfs_parent_name_rec *rec,
 		const void *value, int valuelen);
+void xfs_parent_irec_to_disk(struct xfs_parent_name_rec *rec, void *value,
+		int *valuelen, const struct xfs_parent_name_irec *irec);
 
 /*
  * Dynamically allocd structure used to wrap the needed data to pass around
@@ -48,10 +50,6 @@ struct xfs_parent_defer {
 /*
  * Parent pointer attribute prototypes
  */
-void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
-		struct xfs_inode *ip, uint32_t p_diroffset);
-void xfs_init_parent_name_irec(struct xfs_parent_name_irec *irec,
-			       struct xfs_parent_name_rec *rec);
 int __xfs_parent_init(struct xfs_mount *mp, bool grab_log,
 		struct xfs_parent_defer **parentp);
 



* [PATCH 5/7] xfs: reconstruct directories from parent pointers
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/7] xfs: online repair of directories Darrick J. Wong
                     ` (3 preceding siblings ...)
  2023-02-16 20:48   ` [PATCH 4/7] xfs: rearrange bits of the parent pointer apis for fsck Darrick J. Wong
@ 2023-02-16 20:49   ` Darrick J. Wong
  2023-02-16 20:49   ` [PATCH 6/7] xfs: add hooks to do directory updates Darrick J. Wong
  2023-02-16 20:49   ` [PATCH 7/7] xfs: compare generated and existing dirents Darrick J. Wong
  6 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:49 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Use the filesystem scanning infrastructure to walk the filesystem
looking for parent pointers and child dirents that reference the
directory that we're rebuilding.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Makefile           |    1 
 fs/xfs/scrub/common.c     |   15 +
 fs/xfs/scrub/common.h     |   28 +
 fs/xfs/scrub/dir.c        |    9 
 fs/xfs/scrub/dir_repair.c |  940 +++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h     |   16 +
 fs/xfs/scrub/scrub.c      |    2 
 fs/xfs/scrub/tempfile.c   |   42 ++
 fs/xfs/scrub/tempfile.h   |    2 
 fs/xfs/scrub/trace.c      |    1 
 fs/xfs/scrub/trace.h      |   67 +++
 11 files changed, 1122 insertions(+), 1 deletion(-)
 create mode 100644 fs/xfs/scrub/dir_repair.c
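
The stash-and-replay scheme used by dir_repair.c can be hard to see through
the locking and transaction handling, so here is a minimal userspace sketch
of the idea.  It is only an illustration under stated assumptions: fixed
arrays stand in for the xfarray and xfblob, a slot table stands in for the
temporary directory, and all demo_* names are hypothetical.  Locking and
transactions are left out entirely.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ADD	1
#define DEMO_REMOVE	2
#define DEMO_MAX	16	/* stash and tempdir capacity */
#define DEMO_NAMELEN	32

struct demo_update {
	uint32_t	name_cookie;	/* offset of the name in the blob */
	uint64_t	ino;
	uint8_t		namelen;
	uint8_t		action;		/* DEMO_ADD or DEMO_REMOVE */
};

struct demo_dirent {
	char		name[DEMO_NAMELEN];
	uint8_t		namelen;
	uint64_t	ino;		/* 0 marks a free slot */
};

struct demo_rebuild {
	struct demo_update	stash[DEMO_MAX];
	unsigned int		nr_stashed;
	char			blob[DEMO_MAX * DEMO_NAMELEN];
	unsigned int		blob_used;
	struct demo_dirent	tempdir[DEMO_MAX];
};

/* Record an update for later; the tempdir is not touched here. */
static int demo_stash(struct demo_rebuild *rb, uint8_t action,
		const char *name, uint64_t ino)
{
	size_t len = strlen(name);
	struct demo_update *up;

	assert(action == DEMO_ADD || action == DEMO_REMOVE);
	if (len == 0 || len > DEMO_NAMELEN || ino == 0)
		return -EINVAL;
	if (rb->nr_stashed == DEMO_MAX)
		return -ENOSPC;
	up = &rb->stash[rb->nr_stashed++];
	up->name_cookie = rb->blob_used;
	up->ino = ino;
	up->namelen = (uint8_t)len;
	up->action = action;
	memcpy(rb->blob + rb->blob_used, name, len);
	rb->blob_used += (unsigned int)len;
	return 0;
}

/* Find the entry named @name, or the first free slot if @name is NULL. */
static struct demo_dirent *demo_find(struct demo_rebuild *rb,
		const char *name, size_t len)
{
	unsigned int i;

	for (i = 0; i < DEMO_MAX; i++) {
		struct demo_dirent *de = &rb->tempdir[i];

		if (!name && !de->ino)
			return de;
		if (name && de->ino && de->namelen == len &&
		    !memcmp(de->name, name, len))
			return de;
	}
	return NULL;
}

/* Apply every stashed update to the tempdir, then empty the stash. */
static int demo_replay_all(struct demo_rebuild *rb)
{
	unsigned int i;

	for (i = 0; i < rb->nr_stashed; i++) {
		const struct demo_update *up = &rb->stash[i];
		const char *name = rb->blob + up->name_cookie;
		struct demo_dirent *de = demo_find(rb, name, up->namelen);

		if (up->action == DEMO_REMOVE) {
			/* The lookup must succeed and match the inode. */
			if (!de || de->ino != up->ino)
				return -ENOENT;
			de->ino = 0;
			continue;
		}
		/* Adding: the lookup must not succeed. */
		if (de)
			return -EEXIST;
		de = demo_find(rb, NULL, 0);
		if (!de)
			return -ENOSPC;
		memcpy(de->name, name, up->namelen);
		de->namelen = up->namelen;
		de->ino = up->ino;
	}
	rb->nr_stashed = 0;
	rb->blob_used = 0;
	return 0;
}
```

Keeping the recording step this cheap is what lets the scanner and the
live update hooks log changes without touching the temporary directory;
the expensive directory updates happen only during replay.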


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 6a30b145491d..a32f6da27a86 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -178,6 +178,7 @@ xfs-$(CONFIG_XFS_QUOTA)		+= scrub/quota.o
 ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
+				   dir_repair.o \
 				   repair.o \
 				   tempfile.o \
 				   xfblob.o \
diff --git a/fs/xfs/scrub/common.c b/fs/xfs/scrub/common.c
index 2874da088e8d..17a9bc610a76 100644
--- a/fs/xfs/scrub/common.c
+++ b/fs/xfs/scrub/common.c
@@ -551,6 +551,21 @@ xchk_ag_init(
 
 /* Per-scrubber setup functions */
 
+void
+xchk_trans_cancel(
+	struct xfs_scrub	*sc)
+{
+	xfs_trans_cancel(sc->tp);
+	sc->tp = NULL;
+}
+
+int
+xchk_trans_alloc_empty(
+	struct xfs_scrub	*sc)
+{
+	return xfs_trans_alloc_empty(sc->mp, &sc->tp);
+}
+
 /*
  * Grab an empty transaction so that we can re-grab locked buffers if
  * one of our btrees turns out to be cyclic.
diff --git a/fs/xfs/scrub/common.h b/fs/xfs/scrub/common.h
index 423a98c39fb6..7720982adfc6 100644
--- a/fs/xfs/scrub/common.h
+++ b/fs/xfs/scrub/common.h
@@ -31,6 +31,9 @@ xchk_should_terminate(
 	return false;
 }
 
+void xchk_trans_cancel(struct xfs_scrub *sc);
+int xchk_trans_alloc_empty(struct xfs_scrub *sc);
+
 int xchk_trans_alloc(struct xfs_scrub *sc, uint resblks);
 bool xchk_process_error(struct xfs_scrub *sc, xfs_agnumber_t agno,
 		xfs_agblock_t bno, int *error);
@@ -159,4 +162,29 @@ void xchk_start_reaping(struct xfs_scrub *sc);
 
 void xchk_fshooks_enable(struct xfs_scrub *sc, unsigned int scrub_fshooks);
 
+#ifdef CONFIG_XFS_ONLINE_REPAIR
+/* Decide if a repair is required. */
+static inline bool xchk_needs_repair(const struct xfs_scrub_metadata *sm)
+{
+	return sm->sm_flags & (XFS_SCRUB_OFLAG_CORRUPT |
+			       XFS_SCRUB_OFLAG_XCORRUPT |
+			       XFS_SCRUB_OFLAG_PREEN);
+}
+
+/*
+ * "Should we prepare for a repair?"
+ *
+ * Return true if the caller permits us to repair metadata and we're not
+ * setting up for a post-repair evaluation.
+ */
+static inline bool xchk_could_repair(const struct xfs_scrub *sc)
+{
+	return (sc->sm->sm_flags & XFS_SCRUB_IFLAG_REPAIR) &&
+		!(sc->flags & XREP_ALREADY_FIXED);
+}
+#else
+# define xchk_needs_repair(sc)		(false)
+# define xchk_could_repair(sc)		(false)
+#endif /* CONFIG_XFS_ONLINE_REPAIR */
+
 #endif	/* __XFS_SCRUB_COMMON_H__ */
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 06783e4b95ad..d720f1e143dd 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -19,12 +19,21 @@
 #include "scrub/common.h"
 #include "scrub/dabtree.h"
 #include "scrub/readdir.h"
+#include "scrub/repair.h"
 
 /* Set us up to scrub directories. */
 int
 xchk_setup_directory(
 	struct xfs_scrub	*sc)
 {
+	int			error;
+
+	if (xchk_could_repair(sc)) {
+		error = xrep_setup_directory(sc);
+		if (error)
+			return error;
+	}
+
 	return xchk_setup_inode_contents(sc, 0);
 }
 
diff --git a/fs/xfs/scrub/dir_repair.c b/fs/xfs/scrub/dir_repair.c
new file mode 100644
index 000000000000..a6576a29e784
--- /dev/null
+++ b/fs/xfs/scrub/dir_repair.c
@@ -0,0 +1,940 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2023 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_bmap.h"
+#include "xfs_quota.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_trans_space.h"
+#include "xfs_bmap_util.h"
+#include "xfs_attr.h"
+#include "xfs_parent.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+#include "scrub/tempfile.h"
+#include "scrub/iscan.h"
+#include "scrub/readdir.h"
+#include "scrub/listxattr.h"
+#include "scrub/xfile.h"
+#include "scrub/xfarray.h"
+#include "scrub/xfblob.h"
+
+/*
+ * Directory Repairs
+ * =================
+ *
+ * Reconstruct a directory by visiting each parent pointer of each file in the
+ * filesystem and translating the relevant pptrs into dirents.  Translation
+ * occurs by adding new dirents to a temporary directory, which formats the
+ * ondisk directory blocks.  In the final version of this code, we'll use the
+ * atomic extent swap code to exchange the entire directory structure of the
+ * file being repaired and the temporary, but for this PoC we omit the commit
+ * to reduce the amount of code that has to be ported.
+ *
+ * Because we have to scan the entire filesystem, the next patch introduces the
+ * inode scan and live update hooks so that the rebuilder can be kept aware of
+ * filesystem updates being made to this directory by other threads.  Directory
+ * entry translation therefore requires two steps to avoid problems with lock
+ * contention and to keep ondisk tempdir updates out of the hook path.
+ *
+ * Every time the filesystem scanner or the live update hook code encounters a
+ * directory operation relevant to this rebuilder, they will write a record of
+ * the createname/removename operation to an xfarray.  Dirent names are stored
+ * in an xfblob structure.  At opportune times, these stashed updates will be
+ * read from the xfarray and committed (individually) to the temporary
+ * directory.
+ *
+ * When the filesystem scan is complete, we relock both the directory and the
+ * tempdir, and finish any stashed operations.  At that point, we are
+ * theoretically ready to exchange the directory data fork mappings.  This
+ * cannot happen until two patchsets get merged: the first allows callers to
+ * specify the owning inode number explicitly; and the second is the atomic
+ * extent swap series.
+ *
+ * For now we'll simply compare the two directories and complain about
+ * discrepancies.
+ */
+
+/* Maximum memory usage for the tempdir log, in bytes. */
+#define MAX_DIRENT_STASH_SIZE	(32ULL << 10)
+
+/* Create a dirent in the tempdir. */
+#define XREP_DIRENT_ADD		(1)
+
+/* Remove a dirent from the tempdir. */
+#define XREP_DIRENT_REMOVE	(2)
+
+/* A stashed dirent update. */
+struct xrep_dirent {
+	/* Cookie for retrieval of the dirent name. */
+	xfblob_cookie		name_cookie;
+
+	/* Child inode number. */
+	xfs_ino_t		ino;
+
+	/* Directory offset that we want.  We're not going to get it. */
+	xfs_dir2_dataptr_t	diroffset;
+
+	/* Length of the dirent name. */
+	uint8_t			namelen;
+
+	/* File type of the dirent. */
+	uint8_t			ftype;
+
+	/* XREP_DIRENT_{ADD,REMOVE} */
+	uint8_t			action;
+};
+
+struct xrep_dir {
+	struct xfs_scrub	*sc;
+
+	/* Inode scan cursor. */
+	struct xchk_iscan	iscan;
+
+	/* Preallocated args struct for performing dir operations */
+	struct xfs_da_args	args;
+
+	/* Stashed directory entry updates. */
+	struct xfarray		*dir_entries;
+
+	/* Directory entry names. */
+	struct xfblob		*dir_names;
+
+	/* Mutex protecting dir_entries, dir_names, and parent_ino. */
+	struct mutex		lock;
+
+	/*
+	 * This is the dotdot inumber that we're going to set on the
+	 * reconstructed directory.
+	 */
+	xfs_ino_t		parent_ino;
+
+	/* Scratch buffer for scanning pptr xattrs */
+	struct xfs_parent_name_irec pptr;
+};
+
+/* Tear down all the incore stuff we created. */
+static void
+xrep_dir_teardown(
+	struct xfs_scrub	*sc)
+{
+	struct xrep_dir		*rd = sc->buf;
+
+	xchk_iscan_finish(&rd->iscan);
+	mutex_destroy(&rd->lock);
+	xfblob_destroy(rd->dir_names);
+	xfarray_destroy(rd->dir_entries);
+}
+
+/* Set up for a directory repair. */
+int
+xrep_setup_directory(
+	struct xfs_scrub	*sc)
+{
+	struct xrep_dir		*rd;
+	int			error;
+
+	error = xrep_tempfile_create(sc, S_IFDIR);
+	if (error)
+		return error;
+
+	rd = kvzalloc(sizeof(struct xrep_dir), XCHK_GFP_FLAGS);
+	if (!rd)
+		return -ENOMEM;
+
+	sc->buf = rd;
+	rd->sc = sc;
+	rd->parent_ino = NULLFSINO;
+	return 0;
+}
+
+/* Are these two directory names the same? */
+static inline bool
+xrep_dir_samename(
+	const struct xfs_name	*n1,
+	const struct xfs_name	*n2)
+{
+	return n1->len == n2->len && !memcmp(n1->name, n2->name, n1->len);
+}
+
+/*
+ * Look up the inode number for an exact name in a directory.
+ *
+ * Callers must hold the ILOCK.  File types are XFS_DIR3_FT_*.  Names are not
+ * checked for correctness.  This initializes rd->args.
+ */
+STATIC int
+xrep_dir_lookup(
+	struct xrep_dir		*rd,
+	struct xfs_inode	*dp,
+	const struct xfs_name	*name,
+	xfs_ino_t		*ino)
+{
+	struct xfs_scrub	*sc = rd->sc;
+	bool			isblock, isleaf;
+	int			error;
+
+	if (xfs_is_shutdown(dp->i_mount))
+		return -EIO;
+
+	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
+	ASSERT(xfs_isilocked(dp, XFS_ILOCK_SHARED | XFS_ILOCK_EXCL));
+
+	memset(&rd->args, 0, sizeof(struct xfs_da_args));
+	rd->args.dp		= dp;
+	rd->args.geo		= sc->mp->m_dir_geo;
+	rd->args.hashval	= xfs_dir2_hashname(dp->i_mount, name);
+	rd->args.namelen	= name->len;
+	rd->args.name		= name->name;
+	rd->args.op_flags	= XFS_DA_OP_OKNOENT;
+	rd->args.trans		= sc->tp;
+	rd->args.whichfork	= XFS_DATA_FORK;
+
+	if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL) {
+		error = xfs_dir2_sf_lookup(&rd->args);
+		goto out_check_rval;
+	}
+
+	/* dir2 functions require that the data fork is loaded */
+	error = xfs_iread_extents(sc->tp, dp, XFS_DATA_FORK);
+	if (error)
+		return error;
+
+	error = xfs_dir2_isblock(&rd->args, &isblock);
+	if (error)
+		return error;
+
+	if (isblock) {
+		error = xfs_dir2_block_lookup(&rd->args);
+		goto out_check_rval;
+	}
+
+	error = xfs_dir2_isleaf(&rd->args, &isleaf);
+	if (error)
+		return error;
+
+	if (isleaf) {
+		error = xfs_dir2_leaf_lookup(&rd->args);
+		goto out_check_rval;
+	}
+
+	error = xfs_dir2_node_lookup(&rd->args);
+
+out_check_rval:
+	if (error == -EEXIST)
+		error = 0;
+	if (!error)
+		*ino = rd->args.inumber;
+	return error;
+}
+
+/* Create a directory entry, having filled out most of rd->args via lookup. */
+STATIC int
+xrep_dir_createname(
+	struct xrep_dir		*rd,
+	const struct xfs_name	*name,
+	xfs_ino_t		inum,
+	xfs_extlen_t		total,
+	xfs_dir2_dataptr_t	diroffset)
+{
+	struct xfs_scrub	*sc = rd->sc;
+	struct xfs_inode	*dp = rd->args.dp;
+	bool			is_block, is_leaf;
+	int			error;
+
+	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
+
+	error = xfs_dir_ino_validate(sc->mp, inum);
+	if (error)
+		return error;
+
+	trace_xrep_dir_createname(dp, name, inum, diroffset);
+
+	/* reset cmpresult as if we haven't done a lookup */
+	rd->args.cmpresult = XFS_CMP_DIFFERENT;
+	rd->args.filetype = name->type;
+	rd->args.inumber = inum;
+	rd->args.op_flags = XFS_DA_OP_ADDNAME | XFS_DA_OP_OKNOENT;
+	rd->args.total = total;
+
+	if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL)
+		return xfs_dir2_sf_addname(&rd->args);
+
+	error = xfs_dir2_isblock(&rd->args, &is_block);
+	if (error)
+		return error;
+	if (is_block)
+		return xfs_dir2_block_addname(&rd->args);
+
+	error = xfs_dir2_isleaf(&rd->args, &is_leaf);
+	if (error)
+		return error;
+	if (is_leaf)
+		return xfs_dir2_leaf_addname(&rd->args);
+
+	return xfs_dir2_node_addname(&rd->args);
+}
+
+/* Remove a directory entry, having filled out rd->args via lookup. */
+STATIC int
+xrep_dir_removename(
+	struct xrep_dir		*rd,
+	const struct xfs_name	*name,
+	xfs_extlen_t		total,
+	xfs_dir2_dataptr_t	diroffset)
+{
+	struct xfs_inode	*dp = rd->args.dp;
+	bool			is_block, is_leaf;
+	int			error;
+
+	ASSERT(S_ISDIR(VFS_I(dp)->i_mode));
+
+	/* reset cmpresult as if we haven't done a lookup */
+	rd->args.cmpresult = XFS_CMP_DIFFERENT;
+	rd->args.op_flags = 0;
+	rd->args.total = total;
+
+	trace_xrep_dir_removename(dp, name, rd->args.inumber, diroffset);
+
+	if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL)
+		return xfs_dir2_sf_removename(&rd->args);
+
+	error = xfs_dir2_isblock(&rd->args, &is_block);
+	if (error)
+		return error;
+	if (is_block)
+		return xfs_dir2_block_removename(&rd->args);
+
+	error = xfs_dir2_isleaf(&rd->args, &is_leaf);
+	if (error)
+		return error;
+	if (is_leaf)
+		return xfs_dir2_leaf_removename(&rd->args);
+
+	return xfs_dir2_node_removename(&rd->args);
+}
+
+/* Update the temporary directory with a stashed update. */
+STATIC int
+xrep_dir_replay_update(
+	struct xrep_dir			*rd,
+	const struct xrep_dirent	*dirent)
+{
+	struct xfs_name			xname = {
+		.len			= dirent->namelen,
+		.type			= dirent->ftype,
+		.name			= rd->pptr.p_name,
+	};
+	struct xfs_scrub		*sc = rd->sc;
+	struct xfs_mount		*mp = sc->mp;
+	xfs_ino_t			child_ino;
+	uint				resblks;
+	int				error;
+
+	if (dirent->action == XREP_DIRENT_REMOVE)
+		resblks = XFS_DIRREMOVE_SPACE_RES(mp);
+	else
+		resblks = XFS_DIRENTER_SPACE_RES(mp, dirent->namelen);
+
+	error = xchk_trans_alloc(sc, resblks);
+	if (error)
+		return error;
+
+	error = xrep_tempfile_ilock_polled(sc);
+	if (error) {
+		xchk_trans_cancel(rd->sc);
+		return error;
+	}
+
+	xfs_trans_ijoin(sc->tp, sc->tempip, 0);
+
+	error = xrep_dir_lookup(rd, sc->tempip, &xname, &child_ino);
+	if (dirent->action == XREP_DIRENT_REMOVE) {
+		/* Remove this dirent.  The lookup must succeed. */
+		if (error)
+			goto out_cancel;
+		if (child_ino != dirent->ino) {
+			error = -ENOENT;
+			goto out_cancel;
+		}
+
+		error = xrep_dir_removename(rd, &xname, resblks,
+				dirent->diroffset);
+	} else {
+		/* Add this dirent.  The lookup must not succeed. */
+		if (error == 0)
+			error = -EEXIST;
+		if (error != -ENOENT)
+			goto out_cancel;
+
+		error = xrep_dir_createname(rd, &xname, dirent->ino, resblks,
+				dirent->diroffset);
+	}
+	if (error)
+		goto out_cancel;
+
+	error = xrep_trans_commit(sc);
+	goto out_ilock;
+
+out_cancel:
+	xchk_trans_cancel(rd->sc);
+out_ilock:
+	xrep_tempfile_iunlock(rd->sc);
+	return error;
+}
+
+/*
+ * Flush stashed dirent updates that have been recorded by the scanner.  This
+ * is done to reduce the memory requirements of the directory rebuild, since
+ * directories can contain up to 32GB of directory data.
+ *
+ * Caller must not hold transactions or ILOCKs.  Caller must hold the tempdir
+ * IOLOCK.
+ */
+STATIC int
+xrep_dir_replay_updates(
+	struct xrep_dir		*rd)
+{
+	xfarray_idx_t		array_cur;
+	int			error;
+
+	mutex_lock(&rd->lock);
+	foreach_xfarray_idx(rd->dir_entries, array_cur) {
+		struct xrep_dirent	dirent;
+
+		error = xfarray_load(rd->dir_entries, array_cur, &dirent);
+		if (error)
+			goto out_unlock;
+
+		error = xfblob_load(rd->dir_names, dirent.name_cookie,
+				rd->pptr.p_name, dirent.namelen);
+		if (error)
+			goto out_unlock;
+		rd->pptr.p_name[MAXNAMELEN - 1] = 0;
+		mutex_unlock(&rd->lock);
+
+		error = xrep_dir_replay_update(rd, &dirent);
+		if (error)
+			return error;
+
+		mutex_lock(&rd->lock);
+	}
+
+	/* Empty out both arrays now that we've added the entries. */
+	xfarray_truncate(rd->dir_entries);
+	xfblob_truncate(rd->dir_names);
+	mutex_unlock(&rd->lock);
+	return 0;
+out_unlock:
+	mutex_unlock(&rd->lock);
+	return error;
+}
+
+/*
+ * Remember that we want to create a dirent in the tempdir.  These stashed
+ * actions will be replayed later.
+ */
+STATIC int
+xrep_dir_add_dirent(
+	struct xrep_dir		*rd,
+	const struct xfs_name	*name,
+	xfs_ino_t		ino,
+	xfs_dir2_dataptr_t	diroffset)
+{
+	struct xrep_dirent	dirent = {
+		.action		= XREP_DIRENT_ADD,
+		.ino		= ino,
+		.namelen	= name->len,
+		.ftype		= name->type,
+		.diroffset	= diroffset,
+	};
+	int			error;
+
+	trace_xrep_dir_add_dirent(rd->sc->tempip, name, ino, diroffset);
+
+	error = xfblob_store(rd->dir_names, &dirent.name_cookie, name->name,
+			name->len);
+	if (error)
+		return error;
+
+	return xfarray_append(rd->dir_entries, &dirent);
+}
+
+/*
+ * Remember that we want to remove a dirent from the tempdir.  These stashed
+ * actions will be replayed later.
+ */
+STATIC int
+xrep_dir_remove_dirent(
+	struct xrep_dir		*rd,
+	const struct xfs_name	*name,
+	xfs_ino_t		ino,
+	xfs_dir2_dataptr_t	diroffset)
+{
+	struct xrep_dirent	dirent = {
+		.action		= XREP_DIRENT_REMOVE,
+		.ino		= ino,
+		.namelen	= name->len,
+		.ftype		= name->type,
+		.diroffset	= diroffset,
+	};
+	int			error;
+
+	trace_xrep_dir_remove_dirent(rd->sc->tempip, name, ino, diroffset);
+
+	error = xfblob_store(rd->dir_names, &dirent.name_cookie, name->name,
+			name->len);
+	if (error)
+		return error;
+
+	return xfarray_append(rd->dir_entries, &dirent);
+}
+
+/*
+ * Examine an xattr of a file.  If this xattr is a parent pointer that leads us
+ * back to the directory that we're rebuilding, add a dirent to the temporary
+ * directory.
+ */
+STATIC int
+xrep_dir_scan_parent_pointer(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen,
+	void			*priv)
+{
+	struct xfs_name		xname;
+	struct xrep_dir		*rd = priv;
+	const struct xfs_parent_name_rec *rec = (const void *)name;
+	int			error;
+
+	/* Ignore incomplete xattrs */
+	if (attr_flags & XFS_ATTR_INCOMPLETE)
+		return 0;
+
+	/* Ignore anything that isn't a parent pointer. */
+	if (!(attr_flags & XFS_ATTR_PARENT))
+		return 0;
+
+	/* Does the ondisk parent pointer structure make sense? */
+	if (!xfs_parent_namecheck(sc->mp, rec, namelen, attr_flags) ||
+	    !xfs_parent_valuecheck(sc->mp, value, valuelen))
+		return -EFSCORRUPTED;
+
+	xfs_parent_irec_from_disk(&rd->pptr, rec, value, valuelen);
+
+	/* Ignore parent pointers that point back to a different dir. */
+	if (rd->pptr.p_ino != sc->ip->i_ino ||
+	    rd->pptr.p_gen != VFS_I(sc->ip)->i_generation)
+		return 0;
+
+	/*
+	 * Transform this parent pointer into a dirent and queue it for later
+	 * addition to the temporary directory.
+	 */
+	xname.name = rd->pptr.p_name;
+	xname.len = rd->pptr.p_namelen;
+	xname.type = xfs_mode_to_ftype(VFS_I(ip)->i_mode);
+
+	mutex_lock(&rd->lock);
+	error = xrep_dir_add_dirent(rd, &xname, ip->i_ino,
+			rd->pptr.p_diroffset);
+	mutex_unlock(&rd->lock);
+	return error;
+}
+
+/*
+ * If this child dirent points to the directory being repaired, remember that
+ * fact so that we can reset the dotdot entry if necessary.
+ */
+STATIC int
+xrep_dir_scan_dirent(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp,
+	xfs_dir2_dataptr_t	dapos,
+	const struct xfs_name	*name,
+	xfs_ino_t		ino,
+	void			*priv)
+{
+	struct xrep_dir		*rd = priv;
+
+	/* Dirent doesn't point to this directory. */
+	if (ino != rd->sc->ip->i_ino)
+		return 0;
+
+	/* Ignore garbage inum. */
+	if (!xfs_verify_dir_ino(rd->sc->mp, ino))
+		return 0;
+
+	/* No weird looking names. */
+	if (name->len >= MAXNAMELEN || name->len <= 0)
+		return 0;
+
+	/* Don't pick up dot or dotdot entries; we only want child dirents. */
+	if (xrep_dir_samename(name, &xfs_name_dotdot) ||
+	    xrep_dir_samename(name, &xfs_name_dot))
+		return 0;
+
+	trace_xrep_dir_replacename(sc->tempip, &xfs_name_dotdot, dp->i_ino, 0);
+
+	mutex_lock(&rd->lock);
+	rd->parent_ino = dp->i_ino;
+	mutex_unlock(&rd->lock);
+	return 0;
+}
+
+/*
+ * Decide if we want to look for child dirents or parent pointers in this file.
+ * Skip the dir being repaired and any files being used to stage repairs.
+ */
+static inline bool
+xrep_dir_want_scan(
+	struct xrep_dir		*rd,
+	const struct xfs_inode	*ip)
+{
+	return ip != rd->sc->ip && !xrep_is_tempfile(ip);
+}
+
+/*
+ * Take ILOCK on a file that we want to scan.
+ *
+ * Select ILOCK_EXCL if the file is a directory with an unloaded data bmbt or
+ * has an unloaded attr bmbt.  Otherwise, take ILOCK_SHARED.
+ */
+static inline unsigned int
+xrep_dir_scan_ilock(
+	struct xrep_dir		*rd,
+	struct xfs_inode	*ip)
+{
+	uint			lock_mode = XFS_ILOCK_SHARED;
+
+	/* Need to take the shared ILOCK to advance the iscan cursor. */
+	if (!xrep_dir_want_scan(rd, ip))
+		goto lock;
+
+	if (S_ISDIR(VFS_I(ip)->i_mode) && xfs_need_iread_extents(&ip->i_df)) {
+		lock_mode = XFS_ILOCK_EXCL;
+		goto lock;
+	}
+
+	if (xfs_inode_has_attr_fork(ip) && xfs_need_iread_extents(&ip->i_af))
+		lock_mode = XFS_ILOCK_EXCL;
+
+lock:
+	xfs_ilock(ip, lock_mode);
+	return lock_mode;
+}
+
+/*
+ * Scan this file for relevant child dirents or parent pointers that point to
+ * the directory we're rebuilding.
+ */
+STATIC int
+xrep_dir_scan_file(
+	struct xrep_dir		*rd,
+	struct xfs_inode	*ip)
+{
+	unsigned int		lock_mode;
+	int			error = 0;
+
+	lock_mode = xrep_dir_scan_ilock(rd, ip);
+
+	if (!xrep_dir_want_scan(rd, ip))
+		goto scan_done;
+
+	error = xchk_xattr_walk(rd->sc, ip, xrep_dir_scan_parent_pointer, rd);
+	if (error)
+		goto scan_done;
+
+	if (S_ISDIR(VFS_I(ip)->i_mode)) {
+		error = xchk_dir_walk(rd->sc, ip, xrep_dir_scan_dirent, rd);
+		if (error)
+			goto scan_done;
+	}
+
+scan_done:
+	xchk_iscan_mark_visited(&rd->iscan, ip);
+	xfs_iunlock(ip, lock_mode);
+	return error;
+}
+
+/* Scan all files in the filesystem for dirents. */
+STATIC int
+xrep_dir_scan_dirtree(
+	struct xrep_dir		*rd)
+{
+	struct xfs_scrub	*sc = rd->sc;
+	struct xfs_inode	*ip;
+	int			error;
+
+	/*
+	 * Filesystem scans are time consuming.  Drop the directory ILOCK and
+	 * all other resources for the duration of the scan and hope for the
+	 * best.
+	 */
+	xchk_trans_cancel(sc);
+	if (sc->ilock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL))
+		xchk_iunlock(sc, sc->ilock_flags & (XFS_ILOCK_SHARED |
+						    XFS_ILOCK_EXCL));
+	error = xchk_trans_alloc_empty(sc);
+	if (error)
+		return error;
+
+	while ((error = xchk_iscan_iter(sc, &rd->iscan, &ip)) == 1) {
+		uint64_t	mem_usage;
+
+		error = xrep_dir_scan_file(rd, ip);
+		xchk_irele(sc, ip);
+		if (error)
+			break;
+
+		/* Flush stashed dirent updates to constrain memory usage. */
+		mutex_lock(&rd->lock);
+		mem_usage = xfarray_bytes(rd->dir_entries) +
+			     xfblob_bytes(rd->dir_names);
+		mutex_unlock(&rd->lock);
+		if (mem_usage >= MAX_DIRENT_STASH_SIZE) {
+			xchk_trans_cancel(sc);
+
+			error = xrep_tempfile_iolock_polled(sc);
+			if (error)
+				break;
+
+			error = xrep_dir_replay_updates(rd);
+			xrep_tempfile_iounlock(sc);
+			if (error)
+				break;
+
+			error = xchk_trans_alloc_empty(sc);
+			if (error)
+				break;
+		}
+
+		if (xchk_should_terminate(sc, &error))
+			break;
+	}
+	if (error) {
+		/*
+		 * If we couldn't grab an inode that was busy with a state
+		 * change, change the error code so that we exit to userspace
+		 * as quickly as possible.
+		 */
+		if (error == -EBUSY)
+			return -ECANCELED;
+		return error;
+	}
+
+	return 0;
+}
+
+/* Dump a dirent from the temporary dir. */
+STATIC int
+xrep_dir_dump_tempdir(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp,
+	xfs_dir2_dataptr_t	dapos,
+	const struct xfs_name	*name,
+	xfs_ino_t		ino,
+	void			*priv)
+{
+	struct xrep_dir		*rd = priv;
+	bool			child = true;
+	int			error;
+
+	/*
+	 * The tempdir was created with a dotdot entry pointing to the root
+	 * directory.  Substitute whatever inode number we found during the
+	 * filesystem scan.
+	 *
+	 * The tempdir was also created with a dot entry pointing to itself.
+	 * Substitute the inode number of the directory being repaired.  A
+	 * prerequisite for the real repair code is a patchset to allow dir
+	 * callers to set the owner (and dot entry in the case of sf -> block
+	 * conversion) explicitly.
+	 *
+	 * I've chosen not to port the owner setting patchset or the swapext
+	 * patchset for this PoC, which is why we build the tempdir, compare
+	 * the contents, and drop the tempdir.
+	 */
+	if (xrep_dir_samename(name, &xfs_name_dotdot)) {
+		child = false;
+		ino = rd->parent_ino;
+	}
+	if (xrep_dir_samename(name, &xfs_name_dot)) {
+		child = false;
+		ino = sc->ip->i_ino;
+	}
+
+	trace_xrep_dir_dumpname(sc->tempip, name, ino, dapos);
+
+	if (!child)
+		return 0;
+
+	/*
+	 * Set ourselves up to free every dirent in the tempdir because
+	 * directory inactivation won't do it for us.  The rest of the online
+	 * fsck patchset provides us a means to swap the directory structure
+	 * and reap it responsibly, but I didn't feel like porting all that.
+	 */
+	mutex_lock(&rd->lock);
+	error = xrep_dir_remove_dirent(rd, name, ino, dapos);
+	mutex_unlock(&rd->lock);
+	return error;
+}
+
+/*
+ * "Commit" the new directory structure to the file that we're repairing.
+ *
+ * In the final version, we'd swap the new directory contents (which we created
+ * in the tempfile) into the directory being repaired.  For now we just lock
+ * the temporary dir and dump what we found.
+ */
+STATIC int
+xrep_dir_rebuild_tree(
+	struct xrep_dir		*rd)
+{
+	struct xfs_scrub	*sc = rd->sc;
+	int			error = 0;
+
+	/*
+	 * Replay the last of the stashed dirent updates.  We still hold the
+	 * IOLOCK_EXCL of the directory that we're repairing and the temporary
+	 * directory.
+	 */
+	xchk_trans_cancel(sc);
+
+	ASSERT(sc->ilock_flags & XFS_IOLOCK_EXCL);
+	error = xrep_tempfile_iolock_polled(sc);
+	if (error)
+		return error;
+
+	error = xrep_dir_replay_updates(rd);
+	if (error)
+		return error;
+
+	if (sc->ip == sc->mp->m_rootip) {
+		/* Should not have found any parent of the root directory. */
+		ASSERT(rd->parent_ino == NULLFSINO);
+		rd->parent_ino = sc->mp->m_rootip->i_ino;
+	} else if (rd->parent_ino == NULLFSINO) {
+		/*
+		 * Should have found a parent somewhere unless this is an
+		 * unlinked directory.
+		 */
+		ASSERT(VFS_I(sc->ip)->i_nlink == 0);
+		rd->parent_ino = rd->sc->mp->m_sb.sb_rootino;
+	}
+
+	/*
+	 * At this point, we've quiesced both directories and should be ready
+	 * to commit the new contents.
+	 *
+	 * We don't have atomic swapext here, so all we do is dump the dirents
+	 * that we found to the ftrace buffer and {ab,re}use the dirent update
+	 * stashing mechanism to schedule deletion of every dirent in the
+	 * temporary directory to avoid leaking directory blocks.
+	 */
+	error = xchk_trans_alloc_empty(sc);
+	if (error)
+		return error;
+
+	trace_xrep_dir_rebuild_tree(sc->ip, rd->parent_ino);
+
+	xrep_tempfile_ilock(sc);
+	error = xchk_dir_walk(sc, sc->tempip, xrep_dir_dump_tempdir, rd);
+	if (error)
+		return error;
+
+	xrep_tempfile_iunlock(sc);
+	xchk_trans_cancel(sc);
+
+	return xrep_dir_replay_updates(rd);
+}
+
+/* Set up the filesystem scan so we can regenerate directory entries. */
+STATIC int
+xrep_dir_setup_scan(
+	struct xrep_dir		*rd)
+{
+	struct xfs_scrub	*sc = rd->sc;
+	int			error;
+
+	error = xfarray_create(sc->mp, "directory entries", 0,
+			sizeof(struct xrep_dirent), &rd->dir_entries);
+	if (error)
+		return error;
+
+	error = xfblob_create(sc->mp, "dirent names", &rd->dir_names);
+	if (error)
+		goto out_entries;
+
+	mutex_init(&rd->lock);
+
+	/* Retry iget every tenth of a second for up to 30 seconds. */
+	xchk_iscan_start(&rd->iscan, 30000, 100);
+
+	return 0;
+
+out_entries:
+	xfarray_destroy(rd->dir_entries);
+	return error;
+}
+
+/*
+ * Repair the directory metadata.
+ *
+ * XXX: Is it necessary to check the dcache for this directory to make sure
+ * that we always recreate every cached entry?
+ */
+int
+xrep_directory(
+	struct xfs_scrub	*sc)
+{
+	struct xrep_dir		*rd = sc->buf;
+	int			error = 0;
+
+	/* We require directory parent pointers to rebuild anything. */
+	if (!xfs_has_parent(sc->mp))
+		return -EOPNOTSUPP;
+
+	error = xrep_dir_setup_scan(rd);
+	if (error)
+		goto out;
+
+	error = xrep_dir_scan_dirtree(rd);
+	if (error)
+		goto out_finish_scan;
+
+	error = xrep_dir_rebuild_tree(rd);
+	if (error)
+		goto out_finish_scan;
+
+out_finish_scan:
+	xrep_dir_teardown(sc);
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index 840f74ec431c..ff254ff9b86d 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -30,6 +30,16 @@ int xrep_init_btblock(struct xfs_scrub *sc, xfs_fsblock_t fsb,
 		struct xfs_buf **bpp, xfs_btnum_t btnum,
 		const struct xfs_buf_ops *ops);
 
+static inline int
+xrep_trans_commit(
+	struct xfs_scrub	*sc)
+{
+	int			error = xfs_trans_commit(sc->tp);
+
+	sc->tp = NULL;
+	return error;
+}
+
 struct xbitmap;
 
 int xrep_fix_freelist(struct xfs_scrub *sc, bool can_shrink);
@@ -57,6 +67,8 @@ int xrep_find_ag_btree_roots(struct xfs_scrub *sc, struct xfs_buf *agf_bp,
 void xrep_force_quotacheck(struct xfs_scrub *sc, xfs_dqtype_t type);
 int xrep_ino_dqattach(struct xfs_scrub *sc);
 
+int xrep_setup_directory(struct xfs_scrub *sc);
+
 /* Metadata repairers */
 
 int xrep_probe(struct xfs_scrub *sc);
@@ -64,6 +76,7 @@ int xrep_superblock(struct xfs_scrub *sc);
 int xrep_agf(struct xfs_scrub *sc);
 int xrep_agfl(struct xfs_scrub *sc);
 int xrep_agi(struct xfs_scrub *sc);
+int xrep_directory(struct xfs_scrub *sc);
 
 #else
 
@@ -83,11 +96,14 @@ xrep_calc_ag_resblks(
 	return 0;
 }
 
+#define xrep_setup_directory(sc)	(0)
+
 #define xrep_probe			xrep_notsupported
 #define xrep_superblock			xrep_notsupported
 #define xrep_agf			xrep_notsupported
 #define xrep_agfl			xrep_notsupported
 #define xrep_agi			xrep_notsupported
+#define xrep_directory			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index a19ea7fdd510..b2a8de449d11 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -299,7 +299,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_INODE,
 		.setup	= xchk_setup_directory,
 		.scrub	= xchk_directory,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_directory,
 	},
 	[XFS_SCRUB_TYPE_XATTR] = {	/* extended attributes */
 		.type	= ST_INODE,
diff --git a/fs/xfs/scrub/tempfile.c b/fs/xfs/scrub/tempfile.c
index 91875d4bb67f..fa47e3423763 100644
--- a/fs/xfs/scrub/tempfile.c
+++ b/fs/xfs/scrub/tempfile.c
@@ -136,6 +136,7 @@ xrep_tempfile_create(
 	xfs_setup_iops(sc->tempip);
 	xfs_finish_inode_setup(sc->tempip);
 
+	xfs_iunlock(sc->tempip, XFS_ILOCK_EXCL);
 	sc->temp_ilock_flags = 0;
 	return error;
 
@@ -149,6 +150,7 @@ xrep_tempfile_create(
 	 */
 	if (sc->tempip) {
 		xfs_finish_inode_setup(sc->tempip);
+		xfs_iunlock(sc->tempip, XFS_ILOCK_EXCL);
 		xchk_irele(sc, sc->tempip);
 	}
 out_release_dquots:
@@ -172,6 +174,26 @@ xrep_tempfile_iolock_nowait(
 	return false;
 }
 
+/*
+ * Take the temporary file's IOLOCK while holding a different inode's IOLOCK.
+ * In theory nobody else should hold the tempfile's IOLOCK, but we use trylock
+ * to avoid deadlocks and lockdep complaints.
+ */
+int
+xrep_tempfile_iolock_polled(
+	struct xfs_scrub	*sc)
+{
+	int			error = 0;
+
+	while (!xrep_tempfile_iolock_nowait(sc)) {
+		if (xchk_should_terminate(sc, &error))
+			return error;
+		delay(1);
+	}
+
+	return 0;
+}
+
 /* Release IOLOCK_EXCL on the temporary file. */
 void
 xrep_tempfile_iounlock(
@@ -203,6 +225,26 @@ xrep_tempfile_ilock_nowait(
 	return false;
 }
 
+/*
+ * Take the temporary file's ILOCK while holding a different inode's ILOCK.  In
+ * theory nobody else should hold the tempfile's ILOCK, but we use trylock to
+ * avoid deadlocks and lockdep complaints.
+ */
+int
+xrep_tempfile_ilock_polled(
+	struct xfs_scrub	*sc)
+{
+	int			error = 0;
+
+	while (!xrep_tempfile_ilock_nowait(sc)) {
+		if (xchk_should_terminate(sc, &error))
+			return error;
+		delay(1);
+	}
+
+	return 0;
+}
+
 /* Unlock ILOCK_EXCL on the temporary file after an update. */
 void
 xrep_tempfile_iunlock(
diff --git a/fs/xfs/scrub/tempfile.h b/fs/xfs/scrub/tempfile.h
index e2f493b5d3d9..1e61d8e1ddce 100644
--- a/fs/xfs/scrub/tempfile.h
+++ b/fs/xfs/scrub/tempfile.h
@@ -11,10 +11,12 @@ int xrep_tempfile_create(struct xfs_scrub *sc, uint16_t mode);
 void xrep_tempfile_rele(struct xfs_scrub *sc);
 
 bool xrep_tempfile_iolock_nowait(struct xfs_scrub *sc);
+int xrep_tempfile_iolock_polled(struct xfs_scrub *sc);
 void xrep_tempfile_iounlock(struct xfs_scrub *sc);
 
 void xrep_tempfile_ilock(struct xfs_scrub *sc);
 bool xrep_tempfile_ilock_nowait(struct xfs_scrub *sc);
+int xrep_tempfile_ilock_polled(struct xfs_scrub *sc);
 void xrep_tempfile_iunlock(struct xfs_scrub *sc);
 bool xrep_is_tempfile(const struct xfs_inode *ip);
 #else
diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
index 83e8a64c95d4..61b51617fbb4 100644
--- a/fs/xfs/scrub/trace.c
+++ b/fs/xfs/scrub/trace.c
@@ -17,6 +17,7 @@
 #include "scrub/xfile.h"
 #include "scrub/xfarray.h"
 #include "scrub/iscan.h"
+#include "xfs_da_format.h"
 
 /* Figure out which block the btree cursor was pointing to. */
 static inline xfs_fsblock_t
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 0c27eb197f83..cbf914bce6db 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -1182,6 +1182,73 @@ TRACE_EVENT(xrep_tempfile_create,
 		  __entry->temp_inum)
 );
 
+DECLARE_EVENT_CLASS(xrep_dirent_class,
+	TP_PROTO(struct xfs_inode *dp, const struct xfs_name *name,
+		 xfs_ino_t ino, unsigned int diroffset),
+	TP_ARGS(dp, name, ino, diroffset),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, dir_ino)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, name->len)
+		__field(xfs_ino_t, ino)
+		__field(uint8_t, ftype)
+		__field(unsigned int, diroffset)
+	),
+	TP_fast_assign(
+		__entry->dev = dp->i_mount->m_super->s_dev;
+		__entry->dir_ino = dp->i_ino;
+		__entry->namelen = name->len;
+		memcpy(__get_str(name), name->name, name->len);
+		__entry->ino = ino;
+		__entry->ftype = name->type;
+		__entry->diroffset = diroffset;
+	),
+	TP_printk("dev %d:%d dir 0x%llx dapos 0x%x ftype %s name '%.*s' ino 0x%llx",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->dir_ino,
+		  __entry->diroffset,
+		  __print_symbolic(__entry->ftype, XFS_DIR3_FTYPE_STR),
+		  __entry->namelen,
+		  __get_str(name),
+		  __entry->ino)
+)
+#define DEFINE_XREP_DIRENT_CLASS(name) \
+DEFINE_EVENT(xrep_dirent_class, name, \
+	TP_PROTO(struct xfs_inode *dp, const struct xfs_name *name, \
+		 xfs_ino_t ino, unsigned int diroffset), \
+	TP_ARGS(dp, name, ino, diroffset))
+DEFINE_XREP_DIRENT_CLASS(xrep_dir_add_dirent);
+DEFINE_XREP_DIRENT_CLASS(xrep_dir_remove_dirent);
+DEFINE_XREP_DIRENT_CLASS(xrep_dir_createname);
+DEFINE_XREP_DIRENT_CLASS(xrep_dir_removename);
+DEFINE_XREP_DIRENT_CLASS(xrep_dir_replacename);
+DEFINE_XREP_DIRENT_CLASS(xrep_dir_dumpname);
+
+DECLARE_EVENT_CLASS(xrep_dir_class,
+	TP_PROTO(struct xfs_inode *dp, xfs_ino_t parent_ino),
+	TP_ARGS(dp, parent_ino),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, dir_ino)
+		__field(xfs_ino_t, parent_ino)
+	),
+	TP_fast_assign(
+		__entry->dev = dp->i_mount->m_super->s_dev;
+		__entry->dir_ino = dp->i_ino;
+		__entry->parent_ino = parent_ino;
+	),
+	TP_printk("dev %d:%d dir 0x%llx parent 0x%llx",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->dir_ino,
+		  __entry->parent_ino)
+)
+#define DEFINE_XREP_DIR_CLASS(name) \
+DEFINE_EVENT(xrep_dir_class, name, \
+	TP_PROTO(struct xfs_inode *dp, xfs_ino_t parent_ino), \
+	TP_ARGS(dp, parent_ino))
+DEFINE_XREP_DIR_CLASS(xrep_dir_rebuild_tree);
+
 #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */
 
 #endif /* _TRACE_XFS_SCRUB_TRACE_H */



* [PATCH 6/7] xfs: add hooks to do directory updates
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/7] xfs: online repair of directories Darrick J. Wong
                     ` (4 preceding siblings ...)
  2023-02-16 20:49   ` [PATCH 5/7] xfs: reconstruct directories from parent pointers Darrick J. Wong
@ 2023-02-16 20:49   ` Darrick J. Wong
  2023-02-16 20:49   ` [PATCH 7/7] xfs: compare generated and existing dirents Darrick J. Wong
  6 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:49 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

While we're scanning the filesystem, we still need to keep the tempdir
up to date with whatever changes get made to the directory being repaired.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_dir2.c  |    2 -
 fs/xfs/libxfs/xfs_dir2.h  |    2 -
 fs/xfs/scrub/dir_repair.c |   94 +++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 96 insertions(+), 2 deletions(-)
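
For readers who want the shape of the live-update logic without the
kernel plumbing, here is a rough userspace sketch.  It is not the code
in this patch: every toy_* name, the fixed-size arrays, and the "inode
number <= cursor" test standing in for xchk_iscan_want_live_update()
are assumptions made for illustration.  The idea it models is the one
above: updates to already-scanned inodes are stashed by the hook and
replayed into the tempdir later, while updates to not-yet-scanned
inodes are ignored because the scanner will observe them on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_STASH	16	/* hypothetical bounds, illustration only */
#define MAX_NAMES	16

enum stash_action { STASH_ADD, STASH_REMOVE };

struct stash_rec {
	enum stash_action	action;
	unsigned long		ino;
	char			name[32];
};

struct toy_repair {
	unsigned long		scan_cursor;	/* inodes <= cursor were scanned */
	struct stash_rec	stash[MAX_STASH];
	size_t			nr_stashed;
	struct stash_rec	tempdir[MAX_NAMES]; /* action field unused here */
	size_t			nr_tempdir;
};

/* Assumption: only inodes the scanner already visited need live updates. */
static bool toy_want_live_update(const struct toy_repair *tr, unsigned long ino)
{
	return ino <= tr->scan_cursor;
}

/* Hook body: stash the update.  Nonzero means "abort the scan". */
static int toy_live_update(struct toy_repair *tr, int delta,
		const char *name, unsigned long ino)
{
	struct stash_rec	*rec;

	if (!toy_want_live_update(tr, ino))
		return 0;	/* the scanner will see this change itself */
	if (tr->nr_stashed == MAX_STASH || strlen(name) >= sizeof(rec->name))
		return -1;

	rec = &tr->stash[tr->nr_stashed++];
	rec->action = delta > 0 ? STASH_ADD : STASH_REMOVE;
	rec->ino = ino;
	strcpy(rec->name, name);
	return 0;
}

/* Replay stashed updates against the toy tempdir, then empty the stash. */
static int toy_replay(struct toy_repair *tr)
{
	size_t	i, j;

	assert(tr->nr_stashed <= MAX_STASH);

	for (i = 0; i < tr->nr_stashed; i++) {
		const struct stash_rec	*rec = &tr->stash[i];

		/* Look the name up first, as the replay in the patch does. */
		for (j = 0; j < tr->nr_tempdir; j++)
			if (!strcmp(tr->tempdir[j].name, rec->name))
				break;

		if (rec->action == STASH_ADD) {
			/* Adding: the lookup must not succeed. */
			if (j < tr->nr_tempdir || tr->nr_tempdir == MAX_NAMES)
				return -1;
			tr->tempdir[tr->nr_tempdir++] = *rec;
		} else {
			/* Removing: the lookup must succeed and match. */
			if (j == tr->nr_tempdir || tr->tempdir[j].ino != rec->ino)
				return -1;
			tr->tempdir[j] = tr->tempdir[--tr->nr_tempdir];
		}
	}
	tr->nr_stashed = 0;
	return 0;
}
```

A caller would invoke toy_live_update() from wherever directory updates
happen and toy_replay() whenever the stash grows too large or the scan
completes.  The real code additionally serializes all of this with
rd->lock, which the sketch leaves out.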


diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 27e408d20d18..8bed71a5e9cc 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -440,7 +440,7 @@ int
 xfs_dir_removename(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*dp,
-	struct xfs_name		*name,
+	const struct xfs_name	*name,
 	xfs_ino_t		ino,
 	xfs_extlen_t		total,		/* bmap's total block count */
 	xfs_dir2_dataptr_t	*offset)	/* OUT: offset in directory */
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index ac360c0b2fe7..6ed86b7bd13c 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -46,7 +46,7 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t *inum,
 				struct xfs_name *ci_name);
 extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
-				struct xfs_name *name, xfs_ino_t ino,
+				const struct xfs_name *name, xfs_ino_t ino,
 				xfs_extlen_t tot,
 				xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
diff --git a/fs/xfs/scrub/dir_repair.c b/fs/xfs/scrub/dir_repair.c
index a6576a29e784..25af002df1da 100644
--- a/fs/xfs/scrub/dir_repair.c
+++ b/fs/xfs/scrub/dir_repair.c
@@ -124,6 +124,9 @@ struct xrep_dir {
 	/* Mutex protecting dir_entries, dir_names, and parent_ino. */
 	struct mutex		lock;
 
+	/* Hook to capture directory entry updates. */
+	struct xfs_dirent_hook	hooks;
+
 	/*
 	 * This is the dotdot inumber that we're going to set on the
 	 * reconstructed directory.
@@ -141,6 +144,7 @@ xrep_dir_teardown(
 {
 	struct xrep_dir		*rd = sc->buf;
 
+	xfs_dirent_hook_del(sc->mp, &rd->hooks);
 	xchk_iscan_finish(&rd->iscan);
 	mutex_destroy(&rd->lock);
 	xfblob_destroy(rd->dir_names);
@@ -155,6 +159,8 @@ xrep_setup_directory(
 	struct xrep_dir		*rd;
 	int			error;
 
+	xchk_fshooks_enable(sc, XCHK_FSHOOKS_DIRENTS);
+
 	error = xrep_tempfile_create(sc, S_IFDIR);
 	if (error)
 		return error;
@@ -832,6 +838,12 @@ xrep_dir_rebuild_tree(
 	if (error)
 		return error;
 
+	/*
+	 * Abort the inode scan so that the live hooks won't stash any more
+	 * directory updates.
+	 */
+	xchk_iscan_abort(&rd->iscan);
+
 	error = xrep_dir_replay_updates(rd);
 	if (error)
 		return error;
@@ -875,6 +887,72 @@ xrep_dir_rebuild_tree(
 	return xrep_dir_replay_updates(rd);
 }
 
+/*
+ * Capture dirent updates being made by other threads which are relevant to the
+ * directory being repaired.
+ */
+STATIC int
+xrep_dir_live_update(
+	struct notifier_block		*nb,
+	unsigned long			action,
+	void				*data)
+{
+	struct xfs_dirent_update_params	*p = data;
+	struct xrep_dir			*rd;
+	struct xfs_scrub		*sc;
+	int				error = 0;
+
+	rd = container_of(nb, struct xrep_dir, hooks.delta_hook.nb);
+	sc = rd->sc;
+
+	if (action != XFS_DIRENT_CHILD_DELTA)
+		return NOTIFY_DONE;
+
+	/*
+	 * This thread updated a dirent in the directory that we're rebuilding,
+	 * so stash the update for replay against the temporary directory.
+	 */
+	if (p->dp->i_ino == sc->ip->i_ino &&
+	    xchk_iscan_want_live_update(&rd->iscan, p->ip->i_ino)) {
+		mutex_lock(&rd->lock);
+		if (p->delta > 0)
+			error = xrep_dir_add_dirent(rd, p->name, p->ip->i_ino,
+					p->diroffset);
+		else
+			error = xrep_dir_remove_dirent(rd, p->name,
+					p->ip->i_ino, p->diroffset);
+		mutex_unlock(&rd->lock);
+		if (error)
+			goto out_abort;
+	}
+
+	/*
+	 * This thread updated a dirent that points to the directory that we're
+	 * rebuilding, so remember the new dotdot target.
+	 */
+	if (p->ip->i_ino == sc->ip->i_ino &&
+	    xchk_iscan_want_live_update(&rd->iscan, p->dp->i_ino)) {
+		mutex_lock(&rd->lock);
+		if (p->delta > 0) {
+			trace_xrep_dir_add_dirent(sc->tempip, &xfs_name_dotdot,
+					p->dp->i_ino, 0);
+
+			rd->parent_ino = p->dp->i_ino;
+		} else {
+			trace_xrep_dir_remove_dirent(sc->tempip,
+					&xfs_name_dotdot, NULLFSINO, 0);
+
+			rd->parent_ino = NULLFSINO;
+		}
+		mutex_unlock(&rd->lock);
+	}
+
+	return NOTIFY_DONE;
+out_abort:
+	xchk_iscan_abort(&rd->iscan);
+	return NOTIFY_DONE;
+}
+
 /* Set up the filesystem scan so we can regenerate directory entries. */
 STATIC int
 xrep_dir_setup_scan(
@@ -897,8 +975,24 @@ xrep_dir_setup_scan(
 	/* Retry iget every tenth of a second for up to 30 seconds. */
 	xchk_iscan_start(&rd->iscan, 30000, 100);
 
+	/*
+	 * Hook into the dirent update code.  The hook only operates on inodes
+	 * that were already scanned, and the scanner thread takes each inode's
+	 * ILOCK, which means that any in-progress inode updates will finish
+	 * before we can scan the inode.
+	 */
+	ASSERT(sc->flags & XCHK_FSHOOKS_DIRENTS);
+	xfs_hook_setup(&rd->hooks.delta_hook, xrep_dir_live_update);
+	error = xfs_dirent_hook_add(sc->mp, &rd->hooks);
+	if (error)
+		goto out_scan;
+
 	return 0;
 
+out_scan:
+	xchk_iscan_finish(&rd->iscan);
+	mutex_destroy(&rd->lock);
+	xfblob_destroy(rd->dir_names);
 out_entries:
 	xfarray_destroy(rd->dir_entries);
 	return error;



* [PATCH 7/7] xfs: compare generated and existing dirents
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/7] xfs: online repair of directories Darrick J. Wong
                     ` (5 preceding siblings ...)
  2023-02-16 20:49   ` [PATCH 6/7] xfs: add hooks to do directory updates Darrick J. Wong
@ 2023-02-16 20:49   ` Darrick J. Wong
  6 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:49 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Check our work to make sure we found all the dirents that the original
directory had.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/dir_repair.c |  101 ++++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/scrub/trace.h      |    2 +
 2 files changed, 100 insertions(+), 3 deletions(-)
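
As a rough illustration of the check described above, the following
userspace sketch compares two directories in both directions.  It is
only a model under stated assumptions: the toy_* names and the flat
arrays are hypothetical, and the diroffset comparison that the patch
merely traces is omitted.  Walking both ways matters because a
one-directional walk would miss entries that exist only in the
directory that was not walked.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_dirent {
	const char	*name;
	unsigned long	ino;
};

/* Return the entry called @name, or NULL if the directory lacks it. */
static const struct toy_dirent *
toy_lookup(const struct toy_dirent *dir, size_t nr, const char *name)
{
	size_t	i;

	for (i = 0; i < nr; i++)
		if (!strcmp(dir[i].name, name))
			return &dir[i];
	return NULL;
}

/* Every child entry of @a must exist in @b and point to the same inode. */
static int
toy_check_subset(const struct toy_dirent *a, size_t na,
		const struct toy_dirent *b, size_t nb)
{
	size_t	i;

	for (i = 0; i < na; i++) {
		const struct toy_dirent	*found;

		/* Dot and dotdot are not child dirents; skip them. */
		if (!strcmp(a[i].name, ".") || !strcmp(a[i].name, ".."))
			continue;

		found = toy_lookup(b, nb, a[i].name);
		if (!found || found->ino != a[i].ino)
			return -1;
	}
	return 0;
}

/* Walk both directions so neither missing nor extra entries slip by. */
static int
toy_compare_dirs(const struct toy_dirent *temp, size_t ntemp,
		const struct toy_dirent *bad, size_t nbad)
{
	int	error;

	error = toy_check_subset(temp, ntemp, bad, nbad);
	if (error)
		return error;
	return toy_check_subset(bad, nbad, temp, ntemp);
}
```

The sketch returns a bare -1 on any mismatch; the patch returns
-EFSCORRUPTED and emits a tracepoint so that the offending name can be
found in the ftrace buffer.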


diff --git a/fs/xfs/scrub/dir_repair.c b/fs/xfs/scrub/dir_repair.c
index 25af002df1da..ec48b3268809 100644
--- a/fs/xfs/scrub/dir_repair.c
+++ b/fs/xfs/scrub/dir_repair.c
@@ -757,7 +757,10 @@ xrep_dir_scan_dirtree(
 	return 0;
 }
 
-/* Dump a dirent from the temporary dir. */
+/*
+ * Dump a dirent from the temporary dir and check it against the dir we're
+ * rebuilding.  We are not committing any of this.
+ */
 STATIC int
 xrep_dir_dump_tempdir(
 	struct xfs_scrub	*sc,
@@ -768,7 +771,9 @@ xrep_dir_dump_tempdir(
 	void			*priv)
 {
 	struct xrep_dir		*rd = priv;
+	xfs_ino_t		child_ino;
 	bool			child = true;
+	xfs_dir2_dataptr_t	child_diroffset = XFS_DIR2_NULL_DATAPTR;
 	int			error;
 
 	/*
@@ -809,7 +814,88 @@ xrep_dir_dump_tempdir(
 	mutex_lock(&rd->lock);
 	error = xrep_dir_remove_dirent(rd, name, ino, dapos);
 	mutex_unlock(&rd->lock);
-	return error;
+	if (error)
+		return error;
+
+	/* Check that the dir being repaired has the same entry. */
+	error = xchk_dir_lookup(sc, sc->ip, name, &child_ino,
+			&child_diroffset);
+	if (error == -ENOENT) {
+		trace_xrep_dir_checkname(sc->ip, name, NULLFSINO,
+				XFS_DIR2_NULL_DATAPTR);
+		ASSERT(error != -ENOENT);
+		return -EFSCORRUPTED;
+	}
+	if (error)
+		return error;
+
+	if (ino != child_ino) {
+		trace_xrep_dir_checkname(sc->ip, name, child_ino,
+				child_diroffset);
+		ASSERT(ino == child_ino);
+		return -EFSCORRUPTED;
+	}
+
+	if (dapos != child_diroffset) {
+		trace_xrep_dir_badposname(sc->ip, name, child_ino,
+				child_diroffset);
+		/* We have no way to update this, so we just leave it. */
+	}
+
+	return 0;
+}
+
+/*
+ * Dump a dirent from the dir we're rebuilding and check it against the
+ * temporary dir.  This assumes that the directory wasn't really corrupt to
+ * begin with.
+ */
+STATIC int
+xrep_dir_dump_baddir(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp,
+	xfs_dir2_dataptr_t	dapos,
+	const struct xfs_name	*name,
+	xfs_ino_t		ino,
+	void			*priv)
+{
+	xfs_ino_t		child_ino;
+	xfs_dir2_dataptr_t	child_diroffset = XFS_DIR2_NULL_DATAPTR;
+	int			error;
+
+	/* Ignore the directory's dot and dotdot entries. */
+	if (xrep_dir_samename(name, &xfs_name_dotdot) ||
+	    xrep_dir_samename(name, &xfs_name_dot))
+		return 0;
+
+	trace_xrep_dir_dumpname(sc->ip, name, ino, dapos);
+
+	/* Check that the tempdir has the same entry. */
+	error = xchk_dir_lookup(sc, sc->tempip, name, &child_ino,
+			&child_diroffset);
+	if (error == -ENOENT) {
+		trace_xrep_dir_checkname(sc->tempip, name, NULLFSINO,
+				XFS_DIR2_NULL_DATAPTR);
+		ASSERT(error != -ENOENT);
+		return -EFSCORRUPTED;
+	}
+	if (error)
+		return error;
+
+	if (ino != child_ino) {
+		trace_xrep_dir_checkname(sc->tempip, name, child_ino,
+				child_diroffset);
+		ASSERT(ino == child_ino);
+		return -EFSCORRUPTED;
+	}
+
+	if (dapos != child_diroffset) {
+		trace_xrep_dir_badposname(sc->tempip, name, child_ino,
+				child_diroffset);
+		/* We have no way to update this, so we just leave it. */
+	}
+
+	return 0;
 }
 
 /*
@@ -876,12 +962,21 @@ xrep_dir_rebuild_tree(
 
 	trace_xrep_dir_rebuild_tree(sc->ip, rd->parent_ino);
 
-	xrep_tempfile_ilock(sc);
+	xchk_ilock(sc, XFS_ILOCK_EXCL);
+	error = xrep_tempfile_ilock_polled(sc);
+	if (error)
+		return error;
+
 	error = xchk_dir_walk(sc, sc->tempip, xrep_dir_dump_tempdir, rd);
 	if (error)
 		return error;
 
+	error = xchk_dir_walk(sc, sc->ip, xrep_dir_dump_baddir, rd);
+	if (error)
+		return error;
+
 	xrep_tempfile_iunlock(sc);
+	xchk_iunlock(sc, XFS_ILOCK_EXCL);
 	xchk_trans_cancel(sc);
 
 	return xrep_dir_replay_updates(rd);
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index cbf914bce6db..81d26be0ef3b 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -1224,6 +1224,8 @@ DEFINE_XREP_DIRENT_CLASS(xrep_dir_createname);
 DEFINE_XREP_DIRENT_CLASS(xrep_dir_removename);
 DEFINE_XREP_DIRENT_CLASS(xrep_dir_replacename);
 DEFINE_XREP_DIRENT_CLASS(xrep_dir_dumpname);
+DEFINE_XREP_DIRENT_CLASS(xrep_dir_checkname);
+DEFINE_XREP_DIRENT_CLASS(xrep_dir_badposname);
 
 DECLARE_EVENT_CLASS(xrep_dir_class,
 	TP_PROTO(struct xfs_inode *dp, xfs_ino_t parent_ino),



* [PATCH 1/2] xfs: scrub parent pointers
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers Darrick J. Wong
@ 2023-02-16 20:49   ` Darrick J. Wong
  2023-02-16 20:50   ` [PATCH 2/2] xfs: deferred scrub of " Darrick J. Wong
  1 sibling, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:49 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Actually check parent pointers now.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
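Note below the cut line, not part of the commit: the back-reference
check that this patch adds can be modelled in userspace roughly as
follows.  This is only a sketch under stated assumptions.  The toy_*
types and toy_check_pptr() are hypothetical stand-ins for struct
xfs_parent_name_irec, xchk_dir_lookup() and xchk_parent_dirent(); a
directory is a plain array here, and none of the locking or error
reporting of the real scrubber is modelled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy dirent: name -> (inode number, directory offset). */
struct toy_dirent {
	const char	*name;
	uint64_t	ino;
	uint32_t	diroffset;
};

/* Hypothetical toy directory with its own inode number and generation. */
struct toy_dir {
	uint64_t		ino;
	uint32_t		gen;
	const struct toy_dirent	*ents;
	unsigned int		nr_ents;
};

/* Hypothetical toy parent pointer, loosely modelled on the incore irec. */
struct toy_pptr {
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint32_t	p_diroffset;
	const char	*p_name;
};

/* Return true if @dp has a dirent that points back at @child_ino. */
static bool
toy_check_pptr(
	const struct toy_dir	*dp,
	const struct toy_pptr	*pptr,
	uint64_t		child_ino)
{
	unsigned int		i;

	/* The pointer must name this directory and this generation. */
	if (dp->ino != pptr->p_ino || dp->gen != pptr->p_gen)
		return false;

	for (i = 0; i < dp->nr_ents; i++) {
		const struct toy_dirent	*de = &dp->ents[i];

		if (strcmp(de->name, pptr->p_name))
			continue;

		/* Name found; inode number and offset must both match. */
		return de->ino == child_ino &&
		       de->diroffset == pptr->p_diroffset;
	}

	/* No dirent with that name: the parent pointer is dangling. */
	return false;
}
```

In the sketch, any false return corresponds to a case where the patch
flags the attr fork as corrupt; the kernel code distinguishes the
individual failure modes, which the sketch does not.
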
 fs/xfs/scrub/parent.c |  291 +++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.h  |   33 ++++++
 2 files changed, 324 insertions(+)


diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index d59184a59671..1bb196f2c1b2 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -14,9 +14,13 @@
 #include "xfs_icache.h"
 #include "xfs_dir2.h"
 #include "xfs_dir2_priv.h"
+#include "xfs_attr.h"
+#include "xfs_parent.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/readdir.h"
+#include "scrub/listxattr.h"
+#include "scrub/trace.h"
 
 /* Set us up to scrub parents. */
 int
@@ -231,6 +235,8 @@ xchk_parent_validate(
 	return error;
 }
 
+STATIC int xchk_parent_pptr(struct xfs_scrub *sc);
+
 /* Scrub a parent pointer. */
 int
 xchk_parent(
@@ -240,6 +246,9 @@ xchk_parent(
 	xfs_ino_t		parent_ino;
 	int			error;
 
+	if (xfs_has_parent(mp))
+		return xchk_parent_pptr(sc);
+
 	/*
 	 * If we're a directory, check that the '..' link points up to
 	 * a directory that has one entry pointing to us.
@@ -282,3 +291,285 @@ xchk_parent(
 
 	return xchk_parent_validate(sc, parent_ino);
 }
+
+/*
+ * Checking of Parent Pointers
+ * ===========================
+ *
+ * On filesystems with directory parent pointers, we check the referential
+ * integrity by visiting each parent pointer of a child file and checking that
+ * the directory referenced by the pointer actually has a dirent pointing back
+ * to the child file.
+ */
+
+struct xchk_pptrs {
+	struct xfs_scrub	*sc;
+
+	/* Scratch buffer for scanning pptr xattrs */
+	struct xfs_parent_name_irec pptr;
+
+	/* Parent of this directory. */
+	xfs_ino_t		parent_ino;
+};
+
+/* Look up the dotdot entry so that we can check it as we walk the pptrs. */
+STATIC int
+xchk_parent_dotdot(
+	struct xchk_pptrs	*pp)
+{
+	struct xfs_scrub	*sc = pp->sc;
+	int			error;
+
+	if (!S_ISDIR(VFS_I(sc->ip)->i_mode)) {
+		pp->parent_ino = NULLFSINO;
+		return 0;
+	}
+
+	/* Look up '..' */
+	error = xchk_dir_lookup(sc, sc->ip, &xfs_name_dotdot, &pp->parent_ino,
+			NULL);
+	if (!xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+		return error;
+	if (!xfs_verify_dir_ino(sc->mp, pp->parent_ino)) {
+		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+		return 0;
+	}
+
+	/* Is this the root dir?  Then '..' must point to itself. */
+	if (sc->ip == sc->mp->m_rootip && sc->ip->i_ino != pp->parent_ino)
+		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, 0);
+
+	return 0;
+}
+
+/*
+ * Try to lock a parent directory for checking dirents.  Returns the inode
+ * flags for the locks we now hold, or zero if we failed.
+ */
+STATIC unsigned int
+xchk_parent_lock_dir(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp)
+{
+	if (!xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED))
+		return 0;
+
+	if (!xfs_ilock_nowait(dp, XFS_ILOCK_SHARED)) {
+		xfs_iunlock(dp, XFS_IOLOCK_SHARED);
+		return 0;
+	}
+
+	if (!xfs_need_iread_extents(&dp->i_df))
+		return XFS_IOLOCK_SHARED | XFS_ILOCK_SHARED;
+
+	xfs_iunlock(dp, XFS_ILOCK_SHARED);
+
+	if (!xfs_ilock_nowait(dp, XFS_ILOCK_EXCL)) {
+		xfs_iunlock(dp, XFS_IOLOCK_SHARED);
+		return 0;
+	}
+
+	return XFS_IOLOCK_SHARED | XFS_ILOCK_EXCL;
+}
+
+/* Check the forward link (dirent) associated with this parent pointer. */
+STATIC int
+xchk_parent_dirent(
+	struct xchk_pptrs	*pp,
+	struct xfs_inode	*dp)
+{
+	struct xfs_name		xname = {
+		.name		= pp->pptr.p_name,
+		.len		= pp->pptr.p_namelen,
+	};
+	struct xfs_scrub	*sc = pp->sc;
+	xfs_ino_t		child_ino;
+	xfs_dir2_dataptr_t	child_diroffset;
+	int			error;
+
+	/*
+	 * Use the name attached to this parent pointer to look up the
+	 * directory entry in the alleged parent.
+	 */
+	error = xchk_dir_lookup(sc, dp, &xname, &child_ino, &child_diroffset);
+	if (error == -ENOENT) {
+		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return 0;
+	}
+	if (!xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0, &error))
+		return error;
+
+	/* Does the inode number match? */
+	if (child_ino != sc->ip->i_ino) {
+		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return 0;
+	}
+
+	/* Does the directory offset match? */
+	if (pp->pptr.p_diroffset != child_diroffset) {
+		trace_xchk_parent_bad_dapos(sc->ip, pp->pptr.p_diroffset,
+				dp->i_ino, child_diroffset, xname.name,
+				xname.len);
+		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return 0;
+	}
+
+	/*
+	 * If we're scanning a directory, we should only ever encounter a
+	 * single parent pointer, and it should match the dotdot entry.  We set
+	 * the parent_ino from the dotdot entry before the scan, so compare it
+	 * now.
+	 */
+	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
+		return 0;
+
+	if (pp->parent_ino != dp->i_ino) {
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return 0;
+	}
+
+	pp->parent_ino = NULLFSINO;
+	return 0;
+}
+
+/* Try to grab a parent directory. */
+STATIC int
+xchk_parent_iget(
+	struct xchk_pptrs		*pp,
+	struct xfs_inode		**dpp)
+{
+	struct xfs_scrub		*sc = pp->sc;
+	struct xfs_inode		*ip;
+	int				error;
+
+	/* Validate inode number. */
+	error = xfs_dir_ino_validate(sc->mp, pp->pptr.p_ino);
+	if (error) {
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return -ECANCELED;
+	}
+
+	error = xchk_iget(sc, pp->pptr.p_ino, &ip);
+	if (error == -EINVAL || error == -ENOENT) {
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return -ECANCELED;
+	}
+	if (!xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0, &error))
+		return error;
+
+	/* The parent must be a directory. */
+	if (!S_ISDIR(VFS_I(ip)->i_mode)) {
+		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		goto out_rele;
+	}
+
+	/* Validate generation number. */
+	if (VFS_I(ip)->i_generation != pp->pptr.p_gen) {
+		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		goto out_rele;
+	}
+
+	*dpp = ip;
+	return 0;
+out_rele:
+	xchk_irele(sc, ip);
+	return 0;
+}
+
+/*
+ * Walk an xattr of a file.  If this xattr is a parent pointer, follow it up
+ * to a parent directory and check that the parent has a dirent pointing back
+ * to us.
+ */
+STATIC int
+xchk_parent_scan_attr(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen,
+	void			*priv)
+{
+	struct xchk_pptrs	*pp = priv;
+	struct xfs_inode	*dp = NULL;
+	const struct xfs_parent_name_rec *rec = (const void *)name;
+	unsigned int		lockmode;
+	int			error;
+
+	/* Ignore incomplete xattrs */
+	if (attr_flags & XFS_ATTR_INCOMPLETE)
+		return 0;
+
+	/* Ignore anything that isn't a parent pointer. */
+	if (!(attr_flags & XFS_ATTR_PARENT))
+		return 0;
+
+	/* Does the ondisk parent pointer structure make sense? */
+	if (!xfs_parent_namecheck(sc->mp, rec, namelen, attr_flags)) {
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return -ECANCELED;
+	}
+
+	if (!xfs_parent_valuecheck(sc->mp, value, valuelen)) {
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return -ECANCELED;
+	}
+
+	xfs_parent_irec_from_disk(&pp->pptr, rec, value, valuelen);
+
+	error = xchk_parent_iget(pp, &dp);
+	if (error)
+		return error;
+	if (!dp)
+		return 0;
+
+	/* Try to lock the inode. */
+	lockmode = xchk_parent_lock_dir(sc, dp);
+	if (!lockmode) {
+		xchk_set_incomplete(sc);
+		error = -ECANCELED;
+		goto out_rele;
+	}
+
+	error = xchk_parent_dirent(pp, dp);
+	if (error)
+		goto out_unlock;
+
+out_unlock:
+	xfs_iunlock(dp, lockmode);
+out_rele:
+	xchk_irele(sc, dp);
+	return error;
+}
+
+/* Check parent pointers of a file. */
+STATIC int
+xchk_parent_pptr(
+	struct xfs_scrub	*sc)
+{
+	struct xchk_pptrs	*pp;
+	int			error;
+
+	pp = kvzalloc(sizeof(struct xchk_pptrs), XCHK_GFP_FLAGS);
+	if (!pp)
+		return -ENOMEM;
+	pp->sc = sc;
+
+	error = xchk_parent_dotdot(pp);
+	if (error)
+		goto out_pp;
+
+	error = xchk_xattr_walk(sc, sc->ip, xchk_parent_scan_attr, pp);
+	if (error == -ECANCELED) {
+		error = 0;
+		goto out_pp;
+	}
+	if (error)
+		goto out_pp;
+
+out_pp:
+	kvfree(pp);
+	return error;
+}
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 81d26be0ef3b..ac21759fc3e1 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -896,6 +896,39 @@ TRACE_EVENT(xchk_nlinks_live_update,
 		  __get_str(name))
 );
 
+TRACE_EVENT(xchk_parent_bad_dapos,
+	TP_PROTO(struct xfs_inode *ip, unsigned int p_diroffset,
+		 xfs_ino_t parent_ino, unsigned int dapos,
+		 const char *name, unsigned int namelen),
+	TP_ARGS(ip, p_diroffset, parent_ino, dapos, name, namelen),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, p_diroffset)
+		__field(xfs_ino_t, parent_ino)
+		__field(unsigned int, dapos)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, namelen)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->p_diroffset = p_diroffset;
+		__entry->parent_ino = parent_ino;
+		__entry->dapos = dapos;
+		__entry->namelen = namelen;
+		memcpy(__get_str(name), name, namelen);
+	),
+	TP_printk("dev %d:%d ino 0x%llx p_diroff 0x%x parent_ino 0x%llx parent_diroff 0x%x name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->p_diroffset,
+		  __entry->parent_ino,
+		  __entry->dapos,
+		  __entry->namelen,
+		  __get_str(name))
+);
+
 /* repair tracepoints */
 #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
 



* [PATCH 2/2] xfs: deferred scrub of parent pointers
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers Darrick J. Wong
  2023-02-16 20:49   ` [PATCH 1/2] xfs: scrub " Darrick J. Wong
@ 2023-02-16 20:50   ` Darrick J. Wong
  1 sibling, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:50 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If the trylock-based dirent check fails, retain those parent pointers
and check them at the end.  This may involve dropping the locks on the
file being scanned, so yay.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
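Note below the cut line, not part of the commit: the "trylock now,
retry the losers later" shape of this patch can be sketched in userspace
as follows.  This is a minimal model under stated assumptions, not the
kernel implementation.  The toy_* names are hypothetical, a C11
atomic_flag stands in for the parent directory's inode locks, and a
small fixed array stands in for the xfarray/xfblob pair.  Unlike
xchk_parent_slow_pptr(), the sketch does not drop the child's locks,
sleep, or revalidate; its second pass simply reports how many parents
are still contended.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_DEFERRED	8

/* Hypothetical parent directory guarded by a try-lockable flag. */
struct toy_parent {
	atomic_flag	lock;
	int		checked;	/* times the dirent check ran */
};

/* Hypothetical scan state: parents we could not lock the first time. */
struct toy_scan {
	struct toy_parent	*deferred[TOY_MAX_DEFERRED];
	size_t			nr_deferred;
};

/* Returns true if we took the lock, false if someone else holds it. */
static bool toy_trylock(struct toy_parent *p)
{
	return !atomic_flag_test_and_set(&p->lock);
}

static void toy_unlock(struct toy_parent *p)
{
	atomic_flag_clear(&p->lock);
}

/*
 * First pass: check the parent now if its lock is free, otherwise save
 * it for later.  Returns false only if the deferral list is full.
 */
static bool toy_scan_one(struct toy_scan *s, struct toy_parent *p)
{
	if (toy_trylock(p)) {
		p->checked++;
		toy_unlock(p);
		return true;
	}

	if (s->nr_deferred == TOY_MAX_DEFERRED)
		return false;
	s->deferred[s->nr_deferred++] = p;
	return true;
}

/* Second pass: retry every deferred parent; return how many are left. */
static size_t toy_finish_deferred(struct toy_scan *s)
{
	size_t	i, still_busy = 0;

	for (i = 0; i < s->nr_deferred; i++) {
		struct toy_parent	*p = s->deferred[i];

		if (!toy_trylock(p)) {
			/* Still contended; keep it for another round. */
			s->deferred[still_busy++] = p;
			continue;
		}
		p->checked++;
		toy_unlock(p);
	}
	s->nr_deferred = still_busy;
	return still_busy;
}
```

The point of deferring rather than blocking is that the first pass never
waits on a parent while holding the child's locks, which is what the
trylock in xchk_parent_lock_dir() is there to avoid.
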
 fs/xfs/Makefile            |    2 
 fs/xfs/libxfs/xfs_parent.c |   38 +++++++
 fs/xfs/libxfs/xfs_parent.h |   10 ++
 fs/xfs/scrub/parent.c      |  246 +++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/scrub/trace.h       |   33 ++++++
 5 files changed, 324 insertions(+), 5 deletions(-)


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index a32f6da27a86..0a908382d033 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -168,6 +168,7 @@ xfs-y				+= $(addprefix scrub/, \
 				   scrub.o \
 				   symlink.o \
 				   xfarray.o \
+				   xfblob.o \
 				   xfile.o \
 				   )
 
@@ -181,7 +182,6 @@ xfs-y				+= $(addprefix scrub/, \
 				   dir_repair.o \
 				   repair.o \
 				   tempfile.o \
-				   xfblob.o \
 				   )
 endif
 endif
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index fe6d4d1a7d57..36e1968337d5 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -298,3 +298,41 @@ xfs_pptr_calc_space_res(
 	       XFS_NEXTENTADD_SPACE_RES(mp, namelen, XFS_ATTR_FORK);
 }
 
+/*
+ * Look up the @name associated with the parent pointer (@pptr) of @ip.  Caller
+ * must hold at least ILOCK_SHARED.  Returns the length of the dirent name, or
+ * a negative errno.  The scratchpad need not be initialized.
+ */
+int
+xfs_parent_lookup(
+	struct xfs_trans		*tp,
+	struct xfs_inode		*ip,
+	const struct xfs_parent_name_irec *pptr,
+	unsigned char			*name,
+	unsigned int			namelen,
+	struct xfs_parent_scratch	*scr)
+{
+	int				error;
+
+	xfs_parent_irec_to_disk(&scr->rec, NULL, NULL, pptr);
+
+	memset(&scr->args, 0, sizeof(struct xfs_da_args));
+	scr->args.attr_filter	= XFS_ATTR_PARENT;
+	scr->args.dp		= ip;
+	scr->args.geo		= ip->i_mount->m_attr_geo;
+	scr->args.name		= (const unsigned char *)&scr->rec;
+	scr->args.namelen	= sizeof(struct xfs_parent_name_rec);
+	scr->args.op_flags	= XFS_DA_OP_OKNOENT;
+	scr->args.trans		= tp;
+	scr->args.valuelen	= namelen;
+	scr->args.value		= name;
+	scr->args.whichfork	= XFS_ATTR_FORK;
+
+	scr->args.hashval = xfs_da_hashname(scr->args.name, scr->args.namelen);
+
+	error = xfs_attr_get_ilocked(&scr->args);
+	if (error)
+		return error;
+
+	return scr->args.valuelen;
+}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 4eb92fb4b11b..cd1b135195a2 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -103,4 +103,14 @@ xfs_parent_finish(
 unsigned int xfs_pptr_calc_space_res(struct xfs_mount *mp,
 				     unsigned int namelen);
 
+/* Scratchpad memory so that raw parent operations don't burn stack space. */
+struct xfs_parent_scratch {
+	struct xfs_parent_name_rec	rec;
+	struct xfs_da_args		args;
+};
+
+int xfs_parent_lookup(struct xfs_trans *tp, struct xfs_inode *ip,
+		const struct xfs_parent_name_irec *pptr, unsigned char *name,
+		unsigned int namelen, struct xfs_parent_scratch *scratch);
+
 #endif	/* __XFS_PARENT_H__ */
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index 1bb196f2c1b2..056e11337cec 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -20,6 +20,9 @@
 #include "scrub/common.h"
 #include "scrub/readdir.h"
 #include "scrub/listxattr.h"
+#include "scrub/xfile.h"
+#include "scrub/xfarray.h"
+#include "scrub/xfblob.h"
 #include "scrub/trace.h"
 
 /* Set us up to scrub parents. */
@@ -302,14 +305,43 @@ xchk_parent(
  * to the child file.
  */
 
+/* Deferred parent pointer entry that we saved for later. */
+struct xchk_pptr {
+	/* Cookie for retrieval of the pptr name. */
+	xfblob_cookie			name_cookie;
+
+	/* Parent pointer attr key. */
+	xfs_ino_t			p_ino;
+	uint32_t			p_gen;
+	xfs_dir2_dataptr_t		p_diroffset;
+
+	/* Length of the pptr name. */
+	uint8_t				namelen;
+};
+
 struct xchk_pptrs {
 	struct xfs_scrub	*sc;
 
 	/* Scratch buffer for scanning pptr xattrs */
 	struct xfs_parent_name_irec pptr;
 
+	/* Fixed-size array of xchk_pptr structures. */
+	struct xfarray		*pptr_entries;
+
+	/* Blobs containing parent pointer names. */
+	struct xfblob		*pptr_names;
+
 	/* Parent of this directory. */
 	xfs_ino_t		parent_ino;
+
+	/* If we've cycled the ILOCK, we must revalidate all deferred pptrs. */
+	bool			need_revalidate;
+
+	/* xattr key and da args for parent pointer revalidation. */
+	struct xfs_parent_scratch pptr_scratch;
+
+	/* Name buffer for revalidation. */
+	uint8_t			namebuf[MAXNAMELEN];
 };
 
 /* Look up the dotdot entry so that we can check it as we walk the pptrs. */
@@ -528,8 +560,26 @@ xchk_parent_scan_attr(
 	/* Try to lock the inode. */
 	lockmode = xchk_parent_lock_dir(sc, dp);
 	if (!lockmode) {
-		xchk_set_incomplete(sc);
-		error = -ECANCELED;
+		struct xchk_pptr	save_pp = {
+			.p_ino		= pp->pptr.p_ino,
+			.p_gen		= pp->pptr.p_gen,
+			.p_diroffset	= pp->pptr.p_diroffset,
+			.namelen	= pp->pptr.p_namelen,
+		};
+
+		/* Couldn't lock the inode, so save the pptr for later. */
+		trace_xchk_parent_defer(sc->ip, pp->pptr.p_name,
+				pp->pptr.p_namelen, dp->i_ino);
+
+		error = xfblob_store(pp->pptr_names, &save_pp.name_cookie,
+				pp->pptr.p_name, pp->pptr.p_namelen);
+		if (!xchk_fblock_process_error(sc, XFS_ATTR_FORK, 0, &error))
+			goto out_rele;
+
+		error = xfarray_append(pp->pptr_entries, &save_pp);
+		if (!xchk_fblock_process_error(sc, XFS_ATTR_FORK, 0, &error))
+			goto out_rele;
+
 		goto out_rele;
 	}
 
@@ -544,6 +594,173 @@ xchk_parent_scan_attr(
 	return error;
 }
 
+/*
+ * Revalidate a parent pointer that we collected in the past but couldn't check
+ * because of lock contention.  Returns 0 if the parent pointer is still valid,
+ * -ENOENT if it has gone away on us, or a negative errno.
+ */
+STATIC int
+xchk_parent_revalidate_pptr(
+	struct xchk_pptrs	*pp)
+{
+	struct xfs_scrub	*sc = pp->sc;
+	int			namelen;
+
+	namelen = xfs_parent_lookup(sc->tp, sc->ip, &pp->pptr, pp->namebuf,
+			MAXNAMELEN, &pp->pptr_scratch);
+	if (namelen == -ENOATTR) {
+		/* Parent pointer went away, nothing to revalidate. */
+		return -ENOENT;
+	}
+	if (namelen < 0 && namelen != -EEXIST)
+		return namelen;
+
+	/*
+	 * The dirent name changed length while we were unlocked.  No need
+	 * to revalidate this.
+	 */
+	if (namelen != pp->pptr.p_namelen)
+		return -ENOENT;
+
+	/* The dirent name itself changed; there's nothing to revalidate. */
+	if (memcmp(pp->namebuf, pp->pptr.p_name, pp->pptr.p_namelen))
+		return -ENOENT;
+	return 0;
+}
+
+/*
+ * Check a parent pointer the slow way, which means we cycle locks a bunch
+ * and put up with revalidation until we get it done.
+ */
+STATIC int
+xchk_parent_slow_pptr(
+	struct xchk_pptrs	*pp,
+	struct xchk_pptr	*pptr)
+{
+	struct xfs_scrub	*sc = pp->sc;
+	struct xfs_inode	*dp = NULL;
+	unsigned int		lockmode;
+	int			error;
+
+	/* Restore the saved parent pointer into the irec. */
+	pp->pptr.p_ino = pptr->p_ino;
+	pp->pptr.p_gen = pptr->p_gen;
+	pp->pptr.p_diroffset = pptr->p_diroffset;
+
+	error = xfblob_load(pp->pptr_names, pptr->name_cookie, pp->pptr.p_name,
+			pptr->namelen);
+	if (error)
+		return error;
+	pp->pptr.p_name[MAXNAMELEN - 1] = 0;
+	pp->pptr.p_namelen = pptr->namelen;
+
+	/* Check that the deferred parent pointer still exists. */
+	if (pp->need_revalidate) {
+		error = xchk_parent_revalidate_pptr(pp);
+		if (error == -ENOENT)
+			return 0;
+		if (!xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0,
+					&error))
+			return error;
+	}
+
+	error = xchk_parent_iget(pp, &dp);
+	if (error)
+		return error;
+	if (!dp)
+		return 0;
+
+	/*
+	 * If we can grab both IOLOCK and ILOCK of the alleged parent, we
+	 * can proceed with the validation.
+	 */
+	lockmode = xchk_parent_lock_dir(sc, dp);
+	if (lockmode)
+		goto check_dirent;
+
+	/*
+	 * We couldn't lock the parent dir.  Drop all the locks and try to
+	 * get them again, one at a time.
+	 */
+	xchk_iunlock(sc, sc->ilock_flags);
+	pp->need_revalidate = true;
+
+	trace_xchk_parent_slowpath(sc->ip, pp->pptr.p_name, pptr->namelen,
+			dp->i_ino);
+
+	while (true) {
+		xchk_ilock(sc, XFS_IOLOCK_EXCL);
+		if (xfs_ilock_nowait(dp, XFS_IOLOCK_SHARED)) {
+			xchk_ilock(sc, XFS_ILOCK_EXCL);
+			if (xfs_ilock_nowait(dp, XFS_ILOCK_EXCL)) {
+				break;
+			}
+			xchk_iunlock(sc, XFS_ILOCK_EXCL);
+		}
+		xchk_iunlock(sc, XFS_IOLOCK_EXCL);
+
+		if (xchk_should_terminate(sc, &error))
+			goto out_rele;
+
+		delay(1);
+	}
+	lockmode = XFS_IOLOCK_SHARED | XFS_ILOCK_EXCL;
+
+	/*
+	 * If we didn't already find a parent pointer matching the dotdot
+	 * entry, re-query the dotdot entry so that we can validate it.
+	 */
+	if (pp->parent_ino != NULLFSINO) {
+		error = xchk_parent_dotdot(pp);
+		if (error)
+			goto out_unlock;
+	}
+
+	/* Revalidate the parent pointer now that we cycled locks. */
+	error = xchk_parent_revalidate_pptr(pp);
+	if (error == -ENOENT)
+		goto out_unlock;
+	if (!xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0, &error))
+		goto out_unlock;
+
+check_dirent:
+	error = xchk_parent_dirent(pp, dp);
+out_unlock:
+	xfs_iunlock(dp, lockmode);
+out_rele:
+	xchk_irele(sc, dp);
+	return error;
+}
+
+/* Check all the parent pointers that we deferred the first time around. */
+STATIC int
+xchk_parent_finish_slow_pptrs(
+	struct xchk_pptrs	*pp)
+{
+	xfarray_idx_t		array_cur;
+	int			error;
+
+	foreach_xfarray_idx(pp->pptr_entries, array_cur) {
+		struct xchk_pptr	pptr;
+
+		if (pp->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+			return 0;
+
+		error = xfarray_load(pp->pptr_entries, array_cur, &pptr);
+		if (error)
+			return error;
+
+		error = xchk_parent_slow_pptr(pp, &pptr);
+		if (error)
+			return error;
+	}
+
+	/* Empty out both xfiles now that we've checked everything. */
+	xfarray_truncate(pp->pptr_entries);
+	xfblob_truncate(pp->pptr_names);
+	return 0;
+}
+
 /* Check parent pointers of a file. */
 STATIC int
 xchk_parent_pptr(
@@ -561,14 +778,35 @@ xchk_parent_pptr(
 	if (error)
 		goto out_pp;
 
+	/*
+	 * Set up some staging memory for parent pointers that we can't check
+	 * due to locking contention.
+	 */
+	error = xfarray_create(sc->mp, "pptr entries", 0,
+			sizeof(struct xchk_pptr), &pp->pptr_entries);
+	if (error)
+		goto out_pp;
+
+	error = xfblob_create(sc->mp, "pptr names", &pp->pptr_names);
+	if (error)
+		goto out_entries;
+
 	error = xchk_xattr_walk(sc, sc->ip, xchk_parent_scan_attr, pp);
 	if (error == -ECANCELED) {
 		error = 0;
-		goto out_pp;
+		goto out_names;
 	}
 	if (error)
-		goto out_pp;
+		goto out_names;
 
+	error = xchk_parent_finish_slow_pptrs(pp);
+	if (error)
+		goto out_names;
+
+out_names:
+	xfblob_destroy(pp->pptr_names);
+out_entries:
+	xfarray_destroy(pp->pptr_entries);
 out_pp:
 	kvfree(pp);
 	return error;
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index ac21759fc3e1..61f18632cb6f 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -929,6 +929,39 @@ TRACE_EVENT(xchk_parent_bad_dapos,
 		  __get_str(name))
 );
 
+DECLARE_EVENT_CLASS(xchk_pptr_class,
+	TP_PROTO(struct xfs_inode *ip, const unsigned char *name,
+		 unsigned int namelen, xfs_ino_t parent_ino),
+	TP_ARGS(ip, name, namelen, parent_ino),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, namelen)
+		__field(xfs_ino_t, parent_ino)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->namelen = namelen;
+		memcpy(__get_str(name), name, namelen);
+		__entry->parent_ino = parent_ino;
+	),
+	TP_printk("dev %d:%d ino 0x%llx name '%.*s' parent_ino 0x%llx",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->namelen,
+		  __get_str(name),
+		  __entry->parent_ino)
+)
+#define DEFINE_XCHK_PPTR_CLASS(name) \
+DEFINE_EVENT(xchk_pptr_class, name, \
+	TP_PROTO(struct xfs_inode *ip, const unsigned char *name, \
+		 unsigned int namelen, xfs_ino_t parent_ino), \
+	TP_ARGS(ip, name, namelen, parent_ino))
+DEFINE_XCHK_PPTR_CLASS(xchk_parent_defer);
+DEFINE_XCHK_PPTR_CLASS(xchk_parent_slowpath);
+
 /* repair tracepoints */
 #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)
 



* [PATCH 1/3] xfs: repair parent pointers by scanning directories
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/3] xfs: online checking " Darrick J. Wong
@ 2023-02-16 20:50   ` Darrick J. Wong
  2023-02-16 20:50   ` [PATCH 2/3] xfs: repair parent pointers with live scan hooks Darrick J. Wong
  2023-02-16 20:50   ` [PATCH 3/3] xfs: compare generated and existing parent pointers Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:50 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Walk the filesystem to rebuild parent pointer information.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
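Note below the cut line, not part of the commit: the stash described in
the big comment in parent_repair.c (fixed-size update records in one
store, variable-length names in another, linked by a cookie) can be
modelled in userspace as follows.  This is a sketch under stated
assumptions.  The toy_* names are hypothetical; plain in-memory arrays
stand in for xfarray and xfblob, the cookie is simply a byte offset into
the name buffer, and the mutex that guards the real stash is omitted.

```c
#include <stdint.h>
#include <string.h>

#define TOY_PPTR_ADD		1
#define TOY_PPTR_REMOVE		2
#define TOY_MAX_RECS		16
#define TOY_BLOB_BYTES		256

/* Hypothetical fixed-size update record; the name lives in the blob. */
struct toy_update {
	uint64_t	p_ino;
	uint32_t	name_cookie;	/* byte offset into the blob */
	uint8_t		namelen;
	uint8_t		action;		/* TOY_PPTR_{ADD,REMOVE} */
};

struct toy_stash {
	struct toy_update	recs[TOY_MAX_RECS];
	unsigned int		nr_recs;
	unsigned char		blob[TOY_BLOB_BYTES];
	uint32_t		blob_used;
};

/* Record one update.  Returns 0, or -1 if the stash is full. */
static int
toy_stash_update(struct toy_stash *st, uint8_t action, uint64_t p_ino,
		 const char *name, uint8_t namelen)
{
	struct toy_update	*u;

	if (st->nr_recs == TOY_MAX_RECS ||
	    st->blob_used + namelen > TOY_BLOB_BYTES)
		return -1;

	/* Store the name first, then a record that points at it. */
	memcpy(st->blob + st->blob_used, name, namelen);

	u = &st->recs[st->nr_recs++];
	u->p_ino = p_ino;
	u->name_cookie = st->blob_used;
	u->namelen = namelen;
	u->action = action;
	st->blob_used += namelen;
	return 0;
}

/*
 * Copy record @idx and its NUL-terminated name back out.  @name must
 * have room for 256 bytes.  Returns 0, or -1 if @idx is out of range.
 */
static int
toy_stash_load(const struct toy_stash *st, unsigned int idx,
	       struct toy_update *u, char *name)
{
	if (idx >= st->nr_recs)
		return -1;

	*u = st->recs[idx];
	memcpy(name, st->blob + u->name_cookie, u->namelen);
	name[u->namelen] = '\0';
	return 0;
}

/* Empty both stores once every stashed update has been replayed. */
static void toy_stash_truncate(struct toy_stash *st)
{
	st->nr_recs = 0;
	st->blob_used = 0;
}
```

Splitting records from names keeps the record array fixed-width, so it
can be indexed directly, while names of any length up to 255 bytes cost
only what they need.
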
 fs/xfs/Makefile              |    1 
 fs/xfs/libxfs/xfs_parent.c   |   31 ++
 fs/xfs/libxfs/xfs_parent.h   |    4 
 fs/xfs/scrub/parent.c        |   10 +
 fs/xfs/scrub/parent_repair.c |  583 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/repair.h        |    4 
 fs/xfs/scrub/scrub.c         |    2 
 fs/xfs/scrub/trace.c         |    2 
 fs/xfs/scrub/trace.h         |   77 ++++++
 fs/xfs/xfs_inode.h           |    6 
 10 files changed, 717 insertions(+), 3 deletions(-)
 create mode 100644 fs/xfs/scrub/parent_repair.c


diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 0a908382d033..f0d5a517ca00 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -180,6 +180,7 @@ ifeq ($(CONFIG_XFS_ONLINE_REPAIR),y)
 xfs-y				+= $(addprefix scrub/, \
 				   agheader_repair.o \
 				   dir_repair.o \
+				   parent_repair.o \
 				   repair.o \
 				   tempfile.o \
 				   )
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 36e1968337d5..5f07bd3debee 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -94,11 +94,11 @@ xfs_parent_valuecheck(
 static inline void
 xfs_init_parent_name_rec(
 	struct xfs_parent_name_rec	*rec,
-	struct xfs_inode		*ip,
+	const struct xfs_inode		*ip,
 	uint32_t			p_diroffset)
 {
 	xfs_ino_t			p_ino = ip->i_ino;
-	uint32_t			p_gen = VFS_I(ip)->i_generation;
+	uint32_t			p_gen = VFS_IC(ip)->i_generation;
 
 	rec->p_ino = cpu_to_be64(p_ino);
 	rec->p_gen = cpu_to_be32(p_gen);
@@ -336,3 +336,30 @@ xfs_parent_lookup(
 
 	return scr->args.valuelen;
 }
+
+/*
+ * Attach the parent pointer (@pptr -> @name) to @ip immediately.  Caller must
+ * not have a transaction or hold the ILOCK.  The update will not use logged
+ * xattrs.  This is for specialized repair functions only.  The scratchpad need
+ * not be initialized.
+ */
+int
+xfs_parent_set(
+	struct xfs_inode		*ip,
+	const struct xfs_parent_name_irec *pptr,
+	struct xfs_parent_scratch	*scr)
+{
+	xfs_parent_irec_to_disk(&scr->rec, NULL, NULL, pptr);
+
+	memset(&scr->args, 0, sizeof(struct xfs_da_args));
+	scr->args.attr_filter	= XFS_ATTR_PARENT;
+	scr->args.dp		= ip;
+	scr->args.geo		= ip->i_mount->m_attr_geo;
+	scr->args.name		= (const unsigned char *)&scr->rec;
+	scr->args.namelen	= sizeof(struct xfs_parent_name_rec);
+	scr->args.valuelen	= pptr->p_namelen;
+	scr->args.value		= (void *)pptr->p_name;
+	scr->args.whichfork	= XFS_ATTR_FORK;
+
+	return xfs_attr_set(&scr->args);
+}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index cd1b135195a2..effbccdf6b0e 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -113,4 +113,8 @@ int xfs_parent_lookup(struct xfs_trans *tp, struct xfs_inode *ip,
 		const struct xfs_parent_name_irec *pptr, unsigned char *name,
 		unsigned int namelen, struct xfs_parent_scratch *scratch);
 
+int xfs_parent_set(struct xfs_inode *ip,
+		const struct xfs_parent_name_irec *pptr,
+		struct xfs_parent_scratch *scratch);
+
 #endif	/* __XFS_PARENT_H__ */
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index 056e11337cec..14f16fefd1b0 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -10,6 +10,7 @@
 #include "xfs_trans_resv.h"
 #include "xfs_mount.h"
 #include "xfs_log_format.h"
+#include "xfs_trans.h"
 #include "xfs_inode.h"
 #include "xfs_icache.h"
 #include "xfs_dir2.h"
@@ -24,12 +25,21 @@
 #include "scrub/xfarray.h"
 #include "scrub/xfblob.h"
 #include "scrub/trace.h"
+#include "scrub/repair.h"
 
 /* Set us up to scrub parents. */
 int
 xchk_setup_parent(
 	struct xfs_scrub	*sc)
 {
+	int			error;
+
+	if (xchk_could_repair(sc)) {
+		error = xrep_setup_parent(sc);
+		if (error)
+			return error;
+	}
+
 	return xchk_setup_inode_contents(sc, 0);
 }
 
diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
new file mode 100644
index 000000000000..d80d1a466c02
--- /dev/null
+++ b/fs/xfs/scrub/parent_repair.c
@@ -0,0 +1,583 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2023 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_defer.h"
+#include "xfs_bit.h"
+#include "xfs_log_format.h"
+#include "xfs_trans.h"
+#include "xfs_sb.h"
+#include "xfs_inode.h"
+#include "xfs_icache.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_dir2.h"
+#include "xfs_dir2_priv.h"
+#include "xfs_bmap.h"
+#include "xfs_quota.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_trans_space.h"
+#include "xfs_bmap_util.h"
+#include "xfs_attr.h"
+#include "xfs_parent.h"
+#include "scrub/xfs_scrub.h"
+#include "scrub/scrub.h"
+#include "scrub/common.h"
+#include "scrub/trace.h"
+#include "scrub/repair.h"
+#include "scrub/tempfile.h"
+#include "scrub/iscan.h"
+#include "scrub/readdir.h"
+#include "scrub/listxattr.h"
+#include "scrub/xfile.h"
+#include "scrub/xfarray.h"
+#include "scrub/xfblob.h"
+
+/*
+ * Parent Pointer Repairs
+ * ======================
+ *
+ * Reconstruct a file's parent pointers by visiting each dirent of each
+ * directory in the filesystem and translating the relevant dirents into parent
+ * pointers.  Translation occurs by adding new parent pointers to a temporary
+ * file, which formats the ondisk extended attribute blocks.  In the final
+ * version of this code, we'll use the atomic extent swap code to exchange the
+ * entire xattr structure of the file being repaired and the temporary file,
+ * but for this PoC we omit the commit to reduce the amount of code that has to
+ * be ported.
+ *
+ * Because we have to scan the entire filesystem, the next patch introduces the
+ * inode scan and live update hooks so that the rebuilder can be kept aware of
+ * filesystem updates being made to this file's parents by other threads.
+ * Parent pointer translation therefore requires two steps to avoid problems
+ * with lock contention and to keep ondisk tempdir updates out of the hook
+ * path.
+ *
+ * Every time the filesystem scanner or the live update hook code encounters a
+ * directory operation relevant to this rebuilder, they will write a record of
+ * the createpptr/removepptr operation to an xfarray.  Parent pointer names are
+ * stored in an xfblob structure.  At opportune times, these stashed updates
+ * will be read from the xfarray and committed (individually) to the temporary
+ * file's parent pointers.
+ *
+ * When the filesystem scan is complete, we relock both the file and the
+ * tempfile, and finish any stashed operations.  At that point, had we copied
+ * the extended attributes, we would be ready to exchange the attribute fork
+ * mappings.  This cannot happen until two patchsets get merged: the first
+ * allows callers to specify the owning inode number explicitly; and the second
+ * is the atomic extent swap series.
+ *
+ * For now we'll simply compare the two files' parent pointers and complain
+ * about discrepancies.
+ */
+
+/* Maximum memory usage for the tempfile log, in bytes. */
+#define MAX_PPTR_STASH_SIZE	(32ULL << 10)
+
+/* Create a parent pointer in the tempfile. */
+#define XREP_PPTR_ADD		(1)
+
+/* Remove a parent pointer from the tempfile. */
+#define XREP_PPTR_REMOVE	(2)
+
+/* A stashed parent pointer update. */
+struct xrep_pptr {
+	/* Cookie for retrieval of the pptr name. */
+	xfblob_cookie			name_cookie;
+
+	/* Parent pointer attr key. */
+	xfs_ino_t			p_ino;
+	uint32_t			p_gen;
+	xfs_dir2_dataptr_t		p_diroffset;
+
+	/* Length of the pptr name. */
+	uint8_t				namelen;
+
+	/* XREP_PPTR_{ADD,REMOVE} */
+	uint8_t				action;
+};
+
+struct xrep_pptrs {
+	struct xfs_scrub	*sc;
+
+	/* Inode scan cursor. */
+	struct xchk_iscan	iscan;
+
+	/* Scratch buffer for scanning dirents to create pptr xattrs */
+	struct xfs_parent_name_irec pptr;
+
+	/* xattr key and da args for parent pointer replay. */
+	struct xfs_parent_scratch pptr_scratch;
+
+	/* Mutex protecting parent_ptrs, pptr_names. */
+	struct mutex		lock;
+
+	/* Stashed parent pointer updates. */
+	struct xfarray		*parent_ptrs;
+
+	/* Parent pointer names. */
+	struct xfblob		*pptr_names;
+};
+
+/* Tear down all the incore stuff we created. */
+static void
+xrep_pptr_teardown(
+	struct xrep_pptrs	*rp)
+{
+	xchk_iscan_finish(&rp->iscan);
+	mutex_destroy(&rp->lock);
+	xfblob_destroy(rp->pptr_names);
+	xfarray_destroy(rp->parent_ptrs);
+}
+
+/* Set up for a parent pointer repair. */
+int
+xrep_setup_parent(
+	struct xfs_scrub	*sc)
+{
+	struct xrep_pptrs	*rp;
+	int			error;
+
+	error = xrep_tempfile_create(sc, S_IFREG);
+	if (error)
+		return error;
+
+	rp = kvzalloc(sizeof(struct xrep_pptrs), XCHK_GFP_FLAGS);
+	if (!rp)
+		return -ENOMEM;
+
+	sc->buf = rp;
+	rp->sc = sc;
+	return 0;
+}
+
+/* Are these two parent pointer names the same? */
+static inline bool
+xrep_pptr_samename(
+	const struct xfs_name	*n1,
+	const struct xfs_name	*n2)
+{
+	return n1->len == n2->len && !memcmp(n1->name, n2->name, n1->len);
+}
+
+/* Update the temporary file's parent pointers with a stashed update. */
+STATIC int
+xrep_pptr_replay_update(
+	struct xrep_pptrs	*rp,
+	const struct xrep_pptr	*pptr)
+{
+	struct xfs_scrub	*sc = rp->sc;
+
+	rp->pptr.p_ino = pptr->p_ino;
+	rp->pptr.p_gen = pptr->p_gen;
+	rp->pptr.p_diroffset = pptr->p_diroffset;
+	rp->pptr.p_namelen = pptr->namelen;
+
+	if (pptr->action == XREP_PPTR_ADD) {
+		/* Create parent pointer. */
+		trace_xrep_pptr_createname(sc->tempip, &rp->pptr);
+
+		return xfs_parent_set(sc->tempip, &rp->pptr, &rp->pptr_scratch);
+	}
+
+	ASSERT(0);
+	return -EOPNOTSUPP;
+}
+
+/*
+ * Flush stashed parent pointer updates that have been recorded by the scanner.
+ * This is done to reduce the memory requirements of the parent pointer
+ * rebuild, since files can have a lot of hardlinks and the fs can be busy.
+ *
+ * Caller must not hold transactions or ILOCKs.  Caller must hold the tempfile
+ * IOLOCK.
+ */
+STATIC int
+xrep_pptr_replay_updates(
+	struct xrep_pptrs	*rp)
+{
+	xfarray_idx_t		array_cur;
+	int			error;
+
+	mutex_lock(&rp->lock);
+	foreach_xfarray_idx(rp->parent_ptrs, array_cur) {
+		struct xrep_pptr	pptr;
+
+		error = xfarray_load(rp->parent_ptrs, array_cur, &pptr);
+		if (error)
+			goto out_unlock;
+
+		error = xfblob_load(rp->pptr_names, pptr.name_cookie,
+				rp->pptr.p_name, pptr.namelen);
+		if (error)
+			goto out_unlock;
+		rp->pptr.p_name[MAXNAMELEN - 1] = 0;
+		mutex_unlock(&rp->lock);
+
+		error = xrep_pptr_replay_update(rp, &pptr);
+		if (error)
+			return error;
+
+		mutex_lock(&rp->lock);
+	}
+
+	/* Empty out both arrays now that we've added the entries. */
+	xfarray_truncate(rp->parent_ptrs);
+	xfblob_truncate(rp->pptr_names);
+	mutex_unlock(&rp->lock);
+	return 0;
+out_unlock:
+	mutex_unlock(&rp->lock);
+	return error;
+}
+
+/*
+ * Remember that we want to create a parent pointer in the tempfile.  These
+ * stashed actions will be replayed later.
+ */
+STATIC int
+xrep_pptr_add_pointer(
+	struct xrep_pptrs	*rp,
+	const struct xfs_name	*name,
+	const struct xfs_inode	*dp,
+	xfs_dir2_dataptr_t	diroffset)
+{
+	struct xrep_pptr	pptr = {
+		.action		= XREP_PPTR_ADD,
+		.namelen	= name->len,
+		.p_ino		= dp->i_ino,
+		.p_gen		= VFS_IC(dp)->i_generation,
+		.p_diroffset	= diroffset,
+	};
+	int			error;
+
+	trace_xrep_pptr_add_pointer(rp->sc->tempip, dp, diroffset, name);
+
+	error = xfblob_store(rp->pptr_names, &pptr.name_cookie, name->name,
+			name->len);
+	if (error)
+		return error;
+
+	return xfarray_append(rp->parent_ptrs, &pptr);
+}
+
+/*
+ * Examine an entry of a directory.  If this dirent leads us back to the file
+ * whose parent pointers we're rebuilding, add a pptr to the temporary
+ * file.
+ */
+STATIC int
+xrep_pptr_scan_dirent(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*dp,
+	xfs_dir2_dataptr_t	dapos,
+	const struct xfs_name	*name,
+	xfs_ino_t		ino,
+	void			*priv)
+{
+	struct xrep_pptrs	*rp = priv;
+	int			error;
+
+	/* Dirent doesn't point to the file being repaired. */
+	if (ino != rp->sc->ip->i_ino)
+		return 0;
+
+	/* No weird looking names. */
+	if (!xfs_dir2_namecheck(name->name, name->len))
+		return -EFSCORRUPTED;
+
+	/* No mismatching ftypes. */
+	if (name->type != xfs_mode_to_ftype(VFS_I(sc->ip)->i_mode))
+		return -EFSCORRUPTED;
+
+	/* Don't pick up dot or dotdot entries; we only want child dirents. */
+	if (xrep_pptr_samename(name, &xfs_name_dotdot) ||
+	    xrep_pptr_samename(name, &xfs_name_dot))
+		return 0;
+
+	/*
+	 * Transform this dirent into a parent pointer and queue it for later
+	 * addition to the temporary file.
+	 */
+	mutex_lock(&rp->lock);
+	error = xrep_pptr_add_pointer(rp, name, dp, dapos);
+	mutex_unlock(&rp->lock);
+	return error;
+}
+
+/*
+ * Decide if we want to look for dirents in this directory.  Skip the file
+ * being repaired and any files being used to stage repairs.
+ */
+static inline bool
+xrep_pptr_want_scan(
+	struct xrep_pptrs	*rp,
+	const struct xfs_inode	*ip)
+{
+	return ip != rp->sc->ip && !xrep_is_tempfile(ip);
+}
+
+/*
+ * Take ILOCK on a file that we want to scan.
+ *
+ * Select ILOCK_EXCL if the file is a directory with an unloaded data bmbt.
+ * Otherwise, take ILOCK_SHARED.
+ */
+static inline unsigned int
+xrep_pptr_scan_ilock(
+	struct xrep_pptrs	*rp,
+	struct xfs_inode	*ip)
+{
+	uint			lock_mode = XFS_ILOCK_SHARED;
+
+	/* Still need to take the shared ILOCK to advance the iscan cursor. */
+	if (!xrep_pptr_want_scan(rp, ip))
+		goto lock;
+
+	if (S_ISDIR(VFS_I(ip)->i_mode) && xfs_need_iread_extents(&ip->i_df)) {
+		lock_mode = XFS_ILOCK_EXCL;
+		goto lock;
+	}
+
+lock:
+	xfs_ilock(ip, lock_mode);
+	return lock_mode;
+}
+
+/*
+ * Scan this file for relevant child dirents that point to the file whose
+ * parent pointers we're rebuilding.
+ */
+STATIC int
+xrep_pptr_scan_file(
+	struct xrep_pptrs	*rp,
+	struct xfs_inode	*ip)
+{
+	unsigned int		lock_mode;
+	int			error = 0;
+
+	lock_mode = xrep_pptr_scan_ilock(rp, ip);
+
+	if (!xrep_pptr_want_scan(rp, ip))
+		goto scan_done;
+
+	if (S_ISDIR(VFS_I(ip)->i_mode)) {
+		error = xchk_dir_walk(rp->sc, ip, xrep_pptr_scan_dirent, rp);
+		if (error)
+			goto scan_done;
+	}
+
+scan_done:
+	xchk_iscan_mark_visited(&rp->iscan, ip);
+	xfs_iunlock(ip, lock_mode);
+	return error;
+}
+
+/* Scan all files in the filesystem for parent pointers. */
+STATIC int
+xrep_pptr_scan_dirtree(
+	struct xrep_pptrs	*rp)
+{
+	struct xfs_scrub	*sc = rp->sc;
+	struct xfs_inode	*ip;
+	int			error;
+
+	/*
+	 * Filesystem scans are time consuming.  Drop the file ILOCK and all
+	 * other resources for the duration of the scan and hope for the best.
+	 */
+	xchk_trans_cancel(sc);
+	if (sc->ilock_flags & (XFS_ILOCK_SHARED | XFS_ILOCK_EXCL))
+		xchk_iunlock(sc, sc->ilock_flags & (XFS_ILOCK_SHARED |
+						    XFS_ILOCK_EXCL));
+	error = xchk_trans_alloc_empty(sc);
+	if (error)
+		return error;
+
+	while ((error = xchk_iscan_iter(sc, &rp->iscan, &ip)) == 1) {
+		uint64_t	mem_usage;
+
+		error = xrep_pptr_scan_file(rp, ip);
+		xchk_irele(sc, ip);
+		if (error)
+			break;
+
+		/* Flush stashed pptr updates to constrain memory usage. */
+		mutex_lock(&rp->lock);
+		mem_usage = xfarray_bytes(rp->parent_ptrs) +
+			     xfblob_bytes(rp->pptr_names);
+		mutex_unlock(&rp->lock);
+		if (mem_usage >= MAX_PPTR_STASH_SIZE) {
+			xchk_trans_cancel(sc);
+
+			error = xrep_tempfile_iolock_polled(sc);
+			if (error)
+				break;
+
+			error = xrep_pptr_replay_updates(rp);
+			xrep_tempfile_iounlock(sc);
+			if (error)
+				break;
+
+			error = xchk_trans_alloc_empty(sc);
+			if (error)
+				break;
+		}
+
+		if (xchk_should_terminate(sc, &error))
+			break;
+	}
+	if (error) {
+		/*
+		 * If we couldn't grab an inode that was busy with a state
+		 * change, change the error code so that we exit to userspace
+		 * as quickly as possible.
+		 */
+		if (error == -EBUSY)
+			return -ECANCELED;
+		return error;
+	}
+
+	return 0;
+}
+
+/* Dump a parent pointer from the temporary file. */
+STATIC int
+xrep_pptr_dump_tempptr(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen,
+	void			*priv)
+{
+	struct xrep_pptrs	*rp = priv;
+	const struct xfs_parent_name_rec *rec = (const void *)name;
+
+	if (!(attr_flags & XFS_ATTR_PARENT))
+		return 0;
+
+	xfs_parent_irec_from_disk(&rp->pptr, rec, value, valuelen);
+
+	trace_xrep_pptr_dumpname(sc->tempip, &rp->pptr);
+	return 0;
+}
+
+/*
+ * "Commit" the new parent pointer (aka extended attribute) structure to the
+ * file that we're repairing.
+ *
+ * In the final version, we'd copy the existing xattrs from the file being
+ * repaired to the temporary file and swap the new xattr contents (which we
+ * created in the tempfile) into the file being repaired.  For now we just lock
+ * the temporary file and dump what we found.
+ */
+STATIC int
+xrep_pptr_rebuild_tree(
+	struct xrep_pptrs	*rp)
+{
+	struct xfs_scrub	*sc = rp->sc;
+	int			error = 0;
+
+	/*
+	 * Replay the last of the stashed parent pointer updates after
+	 * retaking IOLOCK_EXCL of the file that we're repairing and the
+	 * temporary file.
+	 */
+	xchk_trans_cancel(sc);
+
+	ASSERT(sc->ilock_flags & XFS_IOLOCK_EXCL);
+	error = xrep_tempfile_iolock_polled(sc);
+	if (error)
+		return error;
+
+	error = xrep_pptr_replay_updates(rp);
+	if (error)
+		return error;
+
+	/*
+	 * At this point, we've quiesced both files and should be ready
+	 * to commit the new contents.
+	 *
+	 * We don't have atomic swapext here, so all we do is dump the pptrs
+	 * that we found to the ftrace buffer.  Inactivation of the tempfile
+	 * will erase the attr fork for us.
+	 */
+	error = xchk_trans_alloc(sc, 0);
+	if (error)
+		return error;
+
+	trace_xrep_pptr_rebuild_tree(sc->ip, 0);
+
+	xrep_tempfile_ilock(sc);
+	return xchk_xattr_walk(sc, sc->tempip, xrep_pptr_dump_tempptr, rp);
+}
+
+/* Set up the filesystem scan so we can look for pptrs. */
+STATIC int
+xrep_pptr_setup_scan(
+	struct xrep_pptrs	*rp)
+{
+	struct xfs_scrub	*sc = rp->sc;
+	int			error;
+
+	/* Set up some staging memory for logging parent pointers. */
+	error = xfarray_create(sc->mp, "parent pointers", 0,
+			sizeof(struct xrep_pptr), &rp->parent_ptrs);
+	if (error)
+		return error;
+
+	error = xfblob_create(sc->mp, "pptr names", &rp->pptr_names);
+	if (error)
+		goto out_entries;
+
+	mutex_init(&rp->lock);
+
+	/* Retry iget every tenth of a second for up to 30 seconds. */
+	xchk_iscan_start(&rp->iscan, 30000, 100);
+
+	return 0;
+
+out_entries:
+	xfarray_destroy(rp->parent_ptrs);
+	return error;
+}
+
+/* Repair the parent pointers. */
+int
+xrep_parent(
+	struct xfs_scrub	*sc)
+{
+	struct xrep_pptrs	*rp = sc->buf;
+	int			error = 0;
+
+	/* We require directory parent pointers to rebuild anything. */
+	if (!xfs_has_parent(sc->mp))
+		return -EOPNOTSUPP;
+
+	error = xrep_pptr_setup_scan(rp);
+	if (error)
+		goto out;
+
+	error = xrep_pptr_scan_dirtree(rp);
+	if (error)
+		goto out_finish_scan;
+
+	error = xrep_pptr_rebuild_tree(rp);
+	if (error)
+		goto out_finish_scan;
+
+out_finish_scan:
+	xrep_pptr_teardown(rp);
+out:
+	return error;
+}
diff --git a/fs/xfs/scrub/repair.h b/fs/xfs/scrub/repair.h
index ff254ff9b86d..cc42cf65ac92 100644
--- a/fs/xfs/scrub/repair.h
+++ b/fs/xfs/scrub/repair.h
@@ -68,6 +68,7 @@ void xrep_force_quotacheck(struct xfs_scrub *sc, xfs_dqtype_t type);
 int xrep_ino_dqattach(struct xfs_scrub *sc);
 
 int xrep_setup_directory(struct xfs_scrub *sc);
+int xrep_setup_parent(struct xfs_scrub *sc);
 
 /* Metadata repairers */
 
@@ -77,6 +78,7 @@ int xrep_agf(struct xfs_scrub *sc);
 int xrep_agfl(struct xfs_scrub *sc);
 int xrep_agi(struct xfs_scrub *sc);
 int xrep_directory(struct xfs_scrub *sc);
+int xrep_parent(struct xfs_scrub *sc);
 
 #else
 
@@ -97,6 +99,7 @@ xrep_calc_ag_resblks(
 }
 
 #define xrep_setup_directory(sc)	(0)
+#define xrep_setup_parent(sc)		(0)
 
 #define xrep_probe			xrep_notsupported
 #define xrep_superblock			xrep_notsupported
@@ -104,6 +107,7 @@ xrep_calc_ag_resblks(
 #define xrep_agfl			xrep_notsupported
 #define xrep_agi			xrep_notsupported
 #define xrep_directory			xrep_notsupported
+#define xrep_parent			xrep_notsupported
 
 #endif /* CONFIG_XFS_ONLINE_REPAIR */
 
diff --git a/fs/xfs/scrub/scrub.c b/fs/xfs/scrub/scrub.c
index b2a8de449d11..5ddb4dcff978 100644
--- a/fs/xfs/scrub/scrub.c
+++ b/fs/xfs/scrub/scrub.c
@@ -317,7 +317,7 @@ static const struct xchk_meta_ops meta_scrub_ops[] = {
 		.type	= ST_INODE,
 		.setup	= xchk_setup_parent,
 		.scrub	= xchk_parent,
-		.repair	= xrep_notsupported,
+		.repair	= xrep_parent,
 	},
 	[XFS_SCRUB_TYPE_RTBITMAP] = {	/* realtime bitmap */
 		.type	= ST_FS,
diff --git a/fs/xfs/scrub/trace.c b/fs/xfs/scrub/trace.c
index 61b51617fbb4..30946b6a16dd 100644
--- a/fs/xfs/scrub/trace.c
+++ b/fs/xfs/scrub/trace.c
@@ -18,6 +18,8 @@
 #include "scrub/xfarray.h"
 #include "scrub/iscan.h"
 #include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_parent.h"
 
 /* Figure out which block the btree cursor was pointing to. */
 static inline xfs_fsblock_t
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 61f18632cb6f..1fb9832a113a 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -19,6 +19,7 @@
 struct xfile;
 struct xfarray;
 struct xchk_iscan;
+struct xfs_parent_name_irec;
 
 /*
  * ftrace's __print_symbolic requires that all enum values be wrapped in the
@@ -1316,6 +1317,82 @@ DEFINE_EVENT(xrep_dir_class, name, \
 	TP_PROTO(struct xfs_inode *dp, xfs_ino_t parent_ino), \
 	TP_ARGS(dp, parent_ino))
 DEFINE_XREP_DIR_CLASS(xrep_dir_rebuild_tree);
+DEFINE_XREP_DIR_CLASS(xrep_pptr_rebuild_tree);
+
+DECLARE_EVENT_CLASS(xrep_pptr_class,
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_parent_name_irec *pptr),
+	TP_ARGS(ip, pptr),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_ino_t, parent_ino)
+		__field(unsigned int, parent_gen)
+		__field(unsigned int, parent_diroffset)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, pptr->p_namelen)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->parent_ino = pptr->p_ino;
+		__entry->parent_gen = pptr->p_gen;
+		__entry->parent_diroffset = pptr->p_diroffset;
+		__entry->namelen = pptr->p_namelen;
+		memcpy(__get_str(name), pptr->p_name, pptr->p_namelen);
+	),
+	TP_printk("dev %d:%d ino 0x%llx parent_ino 0x%llx parent_gen 0x%x parent_dapos 0x%x name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->parent_ino,
+		  __entry->parent_gen,
+		  __entry->parent_diroffset,
+		  __entry->namelen,
+		  __get_str(name))
+)
+#define DEFINE_XREP_PPTR_CLASS(name) \
+DEFINE_EVENT(xrep_pptr_class, name, \
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_parent_name_irec *pptr), \
+	TP_ARGS(ip, pptr))
+DEFINE_XREP_PPTR_CLASS(xrep_pptr_createname);
+DEFINE_XREP_PPTR_CLASS(xrep_pptr_dumpname);
+
+DECLARE_EVENT_CLASS(xrep_pptr_scan_class,
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_inode *dp,
+		 unsigned int diroffset, const struct xfs_name *name),
+	TP_ARGS(ip, dp, diroffset, name),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_ino_t, ino)
+		__field(xfs_ino_t, parent_ino)
+		__field(unsigned int, parent_gen)
+		__field(unsigned int, parent_diroffset)
+		__field(unsigned int, namelen)
+		__dynamic_array(char, name, name->len)
+	),
+	TP_fast_assign(
+		__entry->dev = ip->i_mount->m_super->s_dev;
+		__entry->ino = ip->i_ino;
+		__entry->parent_ino = dp->i_ino;
+		__entry->parent_gen = VFS_IC(dp)->i_generation;
+		__entry->parent_diroffset = diroffset;
+		__entry->namelen = name->len;
+		memcpy(__get_str(name), name->name, name->len);
+	),
+	TP_printk("dev %d:%d ino 0x%llx parent_ino 0x%llx parent_gen 0x%x parent_dapos 0x%x name '%.*s'",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->ino,
+		  __entry->parent_ino,
+		  __entry->parent_gen,
+		  __entry->parent_diroffset,
+		  __entry->namelen,
+		  __get_str(name))
+)
+#define DEFINE_XREP_PPTR_SCAN_CLASS(name) \
+DEFINE_EVENT(xrep_pptr_scan_class, name, \
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_inode *dp, \
+		 unsigned int diroffset, const struct xfs_name *name), \
+	TP_ARGS(ip, dp, diroffset, name))
+DEFINE_XREP_PPTR_SCAN_CLASS(xrep_pptr_add_pointer);
 
 #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */
 
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 403b0f4cb5c0..32efd4cb6507 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -153,6 +153,12 @@ static inline struct inode *VFS_I(struct xfs_inode *ip)
 	return &ip->i_vnode;
 }
 
+/* convert from const xfs inode to vfs inode */
+static inline const struct inode *VFS_IC(const struct xfs_inode *ip)
+{
+	return &ip->i_vnode;
+}
+
 /*
  * For regular files we only update the on-disk filesize when actually
  * writing data back to disk.  Until then only the copy in the VFS inode


^ permalink raw reply related	[flat|nested] 227+ messages in thread
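
The stash-and-replay scheme described in the header comment of the patch
above can be sketched in userspace as follows.  This is only a rough model
under stated assumptions: the fixed-size arrays stand in for the xfarray and
xfblob, every name here (stash_rec, stash_add, stash_replay, stash_update,
STASH_FLUSH_BYTES) is hypothetical, the flush is checked after every update
rather than after every scanned file, and there is no locking because the
sketch is single-threaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, fixed-size stand-ins for the xfarray and the xfblob. */
#define STASH_MAX_RECS		64
#define STASH_MAX_BLOB		4096
#define STASH_FLUSH_BYTES	256	/* plays the role of MAX_PPTR_STASH_SIZE */

struct stash_rec {
	uint64_t	p_ino;		/* parent directory inode number */
	uint32_t	p_gen;		/* parent generation */
	uint32_t	p_diroffset;	/* dirent offset within the parent */
	uint32_t	name_cookie;	/* offset of the name in the blob */
	uint8_t		namelen;
};

struct stash {
	struct stash_rec	recs[STASH_MAX_RECS];
	size_t			nr_recs;
	unsigned char		blob[STASH_MAX_BLOB];
	size_t			blob_used;
	size_t			nr_replayed;	/* updates applied so far */
};

/* Memory consumed by the stashed records and their names. */
static size_t stash_bytes(const struct stash *s)
{
	return s->nr_recs * sizeof(struct stash_rec) + s->blob_used;
}

/* Stash one update: the name goes to the blob, the key to the array. */
static int stash_add(struct stash *s, uint64_t ino, uint32_t gen,
		     uint32_t diroffset, const char *name, uint8_t namelen)
{
	struct stash_rec *rec;

	if (s->nr_recs == STASH_MAX_RECS ||
	    s->blob_used + namelen > STASH_MAX_BLOB)
		return -1;

	rec = &s->recs[s->nr_recs++];
	rec->p_ino = ino;
	rec->p_gen = gen;
	rec->p_diroffset = diroffset;
	rec->name_cookie = (uint32_t)s->blob_used;
	rec->namelen = namelen;
	memcpy(s->blob + s->blob_used, name, namelen);
	s->blob_used += namelen;
	return 0;
}

/* Apply every stashed update in order, then empty both containers. */
static void stash_replay(struct stash *s)
{
	size_t i;

	for (i = 0; i < s->nr_recs; i++) {
		const struct stash_rec *rec = &s->recs[i];

		/* The cookie must point at a name fully inside the blob. */
		assert(rec->name_cookie + rec->namelen <= s->blob_used);
		s->nr_replayed++;	/* real code would update the xattr here */
	}
	s->nr_recs = 0;
	s->blob_used = 0;
}

/* Scanner side: stash the update, then flush once the budget is reached. */
static int stash_update(struct stash *s, uint64_t ino, uint32_t gen,
			uint32_t diroffset, const char *name, uint8_t namelen)
{
	int error = stash_add(s, ino, gen, diroffset, name, namelen);

	if (!error && stash_bytes(s) >= STASH_FLUSH_BYTES)
		stash_replay(s);
	return error;
}
```

Keeping the names in a separate blob lets the array records stay fixed-size,
and truncating both containers after each replay is what bounds the memory
footprint no matter how many hardlinks the file has.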

* [PATCH 2/3] xfs: repair parent pointers with live scan hooks
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/3] xfs: online checking " Darrick J. Wong
  2023-02-16 20:50   ` [PATCH 1/3] xfs: repair parent pointers by scanning directories Darrick J. Wong
@ 2023-02-16 20:50   ` Darrick J. Wong
  2023-02-16 20:50   ` [PATCH 3/3] xfs: compare generated and existing parent pointers Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:50 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Use the dirent update hooks to keep our tempfile's parent pointers up to date.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_parent.c   |   25 ++++++++++
 fs/xfs/libxfs/xfs_parent.h   |    4 ++
 fs/xfs/scrub/parent_repair.c |  110 ++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.h         |    2 +
 4 files changed, 141 insertions(+)
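
The filtering done by the live update hook in this patch can be modelled
roughly as below.  This is a sketch under loud assumptions, not the kernel
logic: scan_state, scan_want_live_update and classify_live_update are
hypothetical names, and "already scanned" is reduced to "inode number below
the cursor", which is much simpler than what the real inode scan tracks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum pptr_action {
	PPTR_IGNORE = 0,	/* not relevant to this repair */
	PPTR_ADD,		/* stash a parent pointer creation */
	PPTR_REMOVE,		/* stash a parent pointer removal */
};

/* Assumption: every inode number below the cursor has been scanned. */
struct scan_state {
	uint64_t	cursor_ino;
	bool		aborted;
};

/* Only directories the scanner has already visited need live updates. */
static bool scan_want_live_update(const struct scan_state *scan,
				  uint64_t dir_ino)
{
	return !scan->aborted && dir_ino < scan->cursor_ino;
}

/* Decide what, if anything, to stash for one dirent update. */
static enum pptr_action
classify_live_update(const struct scan_state *scan, uint64_t target_ino,
		     uint64_t child_ino, uint64_t dir_ino, int delta)
{
	/* The dirent does not point at the file being repaired. */
	if (child_ino != target_ino)
		return PPTR_IGNORE;

	/* The scanner will reach this directory later and see the result. */
	if (!scan_want_live_update(scan, dir_ino))
		return PPTR_IGNORE;

	return delta > 0 ? PPTR_ADD : PPTR_REMOVE;
}
```

Ignoring directories that have not been scanned yet avoids recording the
same dirent twice: once from the hook and once more when the scanner gets
to that directory.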


diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 5f07bd3debee..a2575bf44c89 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -363,3 +363,28 @@ xfs_parent_set(
 
 	return xfs_attr_set(&scr->args);
 }
+
+/*
+ * Remove the parent pointer (@pptr) from @ip immediately.  Caller must
+ * not have a transaction or hold the ILOCK.  The update will not use logged
+ * xattrs.  This is for specialized repair functions only.  The scratchpad need
+ * not be initialized.
+ */
+int
+xfs_parent_unset(
+	struct xfs_inode		*ip,
+	const struct xfs_parent_name_irec *pptr,
+	struct xfs_parent_scratch	*scr)
+{
+	xfs_parent_irec_to_disk(&scr->rec, NULL, NULL, pptr);
+
+	memset(&scr->args, 0, sizeof(struct xfs_da_args));
+	scr->args.attr_filter	= XFS_ATTR_PARENT;
+	scr->args.dp		= ip;
+	scr->args.geo		= ip->i_mount->m_attr_geo;
+	scr->args.name		= (const unsigned char *)&scr->rec;
+	scr->args.namelen	= sizeof(struct xfs_parent_name_rec);
+	scr->args.whichfork	= XFS_ATTR_FORK;
+
+	return xfs_attr_set(&scr->args);
+}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index effbccdf6b0e..a7fc621b82c4 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -117,4 +117,8 @@ int xfs_parent_set(struct xfs_inode *ip,
 		const struct xfs_parent_name_irec *pptr,
 		struct xfs_parent_scratch *scratch);
 
+int xfs_parent_unset(struct xfs_inode *ip,
+		const struct xfs_parent_name_irec *pptr,
+		struct xfs_parent_scratch *scratch);
+
 #endif	/* __XFS_PARENT_H__ */
diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
index d80d1a466c02..4aec32081c6d 100644
--- a/fs/xfs/scrub/parent_repair.c
+++ b/fs/xfs/scrub/parent_repair.c
@@ -119,6 +119,9 @@ struct xrep_pptrs {
 	/* Mutex protecting parent_ptrs, pptr_names. */
 	struct mutex		lock;
 
+	/* Hook to capture directory entry updates. */
+	struct xfs_dirent_hook	hooks;
+
 	/* Stashed parent pointer updates. */
 	struct xfarray		*parent_ptrs;
 
@@ -131,6 +134,7 @@ static void
 xrep_pptr_teardown(
 	struct xrep_pptrs	*rp)
 {
+	xfs_dirent_hook_del(rp->sc->mp, &rp->hooks);
 	xchk_iscan_finish(&rp->iscan);
 	mutex_destroy(&rp->lock);
 	xfblob_destroy(rp->pptr_names);
@@ -145,6 +149,8 @@ xrep_setup_parent(
 	struct xrep_pptrs	*rp;
 	int			error;
 
+	xchk_fshooks_enable(sc, XCHK_FSHOOKS_DIRENTS);
+
 	error = xrep_tempfile_create(sc, S_IFREG);
 	if (error)
 		return error;
@@ -185,6 +191,12 @@ xrep_pptr_replay_update(
 		trace_xrep_pptr_createname(sc->tempip, &rp->pptr);
 
 		return xfs_parent_set(sc->tempip, &rp->pptr, &rp->pptr_scratch);
+	} else if (pptr->action == XREP_PPTR_REMOVE) {
+		/* Remove parent pointer. */
+		trace_xrep_pptr_removename(sc->tempip, &rp->pptr);
+
+		return xfs_parent_unset(sc->tempip, &rp->pptr,
+				&rp->pptr_scratch);
 	}
 
 	ASSERT(0);
@@ -268,6 +280,36 @@ xrep_pptr_add_pointer(
 	return xfarray_append(rp->parent_ptrs, &pptr);
 }
 
+/*
+ * Remember that we want to remove a parent pointer from the tempfile.  These
+ * stashed actions will be replayed later.
+ */
+STATIC int
+xrep_pptr_remove_pointer(
+	struct xrep_pptrs	*rp,
+	const struct xfs_name	*name,
+	const struct xfs_inode	*dp,
+	xfs_dir2_dataptr_t	diroffset)
+{
+	struct xrep_pptr	pptr = {
+		.action		= XREP_PPTR_REMOVE,
+		.namelen	= name->len,
+		.p_ino		= dp->i_ino,
+		.p_gen		= VFS_IC(dp)->i_generation,
+		.p_diroffset	= diroffset,
+	};
+	int			error;
+
+	trace_xrep_pptr_remove_pointer(rp->sc->tempip, dp, diroffset, name);
+
+	error = xfblob_store(rp->pptr_names, &pptr.name_cookie, name->name,
+			name->len);
+	if (error)
+		return error;
+
+	return xfarray_append(rp->parent_ptrs, &pptr);
+}
+
 /*
  * Examine an entry of a directory.  If this dirent leads us back to the file
  * whose parent pointers we're rebuilding, add a pptr to the temporary
@@ -500,6 +542,12 @@ xrep_pptr_rebuild_tree(
 	if (error)
 		return error;
 
+	/*
+	 * Abort the inode scan so that the live hooks won't stash any more
+	 * directory updates.
+	 */
+	xchk_iscan_abort(&rp->iscan);
+
 	error = xrep_pptr_replay_updates(rp);
 	if (error)
 		return error;
@@ -522,6 +570,52 @@ xrep_pptr_rebuild_tree(
 	return xchk_xattr_walk(sc, sc->tempip, xrep_pptr_dump_tempptr, rp);
 }
 
+/*
+ * Capture dirent updates being made by other threads which are relevant to the
+ * file being repaired.
+ */
+STATIC int
+xrep_pptr_live_update(
+	struct notifier_block		*nb,
+	unsigned long			action,
+	void				*data)
+{
+	struct xfs_dirent_update_params	*p = data;
+	struct xrep_pptrs		*rp;
+	struct xfs_scrub		*sc;
+	int				error;
+
+	rp = container_of(nb, struct xrep_pptrs, hooks.delta_hook.nb);
+	sc = rp->sc;
+
+	if (action != XFS_DIRENT_CHILD_DELTA)
+		return NOTIFY_DONE;
+
+	/*
+	 * This thread updated a dirent that points to the file that we're
+	 * repairing, so stash the update for replay against the temporary
+	 * file.
+	 */
+	if (p->ip->i_ino == sc->ip->i_ino &&
+	    xchk_iscan_want_live_update(&rp->iscan, p->dp->i_ino)) {
+		mutex_lock(&rp->lock);
+		if (p->delta > 0)
+			error = xrep_pptr_add_pointer(rp, p->name, p->dp,
+					p->diroffset);
+		else
+			error = xrep_pptr_remove_pointer(rp, p->name, p->dp,
+					p->diroffset);
+		mutex_unlock(&rp->lock);
+		if (error)
+			goto out_abort;
+	}
+
+	return NOTIFY_DONE;
+out_abort:
+	xchk_iscan_abort(&rp->iscan);
+	return NOTIFY_DONE;
+}
+
 /* Set up the filesystem scan so we can look for pptrs. */
 STATIC int
 xrep_pptr_setup_scan(
@@ -545,8 +639,24 @@ xrep_pptr_setup_scan(
 	/* Retry iget every tenth of a second for up to 30 seconds. */
 	xchk_iscan_start(&rp->iscan, 30000, 100);
 
+	/*
+	 * Hook into the dirent update code.  The hook only operates on inodes
+	 * that were already scanned, and the scanner thread takes each inode's
+	 * ILOCK, which means that any in-progress inode updates will finish
+	 * before we can scan the inode.
+	 */
+	ASSERT(sc->flags & XCHK_FSHOOKS_DIRENTS);
+	xfs_hook_setup(&rp->hooks.delta_hook, xrep_pptr_live_update);
+	error = xfs_dirent_hook_add(sc->mp, &rp->hooks);
+	if (error)
+		goto out_scan;
+
 	return 0;
 
+out_scan:
+	xchk_iscan_finish(&rp->iscan);
+	mutex_destroy(&rp->lock);
+	xfblob_destroy(rp->pptr_names);
 out_entries:
 	xfarray_destroy(rp->parent_ptrs);
 	return error;
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 1fb9832a113a..283a1cedf368 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -1354,6 +1354,7 @@ DEFINE_EVENT(xrep_pptr_class, name, \
 	TP_PROTO(struct xfs_inode *ip, const struct xfs_parent_name_irec *pptr), \
 	TP_ARGS(ip, pptr))
 DEFINE_XREP_PPTR_CLASS(xrep_pptr_createname);
+DEFINE_XREP_PPTR_CLASS(xrep_pptr_removename);
 DEFINE_XREP_PPTR_CLASS(xrep_pptr_dumpname);
 
 DECLARE_EVENT_CLASS(xrep_pptr_scan_class,
@@ -1393,6 +1394,7 @@ DEFINE_EVENT(xrep_pptr_scan_class, name, \
 		 unsigned int diroffset, const struct xfs_name *name), \
 	TP_ARGS(ip, dp, diroffset, name))
 DEFINE_XREP_PPTR_SCAN_CLASS(xrep_pptr_add_pointer);
+DEFINE_XREP_PPTR_SCAN_CLASS(xrep_pptr_remove_pointer);
 
 #endif /* IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR) */
 


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 3/3] xfs: compare generated and existing parent pointers
  2023-02-16 20:27 ` [PATCHSET v9r2d1 0/3] xfs: online checking " Darrick J. Wong
  2023-02-16 20:50   ` [PATCH 1/3] xfs: repair parent pointers by scanning directories Darrick J. Wong
  2023-02-16 20:50   ` [PATCH 2/3] xfs: repair parent pointers with live scan hooks Darrick J. Wong
@ 2023-02-16 20:50   ` Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:50 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Check our work to make sure we found all the parent pointers that the
original file had.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/parent_repair.c |   52 ++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/scrub/trace.h         |    1 +
 2 files changed, 50 insertions(+), 3 deletions(-)
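
The check in this patch walks the parent pointers of one file and looks each
one up in the other file, then repeats the walk in the opposite direction.
A minimal userspace sketch of that two-way comparison follows.  The struct
pptr layout and the helper names (pptr_lookup, pptr_check_one_way,
pptr_compare_sets) are hypothetical, a linear search stands in for the xattr
lookup, and -1 stands in for -EFSCORRUPTED.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct pptr {
	uint64_t	p_ino;		/* parent directory inode number */
	uint32_t	p_gen;		/* parent generation */
	uint32_t	p_diroffset;	/* dirent offset within the parent */
	const char	*name;		/* dirent name, not NUL-terminated */
	size_t		namelen;
};

/* Find the record in @set with the same (ino, gen, diroffset) key. */
static const struct pptr *
pptr_lookup(const struct pptr *set, size_t nr, const struct pptr *key)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		if (set[i].p_ino == key->p_ino &&
		    set[i].p_gen == key->p_gen &&
		    set[i].p_diroffset == key->p_diroffset)
			return &set[i];
	}
	return NULL;
}

/* Does every pointer in @a have a match with the same name in @b? */
static int pptr_check_one_way(const struct pptr *a, size_t na,
			      const struct pptr *b, size_t nb)
{
	size_t i;

	for (i = 0; i < na; i++) {
		const struct pptr *other = pptr_lookup(b, nb, &a[i]);

		if (!other)
			return -1;	/* missing on the other side */
		if (other->namelen != a[i].namelen)
			return -1;	/* name length mismatch */
		if (memcmp(other->name, a[i].name, a[i].namelen))
			return -1;	/* name contents mismatch */
	}
	return 0;
}

/* Both directions must pass for the two sets to be considered equal. */
static int pptr_compare_sets(const struct pptr *a, size_t na,
			     const struct pptr *b, size_t nb)
{
	int error = pptr_check_one_way(a, na, b, nb);

	if (error)
		return error;
	return pptr_check_one_way(b, nb, a, na);
}
```

Walking only one file would miss pointers that exist solely in the other
one, which is why the walk is done from both sides.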


diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
index 4aec32081c6d..56b47bf2807b 100644
--- a/fs/xfs/scrub/parent_repair.c
+++ b/fs/xfs/scrub/parent_repair.c
@@ -127,6 +127,9 @@ struct xrep_pptrs {
 
 	/* Parent pointer names. */
 	struct xfblob		*pptr_names;
+
+	/* Buffer for validation. */
+	unsigned char		namebuf[MAXNAMELEN];
 };
 
 /* Tear down all the incore stuff we created. */
@@ -490,7 +493,10 @@ xrep_pptr_scan_dirtree(
 	return 0;
 }
 
-/* Dump a parent pointer from the temporary file. */
+/*
+ * Dump a parent pointer from the temporary file and check it against the file
+ * we're rebuilding.  We are not committing any of this.
+ */
 STATIC int
 xrep_pptr_dump_tempptr(
 	struct xfs_scrub	*sc,
@@ -504,13 +510,45 @@ xrep_pptr_dump_tempptr(
 {
 	struct xrep_pptrs	*rp = priv;
 	const struct xfs_parent_name_rec *rec = (const void *)name;
+	struct xfs_inode	*other_ip;
+	int			pptr_namelen;
 
 	if (!(attr_flags & XFS_ATTR_PARENT))
 		return 0;
 
+	if (ip == sc->ip)
+		other_ip = sc->tempip;
+	else if (ip == sc->tempip)
+		other_ip = sc->ip;
+	else
+		return -EFSCORRUPTED;
+
 	xfs_parent_irec_from_disk(&rp->pptr, rec, value, valuelen);
 
 	trace_xrep_pptr_dumpname(sc->tempip, &rp->pptr);
+
+	pptr_namelen = xfs_parent_lookup(sc->tp, other_ip, &rp->pptr,
+			rp->namebuf, MAXNAMELEN, &rp->pptr_scratch);
+	if (pptr_namelen == -ENOATTR) {
+		trace_xrep_pptr_checkname(other_ip, &rp->pptr);
+		ASSERT(pptr_namelen != -ENOATTR);
+		return -EFSCORRUPTED;
+	}
+	if (pptr_namelen < 0)
+		return pptr_namelen;
+
+	if (pptr_namelen != rp->pptr.p_namelen) {
+		trace_xrep_pptr_checkname(other_ip, &rp->pptr);
+		ASSERT(pptr_namelen == rp->pptr.p_namelen);
+		return -EFSCORRUPTED;
+	}
+
+	if (memcmp(rp->namebuf, rp->pptr.p_name, rp->pptr.p_namelen)) {
+		trace_xrep_pptr_checkname(other_ip, &rp->pptr);
+		ASSERT(0);
+		return -EFSCORRUPTED;
+	}
+
 	return 0;
 }
 
@@ -566,8 +604,16 @@ xrep_pptr_rebuild_tree(
 
 	trace_xrep_pptr_rebuild_tree(sc->ip, 0);
 
-	xrep_tempfile_ilock(sc);
-	return xchk_xattr_walk(sc, sc->tempip, xrep_pptr_dump_tempptr, rp);
+	xchk_ilock(sc, XFS_ILOCK_EXCL);
+	error = xrep_tempfile_ilock_polled(sc);
+	if (error)
+		return error;
+
+	error = xchk_xattr_walk(sc, sc->tempip, xrep_pptr_dump_tempptr, rp);
+	if (error)
+		return error;
+
+	return xchk_xattr_walk(sc, sc->ip, xrep_pptr_dump_tempptr, rp);
 }
 
 /*
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 283a1cedf368..e536d070f9c7 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -1356,6 +1356,7 @@ DEFINE_EVENT(xrep_pptr_class, name, \
 DEFINE_XREP_PPTR_CLASS(xrep_pptr_createname);
 DEFINE_XREP_PPTR_CLASS(xrep_pptr_removename);
 DEFINE_XREP_PPTR_CLASS(xrep_pptr_dumpname);
+DEFINE_XREP_PPTR_CLASS(xrep_pptr_checkname);
 
 DECLARE_EVENT_CLASS(xrep_pptr_scan_class,
 	TP_PROTO(struct xfs_inode *ip, const struct xfs_inode *dp,


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 1/2] xfs: check dirents have parent pointers
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/2] xfs: online checking of directories Darrick J. Wong
@ 2023-02-16 20:51   ` Darrick J. Wong
  2023-02-16 20:51   ` [PATCH 2/2] xfs: deferred scrub of dirents Darrick J. Wong
  1 sibling, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:51 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If the fs has parent pointers, we need to check that each child dirent
points to a file that has a parent pointer pointing back at us.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/scrub/dir.c |  134 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 133 insertions(+), 1 deletion(-)
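
The child locking helper added below never blocks: it tries the IOLOCK, then
the ILOCK in shared mode, and only trades the shared ILOCK for an exclusive
one when the attr fork extents still have to be read in.  The sketch below
models just that control flow.  Everything in it is hypothetical
(toy_rwlock, toy_inode, toy_lock_child, the LOCK_* flags); the toy lock is
plain bookkeeping with no threads, and need_read_extents stands in for "has
an attr fork whose extents are not loaded yet".

```c
#include <assert.h>
#include <stdbool.h>

#define LOCK_IO_SHARED	(1u << 0)
#define LOCK_I_SHARED	(1u << 1)
#define LOCK_I_EXCL	(1u << 2)

/* Toy reader/writer lock: no blocking, no threads, just bookkeeping. */
struct toy_rwlock {
	unsigned int	readers;
	bool		writer;
};

static bool toy_tryshared(struct toy_rwlock *l)
{
	if (l->writer)
		return false;
	l->readers++;
	return true;
}

static bool toy_tryexcl(struct toy_rwlock *l)
{
	if (l->writer || l->readers)
		return false;
	l->writer = true;
	return true;
}

static void toy_unshared(struct toy_rwlock *l)
{
	assert(l->readers > 0);
	l->readers--;
}

struct toy_inode {
	struct toy_rwlock	iolock;
	struct toy_rwlock	ilock;
	bool			need_read_extents;
};

/* Returns the lock flags now held, or zero if the inode was busy. */
static unsigned int toy_lock_child(struct toy_inode *ip)
{
	if (!toy_tryshared(&ip->iolock))
		return 0;

	if (!toy_tryshared(&ip->ilock)) {
		toy_unshared(&ip->iolock);
		return 0;
	}

	/* Shared is enough unless the extent map must be loaded. */
	if (!ip->need_read_extents)
		return LOCK_IO_SHARED | LOCK_I_SHARED;

	/* Drop the shared ILOCK and try for the exclusive one instead. */
	toy_unshared(&ip->ilock);
	if (!toy_tryexcl(&ip->ilock)) {
		toy_unshared(&ip->iolock);
		return 0;
	}

	return LOCK_IO_SHARED | LOCK_I_EXCL;
}
```

Every failure path releases whatever was already taken before returning
zero, so a busy child leaves no locks behind and the caller can mark the
scrub incomplete.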


diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index d720f1e143dd..39ae59eb4f40 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -15,6 +15,8 @@
 #include "xfs_icache.h"
 #include "xfs_dir2.h"
 #include "xfs_dir2_priv.h"
+#include "xfs_attr.h"
+#include "xfs_parent.h"
 #include "scrub/scrub.h"
 #include "scrub/common.h"
 #include "scrub/dabtree.h"
@@ -39,6 +41,20 @@ xchk_setup_directory(
 
 /* Directories */
 
+struct xchk_dir {
+	struct xfs_scrub	*sc;
+
+	/* Scratch buffer for scanning pptr xattrs */
+	struct xfs_parent_name_irec pptr;
+
+	/* xattr key and da args for parent pointer validation. */
+	struct xfs_parent_scratch pptr_scratch;
+
+	/* Name buffer for pptr validation and dirent revalidation. */
+	uint8_t			namebuf[MAXNAMELEN];
+
+};
+
 /* Scrub a directory entry. */
 
 /* Check that an inode's mode matches a given XFS_DIR3_FT_* type. */
@@ -61,6 +77,105 @@ xchk_dir_check_ftype(
 		xchk_fblock_set_corrupt(sc, XFS_DATA_FORK, offset);
 }
 
+/*
+ * Try to lock a child file for checking parent pointers.  Returns the inode
+ * flags for the locks we now hold, or zero if we failed.
+ */
+STATIC unsigned int
+xchk_dir_lock_child(
+	struct xfs_scrub	*sc,
+	struct xfs_inode	*ip)
+{
+	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_SHARED))
+		return 0;
+
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED)) {
+		xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+		return 0;
+	}
+
+	if (!xfs_inode_has_attr_fork(ip) || !xfs_need_iread_extents(&ip->i_af))
+		return XFS_IOLOCK_SHARED | XFS_ILOCK_SHARED;
+
+	xfs_iunlock(ip, XFS_ILOCK_SHARED);
+
+	if (!xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) {
+		xfs_iunlock(ip, XFS_IOLOCK_SHARED);
+		return 0;
+	}
+
+	return XFS_IOLOCK_SHARED | XFS_ILOCK_EXCL;
+}
+
+/* Check the backwards link (parent pointer) associated with this dirent. */
+STATIC int
+xchk_dir_parent_pointer(
+	struct xchk_dir		*sd,
+	xfs_dir2_dataptr_t	dapos,
+	const struct xfs_name	*name,
+	struct xfs_inode	*ip)
+{
+	struct xfs_scrub	*sc = sd->sc;
+	int			pptr_namelen;
+
+	sd->pptr.p_ino = sc->ip->i_ino;
+	sd->pptr.p_gen = VFS_I(sc->ip)->i_generation;
+	sd->pptr.p_diroffset = dapos;
+
+	pptr_namelen = xfs_parent_lookup(sc->tp, ip, &sd->pptr, sd->namebuf,
+			MAXNAMELEN, &sd->pptr_scratch);
+	if (pptr_namelen == -ENOATTR) {
+		xchk_fblock_xref_set_corrupt(sc, XFS_DATA_FORK, 0);
+		return 0;
+	}
+	if (pptr_namelen < 0) {
+		xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0,
+				&pptr_namelen);
+		return pptr_namelen;
+	}
+
+	if (pptr_namelen != name->len) {
+		xchk_fblock_xref_set_corrupt(sc, XFS_DATA_FORK, 0);
+		return 0;
+	}
+
+	if (memcmp(sd->namebuf, name->name, name->len)) {
+		xchk_fblock_xref_set_corrupt(sc, XFS_DATA_FORK, 0);
+		return 0;
+	}
+
+	return 0;
+}
+
+/* Look for a parent pointer matching this dirent, if the child isn't busy. */
+STATIC int
+xchk_dir_check_pptr_fast(
+	struct xchk_dir		*sd,
+	xfs_dir2_dataptr_t	dapos,
+	const struct xfs_name	*name,
+	struct xfs_inode	*ip)
+{
+	struct xfs_scrub	*sc = sd->sc;
+	unsigned int		lockmode;
+	int			error;
+
+	/* dot and dotdot entries do not have parent pointers */
+	if (!strncmp(".", name->name, name->len) ||
+	    !strncmp("..", name->name, name->len))
+		return 0;
+
+	/* Try to lock the inode. */
+	lockmode = xchk_dir_lock_child(sc, ip);
+	if (!lockmode) {
+		xchk_set_incomplete(sc);
+		return -ECANCELED;
+	}
+
+	error = xchk_dir_parent_pointer(sd, dapos, name, ip);
+	xfs_iunlock(ip, lockmode);
+	return error;
+}
+
 /*
  * Scrub a single directory entry.
  *
@@ -78,6 +193,7 @@ xchk_dir_actor(
 {
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_inode	*ip;
+	struct xchk_dir		*sd = priv;
 	xfs_ino_t		lookup_ino;
 	xfs_dablk_t		offset;
 	int			error = 0;
@@ -144,6 +260,14 @@ xchk_dir_actor(
 		goto out;
 
 	xchk_dir_check_ftype(sc, offset, ip, name->type);
+
+	if (xfs_has_parent(mp)) {
+		error = xchk_dir_check_pptr_fast(sd, dapos, name, ip);
+		if (error)
+			goto out_rele;
+	}
+
+out_rele:
 	xchk_irele(sc, ip);
 out:
 	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
@@ -759,6 +883,7 @@ int
 xchk_directory(
 	struct xfs_scrub	*sc)
 {
+	struct xchk_dir		*sd;
 	int			error;
 
 	if (!S_ISDIR(VFS_I(sc->ip)->i_mode))
@@ -786,9 +911,16 @@ xchk_directory(
 	if (sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
 		return 0;
 
+	sd = kvzalloc(sizeof(struct xchk_dir), XCHK_GFP_FLAGS);
+	if (!sd)
+		return -ENOMEM;
+	sd->sc = sc;
+
 	/* Look up every name in this directory by hash. */
-	error = xchk_dir_walk(sc, sc->ip, xchk_dir_actor, NULL);
+	error = xchk_dir_walk(sc, sc->ip, xchk_dir_actor, sd);
 	if (error == -ECANCELED)
 		error = 0;
+
+	kvfree(sd);
 	return error;
 }



* [PATCH 2/2] xfs: deferred scrub of dirents
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/2] xfs: online checking of directories Darrick J. Wong
  2023-02-16 20:51   ` [PATCH 1/2] xfs: check dirents have parent pointers Darrick J. Wong
@ 2023-02-16 20:51   ` Darrick J. Wong
  1 sibling, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:51 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

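If the trylock-based parent pointer check cannot lock the child file,
save the dirent and check it after the directory walk has finished.
Checking a deferred dirent may involve dropping the locks on the
directory being scanned, in which case the dirent must be revalidated
once the locks have been retaken.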

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
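The defer-and-retry idea in the commit message above can be sketched in
userspace.  This is only an illustration under stated assumptions: struct
child, struct deferred and every function below are hypothetical stand-ins
(a C11 atomic_flag plays the role of the child's inode locks), and none of
it is the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_DEFERRED	16

/* Hypothetical stand-in for a child file: a try-lock plus a result flag. */
struct child {
	atomic_flag	busy;
	int		checked;
};

/* Hypothetical stand-in for the array of deferred dirents. */
struct deferred {
	struct child	*items[MAX_DEFERRED];
	size_t		count;
};

static void child_init(struct child *c)
{
	atomic_flag_clear(&c->busy);
	c->checked = 0;
}

/* Returns nonzero if we took the lock, zero if somebody else holds it. */
static int child_trylock(struct child *c)
{
	return !atomic_flag_test_and_set(&c->busy);
}

static void child_unlock(struct child *c)
{
	atomic_flag_clear(&c->busy);
}

/*
 * Fast path: check the child now if its lock is free, otherwise remember it
 * for later.  Returns 0 on success or -1 if the deferred list is full.
 */
static int check_fast(struct child *c, struct deferred *d)
{
	if (child_trylock(c)) {
		c->checked = 1;
		child_unlock(c);
		return 0;
	}
	if (d->count == MAX_DEFERRED)
		return -1;
	d->items[d->count++] = c;
	return 0;
}

/* Slow path: revisit every deferred child, retrying until the lock is ours. */
static void finish_deferred(struct deferred *d)
{
	assert(d->count <= MAX_DEFERRED);
	for (size_t i = 0; i < d->count; i++) {
		while (!child_trylock(d->items[i]))
			;	/* busy-wait; the patch sleeps and polls for termination */
		d->items[i]->checked = 1;
		child_unlock(d->items[i]);
	}
	d->count = 0;
}
```

The sketch leaves out the revalidation step.  In the patch, the slow path
may drop the directory's own locks, so the saved dirent is looked up again
before its parent pointer is checked.
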
 fs/xfs/scrub/dir.c   |  237 ++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/xfs/scrub/trace.h |    2 
 2 files changed, 237 insertions(+), 2 deletions(-)


diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 39ae59eb4f40..3f3223e563ae 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -22,6 +22,10 @@
 #include "scrub/dabtree.h"
 #include "scrub/readdir.h"
 #include "scrub/repair.h"
+#include "scrub/trace.h"
+#include "scrub/xfile.h"
+#include "scrub/xfarray.h"
+#include "scrub/xfblob.h"
 
 /* Set us up to scrub directories. */
 int
@@ -41,6 +45,21 @@ xchk_setup_directory(
 
 /* Directories */
 
+/* Deferred directory entry that we saved for later. */
+struct xchk_dirent {
+	/* Cookie for retrieval of the dirent name. */
+	xfblob_cookie			name_cookie;
+
+	/* Child inode number. */
+	xfs_ino_t			ino;
+
+	/* Directory offset. */
+	xfs_dir2_dataptr_t		diroffset;
+
+	/* Length of the dirent name. */
+	uint8_t				namelen;
+};
+
 struct xchk_dir {
 	struct xfs_scrub	*sc;
 
@@ -50,6 +69,15 @@ struct xchk_dir {
 	/* xattr key and da args for parent pointer validation. */
 	struct xfs_parent_scratch pptr_scratch;
 
+	/* Fixed-size array of xchk_dirent structures. */
+	struct xfarray		*dir_entries;
+
+	/* Blobs containing dirent names. */
+	struct xfblob		*dir_names;
+
+	/* If we've cycled the ILOCK, we must revalidate deferred dirents. */
+	bool			need_revalidate;
+
 	/* Name buffer for pptr validation and dirent revalidation. */
 	uint8_t			namebuf[MAXNAMELEN];
 
@@ -167,8 +195,25 @@ xchk_dir_check_pptr_fast(
 	/* Try to lock the inode. */
 	lockmode = xchk_dir_lock_child(sc, ip);
 	if (!lockmode) {
-		xchk_set_incomplete(sc);
-		return -ECANCELED;
+		struct xchk_dirent	save_de = {
+			.namelen	= name->len,
+			.ino		= ip->i_ino,
+			.diroffset	= dapos,
+		};
+
+		/* Couldn't lock the inode, so save the dirent for later. */
+		trace_xchk_dir_defer(sc->ip, name->name, name->len, ip->i_ino);
+
+		error = xfblob_store(sd->dir_names, &save_de.name_cookie,
+				name->name, name->len);
+		if (xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+			return error;
+
+		error = xfarray_append(sd->dir_entries, &save_de);
+		if (xchk_fblock_process_error(sc, XFS_DATA_FORK, 0, &error))
+			return error;
+
+		return 0;
 	}
 
 	error = xchk_dir_parent_pointer(sd, dapos, name, ip);
@@ -878,6 +923,164 @@ xchk_directory_blocks(
 	return error;
 }
 
+/*
+ * Revalidate a dirent that we collected in the past but couldn't check because
+ * of lock contention.  Returns 0 if the dirent is still valid, -ENOENT if it
+ * has gone away on us, or a negative errno.
+ */
+STATIC int
+xchk_dir_revalidate_dirent(
+	struct xchk_dir		*sd,
+	const struct xfs_name	*xname,
+	xfs_ino_t		ino,
+	xfs_dir2_dataptr_t	diroffset)
+{
+	struct xfs_scrub	*sc = sd->sc;
+	xfs_ino_t		child_ino;
+	xfs_dir2_dataptr_t	child_diroffset = XFS_DIR2_NULL_DATAPTR;
+	int			error;
+
+	error = xchk_dir_lookup(sc, sc->ip, xname, &child_ino,
+			&child_diroffset);
+	if (error == -ENOENT) {
+		/* Directory entry went away, nothing to revalidate. */
+		return -ENOENT;
+	}
+	if (error)
+		return error;
+
+	/* The inode number changed, nothing to revalidate. */
+	if (ino != child_ino)
+		return -ENOENT;
+
+	/* The directory offset changed, nothing to revalidate. */
+	if (diroffset != child_diroffset)
+		return -ENOENT;
+
+	return 0;
+}
+
+/*
+ * Check a directory entry's parent pointers the slow way, which means we cycle
+ * locks a bunch and put up with revalidation until we get it done.
+ */
+STATIC int
+xchk_dir_slow_dirent(
+	struct xchk_dir		*sd,
+	struct xchk_dirent	*dirent)
+{
+	struct xfs_name		xname = {
+		.name		= sd->namebuf,
+		.len		= dirent->namelen,
+	};
+	struct xfs_scrub	*sc = sd->sc;
+	struct xfs_inode	*ip;
+	unsigned int		lockmode;
+	int			error;
+
+	/* Check that the deferred dirent still exists. */
+	if (sd->need_revalidate) {
+		error = xchk_dir_revalidate_dirent(sd, &xname, dirent->ino,
+				dirent->diroffset);
+		if (error == -ENOENT)
+			return 0;
+		if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0,
+					&error))
+			return error;
+	}
+
+	error = xchk_iget(sc, dirent->ino, &ip);
+	if (error == -EINVAL || error == -ENOENT) {
+		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return 0;
+	}
+	if (!xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0, &error))
+		return error;
+
+	/*
+	 * If we can grab both IOLOCK and ILOCK of the alleged child, we can
+	 * proceed with the validation.
+	 */
+	lockmode = xchk_dir_lock_child(sc, ip);
+	if (lockmode)
+		goto check_pptr;
+
+	/*
+	 * We couldn't lock the child file.  Drop all the locks and try to
+	 * get them again, one at a time.
+	 */
+	xchk_iunlock(sc, sc->ilock_flags);
+	sd->need_revalidate = true;
+
+	trace_xchk_dir_slowpath(sc->ip, xname.name, xname.len, ip->i_ino);
+
+	while (true) {
+		xchk_ilock(sc, XFS_IOLOCK_EXCL);
+		if (xfs_ilock_nowait(ip, XFS_IOLOCK_SHARED)) {
+			xchk_ilock(sc, XFS_ILOCK_EXCL);
+			if (xfs_ilock_nowait(ip, XFS_ILOCK_EXCL)) {
+				break;
+			}
+			xchk_iunlock(sc, XFS_ILOCK_EXCL);
+		}
+		xchk_iunlock(sc, XFS_IOLOCK_EXCL);
+
+		if (xchk_should_terminate(sc, &error))
+			goto out_rele;
+
+		delay(1);
+	}
+	lockmode = XFS_IOLOCK_SHARED | XFS_ILOCK_EXCL;
+
+	/* Revalidate, since we just cycled the locks. */
+	error = xchk_dir_revalidate_dirent(sd, &xname, dirent->ino,
+			dirent->diroffset);
+	if (error == -ENOENT)
+		goto out_unlock;
+	if (!xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0, &error))
+		goto out_unlock;
+
+check_pptr:
+	error = xchk_dir_parent_pointer(sd, dirent->diroffset, &xname, ip);
+out_unlock:
+	xfs_iunlock(ip, lockmode);
+out_rele:
+	xchk_irele(sc, ip);
+	return error;
+}
+
+/* Check all the dirents that we deferred the first time around. */
+STATIC int
+xchk_dir_finish_slow_dirents(
+	struct xchk_dir		*sd)
+{
+	xfarray_idx_t		array_cur;
+	int			error;
+
+	foreach_xfarray_idx(sd->dir_entries, array_cur) {
+		struct xchk_dirent	dirent;
+
+		if (sd->sc->sm->sm_flags & XFS_SCRUB_OFLAG_CORRUPT)
+			return 0;
+
+		error = xfarray_load(sd->dir_entries, array_cur, &dirent);
+		if (error)
+			return error;
+
+		error = xfblob_load(sd->dir_names, dirent.name_cookie,
+				sd->namebuf, dirent.namelen);
+		if (error)
+			return error;
+		sd->namebuf[MAXNAMELEN - 1] = 0;
+
+		error = xchk_dir_slow_dirent(sd, &dirent);
+		if (error)
+			return error;
+	}
+
+	return 0;
+}
+
 /* Scrub a whole directory. */
 int
 xchk_directory(
@@ -916,11 +1119,41 @@ xchk_directory(
 		return -ENOMEM;
 	sd->sc = sc;
 
+	if (xfs_has_parent(sc->mp)) {
+		/*
+		 * Set up some staging memory for dirents that we can't check
+		 * due to locking contention.
+		 */
+		error = xfarray_create(sc->mp, "directory entries", 0,
+				sizeof(struct xchk_dirent), &sd->dir_entries);
+		if (error)
+			goto out_sd;
+
+		error = xfblob_create(sc->mp, "dirent names", &sd->dir_names);
+		if (error)
+			goto out_entries;
+	}
+
 	/* Look up every name in this directory by hash. */
 	error = xchk_dir_walk(sc, sc->ip, xchk_dir_actor, sd);
 	if (error == -ECANCELED)
 		error = 0;
+	if (error)
+		goto out_names;
 
+	if (xfs_has_parent(sc->mp)) {
+		error = xchk_dir_finish_slow_dirents(sd);
+		if (error)
+			goto out_names;
+	}
+
+out_names:
+	if (sd->dir_names)
+		xfblob_destroy(sd->dir_names);
+out_entries:
+	if (sd->dir_entries)
+		xfarray_destroy(sd->dir_entries);
+out_sd:
 	kvfree(sd);
 	return error;
 }
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index e536d070f9c7..911d947db787 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -962,6 +962,8 @@ DEFINE_EVENT(xchk_pptr_class, name, \
 	TP_ARGS(ip, name, namelen, parent_ino))
 DEFINE_XCHK_PPTR_CLASS(xchk_parent_defer);
 DEFINE_XCHK_PPTR_CLASS(xchk_parent_slowpath);
+DEFINE_XCHK_PPTR_CLASS(xchk_dir_defer);
+DEFINE_XCHK_PPTR_CLASS(xchk_dir_slowpath);
 
 /* repair tracepoints */
 #if IS_ENABLED(CONFIG_XFS_ONLINE_REPAIR)



* [PATCH 1/5] xfs: load secure hash algorithm for parent pointers
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Darrick J. Wong
@ 2023-02-16 20:51   ` Darrick J. Wong
  2023-02-16 20:51   ` [PATCH 2/5] xfs: replace parent pointer diroffset with sha512 hash of name Darrick J. Wong
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:51 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

We're about to start replacing the diroffset field of parent pointers
with a collision-resistant hash of the directory entry name.  Start by
attaching the sha512 crypto implementation at mount time if parent
pointers are enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Kconfig     |    1 +
 fs/xfs/xfs_linux.h |    1 +
 fs/xfs/xfs_mount.c |   13 +++++++++++++
 fs/xfs/xfs_mount.h |    3 +++
 fs/xfs/xfs_super.c |    3 +++
 5 files changed, 21 insertions(+)


diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index 4798a147fd9e..6422daaf8914 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -5,6 +5,7 @@ config XFS_FS
 	select EXPORTFS
 	select LIBCRC32C
 	select FS_IOMAP
+	select CRYPTO_SHA512
 	help
 	  XFS is a high performance journaling filesystem which originated
 	  on the SGI IRIX platform.  It is completely multi-threaded, can
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index c05f7e309c3e..3f93a742b896 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -62,6 +62,7 @@ typedef __u32			xfs_nlink_t;
 #include <linux/rhashtable.h>
 #include <linux/xattr.h>
 #include <linux/mnt_idmapping.h>
+#include <crypto/hash.h>
 
 #include <asm/page.h>
 #include <asm/div64.h>
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index fb87ffb48f7f..a5f3dce658e9 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -983,6 +983,19 @@ xfs_mountfs(
 			goto out_agresv;
 	}
 
+	if (xfs_has_parent(mp)) {
+		struct crypto_shash	*tfm;
+
+		tfm = crypto_alloc_shash("sha512", 0, 0);
+		if (IS_ERR(tfm)) {
+			error = PTR_ERR(tfm);
+			goto out_agresv;
+		}
+		xfs_info(mp, "parent pointer hash %s",
+				crypto_shash_driver_name(tfm));
+		mp->m_sha512 = tfm;
+	}
+
 	return 0;
 
  out_agresv:
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index c08f55cc4f36..7c8e15e84cd6 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -244,6 +244,9 @@ typedef struct xfs_mount {
 #endif
 	/* Hook to feed file directory updates to an active online repair. */
 	struct xfs_hooks	m_dirent_update_hooks;
+
+	/* sha512 engine, if needed */
+	struct crypto_shash	*m_sha512;
 } xfs_mount_t;
 
 #define M_IGEO(mp)		(&(mp)->m_ino_geo)
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 0432a4a096e8..610d72353f39 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -738,6 +738,8 @@ xfs_mount_free(
 {
 	kfree(mp->m_rtname);
 	kfree(mp->m_logname);
+	if (mp->m_sha512)
+		crypto_free_shash(mp->m_sha512);
 	kmem_free(mp);
 }
 
@@ -1961,6 +1963,7 @@ static int xfs_init_fs_context(
 	if (fc->sb_flags & SB_SYNCHRONOUS)
 		mp->m_features |= XFS_FEAT_WSYNC;
 
+	mp->m_sha512 = NULL;
 	fc->s_fs_info = mp;
 	fc->ops = &xfs_context_ops;
 



* [PATCH 2/5] xfs: replace parent pointer diroffset with sha512 hash of name
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Darrick J. Wong
  2023-02-16 20:51   ` [PATCH 1/5] xfs: load secure hash algorithm for parent pointers Darrick J. Wong
@ 2023-02-16 20:51   ` Darrick J. Wong
  2023-02-16 20:52   ` [PATCH 3/5] xfs: skip the sha512 namehash when possible Darrick J. Wong
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:51 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

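Replace the diroffset field in the parent pointer xattr name with a
sha512 hash computed over the child inode generation and the dirent
name.  The xattr name no longer depends on where the dirent sits in the
directory, so directory repair no longer has to update all the parent
pointers after rebuilding the directory.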

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
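As a rough userspace illustration of the new key layout described above,
the sketch below builds a 76-byte key from the parent inode number, the
parent generation, and a hash over the child generation and dirent name.
Everything here is an assumption made for illustration: struct pptr_key,
make_key() and toy_namehash() are hypothetical names, and the hash is a
non-cryptographic FNV-1a filler standing in for sha512 so that the example
stays self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAME_HASH_SIZE	64	/* same value as XFS_PARENT_NAME_HASH_SIZE */

/* Same field layout as struct xfs_parent_name_rec after this patch. */
struct pptr_key {
	uint64_t	p_ino;		/* stored big-endian */
	uint32_t	p_gen;		/* stored big-endian */
	uint8_t		p_namehash[NAME_HASH_SIZE];
} __attribute__((packed));

/* Write the low nbytes of v to out in big-endian order. */
static void put_be(uint8_t *out, uint64_t v, size_t nbytes)
{
	for (size_t i = 0; i < nbytes; i++)
		out[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
}

/* One FNV-1a pass over a buffer, continuing from state h. */
static uint64_t fnv1a(uint64_t h, const uint8_t *p, size_t len)
{
	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/*
 * TOY stand-in for sha512 (NOT collision resistant): hash the big-endian
 * child generation followed by the name, once per 8-byte output word.
 */
static void toy_namehash(uint32_t child_gen, const char *name, size_t namelen,
			 uint8_t out[NAME_HASH_SIZE])
{
	uint8_t gen_be[4];

	put_be(gen_be, child_gen, sizeof(gen_be));
	for (size_t w = 0; w < NAME_HASH_SIZE / 8; w++) {
		uint64_t h = 0xcbf29ce484222325ULL + w;

		h = fnv1a(h, gen_be, sizeof(gen_be));
		h = fnv1a(h, (const uint8_t *)name, namelen);
		memcpy(out + w * 8, &h, sizeof(h));
	}
}

/* Build a key.  Note that no directory offset is involved anywhere. */
static void make_key(struct pptr_key *key, uint64_t parent_ino,
		     uint32_t parent_gen, uint32_t child_gen,
		     const char *name, size_t namelen)
{
	uint8_t *raw = (uint8_t *)key;

	put_be(raw + offsetof(struct pptr_key, p_ino), parent_ino, 8);
	put_be(raw + offsetof(struct pptr_key, p_gen), parent_gen, 4);
	toy_namehash(child_gen, name, namelen, key->p_namehash);
}
```

Without the packed attribute, common 64-bit ABIs would pad this structure
to 80 bytes, which is presumably why the patch adds the attribute to the
on-disk record.
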
 fs/xfs/libxfs/xfs_da_format.h |   15 +++--
 fs/xfs/libxfs/xfs_fs.h        |    4 +
 fs/xfs/libxfs/xfs_parent.c    |  125 +++++++++++++++++++++++++++++++----------
 fs/xfs/libxfs/xfs_parent.h    |   21 ++++---
 fs/xfs/scrub/dir.c            |   12 +++-
 fs/xfs/scrub/dir_repair.c     |   83 ++++++++-------------------
 fs/xfs/scrub/parent.c         |   43 ++++++++++----
 fs/xfs/scrub/parent_repair.c  |   27 ++++-----
 fs/xfs/scrub/trace.h          |   48 +++++-----------
 fs/xfs/xfs_inode.c            |   30 ++++------
 fs/xfs/xfs_ondisk.h           |    4 +
 fs/xfs/xfs_parent_utils.c     |    2 -
 fs/xfs/xfs_sha512.h           |   42 ++++++++++++++
 fs/xfs/xfs_symlink.c          |    3 -
 14 files changed, 271 insertions(+), 188 deletions(-)
 create mode 100644 fs/xfs/xfs_sha512.h


diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index c07b8166e8ff..386f63b262d5 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -824,17 +824,22 @@ static inline unsigned int xfs_dir2_dirblock_bytes(struct xfs_sb *sbp)
 xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
 				      struct xfs_da3_blkinfo *hdr3);
 
+/* We use sha512 for the parent pointer name hash. */
+#define XFS_PARENT_NAME_HASH_SIZE	(64)
+
 /*
  * Parent pointer attribute format definition
  *
- * EA name encodes the parent inode number, generation and the offset of
- * the dirent that points to the child inode. The EA value contains the
- * same name as the dirent in the parent directory.
+ * The EA name encodes the parent inode number, generation and a collision
+ * resistant hash computed from the dirent name.  The hash is defined to be the
+ * sha512 of the child inode generation and the dirent name.
+ *
+ * The EA value contains the same name as the dirent in the parent directory.
  */
 struct xfs_parent_name_rec {
 	__be64  p_ino;
 	__be32  p_gen;
-	__be32  p_diroffset;
-};
+	__u8	p_namehash[XFS_PARENT_NAME_HASH_SIZE];
+} __attribute__((packed));
 
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 9e59a1fdfb0c..c65345d2ba7a 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -770,8 +770,8 @@ struct xfs_scrub_metadata {
 struct xfs_parent_ptr {
 	__u64		xpp_ino;			/* Inode */
 	__u32		xpp_gen;			/* Inode generation */
-	__u32		xpp_diroffset;			/* Directory offset */
-	__u64		xpp_rsvd;			/* Reserved */
+	__u32		xpp_rsvd;			/* Reserved */
+	__u64		xpp_rsvd2;			/* Reserved */
 	__u8		xpp_name[XFS_PPTR_MAXNAMELEN];	/* File name */
 };
 
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index a2575bf44c89..a28dcf18cb4d 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -26,6 +26,7 @@
 #include "xfs_xattr.h"
 #include "xfs_parent.h"
 #include "xfs_trans_space.h"
+#include "xfs_sha512.h"
 
 struct kmem_cache		*xfs_parent_intent_cache;
 
@@ -54,7 +55,6 @@ xfs_parent_namecheck(
 	unsigned int				attr_flags)
 {
 	xfs_ino_t				p_ino;
-	xfs_dir2_dataptr_t			p_diroffset;
 
 	if (reclen != sizeof(struct xfs_parent_name_rec))
 		return false;
@@ -67,10 +67,6 @@ xfs_parent_namecheck(
 	if (!xfs_verify_ino(mp, p_ino))
 		return false;
 
-	p_diroffset = be32_to_cpu(rec->p_diroffset);
-	if (p_diroffset > XFS_DIR2_MAX_DATAPTR)
-		return false;
-
 	return true;
 }
 
@@ -91,18 +87,17 @@ xfs_parent_valuecheck(
 }
 
 /* Initializes a xfs_parent_name_rec to be stored as an attribute name */
-static inline void
+static inline int
 xfs_init_parent_name_rec(
 	struct xfs_parent_name_rec	*rec,
-	const struct xfs_inode		*ip,
-	uint32_t			p_diroffset)
+	const struct xfs_inode		*dp,
+	const struct xfs_name		*name,
+	struct xfs_inode		*ip)
 {
-	xfs_ino_t			p_ino = ip->i_ino;
-	uint32_t			p_gen = VFS_IC(ip)->i_generation;
-
-	rec->p_ino = cpu_to_be64(p_ino);
-	rec->p_gen = cpu_to_be32(p_gen);
-	rec->p_diroffset = cpu_to_be32(p_diroffset);
+	rec->p_ino = cpu_to_be64(dp->i_ino);
+	rec->p_gen = cpu_to_be32(VFS_IC(dp)->i_generation);
+	return xfs_parent_namehash(ip, name, rec->p_namehash,
+			sizeof(rec->p_namehash));
 }
 
 /*
@@ -118,7 +113,7 @@ xfs_parent_irec_from_disk(
 {
 	irec->p_ino = be64_to_cpu(rec->p_ino);
 	irec->p_gen = be32_to_cpu(rec->p_gen);
-	irec->p_diroffset = be32_to_cpu(rec->p_diroffset);
+	memcpy(irec->p_namehash, rec->p_namehash, sizeof(irec->p_namehash));
 
 	if (!value) {
 		irec->p_namelen = 0;
@@ -148,7 +143,7 @@ xfs_parent_irec_to_disk(
 {
 	rec->p_ino = cpu_to_be64(irec->p_ino);
 	rec->p_gen = cpu_to_be32(irec->p_gen);
-	rec->p_diroffset = cpu_to_be32(irec->p_diroffset);
+	memcpy(rec->p_namehash, irec->p_namehash, sizeof(rec->p_namehash));
 
 	if (valuelen) {
 		ASSERT(*valuelen > 0);
@@ -208,12 +203,15 @@ xfs_parent_add(
 	struct xfs_parent_defer	*parent,
 	struct xfs_inode	*dp,
 	const struct xfs_name	*parent_name,
-	xfs_dir2_dataptr_t	diroffset,
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &parent->args;
+	int			error;
+
+	error = xfs_init_parent_name_rec(&parent->rec, dp, parent_name, child);
+	if (error)
+		return error;
 
-	xfs_init_parent_name_rec(&parent->rec, dp, diroffset);
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 
 	args->trans = tp;
@@ -230,14 +228,18 @@ xfs_parent_add(
 int
 xfs_parent_remove(
 	struct xfs_trans	*tp,
-	struct xfs_inode	*dp,
 	struct xfs_parent_defer	*parent,
-	xfs_dir2_dataptr_t	diroffset,
+	struct xfs_inode	*dp,
+	const struct xfs_name	*name,
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &parent->args;
+	int			error;
+
+	error = xfs_init_parent_name_rec(&parent->rec, dp, name, child);
+	if (error)
+		return error;
 
-	xfs_init_parent_name_rec(&parent->rec, dp, diroffset);
 	args->trans = tp;
 	args->dp = child;
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
@@ -250,16 +252,23 @@ xfs_parent_replace(
 	struct xfs_trans	*tp,
 	struct xfs_parent_defer	*new_parent,
 	struct xfs_inode	*old_dp,
-	xfs_dir2_dataptr_t	old_diroffset,
-	const struct xfs_name	*parent_name,
+	const struct xfs_name	*old_name,
 	struct xfs_inode	*new_dp,
-	xfs_dir2_dataptr_t	new_diroffset,
+	const struct xfs_name	*new_name,
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &new_parent->args;
+	int			error;
+
+	error = xfs_init_parent_name_rec(&new_parent->old_rec, old_dp,
+			old_name, child);
+	if (error)
+		return error;
+	error = xfs_init_parent_name_rec(&new_parent->rec, new_dp, new_name,
+			child);
+	if (error)
+		return error;
 
-	xfs_init_parent_name_rec(&new_parent->old_rec, old_dp, old_diroffset);
-	xfs_init_parent_name_rec(&new_parent->rec, new_dp, new_diroffset);
 	new_parent->args.name = (const uint8_t *)&new_parent->old_rec;
 	new_parent->args.namelen = sizeof(struct xfs_parent_name_rec);
 	new_parent->args.new_name = (const uint8_t *)&new_parent->rec;
@@ -267,9 +276,8 @@ xfs_parent_replace(
 	args->trans = tp;
 	args->dp = child;
 
-	ASSERT(parent_name != NULL);
-	new_parent->args.value = (void *)parent_name->name;
-	new_parent->args.valuelen = parent_name->len;
+	new_parent->args.value = (void *)new_name->name;
+	new_parent->args.valuelen = new_name->len;
 
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 	return xfs_attr_defer_replace(args);
@@ -388,3 +396,62 @@ xfs_parent_unset(
 
 	return xfs_attr_set(&scr->args);
 }
+
+/*
+ * Compute the parent pointer namehash for the given child file and dirent
+ * name.
+ */
+int
+xfs_parent_namehash(
+	struct xfs_inode	*ip,
+	const struct xfs_name	*name,
+	void			*namehash,
+	unsigned int		namehash_len)
+{
+	SHA512_DESC_ON_STACK(ip->i_mount, shash);
+	__be32			gen = cpu_to_be32(VFS_I(ip)->i_generation);
+	int			error;
+
+	ASSERT(SHA512_DIGEST_SIZE ==
+			crypto_shash_digestsize(ip->i_mount->m_sha512));
+
+	if (namehash_len != SHA512_DIGEST_SIZE) {
+		ASSERT(0);
+		return -EINVAL;
+	}
+
+	error = sha512_init(&shash);
+	if (error)
+		goto out;
+
+	error = sha512_process(&shash, (const u8 *)&gen, sizeof(gen));
+	if (error)
+		goto out;
+
+	error = sha512_process(&shash, name->name, name->len);
+	if (error)
+		goto out;
+
+	error = sha512_done(&shash, namehash);
+	if (error)
+		goto out;
+
+out:
+	sha512_erase(&shash);
+	return error;
+}
+
+/* Recalculate the name hash of this parent pointer. */
+int
+xfs_parent_irec_hash(
+	struct xfs_inode		*ip,
+	struct xfs_parent_name_irec	*pptr)
+{
+	struct xfs_name			xname = {
+		.name			= pptr->p_name,
+		.len			= pptr->p_namelen,
+	};
+
+	return xfs_parent_namehash(ip, &xname, &pptr->p_namehash,
+			sizeof(pptr->p_namehash));
+}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index a7fc621b82c4..d3f2841e0f6e 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -23,7 +23,7 @@ struct xfs_parent_name_irec {
 	/* Key fields for looking up a particular parent pointer. */
 	xfs_ino_t		p_ino;
 	uint32_t		p_gen;
-	xfs_dir2_dataptr_t	p_diroffset;
+	uint8_t			p_namehash[XFS_PARENT_NAME_HASH_SIZE];
 
 	/* Attributes of a parent pointer. */
 	uint8_t			p_namelen;
@@ -79,15 +79,14 @@ xfs_parent_start_locked(
 
 int xfs_parent_add(struct xfs_trans *tp, struct xfs_parent_defer *parent,
 		struct xfs_inode *dp, const struct xfs_name *parent_name,
-		xfs_dir2_dataptr_t diroffset, struct xfs_inode *child);
+		struct xfs_inode *child);
 int xfs_parent_replace(struct xfs_trans *tp,
 		struct xfs_parent_defer *new_parent, struct xfs_inode *old_dp,
-		xfs_dir2_dataptr_t old_diroffset,
-		const struct xfs_name *parent_name, struct xfs_inode *new_ip,
-		xfs_dir2_dataptr_t new_diroffset, struct xfs_inode *child);
-int xfs_parent_remove(struct xfs_trans *tp, struct xfs_inode *dp,
-		struct xfs_parent_defer *parent, xfs_dir2_dataptr_t diroffset,
-		struct xfs_inode *child);
+		const struct xfs_name *old_name, struct xfs_inode *new_ip,
+		const struct xfs_name *new_name, struct xfs_inode *child);
+int xfs_parent_remove(struct xfs_trans *tp,
+		struct xfs_parent_defer *parent, struct xfs_inode *dp,
+		const struct xfs_name *name, struct xfs_inode *child);
 
 void __xfs_parent_cancel(struct xfs_mount *mp, struct xfs_parent_defer *parent);
 
@@ -100,6 +99,12 @@ xfs_parent_finish(
 		__xfs_parent_cancel(mp, p);
 }
 
+int xfs_parent_namehash(struct xfs_inode *ip, const struct xfs_name *name,
+		void *namehash, unsigned int namehash_len);
+
+int xfs_parent_irec_hash(struct xfs_inode *ip,
+		struct xfs_parent_name_irec *pptr);
+
 unsigned int xfs_pptr_calc_space_res(struct xfs_mount *mp,
 				     unsigned int namelen);
 
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 3f3223e563ae..2494947a0c93 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -139,16 +139,20 @@ xchk_dir_lock_child(
 STATIC int
 xchk_dir_parent_pointer(
 	struct xchk_dir		*sd,
-	xfs_dir2_dataptr_t	dapos,
 	const struct xfs_name	*name,
 	struct xfs_inode	*ip)
 {
 	struct xfs_scrub	*sc = sd->sc;
 	int			pptr_namelen;
+	int			error;
 
 	sd->pptr.p_ino = sc->ip->i_ino;
 	sd->pptr.p_gen = VFS_I(sc->ip)->i_generation;
-	sd->pptr.p_diroffset = dapos;
+
+	error = xfs_parent_namehash(ip, name, &sd->pptr.p_namehash,
+			sizeof(sd->pptr.p_namehash));
+	if (error)
+		return error;
 
 	pptr_namelen = xfs_parent_lookup(sc->tp, ip, &sd->pptr, sd->namebuf,
 			MAXNAMELEN, &sd->pptr_scratch);
@@ -216,7 +220,7 @@ xchk_dir_check_pptr_fast(
 		return 0;
 	}
 
-	error = xchk_dir_parent_pointer(sd, dapos, name, ip);
+	error = xchk_dir_parent_pointer(sd, name, ip);
 	xfs_iunlock(ip, lockmode);
 	return error;
 }
@@ -1041,7 +1045,7 @@ xchk_dir_slow_dirent(
 		goto out_unlock;
 
 check_pptr:
-	error = xchk_dir_parent_pointer(sd, dirent->diroffset, &xname, ip);
+	error = xchk_dir_parent_pointer(sd, &xname, ip);
 out_unlock:
 	xfs_iunlock(ip, lockmode);
 out_rele:
diff --git a/fs/xfs/scrub/dir_repair.c b/fs/xfs/scrub/dir_repair.c
index ec48b3268809..c0b2b78da277 100644
--- a/fs/xfs/scrub/dir_repair.c
+++ b/fs/xfs/scrub/dir_repair.c
@@ -93,9 +93,6 @@ struct xrep_dirent {
 	/* Child inode number. */
 	xfs_ino_t		ino;
 
-	/* Directory offset that we want.  We're not going to get it. */
-	xfs_dir2_dataptr_t	diroffset;
-
 	/* Length of the dirent name. */
 	uint8_t			namelen;
 
@@ -261,8 +258,7 @@ xrep_dir_createname(
 	struct xrep_dir		*rd,
 	const struct xfs_name	*name,
 	xfs_ino_t		inum,
-	xfs_extlen_t		total,
-	xfs_dir2_dataptr_t	diroffset)
+	xfs_extlen_t		total)
 {
 	struct xfs_scrub	*sc = rd->sc;
 	struct xfs_inode	*dp = rd->args.dp;
@@ -275,7 +271,7 @@ xrep_dir_createname(
 	if (error)
 		return error;
 
-	trace_xrep_dir_createname(dp, name, inum, diroffset);
+	trace_xrep_dir_createname(dp, name, inum);
 
 	/* reset cmpresult as if we haven't done a lookup */
 	rd->args.cmpresult = XFS_CMP_DIFFERENT;
@@ -307,8 +303,7 @@ STATIC int
 xrep_dir_removename(
 	struct xrep_dir		*rd,
 	const struct xfs_name	*name,
-	xfs_extlen_t		total,
-	xfs_dir2_dataptr_t	diroffset)
+	xfs_extlen_t		total)
 {
 	struct xfs_inode	*dp = rd->args.dp;
 	bool			is_block, is_leaf;
@@ -321,7 +316,7 @@ xrep_dir_removename(
 	rd->args.op_flags = 0;
 	rd->args.total = total;
 
-	trace_xrep_dir_removename(dp, name, rd->args.inumber, diroffset);
+	trace_xrep_dir_removename(dp, name, rd->args.inumber);
 
 	if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL)
 		return xfs_dir2_sf_removename(&rd->args);
@@ -385,8 +380,7 @@ xrep_dir_replay_update(
 			goto out_cancel;
 		}
 
-		error = xrep_dir_removename(rd, &xname, resblks,
-				dirent->diroffset);
+		error = xrep_dir_removename(rd, &xname, resblks);
 	} else {
 		/* Add this dirent.  The lookup must not succeed. */
 		if (error == 0)
@@ -394,8 +388,7 @@ xrep_dir_replay_update(
 		if (error != -ENOENT)
 			goto out_cancel;
 
-		error = xrep_dir_createname(rd, &xname, dirent->ino, resblks,
-				dirent->diroffset);
+		error = xrep_dir_createname(rd, &xname, dirent->ino, resblks);
 	}
 	if (error)
 		goto out_cancel;
@@ -465,19 +458,17 @@ STATIC int
 xrep_dir_add_dirent(
 	struct xrep_dir		*rd,
 	const struct xfs_name	*name,
-	xfs_ino_t		ino,
-	xfs_dir2_dataptr_t	diroffset)
+	xfs_ino_t		ino)
 {
 	struct xrep_dirent	dirent = {
 		.action		= XREP_DIRENT_ADD,
 		.ino		= ino,
 		.namelen	= name->len,
 		.ftype		= name->type,
-		.diroffset	= diroffset,
 	};
 	int			error;
 
-	trace_xrep_dir_add_dirent(rd->sc->tempip, name, ino, diroffset);
+	trace_xrep_dir_add_dirent(rd->sc->tempip, name, ino);
 
 	error = xfblob_store(rd->dir_names, &dirent.name_cookie, name->name,
 			name->len);
@@ -495,19 +486,17 @@ STATIC int
 xrep_dir_remove_dirent(
 	struct xrep_dir		*rd,
 	const struct xfs_name	*name,
-	xfs_ino_t		ino,
-	xfs_dir2_dataptr_t	diroffset)
+	xfs_ino_t		ino)
 {
 	struct xrep_dirent	dirent = {
 		.action		= XREP_DIRENT_REMOVE,
 		.ino		= ino,
 		.namelen	= name->len,
 		.ftype		= name->type,
-		.diroffset	= diroffset,
 	};
 	int			error;
 
-	trace_xrep_dir_remove_dirent(rd->sc->tempip, name, ino, diroffset);
+	trace_xrep_dir_remove_dirent(rd->sc->tempip, name, ino);
 
 	error = xfblob_store(rd->dir_names, &dirent.name_cookie, name->name,
 			name->len);
@@ -567,8 +556,7 @@ xrep_dir_scan_parent_pointer(
 	xname.type = xfs_mode_to_ftype(VFS_I(ip)->i_mode);
 
 	mutex_lock(&rd->lock);
-	error = xrep_dir_add_dirent(rd, &xname, ip->i_ino,
-			rd->pptr.p_diroffset);
+	error = xrep_dir_add_dirent(rd, &xname, ip->i_ino);
 	mutex_unlock(&rd->lock);
 	return error;
 }
@@ -605,7 +593,7 @@ xrep_dir_scan_dirent(
 	    xrep_dir_samename(name, &xfs_name_dot))
 		return 0;
 
-	trace_xrep_dir_replacename(sc->tempip, &xfs_name_dotdot, dp->i_ino, 0);
+	trace_xrep_dir_replacename(sc->tempip, &xfs_name_dotdot, dp->i_ino);
 
 	mutex_lock(&rd->lock);
 	rd->parent_ino = dp->i_ino;
@@ -773,7 +761,6 @@ xrep_dir_dump_tempdir(
 	struct xrep_dir		*rd = priv;
 	xfs_ino_t		child_ino;
 	bool			child = true;
-	xfs_dir2_dataptr_t	child_diroffset = XFS_DIR2_NULL_DATAPTR;
 	int			error;
 
 	/*
@@ -800,7 +787,7 @@ xrep_dir_dump_tempdir(
 		ino = sc->ip->i_ino;
 	}
 
-	trace_xrep_dir_dumpname(sc->tempip, name, ino, dapos);
+	trace_xrep_dir_dumpname(sc->tempip, name, ino);
 
 	if (!child)
 		return 0;
@@ -812,17 +799,15 @@ xrep_dir_dump_tempdir(
 	 * and reap it responsibly, but I didn't feel like porting all that.
 	 */
 	mutex_lock(&rd->lock);
-	error = xrep_dir_remove_dirent(rd, name, ino, dapos);
+	error = xrep_dir_remove_dirent(rd, name, ino);
 	mutex_unlock(&rd->lock);
 	if (error)
 		return error;
 
 	/* Check that the dir being repaired has the same entry. */
-	error = xchk_dir_lookup(sc, sc->ip, name, &child_ino,
-			&child_diroffset);
+	error = xchk_dir_lookup(sc, sc->ip, name, &child_ino, NULL);
 	if (error == -ENOENT) {
-		trace_xrep_dir_checkname(sc->ip, name, NULLFSINO,
-				XFS_DIR2_NULL_DATAPTR);
+		trace_xrep_dir_checkname(sc->ip, name, NULLFSINO);
 		ASSERT(error != -ENOENT);
 		return -EFSCORRUPTED;
 	}
@@ -830,18 +815,11 @@ xrep_dir_dump_tempdir(
 		return error;
 
 	if (ino != child_ino) {
-		trace_xrep_dir_checkname(sc->ip, name, child_ino,
-				child_diroffset);
+		trace_xrep_dir_checkname(sc->ip, name, child_ino);
 		ASSERT(ino == child_ino);
 		return -EFSCORRUPTED;
 	}
 
-	if (dapos != child_diroffset) {
-		trace_xrep_dir_badposname(sc->ip, name, child_ino,
-				child_diroffset);
-		/* We have no way to update this, so we just leave it. */
-	}
-
 	return 0;
 }
 
@@ -860,7 +838,6 @@ xrep_dir_dump_baddir(
 	void			*priv)
 {
 	xfs_ino_t		child_ino;
-	xfs_dir2_dataptr_t	child_diroffset = XFS_DIR2_NULL_DATAPTR;
 	int			error;
 
 	/* Ignore the directory's dot and dotdot entries. */
@@ -868,14 +845,12 @@ xrep_dir_dump_baddir(
 	    xrep_dir_samename(name, &xfs_name_dot))
 		return 0;
 
-	trace_xrep_dir_dumpname(sc->ip, name, ino, dapos);
+	trace_xrep_dir_dumpname(sc->ip, name, ino);
 
 	/* Check that the tempdir has the same entry. */
-	error = xchk_dir_lookup(sc, sc->tempip, name, &child_ino,
-			&child_diroffset);
+	error = xchk_dir_lookup(sc, sc->tempip, name, &child_ino, NULL);
 	if (error == -ENOENT) {
-		trace_xrep_dir_checkname(sc->tempip, name, NULLFSINO,
-				XFS_DIR2_NULL_DATAPTR);
+		trace_xrep_dir_checkname(sc->tempip, name, NULLFSINO);
 		ASSERT(error != -ENOENT);
 		return -EFSCORRUPTED;
 	}
@@ -883,18 +858,11 @@ xrep_dir_dump_baddir(
 		return error;
 
 	if (ino != child_ino) {
-		trace_xrep_dir_checkname(sc->tempip, name, child_ino,
-				child_diroffset);
+		trace_xrep_dir_checkname(sc->tempip, name, child_ino);
 		ASSERT(ino == child_ino);
 		return -EFSCORRUPTED;
 	}
 
-	if (dapos != child_diroffset) {
-		trace_xrep_dir_badposname(sc->ip, name, child_ino,
-				child_diroffset);
-		/* We have no way to update this, so we just leave it. */
-	}
-
 	return 0;
 }
 
@@ -1011,11 +979,10 @@ xrep_dir_live_update(
 	    xchk_iscan_want_live_update(&rd->iscan, p->ip->i_ino)) {
 		mutex_lock(&rd->lock);
 		if (p->delta > 0)
-			error = xrep_dir_add_dirent(rd, p->name, p->ip->i_ino,
-					p->diroffset);
+			error = xrep_dir_add_dirent(rd, p->name, p->ip->i_ino);
 		else
 			error = xrep_dir_remove_dirent(rd, p->name,
-					p->ip->i_ino, p->diroffset);
+					p->ip->i_ino);
 		mutex_unlock(&rd->lock);
 		if (error)
 			goto out_abort;
@@ -1030,12 +997,12 @@ xrep_dir_live_update(
 		mutex_lock(&rd->lock);
 		if (p->delta > 0) {
 			trace_xrep_dir_add_dirent(sc->tempip, &xfs_name_dotdot,
-					p->dp->i_ino, 0);
+					p->dp->i_ino);
 
 			rd->parent_ino = p->dp->i_ino;
 		} else {
 			trace_xrep_dir_remove_dirent(sc->tempip,
-					&xfs_name_dotdot, NULLFSINO, 0);
+					&xfs_name_dotdot, NULLFSINO);
 
 			rd->parent_ino = NULLFSINO;
 		}
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index 14f16fefd1b0..53872a7be942 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -323,7 +323,6 @@ struct xchk_pptr {
 	/* Parent pointer attr key. */
 	xfs_ino_t			p_ino;
 	uint32_t			p_gen;
-	xfs_dir2_dataptr_t		p_diroffset;
 
 	/* Length of the pptr name. */
 	uint8_t				namelen;
@@ -350,6 +349,9 @@ struct xchk_pptrs {
 	/* xattr key and da args for parent pointer revalidation. */
 	struct xfs_parent_scratch pptr_scratch;
 
+	/* Name hashes */
+	uint8_t			child_namehash[XFS_PARENT_NAME_HASH_SIZE];
+
 	/* Name buffer for revalidation. */
 	uint8_t			namebuf[MAXNAMELEN];
 };
@@ -426,14 +428,13 @@ xchk_parent_dirent(
 	};
 	struct xfs_scrub	*sc = pp->sc;
 	xfs_ino_t		child_ino;
-	xfs_dir2_dataptr_t	child_diroffset;
 	int			error;
 
 	/*
 	 * Use the name attached to this parent pointer to look up the
 	 * directory entry in the alleged parent.
 	 */
-	error = xchk_dir_lookup(sc, dp, &xname, &child_ino, &child_diroffset);
+	error = xchk_dir_lookup(sc, dp, &xname, &child_ino, NULL);
 	if (error == -ENOENT) {
 		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
 		return 0;
@@ -447,15 +448,6 @@ xchk_parent_dirent(
 		return 0;
 	}
 
-	/* Does the directory offset match? */
-	if (pp->pptr.p_diroffset != child_diroffset) {
-		trace_xchk_parent_bad_dapos(sc->ip, pp->pptr.p_diroffset,
-				dp->i_ino, child_diroffset, xname.name,
-				xname.len);
-		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
-		return 0;
-	}
-
 	/*
 	 * If we're scanning a directory, we should only ever encounter a
 	 * single parent pointer, and it should match the dotdot entry.  We set
@@ -534,6 +526,7 @@ xchk_parent_scan_attr(
 	unsigned int		valuelen,
 	void			*priv)
 {
+	struct xfs_name		xname = { };
 	struct xchk_pptrs	*pp = priv;
 	struct xfs_inode	*dp = NULL;
 	const struct xfs_parent_name_rec *rec = (const void *)name;
@@ -561,6 +554,26 @@ xchk_parent_scan_attr(
 
 	xfs_parent_irec_from_disk(&pp->pptr, rec, value, valuelen);
 
+	xname.name = pp->pptr.p_name;
+	xname.len = pp->pptr.p_namelen;
+
+	/*
+	 * Does the namehash in the parent pointer match the actual name?
+	 * If not, there's no point in checking further.
+	 */
+	error = xfs_parent_namehash(sc->ip, &xname, pp->child_namehash,
+			sizeof(pp->child_namehash));
+	if (!xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0, &error))
+		return error;
+
+	if (memcmp(pp->pptr.p_namehash, pp->child_namehash,
+				sizeof(pp->pptr.p_namehash))) {
+		trace_xchk_parent_bad_namehash(sc->ip, pp->pptr.p_ino,
+				xname.name, xname.len);
+		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
+		return 0;
+	}
+
 	error = xchk_parent_iget(pp, &dp);
 	if (error)
 		return error;
@@ -573,7 +586,6 @@ xchk_parent_scan_attr(
 		struct xchk_pptr	save_pp = {
 			.p_ino		= pp->pptr.p_ino,
 			.p_gen		= pp->pptr.p_gen,
-			.p_diroffset	= pp->pptr.p_diroffset,
 			.namelen	= pp->pptr.p_namelen,
 		};
 
@@ -655,7 +667,6 @@ xchk_parent_slow_pptr(
 	/* Restore the saved parent pointer into the irec. */
 	pp->pptr.p_ino = pptr->p_ino;
 	pp->pptr.p_gen = pptr->p_gen;
-	pp->pptr.p_diroffset = pptr->p_diroffset;
 
 	error = xfblob_load(pp->pptr_names, pptr->name_cookie, pp->pptr.p_name,
 			pptr->namelen);
@@ -664,6 +675,10 @@ xchk_parent_slow_pptr(
 	pp->pptr.p_name[MAXNAMELEN - 1] = 0;
 	pp->pptr.p_namelen = pptr->namelen;
 
+	error = xfs_parent_irec_hash(sc->ip, &pp->pptr);
+	if (error)
+		return error;
+
 	/* Check that the deferred parent pointer still exists. */
 	if (pp->need_revalidate) {
 		error = xchk_parent_revalidate_pptr(pp);
diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
index 56b47bf2807b..51432ab61c94 100644
--- a/fs/xfs/scrub/parent_repair.c
+++ b/fs/xfs/scrub/parent_repair.c
@@ -95,7 +95,6 @@ struct xrep_pptr {
 	/* Parent pointer attr key. */
 	xfs_ino_t			p_ino;
 	uint32_t			p_gen;
-	xfs_dir2_dataptr_t		p_diroffset;
 
 	/* Length of the pptr name. */
 	uint8_t				namelen;
@@ -183,12 +182,16 @@ xrep_pptr_replay_update(
 	const struct xrep_pptr	*pptr)
 {
 	struct xfs_scrub	*sc = rp->sc;
+	int			error;
 
 	rp->pptr.p_ino = pptr->p_ino;
 	rp->pptr.p_gen = pptr->p_gen;
-	rp->pptr.p_diroffset = pptr->p_diroffset;
 	rp->pptr.p_namelen = pptr->namelen;
 
+	error = xfs_parent_irec_hash(sc->ip, &rp->pptr);
+	if (error)
+		return error;
+
 	if (pptr->action == XREP_PPTR_ADD) {
 		/* Create parent pointer. */
 		trace_xrep_pptr_createname(sc->tempip, &rp->pptr);
@@ -261,19 +264,17 @@ STATIC int
 xrep_pptr_add_pointer(
 	struct xrep_pptrs	*rp,
 	const struct xfs_name	*name,
-	const struct xfs_inode	*dp,
-	xfs_dir2_dataptr_t	diroffset)
+	const struct xfs_inode	*dp)
 {
 	struct xrep_pptr	pptr = {
 		.action		= XREP_PPTR_ADD,
 		.namelen	= name->len,
 		.p_ino		= dp->i_ino,
 		.p_gen		= VFS_IC(dp)->i_generation,
-		.p_diroffset	= diroffset,
 	};
 	int			error;
 
-	trace_xrep_pptr_add_pointer(rp->sc->tempip, dp, diroffset, name);
+	trace_xrep_pptr_add_pointer(rp->sc->tempip, dp, name);
 
 	error = xfblob_store(rp->pptr_names, &pptr.name_cookie, name->name,
 			name->len);
@@ -291,19 +292,17 @@ STATIC int
 xrep_pptr_remove_pointer(
 	struct xrep_pptrs	*rp,
 	const struct xfs_name	*name,
-	const struct xfs_inode	*dp,
-	xfs_dir2_dataptr_t	diroffset)
+	const struct xfs_inode	*dp)
 {
 	struct xrep_pptr	pptr = {
 		.action		= XREP_PPTR_REMOVE,
 		.namelen	= name->len,
 		.p_ino		= dp->i_ino,
 		.p_gen		= VFS_IC(dp)->i_generation,
-		.p_diroffset	= diroffset,
 	};
 	int			error;
 
-	trace_xrep_pptr_remove_pointer(rp->sc->tempip, dp, diroffset, name);
+	trace_xrep_pptr_remove_pointer(rp->sc->tempip, dp, name);
 
 	error = xfblob_store(rp->pptr_names, &pptr.name_cookie, name->name,
 			name->len);
@@ -352,7 +351,7 @@ xrep_pptr_scan_dirent(
 	 * addition to the temporary file.
 	 */
 	mutex_lock(&rp->lock);
-	error = xrep_pptr_add_pointer(rp, name, dp, dapos);
+	error = xrep_pptr_add_pointer(rp, name, dp);
 	mutex_unlock(&rp->lock);
 	return error;
 }
@@ -646,11 +645,9 @@ xrep_pptr_live_update(
 	    xchk_iscan_want_live_update(&rp->iscan, p->dp->i_ino)) {
 		mutex_lock(&rp->lock);
 		if (p->delta > 0)
-			error = xrep_pptr_add_pointer(rp, p->name, p->dp,
-					p->diroffset);
+			error = xrep_pptr_add_pointer(rp, p->name, p->dp);
 		else
-			error = xrep_pptr_remove_pointer(rp, p->name, p->dp,
-					p->diroffset);
+			error = xrep_pptr_remove_pointer(rp, p->name, p->dp);
 		mutex_unlock(&rp->lock);
 		if (error)
 			goto out_abort;
diff --git a/fs/xfs/scrub/trace.h b/fs/xfs/scrub/trace.h
index 911d947db787..1af148d7617e 100644
--- a/fs/xfs/scrub/trace.h
+++ b/fs/xfs/scrub/trace.h
@@ -897,35 +897,28 @@ TRACE_EVENT(xchk_nlinks_live_update,
 		  __get_str(name))
 );
 
-TRACE_EVENT(xchk_parent_bad_dapos,
-	TP_PROTO(struct xfs_inode *ip, unsigned int p_diroffset,
-		 xfs_ino_t parent_ino, unsigned int dapos,
-		 const char *name, unsigned int namelen),
-	TP_ARGS(ip, p_diroffset, parent_ino, dapos, name, namelen),
+TRACE_EVENT(xchk_parent_bad_namehash,
+	TP_PROTO(struct xfs_inode *ip, xfs_ino_t parent_ino, const char *name,
+		unsigned int namelen),
+	TP_ARGS(ip, parent_ino, name, namelen),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
-		__field(unsigned int, p_diroffset)
 		__field(xfs_ino_t, parent_ino)
-		__field(unsigned int, dapos)
 		__field(unsigned int, namelen)
 		__dynamic_array(char, name, namelen)
 	),
 	TP_fast_assign(
 		__entry->dev = ip->i_mount->m_super->s_dev;
 		__entry->ino = ip->i_ino;
-		__entry->p_diroffset = p_diroffset;
 		__entry->parent_ino = parent_ino;
-		__entry->dapos = dapos;
 		__entry->namelen = namelen;
 		memcpy(__get_str(name), name, namelen);
 	),
-	TP_printk("dev %d:%d ino 0x%llx p_diroff 0x%x parent_ino 0x%llx parent_diroff 0x%x name '%.*s'",
+	TP_printk("dev %d:%d ino 0x%llx parent_ino 0x%llx name '%.*s'",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
-		  __entry->p_diroffset,
 		  __entry->parent_ino,
-		  __entry->dapos,
 		  __entry->namelen,
 		  __get_str(name))
 );
@@ -1253,8 +1246,8 @@ TRACE_EVENT(xrep_tempfile_create,
 
 DECLARE_EVENT_CLASS(xrep_dirent_class,
 	TP_PROTO(struct xfs_inode *dp, const struct xfs_name *name,
-		 xfs_ino_t ino, unsigned int diroffset),
-	TP_ARGS(dp, name, ino, diroffset),
+		 xfs_ino_t ino),
+	TP_ARGS(dp, name, ino),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_ino_t, dir_ino)
@@ -1262,7 +1255,6 @@ DECLARE_EVENT_CLASS(xrep_dirent_class,
 		__dynamic_array(char, name, name->len)
 		__field(xfs_ino_t, ino)
 		__field(uint8_t, ftype)
-		__field(unsigned int, diroffset)
 	),
 	TP_fast_assign(
 		__entry->dev = dp->i_mount->m_super->s_dev;
@@ -1271,12 +1263,10 @@ DECLARE_EVENT_CLASS(xrep_dirent_class,
 		memcpy(__get_str(name), name->name, name->len);
 		__entry->ino = ino;
 		__entry->ftype = name->type;
-		__entry->diroffset = diroffset;
 	),
-	TP_printk("dev %d:%d dir 0x%llx dapos 0x%x ftype %s name '%.*s' ino 0x%llx",
+	TP_printk("dev %d:%d dir 0x%llx ftype %s name '%.*s' ino 0x%llx",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->dir_ino,
-		  __entry->diroffset,
 		  __print_symbolic(__entry->ftype, XFS_DIR3_FTYPE_STR),
 		  __entry->namelen,
 		  __get_str(name),
@@ -1285,8 +1275,8 @@ DECLARE_EVENT_CLASS(xrep_dirent_class,
 #define DEFINE_XREP_DIRENT_CLASS(name) \
 DEFINE_EVENT(xrep_dirent_class, name, \
 	TP_PROTO(struct xfs_inode *dp, const struct xfs_name *name, \
-		 xfs_ino_t ino, unsigned int diroffset), \
-	TP_ARGS(dp, name, ino, diroffset))
+		 xfs_ino_t ino), \
+	TP_ARGS(dp, name, ino))
 DEFINE_XREP_DIRENT_CLASS(xrep_dir_add_dirent);
 DEFINE_XREP_DIRENT_CLASS(xrep_dir_remove_dirent);
 DEFINE_XREP_DIRENT_CLASS(xrep_dir_createname);
@@ -1329,7 +1319,6 @@ DECLARE_EVENT_CLASS(xrep_pptr_class,
 		__field(xfs_ino_t, ino)
 		__field(xfs_ino_t, parent_ino)
 		__field(unsigned int, parent_gen)
-		__field(unsigned int, parent_diroffset)
 		__field(unsigned int, namelen)
 		__dynamic_array(char, name, pptr->p_namelen)
 	),
@@ -1338,16 +1327,14 @@ DECLARE_EVENT_CLASS(xrep_pptr_class,
 		__entry->ino = ip->i_ino;
 		__entry->parent_ino = pptr->p_ino;
 		__entry->parent_gen = pptr->p_gen;
-		__entry->parent_diroffset = pptr->p_diroffset;
 		__entry->namelen = pptr->p_namelen;
 		memcpy(__get_str(name), pptr->p_name, pptr->p_namelen);
 	),
-	TP_printk("dev %d:%d ino 0x%llx parent_ino 0x%llx parent_gen 0x%x parent_dapos 0x%x name '%.*s'",
+	TP_printk("dev %d:%d ino 0x%llx parent_ino 0x%llx parent_gen 0x%x name '%.*s'",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->parent_ino,
 		  __entry->parent_gen,
-		  __entry->parent_diroffset,
 		  __entry->namelen,
 		  __get_str(name))
 )
@@ -1362,14 +1349,13 @@ DEFINE_XREP_PPTR_CLASS(xrep_pptr_checkname);
 
 DECLARE_EVENT_CLASS(xrep_pptr_scan_class,
 	TP_PROTO(struct xfs_inode *ip, const struct xfs_inode *dp,
-		 unsigned int diroffset, const struct xfs_name *name),
-	TP_ARGS(ip, dp, diroffset, name),
+		 const struct xfs_name *name),
+	TP_ARGS(ip, dp, name),
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
 		__field(xfs_ino_t, parent_ino)
 		__field(unsigned int, parent_gen)
-		__field(unsigned int, parent_diroffset)
 		__field(unsigned int, namelen)
 		__dynamic_array(char, name, name->len)
 	),
@@ -1378,24 +1364,22 @@ DECLARE_EVENT_CLASS(xrep_pptr_scan_class,
 		__entry->ino = ip->i_ino;
 		__entry->parent_ino = dp->i_ino;
 		__entry->parent_gen = VFS_IC(dp)->i_generation;
-		__entry->parent_diroffset = diroffset;
 		__entry->namelen = name->len;
 		memcpy(__get_str(name), name->name, name->len);
 	),
-	TP_printk("dev %d:%d ino 0x%llx parent_ino 0x%llx parent_gen 0x%x parent_dapos 0x%x name '%.*s'",
+	TP_printk("dev %d:%d ino 0x%llx parent_ino 0x%llx parent_gen 0x%x name '%.*s'",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->parent_ino,
 		  __entry->parent_gen,
-		  __entry->parent_diroffset,
 		  __entry->namelen,
 		  __get_str(name))
 )
 #define DEFINE_XREP_PPTR_SCAN_CLASS(name) \
 DEFINE_EVENT(xrep_pptr_scan_class, name, \
 	TP_PROTO(struct xfs_inode *ip, const struct xfs_inode *dp, \
-		 unsigned int diroffset, const struct xfs_name *name), \
-	TP_ARGS(ip, dp, diroffset, name))
+		 const struct xfs_name *name), \
+	TP_ARGS(ip, dp, name))
 DEFINE_XREP_PPTR_SCAN_CLASS(xrep_pptr_add_pointer);
 DEFINE_XREP_PPTR_SCAN_CLASS(xrep_pptr_remove_pointer);
 
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 09b0ac6b99cb..4cd9a4fea5e0 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1202,8 +1202,7 @@ xfs_create(
 	 * the parent information now.
 	 */
 	if (parent) {
-		error = xfs_parent_add(tp, parent, dp, name, diroffset,
-					     ip);
+		error = xfs_parent_add(tp, parent, dp, name, ip);
 		if (error)
 			goto out_trans_cancel;
 	}
@@ -1477,8 +1476,7 @@ xfs_link(
 	 * the parent to the inode.
 	 */
 	if (parent) {
-		error = xfs_parent_add(tp, parent, tdp, target_name,
-					     diroffset, sip);
+		error = xfs_parent_add(tp, parent, tdp, target_name, sip);
 		if (error)
 			goto error_return;
 	}
@@ -2750,7 +2748,7 @@ xfs_remove(
 	}
 
 	if (parent) {
-		error = xfs_parent_remove(tp, dp, parent, dir_offset, ip);
+		error = xfs_parent_remove(tp, parent, dp, name, ip);
 		if (error)
 			goto out_trans_cancel;
 	}
@@ -3061,13 +3059,13 @@ xfs_cross_rename(
 	}
 
 	if (xfs_has_parent(mp)) {
-		error = xfs_parent_replace(tp, ip1_pptr, dp1,
-				old_diroffset, name2, dp2, new_diroffset, ip1);
+		error = xfs_parent_replace(tp, ip1_pptr, dp1, name1, dp2,
+				name2, ip1);
 		if (error)
 			goto out_trans_abort;
 
-		error = xfs_parent_replace(tp, ip2_pptr, dp2,
-				new_diroffset, name1, dp1, old_diroffset, ip2);
+		error = xfs_parent_replace(tp, ip2_pptr, dp2, name2, dp1,
+				name1, ip2);
 		if (error)
 			goto out_trans_abort;
 	}
@@ -3540,25 +3538,21 @@ xfs_rename(
 		goto out_trans_cancel;
 
 	if (wip_pptr) {
-		error = xfs_parent_add(tp, wip_pptr,
-					     src_dp, src_name,
-					     old_diroffset, wip);
+		error = xfs_parent_add(tp, wip_pptr, src_dp, src_name, wip);
 		if (error)
 			goto out_trans_cancel;
 	}
 
 	if (src_ip_pptr) {
-		error = xfs_parent_replace(tp, src_ip_pptr, src_dp,
-				old_diroffset, target_name, target_dp,
-				new_diroffset, src_ip);
+		error = xfs_parent_replace(tp, src_ip_pptr, src_dp, src_name,
+				target_dp, target_name, src_ip);
 		if (error)
 			goto out_trans_cancel;
 	}
 
 	if (tgt_ip_pptr) {
-		error = xfs_parent_remove(tp, target_dp,
-						tgt_ip_pptr,
-						new_diroffset, target_ip);
+		error = xfs_parent_remove(tp, tgt_ip_pptr, target_dp,
+				target_name, target_ip);
 		if (error)
 			goto out_trans_cancel;
 	}
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index 6a6bd05c2a68..2dc1eef63d96 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -6,6 +6,8 @@
 #ifndef __XFS_ONDISK_H
 #define __XFS_ONDISK_H
 
+#include <crypto/sha2.h>
+
 #define XFS_CHECK_STRUCT_SIZE(structname, size) \
 	BUILD_BUG_ON_MSG(sizeof(structname) != (size), "XFS: sizeof(" \
 		#structname ") is wrong, expected " #size)
@@ -114,6 +116,8 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, offset,		1);
 	XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, name,		3);
 	XFS_CHECK_STRUCT_SIZE(xfs_dir2_sf_hdr_t,		10);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_name_rec,	76);
+	BUILD_BUG_ON(XFS_PARENT_NAME_HASH_SIZE != SHA512_DIGEST_SIZE);
 
 	/* log structures */
 	XFS_CHECK_STRUCT_SIZE(struct xfs_buf_log_format,	88);
diff --git a/fs/xfs/xfs_parent_utils.c b/fs/xfs/xfs_parent_utils.c
index 5ff7d38bc375..65bec3875308 100644
--- a/fs/xfs/xfs_parent_utils.c
+++ b/fs/xfs/xfs_parent_utils.c
@@ -83,7 +83,7 @@ xfs_getparent_listent(
 	pptr = &ppi->pi_parents[ppi->pi_ptrs_used++];
 	pptr->xpp_ino = irec->p_ino;
 	pptr->xpp_gen = irec->p_gen;
-	pptr->xpp_diroffset = irec->p_diroffset;
+	pptr->xpp_rsvd2 = 0;
 	pptr->xpp_rsvd = 0;
 
 	memcpy(pptr->xpp_name, irec->p_name, irec->p_namelen);
diff --git a/fs/xfs/xfs_sha512.h b/fs/xfs/xfs_sha512.h
new file mode 100644
index 000000000000..d9756db63aa6
--- /dev/null
+++ b/fs/xfs/xfs_sha512.h
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2023 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_SHA512_H__
+#define __XFS_SHA512_H__
+
+struct sha512_state {
+	union {
+		struct shash_desc desc;
+		char __desc[sizeof(struct shash_desc) + HASH_MAX_DESCSIZE];
+	};
+};
+
+#define SHA512_DESC_ON_STACK(mp, name) \
+	struct sha512_state name = { .desc.tfm = (mp)->m_sha512 }
+
+#define SHA512_DIGEST_SIZE	64
+
+static inline int sha512_init(struct sha512_state *md)
+{
+	return crypto_shash_init(&md->desc);
+}
+
+static inline int sha512_done(struct sha512_state *md, unsigned char *out)
+{
+	return crypto_shash_final(&md->desc, out);
+}
+
+static inline int sha512_process(struct sha512_state *md,
+		const unsigned char *in, unsigned long inlen)
+{
+	return crypto_shash_update(&md->desc, in, inlen);
+}
+
+static inline void sha512_erase(struct sha512_state *md)
+{
+	memset(md, 0, sizeof(*md));
+}
+
+#endif /* __XFS_SHA512_H__ */
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 63e68e832551..327c805815dc 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -348,8 +348,7 @@ xfs_symlink(
 	xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
 
 	if (parent) {
-		error = xfs_parent_add(tp, parent, dp, link_name,
-					     diroffset, ip);
+		error = xfs_parent_add(tp, parent, dp, link_name, ip);
 		if (error)
 			goto out_trans_cancel;
 	}



* [PATCH 3/5] xfs: skip the sha512 namehash when possible
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Darrick J. Wong
  2023-02-16 20:51   ` [PATCH 1/5] xfs: load secure hash algorithm for parent pointers Darrick J. Wong
  2023-02-16 20:51   ` [PATCH 2/5] xfs: replace parent pointer diroffset with sha512 hash of name Darrick J. Wong
@ 2023-02-16 20:52   ` Darrick J. Wong
  2023-02-16 20:52   ` [PATCH 4/5] xfs: make the ondisk parent pointer record a flex array Darrick J. Wong
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:52 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Reduce the size and performance impact of parent pointer name hashes by
storing the dirent name itself in place of the hash whenever the name is
shorter than a sha512 digest (64 bytes).  In other words, we only compute
sha512 for names longer than 63 bytes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
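
For anyone skimming, here is a rough userspace sketch of the selection
rule described in the commit message above.  It is not the kernel code:
pick_namehash(), toy_digest() and NAME_HASH_SIZE are made-up names for
illustration, and toy_digest() is a deterministic placeholder, NOT
sha512.  The real hash in this series covers the child inode generation
and the dirent name; the sketch only mirrors that input shape.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAME_HASH_SIZE	64	/* assumed stand-in for XFS_PARENT_NAME_HASH_SIZE */

/* Placeholder digest (hypothetical): NOT sha512, just fills 64 bytes. */
static void toy_digest(uint32_t gen, const uint8_t *name, size_t len,
		uint8_t out[NAME_HASH_SIZE])
{
	uint32_t	h = 2166136261u ^ gen;
	size_t		i;

	for (i = 0; i < len; i++)
		h = (h ^ name[i]) * 16777619u;
	for (i = 0; i < NAME_HASH_SIZE; i++) {
		h = h * 1664525u + 1013904223u;
		out[i] = (uint8_t)(h >> 24);
	}
}

/*
 * Fill @out with either the name itself (short names) or a digest
 * (names of NAME_HASH_SIZE bytes or more).  Returns the number of
 * meaningful bytes in @out.
 */
static int pick_namehash(uint32_t gen, const uint8_t *name, size_t len,
		uint8_t out[NAME_HASH_SIZE])
{
	assert(name != NULL);

	if (len < NAME_HASH_SIZE) {
		/* Short name: store it verbatim and zero the tail. */
		memcpy(out, name, len);
		memset(out + len, 0, NAME_HASH_SIZE - len);
		return (int)len;
	}

	/* Long name: fall back to the (placeholder) digest. */
	toy_digest(gen, name, len, out);
	return NAME_HASH_SIZE;
}
```

With this rule a 5-byte name yields a 5-byte "hash", a 63-byte name
yields 63 bytes, and a 64-byte name is the first length that gets
digested.
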
 fs/xfs/libxfs/xfs_da_format.h |   21 +++++++++-
 fs/xfs/libxfs/xfs_parent.c    |   85 +++++++++++++++++++++++++++--------------
 fs/xfs/libxfs/xfs_parent.h    |    8 ++--
 fs/xfs/scrub/dir.c            |   12 ++++--
 fs/xfs/scrub/dir_repair.c     |    2 -
 fs/xfs/scrub/parent.c         |   16 +++++---
 fs/xfs/scrub/parent_repair.c  |    2 -
 fs/xfs/xfs_parent_utils.c     |    2 -
 8 files changed, 101 insertions(+), 47 deletions(-)
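
Since the xattr name becomes variable length with this patch, a small
sketch of the length arithmetic may help reviewers.  This is an
illustration only: struct pptr_rec and the rec_*() helpers are
hypothetical userspace stand-ins that assume the 8-byte inode number,
4-byte generation and 64-byte namehash layout implied by the 76-byte
size check earlier in the series, and they rely on the GCC/Clang
packed attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NAME_HASH_SIZE	64	/* assumed stand-in for XFS_PARENT_NAME_HASH_SIZE */

/* Hypothetical mirror of the ondisk parent pointer xattr name. */
struct pptr_rec {
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint8_t		p_namehash[NAME_HASH_SIZE];
} __attribute__((packed));

/* Record length for a namehash of @hashlen bytes. */
static unsigned int rec_sizeof(unsigned int hashlen)
{
	return offsetof(struct pptr_rec, p_namehash) + hashlen;
}

/* Inverse: namehash length for a record of @reclen bytes. */
static unsigned int rec_hashlen(unsigned int reclen)
{
	return reclen - offsetof(struct pptr_rec, p_namehash);
}

/* A valid record carries at least 1 and at most 64 namehash bytes. */
static bool rec_len_ok(unsigned int reclen)
{
	return reclen > rec_sizeof(0) &&
	       reclen <= rec_sizeof(NAME_HASH_SIZE);
}
```

Under these assumptions the fixed header is 12 bytes, so valid xattr
name lengths run from 13 to 76 bytes, and a 5-byte dirent name gives a
17-byte record.
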


diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 386f63b262d5..275357506394 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -831,8 +831,11 @@ xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
  * Parent pointer attribute format definition
  *
  * The EA name encodes the parent inode number, generation and a collision
- * resistant hash computed from the dirent name.  The hash is defined to be the
- * sha512 of the child inode generation and the dirent name.
+ * resistant hash computed from the dirent name.  The hash is defined to be:
+ *
+ * - The dirent name if it fits within the EA name.
+ *
+ * - The sha512 of the child inode generation and the dirent name.
  *
  * The EA value contains the same name as the dirent in the parent directory.
  */
@@ -842,4 +845,18 @@ struct xfs_parent_name_rec {
 	__u8	p_namehash[XFS_PARENT_NAME_HASH_SIZE];
 } __attribute__((packed));
 
+static inline unsigned int
+xfs_parent_name_rec_sizeof(
+	unsigned int		hashlen)
+{
+	return offsetof(struct xfs_parent_name_rec, p_namehash) + hashlen;
+}
+
+static inline unsigned int
+xfs_parent_name_hashlen(
+	unsigned int		rec_sizeof)
+{
+	return rec_sizeof - offsetof(struct xfs_parent_name_rec, p_namehash);
+}
+
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index a28dcf18cb4d..32235a0e9e0d 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -56,7 +56,8 @@ xfs_parent_namecheck(
 {
 	xfs_ino_t				p_ino;
 
-	if (reclen != sizeof(struct xfs_parent_name_rec))
+	if (reclen <= xfs_parent_name_rec_sizeof(0) ||
+	    reclen > xfs_parent_name_rec_sizeof(XFS_PARENT_NAME_HASH_SIZE))
 		return false;
 
 	/* Only one namespace bit allowed. */
@@ -108,12 +109,16 @@ void
 xfs_parent_irec_from_disk(
 	struct xfs_parent_name_irec	*irec,
 	const struct xfs_parent_name_rec *rec,
+	int				reclen,
 	const void			*value,
 	int				valuelen)
 {
 	irec->p_ino = be64_to_cpu(rec->p_ino);
 	irec->p_gen = be32_to_cpu(rec->p_gen);
-	memcpy(irec->p_namehash, rec->p_namehash, sizeof(irec->p_namehash));
+	irec->hashlen = xfs_parent_name_hashlen(reclen);
+	memcpy(irec->p_namehash, rec->p_namehash, irec->hashlen);
+	memset(irec->p_namehash + irec->hashlen, 0,
+			sizeof(irec->p_namehash) - irec->hashlen);
 
 	if (!value) {
 		irec->p_namelen = 0;
@@ -137,13 +142,15 @@ xfs_parent_irec_from_disk(
 void
 xfs_parent_irec_to_disk(
 	struct xfs_parent_name_rec	*rec,
+	int				*reclen,
 	void				*value,
 	int				*valuelen,
 	const struct xfs_parent_name_irec *irec)
 {
 	rec->p_ino = cpu_to_be64(irec->p_ino);
 	rec->p_gen = cpu_to_be32(irec->p_gen);
-	memcpy(rec->p_namehash, irec->p_namehash, sizeof(rec->p_namehash));
+	*reclen = xfs_parent_name_rec_sizeof(irec->hashlen);
+	memcpy(rec->p_namehash, irec->p_namehash, irec->hashlen);
 
 	if (valuelen) {
 		ASSERT(*valuelen > 0);
@@ -206,12 +213,14 @@ xfs_parent_add(
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &parent->args;
-	int			error;
+	int			hashlen;
 
-	error = xfs_init_parent_name_rec(&parent->rec, dp, parent_name, child);
-	if (error)
-		return error;
+	hashlen = xfs_init_parent_name_rec(&parent->rec, dp, parent_name,
+			child);
+	if (hashlen < 0)
+		return hashlen;
 
+	args->namelen = xfs_parent_name_rec_sizeof(hashlen);
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 
 	args->trans = tp;
@@ -234,12 +243,13 @@ xfs_parent_remove(
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &parent->args;
-	int			error;
+	int			hashlen;
 
-	error = xfs_init_parent_name_rec(&parent->rec, dp, name, child);
-	if (error)
-		return error;
+	hashlen = xfs_init_parent_name_rec(&parent->rec, dp, name, child);
+	if (hashlen < 0)
+		return hashlen;
 
+	args->namelen = xfs_parent_name_rec_sizeof(hashlen);
 	args->trans = tp;
 	args->dp = child;
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
@@ -258,21 +268,21 @@ xfs_parent_replace(
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &new_parent->args;
-	int			error;
+	int			old_hashlen, new_hashlen;
 
-	error = xfs_init_parent_name_rec(&new_parent->old_rec, old_dp,
+	old_hashlen = xfs_init_parent_name_rec(&new_parent->old_rec, old_dp,
 			old_name, child);
-	if (error)
-		return error;
-	error = xfs_init_parent_name_rec(&new_parent->rec, new_dp, new_name,
-			child);
-	if (error)
-		return error;
+	if (old_hashlen < 0)
+		return old_hashlen;
+	new_hashlen = xfs_init_parent_name_rec(&new_parent->rec, new_dp,
+			new_name, child);
+	if (new_hashlen < 0)
+		return new_hashlen;
 
 	new_parent->args.name = (const uint8_t *)&new_parent->old_rec;
-	new_parent->args.namelen = sizeof(struct xfs_parent_name_rec);
+	new_parent->args.namelen = xfs_parent_name_rec_sizeof(old_hashlen);
 	new_parent->args.new_name = (const uint8_t *)&new_parent->rec;
-	new_parent->args.new_namelen = sizeof(struct xfs_parent_name_rec);
+	new_parent->args.new_namelen = xfs_parent_name_rec_sizeof(new_hashlen);
 	args->trans = tp;
 	args->dp = child;
 
@@ -320,16 +330,17 @@ xfs_parent_lookup(
 	unsigned int			namelen,
 	struct xfs_parent_scratch	*scr)
 {
+	int				reclen;
 	int				error;
 
-	xfs_parent_irec_to_disk(&scr->rec, NULL, NULL, pptr);
+	xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
 
 	memset(&scr->args, 0, sizeof(struct xfs_da_args));
 	scr->args.attr_filter	= XFS_ATTR_PARENT;
 	scr->args.dp		= ip;
 	scr->args.geo		= ip->i_mount->m_attr_geo;
 	scr->args.name		= (const unsigned char *)&scr->rec;
-	scr->args.namelen	= sizeof(struct xfs_parent_name_rec);
+	scr->args.namelen	= reclen;
 	scr->args.op_flags	= XFS_DA_OP_OKNOENT;
 	scr->args.trans		= tp;
 	scr->args.valuelen	= namelen;
@@ -357,14 +368,16 @@ xfs_parent_set(
 	const struct xfs_parent_name_irec *pptr,
 	struct xfs_parent_scratch	*scr)
 {
-	xfs_parent_irec_to_disk(&scr->rec, NULL, NULL, pptr);
+	int				reclen;
+
+	xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
 
 	memset(&scr->args, 0, sizeof(struct xfs_da_args));
 	scr->args.attr_filter	= XFS_ATTR_PARENT;
 	scr->args.dp		= ip;
 	scr->args.geo		= ip->i_mount->m_attr_geo;
 	scr->args.name		= (const unsigned char *)&scr->rec;
-	scr->args.namelen	= sizeof(struct xfs_parent_name_rec);
+	scr->args.namelen	= reclen;
 	scr->args.valuelen	= pptr->p_namelen;
 	scr->args.value		= (void *)pptr->p_name;
 	scr->args.whichfork	= XFS_ATTR_FORK;
@@ -384,14 +397,16 @@ xfs_parent_unset(
 	const struct xfs_parent_name_irec *pptr,
 	struct xfs_parent_scratch	*scr)
 {
-	xfs_parent_irec_to_disk(&scr->rec, NULL, NULL, pptr);
+	int				reclen;
+
+	xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
 
 	memset(&scr->args, 0, sizeof(struct xfs_da_args));
 	scr->args.attr_filter	= XFS_ATTR_PARENT;
 	scr->args.dp		= ip;
 	scr->args.geo		= ip->i_mount->m_attr_geo;
 	scr->args.name		= (const unsigned char *)&scr->rec;
-	scr->args.namelen	= sizeof(struct xfs_parent_name_rec);
+	scr->args.namelen	= reclen;
 	scr->args.whichfork	= XFS_ATTR_FORK;
 
 	return xfs_attr_set(&scr->args);
@@ -399,7 +414,7 @@ xfs_parent_unset(
 
 /*
  * Compute the parent pointer namehash for the given child file and dirent
- * name.
+ * name.  Returns the length of the hash in bytes, or a negative errno.
  */
 int
 xfs_parent_namehash(
@@ -420,6 +435,12 @@ xfs_parent_namehash(
 		return -EINVAL;
 	}
 
+	if (name->len < namehash_len) {
+		memcpy(namehash, name->name, name->len);
+		memset(namehash + name->len, 0, namehash_len - name->len);
+		return name->len;
+	}
+
 	error = sha512_init(&shash);
 	if (error)
 		goto out;
@@ -436,6 +457,7 @@ xfs_parent_namehash(
 	if (error)
 		goto out;
 
+	error = SHA512_DIGEST_SIZE;
 out:
 	sha512_erase(&shash);
 	return error;
@@ -451,7 +473,12 @@ xfs_parent_irec_hash(
 		.name			= pptr->p_name,
 		.len			= pptr->p_namelen,
 	};
+	int				hashlen;
 
-	return xfs_parent_namehash(ip, &xname, &pptr->p_namehash,
+	hashlen = xfs_parent_namehash(ip, &xname, &pptr->p_namehash,
 			sizeof(pptr->p_namehash));
+	if (hashlen < 0)
+		return hashlen;
+	pptr->hashlen = hashlen;
+	return 0;
 }
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index d3f2841e0f6e..4c3100760bba 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -23,6 +23,7 @@ struct xfs_parent_name_irec {
 	/* Key fields for looking up a particular parent pointer. */
 	xfs_ino_t		p_ino;
 	uint32_t		p_gen;
+	uint8_t			hashlen;
 	uint8_t			p_namehash[XFS_PARENT_NAME_HASH_SIZE];
 
 	/* Attributes of a parent pointer. */
@@ -31,10 +32,11 @@ struct xfs_parent_name_irec {
 };
 
 void xfs_parent_irec_from_disk(struct xfs_parent_name_irec *irec,
-		const struct xfs_parent_name_rec *rec,
+		const struct xfs_parent_name_rec *rec, int reclen,
 		const void *value, int valuelen);
-void xfs_parent_irec_to_disk(struct xfs_parent_name_rec *rec, void *value,
-		int *valuelen, const struct xfs_parent_name_irec *irec);
+void xfs_parent_irec_to_disk(struct xfs_parent_name_rec *rec, int *reclen,
+		void *value, int *valuelen,
+		const struct xfs_parent_name_irec *irec);
 
 /*
  * Dynamically allocd structure used to wrap the needed data to pass around
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 2494947a0c93..87cff40b15f1 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -144,15 +144,19 @@ xchk_dir_parent_pointer(
 {
 	struct xfs_scrub	*sc = sd->sc;
 	int			pptr_namelen;
-	int			error;
+	int			hashlen;
 
 	sd->pptr.p_ino = sc->ip->i_ino;
 	sd->pptr.p_gen = VFS_I(sc->ip)->i_generation;
 
-	error = xfs_parent_namehash(ip, name, &sd->pptr.p_namehash,
+	hashlen = xfs_parent_namehash(ip, name, &sd->pptr.p_namehash,
 			sizeof(sd->pptr.p_namehash));
-	if (error)
-		return error;
+	if (hashlen < 0) {
+		xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0,
+				&hashlen);
+		return hashlen;
+	}
+	sd->pptr.hashlen = hashlen;
 
 	pptr_namelen = xfs_parent_lookup(sc->tp, ip, &sd->pptr, sd->namebuf,
 			MAXNAMELEN, &sd->pptr_scratch);
diff --git a/fs/xfs/scrub/dir_repair.c b/fs/xfs/scrub/dir_repair.c
index c0b2b78da277..b12548787321 100644
--- a/fs/xfs/scrub/dir_repair.c
+++ b/fs/xfs/scrub/dir_repair.c
@@ -540,7 +540,7 @@ xrep_dir_scan_parent_pointer(
 	    !xfs_parent_valuecheck(sc->mp, value, valuelen))
 		return -EFSCORRUPTED;
 
-	xfs_parent_irec_from_disk(&rd->pptr, rec, value, valuelen);
+	xfs_parent_irec_from_disk(&rd->pptr, rec, namelen, value, valuelen);
 
 	/* Ignore parent pointers that point back to a different dir. */
 	if (rd->pptr.p_ino != sc->ip->i_ino ||
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index 53872a7be942..b47f0bcef690 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -531,6 +531,7 @@ xchk_parent_scan_attr(
 	struct xfs_inode	*dp = NULL;
 	const struct xfs_parent_name_rec *rec = (const void *)name;
 	unsigned int		lockmode;
+	int			hashlen;
 	int			error;
 
 	/* Ignore incomplete xattrs */
@@ -552,7 +553,7 @@ xchk_parent_scan_attr(
 		return -ECANCELED;
 	}
 
-	xfs_parent_irec_from_disk(&pp->pptr, rec, value, valuelen);
+	xfs_parent_irec_from_disk(&pp->pptr, rec, namelen, value, valuelen);
 
 	xname.name = pp->pptr.p_name;
 	xname.len = pp->pptr.p_namelen;
@@ -561,13 +562,16 @@ xchk_parent_scan_attr(
 	 * Does the namehash in the parent pointer match the actual name?
 	 * If not, there's no point in checking further.
 	 */
-	error = xfs_parent_namehash(sc->ip, &xname, pp->child_namehash,
+	hashlen = xfs_parent_namehash(sc->ip, &xname, pp->child_namehash,
 			sizeof(pp->child_namehash));
-	if (!xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0, &error))
-		return error;
+	if (hashlen < 0) {
+		xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0, &hashlen);
+		return hashlen;
+	}
 
-	if (memcmp(pp->pptr.p_namehash, pp->child_namehash,
-				sizeof(pp->pptr.p_namehash))) {
+	if (hashlen != pp->pptr.hashlen ||
+	    memcmp(pp->pptr.p_namehash, pp->child_namehash,
+				pp->pptr.hashlen)) {
 		trace_xchk_parent_bad_namehash(sc->ip, pp->pptr.p_ino,
 				xname.name, xname.len);
 		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
index 51432ab61c94..7d3b9c82bd05 100644
--- a/fs/xfs/scrub/parent_repair.c
+++ b/fs/xfs/scrub/parent_repair.c
@@ -522,7 +522,7 @@ xrep_pptr_dump_tempptr(
 	else
 		return -EFSCORRUPTED;
 
-	xfs_parent_irec_from_disk(&rp->pptr, rec, value, valuelen);
+	xfs_parent_irec_from_disk(&rp->pptr, rec, namelen, value, valuelen);
 
 	trace_xrep_pptr_dumpname(sc->tempip, &rp->pptr);
 
diff --git a/fs/xfs/xfs_parent_utils.c b/fs/xfs/xfs_parent_utils.c
index 65bec3875308..284ca3c14a0f 100644
--- a/fs/xfs/xfs_parent_utils.c
+++ b/fs/xfs/xfs_parent_utils.c
@@ -74,7 +74,7 @@ xfs_getparent_listent(
 		return;
 	}
 
-	xfs_parent_irec_from_disk(&gp->pptr_irec, (void *)name, value,
+	xfs_parent_irec_from_disk(&gp->pptr_irec, (void *)name, namelen, value,
 			valuelen);
 
 	trace_xfs_getparent_listent(context->dp, ppi, irec);



* [PATCH 4/5] xfs: make the ondisk parent pointer record a flex array
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 20:52   ` [PATCH 3/5] xfs: skip the sha512 namehash when possible Darrick J. Wong
@ 2023-02-16 20:52   ` Darrick J. Wong
  2023-02-16 20:52   ` [PATCH 5/5] xfs: use parent pointer xattr space more efficiently Darrick J. Wong
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:52 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that we can use the filename as the parent pointer name hash, we no
longer always write the full 64 bytes into the xattr.  In other words,
the namehash is really a flex array, so adjust its C definition.
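
For readers following along outside the kernel tree, here is a minimal
userspace sketch of the size bookkeeping that a flex array implies.  The
names pptr_rec, pptr_rec_sizeof and pptr_hashlen are stand-ins invented
for this illustration, plain integer types replace __be64/__be32, and the
packed attribute assumes GCC or Clang.  It is not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the ondisk record; field widths follow the patch. */
struct pptr_rec {
	uint64_t	p_ino;		/* big-endian on disk */
	uint32_t	p_gen;		/* big-endian on disk */
	uint8_t		p_namehash[];	/* variable-length tail */
} __attribute__((packed));

/* With a flex array, sizeof() covers only the fixed 12-byte header. */
static_assert(sizeof(struct pptr_rec) == 12, "header is 12 bytes");
static_assert(offsetof(struct pptr_rec, p_namehash) == 12,
	      "namehash starts right after the header");

/* Total EA name length for a namehash of hashlen bytes. */
static inline unsigned int pptr_rec_sizeof(unsigned int hashlen)
{
	return sizeof(struct pptr_rec) + hashlen;
}

/* Inverse: recover the namehash length from the EA name length. */
static inline unsigned int pptr_hashlen(unsigned int rec_sizeof)
{
	return rec_sizeof - sizeof(struct pptr_rec);
}
```

Because sizeof() and offsetof() of the flex array member agree once the
struct is packed, either spelling gives the same answer; the old fixed
76-byte size is now just the 64-byte hash case.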

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_format.h  |    9 ++++++---
 fs/xfs/libxfs/xfs_parent.c     |    4 ++--
 fs/xfs/libxfs/xfs_parent.h     |   15 ++++++++++++---
 fs/xfs/libxfs/xfs_trans_resv.c |    6 +++---
 fs/xfs/xfs_ondisk.h            |    2 +-
 5 files changed, 24 insertions(+), 12 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 275357506394..4d85830785ae 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -842,21 +842,24 @@ xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
 struct xfs_parent_name_rec {
 	__be64  p_ino;
 	__be32  p_gen;
-	__u8	p_namehash[XFS_PARENT_NAME_HASH_SIZE];
+	__u8	p_namehash[];
 } __attribute__((packed));
 
+#define XFS_PARENT_NAME_MAX_SIZE \
+	(sizeof(struct xfs_parent_name_rec) + XFS_PARENT_NAME_HASH_SIZE)
+
 static inline unsigned int
 xfs_parent_name_rec_sizeof(
 	unsigned int		hashlen)
 {
-	return offsetof(struct xfs_parent_name_rec, p_namehash) + hashlen;
+	return sizeof(struct xfs_parent_name_rec) + hashlen;
 }
 
 static inline unsigned int
 xfs_parent_name_hashlen(
 	unsigned int		rec_sizeof)
 {
-	return rec_sizeof - offsetof(struct xfs_parent_name_rec, p_namehash);
+	return rec_sizeof - sizeof(struct xfs_parent_name_rec);
 }
 
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 32235a0e9e0d..6520e35178a0 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -98,7 +98,7 @@ xfs_init_parent_name_rec(
 	rec->p_ino = cpu_to_be64(dp->i_ino);
 	rec->p_gen = cpu_to_be32(VFS_IC(dp)->i_generation);
 	return xfs_parent_namehash(ip, name, rec->p_namehash,
-			sizeof(rec->p_namehash));
+			XFS_PARENT_NAME_HASH_SIZE);
 }
 
 /*
@@ -197,7 +197,7 @@ __xfs_parent_init(
 	parent->args.attr_filter = XFS_ATTR_PARENT;
 	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED;
 	parent->args.name = (const uint8_t *)&parent->rec;
-	parent->args.namelen = sizeof(struct xfs_parent_name_rec);
+	parent->args.namelen = 0;
 
 	*parentp = parent;
 	return 0;
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 4c3100760bba..3431aac75e92 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -43,8 +43,14 @@ void xfs_parent_irec_to_disk(struct xfs_parent_name_rec *rec, int *reclen,
  * the defer ops machinery
  */
 struct xfs_parent_defer {
-	struct xfs_parent_name_rec	rec;
-	struct xfs_parent_name_rec	old_rec;
+	union {
+		struct xfs_parent_name_rec	rec;
+		__u8			dummy1[XFS_PARENT_NAME_MAX_SIZE];
+	};
+	union {
+		struct xfs_parent_name_rec	old_rec;
+		__u8			dummy2[XFS_PARENT_NAME_MAX_SIZE];
+	};
 	struct xfs_da_args		args;
 	bool				have_log;
 };
@@ -112,7 +118,10 @@ unsigned int xfs_pptr_calc_space_res(struct xfs_mount *mp,
 
 /* Scratchpad memory so that raw parent operations don't burn stack space. */
 struct xfs_parent_scratch {
-	struct xfs_parent_name_rec	rec;
+	union {
+		struct xfs_parent_name_rec	rec;
+		__u8			dummy1[XFS_PARENT_NAME_MAX_SIZE];
+	};
 	struct xfs_da_args		args;
 };
 
diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 93419956b9e5..0e625c6b0153 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -427,19 +427,19 @@ static inline unsigned int xfs_calc_pptr_link_overhead(void)
 {
 	return sizeof(struct xfs_attri_log_format) +
 			xlog_calc_iovec_len(XATTR_NAME_MAX) +
-			xlog_calc_iovec_len(sizeof(struct xfs_parent_name_rec));
+			xlog_calc_iovec_len(XFS_PARENT_NAME_MAX_SIZE);
 }
 static inline unsigned int xfs_calc_pptr_unlink_overhead(void)
 {
 	return sizeof(struct xfs_attri_log_format) +
-			xlog_calc_iovec_len(sizeof(struct xfs_parent_name_rec));
+			xlog_calc_iovec_len(XFS_PARENT_NAME_MAX_SIZE);
 }
 static inline unsigned int xfs_calc_pptr_replace_overhead(void)
 {
 	return sizeof(struct xfs_attri_log_format) +
 			xlog_calc_iovec_len(XATTR_NAME_MAX) +
 			xlog_calc_iovec_len(XATTR_NAME_MAX) +
-			xlog_calc_iovec_len(sizeof(struct xfs_parent_name_rec));
+			xlog_calc_iovec_len(XFS_PARENT_NAME_MAX_SIZE);
 }
 
 /*
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index 2dc1eef63d96..24361ae0fd48 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -116,7 +116,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, offset,		1);
 	XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, name,		3);
 	XFS_CHECK_STRUCT_SIZE(xfs_dir2_sf_hdr_t,		10);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_name_rec,	76);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_name_rec,	12);
 	BUILD_BUG_ON(XFS_PARENT_NAME_HASH_SIZE != SHA512_DIGEST_SIZE);
 
 	/* log structures */



* [PATCH 5/5] xfs: use parent pointer xattr space more efficiently
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Darrick J. Wong
                     ` (3 preceding siblings ...)
  2023-02-16 20:52   ` [PATCH 4/5] xfs: make the ondisk parent pointer record a flex array Darrick J. Wong
@ 2023-02-16 20:52   ` Darrick J. Wong
  2023-02-18  8:12   ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Amir Goldstein
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:52 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Amend the parent pointer xattr format even more.  Now we put as much of
the dirent name in the namehash as we can.  For names that don't fit,
the namehash is the truncated dirent name, with the sha512 of the child
inode generation and the entire name written at the end of the namehash.
The EA value then holds only the part of the name that doesn't fit in
the namehash.
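
As a rough illustration of how a dirent name is divided between the EA
name and the EA value under this format, here is a small userspace
sketch.  The PPTR_* macros, struct pptr_split and pptr_split_name() are
hypothetical names made up for this example; the numeric constants mirror
the ones the patch checks in xfs_ondisk.h (243 and 179), and the sha512
computation itself is left out.

```c
#include <assert.h>
#include <stddef.h>

#define MAXNAMELEN		256
#define PPTR_REC_HDR_SIZE	12	/* p_ino + p_gen */
#define PPTR_NAME_MAX_SIZE	(MAXNAMELEN - 1)
#define PPTR_MAX_HASH_SIZE	(PPTR_NAME_MAX_SIZE - PPTR_REC_HDR_SIZE)
#define PPTR_SHA512_SIZE	64
#define PPTR_SHA512_OFFSET	(PPTR_MAX_HASH_SIZE - PPTR_SHA512_SIZE)

static_assert(PPTR_MAX_HASH_SIZE == 243, "matches the ondisk check");
static_assert(PPTR_SHA512_OFFSET == 179, "matches the ondisk check");

struct pptr_split {
	size_t	hashlen;	/* namehash bytes stored in the EA name */
	size_t	name_in_key;	/* dirent name bytes held in the namehash */
	size_t	valuelen;	/* dirent name bytes left for the EA value */
};

/* Decide where the bytes of a dirent name of length namelen end up. */
static struct pptr_split pptr_split_name(size_t namelen)
{
	struct pptr_split s;

	if (namelen < PPTR_MAX_HASH_SIZE) {
		/* Short name: stored whole in the namehash, empty value. */
		s.hashlen = namelen;
		s.name_in_key = namelen;
		s.valuelen = 0;
	} else {
		/* Long name: 179 name bytes, then a 64-byte sha512. */
		s.hashlen = PPTR_MAX_HASH_SIZE;
		s.name_in_key = PPTR_SHA512_OFFSET;
		s.valuelen = namelen - PPTR_SHA512_OFFSET;
	}
	return s;
}
```

Under these assumptions a 255-byte name leaves 76 bytes for the EA value,
which matches the upper bound enforced by the value check in the patch.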

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_format.h |   26 +++++++---
 fs/xfs/libxfs/xfs_parent.c    |  111 ++++++++++++++++++++++++++++++++---------
 fs/xfs/libxfs/xfs_parent.h    |    6 +-
 fs/xfs/scrub/dir_repair.c     |    2 -
 fs/xfs/scrub/parent.c         |    4 +
 fs/xfs/xfs_attr_item.c        |    4 +
 fs/xfs/xfs_ondisk.h           |    4 +
 7 files changed, 118 insertions(+), 39 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 4d85830785ae..55f510f82e8d 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -825,19 +825,24 @@ xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
 				      struct xfs_da3_blkinfo *hdr3);
 
 /* We use sha512 for the parent pointer name hash. */
-#define XFS_PARENT_NAME_HASH_SIZE	(64)
+#define XFS_PARENT_NAME_SHA512_SIZE	(64)
 
 /*
  * Parent pointer attribute format definition
  *
  * The EA name encodes the parent inode number, generation and a collision
- * resistant hash computed from the dirent name.  The hash is defined to be:
+ * resistant hash computed from the dirent name.  The hash is defined to be
+ * one of the following:
  *
- * - The dirent name if it fits within the EA name.
+ * - The dirent name, as long as it does not use the last possible byte of the
+ *   EA name space.
  *
- * - The sha512 of the child inode generation and the dirent name.
+ * - The truncated dirent name, with the sha512 hash of the child inode
+ *   generation number and dirent name.  The hash is written at the end of the
+ *   EA name.
  *
- * The EA value contains the same name as the dirent in the parent directory.
+ * The EA value contains however much of the dirent name that does not fit in
+ * the EA name.
  */
 struct xfs_parent_name_rec {
 	__be64  p_ino;
@@ -845,8 +850,17 @@ struct xfs_parent_name_rec {
 	__u8	p_namehash[];
 } __attribute__((packed));
 
+/* Maximum size of a parent pointer EA name. */
 #define XFS_PARENT_NAME_MAX_SIZE \
-	(sizeof(struct xfs_parent_name_rec) + XFS_PARENT_NAME_HASH_SIZE)
+	(MAXNAMELEN - 1)
+
+/* Maximum size of a parent pointer name hash. */
+#define XFS_PARENT_NAME_MAX_HASH_SIZE \
+	(XFS_PARENT_NAME_MAX_SIZE - sizeof(struct xfs_parent_name_rec))
+
+/* Offset of the sha512 hash, if used. */
+#define XFS_PARENT_NAME_SHA512_OFFSET \
+	(XFS_PARENT_NAME_MAX_HASH_SIZE - XFS_PARENT_NAME_SHA512_SIZE)
 
 static inline unsigned int
 xfs_parent_name_rec_sizeof(
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 6520e35178a0..f7fecee93894 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -57,7 +57,7 @@ xfs_parent_namecheck(
 	xfs_ino_t				p_ino;
 
 	if (reclen <= xfs_parent_name_rec_sizeof(0) ||
-	    reclen > xfs_parent_name_rec_sizeof(XFS_PARENT_NAME_HASH_SIZE))
+	    reclen > xfs_parent_name_rec_sizeof(XFS_PARENT_NAME_MAX_HASH_SIZE))
 		return false;
 
 	/* Only one namespace bit allowed. */
@@ -75,10 +75,18 @@ xfs_parent_namecheck(
 bool
 xfs_parent_valuecheck(
 	struct xfs_mount		*mp,
+	size_t				namelen,
 	const void			*value,
 	size_t				valuelen)
 {
-	if (valuelen == 0 || valuelen >= MAXNAMELEN)
+	if (namelen > XFS_PARENT_NAME_MAX_SIZE)
+		return false;
+
+	if (namelen < XFS_PARENT_NAME_MAX_SIZE && valuelen != 0)
+		return false;
+
+	if (namelen == XFS_PARENT_NAME_MAX_SIZE &&
+	    valuelen >= MAXNAMELEN - XFS_PARENT_NAME_SHA512_OFFSET)
 		return false;
 
 	if (value == NULL)
@@ -98,7 +106,20 @@ xfs_init_parent_name_rec(
 	rec->p_ino = cpu_to_be64(dp->i_ino);
 	rec->p_gen = cpu_to_be32(VFS_IC(dp)->i_generation);
 	return xfs_parent_namehash(ip, name, rec->p_namehash,
-			XFS_PARENT_NAME_HASH_SIZE);
+			XFS_PARENT_NAME_MAX_HASH_SIZE);
+}
+
+/* Compute the number of name bytes that can be encoded in the namehash. */
+static inline unsigned int
+xfs_parent_valuelen_adj(
+	int			hashlen)
+{
+	ASSERT(hashlen > 0);
+
+	if (hashlen == XFS_PARENT_NAME_MAX_HASH_SIZE)
+		return XFS_PARENT_NAME_SHA512_OFFSET;
+
+	return hashlen;
 }
 
 /*
@@ -125,14 +146,29 @@ xfs_parent_irec_from_disk(
 		return;
 	}
 
-	ASSERT(valuelen > 0);
 	ASSERT(valuelen < MAXNAMELEN);
 
-	valuelen = min(valuelen, MAXNAMELEN);
+	if (irec->hashlen == XFS_PARENT_NAME_MAX_HASH_SIZE) {
+		ASSERT(valuelen > 0);
+		ASSERT(valuelen <= MAXNAMELEN - XFS_PARENT_NAME_SHA512_OFFSET);
 
-	irec->p_namelen = valuelen;
-	memcpy(irec->p_name, value, valuelen);
-	memset(&irec->p_name[valuelen], 0, sizeof(irec->p_name) - valuelen);
+		valuelen = min_t(int, valuelen,
+				MAXNAMELEN - XFS_PARENT_NAME_SHA512_OFFSET);
+
+		memcpy(irec->p_name, irec->p_namehash,
+				XFS_PARENT_NAME_SHA512_OFFSET);
+		memcpy(&irec->p_name[XFS_PARENT_NAME_SHA512_OFFSET],
+				value, valuelen);
+		irec->p_namelen = XFS_PARENT_NAME_SHA512_OFFSET + valuelen;
+	} else {
+		ASSERT(valuelen == 0);
+
+		memcpy(irec->p_name, irec->p_namehash, irec->hashlen);
+		irec->p_namelen = irec->hashlen;
+	}
+
+	memset(&irec->p_name[irec->p_namelen], 0,
+			sizeof(irec->p_name) - irec->p_namelen);
 }
 
 /*
@@ -157,11 +193,15 @@ xfs_parent_irec_to_disk(
 		ASSERT(*valuelen >= irec->p_namelen);
 		ASSERT(*valuelen < MAXNAMELEN);
 
-		*valuelen = irec->p_namelen;
+		if (irec->hashlen == XFS_PARENT_NAME_MAX_HASH_SIZE)
+			*valuelen = irec->p_namelen - XFS_PARENT_NAME_SHA512_OFFSET;
+		else
+			*valuelen = 0;
 	}
 
-	if (value)
-		memcpy(value, irec->p_name, irec->p_namelen);
+	if (value && irec->hashlen == XFS_PARENT_NAME_MAX_HASH_SIZE)
+		memcpy(value, irec->p_name + XFS_PARENT_NAME_SHA512_OFFSET,
+			      irec->p_namelen - XFS_PARENT_NAME_SHA512_OFFSET);
 }
 
 /*
@@ -214,6 +254,7 @@ xfs_parent_add(
 {
 	struct xfs_da_args	*args = &parent->args;
 	int			hashlen;
+	unsigned int		name_adj;
 
 	hashlen = xfs_init_parent_name_rec(&parent->rec, dp, parent_name,
 			child);
@@ -223,11 +264,13 @@ xfs_parent_add(
 	args->namelen = xfs_parent_name_rec_sizeof(hashlen);
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 
+	name_adj = xfs_parent_valuelen_adj(hashlen);
+
 	args->trans = tp;
 	args->dp = child;
 	if (parent_name) {
-		parent->args.value = (void *)parent_name->name;
-		parent->args.valuelen = parent_name->len;
+		parent->args.value = (void *)parent_name->name + name_adj;
+		parent->args.valuelen = parent_name->len - name_adj;
 	}
 
 	return xfs_attr_defer_add(args);
@@ -269,6 +312,7 @@ xfs_parent_replace(
 {
 	struct xfs_da_args	*args = &new_parent->args;
 	int			old_hashlen, new_hashlen;
+	int			new_name_adj;
 
 	old_hashlen = xfs_init_parent_name_rec(&new_parent->old_rec, old_dp,
 			old_name, child);
@@ -279,6 +323,8 @@ xfs_parent_replace(
 	if (new_hashlen < 0)
 		return new_hashlen;
 
+	new_name_adj = xfs_parent_valuelen_adj(new_hashlen);
+
 	new_parent->args.name = (const uint8_t *)&new_parent->old_rec;
 	new_parent->args.namelen = xfs_parent_name_rec_sizeof(old_hashlen);
 	new_parent->args.new_name = (const uint8_t *)&new_parent->rec;
@@ -286,8 +332,8 @@ xfs_parent_replace(
 	args->trans = tp;
 	args->dp = child;
 
-	new_parent->args.value = (void *)new_name->name;
-	new_parent->args.valuelen = new_name->len;
+	new_parent->args.value = (void *)new_name->name + new_name_adj;
+	new_parent->args.valuelen = new_name->len - new_name_adj;
 
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 	return xfs_attr_defer_replace(args);
@@ -331,10 +377,13 @@ xfs_parent_lookup(
 	struct xfs_parent_scratch	*scr)
 {
 	int				reclen;
+	int				name_adj;
 	int				error;
 
 	xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
 
+	name_adj = xfs_parent_valuelen_adj(pptr->hashlen);
+
 	memset(&scr->args, 0, sizeof(struct xfs_da_args));
 	scr->args.attr_filter	= XFS_ATTR_PARENT;
 	scr->args.dp		= ip;
@@ -343,8 +392,8 @@ xfs_parent_lookup(
 	scr->args.namelen	= reclen;
 	scr->args.op_flags	= XFS_DA_OP_OKNOENT;
 	scr->args.trans		= tp;
-	scr->args.valuelen	= namelen;
-	scr->args.value		= name;
+	scr->args.valuelen	= namelen - name_adj;
+	scr->args.value		= name + name_adj;
 	scr->args.whichfork	= XFS_ATTR_FORK;
 
 	scr->args.hashval = xfs_da_hashname(scr->args.name, scr->args.namelen);
@@ -353,7 +402,8 @@ xfs_parent_lookup(
 	if (error)
 		return error;
 
-	return scr->args.valuelen;
+	memcpy(name, pptr->p_namehash, name_adj);
+	return scr->args.valuelen + name_adj;
 }
 
 /*
@@ -369,17 +419,20 @@ xfs_parent_set(
 	struct xfs_parent_scratch	*scr)
 {
 	int				reclen;
+	int				name_adj;
 
 	xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
 
+	name_adj = xfs_parent_valuelen_adj(pptr->hashlen);
+
 	memset(&scr->args, 0, sizeof(struct xfs_da_args));
 	scr->args.attr_filter	= XFS_ATTR_PARENT;
 	scr->args.dp		= ip;
 	scr->args.geo		= ip->i_mount->m_attr_geo;
 	scr->args.name		= (const unsigned char *)&scr->rec;
 	scr->args.namelen	= reclen;
-	scr->args.valuelen	= pptr->p_namelen;
-	scr->args.value		= (void *)pptr->p_name;
+	scr->args.valuelen	= pptr->p_namelen - name_adj;
+	scr->args.value		= (void *)pptr->p_name + name_adj;
 	scr->args.whichfork	= XFS_ATTR_FORK;
 
 	return xfs_attr_set(&scr->args);
@@ -430,12 +483,16 @@ xfs_parent_namehash(
 	ASSERT(SHA512_DIGEST_SIZE ==
 			crypto_shash_digestsize(ip->i_mount->m_sha512));
 
-	if (namehash_len != SHA512_DIGEST_SIZE) {
+	if (namehash_len != XFS_PARENT_NAME_MAX_HASH_SIZE) {
 		ASSERT(0);
 		return -EINVAL;
 	}
 
-	if (name->len < namehash_len) {
+	if (name->len < XFS_PARENT_NAME_MAX_HASH_SIZE) {
+		/*
+		 * If the dirent name is shorter than the size of the namehash
+		 * field, write it directly into the namehash field.
+		 */
 		memcpy(namehash, name->name, name->len);
 		memset(namehash + name->len, 0, namehash_len - name->len);
 		return name->len;
@@ -453,11 +510,17 @@ xfs_parent_namehash(
 	if (error)
 		goto out;
 
-	error = sha512_done(&shash, namehash);
+	/*
+	 * The sha512 hash of the child gen and dirent name is placed at the
+	 * end of the namehash, and as many bytes as will fit are copied from
+	 * the dirent name to the start of the namehash.
+	 */
+	error = sha512_done(&shash, namehash + XFS_PARENT_NAME_SHA512_OFFSET);
 	if (error)
 		goto out;
 
-	error = SHA512_DIGEST_SIZE;
+	memcpy(namehash, name->name, XFS_PARENT_NAME_SHA512_OFFSET);
+	error = XFS_PARENT_NAME_MAX_HASH_SIZE;
 out:
 	sha512_erase(&shash);
 	return error;
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 3431aac75e92..6f6136165efe 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -12,8 +12,8 @@ extern struct kmem_cache	*xfs_parent_intent_cache;
 bool xfs_parent_namecheck(struct xfs_mount *mp,
 		const struct xfs_parent_name_rec *rec, size_t reclen,
 		unsigned int attr_flags);
-bool xfs_parent_valuecheck(struct xfs_mount *mp, const void *value,
-		size_t valuelen);
+bool xfs_parent_valuecheck(struct xfs_mount *mp, size_t namelen,
+		const void *value, size_t valuelen);
 
 /*
  * Incore version of a parent pointer, also contains dirent name so callers
@@ -24,7 +24,7 @@ struct xfs_parent_name_irec {
 	xfs_ino_t		p_ino;
 	uint32_t		p_gen;
 	uint8_t			hashlen;
-	uint8_t			p_namehash[XFS_PARENT_NAME_HASH_SIZE];
+	uint8_t			p_namehash[XFS_PARENT_NAME_MAX_HASH_SIZE];
 
 	/* Attributes of a parent pointer. */
 	uint8_t			p_namelen;
diff --git a/fs/xfs/scrub/dir_repair.c b/fs/xfs/scrub/dir_repair.c
index b12548787321..76953575f0b2 100644
--- a/fs/xfs/scrub/dir_repair.c
+++ b/fs/xfs/scrub/dir_repair.c
@@ -537,7 +537,7 @@ xrep_dir_scan_parent_pointer(
 
 	/* Does the ondisk parent pointer structure make sense? */
 	if (!xfs_parent_namecheck(sc->mp, rec, namelen, attr_flags) ||
-	    !xfs_parent_valuecheck(sc->mp, value, valuelen))
+	    !xfs_parent_valuecheck(sc->mp, namelen, value, valuelen))
 		return -EFSCORRUPTED;
 
 	xfs_parent_irec_from_disk(&rd->pptr, rec, namelen, value, valuelen);
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index b47f0bcef690..f3b1d7cbe415 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -350,7 +350,7 @@ struct xchk_pptrs {
 	struct xfs_parent_scratch pptr_scratch;
 
 	/* Name hashes */
-	uint8_t			child_namehash[XFS_PARENT_NAME_HASH_SIZE];
+	uint8_t			child_namehash[XFS_PARENT_NAME_MAX_HASH_SIZE];
 
 	/* Name buffer for revalidation. */
 	uint8_t			namebuf[MAXNAMELEN];
@@ -548,7 +548,7 @@ xchk_parent_scan_attr(
 		return -ECANCELED;
 	}
 
-	if (!xfs_parent_valuecheck(sc->mp, value, valuelen)) {
+	if (!xfs_parent_valuecheck(sc->mp, namelen, value, valuelen)) {
 		xchk_fblock_set_corrupt(sc, XFS_ATTR_FORK, 0);
 		return -ECANCELED;
 	}
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index da807f286a09..792c01a49749 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -769,7 +769,7 @@ xlog_recover_attri_commit_pass2(
 	switch (op) {
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
-		if (item->ri_total != 3) {
+		if (item->ri_total != 3 && item->ri_total != 2) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					     attri_formatp, len);
 			return -EFSCORRUPTED;
@@ -783,7 +783,7 @@ xlog_recover_attri_commit_pass2(
 		}
 		break;
 	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
-		if (item->ri_total != 4) {
+		if (item->ri_total != 3 && item->ri_total != 4) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					     attri_formatp, len);
 			return -EFSCORRUPTED;
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index 24361ae0fd48..5f32dea26221 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -117,7 +117,9 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, name,		3);
 	XFS_CHECK_STRUCT_SIZE(xfs_dir2_sf_hdr_t,		10);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_name_rec,	12);
-	BUILD_BUG_ON(XFS_PARENT_NAME_HASH_SIZE != SHA512_DIGEST_SIZE);
+	BUILD_BUG_ON(XFS_PARENT_NAME_MAX_HASH_SIZE < SHA512_DIGEST_SIZE);
+	BUILD_BUG_ON(XFS_PARENT_NAME_MAX_HASH_SIZE !=           243);
+	BUILD_BUG_ON(XFS_PARENT_NAME_SHA512_OFFSET !=           179);
 
 	/* log structures */
 	XFS_CHECK_STRUCT_SIZE(struct xfs_buf_log_format,	88);



* [PATCH 1/3] xfs: rename xfs_pptr_info to xfs_getparents
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/3] xfs: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
@ 2023-02-16 20:52   ` Darrick J. Wong
  2023-02-16 20:53   ` [PATCH 2/3] xfs: rename xfs_parent_ptr Darrick J. Wong
  2023-02-16 20:53   ` [PATCH 3/3] xfs: convert GETPARENTS structures to flex arrays Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:52 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Rename the head structure of the parent pointer ioctl to match the name
of the ioctl (XFS_IOC_GETPARENTS).

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_fs.h    |   51 +++++++++++++++++++++++----------------------
 fs/xfs/xfs_ioctl.c        |   34 +++++++++++++++---------------
 fs/xfs/xfs_ondisk.h       |    2 +-
 fs/xfs/xfs_parent_utils.c |   20 +++++++++---------
 fs/xfs/xfs_parent_utils.h |    2 +-
 fs/xfs/xfs_trace.h        |   14 ++++++------
 6 files changed, 62 insertions(+), 61 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index c65345d2ba7a..2a23c010a0a0 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -752,19 +752,20 @@ struct xfs_scrub_metadata {
 				 XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED)
 #define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
 
-#define XFS_PPTR_MAXNAMELEN				256
+#define XFS_GETPARENTS_MAXNAMELEN	256
 
 /* return parents of the handle, not the open fd */
-#define XFS_PPTR_IFLAG_HANDLE  (1U << 0)
+#define XFS_GETPARENTS_IFLAG_HANDLE	(1U << 0)
 
 /* target was the root directory */
-#define XFS_PPTR_OFLAG_ROOT    (1U << 1)
+#define XFS_GETPARENTS_OFLAG_ROOT	(1U << 1)
 
 /* Cursor is done iterating pptrs */
-#define XFS_PPTR_OFLAG_DONE    (1U << 2)
+#define XFS_GETPARENTS_OFLAG_DONE	(1U << 2)
 
- #define XFS_PPTR_FLAG_ALL     (XFS_PPTR_IFLAG_HANDLE | XFS_PPTR_OFLAG_ROOT | \
-				XFS_PPTR_OFLAG_DONE)
+#define XFS_GETPARENTS_FLAG_ALL		(XFS_GETPARENTS_IFLAG_HANDLE | \
+					 XFS_GETPARENTS_OFLAG_ROOT | \
+					 XFS_GETPARENTS_OFLAG_DONE)
 
 /* Get an inode parent pointer through ioctl */
 struct xfs_parent_ptr {
@@ -772,57 +773,57 @@ struct xfs_parent_ptr {
 	__u32		xpp_gen;			/* Inode generation */
 	__u32		xpp_rsvd;			/* Reserved */
 	__u64		xpp_rsvd2;			/* Reserved */
-	__u8		xpp_name[XFS_PPTR_MAXNAMELEN];	/* File name */
+	__u8		xpp_name[XFS_GETPARENTS_MAXNAMELEN];	/* File name */
 };
 
 /* Iterate through an inodes parent pointers */
-struct xfs_pptr_info {
-	/* File handle, if XFS_PPTR_IFLAG_HANDLE is set */
-	struct xfs_handle		pi_handle;
+struct xfs_getparents {
+	/* File handle, if XFS_GETPARENTS_IFLAG_HANDLE is set */
+	struct xfs_handle		gp_handle;
 
 	/*
 	 * Structure to track progress in iterating the parent pointers.
 	 * Must be initialized to zeroes before the first ioctl call, and
 	 * not touched by callers after that.
 	 */
-	struct xfs_attrlist_cursor	pi_cursor;
+	struct xfs_attrlist_cursor	gp_cursor;
 
-	/* Operational flags: XFS_PPTR_*FLAG* */
-	__u32				pi_flags;
+	/* Operational flags: XFS_GETPARENTS_*FLAG* */
+	__u32				gp_flags;
 
 	/* Must be set to zero */
-	__u32				pi_reserved;
+	__u32				gp_reserved;
 
 	/* # of entries in array */
-	__u32				pi_ptrs_size;
+	__u32				gp_ptrs_size;
 
 	/* # of entries filled in (output) */
-	__u32				pi_ptrs_used;
+	__u32				gp_ptrs_used;
 
 	/* Must be set to zero */
-	__u64				pi_reserved2[6];
+	__u64				gp_reserved2[6];
 
 	/*
 	 * An array of struct xfs_parent_ptr follows the header
-	 * information. Use xfs_ppinfo_to_pp() to access the
+	 * information. Use xfs_getparents_rec() to access the
 	 * parent pointer array entries.
 	 */
-	struct xfs_parent_ptr		pi_parents[];
+	struct xfs_parent_ptr		gp_parents[];
 };
 
 static inline size_t
-xfs_pptr_info_sizeof(int nr_ptrs)
+xfs_getparents_sizeof(int nr_ptrs)
 {
-	return sizeof(struct xfs_pptr_info) +
+	return sizeof(struct xfs_getparents) +
 	       (nr_ptrs * sizeof(struct xfs_parent_ptr));
 }
 
 static inline struct xfs_parent_ptr*
-xfs_ppinfo_to_pp(
-	struct xfs_pptr_info	*info,
-	int			idx)
+xfs_getparents_rec(
+	struct xfs_getparents	*info,
+	unsigned int		idx)
 {
-	return &info->pi_parents[idx];
+	return &info->gp_parents[idx];
 }
 
 /*
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 4c36ddd19dbd..2687e9965310 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1679,12 +1679,12 @@ xfs_ioc_scrub_metadata(
 
 /*
  * IOCTL routine to get the parent pointers of an inode and return it to user
- * space.  Caller must pass a buffer space containing a struct xfs_pptr_info,
+ * space.  Caller must pass a buffer space containing a struct xfs_getparents,
  * followed by a region large enough to contain an array of struct
- * xfs_parent_ptr of a size specified in pi_ptrs_size.  If the inode contains
+ * xfs_parent_ptr of a size specified in gp_ptrs_size.  If the inode contains
  * more parent pointers than can fit in the buffer space, caller may re-call
- * the function using the returned pi_cursor to resume iteration.  The
- * number of xfs_parent_ptr returned will be stored in pi_ptrs_used.
+ * the function using the returned gp_cursor to resume iteration.  The
+ * number of xfs_parent_ptr returned will be stored in gp_ptrs_used.
  *
  * Returns 0 on success or non-zero on failure
  */
@@ -1693,7 +1693,7 @@ xfs_ioc_get_parent_pointer(
 	struct file			*filp,
 	void				__user *arg)
 {
-	struct xfs_pptr_info		*ppi = NULL;
+	struct xfs_getparents		*ppi = NULL;
 	int				error = 0;
 	struct xfs_inode		*file_ip = XFS_I(file_inode(filp));
 	struct xfs_inode		*call_ip = file_ip;
@@ -1702,42 +1702,42 @@ xfs_ioc_get_parent_pointer(
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
 
-	/* Allocate an xfs_pptr_info to put the user data */
-	ppi = kvmalloc(sizeof(struct xfs_pptr_info), GFP_KERNEL);
+	/* Allocate an xfs_getparents to put the user data */
+	ppi = kvmalloc(sizeof(struct xfs_getparents), GFP_KERNEL);
 	if (!ppi)
 		return -ENOMEM;
 
 	/* Copy the data from the user */
-	error = copy_from_user(ppi, arg, sizeof(struct xfs_pptr_info));
+	error = copy_from_user(ppi, arg, sizeof(struct xfs_getparents));
 	if (error) {
 		error = -EFAULT;
 		goto out;
 	}
 
 	/* Check size of buffer requested by user */
-	if (xfs_pptr_info_sizeof(ppi->pi_ptrs_size) > XFS_XATTR_LIST_MAX) {
+	if (xfs_getparents_sizeof(ppi->gp_ptrs_size) > XFS_XATTR_LIST_MAX) {
 		error = -ENOMEM;
 		goto out;
 	}
 
-	if (ppi->pi_flags & ~XFS_PPTR_FLAG_ALL) {
+	if (ppi->gp_flags & ~XFS_GETPARENTS_FLAG_ALL) {
 		error = -EINVAL;
 		goto out;
 	}
-	ppi->pi_flags &= ~(XFS_PPTR_OFLAG_ROOT | XFS_PPTR_OFLAG_DONE);
+	ppi->gp_flags &= ~(XFS_GETPARENTS_OFLAG_ROOT | XFS_GETPARENTS_OFLAG_DONE);
 
 	/*
 	 * Now that we know how big the trailing buffer is, expand
-	 * our kernel xfs_pptr_info to be the same size
+	 * our kernel xfs_getparents to be the same size
 	 */
-	ppi = kvrealloc(ppi, sizeof(struct xfs_pptr_info),
-			xfs_pptr_info_sizeof(ppi->pi_ptrs_size),
+	ppi = kvrealloc(ppi, sizeof(struct xfs_getparents),
+			xfs_getparents_sizeof(ppi->gp_ptrs_size),
 			GFP_KERNEL | __GFP_ZERO);
 	if (!ppi)
 		return -ENOMEM;
 
-	if (ppi->pi_flags & XFS_PPTR_IFLAG_HANDLE) {
-		struct xfs_handle	*hanp = &ppi->pi_handle;
+	if (ppi->gp_flags & XFS_GETPARENTS_IFLAG_HANDLE) {
+		struct xfs_handle	*hanp = &ppi->gp_handle;
 
 		if (memcmp(&hanp->ha_fsid, mp->m_fixedfsid,
 							sizeof(xfs_fsid_t))) {
@@ -1765,7 +1765,7 @@ xfs_ioc_get_parent_pointer(
 
 	/* Copy the parent pointers back to the user */
 	error = copy_to_user(arg, ppi,
-			xfs_pptr_info_sizeof(ppi->pi_ptrs_size));
+			xfs_getparents_sizeof(ppi->gp_ptrs_size));
 	if (error) {
 		error = -EFAULT;
 		goto out;
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index 5f32dea26221..ba85dec53b0f 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -158,7 +158,7 @@ xfs_check_ondisk_structs(void)
 
 	/* parent pointer ioctls */
 	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_ptr,            280);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_pptr_info,             104);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents,		104);
 
 	/*
 	 * The v5 superblock format extended several v4 header structures with
diff --git a/fs/xfs/xfs_parent_utils.c b/fs/xfs/xfs_parent_utils.c
index 284ca3c14a0f..d10d04a8a3c4 100644
--- a/fs/xfs/xfs_parent_utils.c
+++ b/fs/xfs/xfs_parent_utils.c
@@ -26,7 +26,7 @@
 struct xfs_getparent_ctx {
 	struct xfs_attr_list_context	context;
 	struct xfs_parent_name_irec	pptr_irec;
-	struct xfs_pptr_info		*ppi;
+	struct xfs_getparents		*ppi;
 };
 
 static void
@@ -39,7 +39,7 @@ xfs_getparent_listent(
 	int				valuelen)
 {
 	struct xfs_getparent_ctx	*gp;
-	struct xfs_pptr_info		*ppi;
+	struct xfs_getparents		*ppi;
 	struct xfs_parent_ptr		*pptr;
 	struct xfs_parent_name_irec	*irec;
 	struct xfs_mount		*mp = context->dp->i_mount;
@@ -69,7 +69,7 @@ xfs_getparent_listent(
 	 * to the caller that we did /not/ reach the end of the parent pointer
 	 * recordset.
 	 */
-	if (ppi->pi_ptrs_used >= ppi->pi_ptrs_size) {
+	if (ppi->gp_ptrs_used >= ppi->gp_ptrs_size) {
 		context->seen_enough = 1;
 		return;
 	}
@@ -80,7 +80,7 @@ xfs_getparent_listent(
 	trace_xfs_getparent_listent(context->dp, ppi, irec);
 
 	/* Format the parent pointer directly into the caller buffer. */
-	pptr = &ppi->pi_parents[ppi->pi_ptrs_used++];
+	pptr = &ppi->gp_parents[ppi->gp_ptrs_used++];
 	pptr->xpp_ino = irec->p_ino;
 	pptr->xpp_gen = irec->p_gen;
 	pptr->xpp_rsvd2 = 0;
@@ -95,7 +95,7 @@ xfs_getparent_listent(
 int
 xfs_getparent_pointers(
 	struct xfs_inode		*ip,
-	struct xfs_pptr_info		*ppi)
+	struct xfs_getparents		*ppi)
 {
 	struct xfs_getparent_ctx	*gp;
 	int				error;
@@ -110,9 +110,9 @@ xfs_getparent_pointers(
 	gp->context.bufsize = 1; /* always init cursor */
 
 	/* Copy the cursor provided by caller */
-	memcpy(&gp->context.cursor, &ppi->pi_cursor,
+	memcpy(&gp->context.cursor, &ppi->gp_cursor,
 			sizeof(struct xfs_attrlist_cursor));
-	ppi->pi_ptrs_used = 0;
+	ppi->gp_ptrs_used = 0;
 
 	trace_xfs_getparent_pointers(ip, ppi, &gp->context.cursor);
 
@@ -126,17 +126,17 @@ xfs_getparent_pointers(
 
 	/* Is this the root directory? */
 	if (ip->i_ino == ip->i_mount->m_sb.sb_rootino)
-		ppi->pi_flags |= XFS_PPTR_OFLAG_ROOT;
+		ppi->gp_flags |= XFS_GETPARENTS_OFLAG_ROOT;
 
 	/*
 	 * If we did not run out of buffer space, then we reached the end of
 	 * the pptr recordset, so set the DONE flag.
 	 */
 	if (gp->context.seen_enough == 0)
-		ppi->pi_flags |= XFS_PPTR_OFLAG_DONE;
+		ppi->gp_flags |= XFS_GETPARENTS_OFLAG_DONE;
 
 	/* Update the caller with the current cursor position */
-	memcpy(&ppi->pi_cursor, &gp->context.cursor,
+	memcpy(&ppi->gp_cursor, &gp->context.cursor,
 			sizeof(struct xfs_attrlist_cursor));
 out_free:
 	kfree(gp);
diff --git a/fs/xfs/xfs_parent_utils.h b/fs/xfs/xfs_parent_utils.h
index 9936c74e6f96..01f127dae086 100644
--- a/fs/xfs/xfs_parent_utils.h
+++ b/fs/xfs/xfs_parent_utils.h
@@ -6,6 +6,6 @@
 #ifndef	__XFS_PARENT_UTILS_H__
 #define	__XFS_PARENT_UTILS_H__
 
-int xfs_getparent_pointers(struct xfs_inode *ip, struct xfs_pptr_info *ppi);
+int xfs_getparent_pointers(struct xfs_inode *ip, struct xfs_getparents *ppi);
 
 #endif	/* __XFS_PARENT_UTILS_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 959aff69822d..d31f47eced4c 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -74,7 +74,7 @@ struct xfs_inobt_rec_incore;
 union xfs_btree_ptr;
 struct xfs_dqtrx;
 struct xfs_icwalk;
-struct xfs_pptr_info;
+struct xfs_getparents;
 struct xfs_parent_name_irec;
 struct xfs_attrlist_cursor_kern;
 
@@ -4321,7 +4321,7 @@ TRACE_EVENT(xfs_force_shutdown,
 );
 
 TRACE_EVENT(xfs_getparent_listent,
-	TP_PROTO(struct xfs_inode *ip, const struct xfs_pptr_info *ppi,
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_getparents *ppi,
 	         const struct xfs_parent_name_irec *irec),
 	TP_ARGS(ip, ppi, irec),
 	TP_STRUCT__entry(
@@ -4337,8 +4337,8 @@ TRACE_EVENT(xfs_getparent_listent,
 	TP_fast_assign(
 		__entry->dev = ip->i_mount->m_super->s_dev;
 		__entry->ino = ip->i_ino;
-		__entry->pused = ppi->pi_ptrs_used;
-		__entry->psize = ppi->pi_ptrs_size;
+		__entry->pused = ppi->gp_ptrs_used;
+		__entry->psize = ppi->gp_ptrs_size;
 		__entry->parent_ino = irec->p_ino;
 		__entry->parent_gen = irec->p_gen;
 		__entry->namelen = irec->p_namelen;
@@ -4356,7 +4356,7 @@ TRACE_EVENT(xfs_getparent_listent,
 );
 
 TRACE_EVENT(xfs_getparent_pointers,
-	TP_PROTO(struct xfs_inode *ip, const struct xfs_pptr_info *ppi,
+	TP_PROTO(struct xfs_inode *ip, const struct xfs_getparents *ppi,
 		 const struct xfs_attrlist_cursor_kern *cur),
 	TP_ARGS(ip, ppi, cur),
 	TP_STRUCT__entry(
@@ -4372,8 +4372,8 @@ TRACE_EVENT(xfs_getparent_pointers,
 	TP_fast_assign(
 		__entry->dev = ip->i_mount->m_super->s_dev;
 		__entry->ino = ip->i_ino;
-		__entry->flags = ppi->pi_flags;
-		__entry->psize = ppi->pi_ptrs_size;
+		__entry->flags = ppi->gp_flags;
+		__entry->psize = ppi->gp_ptrs_size;
 		__entry->hashval = cur->hashval;
 		__entry->blkno = cur->blkno;
 		__entry->offset = cur->offset;



* [PATCH 2/3] xfs: rename xfs_parent_ptr
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/3] xfs: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
  2023-02-16 20:52   ` [PATCH 1/3] xfs: rename xfs_pptr_info to xfs_getparents Darrick J. Wong
@ 2023-02-16 20:53   ` Darrick J. Wong
  2023-02-16 20:53   ` [PATCH 3/3] xfs: convert GETPARENTS structures to flex arrays Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:53 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Change the name to xfs_getparents_rec so that the name matches the head
structure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_fs.h    |   22 +++++++++++-----------
 fs/xfs/xfs_ioctl.c        |    4 ++--
 fs/xfs/xfs_ondisk.h       |    2 +-
 fs/xfs/xfs_parent_utils.c |   16 ++++++++--------
 4 files changed, 22 insertions(+), 22 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 2a23c010a0a0..ec6fdf78fde7 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -768,12 +768,12 @@ struct xfs_scrub_metadata {
 					 XFS_GETPARENTS_OFLAG_DONE)
 
 /* Get an inode parent pointer through ioctl */
-struct xfs_parent_ptr {
-	__u64		xpp_ino;			/* Inode */
-	__u32		xpp_gen;			/* Inode generation */
-	__u32		xpp_rsvd;			/* Reserved */
-	__u64		xpp_rsvd2;			/* Reserved */
-	__u8		xpp_name[XFS_GETPARENTS_MAXNAMELEN];	/* File name */
+struct xfs_getparents_rec {
+	__u64		gpr_ino;			/* Inode */
+	__u32		gpr_gen;			/* Inode generation */
+	__u32		gpr_rsvd;			/* Reserved */
+	__u64		gpr_rsvd2;			/* Reserved */
+	__u8		gpr_name[XFS_GETPARENTS_MAXNAMELEN];	/* File name */
 };
 
 /* Iterate through an inodes parent pointers */
@@ -804,21 +804,21 @@ struct xfs_getparents {
 	__u64				gp_reserved2[6];
 
 	/*
-	 * An array of struct xfs_parent_ptr follows the header
+	 * An array of struct xfs_getparents_rec follows the header
 	 * information. Use xfs_getparents_rec() to access the
 	 * parent pointer array entries.
 	 */
-	struct xfs_parent_ptr		gp_parents[];
+	struct xfs_getparents_rec		gp_parents[];
 };
 
 static inline size_t
 xfs_getparents_sizeof(int nr_ptrs)
 {
 	return sizeof(struct xfs_getparents) +
-	       (nr_ptrs * sizeof(struct xfs_parent_ptr));
+	       (nr_ptrs * sizeof(struct xfs_getparents_rec));
 }
 
-static inline struct xfs_parent_ptr*
+static inline struct xfs_getparents_rec*
 xfs_getparents_rec(
 	struct xfs_getparents	*info,
 	unsigned int		idx)
@@ -871,7 +871,7 @@ xfs_getparents_rec(
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
 #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
 #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
-#define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_parent_ptr)
+#define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_getparents_rec)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 2687e9965310..b3154830ef91 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1681,10 +1681,10 @@ xfs_ioc_scrub_metadata(
  * IOCTL routine to get the parent pointers of an inode and return it to user
  * space.  Caller must pass a buffer space containing a struct xfs_getparents,
  * followed by a region large enough to contain an array of struct
- * xfs_parent_ptr of a size specified in gp_ptrs_size.  If the inode contains
+ * xfs_getparents_rec of a size specified in gp_ptrs_size.  If the inode contains
  * more parent pointers than can fit in the buffer space, caller may re-call
  * the function using the returned gp_cursor to resume iteration.  The
- * number of xfs_parent_ptr returned will be stored in gp_ptrs_used.
+ * number of xfs_getparents_rec returned will be stored in gp_ptrs_used.
  *
  * Returns 0 on success or non-zero on failure
  */
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index ba85dec53b0f..38d8113b832d 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -157,7 +157,7 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_OFFSET(struct xfs_efi_log_format_64, efi_extents,	16);
 
 	/* parent pointer ioctls */
-	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_ptr,            280);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents_rec,	280);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents,		104);
 
 	/*
diff --git a/fs/xfs/xfs_parent_utils.c b/fs/xfs/xfs_parent_utils.c
index d10d04a8a3c4..801223d011e7 100644
--- a/fs/xfs/xfs_parent_utils.c
+++ b/fs/xfs/xfs_parent_utils.c
@@ -40,7 +40,7 @@ xfs_getparent_listent(
 {
 	struct xfs_getparent_ctx	*gp;
 	struct xfs_getparents		*ppi;
-	struct xfs_parent_ptr		*pptr;
+	struct xfs_getparents_rec	*pptr;
 	struct xfs_parent_name_irec	*irec;
 	struct xfs_mount		*mp = context->dp->i_mount;
 
@@ -81,14 +81,14 @@ xfs_getparent_listent(
 
 	/* Format the parent pointer directly into the caller buffer. */
 	pptr = &ppi->gp_parents[ppi->gp_ptrs_used++];
-	pptr->xpp_ino = irec->p_ino;
-	pptr->xpp_gen = irec->p_gen;
-	pptr->xpp_rsvd2 = 0;
-	pptr->xpp_rsvd = 0;
+	pptr->gpr_ino = irec->p_ino;
+	pptr->gpr_gen = irec->p_gen;
+	pptr->gpr_rsvd2 = 0;
+	pptr->gpr_rsvd = 0;
 
-	memcpy(pptr->xpp_name, irec->p_name, irec->p_namelen);
-	memset(pptr->xpp_name + irec->p_namelen, 0,
-			sizeof(pptr->xpp_name) - irec->p_namelen);
+	memcpy(pptr->gpr_name, irec->p_name, irec->p_namelen);
+	memset(pptr->gpr_name + irec->p_namelen, 0,
+			sizeof(pptr->gpr_name) - irec->p_namelen);
 }
 
 /* Retrieve the parent pointers for a given inode. */



* [PATCH 3/3] xfs: convert GETPARENTS structures to flex arrays
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/3] xfs: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
  2023-02-16 20:52   ` [PATCH 1/3] xfs: rename xfs_pptr_info to xfs_getparents Darrick J. Wong
  2023-02-16 20:53   ` [PATCH 2/3] xfs: rename xfs_parent_ptr Darrick J. Wong
@ 2023-02-16 20:53   ` Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:53 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The current definition of the GETPARENTS ioctl doesn't use the buffer
space terribly efficiently because each parent pointer record struct
incorporates enough space to hold the maximally sized dirent name.  Most
dirent names are much less than 255 bytes long, which means we're
wasting a lot of space.
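
To put a rough number on that waste, here is a minimal sketch.  It assumes
the 280-byte record size checked in xfs_ondisk.h and a 24-byte fixed part
(ino, gen and the two reserved fields); the demo_* helpers are hypothetical
and exist only for this illustration.

```c
#include <assert.h>

/* Sizes assumed from the structure definition and XFS_CHECK_STRUCT_SIZE. */
#define DEMO_OLD_REC_BYTES	280u	/* fixed-size xfs_getparents_rec */
#define DEMO_REC_HDR_BYTES	24u	/* ino + gen + two reserved fields */

/* Hypothetical helper: bytes of the fixed name field left unused. */
static unsigned int demo_old_rec_waste(unsigned int namelen)
{
	unsigned int name_field = DEMO_OLD_REC_BYTES - DEMO_REC_HDR_BYTES;

	assert(namelen <= 255);
	return name_field - namelen;
}

/* Hypothetical helper: how many fixed-size records fit after the header. */
static unsigned int demo_old_recs_in_buf(unsigned int bufsize,
					 unsigned int head_bytes)
{
	if (bufsize < head_bytes)
		return 0;
	return (bufsize - head_bytes) / DEMO_OLD_REC_BYTES;
}
```

Under these assumptions a 12-byte dirent name leaves 244 of the 256 name
bytes unused in every record.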

Convert the xfs_getparents_rec structure to use a flex array to store
the dirent name as a null terminated string, which allows us to pack the
information much more densely.  For this to work, augment the
xfs_getparents struct to end with a flex array of buffer offsets to each
xfs_getparents_rec object, much as we do for the attrlist multi ioctl.
Record objects are allocated from the end of the buffer towards the
head.

Finally, reduce the amount of data that we copy to userspace to the head
array containing the offsets, and however much of the buffer's end is used
for the parent records.
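
The packing scheme described above can be sketched in userspace roughly as
follows.  This is only an illustration, not the kernel code: the demo_*
names and the trimmed-down structures are hypothetical stand-ins, and the
sketch copies fields with memcpy because records are only guaranteed
4-byte alignment.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the fixed part of struct xfs_getparents_rec. */
struct demo_rec_hdr {
	uint64_t	ino;
	uint32_t	gen;
	uint32_t	rsvd;
	uint64_t	rsvd2;
};

/* Hypothetical, trimmed-down stand-in for struct xfs_getparents. */
struct demo_head {
	uint32_t	bufsize;	/* whole buffer, header included */
	uint32_t	count;		/* records filled in so far */
	uint32_t	offsets[];	/* byte offset of each record */
};

/* First byte past the offsets array when it holds nr entries. */
static unsigned int demo_arraytop(unsigned int nr)
{
	return sizeof(struct demo_head) + nr * sizeof(uint32_t);
}

/* Header + name + NUL, rounded up to a multiple of 4 bytes. */
static unsigned int demo_rec_sizeof(size_t namelen)
{
	size_t sz = sizeof(struct demo_rec_hdr) + namelen + 1;

	return (unsigned int)((sz + 3) & ~(size_t)3);
}

/* Allocate a zeroed buffer; *firstu starts at the 4-byte-aligned end. */
static struct demo_head *demo_alloc(uint32_t bufsize, unsigned int *firstu)
{
	struct demo_head *h;

	if (bufsize < sizeof(struct demo_head))
		return NULL;
	h = calloc(1, bufsize);
	if (!h)
		return NULL;
	h->bufsize = bufsize;
	*firstu = bufsize & ~3u;
	return h;
}

/* Returns 0 on success, -1 if the record and its offset slot do not fit. */
static int demo_add(struct demo_head *h, unsigned int *firstu,
		    uint64_t ino, uint32_t gen, const char *name)
{
	struct demo_rec_hdr hdr = { .ino = ino, .gen = gen };
	size_t namelen = strlen(name);
	unsigned int need = demo_rec_sizeof(namelen);
	char *rec;

	/* The record grows down from the end; the offsets grow up. */
	if (need > *firstu || *firstu - need < demo_arraytop(h->count + 1))
		return -1;

	*firstu -= need;
	h->offsets[h->count] = *firstu;
	rec = (char *)h + *firstu;
	memcpy(rec, &hdr, sizeof(hdr));
	memcpy(rec + sizeof(hdr), name, namelen + 1);
	h->count++;
	return 0;
}

/* Name of record idx, located through the offsets array. */
static const char *demo_name(const struct demo_head *h, unsigned int idx)
{
	assert(idx < h->count);
	return (const char *)h + h->offsets[idx] + sizeof(struct demo_rec_hdr);
}
```

A caller would demo_alloc() a buffer, call demo_add() until it returns -1,
and then walk offsets[0..count-1] with demo_name().  Only the bytes below
demo_arraytop(count) and the bytes from the last offset to the end of the
buffer hold data, which is why those are the only two regions that need
copying out.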

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_fs.h    |   38 ++++++++++++++------------------------
 fs/xfs/xfs_ioctl.c        |   43 ++++++++++++++++++++++++++++++++++++-------
 fs/xfs/xfs_ondisk.h       |    4 ++--
 fs/xfs/xfs_parent_utils.c |   31 ++++++++++++++++++++++---------
 fs/xfs/xfs_parent_utils.h |    9 +++++++++
 fs/xfs/xfs_trace.h        |   22 +++++++++++-----------
 6 files changed, 94 insertions(+), 53 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index ec6fdf78fde7..c8be149398a6 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -769,11 +769,11 @@ struct xfs_scrub_metadata {
 
 /* Get an inode parent pointer through ioctl */
 struct xfs_getparents_rec {
-	__u64		gpr_ino;			/* Inode */
-	__u32		gpr_gen;			/* Inode generation */
-	__u32		gpr_rsvd;			/* Reserved */
-	__u64		gpr_rsvd2;			/* Reserved */
-	__u8		gpr_name[XFS_GETPARENTS_MAXNAMELEN];	/* File name */
+	__u64		gpr_ino;	/* Inode */
+	__u32		gpr_gen;	/* Inode generation */
+	__u32		gpr_rsvd;	/* Reserved */
+	__u64		gpr_rsvd2;	/* Reserved */
+	__u8		gpr_name[];	/* File name and null terminator */
 };
 
 /* Iterate through an inodes parent pointers */
@@ -794,36 +794,26 @@ struct xfs_getparents {
 	/* Must be set to zero */
 	__u32				gp_reserved;
 
-	/* # of entries in array */
-	__u32				gp_ptrs_size;
+	/* size of the memory buffer in bytes, including this header */
+	__u32				gp_bufsize;
 
 	/* # of entries filled in (output) */
-	__u32				gp_ptrs_used;
+	__u32				gp_count;
 
 	/* Must be set to zero */
-	__u64				gp_reserved2[6];
+	__u64				gp_reserved2[5];
 
-	/*
-	 * An array of struct xfs_getparents_rec follows the header
-	 * information. Use xfs_getparents_rec() to access the
-	 * parent pointer array entries.
-	 */
-	struct xfs_getparents_rec		gp_parents[];
+	/* Byte offset of each xfs_getparents_rec object within the buffer. */
+	__u32				gp_offsets[];
 };
 
-static inline size_t
-xfs_getparents_sizeof(int nr_ptrs)
-{
-	return sizeof(struct xfs_getparents) +
-	       (nr_ptrs * sizeof(struct xfs_getparents_rec));
-}
-
 static inline struct xfs_getparents_rec*
 xfs_getparents_rec(
 	struct xfs_getparents	*info,
 	unsigned int		idx)
 {
-	return &info->gp_parents[idx];
+	return (struct xfs_getparents_rec *)((char *)info +
+					     info->gp_offsets[idx]);
 }
 
 /*
@@ -871,7 +861,7 @@ xfs_getparents_rec(
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
 #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
 #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
-#define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_getparents_rec)
+#define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_getparents)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index b3154830ef91..14138ce5100a 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1684,7 +1684,7 @@ xfs_ioc_scrub_metadata(
  * xfs_getparents_rec of a size specified in gp_ptrs_size.  If the inode contains
  * more parent pointers than can fit in the buffer space, caller may re-call
  * the function using the returned gp_cursor to resume iteration.  The
- * number of xfs_getparents_rec returned will be stored in gp_ptrs_used.
+ * number of xfs_getparents_rec returned will be stored in gp_count.
  *
  * Returns 0 on success or non-zero on failure
  */
@@ -1698,6 +1698,9 @@ xfs_ioc_get_parent_pointer(
 	struct xfs_inode		*file_ip = XFS_I(file_inode(filp));
 	struct xfs_inode		*call_ip = file_ip;
 	struct xfs_mount		*mp = file_ip->i_mount;
+	void				__user *o_pptr;
+	struct xfs_getparents_rec	*i_pptr;
+	unsigned int			bytes;
 
 	if (!capable(CAP_SYS_ADMIN))
 		return -EPERM;
@@ -1715,10 +1718,14 @@ xfs_ioc_get_parent_pointer(
 	}
 
 	/* Check size of buffer requested by user */
-	if (xfs_getparents_sizeof(ppi->gp_ptrs_size) > XFS_XATTR_LIST_MAX) {
+	if (ppi->gp_bufsize > XFS_XATTR_LIST_MAX) {
 		error = -ENOMEM;
 		goto out;
 	}
+	if (ppi->gp_bufsize < sizeof(struct xfs_getparents)) {
+		error = -EINVAL;
+		goto out;
+	}
 
 	if (ppi->gp_flags & ~XFS_GETPARENTS_FLAG_ALL) {
 		error = -EINVAL;
@@ -1730,8 +1737,7 @@ xfs_ioc_get_parent_pointer(
 	 * Now that we know how big the trailing buffer is, expand
 	 * our kernel xfs_getparents to be the same size
 	 */
-	ppi = kvrealloc(ppi, sizeof(struct xfs_getparents),
-			xfs_getparents_sizeof(ppi->gp_ptrs_size),
+	ppi = kvrealloc(ppi, sizeof(struct xfs_getparents), ppi->gp_bufsize,
 			GFP_KERNEL | __GFP_ZERO);
 	if (!ppi)
 		return -ENOMEM;
@@ -1763,9 +1769,32 @@ xfs_ioc_get_parent_pointer(
 	if (error)
 		goto out;
 
-	/* Copy the parent pointers back to the user */
-	error = copy_to_user(arg, ppi,
-			xfs_getparents_sizeof(ppi->gp_ptrs_size));
+	/*
+	 * If we ran out of buffer space before copying any parent pointers at
+	 * all, the caller's buffer was too short.  Tell userspace that, erm,
+	 * the message is too long.
+	 */
+	if (ppi->gp_count == 0 && !(ppi->gp_flags & XFS_GETPARENTS_OFLAG_DONE)) {
+		error = -EMSGSIZE;
+		goto out;
+	}
+
+	/* Copy the parent pointer head back to the user */
+	bytes = xfs_getparents_arraytop(ppi, ppi->gp_count);
+	error = copy_to_user(arg, ppi, bytes);
+	if (error) {
+		error = -EFAULT;
+		goto out;
+	}
+
+	if (ppi->gp_count == 0)
+		goto out;
+
+	/* Copy the parent pointer records back to the user. */
+	o_pptr = (__user char*)arg + ppi->gp_offsets[ppi->gp_count - 1];
+	i_pptr = xfs_getparents_rec(ppi, ppi->gp_count - 1);
+	bytes = ((char *)ppi + ppi->gp_bufsize) - (char *)i_pptr;
+	error = copy_to_user(o_pptr, i_pptr, bytes);
 	if (error) {
 		error = -EFAULT;
 		goto out;
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index 38d8113b832d..b7f29b4acac3 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -157,8 +157,8 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_OFFSET(struct xfs_efi_log_format_64, efi_extents,	16);
 
 	/* parent pointer ioctls */
-	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents_rec,	280);
-	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents,		104);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents_rec,	24);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_getparents,		96);
 
 	/*
 	 * The v5 superblock format extended several v4 header structures with
diff --git a/fs/xfs/xfs_parent_utils.c b/fs/xfs/xfs_parent_utils.c
index 801223d011e7..04e2e93a1986 100644
--- a/fs/xfs/xfs_parent_utils.c
+++ b/fs/xfs/xfs_parent_utils.c
@@ -29,6 +29,14 @@ struct xfs_getparent_ctx {
 	struct xfs_getparents		*ppi;
 };
 
+static inline unsigned int
+xfs_getparents_rec_sizeof(
+	const struct xfs_parent_name_irec	*irec)
+{
+	return round_up(sizeof(struct xfs_getparents_rec) + irec->p_namelen + 1,
+			sizeof(uint32_t));
+}
+
 static void
 xfs_getparent_listent(
 	struct xfs_attr_list_context	*context,
@@ -43,6 +51,7 @@ xfs_getparent_listent(
 	struct xfs_getparents_rec	*pptr;
 	struct xfs_parent_name_irec	*irec;
 	struct xfs_mount		*mp = context->dp->i_mount;
+	int				arraytop;
 
 	gp = container_of(context, struct xfs_getparent_ctx, context);
 	ppi = gp->ppi;
@@ -64,31 +73,34 @@ xfs_getparent_listent(
 		return;
 	}
 
+	xfs_parent_irec_from_disk(&gp->pptr_irec, (void *)name, namelen, value,
+			valuelen);
+
 	/*
 	 * We found a parent pointer, but we've filled up the buffer.  Signal
 	 * to the caller that we did /not/ reach the end of the parent pointer
 	 * recordset.
 	 */
-	if (ppi->gp_ptrs_used >= ppi->gp_ptrs_size) {
+	arraytop = xfs_getparents_arraytop(ppi, ppi->gp_count + 1);
+	context->firstu -= xfs_getparents_rec_sizeof(irec);
+	if (context->firstu < arraytop) {
 		context->seen_enough = 1;
 		return;
 	}
 
-	xfs_parent_irec_from_disk(&gp->pptr_irec, (void *)name, namelen, value,
-			valuelen);
-
 	trace_xfs_getparent_listent(context->dp, ppi, irec);
 
 	/* Format the parent pointer directly into the caller buffer. */
-	pptr = &ppi->gp_parents[ppi->gp_ptrs_used++];
+	ppi->gp_offsets[ppi->gp_count] = context->firstu;
+	pptr = xfs_getparents_rec(ppi, ppi->gp_count);
 	pptr->gpr_ino = irec->p_ino;
 	pptr->gpr_gen = irec->p_gen;
 	pptr->gpr_rsvd2 = 0;
 	pptr->gpr_rsvd = 0;
 
 	memcpy(pptr->gpr_name, irec->p_name, irec->p_namelen);
-	memset(pptr->gpr_name + irec->p_namelen, 0,
-			sizeof(pptr->gpr_name) - irec->p_namelen);
+	pptr->gpr_name[irec->p_namelen] = 0;
+	ppi->gp_count++;
 }
 
 /* Retrieve the parent pointers for a given inode. */
@@ -107,12 +119,13 @@ xfs_getparent_pointers(
 	gp->context.dp = ip;
 	gp->context.resynch = 1;
 	gp->context.put_listent = xfs_getparent_listent;
-	gp->context.bufsize = 1; /* always init cursor */
+	gp->context.bufsize = round_down(ppi->gp_bufsize, sizeof(uint32_t));
+	gp->context.firstu = gp->context.bufsize;
 
 	/* Copy the cursor provided by caller */
 	memcpy(&gp->context.cursor, &ppi->gp_cursor,
 			sizeof(struct xfs_attrlist_cursor));
-	ppi->gp_ptrs_used = 0;
+	ppi->gp_count = 0;
 
 	trace_xfs_getparent_pointers(ip, ppi, &gp->context.cursor);
 
diff --git a/fs/xfs/xfs_parent_utils.h b/fs/xfs/xfs_parent_utils.h
index 01f127dae086..48de5b700f9c 100644
--- a/fs/xfs/xfs_parent_utils.h
+++ b/fs/xfs/xfs_parent_utils.h
@@ -6,6 +6,15 @@
 #ifndef	__XFS_PARENT_UTILS_H__
 #define	__XFS_PARENT_UTILS_H__
 
+static inline unsigned int
+xfs_getparents_arraytop(
+	const struct xfs_getparents	*ppi,
+	unsigned int			nr)
+{
+	return sizeof(struct xfs_getparents) +
+			(nr * sizeof(ppi->gp_offsets[0]));
+}
+
 int xfs_getparent_pointers(struct xfs_inode *ip, struct xfs_getparents *ppi);
 
 #endif	/* __XFS_PARENT_UTILS_H__ */
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index d31f47eced4c..f831ee910235 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -4327,8 +4327,8 @@ TRACE_EVENT(xfs_getparent_listent,
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
-		__field(unsigned int, pused)
-		__field(unsigned int, psize)
+		__field(unsigned int, count)
+		__field(unsigned int, bufsize)
 		__field(xfs_ino_t, parent_ino)
 		__field(unsigned int, parent_gen)
 		__field(unsigned int, namelen)
@@ -4337,18 +4337,18 @@ TRACE_EVENT(xfs_getparent_listent,
 	TP_fast_assign(
 		__entry->dev = ip->i_mount->m_super->s_dev;
 		__entry->ino = ip->i_ino;
-		__entry->pused = ppi->gp_ptrs_used;
-		__entry->psize = ppi->gp_ptrs_size;
+		__entry->count = ppi->gp_count;
+		__entry->bufsize = ppi->gp_bufsize;
 		__entry->parent_ino = irec->p_ino;
 		__entry->parent_gen = irec->p_gen;
 		__entry->namelen = irec->p_namelen;
 		memcpy(__get_str(name), irec->p_name, irec->p_namelen);
 	),
-	TP_printk("dev %d:%d ino 0x%llx pptr %u/%u: parent_ino 0x%llx parent_gen 0x%x name '%.*s'",
+	TP_printk("dev %d:%d ino 0x%llx bufsize %u count %u: parent_ino 0x%llx parent_gen 0x%x name '%.*s'",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
-		  __entry->pused,
-		  __entry->psize,
+		  __entry->bufsize,
+		  __entry->count,
 		  __entry->parent_ino,
 		  __entry->parent_gen,
 		  __entry->namelen,
@@ -4363,7 +4363,7 @@ TRACE_EVENT(xfs_getparent_pointers,
 		__field(dev_t, dev)
 		__field(xfs_ino_t, ino)
 		__field(unsigned int, flags)
-		__field(unsigned int, psize)
+		__field(unsigned int, bufsize)
 		__field(unsigned int, hashval)
 		__field(unsigned int, blkno)
 		__field(unsigned int, offset)
@@ -4373,17 +4373,17 @@ TRACE_EVENT(xfs_getparent_pointers,
 		__entry->dev = ip->i_mount->m_super->s_dev;
 		__entry->ino = ip->i_ino;
 		__entry->flags = ppi->gp_flags;
-		__entry->psize = ppi->gp_ptrs_size;
+		__entry->bufsize = ppi->gp_bufsize;
 		__entry->hashval = cur->hashval;
 		__entry->blkno = cur->blkno;
 		__entry->offset = cur->offset;
 		__entry->initted = cur->initted;
 	),
-	TP_printk("dev %d:%d ino 0x%llx flags 0x%x psize %u cur_init? %d hashval 0x%x blkno %u offset %u",
+	TP_printk("dev %d:%d ino 0x%llx flags 0x%x bufsize %u cur_init? %d hashval 0x%x blkno %u offset %u",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  __entry->ino,
 		  __entry->flags,
-		  __entry->psize,
+		  __entry->bufsize,
 		  __entry->initted,
 		  __entry->hashval,
 		  __entry->blkno,



* [PATCH 01/25] xfsprogs: Fix default superblock attr bits
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
@ 2023-02-16 20:53   ` Darrick J. Wong
  2023-02-16 20:54   ` [PATCH 02/25] xfsprogs: Add new name to attri/d Darrick J. Wong
                     ` (23 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:53 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Recent parent pointer testing discovered that the default attr
configuration has XFS_SB_VERSION2_ATTR2BIT enabled but
XFS_SB_VERSION_ATTRBIT disabled.  This is incorrect, since
XFS_SB_VERSION2_ATTR2BIT describes the format of the attr, whereas
XFS_SB_VERSION_ATTRBIT enables or disables attrs.  Fix this by
enabling XFS_SB_VERSION_ATTRBIT for either attr version 1 or 2.
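
The intended relationship between the two bits can be sketched like this.
It is an illustration only: struct demo_sb, the DEMO_* bit values and
demo_set_attr_features() are hypothetical stand-ins, and the assumption
that the ATTR2 bit is set only for attr version 2 is mine, not something
this patch changes.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit values; the real ones are defined in libxfs. */
#define DEMO_ATTRBIT	0x0010u	/* attrs are in use at all */
#define DEMO_ATTR2BIT	0x0008u	/* attrs use the version 2 format */

/* Hypothetical cut-down superblock with just the two feature words. */
struct demo_sb {
	uint16_t	versionnum;
	uint32_t	features2;
};

/* Set the "attrs enabled" bit for v1 and v2, the format bit for v2 only. */
static void demo_set_attr_features(struct demo_sb *sb, int attr_version)
{
	assert(attr_version >= 0 && attr_version <= 2);

	if (attr_version >= 1)
		sb->versionnum |= DEMO_ATTRBIT;
	if (attr_version == 2)
		sb->features2 |= DEMO_ATTR2BIT;
}
```

With the old "== 1" test, attr version 2 would have set the format bit
without the enable bit, which is the inconsistency described above.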

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 mkfs/xfs_mkfs.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index e219ec16..d95394a5 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -3205,7 +3205,7 @@ sb_set_features(
 		sbp->sb_versionnum |= XFS_SB_VERSION_DALIGNBIT;
 	if (fp->log_version == 2)
 		sbp->sb_versionnum |= XFS_SB_VERSION_LOGV2BIT;
-	if (fp->attr_version == 1)
+	if (fp->attr_version >= 1)
 		sbp->sb_versionnum |= XFS_SB_VERSION_ATTRBIT;
 	if (fp->nci)
 		sbp->sb_versionnum |= XFS_SB_VERSION_BORGBIT;



* [PATCH 02/25] xfsprogs: Add new name to attri/d
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
  2023-02-16 20:53   ` [PATCH 01/25] xfsprogs: Fix default superblock attr bits Darrick J. Wong
@ 2023-02-16 20:54   ` Darrick J. Wong
  2023-02-16 20:54   ` [PATCH 03/25] xfsprogs: Increase XFS_DEFER_OPS_NR_INODES to 5 Darrick J. Wong
                     ` (22 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:54 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch adds two new fields to the attri/d.  They are nname and
nnamelen.  This will be used for parent pointer updates since a
rename operation may cause the parent pointer to update both the
name and value.  So we need to carry both the new name as well as
the target name in the attri/d.

This patch also applies the necessary updates to print the new
attri/d name fields.

Source kernel commit: 7b3bde6f488372494236cb96da308b192bbe72c9

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 libxfs/xfs_attr.c       |   12 +++++++++++-
 libxfs/xfs_attr.h       |    4 ++--
 libxfs/xfs_da_btree.h   |    2 ++
 libxfs/xfs_log_format.h |    6 ++++--
 logprint/log_redo.c     |   27 ++++++++++++++++++++++-----
 5 files changed, 41 insertions(+), 10 deletions(-)


diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 2103a06b..2f619286 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -421,6 +421,12 @@ xfs_attr_complete_op(
 	args->op_flags &= ~XFS_DA_OP_REPLACE;
 	if (do_replace) {
 		args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
+		if (args->new_namelen > 0) {
+			args->name = args->new_name;
+			args->namelen = args->new_namelen;
+			args->hashval = xfs_da_hashname(args->name,
+							args->namelen);
+		}
 		return replace_state;
 	}
 	return XFS_DAS_DONE;
@@ -920,9 +926,13 @@ xfs_attr_defer_replace(
 	struct xfs_da_args	*args)
 {
 	struct xfs_attr_intent	*new;
+	int			op_flag;
 	int			error = 0;
 
-	error = xfs_attr_intent_init(args, XFS_ATTRI_OP_FLAGS_REPLACE, &new);
+	op_flag = args->new_namelen == 0 ? XFS_ATTRI_OP_FLAGS_REPLACE :
+		  XFS_ATTRI_OP_FLAGS_NVREPLACE;
+
+	error = xfs_attr_intent_init(args, op_flag, &new);
 	if (error)
 		return error;
 
diff --git a/libxfs/xfs_attr.h b/libxfs/xfs_attr.h
index 81be9b3e..3e81f3f4 100644
--- a/libxfs/xfs_attr.h
+++ b/libxfs/xfs_attr.h
@@ -510,8 +510,8 @@ struct xfs_attr_intent {
 	struct xfs_da_args		*xattri_da_args;
 
 	/*
-	 * Shared buffer containing the attr name and value so that the logging
-	 * code can share large memory buffers between log items.
+	 * Shared buffer containing the attr name, new name, and value so that
+	 * the logging code can share large memory buffers between log items.
 	 */
 	struct xfs_attri_log_nameval	*xattri_nameval;
 
diff --git a/libxfs/xfs_da_btree.h b/libxfs/xfs_da_btree.h
index ffa3df5b..a4b29827 100644
--- a/libxfs/xfs_da_btree.h
+++ b/libxfs/xfs_da_btree.h
@@ -55,7 +55,9 @@ enum xfs_dacmp {
 typedef struct xfs_da_args {
 	struct xfs_da_geometry *geo;	/* da block geometry */
 	const uint8_t		*name;		/* string (maybe not NULL terminated) */
+	const uint8_t	*new_name;	/* new attr name */
 	int		namelen;	/* length of string (maybe no NULL) */
+	int		new_namelen;	/* new attr name len */
 	uint8_t		filetype;	/* filetype of inode for directories */
 	void		*value;		/* set of bytes (maybe contain NULLs) */
 	int		valuelen;	/* length of value */
diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index f13e0809..ae9c9976 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -117,7 +117,8 @@ struct xfs_unmount_log_format {
 #define XLOG_REG_TYPE_ATTRD_FORMAT	28
 #define XLOG_REG_TYPE_ATTR_NAME	29
 #define XLOG_REG_TYPE_ATTR_VALUE	30
-#define XLOG_REG_TYPE_MAX		30
+#define XLOG_REG_TYPE_ATTR_NNAME	31
+#define XLOG_REG_TYPE_MAX		31
 
 
 /*
@@ -957,6 +958,7 @@ struct xfs_icreate_log {
 #define XFS_ATTRI_OP_FLAGS_SET		1	/* Set the attribute */
 #define XFS_ATTRI_OP_FLAGS_REMOVE	2	/* Remove the attribute */
 #define XFS_ATTRI_OP_FLAGS_REPLACE	3	/* Replace the attribute */
+#define XFS_ATTRI_OP_FLAGS_NVREPLACE	4	/* Replace attr name and val */
 #define XFS_ATTRI_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
 
 /*
@@ -974,7 +976,7 @@ struct xfs_icreate_log {
 struct xfs_attri_log_format {
 	uint16_t	alfi_type;	/* attri log item type */
 	uint16_t	alfi_size;	/* size of this item */
-	uint32_t	__pad;		/* pad to 64 bit aligned */
+	uint32_t	alfi_nname_len;	/* attr new name length */
 	uint64_t	alfi_id;	/* attri identifier */
 	uint64_t	alfi_ino;	/* the inode for this attr operation */
 	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index edf7e0fb..b596af02 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -705,9 +705,9 @@ xlog_print_trans_attri(
 	memmove((char*)src_f, *ptr, src_len);
 	*ptr += src_len;
 
-	printf(_("ATTRI:  #regs: %d	name_len: %d, value_len: %d  id: 0x%llx\n"),
-		src_f->alfi_size, src_f->alfi_name_len, src_f->alfi_value_len,
-				(unsigned long long)src_f->alfi_id);
+	printf(_("ATTRI:  #regs: %d	name_len: %d, nname_len: %d value_len: %d  id: 0x%llx\n"),
+		src_f->alfi_size, src_f->alfi_name_len, src_f->alfi_nname_len,
+		src_f->alfi_value_len, (unsigned long long)src_f->alfi_id);
 
 	if (src_f->alfi_name_len > 0) {
 		printf(_("\n"));
@@ -719,6 +719,16 @@ xlog_print_trans_attri(
 			goto error;
 	}
 
+	if (src_f->alfi_nname_len > 0) {
+		printf(_("\n"));
+		(*i)++;
+		head = (xlog_op_header_t *)*ptr;
+		xlog_print_op_header(head, *i, ptr);
+		error = xlog_print_trans_attri_name(ptr, be32_to_cpu(head->oh_len));
+		if (error)
+			goto error;
+	}
+
 	if (src_f->alfi_value_len > 0) {
 		printf(_("\n"));
 		(*i)++;
@@ -788,8 +798,8 @@ xlog_recover_print_attri(
 	if (xfs_attri_copy_log_format((char*)src_f, src_len, f))
 		goto out;
 
-	printf(_("ATTRI:  #regs: %d	name_len: %d, value_len: %d  id: 0x%llx\n"),
-		f->alfi_size, f->alfi_name_len, f->alfi_value_len, (unsigned long long)f->alfi_id);
+	printf(_("ATTRI:  #regs: %d	name_len: %d, nname_len:%d, value_len: %d  id: 0x%llx\n"),
+		f->alfi_size, f->alfi_name_len, f->alfi_nname_len, f->alfi_value_len, (unsigned long long)f->alfi_id);
 
 	if (f->alfi_name_len > 0) {
 		region++;
@@ -798,6 +808,13 @@ xlog_recover_print_attri(
 			       f->alfi_name_len);
 	}
 
+	if (f->alfi_nname_len > 0) {
+		region++;
+		printf(_("ATTRI:  nname len:%u\n"), f->alfi_nname_len);
+		print_or_dump((char *)item->ri_buf[region].i_addr,
+			       f->alfi_nname_len);
+	}
+
 	if (f->alfi_value_len > 0) {
 		int len = f->alfi_value_len;
 



* [PATCH 03/25] xfsprogs: Increase XFS_DEFER_OPS_NR_INODES to 5
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
  2023-02-16 20:53   ` [PATCH 01/25] xfsprogs: Fix default superblock attr bits Darrick J. Wong
  2023-02-16 20:54   ` [PATCH 02/25] xfsprogs: Add new name to attri/d Darrick J. Wong
@ 2023-02-16 20:54   ` Darrick J. Wong
  2023-02-16 20:54   ` [PATCH 04/25] xfsprogs: get directory offset when adding directory name Darrick J. Wong
                     ` (21 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:54 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, Catherine Hoang, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Source kernel commit: e9dc6a1e293b7e3843cd3868603801a1af2704c3

Renames that generate parent pointer updates can join up to 5
inodes locked in sorted order.  So we need to increase the
number of defer ops inodes and relock them in the same way.
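
To illustrate the relocking rule above, here is a minimal standalone
sketch, not the libxfs code.  struct demo_inode, DEMO_NR_INODES and
demo_sort_held_inodes() are hypothetical names made up for this
example, and it uses an insertion sort where the patch below uses a
bubble sort:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NR_INODES	5	/* mirrors the new XFS_DEFER_OPS_NR_INODES */

/* Hypothetical stand-in for struct xfs_inode; only the number matters. */
struct demo_inode {
	uint64_t	i_ino;
};

/*
 * Copy the held inodes into sorted[] in ascending inode-number order,
 * which is the order in which they would have to be relocked.
 * Returns the number of inodes copied.
 */
static size_t
demo_sort_held_inodes(
	struct demo_inode	*const held[],
	size_t			nr,
	struct demo_inode	*sorted[DEMO_NR_INODES])
{
	size_t			i, j;

	assert(nr <= DEMO_NR_INODES);

	for (i = 0; i < nr; i++)
		sorted[i] = held[i];

	/* Insertion sort; with at most five entries the cost is trivial. */
	for (i = 1; i < nr; i++) {
		struct demo_inode	*key = sorted[i];

		for (j = i; j > 0 && sorted[j - 1]->i_ino > key->i_ino; j--)
			sorted[j] = sorted[j - 1];
		sorted[j] = key;
	}

	return nr;
}
```

Sorting a private copy leaves the captured array untouched, so the
order in which the inodes were originally held is not disturbed.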

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
---
 libxfs/libxfs_priv.h |    1 +
 libxfs/xfs_defer.c   |   28 ++++++++++++++++++++++++++--
 libxfs/xfs_defer.h   |    8 +++++++-
 3 files changed, 34 insertions(+), 3 deletions(-)


diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 456c82d7..567bd237 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -477,6 +477,7 @@ void __xfs_buf_mark_corrupt(struct xfs_buf *bp, xfs_failaddr_t fa);
 	__mode = __mode; /* no set-but-unused warning */	\
 })
 #define xfs_lock_two_inodes(ip0,mode0,ip1,mode1)	((void) 0)
+#define xfs_lock_inodes(ips,num_ips,mode)		((void) 0)
 
 /* space allocation */
 #define XFS_EXTENT_BUSY_DISCARDED	0x01	/* undergoing a discard op. */
diff --git a/libxfs/xfs_defer.c b/libxfs/xfs_defer.c
index c4f0269d..415fcaf5 100644
--- a/libxfs/xfs_defer.c
+++ b/libxfs/xfs_defer.c
@@ -815,13 +815,37 @@ xfs_defer_ops_continue(
 	struct xfs_trans		*tp,
 	struct xfs_defer_resources	*dres)
 {
-	unsigned int			i;
+	unsigned int			i, j;
+	struct xfs_inode		*sips[XFS_DEFER_OPS_NR_INODES];
+	struct xfs_inode		*temp;
 
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
 
 	/* Lock the captured resources to the new transaction. */
-	if (dfc->dfc_held.dr_inos == 2)
+	if (dfc->dfc_held.dr_inos > 2) {
+		/*
+		 * Renames with parent pointer updates can lock up to 5 inodes,
+		 * sorted by their inode number.  So we need to make sure they
+		 * are relocked in the same way.
+		 */
+		memset(sips, 0, sizeof(sips));
+		for (i = 0; i < dfc->dfc_held.dr_inos; i++)
+			sips[i] = dfc->dfc_held.dr_ip[i];
+
+		/* Bubble sort of at most 5 inodes */
+		for (i = 0; i < dfc->dfc_held.dr_inos; i++) {
+			for (j = 1; j < dfc->dfc_held.dr_inos; j++) {
+				if (sips[j]->i_ino < sips[j-1]->i_ino) {
+					temp = sips[j];
+					sips[j] = sips[j-1];
+					sips[j-1] = temp;
+				}
+			}
+		}
+
+		xfs_lock_inodes(sips, dfc->dfc_held.dr_inos, XFS_ILOCK_EXCL);
+	} else if (dfc->dfc_held.dr_inos == 2)
 		xfs_lock_two_inodes(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL,
 				    dfc->dfc_held.dr_ip[1], XFS_ILOCK_EXCL);
 	else if (dfc->dfc_held.dr_inos == 1)
diff --git a/libxfs/xfs_defer.h b/libxfs/xfs_defer.h
index 114a3a49..fdf6941f 100644
--- a/libxfs/xfs_defer.h
+++ b/libxfs/xfs_defer.h
@@ -70,7 +70,13 @@ extern const struct xfs_defer_op_type xfs_attr_defer_type;
 /*
  * Deferred operation item relogging limits.
  */
-#define XFS_DEFER_OPS_NR_INODES	2	/* join up to two inodes */
+
+/*
+ * Rename w/ parent pointers can require up to 5 inodes with deferred ops to
+ * be joined to the transaction: src_dp, target_dp, src_ip, target_ip, and wip.
+ * These inodes are locked in sorted order by their inode numbers
+ */
+#define XFS_DEFER_OPS_NR_INODES	5
 #define XFS_DEFER_OPS_NR_BUFS	2	/* join up to two buffers */
 
 /* Resources that must be held across a transaction roll. */



* [PATCH 04/25] xfsprogs: get directory offset when adding directory name
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 20:54   ` [PATCH 03/25] xfsprogs: Increase XFS_DEFER_OPS_NR_INODES to 5 Darrick J. Wong
@ 2023-02-16 20:54   ` Darrick J. Wong
  2023-02-16 20:54   ` [PATCH 05/25] xfsprogs: get directory offset when removing " Darrick J. Wong
                     ` (20 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:54 UTC (permalink / raw)
  To: djwong
  Cc: Dave Chinner, Allison Henderson, Catherine Hoang,
	allison.henderson, linux-xfs

From: Allison Collins <allison.henderson@oracle.com>

Return the directory offset information when adding an entry to the
directory.

This offset will be used as the parent pointer offset in xfs_create,
xfs_symlink, xfs_link and xfs_rename.
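
As a rough sketch of what "directory offset" means here: the value is
a 32-bit data pointer, i.e. the byte position of the entry within the
directory's data space in units of 8 bytes.  The demo_* names and
struct demo_geo below are hypothetical, and the shift constant is an
assumption meant to mirror XFS_DIR2_DATA_ALIGN_LOG; this is not the
libxfs implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_DATA_ALIGN_LOG	3	/* assumed: entries are 8-byte aligned */

typedef uint32_t demo_dataptr_t;

/* Hypothetical geometry: log2 of the directory block size in bytes. */
struct demo_geo {
	unsigned int	blklog;
};

/* Convert a byte offset in the directory data space to a data pointer. */
static demo_dataptr_t
demo_byte_to_dataptr(uint64_t byte)
{
	return (demo_dataptr_t)(byte >> DEMO_DATA_ALIGN_LOG);
}

/* Convert (directory block number, offset in that block) to a data pointer. */
static demo_dataptr_t
demo_db_off_to_dataptr(const struct demo_geo *geo, uint32_t db, uint32_t off)
{
	uint64_t	byte = ((uint64_t)db << geo->blklog) + off;

	return demo_byte_to_dataptr(byte);
}

/*
 * Optional out-parameter pattern: callers that do not care about the
 * offset pass NULL, as the mkfs and repair callers in this patch do.
 */
static int
demo_add_entry(const struct demo_geo *geo, uint32_t db, uint32_t off,
	       demo_dataptr_t *offset)
{
	if (offset)
		*offset = demo_db_off_to_dataptr(geo, db, off);
	return 0;
}
```

With 4096-byte directory blocks, an entry at byte 64 of block 2 sits
at byte 8256 of the data space, which encodes as data pointer 1032.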

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
---
 libxfs/xfs_da_btree.h   |    1 +
 libxfs/xfs_dir2.c       |    9 +++++++--
 libxfs/xfs_dir2.h       |    2 +-
 libxfs/xfs_dir2_block.c |    1 +
 libxfs/xfs_dir2_leaf.c  |    2 ++
 libxfs/xfs_dir2_node.c  |    2 ++
 libxfs/xfs_dir2_sf.c    |    2 ++
 mkfs/proto.c            |    2 +-
 repair/phase6.c         |   16 ++++++++--------
 9 files changed, 25 insertions(+), 12 deletions(-)


diff --git a/libxfs/xfs_da_btree.h b/libxfs/xfs_da_btree.h
index a4b29827..90b86d00 100644
--- a/libxfs/xfs_da_btree.h
+++ b/libxfs/xfs_da_btree.h
@@ -81,6 +81,7 @@ typedef struct xfs_da_args {
 	int		rmtvaluelen2;	/* remote attr value length in bytes */
 	uint32_t	op_flags;	/* operation flags */
 	enum xfs_dacmp	cmpresult;	/* name compare result for lookups */
+	xfs_dir2_dataptr_t offset;	/* OUT: offset in directory */
 } xfs_da_args_t;
 
 /*
diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index d6a19296..409d74a1 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -256,7 +256,8 @@ xfs_dir_createname(
 	struct xfs_inode	*dp,
 	const struct xfs_name	*name,
 	xfs_ino_t		inum,		/* new entry inode number */
-	xfs_extlen_t		total)		/* bmap's total block count */
+	xfs_extlen_t		total,		/* bmap's total block count */
+	xfs_dir2_dataptr_t	*offset)	/* OUT entry's dir offset */
 {
 	struct xfs_da_args	*args;
 	int			rval;
@@ -311,6 +312,10 @@ xfs_dir_createname(
 		rval = xfs_dir2_node_addname(args);
 
 out_free:
+	/* return the location that this entry was place in the parent inode */
+	if (offset)
+		*offset = args->offset;
+
 	kmem_free(args);
 	return rval;
 }
@@ -549,7 +554,7 @@ xfs_dir_canenter(
 	xfs_inode_t	*dp,
 	struct xfs_name	*name)		/* name of entry to add */
 {
-	return xfs_dir_createname(tp, dp, name, 0, 0);
+	return xfs_dir_createname(tp, dp, name, 0, 0, NULL);
 }
 
 /*
diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h
index dd39f17d..d9695447 100644
--- a/libxfs/xfs_dir2.h
+++ b/libxfs/xfs_dir2.h
@@ -40,7 +40,7 @@ extern int xfs_dir_init(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_inode *pdp);
 extern int xfs_dir_createname(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t inum,
-				xfs_extlen_t tot);
+				xfs_extlen_t tot, xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t *inum,
 				struct xfs_name *ci_name);
diff --git a/libxfs/xfs_dir2_block.c b/libxfs/xfs_dir2_block.c
index bb9301b7..fb5b4179 100644
--- a/libxfs/xfs_dir2_block.c
+++ b/libxfs/xfs_dir2_block.c
@@ -570,6 +570,7 @@ xfs_dir2_block_addname(
 	xfs_dir2_data_put_ftype(dp->i_mount, dep, args->filetype);
 	tagp = xfs_dir2_data_entry_tag_p(dp->i_mount, dep);
 	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
+	args->offset = xfs_dir2_byte_to_dataptr((char *)dep - (char *)hdr);
 	/*
 	 * Clean up the bestfree array and log the header, tail, and entry.
 	 */
diff --git a/libxfs/xfs_dir2_leaf.c b/libxfs/xfs_dir2_leaf.c
index 5da66006..2dac830c 100644
--- a/libxfs/xfs_dir2_leaf.c
+++ b/libxfs/xfs_dir2_leaf.c
@@ -868,6 +868,8 @@ xfs_dir2_leaf_addname(
 	xfs_dir2_data_put_ftype(dp->i_mount, dep, args->filetype);
 	tagp = xfs_dir2_data_entry_tag_p(dp->i_mount, dep);
 	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
+	args->offset = xfs_dir2_db_off_to_dataptr(args->geo, use_block,
+						(char *)dep - (char *)hdr);
 	/*
 	 * Need to scan fix up the bestfree table.
 	 */
diff --git a/libxfs/xfs_dir2_node.c b/libxfs/xfs_dir2_node.c
index c0eb335c..45fb218f 100644
--- a/libxfs/xfs_dir2_node.c
+++ b/libxfs/xfs_dir2_node.c
@@ -1971,6 +1971,8 @@ xfs_dir2_node_addname_int(
 	xfs_dir2_data_put_ftype(dp->i_mount, dep, args->filetype);
 	tagp = xfs_dir2_data_entry_tag_p(dp->i_mount, dep);
 	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
+	args->offset = xfs_dir2_db_off_to_dataptr(args->geo, dbno,
+						  (char *)dep - (char *)hdr);
 	xfs_dir2_data_log_entry(args, dbp, dep);
 
 	/* Rescan the freespace and log the data block if needed. */
diff --git a/libxfs/xfs_dir2_sf.c b/libxfs/xfs_dir2_sf.c
index 08b36c95..a3f1e657 100644
--- a/libxfs/xfs_dir2_sf.c
+++ b/libxfs/xfs_dir2_sf.c
@@ -485,6 +485,7 @@ xfs_dir2_sf_addname_easy(
 	memcpy(sfep->name, args->name, sfep->namelen);
 	xfs_dir2_sf_put_ino(mp, sfp, sfep, args->inumber);
 	xfs_dir2_sf_put_ftype(mp, sfep, args->filetype);
+	args->offset = xfs_dir2_byte_to_dataptr(offset);
 
 	/*
 	 * Update the header and inode.
@@ -575,6 +576,7 @@ xfs_dir2_sf_addname_hard(
 	memcpy(sfep->name, args->name, sfep->namelen);
 	xfs_dir2_sf_put_ino(mp, sfp, sfep, args->inumber);
 	xfs_dir2_sf_put_ftype(mp, sfep, args->filetype);
+	args->offset = xfs_dir2_byte_to_dataptr(offset);
 	sfp->count++;
 	if (args->inumber > XFS_DIR2_MAX_SHORT_INUM && !objchange)
 		sfp->i8count++;
diff --git a/mkfs/proto.c b/mkfs/proto.c
index 68ecdbf3..6b6a070f 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -328,7 +328,7 @@ newdirent(
 
 	rsv = XFS_DIRENTER_SPACE_RES(mp, name->len);
 
-	error = -libxfs_dir_createname(tp, pip, name, inum, rsv);
+	error = -libxfs_dir_createname(tp, pip, name, inum, rsv, NULL);
 	if (error)
 		fail(_("directory createname error"), error);
 }
diff --git a/repair/phase6.c b/repair/phase6.c
index 0be2c9c9..d1e9c0d9 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -973,7 +973,7 @@ mk_orphanage(xfs_mount_t *mp)
 	/*
 	 * create the actual entry
 	 */
-	error = -libxfs_dir_createname(tp, pip, &xname, ip->i_ino, nres);
+	error = -libxfs_dir_createname(tp, pip, &xname, ip->i_ino, nres, NULL);
 	if (error)
 		do_error(
 		_("can't make %s, createname error %d\n"),
@@ -1070,7 +1070,7 @@ mv_orphanage(
 			libxfs_trans_ijoin(tp, ino_p, 0);
 
 			err = -libxfs_dir_createname(tp, orphanage_ip, &xname,
-						ino, nres);
+						ino, nres, NULL);
 			if (err)
 				do_error(
 	_("name create failed in %s (%d)\n"), ORPHANAGE, err);
@@ -1082,7 +1082,7 @@ mv_orphanage(
 			libxfs_trans_log_inode(tp, orphanage_ip, XFS_ILOG_CORE);
 
 			err = -libxfs_dir_createname(tp, ino_p, &xfs_name_dotdot,
-					orphanage_ino, nres);
+					orphanage_ino, nres, NULL);
 			if (err)
 				do_error(
 	_("creation of .. entry failed (%d)\n"), err);
@@ -1104,7 +1104,7 @@ mv_orphanage(
 
 
 			err = -libxfs_dir_createname(tp, orphanage_ip, &xname,
-						ino, nres);
+						ino, nres, NULL);
 			if (err)
 				do_error(
 	_("name create failed in %s (%d)\n"), ORPHANAGE, err);
@@ -1151,7 +1151,7 @@ mv_orphanage(
 		libxfs_trans_ijoin(tp, ino_p, 0);
 
 		err = -libxfs_dir_createname(tp, orphanage_ip, &xname, ino,
-						nres);
+						nres, NULL);
 		if (err)
 			do_error(
 	_("name create failed in %s (%d)\n"), ORPHANAGE, err);
@@ -1334,7 +1334,7 @@ longform_dir2_rebuild(
 		libxfs_trans_ijoin(tp, ip, 0);
 
 		error = -libxfs_dir_createname(tp, ip, &p->name, p->inum,
-						nres);
+						nres, NULL);
 		if (error) {
 			do_warn(
 _("name create failed in ino %" PRIu64 " (%d)\n"), ino, error);
@@ -2943,7 +2943,7 @@ _("error %d fixing shortform directory %llu\n"),
 		libxfs_trans_ijoin(tp, ip, 0);
 
 		error = -libxfs_dir_createname(tp, ip, &xfs_name_dotdot,
-					ip->i_ino, nres);
+					ip->i_ino, nres, NULL);
 		if (error)
 			do_error(
 	_("can't make \"..\" entry in root inode %" PRIu64 ", createname error %d\n"), ino, error);
@@ -2998,7 +2998,7 @@ _("error %d fixing shortform directory %llu\n"),
 			libxfs_trans_ijoin(tp, ip, 0);
 
 			error = -libxfs_dir_createname(tp, ip, &xfs_name_dot,
-					ip->i_ino, nres);
+					ip->i_ino, nres, NULL);
 			if (error)
 				do_error(
 	_("can't make \".\" entry in dir ino %" PRIu64 ", createname error %d\n"),



* [PATCH 05/25] xfsprogs: get directory offset when removing directory name
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (3 preceding siblings ...)
  2023-02-16 20:54   ` [PATCH 04/25] xfsprogs: get directory offset when adding directory name Darrick J. Wong
@ 2023-02-16 20:54   ` Darrick J. Wong
  2023-02-16 20:55   ` [PATCH 06/25] xfsprogs: get directory offset when replacing a " Darrick J. Wong
                     ` (19 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:54 UTC (permalink / raw)
  To: djwong
  Cc: Mark Tinguely, Dave Chinner, Allison Henderson, Catherine Hoang,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Return the directory offset information when removing an entry from the
directory.

This offset will be used as the parent pointer offset in xfs_remove.
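
On the remove side the leaf entry already stores the data pointer, so
the code only has to split it back into a block number and an offset
within that block.  A minimal sketch of that decoding follows; the
demo_* names and struct demo_geo are hypothetical, and the 8-byte
granularity is an assumption meant to mirror XFS_DIR2_DATA_ALIGN_LOG:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_DATA_ALIGN_LOG	3	/* assumed: entries are 8-byte aligned */

typedef uint32_t demo_dataptr_t;

/* Hypothetical geometry: log2 of the directory block size in bytes. */
struct demo_geo {
	unsigned int	blklog;
};

/* Expand a data pointer back into a byte offset in the data space. */
static uint64_t
demo_dataptr_to_byte(demo_dataptr_t dp)
{
	return (uint64_t)dp << DEMO_DATA_ALIGN_LOG;
}

/* Directory block number that the data pointer falls in. */
static uint32_t
demo_dataptr_to_db(const struct demo_geo *geo, demo_dataptr_t dp)
{
	return (uint32_t)(demo_dataptr_to_byte(dp) >> geo->blklog);
}

/* Byte offset of the entry inside its directory block. */
static uint32_t
demo_dataptr_to_off(const struct demo_geo *geo, demo_dataptr_t dp)
{
	uint64_t	mask = ((uint64_t)1 << geo->blklog) - 1;

	return (uint32_t)(demo_dataptr_to_byte(dp) & mask);
}
```

With 4096-byte directory blocks, data pointer 1032 decodes to block 2,
offset 64.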

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Catherine Hoang <catherine.hoang@oracle.com>
---
 libxfs/xfs_dir2.c       |    6 +++++-
 libxfs/xfs_dir2.h       |    3 ++-
 libxfs/xfs_dir2_block.c |    4 ++--
 libxfs/xfs_dir2_leaf.c  |    5 +++--
 libxfs/xfs_dir2_node.c  |    5 +++--
 libxfs/xfs_dir2_sf.c    |    2 ++
 6 files changed, 17 insertions(+), 8 deletions(-)


diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 409d74a1..6aa1db0e 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -435,7 +435,8 @@ xfs_dir_removename(
 	struct xfs_inode	*dp,
 	struct xfs_name		*name,
 	xfs_ino_t		ino,
-	xfs_extlen_t		total)		/* bmap's total block count */
+	xfs_extlen_t		total,		/* bmap's total block count */
+	xfs_dir2_dataptr_t	*offset)	/* OUT: offset in directory */
 {
 	struct xfs_da_args	*args;
 	int			rval;
@@ -480,6 +481,9 @@ xfs_dir_removename(
 	else
 		rval = xfs_dir2_node_removename(args);
 out_free:
+	if (offset)
+		*offset = args->offset;
+
 	kmem_free(args);
 	return rval;
 }
diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h
index d9695447..0c2d7c0a 100644
--- a/libxfs/xfs_dir2.h
+++ b/libxfs/xfs_dir2.h
@@ -46,7 +46,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *ci_name);
 extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t ino,
-				xfs_extlen_t tot);
+				xfs_extlen_t tot,
+				xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t inum,
 				xfs_extlen_t tot);
diff --git a/libxfs/xfs_dir2_block.c b/libxfs/xfs_dir2_block.c
index fb5b4179..43b9c18f 100644
--- a/libxfs/xfs_dir2_block.c
+++ b/libxfs/xfs_dir2_block.c
@@ -807,9 +807,9 @@ xfs_dir2_block_removename(
 	/*
 	 * Point to the data entry using the leaf entry.
 	 */
+	args->offset = be32_to_cpu(blp[ent].address);
 	dep = (xfs_dir2_data_entry_t *)((char *)hdr +
-			xfs_dir2_dataptr_to_off(args->geo,
-						be32_to_cpu(blp[ent].address)));
+			xfs_dir2_dataptr_to_off(args->geo, args->offset));
 	/*
 	 * Mark the data entry's space free.
 	 */
diff --git a/libxfs/xfs_dir2_leaf.c b/libxfs/xfs_dir2_leaf.c
index 2dac830c..ee9cfbe9 100644
--- a/libxfs/xfs_dir2_leaf.c
+++ b/libxfs/xfs_dir2_leaf.c
@@ -1384,9 +1384,10 @@ xfs_dir2_leaf_removename(
 	 * Point to the leaf entry, use that to point to the data entry.
 	 */
 	lep = &leafhdr.ents[index];
-	db = xfs_dir2_dataptr_to_db(geo, be32_to_cpu(lep->address));
+	args->offset = be32_to_cpu(lep->address);
+	db = xfs_dir2_dataptr_to_db(args->geo, args->offset);
 	dep = (xfs_dir2_data_entry_t *)((char *)hdr +
-		xfs_dir2_dataptr_to_off(geo, be32_to_cpu(lep->address)));
+		xfs_dir2_dataptr_to_off(args->geo, args->offset));
 	needscan = needlog = 0;
 	oldbest = be16_to_cpu(bf[0].length);
 	ltp = xfs_dir2_leaf_tail_p(geo, leaf);
diff --git a/libxfs/xfs_dir2_node.c b/libxfs/xfs_dir2_node.c
index 45fb218f..ac6a7089 100644
--- a/libxfs/xfs_dir2_node.c
+++ b/libxfs/xfs_dir2_node.c
@@ -1293,9 +1293,10 @@ xfs_dir2_leafn_remove(
 	/*
 	 * Extract the data block and offset from the entry.
 	 */
-	db = xfs_dir2_dataptr_to_db(geo, be32_to_cpu(lep->address));
+	args->offset = be32_to_cpu(lep->address);
+	db = xfs_dir2_dataptr_to_db(args->geo, args->offset);
 	ASSERT(dblk->blkno == db);
-	off = xfs_dir2_dataptr_to_off(geo, be32_to_cpu(lep->address));
+	off = xfs_dir2_dataptr_to_off(args->geo, args->offset);
 	ASSERT(dblk->index == off);
 
 	/*
diff --git a/libxfs/xfs_dir2_sf.c b/libxfs/xfs_dir2_sf.c
index a3f1e657..4dc74734 100644
--- a/libxfs/xfs_dir2_sf.c
+++ b/libxfs/xfs_dir2_sf.c
@@ -969,6 +969,8 @@ xfs_dir2_sf_removename(
 								XFS_CMP_EXACT) {
 			ASSERT(xfs_dir2_sf_get_ino(mp, sfp, sfep) ==
 			       args->inumber);
+			args->offset = xfs_dir2_byte_to_dataptr(
+						xfs_dir2_sf_get_offset(sfep));
 			break;
 		}
 	}



* [PATCH 06/25] xfsprogs: get directory offset when replacing a directory name
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (4 preceding siblings ...)
  2023-02-16 20:54   ` [PATCH 05/25] xfsprogs: get directory offset when removing " Darrick J. Wong
@ 2023-02-16 20:55   ` Darrick J. Wong
  2023-02-16 20:55   ` [PATCH 07/25] xfsprogs: add parent pointer support to attribute code Darrick J. Wong
                     ` (18 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:55 UTC (permalink / raw)
  To: djwong
  Cc: Mark Tinguely, Dave Chinner, Allison Henderson,
	allison.henderson, linux-xfs

From: Allison Collins <allison.henderson@oracle.com>

Return the directory offset information when replacing an entry in the
directory.

This offset will be used as the parent pointer offset in xfs_rename.

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_dir2.c       |    8 ++++++--
 libxfs/xfs_dir2.h       |    2 +-
 libxfs/xfs_dir2_block.c |    4 ++--
 libxfs/xfs_dir2_leaf.c  |    1 +
 libxfs/xfs_dir2_node.c  |    1 +
 libxfs/xfs_dir2_sf.c    |    2 ++
 repair/phase6.c         |    2 +-
 7 files changed, 14 insertions(+), 6 deletions(-)


diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 6aa1db0e..43b4e46b 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -481,7 +481,7 @@ xfs_dir_removename(
 	else
 		rval = xfs_dir2_node_removename(args);
 out_free:
-	if (offset)
+	if (!rval && offset)
 		*offset = args->offset;
 
 	kmem_free(args);
@@ -497,7 +497,8 @@ xfs_dir_replace(
 	struct xfs_inode	*dp,
 	const struct xfs_name	*name,		/* name of entry to replace */
 	xfs_ino_t		inum,		/* new inode number */
-	xfs_extlen_t		total)		/* bmap's total block count */
+	xfs_extlen_t		total,		/* bmap's total block count */
+	xfs_dir2_dataptr_t	*offset)	/* OUT: offset in directory */
 {
 	struct xfs_da_args	*args;
 	int			rval;
@@ -545,6 +546,9 @@ xfs_dir_replace(
 	else
 		rval = xfs_dir2_node_replace(args);
 out_free:
+	if (offset)
+		*offset = args->offset;
+
 	kmem_free(args);
 	return rval;
 }
diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h
index 0c2d7c0a..ff59f009 100644
--- a/libxfs/xfs_dir2.h
+++ b/libxfs/xfs_dir2.h
@@ -50,7 +50,7 @@ extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
 				xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t inum,
-				xfs_extlen_t tot);
+				xfs_extlen_t tot, xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_canenter(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name);
 
diff --git a/libxfs/xfs_dir2_block.c b/libxfs/xfs_dir2_block.c
index 43b9c18f..c743fa67 100644
--- a/libxfs/xfs_dir2_block.c
+++ b/libxfs/xfs_dir2_block.c
@@ -882,9 +882,9 @@ xfs_dir2_block_replace(
 	/*
 	 * Point to the data entry we need to change.
 	 */
+	args->offset = be32_to_cpu(blp[ent].address);
 	dep = (xfs_dir2_data_entry_t *)((char *)hdr +
-			xfs_dir2_dataptr_to_off(args->geo,
-						be32_to_cpu(blp[ent].address)));
+			xfs_dir2_dataptr_to_off(args->geo, args->offset));
 	ASSERT(be64_to_cpu(dep->inumber) != args->inumber);
 	/*
 	 * Change the inode number to the new value.
diff --git a/libxfs/xfs_dir2_leaf.c b/libxfs/xfs_dir2_leaf.c
index ee9cfbe9..1be7773e 100644
--- a/libxfs/xfs_dir2_leaf.c
+++ b/libxfs/xfs_dir2_leaf.c
@@ -1521,6 +1521,7 @@ xfs_dir2_leaf_replace(
 	/*
 	 * Point to the data entry.
 	 */
+	args->offset = be32_to_cpu(lep->address);
 	dep = (xfs_dir2_data_entry_t *)
 	      ((char *)dbp->b_addr +
 	       xfs_dir2_dataptr_to_off(args->geo, be32_to_cpu(lep->address)));
diff --git a/libxfs/xfs_dir2_node.c b/libxfs/xfs_dir2_node.c
index ac6a7089..621e8bf5 100644
--- a/libxfs/xfs_dir2_node.c
+++ b/libxfs/xfs_dir2_node.c
@@ -2239,6 +2239,7 @@ xfs_dir2_node_replace(
 		hdr = state->extrablk.bp->b_addr;
 		ASSERT(hdr->magic == cpu_to_be32(XFS_DIR2_DATA_MAGIC) ||
 		       hdr->magic == cpu_to_be32(XFS_DIR3_DATA_MAGIC));
+		args->offset = be32_to_cpu(leafhdr.ents[blk->index].address);
 		dep = (xfs_dir2_data_entry_t *)
 		      ((char *)hdr +
 		       xfs_dir2_dataptr_to_off(args->geo,
diff --git a/libxfs/xfs_dir2_sf.c b/libxfs/xfs_dir2_sf.c
index 4dc74734..6a128748 100644
--- a/libxfs/xfs_dir2_sf.c
+++ b/libxfs/xfs_dir2_sf.c
@@ -1107,6 +1107,8 @@ xfs_dir2_sf_replace(
 				xfs_dir2_sf_put_ino(mp, sfp, sfep,
 						args->inumber);
 				xfs_dir2_sf_put_ftype(mp, sfep, args->filetype);
+				args->offset = xfs_dir2_byte_to_dataptr(
+						  xfs_dir2_sf_get_offset(sfep));
 				break;
 			}
 		}
diff --git a/repair/phase6.c b/repair/phase6.c
index d1e9c0d9..a347c390 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1122,7 +1122,7 @@ mv_orphanage(
 			if (entry_ino_num != orphanage_ino)  {
 				err = -libxfs_dir_replace(tp, ino_p,
 						&xfs_name_dotdot, orphanage_ino,
-						nres);
+						nres, NULL);
 				if (err)
 					do_error(
 	_("name replace op failed (%d)\n"), err);



* [PATCH 07/25] xfsprogs: add parent pointer support to attribute code
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (5 preceding siblings ...)
  2023-02-16 20:55   ` [PATCH 06/25] xfsprogs: get directory offset when replacing a " Darrick J. Wong
@ 2023-02-16 20:55   ` Darrick J. Wong
  2023-02-16 20:55   ` [PATCH 08/25] xfsprogs: define parent pointer xattr format Darrick J. Wong
                     ` (17 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:55 UTC (permalink / raw)
  To: djwong
  Cc: Mark Tinguely, Dave Chinner, Allison Henderson,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Add the new parent attribute type. XFS_ATTR_PARENT is used only for parent pointer
entries; it uses reserved blocks like XFS_ATTR_ROOT.
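
A small sketch of how the new flag slots in beside the existing
namespace bits.  The bit positions are copied from the xfs_da_format.h
hunk below; the DEMO_* macros and demo_* helpers are hypothetical
names for this example only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as defined in the xfs_da_format.h hunk of this patch. */
#define DEMO_ATTR_LOCAL		(1u << 0)
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_PARENT	(1u << 3)
#define DEMO_ATTR_INCOMPLETE	(1u << 7)

#define DEMO_ATTR_NSP_ONDISK_MASK \
	(DEMO_ATTR_ROOT | DEMO_ATTR_SECURE | DEMO_ATTR_PARENT)

/* Root and parent pointer attrs may dip into reserved blocks. */
static bool
demo_uses_reserved_blocks(uint32_t attr_filter)
{
	return (attr_filter & (DEMO_ATTR_ROOT | DEMO_ATTR_PARENT)) != 0;
}

/* Strip everything except the on-disk namespace bits. */
static uint32_t
demo_namespace(uint32_t flags)
{
	return flags & DEMO_ATTR_NSP_ONDISK_MASK;
}
```

The explicit "!= 0" matters when the result is stored in a bool-like
type narrower than the mask; it keeps the test independent of which
bit happened to be set.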

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
[djwong: fix whitespace errors]
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_attr.c       |    4 +++-
 libxfs/xfs_da_format.h  |    5 ++++-
 libxfs/xfs_log_format.h |    1 +
 3 files changed, 8 insertions(+), 2 deletions(-)


diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 2f619286..04f8e349 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -974,11 +974,13 @@ xfs_attr_set(
 	struct xfs_inode	*dp = args->dp;
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_trans_res	tres;
-	bool			rsvd = (args->attr_filter & XFS_ATTR_ROOT);
+	bool			rsvd;
 	int			error, local;
 	int			rmt_blks = 0;
 	unsigned int		total;
 
+	rsvd = (args->attr_filter & (XFS_ATTR_ROOT | XFS_ATTR_PARENT)) != 0;
+
 	if (xfs_is_shutdown(dp->i_mount))
 		return -EIO;
 
diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index 25e28410..3dc03968 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -688,12 +688,15 @@ struct xfs_attr3_leafblock {
 #define	XFS_ATTR_LOCAL_BIT	0	/* attr is stored locally */
 #define	XFS_ATTR_ROOT_BIT	1	/* limit access to trusted attrs */
 #define	XFS_ATTR_SECURE_BIT	2	/* limit access to secure attrs */
+#define	XFS_ATTR_PARENT_BIT	3	/* parent pointer attrs */
 #define	XFS_ATTR_INCOMPLETE_BIT	7	/* attr in middle of create/delete */
 #define XFS_ATTR_LOCAL		(1u << XFS_ATTR_LOCAL_BIT)
 #define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
 #define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
+#define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
 #define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
-#define XFS_ATTR_NSP_ONDISK_MASK	(XFS_ATTR_ROOT | XFS_ATTR_SECURE)
+#define XFS_ATTR_NSP_ONDISK_MASK \
+			(XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)
 
 /*
  * Alignment for namelist and valuelist entries (since they are mixed
diff --git a/libxfs/xfs_log_format.h b/libxfs/xfs_log_format.h
index ae9c9976..727b5a85 100644
--- a/libxfs/xfs_log_format.h
+++ b/libxfs/xfs_log_format.h
@@ -967,6 +967,7 @@ struct xfs_icreate_log {
  */
 #define XFS_ATTRI_FILTER_MASK		(XFS_ATTR_ROOT | \
 					 XFS_ATTR_SECURE | \
+					 XFS_ATTR_PARENT | \
 					 XFS_ATTR_INCOMPLETE)
 
 /*



* [PATCH 08/25] xfsprogs: define parent pointer xattr format
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (6 preceding siblings ...)
  2023-02-16 20:55   ` [PATCH 07/25] xfsprogs: add parent pointer support to attribute code Darrick J. Wong
@ 2023-02-16 20:55   ` Darrick J. Wong
  2023-02-16 20:55   ` [PATCH 09/25] xfsprogs: Add xfs_verify_pptr Darrick J. Wong
                     ` (16 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:55 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Source kernel commit: 059f7b9c5aedf18990aaaee05ff9938b8d87a5ef

We need to define the parent pointer attribute format before we start
adding support for it into all the code that needs to use it. The EA
format we will use encodes the following information:

name={parent inode #, parent inode generation, dirent offset}
value={dirent filename}

The inode/gen gives all the information we need to reliably identify the
parent without requiring child->parent lock ordering, and allows
userspace to do pathname component level reconstruction without the
kernel ever needing to verify the parent itself as part of ioctl calls.

By using the dirent offset in the EA name, we have a method of knowing
the exact parent pointer EA we need to modify/remove in rename/unlink
without an unbounded EA name search.

By keeping the dirent name in the value, we have enough information to
be able to validate and reconstruct damaged directory trees. While the
diroffset of a filename alone is not unique enough to identify the
child, the {diroffset,filename,child_inode} tuple is sufficient. That
is, if the diroffset gets reused and points to a different filename, we
can detect that from the contents of the EA. If a link of the same name
is created, then we can check whether it points at the same inode as the
parent EA we currently have.
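
To make the name layout concrete, here is a minimal sketch that packs
and unpacks the 16-byte big-endian EA name.  The field order and
widths follow struct xfs_parent_name_rec in the hunk below; struct
demo_pptr and the demo_* helpers are hypothetical names, and this is
not the libxfs conversion code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_PPTR_NAME_LEN	16	/* 8 (ino) + 4 (gen) + 4 (diroffset) */

/* Host-endian view of the three fields stored in the EA name. */
struct demo_pptr {
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint32_t	p_diroffset;
};

static void
demo_put_be(uint8_t *p, uint64_t v, size_t nbytes)
{
	size_t	i;

	/* Most significant byte first. */
	for (i = 0; i < nbytes; i++)
		p[i] = (uint8_t)(v >> (8 * (nbytes - 1 - i)));
}

static uint64_t
demo_get_be(const uint8_t *p, size_t nbytes)
{
	uint64_t	v = 0;
	size_t		i;

	for (i = 0; i < nbytes; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Encode the record as the on-disk EA name. */
static void
demo_pptr_pack(const struct demo_pptr *in, uint8_t out[DEMO_PPTR_NAME_LEN])
{
	demo_put_be(out, in->p_ino, 8);
	demo_put_be(out + 8, in->p_gen, 4);
	demo_put_be(out + 12, in->p_diroffset, 4);
}

/* Decode an on-disk EA name back into host-endian fields. */
static void
demo_pptr_unpack(const uint8_t in[DEMO_PPTR_NAME_LEN], struct demo_pptr *out)
{
	out->p_ino = demo_get_be(in, 8);
	out->p_gen = (uint32_t)demo_get_be(in + 8, 4);
	out->p_diroffset = (uint32_t)demo_get_be(in + 12, 4);
}
```

Because the name has a fixed size and a fixed layout, rename and
unlink can build the exact key from the parent inode, generation and
dirent offset and look it up directly.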

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_da_format.h |   25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)


diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index 3dc03968..b02b67f1 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -805,4 +805,29 @@ static inline unsigned int xfs_dir2_dirblock_bytes(struct xfs_sb *sbp)
 xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
 				      struct xfs_da3_blkinfo *hdr3);
 
+/*
+ * Parent pointer attribute format definition
+ *
+ * EA name encodes the parent inode number, generation and the offset of
+ * the dirent that points to the child inode. The EA value contains the
+ * same name as the dirent in the parent directory.
+ */
+struct xfs_parent_name_rec {
+	__be64  p_ino;
+	__be32  p_gen;
+	__be32  p_diroffset;
+};
+
+/*
+ * incore version of the above, also contains name pointers so callers
+ * can pass/obtain all the parent pointer information in a single structure
+ */
+struct xfs_parent_name_irec {
+	xfs_ino_t		p_ino;
+	uint32_t		p_gen;
+	xfs_dir2_dataptr_t	p_diroffset;
+	const char		*p_name;
+	uint8_t			p_namelen;
+};
+
 #endif /* __XFS_DA_FORMAT_H__ */



* [PATCH 09/25] xfsprogs: Add xfs_verify_pptr
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (7 preceding siblings ...)
  2023-02-16 20:55   ` [PATCH 08/25] xfsprogs: define parent pointer xattr format Darrick J. Wong
@ 2023-02-16 20:55   ` Darrick J. Wong
  2023-02-16 20:56   ` [PATCH 10/25] xfsprogs: extend transaction reservations for parent attributes Darrick J. Wong
                     ` (15 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:55 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Source kernel commit: b328f630fcee8dc96e0e3942355fd211f8e15a5d

Attribute names of parent pointers are not strings.  So we need to modify
attr_namecheck to verify parent pointer records when the XFS_ATTR_PARENT flag is
set.
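
A minimal sketch of the dispatch described above: a name flagged as a
parent pointer must be exactly one fixed-size binary record, while
every other name is checked as a string.  All demo_* names are
hypothetical, DEMO_MAX_NAMELEN is an assumed limit, and the
"inode number is nonzero" test is only a placeholder for the real
xfs_verify_ino(), which needs the mount geometry:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ATTR_PARENT	(1u << 3)
#define DEMO_PPTR_NAME_LEN	16	/* size of the binary name record */
#define DEMO_MAX_NAMELEN	256	/* assumed upper bound for this sketch */

/* String names must be short enough and must not contain a NUL byte. */
static bool
demo_str_namecheck(const void *name, size_t length)
{
	if (length >= DEMO_MAX_NAMELEN)
		return false;
	return memchr(name, 0, length) == NULL;
}

/* Placeholder record check: decode the big-endian inode number. */
static bool
demo_pptr_namecheck(const uint8_t *rec)
{
	uint64_t	ino = 0;
	size_t		i;

	for (i = 0; i < 8; i++)
		ino = (ino << 8) | rec[i];

	return ino != 0;	/* stand-in for a real inode number check */
}

/* Pick the binary or the string check based on the attr flags. */
static bool
demo_attr_namecheck(const void *name, size_t length, unsigned int flags)
{
	if (flags & DEMO_ATTR_PARENT) {
		if (length != DEMO_PPTR_NAME_LEN)
			return false;
		return demo_pptr_namecheck(name);
	}

	return demo_str_namecheck(name, length);
}
```

Note that a valid parent pointer record would normally fail the string
check, since small inode numbers leave NUL bytes in the name; that is
why the flag has to be passed down to the name checker.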

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_attr.c      |   47 ++++++++++++++++++++++++++++++++++++++++++++---
 libxfs/xfs_attr.h      |    3 ++-
 libxfs/xfs_da_format.h |    8 ++++++++
 repair/attr_repair.c   |   19 ++++++++++++-------
 4 files changed, 66 insertions(+), 11 deletions(-)


diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 04f8e349..d5f1f488 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -1575,9 +1575,33 @@ xfs_attr_node_get(
 	return error;
 }
 
-/* Returns true if the attribute entry name is valid. */
-bool
-xfs_attr_namecheck(
+/*
+ * Verify parent pointer attribute is valid.
+ * Return true on success or false on failure
+ */
+STATIC bool
+xfs_verify_pptr(
+	struct xfs_mount			*mp,
+	const struct xfs_parent_name_rec	*rec)
+{
+	xfs_ino_t				p_ino;
+	xfs_dir2_dataptr_t			p_diroffset;
+
+	p_ino = be64_to_cpu(rec->p_ino);
+	p_diroffset = be32_to_cpu(rec->p_diroffset);
+
+	if (!xfs_verify_ino(mp, p_ino))
+		return false;
+
+	if (p_diroffset > XFS_DIR2_MAX_DATAPTR)
+		return false;
+
+	return true;
+}
+
+/* Returns true if the string attribute entry name is valid. */
+static bool
+xfs_str_attr_namecheck(
 	const void	*name,
 	size_t		length)
 {
@@ -1592,6 +1616,23 @@ xfs_attr_namecheck(
 	return !memchr(name, 0, length);
 }
 
+/* Returns true if the attribute entry name is valid. */
+bool
+xfs_attr_namecheck(
+	struct xfs_mount	*mp,
+	const void		*name,
+	size_t			length,
+	int			flags)
+{
+	if (flags & XFS_ATTR_PARENT) {
+		if (length != sizeof(struct xfs_parent_name_rec))
+			return false;
+		return xfs_verify_pptr(mp, (struct xfs_parent_name_rec *)name);
+	}
+
+	return xfs_str_attr_namecheck(name, length);
+}
+
 int __init
 xfs_attr_intent_init_cache(void)
 {
diff --git a/libxfs/xfs_attr.h b/libxfs/xfs_attr.h
index 3e81f3f4..b79dae78 100644
--- a/libxfs/xfs_attr.h
+++ b/libxfs/xfs_attr.h
@@ -547,7 +547,8 @@ int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
-bool xfs_attr_namecheck(const void *name, size_t length);
+bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
+			int flags);
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
 			 unsigned int *total);
diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index b02b67f1..75b13807 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -731,6 +731,14 @@ xfs_attr3_leaf_name(xfs_attr_leafblock_t *leafp, int idx)
 	return &((char *)leafp)[be16_to_cpu(entries[idx].nameidx)];
 }
 
+static inline int
+xfs_attr3_leaf_flags(xfs_attr_leafblock_t *leafp, int idx)
+{
+	struct xfs_attr_leaf_entry *entries = xfs_attr3_leaf_entryp(leafp);
+
+	return entries[idx].flags;
+}
+
 static inline xfs_attr_leaf_name_remote_t *
 xfs_attr3_leaf_name_remote(xfs_attr_leafblock_t *leafp, int idx)
 {
diff --git a/repair/attr_repair.c b/repair/attr_repair.c
index c3a6d502..afe8073c 100644
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -293,8 +293,9 @@ process_shortform_attr(
 		}
 
 		/* namecheck checks for null chars in attr names. */
-		if (!libxfs_attr_namecheck(currententry->nameval,
-					   currententry->namelen)) {
+		if (!libxfs_attr_namecheck(mp, currententry->nameval,
+					   currententry->namelen,
+					   currententry->flags)) {
 			do_warn(
 	_("entry contains illegal character in shortform attribute name\n"));
 			junkit = 1;
@@ -454,12 +455,14 @@ process_leaf_attr_local(
 	xfs_dablk_t		da_bno,
 	xfs_ino_t		ino)
 {
-	xfs_attr_leaf_name_local_t *local;
+	xfs_attr_leaf_name_local_t	*local;
+	int				flags;
 
 	local = xfs_attr3_leaf_name_local(leaf, i);
+	flags = xfs_attr3_leaf_flags(leaf, i);
 	if (local->namelen == 0 ||
-	    !libxfs_attr_namecheck(local->nameval,
-				   local->namelen)) {
+	    !libxfs_attr_namecheck(mp, local->nameval,
+				   local->namelen, flags)) {
 		do_warn(
 	_("attribute entry %d in attr block %u, inode %" PRIu64 " has bad name (namelen = %d)\n"),
 			i, da_bno, ino, local->namelen);
@@ -510,12 +513,14 @@ process_leaf_attr_remote(
 {
 	xfs_attr_leaf_name_remote_t *remotep;
 	char*			value;
+	int			flags;
 
 	remotep = xfs_attr3_leaf_name_remote(leaf, i);
+	flags = xfs_attr3_leaf_flags(leaf, i);
 
 	if (remotep->namelen == 0 ||
-	    !libxfs_attr_namecheck(remotep->name,
-				   remotep->namelen) ||
+	    !libxfs_attr_namecheck(mp, remotep->name,
+				   remotep->namelen, flags) ||
 	    be32_to_cpu(entry->hashval) !=
 			libxfs_da_hashname((unsigned char *)&remotep->name[0],
 					   remotep->namelen) ||



* [PATCH 10/25] xfsprogs: extend transaction reservations for parent attributes
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (8 preceding siblings ...)
  2023-02-16 20:55   ` [PATCH 09/25] xfsprogs: Add xfs_verify_pptr Darrick J. Wong
@ 2023-02-16 20:56   ` Darrick J. Wong
  2023-02-16 20:56   ` [PATCH 11/25] xfsprogs: parent pointer attribute creation Darrick J. Wong
                     ` (14 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:56 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Source kernel commit: 99c10e460207a624b3e243e4a3665737d436d08c

We need to add, remove or modify parent pointer attributes during
create/link/unlink/rename operations atomically with the dirents in the
parent directories being modified. This means they need to be modified
in the same transaction as the parent directories, and so we need to add
the required space for the attribute modifications to the transaction
reservations.
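
To make the arithmetic concrete, here is a rough standalone model of how
the per-intent overheads in this patch combine for rename.  It is a
sketch, not the libxfs code: the sketch_* names are hypothetical, the
40-byte size for struct xfs_attri_log_format is an assumption, and the
t1/t2/t3 inputs are caller-supplied placeholders for the values that
xfs_calc_rename_reservation() computes from the mount geometry.

```c
#include <assert.h>

#define SKETCH_XATTR_NAME_MAX	255u	/* Linux XATTR_NAME_MAX */
#define SKETCH_ATTRI_FMT_SIZE	40u	/* assumed sizeof(struct xfs_attri_log_format) */
#define SKETCH_PPTR_REC_SIZE	16u	/* 8-byte ino + 4-byte gen + 4-byte diroffset */

/* Round a log iovec length up to a 4-byte boundary. */
static unsigned int sketch_iovec_len(unsigned int len)
{
	assert(len <= 0xfffffffcu);	/* avoid wraparound when rounding */
	return (len + 3u) & ~3u;
}

static unsigned int sketch_max(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Adding a pptr logs the intent, the file name, and the record. */
static unsigned int sketch_pptr_link_overhead(void)
{
	return SKETCH_ATTRI_FMT_SIZE +
	       sketch_iovec_len(SKETCH_XATTR_NAME_MAX) +
	       sketch_iovec_len(SKETCH_PPTR_REC_SIZE);
}

/* Removing a pptr logs only the intent and the record. */
static unsigned int sketch_pptr_unlink_overhead(void)
{
	return SKETCH_ATTRI_FMT_SIZE +
	       sketch_iovec_len(SKETCH_PPTR_REC_SIZE);
}

/* Replacing a pptr logs the intent, two names, and the record. */
static unsigned int sketch_pptr_replace_overhead(void)
{
	return SKETCH_ATTRI_FMT_SIZE +
	       2 * sketch_iovec_len(SKETCH_XATTR_NAME_MAX) +
	       sketch_iovec_len(SKETCH_PPTR_REC_SIZE);
}

/* Worst case of a plain rename (3 intents) and an exchange (2 replaces). */
static unsigned int sketch_rename_overhead(void)
{
	unsigned int rename = sketch_pptr_replace_overhead() +
			      sketch_pptr_unlink_overhead() +
			      sketch_pptr_link_overhead();
	unsigned int exchange = 2 * sketch_pptr_replace_overhead();

	return sketch_max(rename, exchange);
}

/* Fixed overhead is added on top of the largest single transaction. */
static unsigned int sketch_rename_reservation(unsigned int dquot,
		unsigned int t1, unsigned int t2, unsigned int t3)
{
	return dquot + sketch_rename_overhead() +
	       sketch_max(t1, sketch_max(t2, t3));
}
```

The point of the split is that the intent items may be relogged in every
transaction of the chain, so they are charged as fixed overhead, while
only the largest of t1, t2 and t3 needs to fit at any one time.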

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
[djwong: fix indentation errors]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/libxfs_priv.h    |    1 
 libxfs/xfs_trans_resv.c |  322 +++++++++++++++++++++++++++++++++++++++--------
 2 files changed, 271 insertions(+), 52 deletions(-)


diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 567bd237..9dec26f9 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -521,6 +521,7 @@ static inline int retzero(void) { return 0; }
 
 #define xfs_icreate_log(tp, agno, agbno, cnt, isize, len, gen) ((void) 0)
 #define xfs_sb_validate_fsb_count(sbp, nblks)		(0)
+#define xlog_calc_iovec_len(len)		roundup(len, sizeof(uint32_t))
 
 /*
  * Prototypes for kernel static functions that are aren't in their
diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c
index 04c44480..50315738 100644
--- a/libxfs/xfs_trans_resv.c
+++ b/libxfs/xfs_trans_resv.c
@@ -18,6 +18,7 @@
 #include "xfs_trans.h"
 #include "xfs_trans_space.h"
 #include "xfs_quota_defs.h"
+#include "xfs_da_format.h"
 
 #define _ALLOC	true
 #define _FREE	false
@@ -419,29 +420,108 @@ xfs_calc_itruncate_reservation_minlogsize(
 	return xfs_calc_itruncate_reservation(mp, true);
 }
 
+static inline unsigned int xfs_calc_pptr_link_overhead(void)
+{
+	return sizeof(struct xfs_attri_log_format) +
+			xlog_calc_iovec_len(XATTR_NAME_MAX) +
+			xlog_calc_iovec_len(sizeof(struct xfs_parent_name_rec));
+}
+static inline unsigned int xfs_calc_pptr_unlink_overhead(void)
+{
+	return sizeof(struct xfs_attri_log_format) +
+			xlog_calc_iovec_len(sizeof(struct xfs_parent_name_rec));
+}
+static inline unsigned int xfs_calc_pptr_replace_overhead(void)
+{
+	return sizeof(struct xfs_attri_log_format) +
+			xlog_calc_iovec_len(XATTR_NAME_MAX) +
+			xlog_calc_iovec_len(XATTR_NAME_MAX) +
+			xlog_calc_iovec_len(sizeof(struct xfs_parent_name_rec));
+}
+
 /*
  * In renaming a files we can modify:
  *    the five inodes involved: 5 * inode size
  *    the two directory btrees: 2 * (max depth + v2) * dir block size
  *    the two directory bmap btrees: 2 * max depth * block size
  * And the bmap_finish transaction can free dir and bmap blocks (two sets
- *	of bmap blocks) giving:
+ *	of bmap blocks) giving (t2):
  *    the agf for the ags in which the blocks live: 3 * sector size
  *    the agfl for the ags in which the blocks live: 3 * sector size
  *    the superblock for the free block count: sector size
  *    the allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size
+ * If parent pointers are enabled (t3), then each transaction in the chain
+ *    must be capable of setting or removing the extended attribute
+ *    containing the parent information.  It must also be able to handle
+ *    the three xattr intent items that track the progress of the parent
+ *    pointer update.
  */
 STATIC uint
 xfs_calc_rename_reservation(
 	struct xfs_mount	*mp)
 {
-	return XFS_DQUOT_LOGRES(mp) +
-		max((xfs_calc_inode_res(mp, 5) +
-		     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
-				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 3),
-				      XFS_FSB_TO_B(mp, 1))));
+	unsigned int		overhead = XFS_DQUOT_LOGRES(mp);
+	struct xfs_trans_resv	*resp = M_RES(mp);
+	unsigned int		t1, t2, t3 = 0;
+
+	t1 = xfs_calc_inode_res(mp, 5) +
+	     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
+			XFS_FSB_TO_B(mp, 1));
+
+	t2 = xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
+	     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 3),
+			XFS_FSB_TO_B(mp, 1));
+
+	if (xfs_has_parent(mp)) {
+		unsigned int	rename_overhead, exchange_overhead;
+
+		t3 = max(resp->tr_attrsetm.tr_logres,
+			 resp->tr_attrrm.tr_logres);
+
+		/*
+		 * For a standard rename, the three xattr intent log items
+		 * are (1) replacing the pptr for the source file; (2)
+		 * removing the pptr on the dest file; and (3) adding a
+		 * pptr for the whiteout file in the src dir.
+		 *
+		 * For a RENAME_EXCHANGE, there are two xattr intent
+		 * items to replace the pptr for both src and dest
+		 * files.  Link counts don't change and there is no
+		 * whiteout.
+		 *
+		 * In the worst case we can end up relogging all log
+		 * intent items to allow the log tail to move ahead, so
+		 * they become overhead added to each transaction in a
+		 * processing chain.
+		 */
+		rename_overhead = xfs_calc_pptr_replace_overhead() +
+				  xfs_calc_pptr_unlink_overhead() +
+				  xfs_calc_pptr_link_overhead();
+		exchange_overhead = 2 * xfs_calc_pptr_replace_overhead();
+
+		overhead += max(rename_overhead, exchange_overhead);
+	}
+
+	return overhead + max3(t1, t2, t3);
+}
+
+static inline unsigned int
+xfs_rename_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	/* One for the rename, one more for freeing blocks */
+	unsigned int		ret = XFS_RENAME_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to remove or add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += max(resp->tr_attrsetm.tr_logcount,
+			   resp->tr_attrrm.tr_logcount);
+
+	return ret;
 }
 
 /*
@@ -458,6 +538,23 @@ xfs_calc_iunlink_remove_reservation(
 	       2 * M_IGEO(mp)->inode_cluster_size;
 }
 
+static inline unsigned int
+xfs_link_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_LINK_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrsetm.tr_logcount;
+
+	return ret;
+}
+
 /*
  * For creating a link to an inode:
  *    the parent directory inode: inode size
@@ -474,14 +571,23 @@ STATIC uint
 xfs_calc_link_reservation(
 	struct xfs_mount	*mp)
 {
-	return XFS_DQUOT_LOGRES(mp) +
-		xfs_calc_iunlink_remove_reservation(mp) +
-		max((xfs_calc_inode_res(mp, 2) +
-		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
-				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 1),
-				      XFS_FSB_TO_B(mp, 1))));
+	unsigned int            overhead = XFS_DQUOT_LOGRES(mp);
+	struct xfs_trans_resv   *resp = M_RES(mp);
+	unsigned int            t1, t2, t3 = 0;
+
+	overhead += xfs_calc_iunlink_remove_reservation(mp);
+	t1 = xfs_calc_inode_res(mp, 2) +
+	       xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1));
+	t2 = xfs_calc_buf_res(3, mp->m_sb.sb_sectsize) +
+	     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 1),
+			      XFS_FSB_TO_B(mp, 1));
+
+	if (xfs_has_parent(mp)) {
+		t3 = resp->tr_attrsetm.tr_logres;
+		overhead += xfs_calc_pptr_link_overhead();
+	}
+
+	return overhead + max3(t1, t2, t3);
 }
 
 /*
@@ -496,6 +602,23 @@ xfs_calc_iunlink_add_reservation(xfs_mount_t *mp)
 			M_IGEO(mp)->inode_cluster_size;
 }
 
+static inline unsigned int
+xfs_remove_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_REMOVE_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to remove one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrrm.tr_logcount;
+
+	return ret;
+}
+
 /*
  * For removing a directory entry we can modify:
  *    the parent directory inode: inode size
@@ -512,14 +635,24 @@ STATIC uint
 xfs_calc_remove_reservation(
 	struct xfs_mount	*mp)
 {
-	return XFS_DQUOT_LOGRES(mp) +
-		xfs_calc_iunlink_add_reservation(mp) +
-		max((xfs_calc_inode_res(mp, 2) +
-		     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp),
-				      XFS_FSB_TO_B(mp, 1))),
-		    (xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
-		     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 2),
-				      XFS_FSB_TO_B(mp, 1))));
+	unsigned int            overhead = XFS_DQUOT_LOGRES(mp);
+	struct xfs_trans_resv   *resp = M_RES(mp);
+	unsigned int            t1, t2, t3 = 0;
+
+	overhead += xfs_calc_iunlink_add_reservation(mp);
+
+	t1 = xfs_calc_inode_res(mp, 2) +
+	     xfs_calc_buf_res(XFS_DIROP_LOG_COUNT(mp), XFS_FSB_TO_B(mp, 1));
+	t2 = xfs_calc_buf_res(4, mp->m_sb.sb_sectsize) +
+	     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 2),
+			      XFS_FSB_TO_B(mp, 1));
+
+	if (xfs_has_parent(mp)) {
+		t3 = resp->tr_attrrm.tr_logres;
+		overhead += xfs_calc_pptr_unlink_overhead();
+	}
+
+	return overhead + max3(t1, t2, t3);
 }
 
 /*
@@ -568,12 +701,40 @@ xfs_calc_icreate_resv_alloc(
 		xfs_calc_finobt_res(mp);
 }
 
+static inline unsigned int
+xfs_icreate_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_CREATE_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrsetm.tr_logcount;
+
+	return ret;
+}
+
 STATIC uint
-xfs_calc_icreate_reservation(xfs_mount_t *mp)
+xfs_calc_icreate_reservation(
+	struct xfs_mount	*mp)
 {
-	return XFS_DQUOT_LOGRES(mp) +
-		max(xfs_calc_icreate_resv_alloc(mp),
-		    xfs_calc_create_resv_modify(mp));
+	struct xfs_trans_resv   *resp = M_RES(mp);
+	unsigned int		overhead = XFS_DQUOT_LOGRES(mp);
+	unsigned int		t1, t2, t3 = 0;
+
+	t1 = xfs_calc_icreate_resv_alloc(mp);
+	t2 = xfs_calc_create_resv_modify(mp);
+
+	if (xfs_has_parent(mp)) {
+		t3 = resp->tr_attrsetm.tr_logres;
+		overhead += xfs_calc_pptr_link_overhead();
+	}
+
+	return overhead + max3(t1, t2, t3);
 }
 
 STATIC uint
@@ -586,6 +747,23 @@ xfs_calc_create_tmpfile_reservation(
 	return res + xfs_calc_iunlink_add_reservation(mp);
 }
 
+static inline unsigned int
+xfs_mkdir_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_MKDIR_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrsetm.tr_logcount;
+
+	return ret;
+}
+
 /*
  * Making a new directory is the same as creating a new file.
  */
@@ -596,6 +774,22 @@ xfs_calc_mkdir_reservation(
 	return xfs_calc_icreate_reservation(mp);
 }
 
+static inline unsigned int
+xfs_symlink_log_count(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	unsigned int		ret = XFS_SYMLINK_LOG_COUNT;
+
+	/*
+	 * Pre-reserve enough log reservation to handle the transaction
+	 * rolling needed to add one parent pointer.
+	 */
+	if (xfs_has_parent(mp))
+		ret += resp->tr_attrsetm.tr_logcount;
+
+	return ret;
+}
 
 /*
  * Making a new symplink is the same as creating a new file, but
@@ -908,6 +1102,52 @@ xfs_calc_sb_reservation(
 	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
 }
 
+/*
+ * Namespace reservations.
+ *
+ * These get tricky when parent pointers are enabled as we have attribute
+ * modifications occurring from within these transactions. Rather than confuse
+ * each of these reservation calculations with the conditional attribute
+ * reservations, add them here in a clear and concise manner. This requires that
+ * the attribute reservations have already been calculated.
+ *
+ * Note that we only include the static attribute reservation here; the runtime
+ * reservation will have to be modified by the size of the attributes being
+ * added/removed/modified. See the comments on the attribute reservation
+ * calculations for more details.
+ */
+STATIC void
+xfs_calc_namespace_reservations(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	ASSERT(resp->tr_attrsetm.tr_logres > 0);
+
+	resp->tr_rename.tr_logres = xfs_calc_rename_reservation(mp);
+	resp->tr_rename.tr_logcount = xfs_rename_log_count(mp, resp);
+	resp->tr_rename.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_link.tr_logres = xfs_calc_link_reservation(mp);
+	resp->tr_link.tr_logcount = xfs_link_log_count(mp, resp);
+	resp->tr_link.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_remove.tr_logres = xfs_calc_remove_reservation(mp);
+	resp->tr_remove.tr_logcount = xfs_remove_log_count(mp, resp);
+	resp->tr_remove.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_symlink.tr_logres = xfs_calc_symlink_reservation(mp);
+	resp->tr_symlink.tr_logcount = xfs_symlink_log_count(mp, resp);
+	resp->tr_symlink.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_create.tr_logres = xfs_calc_icreate_reservation(mp);
+	resp->tr_create.tr_logcount = xfs_icreate_log_count(mp, resp);
+	resp->tr_create.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp);
+	resp->tr_mkdir.tr_logcount = xfs_mkdir_log_count(mp, resp);
+	resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+}
+
 void
 xfs_trans_resv_calc(
 	struct xfs_mount	*mp,
@@ -927,35 +1167,11 @@ xfs_trans_resv_calc(
 	resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
 	resp->tr_itruncate.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
-	resp->tr_rename.tr_logres = xfs_calc_rename_reservation(mp);
-	resp->tr_rename.tr_logcount = XFS_RENAME_LOG_COUNT;
-	resp->tr_rename.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
-	resp->tr_link.tr_logres = xfs_calc_link_reservation(mp);
-	resp->tr_link.tr_logcount = XFS_LINK_LOG_COUNT;
-	resp->tr_link.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
-	resp->tr_remove.tr_logres = xfs_calc_remove_reservation(mp);
-	resp->tr_remove.tr_logcount = XFS_REMOVE_LOG_COUNT;
-	resp->tr_remove.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
-	resp->tr_symlink.tr_logres = xfs_calc_symlink_reservation(mp);
-	resp->tr_symlink.tr_logcount = XFS_SYMLINK_LOG_COUNT;
-	resp->tr_symlink.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
-	resp->tr_create.tr_logres = xfs_calc_icreate_reservation(mp);
-	resp->tr_create.tr_logcount = XFS_CREATE_LOG_COUNT;
-	resp->tr_create.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
 	resp->tr_create_tmpfile.tr_logres =
 			xfs_calc_create_tmpfile_reservation(mp);
 	resp->tr_create_tmpfile.tr_logcount = XFS_CREATE_TMPFILE_LOG_COUNT;
 	resp->tr_create_tmpfile.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
-	resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp);
-	resp->tr_mkdir.tr_logcount = XFS_MKDIR_LOG_COUNT;
-	resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
 	resp->tr_ifree.tr_logres = xfs_calc_ifree_reservation(mp);
 	resp->tr_ifree.tr_logcount = XFS_INACTIVE_LOG_COUNT;
 	resp->tr_ifree.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
@@ -985,6 +1201,8 @@ xfs_trans_resv_calc(
 	resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT;
 	resp->tr_qm_dqalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
+	xfs_calc_namespace_reservations(mp, resp);
+
 	/*
 	 * The following transactions are logged in logical format with
 	 * a default log count.



* [PATCH 11/25] xfsprogs: parent pointer attribute creation
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (9 preceding siblings ...)
  2023-02-16 20:56   ` [PATCH 10/25] xfsprogs: extend transaction reservations for parent attributes Darrick J. Wong
@ 2023-02-16 20:56   ` Darrick J. Wong
  2023-02-16 20:56   ` [PATCH 12/25] xfsprogs: add parent attributes to link Darrick J. Wong
                     ` (13 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:56 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Source kernel commit: bb5cfa7e0c8eff07c62b1a28e0a4ea1d2561e0bb

Add a parent pointer attribute during xfs_create, and add subroutines to
initialize the attributes.
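
As an illustration of what the record initialization produces, the
following standalone sketch packs the three fields into the 16-byte
big-endian layout used as the attribute name.  The sketch_* names are
hypothetical and the byte-array representation is an assumption made for
portability; the patch itself fills struct xfs_parent_name_rec with
cpu_to_be64() and cpu_to_be32().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PPTR_REC_SIZE	16	/* 8-byte ino + 4-byte gen + 4-byte diroffset */

/* Store v as an n-byte big-endian integer at p. */
static void sketch_put_be(uint8_t *p, uint64_t v, size_t n)
{
	for (size_t i = 0; i < n; i++)
		p[n - 1 - i] = (uint8_t)(v >> (8 * i));
}

/* Pack parent inode number, generation, and directory offset. */
static void sketch_init_parent_name_rec(uint8_t rec[SKETCH_PPTR_REC_SIZE],
					uint64_t p_ino, uint32_t p_gen,
					uint32_t p_diroffset)
{
	assert(rec != NULL);

	sketch_put_be(rec, p_ino, 8);
	sketch_put_be(rec + 8, p_gen, 4);
	sketch_put_be(rec + 12, p_diroffset, 4);
}
```

Because the packed bytes are hashed and compared as an opaque name, the
fixed byte order matters: the same parent must always yield the same
16 bytes regardless of host endianness.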

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
[djwong: sync with kernel code]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/Makefile        |    2 +
 libxfs/libxfs_priv.h   |    3 +
 libxfs/xfs_attr.c      |    4 +
 libxfs/xfs_attr.h      |    4 +
 libxfs/xfs_da_format.h |   12 ----
 libxfs/xfs_parent.c    |  140 ++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_parent.h    |   57 ++++++++++++++++++++
 7 files changed, 206 insertions(+), 16 deletions(-)
 create mode 100644 libxfs/xfs_parent.c
 create mode 100644 libxfs/xfs_parent.h


diff --git a/libxfs/Makefile b/libxfs/Makefile
index 010ee68e..89d29dc9 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -45,6 +45,7 @@ HFILES = \
 	xfs_ialloc_btree.h \
 	xfs_inode_buf.h \
 	xfs_inode_fork.h \
+	xfs_parent.h \
 	xfs_quota_defs.h \
 	xfs_refcount.h \
 	xfs_refcount_btree.h \
@@ -92,6 +93,7 @@ CFILES = cache.c \
 	xfs_inode_fork.c \
 	xfs_ialloc_btree.c \
 	xfs_log_rlimit.c \
+	xfs_parent.c \
 	xfs_refcount.c \
 	xfs_refcount_btree.c \
 	xfs_rmap.c \
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 9dec26f9..ad21a25d 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -614,7 +614,8 @@ int libxfs_zero_extent(struct xfs_inode *ip, xfs_fsblock_t start_fsb,
 /* xfs_log.c */
 bool xfs_log_check_lsn(struct xfs_mount *, xfs_lsn_t);
 void xfs_log_item_init(struct xfs_mount *, struct xfs_log_item *, int);
-#define xfs_attr_use_log_assist(mp)	(0)
+#define xfs_attr_grab_log_assist(mp)	(0)
+#define xfs_attr_rele_log_assist(mp)	((void) 0)
 #define xlog_drop_incompat_feat(log)	do { } while (0)
 #define xfs_log_in_recovery(mp)		(false)
 
diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index d5f1f488..edf7e1ee 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -884,7 +884,7 @@ xfs_attr_lookup(
 	return error;
 }
 
-static int
+int
 xfs_attr_intent_init(
 	struct xfs_da_args	*args,
 	unsigned int		op_flags,	/* op flag (set or remove) */
@@ -902,7 +902,7 @@ xfs_attr_intent_init(
 }
 
 /* Sets an attribute for an inode as a deferred operation */
-static int
+int
 xfs_attr_defer_add(
 	struct xfs_da_args	*args)
 {
diff --git a/libxfs/xfs_attr.h b/libxfs/xfs_attr.h
index b79dae78..0cf23f51 100644
--- a/libxfs/xfs_attr.h
+++ b/libxfs/xfs_attr.h
@@ -544,6 +544,7 @@ int xfs_inode_hasattr(struct xfs_inode *ip);
 bool xfs_attr_is_leaf(struct xfs_inode *ip);
 int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
+int xfs_attr_defer_add(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
@@ -552,7 +553,8 @@ bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
 			 unsigned int *total);
-
+int xfs_attr_intent_init(struct xfs_da_args *args, unsigned int op_flags,
+			 struct xfs_attr_intent  **attr);
 /*
  * Check to see if the attr should be upgraded from non-existent or shortform to
  * single-leaf-block attribute list.
diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index 75b13807..2db1cf97 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -826,16 +826,4 @@ struct xfs_parent_name_rec {
 	__be32  p_diroffset;
 };
 
-/*
- * incore version of the above, also contains name pointers so callers
- * can pass/obtain all the parent pointer information in a single structure
- */
-struct xfs_parent_name_irec {
-	xfs_ino_t		p_ino;
-	uint32_t		p_gen;
-	xfs_dir2_dataptr_t	p_diroffset;
-	const char		*p_name;
-	uint8_t			p_namelen;
-};
-
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
new file mode 100644
index 00000000..e0a59998
--- /dev/null
+++ b/libxfs/xfs_parent.c
@@ -0,0 +1,140 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Oracle, Inc.
+ * All rights reserved.
+ */
+#include "libxfs_priv.h"
+#include "xfs_shared.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_inode.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_trace.h"
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_da_format.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_trans.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr_sf.h"
+#include "xfs_bmap.h"
+#include "xfs_parent.h"
+#include "xfs_da_format.h"
+#include "xfs_format.h"
+#include "xfs_trans_space.h"
+
+struct kmem_cache		*xfs_parent_intent_cache;
+
+/*
+ * Parent pointer attribute handling.
+ *
+ * Because the attribute value is a filename component, it will never be longer
+ * than 255 bytes. This means the attribute will always be a local format
+ * attribute, because xfs_attr_leaf_entsize_local_max() for v5 filesystems will
+ * always be larger than this (max is 75% of block size).
+ *
+ * Creating a new parent attribute will always create a new attribute - there
+ * should never, ever be an existing attribute in the tree for a new inode.
+ * ENOSPC behavior is problematic - creating the inode without the parent
+ * pointer is effectively a corruption, so we allow parent attribute creation
+ * to dip into the reserve block pool to avoid unexpected ENOSPC errors from
+ * occurring.
+ */
+
+
+/* Initializes a xfs_parent_name_rec to be stored as an attribute name */
+void
+xfs_init_parent_name_rec(
+	struct xfs_parent_name_rec	*rec,
+	struct xfs_inode		*ip,
+	uint32_t			p_diroffset)
+{
+	xfs_ino_t			p_ino = ip->i_ino;
+	uint32_t			p_gen = VFS_I(ip)->i_generation;
+
+	rec->p_ino = cpu_to_be64(p_ino);
+	rec->p_gen = cpu_to_be32(p_gen);
+	rec->p_diroffset = cpu_to_be32(p_diroffset);
+}
+
+int
+__xfs_parent_init(
+	struct xfs_mount		*mp,
+	struct xfs_parent_defer		**parentp)
+{
+	struct xfs_parent_defer		*parent;
+	int				error;
+
+	error = xfs_attr_grab_log_assist(mp);
+	if (error)
+		return error;
+
+	parent = kmem_cache_zalloc(xfs_parent_intent_cache, GFP_KERNEL);
+	if (!parent) {
+		xfs_attr_rele_log_assist(mp);
+		return -ENOMEM;
+	}
+
+	/* init parent da_args */
+	parent->args.geo = mp->m_attr_geo;
+	parent->args.whichfork = XFS_ATTR_FORK;
+	parent->args.attr_filter = XFS_ATTR_PARENT;
+	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED;
+	parent->args.name = (const uint8_t *)&parent->rec;
+	parent->args.namelen = sizeof(struct xfs_parent_name_rec);
+
+	*parentp = parent;
+	return 0;
+}
+
+int
+xfs_parent_defer_add(
+	struct xfs_trans	*tp,
+	struct xfs_parent_defer	*parent,
+	struct xfs_inode	*dp,
+	struct xfs_name		*parent_name,
+	xfs_dir2_dataptr_t	diroffset,
+	struct xfs_inode	*child)
+{
+	struct xfs_da_args	*args = &parent->args;
+
+	xfs_init_parent_name_rec(&parent->rec, dp, diroffset);
+	args->hashval = xfs_da_hashname(args->name, args->namelen);
+
+	args->trans = tp;
+	args->dp = child;
+	if (parent_name) {
+		parent->args.value = (void *)parent_name->name;
+		parent->args.valuelen = parent_name->len;
+	}
+
+	return xfs_attr_defer_add(args);
+}
+
+void
+__xfs_parent_cancel(
+	xfs_mount_t		*mp,
+	struct xfs_parent_defer *parent)
+{
+	xlog_drop_incompat_feat(mp->m_log);
+	kmem_cache_free(xfs_parent_intent_cache, parent);
+}
+
+unsigned int
+xfs_pptr_calc_space_res(
+	struct xfs_mount	*mp,
+	unsigned int		namelen)
+{
+	/*
+	 * Pptrs are always the first attr in an attr tree, and never larger
+	 * than a block
+	 */
+	return XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK) +
+	       XFS_NEXTENTADD_SPACE_RES(mp, namelen, XFS_ATTR_FORK);
+}
+
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
new file mode 100644
index 00000000..d5a8c8e5
--- /dev/null
+++ b/libxfs/xfs_parent.h
@@ -0,0 +1,57 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Oracle, Inc.
+ * All Rights Reserved.
+ */
+#ifndef	__XFS_PARENT_H__
+#define	__XFS_PARENT_H__
+
+extern struct kmem_cache	*xfs_parent_intent_cache;
+
+/*
+ * Dynamically allocated structure that wraps the data passed around
+ * the defer ops machinery
+ */
+struct xfs_parent_defer {
+	struct xfs_parent_name_rec	rec;
+	struct xfs_da_args		args;
+};
+
+/*
+ * Parent pointer attribute prototypes
+ */
+void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
+			      struct xfs_inode *ip,
+			      uint32_t p_diroffset);
+int __xfs_parent_init(struct xfs_mount *mp, struct xfs_parent_defer **parentp);
+
+static inline int
+xfs_parent_start(
+	struct xfs_mount	*mp,
+	struct xfs_parent_defer	**pp)
+{
+	*pp = NULL;
+
+	if (xfs_has_parent(mp))
+		return __xfs_parent_init(mp, pp);
+	return 0;
+}
+
+int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_parent_defer *parent,
+			 struct xfs_inode *dp, struct xfs_name *parent_name,
+			 xfs_dir2_dataptr_t diroffset, struct xfs_inode *child);
+void __xfs_parent_cancel(struct xfs_mount *mp, struct xfs_parent_defer *parent);
+
+static inline void
+xfs_parent_finish(
+	struct xfs_mount	*mp,
+	struct xfs_parent_defer	*p)
+{
+	if (p)
+		__xfs_parent_cancel(mp, p);
+}
+
+unsigned int xfs_pptr_calc_space_res(struct xfs_mount *mp,
+				     unsigned int namelen);
+
+#endif	/* __XFS_PARENT_H__ */



* [PATCH 12/25] xfsprogs: add parent attributes to link
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (10 preceding siblings ...)
  2023-02-16 20:56   ` [PATCH 11/25] xfsprogs: parent pointer attribute creation Darrick J. Wong
@ 2023-02-16 20:56   ` Darrick J. Wong
  2023-02-16 20:56   ` [PATCH 13/25] xfsprogs: add parent attributes to symlink Darrick J. Wong
                     ` (12 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:56 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch modifies xfs_link to add a parent pointer to the inode.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_trans_space.h |    2 --
 1 file changed, 2 deletions(-)


diff --git a/libxfs/xfs_trans_space.h b/libxfs/xfs_trans_space.h
index 87b31c69..f7220792 100644
--- a/libxfs/xfs_trans_space.h
+++ b/libxfs/xfs_trans_space.h
@@ -84,8 +84,6 @@
 	(2 * (mp)->m_alloc_maxlevels)
 #define	XFS_GROWFSRT_SPACE_RES(mp,b)	\
 	((b) + XFS_EXTENTADD_SPACE_RES(mp, XFS_DATA_FORK))
-#define	XFS_LINK_SPACE_RES(mp,nl)	\
-	XFS_DIRENTER_SPACE_RES(mp,nl)
 #define	XFS_MKDIR_SPACE_RES(mp,nl)	\
 	(XFS_IALLOC_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl))
 #define	XFS_QM_DQALLOC_SPACE_RES(mp)	\



* [PATCH 13/25] xfsprogs: add parent attributes to symlink
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (11 preceding siblings ...)
  2023-02-16 20:56   ` [PATCH 12/25] xfsprogs: add parent attributes to link Darrick J. Wong
@ 2023-02-16 20:56   ` Darrick J. Wong
  2023-02-16 20:57   ` [PATCH 14/25] xfsprogs: remove parent pointers in unlink Darrick J. Wong
                     ` (11 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:56 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch modifies xfs_symlink to add a parent pointer to the inode.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 libxfs/xfs_trans_space.h |    2 --
 1 file changed, 2 deletions(-)


diff --git a/libxfs/xfs_trans_space.h b/libxfs/xfs_trans_space.h
index f7220792..25a55650 100644
--- a/libxfs/xfs_trans_space.h
+++ b/libxfs/xfs_trans_space.h
@@ -95,8 +95,6 @@
 	XFS_DIRREMOVE_SPACE_RES(mp)
 #define	XFS_RENAME_SPACE_RES(mp,nl)	\
 	(XFS_DIRREMOVE_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl))
-#define	XFS_SYMLINK_SPACE_RES(mp,nl,b)	\
-	(XFS_IALLOC_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl) + (b))
 #define XFS_IFREE_SPACE_RES(mp)		\
 	(xfs_has_finobt(mp) ? M_IGEO(mp)->inobt_maxlevels : 0)
 


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 14/25] xfsprogs: remove parent pointers in unlink
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (12 preceding siblings ...)
  2023-02-16 20:56   ` [PATCH 13/25] xfsprogs: add parent attributes to symlink Darrick J. Wong
@ 2023-02-16 20:57   ` Darrick J. Wong
  2023-02-16 20:57   ` [PATCH 15/25] xfsprogs: Add parent pointers to rename Darrick J. Wong
                     ` (10 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:57 UTC (permalink / raw)
  To: djwong; +Cc: Dave Chinner, Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Source kernel commit: b9ffc3d05531820aea30b2caf3368c312d8b2508

This patch removes the parent pointer attribute during unlink.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_attr.c        |    2 +-
 libxfs/xfs_attr.h        |    1 +
 libxfs/xfs_parent.c      |   17 +++++++++++++++++
 libxfs/xfs_parent.h      |    5 +++++
 libxfs/xfs_trans_space.h |    2 --
 repair/phase6.c          |    6 +++---
 6 files changed, 27 insertions(+), 6 deletions(-)
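
Reader's note: xfs_parent_defer_remove() rebuilds the attribute name from the
parent directory and the directory offset before queueing the removal.  The
sketch below illustrates why that is enough to identify the attribute: the
name is a fixed-size big-endian record of (inode, generation, diroffset).  The
demo_* names are hypothetical stand-ins, not libxfs symbols, and the record
layout is assumed from the fields that xfs_init_parent_ptr() decodes later in
this series.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct xfs_parent_name_rec (big-endian fields). */
struct demo_parent_name_rec {
	uint64_t	p_ino;		/* parent inode number */
	uint32_t	p_gen;		/* parent inode generation */
	uint32_t	p_diroffset;	/* offset of the dirent in the parent */
};

/* Fill in the record that acts as the xattr name of one parent pointer. */
static void
demo_init_parent_name_rec(
	struct demo_parent_name_rec	*rec,
	uint64_t			ino,
	uint32_t			gen,
	uint32_t			diroffset)
{
	rec->p_ino = htobe64(ino);
	rec->p_gen = htobe32(gen);
	rec->p_diroffset = htobe32(diroffset);
}

/* Two records name the same parent pointer iff their bytes are identical. */
static int
demo_same_parent_rec(
	const struct demo_parent_name_rec	*a,
	const struct demo_parent_name_rec	*b)
{
	assert(a != NULL && b != NULL);
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Because the record has no padding (8 + 4 + 4 bytes), a bytewise comparison is
sufficient in this sketch; a removal only needs the parent and the offset,
not the file name stored in the attribute value.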


diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index edf7e1ee..04cafc5f 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -944,7 +944,7 @@ xfs_attr_defer_replace(
 }
 
 /* Removes an attribute for an inode as a deferred operation */
-static int
+int
 xfs_attr_defer_remove(
 	struct xfs_da_args	*args)
 {
diff --git a/libxfs/xfs_attr.h b/libxfs/xfs_attr.h
index 0cf23f51..03300554 100644
--- a/libxfs/xfs_attr.h
+++ b/libxfs/xfs_attr.h
@@ -545,6 +545,7 @@ bool xfs_attr_is_leaf(struct xfs_inode *ip);
 int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_defer_add(struct xfs_da_args *args);
+int xfs_attr_defer_remove(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index e0a59998..b137cfda 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -116,6 +116,23 @@ xfs_parent_defer_add(
 	return xfs_attr_defer_add(args);
 }
 
+int
+xfs_parent_defer_remove(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*dp,
+	struct xfs_parent_defer	*parent,
+	xfs_dir2_dataptr_t	diroffset,
+	struct xfs_inode	*child)
+{
+	struct xfs_da_args	*args = &parent->args;
+
+	xfs_init_parent_name_rec(&parent->rec, dp, diroffset);
+	args->trans = tp;
+	args->dp = child;
+	args->hashval = xfs_da_hashname(args->name, args->namelen);
+	return xfs_attr_defer_remove(args);
+}
+
 void
 __xfs_parent_cancel(
 	xfs_mount_t		*mp,
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index d5a8c8e5..0f39d033 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -40,6 +40,11 @@ xfs_parent_start(
 int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_parent_defer *parent,
 			 struct xfs_inode *dp, struct xfs_name *parent_name,
 			 xfs_dir2_dataptr_t diroffset, struct xfs_inode *child);
+int xfs_parent_defer_remove(struct xfs_trans *tp, struct xfs_inode *dp,
+			    struct xfs_parent_defer *parent,
+			    xfs_dir2_dataptr_t diroffset,
+			    struct xfs_inode *child);
+
 void __xfs_parent_cancel(struct xfs_mount *mp, struct xfs_parent_defer *parent);
 
 static inline void
diff --git a/libxfs/xfs_trans_space.h b/libxfs/xfs_trans_space.h
index 25a55650..b5ab6701 100644
--- a/libxfs/xfs_trans_space.h
+++ b/libxfs/xfs_trans_space.h
@@ -91,8 +91,6 @@
 	 XFS_DQUOT_CLUSTER_SIZE_FSB)
 #define	XFS_QM_QINOCREATE_SPACE_RES(mp)	\
 	XFS_IALLOC_SPACE_RES(mp)
-#define	XFS_REMOVE_SPACE_RES(mp)	\
-	XFS_DIRREMOVE_SPACE_RES(mp)
 #define	XFS_RENAME_SPACE_RES(mp,nl)	\
 	(XFS_DIRREMOVE_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl))
 #define XFS_IFREE_SPACE_RES(mp)		\
diff --git a/repair/phase6.c b/repair/phase6.c
index a347c390..e202398e 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -1266,7 +1266,7 @@ longform_dir2_rebuild(
 	    libxfs_dir_ino_validate(mp, pip.i_ino))
 		pip.i_ino = mp->m_sb.sb_rootino;
 
-	nres = XFS_REMOVE_SPACE_RES(mp);
+	nres = XFS_DIRREMOVE_SPACE_RES(mp);
 	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_remove, nres, 0, 0, &tp);
 	if (error)
 		res_failed(error);
@@ -1371,7 +1371,7 @@ dir2_kill_block(
 	int		nres;
 	xfs_trans_t	*tp;
 
-	nres = XFS_REMOVE_SPACE_RES(mp);
+	nres = XFS_DIRREMOVE_SPACE_RES(mp);
 	error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_remove, nres, 0, 0, &tp);
 	if (error)
 		res_failed(error);
@@ -2887,7 +2887,7 @@ process_dir_inode(
 			 * inode but it's easier than wedging a
 			 * new define in ourselves.
 			 */
-			nres = no_modify ? 0 : XFS_REMOVE_SPACE_RES(mp);
+			nres = no_modify ? 0 : XFS_DIRREMOVE_SPACE_RES(mp);
 			error = -libxfs_trans_alloc(mp, &M_RES(mp)->tr_remove,
 						    nres, 0, 0, &tp);
 			if (error)


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 15/25] xfsprogs: Add parent pointers to rename
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (13 preceding siblings ...)
  2023-02-16 20:57   ` [PATCH 14/25] xfsprogs: remove parent pointers in unlink Darrick J. Wong
@ 2023-02-16 20:57   ` Darrick J. Wong
  2023-02-16 20:57   ` [PATCH 16/25] xfsprogs: Add the parent pointer support to the superblock version 5 Darrick J. Wong
                     ` (9 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:57 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch removes the old parent pointer attribute during the rename
operation, and re-adds the updated parent pointer.  In the case of
xfs_cross_rename, we modify the routine not to roll the transaction just
yet.  We will do this after the parent pointer is added in the calling
xfs_rename function.

Source kernel commit: d00721b30fd1923f6e9e9c1ca6f2a74cfc4ed5d3

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
[djwong: fix indent with kernel]
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_attr.c        |    2 +-
 libxfs/xfs_attr.h        |    1 +
 libxfs/xfs_parent.c      |   47 +++++++++++++++++++++++++++++++++++++++++-----
 libxfs/xfs_parent.h      |   24 ++++++++++++++++++++++-
 libxfs/xfs_trans_space.h |    2 --
 5 files changed, 66 insertions(+), 10 deletions(-)
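
Reader's note: the grab_log/have_log change in this patch follows a common
pattern: record at init time whether a reference was taken, so that the
cancel path releases it only when it was actually acquired.  A minimal
sketch, assuming a plain counter in place of the log incompat-feature
reference; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical counter standing in for the log feature reference. */
static int demo_log_refs;

struct demo_parent_defer {
	bool	have_log;	/* did init take a log reference? */
};

/* Allocate the tracking structure, optionally taking a log reference. */
static struct demo_parent_defer *
demo_parent_init(bool grab_log)
{
	struct demo_parent_defer	*p;

	if (grab_log)
		demo_log_refs++;

	p = calloc(1, sizeof(*p));
	if (!p) {
		/* Undo only what this call acquired. */
		if (grab_log)
			demo_log_refs--;
		return NULL;
	}

	p->have_log = grab_log;
	return p;
}

/* Release the structure and, if init took one, the log reference. */
static void
demo_parent_cancel(struct demo_parent_defer *p)
{
	assert(p != NULL);
	if (p->have_log)
		demo_log_refs--;
	free(p);
}
```

With this shape, a caller that initializes with grab_log == false (the
xfs_parent_start_locked() case) never drops a reference it does not own.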


diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 04cafc5f..0cb76f8f 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -921,7 +921,7 @@ xfs_attr_defer_add(
 }
 
 /* Sets an attribute for an inode as a deferred operation */
-static int
+int
 xfs_attr_defer_replace(
 	struct xfs_da_args	*args)
 {
diff --git a/libxfs/xfs_attr.h b/libxfs/xfs_attr.h
index 03300554..98576126 100644
--- a/libxfs/xfs_attr.h
+++ b/libxfs/xfs_attr.h
@@ -546,6 +546,7 @@ int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_defer_add(struct xfs_da_args *args);
 int xfs_attr_defer_remove(struct xfs_da_args *args);
+int xfs_attr_defer_replace(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index b137cfda..3f02271f 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -65,22 +65,27 @@ xfs_init_parent_name_rec(
 int
 __xfs_parent_init(
 	struct xfs_mount		*mp,
+	bool				grab_log,
 	struct xfs_parent_defer		**parentp)
 {
 	struct xfs_parent_defer		*parent;
 	int				error;
 
-	error = xfs_attr_grab_log_assist(mp);
-	if (error)
-		return error;
+	if (grab_log) {
+		error = xfs_attr_grab_log_assist(mp);
+		if (error)
+			return error;
+	}
 
 	parent = kmem_cache_zalloc(xfs_parent_intent_cache, GFP_KERNEL);
 	if (!parent) {
-		xfs_attr_rele_log_assist(mp);
+		if (grab_log)
+			xfs_attr_rele_log_assist(mp);
 		return -ENOMEM;
 	}
 
 	/* init parent da_args */
+	parent->have_log = grab_log;
 	parent->args.geo = mp->m_attr_geo;
 	parent->args.whichfork = XFS_ATTR_FORK;
 	parent->args.attr_filter = XFS_ATTR_PARENT;
@@ -133,12 +138,44 @@ xfs_parent_defer_remove(
 	return xfs_attr_defer_remove(args);
 }
 
+
+int
+xfs_parent_defer_replace(
+	struct xfs_trans	*tp,
+	struct xfs_parent_defer	*new_parent,
+	struct xfs_inode	*old_dp,
+	xfs_dir2_dataptr_t	old_diroffset,
+	struct xfs_name		*parent_name,
+	struct xfs_inode	*new_dp,
+	xfs_dir2_dataptr_t	new_diroffset,
+	struct xfs_inode	*child)
+{
+	struct xfs_da_args	*args = &new_parent->args;
+
+	xfs_init_parent_name_rec(&new_parent->old_rec, old_dp, old_diroffset);
+	xfs_init_parent_name_rec(&new_parent->rec, new_dp, new_diroffset);
+	new_parent->args.name = (const uint8_t *)&new_parent->old_rec;
+	new_parent->args.namelen = sizeof(struct xfs_parent_name_rec);
+	new_parent->args.new_name = (const uint8_t *)&new_parent->rec;
+	new_parent->args.new_namelen = sizeof(struct xfs_parent_name_rec);
+	args->trans = tp;
+	args->dp = child;
+
+	ASSERT(parent_name != NULL);
+	new_parent->args.value = (void *)parent_name->name;
+	new_parent->args.valuelen = parent_name->len;
+
+	args->hashval = xfs_da_hashname(args->name, args->namelen);
+	return xfs_attr_defer_replace(args);
+}
+
 void
 __xfs_parent_cancel(
 	xfs_mount_t		*mp,
 	struct xfs_parent_defer *parent)
 {
-	xlog_drop_incompat_feat(mp->m_log);
+	if (parent->have_log)
+		xlog_drop_incompat_feat(mp->m_log);
 	kmem_cache_free(xfs_parent_intent_cache, parent);
 }
 
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index 0f39d033..03900588 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -14,7 +14,9 @@ extern struct kmem_cache	*xfs_parent_intent_cache;
  */
 struct xfs_parent_defer {
 	struct xfs_parent_name_rec	rec;
+	struct xfs_parent_name_rec	old_rec;
 	struct xfs_da_args		args;
+	bool				have_log;
 };
 
 /*
@@ -23,7 +25,8 @@ struct xfs_parent_defer {
 void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
 			      struct xfs_inode *ip,
 			      uint32_t p_diroffset);
-int __xfs_parent_init(struct xfs_mount *mp, struct xfs_parent_defer **parentp);
+int __xfs_parent_init(struct xfs_mount *mp, bool grab_log,
+		struct xfs_parent_defer **parentp);
 
 static inline int
 xfs_parent_start(
@@ -33,13 +36,30 @@ xfs_parent_start(
 	*pp = NULL;
 
 	if (xfs_has_parent(mp))
-		return __xfs_parent_init(mp, pp);
+		return __xfs_parent_init(mp, true, pp);
+	return 0;
+}
+
+static inline int
+xfs_parent_start_locked(
+	struct xfs_mount	*mp,
+	struct xfs_parent_defer	**pp)
+{
+	*pp = NULL;
+
+	if (xfs_has_parent(mp))
+		return __xfs_parent_init(mp, false, pp);
 	return 0;
 }
 
 int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_parent_defer *parent,
 			 struct xfs_inode *dp, struct xfs_name *parent_name,
 			 xfs_dir2_dataptr_t diroffset, struct xfs_inode *child);
+int xfs_parent_defer_replace(struct xfs_trans *tp,
+		struct xfs_parent_defer *new_parent, struct xfs_inode *old_dp,
+		xfs_dir2_dataptr_t old_diroffset, struct xfs_name *parent_name,
+		struct xfs_inode *new_ip, xfs_dir2_dataptr_t new_diroffset,
+		struct xfs_inode *child);
 int xfs_parent_defer_remove(struct xfs_trans *tp, struct xfs_inode *dp,
 			    struct xfs_parent_defer *parent,
 			    xfs_dir2_dataptr_t diroffset,
diff --git a/libxfs/xfs_trans_space.h b/libxfs/xfs_trans_space.h
index b5ab6701..810610a1 100644
--- a/libxfs/xfs_trans_space.h
+++ b/libxfs/xfs_trans_space.h
@@ -91,8 +91,6 @@
 	 XFS_DQUOT_CLUSTER_SIZE_FSB)
 #define	XFS_QM_QINOCREATE_SPACE_RES(mp)	\
 	XFS_IALLOC_SPACE_RES(mp)
-#define	XFS_RENAME_SPACE_RES(mp,nl)	\
-	(XFS_DIRREMOVE_SPACE_RES(mp) + XFS_DIRENTER_SPACE_RES(mp,nl))
 #define XFS_IFREE_SPACE_RES(mp)		\
 	(xfs_has_finobt(mp) ? M_IGEO(mp)->inobt_maxlevels : 0)
 


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 16/25] xfsprogs: Add the parent pointer support to the superblock version 5.
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (14 preceding siblings ...)
  2023-02-16 20:57   ` [PATCH 15/25] xfsprogs: Add parent pointers to rename Darrick J. Wong
@ 2023-02-16 20:57   ` Darrick J. Wong
  2023-02-16 20:57   ` [PATCH 17/25] xfsprogs: Add parent pointer ioctl Darrick J. Wong
                     ` (8 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:57 UTC (permalink / raw)
  To: djwong
  Cc: Mark Tinguely, Dave Chinner, Allison Henderson, Darrick J. Wong,
	allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Source kernel commit: 724321b7f1c737ce880ea0e6fa4422ad13c4d440

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 libfrog/fsgeom.c    |    4 ++++
 libxfs/xfs_format.h |    4 +++-
 libxfs/xfs_fs.h     |    1 +
 libxfs/xfs_sb.c     |    4 ++++
 4 files changed, 12 insertions(+), 1 deletion(-)
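
Reader's note: adding XFS_SB_FEAT_INCOMPAT_PARENT to the "all known" mask is
what lets updated code accept a filesystem with the bit set, while code built
with the old mask treats it as unknown.  The sketch below shows that check in
isolation.  The bit positions for NREXT64 and PARENT come from the hunk
below; the demo_* names and the 0x3f shorthand for bits 0 through 5 are
assumptions made for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_INCOMPAT_NREXT64	(1U << 5)	/* large extent counters */
#define DEMO_INCOMPAT_PARENT	(1U << 6)	/* parent pointers */

/* Known incompat bits before and after this patch (bits 0..5, then 0..6). */
#define DEMO_INCOMPAT_OLD_ALL	0x3fU
#define DEMO_INCOMPAT_NEW_ALL	(DEMO_INCOMPAT_OLD_ALL | DEMO_INCOMPAT_PARENT)

/* True if the superblock sets any incompat bit outside the known mask. */
static bool
demo_has_unknown_incompat(
	uint32_t	sb_features_incompat,
	uint32_t	known_all)
{
	assert(known_all != 0);
	return (sb_features_incompat & ~known_all) != 0;
}
```

A filesystem with the PARENT bit set is reported as having unknown features
under DEMO_INCOMPAT_OLD_ALL and as fully understood under
DEMO_INCOMPAT_NEW_ALL.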


diff --git a/libfrog/fsgeom.c b/libfrog/fsgeom.c
index 3e7f0797..3bb753ac 100644
--- a/libfrog/fsgeom.c
+++ b/libfrog/fsgeom.c
@@ -31,6 +31,7 @@ xfs_report_geom(
 	int			bigtime_enabled;
 	int			inobtcount;
 	int			nrext64;
+	int			parent;
 
 	isint = geo->logstart > 0;
 	lazycount = geo->flags & XFS_FSOP_GEOM_FLAGS_LAZYSB ? 1 : 0;
@@ -49,12 +50,14 @@ xfs_report_geom(
 	bigtime_enabled = geo->flags & XFS_FSOP_GEOM_FLAGS_BIGTIME ? 1 : 0;
 	inobtcount = geo->flags & XFS_FSOP_GEOM_FLAGS_INOBTCNT ? 1 : 0;
 	nrext64 = geo->flags & XFS_FSOP_GEOM_FLAGS_NREXT64 ? 1 : 0;
+	parent = geo->flags & XFS_FSOP_GEOM_FLAGS_PARENT ? 1 : 0;
 
 	printf(_(
 "meta-data=%-22s isize=%-6d agcount=%u, agsize=%u blks\n"
 "         =%-22s sectsz=%-5u attr=%u, projid32bit=%u\n"
 "         =%-22s crc=%-8u finobt=%u, sparse=%u, rmapbt=%u\n"
 "         =%-22s reflink=%-4u bigtime=%u inobtcount=%u nrext64=%u\n"
+"         =%-22s parent=%d\n"
 "data     =%-22s bsize=%-6u blocks=%llu, imaxpct=%u\n"
 "         =%-22s sunit=%-6u swidth=%u blks\n"
 "naming   =version %-14u bsize=%-6u ascii-ci=%d, ftype=%d\n"
@@ -65,6 +68,7 @@ xfs_report_geom(
 		"", geo->sectsize, attrversion, projid32bit,
 		"", crcs_enabled, finobt_enabled, spinodes, rmapbt_enabled,
 		"", reflink_enabled, bigtime_enabled, inobtcount, nrext64,
+		"", parent,
 		"", geo->blocksize, (unsigned long long)geo->datablocks,
 			geo->imaxpct,
 		"", geo->sunit, geo->swidth,
diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 371dc072..f413819b 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -373,13 +373,15 @@ xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_BIGTIME	(1 << 3)	/* large timestamps */
 #define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4)	/* needs xfs_repair */
 #define XFS_SB_FEAT_INCOMPAT_NREXT64	(1 << 5)	/* large extent counters */
+#define XFS_SB_FEAT_INCOMPAT_PARENT	(1 << 6)	/* parent pointers */
 #define XFS_SB_FEAT_INCOMPAT_ALL \
 		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
 		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
 		 XFS_SB_FEAT_INCOMPAT_META_UUID| \
 		 XFS_SB_FEAT_INCOMPAT_BIGTIME| \
 		 XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR| \
-		 XFS_SB_FEAT_INCOMPAT_NREXT64)
+		 XFS_SB_FEAT_INCOMPAT_NREXT64| \
+		 XFS_SB_FEAT_INCOMPAT_PARENT)
 
 #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
 static inline bool
diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 1cfd5bc6..b0b4d7a3 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -237,6 +237,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_BIGTIME	(1 << 21) /* 64-bit nsec timestamps */
 #define XFS_FSOP_GEOM_FLAGS_INOBTCNT	(1 << 22) /* inobt btree counter */
 #define XFS_FSOP_GEOM_FLAGS_NREXT64	(1 << 23) /* large extent counters */
+#define XFS_FSOP_GEOM_FLAGS_PARENT	(1 << 24) /* parent pointers 	    */
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/libxfs/xfs_sb.c b/libxfs/xfs_sb.c
index d05f0e6e..2ce2ba75 100644
--- a/libxfs/xfs_sb.c
+++ b/libxfs/xfs_sb.c
@@ -171,6 +171,8 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_NEEDSREPAIR;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NREXT64)
 		features |= XFS_FEAT_NREXT64;
+	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_PARENT)
+		features |= XFS_FEAT_PARENT;
 
 	return features;
 }
@@ -1187,6 +1189,8 @@ xfs_fs_geometry(
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_BIGTIME;
 	if (xfs_has_inobtcounts(mp))
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_INOBTCNT;
+	if (xfs_has_parent(mp))
+		geo->flags |= XFS_FSOP_GEOM_FLAGS_PARENT;
 	if (xfs_has_sector(mp)) {
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_SECTOR;
 		geo->logsectsize = sbp->sb_logsectsize;


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 17/25] xfsprogs: Add parent pointer ioctl
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (15 preceding siblings ...)
  2023-02-16 20:57   ` [PATCH 16/25] xfsprogs: Add the parent pointer support to the superblock version 5 Darrick J. Wong
@ 2023-02-16 20:57   ` Darrick J. Wong
  2023-02-16 20:58   ` [PATCH 18/25] xfsprogs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
                     ` (7 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:57 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Source kernel commit: 5e5cdd593342c5ff8aeef9daaa93293f63079b4b

This patch adds a new file ioctl to retrieve the parent pointers of a
given inode.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 libxfs/xfs_fs.h     |   74 +++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_parent.c |   10 +++++++
 libxfs/xfs_parent.h |    2 +
 man/man3/xfsctl.3   |   55 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 141 insertions(+)
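
Reader's note: the ioctl buffer is a header followed by a flexible array of
parent pointer records, and the caller loops until the DONE flag appears.
The sketch below shows the sizing arithmetic and the loop condition with
trimmed-down stand-in structures.  The demo_* types are hypothetical and
deliberately omit the handle and cursor fields, so they are not ABI
compatible with struct xfs_pptr_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_PPTR_MAXNAMELEN	256
#define DEMO_PPTR_OFLAG_DONE	(1U << 2)

struct demo_parent_ptr {
	uint64_t	xpp_ino;
	uint32_t	xpp_gen;
	uint32_t	xpp_diroffset;
	uint64_t	xpp_rsvd;
	uint8_t		xpp_name[DEMO_PPTR_MAXNAMELEN];
};

/* Header trimmed to the fields this sketch needs. */
struct demo_pptr_info {
	uint32_t		pi_flags;
	uint32_t		pi_ptrs_size;	/* capacity of pi_parents[] */
	uint32_t		pi_ptrs_used;	/* entries filled in */
	uint32_t		pi_pad;
	struct demo_parent_ptr	pi_parents[];
};

/* Bytes needed for a header plus nr_ptrs trailing records. */
static size_t
demo_pptr_info_sizeof(unsigned int nr_ptrs)
{
	return sizeof(struct demo_pptr_info) +
	       (size_t)nr_ptrs * sizeof(struct demo_parent_ptr);
}

/* Allocate a zeroed buffer; the cursor-like state must start at zero. */
static struct demo_pptr_info *
demo_pptr_info_alloc(unsigned int nr_ptrs)
{
	struct demo_pptr_info	*pi;

	assert(nr_ptrs > 0);
	pi = calloc(1, demo_pptr_info_sizeof(nr_ptrs));
	if (pi)
		pi->pi_ptrs_size = nr_ptrs;
	return pi;
}

/* True while the caller should issue another request. */
static int
demo_more_parents(const struct demo_pptr_info *pi)
{
	/* Parenthesize: '!' binds tighter than '&'. */
	return !(pi->pi_flags & DEMO_PPTR_OFLAG_DONE);
}
```

Note that the unparenthesized form "!pi->pi_flags & DONE" always evaluates
to zero here, because "!x" yields 0 or 1 and the DONE flag is bit 2; a loop
written that way stops after the first call.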


diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index b0b4d7a3..9e59a1fd 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -752,6 +752,79 @@ struct xfs_scrub_metadata {
 				 XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED)
 #define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
 
+#define XFS_PPTR_MAXNAMELEN				256
+
+/* return parents of the handle, not the open fd */
+#define XFS_PPTR_IFLAG_HANDLE  (1U << 0)
+
+/* target was the root directory */
+#define XFS_PPTR_OFLAG_ROOT    (1U << 1)
+
+/* Cursor is done iterating pptrs */
+#define XFS_PPTR_OFLAG_DONE    (1U << 2)
+
+#define XFS_PPTR_FLAG_ALL     (XFS_PPTR_IFLAG_HANDLE | XFS_PPTR_OFLAG_ROOT | \
+				XFS_PPTR_OFLAG_DONE)
+
+/* Get an inode parent pointer through ioctl */
+struct xfs_parent_ptr {
+	__u64		xpp_ino;			/* Inode */
+	__u32		xpp_gen;			/* Inode generation */
+	__u32		xpp_diroffset;			/* Directory offset */
+	__u64		xpp_rsvd;			/* Reserved */
+	__u8		xpp_name[XFS_PPTR_MAXNAMELEN];	/* File name */
+};
+
+/* Iterate through an inode's parent pointers */
+struct xfs_pptr_info {
+	/* File handle, if XFS_PPTR_IFLAG_HANDLE is set */
+	struct xfs_handle		pi_handle;
+
+	/*
+	 * Structure to track progress in iterating the parent pointers.
+	 * Must be initialized to zeroes before the first ioctl call, and
+	 * not touched by callers after that.
+	 */
+	struct xfs_attrlist_cursor	pi_cursor;
+
+	/* Operational flags: XFS_PPTR_*FLAG* */
+	__u32				pi_flags;
+
+	/* Must be set to zero */
+	__u32				pi_reserved;
+
+	/* # of entries in array */
+	__u32				pi_ptrs_size;
+
+	/* # of entries filled in (output) */
+	__u32				pi_ptrs_used;
+
+	/* Must be set to zero */
+	__u64				pi_reserved2[6];
+
+	/*
+	 * An array of struct xfs_parent_ptr follows the header
+	 * information. Use xfs_ppinfo_to_pp() to access the
+	 * parent pointer array entries.
+	 */
+	struct xfs_parent_ptr		pi_parents[];
+};
+
+static inline size_t
+xfs_pptr_info_sizeof(int nr_ptrs)
+{
+	return sizeof(struct xfs_pptr_info) +
+	       (nr_ptrs * sizeof(struct xfs_parent_ptr));
+}
+
+static inline struct xfs_parent_ptr*
+xfs_ppinfo_to_pp(
+	struct xfs_pptr_info	*info,
+	int			idx)
+{
+	return &info->pi_parents[idx];
+}
+
 /*
  * ioctl limits
  */
@@ -797,6 +870,7 @@ struct xfs_scrub_metadata {
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
 #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
 #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
+#define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_parent_ptr)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index 3f02271f..47ea6b89 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -30,6 +30,16 @@
 
 struct kmem_cache		*xfs_parent_intent_cache;
 
+/* Initializes an xfs_parent_ptr from an xfs_parent_name_rec */
+void
+xfs_init_parent_ptr(struct xfs_parent_ptr		*xpp,
+		    const struct xfs_parent_name_rec	*rec)
+{
+	xpp->xpp_ino = be64_to_cpu(rec->p_ino);
+	xpp->xpp_gen = be32_to_cpu(rec->p_gen);
+	xpp->xpp_diroffset = be32_to_cpu(rec->p_diroffset);
+}
+
 /*
  * Parent pointer attribute handling.
  *
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index 03900588..13040b9d 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -25,6 +25,8 @@ struct xfs_parent_defer {
 void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
 			      struct xfs_inode *ip,
 			      uint32_t p_diroffset);
+void xfs_init_parent_ptr(struct xfs_parent_ptr *xpp,
+			 const struct xfs_parent_name_rec *rec);
 int __xfs_parent_init(struct xfs_mount *mp, bool grab_log,
 		struct xfs_parent_defer **parentp);
 
diff --git a/man/man3/xfsctl.3 b/man/man3/xfsctl.3
index 4a0d4d08..7cc97499 100644
--- a/man/man3/xfsctl.3
+++ b/man/man3/xfsctl.3
@@ -321,6 +321,61 @@ They are all subject to change and should not be called directly
 by applications.
 XFS_IOC_FSSETDM_BY_HANDLE is not supported as of Linux 5.5.
 
+.PP
+.TP
+.B XFS_IOC_GETPARENTS
+This command retrieves a file's parent pointers.  Parent pointers are
+file attributes that store metadata about an inode's parent directories.
+This command takes an xfs_pptr_info structure with a trailing array of
+struct xfs_parent_ptr, which receives the inode's parents.  The
+xfs_pptr_info_sizeof() and xfs_ppinfo_to_pp() routines are provided to
+size and index these structures.  The number of pointers stored
+in the array is indicated by the xfs_pptr_info.pi_ptrs_used field, and the
+XFS_PPTR_OFLAG_DONE flag is set in xfs_pptr_info.pi_flags when there are
+no more parent pointers to be read.  The code below is an example
+of XFS_IOC_GETPARENTS usage:
+
+.nf
+#include<stdio.h>
+#include<string.h>
+#include<errno.h>
+#include<xfs/linux.h>
+#include<xfs/xfs.h>
+#include<xfs/xfs_types.h>
+#include<xfs/xfs_fs.h>
+
+int main() {
+	struct xfs_pptr_info	*pi;
+	struct xfs_parent_ptr	*p;
+	int			i, error, fd, nr_ptrs = 4;
+
+	unsigned char buffer[xfs_pptr_info_sizeof(nr_ptrs)];
+	memset(buffer, 0, sizeof(buffer));
+	pi = (struct xfs_pptr_info *)&buffer;
+	pi->pi_ptrs_size = nr_ptrs;
+
+	fd = open("/mnt/test/foo.txt", O_RDONLY);
+	if (fd == -1)
+		return errno;
+
+	do {
+		error = ioctl(fd, XFS_IOC_GETPARENTS, pi);
+		if (error)
+			return errno;
+
+		for (i = 0; i < pi->pi_ptrs_used; i++) {
+			p = xfs_ppinfo_to_pp(pi, i);
+			printf("inode		= %llu\\n", (unsigned long long)p->xpp_ino);
+			printf("generation	= %u\\n", (unsigned int)p->xpp_gen);
+			printf("diroffset	= %u\\n", (unsigned int)p->xpp_diroffset);
+			printf("name		= \\"%s\\"\\n\\n", (char *)p->xpp_name);
+		}
+	} while (!(pi->pi_flags & XFS_PPTR_OFLAG_DONE));
+
+	return 0;
+}
+.fi
+
 .SS Filesystem Operations
 In order to effect one of the following operations, the pathname
 and descriptor arguments passed to


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 18/25] xfsprogs: fix unit conversion error in xfs_log_calc_max_attrsetm_res
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (16 preceding siblings ...)
  2023-02-16 20:57   ` [PATCH 17/25] xfsprogs: Add parent pointer ioctl Darrick J. Wong
@ 2023-02-16 20:58   ` Darrick J. Wong
  2023-02-16 20:58   ` [PATCH 19/25] xfsprogs: drop compatibility minimum log size computations for reflink Darrick J. Wong
                     ` (6 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:58 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Source kernel commit: 46f34ae75a2ef5ca24104377a10a57f9d4151e1d

Dave and I were discussing some recent test regressions as a result of
me turning on nrext64=1 on realtime filesystems, when we noticed that
the minimum log size of a 32M filesystem jumped from 954 blocks to 4287
blocks.

Digging through xfs_log_calc_max_attrsetm_res, Dave noticed that @size
contains the maximum estimated amount of space needed for a local format
xattr, in bytes, but we feed this quantity to XFS_NEXTENTADD_SPACE_RES,
which requires units of blocks.  This has resulted in an overestimation
of the minimum log size over the years.

We should nominally correct this, but there's a backwards compatibility
problem -- if we enable it now, the minimum log size will decrease.  If
a corrected mkfs formats a filesystem with this new smaller log size, a
user will encounter mount failures on an uncorrected kernel due to the
larger minimum log size computations there.

However, the large extent counters feature is still EXPERIMENTAL, so we
can gate the correction on that feature (or any features that get added
after that) being enabled.  Any filesystem with nrext64 or any of the
as-yet-undefined feature bits turned on will be rejected by old
uncorrected kernels, so this should be safe even in the upgrade case.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 libxfs/xfs_log_rlimit.c |   43 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)
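
Reader's note: the bug described above is a units mismatch, where a byte
count is handed to a helper that expects a block count.  The sketch below
shows the rounding conversion and how much the result differs.
demo_b_to_fsb() mirrors the round-up behaviour of XFS_B_TO_FSB;
demo_extent_add_res() is a hypothetical stand-in for a reservation that
scales with its block-count argument, and the numbers used with it are
made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to whole filesystem blocks of 2^blocklog bytes. */
static uint64_t
demo_b_to_fsb(uint64_t bytes, unsigned int blocklog)
{
	assert(blocklog < 64);
	return (bytes + (1ULL << blocklog) - 1) >> blocklog;
}

/*
 * Hypothetical reservation helper: one unit of cost for every
 * blocks_per_unit blocks (rounded up).  Not the real macro.
 */
static uint64_t
demo_extent_add_res(
	uint64_t	nblocks,
	uint64_t	blocks_per_unit,
	uint64_t	unit_cost)
{
	assert(blocks_per_unit > 0);
	return ((nblocks + blocks_per_unit - 1) / blocks_per_unit) * unit_cost;
}
```

For a 1000-byte attribute on 4096-byte blocks, demo_b_to_fsb(1000, 12) is 1.
Passing the raw 1000 to demo_extent_add_res(…, 100, 4) yields 40, while
passing the converted value yields 4, which is the kind of overestimate the
patch corrects.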


diff --git a/libxfs/xfs_log_rlimit.c b/libxfs/xfs_log_rlimit.c
index cba24493..6ecb9ad5 100644
--- a/libxfs/xfs_log_rlimit.c
+++ b/libxfs/xfs_log_rlimit.c
@@ -16,6 +16,39 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_trace.h"
 
+/*
+ * Decide if the filesystem has the parent pointer feature or any feature
+ * added after that.
+ */
+static inline bool
+xfs_has_parent_or_newer_feature(
+	struct xfs_mount	*mp)
+{
+	if (!xfs_sb_is_v5(&mp->m_sb))
+		return false;
+
+	if (xfs_sb_has_compat_feature(&mp->m_sb, ~0))
+		return true;
+
+	if (xfs_sb_has_ro_compat_feature(&mp->m_sb,
+				~(XFS_SB_FEAT_RO_COMPAT_FINOBT |
+				 XFS_SB_FEAT_RO_COMPAT_RMAPBT |
+				 XFS_SB_FEAT_RO_COMPAT_REFLINK |
+				 XFS_SB_FEAT_RO_COMPAT_INOBTCNT)))
+		return true;
+
+	if (xfs_sb_has_incompat_feature(&mp->m_sb,
+				~(XFS_SB_FEAT_INCOMPAT_FTYPE |
+				 XFS_SB_FEAT_INCOMPAT_SPINODES |
+				 XFS_SB_FEAT_INCOMPAT_META_UUID |
+				 XFS_SB_FEAT_INCOMPAT_BIGTIME |
+				 XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR |
+				 XFS_SB_FEAT_INCOMPAT_NREXT64)))
+		return true;
+
+	return false;
+}
+
 /*
  * Calculate the maximum length in bytes that would be required for a local
  * attribute value as large attributes out of line are not logged.
@@ -31,6 +64,16 @@ xfs_log_calc_max_attrsetm_res(
 	       MAXNAMELEN - 1;
 	nblks = XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK);
 	nblks += XFS_B_TO_FSB(mp, size);
+
+	/*
+	 * Starting with the parent pointer feature, every new fs feature
+	 * corrects a unit conversion error in the xattr transaction
+	 * reservation code that resulted in oversized minimum log size
+	 * computations.
+	 */
+	if (xfs_has_parent_or_newer_feature(mp))
+		size = XFS_B_TO_FSB(mp, size);
+
 	nblks += XFS_NEXTENTADD_SPACE_RES(mp, size, XFS_ATTR_FORK);
 
 	return  M_RES(mp)->tr_attrsetm.tr_logres +


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 19/25] xfsprogs: drop compatibility minimum log size computations for reflink
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (17 preceding siblings ...)
  2023-02-16 20:58   ` [PATCH 18/25] xfsprogs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
@ 2023-02-16 20:58   ` Darrick J. Wong
  2023-02-16 20:58   ` [PATCH 20/25] xfsprogs: Add parent pointer flag to cmd Darrick J. Wong
                     ` (5 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:58 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Source kernel commit: c14b8c08a1dff8019bc4cd1674c5d5bd4248a1e5

Having established that we can reduce the minimum log size computation
for filesystems with parent pointers or any newer feature, we should
also drop the compat minlogsize code that we added when we reduced the
transaction reservation size for rmap and reflink.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 libxfs/xfs_log_rlimit.c |   10 ++++++++++
 1 file changed, 10 insertions(+)
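
Reader's note: both this patch and the previous one key off
xfs_has_parent_or_newer_feature().  The sketch below restates that test on a
plain structure: a V5 superblock qualifies if it sets any compat bit, or any
ro_compat or incompat bit outside the sets that predate parent pointers.
The demo_* names are hypothetical; the masks abbreviate the flag lists in
the previous patch as ro_compat bits 0 through 3 and incompat bits 0
through 5.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Feature bits that existed before parent pointers (assumed shorthand). */
#define DEMO_RO_COMPAT_PRE_PARENT	0x0fU	/* finobt..inobtcnt */
#define DEMO_INCOMPAT_PRE_PARENT	0x3fU	/* ftype..nrext64 */

struct demo_sb {
	bool		is_v5;
	uint32_t	compat;
	uint32_t	ro_compat;
	uint32_t	incompat;
};

/* True if the fs has parent pointers or any feature defined after them. */
static bool
demo_has_parent_or_newer(const struct demo_sb *sb)
{
	assert(sb != NULL);

	if (!sb->is_v5)
		return false;
	if (sb->compat != 0)
		return true;
	if (sb->ro_compat & ~DEMO_RO_COMPAT_PRE_PARENT)
		return true;
	if (sb->incompat & ~DEMO_INCOMPAT_PRE_PARENT)
		return true;
	return false;
}
```

Listing the old bits and inverting the mask means a feature added later
qualifies automatically, without revisiting this function.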


diff --git a/libxfs/xfs_log_rlimit.c b/libxfs/xfs_log_rlimit.c
index 6ecb9ad5..59605f0d 100644
--- a/libxfs/xfs_log_rlimit.c
+++ b/libxfs/xfs_log_rlimit.c
@@ -91,6 +91,16 @@ xfs_log_calc_trans_resv_for_minlogblocks(
 {
 	unsigned int		rmap_maxlevels = mp->m_rmap_maxlevels;
 
+	/*
+	 * Starting with the parent pointer feature, every new fs feature
+	 * drops the oversized minimum log size computation introduced by the
+	 * original reflink code.
+	 */
+	if (xfs_has_parent_or_newer_feature(mp)) {
+		xfs_trans_resv_calc(mp, resv);
+		return;
+	}
+
 	/*
 	 * In the early days of rmap+reflink, we always set the rmap maxlevels
 	 * to 9 even if the AG was small enough that it would never grow to


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 20/25] xfsprogs: Add parent pointer flag to cmd
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (18 preceding siblings ...)
  2023-02-16 20:58   ` [PATCH 19/25] xfsprogs: drop compatibility minimum log size computations for reflink Darrick J. Wong
@ 2023-02-16 20:58   ` Darrick J. Wong
  2023-02-16 20:58   ` [PATCH 21/25] xfsprogs: Print pptrs in ATTRI items Darrick J. Wong
                     ` (4 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:58 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Teach mkfs to format filesystems with parent pointers.  The feature is
selected with the new '-n parent=0|1' naming suboption.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 mkfs/xfs_mkfs.c |   29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)
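
For readers unfamiliar with how comma-separated suboptions such as
'-n ftype=1,parent=1' get tokenized, here is a minimal standalone sketch
built on POSIX getsubopt().  It is not the mkfs implementation: struct
naming_feats, parse_bool() and parse_naming_opts() are hypothetical names,
and treating a bare 'parent' as 'parent=1' is an assumption modelled on the
.defaultval = 1 entry added in this patch.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the naming feature bits that mkfs tracks. */
struct naming_feats {
	bool	dirftype;
	bool	parent_pointers;
};

enum { N_FTYPE = 0, N_PARENT, N_MAX_OPTS };

/* Parse a 0|1 value; a missing value is assumed to mean "enable". */
static int parse_bool(const char *value, bool *out)
{
	if (value == NULL || strcmp(value, "1") == 0) {
		*out = true;
		return 0;
	}
	if (strcmp(value, "0") == 0) {
		*out = false;
		return 0;
	}
	return -EINVAL;
}

/* Walk a suboption string such as "ftype=1,parent=0". */
int parse_naming_opts(const char *arg, struct naming_feats *feats)
{
	char *const	tokens[] = {
		[N_FTYPE] = "ftype",
		[N_PARENT] = "parent",
		[N_MAX_OPTS] = NULL,
	};
	char		*copy, *p, *value;
	int		error = 0;

	assert(arg != NULL && feats != NULL);

	/* getsubopt() writes NULs into its input, so work on a copy. */
	copy = strdup(arg);
	if (!copy)
		return -ENOMEM;

	p = copy;
	while (*p != '\0' && !error) {
		switch (getsubopt(&p, tokens, &value)) {
		case N_FTYPE:
			error = parse_bool(value, &feats->dirftype);
			break;
		case N_PARENT:
			error = parse_bool(value, &feats->parent_pointers);
			break;
		default:
			/* unknown suboption name */
			error = -EINVAL;
			break;
		}
	}
	free(copy);
	return error;
}
```

Passing "parent=0" clears the flag, "parent" or "parent=1" sets it, and any
other value or an unknown suboption name yields -EINVAL.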


diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index d95394a5..dffee9e2 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -110,6 +110,7 @@ enum {
 	N_SIZE = 0,
 	N_VERSION,
 	N_FTYPE,
+	N_PARENT,
 	N_MAX_OPTS,
 };
 
@@ -615,6 +616,7 @@ static struct opt_params nopts = {
 		[N_SIZE] = "size",
 		[N_VERSION] = "version",
 		[N_FTYPE] = "ftype",
+		[N_PARENT] = "parent",
 		[N_MAX_OPTS] = NULL,
 	},
 	.subopt_params = {
@@ -638,6 +640,14 @@ static struct opt_params nopts = {
 		  .maxval = 1,
 		  .defaultval = 1,
 		},
+		{ .index = N_PARENT,
+		  .conflicts = { { NULL, LAST_CONFLICT } },
+		  .minval = 0,
+		  .maxval = 1,
+		  .defaultval = 1,
+		},
+
+
 	},
 };
 
@@ -970,7 +980,7 @@ usage( void )
 /* log subvol */	[-l agnum=n,internal,size=num,logdev=xxx,version=n\n\
 			    sunit=value|su=num,sectsize=num,lazy-count=0|1]\n\
 /* label */		[-L label (maximum 12 characters)]\n\
-/* naming */		[-n size=num,version=2|ci,ftype=0|1]\n\
+/* naming */		[-n size=num,version=2|ci,ftype=0|1,parent=0|1]\n\
 /* no-op info only */	[-N]\n\
 /* prototype file */	[-p fname]\n\
 /* quiet */		[-q]\n\
@@ -1744,6 +1754,9 @@ naming_opts_parser(
 	case N_FTYPE:
 		cli->sb_feat.dirftype = getnum(value, opts, subopt);
 		break;
+	case N_PARENT:
+		cli->sb_feat.parent_pointers = getnum(value, opts, subopt);
+		break;
 	default:
 		return -EINVAL;
 	}
@@ -2225,6 +2238,14 @@ _("inode btree counters not supported without finobt support\n"));
 		cli->sb_feat.inobtcnt = false;
 	}
 
+	if ((cli->sb_feat.parent_pointers) &&
+	    cli->sb_feat.dir_version == 4) {
+		fprintf(stderr,
+_("parent pointers not supported on v4 filesystems\n"));
+		usage();
+		cli->sb_feat.parent_pointers = false;
+	}
+
 	if (cli->xi->rtname) {
 		if (cli->sb_feat.reflink && cli_opt_set(&mopts, M_REFLINK)) {
 			fprintf(stderr,
@@ -3224,8 +3245,6 @@ sb_set_features(
 		sbp->sb_features2 |= XFS_SB_VERSION2_LAZYSBCOUNTBIT;
 	if (fp->projid32bit)
 		sbp->sb_features2 |= XFS_SB_VERSION2_PROJID32BIT;
-	if (fp->parent_pointers)
-		sbp->sb_features2 |= XFS_SB_VERSION2_PARENTBIT;
 	if (fp->crcs_enabled)
 		sbp->sb_features2 |= XFS_SB_VERSION2_CRCBIT;
 	if (fp->attr_version == 2)
@@ -3266,6 +3285,10 @@ sb_set_features(
 		sbp->sb_features_ro_compat |= XFS_SB_FEAT_RO_COMPAT_INOBTCNT;
 	if (fp->bigtime)
 		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_BIGTIME;
+	if (fp->parent_pointers) {
+		sbp->sb_features_incompat |= XFS_SB_FEAT_INCOMPAT_PARENT;
+		sbp->sb_versionnum |= XFS_SB_VERSION_ATTRBIT;
+	}
 
 	/*
 	 * Sparse inode chunk support has two main inode alignment requirements.


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 21/25] xfsprogs: Print pptrs in ATTRI items
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (19 preceding siblings ...)
  2023-02-16 20:58   ` [PATCH 20/25] xfsprogs: Add parent pointer flag to cmd Darrick J. Wong
@ 2023-02-16 20:58   ` Darrick J. Wong
  2023-02-16 20:59   ` [PATCH 22/25] xfs_db: report parent bit on xattrs Darrick J. Wong
                     ` (3 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:58 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch modifies the ATTRI print routines to look for the parent pointer flag,
and print the log entry name as a parent pointer name record.  Values are printed as
strings since they contain the file name.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 logprint/log_redo.c |  193 +++++++++++++++++++++++++++++++++++++++++++++------
 logprint/logprint.h |    5 +
 2 files changed, 172 insertions(+), 26 deletions(-)
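
As an illustration of the decoding step described above, the following
standalone sketch checks the length of a name region, then extracts the
three big-endian fields byte by byte so that no alignment assumptions are
needed.  The 16-byte layout (64-bit inode, 32-bit generation, 32-bit
directory offset) is an assumption inferred from the p_ino, p_gen and
p_diroffset accessors in this patch; struct pptr_rec, nvec_size() and
pptr_rec_decode() are hypothetical names, not logprint functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed on-disk size of a parent name record: 8 + 4 + 4 bytes. */
#define PPTR_REC_DISK_SIZE	16

/* Host-endian copy of the decoded record (hypothetical type). */
struct pptr_rec {
	uint64_t	ino;
	uint32_t	gen;
	uint32_t	diroffset;
};

/* Round a length up to the next multiple of 4 bytes. */
static size_t nvec_size(size_t size)
{
	return (size + sizeof(int32_t) - 1) & ~(sizeof(int32_t) - 1);
}

/* Read an nbytes-wide big-endian integer from an unaligned buffer. */
static uint64_t get_be(const unsigned char *p, size_t nbytes)
{
	uint64_t	v = 0;

	assert(nbytes <= sizeof(v));
	while (nbytes--)
		v = (v << 8) | *p++;
	return v;
}

/* Decode one record; returns 0 on success, -1 if the length is wrong. */
int pptr_rec_decode(const void *buf, size_t len, struct pptr_rec *rec)
{
	const unsigned char	*p = buf;

	if (len != nvec_size(PPTR_REC_DISK_SIZE))
		return -1;

	memset(rec, 0, sizeof(*rec));
	rec->ino = get_be(p, 8);
	rec->gen = (uint32_t)get_be(p + 8, 4);
	rec->diroffset = (uint32_t)get_be(p + 12, 4);
	return 0;
}
```

Reading the fields one byte at a time sidesteps the alignment concern that
the patch handles by copying the region into freshly allocated memory first.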


diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index b596af02..f7e9c9ad 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -674,6 +674,31 @@ xfs_attri_copy_log_format(
 	return 1;
 }
 
+/* iovec length must be 32-bit aligned */
+static inline size_t ATTR_NVEC_SIZE(size_t size)
+{
+	return round_up(size, sizeof(int32_t));
+}
+
+static int
+xfs_attri_copy_name_format(
+	char                            *buf,
+	uint                            len,
+	struct xfs_parent_name_rec     *dst_attri_fmt)
+{
+	uint dst_len = ATTR_NVEC_SIZE(sizeof(struct xfs_parent_name_rec));
+
+	if (len == dst_len) {
+		memcpy((char *)dst_attri_fmt, buf, len);
+		return 0;
+	}
+
+	fprintf(stderr, _("%s: bad size of attri name format: %u; expected %u\n"),
+		progname, len, dst_len);
+
+	return 1;
+}
+
 int
 xlog_print_trans_attri(
 	char				**ptr,
@@ -714,7 +739,8 @@ xlog_print_trans_attri(
 		(*i)++;
 		head = (xlog_op_header_t *)*ptr;
 		xlog_print_op_header(head, *i, ptr);
-		error = xlog_print_trans_attri_name(ptr, be32_to_cpu(head->oh_len));
+		error = xlog_print_trans_attri_name(ptr, be32_to_cpu(head->oh_len),
+						    src_f->alfi_attr_filter);
 		if (error)
 			goto error;
 	}
@@ -724,7 +750,8 @@ xlog_print_trans_attri(
 		(*i)++;
 		head = (xlog_op_header_t *)*ptr;
 		xlog_print_op_header(head, *i, ptr);
-		error = xlog_print_trans_attri_name(ptr, be32_to_cpu(head->oh_len));
+		error = xlog_print_trans_attri_name(ptr, be32_to_cpu(head->oh_len),
+						    src_f->alfi_attr_filter);
 		if (error)
 			goto error;
 	}
@@ -735,7 +762,7 @@ xlog_print_trans_attri(
 		head = (xlog_op_header_t *)*ptr;
 		xlog_print_op_header(head, *i, ptr);
 		error = xlog_print_trans_attri_value(ptr, be32_to_cpu(head->oh_len),
-				src_f->alfi_value_len);
+				src_f->alfi_value_len, src_f->alfi_attr_filter);
 	}
 error:
 	free(src_f);
@@ -746,13 +773,45 @@ xlog_print_trans_attri(
 int
 xlog_print_trans_attri_name(
 	char				**ptr,
-	uint				src_len)
+	uint				src_len,
+	uint				attr_flags)
 {
-	printf(_("ATTRI:  name len:%u\n"), src_len);
-	print_or_dump(*ptr, src_len);
+	struct xfs_parent_name_rec	*src_f = NULL;
+	uint				dst_len;
 
+	/*
+	 * If this is not a parent pointer, just do a bin dump
+	 */
+	if (!(attr_flags & XFS_ATTR_PARENT)) {
+		printf(_("ATTRI:  name len:%u\n"), src_len);
+		print_or_dump(*ptr, src_len);
+		goto out;
+	}
+
+	dst_len	= ATTR_NVEC_SIZE(sizeof(struct xfs_parent_name_rec));
+	if (dst_len != src_len) {
+		fprintf(stderr, _("%s: bad size of attri name format: %u; expected %u\n"),
+			progname, src_len, dst_len);
+		return 1;
+	}
+
+	/*
+	 * memmove to ensure 8-byte alignment for the long longs in
+	 * xfs_parent_name_rec structure
+	 */
+	if ((src_f = (struct xfs_parent_name_rec *)malloc(src_len)) == NULL) {
+		fprintf(stderr, _("%s: xlog_print_trans_attri_name: malloc failed\n"), progname);
+		exit(1);
+	}
+	memmove((char*)src_f, *ptr, src_len);
+
+	printf(_("ATTRI:  #p_ino: %llu	p_gen: %u, p_diroffset: %u\n"),
+		be64_to_cpu(src_f->p_ino), be32_to_cpu(src_f->p_gen),
+				be32_to_cpu(src_f->p_diroffset));
+
+	free(src_f);
+out:
 	*ptr += src_len;
-
 	return 0;
 }	/* xlog_print_trans_attri */
 
@@ -760,15 +819,32 @@ int
 xlog_print_trans_attri_value(
 	char				**ptr,
 	uint				src_len,
-	int				value_len)
+	int				value_len,
+	uint				attr_flags)
 {
 	int len = min(value_len, src_len);
+	char				*f = NULL;
 
-	printf(_("ATTRI:  value len:%u\n"), value_len);
-	print_or_dump(*ptr, len);
+	/*
+	 * If this is not a parent pointer, just do a bin dump
+	 */
+	if (!(attr_flags & XFS_ATTR_PARENT)) {
+		printf(_("ATTRI:  value len:%u\n"), value_len);
+		print_or_dump(*ptr, min(len, MAX_ATTR_VAL_PRINT));
+		goto out;
+	}
 
+	if ((f = (char *)malloc(src_len)) == NULL) {
+		fprintf(stderr, _("%s: xlog_print_trans_attri_value: malloc failed\n"), progname);
+		exit(1);
+	}
+
+	memcpy(f, *ptr, len);
+	printf(_("ATTRI:  value: %.*s\n"), len, f);
+
+	free(f);
+out:
 	*ptr += src_len;
-
 	return 0;
 }	/* xlog_print_trans_attri_value */
 
@@ -779,6 +855,9 @@ xlog_recover_print_attri(
 	struct xfs_attri_log_format	*f, *src_f = NULL;
 	uint				src_len, dst_len;
 
+	struct xfs_parent_name_rec 	*rec, *src_rec = NULL;
+	char				*value, *src_value = NULL;
+
 	int				region = 0;
 
 	src_f = (struct xfs_attri_log_format *)item->ri_buf[0].i_addr;
@@ -803,27 +882,93 @@ xlog_recover_print_attri(
 
 	if (f->alfi_name_len > 0) {
 		region++;
-		printf(_("ATTRI:  name len:%u\n"), f->alfi_name_len);
-		print_or_dump((char *)item->ri_buf[region].i_addr,
-			       f->alfi_name_len);
+
+		if (f->alfi_attr_filter & XFS_ATTR_PARENT) {
+			src_rec = (struct xfs_parent_name_rec *)item->ri_buf[region].i_addr;
+			src_len = item->ri_buf[region].i_len;
+
+			dst_len = ATTR_NVEC_SIZE(sizeof(struct xfs_parent_name_rec));
+
+			if ((rec = ((struct xfs_parent_name_rec *)malloc(dst_len))) == NULL) {
+				fprintf(stderr, _("%s: xlog_recover_print_attri: malloc failed\n"),
+					progname);
+				exit(1);
+			}
+			if (xfs_attri_copy_name_format((char *)src_rec, src_len, rec)) {
+				goto out;
+			}
+
+			printf(_("ATTRI:  #inode: %llu     gen: %u, offset: %u\n"),
+				be64_to_cpu(rec->p_ino), be32_to_cpu(rec->p_gen),
+				be32_to_cpu(rec->p_diroffset));
+
+			free(rec);
+		}
+		else {
+			printf(_("ATTRI:  name len:%u\n"), f->alfi_name_len);
+			print_or_dump((char *)item->ri_buf[region].i_addr,
+					f->alfi_name_len);
+		}
 	}
 
 	if (f->alfi_nname_len > 0) {
 		region++;
-		printf(_("ATTRI:  nname len:%u\n"), f->alfi_nname_len);
-		print_or_dump((char *)item->ri_buf[region].i_addr,
-			       f->alfi_nname_len);
+
+		if (f->alfi_attr_filter & XFS_ATTR_PARENT) {
+			src_rec = (struct xfs_parent_name_rec *)item->ri_buf[region].i_addr;
+			src_len = item->ri_buf[region].i_len;
+
+			dst_len = ATTR_NVEC_SIZE(sizeof(struct xfs_parent_name_rec));
+
+			if ((rec = ((struct xfs_parent_name_rec *)malloc(dst_len))) == NULL) {
+				fprintf(stderr, _("%s: xlog_recover_print_attri: malloc failed\n"),
+					progname);
+				exit(1);
+			}
+			if (xfs_attri_copy_name_format((char *)src_rec, src_len, rec)) {
+				goto out;
+			}
+
+			printf(_("ATTRI:  new #inode: %llu     gen: %u, offset: %u\n"),
+				be64_to_cpu(rec->p_ino), be32_to_cpu(rec->p_gen),
+				be32_to_cpu(rec->p_diroffset));
+
+			free(rec);
+		}
+		else {
+			printf(_("ATTRI:  nname len:%u\n"), f->alfi_nname_len);
+			print_or_dump((char *)item->ri_buf[region].i_addr,
+				       f->alfi_nname_len);
+		}
 	}
 
 	if (f->alfi_value_len > 0) {
-		int len = f->alfi_value_len;
-
-		if (len > MAX_ATTR_VAL_PRINT)
-			len = MAX_ATTR_VAL_PRINT;
-
 		region++;
-		printf(_("ATTRI:  value len:%u\n"), f->alfi_value_len);
-		print_or_dump((char *)item->ri_buf[region].i_addr, len);
+
+		if (f->alfi_attr_filter & XFS_ATTR_PARENT) {
+			src_value = (char *)item->ri_buf[region].i_addr;
+
+			if ((value = ((char *)malloc(f->alfi_value_len))) == NULL) {
+				fprintf(stderr, _("%s: xlog_recover_print_attri: malloc failed\n"),
+					progname);
+				exit(1);
+			}
+
+			memcpy((char *)value, (char *)src_value, f->alfi_value_len);
+			printf(_("ATTRI:  value: %.*s\n"), f->alfi_value_len, value);
+
+			free(value);
+		}
+		else {
+			int len = f->alfi_value_len;
+
+			if (len > MAX_ATTR_VAL_PRINT)
+				len = MAX_ATTR_VAL_PRINT;
+
+			printf(_("ATTRI:  value len:%u\n"), f->alfi_value_len);
+			print_or_dump((char *)item->ri_buf[region].i_addr,
+					len);
+		}
 	}
 
 out:
diff --git a/logprint/logprint.h b/logprint/logprint.h
index b4479c24..b8e1c932 100644
--- a/logprint/logprint.h
+++ b/logprint/logprint.h
@@ -59,8 +59,9 @@ extern void xlog_recover_print_bud(struct xlog_recover_item *item);
 #define MAX_ATTR_VAL_PRINT	128
 
 extern int xlog_print_trans_attri(char **ptr, uint src_len, int *i);
-extern int xlog_print_trans_attri_name(char **ptr, uint src_len);
-extern int xlog_print_trans_attri_value(char **ptr, uint src_len, int value_len);
+extern int xlog_print_trans_attri_name(char **ptr, uint src_len, uint attr_flags);
+extern int xlog_print_trans_attri_value(char **ptr, uint src_len, int value_len,
+					uint attr_flags);
 extern void xlog_recover_print_attri(struct xlog_recover_item *item);
 extern int xlog_print_trans_attrd(char **ptr, uint len);
 extern void xlog_recover_print_attrd(struct xlog_recover_item *item);


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 22/25] xfs_db: report parent bit on xattrs
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (20 preceding siblings ...)
  2023-02-16 20:58   ` [PATCH 21/25] xfsprogs: Print pptrs in ATTRI items Darrick J. Wong
@ 2023-02-16 20:59   ` Darrick J. Wong
  2023-02-16 20:59   ` [PATCH 23/25] xfsprogs: implement the upper half of parent pointers Darrick J. Wong
                     ` (2 subsequent siblings)
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:59 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Display the parent bit on xattr keys, for both leaf entries and shortform
entries.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
---
 db/attr.c      |    3 +++
 db/attrshort.c |    3 +++
 2 files changed, 6 insertions(+)
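
The offset expression used for the new field, OI(LEOFF(flags) +
bitsz(uint8_t) - XFS_ATTR_PARENT_BIT - 1), may look odd at first.  It
appears to convert a flag bit numbered from the least significant end into
a position counted from the most significant bit of the structure.  The
sketch below shows that arithmetic in isolation.  The MSB-first numbering is
an assumption read off the formula, the FLAG_*_BIT values are placeholders
rather than the real XFS constants, and flag_bit_offset() and
get_bit_msb_first() are hypothetical helpers, not xfs_db code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder flag bit numbers (LSB = bit 0); not the real XFS values. */
#define FLAG_LOCAL_BIT		0
#define FLAG_PARENT_BIT		3

/*
 * Convert "bit flag_bit of the byte at byte_off" into a bit position
 * counted from the most significant bit of byte 0.
 */
size_t flag_bit_offset(size_t byte_off, unsigned int flag_bit)
{
	assert(flag_bit < CHAR_BIT);
	return byte_off * CHAR_BIT + CHAR_BIT - flag_bit - 1;
}

/* Fetch a single bit using the same MSB-first numbering. */
unsigned int get_bit_msb_first(const uint8_t *buf, size_t bitoff)
{
	unsigned int	shift = CHAR_BIT - 1 - (unsigned int)(bitoff % CHAR_BIT);

	return (buf[bitoff / CHAR_BIT] >> shift) & 1U;
}
```

With a flags byte at offset 2 holding only the placeholder parent flag
(1 << 3), the parent bit sits at position 20 and the local bit at
position 23.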


diff --git a/db/attr.c b/db/attr.c
index ba722e14..f29e4a54 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -82,6 +82,9 @@ const field_t	attr_leaf_entry_flds[] = {
 	{ "local", FLDT_UINT1,
 	  OI(LEOFF(flags) + bitsz(uint8_t) - XFS_ATTR_LOCAL_BIT - 1), C1, 0,
 	  TYP_NONE },
+	{ "parent", FLDT_UINT1,
+	  OI(LEOFF(flags) + bitsz(uint8_t) - XFS_ATTR_PARENT_BIT - 1), C1, 0,
+	  TYP_NONE },
 	{ "pad2", FLDT_UINT8X, OI(LEOFF(pad2)), C1, FLD_SKIPALL, TYP_NONE },
 	{ NULL }
 };
diff --git a/db/attrshort.c b/db/attrshort.c
index e234fbd8..872d771d 100644
--- a/db/attrshort.c
+++ b/db/attrshort.c
@@ -44,6 +44,9 @@ const field_t	attr_sf_entry_flds[] = {
 	{ "secure", FLDT_UINT1,
 	  OI(EOFF(flags) + bitsz(uint8_t) - XFS_ATTR_SECURE_BIT - 1), C1, 0,
 	  TYP_NONE },
+	{ "parent", FLDT_UINT1,
+	  OI(EOFF(flags) + bitsz(uint8_t) - XFS_ATTR_PARENT_BIT - 1), C1, 0,
+	  TYP_NONE },
 	{ "name", FLDT_CHARNS, OI(EOFF(nameval)), attr_sf_entry_name_count,
 	  FLD_COUNT, TYP_NONE },
 	{ "value", FLDT_CHARNS, attr_sf_entry_value_offset,


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 23/25] xfsprogs: implement the upper half of parent pointers
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (21 preceding siblings ...)
  2023-02-16 20:59   ` [PATCH 22/25] xfs_db: report parent bit on xattrs Darrick J. Wong
@ 2023-02-16 20:59   ` Darrick J. Wong
  2023-02-16 20:59   ` [PATCH 24/25] xfsprogs: Add parent pointers during protofile creation Darrick J. Wong
  2023-02-16 21:00   ` [PATCH 25/25] xfsprogs: Add i, n and f flags to parent command Darrick J. Wong
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:59 UTC (permalink / raw)
  To: djwong; +Cc: Darrick J. Wong, Allison Collins, allison.henderson, linux-xfs

From: Allison Collins <allison.henderson@oracle.com>

Add ioctl definitions to libxfs, build the helpers that libfrog and
libhandle need to iterate parents (and parent paths), then wire up xfs_scrub
so that it can query parent pointers from userspace.  This patch exists to
exercise the userspace interfaces and is nowhere near a complete solution.
A basic xfs_io parent command implementation replaces ... whatever that is
that's there now.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Allison Collins <allison.henderson@oracle.com>
---
 include/handle.h   |    2 
 include/parent.h   |   18 ++
 io/parent.c        |  473 +++++++++++++---------------------------------------
 libfrog/paths.c    |  136 +++++++++++++++
 libfrog/paths.h    |   21 ++
 libhandle/Makefile |    2 
 libhandle/handle.c |    7 -
 libhandle/parent.c |  328 ++++++++++++++++++++++++++++++++++++
 scrub/inodes.c     |   26 +++
 scrub/inodes.h     |    2 
 10 files changed, 658 insertions(+), 357 deletions(-)
 create mode 100644 libhandle/parent.c
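
The path helpers added to libfrog build a path by walking from a file up
towards the root, so each newly discovered name has to be placed in front
of the ones already collected.  The standalone sketch below shows that idea
with a plain singly linked list instead of list_head.  All names here
(struct comp, path_push_parent(), path_to_string(), path_free()) are
hypothetical, and the truncation check is stricter than the one in this
patch: it rejects any component that does not fit in the remaining buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One path component; the list head is the component nearest the root. */
struct comp {
	struct comp	*next;
	char		*name;
};

/* Put a newly discovered parent name in front of the existing path. */
int path_push_parent(struct comp **head, const char *name)
{
	struct comp	*c;

	assert(head != NULL && name != NULL);

	c = malloc(sizeof(*c));
	if (!c)
		return -1;
	c->name = malloc(strlen(name) + 1);
	if (!c->name) {
		free(c);
		return -1;
	}
	strcpy(c->name, name);
	c->next = *head;
	*head = c;
	return 0;
}

/* Join the components as "/a/b/c"; returns the length, or -1 if too long. */
long path_to_string(const struct comp *head, char *buf, size_t buflen)
{
	long	bytes = 0;

	if (buflen > 0)
		buf[0] = '\0';

	for (; head != NULL; head = head->next) {
		int	ret = snprintf(buf, buflen, "/%s", head->name);

		/* snprintf reports the untruncated length; reject overflow. */
		if (ret < 0 || (size_t)ret >= buflen)
			return -1;
		bytes += ret;
		buf += ret;
		buflen -= (size_t)ret;
	}
	return bytes;
}

/* Release every component in the list. */
void path_free(struct comp *head)
{
	while (head != NULL) {
		struct comp	*next = head->next;

		free(head->name);
		free(head);
		head = next;
	}
}
```

Pushing "file" and then "dir" yields "/dir/file", which mirrors the order in
which a walk from child to parent would discover the names.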


diff --git a/include/handle.h b/include/handle.h
index 34246f38..1f02c964 100644
--- a/include/handle.h
+++ b/include/handle.h
@@ -40,6 +40,8 @@ extern int  fssetdm_by_handle (void *__hanp, size_t __hlen,
 
 void fshandle_destroy(void);
 
+int handle_to_fsfd(void *hanp, char **path);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/include/parent.h b/include/parent.h
index 4d3ad51b..fb900041 100644
--- a/include/parent.h
+++ b/include/parent.h
@@ -17,4 +17,22 @@ typedef struct parent_cursor {
 	__u32	opaque[4];      /* an opaque cookie */
 } parent_cursor_t;
 
+struct path_list;
+
+typedef int (*walk_pptr_fn)(struct xfs_pptr_info *pi, struct xfs_parent_ptr *pptr,
+		void *arg);
+typedef int (*walk_ppath_fn)(const char *mntpt, struct path_list *path,
+		void *arg);
+
+#define WALK_PPTRS_ABORT	1
+int fd_walk_pptrs(int fd, walk_pptr_fn fn, void *arg);
+int handle_walk_pptrs(void *hanp, size_t hanlen, walk_pptr_fn fn, void *arg);
+
+#define WALK_PPATHS_ABORT	1
+int fd_walk_ppaths(int fd, walk_ppath_fn fn, void *arg);
+int handle_walk_ppaths(void *hanp, size_t hanlen, walk_ppath_fn fn, void *arg);
+
+int fd_to_path(int fd, char *path, size_t pathlen);
+int handle_to_path(void *hanp, size_t hlen, char *path, size_t pathlen);
+
 #endif
diff --git a/io/parent.c b/io/parent.c
index 8f63607f..e0ca29eb 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -9,363 +9,106 @@
 #include "libfrog/paths.h"
 #include "parent.h"
 #include "handle.h"
-#include "jdm.h"
 #include "init.h"
 #include "io.h"
 
-#define PARENTBUF_SZ		16384
-#define BSTATBUF_SZ		16384
-
 static cmdinfo_t parent_cmd;
-static int verbose_flag;
-static int err_status;
-static __u64 inodes_checked;
 static char *mntpt;
 
-/*
- * check out a parent entry to see if the values seem valid
- */
-static void
-check_parent_entry(struct xfs_bstat *bstatp, parent_t *parent)
-{
-	int sts;
-	char fullpath[PATH_MAX];
-	struct stat statbuf;
-	char *str;
-
-	sprintf(fullpath, _("%s%s"), mntpt, parent->p_name);
-
-	sts = lstat(fullpath, &statbuf);
-	if (sts != 0) {
-		fprintf(stderr,
-			_("inode-path for inode: %llu is incorrect - path \"%s\" non-existent\n"),
-			(unsigned long long) bstatp->bs_ino, fullpath);
-		if (verbose_flag) {
-			fprintf(stderr,
-				_("path \"%s\" does not stat for inode: %llu; err = %s\n"),
-				fullpath,
-			       (unsigned long long) bstatp->bs_ino,
-				strerror(errno));
-		}
-		err_status++;
-		return;
-	} else {
-		if (verbose_flag > 1) {
-			printf(_("path \"%s\" found\n"), fullpath);
-		}
-	}
-
-	if (statbuf.st_ino != bstatp->bs_ino) {
-		fprintf(stderr,
-			_("inode-path for inode: %llu is incorrect - wrong inode#\n"),
-		       (unsigned long long) bstatp->bs_ino);
-		if (verbose_flag) {
-			fprintf(stderr,
-				_("ino mismatch for path \"%s\" %llu vs %llu\n"),
-				fullpath,
-				(unsigned long long)statbuf.st_ino,
-				(unsigned long long)bstatp->bs_ino);
-		}
-		err_status++;
-		return;
-	} else if (verbose_flag > 1) {
-		printf(_("inode number match: %llu\n"),
-			(unsigned long long)statbuf.st_ino);
-	}
-
-	/* get parent path */
-	str = strrchr(fullpath, '/');
-	*str = '\0';
-	sts = stat(fullpath, &statbuf);
-	if (sts != 0) {
-		fprintf(stderr,
-			_("parent path \"%s\" does not stat: %s\n"),
-			fullpath,
-			strerror(errno));
-		err_status++;
-		return;
-	} else {
-		if (parent->p_ino != statbuf.st_ino) {
-			fprintf(stderr,
-				_("inode-path for inode: %llu is incorrect - wrong parent inode#\n"),
-			       (unsigned long long) bstatp->bs_ino);
-			if (verbose_flag) {
-				fprintf(stderr,
-					_("ino mismatch for path \"%s\" %llu vs %llu\n"),
-					fullpath,
-					(unsigned long long)parent->p_ino,
-					(unsigned long long)statbuf.st_ino);
-			}
-			err_status++;
-			return;
-		} else {
-			if (verbose_flag > 1) {
-			       printf(_("parent ino match for %llu\n"),
-				       (unsigned long long) parent->p_ino);
-			}
-		}
-	}
-}
-
-static void
-check_parents(parent_t *parentbuf, size_t *parentbuf_size,
-	     jdm_fshandle_t *fshandlep, struct xfs_bstat *statp)
-{
-	int error, i;
-	__u32 count;
-	parent_t *entryp;
-
-	do {
-		error = jdm_parentpaths(fshandlep, statp, parentbuf, *parentbuf_size, &count);
-
-		if (error == ERANGE) {
-			*parentbuf_size *= 2;
-			parentbuf = (parent_t *)realloc(parentbuf, *parentbuf_size);
-		} else if (error) {
-			fprintf(stderr, _("parentpaths failed for ino %llu: %s\n"),
-			       (unsigned long long) statp->bs_ino,
-				strerror(errno));
-			err_status++;
-			break;
-		}
-	} while (error == ERANGE);
-
-
-	if (count == 0) {
-		/* no links for inode - something wrong here */
-	       fprintf(stderr, _("inode-path for inode: %llu is missing\n"),
-			       (unsigned long long) statp->bs_ino);
-		err_status++;
-	}
-
-	entryp = parentbuf;
-	for (i = 0; i < count; i++) {
-		check_parent_entry(statp, entryp);
-		entryp = (parent_t*) (((char*)entryp) + entryp->p_reclen);
-	}
-}
-
 static int
-do_bulkstat(parent_t *parentbuf, size_t *parentbuf_size,
-	    struct xfs_bstat *bstatbuf, int fsfd, jdm_fshandle_t *fshandlep)
+pptr_print(
+	struct xfs_pptr_info	*pi,
+	struct xfs_parent_ptr	*pptr,
+	void			*arg)
 {
-	__s32 buflenout;
-	__u64 lastino = 0;
-	struct xfs_bstat *p;
-	struct xfs_bstat *endp;
-	struct xfs_fsop_bulkreq bulkreq;
-	struct stat mntstat;
+	char			buf[XFS_PPTR_MAXNAMELEN + 1];
+	unsigned int		namelen = strlen((char *)pptr->xpp_name);
 
-	if (stat(mntpt, &mntstat)) {
-		fprintf(stderr, _("can't stat mount point \"%s\": %s\n"),
-			mntpt, strerror(errno));
-		return 1;
+	if (pi->pi_flags & XFS_PPTR_OFLAG_ROOT) {
+		printf(_("Root directory.\n"));
+		return 0;
 	}
 
-	bulkreq.lastip  = &lastino;
-	bulkreq.icount  = BSTATBUF_SZ;
-	bulkreq.ubuffer = (void *)bstatbuf;
-	bulkreq.ocount  = &buflenout;
-
-	while (xfsctl(mntpt, fsfd, XFS_IOC_FSBULKSTAT, &bulkreq) == 0) {
-		if (*(bulkreq.ocount) == 0) {
-			return 0;
-		}
-		for (p = bstatbuf, endp = bstatbuf + *bulkreq.ocount; p < endp; p++) {
-
-			/* inode being modified, get synced data with iget */
-			if ( (!p->bs_nlink || !p->bs_mode) && p->bs_ino != 0 ) {
-
-				if (xfsctl(mntpt, fsfd, XFS_IOC_FSBULKSTAT_SINGLE, &bulkreq) < 0) {
-				    fprintf(stderr,
-					  _("failed to get bulkstat information for inode %llu\n"),
-					 (unsigned long long) p->bs_ino);
-				    continue;
-				}
-				if (!p->bs_nlink || !p->bs_mode || !p->bs_ino) {
-				    fprintf(stderr,
-					  _("failed to get valid bulkstat information for inode %llu\n"),
-					 (unsigned long long) p->bs_ino);
-				    continue;
-				}
-			}
-
-			/* skip root */
-			if (p->bs_ino == mntstat.st_ino) {
-				continue;
-			}
-
-			if (verbose_flag > 1) {
-			       printf(_("checking inode %llu\n"),
-				       (unsigned long long) p->bs_ino);
-			}
-
-			/* print dotted progress */
-			if ((inodes_checked % 100) == 0 && verbose_flag == 1) {
-				printf("."); fflush(stdout);
-			}
-			inodes_checked++;
-
-			check_parents(parentbuf, parentbuf_size, fshandlep, p);
-		}
-
-	}/*while*/
-
-	fprintf(stderr, _("syssgi bulkstat failed: %s\n"), strerror(errno));
-	return 1;
+	memcpy(buf, pptr->xpp_name, namelen);
+	buf[namelen] = 0;
+	printf(_("p_ino    = %llu\n"), (unsigned long long)pptr->xpp_ino);
+	printf(_("p_gen    = %u\n"), (unsigned int)pptr->xpp_gen);
+	printf(_("p_reclen = %u\n"), namelen);
+	printf(_("p_name   = \"%s\"\n\n"), buf);
+	return 0;
 }
 
-static int
-parent_check(void)
+int
+print_parents(
+	struct xfs_handle	*handle)
 {
-	int fsfd;
-	jdm_fshandle_t *fshandlep;
-	parent_t *parentbuf;
-	size_t parentbuf_size = PARENTBUF_SZ;
-	struct xfs_bstat *bstatbuf;
-
-	err_status = 0;
-	inodes_checked = 0;
-
-	sync();
-
-        fsfd = file->fd;
-
-	fshandlep = jdm_getfshandle(mntpt);
-	if (fshandlep == NULL) {
-		fprintf(stderr, _("unable to open \"%s\" for jdm: %s\n"),
-		      mntpt,
-		      strerror(errno));
-		return 1;
-	}
-
-	/* allocate buffers */
-        bstatbuf = (struct xfs_bstat *)calloc(BSTATBUF_SZ, sizeof(struct xfs_bstat));
-	parentbuf = (parent_t *)malloc(parentbuf_size);
-	if (!bstatbuf || !parentbuf) {
-		fprintf(stderr, _("unable to allocate buffers: %s\n"),
-			strerror(errno));
-		err_status = 1;
-		goto out;
-	}
-
-	if (do_bulkstat(parentbuf, &parentbuf_size, bstatbuf, fsfd, fshandlep) != 0)
-		err_status++;
+	int			ret;
 
-	if (err_status > 0)
-		fprintf(stderr, _("num errors: %d\n"), err_status);
+	if (handle)
+		ret = handle_walk_pptrs(handle, sizeof(*handle), pptr_print,
+				NULL);
 	else
-		printf(_("succeeded checking %llu inodes\n"),
-			(unsigned long long) inodes_checked);
+		ret = fd_walk_pptrs(file->fd, pptr_print, NULL);
+	if (ret)
+		perror(file->name);
 
-out:
-	free(bstatbuf);
-	free(parentbuf);
-	free(fshandlep);
-	return err_status;
-}
-
-static void
-print_parent_entry(parent_t *parent, int fullpath)
-{
-       printf(_("p_ino    = %llu\n"),  (unsigned long long) parent->p_ino);
-	printf(_("p_gen    = %u\n"),	parent->p_gen);
-	printf(_("p_reclen = %u\n"),	parent->p_reclen);
-	if (fullpath)
-		printf(_("p_name   = \"%s%s\"\n"), mntpt, parent->p_name);
-	else
-		printf(_("p_name   = \"%s\"\n"), parent->p_name);
+	return 0;
 }
 
 static int
-parent_list(int fullpath)
-{
-	void *handlep = NULL;
-	size_t handlen;
-	int error, i;
-	int retval = 1;
-	__u32 count;
-	parent_t *entryp;
-	parent_t *parentbuf = NULL;
-	char *path = file->name;
-	int pb_size = PARENTBUF_SZ;
-
-	/* XXXX for linux libhandle version - to set libhandle fsfd cache */
-	{
-		void *fshandle;
-		size_t fshlen;
-
-		if (path_to_fshandle(mntpt, &fshandle, &fshlen) != 0) {
-			fprintf(stderr, _("%s: failed path_to_fshandle \"%s\": %s\n"),
-				progname, path, strerror(errno));
-			goto error;
-		}
-		free_handle(fshandle, fshlen);
+path_print(
+	const char		*mntpt,
+	struct path_list	*path,
+	void			*arg)
+{
+	char			buf[PATH_MAX];
+	size_t			len = PATH_MAX;
+	int			ret;
+
+	ret = snprintf(buf, len, "%s", mntpt);
+	if (ret != strlen(mntpt)) {
+		errno = ENOMEM;
+		return -1;
 	}
 
-	if (path_to_handle(path, &handlep, &handlen) != 0) {
-		fprintf(stderr, _("%s: path_to_handle failed for \"%s\"\n"), progname, path);
-		goto error;
-	}
-
-	do {
-		parentbuf = (parent_t *)realloc(parentbuf, pb_size);
-		if (!parentbuf) {
-			fprintf(stderr, _("%s: unable to allocate parent buffer: %s\n"),
-				progname, strerror(errno));
-			goto error;
-		}
-
-		if (fullpath) {
-			error = parentpaths_by_handle(handlep,
-						       handlen,
-						       parentbuf,
-						       pb_size,
-						       &count);
-		} else {
-			error = parents_by_handle(handlep,
-						   handlen,
-						   parentbuf,
-						   pb_size,
-						   &count);
-		}
-		if (error == ERANGE) {
-			pb_size *= 2;
-		} else if (error) {
-			fprintf(stderr, _("%s: %s call failed for \"%s\": %s\n"),
-				progname, fullpath ? "parentpaths" : "parents",
-				path, strerror(errno));
-			goto error;
-		}
-	} while (error == ERANGE);
-
-	if (count == 0) {
-		/* no links for inode - something wrong here */
-		fprintf(stderr, _("%s: inode-path is missing\n"), progname);
-		goto error;
-	}
-
-	entryp = parentbuf;
-	for (i = 0; i < count; i++) {
-		print_parent_entry(entryp, fullpath);
-		entryp = (parent_t*) (((char*)entryp) + entryp->p_reclen);
-	}
+	ret = path_list_to_string(path, buf + ret, len - ret);
+	if (ret < 0)
+		return ret;
+	return 0;
+}
 
-	retval = 0;
-error:
-	free(handlep);
-	free(parentbuf);
-	return retval;
+int
+print_paths(
+	struct xfs_handle	*handle)
+{
+	int			ret;
+
+	if (handle)
+		ret = handle_walk_ppaths(handle, sizeof(*handle), path_print,
+				NULL);
+	else
+		ret = fd_walk_ppaths(file->fd, path_print, NULL);
+	if (ret)
+		perror(file->name);
+	return 0;
 }
 
 static int
-parent_f(int argc, char **argv)
+parent_f(
+	int			argc,
+	char			**argv)
 {
-	int c;
-	int listpath_flag = 0;
-	int check_flag = 0;
-	fs_path_t *fs;
-	static int tab_init;
+	struct xfs_handle	handle;
+	void			*hanp = NULL;
+	size_t			hlen;
+	struct fs_path		*fs;
+	char			*p;
+	uint64_t		ino = 0;
+	uint32_t		gen = 0;
+	int			c;
+	int			listpath_flag = 0;
+	int			ret;
+	static int		tab_init;
 
 	if (!tab_init) {
 		tab_init = 1;
@@ -380,46 +123,72 @@ parent_f(int argc, char **argv)
 	}
 	mntpt = fs->fs_dir;
 
-	verbose_flag = 0;
-
-	while ((c = getopt(argc, argv, "cpv")) != EOF) {
+	while ((c = getopt(argc, argv, "p")) != EOF) {
 		switch (c) {
-		case 'c':
-			check_flag = 1;
-			break;
 		case 'p':
 			listpath_flag = 1;
 			break;
-		case 'v':
-			verbose_flag++;
-			break;
 		default:
 			return command_usage(&parent_cmd);
 		}
 	}
 
-	if (!check_flag && !listpath_flag) /* default case */
-		exitcode = parent_list(listpath_flag);
-	else {
-		if (listpath_flag)
-			exitcode = parent_list(listpath_flag);
-		if (check_flag)
-			exitcode = parent_check();
+	/*
+	 * Always initialize the fshandle table because we need it for
+	 * the ppaths functions to work.
+	 */
+	ret = path_to_fshandle((char *)mntpt, &hanp, &hlen);
+	if (ret) {
+		perror(mntpt);
+		return 0;
+	}
+
+	if (optind + 2 == argc) {
+		ino = strtoull(argv[optind], &p, 0);
+		if (*p != '\0' || ino == 0) {
+			fprintf(stderr,
+				_("Bad inode number '%s'.\n"),
+				argv[optind]);
+			return 0;
+		}
+		gen = strtoul(argv[optind + 1], &p, 0);
+		if (*p != '\0') {
+			fprintf(stderr,
+				_("Bad generation number '%s'.\n"),
+				argv[optind + 1]);
+			return 0;
+		}
+
+		memcpy(&handle, hanp, sizeof(handle));
+		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
+				sizeof(handle.ha_fid.fid_len);
+		handle.ha_fid.fid_pad = 0;
+		handle.ha_fid.fid_ino = ino;
+		handle.ha_fid.fid_gen = gen;
+
 	}
 
+	if (listpath_flag)
+		exitcode = print_paths(ino ? &handle : NULL);
+	else
+		exitcode = print_parents(ino ? &handle : NULL);
+
+	if (hanp)
+		free_handle(hanp, hlen);
+
 	return 0;
 }
 
 static void
 parent_help(void)
 {
-	printf(_(
+printf(_(
 "\n"
 " list the current file's parents and their filenames\n"
 "\n"
-" -c -- check the current file's file system for parent consistency\n"
-" -p -- list the current file's parents and their full paths\n"
-" -v -- verbose mode\n"
+" -p -- list the current file's paths up to the root\n"
+"\n"
+"If ino and gen are supplied, use them instead.\n"
 "\n"));
 }
 
@@ -430,9 +199,9 @@ parent_init(void)
 	parent_cmd.cfunc = parent_f;
 	parent_cmd.argmin = 0;
 	parent_cmd.argmax = -1;
-	parent_cmd.args = _("[-cpv]");
+	parent_cmd.args = _("[-p] [ino gen]");
 	parent_cmd.flags = CMD_NOMAP_OK;
-	parent_cmd.oneline = _("print or check parent inodes");
+	parent_cmd.oneline = _("print parent inodes");
 	parent_cmd.help = parent_help;
 
 	if (expert)
diff --git a/libfrog/paths.c b/libfrog/paths.c
index abb29a23..a86ae07c 100644
--- a/libfrog/paths.c
+++ b/libfrog/paths.c
@@ -15,6 +15,7 @@
 #include "paths.h"
 #include "input.h"
 #include "projects.h"
+#include "list.h"
 #include <limits.h>
 
 extern char *progname;
@@ -563,3 +564,138 @@ fs_table_insert_project_path(
 
 	return error;
 }
+
+
+/* Structured path components. */
+
+struct path_list {
+	struct list_head	p_head;
+};
+
+struct path_component {
+	struct list_head	pc_list;
+	char			*pc_fname;
+};
+
+/* Initialize a path component with a given name. */
+struct path_component *
+path_component_init(
+	const char		*name)
+{
+	struct path_component	*pc;
+
+	pc = malloc(sizeof(struct path_component));
+	if (!pc)
+		return NULL;
+	INIT_LIST_HEAD(&pc->pc_list);
+	pc->pc_fname = strdup(name);
+	if (!pc->pc_fname) {
+		free(pc);
+		return NULL;
+	}
+	return pc;
+}
+
+/* Free a path component. */
+void
+path_component_free(
+	struct path_component	*pc)
+{
+	free(pc->pc_fname);
+	free(pc);
+}
+
+/* Change a path component's filename. */
+int
+path_component_change(
+	struct path_component	*pc,
+	void			*name,
+	size_t			namelen)
+{
+	void			*p;
+
+	p = realloc(pc->pc_fname, namelen + 1);
+	if (!p)
+		return -1;
+	pc->pc_fname = p;
+	memcpy(pc->pc_fname, name, namelen);
+	pc->pc_fname[namelen] = 0;
+	return 0;
+}
+
+/* Initialize a pathname. */
+struct path_list *
+path_list_init(void)
+{
+	struct path_list	*path;
+
+	path = malloc(sizeof(struct path_list));
+	if (!path)
+		return NULL;
+	INIT_LIST_HEAD(&path->p_head);
+	return path;
+}
+
+/* Empty out a pathname. */
+void
+path_list_free(
+	struct path_list	*path)
+{
+	struct path_component	*pos;
+	struct path_component	*n;
+
+	list_for_each_entry_safe(pos, n, &path->p_head, pc_list) {
+		path_list_del_component(path, pos);
+		path_component_free(pos);
+	}
+	free(path);
+}
+
+/* Add a parent component to a pathname. */
+void
+path_list_add_parent_component(
+	struct path_list	*path,
+	struct path_component	*pc)
+{
+	list_add(&pc->pc_list, &path->p_head);
+}
+
+/* Add a component to a pathname. */
+void
+path_list_add_component(
+	struct path_list	*path,
+	struct path_component	*pc)
+{
+	list_add_tail(&pc->pc_list, &path->p_head);
+}
+
+/* Remove a component from a pathname. */
+void
+path_list_del_component(
+	struct path_list	*path,
+	struct path_component	*pc)
+{
+	list_del_init(&pc->pc_list);
+}
+
+/* Convert a pathname into a string. */
+ssize_t
+path_list_to_string(
+	struct path_list	*path,
+	char			*buf,
+	size_t			buflen)
+{
+	struct path_component	*pos;
+	ssize_t			bytes = 0;
+	int			ret;
+
+	list_for_each_entry(pos, &path->p_head, pc_list) {
+		ret = snprintf(buf, buflen, "/%s", pos->pc_fname);
+		if (ret < 0 || (size_t)ret >= buflen)
+			return -1;
+		bytes += ret;
+		buf += ret;
+		buflen -= ret;
+	}
+	return bytes;
+}
diff --git a/libfrog/paths.h b/libfrog/paths.h
index f20a2c3e..52538fb5 100644
--- a/libfrog/paths.h
+++ b/libfrog/paths.h
@@ -58,4 +58,23 @@ typedef struct fs_cursor {
 extern void fs_cursor_initialise(char *__dir, uint __flags, fs_cursor_t *__cp);
 extern fs_path_t *fs_cursor_next_entry(fs_cursor_t *__cp);
 
-#endif	/* __LIBFROG_PATH_H__ */
+/* Path information. */
+
+struct path_list;
+struct path_component;
+
+struct path_component *path_component_init(const char *name);
+void path_component_free(struct path_component *pc);
+int path_component_change(struct path_component *pc, void *name,
+		size_t namelen);
+
+struct path_list *path_list_init(void);
+void path_list_free(struct path_list *path);
+void path_list_add_parent_component(struct path_list *path,
+		struct path_component *pc);
+void path_list_add_component(struct path_list *path, struct path_component *pc);
+void path_list_del_component(struct path_list *path, struct path_component *pc);
+
+ssize_t path_list_to_string(struct path_list *path, char *buf, size_t buflen);
+
+#endif	/* __PATH_H__ */
diff --git a/libhandle/Makefile b/libhandle/Makefile
index f297a59e..cf7df67c 100644
--- a/libhandle/Makefile
+++ b/libhandle/Makefile
@@ -12,7 +12,7 @@ LT_AGE = 0
 
 LTLDFLAGS += -Wl,--version-script,libhandle.sym
 
-CFILES = handle.c jdm.c
+CFILES = handle.c jdm.c parent.c
 LSRCFILES = libhandle.sym
 
 default: ltdepend $(LTLIBRARY)
diff --git a/libhandle/handle.c b/libhandle/handle.c
index 333c2190..1e8fe9ac 100644
--- a/libhandle/handle.c
+++ b/libhandle/handle.c
@@ -29,7 +29,6 @@ typedef union {
 } comarg_t;
 
 static int obj_to_handle(char *, int, unsigned int, comarg_t, void**, size_t*);
-static int handle_to_fsfd(void *, char **);
 static char *path_to_fspath(char *path);
 
 
@@ -203,8 +202,10 @@ handle_to_fshandle(
 	return 0;
 }
 
-static int
-handle_to_fsfd(void *hanp, char **path)
+int
+handle_to_fsfd(
+	void		*hanp,
+	char		**path)
 {
 	struct fdhash	*fdhp;
 
diff --git a/libhandle/parent.c b/libhandle/parent.c
new file mode 100644
index 00000000..ebd0abd5
--- /dev/null
+++ b/libhandle/parent.c
@@ -0,0 +1,328 @@
+/*
+ * Copyright (C) 2017 Oracle.  All Rights Reserved.
+ *
+ * Author: Darrick J. Wong <darrick.wong@oracle.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation,
+ * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+#include "platform_defs.h"
+#include "xfs.h"
+#include "xfs_arch.h"
+#include "list.h"
+#include "libfrog/paths.h"
+#include "handle.h"
+#include "parent.h"
+
+/* Allocate a buffer large enough for some parent pointer records. */
+static inline struct xfs_pptr_info *
+xfs_pptr_alloc(
+      size_t                  nr_ptrs)
+{
+      struct xfs_pptr_info    *pi;
+
+      pi = malloc(xfs_pptr_info_sizeof(nr_ptrs));
+      if (!pi)
+              return NULL;
+      memset(pi, 0, sizeof(struct xfs_pptr_info));
+      pi->pi_ptrs_size = nr_ptrs;
+      return pi;
+}
+
+/* Walk all parents of the given file handle. */
+static int
+handle_walk_parents(
+	int			fd,
+	struct xfs_handle	*handle,
+	walk_pptr_fn		fn,
+	void			*arg)
+{
+	struct xfs_pptr_info	*pi;
+	struct xfs_parent_ptr	*p;
+	unsigned int		i;
+	ssize_t			ret = -1;
+
+	pi = xfs_pptr_alloc(4);
+	if (!pi)
+		return -1;
+
+	if (handle) {
+		memcpy(&pi->pi_handle, handle, sizeof(struct xfs_handle));
+		pi->pi_flags = XFS_PPTR_IFLAG_HANDLE;
+	}
+
+	ret = ioctl(fd, XFS_IOC_GETPARENTS, pi);
+	while (!ret) {
+		if (pi->pi_flags & XFS_PPTR_OFLAG_ROOT) {
+			ret = fn(pi, NULL, arg);
+			break;
+		}
+
+		for (i = 0; i < pi->pi_ptrs_used; i++) {
+			p = xfs_ppinfo_to_pp(pi, i);
+			ret = fn(pi, p, arg);
+			if (ret)
+				goto out_pi;
+		}
+
+		if (pi->pi_flags & XFS_PPTR_OFLAG_DONE)
+			break;
+
+		ret = ioctl(fd, XFS_IOC_GETPARENTS, pi);
+	}
+
+out_pi:
+	free(pi);
+	return ret;
+}
+
+/* Walk all parent pointers of this handle. */
+int
+handle_walk_pptrs(
+	void			*hanp,
+	size_t			hlen,
+	walk_pptr_fn		fn,
+	void			*arg)
+{
+	char			*mntpt;
+	int			fd;
+
+	if (hlen != sizeof(struct xfs_handle)) {
+		errno = EINVAL;
+		return -1;
+	}
+
+	fd = handle_to_fsfd(hanp, &mntpt);
+	if (fd < 0)
+		return -1;
+
+	return handle_walk_parents(fd, hanp, fn, arg);
+}
+
+/* Walk all parent pointers of this fd. */
+int
+fd_walk_pptrs(
+	int			fd,
+	walk_pptr_fn		fn,
+	void			*arg)
+{
+	return handle_walk_parents(fd, NULL, fn, arg);
+}
+
+struct walk_ppaths_info {
+	walk_ppath_fn			fn;
+	void				*arg;
+	char				*mntpt;
+	struct path_list		*path;
+	int				fd;
+};
+
+struct walk_ppath_level_info {
+	struct xfs_handle		newhandle;
+	struct path_component		*pc;
+	struct walk_ppaths_info		*wpi;
+};
+
+static int handle_walk_parent_paths(struct walk_ppaths_info *wpi,
+		struct xfs_handle *handle);
+
+static int
+handle_walk_parent_path_ptr(
+	struct xfs_pptr_info		*pi,
+	struct xfs_parent_ptr		*p,
+	void				*arg)
+{
+	struct walk_ppath_level_info	*wpli = arg;
+	struct walk_ppaths_info		*wpi = wpli->wpi;
+	unsigned int			i;
+	int				ret = 0;
+
+	if (pi->pi_flags & XFS_PPTR_OFLAG_ROOT)
+		return wpi->fn(wpi->mntpt, wpi->path, wpi->arg);
+
+	for (i = 0; i < pi->pi_ptrs_used; i++) {
+		p = xfs_ppinfo_to_pp(pi, i);
+		ret = path_component_change(wpli->pc, p->xpp_name,
+				strlen((char *)p->xpp_name));
+		if (ret)
+			break;
+		wpli->newhandle.ha_fid.fid_ino = p->xpp_ino;
+		wpli->newhandle.ha_fid.fid_gen = p->xpp_gen;
+		path_list_add_parent_component(wpi->path, wpli->pc);
+		ret = handle_walk_parent_paths(wpi, &wpli->newhandle);
+		path_list_del_component(wpi->path, wpli->pc);
+		if (ret)
+			break;
+	}
+
+	return ret;
+}
+
+/*
+ * Recursively walk all parents of the given file handle; if we hit the
+ * fs root then we call the associated function with the constructed path.
+ */
+static int
+handle_walk_parent_paths(
+	struct walk_ppaths_info		*wpi,
+	struct xfs_handle		*handle)
+{
+	struct walk_ppath_level_info	*wpli;
+	int				ret;
+
+	wpli = malloc(sizeof(struct walk_ppath_level_info));
+	if (!wpli)
+		return -1;
+	wpli->pc = path_component_init("");
+	if (!wpli->pc) {
+		free(wpli);
+		return -1;
+	}
+	wpli->wpi = wpi;
+	memcpy(&wpli->newhandle, handle, sizeof(struct xfs_handle));
+
+	ret = handle_walk_parents(wpi->fd, handle, handle_walk_parent_path_ptr,
+			wpli);
+
+	path_component_free(wpli->pc);
+	free(wpli);
+	return ret;
+}
+
+/*
+ * Call the given function on all known paths from the vfs root to the inode
+ * described in the handle.
+ */
+int
+handle_walk_ppaths(
+	void			*hanp,
+	size_t			hlen,
+	walk_ppath_fn		fn,
+	void			*arg)
+{
+	struct walk_ppaths_info	wpi;
+	ssize_t			ret;
+
+	if (hlen != sizeof(struct xfs_handle)) {
+		errno = EINVAL;
+		return -1;
+	}
+
+	wpi.fd = handle_to_fsfd(hanp, &wpi.mntpt);
+	if (wpi.fd < 0)
+		return -1;
+	wpi.path = path_list_init();
+	if (!wpi.path)
+		return -1;
+	wpi.fn = fn;
+	wpi.arg = arg;
+
+	ret = handle_walk_parent_paths(&wpi, hanp);
+	path_list_free(wpi.path);
+
+	return ret;
+}
+
+/*
+ * Call the given function on all known paths from the vfs root to the inode
+ * referred to by the file description.
+ */
+int
+fd_walk_ppaths(
+	int			fd,
+	walk_ppath_fn		fn,
+	void			*arg)
+{
+	struct walk_ppaths_info	wpi;
+	void			*hanp;
+	size_t			hlen;
+	int			fsfd;
+	int			ret;
+
+	ret = fd_to_handle(fd, &hanp, &hlen);
+	if (ret)
+		return ret;
+
+	fsfd = handle_to_fsfd(hanp, &wpi.mntpt);
+	if (fsfd < 0)
+		return -1;
+	wpi.fd = fd;
+	wpi.path = path_list_init();
+	if (!wpi.path)
+		return -1;
+	wpi.fn = fn;
+	wpi.arg = arg;
+
+	ret = handle_walk_parent_paths(&wpi, hanp);
+	path_list_free(wpi.path);
+
+	return ret;
+}
+
+struct path_walk_info {
+	char			*buf;
+	size_t			len;
+};
+
+/* Helper that stringifies the first full path that we find. */
+static int
+handle_to_path_walk(
+	const char		*mntpt,
+	struct path_list	*path,
+	void			*arg)
+{
+	struct path_walk_info	*pwi = arg;
+	int			ret;
+
+	ret = snprintf(pwi->buf, pwi->len, "%s", mntpt);
+	if (ret < 0 || (size_t)ret >= pwi->len) {
+		errno = ENOMEM;
+		return -1;
+	}
+
+	ret = path_list_to_string(path, pwi->buf + ret, pwi->len - ret);
+	if (ret < 0)
+		return ret;
+
+	return WALK_PPATHS_ABORT;
+}
+
+/* Return any eligible path to this file handle. */
+int
+handle_to_path(
+	void			*hanp,
+	size_t			hlen,
+	char			*path,
+	size_t			pathlen)
+{
+	struct path_walk_info	pwi;
+
+	pwi.buf = path;
+	pwi.len = pathlen;
+	return handle_walk_ppaths(hanp, hlen, handle_to_path_walk, &pwi);
+}
+
+/* Return any eligible path to this file description. */
+int
+fd_to_path(
+	int			fd,
+	char			*path,
+	size_t			pathlen)
+{
+	struct path_walk_info	pwi;
+
+	pwi.buf = path;
+	pwi.len = pathlen;
+	return fd_walk_ppaths(fd, handle_to_path_walk, &pwi);
+}
diff --git a/scrub/inodes.c b/scrub/inodes.c
index 78f0914b..245dd713 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -19,6 +19,7 @@
 #include "descr.h"
 #include "libfrog/fsgeom.h"
 #include "libfrog/bulkstat.h"
+#include "parent.h"
 
 /*
  * Iterate a range of inodes.
@@ -449,3 +450,28 @@ scrub_open_handle(
 	return open_by_fshandle(handle, sizeof(*handle),
 			O_RDONLY | O_NOATIME | O_NOFOLLOW | O_NOCTTY);
 }
+
+/* Construct a description for an inode. */
+void
+xfs_scrub_ino_descr(
+	struct scrub_ctx	*ctx,
+	struct xfs_handle	*handle,
+	char			*buf,
+	size_t			buflen)
+{
+	uint64_t		ino;
+	xfs_agnumber_t		agno;
+	xfs_agino_t		agino;
+	int			ret;
+
+	ret = handle_to_path(handle, sizeof(struct xfs_handle), buf, buflen);
+	if (ret >= 0)
+		return;
+
+	ino = handle->ha_fid.fid_ino;
+	agno = ino / (1ULL << (ctx->mnt.inopblog + ctx->mnt.agblklog));
+	agino = ino % (1ULL << (ctx->mnt.inopblog + ctx->mnt.agblklog));
+	snprintf(buf, buflen, _("inode %"PRIu64" (%u/%u)"), ino, agno,
+			agino);
+}
+
diff --git a/scrub/inodes.h b/scrub/inodes.h
index f0318045..189fa282 100644
--- a/scrub/inodes.h
+++ b/scrub/inodes.h
@@ -21,5 +21,7 @@ int scrub_scan_all_inodes(struct scrub_ctx *ctx, scrub_inode_iter_fn fn,
 		void *arg);
 
 int scrub_open_handle(struct xfs_handle *handle);
+void xfs_scrub_ino_descr(struct scrub_ctx *ctx, struct xfs_handle *handle,
+		char *buf, size_t buflen);
 
 #endif /* XFS_SCRUB_INODES_H_ */



* [PATCH 24/25] xfsprogs: Add parent pointers during protofile creation
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (22 preceding siblings ...)
  2023-02-16 20:59   ` [PATCH 23/25] xfsprogs: implement the upper half of parent pointers Darrick J. Wong
@ 2023-02-16 20:59   ` Darrick J. Wong
  2023-02-16 21:00   ` [PATCH 25/25] xfsprogs: Add i, n and f flags to parent command Darrick J. Wong
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 20:59 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

Inodes created from protofile parsing also need the appropriate parent
pointers.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 mkfs/proto.c |   50 +++++++++++++++++++++++++++++++++++---------------
 1 file changed, 35 insertions(+), 15 deletions(-)


diff --git a/mkfs/proto.c b/mkfs/proto.c
index 6b6a070f..36d8cde2 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -8,6 +8,7 @@
 #include <sys/stat.h>
 #include "libfrog/convert.h"
 #include "proto.h"
+#include "xfs_parent.h"
 
 /*
  * Prototypes for internal functions.
@@ -317,18 +318,19 @@ newregfile(
 
 static void
 newdirent(
-	xfs_mount_t	*mp,
-	xfs_trans_t	*tp,
-	xfs_inode_t	*pip,
-	struct xfs_name	*name,
-	xfs_ino_t	inum)
+	struct xfs_mount	*mp,
+	struct xfs_trans	*tp,
+	struct xfs_inode	*pip,
+	struct xfs_name		*name,
+	xfs_ino_t		inum,
+	xfs_dir2_dataptr_t      *offset)
 {
-	int	error;
-	int	rsv;
+	int			error;
+	int			rsv;
 
 	rsv = XFS_DIRENTER_SPACE_RES(mp, name->len);
 
-	error = -libxfs_dir_createname(tp, pip, name, inum, rsv, NULL);
+	error = -libxfs_dir_createname(tp, pip, name, inum, rsv, offset);
 	if (error)
 		fail(_("directory createname error"), error);
 }
@@ -381,6 +383,7 @@ parseproto(
 	struct cred	creds;
 	char		*value;
 	struct xfs_name	xname;
+	xfs_dir2_dataptr_t offset;
 
 	memset(&creds, 0, sizeof(creds));
 	mstr = getstr(pp);
@@ -464,7 +467,7 @@ parseproto(
 			free(buf);
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_REG_FILE;
-		newdirent(mp, tp, pip, &xname, ip->i_ino);
+		newdirent(mp, tp, pip, &xname, ip->i_ino, &offset);
 		break;
 
 	case IF_RESERVED:			/* pre-allocated space only */
@@ -487,7 +490,7 @@ parseproto(
 		libxfs_trans_ijoin(tp, pip, 0);
 
 		xname.type = XFS_DIR3_FT_REG_FILE;
-		newdirent(mp, tp, pip, &xname, ip->i_ino);
+		newdirent(mp, tp, pip, &xname, ip->i_ino, &offset);
 		libxfs_trans_log_inode(tp, ip, flags);
 		error = -libxfs_trans_commit(tp);
 		if (error)
@@ -507,7 +510,7 @@ parseproto(
 		}
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_BLKDEV;
-		newdirent(mp, tp, pip, &xname, ip->i_ino);
+		newdirent(mp, tp, pip, &xname, ip->i_ino, &offset);
 		flags |= XFS_ILOG_DEV;
 		break;
 
@@ -521,7 +524,7 @@ parseproto(
 			fail(_("Inode allocation failed"), error);
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_CHRDEV;
-		newdirent(mp, tp, pip, &xname, ip->i_ino);
+		newdirent(mp, tp, pip, &xname, ip->i_ino, &offset);
 		flags |= XFS_ILOG_DEV;
 		break;
 
@@ -533,7 +536,7 @@ parseproto(
 			fail(_("Inode allocation failed"), error);
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_FIFO;
-		newdirent(mp, tp, pip, &xname, ip->i_ino);
+		newdirent(mp, tp, pip, &xname, ip->i_ino, &offset);
 		break;
 	case IF_SYMLINK:
 		buf = getstr(pp);
@@ -546,7 +549,7 @@ parseproto(
 		flags |= newfile(tp, ip, 1, 1, buf, len);
 		libxfs_trans_ijoin(tp, pip, 0);
 		xname.type = XFS_DIR3_FT_SYMLINK;
-		newdirent(mp, tp, pip, &xname, ip->i_ino);
+		newdirent(mp, tp, pip, &xname, ip->i_ino, &offset);
 		break;
 	case IF_DIRECTORY:
 		tp = getres(mp, 0);
@@ -563,7 +566,7 @@ parseproto(
 		} else {
 			libxfs_trans_ijoin(tp, pip, 0);
 			xname.type = XFS_DIR3_FT_DIR;
-			newdirent(mp, tp, pip, &xname, ip->i_ino);
+			newdirent(mp, tp, pip, &xname, ip->i_ino, &offset);
 			inc_nlink(VFS_I(pip));
 			libxfs_trans_log_inode(tp, pip, XFS_ILOG_CORE);
 		}
@@ -599,6 +602,23 @@ parseproto(
 		fail(_("Error encountered creating file from prototype file"),
 			error);
 	}
+
+	if (xfs_has_parent(mp)) {
+		struct xfs_parent_name_rec      rec;
+		struct xfs_da_args		args = {
+			.dp = ip,
+			.name = (const unsigned char *)&rec,
+			.namelen = sizeof(rec),
+			.attr_filter = XFS_ATTR_PARENT,
+			.value = (void *)xname.name,
+			.valuelen = xname.len,
+		};
+		xfs_init_parent_name_rec(&rec, pip, offset);
+		error = xfs_attr_set(&args);
+		if (error)
+			fail(_("Error creating parent pointer"), error);
+	}
+
 	libxfs_irele(ip);
 }
 



* [PATCH 25/25] xfsprogs: Add i, n and f flags to parent command
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
                     ` (23 preceding siblings ...)
  2023-02-16 20:59   ` [PATCH 24/25] xfsprogs: Add parent pointers during protofile creation Darrick J. Wong
@ 2023-02-16 21:00   ` Darrick J. Wong
  24 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:00 UTC (permalink / raw)
  To: djwong; +Cc: Allison Henderson, allison.henderson, linux-xfs

From: Allison Henderson <allison.henderson@oracle.com>

This patch adds the flags i, n, and f to the parent command.  The i and
n flags filter the output and the f flag selects a compact output
format.  The new parent pointer tests in xfstests use these flags, which
helps to improve the test run time.  The flags are:

-i: Only show parent pointer records containing the given inode
-n: Only show parent pointer records containing the given filename
-f: Print records in short format: ino/gen/namelen/name

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 include/parent.h   |   17 ++++++++---
 io/parent.c        |   82 ++++++++++++++++++++++++++++++++++++++++------------
 libhandle/parent.c |   73 ++++++++++++++++++++++++++++++++++------------
 3 files changed, 128 insertions(+), 44 deletions(-)
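
To illustrate the intended behaviour, here is a minimal standalone
sketch of the filter and the short format.  It assumes a simplified
record type (pptr_rec) and two helper names (pptr_matches,
pptr_format_short) that are hypothetical; the patch itself does the
filtering in handle_walk_parents() and the printing in pptr_print().

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified record; not struct xfs_parent_ptr. */
struct pptr_rec {
	uint64_t	ino;	/* parent inode number */
	uint32_t	gen;	/* parent generation */
	const char	*name;	/* NUL-terminated dirent name */
};

/*
 * Return 1 if the record passes the -i/-n filters.  pino == 0 and
 * pname == NULL mean "no filter", matching the patch's conventions.
 */
int pptr_matches(const struct pptr_rec *rec, uint64_t pino,
		const char *pname)
{
	assert(rec != NULL);
	if (pino != 0 && pino != rec->ino)
		return 0;
	if (pname != NULL && strcmp(pname, rec->name) != 0)
		return 0;
	return 1;
}

/* Format a record as ino/gen/namelen/name; returns snprintf's result. */
int pptr_format_short(const struct pptr_rec *rec, char *buf,
		size_t buflen)
{
	assert(rec != NULL && buf != NULL);
	return snprintf(buf, buflen, "%llu/%u/%zu/%s",
			(unsigned long long)rec->ino,
			(unsigned int)rec->gen,
			strlen(rec->name), rec->name);
}
```

Both filters are applied with AND semantics, so a record must match the
inode and the name when both are given.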


diff --git a/include/parent.h b/include/parent.h
index fb900041..2e136724 100644
--- a/include/parent.h
+++ b/include/parent.h
@@ -17,20 +17,27 @@ typedef struct parent_cursor {
 	__u32	opaque[4];      /* an opaque cookie */
 } parent_cursor_t;
 
+/* Print parent pointer option flags */
+#define XFS_PPPTR_OFLAG_SHORT  (1<<0)	/* Print in short format */
+
 struct path_list;
 
 typedef int (*walk_pptr_fn)(struct xfs_pptr_info *pi, struct xfs_parent_ptr *pptr,
-		void *arg);
+		void *arg, int flags);
 typedef int (*walk_ppath_fn)(const char *mntpt, struct path_list *path,
 		void *arg);
 
 #define WALK_PPTRS_ABORT	1
-int fd_walk_pptrs(int fd, walk_pptr_fn fn, void *arg);
-int handle_walk_pptrs(void *hanp, size_t hanlen, walk_pptr_fn fn, void *arg);
+int fd_walk_pptrs(int fd, uint64_t pino, char *pname, walk_pptr_fn fn,
+		void *arg, int flags);
+int handle_walk_pptrs(void *hanp, size_t hanlen, uint64_t pino, char *pname,
+		walk_pptr_fn fn, void *arg, int flags);
 
 #define WALK_PPATHS_ABORT	1
-int fd_walk_ppaths(int fd, walk_ppath_fn fn, void *arg);
-int handle_walk_ppaths(void *hanp, size_t hanlen, walk_ppath_fn fn, void *arg);
+int fd_walk_ppaths(int fd, uint64_t pino, char *pname, walk_ppath_fn fn,
+		void *arg, int flags);
+int handle_walk_ppaths(void *hanp, size_t hanlen, uint64_t pino, char *pname,
+		walk_ppath_fn fn, void *arg, int flags);
 
 int fd_to_path(int fd, char *path, size_t pathlen);
 int handle_to_path(void *hanp, size_t hlen, char *path, size_t pathlen);
diff --git a/io/parent.c b/io/parent.c
index e0ca29eb..a6f3fa0c 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -19,7 +19,8 @@ static int
 pptr_print(
 	struct xfs_pptr_info	*pi,
 	struct xfs_parent_ptr	*pptr,
-	void			*arg)
+	void			*arg,
+	int			flags)
 {
 	char			buf[XFS_PPTR_MAXNAMELEN + 1];
 	unsigned int		namelen = strlen((char *)pptr->xpp_name);
@@ -31,24 +32,36 @@ pptr_print(
 
 	memcpy(buf, pptr->xpp_name, namelen);
 	buf[namelen] = 0;
-	printf(_("p_ino    = %llu\n"), (unsigned long long)pptr->xpp_ino);
-	printf(_("p_gen    = %u\n"), (unsigned int)pptr->xpp_gen);
-	printf(_("p_reclen = %u\n"), namelen);
-	printf(_("p_name   = \"%s\"\n\n"), buf);
+
+	if (flags & XFS_PPPTR_OFLAG_SHORT) {
+		printf("%llu/%u/%u/%s\n",
+			(unsigned long long)pptr->xpp_ino,
+			(unsigned int)pptr->xpp_gen, namelen, buf);
+	}
+	else {
+		printf(_("p_ino    = %llu\n"), (unsigned long long)pptr->xpp_ino);
+		printf(_("p_gen    = %u\n"), (unsigned int)pptr->xpp_gen);
+		printf(_("p_reclen = %u\n"), namelen);
+		printf(_("p_name   = \"%s\"\n\n"), buf);
+	}
 	return 0;
 }
 
-int
+static int
 print_parents(
-	struct xfs_handle	*handle)
+	struct xfs_handle	*handle,
+	uint64_t		pino,
+	char			*pname,
+	int			flags)
 {
 	int			ret;
 
 	if (handle)
-		ret = handle_walk_pptrs(handle, sizeof(*handle), pptr_print,
-				NULL);
+		ret = handle_walk_pptrs(handle, sizeof(*handle), pino,
+				pname, pptr_print, NULL, flags);
 	else
-		ret = fd_walk_pptrs(file->fd, pptr_print, NULL);
+		ret = fd_walk_pptrs(file->fd, pino, pname, pptr_print,
+				NULL, flags);
 	if (ret)
 		perror(file->name);
 
@@ -77,17 +90,21 @@ path_print(
 	return 0;
 }
 
-int
+static int
 print_paths(
-	struct xfs_handle	*handle)
+	struct xfs_handle	*handle,
+	uint64_t		pino,
+	char			*pname,
+	int			flags)
 {
 	int			ret;
 
 	if (handle)
-		ret = handle_walk_ppaths(handle, sizeof(*handle), path_print,
-				NULL);
+		ret = handle_walk_ppaths(handle, sizeof(*handle), pino,
+				pname, path_print, NULL, flags);
  	else
-		ret = fd_walk_ppaths(file->fd, path_print, NULL);
+		ret = fd_walk_ppaths(file->fd, pino, pname, path_print,
+				NULL, flags);
 	if (ret)
 		perror(file->name);
 	return 0;
@@ -109,6 +126,9 @@ parent_f(
 	int			listpath_flag = 0;
 	int			ret;
 	static int		tab_init;
+	uint64_t		pino = 0;
+	char			*pname = NULL;
+	int			ppptr_flags = 0;
 
 	if (!tab_init) {
 		tab_init = 1;
@@ -123,11 +143,27 @@ parent_f(
 	}
 	mntpt = fs->fs_dir;
 
-	while ((c = getopt(argc, argv, "p")) != EOF) {
+	while ((c = getopt(argc, argv, "pfi:n:")) != EOF) {
 		switch (c) {
 		case 'p':
 			listpath_flag = 1;
 			break;
+		case 'i':
+			pino = strtoull(optarg, &p, 0);
+			if (*p != '\0' || pino == 0) {
+				fprintf(stderr,
+					_("Bad inode number '%s'.\n"),
+					optarg);
+				return 0;
+			}
+
+			break;
+		case 'n':
+			pname = optarg;
+			break;
+		case 'f':
+			ppptr_flags |= XFS_PPPTR_OFLAG_SHORT;
+			break;
 		default:
 			return command_usage(&parent_cmd);
 		}
@@ -169,9 +205,11 @@ parent_f(
 	}
 
 	if (listpath_flag)
-		exitcode = print_paths(ino ? &handle : NULL);
+		exitcode = print_paths(ino ? &handle : NULL,
+				pino, pname, ppptr_flags);
 	else
-		exitcode = print_parents(ino ? &handle : NULL);
+		exitcode = print_parents(ino ? &handle : NULL,
+				pino, pname, ppptr_flags);
 
 	if (hanp)
 		free_handle(hanp, hlen);
@@ -189,6 +227,12 @@ printf(_(
 " -p -- list the current file's paths up to the root\n"
 "\n"
 "If ino and gen are supplied, use them instead.\n"
+"\n"
+" -i -- Only show parent pointer records containing the given inode\n"
+"\n"
+" -n -- Only show parent pointer records containing the given filename\n"
+"\n"
+" -f -- Print records in short format: ino/gen/namelen/filename\n"
 "\n"));
 }
 
@@ -199,7 +243,7 @@ parent_init(void)
 	parent_cmd.cfunc = parent_f;
 	parent_cmd.argmin = 0;
 	parent_cmd.argmax = -1;
-	parent_cmd.args = _("[-p] [ino gen]");
+	parent_cmd.args = _("[-fp] [-i ino] [-n name] [ino gen]");
 	parent_cmd.flags = CMD_NOMAP_OK;
 	parent_cmd.oneline = _("print parent inodes");
 	parent_cmd.help = parent_help;
diff --git a/libhandle/parent.c b/libhandle/parent.c
index ebd0abd5..3de8742c 100644
--- a/libhandle/parent.c
+++ b/libhandle/parent.c
@@ -40,13 +40,21 @@ xfs_pptr_alloc(
       return pi;
 }
 
-/* Walk all parents of the given file handle. */
+/*
+ * Walk all parents of the given file handle.
+ * If pino is set, visit only the parent pointers
+ * that point to that inode.  If pname is set, visit
+ * only the parent pointers with that filename.
+ */
 static int
 handle_walk_parents(
 	int			fd,
 	struct xfs_handle	*handle,
+	uint64_t		pino,
+	char			*pname,
 	walk_pptr_fn		fn,
-	void			*arg)
+	void			*arg,
+	int			flags)
 {
 	struct xfs_pptr_info	*pi;
 	struct xfs_parent_ptr	*p;
@@ -65,13 +73,20 @@ handle_walk_parents(
 	ret = ioctl(fd, XFS_IOC_GETPARENTS, pi);
 	while (!ret) {
 		if (pi->pi_flags & XFS_PPTR_OFLAG_ROOT) {
-			ret = fn(pi, NULL, arg);
+			ret = fn(pi, NULL, arg, flags);
 			break;
 		}
 
 		for (i = 0; i < pi->pi_ptrs_used; i++) {
 			p = xfs_ppinfo_to_pp(pi, i);
-			ret = fn(pi, p, arg);
+			if ((pino != 0) && (pino != p->xpp_ino))
+				continue;
+
+			if ((pname  != NULL) && (strcmp(pname,
+					(char *)p->xpp_name) != 0))
+				continue;
+
+			ret = fn(pi, p, arg, flags);
 			if (ret)
 				goto out_pi;
 		}
@@ -92,8 +107,11 @@ int
 handle_walk_pptrs(
 	void			*hanp,
 	size_t			hlen,
+	uint64_t		pino,
+	char			*pname,
 	walk_pptr_fn		fn,
-	void			*arg)
+	void			*arg,
+	int			flags)
 {
 	char			*mntpt;
 	int			fd;
@@ -107,17 +125,20 @@ handle_walk_pptrs(
 	if (fd < 0)
 		return -1;
 
-	return handle_walk_parents(fd, hanp, fn, arg);
+	return handle_walk_parents(fd, hanp, pino, pname, fn, arg, flags);
 }
 
 /* Walk all parent pointers of this fd. */
 int
 fd_walk_pptrs(
 	int			fd,
+	uint64_t		pino,
+	char			*pname,
 	walk_pptr_fn		fn,
-	void			*arg)
+	void			*arg,
+	int			flags)
 {
-	return handle_walk_parents(fd, NULL, fn, arg);
+	return handle_walk_parents(fd, NULL, pino, pname, fn, arg, flags);
 }
 
 struct walk_ppaths_info {
@@ -135,13 +156,15 @@ struct walk_ppath_level_info {
 };
 
 static int handle_walk_parent_paths(struct walk_ppaths_info *wpi,
-		struct xfs_handle *handle);
+		struct xfs_handle *handle, uint64_t pino, char *pname,
+		int flags);
 
 static int
 handle_walk_parent_path_ptr(
 	struct xfs_pptr_info		*pi,
 	struct xfs_parent_ptr		*p,
-	void				*arg)
+	void				*arg,
+	int				flags)
 {
 	struct walk_ppath_level_info	*wpli = arg;
 	struct walk_ppaths_info		*wpi = wpli->wpi;
@@ -160,7 +183,7 @@ handle_walk_parent_path_ptr(
 		wpli->newhandle.ha_fid.fid_ino = p->xpp_ino;
 		wpli->newhandle.ha_fid.fid_gen = p->xpp_gen;
 		path_list_add_parent_component(wpi->path, wpli->pc);
-		ret = handle_walk_parent_paths(wpi, &wpli->newhandle);
+		ret = handle_walk_parent_paths(wpi, &wpli->newhandle, 0, NULL, 0);
 		path_list_del_component(wpi->path, wpli->pc);
 		if (ret)
 			break;
@@ -176,7 +199,10 @@ handle_walk_parent_path_ptr(
 static int
 handle_walk_parent_paths(
 	struct walk_ppaths_info		*wpi,
-	struct xfs_handle		*handle)
+	struct xfs_handle		*handle,
+	uint64_t			pino,
+	char				*pname,
+	int				flags)
 {
 	struct walk_ppath_level_info	*wpli;
 	int				ret;
@@ -192,8 +218,8 @@ handle_walk_parent_paths(
 	wpli->wpi = wpi;
 	memcpy(&wpli->newhandle, handle, sizeof(struct xfs_handle));
 
-	ret = handle_walk_parents(wpi->fd, handle, handle_walk_parent_path_ptr,
-			wpli);
+	ret = handle_walk_parents(wpi->fd, handle, pino, pname,
+			handle_walk_parent_path_ptr, wpli, flags);
 
 	path_component_free(wpli->pc);
 	free(wpli);
@@ -208,8 +234,11 @@ int
 handle_walk_ppaths(
 	void			*hanp,
 	size_t			hlen,
+	uint64_t		pino,
+	char			*pname,
 	walk_ppath_fn		fn,
-	void			*arg)
+	void			*arg,
+	int			flags)
 {
 	struct walk_ppaths_info	wpi;
 	ssize_t			ret;
@@ -228,7 +257,7 @@ handle_walk_ppaths(
 	wpi.fn = fn;
 	wpi.arg = arg;
 
-	ret = handle_walk_parent_paths(&wpi, hanp);
+	ret = handle_walk_parent_paths(&wpi, hanp, pino, pname, flags);
 	path_list_free(wpi.path);
 
 	return ret;
@@ -241,8 +270,11 @@ handle_walk_ppaths(
 int
 fd_walk_ppaths(
 	int			fd,
+	uint64_t		pino,
+	char			*pname,
 	walk_ppath_fn		fn,
-	void			*arg)
+	void			*arg,
+	int			flags)
 {
 	struct walk_ppaths_info	wpi;
 	void			*hanp;
@@ -264,7 +296,7 @@ fd_walk_ppaths(
 	wpi.fn = fn;
 	wpi.arg = arg;
 
-	ret = handle_walk_parent_paths(&wpi, hanp);
+	ret = handle_walk_parent_paths(&wpi, hanp, pino, pname, flags);
 	path_list_free(wpi.path);
 
 	return ret;
@@ -310,7 +342,8 @@ handle_to_path(
 
 	pwi.buf = path;
 	pwi.len = pathlen;
-	return handle_walk_ppaths(hanp, hlen, handle_to_path_walk, &pwi);
+	return handle_walk_ppaths(hanp, hlen, 0, NULL, handle_to_path_walk,
+			&pwi, 0);
 }
 
 /* Return any eligible path to this file description. */
@@ -324,5 +357,5 @@ fd_to_path(
 
 	pwi.buf = path;
 	pwi.len = pathlen;
-	return fd_walk_ppaths(fd, handle_to_path_walk, &pwi);
+	return fd_walk_ppaths(fd, 0, NULL, handle_to_path_walk, &pwi, 0);
 }



* [PATCH 1/6] libxfs: initialize the slab cache for parent defer items
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: bug fixes before parent pointers Darrick J. Wong
@ 2023-02-16 21:00   ` Darrick J. Wong
  2023-02-16 21:00   ` [PATCH 2/6] xfs: directory lookups should return diroffsets too Darrick J. Wong
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:00 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Initialize the slab cache for parent defer items.  We'll need this in an
upcoming patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 include/libxfs.h |    1 +
 libxfs/init.c    |    3 +++
 2 files changed, 4 insertions(+)


diff --git a/include/libxfs.h b/include/libxfs.h
index 915bf511..a38d78a1 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -77,6 +77,7 @@ struct iomap;
 #include "xfs_refcount_btree.h"
 #include "xfs_refcount.h"
 #include "xfs_btree_staging.h"
+#include "xfs_parent.h"
 
 #ifndef ARRAY_SIZE
 #define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
diff --git a/libxfs/init.c b/libxfs/init.c
index 93dc1f1c..49cb2326 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -258,6 +258,8 @@ init_caches(void)
 			"xfs_extfree_item");
 	xfs_trans_cache = kmem_cache_init(
 			sizeof(struct xfs_trans), "xfs_trans");
+	xfs_parent_intent_cache = kmem_cache_init(
+			sizeof(struct xfs_parent_defer), "xfs_parent_defer");
 }
 
 static int
@@ -275,6 +277,7 @@ destroy_caches(void)
 	xfs_btree_destroy_cur_caches();
 	leaked += kmem_cache_destroy(xfs_extfree_item_cache);
 	leaked += kmem_cache_destroy(xfs_trans_cache);
+	leaked += kmem_cache_destroy(xfs_parent_intent_cache);
 
 	return leaked;
 }



* [PATCH 2/6] xfs: directory lookups should return diroffsets too
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: bug fixes before parent pointers Darrick J. Wong
  2023-02-16 21:00   ` [PATCH 1/6] libxfs: initialize the slab cache for parent defer items Darrick J. Wong
@ 2023-02-16 21:00   ` Darrick J. Wong
  2023-02-16 21:00   ` [PATCH 3/6] xfs: move/add parent pointer validators to xfs_parent Darrick J. Wong
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:00 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Teach the directory lookup functions to return the dir offset of the
dirent that they find.  Online fsck will use this when checking and
repairing filesystems.
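
A minimal sketch of how a dirent's position might be packed into the
32-bit offset cookie that the lookup functions now return.  All of the
sketch_* names are hypothetical stand-ins; the shift of 3 assumes that
dirents are 8-byte aligned, and blklog is assumed to be the log2 of the
directory block size.

```c
#include <stdint.h>

/* Hypothetical stand-in for the dirent alignment (8 bytes). */
#define SKETCH_DIR2_DATA_ALIGN_LOG	3

typedef uint32_t sketch_dataptr_t;

/* Convert a byte offset within the directory into an offset cookie. */
static inline sketch_dataptr_t
sketch_byte_to_dataptr(uint64_t byte)
{
	return (sketch_dataptr_t)(byte >> SKETCH_DIR2_DATA_ALIGN_LOG);
}

/*
 * Convert (directory block number, byte offset inside that block) into
 * an offset cookie by first computing the byte offset in the directory.
 */
static inline sketch_dataptr_t
sketch_db_off_to_dataptr(unsigned int blklog, uint32_t db, uint32_t off)
{
	uint64_t	byte = ((uint64_t)db << blklog) + off;

	return sketch_byte_to_dataptr(byte);
}
```

Under these assumptions, a single-block directory is just the db == 0
case, which is why the block-format lookup can convert the byte offset
directly.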

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_dir2_block.c |    2 ++
 libxfs/xfs_dir2_leaf.c  |    2 ++
 libxfs/xfs_dir2_node.c  |    2 ++
 libxfs/xfs_dir2_sf.c    |    4 ++++
 4 files changed, 10 insertions(+)


diff --git a/libxfs/xfs_dir2_block.c b/libxfs/xfs_dir2_block.c
index c743fa67..1ed3c974 100644
--- a/libxfs/xfs_dir2_block.c
+++ b/libxfs/xfs_dir2_block.c
@@ -746,6 +746,8 @@ xfs_dir2_block_lookup_int(
 		cmp = xfs_dir2_compname(args, dep->name, dep->namelen);
 		if (cmp != XFS_CMP_DIFFERENT && cmp != args->cmpresult) {
 			args->cmpresult = cmp;
+			args->offset = xfs_dir2_byte_to_dataptr(
+					(char *)dep - (char *)hdr);
 			*bpp = bp;
 			*entno = mid;
 			if (cmp == XFS_CMP_EXACT)
diff --git a/libxfs/xfs_dir2_leaf.c b/libxfs/xfs_dir2_leaf.c
index 1be7773e..9ec01d02 100644
--- a/libxfs/xfs_dir2_leaf.c
+++ b/libxfs/xfs_dir2_leaf.c
@@ -1298,6 +1298,8 @@ xfs_dir2_leaf_lookup_int(
 		cmp = xfs_dir2_compname(args, dep->name, dep->namelen);
 		if (cmp != XFS_CMP_DIFFERENT && cmp != args->cmpresult) {
 			args->cmpresult = cmp;
+			args->offset = xfs_dir2_db_off_to_dataptr(args->geo,
+					newdb, (char *)dep - (char *)dbp->b_addr);
 			*indexp = index;
 			/* case exact match: return the current buffer. */
 			if (cmp == XFS_CMP_EXACT) {
diff --git a/libxfs/xfs_dir2_node.c b/libxfs/xfs_dir2_node.c
index 621e8bf5..b00fb3cf 100644
--- a/libxfs/xfs_dir2_node.c
+++ b/libxfs/xfs_dir2_node.c
@@ -884,6 +884,8 @@ xfs_dir2_leafn_lookup_for_entry(
 			args->cmpresult = cmp;
 			args->inumber = be64_to_cpu(dep->inumber);
 			args->filetype = xfs_dir2_data_get_ftype(mp, dep);
+			args->offset = xfs_dir2_db_off_to_dataptr(args->geo,
+					newdb, (char *)dep - (char *)curbp->b_addr);
 			*indexp = index;
 			state->extravalid = 1;
 			state->extrablk.bp = curbp;
diff --git a/libxfs/xfs_dir2_sf.c b/libxfs/xfs_dir2_sf.c
index 6a128748..9356bf62 100644
--- a/libxfs/xfs_dir2_sf.c
+++ b/libxfs/xfs_dir2_sf.c
@@ -889,6 +889,7 @@ xfs_dir2_sf_lookup(
 		args->inumber = dp->i_ino;
 		args->cmpresult = XFS_CMP_EXACT;
 		args->filetype = XFS_DIR3_FT_DIR;
+		args->offset = 1;
 		return -EEXIST;
 	}
 	/*
@@ -899,6 +900,7 @@ xfs_dir2_sf_lookup(
 		args->inumber = xfs_dir2_sf_get_parent_ino(sfp);
 		args->cmpresult = XFS_CMP_EXACT;
 		args->filetype = XFS_DIR3_FT_DIR;
+		args->offset = 2;
 		return -EEXIST;
 	}
 	/*
@@ -917,6 +919,8 @@ xfs_dir2_sf_lookup(
 			args->cmpresult = cmp;
 			args->inumber = xfs_dir2_sf_get_ino(mp, sfp, sfep);
 			args->filetype = xfs_dir2_sf_get_ftype(mp, sfep);
+			args->offset = xfs_dir2_byte_to_dataptr(
+						xfs_dir2_sf_get_offset(sfep));
 			if (cmp == XFS_CMP_EXACT)
 				return -EEXIST;
 			ci_sfep = sfep;



* [PATCH 3/6] xfs: move/add parent pointer validators to xfs_parent
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: bug fixes before parent pointers Darrick J. Wong
  2023-02-16 21:00   ` [PATCH 1/6] libxfs: initialize the slab cache for parent defer items Darrick J. Wong
  2023-02-16 21:00   ` [PATCH 2/6] xfs: directory lookups should return diroffsets too Darrick J. Wong
@ 2023-02-16 21:00   ` Darrick J. Wong
  2023-02-16 21:01   ` [PATCH 4/6] xfs: don't remove the attr fork when parent pointers are enabled Darrick J. Wong
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:00 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move the parent pointer xattr name validator to xfs_parent.c, and add a
new function to check the xattr value.
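
A standalone sketch of two of the checks involved: the "only one
namespace bit" test on the name side and the length/NULL test on the
value side.  The sketch_* names and the flag bit values are assumptions
made for illustration; the real definitions live in the libxfs headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAXNAMELEN	256	/* includes the trailing null */

/* Hypothetical namespace bits, for illustration only. */
#define SKETCH_ATTR_ROOT	(1u << 1)
#define SKETCH_ATTR_SECURE	(1u << 2)
#define SKETCH_ATTR_PARENT	(1u << 3)
#define SKETCH_ATTR_NSP_MASK	(SKETCH_ATTR_ROOT | SKETCH_ATTR_SECURE | \
				 SKETCH_ATTR_PARENT)

/* Count the set bits in a word, standing in for hweight32(). */
static unsigned int
sketch_hweight32(uint32_t v)
{
	unsigned int	n = 0;

	while (v) {
		v &= v - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/* Return true if at most one namespace bit is set. */
static bool
sketch_one_namespace(unsigned int attr_flags)
{
	return sketch_hweight32(attr_flags & SKETCH_ATTR_NSP_MASK) <= 1;
}

/* Return true if the value could be a dirent name. */
static bool
sketch_valuecheck(const void *value, size_t valuelen)
{
	if (valuelen == 0 || valuelen >= SKETCH_MAXNAMELEN)
		return false;
	return value != NULL;
}
```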

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_attr.c   |   61 +++++++++++----------------------------------------
 libxfs/xfs_attr.h   |    2 +-
 libxfs/xfs_parent.c |   44 +++++++++++++++++++++++++++++++++++++
 libxfs/xfs_parent.h |    7 ++++++
 4 files changed, 65 insertions(+), 49 deletions(-)


diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 0cb76f8f..9afa0fef 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -24,6 +24,7 @@
 #include "xfs_quota_defs.h"
 #include "xfs_trans_space.h"
 #include "xfs_trace.h"
+#include "xfs_parent.h"
 
 struct kmem_cache		*xfs_attr_intent_cache;
 
@@ -1575,62 +1576,26 @@ xfs_attr_node_get(
 	return error;
 }
 
-/*
- * Verify parent pointer attribute is valid.
- * Return true on success or false on failure
- */
-STATIC bool
-xfs_verify_pptr(
-	struct xfs_mount			*mp,
-	const struct xfs_parent_name_rec	*rec)
-{
-	xfs_ino_t				p_ino;
-	xfs_dir2_dataptr_t			p_diroffset;
-
-	p_ino = be64_to_cpu(rec->p_ino);
-	p_diroffset = be32_to_cpu(rec->p_diroffset);
-
-	if (!xfs_verify_ino(mp, p_ino))
-		return false;
-
-	if (p_diroffset > XFS_DIR2_MAX_DATAPTR)
-		return false;
-
-	return true;
-}
-
-/* Returns true if the string attribute entry name is valid. */
-static bool
-xfs_str_attr_namecheck(
-	const void	*name,
-	size_t		length)
-{
-	/*
-	 * MAXNAMELEN includes the trailing null, but (name/length) leave it
-	 * out, so use >= for the length check.
-	 */
-	if (length >= MAXNAMELEN)
-		return false;
-
-	/* There shouldn't be any nulls here */
-	return !memchr(name, 0, length);
-}
-
 /* Returns true if the attribute entry name is valid. */
 bool
 xfs_attr_namecheck(
 	struct xfs_mount	*mp,
 	const void		*name,
 	size_t			length,
-	int			flags)
+	unsigned int		flags)
 {
-	if (flags & XFS_ATTR_PARENT) {
-		if (length != sizeof(struct xfs_parent_name_rec))
-			return false;
-		return xfs_verify_pptr(mp, (struct xfs_parent_name_rec *)name);
-	}
+	if (flags & XFS_ATTR_PARENT)
+		return xfs_parent_namecheck(mp, name, length, flags);
 
-	return xfs_str_attr_namecheck(name, length);
+	/*
+	 * MAXNAMELEN includes the trailing null, but (name/length) leave it
+	 * out, so use >= for the length check.
+	 */
+	if (length >= MAXNAMELEN)
+		return false;
+
+	/* There shouldn't be any nulls here */
+	return !memchr(name, 0, length);
 }
 
 int __init
diff --git a/libxfs/xfs_attr.h b/libxfs/xfs_attr.h
index 98576126..d6d23cf1 100644
--- a/libxfs/xfs_attr.h
+++ b/libxfs/xfs_attr.h
@@ -551,7 +551,7 @@ int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
 bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
-			int flags);
+		unsigned int flags);
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
 			 unsigned int *total);
diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index 47ea6b89..654eaec7 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -56,6 +56,50 @@ xfs_init_parent_ptr(struct xfs_parent_ptr		*xpp,
  * occurring.
  */
 
+/* Return true if parent pointer EA name is valid. */
+bool
+xfs_parent_namecheck(
+	struct xfs_mount			*mp,
+	const struct xfs_parent_name_rec	*rec,
+	size_t					reclen,
+	unsigned int				attr_flags)
+{
+	xfs_ino_t				p_ino;
+	xfs_dir2_dataptr_t			p_diroffset;
+
+	if (reclen != sizeof(struct xfs_parent_name_rec))
+		return false;
+
+	/* Only one namespace bit allowed. */
+	if (hweight32(attr_flags & XFS_ATTR_NSP_ONDISK_MASK) > 1)
+		return false;
+
+	p_ino = be64_to_cpu(rec->p_ino);
+	if (!xfs_verify_ino(mp, p_ino))
+		return false;
+
+	p_diroffset = be32_to_cpu(rec->p_diroffset);
+	if (p_diroffset > XFS_DIR2_MAX_DATAPTR)
+		return false;
+
+	return true;
+}
+
+/* Return true if parent pointer EA value is valid. */
+bool
+xfs_parent_valuecheck(
+	struct xfs_mount		*mp,
+	const void			*value,
+	size_t				valuelen)
+{
+	if (valuelen == 0 || valuelen >= MAXNAMELEN)
+		return false;
+
+	if (value == NULL)
+		return false;
+
+	return true;
+}
 
 /* Initializes a xfs_parent_name_rec to be stored as an attribute name */
 void
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index 13040b9d..4ffcb81d 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -8,6 +8,13 @@
 
 extern struct kmem_cache	*xfs_parent_intent_cache;
 
+/* Metadata validators */
+bool xfs_parent_namecheck(struct xfs_mount *mp,
+		const struct xfs_parent_name_rec *rec, size_t reclen,
+		unsigned int attr_flags);
+bool xfs_parent_valuecheck(struct xfs_mount *mp, const void *value,
+		size_t valuelen);
+
 /*
  * Dynamically allocd structure used to wrap the needed data to pass around
  * the defer ops machinery



* [PATCH 4/6] xfs: don't remove the attr fork when parent pointers are enabled
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: bug fixes before parent pointers Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 21:00   ` [PATCH 3/6] xfs: move/add parent pointer validators to xfs_parent Darrick J. Wong
@ 2023-02-16 21:01   ` Darrick J. Wong
  2023-02-16 21:01   ` [PATCH 5/6] xfs: pass the attr value to put_listent when possible Darrick J. Wong
  2023-02-16 21:01   ` [PATCH 6/6] xfs: replace the XFS_IOC_GETPARENTS backend Darrick J. Wong
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:01 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

When running generic/388, I observed the following .out.bad output:

_check_xfs_filesystem: filesystem on /dev/sda4 is inconsistent (r)
*** xfs_repair -n output ***
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
mismatch between format (2) and size (276) in symlink ino 37223730
bad data fork in symlink 37223730
would have cleared inode 37223730
        - agno = 2
        - agno = 3
mismatch between format (2) and size (276) in symlink ino 102725435
bad data fork in symlink 102725435
would have cleared inode 102725435
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
unknown block state, ag 1, blocks 458655-458655
unknown block state, ag 3, blocks 257772-257772
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 0
mismatch between format (2) and size (276) in symlink ino 102725435
bad data fork in symlink 102725435
would have cleared inode 102725435
mismatch between format (2) and size (276) in symlink ino 37223730
bad data fork in symlink 37223730
would have cleared inode 37223730
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
user quota id 0 has bcount 1140448, expected 1140446
user quota id 0 has icount 39892, expected 39890
No modify flag set, skipping filesystem flush and exiting.

Inode 37223730 is an unlinked remote-format symlink with no xattr fork.
According to the inode verifier and xfs_repair, this symlink ought to
have a local format data fork, since 276 bytes is small enough to fit in
the immediate area.

How did we get here?  fsstress removed the symlink, which removed the
last parent pointer xattr.  There were no other xattrs, so that removal
also removed the attr fork.  This transaction got flushed to the log,
but the system went down before we could inactivate the symlink.  Log
recovery tried to inactivate this inode (since it is on the unlinked
list) but the verifier tripped over the remote value and leaked it.

Hence we ended up with a file in this odd state on a "clean" mount.  The
"obvious" fix is to prohibit erasure of the attr fork to avoid tripping
over the verifiers when pptrs are enabled.

I wonder if this could be reproduced with normal xattrs and (say) a
directory?  Maybe this fix should target /any/ symlink or directory?
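
To make the proposed condition easier to read, here is a sketch of the
"should we tear down the attr fork?" decision as a pure predicate.  The
struct and function names are hypothetical; the fields only mirror the
tests made in xfs_attr_sf_removename.

```c
#include <stdbool.h>

/* Hypothetical summary of the state examined when removing an sf attr. */
struct sketch_sf_remove {
	bool	sf_now_empty;		/* only the shortform header is left */
	bool	has_attr2;		/* attr2 feature enabled */
	bool	data_fork_is_btree;	/* data fork is in btree format */
	bool	adding_or_replacing;	/* ADDNAME or REPLACE op in flight */
	bool	has_parent;		/* parent pointers enabled */
};

/* Return true if the attr fork itself should be removed. */
static bool
sketch_should_remove_attr_fork(const struct sketch_sf_remove *s)
{
	return s->sf_now_empty && s->has_attr2 && !s->data_fork_is_btree &&
	       !s->adding_or_replacing && !s->has_parent;
}
```

With parent pointers enabled the predicate is always false, so the
empty attr fork stays in place.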

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_attr_leaf.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)


diff --git a/libxfs/xfs_attr_leaf.c b/libxfs/xfs_attr_leaf.c
index 6cac2531..6391f6ab 100644
--- a/libxfs/xfs_attr_leaf.c
+++ b/libxfs/xfs_attr_leaf.c
@@ -851,7 +851,8 @@ xfs_attr_sf_removename(
 	totsize -= size;
 	if (totsize == sizeof(xfs_attr_sf_hdr_t) && xfs_has_attr2(mp) &&
 	    (dp->i_df.if_format != XFS_DINODE_FMT_BTREE) &&
-	    !(args->op_flags & (XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE))) {
+	    !(args->op_flags & (XFS_DA_OP_ADDNAME | XFS_DA_OP_REPLACE)) &&
+	    !xfs_has_parent(mp)) {
 		xfs_attr_fork_remove(dp, args->trans);
 	} else {
 		xfs_idata_realloc(dp, -size, XFS_ATTR_FORK);
@@ -860,7 +861,8 @@ xfs_attr_sf_removename(
 		ASSERT(totsize > sizeof(xfs_attr_sf_hdr_t) ||
 				(args->op_flags & XFS_DA_OP_ADDNAME) ||
 				!xfs_has_attr2(mp) ||
-				dp->i_df.if_format == XFS_DINODE_FMT_BTREE);
+				dp->i_df.if_format == XFS_DINODE_FMT_BTREE ||
+				xfs_has_parent(mp));
 		xfs_trans_log_inode(args->trans, dp,
 					XFS_ILOG_CORE | XFS_ILOG_ADATA);
 	}



* [PATCH 5/6] xfs: pass the attr value to put_listent when possible
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: bug fixes before parent pointers Darrick J. Wong
                     ` (3 preceding siblings ...)
  2023-02-16 21:01   ` [PATCH 4/6] xfs: don't remove the attr fork when parent pointers are enabled Darrick J. Wong
@ 2023-02-16 21:01   ` Darrick J. Wong
  2023-02-16 21:01   ` [PATCH 6/6] xfs: replace the XFS_IOC_GETPARENTS backend Darrick J. Wong
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:01 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Pass the attr value to put_listent when we have local xattrs or
shortform xattrs.
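
A rough sketch of what a callback with the widened signature could look
like.  Everything named sketch_* is hypothetical, and passing a NULL
value for remote xattrs is an assumption of this sketch rather than
something this patch spells out.

```c
#include <stddef.h>

struct sketch_list_context;

/* Callback shape after the change: the value rides along with the name. */
typedef void (*sketch_put_listent_fn)(struct sketch_list_context *context,
		int flags, unsigned char *name, int namelen, void *value,
		int valuelen);

struct sketch_list_context {
	sketch_put_listent_fn	put_listent;
	int			with_value;	/* entries that carried a value */
	int			without_value;	/* entries with no value */
};

/* Example callback: count which entries arrived with their value. */
static void
sketch_count_listent(struct sketch_list_context *context, int flags,
		unsigned char *name, int namelen, void *value, int valuelen)
{
	(void)flags;
	(void)name;
	(void)namelen;

	if (value && valuelen > 0)
		context->with_value++;
	else
		context->without_value++;
}

/* Emit one entry; only local or shortform entries have a value handy. */
static void
sketch_emit(struct sketch_list_context *context, unsigned char *name,
		int namelen, void *value, int valuelen, int is_remote)
{
	if (is_remote)
		value = NULL;	/* assumption: remote values are not loaded */
	context->put_listent(context, 0, name, namelen, value, valuelen);
}
```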

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_attr.h    |    5 +++--
 libxfs/xfs_attr_sf.h |    1 +
 2 files changed, 4 insertions(+), 2 deletions(-)


diff --git a/libxfs/xfs_attr.h b/libxfs/xfs_attr.h
index d6d23cf1..02a20b94 100644
--- a/libxfs/xfs_attr.h
+++ b/libxfs/xfs_attr.h
@@ -47,8 +47,9 @@ struct xfs_attrlist_cursor_kern {
 
 
 /* void; state communicated via *context */
-typedef void (*put_listent_func_t)(struct xfs_attr_list_context *, int,
-			      unsigned char *, int, int);
+typedef void (*put_listent_func_t)(struct xfs_attr_list_context *context,
+		int flags, unsigned char *name, int namelen, void *value,
+		int valuelen);
 
 struct xfs_attr_list_context {
 	struct xfs_trans	*tp;
diff --git a/libxfs/xfs_attr_sf.h b/libxfs/xfs_attr_sf.h
index 37578b36..c6e25979 100644
--- a/libxfs/xfs_attr_sf.h
+++ b/libxfs/xfs_attr_sf.h
@@ -24,6 +24,7 @@ typedef struct xfs_attr_sf_sort {
 	uint8_t		flags;		/* flags bits (see xfs_attr_leaf.h) */
 	xfs_dahash_t	hash;		/* this entry's hash value */
 	unsigned char	*name;		/* name value, pointer into buffer */
+	void		*value;
 } xfs_attr_sf_sort_t;
 
 #define XFS_ATTR_SF_ENTSIZE_MAX			/* max space for name&value */ \



* [PATCH 6/6] xfs: replace the XFS_IOC_GETPARENTS backend
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: bug fixes before parent pointers Darrick J. Wong
                     ` (4 preceding siblings ...)
  2023-02-16 21:01   ` [PATCH 5/6] xfs: pass the attr value to put_listent when possible Darrick J. Wong
@ 2023-02-16 21:01   ` Darrick J. Wong
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:01 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that xfs_attr_list can pass local xattr values to the put_listent
function, build a new version of the GETPARENTS backend that supplies a
custom put_listent function to format parent pointer info directly into
the caller's buffer.  This uses a lot less memory and obviates the
two-pass "list the names, then fetch each value" logic, since parent
pointers aren't supposed to have remote values anyway.
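
A self-contained sketch of the ondisk-to-incore conversion that such a
callback would need.  The byte-array record layout and the sketch_*
names are assumptions for illustration.  Note that this sketch clamps
the name length to MAXNAMELEN - 1 so that it always fits in the 8-bit
length field; that is a choice made here, not a description of the
patch.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAXNAMELEN	256

/* Hypothetical ondisk key: every field is stored big-endian. */
struct sketch_name_rec {
	uint8_t		p_ino[8];
	uint8_t		p_gen[4];
	uint8_t		p_diroffset[4];
};

/* Hypothetical incore form, with the dirent name attached. */
struct sketch_name_irec {
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint32_t	p_diroffset;
	uint8_t		p_namelen;
	unsigned char	p_name[SKETCH_MAXNAMELEN];
};

/* Decode an n-byte big-endian integer. */
static uint64_t
sketch_be_to_cpu(const uint8_t *p, size_t n)
{
	uint64_t	v = 0;
	size_t		i;

	for (i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Fill @irec from @rec; a NULL @value leaves p_name untouched. */
static void
sketch_irec_from_disk(struct sketch_name_irec *irec,
		const struct sketch_name_rec *rec, const void *value,
		int valuelen)
{
	irec->p_ino = sketch_be_to_cpu(rec->p_ino, 8);
	irec->p_gen = (uint32_t)sketch_be_to_cpu(rec->p_gen, 4);
	irec->p_diroffset = (uint32_t)sketch_be_to_cpu(rec->p_diroffset, 4);

	if (!value) {
		irec->p_namelen = 0;
		return;
	}

	/* Clamp so the length fits in p_namelen and the name stays terminated. */
	if (valuelen < 0)
		valuelen = 0;
	if (valuelen > SKETCH_MAXNAMELEN - 1)
		valuelen = SKETCH_MAXNAMELEN - 1;

	irec->p_namelen = (uint8_t)valuelen;
	memcpy(irec->p_name, value, (size_t)valuelen);
	memset(&irec->p_name[valuelen], 0,
			sizeof(irec->p_name) - (size_t)valuelen);
}
```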

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_parent.c |   40 ++++++++++++++++++++++++++++++----------
 libxfs/xfs_parent.h |   21 +++++++++++++++++++--
 2 files changed, 49 insertions(+), 12 deletions(-)


diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index 654eaec7..74c7f1f7 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -30,16 +30,6 @@
 
 struct kmem_cache		*xfs_parent_intent_cache;
 
-/* Initializes a xfs_parent_ptr from an xfs_parent_name_rec */
-void
-xfs_init_parent_ptr(struct xfs_parent_ptr		*xpp,
-		    const struct xfs_parent_name_rec	*rec)
-{
-	xpp->xpp_ino = be64_to_cpu(rec->p_ino);
-	xpp->xpp_gen = be32_to_cpu(rec->p_gen);
-	xpp->xpp_diroffset = be32_to_cpu(rec->p_diroffset);
-}
-
 /*
  * Parent pointer attribute handling.
  *
@@ -116,6 +106,36 @@ xfs_init_parent_name_rec(
 	rec->p_diroffset = cpu_to_be32(p_diroffset);
 }
 
+/*
+ * Convert an ondisk parent_name xattr to its incore format.  If @value is
+ * NULL, set @irec->p_namelen to zero and leave @irec->p_name untouched.
+ */
+void
+xfs_parent_irec_from_disk(
+	struct xfs_parent_name_irec	*irec,
+	const struct xfs_parent_name_rec *rec,
+	const void			*value,
+	int				valuelen)
+{
+	irec->p_ino = be64_to_cpu(rec->p_ino);
+	irec->p_gen = be32_to_cpu(rec->p_gen);
+	irec->p_diroffset = be32_to_cpu(rec->p_diroffset);
+
+	if (!value) {
+		irec->p_namelen = 0;
+		return;
+	}
+
+	ASSERT(valuelen > 0);
+	ASSERT(valuelen < MAXNAMELEN);
+
+	valuelen = min(valuelen, MAXNAMELEN);
+
+	irec->p_namelen = valuelen;
+	memcpy(irec->p_name, value, valuelen);
+	memset(&irec->p_name[valuelen], 0, sizeof(irec->p_name) - valuelen);
+}
+
 int
 __xfs_parent_init(
 	struct xfs_mount		*mp,
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index 4ffcb81d..f4f5887d 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -15,6 +15,25 @@ bool xfs_parent_namecheck(struct xfs_mount *mp,
 bool xfs_parent_valuecheck(struct xfs_mount *mp, const void *value,
 		size_t valuelen);
 
+/*
+ * Incore version of a parent pointer, also contains dirent name so callers
+ * can pass/obtain all the parent pointer information in a single structure
+ */
+struct xfs_parent_name_irec {
+	/* Key fields for looking up a particular parent pointer. */
+	xfs_ino_t		p_ino;
+	uint32_t		p_gen;
+	xfs_dir2_dataptr_t	p_diroffset;
+
+	/* Attributes of a parent pointer. */
+	uint8_t			p_namelen;
+	unsigned char		p_name[MAXNAMELEN];
+};
+
+void xfs_parent_irec_from_disk(struct xfs_parent_name_irec *irec,
+		const struct xfs_parent_name_rec *rec,
+		const void *value, int valuelen);
+
 /*
  * Dynamically allocd structure used to wrap the needed data to pass around
  * the defer ops machinery
@@ -32,8 +51,6 @@ struct xfs_parent_defer {
 void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
 			      struct xfs_inode *ip,
 			      uint32_t p_diroffset);
-void xfs_init_parent_ptr(struct xfs_parent_ptr *xpp,
-			 const struct xfs_parent_name_rec *rec);
 int __xfs_parent_init(struct xfs_mount *mp, bool grab_log,
 		struct xfs_parent_defer **parentp);
 



* [PATCH 1/6] xfs_scrub: don't report media errors for space with unknowable owner
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: tool fixes for parent pointers Darrick J. Wong
@ 2023-02-16 21:01   ` Darrick J. Wong
  2023-02-16 21:02   ` [PATCH 2/6] mkfs: fix libxfs api misuse Darrick J. Wong
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:01 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

On filesystems that don't have the reverse mapping feature enabled, the
GETFSMAP call cannot tell us much about the owner of a space extent --
we're limited to static fs metadata, free space, or "unknown".  In this
case, nothing is corrupt, so str_corrupt is not an appropriate logging
function.  Relax this to str_info so that the user still sees a notice
that a media error was found, even if the directory tree walker cannot
find the file owning the space where the media error was found.

Filesystems with rmap enabled are never supposed to return OWN_UNKNOWN
from a GETFSMAP report, so continue to report that as a corruption.
This fixes regressions reported by xfs/556.
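
The resulting policy can be sketched as a tiny decision function.  The
enum and function names are hypothetical; only the two inputs come from
the description above.

```c
#include <stdbool.h>

/* Hypothetical severity levels mirroring str_info and str_corrupt. */
enum sketch_severity {
	SKETCH_INFO,
	SKETCH_CORRUPT,
};

/* Pick the log severity for a media error found in a GETFSMAP extent. */
static enum sketch_severity
sketch_media_error_severity(bool fs_has_rmapbt, bool owner_unknown)
{
	/* Without rmap, an unknown owner is expected and not a corruption. */
	if (!fs_has_rmapbt && owner_unknown)
		return SKETCH_INFO;
	return SKETCH_CORRUPT;
}
```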

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/phase6.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)


diff --git a/scrub/phase6.c b/scrub/phase6.c
index afdb16b6..1a2643bd 100644
--- a/scrub/phase6.c
+++ b/scrub/phase6.c
@@ -397,7 +397,18 @@ report_ioerr_fsmap(
 		snprintf(buf, DESCR_BUFSZ, _("disk offset %"PRIu64),
 				(uint64_t)map->fmr_physical + err_off);
 		type = decode_special_owner(map->fmr_owner);
-		str_corrupt(ctx, buf, _("media error in %s."), type);
+		/*
+		 * On filesystems that don't store reverse mappings, the
+		 * GETFSMAP call returns OWN_UNKNOWN for allocated space.
+		 * We'll have to let the directory tree walker find the file
+		 * that lost data.
+		 */
+		if (!(ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_RMAPBT) &&
+		    map->fmr_owner == XFS_FMR_OWN_UNKNOWN) {
+			str_info(ctx, buf, _("media error detected."));
+		} else {
+			str_corrupt(ctx, buf, _("media error in %s."), type);
+		}
 	}
 
 	/* Report extent maps */



* [PATCH 2/6] mkfs: fix libxfs api misuse
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: tool fixes for parent pointers Darrick J. Wong
  2023-02-16 21:01   ` [PATCH 1/6] xfs_scrub: don't report media errors for space with unknowable owner Darrick J. Wong
@ 2023-02-16 21:02   ` Darrick J. Wong
  2023-02-16 21:02   ` [PATCH 3/6] libxfs: create new files with attr forks if necessary Darrick J. Wong
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:02 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Fix libxfs usage problems as pointed out by xfs/437.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    1 +
 mkfs/proto.c             |    4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index f8efcce7..e44b0b29 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -124,6 +124,7 @@
 #define xfs_initialize_perag		libxfs_initialize_perag
 #define xfs_initialize_perag_data	libxfs_initialize_perag_data
 #define xfs_init_local_fork		libxfs_init_local_fork
+#define xfs_init_parent_name_rec	libxfs_init_parent_name_rec
 
 #define xfs_inobt_maxrecs		libxfs_inobt_maxrecs
 #define xfs_inobt_stage_cursor		libxfs_inobt_stage_cursor
diff --git a/mkfs/proto.c b/mkfs/proto.c
index 36d8cde2..ac7ffbe9 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -613,8 +613,8 @@ parseproto(
 			.value = (void *)xname.name,
 			.valuelen = xname.len,
 		};
-		xfs_init_parent_name_rec(&rec, pip, offset);
-		error = xfs_attr_set(&args);
+		libxfs_init_parent_name_rec(&rec, pip, offset);
+		error = -libxfs_attr_set(&args);
 		if (error)
 			fail(_("Error creating parent pointer"), error);
 	}



* [PATCH 3/6] libxfs: create new files with attr forks if necessary
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: tool fixes for parent pointers Darrick J. Wong
  2023-02-16 21:01   ` [PATCH 1/6] xfs_scrub: don't report media errors for space with unknowable owner Darrick J. Wong
  2023-02-16 21:02   ` [PATCH 2/6] mkfs: fix libxfs api misuse Darrick J. Wong
@ 2023-02-16 21:02   ` Darrick J. Wong
  2023-02-16 21:02   ` [PATCH 4/6] mkfs: fix subdir parent pointer creation Darrick J. Wong
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:02 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create new files with attr forks if they're going to have parent
pointers.  In the next patch we'll fix mkfs to use the same parent
creation functions as the kernel, so we're going to need this.
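
For readers puzzled by the ">> 3" in the hunk below: as far as I know,
the inode's fork offset field is kept in units of 8 bytes, so a byte
offset has to be scaled down before it is stored.  A minimal sketch,
with hypothetical helper names:

```c
#include <stdint.h>

/* Convert a byte offset in the inode literal area to fork offset units. */
static inline uint8_t
sketch_bytes_to_forkoff(unsigned int byte_offset)
{
	return (uint8_t)(byte_offset >> 3);
}

/* Convert fork offset units back to a byte offset. */
static inline unsigned int
sketch_forkoff_to_bytes(uint8_t forkoff)
{
	return (unsigned int)forkoff << 3;
}
```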

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/init.c |    4 ++++
 libxfs/util.c |   14 ++++++++++++++
 2 files changed, 18 insertions(+)


diff --git a/libxfs/init.c b/libxfs/init.c
index 49cb2326..cffd8a63 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -736,14 +736,18 @@ void
 libxfs_compute_all_maxlevels(
 	struct xfs_mount	*mp)
 {
+	struct xfs_ino_geometry *igeo = M_IGEO(mp);
+
 	xfs_alloc_compute_maxlevels(mp);
 	xfs_bmap_compute_maxlevels(mp, XFS_DATA_FORK);
 	xfs_bmap_compute_maxlevels(mp, XFS_ATTR_FORK);
+	igeo->attr_fork_offset = xfs_bmap_compute_attr_offset(mp);
 	xfs_ialloc_setup_geometry(mp);
 	xfs_rmapbt_compute_maxlevels(mp);
 	xfs_refcountbt_compute_maxlevels(mp);
 
 	xfs_agbtree_compute_maxlevels(mp);
+
 }
 
 /*
diff --git a/libxfs/util.c b/libxfs/util.c
index 6525f63d..bea5f1c7 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -322,6 +322,20 @@ libxfs_init_new_inode(
 		ASSERT(0);
 	}
 
+	/*
+	 * If we need to create attributes immediately after allocating the
+	 * inode, initialise an empty attribute fork right now. We use the
+	 * default fork offset for attributes here as we don't know exactly what
+	 * size or how many attributes we might be adding. We can do this
+	 * safely here because we know the data fork is completely empty and
+	 * this saves us from needing to run a separate transaction to set the
+	 * fork offset in the immediate future.
+	 */
+	if (xfs_has_parent(tp->t_mountp) && xfs_has_attr(tp->t_mountp)) {
+		ip->i_forkoff = xfs_default_attroffset(ip) >> 3;
+		xfs_ifork_init_attr(ip, XFS_DINODE_FMT_EXTENTS, 0);
+	}
+
 	/*
 	 * Log the new values stuffed into the inode.
 	 */



* [PATCH 4/6] mkfs: fix subdir parent pointer creation
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: tool fixes for parent pointers Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 21:02   ` [PATCH 3/6] libxfs: create new files with attr forks if necessary Darrick J. Wong
@ 2023-02-16 21:02   ` Darrick J. Wong
  2023-02-16 21:02   ` [PATCH 5/6] xfs_db: report parent pointer keys Darrick J. Wong
  2023-02-16 21:03   ` [PATCH 6/6] xfs_db: obfuscate dirent and pptr names consistently Darrick J. Wong
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:02 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Rework the protofile code so that it uses the same deferred parent
pointer ops that the kernel uses to create parent pointers.  While we're
at it, make it so that subdirs of the root directory and reserved files
also get parent pointers.  Found by xfs/019.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    3 ++
 mkfs/proto.c             |   65 +++++++++++++++++++++++++++++++++-------------
 2 files changed, 50 insertions(+), 18 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index e44b0b29..055d2862 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -140,6 +140,9 @@
 #define xfs_log_get_max_trans_res	libxfs_log_get_max_trans_res
 #define xfs_log_sb			libxfs_log_sb
 #define xfs_mode_to_ftype		libxfs_mode_to_ftype
+#define xfs_parent_defer_add		libxfs_parent_defer_add
+#define xfs_parent_finish		libxfs_parent_finish
+#define xfs_parent_start		libxfs_parent_start
 #define xfs_perag_get			libxfs_perag_get
 #define xfs_perag_put			libxfs_perag_put
 #define xfs_prealloc_blocks		libxfs_prealloc_blocks
diff --git a/mkfs/proto.c b/mkfs/proto.c
index ac7ffbe9..e0131df5 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -8,7 +8,6 @@
 #include <sys/stat.h>
 #include "libfrog/convert.h"
 #include "proto.h"
-#include "xfs_parent.h"
 
 /*
  * Prototypes for internal functions.
@@ -349,6 +348,20 @@ newdirectory(
 		fail(_("directory create error"), error);
 }
 
+static struct xfs_parent_defer *
+newpptr(
+	struct xfs_mount	*mp)
+{
+	struct xfs_parent_defer	*ret;
+	int			error;
+
+	error = -libxfs_parent_start(mp, &ret);
+	if (error)
+		fail(_("initializing parent pointer"), error);
+
+	return ret;
+}
+
 static void
 parseproto(
 	xfs_mount_t	*mp,
@@ -384,6 +397,7 @@ parseproto(
 	char		*value;
 	struct xfs_name	xname;
 	xfs_dir2_dataptr_t offset;
+	struct xfs_parent_defer *parent = NULL;
 
 	memset(&creds, 0, sizeof(creds));
 	mstr = getstr(pp);
@@ -458,6 +472,7 @@ parseproto(
 	case IF_REGULAR:
 		buf = newregfile(pp, &len);
 		tp = getres(mp, XFS_B_TO_FSB(mp, len));
+		parent = newpptr(mp);
 		error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFREG, 1, 0,
 					   &creds, fsxp, &ip);
 		if (error)
@@ -481,7 +496,7 @@ parseproto(
 			exit(1);
 		}
 		tp = getres(mp, XFS_B_TO_FSB(mp, llen));
-
+		parent = newpptr(mp);
 		error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFREG, 1, 0,
 					  &creds, fsxp, &ip);
 		if (error)
@@ -492,15 +507,24 @@ parseproto(
 		xname.type = XFS_DIR3_FT_REG_FILE;
 		newdirent(mp, tp, pip, &xname, ip->i_ino, &offset);
 		libxfs_trans_log_inode(tp, ip, flags);
+		if (parent) {
+			error = -libxfs_parent_defer_add(tp, parent, pip,
+					&xname, offset, ip);
+			if (error)
+				fail(_("committing parent pointers failed."),
+						error);
+		}
 		error = -libxfs_trans_commit(tp);
 		if (error)
 			fail(_("Space preallocation failed."), error);
+		libxfs_parent_finish(mp, parent);
 		rsvfile(mp, ip, llen);
 		libxfs_irele(ip);
 		return;
 
 	case IF_BLOCK:
 		tp = getres(mp, 0);
+		parent = newpptr(mp);
 		majdev = getnum(getstr(pp), 0, 0, false);
 		mindev = getnum(getstr(pp), 0, 0, false);
 		error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFBLK, 1,
@@ -516,6 +540,7 @@ parseproto(
 
 	case IF_CHAR:
 		tp = getres(mp, 0);
+		parent = newpptr(mp);
 		majdev = getnum(getstr(pp), 0, 0, false);
 		mindev = getnum(getstr(pp), 0, 0, false);
 		error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFCHR, 1,
@@ -530,6 +555,7 @@ parseproto(
 
 	case IF_FIFO:
 		tp = getres(mp, 0);
+		parent = newpptr(mp);
 		error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFIFO, 1, 0,
 				&creds, fsxp, &ip);
 		if (error)
@@ -542,6 +568,7 @@ parseproto(
 		buf = getstr(pp);
 		len = (int)strlen(buf);
 		tp = getres(mp, XFS_B_TO_FSB(mp, len));
+		parent = newpptr(mp);
 		error = -libxfs_dir_ialloc(&tp, pip, mode|S_IFLNK, 1, 0,
 				&creds, fsxp, &ip);
 		if (error)
@@ -564,6 +591,7 @@ parseproto(
 			libxfs_log_sb(tp);
 			isroot = 1;
 		} else {
+			parent = newpptr(mp);
 			libxfs_trans_ijoin(tp, pip, 0);
 			xname.type = XFS_DIR3_FT_DIR;
 			newdirent(mp, tp, pip, &xname, ip->i_ino, &offset);
@@ -572,9 +600,19 @@ parseproto(
 		}
 		newdirectory(mp, tp, ip, pip);
 		libxfs_trans_log_inode(tp, ip, flags);
+		if (parent) {
+			error = -libxfs_parent_defer_add(tp, parent, pip,
+					&xname, offset, ip);
+			if (error)
+				fail(_("committing parent pointers failed."),
+						error);
+		}
 		error = -libxfs_trans_commit(tp);
 		if (error)
 			fail(_("Directory inode allocation failed."), error);
+
+		libxfs_parent_finish(mp, parent);
+
 		/*
 		 * RT initialization.  Do this here to ensure that
 		 * the RT inodes get placed after the root inode.
@@ -597,28 +635,19 @@ parseproto(
 		fail(_("Unknown format"), EINVAL);
 	}
 	libxfs_trans_log_inode(tp, ip, flags);
+	if (parent) {
+		error = -libxfs_parent_defer_add(tp, parent, pip, &xname,
+				offset, ip);
+		if (error)
+			fail(_("committing parent pointers failed."), error);
+	}
 	error = -libxfs_trans_commit(tp);
 	if (error) {
 		fail(_("Error encountered creating file from prototype file"),
 			error);
 	}
 
-	if (xfs_has_parent(mp)) {
-		struct xfs_parent_name_rec      rec;
-		struct xfs_da_args		args = {
-			.dp = ip,
-			.name = (const unsigned char *)&rec,
-			.namelen = sizeof(rec),
-			.attr_filter = XFS_ATTR_PARENT,
-			.value = (void *)xname.name,
-			.valuelen = xname.len,
-		};
-		libxfs_init_parent_name_rec(&rec, pip, offset);
-		error = -libxfs_attr_set(&args);
-		if (error)
-			fail(_("Error creating parent pointer"), error);
-	}
-
+	libxfs_parent_finish(mp, parent);
 	libxfs_irele(ip);
 }
 



* [PATCH 5/6] xfs_db: report parent pointer keys
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: tool fixes for parent pointers Darrick J. Wong
                     ` (3 preceding siblings ...)
  2023-02-16 21:02   ` [PATCH 4/6] mkfs: fix subdir parent pointer creation Darrick J. Wong
@ 2023-02-16 21:02   ` Darrick J. Wong
  2023-02-16 21:03   ` [PATCH 6/6] xfs_db: obfuscate dirent and pptr names consistently Darrick J. Wong
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:02 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Decode the parent pointer inode, generation, and diroffset fields.
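
For readers unfamiliar with the record being decoded, here is a minimal
sketch of turning the three key fields into host-order integers.  The
field names (p_ino, p_gen, p_diroffset) come from this patch; the struct
pptr_rec_disk type, the helper names, and the big-endian byte layout are
assumptions made for illustration and are not the xfs_db implementation.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the on-disk parent pointer name record.
 * The byte-array layout (big-endian, 8 + 4 + 4 bytes) is an assumption
 * of this sketch.
 */
struct pptr_rec_disk {
	uint8_t		p_ino[8];
	uint8_t		p_gen[4];
	uint8_t		p_diroffset[4];
};

/* Host-order copy of the decoded fields. */
struct pptr_rec {
	uint64_t	ino;
	uint32_t	gen;
	uint32_t	diroffset;
};

/* Assemble an n-byte big-endian integer, most significant byte first. */
static uint64_t
get_be(
	const uint8_t	*p,
	int		n)
{
	uint64_t	v = 0;
	int		i;

	for (i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Decode the inode, generation, and diroffset fields of one record. */
static void
pptr_decode(
	const struct pptr_rec_disk	*d,
	struct pptr_rec			*out)
{
	out->ino = get_be(d->p_ino, 8);
	out->gen = (uint32_t)get_be(d->p_gen, 4);
	out->diroffset = (uint32_t)get_be(d->p_diroffset, 4);
}
```

In the patch itself the same three fields are exposed through the field
tables, and they only show up for entries with XFS_ATTR_PARENT set.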

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/attr.c      |   31 +++++++++++++++++++++++++++++++
 db/attrshort.c |   25 +++++++++++++++++++++++++
 2 files changed, 56 insertions(+)


diff --git a/db/attr.c b/db/attr.c
index f29e4a54..db7cf54b 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -19,6 +19,7 @@ static int	attr_leaf_entries_count(void *obj, int startoff);
 static int	attr_leaf_hdr_count(void *obj, int startoff);
 static int	attr_leaf_name_local_count(void *obj, int startoff);
 static int	attr_leaf_name_local_name_count(void *obj, int startoff);
+static int	attr_leaf_name_pptr_count(void *obj, int startoff);
 static int	attr_leaf_name_local_value_count(void *obj, int startoff);
 static int	attr_leaf_name_local_value_offset(void *obj, int startoff,
 						  int idx);
@@ -111,6 +112,8 @@ const field_t	attr_leaf_map_flds[] = {
 
 #define	LNOFF(f)	bitize(offsetof(xfs_attr_leaf_name_local_t, f))
 #define	LVOFF(f)	bitize(offsetof(xfs_attr_leaf_name_remote_t, f))
+#define	PPOFF(f)	bitize(offsetof(xfs_attr_leaf_name_local_t, nameval) + \
+			       offsetof(struct xfs_parent_name_rec, f))
 const field_t	attr_leaf_name_flds[] = {
 	{ "valuelen", FLDT_UINT16D, OI(LNOFF(valuelen)),
 	  attr_leaf_name_local_count, FLD_COUNT, TYP_NONE },
@@ -118,6 +121,12 @@ const field_t	attr_leaf_name_flds[] = {
 	  attr_leaf_name_local_count, FLD_COUNT, TYP_NONE },
 	{ "name", FLDT_CHARNS, OI(LNOFF(nameval)),
 	  attr_leaf_name_local_name_count, FLD_COUNT, TYP_NONE },
+	{ "parent_ino", FLDT_INO, OI(PPOFF(p_ino)),
+	  attr_leaf_name_pptr_count, FLD_COUNT, TYP_INODE },
+	{ "parent_gen", FLDT_UINT32D, OI(PPOFF(p_gen)),
+	  attr_leaf_name_pptr_count, FLD_COUNT, TYP_NONE },
+	{ "parent_diroffset", FLDT_UINT32D, OI(PPOFF(p_diroffset)),
+	  attr_leaf_name_pptr_count, FLD_COUNT, TYP_NONE },
 	{ "value", FLDT_CHARNS, attr_leaf_name_local_value_offset,
 	  attr_leaf_name_local_value_count, FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ "valueblk", FLDT_UINT32X, OI(LVOFF(valueblk)),
@@ -273,6 +282,26 @@ attr_leaf_name_local_count(
 				    __attr_leaf_name_local_count);
 }
 
+static int
+__attr_leaf_name_pptr_count(
+	struct xfs_attr_leafblock	*leaf,
+	struct xfs_attr_leaf_entry      *e,
+	int				i)
+{
+	if (e->flags & XFS_ATTR_PARENT)
+		return 1;
+	return 0;
+}
+
+static int
+attr_leaf_name_pptr_count(
+	void				*obj,
+	int				startoff)
+{
+	return attr_leaf_entry_walk(obj, startoff,
+			__attr_leaf_name_pptr_count);
+}
+
 static int
 __attr_leaf_name_local_name_count(
 	struct xfs_attr_leafblock	*leaf,
@@ -283,6 +312,8 @@ __attr_leaf_name_local_name_count(
 
 	if (!(e->flags & XFS_ATTR_LOCAL))
 		return 0;
+	if (e->flags & XFS_ATTR_PARENT)
+		return 0;
 
 	l = xfs_attr3_leaf_name_local(leaf, i);
 	return l->namelen;
diff --git a/db/attrshort.c b/db/attrshort.c
index 872d771d..7c8ac485 100644
--- a/db/attrshort.c
+++ b/db/attrshort.c
@@ -13,6 +13,7 @@
 #include "attrshort.h"
 
 static int	attr_sf_entry_name_count(void *obj, int startoff);
+static int	attr_sf_entry_pptr_count(void *obj, int startoff);
 static int	attr_sf_entry_value_count(void *obj, int startoff);
 static int	attr_sf_entry_value_offset(void *obj, int startoff, int idx);
 static int	attr_shortform_list_count(void *obj, int startoff);
@@ -34,6 +35,8 @@ const field_t	attr_sf_hdr_flds[] = {
 };
 
 #define	EOFF(f)	bitize(offsetof(struct xfs_attr_sf_entry, f))
+#define	PPOFF(f) bitize(offsetof(struct xfs_attr_sf_entry, nameval) + \
+			offsetof(struct xfs_parent_name_rec, f))
 const field_t	attr_sf_entry_flds[] = {
 	{ "namelen", FLDT_UINT8D, OI(EOFF(namelen)), C1, 0, TYP_NONE },
 	{ "valuelen", FLDT_UINT8D, OI(EOFF(valuelen)), C1, 0, TYP_NONE },
@@ -49,11 +52,31 @@ const field_t	attr_sf_entry_flds[] = {
 	  TYP_NONE },
 	{ "name", FLDT_CHARNS, OI(EOFF(nameval)), attr_sf_entry_name_count,
 	  FLD_COUNT, TYP_NONE },
+	{ "parent_ino", FLDT_INO, OI(PPOFF(p_ino)), attr_sf_entry_pptr_count,
+	  FLD_COUNT, TYP_INODE },
+	{ "parent_gen", FLDT_UINT32D, OI(PPOFF(p_gen)), attr_sf_entry_pptr_count,
+	  FLD_COUNT, TYP_NONE },
+	{ "parent_diroffset", FLDT_UINT32D, OI(PPOFF(p_diroffset)),
+	   attr_sf_entry_pptr_count, FLD_COUNT, TYP_NONE },
 	{ "value", FLDT_CHARNS, attr_sf_entry_value_offset,
 	  attr_sf_entry_value_count, FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ NULL }
 };
 
+static int
+attr_sf_entry_pptr_count(
+	void				*obj,
+	int				startoff)
+{
+	struct xfs_attr_sf_entry	*e;
+
+	ASSERT(bitoffs(startoff) == 0);
+	e = (struct xfs_attr_sf_entry *)((char *)obj + byteize(startoff));
+	if (e->flags & XFS_ATTR_PARENT)
+		return 1;
+	return 0;
+}
+
 static int
 attr_sf_entry_name_count(
 	void				*obj,
@@ -63,6 +86,8 @@ attr_sf_entry_name_count(
 
 	ASSERT(bitoffs(startoff) == 0);
 	e = (struct xfs_attr_sf_entry *)((char *)obj + byteize(startoff));
+	if (e->flags & XFS_ATTR_PARENT)
+		return 0;
 	return e->namelen;
 }
 



* [PATCH 6/6] xfs_db: obfuscate dirent and pptr names consistently
  2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: tool fixes for parent pointers Darrick J. Wong
                     ` (4 preceding siblings ...)
  2023-02-16 21:02   ` [PATCH 5/6] xfs_db: report parent pointer keys Darrick J. Wong
@ 2023-02-16 21:03   ` Darrick J. Wong
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:03 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

When someone wants to perform an obfuscated metadump of a filesystem
where parent pointers are enabled, we have to use the *exact* same
obfuscated name for both the directory entry and the parent pointer.
Instead of using an RNG to influence the obfuscated name, use the dirent
inode number to start the obfuscated name.  This makes them consistent,
though the resulting names aren't quite so full of control characters.
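
To illustrate why seeding from the inode number gives matching names,
here is a small sketch of the character selection.  The alphabet and the
modulo expression are taken from the patch; ino_filename_char() and
fill_prefix() are hypothetical names, and the sketch leaves out both the
random() fallback used when ino is zero and the fixup of the last five
bytes that preserves the name hash.

```c
#include <stddef.h>
#include <stdint.h>

static const unsigned char filename_alphabet[] =
	"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	"abcdefghijklmnopqrstuvwxyz"
	"0123456789-_";

/* Pick a filler character that depends only on the inode number. */
static unsigned char
ino_filename_char(
	uint64_t	ino)
{
	/* sizeof() counts the trailing NUL, hence the "- 1". */
	return filename_alphabet[ino % (sizeof(filename_alphabet) - 1)];
}

/* Fill the first len bytes of buf with the inode-derived character. */
static void
fill_prefix(
	uint64_t	ino,
	unsigned char	*buf,
	size_t		len)
{
	size_t		i;

	for (i = 0; i < len; i++)
		buf[i] = ino_filename_char(ino);
}
```

Because the output depends only on the inode number and the length,
obfuscating the dirent name and the parent pointer name for the same
inode produces the same prefix, which is the consistency this patch
needs.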

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/metadump.c |   34 ++++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 10 deletions(-)


diff --git a/db/metadump.c b/db/metadump.c
index 27d1df43..bb441fbb 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -740,12 +740,14 @@ nametable_add(xfs_dahash_t hash, int namelen, unsigned char *name)
 #define rol32(x,y)		(((x) << (y)) | ((x) >> (32 - (y))))
 
 static inline unsigned char
-random_filename_char(void)
+random_filename_char(xfs_ino_t	ino)
 {
 	static unsigned char filename_alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
 						"abcdefghijklmnopqrstuvwxyz"
 						"0123456789-_";
 
+	if (ino)
+		return filename_alphabet[ino % (sizeof filename_alphabet - 1)];
 	return filename_alphabet[random() % (sizeof filename_alphabet - 1)];
 }
 
@@ -815,6 +817,7 @@ in_lost_found(
  */
 static void
 obfuscate_name(
+	xfs_ino_t	ino,
 	xfs_dahash_t	hash,
 	size_t		name_len,
 	unsigned char	*name)
@@ -842,7 +845,7 @@ obfuscate_name(
 	 * Accumulate its new hash value as we go.
 	 */
 	for (i = 0; i < name_len - 5; i++) {
-		*newp = random_filename_char();
+		*newp = random_filename_char(ino);
 		new_hash = *newp ^ rol32(new_hash, 7);
 		newp++;
 	}
@@ -1207,7 +1210,10 @@ generate_obfuscated_name(
 	/* Obfuscate the name (if possible) */
 
 	hash = libxfs_da_hashname(name, namelen);
-	obfuscate_name(hash, namelen, name);
+	if (xfs_has_parent(mp))
+		obfuscate_name(ino, hash, namelen, name);
+	else
+		obfuscate_name(0, hash, namelen, name);
 
 	/*
 	 * Make sure the name is not something already seen.  If we
@@ -1320,7 +1326,7 @@ obfuscate_path_components(
 			/* last (or single) component */
 			namelen = strnlen((char *)comp, len);
 			hash = libxfs_da_hashname(comp, namelen);
-			obfuscate_name(hash, namelen, comp);
+			obfuscate_name(0, hash, namelen, comp);
 			break;
 		}
 		namelen = slash - (char *)comp;
@@ -1331,7 +1337,7 @@ obfuscate_path_components(
 			continue;
 		}
 		hash = libxfs_da_hashname(comp, namelen);
-		obfuscate_name(hash, namelen, comp);
+		obfuscate_name(0, hash, namelen, comp);
 		comp += namelen + 1;
 		len -= namelen + 1;
 	}
@@ -1407,10 +1413,15 @@ process_sf_attr(
 		}
 
 		if (obfuscate) {
-			generate_obfuscated_name(0, asfep->namelen,
-						 &asfep->nameval[0]);
-			memset(&asfep->nameval[asfep->namelen], 'v',
-			       asfep->valuelen);
+			if (asfep->flags & XFS_ATTR_PARENT) {
+				generate_obfuscated_name(cur_ino, asfep->valuelen,
+					 &asfep->nameval[asfep->namelen]);
+			} else {
+				generate_obfuscated_name(0, asfep->namelen,
+							 &asfep->nameval[0]);
+				memset(&asfep->nameval[asfep->namelen], 'v',
+				       asfep->valuelen);
+			}
 		}
 
 		asfep = (struct xfs_attr_sf_entry *)((char *)asfep +
@@ -1785,7 +1796,7 @@ process_attr_block(
 						(long long)cur_ino);
 				break;
 			}
-			if (obfuscate) {
+			if (obfuscate && !(entry->flags & XFS_ATTR_PARENT)) {
 				generate_obfuscated_name(0, local->namelen,
 					&local->nameval[0]);
 				memset(&local->nameval[local->namelen], 'v',
@@ -1797,6 +1808,9 @@ process_attr_block(
 			zlen = xfs_attr_leaf_entsize_local(nlen, vlen) -
 				(sizeof(xfs_attr_leaf_name_local_t) - 1 +
 				 nlen + vlen);
+			if (obfuscate && (entry->flags & XFS_ATTR_PARENT))
+				generate_obfuscated_name(cur_ino, vlen,
+						&local->nameval[nlen]);
 			if (zero_stale_data)
 				memset(&local->nameval[nlen + vlen], 0, zlen);
 		} else {



* [PATCH 01/10] xfs_scrub: revert unnecessary code from "implement the upper half of parent pointers"
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl Darrick J. Wong
@ 2023-02-16 21:03   ` Darrick J. Wong
  2023-02-16 21:03   ` [PATCH 02/10] xfs_io: print path in path_print Darrick J. Wong
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:03 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Revert xfs_scrub_ino_descr() and its parent.h include, which are no
longer necessary.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/inodes.c |   26 --------------------------
 scrub/inodes.h |    2 --
 2 files changed, 28 deletions(-)


diff --git a/scrub/inodes.c b/scrub/inodes.c
index 245dd713..78f0914b 100644
--- a/scrub/inodes.c
+++ b/scrub/inodes.c
@@ -19,7 +19,6 @@
 #include "descr.h"
 #include "libfrog/fsgeom.h"
 #include "libfrog/bulkstat.h"
-#include "parent.h"
 
 /*
  * Iterate a range of inodes.
@@ -450,28 +449,3 @@ scrub_open_handle(
 	return open_by_fshandle(handle, sizeof(*handle),
 			O_RDONLY | O_NOATIME | O_NOFOLLOW | O_NOCTTY);
 }
-
-/* Construct a description for an inode. */
-void
-xfs_scrub_ino_descr(
-	struct scrub_ctx	*ctx,
-	struct xfs_handle	*handle,
-	char			*buf,
-	size_t			buflen)
-{
-	uint64_t		ino;
-	xfs_agnumber_t		agno;
-	xfs_agino_t		agino;
-	int			ret;
-
-	ret = handle_to_path(handle, sizeof(struct xfs_handle), buf, buflen);
-	if (ret >= 0)
-		return;
-
-	ino = handle->ha_fid.fid_ino;
-	agno = ino / (1ULL << (ctx->mnt.inopblog + ctx->mnt.agblklog));
-	agino = ino % (1ULL << (ctx->mnt.inopblog + ctx->mnt.agblklog));
-	snprintf(buf, buflen, _("inode %"PRIu64" (%u/%u)"), ino, agno,
-			agino);
-}
-
diff --git a/scrub/inodes.h b/scrub/inodes.h
index 189fa282..f0318045 100644
--- a/scrub/inodes.h
+++ b/scrub/inodes.h
@@ -21,7 +21,5 @@ int scrub_scan_all_inodes(struct scrub_ctx *ctx, scrub_inode_iter_fn fn,
 		void *arg);
 
 int scrub_open_handle(struct xfs_handle *handle);
-void xfs_scrub_ino_descr(struct scrub_ctx *ctx, struct xfs_handle *handle,
-		char *buf, size_t buflen);
 
 #endif /* XFS_SCRUB_INODES_H_ */



* [PATCH 02/10] xfs_io: print path in path_print
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl Darrick J. Wong
  2023-02-16 21:03   ` [PATCH 01/10] xfs_scrub: revert unnecessary code from "implement the upper half of parent pointers" Darrick J. Wong
@ 2023-02-16 21:03   ` Darrick J. Wong
  2023-02-16 21:03   ` [PATCH 03/10] xfs_io: move parent pointer filtering and formatting flags out of libhandle Darrick J. Wong
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:03 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Actually print the path string once we've bothered to construct it into
a string buffer.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 io/parent.c |    2 ++
 1 file changed, 2 insertions(+)


diff --git a/io/parent.c b/io/parent.c
index a6f3fa0c..b18e02c4 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -87,6 +87,8 @@ path_print(
 	ret = path_list_to_string(path, buf + ret, len - ret);
 	if (ret < 0)
 		return ret;
+
+	printf("%s\n", buf);
 	return 0;
 }
 



* [PATCH 03/10] xfs_io: move parent pointer filtering and formatting flags out of libhandle
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl Darrick J. Wong
  2023-02-16 21:03   ` [PATCH 01/10] xfs_scrub: revert unnecessary code from "implement the upper half of parent pointers" Darrick J. Wong
  2023-02-16 21:03   ` [PATCH 02/10] xfs_io: print path in path_print Darrick J. Wong
@ 2023-02-16 21:03   ` Darrick J. Wong
  2023-02-16 21:04   ` [PATCH 04/10] libfrog: remove all the parent pointer code from libhandle Darrick J. Wong
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:03 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

All this filtering and presentation stuff originates in xfs_io and
should stay there.  The added arguments seriously complicate the basic
iterator interface and there are no other users.
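
The resulting split can be sketched as a generic iterator that takes
only a callback and an opaque pointer, with all filtering state owned by
the caller.  Everything named demo_* below is hypothetical and much
simpler than the real xfs_parent_ptr and walk_pptr_fn types; it is meant
only to show the shape of the interface, not the libhandle code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified parent record; not the real xfs_parent_ptr. */
struct demo_pptr {
	uint64_t	ino;
	const char	*name;
};

typedef int (*demo_walk_fn)(const struct demo_pptr *pptr, void *arg);

/* Generic iterator: knows nothing about filtering or output format. */
static int
demo_walk_pptrs(
	const struct demo_pptr	*ptrs,
	size_t			nr,
	demo_walk_fn		fn,
	void			*arg)
{
	size_t			i;
	int			ret;

	for (i = 0; i < nr; i++) {
		ret = fn(&ptrs[i], arg);
		if (ret)
			return ret;	/* a nonzero return stops the walk */
	}
	return 0;
}

/* Caller-private state, similar in spirit to struct pptr_args. */
struct demo_args {
	uint64_t	filter_ino;
	const char	*filter_name;
	unsigned int	matched;
};

/* Callback that applies the caller's filters and counts the matches. */
static int
demo_count_matches(
	const struct demo_pptr	*pptr,
	void			*arg)
{
	struct demo_args	*args = arg;

	if (args->filter_ino && pptr->ino != args->filter_ino)
		return 0;
	if (args->filter_name && strcmp(args->filter_name, pptr->name))
		return 0;
	args->matched++;
	return 0;
}
```

Keeping the filters in the callback's private argument means that a new
filter or output format only touches the one caller that wants it.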

While we're at it, fix a bug in path_print where it doesn't actually
print the path.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 include/parent.h   |   17 +++------
 io/parent.c        |   99 ++++++++++++++++++++++++++++++++--------------------
 libfrog/paths.c    |   28 ++++++++++++++-
 libfrog/paths.h    |    8 +++-
 libhandle/parent.c |   77 ++++++++++++----------------------------
 5 files changed, 119 insertions(+), 110 deletions(-)


diff --git a/include/parent.h b/include/parent.h
index 2e136724..fb900041 100644
--- a/include/parent.h
+++ b/include/parent.h
@@ -17,27 +17,20 @@ typedef struct parent_cursor {
 	__u32	opaque[4];      /* an opaque cookie */
 } parent_cursor_t;
 
-/* Print parent pointer option flags */
-#define XFS_PPPTR_OFLAG_SHORT  (1<<0)	/* Print in short format */
-
 struct path_list;
 
 typedef int (*walk_pptr_fn)(struct xfs_pptr_info *pi, struct xfs_parent_ptr *pptr,
-		void *arg, int flags);
+		void *arg);
 typedef int (*walk_ppath_fn)(const char *mntpt, struct path_list *path,
 		void *arg);
 
 #define WALK_PPTRS_ABORT	1
-int fd_walk_pptrs(int fd, uint64_t pino, char *pname, walk_pptr_fn fn,
-		void *arg, int flags);
-int handle_walk_pptrs(void *hanp, size_t hanlen, uint64_t pino, char *pname,
-		walk_pptr_fn fn, void *arg, int flags);
+int fd_walk_pptrs(int fd, walk_pptr_fn fn, void *arg);
+int handle_walk_pptrs(void *hanp, size_t hanlen, walk_pptr_fn fn, void *arg);
 
 #define WALK_PPATHS_ABORT	1
-int fd_walk_ppaths(int fd, uint64_t pino, char *pname, walk_ppath_fn fn,
-		void *arg, int flags);
-int handle_walk_ppaths(void *hanp, size_t hanlen, uint64_t pino, char *pname,
-		walk_ppath_fn fn, void *arg, int flags);
+int fd_walk_ppaths(int fd, walk_ppath_fn fn, void *arg);
+int handle_walk_ppaths(void *hanp, size_t hanlen, walk_ppath_fn fn, void *arg);
 
 int fd_to_path(int fd, char *path, size_t pathlen);
 int handle_to_path(void *hanp, size_t hlen, char *path, size_t pathlen);
diff --git a/io/parent.c b/io/parent.c
index b18e02c4..66bb0fae 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -15,34 +15,41 @@
 static cmdinfo_t parent_cmd;
 static char *mntpt;
 
+struct pptr_args {
+	uint64_t	filter_ino;
+	char		*filter_name;
+	bool		shortformat;
+};
+
 static int
 pptr_print(
 	struct xfs_pptr_info	*pi,
 	struct xfs_parent_ptr	*pptr,
-	void			*arg,
-	int			flags)
+	void			*arg)
 {
-	char			buf[XFS_PPTR_MAXNAMELEN + 1];
-	unsigned int		namelen = strlen((char *)pptr->xpp_name);
+	struct pptr_args	*args = arg;
+	unsigned int		namelen;
 
 	if (pi->pi_flags & XFS_PPTR_OFLAG_ROOT) {
 		printf(_("Root directory.\n"));
 		return 0;
 	}
 
-	memcpy(buf, pptr->xpp_name, namelen);
-	buf[namelen] = 0;
+	if (args->filter_ino && pptr->xpp_ino != args->filter_ino)
+		return 0;
+	if (args->filter_name && strcmp(args->filter_name, pptr->xpp_name))
+		return 0;
 
-	if (flags & XFS_PPPTR_OFLAG_SHORT) {
+	namelen = strlen(pptr->xpp_name);
+	if (args->shortformat) {
 		printf("%llu/%u/%u/%s\n",
 			(unsigned long long)pptr->xpp_ino,
-			(unsigned int)pptr->xpp_gen, namelen, buf);
-	}
-	else {
+			(unsigned int)pptr->xpp_gen, namelen, pptr->xpp_name);
+	} else {
 		printf(_("p_ino    = %llu\n"), (unsigned long long)pptr->xpp_ino);
 		printf(_("p_gen    = %u\n"), (unsigned int)pptr->xpp_gen);
 		printf(_("p_reclen = %u\n"), namelen);
-		printf(_("p_name   = \"%s\"\n\n"), buf);
+		printf(_("p_name   = \"%s\"\n\n"), pptr->xpp_name);
 	}
 	return 0;
 }
@@ -50,34 +57,53 @@ pptr_print(
 static int
 print_parents(
 	struct xfs_handle	*handle,
-	uint64_t		pino,
-	char			*pname,
-	int			flags)
+	struct pptr_args	*args)
 {
 	int			ret;
 
 	if (handle)
-		ret = handle_walk_pptrs(handle, sizeof(*handle), pino,
-				pname, pptr_print, NULL, flags);
+		ret = handle_walk_pptrs(handle, sizeof(*handle), pptr_print,
+				args);
 	else
-		ret = fd_walk_pptrs(file->fd, pino, pname, pptr_print,
-				NULL, flags);
+		ret = fd_walk_pptrs(file->fd, pptr_print, args);
 	if (ret)
 		perror(file->name);
 
 	return 0;
 }
 
+static int
+filter_path_components(
+	const char		*name,
+	uint64_t		ino,
+	void			*arg)
+{
+	struct pptr_args	*args = arg;
+
+	if (args->filter_ino && ino == args->filter_ino)
+		return ECANCELED;
+	if (args->filter_name && !strcmp(args->filter_name, name))
+		return ECANCELED;
+	return 0;
+}
+
 static int
 path_print(
 	const char		*mntpt,
 	struct path_list	*path,
-	void			*arg) {
-
+	void			*arg)
+{
+	struct pptr_args	*args = arg;
 	char			buf[PATH_MAX];
 	size_t			len = PATH_MAX;
 	int			ret;
 
+	if (args->filter_ino || args->filter_name) {
+		ret = path_walk_components(path, filter_path_components, args);
+		if (ret != ECANCELED)
+			return 0;
+	}
+
 	ret = snprintf(buf, len, "%s", mntpt);
 	if (ret != strlen(mntpt)) {
 		errno = ENOMEM;
@@ -95,18 +121,15 @@ path_print(
 static int
 print_paths(
 	struct xfs_handle	*handle,
-	uint64_t		pino,
-	char			*pname,
-	int			flags)
+	struct pptr_args	*args)
 {
 	int			ret;
 
 	if (handle)
-		ret = handle_walk_ppaths(handle, sizeof(*handle), pino,
-				pname, path_print, NULL, flags);
+		ret = handle_walk_ppaths(handle, sizeof(*handle), path_print,
+				args);
  	else
-		ret = fd_walk_ppaths(file->fd, pino, pname, path_print,
-				NULL, flags);
+		ret = fd_walk_ppaths(file->fd, path_print, args);
 	if (ret)
 		perror(file->name);
 	return 0;
@@ -118,6 +141,7 @@ parent_f(
 	char			**argv)
 {
 	struct xfs_handle	handle;
+	struct pptr_args	args = { 0 };
 	void			*hanp = NULL;
 	size_t			hlen;
 	struct fs_path		*fs;
@@ -128,9 +152,6 @@ parent_f(
 	int			listpath_flag = 0;
 	int			ret;
 	static int		tab_init;
-	uint64_t		pino = 0;
-	char			*pname = NULL;
-	int			ppptr_flags = 0;
 
 	if (!tab_init) {
 		tab_init = 1;
@@ -151,8 +172,8 @@ parent_f(
 			listpath_flag = 1;
 			break;
 		case 'i':
-	                pino = strtoull(optarg, &p, 0);
-	                if (*p != '\0' || pino == 0) {
+	                args.filter_ino = strtoull(optarg, &p, 0);
+	                if (*p != '\0' || args.filter_ino == 0) {
 	                        fprintf(stderr,
 	                                _("Bad inode number '%s'.\n"),
 	                                optarg);
@@ -161,10 +182,10 @@ parent_f(
 
 			break;
 		case 'n':
-			pname = optarg;
+			args.filter_name = optarg;
 			break;
 		case 'f':
-			ppptr_flags |= XFS_PPPTR_OFLAG_SHORT;
+			args.shortformat = true;
 			break;
 		default:
 			return command_usage(&parent_cmd);
@@ -204,14 +225,14 @@ parent_f(
 		handle.ha_fid.fid_ino = ino;
 		handle.ha_fid.fid_gen = gen;
 
+	} else if (optind != argc) {
+		return command_usage(&parent_cmd);
 	}
 
 	if (listpath_flag)
-		exitcode = print_paths(ino ? &handle : NULL,
-				pino, pname, ppptr_flags);
+		exitcode = print_paths(ino ? &handle : NULL, &args);
 	else
-		exitcode = print_parents(ino ? &handle : NULL,
-				pino, pname, ppptr_flags);
+		exitcode = print_parents(ino ? &handle : NULL, &args);
 
 	if (hanp)
 		free_handle(hanp, hlen);
@@ -245,7 +266,7 @@ parent_init(void)
 	parent_cmd.cfunc = parent_f;
 	parent_cmd.argmin = 0;
 	parent_cmd.argmax = -1;
-	parent_cmd.args = _("[-p] [ino gen] [-i] [ino] [-n] [name] [-f]");
+	parent_cmd.args = _("[-p] [ino gen] [-i ino] [-n name] [-f]");
 	parent_cmd.flags = CMD_NOMAP_OK;
 	parent_cmd.oneline = _("print parent inodes");
 	parent_cmd.help = parent_help;
diff --git a/libfrog/paths.c b/libfrog/paths.c
index a86ae07c..e541e200 100644
--- a/libfrog/paths.c
+++ b/libfrog/paths.c
@@ -574,13 +574,15 @@ struct path_list {
 
 struct path_component {
 	struct list_head	pc_list;
+	uint64_t		pc_ino;
 	char			*pc_fname;
 };
 
 /* Initialize a path component with a given name. */
 struct path_component *
 path_component_init(
-	const char		*name)
+	const char		*name,
+	uint64_t		ino)
 {
 	struct path_component	*pc;
 
@@ -593,6 +595,7 @@ path_component_init(
 		free(pc);
 		return NULL;
 	}
+	pc->pc_ino = ino;
 	return pc;
 }
 
@@ -610,7 +613,8 @@ int
 path_component_change(
 	struct path_component	*pc,
 	void			*name,
-	size_t			namelen)
+	size_t			namelen,
+	uint64_t		ino)
 {
 	void			*p;
 
@@ -620,6 +624,7 @@ path_component_change(
 	pc->pc_fname = p;
 	memcpy(pc->pc_fname, name, namelen);
 	pc->pc_fname[namelen] = 0;
+	pc->pc_ino = ino;
 	return 0;
 }
 
@@ -699,3 +704,22 @@ path_list_to_string(
 	}
 	return bytes;
 }
+
+/* Walk each component of a path. */
+int
+path_walk_components(
+	struct path_list	*path,
+	path_walk_fn_t		fn,
+	void			*arg)
+{
+	struct path_component	*pos;
+	int			ret;
+
+	list_for_each_entry(pos, &path->p_head, pc_list) {
+		ret = fn(pos->pc_fname, pos->pc_ino, arg);
+		if (ret)
+			return ret;
+	}
+
+	return 0;
+}
diff --git a/libfrog/paths.h b/libfrog/paths.h
index 52538fb5..eb66df0c 100644
--- a/libfrog/paths.h
+++ b/libfrog/paths.h
@@ -63,10 +63,10 @@ extern fs_path_t *fs_cursor_next_entry(fs_cursor_t *__cp);
 struct path_list;
 struct path_component;
 
-struct path_component *path_component_init(const char *name);
+struct path_component *path_component_init(const char *name, uint64_t ino);
 void path_component_free(struct path_component *pc);
 int path_component_change(struct path_component *pc, void *name,
-		size_t namelen);
+		size_t namelen, uint64_t ino);
 
 struct path_list *path_list_init(void);
 void path_list_free(struct path_list *path);
@@ -77,4 +77,8 @@ void path_list_del_component(struct path_list *path, struct path_component *pc);
 
 ssize_t path_list_to_string(struct path_list *path, char *buf, size_t buflen);
 
+typedef int (*path_walk_fn_t)(const char *name, uint64_t ino, void *arg);
+
+int path_walk_components(struct path_list *path, path_walk_fn_t fn, void *arg);
+
 #endif	/* __PATH_H__ */
diff --git a/libhandle/parent.c b/libhandle/parent.c
index 3de8742c..c10a55ac 100644
--- a/libhandle/parent.c
+++ b/libhandle/parent.c
@@ -40,21 +40,13 @@ xfs_pptr_alloc(
       return pi;
 }
 
-/*
- * Walk all parents of the given file handle.
- * If pino is set, print only the parent pointer
- * of that inode.  If pname is set, print only the
- * parent pointer of that filename
- */
+/* Walk all parents of the given file handle. */
 static int
 handle_walk_parents(
 	int			fd,
 	struct xfs_handle	*handle,
-	uint64_t		pino,
-	char			*pname,
 	walk_pptr_fn		fn,
-	void			*arg,
-	int			flags)
+	void			*arg)
 {
 	struct xfs_pptr_info	*pi;
 	struct xfs_parent_ptr	*p;
@@ -73,20 +65,13 @@ handle_walk_parents(
 	ret = ioctl(fd, XFS_IOC_GETPARENTS, pi);
 	while (!ret) {
 		if (pi->pi_flags & XFS_PPTR_OFLAG_ROOT) {
-			ret = fn(pi, NULL, arg, flags);
+			ret = fn(pi, NULL, arg);
 			break;
 		}
 
 		for (i = 0; i < pi->pi_ptrs_used; i++) {
 			p = xfs_ppinfo_to_pp(pi, i);
-			if ((pino != 0) && (pino != p->xpp_ino))
-				continue;
-
-			if ((pname  != NULL) && (strcmp(pname,
-					(char *)p->xpp_name) != 0))
-				continue;
-
-			ret = fn(pi, p, arg, flags);
+			ret = fn(pi, p, arg);
 			if (ret)
 				goto out_pi;
 		}
@@ -107,11 +92,8 @@ int
 handle_walk_pptrs(
 	void			*hanp,
 	size_t			hlen,
-	uint64_t		pino,
-	char			*pname,
 	walk_pptr_fn		fn,
-	void			*arg,
-	int			flags)
+	void			*arg)
 {
 	char			*mntpt;
 	int			fd;
@@ -125,20 +107,17 @@ handle_walk_pptrs(
 	if (fd < 0)
 		return -1;
 
-	return handle_walk_parents(fd, hanp, pino, pname, fn, arg, flags);
+	return handle_walk_parents(fd, hanp, fn, arg);
 }
 
 /* Walk all parent pointers of this fd. */
 int
 fd_walk_pptrs(
 	int			fd,
-	uint64_t		pino,
-	char			*pname,
 	walk_pptr_fn		fn,
-	void			*arg,
-	int			flags)
+	void			*arg)
 {
-	return handle_walk_parents(fd, NULL, pino, pname, fn, arg, flags);
+	return handle_walk_parents(fd, NULL, fn, arg);
 }
 
 struct walk_ppaths_info {
@@ -156,15 +135,13 @@ struct walk_ppath_level_info {
 };
 
 static int handle_walk_parent_paths(struct walk_ppaths_info *wpi,
-		struct xfs_handle *handle, uint64_t pino, char *pname,
-		int flags);
+		struct xfs_handle *handle);
 
 static int
 handle_walk_parent_path_ptr(
 	struct xfs_pptr_info		*pi,
 	struct xfs_parent_ptr		*p,
-	void				*arg,
-	int				flags)
+	void				*arg)
 {
 	struct walk_ppath_level_info	*wpli = arg;
 	struct walk_ppaths_info		*wpi = wpli->wpi;
@@ -177,13 +154,13 @@ handle_walk_parent_path_ptr(
 	for (i = 0; i < pi->pi_ptrs_used; i++) {
 		p = xfs_ppinfo_to_pp(pi, i);
 		ret = path_component_change(wpli->pc, p->xpp_name,
-				strlen((char *)p->xpp_name));
+				strlen((char *)p->xpp_name), p->xpp_ino);
 		if (ret)
 			break;
 		wpli->newhandle.ha_fid.fid_ino = p->xpp_ino;
 		wpli->newhandle.ha_fid.fid_gen = p->xpp_gen;
 		path_list_add_parent_component(wpi->path, wpli->pc);
-		ret = handle_walk_parent_paths(wpi, &wpli->newhandle, 0, NULL, 0);
+		ret = handle_walk_parent_paths(wpi, &wpli->newhandle);
 		path_list_del_component(wpi->path, wpli->pc);
 		if (ret)
 			break;
@@ -199,10 +176,7 @@ handle_walk_parent_path_ptr(
 static int
 handle_walk_parent_paths(
 	struct walk_ppaths_info		*wpi,
-	struct xfs_handle		*handle,
-	uint64_t			pino,
-	char				*pname,
-	int				flags)
+	struct xfs_handle		*handle)
 {
 	struct walk_ppath_level_info	*wpli;
 	int				ret;
@@ -210,7 +184,7 @@ handle_walk_parent_paths(
 	wpli = malloc(sizeof(struct walk_ppath_level_info));
 	if (!wpli)
 		return -1;
-	wpli->pc = path_component_init("");
+	wpli->pc = path_component_init("", 0);
 	if (!wpli->pc) {
 		free(wpli);
 		return -1;
@@ -218,8 +192,8 @@ handle_walk_parent_paths(
 	wpli->wpi = wpi;
 	memcpy(&wpli->newhandle, handle, sizeof(struct xfs_handle));
 
-	ret = handle_walk_parents(wpi->fd, handle, pino, pname,
-			handle_walk_parent_path_ptr, wpli, flags);
+	ret = handle_walk_parents(wpi->fd, handle, handle_walk_parent_path_ptr,
+			wpli);
 
 	path_component_free(wpli->pc);
 	free(wpli);
@@ -234,11 +208,8 @@ int
 handle_walk_ppaths(
 	void			*hanp,
 	size_t			hlen,
-	uint64_t		pino,
-	char			*pname,
 	walk_ppath_fn		fn,
-	void			*arg,
-	int			flags)
+	void			*arg)
 {
 	struct walk_ppaths_info	wpi;
 	ssize_t			ret;
@@ -257,7 +228,7 @@ handle_walk_ppaths(
 	wpi.fn = fn;
 	wpi.arg = arg;
 
-	ret = handle_walk_parent_paths(&wpi, hanp, pino, pname, flags);
+	ret = handle_walk_parent_paths(&wpi, hanp);
 	path_list_free(wpi.path);
 
 	return ret;
@@ -270,11 +241,8 @@ handle_walk_ppaths(
 int
 fd_walk_ppaths(
 	int			fd,
-	uint64_t		pino,
-	char			*pname,
 	walk_ppath_fn		fn,
-	void			*arg,
-	int			flags)
+	void			*arg)
 {
 	struct walk_ppaths_info	wpi;
 	void			*hanp;
@@ -296,7 +264,7 @@ fd_walk_ppaths(
 	wpi.fn = fn;
 	wpi.arg = arg;
 
-	ret = handle_walk_parent_paths(&wpi, hanp, pino, pname, flags);
+	ret = handle_walk_parent_paths(&wpi, hanp);
 	path_list_free(wpi.path);
 
 	return ret;
@@ -342,8 +310,7 @@ handle_to_path(
 
 	pwi.buf = path;
 	pwi.len = pathlen;
-	return handle_walk_ppaths(hanp, hlen, 0, NULL, handle_to_path_walk,
-			&pwi, 0);
+	return handle_walk_ppaths(hanp, hlen, handle_to_path_walk, &pwi);
 }
 
 /* Return any eligible path to this file description. */
@@ -357,5 +324,5 @@ fd_to_path(
 
 	pwi.buf = path;
 	pwi.len = pathlen;
-	return fd_walk_ppaths(fd, 0, NULL, handle_to_path_walk, &pwi, 0);
+	return fd_walk_ppaths(fd, handle_to_path_walk, &pwi);
 }



* [PATCH 04/10] libfrog: remove all the parent pointer code from libhandle
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 21:03   ` [PATCH 03/10] xfs_io: move parent pointer filtering and formatting flags out of libhandle Darrick J. Wong
@ 2023-02-16 21:04   ` Darrick J. Wong
  2023-02-16 21:04   ` [PATCH 05/10] libfrog: fix indenting errors in xfss_pptr_alloc Darrick J. Wong
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:04 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Move this code out of libhandle and into libfrog.  We don't want to
expose this stuff to a userspace library until customers actually demand
it.  While we're here, fix the copyright statements and licensing tags.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 include/parent.h   |   18 ------------------
 io/parent.c        |    1 +
 libfrog/Makefile   |    2 ++
 libfrog/pptrs.c    |   22 ++++------------------
 libfrog/pptrs.h    |   27 +++++++++++++++++++++++++++
 libhandle/Makefile |    2 +-
 6 files changed, 35 insertions(+), 37 deletions(-)
 rename libhandle/parent.c => libfrog/pptrs.c (87%)
 create mode 100644 libfrog/pptrs.h


diff --git a/include/parent.h b/include/parent.h
index fb900041..4d3ad51b 100644
--- a/include/parent.h
+++ b/include/parent.h
@@ -17,22 +17,4 @@ typedef struct parent_cursor {
 	__u32	opaque[4];      /* an opaque cookie */
 } parent_cursor_t;
 
-struct path_list;
-
-typedef int (*walk_pptr_fn)(struct xfs_pptr_info *pi, struct xfs_parent_ptr *pptr,
-		void *arg);
-typedef int (*walk_ppath_fn)(const char *mntpt, struct path_list *path,
-		void *arg);
-
-#define WALK_PPTRS_ABORT	1
-int fd_walk_pptrs(int fd, walk_pptr_fn fn, void *arg);
-int handle_walk_pptrs(void *hanp, size_t hanlen, walk_pptr_fn fn, void *arg);
-
-#define WALK_PPATHS_ABORT	1
-int fd_walk_ppaths(int fd, walk_ppath_fn fn, void *arg);
-int handle_walk_ppaths(void *hanp, size_t hanlen, walk_ppath_fn fn, void *arg);
-
-int fd_to_path(int fd, char *path, size_t pathlen);
-int handle_to_path(void *hanp, size_t hlen, char *path, size_t pathlen);
-
 #endif
diff --git a/io/parent.c b/io/parent.c
index 66bb0fae..ceb62a43 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -8,6 +8,7 @@
 #include "input.h"
 #include "libfrog/paths.h"
 #include "parent.h"
+#include "libfrog/pptrs.h"
 #include "handle.h"
 #include "init.h"
 #include "io.h"
diff --git a/libfrog/Makefile b/libfrog/Makefile
index 01107082..5622ab9b 100644
--- a/libfrog/Makefile
+++ b/libfrog/Makefile
@@ -23,6 +23,7 @@ list_sort.c \
 linux.c \
 logging.c \
 paths.c \
+pptrs.c \
 projects.c \
 ptvar.c \
 radix-tree.c \
@@ -42,6 +43,7 @@ crc32table.h \
 fsgeom.h \
 logging.h \
 paths.h \
+pptrs.h \
 projects.h \
 ptvar.h \
 radix-tree.h \
diff --git a/libhandle/parent.c b/libfrog/pptrs.c
similarity index 87%
rename from libhandle/parent.c
rename to libfrog/pptrs.c
index c10a55ac..66a34246 100644
--- a/libhandle/parent.c
+++ b/libfrog/pptrs.c
@@ -1,21 +1,7 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
 /*
- * Copyright (C) 2017 Oracle.  All Rights Reserved.
- *
- * Author: Darrick J. Wong <darrick.wong@oracle.com>
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version 2
- * of the License, or (at your option) any later version.
- *
- * This program is distributed in the hope that it would be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write the Free Software Foundation,
- * Inc.,  51 Franklin St, Fifth Floor, Boston, MA  02110-1301, USA.
+ * Copyright (C) 2023 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
  */
 #include "platform_defs.h"
 #include "xfs.h"
@@ -23,7 +9,7 @@
 #include "list.h"
 #include "libfrog/paths.h"
 #include "handle.h"
-#include "parent.h"
+#include "libfrog/pptrs.h"
 
 /* Allocate a buffer large enough for some parent pointer records. */
 static inline struct xfs_pptr_info *
diff --git a/libfrog/pptrs.h b/libfrog/pptrs.h
new file mode 100644
index 00000000..d174aa2a
--- /dev/null
+++ b/libfrog/pptrs.h
@@ -0,0 +1,27 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2023 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __LIBFROG_PPTRS_H_
+#define	__LIBFROG_PPTRS_H_
+
+struct path_list;
+
+typedef int (*walk_pptr_fn)(struct xfs_pptr_info *pi, struct xfs_parent_ptr *pptr,
+		void *arg);
+typedef int (*walk_ppath_fn)(const char *mntpt, struct path_list *path,
+		void *arg);
+
+#define WALK_PPTRS_ABORT	1
+int fd_walk_pptrs(int fd, walk_pptr_fn fn, void *arg);
+int handle_walk_pptrs(void *hanp, size_t hanlen, walk_pptr_fn fn, void *arg);
+
+#define WALK_PPATHS_ABORT	1
+int fd_walk_ppaths(int fd, walk_ppath_fn fn, void *arg);
+int handle_walk_ppaths(void *hanp, size_t hanlen, walk_ppath_fn fn, void *arg);
+
+int fd_to_path(int fd, char *path, size_t pathlen);
+int handle_to_path(void *hanp, size_t hlen, char *path, size_t pathlen);
+
+#endif /* __LIBFROG_PPTRS_H_ */
diff --git a/libhandle/Makefile b/libhandle/Makefile
index cf7df67c..f297a59e 100644
--- a/libhandle/Makefile
+++ b/libhandle/Makefile
@@ -12,7 +12,7 @@ LT_AGE = 0
 
 LTLDFLAGS += -Wl,--version-script,libhandle.sym
 
-CFILES = handle.c jdm.c parent.c
+CFILES = handle.c jdm.c
 LSRCFILES = libhandle.sym
 
 default: ltdepend $(LTLIBRARY)


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 05/10] libfrog: fix indenting errors in xfs_pptr_alloc
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl Darrick J. Wong
                     ` (3 preceding siblings ...)
  2023-02-16 21:04   ` [PATCH 04/10] libfrog: remove all the parent pointer code from libhandle Darrick J. Wong
@ 2023-02-16 21:04   ` Darrick J. Wong
  2023-02-16 21:04   ` [PATCH 06/10] libfrog: return positive errno in pptrs.c Darrick J. Wong
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:04 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Fix the space-based indentation in xfs_pptr_alloc, and rename it to
alloc_pptr_buf to get rid of the xfs_ prefix.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/pptrs.c |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)


diff --git a/libfrog/pptrs.c b/libfrog/pptrs.c
index 66a34246..5a3a7e2b 100644
--- a/libfrog/pptrs.c
+++ b/libfrog/pptrs.c
@@ -13,17 +13,17 @@
 
 /* Allocate a buffer large enough for some parent pointer records. */
 static inline struct xfs_pptr_info *
-xfs_pptr_alloc(
-      size_t                  nr_ptrs)
+alloc_pptr_buf(
+	size_t			nr_ptrs)
 {
-      struct xfs_pptr_info    *pi;
+	struct xfs_pptr_info	*pi;
 
-      pi = malloc(xfs_pptr_info_sizeof(nr_ptrs));
-      if (!pi)
-              return NULL;
-      memset(pi, 0, sizeof(struct xfs_pptr_info));
-      pi->pi_ptrs_size = nr_ptrs;
-      return pi;
+	pi = malloc(xfs_pptr_info_sizeof(nr_ptrs));
+	if (!pi)
+		return NULL;
+	memset(pi, 0, sizeof(struct xfs_pptr_info));
+	pi->pi_ptrs_size = nr_ptrs;
+	return pi;
 }
 
 /* Walk all parents of the given file handle. */
@@ -39,7 +39,7 @@ handle_walk_parents(
 	unsigned int		i;
 	ssize_t			ret = -1;
 
-	pi = xfs_pptr_alloc(4);
+	pi = alloc_pptr_buf(4);
 	if (!pi)
 		return -1;
 


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 06/10] libfrog: return positive errno in pptrs.c
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl Darrick J. Wong
                     ` (4 preceding siblings ...)
  2023-02-16 21:04   ` [PATCH 05/10] libfrog: fix indenting errors in xfs_pptr_alloc Darrick J. Wong
@ 2023-02-16 21:04   ` Darrick J. Wong
  2023-02-16 21:04   ` [PATCH 07/10] libfrog: only walk one parent pointer at a time in handle_walk_parent_path_ptr Darrick J. Wong
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:04 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Make all the functions in here return 0 for success or positive errno.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 io/parent.c     |   12 +++-----
 libfrog/paths.c |   11 +++++--
 libfrog/pptrs.c |   81 +++++++++++++++++++++++++++++++++----------------------
 libfrog/pptrs.h |    6 +---
 4 files changed, 62 insertions(+), 48 deletions(-)


diff --git a/io/parent.c b/io/parent.c
index ceb62a43..25d835a3 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -68,7 +68,7 @@ print_parents(
 	else
 		ret = fd_walk_pptrs(file->fd, pptr_print, args);
 	if (ret)
-		perror(file->name);
+		fprintf(stderr, "%s: %s\n", file->name, strerror(ret));
 
 	return 0;
 }
@@ -106,14 +106,12 @@ path_print(
 	}
 
 	ret = snprintf(buf, len, "%s", mntpt);
-	if (ret != strlen(mntpt)) {
-		errno = ENOMEM;
-		return -1;
-	}
+	if (ret != strlen(mntpt))
+		return ENAMETOOLONG;
 
 	ret = path_list_to_string(path, buf + ret, len - ret);
 	if (ret < 0)
-		return ret;
+		return ENAMETOOLONG;
 
 	printf("%s\n", buf);
 	return 0;
@@ -132,7 +130,7 @@ print_paths(
  	else
 		ret = fd_walk_ppaths(file->fd, path_print, args);
 	if (ret)
-		perror(file->name);
+		fprintf(stderr, "%s: %s\n", file->name, strerror(ret));
 	return 0;
 }
 
diff --git a/libfrog/paths.c b/libfrog/paths.c
index e541e200..cc43b02c 100644
--- a/libfrog/paths.c
+++ b/libfrog/paths.c
@@ -608,7 +608,7 @@ path_component_free(
 	free(pc);
 }
 
-/* Change a path component's filename. */
+/* Change a path component's filename.  Returns 0 or positive errno. */
 int
 path_component_change(
 	struct path_component	*pc,
@@ -620,7 +620,7 @@ path_component_change(
 
 	p = realloc(pc->pc_fname, namelen + 1);
 	if (!p)
-		return -1;
+		return errno;
 	pc->pc_fname = p;
 	memcpy(pc->pc_fname, name, namelen);
 	pc->pc_fname[namelen] = 0;
@@ -628,7 +628,7 @@ path_component_change(
 	return 0;
 }
 
-/* Initialize a pathname. */
+/* Initialize a pathname.  Returns NULL on failure. */
 struct path_list *
 path_list_init(void)
 {
@@ -683,7 +683,10 @@ path_list_del_component(
 	list_del_init(&pc->pc_list);
 }
 
-/* Convert a pathname into a string. */
+/*
+ * Convert a pathname into a string.  Returns -1 if the buffer isn't long
+ * enough.
+ */
 ssize_t
 path_list_to_string(
 	struct path_list	*path,
diff --git a/libfrog/pptrs.c b/libfrog/pptrs.c
index 5a3a7e2b..ef91a919 100644
--- a/libfrog/pptrs.c
+++ b/libfrog/pptrs.c
@@ -26,7 +26,10 @@ alloc_pptr_buf(
 	return pi;
 }
 
-/* Walk all parents of the given file handle. */
+/*
+ * Walk all parents of the given file handle.  Returns 0 on success or positive
+ * errno.
+ */
 static int
 handle_walk_parents(
 	int			fd,
@@ -41,7 +44,7 @@ handle_walk_parents(
 
 	pi = alloc_pptr_buf(4);
 	if (!pi)
-		return -1;
+		return errno;
 
 	if (handle) {
 		memcpy(&pi->pi_handle, handle, sizeof(struct xfs_handle));
@@ -52,7 +55,7 @@ handle_walk_parents(
 	while (!ret) {
 		if (pi->pi_flags & XFS_PPTR_OFLAG_ROOT) {
 			ret = fn(pi, NULL, arg);
-			break;
+			goto out_pi;
 		}
 
 		for (i = 0; i < pi->pi_ptrs_used; i++) {
@@ -67,13 +70,15 @@ handle_walk_parents(
 
 		ret = ioctl(fd, XFS_IOC_GETPARENTS, pi);
 	}
+	if (ret)
+		ret = errno;
 
 out_pi:
 	free(pi);
 	return ret;
 }
 
-/* Walk all parent pointers of this handle. */
+/* Walk all parent pointers of this handle.  Returns 0 or positive errno. */
 int
 handle_walk_pptrs(
 	void			*hanp,
@@ -84,19 +89,17 @@ handle_walk_pptrs(
 	char			*mntpt;
 	int			fd;
 
-	if (hlen != sizeof(struct xfs_handle)) {
-		errno = EINVAL;
-		return -1;
-	}
+	if (hlen != sizeof(struct xfs_handle))
+		return EINVAL;
 
 	fd = handle_to_fsfd(hanp, &mntpt);
 	if (fd < 0)
-		return -1;
+		return errno;
 
 	return handle_walk_parents(fd, hanp, fn, arg);
 }
 
-/* Walk all parent pointers of this fd. */
+/* Walk all parent pointers of this fd.  Returns 0 or positive errno. */
 int
 fd_walk_pptrs(
 	int			fd,
@@ -158,6 +161,7 @@ handle_walk_parent_path_ptr(
 /*
  * Recursively walk all parents of the given file handle; if we hit the
  * fs root then we call the associated function with the constructed path.
+ * Returns 0 for success or positive errno.
  */
 static int
 handle_walk_parent_paths(
@@ -169,11 +173,12 @@ handle_walk_parent_paths(
 
 	wpli = malloc(sizeof(struct walk_ppath_level_info));
 	if (!wpli)
-		return -1;
+		return errno;
 	wpli->pc = path_component_init("", 0);
 	if (!wpli->pc) {
+		ret = errno;
 		free(wpli);
-		return -1;
+		return ret;
 	}
 	wpli->wpi = wpi;
 	memcpy(&wpli->newhandle, handle, sizeof(struct xfs_handle));
@@ -188,7 +193,7 @@ handle_walk_parent_paths(
 
 /*
  * Call the given function on all known paths from the vfs root to the inode
- * described in the handle.
+ * described in the handle.  Returns 0 for success or positive errno.
  */
 int
 handle_walk_ppaths(
@@ -200,17 +205,15 @@ handle_walk_ppaths(
 	struct walk_ppaths_info	wpi;
 	ssize_t			ret;
 
-	if (hlen != sizeof(struct xfs_handle)) {
-		errno = EINVAL;
-		return -1;
-	}
+	if (hlen != sizeof(struct xfs_handle))
+		return EINVAL;
 
 	wpi.fd = handle_to_fsfd(hanp, &wpi.mntpt);
 	if (wpi.fd < 0)
-		return -1;
+		return errno;
 	wpi.path = path_list_init();
 	if (!wpi.path)
-		return -1;
+		return errno;
 	wpi.fn = fn;
 	wpi.arg = arg;
 
@@ -222,7 +225,7 @@ handle_walk_ppaths(
 
 /*
  * Call the given function on all known paths from the vfs root to the inode
- * referred to by the file description.
+ * referred to by the file description.  Returns 0 or positive errno.
  */
 int
 fd_walk_ppaths(
@@ -238,15 +241,15 @@ fd_walk_ppaths(
 
 	ret = fd_to_handle(fd, &hanp, &hlen);
 	if (ret)
-		return ret;
+		return errno;
 
 	fsfd = handle_to_fsfd(hanp, &wpi.mntpt);
 	if (fsfd < 0)
-		return -1;
+		return errno;
 	wpi.fd = fd;
 	wpi.path = path_list_init();
 	if (!wpi.path)
-		return -1;
+		return errno;
 	wpi.fn = fn;
 	wpi.arg = arg;
 
@@ -272,19 +275,20 @@ handle_to_path_walk(
 	int			ret;
 
 	ret = snprintf(pwi->buf, pwi->len, "%s", mntpt);
-	if (ret != strlen(mntpt)) {
-		errno = ENOMEM;
-		return -1;
-	}
+	if (ret != strlen(mntpt))
+		return ENAMETOOLONG;
 
 	ret = path_list_to_string(path, pwi->buf + ret, pwi->len - ret);
 	if (ret < 0)
-		return ret;
+		return ENAMETOOLONG;
 
-	return WALK_PPATHS_ABORT;
+	return ECANCELED;
 }
 
-/* Return any eligible path to this file handle. */
+/*
+ * Return any eligible path to this file handle.  Returns 0 for success or
+ * positive errno.
+ */
 int
 handle_to_path(
 	void			*hanp,
@@ -293,13 +297,20 @@ handle_to_path(
 	size_t			pathlen)
 {
 	struct path_walk_info	pwi;
+	int			ret;
 
 	pwi.buf = path;
 	pwi.len = pathlen;
-	return handle_walk_ppaths(hanp, hlen, handle_to_path_walk, &pwi);
+	ret = handle_walk_ppaths(hanp, hlen, handle_to_path_walk, &pwi);
+	if (ret == ECANCELED)
+		return 0;
+	return ret;
 }
 
-/* Return any eligible path to this file description. */
+/*
+ * Return any eligible path to this file description.  Returns 0 for success
+ * or positive errno.
+ */
 int
 fd_to_path(
 	int			fd,
@@ -307,8 +318,12 @@ fd_to_path(
 	size_t			pathlen)
 {
 	struct path_walk_info	pwi;
+	int			ret;
 
 	pwi.buf = path;
 	pwi.len = pathlen;
-	return fd_walk_ppaths(fd, handle_to_path_walk, &pwi);
+	ret = fd_walk_ppaths(fd, handle_to_path_walk, &pwi);
+	if (ret == ECANCELED)
+		return 0;
+	return ret;
 }
diff --git a/libfrog/pptrs.h b/libfrog/pptrs.h
index d174aa2a..1666de06 100644
--- a/libfrog/pptrs.h
+++ b/libfrog/pptrs.h
@@ -8,16 +8,14 @@
 
 struct path_list;
 
-typedef int (*walk_pptr_fn)(struct xfs_pptr_info *pi, struct xfs_parent_ptr *pptr,
-		void *arg);
+typedef int (*walk_pptr_fn)(struct xfs_pptr_info *pi,
+		struct xfs_parent_ptr *pptr, void *arg);
 typedef int (*walk_ppath_fn)(const char *mntpt, struct path_list *path,
 		void *arg);
 
-#define WALK_PPTRS_ABORT	1
 int fd_walk_pptrs(int fd, walk_pptr_fn fn, void *arg);
 int handle_walk_pptrs(void *hanp, size_t hanlen, walk_pptr_fn fn, void *arg);
 
-#define WALK_PPATHS_ABORT	1
 int fd_walk_ppaths(int fd, walk_ppath_fn fn, void *arg);
 int handle_walk_ppaths(void *hanp, size_t hanlen, walk_ppath_fn fn, void *arg);
 


^ permalink raw reply related	[flat|nested] 227+ messages in thread
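
The return convention that patch 06 settles on can be sketched in isolation.
This is only an illustration under stated assumptions, not libfrog code:
walk_items, take_first, reject_negative and first_item are hypothetical names,
and a plain int array stands in for the parent pointer records.  The walker
passes any nonzero callback return straight back to its caller, and the
"give me any one result" wrapper uses ECANCELED as a private stop signal that
it translates back to 0, in the same way handle_to_path does above.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: return 0 to continue or positive errno. */
typedef int (*walk_fn)(int item, void *arg);

/* Walk every item.  Returns 0 on success or the callback's positive errno. */
int walk_items(const int *items, size_t nr, walk_fn fn, void *arg)
{
	size_t	i;
	int	ret;

	for (i = 0; i < nr; i++) {
		ret = fn(items[i], arg);
		if (ret)
			return ret;	/* real error or a stop request */
	}
	return 0;
}

/* Remember the first item seen, then ask the walker to stop early. */
static int take_first(int item, void *arg)
{
	*(int *)arg = item;
	return ECANCELED;
}

/* Example of a callback that reports a genuine error. */
int reject_negative(int item, void *arg)
{
	(void)arg;
	return item < 0 ? EINVAL : 0;
}

/* Return the first item via *out.  Returns 0 or positive errno. */
int first_item(const int *items, size_t nr, int *out)
{
	int	ret = walk_items(items, nr, take_first, out);

	/* ECANCELED only means "found one"; do not report it as a failure. */
	if (ret == ECANCELED)
		return 0;
	return ret;
}
```

Because no layer touches errno on the way out, a caller can hand the return
value directly to strerror(), which is what the io/parent.c hunks in this
patch do.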

* [PATCH 07/10] libfrog: only walk one parent pointer at a time in handle_walk_parent_path_ptr
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl Darrick J. Wong
                     ` (5 preceding siblings ...)
  2023-02-16 21:04   ` [PATCH 06/10] libfrog: return positive errno in pptrs.c Darrick J. Wong
@ 2023-02-16 21:04   ` Darrick J. Wong
  2023-02-16 21:05   ` [PATCH 08/10] libfrog: trim trailing slashes when printing pptr paths Darrick J. Wong
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:04 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

handle_walk_parents already walks each returned parent pointer record,
so we don't need a loop in handle_walk_parent_path_ptr.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/pptrs.c |   24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)


diff --git a/libfrog/pptrs.c b/libfrog/pptrs.c
index ef91a919..8d9e62a2 100644
--- a/libfrog/pptrs.c
+++ b/libfrog/pptrs.c
@@ -134,26 +134,22 @@ handle_walk_parent_path_ptr(
 {
 	struct walk_ppath_level_info	*wpli = arg;
 	struct walk_ppaths_info		*wpi = wpli->wpi;
-	unsigned int			i;
 	int				ret = 0;
 
 	if (pi->pi_flags & XFS_PPTR_OFLAG_ROOT)
 		return wpi->fn(wpi->mntpt, wpi->path, wpi->arg);
 
-	for (i = 0; i < pi->pi_ptrs_used; i++) {
-		p = xfs_ppinfo_to_pp(pi, i);
-		ret = path_component_change(wpli->pc, p->xpp_name,
+	ret = path_component_change(wpli->pc, p->xpp_name,
 				strlen((char *)p->xpp_name), p->xpp_ino);
-		if (ret)
-			break;
-		wpli->newhandle.ha_fid.fid_ino = p->xpp_ino;
-		wpli->newhandle.ha_fid.fid_gen = p->xpp_gen;
-		path_list_add_parent_component(wpi->path, wpli->pc);
-		ret = handle_walk_parent_paths(wpi, &wpli->newhandle);
-		path_list_del_component(wpi->path, wpli->pc);
-		if (ret)
-			break;
-	}
+	if (ret)
+		return ret;
+
+	wpli->newhandle.ha_fid.fid_ino = p->xpp_ino;
+	wpli->newhandle.ha_fid.fid_gen = p->xpp_gen;
+
+	path_list_add_parent_component(wpi->path, wpli->pc);
+	ret = handle_walk_parent_paths(wpi, &wpli->newhandle);
+	path_list_del_component(wpi->path, wpli->pc);
 
 	return ret;
 }


^ permalink raw reply related	[flat|nested] 227+ messages in thread
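
The division of labour described in patch 07 can be sketched with a toy inode
table.  Everything here is hypothetical (struct pptr, struct inode, table,
walk_parents, path_step, count_paths); none of it is the libfrog API.  The
point it illustrates is that the outer walker owns the loop over records, so
the per-record callback handles exactly one record, recurses into that parent,
and emits a path when it is called with NULL at the root.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH	16

struct pptr {			/* one parent pointer record */
	int		parent;
	const char	*name;
};

struct inode {			/* nr == 0 marks the root directory */
	int		ino;
	int		nr;
	struct pptr	p[2];
};

/* Toy filesystem: /a/f and /b/g are hard links to inode 4. */
static const struct inode table[] = {
	{ 1, 0, { { 0, NULL } } },
	{ 2, 1, { { 1, "a" } } },
	{ 3, 1, { { 1, "b" } } },
	{ 4, 2, { { 2, "f" }, { 3, "g" } } },
};

struct walk {
	const char	*names[MAX_DEPTH];	/* leaf first, root last */
	int		depth;
	int		npaths;
	char		last[256];		/* most recent full path */
};

typedef int (*rec_fn)(const struct pptr *p, void *arg);

static const struct inode *lookup(int ino)
{
	size_t	i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		if (table[i].ino == ino)
			return &table[i];
	return NULL;
}

/* Outer walker: loops over every record; calls fn(NULL) at the root. */
static int walk_parents(int ino, rec_fn fn, void *arg)
{
	const struct inode	*ip = lookup(ino);
	int			i, ret;

	if (!ip)
		return ENOENT;
	if (ip->nr == 0)
		return fn(NULL, arg);
	for (i = 0; i < ip->nr; i++) {
		ret = fn(&ip->p[i], arg);
		if (ret)
			return ret;
	}
	return 0;
}

/* Per-record callback: no loop here, just one record and one recursion. */
static int path_step(const struct pptr *p, void *arg)
{
	struct walk	*w = arg;
	size_t		off = 0;
	int		i, ret;

	if (!p) {
		/* Reached the root, so join the components in root-first order. */
		w->last[0] = '\0';
		for (i = w->depth - 1; i >= 0; i--)
			off += snprintf(w->last + off, sizeof(w->last) - off,
					"/%s", w->names[i]);
		w->npaths++;
		return 0;
	}
	if (w->depth == MAX_DEPTH)
		return ENAMETOOLONG;
	w->names[w->depth++] = p->name;
	ret = walk_parents(p->parent, path_step, w);
	w->depth--;
	return ret;
}

/* Count every path from the root to @ino.  Returns 0 or positive errno. */
int count_paths(int ino, struct walk *w)
{
	memset(w, 0, sizeof(*w));
	return walk_parents(ino, path_step, w);
}
```

If path_step also looped over the records, each record would be visited once
per sibling record and paths would be reported more than once.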

* [PATCH 08/10] libfrog: trim trailing slashes when printing pptr paths
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl Darrick J. Wong
                     ` (6 preceding siblings ...)
  2023-02-16 21:04   ` [PATCH 07/10] libfrog: only walk one parent pointer at a time in handle_walk_parent_path_ptr Darrick J. Wong
@ 2023-02-16 21:05   ` Darrick J. Wong
  2023-02-16 21:05   ` [PATCH 09/10] xfs_io: parent command is not experts-only Darrick J. Wong
  2023-02-16 21:05   ` [PATCH 10/10] xfs_scrub: use parent pointers when possible to report file operations Darrick J. Wong
  9 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:05 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Trim the trailing slashes in the mountpoint string when we're printing
parent pointer paths.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 io/parent.c     |    9 +++++++--
 libfrog/pptrs.c |    9 +++++++--
 2 files changed, 14 insertions(+), 4 deletions(-)


diff --git a/io/parent.c b/io/parent.c
index 25d835a3..694c0839 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -97,6 +97,7 @@ path_print(
 	struct pptr_args	*args = arg;
 	char			buf[PATH_MAX];
 	size_t			len = PATH_MAX;
+	int			mntpt_len = strlen(mntpt);
 	int			ret;
 
 	if (args->filter_ino || args->filter_name) {
@@ -105,8 +106,12 @@ path_print(
 			return 0;
 	}
 
-	ret = snprintf(buf, len, "%s", mntpt);
-	if (ret != strlen(mntpt))
+	/* Trim trailing slashes from the mountpoint */
+	while (mntpt_len > 0 && mntpt[mntpt_len - 1] == '/')
+		mntpt_len--;
+
+	ret = snprintf(buf, len, "%.*s", mntpt_len, mntpt);
+	if (ret != mntpt_len)
 		return ENAMETOOLONG;
 
 	ret = path_list_to_string(path, buf + ret, len - ret);
diff --git a/libfrog/pptrs.c b/libfrog/pptrs.c
index 8d9e62a2..61fd1fb9 100644
--- a/libfrog/pptrs.c
+++ b/libfrog/pptrs.c
@@ -268,10 +268,15 @@ handle_to_path_walk(
 	void			*arg)
 {
 	struct path_walk_info	*pwi = arg;
+	int			mntpt_len = strlen(mntpt);
 	int			ret;
 
-	ret = snprintf(pwi->buf, pwi->len, "%s", mntpt);
-	if (ret != strlen(mntpt))
+	/* Trim trailing slashes from the mountpoint */
+	while (mntpt_len > 0 && mntpt[mntpt_len - 1] == '/')
+		mntpt_len--;
+
+	ret = snprintf(pwi->buf, pwi->len, "%.*s", mntpt_len, mntpt);
+	if (ret != mntpt_len)
 		return ENAMETOOLONG;
 
 	ret = path_list_to_string(path, pwi->buf + ret, pwi->len - ret);


^ permalink raw reply related	[flat|nested] 227+ messages in thread
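
The trimming in patch 08 relies on the "%.*s" precision argument to copy only
the first mntpt_len bytes of the mountpoint.  A standalone sketch follows;
format_mntpt is a hypothetical helper, not a function in xfsprogs.  One
deliberate difference from the posted hunks, flagged here as an assumption:
snprintf returns the length it wanted to write, so the sketch treats any
return value at or above the buffer size as truncation.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy @mntpt into @buf without its trailing slashes.  Returns 0 or
 * positive errno.
 */
int format_mntpt(char *buf, size_t len, const char *mntpt)
{
	int	mntpt_len = strlen(mntpt);
	int	ret;

	/* Trim trailing slashes from the mountpoint */
	while (mntpt_len > 0 && mntpt[mntpt_len - 1] == '/')
		mntpt_len--;

	/* The precision limits how many bytes of mntpt are printed. */
	ret = snprintf(buf, len, "%.*s", mntpt_len, mntpt);
	if (ret < 0 || (size_t)ret >= len)
		return ENAMETOOLONG;
	return 0;
}
```

A mountpoint of "/" trims down to the empty string, which works out because
every path component that is appended afterwards brings its own leading
slash.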

* [PATCH 09/10] xfs_io: parent command is not experts-only
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl Darrick J. Wong
                     ` (7 preceding siblings ...)
  2023-02-16 21:05   ` [PATCH 08/10] libfrog: trim trailing slashes when printing pptr paths Darrick J. Wong
@ 2023-02-16 21:05   ` Darrick J. Wong
  2023-02-16 21:05   ` [PATCH 10/10] xfs_scrub: use parent pointers when possible to report file operations Darrick J. Wong
  9 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:05 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

This command isn't dangerous, so don't make it experts-only.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 io/parent.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)


diff --git a/io/parent.c b/io/parent.c
index 694c0839..36522f26 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -275,6 +275,5 @@ parent_init(void)
 	parent_cmd.oneline = _("print parent inodes");
 	parent_cmd.help = parent_help;
 
-	if (expert)
-		add_command(&parent_cmd);
+	add_command(&parent_cmd);
 }


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 10/10] xfs_scrub: use parent pointers when possible to report file operations
  2023-02-16 20:29 ` [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl Darrick J. Wong
                     ` (8 preceding siblings ...)
  2023-02-16 21:05   ` [PATCH 09/10] xfs_io: parent command is not experts-only Darrick J. Wong
@ 2023-02-16 21:05   ` Darrick J. Wong
  9 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:05 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If parent pointers are available, use them to supply file paths when
reporting on files, instead of merely printing the inode number.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 scrub/common.c |   21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)


diff --git a/scrub/common.c b/scrub/common.c
index 49a87f41..9f3cde9b 100644
--- a/scrub/common.c
+++ b/scrub/common.c
@@ -12,6 +12,7 @@
 #include "xfs_scrub.h"
 #include "common.h"
 #include "progress.h"
+#include "libfrog/pptrs.h"
 
 extern char		*progname;
 
@@ -407,6 +408,26 @@ scrub_render_ino_descr(
 	uint32_t		agino;
 	int			ret;
 
+	if (ctx->mnt.fsgeom.flags & XFS_FSOP_GEOM_FLAGS_PARENT) {
+		struct xfs_handle handle;
+
+		memcpy(&handle.ha_fsid, ctx->fshandle, sizeof(handle.ha_fsid));
+		handle.ha_fid.fid_len = sizeof(xfs_fid_t) -
+				sizeof(handle.ha_fid.fid_len);
+		handle.ha_fid.fid_pad = 0;
+		handle.ha_fid.fid_ino = ino;
+		handle.ha_fid.fid_gen = gen;
+
+		ret = handle_to_path(&handle, sizeof(struct xfs_handle), buf,
+				buflen);
+		/*
+		 * If successful, return any positive integer to use the
+		 * formatted error string.
+		 */
+		if (ret == 0)
+			return 1;
+	}
+
 	agno = cvt_ino_to_agno(&ctx->mnt, ino);
 	agino = cvt_ino_to_agino(&ctx->mnt, ino);
 	ret = snprintf(buf, buflen, _("inode %"PRIu64" (%"PRIu32"/%"PRIu32")%s"),


^ permalink raw reply related	[flat|nested] 227+ messages in thread
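
The fallback order in patch 10 can be sketched without any XFS headers.  All
names below are hypothetical: describe_inode stands in for
scrub_render_ino_descr, the lookup callback stands in for handle_to_path, and
agino_log is assumed to be the number of low bits of the inode number that
hold the AG-relative inode number.  The sketch tries the path lookup first and
only falls back to the "inode N (agno/agino)" form when that fails.

```c
#include <errno.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical path lookup: returns 0 and fills @buf, or positive errno. */
typedef int (*path_lookup_fn)(uint64_t ino, char *buf, size_t buflen);

/* Stub for a filesystem with parent pointers: every inode is /mnt/a/f. */
int stub_path(uint64_t ino, char *buf, size_t buflen)
{
	const char	*path = "/mnt/a/f";

	(void)ino;
	if (strlen(path) >= buflen)
		return ENAMETOOLONG;
	memcpy(buf, path, strlen(path) + 1);
	return 0;
}

/* Stub for a filesystem without parent pointers. */
int stub_no_pptrs(uint64_t ino, char *buf, size_t buflen)
{
	(void)ino; (void)buf; (void)buflen;
	return ENOENT;
}

/* Describe an inode; returns a positive value when @buf was formatted. */
int describe_inode(char *buf, size_t buflen, uint64_t ino,
		unsigned int agino_log, path_lookup_fn lookup)
{
	uint32_t	agno, agino;

	/* Prefer a real path when the lookup succeeds. */
	if (lookup && lookup(ino, buf, buflen) == 0)
		return 1;

	/* Otherwise split the inode number into AG number and AG inode. */
	agno = (uint32_t)(ino >> agino_log);
	agino = (uint32_t)(ino & ((1ULL << agino_log) - 1));
	return snprintf(buf, buflen, "inode %" PRIu64 " (%" PRIu32 "/%" PRIu32 ")",
			ino, agno, agino);
}
```

Any lookup failure simply falls through to the numeric form, so the message
is never worse than it was before the patch.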

* [PATCH 1/4] libxfs: add xfile support
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/4] xfsprogs: offline fsck support patches Darrick J. Wong
@ 2023-02-16 21:06   ` Darrick J. Wong
  2023-02-16 21:06   ` [PATCH 2/4] xfs: track file link count updates during live nlinks fsck Darrick J. Wong
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:06 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Port the xfile functionality (anonymous pageable file-index memory) from
the kernel.  In userspace, we try to use memfd_create() to create tmpfs files
that are not in any namespace, matching the kernel.
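
The "file as pageable memory" idea can be sketched with plain POSIX calls.
This is an illustration under stated assumptions, not the libxfs code:
scratch_open, obj_store and obj_load are hypothetical names, and the sketch
uses mkstemp plus unlink where the patch prefers memfd_create.  As in the
xfile helpers, errors are negative errno values and any short read or write
is reported as -ENOMEM, because the caller treats the file as memory.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Open an unlinked scratch file.  Returns an fd or negative errno. */
int scratch_open(void)
{
	/* mkstemp rewrites the template, so it must be a writable array. */
	char	tmpl[] = "/tmp/xfile-sketch-XXXXXX";
	int	fd = mkstemp(tmpl);

	if (fd < 0)
		return -errno;
	unlink(tmpl);	/* the data is now reachable only through fd */
	return fd;
}

/* Store an object at @pos.  Any error or short write is -ENOMEM. */
int obj_store(int fd, const void *buf, size_t count, off_t pos)
{
	ssize_t	ret = pwrite(fd, buf, count, pos);

	if (ret < 0 || (size_t)ret != count)
		return -ENOMEM;
	return 0;
}

/* Load an object from @pos.  Any error or short read is -ENOMEM. */
int obj_load(int fd, void *buf, size_t count, off_t pos)
{
	ssize_t	ret = pread(fd, buf, count, pos);

	if (ret < 0 || (size_t)ret != count)
		return -ENOMEM;
	return 0;
}
```

Offsets act as array indices: a caller that stores fixed-size records at
index * sizeof(record) gets a sparse, swappable array, and reading a slot
past the end of the file fails instead of returning partial data.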

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 configure.ac          |    3 +
 include/builddefs.in  |    3 +
 libxfs/Makefile       |   12 +++
 libxfs/xfile.c        |  224 +++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfile.h        |   56 ++++++++++++
 m4/package_libcdev.m4 |   50 +++++++++++
 repair/xfs_repair.c   |   15 +++
 7 files changed, 363 insertions(+)
 create mode 100644 libxfs/xfile.c
 create mode 100644 libxfs/xfile.h


diff --git a/configure.ac b/configure.ac
index 63cc18cc..2472b32f 100644
--- a/configure.ac
+++ b/configure.ac
@@ -251,6 +251,9 @@ AC_CHECK_SIZEOF([char *])
 AC_TYPE_UMODE_T
 AC_MANUAL_FORMAT
 AC_HAVE_LIBURCU_ATOMIC64
+AC_HAVE_MEMFD_CLOEXEC
+AC_HAVE_O_TMPFILE
+AC_HAVE_MKOSTEMP_CLOEXEC
 
 AC_CONFIG_FILES([include/builddefs])
 AC_OUTPUT
diff --git a/include/builddefs.in b/include/builddefs.in
index e0a2f3cb..60c1320a 100644
--- a/include/builddefs.in
+++ b/include/builddefs.in
@@ -127,6 +127,9 @@ SYSTEMD_SYSTEM_UNIT_DIR = @systemd_system_unit_dir@
 HAVE_CROND = @have_crond@
 CROND_DIR = @crond_dir@
 HAVE_LIBURCU_ATOMIC64 = @have_liburcu_atomic64@
+HAVE_MEMFD_CLOEXEC = @have_memfd_cloexec@
+HAVE_O_TMPFILE = @have_o_tmpfile@
+HAVE_MKOSTEMP_CLOEXEC = @have_mkostemp_cloexec@
 
 GCCFLAGS = -funsigned-char -fno-strict-aliasing -Wall
 #	   -Wbitwise -Wno-transparent-union -Wno-old-initializer -Wno-decl
diff --git a/libxfs/Makefile b/libxfs/Makefile
index 89d29dc9..17978006 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -26,6 +26,7 @@ HFILES = \
 	libxfs_priv.h \
 	linux-err.h \
 	topology.h \
+	xfile.h \
 	xfs_ag_resv.h \
 	xfs_alloc.h \
 	xfs_alloc_btree.h \
@@ -66,6 +67,7 @@ CFILES = cache.c \
 	topology.c \
 	trans.c \
 	util.c \
+	xfile.c \
 	xfs_ag.c \
 	xfs_ag_resv.c \
 	xfs_alloc.c \
@@ -113,6 +115,16 @@ CFILES = cache.c \
 #
 #LCFLAGS +=
 
+ifeq ($(HAVE_MEMFD_CLOEXEC),yes)
+	LCFLAGS += -DHAVE_MEMFD_CLOEXEC
+endif
+ifeq ($(HAVE_O_TMPFILE),yes)
+	LCFLAGS += -DHAVE_O_TMPFILE
+endif
+ifeq ($(HAVE_MKOSTEMP_CLOEXEC),yes)
+	LCFLAGS += -DHAVE_MKOSTEMP_CLOEXEC
+endif
+
 FCFLAGS = -I.
 
 LTLIBS = $(LIBPTHREAD) $(LIBRT)
diff --git a/libxfs/xfile.c b/libxfs/xfile.c
new file mode 100644
index 00000000..f551aef5
--- /dev/null
+++ b/libxfs/xfile.c
@@ -0,0 +1,224 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs_priv.h"
+#include "libxfs.h"
+#include "libxfs/xfile.h"
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+
+/*
+ * Swappable Temporary Memory
+ * ==========================
+ *
+ * Offline checking sometimes needs to be able to stage a large amount of data
+ * in memory.  This information might not fit in the available memory and it
+ * doesn't all need to be accessible at all times.  In other words, we want an
+ * indexed data buffer to store data that can be paged out.
+ *
+ * memfd files meet those requirements.  Therefore, the xfile mechanism uses
+ * one to store our staging data.  The xfile must be freed with xfile_destroy.
+ *
+ * xfiles assume that the caller will handle all required concurrency
+ * management; file locks are not taken.
+ */
+
+/*
+ * Open a memory-backed fd to back an xfile.  We require close-on-exec here,
+ * because these memfd files function as windowed RAM and hence should never
+ * be shared with other processes.
+ */
+static int
+xfile_create_fd(
+	const char		*description)
+{
+	int			fd = -1;
+
+#ifdef HAVE_MEMFD_CLOEXEC
+	/* memfd_create exists in kernel 3.17 (2014) and glibc 2.27 (2018). */
+	fd = memfd_create(description, MFD_CLOEXEC);
+	if (fd >= 0)
+		return fd;
+#endif
+
+#ifdef HAVE_O_TMPFILE
+	/*
+	 * O_TMPFILE exists as of kernel 3.11 (2013), which means that if we
+	 * find it, we're pretty safe in assuming O_CLOEXEC exists too.
+	 */
+	fd = open("/dev/shm", O_TMPFILE | O_CLOEXEC | O_RDWR, 0600);
+	if (fd >= 0)
+		return fd;
+
+	fd = open("/tmp", O_TMPFILE | O_CLOEXEC | O_RDWR, 0600);
+	if (fd >= 0)
+		return fd;
+#endif
+
+#ifdef HAVE_MKOSTEMP_CLOEXEC
+	/*
+	 * mkostemp exists as of glibc 2.7 (2007) and O_CLOEXEC exists as of
+	 * kernel 2.6.23 (2007).
+	 */
+	fd = mkostemp((char []){ "libxfsXXXXXX" }, O_CLOEXEC);
+	if (fd >= 0)
+		return fd;
+#endif
+
+#if !defined(HAVE_MEMFD_CLOEXEC) && \
+    !defined(HAVE_O_TMPFILE) && \
+    !defined(HAVE_MKOSTEMP_CLOEXEC)
+# error System needs memfd_create, O_TMPFILE, or mkostemp to build!
+#endif
+
+	return fd;
+}
+
+/*
+ * Create an xfile of the given size.  The description will be used in the
+ * trace output.
+ */
+int
+xfile_create(
+	struct xfs_mount	*mp,
+	const char		*description,
+	struct xfile		**xfilep)
+{
+	struct xfile		*xf;
+	char			fname[MAXNAMELEN];
+	int			error;
+
+	snprintf(fname, MAXNAMELEN - 1, "XFS (%s): %s", mp->m_fsname,
+			description);
+	fname[MAXNAMELEN - 1] = 0;
+
+	xf = kmem_alloc(sizeof(struct xfile), KM_MAYFAIL);
+	if (!xf)
+		return -ENOMEM;
+
+	xf->fd = xfile_create_fd(fname);
+	if (xf->fd < 0) {
+		error = -errno;
+		kmem_free(xf);
+		return error;
+	}
+
+	*xfilep = xf;
+	return 0;
+}
+
+/* Close the file and release all resources. */
+void
+xfile_destroy(
+	struct xfile		*xf)
+{
+	close(xf->fd);
+	kmem_free(xf);
+}
+
+static inline loff_t
+xfile_maxbytes(
+	struct xfile		*xf)
+{
+	if (sizeof(loff_t) == 8)
+		return LLONG_MAX;
+	return LONG_MAX;
+}
+
+/*
+ * Read a memory object directly from the xfile's page cache.  Unlike regular
+ * pread, we return -E2BIG and -EFBIG for reads that are too large or at too
+ * high an offset, instead of truncating the read.  Otherwise, we return
+ * bytes read or an error code, like regular pread.
+ */
+ssize_t
+xfile_pread(
+	struct xfile		*xf,
+	void			*buf,
+	size_t			count,
+	loff_t			pos)
+{
+	ssize_t			ret;
+
+	if (count > INT_MAX)
+		return -E2BIG;
+	if (xfile_maxbytes(xf) - pos < count)
+		return -EFBIG;
+
+	ret = pread(xf->fd, buf, count, pos);
+	if (ret >= 0)
+		return ret;
+	return -errno;
+}
+
+/*
+ * Write a memory object directly to the xfile's page cache.  Unlike regular
+ * pwrite, we return -E2BIG and -EFBIG for writes that are too large or at too
+ * high an offset, instead of truncating the write.  Otherwise, we return
+ * bytes written or an error code, like regular pwrite.
+ */
+ssize_t
+xfile_pwrite(
+	struct xfile		*xf,
+	const void		*buf,
+	size_t			count,
+	loff_t			pos)
+{
+	ssize_t			ret;
+
+	if (count > INT_MAX)
+		return -E2BIG;
+	if (xfile_maxbytes(xf) - pos < count)
+		return -EFBIG;
+
+	ret = pwrite(xf->fd, buf, count, pos);
+	if (ret >= 0)
+		return ret;
+	return -errno;
+}
+
+/* Query stat information for an xfile. */
+int
+xfile_stat(
+	struct xfile		*xf,
+	struct xfile_stat	*statbuf)
+{
+	struct stat		ks;
+	int			error;
+
+	error = fstat(xf->fd, &ks);
+	if (error)
+		return -errno;
+
+	statbuf->size = ks.st_size;
+	statbuf->bytes = (unsigned long long)ks.st_blocks << 9;
+	return 0;
+}
+
+/* Dump an xfile to stdout. */
+int
+xfile_dump(
+	struct xfile		*xf)
+{
+	char			*argv[] = {"od", "-tx1", "-Ad", "-c", NULL};
+	pid_t			child;
+	int			i;
+
+	child = fork();
+	if (child != 0) {
+		int		wstatus;
+
+		wait(&wstatus);
+		return wstatus == 0 ? 0 : -EIO;
+	}
+
+	/* reroute our xfile to stdin and shut everything else */
+	dup2(xf->fd, 0);
+	for (i = 3; i < 1024; i++)
+		close(i);
+
+	return execvp("od", argv);
+}
diff --git a/libxfs/xfile.h b/libxfs/xfile.h
new file mode 100644
index 00000000..1389ff8f
--- /dev/null
+++ b/libxfs/xfile.h
@@ -0,0 +1,56 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __LIBXFS_XFILE_H__
+#define __LIBXFS_XFILE_H__
+
+struct xfile {
+	int		fd;
+};
+
+int xfile_create(struct xfs_mount *mp, const char *description,
+		struct xfile **xfilep);
+void xfile_destroy(struct xfile *xf);
+
+ssize_t xfile_pread(struct xfile *xf, void *buf, size_t count, loff_t pos);
+ssize_t xfile_pwrite(struct xfile *xf, const void *buf, size_t count, loff_t pos);
+
+/*
+ * Load an object.  Since we're treating this file as "memory", any error or
+ * short IO is treated as a failure to allocate memory.
+ */
+static inline int
+xfile_obj_load(struct xfile *xf, void *buf, size_t count, loff_t pos)
+{
+	ssize_t	ret = xfile_pread(xf, buf, count, pos);
+
+	if (ret < 0 || ret != count)
+		return -ENOMEM;
+	return 0;
+}
+
+/*
+ * Store an object.  Since we're treating this file as "memory", any error or
+ * short IO is treated as a failure to allocate memory.
+ */
+static inline int
+xfile_obj_store(struct xfile *xf, const void *buf, size_t count, loff_t pos)
+{
+	ssize_t	ret = xfile_pwrite(xf, buf, count, pos);
+
+	if (ret < 0 || ret != count)
+		return -ENOMEM;
+	return 0;
+}
+
+struct xfile_stat {
+	loff_t			size;
+	unsigned long long	bytes;
+};
+
+int xfile_stat(struct xfile *xf, struct xfile_stat *statbuf);
+int xfile_dump(struct xfile *xf);
+
+#endif /* __LIBXFS_XFILE_H__ */
diff --git a/m4/package_libcdev.m4 b/m4/package_libcdev.m4
index bb1ab49c..119d1bda 100644
--- a/m4/package_libcdev.m4
+++ b/m4/package_libcdev.m4
@@ -507,3 +507,53 @@ AC_DEFUN([AC_PACKAGE_CHECK_LTO],
     AC_SUBST(lto_cflags)
     AC_SUBST(lto_ldflags)
   ])
+
+#
+# Check if we have a memfd_create syscall with a MFD_CLOEXEC flag
+#
+AC_DEFUN([AC_HAVE_MEMFD_CLOEXEC],
+  [ AC_MSG_CHECKING([for memfd_create and MFD_CLOEXEC])
+    AC_LINK_IFELSE([AC_LANG_PROGRAM([[
+#define _GNU_SOURCE
+#include <sys/mman.h>
+    ]], [[
+         return memfd_create("xfs", MFD_CLOEXEC);
+    ]])],[have_memfd_cloexec=yes
+       AC_MSG_RESULT(yes)],[AC_MSG_RESULT(no)])
+    AC_SUBST(have_memfd_cloexec)
+  ])
+
+#
+# Check if we have the O_TMPFILE flag
+#
+AC_DEFUN([AC_HAVE_O_TMPFILE],
+  [ AC_MSG_CHECKING([for O_TMPFILE])
+    AC_LINK_IFELSE([AC_LANG_PROGRAM([[
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+    ]], [[
+         return open("nowhere", O_TMPFILE, 0600);
+    ]])],[have_o_tmpfile=yes
+       AC_MSG_RESULT(yes)],[AC_MSG_RESULT(no)])
+    AC_SUBST(have_o_tmpfile)
+  ])
+
+#
+# Check if we have mkostemp with the O_CLOEXEC flag
+#
+AC_DEFUN([AC_HAVE_MKOSTEMP_CLOEXEC],
+  [ AC_MSG_CHECKING([for mkostemp and O_CLOEXEC])
+    AC_LINK_IFELSE([AC_LANG_PROGRAM([[
+#define _GNU_SOURCE
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <fcntl.h>
+#include <stdlib.h>
+    ]], [[
+         return mkostemp("nowhere", O_CLOEXEC);
+    ]])],[have_mkostemp_cloexec=yes
+       AC_MSG_RESULT(yes)],[AC_MSG_RESULT(no)])
+    AC_SUBST(have_mkostemp_cloexec)
+  ])
diff --git a/repair/xfs_repair.c b/repair/xfs_repair.c
index ff29bea9..65cb9387 100644
--- a/repair/xfs_repair.c
+++ b/repair/xfs_repair.c
@@ -953,6 +953,20 @@ phase_end(int phase)
 		platform_crash();
 }
 
+/* Try to allow as many memfds as possible. */
+static void
+bump_max_fds(void)
+{
+	struct rlimit	rlim = { };
+	int		ret;
+
+	ret = getrlimit(RLIMIT_NOFILE, &rlim);
+	if (!ret) {
+		rlim.rlim_cur = rlim.rlim_max;
+		setrlimit(RLIMIT_NOFILE, &rlim);
+	}
+}
+
 int
 main(int argc, char **argv)
 {
@@ -972,6 +986,7 @@ main(int argc, char **argv)
 	bindtextdomain(PACKAGE, LOCALEDIR);
 	textdomain(PACKAGE);
 	dinode_bmbt_translation_init();
+	bump_max_fds();
 
 	temp_mp = &xfs_m;
 	setbuf(stdout, NULL);
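
For readers following along: the xfile_obj_load/xfile_obj_store helpers in
xfile.h treat the xfile as "memory", so any error or short I/O collapses into
-ENOMEM.  Below is a minimal standalone sketch of that convention.  It is not
the libxfs code: the memobj_* names are made up for illustration, and the
backing store is an unlinked tmpfile() instead of whatever xfile_create sets
up.

```c
#define _POSIX_C_SOURCE 200809L	/* for pread(), pwrite(), fileno() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for struct xfile. */
struct memobj {
	FILE	*fp;	/* keeps the anonymous temp file alive */
	int	fd;	/* descriptor used for positioned I/O */
};

/* Back the object with an unlinked temporary file.  Returns 0 or -errno. */
static int
memobj_create(struct memobj *mo)
{
	mo->fp = tmpfile();
	if (!mo->fp)
		return -errno;
	mo->fd = fileno(mo->fp);
	return 0;
}

/* Release the backing file. */
static void
memobj_destroy(struct memobj *mo)
{
	fclose(mo->fp);
}

/* Store an object; any error or short write counts as "out of memory". */
static int
memobj_store(struct memobj *mo, const void *buf, size_t count, off_t pos)
{
	ssize_t	ret;

	assert(mo->fp != NULL);
	ret = pwrite(mo->fd, buf, count, pos);
	if (ret < 0 || (size_t)ret != count)
		return -ENOMEM;
	return 0;
}

/* Load an object; a read past EOF comes back short and fails the same way. */
static int
memobj_load(struct memobj *mo, void *buf, size_t count, off_t pos)
{
	ssize_t	ret;

	assert(mo->fp != NULL);
	ret = pread(mo->fd, buf, count, pos);
	if (ret < 0 || (size_t)ret != count)
		return -ENOMEM;
	return 0;
}
```

A caller would memobj_store() a record at some offset and later
memobj_load() the same number of bytes from the same offset; asking for bytes
that were never written returns -ENOMEM, not a partial result.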



* [PATCH 2/4] xfs: track file link count updates during live nlinks fsck
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/4] xfsprogs: offline fsck support patches Darrick J. Wong
  2023-02-16 21:06   ` [PATCH 1/4] libxfs: add xfile support Darrick J. Wong
@ 2023-02-16 21:06   ` Darrick J. Wong
  2023-02-16 21:06   ` [PATCH 3/4] xfs: create a blob array data structure Darrick J. Wong
  2023-02-16 21:06   ` [PATCH 4/4] libxfs: export attr3_leaf_hdr_from_disk via libxfs_api_defs.h Darrick J. Wong
  3 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:06 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create the necessary hooks in the file create/unlink/rename code so that
our live nlink scrub code can stay up to date with the rest of the
filesystem.  This will be the means to keep our shadow link count
information up to date while the scan runs in real time.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_dir2.c |    6 ++++++
 libxfs/xfs_dir2.h |    1 +
 repair/phase6.c   |    4 ----
 3 files changed, 7 insertions(+), 4 deletions(-)


diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 43b4e46b..4bbe83f9 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -24,6 +24,12 @@ const struct xfs_name xfs_name_dotdot = {
 	.type	= XFS_DIR3_FT_DIR,
 };
 
+const struct xfs_name xfs_name_dot = {
+	.name	= (const unsigned char *)".",
+	.len	= 1,
+	.type	= XFS_DIR3_FT_DIR,
+};
+
 /*
  * Convert inode mode to directory entry filetype
  */
diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h
index ff59f009..ac360c0b 100644
--- a/libxfs/xfs_dir2.h
+++ b/libxfs/xfs_dir2.h
@@ -22,6 +22,7 @@ struct xfs_dir3_icfree_hdr;
 struct xfs_dir3_icleaf_hdr;
 
 extern const struct xfs_name	xfs_name_dotdot;
+extern const struct xfs_name	xfs_name_dot;
 
 /*
  * Convert inode mode to directory entry filetype
diff --git a/repair/phase6.c b/repair/phase6.c
index e202398e..0d253701 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -23,10 +23,6 @@ static struct cred		zerocr;
 static struct fsxattr 		zerofsx;
 static xfs_ino_t		orphanage_ino;
 
-static struct xfs_name		xfs_name_dot = {(unsigned char *)".",
-						1,
-						XFS_DIR3_FT_DIR};
-
 /*
  * Data structures used to keep track of directories where the ".."
  * entries are updated. These must be rebuilt after the initial pass



* [PATCH 3/4] xfs: create a blob array data structure
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/4] xfsprogs: offline fsck support patches Darrick J. Wong
  2023-02-16 21:06   ` [PATCH 1/4] libxfs: add xfile support Darrick J. Wong
  2023-02-16 21:06   ` [PATCH 2/4] xfs: track file link count updates during live nlinks fsck Darrick J. Wong
@ 2023-02-16 21:06   ` Darrick J. Wong
  2023-02-16 21:06   ` [PATCH 4/4] libxfs: export attr3_leaf_hdr_from_disk via libxfs_api_defs.h Darrick J. Wong
  3 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:06 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Create a simple 'blob array' data structure for storage of arbitrarily
sized metadata objects that will be used to reconstruct metadata.  For
the intended usage (temporarily storing extended attribute names and
values) we only have to support storing objects and retrieving them.
Use the xfile abstraction to store the attribute information in memory
that can be swapped out.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/Makefile |    2 +
 libxfs/xfblob.c |  148 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 libxfs/xfblob.h |   25 +++++++++
 libxfs/xfile.c  |   11 ++++
 libxfs/xfile.h  |    1 
 5 files changed, 187 insertions(+)
 create mode 100644 libxfs/xfblob.c
 create mode 100644 libxfs/xfblob.h


diff --git a/libxfs/Makefile b/libxfs/Makefile
index 17978006..cac0c948 100644
--- a/libxfs/Makefile
+++ b/libxfs/Makefile
@@ -26,6 +26,7 @@ HFILES = \
 	libxfs_priv.h \
 	linux-err.h \
 	topology.h \
+	xfblob.h \
 	xfile.h \
 	xfs_ag_resv.h \
 	xfs_alloc.h \
@@ -67,6 +68,7 @@ CFILES = cache.c \
 	topology.c \
 	trans.c \
 	util.c \
+	xfblob.c \
 	xfile.c \
 	xfs_ag.c \
 	xfs_ag_resv.c \
diff --git a/libxfs/xfblob.c b/libxfs/xfblob.c
new file mode 100644
index 00000000..6c1c8e6f
--- /dev/null
+++ b/libxfs/xfblob.c
@@ -0,0 +1,148 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs_priv.h"
+#include "libxfs.h"
+#include "libxfs/xfile.h"
+#include "libxfs/xfblob.h"
+
+/*
+ * XFS Blob Storage
+ * ================
+ * Stores and retrieves blobs using an xfile.  Objects are appended to the file
+ * and the offset is returned as a magic cookie for retrieval.
+ */
+
+#define XB_KEY_MAGIC	0xABAADDAD
+struct xb_key {
+	uint32_t		xb_magic;  /* XB_KEY_MAGIC */
+	uint32_t		xb_size;   /* size of the blob, in bytes */
+	loff_t			xb_offset; /* byte offset of this key */
+	/* blob comes after here */
+} __packed;
+
+/* Initialize a blob storage object. */
+int
+xfblob_create(
+	struct xfs_mount	*mp,
+	const char		*description,
+	struct xfblob		**blobp)
+{
+	struct xfblob		*blob;
+	struct xfile		*xfile;
+	int			error;
+
+	error = xfile_create(mp, description, &xfile);
+	if (error)
+		return error;
+
+	blob = malloc(sizeof(struct xfblob));
+	if (!blob) {
+		error = -ENOMEM;
+		goto out_xfile;
+	}
+
+	blob->xfile = xfile;
+	blob->last_offset = PAGE_SIZE;
+
+	*blobp = blob;
+	return 0;
+
+out_xfile:
+	xfile_destroy(xfile);
+	return error;
+}
+
+/* Destroy a blob storage object. */
+void
+xfblob_destroy(
+	struct xfblob	*blob)
+{
+	xfile_destroy(blob->xfile);
+	kfree(blob);
+}
+
+/* Retrieve a blob. */
+int
+xfblob_load(
+	struct xfblob	*blob,
+	xfblob_cookie	cookie,
+	void		*ptr,
+	uint32_t	size)
+{
+	struct xb_key	key;
+	int		error;
+
+	error = xfile_obj_load(blob->xfile, &key, sizeof(key), cookie);
+	if (error)
+		return error;
+
+	if (key.xb_magic != XB_KEY_MAGIC || key.xb_offset != cookie) {
+		ASSERT(0);
+		return -ENODATA;
+	}
+	if (size < key.xb_size) {
+		ASSERT(0);
+		return -EFBIG;
+	}
+
+	return xfile_obj_load(blob->xfile, ptr, key.xb_size,
+			cookie + sizeof(key));
+}
+
+/* Store a blob. */
+int
+xfblob_store(
+	struct xfblob	*blob,
+	xfblob_cookie	*cookie,
+	const void	*ptr,
+	uint32_t	size)
+{
+	struct xb_key	key = {
+		.xb_offset = blob->last_offset,
+		.xb_magic = XB_KEY_MAGIC,
+		.xb_size = size,
+	};
+	loff_t		pos = blob->last_offset;
+	int		error;
+
+	error = xfile_obj_store(blob->xfile, &key, sizeof(key), pos);
+	if (error)
+		return error;
+
+	pos += sizeof(key);
+	error = xfile_obj_store(blob->xfile, ptr, size, pos);
+	if (error)
+		goto out_err;
+
+	*cookie = blob->last_offset;
+	blob->last_offset += sizeof(key) + size;
+	return 0;
+out_err:
+	xfile_discard(blob->xfile, blob->last_offset, sizeof(key));
+	return error;
+}
+
+/* Free a blob. */
+int
+xfblob_free(
+	struct xfblob	*blob,
+	xfblob_cookie	cookie)
+{
+	struct xb_key	key;
+	int		error;
+
+	error = xfile_obj_load(blob->xfile, &key, sizeof(key), cookie);
+	if (error)
+		return error;
+
+	if (key.xb_magic != XB_KEY_MAGIC || key.xb_offset != cookie) {
+		ASSERT(0);
+		return -ENODATA;
+	}
+
+	xfile_discard(blob->xfile, cookie, sizeof(key) + key.xb_size);
+	return 0;
+}
diff --git a/libxfs/xfblob.h b/libxfs/xfblob.h
new file mode 100644
index 00000000..d1282810
--- /dev/null
+++ b/libxfs/xfblob.h
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __XFS_SCRUB_XFBLOB_H__
+#define __XFS_SCRUB_XFBLOB_H__
+
+struct xfblob {
+	struct xfile	*xfile;
+	loff_t		last_offset;
+};
+
+typedef loff_t		xfblob_cookie;
+
+int xfblob_create(struct xfs_mount *mp, const char *descr,
+		struct xfblob **blobp);
+void xfblob_destroy(struct xfblob *blob);
+int xfblob_load(struct xfblob *blob, xfblob_cookie cookie, void *ptr,
+		uint32_t size);
+int xfblob_store(struct xfblob *blob, xfblob_cookie *cookie, const void *ptr,
+		uint32_t size);
+int xfblob_free(struct xfblob *blob, xfblob_cookie cookie);
+
+#endif /* __XFS_SCRUB_XFBLOB_H__ */
diff --git a/libxfs/xfile.c b/libxfs/xfile.c
index f551aef5..57542507 100644
--- a/libxfs/xfile.c
+++ b/libxfs/xfile.c
@@ -222,3 +222,14 @@ xfile_dump(
 
 	return execvp("od", argv);
 }
+
+/* Discard pages backing a range of the xfile. */
+void
+xfile_discard(
+	struct xfile		*xf,
+	loff_t			pos,
+	unsigned long long	count)
+{
+	fallocate(xf->fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
+			pos, count);
+}
diff --git a/libxfs/xfile.h b/libxfs/xfile.h
index 1389ff8f..89431f6f 100644
--- a/libxfs/xfile.h
+++ b/libxfs/xfile.h
@@ -52,5 +52,6 @@ struct xfile_stat {
 
 int xfile_stat(struct xfile *xf, struct xfile_stat *statbuf);
 int xfile_dump(struct xfile *xf);
+void xfile_discard(struct xfile *xf, loff_t pos, unsigned long long count);
 
 #endif /* __LIBXFS_XFILE_H__ */
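
The blob array in this patch is an append-only log: each blob is written
behind a small key header, the header's file offset is handed back as the
cookie, and a load checks the magic and the recorded offset before trusting
the cookie.  A rough standalone sketch of that idea follows.  The blobstore_*
names and struct layout are hypothetical, the backing file is a tmpfile()
where the patch uses an xfile, and the starting offset of 4096 is an assumed
stand-in for PAGE_SIZE.

```c
#define _POSIX_C_SOURCE 200809L	/* for pread(), pwrite(), fileno() */
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOB_MAGIC	0xABAADDADU

/* Header written in front of every blob (same idea as struct xb_key). */
struct blob_key {
	uint32_t	magic;
	uint32_t	size;	/* payload size in bytes */
	int64_t		offset;	/* where this header lives, i.e. the cookie */
};

struct blobstore {
	FILE		*fp;
	int		fd;
	int64_t		next;	/* offset at which the next blob is appended */
};

static int
blobstore_open(struct blobstore *bs)
{
	bs->fp = tmpfile();
	if (!bs->fp)
		return -errno;
	bs->fd = fileno(bs->fp);
	bs->next = 4096;	/* assumed PAGE_SIZE, as in xfblob_create */
	return 0;
}

static void
blobstore_close(struct blobstore *bs)
{
	fclose(bs->fp);
}

/* Append a blob; on success *cookie identifies it for later retrieval. */
static int
blobstore_put(struct blobstore *bs, int64_t *cookie, const void *ptr,
		uint32_t size)
{
	struct blob_key	key = {
		.magic	= BLOB_MAGIC,
		.size	= size,
		.offset	= bs->next,
	};
	int64_t		pos = bs->next;

	assert(ptr != NULL || size == 0);
	if (pwrite(bs->fd, &key, sizeof(key), pos) != (ssize_t)sizeof(key))
		return -ENOMEM;
	pos += (int64_t)sizeof(key);
	if (pwrite(bs->fd, ptr, size, pos) != (ssize_t)size)
		return -ENOMEM;

	*cookie = bs->next;
	bs->next = pos + size;
	return 0;
}

/* Fetch a blob into ptr, which has room for bufsize bytes. */
static int
blobstore_get(struct blobstore *bs, int64_t cookie, void *ptr,
		uint32_t bufsize)
{
	struct blob_key	key;

	if (pread(bs->fd, &key, sizeof(key), cookie) != (ssize_t)sizeof(key))
		return -ENOMEM;
	/* Reject cookies that do not point at a header we wrote. */
	if (key.magic != BLOB_MAGIC || key.offset != cookie)
		return -ENODATA;
	if (bufsize < key.size)
		return -EFBIG;
	if (pread(bs->fd, ptr, key.size, cookie + (int64_t)sizeof(key)) !=
			(ssize_t)key.size)
		return -ENOMEM;
	return 0;
}
```

Storing the header's own offset inside the header is what lets the load path
catch a stale or corrupted cookie: a cookie that lands in the middle of
another blob will almost never find both the magic and a matching offset.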



* [PATCH 4/4] libxfs: export attr3_leaf_hdr_from_disk via libxfs_api_defs.h
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/4] xfsprogs: offline fsck support patches Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 21:06   ` [PATCH 3/4] xfs: create a blob array data structure Darrick J. Wong
@ 2023-02-16 21:06   ` Darrick J. Wong
  3 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:06 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Do the xfs -> libxfs switcheroo and cleanups separately so the next
patch doesn't become an even larger mess.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/attr.c                |    2 +-
 db/metadump.c            |    2 +-
 libxfs/libxfs_api_defs.h |    1 +
 repair/attr_repair.c     |    6 +++---
 4 files changed, 6 insertions(+), 5 deletions(-)


diff --git a/db/attr.c b/db/attr.c
index db7cf54b..8ea7b36e 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -253,7 +253,7 @@ attr_leaf_entry_walk(
 		return 0;
 
 	off = byteize(startoff);
-	xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
+	libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
 	entries = xfs_attr3_leaf_entryp(leaf);
 
 	for (i = 0; i < leafhdr.count; i++) {
diff --git a/db/metadump.c b/db/metadump.c
index bb441fbb..4be23993 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -1757,7 +1757,7 @@ process_attr_block(
 	}
 
 	/* Ok, it's a leaf - get header; accounts for crc & non-crc */
-	xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &hdr, leaf);
+	libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &hdr, leaf);
 
 	nentries = hdr.count;
 	if (nentries == 0 ||
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 055d2862..6d045867 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -33,6 +33,7 @@
 #define xfs_alloc_read_agf		libxfs_alloc_read_agf
 #define xfs_alloc_vextent		libxfs_alloc_vextent
 
+#define xfs_attr3_leaf_hdr_from_disk	libxfs_attr3_leaf_hdr_from_disk
 #define xfs_attr_get			libxfs_attr_get
 #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
 #define xfs_attr_namecheck		libxfs_attr_namecheck
diff --git a/repair/attr_repair.c b/repair/attr_repair.c
index afe8073c..d3fd7a47 100644
--- a/repair/attr_repair.c
+++ b/repair/attr_repair.c
@@ -579,7 +579,7 @@ process_leaf_attr_block(
 	da_freemap_t *attr_freemap;
 	struct xfs_attr3_icleaf_hdr leafhdr;
 
-	xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
+	libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
 	clearit = usedbs = 0;
 	firstb = mp->m_sb.sb_blocksize;
 	stop = xfs_attr3_leaf_hdr_size(leaf);
@@ -802,7 +802,7 @@ process_leaf_attr_level(xfs_mount_t	*mp,
 		}
 
 		leaf = bp->b_addr;
-		xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
+		libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
 
 		/* check magic number for leaf directory btree block */
 		if (!(leafhdr.magic == XFS_ATTR_LEAF_MAGIC ||
@@ -1000,7 +1000,7 @@ process_longform_leaf_root(
 	 * check sibling pointers in leaf block or root block 0 before
 	 * we have to release the btree block
 	 */
-	xfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, bp->b_addr);
+	libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, bp->b_addr);
 	if (leafhdr.forw != 0 || leafhdr.back != 0)  {
 		if (!no_modify)  {
 			do_warn(



* [PATCH 1/3] xfs: shorten parent pointer function names
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/3] xfsprogs: online repair of directories Darrick J. Wong
@ 2023-02-16 21:07   ` Darrick J. Wong
  2023-02-16 21:07   ` [PATCH 2/3] xfs: rearrange bits of the parent pointer apis for fsck Darrick J. Wong
  2023-02-16 21:07   ` [PATCH 3/3] xfs: add hooks to do directory updates Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:07 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Shorten the function names and add brief comments to each, outlining
what they're supposed to be doing.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    2 +-
 libxfs/xfs_parent.c      |   18 ++++++++++++------
 libxfs/xfs_parent.h      |   24 ++++++++++++------------
 mkfs/proto.c             |   12 ++++++------
 4 files changed, 31 insertions(+), 25 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 6d045867..a5045d2e 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -141,7 +141,7 @@
 #define xfs_log_get_max_trans_res	libxfs_log_get_max_trans_res
 #define xfs_log_sb			libxfs_log_sb
 #define xfs_mode_to_ftype		libxfs_mode_to_ftype
-#define xfs_parent_defer_add		libxfs_parent_defer_add
+#define xfs_parent_add			libxfs_parent_add
 #define xfs_parent_finish		libxfs_parent_finish
 #define xfs_parent_start		libxfs_parent_start
 #define xfs_perag_get			libxfs_perag_get
diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index 74c7f1f7..89eb531f 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -136,6 +136,10 @@ xfs_parent_irec_from_disk(
 	memset(&irec->p_name[valuelen], 0, sizeof(irec->p_name) - valuelen);
 }
 
+/*
+ * Allocate memory to control a logged parent pointer update as part of a
+ * dirent operation.
+ */
 int
 __xfs_parent_init(
 	struct xfs_mount		*mp,
@@ -171,12 +175,13 @@ __xfs_parent_init(
 	return 0;
 }
 
+/* Add a parent pointer to reflect a dirent addition. */
 int
-xfs_parent_defer_add(
+xfs_parent_add(
 	struct xfs_trans	*tp,
 	struct xfs_parent_defer	*parent,
 	struct xfs_inode	*dp,
-	struct xfs_name		*parent_name,
+	const struct xfs_name	*parent_name,
 	xfs_dir2_dataptr_t	diroffset,
 	struct xfs_inode	*child)
 {
@@ -195,8 +200,9 @@ xfs_parent_defer_add(
 	return xfs_attr_defer_add(args);
 }
 
+/* Remove a parent pointer to reflect a dirent removal. */
 int
-xfs_parent_defer_remove(
+xfs_parent_remove(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*dp,
 	struct xfs_parent_defer	*parent,
@@ -212,14 +218,14 @@ xfs_parent_defer_remove(
 	return xfs_attr_defer_remove(args);
 }
 
-
+/* Replace one parent pointer with another to reflect a rename. */
 int
-xfs_parent_defer_replace(
+xfs_parent_replace(
 	struct xfs_trans	*tp,
 	struct xfs_parent_defer	*new_parent,
 	struct xfs_inode	*old_dp,
 	xfs_dir2_dataptr_t	old_diroffset,
-	struct xfs_name		*parent_name,
+	const struct xfs_name	*parent_name,
 	struct xfs_inode	*new_dp,
 	xfs_dir2_dataptr_t	new_diroffset,
 	struct xfs_inode	*child)
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index f4f5887d..35854e96 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -49,8 +49,9 @@ struct xfs_parent_defer {
  * Parent pointer attribute prototypes
  */
 void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
-			      struct xfs_inode *ip,
-			      uint32_t p_diroffset);
+		struct xfs_inode *ip, uint32_t p_diroffset);
+void xfs_init_parent_name_irec(struct xfs_parent_name_irec *irec,
+			       struct xfs_parent_name_rec *rec);
 int __xfs_parent_init(struct xfs_mount *mp, bool grab_log,
 		struct xfs_parent_defer **parentp);
 
@@ -78,18 +79,17 @@ xfs_parent_start_locked(
 	return 0;
 }
 
-int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_parent_defer *parent,
-			 struct xfs_inode *dp, struct xfs_name *parent_name,
-			 xfs_dir2_dataptr_t diroffset, struct xfs_inode *child);
-int xfs_parent_defer_replace(struct xfs_trans *tp,
+int xfs_parent_add(struct xfs_trans *tp, struct xfs_parent_defer *parent,
+		struct xfs_inode *dp, const struct xfs_name *parent_name,
+		xfs_dir2_dataptr_t diroffset, struct xfs_inode *child);
+int xfs_parent_replace(struct xfs_trans *tp,
 		struct xfs_parent_defer *new_parent, struct xfs_inode *old_dp,
-		xfs_dir2_dataptr_t old_diroffset, struct xfs_name *parent_name,
-		struct xfs_inode *new_ip, xfs_dir2_dataptr_t new_diroffset,
+		xfs_dir2_dataptr_t old_diroffset,
+		const struct xfs_name *parent_name, struct xfs_inode *new_ip,
+		xfs_dir2_dataptr_t new_diroffset, struct xfs_inode *child);
+int xfs_parent_remove(struct xfs_trans *tp, struct xfs_inode *dp,
+		struct xfs_parent_defer *parent, xfs_dir2_dataptr_t diroffset,
 		struct xfs_inode *child);
-int xfs_parent_defer_remove(struct xfs_trans *tp, struct xfs_inode *dp,
-			    struct xfs_parent_defer *parent,
-			    xfs_dir2_dataptr_t diroffset,
-			    struct xfs_inode *child);
 
 void __xfs_parent_cancel(struct xfs_mount *mp, struct xfs_parent_defer *parent);
 
diff --git a/mkfs/proto.c b/mkfs/proto.c
index e0131df5..b8d7ac96 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -508,8 +508,8 @@ parseproto(
 		newdirent(mp, tp, pip, &xname, ip->i_ino, &offset);
 		libxfs_trans_log_inode(tp, ip, flags);
 		if (parent) {
-			error = -libxfs_parent_defer_add(tp, parent, pip,
-					&xname, offset, ip);
+			error = -libxfs_parent_add(tp, parent, pip, &xname,
+					offset, ip);
 			if (error)
 				fail(_("committing parent pointers failed."),
 						error);
@@ -601,8 +601,8 @@ parseproto(
 		newdirectory(mp, tp, ip, pip);
 		libxfs_trans_log_inode(tp, ip, flags);
 		if (parent) {
-			error = -libxfs_parent_defer_add(tp, parent, pip,
-					&xname, offset, ip);
+			error = -libxfs_parent_add(tp, parent, pip, &xname,
+					offset, ip);
 			if (error)
 				fail(_("committing parent pointers failed."),
 						error);
@@ -636,8 +636,8 @@ parseproto(
 	}
 	libxfs_trans_log_inode(tp, ip, flags);
 	if (parent) {
-		error = -libxfs_parent_defer_add(tp, parent, pip, &xname,
-				offset, ip);
+		error = -libxfs_parent_add(tp, parent, pip, &xname, offset,
+				ip);
 		if (error)
 			fail(_("committing parent pointers failed."), error);
 	}



* [PATCH 2/3] xfs: rearrange bits of the parent pointer apis for fsck
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/3] xfsprogs: online repair of directories Darrick J. Wong
  2023-02-16 21:07   ` [PATCH 1/3] xfs: shorten parent pointer function names Darrick J. Wong
@ 2023-02-16 21:07   ` Darrick J. Wong
  2023-02-16 21:07   ` [PATCH 3/3] xfs: add hooks to do directory updates Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:07 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Rearrange parts of the parent pointer API in preparation for the fsck code.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    1 -
 libxfs/xfs_da_format.h   |   11 +++++++++++
 libxfs/xfs_parent.c      |   29 ++++++++++++++++++++++++++++-
 libxfs/xfs_parent.h      |    6 ++----
 4 files changed, 41 insertions(+), 6 deletions(-)


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index a5045d2e..b8ee0247 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -125,7 +125,6 @@
 #define xfs_initialize_perag		libxfs_initialize_perag
 #define xfs_initialize_perag_data	libxfs_initialize_perag_data
 #define xfs_init_local_fork		libxfs_init_local_fork
-#define xfs_init_parent_name_rec	libxfs_init_parent_name_rec
 
 #define xfs_inobt_maxrecs		libxfs_inobt_maxrecs
 #define xfs_inobt_stage_cursor		libxfs_inobt_stage_cursor
diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index 2db1cf97..c07b8166 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -159,6 +159,17 @@ struct xfs_da3_intnode {
 
 #define XFS_DIR3_FT_MAX			9
 
+#define XFS_DIR3_FTYPE_STR \
+	{ XFS_DIR3_FT_UNKNOWN,	"unknown" }, \
+	{ XFS_DIR3_FT_REG_FILE,	"file" }, \
+	{ XFS_DIR3_FT_DIR,	"directory" }, \
+	{ XFS_DIR3_FT_CHRDEV,	"char" }, \
+	{ XFS_DIR3_FT_BLKDEV,	"block" }, \
+	{ XFS_DIR3_FT_FIFO,	"fifo" }, \
+	{ XFS_DIR3_FT_SOCK,	"sock" }, \
+	{ XFS_DIR3_FT_SYMLINK,	"symlink" }, \
+	{ XFS_DIR3_FT_WHT,	"whiteout" }
+
 /*
  * Byte offset in data block and shortform entry.
  */
diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index 89eb531f..980f0b82 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -92,7 +92,7 @@ xfs_parent_valuecheck(
 }
 
 /* Initializes a xfs_parent_name_rec to be stored as an attribute name */
-void
+static inline void
 xfs_init_parent_name_rec(
 	struct xfs_parent_name_rec	*rec,
 	struct xfs_inode		*ip,
@@ -136,6 +136,33 @@ xfs_parent_irec_from_disk(
 	memset(&irec->p_name[valuelen], 0, sizeof(irec->p_name) - valuelen);
 }
 
+/*
+ * Convert an incore parent_name record to its ondisk format.  If @value or
+ * @valuelen are NULL, they will not be written to.
+ */
+void
+xfs_parent_irec_to_disk(
+	struct xfs_parent_name_rec	*rec,
+	void				*value,
+	int				*valuelen,
+	const struct xfs_parent_name_irec *irec)
+{
+	rec->p_ino = cpu_to_be64(irec->p_ino);
+	rec->p_gen = cpu_to_be32(irec->p_gen);
+	rec->p_diroffset = cpu_to_be32(irec->p_diroffset);
+
+	if (valuelen) {
+		ASSERT(*valuelen > 0);
+		ASSERT(*valuelen >= irec->p_namelen);
+		ASSERT(*valuelen < MAXNAMELEN);
+
+		*valuelen = irec->p_namelen;
+	}
+
+	if (value)
+		memcpy(value, irec->p_name, irec->p_namelen);
+}
+
 /*
  * Allocate memory to control a logged parent pointer update as part of a
  * dirent operation.
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index 35854e96..4eb92fb4 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -33,6 +33,8 @@ struct xfs_parent_name_irec {
 void xfs_parent_irec_from_disk(struct xfs_parent_name_irec *irec,
 		const struct xfs_parent_name_rec *rec,
 		const void *value, int valuelen);
+void xfs_parent_irec_to_disk(struct xfs_parent_name_rec *rec, void *value,
+		int *valuelen, const struct xfs_parent_name_irec *irec);
 
 /*
  * Dynamically allocd structure used to wrap the needed data to pass around
@@ -48,10 +50,6 @@ struct xfs_parent_defer {
 /*
  * Parent pointer attribute prototypes
  */
-void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
-		struct xfs_inode *ip, uint32_t p_diroffset);
-void xfs_init_parent_name_irec(struct xfs_parent_name_irec *irec,
-			       struct xfs_parent_name_rec *rec);
 int __xfs_parent_init(struct xfs_mount *mp, bool grab_log,
 		struct xfs_parent_defer **parentp);
 
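
To make the new xfs_parent_irec_to_disk helper concrete: it copies the incore
fields into the ondisk record in big-endian order, and only touches the value
buffer and length when the caller passes them.  The sketch below mirrors that
shape in plain userspace C.  The struct layouts, the 256-byte name limit and
the pptr_* names are assumptions for illustration, and htobe64()/htobe32()
from glibc's <endian.h> stand in for cpu_to_be64()/cpu_to_be32().

```c
#define _DEFAULT_SOURCE	/* for htobe64() and friends in <endian.h> */
#include <assert.h>
#include <endian.h>
#include <stdint.h>
#include <string.h>

#define PPTR_MAXNAMELEN	256	/* assumed limit for this sketch */

/* Hypothetical ondisk record; all fields are stored big-endian. */
struct pptr_ondisk {
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint32_t	p_diroffset;
};

/* Hypothetical incore record; fields are in CPU byte order. */
struct pptr_incore {
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint32_t	p_diroffset;
	uint8_t		p_namelen;
	unsigned char	p_name[PPTR_MAXNAMELEN];
};

/*
 * Convert an incore record to ondisk form.  @value and @valuelen are
 * optional; pass NULL for either to skip that output.
 */
static void
pptr_to_disk(struct pptr_ondisk *rec, void *value, int *valuelen,
		const struct pptr_incore *irec)
{
	rec->p_ino = htobe64(irec->p_ino);
	rec->p_gen = htobe32(irec->p_gen);
	rec->p_diroffset = htobe32(irec->p_diroffset);

	if (valuelen) {
		/* On entry *valuelen is the buffer size; on exit, the name length. */
		assert(*valuelen >= irec->p_namelen);
		*valuelen = irec->p_namelen;
	}

	if (value)
		memcpy(value, irec->p_name, irec->p_namelen);
}
```

A lookup-style caller that only needs the attr name can pass NULL for both
@value and @valuelen, which is how xfs_parent_lookup uses the real helper
later in this series.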



* [PATCH 3/3] xfs: add hooks to do directory updates
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/3] xfsprogs: online repair of directories Darrick J. Wong
  2023-02-16 21:07   ` [PATCH 1/3] xfs: shorten parent pointer function names Darrick J. Wong
  2023-02-16 21:07   ` [PATCH 2/3] xfs: rearrange bits of the parent pointer apis for fsck Darrick J. Wong
@ 2023-02-16 21:07   ` Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:07 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

While we're scanning the filesystem, we still need to keep the tempdir
up to date with whatever changes get made to the directory being repaired.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_dir2.c |    2 +-
 libxfs/xfs_dir2.h |    2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)


diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 4bbe83f9..9742ba65 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -439,7 +439,7 @@ int
 xfs_dir_removename(
 	struct xfs_trans	*tp,
 	struct xfs_inode	*dp,
-	struct xfs_name		*name,
+	const struct xfs_name	*name,
 	xfs_ino_t		ino,
 	xfs_extlen_t		total,		/* bmap's total block count */
 	xfs_dir2_dataptr_t	*offset)	/* OUT: offset in directory */
diff --git a/libxfs/xfs_dir2.h b/libxfs/xfs_dir2.h
index ac360c0b..6ed86b7b 100644
--- a/libxfs/xfs_dir2.h
+++ b/libxfs/xfs_dir2.h
@@ -46,7 +46,7 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t *inum,
 				struct xfs_name *ci_name);
 extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
-				struct xfs_name *name, xfs_ino_t ino,
+				const struct xfs_name *name, xfs_ino_t ino,
 				xfs_extlen_t tot,
 				xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,



* [PATCH 1/1] xfs: deferred scrub of parent pointers
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/1] xfsprogs: online checking of parent pointers Darrick J. Wong
@ 2023-02-16 21:07   ` Darrick J. Wong
  0 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:07 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If the trylock-based dirent check fails, retain those parent pointers
and check them at the end.  This may involve dropping the locks on the
file being scanned, so yay.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_parent.c |   38 ++++++++++++++++++++++++++++++++++++++
 libxfs/xfs_parent.h |   10 ++++++++++
 2 files changed, 48 insertions(+)


diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index 980f0b82..1598158f 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -299,3 +299,41 @@ xfs_pptr_calc_space_res(
 	       XFS_NEXTENTADD_SPACE_RES(mp, namelen, XFS_ATTR_FORK);
 }
 
+/*
+ * Look up the @name associated with the parent pointer (@pptr) of @ip.  Caller
+ * must hold at least ILOCK_SHARED.  Returns the length of the dirent name, or
+ * a negative errno.  The scratchpad need not be initialized.
+ */
+int
+xfs_parent_lookup(
+	struct xfs_trans		*tp,
+	struct xfs_inode		*ip,
+	const struct xfs_parent_name_irec *pptr,
+	unsigned char			*name,
+	unsigned int			namelen,
+	struct xfs_parent_scratch	*scr)
+{
+	int				error;
+
+	xfs_parent_irec_to_disk(&scr->rec, NULL, NULL, pptr);
+
+	memset(&scr->args, 0, sizeof(struct xfs_da_args));
+	scr->args.attr_filter	= XFS_ATTR_PARENT;
+	scr->args.dp		= ip;
+	scr->args.geo		= ip->i_mount->m_attr_geo;
+	scr->args.name		= (const unsigned char *)&scr->rec;
+	scr->args.namelen	= sizeof(struct xfs_parent_name_rec);
+	scr->args.op_flags	= XFS_DA_OP_OKNOENT;
+	scr->args.trans		= tp;
+	scr->args.valuelen	= namelen;
+	scr->args.value		= name;
+	scr->args.whichfork	= XFS_ATTR_FORK;
+
+	scr->args.hashval = xfs_da_hashname(scr->args.name, scr->args.namelen);
+
+	error = xfs_attr_get_ilocked(&scr->args);
+	if (error)
+		return error;
+
+	return scr->args.valuelen;
+}
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index 4eb92fb4..cd1b1351 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -103,4 +103,14 @@ xfs_parent_finish(
 unsigned int xfs_pptr_calc_space_res(struct xfs_mount *mp,
 				     unsigned int namelen);
 
+/* Scratchpad memory so that raw parent operations don't burn stack space. */
+struct xfs_parent_scratch {
+	struct xfs_parent_name_rec	rec;
+	struct xfs_da_args		args;
+};
+
+int xfs_parent_lookup(struct xfs_trans *tp, struct xfs_inode *ip,
+		const struct xfs_parent_name_irec *pptr, unsigned char *name,
+		unsigned int namelen, struct xfs_parent_scratch *scratch);
+
 #endif	/* __XFS_PARENT_H__ */
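
The commit message describes a two-pass pattern: check a parent pointer
immediately if the parent can be trylocked, otherwise stash it and check it at
the end, when blocking on the lock is safe.  Here is a small sketch of that
pattern under simplifying assumptions.  None of these names exist in XFS; a
C11 atomic_flag plays the role of the inode lock, and the "check" is reduced
to a range test on the directory offset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEFERRED	16

struct dir {
	atomic_flag	lock;		/* stand-in for the directory's lock */
	int		nr_entries;	/* pretend directory contents */
};

struct pptr {
	struct dir	*parent;
	int		diroffset;	/* entry this pointer claims to match */
};

struct scan {
	struct pptr	deferred[MAX_DEFERRED];
	size_t		nr_deferred;
	int		nr_bad;
};

static bool dir_trylock(struct dir *d)
{
	return !atomic_flag_test_and_set(&d->lock);
}

static void dir_lock(struct dir *d)
{
	while (atomic_flag_test_and_set(&d->lock))
		;	/* spin; a real lock would sleep */
}

static void dir_unlock(struct dir *d)
{
	atomic_flag_clear(&d->lock);
}

/* The actual check; caller holds the parent's lock. */
static void check_locked(struct scan *sc, const struct pptr *pp)
{
	assert(pp->parent != NULL);
	if (pp->diroffset < 0 || pp->diroffset >= pp->parent->nr_entries)
		sc->nr_bad++;
}

/* Fast path: check now if the lock is free, otherwise remember the pointer. */
static int scan_one(struct scan *sc, const struct pptr *pp)
{
	if (dir_trylock(pp->parent)) {
		check_locked(sc, pp);
		dir_unlock(pp->parent);
		return 0;
	}
	if (sc->nr_deferred == MAX_DEFERRED)
		return -1;
	sc->deferred[sc->nr_deferred++] = *pp;
	return 0;
}

/* Slow path: the caller has dropped its own locks, so blocking is safe. */
static void scan_finish(struct scan *sc)
{
	for (size_t i = 0; i < sc->nr_deferred; i++) {
		dir_lock(sc->deferred[i].parent);
		check_locked(sc, &sc->deferred[i]);
		dir_unlock(sc->deferred[i].parent);
	}
	sc->nr_deferred = 0;
}
```

The point of the split is lock ordering: the fast path never blocks while the
scanned file is locked, and the slow path only blocks after that lock has
been given up.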



* [PATCH 1/2] xfs: repair parent pointers by scanning directories
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/2] xfsprogs: online checking " Darrick J. Wong
@ 2023-02-16 21:08   ` Darrick J. Wong
  2023-02-16 21:08   ` [PATCH 2/2] xfs: repair parent pointers with live scan hooks Darrick J. Wong
  1 sibling, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:08 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Walk the filesystem to rebuild parent pointer information.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 include/xfs_inode.h |    6 ++++++
 libxfs/xfs_parent.c |   31 +++++++++++++++++++++++++++++--
 libxfs/xfs_parent.h |    4 ++++
 3 files changed, 39 insertions(+), 2 deletions(-)


diff --git a/include/xfs_inode.h b/include/xfs_inode.h
index b0bba109..2bbda956 100644
--- a/include/xfs_inode.h
+++ b/include/xfs_inode.h
@@ -175,6 +175,12 @@ static inline struct inode *VFS_I(struct xfs_inode *ip)
 	return &ip->i_vnode;
 }
 
+/* convert from const xfs inode to const vfs inode */
+static inline const struct inode *VFS_IC(const struct xfs_inode *ip)
+{
+	return &ip->i_vnode;
+}
+
 /* We only have i_size in the xfs inode in userspace */
 static inline loff_t i_size_read(struct inode *inode)
 {
diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index 1598158f..3ce30860 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -95,11 +95,11 @@ xfs_parent_valuecheck(
 static inline void
 xfs_init_parent_name_rec(
 	struct xfs_parent_name_rec	*rec,
-	struct xfs_inode		*ip,
+	const struct xfs_inode		*ip,
 	uint32_t			p_diroffset)
 {
 	xfs_ino_t			p_ino = ip->i_ino;
-	uint32_t			p_gen = VFS_I(ip)->i_generation;
+	uint32_t			p_gen = VFS_IC(ip)->i_generation;
 
 	rec->p_ino = cpu_to_be64(p_ino);
 	rec->p_gen = cpu_to_be32(p_gen);
@@ -337,3 +337,30 @@ xfs_parent_lookup(
 
 	return scr->args.valuelen;
 }
+
+/*
+ * Attach the parent pointer described by @pptr (including its name) to @ip
+ * immediately.  Caller must not have a transaction or hold the ILOCK.  The
+ * update will not use logged xattrs.  This is for specialized repair functions
+ * only.  The scratchpad need not be initialized.
+ */
+int
+xfs_parent_set(
+	struct xfs_inode		*ip,
+	const struct xfs_parent_name_irec *pptr,
+	struct xfs_parent_scratch	*scr)
+{
+	xfs_parent_irec_to_disk(&scr->rec, NULL, NULL, pptr);
+
+	memset(&scr->args, 0, sizeof(struct xfs_da_args));
+	scr->args.attr_filter	= XFS_ATTR_PARENT;
+	scr->args.dp		= ip;
+	scr->args.geo		= ip->i_mount->m_attr_geo;
+	scr->args.name		= (const unsigned char *)&scr->rec;
+	scr->args.namelen	= sizeof(struct xfs_parent_name_rec);
+	scr->args.valuelen	= pptr->p_namelen;
+	scr->args.value		= (void *)pptr->p_name;
+	scr->args.whichfork	= XFS_ATTR_FORK;
+
+	return xfs_attr_set(&scr->args);
+}
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index cd1b1351..effbccdf 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -113,4 +113,8 @@ int xfs_parent_lookup(struct xfs_trans *tp, struct xfs_inode *ip,
 		const struct xfs_parent_name_irec *pptr, unsigned char *name,
 		unsigned int namelen, struct xfs_parent_scratch *scratch);
 
+int xfs_parent_set(struct xfs_inode *ip,
+		const struct xfs_parent_name_irec *pptr,
+		struct xfs_parent_scratch *scratch);
+
 #endif	/* __XFS_PARENT_H__ */



* [PATCH 2/2] xfs: repair parent pointers with live scan hooks
  2023-02-16 20:30 ` [PATCHSET v9r2d1 0/2] xfsprogs: online checking " Darrick J. Wong
  2023-02-16 21:08   ` [PATCH 1/2] xfs: repair parent pointers by scanning directories Darrick J. Wong
@ 2023-02-16 21:08   ` Darrick J. Wong
  1 sibling, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:08 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Use the nlink hooks to keep our tempfile's parent pointers up to date.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_parent.c |   25 +++++++++++++++++++++++++
 libxfs/xfs_parent.h |    4 ++++
 2 files changed, 29 insertions(+)


diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index 3ce30860..a7c5974c 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -364,3 +364,28 @@ xfs_parent_set(
 
 	return xfs_attr_set(&scr->args);
 }
+
+/*
+ * Remove the parent pointer described by @pptr from @ip immediately.  Caller
+ * must not have a transaction or hold the ILOCK.  The update will not use
+ * logged xattrs.  This is for specialized repair functions only.  The
+ * scratchpad need not be initialized.
+ */
+int
+xfs_parent_unset(
+	struct xfs_inode		*ip,
+	const struct xfs_parent_name_irec *pptr,
+	struct xfs_parent_scratch	*scr)
+{
+	xfs_parent_irec_to_disk(&scr->rec, NULL, NULL, pptr);
+
+	memset(&scr->args, 0, sizeof(struct xfs_da_args));
+	scr->args.attr_filter	= XFS_ATTR_PARENT;
+	scr->args.dp		= ip;
+	scr->args.geo		= ip->i_mount->m_attr_geo;
+	scr->args.name		= (const unsigned char *)&scr->rec;
+	scr->args.namelen	= sizeof(struct xfs_parent_name_rec);
+	scr->args.whichfork	= XFS_ATTR_FORK;
+
+	return xfs_attr_set(&scr->args);
+}
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index effbccdf..a7fc621b 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -117,4 +117,8 @@ int xfs_parent_set(struct xfs_inode *ip,
 		const struct xfs_parent_name_irec *pptr,
 		struct xfs_parent_scratch *scratch);
 
+int xfs_parent_unset(struct xfs_inode *ip,
+		const struct xfs_parent_name_irec *pptr,
+		struct xfs_parent_scratch *scratch);
+
 #endif	/* __XFS_PARENT_H__ */



* [PATCH 1/8] xfs_repair: build a parent pointer index
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/8] xfs_repair: support parent pointers Darrick J. Wong
@ 2023-02-16 21:08   ` Darrick J. Wong
  2023-02-16 21:08   ` [PATCH 2/8] xfs_repair: check parent pointers Darrick J. Wong
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:08 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

When we're walking directories during phase 6, build an index of parent
pointers that we expect to find.
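
To illustrate what "building the index" means, here is a minimal sketch of turning one directory entry into the backwards record that gets filed under the child's AG.  The demo_ structures and the agino_bits field are hypothetical simplifications; the only real-world fact relied on is that an XFS inode number is the AG number shifted left, with the AG-relative inode number in the low bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry: how many low bits of an inode number are the agino. */
struct demo_geom {
	unsigned int	agino_bits;
};

/* Forward information, as found while walking a directory. */
struct demo_dirent {
	uint64_t	dir_ino;
	uint32_t	dir_gen;
	uint32_t	dir_offset;
	uint64_t	child_ino;
};

/* Backwards information, as filed in the child's per-AG index. */
struct demo_backref {
	uint32_t	child_agino;
	uint64_t	dir_ino;
	uint32_t	dir_gen;
	uint32_t	dir_offset;
};

static uint32_t
demo_ino_to_agno(
	const struct demo_geom	*g,
	uint64_t		ino)
{
	return (uint32_t)(ino >> g->agino_bits);
}

static uint32_t
demo_ino_to_agino(
	const struct demo_geom	*g,
	uint64_t		ino)
{
	return (uint32_t)(ino & ((1ULL << g->agino_bits) - 1));
}

/*
 * Convert a dirent into its parent pointer equivalent.  Returns the number
 * of the AG whose index should receive @out.
 */
static uint32_t
demo_dirent_to_backref(
	const struct demo_geom		*g,
	const struct demo_dirent	*de,
	struct demo_backref		*out)
{
	assert(g->agino_bits > 0 && g->agino_bits <= 32);

	out->child_agino = demo_ino_to_agino(g, de->child_ino);
	out->dir_ino = de->dir_ino;
	out->dir_gen = de->dir_gen;
	out->dir_offset = de->dir_offset;
	return demo_ino_to_agno(g, de->child_ino);
}
```

The name is left out of the sketch on purpose; in the patch it is stored separately so that the index records stay fixed-size.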

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/Makefile |    2 
 repair/phase6.c |   55 +++++++++++--
 repair/pptr.c   |  242 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/pptr.h   |   15 +++
 4 files changed, 307 insertions(+), 7 deletions(-)
 create mode 100644 repair/pptr.c
 create mode 100644 repair/pptr.h


diff --git a/repair/Makefile b/repair/Makefile
index 2c40e59a..18731613 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -23,6 +23,7 @@ HFILES = \
 	err_protos.h \
 	globals.h \
 	incore.h \
+	pptr.h \
 	prefetch.h \
 	progress.h \
 	protos.h \
@@ -59,6 +60,7 @@ CFILES = \
 	phase5.c \
 	phase6.c \
 	phase7.c \
+	pptr.c \
 	prefetch.c \
 	progress.c \
 	quotacheck.c \
diff --git a/repair/phase6.c b/repair/phase6.c
index 0d253701..48ec236d 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -18,6 +18,7 @@
 #include "dinode.h"
 #include "progress.h"
 #include "versions.h"
+#include "repair/pptr.h"
 
 static struct cred		zerocr;
 static struct fsxattr 		zerofsx;
@@ -67,6 +68,7 @@ struct dir_hash_ent {
 	struct dir_hash_ent	*nextbyorder;	/* next in order added */
 	xfs_dahash_t		hashval;	/* hash value of name */
 	uint32_t		address;	/* offset of data entry */
+	uint32_t		new_address;	/* new address, if we rebuild */
 	xfs_ino_t		inum;		/* inode num of entry */
 	short			junkit;		/* name starts with / */
 	short			seen;		/* have seen leaf entry */
@@ -224,6 +226,7 @@ dir_hash_add(
 	p->address = addr;
 	p->inum = inum;
 	p->seen = 0;
+	p->new_address = addr;
 
 	/* Set up the name in the region trailing the hash entry. */
 	memcpy(p->namebuf, name, namelen);
@@ -885,6 +888,7 @@ mk_orphanage(xfs_mount_t *mp)
 	int		error;
 	const int	mode = 0755;
 	int		nres;
+	xfs_dir2_dataptr_t	diroffset;
 	struct xfs_name	xname;
 
 	/*
@@ -969,11 +973,13 @@ mk_orphanage(xfs_mount_t *mp)
 	/*
 	 * create the actual entry
 	 */
-	error = -libxfs_dir_createname(tp, pip, &xname, ip->i_ino, nres, NULL);
+	error = -libxfs_dir_createname(tp, pip, &xname, ip->i_ino, nres,
+			&diroffset);
 	if (error)
 		do_error(
 		_("can't make %s, createname error %d\n"),
 			ORPHANAGE, error);
+	add_parent_ptr(ip->i_ino, ORPHANAGE, diroffset, pip);
 
 	/*
 	 * bump up the link count in the root directory to account
@@ -1018,6 +1024,7 @@ mv_orphanage(
 	int			nres;
 	int			incr;
 	ino_tree_node_t		*irec;
+	xfs_dir2_dataptr_t	diroffset;
 	int			ino_offset = 0;
 	struct xfs_name		xname;
 
@@ -1066,7 +1073,7 @@ mv_orphanage(
 			libxfs_trans_ijoin(tp, ino_p, 0);
 
 			err = -libxfs_dir_createname(tp, orphanage_ip, &xname,
-						ino, nres, NULL);
+						ino, nres, &diroffset);
 			if (err)
 				do_error(
 	_("name create failed in %s (%d)\n"), ORPHANAGE, err);
@@ -1100,7 +1107,7 @@ mv_orphanage(
 
 
 			err = -libxfs_dir_createname(tp, orphanage_ip, &xname,
-						ino, nres, NULL);
+						ino, nres, &diroffset);
 			if (err)
 				do_error(
 	_("name create failed in %s (%d)\n"), ORPHANAGE, err);
@@ -1147,7 +1154,7 @@ mv_orphanage(
 		libxfs_trans_ijoin(tp, ino_p, 0);
 
 		err = -libxfs_dir_createname(tp, orphanage_ip, &xname, ino,
-						nres, NULL);
+						nres, &diroffset);
 		if (err)
 			do_error(
 	_("name create failed in %s (%d)\n"), ORPHANAGE, err);
@@ -1160,6 +1167,11 @@ mv_orphanage(
 			do_error(
 	_("orphanage name create failed (%d)\n"), err);
 	}
+
+	if (xfs_has_parent(mp))
+		add_parent_ptr(ino_p->i_ino, xname.name, diroffset,
+				orphanage_ip);
+
 	libxfs_irele(ino_p);
 	libxfs_irele(orphanage_ip);
 }
@@ -1330,7 +1342,7 @@ longform_dir2_rebuild(
 		libxfs_trans_ijoin(tp, ip, 0);
 
 		error = -libxfs_dir_createname(tp, ip, &p->name, p->inum,
-						nres, NULL);
+						nres, &p->new_address);
 		if (error) {
 			do_warn(
 _("name create failed in ino %" PRIu64 " (%d)\n"), ino, error);
@@ -2459,6 +2471,7 @@ shortform_dir2_entry_check(
 	struct xfs_dir2_sf_entry *next_sfep;
 	struct xfs_ifork	*ifp;
 	struct ino_tree_node	*irec;
+	xfs_dir2_dataptr_t	diroffset;
 	int			max_size;
 	int			ino_offset;
 	int			i;
@@ -2637,8 +2650,9 @@ shortform_dir2_entry_check(
 		/*
 		 * check for duplicate names in directory.
 		 */
-		if (!dir_hash_add(mp, hashtab, (xfs_dir2_dataptr_t)
-				(sfep - xfs_dir2_sf_firstentry(sfp)),
+		diroffset = xfs_dir2_byte_to_dataptr(
+				xfs_dir2_sf_get_offset(sfep));
+		if (!dir_hash_add(mp, hashtab, diroffset,
 				lino, sfep->namelen, sfep->name,
 				libxfs_dir2_sf_get_ftype(mp, sfep))) {
 			do_warn(
@@ -2672,6 +2686,7 @@ _("entry \"%s\" (ino %" PRIu64 ") in dir %" PRIu64 " is a duplicate name"),
 				next_sfep = shortform_dir2_junk(mp, sfp, sfep,
 						lino, &max_size, &i,
 						&bytes_deleted, ino_dirty);
+				dir_hash_junkit(hashtab, diroffset);
 				continue;
 			} else if (parent == ino)  {
 				add_inode_reached(irec, ino_offset);
@@ -2696,6 +2711,7 @@ _("entry \"%s\" (ino %" PRIu64 ") in dir %" PRIu64 " is a duplicate name"),
 				next_sfep = shortform_dir2_junk(mp, sfp, sfep,
 						lino, &max_size, &i,
 						&bytes_deleted, ino_dirty);
+				dir_hash_junkit(hashtab, diroffset);
 				continue;
 			}
 		}
@@ -2787,6 +2803,26 @@ _("entry \"%s\" (ino %" PRIu64 ") in dir %" PRIu64 " is a duplicate name"),
 	}
 }
 
+static void
+dir_hash_add_parent_ptrs(
+	struct xfs_inode	*dp,
+	struct dir_hash_tab	*hashtab)
+{
+	struct dir_hash_ent	*p;
+
+	if (!xfs_has_parent(dp->i_mount))
+		return;
+
+	for (p = hashtab->first; p; p = p->nextbyorder) {
+		if (p->name.name[0] == '/' || (p->name.name[0] == '.' &&
+				(p->name.len == 1 || (p->name.len == 2 &&
+						p->name.name[1] == '.'))))
+			continue;
+
+		add_parent_ptr(p->inum, p->name.name, p->new_address, dp);
+	}
+}
+
 /*
  * processes all reachable inodes in directories
  */
@@ -2913,6 +2949,7 @@ _("error %d fixing shortform directory %llu\n"),
 		default:
 			break;
 	}
+	dir_hash_add_parent_ptrs(ip, hashtab);
 	dir_hash_done(hashtab);
 
 	/*
@@ -3204,6 +3241,8 @@ phase6(xfs_mount_t *mp)
 	ino_tree_node_t		*irec;
 	int			i;
 
+	parent_ptr_init(mp);
+
 	memset(&zerocr, 0, sizeof(struct cred));
 	memset(&zerofsx, 0, sizeof(struct fsxattr));
 	orphanage_ino = 0;
@@ -3304,4 +3343,6 @@ _("        - resetting contents of realtime bitmap and summary inodes\n"));
 			irec = next_ino_rec(irec);
 		}
 	}
+
+	parent_ptr_free(mp);
 }
diff --git a/repair/pptr.c b/repair/pptr.c
new file mode 100644
index 00000000..b10c7f41
--- /dev/null
+++ b/repair/pptr.c
@@ -0,0 +1,242 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2023 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs.h"
+#include "libxfs/xfile.h"
+#include "libxfs/xfblob.h"
+#include "repair/err_protos.h"
+#include "repair/slab.h"
+#include "repair/pptr.h"
+
+#undef PPTR_DEBUG
+
+#ifdef PPTR_DEBUG
+# define dbg_printf(f, a...)  do {printf(f, ## a); fflush(stdout); } while (0)
+#else
+# define dbg_printf(f, a...)
+#endif
+
+/*
+ * Parent Pointer Validation
+ * =========================
+ *
+ * Phase 6 validates the connectivity of the directory tree after validating
+ * that all the space metadata are correct, and confirming all the inodes that
+ * we intend to keep.  The first part of phase 6 walks the directories of the
+ * filesystem to ensure that every file that isn't the root directory has a
+ * parent.  Unconnected files are attached to the orphanage.  Filesystems with
+ * the directory parent pointer feature enabled must also ensure that for every
+ * directory entry that points to a child file, that child has a matching
+ * parent pointer.
+ *
+ * There are many ways that we could check the parent pointers, but the means
+ * that we have chosen is to build a per-AG master index of all parent pointers
+ * of all inodes stored in that AG, and use that as the basis for comparison.
+ * This consumes a lot of memory, but performing both a forward scan to check
+ * dirent -> parent pointer and a backwards scan of parent pointer -> dirent
+ * takes longer than the simple method presented here.  Userspace adds the
+ * additional twist that inodes are not cached (and there are no ILOCKs), which
+ * makes that approach even less attractive.
+ *
+ * During the directory walk at the start of phase 6, we transform each child
+ * directory entry found into its parent pointer equivalent.  In other words,
+ * the forward information:
+ *
+ *     (dir_ino, dir_offset, name, child_ino)
+ *
+ * becomes this backwards information:
+ *
+ *     (*child_agino, *dir_ino, dir_gen, *dir_offset, name)
+ *
+ * Key fields are starred.
+ *
+ * This tuple is recorded in the per-AG master parent pointer index.  Note
+ * that names are stored separately in an xfblob data structure so that the
+ * rest of the information can be sorted and processed as fixed-size records.
+ *
+ * Once we've finished with the forward scan, we get to work on the backwards
+ * scan.  Each AG is processed independently.  First, we sort the per-AG master
+ * records in order of child_agino, dir_ino, and dir_offset.  Each inode in the
+ * AG is then processed in numerical order.
+ *
+ * The first thing that happens to the file is that we read all the extended
+ * attributes to look for parent pointers.  Attributes that claim to be parent
+ * pointers but are obviously garbage are thrown away.  The rest of the parent
+ * pointers for that file are recorded in memory like this:
+ *
+ *     (*dir_ino, dir_gen, *dir_offset, name)
+ *
+ * When we've concluded the xattr scan, these records are sorted in order of
+ * dir_ino and dir_offset.  The master index cursor should point at the first
+ * record for the file that we're scanning, if everything is consistent.
+ *
+ * If not, there are two possibilities:
+ *
+ * A. The master index cursor points to a higher inode number than the one we
+ * are scanning.  The file has apparently lost all parents, so all parent
+ * pointers (if any) must be deleted.  This should only happen to metadata
+ * inodes.
+ *
+ * B. The cursor instead points to a lower inode number than the one we are
+ * scanning.  This means that there exists a directory entry pointing at an
+ * inode that is free.  We supposedly already settled which inodes are free
+ * and which aren't, which means in-memory information is inconsistent.  Abort.
+ *
+ * Otherwise, we are ready to check the file parent pointers against the
+ * master.  If the ondisk directory metadata are all consistent, this recordset
+ * should correspond exactly to the subset of the master records with a
+ * child_agino matching the file that we're scanning.  We should be able to
+ * walk both sets in lockstep, and find one of the following outcomes:
+ *
+ * 1) The master index cursor is ahead of the ondisk index cursor.  This means
+ * that the inode has parent pointers that were not found during the dirent
+ * scan.  These should be deleted.
+ *
+ * 2) The ondisk index gets ahead of the master index.  This means that the
+ * dirent scan found parent pointers that are not attached to the inode.
+ * These should be added.
+ *
+ * 3) The parent_gen or (dirent) name is not consistent.  Update the parent
+ * pointer to the values that we found during the dirent scan.
+ *
+ * 4) Everything matches.  Move on to the next parent pointer.
+ *
+ * The current implementation does not try to rebuild directories from parent
+ * pointer information, as this requires a lengthy scan of the filesystem for
+ * each broken directory.
+ */
+
+struct ag_pptr {
+	/* parent directory handle */
+	xfs_ino_t		parent_ino;
+	unsigned int		parent_gen;
+
+	/* dirent offset */
+	xfs_dir2_dataptr_t	diroffset;
+
+	/* dirent name length */
+	unsigned int		namelen;
+
+	/* cookie for the actual dirent name */
+	xfblob_cookie		name_cookie;
+
+	/* agino of the child file */
+	xfs_agino_t		child_agino;
+};
+
+struct ag_pptrs {
+	/* Lock to protect pptr_recs during the dirent scan. */
+	pthread_mutex_t		lock;
+
+	/* Parent pointer records for files in this AG. */
+	struct xfs_slab		*pptr_recs;
+};
+
+/* Global names storage file. */
+static struct xfblob	*names;
+static pthread_mutex_t	names_mutex = PTHREAD_MUTEX_INITIALIZER;
+static struct ag_pptrs	*fs_pptrs;
+
+void
+parent_ptr_free(
+	struct xfs_mount	*mp)
+{
+	xfs_agnumber_t		agno;
+
+	if (!xfs_has_parent(mp))
+		return;
+
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		free_slab(&fs_pptrs[agno].pptr_recs);
+		pthread_mutex_destroy(&fs_pptrs[agno].lock);
+	}
+	free(fs_pptrs);
+	fs_pptrs = NULL;
+
+	xfblob_destroy(names);
+}
+
+void
+parent_ptr_init(
+	struct xfs_mount	*mp)
+{
+	xfs_agnumber_t		agno;
+	int			error;
+
+	if (!xfs_has_parent(mp))
+		return;
+
+	error = -xfblob_create(mp, "parent pointer names", &names);
+	if (error)
+		do_error(_("init parent pointer names failed: %s\n"),
+				strerror(error));
+
+	fs_pptrs = calloc(mp->m_sb.sb_agcount, sizeof(struct ag_pptrs));
+	if (!fs_pptrs)
+		do_error(
+ _("init parent pointer per-AG record array failed: %s\n"),
+				strerror(errno));
+
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
+		error = pthread_mutex_init(&fs_pptrs[agno].lock, NULL);
+		if (error)
+			do_error(
+ _("init agno %u parent pointer lock failed: %s\n"),
+					agno, strerror(error));
+
+		error = -init_slab(&fs_pptrs[agno].pptr_recs,
+				sizeof(struct ag_pptr));
+		if (error)
+			do_error(
+ _("init agno %u parent pointer recs failed: %s\n"),
+					agno, strerror(error));
+	}
+}
+
+/* Remember that @dp has a dirent (@fname, @ino) at @diroffset. */
+void
+add_parent_ptr(
+	xfs_ino_t		ino,
+	const unsigned char	*fname,
+	xfs_dir2_dataptr_t	diroffset,
+	struct xfs_inode	*dp)
+{
+	struct xfs_mount	*mp = dp->i_mount;
+	struct ag_pptr		ag_pptr = {
+		.child_agino	= XFS_INO_TO_AGINO(mp, ino),
+		.parent_ino	= dp->i_ino,
+		.parent_gen	= VFS_I(dp)->i_generation,
+		.diroffset	= diroffset,
+		.namelen	= strlen(fname),
+	};
+	struct ag_pptrs		*ag_pptrs;
+	xfs_agnumber_t		agno = XFS_INO_TO_AGNO(mp, ino);
+	int			error;
+
+	if (!xfs_has_parent(mp))
+		return;
+
+	pthread_mutex_lock(&names_mutex);
+	error = -xfblob_store(names, &ag_pptr.name_cookie, fname,
+			ag_pptr.namelen);
+	pthread_mutex_unlock(&names_mutex);
+	if (error)
+		do_error(_("storing name '%s' failed: %s\n"),
+				fname, strerror(error));
+
+	ag_pptrs = &fs_pptrs[agno];
+	pthread_mutex_lock(&ag_pptrs->lock);
+	error = -slab_add(ag_pptrs->pptr_recs, &ag_pptr);
+	pthread_mutex_unlock(&ag_pptrs->lock);
+	if (error)
+		do_error(_("storing name '%s' key failed: %s\n"),
+				fname, strerror(error));
+
+	dbg_printf(
+ _("%s: dp %llu fname '%s' diroffset %u ino %llu cookie 0x%llx\n"),
+			__func__, (unsigned long long)dp->i_ino, fname,
+			diroffset, (unsigned long long)ino,
+			(unsigned long long)ag_pptr.name_cookie);
+}
diff --git a/repair/pptr.h b/repair/pptr.h
new file mode 100644
index 00000000..2c632ec9
--- /dev/null
+++ b/repair/pptr.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2023 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __REPAIR_PPTR_H__
+#define __REPAIR_PPTR_H__
+
+void parent_ptr_free(struct xfs_mount *mp);
+void parent_ptr_init(struct xfs_mount *mp);
+
+void add_parent_ptr(xfs_ino_t ino, const unsigned char *fname,
+		xfs_dir2_dataptr_t diroffset, struct xfs_inode *dp);
+
+#endif /* __REPAIR_PPTR_H__ */
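
As a reading aid for the design comment in repair/pptr.c, here is a minimal sketch of the sort order it describes for the per-AG master records: child_agino first, then dir_ino, then dir_offset.  The demo_pptr structure is a hypothetical cut-down of struct ag_pptr, and plain qsort stands in for whatever the slab code does; this is an assumption for illustration, not the patch's implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for struct ag_pptr (no name cookie). */
struct demo_pptr {
	uint64_t	parent_ino;
	uint32_t	parent_gen;
	uint32_t	diroffset;
	uint32_t	child_agino;
};

/* Three-way compare that cannot overflow the way subtraction can. */
#define DEMO_CMP(a, b)	(((a) > (b)) - ((a) < (b)))

/* Order records by the starred key fields only; parent_gen is not a key. */
static int
demo_pptr_cmp(
	const void		*a,
	const void		*b)
{
	const struct demo_pptr	*pa = a;
	const struct demo_pptr	*pb = b;
	int			ret;

	ret = DEMO_CMP(pa->child_agino, pb->child_agino);
	if (ret)
		return ret;
	ret = DEMO_CMP(pa->parent_ino, pb->parent_ino);
	if (ret)
		return ret;
	return DEMO_CMP(pa->diroffset, pb->diroffset);
}

/* Sort an in-memory array of records into master index order. */
static void
demo_pptr_sort(
	struct demo_pptr	*recs,
	size_t			nr)
{
	if (nr == 0)
		return;
	assert(recs != NULL);
	qsort(recs, nr, sizeof(*recs), demo_pptr_cmp);
}
```

With the records in this order, all parent pointers of one child are contiguous, which is what lets each inode in the AG be processed in numerical order with a single cursor.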



* [PATCH 2/8] xfs_repair: check parent pointers
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/8] xfs_repair: support parent pointers Darrick J. Wong
  2023-02-16 21:08   ` [PATCH 1/8] xfs_repair: build a parent pointer index Darrick J. Wong
@ 2023-02-16 21:08   ` Darrick J. Wong
  2023-02-16 21:09   ` [PATCH 3/8] xfs_repair: dump garbage parent pointer attributes Darrick J. Wong
                     ` (5 subsequent siblings)
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:08 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Use the parent pointer index that we constructed in the previous patch
to check that each file's parent pointer records exactly match the
directory entries that we recorded while walking the directories.
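
For illustration only, the lockstep walk that the design comment in repair/pptr.c describes can be sketched as a merge of two sorted arrays: the master records for one child and the records read from that child's xattrs.  The demo_ names and the counting struct are hypothetical; a real repair would act on each case instead of tallying it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record: one parent pointer of the file being checked. */
struct demo_rec {
	uint64_t	parent_ino;
	uint32_t	parent_gen;
	uint32_t	diroffset;
	const char	*name;
};

/* Hypothetical tallies of what a repair pass would have to do. */
struct demo_counts {
	unsigned int	to_add;		/* in master, missing from the file */
	unsigned int	to_remove;	/* on the file, unknown to master */
	unsigned int	to_update;	/* keys match, gen or name differ */
	unsigned int	ok;
};

/* Compare the key fields only: parent_ino, then diroffset. */
static int
demo_key_cmp(
	const struct demo_rec	*a,
	const struct demo_rec	*b)
{
	if (a->parent_ino != b->parent_ino)
		return a->parent_ino < b->parent_ino ? -1 : 1;
	if (a->diroffset != b->diroffset)
		return a->diroffset < b->diroffset ? -1 : 1;
	return 0;
}

/* Walk both sorted sets in lockstep and classify every record. */
static void
demo_compare(
	const struct demo_rec	*master,
	size_t			nr_master,
	const struct demo_rec	*file,
	size_t			nr_file,
	struct demo_counts	*c)
{
	size_t			m = 0, f = 0;

	assert(c != NULL);
	memset(c, 0, sizeof(*c));

	while (m < nr_master && f < nr_file) {
		int		cmp = demo_key_cmp(&master[m], &file[f]);

		if (cmp > 0) {
			/* Master is ahead: no dirent backs this pointer. */
			c->to_remove++;
			f++;
		} else if (cmp < 0) {
			/* File is ahead: a dirent lacks its pointer. */
			c->to_add++;
			m++;
		} else {
			if (master[m].parent_gen != file[f].parent_gen ||
			    strcmp(master[m].name, file[f].name) != 0)
				c->to_update++;
			else
				c->ok++;
			m++;
			f++;
		}
	}

	/* Whatever is left on either side has no counterpart. */
	c->to_add += nr_master - m;
	c->to_remove += nr_file - f;
}
```

Because both inputs are sorted on the same keys, each record is visited once, so the check costs time linear in the number of parent pointers per file.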

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    5 
 libxfs/xfblob.c          |    9 +
 libxfs/xfblob.h          |    2 
 repair/Makefile          |    2 
 repair/listxattr.c       |  283 +++++++++++++++++++++
 repair/listxattr.h       |   15 +
 repair/phase6.c          |    2 
 repair/pptr.c            |  618 ++++++++++++++++++++++++++++++++++++++++++++++
 repair/pptr.h            |    2 
 9 files changed, 938 insertions(+)
 create mode 100644 repair/listxattr.c
 create mode 100644 repair/listxattr.h


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index b8ee0247..92cdb6cc 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -34,7 +34,9 @@
 #define xfs_alloc_vextent		libxfs_alloc_vextent
 
 #define xfs_attr3_leaf_hdr_from_disk	libxfs_attr3_leaf_hdr_from_disk
+#define xfs_attr3_leaf_read		libxfs_attr3_leaf_read
 #define xfs_attr_get			libxfs_attr_get
+#define xfs_attr_is_leaf		libxfs_attr_is_leaf
 #define xfs_attr_leaf_newentsize	libxfs_attr_leaf_newentsize
 #define xfs_attr_namecheck		libxfs_attr_namecheck
 #define xfs_attr_set			libxfs_attr_set
@@ -63,6 +65,7 @@
 #define xfs_bwrite			libxfs_bwrite
 #define xfs_calc_dquots_per_chunk	libxfs_calc_dquots_per_chunk
 #define xfs_da3_node_hdr_from_disk	libxfs_da3_node_hdr_from_disk
+#define xfs_da3_node_read		libxfs_da3_node_read
 #define xfs_da_get_buf			libxfs_da_get_buf
 #define xfs_da_hashname			libxfs_da_hashname
 #define xfs_da_read_buf			libxfs_da_read_buf
@@ -130,6 +133,7 @@
 #define xfs_inobt_stage_cursor		libxfs_inobt_stage_cursor
 #define xfs_inode_from_disk		libxfs_inode_from_disk
 #define xfs_inode_from_disk_ts		libxfs_inode_from_disk_ts
+#define xfs_inode_hasattr		libxfs_inode_hasattr
 #define xfs_inode_to_disk		libxfs_inode_to_disk
 #define xfs_inode_validate_cowextsize	libxfs_inode_validate_cowextsize
 #define xfs_inode_validate_extsize	libxfs_inode_validate_extsize
@@ -142,6 +146,7 @@
 #define xfs_mode_to_ftype		libxfs_mode_to_ftype
 #define xfs_parent_add			libxfs_parent_add
 #define xfs_parent_finish		libxfs_parent_finish
+#define xfs_parent_irec_from_disk	libxfs_parent_irec_from_disk
 #define xfs_parent_start		libxfs_parent_start
 #define xfs_perag_get			libxfs_perag_get
 #define xfs_perag_put			libxfs_perag_put
diff --git a/libxfs/xfblob.c b/libxfs/xfblob.c
index 6c1c8e6f..2c6e69a2 100644
--- a/libxfs/xfblob.c
+++ b/libxfs/xfblob.c
@@ -146,3 +146,12 @@ xfblob_free(
 	xfile_discard(blob->xfile, cookie, sizeof(key) + key.xb_size);
 	return 0;
 }
+
+/* Drop all the blobs. */
+void
+xfblob_truncate(
+	struct xfblob	*blob)
+{
+	xfile_discard(blob->xfile, PAGE_SIZE, blob->last_offset - PAGE_SIZE);
+	blob->last_offset = PAGE_SIZE;
+}
diff --git a/libxfs/xfblob.h b/libxfs/xfblob.h
index d1282810..0d6de8ce 100644
--- a/libxfs/xfblob.h
+++ b/libxfs/xfblob.h
@@ -22,4 +22,6 @@ int xfblob_store(struct xfblob *blob, xfblob_cookie *cookie, const void *ptr,
 		uint32_t size);
 int xfblob_free(struct xfblob *blob, xfblob_cookie cookie);
 
+void xfblob_truncate(struct xfblob *blob);
+
 #endif /* __XFS_SCRUB_XFBLOB_H__ */
diff --git a/repair/Makefile b/repair/Makefile
index 18731613..925864c2 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -23,6 +23,7 @@ HFILES = \
 	err_protos.h \
 	globals.h \
 	incore.h \
+	listxattr.h \
 	pptr.h \
 	prefetch.h \
 	progress.h \
@@ -53,6 +54,7 @@ CFILES = \
 	incore_ext.c \
 	incore_ino.c \
 	init.c \
+	listxattr.c \
 	phase1.c \
 	phase2.c \
 	phase3.c \
diff --git a/repair/listxattr.c b/repair/listxattr.c
new file mode 100644
index 00000000..484f9a00
--- /dev/null
+++ b/repair/listxattr.c
@@ -0,0 +1,283 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs.h"
+#include "libxlog.h"
+#include "libfrog/bitmap.h"
+#include "repair/listxattr.h"
+
+/* Call a function for every entry in a shortform xattr structure. */
+STATIC int
+xattr_walk_sf(
+	struct xfs_inode		*ip,
+	xattr_walk_fn			attr_fn,
+	void				*priv)
+{
+	struct xfs_attr_shortform	*sf;
+	struct xfs_attr_sf_entry	*sfe;
+	unsigned int			i;
+	int				error;
+
+	sf = (struct xfs_attr_shortform *)ip->i_af.if_u1.if_data;
+	for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
+		error = attr_fn(ip, sfe->flags, sfe->nameval, sfe->namelen,
+				&sfe->nameval[sfe->namelen], sfe->valuelen,
+				priv);
+		if (error)
+			return error;
+
+		sfe = xfs_attr_sf_nextentry(sfe);
+	}
+
+	return 0;
+}
+
+/* Call a function for every entry in this xattr leaf block. */
+STATIC int
+xattr_walk_leaf_entries(
+	struct xfs_inode		*ip,
+	xattr_walk_fn			attr_fn,
+	struct xfs_buf			*bp,
+	void				*priv)
+{
+	struct xfs_attr3_icleaf_hdr	ichdr;
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_attr_leafblock	*leaf = bp->b_addr;
+	struct xfs_attr_leaf_entry	*entry;
+	unsigned int			i;
+	int				error;
+
+	libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &ichdr, leaf);
+	entry = xfs_attr3_leaf_entryp(leaf);
+
+	for (i = 0; i < ichdr.count; entry++, i++) {
+		void			*value;
+		char			*name;
+		unsigned int		namelen, valuelen;
+
+		if (entry->flags & XFS_ATTR_LOCAL) {
+			struct xfs_attr_leaf_name_local		*name_loc;
+
+			name_loc = xfs_attr3_leaf_name_local(leaf, i);
+			name = name_loc->nameval;
+			namelen = name_loc->namelen;
+			value = &name_loc->nameval[name_loc->namelen];
+			valuelen = be16_to_cpu(name_loc->valuelen);
+		} else {
+			struct xfs_attr_leaf_name_remote	*name_rmt;
+
+			name_rmt = xfs_attr3_leaf_name_remote(leaf, i);
+			name = name_rmt->name;
+			namelen = name_rmt->namelen;
+			value = NULL;
+			valuelen = be32_to_cpu(name_rmt->valuelen);
+		}
+
+		error = attr_fn(ip, entry->flags, name, namelen, value,
+				valuelen, priv);
+		if (error)
+			return error;
+
+	}
+
+	return 0;
+}
+
+/*
+ * Call a function for every entry in a leaf-format xattr structure.  Avoid
+ * memory allocations for the loop detector since there's only one block.
+ */
+STATIC int
+xattr_walk_leaf(
+	struct xfs_inode		*ip,
+	xattr_walk_fn			attr_fn,
+	void				*priv)
+{
+	struct xfs_buf			*leaf_bp;
+	int				error;
+
+	error = -libxfs_attr3_leaf_read(NULL, ip, 0, &leaf_bp);
+	if (error)
+		return error;
+
+	error = xattr_walk_leaf_entries(ip, attr_fn, leaf_bp, priv);
+	libxfs_trans_brelse(NULL, leaf_bp);
+	return error;
+}
+
+/* Find the leftmost leaf in the xattr dabtree. */
+STATIC int
+xattr_walk_find_leftmost_leaf(
+	struct xfs_inode		*ip,
+	struct bitmap			*seen_blocks,
+	struct xfs_buf			**leaf_bpp)
+{
+	struct xfs_da3_icnode_hdr	nodehdr;
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_da_intnode		*node;
+	struct xfs_da_node_entry	*btree;
+	struct xfs_buf			*bp;
+	//xfs_failaddr_t			fa;
+	xfs_dablk_t			blkno = 0;
+	unsigned int			expected_level = 0;
+	int				error;
+
+	for (;;) {
+		uint16_t		magic;
+
+		error = -libxfs_da3_node_read(NULL, ip, blkno, &bp,
+				XFS_ATTR_FORK);
+		if (error)
+			return error;
+
+		node = bp->b_addr;
+		magic = be16_to_cpu(node->hdr.info.magic);
+		if (magic == XFS_ATTR_LEAF_MAGIC ||
+		    magic == XFS_ATTR3_LEAF_MAGIC)
+			break;
+
+		error = EFSCORRUPTED;
+		if (magic != XFS_DA_NODE_MAGIC &&
+		    magic != XFS_DA3_NODE_MAGIC)
+			goto out_buf;
+
+#if 0
+		fa = xfs_da3_node_header_check(bp, ip->i_ino);
+		if (fa)
+			goto out_buf;
+#endif
+
+		libxfs_da3_node_hdr_from_disk(mp, &nodehdr, node);
+
+		if (nodehdr.count == 0 || nodehdr.level >= XFS_DA_NODE_MAXDEPTH)
+			goto out_buf;
+
+		/* Check the level from the root node. */
+		if (blkno == 0)
+			expected_level = nodehdr.level - 1;
+		else if (expected_level != nodehdr.level)
+			goto out_buf;
+		else
+			expected_level--;
+
+		/* Remember that we've seen this node. */
+		error = -bitmap_set(seen_blocks, blkno, 1);
+		if (error)
+			goto out_buf;
+
+		/* Find the next level towards the leaves of the dabtree. */
+		btree = nodehdr.btree;
+		blkno = be32_to_cpu(btree->before);
+		libxfs_trans_brelse(NULL, bp);
+
+		/* Make sure we haven't seen this new block already. */
+		if (bitmap_test(seen_blocks, blkno, 1))
+			return EFSCORRUPTED;
+	}
+
+	error = EFSCORRUPTED;
+#if 0
+	fa = xfs_attr3_leaf_header_check(bp, ip->i_ino);
+	if (fa)
+		goto out_buf;
+#endif
+
+	if (expected_level != 0)
+		goto out_buf;
+
+	/* Remember that we've seen this leaf. */
+	error = -bitmap_set(seen_blocks, blkno, 1);
+	if (error)
+		goto out_buf;
+
+	*leaf_bpp = bp;
+	return 0;
+
+out_buf:
+	libxfs_trans_brelse(NULL, bp);
+	return error;
+}
+
+/* Call a function for every entry in a node-format xattr structure. */
+STATIC int
+xattr_walk_node(
+	struct xfs_inode		*ip,
+	xattr_walk_fn			attr_fn,
+	void				*priv)
+{
+	struct xfs_attr3_icleaf_hdr	leafhdr;
+	struct bitmap			*seen_blocks;
+	struct xfs_mount		*mp = ip->i_mount;
+	struct xfs_attr_leafblock	*leaf;
+	struct xfs_buf			*leaf_bp;
+	int				error;
+
+	bitmap_alloc(&seen_blocks);
+
+	error = xattr_walk_find_leftmost_leaf(ip, seen_blocks, &leaf_bp);
+	if (error)
+		goto out_bitmap;
+
+	for (;;) {
+		error = xattr_walk_leaf_entries(ip, attr_fn, leaf_bp,
+				priv);
+		if (error)
+			goto out_leaf;
+
+		/* Find the right sibling of this leaf block. */
+		leaf = leaf_bp->b_addr;
+		libxfs_attr3_leaf_hdr_from_disk(mp->m_attr_geo, &leafhdr, leaf);
+		if (leafhdr.forw == 0)
+			goto out_leaf;
+
+		libxfs_trans_brelse(NULL, leaf_bp);
+
+		error = EFSCORRUPTED;	/* seeing this leaf again means a loop */
+		if (bitmap_test(seen_blocks, leafhdr.forw, 1))
+			goto out_bitmap;
+
+		error = -libxfs_attr3_leaf_read(NULL, ip, leafhdr.forw,
+				&leaf_bp);
+		if (error)
+			goto out_bitmap;
+
+		/* Remember that we've seen this new leaf. */
+		error = -bitmap_set(seen_blocks, leafhdr.forw, 1);
+		if (error)
+			goto out_leaf;
+	}
+
+out_leaf:
+	libxfs_trans_brelse(NULL, leaf_bp);
+out_bitmap:
+	bitmap_free(&seen_blocks);
+	return error;
+}
+
+/* Call a function for every extended attribute in a file. */
+int
+xattr_walk(
+	struct xfs_inode	*ip,
+	xattr_walk_fn		attr_fn,
+	void			*priv)
+{
+	int			error;
+
+	if (!libxfs_inode_hasattr(ip))
+		return 0;
+
+	if (ip->i_af.if_format == XFS_DINODE_FMT_LOCAL)
+		return xattr_walk_sf(ip, attr_fn, priv);
+
+	/* attr functions require that the attr fork is loaded */
+	error = -libxfs_iread_extents(NULL, ip, XFS_ATTR_FORK);
+	if (error)
+		return error;
+
+	if (libxfs_attr_is_leaf(ip))
+		return xattr_walk_leaf(ip, attr_fn, priv);
+
+	return xattr_walk_node(ip, attr_fn, priv);
+}
diff --git a/repair/listxattr.h b/repair/listxattr.h
new file mode 100644
index 00000000..cd18fdd2
--- /dev/null
+++ b/repair/listxattr.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2022 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __REPAIR_LISTXATTR_H__
+#define __REPAIR_LISTXATTR_H__
+
+typedef int (*xattr_walk_fn)(struct xfs_inode *ip, unsigned int attr_flags,
+		const unsigned char *name, unsigned int namelen,
+		const void *value, unsigned int valuelen, void *priv);
+
+int xattr_walk(struct xfs_inode *ip, xattr_walk_fn attr_fn, void *priv);
+
+#endif /* __REPAIR_LISTXATTR_H__ */
diff --git a/repair/phase6.c b/repair/phase6.c
index 48ec236d..1994162a 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -3344,5 +3344,7 @@ _("        - resetting contents of realtime bitmap and summary inodes\n"));
 		}
 	}
 
+	/* Check and repair directory parent pointers, if enabled. */
+	check_parent_ptrs(mp);
 	parent_ptr_free(mp);
 }
diff --git a/repair/pptr.c b/repair/pptr.c
index b10c7f41..d1e7f5ee 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -6,8 +6,13 @@
 #include "libxfs.h"
 #include "libxfs/xfile.h"
 #include "libxfs/xfblob.h"
+#include "libfrog/workqueue.h"
+#include "repair/globals.h"
 #include "repair/err_protos.h"
 #include "repair/slab.h"
+#include "repair/listxattr.h"
+#include "repair/threads.h"
+#include "repair/incore.h"
 #include "repair/pptr.h"
 
 #undef PPTR_DEBUG
@@ -126,6 +131,21 @@ struct ag_pptr {
 	xfs_agino_t		child_agino;
 };
 
+struct file_pptr {
+	/* parent directory handle */
+	xfs_ino_t		parent_ino;
+	unsigned int		parent_gen;
+
+	/* dirent offset */
+	xfs_dir2_dataptr_t	diroffset;
+
+	/* parent pointer name length */
+	unsigned int		namelen;
+
+	/* cookie for the file dirent name */
+	xfblob_cookie		name_cookie;
+};
+
 struct ag_pptrs {
 	/* Lock to protect pptr_recs during the dirent scan. */
 	pthread_mutex_t		lock;
@@ -134,11 +154,80 @@ struct ag_pptrs {
 	struct xfs_slab		*pptr_recs;
 };
 
+struct file_scan {
+	struct ag_pptrs		*ag_pptrs;
+
+	/* cursor for comparing ag_pptrs.pptr_recs against file_pptr_recs */
+	struct xfs_slab_cursor	*ag_pptr_recs_cur;
+
+	/* xfs_parent_name_rec records for a file that we're checking */
+	struct xfs_slab		*file_pptr_recs;
+
+	/* cursor for comparing file_pptr_recs against ag_pptrs.pptr_recs */
+	struct xfs_slab_cursor	*file_pptr_recs_cur;
+
+	/* names associated with file_pptr_recs */
+	struct xfblob		*file_pptr_names;
+
+	/* Number of parent pointers recorded for this file. */
+	unsigned int		nr_file_pptrs;
+
+	/* Does this file have garbage xattrs with ATTR_PARENT set? */
+	bool			have_garbage;
+};
+
 /* Global names storage file. */
 static struct xfblob	*names;
 static pthread_mutex_t	names_mutex = PTHREAD_MUTEX_INITIALIZER;
 static struct ag_pptrs	*fs_pptrs;
 
+static int
+cmp_ag_pptr(
+	const void		*a,
+	const void		*b)
+{
+	const struct ag_pptr	*pa = a;
+	const struct ag_pptr	*pb = b;
+
+	if (pa->child_agino < pb->child_agino)
+		return -1;
+	if (pa->child_agino > pb->child_agino)
+		return 1;
+
+	if (pa->parent_ino < pb->parent_ino)
+		return -1;
+	if (pa->parent_ino > pb->parent_ino)
+		return 1;
+
+	if (pa->diroffset < pb->diroffset)
+		return -1;
+	if (pa->diroffset > pb->diroffset)
+		return 1;
+
+	return 0;
+}
+
+static int
+cmp_file_pptr(
+	const void		*a,
+	const void		*b)
+{
+	const struct file_pptr	*pa = a;
+	const struct file_pptr	*pb = b;
+
+	if (pa->parent_ino < pb->parent_ino)
+		return -1;
+	if (pa->parent_ino > pb->parent_ino)
+		return 1;
+
+	if (pa->diroffset < pb->diroffset)
+		return -1;
+	if (pa->diroffset > pb->diroffset)
+		return 1;
+
+	return 0;
+}
+
 void
 parent_ptr_free(
 	struct xfs_mount	*mp)
@@ -240,3 +329,532 @@ add_parent_ptr(
 			diroffset, (unsigned long long)ino,
 			(unsigned long long)ag_pptr.name_cookie);
 }
+
+/* Schedule this extended attribute for deletion. */
+static void
+record_garbage_xattr(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan,
+	unsigned int		attr_filter,
+	const void		*name,
+	unsigned int		namelen)
+{
+	if (no_modify) {
+		if (!fscan->have_garbage)
+			do_warn(
+ _("would delete garbage parent pointer extended attributes in ino %llu\n"),
+					(unsigned long long)ip->i_ino);
+		fscan->have_garbage = true;
+		return;
+	}
+
+	if (fscan->have_garbage)
+		return;
+	fscan->have_garbage = true;
+
+	do_warn(
+ _("deleting garbage parent pointer extended attributes in ino %llu\n"),
+			(unsigned long long)ip->i_ino);
+	/* XXX do the work */
+}
+
+/* Decide if this is a directory parent pointer and stash it if so. */
+static int
+examine_xattr(
+	struct xfs_inode	*ip,
+	unsigned int		attr_flags,
+	const unsigned char	*name,
+	unsigned int		namelen,
+	const void		*value,
+	unsigned int		valuelen,
+	void			*priv)
+{
+	struct file_pptr	file_pptr = { };
+	struct xfs_parent_name_irec irec;
+	struct xfs_mount	*mp = ip->i_mount;
+	struct file_scan	*fscan = priv;
+	const struct xfs_parent_name_rec *rec = (const void *)name;
+	int			error;
+
+	/* Ignore anything that isn't a parent pointer. */
+	if (!(attr_flags & XFS_ATTR_PARENT))
+		return 0;
+
+	/* No incomplete parent pointers. */
+	if (attr_flags & XFS_ATTR_INCOMPLETE)
+		goto corrupt;
+
+	/* Does the ondisk parent pointer structure make sense? */
+	if (!xfs_parent_namecheck(mp, rec, namelen, attr_flags) ||
+	    !xfs_parent_valuecheck(mp, value, valuelen))
+		goto corrupt;
+
+	libxfs_parent_irec_from_disk(&irec, rec, value, valuelen);
+
+	file_pptr.parent_ino = irec.p_ino;
+	file_pptr.parent_gen = irec.p_gen;
+	file_pptr.diroffset = irec.p_diroffset;
+	file_pptr.namelen = irec.p_namelen;
+
+	error = -xfblob_store(fscan->file_pptr_names,
+			&file_pptr.name_cookie, irec.p_name, irec.p_namelen);
+	if (error)
+		do_error(
+ _("storing ino %llu parent pointer '%.*s' failed: %s\n"),
+				(unsigned long long)ip->i_ino, irec.p_namelen,
+				(const char *)irec.p_name, strerror(error));
+
+	error = -slab_add(fscan->file_pptr_recs, &file_pptr);
+	if (error)
+		do_error(_("storing ino %llu parent pointer rec failed: %s\n"),
+				(unsigned long long)ip->i_ino, strerror(error));
+
+	dbg_printf(
+ _("%s: dp %llu fname '%.*s' namelen %u diroffset %u ino %llu cookie 0x%llx\n"),
+			__func__, (unsigned long long)irec.p_ino,
+			irec.p_namelen, (const char *)irec.p_name,
+			irec.p_namelen, irec.p_diroffset,
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)file_pptr.name_cookie);
+	fscan->nr_file_pptrs++;
+	return 0;
+corrupt:
+	record_garbage_xattr(ip, fscan, attr_flags, name, namelen);
+	return 0;
+}
+
+/* Remove all pptrs from @ip. */
+static void
+clear_all_pptrs(
+	struct xfs_inode	*ip)
+{
+	if (no_modify) {
+		do_warn(_("would delete unlinked ino %llu parent pointers\n"),
+				(unsigned long long)ip->i_ino);
+		return;
+	}
+
+	do_warn(_("deleting unlinked ino %llu parent pointers\n"),
+			(unsigned long long)ip->i_ino);
+	/* XXX actually do the work */
+}
+
+/* Add @ag_pptr to @ip. */
+static void
+add_missing_parent_ptr(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan,
+	const struct ag_pptr	*ag_pptr)
+{
+	unsigned char		name[MAXNAMELEN];
+	int			error;
+
+	error = -xfblob_load(names, ag_pptr->name_cookie, name,
+			ag_pptr->namelen);
+	if (error)
+		do_error(
+ _("loading missing name for ino %llu parent pointer (ino %llu gen 0x%x diroffset %u namecookie 0x%llx) failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)ag_pptr->parent_ino,
+				ag_pptr->parent_gen, ag_pptr->diroffset,
+				(unsigned long long)ag_pptr->name_cookie,
+				strerror(error));
+
+	if (no_modify) {
+		do_warn(
+ _("would add missing ino %llu parent pointer (ino %llu gen 0x%x diroffset %u name '%.*s')\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)ag_pptr->parent_ino,
+				ag_pptr->parent_gen, ag_pptr->diroffset,
+				ag_pptr->namelen, name);
+		return;
+	}
+
+	do_warn(
+ _("adding missing ino %llu parent pointer (ino %llu gen 0x%x diroffset %u name '%.*s')\n"),
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)ag_pptr->parent_ino,
+			ag_pptr->parent_gen, ag_pptr->diroffset,
+			ag_pptr->namelen, name);
+
+	/* XXX actually do the work */
+}
+
+/* Remove @file_pptr from @ip. */
+static void
+remove_incorrect_parent_ptr(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan,
+	const struct file_pptr	*file_pptr)
+{
+	unsigned char		name[MAXNAMELEN] = { };
+	int			error;
+
+	error = -xfblob_load(fscan->file_pptr_names, file_pptr->name_cookie,
+			name, file_pptr->namelen);
+	if (error)
+		do_error(
+ _("loading incorrect name for ino %llu parent pointer (ino %llu gen 0x%x diroffset %u namecookie 0x%llx) failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)file_pptr->parent_ino,
+				file_pptr->parent_gen, file_pptr->diroffset,
+				(unsigned long long)file_pptr->name_cookie,
+				strerror(error));
+
+	if (no_modify) {
+		do_warn(
+ _("would remove bad ino %llu parent pointer (ino %llu gen 0x%x diroffset %u name '%.*s')\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)file_pptr->parent_ino,
+				file_pptr->parent_gen, file_pptr->diroffset,
+				file_pptr->namelen, name);
+		return;
+	}
+
+	do_warn(
+ _("removing bad ino %llu parent pointer (ino %llu gen 0x%x diroffset %u name '%.*s')\n"),
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)file_pptr->parent_ino,
+			file_pptr->parent_gen, file_pptr->diroffset,
+			file_pptr->namelen, name);
+
+	/* XXX actually do the work */
+}
+
+/*
+ * We found parent pointers that point to the same inode and directory offset.
+ * Make sure they have the same generation number and dirent name.
+ */
+static void
+compare_parent_pointers(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan,
+	const struct ag_pptr	*ag_pptr,
+	const struct file_pptr	*file_pptr)
+{
+	unsigned char		name1[MAXNAMELEN] = { };
+	unsigned char		name2[MAXNAMELEN] = { };
+	int			error;
+
+	error = -xfblob_load(names, ag_pptr->name_cookie, name1,
+			ag_pptr->namelen);
+	if (error)
+		do_error(
+ _("loading master-list name for ino %llu parent pointer (ino %llu gen 0x%x diroffset %u namecookie 0x%llx namelen %u) failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)ag_pptr->parent_ino,
+				ag_pptr->parent_gen, ag_pptr->diroffset,
+				(unsigned long long)ag_pptr->name_cookie,
+				ag_pptr->namelen, strerror(error));
+
+	error = -xfblob_load(fscan->file_pptr_names, file_pptr->name_cookie,
+			name2, file_pptr->namelen);
+	if (error)
+		do_error(
+ _("loading file-list name for ino %llu parent pointer (ino %llu gen 0x%x diroffset %u namecookie 0x%llx namelen %u) failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)file_pptr->parent_ino,
+				file_pptr->parent_gen, file_pptr->diroffset,
+				(unsigned long long)file_pptr->name_cookie,
+				file_pptr->namelen, strerror(error));
+
+	if (ag_pptr->parent_gen != file_pptr->parent_gen)
+		goto reset;
+	if (ag_pptr->namelen != file_pptr->namelen)
+		goto reset;
+	if (memcmp(name1, name2, ag_pptr->namelen))
+		goto reset;
+
+	return;
+
+reset:
+	if (no_modify) {
+		do_warn(
+ _("would update ino %llu parent pointer (ino %llu gen 0x%x diroffset %u name '%.*s')\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)ag_pptr->parent_ino,
+				ag_pptr->parent_gen, ag_pptr->diroffset,
+				ag_pptr->namelen, name1);
+		return;
+	}
+
+	do_warn(
+ _("updating ino %llu parent pointer (ino %llu gen 0x%x diroffset %u name '%.*s')\n"),
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)ag_pptr->parent_ino,
+			ag_pptr->parent_gen, ag_pptr->diroffset,
+			ag_pptr->namelen, name1);
+
+	/* XXX do the work */
+}
+
+/*
+ * Make sure that the parent pointers we observed match the ones ondisk.
+ *
+ * Earlier, we generated a master list of parent pointers for files in this AG
+ * based on what we saw during the directory walk at the start of phase 6.
+ * Now that we've read in all of this file's parent pointers, make sure the
+ * lists match.
+ */
+static void
+crosscheck_file_parent_ptrs(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan)
+{
+	struct ag_pptr		*ag_pptr;
+	struct file_pptr	*file_pptr;
+	struct xfs_mount	*mp = ip->i_mount;
+	xfs_agino_t		agino = XFS_INO_TO_AGINO(mp, ip->i_ino);
+	int			error;
+
+	ag_pptr = peek_slab_cursor(fscan->ag_pptr_recs_cur);
+
+	if (!ag_pptr || ag_pptr->child_agino > agino) {
+		/*
+		 * The cursor for the master pptr list has gone beyond this
+		 * file that we're scanning.  Evidently it has no parents at
+		 * all, so we better not have found any pptrs attached to the
+		 * file.
+		 */
+		if (fscan->nr_file_pptrs > 0)
+			clear_all_pptrs(ip);
+
+		return;
+	}
+
+	if (ag_pptr->child_agino < agino) {
+		/*
+		 * The cursor for the master pptr list is behind the file that
+		 * we're scanning.  This suggests that the incore inode tree
+		 * doesn't know about a file that is mentioned by a dirent.
+		 * At this point the inode liveness is supposed to be settled,
+		 * which means our incore information is inconsistent.
+		 */
+		do_error(
+ _("found dirent referring to ino %llu even though inobt scan moved on to ino %llu?!\n"),
+				(unsigned long long)XFS_AGINO_TO_INO(mp,
+					XFS_INO_TO_AGNO(mp, ip->i_ino),
+					ag_pptr->child_agino),
+				(unsigned long long)ip->i_ino);
+		/* does not return */
+	}
+
+	/*
+	 * The master pptr list cursor is pointing to the inode that we want
+	 * to check.  Sort the pptr records that we recorded from the ondisk
+	 * pptrs for this file, then set up for the comparison.
+	 */
+	qsort_slab(fscan->file_pptr_recs, cmp_file_pptr);
+
+	error = -init_slab_cursor(fscan->file_pptr_recs, cmp_file_pptr,
+			&fscan->file_pptr_recs_cur);
+	if (error)
+		do_error(_("init ino %llu parent pointer cursor failed: %s\n"),
+				(unsigned long long)ip->i_ino, strerror(error));
+
+	do {
+		file_pptr = peek_slab_cursor(fscan->file_pptr_recs_cur);
+
+		dbg_printf(
+ _("%s: dp %llu dp_gen 0x%x namelen %u diroffset %u ino %llu namecookie 0x%llx (master)\n"),
+				__func__,
+				(unsigned long long)ag_pptr->parent_ino,
+				ag_pptr->parent_gen,
+				ag_pptr->namelen,
+				ag_pptr->diroffset,
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)ag_pptr->name_cookie);
+
+		if (file_pptr) {
+			dbg_printf(
+ _("%s: dp %llu dp_gen 0x%x namelen %u diroffset %u ino %llu namecookie 0x%llx (file)\n"),
+					__func__,
+					(unsigned long long)file_pptr->parent_ino,
+					file_pptr->parent_gen,
+					file_pptr->namelen,
+					file_pptr->diroffset,
+					(unsigned long long)ip->i_ino,
+					(unsigned long long)file_pptr->name_cookie);
+		} else {
+			dbg_printf(
+ _("%s: ran out of parent pointers for ino %llu (file)\n"),
+					__func__,
+					(unsigned long long)ip->i_ino);
+		}
+
+		if (!file_pptr || file_pptr->parent_ino > ag_pptr->parent_ino ||
+		    (file_pptr->parent_ino == ag_pptr->parent_ino &&
+		     file_pptr->diroffset > ag_pptr->diroffset)) {
+			/*
+			 * The master pptr list knows about pptrs that are not
+			 * in the ondisk metadata.  Add the missing pptr and
+			 * advance only the master pptr cursor.
+			 */
+			add_missing_parent_ptr(ip, fscan, ag_pptr);
+			advance_slab_cursor(fscan->ag_pptr_recs_cur);
+		} else if (file_pptr->parent_ino < ag_pptr->parent_ino ||
+			   file_pptr->diroffset < ag_pptr->diroffset) {
+			/*
+			 * The ondisk pptrs mention a link that is not in the
+			 * master list.  Delete the extra pptr and advance only
+			 * the file pptr cursor.
+			 */
+			remove_incorrect_parent_ptr(ip, fscan, file_pptr);
+			advance_slab_cursor(fscan->file_pptr_recs_cur);
+		} else {
+			/*
+			 * Exact match, make sure the parent_gen and dirent
+			 * name parts of the parent pointer match.  Move both
+			 * cursors forward.
+			 */
+			compare_parent_pointers(ip, fscan, ag_pptr, file_pptr);
+			advance_slab_cursor(fscan->ag_pptr_recs_cur);
+			advance_slab_cursor(fscan->file_pptr_recs_cur);
+		}
+
+		ag_pptr = peek_slab_cursor(fscan->ag_pptr_recs_cur);
+	} while (ag_pptr && ag_pptr->child_agino == agino);
+
+	while ((file_pptr = pop_slab_cursor(fscan->file_pptr_recs_cur))) {
+		dbg_printf(
+ _("%s: dp %llu dp_gen 0x%x namelen %u diroffset %u ino %llu namecookie 0x%llx (excess)\n"),
+				__func__,
+				(unsigned long long)file_pptr->parent_ino,
+				file_pptr->parent_gen,
+				file_pptr->namelen,
+				file_pptr->diroffset,
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)file_pptr->name_cookie);
+
+		/*
+		 * The master pptr list does not have any more pptrs for this
+		 * file, but we still have unprocessed ondisk pptrs.  Delete
+		 * all these ondisk pptrs.
+		 */
+		remove_incorrect_parent_ptr(ip, fscan, file_pptr);
+	}
+}
+
+/* Ensure this file's parent pointers match what we found in the dirent scan. */
+static void
+check_file_parent_ptrs(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan)
+{
+	int			error;
+
+	error = -init_slab(&fscan->file_pptr_recs, sizeof(struct file_pptr));
+	if (error)
+		do_error(_("init file parent pointer recs failed: %s\n"),
+				strerror(error));
+
+	fscan->have_garbage = false;
+	fscan->nr_file_pptrs = 0;
+
+	error = xattr_walk(ip, examine_xattr, fscan);
+	if (error && !no_modify)
+		do_error(_("ino %llu parent pointer scan failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+	if (error) {
+		do_warn(_("ino %llu parent pointer scan failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+		goto out_free;
+	}
+
+	crosscheck_file_parent_ptrs(ip, fscan);
+
+out_free:
+	free_slab(&fscan->file_pptr_recs);
+	xfblob_truncate(fscan->file_pptr_names);
+}
+
+/* Check all the parent pointers of files in this AG. */
+static void
+check_ag_parent_ptrs(
+	struct workqueue	*wq,
+	uint32_t		agno,
+	void			*arg)
+{
+	struct xfs_mount	*mp = wq->wq_ctx;
+	struct file_scan	fscan = {
+		.ag_pptrs	= &fs_pptrs[agno],
+	};
+	struct ag_pptrs		*ag_pptrs = &fs_pptrs[agno];
+	struct ino_tree_node	*irec;
+	int			error;
+
+	qsort_slab(ag_pptrs->pptr_recs, cmp_ag_pptr);
+
+	error = -init_slab_cursor(ag_pptrs->pptr_recs, cmp_ag_pptr,
+			&fscan.ag_pptr_recs_cur);
+	if (error)
+		do_error(
+ _("init agno %u parent pointer slab cursor failed: %s\n"),
+				agno, strerror(error));
+
+	error = -xfblob_create(mp, "file parent pointer names",
+			&fscan.file_pptr_names);
+	if (error)
+		do_error(
+ _("init agno %u file parent pointer names failed: %s\n"),
+				agno, strerror(error));
+
+	for (irec = findfirst_inode_rec(agno);
+	     irec != NULL;
+	     irec = next_ino_rec(irec)) {
+		unsigned int	ino_offset;
+
+		for (ino_offset = 0;
+		     ino_offset < XFS_INODES_PER_CHUNK;
+		     ino_offset++) {
+			struct xfs_inode *ip;
+			xfs_ino_t	ino;
+
+			if (is_inode_free(irec, ino_offset))
+				continue;
+
+			ino = XFS_AGINO_TO_INO(mp, agno,
+					irec->ino_startnum + ino_offset);
+			error = -libxfs_iget(mp, NULL, ino, 0, &ip);
+			if (error && !no_modify)
+				do_error(
+ _("loading ino %llu for parent pointer check failed: %s\n"),
+						(unsigned long long)ino,
+						strerror(error));
+			if (error) {
+				do_warn(
+ _("loading ino %llu for parent pointer check failed: %s\n"),
+						(unsigned long long)ino,
+						strerror(error));
+				continue;
+			}
+
+			check_file_parent_ptrs(ip, &fscan);
+			libxfs_irele(ip);
+		}
+	}
+
+	xfblob_destroy(fscan.file_pptr_names);
+	free_slab_cursor(&fscan.ag_pptr_recs_cur);
+}
+
+/* Check all the parent pointers of all files in this filesystem. */
+void
+check_parent_ptrs(
+	struct xfs_mount	*mp)
+{
+	struct workqueue	wq;
+	xfs_agnumber_t		agno;
+
+	if (!xfs_has_parent(mp))
+		return;
+
+	create_work_queue(&wq, mp, ag_stride);
+
+	for (agno = 0; agno < mp->m_sb.sb_agcount; agno++)
+		queue_work(&wq, check_ag_parent_ptrs, agno, NULL);
+
+	destroy_work_queue(&wq);
+}
diff --git a/repair/pptr.h b/repair/pptr.h
index 2c632ec9..d72c1ac2 100644
--- a/repair/pptr.h
+++ b/repair/pptr.h
@@ -12,4 +12,6 @@ void parent_ptr_init(struct xfs_mount *mp);
 void add_parent_ptr(xfs_ino_t ino, const unsigned char *fname,
 		xfs_dir2_dataptr_t diroffset, struct xfs_inode *dp);
 
+void check_parent_ptrs(struct xfs_mount *mp);
+
 #endif /* __REPAIR_PPTR_H__ */
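
The cross-check in crosscheck_file_parent_ptrs() is a merge of two
lists that are both sorted by (parent_ino, diroffset).  A minimal
standalone sketch of that merge is below, assuming plain arrays rather
than slabs and cursors.  struct pptr_key, struct xcheck_counts,
cmp_pptr_key() and xcheck_sorted() are made-up names for illustration
and are not part of xfsprogs.  The sketch compares keys
lexicographically, parent inode first and directory offset only as a
tie-breaker, which is what the sort order of cmp_file_pptr() implies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the (parent_ino, diroffset) key of a pptr. */
struct pptr_key {
	uint64_t	parent_ino;
	uint32_t	diroffset;
};

/* Tallies of what the cross-check would do; illustration only. */
struct xcheck_counts {
	unsigned int	missing;	/* in master list, not on disk */
	unsigned int	excess;		/* on disk, not in master list */
	unsigned int	matched;	/* present in both lists */
};

/* Order keys by parent inode first, then by directory offset. */
static int
cmp_pptr_key(
	const struct pptr_key	*a,
	const struct pptr_key	*b)
{
	if (a->parent_ino != b->parent_ino)
		return a->parent_ino < b->parent_ino ? -1 : 1;
	if (a->diroffset != b->diroffset)
		return a->diroffset < b->diroffset ? -1 : 1;
	return 0;
}

/* Walk two sorted arrays in lockstep and classify every record. */
static struct xcheck_counts
xcheck_sorted(
	const struct pptr_key	*master,
	size_t			nr_master,
	const struct pptr_key	*ondisk,
	size_t			nr_ondisk)
{
	struct xcheck_counts	c = { 0 };
	size_t			m = 0, f = 0;

	while (m < nr_master && f < nr_ondisk) {
		int		cmp = cmp_pptr_key(&ondisk[f], &master[m]);

		/* The merge only works if the master list is sorted. */
		assert(m == 0 || cmp_pptr_key(&master[m - 1], &master[m]) <= 0);

		if (cmp > 0) {
			/* Ondisk list skipped past this master record. */
			c.missing++;
			m++;
		} else if (cmp < 0) {
			/* Ondisk record has no master counterpart. */
			c.excess++;
			f++;
		} else {
			c.matched++;
			m++;
			f++;
		}
	}

	/* Whatever is left over on either side has no partner. */
	c.missing += nr_master - m;
	c.excess += nr_ondisk - f;
	return c;
}
```

In the real code, "missing" corresponds to add_missing_parent_ptr(),
"excess" to remove_incorrect_parent_ptr(), and "matched" to
compare_parent_pointers().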



* [PATCH 3/8] xfs_repair: dump garbage parent pointer attributes
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/8] xfs_repair: support parent pointers Darrick J. Wong
  2023-02-16 21:08   ` [PATCH 1/8] xfs_repair: build a parent pointer index Darrick J. Wong
  2023-02-16 21:08   ` [PATCH 2/8] xfs_repair: check parent pointers Darrick J. Wong
@ 2023-02-16 21:09   ` Darrick J. Wong
  2023-02-16 21:09   ` [PATCH 4/8] xfs_repair: update ondisk parent pointer records Darrick J. Wong
                     ` (4 subsequent siblings)
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:09 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Delete xattrs that have ATTR_PARENT set but are incomplete or fail the
parent pointer name and value checks, because they clearly aren't parent
pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/pptr.c |  114 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 111 insertions(+), 3 deletions(-)
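
The patch records garbage attrs while xattr_walk() is running and only
removes them once the walk has finished.  My reading (not stated in the
patch) is that this avoids changing the attr fork while the walker is
still iterating over it.  A minimal sketch of that two-pass pattern is
below, assuming a toy in-memory attr list; struct toy_file,
struct garbage_list and the toy_*() helpers are hypothetical and have
nothing to do with the libxfs attr code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_ATTRS	8

/* Hypothetical in-memory attr list; not the libxfs attr fork. */
struct toy_attr {
	char		name[16];
	bool		garbage;
};

struct toy_file {
	struct toy_attr	attrs[MAX_ATTRS];
	size_t		nr_attrs;
};

/* Names queued for removal, filled in while the walk is running. */
struct garbage_list {
	char		names[MAX_ATTRS][16];
	size_t		nr;
};

/* Pass 1: walk the attrs and only remember the bad ones. */
static void
toy_record_garbage(
	const struct toy_file	*fp,
	struct garbage_list	*gl)
{
	size_t			i;

	gl->nr = 0;
	for (i = 0; i < fp->nr_attrs; i++) {
		if (!fp->attrs[i].garbage)
			continue;
		assert(gl->nr < MAX_ATTRS);
		strcpy(gl->names[gl->nr++], fp->attrs[i].name);
	}
}

/* Remove one attr by name; returns true if something was removed. */
static bool
toy_remove_attr(
	struct toy_file		*fp,
	const char		*name)
{
	size_t			i;

	for (i = 0; i < fp->nr_attrs; i++) {
		if (strcmp(fp->attrs[i].name, name))
			continue;
		/* Close the gap left by the removed entry. */
		memmove(&fp->attrs[i], &fp->attrs[i + 1],
				(fp->nr_attrs - i - 1) * sizeof(fp->attrs[0]));
		fp->nr_attrs--;
		return true;
	}
	return false;
}

/* Pass 2: the walk is over, so it is safe to change the attr list. */
static size_t
toy_remove_garbage(
	struct toy_file			*fp,
	const struct garbage_list	*gl)
{
	size_t				i, removed = 0;

	for (i = 0; i < gl->nr; i++)
		if (toy_remove_attr(fp, gl->names[i]))
			removed++;
	return removed;
}
```

toy_record_garbage() plays the role of record_garbage_xattr() and
toy_remove_garbage() the role of remove_garbage_xattrs().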


diff --git a/repair/pptr.c b/repair/pptr.c
index d1e7f5ee..695177ce 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -174,6 +174,23 @@ struct file_scan {
 
 	/* Does this file have garbage xattrs with ATTR_PARENT set? */
 	bool			have_garbage;
+
+	/* xattrs that we have to remove from this file */
+	struct xfs_slab		*garbage_xattr_recs;
+
+	/* attr names associated with garbage_xattr_recs */
+	struct xfblob		*garbage_xattr_names;
+};
+
+struct garbage_xattr {
+	/* xfs_da_args.attr_filter for the attribute being removed */
+	unsigned int		attr_filter;
+
+	/* attribute name length */
+	unsigned int		attrnamelen;
+
+	/* cookie for the attribute name */
+	xfblob_cookie		attrname_cookie;
 };
 
 /* Global names storage file. */
@@ -330,7 +347,63 @@ add_parent_ptr(
 			(unsigned long long)ag_pptr.name_cookie);
 }
 
-/* Schedule this extended attribute for deletion. */
+/* Remove garbage extended attributes that have ATTR_PARENT set. */
+static void
+remove_garbage_xattrs(
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan)
+{
+	struct xfs_slab_cursor	*cur;
+	struct garbage_xattr	*ga;
+	int			error;
+
+	error = -init_slab_cursor(fscan->garbage_xattr_recs, NULL, &cur);
+	if (error)
+		do_error(_("init garbage xattr cursor failed: %s\n"),
+				strerror(error));
+
+	while ((ga = pop_slab_cursor(cur)) != NULL) {
+		struct xfs_da_args	args = {
+			.dp		= ip,
+			.attr_filter	= ga->attr_filter,
+			.namelen	= ga->attrnamelen,
+		};
+		void			*buf;
+
+		buf = malloc(ga->attrnamelen);
+		if (!buf)
+			do_error(
+ _("allocating %u bytes to remove ino %llu garbage xattr failed: %s\n"),
+					ga->attrnamelen,
+					(unsigned long long)ip->i_ino,
+					strerror(errno));
+
+		error = -xfblob_load(fscan->garbage_xattr_names,
+				ga->attrname_cookie, buf, ga->attrnamelen);
+		if (error)
+			do_error(
+ _("loading garbage xattr name failed: %s\n"),
+					strerror(error));
+
+		args.name = buf;
+		error = -libxfs_attr_set(&args);
+		if (error)
+			do_error(
+ _("removing ino %llu garbage xattr failed: %s\n"),
+					(unsigned long long)ip->i_ino,
+					strerror(error));
+
+		free(buf);
+	}
+
+	free_slab_cursor(&cur);
+
+	free_slab(&fscan->garbage_xattr_recs);
+	xfblob_destroy(fscan->garbage_xattr_names);
+	fscan->garbage_xattr_names = NULL;
+}
+
+/* Schedule this ATTR_PARENT extended attribute for deletion. */
 static void
 record_garbage_xattr(
 	struct xfs_inode	*ip,
@@ -339,6 +412,13 @@ record_garbage_xattr(
 	const void		*name,
 	unsigned int		namelen)
 {
+	struct garbage_xattr	garbage_xattr = {
+		.attr_filter	= attr_filter,
+		.attrnamelen	= namelen,
+	};
+	struct xfs_mount	*mp = ip->i_mount;
+	int			error;
+
 	if (no_modify) {
 		if (!fscan->have_garbage)
 			do_warn(
@@ -349,13 +429,38 @@ record_garbage_xattr(
 	}
 
 	if (fscan->have_garbage)
-		return;
+		goto stuffit;
 	fscan->have_garbage = true;
 
 	do_warn(
  _("deleting garbage parent pointer extended attributes in ino %llu\n"),
 			(unsigned long long)ip->i_ino);
-	/* XXX do the work */
+
+	error = -init_slab(&fscan->garbage_xattr_recs,
+			sizeof(struct garbage_xattr));
+	if (error)
+		do_error(_("init garbage xattr recs failed: %s\n"),
+				strerror(error));
+
+	error = -xfblob_create(mp, "garbage xattr names",
+			&fscan->garbage_xattr_names);
+	if (error)
+		do_error(_("init garbage xattr names failed: %s\n"),
+				strerror(error));
+
+stuffit:
+	error = -xfblob_store(fscan->garbage_xattr_names,
+			&garbage_xattr.attrname_cookie, name, namelen);
+	if (error)
+		do_error(_("storing ino %llu garbage xattr failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+
+	error = -slab_add(fscan->garbage_xattr_recs, &garbage_xattr);
+	if (error)
+		do_error(_("storing ino %llu garbage xattr rec failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
 }
 
 /* Decide if this is a directory parent pointer and stash it if so. */
@@ -763,6 +868,9 @@ check_file_parent_ptrs(
 		goto out_free;
 	}
 
+	if (!no_modify && fscan->have_garbage)
+		remove_garbage_xattrs(ip, fscan);
+
 	crosscheck_file_parent_ptrs(ip, fscan);
 
 out_free:



* [PATCH 4/8] xfs_repair: update ondisk parent pointer records
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/8] xfs_repair: support parent pointers Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 21:09   ` [PATCH 3/8] xfs_repair: dump garbage parent pointer attributes Darrick J. Wong
@ 2023-02-16 21:09   ` Darrick J. Wong
  2023-02-16 21:09   ` [PATCH 5/8] xfs_repair: wipe ondisk parent pointers when there are none Darrick J. Wong
                     ` (3 subsequent siblings)
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:09 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

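Now that the cross-check can tell which ondisk parent pointers are
missing, incorrect, or stale, add, remove, or replace those records
instead of only reporting them.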

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/libxfs_api_defs.h |    2 +
 repair/pptr.c            |   74 ++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 73 insertions(+), 3 deletions(-)
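
In this version of the format the attr name holds (parent inode,
generation, diroffset) and the attr value holds the dirent name.  That
is why compare_parent_pointers() removes the old record only when the
generation differs: a generation change alters the key, whereas a name
change only alters the value.  The sketch below illustrates that
distinction with a toy table.  It assumes that setting an existing key
overwrites its value, which is how the patch appears to use
libxfs_parent_set(); struct toy_pptr and the toy_*() helpers are
hypothetical names, not xfsprogs APIs.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_PPTRS	8
#define TOY_NAMELEN	32

/* Hypothetical pptr record: the first three fields form the key. */
struct toy_pptr {
	uint64_t	parent_ino;
	uint32_t	parent_gen;
	uint32_t	diroffset;
	char		name[TOY_NAMELEN];	/* the value */
};

struct toy_pptrs {
	struct toy_pptr	recs[TOY_MAX_PPTRS];
	unsigned int	nr;
};

/* Find a record by its full key; returns its index or -1. */
static int
toy_find(
	const struct toy_pptrs	*tp,
	const struct toy_pptr	*key)
{
	unsigned int		i;

	for (i = 0; i < tp->nr; i++)
		if (tp->recs[i].parent_ino == key->parent_ino &&
		    tp->recs[i].parent_gen == key->parent_gen &&
		    tp->recs[i].diroffset == key->diroffset)
			return (int)i;
	return -1;
}

/* Insert a record, or overwrite the value if the key already exists. */
static void
toy_set(
	struct toy_pptrs	*tp,
	const struct toy_pptr	*rec)
{
	int			idx = toy_find(tp, rec);

	if (idx < 0) {
		assert(tp->nr < TOY_MAX_PPTRS);
		idx = (int)tp->nr++;
	}
	tp->recs[idx] = *rec;
}

/* Remove the record with this key, if there is one. */
static void
toy_unset(
	struct toy_pptrs	*tp,
	const struct toy_pptr	*key)
{
	int			idx = toy_find(tp, key);

	if (idx < 0)
		return;
	tp->recs[idx] = tp->recs[tp->nr - 1];
	tp->nr--;
}

/* Replace @bad with @good; drop @bad first only if the key changed. */
static void
toy_update(
	struct toy_pptrs	*tp,
	const struct toy_pptr	*good,
	const struct toy_pptr	*bad)
{
	if (good->parent_gen != bad->parent_gen)
		toy_unset(tp, bad);
	toy_set(tp, good);
}
```

Without the toy_unset() call, a generation mismatch would leave both the
stale and the corrected record behind.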


diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index 92cdb6cc..ab8bdc1c 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -147,7 +147,9 @@
 #define xfs_parent_add			libxfs_parent_add
 #define xfs_parent_finish		libxfs_parent_finish
 #define xfs_parent_irec_from_disk	libxfs_parent_irec_from_disk
+#define xfs_parent_set			libxfs_parent_set
 #define xfs_parent_start		libxfs_parent_start
+#define xfs_parent_unset		libxfs_parent_unset
 #define xfs_perag_get			libxfs_perag_get
 #define xfs_perag_put			libxfs_perag_put
 #define xfs_prealloc_blocks		libxfs_prealloc_blocks
diff --git a/repair/pptr.c b/repair/pptr.c
index 695177ce..53ac1013 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -528,6 +528,42 @@ examine_xattr(
 	return 0;
 }
 
+/* Add an on disk parent pointer to a file. */
+static int
+add_file_pptr(
+	struct xfs_inode		*ip,
+	const struct ag_pptr		*ag_pptr,
+	const unsigned char		*name)
+{
+	struct xfs_parent_name_irec	pptr_rec = {
+		.p_ino			= ag_pptr->parent_ino,
+		.p_gen			= ag_pptr->parent_gen,
+		.p_diroffset		= ag_pptr->diroffset,
+		.p_namelen		= ag_pptr->namelen,
+	};
+	struct xfs_parent_scratch	scratch;
+
+	memcpy(pptr_rec.p_name, name, ag_pptr->namelen);
+
+	return -libxfs_parent_set(ip, &pptr_rec, &scratch);
+}
+
+/* Remove an on disk parent pointer from a file. */
+static int
+remove_file_pptr(
+	struct xfs_inode		*ip,
+	const struct file_pptr		*file_pptr)
+{
+	struct xfs_parent_name_irec	pptr_rec = {
+		.p_ino			= file_pptr->parent_ino,
+		.p_gen			= file_pptr->parent_gen,
+		.p_diroffset		= file_pptr->diroffset,
+	};
+	struct xfs_parent_scratch	scratch;
+
+	return -libxfs_parent_unset(ip, &pptr_rec, &scratch);
+}
+
 /* Remove all pptrs from @ip. */
 static void
 clear_all_pptrs(
@@ -582,7 +618,14 @@ add_missing_parent_ptr(
 			ag_pptr->parent_gen, ag_pptr->diroffset,
 			ag_pptr->namelen, name);
 
-	/* XXX actually do the work */
+	error = add_file_pptr(ip, ag_pptr, name);
+	if (error)
+		do_error(
+ _("adding ino %llu pptr (ino %llu gen 0x%x diroffset %u name '%.*s') failed: %s\n"),
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)ag_pptr->parent_ino,
+			ag_pptr->parent_gen, ag_pptr->diroffset,
+			ag_pptr->namelen, name, strerror(error));
 }
 
 /* Remove @file_pptr from @ip. */
@@ -623,7 +666,14 @@ remove_incorrect_parent_ptr(
 			file_pptr->parent_gen, file_pptr->diroffset,
 			file_pptr->namelen, name);
 
-	/* XXX actually do the work */
+	error = remove_file_pptr(ip, file_pptr);
+	if (error)
+		do_error(
+ _("removing ino %llu pptr (ino %llu gen 0x%x diroffset %u name '%.*s') failed: %s\n"),
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)file_pptr->parent_ino,
+			file_pptr->parent_gen, file_pptr->diroffset,
+			file_pptr->namelen, name, strerror(error));
 }
 
 /*
@@ -690,7 +740,25 @@ compare_parent_pointers(
 			ag_pptr->parent_gen, ag_pptr->diroffset,
 			ag_pptr->namelen, name1);
 
-	/* XXX do the work */
+	if (ag_pptr->parent_gen != file_pptr->parent_gen) {
+		error = remove_file_pptr(ip, file_pptr);
+		if (error)
+			do_error(
+ _("erasing ino %llu pptr (ino %llu gen 0x%x diroffset %u name '%.*s') failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)file_pptr->parent_ino,
+				file_pptr->parent_gen, file_pptr->diroffset,
+				file_pptr->namelen, name2, strerror(error));
+	}
+
+	error = add_file_pptr(ip, ag_pptr, name1);
+	if (error)
+		do_error(
+ _("updating ino %llu pptr (ino %llu gen 0x%x diroffset %u name '%.*s') failed: %s\n"),
+			(unsigned long long)ip->i_ino,
+			(unsigned long long)ag_pptr->parent_ino,
+			ag_pptr->parent_gen, ag_pptr->diroffset,
+			ag_pptr->namelen, name1, strerror(error));
 }
 
 /*



* [PATCH 5/8] xfs_repair: wipe ondisk parent pointers when there are none
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/8] xfs_repair: support parent pointers Darrick J. Wong
                     ` (3 preceding siblings ...)
  2023-02-16 21:09   ` [PATCH 4/8] xfs_repair: update ondisk parent pointer records Darrick J. Wong
@ 2023-02-16 21:09   ` Darrick J. Wong
  2023-02-16 21:09   ` [PATCH 6/8] xfs_repair: move the global dirent name store to a separate object Darrick J. Wong
                     ` (2 subsequent siblings)
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:09 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If the directory entry scan found no links to a file, erase all of the
parent pointers attached to that file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/pptr.c |   29 ++++++++++++++++++++++++++---
 1 file changed, 26 insertions(+), 3 deletions(-)


diff --git a/repair/pptr.c b/repair/pptr.c
index 53ac1013..b1f5fb4e 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -567,8 +567,13 @@ remove_file_pptr(
 /* Remove all pptrs from @ip. */
 static void
 clear_all_pptrs(
-	struct xfs_inode	*ip)
+	struct xfs_inode	*ip,
+	struct file_scan	*fscan)
 {
+	struct xfs_slab_cursor	*cur;
+	struct file_pptr	*file_pptr;
+	int			error;
+
 	if (no_modify) {
 		do_warn(_("would delete unlinked ino %llu parent pointers\n"),
 				(unsigned long long)ip->i_ino);
@@ -577,7 +582,25 @@ clear_all_pptrs(
 
 	do_warn(_("deleting unlinked ino %llu parent pointers\n"),
 			(unsigned long long)ip->i_ino);
-	/* XXX actually do the work */
+
+	error = -init_slab_cursor(fscan->file_pptr_recs, NULL, &cur);
+	if (error)
+		do_error(_("init ino %llu pptr cursor failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				strerror(error));
+
+	while ((file_pptr = pop_slab_cursor(cur)) != NULL) {
+		error = remove_file_pptr(ip, file_pptr);
+		if (error)
+			do_error(
+ _("wiping ino %llu pptr (ino %llu gen 0x%x diroffset %u) failed: %s\n"),
+				(unsigned long long)ip->i_ino,
+				(unsigned long long)file_pptr->parent_ino,
+				file_pptr->parent_gen, file_pptr->diroffset,
+				strerror(error));
+	}
+
+	free_slab_cursor(&cur);
 }
 
 /* Add @ag_pptr to @ip. */
@@ -790,7 +813,7 @@ crosscheck_file_parent_ptrs(
 		 * file.
 		 */
 		if (fscan->nr_file_pptrs > 0)
-			clear_all_pptrs(ip);
+			clear_all_pptrs(ip, fscan);
 
 		return;
 	}



* [PATCH 6/8] xfs_repair: move the global dirent name store to a separate object
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/8] xfs_repair: support parent pointers Darrick J. Wong
                     ` (4 preceding siblings ...)
  2023-02-16 21:09   ` [PATCH 5/8] xfs_repair: wipe ondisk parent pointers when there are none Darrick J. Wong
@ 2023-02-16 21:09   ` Darrick J. Wong
  2023-02-16 21:10   ` [PATCH 7/8] xfs_repair: deduplicate strings stored in string blob Darrick J. Wong
  2023-02-16 21:10   ` [PATCH 8/8] xfs_repair: try to reuse nameblob names for file pptr scan names Darrick J. Wong
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:09 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Abstract the main parent pointer dirent names xfblob object into a
separate data structure to hide implementation details.  This is the
first step towards deduplicating the names.
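
As a rough illustration of the abstraction described above (this is not
the patch itself), the sketch below hides a backing store behind an
opaque handle so that callers only deal in cookies.  The names
struct namestore, namestore_* and ns_cookie are hypothetical, and a
growable memory buffer stands in for the xfblob.  Errors are returned as
positive errno values, matching the strblobs convention.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for xfblob_cookie: an offset into the buffer. */
typedef size_t ns_cookie;

/* In a real split this definition would live only in the .c file. */
struct namestore {
	unsigned char	*buf;
	size_t		used;
	size_t		cap;
};

int
namestore_init(struct namestore **nsp)
{
	struct namestore	*ns = calloc(1, sizeof(*ns));

	if (!ns)
		return ENOMEM;
	*nsp = ns;
	return 0;
}

void
namestore_destroy(struct namestore **nsp)
{
	struct namestore	*ns = *nsp;

	if (!ns)
		return;
	free(ns->buf);
	free(ns);
	*nsp = NULL;
}

/* Append a string of explicit length and hand back a cookie for it. */
int
namestore_store(struct namestore *ns, ns_cookie *cookie,
		const unsigned char *str, unsigned int len)
{
	assert(ns != NULL);

	if (ns->used + len >= ns->cap) {
		size_t		newcap = (ns->used + len) * 2 + 16;
		unsigned char	*p = realloc(ns->buf, newcap);

		if (!p)
			return ENOMEM;
		ns->buf = p;
		ns->cap = newcap;
	}
	memcpy(ns->buf + ns->used, str, len);
	*cookie = ns->used;
	ns->used += len;
	return 0;
}

/* Copy a previously stored string back out. */
int
namestore_load(struct namestore *ns, ns_cookie cookie,
		unsigned char *str, unsigned int len)
{
	assert(ns != NULL);

	if (cookie > ns->used || len > ns->used - cookie)
		return ERANGE;
	memcpy(str, ns->buf + cookie, len);
	return 0;
}
```

Because callers never touch the structure's fields, the storage behind
the handle can later gain a lookup index without changing any call site.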

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/Makefile   |    2 +
 repair/pptr.c     |   13 +++++----
 repair/strblobs.c |   80 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 repair/strblobs.h |   20 +++++++++++++
 4 files changed, 109 insertions(+), 6 deletions(-)
 create mode 100644 repair/strblobs.c
 create mode 100644 repair/strblobs.h


diff --git a/repair/Makefile b/repair/Makefile
index 925864c2..48ddcdd1 100644
--- a/repair/Makefile
+++ b/repair/Makefile
@@ -33,6 +33,7 @@ HFILES = \
 	rt.h \
 	scan.h \
 	slab.h \
+	strblobs.h \
 	threads.h \
 	versions.h
 
@@ -71,6 +72,7 @@ CFILES = \
 	sb.c \
 	scan.c \
 	slab.c \
+	strblobs.c \
 	threads.c \
 	versions.c \
 	xfs_repair.c
diff --git a/repair/pptr.c b/repair/pptr.c
index b1f5fb4e..20d66884 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -14,6 +14,7 @@
 #include "repair/threads.h"
 #include "repair/incore.h"
 #include "repair/pptr.h"
+#include "repair/strblobs.h"
 
 #undef PPTR_DEBUG
 
@@ -194,7 +195,7 @@ struct garbage_xattr {
 };
 
 /* Global names storage file. */
-static struct xfblob	*names;
+static struct strblobs	*nameblobs;
 static pthread_mutex_t	names_mutex = PTHREAD_MUTEX_INITIALIZER;
 static struct ag_pptrs	*fs_pptrs;
 
@@ -261,7 +262,7 @@ parent_ptr_free(
 	free(fs_pptrs);
 	fs_pptrs = NULL;
 
-	xfblob_destroy(names);
+	strblobs_destroy(&nameblobs);
 }
 
 void
@@ -274,7 +275,7 @@ parent_ptr_init(
 	if (!xfs_has_parent(mp))
 		return;
 
-	error = -xfblob_create(mp, "parent pointer names", &names);
+	error = strblobs_init(mp, "parent pointer names", &nameblobs);
 	if (error)
 		do_error(_("init parent pointer names failed: %s\n"),
 				strerror(error));
@@ -325,7 +326,7 @@ add_parent_ptr(
 		return;
 
 	pthread_mutex_lock(&names_mutex);
-	error = -xfblob_store(names, &ag_pptr.name_cookie, fname,
+	error = strblobs_store(nameblobs, &ag_pptr.name_cookie, fname,
 			ag_pptr.namelen);
 	pthread_mutex_unlock(&names_mutex);
 	if (error)
@@ -613,7 +614,7 @@ add_missing_parent_ptr(
 	unsigned char		name[MAXNAMELEN];
 	int			error;
 
-	error = -xfblob_load(names, ag_pptr->name_cookie, name,
+	error = strblobs_load(nameblobs, ag_pptr->name_cookie, name,
 			ag_pptr->namelen);
 	if (error)
 		do_error(
@@ -714,7 +715,7 @@ compare_parent_pointers(
 	unsigned char		name2[MAXNAMELEN] = { };
 	int			error;
 
-	error = -xfblob_load(names, ag_pptr->name_cookie, name1,
+	error = strblobs_load(nameblobs, ag_pptr->name_cookie, name1,
 			ag_pptr->namelen);
 	if (error)
 		do_error(
diff --git a/repair/strblobs.c b/repair/strblobs.c
new file mode 100644
index 00000000..2b7a7a5e
--- /dev/null
+++ b/repair/strblobs.c
@@ -0,0 +1,80 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Copyright (C) 2023 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#include "libxfs.h"
+#include "libxfs/xfile.h"
+#include "libxfs/xfblob.h"
+#include "repair/strblobs.h"
+
+/*
+ * String Blob Structure
+ * =====================
+ *
+ * This data structure wraps the storage of strings with explicit length in an
+ * xfblob structure.
+ */
+struct strblobs {
+	struct xfblob		*strings;
+};
+
+/* Initialize a string blob structure. */
+int
+strblobs_init(
+	struct xfs_mount	*mp,
+	const char		*descr,
+	struct strblobs		**sblobs)
+{
+	struct strblobs		*sb;
+	int			error;
+
+	sb = malloc(sizeof(struct strblobs));
+	if (!sb)
+		return ENOMEM;
+
+	error = -xfblob_create(mp, descr, &sb->strings);
+	if (error)
+		goto out_free;
+
+	*sblobs = sb;
+	return 0;
+
+out_free:
+	free(sb);
+	return error;
+}
+
+/* Deconstruct a string blob structure. */
+void
+strblobs_destroy(
+	struct strblobs		**sblobs)
+{
+	struct strblobs		*sb = *sblobs;
+
+	xfblob_destroy(sb->strings);
+	free(sb);
+	*sblobs = NULL;
+}
+
+/* Store a string and return a cookie for its retrieval. */
+int
+strblobs_store(
+	struct strblobs		*sblobs,
+	xfblob_cookie		*str_cookie,
+	const unsigned char	*str,
+	unsigned int		str_len)
+{
+	return -xfblob_store(sblobs->strings, str_cookie, str, str_len);
+}
+
+/* Retrieve a previously stored string. */
+int
+strblobs_load(
+	struct strblobs		*sblobs,
+	xfblob_cookie		str_cookie,
+	unsigned char		*str,
+	unsigned int		str_len)
+{
+	return -xfblob_load(sblobs->strings, str_cookie, str, str_len);
+}
diff --git a/repair/strblobs.h b/repair/strblobs.h
new file mode 100644
index 00000000..f5680175
--- /dev/null
+++ b/repair/strblobs.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (C) 2023 Oracle.  All Rights Reserved.
+ * Author: Darrick J. Wong <djwong@kernel.org>
+ */
+#ifndef __REPAIR_STRBLOBS_H__
+#define __REPAIR_STRBLOBS_H__
+
+struct strblobs;
+
+int strblobs_init(struct xfs_mount *mp, const char *descr,
+		struct strblobs **sblobs);
+void strblobs_destroy(struct strblobs **sblobs);
+
+int strblobs_store(struct strblobs *sblobs, xfblob_cookie *str_cookie,
+		const unsigned char *str, unsigned int str_len);
+int strblobs_load(struct strblobs *sblobs, xfblob_cookie str_cookie,
+		unsigned char *str, unsigned int str_len);
+
+#endif /* __REPAIR_STRBLOBS_H__ */



* [PATCH 7/8] xfs_repair: deduplicate strings stored in string blob
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/8] xfs_repair: support parent pointers Darrick J. Wong
                     ` (5 preceding siblings ...)
  2023-02-16 21:09   ` [PATCH 6/8] xfs_repair: move the global dirent name store to a separate object Darrick J. Wong
@ 2023-02-16 21:10   ` Darrick J. Wong
  2023-02-16 21:10   ` [PATCH 8/8] xfs_repair: try to reuse nameblob names for file pptr scan names Darrick J. Wong
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:10 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Reduce the memory requirements of the string blob structure by
deduplicating the strings stored within.
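
For readers who want the idea in isolation, here is a minimal sketch of
hash-based string deduplication, assuming a fixed bucket count and an
in-memory pool.  struct strpool, strpool_store, strpool_lookup and the
FNV-1a hash are illustrative stand-ins; the patch itself uses
libxfs_da_hashname() and an xfblob, and sizes the table from the inode
count.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NR_BUCKETS	1021	/* hypothetical fixed table size */

struct hashent {
	struct hashent	*next;
	size_t		cookie;	/* offset of the string in the pool */
	unsigned int	len;
	uint32_t	hash;
};

struct strpool {
	unsigned char	*pool;
	size_t		used;
	size_t		cap;
	struct hashent	*buckets[NR_BUCKETS];
};

/* FNV-1a, standing in for libxfs_da_hashname(). */
static uint32_t
hash_bytes(const unsigned char *s, unsigned int len)
{
	uint32_t	h = 2166136261u;

	while (len--) {
		h ^= *s++;
		h *= 16777619u;
	}
	return h;
}

/* Return 0 and set *cookie if the string is already stored, else ENOENT. */
static int
strpool_lookup(struct strpool *sp, size_t *cookie, const unsigned char *s,
		unsigned int len, uint32_t hash)
{
	struct hashent	*ent;

	for (ent = sp->buckets[hash % NR_BUCKETS]; ent; ent = ent->next) {
		/* Cheap checks first; compare bytes only on a full match. */
		if (ent->len != len || ent->hash != hash)
			continue;
		if (memcmp(sp->pool + ent->cookie, s, len))
			continue;
		*cookie = ent->cookie;
		return 0;
	}
	return ENOENT;
}

/* Store a string once; storing it again returns the same cookie. */
int
strpool_store(struct strpool *sp, size_t *cookie, const unsigned char *s,
		unsigned int len)
{
	uint32_t	hash = hash_bytes(s, len);
	struct hashent	*ent;
	int		error;

	assert(sp != NULL && cookie != NULL);

	error = strpool_lookup(sp, cookie, s, len, hash);
	if (error != ENOENT)
		return error;

	if (sp->used + len >= sp->cap) {
		size_t		newcap = (sp->used + len) * 2 + 64;
		unsigned char	*p = realloc(sp->pool, newcap);

		if (!p)
			return ENOMEM;
		sp->pool = p;
		sp->cap = newcap;
	}
	ent = malloc(sizeof(*ent));
	if (!ent)
		return ENOMEM;

	memcpy(sp->pool + sp->used, s, len);
	*cookie = sp->used;
	sp->used += len;

	ent->cookie = *cookie;
	ent->len = len;
	ent->hash = hash;
	ent->next = sp->buckets[hash % NR_BUCKETS];
	sp->buckets[hash % NR_BUCKETS] = ent;
	return 0;
}
```

A zero-filled struct strpool is a valid empty pool in this sketch.
Teardown is omitted for brevity; a complete version would walk each
bucket chain and free the entries before freeing the pool.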

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/pptr.c     |    5 ++
 repair/strblobs.c |  141 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 repair/strblobs.h |    4 +-
 3 files changed, 145 insertions(+), 5 deletions(-)


diff --git a/repair/pptr.c b/repair/pptr.c
index 20d66884..c1cd9060 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -269,13 +269,16 @@ void
 parent_ptr_init(
 	struct xfs_mount	*mp)
 {
+	uint64_t		iused;
 	xfs_agnumber_t		agno;
 	int			error;
 
 	if (!xfs_has_parent(mp))
 		return;
 
-	error = strblobs_init(mp, "parent pointer names", &nameblobs);
+	/* One hash bucket per inode, up to about 8M of memory on 64-bit. */
+	iused = min(mp->m_sb.sb_icount - mp->m_sb.sb_ifree, 1048573);
+	error = strblobs_init(mp, "parent pointer names", iused, &nameblobs);
 	if (error)
 		do_error(_("init parent pointer names failed: %s\n"),
 				strerror(error));
diff --git a/repair/strblobs.c b/repair/strblobs.c
index 2b7a7a5e..fb5929e9 100644
--- a/repair/strblobs.c
+++ b/repair/strblobs.c
@@ -13,23 +13,43 @@
  * =====================
  *
  * This data structure wraps the storage of strings with explicit length in an
- * xfblob structure.
+ * xfblob structure.  It stores a hashtable of string checksums to provide
+ * fast(ish) lookups of existing strings to enable deduplication of the strings
+ * contained within.
  */
+struct strblob_hashent {
+	struct strblob_hashent	*next;
+
+	xfblob_cookie		str_cookie;
+	unsigned int		str_len;
+	xfs_dahash_t		str_hash;
+};
+
 struct strblobs {
 	struct xfblob		*strings;
+	unsigned int		nr_buckets;
+
+	struct strblob_hashent	*buckets[];
 };
 
+static inline size_t strblobs_sizeof(unsigned int nr_buckets)
+{
+	return sizeof(struct strblobs) +
+			(nr_buckets * sizeof(struct strblob_hashent *));
+}
+
 /* Initialize a string blob structure. */
 int
 strblobs_init(
 	struct xfs_mount	*mp,
 	const char		*descr,
+	unsigned int		hash_buckets,
 	struct strblobs		**sblobs)
 {
 	struct strblobs		*sb;
 	int			error;
 
-	sb = malloc(sizeof(struct strblobs));
+	sb = calloc(strblobs_sizeof(hash_buckets), 1);
 	if (!sb)
 		return ENOMEM;
 
@@ -37,6 +57,7 @@ strblobs_init(
 	if (error)
 		goto out_free;
 
+	sb->nr_buckets = hash_buckets;
 	*sblobs = sb;
 	return 0;
 
@@ -51,12 +72,114 @@ strblobs_destroy(
 	struct strblobs		**sblobs)
 {
 	struct strblobs		*sb = *sblobs;
+	struct strblob_hashent	*ent, *ent_next;
+	unsigned int		bucket;
+
+	for (bucket = 0; bucket < sb->nr_buckets; bucket++) {
+		ent = sb->buckets[bucket];
+		while (ent != NULL) {
+			ent_next = ent->next;
+			free(ent);
+			ent = ent_next;
+		}
+	}
 
 	xfblob_destroy(sb->strings);
 	free(sb);
 	*sblobs = NULL;
 }
 
+/*
+ * Search the string hashtable for a matching entry.  Sets the cookie and
+ * returns 0 if one is found; ENOENT if there is no match; or a positive errno.
+ */
+static int
+__strblobs_lookup(
+	struct strblobs		*sblobs,
+	xfblob_cookie		*str_cookie,
+	const unsigned char	*str,
+	unsigned int		str_len,
+	xfs_dahash_t		str_hash)
+{
+	struct strblob_hashent	*ent;
+	unsigned char		*buf = NULL;
+	unsigned int		bucket;
+	int			error;
+
+	bucket = str_hash % sblobs->nr_buckets;
+	ent = sblobs->buckets[bucket];
+
+	for (ent = sblobs->buckets[bucket]; ent != NULL; ent = ent->next) {
+		if (ent->str_len != str_len || ent->str_hash != str_hash)
+			continue;
+
+		if (!buf) {
+			buf = malloc(str_len);
+			if (!buf)
+				return ENOMEM;
+		}
+
+		error = strblobs_load(sblobs, ent->str_cookie, buf, str_len);
+		if (error)
+			goto out;
+
+		if (memcmp(str, buf, str_len))
+			continue;
+
+		*str_cookie = ent->str_cookie;
+		goto out;
+	}
+	error = ENOENT;
+
+out:
+	free(buf);
+	return error;
+}
+
+/*
+ * Search the string hashtable for a matching entry.  Sets the cookie and
+ * returns 0 if one is found; ENOENT if there is no match; or a positive errno.
+ */
+int
+strblobs_lookup(
+	struct strblobs		*sblobs,
+	xfblob_cookie		*str_cookie,
+	const unsigned char	*str,
+	unsigned int		str_len)
+{
+	xfs_dahash_t		str_hash;
+
+	str_hash = libxfs_da_hashname(str, str_len);
+	return __strblobs_lookup(sblobs, str_cookie, str, str_len, str_hash);
+}
+
+/* Remember a string in the hashtable. */
+static int
+strblobs_hash(
+	struct strblobs		*sblobs,
+	xfblob_cookie		str_cookie,
+	const unsigned char	*str,
+	unsigned int		str_len,
+	xfs_dahash_t		str_hash)
+{
+	struct strblob_hashent	*ent;
+	unsigned int		bucket;
+
+	bucket = str_hash % sblobs->nr_buckets;
+
+	ent = malloc(sizeof(struct strblob_hashent));
+	if (!ent)
+		return ENOMEM;
+
+	ent->str_cookie = str_cookie;
+	ent->str_len = str_len;
+	ent->str_hash = str_hash;
+	ent->next = sblobs->buckets[bucket];
+
+	sblobs->buckets[bucket] = ent;
+	return 0;
+}
+
 /* Store a string and return a cookie for its retrieval. */
 int
 strblobs_store(
@@ -65,7 +188,19 @@ strblobs_store(
 	const unsigned char	*str,
 	unsigned int		str_len)
 {
-	return -xfblob_store(sblobs->strings, str_cookie, str, str_len);
+	int			error;
+	xfs_dahash_t		str_hash;
+
+	str_hash = libxfs_da_hashname(str, str_len);
+	error = __strblobs_lookup(sblobs, str_cookie, str, str_len, str_hash);
+	if (error != ENOENT)
+		return error;
+
+	error = -xfblob_store(sblobs->strings, str_cookie, str, str_len);
+	if (error)
+		return error;
+
+	return strblobs_hash(sblobs, *str_cookie, str, str_len, str_hash);
 }
 
 /* Retrieve a previously stored string. */
diff --git a/repair/strblobs.h b/repair/strblobs.h
index f5680175..b8b059e2 100644
--- a/repair/strblobs.h
+++ b/repair/strblobs.h
@@ -9,12 +9,14 @@
 struct strblobs;
 
 int strblobs_init(struct xfs_mount *mp, const char *descr,
-		struct strblobs **sblobs);
+		unsigned int hash_buckets, struct strblobs **sblobs);
 void strblobs_destroy(struct strblobs **sblobs);
 
 int strblobs_store(struct strblobs *sblobs, xfblob_cookie *str_cookie,
 		const unsigned char *str, unsigned int str_len);
 int strblobs_load(struct strblobs *sblobs, xfblob_cookie str_cookie,
 		unsigned char *str, unsigned int str_len);
+int strblobs_lookup(struct strblobs *sblobs, xfblob_cookie *str_cookie,
+		const unsigned char *str, unsigned int str_len);
 
 #endif /* __REPAIR_STRBLOBS_H__ */



* [PATCH 8/8] xfs_repair: try to reuse nameblob names for file pptr scan names
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/8] xfs_repair: support parent pointers Darrick J. Wong
                     ` (6 preceding siblings ...)
  2023-02-16 21:10   ` [PATCH 7/8] xfs_repair: deduplicate strings stored in string blob Darrick J. Wong
@ 2023-02-16 21:10   ` Darrick J. Wong
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:10 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

When we're scanning a file's parent pointers, see if the name already
exists in the nameblobs structure to save memory.  If not, we'll
continue to use the file scan xfblob, because we don't want to pollute
the nameblob structure with names we didn't see in the directory walk.

Each of the parent pointer scanner threads can access the nameblob
structure locklessly since they don't modify the nameblob.
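
The lookup-then-fallback scheme described above can be sketched roughly
as follows.  This is a toy under stated assumptions: struct store,
struct pptr_rec, record_name and load_name are hypothetical names, the
stores are small fixed arrays rather than xfblobs, and a cookie is just
an array index.  It also relies on the compiler accepting
unsigned long long bit-fields, which GCC and Clang do.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAXNAMES	8
#define NAMELEN		16

/* Toy name store: a cookie is simply an index into names[]. */
struct store {
	char		names[MAXNAMES][NAMELEN];
	unsigned int	nr;
};

struct pptr_rec {
	/* Does the cookie refer to the global store or the scan store? */
	unsigned long long	in_global:1;
	/* The remaining 63 bits hold the inode number. */
	unsigned long long	parent_ino:63;
	unsigned int		cookie;
};

/* Read-only search; returns the index of the name or -1. */
static int
store_find(const struct store *st, const char *name)
{
	unsigned int	i;

	for (i = 0; i < st->nr; i++)
		if (!strcmp(st->names[i], name))
			return (int)i;
	return -1;
}

/*
 * Reuse the global copy of the name if there is one; otherwise stash the
 * name in the per-scan store.  The global store is never modified here.
 */
int
record_name(const struct store *global, struct store *scan,
		struct pptr_rec *rec, const char *name)
{
	int		idx = store_find(global, name);

	if (idx >= 0) {
		rec->in_global = 1;
		rec->cookie = idx;
		return 0;
	}

	if (strlen(name) >= NAMELEN)
		return ENAMETOOLONG;
	if (scan->nr == MAXNAMES)
		return ENOSPC;

	strcpy(scan->names[scan->nr], name);
	rec->in_global = 0;
	rec->cookie = scan->nr++;
	return 0;
}

/* Fetch the name from whichever store the record points into. */
const char *
load_name(const struct store *global, const struct store *scan,
		const struct pptr_rec *rec)
{
	const struct store	*st = rec->in_global ? global : scan;

	assert(rec->cookie < st->nr);
	return st->names[rec->cookie];
}
```

Since record_name() takes the global store as a const pointer, several
scanner threads could call it at once without locking, as long as each
has its own scan store and nothing else writes to the global one.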

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 repair/pptr.c |   62 +++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 55 insertions(+), 7 deletions(-)


diff --git a/repair/pptr.c b/repair/pptr.c
index c1cd9060..a5cf89b9 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -133,8 +133,11 @@ struct ag_pptr {
 };
 
 struct file_pptr {
+	/* Is the name stored in the global nameblobs structure? */
+	unsigned int		name_in_nameblobs:1;
+
 	/* parent directory handle */
-	xfs_ino_t		parent_ino;
+	unsigned long long	parent_ino:63;
 	unsigned int		parent_gen;
 
 	/* dirent offset */
@@ -467,6 +470,32 @@ record_garbage_xattr(
 				strerror(error));
 }
 
+/*
+ * Store this file parent pointer's name in the file scan namelist unless it's
+ * already in the global list.
+ */
+static int
+store_file_pptr_name(
+	struct file_scan			*fscan,
+	struct file_pptr			*file_pptr,
+	const struct xfs_parent_name_irec	*irec)
+{
+	int					error;
+
+	error = strblobs_lookup(nameblobs, &file_pptr->name_cookie,
+			irec->p_name, irec->p_namelen);
+	if (!error) {
+		file_pptr->name_in_nameblobs = true;
+		return 0;
+	}
+	if (error != ENOENT)
+		return error;
+
+	file_pptr->name_in_nameblobs = false;
+	return -xfblob_store(fscan->file_pptr_names, &file_pptr->name_cookie,
+			irec->p_name, irec->p_namelen);
+}
+
 /* Decide if this is a directory parent pointer and stash it if so. */
 static int
 examine_xattr(
@@ -505,8 +534,7 @@ examine_xattr(
 	file_pptr.diroffset = irec.p_diroffset;
 	file_pptr.namelen = irec.p_namelen;
 
-	error = -xfblob_store(fscan->file_pptr_names,
-			&file_pptr.name_cookie, irec.p_name, irec.p_namelen);
+	error = store_file_pptr_name(fscan, &file_pptr, &irec);
 	if (error)
 		do_error(
  _("storing ino %llu parent pointer '%.*s' failed: %s\n"),
@@ -568,6 +596,21 @@ remove_file_pptr(
 	return -libxfs_parent_unset(ip, &pptr_rec, &scratch);
 }
 
+/* Load a file parent pointer name from wherever we stored it. */
+static int
+load_file_pptr_name(
+	struct file_scan	*fscan,
+	const struct file_pptr	*file_pptr,
+	unsigned char		*name)
+{
+	if (file_pptr->name_in_nameblobs)
+		return strblobs_load(nameblobs, file_pptr->name_cookie,
+				name, file_pptr->namelen);
+
+	return -xfblob_load(fscan->file_pptr_names, file_pptr->name_cookie,
+			name, file_pptr->namelen);
+}
+
 /* Remove all pptrs from @ip. */
 static void
 clear_all_pptrs(
@@ -665,8 +708,7 @@ remove_incorrect_parent_ptr(
 	unsigned char		name[MAXNAMELEN] = { };
 	int			error;
 
-	error = -xfblob_load(fscan->file_pptr_names, file_pptr->name_cookie,
-			name, file_pptr->namelen);
+	error = load_file_pptr_name(fscan, file_pptr, name);
 	if (error)
 		do_error(
  _("loading incorrect name for ino %llu parent pointer (ino %llu gen 0x%x diroffset %u namecookie 0x%llx) failed: %s\n"),
@@ -729,8 +771,7 @@ compare_parent_pointers(
 				(unsigned long long)ag_pptr->name_cookie,
 				ag_pptr->namelen, strerror(error));
 
-	error = -xfblob_load(fscan->file_pptr_names, file_pptr->name_cookie,
-			name2, file_pptr->namelen);
+	error = load_file_pptr_name(fscan, file_pptr, name2);
 	if (error)
 		do_error(
  _("loading file-list name for ino %llu parent pointer (ino %llu gen 0x%x diroffset %u namecookie 0x%llx namelen %u) failed: %s\n"),
@@ -1051,6 +1092,13 @@ check_parent_ptrs(
 	struct workqueue	wq;
 	xfs_agnumber_t		agno;
 
+	/*
+	 * We only store the lower 63 bits of the inode number in struct
+	 * file_pptr to save space, so we must guarantee that we'll never
+	 * encounter an inumber with the top bit set.
+	 */
+	BUILD_BUG_ON((1ULL << 63) & XFS_MAXINUMBER);
+
 	if (!xfs_has_parent(mp))
 		return;
 



* [PATCH 1/6] libfrog: support the sha512 hash algorithm
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/6] xfsprogs: encode parent pointer name in xattr key Darrick J. Wong
@ 2023-02-16 21:10   ` Darrick J. Wong
  2023-02-16 21:10   ` [PATCH 2/6] xfs: replace parent pointer diroffset with sha512 hash of name Darrick J. Wong
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:10 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Add a generic implementation of the SHA-512 hash algorithm.
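
One detail of the finalization step that is easy to get wrong is the
padding: SHA-512 works on 128-byte blocks, appends a single 0x80 byte,
and reserves the last 16 bytes of the final block for the message
length.  The helper below is only a sketch of that arithmetic;
sha512_nr_blocks and SHA512_LENFIELD are hypothetical names that do not
appear in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SHA512_BLOCKSIZE	128
#define SHA512_LENFIELD		16	/* bytes reserved for the bit length */

/* How many 128-byte blocks get compressed for a message of msglen bytes? */
size_t
sha512_nr_blocks(size_t msglen)
{
	/* Bytes used in the last partial block, including the 0x80 marker. */
	size_t	tail = msglen % SHA512_BLOCKSIZE + 1;
	size_t	blocks = msglen / SHA512_BLOCKSIZE;

	assert(tail <= SHA512_BLOCKSIZE);

	/*
	 * If the marker spills into the length field, the tail is zero
	 * padded and compressed, and the length goes in one more block.
	 */
	blocks += (tail > SHA512_BLOCKSIZE - SHA512_LENFIELD) ? 2 : 1;
	return blocks;
}
```

For example, a 111-byte message still fits in one block, while a
112-byte message needs two.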

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 include/libxfs.h         |    1 
 io/crc32cselftest.c      |   22 ++++
 libfrog/Makefile         |   10 +-
 libfrog/sha512.c         |  249 ++++++++++++++++++++++++++++++++++++++++++++++
 libfrog/sha512.h         |   33 ++++++
 libfrog/sha512selftest.h |   86 ++++++++++++++++
 man/man8/xfs_io.8        |    4 +
 mkfs/xfs_mkfs.c          |    8 +
 repair/init.c            |    5 +
 9 files changed, 416 insertions(+), 2 deletions(-)
 create mode 100644 libfrog/sha512.c
 create mode 100644 libfrog/sha512.h
 create mode 100644 libfrog/sha512selftest.h


diff --git a/include/libxfs.h b/include/libxfs.h
index a38d78a1..23756f27 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -38,6 +38,7 @@ extern uint32_t crc32c_le(uint32_t crc, unsigned char const *p, size_t len);
 /* fake up kernel's iomap, (not) used in xfs_bmap.[ch] */
 struct iomap;
 #include "xfs_cksum.h"
+#include "libfrog/sha512.h"
 
 #define __round_mask(x, y) ((__typeof__(x))((y)-1))
 #define round_up(x, y) ((((x)-1) | __round_mask(x, y))+1)
diff --git a/io/crc32cselftest.c b/io/crc32cselftest.c
index 49eb5b6d..ebef6fe0 100644
--- a/io/crc32cselftest.c
+++ b/io/crc32cselftest.c
@@ -10,6 +10,8 @@
 #include "io.h"
 #include "libfrog/crc32c.h"
 #include "libfrog/crc32cselftest.h"
+#include "libfrog/sha512.h"
+#include "libfrog/sha512selftest.h"
 
 static int
 crc32cselftest_f(
@@ -30,8 +32,28 @@ static const cmdinfo_t	crc32cselftest_cmd = {
 	.oneline	= N_("self test of crc32c implementation"),
 };
 
+static int
+sha512selftest_f(
+	int		argc,
+	char		**argv)
+{
+	return sha512_test(0) != 0;
+}
+
+static const cmdinfo_t	sha512selftest_cmd = {
+	.name		= "sha512selftest",
+	.cfunc		= sha512selftest_f,
+	.argmin		= 0,
+	.argmax		= 0,
+	.canpush	= 0,
+	.flags		= CMD_FLAG_ONESHOT | CMD_FLAG_FOREIGN_OK |
+			  CMD_NOFILE_OK | CMD_NOMAP_OK,
+	.oneline	= N_("self test of sha512 implementation"),
+};
+
 void
 crc32cselftest_init(void)
 {
 	add_command(&crc32cselftest_cmd);
+	add_command(&sha512selftest_cmd);
 }
diff --git a/libfrog/Makefile b/libfrog/Makefile
index 5622ab9b..752fb9c7 100644
--- a/libfrog/Makefile
+++ b/libfrog/Makefile
@@ -28,6 +28,7 @@ projects.c \
 ptvar.c \
 radix-tree.c \
 scrub.c \
+sha512.c \
 util.c \
 workqueue.c
 
@@ -48,6 +49,7 @@ projects.h \
 ptvar.h \
 radix-tree.h \
 scrub.h \
+sha512.h \
 workqueue.h
 
 LSRCFILES += gen_crc32table.c
@@ -56,9 +58,9 @@ ifeq ($(HAVE_GETMNTENT),yes)
 LCFLAGS += -DHAVE_GETMNTENT
 endif
 
-LDIRT = gen_crc32table crc32table.h crc32selftest
+LDIRT = gen_crc32table crc32table.h crc32selftest sha512selftest
 
-default: crc32selftest ltdepend $(LTLIBRARY)
+default: crc32selftest sha512selftest ltdepend $(LTLIBRARY)
 
 crc32table.h: gen_crc32table.c crc32defs.h
 	@echo "    [CC]     gen_crc32table"
@@ -75,6 +77,10 @@ crc32selftest: gen_crc32table.c crc32table.h crc32.c crc32defs.h
 	@echo "    [TEST]    CRC32"
 	$(Q) $(BUILD_CC) $(BUILD_CFLAGS) -D CRC32_SELFTEST=1 crc32.c -o $@
 	$(Q) ./$@
+sha512selftest: sha512.h sha512selftest.h sha512.c
+	@echo "    [TEST]    SHA512"
+	$(Q) $(BUILD_CC) $(BUILD_CFLAGS) -D SHA512_SELFTEST=1 sha512.c -o $@
+	$(Q) ./$@
 
 include $(BUILDRULES)
 
diff --git a/libfrog/sha512.c b/libfrog/sha512.c
new file mode 100644
index 00000000..741e03be
--- /dev/null
+++ b/libfrog/sha512.c
@@ -0,0 +1,249 @@
+/*
+ * sha512.c --- The sha512 algorithm
+ *
+ * Copyright (C) 2004 Sam Hocevar <sam@hocevar.net>
+ * (copied from libtomcrypt and then relicensed under GPLv2)
+ * (and later copied from e2fsprogs)
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+#include <string.h>
+#include "xfs.h"
+#include "libfrog/sha512.h"
+
+/* the K array */
+#define CONST64(n) n
+static const __u64 K[80] = {
+	CONST64(0x428a2f98d728ae22), CONST64(0x7137449123ef65cd),
+	CONST64(0xb5c0fbcfec4d3b2f), CONST64(0xe9b5dba58189dbbc),
+	CONST64(0x3956c25bf348b538), CONST64(0x59f111f1b605d019),
+	CONST64(0x923f82a4af194f9b), CONST64(0xab1c5ed5da6d8118),
+	CONST64(0xd807aa98a3030242), CONST64(0x12835b0145706fbe),
+	CONST64(0x243185be4ee4b28c), CONST64(0x550c7dc3d5ffb4e2),
+	CONST64(0x72be5d74f27b896f), CONST64(0x80deb1fe3b1696b1),
+	CONST64(0x9bdc06a725c71235), CONST64(0xc19bf174cf692694),
+	CONST64(0xe49b69c19ef14ad2), CONST64(0xefbe4786384f25e3),
+	CONST64(0x0fc19dc68b8cd5b5), CONST64(0x240ca1cc77ac9c65),
+	CONST64(0x2de92c6f592b0275), CONST64(0x4a7484aa6ea6e483),
+	CONST64(0x5cb0a9dcbd41fbd4), CONST64(0x76f988da831153b5),
+	CONST64(0x983e5152ee66dfab), CONST64(0xa831c66d2db43210),
+	CONST64(0xb00327c898fb213f), CONST64(0xbf597fc7beef0ee4),
+	CONST64(0xc6e00bf33da88fc2), CONST64(0xd5a79147930aa725),
+	CONST64(0x06ca6351e003826f), CONST64(0x142929670a0e6e70),
+	CONST64(0x27b70a8546d22ffc), CONST64(0x2e1b21385c26c926),
+	CONST64(0x4d2c6dfc5ac42aed), CONST64(0x53380d139d95b3df),
+	CONST64(0x650a73548baf63de), CONST64(0x766a0abb3c77b2a8),
+	CONST64(0x81c2c92e47edaee6), CONST64(0x92722c851482353b),
+	CONST64(0xa2bfe8a14cf10364), CONST64(0xa81a664bbc423001),
+	CONST64(0xc24b8b70d0f89791), CONST64(0xc76c51a30654be30),
+	CONST64(0xd192e819d6ef5218), CONST64(0xd69906245565a910),
+	CONST64(0xf40e35855771202a), CONST64(0x106aa07032bbd1b8),
+	CONST64(0x19a4c116b8d2d0c8), CONST64(0x1e376c085141ab53),
+	CONST64(0x2748774cdf8eeb99), CONST64(0x34b0bcb5e19b48a8),
+	CONST64(0x391c0cb3c5c95a63), CONST64(0x4ed8aa4ae3418acb),
+	CONST64(0x5b9cca4f7763e373), CONST64(0x682e6ff3d6b2b8a3),
+	CONST64(0x748f82ee5defb2fc), CONST64(0x78a5636f43172f60),
+	CONST64(0x84c87814a1f0ab72), CONST64(0x8cc702081a6439ec),
+	CONST64(0x90befffa23631e28), CONST64(0xa4506cebde82bde9),
+	CONST64(0xbef9a3f7b2c67915), CONST64(0xc67178f2e372532b),
+	CONST64(0xca273eceea26619c), CONST64(0xd186b8c721c0c207),
+	CONST64(0xeada7dd6cde0eb1e), CONST64(0xf57d4f7fee6ed178),
+	CONST64(0x06f067aa72176fba), CONST64(0x0a637dc5a2c898a6),
+	CONST64(0x113f9804bef90dae), CONST64(0x1b710b35131c471b),
+	CONST64(0x28db77f523047d84), CONST64(0x32caab7b40c72493),
+	CONST64(0x3c9ebe0a15c9bebc), CONST64(0x431d67c49c100d4c),
+	CONST64(0x4cc5d4becb3e42b6), CONST64(0x597f299cfc657e2a),
+	CONST64(0x5fcb6fab3ad6faec), CONST64(0x6c44198c4a475817)
+};
+#define Ch(x,y,z)       (z ^ (x & (y ^ z)))
+#define Maj(x,y,z)      (((x | y) & z) | (x & y))
+#define S(x, n)         ROR64c(x, n)
+#define R(x, n)         (((x)&CONST64(0xFFFFFFFFFFFFFFFF))>>((__u64)n))
+#define Sigma0(x)       (S(x, 28) ^ S(x, 34) ^ S(x, 39))
+#define Sigma1(x)       (S(x, 14) ^ S(x, 18) ^ S(x, 41))
+#define Gamma0(x)       (S(x, 1) ^ S(x, 8) ^ R(x, 7))
+#define Gamma1(x)       (S(x, 19) ^ S(x, 61) ^ R(x, 6))
+#define RND(a,b,c,d,e,f,g,h,i)\
+		t0 = h + Sigma1(e) + Ch(e, f, g) + K[i] + W[i];\
+		t1 = Sigma0(a) + Maj(a, b, c);\
+		d += t0;\
+		h  = t0 + t1;
+#define STORE64H(x, y) \
+	do { \
+		(y)[0] = (unsigned char)(((x)>>56)&255);\
+		(y)[1] = (unsigned char)(((x)>>48)&255);\
+		(y)[2] = (unsigned char)(((x)>>40)&255);\
+		(y)[3] = (unsigned char)(((x)>>32)&255);\
+		(y)[4] = (unsigned char)(((x)>>24)&255);\
+		(y)[5] = (unsigned char)(((x)>>16)&255);\
+		(y)[6] = (unsigned char)(((x)>>8)&255);\
+		(y)[7] = (unsigned char)((x)&255); } while(0)
+
+#define LOAD64H(x, y)\
+	do {x = \
+		(((__u64)((y)[0] & 255)) << 56) |\
+		(((__u64)((y)[1] & 255)) << 48) |\
+		(((__u64)((y)[2] & 255)) << 40) |\
+		(((__u64)((y)[3] & 255)) << 32) |\
+		(((__u64)((y)[4] & 255)) << 24) |\
+		(((__u64)((y)[5] & 255)) << 16) |\
+		(((__u64)((y)[6] & 255)) << 8) |\
+		(((__u64)((y)[7] & 255)));\
+	} while(0)
+
+#define ROR64c(x, y) \
+    ( ((((x)&CONST64(0xFFFFFFFFFFFFFFFF))>>((__u64)(y)&CONST64(63))) | \
+      ((x)<<((__u64)(64-((y)&CONST64(63)))))) & CONST64(0xFFFFFFFFFFFFFFFF))
+
+static void sha512_compress(struct sha512_state * md, const unsigned char *buf)
+{
+	__u64 S[8], W[80], t0, t1;
+	int i;
+
+	/* copy state into S */
+	for (i = 0; i < 8; i++) {
+		S[i] = md->state[i];
+	}
+
+	/* load the 1024-bit input block into W[0..15] as big-endian words */
+	for (i = 0; i < 16; i++) {
+		LOAD64H(W[i], buf + (8*i));
+	}
+
+	/* fill W[16..79] */
+	for (i = 16; i < 80; i++) {
+		W[i] = Gamma1(W[i - 2]) + W[i - 7] +
+			Gamma0(W[i - 15]) + W[i - 16];
+	}
+
+	for (i = 0; i < 80; i += 8) {
+		RND(S[0],S[1],S[2],S[3],S[4],S[5],S[6],S[7],i+0);
+		RND(S[7],S[0],S[1],S[2],S[3],S[4],S[5],S[6],i+1);
+		RND(S[6],S[7],S[0],S[1],S[2],S[3],S[4],S[5],i+2);
+		RND(S[5],S[6],S[7],S[0],S[1],S[2],S[3],S[4],i+3);
+		RND(S[4],S[5],S[6],S[7],S[0],S[1],S[2],S[3],i+4);
+		RND(S[3],S[4],S[5],S[6],S[7],S[0],S[1],S[2],i+5);
+		RND(S[2],S[3],S[4],S[5],S[6],S[7],S[0],S[1],i+6);
+		RND(S[1],S[2],S[3],S[4],S[5],S[6],S[7],S[0],i+7);
+	}
+
+	 /* feedback */
+	for (i = 0; i < 8; i++) {
+		md->state[i] = md->state[i] + S[i];
+	}
+}
+
+int sha512_init(struct sha512_state * md)
+{
+	md->curlen = 0;
+	md->length = 0;
+	md->state[0] = CONST64(0x6a09e667f3bcc908);
+	md->state[1] = CONST64(0xbb67ae8584caa73b);
+	md->state[2] = CONST64(0x3c6ef372fe94f82b);
+	md->state[3] = CONST64(0xa54ff53a5f1d36f1);
+	md->state[4] = CONST64(0x510e527fade682d1);
+	md->state[5] = CONST64(0x9b05688c2b3e6c1f);
+	md->state[6] = CONST64(0x1f83d9abfb41bd6b);
+	md->state[7] = CONST64(0x5be0cd19137e2179);
+	return 0;
+}
+
+int sha512_done(struct sha512_state * md, unsigned char *out)
+{
+	int i;
+
+	/* increase the length of the message */
+	md->length += md->curlen * CONST64(8);
+
+	/* append the '1' bit */
+	md->buf[md->curlen++] = (unsigned char)0x80;
+
+	/* if the length is currently above 112 bytes we append zeros then
+	 * compress. Then we can fall back to padding zeros and length encoding
+	 * like normal. */
+	if (md->curlen > 112) {
+		while (md->curlen < 128) {
+			md->buf[md->curlen++] = (unsigned char)0;
+		}
+		sha512_compress(md, md->buf);
+		md->curlen = 0;
+	}
+
+	/* pad with zeroes up to byte 120; bytes 112 to 120 are the 64 MSBs of
+	 * the length, since we assume that you won't hash > 2^64 bits of data. */
+	while (md->curlen < 120) {
+		md->buf[md->curlen++] = (unsigned char)0;
+	}
+
+	/* store length */
+	STORE64H(md->length, md->buf + 120);
+	sha512_compress(md, md->buf);
+
+	/* copy output */
+	for (i = 0; i < 8; i++) {
+		STORE64H(md->state[i], out+(8 * i));
+	}
+
+	return 0;
+}
+
+#define SHA512_BLOCKSIZE 128
+int sha512_process(struct sha512_state * md,
+		   const unsigned char *in,
+		   unsigned long inlen)
+{
+	unsigned long n;
+
+	while (inlen > 0) {
+		if (md->curlen == 0 && inlen >= SHA512_BLOCKSIZE) {
+			sha512_compress(md, in);
+			md->length += SHA512_BLOCKSIZE * 8;
+			in += SHA512_BLOCKSIZE;
+			inlen -= SHA512_BLOCKSIZE;
+		} else {
+			n = MIN(inlen, (SHA512_BLOCKSIZE - md->curlen));
+			memcpy(md->buf + md->curlen,
+			       in, (size_t)n);
+			md->curlen += n;
+			in += n;
+			inlen -= n;
+			if (md->curlen == SHA512_BLOCKSIZE) {
+				sha512_compress(md, md->buf);
+				md->length += SHA512_BLOCKSIZE * 8;
+				md->curlen = 0;
+			}
+		}
+	}
+
+	return 0;
+}
+
+void sha512(const unsigned char *in, unsigned long in_size, unsigned char *out)
+{
+	struct sha512_state md;
+
+	sha512_init(&md);
+	sha512_process(&md, in, in_size);
+	sha512_done(&md, out);
+}
+
+#ifdef SHA512_SELFTEST
+# include "sha512selftest.h"
+
+/*
+ * make sure we always return 0 for a successful test run, and non-zero for a
+ * failed run. The build infrastructure is looking for this information to
+ * determine whether to allow the build to proceed.
+ */
+int main(int argc, char **argv)
+{
+	int errors;
+
+	errors = sha512_test(0);
+
+	return errors != 0;
+}
+#endif /* SHA512_SELFTEST */
diff --git a/libfrog/sha512.h b/libfrog/sha512.h
new file mode 100644
index 00000000..28ff5284
--- /dev/null
+++ b/libfrog/sha512.h
@@ -0,0 +1,33 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/*
+ * Copyright (c) 2023 Oracle, Inc.
+ * All Rights Reserved.
+ */
+#ifndef __LIBFROG_SHA512_H__
+#define __LIBFROG_SHA512_H__
+
+struct sha512_state {
+	__u64		length;
+	__u64		state[8];
+	unsigned long	curlen;
+	unsigned char	buf[128];
+};
+
+#define SHA512_DESC_ON_STACK(mp, name) \
+	struct sha512_state name
+
+#define SHA512_DIGEST_SIZE	64
+
+void sha512(const unsigned char *in, unsigned long in_size, unsigned char *out);
+
+int sha512_init(struct sha512_state *md);
+int sha512_done(struct sha512_state *md, unsigned char *out);
+int sha512_process(struct sha512_state *md, const unsigned char *in,
+		unsigned long inlen);
+
+static inline void sha512_erase(struct sha512_state *md)
+{
+	memset(md, 0, sizeof(*md));
+}
+
+#endif	/* __LIBFROG_SHA512_H__ */
diff --git a/libfrog/sha512selftest.h b/libfrog/sha512selftest.h
new file mode 100644
index 00000000..9f4edeb0
--- /dev/null
+++ b/libfrog/sha512selftest.h
@@ -0,0 +1,86 @@
+/*
+ * sha512selftest.h --- The sha512 algorithm self tests
+ *
+ * Copyright (C) 2004 Sam Hocevar <sam@hocevar.net>
+ * (copied from libtomcrypt and then relicensed under GPLv2)
+ * (and later copied from e2fsprogs)
+ *
+ * %Begin-Header%
+ * This file may be redistributed under the terms of the GNU Library
+ * General Public License, version 2.
+ * %End-Header%
+ */
+#ifndef __LIBFROG_SHA512SELFTEST_H__
+#define __LIBFROG_SHA512SELFTEST_H__
+
+static const struct {
+	char *msg;
+	unsigned char hash[64];
+} sha512_tests[] = {
+	{ "",
+	  { 0xcf, 0x83, 0xe1, 0x35, 0x7e, 0xef, 0xb8, 0xbd,
+	    0xf1, 0x54, 0x28, 0x50, 0xd6, 0x6d, 0x80, 0x07,
+	    0xd6, 0x20, 0xe4, 0x05, 0x0b, 0x57, 0x15, 0xdc,
+	    0x83, 0xf4, 0xa9, 0x21, 0xd3, 0x6c, 0xe9, 0xce,
+	    0x47, 0xd0, 0xd1, 0x3c, 0x5d, 0x85, 0xf2, 0xb0,
+	    0xff, 0x83, 0x18, 0xd2, 0x87, 0x7e, 0xec, 0x2f,
+	    0x63, 0xb9, 0x31, 0xbd, 0x47, 0x41, 0x7a, 0x81,
+	    0xa5, 0x38, 0x32, 0x7a, 0xf9, 0x27, 0xda, 0x3e }
+	},
+	{ "abc",
+	  { 0xdd, 0xaf, 0x35, 0xa1, 0x93, 0x61, 0x7a, 0xba,
+	    0xcc, 0x41, 0x73, 0x49, 0xae, 0x20, 0x41, 0x31,
+	    0x12, 0xe6, 0xfa, 0x4e, 0x89, 0xa9, 0x7e, 0xa2,
+	    0x0a, 0x9e, 0xee, 0xe6, 0x4b, 0x55, 0xd3, 0x9a,
+	    0x21, 0x92, 0x99, 0x2a, 0x27, 0x4f, 0xc1, 0xa8,
+	    0x36, 0xba, 0x3c, 0x23, 0xa3, 0xfe, 0xeb, 0xbd,
+	    0x45, 0x4d, 0x44, 0x23, 0x64, 0x3c, 0xe8, 0x0e,
+	    0x2a, 0x9a, 0xc9, 0x4f, 0xa5, 0x4c, 0xa4, 0x9f }
+	},
+	{ "abcdefghbcdefghicdefghijdefghijkefghijklfghijklmghijklmnhijklmnoijklmnopjklmnopqklmnopqrlmnopqrsmnopqrstnopqrstu",
+	  { 0x8e, 0x95, 0x9b, 0x75, 0xda, 0xe3, 0x13, 0xda,
+	    0x8c, 0xf4, 0xf7, 0x28, 0x14, 0xfc, 0x14, 0x3f,
+	    0x8f, 0x77, 0x79, 0xc6, 0xeb, 0x9f, 0x7f, 0xa1,
+	    0x72, 0x99, 0xae, 0xad, 0xb6, 0x88, 0x90, 0x18,
+	    0x50, 0x1d, 0x28, 0x9e, 0x49, 0x00, 0xf7, 0xe4,
+	    0x33, 0x1b, 0x99, 0xde, 0xc4, 0xb5, 0x43, 0x3a,
+	    0xc7, 0xd3, 0x29, 0xee, 0xb6, 0xdd, 0x26, 0x54,
+	    0x5e, 0x96, 0xe5, 0x5b, 0x87, 0x4b, 0xe9, 0x09 }
+	},
+	{ "The quick brown fox jumps over the lazy dog.\n",
+	  { 0x02, 0x0d, 0xa0, 0xf4, 0xd8, 0xa4, 0xc8, 0xbf,
+	    0xbc, 0x98, 0x27, 0x40, 0x27, 0x74, 0x00, 0x61,
+	    0xd7, 0xdf, 0x52, 0xee, 0x07, 0x09, 0x1e, 0xd6,
+	    0x59, 0x5a, 0x08, 0x3e, 0x0f, 0x45, 0x32, 0x7b,
+	    0xbe, 0x59, 0x42, 0x43, 0x12, 0xd8, 0x6f, 0x21,
+	    0x8b, 0x74, 0xed, 0x2e, 0x25, 0x50, 0x7a, 0xba,
+	    0xf5, 0xc7, 0xa5, 0xfc, 0xf4, 0xca, 0xfc, 0xf9,
+	    0x53, 0x8b, 0x70, 0x58, 0x08, 0xfd, 0x55, 0xec }
+	},
+};
+
+/* Don't print anything to stdout. */
+#define SHA512TEST_QUIET	(1U << 0)
+
+static int sha512_test(unsigned int flags)
+{
+	int i;
+	int errors = 0;
+	unsigned char tmp[64];
+
+	for (i = 0; i < (int)(sizeof(sha512_tests) / sizeof(sha512_tests[0])); i++) {
+		unsigned char *msg = (unsigned char *) sha512_tests[i].msg;
+		int len = strlen(sha512_tests[i].msg);
+
+		sha512(msg, len, tmp);
+		if (memcmp(tmp, sha512_tests[i].hash, 64) != 0)
+			errors++;
+	}
+
+	if (!(flags & SHA512TEST_QUIET) && errors)
+		printf("sha512: %d self tests failed\n", errors);
+
+	return errors;
+}
+
+#endif /* __LIBFROG_SHA512SELFTEST_H__ */
diff --git a/man/man8/xfs_io.8 b/man/man8/xfs_io.8
index ef7087b3..dc10c8f6 100644
--- a/man/man8/xfs_io.8
+++ b/man/man8/xfs_io.8
@@ -1506,6 +1506,10 @@ command.
 .B crc32cselftest
 Test the internal crc32c implementation to make sure that it computes results
 correctly.
+.TP
+.B sha512selftest
+Test the internal sha512 implementation to make sure that it computes results
+correctly.
 .SH SEE ALSO
 .BR mkfs.xfs (8),
 .BR xfsctl (3),
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index dffee9e2..d3f34ef8 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -11,6 +11,7 @@
 #include "libfrog/fsgeom.h"
 #include "libfrog/convert.h"
 #include "libfrog/crc32cselftest.h"
+#include "libfrog/sha512selftest.h"
 #include "proto.h"
 #include <ini.h>
 
@@ -4287,6 +4288,13 @@ main(
 		return 1;
 	}
 
+	/* Make sure our checksum algorithm really works. */
+	if (sha512_test(SHA512TEST_QUIET) != 0) {
+		fprintf(stderr,
+ _("sha512 self-test failed, will not create a filesystem here.\n"));
+		return 1;
+	}
+
 	/*
 	 * All values have been validated, discard the old device layout.
 	 */
diff --git a/repair/init.c b/repair/init.c
index 3a320b4f..46b4dbef 100644
--- a/repair/init.c
+++ b/repair/init.c
@@ -15,6 +15,7 @@
 #include "incore.h"
 #include "prefetch.h"
 #include "libfrog/crc32cselftest.h"
+#include "libfrog/sha512selftest.h"
 #include <sys/resource.h>
 
 static void
@@ -105,4 +106,8 @@ _("Unmount or use the dangerous (-d) option to repair a read-only mounted filesy
 	if (crc32c_test(CRC32CTEST_QUIET) != 0)
 		do_error(
  _("crc32c self-test failed, will not examine filesystem.\n"));
+
+	if (sha512_test(SHA512TEST_QUIET) != 0)
+		do_error(
+ _("sha512 self-test failed, will not examine filesystem.\n"));
 }
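
Reviewer note on the buffering in sha512_process(): the visible tail of that
function accumulates short input in md->buf and only calls the compress step
once a full 128-byte block is available. The following is a minimal standalone
sketch of that pattern, not the libfrog code. The names blockbuf, compress()
and feed() are made up for illustration, compress() merely counts blocks
instead of hashing, and the whole-block fast path is an assumption based on
the usual libtomcrypt layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLOCKSIZE	128	/* SHA-512 block size in bytes */

/* Hypothetical stand-in for struct sha512_state: buffering fields only. */
struct blockbuf {
	unsigned long long	length;		/* bits compressed so far */
	unsigned long		curlen;		/* bytes waiting in buf */
	unsigned long		nblocks;	/* blocks handed to compress() */
	unsigned char		buf[BLOCKSIZE];
};

/* Placeholder for the real compression function: just count the block. */
static void compress(struct blockbuf *bb, const unsigned char *block)
{
	(void)block;
	bb->nblocks++;
}

/* Consume inlen bytes, compressing every time a full block is available. */
static void feed(struct blockbuf *bb, const unsigned char *in,
		 unsigned long inlen)
{
	while (inlen > 0) {
		if (bb->curlen == 0 && inlen >= BLOCKSIZE) {
			/* A whole block is ready in the input: skip the copy. */
			compress(bb, in);
			bb->length += BLOCKSIZE * 8;
			in += BLOCKSIZE;
			inlen -= BLOCKSIZE;
			continue;
		}

		/* Top up the partial block with as much as fits. */
		unsigned long n = BLOCKSIZE - bb->curlen;

		if (n > inlen)
			n = inlen;
		memcpy(bb->buf + bb->curlen, in, n);
		bb->curlen += n;
		in += n;
		inlen -= n;
		if (bb->curlen == BLOCKSIZE) {
			compress(bb, bb->buf);
			bb->length += BLOCKSIZE * 8;
			bb->curlen = 0;
		}
	}
	/* Invariant on exit: the buffer never holds a complete block. */
	assert(bb->curlen < BLOCKSIZE);
}
```

Feeding 100 bytes and then 200 bytes leaves two compressed blocks and 44
buffered bytes, which is the state a finalization step would then pad.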



* [PATCH 2/6] xfs: replace parent pointer diroffset with sha512 hash of name
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/6] xfsprogs: encode parent pointer name in xattr key Darrick J. Wong
  2023-02-16 21:10   ` [PATCH 1/6] libfrog: support the sha512 hash algorithm Darrick J. Wong
@ 2023-02-16 21:10   ` Darrick J. Wong
  2023-02-16 21:11   ` [PATCH 3/6] xfs_logprint: decode parent pointers fully Darrick J. Wong
                     ` (3 subsequent siblings)
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:10 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Replace the diroffset with the sha512 hash of the dirent name, thereby
eliminating the need for directory repair to update all the parent
pointers after rebuilding the directory.
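
To illustrate why this helps repair, here is a rough standalone sketch, not
code from this patch. A key that embeds the dirent offset changes whenever a
rebuild moves the entry, whereas a key derived from the child generation and
the name does not. The struct and function names are hypothetical, and a
64-bit FNV-1a stands in for sha512 purely to keep the example short; the
patch itself uses the full 64-byte sha512 digest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash (64-bit FNV-1a). This is NOT the hash the patch uses. */
static uint64_t toy_namehash(uint32_t child_gen, const char *name)
{
	const unsigned char	*p;
	uint64_t		h = 0xcbf29ce484222325ULL;
	int			i;

	assert(name != NULL);

	/* Mix in the child generation, most significant byte first... */
	for (i = 3; i >= 0; i--) {
		h ^= (child_gen >> (8 * i)) & 0xff;
		h *= 0x100000001b3ULL;
	}
	/* ...followed by the dirent name bytes. */
	for (p = (const unsigned char *)name; *p; p++) {
		h ^= *p;
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Old-style key: depends on where the dirent sits in the directory. */
struct old_key {
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint32_t	p_diroffset;
};

/* New-style key: depends only on the parent and the hashed name. */
struct new_key {
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint64_t	p_namehash;
};

static struct new_key make_new_key(uint64_t dir_ino, uint32_t dir_gen,
				   uint32_t child_gen, const char *name)
{
	struct new_key	k = {
		.p_ino		= dir_ino,
		.p_gen		= dir_gen,
		.p_namehash	= toy_namehash(child_gen, name),
	};

	return k;
}

static int old_key_equal(const struct old_key *a, const struct old_key *b)
{
	return a->p_ino == b->p_ino && a->p_gen == b->p_gen &&
	       a->p_diroffset == b->p_diroffset;
}

static int new_key_equal(const struct new_key *a, const struct new_key *b)
{
	return a->p_ino == b->p_ino && a->p_gen == b->p_gen &&
	       a->p_namehash == b->p_namehash;
}
```

With this model, moving a dirent from one offset to another yields a
different old_key, so the xattr would have to be rewritten, while
make_new_key() returns the same key before and after the rebuild.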

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/attr.c                |   25 +++++-
 db/attrshort.c           |   19 ++++-
 db/field.c               |    2 
 db/field.h               |    1 
 db/fprint.c              |   31 +++++++
 db/fprint.h              |    2 
 db/metadump.c            |   39 ++++-----
 libxfs/libxfs_api_defs.h |    2 
 libxfs/libxfs_priv.h     |    1 
 libxfs/xfs_da_format.h   |   15 ++--
 libxfs/xfs_fs.h          |    4 -
 libxfs/xfs_parent.c      |  124 +++++++++++++++++++++++-------
 libxfs/xfs_parent.h      |   21 +++--
 logprint/log_redo.c      |   15 +---
 man/man3/xfsctl.3        |    1 
 mkfs/proto.c             |    7 +-
 repair/phase6.c          |   13 +--
 repair/pptr.c            |  193 +++++++++++++++++++++++++++++++---------------
 repair/pptr.h            |    2 
 19 files changed, 358 insertions(+), 159 deletions(-)
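
A note on the key size: xfs_parent_namecheck() rejects any attr name whose
length differs from sizeof(struct xfs_parent_name_rec), so the size of that
struct is effectively part of the format. With a 64-byte hash the packed
record is 8 + 4 + 64 = 76 bytes. The sketch below is a standalone
illustration rather than code from this patch. The demo_* names are made up,
plain fixed-width integers replace the __be64/__be32 types, and it assumes a
compiler that accepts __attribute__((packed)), such as gcc or clang.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_NAME_HASH_SIZE	64	/* mirrors XFS_PARENT_NAME_HASH_SIZE */

/* Hypothetical mirror of the on-disk parent pointer key. */
struct demo_pptr_key {
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint8_t		p_namehash[DEMO_NAME_HASH_SIZE];
} __attribute__((packed));

/* Same fields without the attribute; the ABI may append tail padding. */
struct demo_pptr_key_unpacked {
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint8_t		p_namehash[DEMO_NAME_HASH_SIZE];
};

/* The packed layout has no holes: 8 + 4 + 64 bytes. */
static_assert(sizeof(struct demo_pptr_key) == 76,
	      "packed key must be exactly 76 bytes");
static_assert(offsetof(struct demo_pptr_key, p_namehash) == 12,
	      "hash must directly follow inode number and generation");

static size_t demo_pptr_key_size(void)
{
	return sizeof(struct demo_pptr_key);
}

static size_t demo_pptr_key_unpacked_size(void)
{
	return sizeof(struct demo_pptr_key_unpacked);
}
```

On ABIs that align 64-bit integers to 8 bytes the unpacked variant would
typically be rounded up past 76 bytes, which is presumably why the patch adds
the packed attribute now that the record is no longer a multiple of 8 bytes.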


diff --git a/db/attr.c b/db/attr.c
index 8ea7b36e..bacdc6d9 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -20,6 +20,7 @@ static int	attr_leaf_hdr_count(void *obj, int startoff);
 static int	attr_leaf_name_local_count(void *obj, int startoff);
 static int	attr_leaf_name_local_name_count(void *obj, int startoff);
 static int	attr_leaf_name_pptr_count(void *obj, int startoff);
+static int	attr_leaf_name_pptr_namehashlen(void *obj, int startoff);
 static int	attr_leaf_name_local_value_count(void *obj, int startoff);
 static int	attr_leaf_name_local_value_offset(void *obj, int startoff,
 						  int idx);
@@ -125,8 +126,8 @@ const field_t	attr_leaf_name_flds[] = {
 	  attr_leaf_name_pptr_count, FLD_COUNT, TYP_INODE },
 	{ "parent_gen", FLDT_UINT32D, OI(PPOFF(p_gen)),
 	  attr_leaf_name_pptr_count, FLD_COUNT, TYP_NONE },
-	{ "parent_diroffset", FLDT_UINT32D, OI(PPOFF(p_diroffset)),
-	  attr_leaf_name_pptr_count, FLD_COUNT, TYP_NONE },
+	{ "parent_namehash", FLDT_HEXSTRING, OI(PPOFF(p_namehash)),
+	  attr_leaf_name_pptr_namehashlen, FLD_COUNT, TYP_NONE },
 	{ "value", FLDT_CHARNS, attr_leaf_name_local_value_offset,
 	  attr_leaf_name_local_value_count, FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ "valueblk", FLDT_UINT32X, OI(LVOFF(valueblk)),
@@ -302,6 +303,26 @@ attr_leaf_name_pptr_count(
 			__attr_leaf_name_pptr_count);
 }
 
+static int
+__attr_leaf_name_pptr_namehashlen(
+	struct xfs_attr_leafblock	*leaf,
+	struct xfs_attr_leaf_entry      *e,
+	int				i)
+{
+	if (e->flags & XFS_ATTR_PARENT)
+		return XFS_PARENT_NAME_HASH_SIZE;
+	return 0;
+}
+
+static int
+attr_leaf_name_pptr_namehashlen(
+	void				*obj,
+	int				startoff)
+{
+	return attr_leaf_entry_walk(obj, startoff,
+			__attr_leaf_name_pptr_namehashlen);
+}
+
 static int
 __attr_leaf_name_local_name_count(
 	struct xfs_attr_leafblock	*leaf,
diff --git a/db/attrshort.c b/db/attrshort.c
index 7c8ac485..be15f4ee 100644
--- a/db/attrshort.c
+++ b/db/attrshort.c
@@ -14,6 +14,7 @@
 
 static int	attr_sf_entry_name_count(void *obj, int startoff);
 static int	attr_sf_entry_pptr_count(void *obj, int startoff);
+static int	attr_sf_entry_pptr_namehashlen(void *obj, int startoff);
 static int	attr_sf_entry_value_count(void *obj, int startoff);
 static int	attr_sf_entry_value_offset(void *obj, int startoff, int idx);
 static int	attr_shortform_list_count(void *obj, int startoff);
@@ -56,8 +57,8 @@ const field_t	attr_sf_entry_flds[] = {
 	  FLD_COUNT, TYP_INODE },
 	{ "parent_gen", FLDT_UINT32D, OI(PPOFF(p_gen)), attr_sf_entry_pptr_count,
 	  FLD_COUNT, TYP_NONE },
-	{ "parent_diroffset", FLDT_UINT32D, OI(PPOFF(p_diroffset)),
-	   attr_sf_entry_pptr_count, FLD_COUNT, TYP_NONE },
+	{ "parent_namehash", FLDT_HEXSTRING, OI(PPOFF(p_namehash)),
+	   attr_sf_entry_pptr_namehashlen, FLD_COUNT, TYP_NONE },
 	{ "value", FLDT_CHARNS, attr_sf_entry_value_offset,
 	  attr_sf_entry_value_count, FLD_COUNT|FLD_OFFSET, TYP_NONE },
 	{ NULL }
@@ -77,6 +78,20 @@ attr_sf_entry_pptr_count(
 	return 0;
 }
 
+static int
+attr_sf_entry_pptr_namehashlen(
+	void				*obj,
+	int				startoff)
+{
+	struct xfs_attr_sf_entry	*e;
+
+	ASSERT(bitoffs(startoff) == 0);
+	e = (struct xfs_attr_sf_entry *)((char *)obj + byteize(startoff));
+	if (e->flags & XFS_ATTR_PARENT)
+		return XFS_PARENT_NAME_HASH_SIZE;
+	return 0;
+}
+
 static int
 attr_sf_entry_name_count(
 	void				*obj,
diff --git a/db/field.c b/db/field.c
index a3e47ee8..afadfdb4 100644
--- a/db/field.c
+++ b/db/field.c
@@ -144,6 +144,8 @@ const ftattr_t	ftattrtab[] = {
 	{ FLDT_CHARNS, "charns", fp_charns, NULL, SI(bitsz(char)), 0, NULL,
 	  NULL },
 	{ FLDT_CHARS, "chars", fp_num, "%c", SI(bitsz(char)), 0, NULL, NULL },
+	{ FLDT_HEXSTRING, "hexstring", fp_hexstring, NULL, SI(bitsz(char)), 0, NULL,
+	  NULL },
 	{ FLDT_REXTLEN, "rextlen", fp_num, "%u", SI(RMAPBT_BLOCKCOUNT_BITLEN),
 	  0, NULL, NULL },
 	{ FLDT_RFILEOFFD, "rfileoffd", fp_num, "%llu", SI(RMAPBT_OFFSET_BITLEN),
diff --git a/db/field.h b/db/field.h
index 634742a5..d756e04a 100644
--- a/db/field.h
+++ b/db/field.h
@@ -64,6 +64,7 @@ typedef enum fldt	{
 	FLDT_CFSBLOCK,
 	FLDT_CHARNS,
 	FLDT_CHARS,
+	FLDT_HEXSTRING,
 	FLDT_REXTLEN,
 	FLDT_RFILEOFFD,
 	FLDT_REXTFLG,
diff --git a/db/fprint.c b/db/fprint.c
index 65accfda..c4462fb6 100644
--- a/db/fprint.c
+++ b/db/fprint.c
@@ -54,6 +54,37 @@ fp_charns(
 	return 1;
 }
 
+int
+fp_hexstring(
+	void	*obj,
+	int	bit,
+	int	count,
+	char	*fmtstr,
+	int	size,
+	int	arg,
+	int	base,
+	int	array)
+{
+	int	i;
+	char	*p;
+
+	ASSERT(bitoffs(bit) == 0);
+	ASSERT(size == bitsz(char));
+	dbprintf("\"");
+	for (i = 0, p = (char *)obj + byteize(bit);
+	     i < count && !seenint();
+	     i++, p++) {
+		unsigned char c = *p;
+
+		if (isalnum(c))
+			dbprintf("%c", c);
+		else
+			dbprintf("\\x%02x", c);
+	}
+	dbprintf("\"");
+	return 1;
+}
+
 int
 fp_num(
 	void		*obj,
diff --git a/db/fprint.h b/db/fprint.h
index a1ea935c..c07240c6 100644
--- a/db/fprint.h
+++ b/db/fprint.h
@@ -9,6 +9,8 @@ typedef int (*prfnc_t)(void *obj, int bit, int count, char *fmtstr, int size,
 
 extern int	fp_charns(void *obj, int bit, int count, char *fmtstr, int size,
 			  int arg, int base, int array);
+extern int	fp_hexstring(void *obj, int bit, int count, char *fmtstr,
+			  int size, int arg, int base, int array);
 extern int	fp_num(void *obj, int bit, int count, char *fmtstr, int size,
 		       int arg, int base, int array);
 extern int	fp_sarray(void *obj, int bit, int count, char *fmtstr, int size,
diff --git a/db/metadump.c b/db/metadump.c
index 4be23993..e56acdcc 100644
--- a/db/metadump.c
+++ b/db/metadump.c
@@ -740,14 +740,12 @@ nametable_add(xfs_dahash_t hash, int namelen, unsigned char *name)
 #define rol32(x,y)		(((x) << (y)) | ((x) >> (32 - (y))))
 
 static inline unsigned char
-random_filename_char(xfs_ino_t	ino)
+random_filename_char(void)
 {
 	static unsigned char filename_alphabet[] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
 						"abcdefghijklmnopqrstuvwxyz"
 						"0123456789-_";
 
-	if (ino)
-		return filename_alphabet[ino % (sizeof filename_alphabet - 1)];
 	return filename_alphabet[random() % (sizeof filename_alphabet - 1)];
 }
 
@@ -817,7 +815,6 @@ in_lost_found(
  */
 static void
 obfuscate_name(
-	xfs_ino_t	ino,
 	xfs_dahash_t	hash,
 	size_t		name_len,
 	unsigned char	*name)
@@ -845,7 +842,7 @@ obfuscate_name(
 	 * Accumulate its new hash value as we go.
 	 */
 	for (i = 0; i < name_len - 5; i++) {
-		*newp = random_filename_char(ino);
+		*newp = random_filename_char();
 		new_hash = *newp ^ rol32(new_hash, 7);
 		newp++;
 	}
@@ -1210,10 +1207,14 @@ generate_obfuscated_name(
 	/* Obfuscate the name (if possible) */
 
 	hash = libxfs_da_hashname(name, namelen);
-	if (xfs_has_parent(mp))
-		obfuscate_name(ino, hash, namelen, name);
+	if (xfs_has_parent(mp) && ino)
+		/*
+		 * XXX: no good way to obfuscate dirent names now that we
+		 * hash them into the pptr key
+		 * obfuscate_name(ino, hash, namelen, name)
+		 */ ;
 	else
-		obfuscate_name(0, hash, namelen, name);
+		obfuscate_name(hash, namelen, name);
 
 	/*
 	 * Make sure the name is not something already seen.  If we
@@ -1326,7 +1327,7 @@ obfuscate_path_components(
 			/* last (or single) component */
 			namelen = strnlen((char *)comp, len);
 			hash = libxfs_da_hashname(comp, namelen);
-			obfuscate_name(0, hash, namelen, comp);
+			obfuscate_name(hash, namelen, comp);
 			break;
 		}
 		namelen = slash - (char *)comp;
@@ -1337,7 +1338,7 @@ obfuscate_path_components(
 			continue;
 		}
 		hash = libxfs_da_hashname(comp, namelen);
-		obfuscate_name(0, hash, namelen, comp);
+		obfuscate_name(hash, namelen, comp);
 		comp += namelen + 1;
 		len -= namelen + 1;
 	}
@@ -1412,16 +1413,11 @@ process_sf_attr(
 			break;
 		}
 
-		if (obfuscate) {
-			if (asfep->flags & XFS_ATTR_PARENT) {
-				generate_obfuscated_name(cur_ino, asfep->valuelen,
-					 &asfep->nameval[asfep->namelen]);
-			} else {
-				generate_obfuscated_name(0, asfep->namelen,
-							 &asfep->nameval[0]);
-				memset(&asfep->nameval[asfep->namelen], 'v',
-				       asfep->valuelen);
-			}
+		if (obfuscate && !(asfep->flags & XFS_ATTR_PARENT)) {
+			generate_obfuscated_name(0, asfep->namelen,
+					&asfep->nameval[0]);
+			memset(&asfep->nameval[asfep->namelen], 'v',
+					asfep->valuelen);
 		}
 
 		asfep = (struct xfs_attr_sf_entry *)((char *)asfep +
@@ -1808,9 +1804,6 @@ process_attr_block(
 			zlen = xfs_attr_leaf_entsize_local(nlen, vlen) -
 				(sizeof(xfs_attr_leaf_name_local_t) - 1 +
 				 nlen + vlen);
-			if (obfuscate && (entry->flags & XFS_ATTR_PARENT))
-				generate_obfuscated_name(cur_ino, vlen,
-						&local->nameval[nlen]);
 			if (zero_stale_data)
 				memset(&local->nameval[nlen + vlen], 0, zlen);
 		} else {
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index ab8bdc1c..a28e604d 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -147,6 +147,8 @@
 #define xfs_parent_add			libxfs_parent_add
 #define xfs_parent_finish		libxfs_parent_finish
 #define xfs_parent_irec_from_disk	libxfs_parent_irec_from_disk
+#define xfs_parent_irec_hash		libxfs_parent_irec_hash
+#define xfs_parent_namehash		libxfs_parent_namehash
 #define xfs_parent_set			libxfs_parent_set
 #define xfs_parent_start		libxfs_parent_start
 #define xfs_parent_unset		libxfs_parent_unset
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index ad21a25d..d5a9fec2 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -56,6 +56,7 @@
 
 #include "xfs_fs.h"
 #include "libfrog/crc32c.h"
+#include "libfrog/sha512.h"
 
 #include <sys/xattr.h>
 
diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index c07b8166..386f63b2 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -824,17 +824,22 @@ static inline unsigned int xfs_dir2_dirblock_bytes(struct xfs_sb *sbp)
 xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
 				      struct xfs_da3_blkinfo *hdr3);
 
+/* We use sha512 for the parent pointer name hash. */
+#define XFS_PARENT_NAME_HASH_SIZE	(64)
+
 /*
  * Parent pointer attribute format definition
  *
- * EA name encodes the parent inode number, generation and the offset of
- * the dirent that points to the child inode. The EA value contains the
- * same name as the dirent in the parent directory.
+ * The EA name encodes the parent inode number, generation and a collision
+ * resistant hash computed from the dirent name.  The hash is defined to be the
+ * sha512 of the child inode generation and the dirent name.
+ *
+ * The EA value contains the same name as the dirent in the parent directory.
  */
 struct xfs_parent_name_rec {
 	__be64  p_ino;
 	__be32  p_gen;
-	__be32  p_diroffset;
-};
+	__u8	p_namehash[XFS_PARENT_NAME_HASH_SIZE];
+} __attribute__((packed));
 
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 9e59a1fd..c65345d2 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -770,8 +770,8 @@ struct xfs_scrub_metadata {
 struct xfs_parent_ptr {
 	__u64		xpp_ino;			/* Inode */
 	__u32		xpp_gen;			/* Inode generation */
-	__u32		xpp_diroffset;			/* Directory offset */
-	__u64		xpp_rsvd;			/* Reserved */
+	__u32		xpp_rsvd;			/* Reserved */
+	__u64		xpp_rsvd2;			/* Reserved */
 	__u8		xpp_name[XFS_PPTR_MAXNAMELEN];	/* File name */
 };
 
diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index a7c5974c..05c1e032 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -55,7 +55,6 @@ xfs_parent_namecheck(
 	unsigned int				attr_flags)
 {
 	xfs_ino_t				p_ino;
-	xfs_dir2_dataptr_t			p_diroffset;
 
 	if (reclen != sizeof(struct xfs_parent_name_rec))
 		return false;
@@ -68,10 +67,6 @@ xfs_parent_namecheck(
 	if (!xfs_verify_ino(mp, p_ino))
 		return false;
 
-	p_diroffset = be32_to_cpu(rec->p_diroffset);
-	if (p_diroffset > XFS_DIR2_MAX_DATAPTR)
-		return false;
-
 	return true;
 }
 
@@ -92,18 +87,17 @@ xfs_parent_valuecheck(
 }
 
 /* Initializes a xfs_parent_name_rec to be stored as an attribute name */
-static inline void
+static inline int
 xfs_init_parent_name_rec(
 	struct xfs_parent_name_rec	*rec,
-	const struct xfs_inode		*ip,
-	uint32_t			p_diroffset)
+	const struct xfs_inode		*dp,
+	const struct xfs_name		*name,
+	struct xfs_inode		*ip)
 {
-	xfs_ino_t			p_ino = ip->i_ino;
-	uint32_t			p_gen = VFS_IC(ip)->i_generation;
-
-	rec->p_ino = cpu_to_be64(p_ino);
-	rec->p_gen = cpu_to_be32(p_gen);
-	rec->p_diroffset = cpu_to_be32(p_diroffset);
+	rec->p_ino = cpu_to_be64(dp->i_ino);
+	rec->p_gen = cpu_to_be32(VFS_IC(dp)->i_generation);
+	return xfs_parent_namehash(ip, name, rec->p_namehash,
+			sizeof(rec->p_namehash));
 }
 
 /*
@@ -119,7 +113,7 @@ xfs_parent_irec_from_disk(
 {
 	irec->p_ino = be64_to_cpu(rec->p_ino);
 	irec->p_gen = be32_to_cpu(rec->p_gen);
-	irec->p_diroffset = be32_to_cpu(rec->p_diroffset);
+	memcpy(irec->p_namehash, rec->p_namehash, sizeof(irec->p_namehash));
 
 	if (!value) {
 		irec->p_namelen = 0;
@@ -149,7 +143,7 @@ xfs_parent_irec_to_disk(
 {
 	rec->p_ino = cpu_to_be64(irec->p_ino);
 	rec->p_gen = cpu_to_be32(irec->p_gen);
-	rec->p_diroffset = cpu_to_be32(irec->p_diroffset);
+	memcpy(rec->p_namehash, irec->p_namehash, sizeof(rec->p_namehash));
 
 	if (valuelen) {
 		ASSERT(*valuelen > 0);
@@ -209,12 +203,15 @@ xfs_parent_add(
 	struct xfs_parent_defer	*parent,
 	struct xfs_inode	*dp,
 	const struct xfs_name	*parent_name,
-	xfs_dir2_dataptr_t	diroffset,
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &parent->args;
+	int			error;
+
+	error = xfs_init_parent_name_rec(&parent->rec, dp, parent_name, child);
+	if (error)
+		return error;
 
-	xfs_init_parent_name_rec(&parent->rec, dp, diroffset);
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 
 	args->trans = tp;
@@ -231,14 +228,18 @@ xfs_parent_add(
 int
 xfs_parent_remove(
 	struct xfs_trans	*tp,
-	struct xfs_inode	*dp,
 	struct xfs_parent_defer	*parent,
-	xfs_dir2_dataptr_t	diroffset,
+	struct xfs_inode	*dp,
+	const struct xfs_name	*name,
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &parent->args;
+	int			error;
+
+	error = xfs_init_parent_name_rec(&parent->rec, dp, name, child);
+	if (error)
+		return error;
 
-	xfs_init_parent_name_rec(&parent->rec, dp, diroffset);
 	args->trans = tp;
 	args->dp = child;
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
@@ -251,16 +252,23 @@ xfs_parent_replace(
 	struct xfs_trans	*tp,
 	struct xfs_parent_defer	*new_parent,
 	struct xfs_inode	*old_dp,
-	xfs_dir2_dataptr_t	old_diroffset,
-	const struct xfs_name	*parent_name,
+	const struct xfs_name	*old_name,
 	struct xfs_inode	*new_dp,
-	xfs_dir2_dataptr_t	new_diroffset,
+	const struct xfs_name	*new_name,
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &new_parent->args;
+	int			error;
+
+	error = xfs_init_parent_name_rec(&new_parent->old_rec, old_dp,
+			old_name, child);
+	if (error)
+		return error;
+	error = xfs_init_parent_name_rec(&new_parent->rec, new_dp, new_name,
+			child);
+	if (error)
+		return error;
 
-	xfs_init_parent_name_rec(&new_parent->old_rec, old_dp, old_diroffset);
-	xfs_init_parent_name_rec(&new_parent->rec, new_dp, new_diroffset);
 	new_parent->args.name = (const uint8_t *)&new_parent->old_rec;
 	new_parent->args.namelen = sizeof(struct xfs_parent_name_rec);
 	new_parent->args.new_name = (const uint8_t *)&new_parent->rec;
@@ -268,9 +276,8 @@ xfs_parent_replace(
 	args->trans = tp;
 	args->dp = child;
 
-	ASSERT(parent_name != NULL);
-	new_parent->args.value = (void *)parent_name->name;
-	new_parent->args.valuelen = parent_name->len;
+	new_parent->args.value = (void *)new_name->name;
+	new_parent->args.valuelen = new_name->len;
 
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 	return xfs_attr_defer_replace(args);
@@ -389,3 +396,62 @@ xfs_parent_unset(
 
 	return xfs_attr_set(&scr->args);
 }
+
+/*
+ * Compute the parent pointer namehash for the given child file and dirent
+ * name.
+ */
+int
+xfs_parent_namehash(
+	struct xfs_inode	*ip,
+	const struct xfs_name	*name,
+	void			*namehash,
+	unsigned int		namehash_len)
+{
+	SHA512_DESC_ON_STACK(ip->i_mount, shash);
+	__be32			gen = cpu_to_be32(VFS_I(ip)->i_generation);
+	int			error;
+
+	ASSERT(SHA512_DIGEST_SIZE ==
+			crypto_shash_digestsize(ip->i_mount->m_sha512));
+
+	if (namehash_len != SHA512_DIGEST_SIZE) {
+		ASSERT(0);
+		return -EINVAL;
+	}
+
+	error = sha512_init(&shash);
+	if (error)
+		goto out;
+
+	error = sha512_process(&shash, (const u8 *)&gen, sizeof(gen));
+	if (error)
+		goto out;
+
+	error = sha512_process(&shash, name->name, name->len);
+	if (error)
+		goto out;
+
+	error = sha512_done(&shash, namehash);
+	if (error)
+		goto out;
+
+out:
+	sha512_erase(&shash);
+	return error;
+}
+
+/* Recalculate the name hash of this parent pointer. */
+int
+xfs_parent_irec_hash(
+	struct xfs_inode		*ip,
+	struct xfs_parent_name_irec	*pptr)
+{
+	struct xfs_name			xname = {
+		.name			= pptr->p_name,
+		.len			= pptr->p_namelen,
+	};
+
+	return xfs_parent_namehash(ip, &xname, &pptr->p_namehash,
+			sizeof(pptr->p_namehash));
+}
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index a7fc621b..d3f2841e 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -23,7 +23,7 @@ struct xfs_parent_name_irec {
 	/* Key fields for looking up a particular parent pointer. */
 	xfs_ino_t		p_ino;
 	uint32_t		p_gen;
-	xfs_dir2_dataptr_t	p_diroffset;
+	uint8_t			p_namehash[XFS_PARENT_NAME_HASH_SIZE];
 
 	/* Attributes of a parent pointer. */
 	uint8_t			p_namelen;
@@ -79,15 +79,14 @@ xfs_parent_start_locked(
 
 int xfs_parent_add(struct xfs_trans *tp, struct xfs_parent_defer *parent,
 		struct xfs_inode *dp, const struct xfs_name *parent_name,
-		xfs_dir2_dataptr_t diroffset, struct xfs_inode *child);
+		struct xfs_inode *child);
 int xfs_parent_replace(struct xfs_trans *tp,
 		struct xfs_parent_defer *new_parent, struct xfs_inode *old_dp,
-		xfs_dir2_dataptr_t old_diroffset,
-		const struct xfs_name *parent_name, struct xfs_inode *new_ip,
-		xfs_dir2_dataptr_t new_diroffset, struct xfs_inode *child);
-int xfs_parent_remove(struct xfs_trans *tp, struct xfs_inode *dp,
-		struct xfs_parent_defer *parent, xfs_dir2_dataptr_t diroffset,
-		struct xfs_inode *child);
+		const struct xfs_name *old_name, struct xfs_inode *new_ip,
+		const struct xfs_name *new_name, struct xfs_inode *child);
+int xfs_parent_remove(struct xfs_trans *tp,
+		struct xfs_parent_defer *parent, struct xfs_inode *dp,
+		const struct xfs_name *name, struct xfs_inode *child);
 
 void __xfs_parent_cancel(struct xfs_mount *mp, struct xfs_parent_defer *parent);
 
@@ -100,6 +99,12 @@ xfs_parent_finish(
 		__xfs_parent_cancel(mp, p);
 }
 
+int xfs_parent_namehash(struct xfs_inode *ip, const struct xfs_name *name,
+		void *namehash, unsigned int namehash_len);
+
+int xfs_parent_irec_hash(struct xfs_inode *ip,
+		struct xfs_parent_name_irec *pptr);
+
 unsigned int xfs_pptr_calc_space_res(struct xfs_mount *mp,
 				     unsigned int namelen);
 
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index f7e9c9ad..1ac0536a 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -805,9 +805,8 @@ xlog_print_trans_attri_name(
 	}
 	memmove((char*)src_f, *ptr, src_len);
 
-	printf(_("ATTRI:  #p_ino: %llu	p_gen: %u, p_diroffset: %u\n"),
-		be64_to_cpu(src_f->p_ino), be32_to_cpu(src_f->p_gen),
-				be32_to_cpu(src_f->p_diroffset));
+	printf(_("ATTRI:  #p_ino: %llu	p_gen: %u\n"),
+		be64_to_cpu(src_f->p_ino), be32_to_cpu(src_f->p_gen));
 
 	free(src_f);
 out:
@@ -898,9 +897,8 @@ xlog_recover_print_attri(
 				goto out;
 			}
 
-			printf(_("ATTRI:  #inode: %llu     gen: %u, offset: %u\n"),
-				be64_to_cpu(rec->p_ino), be32_to_cpu(rec->p_gen),
-				be32_to_cpu(rec->p_diroffset));
+			printf(_("ATTRI:  #inode: %llu     gen: %u\n"),
+				be64_to_cpu(rec->p_ino), be32_to_cpu(rec->p_gen));
 
 			free(rec);
 		}
@@ -929,9 +927,8 @@ xlog_recover_print_attri(
 				goto out;
 			}
 
-			printf(_("ATTRI:  new #inode: %llu     gen: %u, offset: %u\n"),
-				be64_to_cpu(rec->p_ino), be32_to_cpu(rec->p_gen),
-				be32_to_cpu(rec->p_diroffset));
+			printf(_("ATTRI:  new #inode: %llu     gen: %u\n"),
+				be64_to_cpu(rec->p_ino), be32_to_cpu(rec->p_gen));
 
 			free(rec);
 		}
diff --git a/man/man3/xfsctl.3 b/man/man3/xfsctl.3
index 7cc97499..42ba3bba 100644
--- a/man/man3/xfsctl.3
+++ b/man/man3/xfsctl.3
@@ -367,7 +367,6 @@ int main() {
 			p = xfs_ppinfo_to_pp(pi, i);
 			printf("inode		= %llu\\n", (unsigned long long)p->xpp_ino);
 			printf("generation	= %u\\n", (unsigned int)p->xpp_gen);
-			printf("diroffset	= %u\\n", (unsigned int)p->xpp_diroffset);
 			printf("name		= \\"%s\\"\\n\\n", (char *)p->xpp_name);
 		}
 	} while (!pi->pi_flags & XFS_PPTR_OFLAG_DONE);
diff --git a/mkfs/proto.c b/mkfs/proto.c
index b8d7ac96..445fbefb 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -509,7 +509,7 @@ parseproto(
 		libxfs_trans_log_inode(tp, ip, flags);
 		if (parent) {
 			error = -libxfs_parent_add(tp, parent, pip, &xname,
-					offset, ip);
+					ip);
 			if (error)
 				fail(_("committing parent pointers failed."),
 						error);
@@ -602,7 +602,7 @@ parseproto(
 		libxfs_trans_log_inode(tp, ip, flags);
 		if (parent) {
 			error = -libxfs_parent_add(tp, parent, pip, &xname,
-					offset, ip);
+					ip);
 			if (error)
 				fail(_("committing parent pointers failed."),
 						error);
@@ -636,8 +636,7 @@ parseproto(
 	}
 	libxfs_trans_log_inode(tp, ip, flags);
 	if (parent) {
-		error = -libxfs_parent_add(tp, parent, pip, &xname, offset,
-				ip);
+		error = -libxfs_parent_add(tp, parent, pip, &xname, ip);
 		if (error)
 			fail(_("committing parent pointers failed."), error);
 	}
diff --git a/repair/phase6.c b/repair/phase6.c
index 1994162a..3fb11df9 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -68,7 +68,6 @@ struct dir_hash_ent {
 	struct dir_hash_ent	*nextbyorder;	/* next in order added */
 	xfs_dahash_t		hashval;	/* hash value of name */
 	uint32_t		address;	/* offset of data entry */
-	uint32_t		new_address;	/* new address, if we rebuild */
 	xfs_ino_t		inum;		/* inode num of entry */
 	short			junkit;		/* name starts with / */
 	short			seen;		/* have seen leaf entry */
@@ -226,7 +225,6 @@ dir_hash_add(
 	p->address = addr;
 	p->inum = inum;
 	p->seen = 0;
-	p->new_address = addr;
 
 	/* Set up the name in the region trailing the hash entry. */
 	memcpy(p->namebuf, name, namelen);
@@ -979,7 +977,7 @@ mk_orphanage(xfs_mount_t *mp)
 		do_error(
 		_("can't make %s, createname error %d\n"),
 			ORPHANAGE, error);
-	add_parent_ptr(ip->i_ino, ORPHANAGE, diroffset, pip);
+	add_parent_ptr(ip->i_ino, ORPHANAGE, pip);
 
 	/*
 	 * bump up the link count in the root directory to account
@@ -1169,8 +1167,7 @@ mv_orphanage(
 	}
 
 	if (xfs_has_parent(mp))
-		add_parent_ptr(ino_p->i_ino, xname.name, diroffset,
-				orphanage_ip);
+		add_parent_ptr(ino_p->i_ino, xname.name, orphanage_ip);
 
 	libxfs_irele(ino_p);
 	libxfs_irele(orphanage_ip);
@@ -1341,8 +1338,8 @@ longform_dir2_rebuild(
 
 		libxfs_trans_ijoin(tp, ip, 0);
 
-		error = -libxfs_dir_createname(tp, ip, &p->name, p->inum,
-						nres, &p->new_address);
+		error = -libxfs_dir_createname(tp, ip, &p->name, p->inum, nres,
+				NULL);
 		if (error) {
 			do_warn(
 _("name create failed in ino %" PRIu64 " (%d)\n"), ino, error);
@@ -2819,7 +2816,7 @@ dir_hash_add_parent_ptrs(
 						p->name.name[1] == '.'))))
 			continue;
 
-		add_parent_ptr(p->inum, p->name.name, p->new_address, dp);
+		add_parent_ptr(p->inum, p->name.name, dp);
 	}
 }
 
diff --git a/repair/pptr.c b/repair/pptr.c
index a5cf89b9..ca5fe7e3 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -119,9 +119,6 @@ struct ag_pptr {
 	xfs_ino_t		parent_ino;
 	unsigned int		parent_gen;
 
-	/* dirent offset */
-	xfs_dir2_dataptr_t	diroffset;
-
 	/* dirent name length */
 	unsigned int		namelen;
 
@@ -140,9 +137,6 @@ struct file_pptr {
 	unsigned long long	parent_ino:63;
 	unsigned int		parent_gen;
 
-	/* dirent offset */
-	xfs_dir2_dataptr_t	diroffset;
-
 	/* parent pointer name length */
 	unsigned int		namelen;
 
@@ -220,9 +214,9 @@ cmp_ag_pptr(
 	if (pa->parent_ino > pb->parent_ino)
 		return 1;
 
-	if (pa->diroffset < pb->diroffset)
+	if (pa->name_cookie < pb->name_cookie)
 		return -1;
-	if (pa->diroffset > pb->diroffset)
+	if (pa->name_cookie > pb->name_cookie)
 		return 1;
 
 	return 0;
@@ -241,9 +235,18 @@ cmp_file_pptr(
 	if (pa->parent_ino > pb->parent_ino)
 		return 1;
 
-	if (pa->diroffset < pb->diroffset)
+	/*
+	 * Push the parent pointer names that we didn't find in the dirent scan
+	 * towards the front of the list so that we delete them first.
+	 */
+	if (!pa->name_in_nameblobs && pb->name_in_nameblobs)
 		return -1;
-	if (pa->diroffset > pb->diroffset)
+	if (pa->name_in_nameblobs && !pb->name_in_nameblobs)
+		return 1;
+
+	if (pa->name_cookie < pb->name_cookie)
+		return -1;
+	if (pa->name_cookie > pb->name_cookie)
 		return 1;
 
 	return 0;
@@ -308,12 +311,11 @@ parent_ptr_init(
 	}
 }
 
-/* Remember that @dp has a dirent (@fname, @ino) at @diroffset. */
+/* Remember that @dp has a dirent (@fname, @ino). */
 void
 add_parent_ptr(
 	xfs_ino_t		ino,
 	const unsigned char	*fname,
-	xfs_dir2_dataptr_t	diroffset,
 	struct xfs_inode	*dp)
 {
 	struct xfs_mount	*mp = dp->i_mount;
@@ -321,7 +323,6 @@ add_parent_ptr(
 		.child_agino	= XFS_INO_TO_AGINO(mp, ino),
 		.parent_ino	= dp->i_ino,
 		.parent_gen	= VFS_I(dp)->i_generation,
-		.diroffset	= diroffset,
 		.namelen	= strlen(fname),
 	};
 	struct ag_pptrs		*ag_pptrs;
@@ -348,9 +349,9 @@ add_parent_ptr(
 				fname, strerror(error));
 
 	dbg_printf(
- _("%s: dp %llu fname '%s' diroffset %u ino %llu cookie 0x%llx\n"),
+ _("%s: dp %llu fname '%s' ino %llu cookie 0x%llx\n"),
 			__func__, (unsigned long long)dp->i_ino, fname,
-			diroffset, (unsigned long long)ino,
+			(unsigned long long)ino,
 			(unsigned long long)ag_pptr.name_cookie);
 }
 
@@ -509,6 +510,8 @@ examine_xattr(
 {
 	struct file_pptr	file_pptr = { };
 	struct xfs_parent_name_irec irec;
+	struct xfs_name		xname;
+	uint8_t			namehash[XFS_PARENT_NAME_HASH_SIZE];
 	struct xfs_mount	*mp = ip->i_mount;
 	struct file_scan	*fscan = priv;
 	const struct xfs_parent_name_rec *rec = (const void *)name;
@@ -531,9 +534,23 @@ examine_xattr(
 
 	file_pptr.parent_ino = irec.p_ino;
 	file_pptr.parent_gen = irec.p_gen;
-	file_pptr.diroffset = irec.p_diroffset;
 	file_pptr.namelen = irec.p_namelen;
 
+	xname.name = irec.p_name;
+	xname.len = irec.p_namelen;
+
+	/*
+	 * Does the namehash in the attr key match the name in the attr value?
+	 * If not, there's no point in checking further.
+	 */
+	error = -libxfs_parent_namehash(ip, &xname, namehash,
+			sizeof(namehash));
+	if (error)
+		goto corrupt;
+
+	if (memcmp(irec.p_namehash, namehash, sizeof(irec.p_namehash)))
+		goto corrupt;
+
 	error = store_file_pptr_name(fscan, &file_pptr, &irec);
 	if (error)
 		do_error(
@@ -547,10 +564,10 @@ examine_xattr(
 				(unsigned long long)ip->i_ino, strerror(error));
 
 	dbg_printf(
- _("%s: dp %llu fname '%.*s' namelen %u diroffset %u ino %llu cookie 0x%llx\n"),
+ _("%s: dp %llu fname '%.*s' namelen %u ino %llu cookie 0x%llx\n"),
 			__func__, (unsigned long long)irec.p_ino,
 			irec.p_namelen, (const char *)irec.p_name,
-			irec.p_namelen, irec.p_diroffset,
+			irec.p_namelen,
 			(unsigned long long)ip->i_ino,
 			(unsigned long long)file_pptr.name_cookie);
 	fscan->nr_file_pptrs++;
@@ -570,13 +587,17 @@ add_file_pptr(
 	struct xfs_parent_name_irec	pptr_rec = {
 		.p_ino			= ag_pptr->parent_ino,
 		.p_gen			= ag_pptr->parent_gen,
-		.p_diroffset		= ag_pptr->diroffset,
 		.p_namelen		= ag_pptr->namelen,
 	};
 	struct xfs_parent_scratch	scratch;
+	int				error;
 
 	memcpy(pptr_rec.p_name, name, ag_pptr->namelen);
 
+	error = -libxfs_parent_irec_hash(ip, &pptr_rec);
+	if (error)
+		return error;
+
 	return -libxfs_parent_set(ip, &pptr_rec, &scratch);
 }
 
@@ -584,14 +605,22 @@ add_file_pptr(
 static int
 remove_file_pptr(
 	struct xfs_inode		*ip,
-	const struct file_pptr		*file_pptr)
+	const struct file_pptr		*file_pptr,
+	const unsigned char		*name)
 {
 	struct xfs_parent_name_irec	pptr_rec = {
 		.p_ino			= file_pptr->parent_ino,
 		.p_gen			= file_pptr->parent_gen,
-		.p_diroffset		= file_pptr->diroffset,
+		.p_namelen		= file_pptr->namelen,
 	};
 	struct xfs_parent_scratch	scratch;
+	int				error;
+
+	memcpy(pptr_rec.p_name, name, file_pptr->namelen);
+
+	error = -libxfs_parent_irec_hash(ip, &pptr_rec);
+	if (error)
+		return error;
 
 	return -libxfs_parent_unset(ip, &pptr_rec, &scratch);
 }
@@ -637,13 +666,25 @@ clear_all_pptrs(
 				strerror(error));
 
 	while ((file_pptr = pop_slab_cursor(cur)) != NULL) {
-		error = remove_file_pptr(ip, file_pptr);
+		unsigned char	name[MAXNAMELEN];
+
+		error = load_file_pptr_name(fscan, file_pptr, name);
 		if (error)
 			do_error(
- _("wiping ino %llu pptr (ino %llu gen 0x%x diroffset %u) failed: %s\n"),
+  _("loading incorrect name for ino %llu parent pointer (ino %llu gen 0x%x namecookie 0x%llx) failed: %s\n"),
+					(unsigned long long)ip->i_ino,
+					(unsigned long long)file_pptr->parent_ino,
+					file_pptr->parent_gen,
+					(unsigned long long)file_pptr->name_cookie,
+					strerror(error));
+
+		error = remove_file_pptr(ip, file_pptr, name);
+		if (error)
+			do_error(
+ _("wiping ino %llu pptr (ino %llu gen 0x%x) failed: %s\n"),
 				(unsigned long long)ip->i_ino,
 				(unsigned long long)file_pptr->parent_ino,
-				file_pptr->parent_gen, file_pptr->diroffset,
+				file_pptr->parent_gen,
 				strerror(error));
 	}
 
@@ -664,37 +705,37 @@ add_missing_parent_ptr(
 			ag_pptr->namelen);
 	if (error)
 		do_error(
- _("loading missing name for ino %llu parent pointer (ino %llu gen 0x%x diroffset %u namecookie 0x%llx) failed: %s\n"),
+ _("loading missing name for ino %llu parent pointer (ino %llu gen 0x%x namecookie 0x%llx) failed: %s\n"),
 				(unsigned long long)ip->i_ino,
 				(unsigned long long)ag_pptr->parent_ino,
-				ag_pptr->parent_gen, ag_pptr->diroffset,
+				ag_pptr->parent_gen,
 				(unsigned long long)ag_pptr->name_cookie,
 				strerror(error));
 
 	if (no_modify) {
 		do_warn(
- _("would add missing ino %llu parent pointer (ino %llu gen 0x%x diroffset %u name '%.*s')\n"),
+ _("would add missing ino %llu parent pointer (ino %llu gen 0x%x name '%.*s')\n"),
 				(unsigned long long)ip->i_ino,
 				(unsigned long long)ag_pptr->parent_ino,
-				ag_pptr->parent_gen, ag_pptr->diroffset,
+				ag_pptr->parent_gen,
 				ag_pptr->namelen, name);
 		return;
 	}
 
 	do_warn(
- _("adding missing ino %llu parent pointer (ino %llu gen 0x%x diroffset %u name '%.*s')\n"),
+ _("adding missing ino %llu parent pointer (ino %llu gen 0x%x name '%.*s')\n"),
 			(unsigned long long)ip->i_ino,
 			(unsigned long long)ag_pptr->parent_ino,
-			ag_pptr->parent_gen, ag_pptr->diroffset,
+			ag_pptr->parent_gen,
 			ag_pptr->namelen, name);
 
 	error = add_file_pptr(ip, ag_pptr, name);
 	if (error)
 		do_error(
- _("adding ino %llu pptr (ino %llu gen 0x%x diroffset %u name '%.*s') failed: %s\n"),
+ _("adding ino %llu pptr (ino %llu gen 0x%x name '%.*s') failed: %s\n"),
 			(unsigned long long)ip->i_ino,
 			(unsigned long long)ag_pptr->parent_ino,
-			ag_pptr->parent_gen, ag_pptr->diroffset,
+			ag_pptr->parent_gen,
 			ag_pptr->namelen, name, strerror(error));
 }
 
@@ -711,37 +752,37 @@ remove_incorrect_parent_ptr(
 	error = load_file_pptr_name(fscan, file_pptr, name);
 	if (error)
 		do_error(
- _("loading incorrect name for ino %llu parent pointer (ino %llu gen 0x%x diroffset %u namecookie 0x%llx) failed: %s\n"),
+ _("loading incorrect name for ino %llu parent pointer (ino %llu gen 0x%x namecookie 0x%llx) failed: %s\n"),
 				(unsigned long long)ip->i_ino,
 				(unsigned long long)file_pptr->parent_ino,
-				file_pptr->parent_gen, file_pptr->diroffset,
+				file_pptr->parent_gen,
 				(unsigned long long)file_pptr->name_cookie,
 				strerror(error));
 
 	if (no_modify) {
 		do_warn(
- _("would remove bad ino %llu parent pointer (ino %llu gen 0x%x diroffset %u name '%.*s')\n"),
+ _("would remove bad ino %llu parent pointer (ino %llu gen 0x%x name '%.*s')\n"),
 				(unsigned long long)ip->i_ino,
 				(unsigned long long)file_pptr->parent_ino,
-				file_pptr->parent_gen, file_pptr->diroffset,
+				file_pptr->parent_gen,
 				file_pptr->namelen, name);
 		return;
 	}
 
 	do_warn(
- _("removing bad ino %llu parent pointer (ino %llu gen 0x%x diroffset %u name '%.*s')\n"),
+ _("removing bad ino %llu parent pointer (ino %llu gen 0x%x name '%.*s')\n"),
 			(unsigned long long)ip->i_ino,
 			(unsigned long long)file_pptr->parent_ino,
-			file_pptr->parent_gen, file_pptr->diroffset,
+			file_pptr->parent_gen,
 			file_pptr->namelen, name);
 
-	error = remove_file_pptr(ip, file_pptr);
+	error = remove_file_pptr(ip, file_pptr, name);
 	if (error)
 		do_error(
- _("removing ino %llu pptr (ino %llu gen 0x%x diroffset %u name '%.*s') failed: %s\n"),
+ _("removing ino %llu pptr (ino %llu gen 0x%x name '%.*s') failed: %s\n"),
 			(unsigned long long)ip->i_ino,
 			(unsigned long long)file_pptr->parent_ino,
-			file_pptr->parent_gen, file_pptr->diroffset,
+			file_pptr->parent_gen,
 			file_pptr->namelen, name, strerror(error));
 }
 
@@ -764,20 +805,20 @@ compare_parent_pointers(
 			ag_pptr->namelen);
 	if (error)
 		do_error(
- _("loading master-list name for ino %llu parent pointer (ino %llu gen 0x%x diroffset %u namecookie 0x%llx namelen %u) failed: %s\n"),
+ _("loading master-list name for ino %llu parent pointer (ino %llu gen 0x%x  namecookie 0x%llx namelen %u) failed: %s\n"),
 				(unsigned long long)ip->i_ino,
 				(unsigned long long)ag_pptr->parent_ino,
-				ag_pptr->parent_gen, ag_pptr->diroffset,
+				ag_pptr->parent_gen,
 				(unsigned long long)ag_pptr->name_cookie,
 				ag_pptr->namelen, strerror(error));
 
 	error = load_file_pptr_name(fscan, file_pptr, name2);
 	if (error)
 		do_error(
- _("loading file-list name for ino %llu parent pointer (ino %llu gen 0x%x diroffset %u namecookie 0x%llx namelen %u) failed: %s\n"),
+ _("loading file-list name for ino %llu parent pointer (ino %llu gen 0x%x namecookie 0x%llx namelen %u) failed: %s\n"),
 				(unsigned long long)ip->i_ino,
 				(unsigned long long)file_pptr->parent_ino,
-				file_pptr->parent_gen, file_pptr->diroffset,
+				file_pptr->parent_gen,
 				(unsigned long long)file_pptr->name_cookie,
 				ag_pptr->namelen, strerror(error));
 
@@ -793,42 +834,67 @@ compare_parent_pointers(
 reset:
 	if (no_modify) {
 		do_warn(
- _("would update ino %llu parent pointer (ino %llu gen 0x%x diroffset %u name '%.*s')\n"),
+ _("would update ino %llu parent pointer (ino %llu gen 0x%x name '%.*s')\n"),
 				(unsigned long long)ip->i_ino,
 				(unsigned long long)ag_pptr->parent_ino,
-				ag_pptr->parent_gen, ag_pptr->diroffset,
+				ag_pptr->parent_gen,
 				ag_pptr->namelen, name1);
 		return;
 	}
 
 	do_warn(
- _("updating ino %llu parent pointer (ino %llu gen 0x%x diroffset %u name '%.*s')\n"),
+ _("updating ino %llu parent pointer (ino %llu gen 0x%x name '%.*s')\n"),
 			(unsigned long long)ip->i_ino,
 			(unsigned long long)ag_pptr->parent_ino,
-			ag_pptr->parent_gen, ag_pptr->diroffset,
+			ag_pptr->parent_gen,
 			ag_pptr->namelen, name1);
 
 	if (ag_pptr->parent_gen != file_pptr->parent_gen) {
-		error = remove_file_pptr(ip, file_pptr);
+		error = remove_file_pptr(ip, file_pptr, name2);
 		if (error)
 			do_error(
- _("erasing ino %llu pptr (ino %llu gen 0x%x diroffset %u name '%.*s') failed: %s\n"),
+ _("erasing ino %llu pptr (ino %llu gen 0x%x name '%.*s') failed: %s\n"),
 				(unsigned long long)ip->i_ino,
 				(unsigned long long)file_pptr->parent_ino,
-				file_pptr->parent_gen, file_pptr->diroffset,
+				file_pptr->parent_gen,
 				file_pptr->namelen, name2, strerror(error));
 	}
 
 	error = add_file_pptr(ip, ag_pptr, name1);
 	if (error)
 		do_error(
- _("updating ino %llu pptr (ino %llu gen 0x%x diroffset %u name '%.*s') failed: %s\n"),
+ _("updating ino %llu pptr (ino %llu gen 0x%x name '%.*s') failed: %s\n"),
 			(unsigned long long)ip->i_ino,
 			(unsigned long long)ag_pptr->parent_ino,
-			ag_pptr->parent_gen, ag_pptr->diroffset,
+			ag_pptr->parent_gen,
 			ag_pptr->namelen, name1, strerror(error));
 }
 
+static int
+cmp_file_to_ag_pptr(
+	const struct file_pptr	*fp,
+	const struct ag_pptr	*ap)
+{
+	if (fp->parent_ino > ap->parent_ino)
+		return 1;
+	if (fp->parent_ino < ap->parent_ino)
+		return -1;
+
+	/*
+	 * If this parent pointer wasn't found in the dirent scan, we know it
+	 * should be removed.
+	 */
+	if (!fp->name_in_nameblobs)
+		return -1;
+
+	if (fp->name_cookie < ap->name_cookie)
+		return -1;
+	if (fp->name_cookie > ap->name_cookie)
+		return 1;
+
+	return 0;
+}
+
 /*
  * Make sure that the parent pointers we observed match the ones ondisk.
  *
@@ -894,26 +960,26 @@ crosscheck_file_parent_ptrs(
 				(unsigned long long)ip->i_ino, strerror(error));
 
 	do {
+		int	cmp_result;
+
 		file_pptr = peek_slab_cursor(fscan->file_pptr_recs_cur);
 
 		dbg_printf(
- _("%s: dp %llu dp_gen 0x%x namelen %u diroffset %u ino %llu namecookie 0x%llx (master)\n"),
+ _("%s: dp %llu dp_gen 0x%x namelen %u ino %llu namecookie 0x%llx (master)\n"),
 				__func__,
 				(unsigned long long)ag_pptr->parent_ino,
 				ag_pptr->parent_gen,
 				ag_pptr->namelen,
-				ag_pptr->diroffset,
 				(unsigned long long)ip->i_ino,
 				(unsigned long long)ag_pptr->name_cookie);
 
 		if (file_pptr) {
 			dbg_printf(
- _("%s: dp %llu dp_gen 0x%x namelen %u diroffset %u ino %llu namecookie 0x%llx (file)\n"),
+ _("%s: dp %llu dp_gen 0x%x namelen %u ino %llu namecookie 0x%llx (file)\n"),
 					__func__,
 					(unsigned long long)file_pptr->parent_ino,
 					file_pptr->parent_gen,
 					file_pptr->namelen,
-					file_pptr->diroffset,
 					(unsigned long long)ip->i_ino,
 					(unsigned long long)file_pptr->name_cookie);
 		} else {
@@ -923,9 +989,8 @@ crosscheck_file_parent_ptrs(
 					(unsigned long long)ip->i_ino);
 		}
 
-		if (!file_pptr ||
-		    file_pptr->parent_ino > ag_pptr->parent_ino ||
-		    file_pptr->diroffset > ag_pptr->diroffset) {
+		cmp_result = file_pptr ? cmp_file_to_ag_pptr(file_pptr, ag_pptr) : 1;
+		if (cmp_result > 0) {
 			/*
 			 * The master pptr list knows about pptrs that are not
 			 * in the ondisk metadata.  Add the missing pptr and
@@ -933,8 +998,7 @@ crosscheck_file_parent_ptrs(
 			 */
 			add_missing_parent_ptr(ip, fscan, ag_pptr);
 			advance_slab_cursor(fscan->ag_pptr_recs_cur);
-		} else if (file_pptr->parent_ino < ag_pptr->parent_ino ||
-			   file_pptr->diroffset < ag_pptr->diroffset) {
+		} else if (cmp_result < 0) {
 			/*
 			 * The ondisk pptrs mention a link that is not in the
 			 * master list.  Delete the extra pptr and advance only
@@ -958,12 +1022,11 @@ crosscheck_file_parent_ptrs(
 
 	while ((file_pptr = pop_slab_cursor(fscan->file_pptr_recs_cur))) {
 		dbg_printf(
- _("%s: dp %llu dp_gen 0x%x namelen %u diroffset %u ino %llu namecookie 0x%llx (excess)\n"),
+ _("%s: dp %llu dp_gen 0x%x namelen %u ino %llu namecookie 0x%llx (excess)\n"),
 				__func__,
 				(unsigned long long)file_pptr->parent_ino,
 				file_pptr->parent_gen,
 				file_pptr->namelen,
-				file_pptr->diroffset,
 				(unsigned long long)ip->i_ino,
 				(unsigned long long)file_pptr->name_cookie);
 
diff --git a/repair/pptr.h b/repair/pptr.h
index d72c1ac2..1cf3444c 100644
--- a/repair/pptr.h
+++ b/repair/pptr.h
@@ -10,7 +10,7 @@ void parent_ptr_free(struct xfs_mount *mp);
 void parent_ptr_init(struct xfs_mount *mp);
 
 void add_parent_ptr(xfs_ino_t ino, const unsigned char *fname,
-		xfs_dir2_dataptr_t diroffset, struct xfs_inode *dp);
+		struct xfs_inode *dp);
 
 void check_parent_ptrs(struct xfs_mount *mp);
 

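The cross-check loop in the repair/pptr.c hunks above is a merge walk over two sorted lists: the master list built from the dirent scan and the list read from the file's xattrs. The following is a minimal sketch of that walk, not the xfs_repair code itself. The names `struct rec`, `struct tally`, `cmp_rec` and `crosscheck` are hypothetical, the slab cursors are replaced by plain arrays, and the `name_in_nameblobs` ordering and the generation-mismatch update path are left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified parent pointer record. */
struct rec {
	uint64_t	parent_ino;
	uint64_t	name_cookie;
};

/* Counts of the repair actions the walk would take. */
struct tally {
	unsigned int	added;
	unsigned int	removed;
	unsigned int	matched;
};

/* Order by parent inode number first, then by name cookie. */
static int cmp_rec(const struct rec *a, const struct rec *b)
{
	if (a->parent_ino != b->parent_ino)
		return a->parent_ino < b->parent_ino ? -1 : 1;
	if (a->name_cookie != b->name_cookie)
		return a->name_cookie < b->name_cookie ? -1 : 1;
	return 0;
}

/* Both arrays must already be sorted with cmp_rec. */
static struct tally crosscheck(const struct rec *master, size_t nr_master,
		const struct rec *file, size_t nr_file)
{
	struct tally	t = { 0, 0, 0 };
	size_t		m = 0, f = 0;

	while (m < nr_master) {
		/* An exhausted file list behaves like "file record sorts later". */
		int	cmp = f < nr_file ? cmp_rec(&file[f], &master[m]) : 1;

		if (cmp > 0) {
			/* Master knows a link the file lacks: add it. */
			t.added++;
			m++;
		} else if (cmp < 0) {
			/* File mentions a link the master lacks: remove it. */
			t.removed++;
			f++;
		} else {
			t.matched++;
			m++;
			f++;
		}
	}

	/* Anything left on the file side is excess. */
	t.removed += nr_file - f;
	return t;
}
```

Only one cursor advances on a mismatch, so each record is visited once and the walk is linear in the combined length of the two lists.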


* [PATCH 3/6] xfs_logprint: decode parent pointers fully
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/6] xfsprogs: encode parent pointer name in xattr key Darrick J. Wong
  2023-02-16 21:10   ` [PATCH 1/6] libfrog: support the sha512 hash algorithm Darrick J. Wong
  2023-02-16 21:10   ` [PATCH 2/6] xfs: replace parent pointer diroffset with sha512 hash of name Darrick J. Wong
@ 2023-02-16 21:11   ` Darrick J. Wong
  2023-02-16 21:11   ` [PATCH 4/6] xfs: skip the sha512 namehash when possible Darrick J. Wong
                     ` (2 subsequent siblings)
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:11 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Decode logged parent pointers fully when dumping log contents.  Between
the existing ATTRI: printouts and the new ones introduced here, we can
figure out what was stored in each log iovec, as well as the higher
level parent pointer that was logged.
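
As a rough illustration of what "decoding fully" involves, the sketch below unpacks the key half of a logged parent pointer and formats it in the style of the PPTR: lines this patch adds. It is not the logprint code. It assumes the attr name starts with a big-endian 64-bit parent inode number followed by a big-endian 32-bit generation, and that the attr value carries the dirent name. `struct pptr_key`, `decode_pptr_key` and `format_pptr` are hypothetical names.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of a decoded parent pointer key. */
struct pptr_key {
	uint64_t	parent_ino;
	uint32_t	parent_gen;
};

/* Read big-endian integers byte by byte so host endianness does not matter. */
static uint64_t get_be64(const unsigned char *p)
{
	uint64_t	v = 0;
	int		i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the first 12 bytes of an attr name; returns -1 if it is too short. */
static int decode_pptr_key(const unsigned char *name, size_t namelen,
		struct pptr_key *key)
{
	if (namelen < 12)
		return -1;
	key->parent_ino = get_be64(name);
	key->parent_gen = get_be32(name + 8);
	return 0;
}

/* Format one line; the dirent name is taken from the attr value. */
static int format_pptr(char *buf, size_t buflen, const char *tag,
		const struct pptr_key *key, const char *value,
		unsigned int valuelen)
{
	return snprintf(buf, buflen,
		"PPTR: %s parent_ino %llu parent_gen %u namelen %u name '%.*s'",
		tag, (unsigned long long)key->parent_ino, key->parent_gen,
		valuelen, (int)valuelen, value);
}
```

A caller would pass the name iovec to decode_pptr_key() and the value iovec to format_pptr(), which is roughly how the name and value pointers are paired up before dump_pptr() is called.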

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 logprint/log_redo.c |   63 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 63 insertions(+)


diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index 1ac0536a..ca6b2641 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -699,6 +699,24 @@ xfs_attri_copy_name_format(
 	return 1;
 }
 
+static void
+dump_pptr(
+	const char			*tag,
+	const void			*name,
+	unsigned int			namelen,
+	const void			*value,
+	unsigned int			valuelen)
+{
+	struct xfs_parent_name_irec	irec;
+
+	libxfs_parent_irec_from_disk(&irec, name, value, valuelen);
+
+	printf("PPTR: %s attr_namelen %u value_namelen %u\n", tag, namelen, valuelen);
+	printf("PPTR: %s parent_ino %llu parent_gen %u namelen %u name '%.*s'\n",
+			tag, (unsigned long long)irec.p_ino, irec.p_gen,
+			irec.p_namelen, irec.p_namelen, irec.p_name);
+}
+
 int
 xlog_print_trans_attri(
 	char				**ptr,
@@ -707,6 +725,9 @@ xlog_print_trans_attri(
 {
 	struct xfs_attri_log_format	*src_f = NULL;
 	xlog_op_header_t		*head = NULL;
+	void				*name_ptr = NULL, *nname_ptr = NULL;
+	void				*value_ptr = (void *)1;
+	int				name_len = 0, nname_len = 0, value_len = 0;
 	uint				dst_len;
 	int				error = 0;
 
@@ -739,6 +760,8 @@ xlog_print_trans_attri(
 		(*i)++;
 		head = (xlog_op_header_t *)*ptr;
 		xlog_print_op_header(head, *i, ptr);
+		name_ptr = *ptr;
+		name_len = src_f->alfi_name_len;
 		error = xlog_print_trans_attri_name(ptr, be32_to_cpu(head->oh_len),
 						    src_f->alfi_attr_filter);
 		if (error)
@@ -750,6 +773,8 @@ xlog_print_trans_attri(
 		(*i)++;
 		head = (xlog_op_header_t *)*ptr;
 		xlog_print_op_header(head, *i, ptr);
+		nname_ptr = *ptr;
+		nname_len = src_f->alfi_nname_len;
 		error = xlog_print_trans_attri_name(ptr, be32_to_cpu(head->oh_len),
 						    src_f->alfi_attr_filter);
 		if (error)
@@ -761,9 +786,23 @@ xlog_print_trans_attri(
 		(*i)++;
 		head = (xlog_op_header_t *)*ptr;
 		xlog_print_op_header(head, *i, ptr);
+		value_ptr = *ptr;
+		value_len = src_f->alfi_value_len;
 		error = xlog_print_trans_attri_value(ptr, be32_to_cpu(head->oh_len),
 				src_f->alfi_value_len, src_f->alfi_attr_filter);
 	}
+
+	if (src_f->alfi_attr_filter & XFS_ATTR_PARENT) {
+		if (nname_ptr && name_ptr) {
+			dump_pptr("OLDNAME", name_ptr, name_len, (void *)1, 0);
+			dump_pptr("NEWNAME", nname_ptr, nname_len, value_ptr, value_len);
+			name_ptr = nname_ptr = NULL;
+		}
+		if (name_ptr)
+			dump_pptr("NAME", name_ptr, name_len, value_ptr, value_len);
+		if (nname_ptr)
+			dump_pptr("NNAME", nname_ptr, nname_len, (void *)1, 0);
+	}
 error:
 	free(src_f);
 
@@ -853,6 +892,9 @@ xlog_recover_print_attri(
 {
 	struct xfs_attri_log_format	*f, *src_f = NULL;
 	uint				src_len, dst_len;
+	void				*name_ptr = NULL, *nname_ptr = NULL;
+	void				*value_ptr = (void *)1;
+	int				name_len = 0, nname_len = 0, value_len = 0;
 
 	struct xfs_parent_name_rec 	*rec, *src_rec = NULL;
 	char				*value, *src_value = NULL;
@@ -897,6 +939,9 @@ xlog_recover_print_attri(
 				goto out;
 			}
 
+			name_ptr = src_rec;
+			name_len = src_len;
+
 			printf(_("ATTRI:  #inode: %llu     gen: %u\n"),
 				be64_to_cpu(rec->p_ino), be32_to_cpu(rec->p_gen));
 
@@ -927,6 +972,9 @@ xlog_recover_print_attri(
 				goto out;
 			}
 
+			nname_ptr = src_rec;
+			nname_len = src_len;
+
 			printf(_("ATTRI:  new #inode: %llu     gen: %u\n"),
 				be64_to_cpu(rec->p_ino), be32_to_cpu(rec->p_gen));
 
@@ -951,6 +999,9 @@ xlog_recover_print_attri(
 				exit(1);
 			}
 
+			value_ptr = src_value;
+			value_len = f->alfi_value_len;
+
 			memcpy((char *)value, (char *)src_value, f->alfi_value_len);
 			printf("ATTRI:  value: %.*s\n", f->alfi_value_len, value);
 
@@ -968,6 +1019,18 @@ xlog_recover_print_attri(
 		}
 	}
 
+	if (src_f->alfi_attr_filter & XFS_ATTR_PARENT) {
+		if (nname_ptr && name_ptr) {
+			dump_pptr("OLDNAME", name_ptr, name_len, (void *)1, 0);
+			dump_pptr("NEWNAME", nname_ptr, nname_len, value_ptr, value_len);
+			name_ptr = nname_ptr = NULL;
+		}
+		if (name_ptr)
+			dump_pptr("NAME", name_ptr, name_len, value_ptr, value_len);
+		if (nname_ptr)
+			dump_pptr("NNAME", nname_ptr, nname_len, (void *)1, 0);
+	}
+
 out:
 	free(f);
 



* [PATCH 4/6] xfs: skip the sha512 namehash when possible
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/6] xfsprogs: encode parent pointer name in xattr key Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 21:11   ` [PATCH 3/6] xfs_logprint: decode parent pointers fully Darrick J. Wong
@ 2023-02-16 21:11   ` Darrick J. Wong
  2023-02-16 21:11   ` [PATCH 5/6] xfs: make the ondisk parent pointer record a flex array Darrick J. Wong
  2023-02-16 21:12   ` [PATCH 6/6] xfs: use parent pointer xattr space more efficiently Darrick J. Wong
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:11 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Reduce the size and performance impacts of parent pointer name hashes by
using the dirent name as the hash if the dirent name is shorter than a
sha512 hash would be.  IOWs, we only use sha512 for names longer than 63
bytes.
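
The rule can be sketched in isolation as follows. This is an illustration under stated assumptions, not the libxfs implementation: `pptr_namehash` and `NAMEHASH_SIZE` are hypothetical names, `placeholder_digest` is a stand-in that is NOT sha512 and exists only to keep the sketch self-contained, and the child inode generation that the real hash also covers is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NAMEHASH_SIZE	64	/* size of a sha512 digest in bytes */

/*
 * Placeholder digest, NOT sha512: it fills the buffer deterministically
 * so that the sketch compiles without a crypto library.
 */
static void placeholder_digest(const unsigned char *name, size_t len,
		unsigned char out[NAMEHASH_SIZE])
{
	uint32_t	h = 2166136261u;
	size_t		i;

	for (i = 0; i < len; i++)
		h = (h ^ name[i]) * 16777619u;
	for (i = 0; i < NAMEHASH_SIZE; i++) {
		h = (h ^ (uint32_t)i) * 16777619u;
		out[i] = (unsigned char)(h >> 24);
	}
}

/* Fill @hash and return the number of meaningful bytes in it. */
static size_t pptr_namehash(const unsigned char *name, size_t len,
		unsigned char hash[NAMEHASH_SIZE])
{
	if (len < NAMEHASH_SIZE) {
		/* Short name: the name itself is the "hash"; zero the tail. */
		memcpy(hash, name, len);
		memset(hash + len, 0, NAMEHASH_SIZE - len);
		return len;
	}

	/* Names of 64 bytes or more are reduced to a fixed-size digest. */
	placeholder_digest(name, len, hash);
	return NAMEHASH_SIZE;
}
```

The returned length is what lets the key shrink: a 5-byte name needs only 5 hash bytes in the xattr name rather than 64.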

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 db/attr.c              |    8 +++--
 db/attrshort.c         |    2 +
 libxfs/xfs_da_format.h |   21 +++++++++++-
 libxfs/xfs_parent.c    |   85 ++++++++++++++++++++++++++++++++----------------
 libxfs/xfs_parent.h    |    8 +++--
 logprint/log_redo.c    |    5 ++-
 repair/pptr.c          |   10 +++---
 7 files changed, 96 insertions(+), 43 deletions(-)


diff --git a/db/attr.c b/db/attr.c
index bacdc6d9..798a7e1a 100644
--- a/db/attr.c
+++ b/db/attr.c
@@ -309,8 +309,12 @@ __attr_leaf_name_pptr_namehashlen(
 	struct xfs_attr_leaf_entry      *e,
 	int				i)
 {
-	if (e->flags & XFS_ATTR_PARENT)
-		return XFS_PARENT_NAME_HASH_SIZE;
+	struct xfs_attr_leaf_name_local	*lname;
+
+	if (e->flags & XFS_ATTR_PARENT) {
+		lname = xfs_attr3_leaf_name_local(leaf, i);
+		return xfs_parent_name_hashlen(lname->namelen);
+	}
 	return 0;
 }
 
diff --git a/db/attrshort.c b/db/attrshort.c
index be15f4ee..2fcf44c1 100644
--- a/db/attrshort.c
+++ b/db/attrshort.c
@@ -88,7 +88,7 @@ attr_sf_entry_pptr_namehashlen(
 	ASSERT(bitoffs(startoff) == 0);
 	e = (struct xfs_attr_sf_entry *)((char *)obj + byteize(startoff));
 	if (e->flags & XFS_ATTR_PARENT)
-		return XFS_PARENT_NAME_HASH_SIZE;
+		return xfs_parent_name_hashlen(e->namelen);
 	return 0;
 }
 
diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index 386f63b2..27535750 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -831,8 +831,11 @@ xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
  * Parent pointer attribute format definition
  *
  * The EA name encodes the parent inode number, generation and a collision
- * resistant hash computed from the dirent name.  The hash is defined to be the
- * sha512 of the child inode generation and the dirent name.
+ * resistant hash computed from the dirent name.  The hash is defined to be:
+ *
+ * - The dirent name if it fits within the EA name.
+ *
+ * - The sha512 of the child inode generation and the dirent name.
  *
  * The EA value contains the same name as the dirent in the parent directory.
  */
@@ -842,4 +845,18 @@ struct xfs_parent_name_rec {
 	__u8	p_namehash[XFS_PARENT_NAME_HASH_SIZE];
 } __attribute__((packed));
 
+static inline unsigned int
+xfs_parent_name_rec_sizeof(
+	unsigned int		hashlen)
+{
+	return offsetof(struct xfs_parent_name_rec, p_namehash) + hashlen;
+}
+
+static inline unsigned int
+xfs_parent_name_hashlen(
+	unsigned int		rec_sizeof)
+{
+	return rec_sizeof - offsetof(struct xfs_parent_name_rec, p_namehash);
+}
+
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index 05c1e032..064f2f40 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -56,7 +56,8 @@ xfs_parent_namecheck(
 {
 	xfs_ino_t				p_ino;
 
-	if (reclen != sizeof(struct xfs_parent_name_rec))
+	if (reclen <= xfs_parent_name_rec_sizeof(0) ||
+	    reclen > xfs_parent_name_rec_sizeof(XFS_PARENT_NAME_HASH_SIZE))
 		return false;
 
 	/* Only one namespace bit allowed. */
@@ -108,12 +109,16 @@ void
 xfs_parent_irec_from_disk(
 	struct xfs_parent_name_irec	*irec,
 	const struct xfs_parent_name_rec *rec,
+	int				reclen,
 	const void			*value,
 	int				valuelen)
 {
 	irec->p_ino = be64_to_cpu(rec->p_ino);
 	irec->p_gen = be32_to_cpu(rec->p_gen);
-	memcpy(irec->p_namehash, rec->p_namehash, sizeof(irec->p_namehash));
+	irec->hashlen = xfs_parent_name_hashlen(reclen);
+	memcpy(irec->p_namehash, rec->p_namehash, irec->hashlen);
+	memset(irec->p_namehash + irec->hashlen, 0,
+			sizeof(irec->p_namehash) - irec->hashlen);
 
 	if (!value) {
 		irec->p_namelen = 0;
@@ -137,13 +142,15 @@ xfs_parent_irec_from_disk(
 void
 xfs_parent_irec_to_disk(
 	struct xfs_parent_name_rec	*rec,
+	int				*reclen,
 	void				*value,
 	int				*valuelen,
 	const struct xfs_parent_name_irec *irec)
 {
 	rec->p_ino = cpu_to_be64(irec->p_ino);
 	rec->p_gen = cpu_to_be32(irec->p_gen);
-	memcpy(rec->p_namehash, irec->p_namehash, sizeof(rec->p_namehash));
+	*reclen = xfs_parent_name_rec_sizeof(irec->hashlen);
+	memcpy(rec->p_namehash, irec->p_namehash, irec->hashlen);
 
 	if (valuelen) {
 		ASSERT(*valuelen > 0);
@@ -206,12 +213,14 @@ xfs_parent_add(
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &parent->args;
-	int			error;
+	int			hashlen;
 
-	error = xfs_init_parent_name_rec(&parent->rec, dp, parent_name, child);
-	if (error)
-		return error;
+	hashlen = xfs_init_parent_name_rec(&parent->rec, dp, parent_name,
+			child);
+	if (hashlen < 0)
+		return hashlen;
 
+	args->namelen = xfs_parent_name_rec_sizeof(hashlen);
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 
 	args->trans = tp;
@@ -234,12 +243,13 @@ xfs_parent_remove(
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &parent->args;
-	int			error;
+	int			hashlen;
 
-	error = xfs_init_parent_name_rec(&parent->rec, dp, name, child);
-	if (error)
-		return error;
+	hashlen = xfs_init_parent_name_rec(&parent->rec, dp, name, child);
+	if (hashlen < 0)
+		return hashlen;
 
+	args->namelen = xfs_parent_name_rec_sizeof(hashlen);
 	args->trans = tp;
 	args->dp = child;
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
@@ -258,21 +268,21 @@ xfs_parent_replace(
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &new_parent->args;
-	int			error;
+	int			old_hashlen, new_hashlen;
 
-	error = xfs_init_parent_name_rec(&new_parent->old_rec, old_dp,
+	old_hashlen = xfs_init_parent_name_rec(&new_parent->old_rec, old_dp,
 			old_name, child);
-	if (error)
-		return error;
-	error = xfs_init_parent_name_rec(&new_parent->rec, new_dp, new_name,
-			child);
-	if (error)
-		return error;
+	if (old_hashlen < 0)
+		return old_hashlen;
+	new_hashlen = xfs_init_parent_name_rec(&new_parent->rec, new_dp,
+			new_name, child);
+	if (new_hashlen < 0)
+		return new_hashlen;
 
 	new_parent->args.name = (const uint8_t *)&new_parent->old_rec;
-	new_parent->args.namelen = sizeof(struct xfs_parent_name_rec);
+	new_parent->args.namelen = xfs_parent_name_rec_sizeof(old_hashlen);
 	new_parent->args.new_name = (const uint8_t *)&new_parent->rec;
-	new_parent->args.new_namelen = sizeof(struct xfs_parent_name_rec);
+	new_parent->args.new_namelen = xfs_parent_name_rec_sizeof(new_hashlen);
 	args->trans = tp;
 	args->dp = child;
 
@@ -320,16 +330,17 @@ xfs_parent_lookup(
 	unsigned int			namelen,
 	struct xfs_parent_scratch	*scr)
 {
+	int				reclen;
 	int				error;
 
-	xfs_parent_irec_to_disk(&scr->rec, NULL, NULL, pptr);
+	xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
 
 	memset(&scr->args, 0, sizeof(struct xfs_da_args));
 	scr->args.attr_filter	= XFS_ATTR_PARENT;
 	scr->args.dp		= ip;
 	scr->args.geo		= ip->i_mount->m_attr_geo;
 	scr->args.name		= (const unsigned char *)&scr->rec;
-	scr->args.namelen	= sizeof(struct xfs_parent_name_rec);
+	scr->args.namelen	= reclen;
 	scr->args.op_flags	= XFS_DA_OP_OKNOENT;
 	scr->args.trans		= tp;
 	scr->args.valuelen	= namelen;
@@ -357,14 +368,16 @@ xfs_parent_set(
 	const struct xfs_parent_name_irec *pptr,
 	struct xfs_parent_scratch	*scr)
 {
-	xfs_parent_irec_to_disk(&scr->rec, NULL, NULL, pptr);
+	int				reclen;
+
+	xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
 
 	memset(&scr->args, 0, sizeof(struct xfs_da_args));
 	scr->args.attr_filter	= XFS_ATTR_PARENT;
 	scr->args.dp		= ip;
 	scr->args.geo		= ip->i_mount->m_attr_geo;
 	scr->args.name		= (const unsigned char *)&scr->rec;
-	scr->args.namelen	= sizeof(struct xfs_parent_name_rec);
+	scr->args.namelen	= reclen;
 	scr->args.valuelen	= pptr->p_namelen;
 	scr->args.value		= (void *)pptr->p_name;
 	scr->args.whichfork	= XFS_ATTR_FORK;
@@ -384,14 +397,16 @@ xfs_parent_unset(
 	const struct xfs_parent_name_irec *pptr,
 	struct xfs_parent_scratch	*scr)
 {
-	xfs_parent_irec_to_disk(&scr->rec, NULL, NULL, pptr);
+	int				reclen;
+
+	xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
 
 	memset(&scr->args, 0, sizeof(struct xfs_da_args));
 	scr->args.attr_filter	= XFS_ATTR_PARENT;
 	scr->args.dp		= ip;
 	scr->args.geo		= ip->i_mount->m_attr_geo;
 	scr->args.name		= (const unsigned char *)&scr->rec;
-	scr->args.namelen	= sizeof(struct xfs_parent_name_rec);
+	scr->args.namelen	= reclen;
 	scr->args.whichfork	= XFS_ATTR_FORK;
 
 	return xfs_attr_set(&scr->args);
@@ -399,7 +414,7 @@ xfs_parent_unset(
 
 /*
  * Compute the parent pointer namehash for the given child file and dirent
- * name.
+ * name.  Returns the length of the hash in bytes, or a negative errno.
  */
 int
 xfs_parent_namehash(
@@ -420,6 +435,12 @@ xfs_parent_namehash(
 		return -EINVAL;
 	}
 
+	if (name->len < namehash_len) {
+		memcpy(namehash, name->name, name->len);
+		memset(namehash + name->len, 0, namehash_len - name->len);
+		return name->len;
+	}
+
 	error = sha512_init(&shash);
 	if (error)
 		goto out;
@@ -436,6 +457,7 @@ xfs_parent_namehash(
 	if (error)
 		goto out;
 
+	error = SHA512_DIGEST_SIZE;
 out:
 	sha512_erase(&shash);
 	return error;
@@ -451,7 +473,12 @@ xfs_parent_irec_hash(
 		.name			= pptr->p_name,
 		.len			= pptr->p_namelen,
 	};
+	int				hashlen;
 
-	return xfs_parent_namehash(ip, &xname, &pptr->p_namehash,
+	hashlen = xfs_parent_namehash(ip, &xname, &pptr->p_namehash,
 			sizeof(pptr->p_namehash));
+	if (hashlen < 0)
+		return hashlen;
+	pptr->hashlen = hashlen;
+	return 0;
 }
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index d3f2841e..4c310076 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -23,6 +23,7 @@ struct xfs_parent_name_irec {
 	/* Key fields for looking up a particular parent pointer. */
 	xfs_ino_t		p_ino;
 	uint32_t		p_gen;
+	uint8_t			hashlen;
 	uint8_t			p_namehash[XFS_PARENT_NAME_HASH_SIZE];
 
 	/* Attributes of a parent pointer. */
@@ -31,10 +32,11 @@ struct xfs_parent_name_irec {
 };
 
 void xfs_parent_irec_from_disk(struct xfs_parent_name_irec *irec,
-		const struct xfs_parent_name_rec *rec,
+		const struct xfs_parent_name_rec *rec, int reclen,
 		const void *value, int valuelen);
-void xfs_parent_irec_to_disk(struct xfs_parent_name_rec *rec, void *value,
-		int *valuelen, const struct xfs_parent_name_irec *irec);
+void xfs_parent_irec_to_disk(struct xfs_parent_name_rec *rec, int *reclen,
+		void *value, int *valuelen,
+		const struct xfs_parent_name_irec *irec);
 
 /*
  * Dynamically allocd structure used to wrap the needed data to pass around
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index ca6b2641..339d4815 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -709,11 +709,12 @@ dump_pptr(
 {
 	struct xfs_parent_name_irec	irec;
 
-	libxfs_parent_irec_from_disk(&irec, name, value, valuelen);
+	libxfs_parent_irec_from_disk(&irec, name, namelen, value, valuelen);
 
 	printf("PPTR: %s attr_namelen %u value_namelen %u\n", tag, namelen, valuelen);
-	printf("PPTR: %s parent_ino %llu parent_gen %u namelen %u name '%.*s'\n",
+	printf("PPTR: %s parent_ino %llu parent_gen %u hashlen %u namelen %u name '%.*s'\n",
 			tag, (unsigned long long)irec.p_ino, irec.p_gen,
+			irec.hashlen,
 			irec.p_namelen, irec.p_namelen, irec.p_name);
 }
 
diff --git a/repair/pptr.c b/repair/pptr.c
index ca5fe7e3..12382ad7 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -515,6 +515,7 @@ examine_xattr(
 	struct xfs_mount	*mp = ip->i_mount;
 	struct file_scan	*fscan = priv;
 	const struct xfs_parent_name_rec *rec = (const void *)name;
+	int			hashlen;
 	int			error;
 
 	/* Ignore anything that isn't a parent pointer. */
@@ -530,7 +531,7 @@ examine_xattr(
 	    !xfs_parent_valuecheck(mp, value, valuelen))
 		goto corrupt;
 
-	libxfs_parent_irec_from_disk(&irec, rec, value, valuelen);
+	libxfs_parent_irec_from_disk(&irec, rec, namelen, value, valuelen);
 
 	file_pptr.parent_ino = irec.p_ino;
 	file_pptr.parent_gen = irec.p_gen;
@@ -543,12 +544,13 @@ examine_xattr(
 	 * Does the namehash in the attr key match the name in the attr value?
 	 * If not, there's no point in checking further.
 	 */
-	error = -libxfs_parent_namehash(ip, &xname, namehash,
+	hashlen = libxfs_parent_namehash(ip, &xname, namehash,
 			sizeof(namehash));
-	if (error)
+	if (hashlen < 0)
 		goto corrupt;
 
-	if (memcmp(irec.p_namehash, namehash, sizeof(irec.p_namehash)))
+	if (namelen != xfs_parent_name_rec_sizeof(hashlen) ||
+	    memcmp(irec.p_namehash, namehash, hashlen))
 		goto corrupt;
 
 	error = store_file_pptr_name(fscan, &file_pptr, &irec);



* [PATCH 5/6] xfs: make the ondisk parent pointer record a flex array
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/6] xfsprogs: encode parent pointer name in xattr key Darrick J. Wong
                     ` (3 preceding siblings ...)
  2023-02-16 21:11   ` [PATCH 4/6] xfs: skip the sha512 namehash when possible Darrick J. Wong
@ 2023-02-16 21:11   ` Darrick J. Wong
  2023-02-16 21:12   ` [PATCH 6/6] xfs: use parent pointer xattr space more efficiently Darrick J. Wong
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:11 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that we can use the filename as the parent pointer name hash, we
no longer always write the full 64 bytes into the xattr.  In other words,
the namehash is really a flex array, so adjust its C definition.
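
To illustrate the sizing consequences, here is a minimal standalone sketch.
It is not code from this patch: every demo_* name is hypothetical, the
64-byte limit stands in for XFS_PARENT_NAME_HASH_SIZE, and embedding a
struct with a flexible array member in a union relies on a GNU C extension
(accepted by gcc and clang, as the kernel does).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-in for XFS_PARENT_NAME_HASH_SIZE at this point in the series. */
#define DEMO_HASH_SIZE	64

/* Packed layout: 8-byte inode number, 4-byte generation, then hash bytes. */
struct demo_name_rec {
	uint64_t	ino;
	uint32_t	gen;
	uint8_t		namehash[];	/* flexible array: adds nothing to sizeof */
} __attribute__((packed));

/* Largest record that can occur: fixed header plus a full-length hash. */
#define DEMO_NAME_MAX_SIZE \
	(sizeof(struct demo_name_rec) + DEMO_HASH_SIZE)

/* Size of a record that carries hashlen bytes of namehash. */
static inline unsigned int demo_rec_sizeof(unsigned int hashlen)
{
	return sizeof(struct demo_name_rec) + hashlen;
}

/* Inverse: number of namehash bytes in a record of the given size. */
static inline unsigned int demo_hashlen(unsigned int rec_sizeof)
{
	return rec_sizeof - sizeof(struct demo_name_rec);
}

/*
 * The record no longer reserves room for the hash, so any structure that
 * embeds one must pad it out to the worst case by hand.
 */
struct demo_scratch {
	union {
		struct demo_name_rec	rec;
		uint8_t			pad[DEMO_NAME_MAX_SIZE];
	};
};
```

Because the struct is packed, sizeof() and offsetof(..., namehash) agree,
which is why the helpers can use either form.  The union is what keeps a
full-length hash written through rec.namehash from overrunning the
enclosing structure.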

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_da_format.h  |    9 ++++++---
 libxfs/xfs_parent.c     |    4 ++--
 libxfs/xfs_parent.h     |   15 ++++++++++++---
 libxfs/xfs_trans_resv.c |    6 +++---
 logprint/log_redo.c     |   45 ++++++++++++++++++++++++---------------------
 logprint/logprint.h     |    3 ++-
 6 files changed, 49 insertions(+), 33 deletions(-)


diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index 27535750..4d858307 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -842,21 +842,24 @@ xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
 struct xfs_parent_name_rec {
 	__be64  p_ino;
 	__be32  p_gen;
-	__u8	p_namehash[XFS_PARENT_NAME_HASH_SIZE];
+	__u8	p_namehash[];
 } __attribute__((packed));
 
+#define XFS_PARENT_NAME_MAX_SIZE \
+	(sizeof(struct xfs_parent_name_rec) + XFS_PARENT_NAME_HASH_SIZE)
+
 static inline unsigned int
 xfs_parent_name_rec_sizeof(
 	unsigned int		hashlen)
 {
-	return offsetof(struct xfs_parent_name_rec, p_namehash) + hashlen;
+	return sizeof(struct xfs_parent_name_rec) + hashlen;
 }
 
 static inline unsigned int
 xfs_parent_name_hashlen(
 	unsigned int		rec_sizeof)
 {
-	return rec_sizeof - offsetof(struct xfs_parent_name_rec, p_namehash);
+	return rec_sizeof - sizeof(struct xfs_parent_name_rec);
 }
 
 #endif /* __XFS_DA_FORMAT_H__ */
diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index 064f2f40..8886d344 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -98,7 +98,7 @@ xfs_init_parent_name_rec(
 	rec->p_ino = cpu_to_be64(dp->i_ino);
 	rec->p_gen = cpu_to_be32(VFS_IC(dp)->i_generation);
 	return xfs_parent_namehash(ip, name, rec->p_namehash,
-			sizeof(rec->p_namehash));
+			XFS_PARENT_NAME_HASH_SIZE);
 }
 
 /*
@@ -197,7 +197,7 @@ __xfs_parent_init(
 	parent->args.attr_filter = XFS_ATTR_PARENT;
 	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED;
 	parent->args.name = (const uint8_t *)&parent->rec;
-	parent->args.namelen = sizeof(struct xfs_parent_name_rec);
+	parent->args.namelen = 0;
 
 	*parentp = parent;
 	return 0;
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index 4c310076..3431aac7 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -43,8 +43,14 @@ void xfs_parent_irec_to_disk(struct xfs_parent_name_rec *rec, int *reclen,
  * the defer ops machinery
  */
 struct xfs_parent_defer {
-	struct xfs_parent_name_rec	rec;
-	struct xfs_parent_name_rec	old_rec;
+	union {
+		struct xfs_parent_name_rec	rec;
+		__u8			dummy1[XFS_PARENT_NAME_MAX_SIZE];
+	};
+	union {
+		struct xfs_parent_name_rec	old_rec;
+		__u8			dummy2[XFS_PARENT_NAME_MAX_SIZE];
+	};
 	struct xfs_da_args		args;
 	bool				have_log;
 };
@@ -112,7 +118,10 @@ unsigned int xfs_pptr_calc_space_res(struct xfs_mount *mp,
 
 /* Scratchpad memory so that raw parent operations don't burn stack space. */
 struct xfs_parent_scratch {
-	struct xfs_parent_name_rec	rec;
+	union {
+		struct xfs_parent_name_rec	rec;
+		__u8			dummy1[XFS_PARENT_NAME_MAX_SIZE];
+	};
 	struct xfs_da_args		args;
 };
 
diff --git a/libxfs/xfs_trans_resv.c b/libxfs/xfs_trans_resv.c
index 50315738..406592f2 100644
--- a/libxfs/xfs_trans_resv.c
+++ b/libxfs/xfs_trans_resv.c
@@ -424,19 +424,19 @@ static inline unsigned int xfs_calc_pptr_link_overhead(void)
 {
 	return sizeof(struct xfs_attri_log_format) +
 			xlog_calc_iovec_len(XATTR_NAME_MAX) +
-			xlog_calc_iovec_len(sizeof(struct xfs_parent_name_rec));
+			xlog_calc_iovec_len(XFS_PARENT_NAME_MAX_SIZE);
 }
 static inline unsigned int xfs_calc_pptr_unlink_overhead(void)
 {
 	return sizeof(struct xfs_attri_log_format) +
-			xlog_calc_iovec_len(sizeof(struct xfs_parent_name_rec));
+			xlog_calc_iovec_len(XFS_PARENT_NAME_MAX_SIZE);
 }
 static inline unsigned int xfs_calc_pptr_replace_overhead(void)
 {
 	return sizeof(struct xfs_attri_log_format) +
 			xlog_calc_iovec_len(XATTR_NAME_MAX) +
 			xlog_calc_iovec_len(XATTR_NAME_MAX) +
-			xlog_calc_iovec_len(sizeof(struct xfs_parent_name_rec));
+			xlog_calc_iovec_len(XFS_PARENT_NAME_MAX_SIZE);
 }
 
 /*
diff --git a/logprint/log_redo.c b/logprint/log_redo.c
index 339d4815..7869d58e 100644
--- a/logprint/log_redo.c
+++ b/logprint/log_redo.c
@@ -682,19 +682,18 @@ static inline size_t ATTR_NVEC_SIZE(size_t size)
 
 static int
 xfs_attri_copy_name_format(
-	char                            *buf,
-	uint                            len,
-	struct xfs_parent_name_rec     *dst_attri_fmt)
+	char				*buf,
+	uint				len,
+	uint				alfi_name_len,
+	struct xfs_parent_name_rec	*dst_attri_fmt)
 {
-	uint dst_len = ATTR_NVEC_SIZE(sizeof(struct xfs_parent_name_rec));
-
-	if (len == dst_len) {
-		memcpy((char *)dst_attri_fmt, buf, len);
+	if (alfi_name_len <= len) {
+		memcpy(dst_attri_fmt, buf, alfi_name_len);
 		return 0;
 	}
 
 	fprintf(stderr, _("%s: bad size of attri name format: %u; expected %u\n"),
-		progname, len, dst_len);
+		progname, len, alfi_name_len);
 
 	return 1;
 }
@@ -764,6 +763,7 @@ xlog_print_trans_attri(
 		name_ptr = *ptr;
 		name_len = src_f->alfi_name_len;
 		error = xlog_print_trans_attri_name(ptr, be32_to_cpu(head->oh_len),
+						    src_f->alfi_name_len,
 						    src_f->alfi_attr_filter);
 		if (error)
 			goto error;
@@ -777,6 +777,7 @@ xlog_print_trans_attri(
 		nname_ptr = *ptr;
 		nname_len = src_f->alfi_nname_len;
 		error = xlog_print_trans_attri_name(ptr, be32_to_cpu(head->oh_len),
+						    src_f->alfi_nname_len,
 						    src_f->alfi_attr_filter);
 		if (error)
 			goto error;
@@ -814,10 +815,10 @@ int
 xlog_print_trans_attri_name(
 	char				**ptr,
 	uint				src_len,
+	uint				alfi_name_len,
 	uint				attr_flags)
 {
-	struct xfs_parent_name_rec	*src_f = NULL;
-	uint				dst_len;
+	struct xfs_parent_name_rec	*src_f;
 
 	/*
 	 * If this is not a parent pointer, just do a bin dump
@@ -828,10 +829,9 @@ xlog_print_trans_attri_name(
 		goto out;
 	}
 
-	dst_len	= ATTR_NVEC_SIZE(sizeof(struct xfs_parent_name_rec));
-	if (dst_len != src_len) {
+	if (alfi_name_len > src_len) {
 		fprintf(stderr, _("%s: bad size of attri name format: %u; expected %u\n"),
-			progname, src_len, dst_len);
+			progname, src_len, alfi_name_len);
 		return 1;
 	}
 
@@ -929,14 +929,12 @@ xlog_recover_print_attri(
 			src_rec = (struct xfs_parent_name_rec *)item->ri_buf[region].i_addr;
 			src_len = item->ri_buf[region].i_len;
 
-			dst_len = ATTR_NVEC_SIZE(sizeof(struct xfs_parent_name_rec));
-
-			if ((rec = ((struct xfs_parent_name_rec *)malloc(dst_len))) == NULL) {
+			if ((rec = calloc(src_len, 1)) == NULL) {
 				fprintf(stderr, _("%s: xlog_recover_print_attri: malloc failed\n"),
 					progname);
 				exit(1);
 			}
-			if (xfs_attri_copy_name_format((char *)src_rec, src_len, rec)) {
+			if (xfs_attri_copy_name_format((char *)src_rec, src_len, f->alfi_name_len, rec)) {
 				goto out;
 			}
 
@@ -962,14 +960,12 @@ xlog_recover_print_attri(
 			src_rec = (struct xfs_parent_name_rec *)item->ri_buf[region].i_addr;
 			src_len = item->ri_buf[region].i_len;
 
-			dst_len = ATTR_NVEC_SIZE(sizeof(struct xfs_parent_name_rec));
-
-			if ((rec = ((struct xfs_parent_name_rec *)malloc(dst_len))) == NULL) {
+			if ((rec = calloc(src_len, 1)) == NULL) {
 				fprintf(stderr, _("%s: xlog_recover_print_attri: malloc failed\n"),
 					progname);
 				exit(1);
 			}
-			if (xfs_attri_copy_name_format((char *)src_rec, src_len, rec)) {
+			if (xfs_attri_copy_name_format((char *)src_rec, src_len, f->alfi_nname_len, rec)) {
 				goto out;
 			}
 
@@ -993,6 +989,7 @@ xlog_recover_print_attri(
 
 		if (f->alfi_attr_filter & XFS_ATTR_PARENT) {
 			src_value = (char *)item->ri_buf[region].i_addr;
+			src_len = item->ri_buf[region].i_len;
 
 			if ((value = ((char *)malloc(f->alfi_value_len))) == NULL) {
 				fprintf(stderr, _("%s: xlog_recover_print_attri: malloc failed\n"),
@@ -1000,6 +997,12 @@ xlog_recover_print_attri(
 				exit(1);
 			}
 
+			if (f->alfi_value_len > src_len) {
+				fprintf(stderr, _("%s: bad size of attri value format: %u; expected %u\n"),
+					progname, src_len, f->alfi_value_len);
+				exit(1);
+			}
+
 			value_ptr = src_value;
 			value_len = f->alfi_value_len;
 
diff --git a/logprint/logprint.h b/logprint/logprint.h
index b8e1c932..12d333d7 100644
--- a/logprint/logprint.h
+++ b/logprint/logprint.h
@@ -59,7 +59,8 @@ extern void xlog_recover_print_bud(struct xlog_recover_item *item);
 #define MAX_ATTR_VAL_PRINT	128
 
 extern int xlog_print_trans_attri(char **ptr, uint src_len, int *i);
-extern int xlog_print_trans_attri_name(char **ptr, uint src_len, uint attr_flags);
+extern int xlog_print_trans_attri_name(char **ptr, uint src_len,
+		uint alfi_name_len, uint attr_flags);
 extern int xlog_print_trans_attri_value(char **ptr, uint src_len, int value_len,
 					uint attr_flags);
 extern void xlog_recover_print_attri(struct xlog_recover_item *item);



* [PATCH 6/6] xfs: use parent pointer xattr space more efficiently
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/6] xfsprogs: encode parent pointer name in xattr key Darrick J. Wong
                     ` (4 preceding siblings ...)
  2023-02-16 21:11   ` [PATCH 5/6] xfs: make the ondisk parent pointer record a flex array Darrick J. Wong
@ 2023-02-16 21:12   ` Darrick J. Wong
  5 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:12 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Amend the parent pointer xattr format even more.  Now we put as much of
the dirent name in the namehash as we can.  For names that don't fit,
the namehash is the truncated dirent name with the sha512 of the entire
name at the end of the namehash.  The EA value is then truncated to
whatever doesn't fit in the namehash.
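
As a worked example of the arithmetic, the sketch below computes how a
dirent name of a given length would be split between the EA name and the
EA value.  It is only an illustration: the demo_* names are hypothetical,
MAXNAMELEN is assumed to be 256, the packed record header is assumed to be
12 bytes, and no sha512 is actually computed.

```c
#include <assert.h>

#define DEMO_MAXNAMELEN		256	/* assumed value of MAXNAMELEN */
#define DEMO_REC_HDR		12	/* packed p_ino (8) + p_gen (4) */
#define DEMO_SHA512_SIZE	64

/* Mirrors of the three macros added to xfs_da_format.h. */
#define DEMO_NAME_MAX_SIZE	(DEMO_MAXNAMELEN - 1)			/* 255 */
#define DEMO_MAX_HASH_SIZE	(DEMO_NAME_MAX_SIZE - DEMO_REC_HDR)	/* 243 */
#define DEMO_SHA512_OFFSET	(DEMO_MAX_HASH_SIZE - DEMO_SHA512_SIZE)	/* 179 */

struct demo_split {
	unsigned int	hashlen;	/* namehash bytes stored in the EA name */
	unsigned int	name_in_key;	/* dirent name bytes stored in the EA name */
	unsigned int	valuelen;	/* dirent name bytes left for the EA value */
};

/* Decide where the bytes of a dirent name of length name_len would go. */
static struct demo_split demo_split_name(unsigned int name_len)
{
	struct demo_split	s;

	if (name_len < DEMO_MAX_HASH_SIZE) {
		/* Short name: it is the whole namehash and the value is empty. */
		s.hashlen = name_len;
		s.name_in_key = name_len;
		s.valuelen = 0;
	} else {
		/* Long name: truncated name, then sha512, remainder in the value. */
		s.hashlen = DEMO_MAX_HASH_SIZE;
		s.name_in_key = DEMO_SHA512_OFFSET;
		s.valuelen = name_len - DEMO_SHA512_OFFSET;
	}
	return s;
}
```

Under these assumptions a 10-byte name is stored entirely in the EA name,
while a 255-byte name puts 179 bytes plus the 64-byte hash in the EA name
and the remaining 76 bytes in the EA value.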

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libxfs/xfs_da_format.h |   26 +++++++++--
 libxfs/xfs_parent.c    |  111 ++++++++++++++++++++++++++++++++++++++----------
 libxfs/xfs_parent.h    |    6 +--
 repair/pptr.c          |    4 +-
 4 files changed, 112 insertions(+), 35 deletions(-)


diff --git a/libxfs/xfs_da_format.h b/libxfs/xfs_da_format.h
index 4d858307..55f510f8 100644
--- a/libxfs/xfs_da_format.h
+++ b/libxfs/xfs_da_format.h
@@ -825,19 +825,24 @@ xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
 				      struct xfs_da3_blkinfo *hdr3);
 
 /* We use sha512 for the parent pointer name hash. */
-#define XFS_PARENT_NAME_HASH_SIZE	(64)
+#define XFS_PARENT_NAME_SHA512_SIZE	(64)
 
 /*
  * Parent pointer attribute format definition
  *
  * The EA name encodes the parent inode number, generation and a collision
- * resistant hash computed from the dirent name.  The hash is defined to be:
+ * resistant hash computed from the dirent name.  The hash is defined to be
+ * one of the following:
  *
- * - The dirent name if it fits within the EA name.
+ * - The dirent name, as long as it does not use the last possible byte of the
+ *   EA name space.
  *
- * - The sha512 of the child inode generation and the dirent name.
+ * - The truncated dirent name, with the sha512 hash of the child inode
+ *   generation number and dirent name.  The hash is written at the end of the
+ *   EA name.
  *
- * The EA value contains the same name as the dirent in the parent directory.
+ * The EA value contains however much of the dirent name that does not fit in
+ * the EA name.
  */
 struct xfs_parent_name_rec {
 	__be64  p_ino;
@@ -845,8 +850,17 @@ struct xfs_parent_name_rec {
 	__u8	p_namehash[];
 } __attribute__((packed));
 
+/* Maximum size of a parent pointer EA name. */
 #define XFS_PARENT_NAME_MAX_SIZE \
-	(sizeof(struct xfs_parent_name_rec) + XFS_PARENT_NAME_HASH_SIZE)
+	(MAXNAMELEN - 1)
+
+/* Maximum size of a parent pointer name hash. */
+#define XFS_PARENT_NAME_MAX_HASH_SIZE \
+	(XFS_PARENT_NAME_MAX_SIZE - sizeof(struct xfs_parent_name_rec))
+
+/* Offset of the sha512 hash, if used. */
+#define XFS_PARENT_NAME_SHA512_OFFSET \
+	(XFS_PARENT_NAME_MAX_HASH_SIZE - XFS_PARENT_NAME_SHA512_SIZE)
 
 static inline unsigned int
 xfs_parent_name_rec_sizeof(
diff --git a/libxfs/xfs_parent.c b/libxfs/xfs_parent.c
index 8886d344..09bd8e3a 100644
--- a/libxfs/xfs_parent.c
+++ b/libxfs/xfs_parent.c
@@ -57,7 +57,7 @@ xfs_parent_namecheck(
 	xfs_ino_t				p_ino;
 
 	if (reclen <= xfs_parent_name_rec_sizeof(0) ||
-	    reclen > xfs_parent_name_rec_sizeof(XFS_PARENT_NAME_HASH_SIZE))
+	    reclen > xfs_parent_name_rec_sizeof(XFS_PARENT_NAME_MAX_HASH_SIZE))
 		return false;
 
 	/* Only one namespace bit allowed. */
@@ -75,10 +75,18 @@ xfs_parent_namecheck(
 bool
 xfs_parent_valuecheck(
 	struct xfs_mount		*mp,
+	size_t				namelen,
 	const void			*value,
 	size_t				valuelen)
 {
-	if (valuelen == 0 || valuelen >= MAXNAMELEN)
+	if (namelen > XFS_PARENT_NAME_MAX_SIZE)
+		return false;
+
+	if (namelen < XFS_PARENT_NAME_MAX_SIZE && valuelen != 0)
+		return false;
+
+	if (namelen == XFS_PARENT_NAME_MAX_SIZE &&
+	    valuelen >= MAXNAMELEN - XFS_PARENT_NAME_SHA512_OFFSET)
 		return false;
 
 	if (value == NULL)
@@ -98,7 +106,20 @@ xfs_init_parent_name_rec(
 	rec->p_ino = cpu_to_be64(dp->i_ino);
 	rec->p_gen = cpu_to_be32(VFS_IC(dp)->i_generation);
 	return xfs_parent_namehash(ip, name, rec->p_namehash,
-			XFS_PARENT_NAME_HASH_SIZE);
+			XFS_PARENT_NAME_MAX_HASH_SIZE);
+}
+
+/* Compute the number of name bytes that can be encoded in the namehash. */
+static inline unsigned int
+xfs_parent_valuelen_adj(
+	int			hashlen)
+{
+	ASSERT(hashlen > 0);
+
+	if (hashlen == XFS_PARENT_NAME_MAX_HASH_SIZE)
+		return XFS_PARENT_NAME_SHA512_OFFSET;
+
+	return hashlen;
 }
 
 /*
@@ -125,14 +146,29 @@ xfs_parent_irec_from_disk(
 		return;
 	}
 
-	ASSERT(valuelen > 0);
 	ASSERT(valuelen < MAXNAMELEN);
 
-	valuelen = min(valuelen, MAXNAMELEN);
+	if (irec->hashlen == XFS_PARENT_NAME_MAX_HASH_SIZE) {
+		ASSERT(valuelen > 0);
+		ASSERT(valuelen <= MAXNAMELEN - XFS_PARENT_NAME_SHA512_OFFSET);
 
-	irec->p_namelen = valuelen;
-	memcpy(irec->p_name, value, valuelen);
-	memset(&irec->p_name[valuelen], 0, sizeof(irec->p_name) - valuelen);
+		valuelen = min_t(int, valuelen,
+				MAXNAMELEN - XFS_PARENT_NAME_SHA512_OFFSET);
+
+		memcpy(irec->p_name, irec->p_namehash,
+				XFS_PARENT_NAME_SHA512_OFFSET);
+		memcpy(&irec->p_name[XFS_PARENT_NAME_SHA512_OFFSET],
+				value, valuelen);
+		irec->p_namelen = XFS_PARENT_NAME_SHA512_OFFSET + valuelen;
+	} else {
+		ASSERT(valuelen == 0);
+
+		memcpy(irec->p_name, irec->p_namehash, irec->hashlen);
+		irec->p_namelen = irec->hashlen;
+	}
+
+	memset(&irec->p_name[irec->p_namelen], 0,
+			sizeof(irec->p_name) - irec->p_namelen);
 }
 
 /*
@@ -157,11 +193,15 @@ xfs_parent_irec_to_disk(
 		ASSERT(*valuelen >= irec->p_namelen);
 		ASSERT(*valuelen < MAXNAMELEN);
 
-		*valuelen = irec->p_namelen;
+		if (irec->hashlen == XFS_PARENT_NAME_MAX_HASH_SIZE)
+			*valuelen = irec->p_namelen - XFS_PARENT_NAME_SHA512_OFFSET;
+		else
+			*valuelen = 0;
 	}
 
-	if (value)
-		memcpy(value, irec->p_name, irec->p_namelen);
+	if (value && irec->hashlen == XFS_PARENT_NAME_MAX_HASH_SIZE)
+		memcpy(value, irec->p_name + XFS_PARENT_NAME_SHA512_OFFSET,
+			      irec->p_namelen - XFS_PARENT_NAME_SHA512_OFFSET);
 }
 
 /*
@@ -214,6 +254,7 @@ xfs_parent_add(
 {
 	struct xfs_da_args	*args = &parent->args;
 	int			hashlen;
+	unsigned int		name_adj;
 
 	hashlen = xfs_init_parent_name_rec(&parent->rec, dp, parent_name,
 			child);
@@ -223,11 +264,13 @@ xfs_parent_add(
 	args->namelen = xfs_parent_name_rec_sizeof(hashlen);
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 
+	name_adj = xfs_parent_valuelen_adj(hashlen);
+
 	args->trans = tp;
 	args->dp = child;
 	if (parent_name) {
-		parent->args.value = (void *)parent_name->name;
-		parent->args.valuelen = parent_name->len;
+		parent->args.value = (void *)parent_name->name + name_adj;
+		parent->args.valuelen = parent_name->len - name_adj;
 	}
 
 	return xfs_attr_defer_add(args);
@@ -269,6 +312,7 @@ xfs_parent_replace(
 {
 	struct xfs_da_args	*args = &new_parent->args;
 	int			old_hashlen, new_hashlen;
+	int			new_name_adj;
 
 	old_hashlen = xfs_init_parent_name_rec(&new_parent->old_rec, old_dp,
 			old_name, child);
@@ -279,6 +323,8 @@ xfs_parent_replace(
 	if (new_hashlen < 0)
 		return new_hashlen;
 
+	new_name_adj = xfs_parent_valuelen_adj(new_hashlen);
+
 	new_parent->args.name = (const uint8_t *)&new_parent->old_rec;
 	new_parent->args.namelen = xfs_parent_name_rec_sizeof(old_hashlen);
 	new_parent->args.new_name = (const uint8_t *)&new_parent->rec;
@@ -286,8 +332,8 @@ xfs_parent_replace(
 	args->trans = tp;
 	args->dp = child;
 
-	new_parent->args.value = (void *)new_name->name;
-	new_parent->args.valuelen = new_name->len;
+	new_parent->args.value = (void *)new_name->name + new_name_adj;
+	new_parent->args.valuelen = new_name->len - new_name_adj;
 
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 	return xfs_attr_defer_replace(args);
@@ -331,10 +377,13 @@ xfs_parent_lookup(
 	struct xfs_parent_scratch	*scr)
 {
 	int				reclen;
+	int				name_adj;
 	int				error;
 
 	xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
 
+	name_adj = xfs_parent_valuelen_adj(pptr->hashlen);
+
 	memset(&scr->args, 0, sizeof(struct xfs_da_args));
 	scr->args.attr_filter	= XFS_ATTR_PARENT;
 	scr->args.dp		= ip;
@@ -343,8 +392,8 @@ xfs_parent_lookup(
 	scr->args.namelen	= reclen;
 	scr->args.op_flags	= XFS_DA_OP_OKNOENT;
 	scr->args.trans		= tp;
-	scr->args.valuelen	= namelen;
-	scr->args.value		= name;
+	scr->args.valuelen	= namelen - name_adj;
+	scr->args.value		= name + name_adj;
 	scr->args.whichfork	= XFS_ATTR_FORK;
 
 	scr->args.hashval = xfs_da_hashname(scr->args.name, scr->args.namelen);
@@ -353,7 +402,8 @@ xfs_parent_lookup(
 	if (error)
 		return error;
 
-	return scr->args.valuelen;
+	memcpy(name, pptr->p_namehash, name_adj);
+	return scr->args.valuelen + name_adj;
 }
 
 /*
@@ -369,17 +419,20 @@ xfs_parent_set(
 	struct xfs_parent_scratch	*scr)
 {
 	int				reclen;
+	int				name_adj;
 
 	xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
 
+	name_adj = xfs_parent_valuelen_adj(pptr->hashlen);
+
 	memset(&scr->args, 0, sizeof(struct xfs_da_args));
 	scr->args.attr_filter	= XFS_ATTR_PARENT;
 	scr->args.dp		= ip;
 	scr->args.geo		= ip->i_mount->m_attr_geo;
 	scr->args.name		= (const unsigned char *)&scr->rec;
 	scr->args.namelen	= reclen;
-	scr->args.valuelen	= pptr->p_namelen;
-	scr->args.value		= (void *)pptr->p_name;
+	scr->args.valuelen	= pptr->p_namelen - name_adj;
+	scr->args.value		= (void *)pptr->p_name + name_adj;
 	scr->args.whichfork	= XFS_ATTR_FORK;
 
 	return xfs_attr_set(&scr->args);
@@ -430,12 +483,16 @@ xfs_parent_namehash(
 	ASSERT(SHA512_DIGEST_SIZE ==
 			crypto_shash_digestsize(ip->i_mount->m_sha512));
 
-	if (namehash_len != SHA512_DIGEST_SIZE) {
+	if (namehash_len != XFS_PARENT_NAME_MAX_HASH_SIZE) {
 		ASSERT(0);
 		return -EINVAL;
 	}
 
-	if (name->len < namehash_len) {
+	if (name->len < XFS_PARENT_NAME_MAX_HASH_SIZE) {
+		/*
+		 * If the dirent name is shorter than the size of the namehash
+		 * field, write it directly into the namehash field.
+		 */
 		memcpy(namehash, name->name, name->len);
 		memset(namehash + name->len, 0, namehash_len - name->len);
 		return name->len;
@@ -453,11 +510,17 @@ xfs_parent_namehash(
 	if (error)
 		goto out;
 
-	error = sha512_done(&shash, namehash);
+	/*
+	 * The sha512 hash of the child gen and dirent name is placed at the
+	 * end of the namehash, and as many bytes as will fit are copied from
+	 * the dirent name to the start of the namehash.
+	 */
+	error = sha512_done(&shash, namehash + XFS_PARENT_NAME_SHA512_OFFSET);
 	if (error)
 		goto out;
 
-	error = SHA512_DIGEST_SIZE;
+	memcpy(namehash, name->name, XFS_PARENT_NAME_SHA512_OFFSET);
+	error = XFS_PARENT_NAME_MAX_HASH_SIZE;
 out:
 	sha512_erase(&shash);
 	return error;
diff --git a/libxfs/xfs_parent.h b/libxfs/xfs_parent.h
index 3431aac7..6f613616 100644
--- a/libxfs/xfs_parent.h
+++ b/libxfs/xfs_parent.h
@@ -12,8 +12,8 @@ extern struct kmem_cache	*xfs_parent_intent_cache;
 bool xfs_parent_namecheck(struct xfs_mount *mp,
 		const struct xfs_parent_name_rec *rec, size_t reclen,
 		unsigned int attr_flags);
-bool xfs_parent_valuecheck(struct xfs_mount *mp, const void *value,
-		size_t valuelen);
+bool xfs_parent_valuecheck(struct xfs_mount *mp, size_t namelen,
+		const void *value, size_t valuelen);
 
 /*
  * Incore version of a parent pointer, also contains dirent name so callers
@@ -24,7 +24,7 @@ struct xfs_parent_name_irec {
 	xfs_ino_t		p_ino;
 	uint32_t		p_gen;
 	uint8_t			hashlen;
-	uint8_t			p_namehash[XFS_PARENT_NAME_HASH_SIZE];
+	uint8_t			p_namehash[XFS_PARENT_NAME_MAX_HASH_SIZE];
 
 	/* Attributes of a parent pointer. */
 	uint8_t			p_namelen;
diff --git a/repair/pptr.c b/repair/pptr.c
index 12382ad7..67131981 100644
--- a/repair/pptr.c
+++ b/repair/pptr.c
@@ -511,7 +511,7 @@ examine_xattr(
 	struct file_pptr	file_pptr = { };
 	struct xfs_parent_name_irec irec;
 	struct xfs_name		xname;
-	uint8_t			namehash[XFS_PARENT_NAME_HASH_SIZE];
+	uint8_t			namehash[XFS_PARENT_NAME_MAX_HASH_SIZE];
 	struct xfs_mount	*mp = ip->i_mount;
 	struct file_scan	*fscan = priv;
 	const struct xfs_parent_name_rec *rec = (const void *)name;
@@ -528,7 +528,7 @@ examine_xattr(
 
 	/* Does the ondisk parent pointer structure make sense? */
 	if (!xfs_parent_namecheck(mp, rec, namelen, attr_flags) ||
-	    !xfs_parent_valuecheck(mp, value, valuelen))
+	    !xfs_parent_valuecheck(mp, namelen, value, valuelen))
 		goto corrupt;
 
 	libxfs_parent_irec_from_disk(&irec, rec, namelen, value, valuelen);



* [PATCH 1/3] xfs: rename xfs_pptr_info to xfs_getparents
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/3] xfsprogs: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
@ 2023-02-16 21:12   ` Darrick J. Wong
  2023-02-16 21:12   ` [PATCH 2/3] xfs: rename xfs_parent_ptr Darrick J. Wong
  2023-02-16 21:12   ` [PATCH 3/3] xfs: convert GETPARENTS structures to flex arrays Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:12 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Rename the head structure of the parent pointer ioctl to match the name
of the ioctl (XFS_IOC_GETPARENTS).
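
For readers unfamiliar with the calling convention behind the renamed
structure, the sketch below mimics the "call until DONE" cursor loop that
handle_walk_parents() uses.  It does not call the real ioctl: demo_fetch()
is a hypothetical stand-in that pretends a file has a fixed number of
parents, and the demo_* structure is a simplified assumption rather than
the real struct xfs_getparents.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_OFLAG_DONE	(1U << 2)	/* same bit as the DONE output flag */

struct demo_rec {
	unsigned long long	ino;
};

/* Simplified head structure: a cursor, flags, counts, then the records. */
struct demo_head {
	unsigned int		cursor;		/* opaque to callers */
	unsigned int		flags;
	unsigned int		ptrs_size;	/* capacity of recs[] */
	unsigned int		ptrs_used;	/* entries filled by last call */
	struct demo_rec		recs[];
};

static size_t demo_head_sizeof(unsigned int nr_ptrs)
{
	return sizeof(struct demo_head) + nr_ptrs * sizeof(struct demo_rec);
}

/* Hypothetical ioctl stand-in: hand out up to ptrs_size of total records. */
static int demo_fetch(struct demo_head *h, unsigned int total)
{
	unsigned int	n = 0;

	while (n < h->ptrs_size && h->cursor < total) {
		h->recs[n].ino = 100 + h->cursor;
		n++;
		h->cursor++;
	}
	h->ptrs_used = n;
	if (h->cursor >= total)
		h->flags |= DEMO_OFLAG_DONE;
	return 0;
}

/* Walk all records in batches; returns how many were seen, or -1 on ENOMEM. */
static int demo_walk(unsigned int total, unsigned int batch)
{
	struct demo_head	*h = calloc(1, demo_head_sizeof(batch));
	unsigned int		i;
	int			seen = 0;

	if (!h)
		return -1;
	h->ptrs_size = batch;

	while (demo_fetch(h, total) == 0) {
		for (i = 0; i < h->ptrs_used; i++)
			seen++;
		if (h->flags & DEMO_OFLAG_DONE)
			break;
	}
	free(h);
	return seen;
}
```

The header is zeroed before the first call and the cursor is left alone
afterwards, matching the comment on the cursor field in xfs_fs.h.  The
batch size must be nonzero, otherwise this mock never makes progress.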

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 io/parent.c       |    4 ++--
 libfrog/pptrs.c   |   28 ++++++++++++++--------------
 libfrog/pptrs.h   |    2 +-
 libxfs/xfs_fs.h   |   51 ++++++++++++++++++++++++++-------------------------
 man/man3/xfsctl.3 |   16 ++++++++--------
 5 files changed, 51 insertions(+), 50 deletions(-)


diff --git a/io/parent.c b/io/parent.c
index 36522f26..1c1453f2 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -24,14 +24,14 @@ struct pptr_args {
 
 static int
 pptr_print(
-	struct xfs_pptr_info	*pi,
+	struct xfs_getparents	*pi,
 	struct xfs_parent_ptr	*pptr,
 	void			*arg)
 {
 	struct pptr_args	*args = arg;
 	unsigned int		namelen;
 
-	if (pi->pi_flags & XFS_PPTR_OFLAG_ROOT) {
+	if (pi->gp_flags & XFS_GETPARENTS_OFLAG_ROOT) {
 		printf(_("Root directory.\n"));
 		return 0;
 	}
diff --git a/libfrog/pptrs.c b/libfrog/pptrs.c
index 61fd1fb9..3bb441f0 100644
--- a/libfrog/pptrs.c
+++ b/libfrog/pptrs.c
@@ -12,17 +12,17 @@
 #include "libfrog/pptrs.h"
 
 /* Allocate a buffer large enough for some parent pointer records. */
-static inline struct xfs_pptr_info *
+static inline struct xfs_getparents *
 alloc_pptr_buf(
 	size_t			nr_ptrs)
 {
-	struct xfs_pptr_info	*pi;
+	struct xfs_getparents	*pi;
 
-	pi = malloc(xfs_pptr_info_sizeof(nr_ptrs));
+	pi = malloc(xfs_getparents_sizeof(nr_ptrs));
 	if (!pi)
 		return NULL;
-	memset(pi, 0, sizeof(struct xfs_pptr_info));
-	pi->pi_ptrs_size = nr_ptrs;
+	memset(pi, 0, sizeof(struct xfs_getparents));
+	pi->gp_ptrs_size = nr_ptrs;
 	return pi;
 }
 
@@ -37,7 +37,7 @@ handle_walk_parents(
 	walk_pptr_fn		fn,
 	void			*arg)
 {
-	struct xfs_pptr_info	*pi;
+	struct xfs_getparents	*pi;
 	struct xfs_parent_ptr	*p;
 	unsigned int		i;
 	ssize_t			ret = -1;
@@ -47,25 +47,25 @@ handle_walk_parents(
 		return errno;
 
 	if (handle) {
-		memcpy(&pi->pi_handle, handle, sizeof(struct xfs_handle));
-		pi->pi_flags = XFS_PPTR_IFLAG_HANDLE;
+		memcpy(&pi->gp_handle, handle, sizeof(struct xfs_handle));
+		pi->gp_flags = XFS_GETPARENTS_IFLAG_HANDLE;
 	}
 
 	ret = ioctl(fd, XFS_IOC_GETPARENTS, pi);
 	while (!ret) {
-		if (pi->pi_flags & XFS_PPTR_OFLAG_ROOT) {
+		if (pi->gp_flags & XFS_GETPARENTS_OFLAG_ROOT) {
 			ret = fn(pi, NULL, arg);
 			goto out_pi;
 		}
 
-		for (i = 0; i < pi->pi_ptrs_used; i++) {
-			p = xfs_ppinfo_to_pp(pi, i);
+		for (i = 0; i < pi->gp_ptrs_used; i++) {
+			p = xfs_getparents_rec(pi, i);
 			ret = fn(pi, p, arg);
 			if (ret)
 				goto out_pi;
 		}
 
-		if (pi->pi_flags & XFS_PPTR_OFLAG_DONE)
+		if (pi->gp_flags & XFS_GETPARENTS_OFLAG_DONE)
 			break;
 
 		ret = ioctl(fd, XFS_IOC_GETPARENTS, pi);
@@ -128,7 +128,7 @@ static int handle_walk_parent_paths(struct walk_ppaths_info *wpi,
 
 static int
 handle_walk_parent_path_ptr(
-	struct xfs_pptr_info		*pi,
+	struct xfs_getparents		*pi,
 	struct xfs_parent_ptr		*p,
 	void				*arg)
 {
@@ -136,7 +136,7 @@ handle_walk_parent_path_ptr(
 	struct walk_ppaths_info		*wpi = wpli->wpi;
 	int				ret = 0;
 
-	if (pi->pi_flags & XFS_PPTR_OFLAG_ROOT)
+	if (pi->gp_flags & XFS_GETPARENTS_OFLAG_ROOT)
 		return wpi->fn(wpi->mntpt, wpi->path, wpi->arg);
 
 	ret = path_component_change(wpli->pc, p->xpp_name,
diff --git a/libfrog/pptrs.h b/libfrog/pptrs.h
index 1666de06..ab1d0f2f 100644
--- a/libfrog/pptrs.h
+++ b/libfrog/pptrs.h
@@ -8,7 +8,7 @@
 
 struct path_list;
 
-typedef int (*walk_pptr_fn)(struct xfs_pptr_info *pi,
+typedef int (*walk_pptr_fn)(struct xfs_getparents *pi,
 		struct xfs_parent_ptr *pptr, void *arg);
 typedef int (*walk_ppath_fn)(const char *mntpt, struct path_list *path,
 		void *arg);
diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index c65345d2..2a23c010 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -752,19 +752,20 @@ struct xfs_scrub_metadata {
 				 XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED)
 #define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
 
-#define XFS_PPTR_MAXNAMELEN				256
+#define XFS_GETPARENTS_MAXNAMELEN	256
 
 /* return parents of the handle, not the open fd */
-#define XFS_PPTR_IFLAG_HANDLE  (1U << 0)
+#define XFS_GETPARENTS_IFLAG_HANDLE	(1U << 0)
 
 /* target was the root directory */
-#define XFS_PPTR_OFLAG_ROOT    (1U << 1)
+#define XFS_GETPARENTS_OFLAG_ROOT	(1U << 1)
 
 /* Cursor is done iterating pptrs */
-#define XFS_PPTR_OFLAG_DONE    (1U << 2)
+#define XFS_GETPARENTS_OFLAG_DONE	(1U << 2)
 
- #define XFS_PPTR_FLAG_ALL     (XFS_PPTR_IFLAG_HANDLE | XFS_PPTR_OFLAG_ROOT | \
-				XFS_PPTR_OFLAG_DONE)
+#define XFS_GETPARENTS_FLAG_ALL		(XFS_GETPARENTS_IFLAG_HANDLE | \
+					 XFS_GETPARENTS_OFLAG_ROOT | \
+					 XFS_GETPARENTS_OFLAG_DONE)
 
 /* Get an inode parent pointer through ioctl */
 struct xfs_parent_ptr {
@@ -772,57 +773,57 @@ struct xfs_parent_ptr {
 	__u32		xpp_gen;			/* Inode generation */
 	__u32		xpp_rsvd;			/* Reserved */
 	__u64		xpp_rsvd2;			/* Reserved */
-	__u8		xpp_name[XFS_PPTR_MAXNAMELEN];	/* File name */
+	__u8		xpp_name[XFS_GETPARENTS_MAXNAMELEN];	/* File name */
 };
 
 /* Iterate through an inodes parent pointers */
-struct xfs_pptr_info {
-	/* File handle, if XFS_PPTR_IFLAG_HANDLE is set */
-	struct xfs_handle		pi_handle;
+struct xfs_getparents {
+	/* File handle, if XFS_GETPARENTS_IFLAG_HANDLE is set */
+	struct xfs_handle		gp_handle;
 
 	/*
 	 * Structure to track progress in iterating the parent pointers.
 	 * Must be initialized to zeroes before the first ioctl call, and
 	 * not touched by callers after that.
 	 */
-	struct xfs_attrlist_cursor	pi_cursor;
+	struct xfs_attrlist_cursor	gp_cursor;
 
-	/* Operational flags: XFS_PPTR_*FLAG* */
-	__u32				pi_flags;
+	/* Operational flags: XFS_GETPARENTS_*FLAG* */
+	__u32				gp_flags;
 
 	/* Must be set to zero */
-	__u32				pi_reserved;
+	__u32				gp_reserved;
 
 	/* # of entries in array */
-	__u32				pi_ptrs_size;
+	__u32				gp_ptrs_size;
 
 	/* # of entries filled in (output) */
-	__u32				pi_ptrs_used;
+	__u32				gp_ptrs_used;
 
 	/* Must be set to zero */
-	__u64				pi_reserved2[6];
+	__u64				gp_reserved2[6];
 
 	/*
 	 * An array of struct xfs_parent_ptr follows the header
-	 * information. Use xfs_ppinfo_to_pp() to access the
+	 * information. Use xfs_getparents_rec() to access the
 	 * parent pointer array entries.
 	 */
-	struct xfs_parent_ptr		pi_parents[];
+	struct xfs_parent_ptr		gp_parents[];
 };
 
 static inline size_t
-xfs_pptr_info_sizeof(int nr_ptrs)
+xfs_getparents_sizeof(int nr_ptrs)
 {
-	return sizeof(struct xfs_pptr_info) +
+	return sizeof(struct xfs_getparents) +
 	       (nr_ptrs * sizeof(struct xfs_parent_ptr));
 }
 
 static inline struct xfs_parent_ptr*
-xfs_ppinfo_to_pp(
-	struct xfs_pptr_info	*info,
-	int			idx)
+xfs_getparents_rec(
+	struct xfs_getparents	*info,
+	unsigned int		idx)
 {
-	return &info->pi_parents[idx];
+	return &info->gp_parents[idx];
 }
 
 /*
diff --git a/man/man3/xfsctl.3 b/man/man3/xfsctl.3
index 42ba3bba..0bcf8886 100644
--- a/man/man3/xfsctl.3
+++ b/man/man3/xfsctl.3
@@ -326,12 +326,12 @@ XFS_IOC_FSSETDM_BY_HANDLE is not supported as of Linux 5.5.
 .B XFS_IOC_GETPARENTS
 This command is used to get a files parent pointers.  Parent pointers are
 file attributes used to store meta data information about an inodes parent.
-This command takes a xfs_pptr_info structure with trailing array of
+This command takes a xfs_getparents structure with trailing array of
 struct xfs_parent_ptr as an input to store an inodes parents. The
-xfs_pptr_info_sizeof() and xfs_ppinfo_to_pp() routines are provided to
+xfs_getparents_sizeof() and xfs_getparents_rec() routines are provided to
 create and iterate through these structures.  The number of pointers stored
-in the array is indicated by the xfs_pptr_info.used field, and the
-XFS_PPTR_OFLAG_DONE flag will be set in xfs_pptr_info.flags when there are
+in the array is indicated by the xfs_getparents.used field, and the
+XFS_GETPARENTS_OFLAG_DONE flag will be set in xfs_getparents.flags when there are
 no more parent pointers to be read.  The below code is an example
 of XFS_IOC_GETPARENTS usage:
 
@@ -345,13 +345,13 @@ of XFS_IOC_GETPARENTS usage:
 #include<xfs/xfs_fs.h>
 
 int main() {
-	struct xfs_pptr_info	*pi;
+	struct xfs_getparents	*pi;
 	struct xfs_parent_ptr	*p;
 	int			i, error, fd, nr_ptrs = 4;
 
-	unsigned char buffer[xfs_pptr_info_sizeof(nr_ptrs)];
+	unsigned char buffer[xfs_getparents_sizeof(nr_ptrs)];
 	memset(buffer, 0, sizeof(buffer));
-	pi = (struct xfs_pptr_info *)&buffer;
+	pi = (struct xfs_getparents *)&buffer;
 	pi->pi_ptrs_size = nr_ptrs;
 
 	fd = open("/mnt/test/foo.txt", O_RDONLY | O_CREAT);
@@ -364,7 +364,7 @@ int main() {
 			return error;
 
 		for (i = 0; i < pi->pi_ptrs_used; i++) {
-			p = xfs_ppinfo_to_pp(pi, i);
+			p = xfs_getparents_rec(pi, i);
 			printf("inode		= %llu\\n", (unsigned long long)p->xpp_ino);
 			printf("generation	= %u\\n", (unsigned int)p->xpp_gen);
 			printf("name		= \\"%s\\"\\n\\n", (char *)p->xpp_name);



* [PATCH 2/3] xfs: rename xfs_parent_ptr
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/3] xfsprogs: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
  2023-02-16 21:12   ` [PATCH 1/3] xfs: rename xfs_pptr_info to xfs_getparents Darrick J. Wong
@ 2023-02-16 21:12   ` Darrick J. Wong
  2023-02-16 21:12   ` [PATCH 3/3] xfs: convert GETPARENTS structures to flex arrays Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:12 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Change the name to xfs_getparents_rec so that the name matches the head
structure.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 io/parent.c     |   18 +++++++++---------
 libfrog/pptrs.c |   12 ++++++------
 libfrog/pptrs.h |    2 +-
 libxfs/xfs_fs.h |   22 +++++++++++-----------
 4 files changed, 27 insertions(+), 27 deletions(-)


diff --git a/io/parent.c b/io/parent.c
index 1c1453f2..162c3169 100644
--- a/io/parent.c
+++ b/io/parent.c
@@ -25,7 +25,7 @@ struct pptr_args {
 static int
 pptr_print(
 	struct xfs_getparents	*pi,
-	struct xfs_parent_ptr	*pptr,
+	struct xfs_getparents_rec *pptr,
 	void			*arg)
 {
 	struct pptr_args	*args = arg;
@@ -36,21 +36,21 @@ pptr_print(
 		return 0;
 	}
 
-	if (args->filter_ino && pptr->xpp_ino != args->filter_ino)
+	if (args->filter_ino && pptr->gpr_ino != args->filter_ino)
 		return 0;
-	if (args->filter_name && strcmp(args->filter_name, pptr->xpp_name))
+	if (args->filter_name && strcmp(args->filter_name, pptr->gpr_name))
 		return 0;
 
-	namelen = strlen(pptr->xpp_name);
+	namelen = strlen(pptr->gpr_name);
 	if (args->shortformat) {
 		printf("%llu/%u/%u/%s\n",
-			(unsigned long long)pptr->xpp_ino,
-			(unsigned int)pptr->xpp_gen, namelen, pptr->xpp_name);
+			(unsigned long long)pptr->gpr_ino,
+			(unsigned int)pptr->gpr_gen, namelen, pptr->gpr_name);
 	} else {
-		printf(_("p_ino    = %llu\n"), (unsigned long long)pptr->xpp_ino);
-		printf(_("p_gen    = %u\n"), (unsigned int)pptr->xpp_gen);
+		printf(_("p_ino    = %llu\n"), (unsigned long long)pptr->gpr_ino);
+		printf(_("p_gen    = %u\n"), (unsigned int)pptr->gpr_gen);
 		printf(_("p_reclen = %u\n"), namelen);
-		printf(_("p_name   = \"%s\"\n\n"), pptr->xpp_name);
+		printf(_("p_name   = \"%s\"\n\n"), pptr->gpr_name);
 	}
 	return 0;
 }
diff --git a/libfrog/pptrs.c b/libfrog/pptrs.c
index 3bb441f0..48a09f69 100644
--- a/libfrog/pptrs.c
+++ b/libfrog/pptrs.c
@@ -38,7 +38,7 @@ handle_walk_parents(
 	void			*arg)
 {
 	struct xfs_getparents	*pi;
-	struct xfs_parent_ptr	*p;
+	struct xfs_getparents_rec	*p;
 	unsigned int		i;
 	ssize_t			ret = -1;
 
@@ -129,7 +129,7 @@ static int handle_walk_parent_paths(struct walk_ppaths_info *wpi,
 static int
 handle_walk_parent_path_ptr(
 	struct xfs_getparents		*pi,
-	struct xfs_parent_ptr		*p,
+	struct xfs_getparents_rec	*p,
 	void				*arg)
 {
 	struct walk_ppath_level_info	*wpli = arg;
@@ -139,13 +139,13 @@ handle_walk_parent_path_ptr(
 	if (pi->gp_flags & XFS_GETPARENTS_OFLAG_ROOT)
 		return wpi->fn(wpi->mntpt, wpi->path, wpi->arg);
 
-	ret = path_component_change(wpli->pc, p->xpp_name,
-				strlen((char *)p->xpp_name), p->xpp_ino);
+	ret = path_component_change(wpli->pc, p->gpr_name,
+				strlen((char *)p->gpr_name), p->gpr_ino);
 	if (ret)
 		return ret;
 
-	wpli->newhandle.ha_fid.fid_ino = p->xpp_ino;
-	wpli->newhandle.ha_fid.fid_gen = p->xpp_gen;
+	wpli->newhandle.ha_fid.fid_ino = p->gpr_ino;
+	wpli->newhandle.ha_fid.fid_gen = p->gpr_gen;
 
 	path_list_add_parent_component(wpi->path, wpli->pc);
 	ret = handle_walk_parent_paths(wpi, &wpli->newhandle);
diff --git a/libfrog/pptrs.h b/libfrog/pptrs.h
index ab1d0f2f..05aaea60 100644
--- a/libfrog/pptrs.h
+++ b/libfrog/pptrs.h
@@ -9,7 +9,7 @@
 struct path_list;
 
 typedef int (*walk_pptr_fn)(struct xfs_getparents *pi,
-		struct xfs_parent_ptr *pptr, void *arg);
+		struct xfs_getparents_rec *pptr, void *arg);
 typedef int (*walk_ppath_fn)(const char *mntpt, struct path_list *path,
 		void *arg);
 
diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index 2a23c010..ec6fdf78 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -768,12 +768,12 @@ struct xfs_scrub_metadata {
 					 XFS_GETPARENTS_OFLAG_DONE)
 
 /* Get an inode parent pointer through ioctl */
-struct xfs_parent_ptr {
-	__u64		xpp_ino;			/* Inode */
-	__u32		xpp_gen;			/* Inode generation */
-	__u32		xpp_rsvd;			/* Reserved */
-	__u64		xpp_rsvd2;			/* Reserved */
-	__u8		xpp_name[XFS_GETPARENTS_MAXNAMELEN];	/* File name */
+struct xfs_getparents_rec {
+	__u64		gpr_ino;			/* Inode */
+	__u32		gpr_gen;			/* Inode generation */
+	__u32		gpr_rsvd;			/* Reserved */
+	__u64		gpr_rsvd2;			/* Reserved */
+	__u8		gpr_name[XFS_GETPARENTS_MAXNAMELEN];	/* File name */
 };
 
 /* Iterate through an inodes parent pointers */
@@ -804,21 +804,21 @@ struct xfs_getparents {
 	__u64				gp_reserved2[6];
 
 	/*
-	 * An array of struct xfs_parent_ptr follows the header
+	 * An array of struct xfs_getparents_rec follows the header
 	 * information. Use xfs_getparents_rec() to access the
 	 * parent pointer array entries.
 	 */
-	struct xfs_parent_ptr		gp_parents[];
+	struct xfs_getparents_rec		gp_parents[];
 };
 
 static inline size_t
 xfs_getparents_sizeof(int nr_ptrs)
 {
 	return sizeof(struct xfs_getparents) +
-	       (nr_ptrs * sizeof(struct xfs_parent_ptr));
+	       (nr_ptrs * sizeof(struct xfs_getparents_rec));
 }
 
-static inline struct xfs_parent_ptr*
+static inline struct xfs_getparents_rec*
 xfs_getparents_rec(
 	struct xfs_getparents	*info,
 	unsigned int		idx)
@@ -871,7 +871,7 @@ xfs_getparents_rec(
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
 #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
 #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
-#define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_parent_ptr)
+#define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_getparents_rec)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s



* [PATCH 3/3] xfs: convert GETPARENTS structures to flex arrays
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/3] xfsprogs: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
  2023-02-16 21:12   ` [PATCH 1/3] xfs: rename xfs_pptr_info to xfs_getparents Darrick J. Wong
  2023-02-16 21:12   ` [PATCH 2/3] xfs: rename xfs_parent_ptr Darrick J. Wong
@ 2023-02-16 21:12   ` Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:12 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

The current definition of the GETPARENTS ioctl doesn't use the buffer
space terribly efficiently because each parent pointer record struct
incorporates enough space to hold the maximally sized dirent name.  Most
dirent names are much less than 255 bytes long, which means we're
wasting a lot of space.

Convert the xfs_getparents_rec structure to use a flex array to store
the dirent name as a null terminated string, which allows us to pack the
information much more densely.  For this to work, augment the
xfs_getparents struct to end with a flex array of buffer offsets to each
xfs_getparents_rec object, much as we do for the attrlist multi ioctl.
Record objects are allocated from the end of the buffer towards the
head.

Reduce the amount of data that we copy to userspace to the head array
containing the offsets, and however much of the buffer's end is used for
the parent records.
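
The packing scheme described above can be sketched in userspace as
follows.  This is only an illustration under stated assumptions, not the
kernel implementation: struct demo_rec, struct demo_head, demo_alloc(),
demo_add() and demo_rec_at() are hypothetical stand-ins, the header is
trimmed to the two fields the walk needs, and the 8-byte record alignment
is an assumption of the sketch rather than something this patch states.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for xfs_getparents_rec / xfs_getparents. */
struct demo_rec {
	uint64_t	ino;
	uint32_t	gen;
	uint32_t	rsvd;
	uint64_t	rsvd2;
	char		name[];		/* file name and null terminator */
};

struct demo_head {
	uint32_t	bufsize;	/* whole buffer in bytes, header included */
	uint32_t	count;		/* number of records filled in */
	uint32_t	offsets[];	/* byte offset of each record in the buffer */
};

/* Same pointer arithmetic as the new xfs_getparents_rec() helper. */
struct demo_rec *demo_rec_at(struct demo_head *h, unsigned int idx)
{
	return (struct demo_rec *)((char *)h + h->offsets[idx]);
}

/* Allocate a zeroed buffer; bufsize must be a multiple of 8 (demo assumption). */
struct demo_head *demo_alloc(uint32_t bufsize)
{
	struct demo_head *h;

	assert(bufsize % 8 == 0 && bufsize >= sizeof(*h));
	h = calloc(bufsize, 1);
	if (h)
		h->bufsize = bufsize;
	return h;
}

/*
 * Append one record, carving space from the end of the buffer towards the
 * head.  *tail is the lowest byte offset already used by a record; callers
 * start it at bufsize.  Returns 0 on success, or -1 if the new record would
 * collide with the offsets array growing from the head.
 */
int demo_add(struct demo_head *h, uint32_t *tail, uint64_t ino, uint32_t gen,
	     const char *name)
{
	size_t reclen = sizeof(struct demo_rec) + strlen(name) + 1;
	size_t head_end = sizeof(*h) + (h->count + 1) * sizeof(h->offsets[0]);
	struct demo_rec *r;

	reclen = (reclen + 7) & ~(size_t)7;	/* keep records 8-byte aligned */
	if (reclen > *tail || *tail - reclen < head_end)
		return -1;

	*tail -= (uint32_t)reclen;
	h->offsets[h->count] = *tail;
	r = demo_rec_at(h, h->count);
	h->count++;
	r->ino = ino;
	r->gen = gen;
	strcpy(r->name, name);
	return 0;
}
```

In this sketch a record for a short name costs 32 bytes plus a 4-byte
offset slot, whereas xfs/122.out elsewhere in this thread lists the old
fixed-size struct xfs_parent_ptr at 280 bytes per entry.  A reader walks
the result the same way handle_walk_parents() does: loop from 0 to count
and resolve each index through the offsets array.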

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 libfrog/pptrs.c |   12 ++++++------
 libxfs/xfs_fs.h |   38 ++++++++++++++------------------------
 2 files changed, 20 insertions(+), 30 deletions(-)


diff --git a/libfrog/pptrs.c b/libfrog/pptrs.c
index 48a09f69..67fd40c3 100644
--- a/libfrog/pptrs.c
+++ b/libfrog/pptrs.c
@@ -14,15 +14,15 @@
 /* Allocate a buffer large enough for some parent pointer records. */
 static inline struct xfs_getparents *
 alloc_pptr_buf(
-	size_t			nr_ptrs)
+	size_t			bufsize)
 {
 	struct xfs_getparents	*pi;
 
-	pi = malloc(xfs_getparents_sizeof(nr_ptrs));
+	pi = calloc(bufsize, 1);
 	if (!pi)
 		return NULL;
-	memset(pi, 0, sizeof(struct xfs_getparents));
-	pi->gp_ptrs_size = nr_ptrs;
+
+	pi->gp_bufsize = bufsize;
 	return pi;
 }
 
@@ -42,7 +42,7 @@ handle_walk_parents(
 	unsigned int		i;
 	ssize_t			ret = -1;
 
-	pi = alloc_pptr_buf(4);
+	pi = alloc_pptr_buf(XFS_XATTR_LIST_MAX);
 	if (!pi)
 		return errno;
 
@@ -58,7 +58,7 @@ handle_walk_parents(
 			goto out_pi;
 		}
 
-		for (i = 0; i < pi->gp_ptrs_used; i++) {
+		for (i = 0; i < pi->gp_count; i++) {
 			p = xfs_getparents_rec(pi, i);
 			ret = fn(pi, p, arg);
 			if (ret)
diff --git a/libxfs/xfs_fs.h b/libxfs/xfs_fs.h
index ec6fdf78..c8be1493 100644
--- a/libxfs/xfs_fs.h
+++ b/libxfs/xfs_fs.h
@@ -769,11 +769,11 @@ struct xfs_scrub_metadata {
 
 /* Get an inode parent pointer through ioctl */
 struct xfs_getparents_rec {
-	__u64		gpr_ino;			/* Inode */
-	__u32		gpr_gen;			/* Inode generation */
-	__u32		gpr_rsvd;			/* Reserved */
-	__u64		gpr_rsvd2;			/* Reserved */
-	__u8		gpr_name[XFS_GETPARENTS_MAXNAMELEN];	/* File name */
+	__u64		gpr_ino;	/* Inode */
+	__u32		gpr_gen;	/* Inode generation */
+	__u32		gpr_rsvd;	/* Reserved */
+	__u64		gpr_rsvd2;	/* Reserved */
+	__u8		gpr_name[];	/* File name and null terminator */
 };
 
 /* Iterate through an inodes parent pointers */
@@ -794,36 +794,26 @@ struct xfs_getparents {
 	/* Must be set to zero */
 	__u32				gp_reserved;
 
-	/* # of entries in array */
-	__u32				gp_ptrs_size;
+	/* size of the memory buffer in bytes, including this header */
+	__u32				gp_bufsize;
 
 	/* # of entries filled in (output) */
-	__u32				gp_ptrs_used;
+	__u32				gp_count;
 
 	/* Must be set to zero */
-	__u64				gp_reserved2[6];
+	__u64				gp_reserved2[5];
 
-	/*
-	 * An array of struct xfs_getparents_rec follows the header
-	 * information. Use xfs_getparents_rec() to access the
-	 * parent pointer array entries.
-	 */
-	struct xfs_getparents_rec		gp_parents[];
+	/* Byte offset of each xfs_getparents_rec object within the buffer. */
+	__u32				gp_offsets[];
 };
 
-static inline size_t
-xfs_getparents_sizeof(int nr_ptrs)
-{
-	return sizeof(struct xfs_getparents) +
-	       (nr_ptrs * sizeof(struct xfs_getparents_rec));
-}
-
 static inline struct xfs_getparents_rec*
 xfs_getparents_rec(
 	struct xfs_getparents	*info,
 	unsigned int		idx)
 {
-	return &info->gp_parents[idx];
+	return (struct xfs_getparents_rec *)((char *)info +
+					     info->gp_offsets[idx]);
 }
 
 /*
@@ -871,7 +861,7 @@ xfs_getparents_rec(
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
 #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
 #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
-#define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_getparents_rec)
+#define XFS_IOC_GETPARENTS	_IOWR('X', 62, struct xfs_getparents)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s



* [PATCH 1/3] mkfs: enable large extent counts by default
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/3] xfsprogs: turn on all available features Darrick J. Wong
@ 2023-02-16 21:13   ` Darrick J. Wong
  2023-02-16 21:13   ` [PATCH 2/3] mkfs: enable reverse mapping " Darrick J. Wong
  2023-02-16 21:13   ` [PATCH 3/3] mkfs: enable parent pointers " Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:13 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Format filesystems with the large extent counter feature turned on.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man8/mkfs.xfs.8.in |    7 ++++---
 mkfs/xfs_mkfs.c        |    2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)


diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index 211e7b0c..4c379549 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -645,9 +645,10 @@ free space conditions.
 .TP
 .BI nrext64[= value]
 Extend maximum values of inode data and attr fork extent counters from 2^31 -
-1 and 2^15 - 1 to 2^48 - 1 and 2^32 - 1 respectively. If the value is
-omitted, 1 is assumed. This feature is disabled by default. This feature is
-only available for filesystems formatted with -m crc=1.
+1 and 2^15 - 1 to 2^48 - 1 and 2^32 - 1 respectively.
+If the value is omitted, 1 is assumed.
+This feature will be enabled when possible.
+This feature is only available for filesystems formatted with -m crc=1.
 .TP
 .RE
 .PP
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index d3f34ef8..f355e416 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -4092,7 +4092,7 @@ main(
 			.nodalign = false,
 			.nortalign = false,
 			.bigtime = true,
-			.nrext64 = false,
+			.nrext64 = true,
 			/*
 			 * When we decide to enable a new feature by default,
 			 * please remember to update the mkfs conf files.



* [PATCH 2/3] mkfs: enable reverse mapping by default
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/3] xfsprogs: turn on all available features Darrick J. Wong
  2023-02-16 21:13   ` [PATCH 1/3] mkfs: enable large extent counts by default Darrick J. Wong
@ 2023-02-16 21:13   ` Darrick J. Wong
  2023-02-16 21:13   ` [PATCH 3/3] mkfs: enable parent pointers " Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:13 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Now that the scrub part of online fsck is feature complete (scrub and
health reporting are done) there's actually a compelling story for
having the reverse mappings enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 man/man8/mkfs.xfs.8.in |    4 ++--
 mkfs/xfs_mkfs.c        |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)


diff --git a/man/man8/mkfs.xfs.8.in b/man/man8/mkfs.xfs.8.in
index 4c379549..9ce8373d 100644
--- a/man/man8/mkfs.xfs.8.in
+++ b/man/man8/mkfs.xfs.8.in
@@ -289,8 +289,8 @@ pinpoint exactly which data has been lost when a disk error occurs.
 .IP
 By default,
 .B mkfs.xfs
-will not create reverse mapping btrees.  This feature is only available
-for filesystems created with the (default)
+will create reverse mapping btrees when possible.
+This feature is only available for filesystems created with the (default)
 .B \-m crc=1
 option set. When the option
 .B \-m crc=0
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index f355e416..325f8617 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -4085,7 +4085,7 @@ main(
 			.dirftype = true,
 			.finobt = true,
 			.spinodes = true,
-			.rmapbt = false,
+			.rmapbt = true,
 			.reflink = true,
 			.inobtcnt = true,
 			.parent_pointers = false,



* [PATCH 3/3] mkfs: enable parent pointers by default
  2023-02-16 20:31 ` [PATCHSET v9r2d1 0/3] xfsprogs: turn on all available features Darrick J. Wong
  2023-02-16 21:13   ` [PATCH 1/3] mkfs: enable large extent counts by default Darrick J. Wong
  2023-02-16 21:13   ` [PATCH 2/3] mkfs: enable reverse mapping " Darrick J. Wong
@ 2023-02-16 21:13   ` Darrick J. Wong
  2 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:13 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 mkfs/xfs_mkfs.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index 325f8617..5f090c08 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -4088,7 +4088,7 @@ main(
 			.rmapbt = true,
 			.reflink = true,
 			.inobtcnt = true,
-			.parent_pointers = false,
+			.parent_pointers = true,
 			.nodalign = false,
 			.nortalign = false,
 			.bigtime = true,



* [PATCH 01/14] xfs/122: update for parent pointers
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
@ 2023-02-16 21:13   ` Darrick J. Wong
  2023-02-16 21:14   ` [PATCH 02/14] populate: create hardlinks " Darrick J. Wong
                     ` (12 subsequent siblings)
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:13 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Update test for parent pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/122.out |    4 ++++
 tests/xfs/206     |    3 ++-
 2 files changed, 6 insertions(+), 1 deletion(-)


diff --git a/tests/xfs/122.out b/tests/xfs/122.out
index 43461e875c..c5958d1b99 100644
--- a/tests/xfs/122.out
+++ b/tests/xfs/122.out
@@ -109,7 +109,11 @@ sizeof(struct xfs_legacy_timestamp) = 8
 sizeof(struct xfs_log_dinode) = 176
 sizeof(struct xfs_log_legacy_timestamp) = 8
 sizeof(struct xfs_map_extent) = 32
+sizeof(struct xfs_parent_name_irec) = 32
+sizeof(struct xfs_parent_name_rec) = 16
+sizeof(struct xfs_parent_ptr) = 280
 sizeof(struct xfs_phys_extent) = 16
+sizeof(struct xfs_pptr_info) = 104
 sizeof(struct xfs_refcount_key) = 4
 sizeof(struct xfs_refcount_rec) = 12
 sizeof(struct xfs_rmap_key) = 20
diff --git a/tests/xfs/206 b/tests/xfs/206
index 904d53deb0..b29edeadf0 100755
--- a/tests/xfs/206
+++ b/tests/xfs/206
@@ -66,7 +66,8 @@ mkfs_filter()
 	    -e "/.*crc=/d" \
 	    -e "/^Default configuration/d" \
 	    -e "/metadir=.*/d" \
-	    -e '/rgcount=/d'
+	    -e '/rgcount=/d' \
+	    -e '/parent=/d'
 }
 
 # mkfs slightly smaller than that, small log for speed.



* [PATCH 02/14] populate: create hardlinks for parent pointers
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
  2023-02-16 21:13   ` [PATCH 01/14] xfs/122: update for " Darrick J. Wong
@ 2023-02-16 21:14   ` Darrick J. Wong
  2023-02-16 21:14   ` [PATCH 03/14] xfs/021: adapt golden output files " Darrick J. Wong
                     ` (11 subsequent siblings)
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:14 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Create some hardlinked files so that we can exercise parent pointers.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/populate |   38 ++++++++++++++++++++++++++++++++++++++
 src/popdir.pl   |   11 +++++++++++
 2 files changed, 49 insertions(+)


diff --git a/common/populate b/common/populate
index 389a762329..d52167964c 100644
--- a/common/populate
+++ b/common/populate
@@ -376,6 +376,7 @@ _scratch_xfs_populate() {
 	is_rt="$(_xfs_get_rtextents "$SCRATCH_MNT")"
 	is_rmapbt="$(_xfs_has_feature "$SCRATCH_MNT" rmapbt -v)"
 	is_reflink="$(_xfs_has_feature "$SCRATCH_MNT" reflink -v)"
+	is_pptr="$(_xfs_has_feature "$SCRATCH_MNT" parent -v)"
 
 	# Reverse-mapping btree
 	if [ $is_rmapbt -gt 0 ]; then
@@ -412,6 +413,43 @@ _scratch_xfs_populate() {
 		cp --reflink=always "${SCRATCH_MNT}/REFCOUNTBT" "${SCRATCH_MNT}/REFCOUNTBT2"
 	fi
 
+	# Parent pointers
+	if [ $is_pptr -gt 0 ]; then
+		echo "+ parent pointers"
+
+		# Create a couple of parent pointers
+		__populate_create_dir "${SCRATCH_MNT}/PPTRS" 1 '' --hardlink --format "two_%d"
+
+		# Create one xattr leaf block of parent pointers
+		nr="$((blksz * 2 / 16))"
+		__populate_create_dir "${SCRATCH_MNT}/PPTRS" ${nr} '' --hardlink --format "many%04d"
+
+		# Create multiple xattr leaf blocks of large parent pointers
+		nr="$((blksz * 16 / 16))"
+		__populate_create_dir "${SCRATCH_MNT}/PPTRS" ${nr} '' --hardlink --format "y%0254d"
+
+		# Create multiple paths to a file
+		local moof="${SCRATCH_MNT}/PPTRS/moofile"
+		touch "${moof}"
+		for ((i = 0; i < 4; i++)); do
+			mkdir -p "${SCRATCH_MNT}/PPTRS/SUB${i}"
+			ln "${moof}" "${SCRATCH_MNT}/PPTRS/SUB${i}/moofile"
+		done
+
+		# Create parent pointers of various lengths
+		touch "${SCRATCH_MNT}/PPTRS/vlength"
+		local len_len
+		local tst
+		local fname
+		ln "${SCRATCH_MNT}/PPTRS/vlength" "${SCRATCH_MNT}/PPTRS/b"
+		for len in 32 64 96 128 160 192 224 250 255; do
+			len_len="${#len}"
+			tst="$(perl -e "print \"b\" x (${len} - (${len_len} + 1))")"
+			fname="v${tst}${len}"
+			ln "${SCRATCH_MNT}/PPTRS/vlength" "${SCRATCH_MNT}/PPTRS/${fname}"
+		done
+	fi
+
 	# Copy some real files (xfs tests, I guess...)
 	echo "+ real files"
 	test $fill -ne 0 && __populate_fill_fs "${SCRATCH_MNT}" 5
diff --git a/src/popdir.pl b/src/popdir.pl
index dc0c046b7d..950503c621 100755
--- a/src/popdir.pl
+++ b/src/popdir.pl
@@ -17,6 +17,7 @@ GetOptions("start=i" => \$start,
 	   "dir=s" => \$dir,
 	   "remove!" => \$remove,
 	   "help!" => \$help,
+	   "hardlink!" => \$hardlink,
 	   "verbose!" => \$verbose);
 
 
@@ -36,6 +37,7 @@ Options:
   --format=str      printf formatting string for file name ("%08d")
   --verbose         verbose output
   --help            this help screen
+  --hardlink        hardlink subsequent files to the first one created
 EOF
   exit(1) unless defined $help;
   # otherwise...
@@ -51,12 +53,21 @@ $file_mult = 20 if (!defined $file_mult);
 $format = "%08d" if (!defined $format);
 $incr = 1 if (!defined $incr);
 
+if ($hardlink) {
+	$file_mult = 0;
+	$link_fname = sprintf($format, $start);
+}
+
 for ($i = $start; $i <= $end; $i += $incr) {
 	$fname = sprintf($format, $i);
 
 	if ($remove) {
 		$verbose && print "rm $fname\n";
 		unlink($fname) or rmdir($fname) or die("unlink $fname");
+	} elsif ($hardlink && $i > $start) {
+		# hardlink the first file
+		$verbose && print "ln $link_fname $fname\n";
+		link $link_fname, $fname;
 	} elsif ($file_mult == 0 or ($i % $file_mult) == 0) {
 		# create a file
 		$verbose && print "touch $fname\n";



* [PATCH 03/14] xfs/021: adapt golden output files for parent pointers
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
  2023-02-16 21:13   ` [PATCH 01/14] xfs/122: update for " Darrick J. Wong
  2023-02-16 21:14   ` [PATCH 02/14] populate: create hardlinks " Darrick J. Wong
@ 2023-02-16 21:14   ` Darrick J. Wong
  2023-02-16 21:14   ` [PATCH 04/14] generic/050: adapt " Darrick J. Wong
                     ` (10 subsequent siblings)
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:14 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Parent pointers change the xattr structure dramatically, so fix this
test to handle them.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/rc                 |    4 +++
 tests/xfs/021             |   15 +++++++++--
 tests/xfs/021.cfg         |    1 +
 tests/xfs/021.out.default |    0 
 tests/xfs/021.out.parent  |   64 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 82 insertions(+), 2 deletions(-)
 create mode 100644 tests/xfs/021.cfg
 rename tests/xfs/{021.out => 021.out.default} (100%)
 create mode 100644 tests/xfs/021.out.parent


diff --git a/common/rc b/common/rc
index 58aabe9a2e..00800c43b4 100644
--- a/common/rc
+++ b/common/rc
@@ -3307,6 +3307,8 @@ _get_os_name()
 
 _link_out_file_named()
 {
+	test -n "$seqfull" || _fail "need to set seqfull"
+
 	local features=$2
 	local suffix=$(FEATURES="$features" perl -e '
 		my %feathash;
@@ -3342,6 +3344,8 @@ _link_out_file()
 {
 	local features
 
+	test -n "$seqfull" || _fail "need to set seqfull"
+
 	if [ $# -eq 0 ]; then
 		features="$(_get_os_name),$FSTYP"
 		if [ -n "$MOUNT_OPTIONS" ]; then
diff --git a/tests/xfs/021 b/tests/xfs/021
index 9432e2acb0..ef307fc064 100755
--- a/tests/xfs/021
+++ b/tests/xfs/021
@@ -67,6 +67,13 @@ _scratch_mkfs_xfs >/dev/null \
 echo "*** mount FS"
 _scratch_mount
 
+seqfull=$0
+if _xfs_has_feature $SCRATCH_MNT parent; then
+	_link_out_file "parent"
+else
+	_link_out_file ""
+fi
+
 testfile=$SCRATCH_MNT/testfile
 echo "*** make test file 1"
 
@@ -108,7 +115,10 @@ _scratch_unmount >>$seqres.full 2>&1 \
 echo "*** dump attributes (1)"
 
 _scratch_xfs_db -r -c "inode $inum_1" -c "print a.sfattr"  | \
-	sed -e '/secure = /d' | sed -e '/parent = /d'
+	perl -ne '
+/\.secure/ && next;
+/\.parent/ && next;
+	print unless /^\d+:\[.*/;'
 
 echo "*** dump attributes (2)"
 
@@ -124,10 +134,11 @@ s/info.hdr/info/;
 /hdr.info.uuid/ && next;
 /hdr.info.lsn/ && next;
 /hdr.info.owner/ && next;
+/\.parent/ && next;
 s/^(hdr.info.magic =) 0x3bee/\1 0xfbee/;
 s/^(hdr.firstused =) (\d+)/\1 FIRSTUSED/;
 s/^(hdr.freemap\[0-2] = \[base,size]).*/\1 [FREEMAP..]/;
-s/^(entries\[0-2] = \[hashval,nameidx,incomplete,root,local]).*/\1 [ENTRIES..]/;
+s/^(entries\[0-[23]] = \[hashval,nameidx,incomplete,root,local]).*/\1 [ENTRIES..]/;
 	print unless /^\d+:\[.*/;'
 
 echo "*** done"
diff --git a/tests/xfs/021.cfg b/tests/xfs/021.cfg
new file mode 100644
index 0000000000..73b127260c
--- /dev/null
+++ b/tests/xfs/021.cfg
@@ -0,0 +1 @@
+parent: parent
diff --git a/tests/xfs/021.out b/tests/xfs/021.out.default
similarity index 100%
rename from tests/xfs/021.out
rename to tests/xfs/021.out.default
diff --git a/tests/xfs/021.out.parent b/tests/xfs/021.out.parent
new file mode 100644
index 0000000000..661d130239
--- /dev/null
+++ b/tests/xfs/021.out.parent
@@ -0,0 +1,64 @@
+QA output created by 021
+*** mkfs
+*** mount FS
+*** make test file 1
+# file: <TESTFILE>.1
+user.a1
+user.a2--
+
+*** make test file 2
+1+0 records in
+1+0 records out
+# file: <TESTFILE>.2
+user.a1
+user.a2-----
+user.a3
+
+Attribute "a3" had a 65535 byte value for <TESTFILE>.2:
+size of attr value = 65536
+
+*** unmount FS
+*** dump attributes (1)
+a.sfattr.hdr.totsize = 53
+a.sfattr.hdr.count = 3
+a.sfattr.list[0].namelen = 16
+a.sfattr.list[0].valuelen = 10
+a.sfattr.list[0].root = 0
+a.sfattr.list[0].value = "testfile.1"
+a.sfattr.list[1].namelen = 2
+a.sfattr.list[1].valuelen = 3
+a.sfattr.list[1].root = 0
+a.sfattr.list[1].name = "a1"
+a.sfattr.list[1].value = "v1\d"
+a.sfattr.list[2].namelen = 4
+a.sfattr.list[2].valuelen = 5
+a.sfattr.list[2].root = 0
+a.sfattr.list[2].name = "a2--"
+a.sfattr.list[2].value = "v2--\d"
+*** dump attributes (2)
+hdr.info.forw = 0
+hdr.info.back = 0
+hdr.info.magic = 0xfbee
+hdr.count = 4
+hdr.usedbytes = 84
+hdr.firstused = FIRSTUSED
+hdr.holes = 0
+hdr.freemap[0-2] = [base,size] [FREEMAP..]
+entries[0-3] = [hashval,nameidx,incomplete,root,local] [ENTRIES..]
+nvlist[0].valuelen = 8
+nvlist[0].namelen = 2
+nvlist[0].name = "a1"
+nvlist[0].value = "value_1\d"
+nvlist[1].valueblk = 0x1
+nvlist[1].valuelen = 65535
+nvlist[1].namelen = 2
+nvlist[1].name = "a3"
+nvlist[2].valuelen = 10
+nvlist[2].namelen = 16
+nvlist[2].value = "testfile.2"
+nvlist[3].valuelen = 8
+nvlist[3].namelen = 7
+nvlist[3].name = "a2-----"
+nvlist[3].value = "value_2\d"
+*** done
+*** unmount



* [PATCH 04/14] generic/050: adapt for parent pointers
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 21:14   ` [PATCH 03/14] xfs/021: adapt golden output files " Darrick J. Wong
@ 2023-02-16 21:14   ` Darrick J. Wong
  2023-02-16 21:14   ` [PATCH 05/14] xfs/018: disable parent pointers for this test Darrick J. Wong
                     ` (9 subsequent siblings)
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:14 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Fix this test when quotas and parent pointers are enabled.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/generic/050                    |    9 +++++++++
 tests/generic/050.cfg                |    1 +
 tests/generic/050.out.xfsquotaparent |   23 +++++++++++++++++++++++
 3 files changed, 33 insertions(+)
 create mode 100644 tests/generic/050.out.xfsquotaparent


diff --git a/tests/generic/050 b/tests/generic/050
index 0664f8c0e4..8af0d13842 100755
--- a/tests/generic/050
+++ b/tests/generic/050
@@ -36,6 +36,15 @@ elif [ "$FSTYP" = "xfs" ] && echo "$MOUNT_OPTIONS" | grep -q quota ; then
 	# Mounting with quota on XFS requires a writable fs, which means
 	# we expect to fail the ro blockdev test with with EPERM.
 	features="xfsquota"
+
+	if _xfs_has_feature $SCRATCH_DEV parent; then
+		# If we have quotas and parent pointers enabled, the primary
+		# superblock will be written out with the quota flags set when
+		# the logged xattrs log_incompat feature is set.  Hence the
+		# norecovery mount won't fail due to quota rejecting the
+		# mismatch between the mount qflags and the ondisk ones.
+		features="xfsquotaparent"
+	fi
 fi
 _link_out_file "$features"
 
diff --git a/tests/generic/050.cfg b/tests/generic/050.cfg
index 1d9d60bc69..85924d117d 100644
--- a/tests/generic/050.cfg
+++ b/tests/generic/050.cfg
@@ -1,2 +1,3 @@
 nojournal: nojournal
 xfsquota: xfsquota
+xfsquotaparent: xfsquotaparent
diff --git a/tests/generic/050.out.xfsquotaparent b/tests/generic/050.out.xfsquotaparent
new file mode 100644
index 0000000000..b341aca5be
--- /dev/null
+++ b/tests/generic/050.out.xfsquotaparent
@@ -0,0 +1,23 @@
+QA output created by 050
+setting device read-only
+mounting read-only block device:
+mount: SCRATCH_MNT: permission denied
+unmounting read-only filesystem
+umount: SCRATCH_DEV: not mounted
+setting device read-write
+mounting read-write block device:
+touch files
+going down:
+unmounting shutdown filesystem:
+setting device read-only
+mounting filesystem that needs recovery on a read-only device:
+mount: device write-protected, mounting read-only
+mount: cannot mount device read-only
+unmounting read-only filesystem
+umount: SCRATCH_DEV: not mounted
+mounting filesystem with -o norecovery on a read-only device:
+mount: device write-protected, mounting read-only
+unmounting read-only filesystem
+setting device read-write
+mounting filesystem that needs recovery with -o ro:
+*** done



* [PATCH 05/14] xfs/018: disable parent pointers for this test
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
                     ` (3 preceding siblings ...)
  2023-02-16 21:14   ` [PATCH 04/14] generic/050: adapt " Darrick J. Wong
@ 2023-02-16 21:14   ` Darrick J. Wong
  2023-02-16 21:15   ` [PATCH 06/14] xfs/306: fix formatting failures with parent pointers Darrick J. Wong
                     ` (8 subsequent siblings)
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:14 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

These tests (xfs/018, xfs/191, and xfs/288) depend heavily on the xattr
formats created for new files.  Parent pointers break those assumptions,
so force parent pointers off.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/018 |    7 ++++++-
 tests/xfs/191 |    7 ++++++-
 tests/xfs/288 |    7 ++++++-
 3 files changed, 18 insertions(+), 3 deletions(-)
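
For illustration only, the probe that all three tests use can be tried
outside fstests.  This is a minimal sketch: fake_mkfs_usage and
fake_old_mkfs_usage are hypothetical stand-ins for "$MKFS_XFS_PROG 2>&1",
and their usage strings are made up.  The point is that the argument
array stays empty when mkfs does not advertise the option, so the later
"${mkfs_args[@]}" expansion adds nothing to the command line.

```shell
#!/bin/bash
# Hypothetical stand-in for a mkfs that knows about parent pointers.
fake_mkfs_usage() {
	echo "/* naming */ [-n size=num,version=2|ci,ftype=0|1,parent=0|1]"
}

# Hypothetical stand-in for an older mkfs without the option.
fake_old_mkfs_usage() {
	echo "/* naming */ [-n size=num,version=2|ci,ftype=0|1]"
}

# Fill the global mkfs_args array only if the usage text mentions parent=0.
build_mkfs_args() {
	local usage_cmd=$1

	mkfs_args=()
	$usage_cmd 2>&1 | grep -q 'parent=0' && mkfs_args+=(-n parent=0)
	return 0
}

build_mkfs_args fake_mkfs_usage
echo "new mkfs: ${#mkfs_args[@]} extra args: ${mkfs_args[*]}"

build_mkfs_args fake_old_mkfs_usage
echo "old mkfs: ${#mkfs_args[@]} extra args"
```

Using an array rather than a plain string keeps "-n" and "parent=0" as
separate words without relying on unquoted word splitting.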


diff --git a/tests/xfs/018 b/tests/xfs/018
index 1ef51a2e61..34b6e91579 100755
--- a/tests/xfs/018
+++ b/tests/xfs/018
@@ -100,7 +100,12 @@ attr32l="X$attr32k"
 attr64k="$attr32k$attr32k"
 
 echo "*** mkfs"
-_scratch_mkfs >/dev/null
+
+# Parent pointers change the xattr formats sufficiently to break this test.
+# Disable parent pointers if mkfs supports it.
+mkfs_args=()
+$MKFS_XFS_PROG 2>&1 | grep -q parent=0 && mkfs_args+=(-n parent=0)
+_scratch_mkfs "${mkfs_args[@]}" >/dev/null
 
 blk_sz=$(_scratch_xfs_get_sb_field blocksize)
 err_inj_attr_sz=$(( blk_sz / 3 - 50 ))
diff --git a/tests/xfs/191 b/tests/xfs/191
index 7a02f1be21..0a6c20dad7 100755
--- a/tests/xfs/191
+++ b/tests/xfs/191
@@ -33,7 +33,12 @@ _fixed_by_kernel_commit 7be3bd8856fb "xfs: empty xattr leaf header blocks are no
 _fixed_by_kernel_commit e87021a2bc10 "xfs: use larger in-core attr firstused field and detect overflow"
 _fixed_by_git_commit xfsprogs f50d3462c654 "xfs_repair: ignore empty xattr leaf blocks"
 
-_scratch_mkfs_xfs | _filter_mkfs >$seqres.full 2>$tmp.mkfs
+# Parent pointers change the xattr formats sufficiently to break this test.
+# Disable parent pointers if mkfs supports it.
+mkfs_args=()
+$MKFS_XFS_PROG 2>&1 | grep -q parent=0 && mkfs_args+=(-n parent=0)
+
+_scratch_mkfs_xfs "${mkfs_args[@]}" | _filter_mkfs >$seqres.full 2>$tmp.mkfs
 cat $tmp.mkfs >> $seqres.full
 source $tmp.mkfs
 _scratch_mount
diff --git a/tests/xfs/288 b/tests/xfs/288
index aa664a266e..6bfc9ac0c8 100755
--- a/tests/xfs/288
+++ b/tests/xfs/288
@@ -19,8 +19,13 @@ _supported_fs xfs
 _require_scratch
 _require_attrs
 
+# Parent pointers change the xattr formats sufficiently to break this test.
+# Disable parent pointers if mkfs supports it.
+mkfs_args=()
+$MKFS_XFS_PROG 2>&1 | grep -q parent=0 && mkfs_args+=(-n parent=0)
+
 # get block size ($dbsize) from the mkfs output
-_scratch_mkfs_xfs 2>/dev/null | _filter_mkfs 2>$tmp.mkfs >/dev/null
+_scratch_mkfs_xfs "${mkfs_args[@]}" 2>/dev/null | _filter_mkfs 2>$tmp.mkfs >/dev/null
 . $tmp.mkfs
 
 _scratch_mount



* [PATCH 06/14] xfs/306: fix formatting failures with parent pointers
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
                     ` (4 preceding siblings ...)
  2023-02-16 21:14   ` [PATCH 05/14] xfs/018: disable parent pointers for this test Darrick J. Wong
@ 2023-02-16 21:15   ` Darrick J. Wong
  2023-02-16 21:15   ` [PATCH 07/14] common: add helpers for parent pointer tests Darrick J. Wong
                     ` (7 subsequent siblings)
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:15 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

The parent pointers feature isn't supported on tiny 20MB filesystems
because the larger directory transactions result in larger minimum log
sizes, particularly with nrext64 enabled:

** mkfs failed with extra mkfs options added to " -m rmapbt=0, -i nrext64=1, -n parent=1," by test 306 **
** attempting to mkfs using only test 306 options: -d size=20m -n size=64k **
max log size 5108 smaller than min log size 5310, filesystem is too small

We don't support 20M filesystems anymore, so bump the filesystem size up
to 100M and skip this test if we can't actually format the filesystem.
Convert the open-coded punch-alternating logic into a call to that
program to reduce execve overhead, which more than makes up for having
to write 5x as much data to fragment the free space.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/306 |    9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)
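
To put a number on the execve overhead, the sketch below counts how many
times the removed loop would have started xfs_io for a given file size.
The helper name old_punch_offsets and the 64k sample size are
assumptions chosen for the example; only the "seq 0 8192 $size" stride
comes from the test itself.

```shell
#!/bin/bash
# Print the byte offsets at which the removed loop ran "fpunch $i 4k",
# one offset per line.  Each offset cost one xfs_io process.
old_punch_offsets() {
	local size=$1

	seq 0 8192 "$size"
}

size=$((64 * 1024))
# Wrap in $(( )) so any padding that wc adds is dropped.
calls=$(( $(old_punch_offsets "$size" | wc -l) ))
echo "file of $size bytes: $calls separate xfs_io invocations"
```

The count grows linearly with the file size, so a 5x larger filesystem
would have meant roughly 5x as many process launches with the old loop.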


diff --git a/tests/xfs/306 b/tests/xfs/306
index b57bf4c0a9..152971cfc3 100755
--- a/tests/xfs/306
+++ b/tests/xfs/306
@@ -23,6 +23,7 @@ _supported_fs xfs
 _require_scratch_nocheck	# check complains about single AG fs
 _require_xfs_io_command "fpunch"
 _require_command $UUIDGEN_PROG uuidgen
+_require_test_program "punch-alternating"
 
 # Disable the scratch rt device to avoid test failures relating to the rt
 # bitmap consuming all the free space in our small data device.
@@ -30,7 +31,8 @@ unset SCRATCH_RTDEV
 
 # Create a small fs with a large directory block size. We want to fill up the fs
 # quickly and then create multi-fsb dirblocks over fragmented free space.
-_scratch_mkfs_xfs -d size=20m -n size=64k >> $seqres.full 2>&1
+_scratch_mkfs_xfs -d size=100m -n size=64k >> $seqres.full 2>&1 || \
+	_notrun 'could not format tiny scratch fs'
 _scratch_mount
 
 # Fill a source directory with many largish-named files. 1k uuid-named entries
@@ -49,10 +51,7 @@ done
 $XFS_IO_PROG -xc "resblks 16" $SCRATCH_MNT >> $seqres.full 2>&1
 dd if=/dev/zero of=$SCRATCH_MNT/file bs=4k >> $seqres.full 2>&1
 $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/file >> $seqres.full 2>&1
-size=`_get_filesize $SCRATCH_MNT/file`
-for i in $(seq 0 8192 $size); do
-	$XFS_IO_PROG -c "fpunch $i 4k" $SCRATCH_MNT/file >> $seqres.full 2>&1
-done
+$here/src/punch-alternating $SCRATCH_MNT/file
 
 # Replicate the src dir several times into fragmented free space. After one or
 # two dirs, we should have nothing but non-contiguous directory blocks.



* [PATCH 07/14] common: add helpers for parent pointer tests
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
                     ` (5 preceding siblings ...)
  2023-02-16 21:15   ` [PATCH 06/14] xfs/306: fix formatting failures with parent pointers Darrick J. Wong
@ 2023-02-16 21:15   ` Darrick J. Wong
  2023-02-16 21:15   ` [PATCH 08/14] xfs: add parent pointer test Darrick J. Wong
                     ` (6 subsequent siblings)
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:15 UTC (permalink / raw)
  To: djwong, zlang
  Cc: Allison Henderson, Catherine Hoang, linux-xfs, fstests, guan

From: Allison Henderson <allison.henderson@oracle.com>

Add helper functions in common/parent to parse and verify parent
pointers. Also add functions to check that mkfs, kernel, and xfs_io
support parent pointers.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/parent |  198 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 common/rc     |    3 +
 common/xfs    |   12 +++
 3 files changed, 213 insertions(+)
 create mode 100644 common/parent
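
As a rough illustration of the record format the helpers parse
(inode/generation/name_length/name), here is a standalone sketch of the
same "IFS=/ read" technique.  The function name find_parent_record and
the sample records are made up for this example; they are not real
xfs_io output and this is not the code added to common/parent.

```shell
#!/bin/bash
# Search newline-separated "inode/generation/name_length/name" records
# for one matching both the inode and the name.  On success PPINO,
# PPGEN, PPNAME_LEN and PPNAME hold the matching record; returns 1 when
# nothing matches.
find_parent_record() {
	local records=$1 want_ino=$2 want_name=$3

	while IFS=/ read -r PPINO PPGEN PPNAME_LEN PPNAME; do
		[ "$PPINO" = "$want_ino" ] || continue
		[ "$PPNAME" = "$want_name" ] || continue
		return 0
	done <<< "$records"
	return 1
}

# Made-up sample records.
sample=$'128/1/5/file1\n131/7/10/file1_link'

if find_parent_record "$sample" 131 file1_link; then
	echo "found: ino=$PPINO gen=$PPGEN len=$PPNAME_LEN name=$PPNAME"
fi
find_parent_record "$sample" 128 nosuchname || echo "no match"
```

Feeding the loop from a here-string instead of a pipe keeps it in the
current shell, so the PP* variables are still set after the function
returns.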


diff --git a/common/parent b/common/parent
new file mode 100644
index 0000000000..a0ba7d927a
--- /dev/null
+++ b/common/parent
@@ -0,0 +1,198 @@
+#
+# Parent pointer common functions
+#
+
+#
+# _parse_parent_pointer parents parent_inode parent_pointer_name
+#
+# Given a list of parent pointers, find the record that matches
+# the given inode and filename
+#
+# inputs:
+# parents	: A list of parent pointers in the format of:
+#		  inode/generation/name_length/name
+# parent_inode	: The parent inode to search for
+# parent_name	: The parent name to search for
+#
+# outputs:
+# PPINO         : Parent pointer inode
+# PPGEN         : Parent pointer generation
+# PPNAME        : Parent pointer name
+# PPNAME_LEN    : Parent pointer name length
+#
+_parse_parent_pointer()
+{
+	local parents=$1
+	local pino=$2
+	local parent_pointer_name=$3
+
+	local found=0
+
+	# Find the entry that has the same inode as the parent
+	# and parse out the entry info
+	while IFS=\/ read PPINO PPGEN PPNAME_LEN PPNAME; do
+		if [ "$PPINO" != "$pino" ]; then
+			continue
+		fi
+
+		if [ "$PPNAME" != "$parent_pointer_name" ]; then
+			continue
+		fi
+
+		found=1
+		break
+	done <<< $(echo "$parents")
+
+	# Check to see if we found anything
+	# We do not fail the test because we also use this
+	# routine to verify when parent pointers should
+	# be removed or updated  (ie a rename or a move
+	# operation changes your parent pointer)
+	if [ $found -eq "0" ]; then
+		return 1
+	fi
+
+	# Verify the parent pointer name length is correct
+	if [ "$PPNAME_LEN" -ne "${#parent_pointer_name}" ]
+	then
+		echo "*** Bad parent pointer:"\
+			"name:$PPNAME, namelen:$PPNAME_LEN"
+	fi
+
+	# return success
+	return 0
+}
+
+#
+# _verify_parent parent_path parent_pointer_name child_path
+#
+# Verify that the given child path lists the given parent as a parent pointer
+# and that the parent pointer name matches the given name
+#
+# Examples:
+#
+# #simple example
+# mkdir testfolder1
+# touch testfolder1/file1
+# _verify_parent testfolder1 file1 testfolder1/file1
+#
+# # In this above example, we want to verify that "testfolder1"
+# # appears as a parent pointer of "testfolder1/file1".  Additionally
+# # we verify that the name record of the parent pointer is "file1"
+#
+#
+# #hardlink example
+# mkdir testfolder1
+# mkdir testfolder2
+# touch testfolder1/file1
+# ln testfolder1/file1 testfolder2/file1_ln
+# _verify_parent testfolder2 file1_ln testfolder1/file1
+#
+# # In this above example, we want to verify that "testfolder2"
+# # appears as a parent pointer of "testfolder1/file1".  Additionally
+# # we verify that the name record of the parent pointer is "file1_ln"
+#
+_verify_parent()
+{
+	local parent_path=$1
+	local parent_pointer_name=$2
+	local child_path=$3
+
+	local parent_ppath="$parent_path/$parent_pointer_name"
+
+	# Verify parent exists
+	if [ ! -d $SCRATCH_MNT/$parent_path ]; then
+		_fail "$SCRATCH_MNT/$parent_path not found"
+	else
+		echo "*** $parent_path OK"
+	fi
+
+	# Verify child exists
+	if [ ! -f $SCRATCH_MNT/$child_path ]; then
+		_fail "$SCRATCH_MNT/$child_path not found"
+	else
+		echo "*** $child_path OK"
+	fi
+
+	# Verify the parent pointer name exists as a child of the parent
+	if [ ! -f $SCRATCH_MNT/$parent_ppath ]; then
+		_fail "$SCRATCH_MNT/$parent_ppath not found"
+	else
+		echo "*** $parent_ppath OK"
+	fi
+
+	# Get the inodes of both parent and child
+	pino="$(stat -c '%i' $SCRATCH_MNT/$parent_path)"
+	cino="$(stat -c '%i' $SCRATCH_MNT/$child_path)"
+
+	# Get all the parent pointers of the child
+	parents=($($XFS_IO_PROG -x -c \
+	 "parent -f -i $pino -n $parent_pointer_name" $SCRATCH_MNT/$child_path))
+	if [[ $? != 0 ]]; then
+		 _fail "No parent pointers found for $child_path"
+	fi
+
+	# Parse parent pointer output.
+	# This sets PPINO PPGEN PPNAME PPNAME_LEN
+	_parse_parent_pointer $parents $pino $parent_pointer_name
+
+	# If we didn't find one, bail out
+	if [ $? -ne 0 ]; then
+		_fail "No parent pointer record found for $parent_path"\
+			"in $child_path"
+	fi
+
+	# Verify the inode generated by the parent pointer name is
+	# the same as the child inode
+	pppino="$(stat -c '%i' $SCRATCH_MNT/$parent_ppath)"
+	if [ $cino -ne $pppino ]
+	then
+		_fail "Bad parent pointer name value for $child_path."\
+			"$SCRATCH_MNT/$parent_ppath belongs to inode $pppino,"\
+			"but should be $cino"
+	fi
+
+	echo "*** Verified parent pointer:"\
+			"name:$PPNAME, namelen:$PPNAME_LEN"
+	echo "*** Parent pointer OK for child $child_path"
+}
+
+#
+# _verify_no_parent parent_pointer_name pino child_path
+#
+# Verify that the given child path contains no parent pointer entry
+# for the given inode and file name
+#
+_verify_no_parent()
+{
+	local parent_pname=$1
+	local pino=$2
+	local child_path=$3
+
+	# Verify child exists
+	if [ ! -f $SCRATCH_MNT/$child_path ]; then
+		_fail "$SCRATCH_MNT/$child_path not found"
+	else
+		echo "*** $child_path OK"
+	fi
+
+	# Get all the parent pointers of the child
+	local parents=($($XFS_IO_PROG -x -c \
+	 "parent -f -i $pino -n $parent_pname" $SCRATCH_MNT/$child_path))
+	if [[ $? != 0 ]]; then
+		return 0
+	fi
+
+	# Parse parent pointer output.
+	# This sets PPINO PPGEN PPNAME PPNAME_LEN
+	_parse_parent_pointer $parents $pino $parent_pname
+
+	# If we didn't find one, return success
+	if [ $? -ne 0 ]; then
+		return 0
+	fi
+
+	_fail "Parent pointer entry found where none should:"\
+			"inode:$PPINO, gen:$PPGEN,"\
+			"name:$PPNAME, namelen:$PPNAME_LEN"
+}
diff --git a/common/rc b/common/rc
index 00800c43b4..d3edbb78c4 100644
--- a/common/rc
+++ b/common/rc
@@ -2565,6 +2565,9 @@ _require_xfs_io_command()
 		echo $testio | grep -q "invalid option" && \
 			_notrun "xfs_io $command support is missing"
 		;;
+	"parent")
+		testio=`$XFS_IO_PROG -x -c "parent" $TEST_DIR 2>&1`
+		;;
 	"pwrite")
 		# -N (RWF_NOWAIT) only works with direct vectored I/O writes
 		local pwrite_opts=" "
diff --git a/common/xfs b/common/xfs
index 0ea9d3826e..32406c0fe5 100644
--- a/common/xfs
+++ b/common/xfs
@@ -2091,3 +2091,15 @@ _scratch_find_rt_metadir_entry() {
 		grep "${sfkey}.inumber" | awk '{print $1}'
 	return 0
 }
+
+# this test requires the xfs parent pointers feature
+#
+_require_xfs_parent()
+{
+	_scratch_mkfs_xfs_supported -n parent > /dev/null 2>&1 \
+		|| _notrun "mkfs.xfs does not support parent pointers"
+	_scratch_mkfs_xfs -n parent > /dev/null 2>&1
+	_try_scratch_mount >/dev/null 2>&1 \
+		|| _notrun "kernel does not support parent pointers"
+	_scratch_unmount
+}



* [PATCH 08/14] xfs: add parent pointer test
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
                     ` (6 preceding siblings ...)
  2023-02-16 21:15   ` [PATCH 07/14] common: add helpers for parent pointer tests Darrick J. Wong
@ 2023-02-16 21:15   ` Darrick J. Wong
  2023-02-16 21:15   ` [PATCH 09/14] xfs: add multi link " Darrick J. Wong
                     ` (5 subsequent siblings)
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:15 UTC (permalink / raw)
  To: djwong, zlang
  Cc: Allison Henderson, Catherine Hoang, linux-xfs, fstests, guan

From: Allison Henderson <allison.henderson@oracle.com>

Add a test to verify basic parent pointers operations (create, move, link,
unlink, rename, overwrite).

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 doc/group-names.txt |    1 +
 tests/xfs/851       |  101 +++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/851.out   |   59 ++++++++++++++++++++++++++++++
 3 files changed, 161 insertions(+)
 create mode 100755 tests/xfs/851
 create mode 100644 tests/xfs/851.out
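
The hard link case checks one file through two names.  As background,
the sketch below shows the underlying property on any filesystem that
supports hard links: both names resolve to the same inode, which is why
a single file can carry more than one parent pointer.  It runs in a
temporary directory, needs no XFS feature, and assumes GNU stat for the
-c option, as the tests do.

```shell
#!/bin/bash
# Work in a throwaway directory; nothing here is XFS-specific.
dir=$(mktemp -d)
mkdir "$dir/testfolder1" "$dir/testfolder2"
touch "$dir/testfolder1/file1"
ln "$dir/testfolder1/file1" "$dir/testfolder2/file1_link"

# %i is the inode number, %h the hard link count.
ino_a=$(stat -c '%i' "$dir/testfolder1/file1")
ino_b=$(stat -c '%i' "$dir/testfolder2/file1_link")
nlink=$(stat -c '%h' "$dir/testfolder1/file1")

if [ "$ino_a" -eq "$ino_b" ]; then
	echo "both names resolve to one inode; link count is $nlink"
fi
rm -r "$dir"
```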


diff --git a/doc/group-names.txt b/doc/group-names.txt
index 8bcf21919b..569a32d9bb 100644
--- a/doc/group-names.txt
+++ b/doc/group-names.txt
@@ -82,6 +82,7 @@ nfs4_acl		NFSv4 access control lists
 nonsamefs		overlayfs layers on different filesystems
 online_repair		online repair functionality tests
 other			dumping ground, do not add more tests to this group
+parent			Parent pointer tests
 pattern			specific IO pattern tests
 perms			access control and permission checking
 pipe			pipe functionality
diff --git a/tests/xfs/851 b/tests/xfs/851
new file mode 100755
index 0000000000..27870ec05a
--- /dev/null
+++ b/tests/xfs/851
@@ -0,0 +1,101 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# FS QA Test 851
+#
+# simple parent pointer test
+#
+
+. ./common/preamble
+_begin_fstest auto quick parent
+
+# get standard environment, filters and checks
+. ./common/parent
+
+# Modify as appropriate
+_supported_fs xfs
+_require_scratch
+_require_xfs_sysfs debug/larp
+_require_xfs_parent
+_require_xfs_io_command "parent"
+
+# real QA test starts here
+
+# Create a directory tree using a protofile and
+# make sure all inodes created have parent pointers
+
+protofile=$tmp.proto
+
+cat >$protofile <<EOF
+DUMMY1
+0 0
+: root directory
+d--777 3 1
+: a directory
+testfolder1 d--755 3 1
+file1 ---755 3 1 /dev/null
+$
+: back in the root
+testfolder2 d--755 3 1
+file2 ---755 3 1 /dev/null
+: done
+$
+EOF
+
+_scratch_mkfs -f -n parent=1 -p $protofile >>$seqres.full 2>&1 \
+	|| _fail "mkfs failed"
+_check_scratch_fs
+
+_scratch_mount >>$seqres.full 2>&1 \
+	|| _fail "mount failed"
+
+testfolder1="testfolder1"
+testfolder2="testfolder2"
+file1="file1"
+file2="file2"
+file3="file3"
+file1_ln="file1_link"
+
+echo ""
+# Create parent pointer test
+_verify_parent "$testfolder1" "$file1" "$testfolder1/$file1"
+
+echo ""
+# Move parent pointer test
+mv $SCRATCH_MNT/$testfolder1/$file1 $SCRATCH_MNT/$testfolder2/$file1
+_verify_parent "$testfolder2" "$file1" "$testfolder2/$file1"
+
+echo ""
+# Hard link parent pointer test
+ln $SCRATCH_MNT/$testfolder2/$file1 $SCRATCH_MNT/$testfolder1/$file1_ln
+_verify_parent "$testfolder1" "$file1_ln" "$testfolder1/$file1_ln"
+_verify_parent "$testfolder1" "$file1_ln" "$testfolder2/$file1"
+_verify_parent "$testfolder2" "$file1"    "$testfolder1/$file1_ln"
+_verify_parent "$testfolder2" "$file1"    "$testfolder2/$file1"
+
+echo ""
+# Remove hard link parent pointer test
+ino="$(stat -c '%i' $SCRATCH_MNT/$testfolder2/$file1)"
+rm $SCRATCH_MNT/$testfolder2/$file1
+_verify_parent "$testfolder1" "$file1_ln" "$testfolder1/$file1_ln"
+_verify_no_parent "$file1" "$ino" "$testfolder1/$file1_ln"
+
+echo ""
+# Rename parent pointer test
+ino="$(stat -c '%i' $SCRATCH_MNT/$testfolder1/$file1_ln)"
+mv $SCRATCH_MNT/$testfolder1/$file1_ln $SCRATCH_MNT/$testfolder1/$file2
+_verify_parent "$testfolder1" "$file2" "$testfolder1/$file2"
+_verify_no_parent "$file1_ln" "$ino" "$testfolder1/$file2"
+
+echo ""
+# Overwrite parent pointer test
+touch $SCRATCH_MNT/$testfolder2/$file3
+_verify_parent "$testfolder2" "$file3" "$testfolder2/$file3"
+ino="$(stat -c '%i' $SCRATCH_MNT/$testfolder2/$file3)"
+mv -f $SCRATCH_MNT/$testfolder2/$file3 $SCRATCH_MNT/$testfolder1/$file2
+_verify_parent "$testfolder1" "$file2" "$testfolder1/$file2"
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/851.out b/tests/xfs/851.out
new file mode 100644
index 0000000000..c375ba5f00
--- /dev/null
+++ b/tests/xfs/851.out
@@ -0,0 +1,59 @@
+QA output created by 851
+
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1
+
+*** testfolder2 OK
+*** testfolder2/file1 OK
+*** testfolder2/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder2/file1
+
+*** testfolder1 OK
+*** testfolder1/file1_link OK
+*** testfolder1/file1_link OK
+*** Verified parent pointer: name:file1_link, namelen:10
+*** Parent pointer OK for child testfolder1/file1_link
+*** testfolder1 OK
+*** testfolder2/file1 OK
+*** testfolder1/file1_link OK
+*** Verified parent pointer: name:file1_link, namelen:10
+*** Parent pointer OK for child testfolder2/file1
+*** testfolder2 OK
+*** testfolder1/file1_link OK
+*** testfolder2/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link
+*** testfolder2 OK
+*** testfolder2/file1 OK
+*** testfolder2/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder2/file1
+
+*** testfolder1 OK
+*** testfolder1/file1_link OK
+*** testfolder1/file1_link OK
+*** Verified parent pointer: name:file1_link, namelen:10
+*** Parent pointer OK for child testfolder1/file1_link
+*** testfolder1/file1_link OK
+
+*** testfolder1 OK
+*** testfolder1/file2 OK
+*** testfolder1/file2 OK
+*** Verified parent pointer: name:file2, namelen:5
+*** Parent pointer OK for child testfolder1/file2
+*** testfolder1/file2 OK
+
+*** testfolder2 OK
+*** testfolder2/file3 OK
+*** testfolder2/file3 OK
+*** Verified parent pointer: name:file3, namelen:5
+*** Parent pointer OK for child testfolder2/file3
+*** testfolder1 OK
+*** testfolder1/file2 OK
+*** testfolder1/file2 OK
+*** Verified parent pointer: name:file2, namelen:5
+*** Parent pointer OK for child testfolder1/file2



* [PATCH 09/14] xfs: add multi link parent pointer test
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
                     ` (7 preceding siblings ...)
  2023-02-16 21:15   ` [PATCH 08/14] xfs: add parent pointer test Darrick J. Wong
@ 2023-02-16 21:15   ` Darrick J. Wong
  2023-02-16 21:16   ` [PATCH 10/14] xfs: add parent pointer inject test Darrick J. Wong
                     ` (4 subsequent siblings)
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:15 UTC (permalink / raw)
  To: djwong, zlang
  Cc: Allison Henderson, Catherine Hoang, linux-xfs, fstests, guan

From: Allison Henderson <allison.henderson@oracle.com>

Add a test to verify parent pointers while multiple links to a file are
created and removed.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/852     |   69 ++++
 tests/xfs/852.out | 1002 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 1071 insertions(+)
 create mode 100755 tests/xfs/852
 create mode 100644 tests/xfs/852.out
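
The golden output records a namelen for every link name, and the value
changes once the loop counter reaches two digits.  A small sketch of
where those numbers come from, using the same "${#name}" length
expansion that the name length check in common/parent relies on; the
sample counters 0, 9, 10 and 99 are picked only for illustration.

```shell
#!/bin/bash
file1_ln="file1_link"
lens=()

for j in 0 9 10 99; do
	name="$file1_ln.$j"
	# ${#name} is the length of the link name in characters.
	lens+=("${#name}")
	echo "name:$name, namelen:${#name}"
done
```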


diff --git a/tests/xfs/852 b/tests/xfs/852
new file mode 100755
index 0000000000..4d1be0e945
--- /dev/null
+++ b/tests/xfs/852
@@ -0,0 +1,69 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# FS QA Test 852
+#
+# multi link parent pointer test
+#
+. ./common/preamble
+_begin_fstest auto quick parent
+
+# get standard environment, filters and checks
+. ./common/parent
+
+# Modify as appropriate
+_supported_fs xfs
+_require_scratch
+_require_xfs_sysfs debug/larp
+_require_xfs_parent
+_require_xfs_io_command "parent"
+
+# real QA test starts here
+
+# Create a directory tree using a protofile and
+# make sure all inodes created have parent pointers
+
+protofile=$tmp.proto
+
+cat >$protofile <<EOF
+DUMMY1
+0 0
+: root directory
+d--777 3 1
+: a directory
+testfolder1 d--755 3 1
+file1 ---755 3 1 /dev/null
+: done
+$
+EOF
+
+_scratch_mkfs -f -n parent=1 -p $protofile >>$seqres.full 2>&1 \
+	|| _fail "mkfs failed"
+_check_scratch_fs
+
+_scratch_mount >>$seqres.full 2>&1 \
+	|| _fail "mount failed"
+
+testfolder1="testfolder1"
+file1="file1"
+file1_ln="file1_link"
+
+echo ""
+# Multi link parent pointer test
+NLINKS=100
+for (( j=0; j<$NLINKS; j++ )); do
+	ln $SCRATCH_MNT/$testfolder1/$file1 $SCRATCH_MNT/$testfolder1/$file1_ln.$j
+	_verify_parent "$testfolder1" "$file1_ln.$j" "$testfolder1/$file1"
+	_verify_parent "$testfolder1" "$file1" "$testfolder1/$file1_ln.$j"
+done
+# Multi unlink parent pointer test
+for (( j=$NLINKS-1; j<=0; j-- )); do
+	ino="$(stat -c '%i' $SCRATCH_MNT/$testfolder1/$file1_ln.$j)"
+	rm $SCRATCH_MNT/$testfolder1/$file1_ln.$j
+	_verify_no_parent "$file1_ln.$j" "$ino" "$testfolder1/$file1"
+done
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/852.out b/tests/xfs/852.out
new file mode 100644
index 0000000000..9cc4b354ad
--- /dev/null
+++ b/tests/xfs/852.out
@@ -0,0 +1,1002 @@
+QA output created by 852
+
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.0 OK
+*** Verified parent pointer: name:file1_link.0, namelen:12
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.0 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.0
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.1 OK
+*** Verified parent pointer: name:file1_link.1, namelen:12
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.1 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.1
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.2 OK
+*** Verified parent pointer: name:file1_link.2, namelen:12
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.2 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.2
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.3 OK
+*** Verified parent pointer: name:file1_link.3, namelen:12
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.3 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.3
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.4 OK
+*** Verified parent pointer: name:file1_link.4, namelen:12
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.4 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.4
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.5 OK
+*** Verified parent pointer: name:file1_link.5, namelen:12
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.5 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.5
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.6 OK
+*** Verified parent pointer: name:file1_link.6, namelen:12
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.6 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.6
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.7 OK
+*** Verified parent pointer: name:file1_link.7, namelen:12
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.7 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.7
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.8 OK
+*** Verified parent pointer: name:file1_link.8, namelen:12
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.8 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.8
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.9 OK
+*** Verified parent pointer: name:file1_link.9, namelen:12
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.9 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.9
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.10 OK
+*** Verified parent pointer: name:file1_link.10, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.10 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.10
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.11 OK
+*** Verified parent pointer: name:file1_link.11, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.11 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.11
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.12 OK
+*** Verified parent pointer: name:file1_link.12, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.12 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.12
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.13 OK
+*** Verified parent pointer: name:file1_link.13, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.13 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.13
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.14 OK
+*** Verified parent pointer: name:file1_link.14, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.14 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.14
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.15 OK
+*** Verified parent pointer: name:file1_link.15, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.15 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.15
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.16 OK
+*** Verified parent pointer: name:file1_link.16, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.16 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.16
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.17 OK
+*** Verified parent pointer: name:file1_link.17, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.17 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.17
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.18 OK
+*** Verified parent pointer: name:file1_link.18, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.18 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.18
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.19 OK
+*** Verified parent pointer: name:file1_link.19, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.19 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.19
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.20 OK
+*** Verified parent pointer: name:file1_link.20, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.20 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.20
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.21 OK
+*** Verified parent pointer: name:file1_link.21, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.21 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.21
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.22 OK
+*** Verified parent pointer: name:file1_link.22, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.22 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.22
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.23 OK
+*** Verified parent pointer: name:file1_link.23, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.23 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.23
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.24 OK
+*** Verified parent pointer: name:file1_link.24, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.24 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.24
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.25 OK
+*** Verified parent pointer: name:file1_link.25, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.25 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.25
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.26 OK
+*** Verified parent pointer: name:file1_link.26, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.26 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.26
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.27 OK
+*** Verified parent pointer: name:file1_link.27, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.27 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.27
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.28 OK
+*** Verified parent pointer: name:file1_link.28, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.28 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.28
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.29 OK
+*** Verified parent pointer: name:file1_link.29, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.29 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.29
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.30 OK
+*** Verified parent pointer: name:file1_link.30, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.30 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.30
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.31 OK
+*** Verified parent pointer: name:file1_link.31, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.31 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.31
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.32 OK
+*** Verified parent pointer: name:file1_link.32, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.32 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.32
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.33 OK
+*** Verified parent pointer: name:file1_link.33, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.33 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.33
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.34 OK
+*** Verified parent pointer: name:file1_link.34, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.34 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.34
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.35 OK
+*** Verified parent pointer: name:file1_link.35, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.35 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.35
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.36 OK
+*** Verified parent pointer: name:file1_link.36, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.36 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.36
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.37 OK
+*** Verified parent pointer: name:file1_link.37, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.37 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.37
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.38 OK
+*** Verified parent pointer: name:file1_link.38, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.38 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.38
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.39 OK
+*** Verified parent pointer: name:file1_link.39, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.39 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.39
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.40 OK
+*** Verified parent pointer: name:file1_link.40, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.40 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.40
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.41 OK
+*** Verified parent pointer: name:file1_link.41, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.41 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.41
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.42 OK
+*** Verified parent pointer: name:file1_link.42, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.42 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.42
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.43 OK
+*** Verified parent pointer: name:file1_link.43, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.43 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.43
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.44 OK
+*** Verified parent pointer: name:file1_link.44, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.44 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.44
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.45 OK
+*** Verified parent pointer: name:file1_link.45, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.45 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.45
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.46 OK
+*** Verified parent pointer: name:file1_link.46, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.46 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.46
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.47 OK
+*** Verified parent pointer: name:file1_link.47, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.47 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.47
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.48 OK
+*** Verified parent pointer: name:file1_link.48, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.48 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.48
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.49 OK
+*** Verified parent pointer: name:file1_link.49, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.49 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.49
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.50 OK
+*** Verified parent pointer: name:file1_link.50, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.50 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.50
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.51 OK
+*** Verified parent pointer: name:file1_link.51, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.51 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.51
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.52 OK
+*** Verified parent pointer: name:file1_link.52, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.52 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.52
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.53 OK
+*** Verified parent pointer: name:file1_link.53, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.53 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.53
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.54 OK
+*** Verified parent pointer: name:file1_link.54, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.54 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.54
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.55 OK
+*** Verified parent pointer: name:file1_link.55, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.55 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.55
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.56 OK
+*** Verified parent pointer: name:file1_link.56, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.56 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.56
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.57 OK
+*** Verified parent pointer: name:file1_link.57, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.57 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.57
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.58 OK
+*** Verified parent pointer: name:file1_link.58, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.58 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.58
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.59 OK
+*** Verified parent pointer: name:file1_link.59, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.59 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.59
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.60 OK
+*** Verified parent pointer: name:file1_link.60, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.60 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.60
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.61 OK
+*** Verified parent pointer: name:file1_link.61, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.61 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.61
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.62 OK
+*** Verified parent pointer: name:file1_link.62, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.62 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.62
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.63 OK
+*** Verified parent pointer: name:file1_link.63, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.63 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.63
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.64 OK
+*** Verified parent pointer: name:file1_link.64, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.64 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.64
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.65 OK
+*** Verified parent pointer: name:file1_link.65, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.65 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.65
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.66 OK
+*** Verified parent pointer: name:file1_link.66, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.66 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.66
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.67 OK
+*** Verified parent pointer: name:file1_link.67, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.67 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.67
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.68 OK
+*** Verified parent pointer: name:file1_link.68, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.68 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.68
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.69 OK
+*** Verified parent pointer: name:file1_link.69, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.69 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.69
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.70 OK
+*** Verified parent pointer: name:file1_link.70, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.70 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.70
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.71 OK
+*** Verified parent pointer: name:file1_link.71, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.71 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.71
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.72 OK
+*** Verified parent pointer: name:file1_link.72, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.72 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.72
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.73 OK
+*** Verified parent pointer: name:file1_link.73, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.73 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.73
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.74 OK
+*** Verified parent pointer: name:file1_link.74, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.74 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.74
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.75 OK
+*** Verified parent pointer: name:file1_link.75, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.75 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.75
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.76 OK
+*** Verified parent pointer: name:file1_link.76, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.76 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.76
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.77 OK
+*** Verified parent pointer: name:file1_link.77, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.77 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.77
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.78 OK
+*** Verified parent pointer: name:file1_link.78, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.78 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.78
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.79 OK
+*** Verified parent pointer: name:file1_link.79, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.79 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.79
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.80 OK
+*** Verified parent pointer: name:file1_link.80, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.80 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.80
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.81 OK
+*** Verified parent pointer: name:file1_link.81, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.81 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.81
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.82 OK
+*** Verified parent pointer: name:file1_link.82, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.82 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.82
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.83 OK
+*** Verified parent pointer: name:file1_link.83, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.83 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.83
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.84 OK
+*** Verified parent pointer: name:file1_link.84, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.84 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.84
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.85 OK
+*** Verified parent pointer: name:file1_link.85, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.85 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.85
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.86 OK
+*** Verified parent pointer: name:file1_link.86, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.86 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.86
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.87 OK
+*** Verified parent pointer: name:file1_link.87, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.87 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.87
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.88 OK
+*** Verified parent pointer: name:file1_link.88, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.88 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.88
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.89 OK
+*** Verified parent pointer: name:file1_link.89, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.89 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.89
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.90 OK
+*** Verified parent pointer: name:file1_link.90, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.90 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.90
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.91 OK
+*** Verified parent pointer: name:file1_link.91, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.91 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.91
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.92 OK
+*** Verified parent pointer: name:file1_link.92, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.92 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.92
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.93 OK
+*** Verified parent pointer: name:file1_link.93, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.93 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.93
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.94 OK
+*** Verified parent pointer: name:file1_link.94, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.94 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.94
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.95 OK
+*** Verified parent pointer: name:file1_link.95, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.95 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.95
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.96 OK
+*** Verified parent pointer: name:file1_link.96, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.96 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.96
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.97 OK
+*** Verified parent pointer: name:file1_link.97, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.97 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.97
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.98 OK
+*** Verified parent pointer: name:file1_link.98, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.98 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.98
+*** testfolder1 OK
+*** testfolder1/file1 OK
+*** testfolder1/file1_link.99 OK
+*** Verified parent pointer: name:file1_link.99, namelen:13
+*** Parent pointer OK for child testfolder1/file1
+*** testfolder1 OK
+*** testfolder1/file1_link.99 OK
+*** testfolder1/file1 OK
+*** Verified parent pointer: name:file1, namelen:5
+*** Parent pointer OK for child testfolder1/file1_link.99


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 10/14] xfs: add parent pointer inject test
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
                     ` (8 preceding siblings ...)
  2023-02-16 21:15   ` [PATCH 09/14] xfs: add multi link " Darrick J. Wong
@ 2023-02-16 21:16   ` Darrick J. Wong
  2023-02-16 21:16   ` [PATCH 11/14] common/parent: add license and copyright Darrick J. Wong
                     ` (3 subsequent siblings)
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:16 UTC (permalink / raw)
  To: djwong, zlang
  Cc: Allison Henderson, Catherine Hoang, linux-xfs, fstests, guan

From: Allison Henderson <allison.henderson@oracle.com>

Add a test to verify parent pointers after an error injection and log
replay.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Signed-off-by: Catherine Hoang <catherine.hoang@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/853     |   85 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 tests/xfs/853.out |   14 +++++++++
 2 files changed, 99 insertions(+)
 create mode 100755 tests/xfs/853
 create mode 100644 tests/xfs/853.out


diff --git a/tests/xfs/853 b/tests/xfs/853
new file mode 100755
index 0000000000..f17f4b7e9e
--- /dev/null
+++ b/tests/xfs/853
@@ -0,0 +1,85 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022, Oracle and/or its affiliates.  All Rights Reserved.
+#
+# FS QA Test 853
+#
+# Verify that parent pointers survive an error injection and log replay.
+#
+. ./common/preamble
+_begin_fstest auto quick parent
+
+# get standard environment, filters and checks
+. ./common/filter
+. ./common/inject
+. ./common/parent
+
+# Test requirements
+_supported_fs xfs
+_require_scratch
+_require_xfs_sysfs debug/larp
+_require_xfs_io_error_injection "larp"
+_require_xfs_parent
+_require_xfs_io_command "parent"
+
+# real QA test starts here
+
+# Create a directory tree using a protofile and
+# make sure all inodes created have parent pointers
+
+protofile=$tmp.proto
+
+cat >$protofile <<EOF
+DUMMY1
+0 0
+: root directory
+d--777 3 1
+: a directory
+testfolder1 d--755 3 1
+$
+: back in the root
+testfolder2 d--755 3 1
+: done
+$
+EOF
+
+_scratch_mkfs -f -n parent=1 -p $protofile >>$seqres.full 2>&1 \
+	|| _fail "mkfs failed"
+_check_scratch_fs
+
+_scratch_mount >>$seqres.full 2>&1 \
+	|| _fail "mount failed"
+
+testfolder1="testfolder1"
+testfolder2="testfolder2"
+file4="file4"
+file5="file5"
+
+echo ""
+
+# Create files
+touch $SCRATCH_MNT/$testfolder1/$file4
+_verify_parent "$testfolder1" "$file4" "$testfolder1/$file4"
+
+# Inject error
+_scratch_inject_error "larp"
+
+# Move files
+mv $SCRATCH_MNT/$testfolder1/$file4 $SCRATCH_MNT/$testfolder2/$file5 2>&1 \
+	| _filter_scratch
+
+# FS should be shut down, touch will fail
+touch $SCRATCH_MNT/$testfolder2/$file5 2>&1 | _filter_scratch
+
+# Remount to replay log
+_scratch_remount_dump_log >> $seqres.full
+
+# FS should be online, touch should succeed
+touch $SCRATCH_MNT/$testfolder2/$file5
+
+# Check files again
+_verify_parent "$testfolder2" "$file5" "$testfolder2/$file5"
+
+# success, all done
+status=0
+exit
diff --git a/tests/xfs/853.out b/tests/xfs/853.out
new file mode 100644
index 0000000000..56247c1434
--- /dev/null
+++ b/tests/xfs/853.out
@@ -0,0 +1,14 @@
+QA output created by 853
+
+*** testfolder1 OK
+*** testfolder1/file4 OK
+*** testfolder1/file4 OK
+*** Verified parent pointer: name:file4, namelen:5
+*** Parent pointer OK for child testfolder1/file4
+mv: cannot stat 'SCRATCH_MNT/testfolder1/file4': Input/output error
+touch: cannot touch 'SCRATCH_MNT/testfolder2/file5': Input/output error
+*** testfolder2 OK
+*** testfolder2/file5 OK
+*** testfolder2/file5 OK
+*** Verified parent pointer: name:file5, namelen:5
+*** Parent pointer OK for child testfolder2/file5



* [PATCH 11/14] common/parent: add license and copyright
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
                     ` (9 preceding siblings ...)
  2023-02-16 21:16   ` [PATCH 10/14] xfs: add parent pointer inject test Darrick J. Wong
@ 2023-02-16 21:16   ` Darrick J. Wong
  2023-02-16 21:16   ` [PATCH 12/14] common/parent: don't _fail on missing parent pointer components Darrick J. Wong
                     ` (2 subsequent siblings)
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:16 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Add the necessary licensing and copyright tags to the new file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 common/parent |    3 +++
 1 file changed, 3 insertions(+)


diff --git a/common/parent b/common/parent
index a0ba7d927a..a734a8017d 100644
--- a/common/parent
+++ b/common/parent
@@ -1,3 +1,6 @@
+#! /bin/bash
+# SPDX-License-Identifier: GPL-2.0
+# Copyright (c) 2022, Oracle and/or its affiliates.  All Rights Reserved.
 #
 # Parent pointer common functions
 #



* [PATCH 12/14] common/parent: don't _fail on missing parent pointer components
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
                     ` (10 preceding siblings ...)
  2023-02-16 21:16   ` [PATCH 11/14] common/parent: add license and copyright Darrick J. Wong
@ 2023-02-16 21:16   ` Darrick J. Wong
  2023-02-16 21:16   ` [PATCH 13/14] common/parent: check xfs_io parent command paths Darrick J. Wong
  2023-02-16 21:17   ` [PATCH 14/14] xfs/851: test xfs_io parent -p too Darrick J. Wong
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:16 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Use echo instead of _fail here so that we run as much of the test as
possible.  There's no need to stop the test immediately even if the pptr
code isn't working.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
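For illustration only (not part of this patch): fstests decides pass or
fail by comparing a test's output against its golden .out file, so a
stray diagnostic line still fails the test while letting later checks
run.  A minimal sketch of that effect, assuming a hypothetical helper
check_exists and a throwaway temp directory:

```shell
#!/bin/bash
# Hypothetical check helper in the style of _verify_parent: report a problem
# on stdout and keep going instead of aborting the whole test.
check_exists() {
	if [ ! -e "$1" ]; then
		echo "$1 not found"
	else
		echo "*** $(basename "$1") OK"
	fi
}

workdir=$(mktemp -d)
touch "$workdir/file4"		# file5 is deliberately missing

# What a fully passing run would have printed.
golden=$(printf '%s\n' '*** file4 OK' '*** file5 OK')

# Both checks run even though the second one finds a problem.
actual=$(check_exists "$workdir/file4"; check_exists "$workdir/file5")

# The stray "not found" line makes the golden-output comparison fail.
if [ "$actual" = "$golden" ]; then
	verdict=pass
else
	verdict=fail
fi
echo "$actual"
echo "verdict: $verdict"
rm -rf "$workdir"
```

With _fail in place of echo, the run would have stopped at the first
missing component and the remaining checks would never have reported.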
 common/parent |   16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)


diff --git a/common/parent b/common/parent
index a734a8017d..7e63765d56 100644
--- a/common/parent
+++ b/common/parent
@@ -105,21 +105,21 @@ _verify_parent()
 
 	# Verify parent exists
 	if [ ! -d $SCRATCH_MNT/$parent_path ]; then
-		_fail "$SCRATCH_MNT/$parent_path not found"
+		echo "$SCRATCH_MNT/$parent_path not found"
 	else
 		echo "*** $parent_path OK"
 	fi
 
 	# Verify child exists
 	if [ ! -f $SCRATCH_MNT/$child_path ]; then
-		_fail "$SCRATCH_MNT/$child_path not found"
+		echo "$SCRATCH_MNT/$child_path not found"
 	else
 		echo "*** $child_path OK"
 	fi
 
 	# Verify the parent pointer name exists as a child of the parent
 	if [ ! -f $SCRATCH_MNT/$parent_ppath ]; then
-		_fail "$SCRATCH_MNT/$parent_ppath not found"
+		echo "$SCRATCH_MNT/$parent_ppath not found"
 	else
 		echo "*** $parent_ppath OK"
 	fi
@@ -132,7 +132,7 @@ _verify_parent()
 	parents=($($XFS_IO_PROG -x -c \
 	 "parent -f -i $pino -n $parent_pointer_name" $SCRATCH_MNT/$child_path))
 	if [[ $? != 0 ]]; then
-		 _fail "No parent pointers found for $child_path"
+		 echo "No parent pointers found for $child_path"
 	fi
 
 	# Parse parent pointer output.
@@ -141,7 +141,7 @@ _verify_parent()
 
 	# If we didnt find one, bail out
 	if [ $? -ne 0 ]; then
-		_fail "No parent pointer record found for $parent_path"\
+		echo "No parent pointer record found for $parent_path"\
 			"in $child_path"
 	fi
 
@@ -150,7 +150,7 @@ _verify_parent()
 	pppino="$(stat -c '%i' $SCRATCH_MNT/$parent_ppath)"
 	if [ $cino -ne $pppino ]
 	then
-		_fail "Bad parent pointer name value for $child_path."\
+		echo "Bad parent pointer name value for $child_path."\
 			"$SCRATCH_MNT/$parent_ppath belongs to inode $PPPINO,"\
 			"but should be $cino"
 	fi
@@ -174,7 +174,7 @@ _verify_no_parent()
 
 	# Verify child exists
 	if [ ! -f $SCRATCH_MNT/$child_path ]; then
-		_fail "$SCRATCH_MNT/$child_path not found"
+		echo "$SCRATCH_MNT/$child_path not found"
 	else
 		echo "*** $child_path OK"
 	fi
@@ -195,7 +195,7 @@ _verify_no_parent()
 		return 0
 	fi
 
-	_fail "Parent pointer entry found where none should:"\
+	echo "Parent pointer entry found where none should:"\
 			"inode:$PPINO, gen:$PPGEN,"
 			"name:$PPNAME, namelen:$PPNAME_LEN"
 }



* [PATCH 13/14] common/parent: check xfs_io parent command paths
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
                     ` (11 preceding siblings ...)
  2023-02-16 21:16   ` [PATCH 12/14] common/parent: don't _fail on missing parent pointer components Darrick J. Wong
@ 2023-02-16 21:16   ` Darrick J. Wong
  2023-02-16 21:17   ` [PATCH 14/14] xfs/851: test xfs_io parent -p too Darrick J. Wong
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:16 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Make sure that the paths returned by the xfs_io parent command actually
point to the same file.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
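For illustration only (not part of this patch): the check relies on
"test FILE1 -ef FILE2", which is true when both paths refer to the same
device and inode.  A minimal sketch on a throwaway directory follows.
The helper name same_file_report and the file names are hypothetical,
and the printf input stands in for the paths that the xfs_io parent -p
command would print:

```shell
#!/bin/bash
# Hypothetical helper: read candidate paths on stdin and complain about any
# that do not resolve to the same inode as the target.
same_file_report() {
	local tgt="$1"
	local path
	while read -r path; do
		test "$tgt" -ef "$path" || \
			echo "$tgt parent pointer $path should be the same file"
	done
}

workdir=$(mktemp -d)
mkdir "$workdir/dira" "$workdir/dirb"
touch "$workdir/gorn" "$workdir/other"
ln "$workdir/gorn" "$workdir/dira/file1"
ln "$workdir/gorn" "$workdir/dirb/file1"

# All three names are hard links to one inode, so nothing is reported.
links_out=$(printf '%s\n' "$workdir/gorn" "$workdir/dira/file1" \
	"$workdir/dirb/file1" | same_file_report "$workdir/gorn")

# An unrelated file is a different inode and gets reported.
other_out=$(printf '%s\n' "$workdir/other" | same_file_report "$workdir/gorn")

echo "links: '$links_out'"
echo "other: '$other_out'"
rm -rf "$workdir"
```

Comparing with -ef rather than comparing path strings means any hard
link name is accepted, which is the property the new check depends on.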
 common/parent |    8 ++++++++
 1 file changed, 8 insertions(+)


diff --git a/common/parent b/common/parent
index 7e63765d56..96547727d9 100644
--- a/common/parent
+++ b/common/parent
@@ -155,6 +155,14 @@ _verify_parent()
 			"but should be $cino"
 	fi
 
+	# Make sure path printing works by checking that the paths returned
+	# all point to the same inode.
+	local tgt="$SCRATCH_MNT/$child_path"
+	$XFS_IO_PROG -x -c 'parent -p' "$tgt" | while read pptr_path; do
+		test "$tgt" -ef "$pptr_path" || \
+			echo "$tgt parent pointer $pptr_path should be the same file"
+	done
+
 	echo "*** Verified parent pointer:"\
 			"name:$PPNAME, namelen:$PPNAME_LEN"
 	echo "*** Parent pointer OK for child $child_path"



* [PATCH 14/14] xfs/851: test xfs_io parent -p too
  2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
                     ` (12 preceding siblings ...)
  2023-02-16 21:16   ` [PATCH 13/14] common/parent: check xfs_io parent command paths Darrick J. Wong
@ 2023-02-16 21:17   ` Darrick J. Wong
  13 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:17 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Test the -p argument to the xfs_io parent command too.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/851     |   15 +++++++++++++++
 tests/xfs/851.out |   10 ++++++++++
 2 files changed, 25 insertions(+)


diff --git a/tests/xfs/851 b/tests/xfs/851
index 27870ec05a..8233c1563c 100755
--- a/tests/xfs/851
+++ b/tests/xfs/851
@@ -12,6 +12,7 @@ _begin_fstest auto quick parent
 
 # get standard environment, filters and checks
 . ./common/parent
+. ./common/filter
 
 # Modify as appropriate
 _supported_fs xfs
@@ -96,6 +97,20 @@ ino="$(stat -c '%i' $SCRATCH_MNT/$testfolder2/$file3)"
 mv -f $SCRATCH_MNT/$testfolder2/$file3 $SCRATCH_MNT/$testfolder1/$file2
 _verify_parent "$testfolder1" "$file2" "$testfolder1/$file2"
 
+# Make sure that parent -p filtering works
+mkdir -p $SCRATCH_MNT/dira/ $SCRATCH_MNT/dirb/
+dira_inum=$(stat -c '%i' $SCRATCH_MNT/dira)
+dirb_inum=$(stat -c '%i' $SCRATCH_MNT/dirb)
+touch $SCRATCH_MNT/gorn
+ln $SCRATCH_MNT/gorn $SCRATCH_MNT/dira/file1
+ln $SCRATCH_MNT/gorn $SCRATCH_MNT/dirb/file1
+echo look for both
+$XFS_IO_PROG -c 'parent -p' $SCRATCH_MNT/gorn | _filter_scratch
+echo look for dira
+$XFS_IO_PROG -c 'parent -p -n dira' -c "parent -p -i $dira_inum" $SCRATCH_MNT/gorn | _filter_scratch
+echo look for dirb
+$XFS_IO_PROG -c 'parent -p -n dirb' -c "parent -p -i $dirb_inum" $SCRATCH_MNT/gorn | _filter_scratch
+
 # success, all done
 status=0
 exit
diff --git a/tests/xfs/851.out b/tests/xfs/851.out
index c375ba5f00..f44d3e5d4f 100644
--- a/tests/xfs/851.out
+++ b/tests/xfs/851.out
@@ -57,3 +57,13 @@ QA output created by 851
 *** testfolder1/file2 OK
 *** Verified parent pointer: name:file2, namelen:5
 *** Parent pointer OK for child testfolder1/file2
+look for both
+SCRATCH_MNT/gorn
+SCRATCH_MNT/dira/file1
+SCRATCH_MNT/dirb/file1
+look for dira
+SCRATCH_MNT/dira/file1
+SCRATCH_MNT/dira/file1
+look for dirb
+SCRATCH_MNT/dirb/file1
+SCRATCH_MNT/dirb/file1



* [PATCH 1/4] misc: adjust for parent pointers with namehashes
  2023-02-16 20:32 ` [PATCHSET v9r2 0/4] fstests: encode parent pointer name in xattr key Darrick J. Wong
@ 2023-02-16 21:17   ` Darrick J. Wong
  2023-02-16 21:17   ` [PATCH 2/4] xfs/021: adjust for short parent pointers with hashes Darrick J. Wong
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:17 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/021.out.parent |    8 ++++----
 tests/xfs/122.out        |    4 ++--
 2 files changed, 6 insertions(+), 6 deletions(-)


diff --git a/tests/xfs/021.out.parent b/tests/xfs/021.out.parent
index 661d130239..e7ed72fc27 100644
--- a/tests/xfs/021.out.parent
+++ b/tests/xfs/021.out.parent
@@ -19,9 +19,9 @@ size of attr value = 65536
 
 *** unmount FS
 *** dump attributes (1)
-a.sfattr.hdr.totsize = 53
+a.sfattr.hdr.totsize = 113
 a.sfattr.hdr.count = 3
-a.sfattr.list[0].namelen = 16
+a.sfattr.list[0].namelen = 76
 a.sfattr.list[0].valuelen = 10
 a.sfattr.list[0].root = 0
 a.sfattr.list[0].value = "testfile.1"
@@ -40,7 +40,7 @@ hdr.info.forw = 0
 hdr.info.back = 0
 hdr.info.magic = 0xfbee
 hdr.count = 4
-hdr.usedbytes = 84
+hdr.usedbytes = 144
 hdr.firstused = FIRSTUSED
 hdr.holes = 0
 hdr.freemap[0-2] = [base,size] [FREEMAP..]
@@ -54,7 +54,7 @@ nvlist[1].valuelen = 65535
 nvlist[1].namelen = 2
 nvlist[1].name = "a3"
 nvlist[2].valuelen = 10
-nvlist[2].namelen = 16
+nvlist[2].namelen = 76
 nvlist[2].value = "testfile.2"
 nvlist[3].valuelen = 8
 nvlist[3].namelen = 7
diff --git a/tests/xfs/122.out b/tests/xfs/122.out
index c5958d1b99..97be93274e 100644
--- a/tests/xfs/122.out
+++ b/tests/xfs/122.out
@@ -109,8 +109,8 @@ sizeof(struct xfs_legacy_timestamp) = 8
 sizeof(struct xfs_log_dinode) = 176
 sizeof(struct xfs_log_legacy_timestamp) = 8
 sizeof(struct xfs_map_extent) = 32
-sizeof(struct xfs_parent_name_irec) = 32
-sizeof(struct xfs_parent_name_rec) = 16
+sizeof(struct xfs_parent_name_irec) = 96
+sizeof(struct xfs_parent_name_rec) = 76
 sizeof(struct xfs_parent_ptr) = 280
 sizeof(struct xfs_phys_extent) = 16
 sizeof(struct xfs_pptr_info) = 104



* [PATCH 2/4] xfs/021: adjust for short parent pointers with hashes
  2023-02-16 20:32 ` [PATCHSET v9r2 0/4] fstests: encode parent pointer name in xattr key Darrick J. Wong
  2023-02-16 21:17   ` [PATCH 1/4] misc: adjust for parent pointers with namehashes Darrick J. Wong
@ 2023-02-16 21:17   ` Darrick J. Wong
  2023-02-16 21:18   ` [PATCH 3/4] xfs/242: fix _filter_bmap for xfs_io bmap that does rt file properly Darrick J. Wong
  2023-02-16 21:18   ` [PATCH 4/4] xfs/021: adjust for short valuelens Darrick J. Wong
  3 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:17 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Adjust this again to handle shortened namehashes.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
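A back-of-the-envelope check of the golden output changes below.  This
assumes the xattr name is the fixed 12-byte xfs_parent_name_rec followed
by either a 64-byte sha512 digest (before this patch) or the dirent name
itself (after it).  That layout is inferred from the sizes in 122.out
and 021.out.parent and is not spelled out in this patch:

```shell
#!/bin/bash
rec_size=12        # sizeof(struct xfs_parent_name_rec) per 122.out
sha512_len=64      # a sha512 digest is always 64 bytes
name="testfile.1"  # 10-byte dirent name seen in 021.out.parent

# Assumed layout: fixed record followed by digest or by the name itself.
hashed_namelen=$((rec_size + sha512_len))
short_namelen=$((rec_size + ${#name}))
saved=$((hashed_namelen - short_namelen))

echo "hashed namelen: $hashed_namelen"
echo "short namelen:  $short_namelen"
echo "bytes saved per shortform attr: $saved"
```

Under those assumptions the namelen change from 76 to 22 and the
totsize change from 113 to 59 (a difference of 54) both fall out of the
same subtraction.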
 tests/xfs/021.out.parent |   20 ++++++++++----------
 tests/xfs/122.out        |    2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)


diff --git a/tests/xfs/021.out.parent b/tests/xfs/021.out.parent
index e7ed72fc27..837b17ffdf 100644
--- a/tests/xfs/021.out.parent
+++ b/tests/xfs/021.out.parent
@@ -19,9 +19,9 @@ size of attr value = 65536
 
 *** unmount FS
 *** dump attributes (1)
-a.sfattr.hdr.totsize = 113
+a.sfattr.hdr.totsize = 59
 a.sfattr.hdr.count = 3
-a.sfattr.list[0].namelen = 76
+a.sfattr.list[0].namelen = 22
 a.sfattr.list[0].valuelen = 10
 a.sfattr.list[0].root = 0
 a.sfattr.list[0].value = "testfile.1"
@@ -40,7 +40,7 @@ hdr.info.forw = 0
 hdr.info.back = 0
 hdr.info.magic = 0xfbee
 hdr.count = 4
-hdr.usedbytes = 144
+hdr.usedbytes = 88
 hdr.firstused = FIRSTUSED
 hdr.holes = 0
 hdr.freemap[0-2] = [base,size] [FREEMAP..]
@@ -53,12 +53,12 @@ nvlist[1].valueblk = 0x1
 nvlist[1].valuelen = 65535
 nvlist[1].namelen = 2
 nvlist[1].name = "a3"
-nvlist[2].valuelen = 10
-nvlist[2].namelen = 76
-nvlist[2].value = "testfile.2"
-nvlist[3].valuelen = 8
-nvlist[3].namelen = 7
-nvlist[3].name = "a2-----"
-nvlist[3].value = "value_2\d"
+nvlist[2].valuelen = 8
+nvlist[2].namelen = 7
+nvlist[2].name = "a2-----"
+nvlist[2].value = "value_2\d"
+nvlist[3].valuelen = 10
+nvlist[3].namelen = 22
+nvlist[3].value = "testfile.2"
 *** done
 *** unmount
diff --git a/tests/xfs/122.out b/tests/xfs/122.out
index 97be93274e..5233aaad5f 100644
--- a/tests/xfs/122.out
+++ b/tests/xfs/122.out
@@ -110,7 +110,7 @@ sizeof(struct xfs_log_dinode) = 176
 sizeof(struct xfs_log_legacy_timestamp) = 8
 sizeof(struct xfs_map_extent) = 32
 sizeof(struct xfs_parent_name_irec) = 96
-sizeof(struct xfs_parent_name_rec) = 76
+sizeof(struct xfs_parent_name_rec) = 12
 sizeof(struct xfs_parent_ptr) = 280
 sizeof(struct xfs_phys_extent) = 16
 sizeof(struct xfs_pptr_info) = 104



* [PATCH 3/4] xfs/242: fix _filter_bmap for xfs_io bmap that does rt file properly
  2023-02-16 20:32 ` [PATCHSET v9r2 0/4] fstests: encode parent pointer name in xattr key Darrick J. Wong
  2023-02-16 21:17   ` [PATCH 1/4] misc: adjust for parent pointers with namehashes Darrick J. Wong
  2023-02-16 21:17   ` [PATCH 2/4] xfs/021: adjust for short parent pointers with hashes Darrick J. Wong
@ 2023-02-16 21:18   ` Darrick J. Wong
  2023-02-16 21:18   ` [PATCH 4/4] xfs/021: adjust for short valuelens Darrick J. Wong
  3 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:18 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

xfsprogs commit XXXXX ("xfs_io: fix bmap command not detecting realtime
files with xattrs") fixed the xfs_io bmap output to display realtime
file columns for realtime files with xattrs.  As a result, the data and
unwritten flags are in column 5 and not column 7.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
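A rough sketch of the idea, not the fstests helper itself: the flags
column is a five-digit string whose leading digit marks an unwritten
extent, per the comment table in common/punch.  The function name
classify_flags and the sample rows are made up for illustration; only
the two regexes are taken from the patch:

```shell
#!/bin/bash
# Hypothetical stand-in for the awk rules in _filter_bmap: take a column
# number and classify each input line by the 5-digit flags in that column.
classify_flags() {
	awk -v col="$1" '
		$col ~ /1[01][01][01][01]/ { print $1, $2, "unwritten"; next }
		$col ~ /0[01][01][01][01]/ { print $1, $2, "data"; next }
		{ print $1, $2, "unknown" }
	'
}

# Made-up rows with the flags in column 5 (rtdev-style layout).
rt_out=$(printf '%s\n' \
	'0: [0..7]: 96..103 8 10000' \
	'1: [8..15]: 104..111 8 00000' | classify_flags 5)

# Made-up rows with the flags in column 7 (datadev-style layout).
data_out=$(printf '%s\n' \
	'0: [0..7]: 96..103 0 (96..103) 8 10000' \
	'1: [8..15]: 104..111 0 (104..111) 8 00011' | classify_flags 7)

echo "$rt_out"
echo "$data_out"
```

The sketch takes the column as a parameter purely to keep the example
small; the patch instead tries column 5 and then column 7 in one awk
program.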
 common/punch |    8 ++++++++
 1 file changed, 8 insertions(+)


diff --git a/common/punch b/common/punch
index 3b8be21a2a..9e730404e2 100644
--- a/common/punch
+++ b/common/punch
@@ -188,6 +188,7 @@ _filter_hole_fiemap()
 	_coalesce_extents
 }
 
+# Column 7 for datadev files and column 5 for rtdev files
 #     10000 Unwritten preallocated extent
 #     01000 Doesn't begin on stripe unit
 #     00100 Doesn't end   on stripe unit
@@ -200,6 +201,13 @@ _filter_bmap()
 			print $1, $2, $3;
 			next;
 		}
+		$5 ~ /1[01][01][01][01]/ {
+			print $1, $2, "unwritten";
+			next;
+		}
+		$5 ~ /0[01][01][01][01]/ {
+			print $1, $2, "data"
+		}
 		$7 ~ /1[01][01][01][01]/ {
 			print $1, $2, "unwritten";
 			next;



* [PATCH 4/4] xfs/021: adjust for short valuelens
  2023-02-16 20:32 ` [PATCHSET v9r2 0/4] fstests: encode parent pointer name in xattr key Darrick J. Wong
                     ` (2 preceding siblings ...)
  2023-02-16 21:18   ` [PATCH 3/4] xfs/242: fix _filter_bmap for xfs_io bmap that does rt file properly Darrick J. Wong
@ 2023-02-16 21:18   ` Darrick J. Wong
  3 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:18 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/021.out.parent |   10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)


diff --git a/tests/xfs/021.out.parent b/tests/xfs/021.out.parent
index 837b17ffdf..c43dd15900 100644
--- a/tests/xfs/021.out.parent
+++ b/tests/xfs/021.out.parent
@@ -19,12 +19,11 @@ size of attr value = 65536
 
 *** unmount FS
 *** dump attributes (1)
-a.sfattr.hdr.totsize = 59
+a.sfattr.hdr.totsize = 49
 a.sfattr.hdr.count = 3
 a.sfattr.list[0].namelen = 22
-a.sfattr.list[0].valuelen = 10
+a.sfattr.list[0].valuelen = 0
 a.sfattr.list[0].root = 0
-a.sfattr.list[0].value = "testfile.1"
 a.sfattr.list[1].namelen = 2
 a.sfattr.list[1].valuelen = 3
 a.sfattr.list[1].root = 0
@@ -40,7 +39,7 @@ hdr.info.forw = 0
 hdr.info.back = 0
 hdr.info.magic = 0xfbee
 hdr.count = 4
-hdr.usedbytes = 88
+hdr.usedbytes = 80
 hdr.firstused = FIRSTUSED
 hdr.holes = 0
 hdr.freemap[0-2] = [base,size] [FREEMAP..]
@@ -57,8 +56,7 @@ nvlist[2].valuelen = 8
 nvlist[2].namelen = 7
 nvlist[2].name = "a2-----"
 nvlist[2].value = "value_2\d"
-nvlist[3].valuelen = 10
+nvlist[3].valuelen = 0
 nvlist[3].namelen = 22
-nvlist[3].value = "testfile.2"
 *** done
 *** unmount



* [PATCH 1/1] xfs/122: adjust for flex-array XFS_IOC_GETPARENTS ioctl
  2023-02-16 20:32 ` [PATCHSET v9r2 0/1] fstests: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
@ 2023-02-16 21:18   ` Darrick J. Wong
  0 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-16 21:18 UTC (permalink / raw)
  To: djwong, zlang; +Cc: linux-xfs, fstests, guan

From: Darrick J. Wong <djwong@kernel.org>

Adjust the values here for the flex-array based GETPARENTS structure
definitions.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 tests/xfs/122.out |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)


diff --git a/tests/xfs/122.out b/tests/xfs/122.out
index 5233aaad5f..fe67a0206d 100644
--- a/tests/xfs/122.out
+++ b/tests/xfs/122.out
@@ -100,6 +100,8 @@ sizeof(struct xfs_fsop_ag_resblks) = 64
 sizeof(struct xfs_fsop_geom) = 256
 sizeof(struct xfs_fsop_geom_v1) = 112
 sizeof(struct xfs_fsop_geom_v4) = 112
+sizeof(struct xfs_getparents) = 96
+sizeof(struct xfs_getparents_rec) = 24
 sizeof(struct xfs_icreate_log) = 28
 sizeof(struct xfs_inode_log_format) = 56
 sizeof(struct xfs_inode_log_format_32) = 52
@@ -111,9 +113,7 @@ sizeof(struct xfs_log_legacy_timestamp) = 8
 sizeof(struct xfs_map_extent) = 32
 sizeof(struct xfs_parent_name_irec) = 96
 sizeof(struct xfs_parent_name_rec) = 12
-sizeof(struct xfs_parent_ptr) = 280
 sizeof(struct xfs_phys_extent) = 16
-sizeof(struct xfs_pptr_info) = 104
 sizeof(struct xfs_refcount_key) = 4
 sizeof(struct xfs_refcount_rec) = 12
 sizeof(struct xfs_rmap_key) = 20



* Re: [RFC DELUGE v9r2d1] xfs: Parent Pointers
  2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
                   ` (24 preceding siblings ...)
  2023-02-16 20:32 ` [PATCHSET v9r2 0/1] fstests: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
@ 2023-02-17 20:02 ` Allison Henderson
  2023-02-24  2:51   ` Darrick J. Wong
  25 siblings, 1 reply; 227+ messages in thread
From: Allison Henderson @ 2023-02-17 20:02 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

On Thu, 2023-02-16 at 12:06 -0800, Darrick J. Wong wrote:
> Hi everyone,
> 
> This deluge contains all of the additions to the parent pointers
> patchset that I've been working on for the past month.  The kernel
> and
> xfsprogs patchsets are based on Allison's v9r2 tag from last week;
> the fstests patches are merely a part of my development tree.  To
> recap
> Allison's cover letter:
> 
> "The goal of this patch set is to add a parent pointer attribute to
> each
> inode.  The attribute name containing the parent inode, generation,
> and
> directory offset, while the  attribute value contains the file name.
> This feature will enable future optimizations for online scrub,
> shrink,
> nfs handles, verity, or any other feature that could make use of
> quickly
> deriving an inodes path from the mount point."
> 
> The kernel branches start with a number of buf fixes that I need to
> get
> fstests to pass.  I also restructured the kernel implementation of
> GETPARENTS to cut the memory usage considerably.
> 
> For userspace, I cleaned up the xfsprogs patches so that libxfs-diff
> shows no discrepancies with the kernel and cleaned up the parent
> pointer
> usage code that I prototyped in 2017 so that it's less buggy and
> moldy.
> I also rewired xfs_scrub to use GETPARENTS to report file paths of
> corrupt files instead of inode numbers, since that part had bitrotted
> badly.
> 
> With that out of the way, I implemented a prototype of online repairs
> for directories and parent pointers.  This is only a proof of
> concept,
> because I had already backported many many patches from part 1 of
> online
> repair, and didn't feel like porting the parts needed to commit new
> structures atomically and reap the old dir/xattr blocks.  IOWs, the
> prototype scans the filesystem to build a parallel directory or xattr
> structure, and then reports on any discrepancies between the two
> versions.  Obviously this won't fix a corrupt directory tree, but it
> enables us to test the repair code on a consistent filesystem to
> demonstrate that it works.
> 
> Next, I implemented fully functional parent pointer checking and
> repair
> for xfs_repair.  This was less hard than I guessed it would be
> because
> the current design of phase 6 includes a walk of all directories. 
> From
> the dirent data, we can build a per-AG index of all the parent
> pointers
> for all the inodes in that AG, then walk all the inodes in that AG to
> compare the lists.  As you might guess, this eats a fair amount of
> memory, even with a rudimentary dirent name deduplication table to
> cut
> down on memory usage.
> 
> After that, I moved on to solving the major problem that I've been
> having with the directory repair code, and that is the problem of
> reconstructing dirents at the offsets specified by the parent
> pointers.
> The details of the problem and how I dealt with it are captured in
> the
> cover letter for those patches.  Suffice to say, we now encode the
> dirent name in the parent pointer attrname (or a collision resistant
> hash if it doesn't fit), which makes it possible to commit new
> directories atomically.
> 
> The last part of this patchset reorganizes the XFS_IOC_GETPARENTS
> ioctl
> to encode variable length parent pointer records in the caller's
> buffer.
> The denser encodings mean that we can extract the parent list with
> fewer
> kernel calls.
> 
> --D


Ermergersh, that's a lot!  Thanks for all the hard work.  I feel like if
we don't come up with a plan for review though, people may not know
where to start for these deluges!  Let's see... if we had to break this
down, I think I would divide it up between the existing parent pointers
and the new pptr propositions for ofsck.  Then further divide it among
kernel space, user space and test cases.  If I had to pick only one of
these to focus attention on, it should probably be the new ofsck changes
in the kernel space, since the rest of the deluge is really contingent
on it.

So now we've narrowed this down to a few subsets:

[PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers
[PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl
[PATCHSET v9r2d1 00/23] xfs: online fsck support patches
[PATCHSET v9r2d1 0/7] xfs: online repair of directories
[PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers
[PATCHSET v9r2d1 0/3] xfs: online checking of parent pointers
[PATCHSET v9r2d1 0/2] xfs: online checking of directories
[PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key
[PATCHSET v9r2d1 0/3] xfs: use flex arrays for XFS_IOC_GETPARENTS

Of those, I think "xfs: encode parent pointer name in xattr key" is the
only one that might impact other features since it's changing the
ondisk format from when we first started the effort years ago.  So
that is probably the best place for people to start, since if this
needs to change it might impact some of the other subsets in the
deluge, or even features they are working on if they've based anything
on the existing pptr set.

I feel like a 5 patch subset is a very reasonable thing to ask people
to give their attention to.  That way they don't get lost in things like
nits for optimizations that might not even matter if something they
depend on changes.

For the most part I am ok with changing the format as long as everyone
is aware and in agreement so that we don't get caught up in re-coding
efforts that seem to have struggled with disagreements now on the scale
of decades.  Some of these patches were already very old by the time I
got them!

On a side note, there are some preliminary patches of kernel side
parent pointers that are either larp fixes or refactoring not sensitive
to the proposed ofsck changes.  These patches have been floating
around for a while now, so if no one has any gripes, I think just
merging those would help cut down the amount of rebasing, user space
porting and patch reviewing that goes on for every version.  (maybe the
first 1 through 7 of the 28 patch set, if folks are ok with that)

I think the sheer size of some of these sets tends to work against them,
as people likely cannot afford the time block they present on the
surface.  So I think we would do well to find a way to introduce them
at a reasonable pace and keep attention focused on the subsections that
should require more than others, and hopefully keep things moving in a
progressive direction.

Thx!
Allison



* Re: [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Darrick J. Wong
                     ` (4 preceding siblings ...)
  2023-02-16 20:52   ` [PATCH 5/5] xfs: use parent pointer xattr space more efficiently Darrick J. Wong
@ 2023-02-18  8:12   ` Amir Goldstein
  2023-02-24  2:58     ` Darrick J. Wong
  2023-03-03 16:43   ` Darrick J. Wong
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
  7 siblings, 1 reply; 227+ messages in thread
From: Amir Goldstein @ 2023-02-18  8:12 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: allison.henderson, linux-xfs

On Thu, Feb 16, 2023 at 10:33 PM Darrick J. Wong <djwong@kernel.org> wrote:
>
> Hi all,
>
> As I've mentioned in past comments on the parent pointers patchset, the
> proposed ondisk parent pointer format presents a major difficulty for
> online directory repair.  This difficulty derives from encoding the
> directory offset of the dirent that the parent pointer is mirroring.
> Recall that parent pointers are stored in extended attributes:
>
>     (parent_ino, parent_gen, diroffset) -> (dirent_name)
>
> If the directory is rebuilt, the offsets of the new directory entries
> must match the diroffset encoded in the parent pointer, or the
> filesystem becomes inconsistent.  There are a few ways to solve this
> problem.
>
> One approach would be to augment the directory addname function to take
> a diroffset and try to create the new entry at that offset.  This will
> not work if the original directory became corrupt and the parent
> pointers were written out with impossible diroffsets (e.g. overlapping).
> Requiring matching diroffsets also prevents reorganization and
> compaction of directories.
>
> This could be remedied by recording the parent pointer diroffset updates
> necessary to retain consistency, and using the logged parent pointer
> replace function to rewrite parent pointers as necessary.  This is a
> poor choice from a performance perspective because the logged xattr
> updates must be committed in the same transaction that commits the new
> directory structure.  If there are a large number of diroffset updates,
> then the directory commit could take an even longer time.
>
> Worse yet, if the logged xattr updates fill up the transaction, repair
> will have no choice but to roll to a fresh transaction to continue
> logging.  This breaks repair's policy that repairs should commit
> atomically.  It may break the filesystem as well, since all files
> involved are pinned until the delayed pptr xattr processing completes.
> This is a completely bad engineering choice.
>
> Note that the diroffset information is not used anywhere in the
> directory lookup code.  Observe that the only information that we
> require for a parent pointer is the inverse of an pre-ftype dirent,
> since this is all we need to reconstruct a directory entry:
>
>     (parent_ino, dirent_name) -> NULL
>
> The xattr code supports xattrs with zero-length values, surprisingly.
> The parent_gen field makes it easy to export parent handle information,
> so it can be retained:
>
>     (parent_ino, parent_gen, dirent_name) -> NULL
>
> Moving the ondisk format to this format is very advantageous for repair
> code.  Unfortunately, there is one hitch: xattr names cannot exceed 255
> bytes due to ondisk format limitations.  We don't want to constrain the
> length of dirent names, so instead we could use collision resistant
> hashes to handle dirents with very long names:
>
>     (parent_ino, parent_gen, sha512(dirent_name)) -> (dirent_name)
>
> The first two patches implement this schema.  However, this encoding is
> not maximally efficient, since many directory names are shorter than the
> length of a sha512 hash.  The last three patches in the series bifurcate
> the parent pointer ondisk format depending on context:
>
> For dirent names shorter than 243 bytes:
>
>     (parent_ino, parent_gen, dirent_name) -> NULL
>
> For dirent names longer than 243 bytes:
>
>     (parent_ino, parent_gen, dirent_name[0:178],
>      sha512(child_gen, dirent_name)) -> (dirent_name[179:255])
>
> The child file's generation number is mixed into the sha512 computation
> to make it a little more difficult for unprivileged userspace to attempt
> collisions.
>

Naive question:

Obviously, the spec of standard xattrs does not allow duplicate keys,
but dabtree does allow duplicate keys, does it not?

So if you were to allow duplicate parent pointer records, e.g.:

(parent_ino, parent_gen) -> dirent_name1
(parent_ino, parent_gen) -> dirent_name2

Or to optimize performance for the case of a large number of sibling hardlinks
of the same parent (if that case is worth optimizing):

(parent_ino, parent_gen, dirent_name[0:178]) -> (dirent_name1[179:255])
(parent_ino, parent_gen, dirent_name[0:178]) -> (dirent_name2[179:255])

Then pptr code should have no problem walking all the matching
parent pointer records to find the unique parent->child record that it
needs to operate on?

I am sure it would be more complicated than how I depicted it,
but having to deal with even a remote possibility of hash collisions sounds
like a massive headache - having to maintain code that is really hard to
test and rarely exercised is not a recipe for peace of mind...

Thanks,
Amir.


* Re: [RFC DELUGE v9r2d1] xfs: Parent Pointers
  2023-02-17 20:02 ` [RFC DELUGE v9r2d1] xfs: Parent Pointers Allison Henderson
@ 2023-02-24  2:51   ` Darrick J. Wong
  2023-02-24  7:24     ` Amir Goldstein
  2023-02-25  7:34     ` Allison Henderson
  0 siblings, 2 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-24  2:51 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Fri, Feb 17, 2023 at 08:02:29PM +0000, Allison Henderson wrote:
> On Thu, 2023-02-16 at 12:06 -0800, Darrick J. Wong wrote:
> > Hi everyone,
> > 
> > This deluge contains all of the additions to the parent pointers
> > patchset that I've been working on for the past month.  The kernel
> > and
> > xfsprogs patchsets are based on Allison's v9r2 tag from last week;
> > the fstests patches are merely a part of my development tree.  To
> > recap

<snip>

> Ermergersh, that's a lot!  Thanks for all the hard work.  I feel like if
> we don't come up with a plan for review though, people may not know
> where to start for these deluges!  Let's see... if we had to break this
> down, I think I would divide it up between the existing parent pointers
> and the new pptr propositions for ofsck.

That's a good place to cleave.

> Then further divide it among
> kernel space, user space and test case.  If I had to pick only one of
> these to focus attention on, probably it should be new ofsck changes in
> the kernel space, since the rest of the deluge is really contingent on
> it. 

Yup.  Though you ought to read through the offline fsck patches too.
Those take a very different approach to resolving parent pointers.  So
much of repair is based on nuking directories that I don't know there's
a good way to rebuild them from parent pointers.

A thought I had was that when we decide to zap a directory due to
problems in the directory blocks themselves, we could then initiate a
scan of the parent pointers to try to find all the dirents we can.  I
ran into problems with that approach because libxfs_iget allocates fresh
xfs_inode objects (instead of caching and sharing them like the kernel
does) and that made it really hard to scan things in a coherent manner.

> So now we've narrowed this down to a few subsets:
> 
> [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers
> [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl,

If you read through these two patchsets and think they're ok, then
either fold the fixes into the main series or tack them on the end,
whichever is easier.  If you tack them on the end, please add your
own SOB tags.

> [PATCHSET v9r2d1 00/23] xfs: online fsck support patches
> [PATCHSET v9r2d1 0/7] xfs: online repair of directories
> [PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers
> [PATCHSET v9r2d1 0/3] xfs: online checking of parent pointers
> [PATCHSET v9r2d1 0/2] xfs: online checking of directories

The fsck functionality exists to prove the point that directory repair
is /very/ awkward if we have to update p_diroffset.  As such, those
patches focus on getting the main parts right ... but with the obvious
problem of making pptrs dependent on online fsck part 1 getting merged.

Speaking of which -- can we merge online fsck for 6.4?  Please? :)

> [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key

Resolving the questions presented by this series is critical to nailing
down the ondisk format and merging the feature.  But we'll get to that
below.

> [PATCHSET v9r2d1 0/3] xfs: use flex arrays for XFS_IOC_GETPARENTS,

I'd like to know what you think about converting the ioctl definition to
flex arrays instead of the fixed size structs.  I'm not sure where to
put this series, though.  If you decide that you want 'em, then ideally
they'd be in xfs_fs.h from the introduction of XFS_IOC_GETPARENTS, but
I don't see any point in backporting them around "xfs: rework the
GETPARENTS ioctl".

(I would be ok if you rolled all of it into patch 25 from the original
v9 set.)

> Of those, I think "xfs: encode parent pointer name in xattr key" is the
> only one that might impact other features since it's changing the
> ondisk format from when we first started the effort years ago.  So
> probably that might be the best place for people to start since if this
> needs to change it might impact some of the other subsets in the
> deluge, or even features they are working on if they've based anything
> on the existing pptr set.

Bingo!

The biggest question about the format change is (IMHO) whether we're ok
with using a hash function for parent pointer names that don't fit in
the attr key space, and which hash?

The sha2 family was designed to be collision resistant, but I don't
anticipate that will last forever.  The hash is computed from (the full
name and the child generation number) when the dirent name is longer
than 243 bytes.  The first 179 bytes of the dirent name are still
written in the parent pointer attr name.  An attacker would have to find
a collision that only changes the last 76 bytes of the dirent name, and
they'd have to know the generation number at runtime.

(Note: dirent names shorter than 243 bytes are written directly into the
parent pointer xattr name, no hashing required.)
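
For what it's worth, the byte budget behind those numbers can be written
down as a small sketch.  This is only an illustration, not code from the
patches: the macro and function names are made up, and it assumes an
8-byte parent_ino, a 4-byte parent_gen, a 64-byte sha512 digest, and that
a name of exactly 243 bytes is still stored unhashed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sizes follow the discussion above; every identifier here is hypothetical. */
#define PPTR_XATTR_NAME_MAX	255	/* ondisk limit on xattr name length */
#define PPTR_DIRENT_NAME_MAX	255	/* longest possible dirent name */
#define PPTR_INO_BYTES		8	/* parent_ino, assumed 64 bits */
#define PPTR_GEN_BYTES		4	/* parent_gen, assumed 32 bits */
#define PPTR_SHA512_BYTES	64	/* size of a sha512 digest */

/* Room left for the dirent name after ino and gen: 255 - 12 = 243. */
#define PPTR_INLINE_NAME_MAX \
	(PPTR_XATTR_NAME_MAX - PPTR_INO_BYTES - PPTR_GEN_BYTES)

/* Name prefix kept in the key next to a digest: 243 - 64 = 179. */
#define PPTR_HASHED_PREFIX_LEN \
	(PPTR_INLINE_NAME_MAX - PPTR_SHA512_BYTES)

struct pptr_layout {
	bool	hashed;		/* true if the key carries a sha512 digest */
	size_t	key_len;	/* total xattr name length in bytes */
	size_t	value_len;	/* xattr value length (name tail, or 0) */
};

/* Compute how a dirent name of @namelen bytes would be split up. */
static struct pptr_layout pptr_layout(size_t namelen)
{
	struct pptr_layout l;

	assert(namelen > 0 && namelen <= PPTR_DIRENT_NAME_MAX);

	if (namelen <= PPTR_INLINE_NAME_MAX) {
		/* (parent_ino, parent_gen, dirent_name) -> NULL */
		l.hashed = false;
		l.key_len = PPTR_INO_BYTES + PPTR_GEN_BYTES + namelen;
		l.value_len = 0;
	} else {
		/* (ino, gen, name prefix, sha512) -> (rest of the name) */
		l.hashed = true;
		l.key_len = PPTR_INO_BYTES + PPTR_GEN_BYTES +
			    PPTR_HASHED_PREFIX_LEN + PPTR_SHA512_BYTES;
		l.value_len = namelen - PPTR_HASHED_PREFIX_LEN;
	}
	return l;
}
```

Under these assumptions a 255-byte name yields a 255-byte attr name and
a 76-byte attr value, which matches the figures above.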

I /think/ that's good enough, but I'm no cryptanalyst.  The alternative
would be to change the xattr format so that the namelen field in the
leaf structure encodes *only* the name component of the parent
pointer.  This would lead to a lot of special cased xattr code and
probably a lot of bugs and other stupid problems, which is why I didn't
take that route.

Thoughts?

> I feel like a 5 patch subset is a very reasonable thing to ask people
> to give their attention to.  That way they don't get lost in things like
> nits for optimizations that might not even matter if something it
> depends on changes.
>
> For the most part I am ok with changing the format as long as everyone
> is aware and in agreement so that we don't get caught up re-coding
> efforts that seem to have struggled with disagreements now on the scale
> of decades.  Some of these patches were already very old by the time I
> got them!

Hheehhe.  Same here -- rmap was pretty old by the time I started pushing
that for reals. :)

> On a side note, there are some preliminary patches of kernel side
> parent pointers that are either larp fixes or refactoring not sensitive
> to the proposed ofsck changes.  These patches have been floating
> around for a while now, so if no one has any gripes, I think just
> merging those would help cut down the amount of rebasing, user space
> porting and patch reviewing that goes on for every version.  (maybe the
> first 1 through 7 of the 28 patch set, if folks are ok with that)

I thought about doing that for 6.3, but I found enough bugs in the
locking stuff (recall the first bugfix series) that I held back.  I'm
not sure about the two "Increase <blah>" patches -- they'll bloat kernel
structures without a real user for them.

<shrug>

> I think the sheer size of some of these sets tends to work against them,
> as people likely cannot afford the time block they present on the
> surface.

Agreed.  At this point, I've worked through enough of the parent
pointers code to understand what's going on that I'm ok with merging it
once we settle the above question.

FWIW the whole series (kernel+xfsprogs+fstests) has been passing my
nightly QA farm for a couple of weeks now despite my constant hammering
on it, so I think the implementation is ready.

> So I think we would do well to find a way to introduce them
> at a reasonable pace and keep attention focused on the subsections that
> should require more than others, and hopefully keep things moving in a
> progressive direction.

I disagree -- I want to merge online fsck part 1 so I can get that out
of my dev trees.  Then I want to focus on getting this over the finish
line and merged.  But then I'm not known for incrementalism. :P

--D

> Thx!
> Allison
> 


* Re: [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key
  2023-02-18  8:12   ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Amir Goldstein
@ 2023-02-24  2:58     ` Darrick J. Wong
  0 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-24  2:58 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: allison.henderson, linux-xfs

On Sat, Feb 18, 2023 at 10:12:05AM +0200, Amir Goldstein wrote:
> On Thu, Feb 16, 2023 at 10:33 PM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > Hi all,
> >
> > As I've mentioned in past comments on the parent pointers patchset, the
> > proposed ondisk parent pointer format presents a major difficulty for
> > online directory repair.  This difficulty derives from encoding the
> > directory offset of the dirent that the parent pointer is mirroring.
> > Recall that parent pointers are stored in extended attributes:
> >
> >     (parent_ino, parent_gen, diroffset) -> (dirent_name)
> >
> > If the directory is rebuilt, the offsets of the new directory entries
> > must match the diroffset encoded in the parent pointer, or the
> > filesystem becomes inconsistent.  There are a few ways to solve this
> > problem.
> >
> > One approach would be to augment the directory addname function to take
> > a diroffset and try to create the new entry at that offset.  This will
> > not work if the original directory became corrupt and the parent
> > pointers were written out with impossible diroffsets (e.g. overlapping).
> > Requiring matching diroffsets also prevents reorganization and
> > compaction of directories.
> >
> > This could be remedied by recording the parent pointer diroffset updates
> > necessary to retain consistency, and using the logged parent pointer
> > replace function to rewrite parent pointers as necessary.  This is a
> > poor choice from a performance perspective because the logged xattr
> > updates must be committed in the same transaction that commits the new
> > directory structure.  If there are a large number of diroffset updates,
> > then the directory commit could take an even longer time.
> >
> > Worse yet, if the logged xattr updates fill up the transaction, repair
> > will have no choice but to roll to a fresh transaction to continue
> > logging.  This breaks repair's policy that repairs should commit
> > atomically.  It may break the filesystem as well, since all files
> > involved are pinned until the delayed pptr xattr processing completes.
> > This is a completely bad engineering choice.
> >
> > Note that the diroffset information is not used anywhere in the
> > directory lookup code.  Observe that the only information that we
> > require for a parent pointer is the inverse of a pre-ftype dirent,
> > since this is all we need to reconstruct a directory entry:
> >
> >     (parent_ino, dirent_name) -> NULL
> >
> > The xattr code supports xattrs with zero-length values, surprisingly.
> > The parent_gen field makes it easy to export parent handle information,
> > so it can be retained:
> >
> >     (parent_ino, parent_gen, dirent_name) -> NULL
> >
> > Moving the ondisk format to this format is very advantageous for repair
> > code.  Unfortunately, there is one hitch: xattr names cannot exceed 255
> > bytes due to ondisk format limitations.  We don't want to constrain the
> > length of dirent names, so instead we could use collision resistant
> > hashes to handle dirents with very long names:
> >
> >     (parent_ino, parent_gen, sha512(dirent_name)) -> (dirent_name)
> >
> > The first two patches implement this schema.  However, this encoding is
> > not maximally efficient, since many directory names are shorter than the
> > length of a sha512 hash.  The last three patches in the series bifurcate
> > the parent pointer ondisk format depending on context:
> >
> > For dirent names shorter than 243 bytes:
> >
> >     (parent_ino, parent_gen, dirent_name) -> NULL
> >
> > For dirent names longer than 243 bytes:
> >
> >     (parent_ino, parent_gen, dirent_name[0:178],
> >      sha512(child_gen, dirent_name)) -> (dirent_name[179:255])
> >
> > The child file's generation number is mixed into the sha512 computation
> > to make it a little more difficult for unprivileged userspace to attempt
> > collisions.
> >
> 
> Naive question:
> 
> Obviously, the spec of standard xattrs does not allow duplicate keys,
> but dabtree does allow duplicate keys, does it not?

The dabtree allows duplicate hashes for a given name, yes.  It doesn't
allow for duplicate names, though.

(Also note that small xattr structures skip the dabtree and hashing.)

> So if you were to allow duplicate parent pointer records, e.g.:
> 
> (parent_ino, parent_gen) -> dirent_name1
> (parent_ino, parent_gen) -> dirent_name2
> 
> Or to optimize performance for the case of large number of sibling hardlinks
> of the same parent (if that case is worth optimizing):
> 
> (parent_ino, parent_gen, dirent_name[0:178]) -> (dirent_name1[179:255])
> (parent_ino, parent_gen, dirent_name[0:178]) -> (dirent_name2[179:255])
> 
> Then pptr code should have no problem walking all the matching
> parent pointer records to find the unique parent->child record that it
> needs to operate on?

Theoretically, yes, the parent pointer code could walk all the xattrs
that have the same attr name looking for the one with matching value.
But keep in mind that there could be min(2^32, 2^(8+76)) potential
matches.

The other difficulty is that the xattr lookup and removal code don't
have a means to return the dastate to callers or for callers to provide
a dastate to go back into the xattr code.  You'd need that to identify
the specific parent pointer xattr you want to operate on.
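
To make the cost concrete, here is a toy model of a lookup when attr
names are allowed to repeat.  It is a sketch under simplifying
assumptions: `struct rec` and `find_rec` are hypothetical, the records
live in a flat array rather than a dabtree, and the returned index
stands in for the position (the dastate) that the real lookup and
removal code has no way to hand back to callers today.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for xattr records; not an XFS structure. */
struct rec {
	const char *key;	/* attr name, may repeat across records */
	const char *value;	/* attr value, tells duplicates apart */
};

/*
 * Return the index of the record matching both @key and @value, or -1.
 * With unique keys the value comparison would be unnecessary; with
 * duplicate keys every record sharing the key may have to be visited.
 */
static long find_rec(const struct rec *recs, size_t nr,
		     const char *key, const char *value)
{
	assert(recs != NULL || nr == 0);

	for (size_t i = 0; i < nr; i++) {
		if (strcmp(recs[i].key, key) != 0)
			continue;	/* different attr name */
		if (strcmp(recs[i].value, value) == 0)
			return (long)i;	/* the one record we wanted */
	}
	return -1;
}
```

A caller that wants to remove or replace one specific record needs that
index (or its dabtree equivalent) carried from the lookup into the
update, which is the plumbing that is missing.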

> I am sure it would be more complicated than how I depicted it,
> but having to deal with even remote possibility of hash collisions sounds
> like a massive headache - having to maintain code that is really hard to
> test and rarely exercised is not a recipe for peace of mind...

Yep.  Hash functions are definitely finger-crossing headhurting.

--D

> Thanks,
> Amir.


* Re: [RFC DELUGE v9r2d1] xfs: Parent Pointers
  2023-02-24  2:51   ` Darrick J. Wong
@ 2023-02-24  7:24     ` Amir Goldstein
  2023-02-25  1:58       ` Darrick J. Wong
  2023-02-25  7:34     ` Allison Henderson
  1 sibling, 1 reply; 227+ messages in thread
From: Amir Goldstein @ 2023-02-24  7:24 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, linux-xfs

On Fri, Feb 24, 2023 at 5:09 AM Darrick J. Wong <djwong@kernel.org> wrote:
>
> On Fri, Feb 17, 2023 at 08:02:29PM +0000, Allison Henderson wrote:
> > On Thu, 2023-02-16 at 12:06 -0800, Darrick J. Wong wrote:
> > > Hi everyone,
> > >
> > > This deluge contains all of the additions to the parent pointers
> > > patchset that I've been working on for the past month.  The kernel
> > > and
> > > xfsprogs patchsets are based on Allison's v9r2 tag from last week;
> > > the fstests patches are merely a part of my development tree.  To
> > > recap
>
> <snip>
>
> > Ermergersh, that's a lot!  Thanks for all the hard work.  I feel like if
> > we don't come up with a plan for review though, people may not know
> > where to start for these deluges!  Let's see... if we had to break this
> > down, I think I would divide it up between the existing parent pointers
> > and the new pptr propositions for ofsck.
>
> That's a good place to cleave.
>
> > Then further divide it among
> > kernel space, user space and test case.  If I had to pick only one of
> > these to focus attention on, probably it should be new ofsck changes in
> > the kernel space, since the rest of the deluge is really contingent on
> > it.
>
> Yup.  Though you ought to read through the offline fsck patches too.
> Those take a very different approach to resolving parent pointers.  So
> much of repair is based on nuking directories that I don't know there's
> a good way to rebuild them from parent pointers.
>
> A thought I had was that when we decide to zap a directory due to
> problems in the directory blocks themselves, we could then initiate a
> scan of the parent pointers to try to find all the dirents we can.  I
> ran into problems with that approach because libxfs_iget allocates fresh
> xfs_inode objects (instead of caching and sharing them like the kernel
> does) and that made it really hard to scan things in a coherent manner.
>
> > So now we've narrowed this down to a few subsets:
> >
> > [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers
> > [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl,
>
> If you read through these two patchsets and think they're ok, then
> either fold the fixes into the main series or tack them on the end,
> whichever is easier.  If you tack them on the end, please add your
> own SOB tags.
>
> > [PATCHSET v9r2d1 00/23] xfs: online fsck support patches
> > [PATCHSET v9r2d1 0/7] xfs: online repair of directories
> > [PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers
> > [PATCHSET v9r2d1 0/3] xfs: online checking of parent pointers
> > [PATCHSET v9r2d1 0/2] xfs: online checking of directories
>
> The fsck functionality exists to prove the point that directory repair
> is /very/ awkward if we have to update p_diroffset.  As such, they
> focused on getting the main parts right ... but with the obvious
> problem of making pptrs dependent on online fsck part 1 getting merged.
>
> Speaking of which -- can we merge online fsck for 6.4?  Please? :)
>
> > [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key
>
> Resolving the questions presented by this series is critical to nailing
> down the ondisk format and merging the feature.  But we'll get to that
> below.
>
> > [PATCHSET v9r2d1 0/3] xfs: use flex arrays for XFS_IOC_GETPARENTS,
>
> I'd like to know what you think about converting the ioctl definition to
> flex arrays instead of the fixed size structs.  I'm not sure where to
> put this series, though.  If you decide that you want 'em, then ideally
> they'd be in xfs_fs.h from the introduction of XFS_IOC_GETPARENTS, but
> I don't see any point in backporting them around "xfs: rework the
> GETPARENTS ioctl".
>
> (I would be ok if you rolled all of it into patch 25 from the original
> v9 set.)
>
> > Of those, I think "xfs: encode parent pointer name in xattr key" is the
> > only one that might impact other features since it's changing the
> > ondisk format from when we first started the effort years ago.  So
> > probably that might be the best place for people to start since if this
> > needs to change it might impact some of the other subsets in the
> > deluge, or even features they are working on if they've based anything
> > on the existing pptr set.
>
> Bingo!
>
> The biggest question about the format change is (IMHO) whether we're ok
> with using a hash function for parent pointer names that don't fit in
> the attr key space, and which hash?
>
> The sha2 family was designed to be collision resistant, but I don't
> anticipate that will last forever.  The hash is computed from (the full
> name and the child generation number) when the dirent name is longer
> than 243 bytes.  The first 179 bytes of the dirent name are still
> written in the parent pointer attr name.  An attacker would have to find
> a collision that only changes the last 76 bytes of the dirent name, and
> they'd have to know the generation number at runtime.
>
> (Note: dirent names shorter than 243 bytes are written directly into the
> parent pointer xattr name, no hashing required.)
>
> I /think/ that's good enough, but I'm no cryptanalyst.  The alternative
> would be to change the xattr format so that the namelen field in the
> leaf structure encodes *only* the name component of the parent
> pointer.  This would lead to a lot of special cased xattr code and
> probably a lot of bugs and other stupid problems, which is why I didn't
> take that route.
>
> Thoughts?

Is there an intention to allow enabling parent pointers on existing systems
and run online repair to add the pptr xattrs?

If not, then you could avoid the entire complexity with
statp->f_namelen = XFS_MAX_PPTR_NAMELEN;
for pptr formatted fs.

Are those 12 bytes of namelen really going to be missed?
This limitation does not need to last forever.
It can be lifted later by special casing pptr namelen as you suggested
after a separate risk vs. benefit discussion.

Thanks,
Amir.


* Re: [RFC DELUGE v9r2d1] xfs: Parent Pointers
  2023-02-24  7:24     ` Amir Goldstein
@ 2023-02-25  1:58       ` Darrick J. Wong
  0 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-02-25  1:58 UTC (permalink / raw)
  To: Amir Goldstein; +Cc: Allison Henderson, linux-xfs

On Fri, Feb 24, 2023 at 09:24:48AM +0200, Amir Goldstein wrote:
> On Fri, Feb 24, 2023 at 5:09 AM Darrick J. Wong <djwong@kernel.org> wrote:
> >
> > On Fri, Feb 17, 2023 at 08:02:29PM +0000, Allison Henderson wrote:
> > > On Thu, 2023-02-16 at 12:06 -0800, Darrick J. Wong wrote:
> > > > Hi everyone,
> > > >
> > > > This deluge contains all of the additions to the parent pointers
> > > > patchset that I've been working on for the past month.  The kernel
> > > > and
> > > > xfsprogs patchsets are based on Allison's v9r2 tag from last week;
> > > > the fstests patches are merely a part of my development tree.  To
> > > > recap
> >
> > <snip>
> >
> > > Ermergersh, that's a lot!  Thanks for all the hard work.  I feel like if
> > > we don't come up with a plan for review though, people may not know
> > > where to start for these deluges!  Let's see... if we had to break this
> > > down, I think I would divide it up between the existing parent pointers
> > > and the new pptr propositions for ofsck.
> >
> > That's a good place to cleave.
> >
> > > Then further divide it among
> > > kernel space, user space and test case.  If I had to pick only one of
> > > these to focus attention on, probably it should be new ofsck changes in
> > > the kernel space, since the rest of the deluge is really contingent on
> > > it.
> >
> > Yup.  Though you ought to read through the offline fsck patches too.
> > Those take a very different approach to resolving parent pointers.  So
> > much of repair is based on nuking directories that I don't know there's
> > a good way to rebuild them from parent pointers.
> >
> > A thought I had was that when we decide to zap a directory due to
> > problems in the directory blocks themselves, we could then initiate a
> > scan of the parent pointers to try to find all the dirents we can.  I
> > ran into problems with that approach because libxfs_iget allocates fresh
> > xfs_inode objects (instead of caching and sharing them like the kernel
> > does) and that made it really hard to scan things in a coherent manner.
> >
> > > So now we've narrowed this down to a few subsets:
> > >
> > > [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers
> > > [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl,
> >
> > If you read through these two patchsets and think they're ok, then
> > either fold the fixes into the main series or tack them on the end,
> > whichever is easier.  If you tack them on the end, please add your
> > own SOB tags.
> >
> > > [PATCHSET v9r2d1 00/23] xfs: online fsck support patches
> > > [PATCHSET v9r2d1 0/7] xfs: online repair of directories
> > > [PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers
> > > [PATCHSET v9r2d1 0/3] xfs: online checking of parent pointers
> > > [PATCHSET v9r2d1 0/2] xfs: online checking of directories
> >
> > The fsck functionality exists to prove the point that directory repair
> > is /very/ awkward if we have to update p_diroffset.  As such, they
> > focused on getting the main parts right ... but with the obvious
> > problem of making pptrs dependent on online fsck part 1 getting merged.
> >
> > Speaking of which -- can we merge online fsck for 6.4?  Please? :)
> >
> > > [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key
> >
> > Resolving the questions presented by this series is critical to nailing
> > down the ondisk format and merging the feature.  But we'll get to that
> > below.
> >
> > > [PATCHSET v9r2d1 0/3] xfs: use flex arrays for XFS_IOC_GETPARENTS,
> >
> > I'd like to know what you think about converting the ioctl definition to
> > flex arrays instead of the fixed size structs.  I'm not sure where to
> > put this series, though.  If you decide that you want 'em, then ideally
> > they'd be in xfs_fs.h from the introduction of XFS_IOC_GETPARENTS, but
> > I don't see any point in backporting them around "xfs: rework the
> > GETPARENTS ioctl".
> >
> > (I would be ok if you rolled all of it into patch 25 from the original
> > v9 set.)
> >
> > > Of those, I think "xfs: encode parent pointer name in xattr key" is the
> > > only one that might impact other features since it's changing the
> > > ondisk format from when we first started the effort years ago.  So
> > > probably that might be the best place for people to start since if this
> > > needs to change it might impact some of the other subsets in the
> > > deluge, or even features they are working on if they've based anything
> > > on the existing pptr set.
> >
> > Bingo!
> >
> > The biggest question about the format change is (IMHO) whether we're ok
> > with using a hash function for parent pointer names that don't fit in
> > the attr key space, and which hash?
> >
> > The sha2 family was designed to be collision resistant, but I don't
> > anticipate that will last forever.  The hash is computed from (the full
> > name and the child generation number) when the dirent name is longer
> > than 243 bytes.  The first 179 bytes of the dirent name are still
> > written in the parent pointer attr name.  An attacker would have to find
> > a collision that only changes the last 76 bytes of the dirent name, and
> > they'd have to know the generation number at runtime.
> >
> > (Note: dirent names shorter than 243 bytes are written directly into the
> > parent pointer xattr name, no hashing required.)
> >
> > I /think/ that's good enough, but I'm no cryptanalyst.  The alternative
> > would be to change the xattr format so that the namelen field in the
> > leaf structure encodes *only* the name component of the parent
> > pointer.  This would lead to a lot of special cased xattr code and
> > probably a lot of bugs and other stupid problems, which is why I didn't
> > take that route.
> >
> > Thoughts?
> 
> Is there an intention to allow enabling parent pointers on existing systems
> and run online repair to add the pptr xattrs?

That's going to be difficult because we don't know how much space the
parent pointers are going to need ahead of time.  I'd guess probably
not.

> If not, then you could avoid the entire complexity with
> statp->f_namelen = XFS_MAX_PPTR_NAMELEN;
> for pptr formatted fs.
> 
> Are those 12 bytes of namelen really going to be missed?

I dislike having to lower MAXNAMELEN; that seems like it would result in
user complaints.

> This limitation does not need to last forever.
> It can be lifted later by special casing pptr namelen as you suggested
> after a separate risk vs. benefit discussion.

Deferring the discussion in that manner will require us to burn another
incompat feature bit to prevent older kernels that don't understand the
hashing from mounting a filesystem where the hashes are in use.

--D

> Thanks,
> Amir.


* Re: [RFC DELUGE v9r2d1] xfs: Parent Pointers
  2023-02-24  2:51   ` Darrick J. Wong
  2023-02-24  7:24     ` Amir Goldstein
@ 2023-02-25  7:34     ` Allison Henderson
  2023-03-01  1:24       ` Darrick J. Wong
  1 sibling, 1 reply; 227+ messages in thread
From: Allison Henderson @ 2023-02-25  7:34 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

On Thu, 2023-02-23 at 18:51 -0800, Darrick J. Wong wrote:
> On Fri, Feb 17, 2023 at 08:02:29PM +0000, Allison Henderson wrote:
> > On Thu, 2023-02-16 at 12:06 -0800, Darrick J. Wong wrote:
> > > Hi everyone,
> > > 
> > > This deluge contains all of the additions to the parent pointers
> > > patchset that I've been working on for the past month.  The
> > > kernel
> > > and
> > > xfsprogs patchsets are based on Allison's v9r2 tag from last
> > > week;
> > > the fstests patches are merely a part of my development tree.  To
> > > recap
> 
> <snip>
> 
> > Ermergersh, that's a lot!  Thanks for all the hard work.  I feel
> > like if
> > we don't come up with a plan for review though, people may not know
> > where to start for these deluges!  Let's see... if we had to break
> > this
> > down, I think I would divide it up between the existing parent
> > pointers
> > and the new pptr propositions for ofsck.
> 
> That's a good place to cleave.
> 
> > Then further divide it among
> > kernel space, user space and test case.  If I had to pick only one
> > of
> > these to focus attention on, probably it should be new ofsck
> > changes in
> > the kernel space, since the rest of the deluge is really contingent
> > on
> > it. 
> 
> Yup.  Though you ought to read through the offline fsck patches too.
> Those take a very different approach to resolving parent pointers. 
> So
> much of repair is based on nuking directories that I don't know
> there's
> a good way to rebuild them from parent pointers.
Ok, will take a look

> 
> A thought I had was that when we decide to zap a directory due to
> problems in the directory blocks themselves, we could then initiate a
> scan of the parent pointers to try to find all the dirents we can.  I
> ran into problems with that approach because libxfs_iget allocates
> fresh
> xfs_inode objects (instead of caching and sharing them like the
> kernel
> does) and that made it really hard to scan things in a coherent
> manner.
> 
> > So now we've narrowed this down to a few subsets:
> > 
> > [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers
> > [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl,
> 
> If you read through these two patchsets and think they're ok, then
> either fold the fixes into the main series or tack them on the end,
> whichever is easier.  
ok, I'll take a look.  I'll probably tack the first 2 fixes on the end,
since they don't fit into an existing patch in the set.

> If you tack them on the end, please add your
> own SOB tags.

Sure?  I SOB'd the last 2 patches of the set in v3, and then you said
to make it an RVB

> 
> > [PATCHSET v9r2d1 00/23] xfs: online fsck support patches
> > [PATCHSET v9r2d1 0/7] xfs: online repair of directories
> > [PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers
> > [PATCHSET v9r2d1 0/3] xfs: online checking of parent pointers
> > [PATCHSET v9r2d1 0/2] xfs: online checking of directories
> 
> The fsck functionality exists to prove the point that directory
> repair
> is /very/ awkward if we have to update p_diroffset.  As such, they
> focused on getting the main parts right ... but with the obvious
> problem of making pptrs dependent on online fsck part 1 getting
> merged.
> 
> Speaking of which -- can we merge online fsck for 6.4?  Please? :)
I'm fine with it as long as everyone else is?  I'm not sure who this is
directed to. I admittedly haven't been able to work through all of it,
but I don't think anyone has.  I don't know that exhaustive reviewing
as a whole is particularly effective though.  Back when the combined
set of "attr refactoring" + "larp" + "parent pointers" was particularly
large, I used to just send out subsets that I thought were more
reasonable for people to digest.  That way people can look at the giant
mega-set if they really gotta see it, but it kept the reviews more
focused on a sort of smaller next step.

> 
> > [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key
> 
> Resolving the questions presented by this series is critical to
> nailing
> down the ondisk format and merging the feature.  But we'll get to
> that
> below.
> 
> > [PATCHSET v9r2d1 0/3] xfs: use flex arrays for XFS_IOC_GETPARENTS,
> 
> I'd like to know what you think about converting the ioctl definition
> to
> flex arrays instead of the fixed size structs.  I'm not sure where to
> put this series, though.  If you decide that you want 'em, then
> ideally
> they'd be in xfs_fs.h from the introduction of XFS_IOC_GETPARENTS,
> but
> I don't see any point in backporting them around "xfs: rework the
> GETPARENTS ioctl".
> 
> (I would be ok if you rolled all of it into patch 25 from the
> original
> v9 set.)
I'll take a look at it, I didn't put a whole lot of focus on the ioctl
initially because the only thing that was using it at the time was the
test case, and I wanted to keep attention more on the infrastructure.
> 
> > Of those, I think "xfs: encode parent pointer name in xattr key" is
> > the
> > only one that might impact other features since it's changing the
> > ondisk format from when we first started the effort years ago.  So
> > probably that might be the best place for people to start since if
> > this
> > needs to change it might impact some of the other subsets in the
> > deluge, or even features they are working on if they've based
> > anything
> > on the existing pptr set.
> 
> Bingo!
> 
> The biggest question about the format change is (IMHO) whether we're
> ok
> with using a hash function for parent pointer names that don't fit in
> the attr key space, and which hash?
> 
> The sha2 family was designed to be collision resistant, but I don't
> anticipate that will last forever.  The hash is computed from (the
> full
> name and the child generation number) when the dirent name is longer
> than 243 bytes.  The first 179 bytes of the dirent name are still
> written in the parent pointer attr name.  An attacker would have to
> find
> a collision that only changes the last 76 bytes of the dirent name,
> and
> they'd have to know the generation number at runtime.
> 
> (Note: dirent names shorter than 243 bytes are written directly into
> the
> parent pointer xattr name, no hashing required.)
> 
> I /think/ that's good enough, but I'm no cryptanalyst.  The
> alternative
> would be to change the xattr format so that the namelen field in the
> leaf structure to encode *only* the name component of the parent
> pointer.  This would lead to a lot of special cased xattr code and
> probably a lot of bugs and other stupid problems, which is why I
> didn't
> take that route.
> 
> Thoughts?

Hmm, well, it sounds like a risk to be weighed.  It wouldn't happen
very often.  It seems like it would be extremely rare.  But when it
does it will likely be quite unpleasant.  

I think another question to ask would be how often does the parent
pointer really need to be updated in a repair?  In most cases, an
orphaned inode will likely be able to return to the diroffset from
whence it came.  So an update may be unlikely.  Even more so would be
the worst case of needing to update crazy amounts of parent pointers.
So another option is to simply pick a cap and error out if the demand
is too much.  Likely if this condition does arise, there's probably
bigger issues going on.

While option A is substantially more rare than option B, you could
probably pick either one and rarely encounter the error path.  While
option A does have the advantage of being more memory conservative, it
has the disadvantage of possibly being a really ugly sleeping bug. 
While option B might error out when option A would have not, it would
at least be clear as to why it did, and probably allude to the presence
of bigger problems, such as an internal bug that we should probably go
catch, or perhaps something external corrupting the fs image, which
ofsck may not be able to solve anyway.  

FWIW I seem to recall running across the idea of using hashes as keys
in other projects I've been on, and most of the time the rarity of the
collision was considered an acceptable risk, though it's really about
which risk really bothers you more.

> 
> > I feel like a 5 patch subset is a very reasonable thing to ask
> > people
> > to give their attention to.  That way they don't get lost in things
> > like
> > nits for optimizations that might not even matter if something it
> > depends on changes.
> > 
> > For the most part I am ok with changing the format as long as
> > everyone
> > is aware and in agreement so that we don't get caught up re-coding
> > efforts that seem to have struggled with disagreements now on the
> > scale
> > of decades.  Some of these patches were already very old by the
> > time I
> > got them!
> 
> Hheehhe.  Same here -- rmap was pretty old by the time I started
> pushing
> that for reals. :)
> 
> > On a side note, there are some preliminary patches of kernel side
> > parent pointers that are either larp fixes or refactoring not
> > sensitive
> > to the proposed ofsck changes.  These patches have been floating
> > around for a while now, so if no one has any gripes, I think just
> > merging those would help cut down the amount of rebasing, user
> > space
> > porting and patch reviewing that goes on for every version.  (maybe
> > the
> > first 1 through 7 of the 28 patch set, if folks are ok with that)
> 
> I thought about doing that for 6.3, but I found enough bugs in the
> locking stuff (recall the first bugfix series) that I held back.  I'm
> not sure about the two "Increase <blah>" patches -- they'll bloat
> kernel
> structures without a real user for them.

I don't think the first 7 are order sensitive, we should be able to do
just 1, 4, 5, 6 and 7.

> 
> <shrug>
> 
> > I think the sheer size of some of these sets tends to work against
> > them,
> > as people likely cannot afford the time block they present on the
> > surface.
> 
> Agreed.  At this point, I've worked through enough of the parent
> pointers code to understand what's going on that I'm ok with merging
> it
> once we settle the above question.
> 
> FWIW the whole series (kernel+xfsprogs+fstests) has been passing my
> nightly QA farm for a couple of weeks now despite my constant
> hammering
> on it, so I think the implementation is ready.
> 
> > So I think we would do well to find a way to introduce them
> > at a reasonable pace and keep attention focused on the subsections
> > that
> > should require more than others, and hopefully keep things moving in
> > a
> > progressive direction.
> 
> I disagree -- I want to merge online fsck part 1 so I can get that
> out
> of my dev trees.  Then I want to focus on getting this over the
> finish
> line and merged.  But then I'm not known for incrementalism. :P
Well, I notice people respond better to subsets in smaller doses
though.  And then it gives the preliminary patches time to stabilize if
people do find an issue.

> 
> --D
> 
> > Thx!
> > Allison
> > 



* Re: [RFC DELUGE v9r2d1] xfs: Parent Pointers
  2023-02-25  7:34     ` Allison Henderson
@ 2023-03-01  1:24       ` Darrick J. Wong
  2023-03-08 22:47         ` Allison Henderson
  0 siblings, 1 reply; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-01  1:24 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Sat, Feb 25, 2023 at 07:34:14AM +0000, Allison Henderson wrote:
> On Thu, 2023-02-23 at 18:51 -0800, Darrick J. Wong wrote:
> > On Fri, Feb 17, 2023 at 08:02:29PM +0000, Allison Henderson wrote:
> > > On Thu, 2023-02-16 at 12:06 -0800, Darrick J. Wong wrote:
> > > > Hi everyone,
> > > > 
> > > > This deluge contains all of the additions to the parent pointers
> > > > patchset that I've been working on for the past month.  The
> > > > kernel
> > > > and
> > > > xfsprogs patchsets are based on Allison's v9r2 tag from last
> > > > week;
> > > > the fstests patches are merely a part of my development tree.  To
> > > > recap
> > 
> > <snip>
> > 
> > > Ermergersh, that's a lot!  Thanks for all the hard work.  I feel
> > > like if
> > > we don't come up with a plan for review though, people may not know
> > > where to start for these deluges!  Let's see... if we had to break
> > > this
> > > down, I think I would divide it up between the existing parent
> > > pointers
> > > and the new pptr propositions for ofsck.
> > 
> > That's a good place to cleave.
> > 
> > > Then further divide it among
> > > kernel space, user space and test case.  If I had to pick only one
> > > of
> > > these to focus attention on, probably it should be new ofsck
> > > changes in
> > > the kernel space, since the rest of the deluge is really contingent
> > > on
> > > it. 
> > 
> > Yup.  Though you ought to read through the offline fsck patches too.
> > Those take a very different approach to resolving parent pointers. 
> > So
> > much of repair is based on nuking directories that I don't know
> > there's
> > a good way to rebuild them from parent pointers.
> Ok, will take a look
> 
> > 
> > A thought I had was that when we decide to zap a directory due to
> > problems in the directory blocks themselves, we could then initiate a
> > scan of the parent pointers to try to find all the dirents we can.  I
> > ran into problems with that approach because libxfs_iget allocates
> > fresh
> > xfs_inode objects (instead of caching and sharing them like the
> > kernel
> > does) and that made it really hard to scan things in a coherent
> > manner.
> > 
> > > So now we've narrowed this down to a few subsets:
> > > 
> > > [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers
> > > [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl,
> > 
> > If you read through these two patchsets and think they're ok, then
> > either fold the fixes into the main series or tack them on the end,
> > whichever is easier.  
> OK, I'll take a look.  I'll probably tack on the first 2 fixes since they
> don't fit into an existing patch in the set.

Ok.

> > If you tack them on the end, please add your
> > own SOB tags.
> 
> Sure?  I SOB'd the last 2 patches of the set in v3, and then you said
> to make it an RVB

Er... SOB, RVB, whichever tag(s) get us to a patch that has a signoff
and a review. :)

> > 
> > > [PATCHSET v9r2d1 00/23] xfs: online fsck support patches
> > > [PATCHSET v9r2d1 0/7] xfs: online repair of directories
> > > [PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers
> > > [PATCHSET v9r2d1 0/3] xfs: online checking of parent pointers
> > > [PATCHSET v9r2d1 0/2] xfs: online checking of directories
> > 
> > The fsck functionality exists to prove the point that directory
> > repair
> > is /very/ awkward if we have to update p_diroffset.  As such, they
> > focused on getting the main parts right ... but with the obvious
> > problem of making pptrs dependent on online fsck part 1 getting
> > merged.
> > 
> > Speaking of which -- can we merge online fsck for 6.4?  Please? :)
> I'm fine with it as long as everyone else is?  I'm not sure who this is
> directed to.

10% dchinner, 90% anyone we don't know about who might swoop in at the
last minute and NAK it. ;)

> I admittedly haven't been able to work through all of it,
> but I don't think anyone has.  I don't know that exhaustive reviewing
> as a whole is particularly effective though.  Back when the combined
> set of "attr refactoring" + "larp" + "parent pointers" was particularly
> large, I used to just send out subsets that I thought were more
> reasonable for people to digest.  That way people can look at the giant
> mega-set if they really gotta see it, but it kept the reviews more
> focused on a sort of smaller next step.

TBH every time I went to look at all that, I pulled your github branch
and looked at the whole thing.  I paid more attention to whatever was
being reviewed on-list, obviously.  That said, after about the fifth
round of looking at a patchset I start feeling like I'm only going to
increase my knowledge of the code by using it to write something.  At
that point it's easier to convince me to merge it, or at least to fling
it at fstestscloud.

> 
> > 
> > > [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key
> > 
> > Resolving the questions presented by this series is critical to
> > nailing
> > down the ondisk format and merging the feature.  But we'll get to
> > that
> > below.
> > 
> > > [PATCHSET v9r2d1 0/3] xfs: use flex arrays for XFS_IOC_GETPARENTS,
> > 
> > I'd like to know what you think about converting the ioctl definition
> > to
> > flex arrays instead of the fixed size structs.  I'm not sure where to
> > put this series, though.  If you decide that you want 'em, then
> > ideally
> > they'd be in xfs_fs.h from the introduction of XFS_IOC_GETPARENTS,
> > but
> > I don't see any point in backporting them around "xfs: rework the
> > GETPARENTS ioctl".
> > 
> > (I would be ok if you rolled all of it into patch 25 from the
> > original
> > v9 set.)
> I'll take a look at it, I didn't put a whole lot of focus on the ioctl
> initially because the only thing that was using it at the time was the
> test case, and I wanted to keep attention more on the infrastructure.

<nod> I only started looking at it because I started pounding on it with
xfs_scrub and noticed problems. :D

> > 
> > > Of those, I think "xfs: encode parent pointer name in xattr key" is
> > > the
> > > only one that might impact other features since it's changing the
> > > ondisk format from when we first started the effort years ago.  So
> > > probably that might be the best place for people to start since if
> > > this
> > > needs to change it might impact some of the other subsets in the
> > > deluge, or even features they are working on if they've based
> > > anything
> > > on the existing pptr set.
> > 
> > Bingo!
> > 
> > The biggest question about the format change is (IMHO) whether we're
> > ok
> > with using a hash function for parent pointer names that don't fit in
> > the attr key space, and which hash?
> > 
> > The sha2 family was designed to be collision resistant, but I don't
> > anticipate that will last forever.  The hash is computed from (the
> > full
> > name and the child generation number) when the dirent name is longer
> > than 243 bytes.  The first 179 bytes of the dirent name are still
> > written in the parent pointer attr name.  An attacker would have to
> > find
> > a collision that only changes the last 76 bytes of the dirent name,
> > and
> > they'd have to know the generation number at runtime.
> > 
> > (Note: dirent names shorter than 243 bytes are written directly into
> > the
> > parent pointer xattr name, no hashing required.)
> > 
> > I /think/ that's good enough, but I'm no cryptanalyst.  The
> > alternative
> > would be to change the xattr format so that the namelen field in the
> > leaf structure to encode *only* the name component of the parent
> > pointer.  This would lead to a lot of special cased xattr code and
> > probably a lot of bugs and other stupid problems, which is why I
> > didn't
> > take that route.
> > 
> > Thoughts?
> 
> Hmm, well, it sounds like a risk to be weighed.  It wouldn't happen
> very often.  It seems like it would be extremely rare.  But when it
> does it will likely be quite unpleasant.  
> 
> I think another question to ask would be how often does the parent
> pointer really need to be updated in a repair?  In most cases, an
> orphaned inode will likely be able to return to the diroffset from
> whence it came.  So an update may be unlikely.  Even more so would be
> the worst case of needing to update crazy amounts of parent pointers.
> So another option is to simply pick a cap and error out if the demand
> is too much.  Likely if this condition does arise, there's probably
> bigger issues going on.
> 
> While option A is substantially more rare than option B, you could
> probably pick either one and rarely encounter the error path.  While
> option A does have the advantage of being more memory conservative, it
> has the disadvantage of possibly being a really ugly sleeping bug. 
> While option B might error out when option A would have not, it would
> at least be clear as to why it did, and probably allude to the presence
> of bigger problems, such as an internal bug that we should probably go
> catch, or perhaps something external corrupting the fs image, which
> ofsck may not be able to solve anyway.  
> 
> FWIW I seem to recall running across the idea of using hashes as keys
> in other projects I've been on, and most of the time the rarity of the
> collision was considered an acceptable risk, though it's really about
> which risk really bothers you more.

I want to study sha2 hash collisions and/or how the xattr code stumbles
over attrs with the same dahash first.  Dealing with colliding xattr
names might not be as painful for the parent pointer code as I'm
currently thinking.

> > 
> > > I feel like a 5 patch subset is a very reasonable thing to ask
> > > people
> > > to give their attention to.  That way they don't get lost in things
> > > like
> > > nits for optimizations that might not even matter if something it
> > > depends on changes.
> > > 
> > > For the most part I am ok with changing the format as long as
> > > everyone
> > > is aware and in agreement so that we don't get caught up re-coding
> > > efforts that seem to have struggled with disagreements now on the
> > > scale
> > > of decades.  Some of these patches were already very old by the
> > > time I
> > > got them!
> > 
> > Hheehhe.  Same here -- rmap was pretty old by the time I started
> > pushing
> > that for reals. :)
> > 
> > > On a side note, there are some preliminary patches of kernel side
> > > parent pointers that are either larp fixes or refactoring not
> > > sensitive
> > > to the proposed ofsck changes.  These patches have been floating
> > > around for a while now, so if no one has any gripes, I think just
> > > merging those would help cut down the amount of rebasing, user
> > > space
> > > porting and patch reviewing that goes on for every version.  (maybe
> > > the
> > > first 1 through 7 of the 28 patch set, if folks are ok with that)
> > 
> > I thought about doing that for 6.3, but I found enough bugs in the
> > locking stuff (recall the first bugfix series) that I held back.  I'm
> > not sure about the two "Increase <blah>" patches -- they'll bloat
> > kernel
> > structures without a real user for them.
> 
> I don't think the first 7 are order sensitive, we should be able to do
> just 1, 4, 5, 6 and 7.

OH.

> > 
> > <shrug>
> > 
> > > I think the sheer size of some of these sets tends to work against
> > > them,
> > > as people likely cannot afford the time block they present on the
> > > surface.
> > 
> > Agreed.  At this point, I've worked through enough of the parent
> > pointers code to understand what's going on that I'm ok with merging
> > it
> > once we settle the above question.
> > 
> > FWIW the whole series (kernel+xfsprogs+fstests) has been passing my
> > nightly QA farm for a couple of weeks now despite my constant
> > hammering
> > on it, so I think the implementation is ready.
> > 
> > > So I think we would do well to find a way to introduce them
> > > at a reasonable pace and keep attention focused on the subsections
> > > that
> > > should require more than others, and hopefully keep things moving in
> > > a
> > > progressive direction.
> > 
> > I disagree -- I want to merge online fsck part 1 so I can get that
> > out
> > of my dev trees.  Then I want to focus on getting this over the
> > finish
> > line and merged.  But then I'm not known for incrementalism. :P
> Well, I notice people respond better to subsets in smaller doses
> though.  And then it gives the preliminary patches time to stabilize if
> people do find an issue.

<nod> I'll keep that in mind.

--D

> > 
> > --D
> > 
> > > Thx!
> > > Allison
> > > 
> 


* Re: [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Darrick J. Wong
                     ` (5 preceding siblings ...)
  2023-02-18  8:12   ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Amir Goldstein
@ 2023-03-03 16:43   ` Darrick J. Wong
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
  7 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 16:43 UTC (permalink / raw)
  To: allison.henderson, linux-xfs

On Thu, Feb 16, 2023 at 12:28:30PM -0800, Darrick J. Wong wrote:
> Hi all,
> 
> As I've mentioned in past comments on the parent pointers patchset, the
> proposed ondisk parent pointer format presents a major difficulty for
> online directory repair.  This difficulty derives from encoding the
> directory offset of the dirent that the parent pointer is mirroring.
> Recall that parent pointers are stored in extended attributes:
> 
>     (parent_ino, parent_gen, diroffset) -> (dirent_name)
> 
> If the directory is rebuilt, the offsets of the new directory entries
> must match the diroffset encoded in the parent pointer, or the
> filesystem becomes inconsistent.  There are a few ways to solve this
> problem.
> 
> One approach would be to augment the directory addname function to take
> a diroffset and try to create the new entry at that offset.  This will
> not work if the original directory became corrupt and the parent
> pointers were written out with impossible diroffsets (e.g. overlapping).
> Requiring matching diroffsets also prevents reorganization and
> compaction of directories.
> 
> This could be remedied by recording the parent pointer diroffset updates
> necessary to retain consistency, and using the logged parent pointer
> replace function to rewrite parent pointers as necessary.  This is a
> poor choice from a performance perspective because the logged xattr
> updates must be committed in the same transaction that commits the new
> directory structure.  If there are a large number of diroffset updates,
> then the directory commit could take an even longer time.
> 
> Worse yet, if the logged xattr updates fill up the transaction, repair
> will have no choice but to roll to a fresh transaction to continue
> logging.  This breaks repair's policy that repairs should commit
> atomically.  It may break the filesystem as well, since all files
> involved are pinned until the delayed pptr xattr processing completes.
> This is a completely bad engineering choice.
> 
> Note that the diroffset information is not used anywhere in the
> directory lookup code.  Observe that the only information that we
> require for a parent pointer is the inverse of a pre-ftype dirent,
> since this is all we need to reconstruct a directory entry:
> 
>     (parent_ino, dirent_name) -> NULL
> 
> The xattr code supports xattrs with zero-length values, surprisingly.
> The parent_gen field makes it easy to export parent handle information,
> so it can be retained:
> 
>     (parent_ino, parent_gen, dirent_name) -> NULL
> 
> Moving the ondisk format to this format is very advantageous for repair
> code.  Unfortunately, there is one hitch: xattr names cannot exceed 255
> bytes due to ondisk format limitations.  We don't want to constrain the
> length of dirent names, so instead we could use collision resistant
> hashes to handle dirents with very long names:
> 
>     (parent_ino, parent_gen, sha512(dirent_name)) -> (dirent_name)
> 
> The first two patches implement this schema.  However, this encoding is
> not maximally efficient, since many directory names are shorter than the
> length of a sha512 hash.  The last three patches in the series bifurcate
> the parent pointer ondisk format depending on context:
> 
> For dirent names shorter than 243 bytes:
> 
>     (parent_ino, parent_gen, dirent_name) -> NULL
> 
> For dirent names longer than 243 bytes:
> 
>     (parent_ino, parent_gen, dirent_name[0:178],
>      sha512(child_gen, dirent_name)) -> (dirent_name[179:255])

Heh, that should've been dirent_name[179:254].

> The child file's generation number is mixed into the sha512 computation
> to make it a little more difficult for unprivileged userspace to attempt
> collisions.
> 
> A messier solution to this problem would be to extend the xattr ondisk
> format to allow parent pointers to have xattr names up to 267 bytes.
> This would likely involve redefining the ondisk namelen field to omit
> the size of the parent ino/gen information and might be madness.

Update:

After some subtle prodding from Dave, I realized that there's a simpler
solution to this problem: extend the xattr match predicate to check both
the xattr name /and/ the xattr value.  Parent pointers cannot be remote
format because the amount of data is never larger than 3/4 of 1FSB, so
the value of an ATTR_PARENT attribute is always immediately available.

The pptr ondisk format becomes one of:

    (parent_ino, parent_gen, dirent_name) -> NULL
    (parent_ino, parent_gen, dirent_name[0:242]) -> (dirent_name[243:254])

Matching on both xattr name and value is only useful* for parent
pointers, so I introduced a new XFS_DA_OP_VLOOKUP flag to gate this new
mode.  No more sha512 in the attr name, no more worrying about collision
resistance of sha*, and one less dependency for xfs.
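
For illustration only, here is a rough userspace sketch of the split and
of a match predicate that compares both halves.  It is not the kernel
code: struct pptr_rec, pptr_encode() and pptr_match() are made-up names,
and the 243-byte limit assumes a 64-bit parent_ino plus a 32-bit
parent_gen packed ahead of the name inside a 255-byte xattr name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PPTR_MAXNAMELEN		255	/* longest dirent name */
#define PPTR_KEY_NAMELEN	243	/* dirent name bytes that fit in the xattr name */

/* Hypothetical in-memory form of one parent pointer, not an ondisk struct. */
struct pptr_rec {
	uint64_t	parent_ino;
	uint32_t	parent_gen;
	uint8_t		key_name[PPTR_KEY_NAMELEN];	/* dirent_name[0:242] */
	size_t		key_namelen;
	uint8_t		value[PPTR_MAXNAMELEN - PPTR_KEY_NAMELEN]; /* dirent_name[243:254] */
	size_t		valuelen;
};

/* Split a dirent name between the xattr name part and the xattr value. */
static void
pptr_encode(struct pptr_rec *rec, uint64_t ino, uint32_t gen,
	    const char *name, size_t namelen)
{
	assert(namelen > 0 && namelen <= PPTR_MAXNAMELEN);

	memset(rec, 0, sizeof(*rec));
	rec->parent_ino = ino;
	rec->parent_gen = gen;
	rec->key_namelen = namelen < PPTR_KEY_NAMELEN ? namelen : PPTR_KEY_NAMELEN;
	memcpy(rec->key_name, name, rec->key_namelen);

	/* Whatever does not fit in the name spills into the value. */
	rec->valuelen = namelen - rec->key_namelen;
	memcpy(rec->value, name + rec->key_namelen, rec->valuelen);
}

/* "VLOOKUP"-style match: the name part and the value must both agree. */
static bool
pptr_match(const struct pptr_rec *a, const struct pptr_rec *b)
{
	return a->parent_ino == b->parent_ino &&
	       a->parent_gen == b->parent_gen &&
	       a->key_namelen == b->key_namelen &&
	       a->valuelen == b->valuelen &&
	       memcmp(a->key_name, b->key_name, a->key_namelen) == 0 &&
	       memcmp(a->value, b->value, a->valuelen) == 0;
}
```

Two 255-byte names that differ only in their last byte produce identical
name parts here, so a name-only comparison cannot tell them apart; the
value comparison is what disambiguates them.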

The next challenge was to log the VLOOKUP flag when we're doing xattr
operations and to recover that state when replaying an ATTRI log item.
This I did by creating new "NV" variants of ATTRI_OP_FLAGS_{SET,REMOVE}.
The NVSET and NVREMOVE opcodes require VLOOKUP, and NVREMOVE can log
an xattr value buffer.  Since parent pointers is an unmerged incompat
feature, we don't need a new log-incompat feature to protect them.

The last thing was to update NVREPLACE so that we can handle rename
operations.  Here too we need VLOOKUPs, but I also needed to log both
the old value and the new value, so I changed the attri ondisk format.
For NVREPLACE, the old and new name lengths are two u16 overlaid atop
the alfi_name_len field; and the new value length is a u32 that replaces
the old pad (and what Allison called alfi_nname_len).
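
A rough sketch of that overlay, for illustration only.  This is not the
real xfs_attri_log_format: struct attri_lens and its field names are
invented, only the length fields are shown, and the placement of the
fields relative to each other is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the length fields of an ATTRI log item. */
struct attri_lens {
	union {
		uint32_t	name_len;	/* every opcode except NVREPLACE */
		struct {
			uint16_t	old_name_len;
			uint16_t	new_name_len;
		} replace;			/* NVREPLACE only */
	} name;
	uint32_t	new_value_len;	/* takes the place of the old pad */
	uint32_t	value_len;	/* old value length */
};

/* The overlay must not change the size of the field it sits on. */
static_assert(sizeof(((struct attri_lens *)0)->name) == sizeof(uint32_t),
	      "name length overlay changed the field size");

/* Fill in the lengths for a replace; returns -1 if one does not fit. */
static int
attri_set_replace_lens(struct attri_lens *l, size_t old_name,
		       size_t new_name, size_t old_value, size_t new_value)
{
	if (old_name > UINT16_MAX || new_name > UINT16_MAX)
		return -1;
	if (old_value > UINT32_MAX || new_value > UINT32_MAX)
		return -1;

	l->name.replace.old_name_len = (uint16_t)old_name;
	l->name.replace.new_name_len = (uint16_t)new_name;
	l->value_len = (uint32_t)old_value;
	l->new_value_len = (uint32_t)new_value;
	return 0;
}
```

Since an xattr name tops out at 255 bytes, a u16 per name length leaves
plenty of headroom.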

I'm going to run QA on this over the weekend, and figure out how to
collapse this patchset with the new one.  I'll rebase the whole branch
on pptrs v10 whenever it comes out.

--D

* Unless someone has a usecase for cmpxchg of extended attributes?

> If you're going to start using this mess, you probably ought to just
> pull from my git trees, which are linked below.
> 
> This is an extraordinary way to destroy everything.  Enjoy!
> Comments and questions are, as always, welcome.
> kernel git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-name-in-attr-key
> 
> xfsprogs git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-name-in-attr-key
> 
> fstests git tree:
> https://git.kernel.org/cgit/linux/kernel/git/djwong/xfstests-dev.git/log/?h=pptrs-name-in-attr-key
> ---
>  fs/xfs/Kconfig                 |    1 
>  fs/xfs/libxfs/xfs_da_format.h  |   49 +++++++
>  fs/xfs/libxfs/xfs_fs.h         |    4 -
>  fs/xfs/libxfs/xfs_parent.c     |  265 ++++++++++++++++++++++++++++++++--------
>  fs/xfs/libxfs/xfs_parent.h     |   48 +++++--
>  fs/xfs/libxfs/xfs_trans_resv.c |    6 -
>  fs/xfs/scrub/dir.c             |   16 ++
>  fs/xfs/scrub/dir_repair.c      |   87 ++++---------
>  fs/xfs/scrub/parent.c          |   51 +++++---
>  fs/xfs/scrub/parent_repair.c   |   29 ++--
>  fs/xfs/scrub/trace.h           |   48 ++-----
>  fs/xfs/xfs_attr_item.c         |    4 -
>  fs/xfs/xfs_inode.c             |   30 ++---
>  fs/xfs/xfs_linux.h             |    1 
>  fs/xfs/xfs_mount.c             |   13 ++
>  fs/xfs/xfs_mount.h             |    3 
>  fs/xfs/xfs_ondisk.h            |    6 +
>  fs/xfs/xfs_parent_utils.c      |    4 -
>  fs/xfs/xfs_sha512.h            |   42 ++++++
>  fs/xfs/xfs_super.c             |    3 
>  fs/xfs/xfs_symlink.c           |    3 
>  21 files changed, 481 insertions(+), 232 deletions(-)
>  create mode 100644 fs/xfs/xfs_sha512.h
> 


* [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing
  2023-02-16 20:28 ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Darrick J. Wong
                     ` (6 preceding siblings ...)
  2023-03-03 16:43   ` Darrick J. Wong
@ 2023-03-03 17:11   ` Darrick J. Wong
  2023-03-03 17:11     ` [PATCH 01/13] xfs: make xfs_attr_set require XFS_DA_OP_REMOVE Darrick J. Wong
                       ` (12 more replies)
  7 siblings, 13 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:11 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

Hi all,

Dave Chinner pointed out (a bit too subtly) that hashing the dirent name
to try to squash it into the parent pointer xattr name is unnecessary
because we could simply make the xattr matching predicate compare names.
Do that instead and drop the hashing.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.
kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pptrs-vlookup

xfsprogs git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfsprogs-dev.git/log/?h=pptrs-vlookup
---
 fs/xfs/Kconfig                 |    1 
 fs/xfs/libxfs/xfs_attr.c       |   39 +++--
 fs/xfs/libxfs/xfs_attr_leaf.c  |   41 +++++
 fs/xfs/libxfs/xfs_da_btree.h   |    6 +
 fs/xfs/libxfs/xfs_da_format.h  |   34 ++---
 fs/xfs/libxfs/xfs_log_format.h |   30 +++-
 fs/xfs/libxfs/xfs_parent.c     |  302 +++++++++++++---------------------------
 fs/xfs/libxfs/xfs_parent.h     |   16 --
 fs/xfs/libxfs/xfs_trans_resv.c |    1 
 fs/xfs/scrub/dir.c             |   38 +----
 fs/xfs/scrub/parent.c          |   61 +-------
 fs/xfs/scrub/parent_repair.c   |   34 +----
 fs/xfs/xfs_attr_item.c         |  217 ++++++++++++++++++++---------
 fs/xfs/xfs_attr_item.h         |    3 
 fs/xfs/xfs_linux.h             |    1 
 fs/xfs/xfs_mount.c             |   13 --
 fs/xfs/xfs_mount.h             |    3 
 fs/xfs/xfs_ondisk.h            |    3 
 fs/xfs/xfs_sha512.h            |   42 ------
 fs/xfs/xfs_super.c             |    3 
 fs/xfs/xfs_xattr.c             |    5 +
 21 files changed, 383 insertions(+), 510 deletions(-)
 delete mode 100644 fs/xfs/xfs_sha512.h



* [PATCH 01/13] xfs: make xfs_attr_set require XFS_DA_OP_REMOVE
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
@ 2023-03-03 17:11     ` Darrick J. Wong
  2023-03-03 17:11     ` [PATCH 02/13] xfs: allow xattr matching on value for local/sf attrs Darrick J. Wong
                       ` (11 subsequent siblings)
  12 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:11 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In the next patch we're going to add the ability to look up local/sf
xattrs by matching on both the attr name and the attr value.  As a
result, callers of xfs_attr_set must declare explicitly that they want
to remove an xattr; passing in a NULL value will no longer suffice.
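
To illustrate the change in calling convention, here is a minimal
userspace sketch.  The demo_* names are hypothetical stand-ins and not
kernel API; only the bit value of the remove flag is copied from the
XFS_DA_OP_REMOVE definition quoted later in this thread.  It contrasts
the old "NULL value means remove" test with the explicit flag test, and
shows the kind of fixup a legacy entry point can apply so that existing
callers keep working.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in; bit value mirrors XFS_DA_OP_REMOVE. */
#define DEMO_OP_REMOVE	(1u << 6)

/* Hypothetical, heavily reduced stand-in for struct xfs_da_args. */
struct demo_args {
	const void	*value;
	unsigned int	valuelen;
	unsigned int	op_flags;
};

/* Old convention: a NULL value implied a removal. */
static inline bool demo_is_remove_old(const struct demo_args *args)
{
	return args->value == NULL;
}

/* New convention: only the explicit flag requests a removal. */
static inline bool demo_is_remove_new(const struct demo_args *args)
{
	return (args->op_flags & DEMO_OP_REMOVE) != 0;
}

/*
 * Translate the old convention into the new one, the way a wrapper for
 * legacy callers might: derive the flag from the presence of a value.
 */
static inline void demo_legacy_fixup(struct demo_args *args)
{
	assert(args != NULL);
	if (args->value)
		args->op_flags &= ~DEMO_OP_REMOVE;
	else
		args->op_flags |= DEMO_OP_REMOVE;
}
```

With the flag in place, a caller can pass a non-NULL value and still
request a removal, which is what a remove-by-name-and-value needs.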

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c   |    9 +++++----
 fs/xfs/libxfs/xfs_parent.c |    1 +
 fs/xfs/xfs_xattr.c         |    5 +++++
 3 files changed, 11 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 3065dd622102..756d93526075 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -977,6 +977,7 @@ xfs_attr_set(
 	struct xfs_inode	*dp = args->dp;
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_trans_res	tres;
+	bool			is_remove = args->op_flags & XFS_DA_OP_REMOVE;
 	bool			rsvd;
 	int			error, local;
 	int			rmt_blks = 0;
@@ -1004,7 +1005,7 @@ xfs_attr_set(
 	args->op_flags = XFS_DA_OP_OKNOENT |
 					(args->op_flags & XFS_DA_OP_LOGGED);
 
-	if (args->value) {
+	if (!is_remove) {
 		XFS_STATS_INC(mp, xs_attr_set);
 		args->total = xfs_attr_calc_size(args, &local);
 
@@ -1038,7 +1039,7 @@ xfs_attr_set(
 	if (error)
 		return error;
 
-	if (args->value || xfs_inode_hasattr(dp)) {
+	if (!is_remove || xfs_inode_hasattr(dp)) {
 		error = xfs_iext_count_may_overflow(dp, XFS_ATTR_FORK,
 				XFS_IEXT_ATTR_MANIP_CNT(rmt_blks));
 		if (error == -EFBIG)
@@ -1052,7 +1053,7 @@ xfs_attr_set(
 	switch (error) {
 	case -EEXIST:
 		/* if no value, we are performing a remove operation */
-		if (!args->value) {
+		if (is_remove) {
 			error = xfs_attr_defer_remove(args);
 			break;
 		}
@@ -1064,7 +1065,7 @@ xfs_attr_set(
 		break;
 	case -ENOATTR:
 		/* Can't remove what isn't there. */
-		if (!args->value)
+		if (is_remove)
 			goto out_trans_cancel;
 
 		/* Pure replace fails if no existing attr to replace. */
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index f7fecee93894..387f3c65287f 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -460,6 +460,7 @@ xfs_parent_unset(
 	scr->args.geo		= ip->i_mount->m_attr_geo;
 	scr->args.name		= (const unsigned char *)&scr->rec;
 	scr->args.namelen	= reclen;
+	scr->args.op_flags	= XFS_DA_OP_REMOVE;
 	scr->args.whichfork	= XFS_ATTR_FORK;
 
 	return xfs_attr_set(&scr->args);
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index 85edd7e05fde..8f8aa13bf7eb 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -103,6 +103,11 @@ xfs_attr_change(
 		use_logging = true;
 	}
 
+	if (args->value)
+		args->op_flags &= ~XFS_DA_OP_REMOVE;
+	else
+		args->op_flags |= XFS_DA_OP_REMOVE;
+
 	error = xfs_attr_set(args);
 
 	if (use_logging)


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 02/13] xfs: allow xattr matching on value for local/sf attrs
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
  2023-03-03 17:11     ` [PATCH 01/13] xfs: make xfs_attr_set require XFS_DA_OP_REMOVE Darrick J. Wong
@ 2023-03-03 17:11     ` Darrick J. Wong
  2023-03-03 17:11     ` [PATCH 03/13] xfs: preserve VLOOKUP in xfs_attr_set Darrick J. Wong
                       ` (10 subsequent siblings)
  12 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:11 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr_leaf.c |   41 +++++++++++++++++++++++++++++++++++------
 fs/xfs/libxfs/xfs_da_btree.h  |    4 +++-
 2 files changed, 38 insertions(+), 7 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr_leaf.c b/fs/xfs/libxfs/xfs_attr_leaf.c
index e6c4c8b52a55..d05f2c5cc0cc 100644
--- a/fs/xfs/libxfs/xfs_attr_leaf.c
+++ b/fs/xfs/libxfs/xfs_attr_leaf.c
@@ -473,10 +473,12 @@ xfs_attr3_leaf_read(
  */
 static bool
 xfs_attr_match(
-	struct xfs_da_args	*args,
-	uint8_t			namelen,
-	unsigned char		*name,
-	int			flags)
+	const struct xfs_da_args	*args,
+	uint8_t				namelen,
+	const unsigned char		*name,
+	unsigned int			valuelen,
+	const void			*value,
+	int				flags)
 {
 
 	if (args->namelen != namelen)
@@ -484,6 +486,23 @@ xfs_attr_match(
 	if (memcmp(args->name, name, namelen) != 0)
 		return false;
 
+	if (args->op_flags & XFS_DA_OP_VLOOKUP) {
+		if (args->valuelen != valuelen)
+			return false;
+		if (args->valuelen && !value) {
+			/* not implemented for remote values */
+			ASSERT(0);
+			return false;
+		}
+		if (valuelen && !args->value) {
+			/* caller gave us valuelen > 0 but no value?? */
+			ASSERT(0);
+			return false;
+		}
+		if (valuelen > 0 && memcmp(args->value, value, valuelen) != 0)
+			return false;
+	}
+
 	/* Recovery ignores the INCOMPLETE flag. */
 	if ((args->op_flags & XFS_DA_OP_RECOVERY) &&
 	    args->attr_filter == (flags & XFS_ATTR_NSP_ONDISK_MASK))
@@ -502,6 +521,10 @@ xfs_attr_copy_value(
 	unsigned char		*value,
 	int			valuelen)
 {
+	/* vlookups already supplied the attr value; don't copy anything */
+	if (args->op_flags & XFS_DA_OP_VLOOKUP)
+		return 0;
+
 	/*
 	 * No copy if all we have to do is get the length
 	 */
@@ -726,6 +749,7 @@ xfs_attr_sf_findname(
 			     base += size, i++) {
 		size = xfs_attr_sf_entsize(sfe);
 		if (!xfs_attr_match(args, sfe->namelen, sfe->nameval,
+				    sfe->valuelen, &sfe->nameval[sfe->namelen],
 				    sfe->flags))
 			continue;
 		break;
@@ -896,6 +920,7 @@ xfs_attr_shortform_lookup(xfs_da_args_t *args)
 	for (i = 0; i < sf->hdr.count;
 				sfe = xfs_attr_sf_nextentry(sfe), i++) {
 		if (xfs_attr_match(args, sfe->namelen, sfe->nameval,
+				sfe->valuelen, &sfe->nameval[sfe->namelen],
 				sfe->flags))
 			return -EEXIST;
 	}
@@ -923,6 +948,7 @@ xfs_attr_shortform_getvalue(
 	for (i = 0; i < sf->hdr.count;
 				sfe = xfs_attr_sf_nextentry(sfe), i++) {
 		if (xfs_attr_match(args, sfe->namelen, sfe->nameval,
+				sfe->valuelen, &sfe->nameval[sfe->namelen],
 				sfe->flags))
 			return xfs_attr_copy_value(args,
 				&sfe->nameval[args->namelen], sfe->valuelen);
@@ -2484,14 +2510,17 @@ xfs_attr3_leaf_lookup_int(
 		if (entry->flags & XFS_ATTR_LOCAL) {
 			name_loc = xfs_attr3_leaf_name_local(leaf, probe);
 			if (!xfs_attr_match(args, name_loc->namelen,
-					name_loc->nameval, entry->flags))
+					name_loc->nameval,
+					be16_to_cpu(name_loc->valuelen),
+					&name_loc->nameval[name_loc->namelen],
+					entry->flags))
 				continue;
 			args->index = probe;
 			return -EEXIST;
 		} else {
 			name_rmt = xfs_attr3_leaf_name_remote(leaf, probe);
 			if (!xfs_attr_match(args, name_rmt->namelen,
-					name_rmt->name, entry->flags))
+					name_rmt->name, 0, NULL, entry->flags))
 				continue;
 			args->index = probe;
 			args->rmtvaluelen = be32_to_cpu(name_rmt->valuelen);
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index 90b86d00258f..0ef32f629e1b 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -96,6 +96,7 @@ typedef struct xfs_da_args {
 #define XFS_DA_OP_REMOVE	(1u << 6) /* this is a remove operation */
 #define XFS_DA_OP_RECOVERY	(1u << 7) /* Log recovery operation */
 #define XFS_DA_OP_LOGGED	(1u << 8) /* Use intent items to track op */
+#define XFS_DA_OP_VLOOKUP	(1u << 9) /* Compare attr value during lookup */
 
 #define XFS_DA_OP_FLAGS \
 	{ XFS_DA_OP_JUSTCHECK,	"JUSTCHECK" }, \
@@ -106,7 +107,8 @@ typedef struct xfs_da_args {
 	{ XFS_DA_OP_NOTIME,	"NOTIME" }, \
 	{ XFS_DA_OP_REMOVE,	"REMOVE" }, \
 	{ XFS_DA_OP_RECOVERY,	"RECOVERY" }, \
-	{ XFS_DA_OP_LOGGED,	"LOGGED" }
+	{ XFS_DA_OP_LOGGED,	"LOGGED" }, \
+	{ XFS_DA_OP_VLOOKUP,	"VLOOKUP" }
 
 /*
  * Storage for holding state during Btree searches and split/join ops.


^ permalink raw reply related	[flat|nested] 227+ messages in thread
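
Patch 02 above carries no commit message, so here is a rough userspace
sketch of the comparison that its xfs_attr_match hunk adds.  All demo_*
names are hypothetical; the namespace and INCOMPLETE flag checks of the
real function are left out, and the ASSERTs on the error paths are
reduced to plain "no match" returns.  Treat it as an illustration of the
idea, not as the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in; bit value mirrors XFS_DA_OP_VLOOKUP. */
#define DEMO_OP_VLOOKUP	(1u << 9)

/* Hypothetical reduced lookup key. */
struct demo_lookup {
	const unsigned char	*name;
	unsigned int		namelen;
	const void		*value;
	unsigned int		valuelen;
	unsigned int		op_flags;
};

/*
 * Return true if the candidate attr matches the lookup key.  The name
 * must always match; the value is compared only when VLOOKUP is set.
 * A NULL candidate value with a nonzero length models a remote value,
 * which this sketch (like the patch) does not try to compare.
 */
static inline bool
demo_attr_match(const struct demo_lookup *args,
		const unsigned char *name, unsigned int namelen,
		const void *value, unsigned int valuelen)
{
	assert(args != NULL && name != NULL);

	if (args->namelen != namelen)
		return false;
	if (memcmp(args->name, name, namelen) != 0)
		return false;

	if (args->op_flags & DEMO_OP_VLOOKUP) {
		if (args->valuelen != valuelen)
			return false;
		if (valuelen && (!value || !args->value))
			return false;
		if (valuelen && memcmp(args->value, value, valuelen) != 0)
			return false;
	}
	return true;
}
```

Without the flag, two attrs with the same name are indistinguishable;
with it, the value takes part in the match.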

* [PATCH 03/13] xfs: preserve VLOOKUP in xfs_attr_set
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
  2023-03-03 17:11     ` [PATCH 01/13] xfs: make xfs_attr_set require XFS_DA_OP_REMOVE Darrick J. Wong
  2023-03-03 17:11     ` [PATCH 02/13] xfs: allow xattr matching on value for local/sf attrs Darrick J. Wong
@ 2023-03-03 17:11     ` Darrick J. Wong
  2023-03-03 17:11     ` [PATCH 04/13] xfs: log VLOOKUP xattr removal operations Darrick J. Wong
                       ` (9 subsequent siblings)
  12 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:11 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Preserve the attr-value lookup flag when calling xfs_attr_set.  Normal
xattr users will never use this, but parent pointer fsck will.
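
As a sketch of what "preserve" means here: the flag word is filtered
down to the bits that lower levels still need, and OKNOENT is then
forced on.  The demo_* names are hypothetical.  The REMOVE, LOGGED and
VLOOKUP bit values are copied from the definitions quoted in this
thread; the OKNOENT value is an assumption made for the example.

```c
#include <assert.h>

#define DEMO_OP_OKNOENT	(1u << 3)	/* assumed value, illustration only */
#define DEMO_OP_REMOVE	(1u << 6)
#define DEMO_OP_LOGGED	(1u << 8)
#define DEMO_OP_VLOOKUP	(1u << 9)

/* Before: only LOGGED survives the filter. */
static inline unsigned int demo_filter_old(unsigned int op_flags)
{
	return DEMO_OP_OKNOENT | (op_flags & DEMO_OP_LOGGED);
}

/* After: LOGGED and VLOOKUP both survive. */
static inline unsigned int demo_filter_new(unsigned int op_flags)
{
	op_flags &= (DEMO_OP_LOGGED | DEMO_OP_VLOOKUP);
	op_flags |= DEMO_OP_OKNOENT;

	/*
	 * REMOVE never survives the filter, so a caller has to sample it
	 * beforehand (compare is_remove in patch 01).
	 */
	assert((op_flags & DEMO_OP_REMOVE) == 0);
	return op_flags;
}
```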

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 756d93526075..86672061c99e 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -999,11 +999,11 @@ xfs_attr_set(
 	/*
 	 * We have no control over the attribute names that userspace passes us
 	 * to remove, so we have to allow the name lookup prior to attribute
-	 * removal to fail as well.  Preserve the logged flag, since we need
-	 * to pass that through to the logging code.
+	 * removal to fail as well.  Preserve the logged and vlookup flags,
+	 * since we need to pass them through to the lower levels.
 	 */
-	args->op_flags = XFS_DA_OP_OKNOENT |
-					(args->op_flags & XFS_DA_OP_LOGGED);
+	args->op_flags &= (XFS_DA_OP_LOGGED | XFS_DA_OP_VLOOKUP);
+	args->op_flags |= XFS_DA_OP_OKNOENT;
 
 	if (!is_remove) {
 		XFS_STATS_INC(mp, xs_attr_set);


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 04/13] xfs: log VLOOKUP xattr removal operations
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
                       ` (2 preceding siblings ...)
  2023-03-03 17:11     ` [PATCH 03/13] xfs: preserve VLOOKUP in xfs_attr_set Darrick J. Wong
@ 2023-03-03 17:11     ` Darrick J. Wong
  2023-03-03 17:11     ` [PATCH 05/13] xfs: log VLOOKUP xattr setting operations Darrick J. Wong
                       ` (8 subsequent siblings)
  12 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:11 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If high level code wants to do a deferred xattr remove operation with
the VLOOKUP flag set, we need to push this through the log.
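
Roughly, the deferral side has to pick a different intent opcode when
VLOOKUP is set, so that log recovery knows a value accompanies the name.
The sketch below uses hypothetical demo_* names; the opcode and flag
values mirror the definitions in the diff that follows.

```c
#include <assert.h>

#define DEMO_DA_OP_VLOOKUP	(1u << 9)
#define DEMO_ATTRI_OP_REMOVE	2	/* remove by name */
#define DEMO_ATTRI_OP_NVREMOVE	5	/* remove by name and value */
#define DEMO_ATTRI_OP_TYPE_MASK	0xFF

/* Choose the intent opcode to log for a deferred xattr removal. */
static inline int demo_remove_intent_op(unsigned int da_op_flags)
{
	int op = DEMO_ATTRI_OP_REMOVE;

	if (da_op_flags & DEMO_DA_OP_VLOOKUP)
		op = DEMO_ATTRI_OP_NVREMOVE;

	/* The opcode has to fit in the logged type mask. */
	assert((op & ~DEMO_ATTRI_OP_TYPE_MASK) == 0);
	return op;
}
```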

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c       |    6 +++++-
 fs/xfs/libxfs/xfs_log_format.h |    1 +
 fs/xfs/xfs_attr_item.c         |    7 +++++++
 3 files changed, 13 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 86672061c99e..6468286d2d71 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -953,9 +953,13 @@ xfs_attr_defer_remove(
 {
 
 	struct xfs_attr_intent	*new;
+	int			op_flag = XFS_ATTRI_OP_FLAGS_REMOVE;
 	int			error;
 
-	error  = xfs_attr_intent_init(args, XFS_ATTRI_OP_FLAGS_REMOVE, &new);
+	if (args->op_flags & XFS_DA_OP_VLOOKUP)
+		op_flag = XFS_ATTRI_OP_FLAGS_NVREMOVE;
+
+	error  = xfs_attr_intent_init(args, op_flag, &new);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 727b5a858028..a3d95a3d8476 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -959,6 +959,7 @@ struct xfs_icreate_log {
 #define XFS_ATTRI_OP_FLAGS_REMOVE	2	/* Remove the attribute */
 #define XFS_ATTRI_OP_FLAGS_REPLACE	3	/* Replace the attribute */
 #define XFS_ATTRI_OP_FLAGS_NVREPLACE	4	/* Replace attr name and val */
+#define XFS_ATTRI_OP_FLAGS_NVREMOVE	5	/* Remove attr w/ vlookup */
 #define XFS_ATTRI_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
 
 /*
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 792c01a49749..08cb26d6b37b 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -544,6 +544,7 @@ xfs_attri_validate(
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
 	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
+	case XFS_ATTRI_OP_FLAGS_NVREMOVE:
 		break;
 	default:
 		return false;
@@ -643,6 +644,11 @@ xfs_attri_item_recover(
 		else
 			attr->xattri_dela_state = xfs_attr_init_add_state(args);
 		break;
+	case XFS_ATTRI_OP_FLAGS_NVREMOVE:
+		args->op_flags |= XFS_DA_OP_VLOOKUP;
+		args->value = nv->value.i_addr;
+		args->valuelen = nv->value.i_len;
+		fallthrough;
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
 		if (!xfs_inode_hasattr(args->dp))
 			goto out;
@@ -769,6 +775,7 @@ xlog_recover_attri_commit_pass2(
 	switch (op) {
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
+	case XFS_ATTRI_OP_FLAGS_NVREMOVE:
 		if (item->ri_total != 3 && item->ri_total != 2) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					     attri_formatp, len);


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 05/13] xfs: log VLOOKUP xattr setting operations
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
                       ` (3 preceding siblings ...)
  2023-03-03 17:11     ` [PATCH 04/13] xfs: log VLOOKUP xattr removal operations Darrick J. Wong
@ 2023-03-03 17:11     ` Darrick J. Wong
  2023-03-03 17:11     ` [PATCH 06/13] xfs: refactor extracting attri ops from alfi_op_flags Darrick J. Wong
                       ` (7 subsequent siblings)
  12 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:11 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If high level code wants to do a deferred xattr set operation with the
VLOOKUP flag set, we need to push this through the log.
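
On the recovery side the mapping runs the other way: the logged opcode
tells recovery whether to turn VLOOKUP back on before replaying the
operation.  A minimal sketch with hypothetical demo_* names follows.
The opcode values for REMOVE through NVSET mirror the diffs in this
series; the value of SET is an assumption made for the example.

```c
#include <assert.h>

#define DEMO_DA_OP_VLOOKUP	(1u << 9)

#define DEMO_ATTRI_OP_SET	1	/* assumed value */
#define DEMO_ATTRI_OP_REMOVE	2
#define DEMO_ATTRI_OP_REPLACE	3
#define DEMO_ATTRI_OP_NVREPLACE	4
#define DEMO_ATTRI_OP_NVREMOVE	5
#define DEMO_ATTRI_OP_NVSET	6
#define DEMO_ATTRI_OP_TYPE_MASK	0xFF

/*
 * Given a logged opcode, return the lookup flags that recovery should
 * add before replaying it.  Only the name-and-value set and remove
 * opcodes imply a value lookup in this sketch.
 */
static inline unsigned int demo_recover_lookup_flags(unsigned int op)
{
	assert((op & ~DEMO_ATTRI_OP_TYPE_MASK) == 0);

	switch (op) {
	case DEMO_ATTRI_OP_NVSET:
	case DEMO_ATTRI_OP_NVREMOVE:
		return DEMO_DA_OP_VLOOKUP;
	default:
		return 0;
	}
}
```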

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c       |    6 +++++-
 fs/xfs/libxfs/xfs_log_format.h |    1 +
 fs/xfs/xfs_attr_item.c         |    5 +++++
 3 files changed, 11 insertions(+), 1 deletion(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 6468286d2d71..ba8ad232b306 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -910,9 +910,13 @@ xfs_attr_defer_add(
 	struct xfs_da_args	*args)
 {
 	struct xfs_attr_intent	*new;
+	int			op_flag = XFS_ATTRI_OP_FLAGS_SET;
 	int			error = 0;
 
-	error = xfs_attr_intent_init(args, XFS_ATTRI_OP_FLAGS_SET, &new);
+	if (args->op_flags & XFS_DA_OP_VLOOKUP)
+		op_flag = XFS_ATTRI_OP_FLAGS_NVSET;
+
+	error = xfs_attr_intent_init(args, op_flag, &new);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index a3d95a3d8476..1fe9f7394812 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -960,6 +960,7 @@ struct xfs_icreate_log {
 #define XFS_ATTRI_OP_FLAGS_REPLACE	3	/* Replace the attribute */
 #define XFS_ATTRI_OP_FLAGS_NVREPLACE	4	/* Replace attr name and val */
 #define XFS_ATTRI_OP_FLAGS_NVREMOVE	5	/* Remove attr w/ vlookup */
+#define XFS_ATTRI_OP_FLAGS_NVSET	6	/* Set attr w/ vlookup */
 #define XFS_ATTRI_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
 
 /*
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 08cb26d6b37b..79a459e8d51a 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -545,6 +545,7 @@ xfs_attri_validate(
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
 	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
 	case XFS_ATTRI_OP_FLAGS_NVREMOVE:
+	case XFS_ATTRI_OP_FLAGS_NVSET:
 		break;
 	default:
 		return false;
@@ -633,6 +634,9 @@ xfs_attri_item_recover(
 	ASSERT(xfs_sb_version_haslogxattrs(&mp->m_sb));
 
 	switch (attr->xattri_op_flags) {
+	case XFS_ATTRI_OP_FLAGS_NVSET:
+		args->op_flags |= XFS_DA_OP_VLOOKUP;
+		fallthrough;
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
 	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
@@ -773,6 +777,7 @@ xlog_recover_attri_commit_pass2(
 
 	op = attri_formatp->alfi_op_flags & XFS_ATTRI_OP_FLAGS_TYPE_MASK;
 	switch (op) {
+	case XFS_ATTRI_OP_FLAGS_NVSET:
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
 	case XFS_ATTRI_OP_FLAGS_NVREMOVE:


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 06/13] xfs: refactor extracting attri ops from alfi_op_flags
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
                       ` (4 preceding siblings ...)
  2023-03-03 17:11     ` [PATCH 05/13] xfs: log VLOOKUP xattr setting operations Darrick J. Wong
@ 2023-03-03 17:11     ` Darrick J. Wong
  2023-03-03 17:11     ` [PATCH 07/13] xfs: overlay alfi_nname_len atop alfi_name_len for NVREPLACE Darrick J. Wong
                       ` (6 subsequent siblings)
  12 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:11 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Refactor this very long expression into a helper before we start adding
more of them.
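
For readers skimming the thread, the helper amounts to masking the low
byte out of the logged flags word.  A standalone sketch, with a
hypothetical one-field stand-in for the log format structure:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_ATTRI_OP_FLAGS_TYPE_MASK	0xFF

/* Hypothetical stand-in; the real structure has more fields. */
struct demo_attri_log_format {
	uint32_t	alfi_op_flags;
};

/* Extract the operation type from the logged flags word. */
static inline unsigned int
demo_attr_log_item_op(const struct demo_attri_log_format *attrp)
{
	assert(attrp != NULL);
	return attrp->alfi_op_flags & DEMO_ATTRI_OP_FLAGS_TYPE_MASK;
}
```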

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c |   14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)


diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 79a459e8d51a..6dce2110a871 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -374,6 +374,12 @@ xfs_xattri_finish_update(
 	return error;
 }
 
+static inline unsigned int
+xfs_attr_log_item_op(const struct xfs_attri_log_format *attrp)
+{
+	return attrp->alfi_op_flags & XFS_ATTRI_OP_FLAGS_TYPE_MASK;
+}
+
 /* Log an attr to the intent item. */
 STATIC void
 xfs_attr_log_item(
@@ -525,8 +531,7 @@ xfs_attri_validate(
 	struct xfs_mount		*mp,
 	struct xfs_attri_log_format	*attrp)
 {
-	unsigned int			op = attrp->alfi_op_flags &
-					     XFS_ATTRI_OP_FLAGS_TYPE_MASK;
+	unsigned int			op = xfs_attr_log_item_op(attrp);
 
 	if (attrp->alfi_op_flags != XFS_ATTRI_OP_FLAGS_NVREPLACE &&
 	    attrp->alfi_nname_len != 0)
@@ -608,8 +613,7 @@ xfs_attri_item_recover(
 	args = (struct xfs_da_args *)(attr + 1);
 
 	attr->xattri_da_args = args;
-	attr->xattri_op_flags = attrp->alfi_op_flags &
-						XFS_ATTRI_OP_FLAGS_TYPE_MASK;
+	attr->xattri_op_flags = xfs_attr_log_item_op(attrp);
 
 	/*
 	 * We're reconstructing the deferred work state structure from the
@@ -775,7 +779,7 @@ xlog_recover_attri_commit_pass2(
 		return -EFSCORRUPTED;
 	}
 
-	op = attri_formatp->alfi_op_flags & XFS_ATTRI_OP_FLAGS_TYPE_MASK;
+	op = xfs_attr_log_item_op(attri_formatp);
 	switch (op) {
 	case XFS_ATTRI_OP_FLAGS_NVSET:
 	case XFS_ATTRI_OP_FLAGS_SET:


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 07/13] xfs: overlay alfi_nname_len atop alfi_name_len for NVREPLACE
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
                       ` (5 preceding siblings ...)
  2023-03-03 17:11     ` [PATCH 06/13] xfs: refactor extracting attri ops from alfi_op_flags Darrick J. Wong
@ 2023-03-03 17:11     ` Darrick J. Wong
  2023-03-03 17:12     ` [PATCH 08/13] xfs: rename nname to newname Darrick J. Wong
                       ` (5 subsequent siblings)
  12 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:11 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

In preparation for being able to log the old attr value in an NVREPLACE
operation, encode the old and new name lengths in the alfi_name_len
field.  We haven't shipped a kernel with XFS_ATTRI_OP_FLAGS_NVREPLACE,
so we can still tweak the ondisk log item format.
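
The overlay can be pictured with a standalone copy of the structure.
The sketch below uses a hypothetical demo_ name and relies on C11
anonymous unions; it only demonstrates that the two 16-bit lengths
occupy the same four bytes as the 32-bit length, so the overall size and
the offsets of the other fields stay put.  It deliberately does not read
alfi_name_len after writing the 16-bit fields, since the combined value
depends on host byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace copy of the layout proposed in this patch. */
struct demo_attri_log_format {
	uint16_t	alfi_type;
	uint16_t	alfi_size;
	uint32_t	pad;
	uint64_t	alfi_id;
	uint64_t	alfi_ino;
	uint32_t	alfi_op_flags;
	union {
		uint32_t	alfi_name_len;
		struct {
			uint16_t	alfi_oldname_len;
			uint16_t	alfi_newname_len;
		};
	};
	uint32_t	alfi_value_len;
	uint32_t	alfi_attr_filter;
};

/* Fill in the length field(s) depending on the kind of operation. */
static inline void
demo_set_name_lens(struct demo_attri_log_format *f, bool nvreplace,
		   uint16_t oldlen, uint16_t newlen)
{
	assert(f != NULL);
	if (nvreplace) {
		f->alfi_oldname_len = oldlen;
		f->alfi_newname_len = newlen;
	} else {
		f->alfi_name_len = oldlen;
	}
}
```

Splitting the field this way caps each name length at 65535, which is
comfortably above XATTR_NAME_MAX.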

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_log_format.h |   14 ++++++-
 fs/xfs/xfs_attr_item.c         |   81 ++++++++++++++++++++++++----------------
 2 files changed, 60 insertions(+), 35 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 1fe9f7394812..32035786135b 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -979,11 +979,21 @@ struct xfs_icreate_log {
 struct xfs_attri_log_format {
 	uint16_t	alfi_type;	/* attri log item type */
 	uint16_t	alfi_size;	/* size of this item */
-	uint32_t	alfi_nname_len;	/* attr new name length */
+	uint32_t	__pad;		/* pad to 64 bit aligned */
 	uint64_t	alfi_id;	/* attri identifier */
 	uint64_t	alfi_ino;	/* the inode for this attr operation */
 	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
-	uint32_t	alfi_name_len;	/* attr name length */
+	union {
+		uint32_t	alfi_name_len;	/* attr name length */
+		struct {
+			/*
+			 * For NVREPLACE, these are the lengths of the old and
+			 * new attr name.
+			 */
+			uint16_t	alfi_oldname_len;
+			uint16_t	alfi_newname_len;
+		};
+	};
 	uint32_t	alfi_value_len;	/* attr value length */
 	uint32_t	alfi_attr_filter;/* attr filter flags */
 };
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 6dce2110a871..6042ba34f705 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -402,8 +402,14 @@ xfs_attr_log_item(
 	ASSERT(!(attr->xattri_op_flags & ~XFS_ATTRI_OP_FLAGS_TYPE_MASK));
 	attrp->alfi_op_flags = attr->xattri_op_flags;
 	attrp->alfi_value_len = attr->xattri_nameval->value.i_len;
-	attrp->alfi_name_len = attr->xattri_nameval->name.i_len;
-	attrp->alfi_nname_len = attr->xattri_nameval->nname.i_len;
+
+	if (xfs_attr_log_item_op(attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACE) {
+		attrp->alfi_oldname_len = attr->xattri_nameval->name.i_len;
+		attrp->alfi_newname_len = attr->xattri_nameval->nname.i_len;
+	} else {
+		attrp->alfi_name_len = attr->xattri_nameval->name.i_len;
+	}
+
 	ASSERT(!(attr->xattri_da_args->attr_filter & ~XFS_ATTRI_FILTER_MASK));
 	attrp->alfi_attr_filter = attr->xattri_da_args->attr_filter;
 }
@@ -533,10 +539,6 @@ xfs_attri_validate(
 {
 	unsigned int			op = xfs_attr_log_item_op(attrp);
 
-	if (attrp->alfi_op_flags != XFS_ATTRI_OP_FLAGS_NVREPLACE &&
-	    attrp->alfi_nname_len != 0)
-		return false;
-
 	if (attrp->alfi_op_flags & ~XFS_ATTRI_OP_FLAGS_TYPE_MASK)
 		return false;
 
@@ -545,29 +547,37 @@ xfs_attri_validate(
 
 	/* alfi_op_flags should be either a set or remove */
 	switch (op) {
+	case XFS_ATTRI_OP_FLAGS_REMOVE:
+		if (attrp->alfi_value_len != 0)
+			return false;
+		if (attrp->alfi_name_len == 0 ||
+		    attrp->alfi_name_len > XATTR_NAME_MAX)
+			return false;
+		break;
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
-	case XFS_ATTRI_OP_FLAGS_REMOVE:
-	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
 	case XFS_ATTRI_OP_FLAGS_NVREMOVE:
 	case XFS_ATTRI_OP_FLAGS_NVSET:
+		if (attrp->alfi_name_len == 0 ||
+		    attrp->alfi_name_len > XATTR_NAME_MAX)
+			return false;
+		if (attrp->alfi_value_len > XATTR_SIZE_MAX)
+			return false;
+		break;
+	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
+		if (attrp->alfi_oldname_len == 0 ||
+		    attrp->alfi_oldname_len > XATTR_NAME_MAX)
+			return false;
+		if (attrp->alfi_newname_len == 0 ||
+		    attrp->alfi_newname_len > XATTR_NAME_MAX)
+			return false;
+		if (attrp->alfi_value_len > XATTR_SIZE_MAX)
+			return false;
 		break;
 	default:
 		return false;
 	}
 
-	if (attrp->alfi_value_len > XATTR_SIZE_MAX)
-		return false;
-
-	if ((attrp->alfi_name_len > XATTR_NAME_MAX) ||
-	    (attrp->alfi_nname_len > XATTR_NAME_MAX) ||
-	    (attrp->alfi_name_len == 0))
-		return false;
-
-	if (op == XFS_ATTRI_OP_FLAGS_REMOVE &&
-	    attrp->alfi_value_len != 0)
-		return false;
-
 	return xfs_verify_ino(mp, attrp->alfi_ino);
 }
 
@@ -737,8 +747,12 @@ xfs_attri_item_relog(
 	new_attrp->alfi_ino = old_attrp->alfi_ino;
 	new_attrp->alfi_op_flags = old_attrp->alfi_op_flags;
 	new_attrp->alfi_value_len = old_attrp->alfi_value_len;
-	new_attrp->alfi_name_len = old_attrp->alfi_name_len;
-	new_attrp->alfi_nname_len = old_attrp->alfi_nname_len;
+	if (xfs_attr_log_item_op(old_attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACE) {
+		new_attrp->alfi_newname_len = old_attrp->alfi_newname_len;
+		new_attrp->alfi_oldname_len = old_attrp->alfi_oldname_len;
+	} else {
+		new_attrp->alfi_name_len = old_attrp->alfi_name_len;
+	}
 	new_attrp->alfi_attr_filter = old_attrp->alfi_attr_filter;
 
 	xfs_trans_add_item(tp, &new_attrip->attri_item);
@@ -762,6 +776,7 @@ xlog_recover_attri_commit_pass2(
 	const void			*attr_name;
 	size_t				len;
 	const void			*attr_nname = NULL;
+	unsigned int			name_len = 0, newname_len = 0;
 	int				op, i = 0;
 
 	/* Validate xfs_attri_log_format before the large memory allocation */
@@ -790,6 +805,7 @@ xlog_recover_attri_commit_pass2(
 					     attri_formatp, len);
 			return -EFSCORRUPTED;
 		}
+		name_len = attri_formatp->alfi_name_len;
 		break;
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
 		if (item->ri_total != 2) {
@@ -797,6 +813,7 @@ xlog_recover_attri_commit_pass2(
 					     attri_formatp, len);
 			return -EFSCORRUPTED;
 		}
+		name_len = attri_formatp->alfi_name_len;
 		break;
 	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
 		if (item->ri_total != 3 && item->ri_total != 4) {
@@ -804,6 +821,8 @@ xlog_recover_attri_commit_pass2(
 					     attri_formatp, len);
 			return -EFSCORRUPTED;
 		}
+		name_len = attri_formatp->alfi_oldname_len;
+		newname_len = attri_formatp->alfi_newname_len;
 		break;
 	default:
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
@@ -813,15 +832,14 @@ xlog_recover_attri_commit_pass2(
 
 	i++;
 	/* Validate the attr name */
-	if (item->ri_buf[i].i_len !=
-			xlog_calc_iovec_len(attri_formatp->alfi_name_len)) {
+	if (item->ri_buf[i].i_len != xlog_calc_iovec_len(name_len)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 				attri_formatp, len);
 		return -EFSCORRUPTED;
 	}
 
 	attr_name = item->ri_buf[i].i_addr;
-	if (!xfs_attr_namecheck(mp, attr_name, attri_formatp->alfi_name_len,
+	if (!xfs_attr_namecheck(mp, attr_name, name_len,
 				attri_formatp->alfi_attr_filter)) {
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 				item->ri_buf[i].i_addr, item->ri_buf[i].i_len);
@@ -829,10 +847,9 @@ xlog_recover_attri_commit_pass2(
 	}
 
 	i++;
-	if (attri_formatp->alfi_nname_len) {
+	if (newname_len > 0) {
 		/* Validate the attr nname */
-		if (item->ri_buf[i].i_len !=
-		    xlog_calc_iovec_len(attri_formatp->alfi_nname_len)) {
+		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(newname_len)) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					item->ri_buf[i].i_addr,
 					item->ri_buf[i].i_len);
@@ -840,8 +857,7 @@ xlog_recover_attri_commit_pass2(
 		}
 
 		attr_nname = item->ri_buf[i].i_addr;
-		if (!xfs_attr_namecheck(mp, attr_nname,
-				attri_formatp->alfi_nname_len,
+		if (!xfs_attr_namecheck(mp, attr_nname, newname_len,
 				attri_formatp->alfi_attr_filter)) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					item->ri_buf[i].i_addr,
@@ -868,9 +884,8 @@ xlog_recover_attri_commit_pass2(
 	 * name/value buffer to the recovered incore log item and drop our
 	 * reference.
 	 */
-	nv = xfs_attri_log_nameval_alloc(attr_name,
-			attri_formatp->alfi_name_len, attr_nname,
-			attri_formatp->alfi_nname_len, attr_value,
+	nv = xfs_attri_log_nameval_alloc(attr_name, name_len, attr_nname,
+			newname_len, attr_value,
 			attri_formatp->alfi_value_len);
 
 	attrip = xfs_attri_init(mp, nv);


^ permalink raw reply related	[flat|nested] 227+ messages in thread

* [PATCH 08/13] xfs: rename nname to newname
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
                       ` (6 preceding siblings ...)
  2023-03-03 17:11     ` [PATCH 07/13] xfs: overlay alfi_nname_len atop alfi_name_len for NVREPLACE Darrick J. Wong
@ 2023-03-03 17:12     ` Darrick J. Wong
  2023-03-03 17:12     ` [PATCH 09/13] xfs: log VLOOKUP xattr nvreplace operations Darrick J. Wong
                       ` (4 subsequent siblings)
  12 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:12 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Burn a couple of extra bytes to make it clear what this does.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_log_format.h |    2 +-
 fs/xfs/xfs_attr_item.c         |   48 ++++++++++++++++++++--------------------
 fs/xfs/xfs_attr_item.h         |    2 +-
 3 files changed, 26 insertions(+), 26 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 32035786135b..8b16ae27c2fd 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -117,7 +117,7 @@ struct xfs_unmount_log_format {
 #define XLOG_REG_TYPE_ATTRD_FORMAT	28
 #define XLOG_REG_TYPE_ATTR_NAME	29
 #define XLOG_REG_TYPE_ATTR_VALUE	30
-#define XLOG_REG_TYPE_ATTR_NNAME	31
+#define XLOG_REG_TYPE_ATTR_NEWNAME	31
 #define XLOG_REG_TYPE_MAX		31
 
 
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 6042ba34f705..83e83aa05f94 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -75,8 +75,8 @@ static inline struct xfs_attri_log_nameval *
 xfs_attri_log_nameval_alloc(
 	const void			*name,
 	unsigned int			name_len,
-	const void			*nname,
-	unsigned int			nname_len,
+	const void			*newname,
+	unsigned int			newname_len,
 	const void			*value,
 	unsigned int			value_len)
 {
@@ -87,25 +87,25 @@ xfs_attri_log_nameval_alloc(
 	 * this. But kvmalloc() utterly sucks, so we use our own version.
 	 */
 	nv = xlog_kvmalloc(sizeof(struct xfs_attri_log_nameval) +
-					name_len + nname_len + value_len);
+					name_len + newname_len + value_len);
 
 	nv->name.i_addr = nv + 1;
 	nv->name.i_len = name_len;
 	nv->name.i_type = XLOG_REG_TYPE_ATTR_NAME;
 	memcpy(nv->name.i_addr, name, name_len);
 
-	if (nname_len) {
-		nv->nname.i_addr = nv->name.i_addr + name_len;
-		nv->nname.i_len = nname_len;
-		memcpy(nv->nname.i_addr, nname, nname_len);
+	if (newname_len) {
+		nv->newname.i_addr = nv->name.i_addr + name_len;
+		nv->newname.i_len = newname_len;
+		memcpy(nv->newname.i_addr, newname, newname_len);
 	} else {
-		nv->nname.i_addr = NULL;
-		nv->nname.i_len = 0;
+		nv->newname.i_addr = NULL;
+		nv->newname.i_len = 0;
 	}
-	nv->nname.i_type = XLOG_REG_TYPE_ATTR_NNAME;
+	nv->newname.i_type = XLOG_REG_TYPE_ATTR_NEWNAME;
 
 	if (value_len) {
-		nv->value.i_addr = nv->name.i_addr + nname_len + name_len;
+		nv->value.i_addr = nv->name.i_addr + newname_len + name_len;
 		nv->value.i_len = value_len;
 		memcpy(nv->value.i_addr, value, value_len);
 	} else {
@@ -159,9 +159,9 @@ xfs_attri_item_size(
 	*nbytes += sizeof(struct xfs_attri_log_format) +
 			xlog_calc_iovec_len(nv->name.i_len);
 
-	if (nv->nname.i_len) {
+	if (nv->newname.i_len) {
 		*nvecs += 1;
-		*nbytes += xlog_calc_iovec_len(nv->nname.i_len);
+		*nbytes += xlog_calc_iovec_len(nv->newname.i_len);
 	}
 
 	if (nv->value.i_len) {
@@ -197,7 +197,7 @@ xfs_attri_item_format(
 	ASSERT(nv->name.i_len > 0);
 	attrip->attri_format.alfi_size++;
 
-	if (nv->nname.i_len > 0)
+	if (nv->newname.i_len > 0)
 		attrip->attri_format.alfi_size++;
 
 	if (nv->value.i_len > 0)
@@ -208,8 +208,8 @@ xfs_attri_item_format(
 			sizeof(struct xfs_attri_log_format));
 	xlog_copy_from_iovec(lv, &vecp, &nv->name);
 
-	if (nv->nname.i_len > 0)
-		xlog_copy_from_iovec(lv, &vecp, &nv->nname);
+	if (nv->newname.i_len > 0)
+		xlog_copy_from_iovec(lv, &vecp, &nv->newname);
 
 	if (nv->value.i_len > 0)
 		xlog_copy_from_iovec(lv, &vecp, &nv->value);
@@ -405,7 +405,7 @@ xfs_attr_log_item(
 
 	if (xfs_attr_log_item_op(attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACE) {
 		attrp->alfi_oldname_len = attr->xattri_nameval->name.i_len;
-		attrp->alfi_newname_len = attr->xattri_nameval->nname.i_len;
+		attrp->alfi_newname_len = attr->xattri_nameval->newname.i_len;
 	} else {
 		attrp->alfi_name_len = attr->xattri_nameval->name.i_len;
 	}
@@ -638,8 +638,8 @@ xfs_attri_item_recover(
 	args->whichfork = XFS_ATTR_FORK;
 	args->name = nv->name.i_addr;
 	args->namelen = nv->name.i_len;
-	args->new_name = nv->nname.i_addr;
-	args->new_namelen = nv->nname.i_len;
+	args->new_name = nv->newname.i_addr;
+	args->new_namelen = nv->newname.i_len;
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 	args->attr_filter = attrp->alfi_attr_filter & XFS_ATTRI_FILTER_MASK;
 	args->op_flags = XFS_DA_OP_RECOVERY | XFS_DA_OP_OKNOENT |
@@ -775,7 +775,7 @@ xlog_recover_attri_commit_pass2(
 	const void			*attr_value = NULL;
 	const void			*attr_name;
 	size_t				len;
-	const void			*attr_nname = NULL;
+	const void			*attr_newname = NULL;
 	unsigned int			name_len = 0, newname_len = 0;
 	int				op, i = 0;
 
@@ -848,7 +848,7 @@ xlog_recover_attri_commit_pass2(
 
 	i++;
 	if (newname_len > 0) {
-		/* Validate the attr nname */
+		/* Validate the attr newname */
 		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(newname_len)) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					item->ri_buf[i].i_addr,
@@ -856,8 +856,8 @@ xlog_recover_attri_commit_pass2(
 			return -EFSCORRUPTED;
 		}
 
-		attr_nname = item->ri_buf[i].i_addr;
-		if (!xfs_attr_namecheck(mp, attr_nname, newname_len,
+		attr_newname = item->ri_buf[i].i_addr;
+		if (!xfs_attr_namecheck(mp, attr_newname, newname_len,
 				attri_formatp->alfi_attr_filter)) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					item->ri_buf[i].i_addr,
@@ -884,7 +884,7 @@ xlog_recover_attri_commit_pass2(
 	 * name/value buffer to the recovered incore log item and drop our
 	 * reference.
 	 */
-	nv = xfs_attri_log_nameval_alloc(attr_name, name_len, attr_nname,
+	nv = xfs_attri_log_nameval_alloc(attr_name, name_len, attr_newname,
 			newname_len, attr_value,
 			attri_formatp->alfi_value_len);
 
diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
index 24d4968dd6cc..e374712ba06b 100644
--- a/fs/xfs/xfs_attr_item.h
+++ b/fs/xfs/xfs_attr_item.h
@@ -13,7 +13,7 @@ struct kmem_zone;
 
 struct xfs_attri_log_nameval {
 	struct xfs_log_iovec	name;
-	struct xfs_log_iovec	nname;
+	struct xfs_log_iovec	newname;
 	struct xfs_log_iovec	value;
 	refcount_t		refcount;
 



* [PATCH 09/13] xfs: log VLOOKUP xattr nvreplace operations
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
                       ` (7 preceding siblings ...)
  2023-03-03 17:12     ` [PATCH 08/13] xfs: rename nname to newname Darrick J. Wong
@ 2023-03-03 17:12     ` Darrick J. Wong
  2023-03-03 17:12     ` [PATCH 10/13] xfs: log old xattr values for NVREPLACEXXX operations Darrick J. Wong
                       ` (3 subsequent siblings)
  12 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:12 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

If high-level code wants to perform a deferred xattr nvreplace operation
with the VLOOKUP flag set, we need to push this through the log.  To avoid
breaking the parent pointer code, temporarily create a new NVREPLACEXXX
op flag that maps to the VLOOKUP flag.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c       |   15 +++++++++++----
 fs/xfs/libxfs/xfs_log_format.h |    1 +
 fs/xfs/xfs_attr_item.c         |   13 +++++++++++--
 3 files changed, 23 insertions(+), 6 deletions(-)
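
For readers following along, the op flag selection described above can be
sketched as a standalone program.  This is only an illustration, not the
kernel code: the DEMO_* names are made up, DEMO_DA_OP_VLOOKUP is an
arbitrary bit (the real XFS_DA_OP_VLOOKUP value is not shown in this
patch), and the value 3 for REPLACE is an assumption since the hunk only
shows the values 4 through 7.

```c
#include <assert.h>

/* Op flag values; 4 and 7 mirror the xfs_log_format.h hunk in this patch. */
#define DEMO_OP_REPLACE		3	/* assumed value, not shown in the hunk */
#define DEMO_OP_NVREPLACE	4
#define DEMO_OP_NVREPLACEXXX	7

/* Hypothetical bit; stands in for XFS_DA_OP_VLOOKUP. */
#define DEMO_DA_OP_VLOOKUP	(1u << 0)

/* Cut-down stand-in for the two xfs_da_args fields that matter here. */
struct demo_args {
	unsigned int	op_flags;
	int		new_namelen;
};

/*
 * Pick the log op flag for a deferred replace.  VLOOKUP takes priority
 * over the "is there a new name" test, as in xfs_attr_defer_replace().
 */
static int demo_pick_replace_op(const struct demo_args *args)
{
	int op_flag = DEMO_OP_REPLACE;

	if (args->op_flags & DEMO_DA_OP_VLOOKUP)
		op_flag = DEMO_OP_NVREPLACEXXX;
	else if (args->new_namelen > 0)
		op_flag = DEMO_OP_NVREPLACE;

	return op_flag;
}
```

Note that a VLOOKUP replace is logged as NVREPLACEXXX even when
new_namelen is zero, which is why the VLOOKUP test comes first.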


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index ba8ad232b306..b9178c4efdeb 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -424,7 +424,12 @@ xfs_attr_complete_op(
 	args->op_flags &= ~XFS_DA_OP_REPLACE;
 	if (do_replace) {
 		args->attr_filter &= ~XFS_ATTR_INCOMPLETE;
-		if (args->new_namelen > 0) {
+		if (args->op_flags & XFS_DA_OP_VLOOKUP) {
+			args->name = args->new_name;
+			args->namelen = args->new_namelen;
+			args->hashval = xfs_da_hashname(args->name,
+							args->namelen);
+		} else if (args->new_namelen > 0) {
 			args->name = args->new_name;
 			args->namelen = args->new_namelen;
 			args->hashval = xfs_da_hashname(args->name,
@@ -933,11 +938,13 @@ xfs_attr_defer_replace(
 	struct xfs_da_args	*args)
 {
 	struct xfs_attr_intent	*new;
-	int			op_flag;
+	int			op_flag = XFS_ATTRI_OP_FLAGS_REPLACE;
 	int			error = 0;
 
-	op_flag = args->new_namelen == 0 ? XFS_ATTRI_OP_FLAGS_REPLACE :
-		  XFS_ATTRI_OP_FLAGS_NVREPLACE;
+	if (args->op_flags & XFS_DA_OP_VLOOKUP)
+		op_flag = XFS_ATTRI_OP_FLAGS_NVREPLACEXXX;
+	else if (args->new_namelen > 0)
+		op_flag = XFS_ATTRI_OP_FLAGS_NVREPLACE;
 
 	error = xfs_attr_intent_init(args, op_flag, &new);
 	if (error)
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index 8b16ae27c2fd..a1581dc6f131 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -961,6 +961,7 @@ struct xfs_icreate_log {
 #define XFS_ATTRI_OP_FLAGS_NVREPLACE	4	/* Replace attr name and val */
 #define XFS_ATTRI_OP_FLAGS_NVREMOVE	5	/* Remove attr w/ vlookup */
 #define XFS_ATTRI_OP_FLAGS_NVSET	6	/* Set attr with w/ vlookup */
+#define XFS_ATTRI_OP_FLAGS_NVREPLACEXXX	7	/* Replace attr name and val w/ vlookup */
 #define XFS_ATTRI_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
 
 /*
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 83e83aa05f94..0dd49c5f235a 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -403,7 +403,10 @@ xfs_attr_log_item(
 	attrp->alfi_op_flags = attr->xattri_op_flags;
 	attrp->alfi_value_len = attr->xattri_nameval->value.i_len;
 
-	if (xfs_attr_log_item_op(attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACE) {
+	if (xfs_attr_log_item_op(attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACEXXX) {
+		attrp->alfi_oldname_len = attr->xattri_nameval->name.i_len;
+		attrp->alfi_newname_len = attr->xattri_nameval->newname.i_len;
+	} else if (xfs_attr_log_item_op(attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACE) {
 		attrp->alfi_oldname_len = attr->xattri_nameval->name.i_len;
 		attrp->alfi_newname_len = attr->xattri_nameval->newname.i_len;
 	} else {
@@ -564,6 +567,7 @@ xfs_attri_validate(
 		if (attrp->alfi_value_len > XATTR_SIZE_MAX)
 			return false;
 		break;
+	case XFS_ATTRI_OP_FLAGS_NVREPLACEXXX:
 	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
 		if (attrp->alfi_oldname_len == 0 ||
 		    attrp->alfi_oldname_len > XATTR_NAME_MAX)
@@ -649,6 +653,7 @@ xfs_attri_item_recover(
 
 	switch (attr->xattri_op_flags) {
 	case XFS_ATTRI_OP_FLAGS_NVSET:
+	case XFS_ATTRI_OP_FLAGS_NVREPLACEXXX:
 		args->op_flags |= XFS_DA_OP_VLOOKUP;
 		fallthrough;
 	case XFS_ATTRI_OP_FLAGS_SET:
@@ -747,7 +752,10 @@ xfs_attri_item_relog(
 	new_attrp->alfi_ino = old_attrp->alfi_ino;
 	new_attrp->alfi_op_flags = old_attrp->alfi_op_flags;
 	new_attrp->alfi_value_len = old_attrp->alfi_value_len;
-	if (xfs_attr_log_item_op(old_attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACE) {
+	if (xfs_attr_log_item_op(old_attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACEXXX) {
+		new_attrp->alfi_newname_len = old_attrp->alfi_newname_len;
+		new_attrp->alfi_oldname_len = old_attrp->alfi_oldname_len;
+	} else if (xfs_attr_log_item_op(old_attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACE) {
 		new_attrp->alfi_newname_len = old_attrp->alfi_newname_len;
 		new_attrp->alfi_oldname_len = old_attrp->alfi_oldname_len;
 	} else {
@@ -815,6 +823,7 @@ xlog_recover_attri_commit_pass2(
 		}
 		name_len = attri_formatp->alfi_name_len;
 		break;
+	case XFS_ATTRI_OP_FLAGS_NVREPLACEXXX:
 	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
 		if (item->ri_total != 3 && item->ri_total != 4) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,



* [PATCH 10/13] xfs: log old xattr values for NVREPLACEXXX operations
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
                       ` (8 preceding siblings ...)
  2023-03-03 17:12     ` [PATCH 09/13] xfs: log VLOOKUP xattr nvreplace operations Darrick J. Wong
@ 2023-03-03 17:12     ` Darrick J. Wong
  2023-03-03 17:12     ` [PATCH 11/13] xfs: use VLOOKUP mode to avoid hashing parent pointer names Darrick J. Wong
                       ` (2 subsequent siblings)
  12 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:12 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

For NVREPLACEXXX operations, make it possible to log the old and new
attr values, since this variant does VLOOKUP operations.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c       |    2 +
 fs/xfs/libxfs/xfs_da_btree.h   |    2 +
 fs/xfs/libxfs/xfs_log_format.h |   14 +++++---
 fs/xfs/xfs_attr_item.c         |   74 +++++++++++++++++++++++++++++++++++-----
 fs/xfs/xfs_attr_item.h         |    3 +-
 5 files changed, 80 insertions(+), 15 deletions(-)
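
As a rough userspace illustration of the layout this patch extends: the
name, new name, value, and new value are packed back to back after the
nameval header, and each non-empty buffer becomes one log region after
the format header.  The sketch below uses made-up demo_* names and
malloc() in place of xlog_kvmalloc(); it is not the kernel
implementation, and unlike the kernel it also NULLs the name pointer for
a zero-length name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for struct xfs_log_iovec. */
struct demo_iovec {
	void	*addr;
	size_t	len;
};

/* Simplified stand-in for struct xfs_attri_log_nameval. */
struct demo_nameval {
	struct demo_iovec	name;
	struct demo_iovec	newname;
	struct demo_iovec	value;
	struct demo_iovec	newvalue;
	/* payload bytes follow the end of this struct */
};

/* Copy one buffer at the cursor, or record an empty iovec if len is 0. */
static void demo_pack(struct demo_iovec *iov, char **cursor,
		      const void *src, size_t len)
{
	if (len) {
		iov->addr = *cursor;
		memcpy(iov->addr, src, len);
		*cursor += len;
	} else {
		iov->addr = NULL;
	}
	iov->len = len;
}

/* Allocate header and payload in one chunk; pack the four buffers in order. */
static struct demo_nameval *
demo_nameval_alloc(const void *name, size_t name_len,
		   const void *newname, size_t newname_len,
		   const void *value, size_t value_len,
		   const void *newvalue, size_t newvalue_len)
{
	struct demo_nameval *nv;
	char *cursor;

	nv = malloc(sizeof(*nv) + name_len + newname_len + value_len +
		    newvalue_len);
	if (!nv)
		return NULL;

	cursor = (char *)(nv + 1);
	demo_pack(&nv->name, &cursor, name, name_len);
	demo_pack(&nv->newname, &cursor, newname, newname_len);
	demo_pack(&nv->value, &cursor, value, value_len);
	demo_pack(&nv->newvalue, &cursor, newvalue, newvalue_len);
	return nv;
}

/* Format header + name, plus one region per non-empty optional buffer. */
static unsigned int demo_region_count(const struct demo_nameval *nv)
{
	return 2 + (nv->newname.len > 0) + (nv->value.len > 0) +
		   (nv->newvalue.len > 0);
}
```

With all four buffers present the count is 5, which matches the upper
bound that the recovery code in this patch accepts for ri_total.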


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index b9178c4efdeb..d807692b259c 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -429,6 +429,8 @@ xfs_attr_complete_op(
 			args->namelen = args->new_namelen;
 			args->hashval = xfs_da_hashname(args->name,
 							args->namelen);
+			args->value = args->new_value;
+			args->valuelen = args->new_valuelen;
 		} else if (args->new_namelen > 0) {
 			args->name = args->new_name;
 			args->namelen = args->new_namelen;
diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index 0ef32f629e1b..cbea5233159c 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -60,7 +60,9 @@ typedef struct xfs_da_args {
 	int		new_namelen;	/* new attr name len */
 	uint8_t		filetype;	/* filetype of inode for directories */
 	void		*value;		/* set of bytes (maybe contain NULLs) */
+	void		*new_value;	/* new xattr value (may contain NULLs) */
 	int		valuelen;	/* length of value */
+	int		new_valuelen;	/* length of new value */
 	unsigned int	attr_filter;	/* XFS_ATTR_{ROOT,SECURE,INCOMPLETE} */
 	unsigned int	attr_flags;	/* XATTR_{CREATE,REPLACE} */
 	xfs_dahash_t	hashval;	/* hash value of name */
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index a1581dc6f131..ed406738847d 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -115,11 +115,11 @@ struct xfs_unmount_log_format {
 #define XLOG_REG_TYPE_BUD_FORMAT	26
 #define XLOG_REG_TYPE_ATTRI_FORMAT	27
 #define XLOG_REG_TYPE_ATTRD_FORMAT	28
-#define XLOG_REG_TYPE_ATTR_NAME	29
+#define XLOG_REG_TYPE_ATTR_NAME		29
 #define XLOG_REG_TYPE_ATTR_VALUE	30
 #define XLOG_REG_TYPE_ATTR_NEWNAME	31
-#define XLOG_REG_TYPE_MAX		31
-
+#define XLOG_REG_TYPE_ATTR_NEWVALUE	32
+#define XLOG_REG_TYPE_MAX		32
 
 /*
  * Flags to log operation header
@@ -980,7 +980,13 @@ struct xfs_icreate_log {
 struct xfs_attri_log_format {
 	uint16_t	alfi_type;	/* attri log item type */
 	uint16_t	alfi_size;	/* size of this item */
-	uint32_t	__pad;		/* pad to 64 bit aligned */
+
+	/*
+	 * For NVREPLACE, this is the length of the new xattr value.
+	 * alfi_value_len contains the length of the old xattr value.
+	 */
+	uint32_t	alfi_newvalue_len;
+
 	uint64_t	alfi_id;	/* attri identifier */
 	uint64_t	alfi_ino;	/* the inode for this attr operation */
 	uint32_t	alfi_op_flags;	/* marks the op as a set or remove */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 0dd49c5f235a..57cc426b1e22 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -78,7 +78,9 @@ xfs_attri_log_nameval_alloc(
 	const void			*newname,
 	unsigned int			newname_len,
 	const void			*value,
-	unsigned int			value_len)
+	unsigned int			value_len,
+	const void			*newvalue,
+	unsigned int			newvalue_len)
 {
 	struct xfs_attri_log_nameval	*nv;
 
@@ -87,7 +89,8 @@ xfs_attri_log_nameval_alloc(
 	 * this. But kvmalloc() utterly sucks, so we use our own version.
 	 */
 	nv = xlog_kvmalloc(sizeof(struct xfs_attri_log_nameval) +
-					name_len + newname_len + value_len);
+					name_len + newname_len + value_len +
+					newvalue_len);
 
 	nv->name.i_addr = nv + 1;
 	nv->name.i_len = name_len;
@@ -114,6 +117,17 @@ xfs_attri_log_nameval_alloc(
 	}
 	nv->value.i_type = XLOG_REG_TYPE_ATTR_VALUE;
 
+	if (newvalue_len) {
+		nv->newvalue.i_addr = nv->name.i_addr + newname_len +
+							name_len + value_len;
+		nv->newvalue.i_len = newvalue_len;
+		memcpy(nv->newvalue.i_addr, newvalue, newvalue_len);
+	} else {
+		nv->newvalue.i_addr = NULL;
+		nv->newvalue.i_len = 0;
+	}
+	nv->newvalue.i_type = XLOG_REG_TYPE_ATTR_NEWVALUE;
+
 	refcount_set(&nv->refcount, 1);
 	return nv;
 }
@@ -168,6 +182,11 @@ xfs_attri_item_size(
 		*nvecs += 1;
 		*nbytes += xlog_calc_iovec_len(nv->value.i_len);
 	}
+
+	if (nv->newvalue.i_len) {
+		*nvecs += 1;
+		*nbytes += xlog_calc_iovec_len(nv->newvalue.i_len);
+	}
 }
 
 /*
@@ -203,6 +222,9 @@ xfs_attri_item_format(
 	if (nv->value.i_len > 0)
 		attrip->attri_format.alfi_size++;
 
+	if (nv->newvalue.i_len > 0)
+		attrip->attri_format.alfi_size++;
+
 	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
 			&attrip->attri_format,
 			sizeof(struct xfs_attri_log_format));
@@ -213,6 +235,9 @@ xfs_attri_item_format(
 
 	if (nv->value.i_len > 0)
 		xlog_copy_from_iovec(lv, &vecp, &nv->value);
+
+	if (nv->newvalue.i_len > 0)
+		xlog_copy_from_iovec(lv, &vecp, &nv->newvalue);
 }
 
 /*
@@ -406,6 +431,7 @@ xfs_attr_log_item(
 	if (xfs_attr_log_item_op(attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACEXXX) {
 		attrp->alfi_oldname_len = attr->xattri_nameval->name.i_len;
 		attrp->alfi_newname_len = attr->xattri_nameval->newname.i_len;
+		attrp->alfi_newvalue_len = attr->xattri_nameval->newvalue.i_len;
 	} else if (xfs_attr_log_item_op(attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACE) {
 		attrp->alfi_oldname_len = attr->xattri_nameval->name.i_len;
 		attrp->alfi_newname_len = attr->xattri_nameval->newname.i_len;
@@ -455,7 +481,8 @@ xfs_attr_create_intent(
 		 */
 		attr->xattri_nameval = xfs_attri_log_nameval_alloc(args->name,
 				args->namelen, args->new_name,
-				args->new_namelen, args->value, args->valuelen);
+				args->new_namelen, args->value, args->valuelen,
+				args->new_value, args->new_valuelen);
 	}
 
 	attrip = xfs_attri_init(mp, attr->xattri_nameval);
@@ -556,6 +583,8 @@ xfs_attri_validate(
 		if (attrp->alfi_name_len == 0 ||
 		    attrp->alfi_name_len > XATTR_NAME_MAX)
 			return false;
+		if (attrp->alfi_newvalue_len != 0)
+			return false;
 		break;
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
@@ -566,6 +595,8 @@ xfs_attri_validate(
 			return false;
 		if (attrp->alfi_value_len > XATTR_SIZE_MAX)
 			return false;
+		if (attrp->alfi_newvalue_len != 0)
+			return false;
 		break;
 	case XFS_ATTRI_OP_FLAGS_NVREPLACEXXX:
 	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
@@ -577,6 +608,8 @@ xfs_attri_validate(
 			return false;
 		if (attrp->alfi_value_len > XATTR_SIZE_MAX)
 			return false;
+		if (attrp->alfi_newvalue_len > XATTR_SIZE_MAX)
+			return false;
 		break;
 	default:
 		return false;
@@ -652,8 +685,11 @@ xfs_attri_item_recover(
 	ASSERT(xfs_sb_version_haslogxattrs(&mp->m_sb));
 
 	switch (attr->xattri_op_flags) {
-	case XFS_ATTRI_OP_FLAGS_NVSET:
 	case XFS_ATTRI_OP_FLAGS_NVREPLACEXXX:
+		args->new_value = nv->newvalue.i_addr;
+		args->new_valuelen = nv->newvalue.i_len;
+		fallthrough;
+	case XFS_ATTRI_OP_FLAGS_NVSET:
 		args->op_flags |= XFS_DA_OP_VLOOKUP;
 		fallthrough;
 	case XFS_ATTRI_OP_FLAGS_SET:
@@ -755,6 +791,7 @@ xfs_attri_item_relog(
 	if (xfs_attr_log_item_op(old_attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACEXXX) {
 		new_attrp->alfi_newname_len = old_attrp->alfi_newname_len;
 		new_attrp->alfi_oldname_len = old_attrp->alfi_oldname_len;
+		new_attrp->alfi_newvalue_len = old_attrp->alfi_newvalue_len;
 	} else if (xfs_attr_log_item_op(old_attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACE) {
 		new_attrp->alfi_newname_len = old_attrp->alfi_newname_len;
 		new_attrp->alfi_oldname_len = old_attrp->alfi_oldname_len;
@@ -781,10 +818,12 @@ xlog_recover_attri_commit_pass2(
 	struct xfs_attri_log_format     *attri_formatp;
 	struct xfs_attri_log_nameval	*nv;
 	const void			*attr_value = NULL;
+	const void			*attr_newvalue = NULL;
 	const void			*attr_name;
 	size_t				len;
 	const void			*attr_newname = NULL;
 	unsigned int			name_len = 0, newname_len = 0;
+	unsigned int			value_len = 0, newvalue_len = 0;
 	int				op, i = 0;
 
 	/* Validate xfs_attri_log_format before the large memory allocation */
@@ -814,6 +853,7 @@ xlog_recover_attri_commit_pass2(
 			return -EFSCORRUPTED;
 		}
 		name_len = attri_formatp->alfi_name_len;
+		value_len = attri_formatp->alfi_value_len;
 		break;
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
 		if (item->ri_total != 2) {
@@ -822,16 +862,19 @@ xlog_recover_attri_commit_pass2(
 			return -EFSCORRUPTED;
 		}
 		name_len = attri_formatp->alfi_name_len;
+		value_len = attri_formatp->alfi_value_len;
 		break;
 	case XFS_ATTRI_OP_FLAGS_NVREPLACEXXX:
 	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
-		if (item->ri_total != 3 && item->ri_total != 4) {
+		if (item->ri_total < 3 || item->ri_total > 5) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					     attri_formatp, len);
 			return -EFSCORRUPTED;
 		}
 		name_len = attri_formatp->alfi_oldname_len;
 		newname_len = attri_formatp->alfi_newname_len;
+		value_len = attri_formatp->alfi_value_len;
+		newvalue_len = attri_formatp->alfi_newvalue_len;
 		break;
 	default:
 		XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
@@ -876,16 +919,27 @@ xlog_recover_attri_commit_pass2(
 		i++;
 	}
 
-
 	/* Validate the attr value, if present */
-	if (attri_formatp->alfi_value_len != 0) {
-		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(attri_formatp->alfi_value_len)) {
+	if (value_len > 0) {
+		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(value_len)) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
 					attri_formatp, len);
 			return -EFSCORRUPTED;
 		}
 
 		attr_value = item->ri_buf[i].i_addr;
+		i++;
+	}
+
+	/* Validate the new attr value, if present */
+	if (newvalue_len > 0) {
+		if (item->ri_buf[i].i_len != xlog_calc_iovec_len(newvalue_len)) {
+			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
+					attri_formatp, len);
+			return -EFSCORRUPTED;
+		}
+
+		attr_newvalue = item->ri_buf[i].i_addr;
 	}
 
 	/*
@@ -894,8 +948,8 @@ xlog_recover_attri_commit_pass2(
 	 * reference.
 	 */
 	nv = xfs_attri_log_nameval_alloc(attr_name, name_len, attr_newname,
-			newname_len, attr_value,
-			attri_formatp->alfi_value_len);
+			newname_len, attr_value, attri_formatp->alfi_value_len,
+			attr_newvalue, newvalue_len);
 
 	attrip = xfs_attri_init(mp, nv);
 	memcpy(&attrip->attri_format, attri_formatp, len);
diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
index e374712ba06b..d15fe4b1ce28 100644
--- a/fs/xfs/xfs_attr_item.h
+++ b/fs/xfs/xfs_attr_item.h
@@ -13,8 +13,9 @@ struct kmem_zone;
 
 struct xfs_attri_log_nameval {
 	struct xfs_log_iovec	name;
-	struct xfs_log_iovec	newname;
+	struct xfs_log_iovec	newname;	/* NVREPLACE only */
 	struct xfs_log_iovec	value;
+	struct xfs_log_iovec	newvalue;	/* NVREPLACE only */
 	refcount_t		refcount;
 
 	/* name and value follow the end of this struct */



* [PATCH 11/13] xfs: use VLOOKUP mode to avoid hashing parent pointer names
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
                       ` (9 preceding siblings ...)
  2023-03-03 17:12     ` [PATCH 10/13] xfs: log old xattr values for NVREPLACEXXX operations Darrick J. Wong
@ 2023-03-03 17:12     ` Darrick J. Wong
  2023-03-03 17:12     ` [PATCH 12/13] xfs: turn NVREPLACEXXX into NVREPLACE Darrick J. Wong
  2023-03-03 17:12     ` [PATCH 13/13] xfs: revert "load secure hash algorithm for parent pointers" Darrick J. Wong
  12 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:12 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Hashing the parent pointer name is fugly because no hashing function can
be collision proof.  Since we store as much of the dirent name as we can
in the xattr name and spill the rest to the xattr value, use VLOOKUP
mode so that we can match on name and value.  Then we can get rid of the
hashing stuff.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_format.h  |   34 ++--
 fs/xfs/libxfs/xfs_parent.c     |  303 +++++++++++++---------------------------
 fs/xfs/libxfs/xfs_parent.h     |   16 --
 fs/xfs/libxfs/xfs_trans_resv.c |    1 
 fs/xfs/scrub/dir.c             |   38 +----
 fs/xfs/scrub/parent.c          |   61 +-------
 fs/xfs/scrub/parent_repair.c   |   34 +---
 fs/xfs/xfs_ondisk.h            |    3 
 8 files changed, 136 insertions(+), 354 deletions(-)
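
To make the name/value split concrete, here is a small userspace sketch.
It assumes MAXNAMELEN is 256 and that the packed record header (be64
inode number plus be32 generation) is 12 bytes, which gives the 243-byte
figure quoted in the xfs_da_format.h comment below.  All demo_* names
are invented for illustration, and the comparison helper only mimics the
idea of matching on both name and value; it is not the VLOOKUP code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAXNAMELEN		256u	/* assumed MAXNAMELEN */
#define DEMO_PPTR_HDR_SIZE	12u	/* packed be64 ino + be32 gen */
#define DEMO_PPTR_NAME_MAX	(DEMO_MAXNAMELEN - 1)			/* 255 */
#define DEMO_PPTR_MAX_DNAME	(DEMO_PPTR_NAME_MAX - DEMO_PPTR_HDR_SIZE) /* 243 */

/* A dirent name split into the xattr-name part and the xattr-value part. */
struct demo_split {
	const unsigned char	*dname;
	size_t			dnamelen;
	const unsigned char	*value;
	size_t			valuelen;
};

/* The first 243 bytes go in the xattr name; any overflow goes in the value. */
static struct demo_split
demo_split_dirent(const unsigned char *name, size_t len)
{
	struct demo_split s;

	s.dname = name;
	s.dnamelen = len < DEMO_PPTR_MAX_DNAME ? len : DEMO_PPTR_MAX_DNAME;
	s.valuelen = len - s.dnamelen;
	s.value = s.valuelen ? name + s.dnamelen : NULL;
	return s;
}

/* Reassemble the dirent name; returns its length. */
static size_t
demo_join_dirent(unsigned char *out, const struct demo_split *s)
{
	memcpy(out, s->dname, s->dnamelen);
	if (s->valuelen)
		memcpy(out + s->dnamelen, s->value, s->valuelen);
	return s->dnamelen + s->valuelen;
}

/* Match on both halves, the way a name-and-value lookup must. */
static int
demo_split_equal(const struct demo_split *a, const struct demo_split *b)
{
	if (a->dnamelen != b->dnamelen || a->valuelen != b->valuelen)
		return 0;
	if (memcmp(a->dname, b->dname, a->dnamelen) != 0)
		return 0;
	return a->valuelen == 0 ||
	       memcmp(a->value, b->value, a->valuelen) == 0;
}
```

Two 255-byte dirent names in the same directory that differ only in the
last 12 bytes produce the same xattr name, so only the value tells them
apart.  That is the case the name-and-value match is there to handle.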


diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 55f510f82e8d..dd569286b3be 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -824,22 +824,12 @@ static inline unsigned int xfs_dir2_dirblock_bytes(struct xfs_sb *sbp)
 xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
 				      struct xfs_da3_blkinfo *hdr3);
 
-/* We use sha512 for the parent pointer name hash. */
-#define XFS_PARENT_NAME_SHA512_SIZE	(64)
-
 /*
  * Parent pointer attribute format definition
  *
- * The EA name encodes the parent inode number, generation and a collision
- * resistant hash computed from the dirent name.  The hash is defined to be
- * one of the following:
- *
- * - The dirent name, as long as it does not use the last possible byte of the
- *   EA name space.
- *
- * - The truncated dirent name, with the sha512 hash of the child inode
- *   generation number and dirent name.  The hash is written at the end of the
- *   EA name.
+ * The EA name encodes the parent inode number, generation and as much of the
+ * dirent name as fits.  In other words, it contains up to 243 bytes of the
+ * dirent name.
  *
  * The EA value contains however much of the dirent name that does not fit in
  * the EA name.
@@ -847,30 +837,30 @@ xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
 struct xfs_parent_name_rec {
 	__be64  p_ino;
 	__be32  p_gen;
-	__u8	p_namehash[];
+	__u8	p_dname[];
 } __attribute__((packed));
 
 /* Maximum size of a parent pointer EA name. */
 #define XFS_PARENT_NAME_MAX_SIZE \
 	(MAXNAMELEN - 1)
 
-/* Maximum size of a parent pointer name hash. */
-#define XFS_PARENT_NAME_MAX_HASH_SIZE \
+/* Maximum number of dirent name bytes stored in p_dname. */
+#define XFS_PARENT_MAX_DNAME_SIZE \
 	(XFS_PARENT_NAME_MAX_SIZE - sizeof(struct xfs_parent_name_rec))
 
-/* Offset of the sha512 hash, if used. */
-#define XFS_PARENT_NAME_SHA512_OFFSET \
-	(XFS_PARENT_NAME_MAX_HASH_SIZE - XFS_PARENT_NAME_SHA512_SIZE)
+/* Maximum number of dirent name bytes stored in the xattr value. */
+#define XFS_PARENT_MAX_DNAME_VALUELEN \
+	sizeof(struct xfs_parent_name_rec)
 
 static inline unsigned int
 xfs_parent_name_rec_sizeof(
-	unsigned int		hashlen)
+	unsigned int		dnamelen)
 {
-	return sizeof(struct xfs_parent_name_rec) + hashlen;
+	return sizeof(struct xfs_parent_name_rec) + dnamelen;
 }
 
 static inline unsigned int
-xfs_parent_name_hashlen(
+xfs_parent_name_dnamelen(
 	unsigned int		rec_sizeof)
 {
 	return rec_sizeof - sizeof(struct xfs_parent_name_rec);
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 387f3c65287f..af412ebe65a4 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -26,7 +26,6 @@
 #include "xfs_xattr.h"
 #include "xfs_parent.h"
 #include "xfs_trans_space.h"
-#include "xfs_sha512.h"
 
 struct kmem_cache		*xfs_parent_intent_cache;
 
@@ -56,8 +55,11 @@ xfs_parent_namecheck(
 {
 	xfs_ino_t				p_ino;
 
-	if (reclen <= xfs_parent_name_rec_sizeof(0) ||
-	    reclen > xfs_parent_name_rec_sizeof(XFS_PARENT_NAME_MAX_HASH_SIZE))
+	if (!(attr_flags & XFS_ATTR_PARENT))
+		return false;
+
+	if (reclen <= sizeof(struct xfs_parent_name_rec) ||
+	    reclen > XFS_PARENT_NAME_MAX_SIZE)
 		return false;
 
 	/* Only one namespace bit allowed. */
@@ -86,7 +88,7 @@ xfs_parent_valuecheck(
 		return false;
 
 	if (namelen == XFS_PARENT_NAME_MAX_SIZE &&
-	    valuelen >= MAXNAMELEN - XFS_PARENT_NAME_SHA512_OFFSET)
+	    valuelen > XFS_PARENT_MAX_DNAME_VALUELEN)
 		return false;
 
 	if (value == NULL)
@@ -95,7 +97,10 @@ xfs_parent_valuecheck(
 	return true;
 }
 
-/* Initializes a xfs_parent_name_rec to be stored as an attribute name */
+/*
+ * Initializes a xfs_parent_name_rec to be stored as an attribute name.
+ * Returns the number of name bytes stored in p_dname.
+ */
 static inline int
 xfs_init_parent_name_rec(
 	struct xfs_parent_name_rec	*rec,
@@ -103,23 +108,14 @@ xfs_init_parent_name_rec(
 	const struct xfs_name		*name,
 	struct xfs_inode		*ip)
 {
+	int				dnamelen;
+
 	rec->p_ino = cpu_to_be64(dp->i_ino);
 	rec->p_gen = cpu_to_be32(VFS_IC(dp)->i_generation);
-	return xfs_parent_namehash(ip, name, rec->p_namehash,
-			XFS_PARENT_NAME_MAX_HASH_SIZE);
-}
 
-/* Compute the number of name bytes that can be encoded in the namehash. */
-static inline unsigned int
-xfs_parent_valuelen_adj(
-	int			hashlen)
-{
-	ASSERT(hashlen > 0);
-
-	if (hashlen == XFS_PARENT_NAME_MAX_HASH_SIZE)
-		return XFS_PARENT_NAME_SHA512_OFFSET;
-
-	return hashlen;
+	dnamelen = min_t(int, name->len, XFS_PARENT_MAX_DNAME_SIZE);
+	memcpy(rec->p_dname, name->name, dnamelen);
+	return dnamelen;
 }
 
 /*
@@ -134,48 +130,30 @@ xfs_parent_irec_from_disk(
 	const void			*value,
 	int				valuelen)
 {
+	int				dnamelen;
+
 	irec->p_ino = be64_to_cpu(rec->p_ino);
 	irec->p_gen = be32_to_cpu(rec->p_gen);
-	irec->hashlen = xfs_parent_name_hashlen(reclen);
-	memcpy(irec->p_namehash, rec->p_namehash, irec->hashlen);
-	memset(irec->p_namehash + irec->hashlen, 0,
-			sizeof(irec->p_namehash) - irec->hashlen);
 
 	if (!value) {
 		irec->p_namelen = 0;
 		return;
 	}
 
-	ASSERT(valuelen < MAXNAMELEN);
+	ASSERT(valuelen <= XFS_PARENT_MAX_DNAME_VALUELEN);
 
-	if (irec->hashlen == XFS_PARENT_NAME_MAX_HASH_SIZE) {
-		ASSERT(valuelen > 0);
-		ASSERT(valuelen <= MAXNAMELEN - XFS_PARENT_NAME_SHA512_OFFSET);
-
-		valuelen = min_t(int, valuelen,
-				MAXNAMELEN - XFS_PARENT_NAME_SHA512_OFFSET);
-
-		memcpy(irec->p_name, irec->p_namehash,
-				XFS_PARENT_NAME_SHA512_OFFSET);
-		memcpy(&irec->p_name[XFS_PARENT_NAME_SHA512_OFFSET],
-				value, valuelen);
-		irec->p_namelen = XFS_PARENT_NAME_SHA512_OFFSET + valuelen;
-	} else {
-		ASSERT(valuelen == 0);
-
-		memcpy(irec->p_name, irec->p_namehash, irec->hashlen);
-		irec->p_namelen = irec->hashlen;
-	}
-
-	memset(&irec->p_name[irec->p_namelen], 0,
-			sizeof(irec->p_name) - irec->p_namelen);
+	dnamelen = xfs_parent_name_dnamelen(reclen);
+	irec->p_namelen = dnamelen + valuelen;
+	memcpy(irec->p_name, rec->p_dname, dnamelen);
+	if (valuelen > 0)
+		memcpy(irec->p_name + dnamelen, value, valuelen);
 }
 
 /*
- * Convert an incore parent_name record to its ondisk format.  If @value or
- * @valuelen are NULL, they will not be written to.
+ * Convert an incore parent_name record to its ondisk format.  If @valuelen is
+ * NULL, neither it nor @value will be written to.
  */
-void
+int
 xfs_parent_irec_to_disk(
 	struct xfs_parent_name_rec	*rec,
 	int				*reclen,
@@ -183,25 +161,23 @@ xfs_parent_irec_to_disk(
 	int				*valuelen,
 	const struct xfs_parent_name_irec *irec)
 {
+	int				dnamelen;
+
 	rec->p_ino = cpu_to_be64(irec->p_ino);
 	rec->p_gen = cpu_to_be32(irec->p_gen);
-	*reclen = xfs_parent_name_rec_sizeof(irec->hashlen);
-	memcpy(rec->p_namehash, irec->p_namehash, irec->hashlen);
+	dnamelen = min_t(int, irec->p_namelen, XFS_PARENT_MAX_DNAME_SIZE);
+	*reclen = xfs_parent_name_rec_sizeof(dnamelen);
+	memcpy(rec->p_dname, irec->p_name, dnamelen);
 
-	if (valuelen) {
-		ASSERT(*valuelen > 0);
-		ASSERT(*valuelen >= irec->p_namelen);
-		ASSERT(*valuelen < MAXNAMELEN);
+	if (!valuelen)
+		return dnamelen;
 
-		if (irec->hashlen == XFS_PARENT_NAME_MAX_HASH_SIZE)
-			*valuelen = irec->p_namelen - XFS_PARENT_NAME_SHA512_OFFSET;
-		else
-			*valuelen = 0;
-	}
+	*valuelen = irec->p_namelen - dnamelen;
+	if (*valuelen)
+		memcpy(value, irec->p_name + dnamelen,
+				*valuelen);
 
-	if (value && irec->hashlen == XFS_PARENT_NAME_MAX_HASH_SIZE)
-		memcpy(value, irec->p_name + XFS_PARENT_NAME_SHA512_OFFSET,
-			      irec->p_namelen - XFS_PARENT_NAME_SHA512_OFFSET);
+	return dnamelen;
 }
 
 /*
@@ -235,7 +211,8 @@ __xfs_parent_init(
 	parent->args.geo = mp->m_attr_geo;
 	parent->args.whichfork = XFS_ATTR_FORK;
 	parent->args.attr_filter = XFS_ATTR_PARENT;
-	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED;
+	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED |
+				XFS_DA_OP_VLOOKUP;
 	parent->args.name = (const uint8_t *)&parent->rec;
 	parent->args.namelen = 0;
 
@@ -253,25 +230,22 @@ xfs_parent_add(
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &parent->args;
-	int			hashlen;
-	unsigned int		name_adj;
+	int			dnamelen;
 
-	hashlen = xfs_init_parent_name_rec(&parent->rec, dp, parent_name,
+	dnamelen = xfs_init_parent_name_rec(&parent->rec, dp, parent_name,
 			child);
-	if (hashlen < 0)
-		return hashlen;
 
-	args->namelen = xfs_parent_name_rec_sizeof(hashlen);
+	args->namelen = xfs_parent_name_rec_sizeof(dnamelen);
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 
-	name_adj = xfs_parent_valuelen_adj(hashlen);
-
 	args->trans = tp;
 	args->dp = child;
-	if (parent_name) {
-		parent->args.value = (void *)parent_name->name + name_adj;
-		parent->args.valuelen = parent_name->len - name_adj;
-	}
+
+	parent->args.valuelen = parent_name->len - dnamelen;
+	if (parent->args.valuelen > 0)
+		parent->args.value = (void *)parent_name->name + dnamelen;
+	else
+		parent->args.value = NULL;
 
 	return xfs_attr_defer_add(args);
 }
@@ -286,16 +260,21 @@ xfs_parent_remove(
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &parent->args;
-	int			hashlen;
+	int			dnamelen;
 
-	hashlen = xfs_init_parent_name_rec(&parent->rec, dp, name, child);
-	if (hashlen < 0)
-		return hashlen;
+	dnamelen = xfs_init_parent_name_rec(&parent->rec, dp, name, child);
 
-	args->namelen = xfs_parent_name_rec_sizeof(hashlen);
+	args->namelen = xfs_parent_name_rec_sizeof(dnamelen);
 	args->trans = tp;
 	args->dp = child;
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
+
+	parent->args.valuelen = name->len - dnamelen;
+	if (parent->args.valuelen > 0)
+		parent->args.value = (void *)name->name + dnamelen;
+	else
+		parent->args.value = NULL;
+
 	return xfs_attr_defer_remove(args);
 }
 
@@ -311,29 +290,31 @@ xfs_parent_replace(
 	struct xfs_inode	*child)
 {
 	struct xfs_da_args	*args = &new_parent->args;
-	int			old_hashlen, new_hashlen;
-	int			new_name_adj;
+	int			old_dnamelen, new_dnamelen;
 
-	old_hashlen = xfs_init_parent_name_rec(&new_parent->old_rec, old_dp,
+	old_dnamelen = xfs_init_parent_name_rec(&new_parent->old_rec, old_dp,
 			old_name, child);
-	if (old_hashlen < 0)
-		return old_hashlen;
-	new_hashlen = xfs_init_parent_name_rec(&new_parent->rec, new_dp,
+	new_dnamelen = xfs_init_parent_name_rec(&new_parent->rec, new_dp,
 			new_name, child);
-	if (new_hashlen < 0)
-		return new_hashlen;
-
-	new_name_adj = xfs_parent_valuelen_adj(new_hashlen);
 
 	new_parent->args.name = (const uint8_t *)&new_parent->old_rec;
-	new_parent->args.namelen = xfs_parent_name_rec_sizeof(old_hashlen);
+	new_parent->args.namelen = xfs_parent_name_rec_sizeof(old_dnamelen);
 	new_parent->args.new_name = (const uint8_t *)&new_parent->rec;
-	new_parent->args.new_namelen = xfs_parent_name_rec_sizeof(new_hashlen);
+	new_parent->args.new_namelen = xfs_parent_name_rec_sizeof(new_dnamelen);
 	args->trans = tp;
 	args->dp = child;
 
-	new_parent->args.value = (void *)new_name->name + new_name_adj;
-	new_parent->args.valuelen = new_name->len - new_name_adj;
+	new_parent->args.new_valuelen = new_name->len - new_dnamelen;
+	if (new_parent->args.new_valuelen > 0)
+		new_parent->args.new_value = (void *)new_name->name + new_dnamelen;
+	else
+		new_parent->args.new_value = NULL;
+
+	new_parent->args.valuelen = old_name->len - old_dnamelen;
+	if (new_parent->args.valuelen > 0)
+		new_parent->args.value = (void *)old_name->name + old_dnamelen;
+	else
+		new_parent->args.value = NULL;
 
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 	return xfs_attr_defer_replace(args);
@@ -363,26 +344,22 @@ xfs_pptr_calc_space_res(
 }
 
 /*
- * Look up the @name associated with the parent pointer (@pptr) of @ip.  Caller
- * must hold at least ILOCK_SHARED.  Returns the length of the dirent name, or
- * a negative errno.  The scratchpad need not be initialized.
+ * Look up the @name associated with the parent pointer (@pptr) of @ip.
+ * Caller must hold at least ILOCK_SHARED.  Returns 0 if the pointer is found,
+ * -ENOATTR if there is no match, or a negative errno.  The scratchpad need not
+ *  be initialized.
  */
 int
 xfs_parent_lookup(
 	struct xfs_trans		*tp,
 	struct xfs_inode		*ip,
 	const struct xfs_parent_name_irec *pptr,
-	unsigned char			*name,
-	unsigned int			namelen,
 	struct xfs_parent_scratch	*scr)
 {
+	int				dnamelen;
 	int				reclen;
-	int				name_adj;
-	int				error;
 
-	xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
-
-	name_adj = xfs_parent_valuelen_adj(pptr->hashlen);
+	dnamelen = xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
 
 	memset(&scr->args, 0, sizeof(struct xfs_da_args));
 	scr->args.attr_filter	= XFS_ATTR_PARENT;
@@ -390,20 +367,17 @@ xfs_parent_lookup(
 	scr->args.geo		= ip->i_mount->m_attr_geo;
 	scr->args.name		= (const unsigned char *)&scr->rec;
 	scr->args.namelen	= reclen;
-	scr->args.op_flags	= XFS_DA_OP_OKNOENT;
+	scr->args.op_flags	= XFS_DA_OP_OKNOENT | XFS_DA_OP_VLOOKUP;
 	scr->args.trans		= tp;
-	scr->args.valuelen	= namelen - name_adj;
-	scr->args.value		= name + name_adj;
+	scr->args.valuelen	= pptr->p_namelen - dnamelen;
 	scr->args.whichfork	= XFS_ATTR_FORK;
 
+	if (scr->args.valuelen)
+		scr->args.value	= (void *)pptr->p_name + dnamelen;
+
 	scr->args.hashval = xfs_da_hashname(scr->args.name, scr->args.namelen);
 
-	error = xfs_attr_get_ilocked(&scr->args);
-	if (error)
-		return error;
-
-	memcpy(name, pptr->p_namehash, name_adj);
-	return scr->args.valuelen + name_adj;
+	return xfs_attr_get_ilocked(&scr->args);
 }
 
 /*
@@ -418,12 +392,10 @@ xfs_parent_set(
 	const struct xfs_parent_name_irec *pptr,
 	struct xfs_parent_scratch	*scr)
 {
+	int				dnamelen;
 	int				reclen;
-	int				name_adj;
 
-	xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
-
-	name_adj = xfs_parent_valuelen_adj(pptr->hashlen);
+	dnamelen = xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
 
 	memset(&scr->args, 0, sizeof(struct xfs_da_args));
 	scr->args.attr_filter	= XFS_ATTR_PARENT;
@@ -431,10 +403,13 @@ xfs_parent_set(
 	scr->args.geo		= ip->i_mount->m_attr_geo;
 	scr->args.name		= (const unsigned char *)&scr->rec;
 	scr->args.namelen	= reclen;
-	scr->args.valuelen	= pptr->p_namelen - name_adj;
-	scr->args.value		= (void *)pptr->p_name + name_adj;
+	scr->args.op_flags	= XFS_DA_OP_VLOOKUP;
+	scr->args.valuelen	= pptr->p_namelen - dnamelen;
 	scr->args.whichfork	= XFS_ATTR_FORK;
 
+	if (scr->args.valuelen)
+		scr->args.value	= (void *)pptr->p_name + dnamelen;
+
 	return xfs_attr_set(&scr->args);
 }
 
@@ -450,9 +425,10 @@ xfs_parent_unset(
 	const struct xfs_parent_name_irec *pptr,
 	struct xfs_parent_scratch	*scr)
 {
+	int				dnamelen;
 	int				reclen;
 
-	xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
+	dnamelen = xfs_parent_irec_to_disk(&scr->rec, &reclen, NULL, NULL, pptr);
 
 	memset(&scr->args, 0, sizeof(struct xfs_da_args));
 	scr->args.attr_filter	= XFS_ATTR_PARENT;
@@ -460,89 +436,12 @@ xfs_parent_unset(
 	scr->args.geo		= ip->i_mount->m_attr_geo;
 	scr->args.name		= (const unsigned char *)&scr->rec;
 	scr->args.namelen	= reclen;
-	scr->args.op_flags	= XFS_DA_OP_REMOVE;
+	scr->args.op_flags	= XFS_DA_OP_REMOVE | XFS_DA_OP_VLOOKUP;
+	scr->args.valuelen	= pptr->p_namelen - dnamelen;
 	scr->args.whichfork	= XFS_ATTR_FORK;
 
+	if (scr->args.valuelen)
+		scr->args.value	= (void *)pptr->p_name + dnamelen;
+
 	return xfs_attr_set(&scr->args);
 }
-
-/*
- * Compute the parent pointer namehash for the given child file and dirent
- * name.  Returns the length of the hash in bytes, or a negative errno.
- */
-int
-xfs_parent_namehash(
-	struct xfs_inode	*ip,
-	const struct xfs_name	*name,
-	void			*namehash,
-	unsigned int		namehash_len)
-{
-	SHA512_DESC_ON_STACK(ip->i_mount, shash);
-	__be32			gen = cpu_to_be32(VFS_I(ip)->i_generation);
-	int			error;
-
-	ASSERT(SHA512_DIGEST_SIZE ==
-			crypto_shash_digestsize(ip->i_mount->m_sha512));
-
-	if (namehash_len != XFS_PARENT_NAME_MAX_HASH_SIZE) {
-		ASSERT(0);
-		return -EINVAL;
-	}
-
-	if (name->len < XFS_PARENT_NAME_MAX_HASH_SIZE) {
-		/*
-		 * If the dirent name is shorter than the size of the namehash
-		 * field, write it directly into the namehash field.
-		 */
-		memcpy(namehash, name->name, name->len);
-		memset(namehash + name->len, 0, namehash_len - name->len);
-		return name->len;
-	}
-
-	error = sha512_init(&shash);
-	if (error)
-		goto out;
-
-	error = sha512_process(&shash, (const u8 *)&gen, sizeof(gen));
-	if (error)
-		goto out;
-
-	error = sha512_process(&shash, name->name, name->len);
-	if (error)
-		goto out;
-
-	/*
-	 * The sha512 hash of the child gen and dirent name is placed at the
-	 * end of the namehash, and as many bytes as will fit are copied from
-	 * the dirent name to the start of the namehash.
-	 */
-	error = sha512_done(&shash, namehash + XFS_PARENT_NAME_SHA512_OFFSET);
-	if (error)
-		goto out;
-
-	memcpy(namehash, name->name, XFS_PARENT_NAME_SHA512_OFFSET);
-	error = XFS_PARENT_NAME_MAX_HASH_SIZE;
-out:
-	sha512_erase(&shash);
-	return error;
-}
-
-/* Recalculate the name hash of this parent pointer. */
-int
-xfs_parent_irec_hash(
-	struct xfs_inode		*ip,
-	struct xfs_parent_name_irec	*pptr)
-{
-	struct xfs_name			xname = {
-		.name			= pptr->p_name,
-		.len			= pptr->p_namelen,
-	};
-	int				hashlen;
-
-	hashlen = xfs_parent_namehash(ip, &xname, &pptr->p_namehash,
-			sizeof(pptr->p_namehash));
-	if (hashlen < 0)
-		return hashlen;
-	pptr->hashlen = hashlen;
-	return 0;
-}
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 6f6136165efe..0b3e0b94d6cb 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -23,10 +23,6 @@ struct xfs_parent_name_irec {
 	/* Key fields for looking up a particular parent pointer. */
 	xfs_ino_t		p_ino;
 	uint32_t		p_gen;
-	uint8_t			hashlen;
-	uint8_t			p_namehash[XFS_PARENT_NAME_MAX_HASH_SIZE];
-
-	/* Attributes of a parent pointer. */
 	uint8_t			p_namelen;
 	unsigned char		p_name[MAXNAMELEN];
 };
@@ -34,7 +30,7 @@ struct xfs_parent_name_irec {
 void xfs_parent_irec_from_disk(struct xfs_parent_name_irec *irec,
 		const struct xfs_parent_name_rec *rec, int reclen,
 		const void *value, int valuelen);
-void xfs_parent_irec_to_disk(struct xfs_parent_name_rec *rec, int *reclen,
+int xfs_parent_irec_to_disk(struct xfs_parent_name_rec *rec, int *reclen,
 		void *value, int *valuelen,
 		const struct xfs_parent_name_irec *irec);
 
@@ -107,12 +103,6 @@ xfs_parent_finish(
 		__xfs_parent_cancel(mp, p);
 }
 
-int xfs_parent_namehash(struct xfs_inode *ip, const struct xfs_name *name,
-		void *namehash, unsigned int namehash_len);
-
-int xfs_parent_irec_hash(struct xfs_inode *ip,
-		struct xfs_parent_name_irec *pptr);
-
 unsigned int xfs_pptr_calc_space_res(struct xfs_mount *mp,
 				     unsigned int namelen);
 
@@ -126,8 +116,8 @@ struct xfs_parent_scratch {
 };
 
 int xfs_parent_lookup(struct xfs_trans *tp, struct xfs_inode *ip,
-		const struct xfs_parent_name_irec *pptr, unsigned char *name,
-		unsigned int namelen, struct xfs_parent_scratch *scratch);
+		const struct xfs_parent_name_irec *pptr,
+		struct xfs_parent_scratch *scratch);
 
 int xfs_parent_set(struct xfs_inode *ip,
 		const struct xfs_parent_name_irec *pptr,
diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index 0e625c6b0153..a8afe2333194 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -439,6 +439,7 @@ static inline unsigned int xfs_calc_pptr_replace_overhead(void)
 	return sizeof(struct xfs_attri_log_format) +
 			xlog_calc_iovec_len(XATTR_NAME_MAX) +
 			xlog_calc_iovec_len(XATTR_NAME_MAX) +
+			xlog_calc_iovec_len(XFS_PARENT_NAME_MAX_SIZE) +
 			xlog_calc_iovec_len(XFS_PARENT_NAME_MAX_SIZE);
 }
 
diff --git a/fs/xfs/scrub/dir.c b/fs/xfs/scrub/dir.c
index 87cff40b15f1..23cb7519c8f0 100644
--- a/fs/xfs/scrub/dir.c
+++ b/fs/xfs/scrub/dir.c
@@ -78,7 +78,7 @@ struct xchk_dir {
 	/* If we've cycled the ILOCK, we must revalidate deferred dirents. */
 	bool			need_revalidate;
 
-	/* Name buffer for pptr validation and dirent revalidation. */
+	/* Name buffer for dirent revalidation. */
 	uint8_t			namebuf[MAXNAMELEN];
 
 };
@@ -143,42 +143,16 @@ xchk_dir_parent_pointer(
 	struct xfs_inode	*ip)
 {
 	struct xfs_scrub	*sc = sd->sc;
-	int			pptr_namelen;
-	int			hashlen;
+	int			error;
 
 	sd->pptr.p_ino = sc->ip->i_ino;
 	sd->pptr.p_gen = VFS_I(sc->ip)->i_generation;
+	sd->pptr.p_namelen = name->len;
+	memcpy(sd->pptr.p_name, name->name, name->len);
 
-	hashlen = xfs_parent_namehash(ip, name, &sd->pptr.p_namehash,
-			sizeof(sd->pptr.p_namehash));
-	if (hashlen < 0) {
-		xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0,
-				&hashlen);
-		return hashlen;
-	}
-	sd->pptr.hashlen = hashlen;
-
-	pptr_namelen = xfs_parent_lookup(sc->tp, ip, &sd->pptr, sd->namebuf,
-			MAXNAMELEN, &sd->pptr_scratch);
-	if (pptr_namelen == -ENOATTR) {
-		xchk_fblock_xref_set_corrupt(sc, XFS_DATA_FORK, 0);
-		return 0;
-	}
-	if (pptr_namelen < 0) {
-		xchk_fblock_xref_process_error(sc, XFS_DATA_FORK, 0,
-				&pptr_namelen);
-		return pptr_namelen;
-	}
-
-	if (pptr_namelen != name->len) {
-		xchk_fblock_xref_set_corrupt(sc, XFS_DATA_FORK, 0);
-		return 0;
-	}
-
-	if (memcmp(sd->namebuf, name->name, name->len)) {
+	error = xfs_parent_lookup(sc->tp, ip, &sd->pptr, &sd->pptr_scratch);
+	if (error == -ENOATTR)
 		xchk_fblock_xref_set_corrupt(sc, XFS_DATA_FORK, 0);
-		return 0;
-	}
 
 	return 0;
 }
diff --git a/fs/xfs/scrub/parent.c b/fs/xfs/scrub/parent.c
index f3b1d7cbe415..fbe6fb709e2e 100644
--- a/fs/xfs/scrub/parent.c
+++ b/fs/xfs/scrub/parent.c
@@ -348,12 +348,6 @@ struct xchk_pptrs {
 
 	/* xattr key and da args for parent pointer revalidation. */
 	struct xfs_parent_scratch pptr_scratch;
-
-	/* Name hashes */
-	uint8_t			child_namehash[XFS_PARENT_NAME_MAX_HASH_SIZE];
-
-	/* Name buffer for revalidation. */
-	uint8_t			namebuf[MAXNAMELEN];
 };
 
 /* Look up the dotdot entry so that we can check it as we walk the pptrs. */
@@ -526,12 +520,10 @@ xchk_parent_scan_attr(
 	unsigned int		valuelen,
 	void			*priv)
 {
-	struct xfs_name		xname = { };
 	struct xchk_pptrs	*pp = priv;
 	struct xfs_inode	*dp = NULL;
 	const struct xfs_parent_name_rec *rec = (const void *)name;
 	unsigned int		lockmode;
-	int			hashlen;
 	int			error;
 
 	/* Ignore incomplete xattrs */
@@ -555,29 +547,6 @@ xchk_parent_scan_attr(
 
 	xfs_parent_irec_from_disk(&pp->pptr, rec, namelen, value, valuelen);
 
-	xname.name = pp->pptr.p_name;
-	xname.len = pp->pptr.p_namelen;
-
-	/*
-	 * Does the namehash in the parent pointer match the actual name?
-	 * If not, there's no point in checking further.
-	 */
-	hashlen = xfs_parent_namehash(sc->ip, &xname, pp->child_namehash,
-			sizeof(pp->child_namehash));
-	if (hashlen < 0) {
-		xchk_fblock_xref_process_error(sc, XFS_ATTR_FORK, 0, &hashlen);
-		return hashlen;
-	}
-
-	if (hashlen != pp->pptr.hashlen ||
-	    memcmp(pp->pptr.p_namehash, pp->child_namehash,
-				pp->pptr.hashlen)) {
-		trace_xchk_parent_bad_namehash(sc->ip, pp->pptr.p_ino,
-				xname.name, xname.len);
-		xchk_fblock_xref_set_corrupt(sc, XFS_ATTR_FORK, 0);
-		return 0;
-	}
-
 	error = xchk_parent_iget(pp, &dp);
 	if (error)
 		return error;
@@ -630,28 +599,16 @@ xchk_parent_revalidate_pptr(
 	struct xchk_pptrs	*pp)
 {
 	struct xfs_scrub	*sc = pp->sc;
-	int			namelen;
+	int			error;
 
-	namelen = xfs_parent_lookup(sc->tp, sc->ip, &pp->pptr, pp->namebuf,
-			MAXNAMELEN, &pp->pptr_scratch);
-	if (namelen == -ENOATTR) {
-		/*  Parent pointer went away, nothing to revalidate. */
+	error = xfs_parent_lookup(sc->tp, sc->ip, &pp->pptr,
+			&pp->pptr_scratch);
+	if (error == -ENOATTR) {
+		/* Parent pointer went away, nothing to revalidate. */
 		return -ENOENT;
 	}
-	if (namelen < 0 && namelen != -EEXIST)
-		return namelen;
 
-	/*
-	 * The dirent name changed length while we were unlocked.  No need
-	 * to revalidate this.
-	 */
-	if (namelen != pp->pptr.p_namelen)
-		return -ENOENT;
-
-	/* The dirent name itself changed; there's nothing to revalidate. */
-	if (memcmp(pp->namebuf, pp->pptr.p_name, pp->pptr.p_namelen))
-		return -ENOENT;
-	return 0;
+	return error;
 }
 
 /*
@@ -679,10 +636,6 @@ xchk_parent_slow_pptr(
 	pp->pptr.p_name[MAXNAMELEN - 1] = 0;
 	pp->pptr.p_namelen = pptr->namelen;
 
-	error = xfs_parent_irec_hash(sc->ip, &pp->pptr);
-	if (error)
-		return error;
-
 	/* Check that the deferred parent pointer still exists. */
 	if (pp->need_revalidate) {
 		error = xchk_parent_revalidate_pptr(pp);
@@ -714,7 +667,7 @@ xchk_parent_slow_pptr(
 	xchk_iunlock(sc, sc->ilock_flags);
 	pp->need_revalidate = true;
 
-	trace_xchk_parent_slowpath(sc->ip, pp->namebuf, pptr->namelen,
+	trace_xchk_parent_slowpath(sc->ip, pp->pptr.p_name, pptr->namelen,
 			dp->i_ino);
 
 	while (true) {
diff --git a/fs/xfs/scrub/parent_repair.c b/fs/xfs/scrub/parent_repair.c
index 14647e3da8c1..b55ef1506dd2 100644
--- a/fs/xfs/scrub/parent_repair.c
+++ b/fs/xfs/scrub/parent_repair.c
@@ -126,9 +126,6 @@ struct xrep_pptrs {
 
 	/* Parent pointer names. */
 	struct xfblob		*pptr_names;
-
-	/* Buffer for validation. */
-	unsigned char		namebuf[MAXNAMELEN];
 };
 
 /* Tear down all the incore stuff we created. */
@@ -182,16 +179,11 @@ xrep_pptr_replay_update(
 	const struct xrep_pptr	*pptr)
 {
 	struct xfs_scrub	*sc = rp->sc;
-	int			error;
 
 	rp->pptr.p_ino = pptr->p_ino;
 	rp->pptr.p_gen = pptr->p_gen;
 	rp->pptr.p_namelen = pptr->namelen;
 
-	error = xfs_parent_irec_hash(sc->ip, &rp->pptr);
-	if (error)
-		return error;
-
 	if (pptr->action == XREP_PPTR_ADD) {
 		/* Create parent pointer. */
 		trace_xrep_pptr_createname(sc->tempip, &rp->pptr);
@@ -510,7 +502,7 @@ xrep_pptr_dump_tempptr(
 	struct xrep_pptrs	*rp = priv;
 	const struct xfs_parent_name_rec *rec = (const void *)name;
 	struct xfs_inode	*other_ip;
-	int			pptr_namelen;
+	int			error;
 
 	if (!(attr_flags & XFS_ATTR_PARENT))
 		return 0;
@@ -526,29 +518,15 @@ xrep_pptr_dump_tempptr(
 
 	trace_xrep_pptr_dumpname(sc->tempip, &rp->pptr);
 
-	pptr_namelen = xfs_parent_lookup(sc->tp, other_ip, &rp->pptr,
-			rp->namebuf, MAXNAMELEN, &rp->pptr_scratch);
-	if (pptr_namelen == -ENOATTR) {
+	error = xfs_parent_lookup(sc->tp, other_ip, &rp->pptr,
+			&rp->pptr_scratch);
+	if (error == -ENOATTR) {
 		trace_xrep_pptr_checkname(other_ip, &rp->pptr);
-		ASSERT(pptr_namelen != -ENOATTR);
+		ASSERT(error != -ENOATTR);
 		return -EFSCORRUPTED;
 	}
-	if (pptr_namelen < 0)
-		return pptr_namelen;
 
-	if (pptr_namelen != rp->pptr.p_namelen) {
-		trace_xrep_pptr_checkname(other_ip, &rp->pptr);
-		ASSERT(pptr_namelen == rp->pptr.p_namelen);
-		return -EFSCORRUPTED;
-	}
-
-	if (memcmp(rp->namebuf, rp->pptr.p_name, rp->pptr.p_namelen)) {
-		trace_xrep_pptr_checkname(other_ip, &rp->pptr);
-		ASSERT(0);
-		return -EFSCORRUPTED;
-	}
-
-	return 0;
+	return error;
 }
 
 /*
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index b7f29b4acac3..a78a3077b41a 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -117,9 +117,6 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_OFFSET(xfs_dir2_sf_entry_t, name,		3);
 	XFS_CHECK_STRUCT_SIZE(xfs_dir2_sf_hdr_t,		10);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_name_rec,	12);
-	BUILD_BUG_ON(XFS_PARENT_NAME_MAX_HASH_SIZE < SHA512_DIGEST_SIZE);
-	BUILD_BUG_ON(XFS_PARENT_NAME_MAX_HASH_SIZE !=           243);
-	BUILD_BUG_ON(XFS_PARENT_NAME_SHA512_OFFSET !=           179);
 
 	/* log structures */
 	XFS_CHECK_STRUCT_SIZE(struct xfs_buf_log_format,	88);
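
The name-splitting arithmetic that recurs in xfs_parent_lookup,
xfs_parent_set and xfs_parent_unset above can be sketched in standalone
C.  This is only an illustration under stated assumptions, not the
kernel code: DNAME_MAX, struct split_name and split_dirent_name() are
hypothetical names, and the 243-byte cap is assumed (XATTR_NAME_MAX of
255 less the 12-byte struct xfs_parent_name_rec checked in
xfs_ondisk.h).

```c
#include <stddef.h>

/* Assumed cap on dirent-name bytes kept in the xattr name (hypothetical). */
#define DNAME_MAX 243

struct split_name {
	const unsigned char *key_part;   /* bytes stored in the xattr name */
	size_t key_len;
	const unsigned char *value_part; /* remaining bytes, or NULL if none */
	size_t value_len;
};

/*
 * Split a dirent name the way the patch does: the first dnamelen bytes
 * go into the attr name, and whatever is left over becomes the attr
 * value.  A name that fits entirely in the key has a NULL value.
 */
static struct split_name split_dirent_name(const unsigned char *name,
					   size_t len)
{
	struct split_name s;
	size_t dnamelen = len < DNAME_MAX ? len : DNAME_MAX;

	s.key_part = name;
	s.key_len = dnamelen;
	s.value_len = len - dnamelen;
	s.value_part = s.value_len > 0 ? name + dnamelen : NULL;
	return s;
}
```

Under these assumptions a short name produces a zero-length value, which
is why the patch only sets args.value when valuelen is nonzero.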



* [PATCH 12/13] xfs: turn NVREPLACEXXX into NVREPLACE
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
                       ` (10 preceding siblings ...)
  2023-03-03 17:12     ` [PATCH 11/13] xfs: use VLOOKUP mode to avoid hashing parent pointer names Darrick J. Wong
@ 2023-03-03 17:12     ` Darrick J. Wong
  2023-03-03 17:12     ` [PATCH 13/13] xfs: revert "load secure hash algorithm for parent pointers" Darrick J. Wong
  12 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:12 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

NVREPLACEXXX is NVREPLACE with VLOOKUP enabled.  Nobody uses NVREPLACE
now, so get rid of NVREPLACE and make NVREPLACEXXX take its place.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c       |    7 -------
 fs/xfs/libxfs/xfs_log_format.h |    1 -
 fs/xfs/xfs_attr_item.c         |   15 +++------------
 3 files changed, 3 insertions(+), 20 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index d807692b259c..c6621aba161d 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -431,11 +431,6 @@ xfs_attr_complete_op(
 							args->namelen);
 			args->value = args->new_value;
 			args->valuelen = args->new_valuelen;
-		} else if (args->new_namelen > 0) {
-			args->name = args->new_name;
-			args->namelen = args->new_namelen;
-			args->hashval = xfs_da_hashname(args->name,
-							args->namelen);
 		}
 		return replace_state;
 	}
@@ -944,8 +939,6 @@ xfs_attr_defer_replace(
 	int			error = 0;
 
 	if (args->op_flags & XFS_DA_OP_VLOOKUP)
-		op_flag = XFS_ATTRI_OP_FLAGS_NVREPLACEXXX;
-	else if (args->new_namelen > 0)
 		op_flag = XFS_ATTRI_OP_FLAGS_NVREPLACE;
 
 	error = xfs_attr_intent_init(args, op_flag, &new);
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index ed406738847d..ec85af39ed91 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -961,7 +961,6 @@ struct xfs_icreate_log {
 #define XFS_ATTRI_OP_FLAGS_NVREPLACE	4	/* Replace attr name and val */
 #define XFS_ATTRI_OP_FLAGS_NVREMOVE	5	/* Remove attr w/ vlookup */
 #define XFS_ATTRI_OP_FLAGS_NVSET	6	/* Set attr with w/ vlookup */
-#define XFS_ATTRI_OP_FLAGS_NVREPLACEXXX	7	/* Replace attr name and val */
 #define XFS_ATTRI_OP_FLAGS_TYPE_MASK	0xFF	/* Flags type mask */
 
 /*
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 57cc426b1e22..70d56bab4e21 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -428,13 +428,10 @@ xfs_attr_log_item(
 	attrp->alfi_op_flags = attr->xattri_op_flags;
 	attrp->alfi_value_len = attr->xattri_nameval->value.i_len;
 
-	if (xfs_attr_log_item_op(attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACEXXX) {
+	if (xfs_attr_log_item_op(attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACE) {
 		attrp->alfi_oldname_len = attr->xattri_nameval->name.i_len;
 		attrp->alfi_newname_len = attr->xattri_nameval->newname.i_len;
 		attrp->alfi_newvalue_len = attr->xattri_nameval->newvalue.i_len;
-	} else if (xfs_attr_log_item_op(attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACE) {
-		attrp->alfi_oldname_len = attr->xattri_nameval->name.i_len;
-		attrp->alfi_newname_len = attr->xattri_nameval->newname.i_len;
 	} else {
 		attrp->alfi_name_len = attr->xattri_nameval->name.i_len;
 	}
@@ -598,7 +595,6 @@ xfs_attri_validate(
 		if (attrp->alfi_newvalue_len != 0)
 			return false;
 		break;
-	case XFS_ATTRI_OP_FLAGS_NVREPLACEXXX:
 	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
 		if (attrp->alfi_oldname_len == 0 ||
 		    attrp->alfi_oldname_len > XATTR_NAME_MAX)
@@ -685,7 +681,7 @@ xfs_attri_item_recover(
 	ASSERT(xfs_sb_version_haslogxattrs(&mp->m_sb));
 
 	switch (attr->xattri_op_flags) {
-	case XFS_ATTRI_OP_FLAGS_NVREPLACEXXX:
+	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
 		args->new_value = nv->newvalue.i_addr;
 		args->new_valuelen = nv->newvalue.i_len;
 		fallthrough;
@@ -694,7 +690,6 @@ xfs_attri_item_recover(
 		fallthrough;
 	case XFS_ATTRI_OP_FLAGS_SET:
 	case XFS_ATTRI_OP_FLAGS_REPLACE:
-	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
 		args->value = nv->value.i_addr;
 		args->valuelen = nv->value.i_len;
 		args->total = xfs_attr_calc_size(args, &local);
@@ -788,13 +783,10 @@ xfs_attri_item_relog(
 	new_attrp->alfi_ino = old_attrp->alfi_ino;
 	new_attrp->alfi_op_flags = old_attrp->alfi_op_flags;
 	new_attrp->alfi_value_len = old_attrp->alfi_value_len;
-	if (xfs_attr_log_item_op(old_attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACEXXX) {
+	if (xfs_attr_log_item_op(old_attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACE) {
 		new_attrp->alfi_newname_len = old_attrp->alfi_newname_len;
 		new_attrp->alfi_oldname_len = old_attrp->alfi_oldname_len;
 		new_attrp->alfi_newvalue_len = old_attrp->alfi_newvalue_len;
-	} else if (xfs_attr_log_item_op(old_attrp) == XFS_ATTRI_OP_FLAGS_NVREPLACE) {
-		new_attrp->alfi_newname_len = old_attrp->alfi_newname_len;
-		new_attrp->alfi_oldname_len = old_attrp->alfi_oldname_len;
 	} else {
 		new_attrp->alfi_name_len = old_attrp->alfi_name_len;
 	}
@@ -864,7 +856,6 @@ xlog_recover_attri_commit_pass2(
 		name_len = attri_formatp->alfi_name_len;
 		value_len = attri_formatp->alfi_value_len;
 		break;
-	case XFS_ATTRI_OP_FLAGS_NVREPLACEXXX:
 	case XFS_ATTRI_OP_FLAGS_NVREPLACE:
 		if (item->ri_total < 3 || item->ri_total > 5) {
 			XFS_CORRUPTION_ERROR(__func__, XFS_ERRLEVEL_LOW, mp,
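
After this patch the replace path appears to have only two outcomes,
which a standalone C sketch can make explicit.  The enum and helper
names below are hypothetical; values 4 through 6 mirror the
xfs_log_format.h hunk above, while REPLACE = 3 and its use as the
default opcode are assumptions not shown in this patch.

```c
#include <stdbool.h>

enum attri_op {
	ATTRI_OP_REPLACE   = 3,	/* assumed value; not shown in this patch */
	ATTRI_OP_NVREPLACE = 4,	/* replace attr name and value */
	ATTRI_OP_NVREMOVE  = 5,	/* remove attr w/ vlookup */
	ATTRI_OP_NVSET     = 6,	/* set attr w/ vlookup */
};

/* VLOOKUP callers log NVREPLACE; everyone else logs a plain REPLACE. */
static enum attri_op pick_replace_op(bool vlookup)
{
	return vlookup ? ATTRI_OP_NVREPLACE : ATTRI_OP_REPLACE;
}

/* Only NVREPLACE carries the old name, new name and new value lengths. */
static bool logs_new_name_and_value(enum attri_op op)
{
	return op == ATTRI_OP_NVREPLACE;
}
```

Collapsing NVREPLACEXXX into NVREPLACE means the log item, relog and
recovery code each need a single branch for the rename case.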



* [PATCH 13/13] xfs: revert "load secure hash algorithm for parent pointers"
  2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
                       ` (11 preceding siblings ...)
  2023-03-03 17:12     ` [PATCH 12/13] xfs: turn NVREPLACEXXX into NVREPLACE Darrick J. Wong
@ 2023-03-03 17:12     ` Darrick J. Wong
  12 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-03 17:12 UTC (permalink / raw)
  To: djwong; +Cc: allison.henderson, linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

We don't use this anymore, so get rid of it.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/Kconfig      |    1 -
 fs/xfs/xfs_linux.h  |    1 -
 fs/xfs/xfs_mount.c  |   13 -------------
 fs/xfs/xfs_mount.h  |    3 ---
 fs/xfs/xfs_sha512.h |   42 ------------------------------------------
 fs/xfs/xfs_super.c  |    3 ---
 6 files changed, 63 deletions(-)
 delete mode 100644 fs/xfs/xfs_sha512.h


diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index 6422daaf8914..4798a147fd9e 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -5,7 +5,6 @@ config XFS_FS
 	select EXPORTFS
 	select LIBCRC32C
 	select FS_IOMAP
-	select CRYPTO_SHA512
 	help
 	  XFS is a high performance journaling filesystem which originated
 	  on the SGI IRIX platform.  It is completely multi-threaded, can
diff --git a/fs/xfs/xfs_linux.h b/fs/xfs/xfs_linux.h
index 3f93a742b896..c05f7e309c3e 100644
--- a/fs/xfs/xfs_linux.h
+++ b/fs/xfs/xfs_linux.h
@@ -62,7 +62,6 @@ typedef __u32			xfs_nlink_t;
 #include <linux/rhashtable.h>
 #include <linux/xattr.h>
 #include <linux/mnt_idmapping.h>
-#include <crypto/hash.h>
 
 #include <asm/page.h>
 #include <asm/div64.h>
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index a5f3dce658e9..fb87ffb48f7f 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -983,19 +983,6 @@ xfs_mountfs(
 			goto out_agresv;
 	}
 
-	if (xfs_has_parent(mp)) {
-		struct crypto_shash	*tfm;
-
-		tfm = crypto_alloc_shash("sha512", 0, 0);
-		if (IS_ERR(tfm)) {
-			error = PTR_ERR(tfm);
-			goto out_agresv;
-		}
-		xfs_info(mp, "parent pointer hash %s",
-				crypto_shash_driver_name(tfm));
-		mp->m_sha512 = tfm;
-	}
-
 	return 0;
 
  out_agresv:
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 7c8e15e84cd6..c08f55cc4f36 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -244,9 +244,6 @@ typedef struct xfs_mount {
 #endif
 	/* Hook to feed file directory updates to an active online repair. */
 	struct xfs_hooks	m_dirent_update_hooks;
-
-	/* sha512 engine, if needed */
-	struct crypto_shash	*m_sha512;
 } xfs_mount_t;
 
 #define M_IGEO(mp)		(&(mp)->m_ino_geo)
diff --git a/fs/xfs/xfs_sha512.h b/fs/xfs/xfs_sha512.h
deleted file mode 100644
index d9756db63aa6..000000000000
--- a/fs/xfs/xfs_sha512.h
+++ /dev/null
@@ -1,42 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-or-later
-/*
- * Copyright (C) 2023 Oracle.  All Rights Reserved.
- * Author: Darrick J. Wong <djwong@kernel.org>
- */
-#ifndef __XFS_SHA512_H__
-#define __XFS_SHA512_H__
-
-struct sha512_state {
-	union {
-		struct shash_desc desc;
-		char __desc[sizeof(struct shash_desc) + HASH_MAX_DESCSIZE];
-	};
-};
-
-#define SHA512_DESC_ON_STACK(mp, name) \
-	struct sha512_state name = { .desc.tfm = (mp)->m_sha512 }
-
-#define SHA512_DIGEST_SIZE	64
-
-static inline int sha512_init(struct sha512_state *md)
-{
-	return crypto_shash_init(&md->desc);
-}
-
-static inline int sha512_done(struct sha512_state *md, unsigned char *out)
-{
-	return crypto_shash_final(&md->desc, out);
-}
-
-static inline int sha512_process(struct sha512_state *md,
-		const unsigned char *in, unsigned long inlen)
-{
-	return crypto_shash_update(&md->desc, in, inlen);
-}
-
-static inline void sha512_erase(struct sha512_state *md)
-{
-	memset(md, 0, sizeof(*md));
-}
-
-#endif /* __XFS_SHA512_H__ */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 610d72353f39..0432a4a096e8 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -738,8 +738,6 @@ xfs_mount_free(
 {
 	kfree(mp->m_rtname);
 	kfree(mp->m_logname);
-	if (mp->m_sha512)
-		crypto_free_shash(mp->m_sha512);
 	kmem_free(mp);
 }
 
@@ -1963,7 +1961,6 @@ static int xfs_init_fs_context(
 	if (fc->sb_flags & SB_SYNCHRONOUS)
 		mp->m_features |= XFS_FEAT_WSYNC;
 
-	mp->m_sha512 = NULL;
 	fc->s_fs_info = mp;
 	fc->ops = &xfs_context_ops;
 



* Re: [RFC DELUGE v9r2d1] xfs: Parent Pointers
  2023-03-01  1:24       ` Darrick J. Wong
@ 2023-03-08 22:47         ` Allison Henderson
  2023-03-14  2:20           ` Darrick J. Wong
  0 siblings, 1 reply; 227+ messages in thread
From: Allison Henderson @ 2023-03-08 22:47 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs

On Tue, 2023-02-28 at 17:24 -0800, Darrick J. Wong wrote:
> On Sat, Feb 25, 2023 at 07:34:14AM +0000, Allison Henderson wrote:
> > On Thu, 2023-02-23 at 18:51 -0800, Darrick J. Wong wrote:
> > > On Fri, Feb 17, 2023 at 08:02:29PM +0000, Allison Henderson
> > > wrote:
> > > > On Thu, 2023-02-16 at 12:06 -0800, Darrick J. Wong wrote:
> > > > > Hi everyone,
> > > > > 
> > > > > This deluge contains all of the additions to the parent
> > > > > pointers
> > > > > patchset that I've been working on for the past month.  The
> > > > > kernel
> > > > > and
> > > > > xfsprogs patchsets are based on Allison's v9r2 tag from last
> > > > > week;
> > > > > the fstests patches are merely a part of my development
> > > > > tree.  To
> > > > > recap
> > > 
> > > <snip>
> > > 
> > > > Ermergersh, that's a lot!  Thanks for all the hard work.  I feel
> > > > like if
> > > > we don't come up with a plan for review though, people may not
> > > > know
> > > > where to start for these deluges!  Let's see... if we had to
> > > > break
> > > > this
> > > > down, I think I would divide it up between the existing parent
> > > > pointers
> > > > and the new pptr propositions for ofsck.
> > > 
> > > That's a good place to cleave.
> > > 
> > > > Then further divide it among
> > > > kernel space, user space and test case.  If I had to pick only
> > > > one
> > > > of
> > > > these to focus attention on, probably it should be new ofsck
> > > > changes in
> > > > the kernel space, since the rest of the deluge is really
> > > > contingent
> > > > on
> > > > it. 
> > > 
> > > Yup.  Though you ought to read through the offline fsck patches
> > > too.
> > > Those take a very different approach to resolving parent
> > > pointers. 
> > > So
> > > much of repair is based on nuking directories that I don't know
> > > there's
> > > a good way to rebuild them from parent pointers.
> > Ok, will take a look
> > 
> > > 
> > > A thought I had was that when we decide to zap a directory due to
> > > problems in the directory blocks themselves, we could then initiate a
> > > initiate a
> > > scan of the parent pointers to try to find all the dirents we
> > > can.  I
> > > ran into problems with that approach because libxfs_iget
> > > allocates
> > > fresh
> > > xfs_inode objects (instead of caching and sharing them like the
> > > kernel
> > > does) and that made it really hard to scan things in a coherent
> > > manner.
> > > 
> > > > So now we've narrowed this down to a few subsets:
> > > > 
> > > > [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers
> > > > [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl,
> > > 
> > > If you read through these two patchsets and think they're ok,
> > > then
> > > either fold the fixes into the main series or tack them on the
> > > end,
> > > whichever is easier.  
> > ok, I'll take a look.  I'll probably tack on the first 2 fixes since
> > they
> > don't fit into an existing patch in the set.
> 
> Ok.
> 
> > > If you tack them on the end, please add your
> > > own SOB tags.
> > 
> > Sure?  I SOB'd the last 2 patches of the set in v3, and then you
> > said
> > to make it an RVB
> 
> Er... SOB, RVB, whichever tag(s) get us to a patch that has a signoff
> and a review. :)
> 
> > > 
> > > > [PATCHSET v9r2d1 00/23] xfs: online fsck support patches
> > > > [PATCHSET v9r2d1 0/7] xfs: online repair of directories
> > > > [PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers
> > > > [PATCHSET v9r2d1 0/3] xfs: online checking of parent pointers
> > > > [PATCHSET v9r2d1 0/2] xfs: online checking of directories
> > > 
> > > The fsck functionality exists to prove the point that directory
> > > repair
> > > is /very/ awkward if we have to update p_diroffset.  As such,
> > > they
> > > focused on getting the main parts right ... but with the obvious
> > > problem of making pptrs dependent on online fsck part 1 getting
> > > merged.
> > > 
> > > Speaking of which -- can we merge online fsck for 6.4?  Please?
> > > :)
> > I'm fine with it as long as everyone else is?  I'm not sure who
> > this is
> > directed to.
> 
> 10% dchinner, 90% anyone we don't know about who might swoop in at
> the
> last minute and NAK it. ;)
> 
> > I admittedly haven't been able to work through all of it,
> > but I don't think anyone has.  I don't know that exhaustive
> > reviewing
> > as a whole is particularly effective though.  Back when the
> > combined
> > set of "attr refactoring" + "larp" + "parent pointers" was
> > particularly
> > large, I used to just send out subsets that I thought were more
> > reasonable for people to digest.  That way people can look at the
> > giant
> > mega-set if they really gotta see it, but it kept the reviews more
> > focused on a sort of smaller next step.
> 
> TBH every time I went to look at all that, I pulled your github
> branch
> and looked at the whole thing.  I paid more attention to whatever was
> being reviewed on-list, obviously.  That said, after about the fifth
> round of looking at a patchset I start feeling like I'm only going to
> increase my knowledge of the code by using it to write something.  At
> that point it's easier to convince me to merge it, or at least to
> fling
> it at fstestscloud.
> 
> > 
> > > 
> > > > [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr
> > > > key
> > > 
> > > Resolving the questions presented by this series is critical to
> > > nailing
> > > down the ondisk format and merging the feature.  But we'll get to
> > > that
> > > below.
> > > 
> > > > [PATCHSET v9r2d1 0/3] xfs: use flex arrays for
> > > > XFS_IOC_GETPARENTS,
> > > 
> > > I'd like to know what you think about converting the ioctl
> > > definition
> > > to
> > > flex arrays instead of the fixed size structs.  I'm not sure
> > > where to
> > > put this series, though.  If you decide that you want 'em, then
> > > ideally
> > > they'd be in xfs_fs.h from the introduction of
> > > XFS_IOC_GETPARENTS,
> > > but
> > > I don't see any point in backporting them around "xfs: rework the
> > > GETPARENTS ioctl".
> > > 
> > > (I would be ok if you rolled all of it into patch 25 from the
> > > original
> > > v9 set.)
> > I'll take a look at it, I didn't put a whole lot of focus on the
> > ioctl
> > initially because the only thing that was using it at the time was
> > the
> > test case, and I wanted to keep attention more on the
> > infrastructure.
> 
> <nod> I only started looking at it because I started pounding on it
> with
> xfs_scrub and noticed problems. :D
Just an FYI: I thought the flex arrays were fine, but the series had some
conflicts when I moved it to the bottom of the set, so I ended up redoing
it directly in the existing parent pointers ioctl patch.  If possible,
next time put the renames and cleanups after the functional changes;
that should help them move around a bit more easily.  Thanks!

Allison

> 
> > > 
> > > > Of those, I think "xfs: encode parent pointer name in xattr
> > > > key" is
> > > > the
> > > > only one that might impact other features since it's changing
> > > > the
> > > > ondisk format from when we first started the effort years ago. 
> > > > So
> > > > probably that might be the best place for people to start since
> > > > if
> > > > this
> > > > needs to change it might impact some of the other subsets in
> > > > the
> > > > deluge, or even features they are working on if they've based
> > > > anything
> > > > on the existing pptr set.
> > > 
> > > Bingo!
> > > 
> > > The biggest question about the format change is (IMHO) whether
> > > we're
> > > ok
> > > with using a hash function for parent pointer names that don't
> > > fit in
> > > the attr key space, and which hash?
> > > 
> > > The sha2 family was designed to be collision resistant, but I
> > > don't
> > > anticipate that will last forever.  The hash is computed from
> > > (the
> > > full
> > > name and the child generation number) when the dirent name is
> > > longer
> > > than 243 bytes.  The first 179 bytes of the dirent name are still
> > > written in the parent pointer attr name.  An attacker would have
> > > to
> > > find
> > > a collision that only changes the last 76 bytes of the dirent
> > > name,
> > > and
> > > they'd have to know the generation number at runtime.
> > > 
> > > (Note: dirent names shorter than 243 bytes are written directly
> > > into
> > > the
> > > parent pointer xattr name, no hashing required.)
> > > 
> > > I /think/ that's good enough, but I'm no cryptanalyst.  The
> > > alternative
> > > would be to change the xattr format so that the namelen field in
> > > the
> > > leaf structure to encode *only* the name component of the parent
> > > pointer.  This would lead to a lot of special cased xattr code
> > > and
> > > probably a lot of bugs and other stupid problems, which is why I
> > > didn't
> > > take that route.
> > > 
> > > Thoughts?
> > 
> > Hmm, well, it sounds like a risk to be weighed.  It wouldn't happen
> > very often.  It seems like it would be extremely rare.  But when it
> > does it will likely be quite unpleasant.  
> > 
> > I think another question to ask would be how often does the parent
> > pointer really need to be updated in a repair?  In most cases, an
> > orphaned inode will likely be able to return to the diroffset from
> > whence it came.  So an update may be unlikely.  Even more so would
> > be
> > the worst case of needing to update crazy amounts of parent
> > pointers. 
> > So  another option is to simply pick a cap and error out if the
> > demand
> > is too much.  Likely if this condition does arise, there's probably
> > bigger issues going on.
> > 
> > While option A is substantially more rare than option B, you could
> > probably pick either one and rarely encounter the error path. 
> > While
> > option A does have the advantage of being more memory conservative,
> > it
> > has the disadvantage of possibly being a really ugly sleeping bug. 
> > While option B might error out when option A would have not, it
> > would
> > at least be clear as to why it did, and probably allude to the
> > presence
> > of bigger problems, such as an internal bug that we should probably
> > go
> > catch, or perhaps something external corrupting the fs image, which
> > ofsck may not be able to solve anyway.  
> > 
> > FWIW I seem to recall running across the idea of using hashes as
> > keys
> > in other projects I've been on, and most of the time the rarity of
> > the
> > collision was considered an acceptable risk, though it's really
> > about
> > which risk really bothers you more.
> 
> I want to study sha2 hash collisions and/or how the xattr code
> stumbles
> over attrs with the same dahash first.  Dealing with colliding xattr
> names might not be as painful for the parent pointer code as I'm
> currently thinking.
> 
> > > 
> > > > I feel like a 5 patch subset is a very reasonable thing to ask
> > > > people
> > > > to give their attention to.  That way they don't get lost in
> > > > things
> > > > like
> > > > nits for optimizations that might not even matter if something
> > > > it
> > > > depends on changes.
> > > > 
> > > > For the most part I am ok with changing the format as long as
> > > > everyone
> > > > is aware and in agreement so that we don't get caught up
> > > > re-coding
> > > > efforts that seem to have struggled with disagreements now on
> > > > the
> > > > scale
> > > > of decades.  Some of these patches were already very old by the
> > > > time I
> > > > got them!
> > > 
> > > Hheehhe.  Same here -- rmap was pretty old by the time I started
> > > pushing
> > > that for reals. :)
> > > 
> > > > On a side note, there are some preliminary patches of kernel
> > > > side
> > > > parent pointers that are either larp fixes or refactoring not
> > > > sensitive
> > > > to the proposed ofsck changes.  These patches have been
> > > > floating
> > > > around for a while now, so if no one has any gripes, I think
> > > > just
> > > > merging those would help cut down the amount of rebasing, user
> > > > space
> > > > porting and patch reviewing that goes on for every version. 
> > > > (maybe
> > > > the
> > > > first 1 through 7 of the 28 patch set, if folks are ok with
> > > > that)
> > > 
> > > I thought about doing that for 6.3, but I found enough bugs in
> > > the
> > > locking stuff (recall the first bugfix series) that I held back. 
> > > I'm
> > > not sure about the two "Increase <blah>" patches -- they'll bloat
> > > kernel
> > > structures without a real user for them.
> > 
> > I don't think the first 7 are order sensitive, we should be able to
> > do
> > just 1, 4, 5, 6 and 7.
> 
> OH.
> 
> > > 
> > > <shrug>
> > > 
> > > > I think the sheer size of some of these sets tend to work
> > > > against
> > > > them,
> > > > as people likely cannot afford the time block they present on
> > > > the
> > > > surface.
> > > 
> > > Agreed.  At this point, I've worked through enough of the parent
> > > pointers code to understand what's going on that I'm ok with
> > > merging
> > > it
> > > once we settle the above question.
> > > 
> > > FWIW the whole series (kernel+xfsprogs+fstests) has been passing
> > > my
> > > nightly QA farm for a couple of weeks now despite my constant
> > > hammering
> > > on it, so I think the implementation is ready.
> > > 
> > > > So I think we would do well to find a way to introduce them
> > > > at a reasonable pace and keep attention focused on the
> > > > subsections
> > > > that
> > > > should require more than others, and hopefully keep things
> > > > moving in
> > > > a
> > > > progressive direction.
> > > 
> > > I disagree -- I want to merge online fsck part 1 so I can get
> > > that
> > > out
> > > of my dev trees.  Then I want to focus on getting this over the
> > > finish
> > > line and merged.  But then I'm not known for incrementalism. :P
> > Well, I notice people respond better to subsets in smaller doses
> > though.  And then it gives the preliminary patches time to
> > stabilize if
> > people do find an issue.
> 
> <nod> I'll keep that in mind.
> 
> --D
> 
> > > 
> > > --D
> > > 
> > > > Thx!
> > > > Allison
> > > > 
> > 



* Re: [RFC DELUGE v9r2d1] xfs: Parent Pointers
  2023-03-08 22:47         ` Allison Henderson
@ 2023-03-14  2:20           ` Darrick J. Wong
  0 siblings, 0 replies; 227+ messages in thread
From: Darrick J. Wong @ 2023-03-14  2:20 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Wed, Mar 08, 2023 at 10:47:30PM +0000, Allison Henderson wrote:
> On Tue, 2023-02-28 at 17:24 -0800, Darrick J. Wong wrote:
> > On Sat, Feb 25, 2023 at 07:34:14AM +0000, Allison Henderson wrote:
> > > On Thu, 2023-02-23 at 18:51 -0800, Darrick J. Wong wrote:
> > > > On Fri, Feb 17, 2023 at 08:02:29PM +0000, Allison Henderson
> > > > wrote:
> > > > > On Thu, 2023-02-16 at 12:06 -0800, Darrick J. Wong wrote:
> > > > > > Hi everyone,
> > > > > > 
> > > > > > This deluge contains all of the additions to the parent
> > > > > > pointers
> > > > > > patchset that I've been working on for the past month.  The
> > > > > > kernel
> > > > > > and
> > > > > > xfsprogs patchsets are based on Allison's v9r2 tag from last
> > > > > > week;
> > > > > > the fstests patches are merely a part of my development
> > > > > > tree.  To
> > > > > > recap
> > > > 
> > > > <snip>
> > > > 
> > > > > Ermergersh, that's a lot!  Thanks for all the hard work.  I feel
> > > > > like if
> > > > > we don't come up with a plan for review though, people may not
> > > > > know
> > > > > where to start for these deluges!  Let's see... if we had to
> > > > > break
> > > > > this
> > > > > down, I think I would divide it up between the existing parent
> > > > > pointers
> > > > > and the new pptr propositions for ofsck.
> > > > 
> > > > That's a good place to cleave.
> > > > 
> > > > > Then further divide it among
> > > > > kernel space, user space and test case.  If I had to pick only
> > > > > one
> > > > > of
> > > > > these to focus attention on, probably it should be new ofsck
> > > > > changes in
> > > > > the kernel space, since the rest of the deluge is really
> > > > > contingent
> > > > > on
> > > > > it. 
> > > > 
> > > > Yup.  Though you ought to read through the offline fsck patches
> > > > too.
> > > > Those take a very different approach to resolving parent
> > > > pointers. 
> > > > So
> > > > much of repair is based on nuking directories that I don't know
> > > > there's
> > > > a good way to rebuild them from parent pointers.
> > > Ok, will take a look
> > > 
> > > > 
> > > > A thought I had was that when we decide to zap a directory due to
> > > > problems in the directory blocks themselves, we could then
> > > > initiate a
> > > > scan of the parent pointers to try to find all the dirents we
> > > > can.  I
> > > > ran into problems with that approach because libxfs_iget
> > > > allocates
> > > > fresh
> > > > xfs_inode objects (instead of caching and sharing them like the
> > > > kernel
> > > > does) and that made it really hard to scan things in a coherent
> > > > manner.
> > > > 
> > > > > So now we've narrowed this down to a few subsets:
> > > > > 
> > > > > [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers
> > > > > [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl,
> > > > 
> > > > If you read through these two patchsets and think they're ok,
> > > > then
> > > > either fold the fixes into the main series or tack them on the
> > > > end,
> > > > whichever is easier.  
> > > ok, I'll take a look, I'll probably tack the first 2 fixes since
> > > they
> > > don't seat into an existing patch in the set.
> > 
> > Ok.
> > 
> > > > If you tack them on the end, please add your
> > > > own SOB tags.
> > > 
> > > Sure?  I SOB'd the last 2 patches of the set in v3, and then you
> > > said
> > > to make it an RVB
> > 
> > Er... SOB, RVB, whichever tag(s) get us to a patch that has a signoff
> > and a review. :)
> > 
> > > > 
> > > > > [PATCHSET v9r2d1 00/23] xfs: online fsck support patches
> > > > > [PATCHSET v9r2d1 0/7] xfs: online repair of directories
> > > > > [PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers
> > > > > [PATCHSET v9r2d1 0/3] xfs: online checking of parent pointers
> > > > > [PATCHSET v9r2d1 0/2] xfs: online checking of directories
> > > > 
> > > > The fsck functionality exists to prove the point that directory
> > > > repair
> > > > is /very/ awkward if we have to update p_diroffset.  As such,
> > > > they
> > > > focused on getting the main parts right ... but with the obvious
> > > > problem of making pptrs dependent on online fsck part 1 getting
> > > > merged.
> > > > 
> > > > Speaking of which -- can we merge online fsck for 6.4?  Please?
> > > > :)
> > > I'm fine with it as long as everyone else is?  I'm not sure who
> > > this is
> > > directed to.
> > 
> > 10% dchinner, 90% anyone we don't know about who might swoop in at
> > the
> > last minute and NAK it. ;)
> > 
> > > I admittedly haven't been able to work through all of it,
> > > but I don't think anyone has.  I don't know that exhaustive
> > > reviewing
> > > as a whole is particularly effective though.  Back when the
> > > combined
> > > set of "attr refactoring" + "larp" + "parent pointers" was
> > > particularly
> > > large, I used to just send out subsets that I thought were more
> > > reasonable for people to digest.  That way people can look at the
> > > giant
> > > mega-set if they really gotta see it, but it kept the reviews more
> > > focused on a sort of smaller next step.
> > 
> > TBH every time I went to look at all that, I pulled your github
> > branch
> > and looked at the whole thing.  I paid more attention to whatever was
> > being reviewed on-list, obviously.  That said, after about the fifth
> > round of looking at a patchset I start feeling like I'm only going to
> > increase my knowledge of the code by using it to write something.  At
> > that point it's easier to convince me to merge it, or at least to
> > fling
> > it at fstestscloud.
> > 
> > > 
> > > > 
> > > > > [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr
> > > > > key
> > > > 
> > > > Resolving the questions presented by this series is critical to
> > > > nailing
> > > > down the ondisk format and merging the feature.  But we'll get to
> > > > that
> > > > below.
> > > > 
> > > > > [PATCHSET v9r2d1 0/3] xfs: use flex arrays for
> > > > > XFS_IOC_GETPARENTS,
> > > > 
> > > > I'd like to know what you think about converting the ioctl
> > > > definition
> > > > to
> > > > flex arrays instead of the fixed size structs.  I'm not sure
> > > > where to
> > > > put this series, though.  If you decide that you want 'em, then
> > > > ideally
> > > > they'd be in xfs_fs.h from the introduction of
> > > > XFS_IOC_GETPARENTS,
> > > > but
> > > > I don't see any point in backporting them around "xfs: rework the
> > > > GETPARENTS ioctl".
> > > > 
> > > > (I would be ok if you rolled all of it into patch 25 from the
> > > > original
> > > > v9 set.)
> > > I'll take a look at it, I didn't put a whole lot of focus on the
> > > ioctl
> > > initially because the only thing that was using it at the time was
> > > the
> > > test case, and I wanted to keep attention more on the
> > > infrastructure.
> > 
> > <nod> I only started looking at it because I started pounding on it
> > with
> > xfs_scrub and noticed problems. :D
> Just an FYI: I thought the flex arrays were fine, but the series had some
> conflicts when I moved it to the bottom of the set, so I ended up redoing
> it directly in the existing parent pointers ioctl patch.  If possible,
> next time put the renames and cleanups after the functional changes;
> that should help them move around a bit more easily.  Thanks!

Ok, I'll try to keep things cleaner when I rebase against v10.  TBH I
had decided that the ioctl changes were something I could put off until
after I'd written repair, so that's why they landed where they did.
For future reference, if there's a patchset of mine that you want to
merge and want me to reorder it to make your life easier, I'm open to
doing that.

Last week I got bogged down in 6.3 problems, and then discovered that my
new name-value xattr lookup code didn't quite work right with log
recovery, so it's only today that I (think) I'm ready to call my own
work on parent pointers done.

(I'm definitely going to have to figure out how to grind out all the
sha512 stuff that was in the v9r2d1 deluge.)
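
For anyone skimming the thread, the name-hashing scheme proposed in the
v9r2d1 deluge (described in the quoted text of this message) can be sketched
roughly as below.  This is only an illustration in Python, not the kernel
code.  The thread says that the hash covers the full name and the child
generation number, that short names are stored directly, and that long names
keep their first 179 bytes.  Everything else here is an assumption: the
helper name pptr_attr_name, the order and byte encoding in which the name and
generation are fed to the hash, and the treatment of a name of exactly 243
bytes.

```python
import hashlib
import struct

# Figures taken from the thread.
PPTR_MAX_DIRECT = 243   # longest dirent name stored without hashing
PPTR_PREFIX_LEN = 179   # leading bytes of a long name that are kept verbatim


def pptr_attr_name(dirent_name: bytes, child_gen: int) -> bytes:
    """Hypothetical helper: build the name part of a parent pointer xattr key."""
    # Assumption: a name of exactly 243 bytes is still stored directly.
    if len(dirent_name) <= PPTR_MAX_DIRECT:
        return dirent_name

    # Assumption: the hash input is the full name followed by the child's
    # generation number as a 32-bit big-endian integer.  The thread does not
    # specify the layout, only that both values are hashed.
    digest = hashlib.sha512(dirent_name + struct.pack(">I", child_gen)).digest()

    # 179 bytes of prefix + 64 bytes of SHA-512 digest = 243 bytes.
    return dirent_name[:PPTR_PREFIX_LEN] + digest


if __name__ == "__main__":
    for name in (b"short", b"x" * 243, b"x" * 255):
        key = pptr_attr_name(name, child_gen=7)
        print(len(name), "->", len(key))
```

The 179-byte prefix plus the 64-byte SHA-512 digest adds up to 243 bytes,
which is why a colliding name could only differ in the last 76 bytes of a
255-byte dirent name.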

--D

> Allison
> 
> > 
> > > > 
> > > > > Of those, I think "xfs: encode parent pointer name in xattr
> > > > > key" is
> > > > > the
> > > > > only one that might impact other features since it's changing
> > > > > the
> > > > > ondisk format from when we first started the effort years ago. 
> > > > > So
> > > > > probably that might be the best place for people to start since
> > > > > if
> > > > > this
> > > > > needs to change it might impact some of the other subsets in
> > > > > the
> > > > > deluge, or even features they are working on if they've based
> > > > > anything
> > > > > on the existing pptr set.
> > > > 
> > > > Bingo!
> > > > 
> > > > The biggest question about the format change is (IMHO) whether
> > > > we're
> > > > ok
> > > > with using a hash function for parent pointer names that don't
> > > > fit in
> > > > the attr key space, and which hash?
> > > > 
> > > > The sha2 family was designed to be collision resistant, but I
> > > > don't
> > > > anticipate that will last forever.  The hash is computed from
> > > > (the
> > > > full
> > > > name and the child generation number) when the dirent name is
> > > > longer
> > > > than 243 bytes.  The first 179 bytes of the dirent name are still
> > > > written in the parent pointer attr name.  An attacker would have
> > > > to
> > > > find
> > > > a collision that only changes the last 76 bytes of the dirent
> > > > name,
> > > > and
> > > > they'd have to know the generation number at runtime.
> > > > 
> > > > (Note: dirent names shorter than 243 bytes are written directly
> > > > into
> > > > the
> > > > parent pointer xattr name, no hashing required.)
> > > > 
> > > > I /think/ that's good enough, but I'm no cryptanalyst.  The
> > > > alternative
> > > > would be to change the xattr format so that the namelen field in
> > > > the
> > > > leaf structure to encode *only* the name component of the parent
> > > > pointer.  This would lead to a lot of special cased xattr code
> > > > and
> > > > probably a lot of bugs and other stupid problems, which is why I
> > > > didn't
> > > > take that route.
> > > > 
> > > > Thoughts?
> > > 
> > > Hmm, well, it sounds like a risk to be weighed.  It wouldn't happen
> > > very often.  It seems like it would be extremely rare.  But when it
> > > does it will likely be quite unpleasant.  
> > > 
> > > I think another question to ask would be how often does the parent
> > > pointer really need to be updated in a repair?  In most cases, an
> > > orphaned inode will likely be able to return to the diroffset from
> > > whence it came.  So an update may be unlikely.  Even more so would
> > > be
> > > the worst case of needing to update crazy amounts of parent
> > > pointers. 
> > > So  another option is to simply pick a cap and error out if the
> > > demand
> > > is too much.  Likely if this condition does arise, there's probably
> > > bigger issues going on.
> > > 
> > > While option A is substantially more rare than option B, you could
> > > probably pick either one and rarely encounter the error path. 
> > > While
> > > option A does have the advantage of being more memory conservative,
> > > it
> > > has the disadvantage of possibly being a really ugly sleeping bug. 
> > > While option B might error out when option A would have not, it
> > > would
> > > at least be clear as to why it did, and probably allude to the
> > > presence
> > > of bigger problems, such as an internal bug that we should probably
> > > go
> > > catch, or perhaps something external corrupting the fs image, which
> > > ofsck may not be able to solve anyway.  
> > > 
> > > FWIW I seem to recall running across the idea of using hashes as
> > > keys
> > > in other projects I've been on, and most of the time the rarity of
> > > the
> > > collision was considered an acceptable risk, though it's really
> > > about
> > > which risk really bothers you more.
> > 
> > I want to study sha2 hash collisions and/or how the xattr code
> > stumbles
> > over attrs with the same dahash first.  Dealing with colliding xattr
> > names might not be as painful for the parent pointer code as I'm
> > currently thinking.
> > 
> > > > 
> > > > > I feel like a 5 patch subset is a very reasonable thing to ask
> > > > > people
> > > > > to give their attention to.  That way they don't get lost in
> > > > > things
> > > > > like
> > > > > nits for optimizations that might not even matter if something
> > > > > it
> > > > > depends on changes.
> > > > > 
> > > > > For the most part I am ok with changing the format as long as
> > > > > everyone
> > > > > is aware and in agreement so that we don't get caught up
> > > > > re-coding
> > > > > efforts that seem to have struggled with disagreements now on
> > > > > the
> > > > > scale
> > > > > of decades.  Some of these patches were already very old by the
> > > > > time I
> > > > > got them!
> > > > 
> > > > Hheehhe.  Same here -- rmap was pretty old by the time I started
> > > > pushing
> > > > that for reals. :)
> > > > 
> > > > > On a side note, there are some preliminary patches of kernel
> > > > > side
> > > > > parent pointers that are either larp fixes or refactoring not
> > > > > sensitive
> > > > > to the proposed ofsck changes.  These patches have been
> > > > > floating
> > > > > around for a while now, so if no one has any gripes, I think
> > > > > just
> > > > > merging those would help cut down the amount of rebasing, user
> > > > > space
> > > > > porting and patch reviewing that goes on for every version. 
> > > > > (maybe
> > > > > the
> > > > > first 1 through 7 of the 28 patch set, if folks are ok with
> > > > > that)
> > > > 
> > > > I thought about doing that for 6.3, but I found enough bugs in
> > > > the
> > > > locking stuff (recall the first bugfix series) that I held back. 
> > > > I'm
> > > > not sure about the two "Increase <blah>" patches -- they'll bloat
> > > > kernel
> > > > structures without a real user for them.
> > > 
> > > I don't think the first 7 are order sensitive, we should be able to
> > > do
> > > just 1, 4, 5, 6 and 7.
> > 
> > OH.
> > 
> > > > 
> > > > <shrug>
> > > > 
> > > > > I think the sheer size of some of these sets tend to work
> > > > > against
> > > > > them,
> > > > > as people likely cannot afford the time block they present on
> > > > > the
> > > > > surface.
> > > > 
> > > > Agreed.  At this point, I've worked through enough of the parent
> > > > pointers code to understand what's going on that I'm ok with
> > > > merging
> > > > it
> > > > once we settle the above question.
> > > > 
> > > > FWIW the whole series (kernel+xfsprogs+fstests) has been passing
> > > > my
> > > > nightly QA farm for a couple of weeks now despite my constant
> > > > hammering
> > > > on it, so I think the implementation is ready.
> > > > 
> > > > > So I think we would do well to find a way to introduce them
> > > > > at a reasonable pace and keep attention focused on the
> > > > > subsections
> > > > > that
> > > > > should require more than others, and hopefully keep things
> > > > > moving in
> > > > > a
> > > > > progressive direction.
> > > > 
> > > > I disagree -- I want to merge online fsck part 1 so I can get
> > > > that
> > > > out
> > > > of my dev trees.  Then I want to focus on getting this over the
> > > > finish
> > > > line and merged.  But then I'm not known for incrementalism. :P
> > > Well, I notice people respond better to subsets in smaller doses
> > > though.  And then it gives the preliminary patches time to
> > > stabilize if
> > > people do find an issue.
> > 
> > <nod> I'll keep that in mind.
> > 
> > --D
> > 
> > > > 
> > > > --D
> > > > 
> > > > > Thx!
> > > > > Allison
> > > > > 
> > > 
> 


end of thread, other threads:[~2023-03-14  2:21 UTC | newest]

Thread overview: 227+ messages
2023-02-16 20:06 [RFC DELUGE v9r2d1] xfs: Parent Pointers Darrick J. Wong
2023-02-16 20:26 ` [PATCHSET v9r2d1 00/28] " Darrick J. Wong
2023-02-16 20:32   ` [PATCH 01/28] xfs: Add new name to attri/d Darrick J. Wong
2023-02-16 20:33   ` [PATCH 02/28] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5 Darrick J. Wong
2023-02-16 20:33   ` [PATCH 03/28] xfs: Increase XFS_QM_TRANS_MAXDQS " Darrick J. Wong
2023-02-16 20:33   ` [PATCH 04/28] xfs: Hold inode locks in xfs_ialloc Darrick J. Wong
2023-02-16 20:33   ` [PATCH 05/28] xfs: Hold inode locks in xfs_trans_alloc_dir Darrick J. Wong
2023-02-16 20:34   ` [PATCH 06/28] xfs: Hold inode locks in xfs_rename Darrick J. Wong
2023-02-16 20:34   ` [PATCH 07/28] xfs: Expose init_xattrs in xfs_create_tmpfile Darrick J. Wong
2023-02-16 20:34   ` [PATCH 08/28] xfs: get directory offset when adding directory name Darrick J. Wong
2023-02-16 20:35   ` [PATCH 09/28] xfs: get directory offset when removing " Darrick J. Wong
2023-02-16 20:35   ` [PATCH 10/28] xfs: get directory offset when replacing a " Darrick J. Wong
2023-02-16 20:35   ` [PATCH 11/28] xfs: add parent pointer support to attribute code Darrick J. Wong
2023-02-16 20:35   ` [PATCH 12/28] xfs: define parent pointer xattr format Darrick J. Wong
2023-02-16 20:36   ` [PATCH 13/28] xfs: Add xfs_verify_pptr Darrick J. Wong
2023-02-16 20:36   ` [PATCH 14/28] xfs: extend transaction reservations for parent attributes Darrick J. Wong
2023-02-16 20:36   ` [PATCH 15/28] xfs: parent pointer attribute creation Darrick J. Wong
2023-02-16 20:36   ` [PATCH 16/28] xfs: add parent attributes to link Darrick J. Wong
2023-02-16 20:37   ` [PATCH 17/28] xfs: add parent attributes to symlink Darrick J. Wong
2023-02-16 20:37   ` [PATCH 18/28] xfs: remove parent pointers in unlink Darrick J. Wong
2023-02-16 20:37   ` [PATCH 19/28] xfs: Indent xfs_rename Darrick J. Wong
2023-02-16 20:37   ` [PATCH 20/28] xfs: Add parent pointers to rename Darrick J. Wong
2023-02-16 20:38   ` [PATCH 21/28] xfs: Add parent pointers to xfs_cross_rename Darrick J. Wong
2023-02-16 20:38   ` [PATCH 22/28] xfs: Add the parent pointer support to the superblock version 5 Darrick J. Wong
2023-02-16 20:38   ` [PATCH 23/28] xfs: Add helper function xfs_attr_list_context_init Darrick J. Wong
2023-02-16 20:38   ` [PATCH 24/28] xfs: Filter XFS_ATTR_PARENT for getfattr Darrick J. Wong
2023-02-16 20:39   ` [PATCH 25/28] xfs: Add parent pointer ioctl Darrick J. Wong
2023-02-16 20:39   ` [PATCH 26/28] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
2023-02-16 20:39   ` [PATCH 27/28] xfs: drop compatibility minimum log size computations for reflink Darrick J. Wong
2023-02-16 20:39   ` [PATCH 28/28] xfs: add xfs_trans_mod_sb tracing Darrick J. Wong
2023-02-16 20:26 ` [PATCHSET v9r2d1 0/3] xfs: bug fixes for parent pointers Darrick J. Wong
2023-02-16 20:40   ` [PATCH 1/3] xfs: directory lookups should return diroffsets too Darrick J. Wong
2023-02-16 20:40   ` [PATCH 2/3] xfs: move/add parent pointer validators to xfs_parent Darrick J. Wong
2023-02-16 20:40   ` [PATCH 3/3] xfs: don't remove the attr fork when parent pointers are enabled Darrick J. Wong
2023-02-16 20:26 ` [PATCHSET v9r2d1 0/4] xfs: rework the GETPARENTS ioctl Darrick J. Wong
2023-02-16 20:40   ` [PATCH 1/4] xfs: fix multiple problems when doing getparents by handle Darrick J. Wong
2023-02-16 20:41   ` [PATCH 2/4] xfs: use kvalloc for the parent pointer info buffer Darrick J. Wong
2023-02-16 20:41   ` [PATCH 3/4] xfs: pass the attr value to put_listent when possible Darrick J. Wong
2023-02-16 20:41   ` [PATCH 4/4] xfs: replace the XFS_IOC_GETPARENTS backend Darrick J. Wong
2023-02-16 20:27 ` [PATCHSET v9r2d1 00/23] xfs: online fsck support patches Darrick J. Wong
2023-02-16 20:42   ` [PATCH 01/23] xfs: manage inode DONTCACHE status at irele time Darrick J. Wong
2023-02-16 20:42   ` [PATCH 02/23] xfs: make checking directory dotdot entries more reliable Darrick J. Wong
2023-02-16 20:42   ` [PATCH 03/23] xfs: xfs_iget in the directory scrubber needs to use UNTRUSTED Darrick J. Wong
2023-02-16 20:42   ` [PATCH 04/23] xfs: always check the existence of a dirent's child inode Darrick J. Wong
2023-02-16 20:43   ` [PATCH 05/23] xfs: remove the for_each_xbitmap_ helpers Darrick J. Wong
2023-02-16 20:43   ` [PATCH 06/23] xfs: drop the _safe behavior from the xbitmap foreach macro Darrick J. Wong
2023-02-16 20:43   ` [PATCH 07/23] xfs: convert xbitmap to interval tree Darrick J. Wong
2023-02-16 20:43   ` [PATCH 08/23] xfs: port xbitmap_test Darrick J. Wong
2023-02-16 20:44   ` [PATCH 09/23] xfs: ignore stale buffers when scanning the buffer cache Darrick J. Wong
2023-02-16 20:44   ` [PATCH 10/23] xfs: create a big array data structure Darrick J. Wong
2023-02-16 20:44   ` [PATCH 11/23] xfs: wrap ilock/iunlock operations on sc->ip Darrick J. Wong
2023-02-16 20:44   ` [PATCH 12/23] xfs: port scrub inode scan from djwong-dev Darrick J. Wong
2023-02-16 20:45   ` [PATCH 13/23] xfs: allow scrub to hook metadata updates in other writers Darrick J. Wong
2023-02-16 20:45   ` [PATCH 14/23] xfs: allow blocking notifier chains with filesystem hooks Darrick J. Wong
2023-02-16 20:45   ` [PATCH 15/23] xfs: streamline the directory iteration code for scrub Darrick J. Wong
2023-02-16 20:45   ` [PATCH 16/23] xfs: track file link count updates during live nlinks fsck Darrick J. Wong
2023-02-16 20:46   ` [PATCH 17/23] xfs: connect in-memory btrees to xfiles Darrick J. Wong
2023-02-16 20:46   ` [PATCH 18/23] xfs: create temporary files and directories for online repair Darrick J. Wong
2023-02-16 20:46   ` [PATCH 19/23] xfs: hide private inodes from bulkstat and handle functions Darrick J. Wong
2023-02-16 20:46   ` [PATCH 20/23] xfs: create a blob array data structure Darrick J. Wong
2023-02-16 20:47   ` [PATCH 21/23] xfs: repair extended attributes Darrick J. Wong
2023-02-16 20:47   ` [PATCH 22/23] xfs: online repair of directories Darrick J. Wong
2023-02-16 20:47   ` [PATCH 23/23] xfs: create an xattr iteration function for scrub Darrick J. Wong
2023-02-16 20:27 ` [PATCHSET v9r2d1 0/7] xfs: online repair of directories Darrick J. Wong
2023-02-16 20:48   ` [PATCH 1/7] xfs: pass directory offsets as part of the dirent hook data Darrick J. Wong
2023-02-16 20:48   ` [PATCH 2/7] xfs: pass diroffset back from xchk_dir_lookup Darrick J. Wong
2023-02-16 20:48   ` [PATCH 3/7] xfs: shorten parent pointer function names Darrick J. Wong
2023-02-16 20:48   ` [PATCH 4/7] xfs: rearrange bits of the parent pointer apis for fsck Darrick J. Wong
2023-02-16 20:49   ` [PATCH 5/7] xfs: reconstruct directories from parent pointers Darrick J. Wong
2023-02-16 20:49   ` [PATCH 6/7] xfs: add hooks to do directory updates Darrick J. Wong
2023-02-16 20:49   ` [PATCH 7/7] xfs: compare generated and existing dirents Darrick J. Wong
2023-02-16 20:27 ` [PATCHSET v9r2d1 0/2] xfs: online checking of parent pointers Darrick J. Wong
2023-02-16 20:49   ` [PATCH 1/2] xfs: scrub " Darrick J. Wong
2023-02-16 20:50   ` [PATCH 2/2] xfs: deferred scrub of " Darrick J. Wong
2023-02-16 20:27 ` [PATCHSET v9r2d1 0/3] xfs: online checking " Darrick J. Wong
2023-02-16 20:50   ` [PATCH 1/3] xfs: repair parent pointers by scanning directories Darrick J. Wong
2023-02-16 20:50   ` [PATCH 2/3] xfs: repair parent pointers with live scan hooks Darrick J. Wong
2023-02-16 20:50   ` [PATCH 3/3] xfs: compare generated and existing parent pointers Darrick J. Wong
2023-02-16 20:28 ` [PATCHSET v9r2d1 0/2] xfs: online checking of directories Darrick J. Wong
2023-02-16 20:51   ` [PATCH 1/2] xfs: check dirents have parent pointers Darrick J. Wong
2023-02-16 20:51   ` [PATCH 2/2] xfs: deferred scrub of dirents Darrick J. Wong
2023-02-16 20:28 ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Darrick J. Wong
2023-02-16 20:51   ` [PATCH 1/5] xfs: load secure hash algorithm for parent pointers Darrick J. Wong
2023-02-16 20:51   ` [PATCH 2/5] xfs: replace parent pointer diroffset with sha512 hash of name Darrick J. Wong
2023-02-16 20:52   ` [PATCH 3/5] xfs: skip the sha512 namehash when possible Darrick J. Wong
2023-02-16 20:52   ` [PATCH 4/5] xfs: make the ondisk parent pointer record a flex array Darrick J. Wong
2023-02-16 20:52   ` [PATCH 5/5] xfs: use parent pointer xattr space more efficiently Darrick J. Wong
2023-02-18  8:12   ` [PATCHSET v9r2d1 0/5] xfs: encode parent pointer name in xattr key Amir Goldstein
2023-02-24  2:58     ` Darrick J. Wong
2023-03-03 16:43   ` Darrick J. Wong
2023-03-03 17:11   ` [PATCHSET v9r2d1.1 00/13] xfs: remove parent pointer hashing Darrick J. Wong
2023-03-03 17:11     ` [PATCH 01/13] xfs: make xfs_attr_set require XFS_DA_OP_REMOVE Darrick J. Wong
2023-03-03 17:11     ` [PATCH 02/13] xfs: allow xattr matching on value for local/sf attrs Darrick J. Wong
2023-03-03 17:11     ` [PATCH 03/13] xfs: preserve VLOOKUP in xfs_attr_set Darrick J. Wong
2023-03-03 17:11     ` [PATCH 04/13] xfs: log VLOOKUP xattr removal operations Darrick J. Wong
2023-03-03 17:11     ` [PATCH 05/13] xfs: log VLOOKUP xattr setting operations Darrick J. Wong
2023-03-03 17:11     ` [PATCH 06/13] xfs: refactor extracting attri ops from alfi_op_flags Darrick J. Wong
2023-03-03 17:11     ` [PATCH 07/13] xfs: overlay alfi_nname_len atop alfi_name_len for NVREPLACE Darrick J. Wong
2023-03-03 17:12     ` [PATCH 08/13] xfs: rename nname to newname Darrick J. Wong
2023-03-03 17:12     ` [PATCH 09/13] xfs: log VLOOKUP xattr nvreplace operations Darrick J. Wong
2023-03-03 17:12     ` [PATCH 10/13] xfs: log old xattr values for NVREPLACEXXX operations Darrick J. Wong
2023-03-03 17:12     ` [PATCH 11/13] xfs: use VLOOKUP mode to avoid hashing parent pointer names Darrick J. Wong
2023-03-03 17:12     ` [PATCH 12/13] xfs: turn NVREPLACEXXX into NVREPLACE Darrick J. Wong
2023-03-03 17:12     ` [PATCH 13/13] xfs: revert "load secure hash algorithm for parent pointers" Darrick J. Wong
2023-02-16 20:28 ` [PATCHSET v9r2d1 0/3] xfs: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
2023-02-16 20:52   ` [PATCH 1/3] xfs: rename xfs_pptr_info to xfs_getparents Darrick J. Wong
2023-02-16 20:53   ` [PATCH 2/3] xfs: rename xfs_parent_ptr Darrick J. Wong
2023-02-16 20:53   ` [PATCH 3/3] xfs: convert GETPARENTS structures to flex arrays Darrick J. Wong
2023-02-16 20:29 ` [PATCHSET v9r2d1 00/25] xfsprogs: Parent Pointers Darrick J. Wong
2023-02-16 20:53   ` [PATCH 01/25] xfsprogs: Fix default superblock attr bits Darrick J. Wong
2023-02-16 20:54   ` [PATCH 02/25] xfsprogs: Add new name to attri/d Darrick J. Wong
2023-02-16 20:54   ` [PATCH 03/25] xfsprogs: Increase XFS_DEFER_OPS_NR_INODES to 5 Darrick J. Wong
2023-02-16 20:54   ` [PATCH 04/25] xfsprogs: get directory offset when adding directory name Darrick J. Wong
2023-02-16 20:54   ` [PATCH 05/25] xfsprogs: get directory offset when removing " Darrick J. Wong
2023-02-16 20:55   ` [PATCH 06/25] xfsprogs: get directory offset when replacing a " Darrick J. Wong
2023-02-16 20:55   ` [PATCH 07/25] xfsprogs: add parent pointer support to attribute code Darrick J. Wong
2023-02-16 20:55   ` [PATCH 08/25] xfsprogs: define parent pointer xattr format Darrick J. Wong
2023-02-16 20:55   ` [PATCH 09/25] xfsprogs: Add xfs_verify_pptr Darrick J. Wong
2023-02-16 20:56   ` [PATCH 10/25] xfsprogs: extend transaction reservations for parent attributes Darrick J. Wong
2023-02-16 20:56   ` [PATCH 11/25] xfsprogs: parent pointer attribute creation Darrick J. Wong
2023-02-16 20:56   ` [PATCH 12/25] xfsprogs: add parent attributes to link Darrick J. Wong
2023-02-16 20:56   ` [PATCH 13/25] xfsprogs: add parent attributes to symlink Darrick J. Wong
2023-02-16 20:57   ` [PATCH 14/25] xfsprogs: remove parent pointers in unlink Darrick J. Wong
2023-02-16 20:57   ` [PATCH 15/25] xfsprogs: Add parent pointers to rename Darrick J. Wong
2023-02-16 20:57   ` [PATCH 16/25] xfsprogs: Add the parent pointer support to the superblock version 5 Darrick J. Wong
2023-02-16 20:57   ` [PATCH 17/25] xfsprogs: Add parent pointer ioctl Darrick J. Wong
2023-02-16 20:58   ` [PATCH 18/25] xfsprogs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
2023-02-16 20:58   ` [PATCH 19/25] xfsprogs: drop compatibility minimum log size computations for reflink Darrick J. Wong
2023-02-16 20:58   ` [PATCH 20/25] xfsprogs: Add parent pointer flag to cmd Darrick J. Wong
2023-02-16 20:58   ` [PATCH 21/25] xfsprogs: Print pptrs in ATTRI items Darrick J. Wong
2023-02-16 20:59   ` [PATCH 22/25] xfs_db: report parent bit on xattrs Darrick J. Wong
2023-02-16 20:59   ` [PATCH 23/25] xfsprogs: implement the upper half of parent pointers Darrick J. Wong
2023-02-16 20:59   ` [PATCH 24/25] xfsprogs: Add parent pointers during protofile creation Darrick J. Wong
2023-02-16 21:00   ` [PATCH 25/25] xfsprogs: Add i, n and f flags to parent command Darrick J. Wong
2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: bug fixes before parent pointers Darrick J. Wong
2023-02-16 21:00   ` [PATCH 1/6] libxfs: initialize the slab cache for parent defer items Darrick J. Wong
2023-02-16 21:00   ` [PATCH 2/6] xfs: directory lookups should return diroffsets too Darrick J. Wong
2023-02-16 21:00   ` [PATCH 3/6] xfs: move/add parent pointer validators to xfs_parent Darrick J. Wong
2023-02-16 21:01   ` [PATCH 4/6] xfs: don't remove the attr fork when parent pointers are enabled Darrick J. Wong
2023-02-16 21:01   ` [PATCH 5/6] xfs: pass the attr value to put_listent when possible Darrick J. Wong
2023-02-16 21:01   ` [PATCH 6/6] xfs: replace the XFS_IOC_GETPARENTS backend Darrick J. Wong
2023-02-16 20:29 ` [PATCHSET v9r2d1 0/6] xfsprogs: tool fixes for parent pointers Darrick J. Wong
2023-02-16 21:01   ` [PATCH 1/6] xfs_scrub: don't report media errors for space with unknowable owner Darrick J. Wong
2023-02-16 21:02   ` [PATCH 2/6] mkfs: fix libxfs api misuse Darrick J. Wong
2023-02-16 21:02   ` [PATCH 3/6] libxfs: create new files with attr forks if necessary Darrick J. Wong
2023-02-16 21:02   ` [PATCH 4/6] mkfs: fix subdir parent pointer creation Darrick J. Wong
2023-02-16 21:02   ` [PATCH 5/6] xfs_db: report parent pointer keys Darrick J. Wong
2023-02-16 21:03   ` [PATCH 6/6] xfs_db: obfuscate dirent and pptr names consistently Darrick J. Wong
2023-02-16 20:29 ` [PATCHSET v9r2d1 00/10] xfsprogs: actually use getparent ioctl Darrick J. Wong
2023-02-16 21:03   ` [PATCH 01/10] xfs_scrub: revert unnecessary code from "implement the upper half of parent pointers" Darrick J. Wong
2023-02-16 21:03   ` [PATCH 02/10] xfs_io: print path in path_print Darrick J. Wong
2023-02-16 21:03   ` [PATCH 03/10] xfs_io: move parent pointer filtering and formatting flags out of libhandle Darrick J. Wong
2023-02-16 21:04   ` [PATCH 04/10] libfrog: remove all the parent pointer code from libhandle Darrick J. Wong
2023-02-16 21:04   ` [PATCH 05/10] libfrog: fix indenting errors in xfss_pptr_alloc Darrick J. Wong
2023-02-16 21:04   ` [PATCH 06/10] libfrog: return positive errno in pptrs.c Darrick J. Wong
2023-02-16 21:04   ` [PATCH 07/10] libfrog: only walk one parent pointer at a time in handle_walk_parent_path_ptr Darrick J. Wong
2023-02-16 21:05   ` [PATCH 08/10] libfrog: trim trailing slashes when printing pptr paths Darrick J. Wong
2023-02-16 21:05   ` [PATCH 09/10] xfs_io: parent command is not experts-only Darrick J. Wong
2023-02-16 21:05   ` [PATCH 10/10] xfs_scrub: use parent pointers when possible to report file operations Darrick J. Wong
2023-02-16 20:30 ` [PATCHSET v9r2d1 0/4] xfsprogs: offline fsck support patches Darrick J. Wong
2023-02-16 21:06   ` [PATCH 1/4] libxfs: add xfile support Darrick J. Wong
2023-02-16 21:06   ` [PATCH 2/4] xfs: track file link count updates during live nlinks fsck Darrick J. Wong
2023-02-16 21:06   ` [PATCH 3/4] xfs: create a blob array data structure Darrick J. Wong
2023-02-16 21:06   ` [PATCH 4/4] libxfs: export attr3_leaf_hdr_from_disk via libxfs_api_defs.h Darrick J. Wong
2023-02-16 20:30 ` [PATCHSET v9r2d1 0/3] xfsprogs: online repair of directories Darrick J. Wong
2023-02-16 21:07   ` [PATCH 1/3] xfs: shorten parent pointer function names Darrick J. Wong
2023-02-16 21:07   ` [PATCH 2/3] xfs: rearrange bits of the parent pointer apis for fsck Darrick J. Wong
2023-02-16 21:07   ` [PATCH 3/3] xfs: add hooks to do directory updates Darrick J. Wong
2023-02-16 20:30 ` [PATCHSET v9r2d1 0/1] xfsprogs: online checking of parent pointers Darrick J. Wong
2023-02-16 21:07   ` [PATCH 1/1] xfs: deferred scrub " Darrick J. Wong
2023-02-16 20:30 ` [PATCHSET v9r2d1 0/2] xfsprogs: online checking " Darrick J. Wong
2023-02-16 21:08   ` [PATCH 1/2] xfs: repair parent pointers by scanning directories Darrick J. Wong
2023-02-16 21:08   ` [PATCH 2/2] xfs: repair parent pointers with live scan hooks Darrick J. Wong
2023-02-16 20:31 ` [PATCHSET v9r2d1 0/8] xfs_repair: support parent pointers Darrick J. Wong
2023-02-16 21:08   ` [PATCH 1/8] xfs_repair: build a parent pointer index Darrick J. Wong
2023-02-16 21:08   ` [PATCH 2/8] xfs_repair: check parent pointers Darrick J. Wong
2023-02-16 21:09   ` [PATCH 3/8] xfs_repair: dump garbage parent pointer attributes Darrick J. Wong
2023-02-16 21:09   ` [PATCH 4/8] xfs_repair: update ondisk parent pointer records Darrick J. Wong
2023-02-16 21:09   ` [PATCH 5/8] xfs_repair: wipe ondisk parent pointers when there are none Darrick J. Wong
2023-02-16 21:09   ` [PATCH 6/8] xfs_repair: move the global dirent name store to a separate object Darrick J. Wong
2023-02-16 21:10   ` [PATCH 7/8] xfs_repair: deduplicate strings stored in string blob Darrick J. Wong
2023-02-16 21:10   ` [PATCH 8/8] xfs_repair: try to reuse nameblob names for file pptr scan names Darrick J. Wong
2023-02-16 20:31 ` [PATCHSET v9r2d1 0/6] xfsprogs: encode parent pointer name in xattr key Darrick J. Wong
2023-02-16 21:10   ` [PATCH 1/6] libfrog: support the sha512 hash algorithm Darrick J. Wong
2023-02-16 21:10   ` [PATCH 2/6] xfs: replace parent pointer diroffset with sha512 hash of name Darrick J. Wong
2023-02-16 21:11   ` [PATCH 3/6] xfs_logprint: decode parent pointers fully Darrick J. Wong
2023-02-16 21:11   ` [PATCH 4/6] xfs: skip the sha512 namehash when possible Darrick J. Wong
2023-02-16 21:11   ` [PATCH 5/6] xfs: make the ondisk parent pointer record a flex array Darrick J. Wong
2023-02-16 21:12   ` [PATCH 6/6] xfs: use parent pointer xattr space more efficiently Darrick J. Wong
2023-02-16 20:31 ` [PATCHSET v9r2d1 0/3] xfsprogs: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
2023-02-16 21:12   ` [PATCH 1/3] xfs: rename xfs_pptr_info to xfs_getparents Darrick J. Wong
2023-02-16 21:12   ` [PATCH 2/3] xfs: rename xfs_parent_ptr Darrick J. Wong
2023-02-16 21:12   ` [PATCH 3/3] xfs: convert GETPARENTS structures to flex arrays Darrick J. Wong
2023-02-16 20:31 ` [PATCHSET v9r2d1 0/3] xfsprogs: turn on all available features Darrick J. Wong
2023-02-16 21:13   ` [PATCH 1/3] mkfs: enable large extent counts by default Darrick J. Wong
2023-02-16 21:13   ` [PATCH 2/3] mkfs: enable reverse mapping " Darrick J. Wong
2023-02-16 21:13   ` [PATCH 3/3] mkfs: enable parent pointers " Darrick J. Wong
2023-02-16 20:32 ` [PATCHSET 00/14] fstests: adjust tests for xfs parent pointers Darrick J. Wong
2023-02-16 21:13   ` [PATCH 01/14] xfs/122: update for " Darrick J. Wong
2023-02-16 21:14   ` [PATCH 02/14] populate: create hardlinks " Darrick J. Wong
2023-02-16 21:14   ` [PATCH 03/14] xfs/021: adapt golden output files " Darrick J. Wong
2023-02-16 21:14   ` [PATCH 04/14] generic/050: adapt " Darrick J. Wong
2023-02-16 21:14   ` [PATCH 05/14] xfs/018: disable parent pointers for this test Darrick J. Wong
2023-02-16 21:15   ` [PATCH 06/14] xfs/306: fix formatting failures with parent pointers Darrick J. Wong
2023-02-16 21:15   ` [PATCH 07/14] common: add helpers for parent pointer tests Darrick J. Wong
2023-02-16 21:15   ` [PATCH 08/14] xfs: add parent pointer test Darrick J. Wong
2023-02-16 21:15   ` [PATCH 09/14] xfs: add multi link " Darrick J. Wong
2023-02-16 21:16   ` [PATCH 10/14] xfs: add parent pointer inject test Darrick J. Wong
2023-02-16 21:16   ` [PATCH 11/14] common/parent: add license and copyright Darrick J. Wong
2023-02-16 21:16   ` [PATCH 12/14] common/parent: don't _fail on missing parent pointer components Darrick J. Wong
2023-02-16 21:16   ` [PATCH 13/14] common/parent: check xfs_io parent command paths Darrick J. Wong
2023-02-16 21:17   ` [PATCH 14/14] xfs/851: test xfs_io parent -p too Darrick J. Wong
2023-02-16 20:32 ` [PATCHSET v9r2 0/4] fstests: encode parent pointer name in xattr key Darrick J. Wong
2023-02-16 21:17   ` [PATCH 1/4] misc: adjust for parent pointers with namehashes Darrick J. Wong
2023-02-16 21:17   ` [PATCH 2/4] xfs/021: adjust for short parent pointers with hashes Darrick J. Wong
2023-02-16 21:18   ` [PATCH 3/4] xfs/242: fix _filter_bmap for xfs_io bmap that does rt file properly Darrick J. Wong
2023-02-16 21:18   ` [PATCH 4/4] xfs/021: adjust for short valuelens Darrick J. Wong
2023-02-16 20:32 ` [PATCHSET v9r2 0/1] fstests: use flex arrays for XFS_IOC_GETPARENTS Darrick J. Wong
2023-02-16 21:18   ` [PATCH 1/1] xfs/122: adjust for flex-array XFS_IOC_GETPARENTS ioctl Darrick J. Wong
2023-02-17 20:02 ` [RFC DELUGE v9r2d1] xfs: Parent Pointers Allison Henderson
2023-02-24  2:51   ` Darrick J. Wong
2023-02-24  7:24     ` Amir Goldstein
2023-02-25  1:58       ` Darrick J. Wong
2023-02-25  7:34     ` Allison Henderson
2023-03-01  1:24       ` Darrick J. Wong
2023-03-08 22:47         ` Allison Henderson
2023-03-14  2:20           ` Darrick J. Wong
