* [PATCH RESEND v2 00/18] Parent Pointers
@ 2022-08-04 19:39 Allison Henderson
  2022-08-04 19:39 ` [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay Allison Henderson
                   ` (19 more replies)
  0 siblings, 20 replies; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:39 UTC (permalink / raw)
  To: linux-xfs

Hi all,

This is a rebase and resend of the latest parent pointer attributes for xfs.
The goal of this patch set is to add a parent pointer attribute to each inode.
The attribute name contains the parent inode, generation, and directory
offset, while the attribute value contains the file name.  This feature will
enable future optimizations for online scrub, or for any other feature that
could make use of quickly deriving an inode's path from the mount point.
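
To make the attribute layout above concrete, here is a minimal userspace
sketch of packing the name half of a parent pointer.  It assumes a 16-byte,
big-endian record made of an 8-byte parent inode number, a 4-byte generation
and a 4-byte directory offset.  The names struct pptr_name,
pptr_name_encode() and pptr_name_decode() are hypothetical and are not the
on-disk definitions from this series (those are in the "define parent
pointer xattr format" patch).  The file name itself would be stored
separately as the attribute value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: 8-byte parent ino, 4-byte gen, 4-byte dir offset. */
#define PPTR_NAME_LEN 16

struct pptr_name {
	uint64_t p_ino;		/* parent directory inode number */
	uint32_t p_gen;		/* parent directory generation */
	uint32_t p_diroffset;	/* offset of the entry in the parent */
};

static_assert(sizeof(uint64_t) + 2 * sizeof(uint32_t) == PPTR_NAME_LEN,
	      "record fields must fill exactly PPTR_NAME_LEN bytes");

/* Store the low @nbytes of @v most-significant byte first. */
static void put_be(uint8_t *buf, uint64_t v, int nbytes)
{
	for (int i = nbytes - 1; i >= 0; i--) {
		buf[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Read @nbytes stored most-significant byte first. */
static uint64_t get_be(const uint8_t *buf, int nbytes)
{
	uint64_t v = 0;

	for (int i = 0; i < nbytes; i++)
		v = (v << 8) | buf[i];
	return v;
}

static void pptr_name_encode(const struct pptr_name *rec,
			     uint8_t out[PPTR_NAME_LEN])
{
	put_be(out, rec->p_ino, 8);
	put_be(out + 8, rec->p_gen, 4);
	put_be(out + 12, rec->p_diroffset, 4);
}

static void pptr_name_decode(const uint8_t in[PPTR_NAME_LEN],
			     struct pptr_name *rec)
{
	rec->p_ino = get_be(in, 8);
	rec->p_gen = (uint32_t)get_be(in + 8, 4);
	rec->p_diroffset = (uint32_t)get_be(in + 12, 4);
}
```

Storing the fields big-endian keeps the record independent of host byte
order, which matches the usual convention for XFS on-disk metadata.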

This set can be viewed on GitHub here:
https://github.com/allisonhenderson/xfs/tree/xfs_new_pptrsv2_rebase

And the corresponding xfsprogs code is here:
https://github.com/allisonhenderson/xfsprogs/tree/xfsprogs_new_pptrsv2

This set has been tested with the parent pointer tests below:
https://www.spinics.net/lists/fstests/msg19963.html


Updates since v1:

xfs: Fix multi-transaction larp replay
  Resend (from stand alone patch)

xfs: Increase XFS_DEFER_OPS_NR_INODES to 5
  Increased XFS_DEFER_OPS_NR_INODES from 4 to 5
  Moved to beginning of the set
  Added code in xfs_defer_ops_continue to sort the inodes
  Added commentary about which inodes are locked

xfs: Hold inode locks in xfs_trans_alloc_dir
  New patch

xfs: add parent pointer support to attribute code
  Typo fix

xfs: extend transaction reservations for parent attributes
  Made xfs_calc_parent_ptr_reservations static
  Whitespace fixes

xfs: parent pointer attribute creation
  Fixed SPDX License Headers
  Updated xfs_sb_version_hasparent to xfs_has_parent
  Typedef conversions
  Whitespace
  Added helper functions: xfs_parent_init, xfs_parent_defer_add,
    xfs_parent_cancel
  Investigated mount option that overrides larp option:
    The larp global itself isn't used within the delayed ops machinery due
    to race conditions with the syscall being toggled.  Pptrs don't toggle,
    so we can just set XFS_DA_OP_LOGGED to log the pptr without requiring
    all attrs to be logged

xfs: add parent attributes to link
  Rebase to use new helpers: xfs_parent_init, xfs_parent_defer_add,
  xfs_parent_cancel, xfs_has_parent
  
xfs: add parent attributes to unlink
  Rebase to use new helpers
  Added helper function xfs_parent_defer_remove
  
xfs: Add parent pointers to rename
  Added extra parent remove for target_ip unlinks
  Rebase to use new helpers
  
xfs: Add the parent pointer support to the superblock version 5.
  Changed XFS_SB_FEAT_RO_COMPAT_PARENT to XFS_SB_FEAT_INCOMPAT_PARENT

xfs: Add parent pointer ioctl
  Added pi_parents[] to struct xfs_parent_ptr
  Changed pptr helper defines into inline functions
  Whitespace/indentation
  Reordered flag check in xfs_ioc_get_parent_pointer to be before the alloc
  Added gen number check in xfs_ioc_get_parent_pointer
  Init args in loop body of xfs_attr_get_parent_pointer

Questions, comments, and feedback appreciated!

Thanks all!
Allison


Allison Henderson (18):
  xfs: Fix multi-transaction larp replay
  xfs: Increase XFS_DEFER_OPS_NR_INODES to 5
  xfs: Hold inode locks in xfs_ialloc
  xfs: Hold inode locks in xfs_trans_alloc_dir
  xfs: get directory offset when adding directory name
  xfs: get directory offset when removing directory name
  xfs: get directory offset when replacing a directory name
  xfs: add parent pointer support to attribute code
  xfs: define parent pointer xattr format
  xfs: Add xfs_verify_pptr
  xfs: extend transaction reservations for parent attributes
  xfs: parent pointer attribute creation
  xfs: add parent attributes to link
  xfs: remove parent pointers in unlink
  xfs: Add parent pointers to rename
  xfs: Add the parent pointer support to the  superblock version 5.
  xfs: Add helper function xfs_attr_list_context_init
  xfs: Add parent pointer ioctl

 fs/xfs/Makefile                |   2 +
 fs/xfs/libxfs/xfs_attr.c       |  53 ++++++-
 fs/xfs/libxfs/xfs_attr.h       |   8 +-
 fs/xfs/libxfs/xfs_da_btree.h   |   1 +
 fs/xfs/libxfs/xfs_da_format.h  |  30 +++-
 fs/xfs/libxfs/xfs_defer.c      |  28 +++-
 fs/xfs/libxfs/xfs_defer.h      |   8 +-
 fs/xfs/libxfs/xfs_dir2.c       |  21 ++-
 fs/xfs/libxfs/xfs_dir2.h       |   7 +-
 fs/xfs/libxfs/xfs_dir2_block.c |   9 +-
 fs/xfs/libxfs/xfs_dir2_leaf.c  |   8 +-
 fs/xfs/libxfs/xfs_dir2_node.c  |   8 +-
 fs/xfs/libxfs/xfs_dir2_sf.c    |   6 +
 fs/xfs/libxfs/xfs_format.h     |   4 +-
 fs/xfs/libxfs/xfs_fs.h         |  58 +++++++
 fs/xfs/libxfs/xfs_log_format.h |   1 +
 fs/xfs/libxfs/xfs_parent.c     | 159 +++++++++++++++++++
 fs/xfs/libxfs/xfs_parent.h     |  39 +++++
 fs/xfs/libxfs/xfs_sb.c         |   4 +
 fs/xfs/libxfs/xfs_trans_resv.c | 105 ++++++++++---
 fs/xfs/scrub/attr.c            |   2 +-
 fs/xfs/xfs_attr_item.c         |  41 ++---
 fs/xfs/xfs_attr_list.c         |  17 ++-
 fs/xfs/xfs_file.c              |   1 +
 fs/xfs/xfs_inode.c             | 271 +++++++++++++++++++++++++--------
 fs/xfs/xfs_inode.h             |   1 +
 fs/xfs/xfs_ioctl.c             | 149 +++++++++++++++---
 fs/xfs/xfs_ioctl.h             |   2 +
 fs/xfs/xfs_ondisk.h            |   4 +
 fs/xfs/xfs_parent_utils.c      | 134 ++++++++++++++++
 fs/xfs/xfs_parent_utils.h      |  22 +++
 fs/xfs/xfs_qm.c                |   4 +-
 fs/xfs/xfs_super.c             |   4 +
 fs/xfs/xfs_symlink.c           |   6 +-
 fs/xfs/xfs_trans.c             |   6 +-
 fs/xfs/xfs_xattr.c             |   2 +-
 fs/xfs/xfs_xattr.h             |   1 +
 37 files changed, 1056 insertions(+), 170 deletions(-)
 create mode 100644 fs/xfs/libxfs/xfs_parent.c
 create mode 100644 fs/xfs/libxfs/xfs_parent.h
 create mode 100644 fs/xfs/xfs_parent_utils.c
 create mode 100644 fs/xfs/xfs_parent_utils.h

-- 
2.25.1



* [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
@ 2022-08-04 19:39 ` Allison Henderson
  2022-08-09 16:52   ` Darrick J. Wong
  2022-08-04 19:39 ` [PATCH RESEND v2 02/18] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5 Allison Henderson
                   ` (18 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:39 UTC (permalink / raw)
  To: linux-xfs

Recent parent pointer testing has exposed a bug in the underlying
attr replay.  A multi-transaction replay currently performs a
single step of the replay, then defers the rest if there is more
to do.  This causes race conditions with other attr replays that
might be recovered before the remaining deferred work has had a
chance to finish.  This can lead to interleaved set and remove
operations that may clobber the attribute fork.  Fix this by
deferring all work for any attribute operation.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/xfs_attr_item.c | 35 ++++++++---------------------------
 1 file changed, 8 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 5077a7ad5646..c13d724a3e13 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -635,52 +635,33 @@ xfs_attri_item_recover(
 		break;
 	case XFS_ATTRI_OP_FLAGS_REMOVE:
 		if (!xfs_inode_hasattr(args->dp))
-			goto out;
+			return 0;
 		attr->xattri_dela_state = xfs_attr_init_remove_state(args);
 		break;
 	default:
 		ASSERT(0);
-		error = -EFSCORRUPTED;
-		goto out;
+		return -EFSCORRUPTED;
 	}
 
 	xfs_init_attr_trans(args, &tres, &total);
 	error = xfs_trans_alloc(mp, &tres, total, 0, XFS_TRANS_RESERVE, &tp);
 	if (error)
-		goto out;
+		return error;
 
 	args->trans = tp;
 	done_item = xfs_trans_get_attrd(tp, attrip);
+	args->trans->t_flags |= XFS_TRANS_HAS_INTENT_DONE;
+	set_bit(XFS_LI_DIRTY, &done_item->attrd_item.li_flags);
 
 	xfs_ilock(ip, XFS_ILOCK_EXCL);
 	xfs_trans_ijoin(tp, ip, 0);
 
-	error = xfs_xattri_finish_update(attr, done_item);
-	if (error == -EAGAIN) {
-		/*
-		 * There's more work to do, so add the intent item to this
-		 * transaction so that we can continue it later.
-		 */
-		xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
-		error = xfs_defer_ops_capture_and_commit(tp, capture_list);
-		if (error)
-			goto out_unlock;
-
-		xfs_iunlock(ip, XFS_ILOCK_EXCL);
-		xfs_irele(ip);
-		return 0;
-	}
-	if (error) {
-		xfs_trans_cancel(tp);
-		goto out_unlock;
-	}
-
+	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
 	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
-out_unlock:
+
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_irele(ip);
-out:
-	xfs_attr_free_item(attr);
+
 	return error;
 }
 
-- 
2.25.1



* [PATCH RESEND v2 02/18] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
  2022-08-04 19:39 ` [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay Allison Henderson
@ 2022-08-04 19:39 ` Allison Henderson
  2022-08-09 16:38   ` Darrick J. Wong
  2022-08-04 19:39 ` [PATCH RESEND v2 03/18] xfs: Hold inode locks in xfs_ialloc Allison Henderson
                   ` (17 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:39 UTC (permalink / raw)
  To: linux-xfs

Renames that generate parent pointer updates can join up to 5
inodes locked in sorted order.  So we need to increase the
number of defer ops inodes and relock them in the same way.
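
As a rough illustration of the relock ordering, the sketch below sorts a
small array of held inodes by inode number before they would be relocked.
It uses a hypothetical struct fake_inode in place of struct xfs_inode, and
an insertion sort instead of the bubble sort used in the patch; for at most
XFS_DEFER_OPS_NR_INODES (5) entries either choice is fine, since only the
resulting ascending order matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEFER_OPS_NR_INODES 5	/* mirrors XFS_DEFER_OPS_NR_INODES here */

/* Hypothetical stand-in for struct xfs_inode; only i_ino matters here. */
struct fake_inode {
	uint64_t i_ino;
};

/*
 * Copy the @n held inodes into @sorted in ascending inode number order,
 * so they can be relocked in the same order the original rename used.
 */
static void sort_held_inodes(struct fake_inode *const *held, size_t n,
			     struct fake_inode **sorted)
{
	assert(n <= DEFER_OPS_NR_INODES);

	for (size_t i = 0; i < n; i++) {
		struct fake_inode *cur = held[i];
		size_t j = i;

		/* Shift larger entries right to open a slot for @cur. */
		while (j > 0 && sorted[j - 1]->i_ino > cur->i_ino) {
			sorted[j] = sorted[j - 1];
			j--;
		}
		sorted[j] = cur;
	}
}
```

Taking the locks in one global order (ascending inode number here) is what
keeps two tasks from each holding a lock the other one is waiting for.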

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c | 28 ++++++++++++++++++++++++++--
 fs/xfs/libxfs/xfs_defer.h |  8 +++++++-
 fs/xfs/xfs_inode.c        |  2 +-
 fs/xfs/xfs_inode.h        |  1 +
 4 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 5a321b783398..c0279b57e51d 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -820,13 +820,37 @@ xfs_defer_ops_continue(
 	struct xfs_trans		*tp,
 	struct xfs_defer_resources	*dres)
 {
-	unsigned int			i;
+	unsigned int			i, j;
+	struct xfs_inode		*sips[XFS_DEFER_OPS_NR_INODES];
+	struct xfs_inode		*temp;
 
 	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
 	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
 
 	/* Lock the captured resources to the new transaction. */
-	if (dfc->dfc_held.dr_inos == 2)
+	if (dfc->dfc_held.dr_inos > 2) {
+		/*
+		 * Renames with parent pointer updates can lock up to 5 inodes,
+		 * sorted by their inode number.  So we need to make sure they
+		 * are relocked in the same way.
+		 */
+		memset(sips, 0, sizeof(sips));
+		for (i = 0; i < dfc->dfc_held.dr_inos; i++)
+			sips[i] = dfc->dfc_held.dr_ip[i];
+
+		/* Bubble sort of at most 5 inodes */
+		for (i = 0; i < dfc->dfc_held.dr_inos; i++) {
+			for (j = 1; j < dfc->dfc_held.dr_inos; j++) {
+				if (sips[j]->i_ino < sips[j-1]->i_ino) {
+					temp = sips[j];
+					sips[j] = sips[j-1];
+					sips[j-1] = temp;
+				}
+			}
+		}
+
+		xfs_lock_inodes(sips, dfc->dfc_held.dr_inos, XFS_ILOCK_EXCL);
+	} else if (dfc->dfc_held.dr_inos == 2)
 		xfs_lock_two_inodes(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL,
 				    dfc->dfc_held.dr_ip[1], XFS_ILOCK_EXCL);
 	else if (dfc->dfc_held.dr_inos == 1)
diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
index 114a3a4930a3..3e4029d2ce41 100644
--- a/fs/xfs/libxfs/xfs_defer.h
+++ b/fs/xfs/libxfs/xfs_defer.h
@@ -70,7 +70,13 @@ extern const struct xfs_defer_op_type xfs_attr_defer_type;
 /*
  * Deferred operation item relogging limits.
  */
-#define XFS_DEFER_OPS_NR_INODES	2	/* join up to two inodes */
+
+/*
+ * Rename w/ parent pointers can require up to 5 inodes with defered ops to
+ * be joined to the transaction: src_dp, target_dp, src_ip, target_ip, and wip.
+ * These inodes are locked in sorted order by their inode numbers
+ */
+#define XFS_DEFER_OPS_NR_INODES	5
 #define XFS_DEFER_OPS_NR_BUFS	2	/* join up to two buffers */
 
 /* Resources that must be held across a transaction roll. */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 3022918bf96a..cfdcca95594f 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -447,7 +447,7 @@ xfs_lock_inumorder(
  * lock more than one at a time, lockdep will report false positives saying we
  * have violated locking orders.
  */
-static void
+void
 xfs_lock_inodes(
 	struct xfs_inode	**ips,
 	int			inodes,
diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index 4d626f4321bc..bc06d6e4164a 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -573,5 +573,6 @@ void xfs_end_io(struct work_struct *work);
 
 int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
 void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
+void xfs_lock_inodes(struct xfs_inode **ips, int inodes, uint lock_mode);
 
 #endif	/* __XFS_INODE_H__ */
-- 
2.25.1



* [PATCH RESEND v2 03/18] xfs: Hold inode locks in xfs_ialloc
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
  2022-08-04 19:39 ` [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay Allison Henderson
  2022-08-04 19:39 ` [PATCH RESEND v2 02/18] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5 Allison Henderson
@ 2022-08-04 19:39 ` Allison Henderson
  2022-08-04 19:39 ` [PATCH RESEND v2 04/18] xfs: Hold inode locks in xfs_trans_alloc_dir Allison Henderson
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:39 UTC (permalink / raw)
  To: linux-xfs

Modify xfs_ialloc to hold locks after return.  The caller will be
responsible for unlocking them manually.  We will need this later to hold
locks across parent pointer operations.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_inode.c   | 6 +++++-
 fs/xfs/xfs_qm.c      | 4 +++-
 fs/xfs/xfs_symlink.c | 3 +++
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index cfdcca95594f..cce5fe7c048e 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -774,6 +774,8 @@ xfs_inode_inherit_flags2(
 /*
  * Initialise a newly allocated inode and return the in-core inode to the
  * caller locked exclusively.
+ *
+ * Caller is responsible for unlocking the inode manually upon return
  */
 int
 xfs_init_new_inode(
@@ -900,7 +902,7 @@ xfs_init_new_inode(
 	/*
 	 * Log the new values stuffed into the inode.
 	 */
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, ip, 0);
 	xfs_trans_log_inode(tp, ip, flags);
 
 	/* now that we have an i_mode we can setup the inode structure */
@@ -1077,6 +1079,7 @@ xfs_create(
 	xfs_qm_dqrele(pdqp);
 
 	*ipp = ip;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return 0;
 
  out_trans_cancel:
@@ -1173,6 +1176,7 @@ xfs_create_tmpfile(
 	xfs_qm_dqrele(pdqp);
 
 	*ipp = ip;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return 0;
 
  out_trans_cancel:
diff --git a/fs/xfs/xfs_qm.c b/fs/xfs/xfs_qm.c
index 57dd3b722265..5582c44f12ab 100644
--- a/fs/xfs/xfs_qm.c
+++ b/fs/xfs/xfs_qm.c
@@ -817,8 +817,10 @@ xfs_qm_qino_alloc(
 		ASSERT(xfs_is_shutdown(mp));
 		xfs_alert(mp, "%s failed (error %d)!", __func__, error);
 	}
-	if (need_alloc)
+	if (need_alloc) {
 		xfs_finish_inode_setup(*ipp);
+		xfs_iunlock(*ipp, XFS_ILOCK_EXCL);
+	}
 	return error;
 }
 
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index 8389f3ef88ef..d8e120913036 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -337,6 +337,7 @@ xfs_symlink(
 	xfs_qm_dqrele(pdqp);
 
 	*ipp = ip;
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return 0;
 
 out_trans_cancel:
@@ -358,6 +359,8 @@ xfs_symlink(
 
 	if (unlock_dp_on_error)
 		xfs_iunlock(dp, XFS_ILOCK_EXCL);
+	if (ip)
+		xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	return error;
 }
 
-- 
2.25.1



* [PATCH RESEND v2 04/18] xfs: Hold inode locks in xfs_trans_alloc_dir
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (2 preceding siblings ...)
  2022-08-04 19:39 ` [PATCH RESEND v2 03/18] xfs: Hold inode locks in xfs_ialloc Allison Henderson
@ 2022-08-04 19:39 ` Allison Henderson
  2022-08-04 19:40 ` [PATCH RESEND v2 05/18] xfs: get directory offset when adding directory name Allison Henderson
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:39 UTC (permalink / raw)
  To: linux-xfs

Modify xfs_trans_alloc_dir to hold locks after return.  The caller will be
responsible for unlocking them manually.  We will need this later to hold
locks across parent pointer operations.
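
With this contract, every exit path of a caller has to drop the ILOCKs
itself.  The sketch below models that with hypothetical userspace stand-ins
(struct fake_inode, fake_trans_alloc_dir(), fake_iunlock(), fake_link()),
where a "lock" is just a flag so the contract can be checked.  It loosely
follows the shape of the xfs_link() change in this patch and is not kernel
code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in: the ILOCK is modelled as a flag. */
struct fake_inode {
	bool ilock_excl;	/* true while "ILOCK_EXCL" is held */
};

/* Models the new contract: returns with both inodes still locked. */
static int fake_trans_alloc_dir(struct fake_inode *dp, struct fake_inode *ip)
{
	/* transaction allocation and inode joins would happen here */
	dp->ilock_excl = true;
	ip->ilock_excl = true;
	return 0;
}

static void fake_iunlock(struct fake_inode *ip)
{
	assert(ip->ilock_excl);	/* unlocking an unlocked inode is a bug */
	ip->ilock_excl = false;
}

/*
 * Caller pattern: @dir_update_error stands in for the result of the
 * directory update, and the locks are dropped on both exit paths.
 */
static int fake_link(struct fake_inode *tdp, struct fake_inode *sip,
		     int dir_update_error)
{
	int error = fake_trans_alloc_dir(tdp, sip);

	if (error)
		return error;

	error = dir_update_error;
	if (error)
		goto error_return;

	/* commit would run here; it no longer drops the ILOCKs */
	fake_iunlock(tdp);
	fake_iunlock(sip);
	return 0;

error_return:
	/* cancel would run here; it also leaves the ILOCKs held */
	fake_iunlock(tdp);
	fake_iunlock(sip);
	return error;
}
```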

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/xfs_inode.c | 14 ++++++++++++--
 fs/xfs/xfs_trans.c |  6 ++++--
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index cce5fe7c048e..2703473b13b1 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1278,10 +1278,15 @@ xfs_link(
 	if (xfs_has_wsync(mp) || xfs_has_dirsync(mp))
 		xfs_trans_set_sync(tp);
 
-	return xfs_trans_commit(tp);
+	error = xfs_trans_commit(tp);
+	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
+	xfs_iunlock(sip, XFS_ILOCK_EXCL);
+	return error;
 
  error_return:
 	xfs_trans_cancel(tp);
+	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
+	xfs_iunlock(sip, XFS_ILOCK_EXCL);
  std_return:
 	if (error == -ENOSPC && nospace_error)
 		error = nospace_error;
@@ -2517,15 +2522,20 @@ xfs_remove(
 
 	error = xfs_trans_commit(tp);
 	if (error)
-		goto std_return;
+		goto out_unlock;
 
 	if (is_dir && xfs_inode_is_filestream(ip))
 		xfs_filestream_deassociate(ip);
 
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
 	return 0;
 
  out_trans_cancel:
 	xfs_trans_cancel(tp);
+ out_unlock:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL);
  std_return:
 	return error;
 }
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index 7bd16fbff534..ac98ff416e54 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -1356,6 +1356,8 @@ xfs_trans_alloc_ichange(
  * The caller must ensure that the on-disk dquots attached to this inode have
  * already been allocated and initialized.  The ILOCKs will be dropped when the
  * transaction is committed or cancelled.
+ *
+ * Caller is responsible for unlocking the inodes manually upon return
  */
 int
 xfs_trans_alloc_dir(
@@ -1386,8 +1388,8 @@ xfs_trans_alloc_dir(
 
 	xfs_lock_two_inodes(dp, XFS_ILOCK_EXCL, ip, XFS_ILOCK_EXCL);
 
-	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, ip, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, dp, 0);
+	xfs_trans_ijoin(tp, ip, 0);
 
 	error = xfs_qm_dqattach_locked(dp, false);
 	if (error) {
-- 
2.25.1



* [PATCH RESEND v2 05/18] xfs: get directory offset when adding directory name
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (3 preceding siblings ...)
  2022-08-04 19:39 ` [PATCH RESEND v2 04/18] xfs: Hold inode locks in xfs_trans_alloc_dir Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-04 19:40 ` [PATCH RESEND v2 06/18] xfs: get directory offset when removing " Allison Henderson
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC (permalink / raw)
  To: linux-xfs

Return the directory offset information when adding an entry to the
directory.

This offset will be used as the parent pointer offset in xfs_create,
xfs_symlink, xfs_link and xfs_rename.
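
The returned offset is an xfs_dir2_dataptr_t, i.e. a directory byte address
scaled down by the entry alignment.  The sketch below reimplements the
conversions in userspace under two assumptions: an alignment shift of 3
(8-byte aligned entries) and 4 KiB directory blocks (a blklog of 12).  The
helper names are hypothetical simplifications of
xfs_dir2_byte_to_dataptr(), xfs_dir2_db_off_to_dataptr() and their
inverses, not the kernel versions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: 8-byte entry alignment and 4 KiB directory blocks. */
#define DIR2_DATA_ALIGN_LOG	3
#define DIR_BLKLOG		12

typedef uint32_t dir2_dataptr_t;

/* Directory byte address -> dataptr (this sketch also checks alignment). */
static dir2_dataptr_t byte_to_dataptr(uint64_t byte)
{
	assert((byte & ((1u << DIR2_DATA_ALIGN_LOG) - 1)) == 0);
	return (dir2_dataptr_t)(byte >> DIR2_DATA_ALIGN_LOG);
}

/* (directory block number, offset within block) -> dataptr */
static dir2_dataptr_t db_off_to_dataptr(uint32_t db, uint32_t off)
{
	return byte_to_dataptr(((uint64_t)db << DIR_BLKLOG) + off);
}

/* dataptr -> directory block number */
static uint32_t dataptr_to_db(dir2_dataptr_t dp)
{
	return (uint32_t)(((uint64_t)dp << DIR2_DATA_ALIGN_LOG) >> DIR_BLKLOG);
}

/* dataptr -> offset within its directory block */
static uint32_t dataptr_to_off(dir2_dataptr_t dp)
{
	return (uint32_t)(((uint64_t)dp << DIR2_DATA_ALIGN_LOG) &
			  ((1u << DIR_BLKLOG) - 1));
}
```

For example, an entry at byte 0x40 of directory data block 2 maps to
dataptr 1032, and decoding 1032 gives back block 2, offset 0x40.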

[dchinner: forward ported and cleaned up]
[dchinner: no s-o-b from Mark]
[bfoster: rebased, use args->geo in dir code]
[achender: rebased, changed __uint32_t to xfs_dir2_dataptr_t]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_btree.h   | 1 +
 fs/xfs/libxfs/xfs_dir2.c       | 9 +++++++--
 fs/xfs/libxfs/xfs_dir2.h       | 2 +-
 fs/xfs/libxfs/xfs_dir2_block.c | 1 +
 fs/xfs/libxfs/xfs_dir2_leaf.c  | 2 ++
 fs/xfs/libxfs/xfs_dir2_node.c  | 2 ++
 fs/xfs/libxfs/xfs_dir2_sf.c    | 2 ++
 fs/xfs/xfs_inode.c             | 6 +++---
 fs/xfs/xfs_symlink.c           | 3 ++-
 9 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_da_btree.h b/fs/xfs/libxfs/xfs_da_btree.h
index ffa3df5b2893..3692de4e6716 100644
--- a/fs/xfs/libxfs/xfs_da_btree.h
+++ b/fs/xfs/libxfs/xfs_da_btree.h
@@ -79,6 +79,7 @@ typedef struct xfs_da_args {
 	int		rmtvaluelen2;	/* remote attr value length in bytes */
 	uint32_t	op_flags;	/* operation flags */
 	enum xfs_dacmp	cmpresult;	/* name compare result for lookups */
+	xfs_dir2_dataptr_t offset;	/* OUT: offset in directory */
 } xfs_da_args_t;
 
 /*
diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index 76eedc2756b3..c0629c2cdecc 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -257,7 +257,8 @@ xfs_dir_createname(
 	struct xfs_inode	*dp,
 	const struct xfs_name	*name,
 	xfs_ino_t		inum,		/* new entry inode number */
-	xfs_extlen_t		total)		/* bmap's total block count */
+	xfs_extlen_t		total,		/* bmap's total block count */
+	xfs_dir2_dataptr_t	*offset)	/* OUT entry's dir offset */
 {
 	struct xfs_da_args	*args;
 	int			rval;
@@ -312,6 +313,10 @@ xfs_dir_createname(
 		rval = xfs_dir2_node_addname(args);
 
 out_free:
+	/* return the location that this entry was place in the parent inode */
+	if (offset)
+		*offset = args->offset;
+
 	kmem_free(args);
 	return rval;
 }
@@ -550,7 +555,7 @@ xfs_dir_canenter(
 	xfs_inode_t	*dp,
 	struct xfs_name	*name)		/* name of entry to add */
 {
-	return xfs_dir_createname(tp, dp, name, 0, 0);
+	return xfs_dir_createname(tp, dp, name, 0, 0, NULL);
 }
 
 /*
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index b6df3c34b26a..4d1c2570b833 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -40,7 +40,7 @@ extern int xfs_dir_init(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_inode *pdp);
 extern int xfs_dir_createname(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t inum,
-				xfs_extlen_t tot);
+				xfs_extlen_t tot, xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t *inum,
 				struct xfs_name *ci_name);
diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
index 00f960a703b2..70aeab9d2a12 100644
--- a/fs/xfs/libxfs/xfs_dir2_block.c
+++ b/fs/xfs/libxfs/xfs_dir2_block.c
@@ -573,6 +573,7 @@ xfs_dir2_block_addname(
 	xfs_dir2_data_put_ftype(dp->i_mount, dep, args->filetype);
 	tagp = xfs_dir2_data_entry_tag_p(dp->i_mount, dep);
 	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
+	args->offset = xfs_dir2_byte_to_dataptr((char *)dep - (char *)hdr);
 	/*
 	 * Clean up the bestfree array and log the header, tail, and entry.
 	 */
diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index d9b66306a9a7..bd0c2f963545 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -865,6 +865,8 @@ xfs_dir2_leaf_addname(
 	xfs_dir2_data_put_ftype(dp->i_mount, dep, args->filetype);
 	tagp = xfs_dir2_data_entry_tag_p(dp->i_mount, dep);
 	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
+	args->offset = xfs_dir2_db_off_to_dataptr(args->geo, use_block,
+						(char *)dep - (char *)hdr);
 	/*
 	 * Need to scan fix up the bestfree table.
 	 */
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index 7a03aeb9f4c9..5a9513c036b8 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -1974,6 +1974,8 @@ xfs_dir2_node_addname_int(
 	xfs_dir2_data_put_ftype(dp->i_mount, dep, args->filetype);
 	tagp = xfs_dir2_data_entry_tag_p(dp->i_mount, dep);
 	*tagp = cpu_to_be16((char *)dep - (char *)hdr);
+	args->offset = xfs_dir2_db_off_to_dataptr(args->geo, dbno,
+						  (char *)dep - (char *)hdr);
 	xfs_dir2_data_log_entry(args, dbp, dep);
 
 	/* Rescan the freespace and log the data block if needed. */
diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
index 003812fd7d35..541235b37d69 100644
--- a/fs/xfs/libxfs/xfs_dir2_sf.c
+++ b/fs/xfs/libxfs/xfs_dir2_sf.c
@@ -485,6 +485,7 @@ xfs_dir2_sf_addname_easy(
 	memcpy(sfep->name, args->name, sfep->namelen);
 	xfs_dir2_sf_put_ino(mp, sfp, sfep, args->inumber);
 	xfs_dir2_sf_put_ftype(mp, sfep, args->filetype);
+	args->offset = xfs_dir2_byte_to_dataptr(offset);
 
 	/*
 	 * Update the header and inode.
@@ -575,6 +576,7 @@ xfs_dir2_sf_addname_hard(
 	memcpy(sfep->name, args->name, sfep->namelen);
 	xfs_dir2_sf_put_ino(mp, sfp, sfep, args->inumber);
 	xfs_dir2_sf_put_ftype(mp, sfep, args->filetype);
+	args->offset = xfs_dir2_byte_to_dataptr(offset);
 	sfp->count++;
 	if (args->inumber > XFS_DIR2_MAX_SHORT_INUM && !objchange)
 		sfp->i8count++;
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 2703473b13b1..08550f579551 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1039,7 +1039,7 @@ xfs_create(
 	unlock_dp_on_error = false;
 
 	error = xfs_dir_createname(tp, dp, name, ip->i_ino,
-					resblks - XFS_IALLOC_SPACE_RES(mp));
+				   resblks - XFS_IALLOC_SPACE_RES(mp), NULL);
 	if (error) {
 		ASSERT(error != -ENOSPC);
 		goto out_trans_cancel;
@@ -1262,7 +1262,7 @@ xfs_link(
 	}
 
 	error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino,
-				   resblks);
+				   resblks, NULL);
 	if (error)
 		goto error_return;
 	xfs_trans_ichgtime(tp, tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
@@ -2983,7 +2983,7 @@ xfs_rename(
 		 * to account for the ".." reference from the new entry.
 		 */
 		error = xfs_dir_createname(tp, target_dp, target_name,
-					   src_ip->i_ino, spaceres);
+					   src_ip->i_ino, spaceres, NULL);
 		if (error)
 			goto out_trans_cancel;
 
diff --git a/fs/xfs/xfs_symlink.c b/fs/xfs/xfs_symlink.c
index d8e120913036..27a7d7c57015 100644
--- a/fs/xfs/xfs_symlink.c
+++ b/fs/xfs/xfs_symlink.c
@@ -314,7 +314,8 @@ xfs_symlink(
 	/*
 	 * Create the directory entry for the symlink.
 	 */
-	error = xfs_dir_createname(tp, dp, link_name, ip->i_ino, resblks);
+	error = xfs_dir_createname(tp, dp, link_name,
+			ip->i_ino, resblks, NULL);
 	if (error)
 		goto out_trans_cancel;
 	xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
-- 
2.25.1



* [PATCH RESEND v2 06/18] xfs: get directory offset when removing directory name
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (4 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 05/18] xfs: get directory offset when adding directory name Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-04 19:40 ` [PATCH RESEND v2 07/18] xfs: get directory offset when replacing a " Allison Henderson
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC (permalink / raw)
  To: linux-xfs

Return the directory offset information when removing an entry from the
directory.

This offset will be used as the parent pointer offset in xfs_remove.

[dchinner: forward ported and cleaned up]
[achender: rebased, changed __uint32_t to xfs_dir2_dataptr_t,
           Changed typedefs to raw struct types]

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_dir2.c       | 6 +++++-
 fs/xfs/libxfs/xfs_dir2.h       | 3 ++-
 fs/xfs/libxfs/xfs_dir2_block.c | 4 ++--
 fs/xfs/libxfs/xfs_dir2_leaf.c  | 5 +++--
 fs/xfs/libxfs/xfs_dir2_node.c  | 5 +++--
 fs/xfs/libxfs/xfs_dir2_sf.c    | 2 ++
 fs/xfs/xfs_inode.c             | 4 ++--
 7 files changed, 19 insertions(+), 10 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index c0629c2cdecc..e62ec568f42d 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -436,7 +436,8 @@ xfs_dir_removename(
 	struct xfs_inode	*dp,
 	struct xfs_name		*name,
 	xfs_ino_t		ino,
-	xfs_extlen_t		total)		/* bmap's total block count */
+	xfs_extlen_t		total,		/* bmap's total block count */
+	xfs_dir2_dataptr_t	*offset)	/* OUT: offset in directory */
 {
 	struct xfs_da_args	*args;
 	int			rval;
@@ -481,6 +482,9 @@ xfs_dir_removename(
 	else
 		rval = xfs_dir2_node_removename(args);
 out_free:
+	if (offset)
+		*offset = args->offset;
+
 	kmem_free(args);
 	return rval;
 }
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index 4d1c2570b833..c581d3b19bc6 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -46,7 +46,8 @@ extern int xfs_dir_lookup(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *ci_name);
 extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name, xfs_ino_t ino,
-				xfs_extlen_t tot);
+				xfs_extlen_t tot,
+				xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t inum,
 				xfs_extlen_t tot);
diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
index 70aeab9d2a12..d36f3f1491da 100644
--- a/fs/xfs/libxfs/xfs_dir2_block.c
+++ b/fs/xfs/libxfs/xfs_dir2_block.c
@@ -810,9 +810,9 @@ xfs_dir2_block_removename(
 	/*
 	 * Point to the data entry using the leaf entry.
 	 */
+	args->offset = be32_to_cpu(blp[ent].address);
 	dep = (xfs_dir2_data_entry_t *)((char *)hdr +
-			xfs_dir2_dataptr_to_off(args->geo,
-						be32_to_cpu(blp[ent].address)));
+			xfs_dir2_dataptr_to_off(args->geo, args->offset));
 	/*
 	 * Mark the data entry's space free.
 	 */
diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index bd0c2f963545..c13763c16095 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -1381,9 +1381,10 @@ xfs_dir2_leaf_removename(
 	 * Point to the leaf entry, use that to point to the data entry.
 	 */
 	lep = &leafhdr.ents[index];
-	db = xfs_dir2_dataptr_to_db(geo, be32_to_cpu(lep->address));
+	args->offset = be32_to_cpu(lep->address);
+	db = xfs_dir2_dataptr_to_db(args->geo, args->offset);
 	dep = (xfs_dir2_data_entry_t *)((char *)hdr +
-		xfs_dir2_dataptr_to_off(geo, be32_to_cpu(lep->address)));
+		xfs_dir2_dataptr_to_off(args->geo, args->offset));
 	needscan = needlog = 0;
 	oldbest = be16_to_cpu(bf[0].length);
 	ltp = xfs_dir2_leaf_tail_p(geo, leaf);
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index 5a9513c036b8..39cbdeafa0f6 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -1296,9 +1296,10 @@ xfs_dir2_leafn_remove(
 	/*
 	 * Extract the data block and offset from the entry.
 	 */
-	db = xfs_dir2_dataptr_to_db(geo, be32_to_cpu(lep->address));
+	args->offset = be32_to_cpu(lep->address);
+	db = xfs_dir2_dataptr_to_db(args->geo, args->offset);
 	ASSERT(dblk->blkno == db);
-	off = xfs_dir2_dataptr_to_off(geo, be32_to_cpu(lep->address));
+	off = xfs_dir2_dataptr_to_off(args->geo, args->offset);
 	ASSERT(dblk->index == off);
 
 	/*
diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
index 541235b37d69..2dc1d8d52228 100644
--- a/fs/xfs/libxfs/xfs_dir2_sf.c
+++ b/fs/xfs/libxfs/xfs_dir2_sf.c
@@ -971,6 +971,8 @@ xfs_dir2_sf_removename(
 								XFS_CMP_EXACT) {
 			ASSERT(xfs_dir2_sf_get_ino(mp, sfp, sfep) ==
 			       args->inumber);
+			args->offset = xfs_dir2_byte_to_dataptr(
+						xfs_dir2_sf_get_offset(sfep));
 			break;
 		}
 	}
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 08550f579551..ce888f844053 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2506,7 +2506,7 @@ xfs_remove(
 	if (error)
 		goto out_trans_cancel;
 
-	error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks);
+	error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks, NULL);
 	if (error) {
 		ASSERT(error != -ENOENT);
 		goto out_trans_cancel;
@@ -3080,7 +3080,7 @@ xfs_rename(
 					spaceres);
 	else
 		error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
-					   spaceres);
+					   spaceres, NULL);
 
 	if (error)
 		goto out_trans_cancel;
-- 
2.25.1


* [PATCH RESEND v2 07/18] xfs: get directory offset when replacing a directory name
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (5 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 06/18] xfs: get directory offset when removing " Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-04 19:40 ` [PATCH RESEND v2 08/18] xfs: add parent pointer support to attribute code Allison Henderson
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC
  To: linux-xfs

Return the directory offset information when replacing an entry in the
directory.

This offset will be used as the parent pointer offset in xfs_rename.
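A minimal userspace sketch of the calling convention this patch adopts: an optional offset out-parameter that callers which do not need it pass as NULL, as every converted call site in xfs_inode.c does here. The toy directory, `struct toy_dirent` and `toy_dir_replace()` are hypothetical illustrations, not XFS code; the sketch writes *offset only on success, mirroring the `!rval && offset` guard this patch adds to xfs_dir_removename().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t dataptr_t;	/* stand-in for xfs_dir2_dataptr_t */

/* Hypothetical in-memory directory entry; not an XFS on-disk structure. */
struct toy_dirent {
	const char	*name;
	uint64_t	ino;
	dataptr_t	offset;		/* position of the entry in the directory */
};

/*
 * Point the entry called @name at @new_ino.  If @offset is non-NULL and
 * the replace succeeds, report where the entry lives so the caller can
 * build a parent pointer for it.
 */
static int
toy_dir_replace(struct toy_dirent *dir, size_t nents, const char *name,
		uint64_t new_ino, dataptr_t *offset)
{
	int		rval = -ENOENT;
	dataptr_t	found = 0;

	assert(name != NULL && (dir != NULL || nents == 0));
	for (size_t i = 0; i < nents; i++) {
		if (strcmp(dir[i].name, name) != 0)
			continue;
		dir[i].ino = new_ino;
		found = dir[i].offset;
		rval = 0;
		break;
	}
	if (!rval && offset)	/* only report an offset on success */
		*offset = found;
	return rval;
}
```

Passing NULL leaves the behaviour of existing callers unchanged, which is why the conversion in xfs_inode.c is purely mechanical.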

[dchinner: forward ported and cleaned up]
[achender: rebased, changed __uint32_t to xfs_dir2_dataptr_t,
           Changed typedefs to raw struct types]

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_dir2.c       |  8 ++++++--
 fs/xfs/libxfs/xfs_dir2.h       |  2 +-
 fs/xfs/libxfs/xfs_dir2_block.c |  4 ++--
 fs/xfs/libxfs/xfs_dir2_leaf.c  |  1 +
 fs/xfs/libxfs/xfs_dir2_node.c  |  1 +
 fs/xfs/libxfs/xfs_dir2_sf.c    |  2 ++
 fs/xfs/xfs_inode.c             | 16 ++++++++--------
 7 files changed, 21 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_dir2.c b/fs/xfs/libxfs/xfs_dir2.c
index e62ec568f42d..e603323ce7a3 100644
--- a/fs/xfs/libxfs/xfs_dir2.c
+++ b/fs/xfs/libxfs/xfs_dir2.c
@@ -482,7 +482,7 @@ xfs_dir_removename(
 	else
 		rval = xfs_dir2_node_removename(args);
 out_free:
-	if (offset)
+	if (!rval && offset)
 		*offset = args->offset;
 
 	kmem_free(args);
@@ -498,7 +498,8 @@ xfs_dir_replace(
 	struct xfs_inode	*dp,
 	const struct xfs_name	*name,		/* name of entry to replace */
 	xfs_ino_t		inum,		/* new inode number */
-	xfs_extlen_t		total)		/* bmap's total block count */
+	xfs_extlen_t		total,		/* bmap's total block count */
+	xfs_dir2_dataptr_t	*offset)	/* OUT: offset in directory */
 {
 	struct xfs_da_args	*args;
 	int			rval;
@@ -546,6 +547,9 @@ xfs_dir_replace(
 	else
 		rval = xfs_dir2_node_replace(args);
 out_free:
+	if (offset)
+		*offset = args->offset;
+
 	kmem_free(args);
 	return rval;
 }
diff --git a/fs/xfs/libxfs/xfs_dir2.h b/fs/xfs/libxfs/xfs_dir2.h
index c581d3b19bc6..fd943c0c00a0 100644
--- a/fs/xfs/libxfs/xfs_dir2.h
+++ b/fs/xfs/libxfs/xfs_dir2.h
@@ -50,7 +50,7 @@ extern int xfs_dir_removename(struct xfs_trans *tp, struct xfs_inode *dp,
 				xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_replace(struct xfs_trans *tp, struct xfs_inode *dp,
 				const struct xfs_name *name, xfs_ino_t inum,
-				xfs_extlen_t tot);
+				xfs_extlen_t tot, xfs_dir2_dataptr_t *offset);
 extern int xfs_dir_canenter(struct xfs_trans *tp, struct xfs_inode *dp,
 				struct xfs_name *name);
 
diff --git a/fs/xfs/libxfs/xfs_dir2_block.c b/fs/xfs/libxfs/xfs_dir2_block.c
index d36f3f1491da..0f3a03e87278 100644
--- a/fs/xfs/libxfs/xfs_dir2_block.c
+++ b/fs/xfs/libxfs/xfs_dir2_block.c
@@ -885,9 +885,9 @@ xfs_dir2_block_replace(
 	/*
 	 * Point to the data entry we need to change.
 	 */
+	args->offset = be32_to_cpu(blp[ent].address);
 	dep = (xfs_dir2_data_entry_t *)((char *)hdr +
-			xfs_dir2_dataptr_to_off(args->geo,
-						be32_to_cpu(blp[ent].address)));
+			xfs_dir2_dataptr_to_off(args->geo, args->offset));
 	ASSERT(be64_to_cpu(dep->inumber) != args->inumber);
 	/*
 	 * Change the inode number to the new value.
diff --git a/fs/xfs/libxfs/xfs_dir2_leaf.c b/fs/xfs/libxfs/xfs_dir2_leaf.c
index c13763c16095..958b9fea64bd 100644
--- a/fs/xfs/libxfs/xfs_dir2_leaf.c
+++ b/fs/xfs/libxfs/xfs_dir2_leaf.c
@@ -1518,6 +1518,7 @@ xfs_dir2_leaf_replace(
 	/*
 	 * Point to the data entry.
 	 */
+	args->offset = be32_to_cpu(lep->address);
 	dep = (xfs_dir2_data_entry_t *)
 	      ((char *)dbp->b_addr +
 	       xfs_dir2_dataptr_to_off(args->geo, be32_to_cpu(lep->address)));
diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
index 39cbdeafa0f6..53cd0d5d94f7 100644
--- a/fs/xfs/libxfs/xfs_dir2_node.c
+++ b/fs/xfs/libxfs/xfs_dir2_node.c
@@ -2242,6 +2242,7 @@ xfs_dir2_node_replace(
 		hdr = state->extrablk.bp->b_addr;
 		ASSERT(hdr->magic == cpu_to_be32(XFS_DIR2_DATA_MAGIC) ||
 		       hdr->magic == cpu_to_be32(XFS_DIR3_DATA_MAGIC));
+		args->offset = be32_to_cpu(leafhdr.ents[blk->index].address);
 		dep = (xfs_dir2_data_entry_t *)
 		      ((char *)hdr +
 		       xfs_dir2_dataptr_to_off(args->geo,
diff --git a/fs/xfs/libxfs/xfs_dir2_sf.c b/fs/xfs/libxfs/xfs_dir2_sf.c
index 2dc1d8d52228..2a8df4ede1a1 100644
--- a/fs/xfs/libxfs/xfs_dir2_sf.c
+++ b/fs/xfs/libxfs/xfs_dir2_sf.c
@@ -1109,6 +1109,8 @@ xfs_dir2_sf_replace(
 				xfs_dir2_sf_put_ino(mp, sfp, sfep,
 						args->inumber);
 				xfs_dir2_sf_put_ftype(mp, sfep, args->filetype);
+				args->offset = xfs_dir2_byte_to_dataptr(
+						  xfs_dir2_sf_get_offset(sfep));
 				break;
 			}
 		}
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index ce888f844053..09876ba10a42 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2487,7 +2487,7 @@ xfs_remove(
 		 */
 		if (dp->i_ino != tp->t_mountp->m_sb.sb_rootino) {
 			error = xfs_dir_replace(tp, ip, &xfs_name_dotdot,
-					tp->t_mountp->m_sb.sb_rootino, 0);
+					tp->t_mountp->m_sb.sb_rootino, 0, NULL);
 			if (error)
 				return error;
 		}
@@ -2627,12 +2627,12 @@ xfs_cross_rename(
 	int		dp2_flags = 0;
 
 	/* Swap inode number for dirent in first parent */
-	error = xfs_dir_replace(tp, dp1, name1, ip2->i_ino, spaceres);
+	error = xfs_dir_replace(tp, dp1, name1, ip2->i_ino, spaceres, NULL);
 	if (error)
 		goto out_trans_abort;
 
 	/* Swap inode number for dirent in second parent */
-	error = xfs_dir_replace(tp, dp2, name2, ip1->i_ino, spaceres);
+	error = xfs_dir_replace(tp, dp2, name2, ip1->i_ino, spaceres, NULL);
 	if (error)
 		goto out_trans_abort;
 
@@ -2646,7 +2646,7 @@ xfs_cross_rename(
 
 		if (S_ISDIR(VFS_I(ip2)->i_mode)) {
 			error = xfs_dir_replace(tp, ip2, &xfs_name_dotdot,
-						dp1->i_ino, spaceres);
+						dp1->i_ino, spaceres, NULL);
 			if (error)
 				goto out_trans_abort;
 
@@ -2670,7 +2670,7 @@ xfs_cross_rename(
 
 		if (S_ISDIR(VFS_I(ip1)->i_mode)) {
 			error = xfs_dir_replace(tp, ip1, &xfs_name_dotdot,
-						dp2->i_ino, spaceres);
+						dp2->i_ino, spaceres, NULL);
 			if (error)
 				goto out_trans_abort;
 
@@ -3004,7 +3004,7 @@ xfs_rename(
 		 * name at the destination directory, remove it first.
 		 */
 		error = xfs_dir_replace(tp, target_dp, target_name,
-					src_ip->i_ino, spaceres);
+					src_ip->i_ino, spaceres, NULL);
 		if (error)
 			goto out_trans_cancel;
 
@@ -3038,7 +3038,7 @@ xfs_rename(
 		 * directory.
 		 */
 		error = xfs_dir_replace(tp, src_ip, &xfs_name_dotdot,
-					target_dp->i_ino, spaceres);
+					target_dp->i_ino, spaceres, NULL);
 		ASSERT(error != -EEXIST);
 		if (error)
 			goto out_trans_cancel;
@@ -3077,7 +3077,7 @@ xfs_rename(
 	 */
 	if (wip)
 		error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino,
-					spaceres);
+					spaceres, NULL);
 	else
 		error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
 					   spaceres, NULL);
-- 
2.25.1


* [PATCH RESEND v2 08/18] xfs: add parent pointer support to attribute code
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (6 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 07/18] xfs: get directory offset when replacing a " Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-09 16:54   ` Darrick J. Wong
  2022-08-04 19:40 ` [PATCH RESEND v2 09/18] xfs: define parent pointer xattr format Allison Henderson
                   ` (11 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC
  To: linux-xfs

Add the new parent attribute type. XFS_ATTR_PARENT is used only for parent pointer
entries; it uses reserved blocks like XFS_ATTR_ROOT.
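The flag arithmetic is small enough to check in isolation. This sketch copies the bit definitions from the xfs_da_format.h hunk in this patch and reproduces the new `rsvd` test from xfs_attr_set(); `attr_uses_reserved_blocks()` is a hypothetical helper name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag values copied from the xfs_da_format.h hunk in this patch. */
#define XFS_ATTR_LOCAL_BIT	0
#define XFS_ATTR_ROOT_BIT	1
#define XFS_ATTR_SECURE_BIT	2
#define XFS_ATTR_PARENT_BIT	3
#define XFS_ATTR_INCOMPLETE_BIT	7
#define XFS_ATTR_LOCAL		(1u << XFS_ATTR_LOCAL_BIT)
#define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
#define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
#define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
#define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
#define XFS_ATTR_NSP_ONDISK_MASK \
	(XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)

/* Namespace bits must never collide with the entry state bits. */
static_assert((XFS_ATTR_NSP_ONDISK_MASK &
	       (XFS_ATTR_LOCAL | XFS_ATTR_INCOMPLETE)) == 0,
	      "namespace and state bits overlap");

/* Same test as the new rsvd assignment in xfs_attr_set(). */
static bool
attr_uses_reserved_blocks(unsigned int attr_filter)
{
	return (attr_filter & (XFS_ATTR_ROOT | XFS_ATTR_PARENT)) != 0;
}
```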

[dchinner: forward ported and cleaned up]
[achender: rebased]

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c       | 4 +++-
 fs/xfs/libxfs/xfs_da_format.h  | 5 ++++-
 fs/xfs/libxfs/xfs_log_format.h | 1 +
 3 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index e28d93d232de..8df80d91399b 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -966,11 +966,13 @@ xfs_attr_set(
 	struct xfs_inode	*dp = args->dp;
 	struct xfs_mount	*mp = dp->i_mount;
 	struct xfs_trans_res	tres;
-	bool			rsvd = (args->attr_filter & XFS_ATTR_ROOT);
+	bool			rsvd;
 	int			error, local;
 	int			rmt_blks = 0;
 	unsigned int		total;
 
+	rsvd = (args->attr_filter & (XFS_ATTR_ROOT | XFS_ATTR_PARENT)) != 0;
+
 	if (xfs_is_shutdown(dp->i_mount))
 		return -EIO;
 
diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 25e2841084e1..3dc03968bba6 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -688,12 +688,15 @@ struct xfs_attr3_leafblock {
 #define	XFS_ATTR_LOCAL_BIT	0	/* attr is stored locally */
 #define	XFS_ATTR_ROOT_BIT	1	/* limit access to trusted attrs */
 #define	XFS_ATTR_SECURE_BIT	2	/* limit access to secure attrs */
+#define	XFS_ATTR_PARENT_BIT	3	/* parent pointer attrs */
 #define	XFS_ATTR_INCOMPLETE_BIT	7	/* attr in middle of create/delete */
 #define XFS_ATTR_LOCAL		(1u << XFS_ATTR_LOCAL_BIT)
 #define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
 #define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
+#define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
 #define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
-#define XFS_ATTR_NSP_ONDISK_MASK	(XFS_ATTR_ROOT | XFS_ATTR_SECURE)
+#define XFS_ATTR_NSP_ONDISK_MASK \
+			(XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)
 
 /*
  * Alignment for namelist and valuelist entries (since they are mixed
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index b351b9dc6561..eea53874fde8 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -917,6 +917,7 @@ struct xfs_icreate_log {
  */
 #define XFS_ATTRI_FILTER_MASK		(XFS_ATTR_ROOT | \
 					 XFS_ATTR_SECURE | \
+					 XFS_ATTR_PARENT | \
 					 XFS_ATTR_INCOMPLETE)
 
 /*
-- 
2.25.1


* [PATCH RESEND v2 09/18] xfs: define parent pointer xattr format
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (7 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 08/18] xfs: add parent pointer support to attribute code Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-04 19:40 ` [PATCH RESEND v2 10/18] xfs: Add xfs_verify_pptr Allison Henderson
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC
  To: linux-xfs

We need to define the parent pointer attribute format before we start
adding support for it into all the code that needs to use it. The EA
format we will use encodes the following information:

        name={parent inode #, parent inode generation, dirent offset}
        value={dirent filename}
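A userspace sketch of how the EA name can be laid out, assuming the field order of the struct xfs_parent_name_rec added by this patch: an 8-byte inode number followed by two 4-byte fields, all big-endian, 16 bytes in total. `put_be()`, `get_be()` and the `pptr_*` helpers are hypothetical stand-ins for the kernel's cpu_to_be64()/cpu_to_be32() conversions, written out by hand so the example runs anywhere.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Size of struct xfs_parent_name_rec: __be64 + __be32 + __be32. */
#define PPTR_NAME_LEN	16

/* Store the low @nbytes bytes of @v at @p, most significant byte first. */
static void
put_be(unsigned char *p, uint64_t v, int nbytes)
{
	assert(nbytes > 0 && nbytes <= 8);
	for (int i = nbytes - 1; i >= 0; i--) {
		p[i] = (unsigned char)(v & 0xff);
		v >>= 8;
	}
}

/* Load @nbytes big-endian bytes from @p. */
static uint64_t
get_be(const unsigned char *p, int nbytes)
{
	uint64_t v = 0;

	for (int i = 0; i < nbytes; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Build the EA name {p_ino, p_gen, p_diroffset}. */
static void
pptr_encode_name(unsigned char name[PPTR_NAME_LEN], uint64_t p_ino,
		 uint32_t p_gen, uint32_t p_diroffset)
{
	put_be(name, p_ino, 8);
	put_be(name + 8, p_gen, 4);
	put_be(name + 12, p_diroffset, 4);
}

/* Recover the three fields from an EA name. */
static void
pptr_decode_name(const unsigned char name[PPTR_NAME_LEN], uint64_t *p_ino,
		 uint32_t *p_gen, uint32_t *p_diroffset)
{
	*p_ino = get_be(name, 8);
	*p_gen = (uint32_t)get_be(name + 8, 4);
	*p_diroffset = (uint32_t)get_be(name + 12, 4);
}
```

Because the name has a fixed size and byte order, it compares identically on every host, which is what lets it serve as an exact lookup key.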

The inode/gen gives all the information we need to reliably identify the
parent without requiring child->parent lock ordering, and allows
userspace to do pathname component level reconstruction without the
kernel ever needing to verify the parent itself as part of ioctl calls.

By using the dirent offset in the EA name, we have a method of knowing
the exact parent pointer EA we need to modify/remove in rename/unlink
without an unbounded EA name search.

By keeping the dirent name in the value, we have enough information to
be able to validate and reconstruct damaged directory trees. While the
diroffset of a filename alone is not unique enough to identify the
child, the {diroffset,filename,child_inode} tuple is sufficient. That
is, if the diroffset gets reused and points to a different filename, we
can detect that from the contents of the EA. If a link of the same name is
created, then we can check whether it points at the same inode as the
parent EA we currently have.
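A sketch of the consistency check described above, assuming the caller has already looked up the dirent at the recorded offset in the parent directory. `struct pptr`, `struct dirent_view` and `pptr_matches_dirent()` are hypothetical names for illustration; no such helpers exist in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parsed parent pointer: EA name fields plus the EA value. */
struct pptr {
	uint64_t	p_ino;		/* selects the parent; lookup not modeled */
	uint32_t	p_diroffset;
	const char	*name;		/* EA value: the dirent filename */
	size_t		namelen;
};

/* Hypothetical result of reading the dirent at p_diroffset in the parent. */
struct dirent_view {
	bool		present;
	const char	*name;
	size_t		namelen;
	uint64_t	ino;
};

/*
 * A parent pointer on @child_ino is consistent only if the dirent at its
 * diroffset still exists, carries the same filename and points back at the
 * child: the {diroffset, filename, child_inode} tuple.
 */
static bool
pptr_matches_dirent(const struct pptr *pp, uint64_t child_ino,
		    const struct dirent_view *de)
{
	assert(pp != NULL && de != NULL);
	if (!de->present)
		return false;		/* offset no longer in use */
	if (de->namelen != pp->namelen ||
	    memcmp(de->name, pp->name, pp->namelen) != 0)
		return false;		/* offset reused for another name */
	return de->ino == child_ino;	/* same name, possibly another inode */
}
```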

[achender: rebased, changed __uint32_t to xfs_dir2_dataptr_t,
           changed p_ino to xfs_ino_t and p_namelen to uint8_t,
           moved to xfs_da_format for xfs_dir2_dataptr_t]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_da_format.h | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
index 3dc03968bba6..b02b67f1999e 100644
--- a/fs/xfs/libxfs/xfs_da_format.h
+++ b/fs/xfs/libxfs/xfs_da_format.h
@@ -805,4 +805,29 @@ static inline unsigned int xfs_dir2_dirblock_bytes(struct xfs_sb *sbp)
 xfs_failaddr_t xfs_da3_blkinfo_verify(struct xfs_buf *bp,
 				      struct xfs_da3_blkinfo *hdr3);
 
+/*
+ * Parent pointer attribute format definition
+ *
+ * EA name encodes the parent inode number, generation and the offset of
+ * the dirent that points to the child inode. The EA value contains the
+ * same name as the dirent in the parent directory.
+ */
+struct xfs_parent_name_rec {
+	__be64  p_ino;
+	__be32  p_gen;
+	__be32  p_diroffset;
+};
+
+/*
+ * incore version of the above, also contains name pointers so callers
+ * can pass/obtain all the parent pointer information in a single structure
+ */
+struct xfs_parent_name_irec {
+	xfs_ino_t		p_ino;
+	uint32_t		p_gen;
+	xfs_dir2_dataptr_t	p_diroffset;
+	const char		*p_name;
+	uint8_t			p_namelen;
+};
+
 #endif /* __XFS_DA_FORMAT_H__ */
-- 
2.25.1


* [PATCH RESEND v2 10/18] xfs: Add xfs_verify_pptr
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (8 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 09/18] xfs: define parent pointer xattr format Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-09 16:59   ` Darrick J. Wong
  2022-08-04 19:40 ` [PATCH RESEND v2 11/18] xfs: extend transaction reservations for parent attributes Allison Henderson
                   ` (9 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC
  To: linux-xfs

Attribute names of parent pointers are not strings, so we need to modify
xfs_attr_namecheck to verify parent pointer records when the
XFS_ATTR_PARENT flag is set.
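A userspace sketch of the dispatch this patch adds to xfs_attr_namecheck(): names carrying XFS_ATTR_PARENT must be exactly sizeof(struct xfs_parent_name_rec) bytes and pass a structural check, while everything else goes through the string rules (shorter than MAXNAMELEN, no embedded NUL). The `toy_*` helpers are hypothetical, and `TOY_MAX_INO` is an assumed stand-in for xfs_verify_ino(), which really checks the inode number against filesystem geometry; only the inode check is modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XFS_ATTR_PARENT	(1u << 3)	/* from patch 08 of this series */
#define MAXNAMELEN	256		/* includes the trailing NUL */
#define PPTR_REC_LEN	16		/* sizeof(struct xfs_parent_name_rec) */
#define TOY_MAX_INO	1000000ULL	/* assumption: toy inode limit */

/* Load @n big-endian bytes from @p. */
static uint64_t
load_be(const unsigned char *p, int n)
{
	uint64_t v = 0;

	for (int i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Hypothetical structural check: the p_ino field must look valid. */
static bool
toy_verify_pptr(const unsigned char *rec)
{
	uint64_t p_ino = load_be(rec, 8);

	return p_ino != 0 && p_ino < TOY_MAX_INO;
}

/* String names: short enough and free of embedded NULs. */
static bool
toy_str_namecheck(const void *name, size_t length)
{
	if (length >= MAXNAMELEN)
		return false;
	return memchr(name, 0, length) == NULL;
}

static bool
toy_attr_namecheck(const void *name, size_t length, unsigned int flags)
{
	assert(name != NULL);
	if (flags & XFS_ATTR_PARENT) {
		if (length != PPTR_REC_LEN)
			return false;
		return toy_verify_pptr(name);
	}
	return toy_str_namecheck(name, length);
}
```

The flag matters because a binary record usually contains zero bytes, so the string check alone would reject a perfectly valid parent pointer.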

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.c | 43 +++++++++++++++++++++++++++++++++++++---
 fs/xfs/libxfs/xfs_attr.h |  3 ++-
 fs/xfs/scrub/attr.c      |  2 +-
 fs/xfs/xfs_attr_item.c   |  6 ++++--
 fs/xfs/xfs_attr_list.c   | 17 +++++++++++-----
 5 files changed, 59 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 8df80d91399b..2ef3262f21e8 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -1567,9 +1567,29 @@ xfs_attr_node_get(
 	return error;
 }
 
-/* Returns true if the attribute entry name is valid. */
-bool
-xfs_attr_namecheck(
+/*
+ * Verify parent pointer attribute is valid.
+ * Return true on success or false on failure
+ */
+STATIC bool
+xfs_verify_pptr(struct xfs_mount *mp, struct xfs_parent_name_rec *rec)
+{
+	xfs_ino_t p_ino = (xfs_ino_t)be64_to_cpu(rec->p_ino);
+	xfs_dir2_dataptr_t p_diroffset =
+		(xfs_dir2_dataptr_t)be32_to_cpu(rec->p_diroffset);
+
+	if (!xfs_verify_ino(mp, p_ino))
+		return false;
+
+	if (p_diroffset > XFS_DIR2_MAX_DATAPTR)
+		return false;
+
+	return true;
+}
+
+/* Returns true if the string attribute entry name is valid. */
+static bool
+xfs_str_attr_namecheck(
 	const void	*name,
 	size_t		length)
 {
@@ -1584,6 +1604,23 @@ xfs_attr_namecheck(
 	return !memchr(name, 0, length);
 }
 
+/* Returns true if the attribute entry name is valid. */
+bool
+xfs_attr_namecheck(
+	struct xfs_mount	*mp,
+	const void		*name,
+	size_t			length,
+	int			flags)
+{
+	if (flags & XFS_ATTR_PARENT) {
+		if (length != sizeof(struct xfs_parent_name_rec))
+			return false;
+		return xfs_verify_pptr(mp, (struct xfs_parent_name_rec *)name);
+	}
+
+	return xfs_str_attr_namecheck(name, length);
+}
+
 int __init
 xfs_attr_intent_init_cache(void)
 {
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 81be9b3e4004..af92cc57e7d8 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -547,7 +547,8 @@ int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
-bool xfs_attr_namecheck(const void *name, size_t length);
+bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
+			int flags);
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
 			 unsigned int *total);
diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
index b6f0c9f3f124..d3e75c077fab 100644
--- a/fs/xfs/scrub/attr.c
+++ b/fs/xfs/scrub/attr.c
@@ -128,7 +128,7 @@ xchk_xattr_listent(
 	}
 
 	/* Does this name make sense? */
-	if (!xfs_attr_namecheck(name, namelen)) {
+	if (!xfs_attr_namecheck(sx->sc->mp, name, namelen, flags)) {
 		xchk_fblock_set_corrupt(sx->sc, XFS_ATTR_FORK, args.blkno);
 		return;
 	}
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index c13d724a3e13..69856814c066 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -587,7 +587,8 @@ xfs_attri_item_recover(
 	 */
 	attrp = &attrip->attri_format;
 	if (!xfs_attri_validate(mp, attrp) ||
-	    !xfs_attr_namecheck(nv->name.i_addr, nv->name.i_len))
+	    !xfs_attr_namecheck(mp, nv->name.i_addr, nv->name.i_len,
+				attrp->alfi_attr_filter))
 		return -EFSCORRUPTED;
 
 	error = xlog_recover_iget(mp,  attrp->alfi_ino, &ip);
@@ -727,7 +728,8 @@ xlog_recover_attri_commit_pass2(
 		return -EFSCORRUPTED;
 	}
 
-	if (!xfs_attr_namecheck(attr_name, attri_formatp->alfi_name_len)) {
+	if (!xfs_attr_namecheck(mp, attr_name, attri_formatp->alfi_name_len,
+				attri_formatp->alfi_attr_filter)) {
 		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
 		return -EFSCORRUPTED;
 	}
diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
index 99bbbe1a0e44..a51f7f13a352 100644
--- a/fs/xfs/xfs_attr_list.c
+++ b/fs/xfs/xfs_attr_list.c
@@ -58,9 +58,13 @@ xfs_attr_shortform_list(
 	struct xfs_attr_sf_sort		*sbuf, *sbp;
 	struct xfs_attr_shortform	*sf;
 	struct xfs_attr_sf_entry	*sfe;
+	struct xfs_mount		*mp;
 	int				sbsize, nsbuf, count, i;
 	int				error = 0;
 
+	ASSERT(context != NULL);
+	ASSERT(dp != NULL);
+	mp = dp->i_mount;
 	sf = (struct xfs_attr_shortform *)dp->i_af.if_u1.if_data;
 	ASSERT(sf != NULL);
 	if (!sf->hdr.count)
@@ -82,8 +86,9 @@ xfs_attr_shortform_list(
 	     (dp->i_af.if_bytes + sf->hdr.count * 16) < context->bufsize)) {
 		for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
 			if (XFS_IS_CORRUPT(context->dp->i_mount,
-					   !xfs_attr_namecheck(sfe->nameval,
-							       sfe->namelen)))
+					   !xfs_attr_namecheck(mp, sfe->nameval,
+							       sfe->namelen,
+							       sfe->flags)))
 				return -EFSCORRUPTED;
 			context->put_listent(context,
 					     sfe->flags,
@@ -174,8 +179,9 @@ xfs_attr_shortform_list(
 			cursor->offset = 0;
 		}
 		if (XFS_IS_CORRUPT(context->dp->i_mount,
-				   !xfs_attr_namecheck(sbp->name,
-						       sbp->namelen))) {
+				   !xfs_attr_namecheck(mp, sbp->name,
+						       sbp->namelen,
+						       sbp->flags))) {
 			error = -EFSCORRUPTED;
 			goto out;
 		}
@@ -465,7 +471,8 @@ xfs_attr3_leaf_list_int(
 		}
 
 		if (XFS_IS_CORRUPT(context->dp->i_mount,
-				   !xfs_attr_namecheck(name, namelen)))
+				   !xfs_attr_namecheck(mp, name, namelen,
+						       entry->flags)))
 			return -EFSCORRUPTED;
 		context->put_listent(context, entry->flags,
 					      name, namelen, valuelen);
-- 
2.25.1


* [PATCH RESEND v2 11/18] xfs: extend transaction reservations for parent attributes
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (9 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 10/18] xfs: Add xfs_verify_pptr Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-09 17:48   ` Darrick J. Wong
  2022-08-04 19:40 ` [PATCH RESEND v2 12/18] xfs: parent pointer attribute creation Allison Henderson
                   ` (8 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC
  To: linux-xfs

We need to add, remove or modify parent pointer attributes during
create/link/unlink/rename operations atomically with the dirents in the
parent directories being modified. This means they need to be modified
in the same transaction as the parent directories, and so we need to add
the required space for the attribute modifications to the transaction
reservations.

[achender: rebased]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_trans_resv.c | 105 +++++++++++++++++++++++++++------
 1 file changed, 86 insertions(+), 19 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
index e9913c2c5a24..b43ac4be7564 100644
--- a/fs/xfs/libxfs/xfs_trans_resv.c
+++ b/fs/xfs/libxfs/xfs_trans_resv.c
@@ -909,24 +909,67 @@ xfs_calc_sb_reservation(
 	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
 }
 
-void
-xfs_trans_resv_calc(
-	struct xfs_mount	*mp,
-	struct xfs_trans_resv	*resp)
+STATIC void
+xfs_calc_parent_ptr_reservations(
+	struct xfs_mount     *mp)
 {
-	int			logcount_adj = 0;
+	struct xfs_trans_resv   *resp = M_RES(mp);
 
-	/*
-	 * The following transactions are logged in physical format and
-	 * require a permanent reservation on space.
-	 */
-	resp->tr_write.tr_logres = xfs_calc_write_reservation(mp, false);
-	resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT;
-	resp->tr_write.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+	/* Calculate extra space needed for parent pointer attributes */
+	if (!xfs_has_parent(mp))
+		return;
 
-	resp->tr_itruncate.tr_logres = xfs_calc_itruncate_reservation(mp, false);
-	resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
-	resp->tr_itruncate.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+	/* rename can add/remove/modify 4 parent attributes */
+	resp->tr_rename.tr_logres += 4 * max(resp->tr_attrsetm.tr_logres,
+					 resp->tr_attrrm.tr_logres);
+	resp->tr_rename.tr_logcount += 4 * max(resp->tr_attrsetm.tr_logcount,
+					   resp->tr_attrrm.tr_logcount);
+
+	/* create will add 1 parent attribute */
+	resp->tr_create.tr_logres += resp->tr_attrsetm.tr_logres;
+	resp->tr_create.tr_logcount += resp->tr_attrsetm.tr_logcount;
+
+	/* mkdir will add 1 parent attribute */
+	resp->tr_mkdir.tr_logres += resp->tr_attrsetm.tr_logres;
+	resp->tr_mkdir.tr_logcount += resp->tr_attrsetm.tr_logcount;
+
+	/* link will add 1 parent attribute */
+	resp->tr_link.tr_logres += resp->tr_attrsetm.tr_logres;
+	resp->tr_link.tr_logcount += resp->tr_attrsetm.tr_logcount;
+
+	/* symlink will add 1 parent attribute */
+	resp->tr_symlink.tr_logres += resp->tr_attrsetm.tr_logres;
+	resp->tr_symlink.tr_logcount += resp->tr_attrsetm.tr_logcount;
+
+	/* remove will remove 1 parent attribute */
+	resp->tr_remove.tr_logres += resp->tr_attrrm.tr_logres;
+	resp->tr_remove.tr_logcount += resp->tr_attrrm.tr_logcount;
+}
+
+/*
+ * Namespace reservations.
+ *
+ * These get tricky when parent pointers are enabled as we have attribute
+ * modifications occurring from within these transactions. Rather than confuse
+ * each of these reservation calculations with the conditional attribute
+ * reservations, add them here in a clear and concise manner. This assumes that
+ * the attribute reservations have already been calculated.
+ *
+ * Note that we only include the static attribute reservation here; the runtime
+ * reservation will have to be modified by the size of the attributes being
+ * added/removed/modified. See the comments on the attribute reservation
+ * calculations for more details.
+ *
+ * Note for rename: rename will vastly overestimate requirements. This will be
+ * addressed later when modifications are made to ensure parent attribute
+ * modifications can be done atomically with the rename operation.
+ */
+STATIC void
+xfs_calc_namespace_reservations(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	ASSERT(resp->tr_attrsetm.tr_logres > 0);
 
 	resp->tr_rename.tr_logres = xfs_calc_rename_reservation(mp);
 	resp->tr_rename.tr_logcount = XFS_RENAME_LOG_COUNT;
@@ -948,15 +991,37 @@ xfs_trans_resv_calc(
 	resp->tr_create.tr_logcount = XFS_CREATE_LOG_COUNT;
 	resp->tr_create.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
+	resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp);
+	resp->tr_mkdir.tr_logcount = XFS_MKDIR_LOG_COUNT;
+	resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	xfs_calc_parent_ptr_reservations(mp);
+}
+
+void
+xfs_trans_resv_calc(
+	struct xfs_mount	*mp,
+	struct xfs_trans_resv	*resp)
+{
+	int			logcount_adj = 0;
+
+	/*
+	 * The following transactions are logged in physical format and
+	 * require a permanent reservation on space.
+	 */
+	resp->tr_write.tr_logres = xfs_calc_write_reservation(mp, false);
+	resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT;
+	resp->tr_write.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
+	resp->tr_itruncate.tr_logres = xfs_calc_itruncate_reservation(mp, false);
+	resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
+	resp->tr_itruncate.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
+
 	resp->tr_create_tmpfile.tr_logres =
 			xfs_calc_create_tmpfile_reservation(mp);
 	resp->tr_create_tmpfile.tr_logcount = XFS_CREATE_TMPFILE_LOG_COUNT;
 	resp->tr_create_tmpfile.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
-	resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp);
-	resp->tr_mkdir.tr_logcount = XFS_MKDIR_LOG_COUNT;
-	resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
-
 	resp->tr_ifree.tr_logres = xfs_calc_ifree_reservation(mp);
 	resp->tr_ifree.tr_logcount = XFS_INACTIVE_LOG_COUNT;
 	resp->tr_ifree.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
@@ -986,6 +1051,8 @@ xfs_trans_resv_calc(
 	resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT;
 	resp->tr_qm_dqalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
 
+	xfs_calc_namespace_reservations(mp, resp);
+
 	/*
 	 * The following transactions are logged in logical format with
 	 * a default log count.
-- 
2.25.1


* [PATCH RESEND v2 12/18] xfs: parent pointer attribute creation
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (10 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 11/18] xfs: extend transaction reservations for parent attributes Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-09 18:01   ` Darrick J. Wong
  2022-08-04 19:40 ` [PATCH RESEND v2 13/18] xfs: add parent attributes to link Allison Henderson
                   ` (7 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC
  To: linux-xfs

Add a parent pointer attribute during xfs_create, along with subroutines
to initialize parent pointer attributes.
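A hypothetical sketch of what initializing a parent attribute amounts to: the 16-byte record becomes the attribute name, the child's filename becomes the value, and the XFS_ATTR_PARENT filter routes it into the parent namespace. `struct toy_attr_args` is an assumed cut-down stand-in for struct xfs_da_args, not its real layout, and `toy_parent_args_init()` is not a function in this series; the 255-byte bound reflects the note in xfs_parent.c that the value is a single filename component.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define XFS_ATTR_PARENT	(1u << 3)	/* from patch 08 of this series */
#define PPTR_REC_LEN	16		/* sizeof(struct xfs_parent_name_rec) */
#define TOY_MAXNAMELEN	255		/* longest dirent name, no NUL */

/* Hypothetical subset of what an attr set request needs to carry. */
struct toy_attr_args {
	const void	*name;
	size_t		namelen;
	const void	*value;
	size_t		valuelen;
	unsigned int	attr_filter;
};

/*
 * Describe one parent pointer xattr: the record is the EA name and the
 * dirent name is the EA value.
 */
static int
toy_parent_args_init(struct toy_attr_args *args,
		     const unsigned char rec[PPTR_REC_LEN],
		     const char *fname, size_t fnamelen)
{
	assert(args != NULL && rec != NULL && fname != NULL);
	if (fnamelen == 0 || fnamelen > TOY_MAXNAMELEN)
		return -EINVAL;
	args->name = rec;
	args->namelen = PPTR_REC_LEN;
	args->value = fname;
	args->valuelen = fnamelen;
	args->attr_filter = XFS_ATTR_PARENT;
	return 0;
}
```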

[bfoster: rebase, use VFS inode generation]
[achender: rebased, changed __uint32_t to xfs_dir2_dataptr_t,
           fixed some null pointer bugs,
           merged error handling patch,
           remove unnecessary ENOSPC handling in xfs_attr_set_first_parent]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/Makefile            |   1 +
 fs/xfs/libxfs/xfs_attr.c   |   4 +-
 fs/xfs/libxfs/xfs_attr.h   |   4 +-
 fs/xfs/libxfs/xfs_parent.c | 134 +++++++++++++++++++++++++++++++++++++
 fs/xfs/libxfs/xfs_parent.h |  34 ++++++++++
 fs/xfs/xfs_inode.c         |  37 ++++++++--
 fs/xfs/xfs_xattr.c         |   2 +-
 fs/xfs/xfs_xattr.h         |   1 +
 8 files changed, 208 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index 1131dd01e4fe..caeea8d968ba 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -40,6 +40,7 @@ xfs-y				+= $(addprefix libxfs/, \
 				   xfs_inode_fork.o \
 				   xfs_inode_buf.o \
 				   xfs_log_rlimit.o \
+				   xfs_parent.o \
 				   xfs_ag_resv.o \
 				   xfs_rmap.o \
 				   xfs_rmap_btree.o \
diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 2ef3262f21e8..0a458ea7051f 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -880,7 +880,7 @@ xfs_attr_lookup(
 	return error;
 }
 
-static int
+int
 xfs_attr_intent_init(
 	struct xfs_da_args	*args,
 	unsigned int		op_flags,	/* op flag (set or remove) */
@@ -898,7 +898,7 @@ xfs_attr_intent_init(
 }
 
 /* Sets an attribute for an inode as a deferred operation */
-static int
+int
 xfs_attr_defer_add(
 	struct xfs_da_args	*args)
 {
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index af92cc57e7d8..b47417b5172f 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -544,6 +544,7 @@ int xfs_inode_hasattr(struct xfs_inode *ip);
 bool xfs_attr_is_leaf(struct xfs_inode *ip);
 int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
+int xfs_attr_defer_add(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
@@ -552,7 +553,8 @@ bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
 int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
 void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
 			 unsigned int *total);
-
+int xfs_attr_intent_init(struct xfs_da_args *args, unsigned int op_flags,
+			 struct xfs_attr_intent  **attr);
 /*
  * Check to see if the attr should be upgraded from non-existent or shortform to
  * single-leaf-block attribute list.
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
new file mode 100644
index 000000000000..4ab531c77d7d
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -0,0 +1,134 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Oracle, Inc.
+ * All rights reserved.
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_format.h"
+#include "xfs_da_format.h"
+#include "xfs_log_format.h"
+#include "xfs_shared.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_inode.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_trans.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr_sf.h"
+#include "xfs_bmap.h"
+#include "xfs_defer.h"
+#include "xfs_log.h"
+#include "xfs_xattr.h"
+#include "xfs_parent.h"
+
+/*
+ * Parent pointer attribute handling.
+ *
+ * Because the attribute value is a filename component, it will never be longer
+ * than 255 bytes. This means the attribute will always be a local format
+ * attribute, as xfs_attr_leaf_entsize_local_max() for v5 filesystems will
+ * always be larger than this (max is 75% of block size).
+ *
+ * Creating a new parent attribute will always create a new attribute - there
+ * should never, ever be an existing attribute in the tree for a new inode.
+ * ENOSPC behavior is problematic - creating the inode without the parent
+ * pointer is effectively a corruption, so we allow parent attribute creation
+ * to dip into the reserve block pool to avoid unexpected ENOSPC errors from
+ * occurring.
+ */
+
+
+/* Initializes an xfs_parent_name_rec to be stored as an attribute name */
+void
+xfs_init_parent_name_rec(
+	struct xfs_parent_name_rec	*rec,
+	struct xfs_inode		*ip,
+	uint32_t			p_diroffset)
+{
+	xfs_ino_t			p_ino = ip->i_ino;
+	uint32_t			p_gen = VFS_I(ip)->i_generation;
+
+	rec->p_ino = cpu_to_be64(p_ino);
+	rec->p_gen = cpu_to_be32(p_gen);
+	rec->p_diroffset = cpu_to_be32(p_diroffset);
+}
+
+/* Initializes a xfs_parent_name_irec from an xfs_parent_name_rec */
+void
+xfs_init_parent_name_irec(
+	struct xfs_parent_name_irec	*irec,
+	struct xfs_parent_name_rec	*rec)
+{
+	irec->p_ino = be64_to_cpu(rec->p_ino);
+	irec->p_gen = be32_to_cpu(rec->p_gen);
+	irec->p_diroffset = be32_to_cpu(rec->p_diroffset);
+}
+
+int
+xfs_parent_init(
+	xfs_mount_t                     *mp,
+	xfs_inode_t			*ip,
+	struct xfs_name			*target_name,
+	struct xfs_parent_defer		**parentp)
+{
+	struct xfs_parent_defer		*parent;
+	int				error;
+
+	if (!xfs_has_parent(mp))
+		return 0;
+
+	error = xfs_attr_grab_log_assist(mp);
+	if (error)
+		return error;
+
+	parent = kzalloc(sizeof(*parent), GFP_KERNEL);
+	if (!parent)
+		return -ENOMEM;
+
+	/* init parent da_args */
+	parent->args.dp = ip;
+	parent->args.geo = mp->m_attr_geo;
+	parent->args.whichfork = XFS_ATTR_FORK;
+	parent->args.attr_filter = XFS_ATTR_PARENT;
+	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED;
+	parent->args.name = (const uint8_t *)&parent->rec;
+	parent->args.namelen = sizeof(struct xfs_parent_name_rec);
+
+	if (target_name) {
+		parent->args.value = (void *)target_name->name;
+		parent->args.valuelen = target_name->len;
+	}
+
+	*parentp = parent;
+	return 0;
+}
+
+int
+xfs_parent_defer_add(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	struct xfs_parent_defer	*parent,
+	xfs_dir2_dataptr_t	diroffset)
+{
+	struct xfs_da_args	*args = &parent->args;
+
+	xfs_init_parent_name_rec(&parent->rec, ip, diroffset);
+	args->trans = tp;
+	args->hashval = xfs_da_hashname(args->name, args->namelen);
+	return xfs_attr_defer_add(args);
+}
+
+void
+xfs_parent_cancel(
+	xfs_mount_t		*mp,
+	struct xfs_parent_defer *parent)
+{
+	xlog_drop_incompat_feat(mp->m_log);
+	kfree(parent);
+}
+
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
new file mode 100644
index 000000000000..21a350b97ed5
--- /dev/null
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -0,0 +1,34 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022 Oracle, Inc.
+ * All Rights Reserved.
+ */
+#ifndef	__XFS_PARENT_H__
+#define	__XFS_PARENT_H__
+
+/*
+ * Dynamically allocated structure used to wrap the data needed to pass
+ * through the defer ops machinery
+ */
+struct xfs_parent_defer {
+	struct xfs_parent_name_rec	rec;
+	struct xfs_da_args		args;
+};
+
+/*
+ * Parent pointer attribute prototypes
+ */
+void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
+			      struct xfs_inode *ip,
+			      uint32_t p_diroffset);
+void xfs_init_parent_name_irec(struct xfs_parent_name_irec *irec,
+			       struct xfs_parent_name_rec *rec);
+int xfs_parent_init(xfs_mount_t *mp, xfs_inode_t *ip,
+		    struct xfs_name *target_name,
+		    struct xfs_parent_defer **parentp);
+int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_inode *ip,
+			 struct xfs_parent_defer *parent,
+			 xfs_dir2_dataptr_t diroffset);
+void xfs_parent_cancel(xfs_mount_t *mp, struct xfs_parent_defer *parent);
+
+#endif	/* __XFS_PARENT_H__ */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 09876ba10a42..ef993c3a8963 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -37,6 +37,8 @@
 #include "xfs_reflink.h"
 #include "xfs_ag.h"
 #include "xfs_log_priv.h"
+#include "xfs_parent.h"
+#include "xfs_xattr.h"
 
 struct kmem_cache *xfs_inode_cache;
 
@@ -950,7 +952,7 @@ xfs_bumplink(
 int
 xfs_create(
 	struct user_namespace	*mnt_userns,
-	xfs_inode_t		*dp,
+	struct xfs_inode	*dp,
 	struct xfs_name		*name,
 	umode_t			mode,
 	dev_t			rdev,
@@ -962,7 +964,7 @@ xfs_create(
 	struct xfs_inode	*ip = NULL;
 	struct xfs_trans	*tp = NULL;
 	int			error;
-	bool                    unlock_dp_on_error = false;
+	bool			unlock_dp_on_error = false;
 	prid_t			prid;
 	struct xfs_dquot	*udqp = NULL;
 	struct xfs_dquot	*gdqp = NULL;
@@ -970,6 +972,8 @@ xfs_create(
 	struct xfs_trans_res	*tres;
 	uint			resblks;
 	xfs_ino_t		ino;
+	xfs_dir2_dataptr_t	diroffset;
+	struct xfs_parent_defer	*parent = NULL;
 
 	trace_xfs_create(dp, name);
 
@@ -996,6 +1000,12 @@ xfs_create(
 		tres = &M_RES(mp)->tr_create;
 	}
 
+	if (xfs_has_parent(mp)) {
+		error = xfs_parent_init(mp, dp, name, &parent);
+		if (error)
+			goto out_release_dquots;
+	}
+
 	/*
 	 * Initially assume that the file does not exist and
 	 * reserve the resources for that case.  If that is not
@@ -1011,7 +1021,7 @@ xfs_create(
 				resblks, &tp);
 	}
 	if (error)
-		goto out_release_dquots;
+		goto drop_incompat;
 
 	xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
 	unlock_dp_on_error = true;
@@ -1021,6 +1031,7 @@ xfs_create(
 	 * entry pointing to them, but a directory also the "." entry
 	 * pointing to itself.
 	 */
+	init_xattrs |= xfs_has_parent(mp);
 	error = xfs_dialloc(&tp, dp->i_ino, mode, &ino);
 	if (!error)
 		error = xfs_init_new_inode(mnt_userns, tp, dp, ino, mode,
@@ -1035,11 +1046,12 @@ xfs_create(
 	 * the transaction cancel unlocking dp so don't do it explicitly in the
 	 * error path.
 	 */
-	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, dp, 0);
 	unlock_dp_on_error = false;
 
 	error = xfs_dir_createname(tp, dp, name, ip->i_ino,
-				   resblks - XFS_IALLOC_SPACE_RES(mp), NULL);
+				   resblks - XFS_IALLOC_SPACE_RES(mp),
+				   &diroffset);
 	if (error) {
 		ASSERT(error != -ENOSPC);
 		goto out_trans_cancel;
@@ -1055,6 +1067,17 @@ xfs_create(
 		xfs_bumplink(tp, dp);
 	}
 
+	/*
+	 * If we have parent pointers, we need to add the attribute containing
+	 * the parent information now.
+	 */
+	if (parent) {
+		parent->args.dp	= ip;
+		error = xfs_parent_defer_add(tp, dp, parent, diroffset);
+		if (error)
+			goto out_trans_cancel;
+	}
+
 	/*
 	 * If this is a synchronous mount, make sure that the
 	 * create transaction goes to disk before returning to
@@ -1080,6 +1103,7 @@ xfs_create(
 
 	*ipp = ip;
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
 	return 0;
 
  out_trans_cancel:
@@ -1094,6 +1118,9 @@ xfs_create(
 		xfs_finish_inode_setup(ip);
 		xfs_irele(ip);
 	}
+ drop_incompat:
+	if (parent)
+		xfs_parent_cancel(mp, parent);
  out_release_dquots:
 	xfs_qm_dqrele(udqp);
 	xfs_qm_dqrele(gdqp);
diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
index c325a28b89a8..d9067c5f6bd6 100644
--- a/fs/xfs/xfs_xattr.c
+++ b/fs/xfs/xfs_xattr.c
@@ -27,7 +27,7 @@
  * they must release the permission by calling xlog_drop_incompat_feat
  * when they're done.
  */
-static inline int
+int
 xfs_attr_grab_log_assist(
 	struct xfs_mount	*mp)
 {
diff --git a/fs/xfs/xfs_xattr.h b/fs/xfs/xfs_xattr.h
index 2b09133b1b9b..3fd6520a4d69 100644
--- a/fs/xfs/xfs_xattr.h
+++ b/fs/xfs/xfs_xattr.h
@@ -7,6 +7,7 @@
 #define __XFS_XATTR_H__
 
 int xfs_attr_change(struct xfs_da_args *args);
+int xfs_attr_grab_log_assist(struct xfs_mount *mp);
 
 extern const struct xattr_handler *xfs_xattr_handlers[];
 
-- 
2.25.1
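
The name/value split described in the xfs_parent.c comment can be illustrated with a small userspace sketch. It assumes the on-disk attribute name is the three fields of struct xfs_parent_name_rec packed in the order shown in the patch (8-byte parent inode, 4-byte generation, 4-byte directory offset, all big-endian, 16 bytes in total). pptr_encode(), pptr_decode() and pptr_fits_local() are hypothetical helpers, not XFS functions, and the size check ignores per-entry header bytes.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace mirror of the parent pointer attribute name.
 * Assumed layout (field order from struct xfs_parent_name_rec):
 *   bytes  0..7  parent inode number, big-endian
 *   bytes  8..11 parent generation, big-endian
 *   bytes 12..15 directory offset of the entry, big-endian
 */
#define PPTR_NAME_LEN	16
#define PPTR_MAX_VALUE	255	/* longest filename component */

struct pptr_irec {			/* CPU-endian form */
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint32_t	p_diroffset;
};

/* Store the low nbytes of v at p, most significant byte first. */
static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
	assert(nbytes > 0 && nbytes <= 8);
	for (int i = nbytes - 1; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Read nbytes at p as a big-endian integer. */
static uint64_t get_be(const uint8_t *p, int nbytes)
{
	uint64_t v = 0;

	assert(nbytes > 0 && nbytes <= 8);
	for (int i = 0; i < nbytes; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Counterpart of xfs_init_parent_name_rec(): CPU-endian -> on-disk. */
static void pptr_encode(uint8_t name[PPTR_NAME_LEN],
			const struct pptr_irec *irec)
{
	put_be(name, irec->p_ino, 8);
	put_be(name + 8, irec->p_gen, 4);
	put_be(name + 12, irec->p_diroffset, 4);
}

/* Counterpart of xfs_init_parent_name_irec(): on-disk -> CPU-endian. */
static void pptr_decode(struct pptr_irec *irec,
			const uint8_t name[PPTR_NAME_LEN])
{
	irec->p_ino = get_be(name, 8);
	irec->p_gen = (uint32_t)get_be(name + 8, 4);
	irec->p_diroffset = (uint32_t)get_be(name + 12, 4);
}

/*
 * Rough form of the "always local" argument: name plus the largest
 * value versus 75% of the block size, ignoring per-entry header bytes.
 */
static int pptr_fits_local(unsigned int blocksize)
{
	return (unsigned int)(PPTR_NAME_LEN + PPTR_MAX_VALUE) <
	       blocksize * 3 / 4;
}
```

With a 1024-byte block, for example, the payload is at most 16 + 255 = 271 bytes against a 768-byte limit.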


* [PATCH RESEND v2 13/18] xfs: add parent attributes to link
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (11 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 12/18] xfs: parent pointer attribute creation Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-09 18:43   ` Darrick J. Wong
  2022-08-04 19:40 ` [PATCH RESEND v2 14/18] xfs: remove parent pointers in unlink Allison Henderson
                   ` (6 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC
  To: linux-xfs

This patch modifies xfs_link to add a parent pointer to the inode.

[bfoster: rebase, use VFS inode fields, fix xfs_bmap_finish() usage]
[achender: rebased, changed __uint32_t to xfs_dir2_dataptr_t,
           fixed null pointer bugs]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/xfs_inode.c | 43 ++++++++++++++++++++++++++++++++++---------
 1 file changed, 34 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index ef993c3a8963..6e5deb0d42c4 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1228,14 +1228,16 @@ xfs_create_tmpfile(
 
 int
 xfs_link(
-	xfs_inode_t		*tdp,
-	xfs_inode_t		*sip,
+	struct xfs_inode	*tdp,
+	struct xfs_inode	*sip,
 	struct xfs_name		*target_name)
 {
-	xfs_mount_t		*mp = tdp->i_mount;
-	xfs_trans_t		*tp;
+	struct xfs_mount	*mp = tdp->i_mount;
+	struct xfs_trans	*tp;
 	int			error, nospace_error = 0;
 	int			resblks;
+	xfs_dir2_dataptr_t	diroffset;
+	struct xfs_parent_defer	*parent = NULL;
 
 	trace_xfs_link(tdp, target_name);
 
@@ -1252,11 +1254,17 @@ xfs_link(
 	if (error)
 		goto std_return;
 
+	if (xfs_has_parent(mp)) {
+		error = xfs_parent_init(mp, sip, target_name, &parent);
+		if (error)
+			goto std_return;
+	}
+
 	resblks = XFS_LINK_SPACE_RES(mp, target_name->len);
 	error = xfs_trans_alloc_dir(tdp, &M_RES(mp)->tr_link, sip, &resblks,
 			&tp, &nospace_error);
 	if (error)
-		goto std_return;
+		goto drop_incompat;
 
 	/*
 	 * If we are using project inheritance, we only allow hard link
@@ -1289,14 +1297,26 @@ xfs_link(
 	}
 
 	error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino,
-				   resblks, NULL);
+				   resblks, &diroffset);
 	if (error)
-		goto error_return;
+		goto out_defer_cancel;
 	xfs_trans_ichgtime(tp, tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, tdp, XFS_ILOG_CORE);
 
 	xfs_bumplink(tp, sip);
 
+	/*
+	 * If we have parent pointers, we now need to add the parent record to
+	 * the attribute fork of the inode. If this is the initial parent
+	 * attribute, we need to create it correctly, otherwise we can just add
+	 * the parent to the inode.
+	 */
+	if (parent) {
+		error = xfs_parent_defer_add(tp, tdp, parent, diroffset);
+		if (error)
+			goto out_defer_cancel;
+	}
+
 	/*
 	 * If this is a synchronous mount, make sure that the
 	 * link transaction goes to disk before returning to
@@ -1310,11 +1330,16 @@ xfs_link(
 	xfs_iunlock(sip, XFS_ILOCK_EXCL);
 	return error;
 
- error_return:
+out_defer_cancel:
+	xfs_defer_cancel(tp);
+error_return:
 	xfs_trans_cancel(tp);
 	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
 	xfs_iunlock(sip, XFS_ILOCK_EXCL);
- std_return:
+drop_incompat:
+	if (parent)
+		xfs_parent_cancel(mp, parent);
+std_return:
 	if (error == -ENOSPC && nospace_error)
 		error = nospace_error;
 	return error;
-- 
2.25.1
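
The error labels added to xfs_link() form an unwind ladder: each failure jumps to the label that undoes only what has been acquired so far (deferred ops, then the transaction and inode locks, then the log-assist hold taken by xfs_parent_init()). The sketch below is a hypothetical userspace model of that ordering, not kernel code; counters stand in for the real objects, and it assumes the hold is also released on the success path, which the model does explicitly.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical resource state standing in for the real XFS objects. */
struct link_res {
	int  log_holds;	/* xfs_parent_init() / xfs_parent_cancel() */
	bool trans_live;	/* xfs_trans_alloc_dir() / commit or cancel */
	bool ilocked;	/* tdp and sip ILOCK_EXCL */
	int  queued;	/* deferred ops queued on the transaction */
};

/* Which step should fail, to exercise each error path. */
enum link_fail { LF_NONE, LF_ALLOC, LF_CREATENAME, LF_PPTR };

static int link_model(struct link_res *r, bool has_parent, enum link_fail f)
{
	int error = 0;

	assert(!r->trans_live);	/* each call starts with no transaction */

	if (has_parent)
		r->log_holds++;	/* xfs_parent_init() */

	if (f == LF_ALLOC) {
		error = -ENOSPC;
		goto drop_incompat;	/* only the hold to undo */
	}
	r->trans_live = true;
	r->ilocked = true;

	if (f == LF_CREATENAME) {
		error = -EIO;
		goto out_defer_cancel;
	}
	r->queued++;		/* directory entry added */

	if (has_parent) {
		if (f == LF_PPTR) {
			error = -ENOMEM;
			goto out_defer_cancel;
		}
		r->queued++;	/* xfs_parent_defer_add() */
	}

	/* Commit: deferred work finishes, then the inodes are unlocked. */
	r->queued = 0;
	r->trans_live = false;
	r->ilocked = false;
	/* Assumption: the hold is released once the commit is done. */
	if (has_parent)
		r->log_holds--;
	return 0;

out_defer_cancel:
	r->queued = 0;		/* xfs_defer_cancel() */
	r->trans_live = false;	/* xfs_trans_cancel() */
	r->ilocked = false;	/* xfs_iunlock() on both inodes */
drop_incompat:
	if (has_parent)
		r->log_holds--;	/* xfs_parent_cancel() */
	return error;
}
```

Whichever step fails, the model ends with no hold, no live transaction, no locks and nothing queued, which is the property the label ordering is meant to guarantee.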


* [PATCH RESEND v2 14/18] xfs: remove parent pointers in unlink
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (12 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 13/18] xfs: add parent attributes to link Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-09 18:45   ` Darrick J. Wong
  2022-08-04 19:40 ` [PATCH RESEND v2 15/18] xfs: Add parent pointers to rename Allison Henderson
                   ` (5 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC
  To: linux-xfs

This patch removes the parent pointer attribute during unlink.

[bfoster: rebase, use VFS inode generation]
[achender: rebased, changed __uint32_t to xfs_dir2_dataptr_t,
           implemented xfs_attr_remove_parent]

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/libxfs/xfs_attr.c   |  2 +-
 fs/xfs/libxfs/xfs_attr.h   |  1 +
 fs/xfs/libxfs/xfs_parent.c | 15 +++++++++++++++
 fs/xfs/libxfs/xfs_parent.h |  3 +++
 fs/xfs/xfs_inode.c         | 29 +++++++++++++++++++++++------
 5 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
index 0a458ea7051f..77513ff7e1ec 100644
--- a/fs/xfs/libxfs/xfs_attr.c
+++ b/fs/xfs/libxfs/xfs_attr.c
@@ -936,7 +936,7 @@ xfs_attr_defer_replace(
 }
 
 /* Removes an attribute for an inode as a deferred operation */
-static int
+int
 xfs_attr_defer_remove(
 	struct xfs_da_args	*args)
 {
diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index b47417b5172f..2e11e5e83941 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -545,6 +545,7 @@ bool xfs_attr_is_leaf(struct xfs_inode *ip);
 int xfs_attr_get_ilocked(struct xfs_da_args *args);
 int xfs_attr_get(struct xfs_da_args *args);
 int xfs_attr_defer_add(struct xfs_da_args *args);
+int xfs_attr_defer_remove(struct xfs_da_args *args);
 int xfs_attr_set(struct xfs_da_args *args);
 int xfs_attr_set_iter(struct xfs_attr_intent *attr);
 int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 4ab531c77d7d..03f03f731d02 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -123,6 +123,21 @@ xfs_parent_defer_add(
 	return xfs_attr_defer_add(args);
 }
 
+int
+xfs_parent_defer_remove(
+	struct xfs_trans	*tp,
+	struct xfs_inode	*ip,
+	struct xfs_parent_defer	*parent,
+	xfs_dir2_dataptr_t	diroffset)
+{
+	struct xfs_da_args	*args = &parent->args;
+
+	xfs_init_parent_name_rec(&parent->rec, ip, diroffset);
+	args->trans = tp;
+	args->hashval = xfs_da_hashname(args->name, args->namelen);
+	return xfs_attr_defer_remove(args);
+}
+
 void
 xfs_parent_cancel(
 	xfs_mount_t		*mp,
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 21a350b97ed5..67948f4b3834 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -29,6 +29,9 @@ int xfs_parent_init(xfs_mount_t *mp, xfs_inode_t *ip,
 int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_inode *ip,
 			 struct xfs_parent_defer *parent,
 			 xfs_dir2_dataptr_t diroffset);
+int xfs_parent_defer_remove(struct xfs_trans *tp, struct xfs_inode *ip,
+			    struct xfs_parent_defer *parent,
+			    xfs_dir2_dataptr_t diroffset);
 void xfs_parent_cancel(xfs_mount_t *mp, struct xfs_parent_defer *parent);
 
 #endif	/* __XFS_PARENT_H__ */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 6e5deb0d42c4..69bb67f2a252 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2464,16 +2464,18 @@ xfs_iunpin_wait(
  */
 int
 xfs_remove(
-	xfs_inode_t             *dp,
+	struct xfs_inode	*dp,
 	struct xfs_name		*name,
-	xfs_inode_t		*ip)
+	struct xfs_inode	*ip)
 {
-	xfs_mount_t		*mp = dp->i_mount;
-	xfs_trans_t             *tp = NULL;
+	struct xfs_mount	*mp = dp->i_mount;
+	struct xfs_trans	*tp = NULL;
 	int			is_dir = S_ISDIR(VFS_I(ip)->i_mode);
 	int			dontcare;
 	int                     error = 0;
 	uint			resblks;
+	xfs_dir2_dataptr_t	dir_offset;
+	struct xfs_parent_defer	*parent = NULL;
 
 	trace_xfs_remove(dp, name);
 
@@ -2488,6 +2490,12 @@ xfs_remove(
 	if (error)
 		goto std_return;
 
+	if (xfs_has_parent(mp)) {
+		error = xfs_parent_init(mp, ip, NULL, &parent);
+		if (error)
+			goto std_return;
+	}
+
 	/*
 	 * We try to get the real space reservation first, allowing for
 	 * directory btree deletion(s) implying possible bmap insert(s).  If we
@@ -2504,7 +2512,7 @@ xfs_remove(
 			&tp, &dontcare);
 	if (error) {
 		ASSERT(error != -ENOSPC);
-		goto std_return;
+		goto drop_incompat;
 	}
 
 	/*
@@ -2558,12 +2566,18 @@ xfs_remove(
 	if (error)
 		goto out_trans_cancel;
 
-	error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks, NULL);
+	error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks, &dir_offset);
 	if (error) {
 		ASSERT(error != -ENOENT);
 		goto out_trans_cancel;
 	}
 
+	if (xfs_has_parent(mp)) {
+		error = xfs_parent_defer_remove(tp, dp, parent, dir_offset);
+		if (error)
+			goto out_trans_cancel;
+	}
+
 	/*
 	 * If this is a synchronous mount, make sure that the
 	 * remove transaction goes to disk before returning to
@@ -2588,6 +2602,9 @@ xfs_remove(
  out_unlock:
 	xfs_iunlock(ip, XFS_ILOCK_EXCL);
 	xfs_iunlock(dp, XFS_ILOCK_EXCL);
+ drop_incompat:
+	if (parent)
+		xfs_parent_cancel(mp, parent);
  std_return:
 	return error;
 }
-- 
2.25.1
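
In xfs_remove() the parent pointer is initialised with a NULL target name, so the deferred removal is identified only by the (parent inode, generation, directory offset) record built from dp and dir_offset. A minimal in-memory sketch of that idea follows; every name in it is hypothetical. A hard-linked file carries one parent pointer per directory entry, and unlink drops exactly the one whose key matches, without needing the filename.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_PPTRS 8

struct pptr_key {
	uint64_t ino;		/* parent directory inode */
	uint32_t gen;		/* parent directory generation */
	uint32_t diroffset;	/* offset of the entry in the parent */
};

struct pptr_entry {
	bool used;
	struct pptr_key key;
	char name[256];		/* filename component, at most 255 bytes */
};

struct pptr_set {		/* parent pointers of one child inode */
	struct pptr_entry e[MAX_PPTRS];
};

static bool key_eq(const struct pptr_key *a, const struct pptr_key *b)
{
	return a->ino == b->ino && a->gen == b->gen &&
	       a->diroffset == b->diroffset;
}

/* Like create/link: record a (key, name) pair. */
static int pptr_add(struct pptr_set *s, struct pptr_key k, const char *name)
{
	size_t len;

	assert(name != NULL);
	len = strlen(name);
	if (len > 255)
		return -ENAMETOOLONG;
	for (int i = 0; i < MAX_PPTRS; i++) {
		if (!s->e[i].used) {
			s->e[i].used = true;
			s->e[i].key = k;
			memcpy(s->e[i].name, name, len + 1);
			return 0;
		}
	}
	return -ENOSPC;
}

/* Like unlink: remove by key alone; the name is not consulted. */
static int pptr_remove(struct pptr_set *s, struct pptr_key k)
{
	for (int i = 0; i < MAX_PPTRS; i++) {
		if (s->e[i].used && key_eq(&s->e[i].key, &k)) {
			s->e[i].used = false;
			return 0;
		}
	}
	return -ENOENT;
}

static int pptr_count(const struct pptr_set *s)
{
	int n = 0;

	for (int i = 0; i < MAX_PPTRS; i++)
		n += s->e[i].used;
	return n;
}
```

Two links in the same directory differ only in their directory offset, which is why xfs_dir_removename() has to report the offset of the entry it removed.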


* [PATCH RESEND v2 15/18] xfs: Add parent pointers to rename
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (13 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 14/18] xfs: remove parent pointers in unlink Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-09 18:49   ` Darrick J. Wong
  2022-08-04 19:40 ` [PATCH RESEND v2 16/18] xfs: Add the parent pointer support to the superblock version 5 Allison Henderson
                   ` (4 subsequent siblings)
  19 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC
  To: linux-xfs

This patch removes the old parent pointer attribute during the rename
operation and adds the updated parent pointer. If the rename replaces an
existing target, the target inode's parent pointer is removed as well.
In the case of xfs_cross_rename, we modify the routine so that it no
longer commits the transaction through xfs_finish_rename. The calling
xfs_rename function does this after the parent pointer updates have been
deferred.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/xfs_inode.c | 128 +++++++++++++++++++++++++++++++++------------
 1 file changed, 94 insertions(+), 34 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 69bb67f2a252..8a81b78b6dd7 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2776,7 +2776,7 @@ xfs_cross_rename(
 	}
 	xfs_trans_ichgtime(tp, dp1, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, dp1, XFS_ILOG_CORE);
-	return xfs_finish_rename(tp);
+	return 0;
 
 out_trans_abort:
 	xfs_trans_cancel(tp);
@@ -2834,26 +2834,31 @@ xfs_rename_alloc_whiteout(
  */
 int
 xfs_rename(
-	struct user_namespace	*mnt_userns,
-	struct xfs_inode	*src_dp,
-	struct xfs_name		*src_name,
-	struct xfs_inode	*src_ip,
-	struct xfs_inode	*target_dp,
-	struct xfs_name		*target_name,
-	struct xfs_inode	*target_ip,
-	unsigned int		flags)
+	struct user_namespace		*mnt_userns,
+	struct xfs_inode		*src_dp,
+	struct xfs_name			*src_name,
+	struct xfs_inode		*src_ip,
+	struct xfs_inode		*target_dp,
+	struct xfs_name			*target_name,
+	struct xfs_inode		*target_ip,
+	unsigned int			flags)
 {
-	struct xfs_mount	*mp = src_dp->i_mount;
-	struct xfs_trans	*tp;
-	struct xfs_inode	*wip = NULL;		/* whiteout inode */
-	struct xfs_inode	*inodes[__XFS_SORT_INODES];
-	int			i;
-	int			num_inodes = __XFS_SORT_INODES;
-	bool			new_parent = (src_dp != target_dp);
-	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
-	int			spaceres;
-	bool			retried = false;
-	int			error, nospace_error = 0;
+	struct xfs_mount		*mp = src_dp->i_mount;
+	struct xfs_trans		*tp;
+	struct xfs_inode		*wip = NULL;		/* whiteout inode */
+	struct xfs_inode		*inodes[__XFS_SORT_INODES];
+	int				i;
+	int				num_inodes = __XFS_SORT_INODES;
+	bool				new_parent = (src_dp != target_dp);
+	bool				src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
+	int				spaceres;
+	bool				retried = false;
+	int				error, nospace_error = 0;
+	xfs_dir2_dataptr_t		new_diroffset;
+	xfs_dir2_dataptr_t		old_diroffset;
+	struct xfs_parent_defer		*old_parent_ptr = NULL;
+	struct xfs_parent_defer		*new_parent_ptr = NULL;
+	struct xfs_parent_defer		*target_parent_ptr = NULL;
 
 	trace_xfs_rename(src_dp, target_dp, src_name, target_name);
 
@@ -2877,6 +2882,15 @@ xfs_rename(
 
 	xfs_sort_for_rename(src_dp, target_dp, src_ip, target_ip, wip,
 				inodes, &num_inodes);
+	if (xfs_has_parent(mp)) {
+		error = xfs_parent_init(mp, src_ip, NULL, &old_parent_ptr);
+		if (error)
+			goto out_release_wip;
+		error = xfs_parent_init(mp, src_ip, target_name,
+					&new_parent_ptr);
+		if (error)
+			goto out_release_wip;
+	}
 
 retry:
 	nospace_error = 0;
@@ -2889,7 +2903,7 @@ xfs_rename(
 				&tp);
 	}
 	if (error)
-		goto out_release_wip;
+		goto drop_incompat;
 
 	/*
 	 * Attach the dquots to the inodes
@@ -2911,14 +2925,14 @@ xfs_rename(
 	 * we can rely on either trans_commit or trans_cancel to unlock
 	 * them.
 	 */
-	xfs_trans_ijoin(tp, src_dp, XFS_ILOCK_EXCL);
+	xfs_trans_ijoin(tp, src_dp, 0);
 	if (new_parent)
-		xfs_trans_ijoin(tp, target_dp, XFS_ILOCK_EXCL);
-	xfs_trans_ijoin(tp, src_ip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, target_dp, 0);
+	xfs_trans_ijoin(tp, src_ip, 0);
 	if (target_ip)
-		xfs_trans_ijoin(tp, target_ip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, target_ip, 0);
 	if (wip)
-		xfs_trans_ijoin(tp, wip, XFS_ILOCK_EXCL);
+		xfs_trans_ijoin(tp, wip, 0);
 
 	/*
 	 * If we are using project inheritance, we only allow renames
@@ -2928,15 +2942,16 @@ xfs_rename(
 	if (unlikely((target_dp->i_diflags & XFS_DIFLAG_PROJINHERIT) &&
 		     target_dp->i_projid != src_ip->i_projid)) {
 		error = -EXDEV;
-		goto out_trans_cancel;
+		goto out_unlock;
 	}
 
 	/* RENAME_EXCHANGE is unique from here on. */
-	if (flags & RENAME_EXCHANGE)
-		return xfs_cross_rename(tp, src_dp, src_name, src_ip,
+	if (flags & RENAME_EXCHANGE) {
+		error = xfs_cross_rename(tp, src_dp, src_name, src_ip,
 					target_dp, target_name, target_ip,
 					spaceres);
-
+		goto out_pptr;
+	}
 	/*
 	 * Try to reserve quota to handle an expansion of the target directory.
 	 * We'll allow the rename to continue in reservationless mode if we hit
@@ -3052,7 +3067,7 @@ xfs_rename(
 		 * to account for the ".." reference from the new entry.
 		 */
 		error = xfs_dir_createname(tp, target_dp, target_name,
-					   src_ip->i_ino, spaceres, NULL);
+					   src_ip->i_ino, spaceres, &new_diroffset);
 		if (error)
 			goto out_trans_cancel;
 
@@ -3073,10 +3088,14 @@ xfs_rename(
 		 * name at the destination directory, remove it first.
 		 */
 		error = xfs_dir_replace(tp, target_dp, target_name,
-					src_ip->i_ino, spaceres, NULL);
+					src_ip->i_ino, spaceres, &new_diroffset);
 		if (error)
 			goto out_trans_cancel;
 
+		if (xfs_has_parent(mp))
+			error = xfs_parent_init(mp, target_ip, NULL,
+						&target_parent_ptr);
+
 		xfs_trans_ichgtime(tp, target_dp,
 					XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 
@@ -3146,26 +3165,67 @@ xfs_rename(
 	 */
 	if (wip)
 		error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino,
-					spaceres, NULL);
+					spaceres, &old_diroffset);
 	else
 		error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
-					   spaceres, NULL);
+					   spaceres, &old_diroffset);
 
 	if (error)
 		goto out_trans_cancel;
 
+out_pptr:
+	if (new_parent_ptr) {
+		error = xfs_parent_defer_add(tp, target_dp, new_parent_ptr,
+					     new_diroffset);
+		if (error)
+			goto out_trans_cancel;
+	}
+
+	if (old_parent_ptr) {
+		error = xfs_parent_defer_remove(tp, src_dp, old_parent_ptr,
+						old_diroffset);
+		if (error)
+			goto out_trans_cancel;
+	}
+
+	if (target_parent_ptr) {
+		error = xfs_parent_defer_remove(tp, target_dp,
+						target_parent_ptr,
+						new_diroffset);
+		if (error)
+			goto out_trans_cancel;
+	}
+
 	xfs_trans_ichgtime(tp, src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 	xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE);
 	if (new_parent)
 		xfs_trans_log_inode(tp, target_dp, XFS_ILOG_CORE);
 
 	error = xfs_finish_rename(tp);
+
+out_unlock:
 	if (wip)
 		xfs_irele(wip);
+	if (wip)
+		xfs_iunlock(wip, XFS_ILOCK_EXCL);
+	if (target_ip)
+		xfs_iunlock(target_ip, XFS_ILOCK_EXCL);
+	xfs_iunlock(src_ip, XFS_ILOCK_EXCL);
+	if (new_parent)
+		xfs_iunlock(target_dp, XFS_ILOCK_EXCL);
+	xfs_iunlock(src_dp, XFS_ILOCK_EXCL);
+
 	return error;
 
 out_trans_cancel:
 	xfs_trans_cancel(tp);
+drop_incompat:
+	if (new_parent_ptr)
+		xfs_parent_cancel(mp, new_parent_ptr);
+	if (old_parent_ptr)
+		xfs_parent_cancel(mp, old_parent_ptr);
+	if (target_parent_ptr)
+		xfs_parent_cancel(mp, target_parent_ptr);
 out_release_wip:
 	if (wip)
 		xfs_irele(wip);
-- 
2.25.1
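
The three deferred updates queued at out_pptr in xfs_rename() can be modelled the same way: add the new pointer to the source inode, remove its old one, and remove the overwritten target's pointer. This sketch is hypothetical, ignores generations and the RENAME_EXCHANGE path, and assumes, as the patch does by passing new_diroffset for the overwritten target, that xfs_dir_replace() keeps the existing entry at the same directory offset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NPP 4

struct pp {			/* one parent pointer of a child inode */
	bool	 used;
	uint64_t dir;		/* parent directory inode */
	uint32_t off;		/* directory offset of the entry */
	char	 name[256];	/* filename component */
};

struct child {
	struct pp pp[NPP];
};

static int pp_add(struct child *c, uint64_t dir, uint32_t off,
		  const char *name)
{
	assert(strlen(name) <= 255);
	for (int i = 0; i < NPP; i++) {
		if (!c->pp[i].used) {
			c->pp[i] = (struct pp){ .used = true, .dir = dir,
						.off = off };
			strcpy(c->pp[i].name, name);
			return 0;
		}
	}
	return -ENOSPC;
}

static int pp_remove(struct child *c, uint64_t dir, uint32_t off)
{
	for (int i = 0; i < NPP; i++) {
		if (c->pp[i].used && c->pp[i].dir == dir &&
		    c->pp[i].off == off) {
			c->pp[i].used = false;
			return 0;
		}
	}
	return -ENOENT;
}

static const struct pp *pp_find(const struct child *c, uint64_t dir,
				uint32_t off)
{
	for (int i = 0; i < NPP; i++)
		if (c->pp[i].used && c->pp[i].dir == dir &&
		    c->pp[i].off == off)
			return &c->pp[i];
	return NULL;
}

/*
 * Move src from (src_dir, old_off) to (target_dir, new_off) under
 * target_name. tgt is the inode being overwritten, or NULL; its entry
 * was replaced in place, so its old offset is new_off.
 */
static int rename_pptrs(struct child *src, uint64_t src_dir,
			uint32_t old_off, struct child *tgt,
			uint64_t target_dir, uint32_t new_off,
			const char *target_name)
{
	int error;

	error = pp_add(src, target_dir, new_off, target_name);
	if (error)
		return error;
	error = pp_remove(src, src_dir, old_off);
	if (error)
		return error;
	if (tgt)
		error = pp_remove(tgt, target_dir, new_off);
	return error;
}
```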


* [PATCH RESEND v2 16/18] xfs: Add the parent pointer support to the superblock version 5.
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (14 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 15/18] xfs: Add parent pointers to rename Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-04 19:40 ` [PATCH RESEND v2 17/18] xfs: Add helper function xfs_attr_list_context_init Allison Henderson
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC
  To: linux-xfs

[dchinner: forward ported and cleaned up]
[achender: rebased and added parent pointer attribute to
           compatible attributes mask]

Signed-off-by: Mark Tinguely <tinguely@sgi.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_format.h | 4 +++-
 fs/xfs/libxfs/xfs_fs.h     | 1 +
 fs/xfs/libxfs/xfs_sb.c     | 4 ++++
 fs/xfs/xfs_super.c         | 4 ++++
 4 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_format.h b/fs/xfs/libxfs/xfs_format.h
index b55bdfa9c8a8..0343f8586be3 100644
--- a/fs/xfs/libxfs/xfs_format.h
+++ b/fs/xfs/libxfs/xfs_format.h
@@ -373,13 +373,15 @@ xfs_sb_has_ro_compat_feature(
 #define XFS_SB_FEAT_INCOMPAT_BIGTIME	(1 << 3)	/* large timestamps */
 #define XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR (1 << 4)	/* needs xfs_repair */
 #define XFS_SB_FEAT_INCOMPAT_NREXT64	(1 << 5)	/* large extent counters */
+#define XFS_SB_FEAT_INCOMPAT_PARENT	(1 << 6)	/* parent pointers */
 #define XFS_SB_FEAT_INCOMPAT_ALL \
 		(XFS_SB_FEAT_INCOMPAT_FTYPE|	\
 		 XFS_SB_FEAT_INCOMPAT_SPINODES|	\
 		 XFS_SB_FEAT_INCOMPAT_META_UUID| \
 		 XFS_SB_FEAT_INCOMPAT_BIGTIME| \
 		 XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR| \
-		 XFS_SB_FEAT_INCOMPAT_NREXT64)
+		 XFS_SB_FEAT_INCOMPAT_NREXT64| \
+		 XFS_SB_FEAT_INCOMPAT_PARENT)
 
 #define XFS_SB_FEAT_INCOMPAT_UNKNOWN	~XFS_SB_FEAT_INCOMPAT_ALL
 static inline bool
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 1cfd5bc6520a..b0b4d7a3aa15 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -237,6 +237,7 @@ typedef struct xfs_fsop_resblks {
 #define XFS_FSOP_GEOM_FLAGS_BIGTIME	(1 << 21) /* 64-bit nsec timestamps */
 #define XFS_FSOP_GEOM_FLAGS_INOBTCNT	(1 << 22) /* inobt btree counter */
 #define XFS_FSOP_GEOM_FLAGS_NREXT64	(1 << 23) /* large extent counters */
+#define XFS_FSOP_GEOM_FLAGS_PARENT	(1 << 24) /* parent pointers */
 
 /*
  * Minimum and maximum sizes need for growth checks.
diff --git a/fs/xfs/libxfs/xfs_sb.c b/fs/xfs/libxfs/xfs_sb.c
index a20cade590e9..75e893e93629 100644
--- a/fs/xfs/libxfs/xfs_sb.c
+++ b/fs/xfs/libxfs/xfs_sb.c
@@ -173,6 +173,8 @@ xfs_sb_version_to_features(
 		features |= XFS_FEAT_NEEDSREPAIR;
 	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_NREXT64)
 		features |= XFS_FEAT_NREXT64;
+	if (sbp->sb_features_incompat & XFS_SB_FEAT_INCOMPAT_PARENT)
+		features |= XFS_FEAT_PARENT;
 
 	return features;
 }
@@ -1187,6 +1189,8 @@ xfs_fs_geometry(
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_BIGTIME;
 	if (xfs_has_inobtcounts(mp))
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_INOBTCNT;
+	if (xfs_has_parent(mp))
+		geo->flags |= XFS_FSOP_GEOM_FLAGS_PARENT;
 	if (xfs_has_sector(mp)) {
 		geo->flags |= XFS_FSOP_GEOM_FLAGS_SECTOR;
 		geo->logsectsize = sbp->sb_logsectsize;
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 3d27ba1295c9..eaa2bb63621b 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1655,6 +1655,10 @@ xfs_fs_fill_super(
 		xfs_warn(mp,
 	"EXPERIMENTAL Large extent counts feature in use. Use at your own risk!");
 
+	if (xfs_has_parent(mp))
+		xfs_alert(mp,
+	"EXPERIMENTAL parent pointer feature enabled. Use at your own risk!");
+
 	error = xfs_mountfs(mp);
 	if (error)
 		goto out_filestream_unmount;
-- 
2.25.1
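
The new XFS_SB_FEAT_INCOMPAT_PARENT bit acts as a compatibility guard because it is folded into XFS_SB_FEAT_INCOMPAT_ALL: any bit outside that mask counts as unknown, and a kernel that finds unknown incompat bits in the superblock is expected to refuse the mount. The userspace sketch below checks that mask arithmetic. The values of bits 0-2 (FTYPE, SPINODES, META_UUID) are taken from xfs_format.h rather than from the hunk above, and the helper names are hypothetical; geom_has_parent() shows how a tool might test the flags word returned by XFS_IOC_FSGEOMETRY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Incompat feature bits; bits 0-2 assumed from xfs_format.h. */
#define INCOMPAT_FTYPE		(1u << 0)
#define INCOMPAT_SPINODES	(1u << 1)
#define INCOMPAT_META_UUID	(1u << 2)
#define INCOMPAT_BIGTIME	(1u << 3)
#define INCOMPAT_NEEDSREPAIR	(1u << 4)
#define INCOMPAT_NREXT64	(1u << 5)
#define INCOMPAT_PARENT		(1u << 6)	/* added by this patch */

/* Known-features mask of a kernel without this patch. */
#define OLD_INCOMPAT_ALL \
	(INCOMPAT_FTYPE | INCOMPAT_SPINODES | INCOMPAT_META_UUID | \
	 INCOMPAT_BIGTIME | INCOMPAT_NEEDSREPAIR | INCOMPAT_NREXT64)

/* Known-features mask with this patch applied. */
#define NEW_INCOMPAT_ALL	(OLD_INCOMPAT_ALL | INCOMPAT_PARENT)

/* Geometry flag as added to xfs_fs.h. */
#define GEOM_FLAGS_PARENT	(1u << 24)

static_assert(NEW_INCOMPAT_ALL == 0x7f, "bits 0-6 are all known");

/* True if the superblock carries incompat bits this kernel lacks. */
static bool has_unknown_incompat(uint32_t sb_incompat, uint32_t known_all)
{
	return (sb_incompat & ~known_all) != 0;
}

/* How a tool might test the geometry flags for parent pointers. */
static bool geom_has_parent(uint32_t geo_flags)
{
	return (geo_flags & GEOM_FLAGS_PARENT) != 0;
}
```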


* [PATCH RESEND v2 17/18] xfs: Add helper function xfs_attr_list_context_init
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (15 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 16/18] xfs: Add the parent pointer support to the superblock version 5 Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-04 19:40 ` [PATCH RESEND v2 18/18] xfs: Add parent pointer ioctl Allison Henderson
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC
  To: linux-xfs

This patch adds a helper function, xfs_ioc_attr_list_context_init, used
by xfs_ioc_attr_list. This function initializes the xfs_attr_list_context
structure passed to xfs_attr_list. We will need this later to call
xfs_attr_list_ilocked when the inode is already locked.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_file.c  |  1 +
 fs/xfs/xfs_ioctl.c | 54 ++++++++++++++++++++++++++++++++--------------
 fs/xfs/xfs_ioctl.h |  2 ++
 3 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 5a171c0b244b..7a54887cc37c 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -17,6 +17,7 @@
 #include "xfs_bmap_util.h"
 #include "xfs_dir2.h"
 #include "xfs_dir2_priv.h"
+#include "xfs_attr.h"
 #include "xfs_ioctl.h"
 #include "xfs_trace.h"
 #include "xfs_log.h"
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 1f783e979629..5b600d3f7981 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -369,6 +369,40 @@ xfs_attr_flags(
 	return 0;
 }
 
+/*
+ * Initializes an xfs_attr_list_context suitable for
+ * use by xfs_attr_list
+ */
+int
+xfs_ioc_attr_list_context_init(
+	struct xfs_inode		*dp,
+	char				*buffer,
+	int				bufsize,
+	int				flags,
+	struct xfs_attr_list_context	*context)
+{
+	struct xfs_attrlist		*alist;
+
+	/*
+	 * Initialize the output buffer.
+	 */
+	context->dp = dp;
+	context->resynch = 1;
+	context->attr_filter = xfs_attr_filter(flags);
+	context->buffer = buffer;
+	context->bufsize = round_down(bufsize, sizeof(uint32_t));
+	context->firstu = context->bufsize;
+	context->put_listent = xfs_ioc_attr_put_listent;
+
+	alist = context->buffer;
+	alist->al_count = 0;
+	alist->al_more = 0;
+	alist->al_offset[0] = context->bufsize;
+
+	return 0;
+}
+
+
 int
 xfs_ioc_attr_list(
 	struct xfs_inode		*dp,
@@ -378,7 +412,6 @@ xfs_ioc_attr_list(
 	struct xfs_attrlist_cursor __user *ucursor)
 {
 	struct xfs_attr_list_context	context = { };
-	struct xfs_attrlist		*alist;
 	void				*buffer;
 	int				error;
 
@@ -410,21 +443,10 @@ xfs_ioc_attr_list(
 	if (!buffer)
 		return -ENOMEM;
 
-	/*
-	 * Initialize the output buffer.
-	 */
-	context.dp = dp;
-	context.resynch = 1;
-	context.attr_filter = xfs_attr_filter(flags);
-	context.buffer = buffer;
-	context.bufsize = round_down(bufsize, sizeof(uint32_t));
-	context.firstu = context.bufsize;
-	context.put_listent = xfs_ioc_attr_put_listent;
-
-	alist = context.buffer;
-	alist->al_count = 0;
-	alist->al_more = 0;
-	alist->al_offset[0] = context.bufsize;
+	error = xfs_ioc_attr_list_context_init(dp, buffer, bufsize, flags,
+			&context);
+	if (error)
+		return error;
 
 	error = xfs_attr_list(&context);
 	if (error)
diff --git a/fs/xfs/xfs_ioctl.h b/fs/xfs/xfs_ioctl.h
index d4abba2c13c1..ca60e1c427a3 100644
--- a/fs/xfs/xfs_ioctl.h
+++ b/fs/xfs/xfs_ioctl.h
@@ -35,6 +35,8 @@ int xfs_ioc_attrmulti_one(struct file *parfilp, struct inode *inode,
 int xfs_ioc_attr_list(struct xfs_inode *dp, void __user *ubuf,
 		      size_t bufsize, int flags,
 		      struct xfs_attrlist_cursor __user *ucursor);
+int xfs_ioc_attr_list_context_init(struct xfs_inode *dp, char *buffer,
+		int bufsize, int flags, struct xfs_attr_list_context *context);
 
 extern struct dentry *
 xfs_handle_to_dentry(
-- 
2.25.1
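
A minimal userspace sketch of the buffer that xfs_ioc_attr_list_context_init() prepares, for readers following the later parent pointer patch.  The demo_* types and functions are hypothetical.  demo_put_listent() is modelled on a reading of xfs_ioc_attr_put_listent(), which this patch does not show, and may differ from it in detail.  The sketch illustrates the layout that the init code sets up: al_offset[] grows upward from the header, packed entries grow downward from firstu (which starts at the rounded-down bufsize), and the list is full once the two would overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirrors of struct xfs_attrlist and struct xfs_attrlist_ent. */
struct demo_attrlist {
	int32_t		al_count;	/* number of entries returned */
	int32_t		al_more;	/* nonzero: more entries remain */
	int32_t		al_offset[];	/* entry offsets, growing upward */
};

struct demo_attrlist_ent {
	uint32_t	a_valuelen;
	char		a_name[];	/* NUL-terminated, packed downward */
};

struct demo_list_ctx {
	char		*buffer;
	unsigned int	bufsize;
	unsigned int	firstu;		/* lowest byte used by packed entries */
	unsigned int	count;
};

/* The same setup that xfs_ioc_attr_list_context_init() performs. */
static void demo_ctx_init(struct demo_list_ctx *ctx, void *buf,
			  unsigned int size)
{
	struct demo_attrlist *alist = buf;

	ctx->buffer = buf;
	ctx->bufsize = size & ~(unsigned int)(sizeof(uint32_t) - 1);
	ctx->firstu = ctx->bufsize;
	ctx->count = 0;
	alist->al_count = 0;
	alist->al_more = 0;
	alist->al_offset[0] = (int32_t)ctx->bufsize;
}

/* Append one name; returns 1 (and sets al_more) when it no longer fits. */
static int demo_put_listent(struct demo_list_ctx *ctx, const char *name,
			    uint32_t valuelen)
{
	struct demo_attrlist *alist = (struct demo_attrlist *)ctx->buffer;
	struct demo_attrlist_ent *aep;
	size_t namelen = strlen(name);
	/* header plus the offset slot this entry will occupy */
	size_t arraytop = offsetof(struct demo_attrlist, al_offset) +
			  (ctx->count + 1) * sizeof(int32_t);
	/* entry size rounded up to a 4-byte boundary */
	size_t need = (offsetof(struct demo_attrlist_ent, a_name) +
		       namelen + 1 + 3) & ~(size_t)3;

	if (need > ctx->firstu || ctx->firstu - need < arraytop) {
		alist->al_more = 1;
		return 1;
	}
	ctx->firstu -= need;
	aep = (struct demo_attrlist_ent *)(ctx->buffer + ctx->firstu);
	aep->a_valuelen = valuelen;
	memcpy(aep->a_name, name, namelen + 1);
	alist->al_offset[ctx->count++] = (int32_t)ctx->firstu;
	alist->al_count = (int32_t)ctx->count;
	return 0;
}
```

For example, passing a 32-byte buffer as 31 bytes rounds bufsize down to 28.  A first name "ab" needs 4 + 2 + 1 = 7 bytes, rounded up to 8, so it lands at offset 20.  A second name would start at 12, below the end of the two-slot offset array at 16, so al_more is set instead.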



* [PATCH RESEND v2 18/18] xfs: Add parent pointer ioctl
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (16 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 17/18] xfs: Add helper function xfs_attr_list_context_init Allison Henderson
@ 2022-08-04 19:40 ` Allison Henderson
  2022-08-09 19:26   ` Darrick J. Wong
  2022-08-09 22:55 ` [RFC PATCH 19/18] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
  2022-08-09 22:56 ` [RFC PATCH 20/18] xfs: drop compatibility minimum log size computations for reflink Darrick J. Wong
  19 siblings, 1 reply; 58+ messages in thread
From: Allison Henderson @ 2022-08-04 19:40 UTC (permalink / raw)
  To: linux-xfs

This patch adds a new file ioctl, XFS_IOC_GETPPOINTER, to retrieve the
parent pointers of a given inode.

Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
---
 fs/xfs/Makefile            |   1 +
 fs/xfs/libxfs/xfs_fs.h     |  57 ++++++++++++++++
 fs/xfs/libxfs/xfs_parent.c |  10 +++
 fs/xfs/libxfs/xfs_parent.h |   2 +
 fs/xfs/xfs_ioctl.c         |  95 +++++++++++++++++++++++++-
 fs/xfs/xfs_ondisk.h        |   4 ++
 fs/xfs/xfs_parent_utils.c  | 134 +++++++++++++++++++++++++++++++++++++
 fs/xfs/xfs_parent_utils.h  |  22 ++++++
 8 files changed, 323 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
index caeea8d968ba..998658e40ab4 100644
--- a/fs/xfs/Makefile
+++ b/fs/xfs/Makefile
@@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
 				   xfs_mount.o \
 				   xfs_mru_cache.o \
 				   xfs_pwork.o \
+				   xfs_parent_utils.o \
 				   xfs_reflink.o \
 				   xfs_stats.o \
 				   xfs_super.o \
diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index b0b4d7a3aa15..ba6ec82a0272 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -574,6 +574,7 @@ typedef struct xfs_fsop_handlereq {
 #define XFS_IOC_ATTR_SECURE	0x0008	/* use attrs in security namespace */
 #define XFS_IOC_ATTR_CREATE	0x0010	/* fail if attr already exists */
 #define XFS_IOC_ATTR_REPLACE	0x0020	/* fail if attr does not exist */
+#define XFS_IOC_ATTR_PARENT	0x0040	/* use attrs in parent namespace */
 
 typedef struct xfs_attrlist_cursor {
 	__u32		opaque[4];
@@ -752,6 +753,61 @@ struct xfs_scrub_metadata {
 				 XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED)
 #define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
 
+#define XFS_PPTR_MAXNAMELEN				256
+
+/* return parents of the handle, not the open fd */
+#define XFS_PPTR_IFLAG_HANDLE  (1U << 0)
+
+/* target was the root directory */
+#define XFS_PPTR_OFLAG_ROOT    (1U << 1)
+
+/* Cursor is done iterating pptrs */
+#define XFS_PPTR_OFLAG_DONE    (1U << 2)
+
+/* Get an inode parent pointer through ioctl */
+struct xfs_parent_ptr {
+	__u64		xpp_ino;			/* Inode */
+	__u32		xpp_gen;			/* Inode generation */
+	__u32		xpp_diroffset;			/* Directory offset */
+	__u32		xpp_namelen;			/* File name length */
+	__u32		xpp_pad;
+	__u8		xpp_name[XFS_PPTR_MAXNAMELEN];	/* File name */
+};
+
+/* Iterate through an inode's parent pointers */
+struct xfs_pptr_info {
+	struct xfs_handle		pi_handle;
+	struct xfs_attrlist_cursor	pi_cursor;
+	__u32				pi_flags;
+	__u32				pi_reserved;
+	__u32				pi_ptrs_size;
+	__u32				pi_ptrs_used;
+	__u64				pi_reserved2[6];
+
+	/*
+	 * An array of struct xfs_parent_ptr follows the header
+	 * information.  Use xfs_ppinfo_to_pp() to access the
+	 * parent pointer array entries.
+	 */
+	struct xfs_parent_ptr		pi_parents[];
+};
+
+static inline size_t
+xfs_pptr_info_sizeof(int nr_ptrs)
+{
+	return sizeof(struct xfs_pptr_info) +
+	       (nr_ptrs * sizeof(struct xfs_parent_ptr));
+}
+
+static inline struct xfs_parent_ptr *
+xfs_ppinfo_to_pp(
+	struct xfs_pptr_info	*info,
+	int			idx)
+{
+
+	return &info->pi_parents[idx];
+}
+
 /*
  * ioctl limits
  */
@@ -797,6 +853,7 @@ struct xfs_scrub_metadata {
 /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
 #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
 #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
+#define XFS_IOC_GETPPOINTER	_IOWR('X', 62, struct xfs_pptr_info)
 
 /*
  * ioctl commands that replace IRIX syssgi()'s
diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
index 03f03f731d02..d9c922a78617 100644
--- a/fs/xfs/libxfs/xfs_parent.c
+++ b/fs/xfs/libxfs/xfs_parent.c
@@ -26,6 +26,16 @@
 #include "xfs_xattr.h"
 #include "xfs_parent.h"
 
+/* Initializes an xfs_parent_ptr from an xfs_parent_name_rec */
+void
+xfs_init_parent_ptr(struct xfs_parent_ptr	*xpp,
+		    struct xfs_parent_name_rec	*rec)
+{
+	xpp->xpp_ino = be64_to_cpu(rec->p_ino);
+	xpp->xpp_gen = be32_to_cpu(rec->p_gen);
+	xpp->xpp_diroffset = be32_to_cpu(rec->p_diroffset);
+}
+
 /*
  * Parent pointer attribute handling.
  *
diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
index 67948f4b3834..53161b79d1e2 100644
--- a/fs/xfs/libxfs/xfs_parent.h
+++ b/fs/xfs/libxfs/xfs_parent.h
@@ -23,6 +23,8 @@ void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
 			      uint32_t p_diroffset);
 void xfs_init_parent_name_irec(struct xfs_parent_name_irec *irec,
 			       struct xfs_parent_name_rec *rec);
+void xfs_init_parent_ptr(struct xfs_parent_ptr *xpp,
+			 struct xfs_parent_name_rec *rec);
 int xfs_parent_init(xfs_mount_t *mp, xfs_inode_t *ip,
 		    struct xfs_name *target_name,
 		    struct xfs_parent_defer **parentp);
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index 5b600d3f7981..8a9530588ef4 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -37,6 +37,7 @@
 #include "xfs_health.h"
 #include "xfs_reflink.h"
 #include "xfs_ioctl.h"
+#include "xfs_parent_utils.h"
 #include "xfs_xattr.h"
 
 #include <linux/mount.h>
@@ -355,6 +356,8 @@ xfs_attr_filter(
 		return XFS_ATTR_ROOT;
 	if (ioc_flags & XFS_IOC_ATTR_SECURE)
 		return XFS_ATTR_SECURE;
+	if (ioc_flags & XFS_IOC_ATTR_PARENT)
+		return XFS_ATTR_PARENT;
 	return 0;
 }
 
@@ -422,7 +425,8 @@ xfs_ioc_attr_list(
 	/*
 	 * Reject flags, only allow namespaces.
 	 */
-	if (flags & ~(XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE))
+	if (flags & ~(XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE |
+		      XFS_IOC_ATTR_PARENT))
 		return -EINVAL;
 	if (flags == (XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE))
 		return -EINVAL;
@@ -1679,6 +1683,92 @@ xfs_ioc_scrub_metadata(
 	return 0;
 }
 
+/*
+ * IOCTL routine to get the parent pointers of an inode and return them to
+ * user space.  The caller must pass a buffer containing a struct
+ * xfs_pptr_info followed by space for an array of pi_ptrs_size struct
+ * xfs_parent_ptr entries.  If the inode has more parent pointers than fit
+ * in the buffer, the caller may call again with the returned pi_cursor to
+ * resume iteration.  The number of xfs_parent_ptr entries returned is
+ * stored in pi_ptrs_used.
+ *
+ * Returns 0 on success or a negative errno on failure.
+ */
+STATIC int
+xfs_ioc_get_parent_pointer(
+	struct file			*filp,
+	void				__user *arg)
+{
+	struct xfs_pptr_info		*ppi = NULL;
+	int				error = 0;
+	struct xfs_inode		*ip = XFS_I(file_inode(filp));
+	struct xfs_mount		*mp = ip->i_mount;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	/* Allocate an xfs_pptr_info to hold the user data */
+	ppi = kmem_alloc(sizeof(struct xfs_pptr_info), 0);
+	if (!ppi)
+		return -ENOMEM;
+
+	/* Copy the data from the user */
+	if (copy_from_user(ppi, arg, sizeof(struct xfs_pptr_info))) {
+		error = -EFAULT;
+		goto out;
+	}
+	/* Check size of buffer requested by user */
+	if (xfs_pptr_info_sizeof(ppi->pi_ptrs_size) > XFS_XATTR_LIST_MAX) {
+		error = -ENOMEM;
+		goto out;
+	}
+
+	if (ppi->pi_flags != 0 && ppi->pi_flags != XFS_PPTR_IFLAG_HANDLE) {
+		error = -EINVAL;
+		goto out;
+	}
+
+	/*
+	 * Now that we know how big the trailing buffer is, expand
+	 * our kernel xfs_pptr_info to be the same size
+	 */
+	ppi = krealloc(ppi, xfs_pptr_info_sizeof(ppi->pi_ptrs_size),
+		       GFP_NOFS | __GFP_NOFAIL);
+	if (!ppi)
+		return -ENOMEM;
+
+	if (ppi->pi_flags == XFS_PPTR_IFLAG_HANDLE) {
+		error = xfs_iget(mp, NULL, ppi->pi_handle.ha_fid.fid_ino,
+				0, 0, &ip);
+		if (error)
+			goto out;
+
+		if (VFS_I(ip)->i_generation != ppi->pi_handle.ha_fid.fid_gen) {
+			error = -EINVAL;
+			goto out;
+		}
+	}
+
+	if (ip->i_ino == mp->m_sb.sb_rootino)
+		ppi->pi_flags |= XFS_PPTR_OFLAG_ROOT;
+
+	/* Get the parent pointers */
+	error = xfs_attr_get_parent_pointer(ip, ppi);
+
+	if (error)
+		goto out;
+
+	/* Copy the parent pointers back to the user */
+	if (copy_to_user(arg, ppi,
+			xfs_pptr_info_sizeof(ppi->pi_ptrs_size))) {
+		error = -EFAULT;
+		goto out;
+	}
+out:
+	kmem_free(ppi);
+	return error;
+}
+
 int
 xfs_ioc_swapext(
 	xfs_swapext_t	*sxp)
@@ -1968,7 +2058,8 @@ xfs_file_ioctl(
 
 	case XFS_IOC_FSGETXATTRA:
 		return xfs_ioc_fsgetxattra(ip, arg);
-
+	case XFS_IOC_GETPPOINTER:
+		return xfs_ioc_get_parent_pointer(filp, arg);
 	case XFS_IOC_GETBMAP:
 	case XFS_IOC_GETBMAPA:
 	case XFS_IOC_GETBMAPX:
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index 758702b9495f..765eb514a917 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -135,6 +135,10 @@ xfs_check_ondisk_structs(void)
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
 	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
 
+	/* parent pointer ioctls */
+	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_ptr,		280);
+	XFS_CHECK_STRUCT_SIZE(struct xfs_pptr_info,		104);
+
 	/*
 	 * The v5 superblock format extended several v4 header structures with
 	 * additional data. While new fields are only accessible on v5
diff --git a/fs/xfs/xfs_parent_utils.c b/fs/xfs/xfs_parent_utils.c
new file mode 100644
index 000000000000..3351ce173075
--- /dev/null
+++ b/fs/xfs/xfs_parent_utils.c
@@ -0,0 +1,134 @@
+/*
+ * Copyright (c) 2015 Red Hat, Inc.
+ * All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation
+ */
+#include "xfs.h"
+#include "xfs_fs.h"
+#include "xfs_format.h"
+#include "xfs_log_format.h"
+#include "xfs_shared.h"
+#include "xfs_trans_resv.h"
+#include "xfs_mount.h"
+#include "xfs_bmap_btree.h"
+#include "xfs_inode.h"
+#include "xfs_error.h"
+#include "xfs_trace.h"
+#include "xfs_trans.h"
+#include "xfs_da_format.h"
+#include "xfs_da_btree.h"
+#include "xfs_attr.h"
+#include "xfs_ioctl.h"
+#include "xfs_parent.h"
+#include "xfs_parent_utils.h"
+
+/*
+ * Get the parent pointers for a given inode
+ *
+ * Returns 0 on success or a negative errno on error
+ */
+int
+xfs_attr_get_parent_pointer(struct xfs_inode		*ip,
+			    struct xfs_pptr_info	*ppi)
+
+{
+
+	struct xfs_attrlist		*alist;
+	struct xfs_attrlist_ent		*aent;
+	struct xfs_parent_ptr		*xpp;
+	struct xfs_parent_name_rec	*xpnr;
+	char				*namebuf;
+	unsigned int			namebuf_size;
+	int				name_len;
+	int				error = 0;
+	unsigned int			ioc_flags = XFS_IOC_ATTR_PARENT;
+	unsigned int			flags = XFS_ATTR_PARENT;
+	int				i;
+	struct xfs_attr_list_context	context;
+
+	/* Allocate a buffer to store the attribute names */
+	namebuf_size = sizeof(struct xfs_attrlist) +
+		       (ppi->pi_ptrs_size) * sizeof(struct xfs_attrlist_ent);
+	namebuf = kvzalloc(namebuf_size, GFP_KERNEL);
+	if (!namebuf)
+		return -ENOMEM;
+
+	memset(&context, 0, sizeof(struct xfs_attr_list_context));
+	error = xfs_ioc_attr_list_context_init(ip, namebuf, namebuf_size,
+			ioc_flags, &context);
+
+	/* Copy the cursor provided by caller */
+	memcpy(&context.cursor, &ppi->pi_cursor,
+	       sizeof(struct xfs_attrlist_cursor));
+
+	/* Lock before the error check: out_kfree always unlocks. */
+	xfs_ilock(ip, XFS_ILOCK_EXCL);
+	if (error)
+		goto out_kfree;
+
+	error = xfs_attr_list_ilocked(&context);
+	if (error)
+		goto out_kfree;
+
+	alist = (struct xfs_attrlist *)namebuf;
+	for (i = 0; i < alist->al_count; i++) {
+		struct xfs_da_args args = {
+			.geo = ip->i_mount->m_attr_geo,
+			.whichfork = XFS_ATTR_FORK,
+			.dp = ip,
+			.namelen = sizeof(struct xfs_parent_name_rec),
+			.attr_filter = flags,
+			.op_flags = XFS_DA_OP_OKNOENT,
+		};
+
+		xpp = xfs_ppinfo_to_pp(ppi, i);
+		memset(xpp, 0, sizeof(struct xfs_parent_ptr));
+		aent = (struct xfs_attrlist_ent *)
+			&namebuf[alist->al_offset[i]];
+		xpnr = (struct xfs_parent_name_rec *)(aent->a_name);
+
+		if (aent->a_valuelen > XFS_PPTR_MAXNAMELEN) {
+			error = -ERANGE;
+			goto out_kfree;
+		}
+		name_len = aent->a_valuelen;
+
+		args.name = (char *)xpnr;
+		args.hashval = xfs_da_hashname(args.name, args.namelen);
+		args.value = (unsigned char *)(xpp->xpp_name);
+		args.valuelen = name_len;
+
+		error = xfs_attr_get_ilocked(&args);
+		error = (error == -EEXIST ? 0 : error);
+		if (error)
+			goto out_kfree;
+
+		xpp->xpp_namelen = name_len;
+		xfs_init_parent_ptr(xpp, xpnr);
+	}
+	ppi->pi_ptrs_used = alist->al_count;
+	if (!alist->al_more)
+		ppi->pi_flags |= XFS_PPTR_OFLAG_DONE;
+
+	/* Update the caller with the current cursor position */
+	memcpy(&ppi->pi_cursor, &context.cursor,
+		sizeof(struct xfs_attrlist_cursor));
+
+out_kfree:
+	xfs_iunlock(ip, XFS_ILOCK_EXCL);
+	kmem_free(namebuf);
+
+	return error;
+}
+
diff --git a/fs/xfs/xfs_parent_utils.h b/fs/xfs/xfs_parent_utils.h
new file mode 100644
index 000000000000..0e952b2ebd4a
--- /dev/null
+++ b/fs/xfs/xfs_parent_utils.h
@@ -0,0 +1,22 @@
+/*
+ * Copyright (c) 2017 Oracle, Inc.
+ * All Rights Reserved.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it would be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write the Free Software Foundation Inc.
+ */
+#ifndef	__XFS_PARENT_UTILS_H__
+#define	__XFS_PARENT_UTILS_H__
+
+int xfs_attr_get_parent_pointer(struct xfs_inode *ip,
+				struct xfs_pptr_info *ppi);
+#endif	/* __XFS_PARENT_UTILS_H__ */
-- 
2.25.1
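
A userspace sketch of how a caller might drive this ioctl.  The structures are redeclared here as hypothetical demo_* copies so that the sketch compiles on its own, and the fetch callback is a hypothetical stand-in for ioctl(fd, XFS_IOC_GETPPOINTER, info) so that the loop runs without a kernel.  demo_mock_fetch() fakes an inode with a given number of parents.  Two details follow from the handler above: pi_cursor must be passed back unchanged between calls, and pi_flags must be reset each time, because the kernel ORs output flags such as XFS_PPTR_OFLAG_ROOT into it but accepts only 0 or XFS_PPTR_IFLAG_HANDLE on input.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_PPTR_MAXNAMELEN	256
#define DEMO_PPTR_OFLAG_DONE	(1U << 2)

/* Hypothetical copies of the uapi structures proposed in xfs_fs.h. */
struct demo_handle {			/* fsid + xfs_fid_t */
	uint64_t	ha_fsid;
	uint16_t	fid_len, fid_pad;
	uint32_t	fid_gen;
	uint64_t	fid_ino;
};

struct demo_parent_ptr {
	uint64_t	xpp_ino;
	uint32_t	xpp_gen, xpp_diroffset, xpp_namelen, xpp_pad;
	uint8_t		xpp_name[DEMO_PPTR_MAXNAMELEN];
};

struct demo_pptr_info {
	struct demo_handle	pi_handle;
	uint32_t		pi_cursor[4];	/* xfs_attrlist_cursor */
	uint32_t		pi_flags, pi_reserved;
	uint32_t		pi_ptrs_size, pi_ptrs_used;
	uint64_t		pi_reserved2[6];
	struct demo_parent_ptr	pi_parents[];
};

/* The same sizes xfs_ondisk.h checks: 8+4*4+256 and 24+16+4*4+6*8. */
_Static_assert(sizeof(struct demo_parent_ptr) == 280, "parent_ptr size");
_Static_assert(sizeof(struct demo_pptr_info) == 104, "pptr_info size");

static size_t demo_pptr_info_sizeof(uint32_t nr_ptrs)
{
	return sizeof(struct demo_pptr_info) +
	       (size_t)nr_ptrs * sizeof(struct demo_parent_ptr);
}

/* Stand-in for ioctl(fd, XFS_IOC_GETPPOINTER, info); 0 or -errno. */
typedef int (*demo_fetch_fn)(struct demo_pptr_info *info, void *priv);

/* Count every parent pointer, fetching up to @batch entries per call. */
static long demo_count_parents(demo_fetch_fn fetch, void *priv, uint32_t batch)
{
	struct demo_pptr_info *info;
	long total = 0;
	int error;

	if (batch == 0)
		return -EINVAL;
	info = calloc(1, demo_pptr_info_sizeof(batch));
	if (!info)
		return -ENOMEM;
	info->pi_ptrs_size = batch;
	do {
		/* Clear output flags (ROOT, DONE); only 0 or IFLAG_HANDLE
		 * are accepted on input.  pi_cursor is left as returned. */
		info->pi_flags = 0;
		error = fetch(info, priv);
		if (error)
			break;
		total += info->pi_ptrs_used;
	} while (!(info->pi_flags & DEMO_PPTR_OFLAG_DONE));
	free(info);
	return error ? error : total;
}

/* Fake fetch: an inode with m->total parents; pi_cursor[0] = position. */
struct demo_mock { uint32_t total; };

static int demo_mock_fetch(struct demo_pptr_info *info, void *priv)
{
	const struct demo_mock *m = priv;
	uint32_t pos = info->pi_cursor[0];
	uint32_t n = m->total - pos;

	if (info->pi_flags != 0)
		return -EINVAL;
	if (n > info->pi_ptrs_size)
		n = info->pi_ptrs_size;
	info->pi_ptrs_used = n;
	info->pi_cursor[0] = pos + n;
	if (pos + n == m->total)
		info->pi_flags |= DEMO_PPTR_OFLAG_DONE;
	return 0;
}
```

Batching in this way keeps each request small; the handler rejects any request whose total size, as computed by xfs_pptr_info_sizeof(), exceeds XFS_XATTR_LIST_MAX.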



* Re: [PATCH RESEND v2 02/18] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5
  2022-08-04 19:39 ` [PATCH RESEND v2 02/18] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5 Allison Henderson
@ 2022-08-09 16:38   ` Darrick J. Wong
  2022-08-10  3:07     ` Alli
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-09 16:38 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Aug 04, 2022 at 12:39:57PM -0700, Allison Henderson wrote:
> Renames that generate parent pointer updates can join up to 5
> inodes locked in sorted order.  So we need to increase the
> number of defer ops inodes and relock them in the same way.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c | 28 ++++++++++++++++++++++++++--
>  fs/xfs/libxfs/xfs_defer.h |  8 +++++++-
>  fs/xfs/xfs_inode.c        |  2 +-
>  fs/xfs/xfs_inode.h        |  1 +
>  4 files changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 5a321b783398..c0279b57e51d 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -820,13 +820,37 @@ xfs_defer_ops_continue(
>  	struct xfs_trans		*tp,
>  	struct xfs_defer_resources	*dres)
>  {
> -	unsigned int			i;
> +	unsigned int			i, j;
> +	struct xfs_inode		*sips[XFS_DEFER_OPS_NR_INODES];
> +	struct xfs_inode		*temp;
>  
>  	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
>  	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
>  
>  	/* Lock the captured resources to the new transaction. */
> -	if (dfc->dfc_held.dr_inos == 2)
> +	if (dfc->dfc_held.dr_inos > 2) {
> +		/*
> +		 * Renames with parent pointer updates can lock up to 5 inodes,
> +		 * sorted by their inode number.  So we need to make sure they
> +		 * are relocked in the same way.
> +		 */
> +		memset(sips, 0, sizeof(sips));
> +		for (i = 0; i < dfc->dfc_held.dr_inos; i++)
> +			sips[i] = dfc->dfc_held.dr_ip[i];
> +
> +		/* Bubble sort of at most 5 inodes */
> +		for (i = 0; i < dfc->dfc_held.dr_inos; i++) {
> +			for (j = 1; j < dfc->dfc_held.dr_inos; j++) {
> +				if (sips[j]->i_ino < sips[j-1]->i_ino) {
> +					temp = sips[j];
> +					sips[j] = sips[j-1];
> +					sips[j-1] = temp;
> +				}
> +			}
> +		}

Why not reuse xfs_sort_for_rename?

I also wonder if it's worth the trouble to replace the open-coded
bubblesort with a call to sort_r(), but TBH I suspect the cost of a
retpoline for the compare function isn't worth the overhead.
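
For comparison, a callback-free alternative to the open-coded bubble sort is a plain insertion sort over the handful of inode pointers.  This is only a sketch: demo_inode is a hypothetical stand-in carrying nothing but i_ino, and the function is not the existing xfs_sort_for_rename() helper.  Insertion sort is stable, needs no compare callback (and therefore no indirect call), and performs at most n(n-1)/2 = 10 element shifts for five inodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_inode: only the sort key. */
struct demo_inode {
	uint64_t	i_ino;
};

/*
 * Sort @n inode pointers into ascending inode-number order.  The sort is
 * stable, so duplicate entries keep their relative order and end up adjacent.
 */
static void demo_sort_inodes(struct demo_inode **ips, size_t n)
{
	for (size_t i = 1; i < n; i++) {
		struct demo_inode *cur = ips[i];
		size_t j = i;

		/* shift larger entries right until cur's slot is found */
		while (j > 0 && ips[j - 1]->i_ino > cur->i_ino) {
			ips[j] = ips[j - 1];
			j--;
		}
		ips[j] = cur;
	}
}
```
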

> +
> +		xfs_lock_inodes(sips, dfc->dfc_held.dr_inos, XFS_ILOCK_EXCL);
> +	} else if (dfc->dfc_held.dr_inos == 2)
>  		xfs_lock_two_inodes(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL,
>  				    dfc->dfc_held.dr_ip[1], XFS_ILOCK_EXCL);
>  	else if (dfc->dfc_held.dr_inos == 1)
> diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> index 114a3a4930a3..3e4029d2ce41 100644
> --- a/fs/xfs/libxfs/xfs_defer.h
> +++ b/fs/xfs/libxfs/xfs_defer.h
> @@ -70,7 +70,13 @@ extern const struct xfs_defer_op_type xfs_attr_defer_type;
>  /*
>   * Deferred operation item relogging limits.
>   */
> -#define XFS_DEFER_OPS_NR_INODES	2	/* join up to two inodes */
> +
> +/*
> + * Rename w/ parent pointers can require up to 5 inodes with deferred ops to
> + * be joined to the transaction: src_dp, target_dp, src_ip, target_ip, and wip.
> + * These inodes are locked in sorted order by their inode numbers.

Much inode.  Thanks for recording this.

--D

> + */
> +#define XFS_DEFER_OPS_NR_INODES	5
>  #define XFS_DEFER_OPS_NR_BUFS	2	/* join up to two buffers */
>  
>  /* Resources that must be held across a transaction roll. */
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 3022918bf96a..cfdcca95594f 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -447,7 +447,7 @@ xfs_lock_inumorder(
>   * lock more than one at a time, lockdep will report false positives saying we
>   * have violated locking orders.
>   */
> -static void
> +void
>  xfs_lock_inodes(
>  	struct xfs_inode	**ips,
>  	int			inodes,
> diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> index 4d626f4321bc..bc06d6e4164a 100644
> --- a/fs/xfs/xfs_inode.h
> +++ b/fs/xfs/xfs_inode.h
> @@ -573,5 +573,6 @@ void xfs_end_io(struct work_struct *work);
>  
>  int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
>  void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
> +void xfs_lock_inodes(struct xfs_inode **ips, int inodes, uint lock_mode);
>  
>  #endif	/* __XFS_INODE_H__ */
> -- 
> 2.25.1
> 


* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-04 19:39 ` [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay Allison Henderson
@ 2022-08-09 16:52   ` Darrick J. Wong
  2022-08-10  1:58     ` Dave Chinner
  2022-08-10  3:08     ` Alli
  0 siblings, 2 replies; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-09 16:52 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison Henderson wrote:
> Recent parent pointer testing has exposed a bug in the underlying
> attr replay.  A multi-transaction replay currently performs a
> single step of the replay, then defers the rest if there is more
> to do.  This causes race conditions with other attr replays that
> might be recovered before the remaining deferred work has had a
> chance to finish.  This can lead to interleaved set and remove
> operations that may clobber the attribute fork.  Fix this by
> deferring all work for any attribute operation.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/xfs_attr_item.c | 35 ++++++++---------------------------
>  1 file changed, 8 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 5077a7ad5646..c13d724a3e13 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -635,52 +635,33 @@ xfs_attri_item_recover(
>  		break;
>  	case XFS_ATTRI_OP_FLAGS_REMOVE:
>  		if (!xfs_inode_hasattr(args->dp))
> -			goto out;
> +			return 0;
>  		attr->xattri_dela_state = xfs_attr_init_remove_state(args);
>  		break;
>  	default:
>  		ASSERT(0);
> -		error = -EFSCORRUPTED;
> -		goto out;
> +		return -EFSCORRUPTED;
>  	}
>  
>  	xfs_init_attr_trans(args, &tres, &total);
>  	error = xfs_trans_alloc(mp, &tres, total, 0, XFS_TRANS_RESERVE, &tp);
>  	if (error)
> -		goto out;
> +		return error;
>  
>  	args->trans = tp;
>  	done_item = xfs_trans_get_attrd(tp, attrip);
> +	args->trans->t_flags |= XFS_TRANS_HAS_INTENT_DONE;
> +	set_bit(XFS_LI_DIRTY, &done_item->attrd_item.li_flags);
>  
>  	xfs_ilock(ip, XFS_ILOCK_EXCL);
>  	xfs_trans_ijoin(tp, ip, 0);
>  
> -	error = xfs_xattri_finish_update(attr, done_item);
> -	if (error == -EAGAIN) {
> -		/*
> -		 * There's more work to do, so add the intent item to this
> -		 * transaction so that we can continue it later.
> -		 */
> -		xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
> -		error = xfs_defer_ops_capture_and_commit(tp, capture_list);
> -		if (error)
> -			goto out_unlock;
> -
> -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> -		xfs_irele(ip);
> -		return 0;
> -	}
> -	if (error) {
> -		xfs_trans_cancel(tp);
> -		goto out_unlock;
> -	}
> -
> +	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);

This seems a little convoluted to me.  Maybe?  Maybe not?

1. Log recovery recreates an incore xfs_attri_log_item from what it
finds in the log.

2. This function then logs an xattrd for the recovered xattri item.

3. Then it creates a new xfs_attr_intent to complete the operation.

4. Finally, it calls xfs_defer_ops_capture_and_commit, which logs a new
xattri for the intent created in step 3 and also commits the xattrd for
the first xattri.

IOWs, the only difference between before and after is that we're not
advancing one more step through the state machine as part of log
recovery.  From the perspective of the log, the recovery function merely
replaces the recovered xattri log item with a new one.

Why can't we just attach the recovered xattri to the xfs_defer_pending
that is created to point to the xfs_attr_intent that's created in step
3, and skip the xattrd?

I /think/ the answer to that question is that we might need to move the
log tail forward to free enough log space to finish the intent items, so
creating the extra xattrd/xattri (a) avoids the complexity of submitting
an incore intent item *and* a log intent item to the defer ops
machinery; and (b) avoids livelocks in log recovery.  Therefore, we
actually need to do it this way.

IOWs, I *think* this is ok, but I want to see if others have differing
perspectives on how log item recovery works.

--D

>  	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
> -out_unlock:
> +
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	xfs_irele(ip);
> -out:
> -	xfs_attr_free_item(attr);
> +
>  	return error;
>  }
>  
> -- 
> 2.25.1
> 


* Re: [PATCH RESEND v2 08/18] xfs: add parent pointer support to attribute code
  2022-08-04 19:40 ` [PATCH RESEND v2 08/18] xfs: add parent pointer support to attribute code Allison Henderson
@ 2022-08-09 16:54   ` Darrick J. Wong
  2022-08-10  3:08     ` Alli
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-09 16:54 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Aug 04, 2022 at 12:40:03PM -0700, Allison Henderson wrote:
> Add the new parent attribute type. XFS_ATTR_PARENT is used only for parent pointer
> entries; it uses reserved blocks like XFS_ATTR_ROOT.
> 
> [dchinner: forward ported and cleaned up]
> [achender: rebased]
> 
> Signed-off-by: Mark Tinguely <tinguely@sgi.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>

Looks good now,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D

> ---
>  fs/xfs/libxfs/xfs_attr.c       | 4 +++-
>  fs/xfs/libxfs/xfs_da_format.h  | 5 ++++-
>  fs/xfs/libxfs/xfs_log_format.h | 1 +
>  3 files changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index e28d93d232de..8df80d91399b 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -966,11 +966,13 @@ xfs_attr_set(
>  	struct xfs_inode	*dp = args->dp;
>  	struct xfs_mount	*mp = dp->i_mount;
>  	struct xfs_trans_res	tres;
> -	bool			rsvd = (args->attr_filter & XFS_ATTR_ROOT);
> +	bool			rsvd;
>  	int			error, local;
>  	int			rmt_blks = 0;
>  	unsigned int		total;
>  
> +	rsvd = (args->attr_filter & (XFS_ATTR_ROOT | XFS_ATTR_PARENT)) != 0;
> +
>  	if (xfs_is_shutdown(dp->i_mount))
>  		return -EIO;
>  
> diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
> index 25e2841084e1..3dc03968bba6 100644
> --- a/fs/xfs/libxfs/xfs_da_format.h
> +++ b/fs/xfs/libxfs/xfs_da_format.h
> @@ -688,12 +688,15 @@ struct xfs_attr3_leafblock {
>  #define	XFS_ATTR_LOCAL_BIT	0	/* attr is stored locally */
>  #define	XFS_ATTR_ROOT_BIT	1	/* limit access to trusted attrs */
>  #define	XFS_ATTR_SECURE_BIT	2	/* limit access to secure attrs */
> +#define	XFS_ATTR_PARENT_BIT	3	/* parent pointer attrs */
>  #define	XFS_ATTR_INCOMPLETE_BIT	7	/* attr in middle of create/delete */
>  #define XFS_ATTR_LOCAL		(1u << XFS_ATTR_LOCAL_BIT)
>  #define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
>  #define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
> +#define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
>  #define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
> -#define XFS_ATTR_NSP_ONDISK_MASK	(XFS_ATTR_ROOT | XFS_ATTR_SECURE)
> +#define XFS_ATTR_NSP_ONDISK_MASK \
> +			(XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)
>  
>  /*
>   * Alignment for namelist and valuelist entries (since they are mixed
> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
> index b351b9dc6561..eea53874fde8 100644
> --- a/fs/xfs/libxfs/xfs_log_format.h
> +++ b/fs/xfs/libxfs/xfs_log_format.h
> @@ -917,6 +917,7 @@ struct xfs_icreate_log {
>   */
>  #define XFS_ATTRI_FILTER_MASK		(XFS_ATTR_ROOT | \
>  					 XFS_ATTR_SECURE | \
> +					 XFS_ATTR_PARENT | \
>  					 XFS_ATTR_INCOMPLETE)
>  
>  /*
> -- 
> 2.25.1
> 
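
A small userspace sketch of how the new namespace bit composes with the existing ones.  The DEMO_ prefix marks these as local copies of the bit numbers in the xfs_da_format.h hunk above, not the real headers.  XFS_ATTR_PARENT joins the on-disk namespace mask, while state bits such as XFS_ATTR_LOCAL and XFS_ATTR_INCOMPLETE stay outside it.  Parent pointers also share the reserved-block treatment of XFS_ATTR_ROOT.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the bit layout from the quoted xfs_da_format.h hunk. */
#define DEMO_ATTR_LOCAL		(1u << 0)
#define DEMO_ATTR_ROOT		(1u << 1)
#define DEMO_ATTR_SECURE	(1u << 2)
#define DEMO_ATTR_PARENT	(1u << 3)
#define DEMO_ATTR_INCOMPLETE	(1u << 7)
#define DEMO_ATTR_NSP_ONDISK_MASK \
	(DEMO_ATTR_ROOT | DEMO_ATTR_SECURE | DEMO_ATTR_PARENT)

/* Same test as the new rsvd computation in xfs_attr_set(). */
static bool demo_attr_rsvd(unsigned int attr_filter)
{
	return (attr_filter & (DEMO_ATTR_ROOT | DEMO_ATTR_PARENT)) != 0;
}

/* Strip state bits such as LOCAL/INCOMPLETE, leaving only the namespace. */
static unsigned int demo_attr_namespace(unsigned int ondisk_flags)
{
	return ondisk_flags & DEMO_ATTR_NSP_ONDISK_MASK;
}
```
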


* Re: [PATCH RESEND v2 10/18] xfs: Add xfs_verify_pptr
  2022-08-04 19:40 ` [PATCH RESEND v2 10/18] xfs: Add xfs_verify_pptr Allison Henderson
@ 2022-08-09 16:59   ` Darrick J. Wong
  2022-08-10  3:08     ` Alli
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-09 16:59 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Aug 04, 2022 at 12:40:05PM -0700, Allison Henderson wrote:
> Attribute names of parent pointers are not strings.  So we need to modify
> attr_namecheck to verify parent pointer records when the XFS_ATTR_PARENT flag is
> set.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_attr.c | 43 +++++++++++++++++++++++++++++++++++++---
>  fs/xfs/libxfs/xfs_attr.h |  3 ++-
>  fs/xfs/scrub/attr.c      |  2 +-
>  fs/xfs/xfs_attr_item.c   |  6 ++++--
>  fs/xfs/xfs_attr_list.c   | 17 +++++++++++-----
>  5 files changed, 59 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 8df80d91399b..2ef3262f21e8 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -1567,9 +1567,29 @@ xfs_attr_node_get(
>  	return error;
>  }
>  
> -/* Returns true if the attribute entry name is valid. */
> -bool
> -xfs_attr_namecheck(
> +/*
> + * Verify parent pointer attribute is valid.
> + * Return true on success or false on failure
> + */
> +STATIC bool
> +xfs_verify_pptr(struct xfs_mount *mp, struct xfs_parent_name_rec *rec)
> +{
> +	xfs_ino_t p_ino = (xfs_ino_t)be64_to_cpu(rec->p_ino);
> +	xfs_dir2_dataptr_t p_diroffset =
> +		(xfs_dir2_dataptr_t)be32_to_cpu(rec->p_diroffset);

I guess I should complain about the indentation here...

STATIC bool
xfs_verify_pptr(
	struct xfs_mount		*mp,
	struct xfs_parent_name_rec	*rec)
{
	xfs_ino_t			p_ino;
	xfs_dir2_dataptr_t		p_diroffset;

	p_ino = be64_to_cpu(rec->p_ino);
	p_diroffset = be32_to_cpu(rec->p_diroffset);

(You can keep the RVB tag if you clean this up for the next revision.)

--D
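
A userspace sketch of the check performed by the XFS_ATTR_PARENT branch of xfs_attr_namecheck(): the name must be exactly one parent pointer record, decoded from big-endian, with its fields in range.  The record layout (__be64 p_ino, __be32 p_gen, __be32 p_diroffset) is assumed from the be64_to_cpu()/be32_to_cpu() calls in this series.  max_ino and max_dataptr are hypothetical parameters standing in for xfs_verify_ino() and XFS_DIR2_MAX_DATAPTR, and the real inode check is stricter than a simple range test.  Declarations and assignments are split in the style suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct xfs_parent_name_rec (big-endian fields). */
struct demo_parent_name_rec {
	uint8_t		p_ino[8];
	uint8_t		p_gen[4];
	uint8_t		p_diroffset[4];
};

static uint64_t demo_be64(const uint8_t *p)
{
	uint64_t	v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

static uint32_t demo_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

/* Returns true if @name is one well-formed parent pointer record. */
static bool demo_pptr_namecheck(const void *name, size_t length,
				uint64_t max_ino, uint32_t max_dataptr)
{
	const struct demo_parent_name_rec	*rec = name;
	uint64_t				p_ino;
	uint32_t				p_diroffset;

	if (length != sizeof(*rec))
		return false;

	p_ino = demo_be64(rec->p_ino);
	p_diroffset = demo_be32(rec->p_diroffset);

	/* stand-in for xfs_verify_ino(): a simple 1..max_ino range */
	if (p_ino == 0 || p_ino > max_ino)
		return false;
	return p_diroffset <= max_dataptr;
}
```
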

> +
> +	if (!xfs_verify_ino(mp, p_ino))
> +		return false;
> +
> +	if (p_diroffset > XFS_DIR2_MAX_DATAPTR)
> +		return false;
> +
> +	return true;
> +}
> +
> +/* Returns true if the string attribute entry name is valid. */
> +static bool
> +xfs_str_attr_namecheck(
>  	const void	*name,
>  	size_t		length)
>  {
> @@ -1584,6 +1604,23 @@ xfs_attr_namecheck(
>  	return !memchr(name, 0, length);
>  }
>  
> +/* Returns true if the attribute entry name is valid. */
> +bool
> +xfs_attr_namecheck(
> +	struct xfs_mount	*mp,
> +	const void		*name,
> +	size_t			length,
> +	int			flags)
> +{
> +	if (flags & XFS_ATTR_PARENT) {
> +		if (length != sizeof(struct xfs_parent_name_rec))
> +			return false;
> +		return xfs_verify_pptr(mp, (struct xfs_parent_name_rec *)name);
> +	}
> +
> +	return xfs_str_attr_namecheck(name, length);
> +}
> +
>  int __init
>  xfs_attr_intent_init_cache(void)
>  {
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 81be9b3e4004..af92cc57e7d8 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -547,7 +547,8 @@ int xfs_attr_get(struct xfs_da_args *args);
>  int xfs_attr_set(struct xfs_da_args *args);
>  int xfs_attr_set_iter(struct xfs_attr_intent *attr);
>  int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
> -bool xfs_attr_namecheck(const void *name, size_t length);
> +bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
> +			int flags);
>  int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>  void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
>  			 unsigned int *total);
> diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
> index b6f0c9f3f124..d3e75c077fab 100644
> --- a/fs/xfs/scrub/attr.c
> +++ b/fs/xfs/scrub/attr.c
> @@ -128,7 +128,7 @@ xchk_xattr_listent(
>  	}
>  
>  	/* Does this name make sense? */
> -	if (!xfs_attr_namecheck(name, namelen)) {
> +	if (!xfs_attr_namecheck(sx->sc->mp, name, namelen, flags)) {
>  		xchk_fblock_set_corrupt(sx->sc, XFS_ATTR_FORK, args.blkno);
>  		return;
>  	}
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index c13d724a3e13..69856814c066 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -587,7 +587,8 @@ xfs_attri_item_recover(
>  	 */
>  	attrp = &attrip->attri_format;
>  	if (!xfs_attri_validate(mp, attrp) ||
> -	    !xfs_attr_namecheck(nv->name.i_addr, nv->name.i_len))
> +	    !xfs_attr_namecheck(mp, nv->name.i_addr, nv->name.i_len,
> +				attrp->alfi_attr_filter))
>  		return -EFSCORRUPTED;
>  
>  	error = xlog_recover_iget(mp,  attrp->alfi_ino, &ip);
> @@ -727,7 +728,8 @@ xlog_recover_attri_commit_pass2(
>  		return -EFSCORRUPTED;
>  	}
>  
> -	if (!xfs_attr_namecheck(attr_name, attri_formatp->alfi_name_len)) {
> +	if (!xfs_attr_namecheck(mp, attr_name, attri_formatp->alfi_name_len,
> +				attri_formatp->alfi_attr_filter)) {
>  		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
>  		return -EFSCORRUPTED;
>  	}
> diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> index 99bbbe1a0e44..a51f7f13a352 100644
> --- a/fs/xfs/xfs_attr_list.c
> +++ b/fs/xfs/xfs_attr_list.c
> @@ -58,9 +58,13 @@ xfs_attr_shortform_list(
>  	struct xfs_attr_sf_sort		*sbuf, *sbp;
>  	struct xfs_attr_shortform	*sf;
>  	struct xfs_attr_sf_entry	*sfe;
> +	struct xfs_mount		*mp;
>  	int				sbsize, nsbuf, count, i;
>  	int				error = 0;
>  
> +	ASSERT(context != NULL);
> +	ASSERT(dp != NULL);
> +	mp = dp->i_mount;
>  	sf = (struct xfs_attr_shortform *)dp->i_af.if_u1.if_data;
>  	ASSERT(sf != NULL);
>  	if (!sf->hdr.count)
> @@ -82,8 +86,9 @@ xfs_attr_shortform_list(
>  	     (dp->i_af.if_bytes + sf->hdr.count * 16) < context->bufsize)) {
>  		for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
>  			if (XFS_IS_CORRUPT(context->dp->i_mount,
> -					   !xfs_attr_namecheck(sfe->nameval,
> -							       sfe->namelen)))
> +					   !xfs_attr_namecheck(mp, sfe->nameval,
> +							       sfe->namelen,
> +							       sfe->flags)))
>  				return -EFSCORRUPTED;
>  			context->put_listent(context,
>  					     sfe->flags,
> @@ -174,8 +179,9 @@ xfs_attr_shortform_list(
>  			cursor->offset = 0;
>  		}
>  		if (XFS_IS_CORRUPT(context->dp->i_mount,
> -				   !xfs_attr_namecheck(sbp->name,
> -						       sbp->namelen))) {
> +				   !xfs_attr_namecheck(mp, sbp->name,
> +						       sbp->namelen,
> +						       sbp->flags))) {
>  			error = -EFSCORRUPTED;
>  			goto out;
>  		}
> @@ -465,7 +471,8 @@ xfs_attr3_leaf_list_int(
>  		}
>  
>  		if (XFS_IS_CORRUPT(context->dp->i_mount,
> -				   !xfs_attr_namecheck(name, namelen)))
> +				   !xfs_attr_namecheck(mp, name, namelen,
> +						       entry->flags)))
>  			return -EFSCORRUPTED;
>  		context->put_listent(context, entry->flags,
>  					      name, namelen, valuelen);
> -- 
> 2.25.1
> 


* Re: [PATCH RESEND v2 11/18] xfs: extend transaction reservations for parent attributes
  2022-08-04 19:40 ` [PATCH RESEND v2 11/18] xfs: extend transaction reservations for parent attributes Allison Henderson
@ 2022-08-09 17:48   ` Darrick J. Wong
  2022-08-10  3:08     ` Alli
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-09 17:48 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Aug 04, 2022 at 12:40:06PM -0700, Allison Henderson wrote:
> We need to add, remove or modify parent pointer attributes during
> create/link/unlink/rename operations atomically with the dirents in the
> parent directories being modified. This means they need to be modified
> in the same transaction as the parent directories, and so we need to add
> the required space for the attribute modifications to the transaction
> reservations.

While we're on the topic of log reservations ... Dave and I noticed
during the 5.19 cycle that xfs_log_calc_max_attrsetm_res has a unit
conversion problem when it's trying to compute the minimum log size:

STATIC int
xfs_log_calc_max_attrsetm_res(
	struct xfs_mount	*mp)
{
	int			size;
	int			nblks;

	size = xfs_attr_leaf_entsize_local_max(mp->m_attr_geo->blksize) -
	       MAXNAMELEN - 1;

Notice here that @size is the maximum amount of space that a local
format attribute can use in an xattr leaf block.  The computation is in
units of bytes.

	nblks = XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK);
	nblks += XFS_B_TO_FSB(mp, size);

...and here we convert bytes to fs blocks for the block count
computation...

	nblks += XFS_NEXTENTADD_SPACE_RES(mp, size, XFS_ATTR_FORK);

...but here we pass the byte count into a macro that takes a block count
as its second parameter and returns the number of bmbt blocks needed to
add that many blocks to an attribute fork.  Oops!

I would like to fix this incorrect code, but it's never a good idea to
adjust the minimum log size calculation downwards for existing
filesystems, because that can lead to a situation where a new mkfs
formats a filesystem with a log small enough that an old kernel won't
mount it.

Therefore, the corrected logic would have to be gated on whatever
happens to be the next new ondisk feature.  It's probably too late to do
this for large extent counts, but fixing the calculation would be (I
think) appropriate for parent pointers, since it's still undergoing
review and won't be an easy upgrade, which eliminates the legacy
problem.

I'll attach the patches that I've written as patches 19 and 20 to this
patchset, if you don't mind having a look and adding them?

	return  M_RES(mp)->tr_attrsetm.tr_logres +
		M_RES(mp)->tr_attrsetrt.tr_logres * nblks;
}
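To make the unit mix-up concrete, here is a small userspace model. Everything in it is hypothetical: struct toy_geo, bytes_to_fsb() and nextentadd_res() are stand-ins, not the real XFS macros, and nextentadd_res() only approximates the general shape of XFS_NEXTENTADD_SPACE_RES (a fixed bmbt cost for every extents_per_block blocks being mapped). The geometry numbers are invented. It shows that feeding a byte count into the block-count parameter inflates the result, and how a corrected calculation could be gated on a new feature (modelled as a has_parent flag) so that existing filesystems keep the legacy minimum log size.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical geometry; not taken from a real filesystem. */
struct toy_geo {
	unsigned int blocksize;         /* fs block size in bytes */
	unsigned int extents_per_block; /* stand-in: extents mapped per bmbt block */
	unsigned int extentadd_res;     /* stand-in: bmbt blocks per extent add */
	unsigned int daenter_res;       /* stand-in: XFS_DAENTER_SPACE_RES */
};

/* Round a byte count up to whole filesystem blocks. */
static unsigned int bytes_to_fsb(const struct toy_geo *g, unsigned int bytes)
{
	assert(g->blocksize > 0);
	return (bytes + g->blocksize - 1) / g->blocksize;
}

/* bmbt blocks needed to map @nblocks new blocks; expects a BLOCK count. */
static unsigned int nextentadd_res(const struct toy_geo *g, unsigned int nblocks)
{
	return ((nblocks + g->extents_per_block - 1) / g->extents_per_block) *
	       g->extentadd_res;
}

/*
 * Block count for setting a max-size local xattr; @size is in bytes.
 * Without the new feature we keep the legacy behaviour (bytes passed
 * where blocks are expected) so the min log size never shrinks.
 */
static unsigned int attrsetm_nblks(const struct toy_geo *g, unsigned int size,
				   bool has_parent)
{
	unsigned int nblks = g->daenter_res;

	nblks += bytes_to_fsb(g, size);
	if (has_parent)
		nblks += nextentadd_res(g, bytes_to_fsb(g, size)); /* blocks */
	else
		nblks += nextentadd_res(g, size); /* bytes: legacy unit bug */
	return nblks;
}
```

With a 4096-byte block, extents_per_block = 100, extentadd_res = 3, daenter_res = 10 and size = 3000, the legacy path charges 10 + 1 + 30 * 3 = 101 blocks while the corrected path charges 10 + 1 + 3 = 14.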


> [achender: rebased]
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_trans_resv.c | 105 +++++++++++++++++++++++++++------
>  1 file changed, 86 insertions(+), 19 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
> index e9913c2c5a24..b43ac4be7564 100644
> --- a/fs/xfs/libxfs/xfs_trans_resv.c
> +++ b/fs/xfs/libxfs/xfs_trans_resv.c
> @@ -909,24 +909,67 @@ xfs_calc_sb_reservation(
>  	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
>  }
>  
> -void
> -xfs_trans_resv_calc(
> -	struct xfs_mount	*mp,
> -	struct xfs_trans_resv	*resp)
> +STATIC void
> +xfs_calc_parent_ptr_reservations(
> +	struct xfs_mount     *mp)
>  {
> -	int			logcount_adj = 0;
> +	struct xfs_trans_resv   *resp = M_RES(mp);
>  
> -	/*
> -	 * The following transactions are logged in physical format and
> -	 * require a permanent reservation on space.
> -	 */
> -	resp->tr_write.tr_logres = xfs_calc_write_reservation(mp, false);
> -	resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT;
> -	resp->tr_write.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> +	/* Calculate extra space needed for parent pointer attributes */

This might be better expressed as a comment just prior to the function
declaration above.

> +	if (!xfs_has_parent(mp))
> +		return;
>  
> -	resp->tr_itruncate.tr_logres = xfs_calc_itruncate_reservation(mp, false);
> -	resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
> -	resp->tr_itruncate.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> +	/* rename can add/remove/modify 4 parent attributes */
> +	resp->tr_rename.tr_logres += 4 * max(resp->tr_attrsetm.tr_logres,
> +					 resp->tr_attrrm.tr_logres);

Why does the per-transaction reservation increase by 4x the amount of
space needed to set (or delete) an xattr?  The pptr patchset now uses
logged xattrs, which means that each xattr update needed to commit the
rename operation will happen in a separate transaction.  IOWs, each
transaction in the chain does not have to handle *every* update that
must be made during the entire chain, it only has to handle one step of
the full process.

Doesn't that mean that the size of tr_rename.tr_logres only needs to
increase by the amount of space needed to log the four(?) xattr items to
the first transaction in the chain?  AFAICT, it also can't be smaller
than max(resp->tr_attrsetm.tr_logres, resp->tr_attrrm.tr_logres);

(I'm also not sure why four -- the patch for xfs_rename only creates
three xfs_parent_defer objects.)
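A rough userspace sketch of the two ways of sizing tr_rename.tr_logres discussed above. The struct and all numbers are hypothetical; rename_base stands for the non-pptr part of the rename reservation (the max of t1/t2 below, without the dquot overhead) and intent_size for the log space of one xattr intent item, whose real size is not spelled out here. The additive form mirrors the patch (four full xattr updates stacked on the rename); the chained form reflects the argument that, with logged xattrs, each transaction in the chain only performs one step.

```c
#include <assert.h>

/* Hypothetical per-transaction log reservations, in bytes. */
struct toy_resv {
	unsigned int rename_base; /* stand-in: rename without parent pointers */
	unsigned int attrsetm;    /* stand-in: tr_attrsetm.tr_logres */
	unsigned int attrrm;      /* stand-in: tr_attrrm.tr_logres */
	unsigned int intent_size; /* stand-in: log space for one xattr intent */
};

static unsigned int umax(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* As in the patch: four full xattr updates on top of the rename itself. */
static unsigned int rename_resv_additive(const struct toy_resv *r)
{
	return r->rename_base + 4 * umax(r->attrsetm, r->attrrm);
}

/*
 * Each transaction in the chain does either the rename step or a single
 * xattr step, so take the larger of the two, plus room for logging the
 * intent items in the first transaction.
 */
static unsigned int rename_resv_chained(const struct toy_resv *r,
					unsigned int nr_intents)
{
	return umax(r->rename_base, umax(r->attrsetm, r->attrrm)) +
	       nr_intents * r->intent_size;
}
```

With rename_base = 200000, attrsetm = 150000, attrrm = 120000 and intent_size = 600, the additive form asks for 800000 bytes while the chained form with three intents asks for 201800.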

I also think that adjusting tr_rename to account for parent pointers is
something that should be done in xfs_calc_rename_reservation, not a
separate function:

/*
 * In renaming files we can modify (t1):
 *    the four inodes involved: 4 * inode size
 *    the two directory btrees: 2 * (max depth + v2) * dir block size
 *    the two directory bmap btrees: 2 * max depth * block size
 * And the bmap_finish transaction can free dir and bmap blocks (two sets
 *	of bmap blocks) giving (t2):
 *    the agf for the ags in which the blocks live: 3 * sector size
 *    the agfl for the ags in which the blocks live: 3 * sector size
 *    the superblock for the free block count: sector size
 *    the allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size
 * If parent pointers are enabled (t3), then each transaction in the chain
 *    must be capable of setting or removing the extended attribute
 *    containing the parent information.  It must also be able to handle
 *    the three xattr intent items that track the progress of the parent
 *    pointer update.
 */
STATIC uint
xfs_calc_rename_reservation(
	struct xfs_mount	*mp)
{
	struct xfs_trans_resv	*resp = M_RES(mp);
	unsigned int		overhead = XFS_DQUOT_LOGRES(mp);
	unsigned int		t1, t2, t3 = 0;

	t1 = xfs_calc_inode_res(mp, 4) +
	     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
			XFS_FSB_TO_B(mp, 1));

	t2 = xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
	     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 3),
			XFS_FSB_TO_B(mp, 1));

	if (xfs_has_parent(mp)) {
		t3 = max(resp->tr_attrsetm.tr_logres,
				resp->tr_attrrm.tr_logres);
		overhead += 3 * (size of a pptr xattr intent item);
	}

	return overhead + max3(t1, t2, t3);
}

> +	resp->tr_rename.tr_logcount += 4 * max(resp->tr_attrsetm.tr_logcount,
> +					   resp->tr_attrrm.tr_logcount);

Looks correct, modulo the 4 vs. 3 thing.

> +
> +	/* create will add 1 parent attribute */
> +	resp->tr_create.tr_logres += resp->tr_attrsetm.tr_logres;
> +	resp->tr_create.tr_logcount += resp->tr_attrsetm.tr_logcount;
> +
> +	/* mkdir will add 1 parent attribute */
> +	resp->tr_mkdir.tr_logres += resp->tr_attrsetm.tr_logres;
> +	resp->tr_mkdir.tr_logcount += resp->tr_attrsetm.tr_logcount;
> +
> +	/* link will add 1 parent attribute */
> +	resp->tr_link.tr_logres += resp->tr_attrsetm.tr_logres;
> +	resp->tr_link.tr_logcount += resp->tr_attrsetm.tr_logcount;
> +
> +	/* symlink will add 1 parent attribute */
> +	resp->tr_symlink.tr_logres += resp->tr_attrsetm.tr_logres;
> +	resp->tr_symlink.tr_logcount += resp->tr_attrsetm.tr_logcount;
> +
> +	/* remove will remove 1 parent attribute */
> +	resp->tr_remove.tr_logres += resp->tr_attrrm.tr_logres;
> +	resp->tr_remove.tr_logcount += resp->tr_attrrm.tr_logcount;
> +}
> +
> +/*
> + * Namespace reservations.
> + *
> + * These get tricky when parent pointers are enabled as we have attribute
> + * modifications occurring from within these transactions. Rather than confuse
> + * each of these reservation calculations with the conditional attribute
> + * reservations, add them here in a clear and concise manner. This assumes that
> + * the attribute reservations have already been calculated.
> + *
> + * Note that we only include the static attribute reservation here; the runtime
> + * reservation will have to be modified by the size of the attributes being
> + * added/removed/modified. See the comments on the attribute reservation
> + * calculations for more details.
> + *
> + * Note for rename: rename will vastly overestimate requirements. This will be
> + * addressed later when modifications are made to ensure parent attribute

Later?  I took a look at the rename patch, and it looks like we're using
logged xattrs from the start.

--D

> + * modifications can be done atomically with the rename operation.
> + */
> +STATIC void
> +xfs_calc_namespace_reservations(
> +	struct xfs_mount	*mp,
> +	struct xfs_trans_resv	*resp)
> +{
> +	ASSERT(resp->tr_attrsetm.tr_logres > 0);
>  
>  	resp->tr_rename.tr_logres = xfs_calc_rename_reservation(mp);
>  	resp->tr_rename.tr_logcount = XFS_RENAME_LOG_COUNT;
> @@ -948,15 +991,37 @@ xfs_trans_resv_calc(
>  	resp->tr_create.tr_logcount = XFS_CREATE_LOG_COUNT;
>  	resp->tr_create.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
>  
> +	resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp);
> +	resp->tr_mkdir.tr_logcount = XFS_MKDIR_LOG_COUNT;
> +	resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> +
> +	xfs_calc_parent_ptr_reservations(mp);
> +}
> +
> +void
> +xfs_trans_resv_calc(
> +	struct xfs_mount	*mp,
> +	struct xfs_trans_resv	*resp)
> +{
> +	int			logcount_adj = 0;
> +
> +	/*
> +	 * The following transactions are logged in physical format and
> +	 * require a permanent reservation on space.
> +	 */
> +	resp->tr_write.tr_logres = xfs_calc_write_reservation(mp, false);
> +	resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT;
> +	resp->tr_write.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> +
> +	resp->tr_itruncate.tr_logres = xfs_calc_itruncate_reservation(mp, false);
> +	resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
> +	resp->tr_itruncate.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> +
>  	resp->tr_create_tmpfile.tr_logres =
>  			xfs_calc_create_tmpfile_reservation(mp);
>  	resp->tr_create_tmpfile.tr_logcount = XFS_CREATE_TMPFILE_LOG_COUNT;
>  	resp->tr_create_tmpfile.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
>  
> -	resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp);
> -	resp->tr_mkdir.tr_logcount = XFS_MKDIR_LOG_COUNT;
> -	resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> -
>  	resp->tr_ifree.tr_logres = xfs_calc_ifree_reservation(mp);
>  	resp->tr_ifree.tr_logcount = XFS_INACTIVE_LOG_COUNT;
>  	resp->tr_ifree.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> @@ -986,6 +1051,8 @@ xfs_trans_resv_calc(
>  	resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT;
>  	resp->tr_qm_dqalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
>  
> +	xfs_calc_namespace_reservations(mp, resp);
> +
>  	/*
>  	 * The following transactions are logged in logical format with
>  	 * a default log count.
> -- 
> 2.25.1
> 


* Re: [PATCH RESEND v2 12/18] xfs: parent pointer attribute creation
  2022-08-04 19:40 ` [PATCH RESEND v2 12/18] xfs: parent pointer attribute creation Allison Henderson
@ 2022-08-09 18:01   ` Darrick J. Wong
  2022-08-09 18:13     ` Darrick J. Wong
  2022-08-10  3:08     ` Alli
  0 siblings, 2 replies; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-09 18:01 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Aug 04, 2022 at 12:40:07PM -0700, Allison Henderson wrote:
> Add parent pointer attribute during xfs_create, and subroutines to
> initialize attributes
> 
> [bfoster: rebase, use VFS inode generation]
> [achender: rebased, changed __unint32_t to xfs_dir2_dataptr_t,

Nit: uint32_t, not unint32_t.

>            fixed some null pointer bugs,
>            merged error handling patch,
>            remove unnecessary ENOSPC handling in xfs_attr_set_first_parent]
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/Makefile            |   1 +
>  fs/xfs/libxfs/xfs_attr.c   |   4 +-
>  fs/xfs/libxfs/xfs_attr.h   |   4 +-
>  fs/xfs/libxfs/xfs_parent.c | 134 +++++++++++++++++++++++++++++++++++++
>  fs/xfs/libxfs/xfs_parent.h |  34 ++++++++++
>  fs/xfs/xfs_inode.c         |  37 ++++++++--
>  fs/xfs/xfs_xattr.c         |   2 +-
>  fs/xfs/xfs_xattr.h         |   1 +
>  8 files changed, 208 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index 1131dd01e4fe..caeea8d968ba 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -40,6 +40,7 @@ xfs-y				+= $(addprefix libxfs/, \
>  				   xfs_inode_fork.o \
>  				   xfs_inode_buf.o \
>  				   xfs_log_rlimit.o \
> +				   xfs_parent.o \
>  				   xfs_ag_resv.o \
>  				   xfs_rmap.o \
>  				   xfs_rmap_btree.o \
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 2ef3262f21e8..0a458ea7051f 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -880,7 +880,7 @@ xfs_attr_lookup(
>  	return error;
>  }
>  
> -static int
> +int
>  xfs_attr_intent_init(
>  	struct xfs_da_args	*args,
>  	unsigned int		op_flags,	/* op flag (set or remove) */
> @@ -898,7 +898,7 @@ xfs_attr_intent_init(
>  }
>  
>  /* Sets an attribute for an inode as a deferred operation */
> -static int
> +int
>  xfs_attr_defer_add(
>  	struct xfs_da_args	*args)
>  {
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index af92cc57e7d8..b47417b5172f 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -544,6 +544,7 @@ int xfs_inode_hasattr(struct xfs_inode *ip);
>  bool xfs_attr_is_leaf(struct xfs_inode *ip);
>  int xfs_attr_get_ilocked(struct xfs_da_args *args);
>  int xfs_attr_get(struct xfs_da_args *args);
> +int xfs_attr_defer_add(struct xfs_da_args *args);
>  int xfs_attr_set(struct xfs_da_args *args);
>  int xfs_attr_set_iter(struct xfs_attr_intent *attr);
>  int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
> @@ -552,7 +553,8 @@ bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
>  int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
>  void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
>  			 unsigned int *total);
> -
> +int xfs_attr_intent_init(struct xfs_da_args *args, unsigned int op_flags,
> +			 struct xfs_attr_intent  **attr);
>  /*
>   * Check to see if the attr should be upgraded from non-existent or shortform to
>   * single-leaf-block attribute list.
> diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
> new file mode 100644
> index 000000000000..4ab531c77d7d
> --- /dev/null
> +++ b/fs/xfs/libxfs/xfs_parent.c
> @@ -0,0 +1,134 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022 Oracle, Inc.
> + * All rights reserved.
> + */
> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_format.h"
> +#include "xfs_da_format.h"
> +#include "xfs_log_format.h"
> +#include "xfs_shared.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_bmap_btree.h"
> +#include "xfs_inode.h"
> +#include "xfs_error.h"
> +#include "xfs_trace.h"
> +#include "xfs_trans.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_attr.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_attr_sf.h"
> +#include "xfs_bmap.h"
> +#include "xfs_defer.h"
> +#include "xfs_log.h"
> +#include "xfs_xattr.h"
> +#include "xfs_parent.h"
> +
> +/*
> + * Parent pointer attribute handling.
> + *
> + * Because the attribute value is a filename component, it will never be longer
> + * than 255 bytes. This means the attribute will always be a local format
> + * attribute as it is xfs_attr_leaf_entsize_local_max() for v5 filesystems will
> + * always be larger than this (max is 75% of block size).
> + *
> + * Creating a new parent attribute will always create a new attribute - there
> + * should never, ever be an existing attribute in the tree for a new inode.
> + * ENOSPC behavior is problematic - creating the inode without the parent
> + * pointer is effectively a corruption, so we allow parent attribute creation
> + * to dip into the reserve block pool to avoid unexpected ENOSPC errors from
> + * occurring.

Shouldn't we increase XFS_LINK_SPACE_RES to avoid this?  The reserve
pool isn't terribly large (8192 blocks) and was really only supposed to
save us from an ENOSPC shutdown if an unwritten extent conversion in the
writeback endio handler needs a few more blocks.

IOWs, we really ought to fail with ENOSPC at transaction reservation
time instead of draining the reserve pool.
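A toy model of the alternative suggested above: fold the parent pointer xattr blocks into the up-front block reservation (the role XFS_LINK_SPACE_RES plays for link) so that a short filesystem fails before anything is modified, and the reserve pool is left alone. struct toy_fs, toy_link_resblks() and toy_trans_reserve() are hypothetical illustrations, not XFS functions, and the block counts are invented.

```c
#include <assert.h>
#include <errno.h>

struct toy_fs {
	unsigned long long free_blocks;  /* blocks available to normal reservations */
	unsigned long long reserve_pool; /* emergency pool, never touched here */
};

/* Hypothetical: directory blocks plus pptr xattr blocks when enabled. */
static unsigned long long toy_link_resblks(unsigned long long dir_blocks,
					   unsigned long long pptr_blocks,
					   int has_parent)
{
	return dir_blocks + (has_parent ? pptr_blocks : 0);
}

/* Reserve everything up front; nothing is modified if we come up short. */
static int toy_trans_reserve(struct toy_fs *fs, unsigned long long resblks)
{
	if (fs->free_blocks < resblks)
		return -ENOSPC;
	fs->free_blocks -= resblks;
	return 0;
}
```

On a filesystem with 10 free blocks, a pptr-enabled link needing 9 + 3 blocks fails cleanly with -ENOSPC, whereas the same link without parent pointers (9 blocks) succeeds; in both cases the reserve pool is untouched.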

> + */
> +
> +
> +/* Initializes a xfs_parent_name_rec to be stored as an attribute name */
> +void
> +xfs_init_parent_name_rec(
> +	struct xfs_parent_name_rec	*rec,
> +	struct xfs_inode		*ip,
> +	uint32_t			p_diroffset)
> +{
> +	xfs_ino_t			p_ino = ip->i_ino;
> +	uint32_t			p_gen = VFS_I(ip)->i_generation;
> +
> +	rec->p_ino = cpu_to_be64(p_ino);
> +	rec->p_gen = cpu_to_be32(p_gen);
> +	rec->p_diroffset = cpu_to_be32(p_diroffset);
> +}
> +
> +/* Initializes a xfs_parent_name_irec from an xfs_parent_name_rec */
> +void
> +xfs_init_parent_name_irec(
> +	struct xfs_parent_name_irec	*irec,
> +	struct xfs_parent_name_rec	*rec)
> +{
> +	irec->p_ino = be64_to_cpu(rec->p_ino);
> +	irec->p_gen = be32_to_cpu(rec->p_gen);
> +	irec->p_diroffset = be32_to_cpu(rec->p_diroffset);
> +}
> +
> +int
> +xfs_parent_init(
> +	xfs_mount_t                     *mp,
> +	xfs_inode_t			*ip,
> +	struct xfs_name			*target_name,
> +	struct xfs_parent_defer		**parentp)
> +{
> +	struct xfs_parent_defer		*parent;
> +	int				error;
> +
> +	if (!xfs_has_parent(mp))
> +		return 0;
> +
> +	error = xfs_attr_grab_log_assist(mp);

At some point we might want to consider boosting performance by setting
XFS_SB_FEAT_INCOMPAT_LOG_XATTRS permanently when parent pointers are
turned on, since adding the feature requires a synchronous bwrite of the
primary superblock.

I /think/ this could be accomplished by setting the feature bit in mkfs
and teaching xlog_clear_incompat to exit if xfs_has_parent()==true.
Then we can skip the xfs_attr_grab_log_assist calls.

But, let's focus on getting this patchset into good enough shape that
we can be confident that we don't need any ondisk format changes, and
worry about speed later.
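A loose userspace model of the "sticky LOG_XATTRS" idea above, counting how many synchronous superblock writes each approach costs. struct toy_sb, toy_grab_log_assist() and toy_clear_incompat() are hypothetical stand-ins loosely inspired by xfs_attr_grab_log_assist and xlog_clear_incompat; the real code also refcounts feature users, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_sb {
	bool has_parent;          /* parent pointers enabled at mkfs time */
	bool log_xattrs_incompat; /* LOG_XATTRS incompat bit in the superblock */
	unsigned int sb_writes;   /* synchronous superblock writes so far */
};

/* Set the incompat bit if it is not already set; each set costs a write. */
static void toy_grab_log_assist(struct toy_sb *sb)
{
	if (sb->log_xattrs_incompat)
		return;
	sb->log_xattrs_incompat = true;
	sb->sb_writes++;
}

/* Clear the bit when the log goes clean, unless parent pointers pin it on. */
static void toy_clear_incompat(struct toy_sb *sb)
{
	if (sb->has_parent)
		return;
	if (sb->log_xattrs_incompat) {
		sb->log_xattrs_incompat = false;
		sb->sb_writes++;
	}
}
```

A pptr filesystem formatted with the bit already set never pays for a superblock write, while a filesystem without the feature pays one write per set and one per clear.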

> +	if (error)
> +		return error;
> +
> +	parent = kzalloc(sizeof(*parent), GFP_KERNEL);

These objects are going to be created and freed fairly frequently; could
you please convert these to a kmem cache?  (That can be a cleanup at the
end.)

> +	if (!parent)
> +		return -ENOMEM;
> +
> +	/* init parent da_args */
> +	parent->args.dp = ip;
> +	parent->args.geo = mp->m_attr_geo;
> +	parent->args.whichfork = XFS_ATTR_FORK;
> +	parent->args.attr_filter = XFS_ATTR_PARENT;
> +	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED;
> +	parent->args.name = (const uint8_t *)&parent->rec;
> +	parent->args.namelen = sizeof(struct xfs_parent_name_rec);
> +
> +	if (target_name) {
> +		parent->args.value = (void *)target_name->name;
> +		parent->args.valuelen = target_name->len;
> +	}
> +
> +	*parentp = parent;
> +	return 0;
> +}
> +
> +int
> +xfs_parent_defer_add(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*ip,
> +	struct xfs_parent_defer	*parent,
> +	xfs_dir2_dataptr_t	diroffset)
> +{
> +	struct xfs_da_args	*args = &parent->args;
> +
> +	xfs_init_parent_name_rec(&parent->rec, ip, diroffset);
> +	args->trans = tp;
> +	args->hashval = xfs_da_hashname(args->name, args->namelen);
> +	return xfs_attr_defer_add(args);
> +}
> +
> +void
> +xfs_parent_cancel(
> +	xfs_mount_t		*mp,
> +	struct xfs_parent_defer *parent)
> +{
> +	xlog_drop_incompat_feat(mp->m_log);
> +	kfree(parent);
> +}
> +
> diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
> new file mode 100644
> index 000000000000..21a350b97ed5
> --- /dev/null
> +++ b/fs/xfs/libxfs/xfs_parent.h
> @@ -0,0 +1,34 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022 Oracle, Inc.
> + * All Rights Reserved.
> + */
> +#ifndef	__XFS_PARENT_H__
> +#define	__XFS_PARENT_H__
> +
> +/*
> + * Dynamically allocd structure used to wrap the needed data to pass around
> + * the defer ops machinery
> + */
> +struct xfs_parent_defer {
> +	struct xfs_parent_name_rec	rec;
> +	struct xfs_da_args		args;
> +};
> +
> +/*
> + * Parent pointer attribute prototypes
> + */
> +void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
> +			      struct xfs_inode *ip,
> +			      uint32_t p_diroffset);
> +void xfs_init_parent_name_irec(struct xfs_parent_name_irec *irec,
> +			       struct xfs_parent_name_rec *rec);
> +int xfs_parent_init(xfs_mount_t *mp, xfs_inode_t *ip,
> +		    struct xfs_name *target_name,
> +		    struct xfs_parent_defer **parentp);
> +int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_inode *ip,
> +			 struct xfs_parent_defer *parent,
> +			 xfs_dir2_dataptr_t diroffset);
> +void xfs_parent_cancel(xfs_mount_t *mp, struct xfs_parent_defer *parent);
> +
> +#endif	/* __XFS_PARENT_H__ */
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 09876ba10a42..ef993c3a8963 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -37,6 +37,8 @@
>  #include "xfs_reflink.h"
>  #include "xfs_ag.h"
>  #include "xfs_log_priv.h"
> +#include "xfs_parent.h"
> +#include "xfs_xattr.h"
>  
>  struct kmem_cache *xfs_inode_cache;
>  
> @@ -950,7 +952,7 @@ xfs_bumplink(
>  int
>  xfs_create(
>  	struct user_namespace	*mnt_userns,
> -	xfs_inode_t		*dp,
> +	struct xfs_inode	*dp,
>  	struct xfs_name		*name,
>  	umode_t			mode,
>  	dev_t			rdev,
> @@ -962,7 +964,7 @@ xfs_create(
>  	struct xfs_inode	*ip = NULL;
>  	struct xfs_trans	*tp = NULL;
>  	int			error;
> -	bool                    unlock_dp_on_error = false;
> +	bool			unlock_dp_on_error = false;
>  	prid_t			prid;
>  	struct xfs_dquot	*udqp = NULL;
>  	struct xfs_dquot	*gdqp = NULL;
> @@ -970,6 +972,8 @@ xfs_create(
>  	struct xfs_trans_res	*tres;
>  	uint			resblks;
>  	xfs_ino_t		ino;
> +	xfs_dir2_dataptr_t	diroffset;
> +	struct xfs_parent_defer	*parent = NULL;
>  
>  	trace_xfs_create(dp, name);
>  
> @@ -996,6 +1000,12 @@ xfs_create(
>  		tres = &M_RES(mp)->tr_create;
>  	}
>  
> +	if (xfs_has_parent(mp)) {
> +		error = xfs_parent_init(mp, dp, name, &parent);
> +		if (error)
> +			goto out_release_dquots;
> +	}
> +
>  	/*
>  	 * Initially assume that the file does not exist and
>  	 * reserve the resources for that case.  If that is not
> @@ -1011,7 +1021,7 @@ xfs_create(
>  				resblks, &tp);
>  	}
>  	if (error)
> -		goto out_release_dquots;
> +		goto drop_incompat;
>  
>  	xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
>  	unlock_dp_on_error = true;
> @@ -1021,6 +1031,7 @@ xfs_create(
>  	 * entry pointing to them, but a directory also the "." entry
>  	 * pointing to itself.
>  	 */
> +	init_xattrs |= xfs_has_parent(mp);
>  	error = xfs_dialloc(&tp, dp->i_ino, mode, &ino);
>  	if (!error)
>  		error = xfs_init_new_inode(mnt_userns, tp, dp, ino, mode,
> @@ -1035,11 +1046,12 @@ xfs_create(
>  	 * the transaction cancel unlocking dp so don't do it explicitly in the
>  	 * error path.
>  	 */
> -	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
> +	xfs_trans_ijoin(tp, dp, 0);
>  	unlock_dp_on_error = false;
>  
>  	error = xfs_dir_createname(tp, dp, name, ip->i_ino,
> -				   resblks - XFS_IALLOC_SPACE_RES(mp), NULL);
> +				   resblks - XFS_IALLOC_SPACE_RES(mp),
> +				   &diroffset);
>  	if (error) {
>  		ASSERT(error != -ENOSPC);
>  		goto out_trans_cancel;
> @@ -1055,6 +1067,17 @@ xfs_create(
>  		xfs_bumplink(tp, dp);
>  	}
>  
> +	/*
> +	 * If we have parent pointers, we need to add the attribute containing
> +	 * the parent information now.
> +	 */
> +	if (parent) {
> +		parent->args.dp	= ip;
> +		error = xfs_parent_defer_add(tp, dp, parent, diroffset);
> +		if (error)
> +			goto out_trans_cancel;
> +	}
> +
>  	/*
>  	 * If this is a synchronous mount, make sure that the
>  	 * create transaction goes to disk before returning to
> @@ -1080,6 +1103,7 @@ xfs_create(
>  
>  	*ipp = ip;
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	xfs_iunlock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);

I don't think we need the ILOCK class annotations for unlocks.

Other than the two things I asked about, this is looking good.

--D

>  	return 0;
>  
>   out_trans_cancel:
> @@ -1094,6 +1118,9 @@ xfs_create(
>  		xfs_finish_inode_setup(ip);
>  		xfs_irele(ip);
>  	}
> + drop_incompat:
> +	if (parent)
> +		xfs_parent_cancel(mp, parent);
>   out_release_dquots:
>  	xfs_qm_dqrele(udqp);
>  	xfs_qm_dqrele(gdqp);
> diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> index c325a28b89a8..d9067c5f6bd6 100644
> --- a/fs/xfs/xfs_xattr.c
> +++ b/fs/xfs/xfs_xattr.c
> @@ -27,7 +27,7 @@
>   * they must release the permission by calling xlog_drop_incompat_feat
>   * when they're done.
>   */
> -static inline int
> +int
>  xfs_attr_grab_log_assist(
>  	struct xfs_mount	*mp)
>  {
> diff --git a/fs/xfs/xfs_xattr.h b/fs/xfs/xfs_xattr.h
> index 2b09133b1b9b..3fd6520a4d69 100644
> --- a/fs/xfs/xfs_xattr.h
> +++ b/fs/xfs/xfs_xattr.h
> @@ -7,6 +7,7 @@
>  #define __XFS_XATTR_H__
>  
>  int xfs_attr_change(struct xfs_da_args *args);
> +int xfs_attr_grab_log_assist(struct xfs_mount *mp);
>  
>  extern const struct xattr_handler *xfs_xattr_handlers[];
>  
> -- 
> 2.25.1
> 


* Re: [PATCH RESEND v2 12/18] xfs: parent pointer attribute creation
  2022-08-09 18:01   ` Darrick J. Wong
@ 2022-08-09 18:13     ` Darrick J. Wong
  2022-08-10  3:09       ` Alli
  2022-08-10  3:08     ` Alli
  1 sibling, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-09 18:13 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Tue, Aug 09, 2022 at 11:01:01AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 04, 2022 at 12:40:07PM -0700, Allison Henderson wrote:
> > Add parent pointer attribute during xfs_create, and subroutines to
> > initialize attributes
> > 
> > [bfoster: rebase, use VFS inode generation]
> > [achender: rebased, changed __unint32_t to xfs_dir2_dataptr_t,
> 
> Nit: uint32_t, not unint32_t.
> 
> >            fixed some null pointer bugs,
> >            merged error handling patch,
> >            remove unnecessary ENOSPC handling in xfs_attr_set_first_parent]
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/Makefile            |   1 +
> >  fs/xfs/libxfs/xfs_attr.c   |   4 +-
> >  fs/xfs/libxfs/xfs_attr.h   |   4 +-
> >  fs/xfs/libxfs/xfs_parent.c | 134 +++++++++++++++++++++++++++++++++++++
> >  fs/xfs/libxfs/xfs_parent.h |  34 ++++++++++
> >  fs/xfs/xfs_inode.c         |  37 ++++++++--
> >  fs/xfs/xfs_xattr.c         |   2 +-
> >  fs/xfs/xfs_xattr.h         |   1 +
> >  8 files changed, 208 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > index 1131dd01e4fe..caeea8d968ba 100644
> > --- a/fs/xfs/Makefile
> > +++ b/fs/xfs/Makefile
> > @@ -40,6 +40,7 @@ xfs-y				+= $(addprefix libxfs/, \
> >  				   xfs_inode_fork.o \
> >  				   xfs_inode_buf.o \
> >  				   xfs_log_rlimit.o \
> > +				   xfs_parent.o \
> >  				   xfs_ag_resv.o \
> >  				   xfs_rmap.o \
> >  				   xfs_rmap_btree.o \
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index 2ef3262f21e8..0a458ea7051f 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -880,7 +880,7 @@ xfs_attr_lookup(
> >  	return error;
> >  }
> >  
> > -static int
> > +int
> >  xfs_attr_intent_init(
> >  	struct xfs_da_args	*args,
> >  	unsigned int		op_flags,	/* op flag (set or remove) */
> > @@ -898,7 +898,7 @@ xfs_attr_intent_init(
> >  }
> >  
> >  /* Sets an attribute for an inode as a deferred operation */
> > -static int
> > +int
> >  xfs_attr_defer_add(
> >  	struct xfs_da_args	*args)
> >  {
> > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > index af92cc57e7d8..b47417b5172f 100644
> > --- a/fs/xfs/libxfs/xfs_attr.h
> > +++ b/fs/xfs/libxfs/xfs_attr.h
> > @@ -544,6 +544,7 @@ int xfs_inode_hasattr(struct xfs_inode *ip);
> >  bool xfs_attr_is_leaf(struct xfs_inode *ip);
> >  int xfs_attr_get_ilocked(struct xfs_da_args *args);
> >  int xfs_attr_get(struct xfs_da_args *args);
> > +int xfs_attr_defer_add(struct xfs_da_args *args);
> >  int xfs_attr_set(struct xfs_da_args *args);
> >  int xfs_attr_set_iter(struct xfs_attr_intent *attr);
> >  int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
> > @@ -552,7 +553,8 @@ bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
> >  int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
> >  void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
> >  			 unsigned int *total);
> > -
> > +int xfs_attr_intent_init(struct xfs_da_args *args, unsigned int op_flags,
> > +			 struct xfs_attr_intent  **attr);
> >  /*
> >   * Check to see if the attr should be upgraded from non-existent or shortform to
> >   * single-leaf-block attribute list.
> > diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
> > new file mode 100644
> > index 000000000000..4ab531c77d7d
> > --- /dev/null
> > +++ b/fs/xfs/libxfs/xfs_parent.c
> > @@ -0,0 +1,134 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (c) 2022 Oracle, Inc.
> > + * All rights reserved.
> > + */
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "xfs_format.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_bmap_btree.h"
> > +#include "xfs_inode.h"
> > +#include "xfs_error.h"
> > +#include "xfs_trace.h"
> > +#include "xfs_trans.h"
> > +#include "xfs_da_btree.h"
> > +#include "xfs_attr.h"
> > +#include "xfs_da_btree.h"
> > +#include "xfs_attr_sf.h"
> > +#include "xfs_bmap.h"
> > +#include "xfs_defer.h"
> > +#include "xfs_log.h"
> > +#include "xfs_xattr.h"
> > +#include "xfs_parent.h"
> > +
> > +/*
> > + * Parent pointer attribute handling.
> > + *
> > + * Because the attribute value is a filename component, it will never be longer
> > + * than 255 bytes. This means the attribute will always be a local format
> > + * attribute, as xfs_attr_leaf_entsize_local_max() for v5 filesystems will
> > + * always be larger than this (max is 75% of block size).
> > + *
> > + * Creating a new parent attribute will always create a new attribute - there
> > + * should never, ever be an existing attribute in the tree for a new inode.
> > + * ENOSPC behavior is problematic - creating the inode without the parent
> > + * pointer is effectively a corruption, so we allow parent attribute creation
> > + * to dip into the reserve block pool to avoid unexpected ENOSPC errors from
> > + * occurring.
> 
> Shouldn't we increase XFS_LINK_SPACE_RES to avoid this?  The reserve
> pool isn't terribly large (8192 blocks) and was really only supposed to
> save us from an ENOSPC shutdown if an unwritten extent conversion in the
> writeback endio handler needs a few more blocks.
> 
> IOWs, we really ought to ENOSPC at transaction reservation time instead
> of draining the reserve pool.
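Here is a quick worked check of the "always local" claim in the comment above, written as a userspace C sketch. The local entry layout (2-byte value length, 1-byte name length, then name and value bytes, rounded up to 4 bytes), the 16-byte packed record, and the 75%-of-block limit are all assumptions that mirror the comment and xfs_da_format.h. They are not copied from the headers, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed sizes (illustrative, not taken from xfs headers):
 *   - attr name: packed be64 ino + be32 gen + be32 diroffset;
 *   - local leaf entry: 2-byte valuelen, 1-byte namelen, name, value,
 *     rounded up to a 4-byte boundary;
 *   - local entry limit: 75% of the block size.
 */
#define PPTR_NAME_LEN		16u	/* 8 + 4 + 4 */
#define PPTR_MAX_VALUE_LEN	255u	/* MAXNAMELEN - 1 */
#define LEAF_ENTRY_HDR		3u	/* valuelen (2) + namelen (1) */
#define LEAF_NAME_ALIGN		4u

static unsigned int round_up4(unsigned int x)
{
	return (x + LEAF_NAME_ALIGN - 1) & ~(LEAF_NAME_ALIGN - 1);
}

/* Hypothetical stand-in for xfs_attr_leaf_entsize_local(). */
static unsigned int local_entsize(unsigned int namelen, unsigned int valuelen)
{
	return round_up4(LEAF_ENTRY_HDR + namelen + valuelen);
}

/* Hypothetical stand-in for xfs_attr_leaf_entsize_local_max(). */
static unsigned int local_entsize_max(unsigned int blocksize)
{
	return (blocksize >> 1) + (blocksize >> 2);	/* 75% */
}

/* True if the largest possible parent pointer entry stays local. */
static bool pptr_always_local(unsigned int blocksize)
{
	return local_entsize(PPTR_NAME_LEN, PPTR_MAX_VALUE_LEN) <=
	       local_entsize_max(blocksize);
}
```

Under these assumptions, the largest parent pointer entry is 276 bytes. The limit is 768 bytes at a 1 KiB block size, which is assumed here to be the smallest v5 block size, so the local-format claim holds. This check does not show that the attr fork has room for the entry. That is the part the space reservation has to cover.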
> 
> > + */
> > +
> > +
> > +/* Initializes a xfs_parent_name_rec to be stored as an attribute name */
> > +void
> > +xfs_init_parent_name_rec(
> > +	struct xfs_parent_name_rec	*rec,
> > +	struct xfs_inode		*ip,
> > +	uint32_t			p_diroffset)
> > +{
> > +	xfs_ino_t			p_ino = ip->i_ino;
> > +	uint32_t			p_gen = VFS_I(ip)->i_generation;
> > +
> > +	rec->p_ino = cpu_to_be64(p_ino);
> > +	rec->p_gen = cpu_to_be32(p_gen);
> > +	rec->p_diroffset = cpu_to_be32(p_diroffset);
> > +}
> > +
> > +/* Initializes a xfs_parent_name_irec from an xfs_parent_name_rec */
> > +void
> > +xfs_init_parent_name_irec(
> > +	struct xfs_parent_name_irec	*irec,
> > +	struct xfs_parent_name_rec	*rec)
> > +{
> > +	irec->p_ino = be64_to_cpu(rec->p_ino);
> > +	irec->p_gen = be32_to_cpu(rec->p_gen);
> > +	irec->p_diroffset = be32_to_cpu(rec->p_diroffset);
> > +}
> > +
> > +int
> > +xfs_parent_init(
> > +	xfs_mount_t                     *mp,
> > +	xfs_inode_t			*ip,

More nits: Please don't use struct typedefs here.
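The xfs_init_parent_name_rec() and xfs_init_parent_name_irec() helpers quoted above convert between the CPU-endian parent triple and the big-endian attr name. The userspace model below shows that round trip. It assumes a packed 16-byte layout in the same field order (ino, gen, diroffset), and the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* On-disk form: 16 big-endian bytes (assumed packed layout). */
struct pptr_rec {
	uint8_t		bytes[16];
};

/* In-core form: CPU-endian values. */
struct pptr_irec {
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint32_t	p_diroffset;
};

/* Store the low @nbytes of @v at @p, most significant byte first. */
static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
	for (int i = nbytes - 1; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Load @nbytes big-endian bytes from @p. */
static uint64_t get_be(const uint8_t *p, int nbytes)
{
	uint64_t v = 0;

	for (int i = 0; i < nbytes; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Analogue of xfs_init_parent_name_rec(): CPU values -> big-endian name. */
static void pptr_encode(struct pptr_rec *rec, const struct pptr_irec *irec)
{
	put_be(&rec->bytes[0], irec->p_ino, 8);
	put_be(&rec->bytes[8], irec->p_gen, 4);
	put_be(&rec->bytes[12], irec->p_diroffset, 4);
}

/* Analogue of xfs_init_parent_name_irec(): big-endian name -> CPU values. */
static void pptr_decode(struct pptr_irec *irec, const struct pptr_rec *rec)
{
	irec->p_ino = get_be(&rec->bytes[0], 8);
	irec->p_gen = (uint32_t)get_be(&rec->bytes[8], 4);
	irec->p_diroffset = (uint32_t)get_be(&rec->bytes[12], 4);
}
```

Because the name is fixed-width and big-endian, the same parent produces the same bytes on any host. That matters here because xfs_parent_defer_add() hashes those raw bytes with xfs_da_hashname().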

> > +	struct xfs_name			*target_name,
> > +	struct xfs_parent_defer		**parentp)
> > +{
> > +	struct xfs_parent_defer		*parent;
> > +	int				error;
> > +
> > +	if (!xfs_has_parent(mp))
> > +		return 0;
> > +
> > +	error = xfs_attr_grab_log_assist(mp);
> 
> At some point we might want to consider boosting performance by setting
> XFS_SB_FEAT_INCOMPAT_LOG_XATTRS permanently when parent pointers are
> turned on, since adding the feature requires a synchronous bwrite of the
> primary superblock.
> 
> I /think/ this could be accomplished by setting the feature bit in mkfs
> and teaching xlog_clear_incompat to exit if xfs_has_parent()==true.
> Then we can skip the xfs_attr_grab_log_assist calls.
> 
> But, let's focus on getting this patchset into good enough shape that
> we can be confident that we don't need any ondisk format changes, and
> worry about speed later.
> 
> > +	if (error)
> > +		return error;
> > +
> > +	parent = kzalloc(sizeof(*parent), GFP_KERNEL);
> 
> These objects are going to be created and freed fairly frequently; could
> you please convert these to a kmem cache?  (That can be a cleanup at the
> end.)
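This allocation happens after xfs_attr_grab_log_assist() has already succeeded, so its failure path needs to give the log assist back. As quoted, a kzalloc() failure appears to return -ENOMEM while still holding it, and the callers only cancel when @parent is non-NULL. Below is a userspace model of the intended grab/allocate/cancel pairing. The counter stands in for the real hold, and every name is hypothetical. Switching to a kmem cache would change only the allocate and free calls, not this pairing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * log_assist_holds models the hold taken by xfs_attr_grab_log_assist()
 * and released by xlog_drop_incompat_feat(); fail_alloc lets a caller
 * force the allocation to fail.
 */
static int log_assist_holds;
static bool fail_alloc;

struct pptr_defer {
	int	dummy;		/* stands in for rec + da_args */
};

static int grab_log_assist(void)
{
	log_assist_holds++;
	return 0;
}

static void drop_log_assist(void)
{
	assert(log_assist_holds > 0);
	log_assist_holds--;
}

static int pptr_init(struct pptr_defer **parentp)
{
	struct pptr_defer *parent;
	int error;

	error = grab_log_assist();
	if (error)
		return error;

	parent = fail_alloc ? NULL : calloc(1, sizeof(*parent));
	if (!parent) {
		/* Undo the hold: the caller never sees @parent to cancel it. */
		drop_log_assist();
		return -ENOMEM;
	}

	*parentp = parent;
	return 0;
}

/* Every successful pptr_init() must be paired with exactly one of these. */
static void pptr_cancel(struct pptr_defer *parent)
{
	drop_log_assist();
	free(parent);
}
```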
> 
> > +	if (!parent)
> > +		return -ENOMEM;
> > +
> > +	/* init parent da_args */
> > +	parent->args.dp = ip;
> > +	parent->args.geo = mp->m_attr_geo;
> > +	parent->args.whichfork = XFS_ATTR_FORK;
> > +	parent->args.attr_filter = XFS_ATTR_PARENT;
> > +	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED;
> > +	parent->args.name = (const uint8_t *)&parent->rec;
> > +	parent->args.namelen = sizeof(struct xfs_parent_name_rec);
> > +
> > +	if (target_name) {
> > +		parent->args.value = (void *)target_name->name;
> > +		parent->args.valuelen = target_name->len;
> > +	}
> > +
> > +	*parentp = parent;
> > +	return 0;
> > +}
> > +
> > +int
> > +xfs_parent_defer_add(
> > +	struct xfs_trans	*tp,
> > +	struct xfs_inode	*ip,
> > +	struct xfs_parent_defer	*parent,
> > +	xfs_dir2_dataptr_t	diroffset)
> > +{
> > +	struct xfs_da_args	*args = &parent->args;
> > +
> > +	xfs_init_parent_name_rec(&parent->rec, ip, diroffset);
> > +	args->trans = tp;
> > +	args->hashval = xfs_da_hashname(args->name, args->namelen);
> > +	return xfs_attr_defer_add(args);
> > +}
> > +
> > +void
> > +xfs_parent_cancel(
> > +	xfs_mount_t		*mp,
> > +	struct xfs_parent_defer *parent)
> > +{
> > +	xlog_drop_incompat_feat(mp->m_log);
> > +	kfree(parent);
> > +}
> > +
> > diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
> > new file mode 100644
> > index 000000000000..21a350b97ed5
> > --- /dev/null
> > +++ b/fs/xfs/libxfs/xfs_parent.h
> > @@ -0,0 +1,34 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (c) 2022 Oracle, Inc.
> > + * All Rights Reserved.
> > + */
> > +#ifndef	__XFS_PARENT_H__
> > +#define	__XFS_PARENT_H__
> > +
> > +/*
> > + * Dynamically allocated structure used to wrap the needed data to pass around
> > + * the defer ops machinery
> > + */
> > +struct xfs_parent_defer {
> > +	struct xfs_parent_name_rec	rec;
> > +	struct xfs_da_args		args;
> > +};
> > +
> > +/*
> > + * Parent pointer attribute prototypes
> > + */
> > +void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
> > +			      struct xfs_inode *ip,
> > +			      uint32_t p_diroffset);
> > +void xfs_init_parent_name_irec(struct xfs_parent_name_irec *irec,
> > +			       struct xfs_parent_name_rec *rec);
> > +int xfs_parent_init(xfs_mount_t *mp, xfs_inode_t *ip,
> > +		    struct xfs_name *target_name,
> > +		    struct xfs_parent_defer **parentp);
> > +int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_inode *ip,
> > +			 struct xfs_parent_defer *parent,
> > +			 xfs_dir2_dataptr_t diroffset);
> > +void xfs_parent_cancel(xfs_mount_t *mp, struct xfs_parent_defer *parent);
> > +
> > +#endif	/* __XFS_PARENT_H__ */
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 09876ba10a42..ef993c3a8963 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -37,6 +37,8 @@
> >  #include "xfs_reflink.h"
> >  #include "xfs_ag.h"
> >  #include "xfs_log_priv.h"
> > +#include "xfs_parent.h"
> > +#include "xfs_xattr.h"
> >  
> >  struct kmem_cache *xfs_inode_cache;
> >  
> > @@ -950,7 +952,7 @@ xfs_bumplink(
> >  int
> >  xfs_create(
> >  	struct user_namespace	*mnt_userns,
> > -	xfs_inode_t		*dp,
> > +	struct xfs_inode	*dp,
> >  	struct xfs_name		*name,
> >  	umode_t			mode,
> >  	dev_t			rdev,
> > @@ -962,7 +964,7 @@ xfs_create(
> >  	struct xfs_inode	*ip = NULL;
> >  	struct xfs_trans	*tp = NULL;
> >  	int			error;
> > -	bool                    unlock_dp_on_error = false;
> > +	bool			unlock_dp_on_error = false;
> >  	prid_t			prid;
> >  	struct xfs_dquot	*udqp = NULL;
> >  	struct xfs_dquot	*gdqp = NULL;
> > @@ -970,6 +972,8 @@ xfs_create(
> >  	struct xfs_trans_res	*tres;
> >  	uint			resblks;
> >  	xfs_ino_t		ino;
> > +	xfs_dir2_dataptr_t	diroffset;
> > +	struct xfs_parent_defer	*parent = NULL;
> >  
> >  	trace_xfs_create(dp, name);
> >  
> > @@ -996,6 +1000,12 @@ xfs_create(
> >  		tres = &M_RES(mp)->tr_create;
> >  	}
> >  
> > +	if (xfs_has_parent(mp)) {
> > +		error = xfs_parent_init(mp, dp, name, &parent);
> > +		if (error)
> > +			goto out_release_dquots;
> > +	}
> > +
> >  	/*
> >  	 * Initially assume that the file does not exist and
> >  	 * reserve the resources for that case.  If that is not
> > @@ -1011,7 +1021,7 @@ xfs_create(
> >  				resblks, &tp);
> >  	}
> >  	if (error)
> > -		goto out_release_dquots;
> > +		goto drop_incompat;
> >  
> >  	xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
> >  	unlock_dp_on_error = true;
> > @@ -1021,6 +1031,7 @@ xfs_create(
> >  	 * entry pointing to them, but a directory also the "." entry
> >  	 * pointing to itself.
> >  	 */
> > +	init_xattrs |= xfs_has_parent(mp);
> >  	error = xfs_dialloc(&tp, dp->i_ino, mode, &ino);
> >  	if (!error)
> >  		error = xfs_init_new_inode(mnt_userns, tp, dp, ino, mode,
> > @@ -1035,11 +1046,12 @@ xfs_create(
> >  	 * the transaction cancel unlocking dp so don't do it explicitly in the
> >  	 * error path.
> >  	 */
> > -	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
> > +	xfs_trans_ijoin(tp, dp, 0);
> >  	unlock_dp_on_error = false;
> >  
> >  	error = xfs_dir_createname(tp, dp, name, ip->i_ino,
> > -				   resblks - XFS_IALLOC_SPACE_RES(mp), NULL);
> > +				   resblks - XFS_IALLOC_SPACE_RES(mp),
> > +				   &diroffset);
> >  	if (error) {
> >  		ASSERT(error != -ENOSPC);
> >  		goto out_trans_cancel;
> > @@ -1055,6 +1067,17 @@ xfs_create(
> >  		xfs_bumplink(tp, dp);
> >  	}
> >  
> > +	/*
> > +	 * If we have parent pointers, we need to add the attribute containing
> > +	 * the parent information now.
> > +	 */
> > +	if (parent) {
> > +		parent->args.dp	= ip;

...and on second thought, it seems a little odd that you pass @dp to
xfs_parent_init only to override parent->args.dp here.  Given that this
doesn't do anything with @parent until here, why not pass NULL to the
init function above?

--D

> > +		error = xfs_parent_defer_add(tp, dp, parent, diroffset);
> > +		if (error)
> > +			goto out_trans_cancel;
> > +	}
> > +
> >  	/*
> >  	 * If this is a synchronous mount, make sure that the
> >  	 * create transaction goes to disk before returning to
> > @@ -1080,6 +1103,7 @@ xfs_create(
> >  
> >  	*ipp = ip;
> >  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +	xfs_iunlock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
> 
> I don't think we need the ILOCK class annotations for unlocks.
> 
> Other than the two things I asked about, this is looking good.
> 
> --D
> 
> >  	return 0;
> >  
> >   out_trans_cancel:
> > @@ -1094,6 +1118,9 @@ xfs_create(
> >  		xfs_finish_inode_setup(ip);
> >  		xfs_irele(ip);
> >  	}
> > + drop_incompat:
> > +	if (parent)
> > +		xfs_parent_cancel(mp, parent);
> >   out_release_dquots:
> >  	xfs_qm_dqrele(udqp);
> >  	xfs_qm_dqrele(gdqp);
> > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > index c325a28b89a8..d9067c5f6bd6 100644
> > --- a/fs/xfs/xfs_xattr.c
> > +++ b/fs/xfs/xfs_xattr.c
> > @@ -27,7 +27,7 @@
> >   * they must release the permission by calling xlog_drop_incompat_feat
> >   * when they're done.
> >   */
> > -static inline int
> > +int
> >  xfs_attr_grab_log_assist(
> >  	struct xfs_mount	*mp)
> >  {
> > diff --git a/fs/xfs/xfs_xattr.h b/fs/xfs/xfs_xattr.h
> > index 2b09133b1b9b..3fd6520a4d69 100644
> > --- a/fs/xfs/xfs_xattr.h
> > +++ b/fs/xfs/xfs_xattr.h
> > @@ -7,6 +7,7 @@
> >  #define __XFS_XATTR_H__
> >  
> >  int xfs_attr_change(struct xfs_da_args *args);
> > +int xfs_attr_grab_log_assist(struct xfs_mount *mp);
> >  
> >  extern const struct xattr_handler *xfs_xattr_handlers[];
> >  
> > -- 
> > 2.25.1
> > 

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH RESEND v2 13/18] xfs: add parent attributes to link
  2022-08-04 19:40 ` [PATCH RESEND v2 13/18] xfs: add parent attributes to link Allison Henderson
@ 2022-08-09 18:43   ` Darrick J. Wong
  2022-08-10  3:09     ` Alli
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-09 18:43 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Aug 04, 2022 at 12:40:08PM -0700, Allison Henderson wrote:
> This patch modifies xfs_link to add a parent pointer to the inode.
> 
> [bfoster: rebase, use VFS inode fields, fix xfs_bmap_finish() usage]
> [achender: rebased, changed __uint32_t to xfs_dir2_dataptr_t,
>            fixed null pointer bugs]
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/xfs_inode.c | 43 ++++++++++++++++++++++++++++++++++---------
>  1 file changed, 34 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index ef993c3a8963..6e5deb0d42c4 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -1228,14 +1228,16 @@ xfs_create_tmpfile(
>  
>  int
>  xfs_link(
> -	xfs_inode_t		*tdp,
> -	xfs_inode_t		*sip,
> +	struct xfs_inode	*tdp,
> +	struct xfs_inode	*sip,
>  	struct xfs_name		*target_name)
>  {
> -	xfs_mount_t		*mp = tdp->i_mount;
> -	xfs_trans_t		*tp;
> +	struct xfs_mount	*mp = tdp->i_mount;
> +	struct xfs_trans	*tp;
>  	int			error, nospace_error = 0;
>  	int			resblks;
> +	xfs_dir2_dataptr_t	diroffset;
> +	struct xfs_parent_defer	*parent = NULL;
>  
>  	trace_xfs_link(tdp, target_name);
>  
> @@ -1252,11 +1254,17 @@ xfs_link(
>  	if (error)
>  		goto std_return;
>  
> +	if (xfs_has_parent(mp)) {
> +		error = xfs_parent_init(mp, sip, target_name, &parent);

Why does xfs_parent_init check xfs_has_parent if the callers already do
that?

> +		if (error)
> +			goto std_return;
> +	}
> +
>  	resblks = XFS_LINK_SPACE_RES(mp, target_name->len);

Same comment about increasing XFS_LINK_SPACE_RES to accommodate xattr
expansion as I had for the last patch.

>  	error = xfs_trans_alloc_dir(tdp, &M_RES(mp)->tr_link, sip, &resblks,
>  			&tp, &nospace_error);
>  	if (error)
> -		goto std_return;
> +		goto drop_incompat;
>  
>  	/*
>  	 * If we are using project inheritance, we only allow hard link
> @@ -1289,14 +1297,26 @@ xfs_link(
>  	}
>  
>  	error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino,
> -				   resblks, NULL);
> +				   resblks, &diroffset);
>  	if (error)
> -		goto error_return;
> +		goto out_defer_cancel;
>  	xfs_trans_ichgtime(tp, tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
>  	xfs_trans_log_inode(tp, tdp, XFS_ILOG_CORE);
>  
>  	xfs_bumplink(tp, sip);
>  
> +	/*
> +	 * If we have parent pointers, we now need to add the parent record to
> +	 * the attribute fork of the inode. If this is the initial parent
> +	 * attribute, we need to create it correctly, otherwise we can just add
> +	 * the parent to the inode.
> +	 */
> +	if (parent) {
> +		error = xfs_parent_defer_add(tp, tdp, parent, diroffset);

A followup to the comments I made to the previous patch about
parent->args.dp --

Since you're partially initializing the xfs_parent_defer structure
before you even have the dir offset, why not delay initializing the
parent and child pointers until the xfs_parent_defer_add step?

int
xfs_parent_init(
	struct xfs_mount		*mp,
	struct xfs_parent_defer		**parentp)
{
	struct xfs_parent_defer		*parent;
	int				error;

	if (!xfs_has_parent(mp))
		return 0;

	error = xfs_attr_grab_log_assist(mp);
	if (error)
		return error;

	parent = kzalloc(sizeof(*parent), GFP_KERNEL);
	if (!parent) {
		xlog_drop_incompat_feat(mp->m_log);
		return -ENOMEM;
	}

	/* init parent da_args */
	parent->args.geo = mp->m_attr_geo;
	parent->args.whichfork = XFS_ATTR_FORK;
	parent->args.attr_filter = XFS_ATTR_PARENT;
	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED;
	parent->args.name = (const uint8_t *)&parent->rec;
	parent->args.namelen = sizeof(struct xfs_parent_name_rec);

	*parentp = parent;
	return 0;
}

int
xfs_parent_defer_add(
	struct xfs_trans	*tp,
	struct xfs_parent_defer	*parent,
	struct xfs_inode	*dp,
	struct xfs_name		*parent_name,
	xfs_dir2_dataptr_t	parent_offset,
	struct xfs_inode	*child)
{
	struct xfs_da_args	*args = &parent->args;

	xfs_init_parent_name_rec(&parent->rec, dp, parent_offset);
	args->hashval = xfs_da_hashname(args->name, args->namelen);

	args->trans = tp;
	args->dp = child;
	if (parent_name) {
		args->value = (void *)parent_name->name;
		args->valuelen = parent_name->len;
	}
	return xfs_attr_defer_add(args);
}

And then the callsites become:

	/*
	 * If we have parent pointers, we now need to add the parent record to
	 * the attribute fork of the inode. If this is the initial parent
	 * attribute, we need to create it correctly, otherwise we can just add
	 * the parent to the inode.
	 */
	if (parent) {
		error = xfs_parent_defer_add(tp, parent, tdp,
				target_name, diroffset, sip);
		if (error)
			goto out_defer_cancel;
	}

Aside from the API suggestions, the rest looks good to me.

--D
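A userspace model of the two-phase shape suggested above helps check the ordering it implies. The record, and therefore the hash, can only be computed once the directory offset exists. The entry name goes in the value, while the embedded record stays the attr name. The struct and function names are hypothetical, and toy_hash() is a stand-in for xfs_da_hashname(), not the real hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct model_args {
	const void	*name;		/* points at the embedded record */
	size_t		namelen;
	const char	*value;		/* entry name in the parent dir */
	size_t		valuelen;
	uint64_t	child_ino;	/* inode whose attr fork changes */
	uint32_t	hashval;
};

struct model_parent {
	struct { uint64_t ino; uint32_t gen, diroffset; } rec;
	struct model_args	args;
};

/* Toy rotate-and-xor hash; NOT xfs_da_hashname(). */
static uint32_t toy_hash(const void *buf, size_t len)
{
	const unsigned char *p = buf;
	uint32_t h = 0;

	while (len--)
		h = (h << 7 | h >> 25) ^ *p++;
	return h;
}

/* Phase one (before the transaction): no inode, name, or offset yet. */
static void model_parent_init(struct model_parent *parent)
{
	memset(parent, 0, sizeof(*parent));
	parent->args.name = &parent->rec;
	parent->args.namelen = sizeof(parent->rec);
}

/* Phase two (after the dir update): everything that depends on it. */
static void model_parent_add(struct model_parent *parent,
			     uint64_t dp_ino, uint32_t dp_gen,
			     const char *entry, uint32_t diroffset,
			     uint64_t child_ino)
{
	struct model_args *args = &parent->args;

	parent->rec.ino = dp_ino;
	parent->rec.gen = dp_gen;
	parent->rec.diroffset = diroffset;
	/* Hash only after the record is complete. */
	args->hashval = toy_hash(args->name, args->namelen);

	args->child_ino = child_ino;
	args->value = entry;		/* value, not name: name is the rec */
	args->valuelen = strlen(entry);
}
```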

> +		if (error)
> +			goto out_defer_cancel;
> +	}
> +
>  	/*
>  	 * If this is a synchronous mount, make sure that the
>  	 * link transaction goes to disk before returning to
> @@ -1310,11 +1330,16 @@ xfs_link(
>  	xfs_iunlock(sip, XFS_ILOCK_EXCL);
>  	return error;
>  
> - error_return:
> +out_defer_cancel:
> +	xfs_defer_cancel(tp);
> +error_return:
>  	xfs_trans_cancel(tp);
>  	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
>  	xfs_iunlock(sip, XFS_ILOCK_EXCL);
> - std_return:
> +drop_incompat:
> +	if (parent)
> +		xfs_parent_cancel(mp, parent);
> +std_return:
>  	if (error == -ENOSPC && nospace_error)
>  		error = nospace_error;
>  	return error;
> -- 
> 2.25.1
> 

* Re: [PATCH RESEND v2 14/18] xfs: remove parent pointers in unlink
  2022-08-04 19:40 ` [PATCH RESEND v2 14/18] xfs: remove parent pointers in unlink Allison Henderson
@ 2022-08-09 18:45   ` Darrick J. Wong
  2022-08-10  3:09     ` Alli
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-09 18:45 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Aug 04, 2022 at 12:40:09PM -0700, Allison Henderson wrote:
> This patch removes the parent pointer attribute during unlink.
> 
> [bfoster: rebase, use VFS inode generation]
> [achender: rebased, changed __uint32_t to xfs_dir2_dataptr_t,
>            implemented xfs_attr_remove_parent]
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_attr.c   |  2 +-
>  fs/xfs/libxfs/xfs_attr.h   |  1 +
>  fs/xfs/libxfs/xfs_parent.c | 15 +++++++++++++++
>  fs/xfs/libxfs/xfs_parent.h |  3 +++
>  fs/xfs/xfs_inode.c         | 29 +++++++++++++++++++++++------
>  5 files changed, 43 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> index 0a458ea7051f..77513ff7e1ec 100644
> --- a/fs/xfs/libxfs/xfs_attr.c
> +++ b/fs/xfs/libxfs/xfs_attr.c
> @@ -936,7 +936,7 @@ xfs_attr_defer_replace(
>  }
>  
>  /* Removes an attribute for an inode as a deferred operation */
> -static int
> +int
>  xfs_attr_defer_remove(
>  	struct xfs_da_args	*args)
>  {
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index b47417b5172f..2e11e5e83941 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -545,6 +545,7 @@ bool xfs_attr_is_leaf(struct xfs_inode *ip);
>  int xfs_attr_get_ilocked(struct xfs_da_args *args);
>  int xfs_attr_get(struct xfs_da_args *args);
>  int xfs_attr_defer_add(struct xfs_da_args *args);
> +int xfs_attr_defer_remove(struct xfs_da_args *args);
>  int xfs_attr_set(struct xfs_da_args *args);
>  int xfs_attr_set_iter(struct xfs_attr_intent *attr);
>  int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
> diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
> index 4ab531c77d7d..03f03f731d02 100644
> --- a/fs/xfs/libxfs/xfs_parent.c
> +++ b/fs/xfs/libxfs/xfs_parent.c
> @@ -123,6 +123,21 @@ xfs_parent_defer_add(
>  	return xfs_attr_defer_add(args);
>  }
>  
> +int
> +xfs_parent_defer_remove(
> +	struct xfs_trans	*tp,
> +	struct xfs_inode	*ip,
> +	struct xfs_parent_defer	*parent,
> +	xfs_dir2_dataptr_t	diroffset)

Same suggestion about setting args->dp here instead of in
xfs_parent_init.

> +{
> +	struct xfs_da_args	*args = &parent->args;
> +
> +	xfs_init_parent_name_rec(&parent->rec, ip, diroffset);
> +	args->trans = tp;
> +	args->hashval = xfs_da_hashname(args->name, args->namelen);
> +	return xfs_attr_defer_remove(args);
> +}
> +
>  void
>  xfs_parent_cancel(
>  	xfs_mount_t		*mp,
> diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
> index 21a350b97ed5..67948f4b3834 100644
> --- a/fs/xfs/libxfs/xfs_parent.h
> +++ b/fs/xfs/libxfs/xfs_parent.h
> @@ -29,6 +29,9 @@ int xfs_parent_init(xfs_mount_t *mp, xfs_inode_t *ip,
>  int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_inode *ip,
>  			 struct xfs_parent_defer *parent,
>  			 xfs_dir2_dataptr_t diroffset);
> +int xfs_parent_defer_remove(struct xfs_trans *tp, struct xfs_inode *ip,
> +			    struct xfs_parent_defer *parent,
> +			    xfs_dir2_dataptr_t diroffset);
>  void xfs_parent_cancel(xfs_mount_t *mp, struct xfs_parent_defer *parent);
>  
>  #endif	/* __XFS_PARENT_H__ */
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 6e5deb0d42c4..69bb67f2a252 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -2464,16 +2464,18 @@ xfs_iunpin_wait(
>   */
>  int
>  xfs_remove(
> -	xfs_inode_t             *dp,
> +	struct xfs_inode	*dp,
>  	struct xfs_name		*name,
> -	xfs_inode_t		*ip)
> +	struct xfs_inode	*ip)
>  {
> -	xfs_mount_t		*mp = dp->i_mount;
> -	xfs_trans_t             *tp = NULL;
> +	struct xfs_mount	*mp = dp->i_mount;
> +	struct xfs_trans	*tp = NULL;
>  	int			is_dir = S_ISDIR(VFS_I(ip)->i_mode);
>  	int			dontcare;
>  	int                     error = 0;
>  	uint			resblks;
> +	xfs_dir2_dataptr_t	dir_offset;
> +	struct xfs_parent_defer	*parent = NULL;
>  
>  	trace_xfs_remove(dp, name);
>  
> @@ -2488,6 +2490,12 @@ xfs_remove(
>  	if (error)
>  		goto std_return;
>  
> +	if (xfs_has_parent(mp)) {
> +		error = xfs_parent_init(mp, ip, NULL, &parent);
> +		if (error)
> +			goto std_return;
> +	}
> +
>  	/*
>  	 * We try to get the real space reservation first, allowing for
>  	 * directory btree deletion(s) implying possible bmap insert(s).  If we
> @@ -2504,7 +2512,7 @@ xfs_remove(
>  			&tp, &dontcare);
>  	if (error) {
>  		ASSERT(error != -ENOSPC);
> -		goto std_return;
> +		goto drop_incompat;
>  	}
>  
>  	/*
> @@ -2558,12 +2566,18 @@ xfs_remove(
>  	if (error)
>  		goto out_trans_cancel;
>  
> -	error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks, NULL);
> +	error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks, &dir_offset);
>  	if (error) {
>  		ASSERT(error != -ENOENT);
>  		goto out_trans_cancel;
>  	}
>  
> +	if (xfs_has_parent(mp)) {
> +		error = xfs_parent_defer_remove(tp, dp, parent, dir_offset);

If it's safe to gate xfs_parent_cancel on "if (parent)" then can we
avoid the atomic bit access by doing that here too?

Otherwise looks good here.

--D
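The attr name embeds the directory offset. As a result, xfs_parent_defer_remove() only finds the record if it is given the same (parent inode, generation, offset) triple that create or link stored. That is why xfs_dir_removename() now has to hand back dir_offset. The small userspace model below tracks one inode's parent pointer set to show how add, remove, and rename interact. The fixed-size table and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_PPTRS	8

/* Key mirrors the attr name: parent ino, parent gen, diroffset. */
struct pptr_key {
	uint64_t	ino;
	uint32_t	gen;
	uint32_t	diroffset;
};

struct pptr_set {
	struct pptr_key	keys[MAX_PPTRS];
	int		nr;
};

static bool key_eq(const struct pptr_key *a, const struct pptr_key *b)
{
	return a->ino == b->ino && a->gen == b->gen &&
	       a->diroffset == b->diroffset;
}

static int pptr_find(const struct pptr_set *s, const struct pptr_key *k)
{
	for (int i = 0; i < s->nr; i++)
		if (key_eq(&s->keys[i], k))
			return i;
	return -1;
}

/* create/link: the key must not already exist. */
static bool pptr_add(struct pptr_set *s, struct pptr_key k)
{
	if (s->nr == MAX_PPTRS || pptr_find(s, &k) >= 0)
		return false;
	s->keys[s->nr++] = k;
	return true;
}

/* unlink: fails if the caller passes a stale or wrong diroffset. */
static bool pptr_remove(struct pptr_set *s, struct pptr_key k)
{
	int i = pptr_find(s, &k);

	if (i < 0)
		return false;
	s->keys[i] = s->keys[--s->nr];	/* order does not matter */
	return true;
}

/*
 * rename: drop the old (dir, offset) record and add the new one.  In
 * xfs both updates ride in one transaction; this model does not roll
 * back a half-done rename.
 */
static bool pptr_rename(struct pptr_set *s, struct pptr_key from,
			struct pptr_key to)
{
	return pptr_remove(s, from) && pptr_add(s, to);
}
```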

> +		if (error)
> +			goto out_trans_cancel;
> +	}
> +
>  	/*
>  	 * If this is a synchronous mount, make sure that the
>  	 * remove transaction goes to disk before returning to
> @@ -2588,6 +2602,9 @@ xfs_remove(
>   out_unlock:
>  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
>  	xfs_iunlock(dp, XFS_ILOCK_EXCL);
> + drop_incompat:
> +	if (parent)
> +		xfs_parent_cancel(mp, parent);
>   std_return:
>  	return error;
>  }
> -- 
> 2.25.1
> 

* Re: [PATCH RESEND v2 15/18] xfs: Add parent pointers to rename
  2022-08-04 19:40 ` [PATCH RESEND v2 15/18] xfs: Add parent pointers to rename Allison Henderson
@ 2022-08-09 18:49   ` Darrick J. Wong
  2022-08-10  3:09     ` Alli
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-09 18:49 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Aug 04, 2022 at 12:40:10PM -0700, Allison Henderson wrote:
> This patch removes the old parent pointer attribute during the rename
> operation, and re-adds the updated parent pointer.  In the case of
> xfs_cross_rename, we modify the routine not to roll the transaction just
> yet.  We will do this after the parent pointer is added in the calling
> xfs_rename function.
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/xfs_inode.c | 128 +++++++++++++++++++++++++++++++++------------
>  1 file changed, 94 insertions(+), 34 deletions(-)
> 
> diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> index 69bb67f2a252..8a81b78b6dd7 100644
> --- a/fs/xfs/xfs_inode.c
> +++ b/fs/xfs/xfs_inode.c
> @@ -2776,7 +2776,7 @@ xfs_cross_rename(
>  	}
>  	xfs_trans_ichgtime(tp, dp1, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
>  	xfs_trans_log_inode(tp, dp1, XFS_ILOG_CORE);
> -	return xfs_finish_rename(tp);
> +	return 0;
>  
>  out_trans_abort:
>  	xfs_trans_cancel(tp);
> @@ -2834,26 +2834,31 @@ xfs_rename_alloc_whiteout(
>   */
>  int
>  xfs_rename(
> -	struct user_namespace	*mnt_userns,
> -	struct xfs_inode	*src_dp,
> -	struct xfs_name		*src_name,
> -	struct xfs_inode	*src_ip,
> -	struct xfs_inode	*target_dp,
> -	struct xfs_name		*target_name,
> -	struct xfs_inode	*target_ip,
> -	unsigned int		flags)
> +	struct user_namespace		*mnt_userns,
> +	struct xfs_inode		*src_dp,
> +	struct xfs_name			*src_name,
> +	struct xfs_inode		*src_ip,
> +	struct xfs_inode		*target_dp,
> +	struct xfs_name			*target_name,
> +	struct xfs_inode		*target_ip,
> +	unsigned int			flags)
>  {
> -	struct xfs_mount	*mp = src_dp->i_mount;
> -	struct xfs_trans	*tp;
> -	struct xfs_inode	*wip = NULL;		/* whiteout inode */
> -	struct xfs_inode	*inodes[__XFS_SORT_INODES];
> -	int			i;
> -	int			num_inodes = __XFS_SORT_INODES;
> -	bool			new_parent = (src_dp != target_dp);
> -	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
> -	int			spaceres;
> -	bool			retried = false;
> -	int			error, nospace_error = 0;
> +	struct xfs_mount		*mp = src_dp->i_mount;
> +	struct xfs_trans		*tp;
> +	struct xfs_inode		*wip = NULL;		/* whiteout inode */
> +	struct xfs_inode		*inodes[__XFS_SORT_INODES];
> +	int				i;
> +	int				num_inodes = __XFS_SORT_INODES;
> +	bool				new_parent = (src_dp != target_dp);
> +	bool				src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
> +	int				spaceres;
> +	bool				retried = false;
> +	int				error, nospace_error = 0;
> +	xfs_dir2_dataptr_t		new_diroffset;
> +	xfs_dir2_dataptr_t		old_diroffset;
> +	struct xfs_parent_defer		*old_parent_ptr = NULL;
> +	struct xfs_parent_defer		*new_parent_ptr = NULL;
> +	struct xfs_parent_defer		*target_parent_ptr = NULL;
>  
>  	trace_xfs_rename(src_dp, target_dp, src_name, target_name);
>  
> @@ -2877,6 +2882,15 @@ xfs_rename(
>  
>  	xfs_sort_for_rename(src_dp, target_dp, src_ip, target_ip, wip,
>  				inodes, &num_inodes);
> +	if (xfs_has_parent(mp)) {
> +		error = xfs_parent_init(mp, src_ip, NULL, &old_parent_ptr);
> +		if (error)
> +			goto out_release_wip;
> +		error = xfs_parent_init(mp, src_ip, target_name,
> +					&new_parent_ptr);
> +		if (error)
> +			goto out_release_wip;
> +	}
>  
>  retry:
>  	nospace_error = 0;
> @@ -2889,7 +2903,7 @@ xfs_rename(
>  				&tp);
>  	}
>  	if (error)
> -		goto out_release_wip;
> +		goto drop_incompat;
>  
>  	/*
>  	 * Attach the dquots to the inodes
> @@ -2911,14 +2925,14 @@ xfs_rename(
>  	 * we can rely on either trans_commit or trans_cancel to unlock
>  	 * them.
>  	 */
> -	xfs_trans_ijoin(tp, src_dp, XFS_ILOCK_EXCL);
> +	xfs_trans_ijoin(tp, src_dp, 0);
>  	if (new_parent)
> -		xfs_trans_ijoin(tp, target_dp, XFS_ILOCK_EXCL);
> -	xfs_trans_ijoin(tp, src_ip, XFS_ILOCK_EXCL);
> +		xfs_trans_ijoin(tp, target_dp, 0);
> +	xfs_trans_ijoin(tp, src_ip, 0);
>  	if (target_ip)
> -		xfs_trans_ijoin(tp, target_ip, XFS_ILOCK_EXCL);
> +		xfs_trans_ijoin(tp, target_ip, 0);
>  	if (wip)
> -		xfs_trans_ijoin(tp, wip, XFS_ILOCK_EXCL);
> +		xfs_trans_ijoin(tp, wip, 0);
>  
>  	/*
>  	 * If we are using project inheritance, we only allow renames
> @@ -2928,15 +2942,16 @@ xfs_rename(
>  	if (unlikely((target_dp->i_diflags & XFS_DIFLAG_PROJINHERIT) &&
>  		     target_dp->i_projid != src_ip->i_projid)) {
>  		error = -EXDEV;
> -		goto out_trans_cancel;
> +		goto out_unlock;
>  	}
>  
>  	/* RENAME_EXCHANGE is unique from here on. */
> -	if (flags & RENAME_EXCHANGE)
> -		return xfs_cross_rename(tp, src_dp, src_name, src_ip,
> +	if (flags & RENAME_EXCHANGE) {
> +		error = xfs_cross_rename(tp, src_dp, src_name, src_ip,
>  					target_dp, target_name, target_ip,
>  					spaceres);
> -
> +		goto out_pptr;
> +	}
>  	/*
>  	 * Try to reserve quota to handle an expansion of the target directory.
>  	 * We'll allow the rename to continue in reservationless mode if we hit
> @@ -3052,7 +3067,7 @@ xfs_rename(
>  		 * to account for the ".." reference from the new entry.
>  		 */
>  		error = xfs_dir_createname(tp, target_dp, target_name,
> -					   src_ip->i_ino, spaceres, NULL);
> +					   src_ip->i_ino, spaceres, &new_diroffset);
>  		if (error)
>  			goto out_trans_cancel;
>  
> @@ -3073,10 +3088,14 @@ xfs_rename(
>  		 * name at the destination directory, remove it first.
>  		 */
>  		error = xfs_dir_replace(tp, target_dp, target_name,
> -					src_ip->i_ino, spaceres, NULL);
> +					src_ip->i_ino, spaceres, &new_diroffset);
>  		if (error)
>  			goto out_trans_cancel;
>  
> +		if (xfs_has_parent(mp))
> +			error = xfs_parent_init(mp, target_ip, NULL,
> +						&target_parent_ptr);
> +
>  		xfs_trans_ichgtime(tp, target_dp,
>  					XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
>  
> @@ -3146,26 +3165,67 @@ xfs_rename(
>  	 */
>  	if (wip)
>  		error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino,
> -					spaceres, NULL);
> +					spaceres, &old_diroffset);
>  	else
>  		error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
> -					   spaceres, NULL);
> +					   spaceres, &old_diroffset);
>  
>  	if (error)
>  		goto out_trans_cancel;
>  
> +out_pptr:
> +	if (new_parent_ptr) {
> +		error = xfs_parent_defer_add(tp, target_dp, new_parent_ptr,
> +					     new_diroffset);
> +		if (error)
> +			goto out_trans_cancel;
> +	}
> +
> +	if (old_parent_ptr) {
> +		error = xfs_parent_defer_remove(tp, src_dp, old_parent_ptr,
> +						old_diroffset);
> +		if (error)
> +			goto out_trans_cancel;
> +	}
> +
> +	if (target_parent_ptr) {
> +		error = xfs_parent_defer_remove(tp, target_dp,
> +						target_parent_ptr,
> +						new_diroffset);
> +		if (error)
> +			goto out_trans_cancel;
> +	}
> +
>  	xfs_trans_ichgtime(tp, src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
>  	xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE);
>  	if (new_parent)
>  		xfs_trans_log_inode(tp, target_dp, XFS_ILOG_CORE);
>  
>  	error = xfs_finish_rename(tp);
> +
> +out_unlock:
>  	if (wip)
>  		xfs_irele(wip);
> +	if (wip)
> +		xfs_iunlock(wip, XFS_ILOCK_EXCL);
> +	if (target_ip)
> +		xfs_iunlock(target_ip, XFS_ILOCK_EXCL);
> +	xfs_iunlock(src_ip, XFS_ILOCK_EXCL);
> +	if (new_parent)
> +		xfs_iunlock(target_dp, XFS_ILOCK_EXCL);
> +	xfs_iunlock(src_dp, XFS_ILOCK_EXCL);

Sorry to be fussy, but could you separate the ILOCK unlocking changes
(and maybe the variable indentation part too) into a separate prep
patch, please?

Also, who frees the xfs_parent_defer objects?

--D
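On the question of who frees the xfs_parent_defer objects: as quoted, xfs_parent_cancel() is only reached from the error labels. A successful rename therefore appears to return without freeing them or dropping the log assist, and the create and link hunks earlier in the thread look the same. The deferred attr intents keep pointing at the embedded da_args until the transaction is finished. One option is a single NULL-safe release step, placed after commit or cancel, that runs on every path. Below is a userspace sketch of that shape. The counter, struct, and helpers are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

static int live_objects;	/* models outstanding allocations */

struct pptr_obj {
	int	dummy;		/* stands in for rec + da_args */
};

static struct pptr_obj *pptr_alloc(void)
{
	struct pptr_obj *p = calloc(1, sizeof(*p));

	if (p)
		live_objects++;
	return p;
}

/* Analogue of xfs_parent_cancel(), made NULL-safe. */
static void pptr_release(struct pptr_obj *p)
{
	if (!p)
		return;
	live_objects--;
	free(p);
}

/* Shape of a rename with one exit label shared by success and failure. */
static int model_rename(int fail_at, int have_target)
{
	struct pptr_obj *old_pp = NULL, *new_pp = NULL, *target_pp = NULL;
	int error = 0;

	old_pp = pptr_alloc();
	new_pp = pptr_alloc();
	if (!old_pp || !new_pp) {
		error = -1;
		goto out_release;
	}
	if (fail_at == 1) {		/* e.g. transaction allocation fails */
		error = -1;
		goto out_release;
	}
	if (have_target) {
		target_pp = pptr_alloc();
		if (!target_pp) {
			error = -1;
			goto out_release;
		}
	}
	if (fail_at == 2)		/* e.g. a directory update fails */
		error = -1;
	/* ...commit or cancel the transaction here, then fall through... */

out_release:
	pptr_release(target_pp);
	pptr_release(new_pp);
	pptr_release(old_pp);
	return error;
}
```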

> +
>  	return error;
>  
>  out_trans_cancel:
>  	xfs_trans_cancel(tp);
> +drop_incompat:
> +	if (new_parent_ptr)
> +		xfs_parent_cancel(mp, new_parent_ptr);
> +	if (old_parent_ptr)
> +		xfs_parent_cancel(mp, old_parent_ptr);
> +	if (target_parent_ptr)
> +		xfs_parent_cancel(mp, target_parent_ptr);
>  out_release_wip:
>  	if (wip)
>  		xfs_irele(wip);
> -- 
> 2.25.1
> 


* Re: [PATCH RESEND v2 18/18] xfs: Add parent pointer ioctl
  2022-08-04 19:40 ` [PATCH RESEND v2 18/18] xfs: Add parent pointer ioctl Allison Henderson
@ 2022-08-09 19:26   ` Darrick J. Wong
  2022-08-10  3:09     ` Alli
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-09 19:26 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

On Thu, Aug 04, 2022 at 12:40:13PM -0700, Allison Henderson wrote:
> This patch adds a new file ioctl to retrieve the parent pointer of a
> given inode
> 
> Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/Makefile            |   1 +
>  fs/xfs/libxfs/xfs_fs.h     |  57 ++++++++++++++++
>  fs/xfs/libxfs/xfs_parent.c |  10 +++
>  fs/xfs/libxfs/xfs_parent.h |   2 +
>  fs/xfs/xfs_ioctl.c         |  95 +++++++++++++++++++++++++-
>  fs/xfs/xfs_ondisk.h        |   4 ++
>  fs/xfs/xfs_parent_utils.c  | 134 +++++++++++++++++++++++++++++++++++++
>  fs/xfs/xfs_parent_utils.h  |  22 ++++++
>  8 files changed, 323 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> index caeea8d968ba..998658e40ab4 100644
> --- a/fs/xfs/Makefile
> +++ b/fs/xfs/Makefile
> @@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
>  				   xfs_mount.o \
>  				   xfs_mru_cache.o \
>  				   xfs_pwork.o \
> +				   xfs_parent_utils.o \
>  				   xfs_reflink.o \
>  				   xfs_stats.o \
>  				   xfs_super.o \
> diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> index b0b4d7a3aa15..ba6ec82a0272 100644
> --- a/fs/xfs/libxfs/xfs_fs.h
> +++ b/fs/xfs/libxfs/xfs_fs.h
> @@ -574,6 +574,7 @@ typedef struct xfs_fsop_handlereq {
>  #define XFS_IOC_ATTR_SECURE	0x0008	/* use attrs in security namespace */
>  #define XFS_IOC_ATTR_CREATE	0x0010	/* fail if attr already exists */
>  #define XFS_IOC_ATTR_REPLACE	0x0020	/* fail if attr does not exist */
> +#define XFS_IOC_ATTR_PARENT	0x0040  /* use attrs in parent namespace */

This is the userspace API header, so I wonder -- should we allow
XFS_IOC_ATTRLIST_BY_HANDLE and XFS_IOC_ATTRMULTI_BY_HANDLE to access
parent pointers?

I think it's *definitely* incorrect to let ATTR_OP_REMOVE or ATTR_OP_SET
(attrmulti subcommands) mess with parent pointers.

I don't think attrlist or ATTR_OP_GET should be touching them either,
particularly since you're defining a new ioctl to extract *only* the
parent pointers.

If there wasn't XFS_IOC_GETPPOINTER then perhaps it would be ok to allow
reads via ATTRLIST/ATTRMULTI.  But even then, I don't think we want
things like xfsdump to think that it has to preserve those attributes
since xfsrestore will reconstruct the directory tree (and hence the
pptrs) for us.

>  
>  typedef struct xfs_attrlist_cursor {
>  	__u32		opaque[4];
> @@ -752,6 +753,61 @@ struct xfs_scrub_metadata {
>  				 XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED)
>  #define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
>  
> +#define XFS_PPTR_MAXNAMELEN				256
> +
> +/* return parents of the handle, not the open fd */
> +#define XFS_PPTR_IFLAG_HANDLE  (1U << 0)
> +
> +/* target was the root directory */
> +#define XFS_PPTR_OFLAG_ROOT    (1U << 1)
> +
> +/* Cursor is done iterating pptrs */
> +#define XFS_PPTR_OFLAG_DONE    (1U << 2)
> +
> +/* Get an inode parent pointer through ioctl */
> +struct xfs_parent_ptr {
> +	__u64		xpp_ino;			/* Inode */
> +	__u32		xpp_gen;			/* Inode generation */
> +	__u32		xpp_diroffset;			/* Directory offset */
> +	__u32		xpp_namelen;			/* File name length */
> +	__u32		xpp_pad;
> +	__u8		xpp_name[XFS_PPTR_MAXNAMELEN];	/* File name */

Since xpp_name is a fixed-length array that is long enough to ensure
that there's a null at the end of the name, we don't need xpp_namelen.

I wonder if xpp_namelen and xpp_pad should simply turn into a u64 field
that's defined zero for future expansion?
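
A minimal sketch of the layout suggested above, assuming xpp_namelen and xpp_pad are folded into a single 64-bit field that must be zero (the name xpp_reserved and the _sketch suffix are hypothetical). It only checks that the structure keeps the 280-byte size the patch asserts in xfs_ondisk.h:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XFS_PPTR_MAXNAMELEN	256

/*
 * Hypothetical revised layout: namelen + pad become one zeroed u64
 * reserved for future expansion.  A zero-filled 256-byte name array
 * leaves room for a NUL after any name of up to 255 bytes.
 */
struct xfs_parent_ptr_sketch {
	uint64_t	xpp_ino;			/* Inode */
	uint32_t	xpp_gen;			/* Inode generation */
	uint32_t	xpp_diroffset;			/* Directory offset */
	uint64_t	xpp_reserved;			/* Must be zero */
	uint8_t		xpp_name[XFS_PPTR_MAXNAMELEN];	/* File name */
};

/* Same total size as the patch's struct, so the ABI size is unchanged */
static_assert(sizeof(struct xfs_parent_ptr_sketch) == 280, "size changed");
static_assert(offsetof(struct xfs_parent_ptr_sketch, xpp_name) == 24,
	      "name offset changed");
```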

> +};
> +
> +/* Iterate through an inodes parent pointers */
> +struct xfs_pptr_info {
> +	struct xfs_handle		pi_handle;
> +	struct xfs_attrlist_cursor	pi_cursor;
> +	__u32				pi_flags;
> +	__u32				pi_reserved;
> +	__u32				pi_ptrs_size;

Is this the number of elements in pi_parents[]?
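
If pi_ptrs_size is indeed the element count of pi_parents[], the caller's buffer is the 104-byte header plus pi_ptrs_size 280-byte entries (sizes taken from the XFS_CHECK_STRUCT_SIZE lines in this patch). A small sketch of that arithmetic; the helper names are hypothetical, and doing the multiply in size_t on an unsigned count is an assumption rather than what the patch does:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PPTR_INFO_HDR_SIZE	104	/* sizeof(struct xfs_pptr_info) per the patch */
#define PARENT_PTR_SIZE		280	/* sizeof(struct xfs_parent_ptr) per the patch */

/* Header plus nr_ptrs trailing entries, computed in size_t */
static size_t pptr_info_sizeof(uint32_t nr_ptrs)
{
	return PPTR_INFO_HDR_SIZE + (size_t)nr_ptrs * PARENT_PTR_SIZE;
}

/* Largest element count whose whole buffer still fits under limit */
static uint32_t max_pptrs_for(size_t limit)
{
	if (limit < PPTR_INFO_HDR_SIZE)
		return 0;
	return (uint32_t)((limit - PPTR_INFO_HDR_SIZE) / PARENT_PTR_SIZE);
}
```

With a 64KiB limit (an assumed value for illustration, not necessarily XFS_XATTR_LIST_MAX), max_pptrs_for(65536) comes out to 233.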

> +	__u32				pi_ptrs_used;
> +	__u64				pi_reserved2[6];
> +
> +	/*
> +	 * An array of struct xfs_parent_ptr follows the header
> +	 * information. Use XFS_PPINFO_TO_PP() to access the
> +	 * parent pointer array entries.
> +	 */
> +	struct xfs_parent_ptr		pi_parents[];
> +};
> +
> +static inline size_t
> +xfs_pptr_info_sizeof(int nr_ptrs)
> +{
> +	return sizeof(struct xfs_pptr_info) +
> +	       (nr_ptrs * sizeof(struct xfs_parent_ptr));
> +}
> +
> +static inline struct xfs_parent_ptr*
> +xfs_ppinfo_to_pp(
> +	struct xfs_pptr_info	*info,
> +	int			idx)
> +{
> +

Nit: extra blank line.

> +	return &info->pi_parents[idx];
> +}
> +
>  /*
>   * ioctl limits
>   */
> @@ -797,6 +853,7 @@ struct xfs_scrub_metadata {
>  /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
>  #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
>  #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
> +#define XFS_IOC_GETPPOINTER	_IOR ('X', 62, struct xfs_parent_ptr)

I wonder if this name should more strongly emphasize that it's for reading
the parents of a file?

#define XFS_IOC_GETPARENTS	_IOWR(...)

Also, the ioctl reads and writes its parameter, so this is _IOWR, not
_IOR.
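
The direction bits are encoded in the command number itself, so the choice is visible to userspace. A quick check using the standard Linux <linux/ioctl.h> macros, with a hypothetical 104-byte stand-in for the ioctl argument:

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <stdint.h>

/* Hypothetical stand-in for the argument header (104 bytes) */
struct pptr_info_demo {
	uint64_t	pad[13];
};

/*
 * _IOC_WRITE: userspace passes data in (the kernel copies from user).
 * _IOC_READ:  the kernel passes data back (the kernel copies to user).
 * The argument size is encoded in the command number as well.
 */
#define DEMO_IOC_GETPARENTS_R	_IOR('X', 62, struct pptr_info_demo)
#define DEMO_IOC_GETPARENTS_RW	_IOWR('X', 62, struct pptr_info_demo)
```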

BTW, is there a sample manpage somewhere?

>  
>  /*
>   * ioctl commands that replace IRIX syssgi()'s
> diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
> index 03f03f731d02..d9c922a78617 100644
> --- a/fs/xfs/libxfs/xfs_parent.c
> +++ b/fs/xfs/libxfs/xfs_parent.c
> @@ -26,6 +26,16 @@
>  #include "xfs_xattr.h"
>  #include "xfs_parent.h"
>  
> +/* Initializes a xfs_parent_ptr from an xfs_parent_name_rec */
> +void
> +xfs_init_parent_ptr(struct xfs_parent_ptr	*xpp,
> +		    struct xfs_parent_name_rec	*rec)

The second parameter ought to be const struct xfs_parent_name_rec *rec
to make it unambiguous to readers which is the source and which is the
destination argument.
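
A userspace sketch of the const-qualified signature; the demo structs are simplified stand-ins (assumed layouts, not the real xfs_parent_name_rec), and be64toh()/be32toh() from glibc's <endian.h> play the role of the kernel's be64_to_cpu()/be32_to_cpu():

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <endian.h>
#include <stdint.h>

/* Simplified stand-in for the big-endian on-disk record */
struct parent_name_rec_demo {
	uint64_t	p_ino;		/* big-endian */
	uint32_t	p_gen;		/* big-endian */
	uint32_t	p_diroffset;	/* big-endian */
};

/* Simplified stand-in for the CPU-endian ioctl structure */
struct parent_ptr_demo {
	uint64_t	xpp_ino;
	uint32_t	xpp_gen;
	uint32_t	xpp_diroffset;
};

/* const on rec marks it as the source; xpp is the destination */
static void init_parent_ptr_demo(struct parent_ptr_demo *xpp,
				 const struct parent_name_rec_demo *rec)
{
	xpp->xpp_ino = be64toh(rec->p_ino);
	xpp->xpp_gen = be32toh(rec->p_gen);
	xpp->xpp_diroffset = be32toh(rec->p_diroffset);
}
```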

> +{
> +	xpp->xpp_ino = be64_to_cpu(rec->p_ino);
> +	xpp->xpp_gen = be32_to_cpu(rec->p_gen);
> +	xpp->xpp_diroffset = be32_to_cpu(rec->p_diroffset);
> +}
> +
>  /*
>   * Parent pointer attribute handling.
>   *
> diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
> index 67948f4b3834..53161b79d1e2 100644
> --- a/fs/xfs/libxfs/xfs_parent.h
> +++ b/fs/xfs/libxfs/xfs_parent.h
> @@ -23,6 +23,8 @@ void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
>  			      uint32_t p_diroffset);
>  void xfs_init_parent_name_irec(struct xfs_parent_name_irec *irec,
>  			       struct xfs_parent_name_rec *rec);
> +void xfs_init_parent_ptr(struct xfs_parent_ptr *xpp,
> +			 struct xfs_parent_name_rec *rec);
>  int xfs_parent_init(xfs_mount_t *mp, xfs_inode_t *ip,
>  		    struct xfs_name *target_name,
>  		    struct xfs_parent_defer **parentp);
> diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> index 5b600d3f7981..8a9530588ef4 100644
> --- a/fs/xfs/xfs_ioctl.c
> +++ b/fs/xfs/xfs_ioctl.c
> @@ -37,6 +37,7 @@
>  #include "xfs_health.h"
>  #include "xfs_reflink.h"
>  #include "xfs_ioctl.h"
> +#include "xfs_parent_utils.h"
>  #include "xfs_xattr.h"
>  
>  #include <linux/mount.h>
> @@ -355,6 +356,8 @@ xfs_attr_filter(
>  		return XFS_ATTR_ROOT;
>  	if (ioc_flags & XFS_IOC_ATTR_SECURE)
>  		return XFS_ATTR_SECURE;
> +	if (ioc_flags & XFS_IOC_ATTR_PARENT)
> +		return XFS_ATTR_PARENT;
>  	return 0;
>  }
>  
> @@ -422,7 +425,8 @@ xfs_ioc_attr_list(
>  	/*
>  	 * Reject flags, only allow namespaces.
>  	 */
> -	if (flags & ~(XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE))
> +	if (flags & ~(XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE |
> +		      XFS_IOC_ATTR_PARENT))
>  		return -EINVAL;

I think xfs_ioc_attrmulti_one needs filtering for XFS_IOC_ATTR_PARENT,
if we're still going to allow attrlist/attrmulti to return parent
pointers.

>  	if (flags == (XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE))
>  		return -EINVAL;
> @@ -1679,6 +1683,92 @@ xfs_ioc_scrub_metadata(
>  	return 0;
>  }
>  
> +/*
> + * IOCTL routine to get the parent pointers of an inode and return it to user
> + * space.  Caller must pass a buffer space containing a struct xfs_pptr_info,
> + * followed by a region large enough to contain an array of struct
> + * xfs_parent_ptr of a size specified in pi_ptrs_size.  If the inode contains
> + * more parent pointers than can fit in the buffer space, caller may re-call
> + * the function using the returned pi_cursor to resume iteration.  The
> + * number of xfs_parent_ptr returned will be stored in pi_ptrs_used.
> + *
> + * Returns 0 on success or non-zero on failure
> + */
> +STATIC int
> +xfs_ioc_get_parent_pointer(
> +	struct file			*filp,
> +	void				__user *arg)
> +{
> +	struct xfs_pptr_info		*ppi = NULL;
> +	int				error = 0;
> +	struct xfs_inode		*ip = XFS_I(file_inode(filp));
> +	struct xfs_mount		*mp = ip->i_mount;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	/* Allocate an xfs_pptr_info to put the user data */
> +	ppi = kmem_alloc(sizeof(struct xfs_pptr_info), 0);

New code should call kmalloc instead of the old kmem_alloc wrapper.

> +	if (!ppi)
> +		return -ENOMEM;
> +
> +	/* Copy the data from the user */
> +	error = copy_from_user(ppi, arg, sizeof(struct xfs_pptr_info));

Note: copy_from_user returns the number of bytes *not* copied.  If you
receive a nonzero return value, error usually gets set to EFAULT.
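
The usual idiom is to turn any nonzero return into -EFAULT. A runnable userspace sketch of that pattern; mock_copy_from_user() and fetch_header() are hypothetical, and the mock mimics only the "bytes not copied" return convention:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_from_user(): returns the number of
 * bytes NOT copied (0 on success).  A fault is simulated by only
 * "readable" bytes of the source being accessible.
 */
static unsigned long mock_copy_from_user(void *to, const void *from,
					 unsigned long n,
					 unsigned long readable)
{
	unsigned long ok = n < readable ? n : readable;

	memcpy(to, from, ok);
	return n - ok;
}

/* Map any shortfall to -EFAULT instead of returning the byte count */
static int fetch_header(void *kbuf, const void *ubuf, unsigned long n,
			unsigned long readable)
{
	if (mock_copy_from_user(kbuf, ubuf, n, readable))
		return -EFAULT;
	return 0;
}
```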

> +	if (error)
> +		goto out;
> +
> +	/* Check size of buffer requested by user */
> +	if (xfs_pptr_info_sizeof(ppi->pi_ptrs_size) > XFS_XATTR_LIST_MAX) {
> +		error = -ENOMEM;
> +		goto out;
> +	}
> +
> +	if (ppi->pi_flags != 0 && ppi->pi_flags != XFS_PPTR_IFLAG_HANDLE) {

	if (ppi->pi_flags & ~XFS_PPTR_IFLAG_HANDLE) ?

(If we really want to be pedantic, this really ought to be:

#define XFS_PPTR_IFLAG_ALL	(XFS_PPTR_IFLAG_HANDLE)

	if (ppi->pi_flags & ~XFS_PPTR_IFLAG_ALL)
		return -EINVAL;

Or you could be more flexible, since the kernel could just set the
OFLAGs appropriately and not care about their value on input:

#define XFS_PPTR_FLAG_ALL	(XFS_PPTR_IFLAG_HANDLE | XFS_PPTR_OFLAG...)

	if (ppi->pi_flags & ~XFS_PPTR_FLAG_ALL)
		return -EINVAL;

	ppi->pi_flags &= ~(XFS_PPTR_OFLAG_ROOT | XFS_PPTR_OFLAG_DONE);
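
A self-contained sketch of the second, more flexible variant; XFS_PPTR_FLAG_ALL and check_pptr_flags() are hypothetical names, and the flag values are copied from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define XFS_PPTR_IFLAG_HANDLE	(1U << 0)
#define XFS_PPTR_OFLAG_ROOT	(1U << 1)
#define XFS_PPTR_OFLAG_DONE	(1U << 2)

/* Hypothetical mask of every flag the interface defines */
#define XFS_PPTR_FLAG_ALL	(XFS_PPTR_IFLAG_HANDLE | \
				 XFS_PPTR_OFLAG_ROOT | \
				 XFS_PPTR_OFLAG_DONE)

/* Reject unknown bits, then clear the output flags for the kernel to set */
static int check_pptr_flags(uint32_t *flags)
{
	if (*flags & ~XFS_PPTR_FLAG_ALL)
		return -EINVAL;
	*flags &= ~(XFS_PPTR_OFLAG_ROOT | XFS_PPTR_OFLAG_DONE);
	return 0;
}
```

Later tests can then use "if (ppi->pi_flags & XFS_PPTR_IFLAG_HANDLE)" rather than comparing for equality.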

> +		error = -EINVAL;
> +		goto out;
> +	}
> +
> +	/*
> +	 * Now that we know how big the trailing buffer is, expand
> +	 * our kernel xfs_pptr_info to be the same size
> +	 */
> +	ppi = krealloc(ppi, xfs_pptr_info_sizeof(ppi->pi_ptrs_size),
> +		       GFP_NOFS | __GFP_NOFAIL);
> +	if (!ppi)
> +		return -ENOMEM;

Why NOFS and NOFAIL?  We don't have any writeback resources locked
(transactions and ILOCKs) so we can hit ourselves up for memory.

> +
> +	if (ppi->pi_flags == XFS_PPTR_IFLAG_HANDLE) {

	if (ppi->pi_flags & XFS_PPTR_IFLAG_HANDLE) {

> +		error = xfs_iget(mp, NULL, ppi->pi_handle.ha_fid.fid_ino,
> +				0, 0, &ip);
> +		if (error)
> +			goto out;
> +
> +		if (VFS_I(ip)->i_generation != ppi->pi_handle.ha_fid.fid_gen) {
> +			error = -EINVAL;
> +			goto out;
> +		}
> +	}
> +
> +	if (ip->i_ino == mp->m_sb.sb_rootino)
> +		ppi->pi_flags |= XFS_PPTR_OFLAG_ROOT;
> +
> +	/* Get the parent pointers */
> +	error = xfs_attr_get_parent_pointer(ip, ppi);
> +
> +	if (error)
> +		goto out;
> +
> +	/* Copy the parent pointers back to the user */
> +	error = copy_to_user(arg, ppi,
> +			xfs_pptr_info_sizeof(ppi->pi_ptrs_size));

Same note as the one I made for copy_from_user.

> +	if (error)
> +		goto out;
> +
> +out:
> +	kmem_free(ppi);
> +	return error;
> +}
> +
>  int
>  xfs_ioc_swapext(
>  	xfs_swapext_t	*sxp)
> @@ -1968,7 +2058,8 @@ xfs_file_ioctl(
>  
>  	case XFS_IOC_FSGETXATTRA:
>  		return xfs_ioc_fsgetxattra(ip, arg);
> -
> +	case XFS_IOC_GETPPOINTER:
> +		return xfs_ioc_get_parent_pointer(filp, arg);
>  	case XFS_IOC_GETBMAP:
>  	case XFS_IOC_GETBMAPA:
>  	case XFS_IOC_GETBMAPX:
> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
> index 758702b9495f..765eb514a917 100644
> --- a/fs/xfs/xfs_ondisk.h
> +++ b/fs/xfs/xfs_ondisk.h
> @@ -135,6 +135,10 @@ xfs_check_ondisk_structs(void)
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
>  	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
>  
> +	/* parent pointer ioctls */
> +	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_ptr,            280);
> +	XFS_CHECK_STRUCT_SIZE(struct xfs_pptr_info,             104);
> +
>  	/*
>  	 * The v5 superblock format extended several v4 header structures with
>  	 * additional data. While new fields are only accessible on v5
> diff --git a/fs/xfs/xfs_parent_utils.c b/fs/xfs/xfs_parent_utils.c
> new file mode 100644
> index 000000000000..3351ce173075
> --- /dev/null
> +++ b/fs/xfs/xfs_parent_utils.c
> @@ -0,0 +1,134 @@
> +/*
> + * Copyright (c) 2015 Red Hat, Inc.
> + * All rights reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation
> + */

Please condense this boilerplate down to a SPDX tag and a copyright
statement.

> +#include "xfs.h"
> +#include "xfs_fs.h"
> +#include "xfs_format.h"
> +#include "xfs_log_format.h"
> +#include "xfs_shared.h"
> +#include "xfs_trans_resv.h"
> +#include "xfs_mount.h"
> +#include "xfs_bmap_btree.h"
> +#include "xfs_inode.h"
> +#include "xfs_error.h"
> +#include "xfs_trace.h"
> +#include "xfs_trans.h"
> +#include "xfs_da_format.h"
> +#include "xfs_da_btree.h"
> +#include "xfs_attr.h"
> +#include "xfs_ioctl.h"
> +#include "xfs_parent.h"
> +#include "xfs_da_btree.h"
> +
> +/*
> + * Get the parent pointers for a given inode
> + *
> + * Returns 0 on success and non zero on error
> + */
> +int
> +xfs_attr_get_parent_pointer(struct xfs_inode		*ip,
> +			    struct xfs_pptr_info	*ppi)
> +
> +{
> +
> +	struct xfs_attrlist		*alist;

int
xfs_attr_get_parent_pointer(
	struct xfs_inode		*ip,
	struct xfs_pptr_info		*ppi)
{
	struct xfs_attrlist		*alist;


> +	struct xfs_attrlist_ent		*aent;
> +	struct xfs_parent_ptr		*xpp;
> +	struct xfs_parent_name_rec	*xpnr;
> +	char				*namebuf;
> +	unsigned int			namebuf_size;
> +	int				name_len;
> +	int				error = 0;
> +	unsigned int			ioc_flags = XFS_IOC_ATTR_PARENT;
> +	unsigned int			flags = XFS_ATTR_PARENT;
> +	int				i;
> +	struct xfs_attr_list_context	context;
> +
> +	/* Allocate a buffer to store the attribute names */
> +	namebuf_size = sizeof(struct xfs_attrlist) +
> +		       (ppi->pi_ptrs_size) * sizeof(struct xfs_attrlist_ent);
> +	namebuf = kvzalloc(namebuf_size, GFP_KERNEL);
> +	if (!namebuf)
> +		return -ENOMEM;

Do we need the buffer to be zeroed if xfs_attr_list is just going to set
its contents?

> +
> +	memset(&context, 0, sizeof(struct xfs_attr_list_context));
> +	error = xfs_ioc_attr_list_context_init(ip, namebuf, namebuf_size,
> +			ioc_flags, &context);

Aha, so the internal implementation has access to xfs_attr_list_context
before it calls into the attr list code.  Ok, in that case, xfs_fs.h
doesn't need the XFS_IOC_ATTR_PARENT flag, and you can set
context.attr_filter = XFS_ATTR_PARENT here.  Then we don't have to worry
about the existing xattr bulk ioctls returning parent pointers.

> +
> +	/* Copy the cursor provided by caller */
> +	memcpy(&context.cursor, &ppi->pi_cursor,
> +	       sizeof(struct xfs_attrlist_cursor));
> +
> +	if (error)
> +		goto out_kfree;

Why does the error check come after copying the cursor into the onstack
variable?

> +
> +	xfs_ilock(ip, XFS_ILOCK_EXCL);

xfs_ilock_attr_map_shared() ?

> +
> +	error = xfs_attr_list_ilocked(&context);
> +	if (error)
> +		goto out_kfree;
> +
> +	alist = (struct xfs_attrlist *)namebuf;
> +	for (i = 0; i < alist->al_count; i++) {
> +		struct xfs_da_args args = {
> +			.geo = ip->i_mount->m_attr_geo,
> +			.whichfork = XFS_ATTR_FORK,
> +			.dp = ip,
> +			.namelen = sizeof(struct xfs_parent_name_rec),
> +			.attr_filter = flags,
> +			.op_flags = XFS_DA_OP_OKNOENT,
> +		};
> +
> +		xpp = xfs_ppinfo_to_pp(ppi, i);
> +		memset(xpp, 0, sizeof(struct xfs_parent_ptr));
> +		aent = (struct xfs_attrlist_ent *)
> +			&namebuf[alist->al_offset[i]];
> +		xpnr = (struct xfs_parent_name_rec *)(aent->a_name);
> +
> +		if (aent->a_valuelen > XFS_PPTR_MAXNAMELEN) {
> +			error = -ERANGE;
> +			goto out_kfree;

If a parent pointer has a name longer than MAXNAMELEN then isn't that a
corruption?  And in that case, -EFSCORRUPTED would be more appropriate
here, right?

> +		}
> +		name_len = aent->a_valuelen;
> +
> +		args.name = (char *)xpnr;
> +		args.hashval = xfs_da_hashname(args.name, args.namelen),
> +		args.value = (unsigned char *)(xpp->xpp_name);
> +		args.valuelen = name_len;
> +
> +		error = xfs_attr_get_ilocked(&args);

If error is ENOENT (or ENOATTR or whatever the return value is when the
attr doesn't exist) then shouldn't that be treated as a corruption too?
We still hold the ILOCK from earlier.  I don't think OKNOENT is correct
either.

> +		error = (error == -EEXIST ? 0 : error);
> +		if (error)
> +			goto out_kfree;
> +
> +		xpp->xpp_namelen = name_len;
> +		xfs_init_parent_ptr(xpp, xpnr);

Also, should we validate xpnr before copying it out to userspace?
If, say, the inode number is bogus, that should generate an
EFSCORRUPTED.
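
A rough sketch of what such checks might look like before an entry is copied out. validate_pptr() and map_pptr_lookup_error() are hypothetical; the caller-supplied inode range is a placeholder for a real geometry check such as xfs_verify_dir_ino(), and EFSCORRUPTED and ENOATTR are mapped to EUCLEAN and ENODATA as XFS does:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EFSCORRUPTED		EUCLEAN
#define ENOATTR			ENODATA
#define XFS_PPTR_MAXNAMELEN	256

/*
 * XFS names are at most MAXNAMELEN - 1 bytes, so anything longer (or
 * empty) means the on-disk parent pointer is corrupt, not -ERANGE.
 * The [min_ino, max_ino] range is an assumed placeholder check.
 */
static int validate_pptr(uint64_t ino, unsigned int valuelen,
			 uint64_t min_ino, uint64_t max_ino)
{
	if (valuelen == 0 || valuelen >= XFS_PPTR_MAXNAMELEN)
		return -EFSCORRUPTED;
	if (ino < min_ino || ino > max_ino)
		return -EFSCORRUPTED;
	return 0;
}

/* With the ILOCK held, a listed name that then fails lookup is corruption */
static int map_pptr_lookup_error(int error)
{
	if (error == -ENOATTR)
		return -EFSCORRUPTED;
	return error;
}
```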

> +	}
> +	ppi->pi_ptrs_used = alist->al_count;
> +	if (!alist->al_more)
> +		ppi->pi_flags |= XFS_PPTR_OFLAG_DONE;
> +
> +	/* Update the caller with the current cursor position */
> +	memcpy(&ppi->pi_cursor, &context.cursor,
> +		sizeof(struct xfs_attrlist_cursor));

Glad you remembered to do this; attrmulti forgot to do this for a long
time. :)

> +
> +out_kfree:
> +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> +	kmem_free(namebuf);

kvfree, since you got namebuf from kvzalloc.

> +
> +	return error;
> +}
> +
> diff --git a/fs/xfs/xfs_parent_utils.h b/fs/xfs/xfs_parent_utils.h
> new file mode 100644
> index 000000000000..0e952b2ebd4a
> --- /dev/null
> +++ b/fs/xfs/xfs_parent_utils.h
> @@ -0,0 +1,22 @@
> +/*
> + * Copyright (c) 2017 Oracle, Inc.

2022?

> + * All Rights Reserved.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it would be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write the Free Software Foundation Inc.

This also needs to be condensed to a SPDX header and a copyright
statement.

> + */
> +#ifndef	__XFS_PARENT_UTILS_H__
> +#define	__XFS_PARENT_UTILS_H__
> +
> +int xfs_attr_get_parent_pointer(struct xfs_inode *ip,
> +				struct xfs_pptr_info *ppi);
> +#endif	/* __XFS_PARENT_UTILS_H__ */
> -- 
> 2.25.1
> 


* [RFC PATCH 19/18] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (17 preceding siblings ...)
  2022-08-04 19:40 ` [PATCH RESEND v2 18/18] xfs: Add parent pointer ioctl Allison Henderson
@ 2022-08-09 22:55 ` Darrick J. Wong
  2022-08-09 22:56 ` [RFC PATCH 20/18] xfs: drop compatibility minimum log size computations for reflink Darrick J. Wong
  19 siblings, 0 replies; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-09 22:55 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Dave and I were discussing some recent test regressions as a result of
me turning on nrext64=1 on realtime filesystems, when we noticed that
the minimum log size of a 32M filesystem jumped from 954 blocks to 4287
blocks.

Digging through xfs_log_calc_max_attrsetm_res, Dave noticed that @size
contains the maximum estimated amount of space needed for a local format
xattr, in bytes, but we feed this quantity to XFS_NEXTENTADD_SPACE_RES,
which requires units of blocks.  This has resulted in an overestimation
of the minimum log size over the years.
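
To see why the units matter: XFS_B_TO_FSB rounds a byte count up to whole filesystem blocks, so a few thousand bytes of xattr amounts to one or two blocks, not thousands. A minimal sketch of that rounding, assuming a block size of (1 << blocklog) bytes; bytes_to_fsb() is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to whole blocks of (1 << blocklog) bytes */
static uint64_t bytes_to_fsb(uint64_t bytes, unsigned int blocklog)
{
	return (bytes + (1ULL << blocklog) - 1) >> blocklog;
}
```

For a 4096-byte block (blocklog 12), bytes_to_fsb(4097, 12) is 2, whereas passing 4097 straight into a macro that expects a block count treats it as 4097 blocks.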

We should nominally correct this, but there's a backwards compatibility
problem -- if we enable it now, the minimum log size will decrease.  If
a corrected mkfs formats a filesystem with this new smaller log size, a
user will encounter mount failures on an uncorrected kernel due to the
larger minimum log size computations there.

However, the large extent counters feature is still EXPERIMENTAL, so we
can gate the correction on that feature (or any features that get added
after that) being enabled.  Any filesystem with nrext64 or any of the
as-yet-undefined feature bits turned on will be rejected by old
uncorrected kernels, so this should be safe even in the upgrade case.
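
The gating relies on older kernels rejecting filesystems with feature bits they don't recognize, as described above, so the test is "any bit set outside the set known before the fix". A minimal sketch of that mask logic with made-up bit values (the real ones live in xfs_format.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only */
#define FEAT_A		(1U << 0)
#define FEAT_B		(1U << 1)
#define FEAT_NEW	(1U << 5)

/* Features that existed before the corrected computation */
#define FEAT_LEGACY	(FEAT_A | FEAT_B)

/* True if any feature outside the legacy set is enabled */
static bool has_newer_feature(uint32_t features)
{
	return (features & ~FEAT_LEGACY) != 0;
}
```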

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_log_rlimit.c |   43 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_log_rlimit.c b/fs/xfs/libxfs/xfs_log_rlimit.c
index 9975b93a7412..cc4837b948b1 100644
--- a/fs/xfs/libxfs/xfs_log_rlimit.c
+++ b/fs/xfs/libxfs/xfs_log_rlimit.c
@@ -16,6 +16,39 @@
 #include "xfs_bmap_btree.h"
 #include "xfs_trace.h"
 
+/*
+ * Decide if the filesystem has the parent pointer feature or any feature
+ * added after that.
+ */
+static inline bool
+xfs_has_parent_or_newer_feature(
+	struct xfs_mount	*mp)
+{
+	if (!xfs_sb_is_v5(&mp->m_sb))
+		return false;
+
+	if (xfs_sb_has_compat_feature(&mp->m_sb, ~0))
+		return true;
+
+	if (xfs_sb_has_ro_compat_feature(&mp->m_sb,
+				~(XFS_SB_FEAT_RO_COMPAT_FINOBT |
+				  XFS_SB_FEAT_RO_COMPAT_RMAPBT |
+				  XFS_SB_FEAT_RO_COMPAT_REFLINK |
+				  XFS_SB_FEAT_RO_COMPAT_INOBTCNT)))
+		return true;
+
+	if (xfs_sb_has_incompat_feature(&mp->m_sb,
+				~(XFS_SB_FEAT_INCOMPAT_FTYPE |
+				  XFS_SB_FEAT_INCOMPAT_SPINODES |
+				  XFS_SB_FEAT_INCOMPAT_META_UUID |
+				  XFS_SB_FEAT_INCOMPAT_BIGTIME |
+				  XFS_SB_FEAT_INCOMPAT_NEEDSREPAIR |
+				  XFS_SB_FEAT_INCOMPAT_NREXT64)))
+		return true;
+
+	return false;
+}
+
 /*
  * Calculate the maximum length in bytes that would be required for a local
  * attribute value as large attributes out of line are not logged.
@@ -31,6 +64,16 @@ xfs_log_calc_max_attrsetm_res(
 	       MAXNAMELEN - 1;
 	nblks = XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK);
 	nblks += XFS_B_TO_FSB(mp, size);
+
+	/*
+	 * Starting with the parent pointer feature, every new fs feature
+	 * corrects a unit conversion error in the xattr transaction
+	 * reservation code that resulted in oversized minimum log size
+	 * computations.
+	 */
+	if (xfs_has_parent_or_newer_feature(mp))
+		size = XFS_B_TO_FSB(mp, size);
+
 	nblks += XFS_NEXTENTADD_SPACE_RES(mp, size, XFS_ATTR_FORK);
 
 	return  M_RES(mp)->tr_attrsetm.tr_logres +


* [RFC PATCH 20/18] xfs: drop compatibility minimum log size computations for reflink
  2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
                   ` (18 preceding siblings ...)
  2022-08-09 22:55 ` [RFC PATCH 19/18] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
@ 2022-08-09 22:56 ` Darrick J. Wong
  19 siblings, 0 replies; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-09 22:56 UTC (permalink / raw)
  To: Allison Henderson; +Cc: linux-xfs

From: Darrick J. Wong <djwong@kernel.org>

Having established that we can reduce the minimum log size computation
for filesystems with parent pointers or any newer feature, we should
also drop the compat minlogsize code that we added when we reduced the
transaction reservation size for rmap and reflink.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_log_rlimit.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_log_rlimit.c b/fs/xfs/libxfs/xfs_log_rlimit.c
index cc4837b948b1..c6b098d12f65 100644
--- a/fs/xfs/libxfs/xfs_log_rlimit.c
+++ b/fs/xfs/libxfs/xfs_log_rlimit.c
@@ -91,6 +91,16 @@ xfs_log_calc_trans_resv_for_minlogblocks(
 {
 	unsigned int		rmap_maxlevels = mp->m_rmap_maxlevels;
 
+	/*
+	 * Starting with the parent pointer feature, every new fs feature
+	 * drops the oversized minimum log size computation introduced by the
+	 * original reflink code.
+	 */
+	if (xfs_has_parent_or_newer_feature(mp)) {
+		xfs_trans_resv_calc(mp, resv);
+		return;
+	}
+
 	/*
 	 * In the early days of rmap+reflink, we always set the rmap maxlevels
 	 * to 9 even if the AG was small enough that it would never grow to


* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-09 16:52   ` Darrick J. Wong
@ 2022-08-10  1:58     ` Dave Chinner
  2022-08-10  5:01       ` Alli
  2022-08-10  3:08     ` Alli
  1 sibling, 1 reply; 58+ messages in thread
From: Dave Chinner @ 2022-08-10  1:58 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Allison Henderson, linux-xfs

On Tue, Aug 09, 2022 at 09:52:55AM -0700, Darrick J. Wong wrote:
> On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison Henderson wrote:
> > Recent parent pointer testing has exposed a bug in the underlying
> > attr replay.  A multi transaction replay currently performs a
> > single step of the replay, then defers the rest if there is more
> > to do.

Yup.

> > This causes race conditions with other attr replays that
> > might be recovered before the remaining deferred work has had a
> > chance to finish.

What other attr replays are we racing against?  There can only be
one incomplete attr item intent/done chain per inode present in log
recovery, right?

> > This can lead to interleaved set and remove
> > operations that may clobber the attribute fork.  Fix this by
> > deferring all work for any attribute operation.

Which means this should be an impossible situation.

That is, if we crash before the final attrd DONE intent is written
to the log, it means that new attr intents for modifications made
*after* the current attr modification was completed will not be
present in the log. We have strict ordering of committed operations
in the journal, hence an operation on an inode that has an incomplete
intent *must* be the last operation and the *only* incomplete intent
that is found in the journal for that inode.

Hence from an operational ordering perspective, this explanation
for the issue being seen doesn't make any sense to me.  If there are
multiple incomplete attri intents then we've either got a runtime
journalling problem (a white-out issue? failing to relog the inode
in each new intent?) or a log recovery problem (failing to match
intent-done pairs correctly?), not a recovery deferral issue.

Hence I think we're still looking for the root cause of this
problem...

> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/xfs_attr_item.c | 35 ++++++++---------------------------
> >  1 file changed, 8 insertions(+), 27 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> > index 5077a7ad5646..c13d724a3e13 100644
> > --- a/fs/xfs/xfs_attr_item.c
> > +++ b/fs/xfs/xfs_attr_item.c
> > @@ -635,52 +635,33 @@ xfs_attri_item_recover(
> >  		break;
> >  	case XFS_ATTRI_OP_FLAGS_REMOVE:
> >  		if (!xfs_inode_hasattr(args->dp))
> > -			goto out;
> > +			return 0;
> >  		attr->xattri_dela_state = xfs_attr_init_remove_state(args);
> >  		break;
> >  	default:
> >  		ASSERT(0);
> > -		error = -EFSCORRUPTED;
> > -		goto out;
> > +		return -EFSCORRUPTED;
> >  	}
> >  
> >  	xfs_init_attr_trans(args, &tres, &total);
> >  	error = xfs_trans_alloc(mp, &tres, total, 0, XFS_TRANS_RESERVE, &tp);
> >  	if (error)
> > -		goto out;
> > +		return error;
> >  
> >  	args->trans = tp;
> >  	done_item = xfs_trans_get_attrd(tp, attrip);
> > +	args->trans->t_flags |= XFS_TRANS_HAS_INTENT_DONE;
> > +	set_bit(XFS_LI_DIRTY, &done_item->attrd_item.li_flags);
> >  
> >  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> >  	xfs_trans_ijoin(tp, ip, 0);
> >  
> > -	error = xfs_xattri_finish_update(attr, done_item);
> > -	if (error == -EAGAIN) {
> > -		/*
> > -		 * There's more work to do, so add the intent item to this
> > -		 * transaction so that we can continue it later.
> > -		 */
> > -		xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
> > -		error = xfs_defer_ops_capture_and_commit(tp, capture_list);
> > -		if (error)
> > -			goto out_unlock;
> > -
> > -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > -		xfs_irele(ip);
> > -		return 0;
> > -	}
> > -	if (error) {
> > -		xfs_trans_cancel(tp);
> > -		goto out_unlock;
> > -	}
> > -
> > +	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
> 
> This seems a little convoluted to me.  Maybe?  Maybe not?
> 
> 1. Log recovery recreates an incore xfs_attri_log_item from what it
> finds in the log.
> 
> 2. This function then logs an xattrd for the recovered xattri item.
> 
> 3. Then it creates a new xfs_attr_intent to complete the operation.
> 
> 4. Finally, it calls xfs_defer_ops_capture_and_commit, which logs a new
> xattri for the intent created in step 3 and also commits the xattrd for
> the first xattri.
> 
> IOWs, the only difference between before and after is that we're not
> advancing one more step through the state machine as part of log
> recovery.  From the perspective of the log, the recovery function merely
> replaces the recovered xattri log item with a new one.
> 
> Why can't we just attach the recovered xattri to the xfs_defer_pending
> that is created to point to the xfs_attr_intent that's created in step
> 3, and skip the xattrd?

Remember that attribute intents are different to all other intent
types that we have. The existing extent based intents define a
single independent operation that needs to be performed, and each
step of the intent chain is completely independent of the previous
step in the chain.  e.g. removing the extent from the rmap btree is
completely independent of removing it from the inode bmap btree -
all that matters is that the removal from the bmbt happens first.
The rmapbt removal can happen at any time after that, and is
completely independent of any other bmbt or rmapbt operation.
Similarly, the EFI can be processed independently of all bmapbt and
rmapbt modifications, it just has to happen after those
modifications are done.

Hence if we crash during recovery, we can just restart from
where-ever we got to in the middle of the intent chains and not have
to care at all.  IOWs, eventual consistency works with these chains
because there is no dependencies between each step of the intent
chain and each step is completely independent of the other steps.

Attribute intent chains are completely different. They link steps in
a state machine together in a non-trivial, highly dependent chain.
We can't just restart the chain in the middle like we can for the
BUI->RUI->CUI->EFI chain because the on-disk attribute is in an
unknown state and recovering that exact state is .... complex.

Hence the first step of recovery is to return the attribute we
are trying to modify back to a known state. That means we have to
perform a removal of any existing attribute under that name first.
Hence this first step should be replacing the existing attr intent
with the intent that defines the recovery operation we are going to
perform.

That means we need to translate a set into a replace so that cleanup
is run first, and a replace needs to clean up the attr under that name
regardless of whether it has the incomplete bit set on it or not.
Remove is the only operation that runs the same as at runtime, as
cleanup for remove is just repeating the remove operation from
scratch.
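A minimal sketch of the recovery-time operation mapping described above, written as a pure C function. The `enum attr_op` values and the `attr_recovery_op()` name are hypothetical illustrations; they are not the XFS_ATTRI_OP_FLAGS_* encoding or any function in the patchset.

```c
#include <assert.h>

/* Hypothetical operation codes; not the XFS_ATTRI_OP_FLAGS_* encoding. */
enum attr_op {
	ATTR_OP_SET,
	ATTR_OP_REPLACE,
	ATTR_OP_REMOVE,
};

/*
 * Choose the operation that recovery should log in place of the intent it
 * found, so that the first recovery step returns the attribute to a known
 * state before any new value is written.
 */
static enum attr_op attr_recovery_op(enum attr_op logged)
{
	switch (logged) {
	case ATTR_OP_SET:
		/* A crashed set may have left a partial attr: clean up first. */
	case ATTR_OP_REPLACE:
		/* Remove the old attr whether or not it is marked incomplete. */
		return ATTR_OP_REPLACE;
	case ATTR_OP_REMOVE:
		/* Cleanup for remove is just rerunning the remove from scratch. */
		return ATTR_OP_REMOVE;
	}
	assert(0 && "corrupt attr intent op");
	return ATTR_OP_REMOVE;
}
```

In the real code this choice would presumably be recorded in the new xattri that replaces the recovered one, which is the "rewrite the intent" step described here.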

> I /think/ the answer to that question is that we might need to move the
> log tail forward to free enough log space to finish the intent items, so
> creating the extra xattrd/xattri (a) avoids the complexity of submitting
> an incore intent item *and* a log intent item to the defer ops
> machinery; and (b) avoids livelocks in log recovery.  Therefore, we
> actually need to do it this way.

We really need the initial operation to rewrite the intent to match
the recovery operation we are going to perform. Everything else is
secondary.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH RESEND v2 02/18] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5
  2022-08-09 16:38   ` Darrick J. Wong
@ 2022-08-10  3:07     ` Alli
  0 siblings, 0 replies; 58+ messages in thread
From: Alli @ 2022-08-10  3:07 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, 2022-08-09 at 09:38 -0700, Darrick J. Wong wrote:
> On Thu, Aug 04, 2022 at 12:39:57PM -0700, Allison Henderson wrote:
> > Renames that generate parent pointer updates can join up to 5
> > inodes locked in sorted order.  So we need to increase the
> > number of defer ops inodes and relock them in the same way.
> > 
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_defer.c | 28 ++++++++++++++++++++++++++--
> >  fs/xfs/libxfs/xfs_defer.h |  8 +++++++-
> >  fs/xfs/xfs_inode.c        |  2 +-
> >  fs/xfs/xfs_inode.h        |  1 +
> >  4 files changed, 35 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> > index 5a321b783398..c0279b57e51d 100644
> > --- a/fs/xfs/libxfs/xfs_defer.c
> > +++ b/fs/xfs/libxfs/xfs_defer.c
> > @@ -820,13 +820,37 @@ xfs_defer_ops_continue(
> >  	struct xfs_trans		*tp,
> >  	struct xfs_defer_resources	*dres)
> >  {
> > -	unsigned int			i;
> > +	unsigned int			i, j;
> > +	struct xfs_inode		*sips[XFS_DEFER_OPS_NR_INODES];
> > +	struct xfs_inode		*temp;
> >  
> >  	ASSERT(tp->t_flags & XFS_TRANS_PERM_LOG_RES);
> >  	ASSERT(!(tp->t_flags & XFS_TRANS_DIRTY));
> >  
> >  	/* Lock the captured resources to the new transaction. */
> > -	if (dfc->dfc_held.dr_inos == 2)
> > +	if (dfc->dfc_held.dr_inos > 2) {
> > +		/*
> > +		 * Renames with parent pointer updates can lock up to 5 inodes,
> > +		 * sorted by their inode number.  So we need to make sure they
> > +		 * are relocked in the same way.
> > +		 */
> > +		memset(sips, 0, sizeof(sips));
> > +		for (i = 0; i < dfc->dfc_held.dr_inos; i++)
> > +			sips[i] = dfc->dfc_held.dr_ip[i];
> > +
> > +		/* Bubble sort of at most 5 inodes */
> > +		for (i = 0; i < dfc->dfc_held.dr_inos; i++) {
> > +			for (j = 1; j < dfc->dfc_held.dr_inos; j++) {
> > +				if (sips[j]->i_ino < sips[j-1]->i_ino) {
> > +					temp = sips[j];
> > +					sips[j] = sips[j-1];
> > +					sips[j-1] = temp;
> > +				}
> > +			}
> > +		}
> 
> Why not reuse xfs_sort_for_rename?
Initially I had looked at doing that, but it would need some
refactoring as it is not meant for an arbitrary number of inodes.
Either some logic specific to rename would get pulled up, or we'd need
another helper to repackage the parameters, but it's such a small bit
of code, I'm not sure it saves much LOC either way.  

> 
> I also wonder if it's worth the trouble to replace the open-coded
> bubblesort with a call to sort_r(), but TBH I suspect the cost of a
> retpoline for the compare function isn't worth the overhead.
Yeah, it would make sense if there were a lot of other places we sorted
inodes, but with only two callers it does seem like a bit much.

I am fine with whatever method folks prefer tho.
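For comparison with the open-coded bubble sort, here is a sketch of the same relock ordering as an insertion sort over a copy of the held inodes. `struct inode_stub`, `NR_INODES_MAX` and `sort_held_inodes()` are hypothetical stand-ins for `struct xfs_inode`, `XFS_DEFER_OPS_NR_INODES` and the loop in `xfs_defer_ops_continue()`. Like the patch, it uses no comparator callback, which sidesteps the retpoline cost mentioned for `sort_r()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct xfs_inode: only i_ino matters here. */
struct inode_stub {
	uint64_t	i_ino;
};

/* Mirrors the patch's XFS_DEFER_OPS_NR_INODES value of 5. */
#define NR_INODES_MAX	5

/*
 * Fill @out with the @n held inodes in ascending inode-number order so
 * they can be relocked in the order they were originally locked.
 * Insertion sort: no comparator callback and no allocation, which is
 * plenty for at most five entries.
 */
static void sort_held_inodes(struct inode_stub *const *held, size_t n,
			     struct inode_stub **out)
{
	size_t i, j;

	assert(n <= NR_INODES_MAX);
	for (i = 0; i < n; i++) {
		struct inode_stub *ip = held[i];

		/* Shift larger inode numbers right to open a slot for ip. */
		for (j = i; j > 0 && out[j - 1]->i_ino > ip->i_ino; j--)
			out[j] = out[j - 1];
		out[j] = ip;
	}
}
```

Mapped back onto the patch, `sips[]` would be filled this way from `dfc->dfc_held.dr_ip[]` before the call to `xfs_lock_inodes()`.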

> 
> > +
> > +		xfs_lock_inodes(sips, dfc->dfc_held.dr_inos, XFS_ILOCK_EXCL);
> > +	} else if (dfc->dfc_held.dr_inos == 2)
> >  		xfs_lock_two_inodes(dfc->dfc_held.dr_ip[0], XFS_ILOCK_EXCL,
> >  				    dfc->dfc_held.dr_ip[1], XFS_ILOCK_EXCL);
> >  	else if (dfc->dfc_held.dr_inos == 1)
> > diff --git a/fs/xfs/libxfs/xfs_defer.h b/fs/xfs/libxfs/xfs_defer.h
> > index 114a3a4930a3..3e4029d2ce41 100644
> > --- a/fs/xfs/libxfs/xfs_defer.h
> > +++ b/fs/xfs/libxfs/xfs_defer.h
> > @@ -70,7 +70,13 @@ extern const struct xfs_defer_op_type xfs_attr_defer_type;
> >  /*
> >   * Deferred operation item relogging limits.
> >   */
> > -#define XFS_DEFER_OPS_NR_INODES	2	/* join up to two inodes */
> > +
> > +/*
> > + * Rename w/ parent pointers can require up to 5 inodes with deferred ops to
> > + * be joined to the transaction: src_dp, target_dp, src_ip, target_ip, and wip.
> > + * These inodes are locked in sorted order by their inode numbers
> 
> Much inode.  Thanks for recording this.
Sure, thx for the reviews!

Allison
> 
> --D
> 
> > + */
> > +#define XFS_DEFER_OPS_NR_INODES	5
> >  #define XFS_DEFER_OPS_NR_BUFS	2	/* join up to two buffers */
> >  
> >  /* Resources that must be held across a transaction roll. */
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 3022918bf96a..cfdcca95594f 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -447,7 +447,7 @@ xfs_lock_inumorder(
> >   * lock more than one at a time, lockdep will report false positives saying we
> >   * have violated locking orders.
> >   */
> > -static void
> > +void
> >  xfs_lock_inodes(
> >  	struct xfs_inode	**ips,
> >  	int			inodes,
> > diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
> > index 4d626f4321bc..bc06d6e4164a 100644
> > --- a/fs/xfs/xfs_inode.h
> > +++ b/fs/xfs/xfs_inode.h
> > @@ -573,5 +573,6 @@ void xfs_end_io(struct work_struct *work);
> >  
> >  int xfs_ilock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
> >  void xfs_iunlock2_io_mmap(struct xfs_inode *ip1, struct xfs_inode *ip2);
> > +void xfs_lock_inodes(struct xfs_inode **ips, int inodes, uint lock_mode);
> >  
> >  #endif	/* __XFS_INODE_H__ */
> > -- 
> > 2.25.1
> > 



* Re: [PATCH RESEND v2 08/18] xfs: add parent pointer support to attribute code
  2022-08-09 16:54   ` Darrick J. Wong
@ 2022-08-10  3:08     ` Alli
  0 siblings, 0 replies; 58+ messages in thread
From: Alli @ 2022-08-10  3:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, 2022-08-09 at 09:54 -0700, Darrick J. Wong wrote:
> On Thu, Aug 04, 2022 at 12:40:03PM -0700, Allison Henderson wrote:
> > Add the new parent attribute type. XFS_ATTR_PARENT is used only for
> > parent pointer entries; it uses reserved blocks like XFS_ATTR_ROOT.
> > 
> > [dchinner: forward ported and cleaned up]
> > [achender: rebased]
> > 
> > Signed-off-by: Mark Tinguely <tinguely@sgi.com>
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> 
> Looks good now,
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Great!  Thanks!
Allison


> 
> --D
> 
> > ---
> >  fs/xfs/libxfs/xfs_attr.c       | 4 +++-
> >  fs/xfs/libxfs/xfs_da_format.h  | 5 ++++-
> >  fs/xfs/libxfs/xfs_log_format.h | 1 +
> >  3 files changed, 8 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index e28d93d232de..8df80d91399b 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -966,11 +966,13 @@ xfs_attr_set(
> >  	struct xfs_inode	*dp = args->dp;
> >  	struct xfs_mount	*mp = dp->i_mount;
> >  	struct xfs_trans_res	tres;
> > -	bool			rsvd = (args->attr_filter & XFS_ATTR_ROOT);
> > +	bool			rsvd;
> >  	int			error, local;
> >  	int			rmt_blks = 0;
> >  	unsigned int		total;
> >  
> > +	rsvd = (args->attr_filter & (XFS_ATTR_ROOT | XFS_ATTR_PARENT)) != 0;
> > +
> >  	if (xfs_is_shutdown(dp->i_mount))
> >  		return -EIO;
> >  
> > diff --git a/fs/xfs/libxfs/xfs_da_format.h b/fs/xfs/libxfs/xfs_da_format.h
> > index 25e2841084e1..3dc03968bba6 100644
> > --- a/fs/xfs/libxfs/xfs_da_format.h
> > +++ b/fs/xfs/libxfs/xfs_da_format.h
> > @@ -688,12 +688,15 @@ struct xfs_attr3_leafblock {
> >  #define	XFS_ATTR_LOCAL_BIT	0	/* attr is stored locally */
> >  #define	XFS_ATTR_ROOT_BIT	1	/* limit access to trusted attrs */
> >  #define	XFS_ATTR_SECURE_BIT	2	/* limit access to secure attrs */
> > +#define	XFS_ATTR_PARENT_BIT	3	/* parent pointer attrs */
> >  #define	XFS_ATTR_INCOMPLETE_BIT	7	/* attr in middle of create/delete */
> >  #define XFS_ATTR_LOCAL		(1u << XFS_ATTR_LOCAL_BIT)
> >  #define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
> >  #define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
> > +#define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
> >  #define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
> > -#define XFS_ATTR_NSP_ONDISK_MASK	(XFS_ATTR_ROOT | XFS_ATTR_SECURE)
> > +#define XFS_ATTR_NSP_ONDISK_MASK \
> > +			(XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)
> >  
> >  /*
> >   * Alignment for namelist and valuelist entries (since they are mixed
> > diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
> > index b351b9dc6561..eea53874fde8 100644
> > --- a/fs/xfs/libxfs/xfs_log_format.h
> > +++ b/fs/xfs/libxfs/xfs_log_format.h
> > @@ -917,6 +917,7 @@ struct xfs_icreate_log {
> >   */
> >  #define XFS_ATTRI_FILTER_MASK		(XFS_ATTR_ROOT | \
> >  					 XFS_ATTR_SECURE | \
> > +					 XFS_ATTR_PARENT | \
> >  					 XFS_ATTR_INCOMPLETE)
> >  
> >  /*
> > -- 
> > 2.25.1
> > 
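A standalone sketch of the flag arithmetic this patch adds. The bit values are copied from the xfs_da_format.h hunk; `attr_uses_reserved_blocks()` and `attr_ondisk_namespace()` are hypothetical helpers that mirror the patched `rsvd` test in `xfs_attr_set()` and illustrate how `XFS_ATTR_NSP_ONDISK_MASK` isolates the namespace bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit assignments from the xfs_da_format.h hunk above. */
#define XFS_ATTR_LOCAL_BIT	0	/* attr is stored locally */
#define XFS_ATTR_ROOT_BIT	1	/* limit access to trusted attrs */
#define XFS_ATTR_SECURE_BIT	2	/* limit access to secure attrs */
#define XFS_ATTR_PARENT_BIT	3	/* parent pointer attrs */
#define XFS_ATTR_INCOMPLETE_BIT	7	/* attr in middle of create/delete */
#define XFS_ATTR_LOCAL		(1u << XFS_ATTR_LOCAL_BIT)
#define XFS_ATTR_ROOT		(1u << XFS_ATTR_ROOT_BIT)
#define XFS_ATTR_SECURE		(1u << XFS_ATTR_SECURE_BIT)
#define XFS_ATTR_PARENT		(1u << XFS_ATTR_PARENT_BIT)
#define XFS_ATTR_INCOMPLETE	(1u << XFS_ATTR_INCOMPLETE_BIT)
#define XFS_ATTR_NSP_ONDISK_MASK \
	(XFS_ATTR_ROOT | XFS_ATTR_SECURE | XFS_ATTR_PARENT)

/* The namespace mask must not swallow the LOCAL or INCOMPLETE state bits. */
static_assert((XFS_ATTR_NSP_ONDISK_MASK &
	       (XFS_ATTR_LOCAL | XFS_ATTR_INCOMPLETE)) == 0,
	      "namespace bits overlap state bits");

/* Same test as the patched xfs_attr_set(): may this set use reserved blocks? */
static bool attr_uses_reserved_blocks(unsigned int attr_filter)
{
	return (attr_filter & (XFS_ATTR_ROOT | XFS_ATTR_PARENT)) != 0;
}

/* Keep only the namespace bits of an on-disk flags value. */
static unsigned int attr_ondisk_namespace(unsigned int flags)
{
	return flags & XFS_ATTR_NSP_ONDISK_MASK;
}
```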



* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-09 16:52   ` Darrick J. Wong
  2022-08-10  1:58     ` Dave Chinner
@ 2022-08-10  3:08     ` Alli
  1 sibling, 0 replies; 58+ messages in thread
From: Alli @ 2022-08-10  3:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, 2022-08-09 at 09:52 -0700, Darrick J. Wong wrote:
> On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison Henderson wrote:
> > Recent parent pointer testing has exposed a bug in the underlying
> > attr replay.  A multi transaction replay currently performs a
> > single step of the replay, then defers the rest if there is more
> > to do.  This causes race conditions with other attr replays that
> > might be recovered before the remaining deferred work has had a
> > chance to finish.  This can lead to interleaved set and remove
> > operations that may clobber the attribute fork.  Fix this by
> > deferring all work for any attribute operation.
> > 
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/xfs_attr_item.c | 35 ++++++++---------------------------
> >  1 file changed, 8 insertions(+), 27 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> > index 5077a7ad5646..c13d724a3e13 100644
> > --- a/fs/xfs/xfs_attr_item.c
> > +++ b/fs/xfs/xfs_attr_item.c
> > @@ -635,52 +635,33 @@ xfs_attri_item_recover(
> >  		break;
> >  	case XFS_ATTRI_OP_FLAGS_REMOVE:
> >  		if (!xfs_inode_hasattr(args->dp))
> > -			goto out;
> > +			return 0;
> >  		attr->xattri_dela_state = xfs_attr_init_remove_state(args);
> >  		break;
> >  	default:
> >  		ASSERT(0);
> > -		error = -EFSCORRUPTED;
> > -		goto out;
> > +		return -EFSCORRUPTED;
> >  	}
> >  
> >  	xfs_init_attr_trans(args, &tres, &total);
> >  	error = xfs_trans_alloc(mp, &tres, total, 0, XFS_TRANS_RESERVE, &tp);
> >  	if (error)
> > -		goto out;
> > +		return error;
> >  
> >  	args->trans = tp;
> >  	done_item = xfs_trans_get_attrd(tp, attrip);
> > +	args->trans->t_flags |= XFS_TRANS_HAS_INTENT_DONE;
> > +	set_bit(XFS_LI_DIRTY, &done_item->attrd_item.li_flags);
> >  
> >  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> >  	xfs_trans_ijoin(tp, ip, 0);
> >  
> > -	error = xfs_xattri_finish_update(attr, done_item);
> > -	if (error == -EAGAIN) {
> > -		/*
> > -		 * There's more work to do, so add the intent item to this
> > -		 * transaction so that we can continue it later.
> > -		 */
> > -		xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
> > -		error = xfs_defer_ops_capture_and_commit(tp, capture_list);
> > -		if (error)
> > -			goto out_unlock;
> > -
> > -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > -		xfs_irele(ip);
> > -		return 0;
> > -	}
> > -	if (error) {
> > -		xfs_trans_cancel(tp);
> > -		goto out_unlock;
> > -	}
> > -
> > +	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
> 
> This seems a little convoluted to me.  Maybe?  Maybe not?
> 
> 1. Log recovery recreates an incore xfs_attri_log_item from what it
> finds in the log.
> 
> 2. This function then logs an xattrd for the recovered xattri item.
> 
> 3. Then it creates a new xfs_attr_intent to complete the operation.
> 
> 4. Finally, it calls xfs_defer_ops_capture_and_commit, which logs a new
> xattri for the intent created in step 3 and also commits the xattrd for
> the first xattri.
> 
> IOWs, the only difference between before and after is that we're not
> advancing one more step through the state machine as part of log
> recovery.  From the perspective of the log, the recovery function merely
> replaces the recovered xattri log item with a new one.
> 
> Why can't we just attach the recovered xattri to the xfs_defer_pending
> that is created to point to the xfs_attr_intent that's created in step
> 3, and skip the xattrd?
Oh, I see.  I hadn't thought of doing it that way; this was based on the
initial solution suggested for the first patch of v1 (xfs: Add larp
state XFS_DAS_CREATE_FORK).  But what you mention below also makes
sense, so if no one has any gripes, maybe it should stay as it is.
Thx for the reviews!

Allison

> 
> I /think/ the answer to that question is that we might need to move the
> log tail forward to free enough log space to finish the intent items, so
> creating the extra xattrd/xattri (a) avoids the complexity of submitting
> an incore intent item *and* a log intent item to the defer ops
> machinery; and (b) avoids livelocks in log recovery.  Therefore, we
> actually need to do it this way.
> 
> IOWs, I *think* this is ok, but want to see if others have differing
> perspectives on how log item recovery works?
> 
> --D
> 
> >  	error = xfs_defer_ops_capture_and_commit(tp, capture_list);
> > -out_unlock:
> > +
> >  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >  	xfs_irele(ip);
> > -out:
> > -	xfs_attr_free_item(attr);
> > +
> >  	return error;
> >  }
> >  
> > -- 
> > 2.25.1
> > 



* Re: [PATCH RESEND v2 10/18] xfs: Add xfs_verify_pptr
  2022-08-09 16:59   ` Darrick J. Wong
@ 2022-08-10  3:08     ` Alli
  0 siblings, 0 replies; 58+ messages in thread
From: Alli @ 2022-08-10  3:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, 2022-08-09 at 09:59 -0700, Darrick J. Wong wrote:
> On Thu, Aug 04, 2022 at 12:40:05PM -0700, Allison Henderson wrote:
> > Attribute names of parent pointers are not strings.  So we need to
> > modify attr_namecheck to verify parent pointer records when the
> > XFS_ATTR_PARENT flag is set.
> > 
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > Reviewed-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/libxfs/xfs_attr.c | 43 +++++++++++++++++++++++++++++++++++++---
> >  fs/xfs/libxfs/xfs_attr.h |  3 ++-
> >  fs/xfs/scrub/attr.c      |  2 +-
> >  fs/xfs/xfs_attr_item.c   |  6 ++++--
> >  fs/xfs/xfs_attr_list.c   | 17 +++++++++++-----
> >  5 files changed, 59 insertions(+), 12 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index 8df80d91399b..2ef3262f21e8 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -1567,9 +1567,29 @@ xfs_attr_node_get(
> >  	return error;
> >  }
> >  
> > -/* Returns true if the attribute entry name is valid. */
> > -bool
> > -xfs_attr_namecheck(
> > +/*
> > + * Verify parent pointer attribute is valid.
> > + * Return true on success or false on failure
> > + */
> > +STATIC bool
> > +xfs_verify_pptr(struct xfs_mount *mp, struct xfs_parent_name_rec *rec)
> > +{
> > +	xfs_ino_t p_ino = (xfs_ino_t)be64_to_cpu(rec->p_ino);
> > +	xfs_dir2_dataptr_t p_diroffset =
> > +		(xfs_dir2_dataptr_t)be32_to_cpu(rec->p_diroffset);
> 
> I guess I should complain about the indentation here...
> 
> STATIC bool
> xfs_verify_pptr(
> 	struct xfs_mount		*mp,
> 	struct xfs_parent_name_rec	*rec)
> {
> 	xfs_ino_t			p_ino;
> 	xfs_dir2_dataptr_t		p_diroffset;
> 
> 	p_ino = be64_to_cpu(rec->p_ino);
> 	p_diroffset = be32_to_cpu(rec->p_diroffset);
> 
> (You can keep the RVB tag if you clean this up for the next
> revision.)
Sure, will fix

Allison
> 
> --D
> 
> > +
> > +	if (!xfs_verify_ino(mp, p_ino))
> > +		return false;
> > +
> > +	if (p_diroffset > XFS_DIR2_MAX_DATAPTR)
> > +		return false;
> > +
> > +	return true;
> > +}
> > +
> > +/* Returns true if the string attribute entry name is valid. */
> > +static bool
> > +xfs_str_attr_namecheck(
> >  	const void	*name,
> >  	size_t		length)
> >  {
> > @@ -1584,6 +1604,23 @@ xfs_attr_namecheck(
> >  	return !memchr(name, 0, length);
> >  }
> >  
> > +/* Returns true if the attribute entry name is valid. */
> > +bool
> > +xfs_attr_namecheck(
> > +	struct xfs_mount	*mp,
> > +	const void		*name,
> > +	size_t			length,
> > +	int			flags)
> > +{
> > +	if (flags & XFS_ATTR_PARENT) {
> > +		if (length != sizeof(struct xfs_parent_name_rec))
> > +			return false;
> > +		return xfs_verify_pptr(mp, (struct xfs_parent_name_rec *)name);
> > +	}
> > +
> > +	return xfs_str_attr_namecheck(name, length);
> > +}
> > +
> >  int __init
> >  xfs_attr_intent_init_cache(void)
> >  {
> > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > index 81be9b3e4004..af92cc57e7d8 100644
> > --- a/fs/xfs/libxfs/xfs_attr.h
> > +++ b/fs/xfs/libxfs/xfs_attr.h
> > @@ -547,7 +547,8 @@ int xfs_attr_get(struct xfs_da_args *args);
> >  int xfs_attr_set(struct xfs_da_args *args);
> >  int xfs_attr_set_iter(struct xfs_attr_intent *attr);
> >  int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
> > -bool xfs_attr_namecheck(const void *name, size_t length);
> > +bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
> > +			int flags);
> >  int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
> >  void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
> >  			 unsigned int *total);
> > diff --git a/fs/xfs/scrub/attr.c b/fs/xfs/scrub/attr.c
> > index b6f0c9f3f124..d3e75c077fab 100644
> > --- a/fs/xfs/scrub/attr.c
> > +++ b/fs/xfs/scrub/attr.c
> > @@ -128,7 +128,7 @@ xchk_xattr_listent(
> >  	}
> >  
> >  	/* Does this name make sense? */
> > -	if (!xfs_attr_namecheck(name, namelen)) {
> > +	if (!xfs_attr_namecheck(sx->sc->mp, name, namelen, flags)) {
> >  		xchk_fblock_set_corrupt(sx->sc, XFS_ATTR_FORK, args.blkno);
> >  		return;
> >  	}
> > diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> > index c13d724a3e13..69856814c066 100644
> > --- a/fs/xfs/xfs_attr_item.c
> > +++ b/fs/xfs/xfs_attr_item.c
> > @@ -587,7 +587,8 @@ xfs_attri_item_recover(
> >  	 */
> >  	attrp = &attrip->attri_format;
> >  	if (!xfs_attri_validate(mp, attrp) ||
> > -	    !xfs_attr_namecheck(nv->name.i_addr, nv->name.i_len))
> > +	    !xfs_attr_namecheck(mp, nv->name.i_addr, nv->name.i_len,
> > +				attrp->alfi_attr_filter))
> >  		return -EFSCORRUPTED;
> >  
> >  	error = xlog_recover_iget(mp,  attrp->alfi_ino, &ip);
> > @@ -727,7 +728,8 @@ xlog_recover_attri_commit_pass2(
> >  		return -EFSCORRUPTED;
> >  	}
> >  
> > -	if (!xfs_attr_namecheck(attr_name, attri_formatp->alfi_name_len)) {
> > +	if (!xfs_attr_namecheck(mp, attr_name, attri_formatp->alfi_name_len,
> > +				attri_formatp->alfi_attr_filter)) {
> >  		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
> >  		return -EFSCORRUPTED;
> >  	}
> > diff --git a/fs/xfs/xfs_attr_list.c b/fs/xfs/xfs_attr_list.c
> > index 99bbbe1a0e44..a51f7f13a352 100644
> > --- a/fs/xfs/xfs_attr_list.c
> > +++ b/fs/xfs/xfs_attr_list.c
> > @@ -58,9 +58,13 @@ xfs_attr_shortform_list(
> >  	struct xfs_attr_sf_sort		*sbuf, *sbp;
> >  	struct xfs_attr_shortform	*sf;
> >  	struct xfs_attr_sf_entry	*sfe;
> > +	struct xfs_mount		*mp;
> >  	int				sbsize, nsbuf, count, i;
> >  	int				error = 0;
> >  
> > +	ASSERT(context != NULL);
> > +	ASSERT(dp != NULL);
> > +	mp = dp->i_mount;
> >  	sf = (struct xfs_attr_shortform *)dp->i_af.if_u1.if_data;
> >  	ASSERT(sf != NULL);
> >  	if (!sf->hdr.count)
> > @@ -82,8 +86,9 @@ xfs_attr_shortform_list(
> >  	     (dp->i_af.if_bytes + sf->hdr.count * 16) < context->bufsize)) {
> >  		for (i = 0, sfe = &sf->list[0]; i < sf->hdr.count; i++) {
> >  			if (XFS_IS_CORRUPT(context->dp->i_mount,
> > -					   !xfs_attr_namecheck(sfe->nameval,
> > -							       sfe->namelen)))
> > +					   !xfs_attr_namecheck(mp, sfe->nameval,
> > +							       sfe->namelen,
> > +							       sfe->flags)))
> >  				return -EFSCORRUPTED;
> >  			context->put_listent(context,
> >  					     sfe->flags,
> > @@ -174,8 +179,9 @@ xfs_attr_shortform_list(
> >  			cursor->offset = 0;
> >  		}
> >  		if (XFS_IS_CORRUPT(context->dp->i_mount,
> > -				   !xfs_attr_namecheck(sbp->name,
> > -						       sbp->namelen))) {
> > +				   !xfs_attr_namecheck(mp, sbp->name,
> > +						       sbp->namelen,
> > +						       sbp->flags))) {
> >  			error = -EFSCORRUPTED;
> >  			goto out;
> >  		}
> > @@ -465,7 +471,8 @@ xfs_attr3_leaf_list_int(
> >  		}
> >  
> >  		if (XFS_IS_CORRUPT(context->dp->i_mount,
> > -				   !xfs_attr_namecheck(name, namelen)))
> > +				   !xfs_attr_namecheck(mp, name, namelen,
> > +						       entry->flags)))
> >  			return -EFSCORRUPTED;
> >  		context->put_listent(context, entry->flags,
> >  					      name, namelen, valuelen);
> > -- 
> > 2.25.1
> > 
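A userspace sketch of the validation split this patch introduces: parent pointer names must be exactly one fixed-size record whose fields pass range checks, while all other names are checked as strings. The byte-array record layout (inode, generation, directory offset, all big-endian) is an assumption modeled on `struct xfs_parent_name_rec`, and `struct pptr_limits` is a hypothetical simplification that replaces `xfs_verify_ino()` with a plain upper bound and takes the directory offset limit as a parameter instead of `XFS_DIR2_MAX_DATAPTR`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAXNAMELEN	256		/* includes the trailing NUL, as in XFS */
#define ATTR_PARENT	(1u << 3)	/* XFS_ATTR_PARENT from patch 08 */

/* Assumed layout of a parent pointer attr name (all fields big-endian). */
struct pptr_name_rec {
	uint8_t	p_ino[8];
	uint8_t	p_gen[4];
	uint8_t	p_diroffset[4];
};

/* Hypothetical limits standing in for xfs_verify_ino()/XFS_DIR2_MAX_DATAPTR. */
struct pptr_limits {
	uint64_t	max_ino;
	uint32_t	max_dataptr;
};

static uint64_t get_be64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

static uint32_t get_be32(const uint8_t *p)
{
	return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
	       (uint32_t)p[2] << 8 | p[3];
}

/* Parent pointer record: plausible inode number and directory offset. */
static bool verify_pptr(const struct pptr_limits *lim,
			const struct pptr_name_rec *rec)
{
	uint64_t p_ino = get_be64(rec->p_ino);
	uint32_t p_diroffset = get_be32(rec->p_diroffset);

	if (p_ino == 0 || p_ino > lim->max_ino)
		return false;
	return p_diroffset <= lim->max_dataptr;
}

/* Ordinary names: shorter than MAXNAMELEN and free of embedded NULs. */
static bool str_attr_namecheck(const void *name, size_t length)
{
	if (length >= MAXNAMELEN)
		return false;
	return memchr(name, 0, length) == NULL;
}

static bool attr_namecheck(const struct pptr_limits *lim, const void *name,
			   size_t length, unsigned int flags)
{
	if (flags & ATTR_PARENT) {
		/* A parent pointer name is exactly one fixed-size record. */
		if (length != sizeof(struct pptr_name_rec))
			return false;
		return verify_pptr(lim, name);
	}
	return str_attr_namecheck(name, length);
}
```

Decoding byte by byte keeps the sketch independent of host endianness and of platform-specific helpers such as `be64toh()`.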



* Re: [PATCH RESEND v2 11/18] xfs: extend transaction reservations for parent attributes
  2022-08-09 17:48   ` Darrick J. Wong
@ 2022-08-10  3:08     ` Alli
  0 siblings, 0 replies; 58+ messages in thread
From: Alli @ 2022-08-10  3:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, 2022-08-09 at 10:48 -0700, Darrick J. Wong wrote:
> On Thu, Aug 04, 2022 at 12:40:06PM -0700, Allison Henderson wrote:
> > We need to add, remove or modify parent pointer attributes during
> > create/link/unlink/rename operations atomically with the dirents in the
> > parent directories being modified. This means they need to be modified
> > in the same transaction as the parent directories, and so we need to add
> > the required space for the attribute modifications to the transaction
> > reservations.
> 
> While we're on the topic of log reservations ... Dave and I noticed
> during the 5.19 cycle that xfs_log_calc_max_attrsetm_res has a unit
> conversion problem when it's trying to compute the minimum log size:
> 
> STATIC int
> xfs_log_calc_max_attrsetm_res(
> 	struct xfs_mount	*mp)
> {
> 	int			size;
> 	int			nblks;
> 
> 	size = xfs_attr_leaf_entsize_local_max(mp->m_attr_geo->blksize) -
> 	       MAXNAMELEN - 1;
> 
> Notice here that @size is the maximum amount of space that a local
> format attribute can use in an xattr leaf block.  The computation is in
> units of bytes.
> 
> 	nblks = XFS_DAENTER_SPACE_RES(mp, XFS_ATTR_FORK);
> 	nblks += XFS_B_TO_FSB(mp, size);
> 
> ...and here we convert bytes to fs blocks for the block count
> computation...
> 
> 	nblks += XFS_NEXTENTADD_SPACE_RES(mp, size, XFS_ATTR_FORK);
> 
> ...but here we pass the byte count into a macro that takes a block count
> as its second parameter and returns the number of bmbt blocks needed to
> add that many blocks to an attribute fork.  Oops!
> 
> I would like to fix this incorrect code, but it's never a good idea to
> adjust downwards the min log size calculation for existing filesystems,
> because this can result in the situation where new mkfs formats a
> filesystem with a small enough log that an old kernel won't mount it.
> 
> Therefore, the corrected logic would have to be gated on whatever
> happens to be the next new ondisk feature.  It's probably too late to do
> this for large extent counts, but fixing the calculation would be (I
> think) appropriate for parent pointers, since it's still undergoing
> review and won't be an easy upgrade, which eliminates the legacy
> problem.
> 
> I'll attach the patches that I've written as patches 19 and 20 to this
> patchset, if you don't
Sure, I will keep an eye out for them then.

> 
> 	return  M_RES(mp)->tr_attrsetm.tr_logres +
> 		M_RES(mp)->tr_attrsetrt.tr_logres * nblks;
> }
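To make the unit mismatch concrete, here is a simplified model. `nextentadd_res()` follows the shape of `XFS_NEXTENTADD_SPACE_RES()` (round a block count up to bmbt blocks, then multiply by a per-extent reservation), and `da_res` plays the role of `XFS_DAENTER_SPACE_RES()`. The geometry struct and its fields are hypothetical stand-ins, and gating the fix on a flag is one plausible shape of the change, not Darrick's actual patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical geometry standing in for the values the XFS macros read. */
struct geom {
	uint32_t	blocksize;		/* bytes per fs block */
	uint32_t	extents_per_bmbt_blk;	/* cf. XFS_MAX_CONTIG_EXTENTS_PER_BLOCK */
	uint32_t	extentadd_res;		/* cf. XFS_EXTENTADD_SPACE_RES */
};

/* Round a byte count up to whole fs blocks, as XFS_B_TO_FSB() does. */
static uint32_t b_to_fsb(const struct geom *g, uint32_t bytes)
{
	return (bytes + g->blocksize - 1) / g->blocksize;
}

/* Shape of XFS_NEXTENTADD_SPACE_RES(): the argument is a BLOCK count. */
static uint32_t nextentadd_res(const struct geom *g, uint32_t blocks)
{
	return (blocks + g->extents_per_bmbt_blk - 1) /
	       g->extents_per_bmbt_blk * g->extentadd_res;
}

/*
 * Block count for the largest local attr set.  With @fixed_units false this
 * reproduces the bug (bytes passed as blocks).  Callers would pass true only
 * when a new ondisk feature is enabled, so existing filesystems never see a
 * smaller minimum log size.
 */
static uint32_t attrsetm_blocks(const struct geom *g, uint32_t da_res,
				uint32_t size_bytes, bool fixed_units)
{
	uint32_t nblks = da_res + b_to_fsb(g, size_bytes);

	nblks += nextentadd_res(g, fixed_units ? b_to_fsb(g, size_bytes)
					       : size_bytes);
	return nblks;
}
```

With a 4096-byte block size, 100 extents per bmbt block, a per-extent reservation of 5 and a `da_res` of 10, a 3000-byte `size` yields 16 blocks with correct units and 161 when the byte count is passed straight through.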
> 
> 
> > [achender: rebased]
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_trans_resv.c | 105 +++++++++++++++++++++++++++------
> >  1 file changed, 86 insertions(+), 19 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_trans_resv.c b/fs/xfs/libxfs/xfs_trans_resv.c
> > index e9913c2c5a24..b43ac4be7564 100644
> > --- a/fs/xfs/libxfs/xfs_trans_resv.c
> > +++ b/fs/xfs/libxfs/xfs_trans_resv.c
> > @@ -909,24 +909,67 @@ xfs_calc_sb_reservation(
> >  	return xfs_calc_buf_res(1, mp->m_sb.sb_sectsize);
> >  }
> >  
> > -void
> > -xfs_trans_resv_calc(
> > -	struct xfs_mount	*mp,
> > -	struct xfs_trans_resv	*resp)
> > +STATIC void
> > +xfs_calc_parent_ptr_reservations(
> > +	struct xfs_mount     *mp)
> >  {
> > -	int			logcount_adj = 0;
> > +	struct xfs_trans_resv   *resp = M_RES(mp);
> >  
> > -	/*
> > -	 * The following transactions are logged in physical format and
> > -	 * require a permanent reservation on space.
> > -	 */
> > -	resp->tr_write.tr_logres = xfs_calc_write_reservation(mp, false);
> > -	resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT;
> > -	resp->tr_write.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> > +	/* Calculate extra space needed for parent pointer attributes */
> 
> This might be better expressed as a comment just prior to the function
> declaration above.
Alrighty, will move upwards

> 
> > +	if (!xfs_has_parent(mp))
> > +		return;
> >  
> > -	resp->tr_itruncate.tr_logres = xfs_calc_itruncate_reservation(mp, false);
> > -	resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
> > -	resp->tr_itruncate.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> > +	/* rename can add/remove/modify 4 parent attributes */
> > +	resp->tr_rename.tr_logres += 4 * max(resp->tr_attrsetm.tr_logres,
> > +					 resp->tr_attrrm.tr_logres);
> 
> Why does the per-transaction reservation increase by 4x the amount of
> space needed to set (or delete) an xattr?  The pptr patchset now uses
> logged xattrs, which means that each xattr update needed to commit the
> rename operation will happen in a separate transaction.  IOWs, each
> transaction in the chain does not have to handle *every* update that
> must be made during the entire chain, it only has to handle one step of
> the full process.
Oh, I think initially this might have been pre-larp code, so probably
we can just drop it now.

> 
> Doesn't that mean that the size of tr_rename.tr_logres only needs to
> increase by the amount of space needed to log the four(?) xattr items to
> the first transaction in the chain?  AFAICT, it also can't be smaller
> than max(resp->tr_attrsetm.tr_logres, resp->tr_attrrm.tr_logres);
I think so, probably we can just leave it as max

> 
> (I'm also not sure why four -- the patch for xfs_rename only creates
> three xfs_parent_defer objects.)
Hrmm, well initially I think it was for 4 inodes, but then we
remembered wip in the last review, so now it's 5 right? That's why we
expanded XFS_DEFER_OPS_NR_INODES in patch 1.  So probably the 4 inodes
needs to turn into a 5 in this patch too.


> 
> I also think that adjusting tr_rename to account for parent pointers is
> something that should be done in xfs_calc_rename_reservation, not a
> separate function:
> 
> /*
>  * In renaming a file we can modify (t1):
>  *    the four inodes involved: 4 * inode size
5?

>  *    the two directory btrees: 2 * (max depth + v2) * dir block size
>  *    the two directory bmap btrees: 2 * max depth * block size
>  * And the bmap_finish transaction can free dir and bmap blocks (two sets
>  *	of bmap blocks) giving (t2):
>  *    the agf for the ags in which the blocks live: 3 * sector size
>  *    the agfl for the ags in which the blocks live: 3 * sector size
>  *    the superblock for the free block count: sector size
>  *    the allocation btrees: 3 exts * 2 trees * (2 * max depth - 1) * block size
>  * If parent pointers are enabled (t3), then each transaction in the chain
>  *    must be capable of setting or removing the extended attribute
>  *    containing the parent information.  It must also be able to handle
>  *    the three xattr intent items that track the progress of the parent
>  *    pointer update.
>  */
> STATIC uint
> xfs_calc_rename_reservation(
> 	struct xfs_mount	*mp)
> {
> 	unsigned int		overhead = XFS_DQUOT_LOGRES(mp);
> 	struct xfs_trans_resv	*resp = M_RES(mp);
> 	unsigned int		t1, t2, t3 = 0;
> 
> 	t1 = xfs_calc_inode_res(mp, 4) +
and then the same 5 goes in here too...

> 	     xfs_calc_buf_res(2 * XFS_DIROP_LOG_COUNT(mp),
> 			XFS_FSB_TO_B(mp, 1));
> 
> 	t2 = xfs_calc_buf_res(7, mp->m_sb.sb_sectsize) +
> 	     xfs_calc_buf_res(xfs_allocfree_block_count(mp, 3),
> 			XFS_FSB_TO_B(mp, 1));
> 
> 	if (xfs_has_parent(mp)) {
> 		t3 = max(resp->tr_attrsetm.tr_logres,
> 				resp->tr_attrrm.tr_logres);
> 		overhead += 3 * (size of a pptr xattr intent item);
> 	}
> 
> 	return overhead + max3(t1, t2, t3);
> }
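A sketch contrasting the two ways of folding parent pointer updates into the rename reservation discussed here. The v2 patch adds four times the larger attr reservation, whereas the suggestion above takes the maximum of t1, t2 and t3, because each transaction in a logged-xattr chain performs only one step, and charges the pptr intent items as fixed overhead. Plain integers stand in for the real `xfs_trans_resv` fields; the intent count and size are hypothetical parameters, since the thread has not settled between three pptr updates and five inodes.

```c
#include <assert.h>
#include <stdbool.h>

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

static unsigned int max3_u(unsigned int a, unsigned int b, unsigned int c)
{
	return max_u(max_u(a, b), c);
}

/*
 * Suggested shape: each transaction in the rename chain performs only one
 * step, so the attr set/remove reservation (t3) competes with t1 and t2 via
 * max(), and the pptr intent items logged up front are fixed overhead.
 */
static unsigned int rename_logres(unsigned int overhead, unsigned int t1,
				  unsigned int t2, bool has_parent,
				  unsigned int attrsetm, unsigned int attrrm,
				  unsigned int nr_intents,
				  unsigned int intent_size)
{
	unsigned int t3 = 0;

	if (has_parent) {
		t3 = max_u(attrsetm, attrrm);
		overhead += nr_intents * intent_size;
	}
	return overhead + max3_u(t1, t2, t3);
}

/* The v2 patch's shape: add four times the larger attr reservation. */
static unsigned int rename_logres_v2(unsigned int base, unsigned int attrsetm,
				     unsigned int attrrm)
{
	return base + 4 * max_u(attrsetm, attrrm);
}
```

With arbitrary inputs (overhead 100, t1 1000, t2 800, attrsetm 900, attrrm 700, three 50-byte intents), the max form gives 1250 while the v2 form, starting from the same 1100-unit base, gives 4700.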
> 
> > +	resp->tr_rename.tr_logcount += 4 * max(resp->tr_attrsetm.tr_logcount,
> > +					   resp->tr_attrrm.tr_logcount);
> 
> Looks correct, modulo the 4 vs. 3 thing.
> 
I think that looks right?  5 inodes 3 pptrs?


> > +
> > +	/* create will add 1 parent attribute */
> > +	resp->tr_create.tr_logres += resp->tr_attrsetm.tr_logres;
> > +	resp->tr_create.tr_logcount += resp->tr_attrsetm.tr_logcount;
> > +
> > +	/* mkdir will add 1 parent attribute */
> > +	resp->tr_mkdir.tr_logres += resp->tr_attrsetm.tr_logres;
> > +	resp->tr_mkdir.tr_logcount += resp->tr_attrsetm.tr_logcount;
> > +
> > +	/* link will add 1 parent attribute */
> > +	resp->tr_link.tr_logres += resp->tr_attrsetm.tr_logres;
> > +	resp->tr_link.tr_logcount += resp->tr_attrsetm.tr_logcount;
> > +
> > +	/* symlink will add 1 parent attribute */
> > +	resp->tr_symlink.tr_logres += resp->tr_attrsetm.tr_logres;
> > +	resp->tr_symlink.tr_logcount += resp->tr_attrsetm.tr_logcount;
> > +
> > +	/* remove will remove 1 parent attribute */
> > +	resp->tr_remove.tr_logres += resp->tr_attrrm.tr_logres;
> > +	resp->tr_remove.tr_logcount += resp->tr_attrrm.tr_logcount;
> > +}
> > +
> > +/*
> > + * Namespace reservations.
> > + *
> > + * These get tricky when parent pointers are enabled as we have attribute
> > + * modifications occurring from within these transactions. Rather than confuse
> > + * each of these reservation calculations with the conditional attribute
> > + * reservations, add them here in a clear and concise manner. This assumes that
> > + * the attribute reservations have already been calculated.
> > + *
> > + * Note that we only include the static attribute reservation here; the runtime
> > + * reservation will have to be modified by the size of the attributes being
> > + * added/removed/modified. See the comments on the attribute reservation
> > + * calculations for more details.
> > + *
> > + * Note for rename: rename will vastly overestimate requirements. This will be
> > + * addressed later when modifications are made to ensure parent attribute
> 
> Later?  I took a look at the rename patch, and it looks like we're using
> logged xattrs from the start.
I think it's a stale comment.  Will remove.  Thanks for the reviews!

Allison

> 
> --D
> 
> > + * modifications can be done atomically with the rename operation.
> > + */
> > +STATIC void
> > +xfs_calc_namespace_reservations(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_trans_resv	*resp)
> > +{
> > +	ASSERT(resp->tr_attrsetm.tr_logres > 0);
> >  
> >  	resp->tr_rename.tr_logres = xfs_calc_rename_reservation(mp);
> >  	resp->tr_rename.tr_logcount = XFS_RENAME_LOG_COUNT;
> > @@ -948,15 +991,37 @@ xfs_trans_resv_calc(
> >  	resp->tr_create.tr_logcount = XFS_CREATE_LOG_COUNT;
> >  	resp->tr_create.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> >  
> > +	resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp);
> > +	resp->tr_mkdir.tr_logcount = XFS_MKDIR_LOG_COUNT;
> > +	resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> > +
> > +	xfs_calc_parent_ptr_reservations(mp);
> > +}
> > +
> > +void
> > +xfs_trans_resv_calc(
> > +	struct xfs_mount	*mp,
> > +	struct xfs_trans_resv	*resp)
> > +{
> > +	int			logcount_adj = 0;
> > +
> > +	/*
> > +	 * The following transactions are logged in physical format and
> > +	 * require a permanent reservation on space.
> > +	 */
> > +	resp->tr_write.tr_logres = xfs_calc_write_reservation(mp, false);
> > +	resp->tr_write.tr_logcount = XFS_WRITE_LOG_COUNT;
> > +	resp->tr_write.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> > +
> > +	resp->tr_itruncate.tr_logres = xfs_calc_itruncate_reservation(mp, false);
> > +	resp->tr_itruncate.tr_logcount = XFS_ITRUNCATE_LOG_COUNT;
> > +	resp->tr_itruncate.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> > +
> >  	resp->tr_create_tmpfile.tr_logres =
> >  			xfs_calc_create_tmpfile_reservation(mp);
> >  	resp->tr_create_tmpfile.tr_logcount = XFS_CREATE_TMPFILE_LOG_COUNT;
> >  	resp->tr_create_tmpfile.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> >  
> > -	resp->tr_mkdir.tr_logres = xfs_calc_mkdir_reservation(mp);
> > -	resp->tr_mkdir.tr_logcount = XFS_MKDIR_LOG_COUNT;
> > -	resp->tr_mkdir.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> > -
> >  	resp->tr_ifree.tr_logres = xfs_calc_ifree_reservation(mp);
> >  	resp->tr_ifree.tr_logcount = XFS_INACTIVE_LOG_COUNT;
> >  	resp->tr_ifree.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> > @@ -986,6 +1051,8 @@ xfs_trans_resv_calc(
> >  	resp->tr_qm_dqalloc.tr_logcount = XFS_WRITE_LOG_COUNT;
> >  	resp->tr_qm_dqalloc.tr_logflags |= XFS_TRANS_PERM_LOG_RES;
> >  
> > +	xfs_calc_namespace_reservations(mp, resp);
> > +
> >  	/*
> >  	 * The following transactions are logged in logical format with
> >  	 * a default log count.
> > -- 
> > 2.25.1
> > 



* Re: [PATCH RESEND v2 12/18] xfs: parent pointer attribute creation
  2022-08-09 18:01   ` Darrick J. Wong
  2022-08-09 18:13     ` Darrick J. Wong
@ 2022-08-10  3:08     ` Alli
  1 sibling, 0 replies; 58+ messages in thread
From: Alli @ 2022-08-10  3:08 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, 2022-08-09 at 11:01 -0700, Darrick J. Wong wrote:
> On Thu, Aug 04, 2022 at 12:40:07PM -0700, Allison Henderson wrote:
> > Add parent pointer attribute during xfs_create, and subroutines to
> > initialize attributes
> > 
> > [bfoster: rebase, use VFS inode generation]
> > [achender: rebased, changed __unint32_t to xfs_dir2_dataptr_t,
> 
> Nit: uint32_t, not unint32_t.
I actually thought about removing this little changelog altogether.
I initially added it to follow Brian's style, but the set has undergone
so many updates that trying to keep a log here seems a bit silly.
Unless there's a reason people would like to hang on to them, I think
maybe we should just clean them out?


> 
> >            fixed some null pointer bugs,
> >            merged error handling patch,
> >            remove unnecessary ENOSPC handling in xfs_attr_set_first_parent]
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/Makefile            |   1 +
> >  fs/xfs/libxfs/xfs_attr.c   |   4 +-
> >  fs/xfs/libxfs/xfs_attr.h   |   4 +-
> >  fs/xfs/libxfs/xfs_parent.c | 134 +++++++++++++++++++++++++++++++++++++
> >  fs/xfs/libxfs/xfs_parent.h |  34 ++++++++++
> >  fs/xfs/xfs_inode.c         |  37 ++++++++--
> >  fs/xfs/xfs_xattr.c         |   2 +-
> >  fs/xfs/xfs_xattr.h         |   1 +
> >  8 files changed, 208 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > index 1131dd01e4fe..caeea8d968ba 100644
> > --- a/fs/xfs/Makefile
> > +++ b/fs/xfs/Makefile
> > @@ -40,6 +40,7 @@ xfs-y				+= $(addprefix libxfs/, \
> >  				   xfs_inode_fork.o \
> >  				   xfs_inode_buf.o \
> >  				   xfs_log_rlimit.o \
> > +				   xfs_parent.o \
> >  				   xfs_ag_resv.o \
> >  				   xfs_rmap.o \
> >  				   xfs_rmap_btree.o \
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index 2ef3262f21e8..0a458ea7051f 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -880,7 +880,7 @@ xfs_attr_lookup(
> >  	return error;
> >  }
> >  
> > -static int
> > +int
> >  xfs_attr_intent_init(
> >  	struct xfs_da_args	*args,
> >  	unsigned int		op_flags,	/* op flag (set or remove) */
> > @@ -898,7 +898,7 @@ xfs_attr_intent_init(
> >  }
> >  
> >  /* Sets an attribute for an inode as a deferred operation */
> > -static int
> > +int
> >  xfs_attr_defer_add(
> >  	struct xfs_da_args	*args)
> >  {
> > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > index af92cc57e7d8..b47417b5172f 100644
> > --- a/fs/xfs/libxfs/xfs_attr.h
> > +++ b/fs/xfs/libxfs/xfs_attr.h
> > @@ -544,6 +544,7 @@ int xfs_inode_hasattr(struct xfs_inode *ip);
> >  bool xfs_attr_is_leaf(struct xfs_inode *ip);
> >  int xfs_attr_get_ilocked(struct xfs_da_args *args);
> >  int xfs_attr_get(struct xfs_da_args *args);
> > +int xfs_attr_defer_add(struct xfs_da_args *args);
> >  int xfs_attr_set(struct xfs_da_args *args);
> >  int xfs_attr_set_iter(struct xfs_attr_intent *attr);
> >  int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
> > @@ -552,7 +553,8 @@ bool xfs_attr_namecheck(struct xfs_mount *mp, const void *name, size_t length,
> >  int xfs_attr_calc_size(struct xfs_da_args *args, int *local);
> >  void xfs_init_attr_trans(struct xfs_da_args *args, struct xfs_trans_res *tres,
> >  			 unsigned int *total);
> > -
> > +int xfs_attr_intent_init(struct xfs_da_args *args, unsigned int op_flags,
> > +			 struct xfs_attr_intent  **attr);
> >  /*
> >   * Check to see if the attr should be upgraded from non-existent or shortform to
> >   * single-leaf-block attribute list.
> > diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
> > new file mode 100644
> > index 000000000000..4ab531c77d7d
> > --- /dev/null
> > +++ b/fs/xfs/libxfs/xfs_parent.c
> > @@ -0,0 +1,134 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (c) 2022 Oracle, Inc.
> > + * All rights reserved.
> > + */
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "xfs_format.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_bmap_btree.h"
> > +#include "xfs_inode.h"
> > +#include "xfs_error.h"
> > +#include "xfs_trace.h"
> > +#include "xfs_trans.h"
> > +#include "xfs_da_btree.h"
> > +#include "xfs_attr.h"
> > +#include "xfs_da_btree.h"
> > +#include "xfs_attr_sf.h"
> > +#include "xfs_bmap.h"
> > +#include "xfs_defer.h"
> > +#include "xfs_log.h"
> > +#include "xfs_xattr.h"
> > +#include "xfs_parent.h"
> > +
> > +/*
> > + * Parent pointer attribute handling.
> > + *
> > + * Because the attribute value is a filename component, it will never be longer
> > + * than 255 bytes. This means the attribute will always be a local format
> > + * attribute as it is xfs_attr_leaf_entsize_local_max() for v5 filesystems will
> > + * always be larger than this (max is 75% of block size).
> > + *
> > + * Creating a new parent attribute will always create a new attribute - there
> > + * should never, ever be an existing attribute in the tree for a new inode.
> > + * ENOSPC behavior is problematic - creating the inode without the parent
> > + * pointer is effectively a corruption, so we allow parent attribute creation
> > + * to dip into the reserve block pool to avoid unexpected ENOSPC errors from
> > + * occurring.
> 
> Shouldn't we increase XFS_LINK_SPACE_RES to avoid this?  The reserve
> pool isn't terribly large (8192 blocks) and was really only supposed to
> save us from an ENOSPC shutdown if an unwritten extent conversion in the
> writeback endio handler needs a few more blocks.
> 
Did you maybe mean XFS_IALLOC_SPACE_RES?  That looks like the macro
that's getting used below in xfs_create.
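The comment's claim that a parent pointer attribute is always local format can be checked with quick arithmetic. The sketch below assumes a 3-byte local entry header, 4-byte entry alignment and a strict "less than" comparison against the 75% limit, and takes the 16-byte name length from the be64/be32/be32 fields packed by xfs_init_parent_name_rec(). Those constants and all function names are assumptions for illustration, not values copied from xfs_da_format.h.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * ASSUMED constants (verify against xfs_da_format.h before relying on them):
 * a local attr leaf entry has a 3-byte header (be16 valuelen + u8 namelen)
 * and entries are rounded up to a 4-byte boundary.
 */
#define ASSUMED_LOCAL_HDR	3u
#define ASSUMED_NAME_ALIGN	4u
#define PPTR_NAME_LEN		16u	/* be64 ino + be32 gen + be32 diroffset */
#define MAX_FILENAME_LEN	255u	/* longest directory entry name */

static unsigned int round_up_to(unsigned int x, unsigned int align)
{
	return (x + align - 1) / align * align;
}

/* Hypothetical: size of a local-format leaf entry for these name/value lengths. */
static unsigned int entsize_local(unsigned int namelen, unsigned int valuelen)
{
	return round_up_to(ASSUMED_LOCAL_HDR + namelen + valuelen,
			   ASSUMED_NAME_ALIGN);
}

/* 75% of the block size, as the quoted comment describes. */
static unsigned int entsize_local_max(unsigned int blocksize)
{
	return (blocksize >> 1) + (blocksize >> 2);
}

/* True if even the longest parent pointer stays local (assumed strict compare). */
static bool pptr_always_local(unsigned int blocksize)
{
	return entsize_local(PPTR_NAME_LEN, MAX_FILENAME_LEN) <
	       entsize_local_max(blocksize);
}
```

Under these assumptions the largest parent pointer entry is 276 bytes, comfortably below the 768-byte limit of a 1 KiB block.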

> IOWs, we really ought to ENOSPC at transaction reservation time instead
> of draining the reserve pool.
It looks like we do that in most cases.  I don't actually see rsvd
getting set, other than in xfs_attr_set, which isn't used in parent
pointer updating and should probably be removed.  I suspect it's a
relic of the pre-larp version of the set, so perhaps the comment is
stale and should be removed as well.

> 
> > + */
> > +
> > +
> > +/* Initializes a xfs_parent_name_rec to be stored as an attribute name */
> > +void
> > +xfs_init_parent_name_rec(
> > +	struct xfs_parent_name_rec	*rec,
> > +	struct xfs_inode		*ip,
> > +	uint32_t			p_diroffset)
> > +{
> > +	xfs_ino_t			p_ino = ip->i_ino;
> > +	uint32_t			p_gen = VFS_I(ip)->i_generation;
> > +
> > +	rec->p_ino = cpu_to_be64(p_ino);
> > +	rec->p_gen = cpu_to_be32(p_gen);
> > +	rec->p_diroffset = cpu_to_be32(p_diroffset);
> > +}
> > +
> > +/* Initializes a xfs_parent_name_irec from an xfs_parent_name_rec */
> > +void
> > +xfs_init_parent_name_irec(
> > +	struct xfs_parent_name_irec	*irec,
> > +	struct xfs_parent_name_rec	*rec)
> > +{
> > +	irec->p_ino = be64_to_cpu(rec->p_ino);
> > +	irec->p_gen = be32_to_cpu(rec->p_gen);
> > +	irec->p_diroffset = be32_to_cpu(rec->p_diroffset);
> > +}
> > +
> > +int
> > +xfs_parent_init(
> > +	xfs_mount_t                     *mp,
> > +	xfs_inode_t			*ip,
> > +	struct xfs_name			*target_name,
> > +	struct xfs_parent_defer		**parentp)
> > +{
> > +	struct xfs_parent_defer		*parent;
> > +	int				error;
> > +
> > +	if (!xfs_has_parent(mp))
> > +		return 0;
> > +
> > +	error = xfs_attr_grab_log_assist(mp);
> 
> At some point we might want to consider boosting performance by setting
> XFS_SB_FEAT_INCOMPAT_LOG_XATTRS permanently when parent pointers are
> turned on, since adding the feature requires a synchronous bwrite of the
> primary superblock.
> 
> I /think/ this could be accomplished by setting the feature bit in mkfs
> and teaching xlog_clear_incompat to exit if xfs_has_parent()==true.
> Then we can skip the xfs_attr_grab_log_assist calls.
> 
> But, let's focus on getting this patchset into good enough shape that
> we can be confident that we don't need any ondisk format changes, and
> worry about speed later.
Yep, I will add that to the mkfs side.  I do have the userspace
updates on GitHub, but I don't want to patch-bomb the list with them
just yet because it's just too much to review all at once.  It makes
sense to get the kernel updates out of the way first.

> 
> > +	if (error)
> > +		return error;
> > +
> > +	parent = kzalloc(sizeof(*parent), GFP_KERNEL);
> 
> These objects are going to be created and freed fairly frequently; could
> you please convert these to a kmem cache?  (That can be a cleanup at the
> end.)
Sure, will do
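Darrick's kmem cache request is about recycling fixed-size objects instead of going through the general-purpose allocator each time. In the kernel that would be kmem_cache_create() plus kmem_cache_zalloc()/kmem_cache_free(); the userspace sketch below swaps in a trivial free-list cache just to show the reuse pattern. struct obj_cache and its helpers are hypothetical and not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Minimal userspace analogue of a slab cache for fixed-size objects:
 * freed objects go onto a free list and are reused (zeroed) by the next
 * allocation instead of being returned to malloc.
 */
struct obj_cache {
	size_t size;        /* object size, at least sizeof(void *) */
	void *free_list;    /* singly linked list threaded through freed objects */
	unsigned long hits; /* allocations satisfied from the free list */
};

static void obj_cache_init(struct obj_cache *c, size_t size)
{
	c->size = size < sizeof(void *) ? sizeof(void *) : size;
	c->free_list = NULL;
	c->hits = 0;
}

static void *obj_cache_zalloc(struct obj_cache *c)
{
	void *obj = c->free_list;

	if (obj) {
		/* Pop the head: the next pointer is stored in the object itself. */
		memcpy(&c->free_list, obj, sizeof(void *));
		c->hits++;
	} else {
		obj = malloc(c->size);
		if (!obj)
			return NULL;
	}
	memset(obj, 0, c->size);
	return obj;
}

static void obj_cache_free(struct obj_cache *c, void *obj)
{
	assert(obj != NULL);
	/* Push onto the free list instead of calling free(). */
	memcpy(obj, &c->free_list, sizeof(void *));
	c->free_list = obj;
}

/* Release every cached object back to malloc. */
static void obj_cache_destroy(struct obj_cache *c)
{
	while (c->free_list) {
		void *next;

		memcpy(&next, c->free_list, sizeof(void *));
		free(c->free_list);
		c->free_list = next;
	}
}
```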

> 
> > +	if (!parent)
> > +		return -ENOMEM;
> > +
> > +	/* init parent da_args */
> > +	parent->args.dp = ip;
> > +	parent->args.geo = mp->m_attr_geo;
> > +	parent->args.whichfork = XFS_ATTR_FORK;
> > +	parent->args.attr_filter = XFS_ATTR_PARENT;
> > +	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED;
> > +	parent->args.name = (const uint8_t *)&parent->rec;
> > +	parent->args.namelen = sizeof(struct xfs_parent_name_rec);
> > +
> > +	if (target_name) {
> > +		parent->args.value = (void *)target_name->name;
> > +		parent->args.valuelen = target_name->len;
> > +	}
> > +
> > +	*parentp = parent;
> > +	return 0;
> > +}
> > +
> > +int
> > +xfs_parent_defer_add(
> > +	struct xfs_trans	*tp,
> > +	struct xfs_inode	*ip,
> > +	struct xfs_parent_defer	*parent,
> > +	xfs_dir2_dataptr_t	diroffset)
> > +{
> > +	struct xfs_da_args	*args = &parent->args;
> > +
> > +	xfs_init_parent_name_rec(&parent->rec, ip, diroffset);
> > +	args->trans = tp;
> > +	args->hashval = xfs_da_hashname(args->name, args->namelen);
> > +	return xfs_attr_defer_add(args);
> > +}
> > +
> > +void
> > +xfs_parent_cancel(
> > +	xfs_mount_t		*mp,
> > +	struct xfs_parent_defer *parent)
> > +{
> > +	xlog_drop_incompat_feat(mp->m_log);
> > +	kfree(parent);
> > +}
> > +
> > diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
> > new file mode 100644
> > index 000000000000..21a350b97ed5
> > --- /dev/null
> > +++ b/fs/xfs/libxfs/xfs_parent.h
> > @@ -0,0 +1,34 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (c) 2022 Oracle, Inc.
> > + * All Rights Reserved.
> > + */
> > +#ifndef	__XFS_PARENT_H__
> > +#define	__XFS_PARENT_H__
> > +
> > +/*
> > + * Dynamically allocd structure used to wrap the needed data to pass around
> > + * the defer ops machinery
> > + */
> > +struct xfs_parent_defer {
> > +	struct xfs_parent_name_rec	rec;
> > +	struct xfs_da_args		args;
> > +};
> > +
> > +/*
> > + * Parent pointer attribute prototypes
> > + */
> > +void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
> > +			      struct xfs_inode *ip,
> > +			      uint32_t p_diroffset);
> > +void xfs_init_parent_name_irec(struct xfs_parent_name_irec *irec,
> > +			       struct xfs_parent_name_rec *rec);
> > +int xfs_parent_init(xfs_mount_t *mp, xfs_inode_t *ip,
> > +		    struct xfs_name *target_name,
> > +		    struct xfs_parent_defer **parentp);
> > +int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_inode *ip,
> > +			 struct xfs_parent_defer *parent,
> > +			 xfs_dir2_dataptr_t diroffset);
> > +void xfs_parent_cancel(xfs_mount_t *mp, struct xfs_parent_defer *parent);
> > +
> > +#endif	/* __XFS_PARENT_H__ */
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 09876ba10a42..ef993c3a8963 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -37,6 +37,8 @@
> >  #include "xfs_reflink.h"
> >  #include "xfs_ag.h"
> >  #include "xfs_log_priv.h"
> > +#include "xfs_parent.h"
> > +#include "xfs_xattr.h"
> >  
> >  struct kmem_cache *xfs_inode_cache;
> >  
> > @@ -950,7 +952,7 @@ xfs_bumplink(
> >  int
> >  xfs_create(
> >  	struct user_namespace	*mnt_userns,
> > -	xfs_inode_t		*dp,
> > +	struct xfs_inode	*dp,
> >  	struct xfs_name		*name,
> >  	umode_t			mode,
> >  	dev_t			rdev,
> > @@ -962,7 +964,7 @@ xfs_create(
> >  	struct xfs_inode	*ip = NULL;
> >  	struct xfs_trans	*tp = NULL;
> >  	int			error;
> > -	bool                    unlock_dp_on_error = false;
> > +	bool			unlock_dp_on_error = false;
> >  	prid_t			prid;
> >  	struct xfs_dquot	*udqp = NULL;
> >  	struct xfs_dquot	*gdqp = NULL;
> > @@ -970,6 +972,8 @@ xfs_create(
> >  	struct xfs_trans_res	*tres;
> >  	uint			resblks;
> >  	xfs_ino_t		ino;
> > +	xfs_dir2_dataptr_t	diroffset;
> > +	struct xfs_parent_defer	*parent = NULL;
> >  
> >  	trace_xfs_create(dp, name);
> >  
> > @@ -996,6 +1000,12 @@ xfs_create(
> >  		tres = &M_RES(mp)->tr_create;
> >  	}
> >  
> > +	if (xfs_has_parent(mp)) {
> > +		error = xfs_parent_init(mp, dp, name, &parent);
> > +		if (error)
> > +			goto out_release_dquots;
> > +	}
> > +
> >  	/*
> >  	 * Initially assume that the file does not exist and
> >  	 * reserve the resources for that case.  If that is not
> > @@ -1011,7 +1021,7 @@ xfs_create(
> >  				resblks, &tp);
> >  	}
> >  	if (error)
> > -		goto out_release_dquots;
> > +		goto drop_incompat;
> >  
> >  	xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
> >  	unlock_dp_on_error = true;
> > @@ -1021,6 +1031,7 @@ xfs_create(
> >  	 * entry pointing to them, but a directory also the "." entry
> >  	 * pointing to itself.
> >  	 */
> > +	init_xattrs |= xfs_has_parent(mp);
> >  	error = xfs_dialloc(&tp, dp->i_ino, mode, &ino);
> >  	if (!error)
> >  		error = xfs_init_new_inode(mnt_userns, tp, dp, ino, mode,
> > @@ -1035,11 +1046,12 @@ xfs_create(
> >  	 * the transaction cancel unlocking dp so don't do it explicitly in the
> >  	 * error path.
> >  	 */
> > -	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
> > +	xfs_trans_ijoin(tp, dp, 0);
> >  	unlock_dp_on_error = false;
> >  
> >  	error = xfs_dir_createname(tp, dp, name, ip->i_ino,
> > -				   resblks - XFS_IALLOC_SPACE_RES(mp), NULL);
> > +				   resblks - XFS_IALLOC_SPACE_RES(mp),
> > +				   &diroffset);
> >  	if (error) {
> >  		ASSERT(error != -ENOSPC);
> >  		goto out_trans_cancel;
> > @@ -1055,6 +1067,17 @@ xfs_create(
> >  		xfs_bumplink(tp, dp);
> >  	}
> >  
> > +	/*
> > +	 * If we have parent pointers, we need to add the attribute containing
> > +	 * the parent information now.
> > +	 */
> > +	if (parent) {
> > +		parent->args.dp	= ip;
> > +		error = xfs_parent_defer_add(tp, dp, parent, diroffset);
> > +		if (error)
> > +			goto out_trans_cancel;
> > +	}
> > +
> >  	/*
> >  	 * If this is a synchronous mount, make sure that the
> >  	 * create transaction goes to disk before returning to
> > @@ -1080,6 +1103,7 @@ xfs_create(
> >  
> >  	*ipp = ip;
> >  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +	xfs_iunlock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
> 
> I don't think we need the ILOCK class annotations for unlocks.
> 
> Other than the two things I asked about, this is looking good.
Ok, will remove the XFS_ILOCK_PARENT.  Thanks for the reviews!

Allison
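The new drop_incompat label has to sit between out_trans_cancel and out_release_dquots so that each failure point releases exactly what has been acquired so far, in reverse order. The toy below models that chain with counters; struct toy_state, toy_create() and the fail_at knob are hypothetical, and unlike xfs_create() it also runs the release labels on success so the counters can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical counters standing in for resources xfs_create() holds. */
struct toy_state {
	int dquots_held;   /* like the udqp/gdqp/pdqp references */
	int incompat_held; /* like the log assist taken by xfs_parent_init() */
	int trans_open;    /* like an allocated transaction */
	int commits;       /* successful toy "commits" */
};

/*
 * Acquire in order, unwind in reverse with goto labels.  fail_at selects
 * the failing step (0 = none): 1 = parent init, 2 = transaction
 * allocation, 3 = directory update.
 */
static int toy_create(struct toy_state *s, bool has_parent, int fail_at)
{
	bool parent = false;
	int error = 0;

	s->dquots_held++;

	if (has_parent) {
		if (fail_at == 1) {
			error = -ENOMEM;
			goto out_release_dquots;
		}
		s->incompat_held++;
		parent = true;
	}

	if (fail_at == 2) {
		error = -ENOSPC;
		goto drop_incompat;
	}
	s->trans_open++;

	if (fail_at == 3) {
		error = -EIO;
		goto out_trans_cancel;
	}

	/* Toy commit; falls through so every counter returns to zero. */
	s->commits++;
out_trans_cancel:
	s->trans_open--;	/* commit and cancel both end the transaction */
drop_incompat:
	if (parent)
		s->incompat_held--;
out_release_dquots:
	s->dquots_held--;
	return error;
}
```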
> 
> --D
> 
> >  	return 0;
> >  
> >   out_trans_cancel:
> > @@ -1094,6 +1118,9 @@ xfs_create(
> >  		xfs_finish_inode_setup(ip);
> >  		xfs_irele(ip);
> >  	}
> > + drop_incompat:
> > +	if (parent)
> > +		xfs_parent_cancel(mp, parent);
> >   out_release_dquots:
> >  	xfs_qm_dqrele(udqp);
> >  	xfs_qm_dqrele(gdqp);
> > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > index c325a28b89a8..d9067c5f6bd6 100644
> > --- a/fs/xfs/xfs_xattr.c
> > +++ b/fs/xfs/xfs_xattr.c
> > @@ -27,7 +27,7 @@
> >   * they must release the permission by calling xlog_drop_incompat_feat
> >   * when they're done.
> >   */
> > -static inline int
> > +int
> >  xfs_attr_grab_log_assist(
> >  	struct xfs_mount	*mp)
> >  {
> > diff --git a/fs/xfs/xfs_xattr.h b/fs/xfs/xfs_xattr.h
> > index 2b09133b1b9b..3fd6520a4d69 100644
> > --- a/fs/xfs/xfs_xattr.h
> > +++ b/fs/xfs/xfs_xattr.h
> > @@ -7,6 +7,7 @@
> >  #define __XFS_XATTR_H__
> >  
> >  int xfs_attr_change(struct xfs_da_args *args);
> > +int xfs_attr_grab_log_assist(struct xfs_mount *mp);
> >  
> >  extern const struct xattr_handler *xfs_xattr_handlers[];
> >  
> > -- 
> > 2.25.1
> > 



* Re: [PATCH RESEND v2 12/18] xfs: parent pointer attribute creation
  2022-08-09 18:13     ` Darrick J. Wong
@ 2022-08-10  3:09       ` Alli
  0 siblings, 0 replies; 58+ messages in thread
From: Alli @ 2022-08-10  3:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, 2022-08-09 at 11:13 -0700, Darrick J. Wong wrote:
> On Tue, Aug 09, 2022 at 11:01:01AM -0700, Darrick J. Wong wrote:
> > On Thu, Aug 04, 2022 at 12:40:07PM -0700, Allison Henderson wrote:
> > > +int
> > > +xfs_parent_init(
> > > +	xfs_mount_t                     *mp,
> > > +	xfs_inode_t			*ip,
> 
> More nits: Please don't use struct typedefs here.
Sure, will fix
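Following the typedef nit, here is a typedef-free userspace sketch of what xfs_init_parent_name_rec() and xfs_init_parent_name_irec() do: pack the parent inode number, generation and directory offset into a 16-byte big-endian attribute name and unpack it again. The 16-byte layout is inferred from the cpu_to_be64()/cpu_to_be32() calls in the patch; the struct, macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace mirror of struct xfs_parent_name_irec (CPU-endian). */
struct parent_name_irec {
	uint64_t p_ino;
	uint32_t p_gen;
	uint32_t p_diroffset;
};

/* Name layout: be64 inode number, be32 generation, be32 directory offset. */
#define PARENT_NAME_REC_SIZE 16

/* Store the low nbytes of v in big-endian order. */
static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
	assert(nbytes > 0 && nbytes <= 8);
	for (int i = nbytes - 1; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Load nbytes of big-endian data into a CPU-endian value. */
static uint64_t get_be(const uint8_t *p, int nbytes)
{
	uint64_t v = 0;

	for (int i = 0; i < nbytes; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Analogue of xfs_init_parent_name_rec(): CPU values -> on-disk name bytes. */
static void parent_name_encode(uint8_t rec[PARENT_NAME_REC_SIZE],
			       const struct parent_name_irec *irec)
{
	put_be(rec, irec->p_ino, 8);
	put_be(rec + 8, irec->p_gen, 4);
	put_be(rec + 12, irec->p_diroffset, 4);
}

/* Analogue of xfs_init_parent_name_irec(): on-disk name bytes -> CPU values. */
static void parent_name_decode(struct parent_name_irec *irec,
			       const uint8_t rec[PARENT_NAME_REC_SIZE])
{
	irec->p_ino = get_be(rec, 8);
	irec->p_gen = (uint32_t)get_be(rec + 8, 4);
	irec->p_diroffset = (uint32_t)get_be(rec + 12, 4);
}
```

Byte-wise packing keeps the on-disk order independent of host endianness, which is the same guarantee the __be64/__be32 fields give in the kernel.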

> 
> > > +	struct xfs_name			*target_name,
> > > +	struct xfs_parent_defer		**parentp)
> > > +{
> > > +	struct xfs_parent_defer		*parent;
> > > +	int				error;
> > > +
> > > +	if (!xfs_has_parent(mp))
> > > +		return 0;
> > > +
> > > +	error = xfs_attr_grab_log_assist(mp);
> > 
> > At some point we might want to consider boosting performance by
> > setting
> > XFS_SB_FEAT_INCOMPAT_LOG_XATTRS permanently when parent pointers
> > are
> > turned on, since adding the feature requires a synchronous bwrite
> > of the
> > primary superblock.
> > 
> > I /think/ this could be accomplished by setting the feature bit in
> > mkfs
> > and teaching xlog_clear_incompat to exit if xfs_has_parent()==true.
> > Then we can skip the xfs_attr_grab_log_assist calls.
> > 
> > But, let's focus on getting this patchset into good enough shape
> > that
> > we can be confident that we don't need any ondisk format changes,
> > and
> > worry about speed later.
> > 
> > > +	if (error)
> > > +		return error;
> > > +
> > > +	parent = kzalloc(sizeof(*parent), GFP_KERNEL);
> > 
> > These objects are going to be created and freed fairly frequently;
> > could
> > you please convert these to a kmem cache?  (That can be a cleanup
> > at the
> > end.)
> > 
> > > +	if (!parent)
> > > +		return -ENOMEM;
> > > +
> > > +	/* init parent da_args */
> > > +	parent->args.dp = ip;
> > > +	parent->args.geo = mp->m_attr_geo;
> > > +	parent->args.whichfork = XFS_ATTR_FORK;
> > > +	parent->args.attr_filter = XFS_ATTR_PARENT;
> > > +	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED;
> > > +	parent->args.name = (const uint8_t *)&parent->rec;
> > > +	parent->args.namelen = sizeof(struct xfs_parent_name_rec);
> > > +
> > > +	if (target_name) {
> > > +		parent->args.value = (void *)target_name->name;
> > > +		parent->args.valuelen = target_name->len;
> > > +	}
> > > +
> > > +	*parentp = parent;
> > > +	return 0;
> > > +}
> > > +
> > > +int
> > > +xfs_parent_defer_add(
> > > +	struct xfs_trans	*tp,
> > > +	struct xfs_inode	*ip,
> > > +	struct xfs_parent_defer	*parent,
> > > +	xfs_dir2_dataptr_t	diroffset)
> > > +{
> > > +	struct xfs_da_args	*args = &parent->args;
> > > +
> > > +	xfs_init_parent_name_rec(&parent->rec, ip, diroffset);
> > > +	args->trans = tp;
> > > +	args->hashval = xfs_da_hashname(args->name, args->namelen);
> > > +	return xfs_attr_defer_add(args);
> > > +}
> > > +
> > > +void
> > > +xfs_parent_cancel(
> > > +	xfs_mount_t		*mp,
> > > +	struct xfs_parent_defer *parent)
> > > +{
> > > +	xlog_drop_incompat_feat(mp->m_log);
> > > +	kfree(parent);
> > > +}
> > > +
> > > diff --git a/fs/xfs/libxfs/xfs_parent.h
> > > b/fs/xfs/libxfs/xfs_parent.h
> > > new file mode 100644
> > > index 000000000000..21a350b97ed5
> > > --- /dev/null
> > > +++ b/fs/xfs/libxfs/xfs_parent.h
> > > @@ -0,0 +1,34 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * Copyright (c) 2022 Oracle, Inc.
> > > + * All Rights Reserved.
> > > + */
> > > +#ifndef	__XFS_PARENT_H__
> > > +#define	__XFS_PARENT_H__
> > > +
> > > +/*
> > > + * Dynamically allocd structure used to wrap the needed data to
> > > pass around
> > > + * the defer ops machinery
> > > + */
> > > +struct xfs_parent_defer {
> > > +	struct xfs_parent_name_rec	rec;
> > > +	struct xfs_da_args		args;
> > > +};
> > > +
> > > +/*
> > > + * Parent pointer attribute prototypes
> > > + */
> > > +void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
> > > +			      struct xfs_inode *ip,
> > > +			      uint32_t p_diroffset);
> > > +void xfs_init_parent_name_irec(struct xfs_parent_name_irec
> > > *irec,
> > > +			       struct xfs_parent_name_rec *rec);
> > > +int xfs_parent_init(xfs_mount_t *mp, xfs_inode_t *ip,
> > > +		    struct xfs_name *target_name,
> > > +		    struct xfs_parent_defer **parentp);
> > > +int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_inode
> > > *ip,
> > > +			 struct xfs_parent_defer *parent,
> > > +			 xfs_dir2_dataptr_t diroffset);
> > > +void xfs_parent_cancel(xfs_mount_t *mp, struct xfs_parent_defer
> > > *parent);
> > > +
> > > +#endif	/* __XFS_PARENT_H__ */
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index 09876ba10a42..ef993c3a8963 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -37,6 +37,8 @@
> > >  #include "xfs_reflink.h"
> > >  #include "xfs_ag.h"
> > >  #include "xfs_log_priv.h"
> > > +#include "xfs_parent.h"
> > > +#include "xfs_xattr.h"
> > >  
> > >  struct kmem_cache *xfs_inode_cache;
> > >  
> > > @@ -950,7 +952,7 @@ xfs_bumplink(
> > >  int
> > >  xfs_create(
> > >  	struct user_namespace	*mnt_userns,
> > > -	xfs_inode_t		*dp,
> > > +	struct xfs_inode	*dp,
> > >  	struct xfs_name		*name,
> > >  	umode_t			mode,
> > >  	dev_t			rdev,
> > > @@ -962,7 +964,7 @@ xfs_create(
> > >  	struct xfs_inode	*ip = NULL;
> > >  	struct xfs_trans	*tp = NULL;
> > >  	int			error;
> > > -	bool                    unlock_dp_on_error = false;
> > > +	bool			unlock_dp_on_error = false;
> > >  	prid_t			prid;
> > >  	struct xfs_dquot	*udqp = NULL;
> > >  	struct xfs_dquot	*gdqp = NULL;
> > > @@ -970,6 +972,8 @@ xfs_create(
> > >  	struct xfs_trans_res	*tres;
> > >  	uint			resblks;
> > >  	xfs_ino_t		ino;
> > > +	xfs_dir2_dataptr_t	diroffset;
> > > +	struct xfs_parent_defer	*parent = NULL;
> > >  
> > >  	trace_xfs_create(dp, name);
> > >  
> > > @@ -996,6 +1000,12 @@ xfs_create(
> > >  		tres = &M_RES(mp)->tr_create;
> > >  	}
> > >  
> > > +	if (xfs_has_parent(mp)) {
> > > +		error = xfs_parent_init(mp, dp, name, &parent);
> > > +		if (error)
> > > +			goto out_release_dquots;
> > > +	}
> > > +
> > >  	/*
> > >  	 * Initially assume that the file does not exist and
> > >  	 * reserve the resources for that case.  If that is not
> > > @@ -1011,7 +1021,7 @@ xfs_create(
> > >  				resblks, &tp);
> > >  	}
> > >  	if (error)
> > > -		goto out_release_dquots;
> > > +		goto drop_incompat;
> > >  
> > >  	xfs_ilock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
> > >  	unlock_dp_on_error = true;
> > > @@ -1021,6 +1031,7 @@ xfs_create(
> > >  	 * entry pointing to them, but a directory also the "." entry
> > >  	 * pointing to itself.
> > >  	 */
> > > +	init_xattrs |= xfs_has_parent(mp);
> > >  	error = xfs_dialloc(&tp, dp->i_ino, mode, &ino);
> > >  	if (!error)
> > >  		error = xfs_init_new_inode(mnt_userns, tp, dp, ino, mode,
> > > @@ -1035,11 +1046,12 @@ xfs_create(
> > >  	 * the transaction cancel unlocking dp so don't do it explicitly in the
> > >  	 * error path.
> > >  	 */
> > > -	xfs_trans_ijoin(tp, dp, XFS_ILOCK_EXCL);
> > > +	xfs_trans_ijoin(tp, dp, 0);
> > >  	unlock_dp_on_error = false;
> > >  
> > >  	error = xfs_dir_createname(tp, dp, name, ip->i_ino,
> > > -				   resblks - XFS_IALLOC_SPACE_RES(mp), NULL);
> > > +				   resblks - XFS_IALLOC_SPACE_RES(mp),
> > > +				   &diroffset);
> > >  	if (error) {
> > >  		ASSERT(error != -ENOSPC);
> > >  		goto out_trans_cancel;
> > > @@ -1055,6 +1067,17 @@ xfs_create(
> > >  		xfs_bumplink(tp, dp);
> > >  	}
> > >  
> > > +	/*
> > > +	 * If we have parent pointers, we need to add the attribute containing
> > > +	 * the parent information now.
> > > +	 */
> > > +	if (parent) {
> > > +		parent->args.dp	= ip;
> 
> ...and on second thought, it seems a little odd that you pass @dp to
> xfs_parent_init only to override parent->args.dp here.  Given that this
> doesn't do anything with @parent until here, why not pass NULL to the
> init function above?
Sure. The init helpers are helpful, but they do force some out-of-order
initialization, since the required parameters are not always available
at the same time.  Will update.

Thanks!
Allison

> 
> --D
> 
> > > +		error = xfs_parent_defer_add(tp, dp, parent, diroffset);
> > > +		if (error)
> > > +			goto out_trans_cancel;
> > > +	}
> > > +
> > >  	/*
> > >  	 * If this is a synchronous mount, make sure that the
> > >  	 * create transaction goes to disk before returning to
> > > @@ -1080,6 +1103,7 @@ xfs_create(
> > >  
> > >  	*ipp = ip;
> > >  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > +	xfs_iunlock(dp, XFS_ILOCK_EXCL | XFS_ILOCK_PARENT);
> > 
> > I don't think we need the ILOCK class annotations for unlocks.
> > 
> > Other than the two things I asked about, this is looking good.
> > 
> > --D
> > 
> > >  	return 0;
> > >  
> > >   out_trans_cancel:
> > > @@ -1094,6 +1118,9 @@ xfs_create(
> > >  		xfs_finish_inode_setup(ip);
> > >  		xfs_irele(ip);
> > >  	}
> > > + drop_incompat:
> > > +	if (parent)
> > > +		xfs_parent_cancel(mp, parent);
> > >   out_release_dquots:
> > >  	xfs_qm_dqrele(udqp);
> > >  	xfs_qm_dqrele(gdqp);
> > > diff --git a/fs/xfs/xfs_xattr.c b/fs/xfs/xfs_xattr.c
> > > index c325a28b89a8..d9067c5f6bd6 100644
> > > --- a/fs/xfs/xfs_xattr.c
> > > +++ b/fs/xfs/xfs_xattr.c
> > > @@ -27,7 +27,7 @@
> > >   * they must release the permission by calling xlog_drop_incompat_feat
> > >   * when they're done.
> > >   */
> > > -static inline int
> > > +int
> > >  xfs_attr_grab_log_assist(
> > >  	struct xfs_mount	*mp)
> > >  {
> > > diff --git a/fs/xfs/xfs_xattr.h b/fs/xfs/xfs_xattr.h
> > > index 2b09133b1b9b..3fd6520a4d69 100644
> > > --- a/fs/xfs/xfs_xattr.h
> > > +++ b/fs/xfs/xfs_xattr.h
> > > @@ -7,6 +7,7 @@
> > >  #define __XFS_XATTR_H__
> > >  
> > >  int xfs_attr_change(struct xfs_da_args *args);
> > > +int xfs_attr_grab_log_assist(struct xfs_mount *mp);
> > >  
> > >  extern const struct xattr_handler *xfs_xattr_handlers[];
> > >  
> > > -- 
> > > 2.25.1
> > > 



* Re: [PATCH RESEND v2 13/18] xfs: add parent attributes to link
  2022-08-09 18:43   ` Darrick J. Wong
@ 2022-08-10  3:09     ` Alli
  2022-09-23 20:25       ` Darrick J. Wong
  0 siblings, 1 reply; 58+ messages in thread
From: Alli @ 2022-08-10  3:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, 2022-08-09 at 11:43 -0700, Darrick J. Wong wrote:
> On Thu, Aug 04, 2022 at 12:40:08PM -0700, Allison Henderson wrote:
> > This patch modifies xfs_link to add a parent pointer to the inode.
> > 
> > [bfoster: rebase, use VFS inode fields, fix xfs_bmap_finish() usage]
> > [achender: rebased, changed __uint32_t to xfs_dir2_dataptr_t,
> >            fixed null pointer bugs]
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/xfs_inode.c | 43 ++++++++++++++++++++++++++++++++++---------
> >  1 file changed, 34 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index ef993c3a8963..6e5deb0d42c4 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -1228,14 +1228,16 @@ xfs_create_tmpfile(
> >  
> >  int
> >  xfs_link(
> > -	xfs_inode_t		*tdp,
> > -	xfs_inode_t		*sip,
> > +	struct xfs_inode	*tdp,
> > +	struct xfs_inode	*sip,
> >  	struct xfs_name		*target_name)
> >  {
> > -	xfs_mount_t		*mp = tdp->i_mount;
> > -	xfs_trans_t		*tp;
> > +	struct xfs_mount	*mp = tdp->i_mount;
> > +	struct xfs_trans	*tp;
> >  	int			error, nospace_error = 0;
> >  	int			resblks;
> > +	xfs_dir2_dataptr_t	diroffset;
> > +	struct xfs_parent_defer	*parent = NULL;
> >  
> >  	trace_xfs_link(tdp, target_name);
> >  
> > @@ -1252,11 +1254,17 @@ xfs_link(
> >  	if (error)
> >  		goto std_return;
> >  
> > +	if (xfs_has_parent(mp)) {
> > +		error = xfs_parent_init(mp, sip, target_name, &parent);
> 
> Why does xfs_parent_init check xfs_has_parent if the callers already do
> that?
It was part of the solution outlined in the last review.  It is
redundant, but not an inappropriate sanity check for that function
either. I can remove it from the helper if it bothers folks. 
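If the check does live only in the helper, callers can drop their own xfs_has_parent() test and key everything off the pointer, which also covers the later "if (parent)" gating. A minimal user-space sketch of that convention, assuming the helper returns 0 and leaves the output NULL when the feature is off; every pp_* name is a hypothetical stand-in, not the XFS API:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins; not the real XFS structures. */
struct pp_mount {
	bool	has_parent;	/* models the parent pointer feature bit */
};

struct pp_parent_defer {
	int	placeholder;	/* real code carries the xfs_da_args here */
};

/*
 * The helper owns the feature check: with the feature off it succeeds
 * and leaves *parentp NULL, so callers never test the feature bit.
 */
static int
pp_parent_init(const struct pp_mount *mp, struct pp_parent_defer **parentp)
{
	*parentp = NULL;
	if (!mp->has_parent)
		return 0;

	*parentp = calloc(1, sizeof(**parentp));
	return *parentp ? 0 : -ENOMEM;
}

/* Models a caller such as xfs_link(); sets *queued if a pptr update is queued. */
static int
pp_link(const struct pp_mount *mp, int *queued)
{
	struct pp_parent_defer	*parent;
	int			error;

	*queued = 0;
	error = pp_parent_init(mp, &parent);
	if (error)
		return error;

	/* ... directory entry creation would happen here ... */

	if (parent) {
		*queued = 1;	/* stands in for xfs_parent_defer_add() */
		free(parent);	/* stands in for the release step */
	}
	return 0;
}
```

With this shape the only feature test in the whole path is inside the helper.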


> 
> > +		if (error)
> > +			goto std_return;
> > +	}
> > +
> >  	resblks = XFS_LINK_SPACE_RES(mp, target_name->len);
> 
> Same comment about increasing XFS_LINK_SPACE_RES to accommodate xattr
> expansion as I had for the last patch.
So we do use XFS_LINK_SPACE_RES here, but didn't we update the tr_link
used below in patch 11 to accommodate the extra space?  Maybe I'm not
understanding why we would need both?

> 
> >  	error = xfs_trans_alloc_dir(tdp, &M_RES(mp)->tr_link, sip, &resblks,
> >  			&tp, &nospace_error);
> >  	if (error)
> > -		goto std_return;
> > +		goto drop_incompat;
> >  
> >  	/*
> >  	 * If we are using project inheritance, we only allow hard link
> > @@ -1289,14 +1297,26 @@ xfs_link(
> >  	}
> >  
> >  	error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino,
> > -				   resblks, NULL);
> > +				   resblks, &diroffset);
> >  	if (error)
> > -		goto error_return;
> > +		goto out_defer_cancel;
> >  	xfs_trans_ichgtime(tp, tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
> >  	xfs_trans_log_inode(tp, tdp, XFS_ILOG_CORE);
> >  
> >  	xfs_bumplink(tp, sip);
> >  
> > +	/*
> > +	 * If we have parent pointers, we now need to add the parent record to
> > +	 * the attribute fork of the inode. If this is the initial parent
> > +	 * attribute, we need to create it correctly, otherwise we can just add
> > +	 * the parent to the inode.
> > +	 */
> > +	if (parent) {
> > +		error = xfs_parent_defer_add(tp, tdp, parent, diroffset);
> 
> A followup to the comments I made to the previous patch about
> parent->args.dp --
> 
> Since you're partially initializing the xfs_defer_parent structure
> before you even have the dir offset, why not delay initializing the
> parent and child pointers until the xfs_parent_defer_add step?
> 
> int
> xfs_parent_init(
> 	struct xfs_mount		*mp,
> 	struct xfs_parent_defer		**parentp)
> {
> 	struct xfs_parent_defer		*parent;
> 	int				error;
> 
> 	if (!xfs_has_parent(mp))
> 		return 0;
> 
> 	error = xfs_attr_grab_log_assist(mp);
> 	if (error)
> 		return error;
> 
> 	parent = kzalloc(sizeof(*parent), GFP_KERNEL);
> 	if (!parent)
> 		return -ENOMEM;
> 
> 	/* init parent da_args */
> 	parent->args.geo = mp->m_attr_geo;
> 	parent->args.whichfork = XFS_ATTR_FORK;
> 	parent->args.attr_filter = XFS_ATTR_PARENT;
> 	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED;
> 	parent->args.name = (const uint8_t *)&parent->rec;
> 	parent->args.namelen = sizeof(struct xfs_parent_name_rec);
> 
> 	*parentp = parent;
> 	return 0;
> }
> 
> int
> xfs_parent_defer_add(
> 	struct xfs_trans	*tp,
> 	struct xfs_parent_defer	*parent,
> 	struct xfs_inode	*dp,
> 	struct xfs_name		*parent_name,
> 	xfs_dir2_dataptr_t	parent_offset,
> 	struct xfs_inode	*child)
> {
> 	struct xfs_da_args	*args = &parent->args;
> 
> 	xfs_init_parent_name_rec(&parent->rec, dp, parent_offset);
> 	args->hashval = xfs_da_hashname(args->name, args->namelen);
> 
> 	args->trans = tp;
> 	args->dp = child;
> 	if (parent_name) {
> 		args->value = (void *)parent_name->name;
> 		args->valuelen = parent_name->len;
> 	}
> 	return xfs_attr_defer_add(args);
> }
> 
> And then the callsites become:
> 
> 	/*
> 	 * If we have parent pointers, we now need to add the parent record to
> 	 * the attribute fork of the inode. If this is the initial parent
> 	 * attribute, we need to create it correctly, otherwise we can just add
> 	 * the parent to the inode.
> 	 */
> 	if (parent) {
> 		error = xfs_parent_defer_add(tp, parent, tdp,
> 				target_name, diroffset, sip);
> 		if (error)
> 			goto out_defer_cancel;
> 	}
Sure, I can scoot that part down to the defer_add helper. Thanks for
the reviews!
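A minimal user-space model of the split being suggested: init fills in only what is known before the transaction, and defer_add fills in the directory, child inode, offset and name once the directory operation has run. The pp_* types are hypothetical and simplified; in particular the record is kept in CPU byte order here, whereas the real xfs_parent_name_rec is big-endian. As in the series, the record serves as the attr name and the file name as the attr value:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the on-disk XFS layout. */
struct pp_inode {
	uint64_t	ino;
	uint32_t	gen;
};

struct pp_name_rec {		/* attr name: parent ino, gen, dir offset */
	uint64_t	p_ino;
	uint32_t	p_gen;
	uint32_t	p_diroffset;
};

struct pp_args {
	const struct pp_inode	*dp;	/* inode whose attr fork is modified */
	const uint8_t		*name;
	int			namelen;
	const void		*value;	/* attr value: the entry's file name */
	int			valuelen;
};

struct pp_parent_defer {
	struct pp_name_rec	rec;
	struct pp_args		args;
};

/* Phase 1: allocate and fill only the fields known before the transaction. */
static int
pp_parent_init(struct pp_parent_defer **parentp)
{
	struct pp_parent_defer	*parent = calloc(1, sizeof(*parent));

	if (!parent)
		return -ENOMEM;
	parent->args.name = (const uint8_t *)&parent->rec;
	parent->args.namelen = sizeof(parent->rec);
	*parentp = parent;
	return 0;
}

/* Phase 2: fill in everything that depends on the directory operation. */
static void
pp_parent_defer_add(struct pp_parent_defer *parent,
		    const struct pp_inode *dp, const char *fname,
		    uint32_t diroffset, const struct pp_inode *child)
{
	parent->rec.p_ino = dp->ino;
	parent->rec.p_gen = dp->gen;
	parent->rec.p_diroffset = diroffset;
	parent->args.dp = child;
	parent->args.value = fname;
	parent->args.valuelen = (int)strlen(fname);
	/* real code would now queue the deferred xattr set */
}
```

Nothing in phase 1 refers to an inode, so there is no field that phase 2 has to overwrite.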

Allison
> 
> Aside from the API suggestions, the rest looks good to me.
> 
> --D
> 
> > +		if (error)
> > +			goto out_defer_cancel;
> > +	}
> > +
> >  	/*
> >  	 * If this is a synchronous mount, make sure that the
> >  	 * link transaction goes to disk before returning to
> > @@ -1310,11 +1330,16 @@ xfs_link(
> >  	xfs_iunlock(sip, XFS_ILOCK_EXCL);
> >  	return error;
> >  
> > - error_return:
> > +out_defer_cancel:
> > +	xfs_defer_cancel(tp);
> > +error_return:
> >  	xfs_trans_cancel(tp);
> >  	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
> >  	xfs_iunlock(sip, XFS_ILOCK_EXCL);
> > - std_return:
> > +drop_incompat:
> > +	if (parent)
> > +		xfs_parent_cancel(mp, parent);
> > +std_return:
> >  	if (error == -ENOSPC && nospace_error)
> >  		error = nospace_error;
> >  	return error;
> > -- 
> > 2.25.1
> > 



* Re: [PATCH RESEND v2 14/18] xfs: remove parent pointers in unlink
  2022-08-09 18:45   ` Darrick J. Wong
@ 2022-08-10  3:09     ` Alli
  0 siblings, 0 replies; 58+ messages in thread
From: Alli @ 2022-08-10  3:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, 2022-08-09 at 11:45 -0700, Darrick J. Wong wrote:
> On Thu, Aug 04, 2022 at 12:40:09PM -0700, Allison Henderson wrote:
> > This patch removes the parent pointer attribute during unlink
> > 
> > [bfoster: rebase, use VFS inode generation]
> > [achender: rebased, changed __uint32_t to xfs_dir2_dataptr_t
> >            implemented xfs_attr_remove_parent]
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/libxfs/xfs_attr.c   |  2 +-
> >  fs/xfs/libxfs/xfs_attr.h   |  1 +
> >  fs/xfs/libxfs/xfs_parent.c | 15 +++++++++++++++
> >  fs/xfs/libxfs/xfs_parent.h |  3 +++
> >  fs/xfs/xfs_inode.c         | 29 +++++++++++++++++++++++------
> >  5 files changed, 43 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.c b/fs/xfs/libxfs/xfs_attr.c
> > index 0a458ea7051f..77513ff7e1ec 100644
> > --- a/fs/xfs/libxfs/xfs_attr.c
> > +++ b/fs/xfs/libxfs/xfs_attr.c
> > @@ -936,7 +936,7 @@ xfs_attr_defer_replace(
> >  }
> >  
> >  /* Removes an attribute for an inode as a deferred operation */
> > -static int
> > +int
> >  xfs_attr_defer_remove(
> >  	struct xfs_da_args	*args)
> >  {
> > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > index b47417b5172f..2e11e5e83941 100644
> > --- a/fs/xfs/libxfs/xfs_attr.h
> > +++ b/fs/xfs/libxfs/xfs_attr.h
> > @@ -545,6 +545,7 @@ bool xfs_attr_is_leaf(struct xfs_inode *ip);
> >  int xfs_attr_get_ilocked(struct xfs_da_args *args);
> >  int xfs_attr_get(struct xfs_da_args *args);
> >  int xfs_attr_defer_add(struct xfs_da_args *args);
> > +int xfs_attr_defer_remove(struct xfs_da_args *args);
> >  int xfs_attr_set(struct xfs_da_args *args);
> >  int xfs_attr_set_iter(struct xfs_attr_intent *attr);
> >  int xfs_attr_remove_iter(struct xfs_attr_intent *attr);
> > diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
> > index 4ab531c77d7d..03f03f731d02 100644
> > --- a/fs/xfs/libxfs/xfs_parent.c
> > +++ b/fs/xfs/libxfs/xfs_parent.c
> > @@ -123,6 +123,21 @@ xfs_parent_defer_add(
> >  	return xfs_attr_defer_add(args);
> >  }
> >  
> > +int
> > +xfs_parent_defer_remove(
> > +	struct xfs_trans	*tp,
> > +	struct xfs_inode	*ip,
> > +	struct xfs_parent_defer	*parent,
> > +	xfs_dir2_dataptr_t	diroffset)
> 
> Same suggestion about setting args->dp here instead of in
> xfs_parent_init.
Sure, that should be fine.

> 
> > +{
> > +	struct xfs_da_args	*args = &parent->args;
> > +
> > +	xfs_init_parent_name_rec(&parent->rec, ip, diroffset);
> > +	args->trans = tp;
> > +	args->hashval = xfs_da_hashname(args->name, args->namelen);
> > +	return xfs_attr_defer_remove(args);
> > +}
> > +
> >  void
> >  xfs_parent_cancel(
> >  	xfs_mount_t		*mp,
> > diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
> > index 21a350b97ed5..67948f4b3834 100644
> > --- a/fs/xfs/libxfs/xfs_parent.h
> > +++ b/fs/xfs/libxfs/xfs_parent.h
> > @@ -29,6 +29,9 @@ int xfs_parent_init(xfs_mount_t *mp, xfs_inode_t *ip,
> >  int xfs_parent_defer_add(struct xfs_trans *tp, struct xfs_inode *ip,
> >  			 struct xfs_parent_defer *parent,
> >  			 xfs_dir2_dataptr_t diroffset);
> > +int xfs_parent_defer_remove(struct xfs_trans *tp, struct xfs_inode *ip,
> > +			    struct xfs_parent_defer *parent,
> > +			    xfs_dir2_dataptr_t diroffset);
> >  void xfs_parent_cancel(xfs_mount_t *mp, struct xfs_parent_defer *parent);
> >  
> >  #endif	/* __XFS_PARENT_H__ */
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 6e5deb0d42c4..69bb67f2a252 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -2464,16 +2464,18 @@ xfs_iunpin_wait(
> >   */
> >  int
> >  xfs_remove(
> > -	xfs_inode_t             *dp,
> > +	struct xfs_inode	*dp,
> >  	struct xfs_name		*name,
> > -	xfs_inode_t		*ip)
> > +	struct xfs_inode	*ip)
> >  {
> > -	xfs_mount_t		*mp = dp->i_mount;
> > -	xfs_trans_t             *tp = NULL;
> > +	struct xfs_mount	*mp = dp->i_mount;
> > +	struct xfs_trans	*tp = NULL;
> >  	int			is_dir = S_ISDIR(VFS_I(ip)->i_mode);
> >  	int			dontcare;
> >  	int                     error = 0;
> >  	uint			resblks;
> > +	xfs_dir2_dataptr_t	dir_offset;
> > +	struct xfs_parent_defer	*parent = NULL;
> >  
> >  	trace_xfs_remove(dp, name);
> >  
> > @@ -2488,6 +2490,12 @@ xfs_remove(
> >  	if (error)
> >  		goto std_return;
> >  
> > +	if (xfs_has_parent(mp)) {
> > +		error = xfs_parent_init(mp, ip, NULL, &parent);
> > +		if (error)
> > +			goto std_return;
> > +	}
> > +
> >  	/*
> >  	 * We try to get the real space reservation first, allowing for
> >  	 * directory btree deletion(s) implying possible bmap insert(s).  If we
> > @@ -2504,7 +2512,7 @@ xfs_remove(
> >  			&tp, &dontcare);
> >  	if (error) {
> >  		ASSERT(error != -ENOSPC);
> > -		goto std_return;
> > +		goto drop_incompat;
> >  	}
> >  
> >  	/*
> > @@ -2558,12 +2566,18 @@ xfs_remove(
> >  	if (error)
> >  		goto out_trans_cancel;
> >  
> > -	error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks, NULL);
> > +	error = xfs_dir_removename(tp, dp, name, ip->i_ino, resblks, &dir_offset);
> >  	if (error) {
> >  		ASSERT(error != -ENOENT);
> >  		goto out_trans_cancel;
> >  	}
> >  
> > +	if (xfs_has_parent(mp)) {
> > +		error = xfs_parent_defer_remove(tp, dp, parent, dir_offset);
> 
> If it's safe to gate xfs_parent_cancel on "if (parent)" then can we
> avoid the atomic bit access by doing that here too?
Oh, sure, I likely just forgot to update that conditional.

> 
> Otherwise looks good here.
Thank you!

Allison

> 
> --D
> 
> > +		if (error)
> > +			goto out_trans_cancel;
> > +	}
> > +
> >  	/*
> >  	 * If this is a synchronous mount, make sure that the
> >  	 * remove transaction goes to disk before returning to
> > @@ -2588,6 +2602,9 @@ xfs_remove(
> >   out_unlock:
> >  	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> >  	xfs_iunlock(dp, XFS_ILOCK_EXCL);
> > + drop_incompat:
> > +	if (parent)
> > +		xfs_parent_cancel(mp, parent);
> >   std_return:
> >  	return error;
> >  }
> > -- 
> > 2.25.1
> > 



* Re: [PATCH RESEND v2 15/18] xfs: Add parent pointers to rename
  2022-08-09 18:49   ` Darrick J. Wong
@ 2022-08-10  3:09     ` Alli
  0 siblings, 0 replies; 58+ messages in thread
From: Alli @ 2022-08-10  3:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, 2022-08-09 at 11:49 -0700, Darrick J. Wong wrote:
> On Thu, Aug 04, 2022 at 12:40:10PM -0700, Allison Henderson wrote:
> > This patch removes the old parent pointer attribute during the rename
> > operation, and re-adds the updated parent pointer.  In the case of
> > xfs_cross_rename, we modify the routine not to roll the transaction just
> > yet.  We will do this after the parent pointer is added in the calling
> > xfs_rename function.
> > 
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/xfs_inode.c | 128 +++++++++++++++++++++++++++++++++------------
> >  1 file changed, 94 insertions(+), 34 deletions(-)
> > 
> > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > index 69bb67f2a252..8a81b78b6dd7 100644
> > --- a/fs/xfs/xfs_inode.c
> > +++ b/fs/xfs/xfs_inode.c
> > @@ -2776,7 +2776,7 @@ xfs_cross_rename(
> >  	}
> >  	xfs_trans_ichgtime(tp, dp1, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
> >  	xfs_trans_log_inode(tp, dp1, XFS_ILOG_CORE);
> > -	return xfs_finish_rename(tp);
> > +	return 0;
> >  
> >  out_trans_abort:
> >  	xfs_trans_cancel(tp);
> > @@ -2834,26 +2834,31 @@ xfs_rename_alloc_whiteout(
> >   */
> >  int
> >  xfs_rename(
> > -	struct user_namespace	*mnt_userns,
> > -	struct xfs_inode	*src_dp,
> > -	struct xfs_name		*src_name,
> > -	struct xfs_inode	*src_ip,
> > -	struct xfs_inode	*target_dp,
> > -	struct xfs_name		*target_name,
> > -	struct xfs_inode	*target_ip,
> > -	unsigned int		flags)
> > +	struct user_namespace		*mnt_userns,
> > +	struct xfs_inode		*src_dp,
> > +	struct xfs_name			*src_name,
> > +	struct xfs_inode		*src_ip,
> > +	struct xfs_inode		*target_dp,
> > +	struct xfs_name			*target_name,
> > +	struct xfs_inode		*target_ip,
> > +	unsigned int			flags)
> >  {
> > -	struct xfs_mount	*mp = src_dp->i_mount;
> > -	struct xfs_trans	*tp;
> > -	struct xfs_inode	*wip = NULL;		/* whiteout inode */
> > -	struct xfs_inode	*inodes[__XFS_SORT_INODES];
> > -	int			i;
> > -	int			num_inodes = __XFS_SORT_INODES;
> > -	bool			new_parent = (src_dp != target_dp);
> > -	bool			src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
> > -	int			spaceres;
> > -	bool			retried = false;
> > -	int			error, nospace_error = 0;
> > +	struct xfs_mount		*mp = src_dp->i_mount;
> > +	struct xfs_trans		*tp;
> > +	struct xfs_inode		*wip = NULL;		/* whiteout inode */
> > +	struct xfs_inode		*inodes[__XFS_SORT_INODES];
> > +	int				i;
> > +	int				num_inodes = __XFS_SORT_INODES;
> > +	bool				new_parent = (src_dp != target_dp);
> > +	bool				src_is_directory = S_ISDIR(VFS_I(src_ip)->i_mode);
> > +	int				spaceres;
> > +	bool				retried = false;
> > +	int				error, nospace_error = 0;
> > +	xfs_dir2_dataptr_t		new_diroffset;
> > +	xfs_dir2_dataptr_t		old_diroffset;
> > +	struct xfs_parent_defer		*old_parent_ptr = NULL;
> > +	struct xfs_parent_defer		*new_parent_ptr = NULL;
> > +	struct xfs_parent_defer		*target_parent_ptr = NULL;
> >  
> >  	trace_xfs_rename(src_dp, target_dp, src_name, target_name);
> >  
> > @@ -2877,6 +2882,15 @@ xfs_rename(
> >  
> >  	xfs_sort_for_rename(src_dp, target_dp, src_ip, target_ip, wip,
> >  				inodes, &num_inodes);
> > +	if (xfs_has_parent(mp)) {
> > +		error = xfs_parent_init(mp, src_ip, NULL, &old_parent_ptr);
> > +		if (error)
> > +			goto out_release_wip;
> > +		error = xfs_parent_init(mp, src_ip, target_name,
> > +					&new_parent_ptr);
> > +		if (error)
> > +			goto out_release_wip;
> > +	}
> >  
> >  retry:
> >  	nospace_error = 0;
> > @@ -2889,7 +2903,7 @@ xfs_rename(
> >  				&tp);
> >  	}
> >  	if (error)
> > -		goto out_release_wip;
> > +		goto drop_incompat;
> >  
> >  	/*
> >  	 * Attach the dquots to the inodes
> > @@ -2911,14 +2925,14 @@ xfs_rename(
> >  	 * we can rely on either trans_commit or trans_cancel to unlock
> >  	 * them.
> >  	 */
> > -	xfs_trans_ijoin(tp, src_dp, XFS_ILOCK_EXCL);
> > +	xfs_trans_ijoin(tp, src_dp, 0);
> >  	if (new_parent)
> > -		xfs_trans_ijoin(tp, target_dp, XFS_ILOCK_EXCL);
> > -	xfs_trans_ijoin(tp, src_ip, XFS_ILOCK_EXCL);
> > +		xfs_trans_ijoin(tp, target_dp, 0);
> > +	xfs_trans_ijoin(tp, src_ip, 0);
> >  	if (target_ip)
> > -		xfs_trans_ijoin(tp, target_ip, XFS_ILOCK_EXCL);
> > +		xfs_trans_ijoin(tp, target_ip, 0);
> >  	if (wip)
> > -		xfs_trans_ijoin(tp, wip, XFS_ILOCK_EXCL);
> > +		xfs_trans_ijoin(tp, wip, 0);
> >  
> >  	/*
> >  	 * If we are using project inheritance, we only allow renames
> > @@ -2928,15 +2942,16 @@ xfs_rename(
> >  	if (unlikely((target_dp->i_diflags & XFS_DIFLAG_PROJINHERIT) &&
> >  		     target_dp->i_projid != src_ip->i_projid)) {
> >  		error = -EXDEV;
> > -		goto out_trans_cancel;
> > +		goto out_unlock;
> >  	}
> >  
> >  	/* RENAME_EXCHANGE is unique from here on. */
> > -	if (flags & RENAME_EXCHANGE)
> > -		return xfs_cross_rename(tp, src_dp, src_name, src_ip,
> > +	if (flags & RENAME_EXCHANGE) {
> > +		error = xfs_cross_rename(tp, src_dp, src_name, src_ip,
> >  					target_dp, target_name, target_ip,
> >  					spaceres);
> > -
> > +		goto out_pptr;
> > +	}
> >  	/*
> >  	 * Try to reserve quota to handle an expansion of the target directory.
> >  	 * We'll allow the rename to continue in reservationless mode if we hit
> > @@ -3052,7 +3067,7 @@ xfs_rename(
> >  		 * to account for the ".." reference from the new entry.
> >  		 */
> >  		error = xfs_dir_createname(tp, target_dp, target_name,
> > -					   src_ip->i_ino, spaceres, NULL);
> > +					   src_ip->i_ino, spaceres, &new_diroffset);
> >  		if (error)
> >  			goto out_trans_cancel;
> >  
> > @@ -3073,10 +3088,14 @@ xfs_rename(
> >  		 * name at the destination directory, remove it first.
> >  		 */
> >  		error = xfs_dir_replace(tp, target_dp, target_name,
> > -					src_ip->i_ino, spaceres, NULL);
> > +					src_ip->i_ino, spaceres, &new_diroffset);
> >  		if (error)
> >  			goto out_trans_cancel;
> >  
> > +		if (xfs_has_parent(mp))
> > +			error = xfs_parent_init(mp, target_ip, NULL,
> > +						&target_parent_ptr);
> > +
> >  		xfs_trans_ichgtime(tp, target_dp,
> >  					XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
> >  
> > @@ -3146,26 +3165,67 @@ xfs_rename(
> >  	 */
> >  	if (wip)
> >  		error = xfs_dir_replace(tp, src_dp, src_name, wip->i_ino,
> > -					spaceres, NULL);
> > +					spaceres, &old_diroffset);
> >  	else
> >  		error = xfs_dir_removename(tp, src_dp, src_name, src_ip->i_ino,
> > -					   spaceres, NULL);
> > +					   spaceres, &old_diroffset);
> >  
> >  	if (error)
> >  		goto out_trans_cancel;
> >  
> > +out_pptr:
> > +	if (new_parent_ptr) {
> > +		error = xfs_parent_defer_add(tp, target_dp, new_parent_ptr,
> > +					     new_diroffset);
> > +		if (error)
> > +			goto out_trans_cancel;
> > +	}
> > +
> > +	if (old_parent_ptr) {
> > +		error = xfs_parent_defer_remove(tp, src_dp, old_parent_ptr,
> > +						old_diroffset);
> > +		if (error)
> > +			goto out_trans_cancel;
> > +	}
> > +
> > +	if (target_parent_ptr) {
> > +		error = xfs_parent_defer_remove(tp, target_dp,
> > +						target_parent_ptr,
> > +						new_diroffset);
> > +		if (error)
> > +			goto out_trans_cancel;
> > +	}
> > +
> >  	xfs_trans_ichgtime(tp, src_dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
> >  	xfs_trans_log_inode(tp, src_dp, XFS_ILOG_CORE);
> >  	if (new_parent)
> >  		xfs_trans_log_inode(tp, target_dp, XFS_ILOG_CORE);
> >  
> >  	error = xfs_finish_rename(tp);
> > +
> > +out_unlock:
> >  	if (wip)
> >  		xfs_irele(wip);
> > +	if (wip)
> > +		xfs_iunlock(wip, XFS_ILOCK_EXCL);
> > +	if (target_ip)
> > +		xfs_iunlock(target_ip, XFS_ILOCK_EXCL);
> > +	xfs_iunlock(src_ip, XFS_ILOCK_EXCL);
> > +	if (new_parent)
> > +		xfs_iunlock(target_dp, XFS_ILOCK_EXCL);
> > +	xfs_iunlock(src_dp, XFS_ILOCK_EXCL);
> 
> Sorry to be fussy, but could you separate the ILOCK unlocking changes
> (and maybe the variable indentation part too) into a separate prep
> patch, please?
Sure, that should be fine.

> 
> Also, who frees the xfs_parent_defer objects?
> 
It's the xfs_parent_cancel() calls below.
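Whether that answers the question depends on every exit path reaching those calls; as quoted, the success and -EXDEV paths return from out_unlock before drop_incompat is reached. A minimal user-space sketch of a single-exit layout, with hypothetical pp_* names and calloc/free standing in for the kernel allocation and xfs_parent_cancel(), where the objects are released on success as well as on failure:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Counts live allocations so a test can check that nothing leaks. */
static int pp_live;

struct pp_parent_defer {
	int	placeholder;
};

static struct pp_parent_defer *
pp_alloc(void)
{
	struct pp_parent_defer	*p = calloc(1, sizeof(*p));

	if (p)
		pp_live++;
	return p;
}

/* Stands in for xfs_parent_cancel(); tolerates NULL like the callers do. */
static void
pp_release(struct pp_parent_defer *p)
{
	if (!p)
		return;
	free(p);
	pp_live--;
}

/*
 * Hypothetical rename skeleton: fail_step selects where to inject an
 * error (0 means succeed).  Success and failure share one release label.
 */
static int
pp_rename(int fail_step)
{
	struct pp_parent_defer	*old_pp = NULL, *new_pp = NULL;
	int			error = 0;

	old_pp = pp_alloc();
	new_pp = pp_alloc();
	if (!old_pp || !new_pp) {
		error = -ENOMEM;
		goto out_release;
	}

	if (fail_step == 1) {		/* e.g. transaction allocation fails */
		error = -ENOSPC;
		goto out_release;
	}

	if (fail_step == 2) {		/* e.g. directory update fails */
		error = -EIO;
		goto out_release;	/* real code would cancel the trans first */
	}

	/* ... queue the deferred parent pointer updates and commit ... */

out_release:
	pp_release(new_pp);
	pp_release(old_pp);
	return error;
}
```

The point of the layout is that the release calls sit after the commit, not only on the cancel path.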

> --D
> 
> > +
> >  	return error;
> >  
> >  out_trans_cancel:
> >  	xfs_trans_cancel(tp);
> > +drop_incompat:
> > +	if (new_parent_ptr)
> > +		xfs_parent_cancel(mp, new_parent_ptr);
> > +	if (old_parent_ptr)
> > +		xfs_parent_cancel(mp, old_parent_ptr);
> > +	if (target_parent_ptr)
> > +		xfs_parent_cancel(mp, target_parent_ptr);
> >  out_release_wip:
> >  	if (wip)
> >  		xfs_irele(wip);
> > -- 
> > 2.25.1
> > 



* Re: [PATCH RESEND v2 18/18] xfs: Add parent pointer ioctl
  2022-08-09 19:26   ` Darrick J. Wong
@ 2022-08-10  3:09     ` Alli
  2022-09-24  0:01       ` Darrick J. Wong
  0 siblings, 1 reply; 58+ messages in thread
From: Alli @ 2022-08-10  3:09 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs

On Tue, 2022-08-09 at 12:26 -0700, Darrick J. Wong wrote:
> On Thu, Aug 04, 2022 at 12:40:13PM -0700, Allison Henderson wrote:
> > This patch adds a new file ioctl to retrieve the parent pointer of a
> > given inode
> > 
> > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > ---
> >  fs/xfs/Makefile            |   1 +
> >  fs/xfs/libxfs/xfs_fs.h     |  57 ++++++++++++++++
> >  fs/xfs/libxfs/xfs_parent.c |  10 +++
> >  fs/xfs/libxfs/xfs_parent.h |   2 +
> >  fs/xfs/xfs_ioctl.c         |  95 +++++++++++++++++++++++++-
> >  fs/xfs/xfs_ondisk.h        |   4 ++
> >  fs/xfs/xfs_parent_utils.c  | 134 +++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_parent_utils.h  |  22 ++++++
> >  8 files changed, 323 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > index caeea8d968ba..998658e40ab4 100644
> > --- a/fs/xfs/Makefile
> > +++ b/fs/xfs/Makefile
> > @@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
> >  				   xfs_mount.o \
> >  				   xfs_mru_cache.o \
> >  				   xfs_pwork.o \
> > +				   xfs_parent_utils.o \
> >  				   xfs_reflink.o \
> >  				   xfs_stats.o \
> >  				   xfs_super.o \
> > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > index b0b4d7a3aa15..ba6ec82a0272 100644
> > --- a/fs/xfs/libxfs/xfs_fs.h
> > +++ b/fs/xfs/libxfs/xfs_fs.h
> > @@ -574,6 +574,7 @@ typedef struct xfs_fsop_handlereq {
> >  #define XFS_IOC_ATTR_SECURE	0x0008	/* use attrs in security namespace */
> >  #define XFS_IOC_ATTR_CREATE	0x0010	/* fail if attr already exists */
> >  #define XFS_IOC_ATTR_REPLACE	0x0020	/* fail if attr does not exist */
> > +#define XFS_IOC_ATTR_PARENT	0x0040  /* use attrs in parent namespace */
> 
> This is the userspace API header, so I wonder -- should we allow
> XFS_IOC_ATTRLIST_BY_HANDLE and XFS_IOC_ATTRMULTI_BY_HANDLE to access
> parent pointers?
Well, the ioctl is how the test cases get the pptrs back out in order to
verify parent pointers are working.  So we need to keep at least that,
but then I think it makes worrying about other forms of access feel
sort of silly, since we're not really hiding anything.  They would have
to pass in the parent filter flag, which wasn't allowed until now, so
it's not like having pptrs appear in the list when asked for is
inappropriate.

> 
> I think it's *definitely* incorrect to let ATTR_OP_REMOVE or ATTR_OP_SET
> (attrmulti subcommands) mess with parent pointers.
Ok, I can see if I can add some sanity checking there.
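A minimal sketch of that sanity check for a single ATTRMULTI subcommand, assuming the XFS_IOC_ATTR_PARENT value from the hunk above and assuming the ATTR_OP_GET/SET/REMOVE opcodes are 1, 2 and 3 (check xfs_fs.h). The function name is hypothetical, and whether GET should also be refused is still the open question in this thread:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag value taken from the XFS_IOC_ATTR_PARENT hunk above. */
#define PP_IOC_ATTR_PARENT	0x0040

/* Assumed to mirror ATTR_OP_GET/SET/REMOVE; verify against xfs_fs.h. */
#define PP_ATTR_OP_GET		1
#define PP_ATTR_OP_SET		2
#define PP_ATTR_OP_REMOVE	3

/*
 * Hypothetical per-subcommand check: parent pointer attributes may be
 * read through ATTRMULTI but never set or removed from userspace.
 */
static int
pp_check_attrmulti_op(uint32_t opcode, uint32_t flags)
{
	if (!(flags & PP_IOC_ATTR_PARENT))
		return 0;		/* ordinary namespaces are unaffected */
	if (opcode == PP_ATTR_OP_GET)
		return 0;		/* reads stay allowed in this sketch */
	return -EINVAL;			/* SET, REMOVE and anything unknown */
}
```

Refusing unknown opcodes as well keeps any future subcommand from reaching the parent namespace by default.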

> 
> I don't think attrlist or ATTR_OP_GET should be touching them either,
> particularly since you're defining a new ioctl to extract *only* the
> parent pointers.
> 
> If there wasn't XFS_IOC_GETPPOINTER then perhaps it would be ok to allow
> reads via ATTRLIST/ATTRMULTI.  But even then, I don't think we want
> things like xfsdump to think that it has to preserve those attributes
> since xfsrestore will reconstruct the directory tree (and hence the
> pptrs) for us.
Hrmm, I'm not sure I follow this part.  The point of pptrs is to
reconstruct the tree, so wouldn't we want them preserved?

> 
> >  
> >  typedef struct xfs_attrlist_cursor {
> >  	__u32		opaque[4];
> > @@ -752,6 +753,61 @@ struct xfs_scrub_metadata {
> >  				 XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED)
> >  #define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
> >  
> > +#define XFS_PPTR_MAXNAMELEN				256
> > +
> > +/* return parents of the handle, not the open fd */
> > +#define XFS_PPTR_IFLAG_HANDLE  (1U << 0)
> > +
> > +/* target was the root directory */
> > +#define XFS_PPTR_OFLAG_ROOT    (1U << 1)
> > +
> > +/* Cursor is done iterating pptrs */
> > +#define XFS_PPTR_OFLAG_DONE    (1U << 2)
> > +
> > +/* Get an inode parent pointer through ioctl */
> > +struct xfs_parent_ptr {
> > +	__u64		xpp_ino;			/* Inode */
> > +	__u32		xpp_gen;			/* Inode generation */
> > +	__u32		xpp_diroffset;			/* Directory offset */
> > +	__u32		xpp_namelen;			/* File name length */
> > +	__u32		xpp_pad;
> > +	__u8		xpp_name[XFS_PPTR_MAXNAMELEN];	/* File name */
> 
> Since xpp_name is a fixed-length array that is long enough to ensure
> that there's a null at the end of the name, we don't need
> xpp_namelen.
> 
> I wonder if xpp_namelen and xpp_pad should simply turn into a u64 field
> that's defined zero for future expansion?
Sure, I'll see if I can remove it and add a reserved field
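A minimal sketch of that suggestion, assuming (as the review says) that any name shorter than XFS_PPTR_MAXNAMELEN is NUL-terminated inside the fixed array, so the length can be recovered without xpp_namelen. The pp_* names and stdint types are hypothetical stand-ins for the uapi definitions; the static assertion checks that folding xpp_namelen and xpp_pad into one u64 keeps the structure size, which matters for an ioctl ABI:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PP_MAXNAMELEN	256	/* mirrors XFS_PPTR_MAXNAMELEN */

/* Layout as posted, with stdint types standing in for __u32 and friends. */
struct pp_parent_ptr_posted {
	uint64_t	xpp_ino;
	uint32_t	xpp_gen;
	uint32_t	xpp_diroffset;
	uint32_t	xpp_namelen;
	uint32_t	xpp_pad;
	uint8_t		xpp_name[PP_MAXNAMELEN];
};

/* Hypothetical revision: one zeroed u64 replaces the length and padding. */
struct pp_parent_ptr {
	uint64_t	xpp_ino;
	uint32_t	xpp_gen;
	uint32_t	xpp_diroffset;
	uint64_t	xpp_reserved;	/* must be zero; for future expansion */
	uint8_t		xpp_name[PP_MAXNAMELEN];
};

/* Two u32 fields become one u64, so the structure size does not change. */
_Static_assert(sizeof(struct pp_parent_ptr) ==
	       sizeof(struct pp_parent_ptr_posted), "pptr size changed");

/* Stores a name, always leaving room for the terminating NUL. */
static int
pp_set_name(struct pp_parent_ptr *pp, const char *name)
{
	size_t	len = strlen(name);

	if (len >= PP_MAXNAMELEN)
		return -1;			/* would not be NUL-terminated */
	memset(pp->xpp_name, 0, sizeof(pp->xpp_name));
	memcpy(pp->xpp_name, name, len);
	return 0;
}

/* Recovers the name length from the NUL instead of a length field. */
static size_t
pp_name_len(const struct pp_parent_ptr *pp)
{
	const uint8_t	*nul = memchr(pp->xpp_name, 0, sizeof(pp->xpp_name));

	return nul ? (size_t)(nul - pp->xpp_name) : sizeof(pp->xpp_name);
}
```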

> 
> > +};
> > +
> > +/* Iterate through an inodes parent pointers */
> > +struct xfs_pptr_info {
> > +	struct xfs_handle		pi_handle;
> > +	struct xfs_attrlist_cursor	pi_cursor;
> > +	__u32				pi_flags;
> > +	__u32				pi_reserved;
> > +	__u32				pi_ptrs_size;
> 
> Is this the number of elements in pi_parents[]?
Yes, it's the number of parent pointers in the array.
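A minimal user-space sketch of how a caller might size and walk the variable-length buffer, assuming pi_ptrs_size is the capacity the caller supplies and pi_ptrs_used is the count the kernel fills in, as described in this exchange. The pp_* structures are hypothetical, trimmed-down stand-ins for xfs_pptr_info and xfs_parent_ptr:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for the structures in the hunk. */
struct pp_parent_ptr {
	uint64_t	xpp_ino;
	uint32_t	xpp_gen;
	uint32_t	xpp_diroffset;
	uint8_t		xpp_name[256];
};

struct pp_pptr_info {
	uint32_t		pi_flags;
	uint32_t		pi_ptrs_size;	/* capacity of pi_parents[] */
	uint32_t		pi_ptrs_used;	/* entries actually filled in */
	uint32_t		pi_reserved;
	struct pp_parent_ptr	pi_parents[];	/* flexible array member */
};

/* Same arithmetic as xfs_pptr_info_sizeof(): header plus nr_ptrs entries. */
static size_t
pp_pptr_info_sizeof(uint32_t nr_ptrs)
{
	return sizeof(struct pp_pptr_info) +
	       (size_t)nr_ptrs * sizeof(struct pp_parent_ptr);
}

/* Allocates a zeroed buffer and records its capacity in pi_ptrs_size. */
static struct pp_pptr_info *
pp_pptr_info_alloc(uint32_t nr_ptrs)
{
	struct pp_pptr_info	*pi = calloc(1, pp_pptr_info_sizeof(nr_ptrs));

	if (pi)
		pi->pi_ptrs_size = nr_ptrs;
	return pi;
}

/* Searches only the pi_ptrs_used entries; returns the index or -1. */
static int
pp_find_parent(const struct pp_pptr_info *pi, uint64_t ino)
{
	uint32_t	i;

	for (i = 0; i < pi->pi_ptrs_used && i < pi->pi_ptrs_size; i++)
		if (pi->pi_parents[i].xpp_ino == ino)
			return (int)i;
	return -1;
}
```

Bounding the loop by both counts keeps a bogus pi_ptrs_used from walking off the end of the allocation.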

> 
> > +	__u32				pi_ptrs_used;
> > +	__u64				pi_reserved2[6];
> > +
> > +	/*
> > +	 * An array of struct xfs_parent_ptr follows the header
> > +	 * information. Use XFS_PPINFO_TO_PP() to access the
> > +	 * parent pointer array entries.
> > +	 */
> > +	struct xfs_parent_ptr		pi_parents[];
> > +};
> > +
> > +static inline size_t
> > +xfs_pptr_info_sizeof(int nr_ptrs)
> > +{
> > +	return sizeof(struct xfs_pptr_info) +
> > +	       (nr_ptrs * sizeof(struct xfs_parent_ptr));
> > +}
> > +
> > +static inline struct xfs_parent_ptr*
> > +xfs_ppinfo_to_pp(
> > +	struct xfs_pptr_info	*info,
> > +	int			idx)
> > +{
> > +
> 
> Nit: extra space.
Will fix

> 
> > +	return &info->pi_parents[idx];
> > +}
> > +
> >  /*
> >   * ioctl limits
> >   */
> > @@ -797,6 +853,7 @@ struct xfs_scrub_metadata {
> >  /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
> >  #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
> >  #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
> > +#define XFS_IOC_GETPPOINTER	_IOR ('X', 62, struct xfs_parent_ptr)
> 
> I wonder if this name should more strongly emphasize that it's for
> reading the parents of a file?
> 
> #define XFS_IOC_GETPARENTS	_IOWR(...)
Sure, that sounds fine, I think.

> 
> Also, the ioctl reads and writes its parameter, so this is _IOWR, not
> _IOR.
> 
> BTW, is there a sample manpage somewhere?
The userspace branch adds some new flags to xfsprogs and some usage
help to explain how to use them.  See the last patch in the branch:
https://github.com/allisonhenderson/xfsprogs/tree/xfsprogs_new_pptrsv2

But it's just for printing the parent pointers out; it doesn't have a
man page for how to write your own ioctl.  I suppose we could add one,
though.
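For anyone writing their own caller before such a man page exists, the main arithmetic to get right is the buffer size. A minimal sketch, using the struct sizes from the XFS_CHECK_STRUCT_SIZE lines in this patch (104 and 280 bytes) and assuming XFS_XATTR_LIST_MAX is 64 KiB; pptr_info_sizeof() mirrors xfs_pptr_info_sizeof(), and pptr_max_entries() is a hypothetical helper for the largest pi_ptrs_size the ioctl's size check would accept:

```c
#include <assert.h>
#include <stddef.h>

/* Sizes taken from the XFS_CHECK_STRUCT_SIZE lines in this patch. */
#define PPTR_INFO_HDR_SIZE	104	/* struct xfs_pptr_info */
#define PARENT_PTR_SIZE		280	/* struct xfs_parent_ptr */

/* Assumed value; the kernel compares against XFS_XATTR_LIST_MAX. */
#define XATTR_LIST_MAX_ASSUMED	65536

/* Mirrors xfs_pptr_info_sizeof(): header plus nr_ptrs trailing entries. */
static size_t pptr_info_sizeof(size_t nr_ptrs)
{
	return PPTR_INFO_HDR_SIZE + nr_ptrs * PARENT_PTR_SIZE;
}

/* Largest entry count whose total buffer still fits under list_max. */
static size_t pptr_max_entries(size_t list_max)
{
	if (list_max < PPTR_INFO_HDR_SIZE)
		return 0;
	return (list_max - PPTR_INFO_HDR_SIZE) / PARENT_PTR_SIZE;
}
```

A caller would allocate pptr_info_sizeof(n) bytes, set pi_ptrs_size to n, and, per the comment on xfs_ioc_get_parent_pointer(), keep re-issuing the ioctl with the returned pi_cursor until XFS_PPTR_OFLAG_DONE is set, consuming pi_ptrs_used entries each time.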

> 
> >  
> >  /*
> >   * ioctl commands that replace IRIX syssgi()'s
> > diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
> > index 03f03f731d02..d9c922a78617 100644
> > --- a/fs/xfs/libxfs/xfs_parent.c
> > +++ b/fs/xfs/libxfs/xfs_parent.c
> > @@ -26,6 +26,16 @@
> >  #include "xfs_xattr.h"
> >  #include "xfs_parent.h"
> >  
> > +/* Initializes a xfs_parent_ptr from an xfs_parent_name_rec */
> > +void
> > +xfs_init_parent_ptr(struct xfs_parent_ptr	*xpp,
> > +		    struct xfs_parent_name_rec	*rec)
> 
> The second parameter ought to be const struct xfs_parent_name_rec
> *rec to make it unambiguous to readers which is the source and which
> is the destination argument.
Ok, will update

> 
> > +{
> > +	xpp->xpp_ino = be64_to_cpu(rec->p_ino);
> > +	xpp->xpp_gen = be32_to_cpu(rec->p_gen);
> > +	xpp->xpp_diroffset = be32_to_cpu(rec->p_diroffset);
> > +}
> > +
> >  /*
> >   * Parent pointer attribute handling.
> >   *
> > diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
> > index 67948f4b3834..53161b79d1e2 100644
> > --- a/fs/xfs/libxfs/xfs_parent.h
> > +++ b/fs/xfs/libxfs/xfs_parent.h
> > @@ -23,6 +23,8 @@ void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
> >  			      uint32_t p_diroffset);
> >  void xfs_init_parent_name_irec(struct xfs_parent_name_irec *irec,
> >  			       struct xfs_parent_name_rec *rec);
> > +void xfs_init_parent_ptr(struct xfs_parent_ptr *xpp,
> > +			 struct xfs_parent_name_rec *rec);
> >  int xfs_parent_init(xfs_mount_t *mp, xfs_inode_t *ip,
> >  		    struct xfs_name *target_name,
> >  		    struct xfs_parent_defer **parentp);
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index 5b600d3f7981..8a9530588ef4 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -37,6 +37,7 @@
> >  #include "xfs_health.h"
> >  #include "xfs_reflink.h"
> >  #include "xfs_ioctl.h"
> > +#include "xfs_parent_utils.h"
> >  #include "xfs_xattr.h"
> >  
> >  #include <linux/mount.h>
> > @@ -355,6 +356,8 @@ xfs_attr_filter(
> >  		return XFS_ATTR_ROOT;
> >  	if (ioc_flags & XFS_IOC_ATTR_SECURE)
> >  		return XFS_ATTR_SECURE;
> > +	if (ioc_flags & XFS_IOC_ATTR_PARENT)
> > +		return XFS_ATTR_PARENT;
> >  	return 0;
> >  }
> >  
> > @@ -422,7 +425,8 @@ xfs_ioc_attr_list(
> >  	/*
> >  	 * Reject flags, only allow namespaces.
> >  	 */
> > -	if (flags & ~(XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE))
> > +	if (flags & ~(XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE |
> > +		      XFS_IOC_ATTR_PARENT))
> >  		return -EINVAL;
> 
> I think xfs_ioc_attrmulti_one needs filtering for XFS_IOC_ATTR_PARENT,
> if we're still going to allow attrlist/attrmulti to return parent
> pointers.
Ok, will update that one as well then

> 
> >  	if (flags == (XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE))
> >  		return -EINVAL;
> > @@ -1679,6 +1683,92 @@ xfs_ioc_scrub_metadata(
> >  	return 0;
> >  }
> >  
> > +/*
> > + * IOCTL routine to get the parent pointers of an inode and return it to user
> > + * space.  Caller must pass a buffer space containing a struct xfs_pptr_info,
> > + * followed by a region large enough to contain an array of struct
> > + * xfs_parent_ptr of a size specified in pi_ptrs_size.  If the inode contains
> > + * more parent pointers than can fit in the buffer space, caller may re-call
> > + * the function using the returned pi_cursor to resume iteration.  The
> > + * number of xfs_parent_ptr returned will be stored in pi_ptrs_used.
> > + *
> > + * Returns 0 on success or non-zero on failure
> > + */
> > +STATIC int
> > +xfs_ioc_get_parent_pointer(
> > +	struct file			*filp,
> > +	void				__user *arg)
> > +{
> > +	struct xfs_pptr_info		*ppi = NULL;
> > +	int				error = 0;
> > +	struct xfs_inode		*ip = XFS_I(file_inode(filp));
> > +	struct xfs_mount		*mp = ip->i_mount;
> > +
> > +	if (!capable(CAP_SYS_ADMIN))
> > +		return -EPERM;
> > +
> > +	/* Allocate an xfs_pptr_info to put the user data */
> > +	ppi = kmem_alloc(sizeof(struct xfs_pptr_info), 0);
> 
> New code should call kmalloc instead of the old kmem_alloc wrapper.
> 
Ok, will update

> > +	if (!ppi)
> > +		return -ENOMEM;
> > +
> > +	/* Copy the data from the user */
> > +	error = copy_from_user(ppi, arg, sizeof(struct xfs_pptr_info));
> 
> Note: copy_from_user returns the number of bytes *not* copied.  If you
> receive a nonzero return value, error usually gets set to EFAULT.
ooooh. ok, will fix that then.
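Since copy_from_user() only exists in the kernel, here is a userspace model of the convention. fake_copy_from_user() is a hypothetical stand-in that, like the real helper, returns the number of bytes it could not copy rather than an errno; copy_in() shows the usual conversion of any nonzero remainder to -EFAULT. The same pattern applies to the copy_to_user() call at the end of the ioctl.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_from_user(): only 'readable' bytes of
 * the source are accessible.  Copies what it can and returns the number
 * of bytes NOT copied (0 on full success), never a negative errno.
 */
static unsigned long fake_copy_from_user(void *dst, const void *src,
					 unsigned long n,
					 unsigned long readable)
{
	unsigned long done = n < readable ? n : readable;

	memcpy(dst, src, done);
	return n - done;
}

/* The idiom: treat any nonzero remainder as a fault. */
static int copy_in(void *dst, const void *src, unsigned long n,
		   unsigned long readable)
{
	if (fake_copy_from_user(dst, src, n, readable))
		return -EFAULT;
	return 0;
}
```

In the ioctl itself this becomes `if (copy_from_user(ppi, arg, sizeof(*ppi))) { error = -EFAULT; goto out; }`.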

> 
> > +	if (error)
> > +		goto out;
> > +
> > +	/* Check size of buffer requested by user */
> > +	if (xfs_pptr_info_sizeof(ppi->pi_ptrs_size) > XFS_XATTR_LIST_MAX) {
> > +		error = -ENOMEM;
> > +		goto out;
> > +	}
> > +
> > +	if (ppi->pi_flags != 0 && ppi->pi_flags != XFS_PPTR_IFLAG_HANDLE) {
> 
> 	if (ppi->pi_flags & ~XFS_PPTR_IFLAG_HANDLE) ?
> 
> (If we really want to be pedantic, this really ought to be:
> 
> #define XFS_PPTR_IFLAG_ALL	(XFS_PPTR_IFLAG_HANDLE)
> 
> 	if (ppi->pi_flags & ~XFS_PPTR_IFLAG_ALL)
> 		return -EINVAL;
> 
> Or you could be more flexible, since the kernel could just set the
> OFLAGs appropriately and not care about their value on input:
> 
> #define XFS_PPTR_FLAG_ALL	(XFS_PPTR_IFLAG_HANDLE | XFS_PPTR_OFLAG...)
> 
> 	if (ppi->pi_flags & ~XFS_PPTR_FLAG_ALL)
> 		return -EINVAL;
> 
> 	ppi->pi_flags &= ~(XFS_PPTR_OFLAG_ROOT | XFS_PPTR_OFLAG_DONE);

Oh, I see, sure that makes sense.
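A sketch of the more flexible variant, with made-up bit values (the real XFS_PPTR_* definitions live in the patch's xfs_fs.h hunk, which is not quoted here). Unknown bits are rejected, and the output flags are cleared on input so that values left over from a previous call, such as a stale DONE bit when the caller reuses the struct with the returned cursor, do not leak back:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
#define XFS_PPTR_IFLAG_HANDLE	(1U << 0)
#define XFS_PPTR_OFLAG_ROOT	(1U << 1)
#define XFS_PPTR_OFLAG_DONE	(1U << 2)

#define XFS_PPTR_FLAG_ALL	(XFS_PPTR_IFLAG_HANDLE | \
				 XFS_PPTR_OFLAG_ROOT | \
				 XFS_PPTR_OFLAG_DONE)

/* Validate caller-supplied flags, then reset the kernel-owned outputs. */
static int pptr_check_flags(uint32_t *flags)
{
	if (*flags & ~XFS_PPTR_FLAG_ALL)
		return -EINVAL;
	*flags &= ~(XFS_PPTR_OFLAG_ROOT | XFS_PPTR_OFLAG_DONE);
	return 0;
}
```

With this in place, the later handle check becomes a bit test, `ppi->pi_flags & XFS_PPTR_IFLAG_HANDLE`, as Darrick suggests for the xfs_iget branch, so it keeps working if more input flags are added.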
> 
> > +		error = -EINVAL;
> > +		goto out;
> > +	}
> > +
> > +	/*
> > +	 * Now that we know how big the trailing buffer is, expand
> > +	 * our kernel xfs_pptr_info to be the same size
> > +	 */
> > +	ppi = krealloc(ppi, xfs_pptr_info_sizeof(ppi->pi_ptrs_size),
> > +		       GFP_NOFS | __GFP_NOFAIL);
> > +	if (!ppi)
> > +		return -ENOMEM;
> 
> Why NOFS and NOFAIL?  We don't have any writeback resources locked
> (transactions and ILOCKs) so we can hit ourselves up for memory.
Ok, will update

> 
> > +
> > +	if (ppi->pi_flags == XFS_PPTR_IFLAG_HANDLE) {
> 
> 	if (ppi->pi_flags & XFS_PPTR_IFLAG_HANDLE) {
ok, will fix

> 
> > +		error = xfs_iget(mp, NULL, ppi->pi_handle.ha_fid.fid_ino,
> > +				0, 0, &ip);
> > +		if (error)
> > +			goto out;
> > +
> > +		if (VFS_I(ip)->i_generation != ppi->pi_handle.ha_fid.fid_gen) {
> > +			error = -EINVAL;
> > +			goto out;
> > +		}
> > +	}
> > +
> > +	if (ip->i_ino == mp->m_sb.sb_rootino)
> > +		ppi->pi_flags |= XFS_PPTR_OFLAG_ROOT;
> > +
> > +	/* Get the parent pointers */
> > +	error = xfs_attr_get_parent_pointer(ip, ppi);
> > +
> > +	if (error)
> > +		goto out;
> > +
> > +	/* Copy the parent pointers back to the user */
> > +	error = copy_to_user(arg, ppi,
> > +			xfs_pptr_info_sizeof(ppi->pi_ptrs_size));
> 
> Same note as the one I made for copy_from_user.
> 
Will update

> > +	if (error)
> > +		goto out;
> > +
> > +out:
> > +	kmem_free(ppi);
> > +	return error;
> > +}
> > +
> >  int
> >  xfs_ioc_swapext(
> >  	xfs_swapext_t	*sxp)
> > @@ -1968,7 +2058,8 @@ xfs_file_ioctl(
> >  
> >  	case XFS_IOC_FSGETXATTRA:
> >  		return xfs_ioc_fsgetxattra(ip, arg);
> > -
> > +	case XFS_IOC_GETPPOINTER:
> > +		return xfs_ioc_get_parent_pointer(filp, arg);
> >  	case XFS_IOC_GETBMAP:
> >  	case XFS_IOC_GETBMAPA:
> >  	case XFS_IOC_GETBMAPX:
> > diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
> > index 758702b9495f..765eb514a917 100644
> > --- a/fs/xfs/xfs_ondisk.h
> > +++ b/fs/xfs/xfs_ondisk.h
> > @@ -135,6 +135,10 @@ xfs_check_ondisk_structs(void)
> >  	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
> >  	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
> >  
> > +	/* parent pointer ioctls */
> > +	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_ptr,            280);
> > +	XFS_CHECK_STRUCT_SIZE(struct xfs_pptr_info,             104);
> > +
> >  	/*
> >  	 * The v5 superblock format extended several v4 header structures with
> >  	 * additional data. While new fields are only accessible on v5
> > diff --git a/fs/xfs/xfs_parent_utils.c b/fs/xfs/xfs_parent_utils.c
> > new file mode 100644
> > index 000000000000..3351ce173075
> > --- /dev/null
> > +++ b/fs/xfs/xfs_parent_utils.c
> > @@ -0,0 +1,134 @@
> > +/*
> > + * Copyright (c) 2015 Red Hat, Inc.
> > + * All rights reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation
> > + */
> 
> Please condense this boilerplate down to a SPDX tag and a copyright
> statement.
Sure, will do

> 
> > +#include "xfs.h"
> > +#include "xfs_fs.h"
> > +#include "xfs_format.h"
> > +#include "xfs_log_format.h"
> > +#include "xfs_shared.h"
> > +#include "xfs_trans_resv.h"
> > +#include "xfs_mount.h"
> > +#include "xfs_bmap_btree.h"
> > +#include "xfs_inode.h"
> > +#include "xfs_error.h"
> > +#include "xfs_trace.h"
> > +#include "xfs_trans.h"
> > +#include "xfs_da_format.h"
> > +#include "xfs_da_btree.h"
> > +#include "xfs_attr.h"
> > +#include "xfs_ioctl.h"
> > +#include "xfs_parent.h"
> > +#include "xfs_da_btree.h"
> > +
> > +/*
> > + * Get the parent pointers for a given inode
> > + *
> > + * Returns 0 on success and non zero on error
> > + */
> > +int
> > +xfs_attr_get_parent_pointer(struct xfs_inode		*ip,
> > +			    struct xfs_pptr_info	*ppi)
> > +
> > +{
> > +
> > +	struct xfs_attrlist		*alist;
> 
> int
> xfs_attr_get_parent_pointer(
> 	struct xfs_inode		*ip,
> 	struct xfs_pptr_info		*ppi)
> {
> 	struct xfs_attrlist		*alist;
will fix

> 
> 
> > +	struct xfs_attrlist_ent		*aent;
> > +	struct xfs_parent_ptr		*xpp;
> > +	struct xfs_parent_name_rec	*xpnr;
> > +	char				*namebuf;
> > +	unsigned int			namebuf_size;
> > +	int				name_len;
> > +	int				error = 0;
> > +	unsigned int			ioc_flags = XFS_IOC_ATTR_PARENT;
> > +	unsigned int			flags = XFS_ATTR_PARENT;
> > +	int				i;
> > +	struct xfs_attr_list_context	context;
> > +
> > +	/* Allocate a buffer to store the attribute names */
> > +	namebuf_size = sizeof(struct xfs_attrlist) +
> > +		       (ppi->pi_ptrs_size) * sizeof(struct xfs_attrlist_ent);
> > +	namebuf = kvzalloc(namebuf_size, GFP_KERNEL);
> > +	if (!namebuf)
> > +		return -ENOMEM;
> 
> Do we need the buffer to be zeroed if xfs_attr_list is just going to
> set its contents?
I think I might have initially done this out of habit, but I think it's
safe to remove.

> 
> > +
> > +	memset(&context, 0, sizeof(struct xfs_attr_list_context));
> > +	error = xfs_ioc_attr_list_context_init(ip, namebuf, namebuf_size,
> > +			ioc_flags, &context);
> 
> Aha, so the internal implementation has access to xfs_attr_list_context
> before it calls into the attr list code.  Ok, in that case, xfs_fs.h
> doesn't need the XFS_IOC_ATTR_PARENT flag, and you can set
> context.attr_filter = XFS_ATTR_PARENT here.  Then we don't have to
> worry about the existing xattr bulk ioctls returning parent pointers.
Oh ok, I'll take a look and see if it can come out.

> 
> > +
> > +	/* Copy the cursor provided by caller */
> > +	memcpy(&context.cursor, &ppi->pi_cursor,
> > +	       sizeof(struct xfs_attrlist_cursor));
> > +
> > +	if (error)
> > +		goto out_kfree;
> 
> Why does the error check come after copying the cursor into the
> onstack variable?
Hmm, there might have been a reason at one point, but I
think xfs_ioc_attr_list_context_init could actually just be a void
return now.

> 
> > +
> > +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> 
> xfs_ilock_attr_map_shared() ?
Ok, will update

> 
> > +
> > +	error = xfs_attr_list_ilocked(&context);
> > +	if (error)
> > +		goto out_kfree;
> > +
> > +	alist = (struct xfs_attrlist *)namebuf;
> > +	for (i = 0; i < alist->al_count; i++) {
> > +		struct xfs_da_args args = {
> > +			.geo = ip->i_mount->m_attr_geo,
> > +			.whichfork = XFS_ATTR_FORK,
> > +			.dp = ip,
> > +			.namelen = sizeof(struct xfs_parent_name_rec),
> > +			.attr_filter = flags,
> > +			.op_flags = XFS_DA_OP_OKNOENT,
> > +		};
> > +
> > +		xpp = xfs_ppinfo_to_pp(ppi, i);
> > +		memset(xpp, 0, sizeof(struct xfs_parent_ptr));
> > +		aent = (struct xfs_attrlist_ent *)
> > +			&namebuf[alist->al_offset[i]];
> > +		xpnr = (struct xfs_parent_name_rec *)(aent->a_name);
> > +
> > +		if (aent->a_valuelen > XFS_PPTR_MAXNAMELEN) {
> > +			error = -ERANGE;
> > +			goto out_kfree;
> 
> If a parent pointer has a name longer than MAXNAMELEN then isn't that
> a corruption?  And in that case, -EFSCORRUPTED would be more
> appropriate here, right?
I think so, will fix

> 
> > +		}
> > +		name_len = aent->a_valuelen;
> > +
> > +		args.name = (char *)xpnr;
> > +		args.hashval = xfs_da_hashname(args.name, args.namelen),
> > +		args.value = (unsigned char *)(xpp->xpp_name);
> > +		args.valuelen = name_len;
> > +
> > +		error = xfs_attr_get_ilocked(&args);
> 
> If error is ENOENT (or ENOATTR or whatever the return value is when
> the attr doesn't exist) then shouldn't that be treated as a corruption
> too?  We still hold the ILOCK from earlier.  I don't think OKNOENT is
> correct either.
Hmm, I think I likely borrowed this from similar code elsewhere, but
if the inode is locked in this case, probably any error is grounds for
corruption.  Will update.

> 
> > +		error = (error == -EEXIST ? 0 : error);
> > +		if (error)
> > +			goto out_kfree;
> > +
> > +		xpp->xpp_namelen = name_len;
> > +		xfs_init_parent_ptr(xpp, xpnr);
> 
> Also, should we validate xpnr before copying it out to userspace?
> If, say, the inode number is bogus, that should generate an
> EFSCORRUPTED.
I suppose we could validate the inode while we have it here.
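Pulling the three review points together (overlong value, attr missing while the ILOCK is held, bogus parent inode number), a small sketch of the resulting error mapping. It assumes EFSCORRUPTED and ENOATTR alias EUCLEAN and ENODATA, as they do in fs/xfs/xfs_linux.h, and that XFS_PPTR_MAXNAMELEN is 256 (inferred from the 280-byte struct size check). The helper pptr_check_entry() and its zero-inode test are hypothetical placeholders for a real inode-number verifier. It also uses >= rather than the patch's > so the zeroed xpp_name always keeps a terminating NUL; that bound is an assumption about the intended limit.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#ifndef EUCLEAN
#define EUCLEAN			117	/* Linux value, for non-Linux builds */
#endif

/* XFS maps its private error names onto these errnos. */
#define EFSCORRUPTED		EUCLEAN
#define ENOATTR			ENODATA

#define XFS_PPTR_MAXNAMELEN	256	/* assumed name buffer size */

/*
 * Hypothetical per-entry checks while the ILOCK is still held.
 * lookup_error is the attr lookup result, 0 when the attr was found.
 */
static int pptr_check_entry(unsigned int valuelen, int lookup_error,
			    uint64_t parent_ino)
{
	/* The zeroed name buffer must keep room for a terminating NUL. */
	if (valuelen >= XFS_PPTR_MAXNAMELEN)
		return -EFSCORRUPTED;

	/*
	 * Nothing can remove the attr while we hold the ILOCK, so "not
	 * found" means the listing and the attr fork disagree.
	 */
	if (lookup_error == -ENOATTR)
		return -EFSCORRUPTED;
	if (lookup_error)
		return lookup_error;

	/* Placeholder for a real inode-number verifier. */
	if (parent_ino == 0)
		return -EFSCORRUPTED;
	return 0;
}
```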

> 
> > +	}
> > +	ppi->pi_ptrs_used = alist->al_count;
> > +	if (!alist->al_more)
> > +		ppi->pi_flags |= XFS_PPTR_OFLAG_DONE;
> > +
> > +	/* Update the caller with the current cursor position */
> > +	memcpy(&ppi->pi_cursor, &context.cursor,
> > +		sizeof(struct xfs_attrlist_cursor));
> 
> Glad you remembered to do this; attrmulti forgot to do this for a
> long time. :)
:-)  I do recall running into it some time ago

> 
> > +
> > +out_kfree:
> > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > +	kmem_free(namebuf);
> 
> kvfree, since you got namebuf from kvzalloc.
Alrighty

> 
> > +
> > +	return error;
> > +}
> > +
> > diff --git a/fs/xfs/xfs_parent_utils.h b/fs/xfs/xfs_parent_utils.h
> > new file mode 100644
> > index 000000000000..0e952b2ebd4a
> > --- /dev/null
> > +++ b/fs/xfs/xfs_parent_utils.h
> > @@ -0,0 +1,22 @@
> > +/*
> > + * Copyright (c) 2017 Oracle, Inc.
> 
> 2022?
Sure, will update date

> 
> > + * All Rights Reserved.
> > + *
> > + * This program is free software; you can redistribute it and/or
> > + * modify it under the terms of the GNU General Public License as
> > + * published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope that it would be useful,
> > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > + * GNU General Public License for more details.
> > + *
> > + * You should have received a copy of the GNU General Public License
> > + * along with this program; if not, write the Free Software Foundation Inc.
> 
> This also needs to be condensed to a SPDX header and a copyright
> statement.
Right, will clean that up too

Thanks for the reviews!
Allison

> 
> > + */
> > +#ifndef	__XFS_PARENT_UTILS_H__
> > +#define	__XFS_PARENT_UTILS_H__
> > +
> > +int xfs_attr_get_parent_pointer(struct xfs_inode *ip,
> > +				struct xfs_pptr_info *ppi);
> > +#endif	/* __XFS_PARENT_UTILS_H__ */
> > -- 
> > 2.25.1
> > 


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-10  1:58     ` Dave Chinner
@ 2022-08-10  5:01       ` Alli
  2022-08-10  6:12         ` Dave Chinner
  0 siblings, 1 reply; 58+ messages in thread
From: Alli @ 2022-08-10  5:01 UTC (permalink / raw)
  To: Dave Chinner, Darrick J. Wong; +Cc: linux-xfs

On Wed, 2022-08-10 at 11:58 +1000, Dave Chinner wrote:
> On Tue, Aug 09, 2022 at 09:52:55AM -0700, Darrick J. Wong wrote:
> > On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison Henderson wrote:
> > > Recent parent pointer testing has exposed a bug in the underlying
> > > attr replay.  A multi transaction replay currently performs a
> > > single step of the replay, then defers the rest if there is more
> > > to do.
> 
> Yup.
> 
> > > This causes race conditions with other attr replays that
> > > might be recovered before the remaining deferred work has had a
> > > chance to finish.
> 
> What other attr replays are we racing against?  There can only be
> one incomplete attr item intent/done chain per inode present in log
> recovery, right?
No, a rename queues up a set and remove before committing the
transaction.  One for the new parent pointer, and another to remove the
old one.  It can't be an attr replace because technically the names are
different.

So the recovered set grows the leaf and returns -EAGAIN, then the rest
gets capture-committed.  Next up is the recovered remove, which pulls
out the fork, which causes problems when the rest of the set operation
resumes as a deferred operation.  Here is the link to the original
discussion; it was quite a while ago:

https://lore.kernel.org/all/Yrzw9F5aGsaldrmR@magnolia/

I hope that helps?
Allison

> 
> > > This can lead to interleaved set and remove
> > > operations that may clobber the attribute fork.  Fix this by
> > > deferring all work for any attribute operation.
> 
> Which means this should be an impossible situation.
> 
> That is, if we crash before the final attrd DONE intent is written
> to the log, it means that new attr intents for modifications made
> *after* the current attr modification was completed will not be
> present in the log. We have strict ordering of committed operations
> in the journal, hence an operation on an inode that has an incomplete
> intent *must* be the last operation and the *only* incomplete intent
> that is found in the journal for that inode.
> 
> Hence from an operational ordering perspective, this explanation
> for the issue being seen doesn't make any sense to me.  If there are
> multiple incomplete attri intents then we've either got a runtime
> journalling problem (a white-out issue? failing to relog the inode
> in each new intent?) or a log recovery problem (failing to match
> intent-done pairs correctly?), not a recovery deferral issue.
> 
> Hence I think we're still looking for the root cause of this
> problem...
> 
> > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > ---
> > >  fs/xfs/xfs_attr_item.c | 35 ++++++++---------------------------
> > >  1 file changed, 8 insertions(+), 27 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> > > index 5077a7ad5646..c13d724a3e13 100644
> > > --- a/fs/xfs/xfs_attr_item.c
> > > +++ b/fs/xfs/xfs_attr_item.c
> > > @@ -635,52 +635,33 @@ xfs_attri_item_recover(
> > >  		break;
> > >  	case XFS_ATTRI_OP_FLAGS_REMOVE:
> > >  		if (!xfs_inode_hasattr(args->dp))
> > > -			goto out;
> > > +			return 0;
> > >  		attr->xattri_dela_state = xfs_attr_init_remove_state(args);
> > >  		break;
> > >  	default:
> > >  		ASSERT(0);
> > > -		error = -EFSCORRUPTED;
> > > -		goto out;
> > > +		return -EFSCORRUPTED;
> > >  	}
> > >  
> > >  	xfs_init_attr_trans(args, &tres, &total);
> > >  	error = xfs_trans_alloc(mp, &tres, total, 0, XFS_TRANS_RESERVE, &tp);
> > >  	if (error)
> > > -		goto out;
> > > +		return error;
> > >  
> > >  	args->trans = tp;
> > >  	done_item = xfs_trans_get_attrd(tp, attrip);
> > > +	args->trans->t_flags |= XFS_TRANS_HAS_INTENT_DONE;
> > > +	set_bit(XFS_LI_DIRTY, &done_item->attrd_item.li_flags);
> > >  
> > >  	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > >  	xfs_trans_ijoin(tp, ip, 0);
> > >  
> > > -	error = xfs_xattri_finish_update(attr, done_item);
> > > -	if (error == -EAGAIN) {
> > > -		/*
> > > -		 * There's more work to do, so add the intent item to this
> > > -		 * transaction so that we can continue it later.
> > > -		 */
> > > -		xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
> > > -		error = xfs_defer_ops_capture_and_commit(tp, capture_list);
> > > -		if (error)
> > > -			goto out_unlock;
> > > -
> > > -		xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > -		xfs_irele(ip);
> > > -		return 0;
> > > -	}
> > > -	if (error) {
> > > -		xfs_trans_cancel(tp);
> > > -		goto out_unlock;
> > > -	}
> > > -
> > > +	xfs_defer_add(tp, XFS_DEFER_OPS_TYPE_ATTR, &attr->xattri_list);
> > 
> > This seems a little convoluted to me.  Maybe?  Maybe not?
> > 
> > 1. Log recovery recreates an incore xfs_attri_log_item from what it
> > finds in the log.
> > 
> > 2. This function then logs an xattrd for the recovered xattri item.
> > 
> > 3. Then it creates a new xfs_attr_intent to complete the operation.
> > 
> > 4. Finally, it calls xfs_defer_ops_capture_and_commit, which logs a
> > new xattri for the intent created in step 3 and also commits the
> > xattrd for the first xattri.
> > 
> > IOWs, the only difference between before and after is that we're
> > not advancing one more step through the state machine as part of log
> > recovery.  From the perspective of the log, the recovery function
> > merely replaces the recovered xattri log item with a new one.
> > 
> > Why can't we just attach the recovered xattri to the
> > xfs_defer_pending that is created to point to the xfs_attr_intent
> > that's created in step 3, and skip the xattrd?
> 
> Remember that attribute intents are different to all other intent
> types that we have. The existing extent based intents define a
> single independent operation that needs to be performed, and each
> step of the intent chain is completely independent of the previous
> step in the chain.  e.g. removing the extent from the rmap btree is
> completely independent of removing it from the inode bmap btree -
> all that matters is that the removal from the bmbt happens first.
> The rmapbt removal can happen at any time after that, and is
> completely independent of any other bmbt or rmapbt operation.
> Similarly, the EFI can be processed independently of all bmapbt and
> rmapbt modifications, it just has to happen after those
> modifications are done.
> 
> Hence if we crash during recovery, we can just restart from
> where-ever we got to in the middle of the intent chains and not have
> to care at all.  IOWs, eventual consistency works with these chains
> because there is no dependencies between each step of the intent
> chain and each step is completely independent of the other steps.
> 
> Attribute intent chains are completely different. They link steps in
> a state machine together in a non-trivial, highly dependent chain.
> We can't just restart the chain in the middle like we can for the
> BUI->RUI->CUI->EFI chain because the on-disk attribute is in an
> unknown state and recovering that exact state is .... complex.
> 
> Hence the first step of recovery is to return the attribute we
> are trying to modify back to a known state. That means we have to
> perform a removal of any existing attribute under that name first.
> Hence this first step should be replacing the existing attr intent
> with the intent that defines the recovery operation we are going to
> perform.
> 
> That means we need to translate set to replace so that cleanup is
> run first, replace needs to clean up the attr under that name
> regardless of whether it has the incomplete bit set on it or not.
> Remove is the only operation that runs the same as at runtime, as
> cleanup for remove is just repeating the remove operation from
> scratch.
> 
> > I /think/ the answer to that question is that we might need to move
> > the log tail forward to free enough log space to finish the intent
> > items, so creating the extra xattrd/xattri (a) avoids the complexity
> > of submitting an incore intent item *and* a log intent item to the
> > defer ops machinery; and (b) avoids livelocks in log recovery.
> > Therefore, we actually need to do it this way.
> 
> We really need the initial operation to rewrite the intent to match
> the recovery operation we are going to perform. Everything else is
> secondary.
> 
> Cheers,
> 
> Dave.



* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-10  5:01       ` Alli
@ 2022-08-10  6:12         ` Dave Chinner
  2022-08-10 15:52           ` Darrick J. Wong
  2022-08-12  1:55           ` Alli
  0 siblings, 2 replies; 58+ messages in thread
From: Dave Chinner @ 2022-08-10  6:12 UTC (permalink / raw)
  To: Alli; +Cc: Darrick J. Wong, linux-xfs

On Tue, Aug 09, 2022 at 10:01:49PM -0700, Alli wrote:
> On Wed, 2022-08-10 at 11:58 +1000, Dave Chinner wrote:
> > On Tue, Aug 09, 2022 at 09:52:55AM -0700, Darrick J. Wong wrote:
> > > On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison Henderson wrote:
> > > > Recent parent pointer testing has exposed a bug in the underlying
> > > > attr replay.  A multi transaction replay currently performs a
> > > > single step of the replay, then defers the rest if there is more
> > > > to do.
> > 
> > Yup.
> > 
> > > > This causes race conditions with other attr replays that
> > > > might be recovered before the remaining deferred work has had a
> > > > chance to finish.
> > 
> > What other attr replays are we racing against?  There can only be
> > one incomplete attr item intent/done chain per inode present in log
> > recovery, right?
> No, a rename queues up a set and remove before committing the
> transaction.  One for the new parent pointer, and another to remove the
> old one.

Ah. That really needs to be described in the commit message -
changing from "single intent chain per object" to "multiple
concurrent independent and unserialised intent chains per object" is
a pretty important design rule change...

The whole point of intents is to allow complex, multi-stage
operations on a single object to be sequenced in a tightly
controlled manner. They weren't intended to be run as concurrent
lines of modification on single items; if you need to do two
modifications on an object, the intent chain ties the two
modifications together into a single whole.

One of the reasons I rewrote the attr state machine for LARP was to
enable new multiple attr operation chains to be easily built from
the entry points the state machine provides. Parent attr rename
needs a new intent chain to be built, not run multiple independent
intent chains for each modification.

> It can't be an attr replace because technically the names are
> different.

I disagree - we have all the pieces we need in the state machine
already, we just need to define separate attr names for the
remove and insert steps in the attr intent.

That is, the "replace" operation we execute when an attr set
overwrites the value is "technically" a "replace value" operation,
but we actually implement it as a "replace entire attribute"
operation.

Without LARP, we do that overwrite in independent steps via an
intermediate INCOMPLETE state to allow two xattrs of the same name
to exist in the attr tree at the same time. IOWs, the attr value
overwrite is effectively a "set-swap-remove" operation on two
entirely independent xattrs, ensuring that if we crash we always
have either the old or new xattr visible.

With LARP, we can remove the original attr first, thereby avoiding
the need for two versions of the xattr to exist in the tree in the
first place. However, we have to do these two operations as a pair
of linked independent operations. The intent chain provides the
linking, and requires us to log the name and the value of the attr
that we are overwriting in the intent. Hence we can always recover
the modification to completion no matter where in the operation we
fail.

When it comes to a parent attr rename operation, we are effectively
doing two linked operations - remove the old attr, set the new attr
- on different attributes. Implementation wise, it is exactly the
same sequence as a "replace value" operation, except for the fact
that the new attr we add has a different name.

Hence the only real difference between the existing "attr replace"
and the intent chain we need for "parent attr rename" is that we
have to log two attr names instead of one. Basically, we have a new
XFS_ATTRI_OP_FLAGS... type for this, and that's what tells us that
we are operating on two different attributes instead of just one.

The recovery operation becomes slightly different - we have to run a
remove on the old, then a replace on the new - so there is a little bit
of new code needed to manage that in the state machine.

These, however, are just small tweaks on the existing replace attr
operation, and there should be little difference in performance or
overhead between a "replace value" and a "replace entire xattr"
operation as they are largely the same runtime operation for LARP.
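A toy model of the recovery ordering described above, assuming a hypothetical intent that logs both the old and the new parent pointer name. Names are plain strings here (the real ones are binary xfs_parent_name_rec records), a partially set attr is modelled as simply present, and every toy_* name is invented for illustration. Recovery removes the old name, then runs a replace on the new name by first clearing any copy of it; because each step tolerates the attr already being gone, the same sequence converges whether the crash happened before, between, or after the runtime steps:

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ATTRS	8

/* Toy attr fork: just a bag of names. */
struct toy_fork {
	const char	*names[TOY_MAX_ATTRS];
	int		count;
};

/* Hypothetical parent-pointer rename intent: two names are logged. */
struct toy_pptr_rename_intent {
	const char	*old_name;
	const char	*new_name;
};

static int toy_count(const struct toy_fork *f, const char *name)
{
	int n = 0;

	for (int i = 0; i < f->count; i++)
		if (strcmp(f->names[i], name) == 0)
			n++;
	return n;
}

/* Remove every copy of 'name'; a missing name is not an error. */
static void toy_remove(struct toy_fork *f, const char *name)
{
	int j = 0;

	for (int i = 0; i < f->count; i++)
		if (strcmp(f->names[i], name) != 0)
			f->names[j++] = f->names[i];
	f->count = j;
}

static void toy_set(struct toy_fork *f, const char *name)
{
	assert(f->count < TOY_MAX_ATTRS);
	f->names[f->count++] = name;
}

/* Recovery: remove the old attr, then "replace" the new one. */
static void toy_recover(struct toy_fork *f,
			const struct toy_pptr_rename_intent *in)
{
	toy_remove(f, in->old_name);
	toy_remove(f, in->new_name);	/* clean up any partial new attr */
	toy_set(f, in->new_name);
}
```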

> So the recovered set grows the leaf and returns -EAGAIN, then the rest
> gets capture-committed.  Next up is the recovered remove, which pulls
> out the fork, which causes problems when the rest of the set operation
> resumes as a deferred operation.

Yup, and all this goes away when we build the right intent chain for
replacing a parent attr rename....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-10  6:12         ` Dave Chinner
@ 2022-08-10 15:52           ` Darrick J. Wong
  2022-08-10 19:28             ` Alli
  2022-08-12  1:55           ` Alli
  1 sibling, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-10 15:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Alli, linux-xfs

On Wed, Aug 10, 2022 at 04:12:58PM +1000, Dave Chinner wrote:
> On Tue, Aug 09, 2022 at 10:01:49PM -0700, Alli wrote:
> > On Wed, 2022-08-10 at 11:58 +1000, Dave Chinner wrote:
> > > On Tue, Aug 09, 2022 at 09:52:55AM -0700, Darrick J. Wong wrote:
> > > > On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison Henderson wrote:
> > > > > Recent parent pointer testing has exposed a bug in the underlying
> > > > > attr replay.  A multi transaction replay currently performs a
> > > > > single step of the replay, then defers the rest if there is more
> > > > > to do.
> > > 
> > > Yup.
> > > 
> > > > > This causes race conditions with other attr replays that
> > > > > might be recovered before the remaining deferred work has had a
> > > > > chance to finish.
> > > 
> > > What other attr replays are we racing against?  There can only be
> > > one incomplete attr item intent/done chain per inode present in log
> > > recovery, right?
> > No, a rename queues up a set and remove before committing the
> > transaction.  One for the new parent pointer, and another to remove the
> > old one.
> 
> Ah. That really needs to be described in the commit message -
> changing from "single intent chain per object" to "multiple
> concurrent independent and unserialised intent chains per object" is
> a pretty important design rule change...
> 
> The whole point of intents is to allow complex, multi-stage
> operations on a single object to be sequenced in a tightly
> controlled manner. They weren't intended to be run as concurrent
> lines of modification on single items; if you need to do two
> modifications on an object, the intent chain ties the two
> modifications together into a single whole.

Back when I made the suggestion that resulted in this patch, I was
pondering why it is that (say) atomic swapext didn't suffer from these
recovery problems, and I realized that for any given inode, you can only
have one ongoing swapext operation at a time.  That's why recovery of
swapext operations works fine, whereas pptr recovery has this quirk.
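
A toy model of the ordering problem (not XFS code; `struct chain`, `replay()` and `order_is()` are made up for illustration). It assumes recovery runs one step of each recovered chain and requeues the remainder at the tail, so two independent chains on the same inode interleave, while a single chain holding both modifications keeps its steps in order:

```c
#include <assert.h>
#include <string.h>

#define MAX_STEPS 8

/* Toy model, not XFS code: one recovered intent chain on a single inode. */
struct chain {
	const char	*steps[MAX_STEPS];
	int		nsteps;
	int		next;		/* index of the next step to run */
};

/*
 * Run one step of each pending chain per pass and requeue the remainder,
 * mimicking "perform one step, defer the rest". Records the global order
 * in which steps hit the inode and returns how many were recorded.
 */
static int replay(struct chain *chains, int nchains,
		  const char **order, int max_order)
{
	int n = 0, progress = 1;

	while (progress) {
		progress = 0;
		for (int i = 0; i < nchains; i++) {
			struct chain *c = &chains[i];

			if (c->next < c->nsteps && n < max_order) {
				order[n++] = c->steps[c->next++];
				progress = 1;
			}
		}
	}
	return n;
}

/* Return 1 if the recorded order matches the expected sequence exactly. */
static int order_is(const char **order, int n,
		    const char *const *expect, int nexpect)
{
	if (n != nexpect)
		return 0;
	for (int i = 0; i < n; i++)
		if (strcmp(order[i], expect[i]) != 0)
			return 0;
	return 1;
}
```

With the set and the remove as separate chains, the remove lands between the two halves of the set, which is the failure described earlier in the thread; folding both modifications into one chain makes the order fixed by construction.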

At the time, my thought process was more narrowly focused on making log
recovery mimic runtime more closely.  I didn't make the connection
between this problem and the other open question I had (see the bottom)
about how to fix pptr attrs when rebuilding a directory.

> One of the reasons I rewrote the attr state machine for LARP was to
> enable new multiple attr operation chains to be easily built from
> the entry points the state machine provides. Parent attr rename
> needs a new intent chain to be built, not run multiple independent
> intent chains for each modification.
> 
> > It can't be an attr replace because technically the names are
> > different.
> 
> I disagree - we have all the pieces we need in the state machine
> already, we just need to define separate attr names for the
> remove and insert steps in the attr intent.
> 
> That is, the "replace" operation we execute when an attr set
> overwrites the value is "technically" a "replace value" operation,
> but we actually implement it as a "replace entire attribute"
> operation.

OH.  Right.  I forgot that ATTR_REPLACE=="replace entire attr".

If I'm understanding this right, that means that the xfs_rename patch
ought to detect the situation where there's an existing dirent in the
target directory, and do something along the lines of:

	} else { /* target_ip != NULL */
		xfs_dir_replace(...);

		xfs_parent_defer_replace(tp, new_parent_ptr, target_dp,
				old_diroffset, target_name,
				new_diroffset);

		xfs_trans_ichgtime(...);

Where the xfs_parent_defer_replace operation does an ATTR_REPLACE to
switch:

(target_dp_ino, target_gen, old_diroffset) == <dontcare>

to this:

(target_dp_ino, target_gen, new_diroffset) == target_name

except, I think we have to log the old name in addition to the new name,
because userspace ATTR_REPLACE operations don't allow name changes?

I guess this also implies that xfs_dir_replace will pass out the offset
of the old name, in addition to the offset of the new name.

> Without LARP, we do that overwrite in independent steps via an
> intermediate INCOMPLETE state to allow two xattrs of the same name
> to exist in the attr tree at the same time. IOWs, the attr value
> overwrite is effectively a "set-swap-remove" operation on two
> entirely independent xattrs, ensuring that if we crash we always
> have either the old or new xattr visible.
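
A toy model of the non-LARP "set-swap-remove" overwrite, assuming lookups skip entries flagged INCOMPLETE and that both flags are flipped in a single transaction. `struct toy_attr` and the helpers are illustrative only, not the on-disk format. Crashing after any step leaves exactly one visible copy, either the old or the new one:

```c
#include <assert.h>
#include <string.h>

/* Toy attr entry (not the on-disk format); lookups skip INCOMPLETE entries. */
struct toy_attr {
	const char	*name;
	const char	*value;
	int		incomplete;	/* models the INCOMPLETE flag */
	int		present;	/* entry exists in the tree */
};

/* Return the visible value for name (or NULL) and count visible copies. */
static const char *toy_lookup(const struct toy_attr *t, int n,
			      const char *name, int *nvisible)
{
	const char *val = NULL;

	*nvisible = 0;
	for (int i = 0; i < n; i++) {
		if (t[i].present && !t[i].incomplete &&
		    strcmp(t[i].name, name) == 0) {
			val = t[i].value;
			(*nvisible)++;
		}
	}
	return val;
}

/* Apply the first 'steps' steps of overwriting t[0] with t[1], then "crash". */
static void toy_set_swap_remove(struct toy_attr *t, int steps)
{
	assert(steps >= 0 && steps <= 3);
	if (steps >= 1) {	/* 1: insert the new copy, hidden */
		t[1].present = 1;
		t[1].incomplete = 1;
	}
	if (steps >= 2) {	/* 2: flip both flags in one transaction */
		t[0].incomplete = 1;
		t[1].incomplete = 0;
	}
	if (steps >= 3)		/* 3: remove the old copy */
		t[0].present = 0;
}
```
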
> 
> With LARP, we can remove the original attr first, thereby avoiding
> the need for two versions of the xattr to exist in the tree in the
> first place. However, we have to do these two operations as a pair
> of linked independent operations. The intent chain provides the
> linking, and requires us to log the name and the value of the attr
> that we are overwriting in the intent. Hence we can always recover
> the modification to completion no matter where in the operation we
> fail.
> 
> When it comes to a parent attr rename operation, we are effectively
> doing two linked operations - remove the old attr, set the new attr
> - on different attributes. Implementation wise, it is exactly the
> same sequence as a "replace value" operation, except for the fact
> that the new attr we add has a different name.
> 
> Hence the only real difference between the existing "attr replace"
> and the intent chain we need for "parent attr rename" is that we
> have to log two attr names instead of one. Basically, we have a new
> XFS_ATTRI_OP_FLAGS... type for this, and that's what tells us that
> we are operating on two different attributes instead of just one.

This answers my earlier question: Yes, and yes.

> The recovery operation becomes slightly different - we have to run a
> remove on the old, then a replace on the new - so there is a little bit
> of new code needed to manage that in the state machine.
> 
> These, however, are just small tweaks on the existing replace attr
> operation, and there should be little difference in performance or
> overhead between a "replace value" and a "replace entire xattr"
> operation as they are largely the same runtime operation for LARP.
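
A sketch of the two-name intent chain and its recovery, under stated assumptions: the intent logs both names and the new value, each step is idempotent, and the chain's progress is persisted as each step commits. The states, `struct toy_fork`, `struct rename_intent` and the helpers are hypothetical, not the XFS attr state machine:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical steps of a two-name "parent attr rename" intent chain. */
enum rename_state { RN_REMOVE_OLD, RN_SET_NEW, RN_DONE };

#define TOY_SLOTS 4

/* Toy attr fork: a handful of name/value slots; NULL name = free slot. */
struct toy_fork {
	const char	*name[TOY_SLOTS];
	const char	*value[TOY_SLOTS];
};

/* Hypothetical logged intent: both attr names plus the new value. */
struct rename_intent {
	const char		*old_name;
	const char		*new_name;
	const char		*new_value;
	enum rename_state	state;	/* persisted as each step commits */
};

static int toy_find(const struct toy_fork *f, const char *name)
{
	for (int i = 0; i < TOY_SLOTS; i++)
		if (f->name[i] && strcmp(f->name[i], name) == 0)
			return i;
	return -1;
}

static int toy_free_slot(const struct toy_fork *f)
{
	for (int i = 0; i < TOY_SLOTS; i++)
		if (!f->name[i])
			return i;
	return -1;
}

/* Run one step; each step is idempotent, so replaying it is harmless. */
static void rename_step(struct toy_fork *f, struct rename_intent *ri)
{
	int i;

	switch (ri->state) {
	case RN_REMOVE_OLD:
		i = toy_find(f, ri->old_name);
		if (i >= 0)
			f->name[i] = f->value[i] = NULL;
		ri->state = RN_SET_NEW;
		break;
	case RN_SET_NEW:
		i = toy_find(f, ri->new_name);
		if (i < 0)
			i = toy_free_slot(f);
		assert(i >= 0);
		f->name[i] = ri->new_name;
		f->value[i] = ri->new_value;
		ri->state = RN_DONE;
		break;
	case RN_DONE:
		break;
	}
}

/* Log recovery: resume from the logged state and run the chain to completion. */
static void rename_recover(struct toy_fork *f, struct rename_intent *ri)
{
	while (ri->state != RN_DONE)
		rename_step(f, ri);
}
```

Because there is only one chain per rename, nothing else can touch the attr fork between the remove and the set during recovery.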
> 
> > So the recovered set grows the leaf, and returns -EAGAIN, then the rest
> > gets capture-committed.  Next up is the recovered remove which pulls
> > out the fork, which causes problems when the rest of the set operation
> > resumes as a deferred operation.
> 
> Yup, and all this goes away when we build the right intent chain for
> replacing a parent attr rename....

Funnily enough, just last week I had thought that online repair was
going to require the ability to replace an entire xattr...

https://djwong.org/docs/xfs-online-fsck-design/#parent-pointers

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-10 15:52           ` Darrick J. Wong
@ 2022-08-10 19:28             ` Alli
  0 siblings, 0 replies; 58+ messages in thread
From: Alli @ 2022-08-10 19:28 UTC (permalink / raw)
  To: Darrick J. Wong, Dave Chinner; +Cc: linux-xfs

On Wed, 2022-08-10 at 08:52 -0700, Darrick J. Wong wrote:
> On Wed, Aug 10, 2022 at 04:12:58PM +1000, Dave Chinner wrote:
> > On Tue, Aug 09, 2022 at 10:01:49PM -0700, Alli wrote:
> > > On Wed, 2022-08-10 at 11:58 +1000, Dave Chinner wrote:
> > > > On Tue, Aug 09, 2022 at 09:52:55AM -0700, Darrick J. Wong wrote:
> > > > > On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison Henderson wrote:
> > > > > > Recent parent pointer testing has exposed a bug in the underlying
> > > > > > attr replay.  A multi transaction replay currently performs a
> > > > > > single step of the replay, then defers the rest if there is more
> > > > > > to do.
> > > > 
> > > > Yup.
> > > > 
> > > > > > This causes race conditions with other attr replays that
> > > > > > might be recovered before the remaining deferred work has had a
> > > > > > chance to finish.
> > > > 
> > > > What other attr replays are we racing against?  There can only be
> > > > one incomplete attr item intent/done chain per inode present in log
> > > > recovery, right?
> > > No, a rename queues up a set and remove before committing the
> > > transaction.  One for the new parent pointer, and another to remove the
> > > old one.
> > 
> > Ah. That really needs to be described in the commit message -
> > changing from "single intent chain per object" to "multiple
> > concurrent independent and unserialised intent chains per object" is
> > a pretty important design rule change...
> > 
> > The whole point of intents is to allow complex, multi-stage
> > operations on a single object to be sequenced in a tightly
> > controlled manner. They weren't intended to be run as concurrent
> > lines of modification on single items; if you need to do two
> > modifications on an object, the intent chain ties the two
> > modifications together into a single whole.
> 
> Back when I made the suggestion that resulted in this patch, I was
> pondering why it is that (say) atomic swapext didn't suffer from these
> recovery problems, and I realized that for any given inode, you can only
> have one ongoing swapext operation at a time.  That's why recovery of
> swapext operations works fine, whereas pptr recovery has this quirk.
> 
> At the time, my thought process was more narrowly focused on making log
> recovery mimic runtime more closely.  I didn't make the connection
> between this problem and the other open question I had (see the bottom)
> about how to fix pptr attrs when rebuilding a directory.
> 
> > One of the reasons I rewrote the attr state machine for LARP was to
> > enable new multiple attr operation chains to be easily built from
> > the entry points the state machine provides. Parent attr rename
> > needs a new intent chain to be built, not run multiple independent
> > intent chains for each modification.
> > 
> > > It can't be an attr replace because technically the names are
> > > different.
> > 
> > I disagree - we have all the pieces we need in the state machine
> > already, we just need to define separate attr names for the
> > remove and insert steps in the attr intent.
> > 
> > That is, the "replace" operation we execute when an attr set
> > overwrites the value is "technically" a "replace value" operation,
> > but we actually implement it as a "replace entire attribute"
> > operation.
> 
> OH.  Right.  I forgot that ATTR_REPLACE=="replace entire attr".
> 
> If I'm understanding this right, that means that the xfs_rename patch
> ought to detect the situation where there's an existing dirent in the
> target directory, and do something along the lines of:
> 
> 	} else { /* target_ip != NULL */
> 		xfs_dir_replace(...);
> 
> 		xfs_parent_defer_replace(tp, new_parent_ptr, target_dp,
> 				old_diroffset, target_name,
> 				new_diroffset);
> 
> 		xfs_trans_ichgtime(...);
> 
> Where the xfs_parent_defer_replace operation does an ATTR_REPLACE to
> switch:
> 
> (target_dp_ino, target_gen, old_diroffset) == <dontcare>
> 
> to this:
> 
> (target_dp_ino, target_gen, new_diroffset) == target_name
> 
> except, I think we have to log the old name in addition to the new name,
> because userspace ATTR_REPLACE operations don't allow name changes?
> 
> I guess this also implies that xfs_dir_replace will pass out the offset
> of the old name, in addition to the offset of the new name.
> 
> > Without LARP, we do that overwrite in independent steps via an
> > intermediate INCOMPLETE state to allow two xattrs of the same name
> > to exist in the attr tree at the same time. IOWs, the attr value
> > overwrite is effectively a "set-swap-remove" operation on two
> > entirely independent xattrs, ensuring that if we crash we always
> > have either the old or new xattr visible.
> > 
> > With LARP, we can remove the original attr first, thereby avoiding
> > the need for two versions of the xattr to exist in the tree in the
> > first place. However, we have to do these two operations as a pair
> > of linked independent operations. The intent chain provides the
> > linking, and requires us to log the name and the value of the attr
> > that we are overwriting in the intent. Hence we can always recover
> > the modification to completion no matter where in the operation we
> > fail.
> > 
> > When it comes to a parent attr rename operation, we are effectively
> > doing two linked operations - remove the old attr, set the new attr
> > - on different attributes. Implementation wise, it is exactly the
> > same sequence as a "replace value" operation, except for the fact
> > that the new attr we add has a different name.
> > 
> > Hence the only real difference between the existing "attr replace"
> > and the intent chain we need for "parent attr rename" is that we
> > have to log two attr names instead of one. Basically, we have a new
> > XFS_ATTRI_OP_FLAGS... type for this, and that's what tells us that
> > we are operating on two different attributes instead of just one.
> 
> This answers my earlier question: Yes, and yes.

I see, alrighty then, I'll see if I can put together a new
XFS_ATTRI_OP_FLAGS type that carries both the old and new names.  That
sounds like it should work.  Thanks for all the feedback!

Allison


> 
> > The recovery operation becomes slightly different - we have to run a
> > remove on the old, then a replace on the new - so there is a little bit
> > of new code needed to manage that in the state machine.
> > 
> > These, however, are just small tweaks on the existing replace attr
> > operation, and there should be little difference in performance or
> > overhead between a "replace value" and a "replace entire xattr"
> > operation as they are largely the same runtime operation for LARP.
> > 
> > > So the recovered set grows the leaf, and returns -EAGAIN, then the rest
> > > gets capture-committed.  Next up is the recovered remove which pulls
> > > out the fork, which causes problems when the rest of the set operation
> > > resumes as a deferred operation.
> > 
> > Yup, and all this goes away when we build the right intent chain for
> > replacing a parent attr rename....
> 
> Funnily enough, just last week I had thought that online repair was
> going to require the ability to replace an entire xattr...
> 
> https://djwong.org/docs/xfs-online-fsck-design/#parent-pointers
> 
> --D
> 
> > Cheers,
> > 
> > Dave.
> > -- 
> > Dave Chinner
> > david@fromorbit.com


* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-10  6:12         ` Dave Chinner
  2022-08-10 15:52           ` Darrick J. Wong
@ 2022-08-12  1:55           ` Alli
  2022-08-12  3:05             ` Darrick J. Wong
  2022-08-16  0:54             ` Dave Chinner
  1 sibling, 2 replies; 58+ messages in thread
From: Alli @ 2022-08-12  1:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Darrick J. Wong, linux-xfs

On Wed, 2022-08-10 at 16:12 +1000, Dave Chinner wrote:
> On Tue, Aug 09, 2022 at 10:01:49PM -0700, Alli wrote:
> > On Wed, 2022-08-10 at 11:58 +1000, Dave Chinner wrote:
> > > On Tue, Aug 09, 2022 at 09:52:55AM -0700, Darrick J. Wong wrote:
> > > > On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison Henderson wrote:
> > > > > Recent parent pointer testing has exposed a bug in the underlying
> > > > > attr replay.  A multi transaction replay currently performs a
> > > > > single step of the replay, then defers the rest if there is more
> > > > > to do.
> > > 
> > > Yup.
> > > 
> > > > > This causes race conditions with other attr replays that
> > > > > might be recovered before the remaining deferred work has had a
> > > > > chance to finish.
> > > 
> > > What other attr replays are we racing against?  There can only be
> > > one incomplete attr item intent/done chain per inode present in log
> > > recovery, right?
> > No, a rename queues up a set and remove before committing the
> > transaction.  One for the new parent pointer, and another to remove the
> > old one.
> 
> Ah. That really needs to be described in the commit message -
> changing from "single intent chain per object" to "multiple
> concurrent independent and unserialised intent chains per object" is
> a pretty important design rule change...
> 
> The whole point of intents is to allow complex, multi-stage
> operations on a single object to be sequenced in a tightly
> controlled manner. They weren't intended to be run as concurrent
> lines of modification on single items; if you need to do two
> modifications on an object, the intent chain ties the two
> modifications together into a single whole.
> 
> One of the reasons I rewrote the attr state machine for LARP was to
> enable new multiple attr operation chains to be easily built from
> the entry points the state machine provides. Parent attr rename
> needs a new intent chain to be built, not run multiple independent
> intent chains for each modification.
> 
> > It can't be an attr replace because technically the names are
> > different.
> 
> I disagree - we have all the pieces we need in the state machine
> already, we just need to define separate attr names for the
> remove and insert steps in the attr intent.
> 
> That is, the "replace" operation we execute when an attr set
> overwrites the value is "technically" a "replace value" operation,
> but we actually implement it as a "replace entire attribute"
> operation.
> 
> Without LARP, we do that overwrite in independent steps via an
> intermediate INCOMPLETE state to allow two xattrs of the same name
> to exist in the attr tree at the same time. IOWs, the attr value
> overwrite is effectively a "set-swap-remove" operation on two
> entirely independent xattrs, ensuring that if we crash we always
> have either the old or new xattr visible.
> 
> With LARP, we can remove the original attr first, thereby avoiding
> the need for two versions of the xattr to exist in the tree in the
> first place. However, we have to do these two operations as a pair
> of linked independent operations. The intent chain provides the
> linking, and requires us to log the name and the value of the attr
> that we are overwriting in the intent. Hence we can always recover
> the modification to completion no matter where in the operation we
> fail.
> 
> When it comes to a parent attr rename operation, we are effectively
> doing two linked operations - remove the old attr, set the new attr
> - on different attributes. Implementation wise, it is exactly the
> same sequence as a "replace value" operation, except for the fact
> that the new attr we add has a different name.
> 
> Hence the only real difference between the existing "attr replace"
> and the intent chain we need for "parent attr rename" is that we
> have to log two attr names instead of one. 

To be clear, this would imply expanding xfs_attri_log_format to have
another alfi_new_name_len field and another iovec for the attr intent,
right?  Does changing the on-disk log layout after the original has
merged cause issues?  Or is that ok for things that are still
experimental? Thanks!

Allison

> Basically, we have a new
> XFS_ATTRI_OP_FLAGS... type for this, and that's what tells us that
> we are operating on two different attributes instead of just one.
> 
> The recovery operation becomes slightly different - we have to run a
> remove on the old, then a replace on the new - so there is a little bit
> of new code needed to manage that in the state machine.
> 
> These, however, are just small tweaks on the existing replace attr
> operation, and there should be little difference in performance or
> overhead between a "replace value" and a "replace entire xattr"
> operation as they are largely the same runtime operation for LARP.
> 
> > So the recovered set grows the leaf, and returns -EAGAIN, then the rest
> > gets capture-committed.  Next up is the recovered remove which pulls
> > out the fork, which causes problems when the rest of the set operation
> > resumes as a deferred operation.
> 
> Yup, and all this goes away when we build the right intent chain for
> replacing a parent attr rename....
> 
> Cheers,
> 
> Dave.


* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-12  1:55           ` Alli
@ 2022-08-12  3:05             ` Darrick J. Wong
  2022-08-16  0:54             ` Dave Chinner
  1 sibling, 0 replies; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-12  3:05 UTC (permalink / raw)
  To: Alli; +Cc: Dave Chinner, linux-xfs

On Thu, Aug 11, 2022 at 06:55:16PM -0700, Alli wrote:
> On Wed, 2022-08-10 at 16:12 +1000, Dave Chinner wrote:
> > On Tue, Aug 09, 2022 at 10:01:49PM -0700, Alli wrote:
> > > On Wed, 2022-08-10 at 11:58 +1000, Dave Chinner wrote:
> > > > On Tue, Aug 09, 2022 at 09:52:55AM -0700, Darrick J. Wong wrote:
> > > > > On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison Henderson wrote:
> > > > > > Recent parent pointer testing has exposed a bug in the underlying
> > > > > > attr replay.  A multi transaction replay currently performs a
> > > > > > single step of the replay, then defers the rest if there is more
> > > > > > to do.
> > > > 
> > > > Yup.
> > > > 
> > > > > > This causes race conditions with other attr replays that
> > > > > > might be recovered before the remaining deferred work has had a
> > > > > > chance to finish.
> > > > 
> > > > What other attr replays are we racing against?  There can only be
> > > > one incomplete attr item intent/done chain per inode present in log
> > > > recovery, right?
> > > No, a rename queues up a set and remove before committing the
> > > transaction.  One for the new parent pointer, and another to remove the
> > > old one.
> > 
> > Ah. That really needs to be described in the commit message -
> > changing from "single intent chain per object" to "multiple
> > concurrent independent and unserialised intent chains per object" is
> > a pretty important design rule change...
> > 
> > The whole point of intents is to allow complex, multi-stage
> > operations on a single object to be sequenced in a tightly
> > controlled manner. They weren't intended to be run as concurrent
> > lines of modification on single items; if you need to do two
> > modifications on an object, the intent chain ties the two
> > modifications together into a single whole.
> > 
> > One of the reasons I rewrote the attr state machine for LARP was to
> > enable new multiple attr operation chains to be easily built from
> > the entry points the state machine provides. Parent attr rename
> > needs a new intent chain to be built, not run multiple independent
> > intent chains for each modification.
> > 
> > > It can't be an attr replace because technically the names are
> > > different.
> > 
> > I disagree - we have all the pieces we need in the state machine
> > already, we just need to define separate attr names for the
> > remove and insert steps in the attr intent.
> > 
> > That is, the "replace" operation we execute when an attr set
> > overwrites the value is "technically" a "replace value" operation,
> > but we actually implement it as a "replace entire attribute"
> > operation.
> > 
> > Without LARP, we do that overwrite in independent steps via an
> > intermediate INCOMPLETE state to allow two xattrs of the same name
> > to exist in the attr tree at the same time. IOWs, the attr value
> > overwrite is effectively a "set-swap-remove" operation on two
> > entirely independent xattrs, ensuring that if we crash we always
> > have either the old or new xattr visible.
> > 
> > With LARP, we can remove the original attr first, thereby avoiding
> > the need for two versions of the xattr to exist in the tree in the
> > first place. However, we have to do these two operations as a pair
> > of linked independent operations. The intent chain provides the
> > linking, and requires us to log the name and the value of the attr
> > that we are overwriting in the intent. Hence we can always recover
> > the modification to completion no matter where in the operation we
> > fail.
> > 
> > When it comes to a parent attr rename operation, we are effectively
> > doing two linked operations - remove the old attr, set the new attr
> > - on different attributes. Implementation wise, it is exactly the
> > same sequence as a "replace value" operation, except for the fact
> > that the new attr we add has a different name.
> > 
> > Hence the only real difference between the existing "attr replace"
> > and the intent chain we need for "parent attr rename" is that we
> > have to log two attr names instead of one. 
> 
> To be clear, this would imply expanding xfs_attri_log_format to have
> another alfi_new_name_len field and another iovec for the attr intent,
> right?  Does changing the on-disk log layout after the original has
> merged cause issues?  Or is that ok for things that are still
> experimental? Thanks!

XFS_SB_FEAT_INCOMPAT_PARENT should protect against that, since the
userspace xattr api does not support replacing an attr's name, only the
value.

--D

> Allison
> 
> > Basically, we have a new
> > XFS_ATTRI_OP_FLAGS... type for this, and that's what tells us that
> > we are operating on two different attributes instead of just one.
> > 
> > The recovery operation becomes slightly different - we have to run a
> > remove on the old, then a replace on the new - so there is a little bit
> > of new code needed to manage that in the state machine.
> > 
> > These, however, are just small tweaks on the existing replace attr
> > operation, and there should be little difference in performance or
> > overhead between a "replace value" and a "replace entire xattr"
> > operation as they are largely the same runtime operation for LARP.
> > 
> > > So the recovered set grows the leaf, and returns -EAGAIN, then the rest
> > > gets capture-committed.  Next up is the recovered remove which pulls
> > > out the fork, which causes problems when the rest of the set operation
> > > resumes as a deferred operation.
> > 
> > Yup, and all this goes away when we build the right intent chain for
> > replacing a parent attr rename....
> > 
> > Cheers,
> > 
> > Dave.
> 

* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-12  1:55           ` Alli
  2022-08-12  3:05             ` Darrick J. Wong
@ 2022-08-16  0:54             ` Dave Chinner
  2022-08-16  5:07               ` Darrick J. Wong
  1 sibling, 1 reply; 58+ messages in thread
From: Dave Chinner @ 2022-08-16  0:54 UTC (permalink / raw)
  To: Alli; +Cc: Darrick J. Wong, linux-xfs

On Thu, Aug 11, 2022 at 06:55:16PM -0700, Alli wrote:
> On Wed, 2022-08-10 at 16:12 +1000, Dave Chinner wrote:
> > On Tue, Aug 09, 2022 at 10:01:49PM -0700, Alli wrote:
> > > On Wed, 2022-08-10 at 11:58 +1000, Dave Chinner wrote:
> > > > On Tue, Aug 09, 2022 at 09:52:55AM -0700, Darrick J. Wong wrote:
> > > > > On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison Henderson wrote:
> > > > > > Recent parent pointer testing has exposed a bug in the underlying
> > > > > > attr replay.  A multi transaction replay currently performs a
> > > > > > single step of the replay, then defers the rest if there is more
> > > > > > to do.
> > > > 
> > > > Yup.
> > > > 
> > > > > > This causes race conditions with other attr replays that
> > > > > > might be recovered before the remaining deferred work has had a
> > > > > > chance to finish.
> > > > 
> > > > What other attr replays are we racing against?  There can only be
> > > > one incomplete attr item intent/done chain per inode present in log
> > > > recovery, right?
> > > No, a rename queues up a set and remove before committing the
> > > transaction.  One for the new parent pointer, and another to remove the
> > > old one.
> > 
> > Ah. That really needs to be described in the commit message -
> > changing from "single intent chain per object" to "multiple
> > concurrent independent and unserialised intent chains per object" is
> > a pretty important design rule change...
> > 
> > The whole point of intents is to allow complex, multi-stage
> > operations on a single object to be sequenced in a tightly
> > controlled manner. They weren't intended to be run as concurrent
> > lines of modification on single items; if you need to do two
> > modifications on an object, the intent chain ties the two
> > modifications together into a single whole.
> > 
> > One of the reasons I rewrote the attr state machine for LARP was to
> > enable new multiple attr operation chains to be easily built from
> > the entry points the state machine provides. Parent attr rename
> > needs a new intent chain to be built, not run multiple independent
> > intent chains for each modification.
> > 
> > > It can't be an attr replace because technically the names are
> > > different.
> > 
> > I disagree - we have all the pieces we need in the state machine
> > already, we just need to define separate attr names for the
> > remove and insert steps in the attr intent.
> > 
> > That is, the "replace" operation we execute when an attr set
> > overwrites the value is "technically" a "replace value" operation,
> > but we actually implement it as a "replace entire attribute"
> > operation.
> > 
> > Without LARP, we do that overwrite in independent steps via an
> > intermediate INCOMPLETE state to allow two xattrs of the same name
> > to exist in the attr tree at the same time. IOWs, the attr value
> > overwrite is effectively a "set-swap-remove" operation on two
> > entirely independent xattrs, ensuring that if we crash we always
> > have either the old or new xattr visible.
> > 
> > With LARP, we can remove the original attr first, thereby avoiding
> > the need for two versions of the xattr to exist in the tree in the
> > first place. However, we have to do these two operations as a pair
> > of linked independent operations. The intent chain provides the
> > linking, and requires us to log the name and the value of the attr
> > that we are overwriting in the intent. Hence we can always recover
> > the modification to completion no matter where in the operation we
> > fail.
> > 
> > When it comes to a parent attr rename operation, we are effectively
> > doing two linked operations - remove the old attr, set the new attr
> > - on different attributes. Implementation wise, it is exactly the
> > same sequence as a "replace value" operation, except for the fact
> > that the new attr we add has a different name.
> > 
> > Hence the only real difference between the existing "attr replace"
> > and the intent chain we need for "parent attr rename" is that we
> > have to log two attr names instead of one. 
> 
> To be clear, this would imply expanding xfs_attri_log_format to have
> another alfi_new_name_len field and another iovec for the attr intent,
> right?  Does changing the on-disk log layout after the original has
> merged cause issues?  Or is that ok for things that are still
> experimental? Thanks!

I think we can get away with this quite easily without breaking the
existing experimental code.

struct xfs_attri_log_format {
        uint16_t        alfi_type;      /* attri log item type */
        uint16_t        alfi_size;      /* size of this item */
        uint32_t        __pad;          /* pad to 64 bit aligned */
        uint64_t        alfi_id;        /* attri identifier */
        uint64_t        alfi_ino;       /* the inode for this attr operation */
        uint32_t        alfi_op_flags;  /* marks the op as a set or remove */
        uint32_t        alfi_name_len;  /* attr name length */
        uint32_t        alfi_value_len; /* attr value length */
        uint32_t        alfi_attr_filter;/* attr filter flags */
};

We have a padding field in there that is currently all zeros. Let's
make that a count of the number of {name, value} tuples that are
appended to the format. i.e.

struct xfs_attri_log_name {
        uint32_t        alfi_op_flags;  /* marks the op as a set or remove */
        uint32_t        alfi_name_len;  /* attr name length */
        uint32_t        alfi_value_len; /* attr value length */
        uint32_t        alfi_attr_filter;/* attr filter flags */
};

struct xfs_attri_log_format {
        uint16_t        alfi_type;      /* attri log item type */
        uint16_t        alfi_size;      /* size of this item */
        uint8_t         alfi_attr_cnt;  /* count of name/val pairs */
        uint8_t         __pad1;         /* pad to 64 bit aligned */
        uint16_t        __pad2;         /* pad to 64 bit aligned */
        uint64_t        alfi_id;        /* attri identifier */
        uint64_t        alfi_ino;       /* the inode for this attr operation */
        struct xfs_attri_log_name alfi_attr[]; /* attrs to operate on */
};

Basically, the size and shape of the structure has not changed, and
if alfi_attr_cnt == 0 we just treat it as if alfi_attr_cnt == 1 as
the backwards compat code for the existing code.

And then we just have as many follow-up regions for name/val pairs
as are defined by the alfi_attr_cnt and alfi_attr[] parts of the
structure. Each attr can have a different operation performed on
it, and they can have different filters applied so they can exist
in different namespaces, too.

So I don't think we need a new on-disk feature bit for this
enhancement - it definitely comes under the heading of "this stuff
is experimental; this is the sort of early structure revision that
EXPERIMENTAL is supposed to cover".

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-16  0:54             ` Dave Chinner
@ 2022-08-16  5:07               ` Darrick J. Wong
  2022-08-16 20:41                 ` Alli
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-16  5:07 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Alli, linux-xfs

On Tue, Aug 16, 2022 at 10:54:38AM +1000, Dave Chinner wrote:
> Basically, the size and shape of the structure has not changed, and
> if alfi_attr_cnt == 0 we just treat it as if alfi_attr_cnt == 1 as
> the backwards compat code for the existing code.

You might even call it "alfi_extra_names" to avoid the "0 means 1" stuff.
;)
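With this naming, the count means "names beyond the first", so a zero left in the old padding decodes to exactly one name without a special case. This is a minimal sketch. The trimmed header struct attri_hdr_sketch and the helper attri_nr_names() are hypothetical stand-ins, not existing XFS code.

```c
#include <assert.h>
#include <stdint.h>

/* Trimmed stand-in for the proposed header; only the count byte matters here. */
struct attri_hdr_sketch {
	uint16_t	alfi_type;		/* attri log item type */
	uint16_t	alfi_size;		/* size of this item */
	uint8_t		alfi_extra_names;	/* name/val pairs after the first */
	uint8_t		__pad1;			/* pad */
	uint16_t	__pad2;			/* pad */
};

/* The renamed count still fits in the original 8-byte prefix of the header. */
static_assert(sizeof(struct attri_hdr_sketch) == 8, "prefix must stay 8 bytes");

/* Every item carries at least one name; the field only counts the extras. */
static inline unsigned int
attri_nr_names(const struct attri_hdr_sketch *hdr)
{
	return 1u + hdr->alfi_extra_names;
}
```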

--D


^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-16  5:07               ` Darrick J. Wong
@ 2022-08-16 20:41                 ` Alli
  2022-08-19  1:05                   ` Alli
  0 siblings, 1 reply; 58+ messages in thread
From: Alli @ 2022-08-16 20:41 UTC (permalink / raw)
  To: Darrick J. Wong, Dave Chinner; +Cc: linux-xfs

On Mon, 2022-08-15 at 22:07 -0700, Darrick J. Wong wrote:
> You might even call it "alfi_extra_names" to avoid the "0 means 1"
> stuff. ;)
> 
> --D

Oh, I just noticed these comments this morning when I sent out the new
attri/d patch.  I'll add these changes to v2.  Please let me know if
there's anything else you'd like me to change from v1.  Thx!

Allison



^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-16 20:41                 ` Alli
@ 2022-08-19  1:05                   ` Alli
  2022-08-23 15:07                     ` Darrick J. Wong
  0 siblings, 1 reply; 58+ messages in thread
From: Alli @ 2022-08-19  1:05 UTC (permalink / raw)
  To: Darrick J. Wong, Dave Chinner; +Cc: linux-xfs

On Tue, 2022-08-16 at 13:41 -0700, Alli wrote:
> On Mon, 2022-08-15 at 22:07 -0700, Darrick J. Wong wrote:
> > You might even call it "alfi_extra_names" to avoid the "0 means 1"
> > stuff. ;)
> >
> > --D
>
> Oh, I just noticed these comments this morning when I sent out the
> new attri/d patch.  I'll add these changes to v2.  Please let me know
> if there's anything else you'd like me to change from v1.  Thx!
> 
> Allison

OK, so I am partway through coding this up, and I'm getting the
feeling this is not going to work out very well due to the size
checks for the log formats:

root@garnet:/home/achender/work_area/xfs-linux# git diff fs/xfs/libxfs/xfs_log_format.h fs/xfs/xfs_ondisk.h
diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
index f1ff52ebb982..5a4e700f32fc 100644
--- a/fs/xfs/libxfs/xfs_log_format.h
+++ b/fs/xfs/libxfs/xfs_log_format.h
@@ -922,6 +922,13 @@ struct xfs_icreate_log {
                                         XFS_ATTR_PARENT | \
                                         XFS_ATTR_INCOMPLETE)
 
+struct xfs_attri_log_name {
+       uint32_t        alfi_op_flags;  /* marks the op as a set or remove */
+       uint32_t        alfi_name_len;  /* attr name length */
+       uint32_t        alfi_value_len; /* attr value length */
+       uint32_t        alfi_attr_filter;/* attr filter flags */
+};
+
 /*
  * This is the structure used to lay out an attr log item in the
  * log.
@@ -929,14 +936,12 @@ struct xfs_icreate_log {
 struct xfs_attri_log_format {
        uint16_t        alfi_type;      /* attri log item type */
        uint16_t        alfi_size;      /* size of this item */
-       uint32_t        __pad;          /* pad to 64 bit aligned */
+       uint8_t         alfi_extra_names;/* count of name/val pairs */
+       uint8_t         __pad1;         /* pad to 64 bit aligned */
+       uint16_t        __pad2;         /* pad to 64 bit aligned */
        uint64_t        alfi_id;        /* attri identifier */
        uint64_t        alfi_ino;       /* the inode for this attr operation */
-       uint32_t        alfi_op_flags;  /* marks the op as a set or remove */
-       uint32_t        alfi_name_len;  /* attr name length */
-       uint32_t        alfi_value_len; /* attr value length */
-       uint32_t        alfi_attr_filter;/* attr filter flags */
+       struct xfs_attri_log_name alfi_attr[]; /* attrs to operate on */
 };
 
 struct xfs_attrd_log_format {
diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
index 3e7f7eaa5b96..c040eeb88def 100644
--- a/fs/xfs/xfs_ondisk.h
+++ b/fs/xfs/xfs_ondisk.h
@@ -132,7 +132,7 @@ xfs_check_ondisk_structs(void)
        XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,      56);
        XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,        20);
        XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,          16);
-       XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,      48);
+       XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,      24);
        XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,      16);
 
        /* parent pointer ioctls */
root@garnet:/home/achender/work_area/xfs-linux# 



If the on-disk size check thinks the format is 24 bytes, and then we
silently pack an array of structs after it, isn't that going to run
over the next item?  I think anything dynamic like this has to go in
its own log iovec.  Maybe we leave the existing alfi_* fields as they
are so the size doesn't change, and then if alfi_extra_names is
nonzero, we add an extra iovec that holds the array.  I think that
would work.
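The concern can be checked in ordinary C. A flexible array member adds nothing to sizeof, so a size check, or any copy sized from the struct, covers only the 24-byte header. Every trailing descriptor is extra data that something else has to account for. The sketch below uses userspace stand-ins for the structs in the diff above. attri_format_bytes() is a hypothetical helper, not XFS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xfs_attri_log_name {
	uint32_t	alfi_op_flags;		/* marks the op as a set or remove */
	uint32_t	alfi_name_len;		/* attr name length */
	uint32_t	alfi_value_len;		/* attr value length */
	uint32_t	alfi_attr_filter;	/* attr filter flags */
};

struct xfs_attri_log_format {
	uint16_t	alfi_type;		/* attri log item type */
	uint16_t	alfi_size;		/* size of this item */
	uint8_t		alfi_extra_names;	/* count of name/val pairs */
	uint8_t		__pad1;			/* pad to 64 bit aligned */
	uint16_t	__pad2;			/* pad to 64 bit aligned */
	uint64_t	alfi_id;		/* attri identifier */
	uint64_t	alfi_ino;		/* the inode for this attr operation */
	struct xfs_attri_log_name alfi_attr[];	/* attrs to operate on */
};

/* sizeof() stops at the flexible array: only the fixed header is counted. */
static_assert(sizeof(struct xfs_attri_log_format) == 24, "header only");
static_assert(sizeof(struct xfs_attri_log_name) == 16, "one descriptor");

/*
 * Hypothetical helper: bytes the format region would really need if
 * nr_names descriptors were packed directly after the header.
 */
static size_t
attri_format_bytes(unsigned int nr_names)
{
	return sizeof(struct xfs_attri_log_format) +
	       (size_t)nr_names * sizeof(struct xfs_attri_log_name);
}
```

Anything beyond the 24-byte header has to be sized and copied separately. That is consistent with carrying the array in a region of its own.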

FWIW, an alternate solution would be to use the pad for a second name
length, and then we get a patch that's very similar to the one I sent
out last Tues, but backward compatible.  Though it does eat the
remaining pad and wouldn't be as flexible, I can't think of an attr op
that would need more than two names either?
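For comparison, here is a sketch of the alternative. The 32-bit pad becomes a second name length, so the header keeps its size and every existing field keeps its offset. The alfi_new_name_len name comes from earlier in this thread. The *_sketch structs and attri_has_new_name() are hypothetical stand-ins, not XFS code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Current layout (userspace types standing in for kernel ones). */
struct attri_format_old_sketch {
	uint16_t	alfi_type;
	uint16_t	alfi_size;
	uint32_t	__pad;
	uint64_t	alfi_id;
	uint64_t	alfi_ino;
	uint32_t	alfi_op_flags;
	uint32_t	alfi_name_len;
	uint32_t	alfi_value_len;
	uint32_t	alfi_attr_filter;
};

/* Alternative: the pad carries the length of an optional second name. */
struct attri_format_new_sketch {
	uint16_t	alfi_type;
	uint16_t	alfi_size;
	uint32_t	alfi_new_name_len;	/* 0: single-name op, as before */
	uint64_t	alfi_id;
	uint64_t	alfi_ino;
	uint32_t	alfi_op_flags;
	uint32_t	alfi_name_len;
	uint32_t	alfi_value_len;
	uint32_t	alfi_attr_filter;
};

/* Same size, and the repurposed field sits exactly where the pad was. */
static_assert(sizeof(struct attri_format_new_sketch) ==
	      sizeof(struct attri_format_old_sketch), "size must not change");
static_assert(offsetof(struct attri_format_new_sketch, alfi_new_name_len) ==
	      offsetof(struct attri_format_old_sketch, __pad), "pad reuse");
static_assert(offsetof(struct attri_format_new_sketch, alfi_op_flags) ==
	      offsetof(struct attri_format_old_sketch, alfi_op_flags),
	      "existing fields keep their offsets");

/* Old items logged zero in the pad, so they never claim a second name. */
static inline int
attri_has_new_name(const struct attri_format_new_sketch *fmt)
{
	return fmt->alfi_new_name_len != 0;
}
```

The trade-off is the one noted above. The pad is used up, and the format tops out at two names.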

Let me know what people think.  Thanks!
Allison




^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-19  1:05                   ` Alli
@ 2022-08-23 15:07                     ` Darrick J. Wong
  2022-08-24 18:47                       ` Alli
  0 siblings, 1 reply; 58+ messages in thread
From: Darrick J. Wong @ 2022-08-23 15:07 UTC (permalink / raw)
  To: Alli; +Cc: Dave Chinner, linux-xfs

On Thu, Aug 18, 2022 at 06:05:54PM -0700, Alli wrote:
> On Tue, 2022-08-16 at 13:41 -0700, Alli wrote:
> > On Mon, 2022-08-15 at 22:07 -0700, Darrick J. Wong wrote:
> > > On Tue, Aug 16, 2022 at 10:54:38AM +1000, Dave Chinner wrote:
> > > > On Thu, Aug 11, 2022 at 06:55:16PM -0700, Alli wrote:
> > > > > On Wed, 2022-08-10 at 16:12 +1000, Dave Chinner wrote:
> > > > > > On Tue, Aug 09, 2022 at 10:01:49PM -0700, Alli wrote:
> > > > > > > On Wed, 2022-08-10 at 11:58 +1000, Dave Chinner wrote:
> > > > > > > > On Tue, Aug 09, 2022 at 09:52:55AM -0700, Darrick J. Wong
> > > > > > > > wrote:
> > > > > > > > > On Thu, Aug 04, 2022 at 12:39:56PM -0700, Allison
> > > > > > > > > Henderson
> > > > > > > > > wrote:
> > > > > > > > > > Recent parent pointer testing has exposed a bug in
> > > > > > > > > > the
> > > > > > > > > > underlying
> > > > > > > > > > attr replay.  A multi transaction replay currently
> > > > > > > > > > performs a
> > > > > > > > > > single step of the replay, then deferrs the rest if
> > > > > > > > > > there is
> > > > > > > > > > more
> > > > > > > > > > to do.
> > > > > > > > 
> > > > > > > > Yup.
> > > > > > > > 
> > > > > > > > > > This causes race conditions with other attr replays that
> > > > > > > > > > might be recovered before the remaining deferred work has
> > > > > > > > > > had a chance to finish.
> > > > > > > > 
> > > > > > > > What other attr replays are we racing against?  There can
> > > > > > > > only be one incomplete attr item intent/done chain per inode
> > > > > > > > present in log recovery, right?
> > > > > > > No, a rename queues up a set and remove before committing the
> > > > > > > transaction.  One for the new parent pointer, and another to
> > > > > > > remove the old one.
> > > > > > 
> > > > > > Ah. That really needs to be described in the commit message -
> > > > > > changing from "single intent chain per object" to "multiple
> > > > > > concurrent independent and unserialised intent chains per
> > > > > > object" is a pretty important design rule change...
> > > > > > 
> > > > > > The whole point of intents is to allow complex, multi-stage
> > > > > > operations on a single object to be sequenced in a tightly
> > > > > > controlled manner. They weren't intended to be run as concurrent
> > > > > > lines of modification on single items; if you need to do two
> > > > > > modifications on an object, the intent chain ties the two
> > > > > > modifications together into a single whole.
> > > > > > 
> > > > > > One of the reasons I rewrote the attr state machine for LARP was
> > > > > > to enable new multiple attr operation chains to be easily built
> > > > > > from the entry points the state machine provides. Parent attr
> > > > > > rename needs a new intent chain to be built, not run multiple
> > > > > > independent intent chains for each modification.
> > > > > > 
> > > > > > > It can't be an attr replace because technically the names are
> > > > > > > different.
> > > > > > 
> > > > > > I disagree - we have all the pieces we need in the state machine
> > > > > > already, we just need to define separate attr names for the
> > > > > > remove and insert steps in the attr intent.
> > > > > > 
> > > > > > That is, the "replace" operation we execute when an attr set
> > > > > > overwrites the value is "technically" a "replace value"
> > > > > > operation, but we actually implement it as a "replace entire
> > > > > > attribute" operation.
> > > > > > 
> > > > > > Without LARP, we do that overwrite in independent steps via an
> > > > > > intermediate INCOMPLETE state to allow two xattrs of the same
> > > > > > name to exist in the attr tree at the same time. IOWs, the attr
> > > > > > value overwrite is effectively a "set-swap-remove" operation on
> > > > > > two entirely independent xattrs, ensuring that if we crash we
> > > > > > always have either the old or new xattr visible.
> > > > > > 
> > > > > > With LARP, we can remove the original attr first, thereby
> > > > > > avoiding the need for two versions of the xattr to exist in the
> > > > > > tree in the first place. However, we have to do these two
> > > > > > operations as a pair of linked independent operations. The
> > > > > > intent chain provides the linking, and requires us to log the
> > > > > > name and the value of the attr that we are overwriting in the
> > > > > > intent. Hence we can always recover the modification to
> > > > > > completion no matter where in the operation we fail.
> > > > > > 
> > > > > > When it comes to a parent attr rename operation, we are
> > > > > > effectively doing two linked operations - remove the old attr,
> > > > > > set the new attr - on different attributes. Implementation wise,
> > > > > > it is exactly the same sequence as a "replace value" operation,
> > > > > > except for the fact that the new attr we add has a different
> > > > > > name.
> > > > > > 
> > > > > > Hence the only real difference between the existing "attr
> > > > > > replace" and the intent chain we need for "parent attr rename"
> > > > > > is that we have to log two attr names instead of one.
> > > > > 
> > > > > To be clear, this would imply expanding xfs_attri_log_format to
> > > > > have another alfi_new_name_len field and another iovec for the
> > > > > attr intent, right?  Does that cause issues to change the on disk
> > > > > log layout after the original has merged?  Or is that ok for
> > > > > things that are still experimental? Thanks!
> > > > 
> > > > I think we can get away with this quite easily without breaking
> > > > the existing experimental code.
> > > > 
> > > > struct xfs_attri_log_format {
> > > >         uint16_t        alfi_type;      /* attri log item type */
> > > >         uint16_t        alfi_size;      /* size of this item */
> > > >         uint32_t        __pad;          /* pad to 64 bit aligned */
> > > >         uint64_t        alfi_id;        /* attri identifier */
> > > >         uint64_t        alfi_ino;       /* the inode for this attr operation */
> > > >         uint32_t        alfi_op_flags;  /* marks the op as a set or remove */
> > > >         uint32_t        alfi_name_len;  /* attr name length */
> > > >         uint32_t        alfi_value_len; /* attr value length */
> > > >         uint32_t        alfi_attr_filter;/* attr filter flags */
> > > > };
> > > > 
> > > > We have a padding field in there that is currently all zeros.
> > > > Let's make that a count of the number of {name, value} tuples
> > > > that are appended to the format. i.e.
> > > > 
> > > > struct xfs_attri_log_name {
> > > >         uint32_t        alfi_op_flags;  /* marks the op as a set or remove */
> > > >         uint32_t        alfi_name_len;  /* attr name length */
> > > >         uint32_t        alfi_value_len; /* attr value length */
> > > >         uint32_t        alfi_attr_filter;/* attr filter flags */
> > > > };
> > > > 
> > > > struct xfs_attri_log_format {
> > > >         uint16_t        alfi_type;      /* attri log item type */
> > > >         uint16_t        alfi_size;      /* size of this item */
> > > >         uint8_t         alfi_attr_cnt;  /* count of name/val pairs */
> > > >         uint8_t         __pad1;         /* pad to 64 bit aligned */
> > > >         uint16_t        __pad2;         /* pad to 64 bit aligned */
> > > >         uint64_t        alfi_id;        /* attri identifier */
> > > >         uint64_t        alfi_ino;       /* the inode for this attr operation */
> > > >         struct xfs_attri_log_name alfi_attr[]; /* attrs to operate on */
> > > > };
> > > > 
> > > > Basically, the size and shape of the structure has not changed,
> > > > and if alfi_attr_cnt == 0 we just treat it as if alfi_attr_cnt == 1
> > > > as the backwards compat code for the existing code.
> > > > 
> > > > And then we just have as many followup regions for name/val pairs
> > > > as are defined by the alfi_attr_cnt and alfi_attr[] parts of the
> > > > structure. Each attr can have a different operation performed on
> > > > them, and they can have different filters applied so they can
> > > > exist in different namespaces, too.
> > > > 
> > > > So I don't think we need a new on-disk feature bit for this
> > > > enhancement - it definitely comes under the heading of "this stuff
> > > > is experimental, this is the sort of early structure revision that
> > > > EXPERIMENTAL is supposed to cover"....
> > > 
> > > You might even call it "alfi_extra_names" to avoid the "0 means 1"
> > > stuff. ;)
> > > 
> > > --D
> > 
> > Oh, I just noticed these comments this morning when I sent out the
> > new attri/d patch.  I'll add these changes to v2.  Please let me know
> > if there's anything else you'd like me to change from the v1.  Thx!
> > 
> > Allison
> 
> Ok, so I am part way through coding this up, and I'm getting this
> feeling like this is not going to work out very well due to the size
> checks for the log formats:
> 
> root@garnet:/home/achender/work_area/xfs-linux# git diff fs/xfs/libxfs/xfs_log_format.h fs/xfs/xfs_ondisk.h
> diff --git a/fs/xfs/libxfs/xfs_log_format.h b/fs/xfs/libxfs/xfs_log_format.h
> index f1ff52ebb982..5a4e700f32fc 100644
> --- a/fs/xfs/libxfs/xfs_log_format.h
> +++ b/fs/xfs/libxfs/xfs_log_format.h
> @@ -922,6 +922,13 @@ struct xfs_icreate_log {
>                                          XFS_ATTR_PARENT | \
>                                          XFS_ATTR_INCOMPLETE)
>  
> +struct xfs_attri_log_name {
> +       uint32_t        alfi_op_flags;  /* marks the op as a set or remove */
> +       uint32_t        alfi_name_len;  /* attr name length */
> +       uint32_t        alfi_value_len; /* attr value length */
> +       uint32_t        alfi_attr_filter;/* attr filter flags */
> +};
> +
>  /*
>   * This is the structure used to lay out an attr log item in the
>   * log.
> @@ -929,14 +936,12 @@ struct xfs_icreate_log {
>  struct xfs_attri_log_format {
>         uint16_t        alfi_type;      /* attri log item type */
>         uint16_t        alfi_size;      /* size of this item */
> -       uint32_t        __pad;          /* pad to 64 bit aligned */
> +       uint8_t         alfi_extra_names;/* count of name/val pairs */
> +       uint8_t         __pad1;         /* pad to 64 bit aligned */
> +       uint16_t        __pad2;         /* pad to 64 bit aligned */
>         uint64_t        alfi_id;        /* attri identifier */
>         uint64_t        alfi_ino;       /* the inode for this attr operation */
> -       uint32_t        alfi_op_flags;  /* marks the op as a set or remove */
> -       uint32_t        alfi_name_len;  /* attr name length */
> -       uint32_t        alfi_value_len; /* attr value length */
> -       uint32_t        alfi_attr_filter;/* attr filter flags */
> +       struct xfs_attri_log_name alfi_attr[]; /* attrs to operate on

What's the length of this VLA?  1 for a normal SET or REPLACE
operation, and 2 for the "rename and replace value" operation?

If so, why do we need two xfs_attri_log_name structures?  The old value
is unimportant, so we only need one alfi_value_len per operation.  Each
xfs_attri_log_format only describes one change, so it only needs one
alfi_op_flags per op.

For now I also don't think attributes should be able to jump namespaces,
so we'd only need one alfi_attr_filter per op as well.

*lightbulb comes on*  Oops, I think I led you astray with my unfortunate
comment. :(

IOWs, the only change to struct xfs_attri_log_format is:

-       uint32_t        __pad;          /* pad to 64 bit aligned */
+       uint32_t        alfi_new_namelen;/* new attr name length */

and the rest of the changes in "[PATCH] xfs: Add new name to attri/d"
are more or less fine as is.

I'll go reply to that before I get back to Dave's log accounting stuff.

--D

> */
>  };
>  
>  struct xfs_attrd_log_format {
> diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
> index 3e7f7eaa5b96..c040eeb88def 100644
> --- a/fs/xfs/xfs_ondisk.h
> +++ b/fs/xfs/xfs_ondisk.h
> @@ -132,7 +132,7 @@ xfs_check_ondisk_structs(void)
>         XFS_CHECK_STRUCT_SIZE(struct xfs_inode_log_format,      56);
>         XFS_CHECK_STRUCT_SIZE(struct xfs_qoff_logformat,        20);
>         XFS_CHECK_STRUCT_SIZE(struct xfs_trans_header,          16);
> -       XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,      48);
> +       XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,      24);
>         XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,      16);
>  
>         /* parent pointer ioctls */
> root@garnet:/home/achender/work_area/xfs-linux# 
> 
> 
> 
> If the on disk size check thinks the format is 24 bytes, and then we
> surprise pack an array of structs after it, isn't that going to run over
> the next item?  I think anything dynamic like this has to be an nvec.
> Maybe we leave the existing alfi_* as they are so the size doesn't
> change, and then if we have a value in alfi_extra_names, then we have
> an extra nvec that has the array in it.  I think that would work.
> 
> FWIW, an alternate solution would be to use the pad for a second name
> length, and then we get a patch that's very similar to the one I sent
> out last Tues, but backward compatible.  Though it does eat the
> remaining pad and wouldn't be as flexible, I can't think of an attr op
> that would need more than two names either?
> 
> Let me know what people think.  Thanks!
> Allison
> 
> 
> > > > Cheers,
> > > > 
> > > > Dave.
> > > > -- 
> > > > Dave Chinner
> > > > david@fromorbit.com
> 


* Re: [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay
  2022-08-23 15:07                     ` Darrick J. Wong
@ 2022-08-24 18:47                       ` Alli
  0 siblings, 0 replies; 58+ messages in thread
From: Alli @ 2022-08-24 18:47 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: Dave Chinner, linux-xfs

On Tue, 2022-08-23 at 08:07 -0700, Darrick J. Wong wrote:
> What's the length of this VLA?  1 for a normal SET or REPLACE
> operation, and 2 for the "rename and replace value" operation?
> 
> If so, why do we need two xfs_attri_log_name structures?  The old value
> is unimportant, so we only need one alfi_value_len per operation.  Each
> xfs_attri_log_format only describes one change, so it only needs one
> alfi_op_flags per op.
> 
> For now I also don't think attributes should be able to jump namespaces,
> so we'd only need one alfi_attr_filter per op as well.
>
> *lightbulb comes on*  Oops, I think I led you astray with my unfortunate
> comment. :(
> 
> IOWs, the only change to struct xfs_attri_log_format is:
> 
> -       uint32_t        __pad;          /* pad to 64 bit aligned */
> +       uint32_t        alfi_new_namelen;/* new attr name length */
> 
> and the rest of the changes in "[PATCH] xfs: Add new name to attri/d"
> are more or less fine as is.
> 
> I'll go reply to that before I get back to Dave's log accounting stuff.
> 
> --D
Alrighty, I think that's the simplest solution for now.  Will switch to
that thread....




* Re: [PATCH RESEND v2 13/18] xfs: add parent attributes to link
  2022-08-10  3:09     ` Alli
@ 2022-09-23 20:25       ` Darrick J. Wong
  0 siblings, 0 replies; 58+ messages in thread
From: Darrick J. Wong @ 2022-09-23 20:25 UTC (permalink / raw)
  To: Alli; +Cc: linux-xfs

On Tue, Aug 09, 2022 at 08:09:15PM -0700, Alli wrote:
> On Tue, 2022-08-09 at 11:43 -0700, Darrick J. Wong wrote:
> > On Thu, Aug 04, 2022 at 12:40:08PM -0700, Allison Henderson wrote:
> > > This patch modifies xfs_link to add a parent pointer to the inode.
> > > 
> > > [bfoster: rebase, use VFS inode fields, fix xfs_bmap_finish() usage]
> > > [achender: rebased, changed __uint32_t to xfs_dir2_dataptr_t,
> > >            fixed null pointer bugs]
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > ---
> > >  fs/xfs/xfs_inode.c | 43 ++++++++++++++++++++++++++++++++++---------
> > >  1 file changed, 34 insertions(+), 9 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
> > > index ef993c3a8963..6e5deb0d42c4 100644
> > > --- a/fs/xfs/xfs_inode.c
> > > +++ b/fs/xfs/xfs_inode.c
> > > @@ -1228,14 +1228,16 @@ xfs_create_tmpfile(
> > >  
> > >  int
> > >  xfs_link(
> > > -	xfs_inode_t		*tdp,
> > > -	xfs_inode_t		*sip,
> > > +	struct xfs_inode	*tdp,
> > > +	struct xfs_inode	*sip,
> > >  	struct xfs_name		*target_name)
> > >  {
> > > -	xfs_mount_t		*mp = tdp->i_mount;
> > > -	xfs_trans_t		*tp;
> > > +	struct xfs_mount	*mp = tdp->i_mount;
> > > +	struct xfs_trans	*tp;
> > >  	int			error, nospace_error = 0;
> > >  	int			resblks;
> > > +	xfs_dir2_dataptr_t	diroffset;
> > > +	struct xfs_parent_defer	*parent = NULL;
> > >  
> > >  	trace_xfs_link(tdp, target_name);
> > >  
> > > @@ -1252,11 +1254,17 @@ xfs_link(
> > >  	if (error)
> > >  		goto std_return;
> > >  
> > > +	if (xfs_has_parent(mp)) {
> > > +		error = xfs_parent_init(mp, sip, target_name, &parent);
> > 
> > Why does xfs_parent_init check xfs_has_parent if the callers already
> > do that?
> It was part of the solution outlined in the last review.  It is
> redundant, but not an inappropriate sanity check for that function
> either. I can remove it from the helper if it bothers folks. 
> 
> 
> > 
> > > +		if (error)
> > > +			goto std_return;
> > > +	}
> > > +
> > >  	resblks = XFS_LINK_SPACE_RES(mp, target_name->len);
> > 
> > Same comment about increasing XFS_LINK_SPACE_RES to accommodate xattr
> > expansion as I had for the last patch.
> So we do use XFS_LINK_SPACE_RES here, but didn't we update the tr_link
> below in patch 11 to accommodate for the extra space?  Maybe I'm not
> understanding why we would need both?

D'oh, I apparently forgot to respond to this. :/

tr_res == space we reserve in the *log* to record updates.

XFS_LINK_SPACE_RES == blocks we reserve from the filesystem free space to
handle expansions of metadata structures.

At this point in this version of the patchset, you've increased the log
space reservations in anticipation of logging more information per
transaction.  However, you've not increased the free space reservations
to handle potential node splitting in the ondisk xattr btree.

(Will copy this to my reply for the patch resend.)

--D

> > 
> > >  	error = xfs_trans_alloc_dir(tdp, &M_RES(mp)->tr_link, sip, &resblks,
> > >  			&tp, &nospace_error);
> > >  	if (error)
> > > -		goto std_return;
> > > +		goto drop_incompat;
> > >  
> > >  	/*
> > >  	 * If we are using project inheritance, we only allow hard link
> > > @@ -1289,14 +1297,26 @@ xfs_link(
> > >  	}
> > >  
> > >  	error = xfs_dir_createname(tp, tdp, target_name, sip->i_ino,
> > > -				   resblks, NULL);
> > > +				   resblks, &diroffset);
> > >  	if (error)
> > > -		goto error_return;
> > > +		goto out_defer_cancel;
> > >  	xfs_trans_ichgtime(tp, tdp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
> > >  	xfs_trans_log_inode(tp, tdp, XFS_ILOG_CORE);
> > >  
> > >  	xfs_bumplink(tp, sip);
> > >  
> > > +	/*
> > > +	 * If we have parent pointers, we now need to add the parent record to
> > > +	 * the attribute fork of the inode. If this is the initial parent
> > > +	 * attribute, we need to create it correctly, otherwise we can just add
> > > +	 * the parent to the inode.
> > > +	 */
> > > +	if (parent) {
> > > +		error = xfs_parent_defer_add(tp, tdp, parent, diroffset);
> > 
> > A followup to the comments I made to the previous patch about
> > parent->args.dp --
> > 
> > Since you're partially initializing the xfs_parent_defer structure
> > before you even have the dir offset, why not delay initializing the
> > parent and child pointers until the xfs_parent_defer_add step?
> > 
> > int
> > xfs_parent_init(
> > 	struct xfs_mount		*mp,
> > 	struct xfs_parent_defer		**parentp)
> > {
> > 	struct xfs_parent_defer		*parent;
> > 	int				error;
> > 
> > 	if (!xfs_has_parent(mp))
> > 		return 0;
> > 
> > 	error = xfs_attr_grab_log_assist(mp);
> > 	if (error)
> > 		return error;
> > 
> > 	parent = kzalloc(sizeof(*parent), GFP_KERNEL);
> > 	if (!parent)
> > 		return -ENOMEM;
> > 
> > 	/* init parent da_args */
> > 	parent->args.geo = mp->m_attr_geo;
> > 	parent->args.whichfork = XFS_ATTR_FORK;
> > 	parent->args.attr_filter = XFS_ATTR_PARENT;
> > 	parent->args.op_flags = XFS_DA_OP_OKNOENT | XFS_DA_OP_LOGGED;
> > 	parent->args.name = (const uint8_t *)&parent->rec;
> > 	parent->args.namelen = sizeof(struct xfs_parent_name_rec);
> > 
> > 	*parentp = parent;
> > 	return 0;
> > }
> > 
> > int
> > xfs_parent_defer_add(
> > 	struct xfs_trans	*tp,
> > 	struct xfs_parent_defer	*parent,
> > 	struct xfs_inode	*dp,
> > 	struct xfs_name		*parent_name,
> > 	xfs_dir2_dataptr_t	parent_offset,
> > 	struct xfs_inode	*child)
> > {
> > 	struct xfs_da_args	*args = &parent->args;
> > 
> > 	xfs_init_parent_name_rec(&parent->rec, dp, parent_offset);
> > 	args->hashval = xfs_da_hashname(args->name, args->namelen);
> > 
> > 	args->trans = tp;
> > 	args->dp = child;
> > 	if (parent_name) {
> > 		args->value = (void *)parent_name->name;
> > 		args->valuelen = parent_name->len;
> > 	}
> > 	return xfs_attr_defer_add(args);
> > }
> > 
> > And then the callsites become:
> > 
> > 	/*
> > 	 * If we have parent pointers, we now need to add the parent record to
> > 	 * the attribute fork of the inode. If this is the initial parent
> > 	 * attribute, we need to create it correctly, otherwise we can just add
> > 	 * the parent to the inode.
> > 	 */
> > 	if (parent) {
> > 		error = xfs_parent_defer_add(tp, parent, tdp,
> > 				target_name, diroffset, sip);
> > 		if (error)
> > 			goto out_defer_cancel;
> > 	}
> Sure, I can scoot that part down to the defer_add helper. Thanks for
> the reviews!
> 
> Allison
> > 
> > Aside from the API suggestions, the rest looks good to me.
> > 
> > --D
> > 
> > > +		if (error)
> > > +			goto out_defer_cancel;
> > > +	}
> > > +
> > >  	/*
> > >  	 * If this is a synchronous mount, make sure that the
> > >  	 * link transaction goes to disk before returning to
> > > @@ -1310,11 +1330,16 @@ xfs_link(
> > >  	xfs_iunlock(sip, XFS_ILOCK_EXCL);
> > >  	return error;
> > >  
> > > - error_return:
> > > +out_defer_cancel:
> > > +	xfs_defer_cancel(tp);
> > > +error_return:
> > >  	xfs_trans_cancel(tp);
> > >  	xfs_iunlock(tdp, XFS_ILOCK_EXCL);
> > >  	xfs_iunlock(sip, XFS_ILOCK_EXCL);
> > > - std_return:
> > > +drop_incompat:
> > > +	if (parent)
> > > +		xfs_parent_cancel(mp, parent);
> > > +std_return:
> > >  	if (error == -ENOSPC && nospace_error)
> > >  		error = nospace_error;
> > >  	return error;
> > > -- 
> > > 2.25.1
> > > 
> 


* Re: [PATCH RESEND v2 18/18] xfs: Add parent pointer ioctl
  2022-08-10  3:09     ` Alli
@ 2022-09-24  0:01       ` Darrick J. Wong
  0 siblings, 0 replies; 58+ messages in thread
From: Darrick J. Wong @ 2022-09-24  0:01 UTC (permalink / raw)
  To: Alli; +Cc: linux-xfs

On Tue, Aug 09, 2022 at 08:09:51PM -0700, Alli wrote:
> On Tue, 2022-08-09 at 12:26 -0700, Darrick J. Wong wrote:
> > On Thu, Aug 04, 2022 at 12:40:13PM -0700, Allison Henderson wrote:
> > > This patch adds a new file ioctl to retrieve the parent pointer of a
> > > given inode
> > > 
> > > Signed-off-by: Allison Henderson <allison.henderson@oracle.com>
> > > ---
> > >  fs/xfs/Makefile            |   1 +
> > >  fs/xfs/libxfs/xfs_fs.h     |  57 ++++++++++++++++
> > >  fs/xfs/libxfs/xfs_parent.c |  10 +++
> > >  fs/xfs/libxfs/xfs_parent.h |   2 +
> > >  fs/xfs/xfs_ioctl.c         |  95 +++++++++++++++++++++++++-
> > >  fs/xfs/xfs_ondisk.h        |   4 ++
> > >  fs/xfs/xfs_parent_utils.c  | 134 +++++++++++++++++++++++++++++++++++++
> > >  fs/xfs/xfs_parent_utils.h  |  22 ++++++
> > >  8 files changed, 323 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/fs/xfs/Makefile b/fs/xfs/Makefile
> > > index caeea8d968ba..998658e40ab4 100644
> > > --- a/fs/xfs/Makefile
> > > +++ b/fs/xfs/Makefile
> > > @@ -86,6 +86,7 @@ xfs-y				+= xfs_aops.o \
> > >  				   xfs_mount.o \
> > >  				   xfs_mru_cache.o \
> > >  				   xfs_pwork.o \
> > > +				   xfs_parent_utils.o \
> > >  				   xfs_reflink.o \
> > >  				   xfs_stats.o \
> > >  				   xfs_super.o \
> > > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > > index b0b4d7a3aa15..ba6ec82a0272 100644
> > > --- a/fs/xfs/libxfs/xfs_fs.h
> > > +++ b/fs/xfs/libxfs/xfs_fs.h
> > > @@ -574,6 +574,7 @@ typedef struct xfs_fsop_handlereq {
> > >  #define XFS_IOC_ATTR_SECURE	0x0008	/* use attrs in security namespace */
> > >  #define XFS_IOC_ATTR_CREATE	0x0010	/* fail if attr already exists */
> > >  #define XFS_IOC_ATTR_REPLACE	0x0020	/* fail if attr does not exist */
> > > +#define XFS_IOC_ATTR_PARENT	0x0040  /* use attrs in parent namespace */
> > 
> > This is the userspace API header, so I wonder -- should we allow
> > XFS_IOC_ATTRLIST_BY_HANDLE and XFS_IOC_ATTRMULTI_BY_HANDLE to access
> > parent pointers?
> Well, the ioc is how the test cases get the pptrs back out in order to
> verify parent pointers are working.  So we need to keep at least that,
> but then I think it makes worrying about other forms of access feel
> sort of silly since we're not really hiding anything.  They would have
> to pass in the parent filter flag which wasn't allowable until now, so
> it's not like having pptrs appear in the list when asked for is
> inappropriate.
> 
> > 
> > I think it's *definitely* incorrect to let ATTR_OP_REMOVE or
> > ATTR_OP_SET (attrmulti subcommands) mess with parent pointers.
> Ok, I can see if I can add some sanity checking there.
> 
> > 
> > I don't think attrlist or ATTR_OP_GET should be touching them either,
> > particularly since you're defining a new ioctl to extract *only* the
> > parent pointers.
> > 
> > If there wasn't XFS_IOC_GETPPOINTER then perhaps it would be ok to
> > allow reads via ATTRLIST/ATTRMULTI.  But even then, I don't think we want
> > things like xfsdump to think that it has to preserve those attributes
> > since xfsrestore will reconstruct the directory tree (and hence the
> > pptrs) for us.
> Hrmm, not sure I follow this part.  The point of pptrs is to
> reconstruct the tree, so wouldn't we want them preserved?

Parent pointers are backwards links through the directory tree.  xfsdump
already records the forward links in the dump file.  xfsrestore uses
those forward links to rebuild the directory tree, which recreates the
parent pointers automatically.  Hence we don't need ATTRMULTI to reveal
(or recreate) the parent pointer xattrs; the kernel does that when we
create the directory tree.
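A toy model may make the forward/backward relationship concrete. The structures below are entirely hypothetical (plain C, not XFS code): if a restore recreates only the directory entries, every parent pointer of an inode can be recomputed from them, which is why the pptr xattrs need not travel in the dump stream.

```c
#include <assert.h>	/* assert() for usage checks */
#include <stddef.h>
#include <stdint.h>
#include <string.h>	/* strcmp() for usage checks */

/* Hypothetical forward link: one directory entry. */
struct toy_dirent {
	uint64_t	dir_ino;	/* directory holding the entry */
	const char	*name;		/* entry name */
	uint64_t	child_ino;	/* inode the entry points at */
};

/* Hypothetical backward link: one parent pointer. */
struct toy_pptr {
	uint64_t	parent_ino;
	const char	*name;
};

/*
 * Derive the parent pointers of child_ino purely from the directory
 * entries.  Returns how many exist; stores at most max of them in out[].
 */
static size_t
toy_parents_of(const struct toy_dirent *ents, size_t nents,
	       uint64_t child_ino, struct toy_pptr *out, size_t max)
{
	size_t	found = 0;

	for (size_t i = 0; i < nents; i++) {
		if (ents[i].child_ino != child_ino)
			continue;
		if (found < max) {
			out[found].parent_ino = ents[i].dir_ino;
			out[found].name = ents[i].name;
		}
		found++;
	}
	return found;
}
```

A hard-linked inode simply shows up once per directory entry that names it.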

The second reason I can think of why we don't want to expose the parent
pointers through the xattr APIs is that we don't want to reveal ondisk
metadata directly to users -- some day we might want to change what's
stored on disk, or store them in a totally separate structure, or
whatever.

If we force the interface to be the GETPARENTS ioctl, then we've
decoupled the front and backends.  I conclude that the /only/ userspace
API that should ever touch parent pointers is XFS_IOC_GETPARENTS.
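The decoupling amounts to a translation step between the ondisk record and the ABI struct. The patch's xfs_init_parent_ptr() does this with be64_to_cpu()/be32_to_cpu(); the userspace sketch below uses hypothetical stand-in structs and decodes the big-endian bytes by hand so it behaves the same on any host. It also takes the source as const, as suggested further down in the review. If the ondisk format ever changed, only this function would need to follow it.

```c
#include <assert.h>	/* assert() for usage checks */
#include <stdint.h>

/* Hypothetical stand-in for the ondisk xattr name: big-endian bytes. */
struct demo_pptr_rec {
	uint8_t	p_ino[8];
	uint8_t	p_gen[4];
	uint8_t	p_diroffset[4];
};

/* Hypothetical stand-in for the userspace ABI record: native byte order. */
struct demo_pptr_abi {
	uint64_t	xpp_ino;
	uint32_t	xpp_gen;
	uint32_t	xpp_diroffset;
};

/* Decode an n-byte big-endian integer, independent of host byte order. */
static uint64_t
demo_get_be(const uint8_t *p, int n)
{
	uint64_t	v = 0;

	for (int i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* The source record is const: only the ABI struct is written. */
static void
demo_init_parent_ptr(struct demo_pptr_abi *xpp,
		     const struct demo_pptr_rec *rec)
{
	xpp->xpp_ino = demo_get_be(rec->p_ino, 8);
	xpp->xpp_gen = (uint32_t)demo_get_be(rec->p_gen, 4);
	xpp->xpp_diroffset = (uint32_t)demo_get_be(rec->p_diroffset, 4);
}
```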

I'll forward this on to the v3 thread too.

--D

> > 
> > >  
> > >  typedef struct xfs_attrlist_cursor {
> > >  	__u32		opaque[4];
> > > @@ -752,6 +753,61 @@ struct xfs_scrub_metadata {
> > >  				 XFS_SCRUB_OFLAG_NO_REPAIR_NEEDED)
> > >  #define XFS_SCRUB_FLAGS_ALL	(XFS_SCRUB_FLAGS_IN | XFS_SCRUB_FLAGS_OUT)
> > >  
> > > +#define XFS_PPTR_MAXNAMELEN				256
> > > +
> > > +/* return parents of the handle, not the open fd */
> > > +#define XFS_PPTR_IFLAG_HANDLE  (1U << 0)
> > > +
> > > +/* target was the root directory */
> > > +#define XFS_PPTR_OFLAG_ROOT    (1U << 1)
> > > +
> > > +/* Cursor is done iterating pptrs */
> > > +#define XFS_PPTR_OFLAG_DONE    (1U << 2)
> > > +
> > > +/* Get an inode parent pointer through ioctl */
> > > +struct xfs_parent_ptr {
> > > +	__u64		xpp_ino;			/* Inode */
> > > +	__u32		xpp_gen;			/* Inode generation */
> > > +	__u32		xpp_diroffset;			/* Directory offset */
> > > +	__u32		xpp_namelen;			/* File name length */
> > > +	__u32		xpp_pad;
> > > +	__u8		xpp_name[XFS_PPTR_MAXNAMELEN];	/* File name */
> > 
> > Since xpp_name is a fixed-length array that is long enough to ensure
> > that there's a null at the end of the name, we don't need
> > xpp_namelen.
> > 
> > I wonder if xpp_namelen and xpp_pad should simply turn into a u64
> > field that's defined to be zero for future expansion?
> Sure, I'll see if I can remove it and add a reserved field
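One possible shape for that change, as a userspace sketch with a hypothetical struct name and stdint types in place of __u64/__u32: folding xpp_namelen and xpp_pad into a single reserved u64 keeps the structure at the 280 bytes asserted in xfs_ondisk.h, so the name array does not move.

```c
#include <assert.h>	/* static_assert() */
#include <stddef.h>	/* offsetof() */
#include <stdint.h>

#define DEMO_PPTR_MAXNAMELEN	256

/* Hypothetical revised layout; xpp_reserved replaces xpp_namelen + xpp_pad. */
struct demo_parent_ptr {
	uint64_t	xpp_ino;		/* parent inode number */
	uint32_t	xpp_gen;		/* parent inode generation */
	uint32_t	xpp_diroffset;		/* dirent offset in the parent */
	uint64_t	xpp_reserved;		/* must be zero; future expansion */
	uint8_t		xpp_name[DEMO_PPTR_MAXNAMELEN];	/* NUL-terminated */
};

/* The ABI size and name offset stay where they were. */
static_assert(sizeof(struct demo_parent_ptr) == 280, "ABI size changed");
static_assert(offsetof(struct demo_parent_ptr, xpp_name) == 24,
	      "xpp_name moved");
```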
> 
> > 
> > > +};
> > > +
> > > +/* Iterate through an inodes parent pointers */
> > > +struct xfs_pptr_info {
> > > +	struct xfs_handle		pi_handle;
> > > +	struct xfs_attrlist_cursor	pi_cursor;
> > > +	__u32				pi_flags;
> > > +	__u32				pi_reserved;
> > > +	__u32				pi_ptrs_size;
> > 
> > Is this the number of elements in pi_parents[]?
> Yes, it's the number of parent pointers in the array
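Because pi_ptrs_size is an element count, the buffer size is 104 + 280 * pi_ptrs_size bytes, using the sizes the patch asserts in xfs_ondisk.h. The sketch below works out the largest count that passes the patch's "> XFS_XATTR_LIST_MAX" check; the 65536 value used for XFS_XATTR_LIST_MAX is an assumption here, and the helper names are hypothetical.

```c
#include <assert.h>	/* assert() for usage checks */
#include <stddef.h>

#define DEMO_PPTR_INFO_SIZE	104	/* struct xfs_pptr_info (xfs_ondisk.h) */
#define DEMO_PARENT_PTR_SIZE	280	/* struct xfs_parent_ptr (xfs_ondisk.h) */
#define DEMO_XATTR_LIST_MAX	65536	/* assumed XFS_XATTR_LIST_MAX */

/* Mirrors xfs_pptr_info_sizeof(): header plus nr_ptrs trailing records. */
static size_t
demo_pptr_info_sizeof(size_t nr_ptrs)
{
	return DEMO_PPTR_INFO_SIZE + nr_ptrs * DEMO_PARENT_PTR_SIZE;
}

/* Largest pi_ptrs_size for which the size check does not fail. */
static size_t
demo_max_ptrs(void)
{
	return (DEMO_XATTR_LIST_MAX - DEMO_PPTR_INFO_SIZE) /
	       DEMO_PARENT_PTR_SIZE;
}
```

Under that assumption a single call can return at most 233 parent pointers (65344 bytes); 234 would need 65624.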
> 
> > 
> > > +	__u32				pi_ptrs_used;
> > > +	__u64				pi_reserved2[6];
> > > +
> > > +	/*
> > > +	 * An array of struct xfs_parent_ptr follows the header
> > > +	 * information. Use XFS_PPINFO_TO_PP() to access the
> > > +	 * parent pointer array entries.
> > > +	 */
> > > +	struct xfs_parent_ptr		pi_parents[];
> > > +};
> > > +
> > > +static inline size_t
> > > +xfs_pptr_info_sizeof(int nr_ptrs)
> > > +{
> > > +	return sizeof(struct xfs_pptr_info) +
> > > +	       (nr_ptrs * sizeof(struct xfs_parent_ptr));
> > > +}
> > > +
> > > +static inline struct xfs_parent_ptr*
> > > +xfs_ppinfo_to_pp(
> > > +	struct xfs_pptr_info	*info,
> > > +	int			idx)
> > > +{
> > > +
> > 
> > Nit: extra space.
> Will fix
> 
> > 
> > > +	return &info->pi_parents[idx];
> > > +}
> > > +
> > >  /*
> > >   * ioctl limits
> > >   */
> > > @@ -797,6 +853,7 @@ struct xfs_scrub_metadata {
> > >  /*	XFS_IOC_GETFSMAP ------ hoisted 59         */
> > >  #define XFS_IOC_SCRUB_METADATA	_IOWR('X', 60, struct xfs_scrub_metadata)
> > >  #define XFS_IOC_AG_GEOMETRY	_IOWR('X', 61, struct xfs_ag_geometry)
> > > +#define XFS_IOC_GETPPOINTER	_IOR ('X', 62, struct xfs_parent_ptr)
> > 
> > I wonder if this name should more strongly emphasize that it's for
> > reading the parents of a file?
> > 
> > #define XFS_IOC_GETPARENTS	_IOWR(...)
> Sure, that sounds fine, I think
> 
> > 
> > Also, the ioctl reads and writes its parameter, so this is _IOWR, not
> > _IOR.
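The direction bits are visible with the standard _IOC_* decoding macros from <linux/ioctl.h>. This sketch uses a hypothetical name and a hypothetical 104-byte stand-in for the argument; sizing it like struct xfs_pptr_info (the header the handler actually copies in and back out) rather than struct xfs_parent_ptr is an assumption about the intended argument type.

```c
#include <assert.h>	/* assert() for usage checks */
#include <linux/ioctl.h>

/* Hypothetical 104-byte stand-in for struct xfs_pptr_info. */
struct demo_pptr_info {
	unsigned long long	opaque[13];
};

/* Renamed as suggested; _IOWR because the kernel reads and writes it. */
#define DEMO_IOC_GETPARENTS	_IOWR('X', 62, struct demo_pptr_info)
```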
> > 
> > BTW, is there a sample manpage somewhere?
> The userspace branch adds some new flags to xfsprogs and some usage
> help to explain how to use them.  See the last patch in the branch:
> https://github.com/allisonhenderson/xfsprogs/tree/xfsprogs_new_pptrsv2
> 
> But it's just for printing the parent pointers out, it doesn't have a
> man page for how to write your own ioctl.  I suppose we could add it
> though.
> 
> > 
> > >  
> > >  /*
> > >   * ioctl commands that replace IRIX syssgi()'s
> > > diff --git a/fs/xfs/libxfs/xfs_parent.c b/fs/xfs/libxfs/xfs_parent.c
> > > index 03f03f731d02..d9c922a78617 100644
> > > --- a/fs/xfs/libxfs/xfs_parent.c
> > > +++ b/fs/xfs/libxfs/xfs_parent.c
> > > @@ -26,6 +26,16 @@
> > >  #include "xfs_xattr.h"
> > >  #include "xfs_parent.h"
> > >  
> > > +/* Initializes a xfs_parent_ptr from an xfs_parent_name_rec */
> > > +void
> > > +xfs_init_parent_ptr(struct xfs_parent_ptr	*xpp,
> > > +		    struct xfs_parent_name_rec	*rec)
> > 
> > The second parameter ought to be const struct xfs_parent_name_rec *rec
> > to make it unambiguous to readers which is the source and which is the
> > destination argument.
> Ok, will update
> 
> > 
> > > +{
> > > +	xpp->xpp_ino = be64_to_cpu(rec->p_ino);
> > > +	xpp->xpp_gen = be32_to_cpu(rec->p_gen);
> > > +	xpp->xpp_diroffset = be32_to_cpu(rec->p_diroffset);
> > > +}
> > > +
> > >  /*
> > >   * Parent pointer attribute handling.
> > >   *
> > > diff --git a/fs/xfs/libxfs/xfs_parent.h b/fs/xfs/libxfs/xfs_parent.h
> > > index 67948f4b3834..53161b79d1e2 100644
> > > --- a/fs/xfs/libxfs/xfs_parent.h
> > > +++ b/fs/xfs/libxfs/xfs_parent.h
> > > @@ -23,6 +23,8 @@ void xfs_init_parent_name_rec(struct xfs_parent_name_rec *rec,
> > >  			      uint32_t p_diroffset);
> > >  void xfs_init_parent_name_irec(struct xfs_parent_name_irec *irec,
> > >  			       struct xfs_parent_name_rec *rec);
> > > +void xfs_init_parent_ptr(struct xfs_parent_ptr *xpp,
> > > +			 struct xfs_parent_name_rec *rec);
> > >  int xfs_parent_init(xfs_mount_t *mp, xfs_inode_t *ip,
> > >  		    struct xfs_name *target_name,
> > >  		    struct xfs_parent_defer **parentp);
> > > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > > index 5b600d3f7981..8a9530588ef4 100644
> > > --- a/fs/xfs/xfs_ioctl.c
> > > +++ b/fs/xfs/xfs_ioctl.c
> > > @@ -37,6 +37,7 @@
> > >  #include "xfs_health.h"
> > >  #include "xfs_reflink.h"
> > >  #include "xfs_ioctl.h"
> > > +#include "xfs_parent_utils.h"
> > >  #include "xfs_xattr.h"
> > >  
> > >  #include <linux/mount.h>
> > > @@ -355,6 +356,8 @@ xfs_attr_filter(
> > >  		return XFS_ATTR_ROOT;
> > >  	if (ioc_flags & XFS_IOC_ATTR_SECURE)
> > >  		return XFS_ATTR_SECURE;
> > > +	if (ioc_flags & XFS_IOC_ATTR_PARENT)
> > > +		return XFS_ATTR_PARENT;
> > >  	return 0;
> > >  }
> > >  
> > > @@ -422,7 +425,8 @@ xfs_ioc_attr_list(
> > >  	/*
> > >  	 * Reject flags, only allow namespaces.
> > >  	 */
> > > -	if (flags & ~(XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE))
> > > +	if (flags & ~(XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE |
> > > +		      XFS_IOC_ATTR_PARENT))
> > >  		return -EINVAL;
> > 
> > I think xfs_ioc_attrmulti_one needs filtering for XFS_IOC_ATTR_PARENT,
> > if we're still going to allow attrlist/attrmulti to return parent
> > pointers.
> Ok, will update that one as well then
> 
> > 
> > >  	if (flags == (XFS_IOC_ATTR_ROOT | XFS_IOC_ATTR_SECURE))
> > >  		return -EINVAL;
> > > @@ -1679,6 +1683,92 @@ xfs_ioc_scrub_metadata(
> > >  	return 0;
> > >  }
> > >  
> > > +/*
> > > + * IOCTL routine to get the parent pointers of an inode and return it to user
> > > + * space.  Caller must pass a buffer space containing a struct xfs_pptr_info,
> > > + * followed by a region large enough to contain an array of struct
> > > + * xfs_parent_ptr of a size specified in pi_ptrs_size.  If the inode contains
> > > + * more parent pointers than can fit in the buffer space, caller may re-call
> > > + * the function using the returned pi_cursor to resume iteration.  The
> > > + * number of xfs_parent_ptr returned will be stored in pi_ptrs_used.
> > > + *
> > > + * Returns 0 on success or non-zero on failure
> > > + */
> > > +STATIC int
> > > +xfs_ioc_get_parent_pointer(
> > > +	struct file			*filp,
> > > +	void				__user *arg)
> > > +{
> > > +	struct xfs_pptr_info		*ppi = NULL;
> > > +	int				error = 0;
> > > +	struct xfs_inode		*ip = XFS_I(file_inode(filp));
> > > +	struct xfs_mount		*mp = ip->i_mount;
> > > +
> > > +	if (!capable(CAP_SYS_ADMIN))
> > > +		return -EPERM;
> > > +
> > > +	/* Allocate an xfs_pptr_info to put the user data */
> > > +	ppi = kmem_alloc(sizeof(struct xfs_pptr_info), 0);
> > 
> > New code should call kmalloc instead of the old kmem_alloc wrapper.
> > 
> Ok, will update
> 
> > > +	if (!ppi)
> > > +		return -ENOMEM;
> > > +
> > > +	/* Copy the data from the user */
> > > +	error = copy_from_user(ppi, arg, sizeof(struct xfs_pptr_info));
> > 
> > Note: copy_from_user returns the number of bytes *not* copied.  If you
> > receive a nonzero return value, error usually gets set to EFAULT.
> ooooh. ok, will fix that then.
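The conventional translation (the same applies to copy_to_user()) can be exercised in userspace with a hypothetical mock that, like the kernel helpers, returns the number of bytes left uncopied; a NULL source stands in for a faulting user pointer.

```c
#include <assert.h>	/* assert() for usage checks */
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mock: returns bytes NOT copied, 0 on success. */
static unsigned long
mock_copy_from_user(void *to, const void *from, unsigned long n)
{
	if (!from)
		return n;	/* simulate a fault before any byte is copied */
	memcpy(to, from, n);
	return 0;
}

/* Any nonzero residue becomes -EFAULT; it is never returned as-is. */
static int
demo_fetch_arg(void *kbuf, const void *ubuf, unsigned long len)
{
	if (mock_copy_from_user(kbuf, ubuf, len))
		return -EFAULT;
	return 0;
}
```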
> 
> > 
> > > +	if (error)
> > > +		goto out;
> > > +
> > > +	/* Check size of buffer requested by user */
> > > +	if (xfs_pptr_info_sizeof(ppi->pi_ptrs_size) > XFS_XATTR_LIST_MAX) {
> > > +		error = -ENOMEM;
> > > +		goto out;
> > > +	}
> > > +
> > > +	if (ppi->pi_flags != 0 && ppi->pi_flags != XFS_PPTR_IFLAG_HANDLE) {
> > 
> > 	if (ppi->pi_flags & ~XFS_PPTR_IFLAG_HANDLE) ?
> > 
> > (If we really want to be pedantic, this really ought to be:
> > 
> > #define XFS_PPTR_IFLAG_ALL	(XFS_PPTR_IFLAG_HANDLE)
> > 
> > 	if (ppi->pi_flags & ~XFS_PPTR_IFLAG_ALL)
> > 		return -EINVAL;
> > 
> > Or you could be more flexible, since the kernel could just set the
> > OFLAGs appropriately and not care about their value on input:
> > 
> > #define XFS_PPTR_FLAG_ALL	(XFS_PPTR_IFLAG_HANDLE | XFS_PPTR_OFLAG...)
> > 
> > 	if (ppi->pi_flags & ~XFS_PPTR_FLAG_ALL)
> > 		return -EINVAL;
> > 
> > 	ppi->pi_flags &= ~(XFS_PPTR_OFLAG_ROOT | XFS_PPTR_OFLAG_DONE);
> 
> Oh, I see, sure that makes sense.
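A sketch of the more flexible variant: unknown bits are rejected, output flags are accepted on input but cleared so the kernel can set them fresh. The three flag values come from the patch; the two *_ALL masks and the helper are hypothetical names following the suggestion above.

```c
#include <assert.h>	/* assert() for usage checks */
#include <errno.h>
#include <stdint.h>

/* Flag values from the patch's xfs_fs.h hunk. */
#define XFS_PPTR_IFLAG_HANDLE	(1U << 0)
#define XFS_PPTR_OFLAG_ROOT	(1U << 1)
#define XFS_PPTR_OFLAG_DONE	(1U << 2)

/* Hypothetical masks. */
#define XFS_PPTR_OFLAGS_ALL	(XFS_PPTR_OFLAG_ROOT | XFS_PPTR_OFLAG_DONE)
#define XFS_PPTR_FLAG_ALL	(XFS_PPTR_IFLAG_HANDLE | XFS_PPTR_OFLAGS_ALL)

/* Reject unknown bits, then drop stale output flags from the caller. */
static int
demo_check_pptr_flags(uint32_t *flags)
{
	if (*flags & ~XFS_PPTR_FLAG_ALL)
		return -EINVAL;
	*flags &= ~XFS_PPTR_OFLAGS_ALL;
	return 0;
}
```

With this in place the later handle test becomes a bit test, `if (ppi->pi_flags & XFS_PPTR_IFLAG_HANDLE)`, and a caller that reuses its buffer across calls does not trip over a DONE flag left from the previous call.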
> > 
> > > +		error = -EINVAL;
> > > +		goto out;
> > > +	}
> > > +
> > > +	/*
> > > +	 * Now that we know how big the trailing buffer is, expand
> > > +	 * our kernel xfs_pptr_info to be the same size
> > > +	 */
> > > +	ppi = krealloc(ppi, xfs_pptr_info_sizeof(ppi->pi_ptrs_size),
> > > +		       GFP_NOFS | __GFP_NOFAIL);
> > > +	if (!ppi)
> > > +		return -ENOMEM;
> > 
> > Why NOFS and NOFAIL?  We don't have any writeback resources locked
> > (transactions and ILOCKs) so we can hit ourselves up for memory.
> Ok, will update
> 
> > 
> > > +
> > > +	if (ppi->pi_flags == XFS_PPTR_IFLAG_HANDLE) {
> > 
> > 	if (ppi->pi_flags & XFS_PPTR_IFLAG_HANDLE) {
> ok, will fix
> 
> > 
> > > +		error = xfs_iget(mp, NULL, ppi->pi_handle.ha_fid.fid_ino,
> > > +				0, 0, &ip);
> > > +		if (error)
> > > +			goto out;
> > > +
> > > +		if (VFS_I(ip)->i_generation != ppi->pi_handle.ha_fid.fid_gen) {
> > > +			error = -EINVAL;
> > > +			goto out;
> > > +		}
> > > +	}
> > > +
> > > +	if (ip->i_ino == mp->m_sb.sb_rootino)
> > > +		ppi->pi_flags |= XFS_PPTR_OFLAG_ROOT;
> > > +
> > > +	/* Get the parent pointers */
> > > +	error = xfs_attr_get_parent_pointer(ip, ppi);
> > > +
> > > +	if (error)
> > > +		goto out;
> > > +
> > > +	/* Copy the parent pointers back to the user */
> > > +	error = copy_to_user(arg, ppi,
> > > +			xfs_pptr_info_sizeof(ppi->pi_ptrs_size));
> > 
> > Same note as the one I made for copy_from_user.
> > 
> Will update
> 
> > > +	if (error)
> > > +		goto out;
> > > +
> > > +out:
> > > +	kmem_free(ppi);
> > > +	return error;
> > > +}
> > > +
> > >  int
> > >  xfs_ioc_swapext(
> > >  	xfs_swapext_t	*sxp)
> > > @@ -1968,7 +2058,8 @@ xfs_file_ioctl(
> > >  
> > >  	case XFS_IOC_FSGETXATTRA:
> > >  		return xfs_ioc_fsgetxattra(ip, arg);
> > > -
> > > +	case XFS_IOC_GETPPOINTER:
> > > +		return xfs_ioc_get_parent_pointer(filp, arg);
> > >  	case XFS_IOC_GETBMAP:
> > >  	case XFS_IOC_GETBMAPA:
> > >  	case XFS_IOC_GETBMAPX:
> > > diff --git a/fs/xfs/xfs_ondisk.h b/fs/xfs/xfs_ondisk.h
> > > index 758702b9495f..765eb514a917 100644
> > > --- a/fs/xfs/xfs_ondisk.h
> > > +++ b/fs/xfs/xfs_ondisk.h
> > > @@ -135,6 +135,10 @@ xfs_check_ondisk_structs(void)
> > >  	XFS_CHECK_STRUCT_SIZE(struct xfs_attri_log_format,	40);
> > >  	XFS_CHECK_STRUCT_SIZE(struct xfs_attrd_log_format,	16);
> > >  
> > > +	/* parent pointer ioctls */
> > > +	XFS_CHECK_STRUCT_SIZE(struct xfs_parent_ptr,            280);
> > > +	XFS_CHECK_STRUCT_SIZE(struct xfs_pptr_info,             104);
> > > +
> > >  	/*
> > >  	 * The v5 superblock format extended several v4 header structures with
> > >  	 * additional data. While new fields are only accessible on v5
> > > diff --git a/fs/xfs/xfs_parent_utils.c b/fs/xfs/xfs_parent_utils.c
> > > new file mode 100644
> > > index 000000000000..3351ce173075
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_parent_utils.c
> > > @@ -0,0 +1,134 @@
> > > +/*
> > > + * Copyright (c) 2015 Red Hat, Inc.
> > > + * All rights reserved.
> > > + *
> > > + * This program is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU General Public License as
> > > + * published by the Free Software Foundation.
> > > + *
> > > + * This program is distributed in the hope that it would be useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > + * GNU General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU General Public License
> > > + * along with this program; if not, write the Free Software Foundation
> > > + */
> > 
> > Please condense this boilerplate down to a SPDX tag and a copyright
> > statement.
> Sure, will do
> 
> > 
> > > +#include "xfs.h"
> > > +#include "xfs_fs.h"
> > > +#include "xfs_format.h"
> > > +#include "xfs_log_format.h"
> > > +#include "xfs_shared.h"
> > > +#include "xfs_trans_resv.h"
> > > +#include "xfs_mount.h"
> > > +#include "xfs_bmap_btree.h"
> > > +#include "xfs_inode.h"
> > > +#include "xfs_error.h"
> > > +#include "xfs_trace.h"
> > > +#include "xfs_trans.h"
> > > +#include "xfs_da_format.h"
> > > +#include "xfs_da_btree.h"
> > > +#include "xfs_attr.h"
> > > +#include "xfs_ioctl.h"
> > > +#include "xfs_parent.h"
> > > +#include "xfs_da_btree.h"
> > > +
> > > +/*
> > > + * Get the parent pointers for a given inode
> > > + *
> > > + * Returns 0 on success and non zero on error
> > > + */
> > > +int
> > > +xfs_attr_get_parent_pointer(struct xfs_inode		*ip,
> > > +			    struct xfs_pptr_info	*ppi)
> > > +
> > > +{
> > > +
> > > +	struct xfs_attrlist		*alist;
> > 
> > int
> > xfs_attr_get_parent_pointer(
> > 	struct xfs_inode		*ip,
> > 	struct xfs_pptr_info		*ppi)
> > {
> > 	struct xfs_attrlist		*alist;
> will fix
> 
> > 
> > 
> > > +	struct xfs_attrlist_ent		*aent;
> > > +	struct xfs_parent_ptr		*xpp;
> > > +	struct xfs_parent_name_rec	*xpnr;
> > > +	char				*namebuf;
> > > +	unsigned int			namebuf_size;
> > > +	int				name_len;
> > > +	int				error = 0;
> > > +	unsigned int			ioc_flags = XFS_IOC_ATTR_PARENT;
> > > +	unsigned int			flags = XFS_ATTR_PARENT;
> > > +	int				i;
> > > +	struct xfs_attr_list_context	context;
> > > +
> > > +	/* Allocate a buffer to store the attribute names */
> > > +	namebuf_size = sizeof(struct xfs_attrlist) +
> > > +		       (ppi->pi_ptrs_size) * sizeof(struct xfs_attrlist_ent);
> > > +	namebuf = kvzalloc(namebuf_size, GFP_KERNEL);
> > > +	if (!namebuf)
> > > +		return -ENOMEM;
> > 
> > Do we need the buffer to be zeroed if xfs_attr_list is just going to
> > set its contents?
> I think I might have initially done this out of habit, but I think it's
> safe to remove.
> 
> > 
> > > +
> > > +	memset(&context, 0, sizeof(struct xfs_attr_list_context));
> > > +	error = xfs_ioc_attr_list_context_init(ip, namebuf, namebuf_size,
> > > +			ioc_flags, &context);
> > 
> > Aha, so the internal implementation has access to xfs_attr_list_context
> > before it calls into the attr list code.  Ok, in that case, xfs_fs.h
> > doesn't need the XFS_IOC_ATTR_PARENT flag, and you can set
> > context.attr_filter = XFS_ATTR_PARENT here.  Then we don't have to worry
> > about the existing xattr bulk ioctls returning parent pointers.
> Oh ok, I'll take a look and see if it can come out.
> 
> > 
> > > +
> > > +	/* Copy the cursor provided by caller */
> > > +	memcpy(&context.cursor, &ppi->pi_cursor,
> > > +	       sizeof(struct xfs_attrlist_cursor));
> > > +
> > > +	if (error)
> > > +		goto out_kfree;
> > 
> > Why does the error check come after copying the cursor into the
> > onstack variable?
> Hmm, there might have been a reason at one point, but I
> think xfs_ioc_attr_list_context_init could actually just be a void
> return now.
> 
> > 
> > > +
> > > +	xfs_ilock(ip, XFS_ILOCK_EXCL);
> > 
> > xfs_ilock_attr_map_shared() ?
> Ok, will update
> 
> > 
> > > +
> > > +	error = xfs_attr_list_ilocked(&context);
> > > +	if (error)
> > > +		goto out_kfree;
> > > +
> > > +	alist = (struct xfs_attrlist *)namebuf;
> > > +	for (i = 0; i < alist->al_count; i++) {
> > > +		struct xfs_da_args args = {
> > > +			.geo = ip->i_mount->m_attr_geo,
> > > +			.whichfork = XFS_ATTR_FORK,
> > > +			.dp = ip,
> > > +			.namelen = sizeof(struct xfs_parent_name_rec),
> > > +			.attr_filter = flags,
> > > +			.op_flags = XFS_DA_OP_OKNOENT,
> > > +		};
> > > +
> > > +		xpp = xfs_ppinfo_to_pp(ppi, i);
> > > +		memset(xpp, 0, sizeof(struct xfs_parent_ptr));
> > > +		aent = (struct xfs_attrlist_ent *)
> > > +			&namebuf[alist->al_offset[i]];
> > > +		xpnr = (struct xfs_parent_name_rec *)(aent->a_name);
> > > +
> > > +		if (aent->a_valuelen > XFS_PPTR_MAXNAMELEN) {
> > > +			error = -ERANGE;
> > > +			goto out_kfree;
> > 
> > If a parent pointer has a name longer than MAXNAMELEN then isn't that a
> > corruption?  And in that case, -EFSCORRUPTED would be more appropriate
> > here, right?
> I think so, will fix
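In XFS, EFSCORRUPTED is defined as EUCLEAN. The sketch below (hypothetical helper) treats a bad length as corruption rather than as a caller-side ERANGE. Rejecting a zero length and requiring the length to be strictly below XFS_PPTR_MAXNAMELEN, so the fixed 256-byte xpp_name always has room for a terminating NUL, are assumptions consistent with the earlier comment about xpp_name.

```c
#include <assert.h>	/* assert() for usage checks */
#include <errno.h>
#include <stdint.h>

#define EFSCORRUPTED		EUCLEAN	/* as XFS defines it */
#define XFS_PPTR_MAXNAMELEN	256	/* from the patch */

/* A bad name length in a pptr xattr means bad ondisk metadata. */
static int
demo_check_pptr_namelen(uint32_t valuelen)
{
	if (valuelen == 0 || valuelen >= XFS_PPTR_MAXNAMELEN)
		return -EFSCORRUPTED;
	return 0;
}
```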
> 
> > 
> > > +		}
> > > +		name_len = aent->a_valuelen;
> > > +
> > > +		args.name = (char *)xpnr;
> > > +		args.hashval = xfs_da_hashname(args.name, args.namelen);
> > > +		args.value = (unsigned char *)(xpp->xpp_name);
> > > +		args.valuelen = name_len;
> > > +
> > > +		error = xfs_attr_get_ilocked(&args);
> > 
> > If error is ENOENT (or ENOATTR or whatever the return value is when the
> > attr doesn't exist) then shouldn't that be treated as a corruption too?
> > We still hold the ILOCK from earlier.  I don't think OKNOENT is correct
> > either.
> Hmm, I think I likely borrowed this from similar code elsewhere, but
> if the inode is locked in this case probably any error is grounds for
> corruption.  Will update
> 
> > 
> > > +		error = (error == -EEXIST ? 0 : error);
> > > +		if (error)
> > > +			goto out_kfree;
> > > +
> > > +		xpp->xpp_namelen = name_len;
> > > +		xfs_init_parent_ptr(xpp, xpnr);
> > 
> > Also, should we validate xpnr before copying it out to userspace?
> > If, say, the inode number is bogus, that should generate an
> > EFSCORRUPTED.
> I suppose we could validate the inode while we have it here.
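In the kernel this check would naturally use libxfs's xfs_verify_ino(), which validates an inode number against the filesystem geometry. The userspace sketch below replaces it with a deliberately simplified, hypothetical range check (nonzero, not NULLFSINO, within caller-supplied bounds) just to show where a bad parent inode number turns into -EFSCORRUPTED before anything is copied out.

```c
#include <assert.h>	/* assert() for usage checks */
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define EFSCORRUPTED	EUCLEAN		/* as XFS defines it */
#define DEMO_NULLFSINO	((uint64_t)-1)	/* mirrors NULLFSINO */

/* Hypothetical, simplified stand-in for xfs_verify_ino(). */
static bool
demo_verify_ino(uint64_t ino, uint64_t min_ino, uint64_t max_ino)
{
	return ino != 0 && ino != DEMO_NULLFSINO &&
	       ino >= min_ino && ino <= max_ino;
}

/* Validate the decoded parent inode before filling xfs_parent_ptr. */
static int
demo_check_pptr_rec(uint64_t p_ino, uint64_t min_ino, uint64_t max_ino)
{
	return demo_verify_ino(p_ino, min_ino, max_ino) ? 0 : -EFSCORRUPTED;
}
```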
> 
> > 
> > > +	}
> > > +	ppi->pi_ptrs_used = alist->al_count;
> > > +	if (!alist->al_more)
> > > +		ppi->pi_flags |= XFS_PPTR_OFLAG_DONE;
> > > +
> > > +	/* Update the caller with the current cursor position */
> > > +	memcpy(&ppi->pi_cursor, &context.cursor,
> > > +		sizeof(struct xfs_attrlist_cursor));
> > 
> > Glad you remembered to do this; attrmulti forgot to do this for a long
> > time. :)
> :-)  I do recall running into it some time ago
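Why the copy-back matters shows up in the caller's loop. In this mock (all names hypothetical; cursor and flags play the roles of pi_cursor and XFS_PPTR_OFLAG_DONE), each call resumes from the cursor the previous call wrote back, so iteration ends after all entries are seen once. If the cursor were not written back, every call would restart at the beginning and the loop would never see DONE.

```c
#include <assert.h>	/* assert() for usage checks */
#include <stdint.h>

#define XFS_PPTR_OFLAG_DONE	(1U << 2)	/* value from the patch */
#define DEMO_BATCH_MAX		8

/* Hypothetical request: cursor in/out, batch of results, flags out. */
struct demo_req {
	uint32_t	cursor;		/* opaque resume position */
	uint32_t	flags;
	uint32_t	ptrs_size;	/* capacity requested by the caller */
	uint32_t	ptrs_used;	/* entries filled by this call */
	uint64_t	out[DEMO_BATCH_MAX];
};

/* Mock backend: fill a batch, write the cursor back, set DONE at the end. */
static void
demo_get_parents(const uint64_t *src, uint32_t nsrc, struct demo_req *req)
{
	uint32_t	n = 0;

	while (n < req->ptrs_size && req->cursor < nsrc)
		req->out[n++] = src[req->cursor++];
	req->ptrs_used = n;
	if (req->cursor >= nsrc)
		req->flags |= XFS_PPTR_OFLAG_DONE;
}

/* Caller: re-issue with the returned cursor until DONE is reported. */
static uint32_t
demo_count_all(const uint64_t *src, uint32_t nsrc, uint32_t batch)
{
	struct demo_req	req = { .ptrs_size = batch };
	uint32_t	total = 0;

	if (batch == 0 || batch > DEMO_BATCH_MAX)
		return 0;
	do {
		demo_get_parents(src, nsrc, &req);
		total += req.ptrs_used;
	} while (!(req.flags & XFS_PPTR_OFLAG_DONE));
	return total;
}
```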
> 
> > 
> > > +
> > > +out_kfree:
> > > +	xfs_iunlock(ip, XFS_ILOCK_EXCL);
> > > +	kmem_free(namebuf);
> > 
> > kvfree, since you got namebuf from kvzalloc.
> Alrighty
> 
> > 
> > > +
> > > +	return error;
> > > +}
> > > +
> > > diff --git a/fs/xfs/xfs_parent_utils.h b/fs/xfs/xfs_parent_utils.h
> > > new file mode 100644
> > > index 000000000000..0e952b2ebd4a
> > > --- /dev/null
> > > +++ b/fs/xfs/xfs_parent_utils.h
> > > @@ -0,0 +1,22 @@
> > > +/*
> > > + * Copyright (c) 2017 Oracle, Inc.
> > 
> > 2022?
> Sure, will update date
> 
> > 
> > > + * All Rights Reserved.
> > > + *
> > > + * This program is free software; you can redistribute it and/or
> > > + * modify it under the terms of the GNU General Public License as
> > > + * published by the Free Software Foundation.
> > > + *
> > > + * This program is distributed in the hope that it would be
> > > useful,
> > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> > > + * GNU General Public License for more details.
> > > + *
> > > + * You should have received a copy of the GNU General Public
> > > License
> > > + * along with this program; if not, write the Free Software
> > > Foundation Inc.
> > 
> > This also needs to be condensed to a SPDX header and a copyright
> > statement.
> Right, will clean that up too
> 
> Thanks for the reviews!
> Allison
> 
> > 
> > > + */
> > > +#ifndef	__XFS_PARENT_UTILS_H__
> > > +#define	__XFS_PARENT_UTILS_H__
> > > +
> > > +int xfs_attr_get_parent_pointer(struct xfs_inode *ip,
> > > +				struct xfs_pptr_info *ppi);
> > > +#endif	/* __XFS_PARENT_UTILS_H__ */
> > > -- 
> > > 2.25.1
> > > 
> 



Thread overview: 58+ messages
2022-08-04 19:39 [PATCH RESEND v2 00/18] Parent Pointers Allison Henderson
2022-08-04 19:39 ` [PATCH RESEND v2 01/18] xfs: Fix multi-transaction larp replay Allison Henderson
2022-08-09 16:52   ` Darrick J. Wong
2022-08-10  1:58     ` Dave Chinner
2022-08-10  5:01       ` Alli
2022-08-10  6:12         ` Dave Chinner
2022-08-10 15:52           ` Darrick J. Wong
2022-08-10 19:28             ` Alli
2022-08-12  1:55           ` Alli
2022-08-12  3:05             ` Darrick J. Wong
2022-08-16  0:54             ` Dave Chinner
2022-08-16  5:07               ` Darrick J. Wong
2022-08-16 20:41                 ` Alli
2022-08-19  1:05                   ` Alli
2022-08-23 15:07                     ` Darrick J. Wong
2022-08-24 18:47                       ` Alli
2022-08-10  3:08     ` Alli
2022-08-04 19:39 ` [PATCH RESEND v2 02/18] xfs: Increase XFS_DEFER_OPS_NR_INODES to 5 Allison Henderson
2022-08-09 16:38   ` Darrick J. Wong
2022-08-10  3:07     ` Alli
2022-08-04 19:39 ` [PATCH RESEND v2 03/18] xfs: Hold inode locks in xfs_ialloc Allison Henderson
2022-08-04 19:39 ` [PATCH RESEND v2 04/18] xfs: Hold inode locks in xfs_trans_alloc_dir Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 05/18] xfs: get directory offset when adding directory name Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 06/18] xfs: get directory offset when removing " Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 07/18] xfs: get directory offset when replacing a " Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 08/18] xfs: add parent pointer support to attribute code Allison Henderson
2022-08-09 16:54   ` Darrick J. Wong
2022-08-10  3:08     ` Alli
2022-08-04 19:40 ` [PATCH RESEND v2 09/18] xfs: define parent pointer xattr format Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 10/18] xfs: Add xfs_verify_pptr Allison Henderson
2022-08-09 16:59   ` Darrick J. Wong
2022-08-10  3:08     ` Alli
2022-08-04 19:40 ` [PATCH RESEND v2 11/18] xfs: extend transaction reservations for parent attributes Allison Henderson
2022-08-09 17:48   ` Darrick J. Wong
2022-08-10  3:08     ` Alli
2022-08-04 19:40 ` [PATCH RESEND v2 12/18] xfs: parent pointer attribute creation Allison Henderson
2022-08-09 18:01   ` Darrick J. Wong
2022-08-09 18:13     ` Darrick J. Wong
2022-08-10  3:09       ` Alli
2022-08-10  3:08     ` Alli
2022-08-04 19:40 ` [PATCH RESEND v2 13/18] xfs: add parent attributes to link Allison Henderson
2022-08-09 18:43   ` Darrick J. Wong
2022-08-10  3:09     ` Alli
2022-09-23 20:25       ` Darrick J. Wong
2022-08-04 19:40 ` [PATCH RESEND v2 14/18] xfs: remove parent pointers in unlink Allison Henderson
2022-08-09 18:45   ` Darrick J. Wong
2022-08-10  3:09     ` Alli
2022-08-04 19:40 ` [PATCH RESEND v2 15/18] xfs: Add parent pointers to rename Allison Henderson
2022-08-09 18:49   ` Darrick J. Wong
2022-08-10  3:09     ` Alli
2022-08-04 19:40 ` [PATCH RESEND v2 16/18] xfs: Add the parent pointer support to the superblock version 5 Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 17/18] xfs: Add helper function xfs_attr_list_context_init Allison Henderson
2022-08-04 19:40 ` [PATCH RESEND v2 18/18] xfs: Add parent pointer ioctl Allison Henderson
2022-08-09 19:26   ` Darrick J. Wong
2022-08-10  3:09     ` Alli
2022-09-24  0:01       ` Darrick J. Wong
2022-08-09 22:55 ` [RFC PATCH 19/18] xfs: fix unit conversion error in xfs_log_calc_max_attrsetm_res Darrick J. Wong
2022-08-09 22:56 ` [RFC PATCH 20/18] xfs: drop compatibility minimum log size computations for reflink Darrick J. Wong
