All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCHSET 0/3] xfs: fix name/value buffer lifetime errrors
@ 2022-05-18 18:55 Darrick J. Wong
  2022-05-18 18:55 ` [PATCH 1/3] xfs: validate xattr name earlier in recovery Darrick J. Wong
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Darrick J. Wong @ 2022-05-18 18:55 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david, allison.henderson

Hi all,

As part of kicking the tires on the new logged extended attributes
(LARP) code, I observed a crash in the xattr intent log item relogging
function.  A quick code inspection led to me noticing that the logging
code repeatedly allocates and frees space for the name and value
buffers, which is unnecessary.  Replace all that with a shared
refcounted buffer to hold the name and the value across all the log
items involved in the transaction, and as long as is necessary for the
log to process all work.

If you're going to start using this mess, you probably ought to just
pull from my git trees, which are linked below.

This is an extraordinary way to destroy everything.  Enjoy!
Comments and questions are, as always, welcome.

--D

kernel git tree:
https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=attr-intent-uaf-fixes-5.19
---
 fs/xfs/libxfs/xfs_attr.h |    8 +
 fs/xfs/xfs_attr_item.c   |  296 +++++++++++++++++++++++++++-------------------
 fs/xfs/xfs_attr_item.h   |   13 +-
 3 files changed, 191 insertions(+), 126 deletions(-)


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH 1/3] xfs: validate xattr name earlier in recovery
  2022-05-18 18:55 [PATCHSET 0/3] xfs: fix name/value buffer lifetime errrors Darrick J. Wong
@ 2022-05-18 18:55 ` Darrick J. Wong
  2022-05-19  1:36   ` Dave Chinner
  2022-05-19 20:33   ` Alli
  2022-05-18 18:55 ` [PATCH 2/3] xfs: share xattr name and value buffers when logging xattr updates Darrick J. Wong
  2022-05-18 18:55 ` [PATCH 3/3] xfs: free xfs_attrd_log_items correctly Darrick J. Wong
  2 siblings, 2 replies; 11+ messages in thread
From: Darrick J. Wong @ 2022-05-18 18:55 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <djwong@kernel.org>

When we're validating a recovered xattr log item during log recovery, we
should check the name before starting to allocate resources.  This isn't
strictly necessary on its own, but it means that we won't bother with
huge memory allocations during recovery if the attr name is garbage,
which will simplify the changes in the next patch.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c |   15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)


diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index fd0a74f3ef45..4976b1ddc09f 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -688,16 +688,23 @@ xlog_recover_attri_commit_pass2(
 	struct xfs_mount                *mp = log->l_mp;
 	struct xfs_attri_log_item       *attrip;
 	struct xfs_attri_log_format     *attri_formatp;
+	const void			*attr_name;
 	int				region = 0;
 
 	attri_formatp = item->ri_buf[region].i_addr;
+	attr_name = item->ri_buf[1].i_addr;
 
-	/* Validate xfs_attri_log_format */
+	/* Validate xfs_attri_log_format before the large memory allocation */
 	if (!xfs_attri_validate(mp, attri_formatp)) {
 		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
 		return -EFSCORRUPTED;
 	}
 
+	if (!xfs_attr_namecheck(attr_name, attri_formatp->alfi_name_len)) {
+		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
+		return -EFSCORRUPTED;
+	}
+
 	/* memory alloc failure will cause replay to abort */
 	attrip = xfs_attri_init(mp, attri_formatp->alfi_name_len,
 				attri_formatp->alfi_value_len);
@@ -713,12 +720,6 @@ xlog_recover_attri_commit_pass2(
 	memcpy(attrip->attri_name, item->ri_buf[region].i_addr,
 	       attrip->attri_name_len);
 
-	if (!xfs_attr_namecheck(attrip->attri_name, attrip->attri_name_len)) {
-		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
-		error = -EFSCORRUPTED;
-		goto out;
-	}
-
 	if (attrip->attri_value_len > 0) {
 		region++;
 		memcpy(attrip->attri_value, item->ri_buf[region].i_addr,


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 2/3] xfs: share xattr name and value buffers when logging xattr updates
  2022-05-18 18:55 [PATCHSET 0/3] xfs: fix name/value buffer lifetime errrors Darrick J. Wong
  2022-05-18 18:55 ` [PATCH 1/3] xfs: validate xattr name earlier in recovery Darrick J. Wong
@ 2022-05-18 18:55 ` Darrick J. Wong
  2022-05-19  0:27   ` Dave Chinner
  2022-05-18 18:55 ` [PATCH 3/3] xfs: free xfs_attrd_log_items correctly Darrick J. Wong
  2 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2022-05-18 18:55 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <djwong@kernel.org>

While running xfs/297 and generic/642, I noticed a crash in
xfs_attri_item_relog when it tries to copy the attr name to the new
xattri log item.  I think what happened here was that we called
->iop_commit on the old attri item (which nulls out the pointers) as
part of a log force at the same time that a chained attr operation was
ongoing.  The system was busy enough that at some later point, the defer
ops operation decided it was necessary to relog the attri log item, but
as we've detached the name buffer from the old attri log item, we can't
copy it to the new one, and kaboom.

I think there's a broader refcounting problem with LARP mode -- the
setxattr code can return to userspace before the CIL actually formats
and commits the log item, which results in a UAF bug.  Therefore, the
xattr log item needs to be able to retain a reference to the name and
value buffers until the log items have completely cleared the log.
Furthermore, each time we create an intent log item, we allocate new
memory and (re)copy the contents; sharing here would be very useful.

Solve the UAF and the unnecessary memory allocations by having the log
code create a single refcounted buffer to contain the name and value
contents.  This buffer can be passed from old to new during a relog
operation, and the logging code can (optionally) attach it to the
xfs_attr_item for reuse when LARP mode is enabled.

This also fixes a problem where the xfs_attri_log_item objects weren't
being freed back to the same cache where they came from.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/libxfs/xfs_attr.h |    8 +
 fs/xfs/xfs_attr_item.c   |  279 +++++++++++++++++++++++++++-------------------
 fs/xfs/xfs_attr_item.h   |   13 +-
 3 files changed, 182 insertions(+), 118 deletions(-)


diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
index 1af7abe29eef..17746dcc2268 100644
--- a/fs/xfs/libxfs/xfs_attr.h
+++ b/fs/xfs/libxfs/xfs_attr.h
@@ -501,6 +501,8 @@ enum xfs_delattr_state {
 	{ XFS_DAS_NODE_REMOVE_ATTR,	"XFS_DAS_NODE_REMOVE_ATTR" }, \
 	{ XFS_DAS_DONE,			"XFS_DAS_DONE" }
 
+struct xfs_attri_log_nameval;
+
 /*
  * Defines for xfs_attr_item.xattri_flags
  */
@@ -512,6 +514,12 @@ enum xfs_delattr_state {
 struct xfs_attr_item {
 	struct xfs_da_args		*xattri_da_args;
 
+	/*
+	 * Shared buffer containing the attr name and value so that the logging
+	 * code can share large memory buffers between log items.
+	 */
+	struct xfs_attri_log_nameval	*xattri_nameval;
+
 	/*
 	 * Used by xfs_attr_set to hold a leaf buffer across a transaction roll
 	 */
diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 4976b1ddc09f..7d4469e8a4fc 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -39,12 +39,91 @@ static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
 	return container_of(lip, struct xfs_attri_log_item, attri_item);
 }
 
+/*
+ * Shared xattr name/value buffers for logged extended attribute operations
+ *
+ * When logging updates to extended attributes, we can create quite a few
+ * attribute log intent items for a single xattr update.  To avoid cycling the
+ * memory allocator and memcpy overhead, the name (and value, for setxattr)
+ * are kept in a refcounted object that is shared across all related log items
+ * and the upper-level deferred work state structure.  The shared buffer has
+ * a control structure, followed by the name, and then the value.
+ */
+
+static inline struct xfs_attri_log_nameval *
+xfs_attri_log_nameval_get(
+	struct xfs_attri_log_nameval	*anvl)
+{
+	BUG_ON(!refcount_inc_not_zero(&anvl->anvl_refcount));
+	return anvl;
+}
+
+static inline void
+xfs_attri_log_nameval_put(
+	struct xfs_attri_log_nameval	*anvl)
+{
+	if (refcount_dec_and_test(&anvl->anvl_refcount))
+		kvfree(anvl);
+}
+
+static inline void *
+xfs_attri_log_namebuf(
+	struct xfs_attri_log_item	*attrip)
+{
+	return attrip->attri_nameval + 1;
+}
+
+static inline void *
+xfs_attri_log_valbuf(
+	struct xfs_attri_log_item	*attrip)
+{
+	struct xfs_attri_log_nameval	*anvl = attrip->attri_nameval;
+
+	if (anvl->anvl_value_len == 0)
+		return NULL;
+
+	return (char *)xfs_attri_log_namebuf(attrip) + anvl->anvl_name_len;
+}
+
+static inline struct xfs_attri_log_nameval *
+xfs_attri_log_nameval_alloc(
+	const void			*name,
+	unsigned int			name_len,
+	const void			*value,
+	unsigned int			value_len)
+{
+	struct xfs_attri_log_nameval	*anvl;
+	void				*p;
+
+	/*
+	 * This could be over 64kB in length, so we have to use kvmalloc() for
+	 * this. But kvmalloc() utterly sucks, so we use our own version.
+	 */
+	anvl = xlog_kvmalloc(sizeof(struct xfs_attri_log_nameval) +
+					name_len + value_len);
+	if (!anvl)
+		return anvl;
+
+	anvl->anvl_name_len = name_len;
+	anvl->anvl_value_len = value_len;
+	p = anvl + 1;
+	memcpy(p, name, name_len);
+	if (value_len) {
+		p += name_len;
+		memcpy(p, value, value_len);
+	}
+	refcount_set(&anvl->anvl_refcount, 1);
+	return anvl;
+}
+
 STATIC void
 xfs_attri_item_free(
 	struct xfs_attri_log_item	*attrip)
 {
 	kmem_free(attrip->attri_item.li_lv_shadow);
-	kvfree(attrip);
+	if (attrip->attri_nameval)
+		xfs_attri_log_nameval_put(attrip->attri_nameval);
+	kmem_cache_free(xfs_attri_cache, attrip);
 }
 
 /*
@@ -73,16 +152,17 @@ xfs_attri_item_size(
 	int				*nbytes)
 {
 	struct xfs_attri_log_item       *attrip = ATTRI_ITEM(lip);
+	struct xfs_attri_log_nameval	*anvl = attrip->attri_nameval;
 
 	*nvecs += 2;
 	*nbytes += sizeof(struct xfs_attri_log_format) +
-			xlog_calc_iovec_len(attrip->attri_name_len);
+			xlog_calc_iovec_len(anvl->anvl_name_len);
 
-	if (!attrip->attri_value_len)
+	if (!anvl->anvl_value_len)
 		return;
 
 	*nvecs += 1;
-	*nbytes += xlog_calc_iovec_len(attrip->attri_value_len);
+	*nbytes += xlog_calc_iovec_len(anvl->anvl_value_len);
 }
 
 /*
@@ -97,6 +177,7 @@ xfs_attri_item_format(
 {
 	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
 	struct xfs_log_iovec		*vecp = NULL;
+	struct xfs_attri_log_nameval	*anvl = attrip->attri_nameval;
 
 	attrip->attri_format.alfi_type = XFS_LI_ATTRI;
 	attrip->attri_format.alfi_size = 1;
@@ -108,22 +189,22 @@ xfs_attri_item_format(
 	 * the log recovery.
 	 */
 
-	ASSERT(attrip->attri_name_len > 0);
+	ASSERT(anvl->anvl_name_len > 0);
 	attrip->attri_format.alfi_size++;
 
-	if (attrip->attri_value_len > 0)
+	if (anvl->anvl_value_len > 0)
 		attrip->attri_format.alfi_size++;
 
 	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
 			&attrip->attri_format,
 			sizeof(struct xfs_attri_log_format));
 	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
-			attrip->attri_name,
-			attrip->attri_name_len);
-	if (attrip->attri_value_len > 0)
+			xfs_attri_log_namebuf(attrip),
+			anvl->anvl_name_len);
+	if (anvl->anvl_value_len > 0)
 		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
-				attrip->attri_value,
-				attrip->attri_value_len);
+				xfs_attri_log_valbuf(attrip),
+				anvl->anvl_value_len);
 }
 
 /*
@@ -158,41 +239,17 @@ xfs_attri_item_release(
 STATIC struct xfs_attri_log_item *
 xfs_attri_init(
 	struct xfs_mount		*mp,
-	uint32_t			name_len,
-	uint32_t			value_len)
-
+	struct xfs_attri_log_nameval	*anvl)
 {
 	struct xfs_attri_log_item	*attrip;
-	uint32_t			buffer_size = name_len + value_len;
 
-	if (buffer_size) {
-		/*
-		 * This could be over 64kB in length, so we have to use
-		 * kvmalloc() for this. But kvmalloc() utterly sucks, so we
-		 * use own version.
-		 */
-		attrip = xlog_kvmalloc(sizeof(struct xfs_attri_log_item) +
-					buffer_size);
-	} else {
-		attrip = kmem_cache_alloc(xfs_attri_cache,
-					GFP_NOFS | __GFP_NOFAIL);
-	}
-	memset(attrip, 0, sizeof(struct xfs_attri_log_item));
+	attrip = kmem_cache_zalloc(xfs_attri_cache, GFP_NOFS | __GFP_NOFAIL);
 
-	attrip->attri_name_len = name_len;
-	if (name_len)
-		attrip->attri_name = ((char *)attrip) +
-				sizeof(struct xfs_attri_log_item);
-	else
-		attrip->attri_name = NULL;
-
-	attrip->attri_value_len = value_len;
-	if (value_len)
-		attrip->attri_value = ((char *)attrip) +
-				sizeof(struct xfs_attri_log_item) +
-				name_len;
-	else
-		attrip->attri_value = NULL;
+	/*
+	 * Grab an extra reference to the name/value buffer for this log item.
+	 * The caller retains its own reference!
+	 */
+	attrip->attri_nameval = xfs_attri_log_nameval_get(anvl);
 
 	xfs_log_item_init(mp, &attrip->attri_item, XFS_LI_ATTRI,
 			  &xfs_attri_item_ops);
@@ -335,7 +392,7 @@ STATIC void
 xfs_attr_log_item(
 	struct xfs_trans		*tp,
 	struct xfs_attri_log_item	*attrip,
-	struct xfs_attr_item		*attr)
+	const struct xfs_attr_item	*attr)
 {
 	struct xfs_attri_log_format	*attrp;
 
@@ -351,17 +408,10 @@ xfs_attr_log_item(
 	attrp->alfi_ino = attr->xattri_da_args->dp->i_ino;
 	ASSERT(!(attr->xattri_op_flags & ~XFS_ATTR_OP_FLAGS_TYPE_MASK));
 	attrp->alfi_op_flags = attr->xattri_op_flags;
-	attrp->alfi_value_len = attr->xattri_da_args->valuelen;
-	attrp->alfi_name_len = attr->xattri_da_args->namelen;
+	attrp->alfi_value_len = attr->xattri_nameval->anvl_value_len;
+	attrp->alfi_name_len = attr->xattri_nameval->anvl_name_len;
 	ASSERT(!(attr->xattri_da_args->attr_filter & ~XFS_ATTRI_FILTER_MASK));
 	attrp->alfi_attr_filter = attr->xattri_da_args->attr_filter;
-
-	memcpy(attrip->attri_name, attr->xattri_da_args->name,
-	       attr->xattri_da_args->namelen);
-	memcpy(attrip->attri_value, attr->xattri_da_args->value,
-	       attr->xattri_da_args->valuelen);
-	attrip->attri_name_len = attr->xattri_da_args->namelen;
-	attrip->attri_value_len = attr->xattri_da_args->valuelen;
 }
 
 /* Get an ATTRI. */
@@ -385,16 +435,33 @@ xfs_attr_create_intent(
 	 * Each attr item only performs one attribute operation at a time, so
 	 * this is a list of one
 	 */
-	list_for_each_entry(attr, items, xattri_list) {
-		attrip = xfs_attri_init(mp, attr->xattri_da_args->namelen,
-					attr->xattri_da_args->valuelen);
-		if (attrip == NULL)
-			return NULL;
+	attr = list_first_entry_or_null(items, struct xfs_attr_item,
+			xattri_list);
 
-		xfs_trans_add_item(tp, &attrip->attri_item);
-		xfs_attr_log_item(tp, attrip, attr);
+	/*
+	 * Create a buffer to store the attribute name and value.  This buffer
+	 * will be shared between the higher level deferred xattr work state
+	 * and the lower level xattr log items.
+	 */
+	if (!attr->xattri_nameval) {
+		struct xfs_da_args	*args = attr->xattri_da_args;
+
+		/*
+		 * Transfer our reference to the name/value buffer to the
+		 * deferred work state structure.
+		 */
+		attr->xattri_nameval = xfs_attri_log_nameval_alloc(args->name,
+				args->namelen, args->value, args->valuelen);
+	}
+	if (!attr->xattri_nameval) {
+		xlog_force_shutdown(mp->m_log, SHUTDOWN_LOG_IO_ERROR);
+		return NULL;
 	}
 
+	attrip = xfs_attri_init(mp, attr->xattri_nameval);
+	xfs_trans_add_item(tp, &attrip->attri_item);
+	xfs_attr_log_item(tp, attrip, attr);
+
 	return &attrip->attri_item;
 }
 
@@ -404,6 +471,8 @@ xfs_attr_free_item(
 {
 	if (attr->xattri_da_state)
 		xfs_da_state_free(attr->xattri_da_state);
+	if (attr->xattri_nameval)
+		xfs_attri_log_nameval_put(attr->xattri_nameval);
 	kmem_free(attr);
 }
 
@@ -455,29 +524,6 @@ xfs_attr_cancel_item(
 	xfs_attr_free_item(attr);
 }
 
-STATIC xfs_lsn_t
-xfs_attri_item_committed(
-	struct xfs_log_item		*lip,
-	xfs_lsn_t			lsn)
-{
-	struct xfs_attri_log_item	*attrip = ATTRI_ITEM(lip);
-
-	/*
-	 * The attrip refers to xfs_attr_item memory to log the name and value
-	 * with the intent item. This already occurred when the intent was
-	 * committed so these fields are no longer accessed. Clear them out of
-	 * caution since we're about to free the xfs_attr_item.
-	 */
-	attrip->attri_name = NULL;
-	attrip->attri_value = NULL;
-
-	/*
-	 * The ATTRI is logged only once and cannot be moved in the log, so
-	 * simply return the lsn at which it's been logged.
-	 */
-	return lsn;
-}
-
 STATIC bool
 xfs_attri_item_match(
 	struct xfs_log_item	*lip,
@@ -541,6 +587,7 @@ xfs_attri_item_recover(
 	struct xfs_trans		*tp;
 	struct xfs_trans_res		tres;
 	struct xfs_attri_log_format	*attrp;
+	struct xfs_attri_log_nameval	*anvl = attrip->attri_nameval;
 	int				error, ret = 0;
 	int				total;
 	int				local;
@@ -552,7 +599,8 @@ xfs_attri_item_recover(
 	 */
 	attrp = &attrip->attri_format;
 	if (!xfs_attri_validate(mp, attrp) ||
-	    !xfs_attr_namecheck(attrip->attri_name, attrip->attri_name_len))
+	    !xfs_attr_namecheck(xfs_attri_log_namebuf(attrip),
+			anvl->anvl_name_len))
 		return -EFSCORRUPTED;
 
 	error = xlog_recover_iget(mp,  attrp->alfi_ino, &ip);
@@ -567,10 +615,17 @@ xfs_attri_item_recover(
 	attr->xattri_op_flags = attrp->alfi_op_flags &
 						XFS_ATTR_OP_FLAGS_TYPE_MASK;
 
+	/*
+	 * We're reconstructing the deferred work state structure from the
+	 * recovered log item.  Grab a reference to the name/value buffer and
+	 * attach it to the new work state.
+	 */
+	attr->xattri_nameval = xfs_attri_log_nameval_get(anvl);
+
 	args->dp = ip;
 	args->geo = mp->m_attr_geo;
 	args->whichfork = XFS_ATTR_FORK;
-	args->name = attrip->attri_name;
+	args->name = xfs_attri_log_namebuf(attrip);
 	args->namelen = attrp->alfi_name_len;
 	args->hashval = xfs_da_hashname(args->name, args->namelen);
 	args->attr_filter = attrp->alfi_attr_filter & XFS_ATTRI_FILTER_MASK;
@@ -579,7 +634,7 @@ xfs_attri_item_recover(
 	switch (attr->xattri_op_flags) {
 	case XFS_ATTR_OP_FLAGS_SET:
 	case XFS_ATTR_OP_FLAGS_REPLACE:
-		args->value = attrip->attri_value;
+		args->value = xfs_attri_log_valbuf(attrip);
 		args->valuelen = attrp->alfi_value_len;
 		args->total = xfs_attr_calc_size(args, &local);
 		if (xfs_inode_hasattr(args->dp))
@@ -654,8 +709,11 @@ xfs_attri_item_relog(
 	attrdp = xfs_trans_get_attrd(tp, old_attrip);
 	set_bit(XFS_LI_DIRTY, &attrdp->attrd_item.li_flags);
 
-	new_attrip = xfs_attri_init(tp->t_mountp, old_attrp->alfi_name_len,
-				    old_attrp->alfi_value_len);
+	/*
+	 * Create a new log item that shares the same name/value buffer as the
+	 * old log item.
+	 */
+	new_attrip = xfs_attri_init(tp->t_mountp, old_attrip->attri_nameval);
 	new_attrp = &new_attrip->attri_format;
 
 	new_attrp->alfi_ino = old_attrp->alfi_ino;
@@ -664,13 +722,6 @@ xfs_attri_item_relog(
 	new_attrp->alfi_name_len = old_attrp->alfi_name_len;
 	new_attrp->alfi_attr_filter = old_attrp->alfi_attr_filter;
 
-	memcpy(new_attrip->attri_name, old_attrip->attri_name,
-		new_attrip->attri_name_len);
-
-	if (new_attrip->attri_value_len > 0)
-		memcpy(new_attrip->attri_value, old_attrip->attri_value,
-		       new_attrip->attri_value_len);
-
 	xfs_trans_add_item(tp, &new_attrip->attri_item);
 	set_bit(XFS_LI_DIRTY, &new_attrip->attri_item.li_flags);
 
@@ -684,14 +735,15 @@ xlog_recover_attri_commit_pass2(
 	struct xlog_recover_item        *item,
 	xfs_lsn_t                       lsn)
 {
-	int                             error;
 	struct xfs_mount                *mp = log->l_mp;
 	struct xfs_attri_log_item       *attrip;
 	struct xfs_attri_log_format     *attri_formatp;
+	struct xfs_attri_log_nameval	*anvl;
+	const void			*attr_value = NULL;
 	const void			*attr_name;
-	int				region = 0;
+	int                             error;
 
-	attri_formatp = item->ri_buf[region].i_addr;
+	attri_formatp = item->ri_buf[0].i_addr;
 	attr_name = item->ri_buf[1].i_addr;
 
 	/* Validate xfs_attri_log_format before the large memory allocation */
@@ -705,27 +757,25 @@ xlog_recover_attri_commit_pass2(
 		return -EFSCORRUPTED;
 	}
 
-	/* memory alloc failure will cause replay to abort */
-	attrip = xfs_attri_init(mp, attri_formatp->alfi_name_len,
-				attri_formatp->alfi_value_len);
-	if (attrip == NULL)
+	if (attri_formatp->alfi_value_len)
+		attr_value = item->ri_buf[2].i_addr;
+
+	/*
+	 * Memory alloc failure will cause replay to abort.  We attach the
+	 * name/value buffer to the recovered incore log item and drop our
+	 * reference.
+	 */
+	anvl = xfs_attri_log_nameval_alloc(attr_name,
+			attri_formatp->alfi_name_len, attr_value,
+			attri_formatp->alfi_value_len);
+	if (!anvl)
 		return -ENOMEM;
 
-	error = xfs_attri_copy_format(&item->ri_buf[region],
-				      &attrip->attri_format);
+	attrip = xfs_attri_init(mp, anvl);
+	error = xfs_attri_copy_format(&item->ri_buf[0], &attrip->attri_format);
 	if (error)
 		goto out;
 
-	region++;
-	memcpy(attrip->attri_name, item->ri_buf[region].i_addr,
-	       attrip->attri_name_len);
-
-	if (attrip->attri_value_len > 0) {
-		region++;
-		memcpy(attrip->attri_value, item->ri_buf[region].i_addr,
-		       attrip->attri_value_len);
-	}
-
 	/*
 	 * The ATTRI has two references. One for the ATTRD and one for ATTRI to
 	 * ensure it makes it into the AIL. Insert the ATTRI into the AIL
@@ -734,9 +784,11 @@ xlog_recover_attri_commit_pass2(
 	 */
 	xfs_trans_ail_insert(log->l_ailp, &attrip->attri_item, lsn);
 	xfs_attri_release(attrip);
+	xfs_attri_log_nameval_put(anvl);
 	return 0;
 out:
 	xfs_attri_item_free(attrip);
+	xfs_attri_log_nameval_put(anvl);
 	return error;
 }
 
@@ -816,7 +868,6 @@ static const struct xfs_item_ops xfs_attri_item_ops = {
 	.iop_size	= xfs_attri_item_size,
 	.iop_format	= xfs_attri_item_format,
 	.iop_unpin	= xfs_attri_item_unpin,
-	.iop_committed	= xfs_attri_item_committed,
 	.iop_release    = xfs_attri_item_release,
 	.iop_recover	= xfs_attri_item_recover,
 	.iop_match	= xfs_attri_item_match,
diff --git a/fs/xfs/xfs_attr_item.h b/fs/xfs/xfs_attr_item.h
index c3b779f82adb..2475e68712e1 100644
--- a/fs/xfs/xfs_attr_item.h
+++ b/fs/xfs/xfs_attr_item.h
@@ -11,6 +11,14 @@
 struct xfs_mount;
 struct kmem_zone;
 
+struct xfs_attri_log_nameval {
+	refcount_t		anvl_refcount;
+	unsigned int		anvl_name_len;
+	unsigned int		anvl_value_len;
+
+	/* name and value follow the end of this struct */
+};
+
 /*
  * This is the "attr intention" log item.  It is used to log the fact that some
  * extended attribute operations need to be processed.  An operation is
@@ -26,10 +34,7 @@ struct kmem_zone;
 struct xfs_attri_log_item {
 	struct xfs_log_item		attri_item;
 	atomic_t			attri_refcount;
-	int				attri_name_len;
-	int				attri_value_len;
-	void				*attri_name;
-	void				*attri_value;
+	struct xfs_attri_log_nameval	*attri_nameval;
 	struct xfs_attri_log_format	attri_format;
 };
 


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PATCH 3/3] xfs: free xfs_attrd_log_items correctly
  2022-05-18 18:55 [PATCHSET 0/3] xfs: fix name/value buffer lifetime errrors Darrick J. Wong
  2022-05-18 18:55 ` [PATCH 1/3] xfs: validate xattr name earlier in recovery Darrick J. Wong
  2022-05-18 18:55 ` [PATCH 2/3] xfs: share xattr name and value buffers when logging xattr updates Darrick J. Wong
@ 2022-05-18 18:55 ` Darrick J. Wong
  2022-05-19  1:37   ` Dave Chinner
  2022-05-19 20:33   ` Alli
  2 siblings, 2 replies; 11+ messages in thread
From: Darrick J. Wong @ 2022-05-18 18:55 UTC (permalink / raw)
  To: djwong; +Cc: linux-xfs, david, allison.henderson

From: Darrick J. Wong <djwong@kernel.org>

Technically speaking, objects allocated out of a specific slab cache are
supposed to be freed to that slab cache.  The popular slab backends will
take care of this for us, but SLOB famously doesn't.  Fix this, even if
slob + xfs are not that common of a combination.

Signed-off-by: Darrick J. Wong <djwong@kernel.org>
---
 fs/xfs/xfs_attr_item.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
index 7d4469e8a4fc..9ef2c2455921 100644
--- a/fs/xfs/xfs_attr_item.c
+++ b/fs/xfs/xfs_attr_item.c
@@ -290,7 +290,7 @@ STATIC void
 xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
 {
 	kmem_free(attrdp->attrd_item.li_lv_shadow);
-	kmem_free(attrdp);
+	kmem_cache_free(xfs_attrd_cache, attrdp);
 }
 
 STATIC void


^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] xfs: share xattr name and value buffers when logging xattr updates
  2022-05-18 18:55 ` [PATCH 2/3] xfs: share xattr name and value buffers when logging xattr updates Darrick J. Wong
@ 2022-05-19  0:27   ` Dave Chinner
  2022-05-19 18:08     ` Darrick J. Wong
  0 siblings, 1 reply; 11+ messages in thread
From: Dave Chinner @ 2022-05-19  0:27 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson

On Wed, May 18, 2022 at 11:55:13AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> While running xfs/297 and generic/642, I noticed a crash in
> xfs_attri_item_relog when it tries to copy the attr name to the new
> xattri log item.  I think what happened here was that we called
> ->iop_commit on the old attri item (which nulls out the pointers) as
> part of a log force at the same time that a chained attr operation was
> ongoing.  The system was busy enough that at some later point, the defer
> ops operation decided it was necessary to relog the attri log item, but
> as we've detached the name buffer from the old attri log item, we can't
> copy it to the new one, and kaboom.
> 
> I think there's a broader refcounting problem with LARP mode -- the
> setxattr code can return to userspace before the CIL actually formats
> and commits the log item, which results in a UAF bug.  Therefore, the
> xattr log item needs to be able to retain a reference to the name and
> value buffers until the log items have completely cleared the log.
> Furthermore, each time we create an intent log item, we allocate new
> memory and (re)copy the contents; sharing here would be very useful.
> 
> Solve the UAF and the unnecessary memory allocations by having the log
> code create a single refcounted buffer to contain the name and value
> contents.  This buffer can be passed from old to new during a relog
> operation, and the logging code can (optionally) attach it to the
> xfs_attr_item for reuse when LARP mode is enabled.
> 
> This also fixes a problem where the xfs_attri_log_item objects weren't
> being freed back to the same cache where they came from.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/libxfs/xfs_attr.h |    8 +
>  fs/xfs/xfs_attr_item.c   |  279 +++++++++++++++++++++++++++-------------------
>  fs/xfs/xfs_attr_item.h   |   13 +-
>  3 files changed, 182 insertions(+), 118 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> index 1af7abe29eef..17746dcc2268 100644
> --- a/fs/xfs/libxfs/xfs_attr.h
> +++ b/fs/xfs/libxfs/xfs_attr.h
> @@ -501,6 +501,8 @@ enum xfs_delattr_state {
>  	{ XFS_DAS_NODE_REMOVE_ATTR,	"XFS_DAS_NODE_REMOVE_ATTR" }, \
>  	{ XFS_DAS_DONE,			"XFS_DAS_DONE" }
>  
> +struct xfs_attri_log_nameval;

Ok, so this structure is:

struct xfs_attri_log_nameval {
	refcount_t		anvl_refcount;
	unsigned int		anvl_name_len;
	unsigned int		anvl_value_len;

	/* name and value follow the end of this struct */
};

If we are going to have a custom structure for this, let's actually
do it properly and not use magic pointers for the name/value
buffers. These are effectively iovecs, and we already do this
properly with log iovecs in shadow buffers. i.e:

struct xfs_attri_log_nameval {
	refcount_t		refcount;
	struct xfs_log_iovec	name;
	struct xfs_log_iovec	value;

	/* name and value follow the end of this struct */
};

When we set up the structure:

	nv.name.i_addr = (void *)&nv + 1;
	nv.name.i_len = name_len;
	memcpy(nv.name.i_addr, name, name_len);
	if (value_len) {
		nv.value.i_addr = nv.name.i_addr + nv.name.i_len;
		nv.value.i_len = value_len;
		memcpy(nv.value.i_addr, value, value_len);
	}

And from that point onwards we only ever use the i_addr/i_len pairs
to refer to the name/values. We don't need pointer magic anywhere
but the setup code.

(Also, "anvil"? Too clever by half and unnecessily verbose. It's a
just {name, value} pair, call it "nv").

>  /*
>   * Defines for xfs_attr_item.xattri_flags
>   */
> @@ -512,6 +514,12 @@ enum xfs_delattr_state {
>  struct xfs_attr_item {
>  	struct xfs_da_args		*xattri_da_args;
>  
> +	/*
> +	 * Shared buffer containing the attr name and value so that the logging
> +	 * code can share large memory buffers between log items.
> +	 */
> +	struct xfs_attri_log_nameval	*xattri_nameval;
> +
>  	/*
>  	 * Used by xfs_attr_set to hold a leaf buffer across a transaction roll
>  	 */
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 4976b1ddc09f..7d4469e8a4fc 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -39,12 +39,91 @@ static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
>  	return container_of(lip, struct xfs_attri_log_item, attri_item);
>  }
>  
> +/*
> + * Shared xattr name/value buffers for logged extended attribute operations
> + *
> + * When logging updates to extended attributes, we can create quite a few
> + * attribute log intent items for a single xattr update.  To avoid cycling the
> + * memory allocator and memcpy overhead, the name (and value, for setxattr)
> + * are kept in a refcounted object that is shared across all related log items
> + * and the upper-level deferred work state structure.  The shared buffer has
> + * a control structure, followed by the name, and then the value.
> + */
> +
> +static inline struct xfs_attri_log_nameval *
> +xfs_attri_log_nameval_get(
> +	struct xfs_attri_log_nameval	*anvl)
> +{
> +	BUG_ON(!refcount_inc_not_zero(&anvl->anvl_refcount));
> +	return anvl;

No BUG_ON() error handling.

ASSERT or WARN, return NULL on failure. Handle failure in the
caller.

> +}
> +
> +static inline void
> +xfs_attri_log_nameval_put(
> +	struct xfs_attri_log_nameval	*anvl)
> +{
> +	if (refcount_dec_and_test(&anvl->anvl_refcount))
> +		kvfree(anvl);
> +}
> +
> +static inline void *
> +xfs_attri_log_namebuf(
> +	struct xfs_attri_log_item	*attrip)
> +{
> +	return attrip->attri_nameval + 1;

I don't know what this points to just looking at this. Trying to
work out if it points to an address in the attri_nameval region or
whether it points to something in the attrip structure makes my head
hurt.  I'd much prefer this be written as:

	return attrip->attri_nameval->name.i_addr;

Because it explicitly encodes exactly what buffer we are returning
here. And with this, we probably don't even need the wrapper
function now.

> +}
> +
> +static inline void *
> +xfs_attri_log_valbuf(
> +	struct xfs_attri_log_item	*attrip)
> +{
> +	struct xfs_attri_log_nameval	*anvl = attrip->attri_nameval;
> +
> +	if (anvl->anvl_value_len == 0)
> +		return NULL;
> +
> +	return (char *)xfs_attri_log_namebuf(attrip) + anvl->anvl_name_len;
> +}

And this becomes:

	return attrip->attri_nameval->value.i_addr;

Because it is initialised to NULL if there is no value.

I think the need for this wrapper goes away, too.

> +
> +static inline struct xfs_attri_log_nameval *
> +xfs_attri_log_nameval_alloc(
> +	const void			*name,
> +	unsigned int			name_len,
> +	const void			*value,
> +	unsigned int			value_len)
> +{
> +	struct xfs_attri_log_nameval	*anvl;
> +	void				*p;
> +
> +	/*
> +	 * This could be over 64kB in length, so we have to use kvmalloc() for
> +	 * this. But kvmalloc() utterly sucks, so we use our own version.
> +	 */
> +	anvl = xlog_kvmalloc(sizeof(struct xfs_attri_log_nameval) +
> +					name_len + value_len);
> +	if (!anvl)
> +		return anvl;
> +
> +	anvl->anvl_name_len = name_len;
> +	anvl->anvl_value_len = value_len;

I'm really starting to dislike all the 4 and 5 letter variable name
prefixs in the new attr code. They are all just repeating the
variable name and so are largely redundant and makes the code
very verbose for no good reason. This:

	anvl->name_len = name_len;
	anvl->value_len = value_len;

is better because it is shorter and conveys exactly the same
information to the reader.

> +	p = anvl + 1;
> +	memcpy(p, name, name_len);
> +	if (value_len) {
> +		p += name_len;
> +		memcpy(p, value, value_len);
> +	}
> +	refcount_set(&anvl->anvl_refcount, 1);
> +	return anvl;
> +}
> +
>  STATIC void
>  xfs_attri_item_free(
>  	struct xfs_attri_log_item	*attrip)
>  {
>  	kmem_free(attrip->attri_item.li_lv_shadow);
> -	kvfree(attrip);
> +	if (attrip->attri_nameval)
> +		xfs_attri_log_nameval_put(attrip->attri_nameval);

Handle the NULL inside xfs_attri_log_nameval_put().

....

> @@ -108,22 +189,22 @@ xfs_attri_item_format(
>  	 * the log recovery.
>  	 */
>  
> -	ASSERT(attrip->attri_name_len > 0);
> +	ASSERT(anvl->anvl_name_len > 0);
>  	attrip->attri_format.alfi_size++;
>  
> -	if (attrip->attri_value_len > 0)
> +	if (anvl->anvl_value_len > 0)
>  		attrip->attri_format.alfi_size++;
>  
>  	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
>  			&attrip->attri_format,
>  			sizeof(struct xfs_attri_log_format));
>  	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
> -			attrip->attri_name,
> -			attrip->attri_name_len);
> -	if (attrip->attri_value_len > 0)
> +			xfs_attri_log_namebuf(attrip),
> +			anvl->anvl_name_len);

Yeah, these wrappers to retrieve the actual buffer need to die.

FWIW, If we've already got the name encoded in a log iovec, add a
xlog_copy_from_iovec() method that handles empty iovecs and just
replace all this code with:

	xlog_copy_from_iovec(lv, &vecp, nv->name);
	xlog_copy_from_iovec(lv, &vecp, nv->value);

This is the first step towards sharing the NV buffer deep into the
CIL commit code so the shadow buffer doesn't need to allocate
another 64kB and copy that 64kB every time we log a new intent.

Not necessary for this patch, but I really want start in that
direction to get away from magic buffers so that the future
optimisations we already know we need to make are easier to do.

> +	if (anvl->anvl_value_len > 0)
>  		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
> -				attrip->attri_value,
> -				attrip->attri_value_len);
> +				xfs_attri_log_valbuf(attrip),
> +				anvl->anvl_value_len);
>  }
>  
>  /*
> @@ -158,41 +239,17 @@ xfs_attri_item_release(
>  STATIC struct xfs_attri_log_item *
>  xfs_attri_init(
>  	struct xfs_mount		*mp,
> -	uint32_t			name_len,
> -	uint32_t			value_len)
> -
> +	struct xfs_attri_log_nameval	*anvl)
>  {
>  	struct xfs_attri_log_item	*attrip;
> -	uint32_t			buffer_size = name_len + value_len;
>  
> -	if (buffer_size) {
> -		/*
> -		 * This could be over 64kB in length, so we have to use
> -		 * kvmalloc() for this. But kvmalloc() utterly sucks, so we
> -		 * use own version.
> -		 */
> -		attrip = xlog_kvmalloc(sizeof(struct xfs_attri_log_item) +
> -					buffer_size);
> -	} else {
> -		attrip = kmem_cache_alloc(xfs_attri_cache,
> -					GFP_NOFS | __GFP_NOFAIL);
> -	}
> -	memset(attrip, 0, sizeof(struct xfs_attri_log_item));
> +	attrip = kmem_cache_zalloc(xfs_attri_cache, GFP_NOFS | __GFP_NOFAIL);
>  
> -	attrip->attri_name_len = name_len;
> -	if (name_len)
> -		attrip->attri_name = ((char *)attrip) +
> -				sizeof(struct xfs_attri_log_item);
> -	else
> -		attrip->attri_name = NULL;
> -
> -	attrip->attri_value_len = value_len;
> -	if (value_len)
> -		attrip->attri_value = ((char *)attrip) +
> -				sizeof(struct xfs_attri_log_item) +
> -				name_len;
> -	else
> -		attrip->attri_value = NULL;
> +	/*
> +	 * Grab an extra reference to the name/value buffer for this log item.
> +	 * The caller retains its own reference!
> +	 */
> +	attrip->attri_nameval = xfs_attri_log_nameval_get(anvl);

Handle _get failure here, or better, pass in an already referenced
nv because the caller should always have a reference to begin
with. Caller can probably handle allocation failure, too, because
this should be called before we've dirtied the transaction, right?

.....

> @@ -385,16 +435,33 @@ xfs_attr_create_intent(
>  	 * Each attr item only performs one attribute operation at a time, so
>  	 * this is a list of one
>  	 */
> -	list_for_each_entry(attr, items, xattri_list) {
> -		attrip = xfs_attri_init(mp, attr->xattri_da_args->namelen,
> -					attr->xattri_da_args->valuelen);
> -		if (attrip == NULL)
> -			return NULL;
> +	attr = list_first_entry_or_null(items, struct xfs_attr_item,
> +			xattri_list);
>  
> -		xfs_trans_add_item(tp, &attrip->attri_item);
> -		xfs_attr_log_item(tp, attrip, attr);
> +	/*
> +	 * Create a buffer to store the attribute name and value.  This buffer
> +	 * will be shared between the higher level deferred xattr work state
> +	 * and the lower level xattr log items.
> +	 */
> +	if (!attr->xattri_nameval) {
> +		struct xfs_da_args	*args = attr->xattri_da_args;
> +
> +		/*
> +		 * Transfer our reference to the name/value buffer to the
> +		 * deferred work state structure.
> +		 */
> +		attr->xattri_nameval = xfs_attri_log_nameval_alloc(args->name,
> +				args->namelen, args->value, args->valuelen);
> +	}
> +	if (!attr->xattri_nameval) {
> +		xlog_force_shutdown(mp->m_log, SHUTDOWN_LOG_IO_ERROR);
> +		return NULL;
>  	}

Why shutdown on ENOMEM? I would have expects that we return to the
caller so it can cancel the transaction. That way we only shut down
if the transaction is dirty or the caller context can't handle
errors....

>  
> +	attrip = xfs_attri_init(mp, attr->xattri_nameval);
> +	xfs_trans_add_item(tp, &attrip->attri_item);
> +	xfs_attr_log_item(tp, attrip, attr);
> +
>  	return &attrip->attri_item;
>  }
>  
> @@ -404,6 +471,8 @@ xfs_attr_free_item(
>  {
>  	if (attr->xattri_da_state)
>  		xfs_da_state_free(attr->xattri_da_state);
> +	if (attr->xattri_nameval)
> +		xfs_attri_log_nameval_put(attr->xattri_nameval);

Handle the null inside xfs_attri_log_nameval_put().

> @@ -567,10 +615,17 @@ xfs_attri_item_recover(
>  	attr->xattri_op_flags = attrp->alfi_op_flags &
>  						XFS_ATTR_OP_FLAGS_TYPE_MASK;
>  
> +	/*
> +	 * We're reconstructing the deferred work state structure from the
> +	 * recovered log item.  Grab a reference to the name/value buffer and
> +	 * attach it to the new work state.
> +	 */
> +	attr->xattri_nameval = xfs_attri_log_nameval_get(anvl);

and handle NULL here.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] xfs: validate xattr name earlier in recovery
  2022-05-18 18:55 ` [PATCH 1/3] xfs: validate xattr name earlier in recovery Darrick J. Wong
@ 2022-05-19  1:36   ` Dave Chinner
  2022-05-19 20:33   ` Alli
  1 sibling, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2022-05-19  1:36 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson

On Wed, May 18, 2022 at 11:55:08AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> When we're validating a recovered xattr log item during log recovery, we
> should check the name before starting to allocate resources.  This isn't
> strictly necessary on its own, but it means that we won't bother with
> huge memory allocations during recovery if the attr name is garbage,
> which will simplify the changes in the next patch.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_attr_item.c |   15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)

Looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] xfs: free xfs_attrd_log_items correctly
  2022-05-18 18:55 ` [PATCH 3/3] xfs: free xfs_attrd_log_items correctly Darrick J. Wong
@ 2022-05-19  1:37   ` Dave Chinner
  2022-05-19 20:33   ` Alli
  1 sibling, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2022-05-19  1:37 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson

On Wed, May 18, 2022 at 11:55:19AM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Technically speaking, objects allocated out of a specific slab cache are
> supposed to be freed to that slab cache.  The popular slab backends will
> take care of this for us, but SLOB famously doesn't.  Fix this, even if
> slob + xfs are not that common of a combination.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> ---
>  fs/xfs/xfs_attr_item.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

looks fine.

Reviewed-by: Dave Chinner <dchinner@redhat.com>
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] xfs: share xattr name and value buffers when logging xattr updates
  2022-05-19  0:27   ` Dave Chinner
@ 2022-05-19 18:08     ` Darrick J. Wong
  2022-05-20  3:22       ` Dave Chinner
  0 siblings, 1 reply; 11+ messages in thread
From: Darrick J. Wong @ 2022-05-19 18:08 UTC (permalink / raw)
  To: Dave Chinner; +Cc: linux-xfs, allison.henderson

On Thu, May 19, 2022 at 10:27:27AM +1000, Dave Chinner wrote:
> On Wed, May 18, 2022 at 11:55:13AM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@kernel.org>
> > 
> > While running xfs/297 and generic/642, I noticed a crash in
> > xfs_attri_item_relog when it tries to copy the attr name to the new
> > xattri log item.  I think what happened here was that we called
> > ->iop_commit on the old attri item (which nulls out the pointers) as
> > part of a log force at the same time that a chained attr operation was
> > ongoing.  The system was busy enough that at some later point, the defer
> > ops operation decided it was necessary to relog the attri log item, but
> > as we've detached the name buffer from the old attri log item, we can't
> > copy it to the new one, and kaboom.
> > 
> > I think there's a broader refcounting problem with LARP mode -- the
> > setxattr code can return to userspace before the CIL actually formats
> > and commits the log item, which results in a UAF bug.  Therefore, the
> > xattr log item needs to be able to retain a reference to the name and
> > value buffers until the log items have completely cleared the log.
> > Furthermore, each time we create an intent log item, we allocate new
> > memory and (re)copy the contents; sharing here would be very useful.
> > 
> > Solve the UAF and the unnecessary memory allocations by having the log
> > code create a single refcounted buffer to contain the name and value
> > contents.  This buffer can be passed from old to new during a relog
> > operation, and the logging code can (optionally) attach it to the
> > xfs_attr_item for reuse when LARP mode is enabled.
> > 
> > This also fixes a problem where the xfs_attri_log_item objects weren't
> > being freed back to the same cache where they came from.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@kernel.org>
> > ---
> >  fs/xfs/libxfs/xfs_attr.h |    8 +
> >  fs/xfs/xfs_attr_item.c   |  279 +++++++++++++++++++++++++++-------------------
> >  fs/xfs/xfs_attr_item.h   |   13 +-
> >  3 files changed, 182 insertions(+), 118 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/libxfs/xfs_attr.h b/fs/xfs/libxfs/xfs_attr.h
> > index 1af7abe29eef..17746dcc2268 100644
> > --- a/fs/xfs/libxfs/xfs_attr.h
> > +++ b/fs/xfs/libxfs/xfs_attr.h
> > @@ -501,6 +501,8 @@ enum xfs_delattr_state {
> >  	{ XFS_DAS_NODE_REMOVE_ATTR,	"XFS_DAS_NODE_REMOVE_ATTR" }, \
> >  	{ XFS_DAS_DONE,			"XFS_DAS_DONE" }
> >  
> > +struct xfs_attri_log_nameval;
> 
> Ok, so this structure is:
> 
> struct xfs_attri_log_nameval {
> 	refcount_t		anvl_refcount;
> 	unsigned int		anvl_name_len;
> 	unsigned int		anvl_value_len;
> 
> 	/* name and value follow the end of this struct */
> };
> 
> If we are going to have a custom structure for this, let's actually
> do it properly and not use magic pointers for the name/value
> buffers. These are effectively iovecs, and we already do this
> properly with log iovecs in shadow buffers. i.e:
> 
> struct xfs_attri_log_nameval {
> 	refcount_t		refcount;
> 	struct xfs_log_iovec	name;
> 	struct xfs_log_iovec	value;
> 
> 	/* name and value follow the end of this struct */
> };
> 
> When we set up the structure:
> 
> 	nv.name.i_addr = (void *)&nv + 1;
> 	nv.name.i_len = name_len;
> 	memcpy(nv.name.i_addr, name, name_len);
> 	if (value_len) {
> 		nv.value.i_addr = nv.name.i_addr + nv.name.i_len;
> 		nv.value.i_len = value_len;
> 		memcpy(nv.value.i_addr, value, value_len);
> 	}

Yes, that makes a lot more sense.  I'll convert the whole thing to use
log iovecs and non-prefixed names.

> And from that point onwards we only ever use the i_addr/i_len pairs
> to refer to the name/values. We don't need pointer magic anywhere
> but the setup code.
> 
> (Also, "anvil"? Too clever by half and unnecessily verbose. It's a
> just {name, value} pair, call it "nv").

Heh. :)

> >  /*
> >   * Defines for xfs_attr_item.xattri_flags
> >   */
> > @@ -512,6 +514,12 @@ enum xfs_delattr_state {
> >  struct xfs_attr_item {
> >  	struct xfs_da_args		*xattri_da_args;
> >  
> > +	/*
> > +	 * Shared buffer containing the attr name and value so that the logging
> > +	 * code can share large memory buffers between log items.
> > +	 */
> > +	struct xfs_attri_log_nameval	*xattri_nameval;
> > +
> >  	/*
> >  	 * Used by xfs_attr_set to hold a leaf buffer across a transaction roll
> >  	 */
> > diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> > index 4976b1ddc09f..7d4469e8a4fc 100644
> > --- a/fs/xfs/xfs_attr_item.c
> > +++ b/fs/xfs/xfs_attr_item.c
> > @@ -39,12 +39,91 @@ static inline struct xfs_attri_log_item *ATTRI_ITEM(struct xfs_log_item *lip)
> >  	return container_of(lip, struct xfs_attri_log_item, attri_item);
> >  }
> >  
> > +/*
> > + * Shared xattr name/value buffers for logged extended attribute operations
> > + *
> > + * When logging updates to extended attributes, we can create quite a few
> > + * attribute log intent items for a single xattr update.  To avoid cycling the
> > + * memory allocator and memcpy overhead, the name (and value, for setxattr)
> > + * are kept in a refcounted object that is shared across all related log items
> > + * and the upper-level deferred work state structure.  The shared buffer has
> > + * a control structure, followed by the name, and then the value.
> > + */
> > +
> > +static inline struct xfs_attri_log_nameval *
> > +xfs_attri_log_nameval_get(
> > +	struct xfs_attri_log_nameval	*anvl)
> > +{
> > +	BUG_ON(!refcount_inc_not_zero(&anvl->anvl_refcount));
> > +	return anvl;
> 
> No BUG_ON() error handling.
> 
> ASSERT or WARN, return NULL on failure. Handle failure in the
> caller.

Fixed.

> > +}
> > +
> > +static inline void
> > +xfs_attri_log_nameval_put(
> > +	struct xfs_attri_log_nameval	*anvl)
> > +{
> > +	if (refcount_dec_and_test(&anvl->anvl_refcount))
> > +		kvfree(anvl);
> > +}
> > +
> > +static inline void *
> > +xfs_attri_log_namebuf(
> > +	struct xfs_attri_log_item	*attrip)
> > +{
> > +	return attrip->attri_nameval + 1;
> 
> I don't know what this points to just looking at this. Trying to
> work out if it points to an address in the attri_nameval region or
> whether it points to something in the attrip structure makes my head
> hurt.  I'd much prefer this be written as:
> 
> 	return attrip->attri_nameval->name.i_addr;
> 
> Because it explicitly encodes exactly what buffer we are returning
> here. And with this, we probably don't even need the wrapper
> function now.

Yep, this is now removed.

> > +}
> > +
> > +static inline void *
> > +xfs_attri_log_valbuf(
> > +	struct xfs_attri_log_item	*attrip)
> > +{
> > +	struct xfs_attri_log_nameval	*anvl = attrip->attri_nameval;
> > +
> > +	if (anvl->anvl_value_len == 0)
> > +		return NULL;
> > +
> > +	return (char *)xfs_attri_log_namebuf(attrip) + anvl->anvl_name_len;
> > +}
> 
> And this becomes:
> 
> 	return attrip->attri_nameval->value.i_addr;
> 
> Because it is initialised to NULL if there is no value.
> 
> I think the need for this wrapper goes away, too.

Also removed.

> > +
> > +static inline struct xfs_attri_log_nameval *
> > +xfs_attri_log_nameval_alloc(
> > +	const void			*name,
> > +	unsigned int			name_len,
> > +	const void			*value,
> > +	unsigned int			value_len)
> > +{
> > +	struct xfs_attri_log_nameval	*anvl;
> > +	void				*p;
> > +
> > +	/*
> > +	 * This could be over 64kB in length, so we have to use kvmalloc() for
> > +	 * this. But kvmalloc() utterly sucks, so we use our own version.
> > +	 */
> > +	anvl = xlog_kvmalloc(sizeof(struct xfs_attri_log_nameval) +
> > +					name_len + value_len);
> > +	if (!anvl)
> > +		return anvl;
> > +
> > +	anvl->anvl_name_len = name_len;
> > +	anvl->anvl_value_len = value_len;
> 
> I'm really starting to dislike all the 4 and 5 letter variable name
> prefixs in the new attr code. They are all just repeating the
> variable name and so are largely redundant and makes the code
> very verbose for no good reason. This:
> 
> 	anvl->name_len = name_len;
> 	anvl->value_len = value_len;
> 
> is better because it is shorter and conveys exactly the same
> information to the reader.

<nod>

> > +	p = anvl + 1;
> > +	memcpy(p, name, name_len);
> > +	if (value_len) {
> > +		p += name_len;
> > +		memcpy(p, value, value_len);
> > +	}
> > +	refcount_set(&anvl->anvl_refcount, 1);
> > +	return anvl;
> > +}
> > +
> >  STATIC void
> >  xfs_attri_item_free(
> >  	struct xfs_attri_log_item	*attrip)
> >  {
> >  	kmem_free(attrip->attri_item.li_lv_shadow);
> > -	kvfree(attrip);
> > +	if (attrip->attri_nameval)
> > +		xfs_attri_log_nameval_put(attrip->attri_nameval);
> 
> Handle the NULL inside xfs_attri_log_nameval_put().

Fixed.

> ....
> 
> > @@ -108,22 +189,22 @@ xfs_attri_item_format(
> >  	 * the log recovery.
> >  	 */
> >  
> > -	ASSERT(attrip->attri_name_len > 0);
> > +	ASSERT(anvl->anvl_name_len > 0);
> >  	attrip->attri_format.alfi_size++;
> >  
> > -	if (attrip->attri_value_len > 0)
> > +	if (anvl->anvl_value_len > 0)
> >  		attrip->attri_format.alfi_size++;
> >  
> >  	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTRI_FORMAT,
> >  			&attrip->attri_format,
> >  			sizeof(struct xfs_attri_log_format));
> >  	xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_NAME,
> > -			attrip->attri_name,
> > -			attrip->attri_name_len);
> > -	if (attrip->attri_value_len > 0)
> > +			xfs_attri_log_namebuf(attrip),
> > +			anvl->anvl_name_len);
> 
> Yeah, these wrappers to retrieve the actual buffer need to die.
> 
> FWIW, If we've already got the name encoded in a log iovec, add a
> xlog_copy_from_iovec() method that handles empty iovecs and just
> replace all this code with:
> 
> 	xlog_copy_from_iovec(lv, &vecp, nv->name);
> 	xlog_copy_from_iovec(lv, &vecp, nv->value);
> 
> This is the first step towards sharing the NV buffer deep into the
> CIL commit code so the shadow buffer doesn't need to allocate
> another 64kB and copy that 64kB every time we log a new intent.
> 
> Not necessary for this patch, but I really want start in that
> direction to get away from magic buffers so that the future
> optimisations we already know we need to make are easier to do.

I agree, structured, uh, structs seems like a better course to take in
the long run.

> > +	if (anvl->anvl_value_len > 0)
> >  		xlog_copy_iovec(lv, &vecp, XLOG_REG_TYPE_ATTR_VALUE,
> > -				attrip->attri_value,
> > -				attrip->attri_value_len);
> > +				xfs_attri_log_valbuf(attrip),
> > +				anvl->anvl_value_len);
> >  }
> >  
> >  /*
> > @@ -158,41 +239,17 @@ xfs_attri_item_release(
> >  STATIC struct xfs_attri_log_item *
> >  xfs_attri_init(
> >  	struct xfs_mount		*mp,
> > -	uint32_t			name_len,
> > -	uint32_t			value_len)
> > -
> > +	struct xfs_attri_log_nameval	*anvl)
> >  {
> >  	struct xfs_attri_log_item	*attrip;
> > -	uint32_t			buffer_size = name_len + value_len;
> >  
> > -	if (buffer_size) {
> > -		/*
> > -		 * This could be over 64kB in length, so we have to use
> > -		 * kvmalloc() for this. But kvmalloc() utterly sucks, so we
> > -		 * use own version.
> > -		 */
> > -		attrip = xlog_kvmalloc(sizeof(struct xfs_attri_log_item) +
> > -					buffer_size);
> > -	} else {
> > -		attrip = kmem_cache_alloc(xfs_attri_cache,
> > -					GFP_NOFS | __GFP_NOFAIL);
> > -	}
> > -	memset(attrip, 0, sizeof(struct xfs_attri_log_item));
> > +	attrip = kmem_cache_zalloc(xfs_attri_cache, GFP_NOFS | __GFP_NOFAIL);
> >  
> > -	attrip->attri_name_len = name_len;
> > -	if (name_len)
> > -		attrip->attri_name = ((char *)attrip) +
> > -				sizeof(struct xfs_attri_log_item);
> > -	else
> > -		attrip->attri_name = NULL;
> > -
> > -	attrip->attri_value_len = value_len;
> > -	if (value_len)
> > -		attrip->attri_value = ((char *)attrip) +
> > -				sizeof(struct xfs_attri_log_item) +
> > -				name_len;
> > -	else
> > -		attrip->attri_value = NULL;
> > +	/*
> > +	 * Grab an extra reference to the name/value buffer for this log item.
> > +	 * The caller retains its own reference!
> > +	 */
> > +	attrip->attri_nameval = xfs_attri_log_nameval_get(anvl);
> 
> Handle _get failure here, or better, pass in an already referenced
> nv because the caller should always have a reference to begin
> with. Caller can probably handle allocation failure, too, because
> this should be called before we've dirtied the transaction, right?

_nameval_get merely bumps the refcount, so the caller should already
have a valid reference to begin with.  So I think the _get function can
become:

	if (!refcount_inc_not_zero(anvl..))
		return NULL;
	return val;

and then this callsite can add a simple ASSERT to ensure that we never
pass around a zero-refcount object (in theory the refcount code will
also fail loudly):

	attrip->attri_nameval = xfs_attri_log_nameval_get(anvl);
	ASSERT(attrip->attri_nameval);

> .....
> 
> > @@ -385,16 +435,33 @@ xfs_attr_create_intent(
> >  	 * Each attr item only performs one attribute operation at a time, so
> >  	 * this is a list of one
> >  	 */
> > -	list_for_each_entry(attr, items, xattri_list) {
> > -		attrip = xfs_attri_init(mp, attr->xattri_da_args->namelen,
> > -					attr->xattri_da_args->valuelen);
> > -		if (attrip == NULL)
> > -			return NULL;
> > +	attr = list_first_entry_or_null(items, struct xfs_attr_item,
> > +			xattri_list);
> >  
> > -		xfs_trans_add_item(tp, &attrip->attri_item);
> > -		xfs_attr_log_item(tp, attrip, attr);
> > +	/*
> > +	 * Create a buffer to store the attribute name and value.  This buffer
> > +	 * will be shared between the higher level deferred xattr work state
> > +	 * and the lower level xattr log items.
> > +	 */
> > +	if (!attr->xattri_nameval) {
> > +		struct xfs_da_args	*args = attr->xattri_da_args;
> > +
> > +		/*
> > +		 * Transfer our reference to the name/value buffer to the
> > +		 * deferred work state structure.
> > +		 */
> > +		attr->xattri_nameval = xfs_attri_log_nameval_alloc(args->name,
> > +				args->namelen, args->value, args->valuelen);
> > +	}
> > +	if (!attr->xattri_nameval) {
> > +		xlog_force_shutdown(mp->m_log, SHUTDOWN_LOG_IO_ERROR);
> > +		return NULL;
> >  	}
> 
> Why shutdown on ENOMEM? I would have expects that we return to the
> caller so it can cancel the transaction. That way we only shut down
> if the transaction is dirty or the caller context can't handle
> errors....

->create_intent can return NULL if they don't need to log any intent
items to restart a piece of deferred work.  xfs_defer_create_intent*()
currently have no means to convey an errno up to their callers, but that
could be a preparatory cleanup.

> >  
> > +	attrip = xfs_attri_init(mp, attr->xattri_nameval);
> > +	xfs_trans_add_item(tp, &attrip->attri_item);
> > +	xfs_attr_log_item(tp, attrip, attr);
> > +
> >  	return &attrip->attri_item;
> >  }
> >  
> > @@ -404,6 +471,8 @@ xfs_attr_free_item(
> >  {
> >  	if (attr->xattri_da_state)
> >  		xfs_da_state_free(attr->xattri_da_state);
> > +	if (attr->xattri_nameval)
> > +		xfs_attri_log_nameval_put(attr->xattri_nameval);
> 
> Handle the null inside xfs_attri_log_nameval_put().

Fixed.

> > @@ -567,10 +615,17 @@ xfs_attri_item_recover(
> >  	attr->xattri_op_flags = attrp->alfi_op_flags &
> >  						XFS_ATTR_OP_FLAGS_TYPE_MASK;
> >  
> > +	/*
> > +	 * We're reconstructing the deferred work state structure from the
> > +	 * recovered log item.  Grab a reference to the name/value buffer and
> > +	 * attach it to the new work state.
> > +	 */
> > +	attr->xattri_nameval = xfs_attri_log_nameval_get(anvl);
> 
> and handle NULL here.

Done.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 1/3] xfs: validate xattr name earlier in recovery
  2022-05-18 18:55 ` [PATCH 1/3] xfs: validate xattr name earlier in recovery Darrick J. Wong
  2022-05-19  1:36   ` Dave Chinner
@ 2022-05-19 20:33   ` Alli
  1 sibling, 0 replies; 11+ messages in thread
From: Alli @ 2022-05-19 20:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david

On Wed, 2022-05-18 at 11:55 -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> When we're validating a recovered xattr log item during log recovery,
> we
> should check the name before starting to allocate resources.  This
> isn't
> strictly necessary on its own, but it means that we won't bother with
> huge memory allocations during recovery if the attr name is garbage,
> which will simplify the changes in the next patch.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Ok, this looks good to me
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/xfs_attr_item.c |   15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> 
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index fd0a74f3ef45..4976b1ddc09f 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -688,16 +688,23 @@ xlog_recover_attri_commit_pass2(
>  	struct xfs_mount                *mp = log->l_mp;
>  	struct xfs_attri_log_item       *attrip;
>  	struct xfs_attri_log_format     *attri_formatp;
> +	const void			*attr_name;
>  	int				region = 0;
>  
>  	attri_formatp = item->ri_buf[region].i_addr;
> +	attr_name = item->ri_buf[1].i_addr;
>  
> -	/* Validate xfs_attri_log_format */
> +	/* Validate xfs_attri_log_format before the large memory
> allocation */
>  	if (!xfs_attri_validate(mp, attri_formatp)) {
>  		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
>  		return -EFSCORRUPTED;
>  	}
>  
> +	if (!xfs_attr_namecheck(attr_name, attri_formatp-
> >alfi_name_len)) {
> +		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
> +		return -EFSCORRUPTED;
> +	}
> +
>  	/* memory alloc failure will cause replay to abort */
>  	attrip = xfs_attri_init(mp, attri_formatp->alfi_name_len,
>  				attri_formatp->alfi_value_len);
> @@ -713,12 +720,6 @@ xlog_recover_attri_commit_pass2(
>  	memcpy(attrip->attri_name, item->ri_buf[region].i_addr,
>  	       attrip->attri_name_len);
>  
> -	if (!xfs_attr_namecheck(attrip->attri_name, attrip-
> >attri_name_len)) {
> -		XFS_ERROR_REPORT(__func__, XFS_ERRLEVEL_LOW, mp);
> -		error = -EFSCORRUPTED;
> -		goto out;
> -	}
> -
>  	if (attrip->attri_value_len > 0) {
>  		region++;
>  		memcpy(attrip->attri_value, item-
> >ri_buf[region].i_addr,
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 3/3] xfs: free xfs_attrd_log_items correctly
  2022-05-18 18:55 ` [PATCH 3/3] xfs: free xfs_attrd_log_items correctly Darrick J. Wong
  2022-05-19  1:37   ` Dave Chinner
@ 2022-05-19 20:33   ` Alli
  1 sibling, 0 replies; 11+ messages in thread
From: Alli @ 2022-05-19 20:33 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, david

On Wed, 2022-05-18 at 11:55 -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
> 
> Technically speaking, objects allocated out of a specific slab cache
> are
> supposed to be freed to that slab cache.  The popular slab backends
> will
> take care of this for us, but SLOB famously doesn't.  Fix this, even
> if
> slob + xfs are not that common of a combination.
> 
> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Looks fine:
Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
> ---
>  fs/xfs/xfs_attr_item.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> 
> diff --git a/fs/xfs/xfs_attr_item.c b/fs/xfs/xfs_attr_item.c
> index 7d4469e8a4fc..9ef2c2455921 100644
> --- a/fs/xfs/xfs_attr_item.c
> +++ b/fs/xfs/xfs_attr_item.c
> @@ -290,7 +290,7 @@ STATIC void
>  xfs_attrd_item_free(struct xfs_attrd_log_item *attrdp)
>  {
>  	kmem_free(attrdp->attrd_item.li_lv_shadow);
> -	kmem_free(attrdp);
> +	kmem_cache_free(xfs_attrd_cache, attrdp);
>  }
>  
>  STATIC void
> 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH 2/3] xfs: share xattr name and value buffers when logging xattr updates
  2022-05-19 18:08     ` Darrick J. Wong
@ 2022-05-20  3:22       ` Dave Chinner
  0 siblings, 0 replies; 11+ messages in thread
From: Dave Chinner @ 2022-05-20  3:22 UTC (permalink / raw)
  To: Darrick J. Wong; +Cc: linux-xfs, allison.henderson

On Thu, May 19, 2022 at 11:08:16AM -0700, Darrick J. Wong wrote:
> On Thu, May 19, 2022 at 10:27:27AM +1000, Dave Chinner wrote:
> > On Wed, May 18, 2022 at 11:55:13AM -0700, Darrick J. Wong wrote:
> > > @@ -158,41 +239,17 @@ xfs_attri_item_release(
> > >  STATIC struct xfs_attri_log_item *
> > >  xfs_attri_init(
> > >  	struct xfs_mount		*mp,
> > > -	uint32_t			name_len,
> > > -	uint32_t			value_len)
> > > -
> > > +	struct xfs_attri_log_nameval	*anvl)
> > >  {
> > >  	struct xfs_attri_log_item	*attrip;
> > > -	uint32_t			buffer_size = name_len + value_len;
> > >  
> > > -	if (buffer_size) {
> > > -		/*
> > > -		 * This could be over 64kB in length, so we have to use
> > > -		 * kvmalloc() for this. But kvmalloc() utterly sucks, so we
> > > -		 * use own version.
> > > -		 */
> > > -		attrip = xlog_kvmalloc(sizeof(struct xfs_attri_log_item) +
> > > -					buffer_size);
> > > -	} else {
> > > -		attrip = kmem_cache_alloc(xfs_attri_cache,
> > > -					GFP_NOFS | __GFP_NOFAIL);
> > > -	}
> > > -	memset(attrip, 0, sizeof(struct xfs_attri_log_item));
> > > +	attrip = kmem_cache_zalloc(xfs_attri_cache, GFP_NOFS | __GFP_NOFAIL);
> > >  
> > > -	attrip->attri_name_len = name_len;
> > > -	if (name_len)
> > > -		attrip->attri_name = ((char *)attrip) +
> > > -				sizeof(struct xfs_attri_log_item);
> > > -	else
> > > -		attrip->attri_name = NULL;
> > > -
> > > -	attrip->attri_value_len = value_len;
> > > -	if (value_len)
> > > -		attrip->attri_value = ((char *)attrip) +
> > > -				sizeof(struct xfs_attri_log_item) +
> > > -				name_len;
> > > -	else
> > > -		attrip->attri_value = NULL;
> > > +	/*
> > > +	 * Grab an extra reference to the name/value buffer for this log item.
> > > +	 * The caller retains its own reference!
> > > +	 */
> > > +	attrip->attri_nameval = xfs_attri_log_nameval_get(anvl);
> > 
> > Handle _get failure here, or better, pass in an already referenced
> > nv because the caller should always have a reference to begin
> > with. Caller can probably handle allocation failure, too, because
> > this should be called before we've dirtied the transaction, right?
> 
> _nameval_get merely bumps the refcount, so the caller should already
> have a valid reference to begin with.  So I think the _get function can
> become:
> 
> 	if (!refcount_inc_not_zero(anvl..))
> 		return NULL;
> 	return val;
> 
> and then this callsite can add a simple ASSERT to ensure that we never
> pass around a zero-refcount object (in theory the refcount code will
> also fail loudly):
> 
> 	attrip->attri_nameval = xfs_attri_log_nameval_get(anvl);
> 	ASSERT(attrip->attri_nameval);

I'm fine with that - it documents that the get should not fail at
this point.

> 
> > .....
> > 
> > > @@ -385,16 +435,33 @@ xfs_attr_create_intent(
> > >  	 * Each attr item only performs one attribute operation at a time, so
> > >  	 * this is a list of one
> > >  	 */
> > > -	list_for_each_entry(attr, items, xattri_list) {
> > > -		attrip = xfs_attri_init(mp, attr->xattri_da_args->namelen,
> > > -					attr->xattri_da_args->valuelen);
> > > -		if (attrip == NULL)
> > > -			return NULL;
> > > +	attr = list_first_entry_or_null(items, struct xfs_attr_item,
> > > +			xattri_list);
> > >  
> > > -		xfs_trans_add_item(tp, &attrip->attri_item);
> > > -		xfs_attr_log_item(tp, attrip, attr);
> > > +	/*
> > > +	 * Create a buffer to store the attribute name and value.  This buffer
> > > +	 * will be shared between the higher level deferred xattr work state
> > > +	 * and the lower level xattr log items.
> > > +	 */
> > > +	if (!attr->xattri_nameval) {
> > > +		struct xfs_da_args	*args = attr->xattri_da_args;
> > > +
> > > +		/*
> > > +		 * Transfer our reference to the name/value buffer to the
> > > +		 * deferred work state structure.
> > > +		 */
> > > +		attr->xattri_nameval = xfs_attri_log_nameval_alloc(args->name,
> > > +				args->namelen, args->value, args->valuelen);
> > > +	}
> > > +	if (!attr->xattri_nameval) {
> > > +		xlog_force_shutdown(mp->m_log, SHUTDOWN_LOG_IO_ERROR);
> > > +		return NULL;
> > >  	}
> > 
> > Why shutdown on ENOMEM? I would have expects that we return to the
> > caller so it can cancel the transaction. That way we only shut down
> > if the transaction is dirty or the caller context can't handle
> > errors....
> 
> ->create_intent can return NULL if they don't need to log any intent
> items to restart a piece of deferred work.  xfs_defer_create_intent*()
> currently have no means to convey an errno up to their callers, but that
> could be a preparatory cleanup.

Ok, if you add a brief comment then I'm fine with it. The cleanup so
we have error paths here can be done later.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2022-05-20  3:22 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-18 18:55 [PATCHSET 0/3] xfs: fix name/value buffer lifetime errrors Darrick J. Wong
2022-05-18 18:55 ` [PATCH 1/3] xfs: validate xattr name earlier in recovery Darrick J. Wong
2022-05-19  1:36   ` Dave Chinner
2022-05-19 20:33   ` Alli
2022-05-18 18:55 ` [PATCH 2/3] xfs: share xattr name and value buffers when logging xattr updates Darrick J. Wong
2022-05-19  0:27   ` Dave Chinner
2022-05-19 18:08     ` Darrick J. Wong
2022-05-20  3:22       ` Dave Chinner
2022-05-18 18:55 ` [PATCH 3/3] xfs: free xfs_attrd_log_items correctly Darrick J. Wong
2022-05-19  1:37   ` Dave Chinner
2022-05-19 20:33   ` Alli

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.