All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2
@ 2016-02-15  6:18 Dave Chinner
  2016-02-15  6:18 ` [PATCH 01/14] libxfs: Optimize the loop for xfs_bitmap_empty Dave Chinner
                   ` (14 more replies)
  0 siblings, 15 replies; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

Hi folks,

This patchset pulls xfsprogs lixfs up to the same base code as the
v4.5-rc2 kernel. I've simply pulled the commits across and in each
patch made the external xfsprogs changes necessary to make them
compile and work correctly. This applies on top of the xfs_io dax
patchset I posted a short while ago. I've left all the kernel
sign-offs on the commits - it's really only the patches that change
stuff outside libxfs that need any sort of review...

This is pretty much all of the remaining changes needed before I can
cut a 4.5.0-rc1 release of xfsprogs - it's really only bug fixes and
minor changes from here, as I really need this base tree mostly
stable from here on.

The reason for that I'm now going to be build a for-next branch off
this that has all the changes in the current kernel for-next branch.
This will enable Darrick to have a sane code base that he can rebase
all his xfsprogs changes for rmap/reflink on top of. That will make
review, testing and eventual merge much easier for everyone.

-Dave.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 01/14] libxfs: Optimize the loop for xfs_bitmap_empty
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-15  6:18 ` [PATCH 02/14] xfs: add missing bmap cancel calls in error paths Dave Chinner
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Jia He <hejianet@gmail.com>

Source kernel commit 1d4292bfdc77f4f7c520064be15d0c46bd025fd2

If there is any non zero bit in a long bitmap, it can jump out of the
loop and finish the function as soon as possible.

Signed-off-by: Jia He <hejianet@gmail.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_bit.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libxfs/xfs_bit.c b/libxfs/xfs_bit.c
index 8b5b81c..041557a 100644
--- a/libxfs/xfs_bit.c
+++ b/libxfs/xfs_bit.c
@@ -32,13 +32,13 @@ int
 xfs_bitmap_empty(uint *map, uint size)
 {
 	uint i;
-	uint ret = 0;
 
 	for (i = 0; i < size; i++) {
-		ret |= map[i];
+		if (map[i] != 0)
+			return 0;
 	}
 
-	return (ret == 0);
+	return 1;
 }
 
 /*
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 02/14] xfs: add missing bmap cancel calls in error paths
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
  2016-02-15  6:18 ` [PATCH 01/14] libxfs: Optimize the loop for xfs_bitmap_empty Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-15  6:18 ` [PATCH 03/14] xfs: log local to remote symlink conversions correctly on v5 supers Dave Chinner
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Brian Foster <bfoster@redhat.com>

Source kernel commit d4a97a04227d5ba91b91888a016e2300861cfbc7

If a failure occurs after the bmap free list is populated and before
xfs_bmap_finish() completes successfully (which returns a partial
list on failure), the bmap free list must be cancelled. Otherwise,
the extent items on the list are never freed and a memory leak
occurs.

Several random error paths throughout the code suffer this problem.
Fix these up such that xfs_bmap_cancel() is always called on error.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_bmap.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index aef7cf3..b9e58e2 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -5938,6 +5938,7 @@ xfs_bmap_split_extent(
 	return xfs_trans_commit(tp);
 
 out:
+	xfs_bmap_cancel(&free_list);
 	xfs_trans_cancel(tp);
 	return error;
 }
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 03/14] xfs: log local to remote symlink conversions correctly on v5 supers
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
  2016-02-15  6:18 ` [PATCH 01/14] libxfs: Optimize the loop for xfs_bitmap_empty Dave Chinner
  2016-02-15  6:18 ` [PATCH 02/14] xfs: add missing bmap cancel calls in error paths Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-15  6:18 ` [PATCH 04/14] xfs: per-filesystem stats counter implementation Dave Chinner
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Brian Foster <bfoster@redhat.com>

Source kernel commit b7cdc66be54b64daef593894d12ecc405f117829

A local format symlink inode is converted to extent format when an
extended attribute is set on an inode as part of the attribute fork
creation. This means a block is allocated, the local symlink target name
is copied to the block and the block is logged. Currently,
xfs_bmap_local_to_extents() handles logging the remote block data based
on the size of the data fork prior to the conversion. This is not
correct on v5 superblock filesystems, which add an additional header to
remote symlink blocks that is nonexistent in local format inodes.

As a result, the full length of the remote symlink block content is not
logged. This can lead to corruption should a crash occur and log
recovery replay this transaction.

Since a callout is already used to initialize the new remote symlink
block, update the local-to-extents conversion mechanism to make the
callout also responsible for logging the block. It is already required
to set the log buffer type and format the block appropriately based on
the superblock version. This ensures the remote symlink is always logged
correctly. Note that xfs_bmap_local_to_extents() is only called for
symlinks so there are no other callouts that require modification.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_bmap.c           | 10 ++++++----
 libxfs/xfs_symlink_remote.c |  3 +++
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index b9e58e2..8464810 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -940,14 +940,16 @@ xfs_bmap_local_to_extents(
 	bp = xfs_btree_get_bufl(args.mp, tp, args.fsbno, 0);
 
 	/*
-	 * Initialise the block and copy the data
+	 * Initialize the block, copy the data and log the remote buffer.
 	 *
-	 * Note: init_fn must set the buffer log item type correctly!
+	 * The callout is responsible for logging because the remote format
+	 * might differ from the local format and thus we don't know how much to
+	 * log here. Note that init_fn must also set the buffer log item type
+	 * correctly.
 	 */
 	init_fn(tp, bp, ip, ifp);
 
-	/* account for the change in fork size and log everything */
-	xfs_trans_log_buf(tp, bp, 0, ifp->if_bytes - 1);
+	/* account for the change in fork size */
 	xfs_idata_realloc(ip, -ifp->if_bytes, whichfork);
 	xfs_bmap_local_to_extents_empty(ip, whichfork);
 	flags |= XFS_ILOG_CORE;
diff --git a/libxfs/xfs_symlink_remote.c b/libxfs/xfs_symlink_remote.c
index fb9ece8..04c7446 100644
--- a/libxfs/xfs_symlink_remote.c
+++ b/libxfs/xfs_symlink_remote.c
@@ -184,6 +184,7 @@ xfs_symlink_local_to_remote(
 	if (!xfs_sb_version_hascrc(&mp->m_sb)) {
 		bp->b_ops = NULL;
 		memcpy(bp->b_addr, ifp->if_u1.if_data, ifp->if_bytes);
+		xfs_trans_log_buf(tp, bp, 0, ifp->if_bytes - 1);
 		return;
 	}
 
@@ -199,4 +200,6 @@ xfs_symlink_local_to_remote(
 	buf = bp->b_addr;
 	buf += xfs_symlink_hdr_set(mp, ip->i_ino, 0, ifp->if_bytes, bp);
 	memcpy(buf, ifp->if_u1.if_data, ifp->if_bytes);
+	xfs_trans_log_buf(tp, bp, 0, sizeof(struct xfs_dsymlink_hdr) +
+					ifp->if_bytes - 1);
 }
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 04/14] xfs: per-filesystem stats counter implementation
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
                   ` (2 preceding siblings ...)
  2016-02-15  6:18 ` [PATCH 03/14] xfs: log local to remote symlink conversions correctly on v5 supers Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-15  6:18 ` [PATCH 05/14] xfs: introduce BMAPI_ZERO for allocating zeroed extents Dave Chinner
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Bill O'Donnell <billodo@redhat.com>

Source kernel commit ff6d6af2351caea7db681f4539d0d893e400557a

This patch modifies the stats counting macros and the callers
to those macros to properly increment, decrement, and add-to
the xfs stats counts. The counts for global and per-fs stats
are correctly advanced, and cleared by writing a "1" to the
corresponding clear file.

global counts: /sys/fs/xfs/stats/stats
per-fs counts: /sys/fs/xfs/sda*/stats/stats

global clear:  /sys/fs/xfs/stats/stats_clear
per-fs clear:  /sys/fs/xfs/sda*/stats/stats_clear

[dchinner: cleaned up macro variables, removed CONFIG_FS_PROC around
 stats structures and macros. ]

Signed-off-by: Bill O'Donnell <billodo@redhat.com>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/libxfs_priv.h |  6 +++---
 libxfs/xfs_alloc.c   |  8 ++++----
 libxfs/xfs_attr.c    |  6 +++---
 libxfs/xfs_bmap.c    | 20 ++++++++++----------
 libxfs/xfs_btree.h   | 41 ++++++++++++++++++++++++-----------------
 libxfs/xfs_dir2.c    |  6 +++---
 6 files changed, 47 insertions(+), 40 deletions(-)

diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index 0f4d3e5..e2884a2 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -140,9 +140,9 @@ enum ce { CE_DEBUG, CE_CONT, CE_NOTE, CE_WARN, CE_ALERT, CE_PANIC };
 #define XFS_ERRLEVEL_LOW		1
 #define XFS_FORCED_SHUTDOWN(mp)		0
 #define XFS_ILOCK_EXCL			0
-#define XFS_STATS_INC(count)		do { } while (0)
-#define XFS_STATS_DEC(count, x)		do { } while (0)
-#define XFS_STATS_ADD(count, x)		do { } while (0)
+#define XFS_STATS_INC(mp, count)	do { } while (0)
+#define XFS_STATS_DEC(mp, count, x)	do { } while (0)
+#define XFS_STATS_ADD(mp, count, x)	do { } while (0)
 #define XFS_TRANS_MOD_DQUOT_BYINO(mp,tp,ip,field,delta)	do { } while (0)
 #define XFS_TRANS_RESERVE_QUOTA_NBLKS(mp,tp,ip,nblks,ninos,fl)	0
 #define XFS_TRANS_UNRESERVE_QUOTA_NBLKS(mp,tp,ip,nblks,ninos,fl)	0
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index b43655c..12d59df 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -650,8 +650,8 @@ xfs_alloc_ag_vextent(
 				 -((long)(args->len)));
 	}
 
-	XFS_STATS_INC(xs_allocx);
-	XFS_STATS_ADD(xs_allocb, args->len);
+	XFS_STATS_INC(args->mp, xs_allocx);
+	XFS_STATS_ADD(args->mp, xs_allocb, args->len);
 	return error;
 }
 
@@ -1807,8 +1807,8 @@ xfs_free_ag_extent(
 
 	if (!isfl)
 		xfs_trans_mod_sb(tp, XFS_TRANS_SB_FDBLOCKS, (long)len);
-	XFS_STATS_INC(xs_freex);
-	XFS_STATS_ADD(xs_freeb, len);
+	XFS_STATS_INC(mp, xs_freex);
+	XFS_STATS_ADD(mp, xs_freeb, len);
 
 	trace_xfs_free_extent(mp, agno, bno, len, isfl, haveleft, haveright);
 
diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index bdde0f6..5e79f3d 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -120,7 +120,7 @@ xfs_attr_get(
 	uint			lock_mode;
 	int			error;
 
-	XFS_STATS_INC(xs_attr_get);
+	XFS_STATS_INC(ip->i_mount, xs_attr_get);
 
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
 		return -EIO;
@@ -202,7 +202,7 @@ xfs_attr_set(
 	int			rsvd = (flags & ATTR_ROOT) != 0;
 	int			error, err2, committed, local;
 
-	XFS_STATS_INC(xs_attr_set);
+	XFS_STATS_INC(mp, xs_attr_set);
 
 	if (XFS_FORCED_SHUTDOWN(dp->i_mount))
 		return -EIO;
@@ -405,7 +405,7 @@ xfs_attr_remove(
 	xfs_fsblock_t		firstblock;
 	int			error;
 
-	XFS_STATS_INC(xs_attr_remove);
+	XFS_STATS_INC(mp, xs_attr_remove);
 
 	if (XFS_FORCED_SHUTDOWN(dp->i_mount))
 		return -EIO;
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 8464810..8e19b50 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -1429,7 +1429,7 @@ xfs_bmap_search_extents(
 	xfs_ifork_t	*ifp;		/* inode fork pointer */
 	xfs_bmbt_rec_host_t  *ep;            /* extent record pointer */
 
-	XFS_STATS_INC(xs_look_exlist);
+	XFS_STATS_INC(ip->i_mount, xs_look_exlist);
 	ifp = XFS_IFORK_PTR(ip, fork);
 
 	ep = xfs_bmap_search_multi_extents(ifp, bno, eofp, lastxp, gotp, prevp);
@@ -1727,7 +1727,7 @@ xfs_bmap_add_extent_delay_real(
 	ASSERT(!bma->cur ||
 	       (bma->cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL));
 
-	XFS_STATS_INC(xs_add_exlist);
+	XFS_STATS_INC(mp, xs_add_exlist);
 
 #define	LEFT		r[0]
 #define	RIGHT		r[1]
@@ -2281,7 +2281,7 @@ xfs_bmap_add_extent_unwritten_real(
 	ASSERT(*idx <= ifp->if_bytes / sizeof(struct xfs_bmbt_rec));
 	ASSERT(!isnullstartblock(new->br_startblock));
 
-	XFS_STATS_INC(xs_add_exlist);
+	XFS_STATS_INC(mp, xs_add_exlist);
 
 #define	LEFT		r[0]
 #define	RIGHT		r[1]
@@ -2941,7 +2941,7 @@ xfs_bmap_add_extent_hole_real(
 	ASSERT(!bma->cur ||
 	       !(bma->cur->bc_private.b.flags & XFS_BTCUR_BPRV_WASDEL));
 
-	XFS_STATS_INC(xs_add_exlist);
+	XFS_STATS_INC(mp, xs_add_exlist);
 
 	state = 0;
 	if (whichfork == XFS_ATTR_FORK)
@@ -4031,7 +4031,7 @@ xfs_bmapi_read(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	XFS_STATS_INC(xs_blk_mapr);
+	XFS_STATS_INC(mp, xs_blk_mapr);
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 
@@ -4216,7 +4216,7 @@ xfs_bmapi_delay(
 	if (XFS_FORCED_SHUTDOWN(mp))
 		return -EIO;
 
-	XFS_STATS_INC(xs_blk_mapw);
+	XFS_STATS_INC(mp, xs_blk_mapw);
 
 	if (!(ifp->if_flags & XFS_IFEXTENTS)) {
 		error = xfs_iread_extents(NULL, ip, XFS_DATA_FORK);
@@ -4520,7 +4520,7 @@ xfs_bmapi_write(
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 
-	XFS_STATS_INC(xs_blk_mapw);
+	XFS_STATS_INC(mp, xs_blk_mapw);
 
 	if (*firstblock == NULLFSBLOCK) {
 		if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_BTREE)
@@ -4713,12 +4713,12 @@ xfs_bmap_del_extent(
 	xfs_filblks_t		temp2;	/* for indirect length calculations */
 	int			state = 0;
 
-	XFS_STATS_INC(xs_del_exlist);
+	mp = ip->i_mount;
+	XFS_STATS_INC(mp, xs_del_exlist);
 
 	if (whichfork == XFS_ATTR_FORK)
 		state |= BMAP_ATTRFORK;
 
-	mp = ip->i_mount;
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	ASSERT((*idx >= 0) && (*idx < ifp->if_bytes /
 		(uint)sizeof(xfs_bmbt_rec_t)));
@@ -5065,7 +5065,7 @@ xfs_bunmapi(
 		*done = 1;
 		return 0;
 	}
-	XFS_STATS_INC(xs_blk_unmap);
+	XFS_STATS_INC(mp, xs_blk_unmap);
 	isrt = (whichfork == XFS_DATA_FORK) && XFS_IS_REALTIME_INODE(ip);
 	start = bno;
 	bno = start + len - 1;
diff --git a/libxfs/xfs_btree.h b/libxfs/xfs_btree.h
index 48cb251..9a88839 100644
--- a/libxfs/xfs_btree.h
+++ b/libxfs/xfs_btree.h
@@ -84,31 +84,38 @@ union xfs_btree_rec {
 /*
  * Generic stats interface
  */
-#define __XFS_BTREE_STATS_INC(type, stat) \
-	XFS_STATS_INC(xs_ ## type ## _2_ ## stat)
-#define XFS_BTREE_STATS_INC(cur, stat)  \
+#define __XFS_BTREE_STATS_INC(mp, type, stat) \
+	XFS_STATS_INC(mp, xs_ ## type ## _2_ ## stat)
+#define XFS_BTREE_STATS_INC(cur, stat)	\
 do {    \
+	struct xfs_mount *__mp = cur->bc_mp; \
 	switch (cur->bc_btnum) {  \
-	case XFS_BTNUM_BNO: __XFS_BTREE_STATS_INC(abtb, stat); break;	\
-	case XFS_BTNUM_CNT: __XFS_BTREE_STATS_INC(abtc, stat); break;	\
-	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_INC(bmbt, stat); break;	\
-	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(ibt, stat); break;	\
-	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(fibt, stat); break;	\
-	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break;	\
+	case XFS_BTNUM_BNO: __XFS_BTREE_STATS_INC(__mp, abtb, stat); break; \
+	case XFS_BTNUM_CNT: __XFS_BTREE_STATS_INC(__mp, abtc, stat); break; \
+	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_INC(__mp, bmbt, stat); break; \
+	case XFS_BTNUM_INO: __XFS_BTREE_STATS_INC(__mp, ibt, stat); break; \
+	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_INC(__mp, fibt, stat); break; \
+	case XFS_BTNUM_MAX: ASSERT(0); __mp = __mp /* fucking gcc */ ; break; \
 	}       \
 } while (0)
 
-#define __XFS_BTREE_STATS_ADD(type, stat, val) \
-	XFS_STATS_ADD(xs_ ## type ## _2_ ## stat, val)
+#define __XFS_BTREE_STATS_ADD(mp, type, stat, val) \
+	XFS_STATS_ADD(mp, xs_ ## type ## _2_ ## stat, val)
 #define XFS_BTREE_STATS_ADD(cur, stat, val)  \
 do {    \
+	struct xfs_mount *__mp = cur->bc_mp; \
 	switch (cur->bc_btnum) {  \
-	case XFS_BTNUM_BNO: __XFS_BTREE_STATS_ADD(abtb, stat, val); break; \
-	case XFS_BTNUM_CNT: __XFS_BTREE_STATS_ADD(abtc, stat, val); break; \
-	case XFS_BTNUM_BMAP: __XFS_BTREE_STATS_ADD(bmbt, stat, val); break; \
-	case XFS_BTNUM_INO: __XFS_BTREE_STATS_ADD(ibt, stat, val); break; \
-	case XFS_BTNUM_FINO: __XFS_BTREE_STATS_ADD(fibt, stat, val); break; \
-	case XFS_BTNUM_MAX: ASSERT(0); /* fucking gcc */ ; break;	\
+	case XFS_BTNUM_BNO:	\
+		__XFS_BTREE_STATS_ADD(__mp, abtb, stat, val); break; \
+	case XFS_BTNUM_CNT:	\
+		__XFS_BTREE_STATS_ADD(__mp, abtc, stat, val); break; \
+	case XFS_BTNUM_BMAP:	\
+		__XFS_BTREE_STATS_ADD(__mp, bmbt, stat, val); break; \
+	case XFS_BTNUM_INO:	\
+		__XFS_BTREE_STATS_ADD(__mp, ibt, stat, val); break; \
+	case XFS_BTNUM_FINO:	\
+		__XFS_BTREE_STATS_ADD(__mp, fibt, stat, val); break; \
+	case XFS_BTNUM_MAX: ASSERT(0); __mp = __mp /* fucking gcc */ ; break; \
 	}       \
 } while (0)
 
diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 4a19b76..383401b 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -269,7 +269,7 @@ xfs_dir_createname(
 		rval = xfs_dir_ino_validate(tp->t_mountp, inum);
 		if (rval)
 			return rval;
-		XFS_STATS_INC(xs_dir_create);
+		XFS_STATS_INC(dp->i_mount, xs_dir_create);
 	}
 
 	args = kmem_zalloc(sizeof(*args), KM_SLEEP | KM_NOFS);
@@ -362,7 +362,7 @@ xfs_dir_lookup(
 	int		v;		/* type-checking value */
 
 	ASSERT(S_ISDIR(dp->i_d.di_mode));
-	XFS_STATS_INC(xs_dir_lookup);
+	XFS_STATS_INC(dp->i_mount, xs_dir_lookup);
 
 	/*
 	 * We need to use KM_NOFS here so that lockdep will not throw false
@@ -439,7 +439,7 @@ xfs_dir_removename(
 	int		v;		/* type-checking value */
 
 	ASSERT(S_ISDIR(dp->i_d.di_mode));
-	XFS_STATS_INC(xs_dir_remove);
+	XFS_STATS_INC(dp->i_mount, xs_dir_remove);
 
 	args = kmem_zalloc(sizeof(*args), KM_SLEEP | KM_NOFS);
 	if (!args)
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 05/14] xfs: introduce BMAPI_ZERO for allocating zeroed extents
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
                   ` (3 preceding siblings ...)
  2016-02-15  6:18 ` [PATCH 04/14] xfs: per-filesystem stats counter implementation Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-16 19:20   ` Brian Foster
  2016-02-15  6:18 ` [PATCH 06/14] xfs: get mp from bma->ip in xfs_bmap code Dave Chinner
                   ` (9 subsequent siblings)
  14 siblings, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Source kernel commit 3fbbbea34bac049c0b5938dc065f7d8ee1ef7e67

To enable DAX to do atomic allocation of zeroed extents, we need to
drive the block zeroing deep into the allocator. Because
xfs_bmapi_write() can return merged extents on allocation that were
only partially allocated (i.e. requested range spans allocated and
hole regions, allocation into the hole was contiguous), we cannot
zero the extent returned from xfs_bmapi_write() as that can
overwrite existing data with zeros.

Hence we have to drive the extent zeroing into the allocation code,
prior to where we merge the extents into the BMBT and return the
resultant map. This means we need to propagate this need down to
the xfs_alloc_vextent() and issue the block zeroing at this point.

While this functionality is being introduced for DAX, there is no
reason why it is specific to DAX - we can per-zero blocks during the
allocation transaction on any type of device. It's just slow (and
usually slower than unwritten allocation and conversion) on
traditional block devices so doesn't tend to get used. We can,
however, hook hardware zeroing optimisations via sb_issue_zeroout()
to this operation, so it may be useful in future and hence the
"allocate zeroed blocks" API needs to be implementation neutral.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 include/libxfs.h         |  1 -
 libxfs/libxfs_api_defs.h |  1 +
 libxfs/libxfs_io.h       |  2 ++
 libxfs/libxfs_priv.h     |  3 +++
 libxfs/rdwr.c            |  4 +++-
 libxfs/util.c            | 35 +++++++++++++++++++++++++++++++++++
 libxfs/xfs_alloc.c       | 10 +++++++++-
 libxfs/xfs_alloc.h       |  8 +++++---
 libxfs/xfs_bmap.c        | 35 +++++++++++++++++++++++++++++++++--
 libxfs/xfs_bmap.h        | 13 +++++++++++--
 10 files changed, 102 insertions(+), 10 deletions(-)

diff --git a/include/libxfs.h b/include/libxfs.h
index bd51df0..5e8f3d4 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -139,7 +139,6 @@ extern int	libxfs_init (libxfs_init_t *);
 extern void	libxfs_destroy (void);
 extern int	libxfs_device_to_fd (dev_t);
 extern dev_t	libxfs_device_open (char *, int, int, int);
-extern void	libxfs_device_zero(struct xfs_buftarg *, xfs_daddr_t, uint);
 extern void	libxfs_device_close (dev_t);
 extern int	libxfs_device_alignment (void);
 extern void	libxfs_report(FILE *);
diff --git a/libxfs/libxfs_api_defs.h b/libxfs/libxfs_api_defs.h
index e9fd9c8..3a649e3 100644
--- a/libxfs/libxfs_api_defs.h
+++ b/libxfs/libxfs_api_defs.h
@@ -73,6 +73,7 @@
 #define xfs_bunmapi			libxfs_bunmapi
 #define xfs_bmbt_get_all		libxfs_bmbt_get_all
 #define xfs_rtfree_extent		libxfs_rtfree_extent
+#define xfs_zero_extent			libxfs_zero_extent
 
 #define xfs_da_brelse			libxfs_da_brelse
 #define xfs_da_hashname			libxfs_da_hashname
diff --git a/libxfs/libxfs_io.h b/libxfs/libxfs_io.h
index 86b18a0..29ab583 100644
--- a/libxfs/libxfs_io.h
+++ b/libxfs/libxfs_io.h
@@ -212,6 +212,8 @@ extern int	libxfs_writebufr(struct xfs_buf *);
 extern int	libxfs_readbufr(struct xfs_buftarg *, xfs_daddr_t, xfs_buf_t *, int, int);
 extern int	libxfs_readbufr_map(struct xfs_buftarg *, struct xfs_buf *, int);
 
+extern int	libxfs_device_zero(struct xfs_buftarg *, xfs_daddr_t, uint);
+
 extern int libxfs_bhash_size;
 
 #define LIBXFS_BREAD	0x1
diff --git a/libxfs/libxfs_priv.h b/libxfs/libxfs_priv.h
index e2884a2..2c5aba0 100644
--- a/libxfs/libxfs_priv.h
+++ b/libxfs/libxfs_priv.h
@@ -520,6 +520,9 @@ void xfs_verifier_error(struct xfs_buf *bp);
 /* xfs_rtalloc.c */
 int libxfs_rtfree_extent(struct xfs_trans *, xfs_rtblock_t, xfs_extlen_t);
 
+int libxfs_zero_extent(struct xfs_inode *ip, xfs_fsblock_t start_fsb,
+                        xfs_off_t count_fsb);
+
 bool xfs_log_check_lsn(struct xfs_mount *, xfs_lsn_t);
 
 #endif	/* __LIBXFS_INTERNAL_XFS_H__ */
diff --git a/libxfs/rdwr.c b/libxfs/rdwr.c
index 2e298c2..3522c26 100644
--- a/libxfs/rdwr.c
+++ b/libxfs/rdwr.c
@@ -67,7 +67,8 @@
 
 #define IO_BCOMPARE_CHECK
 
-void
+/* XXX: (dgc) Propagate errors, only exit if fail-on-error flag set */
+int
 libxfs_device_zero(struct xfs_buftarg *btp, xfs_daddr_t start, uint len)
 {
 	xfs_off_t	start_offset, end_offset, offset;
@@ -109,6 +110,7 @@ libxfs_device_zero(struct xfs_buftarg *btp, xfs_daddr_t start, uint len)
 		offset += bytes;
 	}
 	free(z);
+	return 0;
 }
 
 static void unmount_record(void *p)
diff --git a/libxfs/util.c b/libxfs/util.c
index 90031fd..ee4bf3c 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -17,6 +17,7 @@
  */
 
 #include "libxfs_priv.h"
+#include "libxfs_io.h"
 #include "init.h"
 #include "xfs_fs.h"
 #include "xfs_shared.h"
@@ -33,6 +34,7 @@
 #include "xfs_trans_space.h"
 #include "xfs_ialloc.h"
 #include "xfs_alloc.h"
+#include "xfs_bit.h"
 
 /*
  * Calculate the worst case log unit reservation for a given superblock
@@ -770,3 +772,36 @@ xfs_log_check_lsn(
 
 	return true;
 }
+
+static struct xfs_buftarg *
+xfs_find_bdev_for_inode(
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+
+	if (XFS_IS_REALTIME_INODE(ip))
+		return mp->m_rtdev_targp;
+	return mp->m_ddev_targp;
+}
+
+static xfs_daddr_t
+xfs_fsb_to_db(struct xfs_inode *ip, xfs_fsblock_t fsb)
+{
+	if (XFS_IS_REALTIME_INODE(ip))
+		 return XFS_FSB_TO_BB(ip->i_mount, fsb);
+	return XFS_FSB_TO_DADDR(ip->i_mount, (fsb));
+}
+
+
+int
+libxfs_zero_extent(
+	struct xfs_inode *ip,
+	xfs_fsblock_t	start_fsb,
+	xfs_off_t	count_fsb)
+{
+	xfs_daddr_t	sector = xfs_fsb_to_db(ip, start_fsb);
+	ssize_t		size = XFS_FSB_TO_BB(ip->i_mount, count_fsb);
+
+	return libxfs_device_zero(xfs_find_bdev_for_inode(ip), sector, size);
+}
+
diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
index 12d59df..af40270 100644
--- a/libxfs/xfs_alloc.c
+++ b/libxfs/xfs_alloc.c
@@ -2508,7 +2508,7 @@ xfs_alloc_vextent(
 		 * Try near allocation first, then anywhere-in-ag after
 		 * the first a.g. fails.
 		 */
-		if ((args->userdata  == XFS_ALLOC_INITIAL_USER_DATA) &&
+		if ((args->userdata & XFS_ALLOC_INITIAL_USER_DATA) &&
 		    (mp->m_flags & XFS_MOUNT_32BITINODES)) {
 			args->fsbno = XFS_AGB_TO_FSB(mp,
 					((mp->m_agfrotor / rotorstep) %
@@ -2639,6 +2639,14 @@ xfs_alloc_vextent(
 		XFS_AG_CHECK_DADDR(mp, XFS_FSB_TO_DADDR(mp, args->fsbno),
 			args->len);
 #endif
+
+		/* Zero the extent if we were asked to do so */
+		if (args->userdata & XFS_ALLOC_USERDATA_ZERO) {
+			error = xfs_zero_extent(args->ip, args->fsbno, args->len);
+			if (error)
+				goto error0;
+		}
+
 	}
 	xfs_perag_put(args->pag);
 	return 0;
diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
index 071b28b..135eb3d 100644
--- a/libxfs/xfs_alloc.h
+++ b/libxfs/xfs_alloc.h
@@ -101,6 +101,7 @@ typedef struct xfs_alloc_arg {
 	struct xfs_mount *mp;		/* file system mount point */
 	struct xfs_buf	*agbp;		/* buffer for a.g. freelist header */
 	struct xfs_perag *pag;		/* per-ag struct for this agno */
+	struct xfs_inode *ip;		/* for userdata zeroing method */
 	xfs_fsblock_t	fsbno;		/* file system block number */
 	xfs_agnumber_t	agno;		/* allocation group number */
 	xfs_agblock_t	agbno;		/* allocation group-relative block # */
@@ -120,15 +121,16 @@ typedef struct xfs_alloc_arg {
 	char		wasdel;		/* set if allocation was prev delayed */
 	char		wasfromfl;	/* set if allocation is from freelist */
 	char		isfl;		/* set if is freelist blocks - !acctg */
-	char		userdata;	/* set if this is user data */
+	char		userdata;	/* mask defining userdata treatment */
 	xfs_fsblock_t	firstblock;	/* io first block allocated */
 } xfs_alloc_arg_t;
 
 /*
  * Defines for userdata
  */
-#define XFS_ALLOC_USERDATA		1	/* allocation is for user data*/
-#define XFS_ALLOC_INITIAL_USER_DATA	2	/* special case start of file */
+#define XFS_ALLOC_USERDATA		(1 << 0)/* allocation is for user data*/
+#define XFS_ALLOC_INITIAL_USER_DATA	(1 << 1)/* special case start of file */
+#define XFS_ALLOC_USERDATA_ZERO		(1 << 2)/* zero extent on allocation */
 
 xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
 		struct xfs_perag *pag, xfs_extlen_t need);
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 8e19b50..a38549c 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -3795,8 +3795,13 @@ xfs_bmap_btalloc(
 	args.wasdel = ap->wasdel;
 	args.isfl = 0;
 	args.userdata = ap->userdata;
-	if ((error = xfs_alloc_vextent(&args)))
+	if (ap->userdata & XFS_ALLOC_USERDATA_ZERO)
+		args.ip = ap->ip;
+
+	error = xfs_alloc_vextent(&args);
+	if (error)
 		return error;
+
 	if (tryagain && args.fsbno == NULLFSBLOCK) {
 		/*
 		 * Exact allocation failed. Now try with alignment
@@ -4295,11 +4300,14 @@ xfs_bmapi_allocate(
 
 	/*
 	 * Indicate if this is the first user data in the file, or just any
-	 * user data.
+	 * user data. And if it is userdata, indicate whether it needs to
+	 * be initialised to zero during allocation.
 	 */
 	if (!(bma->flags & XFS_BMAPI_METADATA)) {
 		bma->userdata = (bma->offset == 0) ?
 			XFS_ALLOC_INITIAL_USER_DATA : XFS_ALLOC_USERDATA;
+		if (bma->flags & XFS_BMAPI_ZERO)
+			bma->userdata |= XFS_ALLOC_USERDATA_ZERO;
 	}
 
 	bma->minlen = (bma->flags & XFS_BMAPI_CONTIG) ? bma->length : 1;
@@ -4414,6 +4422,17 @@ xfs_bmapi_convert_unwritten(
 	mval->br_state = (mval->br_state == XFS_EXT_UNWRITTEN)
 				? XFS_EXT_NORM : XFS_EXT_UNWRITTEN;
 
+	/*
+	 * Before insertion into the bmbt, zero the range being converted
+	 * if required.
+	 */
+	if (flags & XFS_BMAPI_ZERO) {
+		error = xfs_zero_extent(bma->ip, mval->br_startblock,
+					mval->br_blockcount);
+		if (error)
+			return error;
+	}
+
 	error = xfs_bmap_add_extent_unwritten_real(bma->tp, bma->ip, &bma->idx,
 			&bma->cur, mval, bma->firstblock, bma->flist,
 			&tmp_logflags);
@@ -4507,6 +4526,18 @@ xfs_bmapi_write(
 	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL);
 	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
 
+	/* zeroing is for currently only for data extents, not metadata */
+	ASSERT((flags & (XFS_BMAPI_METADATA | XFS_BMAPI_ZERO)) !=
+			(XFS_BMAPI_METADATA | XFS_BMAPI_ZERO));
+	/*
+	 * we can allocate unwritten extents or pre-zero allocated blocks,
+	 * but it makes no sense to do both at once. This would result in
+	 * zeroing the unwritten extent twice, but it still being an
+	 * unwritten extent....
+	 */
+	ASSERT((flags & (XFS_BMAPI_PREALLOC | XFS_BMAPI_ZERO)) !=
+			(XFS_BMAPI_PREALLOC | XFS_BMAPI_ZERO));
+
 	if (unlikely(XFS_TEST_ERROR(
 	    (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
 	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index d3daf6d..baec27d 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -52,9 +52,9 @@ struct xfs_bmalloca {
 	xfs_extlen_t		minleft; /* amount must be left after alloc */
 	bool			eof;	/* set if allocating past last extent */
 	bool			wasdel;	/* replacing a delayed allocation */
-	bool			userdata;/* set if is user data */
 	bool			aeof;	/* allocated space at eof */
 	bool			conv;	/* overwriting unwritten extents */
+	char			userdata;/* userdata mask */
 	int			flags;
 };
 
@@ -109,6 +109,14 @@ typedef	struct xfs_bmap_free
  */
 #define XFS_BMAPI_CONVERT	0x040
 
+/*
+ * allocate zeroed extents - this requires all newly allocated user data extents
+ * to be initialised to zero. It will be ignored if XFS_BMAPI_METADATA is set.
+ * Use in conjunction with XFS_BMAPI_CONVERT to convert unwritten extents found
+ * during the allocation range to zeroed written extents.
+ */
+#define XFS_BMAPI_ZERO		0x080
+
 #define XFS_BMAPI_FLAGS \
 	{ XFS_BMAPI_ENTIRE,	"ENTIRE" }, \
 	{ XFS_BMAPI_METADATA,	"METADATA" }, \
@@ -116,7 +124,8 @@ typedef	struct xfs_bmap_free
 	{ XFS_BMAPI_PREALLOC,	"PREALLOC" }, \
 	{ XFS_BMAPI_IGSTATE,	"IGSTATE" }, \
 	{ XFS_BMAPI_CONTIG,	"CONTIG" }, \
-	{ XFS_BMAPI_CONVERT,	"CONVERT" }
+	{ XFS_BMAPI_CONVERT,	"CONVERT" }, \
+	{ XFS_BMAPI_ZERO,	"ZERO" }
 
 
 static inline int xfs_bmapi_aflag(int w)
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 06/14] xfs: get mp from bma->ip in xfs_bmap code
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
                   ` (4 preceding siblings ...)
  2016-02-15  6:18 ` [PATCH 05/14] xfs: introduce BMAPI_ZERO for allocating zeroed extents Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-15  6:18 ` [PATCH 07/14] xfs: bmapbt checking on debug kernels too expensive Dave Chinner
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Eric Sandeen <sandeen@redhat.com>

Source kernel commit f1f96c4946590616812711ac19eb7a84be160877

In my earlier commit

  c29aad4 xfs: pass mp to XFS_WANT_CORRUPTED_GOTO

I added some local mp variables with code which indicates that
mp might be NULL.  Coverity doesn't like this now, because the
updated per-fs XFS_STATS macros dereference mp.

I don't think this is actually a problem; from what I can tell,
we cannot get to these functions with a null bma->tp, so my NULL
check was probably pointless.  Still, it's not super obvious.

So switch this code to get mp from the inode on the xfs_bmalloca
structure, with no conditional, because the functions are already
using bmap->ip directly.

Addresses-Coverity-Id: 1339552
Addresses-Coverity-Id: 1339553
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_bmap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index a38549c..70417fc 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -1718,7 +1718,7 @@ xfs_bmap_add_extent_delay_real(
 	struct xfs_mount	*mp;
 	int			whichfork = XFS_DATA_FORK;
 
-	mp  = bma->tp ? bma->tp->t_mountp : NULL;
+	mp = bma->ip->i_mount;
 	ifp = XFS_IFORK_PTR(bma->ip, whichfork);
 
 	ASSERT(bma->idx >= 0);
@@ -2932,7 +2932,7 @@ xfs_bmap_add_extent_hole_real(
 	int			state;	/* state bits, accessed thru macros */
 	struct xfs_mount	*mp;
 
-	mp = bma->tp ? bma->tp->t_mountp : NULL;
+	mp = bma->ip->i_mount;
 	ifp = XFS_IFORK_PTR(bma->ip, whichfork);
 
 	ASSERT(bma->idx >= 0);
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 07/14] xfs: bmapbt checking on debug kernels too expensive
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
                   ` (5 preceding siblings ...)
  2016-02-15  6:18 ` [PATCH 06/14] xfs: get mp from bma->ip in xfs_bmap code Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-15  6:18 ` [PATCH 08/14] xfs: eliminate committed arg from xfs_bmap_finish Dave Chinner
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Source kernel commit e35438196c6a1d8b206471d51e80c380e80e047b

For large sparse or fragmented files, checking every single entry in
the bmapbt on every operation is prohibitively expensive. Especially
as such checks rarely discover problems during normal operations on
high extent coutn files. Our regression tests don't tend to exercise
files with hundreds of thousands to millions of extents, so mostly
this isn't noticed.

However, trying to run things like xfs_mdrestore of large filesystem
dumps on a debug kernel quickly becomes impossible as the CPU is
completely burnt up repeatedly walking the sparse file bmapbt that
is generated for every allocation that is made.

Hence, if the file has more than 10,000 extents, just don't bother
with walking the tree to check it exhaustively. The btree code has
checks that ensure that the newly inserted/removed/modified record
is correctly ordered, so the entrie tree walk in thses cases has
limited additional value.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_bmap.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index 70417fc..eb19133 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -317,9 +317,11 @@ xfs_check_block(
 
 /*
  * Check that the extents for the inode ip are in the right order in all
- * btree leaves.
+ * btree leaves. THis becomes prohibitively expensive for large extent count
+ * files, so don't bother with inodes that have more than 10,000 extents in
+ * them. The btree record ordering checks will still be done, so for such large
+ * bmapbt constructs that is going to catch most corruptions.
  */
-
 STATIC void
 xfs_bmap_check_leaf_extents(
 	xfs_btree_cur_t		*cur,	/* btree cursor or null */
@@ -344,6 +346,10 @@ xfs_bmap_check_leaf_extents(
 		return;
 	}
 
+	/* skip large extent count inodes */
+	if (ip->i_d.di_nextents > 10000)
+		return;
+
 	bno = NULLFSBLOCK;
 	mp = ip->i_mount;
 	ifp = XFS_IFORK_PTR(ip, whichfork);
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 08/14] xfs: eliminate committed arg from xfs_bmap_finish
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
                   ` (6 preceding siblings ...)
  2016-02-15  6:18 ` [PATCH 07/14] xfs: bmapbt checking on debug kernels too expensive Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-15  6:18 ` [PATCH 09/14] xfs: inode recovery readahead can race with inode buffer creation Dave Chinner
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Eric Sandeen <sandeen@sandeen.net>

Source kernel commit f6106efae5f4144b32f6c10de0dc3e7efc9181e3

Calls to xfs_bmap_finish() and xfs_trans_ijoin(), and the
associated comments were replicated several times across
the attribute code, all dealing with what to do if the
transaction was or wasn't committed.

And in that replicated code, an ASSERT() test of an
uninitialized variable occurs in several locations:

	error = xfs_attr_thing(&args);
	if (!error) {
		error = xfs_bmap_finish(&args.trans, args.flist,
					&committed);
	}
	if (error) {
		ASSERT(committed);

If the first xfs_attr_thing() failed, we'd skip the xfs_bmap_finish,
never set "committed", and then test it in the ASSERT.

Fix this up by moving the committed state internal to xfs_bmap_finish,
and add a new inode argument.  If an inode is passed in, it is passed
through to __xfs_trans_roll() and joined to the transaction there if
the transaction was committed.

xfs_qm_dqalloc() was a little unique in that it called bjoin rather
than ijoin, but as Dave points out we can detect the committed state
but checking whether (*tpp != tp).

Addresses-Coverity-Id: 102360
Addresses-Coverity-Id: 102361
Addresses-Coverity-Id: 102363
Addresses-Coverity-Id: 102364
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 include/libxfs.h         |   1 -
 libxfs/util.c            |  19 +++----
 libxfs/xfs_attr.c        | 141 ++++++++---------------------------------------
 libxfs/xfs_attr_remote.c |  31 ++---------
 libxfs/xfs_bmap.c        |   6 +-
 libxfs/xfs_bmap.h        |   2 +-
 mkfs/proto.c             |  14 ++---
 repair/phase6.c          |  33 +++++------
 8 files changed, 57 insertions(+), 190 deletions(-)

diff --git a/include/libxfs.h b/include/libxfs.h
index 5e8f3d4..cf2e20e 100644
--- a/include/libxfs.h
+++ b/include/libxfs.h
@@ -163,7 +163,6 @@ extern unsigned int	libxfs_log2_roundup(unsigned int i);
 
 extern int	libxfs_alloc_file_space (struct xfs_inode *, xfs_off_t,
 				xfs_off_t, int, int);
-extern int	libxfs_bmap_finish(xfs_trans_t **, xfs_bmap_free_t *, int *);
 
 extern void 	libxfs_fs_repair_cmn_err(int, struct xfs_mount *, char *, ...);
 extern void	libxfs_fs_cmn_err(int, struct xfs_mount *, char *, ...);
diff --git a/libxfs/util.c b/libxfs/util.c
index ee4bf3c..484a924 100644
--- a/libxfs/util.c
+++ b/libxfs/util.c
@@ -493,27 +493,25 @@ libxfs_mod_incore_sb(
 
 int
 libxfs_bmap_finish(
-	xfs_trans_t	**tp,
-	xfs_bmap_free_t *flist,
-	int		*committed)
+	struct xfs_trans	**tp,
+	struct xfs_bmap_free	*flist,
+	struct xfs_inode	*ip)
 {
 	xfs_bmap_free_item_t	*free;	/* free extent list item */
 	xfs_bmap_free_item_t	*next;	/* next item on free list */
 	int			error;
 
-	if (flist->xbf_count == 0) {
-		*committed = 0;
+	if (flist->xbf_count == 0)
 		return 0;
-	}
 
 	for (free = flist->xbf_first; free != NULL; free = next) {
 		next = free->xbfi_next;
-		if ((error = xfs_free_extent(*tp, free->xbfi_startblock,
-				free->xbfi_blockcount)))
+		error = xfs_free_extent(*tp, free->xbfi_startblock,
+					free->xbfi_blockcount);
+		if (error)
 			return error;
 		xfs_bmap_del_free(flist, NULL, free);
 	}
-	*committed = 0;
 	return 0;
 }
 
@@ -543,7 +541,6 @@ libxfs_alloc_file_space(
 	xfs_fileoff_t	startoffset_fsb;
 	xfs_trans_t	*tp;
 	int		xfs_bmapi_flags;
-	int		committed;
 	int		error;
 
 	if (len <= 0)
@@ -588,7 +585,7 @@ libxfs_alloc_file_space(
 			goto error0;
 
 		/* complete the transaction */
-		error = xfs_bmap_finish(&tp, &free_list, &committed);
+		error = xfs_bmap_finish(&tp, &free_list, ip);
 		if (error)
 			goto error0;
 
diff --git a/libxfs/xfs_attr.c b/libxfs/xfs_attr.c
index 5e79f3d..afe3dcb 100644
--- a/libxfs/xfs_attr.c
+++ b/libxfs/xfs_attr.c
@@ -200,7 +200,7 @@ xfs_attr_set(
 	struct xfs_trans_res	tres;
 	xfs_fsblock_t		firstblock;
 	int			rsvd = (flags & ATTR_ROOT) != 0;
-	int			error, err2, committed, local;
+	int			error, err2, local;
 
 	XFS_STATS_INC(mp, xs_attr_set);
 
@@ -327,25 +327,15 @@ xfs_attr_set(
 		 */
 		xfs_bmap_init(args.flist, args.firstblock);
 		error = xfs_attr_shortform_to_leaf(&args);
-		if (!error) {
-			error = xfs_bmap_finish(&args.trans, args.flist,
-						&committed);
-		}
+		if (!error)
+			error = xfs_bmap_finish(&args.trans, args.flist, dp);
 		if (error) {
-			ASSERT(committed);
 			args.trans = NULL;
 			xfs_bmap_cancel(&flist);
 			goto out;
 		}
 
 		/*
-		 * bmap_finish() may have committed the last trans and started
-		 * a new one.  We need the inode to be in all transactions.
-		 */
-		if (committed)
-			xfs_trans_ijoin(args.trans, dp, 0);
-
-		/*
 		 * Commit the leaf transformation.  We'll need another (linked)
 		 * transaction to add the new attribute to the leaf.
 		 */
@@ -561,7 +551,7 @@ xfs_attr_leaf_addname(xfs_da_args_t *args)
 {
 	xfs_inode_t *dp;
 	struct xfs_buf *bp;
-	int retval, error, committed, forkoff;
+	int retval, error, forkoff;
 
 	trace_xfs_attr_leaf_addname(args);
 
@@ -621,25 +611,15 @@ xfs_attr_leaf_addname(xfs_da_args_t *args)
 		 */
 		xfs_bmap_init(args->flist, args->firstblock);
 		error = xfs_attr3_leaf_to_node(args);
-		if (!error) {
-			error = xfs_bmap_finish(&args->trans, args->flist,
-						&committed);
-		}
+		if (!error)
+			error = xfs_bmap_finish(&args->trans, args->flist, dp);
 		if (error) {
-			ASSERT(committed);
 			args->trans = NULL;
 			xfs_bmap_cancel(args->flist);
 			return error;
 		}
 
 		/*
-		 * bmap_finish() may have committed the last trans and started
-		 * a new one.  We need the inode to be in all transactions.
-		 */
-		if (committed)
-			xfs_trans_ijoin(args->trans, dp, 0);
-
-		/*
 		 * Commit the current trans (including the inode) and start
 		 * a new one.
 		 */
@@ -722,25 +702,14 @@ xfs_attr_leaf_addname(xfs_da_args_t *args)
 			xfs_bmap_init(args->flist, args->firstblock);
 			error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
 			/* bp is gone due to xfs_da_shrink_inode */
-			if (!error) {
+			if (!error)
 				error = xfs_bmap_finish(&args->trans,
-							args->flist,
-							&committed);
-			}
+							args->flist, dp);
 			if (error) {
-				ASSERT(committed);
 				args->trans = NULL;
 				xfs_bmap_cancel(args->flist);
 				return error;
 			}
-
-			/*
-			 * bmap_finish() may have committed the last trans
-			 * and started a new one.  We need the inode to be
-			 * in all transactions.
-			 */
-			if (committed)
-				xfs_trans_ijoin(args->trans, dp, 0);
 		}
 
 		/*
@@ -768,7 +737,7 @@ xfs_attr_leaf_removename(xfs_da_args_t *args)
 {
 	xfs_inode_t *dp;
 	struct xfs_buf *bp;
-	int error, committed, forkoff;
+	int error, forkoff;
 
 	trace_xfs_attr_leaf_removename(args);
 
@@ -796,23 +765,13 @@ xfs_attr_leaf_removename(xfs_da_args_t *args)
 		xfs_bmap_init(args->flist, args->firstblock);
 		error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
 		/* bp is gone due to xfs_da_shrink_inode */
-		if (!error) {
-			error = xfs_bmap_finish(&args->trans, args->flist,
-						&committed);
-		}
+		if (!error)
+			error = xfs_bmap_finish(&args->trans, args->flist, dp);
 		if (error) {
-			ASSERT(committed);
 			args->trans = NULL;
 			xfs_bmap_cancel(args->flist);
 			return error;
 		}
-
-		/*
-		 * bmap_finish() may have committed the last trans and started
-		 * a new one.  We need the inode to be in all transactions.
-		 */
-		if (committed)
-			xfs_trans_ijoin(args->trans, dp, 0);
 	}
 	return 0;
 }
@@ -870,7 +829,7 @@ xfs_attr_node_addname(xfs_da_args_t *args)
 	xfs_da_state_blk_t *blk;
 	xfs_inode_t *dp;
 	xfs_mount_t *mp;
-	int committed, retval, error;
+	int retval, error;
 
 	trace_xfs_attr_node_addname(args);
 
@@ -931,27 +890,16 @@ restart:
 			state = NULL;
 			xfs_bmap_init(args->flist, args->firstblock);
 			error = xfs_attr3_leaf_to_node(args);
-			if (!error) {
+			if (!error)
 				error = xfs_bmap_finish(&args->trans,
-							args->flist,
-							&committed);
-			}
+							args->flist, dp);
 			if (error) {
-				ASSERT(committed);
 				args->trans = NULL;
 				xfs_bmap_cancel(args->flist);
 				goto out;
 			}
 
 			/*
-			 * bmap_finish() may have committed the last trans
-			 * and started a new one.  We need the inode to be
-			 * in all transactions.
-			 */
-			if (committed)
-				xfs_trans_ijoin(args->trans, dp, 0);
-
-			/*
 			 * Commit the node conversion and start the next
 			 * trans in the chain.
 			 */
@@ -970,23 +918,13 @@ restart:
 		 */
 		xfs_bmap_init(args->flist, args->firstblock);
 		error = xfs_da3_split(state);
-		if (!error) {
-			error = xfs_bmap_finish(&args->trans, args->flist,
-						&committed);
-		}
+		if (!error)
+			error = xfs_bmap_finish(&args->trans, args->flist, dp);
 		if (error) {
-			ASSERT(committed);
 			args->trans = NULL;
 			xfs_bmap_cancel(args->flist);
 			goto out;
 		}
-
-		/*
-		 * bmap_finish() may have committed the last trans and started
-		 * a new one.  We need the inode to be in all transactions.
-		 */
-		if (committed)
-			xfs_trans_ijoin(args->trans, dp, 0);
 	} else {
 		/*
 		 * Addition succeeded, update Btree hashvals.
@@ -1079,25 +1017,14 @@ restart:
 		if (retval && (state->path.active > 1)) {
 			xfs_bmap_init(args->flist, args->firstblock);
 			error = xfs_da3_join(state);
-			if (!error) {
+			if (!error)
 				error = xfs_bmap_finish(&args->trans,
-							args->flist,
-							&committed);
-			}
+							args->flist, dp);
 			if (error) {
-				ASSERT(committed);
 				args->trans = NULL;
 				xfs_bmap_cancel(args->flist);
 				goto out;
 			}
-
-			/*
-			 * bmap_finish() may have committed the last trans
-			 * and started a new one.  We need the inode to be
-			 * in all transactions.
-			 */
-			if (committed)
-				xfs_trans_ijoin(args->trans, dp, 0);
 		}
 
 		/*
@@ -1139,7 +1066,7 @@ xfs_attr_node_removename(xfs_da_args_t *args)
 	xfs_da_state_blk_t *blk;
 	xfs_inode_t *dp;
 	struct xfs_buf *bp;
-	int retval, error, committed, forkoff;
+	int retval, error, forkoff;
 
 	trace_xfs_attr_node_removename(args);
 
@@ -1213,24 +1140,13 @@ xfs_attr_node_removename(xfs_da_args_t *args)
 	if (retval && (state->path.active > 1)) {
 		xfs_bmap_init(args->flist, args->firstblock);
 		error = xfs_da3_join(state);
-		if (!error) {
-			error = xfs_bmap_finish(&args->trans, args->flist,
-						&committed);
-		}
+		if (!error)
+			error = xfs_bmap_finish(&args->trans, args->flist, dp);
 		if (error) {
-			ASSERT(committed);
 			args->trans = NULL;
 			xfs_bmap_cancel(args->flist);
 			goto out;
 		}
-
-		/*
-		 * bmap_finish() may have committed the last trans and started
-		 * a new one.  We need the inode to be in all transactions.
-		 */
-		if (committed)
-			xfs_trans_ijoin(args->trans, dp, 0);
-
 		/*
 		 * Commit the Btree join operation and start a new trans.
 		 */
@@ -1258,25 +1174,14 @@ xfs_attr_node_removename(xfs_da_args_t *args)
 			xfs_bmap_init(args->flist, args->firstblock);
 			error = xfs_attr3_leaf_to_shortform(bp, args, forkoff);
 			/* bp is gone due to xfs_da_shrink_inode */
-			if (!error) {
+			if (!error)
 				error = xfs_bmap_finish(&args->trans,
-							args->flist,
-							&committed);
-			}
+							args->flist, dp);
 			if (error) {
-				ASSERT(committed);
 				args->trans = NULL;
 				xfs_bmap_cancel(args->flist);
 				goto out;
 			}
-
-			/*
-			 * bmap_finish() may have committed the last trans
-			 * and started a new one.  We need the inode to be
-			 * in all transactions.
-			 */
-			if (committed)
-				xfs_trans_ijoin(args->trans, dp, 0);
 		} else
 			xfs_trans_brelse(args->trans, bp);
 	}
diff --git a/libxfs/xfs_attr_remote.c b/libxfs/xfs_attr_remote.c
index 95383e3..79d663e 100644
--- a/libxfs/xfs_attr_remote.c
+++ b/libxfs/xfs_attr_remote.c
@@ -443,8 +443,6 @@ xfs_attr_rmtval_set(
 	 * Roll through the "value", allocating blocks on disk as required.
 	 */
 	while (blkcnt > 0) {
-		int	committed;
-
 		/*
 		 * Allocate a single extent, up to the size of the value.
 		 *
@@ -462,24 +460,14 @@ xfs_attr_rmtval_set(
 		error = xfs_bmapi_write(args->trans, dp, (xfs_fileoff_t)lblkno,
 				  blkcnt, XFS_BMAPI_ATTRFORK, args->firstblock,
 				  args->total, &map, &nmap, args->flist);
-		if (!error) {
-			error = xfs_bmap_finish(&args->trans, args->flist,
-						&committed);
-		}
+		if (!error)
+			error = xfs_bmap_finish(&args->trans, args->flist, dp);
 		if (error) {
-			ASSERT(committed);
 			args->trans = NULL;
 			xfs_bmap_cancel(args->flist);
 			return error;
 		}
 
-		/*
-		 * bmap_finish() may have committed the last trans and started
-		 * a new one.  We need the inode to be in all transactions.
-		 */
-		if (committed)
-			xfs_trans_ijoin(args->trans, dp, 0);
-
 		ASSERT(nmap == 1);
 		ASSERT((map.br_startblock != DELAYSTARTBLOCK) &&
 		       (map.br_startblock != HOLESTARTBLOCK));
@@ -610,31 +598,20 @@ xfs_attr_rmtval_remove(
 	blkcnt = args->rmtblkcnt;
 	done = 0;
 	while (!done) {
-		int committed;
-
 		xfs_bmap_init(args->flist, args->firstblock);
 		error = xfs_bunmapi(args->trans, args->dp, lblkno, blkcnt,
 				    XFS_BMAPI_ATTRFORK, 1, args->firstblock,
 				    args->flist, &done);
-		if (!error) {
+		if (!error)
 			error = xfs_bmap_finish(&args->trans, args->flist,
-						&committed);
-		}
+						args->dp);
 		if (error) {
-			ASSERT(committed);
 			args->trans = NULL;
 			xfs_bmap_cancel(args->flist);
 			return error;
 		}
 
 		/*
-		 * bmap_finish() may have committed the last trans and started
-		 * a new one.  We need the inode to be in all transactions.
-		 */
-		if (committed)
-			xfs_trans_ijoin(args->trans, args->dp, 0);
-
-		/*
 		 * Close out trans and start the next one in the chain.
 		 */
 		error = xfs_trans_roll(&args->trans, args->dp);
diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index eb19133..8cb89bc 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -1109,7 +1109,6 @@ xfs_bmap_add_attrfork(
 	xfs_trans_t		*tp;		/* transaction pointer */
 	int			blks;		/* space reservation */
 	int			version = 1;	/* superblock attr version */
-	int			committed;	/* xaction was committed */
 	int			logflags;	/* logging flags */
 	int			error;		/* error return value */
 
@@ -1212,7 +1211,7 @@ xfs_bmap_add_attrfork(
 			xfs_log_sb(tp);
 	}
 
-	error = xfs_bmap_finish(&tp, &flist, &committed);
+	error = xfs_bmap_finish(&tp, &flist, NULL);
 	if (error)
 		goto bmap_cancel;
 	error = xfs_trans_commit(tp);
@@ -5949,7 +5948,6 @@ xfs_bmap_split_extent(
 	struct xfs_trans        *tp;
 	struct xfs_bmap_free    free_list;
 	xfs_fsblock_t           firstfsb;
-	int                     committed;
 	int                     error;
 
 	tp = xfs_trans_alloc(mp, XFS_TRANS_DIOSTRAT);
@@ -5970,7 +5968,7 @@ xfs_bmap_split_extent(
 	if (error)
 		goto out;
 
-	error = xfs_bmap_finish(&tp, &free_list, &committed);
+	error = xfs_bmap_finish(&tp, &free_list, NULL);
 	if (error)
 		goto out;
 
diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
index baec27d..6485403 100644
--- a/libxfs/xfs_bmap.h
+++ b/libxfs/xfs_bmap.h
@@ -195,7 +195,7 @@ void	xfs_bmap_add_free(struct xfs_mount *mp, struct xfs_bmap_free *flist,
 			  xfs_fsblock_t bno, xfs_filblks_t len);
 void	xfs_bmap_cancel(struct xfs_bmap_free *flist);
 int	xfs_bmap_finish(struct xfs_trans **tp, struct xfs_bmap_free *flist,
-			int *committed);
+			struct xfs_inode *ip);
 void	xfs_bmap_compute_maxlevels(struct xfs_mount *mp, int whichfork);
 int	xfs_bmap_first_unused(struct xfs_trans *tp, struct xfs_inode *ip,
 		xfs_extlen_t len, xfs_fileoff_t *unused, int whichfork);
diff --git a/mkfs/proto.c b/mkfs/proto.c
index cb34b28..21960d5 100644
--- a/mkfs/proto.c
+++ b/mkfs/proto.c
@@ -358,7 +358,6 @@ parseproto(
 #define	IF_FIFO		6
 
 	char		*buf;
-	int		committed;
 	int		error;
 	xfs_fsblock_t	first;
 	int		flags;
@@ -481,7 +480,7 @@ parseproto(
 		newdirent(mp, tp, pip, &xname, ip->i_ino, &first, &flist);
 		libxfs_trans_log_inode(tp, ip, flags);
 
-		error = -libxfs_bmap_finish(&tp, &flist, &committed);
+		error = -libxfs_bmap_finish(&tp, &flist, ip);
 		if (error)
 			fail(_("Pre-allocated file creation failed"), error);
 		libxfs_trans_commit(tp);
@@ -563,7 +562,7 @@ parseproto(
 		}
 		newdirectory(mp, tp, ip, pip);
 		libxfs_trans_log_inode(tp, ip, flags);
-		error = -libxfs_bmap_finish(&tp, &flist, &committed);
+		error = -libxfs_bmap_finish(&tp, &flist, ip);
 		if (error)
 			fail(_("Directory creation failed"), error);
 		libxfs_trans_commit(tp);
@@ -589,7 +588,7 @@ parseproto(
 		fail(_("Unknown format"), EINVAL);
 	}
 	libxfs_trans_log_inode(tp, ip, flags);
-	error = -libxfs_bmap_finish(&tp, &flist, &committed);
+	error = -libxfs_bmap_finish(&tp, &flist, ip);
 	if (error) {
 		fail(_("Error encountered creating file from prototype file"),
 			error);
@@ -615,7 +614,6 @@ rtinit(
 	xfs_mount_t	*mp)
 {
 	xfs_fileoff_t	bno;
-	int		committed;
 	xfs_fileoff_t	ebno;
 	xfs_bmbt_irec_t	*ep;
 	int		error;
@@ -700,7 +698,7 @@ rtinit(
 		}
 	}
 
-	error = -libxfs_bmap_finish(&tp, &flist, &committed);
+	error = -libxfs_bmap_finish(&tp, &flist, rbmip);
 	if (error) {
 		fail(_("Completion of the realtime bitmap failed"), error);
 	}
@@ -735,7 +733,7 @@ rtinit(
 			bno += ep->br_blockcount;
 		}
 	}
-	error = -libxfs_bmap_finish(&tp, &flist, &committed);
+	error = -libxfs_bmap_finish(&tp, &flist, rsumip);
 	if (error) {
 		fail(_("Completion of the realtime summary failed"), error);
 	}
@@ -759,7 +757,7 @@ rtinit(
 			fail(_("Error initializing the realtime space"),
 				error);
 		}
-		error = -libxfs_bmap_finish(&tp, &flist, &committed);
+		error = -libxfs_bmap_finish(&tp, &flist, rbmip);
 		if (error) {
 			fail(_("Error completing the realtime space"), error);
 		}
diff --git a/repair/phase6.c b/repair/phase6.c
index 7680deb..eb36520 100644
--- a/repair/phase6.c
+++ b/repair/phase6.c
@@ -483,7 +483,6 @@ mk_rbmino(xfs_mount_t *mp)
 	xfs_fsblock_t	first;
 	int		i;
 	int		nmap;
-	int		committed;
 	int		error;
 	xfs_bmap_free_t	flist;
 	xfs_fileoff_t	bno;
@@ -578,7 +577,7 @@ mk_rbmino(xfs_mount_t *mp)
 			bno += ep->br_blockcount;
 		}
 	}
-	error = -libxfs_bmap_finish(&tp, &flist, &committed);
+	error = -libxfs_bmap_finish(&tp, &flist, ip);
 	if (error) {
 		do_error(
 		_("allocation of the realtime bitmap failed, error = %d\n"),
@@ -742,7 +741,6 @@ mk_rsumino(xfs_mount_t *mp)
 	xfs_fsblock_t	first;
 	int		i;
 	int		nmap;
-	int		committed;
 	int		error;
 	int		nsumblocks;
 	xfs_bmap_free_t	flist;
@@ -843,7 +841,7 @@ mk_rsumino(xfs_mount_t *mp)
 			bno += ep->br_blockcount;
 		}
 	}
-	error = -libxfs_bmap_finish(&tp, &flist, &committed);
+	error = -libxfs_bmap_finish(&tp, &flist, ip);
 	if (error) {
 		do_error(
 	_("allocation of the realtime summary ino failed, error = %d\n"),
@@ -947,7 +945,6 @@ mk_orphanage(xfs_mount_t *mp)
 	ino_tree_node_t	*irec;
 	int		ino_offset = 0;
 	int		i;
-	int		committed;
 	int		error;
 	xfs_bmap_free_t	flist;
 	const int	mode = 0755;
@@ -1059,7 +1056,7 @@ mk_orphanage(xfs_mount_t *mp)
 	libxfs_dir_init(tp, ip, pip);
 	libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-	error = -libxfs_bmap_finish(&tp, &flist, &committed);
+	error = -libxfs_bmap_finish(&tp, &flist, ip);
 	if (error) {
 		do_error(_("%s directory creation failed -- bmapf error %d\n"),
 			ORPHANAGE, error);
@@ -1090,7 +1087,6 @@ mv_orphanage(
 	xfs_fsblock_t		first;
 	xfs_bmap_free_t		flist;
 	int			err;
-	int			committed;
 	unsigned char		fname[MAXPATHLEN + 1];
 	int			nres;
 	int			incr;
@@ -1168,7 +1164,7 @@ mv_orphanage(
 			ino_p->i_d.di_nlink++;
 			libxfs_trans_log_inode(tp, ino_p, XFS_ILOG_CORE);
 
-			err = -libxfs_bmap_finish(&tp, &flist, &committed);
+			err = -libxfs_bmap_finish(&tp, &flist, ino_p);
 			if (err)
 				do_error(
 	_("bmap finish failed (err - %d), filesystem may be out of space\n"),
@@ -1215,7 +1211,7 @@ mv_orphanage(
 						err);
 			}
 
-			err = -libxfs_bmap_finish(&tp, &flist, &committed);
+			err = -libxfs_bmap_finish(&tp, &flist, ino_p);
 			if (err)
 				do_error(
 	_("bmap finish failed (%d), filesystem may be out of space\n"),
@@ -1254,7 +1250,7 @@ mv_orphanage(
 		ino_p->i_d.di_nlink = 1;
 		libxfs_trans_log_inode(tp, ino_p, XFS_ILOG_CORE);
 
-		err = -libxfs_bmap_finish(&tp, &flist, &committed);
+		err = -libxfs_bmap_finish(&tp, &flist, ino_p);
 		if (err)
 			do_error(
 	_("bmap finish failed (%d), filesystem may be out of space\n"),
@@ -1306,7 +1302,6 @@ longform_dir2_rebuild(
 	xfs_bmap_free_t		flist;
 	xfs_inode_t		pip;
 	dir_hash_ent_t		*p;
-	int			committed;
 	int			done;
 
 	/*
@@ -1356,7 +1351,7 @@ longform_dir2_rebuild(
 		goto out_bmap_cancel;
 	}
 
-	error = -libxfs_bmap_finish(&tp, &flist, &committed);
+	error = -libxfs_bmap_finish(&tp, &flist, ip);
 
 	libxfs_trans_commit(tp);
 
@@ -1391,7 +1386,7 @@ _("name create failed in ino %" PRIu64 " (%d), filesystem may be out of space\n"
 			goto out_bmap_cancel;
 		}
 
-		error = -libxfs_bmap_finish(&tp, &flist, &committed);
+		error = -libxfs_bmap_finish(&tp, &flist, ip);
 		if (error) {
 			do_warn(
 	_("bmap finish failed (%d), filesystem may be out of space\n"),
@@ -1423,7 +1418,6 @@ dir2_kill_block(
 	struct xfs_buf	*bp)
 {
 	xfs_da_args_t	args;
-	int		committed;
 	int		error;
 	xfs_fsblock_t	firstblock;
 	xfs_bmap_free_t	flist;
@@ -1453,7 +1447,7 @@ dir2_kill_block(
 	if (error)
 		do_error(_("shrink_inode failed inode %" PRIu64 " block %u\n"),
 			ip->i_ino, da_bno);
-	libxfs_bmap_finish(&tp, &flist, &committed);
+	libxfs_bmap_finish(&tp, &flist, ip);
 	libxfs_trans_commit(tp);
 }
 
@@ -1479,7 +1473,6 @@ longform_dir2_entry_check_data(
 	xfs_dir2_leaf_entry_t	*blp;
 	struct xfs_buf		*bp;
 	xfs_dir2_block_tail_t	*btp;
-	int			committed;
 	struct xfs_dir2_data_hdr *d;
 	xfs_dir2_db_t		db;
 	xfs_dir2_data_entry_t	*dep;
@@ -1930,7 +1923,7 @@ _("entry \"%s\" in dir inode %" PRIu64 " inconsistent with .. value (%" PRIu64 "
 		libxfs_dir2_data_freescan(mp->m_dir_geo, M_DIROPS(mp), d, &i);
 	if (needlog)
 		libxfs_dir2_data_log_header(&da, bp);
-	libxfs_bmap_finish(&tp, &flist, &committed);
+	libxfs_bmap_finish(&tp, &flist, ip);
 	libxfs_trans_commit(tp);
 
 	/* record the largest free space in the freetab for later checking */
@@ -2851,7 +2844,7 @@ process_dir_inode(
 	xfs_inode_t		*ip;
 	xfs_trans_t		*tp;
 	dir_hash_tab_t		*hashtab;
-	int			need_dot, committed;
+	int			need_dot;
 	int			dirty, num_illegal, error, nres;
 
 	ino = XFS_AGINO_TO_INO(mp, agno, irec->ino_startnum + ino_offset);
@@ -2996,7 +2989,7 @@ process_dir_inode(
 
 		libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-		error = -libxfs_bmap_finish(&tp, &flist, &committed);
+		error = -libxfs_bmap_finish(&tp, &flist, ip);
 		ASSERT(error == 0);
 		libxfs_trans_commit(tp);
 
@@ -3057,7 +3050,7 @@ process_dir_inode(
 
 			libxfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-			error = -libxfs_bmap_finish(&tp, &flist, &committed);
+			error = -libxfs_bmap_finish(&tp, &flist, ip);
 			ASSERT(error == 0);
 			libxfs_trans_commit(tp);
 		}
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 09/14] xfs: inode recovery readahead can race with inode buffer creation
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
                   ` (7 preceding siblings ...)
  2016-02-15  6:18 ` [PATCH 08/14] xfs: eliminate committed arg from xfs_bmap_finish Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-15  6:18 ` [PATCH 10/14] xfs: handle dquot buffer readahead in log recovery correctly Dave Chinner
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Source kernel commit b79f4a1c68bb99152d0785ee4ea3ab4396cdacc6

When we do inode readahead in log recovery, we do can do the
readahead before we've replayed the icreate transaction that stamps
the buffer with inode cores. The inode readahead verifier catches
this and marks the buffer as !done to indicate that it doesn't yet
contain valid inodes.

In adding buffer error notification  (i.e. setting b_error = -EIO at
the same time as as we clear the done flag) to such a readahead
verifier failure, we can then get subsequent inode recovery failing
with this error:

XFS (dm-0): metadata I/O error: block 0xa00060 ("xlog_recover_do..(read#2)") error 5 numblks 32

This occurs when readahead completion races with icreate item replay
such as:

	inode readahead
		find buffer
		lock buffer
		submit RA io
	....
	icreate recovery
	    xfs_trans_get_buffer
		find buffer
		lock buffer
		<blocks on RA completion>
	.....
	<ra completion>
		fails verifier
		clear XBF_DONE
		set bp->b_error = -EIO
		release and unlock buffer
	<icreate gains lock>
	icreate initialises buffer
	marks buffer as done
	adds buffer to delayed write queue
	releases buffer

At this point, we have an initialised inode buffer that is up to
date but has an -EIO state registered against it. When we finally
get to recovering an inode in that buffer:

	inode item recovery
	    xfs_trans_read_buffer
		find buffer
		lock buffer
		sees XBF_DONE is set, returns buffer
	    sees bp->b_error is set
		fail log recovery!

Essentially, we need xfs_trans_get_buf_map() to clear the error status of
the buffer when doing a lookup. This function returns uninitialised
buffers, so the buffer returned can not be in an error state and
none of the code that uses this function expects b_error to be set
on return. Indeed, there is an ASSERT(!bp->b_error); in the
transaction case in xfs_trans_get_buf_map() that would have caught
this if log recovery used transactions....

This patch firstly changes the inode readahead failure to set -EIO
on the buffer, and secondly changes xfs_buf_get_map() to never
return a buffer with an error state set so this first change doesn't
cause unexpected log recovery failures.

cc: <stable@vger.kernel.org> # 3.12 - current
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_inode_buf.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 324715e..546af74 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -71,11 +71,12 @@ xfs_dinode_good_version(
  * has not had the inode cores stamped into it. Hence for readahead, the buffer
  * may be potentially invalid.
  *
- * If the readahead buffer is invalid, we don't want to mark it with an error,
- * but we do want to clear the DONE status of the buffer so that a followup read
- * will re-read it from disk. This will ensure that we don't get an unnecessary
- * warnings during log recovery and we don't get unnecssary panics on debug
- * kernels.
+ * If the readahead buffer is invalid, we need to mark it with an error and
+ * clear the DONE status of the buffer so that a followup read will re-read it
+ * from disk. We don't report the error otherwise to avoid warnings during log
+ * recovery and we don't get unnecssary panics on debug kernels. We use EIO here
+ * because all we want to do is say readahead failed; there is no-one to report
+ * the error to, so this will distinguish it from a non-ra verifier failure.
  */
 static void
 xfs_inode_buf_verify(
@@ -102,6 +103,7 @@ xfs_inode_buf_verify(
 						XFS_RANDOM_ITOBP_INOTOBP))) {
 			if (readahead) {
 				bp->b_flags &= ~XBF_DONE;
+				xfs_buf_ioerror(bp, -EIO);
 				return;
 			}
 
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 10/14] xfs: handle dquot buffer readahead in log recovery correctly
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
                   ` (8 preceding siblings ...)
  2016-02-15  6:18 ` [PATCH 09/14] xfs: inode recovery readahead can race with inode buffer creation Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-16 19:20   ` Brian Foster
  2016-02-15  6:18 ` [PATCH 11/14] xfs: swap leaf buffer into path struct atomically during path shift Dave Chinner
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Source kernel commit 7d6a13f023567d573ac362502bb702eda716e654

When we do dquot readahead in log recovery, we do not use a verifier
as the underlying buffer may not have dquots in it. e.g. the
allocation operation hasn't yet been replayed. Hence we do not want
to fail recovery because we detect an operation to be replayed has
not been run yet. This problem was addressed for inodes in commit
d891400 ("xfs: inode buffers may not be valid during recovery
readahead") but the problem was not recognised to exist for dquots
and their buffers as the dquot readahead did not have a verifier.

The result of not using a verifier is that when the buffer is then
next read to replay a dquot modification, the dquot buffer verifier
will only be attached to the buffer if *readahead is not complete*.
Hence we can read the buffer, replay the dquot changes and then add
it to the delwri submission list without it having a verifier
attached to it. This then generates warnings in xfs_buf_ioapply(),
which catches and warns about this case.

Fix this and make it handle the same readahead verifier error cases
as for inode buffers by adding a new readahead verifier that has a
write operation as well as a read operation that marks the buffer as
not done if any corruption is detected.  Also make sure we don't run
readahead if the dquot buffer has been marked as cancelled by
recovery.

This will result in readahead either succeeding and the buffer
having a valid write verifier, or readahead failing and the buffer
state requiring the subsequent read to resubmit the IO with the new
verifier.  In either case, this will result in the buffer always
ending up with a valid write verifier on it.

Note: we also need to fix the inode buffer readahead error handling
to mark the buffer with EIO. Brian noticed the code I copied from
there wrong during review, so fix it at the same time. Add comments
linking the two functions that handle readahead verifier errors
together so we don't forget this behavioural link in future.

cc: <stable@vger.kernel.org> # 3.12 - current
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_dquot_buf.c  | 36 ++++++++++++++++++++++++++++++------
 libxfs/xfs_inode_buf.c  |  2 ++
 libxfs/xfs_quota_defs.h |  2 +-
 libxfs/xfs_shared.h     |  1 +
 4 files changed, 34 insertions(+), 7 deletions(-)

diff --git a/libxfs/xfs_dquot_buf.c b/libxfs/xfs_dquot_buf.c
index fd4aa4b..025d49b 100644
--- a/libxfs/xfs_dquot_buf.c
+++ b/libxfs/xfs_dquot_buf.c
@@ -62,7 +62,7 @@ xfs_dqcheck(
 	xfs_dqid_t	 id,
 	uint		 type,	  /* used only when IO_dorepair is true */
 	uint		 flags,
-	char		 *str)
+	const char	 *str)
 {
 	xfs_dqblk_t	 *d = (xfs_dqblk_t *)ddq;
 	int		errs = 0;
@@ -215,7 +215,8 @@ xfs_dquot_buf_verify_crc(
 STATIC bool
 xfs_dquot_buf_verify(
 	struct xfs_mount	*mp,
-	struct xfs_buf		*bp)
+	struct xfs_buf		*bp,
+	int			warn)
 {
 	struct xfs_dqblk	*d = (struct xfs_dqblk *)bp->b_addr;
 	xfs_dqid_t		id = 0;
@@ -248,8 +249,7 @@ xfs_dquot_buf_verify(
 		if (i == 0)
 			id = be32_to_cpu(ddq->d_id);
 
-		error = xfs_dqcheck(mp, ddq, id + i, 0, XFS_QMOPT_DOWARN,
-				       "xfs_dquot_buf_verify");
+		error = xfs_dqcheck(mp, ddq, id + i, 0, warn, __func__);
 		if (error)
 			return false;
 	}
@@ -264,7 +264,7 @@ xfs_dquot_buf_read_verify(
 
 	if (!xfs_dquot_buf_verify_crc(mp, bp))
 		xfs_buf_ioerror(bp, -EFSBADCRC);
-	else if (!xfs_dquot_buf_verify(mp, bp))
+	else if (!xfs_dquot_buf_verify(mp, bp, XFS_QMOPT_DOWARN))
 		xfs_buf_ioerror(bp, -EFSCORRUPTED);
 
 	if (bp->b_error)
@@ -272,6 +272,25 @@ xfs_dquot_buf_read_verify(
 }
 
 /*
+ * readahead errors are silent and simply leave the buffer as !done so a real
+ * read will then be run with the xfs_dquot_buf_ops verifier. See
+ * xfs_inode_buf_verify() for why we use EIO and ~XBF_DONE here rather than
+ * reporting the failure.
+ */
+static void
+xfs_dquot_buf_readahead_verify(
+	struct xfs_buf	*bp)
+{
+	struct xfs_mount	*mp = bp->b_target->bt_mount;
+
+	if (!xfs_dquot_buf_verify_crc(mp, bp) ||
+	    !xfs_dquot_buf_verify(mp, bp, 0)) {
+		xfs_buf_ioerror(bp, -EIO);
+		bp->b_flags &= ~XBF_DONE;
+	}
+}
+
+/*
  * we don't calculate the CRC here as that is done when the dquot is flushed to
  * the buffer after the update is done. This ensures that the dquot in the
  * buffer always has an up-to-date CRC value.
@@ -282,13 +301,18 @@ xfs_dquot_buf_write_verify(
 {
 	struct xfs_mount	*mp = bp->b_target->bt_mount;
 
-	if (!xfs_dquot_buf_verify(mp, bp)) {
+	if (!xfs_dquot_buf_verify(mp, bp, XFS_QMOPT_DOWARN)) {
 		xfs_buf_ioerror(bp, -EFSCORRUPTED);
 		xfs_verifier_error(bp);
 		return;
 	}
 }
 
+const struct xfs_buf_ops xfs_dquot_buf_ra_ops = {
+	.name = "xfs_dquot_ra",
+	.verify_read = xfs_dquot_buf_readahead_verify,
+	.verify_write = xfs_dquot_buf_write_verify,
+};
 const struct xfs_buf_ops xfs_dquot_buf_ops = {
 	.name = "xfs_dquot",
 	.verify_read = xfs_dquot_buf_read_verify,
diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index 546af74..89c05ad 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -77,6 +77,8 @@ xfs_dinode_good_version(
  * recovery and we don't get unnecssary panics on debug kernels. We use EIO here
  * because all we want to do is say readahead failed; there is no-one to report
  * the error to, so this will distinguish it from a non-ra verifier failure.
+ * Changes to this readahead error behavour also need to be reflected in
+ * xfs_dquot_buf_readahead_verify().
  */
 static void
 xfs_inode_buf_verify(
diff --git a/libxfs/xfs_quota_defs.h b/libxfs/xfs_quota_defs.h
index 1b0a083..f51078f 100644
--- a/libxfs/xfs_quota_defs.h
+++ b/libxfs/xfs_quota_defs.h
@@ -153,7 +153,7 @@ typedef __uint16_t	xfs_qwarncnt_t;
 #define XFS_QMOPT_RESBLK_MASK	(XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_RES_RTBLKS)
 
 extern int xfs_dqcheck(struct xfs_mount *mp, xfs_disk_dquot_t *ddq,
-		       xfs_dqid_t id, uint type, uint flags, char *str);
+		       xfs_dqid_t id, uint type, uint flags, const char *str);
 extern int xfs_calc_dquots_per_chunk(unsigned int nbblks);
 
 #endif	/* __XFS_QUOTA_H__ */
diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h
index 5be5297..15c3ceb 100644
--- a/libxfs/xfs_shared.h
+++ b/libxfs/xfs_shared.h
@@ -49,6 +49,7 @@ extern const struct xfs_buf_ops xfs_inobt_buf_ops;
 extern const struct xfs_buf_ops xfs_inode_buf_ops;
 extern const struct xfs_buf_ops xfs_inode_buf_ra_ops;
 extern const struct xfs_buf_ops xfs_dquot_buf_ops;
+extern const struct xfs_buf_ops xfs_dquot_buf_ra_ops;
 extern const struct xfs_buf_ops xfs_sb_buf_ops;
 extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops;
 extern const struct xfs_buf_ops xfs_symlink_buf_ops;
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 11/14] xfs: swap leaf buffer into path struct atomically during path shift
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
                   ` (9 preceding siblings ...)
  2016-02-15  6:18 ` [PATCH 10/14] xfs: handle dquot buffer readahead in log recovery correctly Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-15  6:18 ` [PATCH 12/14] libxfs: fix two comment typos Dave Chinner
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Brian Foster <bfoster@redhat.com>

Source kernel commit 7df1c170b9a45ab3a7401c79bbefa9939bf8eafb

The node directory lookup code uses a state structure that tracks the
path of buffers used to search for the hash of a filename through the
leaf blocks. When the lookup encounters a block that ends with the
requested hash, but the entry has not yet been found, it must shift over
to the next block and continue looking for the entry (i.e., duplicate
hashes could continue over into the next block). This shift mechanism
involves walking back up and down the state structure, replacing buffers
at the appropriate btree levels as necessary.

When a buffer is replaced, the old buffer is released and the new buffer
read into the active slot in the path structure. Because the buffer is
read directly into the path slot, a buffer read failure can result in
setting a NULL buffer pointer in an active slot. This throws off the
state cleanup code in xfs_dir2_node_lookup(), which expects to release a
buffer from each active slot. Instead, a BUG occurs due to a NULL
pointer dereference:

  BUG: unable to handle kernel NULL pointer dereference at 00000000000001e8
  IP: [<ffffffffa0585063>] xfs_trans_brelse+0x2a3/0x3c0 [xfs]
  ...
  RIP: 0010:[<ffffffffa0585063>]  [<ffffffffa0585063>] xfs_trans_brelse+0x2a3/0x3c0 [xfs]
  ...
  Call Trace:
   [<ffffffffa05250c6>] xfs_dir2_node_lookup+0xa6/0x2c0 [xfs]
   [<ffffffffa0519f7c>] xfs_dir_lookup+0x1ac/0x1c0 [xfs]
   [<ffffffffa055d0e1>] xfs_lookup+0x91/0x290 [xfs]
   [<ffffffffa05580b3>] xfs_vn_lookup+0x73/0xb0 [xfs]
   [<ffffffff8122de8d>] lookup_real+0x1d/0x50
   [<ffffffff8123330e>] path_openat+0x91e/0x1490
   [<ffffffff81235079>] do_filp_open+0x89/0x100
   ...

This has been reproduced via a parallel fsstress and filesystem shutdown
workload in a loop. The shutdown triggers the read error in the
aforementioned codepath and causes the BUG in xfs_dir2_node_lookup().

Update xfs_da3_path_shift() to update the active path slot atomically
with respect to the caller when a buffer is replaced. This ensures that
the caller always sees the old or new buffer in the slot and prevents
the NULL pointer dereference.

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_da_btree.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/libxfs/xfs_da_btree.c b/libxfs/xfs_da_btree.c
index 25072c7..f3c04ab 100644
--- a/libxfs/xfs_da_btree.c
+++ b/libxfs/xfs_da_btree.c
@@ -1822,6 +1822,7 @@ xfs_da3_path_shift(
 	struct xfs_da_args	*args;
 	struct xfs_da_node_entry *btree;
 	struct xfs_da3_icnode_hdr nodehdr;
+	struct xfs_buf		*bp;
 	xfs_dablk_t		blkno = 0;
 	int			level;
 	int			error;
@@ -1866,20 +1867,24 @@ xfs_da3_path_shift(
 	 */
 	for (blk++, level++; level < path->active; blk++, level++) {
 		/*
-		 * Release the old block.
-		 * (if it's dirty, trans won't actually let go)
+		 * Read the next child block into a local buffer.
 		 */
-		if (release)
-			xfs_trans_brelse(args->trans, blk->bp);
+		error = xfs_da3_node_read(args->trans, dp, blkno, -1, &bp,
+					  args->whichfork);
+		if (error)
+			return error;
 
 		/*
-		 * Read the next child block.
+		 * Release the old block (if it's dirty, the trans doesn't
+		 * actually let go) and swap the local buffer into the path
+		 * structure. This ensures failure of the above read doesn't set
+		 * a NULL buffer in an active slot in the path.
 		 */
+		if (release)
+			xfs_trans_brelse(args->trans, blk->bp);
 		blk->blkno = blkno;
-		error = xfs_da3_node_read(args->trans, dp, blkno, -1,
-					&blk->bp, args->whichfork);
-		if (error)
-			return error;
+		blk->bp = bp;
+
 		info = blk->bp->b_addr;
 		ASSERT(info->magic == cpu_to_be16(XFS_DA_NODE_MAGIC) ||
 		       info->magic == cpu_to_be16(XFS_DA3_NODE_MAGIC) ||
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 12/14] libxfs: fix two comment typos
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
                   ` (10 preceding siblings ...)
  2016-02-15  6:18 ` [PATCH 11/14] xfs: swap leaf buffer into path struct atomically during path shift Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-15  6:18 ` [PATCH 13/14] xfs: stop holding ILOCK over filldir callbacks Dave Chinner
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Geliang Tang <geliangtang@163.com>

Source kernel commit fef4ded8cb2e53a47093b334e7730d55a3badc59

Just fix two typos in code comments.

Signed-off-by: Geliang Tang <geliangtang@163.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_btree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/libxfs/xfs_btree.c b/libxfs/xfs_btree.c
index 658f27e..efcad26 100644
--- a/libxfs/xfs_btree.c
+++ b/libxfs/xfs_btree.c
@@ -219,7 +219,7 @@ xfs_btree_check_ptr(
  * long-form btree header.
  *
  * Prior to calculting the CRC, pull the LSN out of the buffer log item and put
- * it into the buffer so recovery knows what the last modifcation was that made
+ * it into the buffer so recovery knows what the last modification was that made
  * it to disk.
  */
 void
@@ -257,7 +257,7 @@ xfs_btree_lblock_verify_crc(
  * short-form btree header.
  *
  * Prior to calculting the CRC, pull the LSN out of the buffer log item and put
- * it into the buffer so recovery knows what the last modifcation was that made
+ * it into the buffer so recovery knows what the last modification was that made
  * it to disk.
  */
 void
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 13/14] xfs: stop holding ILOCK over filldir callbacks
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
                   ` (11 preceding siblings ...)
  2016-02-15  6:18 ` [PATCH 12/14] libxfs: fix two comment typos Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-15  6:18 ` [PATCH 14/14] xfs: Validate the length of on-disk ACLs Dave Chinner
  2016-02-16 19:19 ` [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Brian Foster
  14 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Source kernel commit dbad7c993053d8f482a5f76270a93307537efd8e

The recent change to the readdir locking made in 40194ec ("xfs:
reinstate the ilock in xfs_readdir") for CXFS directory sanity was
probably the wrong thing to do. Deep in the readdir code we
can take page faults in the filldir callback, and so taking a page
fault while holding an inode ilock creates a new set of locking
issues that lockdep warns all over the place about.

The locking order for regular inodes w.r.t. page faults is io_lock
-> pagefault -> mmap_sem -> ilock. The directory readdir code now
triggers ilock -> page fault -> mmap_sem. While we cannot deadlock
at this point, it inverts all the locking patterns that lockdep
normally sees on XFS inodes, and so triggers lockdep. We worked
around this with commit 93a8614 ("xfs: fix directory inode iolock
lockdep false positive"), but that then just moved the lockdep
warning to deeper in the page fault path and triggered on security
inode locks. Fixing the shmem issue there just moved the lockdep
reports somewhere else, and now we are getting false positives from
filesystem freezing annotations getting confused.

Further, if we enter memory reclaim in a readdir path, we now get
lockdep warning about potential deadlocks because the ilock is held
when we enter reclaim. This, again, is different to a regular file
in that we never allow memory reclaim to run while holding the ilock
for regular files. Hence lockdep now throws
ilock->kmalloc->reclaim->ilock warnings.

Basically, the problem is that the ilock is being used to protect
the directory data and the inode metadata, whereas for a regular
file the iolock protects the data and the ilock protects the
metadata. From the VFS perspective, the i_mutex serialises all
accesses to the directory data, and so not holding the ilock for
readdir doesn't matter. The issue is that CXFS doesn't access
directory data via the VFS, so it has no "data serialisaton"
mechanism. Hence we need to hold the IOLOCK in the correct places to
provide this low level directory data access serialisation.

The ilock can then be used just when the extent list needs to be
read, just like we do for regular files. The directory modification
code can take the iolock exclusive when the ilock is also taken,
and this then ensures that readdir is correct excluded while
modifications are in progress.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_dir2.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/libxfs/xfs_dir2.c b/libxfs/xfs_dir2.c
index 383401b..89b4781 100644
--- a/libxfs/xfs_dir2.c
+++ b/libxfs/xfs_dir2.c
@@ -360,6 +360,7 @@ xfs_dir_lookup(
 	struct xfs_da_args *args;
 	int		rval;
 	int		v;		/* type-checking value */
+	int		lock_mode;
 
 	ASSERT(S_ISDIR(dp->i_d.di_mode));
 	XFS_STATS_INC(dp->i_mount, xs_dir_lookup);
@@ -385,6 +386,7 @@ xfs_dir_lookup(
 	if (ci_name)
 		args->op_flags |= XFS_DA_OP_CILOOKUP;
 
+	lock_mode = xfs_ilock_data_map_shared(dp);
 	if (dp->i_d.di_format == XFS_DINODE_FMT_LOCAL) {
 		rval = xfs_dir2_sf_lookup(args);
 		goto out_check_rval;
@@ -417,6 +419,7 @@ out_check_rval:
 		}
 	}
 out_free:
+	xfs_iunlock(dp, lock_mode);
 	kmem_free(args);
 	return rval;
 }
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 14/14] xfs: Validate the length of on-disk ACLs
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
                   ` (12 preceding siblings ...)
  2016-02-15  6:18 ` [PATCH 13/14] xfs: stop holding ILOCK over filldir callbacks Dave Chinner
@ 2016-02-15  6:18 ` Dave Chinner
  2016-02-16 19:19 ` [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Brian Foster
  14 siblings, 0 replies; 18+ messages in thread
From: Dave Chinner @ 2016-02-15  6:18 UTC (permalink / raw)
  To: xfs

From: Andreas Gruenbacher <agruenba@redhat.com>

Source kernel commit 86a21c79745ca97676cbd47f8608839382cc0448

In xfs_acl_from_disk, instead of trusting that xfs_acl.acl_cnt is correct,
make sure that the length of the attributes is correct as well.  Also, turn
the aclp parameter into a const pointer.

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
---
 libxfs/xfs_format.h | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/libxfs/xfs_format.h b/libxfs/xfs_format.h
index 7eae0a5..f89b6e0 100644
--- a/libxfs/xfs_format.h
+++ b/libxfs/xfs_format.h
@@ -1502,9 +1502,13 @@ struct xfs_acl {
 						sizeof(struct xfs_acl_entry) \
 		: 25)
 
-#define XFS_ACL_MAX_SIZE(mp) \
+#define XFS_ACL_SIZE(cnt) \
 	(sizeof(struct xfs_acl) + \
-		sizeof(struct xfs_acl_entry) * XFS_ACL_MAX_ENTRIES((mp)))
+		sizeof(struct xfs_acl_entry) * cnt)
+
+#define XFS_ACL_MAX_SIZE(mp) \
+	XFS_ACL_SIZE(XFS_ACL_MAX_ENTRIES((mp)))
+
 
 /* On-disk XFS extended attribute names */
 #define SGI_ACL_FILE		"SGI_ACL_FILE"
-- 
2.5.0

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2
  2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
                   ` (13 preceding siblings ...)
  2016-02-15  6:18 ` [PATCH 14/14] xfs: Validate the length of on-disk ACLs Dave Chinner
@ 2016-02-16 19:19 ` Brian Foster
  14 siblings, 0 replies; 18+ messages in thread
From: Brian Foster @ 2016-02-16 19:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Feb 15, 2016 at 05:18:11PM +1100, Dave Chinner wrote:
> Hi folks,
> 
> This patchset pulls xfsprogs lixfs up to the same base code as the
> v4.5-rc2 kernel. I've simply pulled the commits across and in each
> patch made the external xfsprogs changes necessary to make them
> compile and work correctly. This applies on top of the xfs_io dax
> patchset I posted a short while ago. I've left all the kernel
> sign-offs on the commits - it's really only the patches that change
> stuff outside libxfs that need any sort of review...
> 
> This is pretty much all of the remaining changes needed before I can
> cut a 4.5.0-rc1 release of xfsprogs - it's really only bug fixes and
> minor changes from here, as I really need this base tree mostly
> stable from here on.
> 
> The reason for that I'm now going to be build a for-next branch off
> this that has all the changes in the current kernel for-next branch.
> This will enable Darrick to have a sane code base that he can rebase
> all his xfsprogs changes for rmap/reflink on top of. That will make
> review, testing and eventual merge much easier for everyone.
> 

A couple nits to follow that don't warrant a resend. So, for the series:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> -Dave.
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 05/14] xfs: introduce BMAPI_ZERO for allocating zeroed extents
  2016-02-15  6:18 ` [PATCH 05/14] xfs: introduce BMAPI_ZERO for allocating zeroed extents Dave Chinner
@ 2016-02-16 19:20   ` Brian Foster
  0 siblings, 0 replies; 18+ messages in thread
From: Brian Foster @ 2016-02-16 19:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Feb 15, 2016 at 05:18:16PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Source kernel commit 3fbbbea34bac049c0b5938dc065f7d8ee1ef7e67
> 
> To enable DAX to do atomic allocation of zeroed extents, we need to
> drive the block zeroing deep into the allocator. Because
> xfs_bmapi_write() can return merged extents on allocation that were
> only partially allocated (i.e. requested range spans allocated and
> hole regions, allocation into the hole was contiguous), we cannot
> zero the extent returned from xfs_bmapi_write() as that can
> overwrite existing data with zeros.
> 
> Hence we have to drive the extent zeroing into the allocation code,
> prior to where we merge the extents into the BMBT and return the
> resultant map. This means we need to propagate this need down to
> the xfs_alloc_vextent() and issue the block zeroing at this point.
> 
> While this functionality is being introduced for DAX, there is no
> reason why it is specific to DAX - we can per-zero blocks during the
> allocation transaction on any type of device. It's just slow (and
> usually slower than unwritten allocation and conversion) on
> traditional block devices so doesn't tend to get used. We can,
> however, hook hardware zeroing optimisations via sb_issue_zeroout()
> to this operation, so it may be useful in future and hence the
> "allocate zeroed blocks" API needs to be implementation neutral.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> ---
>  include/libxfs.h         |  1 -
>  libxfs/libxfs_api_defs.h |  1 +
>  libxfs/libxfs_io.h       |  2 ++
>  libxfs/libxfs_priv.h     |  3 +++
>  libxfs/rdwr.c            |  4 +++-
>  libxfs/util.c            | 35 +++++++++++++++++++++++++++++++++++
>  libxfs/xfs_alloc.c       | 10 +++++++++-
>  libxfs/xfs_alloc.h       |  8 +++++---
>  libxfs/xfs_bmap.c        | 35 +++++++++++++++++++++++++++++++++--
>  libxfs/xfs_bmap.h        | 13 +++++++++++--
>  10 files changed, 102 insertions(+), 10 deletions(-)
> 
...
> diff --git a/libxfs/util.c b/libxfs/util.c
> index 90031fd..ee4bf3c 100644
> --- a/libxfs/util.c
> +++ b/libxfs/util.c
...
> @@ -770,3 +772,36 @@ xfs_log_check_lsn(
...
> +int
> +libxfs_zero_extent(
> +	struct xfs_inode *ip,
> +	xfs_fsblock_t	start_fsb,
> +	xfs_off_t	count_fsb)
> +{
> +	xfs_daddr_t	sector = xfs_fsb_to_db(ip, start_fsb);
> +	ssize_t		size = XFS_FSB_TO_BB(ip->i_mount, count_fsb);
> +
> +	return libxfs_device_zero(xfs_find_bdev_for_inode(ip), sector, size);
> +}
> +

Extra whitespace ^, otherwise looks good:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> diff --git a/libxfs/xfs_alloc.c b/libxfs/xfs_alloc.c
> index 12d59df..af40270 100644
> --- a/libxfs/xfs_alloc.c
> +++ b/libxfs/xfs_alloc.c
> @@ -2508,7 +2508,7 @@ xfs_alloc_vextent(
>  		 * Try near allocation first, then anywhere-in-ag after
>  		 * the first a.g. fails.
>  		 */
> -		if ((args->userdata  == XFS_ALLOC_INITIAL_USER_DATA) &&
> +		if ((args->userdata & XFS_ALLOC_INITIAL_USER_DATA) &&
>  		    (mp->m_flags & XFS_MOUNT_32BITINODES)) {
>  			args->fsbno = XFS_AGB_TO_FSB(mp,
>  					((mp->m_agfrotor / rotorstep) %
> @@ -2639,6 +2639,14 @@ xfs_alloc_vextent(
>  		XFS_AG_CHECK_DADDR(mp, XFS_FSB_TO_DADDR(mp, args->fsbno),
>  			args->len);
>  #endif
> +
> +		/* Zero the extent if we were asked to do so */
> +		if (args->userdata & XFS_ALLOC_USERDATA_ZERO) {
> +			error = xfs_zero_extent(args->ip, args->fsbno, args->len);
> +			if (error)
> +				goto error0;
> +		}
> +
>  	}
>  	xfs_perag_put(args->pag);
>  	return 0;
> diff --git a/libxfs/xfs_alloc.h b/libxfs/xfs_alloc.h
> index 071b28b..135eb3d 100644
> --- a/libxfs/xfs_alloc.h
> +++ b/libxfs/xfs_alloc.h
> @@ -101,6 +101,7 @@ typedef struct xfs_alloc_arg {
>  	struct xfs_mount *mp;		/* file system mount point */
>  	struct xfs_buf	*agbp;		/* buffer for a.g. freelist header */
>  	struct xfs_perag *pag;		/* per-ag struct for this agno */
> +	struct xfs_inode *ip;		/* for userdata zeroing method */
>  	xfs_fsblock_t	fsbno;		/* file system block number */
>  	xfs_agnumber_t	agno;		/* allocation group number */
>  	xfs_agblock_t	agbno;		/* allocation group-relative block # */
> @@ -120,15 +121,16 @@ typedef struct xfs_alloc_arg {
>  	char		wasdel;		/* set if allocation was prev delayed */
>  	char		wasfromfl;	/* set if allocation is from freelist */
>  	char		isfl;		/* set if is freelist blocks - !acctg */
> -	char		userdata;	/* set if this is user data */
> +	char		userdata;	/* mask defining userdata treatment */
>  	xfs_fsblock_t	firstblock;	/* io first block allocated */
>  } xfs_alloc_arg_t;
>  
>  /*
>   * Defines for userdata
>   */
> -#define XFS_ALLOC_USERDATA		1	/* allocation is for user data*/
> -#define XFS_ALLOC_INITIAL_USER_DATA	2	/* special case start of file */
> +#define XFS_ALLOC_USERDATA		(1 << 0)/* allocation is for user data*/
> +#define XFS_ALLOC_INITIAL_USER_DATA	(1 << 1)/* special case start of file */
> +#define XFS_ALLOC_USERDATA_ZERO		(1 << 2)/* zero extent on allocation */
>  
>  xfs_extlen_t xfs_alloc_longest_free_extent(struct xfs_mount *mp,
>  		struct xfs_perag *pag, xfs_extlen_t need);
> diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
> index 8e19b50..a38549c 100644
> --- a/libxfs/xfs_bmap.c
> +++ b/libxfs/xfs_bmap.c
> @@ -3795,8 +3795,13 @@ xfs_bmap_btalloc(
>  	args.wasdel = ap->wasdel;
>  	args.isfl = 0;
>  	args.userdata = ap->userdata;
> -	if ((error = xfs_alloc_vextent(&args)))
> +	if (ap->userdata & XFS_ALLOC_USERDATA_ZERO)
> +		args.ip = ap->ip;
> +
> +	error = xfs_alloc_vextent(&args);
> +	if (error)
>  		return error;
> +
>  	if (tryagain && args.fsbno == NULLFSBLOCK) {
>  		/*
>  		 * Exact allocation failed. Now try with alignment
> @@ -4295,11 +4300,14 @@ xfs_bmapi_allocate(
>  
>  	/*
>  	 * Indicate if this is the first user data in the file, or just any
> -	 * user data.
> +	 * user data. And if it is userdata, indicate whether it needs to
> +	 * be initialised to zero during allocation.
>  	 */
>  	if (!(bma->flags & XFS_BMAPI_METADATA)) {
>  		bma->userdata = (bma->offset == 0) ?
>  			XFS_ALLOC_INITIAL_USER_DATA : XFS_ALLOC_USERDATA;
> +		if (bma->flags & XFS_BMAPI_ZERO)
> +			bma->userdata |= XFS_ALLOC_USERDATA_ZERO;
>  	}
>  
>  	bma->minlen = (bma->flags & XFS_BMAPI_CONTIG) ? bma->length : 1;
> @@ -4414,6 +4422,17 @@ xfs_bmapi_convert_unwritten(
>  	mval->br_state = (mval->br_state == XFS_EXT_UNWRITTEN)
>  				? XFS_EXT_NORM : XFS_EXT_UNWRITTEN;
>  
> +	/*
> +	 * Before insertion into the bmbt, zero the range being converted
> +	 * if required.
> +	 */
> +	if (flags & XFS_BMAPI_ZERO) {
> +		error = xfs_zero_extent(bma->ip, mval->br_startblock,
> +					mval->br_blockcount);
> +		if (error)
> +			return error;
> +	}
> +
>  	error = xfs_bmap_add_extent_unwritten_real(bma->tp, bma->ip, &bma->idx,
>  			&bma->cur, mval, bma->firstblock, bma->flist,
>  			&tmp_logflags);
> @@ -4507,6 +4526,18 @@ xfs_bmapi_write(
>  	ASSERT(XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_LOCAL);
>  	ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));
>  
> +	/* zeroing is for currently only for data extents, not metadata */
> +	ASSERT((flags & (XFS_BMAPI_METADATA | XFS_BMAPI_ZERO)) !=
> +			(XFS_BMAPI_METADATA | XFS_BMAPI_ZERO));
> +	/*
> +	 * we can allocate unwritten extents or pre-zero allocated blocks,
> +	 * but it makes no sense to do both at once. This would result in
> +	 * zeroing the unwritten extent twice, but it still being an
> +	 * unwritten extent....
> +	 */
> +	ASSERT((flags & (XFS_BMAPI_PREALLOC | XFS_BMAPI_ZERO)) !=
> +			(XFS_BMAPI_PREALLOC | XFS_BMAPI_ZERO));
> +
>  	if (unlikely(XFS_TEST_ERROR(
>  	    (XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_EXTENTS &&
>  	     XFS_IFORK_FORMAT(ip, whichfork) != XFS_DINODE_FMT_BTREE),
> diff --git a/libxfs/xfs_bmap.h b/libxfs/xfs_bmap.h
> index d3daf6d..baec27d 100644
> --- a/libxfs/xfs_bmap.h
> +++ b/libxfs/xfs_bmap.h
> @@ -52,9 +52,9 @@ struct xfs_bmalloca {
>  	xfs_extlen_t		minleft; /* amount must be left after alloc */
>  	bool			eof;	/* set if allocating past last extent */
>  	bool			wasdel;	/* replacing a delayed allocation */
> -	bool			userdata;/* set if is user data */
>  	bool			aeof;	/* allocated space at eof */
>  	bool			conv;	/* overwriting unwritten extents */
> +	char			userdata;/* userdata mask */
>  	int			flags;
>  };
>  
> @@ -109,6 +109,14 @@ typedef	struct xfs_bmap_free
>   */
>  #define XFS_BMAPI_CONVERT	0x040
>  
> +/*
> + * allocate zeroed extents - this requires all newly allocated user data extents
> + * to be initialised to zero. It will be ignored if XFS_BMAPI_METADATA is set.
> + * Use in conjunction with XFS_BMAPI_CONVERT to convert unwritten extents found
> + * during the allocation range to zeroed written extents.
> + */
> +#define XFS_BMAPI_ZERO		0x080
> +
>  #define XFS_BMAPI_FLAGS \
>  	{ XFS_BMAPI_ENTIRE,	"ENTIRE" }, \
>  	{ XFS_BMAPI_METADATA,	"METADATA" }, \
> @@ -116,7 +124,8 @@ typedef	struct xfs_bmap_free
>  	{ XFS_BMAPI_PREALLOC,	"PREALLOC" }, \
>  	{ XFS_BMAPI_IGSTATE,	"IGSTATE" }, \
>  	{ XFS_BMAPI_CONTIG,	"CONTIG" }, \
> -	{ XFS_BMAPI_CONVERT,	"CONVERT" }
> +	{ XFS_BMAPI_CONVERT,	"CONVERT" }, \
> +	{ XFS_BMAPI_ZERO,	"ZERO" }
>  
>  
>  static inline int xfs_bmapi_aflag(int w)
> -- 
> 2.5.0
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 10/14] xfs: handle dquot buffer readahead in log recovery correctly
  2016-02-15  6:18 ` [PATCH 10/14] xfs: handle dquot buffer readahead in log recovery correctly Dave Chinner
@ 2016-02-16 19:20   ` Brian Foster
  0 siblings, 0 replies; 18+ messages in thread
From: Brian Foster @ 2016-02-16 19:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Mon, Feb 15, 2016 at 05:18:21PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Source kernel commit 7d6a13f023567d573ac362502bb702eda716e654
> 
> When we do dquot readahead in log recovery, we do not use a verifier
> as the underlying buffer may not have dquots in it. e.g. the
> allocation operation hasn't yet been replayed. Hence we do not want
> to fail recovery because we detect an operation to be replayed has
> not been run yet. This problem was addressed for inodes in commit
> d891400 ("xfs: inode buffers may not be valid during recovery
> readahead") but the problem was not recognised to exist for dquots
> and their buffers as the dquot readahead did not have a verifier.
> 
> The result of not using a verifier is that when the buffer is then
> next read to replay a dquot modification, the dquot buffer verifier
> will only be attached to the buffer if *readahead is not complete*.
> Hence we can read the buffer, replay the dquot changes and then add
> it to the delwri submission list without it having a verifier
> attached to it. This then generates warnings in xfs_buf_ioapply(),
> which catches and warns about this case.
> 
> Fix this and make it handle the same readahead verifier error cases
> as for inode buffers by adding a new readahead verifier that has a
> write operation as well as a read operation that marks the buffer as
> not done if any corruption is detected.  Also make sure we don't run
> readahead if the dquot buffer has been marked as cancelled by
> recovery.
> 
> This will result in readahead either succeeding and the buffer
> having a valid write verifier, or readahead failing and the buffer
> state requiring the subsequent read to resubmit the IO with the new
> verifier.  In either case, this will result in the buffer always
> ending up with a valid write verifier on it.
> 
> Note: we also need to fix the inode buffer readahead error handling
> to mark the buffer with EIO. Brian noticed the code I copied from
> there wrong during review, so fix it at the same time. Add comments
> linking the two functions that handle readahead verifier errors
> together so we don't forget this behavioural link in future.
> 
> cc: <stable@vger.kernel.org> # 3.12 - current
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Brian Foster <bfoster@redhat.com>
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> ---
>  libxfs/xfs_dquot_buf.c  | 36 ++++++++++++++++++++++++++++++------
>  libxfs/xfs_inode_buf.c  |  2 ++
>  libxfs/xfs_quota_defs.h |  2 +-
>  libxfs/xfs_shared.h     |  1 +
>  4 files changed, 34 insertions(+), 7 deletions(-)
> 
> diff --git a/libxfs/xfs_dquot_buf.c b/libxfs/xfs_dquot_buf.c
> index fd4aa4b..025d49b 100644
> --- a/libxfs/xfs_dquot_buf.c
> +++ b/libxfs/xfs_dquot_buf.c
...
> @@ -282,13 +301,18 @@ xfs_dquot_buf_write_verify(
>  {
>  	struct xfs_mount	*mp = bp->b_target->bt_mount;
>  
> -	if (!xfs_dquot_buf_verify(mp, bp)) {
> +	if (!xfs_dquot_buf_verify(mp, bp, XFS_QMOPT_DOWARN)) {
>  		xfs_buf_ioerror(bp, -EFSCORRUPTED);
>  		xfs_verifier_error(bp);
>  		return;
>  	}
>  }
>  
> +const struct xfs_buf_ops xfs_dquot_buf_ra_ops = {
> +	.name = "xfs_dquot_ra",
> +	.verify_read = xfs_dquot_buf_readahead_verify,
> +	.verify_write = xfs_dquot_buf_write_verify,
> +};
>  const struct xfs_buf_ops xfs_dquot_buf_ops = {
>  	.name = "xfs_dquot",
>  	.verify_read = xfs_dquot_buf_read_verify,

Nit: the readahead ops structure comes after the buf_ops structure in
the kernel code.

Reviewed-by: Brian Foster <bfoster@redhat.com>

> diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
> index 546af74..89c05ad 100644
> --- a/libxfs/xfs_inode_buf.c
> +++ b/libxfs/xfs_inode_buf.c
> @@ -77,6 +77,8 @@ xfs_dinode_good_version(
>   * recovery and we don't get unnecssary panics on debug kernels. We use EIO here
>   * because all we want to do is say readahead failed; there is no-one to report
>   * the error to, so this will distinguish it from a non-ra verifier failure.
> + * Changes to this readahead error behavour also need to be reflected in
> + * xfs_dquot_buf_readahead_verify().
>   */
>  static void
>  xfs_inode_buf_verify(
> diff --git a/libxfs/xfs_quota_defs.h b/libxfs/xfs_quota_defs.h
> index 1b0a083..f51078f 100644
> --- a/libxfs/xfs_quota_defs.h
> +++ b/libxfs/xfs_quota_defs.h
> @@ -153,7 +153,7 @@ typedef __uint16_t	xfs_qwarncnt_t;
>  #define XFS_QMOPT_RESBLK_MASK	(XFS_QMOPT_RES_REGBLKS | XFS_QMOPT_RES_RTBLKS)
>  
>  extern int xfs_dqcheck(struct xfs_mount *mp, xfs_disk_dquot_t *ddq,
> -		       xfs_dqid_t id, uint type, uint flags, char *str);
> +		       xfs_dqid_t id, uint type, uint flags, const char *str);
>  extern int xfs_calc_dquots_per_chunk(unsigned int nbblks);
>  
>  #endif	/* __XFS_QUOTA_H__ */
> diff --git a/libxfs/xfs_shared.h b/libxfs/xfs_shared.h
> index 5be5297..15c3ceb 100644
> --- a/libxfs/xfs_shared.h
> +++ b/libxfs/xfs_shared.h
> @@ -49,6 +49,7 @@ extern const struct xfs_buf_ops xfs_inobt_buf_ops;
>  extern const struct xfs_buf_ops xfs_inode_buf_ops;
>  extern const struct xfs_buf_ops xfs_inode_buf_ra_ops;
>  extern const struct xfs_buf_ops xfs_dquot_buf_ops;
> +extern const struct xfs_buf_ops xfs_dquot_buf_ra_ops;
>  extern const struct xfs_buf_ops xfs_sb_buf_ops;
>  extern const struct xfs_buf_ops xfs_sb_quiet_buf_ops;
>  extern const struct xfs_buf_ops xfs_symlink_buf_ops;
> -- 
> 2.5.0
> 
> _______________________________________________
> xfs mailing list
> xfs@oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2016-02-16 19:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-15  6:18 [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Dave Chinner
2016-02-15  6:18 ` [PATCH 01/14] libxfs: Optimize the loop for xfs_bitmap_empty Dave Chinner
2016-02-15  6:18 ` [PATCH 02/14] xfs: add missing bmap cancel calls in error paths Dave Chinner
2016-02-15  6:18 ` [PATCH 03/14] xfs: log local to remote symlink conversions correctly on v5 supers Dave Chinner
2016-02-15  6:18 ` [PATCH 04/14] xfs: per-filesystem stats counter implementation Dave Chinner
2016-02-15  6:18 ` [PATCH 05/14] xfs: introduce BMAPI_ZERO for allocating zeroed extents Dave Chinner
2016-02-16 19:20   ` Brian Foster
2016-02-15  6:18 ` [PATCH 06/14] xfs: get mp from bma->ip in xfs_bmap code Dave Chinner
2016-02-15  6:18 ` [PATCH 07/14] xfs: bmapbt checking on debug kernels too expensive Dave Chinner
2016-02-15  6:18 ` [PATCH 08/14] xfs: eliminate committed arg from xfs_bmap_finish Dave Chinner
2016-02-15  6:18 ` [PATCH 09/14] xfs: inode recovery readahead can race with inode buffer creation Dave Chinner
2016-02-15  6:18 ` [PATCH 10/14] xfs: handle dquot buffer readahead in log recovery correctly Dave Chinner
2016-02-16 19:20   ` Brian Foster
2016-02-15  6:18 ` [PATCH 11/14] xfs: swap leaf buffer into path struct atomically during path shift Dave Chinner
2016-02-15  6:18 ` [PATCH 12/14] libxfs: fix two comment typos Dave Chinner
2016-02-15  6:18 ` [PATCH 13/14] xfs: stop holding ILOCK over filldir callbacks Dave Chinner
2016-02-15  6:18 ` [PATCH 14/14] xfs: Validate the length of on-disk ACLs Dave Chinner
2016-02-16 19:19 ` [PATCH 0/14] xfsprogs: kernel libxfs sync up to 4.5-rc2 Brian Foster

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.