* [PATCH 0/5] xfs: more patches for 3.13
@ 2013-11-01  4:27 Dave Chinner
  2013-11-01  4:27 ` [PATCH 1/5] xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering Dave Chinner
                   ` (6 more replies)
  0 siblings, 7 replies; 30+ messages in thread
From: Dave Chinner @ 2013-11-01  4:27 UTC (permalink / raw)
  To: xfs

Hi folks,

The following series follows up the recently committed series of
patches for 3.13. The first two patches are the remaining
uncommitted patches from the previous series.

The next two patches are tracing patches, one for AIL manipulations
and the other for AGF and AGI read operations. Both of these were
written during recent debugging sessions, and both proved useful so
should be added to the menagerie of tracepoints we already have
available.

The final patch increases the inode cluster size for v5
filesystems. I'd like to get this into v5 filesystems for 3.13 so we
get wider exposure of it ASAP and have more data available to make
informed decisions about how to bring this back to v4 filesystems in
a safe and controlled manner.

Cheers,

Dave.


* [PATCH 1/5] xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering
  2013-11-01  4:27 [PATCH 0/5] xfs: more patches for 3.13 Dave Chinner
@ 2013-11-01  4:27 ` Dave Chinner
  2013-11-01  4:27 ` [PATCH 2/5] xfs: open code inc_inode_iversion when logging an inode Dave Chinner
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 30+ messages in thread
From: Dave Chinner @ 2013-11-01  4:27 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Removing an inode from the namespace involves removing the directory
entry and dropping the link count on the inode. Removing the
directory entry can result in locking an AGF (directory blocks were
freed) and removing a link count can result in placing the inode on
an unlinked list which results in locking an AGI.

The big problem here is that we have an ordering constraint on AGF
and AGI locking - inode allocation locks the AGI, then can allocate
a new extent for new inodes, locking the AGF after the AGI.
Similarly, freeing the inode removes the inode from the unlinked
list, requiring that we lock the AGI first, and then freeing the
inode can result in an inode chunk being freed and hence freeing
disk space requiring that we lock an AGF.

Hence the ordering that is imposed by other parts of the code is AGI
before AGF. This means we cannot remove the directory entry before
we drop the inode reference count and put it on the unlinked list as
this results in a lock order of AGF then AGI, and this can deadlock
against inode allocation and freeing. Therefore we must drop the
link counts before we remove the directory entry.

This is still safe from a transactional point of view - it is not
until we get to xfs_bmap_finish() that we have the possibility of
multiple transactions in this operation. Hence as long as we remove
the directory entry and drop the link count in the first transaction
of the remove operation, there are no transactional constraints on
the ordering here.

Change the ordering of the operations in the xfs_remove() function
to align the ordering of AGI and AGF locking to match that of the
rest of the code.
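
In outline, the constraint and the fix look like this (a sketch of
the call ordering only, not the actual code paths):

	/*
	 * Ordering imposed elsewhere in XFS:
	 *   inode allocation:  lock AGI -> (maybe) lock AGF
	 *   inode freeing:     lock AGI -> (maybe) lock AGF
	 *
	 * xfs_remove() before this patch:
	 *   xfs_dir_removename()   -> may lock AGF
	 *   xfs_droplink()         -> may lock AGI   (AGF -> AGI: deadlock)
	 *
	 * xfs_remove() after this patch:
	 *   xfs_droplink()         -> may lock AGI
	 *   xfs_dir_removename()   -> may lock AGF   (AGI -> AGF: correct)
	 */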

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Ben Myers <bpm@sgi.com>
---
 fs/xfs/xfs_inode.c | 72 +++++++++++++++++++++++++++++++++---------------------
 1 file changed, 44 insertions(+), 28 deletions(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 326b94d..001aa89 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2404,6 +2404,33 @@ xfs_iunpin_wait(
 		__xfs_iunpin_wait(ip);
 }
 
+/*
+ * Removing an inode from the namespace involves removing the directory entry
+ * and dropping the link count on the inode. Removing the directory entry can
+ * result in locking an AGF (directory blocks were freed) and removing a link
+ * count can result in placing the inode on an unlinked list which results in
+ * locking an AGI.
+ *
+ * The big problem here is that we have an ordering constraint on AGF and AGI
+ * locking - inode allocation locks the AGI, then can allocate a new extent for
+ * new inodes, locking the AGF after the AGI. Similarly, freeing the inode
+ * removes the inode from the unlinked list, requiring that we lock the AGI
+ * first, and then freeing the inode can result in an inode chunk being freed
+ * and hence freeing disk space requiring that we lock an AGF.
+ *
+ * Hence the ordering that is imposed by other parts of the code is AGI before
+ * AGF. This means we cannot remove the directory entry before we drop the inode
+ * reference count and put it on the unlinked list as this results in a lock
+ * order of AGF then AGI, and this can deadlock against inode allocation and
+ * freeing. Therefore we must drop the link counts before we remove the
+ * directory entry.
+ *
+ * This is still safe from a transactional point of view - it is not until we
+ * get to xfs_bmap_finish() that we have the possibility of multiple
+ * transactions in this operation. Hence as long as we remove the directory
+ * entry and drop the link count in the first transaction of the remove
+ * operation, there are no transactional constraints on the ordering here.
+ */
 int
 xfs_remove(
 	xfs_inode_t             *dp,
@@ -2473,6 +2500,7 @@ xfs_remove(
 	/*
 	 * If we're removing a directory perform some additional validation.
 	 */
+	cancel_flags |= XFS_TRANS_ABORT;
 	if (is_dir) {
 		ASSERT(ip->i_d.di_nlink >= 2);
 		if (ip->i_d.di_nlink != 2) {
@@ -2483,31 +2511,16 @@ xfs_remove(
 			error = XFS_ERROR(ENOTEMPTY);
 			goto out_trans_cancel;
 		}
-	}
 
-	xfs_bmap_init(&free_list, &first_block);
-	error = xfs_dir_removename(tp, dp, name, ip->i_ino,
-					&first_block, &free_list, resblks);
-	if (error) {
-		ASSERT(error != ENOENT);
-		goto out_bmap_cancel;
-	}
-	xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
-
-	if (is_dir) {
-		/*
-		 * Drop the link from ip's "..".
-		 */
+		/* Drop the link from ip's "..".  */
 		error = xfs_droplink(tp, dp);
 		if (error)
-			goto out_bmap_cancel;
+			goto out_trans_cancel;
 
-		/*
-		 * Drop the "." link from ip to self.
-		 */
+		/* Drop the "." link from ip to self.  */
 		error = xfs_droplink(tp, ip);
 		if (error)
-			goto out_bmap_cancel;
+			goto out_trans_cancel;
 	} else {
 		/*
 		 * When removing a non-directory we need to log the parent
@@ -2516,20 +2529,24 @@ xfs_remove(
 		 */
 		xfs_trans_log_inode(tp, dp, XFS_ILOG_CORE);
 	}
+	xfs_trans_ichgtime(tp, dp, XFS_ICHGTIME_MOD | XFS_ICHGTIME_CHG);
 
-	/*
-	 * Drop the link from dp to ip.
-	 */
+	/* Drop the link from dp to ip. */
 	error = xfs_droplink(tp, ip);
 	if (error)
-		goto out_bmap_cancel;
+		goto out_trans_cancel;
 
-	/*
-	 * Determine if this is the last link while
-	 * we are in the transaction.
-	 */
+	/* Determine if this is the last link while the inode is locked */
 	link_zero = (ip->i_d.di_nlink == 0);
 
+	xfs_bmap_init(&free_list, &first_block);
+	error = xfs_dir_removename(tp, dp, name, ip->i_ino,
+					&first_block, &free_list, resblks);
+	if (error) {
+		ASSERT(error != ENOENT);
+		goto out_bmap_cancel;
+	}
+
 	/*
 	 * If this is a synchronous mount, make sure that the
 	 * remove transaction goes to disk before returning to
@@ -2559,7 +2576,6 @@ xfs_remove(
 
  out_bmap_cancel:
 	xfs_bmap_cancel(&free_list);
-	cancel_flags |= XFS_TRANS_ABORT;
  out_trans_cancel:
 	xfs_trans_cancel(tp, cancel_flags);
  std_return:
-- 
1.8.4.rc3


* [PATCH 2/5] xfs: open code inc_inode_iversion when logging an inode
  2013-11-01  4:27 [PATCH 0/5] xfs: more patches for 3.13 Dave Chinner
  2013-11-01  4:27 ` [PATCH 1/5] xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering Dave Chinner
@ 2013-11-01  4:27 ` Dave Chinner
  2013-11-05 16:41   ` Christoph Hellwig
  2013-11-18 21:54   ` Eric Sandeen
  2013-11-01  4:27 ` [PATCH 3/5] xfs: trace AIL manipulations Dave Chinner
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 30+ messages in thread
From: Dave Chinner @ 2013-11-01  4:27 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Michael L Semon reported that generic/069 runtime increased on v5
superblocks by 100% compared to v4 superblocks. His perf-based
analysis pointed directly at the timestamp updates being done by the
write path in this workload. The append writers are doing 4-byte
writes, so there are lots of timestamp updates occurring.

The thing is, they aren't being triggered by timestamp changes -
they are being triggered by the inode change counter needing to be
updated. That is, every write(2) system call needs to bump the inode
version count, and it does that through the timestamp update
mechanism. Hence test generic/069 is running 3 orders of magnitude
more timestamp update transactions on v5 filesystems due to the
huge number of *4 byte* write(2) calls it makes.

This isn't a real world scenario we really need to address - anyone
doing such sequential IO should be using fwrite(3), not write(2).
i.e. fwrite(3) buffers the writes in userspace to minimise the
number of write(2) syscalls, and the problem goes away.

However, there is a small change we can make to improve the
situation - removing the expensive lock operation on the change
counter update.  All inode version counter changes in XFS occur
under the ip->i_ilock during a transaction, and therefore we
don't actually need the spin lock that provides exclusive access to
it through inode_inc_iversion().

Hence avoid the lock and just open code the increment ourselves when
logging the inode.
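
For reference, the generic helper being open-coded takes the inode
spinlock on every bump; a sketch of the VFS helper of this era (see
include/linux/fs.h for the exact body):

	static inline void inode_inc_iversion(struct inode *inode)
	{
		spin_lock(&inode->i_lock);
		inode->i_version++;
		spin_unlock(&inode->i_lock);
	}

Avoiding it saves a spin lock/unlock pair every time an inode is
first logged in a transaction.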

Reported-by: Michael L. Semon <mlsemon35@gmail.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_trans_inode.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
index 1bba7f6..50c3f56 100644
--- a/fs/xfs/xfs_trans_inode.c
+++ b/fs/xfs/xfs_trans_inode.c
@@ -111,12 +111,14 @@ xfs_trans_log_inode(
 
 	/*
 	 * First time we log the inode in a transaction, bump the inode change
-	 * counter if it is configured for this to occur.
+	 * counter if it is configured for this to occur. We don't use
+	 * inode_inc_version() because there is no need for extra locking around
+	 * i_version as we already hold the inode locked exclusively for
+	 * metadata modification.
 	 */
 	if (!(ip->i_itemp->ili_item.li_desc->lid_flags & XFS_LID_DIRTY) &&
 	    IS_I_VERSION(VFS_I(ip))) {
-		inode_inc_iversion(VFS_I(ip));
-		ip->i_d.di_changecount = VFS_I(ip)->i_version;
+		ip->i_d.di_changecount = ++VFS_I(ip)->i_version;
 		flags |= XFS_ILOG_CORE;
 	}
 
-- 
1.8.4.rc3


* [PATCH 3/5] xfs: trace AIL manipulations
  2013-11-01  4:27 [PATCH 0/5] xfs: more patches for 3.13 Dave Chinner
  2013-11-01  4:27 ` [PATCH 1/5] xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering Dave Chinner
  2013-11-01  4:27 ` [PATCH 2/5] xfs: open code inc_inode_iversion when logging an inode Dave Chinner
@ 2013-11-01  4:27 ` Dave Chinner
  2013-11-05 16:41   ` Christoph Hellwig
  2013-11-01  4:27 ` [PATCH 4/5] xfs: add tracepoints to AGF/AGI read operations Dave Chinner
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2013-11-01  4:27 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

While debugging a log tail issue on a RHEL6 kernel, I added these
tracepoints to trace log items being added, moved and removed in the AIL
and how that affected the log tail LSN that was written to the log.
They were very helpful in that they immediately identified the cause
of the problem being seen. Hence I'd like to always have them
available for use.
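
The new events can be enabled like any other XFS tracepoint, e.g.
(assuming tracefs is mounted in the usual debugfs location):

# cd /sys/kernel/debug/tracing
# echo 1 > events/xfs/xfs_ail_insert/enable
# echo 1 > events/xfs/xfs_ail_move/enable
# echo 1 > events/xfs/xfs_ail_delete/enable
# echo 1 > events/xfs/xfs_log_assign_tail_lsn/enable
# cat trace_pipe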

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log.c       |  1 +
 fs/xfs/xfs_trace.h     | 59 +++++++++++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_trans_ail.c |  3 +++
 3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 49dd41e..8497a00 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1076,6 +1076,7 @@ xlog_assign_tail_lsn_locked(
 		tail_lsn = lip->li_lsn;
 	else
 		tail_lsn = atomic64_read(&log->l_last_sync_lsn);
+	trace_xfs_log_assign_tail_lsn(log, tail_lsn);
 	atomic64_set(&log->l_tail_lsn, tail_lsn);
 	return tail_lsn;
 }
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 47910e6..f195476 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -31,8 +31,8 @@ struct xfs_da_args;
 struct xfs_da_node_entry;
 struct xfs_dquot;
 struct xfs_log_item;
-struct xlog_ticket;
 struct xlog;
+struct xlog_ticket;
 struct xlog_recover;
 struct xlog_recover_item;
 struct xfs_buf_log_format;
@@ -938,6 +938,63 @@ DEFINE_LOG_ITEM_EVENT(xfs_ail_pinned);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_locked);
 DEFINE_LOG_ITEM_EVENT(xfs_ail_flushing);
 
+DECLARE_EVENT_CLASS(xfs_ail_class,
+	TP_PROTO(struct xfs_log_item *lip, xfs_lsn_t old_lsn, xfs_lsn_t new_lsn),
+	TP_ARGS(lip, old_lsn, new_lsn),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(void *, lip)
+		__field(uint, type)
+		__field(uint, flags)
+		__field(xfs_lsn_t, old_lsn)
+		__field(xfs_lsn_t, new_lsn)
+	),
+	TP_fast_assign(
+		__entry->dev = lip->li_mountp->m_super->s_dev;
+		__entry->lip = lip;
+		__entry->type = lip->li_type;
+		__entry->flags = lip->li_flags;
+		__entry->old_lsn = old_lsn;
+		__entry->new_lsn = new_lsn;
+	),
+	TP_printk("dev %d:%d lip 0x%p old lsn %d/%d new lsn %d/%d type %s flags %s",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->lip,
+		  CYCLE_LSN(__entry->old_lsn), BLOCK_LSN(__entry->old_lsn),
+		  CYCLE_LSN(__entry->new_lsn), BLOCK_LSN(__entry->new_lsn),
+		  __print_symbolic(__entry->type, XFS_LI_TYPE_DESC),
+		  __print_flags(__entry->flags, "|", XFS_LI_FLAGS))
+)
+
+#define DEFINE_AIL_EVENT(name) \
+DEFINE_EVENT(xfs_ail_class, name, \
+	TP_PROTO(struct xfs_log_item *lip, xfs_lsn_t old_lsn, xfs_lsn_t new_lsn), \
+	TP_ARGS(lip, old_lsn, new_lsn))
+DEFINE_AIL_EVENT(xfs_ail_insert);
+DEFINE_AIL_EVENT(xfs_ail_move);
+DEFINE_AIL_EVENT(xfs_ail_delete);
+
+TRACE_EVENT(xfs_log_assign_tail_lsn,
+	TP_PROTO(struct xlog *log, xfs_lsn_t new_lsn),
+	TP_ARGS(log, new_lsn),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_lsn_t, new_lsn)
+		__field(xfs_lsn_t, old_lsn)
+		__field(xfs_lsn_t, last_sync_lsn)
+	),
+	TP_fast_assign(
+		__entry->dev = log->l_mp->m_super->s_dev;
+		__entry->new_lsn = new_lsn;
+		__entry->old_lsn = atomic64_read(&log->l_tail_lsn);
+		__entry->last_sync_lsn = atomic64_read(&log->l_last_sync_lsn);
+	),
+	TP_printk("dev %d:%d new tail lsn %d/%d, old lsn %d/%d, last sync %d/%d",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  CYCLE_LSN(__entry->new_lsn), BLOCK_LSN(__entry->new_lsn),
+		  CYCLE_LSN(__entry->old_lsn), BLOCK_LSN(__entry->old_lsn),
+		  CYCLE_LSN(__entry->last_sync_lsn), BLOCK_LSN(__entry->last_sync_lsn))
+)
 
 DECLARE_EVENT_CLASS(xfs_file_class,
 	TP_PROTO(struct xfs_inode *ip, size_t count, loff_t offset, int flags),
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 4b47cfe..a728735 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -659,11 +659,13 @@ xfs_trans_ail_update_bulk(
 			if (XFS_LSN_CMP(lsn, lip->li_lsn) <= 0)
 				continue;
 
+			trace_xfs_ail_move(lip, lip->li_lsn, lsn);
 			xfs_ail_delete(ailp, lip);
 			if (mlip == lip)
 				mlip_changed = 1;
 		} else {
 			lip->li_flags |= XFS_LI_IN_AIL;
+			trace_xfs_ail_insert(lip, 0, lsn);
 		}
 		lip->li_lsn = lsn;
 		list_add(&lip->li_ail, &tmp);
@@ -732,6 +734,7 @@ xfs_trans_ail_delete_bulk(
 			return;
 		}
 
+		trace_xfs_ail_delete(lip, mlip->li_lsn, lip->li_lsn);
 		xfs_ail_delete(ailp, lip);
 		lip->li_flags &= ~XFS_LI_IN_AIL;
 		lip->li_lsn = 0;
-- 
1.8.4.rc3


* [PATCH 4/5] xfs: add tracepoints to AGF/AGI read operations
  2013-11-01  4:27 [PATCH 0/5] xfs: more patches for 3.13 Dave Chinner
                   ` (2 preceding siblings ...)
  2013-11-01  4:27 ` [PATCH 3/5] xfs: trace AIL manipulations Dave Chinner
@ 2013-11-01  4:27 ` Dave Chinner
  2013-11-05 16:42   ` Christoph Hellwig
  2013-11-01  4:27 ` [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems Dave Chinner
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2013-11-01  4:27 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

To help track down AGI/AGF lock ordering issues, I added these
tracepoints to tell us when an AGI or AGF is read and locked.  With
these we can now determine if the lock ordering goes wrong from
tracing captures.
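
With both events enabled, a correct capture shows the AGI read before
the AGF read for the same AG; the reverse sequence within a single
transaction is the inversion we are looking for. Using the output
format of the TP_printk below (dev/agno values illustrative only):

	xfs_read_agi: dev 253:16 agno 3
	xfs_read_agf: dev 253:16 agno 3      <- AGI before AGF: correct
	xfs_read_agf: dev 253:16 agno 5
	xfs_read_agi: dev 253:16 agno 5      <- AGF before AGI: inversion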

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_alloc.c  |  5 ++++-
 fs/xfs/xfs_ialloc.c |  6 +++++-
 fs/xfs/xfs_trace.h  | 25 +++++++++++++++++++++++++
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
index bcf1652..9eab2df 100644
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -2294,6 +2294,8 @@ xfs_read_agf(
 {
 	int		error;
 
+	trace_xfs_read_agf(mp, agno);
+
 	ASSERT(agno != NULLAGNUMBER);
 	error = xfs_trans_read_buf(
 			mp, tp, mp->m_ddev_targp,
@@ -2324,8 +2326,9 @@ xfs_alloc_read_agf(
 	struct xfs_perag	*pag;		/* per allocation group data */
 	int			error;
 
-	ASSERT(agno != NULLAGNUMBER);
+	trace_xfs_alloc_read_agf(mp, agno);
 
+	ASSERT(agno != NULLAGNUMBER);
 	error = xfs_read_agf(mp, tp, agno,
 			(flags & XFS_ALLOC_FLAG_TRYLOCK) ? XBF_TRYLOCK : 0,
 			bpp);
diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index 14d732f..e87719c 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -40,6 +40,7 @@
 #include "xfs_icreate_item.h"
 #include "xfs_icache.h"
 #include "xfs_dinode.h"
+#include "xfs_trace.h"
 
 
 /*
@@ -1627,8 +1628,9 @@ xfs_read_agi(
 {
 	int			error;
 
-	ASSERT(agno != NULLAGNUMBER);
+	trace_xfs_read_agi(mp, agno);
 
+	ASSERT(agno != NULLAGNUMBER);
 	error = xfs_trans_read_buf(mp, tp, mp->m_ddev_targp,
 			XFS_AG_DADDR(mp, agno, XFS_AGI_DADDR(mp)),
 			XFS_FSS_TO_BB(mp, 1), 0, bpp, &xfs_agi_buf_ops);
@@ -1651,6 +1653,8 @@ xfs_ialloc_read_agi(
 	struct xfs_perag	*pag;	/* per allocation group data */
 	int			error;
 
+	trace_xfs_ialloc_read_agi(mp, agno);
+
 	error = xfs_read_agi(mp, tp, agno, bpp);
 	if (error)
 		return error;
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index f195476..425dfa4 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -135,6 +135,31 @@ DEFINE_PERAG_REF_EVENT(xfs_perag_clear_reclaim);
 DEFINE_PERAG_REF_EVENT(xfs_perag_set_eofblocks);
 DEFINE_PERAG_REF_EVENT(xfs_perag_clear_eofblocks);
 
+DECLARE_EVENT_CLASS(xfs_ag_class,
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno),
+	TP_ARGS(mp, agno),
+	TP_STRUCT__entry(
+		__field(dev_t, dev)
+		__field(xfs_agnumber_t, agno)
+	),
+	TP_fast_assign(
+		__entry->dev = mp->m_super->s_dev;
+		__entry->agno = agno;
+	),
+	TP_printk("dev %d:%d agno %u",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->agno)
+);
+#define DEFINE_AG_EVENT(name)	\
+DEFINE_EVENT(xfs_ag_class, name,	\
+	TP_PROTO(struct xfs_mount *mp, xfs_agnumber_t agno),	\
+	TP_ARGS(mp, agno))
+
+DEFINE_AG_EVENT(xfs_read_agf);
+DEFINE_AG_EVENT(xfs_alloc_read_agf);
+DEFINE_AG_EVENT(xfs_read_agi);
+DEFINE_AG_EVENT(xfs_ialloc_read_agi);
+
 TRACE_EVENT(xfs_attr_list_node_descend,
 	TP_PROTO(struct xfs_attr_list_context *ctx,
 		 struct xfs_da_node_entry *btree),
-- 
1.8.4.rc3


* [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems
  2013-11-01  4:27 [PATCH 0/5] xfs: more patches for 3.13 Dave Chinner
                   ` (3 preceding siblings ...)
  2013-11-01  4:27 ` [PATCH 4/5] xfs: add tracepoints to AGF/AGI read operations Dave Chinner
@ 2013-11-01  4:27 ` Dave Chinner
  2013-11-05 16:43   ` Christoph Hellwig
                     ` (2 more replies)
  2013-11-06 23:01 ` [PATCH 0/5] xfs: more patches for 3.13 Ben Myers
  2013-11-18 20:30 ` Ben Myers
  6 siblings, 3 replies; 30+ messages in thread
From: Dave Chinner @ 2013-11-01  4:27 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

v5 filesystems use 512 byte inodes as a minimum, so read inodes in
clusters that are effectively half the size of a v4 filesystem with
256 byte inodes. For v5 filesystems, scale the inode cluster size
with the size of the inode so that we keep a constant 32 inodes per
cluster ratio for all inode IO.

This only works if mkfs.xfs sets the inode alignment appropriately
for larger inode clusters, so this functionality is made conditional
on mkfs doing the right thing. xfs_repair needs to know about
the inode alignment changes, too.

Wall time:
	create	bulkstat	find+stat	ls -R	unlink
v4	237s	161s		173s		201s	299s
v5	235s	163s		205s		 31s	356s
patched	234s	160s		182s		 29s	317s

System time:
	create	bulkstat	find+stat	ls -R	unlink
v4	2601s	2490s		1653s		1656s	2960s
v5	2637s	2497s		1681s		  20s	3216s
patched	2613s	2451s		1658s		  20s	3007s

So, wall time same or down across the board, system time same or
down across the board, and cache hit rates all improve except for
the ls -R case which is a pure cold cache directory read workload
on v5 filesystems...

So, this patch removes most of the performance and CPU usage
differential between v4 and v5 filesystems on traversal related
workloads.

Note: while this patch is currently for v5 filesystems only, there
is no reason it can't be ported back to v4 filesystems.  This hasn't
been done here because bringing the code back to v4 requires
forwards and backwards kernel compatibility testing.  i.e. to
determine if older kernels(*) do the right thing with larger inode
alignments but still only using 8k inode cluster sizes. None of this
testing and validation on v4 filesystems has been done, so for the
moment larger inode clusters are limited to v5 superblocks.

(*) a current default config v4 filesystem should mount just fine on
2.6.23 (when lazy-count support was introduced), and so if we change
the alignment emitted by mkfs without a feature bit then we have to
make sure it works properly on all kernels since 2.6.23. And if we
allow it to be changed when the lazy-count bit is not set, then it's
all kernels since v2 logs were introduced that need to be tested for
compatibility...
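
As a worked example of the scaling in the patch below (assuming the
existing 8192 byte XFS_INODE_BIG_CLUSTER_SIZE base and the 256 byte
XFS_DINODE_MIN_SIZE), a v5 filesystem with 512 byte inodes gets:

	new_size = 8192 * (512 / 256)        = 16384 bytes (32 inodes)
	4k blocks: XFS_B_TO_FSBT(mp, 16384)  = 4,  needs sb_inoalignmt >= 4
	1k blocks: XFS_B_TO_FSBT(mp, 16384)  = 16, needs sb_inoalignmt >= 16

If mkfs did not set the alignment that high, the cluster size simply
stays at the old 8k value.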

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_mount.c      | 15 +++++++++++++++
 fs/xfs/xfs_mount.h      |  2 +-
 fs/xfs/xfs_trans_resv.c |  3 +--
 3 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index da88f16..02df7b4 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -41,6 +41,7 @@
 #include "xfs_fsops.h"
 #include "xfs_trace.h"
 #include "xfs_icache.h"
+#include "xfs_dinode.h"
 
 
 #ifdef HAVE_PERCPU_SB
@@ -718,8 +719,22 @@ xfs_mountfs(
 	 * Set the inode cluster size.
 	 * This may still be overridden by the file system
 	 * block size if it is larger than the chosen cluster size.
+	 *
+	 * For v5 filesystems, scale the cluster size with the inode size to
+	 * keep a constant ratio of inode per cluster buffer, but only if mkfs
+	 * has set the inode alignment value appropriately for larger cluster
+	 * sizes.
 	 */
 	mp->m_inode_cluster_size = XFS_INODE_BIG_CLUSTER_SIZE;
+	if (xfs_sb_version_hascrc(&mp->m_sb)) {
+		int	new_size = mp->m_inode_cluster_size;
+
+		new_size *= mp->m_sb.sb_inodesize / XFS_DINODE_MIN_SIZE;
+		if (mp->m_sb.sb_inoalignmt >= XFS_B_TO_FSBT(mp, new_size))
+			mp->m_inode_cluster_size = new_size;
+		xfs_info(mp, "Using inode cluster size of %d bytes",
+			 mp->m_inode_cluster_size);
+	}
 
 	/*
 	 * Set inode alignment fields
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 1d8101a..a466c5e 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -112,7 +112,7 @@ typedef struct xfs_mount {
 	__uint8_t		m_blkbb_log;	/* blocklog - BBSHIFT */
 	__uint8_t		m_agno_log;	/* log #ag's */
 	__uint8_t		m_agino_log;	/* #bits for agino in inum */
-	__uint16_t		m_inode_cluster_size;/* min inode buf size */
+	uint			m_inode_cluster_size;/* min inode buf size */
 	uint			m_blockmask;	/* sb_blocksize-1 */
 	uint			m_blockwsize;	/* sb_blocksize in words */
 	uint			m_blockwmask;	/* blockwsize-1 */
diff --git a/fs/xfs/xfs_trans_resv.c b/fs/xfs/xfs_trans_resv.c
index d53d9f0..2fd59c0 100644
--- a/fs/xfs/xfs_trans_resv.c
+++ b/fs/xfs/xfs_trans_resv.c
@@ -385,8 +385,7 @@ xfs_calc_ifree_reservation(
 		xfs_calc_inode_res(mp, 1) +
 		xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
 		xfs_calc_buf_res(1, XFS_FSB_TO_B(mp, 1)) +
-		MAX((__uint16_t)XFS_FSB_TO_B(mp, 1),
-		    XFS_INODE_CLUSTER_SIZE(mp)) +
+		max_t(uint, XFS_FSB_TO_B(mp, 1), XFS_INODE_CLUSTER_SIZE(mp)) +
 		xfs_calc_buf_res(1, 0) +
 		xfs_calc_buf_res(2 + XFS_IALLOC_BLOCKS(mp) +
 				 mp->m_in_maxlevels, 0) +
-- 
1.8.4.rc3


* Re: [PATCH 2/5] xfs: open code inc_inode_iversion when logging an inode
  2013-11-01  4:27 ` [PATCH 2/5] xfs: open code inc_inode_iversion when logging an inode Dave Chinner
@ 2013-11-05 16:41   ` Christoph Hellwig
  2013-11-18 21:54   ` Eric Sandeen
  1 sibling, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2013-11-05 16:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks good to me.  I really wish we could remove i_version from the VFS
inode and just leave it to the fs, though.

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/5] xfs: trace AIL manipulations
  2013-11-01  4:27 ` [PATCH 3/5] xfs: trace AIL manipulations Dave Chinner
@ 2013-11-05 16:41   ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2013-11-05 16:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs


Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 4/5] xfs: add tracepoints to AGF/AGI read operations
  2013-11-01  4:27 ` [PATCH 4/5] xfs: add tracepoints to AGF/AGI read operations Dave Chinner
@ 2013-11-05 16:42   ` Christoph Hellwig
  0 siblings, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2013-11-05 16:42 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems
  2013-11-01  4:27 ` [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems Dave Chinner
@ 2013-11-05 16:43   ` Christoph Hellwig
  2013-11-05 19:56     ` Dave Chinner
  2013-11-08 18:21   ` Eric Sandeen
  2013-11-14 18:51   ` Eric Sandeen
  2 siblings, 1 reply; 30+ messages in thread
From: Christoph Hellwig @ 2013-11-05 16:43 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> +		int	new_size = mp->m_inode_cluster_size;
> +
> +		new_size *= mp->m_sb.sb_inodesize / XFS_DINODE_MIN_SIZE;
> +		if (mp->m_sb.sb_inoalignmt >= XFS_B_TO_FSBT(mp, new_size))
> +			mp->m_inode_cluster_size = new_size;
> +		xfs_info(mp, "Using inode cluster size of %d bytes",
> +			 mp->m_inode_cluster_size);

printing this on every mount seem a bit too verbose.


* Re: [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems
  2013-11-05 16:43   ` Christoph Hellwig
@ 2013-11-05 19:56     ` Dave Chinner
  2013-11-06 21:31       ` Ben Myers
  2013-11-12 17:33       ` Christoph Hellwig
  0 siblings, 2 replies; 30+ messages in thread
From: Dave Chinner @ 2013-11-05 19:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Tue, Nov 05, 2013 at 08:43:07AM -0800, Christoph Hellwig wrote:
> > +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> > +		int	new_size = mp->m_inode_cluster_size;
> > +
> > +		new_size *= mp->m_sb.sb_inodesize / XFS_DINODE_MIN_SIZE;
> > +		if (mp->m_sb.sb_inoalignmt >= XFS_B_TO_FSBT(mp, new_size))
> > +			mp->m_inode_cluster_size = new_size;
> > +		xfs_info(mp, "Using inode cluster size of %d bytes",
> > +			 mp->m_inode_cluster_size);
> 
> printing this on every mount seem a bit too verbose.

I'd like to leave it there until we remove the experimental tag from
the v5 superblock configuration, as there is no good way of
determining that someone is using a mkfs patched to enable this
feature yet...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems
  2013-11-05 19:56     ` Dave Chinner
@ 2013-11-06 21:31       ` Ben Myers
  2013-11-07  0:32         ` Dave Chinner
  2013-11-12 17:33       ` Christoph Hellwig
  1 sibling, 1 reply; 30+ messages in thread
From: Ben Myers @ 2013-11-06 21:31 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

Hi Dave,

On Wed, Nov 06, 2013 at 06:56:50AM +1100, Dave Chinner wrote:
> On Tue, Nov 05, 2013 at 08:43:07AM -0800, Christoph Hellwig wrote:
> > > +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> > > +		int	new_size = mp->m_inode_cluster_size;
> > > +
> > > +		new_size *= mp->m_sb.sb_inodesize / XFS_DINODE_MIN_SIZE;
> > > +		if (mp->m_sb.sb_inoalignmt >= XFS_B_TO_FSBT(mp, new_size))
> > > +			mp->m_inode_cluster_size = new_size;
> > > +		xfs_info(mp, "Using inode cluster size of %d bytes",
> > > +			 mp->m_inode_cluster_size);
> > 
> > printing this on every mount seem a bit too verbose.
> 
> I'd like to leave it there until we remove the experimental tag from
> the v5 superblock configuration, as there is no good way of
> determining that someone is using a mkfs patched to enable this
> feature yet...

I don't think I have a problem with bumping up the inode cluster size, but I am
a little concerned about two aspects of this patch:

1) Backward compatibility issue with earlier v5 filesystems that don't support
the larger inode cluster.  I know it's experimental, but what do those failures
look like?  This strikes me as being kind of scary.

2) I don't like to overload the inode alignment mkfs option to do this.  I
think it would be better if we explicitly set the inode cluster size at mkfs
time.

Or maybe this one should have an incompatibility bit.  Maybe it doesn't need to
be a separate mkfs option.

-Ben


* Re: [PATCH 0/5] xfs: more patches for 3.13
  2013-11-01  4:27 [PATCH 0/5] xfs: more patches for 3.13 Dave Chinner
                   ` (4 preceding siblings ...)
  2013-11-01  4:27 ` [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems Dave Chinner
@ 2013-11-06 23:01 ` Ben Myers
  2013-11-07  1:57   ` Dave Chinner
  2013-11-18 20:30 ` Ben Myers
  6 siblings, 1 reply; 30+ messages in thread
From: Ben Myers @ 2013-11-06 23:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, Nov 01, 2013 at 03:27:15PM +1100, Dave Chinner wrote:
> Hi folks,
> 
> The following series follows up the recently committed series of
> patches for 3.13. The first two patches are the remaining
> uncommitted patches from the previous series.
> 
> The next two patches are tracing patches, one for AIL manipulations
> and the other for AGF and AGI read operations. Both of these were
> written during recent debugging sessions, and both proved useful so
> should be added to the menagerie of tracepoints we already have
> available.
> 
> The final patch increases the inode cluster size for v5
> filesystems. I'd like to get this into v5 filesystems for 3.13 so we
> get wider exposure of it ASAP and have more data available to make
> informed decisions about how to bring this back to v4 filesystems in
> a safe and controlled manner.

Applied 3 and 4.  I still don't understand why the locking on patch 2 is
correct.  Seems like the readers of i_version hold different locks than we do
when we log the inode.  Maybe Christoph can help me with that.

Thanks,
	Ben


* Re: [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems
  2013-11-06 21:31       ` Ben Myers
@ 2013-11-07  0:32         ` Dave Chinner
  0 siblings, 0 replies; 30+ messages in thread
From: Dave Chinner @ 2013-11-07  0:32 UTC (permalink / raw)
  To: Ben Myers; +Cc: Christoph Hellwig, xfs

On Wed, Nov 06, 2013 at 03:31:59PM -0600, Ben Myers wrote:
> Hi Dave,
> 
> On Wed, Nov 06, 2013 at 06:56:50AM +1100, Dave Chinner wrote:
> > On Tue, Nov 05, 2013 at 08:43:07AM -0800, Christoph Hellwig wrote:
> > > > +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> > > > +		int	new_size = mp->m_inode_cluster_size;
> > > > +
> > > > +		new_size *= mp->m_sb.sb_inodesize / XFS_DINODE_MIN_SIZE;
> > > > +		if (mp->m_sb.sb_inoalignmt >= XFS_B_TO_FSBT(mp, new_size))
> > > > +			mp->m_inode_cluster_size = new_size;
> > > > +		xfs_info(mp, "Using inode cluster size of %d bytes",
> > > > +			 mp->m_inode_cluster_size);
> > > 
> > > printing this on every mount seem a bit too verbose.
> > 
> > I'd like to leave it there until we remove the experimental tag from
> > the v5 superblock configuration, as there is no good way of
> > determining that someone is using a mkfs patched to enable this
> > feature yet...
> 
> I don't think I have a problem with bumping up the inode cluster size, but I am
> a little concerned about two aspects of this patch:
> 
> 1) Backward compatibility issue with earlier v5 filesystems that don't support
> the larger inode cluster.  I know it's experimental, but what do those failures
> look like?  This strikes me as being kind of scary.

There shouldn't be any failures - it should just work.

That is, larger alignment on older kernels will only affect
allocation alignment of new chunks, not the cluster size, and so
everything should work just fine because it's inode chunk alignment
that matters for larger inode clusters to work....

Also, log recovery knows about the fact that inode clusters could
potentially change size from mount to mount and handles such cases
appropriately e.g. see the comment in xlog_recover_buffer_pass2().

Hence there should be no problems at all.

If there are, discovering flaws in my understanding of how inode
alignment and clusters interact is part of the reason I have not
enabled this feature on v4 superblocks (as the original commit
explained).
If we discover one, then the worst case is that we change mkfs back
to setting sb_inoalign = 2 for the xfsprogs 3.2.0 release and this
code in the kernel becomes a no-op...


> 2) I don't like to overload the inode alignment mkfs option to do this.  I
> think it would be better if we explicitly set the inode cluster size at mkfs
> time.

Overload? Not at all - inode alignment was introduced back in 1996
specifically to alleviate performance issues with mapping inode
numbers to cluster buffers. The two have been
intimately related for a long time, and you can't use larger inode
clusters without first adjusting the inode alignment to support
those larger cluster sizes.

FWIW, the kernel is free to use whatever cluster size it likes as
clusters are an in-memory construct - the kernel can do inode IO in
single blocks if it wants to or needs to. However, it can't do
correct inode number to cluster offset calculations if the inode
chunk block number is not appropriately aligned for the size of the
cluster it wants to use.

Originally (1995), clusters were 4k:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=3f6e8796aa0c6511a6c33035ab75a842e27ab8da

months later, 8k:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=94c9a3cfa5c94ca002dfb3476b321422337e7637

But we don't have the history of mkfs available to tell us what
inode alignment was used at the time.

Then when the various Irix trees got merged into a combined tree a
few months later, both 4k and 8k clusters were supported:

http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=978576245cd1d77c157ae39e4a569e7b88c3a75b
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=7c9131fe8bd196ed4dda6af83d62dde87486b205
http://oss.sgi.com/cgi-bin/gitweb.cgi?p=archive/xfs-import.git;a=commitdiff;h=e50cfe641aa9555e8446318c8db865671b284a5d

And so you can see that the cluster size was chosen based on the
amount of RAM the system had. However, because mkfs never set an
alignment of more than 2 blocks, the cluster size couldn't be
increased to be any larger....

> Or maybe this one should have an incompatibility bit.  Maybe it doesn't need to
> be a separate mkfs option.

It shouldn't need an incompatibility bit, nor should it be a
separate mkfs option for v5 filesystems. Whether either are
necessary for v4 filesystems is a separate discussion.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 0/5] xfs: more patches for 3.13
  2013-11-06 23:01 ` [PATCH 0/5] xfs: more patches for 3.13 Ben Myers
@ 2013-11-07  1:57   ` Dave Chinner
  2013-11-13  1:16     ` Eric Sandeen
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2013-11-07  1:57 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Wed, Nov 06, 2013 at 05:01:33PM -0600, Ben Myers wrote:
> On Fri, Nov 01, 2013 at 03:27:15PM +1100, Dave Chinner wrote:
> > Hi folks,
> > 
> > The following series follows up the recently committed series of
> > patches for 3.13. The first two patches are the remaining
> > uncommitted patches from the previous series.
> > 
> > The next two patches are tracing patches, one for AIL manipulations
> > and the other for AGF and AGI read operations. Both of these were
> > written during recent debugging sessions, and both proved useful so
> > should be added to the menagerie of tracepoints we already have
> > available.
> > 
> > The final patch increases the inode cluster size for v5
> > filesystems. I'd like to get this into v5 filesystems for 3.13 so we
> > get wider exposure of it ASAP and have more data available to make
> > informed decisions about how to bring this back to v4 filesystems in
> > a safe and controlled manner.
> 
> Applied 3 and 4.  I still don't understand why the locking on patch 2 is
> correct.  Seems like the readers of i_version hold different locks than we do
> when we log the inode.  Maybe Christoph can help me with that.

Readers don't need to hold a spinlock, and many don't. The spinlock
is only there to prevent concurrent updates from "losing" an update
due to races.  All modifications to XFS inodes occur via
transactions, inodes are locked exclusively in transactions and
hence we will never lose i_version updates due to races. Hence we
don't need the spinlock during the update, either.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems
  2013-11-01  4:27 ` [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems Dave Chinner
  2013-11-05 16:43   ` Christoph Hellwig
@ 2013-11-08 18:21   ` Eric Sandeen
  2013-11-11 22:45     ` Dave Chinner
  2013-11-14 18:51   ` Eric Sandeen
  2 siblings, 1 reply; 30+ messages in thread
From: Eric Sandeen @ 2013-11-08 18:21 UTC (permalink / raw)
  To: Dave Chinner, xfs

On 10/31/13, 11:27 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> v5 filesystems use 512 byte inodes as a minimum, so read inodes in

I think you meant :

"..., so today we read inodes in clusters that are..."
         ^^^^^^^^

- changes the meaning quite a bit.  :)

> clusters that are effectively half the size of a v4 filesystem with
> 256 byte inodes. For v5 fielsystems, scale the inode cluster size
> with the size of the inode so that we keep a constant 32 inodes per
> cluster ratio for all inode IO.
> 
> This only works if mkfs.xfs sets the inode alignment appropriately
> for larger inode clusters, so this functionality is made conditional
> on mkfs doing the right thing. xfs_repair needs to know about
> the inode alignment changes, too.
> 
> Wall time:
> 	create	bulkstat	find+stat	ls -R	unlink
> v4	237s	161s		173s		201s	299s
> v5	235s	163s		205s		 31s	356s
> patched	234s	160s		182s		 29s	317s
> 
> System time:
> 	create	bulkstat	find+stat	ls -R	unlink
> v4	2601s	2490s		1653s		1656s	2960s
> v5	2637s	2497s		1681s		  20s	3216s
> patched	2613s	2451s		1658s		  20s	3007s
> 
> So, wall time same or down across the board, system time same or
> down across the board, and cache hit rates all improve except for
> the ls -R case which is a pure cold cache directory read workload
> on v5 filesystems...
> 
> So, this patch removes most of the performance and CPU usage
> differential between v4 and v5 filesystems on traversal related
> workloads.

We already have some issues w/ larger inode clusters & freespace
fragmentation resulting in the inability to create new clusters,
right?

How might this impact that problem?  I understand that it may be
a tradeoff.

> Note: while this patch is currently for v5 filesystems only, there
> is no reason it can't be ported back to v4 filesystems.  This hasn't
> been done here because bringing the code back to v4 requires
> forwards and backwards kernel compatibility testing.  i.e. to
> determine if older kernels(*) do the right thing with larger inode
> alignments but still only using 8k inode cluster sizes. None of this
> testing and validation on v4 filesystems has been done, so for the
> moment larger inode clusters are limited to v5 superblocks.

Thanks for that explanation.

> (*) a current default config v4 filesystem should mount just fine on
> 2.6.23 (when lazy-count support was introduced), and so if we change
> the alignment emitted by mkfs without a feature bit then we have to
> make sure it works properly on all kernels since 2.6.23. And if we
> allow it to be changed when the lazy-count bit is not set, then it's
> all kernels since v2 logs were introduced that need to be tested for
> compatibility...
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_mount.c      | 15 +++++++++++++++
>  fs/xfs/xfs_mount.h      |  2 +-
>  fs/xfs/xfs_trans_resv.c |  3 +--
>  3 files changed, 17 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
> index da88f16..02df7b4 100644
> --- a/fs/xfs/xfs_mount.c
> +++ b/fs/xfs/xfs_mount.c
> @@ -41,6 +41,7 @@
>  #include "xfs_fsops.h"
>  #include "xfs_trace.h"
>  #include "xfs_icache.h"
> +#include "xfs_dinode.h"
>  
>  
>  #ifdef HAVE_PERCPU_SB
> @@ -718,8 +719,22 @@ xfs_mountfs(
>  	 * Set the inode cluster size.
>  	 * This may still be overridden by the file system
>  	 * block size if it is larger than the chosen cluster size.
> +	 *
> +	 * For v5 filesystems, scale the cluster size with the inode size to
> +	 * keep a constant ratio of inode per cluster buffer, but only if mkfs
> +	 * has set the inode alignment value appropriately for larger cluster
> +	 * sizes.
>  	 */
>  	mp->m_inode_cluster_size = XFS_INODE_BIG_CLUSTER_SIZE;

Just thinking out loud here: So this is runtime only; nothing on disk sets
the cluster size explicitly (granted, it never did).

So moving back and forth across newer/older kernels will create clusters 
of different sizes on the same filesystem, right?

(In the very distant past, this same change could have happened if
the amount of memory in a box changed (!) - see commit
425f9ddd534573f58df8e7b633a534fcfc16d44d; prior to that we set
m_inode_cluster_size on the fly as well).

But sb_inoalignmt is a mkfs-set, on-disk feature.  So we might start with
i.e. this, where A1 are 8k alignment points, and 512 byte inodes, in clusters
of size 8k / 16 inodes:

A1           A1           A1           A1           
[ 16 inodes ][ 16 inodes ]             [ 16 inodes ]

and in this case we couldn't bump up m_inode_cluster_size, lest we
allocate a larger cluster on the smaller alignment & overlap:

A1           A1           A1           A1           
[ 16 inodes ][ 16 inodes ]             [ 16 inodes ] <--- existing
                         [        32 inodes        ] <--- new

But if we had a larger (say, 16k) inode alignment, like:  we could
safely do:

A1                        A2                        A2          
[ 16 inodes ]                                       [ 16 inodes ]
                          [        32 inodes       ] <--- new

and be ok.

So the only other thing I wonder about is when we are handling
pre-existing, smaller-than m_inode_cluster_size clusters.

i.e. xfs_ifree_cluster() figures out the number of blocks &
number of inodes in a cluster, based on the (now not
constant) m_inode_cluster_size.

What stops us from going off the end of a smaller cluster?
</handwave>

I'm not up to speed on inode IO TBH, so will dig into it a little more.

Aside from all those vague questions, this seems sane.  ;)

> +	if (xfs_sb_version_hascrc(&mp->m_sb)) {
> +		int	new_size = mp->m_inode_cluster_size;
> +
> +		new_size *= mp->m_sb.sb_inodesize / XFS_DINODE_MIN_SIZE;
> +		if (mp->m_sb.sb_inoalignmt >= XFS_B_TO_FSBT(mp, new_size))
> +			mp->m_inode_cluster_size = new_size;
> +		xfs_info(mp, "Using inode cluster size of %d bytes",
> +			 mp->m_inode_cluster_size);
> +	}
>  
>  	/*
>  	 * Set inode alignment fields

next 2 hunks are just tidying, right.

Thanks,
-Eric

> diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
> index 1d8101a..a466c5e 100644
> --- a/fs/xfs/xfs_mount.h
> +++ b/fs/xfs/xfs_mount.h
> @@ -112,7 +112,7 @@ typedef struct xfs_mount {
>  	__uint8_t		m_blkbb_log;	/* blocklog - BBSHIFT */
>  	__uint8_t		m_agno_log;	/* log #ag's */
>  	__uint8_t		m_agino_log;	/* #bits for agino in inum */
> -	__uint16_t		m_inode_cluster_size;/* min inode buf size */
> +	uint			m_inode_cluster_size;/* min inode buf size */
>  	uint			m_blockmask;	/* sb_blocksize-1 */
>  	uint			m_blockwsize;	/* sb_blocksize in words */
>  	uint			m_blockwmask;	/* blockwsize-1 */
> diff --git a/fs/xfs/xfs_trans_resv.c b/fs/xfs/xfs_trans_resv.c
> index d53d9f0..2fd59c0 100644
> --- a/fs/xfs/xfs_trans_resv.c
> +++ b/fs/xfs/xfs_trans_resv.c
> @@ -385,8 +385,7 @@ xfs_calc_ifree_reservation(
>  		xfs_calc_inode_res(mp, 1) +
>  		xfs_calc_buf_res(2, mp->m_sb.sb_sectsize) +
>  		xfs_calc_buf_res(1, XFS_FSB_TO_B(mp, 1)) +
> -		MAX((__uint16_t)XFS_FSB_TO_B(mp, 1),
> -		    XFS_INODE_CLUSTER_SIZE(mp)) +
> +		max_t(uint, XFS_FSB_TO_B(mp, 1), XFS_INODE_CLUSTER_SIZE(mp)) +
>  		xfs_calc_buf_res(1, 0) +
>  		xfs_calc_buf_res(2 + XFS_IALLOC_BLOCKS(mp) +
>  				 mp->m_in_maxlevels, 0) +
> 


* Re: [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems
  2013-11-08 18:21   ` Eric Sandeen
@ 2013-11-11 22:45     ` Dave Chinner
  2013-11-12  0:24       ` Eric Sandeen
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2013-11-11 22:45 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Fri, Nov 08, 2013 at 12:21:20PM -0600, Eric Sandeen wrote:
> On 10/31/13, 11:27 PM, Dave Chinner wrote:
> > So, this patch removes most of the performance and CPU usage
> > differential between v4 and v5 filesystems on traversal related
> > workloads.
> 
> We already have some issues w/ larger inode clusters & freespace
> fragmentation resulting in the inability to create new clusters,
> right?

Yes, we do.

> How might this impact that problem?  I understand that it may be
> a tradeoff.

It will make the problem slightly worse for workloads that already
suffer from this problem. For those that don't, it should have no
noticeable effect.

> > @@ -718,8 +719,22 @@ xfs_mountfs(
> >  	 * Set the inode cluster size.
> >  	 * This may still be overridden by the file system
> >  	 * block size if it is larger than the chosen cluster size.
> > +	 *
> > +	 * For v5 filesystems, scale the cluster size with the inode size to
> > +	 * keep a constant ratio of inode per cluster buffer, but only if mkfs
> > +	 * has set the inode alignment value appropriately for larger cluster
> > +	 * sizes.
> >  	 */
> >  	mp->m_inode_cluster_size = XFS_INODE_BIG_CLUSTER_SIZE;
> 
> Just thinking out loud here: So this is runtime only; nothing on disk sets
> the cluster size explicitly (granted, it never did).
> 
> So moving back and forth across newer/older kernels will create clusters 
> of different sizes on the same filesystem, right?

No - inodes are allocated in chunks, not clusters. Inode clusters
are the unit of IO we read and write inodes in.

> (In the very distant past, this same change could have happened if
> the amount of memory in a box changed (!) - see commit
> 425f9ddd534573f58df8e7b633a534fcfc16d44d; prior to that we set
> m_inode_cluster_size on the fly as well).

Right, I think I've already pointed that out.

> But sb_inoalignmt is a mkfs-set, on-disk feature.  So we might start with
> i.e. this, where A1 are 8k alignment points, and 512 byte inodes, in clusters
> of size 8k / 16 inodes:
> 
> A1           A1           A1           A1           
> [ 16 inodes ][ 16 inodes ]             [ 16 inodes ]

Ok, here's where you go wrong. Inode chunks are always 64 inodes,
and so what you have on disk after any inode allocation is:

A1           A1           A1           A1
[ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]

and sb_inoalign determines where A1 lands in terms of filesystem
blocks. With sb_inoalign = 2 and a 4k filesystem block size, you can
only align inode *chunks* to even filesystem blocks like so:

ODD   EVEN   ODD   EVEN   ODD   EVEN   ODD   EVEN   ODD   EVEN
      A1           A1           A1           A1		  A1
      [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]

If we have 1kb filesystem blocks, then the equivalent sb_inoalign
value to give this same inode *chunk* layout is 8:

# mkfs.xfs -f -b size=1024 -i size=512 /dev/vdb
.....
# xfs_db -c "sb 0" -c "p inoalignmt" /dev/vdb
inoalignmt = 8
#

i.e:

45 6 70 1 2 345 6 70 1 2 345 6 70 1 2 345 6 70 1 2 345 6 70 1 2 ...
      A1           A1           A1           A1		  A1
      [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]

And with the larger inode cluster sizes:

# mkfs.xfs -f -b size=1024 -i size=512 -m crc=1 /dev/vdb
.....
# xfs_db -c "sb 0" -c "p inoalignmt" /dev/vdb
inoalignmt = 16
#

And, yes, that's not actually out of the range we commonly test -
512 byte block size with 256 byte inodes:

# mkfs.xfs -f -b size=512 /dev/vdb
.....
# xfs_db -c "sb 0" -c "p inoalignmt" /dev/vdb
inoalignmt = 16
#

So we definitely already handle and test these inode alignment
configurations all the time....

> and in this case we couldn't bump up m_inode_cluster_size, lest we
> allocate a larger cluster on the smaller alignment & overlap:
> 
> A1           A1           A1           A1           
> [ 16 inodes ][ 16 inodes ]             [ 16 inodes ] <--- existing
>                          [        32 inodes        ] <--- new

To be able to bump up the inode cluster size, what we have to
guarantee is that the inode chunks align to the the larger cluster
size like so:

A2                        A2
A1           A1           A1           A1
[ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ] <--- existing
[        32 inodes       ][       32 inodes        ] <--- new

i.e. inode chunk allocation needs to be aligned to A2, not A1 for
the correct alignment of the larger clusters.

If we align to A1, then this will happen:

A2                        A2                        A2
A1           A1           A1           A1           A1
             [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]
[        32 inodes       ][       32 inodes        ] <--- new

And that is clearly broken. Hence, to ensure we can use larger inode
clusters, we have to ensure that the inode chunks are aligned
appropriately for those cluster sizes. If the chunks are
appropriately aligned for larger inode clusters (e.g. sb_inoalign =
4), then they are also appropriately aligned for inode cluster sizes
older kernels support.

> So the only other thing I wonder about is when we are handling
> pre-existing, smaller-than m_inode_cluster_size clusters.
> 
> i.e. xfs_ifree_cluster() figures out the number of blocks &
> number of inodes in a cluster, based on the (now not
> constant) m_inode_cluster_size.
> 
> What stops us from going off the end of a smaller cluster?

The fact that we calculate the number of inodes to process per
cluster based on the size of the cluster buffer (in blocks)
multiplied by the number of inodes per block. If the code didn't
work, we'd have found out a long time ago ;)
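
i.e. something like the following (a sketch of that calculation, not
a verbatim quote of xfs_ifree_cluster()):

	blks_per_cluster = 1;
	ninodes = mp->m_sb.sb_inopblock;
	if (mp->m_inode_cluster_size > mp->m_sb.sb_blocksize) {
		blks_per_cluster = mp->m_inode_cluster_size /
				   mp->m_sb.sb_blocksize;
		ninodes = blks_per_cluster * mp->m_sb.sb_inopblock;
	}

ninodes is always derived from the size of the cluster buffer being
processed, so the loop can never run off the end of it.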

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems
  2013-11-11 22:45     ` Dave Chinner
@ 2013-11-12  0:24       ` Eric Sandeen
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Sandeen @ 2013-11-12  0:24 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 11/11/13, 4:45 PM, Dave Chinner wrote:
> On Fri, Nov 08, 2013 at 12:21:20PM -0600, Eric Sandeen wrote:
>> On 10/31/13, 11:27 PM, Dave Chinner wrote:
>>> So, this patch removes most of the performance and CPU usage
>>> differential between v4 and v5 filesystems on traversal related
>>> workloads.

...

>> Just thinking out loud here: So this is runtime only; nothing on disk sets
>> the cluster size explicitly (granted, it never did).
>>
>> So moving back and forth across newer/older kernels will create clusters 
>> of different sizes on the same filesystem, right?
> 
> No - inodes are allocated in chunks, not clusters. Inode clusters
> are the unit of IO we read and write inodes in.

Hohum, I confused that fundamental difference.  Sorry.  Kind of
invalidates my other questions, doesn't it.

>> (In the very distant past, this same change could have happened if
>> the amount of memory in a box changed (!) - see commit
>> 425f9ddd534573f58df8e7b633a534fcfc16d44d; prior to that we set
>> m_inode_cluster_size on the fly as well).
> 
> Right, I think I've already pointed that out.
> 
>> But sb_inoalignmt is a mkfs-set, on-disk feature.  So we might start with
>> i.e. this, where A1 are 8k alignment points, and 512 byte inodes, in clusters
>> of size 8k / 16 inodes:
>>
>> A1           A1           A1           A1           
>> [ 16 inodes ][ 16 inodes ]             [ 16 inodes ]
> 
> Ok, here's where you go wrong. Inode chunks are always 64 inodes,
> and so what you have on disk after any inode allocation is:
> 
> A1           A1           A1           A1
> [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]
> 
> and sb_inoalign determines where A1 lands in terms of filesystem
> blocks. With sb_inoalign = 2 and a 4k filesystem block size, you can
> only align inode *chunks* to even filesystem blocks like so:
> 
> ODD   EVEN   ODD   EVEN   ODD   EVEN   ODD   EVEN   ODD   EVEN
>       A1           A1           A1           A1           A1
>       [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]
> 

<snip a few examples>

> To be able to bump up the inode cluster size, what we have to
> guarantee is that the inode chunks align to the larger cluster
> size like so:
> 
> A2                        A2
> A1           A1           A1           A1
> [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ] <--- existing
> [        32 inodes       ][       32 inodes        ] <--- new
> 
> i.e. inode chunk allocation needs to be aligned to A2, not A1 for
> the correct alignment of the larger clusters.
> 
> If we align to A1, then this will happen:
> 
> A2                        A2                        A2
> A1           A1           A1           A1           A1
>              [ 16 inodes ][ 16 inodes ][ 16 inodes ][ 16 inodes ]
> [        32 inodes       ][       32 inodes        ] <--- new
> 
> And that is clearly broken. Hence, to ensure we can use larger inode
> clusters, we have to ensure that the inode chunks are aligned
> appropriately for those cluster sizes. If the chunks are
> appropriately aligned for larger inode clusters (e.g. sb_inoalign =
> 4), then they are also appropriately aligned for inode cluster sizes
> older kernels support.

*nod*  Ok.

>> So the only other thing I wonder about is when we are handling
>> pre-existing, smaller-than m_inode_cluster_size clusters.
>>
>> i.e. xfs_ifree_cluster() figures out the number of blocks &
>> number of inodes in a cluster, based on the (now not
>> constant) m_inode_cluster_size.
>>
>> What stops us from going off the end of a smaller cluster?
> 
> The fact that we calculate the number of inodes to process per
> cluster based on the size of the cluster buffer (in blocks)
> multiplied by the number of inodes per block. If the code didn't
> work, we'd have found out a long time ago ;)

And I was rather stupidly confusing clusters & chunks, so never mind...

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems
  2013-11-05 19:56     ` Dave Chinner
  2013-11-06 21:31       ` Ben Myers
@ 2013-11-12 17:33       ` Christoph Hellwig
  1 sibling, 0 replies; 30+ messages in thread
From: Christoph Hellwig @ 2013-11-12 17:33 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Wed, Nov 06, 2013 at 06:56:50AM +1100, Dave Chinner wrote:
> > printing this on every mount seems a bit too verbose.
> 
> I'd like to leave it there until we remove the experimental tag from
> the v5 superblock configuration, as there is no good way of
> determining that someone is using a mkfs patched to enable this
> feature yet...

I tend to disagree, but given that it's not actively harmful I'm fine
with merging it as-is.

Reviewed-by: Christoph Hellwig <hch@lst.de>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] xfs: more patches for 3.13
  2013-11-07  1:57   ` Dave Chinner
@ 2013-11-13  1:16     ` Eric Sandeen
  2013-11-14  1:16       ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Sandeen @ 2013-11-13  1:16 UTC (permalink / raw)
  To: Dave Chinner, Ben Myers; +Cc: xfs

On 11/6/13, 7:57 PM, Dave Chinner wrote:
> On Wed, Nov 06, 2013 at 05:01:33PM -0600, Ben Myers wrote:
>> On Fri, Nov 01, 2013 at 03:27:15PM +1100, Dave Chinner wrote:
>>> Hi folks,
>>>
>>> The following series follows up the recently committed series of
>>> patches for 3.13. The first two patches are the remaining
>>> uncommitted patches from the previous series.
>>>
>>> The next two patches are tracing patches, one for AIL manipulations
>>> and the other for AGF and AGI read operations. Both of these were
>>> written during recent debugging sessions, and both proved useful so
>>> should be added to the menagerie of tracepoints we already have
>>> available.
>>>
>>> The final patch is the increasing of the inode cluster size for v5
>>> filesystems. I'd like to get this into v5 filesystems for 3.13 so we
>>> get wider exposure of it ASAP so we have more data available to be
>>> able to make informed decisions about how to bring this back to v4
>>> filesystems in a safe and controlled manner.
>>
>> Applied 3 and 4.  I still don't understand why the locking on patch 2 is
>> correct.  Seems like the readers of i_version hold different locks than we do
>> when we log the inode.  Maybe Christoph can help me with that.
> 
> Readers don't need to hold a spinlock, and many don't. The spinlock
> is only there to prevent concurrent updates from "losing" an update
> due to races.  All modifications to XFS inodes occur via
> transactions, inodes are locked exclusively in transactions and
> hence we will never lose i_version updates due to races. Hence we
> don't need the spinlock during the update, either.

I'm not completely convinced that readers don't need to.  What happens when
we read in the middle of an update?  Especially when a 32-bit box reads the
64-bit value in the middle of an update?

NFS is the only reader we care about (right?)

I see several paths to i_version reads in nfs; so far I'm finding locked reads:

<2 callers of nfs_refresh_inode_locked>
	spin_lock(&inode->i_lock);
	nfs_refresh_inode_locked
		nfs_update_inode
			nfs_wcc_update_inode
				(... && inode->i_version == fattr->pre_change_attr)
		...
		if (inode->i_version != fattr->change_attr) {
		...
		nfs_check_inode_attributes
			(... && inode->i_version != fattr->change_attr)

---

update_changeattr
	spin_lock(&dir->i_lock);
	if (!cinfo->atomic || cinfo->before != dir->i_version)

---

nfs_post_op_update_inode_force_wcc
	 spin_lock(&inode->i_lock);
	 fattr->pre_change_attr = inode->i_version;

---

I haven't audited everything but do you have an example of an unlocked
reader (which is relevant to xfs)?

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] xfs: more patches for 3.13
  2013-11-13  1:16     ` Eric Sandeen
@ 2013-11-14  1:16       ` Dave Chinner
  2013-11-15 17:19         ` Eric Sandeen
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2013-11-14  1:16 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Ben Myers, xfs

On Tue, Nov 12, 2013 at 07:16:03PM -0600, Eric Sandeen wrote:
> On 11/6/13, 7:57 PM, Dave Chinner wrote:
> > On Wed, Nov 06, 2013 at 05:01:33PM -0600, Ben Myers wrote:
> >> On Fri, Nov 01, 2013 at 03:27:15PM +1100, Dave Chinner wrote:
> >>> Hi folks,
> >>>
> >>> The following series follows up the recently committed series of
> >>> patches for 3.13. The first two patches are the remaining
> >>> uncommitted patches from the previous series.
> >>>
> >>> The next two patches are tracing patches, one for AIL manipulations
> >>> and the other for AGF and AGI read operations. Both of these were
> >>> written during recent debugging sessions, and both proved useful so
> >>> should be added to the menagerie of tracepoints we already have
> >>> available.
> >>>
> >>> The final patch is the increasing of the inode cluster size for v5
> >>> filesystems. I'd like to get this into v5 filesystems for 3.13 so we
> >>> get wider exposure of it ASAP so we have more data available to be
> >>> able to make informed decisions about how to bring this back to v4
> >>> filesystems in a safe and controlled manner.
> >>
> >> Applied 3 and 4.  I still don't understand why the locking on patch 2 is
> >> correct.  Seems like the readers of i_version hold different locks than we do
> >> when we log the inode.  Maybe Christoph can help me with that.
> > 
> > Readers don't need to hold a spinlock, and many don't. The spinlock
> > is only there to prevent concurrent updates from "losing" an update
> > due to races.  All modifications to XFS inodes occur via
> > transactions, inodes are locked exclusively in transactions and
> > hence we will never lose i_version updates due to races. Hence we
> > don't need the spinlock during the update, either.
> 
> I'm not completely convinced that readers don't need to.  What happens when
> we read in the middle of an update?  Especially when a 32-bit box reads the
> 64-bit value in the middle of an update?
> 
> NFS is the only reader we care about (right?)
> 
> I see several paths to i_version reads in nfs; so far I'm finding locked reads:
> 
> <2 callers of nfs_refresh_inode_locked>
> 	spin_lock(&inode->i_lock);
> 	nfs_refresh_inode_locked
> 		nfs_update_inode
> 			nfs_wcc_update_inode
> 				(... && inode->i_version == fattr->pre_change_attr)
> 		...
> 		if (inode->i_version != fattr->change_attr) {
> 		...
> 		nfs_check_inode_attributes
> 			(... && inode->i_version != fattr->change_attr)


That's client side, not server side, so that's the NFS client inode
it is locking, not the server side XFS inode.

> ---
> 
> update_changeattr
> 	spin_lock(&dir->i_lock);
> 	if (!cinfo->atomic || cinfo->before != dir->i_version)

Again, client side.

> ---
> 
> nfs_post_op_update_inode_force_wcc
> 	 spin_lock(&inode->i_lock);
> 	 fattr->pre_change_attr = inode->i_version;

Client side.

> 
> ---
> 
> I haven't audited everything but do you have an example of an unlocked
> reader (which is relevant to xfs)?

Server side, where i_version is pulled out of an XFS inode:

$ git grep i_version fs/nfsd
fs/nfsd/nfs3xdr.c:      fhp->fh_post_change = fhp->fh_dentry->d_inode->i_version;
fs/nfsd/nfs4xdr.c:              write64(p, inode->i_version);
fs/nfsd/nfsfh.h:                fhp->fh_pre_change = inode->i_version;
$

the nfsfh.h hit is in fill_pre_wcc(), which appears to be called
under i_mutex but not i_lock. The xdr encoding functions don't
appear to be holding i_lock, and may be holding i_mutex, but I
haven't looked that far.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems
  2013-11-01  4:27 ` [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems Dave Chinner
  2013-11-05 16:43   ` Christoph Hellwig
  2013-11-08 18:21   ` Eric Sandeen
@ 2013-11-14 18:51   ` Eric Sandeen
  2 siblings, 0 replies; 30+ messages in thread
From: Eric Sandeen @ 2013-11-14 18:51 UTC (permalink / raw)
  To: Dave Chinner, xfs

On 10/31/13, 11:27 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> v5 filesystems use 512 byte inodes as a minimum, so read inodes in
> clusters that are effectively half the size of a v4 filesystem with
> 256 byte inodes. For v5 filesystems, scale the inode cluster size
> with the size of the inode so that we keep a constant 32 inodes per
> cluster ratio for all inode IO.

Ok, I'm happy with this now that I was reminded of the difference
between clusters & chunks.  :/

Ben, regarding your compat concern, I agree w/ Dave that there
should be no failures moving forward or back; as he & I mentioned
(I had missed his other reply), the kernel used to set different
cluster sizes based on the memory available in the machine that
mounted the filesystem.
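
For anyone following along, the scaling described in the patch
amounts to roughly this (a sketch with made-up names, not the actual
change): the default 8k cluster holds 32 x 256 byte inodes, so
scaling by (inode size / 256) keeps the 32-inodes-per-cluster ratio.

#define DFL_INODE_CLUSTER_SIZE	8192	/* 32 x 256 byte inodes */
#define MIN_INODE_SIZE		256

static int scaled_cluster_size(int inode_size)
{
	/* 512 byte v5 inodes -> 16384, still 32 inodes per cluster */
	return DFL_INODE_CLUSTER_SIZE * (inode_size / MIN_INODE_SIZE);
}

And per Dave's explanation upthread, the larger size can only be used
when sb_inoalignmt allows the chunks to be aligned to it.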


So:

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] xfs: more patches for 3.13
  2013-11-14  1:16       ` Dave Chinner
@ 2013-11-15 17:19         ` Eric Sandeen
  2013-11-15 17:55           ` Eric Sandeen
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Sandeen @ 2013-11-15 17:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Ben Myers, xfs

On 11/13/13, 7:16 PM, Dave Chinner wrote:

> That's client side, not server side, so that's the NFS client inode
> it is locking, not the server side XFS inode.

Ah, geez, you're right. (x3)

<snip>

> Server side, where i_version is pulled out of an XFS inode:
> 
> $ git grep i_version fs/nfsd
> fs/nfsd/nfs3xdr.c:      fhp->fh_post_change = fhp->fh_dentry->d_inode->i_version;
> fs/nfsd/nfs4xdr.c:              write64(p, inode->i_version);
> fs/nfsd/nfsfh.h:                fhp->fh_pre_change = inode->i_version;
> $
> 
> the nfsfh.h hit is in fill_pre_wcc(), which appears to be called
> under i_mutex but not i_lock. The xdr encoding functions don't
> appear to be holding i_lock, and may be holding i_mutex, but I
> haven't looked that far.

I'm still not sure how 

> Cheers,
> 
> Dave.
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] xfs: more patches for 3.13
  2013-11-15 17:19         ` Eric Sandeen
@ 2013-11-15 17:55           ` Eric Sandeen
  2013-11-17 19:48             ` Dave Chinner
  0 siblings, 1 reply; 30+ messages in thread
From: Eric Sandeen @ 2013-11-15 17:55 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Ben Myers, xfs

On 11/15/13, 11:19 AM, Eric Sandeen wrote:
> On 11/13/13, 7:16 PM, Dave Chinner wrote:
> 
>> That's client side, not server side, so that's the NFS client inode
>> it is locking, not the server side XFS inode.
> 
> Ah, geez, you're right. (x3)
> 
> <snip>
> 
>> Server side, where i_version is pulled out of an XFS inode:
>>
>> $ git grep i_version fs/nfsd
>> fs/nfsd/nfs3xdr.c:      fhp->fh_post_change = fhp->fh_dentry->d_inode->i_version;
>> fs/nfsd/nfs4xdr.c:              write64(p, inode->i_version);
>> fs/nfsd/nfsfh.h:                fhp->fh_pre_change = inode->i_version;
>> $
>>
>> the nfsfh.h hit is in fill_pre_wcc(), which appears to be called
>> under i_mutex but not i_lock. The xdr encoding functions don't
>> appear to be holding i_lock, and may be holding i_mutex, but I
>> haven't looked that far.
> 
> I'm still not sure how  . . . 

Ugh, didn't mean to send this reply quite yet, sorry.

Not sure how we do an unlocked read on a 32-bit machine that doesn't potentially
get the wrong answer.  I talked to Bruce about it a bit; nothing jumped out at
us.  At worst (?) it seems that if you happened to race on a read at exactly
the 2^32'nd modification, you might go backwards.  

As Bruce says, even if so, maybe "so rare we don't care?"

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] xfs: more patches for 3.13
  2013-11-15 17:55           ` Eric Sandeen
@ 2013-11-17 19:48             ` Dave Chinner
  2013-11-18 21:52               ` Eric Sandeen
  0 siblings, 1 reply; 30+ messages in thread
From: Dave Chinner @ 2013-11-17 19:48 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Ben Myers, xfs

On Fri, Nov 15, 2013 at 11:55:20AM -0600, Eric Sandeen wrote:
> On 11/15/13, 11:19 AM, Eric Sandeen wrote:
> > On 11/13/13, 7:16 PM, Dave Chinner wrote:
> > 
> >> That's client side, not server side, so that's the NFS client inode
> >> it is locking, not the server side XFS inode.
> > 
> > Ah, geez, you're right. (x3)
> > 
> > <snip>
> > 
> >> Server side, where i_version is pulled out of an XFS inode:
> >>
> >> $ git grep i_version fs/nfsd
> >> fs/nfsd/nfs3xdr.c:      fhp->fh_post_change = fhp->fh_dentry->d_inode->i_version;
> >> fs/nfsd/nfs4xdr.c:              write64(p, inode->i_version);
> >> fs/nfsd/nfsfh.h:                fhp->fh_pre_change = inode->i_version;
> >> $
> >>
> >> the nfsfh.h hit is in fill_pre_wcc(), which appears to be called
> >> under i_mutex but not i_lock. The xdr encoding functions don't
> >> appear to be holding i_lock, and may be holding i_mutex, but I
> >> haven't looked that far.
> > 
> > I'm still not sure how  . . . 
> 
> Ugh, didn't mean to send this reply quite yet, sorry.
> 
> Not sure how we do an unlocked read on a 32-bit machine that doesn't potentially
> get the wrong answer.  I talked to Bruce about it a bit; nothing jumped out at
> us.  At worst (?) it seems that if you happened to race on a read at exactly
> the 2^32'nd modification, you might go backwards.  
> 
> As Bruce says, even if so, maybe "so rare we don't care?"

Especially as it requires 2^32 modifications to first be made to the
file before there's even the possibility of a high word race on a
read, and then there's only one increment we could race with before
it doesn't change again for another 2^32 modifications.

Hence, with only 1 in 4 billion modifications potentially causing a
problem, I'd agree with the "so rare we don't care" assessment.
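
For completeness, the torn read in question looks like this on a
32-bit machine (a userspace model with made-up types, not nfsd or
XFS code): the unlocked 64-bit read is two 32-bit loads, and only a
carry into the high word between those loads can produce a bogus
value.

#include <stdint.h>

struct split64 {
	uint32_t lo;
	uint32_t hi;	/* assume a little-endian layout for illustration */
};

static uint64_t unlocked_read(const volatile struct split64 *v)
{
	/* Two separate loads; an update in between can tear the value. */
	uint32_t hi = v->hi;
	uint32_t lo = v->lo;

	return ((uint64_t)hi << 32) | lo;
}

/*
 * e.g. writer: 0x00000000ffffffff -> 0x0000000100000000
 *      reader: hi = 0 (old), then lo = 0 (new), so it sees 0 and
 *      appears to go backwards. Any increment that doesn't carry
 *      leaves the high word unchanged, so the combined value is
 *      always valid - hence one race window per 2^32 modifications.
 */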

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] xfs: more patches for 3.13
  2013-11-01  4:27 [PATCH 0/5] xfs: more patches for 3.13 Dave Chinner
                   ` (5 preceding siblings ...)
  2013-11-06 23:01 ` [PATCH 0/5] xfs: more patches for 3.13 Ben Myers
@ 2013-11-18 20:30 ` Ben Myers
  6 siblings, 0 replies; 30+ messages in thread
From: Ben Myers @ 2013-11-18 20:30 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, Nov 01, 2013 at 03:27:15PM +1100, Dave Chinner wrote:
> The following series follows up the recently committed series of
> patches for 3.13. The first two patches are the remaining
> uncommitted patches from the previous series.
> 
> The next two patches are tracing patches, one for AIL manipulations
> and the other for AGF and AGI read operations. Both of these were
> written during recent debugging sessions, and both proved useful so
> should be added to the menagerie of tracepoints we already have
> available.
> 
> The final patch is the increasing of the inode cluster size for v5
> filesystems. I'd like to get this into v5 filesystems for 3.13 so we
> get wider exposure of it ASAP so we have more data available to be
> able to make informed decisions about how to bring this back to v4
> filesystems in a safe and controlled manner.

Applied #2 and #5.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/5] xfs: more patches for 3.13
  2013-11-17 19:48             ` Dave Chinner
@ 2013-11-18 21:52               ` Eric Sandeen
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Sandeen @ 2013-11-18 21:52 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Ben Myers, xfs

On 11/17/13, 1:48 PM, Dave Chinner wrote:
> On Fri, Nov 15, 2013 at 11:55:20AM -0600, Eric Sandeen wrote:
>> On 11/15/13, 11:19 AM, Eric Sandeen wrote:
>>> On 11/13/13, 7:16 PM, Dave Chinner wrote:
>>>
>>>> That's client side, not server side, so that's the NFS client inode
>>>> it is locking, not the server side XFS inode.
>>>
>>> Ah, geez, you're right. (x3)
>>>
>>> <snip>
>>>
>>>> Server side, where i_version is pulled out of an XFS inode:
>>>>
>>>> $ git grep i_version fs/nfsd
>>>> fs/nfsd/nfs3xdr.c:      fhp->fh_post_change = fhp->fh_dentry->d_inode->i_version;
>>>> fs/nfsd/nfs4xdr.c:              write64(p, inode->i_version);
>>>> fs/nfsd/nfsfh.h:                fhp->fh_pre_change = inode->i_version;
>>>> $
>>>>
>>>> the nfsfh.h hit is in fill_pre_wcc(), which appears to be called
>>>> under i_mutex but not i_lock. The xdr encoding functions don't
>>>> appear to be holding i_lock, and may be holding i_mutex, but I
>>>> haven't looked that far.
>>>
>>> I'm still not sure how  . . . 
>>
>> Ugh, didn't mean to send this reply quite yet, sorry.
>>
>> Not sure how we do an unlocked read on a 32-bit machine that doesn't potentially
>> get the wrong answer.  I talked to Bruce about it a bit; nothing jumped out at
>> us.  At worst (?) it seems that if you happened to race on a read at exactly
>> the 2^32'nd modification, you might go backwards.  
>>
>> As Bruce says, even if so, maybe "so rare we don't care?"
> 
> Especially as it requires 2^32 modifications to first be made to the
> file before there's even the possibility of a high word race on a
> read, and then there's only one increment we could race with before
> it doesn't change again for another 2^32 modifications.
> 
> Hence, with only 1 in 4 billion modifications potentially causing a
> problem, I'd agree with the "so rare we don't care" assessment.

Ok, I'm sold.

-Eric

> Cheers,
> 
> Dave.
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] xfs: open code inc_inode_iversion when logging an inode
  2013-11-01  4:27 ` [PATCH 2/5] xfs: open code inc_inode_iversion when logging an inode Dave Chinner
  2013-11-05 16:41   ` Christoph Hellwig
@ 2013-11-18 21:54   ` Eric Sandeen
  2013-11-18 22:28     ` Ben Myers
  1 sibling, 1 reply; 30+ messages in thread
From: Eric Sandeen @ 2013-11-18 21:54 UTC (permalink / raw)
  To: Dave Chinner, xfs

On 10/31/13, 11:27 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Michael L Semon reported that generic/069 runtime increased on v5
> superblocks by 100% compared to v4 superblocks. His perf-based
> analysis pointed directly at the timestamp updates being done by the
> write path in this workload. The append writers are doing 4-byte
> writes, so there are lots of timestamp updates occurring.
> 
> The thing is, they aren't being triggered by timestamp changes -
> they are being triggered by the inode change counter needing to be
> updated. That is, every write(2) system call needs to bump the inode
> version count, and it does that through the timestamp update
> mechanism. Hence for v5 filesystems, test generic/069 is running 3
> orders of magnitude more timestamp update transactions on v5
> filesystems due to the fact it does a huge number of *4 byte*
> write(2) calls.
> 
> This isn't a real world scenario we really need to address - anyone
> doing such sequential IO should be using fwrite(3), not write(2).
> i.e. fwrite(3) buffers the writes in userspace to minimise the
> number of write(2) syscalls, and the problem goes away.
> 
> However, there is a small change we can make to improve the
> situation - removing the expensive lock operation on the change
> counter update.  All inode version counter changes in XFS occur
> under the ip->i_ilock during a transaction, and therefore we
> don't actually need the spin lock that provides exclusive access to
> it through inc_inode_iversion().
> 
> Hence avoid the lock and just open code the increment ourselves when
> logging the inode.

Well, ok.  Maybe worth a note about why the unlocked read is 99.9999% ok...

Reviewed-by: Eric Sandeen <sandeen@redhat.com>

> Reported-by: Michael L. Semon <mlsemon35@gmail.com>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_trans_inode.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/xfs/xfs_trans_inode.c b/fs/xfs/xfs_trans_inode.c
> index 1bba7f6..50c3f56 100644
> --- a/fs/xfs/xfs_trans_inode.c
> +++ b/fs/xfs/xfs_trans_inode.c
> @@ -111,12 +111,14 @@ xfs_trans_log_inode(
>  
>  	/*
>  	 * First time we log the inode in a transaction, bump the inode change
> -	 * counter if it is configured for this to occur.
> +	 * counter if it is configured for this to occur. We don't use
> +	 * inode_inc_iversion() because there is no need for extra locking around
> +	 * i_version as we already hold the inode locked exclusively for
> +	 * metadata modification.
>  	 */
>  	if (!(ip->i_itemp->ili_item.li_desc->lid_flags & XFS_LID_DIRTY) &&
>  	    IS_I_VERSION(VFS_I(ip))) {
> -		inode_inc_iversion(VFS_I(ip));
> -		ip->i_d.di_changecount = VFS_I(ip)->i_version;
> +		ip->i_d.di_changecount = ++VFS_I(ip)->i_version;
>  		flags |= XFS_ILOG_CORE;
>  	}
>  
> 

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] xfs: open code inc_inode_iversion when logging an inode
  2013-11-18 21:54   ` Eric Sandeen
@ 2013-11-18 22:28     ` Ben Myers
  2013-11-18 22:45       ` Eric Sandeen
  0 siblings, 1 reply; 30+ messages in thread
From: Ben Myers @ 2013-11-18 22:28 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: xfs

On Mon, Nov 18, 2013 at 03:54:24PM -0600, Eric Sandeen wrote:
> On 10/31/13, 11:27 PM, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Michael L Semon reported that generic/069 runtime increased on v5
> > superblocks by 100% compared to v4 superblocks. His perf-based
> > analysis pointed directly at the timestamp updates being done by the
> > write path in this workload. The append writers are doing 4-byte
> > writes, so there are lots of timestamp updates occurring.
> > 
> > The thing is, they aren't being triggered by timestamp changes -
> > they are being triggered by the inode change counter needing to be
> > updated. That is, every write(2) system call needs to bump the inode
> > version count, and it does that through the timestamp update
> > mechanism. Hence for v5 filesystems, test generic/069 is running 3
> > orders of magnitude more timestamp update transactions on v5
> > filesystems due to the fact it does a huge number of *4 byte*
> > write(2) calls.
> > 
> > This isn't a real world scenario we really need to address - anyone
> > doing such sequential IO should be using fwrite(3), not write(2).
> > i.e. fwrite(3) buffers the writes in userspace to minimise the
> > number of write(2) syscalls, and the problem goes away.
> > 
> > However, there is a small change we can make to improve the
> > situation - removing the expensive lock operation on the change
> > counter update.  All inode version counter changes in XFS occur
> > under the ip->i_ilock during a transaction, and therefore we
> > don't actually need the spin lock that provides exclusive access to
> > it through inc_inode_iversion().
> > 
> > Hence avoid the lock and just open code the increment ourselves when
> > logging the inode.
> 
> Well, ok.  Maybe worth a note about why the unlocked read is 99.9999% ok...
> 
> Reviewed-by: Eric Sandeen <sandeen@redhat.com>

Ah, sorry Eric, I didn't realize you were still reviewing this guy.  I pulled
him in a bit earlier in the day.

Thanks,
	Ben

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] xfs: open code inc_inode_iversion when logging an inode
  2013-11-18 22:28     ` Ben Myers
@ 2013-11-18 22:45       ` Eric Sandeen
  0 siblings, 0 replies; 30+ messages in thread
From: Eric Sandeen @ 2013-11-18 22:45 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On 11/18/13, 4:28 PM, Ben Myers wrote:
> On Mon, Nov 18, 2013 at 03:54:24PM -0600, Eric Sandeen wrote:

<snip>

>> Well, ok.  Maybe worth a note about why the unlocked read is 99.9999% ok...
>>
>> Reviewed-by: Eric Sandeen <sandeen@redhat.com>
> 
> Ah, sorry Eric, I didn't realize you were still reviewing this guy.  I pulled
> him in a bit earlier in the day.

No worries.  I'm very slow.  ;)

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2013-11-18 22:45 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-11-01  4:27 [PATCH 0/5] xfs: more patches for 3.13 Dave Chinner
2013-11-01  4:27 ` [PATCH 1/5] xfs: xfs_remove deadlocks due to inverted AGF vs AGI lock ordering Dave Chinner
2013-11-01  4:27 ` [PATCH 2/5] xfs: open code inc_inode_iversion when logging an inode Dave Chinner
2013-11-05 16:41   ` Christoph Hellwig
2013-11-18 21:54   ` Eric Sandeen
2013-11-18 22:28     ` Ben Myers
2013-11-18 22:45       ` Eric Sandeen
2013-11-01  4:27 ` [PATCH 3/5] xfs: trace AIL manipulations Dave Chinner
2013-11-05 16:41   ` Christoph Hellwig
2013-11-01  4:27 ` [PATCH 4/5] xfs: add tracepoints to AGF/AGI read operations Dave Chinner
2013-11-05 16:42   ` Christoph Hellwig
2013-11-01  4:27 ` [PATCH 5/5] xfs: increase inode cluster size for v5 filesystems Dave Chinner
2013-11-05 16:43   ` Christoph Hellwig
2013-11-05 19:56     ` Dave Chinner
2013-11-06 21:31       ` Ben Myers
2013-11-07  0:32         ` Dave Chinner
2013-11-12 17:33       ` Christoph Hellwig
2013-11-08 18:21   ` Eric Sandeen
2013-11-11 22:45     ` Dave Chinner
2013-11-12  0:24       ` Eric Sandeen
2013-11-14 18:51   ` Eric Sandeen
2013-11-06 23:01 ` [PATCH 0/5] xfs: more patches for 3.13 Ben Myers
2013-11-07  1:57   ` Dave Chinner
2013-11-13  1:16     ` Eric Sandeen
2013-11-14  1:16       ` Dave Chinner
2013-11-15 17:19         ` Eric Sandeen
2013-11-15 17:55           ` Eric Sandeen
2013-11-17 19:48             ` Dave Chinner
2013-11-18 21:52               ` Eric Sandeen
2013-11-18 20:30 ` Ben Myers
