* [PATCH 0/18] xfs: current patch queue
@ 2012-04-13 12:10 Dave Chinner
  2012-04-13 12:10 ` [PATCH 01/18] xfs: Ensure inode reclaim can run during quotacheck Dave Chinner
                   ` (18 more replies)
  0 siblings, 19 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

Here's a repost of my current patch queue. They are all based on Christoph's
ilock and xfsbufd patch series and have been tested on top of them. I've updated
all the Reviewed-by tags, so if a patch is missing such a tag then it hasn't
been reviewed yet.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

* [PATCH 01/18] xfs: Ensure inode reclaim can run during quotacheck
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 18:01   ` Mark Tinguely
  2012-04-29 21:37   ` Christoph Hellwig
  2012-04-13 12:10 ` [PATCH 02/18] xfs: pass shutdown method into xfs_trans_ail_delete_bulk Dave Chinner
                   ` (17 subsequent siblings)
  18 siblings, 2 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Because the mount process can run a quotacheck and consume lots of
inodes, we need to be able to run periodic inode reclaim during the
mount process. This will prevent running the system out of memory
during quota checks.

This essentially reverts commit 2bcf6e97, but that is safe to do now
that the quota sync code that was causing problems during long
quotacheck executions is gone.

The reclaim work is currently protected from running during the
unmount process by a check against MS_ACTIVE. Unfortunately, this
also means that the reclaim work cannot run during mount.  The
unmount process should stop the reclaim cleanly before freeing
anything that the reclaim work depends on, so there is no need to
have this guard in place.

Also, the inode reclaim work is demand driven, so there is no need
to start it immediately during mount. It will be started the moment
an inode is queued for reclaim, so quotacheck will trigger it just
fine.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_super.c |   20 ++++++++++----------
 fs/xfs/xfs_sync.c  |   19 +++++++++----------
 2 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 6306882..744d6b0 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -981,10 +981,9 @@ xfs_fs_put_super(
 {
 	struct xfs_mount	*mp = XFS_M(sb);
 
-	xfs_syncd_stop(mp);
-
 	xfs_filestream_unmount(mp);
 	xfs_unmountfs(mp);
+	xfs_syncd_stop(mp);
 	xfs_freesb(mp);
 	xfs_icsb_destroy_counters(mp);
 	xfs_destroy_mount_workqueues(mp);
@@ -1354,31 +1353,33 @@ xfs_fs_fill_super(
 	sb->s_time_gran = 1;
 	set_posix_acl_flag(sb);
 
-	error = xfs_mountfs(mp);
+	error = xfs_syncd_init(mp);
 	if (error)
 		goto out_filestream_unmount;
 
-	error = xfs_syncd_init(mp);
+	error = xfs_mountfs(mp);
 	if (error)
-		goto out_unmount;
+		goto out_syncd_stop;
 
 	root = igrab(VFS_I(mp->m_rootip));
 	if (!root) {
 		error = ENOENT;
-		goto out_syncd_stop;
+		goto out_unmount;
 	}
 	if (is_bad_inode(root)) {
 		error = EINVAL;
-		goto out_syncd_stop;
+		goto out_unmount;
 	}
 	sb->s_root = d_make_root(root);
 	if (!sb->s_root) {
 		error = ENOMEM;
-		goto out_syncd_stop;
+		goto out_unmount;
 	}
 
 	return 0;
 
+ out_syncd_stop:
+	xfs_syncd_stop(mp);
  out_filestream_unmount:
 	xfs_filestream_unmount(mp);
  out_free_sb:
@@ -1395,11 +1396,10 @@ out_destroy_workqueues:
  out:
 	return -error;
 
- out_syncd_stop:
-	xfs_syncd_stop(mp);
  out_unmount:
 	xfs_filestream_unmount(mp);
 	xfs_unmountfs(mp);
+	xfs_syncd_stop(mp);
 	goto out_free_sb;
 }
 
diff --git a/fs/xfs/xfs_sync.c b/fs/xfs/xfs_sync.c
index f0f1894..f7f019f 100644
--- a/fs/xfs/xfs_sync.c
+++ b/fs/xfs/xfs_sync.c
@@ -379,7 +379,15 @@ xfs_sync_worker(
 					struct xfs_mount, m_sync_work);
 	int		error;
 
-	if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
+	/*
+	 * We shouldn't write/force the log if we are in the mount/unmount
+	 * process or on a read only filesystem. The workqueue still needs to be
+	 * active in both cases, however, because it is used for inode reclaim
+	 * during these times. Hence use the MS_ACTIVE flag to avoid doing
+	 * anything in these periods.
+	 */
+	if (!(mp->m_super->s_flags & MS_ACTIVE) &&
+	    !(mp->m_flags & XFS_MOUNT_RDONLY)) {
 		/* dgc: errors ignored here */
 		if (mp->m_super->s_frozen == SB_UNFROZEN &&
 		    xfs_log_need_covered(mp))
@@ -407,14 +415,6 @@ xfs_syncd_queue_reclaim(
 	struct xfs_mount        *mp)
 {
 
-	/*
-	 * We can have inodes enter reclaim after we've shut down the syncd
-	 * workqueue during unmount, so don't allow reclaim work to be queued
-	 * during unmount.
-	 */
-	if (!(mp->m_super->s_flags & MS_ACTIVE))
-		return;
-
 	rcu_read_lock();
 	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
 		queue_delayed_work(xfs_syncd_wq, &mp->m_reclaim_work,
@@ -483,7 +483,6 @@ xfs_syncd_init(
 	INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
 
 	xfs_syncd_queue_sync(mp);
-	xfs_syncd_queue_reclaim(mp);
 
 	return 0;
 }
-- 
1.7.9.5

* [PATCH 02/18] xfs: pass shutdown method into xfs_trans_ail_delete_bulk
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
  2012-04-13 12:10 ` [PATCH 01/18] xfs: Ensure inode reclaim can run during quotacheck Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 17:40   ` Mark Tinguely
  2012-04-13 12:10 ` [PATCH 03/18] xfs: Do background CIL flushes via a workqueue Dave Chinner
                   ` (16 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_trans_ail_delete_bulk() can be called from different contexts,
so if the item is not in the AIL we need different shutdown handling
for each context.  Pass in the shutdown method needed so the correct
action can be taken.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
---
 fs/xfs/xfs_buf_item.c     |    4 ++--
 fs/xfs/xfs_dquot.c        |    2 +-
 fs/xfs/xfs_dquot_item.c   |    2 +-
 fs/xfs/xfs_extfree_item.c |    3 ++-
 fs/xfs/xfs_inode.c        |    4 ++--
 fs/xfs/xfs_inode_item.c   |   23 +++++++++++++----------
 fs/xfs/xfs_inode_item.h   |    2 +-
 fs/xfs/xfs_log_recover.c  |    3 ++-
 fs/xfs/xfs_trans_ail.c    |    5 +++--
 fs/xfs/xfs_trans_priv.h   |    8 +++++---
 10 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index fb20f38..7f0abea 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -454,7 +454,7 @@ xfs_buf_item_unpin(
 			bp->b_iodone = NULL;
 		} else {
 			spin_lock(&ailp->xa_lock);
-			xfs_trans_ail_delete(ailp, (xfs_log_item_t *)bip);
+			xfs_trans_ail_delete(ailp, lip, SHUTDOWN_LOG_IO_ERROR);
 			xfs_buf_item_relse(bp);
 			ASSERT(bp->b_fspriv == NULL);
 		}
@@ -1006,6 +1006,6 @@ xfs_buf_iodone(
 	 * Either way, AIL is useless if we're forcing a shutdown.
 	 */
 	spin_lock(&ailp->xa_lock);
-	xfs_trans_ail_delete(ailp, lip);
+	xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
 	xfs_buf_item_free(BUF_ITEM(lip));
 }
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 0760c47..7bf3855 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -857,7 +857,7 @@ xfs_qm_dqflush_done(
 		/* xfs_trans_ail_delete() drops the AIL lock. */
 		spin_lock(&ailp->xa_lock);
 		if (lip->li_lsn == qip->qli_flush_lsn)
-			xfs_trans_ail_delete(ailp, lip);
+			xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
 		else
 			spin_unlock(&ailp->xa_lock);
 	}
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index 9c5d58d..aa6a2a6 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -384,7 +384,7 @@ xfs_qm_qoffend_logitem_committed(
 	 * xfs_trans_ail_delete() drops the AIL lock.
 	 */
 	spin_lock(&ailp->xa_lock);
-	xfs_trans_ail_delete(ailp, (xfs_log_item_t *)qfs);
+	xfs_trans_ail_delete(ailp, &qfs->qql_item, SHUTDOWN_LOG_IO_ERROR);
 
 	kmem_free(qfs);
 	kmem_free(qfe);
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 1a40dd7..1c57eb9 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -64,7 +64,8 @@ __xfs_efi_release(
 	if (!test_and_clear_bit(XFS_EFI_COMMITTED, &efip->efi_flags)) {
 		spin_lock(&ailp->xa_lock);
 		/* xfs_trans_ail_delete() drops the AIL lock. */
-		xfs_trans_ail_delete(ailp, &efip->efi_item);
+		xfs_trans_ail_delete(ailp, &efip->efi_item,
+				     SHUTDOWN_LOG_IO_ERROR);
 		xfs_efi_item_free(efip);
 	}
 }
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index acd846d..65d7d99 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2377,7 +2377,7 @@ cluster_corrupt_out:
 	/*
 	 * Unlocks the flush lock
 	 */
-	xfs_iflush_abort(iq);
+	xfs_iflush_abort(iq, false);
 	kmem_free(ilist);
 	xfs_perag_put(pag);
 	return XFS_ERROR(EFSCORRUPTED);
@@ -2482,7 +2482,7 @@ abort_out:
 	/*
 	 * Unlocks the flush lock
 	 */
-	xfs_iflush_abort(ip);
+	xfs_iflush_abort(ip, false);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 8aaebb2..3f96a94 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -777,7 +777,8 @@ xfs_iflush_done(
 			ASSERT(i <= need_ail);
 		}
 		/* xfs_trans_ail_delete_bulk() drops the AIL lock. */
-		xfs_trans_ail_delete_bulk(ailp, log_items, i);
+		xfs_trans_ail_delete_bulk(ailp, log_items, i,
+					  SHUTDOWN_CORRUPT_INCORE);
 	}
 
 
@@ -798,16 +799,15 @@ xfs_iflush_done(
 }
 
 /*
- * This is the inode flushing abort routine.  It is called
- * from xfs_iflush when the filesystem is shutting down to clean
- * up the inode state.
- * It is responsible for removing the inode item
- * from the AIL if it has not been re-logged, and unlocking the inode's
- * flush lock.
+ * This is the inode flushing abort routine.  It is called from xfs_iflush when
+ * the filesystem is shutting down to clean up the inode state.  It is
+ * responsible for removing the inode item from the AIL if it has not been
+ * re-logged, and unlocking the inode's flush lock.
  */
 void
 xfs_iflush_abort(
-	xfs_inode_t		*ip)
+	xfs_inode_t		*ip,
+	bool			stale)
 {
 	xfs_inode_log_item_t	*iip = ip->i_itemp;
 
@@ -817,7 +817,10 @@ xfs_iflush_abort(
 			spin_lock(&ailp->xa_lock);
 			if (iip->ili_item.li_flags & XFS_LI_IN_AIL) {
 				/* xfs_trans_ail_delete() drops the AIL lock. */
-				xfs_trans_ail_delete(ailp, (xfs_log_item_t *)iip);
+				xfs_trans_ail_delete(ailp, &iip->ili_item,
+						stale ?
+						     SHUTDOWN_LOG_IO_ERROR :
+						     SHUTDOWN_CORRUPT_INCORE);
 			} else
 				spin_unlock(&ailp->xa_lock);
 		}
@@ -844,7 +847,7 @@ xfs_istale_done(
 	struct xfs_buf		*bp,
 	struct xfs_log_item	*lip)
 {
-	xfs_iflush_abort(INODE_ITEM(lip)->ili_inode);
+	xfs_iflush_abort(INODE_ITEM(lip)->ili_inode, true);
 }
 
 /*
diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h
index 41d61c3..376d4d0 100644
--- a/fs/xfs/xfs_inode_item.h
+++ b/fs/xfs/xfs_inode_item.h
@@ -165,7 +165,7 @@ extern void xfs_inode_item_init(struct xfs_inode *, struct xfs_mount *);
 extern void xfs_inode_item_destroy(struct xfs_inode *);
 extern void xfs_iflush_done(struct xfs_buf *, struct xfs_log_item *);
 extern void xfs_istale_done(struct xfs_buf *, struct xfs_log_item *);
-extern void xfs_iflush_abort(struct xfs_inode *);
+extern void xfs_iflush_abort(struct xfs_inode *, bool);
 extern int xfs_inode_item_format_convert(xfs_log_iovec_t *,
 					 xfs_inode_log_format_t *);
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 5e864a9..396e3bf 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2645,7 +2645,8 @@ xlog_recover_efd_pass2(
 				 * xfs_trans_ail_delete() drops the
 				 * AIL lock.
 				 */
-				xfs_trans_ail_delete(ailp, lip);
+				xfs_trans_ail_delete(ailp, lip,
+						     SHUTDOWN_CORRUPT_INCORE);
 				xfs_efi_item_free(efip);
 				spin_lock(&ailp->xa_lock);
 				break;
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 99106f1..cd2fde7 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -717,7 +717,8 @@ void
 xfs_trans_ail_delete_bulk(
 	struct xfs_ail		*ailp,
 	struct xfs_log_item	**log_items,
-	int			nr_items) __releases(ailp->xa_lock)
+	int			nr_items,
+	int			shutdown_type) __releases(ailp->xa_lock)
 {
 	xfs_log_item_t		*mlip;
 	int			mlip_changed = 0;
@@ -735,7 +736,7 @@ xfs_trans_ail_delete_bulk(
 				xfs_alert_tag(mp, XFS_PTAG_AILDELETE,
 		"%s: attempting to delete a log item that is not in the AIL",
 						__func__);
-				xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+				xfs_force_shutdown(mp, shutdown_type);
 			}
 			return;
 		}
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index dd2742f..da03d1a 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -93,14 +93,16 @@ xfs_trans_ail_update(
 }
 
 void	xfs_trans_ail_delete_bulk(struct xfs_ail *ailp,
-				struct xfs_log_item **log_items, int nr_items)
+				struct xfs_log_item **log_items, int nr_items,
+				int shutdown_type)
 				__releases(ailp->xa_lock);
 static inline void
 xfs_trans_ail_delete(
 	struct xfs_ail	*ailp,
-	xfs_log_item_t	*lip) __releases(ailp->xa_lock)
+	xfs_log_item_t	*lip,
+	int		shutdown_type) __releases(ailp->xa_lock)
 {
-	xfs_trans_ail_delete_bulk(ailp, &lip, 1);
+	xfs_trans_ail_delete_bulk(ailp, &lip, 1, shutdown_type);
 }
 
 void			xfs_ail_push(struct xfs_ail *, xfs_lsn_t);
-- 
1.7.9.5

* [PATCH 03/18] xfs: Do background CIL flushes via a workqueue
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
  2012-04-13 12:10 ` [PATCH 01/18] xfs: Ensure inode reclaim can run during quotacheck Dave Chinner
  2012-04-13 12:10 ` [PATCH 02/18] xfs: pass shutdown method into xfs_trans_ail_delete_bulk Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-17 17:54   ` Mark Tinguely
  2012-04-17 21:21   ` Ben Myers
  2012-04-13 12:10 ` [PATCH 04/18] xfs: page type check in writeback only checks last buffer Dave Chinner
                   ` (15 subsequent siblings)
  18 siblings, 2 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Doing background CIL flushes adds significant latency to whatever
async transaction triggers it. To avoid blocking async
transactions on things like waiting for log buffer IO to complete,
move the CIL push off into a workqueue.  By moving the push work
into a workqueue, we remove all the latency that the commit adds
from the foreground transaction commit path. This also means that
single threaded workloads won't do the CIL push processing, leaving
them more CPU to do more async transactions.

To do this, we need to keep track of the sequence number we have
pushed work for. This avoids having many transaction commits
attempting to schedule work for the same sequence, and ensures that
we only ever have one push (background or forced) in progress at a
time. It also means that we don't need to take the CIL lock in write
mode to check for potential background push races, which reduces
lock contention.

To avoid potential issues with "smart" IO schedulers, don't use the
workqueue for log force triggered flushes. Instead, do them directly
so that the log IO is issued by the process doing the log force and
doesn't get stuck behind IO elevator queue idling that would
incorrectly delay the log IO issued from the workqueue.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_log_cil.c  |  241 ++++++++++++++++++++++++++++++-------------------
 fs/xfs/xfs_log_priv.h |    2 +
 fs/xfs/xfs_mount.h    |    1 +
 fs/xfs/xfs_super.c    |    7 ++
 4 files changed, 157 insertions(+), 94 deletions(-)

diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
index d4fadbe..566a2d5 100644
--- a/fs/xfs/xfs_log_cil.c
+++ b/fs/xfs/xfs_log_cil.c
@@ -32,58 +32,6 @@
 #include "xfs_discard.h"
 
 /*
- * Perform initial CIL structure initialisation.
- */
-int
-xlog_cil_init(
-	struct log	*log)
-{
-	struct xfs_cil	*cil;
-	struct xfs_cil_ctx *ctx;
-
-	cil = kmem_zalloc(sizeof(*cil), KM_SLEEP|KM_MAYFAIL);
-	if (!cil)
-		return ENOMEM;
-
-	ctx = kmem_zalloc(sizeof(*ctx), KM_SLEEP|KM_MAYFAIL);
-	if (!ctx) {
-		kmem_free(cil);
-		return ENOMEM;
-	}
-
-	INIT_LIST_HEAD(&cil->xc_cil);
-	INIT_LIST_HEAD(&cil->xc_committing);
-	spin_lock_init(&cil->xc_cil_lock);
-	init_rwsem(&cil->xc_ctx_lock);
-	init_waitqueue_head(&cil->xc_commit_wait);
-
-	INIT_LIST_HEAD(&ctx->committing);
-	INIT_LIST_HEAD(&ctx->busy_extents);
-	ctx->sequence = 1;
-	ctx->cil = cil;
-	cil->xc_ctx = ctx;
-	cil->xc_current_sequence = ctx->sequence;
-
-	cil->xc_log = log;
-	log->l_cilp = cil;
-	return 0;
-}
-
-void
-xlog_cil_destroy(
-	struct log	*log)
-{
-	if (log->l_cilp->xc_ctx) {
-		if (log->l_cilp->xc_ctx->ticket)
-			xfs_log_ticket_put(log->l_cilp->xc_ctx->ticket);
-		kmem_free(log->l_cilp->xc_ctx);
-	}
-
-	ASSERT(list_empty(&log->l_cilp->xc_cil));
-	kmem_free(log->l_cilp);
-}
-
-/*
  * Allocate a new ticket. Failing to get a new ticket makes it really hard to
  * recover, so we don't allow failure here. Also, we allocate in a context that
  * we don't want to be issuing transactions from, so we need to tell the
@@ -426,8 +374,7 @@ xlog_cil_committed(
  */
 STATIC int
 xlog_cil_push(
-	struct log		*log,
-	xfs_lsn_t		push_seq)
+	struct log		*log)
 {
 	struct xfs_cil		*cil = log->l_cilp;
 	struct xfs_log_vec	*lv;
@@ -443,39 +390,35 @@ xlog_cil_push(
 	struct xfs_log_iovec	lhdr;
 	struct xfs_log_vec	lvhdr = { NULL };
 	xfs_lsn_t		commit_lsn;
+	xfs_lsn_t		push_seq;
 
 	if (!cil)
 		return 0;
 
-	ASSERT(!push_seq || push_seq <= cil->xc_ctx->sequence);
-
 	new_ctx = kmem_zalloc(sizeof(*new_ctx), KM_SLEEP|KM_NOFS);
 	new_ctx->ticket = xlog_cil_ticket_alloc(log);
 
-	/*
-	 * Lock out transaction commit, but don't block for background pushes
-	 * unless we are well over the CIL space limit. See the definition of
-	 * XLOG_CIL_HARD_SPACE_LIMIT() for the full explanation of the logic
-	 * used here.
-	 */
-	if (!down_write_trylock(&cil->xc_ctx_lock)) {
-		if (!push_seq &&
-		    cil->xc_ctx->space_used < XLOG_CIL_HARD_SPACE_LIMIT(log))
-			goto out_free_ticket;
-		down_write(&cil->xc_ctx_lock);
-	}
+	down_write(&cil->xc_ctx_lock);
 	ctx = cil->xc_ctx;
 
-	/* check if we've anything to push */
-	if (list_empty(&cil->xc_cil))
-		goto out_skip;
+	spin_lock(&cil->xc_cil_lock);
+	push_seq = cil->xc_push_seq;
+	ASSERT(push_seq > 0 && push_seq <= ctx->sequence);
 
-	/* check for spurious background flush */
-	if (!push_seq && cil->xc_ctx->space_used < XLOG_CIL_SPACE_LIMIT(log))
+	/*
+	 * Check if we've anything to push. If there is nothing, then we don't
+	 * move on to a new sequence number and so we have to be able to push
+	 * this sequence again later.
+	 */
+	if (list_empty(&cil->xc_cil)) {
+		cil->xc_push_seq = 0;
+		spin_unlock(&cil->xc_cil_lock);
 		goto out_skip;
+	}
+	spin_unlock(&cil->xc_cil_lock);
 
 	/* check for a previously pushed sequence */
-	if (push_seq && push_seq < cil->xc_ctx->sequence)
+	if (push_seq < cil->xc_ctx->sequence)
 		goto out_skip;
 
 	/*
@@ -629,7 +572,6 @@ restart:
 
 out_skip:
 	up_write(&cil->xc_ctx_lock);
-out_free_ticket:
 	xfs_log_ticket_put(new_ctx->ticket);
 	kmem_free(new_ctx);
 	return 0;
@@ -641,6 +583,80 @@ out_abort:
 	return XFS_ERROR(EIO);
 }
 
+static void
+xlog_cil_push_work(
+	struct work_struct	*work)
+{
+	struct xfs_cil		*cil = container_of(work, struct xfs_cil,
+							xc_push_work);
+	xlog_cil_push(cil->xc_log);
+}
+
+/*
+ * We need to push CIL every so often so we don't cache more than we can fit in
+ * the log. The limit really is that a checkpoint can't be more than half the
+ * log (the current checkpoint is not allowed to overwrite the previous
+ * checkpoint), but commit latency and memory usage limit this to a smaller
+ * size.
+ */
+static void
+xlog_cil_push_background(
+	struct log	*log)
+{
+	struct xfs_cil	*cil = log->l_cilp;
+
+	/*
+	 * The cil won't be empty because we are called while holding the
+	 * context lock so whatever we added to the CIL will still be there
+	 */
+	ASSERT(!list_empty(&cil->xc_cil));
+
+	/*
+	 * don't do a background push if we haven't used up all the
+	 * space available yet.
+	 */
+	if (cil->xc_ctx->space_used < XLOG_CIL_SPACE_LIMIT(log))
+		return;
+
+	spin_lock(&cil->xc_cil_lock);
+	cil->xc_push_seq = cil->xc_current_sequence;
+	queue_work(log->l_mp->m_cil_workqueue, &cil->xc_push_work);
+	spin_unlock(&cil->xc_cil_lock);
+
+}
+
+static void
+xlog_cil_push_foreground(
+	struct log	*log,
+	xfs_lsn_t	push_seq)
+{
+	struct xfs_cil	*cil = log->l_cilp;
+
+	if (!cil)
+		return;
+
+	ASSERT(push_seq && push_seq <= cil->xc_current_sequence);
+
+	/* start on any pending background push to minimise wait time on it */
+	flush_work(&cil->xc_push_work);
+
+	/*
+	 * If the CIL is empty or we've already pushed the sequence then
+	 * there's no work we need to do.
+	 */
+	spin_lock(&cil->xc_cil_lock);
+	if (list_empty(&cil->xc_cil) || push_seq <= cil->xc_push_seq) {
+		spin_unlock(&cil->xc_cil_lock);
+		return;
+	}
+
+	cil->xc_push_seq = push_seq;
+	spin_unlock(&cil->xc_cil_lock);
+
+	/* do the push now */
+	xlog_cil_push(log);
+}
+
 /*
  * Commit a transaction with the given vector to the Committed Item List.
  *
@@ -667,7 +683,6 @@ xfs_log_commit_cil(
 {
 	struct log		*log = mp->m_log;
 	int			log_flags = 0;
-	int			push = 0;
 	struct xfs_log_vec	*log_vector;
 
 	if (flags & XFS_TRANS_RELEASE_LOG_RES)
@@ -719,21 +734,9 @@ xfs_log_commit_cil(
 	 */
 	xfs_trans_free_items(tp, *commit_lsn, 0);
 
-	/* check for background commit before unlock */
-	if (log->l_cilp->xc_ctx->space_used > XLOG_CIL_SPACE_LIMIT(log))
-		push = 1;
+	xlog_cil_push_background(log);
 
 	up_read(&log->l_cilp->xc_ctx_lock);
-
-	/*
-	 * We need to push CIL every so often so we don't cache more than we
-	 * can fit in the log. The limit really is that a checkpoint can't be
-	 * more than half the log (the current checkpoint is not allowed to
-	 * overwrite the previous checkpoint), but commit latency and memory
-	 * usage limit this to a smaller size in most cases.
-	 */
-	if (push)
-		xlog_cil_push(log, 0);
 	return 0;
 }
 
@@ -746,9 +749,6 @@ xfs_log_commit_cil(
  *
  * We return the current commit lsn to allow the callers to determine if a
  * iclog flush is necessary following this call.
- *
- * XXX: Initially, just push the CIL unconditionally and return whatever
- * commit lsn is there. It'll be empty, so this is broken for now.
  */
 xfs_lsn_t
 xlog_cil_force_lsn(
@@ -766,8 +766,7 @@ xlog_cil_force_lsn(
 	 * xlog_cil_push() handles racing pushes for the same sequence,
 	 * so no need to deal with it here.
 	 */
-	if (sequence == cil->xc_current_sequence)
-		xlog_cil_push(log, sequence);
+	xlog_cil_push_foreground(log, sequence);
 
 	/*
 	 * See if we can find a previous sequence still committing.
@@ -826,3 +825,57 @@ xfs_log_item_in_current_chkpt(
 		return false;
 	return true;
 }
+
+/*
+ * Perform initial CIL structure initialisation.
+ */
+int
+xlog_cil_init(
+	struct log	*log)
+{
+	struct xfs_cil	*cil;
+	struct xfs_cil_ctx *ctx;
+
+	cil = kmem_zalloc(sizeof(*cil), KM_SLEEP|KM_MAYFAIL);
+	if (!cil)
+		return ENOMEM;
+
+	ctx = kmem_zalloc(sizeof(*ctx), KM_SLEEP|KM_MAYFAIL);
+	if (!ctx) {
+		kmem_free(cil);
+		return ENOMEM;
+	}
+
+	INIT_WORK(&cil->xc_push_work, xlog_cil_push_work);
+	INIT_LIST_HEAD(&cil->xc_cil);
+	INIT_LIST_HEAD(&cil->xc_committing);
+	spin_lock_init(&cil->xc_cil_lock);
+	init_rwsem(&cil->xc_ctx_lock);
+	init_waitqueue_head(&cil->xc_commit_wait);
+
+	INIT_LIST_HEAD(&ctx->committing);
+	INIT_LIST_HEAD(&ctx->busy_extents);
+	ctx->sequence = 1;
+	ctx->cil = cil;
+	cil->xc_ctx = ctx;
+	cil->xc_current_sequence = ctx->sequence;
+
+	cil->xc_log = log;
+	log->l_cilp = cil;
+	return 0;
+}
+
+void
+xlog_cil_destroy(
+	struct log	*log)
+{
+	if (log->l_cilp->xc_ctx) {
+		if (log->l_cilp->xc_ctx->ticket)
+			xfs_log_ticket_put(log->l_cilp->xc_ctx->ticket);
+		kmem_free(log->l_cilp->xc_ctx);
+	}
+
+	ASSERT(list_empty(&log->l_cilp->xc_cil));
+	kmem_free(log->l_cilp);
+}
+
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 2152900..735ff1e 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -417,6 +417,8 @@ struct xfs_cil {
 	struct list_head	xc_committing;
 	wait_queue_head_t	xc_commit_wait;
 	xfs_lsn_t		xc_current_sequence;
+	struct work_struct	xc_push_work;
+	xfs_lsn_t		xc_push_seq;
 };
 
 /*
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 19fd5ed..8b89c5a 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -214,6 +214,7 @@ typedef struct xfs_mount {
 
 	struct workqueue_struct	*m_data_workqueue;
 	struct workqueue_struct	*m_unwritten_workqueue;
+	struct workqueue_struct	*m_cil_workqueue;
 } xfs_mount_t;
 
 /*
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 744d6b0..9ec7626b 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -773,8 +773,14 @@ xfs_init_mount_workqueues(
 	if (!mp->m_unwritten_workqueue)
 		goto out_destroy_data_iodone_queue;
 
+	mp->m_cil_workqueue = alloc_workqueue("xfs-cil/%s",
+			WQ_MEM_RECLAIM, 0, mp->m_fsname);
+	if (!mp->m_cil_workqueue)
+		goto out_destroy_unwritten;
 	return 0;
 
+out_destroy_unwritten:
+	destroy_workqueue(mp->m_unwritten_workqueue);
 out_destroy_data_iodone_queue:
 	destroy_workqueue(mp->m_data_workqueue);
 out:
@@ -785,6 +791,7 @@ STATIC void
 xfs_destroy_mount_workqueues(
 	struct xfs_mount	*mp)
 {
+	destroy_workqueue(mp->m_cil_workqueue);
 	destroy_workqueue(mp->m_data_workqueue);
 	destroy_workqueue(mp->m_unwritten_workqueue);
 }
-- 
1.7.9.5

* [PATCH 04/18] xfs: page type check in writeback only checks last buffer
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (2 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 03/18] xfs: Do background CIL flushes via a workqueue Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-16 16:15   ` Mark Tinguely
  2012-04-29 21:39   ` Christoph Hellwig
  2012-04-13 12:10 ` [PATCH 05/18] xfs: Use preallocation for inodes with extsz hints Dave Chinner
                   ` (14 subsequent siblings)
  18 siblings, 2 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_is_delayed_page() checks to see if a page has buffers matching
the given IO type passed in. It does so by walking the buffer heads
on the page and checking if the state flags match the IO type.

However, the "acceptable" variable that is calculated is overwritten
every time a new buffer is checked. Hence if the first buffer on the
page is of the right type, this state is lost if the second buffer
is not of the correct type. This means that xfs_aops_discard_page()
may not discard delalloc regions when it is supposed to, and
xfs_convert_page() may not cluster IO as efficiently as possible.

This problem only occurs on filesystems with a block size smaller
than page size.

Also, rename xfs_is_delayed_page() to xfs_check_page_type() to
better describe what it is doing - it is not delalloc specific
anymore.

The problem was first noticed by Peter Watkins.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c |   12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 0783def..2fc12db 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -623,7 +623,7 @@ xfs_map_at_offset(
  * or delayed allocate extent.
  */
 STATIC int
-xfs_is_delayed_page(
+xfs_check_page_type(
 	struct page		*page,
 	unsigned int		type)
 {
@@ -637,11 +637,11 @@ xfs_is_delayed_page(
 		bh = head = page_buffers(page);
 		do {
 			if (buffer_unwritten(bh))
-				acceptable = (type == IO_UNWRITTEN);
+				acceptable += (type == IO_UNWRITTEN);
 			else if (buffer_delay(bh))
-				acceptable = (type == IO_DELALLOC);
+				acceptable += (type == IO_DELALLOC);
 			else if (buffer_dirty(bh) && buffer_mapped(bh))
-				acceptable = (type == IO_OVERWRITE);
+				acceptable += (type == IO_OVERWRITE);
 			else
 				break;
 		} while ((bh = bh->b_this_page) != head);
@@ -684,7 +684,7 @@ xfs_convert_page(
 		goto fail_unlock_page;
 	if (page->mapping != inode->i_mapping)
 		goto fail_unlock_page;
-	if (!xfs_is_delayed_page(page, (*ioendp)->io_type))
+	if (!xfs_check_page_type(page, (*ioendp)->io_type))
 		goto fail_unlock_page;
 
 	/*
@@ -834,7 +834,7 @@ xfs_aops_discard_page(
 	struct buffer_head	*bh, *head;
 	loff_t			offset = page_offset(page);
 
-	if (!xfs_is_delayed_page(page, IO_DELALLOC))
+	if (!xfs_check_page_type(page, IO_DELALLOC))
 		goto out_invalidate;
 
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount))
-- 
1.7.9.5

* [PATCH 05/18] xfs: Use preallocation for inodes with extsz hints
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (3 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 04/18] xfs: page type check in writeback only checks last buffer Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 16:45   ` Mark Tinguely
  2012-04-16 15:59   ` Mark Tinguely
  2012-04-13 12:10 ` [PATCH 06/18] xfs: fix buffer lookup race on allocation failure Dave Chinner
                   ` (13 subsequent siblings)
  18 siblings, 2 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfstest 229 exposes a problem with buffered IO, delayed allocation
and extent size hints. That is when we do delayed allocation during
buffered IO, we reserve space for the extent size hint alignment and
allocate the physical space to align the extent, but we do not zero
the regions of the extent that aren't written by the write(2)
syscall. The result is that we expose stale data in unwritten
regions of the extent size hints.

There are two ways to fix this. The first is to detect that we are
doing unaligned writes, check if there is already a mapping or data
over the extent size hint range, and if not, zero the page cache
first before doing the real write. This can be very expensive
for large extent size hints, especially if the subsequent writes
fill the entire extent size hint range before the data is written to
disk.
The second, and simpler way, is simply to turn off delayed
allocation when the extent size hint is set and use preallocation
instead. This results in unwritten extents being laid down on disk
and so only the written portions will be converted. This matches the
behaviour for direct IO, and will also work for the real time
device. The disadvantage of this approach is that for small extent
size hints we can get file fragmentation, but in general extent size
hints are fairly large (e.g. stripe width sized) so this isn't a big
deal.

Implement the second approach as it is simple and effective.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_aops.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 2fc12db..19ce2e2 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -1175,7 +1175,7 @@ __xfs_get_blocks(
 	    (!nimaps ||
 	     (imap.br_startblock == HOLESTARTBLOCK ||
 	      imap.br_startblock == DELAYSTARTBLOCK))) {
-		if (direct) {
+		if (direct || xfs_get_extsz_hint(ip)) {
 			xfs_iunlock(ip, lockmode);
 
 			error = xfs_iomap_write_direct(ip, offset, size,
-- 
1.7.9.5


* [PATCH 06/18] xfs: fix buffer lookup race on allocation failure
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (4 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 05/18] xfs: Use preallocation for inodes with extsz hints Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 18:32   ` Mark Tinguely
  2012-04-13 12:10 ` [PATCH 07/18] xfs: check for buffer errors before waiting Dave Chinner
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When memory allocation fails to add the page array or the pages to
a buffer during xfs_buf_get(), the buffer is left in the cache in a
partially initialised state. There is enough state left for the next
lookup on that buffer to find the buffer, and for the buffer to then
be used without finishing the initialisation.  As a result, when an
attempt to do IO on the buffer occurs, it fails with EIO because
there are no pages attached to the buffer.

We cannot remove the buffer from the cache immediately and free it,
because there may already be a racing lookup that is blocked on the
buffer lock. Hence the moment we unlock the buffer to then free it,
the other user is woken and we have a use-after-free situation.

To avoid this race condition altogether, allocate the pages for the
buffer before we insert it into the cache.  This means that we no
longer have an allocation failure case to deal with after the buffer
is already present in the cache, and hence we avoid the problem
altogether.  In most cases we won't have racing inserts for the same
buffer, and so won't see the increased memory pressure that
allocating before insertion may entail.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 193e1de..1578721 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -546,18 +546,20 @@ xfs_buf_get(
 	if (unlikely(!new_bp))
 		return NULL;
 
+	error = xfs_buf_allocate_memory(new_bp, flags);
+	if (error) {
+		kmem_zone_free(xfs_buf_zone, new_bp);
+		return NULL;
+	}
+
 	bp = _xfs_buf_find(target, ioff, isize, flags, new_bp);
 	if (!bp) {
-		kmem_zone_free(xfs_buf_zone, new_bp);
+		xfs_buf_free(new_bp);
 		return NULL;
 	}
 
-	if (bp == new_bp) {
-		error = xfs_buf_allocate_memory(bp, flags);
-		if (error)
-			goto no_buffer;
-	} else
-		kmem_zone_free(xfs_buf_zone, new_bp);
+	if (bp != new_bp)
+		xfs_buf_free(new_bp);
 
 	/*
 	 * Now we have a workable buffer, fill in the block number so
-- 
1.7.9.5


* [PATCH 07/18] xfs: check for buffer errors before waiting
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (5 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 06/18] xfs: fix buffer lookup race on allocation failure Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 17:56   ` Mark Tinguely
  2012-04-13 12:10 ` [PATCH 08/18] xfs: fix incorrect b_offset initialisation Dave Chinner
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

If we call xfs_buf_iowait() on a buffer that failed dispatch due to
an IO error, it will wait forever for an I/O that does not exist.
This is handled in xfs_buf_read, but other code that calls
xfs_buf_iowait directly doesn't handle it.

Rather than make the call sites have to handle checking for dispatch
errors and then checking for completion errors, make
xfs_buf_iowait() check for dispatch errors on the buffer before
waiting. This means we handle both dispatch and completion errors
with one set of error handling at the caller sites.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_buf.c         |   22 ++++++++++------------
 fs/xfs/xfs_buf.h         |    2 +-
 fs/xfs/xfs_log_recover.c |    2 ++
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 1578721..841eec3 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -594,17 +594,15 @@ _xfs_buf_read(
 	xfs_buf_t		*bp,
 	xfs_buf_flags_t		flags)
 {
-	int			status;
-
 	ASSERT(!(flags & XBF_WRITE));
 	ASSERT(bp->b_bn != XFS_BUF_DADDR_NULL);
 
 	bp->b_flags &= ~(XBF_WRITE | XBF_ASYNC | XBF_READ_AHEAD);
 	bp->b_flags |= flags & (XBF_READ | XBF_ASYNC | XBF_READ_AHEAD);
 
-	status = xfs_buf_iorequest(bp);
-	if (status || bp->b_error || (flags & XBF_ASYNC))
-		return status;
+	xfs_buf_iorequest(bp);
+	if (flags & XBF_ASYNC)
+		return 0;
 	return xfs_buf_iowait(bp);
 }
 
@@ -689,7 +687,7 @@ xfs_buf_read_uncached(
 
 	xfsbdstrat(mp, bp);
 	error = xfs_buf_iowait(bp);
-	if (error || bp->b_error) {
+	if (error) {
 		xfs_buf_relse(bp);
 		return NULL;
 	}
@@ -1246,7 +1244,7 @@ next_chunk:
 	}
 }
 
-int
+void
 xfs_buf_iorequest(
 	xfs_buf_t		*bp)
 {
@@ -1267,13 +1265,12 @@ xfs_buf_iorequest(
 	_xfs_buf_ioend(bp, 0);
 
 	xfs_buf_rele(bp);
-	return 0;
 }
 
 /*
- *	Waits for I/O to complete on the buffer supplied.
- *	It returns immediately if no I/O is pending.
- *	It returns the I/O error code, if any, or 0 if there was no error.
+ * Waits for I/O to complete on the buffer supplied.  It returns immediately if
+ * no I/O is pending or there is already a pending error on the buffer.  It
+ * returns the I/O error code, if any, or 0 if there was no error.
  */
 int
 xfs_buf_iowait(
@@ -1281,7 +1278,8 @@ xfs_buf_iowait(
 {
 	trace_xfs_buf_iowait(bp, _RET_IP_);
 
-	wait_for_completion(&bp->b_iowait);
+	if (!bp->b_error)
+		wait_for_completion(&bp->b_iowait);
 
 	trace_xfs_buf_iowait_done(bp, _RET_IP_);
 	return bp->b_error;
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 7083cf4..87a4748 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -191,7 +191,7 @@ extern int xfs_bdstrat_cb(struct xfs_buf *);
 extern void xfs_buf_ioend(xfs_buf_t *,	int);
 extern void xfs_buf_ioerror(xfs_buf_t *, int);
 extern void xfs_buf_ioerror_alert(struct xfs_buf *, const char *func);
-extern int xfs_buf_iorequest(xfs_buf_t *);
+extern void xfs_buf_iorequest(xfs_buf_t *);
 extern int xfs_buf_iowait(xfs_buf_t *);
 extern void xfs_buf_iomove(xfs_buf_t *, size_t, size_t, void *,
 				xfs_buf_rw_t);
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 396e3bf..64ed6ff 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -179,6 +179,7 @@ xlog_bread_noalign(
 	XFS_BUF_SET_ADDR(bp, log->l_logBBstart + blk_no);
 	XFS_BUF_READ(bp);
 	XFS_BUF_SET_COUNT(bp, BBTOB(nbblks));
+	bp->b_error = 0;
 
 	xfsbdstrat(log->l_mp, bp);
 	error = xfs_buf_iowait(bp);
@@ -266,6 +267,7 @@ xlog_bwrite(
 	xfs_buf_hold(bp);
 	xfs_buf_lock(bp);
 	XFS_BUF_SET_COUNT(bp, BBTOB(nbblks));
+	bp->b_error = 0;
 
 	error = xfs_bwrite(bp);
 	if (error)
-- 
1.7.9.5


* [PATCH 08/18] xfs: fix incorrect b_offset initialisation
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (6 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 07/18] xfs: check for buffer errors before waiting Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 12:10 ` [PATCH 09/18] xfs: use kmem_zone_zalloc for buffers Dave Chinner
                   ` (10 subsequent siblings)
  18 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Because we no longer use the page cache for buffering, there is no
direct block number to page offset relationship anymore.
xfs_buf_get_pages is still setting up b_offset as if there was some
relationship, and that is leading to incorrectly setting up
*uncached* buffers that don't overwrite b_offset once they've had
pages allocated.

For cached buffers, the first block of the buffer is always at offset
zero into the allocated memory. This is true for sub-page sized
buffers, as well as for multiple-page buffers.

For uncached buffers, b_offset is only non-zero when we are
associating specific memory to the buffers, and that is set
correctly by the code setting up the buffer.

Hence remove the setting of b_offset in xfs_buf_get_pages, because
it is now always the wrong thing to do.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
---
 fs/xfs/xfs_buf.c |    1 -
 1 file changed, 1 deletion(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 841eec3..e7dab7d 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -221,7 +221,6 @@ _xfs_buf_get_pages(
 {
 	/* Make sure that we have a page list */
 	if (bp->b_pages == NULL) {
-		bp->b_offset = xfs_buf_poff(bp->b_file_offset);
 		bp->b_page_count = page_count;
 		if (page_count <= XB_PAGES) {
 			bp->b_pages = bp->b_page_array;
-- 
1.7.9.5


* [PATCH 09/18] xfs: use kmem_zone_zalloc for buffers
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (7 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 08/18] xfs: fix incorrect b_offset initialisation Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 12:10 ` [PATCH 10/18] xfs: clean up buffer get/read call API Dave Chinner
                   ` (9 subsequent siblings)
  18 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

To replace the alloc/memset pair.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
---
 fs/xfs/xfs_buf.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index e7dab7d..fb88163 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -172,7 +172,7 @@ xfs_buf_alloc(
 {
 	struct xfs_buf		*bp;
 
-	bp = kmem_zone_alloc(xfs_buf_zone, xb_to_km(flags));
+	bp = kmem_zone_zalloc(xfs_buf_zone, xb_to_km(flags));
 	if (unlikely(!bp))
 		return NULL;
 
@@ -181,7 +181,6 @@ xfs_buf_alloc(
 	 */
 	flags &= ~(XBF_LOCK|XBF_MAPPED|XBF_DONT_BLOCK|XBF_READ_AHEAD);
 
-	memset(bp, 0, sizeof(xfs_buf_t));
 	atomic_set(&bp->b_hold, 1);
 	atomic_set(&bp->b_lru_ref, 1);
 	init_completion(&bp->b_iowait);
-- 
1.7.9.5


* [PATCH 10/18] xfs: clean up buffer get/read call API
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (8 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 09/18] xfs: use kmem_zone_zalloc for buffers Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 12:10 ` [PATCH 11/18] xfs: kill b_file_offset Dave Chinner
                   ` (8 subsequent siblings)
  18 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The xfs_buf_get/read API is not consistent in the units it uses, and
does not use appropriate or consistent units/types for the
variables.

Convert the API to use disk addresses and block counts for all
buffer get and read calls. Use consistent naming for all the
functions and their declarations, and convert the internal functions
to use disk addresses and block counts to avoid the need to convert
them from one type to another and back again.

Fix all the callers to use disk addresses and block counts. In many
cases, this removes an additional conversion from the function call
as the callers already have a block count.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
---
 fs/xfs/xfs_buf.c         |   86 +++++++++++++++++++++++++---------------------
 fs/xfs/xfs_buf.h         |   38 +++++++++++---------
 fs/xfs/xfs_fsops.c       |    4 +--
 fs/xfs/xfs_log.c         |    6 ++--
 fs/xfs/xfs_log_recover.c |    2 +-
 fs/xfs/xfs_mount.c       |   12 +++----
 fs/xfs/xfs_rtalloc.c     |    8 ++---
 fs/xfs/xfs_vnodeops.c    |    2 +-
 8 files changed, 84 insertions(+), 74 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index fb88163..c104cbb 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -166,8 +166,8 @@ xfs_buf_stale(
 struct xfs_buf *
 xfs_buf_alloc(
 	struct xfs_buftarg	*target,
-	xfs_off_t		range_base,
-	size_t			range_length,
+	xfs_daddr_t		blkno,
+	size_t			numblks,
 	xfs_buf_flags_t		flags)
 {
 	struct xfs_buf		*bp;
@@ -190,14 +190,21 @@ xfs_buf_alloc(
 	sema_init(&bp->b_sema, 0); /* held, no waiters */
 	XB_SET_OWNER(bp);
 	bp->b_target = target;
-	bp->b_file_offset = range_base;
+	bp->b_file_offset = blkno << BBSHIFT;
 	/*
 	 * Set buffer_length and count_desired to the same value initially.
 	 * I/O routines should use count_desired, which will be the same in
 	 * most cases but may be reset (e.g. XFS recovery).
 	 */
-	bp->b_buffer_length = bp->b_count_desired = range_length;
+	bp->b_buffer_length = bp->b_count_desired = numblks << BBSHIFT;
 	bp->b_flags = flags;
+
+	/*
+	 * We do not set the block number here in the buffer because we have not
+	 * finished initialising the buffer. We insert the buffer into the cache
+	 * in this state, so this ensures that we are unable to do IO on a
+	 * buffer that hasn't been fully initialised.
+	 */
 	bp->b_bn = XFS_BUF_DADDR_NULL;
 	atomic_set(&bp->b_pin_count, 0);
 	init_waitqueue_head(&bp->b_waiters);
@@ -420,29 +427,29 @@ _xfs_buf_map_pages(
  */
 xfs_buf_t *
 _xfs_buf_find(
-	xfs_buftarg_t		*btp,	/* block device target		*/
-	xfs_off_t		ioff,	/* starting offset of range	*/
-	size_t			isize,	/* length of range		*/
+	struct xfs_buftarg	*btp,
+	xfs_daddr_t		blkno,
+	size_t			numblks,
 	xfs_buf_flags_t		flags,
 	xfs_buf_t		*new_bp)
 {
-	xfs_off_t		range_base;
-	size_t			range_length;
+	xfs_off_t		offset;
+	size_t			numbytes;
 	struct xfs_perag	*pag;
 	struct rb_node		**rbp;
 	struct rb_node		*parent;
 	xfs_buf_t		*bp;
 
-	range_base = (ioff << BBSHIFT);
-	range_length = (isize << BBSHIFT);
+	offset = BBTOB(blkno);
+	numbytes = BBTOB(numblks);
 
 	/* Check for IOs smaller than the sector size / not sector aligned */
-	ASSERT(!(range_length < (1 << btp->bt_sshift)));
-	ASSERT(!(range_base & (xfs_off_t)btp->bt_smask));
+	ASSERT(!(numbytes < (1 << btp->bt_sshift)));
+	ASSERT(!(offset & (xfs_off_t)btp->bt_smask));
 
 	/* get tree root */
 	pag = xfs_perag_get(btp->bt_mount,
-				xfs_daddr_to_agno(btp->bt_mount, ioff));
+				xfs_daddr_to_agno(btp->bt_mount, blkno));
 
 	/* walk tree */
 	spin_lock(&pag->pag_buf_lock);
@@ -453,9 +460,9 @@ _xfs_buf_find(
 		parent = *rbp;
 		bp = rb_entry(parent, struct xfs_buf, b_rbnode);
 
-		if (range_base < bp->b_file_offset)
+		if (offset < bp->b_file_offset)
 			rbp = &(*rbp)->rb_left;
-		else if (range_base > bp->b_file_offset)
+		else if (offset > bp->b_file_offset)
 			rbp = &(*rbp)->rb_right;
 		else {
 			/*
@@ -466,7 +473,7 @@ _xfs_buf_find(
 			 * reallocating a busy extent. Skip this buffer and
 			 * continue searching to the right for an exact match.
 			 */
-			if (bp->b_buffer_length != range_length) {
+			if (bp->b_buffer_length != numbytes) {
 				ASSERT(bp->b_flags & XBF_STALE);
 				rbp = &(*rbp)->rb_right;
 				continue;
@@ -526,21 +533,20 @@ found:
  */
 struct xfs_buf *
 xfs_buf_get(
-	xfs_buftarg_t		*target,/* target for buffer		*/
-	xfs_off_t		ioff,	/* starting offset of range	*/
-	size_t			isize,	/* length of range		*/
+	xfs_buftarg_t		*target,
+	xfs_daddr_t		blkno,
+	size_t			numblks,
 	xfs_buf_flags_t		flags)
 {
 	struct xfs_buf		*bp;
 	struct xfs_buf		*new_bp;
 	int			error = 0;
 
-	bp = _xfs_buf_find(target, ioff, isize, flags, NULL);
+	bp = _xfs_buf_find(target, blkno, numblks, flags, NULL);
 	if (likely(bp))
 		goto found;
 
-	new_bp = xfs_buf_alloc(target, ioff << BBSHIFT, isize << BBSHIFT,
-			       flags);
+	new_bp = xfs_buf_alloc(target, blkno, numblks, flags);
 	if (unlikely(!new_bp))
 		return NULL;
 
@@ -550,7 +556,7 @@ xfs_buf_get(
 		return NULL;
 	}
 
-	bp = _xfs_buf_find(target, ioff, isize, flags, new_bp);
+	bp = _xfs_buf_find(target, blkno, numblks, flags, new_bp);
 	if (!bp) {
 		xfs_buf_free(new_bp);
 		return NULL;
@@ -563,7 +569,7 @@ xfs_buf_get(
 	 * Now we have a workable buffer, fill in the block number so
 	 * that we can do IO on it.
 	 */
-	bp->b_bn = ioff;
+	bp->b_bn = blkno;
 	bp->b_count_desired = bp->b_buffer_length;
 
 found:
@@ -607,15 +613,15 @@ _xfs_buf_read(
 xfs_buf_t *
 xfs_buf_read(
 	xfs_buftarg_t		*target,
-	xfs_off_t		ioff,
-	size_t			isize,
+	xfs_daddr_t		blkno,
+	size_t			numblks,
 	xfs_buf_flags_t		flags)
 {
 	xfs_buf_t		*bp;
 
 	flags |= XBF_READ;
 
-	bp = xfs_buf_get(target, ioff, isize, flags);
+	bp = xfs_buf_get(target, blkno, numblks, flags);
 	if (bp) {
 		trace_xfs_buf_read(bp, flags, _RET_IP_);
 
@@ -650,13 +656,13 @@ xfs_buf_read(
 void
 xfs_buf_readahead(
 	xfs_buftarg_t		*target,
-	xfs_off_t		ioff,
-	size_t			isize)
+	xfs_daddr_t		blkno,
+	size_t			numblks)
 {
 	if (bdi_read_congested(target->bt_bdi))
 		return;
 
-	xfs_buf_read(target, ioff, isize,
+	xfs_buf_read(target, blkno, numblks,
 		     XBF_TRYLOCK|XBF_ASYNC|XBF_READ_AHEAD|XBF_DONT_BLOCK);
 }
 
@@ -666,16 +672,15 @@ xfs_buf_readahead(
  */
 struct xfs_buf *
 xfs_buf_read_uncached(
-	struct xfs_mount	*mp,
 	struct xfs_buftarg	*target,
 	xfs_daddr_t		daddr,
-	size_t			length,
+	size_t			numblks,
 	int			flags)
 {
 	xfs_buf_t		*bp;
 	int			error;
 
-	bp = xfs_buf_get_uncached(target, length, flags);
+	bp = xfs_buf_get_uncached(target, numblks, flags);
 	if (!bp)
 		return NULL;
 
@@ -683,7 +688,7 @@ xfs_buf_read_uncached(
 	XFS_BUF_SET_ADDR(bp, daddr);
 	XFS_BUF_READ(bp);
 
-	xfsbdstrat(mp, bp);
+	xfsbdstrat(target->bt_mount, bp);
 	error = xfs_buf_iowait(bp);
 	if (error) {
 		xfs_buf_relse(bp);
@@ -699,7 +704,7 @@ xfs_buf_read_uncached(
 void
 xfs_buf_set_empty(
 	struct xfs_buf		*bp,
-	size_t			len)
+	size_t			numblks)
 {
 	if (bp->b_pages)
 		_xfs_buf_free_pages(bp);
@@ -708,7 +713,7 @@ xfs_buf_set_empty(
 	bp->b_page_count = 0;
 	bp->b_addr = NULL;
 	bp->b_file_offset = 0;
-	bp->b_buffer_length = bp->b_count_desired = len;
+	bp->b_buffer_length = bp->b_count_desired = numblks << BBSHIFT;
 	bp->b_bn = XFS_BUF_DADDR_NULL;
 	bp->b_flags &= ~XBF_MAPPED;
 }
@@ -770,17 +775,18 @@ xfs_buf_associate_memory(
 xfs_buf_t *
 xfs_buf_get_uncached(
 	struct xfs_buftarg	*target,
-	size_t			len,
+	size_t			numblks,
 	int			flags)
 {
-	unsigned long		page_count = PAGE_ALIGN(len) >> PAGE_SHIFT;
+	unsigned long		page_count;
 	int			error, i;
 	xfs_buf_t		*bp;
 
-	bp = xfs_buf_alloc(target, 0, len, 0);
+	bp = xfs_buf_alloc(target, 0, numblks, 0);
 	if (unlikely(bp == NULL))
 		goto fail;
 
+	page_count = PAGE_ALIGN(numblks << BBSHIFT) >> PAGE_SHIFT;
 	error = _xfs_buf_get_pages(bp, page_count, 0);
 	if (error)
 		goto fail_free_buf;
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 87a4748..ffd6da0 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -150,26 +150,30 @@ typedef struct xfs_buf {
 
 
 /* Finding and Reading Buffers */
-extern xfs_buf_t *_xfs_buf_find(xfs_buftarg_t *, xfs_off_t, size_t,
-				xfs_buf_flags_t, xfs_buf_t *);
+struct xfs_buf *_xfs_buf_find(struct xfs_buftarg *target, xfs_daddr_t blkno,
+				size_t numblks, xfs_buf_flags_t flags,
+				struct xfs_buf *new_bp);
 #define xfs_incore(buftarg,blkno,len,lockit) \
 	_xfs_buf_find(buftarg, blkno ,len, lockit, NULL)
 
-extern xfs_buf_t *xfs_buf_get(xfs_buftarg_t *, xfs_off_t, size_t,
-				xfs_buf_flags_t);
-extern xfs_buf_t *xfs_buf_read(xfs_buftarg_t *, xfs_off_t, size_t,
-				xfs_buf_flags_t);
-
-struct xfs_buf *xfs_buf_alloc(struct xfs_buftarg *, xfs_off_t, size_t,
-			      xfs_buf_flags_t);
-extern void xfs_buf_set_empty(struct xfs_buf *bp, size_t len);
-extern xfs_buf_t *xfs_buf_get_uncached(struct xfs_buftarg *, size_t, int);
-extern int xfs_buf_associate_memory(xfs_buf_t *, void *, size_t);
-extern void xfs_buf_hold(xfs_buf_t *);
-extern void xfs_buf_readahead(xfs_buftarg_t *, xfs_off_t, size_t);
-struct xfs_buf *xfs_buf_read_uncached(struct xfs_mount *mp,
-				struct xfs_buftarg *target,
-				xfs_daddr_t daddr, size_t length, int flags);
+struct xfs_buf *xfs_buf_get(struct xfs_buftarg *target, xfs_daddr_t blkno,
+				size_t numblks, xfs_buf_flags_t flags);
+struct xfs_buf *xfs_buf_read(struct xfs_buftarg *target, xfs_daddr_t blkno,
+				size_t numblks, xfs_buf_flags_t flags);
+void xfs_buf_readahead(struct xfs_buftarg *target, xfs_daddr_t blkno,
+				size_t numblks);
+
+struct xfs_buf *xfs_buf_get_empty(struct xfs_buftarg *target, size_t numblks);
+struct xfs_buf *xfs_buf_alloc(struct xfs_buftarg *target, xfs_daddr_t blkno,
+				size_t numblks, xfs_buf_flags_t flags);
+void xfs_buf_set_empty(struct xfs_buf *bp, size_t numblks);
+int xfs_buf_associate_memory(struct xfs_buf *bp, void *mem, size_t length);
+
+struct xfs_buf *xfs_buf_get_uncached(struct xfs_buftarg *target, size_t numblks,
+				int flags);
+struct xfs_buf *xfs_buf_read_uncached(struct xfs_buftarg *target,
+				xfs_daddr_t daddr, size_t numblks, int flags);
+void xfs_buf_hold(struct xfs_buf *bp);
 
 /* Releasing Buffers */
 extern void xfs_buf_free(xfs_buf_t *);
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 1c6fdeb..019ba5c 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -147,9 +147,9 @@ xfs_growfs_data_private(
 	if ((error = xfs_sb_validate_fsb_count(&mp->m_sb, nb)))
 		return error;
 	dpct = pct - mp->m_sb.sb_imax_pct;
-	bp = xfs_buf_read_uncached(mp, mp->m_ddev_targp,
+	bp = xfs_buf_read_uncached(mp->m_ddev_targp,
 				XFS_FSB_TO_BB(mp, nb) - XFS_FSS_TO_BB(mp, 1),
-				BBTOB(XFS_FSS_TO_BB(mp, 1)), 0);
+				XFS_FSS_TO_BB(mp, 1), 0);
 	if (!bp)
 		return EIO;
 	xfs_buf_relse(bp);
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 418d5d7..8990012 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1187,7 +1187,7 @@ xlog_alloc_log(xfs_mount_t	*mp,
 	xlog_get_iclog_buffer_size(mp, log);
 
 	error = ENOMEM;
-	bp = xfs_buf_alloc(mp->m_logdev_targp, 0, log->l_iclog_size, 0);
+	bp = xfs_buf_alloc(mp->m_logdev_targp, 0, BTOBB(log->l_iclog_size), 0);
 	if (!bp)
 		goto out_free_log;
 	bp->b_iodone = xlog_iodone;
@@ -1219,7 +1219,7 @@ xlog_alloc_log(xfs_mount_t	*mp,
 		prev_iclog = iclog;
 
 		bp = xfs_buf_get_uncached(mp->m_logdev_targp,
-						log->l_iclog_size, 0);
+						BTOBB(log->l_iclog_size), 0);
 		if (!bp)
 			goto out_free_iclog;
 
@@ -1588,7 +1588,7 @@ xlog_dealloc_log(xlog_t *log)
 	 * always need to ensure that the extra buffer does not point to memory
 	 * owned by another log buffer before we free it.
 	 */
-	xfs_buf_set_empty(log->l_xbuf, log->l_iclog_size);
+	xfs_buf_set_empty(log->l_xbuf, BTOBB(log->l_iclog_size));
 	xfs_buf_free(log->l_xbuf);
 
 	iclog = log->l_iclog;
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 64ed6ff..d94ed40 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -120,7 +120,7 @@ xlog_get_bp(
 		nbblks += log->l_sectBBsize;
 	nbblks = round_up(nbblks, log->l_sectBBsize);
 
-	bp = xfs_buf_get_uncached(log->l_mp->m_logdev_targp, BBTOB(nbblks), 0);
+	bp = xfs_buf_get_uncached(log->l_mp->m_logdev_targp, nbblks, 0);
 	if (bp)
 		xfs_buf_unlock(bp);
 	return bp;
diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index 385a3b1..89be5ff 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -684,8 +684,8 @@ xfs_readsb(xfs_mount_t *mp, int flags)
 	sector_size = xfs_getsize_buftarg(mp->m_ddev_targp);
 
 reread:
-	bp = xfs_buf_read_uncached(mp, mp->m_ddev_targp,
-					XFS_SB_DADDR, sector_size, 0);
+	bp = xfs_buf_read_uncached(mp->m_ddev_targp, XFS_SB_DADDR,
+					BTOBB(sector_size), 0);
 	if (!bp) {
 		if (loud)
 			xfs_warn(mp, "SB buffer read failed");
@@ -1033,9 +1033,9 @@ xfs_check_sizes(xfs_mount_t *mp)
 		xfs_warn(mp, "filesystem size mismatch detected");
 		return XFS_ERROR(EFBIG);
 	}
-	bp = xfs_buf_read_uncached(mp, mp->m_ddev_targp,
+	bp = xfs_buf_read_uncached(mp->m_ddev_targp,
 					d - XFS_FSS_TO_BB(mp, 1),
-					BBTOB(XFS_FSS_TO_BB(mp, 1)), 0);
+					XFS_FSS_TO_BB(mp, 1), 0);
 	if (!bp) {
 		xfs_warn(mp, "last sector read failed");
 		return EIO;
@@ -1048,9 +1048,9 @@ xfs_check_sizes(xfs_mount_t *mp)
 			xfs_warn(mp, "log size mismatch detected");
 			return XFS_ERROR(EFBIG);
 		}
-		bp = xfs_buf_read_uncached(mp, mp->m_logdev_targp,
+		bp = xfs_buf_read_uncached(mp->m_logdev_targp,
 					d - XFS_FSB_TO_BB(mp, 1),
-					XFS_FSB_TO_B(mp, 1), 0);
+					XFS_FSB_TO_BB(mp, 1), 0);
 		if (!bp) {
 			xfs_warn(mp, "log device read failed");
 			return EIO;
diff --git a/fs/xfs/xfs_rtalloc.c b/fs/xfs/xfs_rtalloc.c
index ca4f315..7434d3f 100644
--- a/fs/xfs/xfs_rtalloc.c
+++ b/fs/xfs/xfs_rtalloc.c
@@ -1872,9 +1872,9 @@ xfs_growfs_rt(
 	/*
 	 * Read in the last block of the device, make sure it exists.
 	 */
-	bp = xfs_buf_read_uncached(mp, mp->m_rtdev_targp,
+	bp = xfs_buf_read_uncached(mp->m_rtdev_targp,
 				XFS_FSB_TO_BB(mp, nrblocks - 1),
-				XFS_FSB_TO_B(mp, 1), 0);
+				XFS_FSB_TO_BB(mp, 1), 0);
 	if (!bp)
 		return EIO;
 	xfs_buf_relse(bp);
@@ -2219,9 +2219,9 @@ xfs_rtmount_init(
 			(unsigned long long) mp->m_sb.sb_rblocks);
 		return XFS_ERROR(EFBIG);
 	}
-	bp = xfs_buf_read_uncached(mp, mp->m_rtdev_targp,
+	bp = xfs_buf_read_uncached(mp->m_rtdev_targp,
 					d - XFS_FSB_TO_BB(mp, 1),
-					XFS_FSB_TO_B(mp, 1), 0);
+					XFS_FSB_TO_BB(mp, 1), 0);
 	if (!bp) {
 		xfs_warn(mp, "realtime device size check failed");
 		return EIO;
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 64981d7..445c224 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -1966,7 +1966,7 @@ xfs_zero_remaining_bytes(
 
 	bp = xfs_buf_get_uncached(XFS_IS_REALTIME_INODE(ip) ?
 					mp->m_rtdev_targp : mp->m_ddev_targp,
-				mp->m_sb.sb_blocksize, XBF_DONT_BLOCK);
+				BTOBB(mp->m_sb.sb_blocksize), XBF_DONT_BLOCK);
 	if (!bp)
 		return XFS_ERROR(ENOMEM);
 
-- 
1.7.9.5


* [PATCH 11/18] xfs: kill b_file_offset
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (9 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 10/18] xfs: clean up buffer get/read call API Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 12:10 ` [PATCH 12/18] xfs: use blocks for counting length of buffers Dave Chinner
                   ` (7 subsequent siblings)
  18 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Seeing as we pass block numbers around everywhere in the buffer
cache now, it makes no sense to index everything by byte offset.
Replace all the byte offset indexing with block number based
indexing, and replace all uses of the byte offset with direct
conversion from the block index.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
---
 fs/xfs/xfs_buf.c |   17 +++++++----------
 fs/xfs/xfs_buf.h |    5 +----
 2 files changed, 8 insertions(+), 14 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index c104cbb..58561d9 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -190,7 +190,7 @@ xfs_buf_alloc(
 	sema_init(&bp->b_sema, 0); /* held, no waiters */
 	XB_SET_OWNER(bp);
 	bp->b_target = target;
-	bp->b_file_offset = blkno << BBSHIFT;
+
 	/*
 	 * Set buffer_length and count_desired to the same value initially.
 	 * I/O routines should use count_desired, which will be the same in
@@ -331,8 +331,8 @@ xfs_buf_allocate_memory(
 	}
 
 use_alloc_page:
-	end = bp->b_file_offset + bp->b_buffer_length;
-	page_count = xfs_buf_btoc(end) - xfs_buf_btoct(bp->b_file_offset);
+	end = BBTOB(bp->b_bn) + bp->b_buffer_length;
+	page_count = xfs_buf_btoc(end) - xfs_buf_btoct(BBTOB(bp->b_bn));
 	error = _xfs_buf_get_pages(bp, page_count, flags);
 	if (unlikely(error))
 		return error;
@@ -433,19 +433,17 @@ _xfs_buf_find(
 	xfs_buf_flags_t		flags,
 	xfs_buf_t		*new_bp)
 {
-	xfs_off_t		offset;
 	size_t			numbytes;
 	struct xfs_perag	*pag;
 	struct rb_node		**rbp;
 	struct rb_node		*parent;
 	xfs_buf_t		*bp;
 
-	offset = BBTOB(blkno);
 	numbytes = BBTOB(numblks);
 
 	/* Check for IOs smaller than the sector size / not sector aligned */
 	ASSERT(!(numbytes < (1 << btp->bt_sshift)));
-	ASSERT(!(offset & (xfs_off_t)btp->bt_smask));
+	ASSERT(!(BBTOB(blkno) & (xfs_off_t)btp->bt_smask));
 
 	/* get tree root */
 	pag = xfs_perag_get(btp->bt_mount,
@@ -460,13 +458,13 @@ _xfs_buf_find(
 		parent = *rbp;
 		bp = rb_entry(parent, struct xfs_buf, b_rbnode);
 
-		if (offset < bp->b_file_offset)
+		if (blkno < bp->b_bn)
 			rbp = &(*rbp)->rb_left;
-		else if (offset > bp->b_file_offset)
+		else if (blkno > bp->b_bn)
 			rbp = &(*rbp)->rb_right;
 		else {
 			/*
-			 * found a block offset match. If the range doesn't
+			 * found a block number match. If the range doesn't
 			 * match, the only way this is allowed is if the buffer
 			 * in the cache is stale and the transaction that made
 			 * it stale has not yet committed. i.e. we are
@@ -712,7 +710,6 @@ xfs_buf_set_empty(
 	bp->b_pages = NULL;
 	bp->b_page_count = 0;
 	bp->b_addr = NULL;
-	bp->b_file_offset = 0;
 	bp->b_buffer_length = bp->b_count_desired = numblks << BBSHIFT;
 	bp->b_bn = XFS_BUF_DADDR_NULL;
 	bp->b_flags &= ~XBF_MAPPED;
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index ffd6da0..4d472e5 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -116,7 +116,7 @@ typedef struct xfs_buf {
 	 * fast-path on locking.
 	 */
 	struct rb_node		b_rbnode;	/* rbtree node */
-	xfs_off_t		b_file_offset;	/* offset in file */
+	xfs_daddr_t		b_bn;		/* block number for I/O */
 	size_t			b_buffer_length;/* size of buffer in bytes */
 	atomic_t		b_hold;		/* reference count */
 	atomic_t		b_lru_ref;	/* lru reclaim ref count */
@@ -128,7 +128,6 @@ typedef struct xfs_buf {
 	struct list_head	b_list;
 	struct xfs_perag	*b_pag;		/* contains rbtree root */
 	xfs_buftarg_t		*b_target;	/* buffer target (device) */
-	xfs_daddr_t		b_bn;		/* block number for I/O */
 	size_t			b_count_desired;/* desired transfer size */
 	void			*b_addr;	/* virtual address of buffer */
 	struct work_struct	b_iodone_work;
@@ -245,8 +244,6 @@ void xfs_buf_stale(struct xfs_buf *bp);
 
 #define XFS_BUF_ADDR(bp)		((bp)->b_bn)
 #define XFS_BUF_SET_ADDR(bp, bno)	((bp)->b_bn = (xfs_daddr_t)(bno))
-#define XFS_BUF_OFFSET(bp)		((bp)->b_file_offset)
-#define XFS_BUF_SET_OFFSET(bp, off)	((bp)->b_file_offset = (off))
 #define XFS_BUF_COUNT(bp)		((bp)->b_count_desired)
 #define XFS_BUF_SET_COUNT(bp, cnt)	((bp)->b_count_desired = (cnt))
 #define XFS_BUF_SIZE(bp)		((bp)->b_buffer_length)
-- 
1.7.9.5


* [PATCH 12/18] xfs: use blocks for counting length of buffers
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (10 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 11/18] xfs: kill b_file_offset Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 12:10 ` [PATCH 13/18] xfs: use blocks for storing the desired IO size Dave Chinner
                   ` (6 subsequent siblings)
  18 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Now that we pass block counts everywhere, and index buffers by block
number, track the length of the buffer in units of blocks rather
than bytes. Convert the code to use block counts, and those that
need byte counts get converted at the time of use.

Also, remove the XFS_BUF_{SET_}SIZE() macros that are just wrappers
around the buffer length. They only serve to make the code shouty
loud and don't actually add any real value.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
---
 fs/xfs/xfs_attr.c        |   15 +++++++++------
 fs/xfs/xfs_buf.c         |   22 ++++++++++++----------
 fs/xfs/xfs_buf.h         |    4 +---
 fs/xfs/xfs_log.c         |    5 +----
 fs/xfs/xfs_log_recover.c |    8 ++++----
 fs/xfs/xfs_trace.h       |   14 +++++++-------
 6 files changed, 34 insertions(+), 34 deletions(-)

diff --git a/fs/xfs/xfs_attr.c b/fs/xfs/xfs_attr.c
index 65d61b9..6e9bd7e 100644
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -1993,8 +1993,7 @@ xfs_attr_rmtval_get(xfs_da_args_t *args)
 			if (error)
 				return(error);
 
-			tmp = (valuelen < XFS_BUF_SIZE(bp))
-				? valuelen : XFS_BUF_SIZE(bp);
+			tmp = min_t(int, valuelen, BBTOB(bp->b_length));
 			xfs_buf_iomove(bp, 0, tmp, dst, XBRW_READ);
 			xfs_buf_relse(bp);
 			dst += tmp;
@@ -2097,6 +2096,8 @@ xfs_attr_rmtval_set(xfs_da_args_t *args)
 	lblkno = args->rmtblkno;
 	valuelen = args->valuelen;
 	while (valuelen > 0) {
+		int buflen;
+
 		/*
 		 * Try to remember where we decided to put the value.
 		 */
@@ -2118,11 +2119,13 @@ xfs_attr_rmtval_set(xfs_da_args_t *args)
 				 XBF_LOCK | XBF_DONT_BLOCK);
 		if (!bp)
 			return ENOMEM;
-		tmp = (valuelen < XFS_BUF_SIZE(bp)) ? valuelen :
-							XFS_BUF_SIZE(bp);
+
+		buflen = BBTOB(bp->b_length);
+		tmp = min_t(int, valuelen, buflen);
 		xfs_buf_iomove(bp, 0, tmp, src, XBRW_WRITE);
-		if (tmp < XFS_BUF_SIZE(bp))
-			xfs_buf_zero(bp, tmp, XFS_BUF_SIZE(bp) - tmp);
+		if (tmp < buflen)
+			xfs_buf_zero(bp, tmp, buflen - tmp);
+
 		error = xfs_bwrite(bp);	/* GROT: NOTE: synchronous write */
 		xfs_buf_relse(bp);
 		if (error)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 58561d9..088b4f0 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -192,11 +192,12 @@ xfs_buf_alloc(
 	bp->b_target = target;
 
 	/*
-	 * Set buffer_length and count_desired to the same value initially.
+	 * Set length and count_desired to the same value initially.
 	 * I/O routines should use count_desired, which will be the same in
 	 * most cases but may be reset (e.g. XFS recovery).
 	 */
-	bp->b_buffer_length = bp->b_count_desired = numblks << BBSHIFT;
+	bp->b_length = numblks;
+	bp->b_count_desired = numblks << BBSHIFT;
 	bp->b_flags = flags;
 
 	/*
@@ -307,14 +308,14 @@ xfs_buf_allocate_memory(
 	 * the memory from the heap - there's no need for the complexity of
 	 * page arrays to keep allocation down to order 0.
 	 */
-	if (bp->b_buffer_length < PAGE_SIZE) {
-		bp->b_addr = kmem_alloc(bp->b_buffer_length, xb_to_km(flags));
+	if (bp->b_length < BTOBB(PAGE_SIZE)) {
+		bp->b_addr = kmem_alloc(BBTOB(bp->b_length), xb_to_km(flags));
 		if (!bp->b_addr) {
 			/* low memory - use alloc_page loop instead */
 			goto use_alloc_page;
 		}
 
-		if (((unsigned long)(bp->b_addr + bp->b_buffer_length - 1) &
+		if (((unsigned long)(bp->b_addr + BBTOB(bp->b_length) - 1) &
 								PAGE_MASK) !=
 		    ((unsigned long)bp->b_addr & PAGE_MASK)) {
 			/* b_addr spans two pages - use alloc_page instead */
@@ -331,7 +332,7 @@ xfs_buf_allocate_memory(
 	}
 
 use_alloc_page:
-	end = BBTOB(bp->b_bn) + bp->b_buffer_length;
+	end = BBTOB(bp->b_bn + bp->b_length);
 	page_count = xfs_buf_btoc(end) - xfs_buf_btoct(BBTOB(bp->b_bn));
 	error = _xfs_buf_get_pages(bp, page_count, flags);
 	if (unlikely(error))
@@ -471,7 +472,7 @@ _xfs_buf_find(
 			 * reallocating a busy extent. Skip this buffer and
 			 * continue searching to the right for an exact match.
 			 */
-			if (bp->b_buffer_length != numbytes) {
+			if (bp->b_length != numblks) {
 				ASSERT(bp->b_flags & XBF_STALE);
 				rbp = &(*rbp)->rb_right;
 				continue;
@@ -568,7 +569,7 @@ xfs_buf_get(
 	 * that we can do IO on it.
 	 */
 	bp->b_bn = blkno;
-	bp->b_count_desired = bp->b_buffer_length;
+	bp->b_count_desired = BBTOB(bp->b_length);
 
 found:
 	if (!(bp->b_flags & XBF_MAPPED)) {
@@ -710,7 +711,8 @@ xfs_buf_set_empty(
 	bp->b_pages = NULL;
 	bp->b_page_count = 0;
 	bp->b_addr = NULL;
-	bp->b_buffer_length = bp->b_count_desired = numblks << BBSHIFT;
+	bp->b_length = numblks;
+	bp->b_count_desired = numblks << BBSHIFT;
 	bp->b_bn = XFS_BUF_DADDR_NULL;
 	bp->b_flags &= ~XBF_MAPPED;
 }
@@ -763,7 +765,7 @@ xfs_buf_associate_memory(
 	}
 
 	bp->b_count_desired = len;
-	bp->b_buffer_length = buflen;
+	bp->b_length = BTOBB(buflen);
 	bp->b_flags |= XBF_MAPPED;
 
 	return 0;
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 4d472e5..3dab208 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -117,7 +117,7 @@ typedef struct xfs_buf {
 	 */
 	struct rb_node		b_rbnode;	/* rbtree node */
 	xfs_daddr_t		b_bn;		/* block number for I/O */
-	size_t			b_buffer_length;/* size of buffer in bytes */
+	int			b_length;	/* size of buffer in BBs */
 	atomic_t		b_hold;		/* reference count */
 	atomic_t		b_lru_ref;	/* lru reclaim ref count */
 	xfs_buf_flags_t		b_flags;	/* status flags */
@@ -246,8 +246,6 @@ void xfs_buf_stale(struct xfs_buf *bp);
 #define XFS_BUF_SET_ADDR(bp, bno)	((bp)->b_bn = (xfs_daddr_t)(bno))
 #define XFS_BUF_COUNT(bp)		((bp)->b_count_desired)
 #define XFS_BUF_SET_COUNT(bp, cnt)	((bp)->b_count_desired = (cnt))
-#define XFS_BUF_SIZE(bp)		((bp)->b_buffer_length)
-#define XFS_BUF_SET_SIZE(bp, cnt)	((bp)->b_buffer_length = (cnt))
 
 static inline void xfs_buf_set_ref(struct xfs_buf *bp, int lru_ref)
 {
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 8990012..f9d8355 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1197,9 +1197,6 @@ xlog_alloc_log(xfs_mount_t	*mp,
 	spin_lock_init(&log->l_icloglock);
 	init_waitqueue_head(&log->l_flush_wait);
 
-	/* log record size must be multiple of BBSIZE; see xlog_rec_header_t */
-	ASSERT((XFS_BUF_SIZE(bp) & BBMASK) == 0);
-
 	iclogp = &log->l_iclog;
 	/*
 	 * The amount of memory to allocate for the iclog structure is
@@ -1239,7 +1236,7 @@ xlog_alloc_log(xfs_mount_t	*mp,
 		head->h_fmt = cpu_to_be32(XLOG_FMT);
 		memcpy(&head->h_fs_uuid, &mp->m_sb.sb_uuid, sizeof(uuid_t));
 
-		iclog->ic_size = XFS_BUF_SIZE(bp) - log->l_iclog_hsize;
+		iclog->ic_size = BBTOB(bp->b_length) - log->l_iclog_hsize;
 		iclog->ic_state = XLOG_STATE_ACTIVE;
 		iclog->ic_log = log;
 		atomic_set(&iclog->ic_refcnt, 0);
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index d94ed40..15b7470 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -146,7 +146,7 @@ xlog_align(
 {
 	xfs_daddr_t	offset = blk_no & ((xfs_daddr_t)log->l_sectBBsize - 1);
 
-	ASSERT(BBTOB(offset + nbblks) <= XFS_BUF_SIZE(bp));
+	ASSERT(offset + nbblks <= bp->b_length);
 	return bp->b_addr + BBTOB(offset);
 }
 
@@ -174,7 +174,7 @@ xlog_bread_noalign(
 	nbblks = round_up(nbblks, log->l_sectBBsize);
 
 	ASSERT(nbblks > 0);
-	ASSERT(BBTOB(nbblks) <= XFS_BUF_SIZE(bp));
+	ASSERT(nbblks <= bp->b_length);
 
 	XFS_BUF_SET_ADDR(bp, log->l_logBBstart + blk_no);
 	XFS_BUF_READ(bp);
@@ -219,7 +219,7 @@ xlog_bread_offset(
 	xfs_caddr_t	offset)
 {
 	xfs_caddr_t	orig_offset = bp->b_addr;
-	int		orig_len = bp->b_buffer_length;
+	int		orig_len = BBTOB(bp->b_length);
 	int		error, error2;
 
 	error = xfs_buf_associate_memory(bp, offset, BBTOB(nbblks));
@@ -260,7 +260,7 @@ xlog_bwrite(
 	nbblks = round_up(nbblks, log->l_sectBBsize);
 
 	ASSERT(nbblks > 0);
-	ASSERT(BBTOB(nbblks) <= XFS_BUF_SIZE(bp));
+	ASSERT(nbblks <= bp->b_length);
 
 	XFS_BUF_SET_ADDR(bp, log->l_logBBstart + blk_no);
 	XFS_BUF_ZEROFLAGS(bp);
diff --git a/fs/xfs/xfs_trace.h b/fs/xfs/xfs_trace.h
index 2e41756..900764c 100644
--- a/fs/xfs/xfs_trace.h
+++ b/fs/xfs/xfs_trace.h
@@ -281,7 +281,7 @@ DECLARE_EVENT_CLASS(xfs_buf_class,
 	TP_STRUCT__entry(
 		__field(dev_t, dev)
 		__field(xfs_daddr_t, bno)
-		__field(size_t, buffer_length)
+		__field(int, nblks)
 		__field(int, hold)
 		__field(int, pincount)
 		__field(unsigned, lockval)
@@ -291,18 +291,18 @@ DECLARE_EVENT_CLASS(xfs_buf_class,
 	TP_fast_assign(
 		__entry->dev = bp->b_target->bt_dev;
 		__entry->bno = bp->b_bn;
-		__entry->buffer_length = bp->b_buffer_length;
+		__entry->nblks = bp->b_length;
 		__entry->hold = atomic_read(&bp->b_hold);
 		__entry->pincount = atomic_read(&bp->b_pin_count);
 		__entry->lockval = bp->b_sema.count;
 		__entry->flags = bp->b_flags;
 		__entry->caller_ip = caller_ip;
 	),
-	TP_printk("dev %d:%d bno 0x%llx len 0x%zx hold %d pincount %d "
+	TP_printk("dev %d:%d bno 0x%llx nblks 0x%x hold %d pincount %d "
 		  "lock %d flags %s caller %pf",
 		  MAJOR(__entry->dev), MINOR(__entry->dev),
 		  (unsigned long long)__entry->bno,
-		  __entry->buffer_length,
+		  __entry->nblks,
 		  __entry->hold,
 		  __entry->pincount,
 		  __entry->lockval,
@@ -362,7 +362,7 @@ DECLARE_EVENT_CLASS(xfs_buf_flags_class,
 	TP_fast_assign(
 		__entry->dev = bp->b_target->bt_dev;
 		__entry->bno = bp->b_bn;
-		__entry->buffer_length = bp->b_buffer_length;
+		__entry->buffer_length = BBTOB(bp->b_length);
 		__entry->flags = flags;
 		__entry->hold = atomic_read(&bp->b_hold);
 		__entry->pincount = atomic_read(&bp->b_pin_count);
@@ -406,7 +406,7 @@ TRACE_EVENT(xfs_buf_ioerror,
 	TP_fast_assign(
 		__entry->dev = bp->b_target->bt_dev;
 		__entry->bno = bp->b_bn;
-		__entry->buffer_length = bp->b_buffer_length;
+		__entry->buffer_length = BBTOB(bp->b_length);
 		__entry->hold = atomic_read(&bp->b_hold);
 		__entry->pincount = atomic_read(&bp->b_pin_count);
 		__entry->lockval = bp->b_sema.count;
@@ -450,7 +450,7 @@ DECLARE_EVENT_CLASS(xfs_buf_item_class,
 		__entry->bli_recur = bip->bli_recur;
 		__entry->bli_refcount = atomic_read(&bip->bli_refcount);
 		__entry->buf_bno = bip->bli_buf->b_bn;
-		__entry->buf_len = bip->bli_buf->b_buffer_length;
+		__entry->buf_len = BBTOB(bip->bli_buf->b_length);
 		__entry->buf_flags = bip->bli_buf->b_flags;
 		__entry->buf_hold = atomic_read(&bip->bli_buf->b_hold);
 		__entry->buf_pincount = atomic_read(&bip->bli_buf->b_pin_count);
-- 
1.7.9.5


* [PATCH 13/18] xfs: use blocks for storing the desired IO size
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (11 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 12/18] xfs: use blocks for counting length of buffers Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 12:10 ` [PATCH 14/18] xfs: kill xfs_buf_btoc Dave Chinner
                   ` (5 subsequent siblings)
  18 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

Now that we pass block counts everywhere, and index buffers by block
number and length in units of blocks, convert the desired IO size
into block counts rather than bytes. Convert the code to use block
counts, and those that need byte counts get converted at the time of
use.

Rename the b_count_desired variable to something closer to its
purpose - b_io_length - as it is only used to specify the length of
an IO for a subset of the buffer.  The only time this is used is for
log IO - both writing iclogs and during log recovery. In all other
cases, the b_io_length matches b_length, and hence a lot of code
confuses the two. e.g. the buf item code uses the io count
exclusively when it should be using the buffer length. Fix these
appropriately as they are found.

Also, remove the XFS_BUF_{SET_}COUNT() macros that are just wrappers
around the desired IO length. They only serve to make the code
shouty loud, don't actually add any real value, and are often used
incorrectly.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
---
 fs/xfs/xfs_buf.c         |   26 +++++++++++++-------------
 fs/xfs/xfs_buf.h         |    4 +---
 fs/xfs/xfs_buf_item.c    |   15 ++++++++-------
 fs/xfs/xfs_da_btree.c    |   16 ++++++++--------
 fs/xfs/xfs_log.c         |    2 +-
 fs/xfs/xfs_log_recover.c |   15 ++++++++-------
 fs/xfs/xfs_trans_buf.c   |    4 ++--
 7 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 088b4f0..b508eaf 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -192,12 +192,12 @@ xfs_buf_alloc(
 	bp->b_target = target;
 
 	/*
-	 * Set length and count_desired to the same value initially.
-	 * I/O routines should use count_desired, which will be the same in
+	 * Set length and io_length to the same value initially.
+	 * I/O routines should use io_length, which will be the same in
 	 * most cases but may be reset (e.g. XFS recovery).
 	 */
 	bp->b_length = numblks;
-	bp->b_count_desired = numblks << BBSHIFT;
+	bp->b_io_length = numblks;
 	bp->b_flags = flags;
 
 	/*
@@ -296,7 +296,7 @@ xfs_buf_allocate_memory(
 	xfs_buf_t		*bp,
 	uint			flags)
 {
-	size_t			size = bp->b_count_desired;
+	size_t			size;
 	size_t			nbytes, offset;
 	gfp_t			gfp_mask = xb_to_gfp(flags);
 	unsigned short		page_count, i;
@@ -339,6 +339,7 @@ use_alloc_page:
 		return error;
 
 	offset = bp->b_offset;
+	size = BBTOB(bp->b_length);
 	bp->b_flags |= _XBF_PAGES;
 
 	for (i = 0; i < bp->b_page_count; i++) {
@@ -569,7 +570,7 @@ xfs_buf_get(
 	 * that we can do IO on it.
 	 */
 	bp->b_bn = blkno;
-	bp->b_count_desired = BBTOB(bp->b_length);
+	bp->b_io_length = bp->b_length;
 
 found:
 	if (!(bp->b_flags & XBF_MAPPED)) {
@@ -712,7 +713,7 @@ xfs_buf_set_empty(
 	bp->b_page_count = 0;
 	bp->b_addr = NULL;
 	bp->b_length = numblks;
-	bp->b_count_desired = numblks << BBSHIFT;
+	bp->b_io_length = numblks;
 	bp->b_bn = XFS_BUF_DADDR_NULL;
 	bp->b_flags &= ~XBF_MAPPED;
 }
@@ -764,7 +765,7 @@ xfs_buf_associate_memory(
 		pageaddr += PAGE_SIZE;
 	}
 
-	bp->b_count_desired = len;
+	bp->b_io_length = BTOBB(len);
 	bp->b_length = BTOBB(buflen);
 	bp->b_flags |= XBF_MAPPED;
 
@@ -1006,9 +1007,8 @@ xfs_buf_ioerror_alert(
 	const char		*func)
 {
 	xfs_alert(bp->b_target->bt_mount,
-"metadata I/O error: block 0x%llx (\"%s\") error %d buf count %zd",
-		(__uint64_t)XFS_BUF_ADDR(bp), func,
-		bp->b_error, XFS_BUF_COUNT(bp));
+"metadata I/O error: block 0x%llx (\"%s\") error %d numblks %d",
+		(__uint64_t)XFS_BUF_ADDR(bp), func, bp->b_error, bp->b_length);
 }
 
 int
@@ -1180,7 +1180,7 @@ _xfs_buf_ioapply(
 	int			rw, map_i, total_nr_pages, nr_pages;
 	struct bio		*bio;
 	int			offset = bp->b_offset;
-	int			size = bp->b_count_desired;
+	int			size = BBTOB(bp->b_io_length);
 	sector_t		sector = bp->b_bn;
 
 	total_nr_pages = bp->b_page_count;
@@ -1228,7 +1228,7 @@ next_chunk:
 			break;
 
 		offset = 0;
-		sector += nbytes >> BBSHIFT;
+		sector += BTOBB(nbytes);
 		size -= nbytes;
 		total_nr_pages--;
 	}
@@ -1322,7 +1322,7 @@ xfs_buf_iomove(
 		page = bp->b_pages[xfs_buf_btoct(boff + bp->b_offset)];
 		cpoff = xfs_buf_poff(boff + bp->b_offset);
 		csize = min_t(size_t,
-			      PAGE_SIZE-cpoff, bp->b_count_desired-boff);
+			      PAGE_SIZE - cpoff, BBTOB(bp->b_io_length) - boff);
 
 		ASSERT(((csize + cpoff) <= PAGE_SIZE));
 
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 3dab208..9787645 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -128,7 +128,6 @@ typedef struct xfs_buf {
 	struct list_head	b_list;
 	struct xfs_perag	*b_pag;		/* contains rbtree root */
 	xfs_buftarg_t		*b_target;	/* buffer target (device) */
-	size_t			b_count_desired;/* desired transfer size */
 	void			*b_addr;	/* virtual address of buffer */
 	struct work_struct	b_iodone_work;
 	xfs_buf_iodone_t	b_iodone;	/* I/O completion function */
@@ -137,6 +136,7 @@ typedef struct xfs_buf {
 	struct xfs_trans	*b_transp;
 	struct page		**b_pages;	/* array of page pointers */
 	struct page		*b_page_array[XB_PAGES]; /* inline pages */
+	int			b_io_length;	/* IO size in BBs */
 	atomic_t		b_pin_count;	/* pin count */
 	atomic_t		b_io_remaining;	/* #outstanding I/O requests */
 	unsigned int		b_page_count;	/* size of page array */
@@ -244,8 +244,6 @@ void xfs_buf_stale(struct xfs_buf *bp);
 
 #define XFS_BUF_ADDR(bp)		((bp)->b_bn)
 #define XFS_BUF_SET_ADDR(bp, bno)	((bp)->b_bn = (xfs_daddr_t)(bno))
-#define XFS_BUF_COUNT(bp)		((bp)->b_count_desired)
-#define XFS_BUF_SET_COUNT(bp, cnt)	((bp)->b_count_desired = (cnt))
 
 static inline void xfs_buf_set_ref(struct xfs_buf *bp, int lru_ref)
 {
diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index 7f0abea..a25206c 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -123,11 +123,11 @@ xfs_buf_item_log_check(
 	ASSERT(bip->bli_logged != NULL);
 
 	bp = bip->bli_buf;
-	ASSERT(XFS_BUF_COUNT(bp) > 0);
+	ASSERT(bp->b_length > 0);
 	ASSERT(bp->b_addr != NULL);
 	orig = bip->bli_orig;
 	buffer = bp->b_addr;
-	for (x = 0; x < XFS_BUF_COUNT(bp); x++) {
+	for (x = 0; x < BBTOB(bp->b_length); x++) {
 		if (orig[x] != buffer[x] && !btst(bip->bli_logged, x)) {
 			xfs_emerg(bp->b_mount,
 				"%s: bip %x buffer %x orig %x index %d",
@@ -657,7 +657,8 @@ xfs_buf_item_init(
 	 * truncate any pieces.  map_size is the size of the
 	 * bitmap needed to describe the chunks of the buffer.
 	 */
-	chunks = (int)((XFS_BUF_COUNT(bp) + (XFS_BLF_CHUNK - 1)) >> XFS_BLF_SHIFT);
+	chunks = (int)((BBTOB(bp->b_length) + (XFS_BLF_CHUNK - 1)) >>
+								XFS_BLF_SHIFT);
 	map_size = (int)((chunks + NBWORD) >> BIT_TO_WORD_SHIFT);
 
 	bip = (xfs_buf_log_item_t*)kmem_zone_zalloc(xfs_buf_item_zone,
@@ -667,7 +668,7 @@ xfs_buf_item_init(
 	xfs_buf_hold(bp);
 	bip->bli_format.blf_type = XFS_LI_BUF;
 	bip->bli_format.blf_blkno = (__int64_t)XFS_BUF_ADDR(bp);
-	bip->bli_format.blf_len = (ushort)BTOBB(XFS_BUF_COUNT(bp));
+	bip->bli_format.blf_len = (ushort)bp->b_length;
 	bip->bli_format.blf_map_size = map_size;
 
 #ifdef XFS_TRANS_DEBUG
@@ -679,9 +680,9 @@ xfs_buf_item_init(
 	 * the buffer to indicate which bytes the callers have asked
 	 * to have logged.
 	 */
-	bip->bli_orig = (char *)kmem_alloc(XFS_BUF_COUNT(bp), KM_SLEEP);
-	memcpy(bip->bli_orig, bp->b_addr, XFS_BUF_COUNT(bp));
-	bip->bli_logged = (char *)kmem_zalloc(XFS_BUF_COUNT(bp) / NBBY, KM_SLEEP);
+	bip->bli_orig = kmem_alloc(BBTOB(bp->b_length), KM_SLEEP);
+	memcpy(bip->bli_orig, bp->b_addr, BBTOB(bp->b_length));
+	bip->bli_logged = kmem_zalloc(BBTOB(bp->b_length) / NBBY, KM_SLEEP);
 #endif
 
 	/*
diff --git a/fs/xfs/xfs_da_btree.c b/fs/xfs/xfs_da_btree.c
index 7f1a6f5..b8adc79 100644
--- a/fs/xfs/xfs_da_btree.c
+++ b/fs/xfs/xfs_da_btree.c
@@ -2277,20 +2277,20 @@ xfs_da_buf_make(int nbuf, xfs_buf_t **bps)
 	if (nbuf == 1) {
 		dabuf->nbuf = 1;
 		bp = bps[0];
-		dabuf->bbcount = (short)BTOBB(XFS_BUF_COUNT(bp));
+		dabuf->bbcount = bp->b_length;
 		dabuf->data = bp->b_addr;
 		dabuf->bps[0] = bp;
 	} else {
 		dabuf->nbuf = nbuf;
 		for (i = 0, dabuf->bbcount = 0; i < nbuf; i++) {
 			dabuf->bps[i] = bp = bps[i];
-			dabuf->bbcount += BTOBB(XFS_BUF_COUNT(bp));
+			dabuf->bbcount += bp->b_length;
 		}
 		dabuf->data = kmem_alloc(BBTOB(dabuf->bbcount), KM_SLEEP);
-		for (i = off = 0; i < nbuf; i++, off += XFS_BUF_COUNT(bp)) {
+		for (i = off = 0; i < nbuf; i++, off += BBTOB(bp->b_length)) {
 			bp = bps[i];
 			memcpy((char *)dabuf->data + off, bp->b_addr,
-				XFS_BUF_COUNT(bp));
+				BBTOB(bp->b_length));
 		}
 	}
 	return dabuf;
@@ -2310,10 +2310,10 @@ xfs_da_buf_clean(xfs_dabuf_t *dabuf)
 		ASSERT(dabuf->nbuf > 1);
 		dabuf->dirty = 0;
 		for (i = off = 0; i < dabuf->nbuf;
-				i++, off += XFS_BUF_COUNT(bp)) {
+				i++, off += BBTOB(bp->b_length)) {
 			bp = dabuf->bps[i];
 			memcpy(bp->b_addr, dabuf->data + off,
-						XFS_BUF_COUNT(bp));
+						BBTOB(bp->b_length));
 		}
 	}
 }
@@ -2356,10 +2356,10 @@ xfs_da_log_buf(xfs_trans_t *tp, xfs_dabuf_t *dabuf, uint first, uint last)
 	}
 	dabuf->dirty = 1;
 	ASSERT(first <= last);
-	for (i = off = 0; i < dabuf->nbuf; i++, off += XFS_BUF_COUNT(bp)) {
+	for (i = off = 0; i < dabuf->nbuf; i++, off += BBTOB(bp->b_length)) {
 		bp = dabuf->bps[i];
 		f = off;
-		l = f + XFS_BUF_COUNT(bp) - 1;
+		l = f + BBTOB(bp->b_length) - 1;
 		if (f < first)
 			f = first;
 		if (l > last)
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index f9d8355..5e2aa52 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -1487,7 +1487,7 @@ xlog_sync(xlog_t		*log,
 	} else {
 		iclog->ic_bwritecnt = 1;
 	}
-	XFS_BUF_SET_COUNT(bp, count);
+	bp->b_io_length = BTOBB(count);
 	bp->b_fspriv = iclog;
 	XFS_BUF_ZEROFLAGS(bp);
 	XFS_BUF_ASYNC(bp);
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 15b7470..0872d71 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -178,7 +178,7 @@ xlog_bread_noalign(
 
 	XFS_BUF_SET_ADDR(bp, log->l_logBBstart + blk_no);
 	XFS_BUF_READ(bp);
-	XFS_BUF_SET_COUNT(bp, BBTOB(nbblks));
+	bp->b_io_length = nbblks;
 	bp->b_error = 0;
 
 	xfsbdstrat(log->l_mp, bp);
@@ -266,7 +266,7 @@ xlog_bwrite(
 	XFS_BUF_ZEROFLAGS(bp);
 	xfs_buf_hold(bp);
 	xfs_buf_lock(bp);
-	XFS_BUF_SET_COUNT(bp, BBTOB(nbblks));
+	bp->b_io_length = nbblks;
 	bp->b_error = 0;
 
 	error = xfs_bwrite(bp);
@@ -1774,7 +1774,7 @@ xlog_recover_do_inode_buffer(
 
 	trace_xfs_log_recover_buf_inode_buf(mp->m_log, buf_f);
 
-	inodes_per_buf = XFS_BUF_COUNT(bp) >> mp->m_sb.sb_inodelog;
+	inodes_per_buf = BBTOB(bp->b_io_length) >> mp->m_sb.sb_inodelog;
 	for (i = 0; i < inodes_per_buf; i++) {
 		next_unlinked_offset = (i * mp->m_sb.sb_inodesize) +
 			offsetof(xfs_dinode_t, di_next_unlinked);
@@ -1816,7 +1816,8 @@ xlog_recover_do_inode_buffer(
 
 		ASSERT(item->ri_buf[item_index].i_addr != NULL);
 		ASSERT((item->ri_buf[item_index].i_len % XFS_BLF_CHUNK) == 0);
-		ASSERT((reg_buf_offset + reg_buf_bytes) <= XFS_BUF_COUNT(bp));
+		ASSERT((reg_buf_offset + reg_buf_bytes) <=
+							BBTOB(bp->b_io_length));
 
 		/*
 		 * The current logged region contains a copy of the
@@ -1875,8 +1876,8 @@ xlog_recover_do_reg_buffer(
 		ASSERT(nbits > 0);
 		ASSERT(item->ri_buf[i].i_addr != NULL);
 		ASSERT(item->ri_buf[i].i_len % XFS_BLF_CHUNK == 0);
-		ASSERT(XFS_BUF_COUNT(bp) >=
-		       ((uint)bit << XFS_BLF_SHIFT)+(nbits<<XFS_BLF_SHIFT));
+		ASSERT(BBTOB(bp->b_io_length) >=
+		       ((uint)bit << XFS_BLF_SHIFT) + (nbits << XFS_BLF_SHIFT));
 
 		/*
 		 * Do a sanity check if this is a dquot buffer. Just checking
@@ -2169,7 +2170,7 @@ xlog_recover_buffer_pass2(
 	 */
 	if (XFS_DINODE_MAGIC ==
 	    be16_to_cpu(*((__be16 *)xfs_buf_offset(bp, 0))) &&
-	    (XFS_BUF_COUNT(bp) != MAX(log->l_mp->m_sb.sb_blocksize,
+	    (BBTOB(bp->b_io_length) != MAX(log->l_mp->m_sb.sb_blocksize,
 			(__uint32_t)XFS_INODE_CLUSTER_SIZE(log->l_mp)))) {
 		xfs_buf_stale(bp);
 		error = xfs_bwrite(bp);
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 9132d16..2ec196b 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -56,7 +56,7 @@ xfs_trans_buf_item_match(
 		if (blip->bli_item.li_type == XFS_LI_BUF &&
 		    blip->bli_buf->b_target == target &&
 		    XFS_BUF_ADDR(blip->bli_buf) == blkno &&
-		    XFS_BUF_COUNT(blip->bli_buf) == len)
+		    BBTOB(blip->bli_buf->b_length) == len)
 			return blip->bli_buf;
 	}
 
@@ -585,7 +585,7 @@ xfs_trans_log_buf(xfs_trans_t	*tp,
 
 	ASSERT(bp->b_transp == tp);
 	ASSERT(bip != NULL);
-	ASSERT((first <= last) && (last < XFS_BUF_COUNT(bp)));
+	ASSERT(first <= last && last < BBTOB(bp->b_length));
 	ASSERT(bp->b_iodone == NULL ||
 	       bp->b_iodone == xfs_buf_iodone_callbacks);
 
-- 
1.7.9.5


* [PATCH 14/18] xfs: kill xfs_buf_btoc
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (12 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 13/18] xfs: use blocks for storing the desired IO size Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 12:10 ` [PATCH 15/18] xfs: kill XBF_LOCK Dave Chinner
                   ` (4 subsequent siblings)
  18 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_buf_btoc and friends are simple macros that do basic block
to page index conversion and vice versa. These aren't widely used,
and we use open coded masking and shifting everywhere else. Hence
remove the macros and open code the work they do.

Also, use of PAGE_CACHE_{SIZE|SHIFT|MASK} for these macros is now
incorrect - we are using pages directly and not the page cache, so
use PAGE_{SIZE|MASK|SHIFT} instead.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>
---
 fs/xfs/xfs_buf.c |   39 +++++++++++++++++++++------------------
 fs/xfs/xfs_buf.h |    5 -----
 2 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index b508eaf..7a945fd 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -300,7 +300,7 @@ xfs_buf_allocate_memory(
 	size_t			nbytes, offset;
 	gfp_t			gfp_mask = xb_to_gfp(flags);
 	unsigned short		page_count, i;
-	xfs_off_t		end;
+	xfs_off_t		start, end;
 	int			error;
 
 	/*
@@ -308,15 +308,15 @@ xfs_buf_allocate_memory(
 	 * the memory from the heap - there's no need for the complexity of
 	 * page arrays to keep allocation down to order 0.
 	 */
-	if (bp->b_length < BTOBB(PAGE_SIZE)) {
-		bp->b_addr = kmem_alloc(BBTOB(bp->b_length), xb_to_km(flags));
+	size = BBTOB(bp->b_length);
+	if (size < PAGE_SIZE) {
+		bp->b_addr = kmem_alloc(size, xb_to_km(flags));
 		if (!bp->b_addr) {
 			/* low memory - use alloc_page loop instead */
 			goto use_alloc_page;
 		}
 
-		if (((unsigned long)(bp->b_addr + BBTOB(bp->b_length) - 1) &
-								PAGE_MASK) !=
+		if (((unsigned long)(bp->b_addr + size - 1) & PAGE_MASK) !=
 		    ((unsigned long)bp->b_addr & PAGE_MASK)) {
 			/* b_addr spans two pages - use alloc_page instead */
 			kmem_free(bp->b_addr);
@@ -332,14 +332,14 @@ xfs_buf_allocate_memory(
 	}
 
 use_alloc_page:
-	end = BBTOB(bp->b_bn + bp->b_length);
-	page_count = xfs_buf_btoc(end) - xfs_buf_btoct(BBTOB(bp->b_bn));
+	start = BBTOB(bp->b_bn) >> PAGE_SHIFT;
+	end = (BBTOB(bp->b_bn + bp->b_length) + PAGE_SIZE - 1) >> PAGE_SHIFT;
+	page_count = end - start;
 	error = _xfs_buf_get_pages(bp, page_count, flags);
 	if (unlikely(error))
 		return error;
 
 	offset = bp->b_offset;
-	size = BBTOB(bp->b_length);
 	bp->b_flags |= _XBF_PAGES;
 
 	for (i = 0; i < bp->b_page_count; i++) {
@@ -1314,27 +1314,30 @@ xfs_buf_iomove(
 	void			*data,	/* data address			*/
 	xfs_buf_rw_t		mode)	/* read/write/zero flag		*/
 {
-	size_t			bend, cpoff, csize;
-	struct page		*page;
+	size_t			bend;
 
 	bend = boff + bsize;
 	while (boff < bend) {
-		page = bp->b_pages[xfs_buf_btoct(boff + bp->b_offset)];
-		cpoff = xfs_buf_poff(boff + bp->b_offset);
-		csize = min_t(size_t,
-			      PAGE_SIZE - cpoff, BBTOB(bp->b_io_length) - boff);
+		struct page	*page;
+		int		page_index, page_offset, csize;
+
+		page_index = (boff + bp->b_offset) >> PAGE_SHIFT;
+		page_offset = (boff + bp->b_offset) & ~PAGE_MASK;
+		page = bp->b_pages[page_index];
+		csize = min_t(size_t, PAGE_SIZE - page_offset,
+				      BBTOB(bp->b_io_length) - boff);
 
-		ASSERT(((csize + cpoff) <= PAGE_SIZE));
+		ASSERT((csize + page_offset) <= PAGE_SIZE);
 
 		switch (mode) {
 		case XBRW_ZERO:
-			memset(page_address(page) + cpoff, 0, csize);
+			memset(page_address(page) + page_offset, 0, csize);
 			break;
 		case XBRW_READ:
-			memcpy(data, page_address(page) + cpoff, csize);
+			memcpy(data, page_address(page) + page_offset, csize);
 			break;
 		case XBRW_WRITE:
-			memcpy(page_address(page) + cpoff, data, csize);
+			memcpy(page_address(page) + page_offset, data, csize);
 		}
 
 		boff += csize;
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 9787645..5b048f7 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -32,11 +32,6 @@
 
 #define XFS_BUF_DADDR_NULL	((xfs_daddr_t) (-1LL))
 
-#define xfs_buf_ctob(pp)	((pp) * PAGE_CACHE_SIZE)
-#define xfs_buf_btoc(dd)	(((dd) + PAGE_CACHE_SIZE-1) >> PAGE_CACHE_SHIFT)
-#define xfs_buf_btoct(dd)	((dd) >> PAGE_CACHE_SHIFT)
-#define xfs_buf_poff(aa)	((aa) & ~PAGE_CACHE_MASK)
-
 typedef enum {
 	XBRW_READ = 1,			/* transfer into target memory */
 	XBRW_WRITE = 2,			/* transfer from target memory */
-- 
1.7.9.5


* [PATCH 15/18] xfs: kill XBF_LOCK
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (13 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 14/18] xfs: kill xfs_buf_btoc Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 21:20   ` Mark Tinguely
  2012-04-13 12:10 ` [PATCH 16/18] xfs: kill xfs_read_buf() Dave Chinner
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Buffers are always returned locked from the lookup routines. Hence
we don't need to tell the lookup routines to return locked buffers,
nor to try and lock them. Remove XBF_LOCK from all the callers and
from internal buffer cache usage.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_attr.c        |    5 ++---
 fs/xfs/xfs_attr_leaf.c   |    2 +-
 fs/xfs/xfs_buf.c         |   20 +++++---------------
 fs/xfs/xfs_buf.h         |    4 +---
 fs/xfs/xfs_fsops.c       |   13 +++++--------
 fs/xfs/xfs_ialloc.c      |    3 +--
 fs/xfs/xfs_inode.c       |   16 +++++++---------
 fs/xfs/xfs_log_recover.c |    7 +++----
 fs/xfs/xfs_rw.c          |    2 +-
 fs/xfs/xfs_trans_buf.c   |    4 ++--
 fs/xfs/xfs_vnodeops.c    |    2 +-
 11 files changed, 29 insertions(+), 49 deletions(-)

diff --git a/fs/xfs/xfs_attr.c b/fs/xfs/xfs_attr.c
index 6e9bd7e..c8ef9a9 100644
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -1988,8 +1988,7 @@ xfs_attr_rmtval_get(xfs_da_args_t *args)
 			dblkno = XFS_FSB_TO_DADDR(mp, map[i].br_startblock);
 			blkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount);
 			error = xfs_read_buf(mp, mp->m_ddev_targp, dblkno,
-					     blkcnt, XBF_LOCK | XBF_DONT_BLOCK,
-					     &bp);
+					     blkcnt, XBF_DONT_BLOCK, &bp);
 			if (error)
 				return(error);
 
@@ -2116,7 +2115,7 @@ xfs_attr_rmtval_set(xfs_da_args_t *args)
 		blkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
 
 		bp = xfs_buf_get(mp->m_ddev_targp, dblkno, blkcnt,
-				 XBF_LOCK | XBF_DONT_BLOCK);
+				 XBF_DONT_BLOCK);
 		if (!bp)
 			return ENOMEM;
 
diff --git a/fs/xfs/xfs_attr_leaf.c b/fs/xfs/xfs_attr_leaf.c
index 76d93dc..3cd5dc6 100644
--- a/fs/xfs/xfs_attr_leaf.c
+++ b/fs/xfs/xfs_attr_leaf.c
@@ -2983,7 +2983,7 @@ xfs_attr_leaf_freextent(xfs_trans_t **trans, xfs_inode_t *dp,
 						map.br_blockcount);
 			bp = xfs_trans_get_buf(*trans,
 					dp->i_mount->m_ddev_targp,
-					dblkno, dblkcnt, XBF_LOCK);
+					dblkno, dblkcnt, 0);
 			if (!bp)
 				return ENOMEM;
 			xfs_trans_binval(*trans, bp);
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 7a945fd..8d76df1 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -179,7 +179,7 @@ xfs_buf_alloc(
 	/*
 	 * We don't want certain flags to appear in b_flags.
 	 */
-	flags &= ~(XBF_LOCK|XBF_MAPPED|XBF_DONT_BLOCK|XBF_READ_AHEAD);
+	flags &= ~(XBF_MAPPED|XBF_DONT_BLOCK|XBF_READ_AHEAD);
 
 	atomic_set(&bp->b_hold, 1);
 	atomic_set(&bp->b_lru_ref, 1);
@@ -578,19 +578,14 @@ found:
 		if (unlikely(error)) {
 			xfs_warn(target->bt_mount,
 				"%s: failed to map pages\n", __func__);
-			goto no_buffer;
+			xfs_buf_relse(bp);
+			return NULL;
 		}
 	}
 
 	XFS_STATS_INC(xb_get);
 	trace_xfs_buf_get(bp, flags, _RET_IP_);
 	return bp;
-
-no_buffer:
-	if (flags & (XBF_LOCK | XBF_TRYLOCK))
-		xfs_buf_unlock(bp);
-	xfs_buf_rele(bp);
-	return NULL;
 }
 
 STATIC int
@@ -633,7 +628,8 @@ xfs_buf_read(
 			 * Read ahead call which is already satisfied,
 			 * drop the buffer
 			 */
-			goto no_buffer;
+			xfs_buf_relse(bp);
+			return NULL;
 		} else {
 			/* We do not want read in the flags */
 			bp->b_flags &= ~XBF_READ;
@@ -641,12 +637,6 @@ xfs_buf_read(
 	}
 
 	return bp;
-
- no_buffer:
-	if (flags & (XBF_LOCK | XBF_TRYLOCK))
-		xfs_buf_unlock(bp);
-	xfs_buf_rele(bp);
-	return NULL;
 }
 
 /*
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 5b048f7..512d9a6 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -52,7 +52,6 @@ typedef enum {
 #define XBF_FLUSH	(1 << 12)/* flush the disk cache before a write */
 
 /* flags used only as arguments to access routines */
-#define XBF_LOCK	(1 << 15)/* lock requested */
 #define XBF_TRYLOCK	(1 << 16)/* lock requested, but do not wait */
 #define XBF_DONT_BLOCK	(1 << 17)/* do not block in current thread */
 
@@ -74,8 +73,7 @@ typedef unsigned int xfs_buf_flags_t;
 	{ XBF_SYNCIO,		"SYNCIO" }, \
 	{ XBF_FUA,		"FUA" }, \
 	{ XBF_FLUSH,		"FLUSH" }, \
-	{ XBF_LOCK,		"LOCK" },  	/* should never be set */\
-	{ XBF_TRYLOCK,		"TRYLOCK" }, 	/* ditto */\
+	{ XBF_TRYLOCK,		"TRYLOCK" }, 	/* should never be set */\
 	{ XBF_DONT_BLOCK,	"DONT_BLOCK" },	/* ditto */\
 	{ _XBF_PAGES,		"PAGES" }, \
 	{ _XBF_KMEM,		"KMEM" }, \
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 019ba5c..874d398 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -193,7 +193,7 @@ xfs_growfs_data_private(
 		 */
 		bp = xfs_buf_get(mp->m_ddev_targp,
 				 XFS_AG_DADDR(mp, agno, XFS_AGF_DADDR(mp)),
-				 XFS_FSS_TO_BB(mp, 1), XBF_LOCK | XBF_MAPPED);
+				 XFS_FSS_TO_BB(mp, 1), XBF_MAPPED);
 		if (!bp) {
 			error = ENOMEM;
 			goto error0;
@@ -230,7 +230,7 @@ xfs_growfs_data_private(
 		 */
 		bp = xfs_buf_get(mp->m_ddev_targp,
 				 XFS_AG_DADDR(mp, agno, XFS_AGI_DADDR(mp)),
-				 XFS_FSS_TO_BB(mp, 1), XBF_LOCK | XBF_MAPPED);
+				 XFS_FSS_TO_BB(mp, 1), XBF_MAPPED);
 		if (!bp) {
 			error = ENOMEM;
 			goto error0;
@@ -259,8 +259,7 @@ xfs_growfs_data_private(
 		 */
 		bp = xfs_buf_get(mp->m_ddev_targp,
 				 XFS_AGB_TO_DADDR(mp, agno, XFS_BNO_BLOCK(mp)),
-				 BTOBB(mp->m_sb.sb_blocksize),
-				 XBF_LOCK | XBF_MAPPED);
+				 BTOBB(mp->m_sb.sb_blocksize), XBF_MAPPED);
 		if (!bp) {
 			error = ENOMEM;
 			goto error0;
@@ -286,8 +285,7 @@ xfs_growfs_data_private(
 		 */
 		bp = xfs_buf_get(mp->m_ddev_targp,
 				 XFS_AGB_TO_DADDR(mp, agno, XFS_CNT_BLOCK(mp)),
-				 BTOBB(mp->m_sb.sb_blocksize),
-				 XBF_LOCK | XBF_MAPPED);
+				 BTOBB(mp->m_sb.sb_blocksize), XBF_MAPPED);
 		if (!bp) {
 			error = ENOMEM;
 			goto error0;
@@ -314,8 +312,7 @@ xfs_growfs_data_private(
 		 */
 		bp = xfs_buf_get(mp->m_ddev_targp,
 				 XFS_AGB_TO_DADDR(mp, agno, XFS_IBT_BLOCK(mp)),
-				 BTOBB(mp->m_sb.sb_blocksize),
-				 XBF_LOCK | XBF_MAPPED);
+				 BTOBB(mp->m_sb.sb_blocksize), XBF_MAPPED);
 		if (!bp) {
 			error = ENOMEM;
 			goto error0;
diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index dad1a31..d094a23 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -200,8 +200,7 @@ xfs_ialloc_inode_init(
 		 */
 		d = XFS_AGB_TO_DADDR(mp, agno, agbno + (j * blks_per_cluster));
 		fbuf = xfs_trans_get_buf(tp, mp->m_ddev_targp, d,
-					 mp->m_bsize * blks_per_cluster,
-					 XBF_LOCK);
+					 mp->m_bsize * blks_per_cluster, 0);
 		if (!fbuf)
 			return ENOMEM;
 		/*
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 65d7d99..f64b482 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -226,7 +226,7 @@ xfs_inotobp(
 	if (error)
 		return error;
 
-	error = xfs_imap_to_bp(mp, tp, &imap, &bp, XBF_LOCK, imap_flags);
+	error = xfs_imap_to_bp(mp, tp, &imap, &bp, 0, imap_flags);
 	if (error)
 		return error;
 
@@ -782,8 +782,7 @@ xfs_iread(
 	/*
 	 * Get pointers to the on-disk inode and the buffer containing it.
 	 */
-	error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &bp,
-			       XBF_LOCK, iget_flags);
+	error = xfs_imap_to_bp(mp, tp, &ip->i_imap, &bp, 0, iget_flags);
 	if (error)
 		return error;
 	dip = (xfs_dinode_t *)xfs_buf_offset(bp, ip->i_imap.im_boffset);
@@ -1342,7 +1341,7 @@ xfs_iunlink(
 		 * Here we put the head pointer into our next pointer,
 		 * and then we fall through to point the head at us.
 		 */
-		error = xfs_itobp(mp, tp, ip, &dip, &ibp, XBF_LOCK);
+		error = xfs_itobp(mp, tp, ip, &dip, &ibp, 0);
 		if (error)
 			return error;
 
@@ -1423,7 +1422,7 @@ xfs_iunlink_remove(
 		 * of dealing with the buffer when there is no need to
 		 * change it.
 		 */
-		error = xfs_itobp(mp, tp, ip, &dip, &ibp, XBF_LOCK);
+		error = xfs_itobp(mp, tp, ip, &dip, &ibp, 0);
 		if (error) {
 			xfs_warn(mp, "%s: xfs_itobp() returned error %d.",
 				__func__, error);
@@ -1484,7 +1483,7 @@ xfs_iunlink_remove(
 		 * Now last_ibp points to the buffer previous to us on
 		 * the unlinked list.  Pull us from the list.
 		 */
-		error = xfs_itobp(mp, tp, ip, &dip, &ibp, XBF_LOCK);
+		error = xfs_itobp(mp, tp, ip, &dip, &ibp, 0);
 		if (error) {
 			xfs_warn(mp, "%s: xfs_itobp(2) returned error %d.",
 				__func__, error);
@@ -1566,8 +1565,7 @@ xfs_ifree_cluster(
 		 * to mark all the active inodes on the buffer stale.
 		 */
 		bp = xfs_trans_get_buf(tp, mp->m_ddev_targp, blkno,
-					mp->m_bsize * blks_per_cluster,
-					XBF_LOCK);
+					mp->m_bsize * blks_per_cluster, 0);
 
 		if (!bp)
 			return ENOMEM;
@@ -1737,7 +1735,7 @@ xfs_ifree(
 
 	xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
 
-	error = xfs_itobp(ip->i_mount, tp, ip, &dip, &ibp, XBF_LOCK);
+	error = xfs_itobp(ip->i_mount, tp, ip, &dip, &ibp, 0);
 	if (error)
 		return error;
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 0872d71..6353eb3 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2127,7 +2127,7 @@ xlog_recover_buffer_pass2(
 
 	trace_xfs_log_recover_buf_recover(log, buf_f);
 
-	buf_flags = XBF_LOCK;
+	buf_flags = 0;
 	if (!(buf_f->blf_flags & XFS_BLF_INODE_BUF))
 		buf_flags |= XBF_MAPPED;
 
@@ -2225,8 +2225,7 @@ xlog_recover_inode_pass2(
 	}
 	trace_xfs_log_recover_inode_recover(log, in_f);
 
-	bp = xfs_buf_read(mp->m_ddev_targp, in_f->ilf_blkno, in_f->ilf_len,
-			  XBF_LOCK);
+	bp = xfs_buf_read(mp->m_ddev_targp, in_f->ilf_blkno, in_f->ilf_len, 0);
 	if (!bp) {
 		error = ENOMEM;
 		goto error;
@@ -3099,7 +3098,7 @@ xlog_recover_process_one_iunlink(
 	/*
 	 * Get the on disk inode to find the next inode in the bucket.
 	 */
-	error = xfs_itobp(mp, NULL, ip, &dip, &ibp, XBF_LOCK);
+	error = xfs_itobp(mp, NULL, ip, &dip, &ibp, 0);
 	if (error)
 		goto fail_iput;
 
diff --git a/fs/xfs/xfs_rw.c b/fs/xfs/xfs_rw.c
index 597d044..2ce9775 100644
--- a/fs/xfs/xfs_rw.c
+++ b/fs/xfs/xfs_rw.c
@@ -114,7 +114,7 @@ xfs_read_buf(
 	int		 error;
 
 	if (!flags)
-		flags = XBF_LOCK | XBF_MAPPED;
+		flags = XBF_MAPPED;
 
 	bp = xfs_buf_read(target, blkno, len, flags);
 	if (!bp)
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 2ec196b..f9cb7ee 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -142,7 +142,7 @@ xfs_trans_get_buf(xfs_trans_t	*tp,
 	xfs_buf_log_item_t	*bip;
 
 	if (flags == 0)
-		flags = XBF_LOCK | XBF_MAPPED;
+		flags = XBF_MAPPED;
 
 	/*
 	 * Default to a normal get_buf() call if the tp is NULL.
@@ -275,7 +275,7 @@ xfs_trans_read_buf(
 	int			error;
 
 	if (flags == 0)
-		flags = XBF_LOCK | XBF_MAPPED;
+		flags = XBF_MAPPED;
 
 	/*
 	 * Default to a normal get_buf() call if the tp is NULL.
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 445c224..8f99c77 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -82,7 +82,7 @@ xfs_readlink_bmap(
 		byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount);
 
 		bp = xfs_buf_read(mp->m_ddev_targp, d, BTOBB(byte_cnt),
-				  XBF_LOCK | XBF_MAPPED | XBF_DONT_BLOCK);
+				  XBF_MAPPED | XBF_DONT_BLOCK);
 		if (!bp)
 			return XFS_ERROR(ENOMEM);
 		error = bp->b_error;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 16/18] xfs: kill xfs_read_buf()
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (14 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 15/18] xfs: kill XBF_LOCK Dave Chinner
@ 2012-04-13 12:10 ` Dave Chinner
  2012-04-13 12:11 ` [PATCH 17/18] xfs: kill XBF_DONTBLOCK Dave Chinner
                   ` (2 subsequent siblings)
  18 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:10 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

xfs_read_buf() is effectively the same as xfs_trans_read_buf() when called
outside a transaction context. The error handling differs slightly in that
xfs_read_buf() stales the errored buffer it gets back, and there is probably
good reason for xfs_trans_read_buf() to do the same.

Hence update xfs_trans_read_buf() to use the same error handling as
xfs_read_buf(), and convert all the callers of xfs_read_buf() to use
xfs_trans_read_buf() instead. We can then remove xfs_read_buf().

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_attr.c        |    4 ++--
 fs/xfs/xfs_fsops.c       |    2 +-
 fs/xfs/xfs_log_recover.c |   11 ++++------
 fs/xfs/xfs_rw.c          |   50 ----------------------------------------------
 fs/xfs/xfs_rw.h          |    3 ---
 fs/xfs/xfs_trans_buf.c   |    4 ++++
 6 files changed, 11 insertions(+), 63 deletions(-)

diff --git a/fs/xfs/xfs_attr.c b/fs/xfs/xfs_attr.c
index c8ef9a9..ad85bed 100644
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -1987,8 +1987,8 @@ xfs_attr_rmtval_get(xfs_da_args_t *args)
 			       (map[i].br_startblock != HOLESTARTBLOCK));
 			dblkno = XFS_FSB_TO_DADDR(mp, map[i].br_startblock);
 			blkcnt = XFS_FSB_TO_BB(mp, map[i].br_blockcount);
-			error = xfs_read_buf(mp, mp->m_ddev_targp, dblkno,
-					     blkcnt, XBF_DONT_BLOCK, &bp);
+			error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
+						   dblkno, blkcnt, 0, &bp);
 			if (error)
 				return(error);
 
diff --git a/fs/xfs/xfs_fsops.c b/fs/xfs/xfs_fsops.c
index 874d398..e9f5bc0 100644
--- a/fs/xfs/xfs_fsops.c
+++ b/fs/xfs/xfs_fsops.c
@@ -402,7 +402,7 @@ xfs_growfs_data_private(
 
 	/* update secondary superblocks. */
 	for (agno = 1; agno < nagcount; agno++) {
-		error = xfs_read_buf(mp, mp->m_ddev_targp,
+		error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp,
 				  XFS_AGB_TO_DADDR(mp, agno, XFS_SB_BLOCK(mp)),
 				  XFS_FSS_TO_BB(mp, 1), 0, &bp);
 		if (error) {
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 6353eb3..b700ced 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2535,14 +2535,11 @@ xlog_recover_dquot_pass2(
 		return XFS_ERROR(EIO);
 	ASSERT(dq_f->qlf_len == 1);
 
-	error = xfs_read_buf(mp, mp->m_ddev_targp,
-			     dq_f->qlf_blkno,
-			     XFS_FSB_TO_BB(mp, dq_f->qlf_len),
-			     0, &bp);
-	if (error) {
-		xfs_buf_ioerror_alert(bp, "xlog_recover_do..(read#3)");
+	error = xfs_trans_read_buf(mp, NULL, mp->m_ddev_targp, dq_f->qlf_blkno,
+				   XFS_FSB_TO_BB(mp, dq_f->qlf_len), 0, &bp);
+	if (error)
 		return error;
-	}
+
 	ASSERT(bp);
 	ddq = (xfs_disk_dquot_t *)xfs_buf_offset(bp, dq_f->qlf_boffset);
 
diff --git a/fs/xfs/xfs_rw.c b/fs/xfs/xfs_rw.c
index 2ce9775..3c2488a 100644
--- a/fs/xfs/xfs_rw.c
+++ b/fs/xfs/xfs_rw.c
@@ -92,56 +92,6 @@ xfs_do_force_shutdown(
 }
 
 /*
- * This isn't an absolute requirement, but it is
- * just a good idea to call xfs_read_buf instead of
- * directly doing a read_buf call. For one, we shouldn't
- * be doing this disk read if we are in SHUTDOWN state anyway,
- * so this stops that from happening. Secondly, this does all
- * the error checking stuff and the brelse if appropriate for
- * the caller, so the code can be a little leaner.
- */
-
-int
-xfs_read_buf(
-	struct xfs_mount *mp,
-	xfs_buftarg_t	 *target,
-	xfs_daddr_t	 blkno,
-	int              len,
-	uint             flags,
-	xfs_buf_t	 **bpp)
-{
-	xfs_buf_t	 *bp;
-	int		 error;
-
-	if (!flags)
-		flags = XBF_MAPPED;
-
-	bp = xfs_buf_read(target, blkno, len, flags);
-	if (!bp)
-		return XFS_ERROR(EIO);
-	error = bp->b_error;
-	if (!error && !XFS_FORCED_SHUTDOWN(mp)) {
-		*bpp = bp;
-	} else {
-		*bpp = NULL;
-		if (error) {
-			xfs_buf_ioerror_alert(bp, __func__);
-		} else {
-			error = XFS_ERROR(EIO);
-		}
-		if (bp) {
-			XFS_BUF_UNDONE(bp);
-			xfs_buf_stale(bp);
-			/*
-			 * brelse clears B_ERROR and b_error
-			 */
-			xfs_buf_relse(bp);
-		}
-	}
-	return (error);
-}
-
-/*
  * helper function to extract extent size hint from inode
  */
 xfs_extlen_t
diff --git a/fs/xfs/xfs_rw.h b/fs/xfs/xfs_rw.h
index bbdb9ad..967b3a4 100644
--- a/fs/xfs/xfs_rw.h
+++ b/fs/xfs/xfs_rw.h
@@ -39,9 +39,6 @@ xfs_fsb_to_db(struct xfs_inode *ip, xfs_fsblock_t fsb)
 /*
  * Prototypes for functions in xfs_rw.c.
  */
-extern int xfs_read_buf(struct xfs_mount *mp, xfs_buftarg_t *btp,
-			xfs_daddr_t blkno, int len, uint flags,
-			struct xfs_buf **bpp);
 extern xfs_extlen_t xfs_get_extsz_hint(struct xfs_inode *ip);
 
 #endif /* __XFS_RW_H__ */
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index f9cb7ee..5e4cf61 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -274,6 +274,8 @@ xfs_trans_read_buf(
 	xfs_buf_log_item_t	*bip;
 	int			error;
 
+	*bpp = NULL;
+
 	if (flags == 0)
 		flags = XBF_MAPPED;
 
@@ -289,6 +291,8 @@ xfs_trans_read_buf(
 		if (bp->b_error) {
 			error = bp->b_error;
 			xfs_buf_ioerror_alert(bp, __func__);
+			XFS_BUF_UNDONE(bp);
+			xfs_buf_stale(bp);
 			xfs_buf_relse(bp);
 			return error;
 		}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 17/18] xfs: kill XBF_DONTBLOCK
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (15 preceding siblings ...)
  2012-04-13 12:10 ` [PATCH 16/18] xfs: kill xfs_read_buf() Dave Chinner
@ 2012-04-13 12:11 ` Dave Chinner
  2012-04-16 14:34   ` Mark Tinguely
  2012-04-13 12:11 ` [PATCH 18/18] xfs: use iolock on XFS_IOC_ALLOCSP calls Dave Chinner
  2012-04-16 21:29 ` [PATCH 0/18] xfs: current patch queue Ben Myers
  18 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:11 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Just about all callers of xfs_buf_read() and xfs_buf_get() use XBF_DONTBLOCK.
This is used to make memory allocation use GFP_NOFS rather than GFP_KERNEL to
avoid recursion through memory reclaim back into the filesystem.

All the blocking get calls in growfs occur inside a transaction, even though
they are not part of the transaction, so all allocation will be GFP_NOFS due to
the task flag PF_TRANS being set. The blocking read calls occur during log
recovery, so they will probably be unaffected by converting to GFP_NOFS
allocations.

Hence make XBF_DONTBLOCK behaviour always occur for buffers and kill the flag.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_attr.c      |    3 +--
 fs/xfs/xfs_buf.c       |   18 +++++++-----------
 fs/xfs/xfs_buf.h       |    2 --
 fs/xfs/xfs_trans_buf.c |   25 ++++---------------------
 fs/xfs/xfs_vnodeops.c  |    4 ++--
 5 files changed, 14 insertions(+), 38 deletions(-)

diff --git a/fs/xfs/xfs_attr.c b/fs/xfs/xfs_attr.c
index ad85bed..0960bb6 100644
--- a/fs/xfs/xfs_attr.c
+++ b/fs/xfs/xfs_attr.c
@@ -2114,8 +2114,7 @@ xfs_attr_rmtval_set(xfs_da_args_t *args)
 		dblkno = XFS_FSB_TO_DADDR(mp, map.br_startblock),
 		blkcnt = XFS_FSB_TO_BB(mp, map.br_blockcount);
 
-		bp = xfs_buf_get(mp->m_ddev_targp, dblkno, blkcnt,
-				 XBF_DONT_BLOCK);
+		bp = xfs_buf_get(mp->m_ddev_targp, dblkno, blkcnt, 0);
 		if (!bp)
 			return ENOMEM;
 
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 8d76df1..e2e0538 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -56,11 +56,7 @@ static struct workqueue_struct *xfslogd_workqueue;
 #endif
 
 #define xb_to_gfp(flags) \
-	((((flags) & XBF_READ_AHEAD) ? __GFP_NORETRY : \
-	  ((flags) & XBF_DONT_BLOCK) ? GFP_NOFS : GFP_KERNEL) | __GFP_NOWARN)
-
-#define xb_to_km(flags) \
-	 (((flags) & XBF_DONT_BLOCK) ? KM_NOFS : KM_SLEEP)
+	((((flags) & XBF_READ_AHEAD) ? __GFP_NORETRY : GFP_NOFS) | __GFP_NOWARN)
 
 
 static inline int
@@ -172,14 +168,14 @@ xfs_buf_alloc(
 {
 	struct xfs_buf		*bp;
 
-	bp = kmem_zone_zalloc(xfs_buf_zone, xb_to_km(flags));
+	bp = kmem_zone_zalloc(xfs_buf_zone, KM_NOFS);
 	if (unlikely(!bp))
 		return NULL;
 
 	/*
 	 * We don't want certain flags to appear in b_flags.
 	 */
-	flags &= ~(XBF_MAPPED|XBF_DONT_BLOCK|XBF_READ_AHEAD);
+	flags &= ~(XBF_MAPPED|XBF_READ_AHEAD);
 
 	atomic_set(&bp->b_hold, 1);
 	atomic_set(&bp->b_lru_ref, 1);
@@ -233,7 +229,7 @@ _xfs_buf_get_pages(
 			bp->b_pages = bp->b_page_array;
 		} else {
 			bp->b_pages = kmem_alloc(sizeof(struct page *) *
-					page_count, xb_to_km(flags));
+						 page_count, KM_NOFS);
 			if (bp->b_pages == NULL)
 				return -ENOMEM;
 		}
@@ -310,7 +306,7 @@ xfs_buf_allocate_memory(
 	 */
 	size = BBTOB(bp->b_length);
 	if (size < PAGE_SIZE) {
-		bp->b_addr = kmem_alloc(size, xb_to_km(flags));
+		bp->b_addr = kmem_alloc(size, KM_NOFS);
 		if (!bp->b_addr) {
 			/* low memory - use alloc_page loop instead */
 			goto use_alloc_page;
@@ -653,7 +649,7 @@ xfs_buf_readahead(
 		return;
 
 	xfs_buf_read(target, blkno, numblks,
-		     XBF_TRYLOCK|XBF_ASYNC|XBF_READ_AHEAD|XBF_DONT_BLOCK);
+		     XBF_TRYLOCK|XBF_ASYNC|XBF_READ_AHEAD);
 }
 
 /*
@@ -744,7 +740,7 @@ xfs_buf_associate_memory(
 	bp->b_pages = NULL;
 	bp->b_addr = mem;
 
-	rval = _xfs_buf_get_pages(bp, page_count, XBF_DONT_BLOCK);
+	rval = _xfs_buf_get_pages(bp, page_count, 0);
 	if (rval)
 		return rval;
 
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 512d9a6..846dee3 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -53,7 +53,6 @@ typedef enum {
 
 /* flags used only as arguments to access routines */
 #define XBF_TRYLOCK	(1 << 16)/* lock requested, but do not wait */
-#define XBF_DONT_BLOCK	(1 << 17)/* do not block in current thread */
 
 /* flags used only internally */
 #define _XBF_PAGES	(1 << 20)/* backed by refcounted pages */
@@ -74,7 +73,6 @@ typedef unsigned int xfs_buf_flags_t;
 	{ XBF_FUA,		"FUA" }, \
 	{ XBF_FLUSH,		"FLUSH" }, \
 	{ XBF_TRYLOCK,		"TRYLOCK" }, 	/* should never be set */\
-	{ XBF_DONT_BLOCK,	"DONT_BLOCK" },	/* ditto */\
 	{ _XBF_PAGES,		"PAGES" }, \
 	{ _XBF_KMEM,		"KMEM" }, \
 	{ _XBF_DELWRI_Q,	"DELWRI_Q" }
diff --git a/fs/xfs/xfs_trans_buf.c b/fs/xfs/xfs_trans_buf.c
index 5e4cf61..ccc6da1 100644
--- a/fs/xfs/xfs_trans_buf.c
+++ b/fs/xfs/xfs_trans_buf.c
@@ -148,8 +148,7 @@ xfs_trans_get_buf(xfs_trans_t	*tp,
 	 * Default to a normal get_buf() call if the tp is NULL.
 	 */
 	if (tp == NULL)
-		return xfs_buf_get(target_dev, blkno, len,
-				   flags | XBF_DONT_BLOCK);
+		return xfs_buf_get(target_dev, blkno, len, flags);
 
 	/*
 	 * If we find the buffer in the cache with this transaction
@@ -174,15 +173,7 @@ xfs_trans_get_buf(xfs_trans_t	*tp,
 		return (bp);
 	}
 
-	/*
-	 * We always specify the XBF_DONT_BLOCK flag within a transaction
-	 * so that get_buf does not try to push out a delayed write buffer
-	 * which might cause another transaction to take place (if the
-	 * buffer was delayed alloc).  Such recursive transactions can
-	 * easily deadlock with our current transaction as well as cause
-	 * us to run out of stack space.
-	 */
-	bp = xfs_buf_get(target_dev, blkno, len, flags | XBF_DONT_BLOCK);
+	bp = xfs_buf_get(target_dev, blkno, len, flags);
 	if (bp == NULL) {
 		return NULL;
 	}
@@ -283,7 +274,7 @@ xfs_trans_read_buf(
 	 * Default to a normal get_buf() call if the tp is NULL.
 	 */
 	if (tp == NULL) {
-		bp = xfs_buf_read(target, blkno, len, flags | XBF_DONT_BLOCK);
+		bp = xfs_buf_read(target, blkno, len, flags);
 		if (!bp)
 			return (flags & XBF_TRYLOCK) ?
 					EAGAIN : XFS_ERROR(ENOMEM);
@@ -367,15 +358,7 @@ xfs_trans_read_buf(
 		return 0;
 	}
 
-	/*
-	 * We always specify the XBF_DONT_BLOCK flag within a transaction
-	 * so that get_buf does not try to push out a delayed write buffer
-	 * which might cause another transaction to take place (if the
-	 * buffer was delayed alloc).  Such recursive transactions can
-	 * easily deadlock with our current transaction as well as cause
-	 * us to run out of stack space.
-	 */
-	bp = xfs_buf_read(target, blkno, len, flags | XBF_DONT_BLOCK);
+	bp = xfs_buf_read(target, blkno, len, flags);
 	if (bp == NULL) {
 		*bpp = NULL;
 		return (flags & XBF_TRYLOCK) ?
diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 8f99c77..6c18745 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -82,7 +82,7 @@ xfs_readlink_bmap(
 		byte_cnt = XFS_FSB_TO_B(mp, mval[n].br_blockcount);
 
 		bp = xfs_buf_read(mp->m_ddev_targp, d, BTOBB(byte_cnt),
-				  XBF_MAPPED | XBF_DONT_BLOCK);
+				  XBF_MAPPED);
 		if (!bp)
 			return XFS_ERROR(ENOMEM);
 		error = bp->b_error;
@@ -1966,7 +1966,7 @@ xfs_zero_remaining_bytes(
 
 	bp = xfs_buf_get_uncached(XFS_IS_REALTIME_INODE(ip) ?
 					mp->m_rtdev_targp : mp->m_ddev_targp,
-				BTOBB(mp->m_sb.sb_blocksize), XBF_DONT_BLOCK);
+				  BTOBB(mp->m_sb.sb_blocksize), 0);
 	if (!bp)
 		return XFS_ERROR(ENOMEM);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* [PATCH 18/18] xfs: use iolock on XFS_IOC_ALLOCSP calls
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (16 preceding siblings ...)
  2012-04-13 12:11 ` [PATCH 17/18] xfs: kill XBF_DONTBLOCK Dave Chinner
@ 2012-04-13 12:11 ` Dave Chinner
  2012-04-16 15:10   ` Mark Tinguely
  2012-04-16 21:29 ` [PATCH 0/18] xfs: current patch queue Ben Myers
  18 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 12:11 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

fsstress has a particularly effective way of stopping debug XFS
kernels. We keep seeing assert failures due to finding delayed
allocation extents where there should be none. This shows up when
extracting extent maps while holding all the locks we should be
holding to prevent races, so it makes no sense to see these errors.

After checking that fsstress does not use mmap, it occurred to me
that fsstress uses something that no sane application uses - the
XFS_IOC_ALLOCSP ioctl interfaces for preallocation. These interfaces
do allocation of blocks beyond EOF without using preallocation, and
then call setattr to extend and zero the allocated blocks.

The problem here is that this is a buffered write, and hence the
allocation is a delayed allocation. Unlike the buffered IO path, the
allocation and zeroing are not serialised using the IOLOCK. Hence
the ALLOCSP operation can race with operations holding the iolock to
prevent buffered IO operations from occurring.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_vnodeops.c |   22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 6c18745..9b6c94e 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -2315,17 +2315,33 @@ xfs_change_file_space(
 	case XFS_IOC_ALLOCSP64:
 	case XFS_IOC_FREESP:
 	case XFS_IOC_FREESP64:
+		/*
+		 * These operations actually do IO when extending the file, but
+		 * the allocation is done separately to the zeroing that is
+		 * done. This set of operations need to be serialised against
+		 * other IO operations, such as truncate and buffered IO. We
+		 * need to take the IOLOCK here to serialise the allocation and
+		 * zeroing IO to prevent other IOLOCK holders (e.g. getbmap,
+		 * truncate, direct IO) from racing against the transient
+		 * allocated but not written state we can have here.
+		 */
+		xfs_ilock(ip, XFS_IOLOCK_EXCL);
 		if (startoffset > fsize) {
 			error = xfs_alloc_file_space(ip, fsize,
-					startoffset - fsize, 0, attr_flags);
-			if (error)
+					startoffset - fsize, 0,
+					attr_flags | XFS_ATTR_NOLOCK);
+			if (error) {
+				xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 				break;
+			}
 		}
 
 		iattr.ia_valid = ATTR_SIZE;
 		iattr.ia_size = startoffset;
 
-		error = xfs_setattr_size(ip, &iattr, attr_flags);
+		error = xfs_setattr_size(ip, &iattr,
+					 attr_flags | XFS_ATTR_NOLOCK);
+		xfs_iunlock(ip, XFS_IOLOCK_EXCL);
 
 		if (error)
 			return error;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 45+ messages in thread

* Re: [PATCH 05/18] xfs: Use preallocation for inodes with extsz hints
  2012-04-13 12:10 ` [PATCH 05/18] xfs: Use preallocation for inodes with extsz hints Dave Chinner
@ 2012-04-13 16:45   ` Mark Tinguely
  2012-04-16 15:59   ` Mark Tinguely
  1 sibling, 0 replies; 45+ messages in thread
From: Mark Tinguely @ 2012-04-13 16:45 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/13/12 07:10, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> xfstest 229 exposes a problem with buffered IO, delayed allocation
> and extent size hints. That is when we do delayed allocation during
> buffered IO, we reserve space for the extent size hint alignment and
> allocate the physical space to align the extent, but we do not zero
> the regions of the extent that aren't written by the write(2)
> syscall. The result is that we expose stale data in unwritten
> regions of the extent size hints.
>
> There are two ways to fix this. The first is to detect that we are
> doing unaligned writes, check if there is already a mapping or data
> over the extent size hint range, and if not zero the page cache
> first before then doing the real write. This can be very expensive
> for large extent size hints, especially if the subsequent writes
> fill the entire extent size before the data is written to disk.
>
> The second, and simpler way, is simply to turn off delayed
> allocation when the extent size hint is set and use preallocation
> instead. This results in unwritten extents being laid down on disk
> and so only the written portions will be converted. This matches the
> behaviour for direct IO, and will also work for the real time
> device. The disadvantage of this approach is that for small extent
> size hints we can get file fragmentation, but in general extent size
> hints are fairly large (e.g. stripe width sized) so this isn't a big
> deal.
>
> Implement the second approach as it is simple and effective.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---
>   fs/xfs/xfs_aops.c |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 2fc12db..19ce2e2 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -1175,7 +1175,7 @@ __xfs_get_blocks(
>   	    (!nimaps ||
>   	     (imap.br_startblock == HOLESTARTBLOCK ||
>   	      imap.br_startblock == DELAYSTARTBLOCK))) {
> -		if (direct) {
> +		if (direct || xfs_get_extsz_hint(ip)) {
>   			xfs_iunlock(ip, lockmode);
>
>   			error = xfs_iomap_write_direct(ip, offset, size,

FYI:

Christoph had reposted the ilock series. This file does not apply 
cleanly to the new post because he added a new comment before the 
xfs_iunlock().

Thank-you for reposting the patches.

-Mark Tinguely <tinguely@sgi.com>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 02/18] xfs: pass shutdown method into xfs_trans_ail_delete_bulk
  2012-04-13 12:10 ` [PATCH 02/18] xfs: pass shutdown method into xfs_trans_ail_delete_bulk Dave Chinner
@ 2012-04-13 17:40   ` Mark Tinguely
  2012-04-13 23:04     ` Dave Chinner
  0 siblings, 1 reply; 45+ messages in thread
From: Mark Tinguely @ 2012-04-13 17:40 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/13/12 07:10, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> xfs_trans_ail_delete_bulk() can be called from different contexts so
> if the item is not in the AIL we need different shutdown for each
> context.  Pass in the shutdown method needed so the correct action
> can be taken.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig<hch@lst.de>
> Reviewed-by: Mark Tinguely<tinguely@sgi.com>



These items in my copy of Christoph's xfsbufd series are not resolved in 
this patch:

In "[01/10] xfs: remove log item from AIL in xfs_qm_dqflush after a 
shutdown" a xfs_trans_ail_delete() has been added:

Index: xfs/fs/xfs/xfs_dquot.c
===================================================================
--- xfs.orig/fs/xfs/xfs_dquot.c	2012-02-23 17:52:53.916002428 -0800
+++ xfs/fs/xfs/xfs_dquot.c	2012-02-23 17:53:01.829335739 -0800
@@ -904,10 +904,21 @@ xfs_qm_dqflush(
  	/*
  	 * This may have been unpinned because the filesystem is shutting
  	 * down forcibly. If that's the case we must not write this dquot
-	 * to disk, because the log record didn't make it to disk!
+	 * to disk, because the log record didn't make it to disk.
+	 *
+	 * We also have to remove the log item from the AIL in this case,
+	 * as we wait for an empty AIL as part of the unmount process.
  	 */
  	if (XFS_FORCED_SHUTDOWN(mp)) {
+		struct xfs_log_item	*lip = &dqp->q_logitem.qli_item;
  		dqp->dq_flags &= ~XFS_DQ_DIRTY;
+
+		spin_lock(&mp->m_ail->xa_lock);
+		if (lip->li_flags & XFS_LI_IN_AIL)
+			xfs_trans_ail_delete(mp->m_ail, lip);
+		else
+			spin_unlock(&mp->m_ail->xa_lock);

                       ====
In "[02/10] xfs: remove log item from AIL in xfs_iflush after a 
shutdown", he added another xfs_iflush_abort(). That is not converted 
with a boolean in this series:


Index: xfs/fs/xfs/xfs_sync.c
===================================================================
--- xfs.orig/fs/xfs/xfs_sync.c	2012-03-16 12:44:57.707030619 +0100
+++ xfs/fs/xfs/xfs_sync.c	2012-03-16 12:47:03.697032954 +0100
@@ -783,6 +783,7 @@ restart:
  		goto reclaim;
  	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
  		xfs_iunpin_wait(ip);
+		xfs_iflush_abort(ip);

Thank-you,

--Mark Tinguely <tinguely@sgi.com>


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 07/18] xfs: check for buffer errors before waiting
  2012-04-13 12:10 ` [PATCH 07/18] xfs: check for buffer errors before waiting Dave Chinner
@ 2012-04-13 17:56   ` Mark Tinguely
  0 siblings, 0 replies; 45+ messages in thread
From: Mark Tinguely @ 2012-04-13 17:56 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/13/12 07:10, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> If we call xfs_buf_iowait() on a buffer that failed dispatch due to
> an IO error, it will wait forever for an IO that does not exist.
> This is handled in xfs_buf_read, but there is other code that calls
> xfs_buf_iowait directly that doesn't.
>
> Rather than make the call sites have to handle checking for dispatch
> errors and then checking for completion errors, make
> xfs_buf_iowait() check for dispatch errors on the buffer before
> waiting. This means we handle both dispatch and completion errors
> with one set of error handling at the caller sites.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig<hch@lst.de>
> ---

Looks good.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>
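The check-before-wait behaviour described in the quoted commit message can be sketched in user space. This is a hypothetical illustration, not the kernel code: `struct buf` and `buf_iowait()` are stand-ins for `struct xfs_buf` and `xfs_buf_iowait()`, and the busy-wait stands in for the kernel's completion wait.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct xfs_buf; only the fields the sketch needs. */
struct buf {
	int	b_error;	/* non-zero if dispatch or completion failed */
	bool	b_io_done;	/* set by the (simulated) completion handler */
};

/*
 * Check for a dispatch error before waiting.  A buffer whose IO was
 * never issued returns immediately instead of waiting forever, and the
 * caller handles dispatch and completion errors with one check.
 */
static int buf_iowait(struct buf *bp)
{
	if (bp->b_error)		/* dispatch failed: no IO to wait for */
		return bp->b_error;
	while (!bp->b_io_done)
		;			/* the kernel waits on a completion here */
	return bp->b_error;
}
```

With this shape, every call site can simply test the return value of the wait, which is the point of the patch.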


* Re: [PATCH 01/18] xfs: Ensure inode reclaim can run during quotacheck
  2012-04-13 12:10 ` [PATCH 01/18] xfs: Ensure inode reclaim can run during quotacheck Dave Chinner
@ 2012-04-13 18:01   ` Mark Tinguely
  2012-04-29 21:37   ` Christoph Hellwig
  1 sibling, 0 replies; 45+ messages in thread
From: Mark Tinguely @ 2012-04-13 18:01 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/13/12 07:10, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> Because the mount process can run a quotacheck and consume lots of
> inodes, we need to be able to run periodic inode reclaim during the
> mount process. This will prevent running the system out of memory
> during quota checks.
>
> This essentially reverts 2bcf6e97, but that is safe to do now that
> the quota sync code that was causing problems during long quotacheck
> executions is now gone.
>
> The reclaim work is currently protected from running during the
> unmount process by a check against MS_ACTIVE. Unfortunately, this
> also means that the reclaim work cannot run during mount.  The
> unmount process should stop the reclaim cleanly before freeing
> anything that the reclaim work depends on, so there is no need to
> have this guard in place.
>
> Also, the inode reclaim work is demand driven, so there is no need
> to start it immediately during mount. It will be started the moment
> an inode is queued for reclaim, so quotacheck will trigger it just
> fine.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---


Looks good.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>
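The "demand driven" behaviour in the quoted commit message means the periodic reclaim work is armed only when an inode is actually queued, so no special start is needed during mount. A hypothetical sketch of that scheduling shape (the names and the single flag are illustrative, not the kernel implementation):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: reclaim work armed on demand, not at mount time. */
struct mount {
	int	reclaim_count;		/* inodes waiting for reclaim */
	bool	work_scheduled;		/* periodic reclaim work armed */
};

/*
 * Queueing the first inode arms the periodic work.  Quotacheck starts
 * reclaim simply by queueing inodes, so there is no special-case start
 * during mount and no MS_ACTIVE guard required.
 */
static void queue_inode_for_reclaim(struct mount *mp)
{
	mp->reclaim_count++;
	if (!mp->work_scheduled)
		mp->work_scheduled = true;	/* kernel: queue_delayed_work() */
}
```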


* Re: [PATCH 06/18] xfs: fix buffer lookup race on allocation failure
  2012-04-13 12:10 ` [PATCH 06/18] xfs: fix buffer lookup race on allocation failure Dave Chinner
@ 2012-04-13 18:32   ` Mark Tinguely
  0 siblings, 0 replies; 45+ messages in thread
From: Mark Tinguely @ 2012-04-13 18:32 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/13/12 07:10, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> When memory allocation fails to add the page array or the pages to
> a buffer during xfs_buf_get(), the buffer is left in the cache in a
> partially initialised state. There is enough state left for the next
> lookup on that buffer to find the buffer, and for the buffer to then
> be used without finishing the initialisation.  As a result, when an
> attempt to do IO on the buffer occurs, it fails with EIO because
> there are no pages attached to the buffer.
>
> We cannot remove the buffer from the cache immediately and free it,
> because there may already be a racing lookup that is blocked on the
> buffer lock. Hence the moment we unlock the buffer to then free it,
> the other user is woken and we have a use-after-free situation.
>
> To avoid this race condition altogether, allocate the pages for the
> buffer before we insert it into the cache.  This then means that we
> don't have an allocation failure case to deal with after the buffer
> is already present in the cache, and hence avoid the problem
> altogether.  In most cases we won't have racing inserts for the same
> buffer, and so won't increase the memory pressure that allocating
> before insertion may entail.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---



Much simpler idea than v1. Looks good.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>
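The fix in the quoted commit message is an instance of the "initialise fully before publishing" pattern: allocate everything the object needs before it becomes visible to concurrent lookups. A hypothetical user-space sketch, with a single cache slot standing in for the buffer rbtree:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch, not the kernel code. */
struct buf {
	void		*pages;		/* backing memory */
};

static struct buf *cache_slot;		/* stand-in for the buffer cache */

/*
 * Allocate the pages *before* publishing the buffer in the cache.  A
 * concurrent lookup can never find a half-built buffer, and an
 * allocation failure unwinds with a plain free because nothing else
 * can hold a reference yet - no use-after-free window.
 */
static struct buf *buf_get(size_t len)
{
	struct buf *bp = calloc(1, sizeof(*bp));

	if (!bp)
		return NULL;
	bp->pages = malloc(len);
	if (!bp->pages) {
		free(bp);		/* never published: no racing user */
		return NULL;
	}
	cache_slot = bp;		/* only now is the buffer visible */
	return bp;
}
```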


* Re: [PATCH 15/18] xfs: kill XBF_LOCK
  2012-04-13 12:10 ` [PATCH 15/18] xfs: kill XBF_LOCK Dave Chinner
@ 2012-04-13 21:20   ` Mark Tinguely
  0 siblings, 0 replies; 45+ messages in thread
From: Mark Tinguely @ 2012-04-13 21:20 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/13/12 07:10, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> Buffers are always returned locked from the lookup routines. Hence
> we don't need to tell the lookup routines to return locked buffers,
> nor to try and lock them. Remove XBF_LOCK from all the callers and
> from internal buffer cache usage.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig<hch@lst.de>
> ---

Looks good.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>


* Re: [PATCH 02/18] xfs: pass shutdown method into xfs_trans_ail_delete_bulk
  2012-04-13 17:40   ` Mark Tinguely
@ 2012-04-13 23:04     ` Dave Chinner
  2012-04-13 23:38       ` [PATCH 02/18 V2] " Dave Chinner
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 23:04 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Fri, Apr 13, 2012 at 12:40:46PM -0500, Mark Tinguely wrote:
> On 04/13/12 07:10, Dave Chinner wrote:
> >From: Dave Chinner<dchinner@redhat.com>
> >
> >xfs_trans_ail_delete_bulk() can be called from different contexts so
> >if the item is not in the AIL we need different shutdown for each
> >context.  Pass in the shutdown method needed so the correct action
> >can be taken.
> >
> >Signed-off-by: Dave Chinner<dchinner@redhat.com>
> >Reviewed-by: Christoph Hellwig<hch@lst.de>
> >Reviewed-by: Mark Tinguely<tinguely@sgi.com>
> 
> 
> 
> These items in my copy of Christoph's xfsbufd series are not
> resolved in this patch:

Ah, oh, that's my fault. I re-ordered the series so that all my
patches were after christoph's, and when reapplying christophs
patches from my series there were a few merge conflicts I obviously
didn't resolve correctly.

I'll fix my tree up and repost this patch.

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* [PATCH 02/18 V2] xfs: pass shutdown method into xfs_trans_ail_delete_bulk
  2012-04-13 23:04     ` Dave Chinner
@ 2012-04-13 23:38       ` Dave Chinner
  2012-04-16 18:49         ` Mark Tinguely
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2012-04-13 23:38 UTC (permalink / raw)
  To: Mark Tinguely; +Cc: xfs

On Sat, Apr 14, 2012 at 09:04:51AM +1000, Dave Chinner wrote:
> On Fri, Apr 13, 2012 at 12:40:46PM -0500, Mark Tinguely wrote:
> > On 04/13/12 07:10, Dave Chinner wrote:
> > >From: Dave Chinner<dchinner@redhat.com>
> > >
> > >xfs_trans_ail_delete_bulk() can be called from different contexts so
> > >if the item is not in the AIL we need different shutdown for each
> > >context.  Pass in the shutdown method needed so the correct action
> > >can be taken.
> > >
> > >Signed-off-by: Dave Chinner<dchinner@redhat.com>
> > >Reviewed-by: Christoph Hellwig<hch@lst.de>
> > >Reviewed-by: Mark Tinguely<tinguely@sgi.com>
> > 
> > 
> > 
> > These items in my copy of Christoph's xfsbufd series are not
> > resolved in this patch:
> 
> Ah, oh, that's my fault. I re-ordered the series so that all my
> patches were after christoph's, and when reapplying christophs
> patches from my series there were a few merge conflicts I obviously
> didn't resolve correctly.
> 
> I'll fix my tree up and repost this patch.

Updated patch below.

-- 
Dave Chinner
david@fromorbit.com

xfs: pass shutdown method into xfs_trans_ail_delete_bulk

From: Dave Chinner <dchinner@redhat.com>

xfs_trans_ail_delete_bulk() can be called from different contexts so
if the item is not in the AIL we need different shutdown for each
context.  Pass in the shutdown method needed so the correct action
can be taken.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Mark Tinguely <tinguely@sgi.com>

---
 fs/xfs/xfs_buf_item.c     |    4 ++--
 fs/xfs/xfs_dquot.c        |    5 +++--
 fs/xfs/xfs_dquot_item.c   |    2 +-
 fs/xfs/xfs_extfree_item.c |    3 ++-
 fs/xfs/xfs_inode.c        |    4 ++--
 fs/xfs/xfs_inode_item.c   |   23 +++++++++++++----------
 fs/xfs/xfs_inode_item.h   |    2 +-
 fs/xfs/xfs_log_recover.c  |    3 ++-
 fs/xfs/xfs_sync.c         |    2 +-
 fs/xfs/xfs_trans_ail.c    |    5 +++--
 fs/xfs/xfs_trans_priv.h   |    8 +++++---
 11 files changed, 35 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c
index fb20f38..7f0abea 100644
--- a/fs/xfs/xfs_buf_item.c
+++ b/fs/xfs/xfs_buf_item.c
@@ -454,7 +454,7 @@ xfs_buf_item_unpin(
 			bp->b_iodone = NULL;
 		} else {
 			spin_lock(&ailp->xa_lock);
-			xfs_trans_ail_delete(ailp, (xfs_log_item_t *)bip);
+			xfs_trans_ail_delete(ailp, lip, SHUTDOWN_LOG_IO_ERROR);
 			xfs_buf_item_relse(bp);
 			ASSERT(bp->b_fspriv == NULL);
 		}
@@ -1006,6 +1006,6 @@ xfs_buf_iodone(
 	 * Either way, AIL is useless if we're forcing a shutdown.
 	 */
 	spin_lock(&ailp->xa_lock);
-	xfs_trans_ail_delete(ailp, lip);
+	xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
 	xfs_buf_item_free(BUF_ITEM(lip));
 }
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index 65b8aa3..7bf3855 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -857,7 +857,7 @@ xfs_qm_dqflush_done(
 		/* xfs_trans_ail_delete() drops the AIL lock. */
 		spin_lock(&ailp->xa_lock);
 		if (lip->li_lsn == qip->qli_flush_lsn)
-			xfs_trans_ail_delete(ailp, lip);
+			xfs_trans_ail_delete(ailp, lip, SHUTDOWN_CORRUPT_INCORE);
 		else
 			spin_unlock(&ailp->xa_lock);
 	}
@@ -909,7 +909,8 @@ xfs_qm_dqflush(
 
 		spin_lock(&mp->m_ail->xa_lock);
 		if (lip->li_flags & XFS_LI_IN_AIL)
-			xfs_trans_ail_delete(mp->m_ail, lip);
+			xfs_trans_ail_delete(mp->m_ail, lip,
+					     SHUTDOWN_CORRUPT_INCORE);
 		else
 			spin_unlock(&mp->m_ail->xa_lock);
 		error = XFS_ERROR(EIO);
diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c
index 9c5d58d..aa6a2a6 100644
--- a/fs/xfs/xfs_dquot_item.c
+++ b/fs/xfs/xfs_dquot_item.c
@@ -384,7 +384,7 @@ xfs_qm_qoffend_logitem_committed(
 	 * xfs_trans_ail_delete() drops the AIL lock.
 	 */
 	spin_lock(&ailp->xa_lock);
-	xfs_trans_ail_delete(ailp, (xfs_log_item_t *)qfs);
+	xfs_trans_ail_delete(ailp, &qfs->qql_item, SHUTDOWN_LOG_IO_ERROR);
 
 	kmem_free(qfs);
 	kmem_free(qfe);
diff --git a/fs/xfs/xfs_extfree_item.c b/fs/xfs/xfs_extfree_item.c
index 1a40dd7..1c57eb9 100644
--- a/fs/xfs/xfs_extfree_item.c
+++ b/fs/xfs/xfs_extfree_item.c
@@ -64,7 +64,8 @@ __xfs_efi_release(
 	if (!test_and_clear_bit(XFS_EFI_COMMITTED, &efip->efi_flags)) {
 		spin_lock(&ailp->xa_lock);
 		/* xfs_trans_ail_delete() drops the AIL lock. */
-		xfs_trans_ail_delete(ailp, &efip->efi_item);
+		xfs_trans_ail_delete(ailp, &efip->efi_item,
+				     SHUTDOWN_LOG_IO_ERROR);
 		xfs_efi_item_free(efip);
 	}
 }
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index acd846d..65d7d99 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -2377,7 +2377,7 @@ cluster_corrupt_out:
 	/*
 	 * Unlocks the flush lock
 	 */
-	xfs_iflush_abort(iq);
+	xfs_iflush_abort(iq, false);
 	kmem_free(ilist);
 	xfs_perag_put(pag);
 	return XFS_ERROR(EFSCORRUPTED);
@@ -2482,7 +2482,7 @@ abort_out:
 	/*
 	 * Unlocks the flush lock
 	 */
-	xfs_iflush_abort(ip);
+	xfs_iflush_abort(ip, false);
 	return error;
 }
 
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 8aaebb2..3f96a94 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -777,7 +777,8 @@ xfs_iflush_done(
 			ASSERT(i <= need_ail);
 		}
 		/* xfs_trans_ail_delete_bulk() drops the AIL lock. */
-		xfs_trans_ail_delete_bulk(ailp, log_items, i);
+		xfs_trans_ail_delete_bulk(ailp, log_items, i,
+					  SHUTDOWN_CORRUPT_INCORE);
 	}
 
 
@@ -798,16 +799,15 @@ xfs_iflush_done(
 }
 
 /*
- * This is the inode flushing abort routine.  It is called
- * from xfs_iflush when the filesystem is shutting down to clean
- * up the inode state.
- * It is responsible for removing the inode item
- * from the AIL if it has not been re-logged, and unlocking the inode's
- * flush lock.
+ * This is the inode flushing abort routine.  It is called from xfs_iflush when
+ * the filesystem is shutting down to clean up the inode state.  It is
+ * responsible for removing the inode item from the AIL if it has not been
+ * re-logged, and unlocking the inode's flush lock.
  */
 void
 xfs_iflush_abort(
-	xfs_inode_t		*ip)
+	xfs_inode_t		*ip,
+	bool			stale)
 {
 	xfs_inode_log_item_t	*iip = ip->i_itemp;
 
@@ -817,7 +817,10 @@ xfs_iflush_abort(
 			spin_lock(&ailp->xa_lock);
 			if (iip->ili_item.li_flags & XFS_LI_IN_AIL) {
 				/* xfs_trans_ail_delete() drops the AIL lock. */
-				xfs_trans_ail_delete(ailp, (xfs_log_item_t *)iip);
+				xfs_trans_ail_delete(ailp, &iip->ili_item,
+						stale ?
+						     SHUTDOWN_LOG_IO_ERROR :
+						     SHUTDOWN_CORRUPT_INCORE);
 			} else
 				spin_unlock(&ailp->xa_lock);
 		}
@@ -844,7 +847,7 @@ xfs_istale_done(
 	struct xfs_buf		*bp,
 	struct xfs_log_item	*lip)
 {
-	xfs_iflush_abort(INODE_ITEM(lip)->ili_inode);
+	xfs_iflush_abort(INODE_ITEM(lip)->ili_inode, true);
 }
 
 /*
diff --git a/fs/xfs/xfs_inode_item.h b/fs/xfs/xfs_inode_item.h
index 41d61c3..376d4d0 100644
--- a/fs/xfs/xfs_inode_item.h
+++ b/fs/xfs/xfs_inode_item.h
@@ -165,7 +165,7 @@ extern void xfs_inode_item_init(struct xfs_inode *, struct xfs_mount *);
 extern void xfs_inode_item_destroy(struct xfs_inode *);
 extern void xfs_iflush_done(struct xfs_buf *, struct xfs_log_item *);
 extern void xfs_istale_done(struct xfs_buf *, struct xfs_log_item *);
-extern void xfs_iflush_abort(struct xfs_inode *);
+extern void xfs_iflush_abort(struct xfs_inode *, bool);
 extern int xfs_inode_item_format_convert(xfs_log_iovec_t *,
 					 xfs_inode_log_format_t *);
 
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 5e864a9..396e3bf 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2645,7 +2645,8 @@ xlog_recover_efd_pass2(
 				 * xfs_trans_ail_delete() drops the
 				 * AIL lock.
 				 */
-				xfs_trans_ail_delete(ailp, lip);
+				xfs_trans_ail_delete(ailp, lip,
+						     SHUTDOWN_CORRUPT_INCORE);
 				xfs_efi_item_free(efip);
 				spin_lock(&ailp->xa_lock);
 				break;
diff --git a/fs/xfs/xfs_sync.c b/fs/xfs/xfs_sync.c
index 67f3ea6..f7f019f 100644
--- a/fs/xfs/xfs_sync.c
+++ b/fs/xfs/xfs_sync.c
@@ -680,7 +680,7 @@ restart:
 		goto reclaim;
 	if (XFS_FORCED_SHUTDOWN(ip->i_mount)) {
 		xfs_iunpin_wait(ip);
-		xfs_iflush_abort(ip);
+		xfs_iflush_abort(ip, false);
 		goto reclaim;
 	}
 	if (xfs_ipincount(ip)) {
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 99106f1..cd2fde7 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -717,7 +717,8 @@ void
 xfs_trans_ail_delete_bulk(
 	struct xfs_ail		*ailp,
 	struct xfs_log_item	**log_items,
-	int			nr_items) __releases(ailp->xa_lock)
+	int			nr_items,
+	int			shutdown_type) __releases(ailp->xa_lock)
 {
 	xfs_log_item_t		*mlip;
 	int			mlip_changed = 0;
@@ -735,7 +736,7 @@ xfs_trans_ail_delete_bulk(
 				xfs_alert_tag(mp, XFS_PTAG_AILDELETE,
 		"%s: attempting to delete a log item that is not in the AIL",
 						__func__);
-				xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+				xfs_force_shutdown(mp, shutdown_type);
 			}
 			return;
 		}
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index dd2742f..da03d1a 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -93,14 +93,16 @@ xfs_trans_ail_update(
 }
 
 void	xfs_trans_ail_delete_bulk(struct xfs_ail *ailp,
-				struct xfs_log_item **log_items, int nr_items)
+				struct xfs_log_item **log_items, int nr_items,
+				int shutdown_type)
 				__releases(ailp->xa_lock);
 static inline void
 xfs_trans_ail_delete(
 	struct xfs_ail	*ailp,
-	xfs_log_item_t	*lip) __releases(ailp->xa_lock)
+	xfs_log_item_t	*lip,
+	int		shutdown_type) __releases(ailp->xa_lock)
 {
-	xfs_trans_ail_delete_bulk(ailp, &lip, 1);
+	xfs_trans_ail_delete_bulk(ailp, &lip, 1, shutdown_type);
 }
 
 void			xfs_ail_push(struct xfs_ail *, xfs_lsn_t);


* Re: [PATCH 17/18] xfs: kill XBF_DONTBLOCK
  2012-04-13 12:11 ` [PATCH 17/18] xfs: kill XBF_DONTBLOCK Dave Chinner
@ 2012-04-16 14:34   ` Mark Tinguely
  0 siblings, 0 replies; 45+ messages in thread
From: Mark Tinguely @ 2012-04-16 14:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/13/12 07:11, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> Just about all callers of xfs_buf_read() and xfs_buf_get() use XBF_DONTBLOCK.
> This is used to make memory allocation use GFP_NOFS rather than GFP_KERNEL to
> avoid recursion through memory reclaim back into the filesystem.
>
> All the blocking get calls in growfs occur inside a transaction, even though
> they are not part of the transaction, so all allocation will be GFP_NOFS due to
> the task flag PF_TRANS being set. The blocking read calls occur during log
> recovery, so they will probably be unaffected by converting to GFP_NOFS
> allocations.
>
> Hence make XBF_DONTBLOCK behaviour always occur for buffers and kill the flag.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig<hch@lst.de>
> ---

Looks good.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>


* Re: [PATCH 18/18] xfs: use iolock on XFS_IOC_ALLOCSP calls
  2012-04-13 12:11 ` [PATCH 18/18] xfs: use iolock on XFS_IOC_ALLOCSP calls Dave Chinner
@ 2012-04-16 15:10   ` Mark Tinguely
  0 siblings, 0 replies; 45+ messages in thread
From: Mark Tinguely @ 2012-04-16 15:10 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/13/12 07:11, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> fsstress has a particular effective way of stopping debug XFS
> kernels. We keep seeing assert failures due to finding delayed
> allocation extents where there should be none. This shows up when
> extracting extent maps and we are holding all the locks we should be
> to prevent races, so this really makes no sense to see these errors.
>
> After checking that fsstress does not use mmap, it occurred to me
> that fsstress uses something that no sane application uses - the
> XFS_IOC_ALLOCSP ioctl interfaces for preallocation. These interfaces
> do allocation of blocks beyond EOF without using preallocation, and
> then call setattr to extend and zero the allocated blocks.
>
> The problem here is that this is a buffered write, and hence the
> allocation is a delayed allocation. Unlike the buffered IO path, the
> allocation and zeroing are not serialised using the IOLOCK. Hence
> the ALLOCSP operation can race with operations holding the iolock to
> prevent buffered IO operations from occurring.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig<hch@lst.de>
> ---


Looks good.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>


* Re: [PATCH 05/18] xfs: Use preallocation for inodes with extsz hints
  2012-04-13 12:10 ` [PATCH 05/18] xfs: Use preallocation for inodes with extsz hints Dave Chinner
  2012-04-13 16:45   ` Mark Tinguely
@ 2012-04-16 15:59   ` Mark Tinguely
  1 sibling, 0 replies; 45+ messages in thread
From: Mark Tinguely @ 2012-04-16 15:59 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/13/12 07:10, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> xfstest 229 exposes a problem with buffered IO, delayed allocation
> and extent size hints. That is when we do delayed allocation during
> buffered IO, we reserve space for the extent size hint alignment and
> allocate the physical space to align the extent, but we do not zero
> the regions of the extent that aren't written by the write(2)
> syscall. The result is that we expose stale data in unwritten
> regions of the extent size hints.
>
> There are two ways to fix this. The first is to detect that we are
> doing unaligned writes, check if there is already a mapping or data
> over the extent size hint range, and if not zero the page cache
> first before then doing the real write. This can be very expensive
> for large extent size hints, especially if the subsequent writes
> fill the entire extent size before the data is written to disk.
>
> The second, and simpler way, is simply to turn off delayed
> allocation when the extent size hint is set and use preallocation
> instead. This results in unwritten extents being laid down on disk
> and so only the written portions will be converted. This matches the
> behaviour for direct IO, and will also work for the real time
> device. The disadvantage of this approach is that for small extent
> size hints we can get file fragmentation, but in general extent size
> hints are fairly large (e.g. stripe width sized) so this isn't a big
> deal.
>
> Implement the second approach as it is simple and effective.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---


Looks good.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>


* Re: [PATCH 04/18] xfs: page type check in writeback only checks last buffer
  2012-04-13 12:10 ` [PATCH 04/18] xfs: page type check in writeback only checks last buffer Dave Chinner
@ 2012-04-16 16:15   ` Mark Tinguely
  2012-04-29 21:39   ` Christoph Hellwig
  1 sibling, 0 replies; 45+ messages in thread
From: Mark Tinguely @ 2012-04-16 16:15 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/13/12 07:10, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> xfs_is_delayed_page() checks to see if a page has buffers matching
> the given IO type passed in. It does so by walking the buffer heads
> on the page and checking if the state flags match the IO type.
>
> However, the "acceptable" variable that is calculated is overwritten
> every time a new buffer is checked. Hence if the first buffer on the
> page is of the right type, this state is lost if the second buffer
> is not of the correct type. This means that xfs_aops_discard_page()
> may not discard delalloc regions when it is supposed to, and
> xfs_convert_page() may not cluster IO as efficiently as possible.
>
> This problem only occurs on filesystems with a block size smaller
> than page size.
>
> Also, rename xfs_is_delayed_page() to xfs_check_page_type() to
> better describe what it is doing - it is not delalloc specific
> anymore.
>
> The problem was first noticed by Peter Watkins.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---
>   fs/xfs/xfs_aops.c |   12 ++++++------
>   1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
> index 0783def..2fc12db 100644
> --- a/fs/xfs/xfs_aops.c
> +++ b/fs/xfs/xfs_aops.c
> @@ -623,7 +623,7 @@ xfs_map_at_offset(
>    * or delayed allocate extent.
>    */
>   STATIC int
> -xfs_is_delayed_page(
> +xfs_check_page_type(
>   	struct page		*page,
>   	unsigned int		type)
>   {
> @@ -637,11 +637,11 @@ xfs_is_delayed_page(
>   		bh = head = page_buffers(page);
>   		do {
>   			if (buffer_unwritten(bh))
> -				acceptable = (type == IO_UNWRITTEN);
> +				acceptable += (type == IO_UNWRITTEN);
>   			else if (buffer_delay(bh))
> -				acceptable = (type == IO_DELALLOC);
> +				acceptable += (type == IO_DELALLOC);
>   			else if (buffer_dirty(bh)&&  buffer_mapped(bh))
> -				acceptable = (type == IO_OVERWRITE);
> +				acceptable += (type == IO_OVERWRITE);
>   			else
>   				break;

Looks good.

It could short-cut and return 1 on the first acceptable buffer rather 
than scanning the entire page, too.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>
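The one-character fix (`=` to `+=`) accumulates matches across all buffers on the page instead of keeping only the last buffer's result. A hypothetical user-space rendering of both the fixed walk and Mark's suggested early-return variant; the buffer-head list and state enum are stand-ins, and the sketch drops the `else break` the real walk also performs:

```c
#include <assert.h>
#include <stddef.h>

enum io_type { IO_UNWRITTEN, IO_DELALLOC, IO_OVERWRITE };

struct bh {
	enum io_type	type;
	struct bh	*next;
};

/* Fixed walk: += so an early match survives later non-matching buffers. */
static int check_page_type(struct bh *head, enum io_type type)
{
	int acceptable = 0;

	for (struct bh *bh = head; bh != NULL; bh = bh->next)
		acceptable += (bh->type == type);
	return acceptable != 0;
}

/* Suggested short-cut: stop at the first acceptable buffer. */
static int check_page_type_fast(struct bh *head, enum io_type type)
{
	for (struct bh *bh = head; bh != NULL; bh = bh->next)
		if (bh->type == type)
			return 1;
	return 0;
}
```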


* Re: [PATCH 02/18 V2] xfs: pass shutdown method into xfs_trans_ail_delete_bulk
  2012-04-13 23:38       ` [PATCH 02/18 V2] " Dave Chinner
@ 2012-04-16 18:49         ` Mark Tinguely
  0 siblings, 0 replies; 45+ messages in thread
From: Mark Tinguely @ 2012-04-16 18:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/13/12 18:38, Dave Chinner wrote:
> On Sat, Apr 14, 2012 at 09:04:51AM +1000, Dave Chinner wrote:
>> On Fri, Apr 13, 2012 at 12:40:46PM -0500, Mark Tinguely wrote:
>>> On 04/13/12 07:10, Dave Chinner wrote:
>>>> From: Dave Chinner<dchinner@redhat.com>
>>>>
>>>> xfs_trans_ail_delete_bulk() can be called from different contexts so
>>>> if the item is not in the AIL we need different shutdown for each
>>>> context.  Pass in the shutdown method needed so the correct action
>>>> can be taken.
>>>>
>>>> Signed-off-by: Dave Chinner<dchinner@redhat.com>
>>>> Reviewed-by: Christoph Hellwig<hch@lst.de>
>>>> Reviewed-by: Mark Tinguely<tinguely@sgi.com>
>>>
>>>
>>>
>>> These items in my copy of Christoph's xfsbufd series are not
>>> resolved in this patch:
>>
>> Ah, oh, that's my fault. I re-ordered the series so that all my
>> patches were after christoph's, and when reapplying christophs
>> patches from my series there were a few merge conflicts I obviously
>> didn't resolve correctly.
>>
>> I'll fix my tree up and repost this patch.
>
> Updated patch below.
>

Applies cleanly. Thanks.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>


* Re: [PATCH 0/18] xfs: current patch queue
  2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
                   ` (17 preceding siblings ...)
  2012-04-13 12:11 ` [PATCH 18/18] xfs: use iolock on XFS_IOC_ALLOCSP calls Dave Chinner
@ 2012-04-16 21:29 ` Ben Myers
  2012-04-17  4:12   ` Dave Chinner
  18 siblings, 1 reply; 45+ messages in thread
From: Ben Myers @ 2012-04-16 21:29 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Dave, 

On Fri, Apr 13, 2012 at 10:10:43PM +1000, Dave Chinner wrote:
> Here's a repost of my current patch queue. They are all based on Christoph's
> ilock and xfsbufd patch series and been tested on top of them. I've updated all
> the reviewed-by tags, so if the patch is missing such a tag then it hasn't been
> reviewed....

Looks like Christoph has some work to do on patch 4 of his xfsbufd series.  Do
you want to rebase to something without that patch?

Regards,
	Ben


* Re: [PATCH 0/18] xfs: current patch queue
  2012-04-16 21:29 ` [PATCH 0/18] xfs: current patch queue Ben Myers
@ 2012-04-17  4:12   ` Dave Chinner
  2012-04-17 14:44     ` Ben Myers
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2012-04-17  4:12 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Mon, Apr 16, 2012 at 04:29:11PM -0500, Ben Myers wrote:
> Dave, 
> 
> On Fri, Apr 13, 2012 at 10:10:43PM +1000, Dave Chinner wrote:
> > Here's a repost of my current patch queue. They are all based on Christoph's
> > ilock and xfsbufd patch series and been tested on top of them. I've updated all
> > the reviewed-by tags, so if the patch is missing such a tag then it hasn't been
> > reviewed....
> 
> Looks like Christoph has some work to do on patch 4 of his xfsbufd series.  Do
> you want to rebase to something without that patch?

It's a bit of a PITA to do because of all the overlap with
Christoph's patch set. I'll spend the afternoon trying to work out
what is wrong with Christoph's code, because I think we really need
to get rid of the xfsbufd before I start making the buffer code more
complex again...

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 0/18] xfs: current patch queue
  2012-04-17  4:12   ` Dave Chinner
@ 2012-04-17 14:44     ` Ben Myers
  0 siblings, 0 replies; 45+ messages in thread
From: Ben Myers @ 2012-04-17 14:44 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Hey Dave,

On Tue, Apr 17, 2012 at 02:12:58PM +1000, Dave Chinner wrote:
> On Mon, Apr 16, 2012 at 04:29:11PM -0500, Ben Myers wrote:
> > On Fri, Apr 13, 2012 at 10:10:43PM +1000, Dave Chinner wrote:
> > > Here's a repost of my current patch queue. They are all based on Christoph's
> > > ilock and xfsbufd patch series and been tested on top of them. I've updated all
> > > the reviewed-by tags, so if the patch is missing such a tag then it hasn't been
> > > reviewed....
> > 
> > Looks like Christoph has some work to do on patch 4 of his xfsbufd series.  Do
> > you want to rebase to something without that patch?
> 
> It's a bit of a PITA to do because of all the overlap with
> Christoph's patch set.

I noticed.  ;)

> I'll spend the afternoon trying to work out
> what is wrong with Christoph's code, because I think we really need
> to get rid of the xfsbufd before I start making the buffer code more
> complex again...

Cool.  I agree it's great to get rid of xfsbufd, just wanted you to know that
the hang is blocking this series.

Regards,
	Ben


* Re: [PATCH 03/18] xfs: Do background CIL flushes via a workqueue
  2012-04-13 12:10 ` [PATCH 03/18] xfs: Do background CIL flushes via a workqueue Dave Chinner
@ 2012-04-17 17:54   ` Mark Tinguely
  2012-04-17 21:21   ` Ben Myers
  1 sibling, 0 replies; 45+ messages in thread
From: Mark Tinguely @ 2012-04-17 17:54 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On 04/13/12 07:10, Dave Chinner wrote:
> From: Dave Chinner<dchinner@redhat.com>
>
> Doing background CIL flushes adds significant latency to whatever
> async transaction that triggers it. To avoid blocking async
> transactions on things like waiting for log buffer IO to complete,
> move the CIL push off into a workqueue.  By moving the push work
> into a workqueue, we remove all the latency that the commit adds
> from the foreground transaction commit path. This also means that
> single threaded workloads won't do the CIL push processing, leaving
> them more CPU to do more async transactions.
>
> To do this, we need to keep track of the sequence number we have
> pushed work for. This avoids having many transaction commits
> attempting to schedule work for the same sequence, and ensures that
> we only ever have one push (background or forced) in progress at a
> time. It also means that we don't need to take the CIL lock in write
> mode to check for potential background push races, which reduces
> lock contention.
>
> To avoid potential issues with "smart" IO schedulers, don't use the
> workqueue for log force triggered flushes. Instead, do them directly
> so that the log IO is done directly by the process issuing the log
> force and so doesn't get stuck on IO elevator queue idling
> incorrectly delaying the log IO from the workqueue.
>
> Signed-off-by: Dave Chinner<dchinner@redhat.com>
> ---

Makes sense.

Reviewed-by: Mark Tinguely <tinguely@sgi.com>



* Re: [PATCH 03/18] xfs: Do background CIL flushes via a workqueue
  2012-04-13 12:10 ` [PATCH 03/18] xfs: Do background CIL flushes via a workqueue Dave Chinner
  2012-04-17 17:54   ` Mark Tinguely
@ 2012-04-17 21:21   ` Ben Myers
  2012-04-17 21:49     ` Ben Myers
  2012-04-18  1:58     ` Dave Chinner
  1 sibling, 2 replies; 45+ messages in thread
From: Ben Myers @ 2012-04-17 21:21 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, Apr 13, 2012 at 10:10:46PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Doing background CIL flushes adds significant latency to whatever
> async transaction that triggers it. To avoid blocking async
> transactions on things like waiting for log buffer IO to complete,
> move the CIL push off into a workqueue.  By moving the push work
> into a workqueue, we remove all the latency that the commit adds
> from the foreground transaction commit path. This also means that
> > single threaded workloads won't do the CIL push processing, leaving
> them more CPU to do more async transactions.
> 
> To do this, we need to keep track of the sequence number we have
> pushed work for. This avoids having many transaction commits
> attempting to schedule work for the same sequence, and ensures that
> we only ever have one push (background or forced) in progress at a
> time. It also means that we don't need to take the CIL lock in write
> mode to check for potential background push races, which reduces
> lock contention.
> 
> To avoid potential issues with "smart" IO schedulers, don't use the
> workqueue for log force triggered flushes. Instead, do them directly
> so that the log IO is done directly by the process issuing the log
> force and so doesn't get stuck on IO elevator queue idling
> incorrectly delaying the log IO from the workqueue.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_log_cil.c  |  241 ++++++++++++++++++++++++++++++-------------------
>  fs/xfs/xfs_log_priv.h |    2 +
>  fs/xfs/xfs_mount.h    |    1 +
>  fs/xfs/xfs_super.c    |    7 ++
>  4 files changed, 157 insertions(+), 94 deletions(-)
> 
> diff --git a/fs/xfs/xfs_log_cil.c b/fs/xfs/xfs_log_cil.c
> index d4fadbe..566a2d5 100644
> --- a/fs/xfs/xfs_log_cil.c
> +++ b/fs/xfs/xfs_log_cil.c
> @@ -32,58 +32,6 @@
>  #include "xfs_discard.h"
>  
>  /*
> - * Perform initial CIL structure initialisation.
> - */
> -int
> -xlog_cil_init(
> -	struct log	*log)
> -{
> -	struct xfs_cil	*cil;
> -	struct xfs_cil_ctx *ctx;
> -
> -	cil = kmem_zalloc(sizeof(*cil), KM_SLEEP|KM_MAYFAIL);
> -	if (!cil)
> -		return ENOMEM;
> -
> -	ctx = kmem_zalloc(sizeof(*ctx), KM_SLEEP|KM_MAYFAIL);
> -	if (!ctx) {
> -		kmem_free(cil);
> -		return ENOMEM;
> -	}
> -
> -	INIT_LIST_HEAD(&cil->xc_cil);
> -	INIT_LIST_HEAD(&cil->xc_committing);
> -	spin_lock_init(&cil->xc_cil_lock);
> -	init_rwsem(&cil->xc_ctx_lock);
> -	init_waitqueue_head(&cil->xc_commit_wait);
> -
> -	INIT_LIST_HEAD(&ctx->committing);
> -	INIT_LIST_HEAD(&ctx->busy_extents);
> -	ctx->sequence = 1;
> -	ctx->cil = cil;
> -	cil->xc_ctx = ctx;
> -	cil->xc_current_sequence = ctx->sequence;
> -
> -	cil->xc_log = log;
> -	log->l_cilp = cil;
> -	return 0;
> -}
> -
> -void
> -xlog_cil_destroy(
> -	struct log	*log)
> -{
> -	if (log->l_cilp->xc_ctx) {
> -		if (log->l_cilp->xc_ctx->ticket)
> -			xfs_log_ticket_put(log->l_cilp->xc_ctx->ticket);
> -		kmem_free(log->l_cilp->xc_ctx);
> -	}
> -
> -	ASSERT(list_empty(&log->l_cilp->xc_cil));
> -	kmem_free(log->l_cilp);
> -}
> -
> -/*
>   * Allocate a new ticket. Failing to get a new ticket makes it really hard to
>   * recover, so we don't allow failure here. Also, we allocate in a context that
>   * we don't want to be issuing transactions from, so we need to tell the
> @@ -426,8 +374,7 @@ xlog_cil_committed(
>   */
>  STATIC int
>  xlog_cil_push(
> -	struct log		*log,
> -	xfs_lsn_t		push_seq)
> +	struct log		*log)
>  {
>  	struct xfs_cil		*cil = log->l_cilp;
>  	struct xfs_log_vec	*lv;
> @@ -443,39 +390,35 @@ xlog_cil_push(
>  	struct xfs_log_iovec	lhdr;
>  	struct xfs_log_vec	lvhdr = { NULL };
>  	xfs_lsn_t		commit_lsn;
> +	xfs_lsn_t		push_seq;
>  
>  	if (!cil)
>  		return 0;
>  
> -	ASSERT(!push_seq || push_seq <= cil->xc_ctx->sequence);
> -
>  	new_ctx = kmem_zalloc(sizeof(*new_ctx), KM_SLEEP|KM_NOFS);
>  	new_ctx->ticket = xlog_cil_ticket_alloc(log);
>  
> -	/*
> -	 * Lock out transaction commit, but don't block for background pushes
> -	 * unless we are well over the CIL space limit. See the definition of
> -	 * XLOG_CIL_HARD_SPACE_LIMIT() for the full explanation of the logic
> -	 * used here.
> -	 */
> -	if (!down_write_trylock(&cil->xc_ctx_lock)) {
> -		if (!push_seq &&
> -		    cil->xc_ctx->space_used < XLOG_CIL_HARD_SPACE_LIMIT(log))
> -			goto out_free_ticket;
> -		down_write(&cil->xc_ctx_lock);
> -	}
> +	down_write(&cil->xc_ctx_lock);
>  	ctx = cil->xc_ctx;
>  
> -	/* check if we've anything to push */
> -	if (list_empty(&cil->xc_cil))
> -		goto out_skip;
> +	spin_lock(&cil->xc_cil_lock);
> +	push_seq = cil->xc_push_seq;
> +	ASSERT(push_seq > 0 && push_seq <= ctx->sequence);

Gah! I just hit this assert.  

v3.4-rc2-3-g8a00ebe with:
Christoph's ilock series
Christoph's xfsbufd series
Jan's freeze series
Dave's queue.

nfs7 login: [ 1175.172406] XFS: Assertion failed: push_seq > 0 && push_seq <= ctx->sequence, file: /root/xfs/fs/xfs/xfs_log_cil.c, line: 406
[ 1175.183766] ------------[ cut here ]------------
[ 1175.188010] kernel BUG at /root/xfs/fs/xfs/xfs_message.c:101!
[ 1175.188010] invalid opcode: 0000 [#1] PREEMPT SMP 
[ 1175.188010] Modules linked in: xfs(O) exportfs af_packet dm_mod floppy iTCO_wdt sg i2c_i801 iTCO_vendor_support e7xxx_edac edac_core sr_mod e100 cdrom e1000 shpchp pci_hotplug button serio_raw pcspkr autofs4 processor thermal_sys ata_generic
[ 1175.188010] 
[ 1175.188010] Pid: 2760, comm: kworker/3:2 Tainted: G           O 3.4.0-rc2-1.2-desktop+ #15 TYAN Computer Corp. S2721-533 Thunder i7501 Pro/S2721-533 Thunder i7501 Pro
[ 1175.188010] EIP: 0060:[<faa4f966>] EFLAGS: 00010296 CPU: 3
[ 1175.188010] EIP is at assfail+0x26/0x30 [xfs]
[ 1175.188010] EAX: 00000087 EBX: f1f10980 ECX: 00000079 EDX: 00000046
[ 1175.188010] ESI: f1f10780 EDI: 00000000 EBP: f1d93ec4 ESP: f1d93eb0
[ 1175.188010]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
[ 1175.188010] CR0: 8005003b CR2: b770ee20 CR3: 2779c000 CR4: 000007f0
[ 1175.188010] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 1175.188010] DR6: ffff0ff0 DR7: 00000400
[ 1175.188010] Process kworker/3:2 (pid: 2760, ti=f1d92000 task=f1e671e0 task.ti=f1d92000)
[ 1175.188010] Stack:
[ 1175.188010]  00000000 faac7a6c faad0ee4 faad0e28 00000196 f1d93f4c faaacb2c ee24c3b0
[ 1175.188010]  eee1f700 f1d93f00 c0201a8e f1f10994 f1daea00 f1f1098c 00000000 f1e67500
[ 1175.188010]  f1f10c00 00000000 00000000 00000000 00000000 00000000 00000000 c06983de
[ 1175.188010] Call Trace:
[ 1175.188010]  [<faaacb2c>] xlog_cil_push+0x2cc/0x3e0 [xfs]
[ 1175.188010]  [<c0201a8e>] ? __switch_to+0xde/0x2c0
[ 1175.188010]  [<c06983de>] ? __schedule+0x21e/0x7c0
[ 1175.188010]  [<faaacc4b>] xlog_cil_push_work+0xb/0x10 [xfs]
[ 1175.188010]  [<c024bc07>] process_one_work+0xf7/0x3f0
[ 1175.188010]  [<faaacc40>] ? xlog_cil_push+0x3e0/0x3e0 [xfs]
[ 1175.188010]  [<c024c202>] worker_thread+0x122/0x2d0
[ 1175.188010]  [<c024c0e0>] ? rescuer_thread+0x1b0/0x1b0
[ 1175.188010]  [<c024fd6d>] kthread+0x6d/0x80
[ 1175.188010]  [<c024fd00>] ? kthread_freezable_should_stop+0x50/0x50
[ 1175.188010]  [<c06a02b6>] kernel_thread_helper+0x6/0xd
[ 1175.188010] Code: bf 00 00 00 00 55 89 e5 83 ec 14 89 4c 24 10 89 54 24 0c 89 44 24 08 c7 44 24 04 6c 7a ac fa c7 04 24 00 00 00 00 e8 da fd ff ff <0f> 0b 90 8d b4 26 00 00 00 00 55 b9 01 00 00 00 89 e5 83 ec 14 
[ 1175.188010] EIP: [<faa4f966>] assfail+0x26/0x30 [xfs] SS:ESP 0068:f1d93eb0
[ 1175.189377] ---[ end trace b2f84afec1bd6d71 ]---

Unfortunately this machine isn't configured to dump.

-Ben



* Re: [PATCH 03/18] xfs: Do background CIL flushes via a workqueue
  2012-04-17 21:21   ` Ben Myers
@ 2012-04-17 21:49     ` Ben Myers
  2012-04-18  1:47       ` Dave Chinner
  2012-04-18  1:58     ` Dave Chinner
  1 sibling, 1 reply; 45+ messages in thread
From: Ben Myers @ 2012-04-17 21:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Tue, Apr 17, 2012 at 04:21:55PM -0500, Ben Myers wrote:
> On Fri, Apr 13, 2012 at 10:10:46PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > [...]
> > +	spin_lock(&cil->xc_cil_lock);
> > +	push_seq = cil->xc_push_seq;
> > +	ASSERT(push_seq > 0 && push_seq <= ctx->sequence);
> 
> Gah! I just hit this assert.  
> 
> v3.4-rc2-3-g8a00ebe with:
> Christoph's ilock series
> Christoph's xfsbufd series
> Jan's freeze series
> Dave's queue.
> 
> nfs7 login: [ 1175.172406] XFS: Assertion failed: push_seq > 0 && push_seq <= ctx->sequence, file: /root/xfs/fs/xfs/xfs_log_cil.c, line: 406
> [...]
> 
> Unfortunately this machine isn't configured to dump.

Now I've backed off to just Christoph's ilock series and ran 'check -g auto'.

It hung on the first test:

[ 1175.190019]  [<faaacc40>] ? xlog_cil_push+0x3e0/0x3e0 [xfs]
[ 1175.190025]  [<c024c202>] worker_thread+0x122/0x2d0
nfs7 login: 0]  [<c024c0e0>] ? rescuer_thread+0x1b0/0x1b0
Welcome to openSUSE 12.1 "Asparagus" - Kernel 3.4.0-rc2-1.2-desktop+ (ttyS0).
[ 1175.190042]  [<c024fd00>] ? kthread_freezable_should_stop+0x50/0x50
[ 1175.190049]  [<c06a02b6>] kernel_thread_helper+0x6/0xd
nfs7 login: 2] Code: 89 e0 25 00 e0 ff ff 83 68 14 01 8b 40 08 a8 08 0f 84 42 ff ff ff e8 d6 88 44 00 e9 38 ff ff ff 90 55 8b 80 84 02 00 00 89 e5 5d <8b> 40 fc c3 66 90 3b 05 44 17 9a cWelcome to openSUSE 12.1 "Asparagus" - Kernel 3.4.0-rc2-1.2-desktop+ (ttyS0).
[ 1175.190109] EIP: [<c025013a>] kthread_data+0xa/0x10 SS:ESP 0068:f1d93ca8
[ 1175.190116] CR2: 00000000fffffffc
nfs7 login: [  390.054357] XFS (�=��P۔�): xlog_recover_inode_pass2: Bad inode magic number, dip = 0xf0ffd800, dino bp = 0xef89c480, ino = 25541592
[  390.066290] XFS (�=��P۔�): Internal error xlog_recover_inode_pass2(1) at line 2248 of file /root/xfs/fs/xfs/xfs_log_recover.c.  Caller 0xfaa08ffa
[  390.066295] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 2, t=60008 jiffies)
[  390.082217] XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: /root/xfs/fs/xfs/xfs_mount.c, line: 272
[  390.092542] ------------[ cut here ]------------ CPUs/tasks: { 3} (detected by 2, t=240013 jiffies)
[  390.097004] kernel BUG at /root/xfs/fs/xfs/xfs_message.c:101!
[  390.097004] invalid opcode: 0000 [#1] PREEMPT SMP PUs/tasks: { 3} (detected by 1, t=420012 jiffies)
[  390.097004] Modules linked in: xfs(O) exportfs af_packet dm_mod e1000 e100 sr_mod shpchp iTCO_wdt cdrom i2c_i801 e7xxx_edac iTCO_vendor_support sg floppy pci_hotplug serio_raw button edac_core pcspkr autofs4 processor thermal_sys ata_genericasks: { 3} (detected by 2, t=600023 jiffies)
[  390.097004] INFO: Stall ended before state dump start
[  390.097004] Pid: 5217, comm: mount Tainted: G           O 3.4.0-rc2-1.2-desktop+ #15 TYAN Computer Corp. S2721-533 Thunder i7501 Pro/S2721-533 Thunder i7501 Pro
[  390.097004] EIP: 0060:[<fa9b8d56>] EFLAGS: 00010286 CPU: 3
[  390.097004] EIP is at assfail+0x26/0x30 [xfs] on CPUs/tasks: { 3} (detected by 2, t=960033 jiffies)
[  390.097004] EAX: 0000007b EBX: f0f87680 ECX: 000000f3 EDX: 00000046
[  390.097004] ESI: 00000000 EDI: f1d089a8 EBP: f1d67ddc ESP: f1d67dc8detected by 1, t=1140032 jiffies)
[  390.097004]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[  390.097004] CR0: 8005003b CR2: b7735580 CR3: 2ff82000 CR4: 000007f0detected by 1, t=1320037 jiffies)
[  390.097004] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[  390.097004] DR6: ffff0ff0 DR7: 00000400stalls on CPUs/tasks: { 3} (detected by 1, t=1500042 jiffies)
[  390.097004] Process mount (pid: 5217, ti=f1d66000 task=f0d99060 task.ti=f1d66000)
[  390.097004] Stack:
[  390.097004]  00000000 faa31844 faa399d4 faa41a7f 00000110 f1d67df8 faa0b125 f1d0899c
[  390.097004]  f1d08800 00000075 f1d08800 00000014 f1d67e50 faa0dfe9 f1d08800 faa41c34
[  390.097004]  00005000 ef82b700 fa9aea50 f1d67e24 f1d08acc 00000000 00000002 00000003
[  390.097004] Call Trace:
[  390.097004]  [<faa0b125>] xfs_free_perag+0x75/0xa0 [xfs]
[  390.097004]  [<faa0dfe9>] xfs_mountfs+0x2d9/0x710 [xfs]
[  390.097004]  [<fa9aea50>] ? _xfs_filestream_pick_ag+0x1b0/0x1b0 [xfs]
[  390.097004]  [<fa9bb5d6>] xfs_fs_fill_super+0x196/0x240 [xfs]
[  390.097004]  [<c031ce72>] mount_bdev+0x172/0x1b0
[  390.097004]  [<fa9b965a>] xfs_fs_mount+0x1a/0x20 [xfs]
[  390.097004]  [<fa9bb440>] ? xfs_finish_flags+0x130/0x130 [xfs]
[  390.097004]  [<c031d9a1>] mount_fs+0x31/0x170
[  390.097004]  [<c02ecbfa>] ? __alloc_percpu+0xa/0x10
[  390.097004]  [<c03336cc>] vfs_kern_mount+0x4c/0xc0
[  390.097004]  [<c0333fd9>] do_kern_mount+0x39/0xd0
[  390.097004]  [<c0335381>] do_mount+0x161/0x710
[  390.097004]  [<c02e7e41>] ? strndup_user+0x41/0x60
[  390.097004]  [<c0335a36>] sys_mount+0x66/0xa0
[  390.097004]  [<c0699d2d>] syscall_call+0x7/0xb
[  390.097004] Code: bf 00 00 00 00 55 89 e5 83 ec 14 89 4c 24 10 89 54 24 0c 89 44 24 08 c7 44 24 04 44 18 a3 fa c7 04 24 00 00 00 00 e8 da fd ff ff <0f> 0b 90 8d b4 26 00 00 00 00 55 b9 01 00 00 00 89 e5 83 ec 14 
[  390.097004] EIP: [<fa9b8d56>] assfail+0x26/0x30 [xfs] SS:ESP 0068:f1d67dc8
[  390.097633] ---[ end trace 33a9795b638157b9 ]---

Here it is from the system log

Apr 17 16:35:12 linux kernel: [  389.637543] XFS (264=211361P۔300): Mounting Filesystem
Apr 17 16:35:12 linux kernel: [  389.912402] XFS (264=211361P۔300): Starting recovery (logdev: internal)
Apr 17 16:35:12 linux kernel: [  390.054357] XFS (264=211361P۔300): xlog_recover_inode_pass2: Bad inode magic number, dip = 0xf0ffd800, dino bp = 0xef89c480, ino = 25541592
Apr 17 16:35:12 linux kernel: [  390.066290] XFS (264=211361P۔300): Internal error xlog_recover_inode_pass2(1) at line 2248 of file /root/xfs/fs/xfs/xfs_log_recover.c.  Caller 0xfaa08ffa
Apr 17 16:35:12 linux kernel: [  390.066295] 
Apr 17 16:35:12 linux kernel: [  390.080932] Pid: 5217, comm: mount Tainted: G           O 3.4.0-rc2-1.2-desktop+ #15
Apr 17 16:35:12 linux kernel: [  390.080938] Call Trace:
Apr 17 16:35:12 linux kernel: [  390.080998]  [<fa9ac826>] xfs_error_report+0x46/0x50 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081063]  [<faa08ffa>] ? xlog_recover_commit_pass2+0xea/0x1a0 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081105]  [<faa086a1>] xlog_recover_inode_pass2+0x7e1/0x9f0 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081167]  [<faa08ffa>] ? xlog_recover_commit_pass2+0xea/0x1a0 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081212]  [<faa07d32>] ? xlog_recover_buffer_pass2+0x122/0x2b0 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081255]  [<faa08ffa>] xlog_recover_commit_pass2+0xea/0x1a0 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081326]  [<faa09105>] xlog_recover_commit_trans+0x55/0xa0 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081371]  [<faa0932c>] xlog_recover_process_data+0x1ac/0x2a0 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081444]  [<faa0a2d8>] xlog_do_recovery_pass+0x3b8/0x7c0 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081505]  [<faa0a793>] xlog_do_log_recovery+0xb3/0x120 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081562]  [<faa0a9b4>] xlog_do_recover+0x24/0x1c0 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081612]  [<fa9b8c48>] ? xfs_notice+0x28/0x30 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081654]  [<faa0abc8>] xlog_recover+0x78/0x90 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081709]  [<faa125c2>] xfs_log_mount+0x92/0x180 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081752]  [<faa0e073>] xfs_mountfs+0x363/0x710 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081785]  [<fa9aea50>] ? _xfs_filestream_pick_ag+0x1b0/0x1b0 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081828]  [<fa9bb5d6>] xfs_fs_fill_super+0x196/0x240 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081845]  [<c031ce72>] mount_bdev+0x172/0x1b0
Apr 17 16:35:12 linux kernel: [  390.081888]  [<fa9b965a>] xfs_fs_mount+0x1a/0x20 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081932]  [<fa9bb440>] ? xfs_finish_flags+0x130/0x130 [xfs]
Apr 17 16:35:12 linux kernel: [  390.081943]  [<c031d9a1>] mount_fs+0x31/0x170
Apr 17 16:35:12 linux kernel: [  390.081954]  [<c02ecbfa>] ? __alloc_percpu+0xa/0x10
Apr 17 16:35:12 linux kernel: [  390.081965]  [<c03336cc>] vfs_kern_mount+0x4c/0xc0
Apr 17 16:35:12 linux kernel: [  390.081974]  [<c0333fd9>] do_kern_mount+0x39/0xd0
Apr 17 16:35:12 linux kernel: [  390.081984]  [<c0335381>] do_mount+0x161/0x710
Apr 17 16:35:12 linux kernel: [  390.081993]  [<c02e7e41>] ? strndup_user+0x41/0x60
Apr 17 16:35:12 linux kernel: [  390.082035]  [<c0335a36>] sys_mount+0x66/0xa0
Apr 17 16:35:12 linux kernel: [  390.082051]  [<c0699d2d>] syscall_call+0x7/0xb
Apr 17 16:35:12 linux kernel: [  390.082084] XFS (264=211361P۔300): log mount/recovery failed: error 117
Apr 17 16:35:12 linux kernel: [  390.082209] XFS (264=211361P۔300): log mount failed
Apr 17 16:35:12 linux kernel: [  390.082217] XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: /root/xfs/fs/xfs/xfs_mount.c, line: 272

Apr 17 16:35:12 linux kernel: [  390.092542] ------------[ cut here ]------------
Apr 17 16:35:12 linux kernel: [  390.097004] kernel BUG at /root/xfs/fs/xfs/xfs_message.c:101!
Apr 17 16:35:12 linux kernel: [  390.097004] invalid opcode: 0000 [#1] PREEMPT SMP 
Apr 17 16:35:12 linux kernel: [  390.097004] Modules linked in: xfs(O) exportfs af_packet dm_mod e1000 e100 sr_mod shpchp iTCO_wdt cdrom i2c_i801 e7xxx_edac iTCO_vendor_support sg floppy pci_hotplug serio_raw button edac_core pcspkr autofs4 processor thermal_sys ata_generic
Apr 17 16:35:12 linux kernel: [  390.097004] 
Apr 17 16:35:12 linux kernel: [  390.097004] Pid: 5217, comm: mount Tainted: G           O 3.4.0-rc2-1.2-desktop+ #15 TYAN Computer Corp. S2721-533 Thunder i7501 Pro/S2721-533 Thunder i7501 Pro
Apr 17 16:35:12 linux kernel: [  390.097004] EIP: 0060:[<fa9b8d56>] EFLAGS: 00010286 CPU: 3
Apr 17 16:35:12 linux kernel: [  390.097004] EIP is at assfail+0x26/0x30 [xfs]
Apr 17 16:35:12 linux kernel: [  390.097004] EAX: 0000007b EBX: f0f87680 ECX: 000000f3 EDX: 00000046
Apr 17 16:35:12 linux kernel: [  390.097004] ESI: 00000000 EDI: f1d089a8 EBP: f1d67ddc ESP: f1d67dc8
Apr 17 16:35:12 linux kernel: [  390.097004]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Apr 17 16:35:12 linux kernel: [  390.097004] CR0: 8005003b CR2: b7735580 CR3: 2ff82000 CR4: 000007f0
Apr 17 16:35:12 linux kernel: [  390.097004] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
Apr 17 16:35:12 linux kernel: [  390.097004] DR6: ffff0ff0 DR7: 00000400
Apr 17 16:35:12 linux kernel: [  390.097004] Process mount (pid: 5217, ti=f1d66000 task=f0d99060 task.ti=f1d66000)
Apr 17 16:35:12 linux kernel: [  390.097004] Stack:
Apr 17 16:35:12 linux kernel: [  390.097004]  00000000 faa31844 faa399d4 faa41a7f 00000110 f1d67df8 faa0b125 f1d0899c
Apr 17 16:35:12 linux kernel: [  390.097004]  f1d08800 00000075 f1d08800 00000014 f1d67e50 faa0dfe9 f1d08800 faa41c34
Apr 17 16:35:12 linux kernel: [  390.097004]  00005000 ef82b700 fa9aea50 f1d67e24 f1d08acc 00000000 00000002 00000003
Apr 17 16:35:12 linux kernel: [  390.097004] Call Trace:
Apr 17 16:35:12 linux kernel: [  390.097004]  [<faa0b125>] xfs_free_perag+0x75/0xa0 [xfs]
Apr 17 16:35:12 linux kernel: [  390.097004]  [<faa0dfe9>] xfs_mountfs+0x2d9/0x710 [xfs]
Apr 17 16:35:12 linux kernel: [  390.097004]  [<fa9aea50>] ? _xfs_filestream_pick_ag+0x1b0/0x1b0 [xfs]
Apr 17 16:35:12 linux kernel: [  390.097004]  [<fa9bb5d6>] xfs_fs_fill_super+0x196/0x240 [xfs]
Apr 17 16:35:12 linux kernel: [  390.097004]  [<c031ce72>] mount_bdev+0x172/0x1b0
Apr 17 16:35:12 linux kernel: [  390.097004]  [<fa9b965a>] xfs_fs_mount+0x1a/0x20 [xfs]
Apr 17 16:35:12 linux kernel: [  390.097004]  [<fa9bb440>] ? xfs_finish_flags+0x130/0x130 [xfs]
Apr 17 16:35:12 linux kernel: [  390.097004]  [<c031d9a1>] mount_fs+0x31/0x170
Apr 17 16:35:12 linux kernel: [  390.097004]  [<c02ecbfa>] ? __alloc_percpu+0xa/0x10
Apr 17 16:35:12 linux kernel: [  390.097004]  [<c03336cc>] vfs_kern_mount+0x4c/0xc0
Apr 17 16:35:12 linux kernel: [  390.097004]  [<c0333fd9>] do_kern_mount+0x39/0xd0
Apr 17 16:35:12 linux kernel: [  390.097004]  [<c0335381>] do_mount+0x161/0x710
Apr 17 16:35:12 linux kernel: [  390.097004]  [<c02e7e41>] ? strndup_user+0x41/0x60
Apr 17 16:35:12 linux kernel: [  390.097004]  [<c0335a36>] sys_mount+0x66/0xa0
Apr 17 16:35:12 linux kernel: [  390.097004]  [<c0699d2d>] syscall_call+0x7/0xb
Apr 17 16:35:12 linux kernel: [  390.097004] Code: bf 00 00 00 00 55 89 e5 83 ec 14 89 4c 24 10 89 54 24 0c 89 44 24 08 c7 44 24 04 44 18 a3 fa c7 04 24 00 00 00 00 e8 da fd ff ff <0f> 0b 90 8d b4 26 00 00 00 00 55 b9 01 00 00 00 89 e5 83 ec 14 
Apr 17 16:35:12 linux kernel: [  390.097004] EIP: [<fa9b8d56>] assfail+0x26/0x30 [xfs] SS:ESP 0068:f1d67dc8
Apr 17 16:35:12 linux kernel: [  390.097633] ---[ end trace 33a9795b638157b9 ]---

-Ben



* Re: [PATCH 03/18] xfs: Do background CIL flushes via a workqueue
  2012-04-17 21:49     ` Ben Myers
@ 2012-04-18  1:47       ` Dave Chinner
  2012-04-30  1:24         ` Dave Chinner
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2012-04-18  1:47 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Tue, Apr 17, 2012 at 04:49:56PM -0500, Ben Myers wrote:
> On Tue, Apr 17, 2012 at 04:21:55PM -0500, Ben Myers wrote:
> > Gah! I just hit this assert.  
> > 
> > v3.4-rc2-3-g8a00ebe with:
> > Christoph's ilock series
> > Christoph's xfsbufd series
> > Jan's freeze series
> > Dave's queue.
> > 
> > nfs7 login: [ 1175.172406] XFS: Assertion failed: push_seq > 0 && push_seq <= ctx->sequence, file: /root/xfs/fs/xfs/xfs_log_cil.c, line: 406

which has probably resulted in a log corruption.

> [  390.097633] ---[ end trace 33a9795b638157b9 ]---
> 
> Here it is from the system log
> 
> Apr 17 16:35:12 linux kernel: [  389.637543] XFS (264=211361P۔300): Mounting Filesystem
> Apr 17 16:35:12 linux kernel: [  389.912402] XFS (264=211361P۔300): Starting recovery (logdev: internal)
> Apr 17 16:35:12 linux kernel: [  390.054357] XFS (264=211361P۔300): xlog_recover_inode_pass2: Bad inode magic number, dip = 0xf0ffd800, dino bp = 0xef89c480, ino = 25541592
> Apr 17 16:35:12 linux kernel: [  390.066290] XFS (264=211361P۔300): Internal error xlog_recover_inode_pass2(1) at line 2248 of file /root/xfs/fs/xfs/xfs_log_recover.c.  Caller 0xfaa08ffa

Because log recovery has read a bad inode from disk, which means an
allocation transaction has probably not been replayed.

....

> Apr 17 16:35:12 linux kernel: [  390.082084] XFS (264=211361P۔300): log mount/recovery failed: error 117
> Apr 17 16:35:12 linux kernel: [  390.082209] XFS (264=211361P۔300): log mount failed
> Apr 17 16:35:12 linux kernel: [  390.082217] XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: /root/xfs/fs/xfs/xfs_mount.c, line: 272
> 
> Apr 17 16:35:12 linux kernel: [  390.092542] ------------[ cut here ]------------
> Apr 17 16:35:12 linux kernel: [  390.097004] kernel BUG at /root/xfs/fs/xfs/xfs_message.c:101!
> Apr 17 16:35:12 linux kernel: [  390.097004] invalid opcode: 0000 [#1] PREEMPT SMP 
> Apr 17 16:35:12 linux kernel: [  390.097004] Modules linked in: xfs(O) exportfs af_packet dm_mod e1000 e100 sr_mod shpchp iTCO_wdt cdrom i2c_i801 e7xxx_edac iTCO_vendor_support sg floppy pci_hotplug serio_raw button edac_core pcspkr autofs4 processor thermal_sys ata_generic
> Apr 17 16:35:12 linux kernel: [  390.097004] 
> Apr 17 16:35:12 linux kernel: [  390.097004] Pid: 5217, comm: mount Tainted: G           O 3.4.0-rc2-1.2-desktop+ #15 TYAN Computer Corp. S2721-533 Thunder i7501 Pro/S2721-533 Thunder i7501 Pro
> Apr 17 16:35:12 linux kernel: [  390.097004] EIP: 0060:[<fa9b8d56>] EFLAGS: 00010286 CPU: 3
> Apr 17 16:35:12 linux kernel: [  390.097004] EIP is at assfail+0x26/0x30 [xfs]
> Apr 17 16:35:12 linux kernel: [  390.097004] EAX: 0000007b EBX: f0f87680 ECX: 000000f3 EDX: 00000046
> Apr 17 16:35:12 linux kernel: [  390.097004] ESI: 00000000 EDI: f1d089a8 EBP: f1d67ddc ESP: f1d67dc8
> Apr 17 16:35:12 linux kernel: [  390.097004]  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
> Apr 17 16:35:12 linux kernel: [  390.097004] CR0: 8005003b CR2: b7735580 CR3: 2ff82000 CR4: 000007f0
> Apr 17 16:35:12 linux kernel: [  390.097004] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> Apr 17 16:35:12 linux kernel: [  390.097004] DR6: ffff0ff0 DR7: 00000400
> Apr 17 16:35:12 linux kernel: [  390.097004] Process mount (pid: 5217, ti=f1d66000 task=f0d99060 task.ti=f1d66000)
> Apr 17 16:35:12 linux kernel: [  390.097004] Stack:
> Apr 17 16:35:12 linux kernel: [  390.097004]  00000000 faa31844 faa399d4 faa41a7f 00000110 f1d67df8 faa0b125 f1d0899c
> Apr 17 16:35:12 linux kernel: [  390.097004]  f1d08800 00000075 f1d08800 00000014 f1d67e50 faa0dfe9 f1d08800 faa41c34
> Apr 17 16:35:12 linux kernel: [  390.097004]  00005000 ef82b700 fa9aea50 f1d67e24 f1d08acc 00000000 00000002 00000003
> Apr 17 16:35:12 linux kernel: [  390.097004] Call Trace:
> Apr 17 16:35:12 linux kernel: [  390.097004]  [<faa0b125>] xfs_free_perag+0x75/0xa0 [xfs]
> Apr 17 16:35:12 linux kernel: [  390.097004]  [<faa0dfe9>] xfs_mountfs+0x2d9/0x710 [xfs]
> Apr 17 16:35:12 linux kernel: [  390.097004]  [<fa9aea50>] ? _xfs_filestream_pick_ag+0x1b0/0x1b0 [xfs]
> Apr 17 16:35:12 linux kernel: [  390.097004]  [<fa9bb5d6>] xfs_fs_fill_super+0x196/0x240 [xfs]
> Apr 17 16:35:12 linux kernel: [  390.097004]  [<c031ce72>] mount_bdev+0x172/0x1b0
> Apr 17 16:35:12 linux kernel: [  390.097004]  [<fa9b965a>] xfs_fs_mount+0x1a/0x20 [xfs]

And that's a different problem that Eric tripped over a couple of
weeks ago, but hasn't sent a new patch to fix:

http://oss.sgi.com/archives/xfs/2012-04/msg00035.html
http://oss.sgi.com/archives/xfs/2012-04/msg00115.html

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH 03/18] xfs: Do background CIL flushes via a workqueue
  2012-04-17 21:21   ` Ben Myers
  2012-04-17 21:49     ` Ben Myers
@ 2012-04-18  1:58     ` Dave Chinner
  1 sibling, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-18  1:58 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Tue, Apr 17, 2012 at 04:21:55PM -0500, Ben Myers wrote:
> On Fri, Apr 13, 2012 at 10:10:46PM +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > Doing background CIL flushes adds significant latency to whatever
> > async transaction triggers it. To avoid blocking async
> > transactions on things like waiting for log buffer IO to complete,
> > move the CIL push off into a workqueue.  By moving the push work
> > into a workqueue, we remove all the latency that the commit adds
> > from the foreground transaction commit path. This also means that
> > single threaded workloads won't do the CIL push processing, leaving
> > them more CPU to do more async transactions.
> > 
> > To do this, we need to keep track of the sequence number we have
> > pushed work for. This avoids having many transaction commits
> > attempting to schedule work for the same sequence, and ensures that
> > we only ever have one push (background or forced) in progress at a
> > time. It also means that we don't need to take the CIL lock in write
> > mode to check for potential background push races, which reduces
> > lock contention.
> > 
> > To avoid potential issues with "smart" IO schedulers, don't use the
> > workqueue for log force triggered flushes. Instead, issue those
> > directly from the process doing the log force, so the log IO doesn't
> > get stuck behind IO elevator queue idling, which would incorrectly
> > delay log IO issued from a workqueue.
> > 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
.....
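
The sequence-tracking scheme the commit message describes can be sketched
in userspace like this (struct and function names are invented for
illustration, not the real XFS symbols):

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the "one push per sequence" idea; these are
 * made-up names, not the actual XFS CIL structures. */
struct cil {
	pthread_mutex_t lock;
	uint64_t current_seq;	/* sequence currently being committed to */
	uint64_t pushed_seq;	/* highest sequence push work was queued for */
};

/* Returns true if the caller should queue background push work.  Many
 * committers may race here, but only the first one to see a new
 * sequence advances pushed_seq, so at most one push (background or
 * forced) is ever scheduled per sequence. */
static bool cil_should_schedule_push(struct cil *cil)
{
	bool schedule = false;

	pthread_mutex_lock(&cil->lock);
	if (cil->pushed_seq < cil->current_seq) {
		cil->pushed_seq = cil->current_seq;
		schedule = true;
	}
	pthread_mutex_unlock(&cil->lock);
	return schedule;
}
```

With a check like this, every transaction commit can cheaply ask whether
to queue work, and duplicate scheduling for the same sequence is avoided
without taking the CIL lock in write mode.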
> Gah! I just hit this assert.  
> 
> v3.4-rc2-3-g8a00ebe with:
> Christoph's ilock series
> Christoph's xfsbufd series
> Jan's freeze series
> Dave's queue.
> 
> nfs7 login: [ 1175.172406] XFS: Assertion failed: push_seq > 0 && push_seq <= ctx->sequence, file: /root/xfs/fs/xfs/xfs_log_cil.c, line: 406
> [ 1175.183766] ------------[ cut here ]------------
> [ 1175.188010] kernel BUG at /root/xfs/fs/xfs/xfs_message.c:101!
> [ 1175.188010] invalid opcode: 0000 [#1] PREEMPT SMP 
> [ 1175.188010] Modules linked in: xfs(O) exportfs af_packet dm_mod floppy iTCO_wdt sg i2c_i801 iTCO_vendor_support e7xxx_edac edac_core sr_mod e100 cdrom e1000 shpchp pci_hotplug button serio_raw pcspkr autofs4 processor thermal_sys ata_generic
> [ 1175.188010] 
> [ 1175.188010] Pid: 2760, comm: kworker/3:2 Tainted: G           O 3.4.0-rc2-1.2-desktop+ #15 TYAN Computer Corp. S2721-533 Thunder i7501 Pro/S2721-533 Thunder i7501 Pro
> [ 1175.188010] EIP: 0060:[<faa4f966>] EFLAGS: 00010296 CPU: 3
> [ 1175.188010] EIP is at assfail+0x26/0x30 [xfs]
> [ 1175.188010] EAX: 00000087 EBX: f1f10980 ECX: 00000079 EDX: 00000046
> [ 1175.188010] ESI: f1f10780 EDI: 00000000 EBP: f1d93ec4 ESP: f1d93eb0
> [ 1175.188010]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> [ 1175.188010] CR0: 8005003b CR2: b770ee20 CR3: 2779c000 CR4: 000007f0
> [ 1175.188010] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> [ 1175.188010] DR6: ffff0ff0 DR7: 00000400

Ah, that's a 32 bit machine. The sequence numbers are 64 bit values
- I wonder if there's an issue with reading/writing
xc_current_sequence without a spinlock held...
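
To make the hazard concrete: on a 32-bit machine a plain 64-bit load or
store compiles to two 32-bit memory accesses, so a reader racing a writer
can see the low half of the new value paired with the high half of the old
one (a torn read). A minimal userspace sketch of the locked-accessor fix
(all names here are invented for illustration, not the actual XFS code):

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* A 64-bit sequence counter that is only read or written with the lock
 * held, so no access can observe a half-updated value on 32-bit CPUs. */
struct seq64 {
	pthread_mutex_t lock;
	uint64_t value;
};

static uint64_t seq64_read(struct seq64 *s)
{
	pthread_mutex_lock(&s->lock);
	uint64_t v = s->value;
	pthread_mutex_unlock(&s->lock);
	return v;
}

static void seq64_write(struct seq64 *s, uint64_t v)
{
	pthread_mutex_lock(&s->lock);
	s->value = v;
	pthread_mutex_unlock(&s->lock);
}
```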

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 01/18] xfs: Ensure inode reclaim can run during quotacheck
  2012-04-13 12:10 ` [PATCH 01/18] xfs: Ensure inode reclaim can run during quotacheck Dave Chinner
  2012-04-13 18:01   ` Mark Tinguely
@ 2012-04-29 21:37   ` Christoph Hellwig
  1 sibling, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2012-04-29 21:37 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 04/18] xfs: page type check in writeback only checks last buffer
  2012-04-13 12:10 ` [PATCH 04/18] xfs: page type check in writeback only checks last buffer Dave Chinner
  2012-04-16 16:15   ` Mark Tinguely
@ 2012-04-29 21:39   ` Christoph Hellwig
  2012-04-30  0:29     ` Dave Chinner
  1 sibling, 1 reply; 45+ messages in thread
From: Christoph Hellwig @ 2012-04-29 21:39 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, Apr 13, 2012 at 10:10:47PM +1000, Dave Chinner wrote:
> The problem was first noticed by Peter Watkins.

Was this a user visible bug?  Do we have a test case for it?

Either way, the patch looks fine,

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 04/18] xfs: page type check in writeback only checks last buffer
  2012-04-29 21:39   ` Christoph Hellwig
@ 2012-04-30  0:29     ` Dave Chinner
  0 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-30  0:29 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Sun, Apr 29, 2012 at 05:39:26PM -0400, Christoph Hellwig wrote:
> On Fri, Apr 13, 2012 at 10:10:47PM +1000, Dave Chinner wrote:
> > The problem was first noticed by Peter Watkins.
> 
> Was this a user visible bug?  Do we have a test case for it?

Not that I know of, and no. It was noticed by Peter when doing some
work to try to understand the writeback code and the
allocation-to-data-write crash hole.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 03/18] xfs: Do background CIL flushes via a workqueue
  2012-04-18  1:47       ` Dave Chinner
@ 2012-04-30  1:24         ` Dave Chinner
  2012-04-30  6:09           ` Dave Chinner
  0 siblings, 1 reply; 45+ messages in thread
From: Dave Chinner @ 2012-04-30  1:24 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Wed, Apr 18, 2012 at 11:47:06AM +1000, Dave Chinner wrote:
> On Tue, Apr 17, 2012 at 04:49:56PM -0500, Ben Myers wrote:
> > On Tue, Apr 17, 2012 at 04:21:55PM -0500, Ben Myers wrote:
> > > Gah! I just hit this assert.  
> > > 
> > > v3.4-rc2-3-g8a00ebe with:
> > > Christoph's ilock series
> > > Christoph's xfsbufd series
> > > Jan's freeze series
> > > Dave's queue.
> > > 
> > > nfs7 login: [ 1175.172406] XFS: Assertion failed: push_seq > 0 && push_seq <= ctx->sequence, file: /root/xfs/fs/xfs/xfs_log_cil.c, line: 406
> 
> which has probably resulted in a log corruption.
> 
> > [  390.097633] ---[ end trace 33a9795b638157b9 ]---
> > 
> > Here it is from the system log
> > 
> > Apr 17 16:35:12 linux kernel: [  389.637543] XFS (264=211361P۔300): Mounting Filesystem
> > Apr 17 16:35:12 linux kernel: [  389.912402] XFS (264=211361P۔300): Starting recovery (logdev: internal)
> > Apr 17 16:35:12 linux kernel: [  390.054357] XFS (264=211361P۔300): xlog_recover_inode_pass2: Bad inode magic number, dip = 0xf0ffd800, dino bp = 0xef89c480, ino = 25541592
> > Apr 17 16:35:12 linux kernel: [  390.066290] XFS (264=211361P۔300): Internal error xlog_recover_inode_pass2(1) at line 2248 of file /root/xfs/fs/xfs/xfs_log_recover.c.  Caller 0xfaa08ffa
> 
> Because log recovery has read a bad inode from disk, which means an
> allocation transaction has probably not been replayed.

Just as a note - I'm getting xfstests test 121 reliably tripping over this
exact problem as a result of the discontiguous buffer item support
patch. Basically the problem is that an inode modification
transaction is being replayed before the inode buffer allocation
transaction, so the read of the inode buffer returns an
uninitialised data area, and hence the assert failure.

At this point I can't see why that patch would cause the problem,
which tends to imply that the patch itself didn't introduce the
bug. I suspect the problem is that relogging the buffer for the
unlinked inode list updates moves the buffer behind the inode
modifications in the CIL, so the order of operations in the CIL
becomes inode modification(s) followed by buffer modifications.
That order is then reflected in the order changes are written to
the log in the checkpoint.

What I don't understand yet is why that patch triggers an apparent
change of behaviour when it doesn't change the order of logging or
operations at all. So I need to do more debugging before being able
to say what is causing this.
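
The suspected reordering can be modelled with a toy CIL in which
relogging moves an item to the tail of the checkpoint write order.
Everything here is an invented illustration, not the real CIL code:

```c
#include <assert.h>

enum { MAX_ITEMS = 8 };

/* Toy model: items sit in the CIL in the order first logged, but
 * relogging an item moves it to the tail, so a relogged inode buffer
 * ends up *after* the inode modifications, and that order is what gets
 * written into the checkpoint. */
struct cil_model {
	int items[MAX_ITEMS];	/* item ids in checkpoint write order */
	int count;
};

static void cil_log(struct cil_model *c, int id)
{
	/* If the item is already present, remove it (relog moves it)... */
	for (int i = 0; i < c->count; i++) {
		if (c->items[i] == id) {
			for (int j = i; j < c->count - 1; j++)
				c->items[j] = c->items[j + 1];
			c->count--;
			break;
		}
	}
	/* ...then append it at the tail. */
	c->items[c->count++] = id;
}
```

Logging the inode buffer, then an inode modification, then relogging the
buffer for the unlinked list update leaves the buffer last, so recovery
replays the inode modification before the buffer initialisation.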

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 03/18] xfs: Do background CIL flushes via a workqueue
  2012-04-30  1:24         ` Dave Chinner
@ 2012-04-30  6:09           ` Dave Chinner
  0 siblings, 0 replies; 45+ messages in thread
From: Dave Chinner @ 2012-04-30  6:09 UTC (permalink / raw)
  To: Ben Myers; +Cc: xfs

On Mon, Apr 30, 2012 at 11:24:32AM +1000, Dave Chinner wrote:
> On Wed, Apr 18, 2012 at 11:47:06AM +1000, Dave Chinner wrote:
> > On Tue, Apr 17, 2012 at 04:49:56PM -0500, Ben Myers wrote:
> > > On Tue, Apr 17, 2012 at 04:21:55PM -0500, Ben Myers wrote:
> > > > Gah! I just hit this assert.  
> > > > 
> > > > v3.4-rc2-3-g8a00ebe with:
> > > > Christoph's ilock series
> > > > Christoph's xfsbufd series
> > > > Jan's freeze series
> > > > Dave's queue.
> > > > 
> > > > nfs7 login: [ 1175.172406] XFS: Assertion failed: push_seq > 0 && push_seq <= ctx->sequence, file: /root/xfs/fs/xfs/xfs_log_cil.c, line: 406
> > 
> > which has probably resulted in a log corruption.
> > 
> > > [  390.097633] ---[ end trace 33a9795b638157b9 ]---
> > > 
> > > Here it is from the system log
> > > 
> > > Apr 17 16:35:12 linux kernel: [  389.637543] XFS (264=211361P۔300): Mounting Filesystem
> > > Apr 17 16:35:12 linux kernel: [  389.912402] XFS (264=211361P۔300): Starting recovery (logdev: internal)
> > > Apr 17 16:35:12 linux kernel: [  390.054357] XFS (264=211361P۔300): xlog_recover_inode_pass2: Bad inode magic number, dip = 0xf0ffd800, dino bp = 0xef89c480, ino = 25541592
> > > Apr 17 16:35:12 linux kernel: [  390.066290] XFS (264=211361P۔300): Internal error xlog_recover_inode_pass2(1) at line 2248 of file /root/xfs/fs/xfs/xfs_log_recover.c.  Caller 0xfaa08ffa
> > 
> > Because log recovery has read a bad inode from disk, which means an
> > allocation transaction has probably not been replayed.
> 
> Just as a note - I'm getting xfstests test 121 reliably tripping over this
> exact problem as a result of the discontiguous buffer item support
> patch. Basically the problem is that an inode modification
> transaction is being replayed before the inode buffer allocation
> transaction, so the read of the inode buffer returns an
> uninitialised data area, and hence the assert failure.
> 
> At this point I can't see why that patch would cause the problem,
> which tends to imply that the patch itself didn't introduce the
> bug.

The bug was in my code - the calculation for the map size was
wrong, so only the first half of the modifications to the buffer
was being replayed. So this seems unrelated.
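
As an illustration of that class of bug, here is a hypothetical
dirty-map sizing helper where using the wrong chunk size halves the map,
so the bits covering the second half of the buffer are never tracked or
replayed. The constants and names are made up; this is not the actual
calculation:

```c
#include <assert.h>

#define CHUNK_SIZE	128	/* bytes tracked per dirty-map bit (made up) */
#define BITS_PER_WORD	32

/* Correct: round the chunk count up to whole map words. */
static int map_size_words(int buf_len_bytes)
{
	int chunks = (buf_len_bytes + CHUNK_SIZE - 1) / CHUNK_SIZE;
	return (chunks + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Buggy variant: using twice the chunk size halves the number of
 * tracked chunks, so modifications to the second half of the buffer
 * have no dirty bits and are silently dropped on replay. */
static int map_size_words_buggy(int buf_len_bytes)
{
	int chunks = (buf_len_bytes + 2 * CHUNK_SIZE - 1) / (2 * CHUNK_SIZE);
	return (chunks + BITS_PER_WORD - 1) / BITS_PER_WORD;
}
```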

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


end of thread, other threads:[~2012-04-30  6:09 UTC | newest]

Thread overview: 45+ messages
2012-04-13 12:10 [PATCH 0/18] xfs: current patch queue Dave Chinner
2012-04-13 12:10 ` [PATCH 01/18] xfs: Ensure inode reclaim can run during quotacheck Dave Chinner
2012-04-13 18:01   ` Mark Tinguely
2012-04-29 21:37   ` Christoph Hellwig
2012-04-13 12:10 ` [PATCH 02/18] xfs: pass shutdown method into xfs_trans_ail_delete_bulk Dave Chinner
2012-04-13 17:40   ` Mark Tinguely
2012-04-13 23:04     ` Dave Chinner
2012-04-13 23:38       ` [PATCH 02/18 V2] " Dave Chinner
2012-04-16 18:49         ` Mark Tinguely
2012-04-13 12:10 ` [PATCH 03/18] xfs: Do background CIL flushes via a workqueue Dave Chinner
2012-04-17 17:54   ` Mark Tinguely
2012-04-17 21:21   ` Ben Myers
2012-04-17 21:49     ` Ben Myers
2012-04-18  1:47       ` Dave Chinner
2012-04-30  1:24         ` Dave Chinner
2012-04-30  6:09           ` Dave Chinner
2012-04-18  1:58     ` Dave Chinner
2012-04-13 12:10 ` [PATCH 04/18] xfs: page type check in writeback only checks last buffer Dave Chinner
2012-04-16 16:15   ` Mark Tinguely
2012-04-29 21:39   ` Christoph Hellwig
2012-04-30  0:29     ` Dave Chinner
2012-04-13 12:10 ` [PATCH 05/18] xfs: Use preallocation for inodes with extsz hints Dave Chinner
2012-04-13 16:45   ` Mark Tinguely
2012-04-16 15:59   ` Mark Tinguely
2012-04-13 12:10 ` [PATCH 06/18] xfs: fix buffer lookup race on allocation failure Dave Chinner
2012-04-13 18:32   ` Mark Tinguely
2012-04-13 12:10 ` [PATCH 07/18] xfs: check for buffer errors before waiting Dave Chinner
2012-04-13 17:56   ` Mark Tinguely
2012-04-13 12:10 ` [PATCH 08/18] xfs: fix incorrect b_offset initialisation Dave Chinner
2012-04-13 12:10 ` [PATCH 09/18] xfs: use kmem_zone_zalloc for buffers Dave Chinner
2012-04-13 12:10 ` [PATCH 10/18] xfs: clean up buffer get/read call API Dave Chinner
2012-04-13 12:10 ` [PATCH 11/18] xfs: kill b_file_offset Dave Chinner
2012-04-13 12:10 ` [PATCH 12/18] xfs: use blocks for counting length of buffers Dave Chinner
2012-04-13 12:10 ` [PATCH 13/18] xfs: use blocks for storing the desired IO size Dave Chinner
2012-04-13 12:10 ` [PATCH 14/18] xfs: kill xfs_buf_btoc Dave Chinner
2012-04-13 12:10 ` [PATCH 15/18] xfs: kill XBF_LOCK Dave Chinner
2012-04-13 21:20   ` Mark Tinguely
2012-04-13 12:10 ` [PATCH 16/18] xfs: kill xfs_read_buf() Dave Chinner
2012-04-13 12:11 ` [PATCH 17/18] xfs: kill XBF_DONTBLOCK Dave Chinner
2012-04-16 14:34   ` Mark Tinguely
2012-04-13 12:11 ` [PATCH 18/18] xfs: use iolock on XFS_IOC_ALLOCSP calls Dave Chinner
2012-04-16 15:10   ` Mark Tinguely
2012-04-16 21:29 ` [PATCH 0/18] xfs: current patch queue Ben Myers
2012-04-17  4:12   ` Dave Chinner
2012-04-17 14:44     ` Ben Myers
