* [PATCH 0/9] xfs: candidate fixes for 2.6.39 V2
From: Dave Chinner @ 2011-04-07  1:57 UTC
  To: xfs

V2: Updated to address Christoph's review comments.

The following series contains a few things:

	1. an extent buffer allocation size fix
	2. a corrupt extent freeing fix
	3. converting the log tail checking to a warning
	4. workqueue series for fixing OOM problems

#1 and #4 have been posted before; these are updated versions.
#2 was recently reported by Red Hat QA, and #3 was reported as a
problem for the Power arch via IBM.

I can separate them out into individual patch series if wanted; I'm
just being lazy sending them all as one series. Alex, I'll push them
to a git branch if they are all acceptable and send you a pull
request.


* [PATCH 1/9] xfs: fix extent format buffer allocation size
From: Dave Chinner @ 2011-04-07  1:57 UTC
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When formatting an inode item, we have to allocate a separate buffer
to hold extents when there are delayed allocation extents on the
inode and it is in extent format. The allocation size is derived
from the in-core data fork representation, which accounts for
delayed allocation extents, while the on-disk representation does
not contain any delalloc extents.

As a result of this mismatch, the allocated buffer can be far larger
than needed to hold the real extent list, which, because the inode
is in extent format, is limited to the size of the literal area of
the inode. However, we can have thousands of delalloc extents,
resulting in an allocation size orders of magnitude larger than is
needed to hold all the real extents.

Fix this by limiting the size of the buffer being allocated to the
size of the literal area of the inodes in the filesystem (i.e. the
maximum size an inode fork can grow to).
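
The change boils down to the allocation bound; both lines below are
taken from the diff, with XFS_IFORK_SIZE() being the existing macro
for the maximum possible on-disk fork size:

	/* before: sized from the in-core fork, which includes delalloc */
	ext_buffer = kmem_alloc(ip->i_df.if_bytes, KM_SLEEP);

	/* after: capped at the maximum possible on-disk fork size */
	ext_buffer = kmem_alloc(XFS_IFORK_SIZE(ip, whichfork), KM_SLEEP);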

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/xfs_inode_item.c |   67 ++++++++++++++++++++++++++++-------------------
 1 files changed, 40 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 46cc401..576fdfe 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -198,6 +198,41 @@ xfs_inode_item_size(
 }
 
 /*
+ * xfs_inode_item_format_extents - convert in-core extents to on-disk form
+ *
+ * For either the data or attr fork in extent format, we need to endian convert
+ * the in-core extent as we place them into the on-disk inode. In this case, we
+ * need to do this conversion before we write the extents into the log. Because
+ * we don't have the disk inode to write into here, we allocate a buffer and
+ * format the extents into it via xfs_iextents_copy(). We free the buffer in
+ * the unlock routine after the copy for the log has been made.
+ *
+ * In the case of the data fork, the in-core and on-disk fork sizes can be
+ * different due to delayed allocation extents. We only log on-disk extents
+ * here, so always use the physical fork size to determine the size of the
+ * buffer we need to allocate.
+ */
+STATIC void
+xfs_inode_item_format_extents(
+	struct xfs_inode	*ip,
+	struct xfs_log_iovec	*vecp,
+	int			whichfork,
+	int			type)
+{
+	xfs_bmbt_rec_t		*ext_buffer;
+
+	ext_buffer = kmem_alloc(XFS_IFORK_SIZE(ip, whichfork), KM_SLEEP);
+	if (whichfork == XFS_DATA_FORK)
+		ip->i_itemp->ili_extents_buf = ext_buffer;
+	else
+		ip->i_itemp->ili_aextents_buf = ext_buffer;
+
+	vecp->i_addr = ext_buffer;
+	vecp->i_len = xfs_iextents_copy(ip, ext_buffer, whichfork);
+	vecp->i_type = type;
+}
+
+/*
  * This is called to fill in the vector of log iovecs for the
  * given inode log item.  It fills the first item with an inode
  * log format structure, the second with the on-disk inode structure,
@@ -213,7 +248,6 @@ xfs_inode_item_format(
 	struct xfs_inode	*ip = iip->ili_inode;
 	uint			nvecs;
 	size_t			data_bytes;
-	xfs_bmbt_rec_t		*ext_buffer;
 	xfs_mount_t		*mp;
 
 	vecp->i_addr = &iip->ili_format;
@@ -320,22 +354,8 @@ xfs_inode_item_format(
 			} else
 #endif
 			{
-				/*
-				 * There are delayed allocation extents
-				 * in the inode, or we need to convert
-				 * the extents to on disk format.
-				 * Use xfs_iextents_copy()
-				 * to copy only the real extents into
-				 * a separate buffer.  We'll free the
-				 * buffer in the unlock routine.
-				 */
-				ext_buffer = kmem_alloc(ip->i_df.if_bytes,
-					KM_SLEEP);
-				iip->ili_extents_buf = ext_buffer;
-				vecp->i_addr = ext_buffer;
-				vecp->i_len = xfs_iextents_copy(ip, ext_buffer,
-						XFS_DATA_FORK);
-				vecp->i_type = XLOG_REG_TYPE_IEXT;
+				xfs_inode_item_format_extents(ip, vecp,
+					XFS_DATA_FORK, XLOG_REG_TYPE_IEXT);
 			}
 			ASSERT(vecp->i_len <= ip->i_df.if_bytes);
 			iip->ili_format.ilf_dsize = vecp->i_len;
@@ -445,19 +465,12 @@ xfs_inode_item_format(
 			 */
 			vecp->i_addr = ip->i_afp->if_u1.if_extents;
 			vecp->i_len = ip->i_afp->if_bytes;
+			vecp->i_type = XLOG_REG_TYPE_IATTR_EXT;
 #else
 			ASSERT(iip->ili_aextents_buf == NULL);
-			/*
-			 * Need to endian flip before logging
-			 */
-			ext_buffer = kmem_alloc(ip->i_afp->if_bytes,
-				KM_SLEEP);
-			iip->ili_aextents_buf = ext_buffer;
-			vecp->i_addr = ext_buffer;
-			vecp->i_len = xfs_iextents_copy(ip, ext_buffer,
-					XFS_ATTR_FORK);
+			xfs_inode_item_format_extents(ip, vecp,
+					XFS_ATTR_FORK, XLOG_REG_TYPE_IATTR_EXT);
 #endif
-			vecp->i_type = XLOG_REG_TYPE_IATTR_EXT;
 			iip->ili_format.ilf_asize = vecp->i_len;
 			vecp++;
 			nvecs++;
-- 
1.7.2.3


* [PATCH 2/9] xfs: introduce an xfssyncd workqueue
From: Dave Chinner @ 2011-04-07  1:57 UTC
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

All of the work xfssyncd does is background functionality. There is
no need for a thread per filesystem to do this work - it can all be
managed by a global workqueue now that workqueues manage concurrency
effectively.

Introduce a new global xfssyncd workqueue, and convert the periodic
work to use this new functionality. To do this, use a delayed work
construct to schedule the next running of the periodic sync work
for the filesystem. When the sync work is complete, queue a new
delayed work for the next running of the sync work.

For laptop mode, we wait for completion of the sync work, so ensure
that the sync work queueing interface can flush and wait for work to
complete. This allows the workqueue infrastructure to replace the
sequence number and wakeup mechanism currently used.

Because the sync work does non-trivial amounts of work, mark the
new work queue as CPU intensive.
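
The periodic behaviour falls out of the standard self-rearming
delayed work idiom - a minimal sketch, simplified from the code in
the diff below:

	STATIC void
	xfs_sync_worker(
		struct work_struct *work)
	{
		struct xfs_mount *mp = container_of(to_delayed_work(work),
						struct xfs_mount, m_sync_work);

		/* ... one pass of background sync work ... */

		/* queue us up again for the next sync period */
		queue_delayed_work(xfs_syncd_wq, &mp->m_sync_work,
				msecs_to_jiffies(xfs_syncd_centisecs * 10));
	}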

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/linux-2.6/xfs_super.c |   24 +++++------
 fs/xfs/linux-2.6/xfs_sync.c  |   86 ++++++++++++++++++++---------------------
 fs/xfs/linux-2.6/xfs_sync.h  |    2 +
 fs/xfs/xfs_mount.h           |    4 +-
 4 files changed, 56 insertions(+), 60 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 1ba5c45..99dded9 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1191,22 +1191,12 @@ xfs_fs_sync_fs(
 		return -error;
 
 	if (laptop_mode) {
-		int	prev_sync_seq = mp->m_sync_seq;
-
 		/*
 		 * The disk must be active because we're syncing.
 		 * We schedule xfssyncd now (now that the disk is
 		 * active) instead of later (when it might not be).
 		 */
-		wake_up_process(mp->m_sync_task);
-		/*
-		 * We have to wait for the sync iteration to complete.
-		 * If we don't, the disk activity caused by the sync
-		 * will come after the sync is completed, and that
-		 * triggers another sync from laptop mode.
-		 */
-		wait_event(mp->m_wait_single_sync_task,
-				mp->m_sync_seq != prev_sync_seq);
+		flush_delayed_work_sync(&mp->m_sync_work);
 	}
 
 	return 0;
@@ -1492,7 +1482,6 @@ xfs_fs_fill_super(
 	atomic_set(&mp->m_active_trans, 0);
 	INIT_LIST_HEAD(&mp->m_sync_list);
 	spin_lock_init(&mp->m_sync_lock);
-	init_waitqueue_head(&mp->m_wait_single_sync_task);
 
 	mp->m_super = sb;
 	sb->s_fs_info = mp;
@@ -1833,13 +1822,21 @@ init_xfs_fs(void)
 	if (error)
 		goto out_cleanup_procfs;
 
+	xfs_syncd_wq = alloc_workqueue("xfssyncd", WQ_CPU_INTENSIVE, 8);
+	if (!xfs_syncd_wq) {
+		error = -ENOMEM;
+		goto out_sysctl_unregister;
+	}
+
 	vfs_initquota();
 
 	error = register_filesystem(&xfs_fs_type);
 	if (error)
-		goto out_sysctl_unregister;
+		goto out_destroy_xfs_syncd;
 	return 0;
 
+ out_destroy_xfs_syncd:
+	destroy_workqueue(xfs_syncd_wq);
  out_sysctl_unregister:
 	xfs_sysctl_unregister();
  out_cleanup_procfs:
@@ -1861,6 +1858,7 @@ exit_xfs_fs(void)
 {
 	vfs_exitquota();
 	unregister_filesystem(&xfs_fs_type);
+	destroy_workqueue(xfs_syncd_wq);
 	xfs_sysctl_unregister();
 	xfs_cleanup_procfs();
 	xfs_buf_terminate();
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index 594cd82..ee9a6c3 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -39,6 +39,8 @@
 #include <linux/kthread.h>
 #include <linux/freezer.h>
 
+struct workqueue_struct	*xfs_syncd_wq;	/* sync workqueue */
+
 /*
  * The inode lookup is done in batches to keep the amount of lock traffic and
  * radix tree lookups to a minimum. The batch size is a trade off between
@@ -489,32 +491,6 @@ xfs_flush_inodes(
 	xfs_log_force(ip->i_mount, XFS_LOG_SYNC);
 }
 
-/*
- * Every sync period we need to unpin all items, reclaim inodes and sync
- * disk quotas.  We might need to cover the log to indicate that the
- * filesystem is idle and not frozen.
- */
-STATIC void
-xfs_sync_worker(
-	struct xfs_mount *mp,
-	void		*unused)
-{
-	int		error;
-
-	if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
-		/* dgc: errors ignored here */
-		if (mp->m_super->s_frozen == SB_UNFROZEN &&
-		    xfs_log_need_covered(mp))
-			error = xfs_fs_log_dummy(mp);
-		else
-			xfs_log_force(mp, 0);
-		xfs_reclaim_inodes(mp, 0);
-		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
-	}
-	mp->m_sync_seq++;
-	wake_up(&mp->m_wait_single_sync_task);
-}
-
 STATIC int
 xfssyncd(
 	void			*arg)
@@ -535,27 +511,12 @@ xfssyncd(
 			break;
 
 		spin_lock(&mp->m_sync_lock);
-		/*
-		 * We can get woken by laptop mode, to do a sync -
-		 * that's the (only!) case where the list would be
-		 * empty with time remaining.
-		 */
-		if (!timeleft || list_empty(&mp->m_sync_list)) {
-			if (!timeleft)
-				timeleft = xfs_syncd_centisecs *
-							msecs_to_jiffies(10);
-			INIT_LIST_HEAD(&mp->m_sync_work.w_list);
-			list_add_tail(&mp->m_sync_work.w_list,
-					&mp->m_sync_list);
-		}
 		list_splice_init(&mp->m_sync_list, &tmp);
 		spin_unlock(&mp->m_sync_lock);
 
 		list_for_each_entry_safe(work, n, &tmp, w_list) {
 			(*work->w_syncer)(mp, work->w_data);
 			list_del(&work->w_list);
-			if (work == &mp->m_sync_work)
-				continue;
 			if (work->w_completion)
 				complete(work->w_completion);
 			kmem_free(work);
@@ -565,13 +526,49 @@ xfssyncd(
 	return 0;
 }
 
+static void
+xfs_syncd_queue_sync(
+	struct xfs_mount        *mp)
+{
+	queue_delayed_work(xfs_syncd_wq, &mp->m_sync_work,
+				msecs_to_jiffies(xfs_syncd_centisecs * 10));
+}
+
+/*
+ * Every sync period we need to unpin all items, reclaim inodes and sync
+ * disk quotas.  We might need to cover the log to indicate that the
+ * filesystem is idle and not frozen.
+ */
+STATIC void
+xfs_sync_worker(
+	struct work_struct *work)
+{
+	struct xfs_mount *mp = container_of(to_delayed_work(work),
+					struct xfs_mount, m_sync_work);
+	int		error;
+
+	if (!(mp->m_flags & XFS_MOUNT_RDONLY)) {
+		/* dgc: errors ignored here */
+		if (mp->m_super->s_frozen == SB_UNFROZEN &&
+		    xfs_log_need_covered(mp))
+			error = xfs_fs_log_dummy(mp);
+		else
+			xfs_log_force(mp, 0);
+		xfs_reclaim_inodes(mp, 0);
+		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
+	}
+
+	/* queue us up again */
+	xfs_syncd_queue_sync(mp);
+}
+
 int
 xfs_syncd_init(
 	struct xfs_mount	*mp)
 {
-	mp->m_sync_work.w_syncer = xfs_sync_worker;
-	mp->m_sync_work.w_mount = mp;
-	mp->m_sync_work.w_completion = NULL;
+	INIT_DELAYED_WORK(&mp->m_sync_work, xfs_sync_worker);
+	xfs_syncd_queue_sync(mp);
+
 	mp->m_sync_task = kthread_run(xfssyncd, mp, "xfssyncd/%s", mp->m_fsname);
 	if (IS_ERR(mp->m_sync_task))
 		return -PTR_ERR(mp->m_sync_task);
@@ -582,6 +579,7 @@ void
 xfs_syncd_stop(
 	struct xfs_mount	*mp)
 {
+	cancel_delayed_work_sync(&mp->m_sync_work);
 	kthread_stop(mp->m_sync_task);
 }
 
diff --git a/fs/xfs/linux-2.6/xfs_sync.h b/fs/xfs/linux-2.6/xfs_sync.h
index 32ba662..e3a6ad2 100644
--- a/fs/xfs/linux-2.6/xfs_sync.h
+++ b/fs/xfs/linux-2.6/xfs_sync.h
@@ -32,6 +32,8 @@ typedef struct xfs_sync_work {
 #define SYNC_WAIT		0x0001	/* wait for i/o to complete */
 #define SYNC_TRYLOCK		0x0002  /* only try to lock inodes */
 
+extern struct workqueue_struct	*xfs_syncd_wq;	/* sync workqueue */
+
 int xfs_syncd_init(struct xfs_mount *mp);
 void xfs_syncd_stop(struct xfs_mount *mp);
 
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index a62e897..2c11e62 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -203,12 +203,10 @@ typedef struct xfs_mount {
 	struct mutex		m_icsb_mutex;	/* balancer sync lock */
 #endif
 	struct xfs_mru_cache	*m_filestream;  /* per-mount filestream data */
+	struct delayed_work	m_sync_work;	/* background sync work */
 	struct task_struct	*m_sync_task;	/* generalised sync thread */
-	xfs_sync_work_t		m_sync_work;	/* work item for VFS_SYNC */
 	struct list_head	m_sync_list;	/* sync thread work item list */
 	spinlock_t		m_sync_lock;	/* work item list lock */
-	int			m_sync_seq;	/* sync thread generation no. */
-	wait_queue_head_t	m_wait_single_sync_task;
 	__int64_t		m_update_flags;	/* sb flags we need to update
 						   on the next remount,rw */
 	struct shrinker		m_inode_shrink;	/* inode reclaim shrinker */
-- 
1.7.2.3


* [PATCH 3/9] xfs: convert ENOSPC inode flushing to use new syncd workqueue
From: Dave Chinner @ 2011-04-07  1:57 UTC
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

One of the problems with the current inode flush at ENOSPC is that
we queue a flush per ENOSPC event, regardless of how many are
already queued. This can result in hundreds of queued flushes, most
of which simply burn CPU scanning and do no real work. This just
slows down allocation at ENOSPC.

We really only need one active flush at a time, and we can easily
implement that via the new xfs_syncd_wq. All we need to do is queue
a flush if one is not already active, then block waiting for the
currently active flush to complete. The result is that we only ever
have a single ENOSPC inode flush active at a time and this greatly
reduces the overhead of ENOSPC processing.
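
This works because queue_work() is a no-op if the work item is
already queued, and flush_work_sync() waits for the currently
executing instance to finish, so the whole ENOSPC flush path
collapses to the two calls in the diff below:

	queue_work(xfs_syncd_wq, &mp->m_flush_work);
	flush_work_sync(&mp->m_flush_work);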

On my 2p test machine, this results in tests exercising ENOSPC
conditions running significantly faster - 042 halves execution time,
083 drops from 60s to 5s, etc - while not introducing test
regressions.

This allows us to remove the old xfssyncd threads and infrastructure
as they are no longer used.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/linux-2.6/xfs_super.c |    2 -
 fs/xfs/linux-2.6/xfs_sync.c  |  132 +++++++++++-------------------------------
 fs/xfs/xfs_mount.h           |    4 +-
 3 files changed, 36 insertions(+), 102 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index 99dded9..a20071f 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -1480,8 +1480,6 @@ xfs_fs_fill_super(
 	spin_lock_init(&mp->m_sb_lock);
 	mutex_init(&mp->m_growlock);
 	atomic_set(&mp->m_active_trans, 0);
-	INIT_LIST_HEAD(&mp->m_sync_list);
-	spin_lock_init(&mp->m_sync_lock);
 
 	mp->m_super = sb;
 	sb->s_fs_info = mp;
diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index ee9a6c3..af32759 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -433,99 +433,6 @@ xfs_quiesce_attr(
 	xfs_unmountfs_writesb(mp);
 }
 
-/*
- * Enqueue a work item to be picked up by the vfs xfssyncd thread.
- * Doing this has two advantages:
- * - It saves on stack space, which is tight in certain situations
- * - It can be used (with care) as a mechanism to avoid deadlocks.
- * Flushing while allocating in a full filesystem requires both.
- */
-STATIC void
-xfs_syncd_queue_work(
-	struct xfs_mount *mp,
-	void		*data,
-	void		(*syncer)(struct xfs_mount *, void *),
-	struct completion *completion)
-{
-	struct xfs_sync_work *work;
-
-	work = kmem_alloc(sizeof(struct xfs_sync_work), KM_SLEEP);
-	INIT_LIST_HEAD(&work->w_list);
-	work->w_syncer = syncer;
-	work->w_data = data;
-	work->w_mount = mp;
-	work->w_completion = completion;
-	spin_lock(&mp->m_sync_lock);
-	list_add_tail(&work->w_list, &mp->m_sync_list);
-	spin_unlock(&mp->m_sync_lock);
-	wake_up_process(mp->m_sync_task);
-}
-
-/*
- * Flush delayed allocate data, attempting to free up reserved space
- * from existing allocations.  At this point a new allocation attempt
- * has failed with ENOSPC and we are in the process of scratching our
- * heads, looking about for more room...
- */
-STATIC void
-xfs_flush_inodes_work(
-	struct xfs_mount *mp,
-	void		*arg)
-{
-	struct inode	*inode = arg;
-	xfs_sync_data(mp, SYNC_TRYLOCK);
-	xfs_sync_data(mp, SYNC_TRYLOCK | SYNC_WAIT);
-	iput(inode);
-}
-
-void
-xfs_flush_inodes(
-	xfs_inode_t	*ip)
-{
-	struct inode	*inode = VFS_I(ip);
-	DECLARE_COMPLETION_ONSTACK(completion);
-
-	igrab(inode);
-	xfs_syncd_queue_work(ip->i_mount, inode, xfs_flush_inodes_work, &completion);
-	wait_for_completion(&completion);
-	xfs_log_force(ip->i_mount, XFS_LOG_SYNC);
-}
-
-STATIC int
-xfssyncd(
-	void			*arg)
-{
-	struct xfs_mount	*mp = arg;
-	long			timeleft;
-	xfs_sync_work_t		*work, *n;
-	LIST_HEAD		(tmp);
-
-	set_freezable();
-	timeleft = xfs_syncd_centisecs * msecs_to_jiffies(10);
-	for (;;) {
-		if (list_empty(&mp->m_sync_list))
-			timeleft = schedule_timeout_interruptible(timeleft);
-		/* swsusp */
-		try_to_freeze();
-		if (kthread_should_stop() && list_empty(&mp->m_sync_list))
-			break;
-
-		spin_lock(&mp->m_sync_lock);
-		list_splice_init(&mp->m_sync_list, &tmp);
-		spin_unlock(&mp->m_sync_lock);
-
-		list_for_each_entry_safe(work, n, &tmp, w_list) {
-			(*work->w_syncer)(mp, work->w_data);
-			list_del(&work->w_list);
-			if (work->w_completion)
-				complete(work->w_completion);
-			kmem_free(work);
-		}
-	}
-
-	return 0;
-}
-
 static void
 xfs_syncd_queue_sync(
 	struct xfs_mount        *mp)
@@ -562,16 +469,47 @@ xfs_sync_worker(
 	xfs_syncd_queue_sync(mp);
 }
 
+/*
+ * Flush delayed allocate data, attempting to free up reserved space
+ * from existing allocations.  At this point a new allocation attempt
+ * has failed with ENOSPC and we are in the process of scratching our
+ * heads, looking about for more room.
+ *
+ * Queue a new data flush if there isn't one already in progress and
+ * wait for completion of the flush. This means that we only ever have one
+ * inode flush in progress no matter how many ENOSPC events are occurring and
+ * so will prevent the system from bogging down due to every concurrent
+ * ENOSPC event scanning all the active inodes in the system for writeback.
+ */
+void
+xfs_flush_inodes(
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+
+	queue_work(xfs_syncd_wq, &mp->m_flush_work);
+	flush_work_sync(&mp->m_flush_work);
+}
+
+STATIC void
+xfs_flush_worker(
+	struct work_struct *work)
+{
+	struct xfs_mount *mp = container_of(work,
+					struct xfs_mount, m_flush_work);
+
+	xfs_sync_data(mp, SYNC_TRYLOCK);
+	xfs_sync_data(mp, SYNC_TRYLOCK | SYNC_WAIT);
+}
+
 int
 xfs_syncd_init(
 	struct xfs_mount	*mp)
 {
+	INIT_WORK(&mp->m_flush_work, xfs_flush_worker);
 	INIT_DELAYED_WORK(&mp->m_sync_work, xfs_sync_worker);
 	xfs_syncd_queue_sync(mp);
 
-	mp->m_sync_task = kthread_run(xfssyncd, mp, "xfssyncd/%s", mp->m_fsname);
-	if (IS_ERR(mp->m_sync_task))
-		return -PTR_ERR(mp->m_sync_task);
 	return 0;
 }
 
@@ -580,7 +518,7 @@ xfs_syncd_stop(
 	struct xfs_mount	*mp)
 {
 	cancel_delayed_work_sync(&mp->m_sync_work);
-	kthread_stop(mp->m_sync_task);
+	cancel_work_sync(&mp->m_flush_work);
 }
 
 void
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 2c11e62..a0ad90e 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -204,9 +204,7 @@ typedef struct xfs_mount {
 #endif
 	struct xfs_mru_cache	*m_filestream;  /* per-mount filestream data */
 	struct delayed_work	m_sync_work;	/* background sync work */
-	struct task_struct	*m_sync_task;	/* generalised sync thread */
-	struct list_head	m_sync_list;	/* sync thread work item list */
-	spinlock_t		m_sync_lock;	/* work item list lock */
+	struct work_struct	m_flush_work;	/* background inode flush */
 	__int64_t		m_update_flags;	/* sb flags we need to update
 						   on the next remount,rw */
 	struct shrinker		m_inode_shrink;	/* inode reclaim shrinker */
-- 
1.7.2.3


* [PATCH 4/9] xfs: introduce background inode reclaim work
From: Dave Chinner @ 2011-04-07  1:57 UTC
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Background inode reclaim needs to run more frequently than the XFS
syncd work is run, as 30s is too long between optimal reclaim runs.
Add a new periodic work item to the xfs syncd workqueue to run a
fast, non-blocking inode reclaim scan.

Background inode reclaim is kicked by the act of marking inodes for
reclaim.  When an AG is first marked as having reclaimable inodes,
the background reclaim work is kicked. It will continue to run
periodically until it detects that there are no more reclaimable
inodes. It will be kicked again when the first inode is queued for
reclaim.

To ensure shrinker-based inode reclaim throttles to the inode
cleaning and reclaim rate while still reclaiming inodes efficiently,
make it kick the background inode reclaim so that when we are low on
memory we are trying to reclaim inodes as efficiently as possible.
This kick should not be necessary, but it will protect against
failures to kick the background reclaim when inodes are first
dirtied.

To provide the rate throttling, make the shrinker pass do
synchronous inode reclaim so that it blocks on inodes under IO. This
means that the shrinker will reclaim inodes rather than just
skipping over them, but it does not adversely affect the rate of
reclaim because most dirty inodes are already under IO due to the
background reclaim work the shrinker kicked.
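
In outline, the shrinker entry point becomes (simplified from the
diff below):

	if (nr_to_scan) {
		/* kick background reclaimer */
		xfs_syncd_queue_reclaim(mp);

		if (!(gfp_mask & __GFP_FS))
			return -1;

		/* throttle: synchronous, blocking reclaim pass */
		xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT,
					&nr_to_scan);
	}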

These two modifications solve one of the two OOM killer invocations
Chris Mason reported recently when running a stress testing script.
The particular workload trigger for the OOM killer invocation is
where there are more threads than CPUs all unlinking files in an
extremely memory constrained environment. Unlike other solutions,
this one does not impact performance when memory is not constrained
or the number of concurrent threads operating is less than or equal
to the number of CPUs.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/linux-2.6/xfs_sync.c |   69 +++++++++++++++++++++++++++++++++++++++++--
 fs/xfs/xfs_mount.h          |    1 +
 2 files changed, 67 insertions(+), 3 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index af32759..debe282 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -461,7 +461,6 @@ xfs_sync_worker(
 			error = xfs_fs_log_dummy(mp);
 		else
 			xfs_log_force(mp, 0);
-		xfs_reclaim_inodes(mp, 0);
 		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
 	}
 
@@ -470,6 +469,52 @@ xfs_sync_worker(
 }
 
 /*
+ * Queue a new inode reclaim pass if there are reclaimable inodes and there
+ * isn't a reclaim pass already in progress. By default it runs every 5s based
+ * on the xfs syncd work default of 30s. Perhaps this should have its own
+ * tunable, but that can be done if this method proves to be ineffective or too
+ * aggressive.
+ */
+static void
+xfs_syncd_queue_reclaim(
+	struct xfs_mount        *mp)
+{
+
+	/*
+	 * We can have inodes enter reclaim after we've shut down the syncd
+	 * workqueue during unmount, so don't allow reclaim work to be queued
+	 * during unmount.
+	 */
+	if (!(mp->m_super->s_flags & MS_ACTIVE))
+		return;
+
+	rcu_read_lock();
+	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
+		queue_delayed_work(xfs_syncd_wq, &mp->m_reclaim_work,
+			msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));
+	}
+	rcu_read_unlock();
+}
+
+/*
+ * This is a fast pass over the inode cache to try to get reclaim moving on as
+ * many inodes as possible in a short period of time. It kicks itself every few
+ * seconds, as well as being kicked by the inode cache shrinker when memory
+ * goes low. It scans as quickly as possible avoiding locked inodes or those
+ * already being flushed, and once done schedules a future pass.
+ */
+STATIC void
+xfs_reclaim_worker(
+	struct work_struct *work)
+{
+	struct xfs_mount *mp = container_of(to_delayed_work(work),
+					struct xfs_mount, m_reclaim_work);
+
+	xfs_reclaim_inodes(mp, SYNC_TRYLOCK);
+	xfs_syncd_queue_reclaim(mp);
+}
+
+/*
  * Flush delayed allocate data, attempting to free up reserved space
  * from existing allocations.  At this point a new allocation attempt
  * has failed with ENOSPC and we are in the process of scratching our
@@ -508,7 +553,10 @@ xfs_syncd_init(
 {
 	INIT_WORK(&mp->m_flush_work, xfs_flush_worker);
 	INIT_DELAYED_WORK(&mp->m_sync_work, xfs_sync_worker);
+	INIT_DELAYED_WORK(&mp->m_reclaim_work, xfs_reclaim_worker);
+
 	xfs_syncd_queue_sync(mp);
+	xfs_syncd_queue_reclaim(mp);
 
 	return 0;
 }
@@ -518,6 +566,7 @@ xfs_syncd_stop(
 	struct xfs_mount	*mp)
 {
 	cancel_delayed_work_sync(&mp->m_sync_work);
+	cancel_delayed_work_sync(&mp->m_reclaim_work);
 	cancel_work_sync(&mp->m_flush_work);
 }
 
@@ -537,6 +586,10 @@ __xfs_inode_set_reclaim_tag(
 				XFS_INO_TO_AGNO(ip->i_mount, ip->i_ino),
 				XFS_ICI_RECLAIM_TAG);
 		spin_unlock(&ip->i_mount->m_perag_lock);
+
+		/* schedule periodic background inode reclaim */
+		xfs_syncd_queue_reclaim(ip->i_mount);
+
 		trace_xfs_perag_set_reclaim(ip->i_mount, pag->pag_agno,
 							-1, _RET_IP_);
 	}
@@ -953,7 +1006,13 @@ xfs_reclaim_inodes(
 }
 
 /*
- * Shrinker infrastructure.
+ * Inode cache shrinker.
+ *
+ * When called we make sure that there is a background (fast) inode reclaim in
+ * progress, while we will throttle the speed of reclaim via doing synchronous
+ * reclaim of inodes. That means if we come across dirty inodes, we wait for
+ * them to be cleaned, which we hope will not be very long due to the
+ * background walker having already kicked the IO off on those dirty inodes.
  */
 static int
 xfs_reclaim_inode_shrink(
@@ -968,10 +1027,14 @@ xfs_reclaim_inode_shrink(
 
 	mp = container_of(shrink, struct xfs_mount, m_inode_shrink);
 	if (nr_to_scan) {
+		/* kick background reclaimer */
+		xfs_syncd_queue_reclaim(mp);
+
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
 
-		xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK, &nr_to_scan);
+		xfs_reclaim_inodes_ag(mp, SYNC_TRYLOCK | SYNC_WAIT,
+					&nr_to_scan);
 		/* terminate if we don't exhaust the scan */
 		if (nr_to_scan > 0)
 			return -1;
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index a0ad90e..19af0ab 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -204,6 +204,7 @@ typedef struct xfs_mount {
 #endif
 	struct xfs_mru_cache	*m_filestream;  /* per-mount filestream data */
 	struct delayed_work	m_sync_work;	/* background sync work */
+	struct delayed_work	m_reclaim_work;	/* background inode reclaim */
 	struct work_struct	m_flush_work;	/* background inode flush */
 	__int64_t		m_update_flags;	/* sb flags we need to update
 						   on the next remount,rw */
-- 
1.7.2.3


* [PATCH 5/9] xfs: convert the xfsaild threads to a workqueue
From: Dave Chinner @ 2011-04-07  1:57 UTC
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Similar to the xfssyncd, the per-filesystem xfsaild threads can be
converted to a global workqueue and run periodically by delayed
work items. This makes sense for the AIL pushing because it uses
variable timeouts depending on the work that needs to be done.

By removing the xfsaild, we simplify the AIL pushing code and
remove the need to spread the code that implements the threading
and pushing across multiple files.
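
The thread's wakeup/sleep loop is replaced by a single-instance
delayed work gated by a pushing bit - in outline (condensed from the
diff below):

	/* wakeup side: only queue a push if one isn't already running */
	smp_wmb();
	ailp->xa_target = threshold_lsn;
	if (!test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))
		queue_delayed_work(xfs_syncd_wq, &ailp->xa_work, 0);

	/* worker side: when idle, recheck the target (sampled at the
	 * start of this pass) before clearing the bit and stopping */
	smp_rmb();
	if (ailp->xa_target == target) {
		clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
		return;
	}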

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/linux-2.6/xfs_super.c |  115 ++++++++++++------------------------
 fs/xfs/xfs_trans_ail.c       |  133 +++++++++++++++++++++++-------------------
 fs/xfs/xfs_trans_priv.h      |   15 +++--
 3 files changed, 118 insertions(+), 145 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
index a20071f..b5674ed 100644
--- a/fs/xfs/linux-2.6/xfs_super.c
+++ b/fs/xfs/linux-2.6/xfs_super.c
@@ -816,75 +816,6 @@ xfs_setup_devices(
 	return 0;
 }
 
-/*
- * XFS AIL push thread support
- */
-void
-xfsaild_wakeup(
-	struct xfs_ail		*ailp,
-	xfs_lsn_t		threshold_lsn)
-{
-	/* only ever move the target forwards */
-	if (XFS_LSN_CMP(threshold_lsn, ailp->xa_target) > 0) {
-		ailp->xa_target = threshold_lsn;
-		wake_up_process(ailp->xa_task);
-	}
-}
-
-STATIC int
-xfsaild(
-	void	*data)
-{
-	struct xfs_ail	*ailp = data;
-	xfs_lsn_t	last_pushed_lsn = 0;
-	long		tout = 0; /* milliseconds */
-
-	while (!kthread_should_stop()) {
-		/*
-		 * for short sleeps indicating congestion, don't allow us to
-		 * get woken early. Otherwise all we do is bang on the AIL lock
-		 * without making progress.
-		 */
-		if (tout && tout <= 20)
-			__set_current_state(TASK_KILLABLE);
-		else
-			__set_current_state(TASK_INTERRUPTIBLE);
-		schedule_timeout(tout ?
-				 msecs_to_jiffies(tout) : MAX_SCHEDULE_TIMEOUT);
-
-		/* swsusp */
-		try_to_freeze();
-
-		ASSERT(ailp->xa_mount->m_log);
-		if (XFS_FORCED_SHUTDOWN(ailp->xa_mount))
-			continue;
-
-		tout = xfsaild_push(ailp, &last_pushed_lsn);
-	}
-
-	return 0;
-}	/* xfsaild */
-
-int
-xfsaild_start(
-	struct xfs_ail	*ailp)
-{
-	ailp->xa_target = 0;
-	ailp->xa_task = kthread_run(xfsaild, ailp, "xfsaild/%s",
-				    ailp->xa_mount->m_fsname);
-	if (IS_ERR(ailp->xa_task))
-		return -PTR_ERR(ailp->xa_task);
-	return 0;
-}
-
-void
-xfsaild_stop(
-	struct xfs_ail	*ailp)
-{
-	kthread_stop(ailp->xa_task);
-}
-
-
 /* Catch misguided souls that try to use this interface on XFS */
 STATIC struct inode *
 xfs_fs_alloc_inode(
@@ -1786,6 +1717,32 @@ xfs_destroy_zones(void)
 }
 
 STATIC int __init
+xfs_init_workqueues(void)
+{
+	xfs_syncd_wq = alloc_workqueue("xfssyncd", WQ_CPU_INTENSIVE, 8);
+	if (!xfs_syncd_wq)
+		goto out;
+
+	xfs_ail_wq = alloc_workqueue("xfsail", WQ_CPU_INTENSIVE, 8);
+	if (!xfs_ail_wq)
+		goto out_destroy_syncd;
+
+	return 0;
+
+out_destroy_syncd:
+	destroy_workqueue(xfs_syncd_wq);
+out:
+	return -ENOMEM;
+}
+
+STATIC void __exit
+xfs_destroy_workqueues(void)
+{
+	destroy_workqueue(xfs_ail_wq);
+	destroy_workqueue(xfs_syncd_wq);
+}
+
+STATIC int __init
 init_xfs_fs(void)
 {
 	int			error;
@@ -1800,10 +1757,14 @@ init_xfs_fs(void)
 	if (error)
 		goto out;
 
-	error = xfs_mru_cache_init();
+	error = xfs_init_workqueues();
 	if (error)
 		goto out_destroy_zones;
 
+	error = xfs_mru_cache_init();
+	if (error)
+		goto out_destroy_wq;
+
 	error = xfs_filestream_init();
 	if (error)
 		goto out_mru_cache_uninit;
@@ -1820,21 +1781,17 @@ init_xfs_fs(void)
 	if (error)
 		goto out_cleanup_procfs;
 
-	xfs_syncd_wq = alloc_workqueue("xfssyncd", WQ_CPU_INTENSIVE, 8);
-	if (!xfs_syncd_wq) {
-		error = -ENOMEM;
+	error = xfs_init_workqueues();
+	if (error)
 		goto out_sysctl_unregister;
-	}
 
 	vfs_initquota();
 
 	error = register_filesystem(&xfs_fs_type);
 	if (error)
-		goto out_destroy_xfs_syncd;
+		goto out_sysctl_unregister;
 	return 0;
 
- out_destroy_xfs_syncd:
-	destroy_workqueue(xfs_syncd_wq);
  out_sysctl_unregister:
 	xfs_sysctl_unregister();
  out_cleanup_procfs:
@@ -1845,6 +1802,8 @@ init_xfs_fs(void)
 	xfs_filestream_uninit();
  out_mru_cache_uninit:
 	xfs_mru_cache_uninit();
+ out_destroy_wq:
+	xfs_destroy_workqueues();
  out_destroy_zones:
 	xfs_destroy_zones();
  out:
@@ -1856,12 +1815,12 @@ exit_xfs_fs(void)
 {
 	vfs_exitquota();
 	unregister_filesystem(&xfs_fs_type);
-	destroy_workqueue(xfs_syncd_wq);
 	xfs_sysctl_unregister();
 	xfs_cleanup_procfs();
 	xfs_buf_terminate();
 	xfs_filestream_uninit();
 	xfs_mru_cache_uninit();
+	xfs_destroy_workqueues();
 	xfs_destroy_zones();
 }
 
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 12aff95..cb3aeac 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -28,6 +28,8 @@
 #include "xfs_trans_priv.h"
 #include "xfs_error.h"
 
+struct workqueue_struct	*xfs_ail_wq;	/* AIL workqueue */
+
 STATIC void xfs_ail_splice(struct xfs_ail *, struct list_head *, xfs_lsn_t);
 STATIC void xfs_ail_delete(struct xfs_ail *, xfs_log_item_t *);
 STATIC xfs_log_item_t * xfs_ail_min(struct xfs_ail *);
@@ -69,36 +71,6 @@ xfs_trans_ail_tail(
 }
 
 /*
- * xfs_trans_push_ail
- *
- * This routine is called to move the tail of the AIL forward.  It does this by
- * trying to flush items in the AIL whose lsns are below the given
- * threshold_lsn.
- *
- * the push is run asynchronously in a separate thread, so we return the tail
- * of the log right now instead of the tail after the push. This means we will
- * either continue right away, or we will sleep waiting on the async thread to
- * do its work.
- *
- * We do this unlocked - we only need to know whether there is anything in the
- * AIL at the time we are called. We don't need to access the contents of
- * any of the objects, so the lock is not needed.
- */
-void
-xfs_trans_ail_push(
-	struct xfs_ail	*ailp,
-	xfs_lsn_t	threshold_lsn)
-{
-	xfs_log_item_t	*lip;
-
-	lip = xfs_ail_min(ailp);
-	if (lip && !XFS_FORCED_SHUTDOWN(ailp->xa_mount)) {
-		if (XFS_LSN_CMP(threshold_lsn, ailp->xa_target) > 0)
-			xfsaild_wakeup(ailp, threshold_lsn);
-	}
-}
-
-/*
  * AIL traversal cursor initialisation.
  *
  * The cursor keeps track of where our current traversal is up
@@ -236,16 +208,16 @@ out:
 }
 
 /*
- * xfsaild_push does the work of pushing on the AIL.  Returning a timeout of
- * zero indicates that the caller should sleep until woken.
+ * xfs_ail_worker does the work of pushing on the AIL. It will requeue itself
+ * to run at a later time if there is more work to do to complete the push.
  */
-long
-xfsaild_push(
-	struct xfs_ail	*ailp,
-	xfs_lsn_t	*last_lsn)
+STATIC void
+xfs_ail_worker(
+	struct work_struct *work)
 {
-	long		tout = 0;
-	xfs_lsn_t	last_pushed_lsn = *last_lsn;
+	struct xfs_ail	*ailp = container_of(to_delayed_work(work),
+					struct xfs_ail, xa_work);
+	long		tout;
 	xfs_lsn_t	target =  ailp->xa_target;
 	xfs_lsn_t	lsn;
 	xfs_log_item_t	*lip;
@@ -256,15 +228,15 @@ xfsaild_push(
 
 	spin_lock(&ailp->xa_lock);
 	xfs_trans_ail_cursor_init(ailp, cur);
-	lip = xfs_trans_ail_cursor_first(ailp, cur, *last_lsn);
+	lip = xfs_trans_ail_cursor_first(ailp, cur, ailp->xa_last_pushed_lsn);
 	if (!lip || XFS_FORCED_SHUTDOWN(mp)) {
 		/*
 		 * AIL is empty or our push has reached the end.
 		 */
 		xfs_trans_ail_cursor_done(ailp, cur);
 		spin_unlock(&ailp->xa_lock);
-		*last_lsn = 0;
-		return tout;
+		ailp->xa_last_pushed_lsn = 0;
+		return;
 	}
 
 	XFS_STATS_INC(xs_push_ail);
@@ -301,13 +273,13 @@ xfsaild_push(
 		case XFS_ITEM_SUCCESS:
 			XFS_STATS_INC(xs_push_ail_success);
 			IOP_PUSH(lip);
-			last_pushed_lsn = lsn;
+			ailp->xa_last_pushed_lsn = lsn;
 			break;
 
 		case XFS_ITEM_PUSHBUF:
 			XFS_STATS_INC(xs_push_ail_pushbuf);
 			IOP_PUSHBUF(lip);
-			last_pushed_lsn = lsn;
+			ailp->xa_last_pushed_lsn = lsn;
 			push_xfsbufd = 1;
 			break;
 
@@ -319,7 +291,7 @@ xfsaild_push(
 
 		case XFS_ITEM_LOCKED:
 			XFS_STATS_INC(xs_push_ail_locked);
-			last_pushed_lsn = lsn;
+			ailp->xa_last_pushed_lsn = lsn;
 			stuck++;
 			break;
 
@@ -374,9 +346,23 @@ xfsaild_push(
 		wake_up_process(mp->m_ddev_targp->bt_task);
 	}
 
+	/* assume we have more work to do in a short while */
+	tout = 10;
 	if (!count) {
 		/* We're past our target or empty, so idle */
-		last_pushed_lsn = 0;
+		ailp->xa_last_pushed_lsn = 0;
+
+		/*
+		 * Check for an updated push target before clearing the
+		 * XFS_AIL_PUSHING_BIT. If the target changed, we've got more
+		 * work to do. Wait a bit longer before starting that work.
+		 */
+		smp_rmb();
+		if (ailp->xa_target == target) {
+			clear_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags);
+			return;
+		}
+		tout = 50;
 	} else if (XFS_LSN_CMP(lsn, target) >= 0) {
 		/*
 		 * We reached the target so wait a bit longer for I/O to
@@ -384,7 +370,7 @@ xfsaild_push(
 		 * start the next scan from the start of the AIL.
 		 */
 		tout = 50;
-		last_pushed_lsn = 0;
+		ailp->xa_last_pushed_lsn = 0;
 	} else if ((stuck * 100) / count > 90) {
 		/*
 		 * Either there is a lot of contention on the AIL or we
@@ -396,14 +382,48 @@ xfsaild_push(
 		 * continuing from where we were.
 		 */
 		tout = 20;
-	} else {
-		/* more to do, but wait a short while before continuing */
-		tout = 10;
 	}
-	*last_lsn = last_pushed_lsn;
-	return tout;
+
+	/* There is more to do, requeue us.  */
+	queue_delayed_work(xfs_syncd_wq, &ailp->xa_work,
+					msecs_to_jiffies(tout));
 }
 
+/*
+ * This routine is called to move the tail of the AIL forward.  It does this by
+ * trying to flush items in the AIL whose lsns are below the given
+ * threshold_lsn.
+ *
+ * The push is run asynchronously in a workqueue, which means the caller needs
+ * to handle waiting on the async flush for space to become available.
+ * We don't want to interrupt any push that is in progress, hence we only queue
+ * work if we set the pushing bit appropriately.
+ *
+ * We do this unlocked - we only need to know whether there is anything in the
+ * AIL at the time we are called. We don't need to access the contents of
+ * any of the objects, so the lock is not needed.
+ */
+void
+xfs_trans_ail_push(
+	struct xfs_ail	*ailp,
+	xfs_lsn_t	threshold_lsn)
+{
+	xfs_log_item_t	*lip;
+
+	lip = xfs_ail_min(ailp);
+	if (!lip || XFS_FORCED_SHUTDOWN(ailp->xa_mount) ||
+	    XFS_LSN_CMP(threshold_lsn, ailp->xa_target) <= 0)
+		return;
+
+	/*
+	 * Ensure that the new target is noticed in push code before it clears
+	 * the XFS_AIL_PUSHING_BIT.
+	 */
+	smp_wmb();
+	ailp->xa_target = threshold_lsn;
+	if (!test_and_set_bit(XFS_AIL_PUSHING_BIT, &ailp->xa_flags))
+		queue_delayed_work(xfs_syncd_wq, &ailp->xa_work, 0);
+}
 
 /*
  * This is to be called when an item is unlocked that may have
@@ -615,7 +635,6 @@ xfs_trans_ail_init(
 	xfs_mount_t	*mp)
 {
 	struct xfs_ail	*ailp;
-	int		error;
 
 	ailp = kmem_zalloc(sizeof(struct xfs_ail), KM_MAYFAIL);
 	if (!ailp)
@@ -624,15 +643,9 @@ xfs_trans_ail_init(
 	ailp->xa_mount = mp;
 	INIT_LIST_HEAD(&ailp->xa_ail);
 	spin_lock_init(&ailp->xa_lock);
-	error = xfsaild_start(ailp);
-	if (error)
-		goto out_free_ailp;
+	INIT_DELAYED_WORK(&ailp->xa_work, xfs_ail_worker);
 	mp->m_ail = ailp;
 	return 0;
-
-out_free_ailp:
-	kmem_free(ailp);
-	return error;
 }
 
 void
@@ -641,7 +654,7 @@ xfs_trans_ail_destroy(
 {
 	struct xfs_ail	*ailp = mp->m_ail;
 
-	xfsaild_stop(ailp);
+	cancel_delayed_work_sync(&ailp->xa_work);
 	kmem_free(ailp);
 }
 
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 35162c2..6ebd322 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -65,16 +65,22 @@ struct xfs_ail_cursor {
 struct xfs_ail {
 	struct xfs_mount	*xa_mount;
 	struct list_head	xa_ail;
-	uint			xa_gen;
-	struct task_struct	*xa_task;
 	xfs_lsn_t		xa_target;
 	struct xfs_ail_cursor	xa_cursors;
 	spinlock_t		xa_lock;
+	struct delayed_work	xa_work;
+	xfs_lsn_t		xa_last_pushed_lsn;
+	unsigned long		xa_flags;
 };
 
+#define XFS_AIL_PUSHING_BIT	0
+
 /*
  * From xfs_trans_ail.c
  */
+
+extern struct workqueue_struct	*xfs_ail_wq;	/* AIL workqueue */
+
 void	xfs_trans_ail_update_bulk(struct xfs_ail *ailp,
 				struct xfs_log_item **log_items, int nr_items,
 				xfs_lsn_t lsn) __releases(ailp->xa_lock);
@@ -112,11 +118,6 @@ struct xfs_log_item	*xfs_trans_ail_cursor_next(struct xfs_ail *ailp,
 void			xfs_trans_ail_cursor_done(struct xfs_ail *ailp,
 					struct xfs_ail_cursor *cur);
 
-long	xfsaild_push(struct xfs_ail *, xfs_lsn_t *);
-void	xfsaild_wakeup(struct xfs_ail *, xfs_lsn_t);
-int	xfsaild_start(struct xfs_ail *);
-void	xfsaild_stop(struct xfs_ail *);
-
 #if BITS_PER_LONG != 64
 static inline void
 xfs_trans_ail_copy_lsn(
-- 
1.7.2.3


* [PATCH 6/9] xfs: clean up code layout in xfs_trans_ail.c
From: Dave Chinner @ 2011-04-07  1:57 UTC
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

This patch rearranges the location of functions in xfs_trans_ail.c
to remove the need for forward declarations of those functions, in
preparation for adding new functions without needing further
forward declarations.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_trans_ail.c |  254 ++++++++++++++++++++++--------------------------
 1 files changed, 118 insertions(+), 136 deletions(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index cb3aeac..8012bfb 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -30,41 +30,100 @@
 
 struct workqueue_struct	*xfs_ail_wq;	/* AIL workqueue */
 
-STATIC void xfs_ail_splice(struct xfs_ail *, struct list_head *, xfs_lsn_t);
-STATIC void xfs_ail_delete(struct xfs_ail *, xfs_log_item_t *);
-STATIC xfs_log_item_t * xfs_ail_min(struct xfs_ail *);
-STATIC xfs_log_item_t * xfs_ail_next(struct xfs_ail *, xfs_log_item_t *);
-
 #ifdef DEBUG
-STATIC void xfs_ail_check(struct xfs_ail *, xfs_log_item_t *);
-#else
+/*
+ * Check that the list is sorted as it should be.
+ */
+STATIC void
+xfs_ail_check(
+	struct xfs_ail	*ailp,
+	xfs_log_item_t	*lip)
+{
+	xfs_log_item_t	*prev_lip;
+
+	if (list_empty(&ailp->xa_ail))
+		return;
+
+	/*
+	 * Check the next and previous entries are valid.
+	 */
+	ASSERT((lip->li_flags & XFS_LI_IN_AIL) != 0);
+	prev_lip = list_entry(lip->li_ail.prev, xfs_log_item_t, li_ail);
+	if (&prev_lip->li_ail != &ailp->xa_ail)
+		ASSERT(XFS_LSN_CMP(prev_lip->li_lsn, lip->li_lsn) <= 0);
+
+	prev_lip = list_entry(lip->li_ail.next, xfs_log_item_t, li_ail);
+	if (&prev_lip->li_ail != &ailp->xa_ail)
+		ASSERT(XFS_LSN_CMP(prev_lip->li_lsn, lip->li_lsn) >= 0);
+
+
+#ifdef XFS_TRANS_DEBUG
+	/*
+	 * Walk the list checking lsn ordering, and that every entry has the
+	 * XFS_LI_IN_AIL flag set. This is really expensive, so only do it
+	 * when specifically debugging the transaction subsystem.
+	 */
+	prev_lip = list_entry(&ailp->xa_ail, xfs_log_item_t, li_ail);
+	list_for_each_entry(lip, &ailp->xa_ail, li_ail) {
+		if (&prev_lip->li_ail != &ailp->xa_ail)
+			ASSERT(XFS_LSN_CMP(prev_lip->li_lsn, lip->li_lsn) <= 0);
+		ASSERT((lip->li_flags & XFS_LI_IN_AIL) != 0);
+		prev_lip = lip;
+	}
+#endif /* XFS_TRANS_DEBUG */
+}
+#else /* !DEBUG */
 #define	xfs_ail_check(a,l)
 #endif /* DEBUG */
 
+/*
+ * Return a pointer to the first item in the AIL.  If the AIL is empty, then
+ * return NULL.
+ */
+static xfs_log_item_t *
+xfs_ail_min(
+	struct xfs_ail  *ailp)
+{
+	if (list_empty(&ailp->xa_ail))
+		return NULL;
+
+	return list_first_entry(&ailp->xa_ail, xfs_log_item_t, li_ail);
+}
+
+/*
+ * Return a pointer to the item which follows the given item in the AIL.  If
+ * the given item is the last item in the list, then return NULL.
+ */
+static xfs_log_item_t *
+xfs_ail_next(
+	struct xfs_ail  *ailp,
+	xfs_log_item_t  *lip)
+{
+	if (lip->li_ail.next == &ailp->xa_ail)
+		return NULL;
+
+	return list_first_entry(&lip->li_ail, xfs_log_item_t, li_ail);
+}
 
 /*
- * This is called by the log manager code to determine the LSN
- * of the tail of the log.  This is exactly the LSN of the first
- * item in the AIL.  If the AIL is empty, then this function
- * returns 0.
+ * This is called by the log manager code to determine the LSN of the tail of
+ * the log.  This is exactly the LSN of the first item in the AIL.  If the AIL
+ * is empty, then this function returns 0.
  *
- * We need the AIL lock in order to get a coherent read of the
- * lsn of the last item in the AIL.
+ * We need the AIL lock in order to get a coherent read of the lsn of the last
+ * item in the AIL.
  */
 xfs_lsn_t
 xfs_trans_ail_tail(
 	struct xfs_ail	*ailp)
 {
-	xfs_lsn_t	lsn;
+	xfs_lsn_t	lsn = 0;
 	xfs_log_item_t	*lip;
 
 	spin_lock(&ailp->xa_lock);
 	lip = xfs_ail_min(ailp);
-	if (lip == NULL) {
-		lsn = (xfs_lsn_t)0;
-	} else {
+	if (lip)
 		lsn = lip->li_lsn;
-	}
 	spin_unlock(&ailp->xa_lock);
 
 	return lsn;
@@ -208,6 +267,47 @@ out:
 }
 
 /*
+ * splice the log item list into the AIL at the given LSN.
+ */
+static void
+xfs_ail_splice(
+	struct xfs_ail  *ailp,
+	struct list_head *list,
+	xfs_lsn_t       lsn)
+{
+	xfs_log_item_t  *next_lip;
+
+	/* If the list is empty, just insert the item.  */
+	if (list_empty(&ailp->xa_ail)) {
+		list_splice(list, &ailp->xa_ail);
+		return;
+	}
+
+	list_for_each_entry_reverse(next_lip, &ailp->xa_ail, li_ail) {
+		if (XFS_LSN_CMP(next_lip->li_lsn, lsn) <= 0)
+			break;
+	}
+
+	ASSERT(&next_lip->li_ail == &ailp->xa_ail ||
+	       XFS_LSN_CMP(next_lip->li_lsn, lsn) <= 0);
+
+	list_splice_init(list, &next_lip->li_ail);
+}
+
+/*
+ * Delete the given item from the AIL.  Return a pointer to the item.
+ */
+static void
+xfs_ail_delete(
+	struct xfs_ail  *ailp,
+	xfs_log_item_t  *lip)
+{
+	xfs_ail_check(ailp, lip);
+	list_del(&lip->li_ail);
+	xfs_trans_ail_cursor_clear(ailp, lip);
+}
+
+/*
  * xfs_ail_worker does the work of pushing on the AIL. It will requeue itself
  * to run at a later time if there is more work to do to complete the push.
  */
@@ -657,121 +757,3 @@ xfs_trans_ail_destroy(
 	cancel_delayed_work_sync(&ailp->xa_work);
 	kmem_free(ailp);
 }
-
-/*
- * splice the log item list into the AIL at the given LSN.
- */
-STATIC void
-xfs_ail_splice(
-	struct xfs_ail	*ailp,
-	struct list_head *list,
-	xfs_lsn_t	lsn)
-{
-	xfs_log_item_t	*next_lip;
-
-	/*
-	 * If the list is empty, just insert the item.
-	 */
-	if (list_empty(&ailp->xa_ail)) {
-		list_splice(list, &ailp->xa_ail);
-		return;
-	}
-
-	list_for_each_entry_reverse(next_lip, &ailp->xa_ail, li_ail) {
-		if (XFS_LSN_CMP(next_lip->li_lsn, lsn) <= 0)
-			break;
-	}
-
-	ASSERT((&next_lip->li_ail == &ailp->xa_ail) ||
-	       (XFS_LSN_CMP(next_lip->li_lsn, lsn) <= 0));
-
-	list_splice_init(list, &next_lip->li_ail);
-	return;
-}
-
-/*
- * Delete the given item from the AIL.  Return a pointer to the item.
- */
-STATIC void
-xfs_ail_delete(
-	struct xfs_ail	*ailp,
-	xfs_log_item_t	*lip)
-{
-	xfs_ail_check(ailp, lip);
-	list_del(&lip->li_ail);
-	xfs_trans_ail_cursor_clear(ailp, lip);
-}
-
-/*
- * Return a pointer to the first item in the AIL.
- * If the AIL is empty, then return NULL.
- */
-STATIC xfs_log_item_t *
-xfs_ail_min(
-	struct xfs_ail	*ailp)
-{
-	if (list_empty(&ailp->xa_ail))
-		return NULL;
-
-	return list_first_entry(&ailp->xa_ail, xfs_log_item_t, li_ail);
-}
-
-/*
- * Return a pointer to the item which follows
- * the given item in the AIL.  If the given item
- * is the last item in the list, then return NULL.
- */
-STATIC xfs_log_item_t *
-xfs_ail_next(
-	struct xfs_ail	*ailp,
-	xfs_log_item_t	*lip)
-{
-	if (lip->li_ail.next == &ailp->xa_ail)
-		return NULL;
-
-	return list_first_entry(&lip->li_ail, xfs_log_item_t, li_ail);
-}
-
-#ifdef DEBUG
-/*
- * Check that the list is sorted as it should be.
- */
-STATIC void
-xfs_ail_check(
-	struct xfs_ail	*ailp,
-	xfs_log_item_t	*lip)
-{
-	xfs_log_item_t	*prev_lip;
-
-	if (list_empty(&ailp->xa_ail))
-		return;
-
-	/*
-	 * Check the next and previous entries are valid.
-	 */
-	ASSERT((lip->li_flags & XFS_LI_IN_AIL) != 0);
-	prev_lip = list_entry(lip->li_ail.prev, xfs_log_item_t, li_ail);
-	if (&prev_lip->li_ail != &ailp->xa_ail)
-		ASSERT(XFS_LSN_CMP(prev_lip->li_lsn, lip->li_lsn) <= 0);
-
-	prev_lip = list_entry(lip->li_ail.next, xfs_log_item_t, li_ail);
-	if (&prev_lip->li_ail != &ailp->xa_ail)
-		ASSERT(XFS_LSN_CMP(prev_lip->li_lsn, lip->li_lsn) >= 0);
-
-
-#ifdef XFS_TRANS_DEBUG
-	/*
-	 * Walk the list checking lsn ordering, and that every entry has the
-	 * XFS_LI_IN_AIL flag set. This is really expensive, so only do it
-	 * when specifically debugging the transaction subsystem.
-	 */
-	prev_lip = list_entry(&ailp->xa_ail, xfs_log_item_t, li_ail);
-	list_for_each_entry(lip, &ailp->xa_ail, li_ail) {
-		if (&prev_lip->li_ail != &ailp->xa_ail)
-			ASSERT(XFS_LSN_CMP(prev_lip->li_lsn, lip->li_lsn) <= 0);
-		ASSERT((lip->li_flags & XFS_LI_IN_AIL) != 0);
-		prev_lip = lip;
-	}
-#endif /* XFS_TRANS_DEBUG */
-}
-#endif /* DEBUG */
-- 
1.7.2.3


* [PATCH 7/9] xfs: push the AIL from memory reclaim and periodic sync
From: Dave Chinner @ 2011-04-07  1:57 UTC
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When we are short on memory, we want to expedite the cleaning of
dirty objects.  Hence when we run short on memory, we need to kick
the AIL flushing into action to clean as many dirty objects as
quickly as possible.  To implement this, sample the lsn of the log
item at the head of the AIL and use that as the push target for the
AIL flush.

Further, we keep dirty items in the AIL that are not tracked any
other way, so we can get objects sitting in the AIL that don't get
written back until the AIL is pushed. Hence to get the filesystem
to the idle state, we might need to push the AIL to flush out any
remaining dirty objects sitting in the AIL. This requires the same
push mechanism as the reclaim push.

This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to
match the new xfs_ail_max_lsn() function introduced in this patch.
Similarly for xfs_trans_ail_push -> xfs_ail_push.
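
Both kick sites then reduce to a one-liner, as in the diff below:

	/* periodic sync worker: start pushing all the dirty metadata */
	xfs_ail_push_all(mp->m_ail);

	/* inode cache shrinker: expedite cleaning when memory is short */
	xfs_syncd_queue_reclaim(mp);
	xfs_ail_push_all(mp->m_ail);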

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/linux-2.6/xfs_sync.c |    7 +++++-
 fs/xfs/xfs_log.c            |    6 ++--
 fs/xfs/xfs_trans_ail.c      |   50 +++++++++++++++++++++++++++++++++++++++++-
 fs/xfs/xfs_trans_priv.h     |    7 +++--
 4 files changed, 61 insertions(+), 9 deletions(-)

diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
index debe282..9ad9560 100644
--- a/fs/xfs/linux-2.6/xfs_sync.c
+++ b/fs/xfs/linux-2.6/xfs_sync.c
@@ -22,6 +22,7 @@
 #include "xfs_log.h"
 #include "xfs_inum.h"
 #include "xfs_trans.h"
+#include "xfs_trans_priv.h"
 #include "xfs_sb.h"
 #include "xfs_ag.h"
 #include "xfs_mount.h"
@@ -462,6 +463,9 @@ xfs_sync_worker(
 		else
 			xfs_log_force(mp, 0);
 		error = xfs_qm_sync(mp, SYNC_TRYLOCK);
+
+		/* start pushing all the metadata that is currently dirty */
+		xfs_ail_push_all(mp->m_ail);
 	}
 
 	/* queue us up again */
@@ -1027,8 +1031,9 @@ xfs_reclaim_inode_shrink(
 
 	mp = container_of(shrink, struct xfs_mount, m_inode_shrink);
 	if (nr_to_scan) {
-		/* kick background reclaimer */
+		/* kick background reclaimer and push the AIL */
 		xfs_syncd_queue_reclaim(mp);
+		xfs_ail_push_all(mp->m_ail);
 
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 25efa9b..2464316 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -761,7 +761,7 @@ xfs_log_need_covered(xfs_mount_t *mp)
 		break;
 	case XLOG_STATE_COVER_NEED:
 	case XLOG_STATE_COVER_NEED2:
-		if (!xfs_trans_ail_tail(log->l_ailp) &&
+		if (!xfs_ail_min_lsn(log->l_ailp) &&
 		    xlog_iclogs_empty(log)) {
 			if (log->l_covered_state == XLOG_STATE_COVER_NEED)
 				log->l_covered_state = XLOG_STATE_COVER_DONE;
@@ -801,7 +801,7 @@ xlog_assign_tail_lsn(
 	xfs_lsn_t		tail_lsn;
 	struct log		*log = mp->m_log;
 
-	tail_lsn = xfs_trans_ail_tail(mp->m_ail);
+	tail_lsn = xfs_ail_min_lsn(mp->m_ail);
 	if (!tail_lsn)
 		tail_lsn = atomic64_read(&log->l_last_sync_lsn);
 
@@ -1239,7 +1239,7 @@ xlog_grant_push_ail(
 	 * the filesystem is shutting down.
 	 */
 	if (!XLOG_FORCED_SHUTDOWN(log))
-		xfs_trans_ail_push(log->l_ailp, threshold_lsn);
+		xfs_ail_push(log->l_ailp, threshold_lsn);
 }
 
 /*
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 8012bfb..acdb92f 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -90,6 +90,20 @@ xfs_ail_min(
 	return list_first_entry(&ailp->xa_ail, xfs_log_item_t, li_ail);
 }
 
+ /*
+ * Return a pointer to the last item in the AIL.  If the AIL is empty, then
+ * return NULL.
+ */
+static xfs_log_item_t *
+xfs_ail_max(
+	struct xfs_ail  *ailp)
+{
+	if (list_empty(&ailp->xa_ail))
+		return NULL;
+
+	return list_entry(ailp->xa_ail.prev, xfs_log_item_t, li_ail);
+}
+
 /*
  * Return a pointer to the item which follows the given item in the AIL.  If
  * the given item is the last item in the list, then return NULL.
@@ -114,7 +128,7 @@ xfs_ail_next(
  * item in the AIL.
  */
 xfs_lsn_t
-xfs_trans_ail_tail(
+xfs_ail_min_lsn(
 	struct xfs_ail	*ailp)
 {
 	xfs_lsn_t	lsn = 0;
@@ -130,6 +144,25 @@ xfs_trans_ail_tail(
 }
 
 /*
+ * Return the maximum lsn held in the AIL, or zero if the AIL is empty.
+ */
+static xfs_lsn_t
+xfs_ail_max_lsn(
+	struct xfs_ail  *ailp)
+{
+	xfs_lsn_t       lsn = 0;
+	xfs_log_item_t  *lip;
+
+	spin_lock(&ailp->xa_lock);
+	lip = xfs_ail_max(ailp);
+	if (lip)
+		lsn = lip->li_lsn;
+	spin_unlock(&ailp->xa_lock);
+
+	return lsn;
+}
+
+/*
  * AIL traversal cursor initialisation.
  *
  * The cursor keeps track of where our current traversal is up
@@ -504,7 +537,7 @@ xfs_ail_worker(
  * any of the objects, so the lock is not needed.
  */
 void
-xfs_trans_ail_push(
+xfs_ail_push(
 	struct xfs_ail	*ailp,
 	xfs_lsn_t	threshold_lsn)
 {
@@ -526,6 +559,19 @@ xfs_trans_ail_push(
 }
 
 /*
+ * Push out all items in the AIL immediately
+ */
+void
+xfs_ail_push_all(
+	struct xfs_ail  *ailp)
+{
+	xfs_lsn_t       threshold_lsn = xfs_ail_max_lsn(ailp);
+
+	if (threshold_lsn)
+		xfs_ail_push(ailp, threshold_lsn);
+}
+
+/*
  * This is to be called when an item is unlocked that may have
  * been in the AIL.  It will wake up the first member of the AIL
  * wait list if this item's unlocking might allow it to progress.
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 6ebd322..6b164e9 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -104,12 +104,13 @@ xfs_trans_ail_delete(
 	xfs_trans_ail_delete_bulk(ailp, &lip, 1);
 }
 
-void			xfs_trans_ail_push(struct xfs_ail *, xfs_lsn_t);
+void			xfs_ail_push(struct xfs_ail *, xfs_lsn_t);
+void			xfs_ail_push_all(struct xfs_ail *);
+xfs_lsn_t		xfs_ail_min_lsn(struct xfs_ail *ailp);
+
 void			xfs_trans_unlocked_item(struct xfs_ail *,
 					xfs_log_item_t *);
 
-xfs_lsn_t		xfs_trans_ail_tail(struct xfs_ail *ailp);
-
 struct xfs_log_item	*xfs_trans_ail_cursor_first(struct xfs_ail *ailp,
 					struct xfs_ail_cursor *cur,
 					xfs_lsn_t lsn);
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 8/9] xfs: catch bad block numbers freeing extents.
  2011-04-07  1:57 [PATCH 0/9] xfs; candidate fixes for 2.6.39 V2 Dave Chinner
                   ` (6 preceding siblings ...)
  2011-04-07  1:57 ` [PATCH 7/9] xfs: push the AIL from memory reclaim and periodic sync Dave Chinner
@ 2011-04-07  1:57 ` Dave Chinner
  2011-04-07 21:16   ` Alex Elder
  2011-04-07  1:57 ` [PATCH 9/9] xfs: convert log tail checking to a warning Dave Chinner
  8 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2011-04-07  1:57 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

A fuzzed filesystem crashed a kernel when freeing an extent with a
block number beyond the end of the filesystem. Convert all the debug
asserts in xfs_free_extent() to active checks so that we catch bad
extents and return that the filesystem is corrupted rather than
crashing.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_alloc.c |   30 +++++++++++++++++++++++-------
 1 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/fs/xfs/xfs_alloc.c b/fs/xfs/xfs_alloc.c
index 4bc3c64..27d64d7 100644
--- a/fs/xfs/xfs_alloc.c
+++ b/fs/xfs/xfs_alloc.c
@@ -2395,17 +2395,33 @@ xfs_free_extent(
 	memset(&args, 0, sizeof(xfs_alloc_arg_t));
 	args.tp = tp;
 	args.mp = tp->t_mountp;
+
+	/*
+	 * validate that the block number is legal - this enables us to detect
+	 * and handle a silent filesystem corruption rather than crashing.
+	 */
 	args.agno = XFS_FSB_TO_AGNO(args.mp, bno);
-	ASSERT(args.agno < args.mp->m_sb.sb_agcount);
+	if (args.agno >= args.mp->m_sb.sb_agcount)
+		return EFSCORRUPTED;
+
 	args.agbno = XFS_FSB_TO_AGBNO(args.mp, bno);
+	if (args.agbno >= args.mp->m_sb.sb_agblocks)
+		return EFSCORRUPTED;
+
 	args.pag = xfs_perag_get(args.mp, args.agno);
-	if ((error = xfs_alloc_fix_freelist(&args, XFS_ALLOC_FLAG_FREEING)))
+	ASSERT(args.pag);
+
+	error = xfs_alloc_fix_freelist(&args, XFS_ALLOC_FLAG_FREEING);
+	if (error)
 		goto error0;
-#ifdef DEBUG
-	ASSERT(args.agbp != NULL);
-	ASSERT((args.agbno + len) <=
-		be32_to_cpu(XFS_BUF_TO_AGF(args.agbp)->agf_length));
-#endif
+
+	/* validate the extent size is legal now we have the agf locked */
+	if (args.agbno + len >
+			be32_to_cpu(XFS_BUF_TO_AGF(args.agbp)->agf_length)) {
+		error = EFSCORRUPTED;
+		goto error0;
+	}
+
 	error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno, len, 0);
 error0:
 	xfs_perag_put(args.pag);
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 9/9] xfs: convert log tail checking to a warning
  2011-04-07  1:57 [PATCH 0/9] xfs; candidate fixes for 2.6.39 V2 Dave Chinner
                   ` (7 preceding siblings ...)
  2011-04-07  1:57 ` [PATCH 8/9] xfs: catch bad block numbers freeing extents Dave Chinner
@ 2011-04-07  1:57 ` Dave Chinner
  2011-04-07 21:16   ` Alex Elder
  8 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2011-04-07  1:57 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

On the Power platform, the log tail debug checks fire excessively
causing the system to panic early in testing. The debug checks are
known to be racy, though on x86_64 there is no evidence that they
trigger at all.

We want to keep the checks active on debug systems to alert us to
problems with log space accounting, but we need to reduce the impact
of a racy check on testing on the Power platform.

As a result, convert the ASSERT conditions to warnings, and
allow them to fire only once per filesystem mount. This will prevent
false positives from interfering with testing, whilst still
providing us with an indication that there may be a problem with log
space accounting should that occur.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_log.c      |   32 ++++++++++++++++++++++++--------
 fs/xfs/xfs_log_priv.h |    1 +
 2 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/fs/xfs/xfs_log.c b/fs/xfs/xfs_log.c
index 2464316..b612ce4 100644
--- a/fs/xfs/xfs_log.c
+++ b/fs/xfs/xfs_log.c
@@ -3407,6 +3407,17 @@ xlog_verify_dest_ptr(
 		xfs_emerg(log->l_mp, "%s: invalid ptr", __func__);
 }
 
+/*
+ * Check to make sure the grant write head didn't just overlap the tail.  If
+ * the cycles are the same, we can't be overlapping.  Otherwise, make sure that
+ * the cycles differ by exactly one and check the byte count.
+ *
+ * This check is run unlocked, so can give false positives. Rather than assert
+ * on failures, use a warn-once flag and a panic tag to allow the admin to
+ * determine if they want to panic the machine when such an error occurs. For
+ * debug kernels this will have the same effect as using an assert but, unlike
+ * an assert, it can be turned off at runtime.
+ */
 STATIC void
 xlog_verify_grant_tail(
 	struct log	*log)
@@ -3414,17 +3425,22 @@ xlog_verify_grant_tail(
 	int		tail_cycle, tail_blocks;
 	int		cycle, space;
 
-	/*
-	 * Check to make sure the grant write head didn't just over lap the
-	 * tail.  If the cycles are the same, we can't be overlapping.
-	 * Otherwise, make sure that the cycles differ by exactly one and
-	 * check the byte count.
-	 */
 	xlog_crack_grant_head(&log->l_grant_write_head, &cycle, &space);
 	xlog_crack_atomic_lsn(&log->l_tail_lsn, &tail_cycle, &tail_blocks);
 	if (tail_cycle != cycle) {
-		ASSERT(cycle - 1 == tail_cycle);
-		ASSERT(space <= BBTOB(tail_blocks));
+		if (cycle - 1 != tail_cycle &&
+		    !(log->l_flags & XLOG_TAIL_WARN)) {
+			xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES,
+				"%s: cycle - 1 != tail_cycle", __func__);
+			log->l_flags |= XLOG_TAIL_WARN;
+		}
+
+		if (space > BBTOB(tail_blocks) &&
+		    !(log->l_flags & XLOG_TAIL_WARN)) {
+			xfs_alert_tag(log->l_mp, XFS_PTAG_LOGRES,
+				"%s: space > BBTOB(tail_blocks)", __func__);
+			log->l_flags |= XLOG_TAIL_WARN;
+		}
 	}
 }
 
diff --git a/fs/xfs/xfs_log_priv.h b/fs/xfs/xfs_log_priv.h
index 15dbf1f..bc988d4 100644
--- a/fs/xfs/xfs_log_priv.h
+++ b/fs/xfs/xfs_log_priv.h
@@ -144,6 +144,7 @@ static inline uint xlog_get_client_id(__be32 i)
 #define	XLOG_RECOVERY_NEEDED	0x4	/* log was recovered */
 #define XLOG_IO_ERROR		0x8	/* log hit an I/O error, and being
 					   shutdown */
+#define XLOG_TAIL_WARN		0x10	/* log tail verify warning issued */
 
 #ifdef __KERNEL__
 /*
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/9] xfs: convert ENOSPC inode flushing to use new syncd workqueue
  2011-04-07  1:57 ` [PATCH 3/9] xfs: convert ENOSPC inode flushing to use new syncd workqueue Dave Chinner
@ 2011-04-07 21:16   ` Alex Elder
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Elder @ 2011-04-07 21:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-04-07 at 11:57 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> One of the problems with the current inode flush at ENOSPC is that we
> queue a flush per ENOSPC event, regardless of how many are already
> queued. This can result in hundreds of queued flushes, most of
> which simply burn CPU scanning and do no real work. This simply slows
> down allocation at ENOSPC.
> 
> We really only need one active flush at a time, and we can easily
> implement that via the new xfs_syncd_wq. All we need to do is queue
> a flush if one is not already active, then block waiting for the
> currently active flush to complete. The result is that we only ever
> have a single ENOSPC inode flush active at a time and this greatly
> reduces the overhead of ENOSPC processing.
> 
> On my 2p test machine, this results in tests exercising ENOSPC
> conditions running significantly faster - 042 halves execution time,
> 083 drops from 60s to 5s, etc - while not introducing test
> regressions.
> 
> This allows us to remove the old xfssyncd threads and infrastructure
> as they are no longer used.

Looks good.  You got rid of a useless log force as well.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
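
For readers without patch 3 itself in front of them, the "single
active flush" mechanism the commit message describes boils down to
something like the sketch below. The field name m_flush_work and the
exact call sequence are illustrative assumptions here, not a verbatim
copy of the patch:

	/*
	 * Sketch: queue_work() is a no-op if the work item is already
	 * queued, so at most one flush is ever outstanding, and
	 * flush_work_sync() then blocks the caller until the active
	 * flush completes before the allocation is retried.
	 */
	void
	xfs_flush_inodes(
		struct xfs_inode	*ip)
	{
		struct xfs_mount	*mp = ip->i_mount;

		queue_work(xfs_syncd_wq, &mp->m_flush_work);
		flush_work_sync(&mp->m_flush_work);
	}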


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/9] xfs: introduce background inode reclaim work
  2011-04-07  1:57 ` [PATCH 4/9] xfs: introduce background inode reclaim work Dave Chinner
@ 2011-04-07 21:16   ` Alex Elder
  2011-04-08  0:19     ` Dave Chinner
  0 siblings, 1 reply; 25+ messages in thread
From: Alex Elder @ 2011-04-07 21:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-04-07 at 11:57 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Background inode reclaim needs to run more frequently than the XFS
> syncd work is run, as 30s is too long between optimal reclaim runs.
> Add a new periodic work item to the xfs syncd workqueue to run a
> fast, non-blocking inode reclaim scan.
> 
> Background inode reclaim is kicked by the act of marking inodes for
> reclaim.  When an AG is first marked as having reclaimable inodes,
> the background reclaim work is kicked. It will continue to run
> periodically until it detects that there are no more reclaimable
> inodes. It will be kicked again when the first inode is queued for
> reclaim.
> 
> To ensure shrinker-based inode reclaim throttles to the inode
> cleaning and reclaim rate but still reclaims inodes efficiently,
> make it kick the background inode reclaim so that when we are low
> on memory we are trying to reclaim inodes as efficiently as
> possible. This kick should not be necessary, but it will protect
> against failures to kick the background reclaim when inodes are
> first dirtied.
> 
> To provide the rate throttling, make the shrinker pass do
> synchronous inode reclaim so that it blocks on inodes under IO. This
> means that the shrinker will reclaim inodes rather than just
> skipping over them, but it does not adversely affect the rate of
> reclaim because most dirty inodes are already under IO due to the
> background reclaim work the shrinker kicked.
> 
> These two modifications solve one of the two OOM killer invocations
> Chris Mason reported recently when running a stress testing script.
> The particular workload trigger for the OOM killer invocation is
> where there are more threads than CPUs all unlinking files in an
> extremely memory constrained environment. Unlike other solutions,
> this one does not have an impact on performance when memory is not
> constrained or the number of concurrent threads operating is <= the
> number of CPUs.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>

> @@ -470,6 +469,52 @@ xfs_sync_worker(
>  }
>  
>  /*
> + * Queue a new inode reclaim pass if there are reclaimable inodes and there
> + * isn't a reclaim pass already in progress. By default it runs every 5s based
> + * on the xfs syncd work default of 30s. Perhaps this should have its own

Agreed--I was going to say that but then I noticed your comment.

> + * tunable, but that can be done if this method proves to be ineffective or too
> + * aggressive.
> + */
> +static void
> +xfs_syncd_queue_reclaim(
> +	struct xfs_mount        *mp)
> +{
> +
> +	/*
> +	 * We can have inodes enter reclaim after we've shut down the syncd
> +	 * workqueue during unmount, so don't allow reclaim work to be queued
> +	 * during unmount.
> +	 */
> +	if (!(mp->m_super->s_flags & MS_ACTIVE))
> +		return;
> +
> +	rcu_read_lock();
> +	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
> +		queue_delayed_work(xfs_syncd_wq, &mp->m_reclaim_work,
> +			msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));

Probably better to do the multiply before the divide here.
(But whatever... it's heuristic.)

> +	}
> +	rcu_read_unlock();
> +}
> +
. . .
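
For context, the work function this helper arms is a self-requeueing
delayed work. Roughly, as a sketch inferred from the description above
rather than quoted from the patch:

	STATIC void
	xfs_reclaim_worker(
		struct work_struct	*work)
	{
		struct xfs_mount	*mp = container_of(to_delayed_work(work),
						struct xfs_mount, m_reclaim_work);

		/* fast, non-blocking background scan */
		xfs_reclaim_inodes(mp, SYNC_TRYLOCK);

		/* re-arm; a no-op once no AG carries the reclaim tag */
		xfs_syncd_queue_reclaim(mp);
	}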

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 5/9] xfs: convert the xfsaild threads to a workqueue
  2011-04-07  1:57 ` [PATCH 5/9] xfs: convert the xfsaild threads to a workqueue Dave Chinner
@ 2011-04-07 21:16   ` Alex Elder
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Elder @ 2011-04-07 21:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-04-07 at 11:57 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Similar to the xfssyncd, the per-filesystem xfsaild threads can be
> converted to a global workqueue and run periodically by delayed
> works. This makes sense for the AIL pushing because it uses
> variable timeouts depending on the work that needs to be done.
> 
> By removing the xfsaild, we simplify the AIL pushing code and
> remove the need to spread the code to implement the threading
> and pushing across multiple files.

Pretty cool.  Looks good to me.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 6/9] xfs: clean up code layout in xfs_trans_ail.c
  2011-04-07  1:57 ` [PATCH 6/9] xfs: clean up code layout in xfs_trans_ail.c Dave Chinner
@ 2011-04-07 21:16   ` Alex Elder
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Elder @ 2011-04-07 21:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-04-07 at 11:57 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> This patch rearranges the location of functions in xfs_trans_ail.c
> to remove the need for forward declarations of those functions in
> preparation for adding new functions without the need for forward
> declarations.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 7/9] xfs: push the AIL from memory reclaim and periodic sync
  2011-04-07  1:57 ` [PATCH 7/9] xfs: push the AIL from memory reclaim and periodic sync Dave Chinner
@ 2011-04-07 21:16   ` Alex Elder
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Elder @ 2011-04-07 21:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-04-07 at 11:57 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> When we are short on memory, we want to expedite the cleaning of
> dirty objects.  Hence we need to kick the AIL flushing into action
> to clean as many dirty objects as quickly as possible.  To implement
> this, sample the lsn of the log item at the head of the AIL and use
> that as the push target for the AIL flush.
> 
> Further, we keep dirty items in the AIL that are not tracked any
> other way, so we can get objects sitting in the AIL that
> don't get written back until the AIL is pushed. Hence to get the
> filesystem to the idle state, we might need to push the AIL to flush
> out any remaining dirty objects sitting in the AIL. This requires
> the same push mechanism as the reclaim push.
> 
> This patch also renames xfs_trans_ail_tail() to xfs_ail_min_lsn() to
> match the new xfs_ail_max_lsn() function introduced in this patch.
> Similarly for xfs_trans_ail_push -> xfs_ail_push.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>

Looks OK to me.

Reviewed-by: Alex Elder <aelder@sgi.com>


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 8/9] xfs: catch bad block numbers freeing extents.
  2011-04-07  1:57 ` [PATCH 8/9] xfs: catch bad block numbers freeing extents Dave Chinner
@ 2011-04-07 21:16   ` Alex Elder
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Elder @ 2011-04-07 21:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-04-07 at 11:57 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> A fuzzed filesystem crashed a kernel when freeing an extent with a
> block number beyond the end of the filesystem. Convert all the debug
> asserts in xfs_free_extent() to active checks so that we catch bad
> extents and return that the filesystem is corrupted rather than
> crashing.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>


_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 9/9] xfs: convert log tail checking to a warning
  2011-04-07  1:57 ` [PATCH 9/9] xfs: convert log tail checking to a warning Dave Chinner
@ 2011-04-07 21:16   ` Alex Elder
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Elder @ 2011-04-07 21:16 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-04-07 at 11:57 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> On the Power platform, the log tail debug checks fire excessively
> causing the system to panic early in testing. The debug checks are
> known to be racy, though on x86_64 there is no evidence that they
> trigger at all.
> 
> We want to keep the checks active on debug systems to alert us to
> problems with log space accounting, but we need to reduce the impact
> of a racy check on testing on the Power platform.
> 
> As a result, convert the ASSERT conditions to warnings, and
> allow them to fire only once per filesystem mount. This will prevent
> false positives from interfering with testing, whilst still
> providing us with an indication that there may be a problem with log
> space accounting should that occur.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Looks good.

Reviewed-by: Alex Elder <aelder@sgi.com>

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/9] xfs: introduce a xfssyncd workqueue
  2011-04-07  1:57 ` [PATCH 2/9] xfs: introduce a xfssyncd workqueue Dave Chinner
@ 2011-04-07 21:34   ` Alex Elder
  2011-04-08  0:41     ` Dave Chinner
  0 siblings, 1 reply; 25+ messages in thread
From: Alex Elder @ 2011-04-07 21:34 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Thu, 2011-04-07 at 11:57 +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> All of the work xfssyncd does is background functionality. There is
> no need for a thread per filesystem to do this work - it can all be
> managed by a global workqueue now that workqueues manage concurrency
> effectively.
> 
> Introduce a new global xfssyncd workqueue, and convert the periodic
> work to use this new functionality. To do this, use a delayed work
> construct to schedule the next running of the periodic sync work
> for the filesystem. When the sync work is complete, queue a new
> delayed work for the next running of the sync work.
> 
> For laptop mode, we wait on completion for the sync works, so ensure
> that the sync work queuing interface can flush and wait for work to
> complete to enable the work queue infrastructure to replace the
> current sequence number and wakeup that is used.
> 
> Because the sync work does non-trivial amounts of work, mark the
> new work queue as CPU intensive.

(I've now seen your next patch so my confusion is I
think resolved.  I'm sending the following as I originally
wrote it anyway.)

I have two comments below.  One is something that can be
fixed later and another I think may be a problem.  I also
was just a little confused about something.

The confusing thing is that you still are spawning a kernel
thread per filesystem in xfs_syncd_init(), which is still
waiting xfs_syncd_centisecs between runs, and which is
then running work queued on the mount point's m_sync_list.

I *think* the reason it's confusing is just that your
description talks about "all of the work xfssyncd does,"
while this patch just pulls out the data syncing portion
of what it does.  The patch preserves the ability to make
use of the per-FS periodic syncer thread to flush inodes
(via xfs_flush_inodes()).

In any case, with the exception of the timeout thing
below (which ought to be easy to fix) the code looks
correct to me.  It just took me a little while to
> reconcile what the delayed workqueues (named
> "xfssyncd") do, versus what the "xfssyncd" threads
that remain do.

Despite the above, you can consider this reviewed by me.

Reviewed-by: Alex Elder <aelder@sgi.com>

> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/linux-2.6/xfs_super.c |   24 +++++------
>  fs/xfs/linux-2.6/xfs_sync.c  |   86 ++++++++++++++++++++---------------------
>  fs/xfs/linux-2.6/xfs_sync.h  |    2 +
>  fs/xfs/xfs_mount.h           |    4 +-
>  4 files changed, 56 insertions(+), 60 deletions(-)
> 
> diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
> index 1ba5c45..99dded9 100644
> --- a/fs/xfs/linux-2.6/xfs_super.c
> +++ b/fs/xfs/linux-2.6/xfs_super.c

. . .

> @@ -1833,13 +1822,21 @@ init_xfs_fs(void)
>  	if (error)
>  		goto out_cleanup_procfs;
>  
> +	xfs_syncd_wq = alloc_workqueue("xfssyncd", WQ_CPU_INTENSIVE, 8);

The value (8) for max_active here is arbitrary, and should maybe
be justified with some magic words in a comment or something.
But I really think it should be configurable, I suppose
via a module parameter, for the benefit of unusual (i.e.
large) configurations.

> +	if (!xfs_syncd_wq) {
> +		error = -ENOMEM;
> +		goto out_sysctl_unregister;
> +	}

. . .

> diff --git a/fs/xfs/linux-2.6/xfs_sync.c b/fs/xfs/linux-2.6/xfs_sync.c
> index 594cd82..ee9a6c3 100644
> --- a/fs/xfs/linux-2.6/xfs_sync.c
> +++ b/fs/xfs/linux-2.6/xfs_sync.c

. . .

> @@ -535,27 +511,12 @@ xfssyncd(
>  			break;
>  
>  		spin_lock(&mp->m_sync_lock);
> -		/*
> -		 * We can get woken by laptop mode, to do a sync -
> -		 * that's the (only!) case where the list would be
> -		 * empty with time remaining.
> -		 */
> -		if (!timeleft || list_empty(&mp->m_sync_list)) {
> -			if (!timeleft)
> -				timeleft = xfs_syncd_centisecs *
> -							msecs_to_jiffies(10);
> -			INIT_LIST_HEAD(&mp->m_sync_work.w_list);
> -			list_add_tail(&mp->m_sync_work.w_list,
> -					&mp->m_sync_list);
> -		}

Does timeleft have to be re-initialized in here somewhere?
It looks to me like it will become zero pretty quickly and
stay there.

>  		list_splice_init(&mp->m_sync_list, &tmp);
>  		spin_unlock(&mp->m_sync_lock);
>  
>  		list_for_each_entry_safe(work, n, &tmp, w_list) {
>  			(*work->w_syncer)(mp, work->w_data);
>  			list_del(&work->w_list);
> -			if (work == &mp->m_sync_work)
> -				continue;
>  			if (work->w_completion)
>  				complete(work->w_completion);
>  			kmem_free(work);

. . .
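
For what it's worth, the delayed-work reschedule pattern the commit
message describes - the sync worker queueing the next run of itself -
comes out looking roughly like this (a sketch; compare the "queue us
up again" hunk visible in the patch 7 context):

	static void
	xfs_syncd_queue_sync(
		struct xfs_mount	*mp)
	{
		queue_delayed_work(xfs_syncd_wq, &mp->m_sync_work,
				msecs_to_jiffies(xfs_syncd_centisecs * 10));
	}

	STATIC void
	xfs_sync_worker(
		struct work_struct	*work)
	{
		struct xfs_mount	*mp = container_of(to_delayed_work(work),
						struct xfs_mount, m_sync_work);

		/* ... periodic log force, quota sync, etc ... */

		/* queue us up again */
		xfs_syncd_queue_sync(mp);
	}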



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/9] xfs: introduce background inode reclaim work
  2011-04-07 21:16   ` Alex Elder
@ 2011-04-08  0:19     ` Dave Chinner
  2011-04-08 13:49       ` Alex Elder
  0 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2011-04-08  0:19 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Thu, Apr 07, 2011 at 04:16:22PM -0500, Alex Elder wrote:
> On Thu, 2011-04-07 at 11:57 +1000, Dave Chinner wrote:
> > +	if (!(mp->m_super->s_flags & MS_ACTIVE))
> > +		return;
> > +
> > +	rcu_read_lock();
> > +	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
> > +		queue_delayed_work(xfs_syncd_wq, &mp->m_reclaim_work,
> > +			msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));
> 
> Probably better to do the multiply before the divide here.
> (But whatever... it's heuristic.)

I always tend to divide before multiply to prevent the multiplication from
overflowing before the divide is done. In this case the granularity
of xfs_syncd_centisecs is sufficient that the rounding error of the
divide is meaningless. ie. 30s = 3000.

FWIW, I changed this from:

	xfs_syncd_centisecs / 6 * msecs_to_jiffies(10)

because msecs_to_jiffies() has larger rounding problems. e.g. @
CONFIG_HZ=250, msecs_to_jiffies(10) = 3 which is actually 12ms. That
is, we want to sleep for 5s at a time, and the two different
calculations give:

New:
	msecs_to_jiffies(3000 / 6 * 10) = 5000 / 4 jiffies
					= 1250 jiffies
					= 5s
Old:
	3000 / 6 * msecs_to_jiffies(10) = 500 * 3 jiffies
					= 1500 jiffies
					= 6s

This 20% rounding error is the reason we've recently noticed xfssyncd
running every 36s rather than every 30s....
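
A quick user-space approximation makes the difference concrete. This
is only a sketch - msecs_to_jiffies() below just models the kernel's
round-up-to-the-next-jiffy behaviour at CONFIG_HZ=250, it is not the
kernel implementation:

	#include <stdio.h>

	#define HZ	250		/* CONFIG_HZ=250: 4ms per jiffy */

	static unsigned long msecs_to_jiffies(unsigned int m)
	{
		/* round up to the next whole jiffy */
		return (m + (1000 / HZ) - 1) / (1000 / HZ);
	}

	int main(void)
	{
		unsigned int centisecs = 3000;	/* 30s default, want 5s */

		/* new order: prints 1250 jiffies = 5s */
		printf("new: %lu\n", msecs_to_jiffies(centisecs / 6 * 10));
		/* old order: prints 1500 jiffies = 6s */
		printf("old: %lu\n", centisecs / 6 * msecs_to_jiffies(10));
		return 0;
	}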

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 2/9] xfs: introduce a xfssyncd workqueue
  2011-04-07 21:34   ` Alex Elder
@ 2011-04-08  0:41     ` Dave Chinner
  0 siblings, 0 replies; 25+ messages in thread
From: Dave Chinner @ 2011-04-08  0:41 UTC (permalink / raw)
  To: Alex Elder; +Cc: xfs

On Thu, Apr 07, 2011 at 04:34:53PM -0500, Alex Elder wrote:
> On Thu, 2011-04-07 at 11:57 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> > 
> > All of the work xfssyncd does is background functionality. There is
> > no need for a thread per filesystem to do this work - it can all be
> > managed by a global workqueue now that workqueues manage concurrency
> > effectively.
> > 
> > Introduce a new global xfssyncd workqueue, and convert the periodic
> > work to use this new functionality. To do this, use a delayed work
> > construct to schedule the next running of the periodic sync work
> > for the filesystem. When the sync work is complete, queue a new
> > delayed work for the next running of the sync work.
> > 
> > For laptop mode, we wait on completion for the sync works, so ensure
> > that the sync work queuing interface can flush and wait for work to
> > complete to enable the work queue infrastructure to replace the
> > current sequence number and wakeup that is used.
> > 
> > Because the sync work does non-trivial amounts of work, mark the
> > new work queue as CPU intensive.
> 
> (I've now seen your next patch so my confusion is I
> think resolved.  I'm sending the following as I originally
> wrote it anyway.)
> 
> I have two comments below.  One is something that can be
> fixed later and another I think may be a problem.  I also
> was just a little confused about something.
> 
> The confusing thing is that you still are spawning a kernel
> thread per filesystem in xfs_syncd_init(), which is still
> waiting xfs_syncd_centisecs between runs, and which is
> then running work queued on the mount point's m_sync_list.
> 
> I *think* the reason it's confusing is just that your
> description talks about "all of the work xfssyncd does,"

".. is background functionality." The rest of the patch description
talks only about introducing the workqueue and converting a single
operation to use it:

> while this patch just pulls out the data syncing portion
> of what it does.
>
> The patch preserves the ability to make
> use of the per-FS periodic syncer thread to flush inodes
> (via xfs_flush_inodes()).

Right - that's converted in the next patch, and the unused syncd
thread is removed.

> 
> In any case, with the exception of the timeout thing
> below (which ought to be easy to fix) the code looks
> correct to me.  It just took me a little while to
> reconcile what the delayed workqueues (named
> "xfssyncd") do, versus what the "xfssyncd" threads
> that remain do.
> 
> Despite the above, you can consider this reviewed by me.
> 
> Reviewed-by: Alex Elder <aelder@sgi.com>
> 
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > ---
> >  fs/xfs/linux-2.6/xfs_super.c |   24 +++++------
> >  fs/xfs/linux-2.6/xfs_sync.c  |   86 ++++++++++++++++++++---------------------
> >  fs/xfs/linux-2.6/xfs_sync.h  |    2 +
> >  fs/xfs/xfs_mount.h           |    4 +-
> >  4 files changed, 56 insertions(+), 60 deletions(-)
> > 
> > diff --git a/fs/xfs/linux-2.6/xfs_super.c b/fs/xfs/linux-2.6/xfs_super.c
> > index 1ba5c45..99dded9 100644
> > --- a/fs/xfs/linux-2.6/xfs_super.c
> > +++ b/fs/xfs/linux-2.6/xfs_super.c
> 
> . . .
> 
> > @@ -1833,13 +1822,21 @@ init_xfs_fs(void)
> >  	if (error)
> >  		goto out_cleanup_procfs;
> >  
> > +	xfs_syncd_wq = alloc_workqueue("xfssyncd", WQ_CPU_INTENSIVE, 8);
> 
> The value (8) for max_active here is arbitrary, and maybe
> justified with some magic words in a comment or something.
> But I really think it should be configurable, I suppose
> via a module parameter, for the benefit of unusual (i.e.
> large) configurations.

I'll add a comment. FYI, it's a per-cpu number, not a global number.
From Documentation/workqueue.txt:

"@max_active determines the maximum number of execution contexts per
CPU which can be assigned to the work items of a wq.  For example,
with @max_active of 16, at most 16 work items of the wq can be
executing at the same time per CPU."

Which means it does scale with machine size already. Essentially, we
have a maximum of 3 concurrent work items executing on the
syncd work queue per filesystem, so I don't think there'll be any
shortage of worker contexts on a typical system....


> > @@ -535,27 +511,12 @@ xfssyncd(
> >  			break;
> >  
> >  		spin_lock(&mp->m_sync_lock);
> > -		/*
> > -		 * We can get woken by laptop mode, to do a sync -
> > -		 * that's the (only!) case where the list would be
> > -		 * empty with time remaining.
> > -		 */
> > -		if (!timeleft || list_empty(&mp->m_sync_list)) {
> > -			if (!timeleft)
> > -				timeleft = xfs_syncd_centisecs *
> > -							msecs_to_jiffies(10);
> > -			INIT_LIST_HEAD(&mp->m_sync_work.w_list);
> > -			list_add_tail(&mp->m_sync_work.w_list,
> > -					&mp->m_sync_list);
> > -		}
> 
> Does timeleft have to be re-initialized in here somewhere?
> It looks to me like it will become zero pretty quickly and
> stay there.

Yeah, you're right, though the code is completely removed in the
next patch so it's not noticeable. I'll just set it unconditionally
so that bisects don't do strange things if they land on this commit.
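
That is, keep the reset but make it unconditional; sketched against
the hunk quoted above:

		spin_lock(&mp->m_sync_lock);
		/* always reset the timeout for the next sleep */
		timeleft = xfs_syncd_centisecs * msecs_to_jiffies(10);
		list_splice_init(&mp->m_sync_list, &tmp);
		spin_unlock(&mp->m_sync_lock);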

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/9] xfs: introduce background inode reclaim work
  2011-04-08  0:19     ` Dave Chinner
@ 2011-04-08 13:49       ` Alex Elder
  0 siblings, 0 replies; 25+ messages in thread
From: Alex Elder @ 2011-04-08 13:49 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, 2011-04-08 at 10:19 +1000, Dave Chinner wrote:
> On Thu, Apr 07, 2011 at 04:16:22PM -0500, Alex Elder wrote:
> > On Thu, 2011-04-07 at 11:57 +1000, Dave Chinner wrote:
> > > +	if (!(mp->m_super->s_flags & MS_ACTIVE))
> > > +		return;
> > > +
> > > +	rcu_read_lock();
> > > +	if (radix_tree_tagged(&mp->m_perag_tree, XFS_ICI_RECLAIM_TAG)) {
> > > +		queue_delayed_work(xfs_syncd_wq, &mp->m_reclaim_work,
> > > +			msecs_to_jiffies(xfs_syncd_centisecs / 6 * 10));
> > 
> > Probably better to do the multiply before the divide here.
> > (But whatever... it's heuristic.)
> 
> I always tend to divide before multiply to prevent the multiple from
> overflowing before the divide is done. In this case the granularity
> of xfs_syncd_centisecs is sufficient that the rounding error of the
> divide is meaningless. ie. 30s = 3000.

Funny, I normally do the same, but in this case
I was thinking about getting better accuracy.
The numbers involved (msecs / jiffies and the
number of msec) are not likely to be even close
to sqrt(INT32_MAX) (let alone sqrt(UINT64_MAX)).
Putting the calculation inside the parentheses
is much better though, and I did notice that
change.

					-Alex

> FWIW, I changed this from:
> 
> 	xfs_syncd_centisecs / 6 * msecs_to_jiffies(10)
> 
> because msecs_to_jiffies() has larger rounding problems. e.g. @
> CONFIG_HZ=250, msecs_to_jiffies(10) = 3 which is actually 12ms. That
> is, we want to sleep for 5s at a time, and the two different
> calculations give:
> 
> New:
> 	msecs_to_jiffies(3000 / 6 * 10) = 5000 / 4 jiffies
> 					= 1250 jiffies
> 					= 5s
> Old:
> 	3000 / 6 * msecs_to_jiffies(10) = 500 * 3 jiffies
> 					= 1500 jiffies
> 					= 6s
> 
> This 20% rounding error is the reason we've recently noticed xfssyncd
> running every 36s rather than every 30s....
> 
> Cheers,
> 
> Dave.



_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/9] xfs: fix extent format buffer allocation size
  2011-04-07  0:05     ` Dave Chinner
@ 2011-04-07 17:19       ` Christoph Hellwig
  0 siblings, 0 replies; 25+ messages in thread
From: Christoph Hellwig @ 2011-04-07 17:19 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Thu, Apr 07, 2011 at 10:05:05AM +1000, Dave Chinner wrote:
> The function is needed regardless of XFS_NATIVE_HOST because it has
> to pick the real extents out of the in-core data fork when delayed
> allocation extents are present. This is endian independent.

You're right.  I was only looking at the attr fork side where we
don't handle the delalloc extents.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/9] xfs: fix extent format buffer allocation size
  2011-04-06 13:38   ` Christoph Hellwig
@ 2011-04-07  0:05     ` Dave Chinner
  2011-04-07 17:19       ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2011-04-07  0:05 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Wed, Apr 06, 2011 at 09:38:17AM -0400, Christoph Hellwig wrote:
> On Wed, Apr 06, 2011 at 04:19:10PM +1000, Dave Chinner wrote:
> > +	ext_buffer = kmem_alloc(XFS_IFORK_SIZE(ip, whichfork),
> > +							KM_SLEEP | KM_NOFS);
> 
> As mentioned before the KM_NOFS is a change from the previous version
> and really should not be needed.

Ok. If you feel strongly enough to point it out a second time, I'll
fix it ;)

> Also the feedback that the new helper needs to be under the
> !XFS_NATIVE_ENDIAN ifdef wasn't picked up either.

The function is needed regardless of XFS_NATIVE_HOST because it has
to pick the real extents out of the in-core data fork when delayed
allocation extents are present. This is endian independent.
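
For anyone unfamiliar with it, the conceptual core of
xfs_iextents_copy() is roughly the following - a sketch of the idea,
not the actual implementation:

	/*
	 * Walk the in-core fork, skip delayed allocation extents
	 * (identified by a null startblock), and write the remaining
	 * real extents out in disk (big-endian) format.
	 */
	for (i = 0; i < nrecs; i++) {
		xfs_bmbt_rec_host_t *ep = xfs_iext_get_ext(ifp, i);

		if (isnullstartblock(xfs_bmbt_get_startblock(ep)))
			continue;	/* delalloc: never on disk */

		dp->l0 = cpu_to_be64(ep->l0);
		dp->l1 = cpu_to_be64(ep->l1);
		dp++;
		copied++;
	}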

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/9] xfs: fix extent format buffer allocation size
  2011-04-06  6:19 ` [PATCH 1/9] xfs: fix extent format buffer allocation size Dave Chinner
@ 2011-04-06 13:38   ` Christoph Hellwig
  2011-04-07  0:05     ` Dave Chinner
  0 siblings, 1 reply; 25+ messages in thread
From: Christoph Hellwig @ 2011-04-06 13:38 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Apr 06, 2011 at 04:19:10PM +1000, Dave Chinner wrote:
> +	ext_buffer = kmem_alloc(XFS_IFORK_SIZE(ip, whichfork),
> +							KM_SLEEP | KM_NOFS);

As mentioned before the KM_NOFS is a change from the previous version
and really should not be needed.

Also the feedback that the new helper needs to be under the
!XFS_NATIVE_ENDIAN ifdef wasn't picked up either.

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/9] xfs: fix extent format buffer allocation size
  2011-04-06  6:19 [PATCH 0/9] xfs: candidate fixes for 2.6.39 Dave Chinner
@ 2011-04-06  6:19 ` Dave Chinner
  2011-04-06 13:38   ` Christoph Hellwig
  0 siblings, 1 reply; 25+ messages in thread
From: Dave Chinner @ 2011-04-06  6:19 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

When formatting an inode item, we have to allocate a separate buffer
to hold extents when there are delayed allocation extents on the
inode and it is in extent format. The allocation size is derived
from the in-core data fork representation, which accounts for
delayed allocation extents, while the on-disk representation does
not contain any delalloc extents.

As a result of this mismatch, the allocated buffer can be far larger
than needed to hold the real extent list which, due to the fact the
inode is in extent format, is limited to the size of the literal
area of the inode. However, we can have thousands of delalloc
extents, resulting in an allocation size orders of magnitude larger
than is needed to hold all the real extents.

Fix this by limiting the size of the buffer being allocated to the
size of the literal area of the inodes in the filesystem (i.e. the
maximum size an inode fork can grow to).

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/xfs_inode_item.c |   68 ++++++++++++++++++++++++++++------------------
 1 files changed, 41 insertions(+), 27 deletions(-)

diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index 46cc401..7cfdb10 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -198,6 +198,42 @@ xfs_inode_item_size(
 }
 
 /*
+ * xfs_inode_item_format_extents - convert in-core extents to on-disk form
+ *
+ * For either the data or attr fork in extent format, we need to endian convert
+ * the in-core extent as we place them into the on-disk inode. In this case, we
+ * need to do this conversion before we write the extents into the log. Because
+ * we don't have the disk inode to write into here, we allocate a buffer and
+ * format the extents into it via xfs_iextents_copy(). We free the buffer in
+ * the unlock routine after the copy for the log has been made.
+ *
+ * In the case of the data fork, the in-core and on-disk fork sizes can be
+ * different due to delayed allocation extents. We only log on-disk extents
+ * here, so always use the physical fork size to determine the size of the
+ * buffer we need to allocate.
+ */
+STATIC void
+xfs_inode_item_format_extents(
+	struct xfs_inode	*ip,
+	struct xfs_log_iovec	*vecp,
+	int			whichfork,
+	int			type)
+{
+	xfs_bmbt_rec_t		*ext_buffer;
+
+	ext_buffer = kmem_alloc(XFS_IFORK_SIZE(ip, whichfork),
+							KM_SLEEP | KM_NOFS);
+	if (whichfork == XFS_DATA_FORK)
+		ip->i_itemp->ili_extents_buf = ext_buffer;
+	else
+		ip->i_itemp->ili_aextents_buf = ext_buffer;
+
+	vecp->i_addr = ext_buffer;
+	vecp->i_len = xfs_iextents_copy(ip, ext_buffer, whichfork);
+	vecp->i_type = type;
+}
+
+/*
  * This is called to fill in the vector of log iovecs for the
  * given inode log item.  It fills the first item with an inode
  * log format structure, the second with the on-disk inode structure,
@@ -213,7 +249,6 @@ xfs_inode_item_format(
 	struct xfs_inode	*ip = iip->ili_inode;
 	uint			nvecs;
 	size_t			data_bytes;
-	xfs_bmbt_rec_t		*ext_buffer;
 	xfs_mount_t		*mp;
 
 	vecp->i_addr = &iip->ili_format;
@@ -320,22 +355,8 @@ xfs_inode_item_format(
 			} else
 #endif
 			{
-				/*
-				 * There are delayed allocation extents
-				 * in the inode, or we need to convert
-				 * the extents to on disk format.
-				 * Use xfs_iextents_copy()
-				 * to copy only the real extents into
-				 * a separate buffer.  We'll free the
-				 * buffer in the unlock routine.
-				 */
-				ext_buffer = kmem_alloc(ip->i_df.if_bytes,
-					KM_SLEEP);
-				iip->ili_extents_buf = ext_buffer;
-				vecp->i_addr = ext_buffer;
-				vecp->i_len = xfs_iextents_copy(ip, ext_buffer,
-						XFS_DATA_FORK);
-				vecp->i_type = XLOG_REG_TYPE_IEXT;
+				xfs_inode_item_format_extents(ip, vecp,
+					XFS_DATA_FORK, XLOG_REG_TYPE_IEXT);
 			}
 			ASSERT(vecp->i_len <= ip->i_df.if_bytes);
 			iip->ili_format.ilf_dsize = vecp->i_len;
@@ -445,19 +466,12 @@ xfs_inode_item_format(
 			 */
 			vecp->i_addr = ip->i_afp->if_u1.if_extents;
 			vecp->i_len = ip->i_afp->if_bytes;
+			vecp->i_type = XLOG_REG_TYPE_IATTR_EXT;
 #else
 			ASSERT(iip->ili_aextents_buf == NULL);
-			/*
-			 * Need to endian flip before logging
-			 */
-			ext_buffer = kmem_alloc(ip->i_afp->if_bytes,
-				KM_SLEEP);
-			iip->ili_aextents_buf = ext_buffer;
-			vecp->i_addr = ext_buffer;
-			vecp->i_len = xfs_iextents_copy(ip, ext_buffer,
-					XFS_ATTR_FORK);
+			xfs_inode_item_format_extents(ip, vecp,
+					XFS_ATTR_FORK, XLOG_REG_TYPE_IATTR_EXT);
 #endif
-			vecp->i_type = XLOG_REG_TYPE_IATTR_EXT;
 			iip->ili_format.ilf_asize = vecp->i_len;
 			vecp++;
 			nvecs++;
-- 
1.7.2.3

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply related	[flat|nested] 25+ messages in thread
