* [PATCH 0/4] xfs: patch queue for 3.2 v2
@ 2011-08-26  6:51 Dave Chinner
  2011-08-26  6:51 ` [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find Dave Chinner
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: Dave Chinner @ 2011-08-26  6:51 UTC (permalink / raw)
  To: xfs

Updated series after Alex's review and the merging of the dio locking
fixes.


* [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find
  2011-08-26  6:51 [PATCH 0/4] xfs: patch queue for 3.2 v2 Dave Chinner
@ 2011-08-26  6:51 ` Dave Chinner
  2011-08-26  8:11   ` Christoph Hellwig
  2011-08-26  6:51 ` [PATCH 2/4] xfs: reduce the number of log forces from tail pushing Dave Chinner
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2011-08-26  6:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

Stats show that for an 8-way unlink @ ~80,000 unlinks/s we are doing
~1 million cache hit lookups to ~3000 buffer creates. That's almost
3 orders of magnitude more cache hits than misses, so optimising for
cache hits is quite important. In the cache hit case we do not need
the new buffer that is pre-allocated to cover a potential cache
miss, so we are effectively hitting the allocator for no good reason
on the vast majority of calls to _xfs_buf_find. 8-way create
workloads show similar cache hit/miss ratios.

The result is profiles that look like this:

     samples  pcnt function                        DSO
     _______ _____ _______________________________ _________________

     1036.00 10.0% _xfs_buf_find                   [kernel.kallsyms]
      582.00  5.6% kmem_cache_alloc                [kernel.kallsyms]
      519.00  5.0% __memcpy                        [kernel.kallsyms]
      468.00  4.5% __ticket_spin_lock              [kernel.kallsyms]
      388.00  3.7% kmem_cache_free                 [kernel.kallsyms]
      331.00  3.2% xfs_log_commit_cil              [kernel.kallsyms]


Further, there is a fair bit of work involved in initialising a new
buffer once a cache miss has occurred and we currently do that under
the rbtree spinlock. That increases spinlock hold time on what are
heavily used trees.

To fix this, remove the initialisation of the buffer from
_xfs_buf_find() and only allocate the new buffer once we've had a
cache miss. Initialise the buffer immediately after allocating it in
xfs_buf_get, too, so that it is ready for insertion if the second
lookup also misses. This minimises lock hold time and avoids
unnecessary allocator churn. The resulting profiles look like:

     samples  pcnt function                    DSO
     _______ _____ ___________________________ _________________

     8111.00  9.1% _xfs_buf_find               [kernel.kallsyms]
     4380.00  4.9% __memcpy                    [kernel.kallsyms]
     4341.00  4.8% __ticket_spin_lock          [kernel.kallsyms]
     3401.00  3.8% kmem_cache_alloc            [kernel.kallsyms]
     2856.00  3.2% xfs_log_commit_cil          [kernel.kallsyms]
     2625.00  2.9% __kmalloc                   [kernel.kallsyms]
     2380.00  2.7% kfree                       [kernel.kallsyms]
     2016.00  2.3% kmem_cache_free             [kernel.kallsyms]

This shows a significant reduction in the time spent allocating and
freeing from the slab caches (kmem_cache_alloc and kmem_cache_free).

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/xfs_buf.c |   89 +++++++++++++++++++++++++++++++-----------------------
 1 files changed, 51 insertions(+), 38 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index c57836d..594cea5 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -171,10 +171,16 @@ STATIC void
 _xfs_buf_initialize(
 	xfs_buf_t		*bp,
 	xfs_buftarg_t		*target,
-	xfs_off_t		range_base,
-	size_t			range_length,
+	xfs_off_t		bno,
+	size_t			num_blocks,
 	xfs_buf_flags_t		flags)
 {
+	xfs_off_t		range_base;
+	size_t			range_length;
+
+	range_base = BBTOB(bno);
+	range_length = BBTOB(num_blocks);
+
 	/*
 	 * We don't want certain flags to appear in b_flags.
 	 */
@@ -423,9 +429,9 @@ _xfs_buf_map_pages(
  */
 xfs_buf_t *
 _xfs_buf_find(
-	xfs_buftarg_t		*btp,	/* block device target		*/
-	xfs_off_t		ioff,	/* starting offset of range	*/
-	size_t			isize,	/* length of range		*/
+	xfs_buftarg_t		*btp,
+	xfs_off_t		bno,
+	size_t			num_blocks,
 	xfs_buf_flags_t		flags,
 	xfs_buf_t		*new_bp)
 {
@@ -436,8 +442,8 @@ _xfs_buf_find(
 	struct rb_node		*parent;
 	xfs_buf_t		*bp;
 
-	range_base = (ioff << BBSHIFT);
-	range_length = (isize << BBSHIFT);
+	range_base = BBTOB(bno);
+	range_length = BBTOB(num_blocks);
 
 	/* Check for IOs smaller than the sector size / not sector aligned */
 	ASSERT(!(range_length < (1 << btp->bt_sshift)));
@@ -445,7 +451,7 @@ _xfs_buf_find(
 
 	/* get tree root */
 	pag = xfs_perag_get(btp->bt_mount,
-				xfs_daddr_to_agno(btp->bt_mount, ioff));
+				xfs_daddr_to_agno(btp->bt_mount, bno));
 
 	/* walk tree */
 	spin_lock(&pag->pag_buf_lock);
@@ -481,8 +487,6 @@ _xfs_buf_find(
 
 	/* No match found */
 	if (new_bp) {
-		_xfs_buf_initialize(new_bp, btp, range_base,
-				range_length, flags);
 		rb_link_node(&new_bp->b_rbnode, parent, rbp);
 		rb_insert_color(&new_bp->b_rbnode, &pag->pag_buf_tree);
 		/* the buffer keeps the perag reference until it is freed */
@@ -525,34 +529,43 @@ found:
 }
 
 /*
- *	Assembles a buffer covering the specified range.
- *	Storage in memory for all portions of the buffer will be allocated,
- *	although backing storage may not be.
+ * Assembles a buffer covering the specified range. The code is optimised for
+ * cache hits, as metadata intensive workloads will see 3 orders of magnitude
+ * more hits than misses.
  */
-xfs_buf_t *
+struct xfs_buf *
 xfs_buf_get(
-	xfs_buftarg_t		*target,/* target for buffer		*/
-	xfs_off_t		ioff,	/* starting offset of range	*/
-	size_t			isize,	/* length of range		*/
+	struct xfs_buftarg	*target,
+	xfs_off_t		bno,
+	size_t			num_blocks,
 	xfs_buf_flags_t		flags)
 {
-	xfs_buf_t		*bp, *new_bp;
+	struct xfs_buf		*bp;
+	struct xfs_buf		*new_bp;
 	int			error = 0;
 
+	bp = _xfs_buf_find(target, bno, num_blocks, flags, NULL);
+	if (likely(bp))
+		goto found;
+
 	new_bp = xfs_buf_allocate(flags);
 	if (unlikely(!new_bp))
 		return NULL;
 
-	bp = _xfs_buf_find(target, ioff, isize, flags, new_bp);
+	_xfs_buf_initialize(new_bp, target, bno, num_blocks, flags);
+
+	bp = _xfs_buf_find(target, bno, num_blocks, flags, new_bp);
+	if (!bp) {
+		xfs_buf_deallocate(new_bp);
+		return NULL;
+	}
+
 	if (bp == new_bp) {
 		error = xfs_buf_allocate_memory(bp, flags);
 		if (error)
 			goto no_buffer;
-	} else {
+	} else
 		xfs_buf_deallocate(new_bp);
-		if (unlikely(bp == NULL))
-			return NULL;
-	}
 
 	if (!(bp->b_flags & XBF_MAPPED)) {
 		error = _xfs_buf_map_pages(bp, flags);
@@ -563,19 +576,19 @@ xfs_buf_get(
 		}
 	}
 
-	XFS_STATS_INC(xb_get);
-
 	/*
-	 * Always fill in the block number now, the mapped cases can do
-	 * their own overlay of this later.
+	 * Now we have a workable buffer, fill in the block number so
+	 * that we can do IO on it.
 	 */
-	bp->b_bn = ioff;
-	bp->b_count_desired = bp->b_buffer_length;
+	bp->b_bn = bno;
 
+found:
+	ASSERT(bp->b_flags & XBF_MAPPED);
+	XFS_STATS_INC(xb_get);
 	trace_xfs_buf_get(bp, flags, _RET_IP_);
 	return bp;
 
- no_buffer:
+no_buffer:
 	if (flags & (XBF_LOCK | XBF_TRYLOCK))
 		xfs_buf_unlock(bp);
 	xfs_buf_rele(bp);
@@ -604,15 +617,15 @@ _xfs_buf_read(
 xfs_buf_t *
 xfs_buf_read(
 	xfs_buftarg_t		*target,
-	xfs_off_t		ioff,
-	size_t			isize,
+	xfs_off_t		bno,
+	size_t			num_blocks,
 	xfs_buf_flags_t		flags)
 {
 	xfs_buf_t		*bp;
 
 	flags |= XBF_READ;
 
-	bp = xfs_buf_get(target, ioff, isize, flags);
+	bp = xfs_buf_get(target, bno, num_blocks, flags);
 	if (bp) {
 		trace_xfs_buf_read(bp, flags, _RET_IP_);
 
@@ -647,13 +660,13 @@ xfs_buf_read(
 void
 xfs_buf_readahead(
 	xfs_buftarg_t		*target,
-	xfs_off_t		ioff,
-	size_t			isize)
+	xfs_off_t		bno,
+	size_t			num_blocks)
 {
 	if (bdi_read_congested(target->bt_bdi))
 		return;
 
-	xfs_buf_read(target, ioff, isize,
+	xfs_buf_read(target, bno, num_blocks,
 		     XBF_TRYLOCK|XBF_ASYNC|XBF_READ_AHEAD|XBF_DONT_BLOCK);
 }
 
@@ -698,7 +711,7 @@ xfs_buf_get_empty(
 
 	bp = xfs_buf_allocate(0);
 	if (bp)
-		_xfs_buf_initialize(bp, target, 0, len, 0);
+		_xfs_buf_initialize(bp, target, 0, BTOBB(len), 0);
 	return bp;
 }
 
@@ -790,7 +803,7 @@ xfs_buf_get_uncached(
 	bp = xfs_buf_allocate(0);
 	if (unlikely(bp == NULL))
 		goto fail;
-	_xfs_buf_initialize(bp, target, 0, len, 0);
+	_xfs_buf_initialize(bp, target, 0, BTOBB(len), 0);
 
 	error = _xfs_buf_get_pages(bp, page_count, 0);
 	if (error)
-- 
1.7.5.4


* [PATCH 2/4] xfs: reduce the number of log forces from tail pushing
  2011-08-26  6:51 [PATCH 0/4] xfs: patch queue for 3.2 v2 Dave Chinner
  2011-08-26  6:51 ` [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find Dave Chinner
@ 2011-08-26  6:51 ` Dave Chinner
  2011-08-26  8:14   ` Christoph Hellwig
  2011-08-26  6:51 ` [PATCH 3/4] xfs: re-arrange all the xfsbufd delwri queue code Dave Chinner
  2011-08-26  6:51 ` [PATCH 4/4] xfs: convert xfsbufd to use a workqueue Dave Chinner
  3 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2011-08-26  6:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

The AIL push code will issue a log force on every single push loop
that it exits having encountered pinned items. It doesn't rescan
these pinned items until it revisits the AIL from the start, so we
only need to force the log once per walk from the start of the AIL
to the target LSN.

This results in numbers like this:

	xs_push_ail_flush.....         1456
	xs_log_force.........          1485

These numbers are from an 8-way 50M inode create workload - almost
all the log forces are coming from the AIL pushing code.

Reduce the number of log forces by only forcing the log if the
previous walk found pinned items. For the same test, this reduces
the numbers to:

	xs_push_ail_flush.....          665
	xs_log_force.........           682


Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/xfs_trans_ail.c  |   27 +++++++++++++++------------
 fs/xfs/xfs_trans_priv.h |    1 +
 2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index c15aa29..13188df 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -372,12 +372,24 @@ xfs_ail_worker(
 	xfs_lsn_t		lsn;
 	xfs_lsn_t		target;
 	long			tout = 10;
-	int			flush_log = 0;
 	int			stuck = 0;
 	int			count = 0;
 	int			push_xfsbufd = 0;
 
+	/*
+	 * If last time we ran we encountered pinned items, force the log first
+	 * and wait for it before pushing again.
+	 */
 	spin_lock(&ailp->xa_lock);
+	if (ailp->xa_last_pushed_lsn == 0 && ailp->xa_log_flush &&
+	    !list_empty(&ailp->xa_ail)) {
+		ailp->xa_log_flush = 0;
+		spin_unlock(&ailp->xa_lock);
+		XFS_STATS_INC(xs_push_ail_flush);
+		xfs_log_force(mp, XFS_LOG_SYNC);
+		spin_lock(&ailp->xa_lock);
+	}
+
 	target = ailp->xa_target;
 	lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->xa_last_pushed_lsn);
 	if (!lip || XFS_FORCED_SHUTDOWN(mp)) {
@@ -435,7 +447,7 @@ xfs_ail_worker(
 		case XFS_ITEM_PINNED:
 			XFS_STATS_INC(xs_push_ail_pinned);
 			stuck++;
-			flush_log = 1;
+			ailp->xa_log_flush++;
 			break;
 
 		case XFS_ITEM_LOCKED:
@@ -480,16 +492,6 @@ xfs_ail_worker(
 	xfs_trans_ail_cursor_done(ailp, &cur);
 	spin_unlock(&ailp->xa_lock);
 
-	if (flush_log) {
-		/*
-		 * If something we need to push out was pinned, then
-		 * push out the log so it will become unpinned and
-		 * move forward in the AIL.
-		 */
-		XFS_STATS_INC(xs_push_ail_flush);
-		xfs_log_force(mp, 0);
-	}
-
 	if (push_xfsbufd) {
 		/* we've got delayed write buffers to flush */
 		wake_up_process(mp->m_ddev_targp->bt_task);
@@ -500,6 +502,7 @@ out_done:
 	if (!count) {
 		/* We're past our target or empty, so idle */
 		ailp->xa_last_pushed_lsn = 0;
+		ailp->xa_log_flush = 0;
 
 		/*
 		 * We clear the XFS_AIL_PUSHING_BIT first before checking
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 212946b..0a6eec6 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -71,6 +71,7 @@ struct xfs_ail {
 	struct delayed_work	xa_work;
 	xfs_lsn_t		xa_last_pushed_lsn;
 	unsigned long		xa_flags;
+	int			xa_log_flush;
 };
 
 #define XFS_AIL_PUSHING_BIT	0
-- 
1.7.5.4


* [PATCH 3/4] xfs: re-arrange all the xfsbufd delwri queue code
  2011-08-26  6:51 [PATCH 0/4] xfs: patch queue for 3.2 v2 Dave Chinner
  2011-08-26  6:51 ` [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find Dave Chinner
  2011-08-26  6:51 ` [PATCH 2/4] xfs: reduce the number of log forces from tail pushing Dave Chinner
@ 2011-08-26  6:51 ` Dave Chinner
  2011-08-26  8:14   ` Christoph Hellwig
  2011-08-26  6:51 ` [PATCH 4/4] xfs: convert xfsbufd to use a workqueue Dave Chinner
  3 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2011-08-26  6:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

In preparation for converting the xfsbufd to a workqueue, re-arrange
all the delayed write queue manipulation code above the buftarg
sections.  This allows us to avoid the need for forward declarations
of the new xfsbufd worker functions. This is pure code movement; no
changes to the code have been made.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
 fs/xfs/xfs_buf.c |  387 +++++++++++++++++++++++++++---------------------------
 1 files changed, 194 insertions(+), 193 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 594cea5..415ab71 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1383,199 +1383,6 @@ xfs_buf_iomove(
 }
 
 /*
- *	Handling of buffer targets (buftargs).
- */
-
-/*
- * Wait for any bufs with callbacks that have been submitted but have not yet
- * returned. These buffers will have an elevated hold count, so wait on those
- * while freeing all the buffers only held by the LRU.
- */
-void
-xfs_wait_buftarg(
-	struct xfs_buftarg	*btp)
-{
-	struct xfs_buf		*bp;
-
-restart:
-	spin_lock(&btp->bt_lru_lock);
-	while (!list_empty(&btp->bt_lru)) {
-		bp = list_first_entry(&btp->bt_lru, struct xfs_buf, b_lru);
-		if (atomic_read(&bp->b_hold) > 1) {
-			spin_unlock(&btp->bt_lru_lock);
-			delay(100);
-			goto restart;
-		}
-		/*
-		 * clear the LRU reference count so the bufer doesn't get
-		 * ignored in xfs_buf_rele().
-		 */
-		atomic_set(&bp->b_lru_ref, 0);
-		spin_unlock(&btp->bt_lru_lock);
-		xfs_buf_rele(bp);
-		spin_lock(&btp->bt_lru_lock);
-	}
-	spin_unlock(&btp->bt_lru_lock);
-}
-
-int
-xfs_buftarg_shrink(
-	struct shrinker		*shrink,
-	struct shrink_control	*sc)
-{
-	struct xfs_buftarg	*btp = container_of(shrink,
-					struct xfs_buftarg, bt_shrinker);
-	struct xfs_buf		*bp;
-	int nr_to_scan = sc->nr_to_scan;
-	LIST_HEAD(dispose);
-
-	if (!nr_to_scan)
-		return btp->bt_lru_nr;
-
-	spin_lock(&btp->bt_lru_lock);
-	while (!list_empty(&btp->bt_lru)) {
-		if (nr_to_scan-- <= 0)
-			break;
-
-		bp = list_first_entry(&btp->bt_lru, struct xfs_buf, b_lru);
-
-		/*
-		 * Decrement the b_lru_ref count unless the value is already
-		 * zero. If the value is already zero, we need to reclaim the
-		 * buffer, otherwise it gets another trip through the LRU.
-		 */
-		if (!atomic_add_unless(&bp->b_lru_ref, -1, 0)) {
-			list_move_tail(&bp->b_lru, &btp->bt_lru);
-			continue;
-		}
-
-		/*
-		 * remove the buffer from the LRU now to avoid needing another
-		 * lock round trip inside xfs_buf_rele().
-		 */
-		list_move(&bp->b_lru, &dispose);
-		btp->bt_lru_nr--;
-	}
-	spin_unlock(&btp->bt_lru_lock);
-
-	while (!list_empty(&dispose)) {
-		bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
-		list_del_init(&bp->b_lru);
-		xfs_buf_rele(bp);
-	}
-
-	return btp->bt_lru_nr;
-}
-
-void
-xfs_free_buftarg(
-	struct xfs_mount	*mp,
-	struct xfs_buftarg	*btp)
-{
-	unregister_shrinker(&btp->bt_shrinker);
-
-	xfs_flush_buftarg(btp, 1);
-	if (mp->m_flags & XFS_MOUNT_BARRIER)
-		xfs_blkdev_issue_flush(btp);
-
-	kthread_stop(btp->bt_task);
-	kmem_free(btp);
-}
-
-STATIC int
-xfs_setsize_buftarg_flags(
-	xfs_buftarg_t		*btp,
-	unsigned int		blocksize,
-	unsigned int		sectorsize,
-	int			verbose)
-{
-	btp->bt_bsize = blocksize;
-	btp->bt_sshift = ffs(sectorsize) - 1;
-	btp->bt_smask = sectorsize - 1;
-
-	if (set_blocksize(btp->bt_bdev, sectorsize)) {
-		xfs_warn(btp->bt_mount,
-			"Cannot set_blocksize to %u on device %s\n",
-			sectorsize, xfs_buf_target_name(btp));
-		return EINVAL;
-	}
-
-	return 0;
-}
-
-/*
- *	When allocating the initial buffer target we have not yet
- *	read in the superblock, so don't know what sized sectors
- *	are being used is at this early stage.  Play safe.
- */
-STATIC int
-xfs_setsize_buftarg_early(
-	xfs_buftarg_t		*btp,
-	struct block_device	*bdev)
-{
-	return xfs_setsize_buftarg_flags(btp,
-			PAGE_SIZE, bdev_logical_block_size(bdev), 0);
-}
-
-int
-xfs_setsize_buftarg(
-	xfs_buftarg_t		*btp,
-	unsigned int		blocksize,
-	unsigned int		sectorsize)
-{
-	return xfs_setsize_buftarg_flags(btp, blocksize, sectorsize, 1);
-}
-
-STATIC int
-xfs_alloc_delwrite_queue(
-	xfs_buftarg_t		*btp,
-	const char		*fsname)
-{
-	INIT_LIST_HEAD(&btp->bt_delwrite_queue);
-	spin_lock_init(&btp->bt_delwrite_lock);
-	btp->bt_flags = 0;
-	btp->bt_task = kthread_run(xfsbufd, btp, "xfsbufd/%s", fsname);
-	if (IS_ERR(btp->bt_task))
-		return PTR_ERR(btp->bt_task);
-	return 0;
-}
-
-xfs_buftarg_t *
-xfs_alloc_buftarg(
-	struct xfs_mount	*mp,
-	struct block_device	*bdev,
-	int			external,
-	const char		*fsname)
-{
-	xfs_buftarg_t		*btp;
-
-	btp = kmem_zalloc(sizeof(*btp), KM_SLEEP);
-
-	btp->bt_mount = mp;
-	btp->bt_dev =  bdev->bd_dev;
-	btp->bt_bdev = bdev;
-	btp->bt_bdi = blk_get_backing_dev_info(bdev);
-	if (!btp->bt_bdi)
-		goto error;
-
-	INIT_LIST_HEAD(&btp->bt_lru);
-	spin_lock_init(&btp->bt_lru_lock);
-	if (xfs_setsize_buftarg_early(btp, bdev))
-		goto error;
-	if (xfs_alloc_delwrite_queue(btp, fsname))
-		goto error;
-	btp->bt_shrinker.shrink = xfs_buftarg_shrink;
-	btp->bt_shrinker.seeks = DEFAULT_SEEKS;
-	register_shrinker(&btp->bt_shrinker);
-	return btp;
-
-error:
-	kmem_free(btp);
-	return NULL;
-}
-
-
-/*
  *	Delayed write buffer handling
  */
 STATIC void
@@ -1781,6 +1588,10 @@ xfsbufd(
 }
 
 /*
+ *	Handling of buffer targets (buftargs).
+ */
+
+/*
  *	Go through all incore buffers, and release buffers if they belong to
  *	the given device. This is used in filesystem error handling to
  *	preserve the consistency of its metadata.
@@ -1837,6 +1648,196 @@ xfs_flush_buftarg(
 	return pincount;
 }
 
+/*
+ * Wait for any bufs with callbacks that have been submitted but have not yet
+ * returned. These buffers will have an elevated hold count, so wait on those
+ * while freeing all the buffers only held by the LRU.
+ */
+void
+xfs_wait_buftarg(
+	struct xfs_buftarg	*btp)
+{
+	struct xfs_buf		*bp;
+
+restart:
+	spin_lock(&btp->bt_lru_lock);
+	while (!list_empty(&btp->bt_lru)) {
+		bp = list_first_entry(&btp->bt_lru, struct xfs_buf, b_lru);
+		if (atomic_read(&bp->b_hold) > 1) {
+			spin_unlock(&btp->bt_lru_lock);
+			delay(100);
+			goto restart;
+		}
+		/*
+		 * clear the LRU reference count so the bufer doesn't get
+		 * ignored in xfs_buf_rele().
+		 */
+		atomic_set(&bp->b_lru_ref, 0);
+		spin_unlock(&btp->bt_lru_lock);
+		xfs_buf_rele(bp);
+		spin_lock(&btp->bt_lru_lock);
+	}
+	spin_unlock(&btp->bt_lru_lock);
+}
+
+int
+xfs_buftarg_shrink(
+	struct shrinker		*shrink,
+	struct shrink_control	*sc)
+{
+	struct xfs_buftarg	*btp = container_of(shrink,
+					struct xfs_buftarg, bt_shrinker);
+	struct xfs_buf		*bp;
+	int nr_to_scan = sc->nr_to_scan;
+	LIST_HEAD(dispose);
+
+	if (!nr_to_scan)
+		return btp->bt_lru_nr;
+
+	spin_lock(&btp->bt_lru_lock);
+	while (!list_empty(&btp->bt_lru)) {
+		if (nr_to_scan-- <= 0)
+			break;
+
+		bp = list_first_entry(&btp->bt_lru, struct xfs_buf, b_lru);
+
+		/*
+		 * Decrement the b_lru_ref count unless the value is already
+		 * zero. If the value is already zero, we need to reclaim the
+		 * buffer, otherwise it gets another trip through the LRU.
+		 */
+		if (!atomic_add_unless(&bp->b_lru_ref, -1, 0)) {
+			list_move_tail(&bp->b_lru, &btp->bt_lru);
+			continue;
+		}
+
+		/*
+		 * remove the buffer from the LRU now to avoid needing another
+		 * lock round trip inside xfs_buf_rele().
+		 */
+		list_move(&bp->b_lru, &dispose);
+		btp->bt_lru_nr--;
+	}
+	spin_unlock(&btp->bt_lru_lock);
+
+	while (!list_empty(&dispose)) {
+		bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
+		list_del_init(&bp->b_lru);
+		xfs_buf_rele(bp);
+	}
+
+	return btp->bt_lru_nr;
+}
+
+void
+xfs_free_buftarg(
+	struct xfs_mount	*mp,
+	struct xfs_buftarg	*btp)
+{
+	unregister_shrinker(&btp->bt_shrinker);
+
+	xfs_flush_buftarg(btp, 1);
+	if (mp->m_flags & XFS_MOUNT_BARRIER)
+		xfs_blkdev_issue_flush(btp);
+
+	kthread_stop(btp->bt_task);
+	kmem_free(btp);
+}
+
+STATIC int
+xfs_setsize_buftarg_flags(
+	xfs_buftarg_t		*btp,
+	unsigned int		blocksize,
+	unsigned int		sectorsize,
+	int			verbose)
+{
+	btp->bt_bsize = blocksize;
+	btp->bt_sshift = ffs(sectorsize) - 1;
+	btp->bt_smask = sectorsize - 1;
+
+	if (set_blocksize(btp->bt_bdev, sectorsize)) {
+		xfs_warn(btp->bt_mount,
+			"Cannot set_blocksize to %u on device %s\n",
+			sectorsize, xfs_buf_target_name(btp));
+		return EINVAL;
+	}
+
+	return 0;
+}
+
+/*
+ *	When allocating the initial buffer target we have not yet
+ *	read in the superblock, so don't know what sized sectors
+ *	are being used is at this early stage.  Play safe.
+ */
+STATIC int
+xfs_setsize_buftarg_early(
+	xfs_buftarg_t		*btp,
+	struct block_device	*bdev)
+{
+	return xfs_setsize_buftarg_flags(btp,
+			PAGE_SIZE, bdev_logical_block_size(bdev), 0);
+}
+
+int
+xfs_setsize_buftarg(
+	xfs_buftarg_t		*btp,
+	unsigned int		blocksize,
+	unsigned int		sectorsize)
+{
+	return xfs_setsize_buftarg_flags(btp, blocksize, sectorsize, 1);
+}
+
+STATIC int
+xfs_alloc_delwrite_queue(
+	xfs_buftarg_t		*btp,
+	const char		*fsname)
+{
+	INIT_LIST_HEAD(&btp->bt_delwrite_queue);
+	spin_lock_init(&btp->bt_delwrite_lock);
+	btp->bt_flags = 0;
+	btp->bt_task = kthread_run(xfsbufd, btp, "xfsbufd/%s", fsname);
+	if (IS_ERR(btp->bt_task))
+		return PTR_ERR(btp->bt_task);
+	return 0;
+}
+
+xfs_buftarg_t *
+xfs_alloc_buftarg(
+	struct xfs_mount	*mp,
+	struct block_device	*bdev,
+	int			external,
+	const char		*fsname)
+{
+	xfs_buftarg_t		*btp;
+
+	btp = kmem_zalloc(sizeof(*btp), KM_SLEEP);
+
+	btp->bt_mount = mp;
+	btp->bt_dev =  bdev->bd_dev;
+	btp->bt_bdev = bdev;
+	btp->bt_bdi = blk_get_backing_dev_info(bdev);
+	if (!btp->bt_bdi)
+		goto error;
+
+	INIT_LIST_HEAD(&btp->bt_lru);
+	spin_lock_init(&btp->bt_lru_lock);
+	if (xfs_setsize_buftarg_early(btp, bdev))
+		goto error;
+	if (xfs_alloc_delwrite_queue(btp, fsname))
+		goto error;
+	btp->bt_shrinker.shrink = xfs_buftarg_shrink;
+	btp->bt_shrinker.seeks = DEFAULT_SEEKS;
+	register_shrinker(&btp->bt_shrinker);
+	return btp;
+
+error:
+	kmem_free(btp);
+	return NULL;
+}
+
+
+
 int __init
 xfs_buf_init(void)
 {
-- 
1.7.5.4


* [PATCH 4/4] xfs: convert xfsbufd to use a workqueue
  2011-08-26  6:51 [PATCH 0/4] xfs: patch queue for 3.2 v2 Dave Chinner
                   ` (2 preceding siblings ...)
  2011-08-26  6:51 ` [PATCH 3/4] xfs: re-arrange all the xfsbufd delwri queue code Dave Chinner
@ 2011-08-26  6:51 ` Dave Chinner
  2011-08-26  8:25   ` Christoph Hellwig
  3 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2011-08-26  6:51 UTC (permalink / raw)
  To: xfs

From: Dave Chinner <dchinner@redhat.com>

There is no reason we need a thread per filesystem to do the
flushing of the delayed write buffer queue. This can be easily
handled by a global concurrency managed workqueue.

Convert the delayed write buffer handling to use workqueues and
workqueue flushes to implement buffer writeback by embedding a
delayed work structure into the struct xfs_buftarg and using that to
control flushing.  This greatly simplifies the process of flushing
and also removes a bunch of duplicated code between buftarg flushing
and delwri buffer writeback.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
 fs/xfs/xfs_buf.c       |  172 +++++++++++++++++++++---------------------------
 fs/xfs/xfs_buf.h       |    5 +-
 fs/xfs/xfs_dquot.c     |    1 -
 fs/xfs/xfs_trans_ail.c |    2 +-
 4 files changed, 78 insertions(+), 102 deletions(-)

diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 415ab71..9aa4e60 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -42,9 +42,9 @@
 #include "xfs_trace.h"
 
 static kmem_zone_t *xfs_buf_zone;
-STATIC int xfsbufd(void *);
-STATIC void xfs_buf_delwri_queue(xfs_buf_t *, int);
+STATIC void xfs_buf_delwri_queue(xfs_buf_t *bp, int unlock);
 
+static struct workqueue_struct *xfs_buf_wq;
 static struct workqueue_struct *xfslogd_workqueue;
 struct workqueue_struct *xfsdatad_workqueue;
 struct workqueue_struct *xfsconvertd_workqueue;
@@ -1407,8 +1407,9 @@ xfs_buf_delwri_queue(
 	}
 
 	if (list_empty(dwq)) {
-		/* start xfsbufd as it is about to have something to do */
-		wake_up_process(bp->b_target->bt_task);
+		/* queue a delayed flush as we are about to queue a buffer */
+		queue_delayed_work(xfs_buf_wq, &bp->b_target->bt_delwrite_work,
+			xfs_buf_timer_centisecs * msecs_to_jiffies(10));
 	}
 
 	bp->b_flags |= _XBF_DELWRI_Q;
@@ -1486,15 +1487,14 @@ STATIC int
 xfs_buf_delwri_split(
 	xfs_buftarg_t	*target,
 	struct list_head *list,
-	unsigned long	age)
+	unsigned long	age,
+	int		force)
 {
 	xfs_buf_t	*bp, *n;
 	struct list_head *dwq = &target->bt_delwrite_queue;
 	spinlock_t	*dwlk = &target->bt_delwrite_lock;
 	int		skipped = 0;
-	int		force;
 
-	force = test_and_clear_bit(XBT_FORCE_FLUSH, &target->bt_flags);
 	INIT_LIST_HEAD(list);
 	spin_lock(dwlk);
 	list_for_each_entry_safe(bp, n, dwq, b_list) {
@@ -1543,90 +1543,33 @@ xfs_buf_cmp(
 	return 0;
 }
 
-STATIC int
-xfsbufd(
-	void		*data)
-{
-	xfs_buftarg_t   *target = (xfs_buftarg_t *)data;
-
-	current->flags |= PF_MEMALLOC;
-
-	set_freezable();
-
-	do {
-		long	age = xfs_buf_age_centisecs * msecs_to_jiffies(10);
-		long	tout = xfs_buf_timer_centisecs * msecs_to_jiffies(10);
-		struct list_head tmp;
-		struct blk_plug plug;
-
-		if (unlikely(freezing(current))) {
-			set_bit(XBT_FORCE_SLEEP, &target->bt_flags);
-			refrigerator();
-		} else {
-			clear_bit(XBT_FORCE_SLEEP, &target->bt_flags);
-		}
-
-		/* sleep for a long time if there is nothing to do. */
-		if (list_empty(&target->bt_delwrite_queue))
-			tout = MAX_SCHEDULE_TIMEOUT;
-		schedule_timeout_interruptible(tout);
-
-		xfs_buf_delwri_split(target, &tmp, age);
-		list_sort(NULL, &tmp, xfs_buf_cmp);
-
-		blk_start_plug(&plug);
-		while (!list_empty(&tmp)) {
-			struct xfs_buf *bp;
-			bp = list_first_entry(&tmp, struct xfs_buf, b_list);
-			list_del_init(&bp->b_list);
-			xfs_bdstrat_cb(bp);
-		}
-		blk_finish_plug(&plug);
-	} while (!kthread_should_stop());
-
-	return 0;
-}
-
 /*
- *	Handling of buffer targets (buftargs).
+ * If we are doing a forced flush, then we need to wait for the IO that we
+ * issue to complete.
  */
-
-/*
- *	Go through all incore buffers, and release buffers if they belong to
- *	the given device. This is used in filesystem error handling to
- *	preserve the consistency of its metadata.
- */
-int
-xfs_flush_buftarg(
-	xfs_buftarg_t	*target,
-	int		wait)
+static void
+xfs_buf_delwri_work(
+	struct work_struct *work)
 {
-	xfs_buf_t	*bp;
-	int		pincount = 0;
+	struct xfs_buftarg *btp = container_of(to_delayed_work(work),
+					struct xfs_buftarg, bt_delwrite_work);
+	struct xfs_buf	*bp;
+	struct blk_plug	plug;
 	LIST_HEAD(tmp_list);
 	LIST_HEAD(wait_list);
-	struct blk_plug plug;
-
-	xfs_buf_runall_queues(xfsconvertd_workqueue);
-	xfs_buf_runall_queues(xfsdatad_workqueue);
-	xfs_buf_runall_queues(xfslogd_workqueue);
+	long		age = xfs_buf_age_centisecs * msecs_to_jiffies(10);
+	int		force = 0;
 
-	set_bit(XBT_FORCE_FLUSH, &target->bt_flags);
-	pincount = xfs_buf_delwri_split(target, &tmp_list, 0);
+	force = test_and_clear_bit(XBT_FORCE_FLUSH, &btp->bt_flags);
 
-	/*
-	 * Dropped the delayed write list lock, now walk the temporary list.
-	 * All I/O is issued async and then if we need to wait for completion
-	 * we do that after issuing all the IO.
-	 */
+	xfs_buf_delwri_split(btp, &tmp_list, age, force);
 	list_sort(NULL, &tmp_list, xfs_buf_cmp);
 
 	blk_start_plug(&plug);
 	while (!list_empty(&tmp_list)) {
 		bp = list_first_entry(&tmp_list, struct xfs_buf, b_list);
-		ASSERT(target == bp->b_target);
 		list_del_init(&bp->b_list);
-		if (wait) {
+		if (force) {
 			bp->b_flags &= ~XBF_ASYNC;
 			list_add(&bp->b_list, &wait_list);
 		}
@@ -1634,7 +1577,7 @@ xfs_flush_buftarg(
 	}
 	blk_finish_plug(&plug);
 
-	if (wait) {
+	if (force) {
 		/* Wait for IO to complete. */
 		while (!list_empty(&wait_list)) {
 			bp = list_first_entry(&wait_list, struct xfs_buf, b_list);
@@ -1645,7 +1588,48 @@ xfs_flush_buftarg(
 		}
 	}
 
-	return pincount;
+	if (list_empty(&btp->bt_delwrite_queue))
+		return;
+
+	queue_delayed_work(xfs_buf_wq, &btp->bt_delwrite_work,
+			xfs_buf_timer_centisecs * msecs_to_jiffies(10));
+}
+
+/*
+ *	Handling of buffer targets (buftargs).
+ */
+
+/*
+ * Flush all the queued buffer work, then flush any remaining dirty buffers
+ * and wait for them to complete. If there are buffers remaining on the delwri
+ * queue, then they were pinned so couldn't be flushed. Return a value of 1 to
+ * indicate that there were pinned buffers and the caller needs to retry the
+ * flush.
+ */
+int
+xfs_flush_buftarg(
+	xfs_buftarg_t	*target,
+	int		wait)
+{
+	xfs_buf_runall_queues(xfsconvertd_workqueue);
+	xfs_buf_runall_queues(xfsdatad_workqueue);
+	xfs_buf_runall_queues(xfslogd_workqueue);
+
+	if (wait) {
+		/*
+		 * Ensure we have work queued up after setting the force flag.
+		 * If work is already in progress then the wq flush below won't
+		 * cause new work to start and hence the force flag will not be
+		 * seen by the flush and the flush will be incomplete.
+		 */
+		set_bit(XBT_FORCE_FLUSH, &target->bt_flags);
+		queue_delayed_work(xfs_buf_wq, &target->bt_delwrite_work, 0);
+	}
+	flush_delayed_work_sync(&target->bt_delwrite_work);
+
+	if (!list_empty(&target->bt_delwrite_queue))
+		return 1;
+	return 0;
 }
 
 /*
@@ -1740,7 +1724,6 @@ xfs_free_buftarg(
 	if (mp->m_flags & XFS_MOUNT_BARRIER)
 		xfs_blkdev_issue_flush(btp);
 
-	kthread_stop(btp->bt_task);
 	kmem_free(btp);
 }
 
@@ -1788,20 +1771,6 @@ xfs_setsize_buftarg(
 	return xfs_setsize_buftarg_flags(btp, blocksize, sectorsize, 1);
 }
 
-STATIC int
-xfs_alloc_delwrite_queue(
-	xfs_buftarg_t		*btp,
-	const char		*fsname)
-{
-	INIT_LIST_HEAD(&btp->bt_delwrite_queue);
-	spin_lock_init(&btp->bt_delwrite_lock);
-	btp->bt_flags = 0;
-	btp->bt_task = kthread_run(xfsbufd, btp, "xfsbufd/%s", fsname);
-	if (IS_ERR(btp->bt_task))
-		return PTR_ERR(btp->bt_task);
-	return 0;
-}
-
 xfs_buftarg_t *
 xfs_alloc_buftarg(
 	struct xfs_mount	*mp,
@@ -1824,8 +1793,11 @@ xfs_alloc_buftarg(
 	spin_lock_init(&btp->bt_lru_lock);
 	if (xfs_setsize_buftarg_early(btp, bdev))
 		goto error;
-	if (xfs_alloc_delwrite_queue(btp, fsname))
-		goto error;
+
+	INIT_LIST_HEAD(&btp->bt_delwrite_queue);
+	spin_lock_init(&btp->bt_delwrite_lock);
+	INIT_DELAYED_WORK(&btp->bt_delwrite_work, xfs_buf_delwri_work);
+
 	btp->bt_shrinker.shrink = xfs_buftarg_shrink;
 	btp->bt_shrinker.seeks = DEFAULT_SEEKS;
 	register_shrinker(&btp->bt_shrinker);
@@ -1860,8 +1832,13 @@ xfs_buf_init(void)
 	if (!xfsconvertd_workqueue)
 		goto out_destroy_xfsdatad_workqueue;
 
+	xfs_buf_wq = alloc_workqueue("xfsbufd", WQ_MEM_RECLAIM, 8);
+	if (!xfs_buf_wq)
+		goto out_destroy_xfsconvertd_wq;
 	return 0;
 
+ out_destroy_xfsconvertd_wq:
+	destroy_workqueue(xfsconvertd_workqueue);
  out_destroy_xfsdatad_workqueue:
 	destroy_workqueue(xfsdatad_workqueue);
  out_destroy_xfslogd_workqueue:
@@ -1875,6 +1852,7 @@ xfs_buf_init(void)
 void
 xfs_buf_terminate(void)
 {
+	destroy_workqueue(xfs_buf_wq);
 	destroy_workqueue(xfsconvertd_workqueue);
 	destroy_workqueue(xfsdatad_workqueue);
 	destroy_workqueue(xfslogd_workqueue);
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 620972b..c1aabfd 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -90,8 +90,7 @@ typedef unsigned int xfs_buf_flags_t;
 	{ _XBF_DELWRI_Q,	"DELWRI_Q" }
 
 typedef enum {
-	XBT_FORCE_SLEEP = 0,
-	XBT_FORCE_FLUSH = 1,
+	XBT_FORCE_FLUSH = 0,
 } xfs_buftarg_flags_t;
 
 typedef struct xfs_buftarg {
@@ -104,7 +103,7 @@ typedef struct xfs_buftarg {
 	size_t			bt_smask;
 
 	/* per device delwri queue */
-	struct task_struct	*bt_task;
+	struct delayed_work	bt_delwrite_work;
 	struct list_head	bt_delwrite_queue;
 	spinlock_t		bt_delwrite_lock;
 	unsigned long		bt_flags;
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index db62959..1fb9d93 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1446,7 +1446,6 @@ xfs_qm_dqflock_pushbuf_wait(
 		if (xfs_buf_ispinned(bp))
 			xfs_log_force(mp, 0);
 		xfs_buf_delwri_promote(bp);
-		wake_up_process(bp->b_target->bt_task);
 	}
 	xfs_buf_relse(bp);
 out_lock:
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 13188df..a3d1784 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -494,7 +494,7 @@ xfs_ail_worker(
 
 	if (push_xfsbufd) {
 		/* we've got delayed write buffers to flush */
-		wake_up_process(mp->m_ddev_targp->bt_task);
+		flush_delayed_work(&mp->m_ddev_targp->bt_delwrite_work);
 	}
 
 	/* assume we have more work to do in a short while */
-- 
1.7.5.4


* Re: [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find
  2011-08-26  6:51 ` [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find Dave Chinner
@ 2011-08-26  8:11   ` Christoph Hellwig
  2011-08-26 14:19     ` Alex Elder
  2011-09-21  6:44     ` Dave Chinner
  0 siblings, 2 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-08-26  8:11 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index c57836d..594cea5 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -171,10 +171,16 @@ STATIC void
>  _xfs_buf_initialize(
>  	xfs_buf_t		*bp,
>  	xfs_buftarg_t		*target,
> -	xfs_off_t		range_base,
> -	size_t			range_length,
> +	xfs_off_t		bno,
> +	size_t			num_blocks,
>  	xfs_buf_flags_t		flags)
>  {
> +	xfs_off_t		range_base;
> +	size_t			range_length;
> +
> +	range_base = BBTOB(bno);
> +	range_length = BBTOB(num_blocks);

What is the point of changing the mostly unrelated _xfs_buf_initialize
prototype in this patch?

I think it (and the other renaming changes related to it) are fine,
but should be a separate patch.  And once you touch _xfs_buf_initialize
after the core of this patch, please merge it with xfs_buf_allocate into
a new xfs_buf_alloc that does the full allocation + initialization and
can also replace xfs_buf_get_empty.
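
A rough sketch of the shape being suggested here, reusing only helper
names that already appear in the patch above (the combined
xfs_buf_alloc() is hypothetical at this point, not an existing
function):

	struct xfs_buf *
	xfs_buf_alloc(
		struct xfs_buftarg	*target,
		xfs_off_t		bno,
		size_t			num_blocks,
		xfs_buf_flags_t		flags)
	{
		struct xfs_buf		*bp;

		/* allocate the buffer structure from the zone... */
		bp = xfs_buf_allocate(flags);
		if (unlikely(!bp))
			return NULL;

		/* ...and initialise it in the same call */
		_xfs_buf_initialize(bp, target, bno, num_blocks, flags);
		return bp;
	}

xfs_buf_get_empty() would then reduce to a call to
xfs_buf_alloc(target, 0, BTOBB(len), 0), and xfs_buf_get() could use
the same helper on its cache miss path.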

> +	bp = _xfs_buf_find(target, bno, num_blocks, flags, new_bp);
> +	if (!bp) {
> +		xfs_buf_deallocate(new_bp);
> +		return NULL;
> +	}
> +
>  	if (bp == new_bp) {
>  		error = xfs_buf_allocate_memory(bp, flags);
>  		if (error)
>  			goto no_buffer;
> +	} else
>  		xfs_buf_deallocate(new_bp);

I'd recommend moving the call to xfs_buf_allocate_memory into
_xfs_buf_find so that it returns a fully allocated buffer.  In fact I'd
also move the xfs_buf_deallocate(new_bp) into the found side of
_xfs_buf_find, avoiding any conditionals in xfs_buf_get.
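
To illustrate (a sketch only, with the error handling simplified): if
_xfs_buf_find() consumed new_bp itself - freeing it when it finds an
existing buffer, and calling xfs_buf_allocate_memory() once the new
buffer is inserted into the tree - then xfs_buf_get() could collapse
to something like:

	struct xfs_buf *
	xfs_buf_get(
		struct xfs_buftarg	*target,
		xfs_off_t		bno,
		size_t			num_blocks,
		xfs_buf_flags_t		flags)
	{
		struct xfs_buf		*new_bp;
		struct xfs_buf		*bp;

		bp = _xfs_buf_find(target, bno, num_blocks, flags, NULL);
		if (likely(bp))
			goto found;

		new_bp = xfs_buf_allocate(flags);
		if (unlikely(!new_bp))
			return NULL;
		_xfs_buf_initialize(new_bp, target, bno, num_blocks, flags);

		/*
		 * _xfs_buf_find() now owns new_bp: it returns a fully
		 * allocated buffer or NULL, so the caller never needs to
		 * compare bp against new_bp.
		 */
		bp = _xfs_buf_find(target, bno, num_blocks, flags, new_bp);
		if (!bp)
			return NULL;
		bp->b_bn = bno;

	found:
		/* error path simplified for the sketch */
		if (!(bp->b_flags & XBF_MAPPED) &&
		    _xfs_buf_map_pages(bp, flags)) {
			xfs_buf_relse(bp);
			return NULL;
		}
		XFS_STATS_INC(xb_get);
		trace_xfs_buf_get(bp, flags, _RET_IP_);
		return bp;
	}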


>  
> -	XFS_STATS_INC(xb_get);
> -
>  	/*
> -	 * Always fill in the block number now, the mapped cases can do
> -	 * their own overlay of this later.
> +	 * Now we have a workable buffer, fill in the block number so
> +	 * that we can do IO on it.
>  	 */
> -	bp->b_bn = ioff;
> -	bp->b_count_desired = bp->b_buffer_length;
> +	bp->b_bn = bno;

Note that we only need this if we did not find an existing buffer.  It's
not strictly related to the patch, but given that you stop assigning
b_count_desired and redo this whole area it might be worth shifting it
into the if (bp == new_bp) conditional area.


>  
> +found:
> +	ASSERT(bp->b_flags & XBF_MAPPED);

This doesn't look right to me.  Various buffers like inode or remote attr
buffers are unmapped, and I can't see any reason why we would assert that
we are not allowed to find them here.

Thinking about it more I'm also not sure skipping the code to map
buffers on a straight cache hit is a good idea - there's nothing
inherent to requiring a given buffer to be mapped for all callers.


* Re: [PATCH 2/4] xfs: reduce the number of log forces from tail pushing
  2011-08-26  6:51 ` [PATCH 2/4] xfs: reduce the number of log forces from tail pushing Dave Chinner
@ 2011-08-26  8:14   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-08-26  8:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

Looks fine.  xa_log_flush is updated unlocked, but given that we always
just have a single worker per ail running at the same time that should
not be a problem.

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 3/4] xfs: re-arrange all the xfsbufd delwri queue code
  2011-08-26  6:51 ` [PATCH 3/4] xfs: re-arrange all the xfsbufd delwri queue code Dave Chinner
@ 2011-08-26  8:14   ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-08-26  8:14 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Fri, Aug 26, 2011 at 04:51:36PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> In preparation for converting the xfsbufd to a workqueue, re-arrange
> all the delayed write queue manipulation code above the buftarg
> sections.  This allows us to avoid the need for forward declarations
> of the new xfsbufd worker functions. This is pure code movement; no
> changes to the code have been made.
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Alex Elder <aelder@sgi.com>

Looks fine, although it clashed with my reviewed queue of xfs_buf
patches.

Reviewed-by: Christoph Hellwig <hch@lst.de>


* Re: [PATCH 4/4] xfs: convert xfsbufd to use a workqueue
  2011-08-26  6:51 ` [PATCH 4/4] xfs: convert xfsbufd to use a workqueue Dave Chinner
@ 2011-08-26  8:25   ` Christoph Hellwig
  2011-09-21  6:25     ` Dave Chinner
  0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2011-08-26  8:25 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

> -STATIC void xfs_buf_delwri_queue(xfs_buf_t *, int);
> +STATIC void xfs_buf_delwri_queue(xfs_buf_t *bp, int unlock);

Spuriously changing this prototype just makes merging with other
pending changes in this area harder :P

>  /*
> + * If we are doing a forced flush, then we need to wait for the IO that we
> + * issue to complete.
>   */
> +static void
> +xfs_buf_delwri_work(
> +	struct work_struct *work)
>  {
> +	struct xfs_buftarg *btp = container_of(to_delayed_work(work),
> +					struct xfs_buftarg, bt_delwrite_work);
> +	struct xfs_buf	*bp;
> +	struct blk_plug	plug;
>  	LIST_HEAD(tmp_list);
>  	LIST_HEAD(wait_list);
> +	long		age = xfs_buf_age_centisecs * msecs_to_jiffies(10);
> +	int		force = 0;
>  
> +	force = test_and_clear_bit(XBT_FORCE_FLUSH, &btp->bt_flags);
>  
> +	xfs_buf_delwri_split(btp, &tmp_list, age, force);
>  	list_sort(NULL, &tmp_list, xfs_buf_cmp);
>  
>  	blk_start_plug(&plug);
>  	while (!list_empty(&tmp_list)) {
>  		bp = list_first_entry(&tmp_list, struct xfs_buf, b_list);
> -		ASSERT(target == bp->b_target);
>  		list_del_init(&bp->b_list);
> -		if (wait) {
> +		if (force) {
>  			bp->b_flags &= ~XBF_ASYNC;
>  			list_add(&bp->b_list, &wait_list);
>  		}
> @@ -1634,7 +1577,7 @@ xfs_flush_buftarg(
>  	}
>  	blk_finish_plug(&plug);
>  
> +	if (force) {
>  		/* Wait for IO to complete. */
>  		while (!list_empty(&wait_list)) {
>  			bp = list_first_entry(&wait_list, struct xfs_buf, b_list);
> @@ -1645,7 +1588,48 @@ xfs_flush_buftarg(
>  		}
>  	}
>  

> +/*
> + *	Handling of buffer targets (buftargs).
> + */

I think we can just kill this comment.

> +/*
> + * Flush all the queued buffer work, then flush any remaining dirty buffers
> + * and wait for them to complete. If there are buffers remaining on the delwri
> + * queue, then they were pinned so couldn't be flushed. Return a value of 1 to
> + * indicate that there were pinned buffers and the caller needs to retry the
> + * flush.
> + */

Not directly related to your patch, but only one caller ever checks the
return value and retries.  This means e.g. during sync or umount we
don't bother with trying to push pinned buffers.

> +int
> +xfs_flush_buftarg(
> +	xfs_buftarg_t	*target,

Please use the non-typedef version for new or largely changed code.

> index 13188df..a3d1784 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -494,7 +494,7 @@ xfs_ail_worker(
>  
>  	if (push_xfsbufd) {
>  		/* we've got delayed write buffers to flush */
> -		wake_up_process(mp->m_ddev_targp->bt_task);
> +		flush_delayed_work(&mp->m_ddev_targp->bt_delwrite_work);

This is a huge change in behaviour.  wake_up_process just kicks the
thread to wake up from sleep as soon as the scheduler selects it, while
flush_delayed_work not only queues a pending delayed work, but also
waits for it to finish.
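
Spelled out with the names from the patch (illustrative only):

	/*
	 * old: just kick the xfsbufd thread; it runs whenever the
	 * scheduler next selects it, and we return immediately.
	 */
	wake_up_process(mp->m_ddev_targp->bt_task);

	/*
	 * new: a pending delayed work item is queued to run right away,
	 * and we then block until it has finished executing - i.e. until
	 * the queued delwri buffers have actually been submitted.
	 */
	flush_delayed_work(&mp->m_ddev_targp->bt_delwrite_work);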


* Re: [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find
  2011-08-26  8:11   ` Christoph Hellwig
@ 2011-08-26 14:19     ` Alex Elder
  2011-09-21  6:44     ` Dave Chinner
  1 sibling, 0 replies; 14+ messages in thread
From: Alex Elder @ 2011-08-26 14:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, 2011-08-26 at 04:11 -0400, Christoph Hellwig wrote:
. . .
> >  
> > +found:
> > +	ASSERT(bp->b_flags & XBF_MAPPED);
> 
> This doesn't look right to me.  Various buffers like inode or remote attr
> buffers are unmapped, and I can't see any reason why we would assert that
> we are not allowed to find them here.
> 
> Thinking about it more I'm also not sure skipping the code to map
> buffers on a straight cache hit is a good idea - there's nothing
> inherent to requiring a given buffer to be mapped for all callers.

I actually tripped this assert last night the first time I
tried running it.

					-Alex


* Re: [PATCH 4/4] xfs: convert xfsbufd to use a workqueue
  2011-08-26  8:25   ` Christoph Hellwig
@ 2011-09-21  6:25     ` Dave Chinner
  2011-09-21 11:26       ` Christoph Hellwig
  0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2011-09-21  6:25 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Aug 26, 2011 at 04:25:15AM -0400, Christoph Hellwig wrote:
> > index 13188df..a3d1784 100644
> > --- a/fs/xfs/xfs_trans_ail.c
> > +++ b/fs/xfs/xfs_trans_ail.c
> > @@ -494,7 +494,7 @@ xfs_ail_worker(
> >  
> >  	if (push_xfsbufd) {
> >  		/* we've got delayed write buffers to flush */
> > -		wake_up_process(mp->m_ddev_targp->bt_task);
> > +		flush_delayed_work(&mp->m_ddev_targp->bt_delwrite_work);
> 
> This is a huge change in behaviour.  wake_up_process just kicks the
> thread to wake up from sleep as soon as the scheduler selects it, while
> flush_delayed_work not only queues a pending delayed work, but also
> waits for it to finish.

Which is precisely what I want here - to wait for all the delwri
buffers that were promoted to be submitted before continuing
onwards.  This makes the scanning algorithm self throttling -
instead of simply pushing the buffers to the delwri queue and
kicking a background thread and hoping it can flush buffers faster
than we can promote them from the AIL, it explicitly pushes the
delwri buffers before the next round of AIL scanning. This ensures we
start timely IO on the buffers and don't simply continue to scan the
AIL while we wait for the background thread to send them off to
disk and complete.

IOWs, instead of:

	AIL			bufd
	promote
	....
	promote
	wakeup
	short sleep
				woken
				sort
				dispatch
				....

<sometime later, maybe before, during or after xfsbufd has run>

	promote
	.....
	promote
	wakeup
	short sleep
				woken
				sort
				dispatch
				....

Where we *hope* the short sleep in the AIL processing is long enough
to avoid repeated scanning of the AIL while the queued IO is
dispatched, we end up with:

	AIL			bufd
	promote
	....
	promote
	flush_work
				sort
				dispatch
	short sleep

	promote
	....
	promote
	flush_work
				sort
				dispatch
	short sleep

Which is much more controlled and means that the short sleep that
the AIL processing does actually gives time for IO completions to
occur before continuing. It means that dispatch of IO from the AIL
is throttled to the rate of device congestion, as it now waits for
the IO dispatch to complete instead of just shoving as much IO as
possible into the bufd queue.

FWIW, if we move to building a direct IO buffer list in the AIL as
we were recently discussing, these are -exactly- the IO dispatch
patterns and delays that we will get from the AIL....

Cheers,

Dave.

-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find
  2011-08-26  8:11   ` Christoph Hellwig
  2011-08-26 14:19     ` Alex Elder
@ 2011-09-21  6:44     ` Dave Chinner
  2011-09-21 11:28       ` Christoph Hellwig
  1 sibling, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2011-09-21  6:44 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: xfs

On Fri, Aug 26, 2011 at 04:11:32AM -0400, Christoph Hellwig wrote:
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index c57836d..594cea5 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -171,10 +171,16 @@ STATIC void
> >  _xfs_buf_initialize(
> >  	xfs_buf_t		*bp,
> >  	xfs_buftarg_t		*target,
> > -	xfs_off_t		range_base,
> > -	size_t			range_length,
> > +	xfs_off_t		bno,
> > +	size_t			num_blocks,
> >  	xfs_buf_flags_t		flags)
> >  {
> > +	xfs_off_t		range_base;
> > +	size_t			range_length;
> > +
> > +	range_base = BBTOB(bno);
> > +	range_length = BBTOB(num_blocks);
> 
> What is the point of changing the mostly unrelated _xfs_buf_initialize
> prototype in this patch?

We were converting units backwards and forwards inconsistently, some
functions taking bytes, some basic blocks, and conversions were
being done all over the place.

> I think it (and the other renaming changes related to it) are fine,
> but should be a separate patch.

OK, fine, I can do that.

> And once you touch _xfs_buf_initialize
> after the core of this patch, please merge it with xfs_buf_allocate into
> a new xfs_buf_alloc that does the full allocation + initialization and
> can also replace xfs_buf_get_empty.

Not right now. That restructuring can be done separately, probably in
the same patch set that fixes the API type problems...

> > +	bp = _xfs_buf_find(target, bno, num_blocks, flags, new_bp);
> > +	if (!bp) {
> > +		xfs_buf_deallocate(new_bp);
> > +		return NULL;
> > +	}
> > +
> >  	if (bp == new_bp) {
> >  		error = xfs_buf_allocate_memory(bp, flags);
> >  		if (error)
> >  			goto no_buffer;
> > +	} else
> >  		xfs_buf_deallocate(new_bp);
> 
> I'd recommend moving the call to xfs_buf_allocate_memory into
> _xfs_buf_find so that it returns a fully allocated buffer.  In fact I'd
> also move the xfs_buf_deallocate(new_bp) into the found side of
> _xfs_buf_find, avoiding any conditionals in xfs_buf_get.

<sigh>

This code is pretty much as you requested after the first time I
posted it.

http://oss.sgi.com/archives/xfs/2011-08/msg00146.html

I'll go rewrite this again, but IMO all you are asking for is for me
to put a different colour on the bike shed....

> >  
> > -	XFS_STATS_INC(xb_get);
> > -
> >  	/*
> > -	 * Always fill in the block number now, the mapped cases can do
> > -	 * their own overlay of this later.
> > +	 * Now we have a workable buffer, fill in the block number so
> > +	 * that we can do IO on it.
> >  	 */
> > -	bp->b_bn = ioff;
> > -	bp->b_count_desired = bp->b_buffer_length;
> > +	bp->b_bn = bno;
> 
> Note that we only need this if we did not find an existing buffer.  It's
> not strictly related to the patch, but given that you stop assigning
> b_count_desired and redo this whole area it might be worth shifting it
> into the if (bp == new_bp) conditional area.

OK.

> >  
> > +found:
> > +	ASSERT(bp->b_flags & XBF_MAPPED);
> 
> > This doesn't look right to me.  Various buffers like inode or remote attr
> > buffers are unmapped, and I can't see any reason why we would assert that
> > we are not allowed to find them here.

Yeah, a bit of a thinko, but it never tripped on me....

> Thinking about it more I'm also not sure skipping the code to map
> buffers on a straight cache hit is a good idea - there's nothing
> inherent to requiring a given buffer to be mapped for all callers.

OK, will fix.


-- 
Dave Chinner
david@fromorbit.com


* Re: [PATCH 4/4] xfs: convert xfsbufd to use a workqueue
  2011-09-21  6:25     ` Dave Chinner
@ 2011-09-21 11:26       ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-09-21 11:26 UTC (permalink / raw)
  To: Dave Chinner; +Cc: xfs

On Wed, Sep 21, 2011 at 04:25:39PM +1000, Dave Chinner wrote:
> Which is precisely what I want here - to wait for all the delwri
> buffers that were promoted to be submitted before continuing
> onwards.  This makes the scanning algorithm self throttling -
> instead of simply pushing the buffers to the delwri queue and
> kicking a background thread and hoping it can flush buffers faster
> than we can promote them from the AIL, it explicitly pushes the
> delwri buffers before the next round of AIL scanning. This ensures we
> start timely IO on the buffers and don't simply continue to scan the
> AIL while we wait for the background thread to send them off to
> disk and complete.

I didn't say I'm against it.  The important bit is that such changes
in behaviour get documented in the patch description, including
a rationale like the one in this mail.


* Re: [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find
  2011-09-21  6:44     ` Dave Chinner
@ 2011-09-21 11:28       ` Christoph Hellwig
  0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-09-21 11:28 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Christoph Hellwig, xfs

On Wed, Sep 21, 2011 at 04:44:43PM +1000, Dave Chinner wrote:
> 
> > And once you touch _xfs_buf_initialize
> > after the core of this patch, please merge it with xfs_buf_allocate into
> > a new xfs_buf_alloc that does the full allocation + initialization and
> > can also replace xfs_buf_get_empty.
> 
> Not right now. That restructuring can be done separately, probably in
> the same patch set that fixes the API type problems...

That's what I meant - the conversion changes should be part of a larger
patch (-series) to also fix up the API, and this bit.

> > >  	if (bp == new_bp) {
> > >  		error = xfs_buf_allocate_memory(bp, flags);
> > >  		if (error)
> > >  			goto no_buffer;
> > > +	} else
> > >  		xfs_buf_deallocate(new_bp);
> > 
> > I'd recommend moving the call to xfs_buf_allocate_memory into
> > _xfs_buf_find so that it returns a fully allocated buffer.  In fact I'd
> > also move the xfs_buf_deallocate(new_bp) into the found side of
> > _xfs_buf_find, avoiding any conditionals in xfs_buf_get.
> 
> <sigh>
> 
> This code is pretty much as you requested after the first time I
> posted it.
> 
> http://oss.sgi.com/archives/xfs/2011-08/msg00146.html
> 
> I'll go rewrite this again, but IMO all you are asking for is for me
> to put a different colour on the bike shed....

We can leave it as-is for now. My suggestion in the previous mail just
went half-way to where it makes most sense after looking at it for a
while.

