* [PATCH 0/4] xfs: patch queue for 3.2 v2
@ 2011-08-26 6:51 Dave Chinner
2011-08-26 6:51 ` [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find Dave Chinner
` (3 more replies)
0 siblings, 4 replies; 14+ messages in thread
From: Dave Chinner @ 2011-08-26 6:51 UTC (permalink / raw)
To: xfs
Updated series after Alex's review and the merging of the dio locking
fixes.
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find
2011-08-26 6:51 [PATCH 0/4] xfs: patch queue for 3.2 v2 Dave Chinner
@ 2011-08-26 6:51 ` Dave Chinner
2011-08-26 8:11 ` Christoph Hellwig
2011-08-26 6:51 ` [PATCH 2/4] xfs: reduce the number of log forces from tail pushing Dave Chinner
` (2 subsequent siblings)
3 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2011-08-26 6:51 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
Stats show that for an 8-way unlink workload running at ~80,000
unlinks/s we are doing ~1 million cache hit lookups for every ~3000
buffer creates. That's almost 3 orders of magnitude more cache hits
than misses, so optimising for cache hits is quite important. In the
cache hit case, the new buffer we pre-allocate to cover a potential
cache miss goes unused, so we are effectively hitting the allocator
for no good reason for the vast majority of calls to _xfs_buf_find.
8-way create workloads show similar cache hit/miss ratios.
The result is profiles that look like this:
samples pcnt function DSO
_______ _____ _______________________________ _________________
1036.00 10.0% _xfs_buf_find [kernel.kallsyms]
582.00 5.6% kmem_cache_alloc [kernel.kallsyms]
519.00 5.0% __memcpy [kernel.kallsyms]
468.00 4.5% __ticket_spin_lock [kernel.kallsyms]
388.00 3.7% kmem_cache_free [kernel.kallsyms]
331.00 3.2% xfs_log_commit_cil [kernel.kallsyms]
Further, there is a fair bit of work involved in initialising a new
buffer once a cache miss has occurred and we currently do that under
the rbtree spinlock. That increases spinlock hold time on what are
heavily used trees.
To fix this, remove the initialisation of the buffer from
_xfs_buf_find() and only allocate the new buffer once we've had a
cache miss. Initialise the buffer immediately after allocating it in
xfs_buf_get, too, so that it is ready for insert if we get another
cache miss after allocation. This minimises lock hold time and
avoids unnecessary allocator churn. The resulting profiles look
like:
samples pcnt function DSO
_______ _____ ___________________________ _________________
8111.00 9.1% _xfs_buf_find [kernel.kallsyms]
4380.00 4.9% __memcpy [kernel.kallsyms]
4341.00 4.8% __ticket_spin_lock [kernel.kallsyms]
3401.00 3.8% kmem_cache_alloc [kernel.kallsyms]
2856.00 3.2% xfs_log_commit_cil [kernel.kallsyms]
2625.00 2.9% __kmalloc [kernel.kallsyms]
2380.00 2.7% kfree [kernel.kallsyms]
2016.00 2.3% kmem_cache_free [kernel.kallsyms]
This shows a significant reduction in the time spent allocating from
and freeing to the slabs (kmem_cache_alloc and kmem_cache_free).
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
fs/xfs/xfs_buf.c | 89 +++++++++++++++++++++++++++++++-----------------------
1 files changed, 51 insertions(+), 38 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index c57836d..594cea5 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -171,10 +171,16 @@ STATIC void
_xfs_buf_initialize(
xfs_buf_t *bp,
xfs_buftarg_t *target,
- xfs_off_t range_base,
- size_t range_length,
+ xfs_off_t bno,
+ size_t num_blocks,
xfs_buf_flags_t flags)
{
+ xfs_off_t range_base;
+ size_t range_length;
+
+ range_base = BBTOB(bno);
+ range_length = BBTOB(num_blocks);
+
/*
* We don't want certain flags to appear in b_flags.
*/
@@ -423,9 +429,9 @@ _xfs_buf_map_pages(
*/
xfs_buf_t *
_xfs_buf_find(
- xfs_buftarg_t *btp, /* block device target */
- xfs_off_t ioff, /* starting offset of range */
- size_t isize, /* length of range */
+ xfs_buftarg_t *btp,
+ xfs_off_t bno,
+ size_t num_blocks,
xfs_buf_flags_t flags,
xfs_buf_t *new_bp)
{
@@ -436,8 +442,8 @@ _xfs_buf_find(
struct rb_node *parent;
xfs_buf_t *bp;
- range_base = (ioff << BBSHIFT);
- range_length = (isize << BBSHIFT);
+ range_base = BBTOB(bno);
+ range_length = BBTOB(num_blocks);
/* Check for IOs smaller than the sector size / not sector aligned */
ASSERT(!(range_length < (1 << btp->bt_sshift)));
@@ -445,7 +451,7 @@ _xfs_buf_find(
/* get tree root */
pag = xfs_perag_get(btp->bt_mount,
- xfs_daddr_to_agno(btp->bt_mount, ioff));
+ xfs_daddr_to_agno(btp->bt_mount, bno));
/* walk tree */
spin_lock(&pag->pag_buf_lock);
@@ -481,8 +487,6 @@ _xfs_buf_find(
/* No match found */
if (new_bp) {
- _xfs_buf_initialize(new_bp, btp, range_base,
- range_length, flags);
rb_link_node(&new_bp->b_rbnode, parent, rbp);
rb_insert_color(&new_bp->b_rbnode, &pag->pag_buf_tree);
/* the buffer keeps the perag reference until it is freed */
@@ -525,34 +529,43 @@ found:
}
/*
- * Assembles a buffer covering the specified range.
- * Storage in memory for all portions of the buffer will be allocated,
- * although backing storage may not be.
+ * Assembles a buffer covering the specified range. The code is optimised for
+ * cache hits, as metadata intensive workloads will see 3 orders of magnitude
+ * more hits than misses.
*/
-xfs_buf_t *
+struct xfs_buf *
xfs_buf_get(
- xfs_buftarg_t *target,/* target for buffer */
- xfs_off_t ioff, /* starting offset of range */
- size_t isize, /* length of range */
+ struct xfs_buftarg *target,
+ xfs_off_t bno,
+ size_t num_blocks,
xfs_buf_flags_t flags)
{
- xfs_buf_t *bp, *new_bp;
+ struct xfs_buf *bp;
+ struct xfs_buf *new_bp;
int error = 0;
+ bp = _xfs_buf_find(target, bno, num_blocks, flags, NULL);
+ if (likely(bp))
+ goto found;
+
new_bp = xfs_buf_allocate(flags);
if (unlikely(!new_bp))
return NULL;
- bp = _xfs_buf_find(target, ioff, isize, flags, new_bp);
+ _xfs_buf_initialize(new_bp, target, bno, num_blocks, flags);
+
+ bp = _xfs_buf_find(target, bno, num_blocks, flags, new_bp);
+ if (!bp) {
+ xfs_buf_deallocate(new_bp);
+ return NULL;
+ }
+
if (bp == new_bp) {
error = xfs_buf_allocate_memory(bp, flags);
if (error)
goto no_buffer;
- } else {
+ } else
xfs_buf_deallocate(new_bp);
- if (unlikely(bp == NULL))
- return NULL;
- }
if (!(bp->b_flags & XBF_MAPPED)) {
error = _xfs_buf_map_pages(bp, flags);
@@ -563,19 +576,19 @@ xfs_buf_get(
}
}
- XFS_STATS_INC(xb_get);
-
/*
- * Always fill in the block number now, the mapped cases can do
- * their own overlay of this later.
+ * Now we have a workable buffer, fill in the block number so
+ * that we can do IO on it.
*/
- bp->b_bn = ioff;
- bp->b_count_desired = bp->b_buffer_length;
+ bp->b_bn = bno;
+found:
+ ASSERT(bp->b_flags & XBF_MAPPED);
+ XFS_STATS_INC(xb_get);
trace_xfs_buf_get(bp, flags, _RET_IP_);
return bp;
- no_buffer:
+no_buffer:
if (flags & (XBF_LOCK | XBF_TRYLOCK))
xfs_buf_unlock(bp);
xfs_buf_rele(bp);
@@ -604,15 +617,15 @@ _xfs_buf_read(
xfs_buf_t *
xfs_buf_read(
xfs_buftarg_t *target,
- xfs_off_t ioff,
- size_t isize,
+ xfs_off_t bno,
+ size_t num_blocks,
xfs_buf_flags_t flags)
{
xfs_buf_t *bp;
flags |= XBF_READ;
- bp = xfs_buf_get(target, ioff, isize, flags);
+ bp = xfs_buf_get(target, bno, num_blocks, flags);
if (bp) {
trace_xfs_buf_read(bp, flags, _RET_IP_);
@@ -647,13 +660,13 @@ xfs_buf_read(
void
xfs_buf_readahead(
xfs_buftarg_t *target,
- xfs_off_t ioff,
- size_t isize)
+ xfs_off_t bno,
+ size_t num_blocks)
{
if (bdi_read_congested(target->bt_bdi))
return;
- xfs_buf_read(target, ioff, isize,
+ xfs_buf_read(target, bno, num_blocks,
XBF_TRYLOCK|XBF_ASYNC|XBF_READ_AHEAD|XBF_DONT_BLOCK);
}
@@ -698,7 +711,7 @@ xfs_buf_get_empty(
bp = xfs_buf_allocate(0);
if (bp)
- _xfs_buf_initialize(bp, target, 0, len, 0);
+ _xfs_buf_initialize(bp, target, 0, BTOBB(len), 0);
return bp;
}
@@ -790,7 +803,7 @@ xfs_buf_get_uncached(
bp = xfs_buf_allocate(0);
if (unlikely(bp == NULL))
goto fail;
- _xfs_buf_initialize(bp, target, 0, len, 0);
+ _xfs_buf_initialize(bp, target, 0, BTOBB(len), 0);
error = _xfs_buf_get_pages(bp, page_count, 0);
if (error)
--
1.7.5.4
* [PATCH 2/4] xfs: reduce the number of log forces from tail pushing
2011-08-26 6:51 [PATCH 0/4] xfs: patch queue for 3.2 v2 Dave Chinner
2011-08-26 6:51 ` [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find Dave Chinner
@ 2011-08-26 6:51 ` Dave Chinner
2011-08-26 8:14 ` Christoph Hellwig
2011-08-26 6:51 ` [PATCH 3/4] xfs: re-arrange all the xfsbufd delwri queue code Dave Chinner
2011-08-26 6:51 ` [PATCH 4/4] xfs: convert xfsbufd to use a workqueue Dave Chinner
3 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2011-08-26 6:51 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
The AIL push code will issue a log force on every single push loop
that it exits and has encountered pinned items. It doesn't rescan
these pinned items until it revisits the AIL from the start. Hence
we only need to force the log once per walk from the start of the
AIL to the target LSN.
This results in numbers like this:
xs_push_ail_flush..... 1456
xs_log_force......... 1485
For an 8-way 50M inode create workload - almost all the log forces
are coming from the AIL pushing code.
Reduce the number of log forces by only forcing the log if the
previous walk found pinned buffers. This reduces the numbers to:
xs_push_ail_flush..... 665
xs_log_force......... 682
For the same test.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
fs/xfs/xfs_trans_ail.c | 27 +++++++++++++++------------
fs/xfs/xfs_trans_priv.h | 1 +
2 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index c15aa29..13188df 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -372,12 +372,24 @@ xfs_ail_worker(
xfs_lsn_t lsn;
xfs_lsn_t target;
long tout = 10;
- int flush_log = 0;
int stuck = 0;
int count = 0;
int push_xfsbufd = 0;
+ /*
+ * If last time we ran we encountered pinned items, force the log first
+ * and wait for it before pushing again.
+ */
spin_lock(&ailp->xa_lock);
+ if (ailp->xa_last_pushed_lsn == 0 && ailp->xa_log_flush &&
+ !list_empty(&ailp->xa_ail)) {
+ ailp->xa_log_flush = 0;
+ spin_unlock(&ailp->xa_lock);
+ XFS_STATS_INC(xs_push_ail_flush);
+ xfs_log_force(mp, XFS_LOG_SYNC);
+ spin_lock(&ailp->xa_lock);
+ }
+
target = ailp->xa_target;
lip = xfs_trans_ail_cursor_first(ailp, &cur, ailp->xa_last_pushed_lsn);
if (!lip || XFS_FORCED_SHUTDOWN(mp)) {
@@ -435,7 +447,7 @@ xfs_ail_worker(
case XFS_ITEM_PINNED:
XFS_STATS_INC(xs_push_ail_pinned);
stuck++;
- flush_log = 1;
+ ailp->xa_log_flush++;
break;
case XFS_ITEM_LOCKED:
@@ -480,16 +492,6 @@ xfs_ail_worker(
xfs_trans_ail_cursor_done(ailp, &cur);
spin_unlock(&ailp->xa_lock);
- if (flush_log) {
- /*
- * If something we need to push out was pinned, then
- * push out the log so it will become unpinned and
- * move forward in the AIL.
- */
- XFS_STATS_INC(xs_push_ail_flush);
- xfs_log_force(mp, 0);
- }
-
if (push_xfsbufd) {
/* we've got delayed write buffers to flush */
wake_up_process(mp->m_ddev_targp->bt_task);
@@ -500,6 +502,7 @@ out_done:
if (!count) {
/* We're past our target or empty, so idle */
ailp->xa_last_pushed_lsn = 0;
+ ailp->xa_log_flush = 0;
/*
* We clear the XFS_AIL_PUSHING_BIT first before checking
diff --git a/fs/xfs/xfs_trans_priv.h b/fs/xfs/xfs_trans_priv.h
index 212946b..0a6eec6 100644
--- a/fs/xfs/xfs_trans_priv.h
+++ b/fs/xfs/xfs_trans_priv.h
@@ -71,6 +71,7 @@ struct xfs_ail {
struct delayed_work xa_work;
xfs_lsn_t xa_last_pushed_lsn;
unsigned long xa_flags;
+ int xa_log_flush;
};
#define XFS_AIL_PUSHING_BIT 0
--
1.7.5.4
* [PATCH 3/4] xfs: re-arrange all the xfsbufd delwri queue code
2011-08-26 6:51 [PATCH 0/4] xfs: patch queue for 3.2 v2 Dave Chinner
2011-08-26 6:51 ` [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find Dave Chinner
2011-08-26 6:51 ` [PATCH 2/4] xfs: reduce the number of log forces from tail pushing Dave Chinner
@ 2011-08-26 6:51 ` Dave Chinner
2011-08-26 8:14 ` Christoph Hellwig
2011-08-26 6:51 ` [PATCH 4/4] xfs: convert xfsbufd to use a workqueue Dave Chinner
3 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2011-08-26 6:51 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
In preparation for converting the xfsbufd to a workqueue, re-arrange
all the delayed write queue manipulation code above the buftarg
sections. This allows us to avoid the need for forward declarations
of the new xfsbufd worker functions. This is pure code movement; no
changes to the code have been made.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Alex Elder <aelder@sgi.com>
---
fs/xfs/xfs_buf.c | 387 +++++++++++++++++++++++++++---------------------------
1 files changed, 194 insertions(+), 193 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 594cea5..415ab71 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1383,199 +1383,6 @@ xfs_buf_iomove(
}
/*
- * Handling of buffer targets (buftargs).
- */
-
-/*
- * Wait for any bufs with callbacks that have been submitted but have not yet
- * returned. These buffers will have an elevated hold count, so wait on those
- * while freeing all the buffers only held by the LRU.
- */
-void
-xfs_wait_buftarg(
- struct xfs_buftarg *btp)
-{
- struct xfs_buf *bp;
-
-restart:
- spin_lock(&btp->bt_lru_lock);
- while (!list_empty(&btp->bt_lru)) {
- bp = list_first_entry(&btp->bt_lru, struct xfs_buf, b_lru);
- if (atomic_read(&bp->b_hold) > 1) {
- spin_unlock(&btp->bt_lru_lock);
- delay(100);
- goto restart;
- }
- /*
- * clear the LRU reference count so the bufer doesn't get
- * ignored in xfs_buf_rele().
- */
- atomic_set(&bp->b_lru_ref, 0);
- spin_unlock(&btp->bt_lru_lock);
- xfs_buf_rele(bp);
- spin_lock(&btp->bt_lru_lock);
- }
- spin_unlock(&btp->bt_lru_lock);
-}
-
-int
-xfs_buftarg_shrink(
- struct shrinker *shrink,
- struct shrink_control *sc)
-{
- struct xfs_buftarg *btp = container_of(shrink,
- struct xfs_buftarg, bt_shrinker);
- struct xfs_buf *bp;
- int nr_to_scan = sc->nr_to_scan;
- LIST_HEAD(dispose);
-
- if (!nr_to_scan)
- return btp->bt_lru_nr;
-
- spin_lock(&btp->bt_lru_lock);
- while (!list_empty(&btp->bt_lru)) {
- if (nr_to_scan-- <= 0)
- break;
-
- bp = list_first_entry(&btp->bt_lru, struct xfs_buf, b_lru);
-
- /*
- * Decrement the b_lru_ref count unless the value is already
- * zero. If the value is already zero, we need to reclaim the
- * buffer, otherwise it gets another trip through the LRU.
- */
- if (!atomic_add_unless(&bp->b_lru_ref, -1, 0)) {
- list_move_tail(&bp->b_lru, &btp->bt_lru);
- continue;
- }
-
- /*
- * remove the buffer from the LRU now to avoid needing another
- * lock round trip inside xfs_buf_rele().
- */
- list_move(&bp->b_lru, &dispose);
- btp->bt_lru_nr--;
- }
- spin_unlock(&btp->bt_lru_lock);
-
- while (!list_empty(&dispose)) {
- bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
- list_del_init(&bp->b_lru);
- xfs_buf_rele(bp);
- }
-
- return btp->bt_lru_nr;
-}
-
-void
-xfs_free_buftarg(
- struct xfs_mount *mp,
- struct xfs_buftarg *btp)
-{
- unregister_shrinker(&btp->bt_shrinker);
-
- xfs_flush_buftarg(btp, 1);
- if (mp->m_flags & XFS_MOUNT_BARRIER)
- xfs_blkdev_issue_flush(btp);
-
- kthread_stop(btp->bt_task);
- kmem_free(btp);
-}
-
-STATIC int
-xfs_setsize_buftarg_flags(
- xfs_buftarg_t *btp,
- unsigned int blocksize,
- unsigned int sectorsize,
- int verbose)
-{
- btp->bt_bsize = blocksize;
- btp->bt_sshift = ffs(sectorsize) - 1;
- btp->bt_smask = sectorsize - 1;
-
- if (set_blocksize(btp->bt_bdev, sectorsize)) {
- xfs_warn(btp->bt_mount,
- "Cannot set_blocksize to %u on device %s\n",
- sectorsize, xfs_buf_target_name(btp));
- return EINVAL;
- }
-
- return 0;
-}
-
-/*
- * When allocating the initial buffer target we have not yet
- * read in the superblock, so don't know what sized sectors
- * are being used is at this early stage. Play safe.
- */
-STATIC int
-xfs_setsize_buftarg_early(
- xfs_buftarg_t *btp,
- struct block_device *bdev)
-{
- return xfs_setsize_buftarg_flags(btp,
- PAGE_SIZE, bdev_logical_block_size(bdev), 0);
-}
-
-int
-xfs_setsize_buftarg(
- xfs_buftarg_t *btp,
- unsigned int blocksize,
- unsigned int sectorsize)
-{
- return xfs_setsize_buftarg_flags(btp, blocksize, sectorsize, 1);
-}
-
-STATIC int
-xfs_alloc_delwrite_queue(
- xfs_buftarg_t *btp,
- const char *fsname)
-{
- INIT_LIST_HEAD(&btp->bt_delwrite_queue);
- spin_lock_init(&btp->bt_delwrite_lock);
- btp->bt_flags = 0;
- btp->bt_task = kthread_run(xfsbufd, btp, "xfsbufd/%s", fsname);
- if (IS_ERR(btp->bt_task))
- return PTR_ERR(btp->bt_task);
- return 0;
-}
-
-xfs_buftarg_t *
-xfs_alloc_buftarg(
- struct xfs_mount *mp,
- struct block_device *bdev,
- int external,
- const char *fsname)
-{
- xfs_buftarg_t *btp;
-
- btp = kmem_zalloc(sizeof(*btp), KM_SLEEP);
-
- btp->bt_mount = mp;
- btp->bt_dev = bdev->bd_dev;
- btp->bt_bdev = bdev;
- btp->bt_bdi = blk_get_backing_dev_info(bdev);
- if (!btp->bt_bdi)
- goto error;
-
- INIT_LIST_HEAD(&btp->bt_lru);
- spin_lock_init(&btp->bt_lru_lock);
- if (xfs_setsize_buftarg_early(btp, bdev))
- goto error;
- if (xfs_alloc_delwrite_queue(btp, fsname))
- goto error;
- btp->bt_shrinker.shrink = xfs_buftarg_shrink;
- btp->bt_shrinker.seeks = DEFAULT_SEEKS;
- register_shrinker(&btp->bt_shrinker);
- return btp;
-
-error:
- kmem_free(btp);
- return NULL;
-}
-
-
-/*
* Delayed write buffer handling
*/
STATIC void
@@ -1781,6 +1588,10 @@ xfsbufd(
}
/*
+ * Handling of buffer targets (buftargs).
+ */
+
+/*
* Go through all incore buffers, and release buffers if they belong to
* the given device. This is used in filesystem error handling to
* preserve the consistency of its metadata.
@@ -1837,6 +1648,196 @@ xfs_flush_buftarg(
return pincount;
}
+/*
+ * Wait for any bufs with callbacks that have been submitted but have not yet
+ * returned. These buffers will have an elevated hold count, so wait on those
+ * while freeing all the buffers only held by the LRU.
+ */
+void
+xfs_wait_buftarg(
+ struct xfs_buftarg *btp)
+{
+ struct xfs_buf *bp;
+
+restart:
+ spin_lock(&btp->bt_lru_lock);
+ while (!list_empty(&btp->bt_lru)) {
+ bp = list_first_entry(&btp->bt_lru, struct xfs_buf, b_lru);
+ if (atomic_read(&bp->b_hold) > 1) {
+ spin_unlock(&btp->bt_lru_lock);
+ delay(100);
+ goto restart;
+ }
+ /*
+ * clear the LRU reference count so the bufer doesn't get
+ * ignored in xfs_buf_rele().
+ */
+ atomic_set(&bp->b_lru_ref, 0);
+ spin_unlock(&btp->bt_lru_lock);
+ xfs_buf_rele(bp);
+ spin_lock(&btp->bt_lru_lock);
+ }
+ spin_unlock(&btp->bt_lru_lock);
+}
+
+int
+xfs_buftarg_shrink(
+ struct shrinker *shrink,
+ struct shrink_control *sc)
+{
+ struct xfs_buftarg *btp = container_of(shrink,
+ struct xfs_buftarg, bt_shrinker);
+ struct xfs_buf *bp;
+ int nr_to_scan = sc->nr_to_scan;
+ LIST_HEAD(dispose);
+
+ if (!nr_to_scan)
+ return btp->bt_lru_nr;
+
+ spin_lock(&btp->bt_lru_lock);
+ while (!list_empty(&btp->bt_lru)) {
+ if (nr_to_scan-- <= 0)
+ break;
+
+ bp = list_first_entry(&btp->bt_lru, struct xfs_buf, b_lru);
+
+ /*
+ * Decrement the b_lru_ref count unless the value is already
+ * zero. If the value is already zero, we need to reclaim the
+ * buffer, otherwise it gets another trip through the LRU.
+ */
+ if (!atomic_add_unless(&bp->b_lru_ref, -1, 0)) {
+ list_move_tail(&bp->b_lru, &btp->bt_lru);
+ continue;
+ }
+
+ /*
+ * remove the buffer from the LRU now to avoid needing another
+ * lock round trip inside xfs_buf_rele().
+ */
+ list_move(&bp->b_lru, &dispose);
+ btp->bt_lru_nr--;
+ }
+ spin_unlock(&btp->bt_lru_lock);
+
+ while (!list_empty(&dispose)) {
+ bp = list_first_entry(&dispose, struct xfs_buf, b_lru);
+ list_del_init(&bp->b_lru);
+ xfs_buf_rele(bp);
+ }
+
+ return btp->bt_lru_nr;
+}
+
+void
+xfs_free_buftarg(
+ struct xfs_mount *mp,
+ struct xfs_buftarg *btp)
+{
+ unregister_shrinker(&btp->bt_shrinker);
+
+ xfs_flush_buftarg(btp, 1);
+ if (mp->m_flags & XFS_MOUNT_BARRIER)
+ xfs_blkdev_issue_flush(btp);
+
+ kthread_stop(btp->bt_task);
+ kmem_free(btp);
+}
+
+STATIC int
+xfs_setsize_buftarg_flags(
+ xfs_buftarg_t *btp,
+ unsigned int blocksize,
+ unsigned int sectorsize,
+ int verbose)
+{
+ btp->bt_bsize = blocksize;
+ btp->bt_sshift = ffs(sectorsize) - 1;
+ btp->bt_smask = sectorsize - 1;
+
+ if (set_blocksize(btp->bt_bdev, sectorsize)) {
+ xfs_warn(btp->bt_mount,
+ "Cannot set_blocksize to %u on device %s\n",
+ sectorsize, xfs_buf_target_name(btp));
+ return EINVAL;
+ }
+
+ return 0;
+}
+
+/*
+ * When allocating the initial buffer target we have not yet
+ * read in the superblock, so don't know what sized sectors
+ * are being used is at this early stage. Play safe.
+ */
+STATIC int
+xfs_setsize_buftarg_early(
+ xfs_buftarg_t *btp,
+ struct block_device *bdev)
+{
+ return xfs_setsize_buftarg_flags(btp,
+ PAGE_SIZE, bdev_logical_block_size(bdev), 0);
+}
+
+int
+xfs_setsize_buftarg(
+ xfs_buftarg_t *btp,
+ unsigned int blocksize,
+ unsigned int sectorsize)
+{
+ return xfs_setsize_buftarg_flags(btp, blocksize, sectorsize, 1);
+}
+
+STATIC int
+xfs_alloc_delwrite_queue(
+ xfs_buftarg_t *btp,
+ const char *fsname)
+{
+ INIT_LIST_HEAD(&btp->bt_delwrite_queue);
+ spin_lock_init(&btp->bt_delwrite_lock);
+ btp->bt_flags = 0;
+ btp->bt_task = kthread_run(xfsbufd, btp, "xfsbufd/%s", fsname);
+ if (IS_ERR(btp->bt_task))
+ return PTR_ERR(btp->bt_task);
+ return 0;
+}
+
+xfs_buftarg_t *
+xfs_alloc_buftarg(
+ struct xfs_mount *mp,
+ struct block_device *bdev,
+ int external,
+ const char *fsname)
+{
+ xfs_buftarg_t *btp;
+
+ btp = kmem_zalloc(sizeof(*btp), KM_SLEEP);
+
+ btp->bt_mount = mp;
+ btp->bt_dev = bdev->bd_dev;
+ btp->bt_bdev = bdev;
+ btp->bt_bdi = blk_get_backing_dev_info(bdev);
+ if (!btp->bt_bdi)
+ goto error;
+
+ INIT_LIST_HEAD(&btp->bt_lru);
+ spin_lock_init(&btp->bt_lru_lock);
+ if (xfs_setsize_buftarg_early(btp, bdev))
+ goto error;
+ if (xfs_alloc_delwrite_queue(btp, fsname))
+ goto error;
+ btp->bt_shrinker.shrink = xfs_buftarg_shrink;
+ btp->bt_shrinker.seeks = DEFAULT_SEEKS;
+ register_shrinker(&btp->bt_shrinker);
+ return btp;
+
+error:
+ kmem_free(btp);
+ return NULL;
+}
+
+
+
int __init
xfs_buf_init(void)
{
--
1.7.5.4
* [PATCH 4/4] xfs: convert xfsbufd to use a workqueue
2011-08-26 6:51 [PATCH 0/4] xfs: patch queue for 3.2 v2 Dave Chinner
` (2 preceding siblings ...)
2011-08-26 6:51 ` [PATCH 3/4] xfs: re-arrange all the xfsbufd delwri queue code Dave Chinner
@ 2011-08-26 6:51 ` Dave Chinner
2011-08-26 8:25 ` Christoph Hellwig
3 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2011-08-26 6:51 UTC (permalink / raw)
To: xfs
From: Dave Chinner <dchinner@redhat.com>
There is no reason we need a thread per filesystem to do the
flushing of the delayed write buffer queue. This can be easily
handled by a global concurrency managed workqueue.
Convert the delayed write buffer handling to use workqueues and
workqueue flushes to implement buffer writeback by embedding a
delayed work structure into the struct xfs_buftarg and using that to
control flushing. This greatly simplifies the process of flushing
and also removes a bunch of duplicated code between buftarg flushing
and delwri buffer writeback.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
---
fs/xfs/xfs_buf.c | 172 +++++++++++++++++++++---------------------------
fs/xfs/xfs_buf.h | 5 +-
fs/xfs/xfs_dquot.c | 1 -
fs/xfs/xfs_trans_ail.c | 2 +-
4 files changed, 78 insertions(+), 102 deletions(-)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 415ab71..9aa4e60 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -42,9 +42,9 @@
#include "xfs_trace.h"
static kmem_zone_t *xfs_buf_zone;
-STATIC int xfsbufd(void *);
-STATIC void xfs_buf_delwri_queue(xfs_buf_t *, int);
+STATIC void xfs_buf_delwri_queue(xfs_buf_t *bp, int unlock);
+static struct workqueue_struct *xfs_buf_wq;
static struct workqueue_struct *xfslogd_workqueue;
struct workqueue_struct *xfsdatad_workqueue;
struct workqueue_struct *xfsconvertd_workqueue;
@@ -1407,8 +1407,9 @@ xfs_buf_delwri_queue(
}
if (list_empty(dwq)) {
- /* start xfsbufd as it is about to have something to do */
- wake_up_process(bp->b_target->bt_task);
+ /* queue a delayed flush as we are about to queue a buffer */
+ queue_delayed_work(xfs_buf_wq, &bp->b_target->bt_delwrite_work,
+ xfs_buf_timer_centisecs * msecs_to_jiffies(10));
}
bp->b_flags |= _XBF_DELWRI_Q;
@@ -1486,15 +1487,14 @@ STATIC int
xfs_buf_delwri_split(
xfs_buftarg_t *target,
struct list_head *list,
- unsigned long age)
+ unsigned long age,
+ int force)
{
xfs_buf_t *bp, *n;
struct list_head *dwq = &target->bt_delwrite_queue;
spinlock_t *dwlk = &target->bt_delwrite_lock;
int skipped = 0;
- int force;
- force = test_and_clear_bit(XBT_FORCE_FLUSH, &target->bt_flags);
INIT_LIST_HEAD(list);
spin_lock(dwlk);
list_for_each_entry_safe(bp, n, dwq, b_list) {
@@ -1543,90 +1543,33 @@ xfs_buf_cmp(
return 0;
}
-STATIC int
-xfsbufd(
- void *data)
-{
- xfs_buftarg_t *target = (xfs_buftarg_t *)data;
-
- current->flags |= PF_MEMALLOC;
-
- set_freezable();
-
- do {
- long age = xfs_buf_age_centisecs * msecs_to_jiffies(10);
- long tout = xfs_buf_timer_centisecs * msecs_to_jiffies(10);
- struct list_head tmp;
- struct blk_plug plug;
-
- if (unlikely(freezing(current))) {
- set_bit(XBT_FORCE_SLEEP, &target->bt_flags);
- refrigerator();
- } else {
- clear_bit(XBT_FORCE_SLEEP, &target->bt_flags);
- }
-
- /* sleep for a long time if there is nothing to do. */
- if (list_empty(&target->bt_delwrite_queue))
- tout = MAX_SCHEDULE_TIMEOUT;
- schedule_timeout_interruptible(tout);
-
- xfs_buf_delwri_split(target, &tmp, age);
- list_sort(NULL, &tmp, xfs_buf_cmp);
-
- blk_start_plug(&plug);
- while (!list_empty(&tmp)) {
- struct xfs_buf *bp;
- bp = list_first_entry(&tmp, struct xfs_buf, b_list);
- list_del_init(&bp->b_list);
- xfs_bdstrat_cb(bp);
- }
- blk_finish_plug(&plug);
- } while (!kthread_should_stop());
-
- return 0;
-}
-
/*
- * Handling of buffer targets (buftargs).
+ * If we are doing a forced flush, then we need to wait for the IO that we
+ * issue to complete.
*/
-
-/*
- * Go through all incore buffers, and release buffers if they belong to
- * the given device. This is used in filesystem error handling to
- * preserve the consistency of its metadata.
- */
-int
-xfs_flush_buftarg(
- xfs_buftarg_t *target,
- int wait)
+static void
+xfs_buf_delwri_work(
+ struct work_struct *work)
{
- xfs_buf_t *bp;
- int pincount = 0;
+ struct xfs_buftarg *btp = container_of(to_delayed_work(work),
+ struct xfs_buftarg, bt_delwrite_work);
+ struct xfs_buf *bp;
+ struct blk_plug plug;
LIST_HEAD(tmp_list);
LIST_HEAD(wait_list);
- struct blk_plug plug;
-
- xfs_buf_runall_queues(xfsconvertd_workqueue);
- xfs_buf_runall_queues(xfsdatad_workqueue);
- xfs_buf_runall_queues(xfslogd_workqueue);
+ long age = xfs_buf_age_centisecs * msecs_to_jiffies(10);
+ int force = 0;
- set_bit(XBT_FORCE_FLUSH, &target->bt_flags);
- pincount = xfs_buf_delwri_split(target, &tmp_list, 0);
+ force = test_and_clear_bit(XBT_FORCE_FLUSH, &btp->bt_flags);
- /*
- * Dropped the delayed write list lock, now walk the temporary list.
- * All I/O is issued async and then if we need to wait for completion
- * we do that after issuing all the IO.
- */
+ xfs_buf_delwri_split(btp, &tmp_list, age, force);
list_sort(NULL, &tmp_list, xfs_buf_cmp);
blk_start_plug(&plug);
while (!list_empty(&tmp_list)) {
bp = list_first_entry(&tmp_list, struct xfs_buf, b_list);
- ASSERT(target == bp->b_target);
list_del_init(&bp->b_list);
- if (wait) {
+ if (force) {
bp->b_flags &= ~XBF_ASYNC;
list_add(&bp->b_list, &wait_list);
}
@@ -1634,7 +1577,7 @@ xfs_flush_buftarg(
}
blk_finish_plug(&plug);
- if (wait) {
+ if (force) {
/* Wait for IO to complete. */
while (!list_empty(&wait_list)) {
bp = list_first_entry(&wait_list, struct xfs_buf, b_list);
@@ -1645,7 +1588,48 @@ xfs_flush_buftarg(
}
}
- return pincount;
+ if (list_empty(&btp->bt_delwrite_queue))
+ return;
+
+ queue_delayed_work(xfs_buf_wq, &btp->bt_delwrite_work,
+ xfs_buf_timer_centisecs * msecs_to_jiffies(10));
+}
+
+/*
+ * Handling of buffer targets (buftargs).
+ */
+
+/*
+ * Flush all the queued buffer work, then flush any remaining dirty buffers
+ * and wait for them to complete. If there are buffers remaining on the delwri
+ * queue, then they were pinned so couldn't be flushed. Return a value of 1 to
+ * indicate that there were pinned buffers and the caller needs to retry the
+ * flush.
+ */
+int
+xfs_flush_buftarg(
+ xfs_buftarg_t *target,
+ int wait)
+{
+ xfs_buf_runall_queues(xfsconvertd_workqueue);
+ xfs_buf_runall_queues(xfsdatad_workqueue);
+ xfs_buf_runall_queues(xfslogd_workqueue);
+
+ if (wait) {
+ /*
+ * Ensure we have work queued up after setting the force flag.
+ * If work is already in progress then the wq flush below won't
+ * cause new work to start and hence the force flag will not be
+ * seen by the flush and the flush will be incomplete.
+ */
+ set_bit(XBT_FORCE_FLUSH, &target->bt_flags);
+ queue_delayed_work(xfs_buf_wq, &target->bt_delwrite_work, 0);
+ }
+ flush_delayed_work_sync(&target->bt_delwrite_work);
+
+ if (!list_empty(&target->bt_delwrite_queue))
+ return 1;
+ return 0;
}
/*
@@ -1740,7 +1724,6 @@ xfs_free_buftarg(
if (mp->m_flags & XFS_MOUNT_BARRIER)
xfs_blkdev_issue_flush(btp);
- kthread_stop(btp->bt_task);
kmem_free(btp);
}
@@ -1788,20 +1771,6 @@ xfs_setsize_buftarg(
return xfs_setsize_buftarg_flags(btp, blocksize, sectorsize, 1);
}
-STATIC int
-xfs_alloc_delwrite_queue(
- xfs_buftarg_t *btp,
- const char *fsname)
-{
- INIT_LIST_HEAD(&btp->bt_delwrite_queue);
- spin_lock_init(&btp->bt_delwrite_lock);
- btp->bt_flags = 0;
- btp->bt_task = kthread_run(xfsbufd, btp, "xfsbufd/%s", fsname);
- if (IS_ERR(btp->bt_task))
- return PTR_ERR(btp->bt_task);
- return 0;
-}
-
xfs_buftarg_t *
xfs_alloc_buftarg(
struct xfs_mount *mp,
@@ -1824,8 +1793,11 @@ xfs_alloc_buftarg(
spin_lock_init(&btp->bt_lru_lock);
if (xfs_setsize_buftarg_early(btp, bdev))
goto error;
- if (xfs_alloc_delwrite_queue(btp, fsname))
- goto error;
+
+ INIT_LIST_HEAD(&btp->bt_delwrite_queue);
+ spin_lock_init(&btp->bt_delwrite_lock);
+ INIT_DELAYED_WORK(&btp->bt_delwrite_work, xfs_buf_delwri_work);
+
btp->bt_shrinker.shrink = xfs_buftarg_shrink;
btp->bt_shrinker.seeks = DEFAULT_SEEKS;
register_shrinker(&btp->bt_shrinker);
@@ -1860,8 +1832,13 @@ xfs_buf_init(void)
if (!xfsconvertd_workqueue)
goto out_destroy_xfsdatad_workqueue;
+ xfs_buf_wq = alloc_workqueue("xfsbufd", WQ_MEM_RECLAIM, 8);
+ if (!xfs_buf_wq)
+ goto out_destroy_xfsconvertd_wq;
return 0;
+ out_destroy_xfsconvertd_wq:
+ destroy_workqueue(xfsconvertd_workqueue);
out_destroy_xfsdatad_workqueue:
destroy_workqueue(xfsdatad_workqueue);
out_destroy_xfslogd_workqueue:
@@ -1875,6 +1852,7 @@ xfs_buf_init(void)
void
xfs_buf_terminate(void)
{
+ destroy_workqueue(xfs_buf_wq);
destroy_workqueue(xfsconvertd_workqueue);
destroy_workqueue(xfsdatad_workqueue);
destroy_workqueue(xfslogd_workqueue);
diff --git a/fs/xfs/xfs_buf.h b/fs/xfs/xfs_buf.h
index 620972b..c1aabfd 100644
--- a/fs/xfs/xfs_buf.h
+++ b/fs/xfs/xfs_buf.h
@@ -90,8 +90,7 @@ typedef unsigned int xfs_buf_flags_t;
{ _XBF_DELWRI_Q, "DELWRI_Q" }
typedef enum {
- XBT_FORCE_SLEEP = 0,
- XBT_FORCE_FLUSH = 1,
+ XBT_FORCE_FLUSH = 0,
} xfs_buftarg_flags_t;
typedef struct xfs_buftarg {
@@ -104,7 +103,7 @@ typedef struct xfs_buftarg {
size_t bt_smask;
/* per device delwri queue */
- struct task_struct *bt_task;
+ struct delayed_work bt_delwrite_work;
struct list_head bt_delwrite_queue;
spinlock_t bt_delwrite_lock;
unsigned long bt_flags;
diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index db62959..1fb9d93 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -1446,7 +1446,6 @@ xfs_qm_dqflock_pushbuf_wait(
if (xfs_buf_ispinned(bp))
xfs_log_force(mp, 0);
xfs_buf_delwri_promote(bp);
- wake_up_process(bp->b_target->bt_task);
}
xfs_buf_relse(bp);
out_lock:
diff --git a/fs/xfs/xfs_trans_ail.c b/fs/xfs/xfs_trans_ail.c
index 13188df..a3d1784 100644
--- a/fs/xfs/xfs_trans_ail.c
+++ b/fs/xfs/xfs_trans_ail.c
@@ -494,7 +494,7 @@ xfs_ail_worker(
if (push_xfsbufd) {
/* we've got delayed write buffers to flush */
- wake_up_process(mp->m_ddev_targp->bt_task);
+ flush_delayed_work(&mp->m_ddev_targp->bt_delwrite_work);
}
/* assume we have more work to do in a short while */
--
1.7.5.4
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* Re: [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find
2011-08-26 6:51 ` [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find Dave Chinner
@ 2011-08-26 8:11 ` Christoph Hellwig
2011-08-26 14:19 ` Alex Elder
2011-09-21 6:44 ` Dave Chinner
0 siblings, 2 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-08-26 8:11 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index c57836d..594cea5 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -171,10 +171,16 @@ STATIC void
> _xfs_buf_initialize(
> xfs_buf_t *bp,
> xfs_buftarg_t *target,
> - xfs_off_t range_base,
> - size_t range_length,
> + xfs_off_t bno,
> + size_t num_blocks,
> xfs_buf_flags_t flags)
> {
> + xfs_off_t range_base;
> + size_t range_length;
> +
> + range_base = BBTOB(bno);
> + range_length = BBTOB(num_blocks);
What is the point of changing the mostly unrelated _xfs_buf_initialize
prototype in this patch?
I think it (and the other renaming changes related to it) are fine,
but should be a separate patch. And once you touch _xfs_buf_initialize
after the core of this patch, please merge it with xfs_buf_allocate into
a new xfs_buf_alloc that does the full allocation + initialization and
can also replace xfs_buf_get_empty.
> + bp = _xfs_buf_find(target, bno, num_blocks, flags, new_bp);
> + if (!bp) {
> + xfs_buf_deallocate(new_bp);
> + return NULL;
> + }
> +
> if (bp == new_bp) {
> error = xfs_buf_allocate_memory(bp, flags);
> if (error)
> goto no_buffer;
> + } else
> xfs_buf_deallocate(new_bp);
I'd recommend moving the call to xfs_buf_allocate_memory into
_xfs_buf_find so that it returns a fully allocated buffer. In fact I'd
also move the xfs_buf_deallocate(new_bp) into the found side of
_xfs_buf_find, avoiding any conditionals in xfs_buf_get.
>
> - XFS_STATS_INC(xb_get);
> -
> /*
> - * Always fill in the block number now, the mapped cases can do
> - * their own overlay of this later.
> + * Now we have a workable buffer, fill in the block number so
> + * that we can do IO on it.
> */
> - bp->b_bn = ioff;
> - bp->b_count_desired = bp->b_buffer_length;
> + bp->b_bn = bno;
Note that we only need this if we did not find an existing buffer. It's
not strictly related to the patch, but given that you stop assigning
b_count_desired and redo this whole area it might be worth shifting it
into the if (bp == new_bp) conditional area.
>
> +found:
> + ASSERT(bp->b_flags & XBF_MAPPED);
This doesn't look right to me. Various buffers like inode or remote attrs
are unmapped, and I can't see any reason why we would assert that we are not
allowed to find them here.
Thinking about it more I'm also not sure skipping the code to map
buffers on a straight cache hit is a good idea - there's nothing
inherent to requiring a given buffer to be mapped for all callers.
* Re: [PATCH 2/4] xfs: reduce the number of log forces from tail pushing
2011-08-26 6:51 ` [PATCH 2/4] xfs: reduce the number of log forces from tail pushing Dave Chinner
@ 2011-08-26 8:14 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-08-26 8:14 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
Looks fine. xa_log_flush is updated unlocked, but given that we always
just have a single worker per ail running at the same time that should
not be a problem.
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 3/4] xfs: re-arrange all the xfsbufd delwri queue code
2011-08-26 6:51 ` [PATCH 3/4] xfs: re-arrange all the xfsbufd delwri queue code Dave Chinner
@ 2011-08-26 8:14 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-08-26 8:14 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Fri, Aug 26, 2011 at 04:51:36PM +1000, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> In preparation for converting the xfsbufd to a workqueue, re-arrange
> all the delayed write queue manipulation code above the buftarg
> sections. This allows us to avoid the need for forward declarations
> of the new xfsbufd worker functions. This is pure code movement; no
> changes to the code have been made.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Reviewed-by: Alex Elder <aelder@sgi.com>
Looks fine, although it clashed with my reviewed queue of xfs_buf
patches.
Reviewed-by: Christoph Hellwig <hch@lst.de>
* Re: [PATCH 4/4] xfs: convert xfsbufd to use a workqueue
2011-08-26 6:51 ` [PATCH 4/4] xfs: convert xfsbufd to use a workqueue Dave Chinner
@ 2011-08-26 8:25 ` Christoph Hellwig
2011-09-21 6:25 ` Dave Chinner
0 siblings, 1 reply; 14+ messages in thread
From: Christoph Hellwig @ 2011-08-26 8:25 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
> -STATIC void xfs_buf_delwri_queue(xfs_buf_t *, int);
> +STATIC void xfs_buf_delwri_queue(xfs_buf_t *bp, int unlock);
Spuriously changing this prototype just makes merging with other
pending changes in this area harder :P
> /*
> + * If we are doing a forced flush, then we need to wait for the IO that we
> + * issue to complete.
> */
> +static void
> +xfs_buf_delwri_work(
> + struct work_struct *work)
> {
> + struct xfs_buftarg *btp = container_of(to_delayed_work(work),
> + struct xfs_buftarg, bt_delwrite_work);
> + struct xfs_buf *bp;
> + struct blk_plug plug;
> LIST_HEAD(tmp_list);
> LIST_HEAD(wait_list);
> + long age = xfs_buf_age_centisecs * msecs_to_jiffies(10);
> + int force = 0;
>
> + force = test_and_clear_bit(XBT_FORCE_FLUSH, &btp->bt_flags);
>
> + xfs_buf_delwri_split(btp, &tmp_list, age, force);
> list_sort(NULL, &tmp_list, xfs_buf_cmp);
>
> blk_start_plug(&plug);
> while (!list_empty(&tmp_list)) {
> bp = list_first_entry(&tmp_list, struct xfs_buf, b_list);
> - ASSERT(target == bp->b_target);
> list_del_init(&bp->b_list);
> - if (wait) {
> + if (force) {
> bp->b_flags &= ~XBF_ASYNC;
> list_add(&bp->b_list, &wait_list);
> }
> @@ -1634,7 +1577,7 @@ xfs_flush_buftarg(
> }
> blk_finish_plug(&plug);
>
> + if (force) {
> /* Wait for IO to complete. */
> while (!list_empty(&wait_list)) {
> bp = list_first_entry(&wait_list, struct xfs_buf, b_list);
> @@ -1645,7 +1588,48 @@ xfs_flush_buftarg(
> }
> }
>
> +/*
> + * Handling of buffer targets (buftargs).
> + */
I think we can just kill this comment.
> +/*
> + * Flush all the queued buffer work, then flush any remaining dirty buffers
> + * and wait for them to complete. If there are buffers remaining on the delwri
> + * queue, then they were pinned so couldn't be flushed. Return a value of 1 to
> + * indicate that there were pinned buffers and the caller needs to retry the
> + * flush.
> + */
Not directly related to your patch, but only one caller ever checks the
return value and retries. This means e.g. during sync or umount we
don't bother with trying to push pinned buffers.
> +int
> +xfs_flush_buftarg(
> + xfs_buftarg_t *target,
Please use the non-typedef version of new or largely changed code.
> index 13188df..a3d1784 100644
> --- a/fs/xfs/xfs_trans_ail.c
> +++ b/fs/xfs/xfs_trans_ail.c
> @@ -494,7 +494,7 @@ xfs_ail_worker(
>
> if (push_xfsbufd) {
> /* we've got delayed write buffers to flush */
> - wake_up_process(mp->m_ddev_targp->bt_task);
> + flush_delayed_work(&mp->m_ddev_targp->bt_delwrite_work);
This is a huge change in behaviour. wake_up_process just kicks the
thread to wakeup from sleep as soon as the schedule selects it, while
flush_delayed_work does not only queue a pending delayed work, but also
waits for it to finish.
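[Editor's note: to make the semantic difference between the two calls concrete, here is a minimal userspace sketch using plain pthreads, not the kernel workqueue API. A bare "kick" (modelling wake_up_process()) returns immediately and merely hopes the worker runs soon; a "flush" (modelling flush_delayed_work()) guarantees the queued work is dispatched before returning. All names and the queue structure are illustrative.]

```c
#include <assert.h>
#include <pthread.h>

/* Userspace analogy of a per-target delwri queue and its flusher. */
struct buftarg {
	pthread_mutex_t lock;
	pthread_cond_t kick;	/* worker wakeup */
	pthread_cond_t done;	/* flush completion */
	int queued;		/* buffers queued for delayed write */
	int flushed;		/* buffers the worker has dispatched */
	int stop;
};

static void *delwri_worker(void *arg)
{
	struct buftarg *bt = arg;

	pthread_mutex_lock(&bt->lock);
	while (!bt->stop) {
		while (bt->flushed < bt->queued)
			bt->flushed++;	/* "dispatch" one buffer */
		pthread_cond_broadcast(&bt->done);
		pthread_cond_wait(&bt->kick, &bt->lock);
	}
	pthread_mutex_unlock(&bt->lock);
	return NULL;
}

/* wake_up_process() analogue: poke the worker, return immediately. */
static void kick_worker(struct buftarg *bt)
{
	pthread_mutex_lock(&bt->lock);
	pthread_cond_signal(&bt->kick);
	pthread_mutex_unlock(&bt->lock);
}

/* flush_delayed_work() analogue: poke the worker, then wait until
 * everything queued so far has actually been dispatched. */
static void flush_worker(struct buftarg *bt)
{
	pthread_mutex_lock(&bt->lock);
	pthread_cond_signal(&bt->kick);
	while (bt->flushed < bt->queued)
		pthread_cond_wait(&bt->done, &bt->lock);
	pthread_mutex_unlock(&bt->lock);
}

/* Queue n buffers, flush, return how many were dispatched. */
static int run_demo(int n)
{
	struct buftarg bt = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.kick = PTHREAD_COND_INITIALIZER,
		.done = PTHREAD_COND_INITIALIZER,
	};
	pthread_t tid;
	int ret;

	pthread_create(&tid, NULL, delwri_worker, &bt);

	pthread_mutex_lock(&bt.lock);
	bt.queued += n;
	pthread_mutex_unlock(&bt.lock);

	kick_worker(&bt);	/* worker may or may not have run yet */
	flush_worker(&bt);	/* guaranteed: all n dispatched on return */

	pthread_mutex_lock(&bt.lock);
	ret = bt.flushed;
	bt.stop = 1;
	pthread_cond_signal(&bt.kick);
	pthread_mutex_unlock(&bt.lock);
	pthread_join(tid, NULL);
	return ret;
}
```

After flush_worker() returns, the caller can rely on every previously queued buffer having been dispatched, which is exactly the throttling property discussed below for the AIL push; after kick_worker() alone, no such guarantee exists.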
* Re: [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find
2011-08-26 8:11 ` Christoph Hellwig
@ 2011-08-26 14:19 ` Alex Elder
2011-09-21 6:44 ` Dave Chinner
1 sibling, 0 replies; 14+ messages in thread
From: Alex Elder @ 2011-08-26 14:19 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Fri, 2011-08-26 at 04:11 -0400, Christoph Hellwig wrote:
. . .
> >
> > +found:
> > + ASSERT(bp->b_flags & XBF_MAPPED);
>
> This doesn't look right to me. Various buffers like inode or remote attrs
> are unmapped, and I can't see any reason why we would assert that we are not
> allowed to find them here.
>
> Thinking about it more I'm also not sure skipping the code to map
> buffers on a straight cache hit is a good idea - there's nothing
> inherent to requiring a given buffer to be mapped for all callers.
I actually tripped this assert last night the first time I
tried running it.
-Alex
* Re: [PATCH 4/4] xfs: convert xfsbufd to use a workqueue
2011-08-26 8:25 ` Christoph Hellwig
@ 2011-09-21 6:25 ` Dave Chinner
2011-09-21 11:26 ` Christoph Hellwig
0 siblings, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2011-09-21 6:25 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Fri, Aug 26, 2011 at 04:25:15AM -0400, Christoph Hellwig wrote:
> > index 13188df..a3d1784 100644
> > --- a/fs/xfs/xfs_trans_ail.c
> > +++ b/fs/xfs/xfs_trans_ail.c
> > @@ -494,7 +494,7 @@ xfs_ail_worker(
> >
> > if (push_xfsbufd) {
> > /* we've got delayed write buffers to flush */
> > - wake_up_process(mp->m_ddev_targp->bt_task);
> > + flush_delayed_work(&mp->m_ddev_targp->bt_delwrite_work);
>
> This is a huge change in behaviour. wake_up_process just kicks the
> thread to wakeup from sleep as soon as the schedule selects it, while
> flush_delayed_work does not only queue a pending delayed work, but also
> waits for it to finish.
Which is precisely what I want here - to wait for all the delwri
buffers that were promoted to be submitted before continuing
onwards. This makes the scanning algorithm self throttling -
instead of simply pushing the buffers to the delwri queue and
kicking a background thread and hoping it can flush buffers faster
than we can promote them from the AIL, it explicitly pushes the
delwri buffers before the next round of AIL scanning. This ensures we
start timely IO on the buffers and don't simply continue to scan the
AIL while we wait for the background thread to send them off to
disk and complete.
IOWs, instead of:
    AIL                             bufd
    promote
    ....
    promote
    wakeup
    short sleep
                                    woken
                                    sort
                                    dispatch
                                    ....
<sometime later, maybe before, during or after xfsbufd has run>
    promote
    .....
    promote
    wakeup
    short sleep
                                    woken
                                    sort
                                    dispatch
                                    ....
Where we *hope* the short sleep in the AIL processing is long enough
to avoid repeated scanning of the AIL while the queued IO is
dispatched, we end up with:
    AIL                             bufd
    promote
    ....
    promote
    flush_work
                                    sort
                                    dispatch
    short sleep
    promote
    ....
    promote
    flush_work
                                    sort
                                    dispatch
    short sleep
Which is much more controlled and means that the short sleep that
the AIL processing does actually gives time for IO completions
to occur before continuing. It means that dispatch of IO from the
AIL is throttled to the rate of device congestion, as it now waits
for the IO dispatch to complete instead of just shovelling as much IO
as possible into the bufd queue.
FWIW, if we move to building a direct IO buffer list in the AIL as
we were recently discussing, these are -exactly- the IO dispatch
patterns and delays that we will get from the AIL....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find
2011-08-26 8:11 ` Christoph Hellwig
2011-08-26 14:19 ` Alex Elder
@ 2011-09-21 6:44 ` Dave Chinner
2011-09-21 11:28 ` Christoph Hellwig
1 sibling, 1 reply; 14+ messages in thread
From: Dave Chinner @ 2011-09-21 6:44 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Fri, Aug 26, 2011 at 04:11:32AM -0400, Christoph Hellwig wrote:
> > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > index c57836d..594cea5 100644
> > --- a/fs/xfs/xfs_buf.c
> > +++ b/fs/xfs/xfs_buf.c
> > @@ -171,10 +171,16 @@ STATIC void
> > _xfs_buf_initialize(
> > xfs_buf_t *bp,
> > xfs_buftarg_t *target,
> > - xfs_off_t range_base,
> > - size_t range_length,
> > + xfs_off_t bno,
> > + size_t num_blocks,
> > xfs_buf_flags_t flags)
> > {
> > + xfs_off_t range_base;
> > + size_t range_length;
> > +
> > + range_base = BBTOB(bno);
> > + range_length = BBTOB(num_blocks);
>
> What is the point of changing the mostly unrelated _xfs_buf_initialize
> prototype in this patch?
We were converting units backwards and forwards inconsistently, some
functions taking bytes, some basic blocks, and conversions were
being done all over the place.
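[Editor's note: the units in question are 512-byte basic blocks. A minimal userspace rendition of the conversion helpers follows; BBSHIFT really is 9 in XFS, but treat the exact macro spellings here as a paraphrase of the XFS headers rather than their verbatim definitions.]

```c
#include <assert.h>

/* XFS addresses buffers in 512-byte "basic blocks". */
#define BBSHIFT		9
#define BBSIZE		(1 << BBSHIFT)
#define BBMASK		(BBSIZE - 1)

/* basic blocks -> bytes */
#define BBTOB(bbs)	((long long)(bbs) << BBSHIFT)

/* bytes -> basic blocks, rounding up any partial block */
#define BTOBB(bytes)	(((long long)(bytes) + BBMASK) >> BBSHIFT)
```

Keeping the conversion in one place (here, at the top of _xfs_buf_initialize) is the point of the prototype change: callers deal only in basic blocks, and bytes appear just where the range arithmetic needs them.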
> I think it (and the other renaming changes related to it) are fine,
> but should be a separate patch.
OK, fine, I can do that.
> And once you touch _xfs_buf_initialize
> after the core of this patch, please merge it with xfs_buf_allocate into
> a new xfs_buf_alloc that does the full allocation + initialization and
> can also replace xfs_buf_get_empty.
Not right now. That restructuring can be done separately, probably in
the same patch set that fixes the API type problems...
> > + bp = _xfs_buf_find(target, bno, num_blocks, flags, new_bp);
> > + if (!bp) {
> > + xfs_buf_deallocate(new_bp);
> > + return NULL;
> > + }
> > +
> > if (bp == new_bp) {
> > error = xfs_buf_allocate_memory(bp, flags);
> > if (error)
> > goto no_buffer;
> > + } else
> > xfs_buf_deallocate(new_bp);
>
> I'd recommend moving the call to xfs_buf_allocate_memory into
> _xfs_buf_find so that it returns a fully allocated buffer. In fact I'd
> also move the xfs_buf_deallocate(new_bp) into the found side of
> _xfs_buf_find, avoiding any conditionals in xfs_buf_get.
<sigh>
This code is pretty much as you requested it after the first time I
posted it.
http://oss.sgi.com/archives/xfs/2011-08/msg00146.html
I'll go rewrite this again, but IMO all you are asking for is for me
to put a different colour on the bike shed....
> >
> > - XFS_STATS_INC(xb_get);
> > -
> > /*
> > - * Always fill in the block number now, the mapped cases can do
> > - * their own overlay of this later.
> > + * Now we have a workable buffer, fill in the block number so
> > + * that we can do IO on it.
> > */
> > - bp->b_bn = ioff;
> > - bp->b_count_desired = bp->b_buffer_length;
> > + bp->b_bn = bno;
>
> Note that we only need this if we did not find an existing buffer. It's
> not strictly related to the patch, but given that you stop assigning
> b_count_desired and redo this whole area it might be worth shifting it
> into the if (bp == new_bp) conditional area.
OK.
> >
> > +found:
> > + ASSERT(bp->b_flags & XBF_MAPPED);
>
> > This doesn't look right to me. Various buffers like inode or remote attrs
> > are unmapped, and I can't see any reason why we would assert that we are not
> > allowed to find them here.
Yeah, a bit of a thinko, but it never tripped on me....
> Thinking about it more I'm also not sure skipping the code to map
> buffers on a straight cache hit is a good idea - there's nothing
> inherent to requiring a given buffer to be mapped for all callers.
OK, will fix.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 4/4] xfs: convert xfsbufd to use a workqueue
2011-09-21 6:25 ` Dave Chinner
@ 2011-09-21 11:26 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-09-21 11:26 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Sep 21, 2011 at 04:25:39PM +1000, Dave Chinner wrote:
> Which is precisely what I want here - to wait for all the delwri
> buffers that were promoted to be submitted before continuing
> onwards. This makes the scanning algorithm self throttling -
> instead of simply pushing the buffers to the delwri queue and
> kicking a background thread and hoping it can flush buffers faster
> than we can promote them from the AIL, it explicitly pushes the
> delwri buffers before the next round of AIL scanning. The ensures we
> start timely IO on the buffers and don't simple continue to scan the
> AIL while we wait for the background thread to send them off to
> disk and complete.
I didn't say I'm against it. The important bit is that such changes
in behaviour get documented in the patch description, including
a rationale like the one in this mail.
* Re: [PATCH 1/4] xfs: Don't allocate new buffers on every call to _xfs_buf_find
2011-09-21 6:44 ` Dave Chinner
@ 2011-09-21 11:28 ` Christoph Hellwig
0 siblings, 0 replies; 14+ messages in thread
From: Christoph Hellwig @ 2011-09-21 11:28 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
On Wed, Sep 21, 2011 at 04:44:43PM +1000, Dave Chinner wrote:
>
> > And once you touch _xfs_buf_initialize
> > after the core of this patch, please merge it with xfs_buf_allocate into
> > a new xfs_buf_alloc that does the full allocation + initialization and
> > can also replace xfs_buf_get_empty.
>
> Not right now. That restructing can be done separately, probably in
> the same patch set that fixes the API types problems...
That's what I meant - the conversion changes should be part of a larger
patch (-series) to also fix up the API, and this bit.
> > > if (bp == new_bp) {
> > > error = xfs_buf_allocate_memory(bp, flags);
> > > if (error)
> > > goto no_buffer;
> > > + } else
> > > xfs_buf_deallocate(new_bp);
> >
> > I'd recommend moving the call to xfs_buf_allocate_memory into
> > _xfs_buf_find so that it returns a fully allocated buffer. In fact I'd
> > also move the xfs_buf_deallocate(new_bp) into the found side of
> > _xfs_buf_find, avoiding any conditionals in xfs_buf_get.
>
> <sigh>
>
> This code s pretty much as you requested it after the first time I
> posted it.
>
> http://oss.sgi.com/archives/xfs/2011-08/msg00146.html
>
> I'll go rewrite this again, but IMO all you are asking for is for me
> to put a different colour on the bike shed....
We can leave it as-is for now. My suggestion in the previous mail just
went half-way to where it makes most sense after looking at it for a
while.