* [PATCH 0/6] more efficient busy extent handling and discard support
@ 2011-03-22 19:55 Christoph Hellwig
2011-03-22 19:55 ` [PATCH 1/6] xfs: optimize AGFL refills Christoph Hellwig
` (5 more replies)
0 siblings, 6 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-22 19:55 UTC (permalink / raw)
To: xfs
This series updates the busy extent tracking to:
a) keep exact track of busy extents
b) allow reallocations where they are not a problem, that is, generally
everything except metadata -> userdata reallocations
The result is that compilebench is around 5% faster and neither it nor
any other workload I've thrown at the patchset causes a log force anymore.
The second to last patch also adds discard support, which is rather trivial
now that we do track the busy extents exactly. The last patch further
optimizes the discard code to not stall the log state machine but
execute asynchronously. This does, however, trigger a bug in the block
layer; for the report and a workaround see:
http://marc.info/?l=linux-kernel&m=130082329118907&w=2
There are a few further optimizations we could add but haven't done yet:
- do not attempt to discard blocks on the AGFL, as the chance of reuse
is very high, and we can't avoid reallocating from them
- on SSDs preferably reuse blocks on the busy list to avoid discards
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH 1/6] xfs: optimize AGFL refills
2011-03-22 19:55 [PATCH 0/6] more efficient busy extent handling and discard support Christoph Hellwig
@ 2011-03-22 19:55 ` Christoph Hellwig
2011-03-22 22:30 ` Alex Elder
2011-03-22 23:30 ` Dave Chinner
2011-03-22 19:55 ` [PATCH 2/6] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
` (4 subsequent siblings)
5 siblings, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-22 19:55 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-optimize-freelist-refills --]
[-- Type: text/plain, Size: 3496 bytes --]
Avoid forcing out busy extents when moving blocks from/to the AGFL. We
achieve this by moving the busy search out of xfs_alloc_get_freelist into
the callers that need it, and by moving the busy list insert from
xfs_free_ag_extent, which is used both by AGFL refills and real
allocation, to xfs_free_extent, which is only used by the latter.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-19 16:49:23.774797370 +0100
+++ xfs/fs/xfs/xfs_alloc.c 2011-03-19 16:49:38.882797272 +0100
@@ -1326,6 +1326,8 @@ xfs_alloc_ag_vextent_small(
if (error)
goto error0;
if (fbno != NULLAGBLOCK) {
+ if (xfs_alloc_busy_search(args->mp, args->agno, fbno, 1))
+ xfs_trans_set_sync(args->tp);
if (args->userdata) {
xfs_buf_t *bp;
@@ -1617,18 +1619,6 @@ xfs_free_ag_extent(
trace_xfs_free_extent(mp, agno, bno, len, isfl, haveleft, haveright);
- /*
- * Since blocks move to the free list without the coordination
- * used in xfs_bmap_finish, we can't allow block to be available
- * for reallocation and non-transaction writing (user data)
- * until we know that the transaction that moved it to the free
- * list is permanently on disk. We track the blocks by declaring
- * these blocks as "busy"; the busy list is maintained on a per-ag
- * basis and each transaction records which entries should be removed
- * when the iclog commits to disk. If a busy block is allocated,
- * the iclog is pushed up to the LSN that freed the block.
- */
- xfs_alloc_busy_insert(tp, agno, bno, len);
return 0;
error0:
@@ -1923,21 +1913,6 @@ xfs_alloc_get_freelist(
xfs_alloc_log_agf(tp, agbp, logflags);
*bnop = bno;
- /*
- * As blocks are freed, they are added to the per-ag busy list and
- * remain there until the freeing transaction is committed to disk.
- * Now that we have allocated blocks, this list must be searched to see
- * if a block is being reused. If one is, then the freeing transaction
- * must be pushed to disk before this transaction.
- *
- * We do this by setting the current transaction to a sync transaction
- * which guarantees that the freeing transaction is on disk before this
- * transaction. This is done instead of a synchronous log force here so
- * that we don't sit and wait with the AGF locked in the transaction
- * during the log force.
- */
- if (xfs_alloc_busy_search(mp, be32_to_cpu(agf->agf_seqno), bno, 1))
- xfs_trans_set_sync(tp);
return 0;
}
@@ -2407,6 +2382,8 @@ xfs_free_extent(
be32_to_cpu(XFS_BUF_TO_AGF(args.agbp)->agf_length));
#endif
error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno, len, 0);
+ if (!error)
+ xfs_alloc_busy_insert(tp, args.agno, args.agbno, len);
error0:
xfs_perag_put(args.pag);
return error;
Index: xfs/fs/xfs/xfs_alloc_btree.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc_btree.c 2011-03-19 16:49:22.929299332 +0100
+++ xfs/fs/xfs/xfs_alloc_btree.c 2011-03-19 16:49:30.382841378 +0100
@@ -94,6 +94,8 @@ xfs_allocbt_alloc_block(
*stat = 0;
return 0;
}
+ if (xfs_alloc_busy_search(cur->bc_mp, cur->bc_private.a.agno, bno, 1))
+ xfs_trans_set_sync(cur->bc_tp);
xfs_trans_agbtree_delta(cur->bc_tp, 1);
new->s = cpu_to_be32(bno);
* [PATCH 2/6] xfs: do not immediately reuse busy extent ranges
2011-03-22 19:55 [PATCH 0/6] more efficient busy extent handling and discard support Christoph Hellwig
2011-03-22 19:55 ` [PATCH 1/6] xfs: optimize AGFL refills Christoph Hellwig
@ 2011-03-22 19:55 ` Christoph Hellwig
2011-03-22 22:30 ` Alex Elder
2011-03-25 21:03 ` Alex Elder
2011-03-22 19:55 ` [PATCH 3/6] xfs: exact busy extent tracking Christoph Hellwig
` (3 subsequent siblings)
5 siblings, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-22 19:55 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-skip-busy-extents --]
[-- Type: text/plain, Size: 21217 bytes --]
Every time we reallocate a busy extent, we cause a synchronous log force
to occur to ensure the freeing transaction is on disk before we continue
and use the newly allocated extent. This is extremely sub-optimal as we
have to mark every transaction with blocks that get reused as synchronous.
Instead of searching the busy extent list after deciding on the extent to
allocate, check each candidate extent against the busy list as the
allocation decisions are made. If a candidate is in the busy list, we
trim the busy range out of the extent we have found and determine if that
trimmed range is still OK for allocation. In many cases, this check can
be incorporated into the allocation extent alignment code which already
does trimming of the found extent before determining if it is a valid
candidate for allocation.
Based on two earlier patches from Dave Chinner.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-19 16:51:44.645299267 +0100
+++ xfs/fs/xfs/xfs_alloc.c 2011-03-19 18:11:35.157296743 +0100
@@ -41,19 +41,13 @@
#define XFSA_FIXUP_BNO_OK 1
#define XFSA_FIXUP_CNT_OK 2
-/*
- * Prototypes for per-ag allocation routines
- */
-
STATIC int xfs_alloc_ag_vextent_exact(xfs_alloc_arg_t *);
STATIC int xfs_alloc_ag_vextent_near(xfs_alloc_arg_t *);
STATIC int xfs_alloc_ag_vextent_size(xfs_alloc_arg_t *);
STATIC int xfs_alloc_ag_vextent_small(xfs_alloc_arg_t *,
- xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
-
-/*
- * Internal functions.
- */
+ xfs_btree_cur_t *, xfs_agblock_t *, xfs_extlen_t *, int *);
+STATIC void xfs_alloc_busy_trim(struct xfs_alloc_arg *,
+ xfs_agblock_t, xfs_extlen_t, xfs_agblock_t *, xfs_extlen_t *);
/*
* Lookup the record equal to [bno, len] in the btree given by cur.
@@ -154,19 +148,21 @@ xfs_alloc_compute_aligned(
xfs_extlen_t *reslen) /* result length */
{
xfs_agblock_t bno;
- xfs_extlen_t diff;
xfs_extlen_t len;
- if (args->alignment > 1 && foundlen >= args->minlen) {
- bno = roundup(foundbno, args->alignment);
- diff = bno - foundbno;
- len = diff >= foundlen ? 0 : foundlen - diff;
+ /* Trim busy sections out of found extent */
+ xfs_alloc_busy_trim(args, foundbno, foundlen, &bno, &len);
+
+ if (args->alignment > 1 && len >= args->minlen) {
+ xfs_agblock_t aligned_bno = roundup(bno, args->alignment);
+ xfs_extlen_t diff = aligned_bno - bno;
+
+ *resbno = aligned_bno;
+ *reslen = diff >= len ? 0 : len - diff;
} else {
- bno = foundbno;
- len = foundlen;
+ *resbno = bno;
+ *reslen = len;
}
- *resbno = bno;
- *reslen = len;
}
/*
@@ -541,16 +537,8 @@ xfs_alloc_ag_vextent(
if (error)
return error;
- /*
- * Search the busylist for these blocks and mark the
- * transaction as synchronous if blocks are found. This
- * avoids the need to block due to a synchronous log
- * force to ensure correct ordering as the synchronous
- * transaction will guarantee that for us.
- */
- if (xfs_alloc_busy_search(args->mp, args->agno,
- args->agbno, args->len))
- xfs_trans_set_sync(args->tp);
+ ASSERT(!xfs_alloc_busy_search(args->mp, args->agno,
+ args->agbno, args->len));
}
if (!args->isfl) {
@@ -577,14 +565,14 @@ xfs_alloc_ag_vextent_exact(
{
xfs_btree_cur_t *bno_cur;/* by block-number btree cursor */
xfs_btree_cur_t *cnt_cur;/* by count btree cursor */
- xfs_agblock_t end; /* end of allocated extent */
int error;
xfs_agblock_t fbno; /* start block of found extent */
- xfs_agblock_t fend; /* end block of found extent */
xfs_extlen_t flen; /* length of found extent */
+ xfs_agblock_t tbno; /* start block of trimmed extent */
+ xfs_extlen_t tlen; /* length of trimmed extent */
+ xfs_agblock_t tend; /* end block of trimmed extent */
+ xfs_agblock_t end; /* end of allocated extent */
int i; /* success/failure of operation */
- xfs_agblock_t maxend; /* end of maximal extent */
- xfs_agblock_t minend; /* end of minimal extent */
xfs_extlen_t rlen; /* length of returned extent */
ASSERT(args->alignment == 1);
@@ -614,14 +602,22 @@ xfs_alloc_ag_vextent_exact(
goto error0;
XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
ASSERT(fbno <= args->agbno);
- minend = args->agbno + args->minlen;
- maxend = args->agbno + args->maxlen;
- fend = fbno + flen;
/*
- * Give up if the freespace isn't long enough for the minimum request.
+ * Check for overlapping busy extents.
+ */
+ xfs_alloc_busy_trim(args, fbno, flen, &tbno, &tlen);
+
+ /*
+ * Give up if the start of the extent is busy, or the freespace isn't
+ * long enough for the minimum request.
*/
- if (fend < minend)
+ if (tbno > args->agbno)
+ goto not_found;
+ if (tlen < args->minlen)
+ goto not_found;
+ tend = tbno + tlen;
+ if (tend < args->agbno + args->minlen)
goto not_found;
/*
@@ -630,14 +626,14 @@ xfs_alloc_ag_vextent_exact(
*
* Fix the length according to mod and prod if given.
*/
- end = XFS_AGBLOCK_MIN(fend, maxend);
+ end = XFS_AGBLOCK_MIN(tend, args->agbno + args->maxlen);
args->len = end - args->agbno;
xfs_alloc_fix_len(args);
if (!xfs_alloc_fix_minleft(args))
goto not_found;
rlen = args->len;
- ASSERT(args->agbno + rlen <= fend);
+ ASSERT(args->agbno + rlen <= tend);
end = args->agbno + rlen;
/*
@@ -686,11 +682,11 @@ xfs_alloc_find_best_extent(
struct xfs_btree_cur **scur, /* searching cursor */
xfs_agblock_t gdiff, /* difference for search comparison */
xfs_agblock_t *sbno, /* extent found by search */
- xfs_extlen_t *slen,
- xfs_extlen_t *slena, /* aligned length */
+ xfs_extlen_t *slen, /* extent length */
+ xfs_agblock_t *sbnoa, /* aligned extent found by search */
+ xfs_extlen_t *slena, /* aligned extent length */
int dir) /* 0 = search right, 1 = search left */
{
- xfs_agblock_t bno;
xfs_agblock_t new;
xfs_agblock_t sdiff;
int error;
@@ -708,16 +704,16 @@ xfs_alloc_find_best_extent(
if (error)
goto error0;
XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
- xfs_alloc_compute_aligned(args, *sbno, *slen, &bno, slena);
+ xfs_alloc_compute_aligned(args, *sbno, *slen, sbnoa, slena);
/*
* The good extent is closer than this one.
*/
if (!dir) {
- if (bno >= args->agbno + gdiff)
+ if (*sbnoa >= args->agbno + gdiff)
goto out_use_good;
} else {
- if (bno <= args->agbno - gdiff)
+ if (*sbnoa <= args->agbno - gdiff)
goto out_use_good;
}
@@ -729,8 +725,8 @@ xfs_alloc_find_best_extent(
xfs_alloc_fix_len(args);
sdiff = xfs_alloc_compute_diff(args->agbno, args->len,
- args->alignment, *sbno,
- *slen, &new);
+ args->alignment, *sbnoa,
+ *slena, &new);
/*
* Choose closer size and invalidate other cursor.
@@ -780,7 +776,7 @@ xfs_alloc_ag_vextent_near(
xfs_agblock_t gtbnoa; /* aligned ... */
xfs_extlen_t gtdiff; /* difference to right side entry */
xfs_extlen_t gtlen; /* length of right side entry */
- xfs_extlen_t gtlena = 0; /* aligned ... */
+ xfs_extlen_t gtlena; /* aligned ... */
xfs_agblock_t gtnew; /* useful start bno of right side */
int error; /* error code */
int i; /* result code, temporary */
@@ -789,9 +785,10 @@ xfs_alloc_ag_vextent_near(
xfs_agblock_t ltbnoa; /* aligned ... */
xfs_extlen_t ltdiff; /* difference to left side entry */
xfs_extlen_t ltlen; /* length of left side entry */
- xfs_extlen_t ltlena = 0; /* aligned ... */
+ xfs_extlen_t ltlena; /* aligned ... */
xfs_agblock_t ltnew; /* useful start bno of left side */
xfs_extlen_t rlen; /* length of returned extent */
+ int forced = 0;
#if defined(DEBUG) && defined(__KERNEL__)
/*
* Randomly don't execute the first algorithm.
@@ -800,13 +797,20 @@ xfs_alloc_ag_vextent_near(
dofirst = random32() & 1;
#endif
+
+restart:
+ bno_cur_lt = NULL;
+ bno_cur_gt = NULL;
+ ltlen = 0;
+ gtlena = 0;
+ ltlena = 0;
+
/*
* Get a cursor for the by-size btree.
*/
cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
args->agno, XFS_BTNUM_CNT);
- ltlen = 0;
- bno_cur_lt = bno_cur_gt = NULL;
+
/*
* See if there are any free extents as big as maxlen.
*/
@@ -822,11 +826,13 @@ xfs_alloc_ag_vextent_near(
goto error0;
if (i == 0 || ltlen == 0) {
xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+ trace_xfs_alloc_near_noentry(args);
return 0;
}
ASSERT(i == 1);
}
args->wasfromfl = 0;
+
/*
* First algorithm.
* If the requested extent is large wrt the freespaces available
@@ -890,7 +896,7 @@ xfs_alloc_ag_vextent_near(
if (args->len < blen)
continue;
ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
- args->alignment, ltbno, ltlen, <new);
+ args->alignment, ltbnoa, ltlena, <new);
if (ltnew != NULLAGBLOCK &&
(args->len > blen || ltdiff < bdiff)) {
bdiff = ltdiff;
@@ -1042,11 +1048,12 @@ xfs_alloc_ag_vextent_near(
args->len = XFS_EXTLEN_MIN(ltlena, args->maxlen);
xfs_alloc_fix_len(args);
ltdiff = xfs_alloc_compute_diff(args->agbno, args->len,
- args->alignment, ltbno, ltlen, <new);
+ args->alignment, ltbnoa, ltlena, <new);
error = xfs_alloc_find_best_extent(args,
&bno_cur_lt, &bno_cur_gt,
- ltdiff, >bno, >len, >lena,
+ ltdiff, >bno, >len,
+ >bnoa, >lena,
0 /* search right */);
} else {
ASSERT(gtlena >= args->minlen);
@@ -1057,11 +1064,12 @@ xfs_alloc_ag_vextent_near(
args->len = XFS_EXTLEN_MIN(gtlena, args->maxlen);
xfs_alloc_fix_len(args);
gtdiff = xfs_alloc_compute_diff(args->agbno, args->len,
- args->alignment, gtbno, gtlen, >new);
+ args->alignment, gtbnoa, gtlena, >new);
error = xfs_alloc_find_best_extent(args,
&bno_cur_gt, &bno_cur_lt,
- gtdiff, <bno, <len, <lena,
+ gtdiff, <bno, <len,
+ <bnoa, <lena,
1 /* search left */);
}
@@ -1073,6 +1081,12 @@ xfs_alloc_ag_vextent_near(
* If we couldn't get anything, give up.
*/
if (bno_cur_lt == NULL && bno_cur_gt == NULL) {
+ if (!forced++) {
+ trace_xfs_alloc_near_busy(args);
+ xfs_log_force(args->mp, XFS_LOG_SYNC);
+ goto restart;
+ }
+
trace_xfs_alloc_size_neither(args);
args->agbno = NULLAGBLOCK;
return 0;
@@ -1107,12 +1121,13 @@ xfs_alloc_ag_vextent_near(
return 0;
}
rlen = args->len;
- (void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment, ltbno,
- ltlen, <new);
+ (void)xfs_alloc_compute_diff(args->agbno, rlen, args->alignment,
+ ltbnoa, ltlena, <new);
ASSERT(ltnew >= ltbno);
- ASSERT(ltnew + rlen <= ltbno + ltlen);
+ ASSERT(ltnew + rlen <= ltbnoa + ltlena);
ASSERT(ltnew + rlen <= be32_to_cpu(XFS_BUF_TO_AGF(args->agbp)->agf_length));
args->agbno = ltnew;
+
if ((error = xfs_alloc_fixup_trees(cnt_cur, bno_cur_lt, ltbno, ltlen,
ltnew, rlen, XFSA_FIXUP_BNO_OK)))
goto error0;
@@ -1155,26 +1170,35 @@ xfs_alloc_ag_vextent_size(
int i; /* temp status variable */
xfs_agblock_t rbno; /* returned block number */
xfs_extlen_t rlen; /* length of returned extent */
+ int forced = 0;
+restart:
/*
* Allocate and initialize a cursor for the by-size btree.
*/
cnt_cur = xfs_allocbt_init_cursor(args->mp, args->tp, args->agbp,
args->agno, XFS_BTNUM_CNT);
bno_cur = NULL;
+
/*
* Look for an entry >= maxlen+alignment-1 blocks.
*/
if ((error = xfs_alloc_lookup_ge(cnt_cur, 0,
args->maxlen + args->alignment - 1, &i)))
goto error0;
+
/*
- * If none, then pick up the last entry in the tree unless the
- * tree is empty.
- */
- if (!i) {
- if ((error = xfs_alloc_ag_vextent_small(args, cnt_cur, &fbno,
- &flen, &i)))
+ * If none or we have busy extents that we cannot allocate from, then
+ * we have to settle for a smaller extent. In the case that there are
+ * no large extents, this will return the last entry in the tree unless
+ * the tree is empty. In the case that there are only busy large
+ * extents, this will return the largest small extent unless there
+ * are no smaller extents available.
+ */
+ if (!i || forced > 1) {
+ error = xfs_alloc_ag_vextent_small(args, cnt_cur,
+ &fbno, &flen, &i);
+ if (error)
goto error0;
if (i == 0 || flen == 0) {
xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
@@ -1182,22 +1206,56 @@ xfs_alloc_ag_vextent_size(
return 0;
}
ASSERT(i == 1);
- }
- /*
- * There's a freespace as big as maxlen+alignment-1, get it.
- */
- else {
- if ((error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, &i)))
- goto error0;
- XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
- }
+ xfs_alloc_compute_aligned(args, fbno, flen, &rbno, &rlen);
+ } else {
+ /*
+ * Search for a non-busy extent that is large enough.
+ * If we are at low space, don't check, or if we fall off
+ * the end of the btree, turn off the busy check and
+ * restart.
+ */
+ for (;;) {
+ error = xfs_alloc_get_rec(cnt_cur, &fbno, &flen, &i);
+ if (error)
+ goto error0;
+ XFS_WANT_CORRUPTED_GOTO(i == 1, error0);
+
+ xfs_alloc_compute_aligned(args, fbno, flen,
+ &rbno, &rlen);
+
+ if (rlen >= args->maxlen)
+ break;
+
+ error = xfs_btree_increment(cnt_cur, 0, &i);
+ if (error)
+ goto error0;
+ if (i == 0) {
+ /*
+ * Our only valid extents must have been busy.
+ * Make it unbusy by forcing the log out and
+ * retrying. If we've been here before, forcing
+ * the log isn't making the extents available,
+ * which means they have probably been freed in
+ * this transaction. In that case, we have to
+ * give up on them and we'll attempt a minlen
+ * allocation the next time around.
+ */
+ xfs_btree_del_cursor(cnt_cur,
+ XFS_BTREE_NOERROR);
+ trace_xfs_alloc_size_busy(args);
+ if (!forced++)
+ xfs_log_force(args->mp, XFS_LOG_SYNC);
+ goto restart;
+ }
+ }
+ }
+
/*
* In the first case above, we got the last entry in the
* by-size btree. Now we check to see if the space hits maxlen
* once aligned; if not, we search left for something better.
* This can't happen in the second case above.
*/
- xfs_alloc_compute_aligned(args, fbno, flen, &rbno, &rlen);
rlen = XFS_EXTLEN_MIN(args->maxlen, rlen);
XFS_WANT_CORRUPTED_GOTO(rlen == 0 ||
(rlen <= flen && rbno + rlen <= fbno + flen), error0);
@@ -1251,13 +1309,19 @@ xfs_alloc_ag_vextent_size(
* Fix up the length.
*/
args->len = rlen;
- xfs_alloc_fix_len(args);
- if (rlen < args->minlen || !xfs_alloc_fix_minleft(args)) {
- xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
- trace_xfs_alloc_size_nominleft(args);
- args->agbno = NULLAGBLOCK;
- return 0;
+ if (rlen < args->minlen) {
+ if (!forced++) {
+ xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+ trace_xfs_alloc_size_busy(args);
+ xfs_log_force(args->mp, XFS_LOG_SYNC);
+ goto restart;
+ }
+ goto out_nominleft;
}
+ xfs_alloc_fix_len(args);
+
+ if (!xfs_alloc_fix_minleft(args))
+ goto out_nominleft;
rlen = args->len;
XFS_WANT_CORRUPTED_GOTO(rlen <= flen, error0);
/*
@@ -1287,6 +1351,12 @@ error0:
if (bno_cur)
xfs_btree_del_cursor(bno_cur, XFS_BTREE_ERROR);
return error;
+
+out_nominleft:
+ xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
+ trace_xfs_alloc_size_nominleft(args);
+ args->agbno = NULLAGBLOCK;
+ return 0;
}
/*
@@ -2634,6 +2704,181 @@ xfs_alloc_busy_search(
return match;
}
+/*
+ * For a given extent [fbno, flen], search the busy extent list
+ * to find a subset of the extent that is not busy.
+ */
+STATIC void
+xfs_alloc_busy_trim(
+ struct xfs_alloc_arg *args,
+ xfs_agblock_t fbno,
+ xfs_extlen_t flen,
+ xfs_agblock_t *rbno,
+ xfs_extlen_t *rlen)
+{
+ struct rb_node *rbp;
+
+ ASSERT(flen > 0);
+
+ spin_lock(&args->pag->pagb_lock);
+ rbp = args->pag->pagb_tree.rb_node;
+ while (rbp && flen >= args->minlen) {
+ struct xfs_busy_extent *busyp =
+ rb_entry(rbp, struct xfs_busy_extent, rb_node);
+ xfs_agblock_t fend = fbno + flen;
+ xfs_agblock_t bbno = busyp->bno;
+ xfs_agblock_t bend = bbno + busyp->length;
+
+ if (fend <= bbno) {
+ rbp = rbp->rb_left;
+ continue;
+ } else if (fbno >= bend) {
+ rbp = rbp->rb_right;
+ continue;
+ }
+
+ if (bbno <= fbno) {
+ /* start overlap */
+
+ /*
+ * Case 1:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +---------+
+ * fbno fend
+ *
+ * Case 2:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-------------+
+ * fbno fend
+ *
+ * Case 3:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-------------+
+ * fbno fend
+ *
+ * Case 4:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-----------------+
+ * fbno fend
+ *
+ * No unbusy region in extent, return failure.
+ */
+ if (fend <= bend)
+ goto fail;
+
+ /*
+ * Case 5:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +----------------------+
+ * fbno fend
+ *
+ * Case 6:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +--------------------------+
+ * fbno fend
+ *
+ * Needs to be trimmed to:
+ * +-------+
+ * fbno fend
+ */
+ fbno = bend;
+ } else if (bend >= fend) {
+ /* end overlap */
+
+ /*
+ * Case 7:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +------------------+
+ * fbno fend
+ *
+ * Case 8:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +--------------------------+
+ * fbno fend
+ *
+ * Needs to be trimmed to:
+ * +-------+
+ * fbno fend
+ */
+ fend = bbno;
+ } else {
+ /* middle overlap */
+
+ /*
+ * Case 9:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-----------------------------------+
+ * fbno fend
+ *
+ * Can be trimmed to:
+ * +-------+ OR +-------+
+ * fbno fend fbno fend
+ *
+ * We prefer the lower bno extent because the next
+ * allocation for this inode will use "end" as the
+ * target for first block. If the busy segment has
+ * cleared, this will get a contiguous allocation next
+ * time around; if the busy segment has not cleared,
+ * it will get an allocation at bend, which is a forward
+ * allocation.
+ *
+ * If we choose segment at bend, and this remains the
+ * best extent for the next allocation (e.g. NEAR_BNO
+ * allocation) we'll next allocate at bno, which will
+ * give us backwards allocation. We already know that
+ * backwards allocation direction causes significant
+ * fragmentation of directories and degradation of
+ * directory performance.
+ *
+ * Always choose the option that produces forward
+ * allocation patterns so that sequential reads and
+ * writes only ever seek in one direction. Only choose
+ * the higher bno extent if the remaining unused extent
+ * length is much larger than the current allocation
+ * request, promising us a contiguous allocation in
+ * the following free space.
+ */
+
+ if (bbno - fbno >= args->maxlen) {
+ /* left candidate fits perfect */
+ fend = bbno;
+ } else if (fend - bend >= args->maxlen * 4) {
+ /* right candidate has enough free space */
+ fbno = bend;
+ } else if (bbno - fbno >= args->minlen) {
+ /* left candidate fits minimum requirement */
+ fend = bbno;
+ } else {
+ goto fail;
+ }
+ }
+
+ flen = fend - fbno;
+ }
+ spin_unlock(&args->pag->pagb_lock);
+
+ *rbno = fbno;
+ *rlen = flen;
+ return;
+fail:
+ /*
+ * Return a zero extent length as a failure indication. All callers
+ * re-check if the trimmed extent satisfies the minlen requirement.
+ */
+ spin_unlock(&args->pag->pagb_lock);
+ *rbno = fbno;
+ *rlen = 0;
+}
+
void
xfs_alloc_busy_clear(
struct xfs_mount *mp,
Index: xfs/fs/xfs/linux-2.6/xfs_trace.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_trace.h 2011-03-19 16:51:44.661297219 +0100
+++ xfs/fs/xfs/linux-2.6/xfs_trace.h 2011-03-19 16:52:48.258797626 +0100
@@ -1433,11 +1433,14 @@ DEFINE_ALLOC_EVENT(xfs_alloc_near_first)
DEFINE_ALLOC_EVENT(xfs_alloc_near_greater);
DEFINE_ALLOC_EVENT(xfs_alloc_near_lesser);
DEFINE_ALLOC_EVENT(xfs_alloc_near_error);
+DEFINE_ALLOC_EVENT(xfs_alloc_near_noentry);
+DEFINE_ALLOC_EVENT(xfs_alloc_near_busy);
DEFINE_ALLOC_EVENT(xfs_alloc_size_neither);
DEFINE_ALLOC_EVENT(xfs_alloc_size_noentry);
DEFINE_ALLOC_EVENT(xfs_alloc_size_nominleft);
DEFINE_ALLOC_EVENT(xfs_alloc_size_done);
DEFINE_ALLOC_EVENT(xfs_alloc_size_error);
+DEFINE_ALLOC_EVENT(xfs_alloc_size_busy);
DEFINE_ALLOC_EVENT(xfs_alloc_small_freelist);
DEFINE_ALLOC_EVENT(xfs_alloc_small_notenough);
DEFINE_ALLOC_EVENT(xfs_alloc_small_done);
* [PATCH 3/6] xfs: exact busy extent tracking
2011-03-22 19:55 [PATCH 0/6] more efficient busy extent handling and discard support Christoph Hellwig
2011-03-22 19:55 ` [PATCH 1/6] xfs: optimize AGFL refills Christoph Hellwig
2011-03-22 19:55 ` [PATCH 2/6] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
@ 2011-03-22 19:55 ` Christoph Hellwig
2011-03-22 23:47 ` Dave Chinner
2011-03-25 21:04 ` Alex Elder
2011-03-22 19:55 ` [PATCH 4/6] xfs: allow reusing busy extents where safe Christoph Hellwig
` (2 subsequent siblings)
5 siblings, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-22 19:55 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-better-busy-extent-tracking --]
[-- Type: text/plain, Size: 16502 bytes --]
Update the extent tree in case we have to reuse a busy extent, so that it
is always kept up to date. This is done by replacing the busy list searches
with a new xfs_alloc_busy_reuse helper, which updates the busy extent tree
in case of a reuse. Also replace setting transactions to sync with forcing
the log out in case we found a busy extent to reuse. This makes the code a
lot simpler, and is required for discard support later on. While it
will cause performance regressions with just this patch applied, the impact
is more than mitigated by the next patch in the series.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-20 19:41:55.835479390 +0100
+++ xfs/fs/xfs/xfs_alloc.c 2011-03-21 14:49:14.157973188 +0100
@@ -1396,8 +1396,8 @@ xfs_alloc_ag_vextent_small(
if (error)
goto error0;
if (fbno != NULLAGBLOCK) {
- if (xfs_alloc_busy_search(args->mp, args->agno, fbno, 1))
- xfs_trans_set_sync(args->tp);
+ xfs_alloc_busy_reuse(args->tp, args->agno, fbno, 1);
+
if (args->userdata) {
xfs_buf_t *bp;
@@ -2459,100 +2459,6 @@ error0:
return error;
}
-
-/*
- * AG Busy list management
- * The busy list contains block ranges that have been freed but whose
- * transactions have not yet hit disk. If any block listed in a busy
- * list is reused, the transaction that freed it must be forced to disk
- * before continuing to use the block.
- *
- * xfs_alloc_busy_insert - add to the per-ag busy list
- * xfs_alloc_busy_clear - remove an item from the per-ag busy list
- * xfs_alloc_busy_search - search for a busy extent
- */
-
-/*
- * Insert a new extent into the busy tree.
- *
- * The busy extent tree is indexed by the start block of the busy extent.
- * there can be multiple overlapping ranges in the busy extent tree but only
- * ever one entry at a given start block. The reason for this is that
- * multi-block extents can be freed, then smaller chunks of that extent
- * allocated and freed again before the first transaction commit is on disk.
- * If the exact same start block is freed a second time, we have to wait for
- * that busy extent to pass out of the tree before the new extent is inserted.
- * There are two main cases we have to handle here.
- *
- * The first case is a transaction that triggers a "free - allocate - free"
- * cycle. This can occur during btree manipulations as a btree block is freed
- * to the freelist, then allocated from the free list, then freed again. In
- * this case, the second extxpnet free is what triggers the duplicate and as
- * such the transaction IDs should match. Because the extent was allocated in
- * this transaction, the transaction must be marked as synchronous. This is
- * true for all cases where the free/alloc/free occurs in the one transaction,
- * hence the addition of the ASSERT(tp->t_flags & XFS_TRANS_SYNC) to this case.
- * This serves to catch violations of the second case quite effectively.
- *
- * The second case is where the free/alloc/free occur in different
- * transactions. In this case, the thread freeing the extent the second time
- * can't mark the extent busy immediately because it is already tracked in a
- * transaction that may be committing. When the log commit for the existing
- * busy extent completes, the busy extent will be removed from the tree. If we
- * allow the second busy insert to continue using that busy extent structure,
- * it can be freed before this transaction is safely in the log. Hence our
- * only option in this case is to force the log to remove the existing busy
- * extent from the list before we insert the new one with the current
- * transaction ID.
- *
- * The problem we are trying to avoid in the free-alloc-free in separate
- * transactions is most easily described with a timeline:
- *
- * Thread 1 Thread 2 Thread 3 xfslogd
- * xact alloc
- * free X
- * mark busy
- * commit xact
- * free xact
- * xact alloc
- * alloc X
- * busy search
- * mark xact sync
- * commit xact
- * free xact
- * force log
- * checkpoint starts
- * ....
- * xact alloc
- * free X
- * mark busy
- * finds match
- * *** KABOOM! ***
- * ....
- * log IO completes
- * unbusy X
- * checkpoint completes
- *
- * By issuing a log force in thread 3 @ "KABOOM", the thread will block until
- * the checkpoint completes, and the busy extent it matched will have been
- * removed from the tree when it is woken. Hence it can then continue safely.
- *
- * However, to ensure this matching process is robust, we need to use the
- * transaction ID for identifying transaction, as delayed logging results in
- * the busy extent and transaction lifecycles being different. i.e. the busy
- * extent is active for a lot longer than the transaction. Hence the
- * transaction structure can be freed and reallocated, then mark the same
- * extent busy again in the new transaction. In this case the new transaction
- * will have a different tid but can have the same address, and hence we need
- * to check against the tid.
- *
- * Future: for delayed logging, we could avoid the log force if the extent was
- * first freed in the current checkpoint sequence. This, however, requires the
- * ability to pin the current checkpoint in memory until this transaction
- * commits to ensure that both the original free and the current one combine
- * logically into the one checkpoint. If the checkpoint sequences are
- * different, however, we still need to wait on a log force.
- */
void
xfs_alloc_busy_insert(
struct xfs_trans *tp,
@@ -2564,9 +2470,7 @@ xfs_alloc_busy_insert(
struct xfs_busy_extent *busyp;
struct xfs_perag *pag;
struct rb_node **rbp;
- struct rb_node *parent;
- int match;
-
+ struct rb_node *parent = NULL;
new = kmem_zalloc(sizeof(struct xfs_busy_extent), KM_MAYFAIL);
if (!new) {
@@ -2583,66 +2487,30 @@ xfs_alloc_busy_insert(
new->agno = agno;
new->bno = bno;
new->length = len;
- new->tid = xfs_log_get_trans_ident(tp);
-
INIT_LIST_HEAD(&new->list);
/* trace before insert to be able to see failed inserts */
trace_xfs_alloc_busy(tp, agno, bno, len, 0);
pag = xfs_perag_get(tp->t_mountp, new->agno);
-restart:
spin_lock(&pag->pagb_lock);
rbp = &pag->pagb_tree.rb_node;
- parent = NULL;
- busyp = NULL;
- match = 0;
- while (*rbp && match >= 0) {
+ while (*rbp) {
parent = *rbp;
busyp = rb_entry(parent, struct xfs_busy_extent, rb_node);
if (new->bno < busyp->bno) {
/* may overlap, but exact start block is lower */
rbp = &(*rbp)->rb_left;
- if (new->bno + new->length > busyp->bno)
- match = busyp->tid == new->tid ? 1 : -1;
+ BUG_ON(new->bno + new->length > busyp->bno);
} else if (new->bno > busyp->bno) {
/* may overlap, but exact start block is higher */
rbp = &(*rbp)->rb_right;
- if (bno < busyp->bno + busyp->length)
- match = busyp->tid == new->tid ? 1 : -1;
+ BUG_ON(bno < busyp->bno + busyp->length);
} else {
- match = busyp->tid == new->tid ? 1 : -1;
- break;
+ BUG();
}
}
- if (match < 0) {
- /* overlap marked busy in different transaction */
- spin_unlock(&pag->pagb_lock);
- xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
- goto restart;
- }
- if (match > 0) {
- /*
- * overlap marked busy in same transaction. Update if exact
- * start block match, otherwise combine the busy extents into
- * a single range.
- */
- if (busyp->bno == new->bno) {
- busyp->length = max(busyp->length, new->length);
- spin_unlock(&pag->pagb_lock);
- ASSERT(tp->t_flags & XFS_TRANS_SYNC);
- xfs_perag_put(pag);
- kmem_free(new);
- return;
- }
- rb_erase(&busyp->rb_node, &pag->pagb_tree);
- new->length = max(busyp->bno + busyp->length,
- new->bno + new->length) -
- min(busyp->bno, new->bno);
- new->bno = min(busyp->bno, new->bno);
- } else
- busyp = NULL;
rb_link_node(&new->rb_node, parent, rbp);
rb_insert_color(&new->rb_node, &pag->pagb_tree);
@@ -2650,7 +2518,6 @@ restart:
list_add(&new->list, &tp->t_busy);
spin_unlock(&pag->pagb_lock);
xfs_perag_put(pag);
- kmem_free(busyp);
}
/*
@@ -2704,6 +2571,173 @@ xfs_alloc_busy_search(
return match;
}
+STATIC int
+xfs_alloc_busy_try_reuse(
+ struct xfs_perag *pag,
+ struct xfs_busy_extent *busyp,
+ xfs_agblock_t fbno,
+ xfs_agblock_t fend)
+{
+ xfs_agblock_t bbno = busyp->bno;
+ xfs_agblock_t bend = bbno + busyp->length;
+
+ if (bbno < fbno && bend > fend) {
+ /*
+ * Case 1:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +---------+
+ * fbno fend
+ */
+
+ /*
+ * We would have to split the busy extent to be able
+ * to track it correctly, which we cannot do because we
+ * would have to modify the list of busy extents
+ * attached to the transaction/CIL context, which
+ * is immutable.
+ *
+ * Force out the log to clear the busy extents
+ * and retry the search.
+ */
+ return -1;
+ } else if (bbno >= fbno && bend <= fend) {
+ /*
+ * Case 2:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-----------------+
+ * fbno fend
+ *
+ * Case 3:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +--------------------------+
+ * fbno fend
+ *
+ * Case 4:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +--------------------------+
+ * fbno fend
+ *
+ * Case 5:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-----------------------------------+
+ * fbno fend
+ *
+ */
+
+ /*
+ * The busy extent is fully covered by the extent
+ * we are allocating, and can simply be removed from
+ * the rbtree. However we cannot remove it from the
+ * immutable list tracking busy extents in the
+ * transaction or CIL context, so set the length
+ * to zero to mark it invalid.
+ */
+ rb_erase(&busyp->rb_node, &pag->pagb_tree);
+ busyp->length = 0;
+ } else if (bbno == fbno) {
+ /*
+ * Case 6:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +---------+
+ * fbno fend
+ *
+ * Case 7:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +------------------+
+ * fbno fend
+ *
+ */
+
+ busyp->bno = fend;
+ } else if (bend == fend) {
+ /*
+ * Case 8:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +-------------+
+ * fbno fend
+ *
+ * Case 9:
+ * bbno bend
+ * +BBBBBBBBBBBBBBBBB+
+ * +----------------------+
+ * fbno fend
+ */
+
+ busyp->length = fbno - busyp->bno;
+ } else {
+ BUG();
+ }
+
+ return 1;
+}
+
+
+/*
+ * For a given extent [fbno, flen], make sure we can reuse it safely.
+ */
+void
+xfs_alloc_busy_reuse(
+ struct xfs_trans *tp,
+ xfs_agnumber_t agno,
+ xfs_agblock_t fbno,
+ xfs_extlen_t flen)
+{
+ struct xfs_perag *pag;
+ struct rb_node *rbp;
+
+ ASSERT(flen > 0);
+
+ pag = xfs_perag_get(tp->t_mountp, agno);
+restart:
+ spin_lock(&pag->pagb_lock);
+ rbp = pag->pagb_tree.rb_node;
+ while (rbp) {
+ struct xfs_busy_extent *busyp =
+ rb_entry(rbp, struct xfs_busy_extent, rb_node);
+ xfs_agblock_t fend = fbno + flen;
+ xfs_agblock_t bbno = busyp->bno;
+ xfs_agblock_t bend = bbno + busyp->length;
+ int overlap;
+
+ if (fend <= bbno) {
+ rbp = rbp->rb_left;
+ continue;
+ } else if (fbno >= bend) {
+ rbp = rbp->rb_right;
+ continue;
+ }
+
+ overlap = xfs_alloc_busy_try_reuse(pag, busyp,
+ fbno, fbno + flen);
+ if (overlap) {
+ spin_unlock(&pag->pagb_lock);
+ xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
+ goto restart;
+ }
+
+ /*
+ * No more busy extents to search.
+ */
+ if (bbno <= fbno && bend >= fend)
+ break;
+
+ if (fbno < bbno)
+ rbp = rbp->rb_left;
+ else
+ rbp = rbp->rb_right;
+ }
+ spin_unlock(&pag->pagb_lock);
+ xfs_perag_put(pag);
+}
+
/*
* For a given extent [fbno, flen], search the busy extent list
* to find a subset of the extent that is not busy.
@@ -2889,14 +2923,12 @@ xfs_alloc_busy_clear(
trace_xfs_alloc_unbusy(mp, busyp->agno, busyp->bno,
busyp->length);
- ASSERT(xfs_alloc_busy_search(mp, busyp->agno, busyp->bno,
- busyp->length) == 1);
-
list_del_init(&busyp->list);
pag = xfs_perag_get(mp, busyp->agno);
spin_lock(&pag->pagb_lock);
- rb_erase(&busyp->rb_node, &pag->pagb_tree);
+ if (busyp->length)
+ rb_erase(&busyp->rb_node, &pag->pagb_tree);
spin_unlock(&pag->pagb_lock);
xfs_perag_put(pag);
Index: xfs/fs/xfs/xfs_alloc.h
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.h 2011-03-20 19:41:55.847481367 +0100
+++ xfs/fs/xfs/xfs_alloc.h 2011-03-21 14:48:04.202480897 +0100
@@ -145,6 +145,10 @@ xfs_alloc_busy_clear(struct xfs_mount *m
int
xfs_alloc_busy_search(struct xfs_mount *mp, xfs_agnumber_t agno,
xfs_agblock_t bno, xfs_extlen_t len);
+
+void
+xfs_alloc_busy_reuse(struct xfs_trans *tp, xfs_agnumber_t agno,
+ xfs_agblock_t fbno, xfs_extlen_t flen);
#endif /* __KERNEL__ */
/*
Index: xfs/fs/xfs/xfs_alloc_btree.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc_btree.c 2011-03-20 19:41:55.863479131 +0100
+++ xfs/fs/xfs/xfs_alloc_btree.c 2011-03-21 14:48:04.210474164 +0100
@@ -94,8 +94,8 @@ xfs_allocbt_alloc_block(
*stat = 0;
return 0;
}
- if (xfs_alloc_busy_search(cur->bc_mp, cur->bc_private.a.agno, bno, 1))
- xfs_trans_set_sync(cur->bc_tp);
+
+ xfs_alloc_busy_reuse(cur->bc_tp, cur->bc_private.a.agno, bno, 1);
xfs_trans_agbtree_delta(cur->bc_tp, 1);
new->s = cpu_to_be32(bno);
Index: xfs/fs/xfs/xfs_ag.h
===================================================================
--- xfs.orig/fs/xfs/xfs_ag.h 2011-03-21 14:37:58.553976032 +0100
+++ xfs/fs/xfs/xfs_ag.h 2011-03-21 14:48:04.222474401 +0100
@@ -187,7 +187,6 @@ struct xfs_busy_extent {
xfs_agnumber_t agno;
xfs_agblock_t bno;
xfs_extlen_t length;
- xlog_tid_t tid; /* transaction that created this */
};
/*
Index: xfs/fs/xfs/xfs_log.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log.c 2011-03-21 14:37:58.565974096 +0100
+++ xfs/fs/xfs/xfs_log.c 2011-03-21 14:38:03.000000000 +0100
@@ -3248,13 +3248,6 @@ xfs_log_ticket_get(
return ticket;
}
-xlog_tid_t
-xfs_log_get_trans_ident(
- struct xfs_trans *tp)
-{
- return tp->t_ticket->t_tid;
-}
-
/*
* Allocate and initialise a new log ticket.
*/
Index: xfs/fs/xfs/xfs_log.h
===================================================================
--- xfs.orig/fs/xfs/xfs_log.h 2011-03-21 14:37:58.578474771 +0100
+++ xfs/fs/xfs/xfs_log.h 2011-03-21 14:38:03.000000000 +0100
@@ -189,8 +189,6 @@ void xlog_iodone(struct xfs_buf *);
struct xlog_ticket *xfs_log_ticket_get(struct xlog_ticket *ticket);
void xfs_log_ticket_put(struct xlog_ticket *ticket);
-xlog_tid_t xfs_log_get_trans_ident(struct xfs_trans *tp);
-
void xfs_log_commit_cil(struct xfs_mount *mp, struct xfs_trans *tp,
struct xfs_log_vec *log_vector,
xfs_lsn_t *commit_lsn, int flags);
Index: xfs/fs/xfs/xfs_log_priv.h
===================================================================
--- xfs.orig/fs/xfs/xfs_log_priv.h 2011-03-21 14:37:58.590474432 +0100
+++ xfs/fs/xfs/xfs_log_priv.h 2011-03-21 14:38:03.000000000 +0100
@@ -144,6 +144,8 @@ static inline uint xlog_get_client_id(__
#define XLOG_RECOVERY_NEEDED 0x4 /* log was recovered */
#define XLOG_IO_ERROR 0x8 /* log hit an I/O error, and being
shutdown */
+typedef __uint32_t xlog_tid_t;
+
#ifdef __KERNEL__
/*
Index: xfs/fs/xfs/xfs_types.h
===================================================================
--- xfs.orig/fs/xfs/xfs_types.h 2011-03-21 14:37:58.610475776 +0100
+++ xfs/fs/xfs/xfs_types.h 2011-03-21 14:38:03.000000000 +0100
@@ -73,8 +73,6 @@ typedef __int32_t xfs_tid_t; /* transact
typedef __uint32_t xfs_dablk_t; /* dir/attr block number (in file) */
typedef __uint32_t xfs_dahash_t; /* dir/attr hash value */
-typedef __uint32_t xlog_tid_t; /* transaction ID type */
-
/*
* These types are 64 bits on disk but are either 32 or 64 bits in memory.
* Disk based types:
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 31+ messages in thread
* [PATCH 4/6] xfs: allow reusing busy extents where safe
2011-03-22 19:55 [PATCH 0/6] [PATCH 0/6] more efficient busy extent handling and discard support Christoph Hellwig
` (2 preceding siblings ...)
2011-03-22 19:55 ` [PATCH 3/6] xfs: exact busy extent tracking Christoph Hellwig
@ 2011-03-22 19:55 ` Christoph Hellwig
2011-03-23 0:20 ` Dave Chinner
2011-03-25 21:04 ` Alex Elder
2011-03-22 19:55 ` [PATCH 5/6] xfs: add online discard support Christoph Hellwig
2011-03-22 19:55 ` [PATCH 6/6] xfs: make discard operations asynchronous Christoph Hellwig
5 siblings, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-22 19:55 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-simplify-user-allocations --]
[-- Type: text/plain, Size: 16231 bytes --]
Allow reusing any busy extent for metadata allocations, and reusing busy
userdata extents for userdata allocations.  Most of the complexity is in
propagating the userdata information from the XFS_BMAPI_METADATA flag
passed to xfs_bunmapi down into the low-level extent freeing routines.
After that we can simply track what type of busy extent we have and
treat it accordingly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-21 14:49:14.000000000 +0100
+++ xfs/fs/xfs/xfs_alloc.c 2011-03-21 14:51:31.746155282 +0100
@@ -1396,7 +1396,8 @@ xfs_alloc_ag_vextent_small(
if (error)
goto error0;
if (fbno != NULLAGBLOCK) {
- xfs_alloc_busy_reuse(args->tp, args->agno, fbno, 1);
+ xfs_alloc_busy_reuse(args->tp, args->agno, fbno, 1,
+ args->userdata);
if (args->userdata) {
xfs_buf_t *bp;
@@ -2431,7 +2432,8 @@ int /* error */
xfs_free_extent(
xfs_trans_t *tp, /* transaction pointer */
xfs_fsblock_t bno, /* starting block number of extent */
- xfs_extlen_t len) /* length of extent */
+ xfs_extlen_t len, /* length of extent */
+ bool userdata) /* freeing userdata extent? */
{
xfs_alloc_arg_t args;
int error;
@@ -2444,6 +2446,7 @@ xfs_free_extent(
ASSERT(args.agno < args.mp->m_sb.sb_agcount);
args.agbno = XFS_FSB_TO_AGBNO(args.mp, bno);
args.pag = xfs_perag_get(args.mp, args.agno);
+ args.userdata = userdata;
if ((error = xfs_alloc_fix_freelist(&args, XFS_ALLOC_FLAG_FREEING)))
goto error0;
#ifdef DEBUG
@@ -2453,7 +2456,7 @@ xfs_free_extent(
#endif
error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno, len, 0);
if (!error)
- xfs_alloc_busy_insert(tp, args.agno, args.agbno, len);
+ xfs_alloc_busy_insert(tp, args.agno, args.agbno, len, userdata);
error0:
xfs_perag_put(args.pag);
return error;
@@ -2464,7 +2467,8 @@ xfs_alloc_busy_insert(
struct xfs_trans *tp,
xfs_agnumber_t agno,
xfs_agblock_t bno,
- xfs_extlen_t len)
+ xfs_extlen_t len,
+ bool userdata)
{
struct xfs_busy_extent *new;
struct xfs_busy_extent *busyp;
@@ -2487,6 +2491,7 @@ xfs_alloc_busy_insert(
new->agno = agno;
new->bno = bno;
new->length = len;
+ new->flags = userdata ? XFS_ALLOC_BUSY_USERDATA : 0;
INIT_LIST_HEAD(&new->list);
/* trace before insert to be able to see failed inserts */
@@ -2688,7 +2693,8 @@ xfs_alloc_busy_reuse(
struct xfs_trans *tp,
xfs_agnumber_t agno,
xfs_agblock_t fbno,
- xfs_extlen_t flen)
+ xfs_extlen_t flen,
+ bool userdata)
{
struct xfs_perag *pag;
struct rb_node *rbp;
@@ -2717,7 +2723,7 @@ restart:
overlap = xfs_alloc_busy_try_reuse(pag, busyp,
fbno, fbno + flen);
- if (overlap) {
+ if (overlap == -1 || (overlap && userdata)) {
spin_unlock(&pag->pagb_lock);
xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
goto restart;
@@ -2754,6 +2760,7 @@ xfs_alloc_busy_trim(
ASSERT(flen > 0);
+restart:
spin_lock(&args->pag->pagb_lock);
rbp = args->pag->pagb_tree.rb_node;
while (rbp && flen >= args->minlen) {
@@ -2771,6 +2778,31 @@ xfs_alloc_busy_trim(
continue;
}
+ if (!args->userdata ||
+ (busyp->flags & XFS_ALLOC_BUSY_USERDATA)) {
+ int overlap;
+
+ overlap = xfs_alloc_busy_try_reuse(args->pag, busyp,
+ fbno, fbno + flen);
+ if (unlikely(overlap == -1)) {
+ spin_unlock(&args->pag->pagb_lock);
+ xfs_log_force(args->mp, XFS_LOG_SYNC);
+ goto restart;
+ }
+
+ /*
+ * No more busy extents to search.
+ */
+ if (bbno <= fbno && bend >= fend)
+ goto out;
+
+ if (fbno < bbno)
+ rbp = rbp->rb_left;
+ else
+ rbp = rbp->rb_right;
+ continue;
+ }
+
if (bbno <= fbno) {
/* start overlap */
@@ -2898,8 +2930,8 @@ xfs_alloc_busy_trim(
flen = fend - fbno;
}
+out:
spin_unlock(&args->pag->pagb_lock);
-
*rbno = fbno;
*rlen = flen;
return;
Index: xfs/fs/xfs/xfs_alloc.h
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.h 2011-03-21 14:48:04.000000000 +0100
+++ xfs/fs/xfs/xfs_alloc.h 2011-03-21 14:49:21.933973842 +0100
@@ -137,7 +137,7 @@ xfs_alloc_longest_free_extent(struct xfs
#ifdef __KERNEL__
void
xfs_alloc_busy_insert(struct xfs_trans *tp, xfs_agnumber_t agno,
- xfs_agblock_t bno, xfs_extlen_t len);
+ xfs_agblock_t bno, xfs_extlen_t len, bool userdata);
void
xfs_alloc_busy_clear(struct xfs_mount *mp, struct xfs_busy_extent *busyp);
@@ -148,7 +148,7 @@ xfs_alloc_busy_search(struct xfs_mount *
void
xfs_alloc_busy_reuse(struct xfs_trans *tp, xfs_agnumber_t agno,
- xfs_agblock_t fbno, xfs_extlen_t flen);
+ xfs_agblock_t fbno, xfs_extlen_t flen, bool userdata);
#endif /* __KERNEL__ */
/*
@@ -224,7 +224,8 @@ int /* error */
xfs_free_extent(
struct xfs_trans *tp, /* transaction pointer */
xfs_fsblock_t bno, /* starting block number of extent */
- xfs_extlen_t len); /* length of extent */
+ xfs_extlen_t len, /* length of extent */
+ bool userdata); /* freeing userdata extent? */
int /* error */
xfs_alloc_lookup_le(
Index: xfs/fs/xfs/xfs_alloc_btree.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc_btree.c 2011-03-21 14:48:04.000000000 +0100
+++ xfs/fs/xfs/xfs_alloc_btree.c 2011-03-21 14:49:21.937977488 +0100
@@ -95,7 +95,7 @@ xfs_allocbt_alloc_block(
return 0;
}
- xfs_alloc_busy_reuse(cur->bc_tp, cur->bc_private.a.agno, bno, 1);
+ xfs_alloc_busy_reuse(cur->bc_tp, cur->bc_private.a.agno, bno, 1, false);
xfs_trans_agbtree_delta(cur->bc_tp, 1);
new->s = cpu_to_be32(bno);
@@ -120,18 +120,8 @@ xfs_allocbt_free_block(
if (error)
return error;
- /*
- * Since blocks move to the free list without the coordination used in
- * xfs_bmap_finish, we can't allow block to be available for
- * reallocation and non-transaction writing (user data) until we know
- * that the transaction that moved it to the free list is permanently
- * on disk. We track the blocks by declaring these blocks as "busy";
- * the busy list is maintained on a per-ag basis and each transaction
- * records which entries should be removed when the iclog commits to
- * disk. If a busy block is allocated, the iclog is pushed up to the
- * LSN that freed the block.
- */
- xfs_alloc_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1);
+ xfs_alloc_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno,
+ 1, false);
xfs_trans_agbtree_delta(cur->bc_tp, -1);
return 0;
}
Index: xfs/fs/xfs/xfs_ag.h
===================================================================
--- xfs.orig/fs/xfs/xfs_ag.h 2011-03-21 14:48:04.000000000 +0100
+++ xfs/fs/xfs/xfs_ag.h 2011-03-21 14:49:21.941981228 +0100
@@ -187,6 +187,8 @@ struct xfs_busy_extent {
xfs_agnumber_t agno;
xfs_agblock_t bno;
xfs_extlen_t length;
+ unsigned int flags;
+#define XFS_ALLOC_BUSY_USERDATA 0x01 /* freed data extents */
};
/*
Index: xfs/fs/xfs/xfs_bmap.c
===================================================================
--- xfs.orig/fs/xfs/xfs_bmap.c 2011-03-21 14:48:04.238474593 +0100
+++ xfs/fs/xfs/xfs_bmap.c 2011-03-21 14:49:21.949973870 +0100
@@ -180,22 +180,6 @@ xfs_bmap_btree_to_extents(
int whichfork); /* data or attr fork */
/*
- * Called by xfs_bmapi to update file extent records and the btree
- * after removing space (or undoing a delayed allocation).
- */
-STATIC int /* error */
-xfs_bmap_del_extent(
- xfs_inode_t *ip, /* incore inode pointer */
- xfs_trans_t *tp, /* current trans pointer */
- xfs_extnum_t idx, /* extent number to update/insert */
- xfs_bmap_free_t *flist, /* list of extents to be freed */
- xfs_btree_cur_t *cur, /* if null, not a btree */
- xfs_bmbt_irec_t *new, /* new data to add to file extents */
- int *logflagsp,/* inode logging flags */
- int whichfork, /* data or attr fork */
- int rsvd); /* OK to allocate reserved blocks */
-
-/*
* Remove the entry "free" from the free item list. Prev points to the
* previous entry, unless "free" is the head of the list.
*/
@@ -2811,7 +2795,7 @@ xfs_bmap_btree_to_extents(
cblock = XFS_BUF_TO_BLOCK(cbp);
if ((error = xfs_btree_check_block(cur, cblock, 0, cbp)))
return error;
- xfs_bmap_add_free(cbno, 1, cur->bc_private.b.flist, mp);
+ xfs_bmap_add_free(mp, cur->bc_private.b.flist, cbno, 1, 0);
ip->i_d.di_nblocks--;
xfs_trans_mod_dquot_byino(tp, ip, XFS_TRANS_DQ_BCOUNT, -1L);
xfs_trans_binval(tp, cbp);
@@ -2838,8 +2822,7 @@ xfs_bmap_del_extent(
xfs_btree_cur_t *cur, /* if null, not a btree */
xfs_bmbt_irec_t *del, /* data to remove from extents */
int *logflagsp, /* inode logging flags */
- int whichfork, /* data or attr fork */
- int rsvd) /* OK to allocate reserved blocks */
+ int flags) /* XFS_BMAPI_* flags */
{
xfs_filblks_t da_new; /* new delay-alloc indirect blocks */
xfs_filblks_t da_old; /* old delay-alloc indirect blocks */
@@ -2849,7 +2832,6 @@ xfs_bmap_del_extent(
int do_fx; /* free extent at end of routine */
xfs_bmbt_rec_host_t *ep; /* current extent entry pointer */
int error; /* error return value */
- int flags; /* inode logging flags */
xfs_bmbt_irec_t got; /* current extent entry */
xfs_fileoff_t got_endoff; /* first offset past got */
int i; /* temp state */
@@ -2861,12 +2843,17 @@ xfs_bmap_del_extent(
uint qfield; /* quota field to update */
xfs_filblks_t temp; /* for indirect length calculations */
xfs_filblks_t temp2; /* for indirect length calculations */
- int state = 0;
+ int state, whichfork;
XFS_STATS_INC(xs_del_exlist);
- if (whichfork == XFS_ATTR_FORK)
- state |= BMAP_ATTRFORK;
+ if (flags & XFS_BMAPI_ATTRFORK) {
+ whichfork = XFS_ATTR_FORK;
+ state = BMAP_ATTRFORK;
+ } else {
+ whichfork = XFS_DATA_FORK;
+ state = 0;
+ }
mp = ip->i_mount;
ifp = XFS_IFORK_PTR(ip, whichfork);
@@ -3121,9 +3108,13 @@ xfs_bmap_del_extent(
/*
* If we need to, add to list of extents to delete.
*/
- if (do_fx)
- xfs_bmap_add_free(del->br_startblock, del->br_blockcount, flist,
- mp);
+ if (do_fx) {
+ xfs_bmap_add_free(mp, flist, del->br_startblock,
+ del->br_blockcount,
+ (flags & XFS_BMAPI_METADATA) ? 0 :
+ XFS_BFI_USERDATA);
+ }
+
/*
* Adjust inode # blocks in the file.
*/
@@ -3142,7 +3133,9 @@ xfs_bmap_del_extent(
ASSERT(da_old >= da_new);
if (da_old > da_new) {
xfs_icsb_modify_counters(mp, XFS_SBS_FDBLOCKS,
- (int64_t)(da_old - da_new), rsvd);
+ (int64_t)(da_old - da_new),
+ !!(flags & XFS_BMAPI_RSVBLOCKS));
+
}
done:
*logflagsp = flags;
@@ -3723,10 +3716,11 @@ error0:
/* ARGSUSED */
void
xfs_bmap_add_free(
+ struct xfs_mount *mp, /* mount point structure */
+ struct xfs_bmap_free *flist, /* list of extents */
xfs_fsblock_t bno, /* fs block number of extent */
xfs_filblks_t len, /* length of extent */
- xfs_bmap_free_t *flist, /* list of extents */
- xfs_mount_t *mp) /* mount point structure */
+ unsigned int flags)
{
xfs_bmap_free_item_t *cur; /* current (next) element */
xfs_bmap_free_item_t *new; /* new element */
@@ -3750,6 +3744,7 @@ xfs_bmap_add_free(
new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
new->xbfi_startblock = bno;
new->xbfi_blockcount = (xfs_extlen_t)len;
+ new->xbfi_flags = flags;
for (prev = NULL, cur = flist->xbf_first;
cur != NULL;
prev = cur, cur = cur->xbfi_next) {
@@ -3883,8 +3878,11 @@ xfs_bmap_finish(
efd = xfs_trans_get_efd(ntp, efi, flist->xbf_count);
for (free = flist->xbf_first; free != NULL; free = next) {
next = free->xbfi_next;
- if ((error = xfs_free_extent(ntp, free->xbfi_startblock,
- free->xbfi_blockcount))) {
+
+ error = xfs_free_extent(ntp, free->xbfi_startblock,
+ free->xbfi_blockcount,
+ !!(free->xbfi_flags & XFS_BFI_USERDATA));
+ if (error) {
/*
* The bmap free list will be cleaned up at a
* higher level. The EFI will be canceled when
@@ -5278,7 +5276,7 @@ xfs_bunmapi(
goto error0;
}
error = xfs_bmap_del_extent(ip, tp, lastx, flist, cur, &del,
- &tmp_logflags, whichfork, rsvd);
+ &tmp_logflags, flags);
logflags |= tmp_logflags;
if (error)
goto error0;
Index: xfs/fs/xfs/xfs_bmap.h
===================================================================
--- xfs.orig/fs/xfs/xfs_bmap.h 2011-03-21 14:48:04.000000000 +0100
+++ xfs/fs/xfs/xfs_bmap.h 2011-03-21 14:49:21.953974825 +0100
@@ -35,6 +35,8 @@ typedef struct xfs_bmap_free_item
{
xfs_fsblock_t xbfi_startblock;/* starting fs block number */
xfs_extlen_t xbfi_blockcount;/* number of blocks in extent */
+ unsigned int xbfi_flags;
+#define XFS_BFI_USERDATA 0x01 /* userdata extent */
struct xfs_bmap_free_item *xbfi_next; /* link to next entry */
} xfs_bmap_free_item_t;
@@ -188,10 +190,11 @@ xfs_bmap_add_attrfork(
*/
void
xfs_bmap_add_free(
+ struct xfs_mount *mp, /* mount point structure */
+ struct xfs_bmap_free *flist, /* list of extents */
xfs_fsblock_t bno, /* fs block number of extent */
xfs_filblks_t len, /* length of extent */
- xfs_bmap_free_t *flist, /* list of extents */
- struct xfs_mount *mp); /* mount point structure */
+ unsigned int flags);
/*
* Routine to clean up the free list data structure when
Index: xfs/fs/xfs/xfs_bmap_btree.c
===================================================================
--- xfs.orig/fs/xfs/xfs_bmap_btree.c 2011-03-21 14:48:04.000000000 +0100
+++ xfs/fs/xfs/xfs_bmap_btree.c 2011-03-21 14:49:21.957973331 +0100
@@ -598,7 +598,7 @@ xfs_bmbt_free_block(
struct xfs_trans *tp = cur->bc_tp;
xfs_fsblock_t fsbno = XFS_DADDR_TO_FSB(mp, XFS_BUF_ADDR(bp));
- xfs_bmap_add_free(fsbno, 1, cur->bc_private.b.flist, mp);
+ xfs_bmap_add_free(mp, cur->bc_private.b.flist, fsbno, 1, 0);
ip->i_d.di_nblocks--;
xfs_trans_log_inode(tp, ip, XFS_ILOG_CORE);
Index: xfs/fs/xfs/xfs_fsops.c
===================================================================
--- xfs.orig/fs/xfs/xfs_fsops.c 2011-03-21 14:48:04.000000000 +0100
+++ xfs/fs/xfs/xfs_fsops.c 2011-03-21 14:49:21.961974307 +0100
@@ -344,7 +344,7 @@ xfs_growfs_data_private(
* Free the new space.
*/
error = xfs_free_extent(tp, XFS_AGB_TO_FSB(mp, agno,
- be32_to_cpu(agf->agf_length) - new), new);
+ be32_to_cpu(agf->agf_length) - new), new, false);
if (error) {
goto error0;
}
Index: xfs/fs/xfs/xfs_ialloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_ialloc.c 2011-03-21 14:48:04.000000000 +0100
+++ xfs/fs/xfs/xfs_ialloc.c 2011-03-21 14:49:21.965975280 +0100
@@ -1154,9 +1154,10 @@ xfs_difree(
goto error0;
}
- xfs_bmap_add_free(XFS_AGB_TO_FSB(mp,
- agno, XFS_INO_TO_AGBNO(mp,rec.ir_startino)),
- XFS_IALLOC_BLOCKS(mp), flist, mp);
+ xfs_bmap_add_free(mp, flist,
+ XFS_AGB_TO_FSB(mp, agno,
+ XFS_INO_TO_AGBNO(mp,rec.ir_startino)),
+ XFS_IALLOC_BLOCKS(mp), 0);
} else {
*delete = 0;
Index: xfs/fs/xfs/xfs_ialloc_btree.c
===================================================================
--- xfs.orig/fs/xfs/xfs_ialloc_btree.c 2011-03-21 14:48:04.000000000 +0100
+++ xfs/fs/xfs/xfs_ialloc_btree.c 2011-03-21 14:49:21.973975583 +0100
@@ -117,7 +117,7 @@ xfs_inobt_free_block(
int error;
fsbno = XFS_DADDR_TO_FSB(cur->bc_mp, XFS_BUF_ADDR(bp));
- error = xfs_free_extent(cur->bc_tp, fsbno, 1);
+ error = xfs_free_extent(cur->bc_tp, fsbno, 1, false);
if (error)
return error;
Index: xfs/fs/xfs/xfs_log_recover.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log_recover.c 2011-03-21 14:48:04.000000000 +0100
+++ xfs/fs/xfs/xfs_log_recover.c 2011-03-21 14:49:21.973975583 +0100
@@ -2907,8 +2907,9 @@ xlog_recover_process_efi(
efdp = xfs_trans_get_efd(tp, efip, efip->efi_format.efi_nextents);
for (i = 0; i < efip->efi_format.efi_nextents; i++) {
- extp = &(efip->efi_format.efi_extents[i]);
- error = xfs_free_extent(tp, extp->ext_start, extp->ext_len);
+ extp = &efip->efi_format.efi_extents[i];
+ error = xfs_free_extent(tp, extp->ext_start, extp->ext_len,
+ false);
if (error)
goto abort_error;
xfs_trans_log_efd_extent(tp, efdp, extp->ext_start,
* [PATCH 5/6] xfs: add online discard support
2011-03-22 19:55 [PATCH 0/6] [PATCH 0/6] more efficient busy extent handling and discard support Christoph Hellwig
` (3 preceding siblings ...)
2011-03-22 19:55 ` [PATCH 4/6] xfs: allow reusing busy extents where safe Christoph Hellwig
@ 2011-03-22 19:55 ` Christoph Hellwig
2011-03-23 0:30 ` Dave Chinner
2011-03-25 21:04 ` Alex Elder
2011-03-22 19:55 ` [PATCH 6/6] xfs: make discard operations asynchronous Christoph Hellwig
5 siblings, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-22 19:55 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-add-online-discard-support --]
[-- Type: text/plain, Size: 6812 bytes --]
Now that we have reliable tracking of deleted extents in a transaction
we can easily implement "online" discard support, which calls
blkdev_issue_discard once a transaction commits.
The actual discard is a two-stage operation: we first have to mark
the busy extent as not available for reuse before we can start the
actual discard. Note that we don't bother supporting discard for
the non-delaylog mode. While that would be possible with this patch,
performance is awful, and the optimization in the next patch won't
work as easily.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/linux-2.6/xfs_super.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_super.c 2011-03-21 14:51:40.806473979 +0100
+++ xfs/fs/xfs/linux-2.6/xfs_super.c 2011-03-21 14:51:41.437974814 +0100
@@ -112,6 +112,8 @@ mempool_t *xfs_ioend_pool;
#define MNTOPT_QUOTANOENF "qnoenforce" /* same as uqnoenforce */
#define MNTOPT_DELAYLOG "delaylog" /* Delayed loging enabled */
#define MNTOPT_NODELAYLOG "nodelaylog" /* Delayed loging disabled */
+#define MNTOPT_DISCARD "discard" /* Discard unused blocks */
+#define MNTOPT_NODISCARD "nodiscard" /* Do not discard unused blocks */
/*
* Table driven mount option parser.
@@ -356,6 +358,10 @@ xfs_parseargs(
mp->m_flags |= XFS_MOUNT_DELAYLOG;
} else if (!strcmp(this_char, MNTOPT_NODELAYLOG)) {
mp->m_flags &= ~XFS_MOUNT_DELAYLOG;
+ } else if (!strcmp(this_char, MNTOPT_DISCARD)) {
+ mp->m_flags |= XFS_MOUNT_DISCARD;
+ } else if (!strcmp(this_char, MNTOPT_NODISCARD)) {
+ mp->m_flags &= ~XFS_MOUNT_DISCARD;
} else if (!strcmp(this_char, "ihashsize")) {
xfs_warn(mp,
"ihashsize no longer used, option is deprecated.");
@@ -489,6 +495,7 @@ xfs_showargs(
{ XFS_MOUNT_FILESTREAMS, "," MNTOPT_FILESTREAM },
{ XFS_MOUNT_GRPID, "," MNTOPT_GRPID },
{ XFS_MOUNT_DELAYLOG, "," MNTOPT_DELAYLOG },
+ { XFS_MOUNT_DISCARD, "," MNTOPT_DISCARD },
{ 0, NULL }
};
static struct proc_xfs_info xfs_info_unset[] = {
Index: xfs/fs/xfs/xfs_mount.h
===================================================================
--- xfs.orig/fs/xfs/xfs_mount.h 2011-03-21 14:47:03.630474095 +0100
+++ xfs/fs/xfs/xfs_mount.h 2011-03-21 14:51:41.437974814 +0100
@@ -227,6 +227,7 @@ typedef struct xfs_mount {
#define XFS_MOUNT_FS_SHUTDOWN (1ULL << 4) /* atomic stop of all filesystem
operations, typically for
disk errors in metadata */
+#define XFS_MOUNT_DISCARD (1ULL << 5) /* discard unused blocks */
#define XFS_MOUNT_RETERR (1ULL << 6) /* return alignment errors to
user */
#define XFS_MOUNT_NOALIGN (1ULL << 7) /* turn off stripe alignment
Index: xfs/fs/xfs/xfs_log_cil.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log_cil.c 2011-03-21 14:47:03.638475274 +0100
+++ xfs/fs/xfs/xfs_log_cil.c 2011-03-21 14:51:41.441975564 +0100
@@ -29,6 +29,7 @@
#include "xfs_mount.h"
#include "xfs_error.h"
#include "xfs_alloc.h"
+#include "xfs_discard.h"
/*
* Perform initial CIL structure initialisation. If the CIL is not
@@ -361,13 +362,17 @@ xlog_cil_committed(
int abort)
{
struct xfs_cil_ctx *ctx = args;
+ struct xfs_mount *mp = ctx->cil->xc_log->l_mp;
struct xfs_busy_extent *busyp, *n;
xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, ctx->lv_chain,
ctx->start_lsn, abort);
- list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list)
- xfs_alloc_busy_clear(ctx->cil->xc_log->l_mp, busyp);
+ list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list) {
+ if (!abort)
+ xfs_discard_extent(mp, busyp);
+ xfs_alloc_busy_clear(mp, busyp);
+ }
spin_lock(&ctx->cil->xc_cil_lock);
list_del(&ctx->committing);
Index: xfs/fs/xfs/linux-2.6/xfs_discard.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_discard.c 2011-03-21 14:47:03.614474345 +0100
+++ xfs/fs/xfs/linux-2.6/xfs_discard.c 2011-03-21 14:51:41.449976366 +0100
@@ -191,3 +191,38 @@ xfs_ioc_trim(
return -XFS_ERROR(EFAULT);
return 0;
}
+
+int
+xfs_discard_extent(
+ struct xfs_mount *mp,
+ struct xfs_busy_extent *busyp)
+{
+ struct xfs_perag *pag;
+ int error = 0;
+ xfs_daddr_t bno;
+ int64_t len;
+ bool done = false;
+
+ if ((mp->m_flags & XFS_MOUNT_DISCARD) == 0)
+ return 0;
+
+ bno = XFS_AGB_TO_DADDR(mp, busyp->agno, busyp->bno);
+ len = XFS_FSB_TO_BB(mp, busyp->length);
+
+ pag = xfs_perag_get(mp, busyp->agno);
+ spin_lock(&pag->pagb_lock);
+ if (!busyp->length)
+ done = true;
+ busyp->flags = XFS_ALLOC_BUSY_DISCARDED;
+ spin_unlock(&pag->pagb_lock);
+ xfs_perag_put(pag);
+
+ if (done)
+ return 0;
+
+ error = -blkdev_issue_discard(mp->m_ddev_targp->bt_bdev, bno, len,
+ GFP_NOFS, 0);
+ if (error && error != EOPNOTSUPP)
+ xfs_info(mp, "discard failed, error %d", error);
+ return error;
+}
Index: xfs/fs/xfs/linux-2.6/xfs_discard.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_discard.h 2011-03-21 14:47:03.622475346 +0100
+++ xfs/fs/xfs/linux-2.6/xfs_discard.h 2011-03-21 14:51:41.453973708 +0100
@@ -2,7 +2,11 @@
#define XFS_DISCARD_H 1
struct fstrim_range;
+struct xfs_busy_extent;
extern int xfs_ioc_trim(struct xfs_mount *, struct fstrim_range __user *);
+extern int xfs_discard_extent(struct xfs_mount *,
+ struct xfs_busy_extent *);
+
#endif /* XFS_DISCARD_H */
Index: xfs/fs/xfs/xfs_ag.h
===================================================================
--- xfs.orig/fs/xfs/xfs_ag.h 2011-03-21 14:49:21.941981228 +0100
+++ xfs/fs/xfs/xfs_ag.h 2011-03-21 14:51:41.457976522 +0100
@@ -189,6 +189,7 @@ struct xfs_busy_extent {
xfs_extlen_t length;
unsigned int flags;
#define XFS_ALLOC_BUSY_USERDATA 0x01 /* freed data extents */
+#define XFS_ALLOC_BUSY_DISCARDED 0x02 /* can't be reused */
};
/*
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-21 14:51:31.746155282 +0100
+++ xfs/fs/xfs/xfs_alloc.c 2011-03-21 14:51:41.457976522 +0100
@@ -2586,6 +2586,9 @@ xfs_alloc_busy_try_reuse(
xfs_agblock_t bbno = busyp->bno;
xfs_agblock_t bend = bbno + busyp->length;
+ if (busyp->flags & XFS_ALLOC_BUSY_DISCARDED)
+ return -1;
+
if (bbno < fbno && bend > fend) {
/*
* Case 1:
@@ -2778,8 +2781,9 @@ restart:
continue;
}
- if (!args->userdata ||
- (busyp->flags & XFS_ALLOC_BUSY_USERDATA)) {
+ if (!(busyp->flags & XFS_ALLOC_BUSY_DISCARDED) &&
+ (!args->userdata ||
+ (busyp->flags & XFS_ALLOC_BUSY_USERDATA))) {
int overlap;
overlap = xfs_alloc_busy_try_reuse(args->pag, busyp,
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
* [PATCH 6/6] xfs: make discard operations asynchronous
2011-03-22 19:55 [PATCH 0/6] [PATCH 0/6] more efficient busy extent handling and discard support Christoph Hellwig
` (4 preceding siblings ...)
2011-03-22 19:55 ` [PATCH 5/6] xfs: add online discard support Christoph Hellwig
@ 2011-03-22 19:55 ` Christoph Hellwig
2011-03-23 0:43 ` Dave Chinner
2011-03-25 21:04 ` Alex Elder
5 siblings, 2 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-22 19:55 UTC (permalink / raw)
To: xfs
[-- Attachment #1: xfs-async-discard-v2 --]
[-- Type: text/plain, Size: 10195 bytes --]
Instead of waiting for each discard request, keep the CIL context alive
until all of them are done, at which point we can tear it down completely
and remove the busy extents from the rbtree.
At this point I'm doing the I/O completion from IRQ context for simplicity,
but I'll benchmark it against a version that uses a workqueue.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Index: xfs/fs/xfs/linux-2.6/xfs_discard.c
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_discard.c 2011-03-22 15:58:10.301855813 +0100
+++ xfs/fs/xfs/linux-2.6/xfs_discard.c 2011-03-22 18:39:09.000000000 +0100
@@ -30,6 +30,7 @@
#include "xfs_inode.h"
#include "xfs_alloc.h"
#include "xfs_error.h"
+#include "xfs_log_priv.h"
#include "xfs_discard.h"
#include "xfs_trace.h"
@@ -192,37 +193,119 @@ xfs_ioc_trim(
return 0;
}
+void
+xfs_cil_discard_done(
+ struct xfs_cil_ctx *ctx)
+{
+ if (atomic_dec_and_test(&ctx->discards)) {
+ struct xfs_busy_extent *busyp, *n;
+
+ list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list)
+ xfs_alloc_busy_clear(ctx->cil->xc_log->l_mp, busyp);
+ kmem_free(ctx);
+ }
+}
+
+STATIC void
+xfs_discard_end_io(
+ struct bio *bio,
+ int err)
+{
+ struct xfs_cil_ctx *ctx = bio->bi_private;
+
+ if (err && err != -EOPNOTSUPP) {
+ xfs_info(ctx->cil->xc_log->l_mp,
+ "I/O error during discard\n");
+ }
+
+ bio_put(bio);
+ xfs_cil_discard_done(ctx);
+}
+
+static int
+xfs_issue_discard(
+ struct block_device *bdev,
+ sector_t sector,
+ sector_t nr_sects,
+ gfp_t gfp_mask,
+ struct xfs_cil_ctx *ctx)
+{
+ struct request_queue *q = bdev_get_queue(bdev);
+ unsigned int max_discard_sectors;
+ struct bio *bio;
+ int ret = 0;
+
+ if (!q)
+ return -ENXIO;
+
+ if (!blk_queue_discard(q))
+ return -EOPNOTSUPP;
+
+ /*
+ * Ensure that max_discard_sectors is of the proper
+ * granularity
+ */
+ max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
+ if (q->limits.discard_granularity) {
+ unsigned int disc_sects = q->limits.discard_granularity >> 9;
+
+ max_discard_sectors &= ~(disc_sects - 1);
+ }
+
+
+ while (nr_sects && !ret) {
+ bio = bio_alloc(gfp_mask, 1);
+ if (!bio) {
+ ret = -ENOMEM;
+ break;
+ }
+
+ bio->bi_sector = sector;
+ bio->bi_end_io = xfs_discard_end_io;
+ bio->bi_bdev = bdev;
+ bio->bi_private = ctx;
+
+ if (nr_sects > max_discard_sectors) {
+ bio->bi_size = max_discard_sectors << 9;
+ nr_sects -= max_discard_sectors;
+ sector += max_discard_sectors;
+ } else {
+ bio->bi_size = nr_sects << 9;
+ nr_sects = 0;
+ }
+
+ atomic_inc(&ctx->discards);
+ submit_bio(REQ_WRITE | REQ_DISCARD, bio);
+ }
+
+ return ret;
+}
+
int
xfs_discard_extent(
struct xfs_mount *mp,
- struct xfs_busy_extent *busyp)
+ struct xfs_busy_extent *busyp,
+ struct xfs_cil_ctx *ctx)
{
struct xfs_perag *pag;
- int error = 0;
xfs_daddr_t bno;
int64_t len;
bool done = false;
- if ((mp->m_flags & XFS_MOUNT_DISCARD) == 0)
- return 0;
-
bno = XFS_AGB_TO_DADDR(mp, busyp->agno, busyp->bno);
len = XFS_FSB_TO_BB(mp, busyp->length);
pag = xfs_perag_get(mp, busyp->agno);
- spin_lock(&pag->pagb_lock);
+ spin_lock_irq(&pag->pagb_lock);
if (!busyp->length)
done = true;
busyp->flags = XFS_ALLOC_BUSY_DISCARDED;
- spin_unlock(&pag->pagb_lock);
+ spin_unlock_irq(&pag->pagb_lock);
xfs_perag_put(pag);
if (done)
return 0;
- error = -blkdev_issue_discard(mp->m_ddev_targp->bt_bdev, bno, len,
- GFP_NOFS, 0);
- if (error && error != EOPNOTSUPP)
- xfs_info(mp, "discard failed, error %d", error);
- return error;
+ return -xfs_issue_discard(mp->m_ddev_targp->bt_bdev,
+ bno, len, GFP_NOFS, ctx);
}
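The splitting loop in xfs_issue_discard() above caps each bio at the queue's discard limit, rounded down to the discard granularity. A hedged user-space model of just that arithmetic (function and parameter names here are illustrative, not kernel APIs):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the request-splitting arithmetic in xfs_issue_discard():
 * clamp the per-bio size to the queue limit, rounded down to the discard
 * granularity (assumed to be a power of two no larger than the limit,
 * as the bitmask trick in the patch requires), then walk the range in
 * chunks. Returns the number of bios that would be submitted. */
static unsigned int count_discard_bios(uint64_t nr_sects,
				       unsigned int max_discard_sectors,
				       unsigned int granularity_sects)
{
	unsigned int nbios = 0;

	if (granularity_sects)
		max_discard_sectors &= ~(granularity_sects - 1);

	while (nr_sects) {
		uint64_t chunk = nr_sects > max_discard_sectors ?
				 max_discard_sectors : nr_sects;
		nr_sects -= chunk;
		nbios++;
	}
	return nbios;
}
```

E.g. 200 sectors against a limit of 70 with 16-sector granularity rounds the limit down to 64 and needs four bios (64 + 64 + 64 + 8), each of which takes a reference on the CIL context in the real code.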
Index: xfs/fs/xfs/linux-2.6/xfs_discard.h
===================================================================
--- xfs.orig/fs/xfs/linux-2.6/xfs_discard.h 2011-03-22 15:58:10.313857879 +0100
+++ xfs/fs/xfs/linux-2.6/xfs_discard.h 2011-03-22 18:39:09.000000000 +0100
@@ -3,10 +3,13 @@
struct fstrim_range;
struct xfs_busy_extent;
+struct xfs_cil_ctx;
extern int xfs_ioc_trim(struct xfs_mount *, struct fstrim_range __user *);
extern int xfs_discard_extent(struct xfs_mount *,
- struct xfs_busy_extent *);
+ struct xfs_busy_extent *,
+ struct xfs_cil_ctx *);
+extern void xfs_cil_discard_done(struct xfs_cil_ctx *ctx);
#endif /* XFS_DISCARD_H */
Index: xfs/fs/xfs/xfs_log_cil.c
===================================================================
--- xfs.orig/fs/xfs/xfs_log_cil.c 2011-03-22 15:58:10.329855977 +0100
+++ xfs/fs/xfs/xfs_log_cil.c 2011-03-22 18:39:09.000000000 +0100
@@ -68,6 +68,7 @@ xlog_cil_init(
INIT_LIST_HEAD(&ctx->busy_extents);
ctx->sequence = 1;
ctx->cil = cil;
+ atomic_set(&ctx->discards, 1);
cil->xc_ctx = ctx;
cil->xc_current_sequence = ctx->sequence;
@@ -364,14 +365,18 @@ xlog_cil_committed(
struct xfs_cil_ctx *ctx = args;
struct xfs_mount *mp = ctx->cil->xc_log->l_mp;
struct xfs_busy_extent *busyp, *n;
+ bool keep_alive = false;
xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, ctx->lv_chain,
ctx->start_lsn, abort);
- list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list) {
- if (!abort)
- xfs_discard_extent(mp, busyp);
- xfs_alloc_busy_clear(mp, busyp);
+ if (!(mp->m_flags & XFS_MOUNT_DISCARD) || abort) {
+ list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list)
+ xfs_alloc_busy_clear(mp, busyp);
+ } else if (!list_empty(&ctx->busy_extents)) {
+ list_for_each_entry(busyp, &ctx->busy_extents, list)
+ xfs_discard_extent(mp, busyp, ctx);
+ keep_alive = true;
}
spin_lock(&ctx->cil->xc_cil_lock);
@@ -379,7 +384,10 @@ xlog_cil_committed(
spin_unlock(&ctx->cil->xc_cil_lock);
xlog_cil_free_logvec(ctx->lv_chain);
- kmem_free(ctx);
+ if (keep_alive)
+ xfs_cil_discard_done(ctx);
+ else
+ kmem_free(ctx);
}
/*
@@ -490,6 +498,7 @@ xlog_cil_push(
INIT_LIST_HEAD(&new_ctx->busy_extents);
new_ctx->sequence = ctx->sequence + 1;
new_ctx->cil = cil;
+ atomic_set(&ctx->discards, 1);
cil->xc_ctx = new_ctx;
/*
Index: xfs/fs/xfs/xfs_alloc.c
===================================================================
--- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-22 18:39:05.173855849 +0100
+++ xfs/fs/xfs/xfs_alloc.c 2011-03-22 18:39:09.000000000 +0100
@@ -2498,7 +2498,7 @@ xfs_alloc_busy_insert(
trace_xfs_alloc_busy(tp, agno, bno, len, 0);
pag = xfs_perag_get(tp->t_mountp, new->agno);
- spin_lock(&pag->pagb_lock);
+ spin_lock_irq(&pag->pagb_lock);
rbp = &pag->pagb_tree.rb_node;
while (*rbp) {
parent = *rbp;
@@ -2521,7 +2521,7 @@ xfs_alloc_busy_insert(
rb_insert_color(&new->rb_node, &pag->pagb_tree);
list_add(&new->list, &tp->t_busy);
- spin_unlock(&pag->pagb_lock);
+ spin_unlock_irq(&pag->pagb_lock);
xfs_perag_put(pag);
}
@@ -2547,7 +2547,7 @@ xfs_alloc_busy_search(
int match = 0;
pag = xfs_perag_get(mp, agno);
- spin_lock(&pag->pagb_lock);
+ spin_lock_irq(&pag->pagb_lock);
rbp = pag->pagb_tree.rb_node;
@@ -2570,7 +2570,7 @@ xfs_alloc_busy_search(
break;
}
}
- spin_unlock(&pag->pagb_lock);
+ spin_unlock_irq(&pag->pagb_lock);
trace_xfs_alloc_busysearch(mp, agno, bno, len, !!match);
xfs_perag_put(pag);
return match;
@@ -2706,7 +2706,7 @@ xfs_alloc_busy_reuse(
pag = xfs_perag_get(tp->t_mountp, agno);
restart:
- spin_lock(&pag->pagb_lock);
+ spin_lock_irq(&pag->pagb_lock);
rbp = pag->pagb_tree.rb_node;
while (rbp) {
struct xfs_busy_extent *busyp =
@@ -2727,7 +2727,7 @@ restart:
overlap = xfs_alloc_busy_try_reuse(pag, busyp,
fbno, fbno + flen);
if (overlap == -1 || (overlap && userdata)) {
- spin_unlock(&pag->pagb_lock);
+ spin_unlock_irq(&pag->pagb_lock);
xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
goto restart;
}
@@ -2743,7 +2743,7 @@ restart:
else
rbp = rbp->rb_right;
}
- spin_unlock(&pag->pagb_lock);
+ spin_unlock_irq(&pag->pagb_lock);
xfs_perag_put(pag);
}
@@ -2764,7 +2764,7 @@ xfs_alloc_busy_trim(
ASSERT(flen > 0);
restart:
- spin_lock(&args->pag->pagb_lock);
+ spin_lock_irq(&args->pag->pagb_lock);
rbp = args->pag->pagb_tree.rb_node;
while (rbp && flen >= args->minlen) {
struct xfs_busy_extent *busyp =
@@ -2789,7 +2789,7 @@ restart:
overlap = xfs_alloc_busy_try_reuse(args->pag, busyp,
fbno, fbno + flen);
if (unlikely(overlap == -1)) {
- spin_unlock(&args->pag->pagb_lock);
+ spin_unlock_irq(&args->pag->pagb_lock);
xfs_log_force(args->mp, XFS_LOG_SYNC);
goto restart;
}
@@ -2935,7 +2935,7 @@ restart:
flen = fend - fbno;
}
out:
- spin_unlock(&args->pag->pagb_lock);
+ spin_unlock_irq(&args->pag->pagb_lock);
*rbno = fbno;
*rlen = flen;
return;
@@ -2944,7 +2944,7 @@ fail:
* Return a zero extent length as failure indications. All callers
* re-check if the trimmed extent satisfies the minlen requirement.
*/
- spin_unlock(&args->pag->pagb_lock);
+ spin_unlock_irq(&args->pag->pagb_lock);
*rbno = fbno;
*rlen = 0;
}
@@ -2955,6 +2955,7 @@ xfs_alloc_busy_clear(
struct xfs_busy_extent *busyp)
{
struct xfs_perag *pag;
+ unsigned long flags;
trace_xfs_alloc_unbusy(mp, busyp->agno, busyp->bno,
busyp->length);
@@ -2962,10 +2963,10 @@ xfs_alloc_busy_clear(
list_del_init(&busyp->list);
pag = xfs_perag_get(mp, busyp->agno);
- spin_lock(&pag->pagb_lock);
+ spin_lock_irqsave(&pag->pagb_lock, flags);
if (busyp->length)
rb_erase(&busyp->rb_node, &pag->pagb_tree);
- spin_unlock(&pag->pagb_lock);
+ spin_unlock_irqrestore(&pag->pagb_lock, flags);
xfs_perag_put(pag);
kmem_free(busyp);
Index: xfs/fs/xfs/xfs_log_priv.h
===================================================================
--- xfs.orig/fs/xfs/xfs_log_priv.h 2011-03-22 18:39:05.229883275 +0100
+++ xfs/fs/xfs/xfs_log_priv.h 2011-03-22 18:39:09.000000000 +0100
@@ -389,6 +389,7 @@ struct xfs_cil_ctx {
struct xfs_log_vec *lv_chain; /* logvecs being pushed */
xfs_log_callback_t log_cb; /* completion callback hook. */
struct list_head committing; /* ctx committing list */
+ atomic_t discards; /* no. of pending discards */
};
/*
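The ctx->discards counter above follows a common reference-counting pattern: it starts at 1 (the commit path's own reference), each submitted bio takes another, and teardown happens only on the final dec-and-test. A minimal user-space sketch of that scheme, using C11 atomics in place of the kernel's atomic_t (names are illustrative):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Illustrative model of the ctx->discards counting scheme: the commit
 * path holds one reference from the start, each in-flight discard bio
 * holds another, and teardown happens exactly once, on the final drop. */
struct ctx_model {
	atomic_int discards;
	bool torn_down;
};

static void submit_discard(struct ctx_model *ctx)
{
	atomic_fetch_add(&ctx->discards, 1);	/* bio takes a reference */
}

static void discard_done(struct ctx_model *ctx)
{
	/* atomic_fetch_sub returns the old value; 1 means we were last */
	if (atomic_fetch_sub(&ctx->discards, 1) == 1)
		ctx->torn_down = true;		/* last reference: free ctx */
}
```

The initial count of 1 is what lets xlog_cil_committed() submit any number of bios and then safely drop its own reference via xfs_cil_discard_done(): the context cannot be freed out from under a completion that has not run yet.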
* Re: [PATCH 1/6] xfs: optimize AGFL refills
2011-03-22 19:55 ` [PATCH 1/6] xfs: optimize AGFL refills Christoph Hellwig
@ 2011-03-22 22:30 ` Alex Elder
2011-03-23 12:16 ` Christoph Hellwig
2011-03-22 23:30 ` Dave Chinner
1 sibling, 1 reply; 31+ messages in thread
From: Alex Elder @ 2011-03-22 22:30 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, 2011-03-22 at 15:55 -0400, Christoph Hellwig wrote:
> Avoid forcing out busy extent when moving blocks from/to the AGFL. We
Without thinking about it a bit, it wasn't clear why this
was OK to do. It would be nice to record a one-sentence
justification of why we don't need to force busy extents out
in this case.
> archive this my moving the busy search out of xfs_alloc_get_freelist into
achieve this
> the callers that need it, and by moving the busy list insert from
> xfs_free_ag_extent extent which is used both by AGFL refills and real
^^^^^ drop this word
> allocation to xfs_free_extent, which is only used by the latter.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Otherwise the change itself looks good.
Reviewed-by: Alex Elder <aelder@sgi.com>
. . .
* Re: [PATCH 2/6] xfs: do not immediately reuse busy extent ranges
2011-03-22 19:55 ` [PATCH 2/6] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
@ 2011-03-22 22:30 ` Alex Elder
2011-03-23 12:17 ` Christoph Hellwig
2011-03-25 21:03 ` Alex Elder
1 sibling, 1 reply; 31+ messages in thread
From: Alex Elder @ 2011-03-22 22:30 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, 2011-03-22 at 15:55 -0400, Christoph Hellwig wrote:
> plain text document attachment (xfs-skip-busy-extents)
> Every time we reallocate a busy extent, we cause a synchronous log force
> to occur to ensure the freeing transaction is on disk before we continue
> and use the newly allocated extent. This is extremely sub-optimal as we
> have to mark every transaction with blocks that get reused as synchronous.
>
> Instead of searching the busy extent list after deciding on the extent to
> allocate, check each candidate extent during the allocation decisions as
> to whether they are in the busy list. If they are in the busy list, we
> trim the busy range out of the extent we have found and determine if that
> trimmed range is still OK for allocation. In many cases, this check can
> be incorporated into the allocation extent alignment code which already
> does trimming of the found extent before determining if it is a valid
> candidate for allocation.
>
> Based on two earlier patches from Dave Chinner.
Again, this looks nearly identical to your last version.
On that one, I suggested rewording a comment, and you
said "ok." You did not do so in this version, which I
guess is fine. I just want to know whether you intended
to. If so I'll give you a chance to post an update; if
not I can take this one as-is.
(I think I'll have the rest of the series reviewed tomorrow.)
Reviewed-by: Alex Elder <aelder@sgi.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
. . .
* Re: [PATCH 1/6] xfs: optimize AGFL refills
2011-03-22 19:55 ` [PATCH 1/6] xfs: optimize AGFL refills Christoph Hellwig
2011-03-22 22:30 ` Alex Elder
@ 2011-03-22 23:30 ` Dave Chinner
2011-03-23 12:16 ` Christoph Hellwig
1 sibling, 1 reply; 31+ messages in thread
From: Dave Chinner @ 2011-03-22 23:30 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, Mar 22, 2011 at 03:55:51PM -0400, Christoph Hellwig wrote:
> Avoid forcing out busy extent when moving blocks from/to the AGFL. We
> archive this my moving the busy search out of xfs_alloc_get_freelist into
> the callers that need it, and by moving the busy list insert from
> xfs_free_ag_extent extent which is used both by AGFL refills and real
> allocation to xfs_free_extent, which is only used by the latter.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Index: xfs/fs/xfs/xfs_alloc.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-19 16:49:23.774797370 +0100
> +++ xfs/fs/xfs/xfs_alloc.c 2011-03-19 16:49:38.882797272 +0100
> @@ -1326,6 +1326,8 @@ xfs_alloc_ag_vextent_small(
> if (error)
> goto error0;
> if (fbno != NULLAGBLOCK) {
> + if (xfs_alloc_busy_search(args->mp, args->agno, fbno, 1))
> + xfs_trans_set_sync(args->tp);
> if (args->userdata) {
> xfs_buf_t *bp;
>
> @@ -1617,18 +1619,6 @@ xfs_free_ag_extent(
>
> trace_xfs_free_extent(mp, agno, bno, len, isfl, haveleft, haveright);
>
> - /*
> - * Since blocks move to the free list without the coordination
> - * used in xfs_bmap_finish, we can't allow block to be available
> - * for reallocation and non-transaction writing (user data)
> - * until we know that the transaction that moved it to the free
> - * list is permanently on disk. We track the blocks by declaring
> - * these blocks as "busy"; the busy list is maintained on a per-ag
> - * basis and each transaction records which entries should be removed
> - * when the iclog commits to disk. If a busy block is allocated,
> - * the iclog is pushed up to the LSN that freed the block.
> - */
> - xfs_alloc_busy_insert(tp, agno, bno, len);
> return 0;
>
> error0:
> @@ -1923,21 +1913,6 @@ xfs_alloc_get_freelist(
> xfs_alloc_log_agf(tp, agbp, logflags);
> *bnop = bno;
>
> - /*
> - * As blocks are freed, they are added to the per-ag busy list and
> - * remain there until the freeing transaction is committed to disk.
> - * Now that we have allocated blocks, this list must be searched to see
> - * if a block is being reused. If one is, then the freeing transaction
> - * must be pushed to disk before this transaction.
> - *
> - * We do this by setting the current transaction to a sync transaction
> - * which guarantees that the freeing transaction is on disk before this
> - * transaction. This is done instead of a synchronous log force here so
> - * that we don't sit and wait with the AGF locked in the transaction
> - * during the log force.
> - */
> - if (xfs_alloc_busy_search(mp, be32_to_cpu(agf->agf_seqno), bno, 1))
> - xfs_trans_set_sync(tp);
> return 0;
> }
>
> @@ -2407,6 +2382,8 @@ xfs_free_extent(
> be32_to_cpu(XFS_BUF_TO_AGF(args.agbp)->agf_length));
> #endif
> error = xfs_free_ag_extent(tp, args.agbp, args.agno, args.agbno, len, 0);
> + if (error)
> + xfs_alloc_busy_insert(tp, args.agno, args.agbno, len);
Shouldn't that be "if (!error)"? i.e. if we freed the extent
successfully we need to insert it in the busy list. This current
code has the effect of never inserting freed data extents in the
busy list....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
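Dave's point about the inverted condition is easy to model: the freshly freed extent must be marked busy only when the free itself succeeded. A hedged user-space sketch (free_ag_extent() and busy_insert() here are stand-ins for the XFS functions, not the real APIs):

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the control flow Dave is pointing at: insert into the busy
 * list only on success. The posted hunk used "if (error)", which would
 * skip busy tracking for every successfully freed extent. */
struct state {
	bool free_fails;	/* test knob: force the free to fail */
	int busy_count;		/* number of busy-list insertions */
};

static int free_ag_extent(struct state *s)
{
	return s->free_fails ? -1 : 0;	/* nonzero means error */
}

static void busy_insert(struct state *s)
{
	s->busy_count++;
}

static int free_extent(struct state *s)
{
	int error = free_ag_extent(s);

	if (!error)			/* note: !error, not error */
		busy_insert(s);
	return error;
}
```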
* Re: [PATCH 3/6] xfs: exact busy extent tracking
2011-03-22 19:55 ` [PATCH 3/6] xfs: exact busy extent tracking Christoph Hellwig
@ 2011-03-22 23:47 ` Dave Chinner
2011-03-23 12:24 ` Christoph Hellwig
2011-03-25 21:04 ` Alex Elder
1 sibling, 1 reply; 31+ messages in thread
From: Dave Chinner @ 2011-03-22 23:47 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, Mar 22, 2011 at 03:55:53PM -0400, Christoph Hellwig wrote:
> Update the extent tree in case we have to reuse a busy extent, so that it
> always is kept uptodate. This is done by replacing the busy list searches
> with a new xfs_alloc_busy_reuse helper, which updates the busy extent tree
> in case of a reuse. Also replace setting transactions to sync with forcing
> the log out in case we found a busy extent to reuse. This makes the code a
> lot more simple, and is required for discard support later on. While it
> will cause performance regressions with just this patch applied, the impact
> is more than mitigated by the next patch in the series.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Index: xfs/fs/xfs/xfs_alloc.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-20 19:41:55.835479390 +0100
> +++ xfs/fs/xfs/xfs_alloc.c 2011-03-21 14:49:14.157973188 +0100
.....
> - new->tid = xfs_log_get_trans_ident(tp);
> -
> INIT_LIST_HEAD(&new->list);
>
> /* trace before insert to be able to see failed inserts */
> trace_xfs_alloc_busy(tp, agno, bno, len, 0);
>
> pag = xfs_perag_get(tp->t_mountp, new->agno);
> -restart:
> spin_lock(&pag->pagb_lock);
> rbp = &pag->pagb_tree.rb_node;
> - parent = NULL;
> - busyp = NULL;
> - match = 0;
> - while (*rbp && match >= 0) {
> + while (*rbp) {
> parent = *rbp;
> busyp = rb_entry(parent, struct xfs_busy_extent, rb_node);
>
> if (new->bno < busyp->bno) {
> /* may overlap, but exact start block is lower */
> rbp = &(*rbp)->rb_left;
> - if (new->bno + new->length > busyp->bno)
> - match = busyp->tid == new->tid ? 1 : -1;
> + BUG_ON(new->bno + new->length > busyp->bno);
BUG_ON() inside a spinlock will effectively lock up the machine as
other CPUs try to take the spinlock that was held by the thread that
went bad. It causes the machine to die a horrible, undebuggable death
rather than just kill the thread that caused the oops and leaving
everything else still functional.
If it were an ASSERT(), that's probably OK because it is only debug
systems that would have this problem, but the use of BUG_ON() means
it will cause production systems to die as well.
> } else if (new->bno > busyp->bno) {
> /* may overlap, but exact start block is higher */
> rbp = &(*rbp)->rb_right;
> - if (bno < busyp->bno + busyp->length)
> - match = busyp->tid == new->tid ? 1 : -1;
> + BUG_ON(bno < busyp->bno + busyp->length);
> } else {
> - match = busyp->tid == new->tid ? 1 : -1;
> - break;
> + BUG();
> }
> }
> - if (match < 0) {
> - /* overlap marked busy in different transaction */
> - spin_unlock(&pag->pagb_lock);
> - xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
> - goto restart;
> - }
> - if (match > 0) {
> - /*
> - * overlap marked busy in same transaction. Update if exact
> - * start block match, otherwise combine the busy extents into
> - * a single range.
> - */
> - if (busyp->bno == new->bno) {
> - busyp->length = max(busyp->length, new->length);
> - spin_unlock(&pag->pagb_lock);
> - ASSERT(tp->t_flags & XFS_TRANS_SYNC);
> - xfs_perag_put(pag);
> - kmem_free(new);
> - return;
> - }
> - rb_erase(&busyp->rb_node, &pag->pagb_tree);
> - new->length = max(busyp->bno + busyp->length,
> - new->bno + new->length) -
> - min(busyp->bno, new->bno);
> - new->bno = min(busyp->bno, new->bno);
> - } else
> - busyp = NULL;
>
> rb_link_node(&new->rb_node, parent, rbp);
> rb_insert_color(&new->rb_node, &pag->pagb_tree);
> @@ -2650,7 +2518,6 @@ restart:
> list_add(&new->list, &tp->t_busy);
> spin_unlock(&pag->pagb_lock);
> xfs_perag_put(pag);
> - kmem_free(busyp);
> }
>
> /*
> @@ -2704,6 +2571,173 @@ xfs_alloc_busy_search(
> return match;
> }
>
> +STATIC int
> +xfs_alloc_busy_try_reuse(
> + struct xfs_perag *pag,
> + struct xfs_busy_extent *busyp,
> + xfs_agblock_t fbno,
> + xfs_agblock_t fend)
> +{
> + xfs_agblock_t bbno = busyp->bno;
> + xfs_agblock_t bend = bbno + busyp->length;
> +
> + if (bbno < fbno && bend > fend) {
> + /*
> + * Case 1:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +---------+
> + * fbno fend
> + */
> +
> + /*
> + * We would have to split the busy extent to be able
> + * to track it correct, which we cannot do because we
> + * would have to modify the list of busy extents
> + * attached to the transaction/CIL context, which
> + * is mutable.
immutable?
> + *
> + * Force out the log to clear the busy extents
> + * and retry the search.
> + */
BTW, these comments appear to wrap at 68 columns - any particular
reason?
> + return -1;
> + } else if (bbno >= fbno && bend <= fend) {
> + /*
> + * Case 2:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +-----------------+
> + * fbno fend
> + *
> + * Case 3:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +--------------------------+
> + * fbno fend
> + *
> + * Case 4:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +--------------------------+
> + * fbno fend
> + *
> + * Case 5:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +-----------------------------------+
> + * fbno fend
> + *
> + */
> +
> + /*
> + * The busy extent is fully covered by the extent
> + * we are allocating, and can simply be removed from
> + * the rbtree. However we cannot remove it from the
> + * immutable list tracking busy extents in the
> + * transaction or CIL context, so set the length
> + * to zero to mark it invalid.
> + */
> + rb_erase(&busyp->rb_node, &pag->pagb_tree);
> + busyp->length = 0;
> + } else if (bbno == fbno) {
> + /*
> + * Case 6:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +---------+
> + * fbno fend
> + *
> + * Case 7:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +------------------+
> + * fbno fend
> + *
> + */
> +
> + busyp->bno = fend;
> + } else if (bend == fend) {
> + /*
> + * Case 8:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +-------------+
> + * fbno fend
> + *
> + * Case 9:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +----------------------+
> + * fbno fend
> + */
> +
> + busyp->length = fbno - busyp->bno;
> + } else {
> + BUG();
> + }
> +
> + return 1;
> +}
> +
> +
> +/*
> + * For a given extent [fbno, flen], make sure we can reuse it safely.
> + */
> +void
> +xfs_alloc_busy_reuse(
> + struct xfs_trans *tp,
> + xfs_agnumber_t agno,
> + xfs_agblock_t fbno,
> + xfs_extlen_t flen)
> +{
> + struct xfs_perag *pag;
> + struct rb_node *rbp;
> +
> + ASSERT(flen > 0);
> +
> + pag = xfs_perag_get(tp->t_mountp, agno);
> +restart:
> + spin_lock(&pag->pagb_lock);
> + rbp = pag->pagb_tree.rb_node;
> + while (rbp) {
> + struct xfs_busy_extent *busyp =
> + rb_entry(rbp, struct xfs_busy_extent, rb_node);
> + xfs_agblock_t fend = fbno + flen;
> + xfs_agblock_t bbno = busyp->bno;
> + xfs_agblock_t bend = bbno + busyp->length;
> + int overlap;
> +
> + if (fend <= bbno) {
> + rbp = rbp->rb_left;
> + continue;
> + } else if (fbno >= bend) {
> + rbp = rbp->rb_right;
> + continue;
> + }
> +
> + overlap = xfs_alloc_busy_try_reuse(pag, busyp,
> + fbno, fbno + flen);
> + if (overlap) {
> + spin_unlock(&pag->pagb_lock);
> + xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
> + goto restart;
> + }
xfs_alloc_busy_try_reuse() only ever returns -1 or 1, which means
this will always trigger a log force on overlap....
> +
> + /*
> + * No more busy extents to search.
> + */
> + if (bbno <= fbno && bend >= fend)
> + break;
> +
> + if (fbno < bbno)
> + rbp = rbp->rb_left;
> + else
> + rbp = rbp->rb_right;
and this code is never executed.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
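For reference, the nine diagrams in the quoted xfs_alloc_busy_try_reuse() hunk collapse to four outcomes, matching the four branches of its if/else chain. A user-space sketch of that classification (enum and function names are illustrative, not taken from the kernel source), operating on half-open ranges [bbno, bend) and [fbno, fend) that are assumed to overlap:

```c
#include <assert.h>
#include <stdint.h>

enum overlap_action {
	NEED_SPLIT,	/* case 1: busy extent strictly contains the range */
	REMOVE_BUSY,	/* cases 2-5: range covers the whole busy extent */
	TRIM_FRONT,	/* cases 6-7: same start, move busy start up to fend */
	TRIM_BACK,	/* cases 8-9: same end, shorten busy length to fbno */
};

static enum overlap_action classify(uint32_t bbno, uint32_t bend,
				    uint32_t fbno, uint32_t fend)
{
	if (bbno < fbno && bend > fend)
		return NEED_SPLIT;
	if (bbno >= fbno && bend <= fend)
		return REMOVE_BUSY;
	if (bbno == fbno)
		return TRIM_FRONT;
	/* per the patch, the only remaining overlap shape has bend == fend;
	 * anything else hits the BUG() branch */
	assert(bend == fend);
	return TRIM_BACK;
}
```

NEED_SPLIT is the -1 case: the busy extent cannot be split in place because the busy list attached to the transaction/CIL context is immutable, so the caller forces the log and retries instead.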
* Re: [PATCH 4/6] xfs: allow reusing busy extents where safe
2011-03-22 19:55 ` [PATCH 4/6] xfs: allow reusing busy extents where safe Christoph Hellwig
@ 2011-03-23 0:20 ` Dave Chinner
2011-03-23 12:26 ` Christoph Hellwig
2011-03-25 21:04 ` Alex Elder
1 sibling, 1 reply; 31+ messages in thread
From: Dave Chinner @ 2011-03-23 0:20 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, Mar 22, 2011 at 03:55:54PM -0400, Christoph Hellwig wrote:
> Allow reusing any busy extent for metadata allocations, and reusing busy
> userdata extents for userdata allocations. Most of the complexity is
> propagating the userdata information from the XFS_BMAPI_METADATA flag
> to xfs_bunmapi into the low-level extent freeing routines. After that
> we can just track what type of busy extent we have and treat it accordingly.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
.....
> @@ -2717,7 +2723,7 @@ restart:
>
> overlap = xfs_alloc_busy_try_reuse(pag, busyp,
> fbno, fbno + flen);
> - if (overlap) {
> + if (overlap == -1 || (overlap && userdata)) {
> spin_unlock(&pag->pagb_lock);
> xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
> goto restart;
Ok, so the only time we'll do a log force now is on a complete
overlap or a partial userdata overlap?
> @@ -2754,6 +2760,7 @@ xfs_alloc_busy_trim(
>
> ASSERT(flen > 0);
>
> +restart:
> spin_lock(&args->pag->pagb_lock);
> rbp = args->pag->pagb_tree.rb_node;
> while (rbp && flen >= args->minlen) {
> @@ -2771,6 +2778,31 @@ xfs_alloc_busy_trim(
> continue;
> }
>
> + if (!args->userdata ||
> + (busyp->flags & XFS_ALLOC_BUSY_USERDATA)) {
> + int overlap;
> +
> + overlap = xfs_alloc_busy_try_reuse(args->pag, busyp,
> + fbno, fbno + flen);
> + if (unlikely(overlap == -1)) {
> + spin_unlock(&args->pag->pagb_lock);
> + xfs_log_force(args->mp, XFS_LOG_SYNC);
> + goto restart;
> + }
Hmmmm - I'm not so sure we can reuse overlapped data extents for
data allocations without a log force at all as there is no guarantee
that the data will not be overwritten before the original free
transaction is on disk.
That is, recovery may not replay the original data extent free
transaction or the new allocation transaction, but there is nothing
stopping us from having written the new data into the extent before
the crash occurred, especially as delayed allocation places the
allocation very close the data IO issue. e.g.:
thread X thread Y
free data extent ABC
allocate data extent BCD
partial overlap, no log force
issue data IO
.....
<crash>
That leads to corruption of the data in the original file because
neither transaction is written to disk, but new data is....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 5/6] xfs: add online discard support
2011-03-22 19:55 ` [PATCH 5/6] xfs: add online discard support Christoph Hellwig
@ 2011-03-23 0:30 ` Dave Chinner
2011-03-23 12:31 ` Christoph Hellwig
2011-03-25 21:04 ` Alex Elder
1 sibling, 1 reply; 31+ messages in thread
From: Dave Chinner @ 2011-03-23 0:30 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, Mar 22, 2011 at 03:55:55PM -0400, Christoph Hellwig wrote:
> Now that we have reliably tracking of deleted extents in a transaction
> we can easily implement "online" discard support which calls
> blkdev_issue_discard once a transaction commits.
>
> The actual discard is a two stage operation as we first have to mark
> the busy extent as not available for reuse before we can start the
> actual discard. Note that we don't bother supporting discard for
> the non-delaylog mode. While that would be possible with this patch
> performance is awful, and the optimization in the next patch won't
> work as easily.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
.....
> @@ -361,13 +362,17 @@ xlog_cil_committed(
> int abort)
> {
> struct xfs_cil_ctx *ctx = args;
> + struct xfs_mount *mp = ctx->cil->xc_log->l_mp;
> struct xfs_busy_extent *busyp, *n;
>
> xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, ctx->lv_chain,
> ctx->start_lsn, abort);
>
> - list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list)
> - xfs_alloc_busy_clear(ctx->cil->xc_log->l_mp, busyp);
> + list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list) {
> + if (!abort)
> + xfs_discard_extent(mp, busyp);
> + xfs_alloc_busy_clear(mp, busyp);
> + }
>
> spin_lock(&ctx->cil->xc_cil_lock);
> list_del(&ctx->committing);
> Index: xfs/fs/xfs/linux-2.6/xfs_discard.c
> ===================================================================
> --- xfs.orig/fs/xfs/linux-2.6/xfs_discard.c 2011-03-21 14:47:03.614474345 +0100
> +++ xfs/fs/xfs/linux-2.6/xfs_discard.c 2011-03-21 14:51:41.449976366 +0100
> @@ -191,3 +191,38 @@ xfs_ioc_trim(
> return -XFS_ERROR(EFAULT);
> return 0;
> }
> +
> +int
> +xfs_discard_extent(
> + struct xfs_mount *mp,
> + struct xfs_busy_extent *busyp)
> +{
> + struct xfs_perag *pag;
> + int error = 0;
> + xfs_daddr_t bno;
> + int64_t len;
> + bool done = false;
> +
> + if ((mp->m_flags & XFS_MOUNT_DISCARD) == 0)
> + return 0;
I'd move this check to the callers, otherwise we are going to be
doing lots of function calls in a relatively performance sensitive
loop just to run a single check when discard is not enabled...
> +
> + bno = XFS_AGB_TO_DADDR(mp, busyp->agno, busyp->bno);
> + len = XFS_FSB_TO_BB(mp, busyp->length);
> +
> + pag = xfs_perag_get(mp, busyp->agno);
> + spin_lock(&pag->pagb_lock);
> + if (!busyp->length)
> + done = true;
> + busyp->flags = XFS_ALLOC_BUSY_DISCARDED;
> + spin_unlock(&pag->pagb_lock);
> + xfs_perag_put(pag);
> +
> + if (done)
> + return 0;
> +
> + error = -blkdev_issue_discard(mp->m_ddev_targp->bt_bdev, bno, len,
> + GFP_NOFS, 0);
> + if (error && error != EOPNOTSUPP)
> + xfs_info(mp, "discard failed, error %d", error);
This would be more informative if it also printed the bno and len of
the discard that failed. A couple of tracepoints here (e.g.
discard_extent_issued, discard_extent_failed) could also be useful
for tracking discard operations.
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH 6/6] xfs: make discard operations asynchronous
2011-03-22 19:55 ` [PATCH 6/6] xfs: make discard operations asynchronous Christoph Hellwig
@ 2011-03-23 0:43 ` Dave Chinner
2011-03-28 12:44 ` Christoph Hellwig
2011-03-25 21:04 ` Alex Elder
1 sibling, 1 reply; 31+ messages in thread
From: Dave Chinner @ 2011-03-23 0:43 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, Mar 22, 2011 at 03:55:56PM -0400, Christoph Hellwig wrote:
> Instead of waiting for each discard request keep the CIL context alive
> until all of them are done, at which point we can tear it down completely
> and remove the busy extents from the rbtree.
>
> At this point I'm doing the I/O completion from IRQ context for simplicity,
> but I'll benchmark it against a version that uses a workqueue.
A workqueue is probably a good idea, because then the processing has
some level of concurrency built into it. It also means we don't need
to convert the locking to irq-safe variants and all the overhead
that this introduces.
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Index: xfs/fs/xfs/linux-2.6/xfs_discard.c
> ===================================================================
> --- xfs.orig/fs/xfs/linux-2.6/xfs_discard.c 2011-03-22 15:58:10.301855813 +0100
> +++ xfs/fs/xfs/linux-2.6/xfs_discard.c 2011-03-22 18:39:09.000000000 +0100
> @@ -30,6 +30,7 @@
> #include "xfs_inode.h"
> #include "xfs_alloc.h"
> #include "xfs_error.h"
> +#include "xfs_log_priv.h"
> #include "xfs_discard.h"
> #include "xfs_trace.h"
>
> @@ -192,37 +193,119 @@ xfs_ioc_trim(
> return 0;
> }
>
> +void
> +xfs_cil_discard_done(
> + struct xfs_cil_ctx *ctx)
> +{
> + if (atomic_dec_and_test(&ctx->discards)) {
> + struct xfs_busy_extent *busyp, *n;
> +
> + list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list)
> + xfs_alloc_busy_clear(ctx->cil->xc_log->l_mp, busyp);
> + kmem_free(ctx);
> + }
> +}
> +
> +STATIC void
> +xfs_discard_end_io(
> + struct bio *bio,
> + int err)
> +{
> + struct xfs_cil_ctx *ctx = bio->bi_private;
> +
> + if (err && err != -EOPNOTSUPP) {
> + xfs_info(ctx->cil->xc_log->l_mp,
> + "I/O error during discard\n");
> + }
Same comment about the bno/len in the error message as the previous
patch.
> +
> + bio_put(bio);
> + xfs_cil_discard_done(ctx);
> +}
> +
> +static int
> +xfs_issue_discard(
> + struct block_device *bdev,
> + sector_t sector,
> + sector_t nr_sects,
> + gfp_t gfp_mask,
> + struct xfs_cil_ctx *ctx)
> +{
> + struct request_queue *q = bdev_get_queue(bdev);
> + unsigned int max_discard_sectors;
> + struct bio *bio;
> + int ret = 0;
> +
> + if (!q)
> + return -ENXIO;
> +
> + if (!blk_queue_discard(q))
> + return -EOPNOTSUPP;
> +
> + /*
> + * Ensure that max_discard_sectors is of the proper
> + * granularity
> + */
> + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
> + if (q->limits.discard_granularity) {
> + unsigned int disc_sects = q->limits.discard_granularity >> 9;
> +
> + max_discard_sectors &= ~(disc_sects - 1);
> + }
This is asking for a helper function....
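The helper Dave is asking for would encapsulate the clamping and rounding done on `max_discard_sectors` in the quoted hunk. A minimal userspace sketch (function name and signature are invented for illustration; the real block-layer helper that later appeared is shaped differently):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp a discard limit to fit byte counts in 32 bits, then round it
 * down to a multiple of the device's discard granularity, mirroring
 * the open-coded logic in the quoted hunk.  Assumes the granularity
 * in sectors is a power of two, as the bitmask in the original does.
 */
static uint32_t clamp_discard_sectors(uint32_t max_discard_sectors,
				      uint32_t granularity_bytes)
{
	uint32_t max = max_discard_sectors;

	/* cap so (max << 9) bytes still fits in an unsigned int */
	if (max > (UINT32_MAX >> 9))
		max = UINT32_MAX >> 9;

	if (granularity_bytes) {
		uint32_t disc_sects = granularity_bytes >> 9; /* bytes -> 512B sectors */

		max &= ~(disc_sects - 1);	/* round down to granularity */
	}
	return max;
}
```

With a 4096-byte granularity (8 sectors), a limit of 1001 sectors rounds down to 1000; with no granularity set, the limit passes through unchanged apart from the 32-bit cap.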
> +
> +
> + while (nr_sects && !ret) {
no need to check ret here.
> + bio = bio_alloc(gfp_mask, 1);
> + if (!bio) {
> + ret = -ENOMEM;
> + break;
> + }
> +
> + bio->bi_sector = sector;
> + bio->bi_end_io = xfs_discard_end_io;
> + bio->bi_bdev = bdev;
> + bio->bi_private = ctx;
> +
> + if (nr_sects > max_discard_sectors) {
> + bio->bi_size = max_discard_sectors << 9;
> + nr_sects -= max_discard_sectors;
> + sector += max_discard_sectors;
> + } else {
> + bio->bi_size = nr_sects << 9;
> + nr_sects = 0;
> + }
> +
> + atomic_inc(&ctx->discards);
> + submit_bio(REQ_WRITE | REQ_DISCARD, bio);
> + }
> +
> + return ret;
> +}
> +
> int
> xfs_discard_extent(
> struct xfs_mount *mp,
> - struct xfs_busy_extent *busyp)
> + struct xfs_busy_extent *busyp,
> + struct xfs_cil_ctx *ctx)
> {
> struct xfs_perag *pag;
> - int error = 0;
> xfs_daddr_t bno;
> int64_t len;
> bool done = false;
>
> - if ((mp->m_flags & XFS_MOUNT_DISCARD) == 0)
> - return 0;
> -
> bno = XFS_AGB_TO_DADDR(mp, busyp->agno, busyp->bno);
> len = XFS_FSB_TO_BB(mp, busyp->length);
>
> pag = xfs_perag_get(mp, busyp->agno);
> - spin_lock(&pag->pagb_lock);
> + spin_lock_irq(&pag->pagb_lock);
> if (!busyp->length)
> done = true;
> busyp->flags = XFS_ALLOC_BUSY_DISCARDED;
> - spin_unlock(&pag->pagb_lock);
> + spin_unlock_irq(&pag->pagb_lock);
Disabling/enabling interrupts on these locks could hurt quite a bit.
They are taken quite frequently, and irq operations add quite a
bit of overhead....
> xfs_perag_put(pag);
>
> if (done)
> return 0;
>
> - error = -blkdev_issue_discard(mp->m_ddev_targp->bt_bdev, bno, len,
> - GFP_NOFS, 0);
> - if (error && error != EOPNOTSUPP)
> - xfs_info(mp, "discard failed, error %d", error);
> - return error;
> + return -xfs_issue_discard(mp->m_ddev_targp->bt_bdev,
> + bno, len, GFP_NOFS, ctx);
> }
> Index: xfs/fs/xfs/linux-2.6/xfs_discard.h
> ===================================================================
> --- xfs.orig/fs/xfs/linux-2.6/xfs_discard.h 2011-03-22 15:58:10.313857879 +0100
> +++ xfs/fs/xfs/linux-2.6/xfs_discard.h 2011-03-22 18:39:09.000000000 +0100
> @@ -3,10 +3,13 @@
>
> struct fstrim_range;
> struct xfs_busy_extent;
> +struct xfs_cil_ctx;
>
> extern int xfs_ioc_trim(struct xfs_mount *, struct fstrim_range __user *);
>
> extern int xfs_discard_extent(struct xfs_mount *,
> - struct xfs_busy_extent *);
> + struct xfs_busy_extent *,
> + struct xfs_cil_ctx *);
> +extern void xfs_cil_discard_done(struct xfs_cil_ctx *ctx);
>
> #endif /* XFS_DISCARD_H */
> Index: xfs/fs/xfs/xfs_log_cil.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_log_cil.c 2011-03-22 15:58:10.329855977 +0100
> +++ xfs/fs/xfs/xfs_log_cil.c 2011-03-22 18:39:09.000000000 +0100
> @@ -68,6 +68,7 @@ xlog_cil_init(
> INIT_LIST_HEAD(&ctx->busy_extents);
> ctx->sequence = 1;
> ctx->cil = cil;
> + atomic_set(&ctx->discards, 1);
> cil->xc_ctx = ctx;
> cil->xc_current_sequence = ctx->sequence;
>
> @@ -364,14 +365,18 @@ xlog_cil_committed(
> struct xfs_cil_ctx *ctx = args;
> struct xfs_mount *mp = ctx->cil->xc_log->l_mp;
> struct xfs_busy_extent *busyp, *n;
> + bool keep_alive = false;
>
> xfs_trans_committed_bulk(ctx->cil->xc_log->l_ailp, ctx->lv_chain,
> ctx->start_lsn, abort);
>
> - list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list) {
> - if (!abort)
> - xfs_discard_extent(mp, busyp);
> - xfs_alloc_busy_clear(mp, busyp);
> + if (!(mp->m_flags & XFS_MOUNT_DISCARD) || abort) {
> + list_for_each_entry_safe(busyp, n, &ctx->busy_extents, list)
> + xfs_alloc_busy_clear(mp, busyp);
> + } else if (!list_empty(&ctx->busy_extents)) {
> + list_for_each_entry(busyp, &ctx->busy_extents, list)
> + xfs_discard_extent(mp, busyp, ctx);
> + keep_alive = true;
> }
Oh, I see you've moved the XFS_MOUNT_DISCARD into the loop here...
>
> spin_lock(&ctx->cil->xc_cil_lock);
> @@ -379,7 +384,10 @@ xlog_cil_committed(
> spin_unlock(&ctx->cil->xc_cil_lock);
>
> xlog_cil_free_logvec(ctx->lv_chain);
> - kmem_free(ctx);
> + if (keep_alive)
> + xfs_cil_discard_done(ctx);
> + else
> + kmem_free(ctx);
You could probably just call xfs_cil_discard_done(ctx) here as the
busy extent list will be empty in the !keep_alive case and so it
will simply do the kmem_free(ctx) call there.
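What makes Dave's simplification work is the completion-count pattern the patch sets up: the count starts at 1 (the commit path's own reference), each submitted bio takes another, and whoever drops it to zero frees the context. A minimal userspace sketch of the pattern, with invented names (the kernel code uses `atomic_dec_and_test` rather than C11 atomics):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for struct xfs_cil_ctx, reduced to the refcount pattern. */
struct ctx {
	atomic_int	discards;	/* initialized to 1 by the owner */
	bool		freed;		/* stands in for kmem_free(ctx) */
};

/* Each in-flight discard bio holds one reference on the context. */
static void submit_discard(struct ctx *c)
{
	atomic_fetch_add(&c->discards, 1);
}

/* Called from each bio completion and once from the commit path. */
static void discard_done(struct ctx *c)
{
	/* fetch_sub returns the old value; 1 means we dropped the last ref */
	if (atomic_fetch_sub(&c->discards, 1) == 1)
		c->freed = true;
}
```

Because the count is biased by the owner's initial reference, the commit path can unconditionally call `discard_done()`: if no bios were submitted, that call is the one that drops the count to zero and frees, which is exactly the `!keep_alive` behaviour Dave describes.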
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
* Re: [PATCH 1/6] xfs: optimize AGFL refills
2011-03-22 22:30 ` Alex Elder
@ 2011-03-23 12:16 ` Christoph Hellwig
2011-03-25 21:03 ` Alex Elder
0 siblings, 1 reply; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-23 12:16 UTC (permalink / raw)
To: Alex Elder; +Cc: Christoph Hellwig, xfs
On Tue, Mar 22, 2011 at 05:30:51PM -0500, Alex Elder wrote:
> On Tue, 2011-03-22 at 15:55 -0400, Christoph Hellwig wrote:
> > Avoid forcing out busy extent when moving blocks from/to the AGFL. We
>
> Without thinking a bit about it, it wasn't clear why doing
> this was OK to do. It would be nice to record a one-sentence
> justification of why we don't need to force busy extents out
> in this case.
I've added a sentence, but it seems rather pointless.
> achieve this
fixed.
> > the callers that need it, and by moving the busy list insert from
> > xfs_free_ag_extent extent which is used both by AGFL refills and real
> ^^^^^ drop this word
fixed.
* Re: [PATCH 1/6] xfs: optimize AGFL refills
2011-03-22 23:30 ` Dave Chinner
@ 2011-03-23 12:16 ` Christoph Hellwig
0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-23 12:16 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
On Wed, Mar 23, 2011 at 10:30:54AM +1100, Dave Chinner wrote:
> > + if (error)
> > + xfs_alloc_busy_insert(tp, args.agno, args.agbno, len);
>
> Shouldn't that be "if (!error)"? i.e. if we freed the extent
> successfully we need to insert it in the busy list. This current
> code has the effect of never inserting freed data extents in the
> busy list....
Yes, it should. This got messed up at some point during reshuffling
the series.
* Re: [PATCH 2/6] xfs: do not immediately reuse busy extent ranges
2011-03-22 22:30 ` Alex Elder
@ 2011-03-23 12:17 ` Christoph Hellwig
0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-23 12:17 UTC (permalink / raw)
To: Alex Elder; +Cc: Christoph Hellwig, xfs
On Tue, Mar 22, 2011 at 05:30:59PM -0500, Alex Elder wrote:
> Again, this looks nearly identical to your last version.
> On that one, I suggested rewording a comment, and you
> said "ok." You did not do so in this version, which I
> guess is fine. I just want to know whether you intended
> to. If so I'll give you a chance to post an update; if
> not I can take this one as-is.
Sorry, I fixed half of what you pointed out and postponed the
comment rewrite. I've taken your version of the comments
for the next spin. Still not sure what to do about the magic 4,
though.
* Re: [PATCH 3/6] xfs: exact busy extent tracking
2011-03-22 23:47 ` Dave Chinner
@ 2011-03-23 12:24 ` Christoph Hellwig
0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-23 12:24 UTC (permalink / raw)
To: Dave Chinner; +Cc: xfs
On Wed, Mar 23, 2011 at 10:47:28AM +1100, Dave Chinner wrote:
> BUG_ON() inside a spinlock will effectively lock up the machine as
> other CPUs try to take the spinlock that was held by the thread that
> went bad. It causes the machine to die a horrible, undebuggable death
> rather than just killing the thread that caused the oops and leaving
> everything else still functional.
>
> If it were an ASSERT(), that's probably OK because it is only debug
> systems that would have this problem, but the use of BUG(_ON) means
> it will cause production systems to die as well.
I've changed it to asserts.
> > + * We would have to split the busy extent to be able
> > + * to track it correct, which we cannot do because we
> > + * would have to modify the list of busy extents
> > + * attached to the transaction/CIL context, which
> > + * is mutable.
>
> immutable?
Yes.
> > + *
> > + * Force out the log to clear the busy extents
> > + * and retry the search.
> > + */
>
> BTW, these comments appear to wrap at 68 columns - any particular
> reason?
They used to sit one more tab to the right and I didn't fix them up
when splitting the function. I've rewrapped them now.
> > + overlap = xfs_alloc_busy_try_reuse(pag, busyp,
> > + fbno, fbno + flen);
> > + if (overlap) {
> > + spin_unlock(&pag->pagb_lock);
> > + xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
> > + goto restart;
> > + }
>
> xfs_alloc_busy_try_reuse() only ever returns -1 or 1, which means
> this will always trigger a log force on overlap....
At least for now, yes. In the next patch we will treat -1 vs 1
differently. Maybe it should be changed to 0 vs 1, but even that
isn't quite intuitive. I'll think about it a bit more.
* Re: [PATCH 4/6] xfs: allow reusing busy extents where safe
2011-03-23 0:20 ` Dave Chinner
@ 2011-03-23 12:26 ` Christoph Hellwig
0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-23 12:26 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
On Wed, Mar 23, 2011 at 11:20:24AM +1100, Dave Chinner wrote:
> > overlap = xfs_alloc_busy_try_reuse(pag, busyp,
> > fbno, fbno + flen);
> > - if (overlap) {
> > + if (overlap == -1 || (overlap && userdata)) {
> > spin_unlock(&pag->pagb_lock);
> > xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
> > goto restart;
>
> Ok, so the only time we'll do a log force now is on an complete
> overlap or a partial userdata overlap?
We do it if we would have to split the busy extent, or reallocate
metadata to userdata.
> > + if (!args->userdata ||
> > + (busyp->flags & XFS_ALLOC_BUSY_USERDATA)) {
> > + int overlap;
> > +
> > + overlap = xfs_alloc_busy_try_reuse(args->pag, busyp,
> > + fbno, fbno + flen);
> > + if (unlikely(overlap == -1)) {
> > + spin_unlock(&args->pag->pagb_lock);
> > + xfs_log_force(args->mp, XFS_LOG_SYNC);
> > + goto restart;
> > + }
>
> Hmmmm - I'm not so sure we can reuse overlapped data extents for
> data allocations without a log force at all as there is no guarantee
> that the data will not be overwritten before the original free
> transaction is on disk.
>
> That is, recovery may not replay the original data extent free
> transaction or the new allocation transaction, but there is nothing
> stopping us from having written the new data into the extent before
> the crash occurred, especially as delayed allocation places the
> allocation very close the data IO issue. e.g.:
>
> thread X thread Y
> free data extent ABC
> allocate data extent BCD
> partial overlap, no log force
> issue data IO
> .....
>
> <crash>
>
> That leads to corruption of the data in the original file because
> neither transaction is written to disk, but new data is....
You're right. It's fine from the transaction point of view, but
it could leave wrong data in a file. I'll disallow this case, too.
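The outcome of this exchange can be distilled into a small decision table: a busy extent may be reused without a log force only for metadata allocations that do not require splitting the busy range; any userdata allocation must force the log, since the new data could reach disk before the freeing transaction does. A sketch of that predicate (illustrative only, not the actual kernel code):

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Decide whether reusing a busy extent requires a synchronous log
 * force first, per the rules arrived at in this thread.  Names and
 * the exact factoring are invented for illustration.
 */
static bool busy_reuse_needs_log_force(bool alloc_is_userdata,
				       bool would_split_busy_extent)
{
	if (would_split_busy_extent)
		return true;	/* cannot split: busy list is immutable */
	if (alloc_is_userdata)
		return true;	/* new data could hit disk before the free */
	return false;		/* metadata reuse is safe */
}
```

Metadata reuse is safe because a metadata write is always logged, so recovery replays it (or not) together with the allocation transaction; unlogged data writes have no such ordering guarantee, which is the corruption window Dave describes.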
* Re: [PATCH 5/6] xfs: add online discard support
2011-03-23 0:30 ` Dave Chinner
@ 2011-03-23 12:31 ` Christoph Hellwig
0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-23 12:31 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
> > + if ((mp->m_flags & XFS_MOUNT_DISCARD) == 0)
> > + return 0;
>
> I'd move this check to the callers, otherwise we are going to be
> doing lots of function calls in a relatively performance sensitive
> loop just to run a single check when discard is not enabled...
Ok. As you noticed it's done in the next patch, and they will probably
be merged into one before the final submission. But for now I'll
move it to make the patch more obvious.
> > + error = -blkdev_issue_discard(mp->m_ddev_targp->bt_bdev, bno, len,
> > + GFP_NOFS, 0);
> > + if (error && error != EOPNOTSUPP)
> > + xfs_info(mp, "discard failed, error %d", error);
>
> This would be more informative if it also printed the bno and len of
> the discard that failed. A couple of tracepoints here (e.g.
> discard_extent_issued, discard_extent_failed) could also be useful
> for tracking discard operations.
Yeah, I was planning to add a lot more tracepoint later on, both
for the busy extent handling and discards.
* Re: [PATCH 1/6] xfs: optimize AGFL refills
2011-03-23 12:16 ` Christoph Hellwig
@ 2011-03-25 21:03 ` Alex Elder
2011-03-28 12:07 ` Christoph Hellwig
0 siblings, 1 reply; 31+ messages in thread
From: Alex Elder @ 2011-03-25 21:03 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Wed, 2011-03-23 at 08:16 -0400, Christoph Hellwig wrote:
> On Tue, Mar 22, 2011 at 05:30:51PM -0500, Alex Elder wrote:
> > On Tue, 2011-03-22 at 15:55 -0400, Christoph Hellwig wrote:
> > > Avoid forcing out busy extent when moving blocks from/to the AGFL. We
> >
> > Without thinking a bit about it, it wasn't clear why doing
> > this was OK to do. It would be nice to record a one-sentence
> > justification of why we don't need to force busy extents out
> > in this case.
>
> I've added a sentence, but it seems rather pointless.
The reason it was not obvious is because the case you're
treating is specifically when moving extents (blocks,
really) between the free list and the free btrees, in
xfs_alloc_fix_freelist(). You still need to force it
out when allocating and freeing "actually used" blocks,
which could also be considered "moving blocks from/to
the AGFL."
-Alex
> > achieve this
>
> fixed.
>
> > > the callers that need it, and by moving the busy list insert from
> > > xfs_free_ag_extent extent which is used both by AGFL refills and real
> > ^^^^^ drop this word
>
> fixed.
>
* Re: [PATCH 2/6] xfs: do not immediately reuse busy extent ranges
2011-03-22 19:55 ` [PATCH 2/6] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
2011-03-22 22:30 ` Alex Elder
@ 2011-03-25 21:03 ` Alex Elder
2011-03-28 12:07 ` Christoph Hellwig
1 sibling, 1 reply; 31+ messages in thread
From: Alex Elder @ 2011-03-25 21:03 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, 2011-03-22 at 15:55 -0400, Christoph Hellwig wrote:
> Every time we reallocate a busy extent, we cause a synchronous log force
> to occur to ensure the freeing transaction is on disk before we continue
> and use the newly allocated extent. This is extremely sub-optimal as we
> have to mark every transaction with blocks that get reused as synchronous.
>
> Instead of searching the busy extent list after deciding on the extent to
> allocate, check each candidate extent during the allocation decisions as
> to whether they are in the busy list. If they are in the busy list, we
> trim the busy range out of the extent we have found and determine if that
> trimmed range is still OK for allocation. In many cases, this check can
> be incorporated into the allocation extent alignment code which already
> does trimming of the found extent before determining if it is a valid
> candidate for allocation.
>
> Based on two earlier patches from Dave Chinner.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
I already reviewed this, but I noticed a few things
that I think are worth clarifying in one comment.
There are a few typos in that same block that you
might as well fix while you're at it.
. . .
> @@ -2634,6 +2704,181 @@ xfs_alloc_busy_search(
> return match;
> }
>
> +/*
> + * For a given extent [fbno, flen], search the busy extent list
> + * to find a subset of the extent that is not busy.
> + */
> +STATIC void
> +xfs_alloc_busy_trim(
> + struct xfs_alloc_arg *args,
> + xfs_agblock_t fbno,
> + xfs_extlen_t flen,
> + xfs_agblock_t *rbno,
> + xfs_extlen_t *rlen)
> +{
. . .
> + } else {
> + /* middle overlap */
> +
> + /*
> + * Case 9:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +-----------------------------------+
> + * fbno fend
> + *
> + * Can be trimmed to:
> + * +-------+ OR +-------+
> + * fbno fend fbno fend
> + *
> + * We prefer the lower bno extent because the next
> + * allocation for this inode will use "end" as the
> + * target for first block. If the busy segment has
...will use the updated value of fend as the target...
> + * cleared, this will get a contiguous allocation next
> + * time around; if thebusy segment has not cleared,
the busy
> + * it will get an allocation at bend, which is a forward
> + * allocation.
> + *
> + * If we choose segment at bend, and this remains the
> + * best extent for the next allocation (e.g. NEAR_BNO
> + * allocation) we'll next allocate at bno, which will
...we'll next allocate at (pre-update) fbno, which will...
(Actually, correct this if my statements are wrong. The point is
to use "fbno" and "fend" where you currently just have "bno" and
"end".)
> + * give us backwards allocation. We already know that
> + * backwards allocation direction causes significant
> + * fragmentation of directories and degradataion of
> + * directory performance.
> + *
> + * Always chose the option that produces forward
choose
> + * allocation patterns so that sequential reads and
> + * writes only ever seek in one direction. Only choose
> + * the higher bno extent if the remainin unused extent
remaining
> + * length is much larger than the current allocation
> + * request, promising us a contiguous allocation in
> + * the following free space.
> + */
> +
> + if (bbno - fbno >= args->maxlen) {
. . .
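The case-9 policy in the comment above — prefer the lower remainder for forward allocation, taking the higher one only when it is much larger than the request — can be written as a standalone trim function. This is a sketch, not the actual XFS code; the `* 4` multiplier is the "magic 4" questioned elsewhere in this thread, and the exact thresholds are an assumption:

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t agblock_t;

/*
 * A busy range [bbno, bend) sits strictly inside the found free
 * extent [fbno, fend).  Shrink [fbno, fend) to one of the two unbusy
 * remainders: the lower one by default (forward allocation), the
 * higher one only if it dwarfs the request.
 */
static void trim_middle_overlap(agblock_t *fbno, agblock_t *fend,
				agblock_t bbno, agblock_t bend,
				agblock_t maxlen)
{
	if (bbno - *fbno >= maxlen)
		*fend = bbno;		/* lower remainder satisfies the request */
	else if (*fend - bend >= maxlen * 4)
		*fbno = bend;		/* upper remainder is much larger: take it */
	else
		*fend = bbno;		/* otherwise still prefer the lower end */
}
```

For example, with a found extent [0, 100), a busy range [2, 10) and `maxlen` 5, the lower remainder (2 blocks) is too small but the upper (90 blocks) is more than four times the request, so the extent is trimmed to [10, 100).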
* Re: [PATCH 3/6] xfs: exact busy extent tracking
2011-03-22 19:55 ` [PATCH 3/6] xfs: exact busy extent tracking Christoph Hellwig
2011-03-22 23:47 ` Dave Chinner
@ 2011-03-25 21:04 ` Alex Elder
2011-03-28 12:10 ` Christoph Hellwig
1 sibling, 1 reply; 31+ messages in thread
From: Alex Elder @ 2011-03-25 21:04 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, 2011-03-22 at 15:55 -0400, Christoph Hellwig wrote:
> Update the extent tree in case we have to reuse a busy extent, so that it
> always is kept uptodate. This is done by replacing the busy list searches
> with a new xfs_alloc_busy_reuse helper, which updates the busy extent tree
> in case of a reuse. Also replace setting transactions to sync with forcing
> the log out in case we found a busy extent to reuse. This makes the code a
> lot more simple, and is required for discard support later on. While it
> will cause performance regressios with just this patch applied, the impact
> is more than mitigated by the next patch in the series.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
In this one, xfs_alloc_busy_reuse() is broken, or at least it has some
dead code in it now. (But now that I've looked at the later patches
I see that's a temporary artifact. I've left my original comments
in place anyway, in case any insights remain despite that.)
More below.
> Index: xfs/fs/xfs/xfs_alloc.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-20 19:41:55.835479390 +0100
> +++ xfs/fs/xfs/xfs_alloc.c 2011-03-21 14:49:14.157973188 +0100
. . .
> @@ -2459,100 +2459,6 @@ error0:
> return error;
> }
>
> -
> -/*
> - * AG Busy list management
> - * The busy list contains block ranges that have been freed but whose
> - * transactions have not yet hit disk. If any block listed in a busy
> - * list is reused, the transaction that freed it must be forced to disk
> - * before continuing to use the block.
> - *
> - * xfs_alloc_busy_insert - add to the per-ag busy list
> - * xfs_alloc_busy_clear - remove an item from the per-ag busy list
> - * xfs_alloc_busy_search - search for a busy extent
> - */
> -
> -/*
> - * Insert a new extent into the busy tree.
OK, to be honest I haven't re-read this entire block of
comment text to identify what might be of value. But
is there really *nothing* worth saving? Is the busy
extent tree documented adequately elsewhere?
> - * The busy extent tree is indexed by the start block of the busy extent.
> - * there can be multiple overlapping ranges in the busy extent tree but only
> - * ever one entry at a given start block. The reason for this is that
> - * multi-block extents can be freed, then smaller chunks of that extent
> - * allocated and freed again before the first transaction commit is on disk.
> - * If the exact same start block is freed a second time, we have to wait for
> - * that busy extent to pass out of the tree before the new extent is inserted.
> - * There are two main cases we have to handle here.
. . .
> @@ -2583,66 +2487,30 @@ xfs_alloc_busy_insert(
> new->agno = agno;
> new->bno = bno;
> new->length = len;
> - new->tid = xfs_log_get_trans_ident(tp);
> -
> INIT_LIST_HEAD(&new->list);
>
> /* trace before insert to be able to see failed inserts */
> trace_xfs_alloc_busy(tp, agno, bno, len, 0);
>
> pag = xfs_perag_get(tp->t_mountp, new->agno);
> -restart:
> spin_lock(&pag->pagb_lock);
> rbp = &pag->pagb_tree.rb_node;
> - parent = NULL;
> - busyp = NULL;
> - match = 0;
> - while (*rbp && match >= 0) {
> + while (*rbp) {
> parent = *rbp;
> busyp = rb_entry(parent, struct xfs_busy_extent, rb_node);
>
> if (new->bno < busyp->bno) {
> /* may overlap, but exact start block is lower */
This comment isn't really right any more (BUG_ON that condition).
> rbp = &(*rbp)->rb_left;
> - if (new->bno + new->length > busyp->bno)
> - match = busyp->tid == new->tid ? 1 : -1;
> + BUG_ON(new->bno + new->length > busyp->bno);
> } else if (new->bno > busyp->bno) {
> /* may overlap, but exact start block is higher */
This one too.
> rbp = &(*rbp)->rb_right;
> - if (bno < busyp->bno + busyp->length)
> - match = busyp->tid == new->tid ? 1 : -1;
> + BUG_ON(bno < busyp->bno + busyp->length);
> } else {
> - match = busyp->tid == new->tid ? 1 : -1;
> - break;
> + BUG();
> }
. . .
> @@ -2704,6 +2571,173 @@ xfs_alloc_busy_search(
> return match;
> }
>
/*
* The found free extent [fbno, fend] overlaps part or all
* of the given busy extent. If the overlap covers the
* beginning, the end, or all of the busy extent, the
* overlapping portion can be made unbusy and used for
* the allocation. We can't split a busy extent because
* we can't modify a transaction/CIL context busy list,
* but we can update an entry's block number or length.
*
* The caller will force the log and re-check the busy
* list after returning from this function.
*/
> +STATIC int
> +xfs_alloc_busy_try_reuse(
> + struct xfs_perag *pag,
> + struct xfs_busy_extent *busyp,
> + xfs_agblock_t fbno,
> + xfs_agblock_t fend)
> +{
> + xfs_agblock_t bbno = busyp->bno;
> + xfs_agblock_t bend = bbno + busyp->length;
> +
> + if (bbno < fbno && bend > fend) {
> + /*
> + * Case 1:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +---------+
> + * fbno fend
> + */
> +
> + /*
> + * We would have to split the busy extent to be able
> + * to track it correct, which we cannot do because we
> + * would have to modify the list of busy extents
> + * attached to the transaction/CIL context, which
> + * is mutable.
> + *
> + * Force out the log to clear the busy extents
> + * and retry the search.
The caller forces the log. Rely on a comment in this
function's header to say that (not here).
> + */
> + return -1;
> + } else if (bbno >= fbno && bend <= fend) {
> + /*
> + * Case 2:
> + * bbno bend
> + * +BBBBBBBBBBBBBBBBB+
> + * +-----------------+
> + * fbno fend
. . .
> +
> + return 1;
> +}
> +
> +
> +/*
> + * For a given extent [fbno, flen], make sure we can reuse it safely.
> + */
> +void
> +xfs_alloc_busy_reuse(
> + struct xfs_trans *tp,
> + xfs_agnumber_t agno,
> + xfs_agblock_t fbno,
> + xfs_extlen_t flen)
> +{
> + struct xfs_perag *pag;
> + struct rb_node *rbp;
> +
> + ASSERT(flen > 0);
> +
> + pag = xfs_perag_get(tp->t_mountp, agno);
> +restart:
> + spin_lock(&pag->pagb_lock);
> + rbp = pag->pagb_tree.rb_node;
> + while (rbp) {
> + struct xfs_busy_extent *busyp =
> + rb_entry(rbp, struct xfs_busy_extent, rb_node);
> + xfs_agblock_t fend = fbno + flen;
> + xfs_agblock_t bbno = busyp->bno;
> + xfs_agblock_t bend = bbno + busyp->length;
> + int overlap;
> +
> + if (fend <= bbno) {
> + rbp = rbp->rb_left;
> + continue;
> + } else if (fbno >= bend) {
> + rbp = rbp->rb_right;
> + continue;
> + }
> +
I was going to suggest:
/* Extent overlaps busy extent */
here, but I think that may not be the case any more if
busyp->length is zero. Or rather, the extent may
surround the zero-length busy extent (which I suppose
could be considered overlap).
If busyp->length is zero, I think the call to
xfs_alloc_busy_try_reuse() is not needed; in fact,
if it is already 0, that function will call
rb_erase() on the entry again.
> + overlap = xfs_alloc_busy_try_reuse(pag, busyp,
> + fbno, fbno + flen);
Note that xfs_alloc_busy_try_reuse() only returns 1 or -1 now...
> + if (overlap) {
...therefore this branch is always taken, and the code
below this block to the end of the loop is not reached.
Since this is the only place it's used, xfs_alloc_busy_try_reuse()
might as well be defined as a void function.
(Ahh, but now that I've looked at the later patches
I see it gets used again later. I'm leaving my
comments here nevertheless.)
> + spin_unlock(&pag->pagb_lock);
> + xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
> + goto restart;
> + }
+------ code starting here is never reached
|
> + /*
> + * No more busy extents to search.
> + */
> + if (bbno <= fbno && bend >= fend)
> + break;
> +
> + if (fbno < bbno)
> + rbp = rbp->rb_left;
> + else
> + rbp = rbp->rb_right;
|
+------ end of code not reached
> + }
> + spin_unlock(&pag->pagb_lock);
> + xfs_perag_put(pag);
> +}
> +
> /*
> * For a given extent [fbno, flen], search the busy extent list
> * to find a subset of the extent that is not busy.
* If part or all of the extent is busy, force the log, then
* verify it is no longer busy before returning.
> @@ -2889,14 +2923,12 @@ xfs_alloc_busy_clear(
> trace_xfs_alloc_unbusy(mp, busyp->agno, busyp->bno,
> busyp->length);
>
> - ASSERT(xfs_alloc_busy_search(mp, busyp->agno, busyp->bno,
> - busyp->length) == 1);
> -
> list_del_init(&busyp->list);
>
> pag = xfs_perag_get(mp, busyp->agno);
> spin_lock(&pag->pagb_lock);
> - rb_erase(&busyp->rb_node, &pag->pagb_tree);
> + if (busyp->length)
> + rb_erase(&busyp->rb_node, &pag->pagb_tree);
> spin_unlock(&pag->pagb_lock);
> xfs_perag_put(pag);
>
. . .
* Re: [PATCH 4/6] xfs: allow reusing busy extents where safe
2011-03-22 19:55 ` [PATCH 4/6] xfs: allow reusing busy extents where safe Christoph Hellwig
2011-03-23 0:20 ` Dave Chinner
@ 2011-03-25 21:04 ` Alex Elder
1 sibling, 0 replies; 31+ messages in thread
From: Alex Elder @ 2011-03-25 21:04 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, 2011-03-22 at 15:55 -0400, Christoph Hellwig wrote:
> Allow reusing any busy extent for metadata allocations, and reusing busy
> userdata extents for userdata allocations. Most of the complexity is
> propagating the userdata information from the XFS_BMAPI_METADATA flag
> to xfs_bunmapi into the low-level extent freeing routines. After that
> we can just track what type of busy extent we have and treat it accordingly.
Why is it OK to reuse user data extents for user data allocations?
I accept it is, I just haven't worked through in my mind why.
> Signed-off-by: Christoph Hellwig <hch@lst.de>
>
> Index: xfs/fs/xfs/xfs_alloc.c
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_alloc.c 2011-03-21 14:49:14.000000000 +0100
> +++ xfs/fs/xfs/xfs_alloc.c 2011-03-21 14:51:31.746155282 +0100
> @@ -1396,7 +1396,8 @@ xfs_alloc_ag_vextent_small(
> if (error)
> goto error0;
> if (fbno != NULLAGBLOCK) {
> - xfs_alloc_busy_reuse(args->tp, args->agno, fbno, 1);
> + xfs_alloc_busy_reuse(args->tp, args->agno, fbno, 1,
> + args->userdata);
>
> if (args->userdata) {
> xfs_buf_t *bp;
> @@ -2431,7 +2432,8 @@ int /* error */
> xfs_free_extent(
> xfs_trans_t *tp, /* transaction pointer */
> xfs_fsblock_t bno, /* starting block number of extent */
> - xfs_extlen_t len) /* length of extent */
> + xfs_extlen_t len,
> + bool userdata)/* length of extent */
xfs_extlen_t len, /* length of extent */
bool userdata)
> {
> xfs_alloc_arg_t args;
> int error;
. . .
> @@ -2717,7 +2723,7 @@ restart: (in xfs_alloc_busy_reuse())
>
> overlap = xfs_alloc_busy_try_reuse(pag, busyp,
> fbno, fbno + flen);
> - if (overlap) {
> + if (overlap == -1 || (overlap && userdata)) {
xfs_alloc_busy_try_reuse() (still) never returns zero,
so this could just be:
if (overlap == -1 || userdata) {
I understand why we can skip forcing the log if
we're not doing a userdata allocation. But why
don't you also check busyp->flags here when it's
a userdata allocation, to see if it represents a
busy userdata section and therefore would allow
avoiding the log force (like is done below in
xfs_alloc_busy_trim())? You would have to grab
the flag value in busyp before the call.
> spin_unlock(&pag->pagb_lock);
> xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
> goto restart;
> @@ -2754,6 +2760,7 @@ xfs_alloc_busy_trim(
>
> ASSERT(flen > 0);
>
> +restart:
> spin_lock(&args->pag->pagb_lock);
> rbp = args->pag->pagb_tree.rb_node;
> while (rbp && flen >= args->minlen) {
> @@ -2771,6 +2778,31 @@ xfs_alloc_busy_trim(
> continue;
> }
>
> + if (!args->userdata ||
> + (busyp->flags & XFS_ALLOC_BUSY_USERDATA)) {
> + int overlap;
> +
> + overlap = xfs_alloc_busy_try_reuse(args->pag, busyp,
> + fbno, fbno + flen);
> + if (unlikely(overlap == -1)) {
> + spin_unlock(&args->pag->pagb_lock);
> + xfs_log_force(args->mp, XFS_LOG_SYNC);
> + goto restart;
> + }
> +
> + /*
> + * No more busy extents to search.
> + */
> + if (bbno <= fbno && bend >= fend)
> + goto out;
> +
> + if (fbno < bbno)
> + rbp = rbp->rb_left;
> + else
> + rbp = rbp->rb_right;
> + continue;
> + }
> +
> if (bbno <= fbno) {
> /* start overlap */
>
. . .
> Index: xfs/fs/xfs/xfs_ag.h
> ===================================================================
> --- xfs.orig/fs/xfs/xfs_ag.h 2011-03-21 14:48:04.000000000 +0100
> +++ xfs/fs/xfs/xfs_ag.h 2011-03-21 14:49:21.941981228 +0100
. . .
> @@ -3750,6 +3744,7 @@ xfs_bmap_add_free(
> new = kmem_zone_alloc(xfs_bmap_free_item_zone, KM_SLEEP);
> new->xbfi_startblock = bno;
> new->xbfi_blockcount = (xfs_extlen_t)len;
> + new->xbfi_flags = XFS_BFI_USERDATA;
Couldn't you arrange for the xfs_bmbt_free_block() path
to *not* set this? (As it stands, it will always be set.)
> for (prev = NULL, cur = flist->xbf_first;
> cur != NULL;
> prev = cur, cur = cur->xbfi_next) {
. . .
* Re: [PATCH 5/6] xfs: add online discard support
2011-03-22 19:55 ` [PATCH 5/6] xfs: add online discard support Christoph Hellwig
2011-03-23 0:30 ` Dave Chinner
@ 2011-03-25 21:04 ` Alex Elder
2011-03-28 12:35 ` Christoph Hellwig
1 sibling, 1 reply; 31+ messages in thread
From: Alex Elder @ 2011-03-25 21:04 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, 2011-03-22 at 15:55 -0400, Christoph Hellwig wrote:
> Now that we have reliable tracking of deleted extents in a transaction
> we can easily implement "online" discard support which calls
> blkdev_issue_discard once a transaction commits.
>
> The actual discard is a two stage operation as we first have to mark
> the busy extent as not available for reuse before we can start the
> actual discard. Note that we don't bother supporting discard for
> the non-delaylog mode. While that would be possible with this patch,
> performance is awful, and the optimization in the next patch won't
> work as easily.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
. . .
This one looks good overall. How about adding
documentation to Documentation/filesystems/xfs.txt
to describe the new mount options? (Dave, same
goes for you with delaylog. I haven't looked for
any other things missing from there.)
Reviewed-by: Alex Elder <aelder@sgi.com>
* Re: [PATCH 6/6] xfs: make discard operations asynchronous
2011-03-22 19:55 ` [PATCH 6/6] xfs: make discard operations asynchronous Christoph Hellwig
2011-03-23 0:43 ` Dave Chinner
@ 2011-03-25 21:04 ` Alex Elder
1 sibling, 0 replies; 31+ messages in thread
From: Alex Elder @ 2011-03-25 21:04 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: xfs
On Tue, 2011-03-22 at 15:55 -0400, Christoph Hellwig wrote:
> Instead of waiting for each discard request keep the CIL context alive
> until all of them are done, at which point we can tear it down completely
> and remove the busy extents from the rbtree.
>
> At this point I'm doing the I/O completion from IRQ context for simplicity,
> but I'll benchmark it against a version that uses a workqueue.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
This looks good.
Reviewed-by: Alex Elder <aelder@sgi.com>
. . .
* Re: [PATCH 1/6] xfs: optimize AGFL refills
2011-03-25 21:03 ` Alex Elder
@ 2011-03-28 12:07 ` Christoph Hellwig
0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-28 12:07 UTC (permalink / raw)
To: Alex Elder; +Cc: Christoph Hellwig, xfs
> The reason it was not obvious is because the case you're
> treating is specifically when moving extents (blocks,
> really) between the free list and the free btrees, in
> xfs_alloc_fix_freelist(). You still need to force it
> out when allocating and freeing "actually used" blocks,
> which could also be considered "moving blocks from/to
> the AGFL."
I've updated the comment to make that more clear.
* Re: [PATCH 2/6] xfs: do not immediately reuse busy extent ranges
2011-03-25 21:03 ` Alex Elder
@ 2011-03-28 12:07 ` Christoph Hellwig
0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-28 12:07 UTC (permalink / raw)
To: Alex Elder; +Cc: Christoph Hellwig, xfs
On Fri, Mar 25, 2011 at 04:03:19PM -0500, Alex Elder wrote:
> > + * Can be trimmed to:
> > + * +-------+ OR +-------+
> > + * fbno fend fbno fend
> > + *
> > + * We prefer the lower bno extent because the next
> > + * allocation for this inode will use "end" as the
> > + * target for first block. If the busy segment has
>
> ...will use the updated value of fend as the target...
I've replaced all these comments based on the text you replied to the
last round (finally..)
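The middle-overlap trimming described by the comments quoted above can be sketched roughly as follows. This is a hypothetical, simplified model only: the function name, types, and the half-open range convention are made up here for illustration and are not the actual kernel code.

```c
#include <stdint.h>

typedef uint32_t agblock_t;

/*
 * A free extent [fbno, fend) overlaps a busy extent [bbno, bend), so
 * only one side of the free extent can be handed out.  Per the comment
 * above, prefer the lower-bno piece when it is non-empty, because the
 * next allocation for this inode will aim at the end of the previous
 * allocation as its target.
 */
static void trim_around_busy(agblock_t fbno, agblock_t fend,
			     agblock_t bbno, agblock_t bend,
			     agblock_t *tbno, agblock_t *tend)
{
	if (bbno > fbno) {
		/* keep [fbno, bbno): the preferred lower piece */
		*tbno = fbno;
		*tend = bbno;
	} else {
		/* lower piece is empty; fall back to [bend, fend) */
		*tbno = bend;
		*tend = fend;
	}
}
```

The real xfs_alloc_busy_trim() has more cases (start, end, and exact overlaps); this only illustrates the lower-bno preference being discussed.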
* Re: [PATCH 3/6] xfs: exact busy extent tracking
2011-03-25 21:04 ` Alex Elder
@ 2011-03-28 12:10 ` Christoph Hellwig
0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-28 12:10 UTC (permalink / raw)
To: Alex Elder; +Cc: Christoph Hellwig, xfs
On Fri, Mar 25, 2011 at 04:04:20PM -0500, Alex Elder wrote:
> > - * Insert a new extent into the busy tree.
>
> OK, to be honest I haven't re-read this entire block of
> comment text to identify what might be of value. But
> is there really *nothing* worth saving? Is the busy
> extent tree documented adequately elsewhere?
The old text doesn't really apply to the new code anymore. I'll see
if there's a place I can insert some more documentation, but it already
has quite a lot of comments.
> > if (new->bno < busyp->bno) {
> > /* may overlap, but exact start block is lower */
>
> This comment isn't really right any more (BUG_ON that condition).
> > } else if (new->bno > busyp->bno) {
> > /* may overlap, but exact start block is higher */
>
> This one too.
Removed.
> > + continue;
> > + } else if (fbno >= bend) {
> > + rbp = rbp->rb_right;
> > + continue;
> > + }
> > +
>
> I was going to suggest:
> /* Extent overlaps busy extent */
>
> here, but I think that may not be the case any more if
> busyp->length is zero. Or rather, the extent may
> surround the zero-length busy extent (which I suppose
> could be considered overlap).
>
> If busyp->length is zero, I think the call to
> xfs_alloc_busy_try_reuse() is not needed; in fact,
> if it is already 0, that function will call
> rb_erase() on the entry again.
We will never see zero-length busy extents here. While they have
to remain on the per-transaction/cil_context list, they are properly
removed from the rbtree.
> ...therefore this branch is always taken, and the code
> below this block to the end of the loop is not reached.
>
> Since this is the only place it's used, xfs_alloc_busy_try_reuse()
> might as well be defined as a void function.
>
> (Ahh, but now that I've looked at the later patches
> I see it gets used again later. I'm leaving my
> comments here nevertheless.)
Yes, it's changing in the next patch.
* Re: [PATCH 5/6] xfs: add online discard support
2011-03-25 21:04 ` Alex Elder
@ 2011-03-28 12:35 ` Christoph Hellwig
0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-28 12:35 UTC (permalink / raw)
To: Alex Elder; +Cc: Christoph Hellwig, xfs
On Fri, Mar 25, 2011 at 04:04:52PM -0500, Alex Elder wrote:
> This one looks good overall. How about adding
> documentation to Documentation/filesystems/xfs.txt
> to describe the new mount options?
I've added it.
* Re: [PATCH 6/6] xfs: make discard operations asynchronous
2011-03-23 0:43 ` Dave Chinner
@ 2011-03-28 12:44 ` Christoph Hellwig
0 siblings, 0 replies; 31+ messages in thread
From: Christoph Hellwig @ 2011-03-28 12:44 UTC (permalink / raw)
To: Dave Chinner; +Cc: Christoph Hellwig, xfs
On Wed, Mar 23, 2011 at 11:43:23AM +1100, Dave Chinner wrote:
> > + struct xfs_cil_ctx *ctx = bio->bi_private;
> > +
> > + if (err && err != -EOPNOTSUPP) {
> > + xfs_info(ctx->cil->xc_log->l_mp,
> > + "I/O error during discard\n");
> > + }
>
> Same comment about the bno/len in the error message as the previous
> patch.
We don't have the len information available anymore at this point,
but I've added the startblock.
> > + max_discard_sectors = min(q->limits.max_discard_sectors, UINT_MAX >> 9);
> > + if (q->limits.discard_granularity) {
> > + unsigned int disc_sects = q->limits.discard_granularity >> 9;
> > +
> > + max_discard_sectors &= ~(disc_sects - 1);
> > + }
>
> This is asking for a helper function....
Yes, but that helper needs to be shared with the block layer discard
code, so I'll send it through Jens' tree later.
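The rounding logic quoted above could be factored into a helper along these lines. This is a sketch only: the struct and field names mirror the quoted patch, but the helper name and standalone form are invented here and are not the actual block-layer API.

```c
#include <limits.h>

/* Illustrative subset of the block layer's queue limits. */
struct queue_limits {
	unsigned int max_discard_sectors;
	unsigned int discard_granularity;	/* bytes, power of two */
};

/*
 * Return the largest number of 512-byte sectors a single discard bio
 * may cover: capped so the byte count fits in an unsigned int, and
 * rounded down to a whole number of discard granules.
 */
static unsigned int discard_sectors_per_bio(const struct queue_limits *l)
{
	unsigned int max_sectors = l->max_discard_sectors;

	/* bi_size is a byte count in an unsigned int */
	if (max_sectors > (UINT_MAX >> 9))
		max_sectors = UINT_MAX >> 9;

	if (l->discard_granularity) {
		unsigned int granularity_sectors =
			l->discard_granularity >> 9;

		/* round down to a multiple of the granularity */
		max_sectors &= ~(granularity_sectors - 1);
	}
	return max_sectors;
}
```

The masking trick relies on the granularity being a power of two, which is why the quoted patch can use `&= ~(disc_sects - 1)` instead of a division.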
> > + if (keep_alive)
> > + xfs_cil_discard_done(ctx);
> > + else
> > + kmem_free(ctx);
>
> You could probably just call xfs_cil_discard_done(ctx) here as the
> busy extent list will be empty in the !keep_alive case and so it
> will simply do the kmem_free(ctx) call there.
This means we clear the busy list a little later in the non-discard
case, but it cleans up the code nicely. I've added it to the series
as a separate patch.
end of thread, other threads:[~2011-03-28 12:40 UTC | newest]
Thread overview: 31+ messages
2011-03-22 19:55 [PATCH 0/6] [PATCH 0/6] more efficient busy extent handling and discard support Christoph Hellwig
2011-03-22 19:55 ` [PATCH 1/6] xfs: optimize AGFL refills Christoph Hellwig
2011-03-22 22:30 ` Alex Elder
2011-03-23 12:16 ` Christoph Hellwig
2011-03-25 21:03 ` Alex Elder
2011-03-28 12:07 ` Christoph Hellwig
2011-03-22 23:30 ` Dave Chinner
2011-03-23 12:16 ` Christoph Hellwig
2011-03-22 19:55 ` [PATCH 2/6] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
2011-03-22 22:30 ` Alex Elder
2011-03-23 12:17 ` Christoph Hellwig
2011-03-25 21:03 ` Alex Elder
2011-03-28 12:07 ` Christoph Hellwig
2011-03-22 19:55 ` [PATCH 3/6] xfs: exact busy extent tracking Christoph Hellwig
2011-03-22 23:47 ` Dave Chinner
2011-03-23 12:24 ` Christoph Hellwig
2011-03-25 21:04 ` Alex Elder
2011-03-28 12:10 ` Christoph Hellwig
2011-03-22 19:55 ` [PATCH 4/6] xfs: allow reusing busy extents where safe Christoph Hellwig
2011-03-23 0:20 ` Dave Chinner
2011-03-23 12:26 ` Christoph Hellwig
2011-03-25 21:04 ` Alex Elder
2011-03-22 19:55 ` [PATCH 5/6] xfs: add online discard support Christoph Hellwig
2011-03-23 0:30 ` Dave Chinner
2011-03-23 12:31 ` Christoph Hellwig
2011-03-25 21:04 ` Alex Elder
2011-03-28 12:35 ` Christoph Hellwig
2011-03-22 19:55 ` [PATCH 6/6] xfs: make discard operations asynchronous Christoph Hellwig
2011-03-23 0:43 ` Dave Chinner
2011-03-28 12:44 ` Christoph Hellwig
2011-03-25 21:04 ` Alex Elder