From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 3/6] xfs: do not immediately reuse busy extent ranges
Date: Fri, 28 Jan 2011 12:58:35 +1100 [thread overview]
Message-ID: <20110128015835.GQ21311@dastard> (raw)
In-Reply-To: <20110121092550.933551564@bombadil.infradead.org>
On Fri, Jan 21, 2011 at 04:22:30AM -0500, Christoph Hellwig wrote:
> Every time we reallocate a busy extent, we cause a synchronous log force
> to occur to ensure the freeing transaction is on disk before we continue
> and use the newly allocated extent. This is extremely sub-optimal as we
> have to mark every transaction with blocks that get reused as synchronous.
>
> Instead of searching the busy extent list after deciding on the extent to
> allocate, check each candidate extent during the allocation decisions as
> to whether they are in the busy list. If they are in the busy list, we
> trim the busy range out of the extent we have found and determine if that
> trimmed range is still OK for allocation. In many cases, this check can
> be incorporated into the allocation extent alignment code which already
> does trimming of the found extent before determining if it is a valid
> candidate for allocation.
>
> [hch: merged two earlier patches from Dave and fixed various bugs]
>
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
....
> ASSERT(args->agbno % args->alignment == 0);
>
> if (!args->wasfromfl) {
> - xfs_alloc_update_counters(args->tp, args->pag, args->agbp,
> - -((long)(args->len)));
> + error = xfs_alloc_update_counters(args->tp, args->pag,
> + args->agbp,
> + -((long)(args->len)));
>
> - /*
> - * Search the busylist for these blocks and mark the
> - * transaction as synchronous if blocks are found. This
> - * avoids the need to block due to a synchronous log
> - * force to ensure correct ordering as the synchronous
> - * transaction will guarantee that for us.
> - */
> - if (xfs_alloc_busy_search(args->mp, args->agno,
> - args->agbno, args->len))
> - xfs_trans_set_sync(args->tp);
> + ASSERT(!xfs_alloc_busy_search(args->mp, args->agno,
> + args->agbno, args->len));
> }
>
> if (!args->isfl) {
> @@ -559,7 +554,7 @@ xfs_alloc_ag_vextent(
>
> XFS_STATS_INC(xs_allocx);
> XFS_STATS_ADD(xs_allocb, args->len);
> - return 0;
> + return error;
> }
These error handling changes could go in the previous patch.
> @@ -2612,7 +2688,7 @@ restart:
> * will require a synchronous transaction, but it can still be
> * used to distinguish between a partial or exact match.
> */
> -int
> +STATIC int
> xfs_alloc_busy_search(
> struct xfs_mount *mp,
> xfs_agnumber_t agno,
> @@ -2654,6 +2730,71 @@ xfs_alloc_busy_search(
> return match;
> }
>
> +/*
> + * For a given extent [fbno, flen], search the busy extent list
> + * to find a subset of the extent that is not busy.
> + */
How does this function return an indication that the extent selected
is entirrely unsuitable? e.g. the extent lies wholly within
the busy range and gets trimmed to zero length? perhaps it woul dbe
good to return an error in this case?
> +void
> +xfs_alloc_busy_search_trim(
> + struct xfs_mount *mp,
> + struct xfs_perag *pag,
> + xfs_agblock_t fbno,
> + xfs_extlen_t flen,
> + xfs_agblock_t *rbno,
> + xfs_extlen_t *rlen)
> +{
> + struct rb_node *rbp;
> + xfs_agblock_t bno = fbno;
> + xfs_extlen_t len = flen;
> +
> + spin_lock(&pag->pagb_lock);
> + rbp = pag->pagb_tree.rb_node;
> + while (rbp) {
> + struct xfs_busy_extent *busyp =
> + rb_entry(rbp, struct xfs_busy_extent, rb_node);
> + xfs_agblock_t end = bno + len;
> + xfs_agblock_t bend = busyp->bno + busyp->length;
> +
> + if (bno + len <= busyp->bno) {
> + rbp = rbp->rb_left;
> + continue;
> + } else if (bno >= busyp->bno + busyp->length) {
> + rbp = rbp->rb_right;
> + continue;
> + }
if (end <= bbno)
left;
else if (bno > bend)
right;
/* overlap */
> +
> + if (busyp->bno < bno) {
> + /* start overlap */
> + ASSERT(bend >= bno);
> + ASSERT(bend <= end);
> + len -= bno - bend;
> + bno = bend;
if (bbno < bno) {
bbno bend
+-----------------+
Case 1:
+---------+
bno end
No unbusy region in extent, return failure
Case 2:
+------------------------+
bno end
Needs to be trimmed to:
+-------+
bno end
bno = bend;
len = end - bno;
> + } else if (bend > end) {
> + /* end overlap */
> + ASSERT(busyp->bno >= bno);
> + ASSERT(busyp->bno < end);
> + len -= bend - end;
} else if (bend > end) {
bno must be <= bbno in this case, so:
bbno bend
+-----------------+
Case 3: bno == bbno:
+-------------+
bno end
No unbusy region in extent, return failure.
Case 4:
+------------------+
bno end
Needs to be trimmed to:
+-------+
bno end
end = bbno;
len = end - bno;
> + } else {
> + /* middle overlap - return larger segment */
> + ASSERT(busyp->bno >= bno);
> + ASSERT(bend <= end);
> + len = busyp->bno - bno;
> + if (len >= end - bend) {
> + /* use first segment */
> + len = len;
> + } else {
> + /* use last segment */
> + bno = bend;
> + len = end - bend;
> + }
(bno <= bbno && bend <= end) in this case, so:
bbno bend
+-----------------+
Case 5: Exact match: (bno == bbno && bend == end)
+-----------------+
bno end
No unbusy region in extent, return failure.
Case 6: (bno == bbno && bend < end)
+-------------------------+
bno end
Needs to be trimmed to:
+-------+
bno end
Same as Case 2.
- case 2 can be extend to bbno <= bno.
Case 7: (bno < bbno && bend == end)
+-------------------------+
bno end
Needs to be trimmed to:
+-------+
bno end
Same as case 4.
- Case 4 can be extented to bend >= end
Case 8: (bno < bbno && bend < end)
+---------------------------------+
bno end
can be trimmed to:
+-------+ OR +-------+
bno end bno end
- Should always want to choose the lower bno extent:
- next allocation in file will use "end" as
the target for first block
- if busy segment has cleared, will get contiguous
allocation
- if busy segment has not cleared, get allocation at
bend, which is forwards allocation.
- if we choose segment at bend, and this remains the
best extent for the next allocation (e.g.
NEAR_BNO allocation) we'll next allocate at bno,
which will give us backwards allocation.
- always chose the option that produces forwards
allocation patterns so that sequential reads and
writes only ever seek in one direction.
- we already know that backwards allocation
direction (due to NEAR_BNO second algorithm)
causes significant fragmentation of directories
and degradataion of directory performance, so I
think we should avoid introducing a new allocation
algorithm that can lead to backwards allocation
around frequently reused extents.
- only choose right hand edge if the remainin unused extent
length is much larger than the current allocation request.
- otherwise return failure if we can't use lower bno due to
minlen restrictions so a new extent is chosen by the
higher level algorithm.
So, it looks to me like the "overlap found" algorithm shoul dbe
something like:
if (bbno <= bno) {
if (end <= bend) {
/* case 1, 3, 5 */
return failure;
}
/* case 2, 6 */
bno = bend;
len = end - bno;
} else if (bend >= end) {
ASSERT(bbno > bno);
/* case 4, 7 */
end = bbno;
len = end - bno;
} else {
ASSERT(bbno > bno);
ASSERT(bend < end);
/* case 8 */
if (bbno - bno >= args->minlen) {
/* left candidate OK */
left = 1;
}
if (end - bend >= args->maxlen * 4) {
/* right candidate OK */
right = 1;
}
if (left && right) {
/* take right if left is not a
* maximal allocation */
if (bbno - bno < args->maxlen)
left = 0;
}
if (left) {
end = bbno;
len = end - bno;
} else if (right) {
bno = bend;
len = end - bno;
} else {
return failure;
}
}
> @@ -109,19 +109,16 @@ xfs_trim_extents(
> * If any blocks in the range are still busy, skip the
> * discard and try again the next time.
> */
> - if (xfs_alloc_busy_search(mp, agno, fbno, flen)) {
> - trace_xfs_discard_busy(mp, agno, fbno, flen);
> - goto next_extent;
> - }
> + xfs_alloc_busy_search_trim(mp, pag, fbno, flen, &tbno, &tlen);
>
> - trace_xfs_discard_extent(mp, agno, fbno, flen);
> + trace_xfs_discard_extent(mp, agno, tbno, tlen);
> error = -blkdev_issue_discard(bdev,
> - XFS_AGB_TO_DADDR(mp, agno, fbno),
> - XFS_FSB_TO_BB(mp, flen),
> + XFS_AGB_TO_DADDR(mp, agno, tbno),
> + XFS_FSB_TO_BB(mp, tlen),
> GFP_NOFS, 0);
> if (error)
> goto out_del_cursor;
> - *blocks_trimmed += flen;
> + *blocks_trimmed += tlen;
Hmmm - that means if we get a case 8 overlap, we'll only trim one
side of the extent. That's probably not a big deal. However, perhaps
this should check the size of the trimmed extent before issuing the
discard against it in case we've reduced it to something smaller
thanthe minimum requested trim size....
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com
_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs
next prev parent reply other threads:[~2011-01-28 1:56 UTC|newest]
Thread overview: 19+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-21 9:22 [PATCH 0/6] do not reuse busy extents Christoph Hellwig
2011-01-21 9:22 ` [PATCH 1/6] xfs: clean up the xfs_alloc_compute_aligned calling convention Christoph Hellwig
2011-01-25 4:23 ` Dave Chinner
2011-01-27 23:21 ` Alex Elder
2011-01-21 9:22 ` [PATCH 3/6] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
2011-01-27 23:20 ` Alex Elder
2011-01-28 1:58 ` Dave Chinner [this message]
2011-01-28 16:19 ` Alex Elder
2011-01-29 0:25 ` Dave Chinner
2011-01-21 9:22 ` [PATCH 4/6] xfs: optimize xfs_alloc_fix_freelist Christoph Hellwig
2011-01-28 5:36 ` Dave Chinner
2011-01-28 5:51 ` Dave Chinner
2011-01-28 22:17 ` Alex Elder
2011-01-21 9:22 ` [PATCH 5/6] xfs: do not classify freed allocation btree blocks as busy Christoph Hellwig
2011-01-28 6:33 ` Dave Chinner
2011-01-28 22:17 ` Alex Elder
2011-02-01 23:02 ` Alex Elder
2011-01-21 9:22 ` [PATCH 6/6] xfs: remove handling of duplicates the busy extent tree Christoph Hellwig
2011-02-01 23:02 ` Alex Elder
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20110128015835.GQ21311@dastard \
--to=david@fromorbit.com \
--cc=hch@infradead.org \
--cc=xfs@oss.sgi.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.