All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: Christoph Hellwig <hch@infradead.org>
Cc: xfs@oss.sgi.com
Subject: Re: [PATCH 3/6] xfs: do not immediately reuse busy extent ranges
Date: Fri, 28 Jan 2011 12:58:35 +1100	[thread overview]
Message-ID: <20110128015835.GQ21311@dastard> (raw)
In-Reply-To: <20110121092550.933551564@bombadil.infradead.org>

On Fri, Jan 21, 2011 at 04:22:30AM -0500, Christoph Hellwig wrote:
> Every time we reallocate a busy extent, we cause a synchronous log force
> to occur to ensure the freeing transaction is on disk before we continue
> and use the newly allocated extent.  This is extremely sub-optimal as we
> have to mark every transaction with blocks that get reused as synchronous.
> 
> Instead of searching the busy extent list after deciding on the extent to
> allocate, check each candidate extent during the allocation decisions as
> to whether they are in the busy list.  If they are in the busy list, we
> trim the busy range out of the extent we have found and determine if that
> trimmed range is still OK for allocation. In many cases, this check can
> be incorporated into the allocation extent alignment code which already
> does trimming of the found extent before determining if it is a valid
> candidate for allocation.
> 
> [hch: merged two earlier patches from Dave and fixed various bugs]
> 
> Signed-off-by: Dave Chinner <david@fromorbit.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
....
>  	ASSERT(args->agbno % args->alignment == 0);
>  
>  	if (!args->wasfromfl) {
> -		xfs_alloc_update_counters(args->tp, args->pag, args->agbp,
> -					  -((long)(args->len)));
> +		error = xfs_alloc_update_counters(args->tp, args->pag,
> +						  args->agbp,
> +						  -((long)(args->len)));
>  
> -		/*
> -		 * Search the busylist for these blocks and mark the
> -		 * transaction as synchronous if blocks are found. This
> -		 * avoids the need to block due to a synchronous log
> -		 * force to ensure correct ordering as the synchronous
> -		 * transaction will guarantee that for us.
> -		 */
> -		if (xfs_alloc_busy_search(args->mp, args->agno,
> -					args->agbno, args->len))
> -			xfs_trans_set_sync(args->tp);
> +		ASSERT(!xfs_alloc_busy_search(args->mp, args->agno,
> +					      args->agbno, args->len));
>  	}
>  
>  	if (!args->isfl) {
> @@ -559,7 +554,7 @@ xfs_alloc_ag_vextent(
>  
>  	XFS_STATS_INC(xs_allocx);
>  	XFS_STATS_ADD(xs_allocb, args->len);
> -	return 0;
> +	return error;
>  }

These error handling changes could go in the previous patch.

> @@ -2612,7 +2688,7 @@ restart:
>   * will require a synchronous transaction, but it can still be
>   * used to distinguish between a partial or exact match.
>   */
> -int
> +STATIC int
>  xfs_alloc_busy_search(
>  	struct xfs_mount	*mp,
>  	xfs_agnumber_t		agno,
> @@ -2654,6 +2730,71 @@ xfs_alloc_busy_search(
>  	return match;
>  }
>  
> +/*
> + * For a given extent [fbno, flen], search the busy extent list
> + * to find a subset of the extent that is not busy.
> + */

How does this function return an indication that the extent selected
is entirrely unsuitable? e.g. the extent lies wholly within
the busy range and gets trimmed to zero length? perhaps it woul dbe
good to return an error in this case?

> +void
> +xfs_alloc_busy_search_trim(
> +	struct xfs_mount	*mp,
> +	struct xfs_perag	*pag,
> +	xfs_agblock_t		fbno,
> +	xfs_extlen_t		flen,
> +	xfs_agblock_t		*rbno,
> +	xfs_extlen_t		*rlen)
> +{
> +	struct rb_node		*rbp;
> +	xfs_agblock_t           bno = fbno;
> +	xfs_extlen_t            len = flen;
> +
> +	spin_lock(&pag->pagb_lock);
> +	rbp = pag->pagb_tree.rb_node;
> +	while (rbp) {
> +		struct xfs_busy_extent *busyp =
> +			rb_entry(rbp, struct xfs_busy_extent, rb_node);
> +		xfs_agblock_t	end = bno + len;
> +		xfs_agblock_t	bend = busyp->bno + busyp->length;
> +
> +		if (bno + len <= busyp->bno) {
> +			rbp = rbp->rb_left;
> +			continue;
> +		} else if (bno >= busyp->bno + busyp->length) {
> +			rbp = rbp->rb_right;
> +			continue;
> +		}

		if (end <= bbno)
			left;
		else if (bno > bend)
			right;

		/* overlap */

> +
> +		if (busyp->bno < bno) {
> +			/* start overlap */
> +			ASSERT(bend >= bno);
> +			ASSERT(bend <= end);
> +			len -= bno - bend;
> +			bno = bend;

		if (bbno < bno) {

		bbno           bend
		+-----------------+
Case 1:
		   +---------+
		   bno     end

		   No unbusy region in extent, return failure

Case 2:
		   +------------------------+
		   bno                    end

		   Needs to be trimmed to:
		                    +-------+
		                    bno   end
		   bno = bend;
		   len = end - bno;

> +		} else if (bend > end) {
> +			/* end overlap */
> +			ASSERT(busyp->bno >= bno);
> +			ASSERT(busyp->bno < end);
> +			len -= bend - end;

		} else if (bend > end) {

bno must be <= bbno in this case, so:

		bbno           bend
		+-----------------+

Case 3: bno == bbno:
		+-------------+
		bno         end

		No unbusy region in extent, return failure.

Case 4:
        +------------------+
        bno              end

   Needs to be trimmed to:
        +-------+
        bno   end

		end = bbno;
		len = end - bno;



> +		} else {
> +			/* middle overlap - return larger segment */
> +			ASSERT(busyp->bno >= bno);
> +			ASSERT(bend <= end);
> +			len = busyp->bno - bno;
> +			if (len >= end - bend) {
> +				/* use first segment */
> +				len = len;
> +			} else {
> +				/* use last segment */
> +				bno = bend;
> +				len = end - bend;
> +			}

(bno <= bbno && bend <= end) in this case, so:

		bbno           bend
		+-----------------+

Case 5: Exact match: (bno == bbno && bend == end)
		+-----------------+
		bno             end

		No unbusy region in extent, return failure.

Case 6: (bno == bbno && bend < end)

		+-------------------------+
		bno                     end

		Needs to be trimmed to:
		                  +-------+
		                  bno   end

		Same as Case 2.
			- case 2 can be extend to bbno <= bno.

Case 7: (bno < bbno && bend == end)

        +-------------------------+
        bno                     end

   Needs to be trimmed to:
        +-------+
        bno   end

		Same as case 4.
			- Case 4 can be extented to bend >= end

Case 8: (bno < bbno && bend < end)

        +---------------------------------+
        bno                             end

can be trimmed to:
        +-------+        OR       +-------+
        bno   end                 bno   end

	- Should always want to choose the lower bno extent:
		- next allocation in file will use "end" as
		  the target for first block
		- if busy segment has cleared, will get contiguous
		  allocation
		- if busy segment has not cleared, get allocation at
		  bend, which is forwards allocation.

		- if we choose segment at bend, and this remains the
		  best extent for the next allocation (e.g.
		  NEAR_BNO allocation) we'll next allocate at bno,
		  which will give us backwards allocation.

		- always chose the option that produces forwards
		  allocation patterns so that sequential reads and
		  writes only ever seek in one direction.

		- we already know that backwards allocation
		  direction (due to NEAR_BNO second algorithm)
		  causes significant fragmentation of directories
		  and degradataion of directory performance, so I
		  think we should avoid introducing a new allocation
		  algorithm that can lead to backwards allocation
		  around frequently reused extents.

	- only choose right hand edge if the remainin unused extent
	  length is much larger than the current allocation request.

	- otherwise return failure if we can't use lower bno due to
	  minlen restrictions so a new extent is chosen by the
	  higher level algorithm.


So, it looks to me like the "overlap found" algorithm shoul dbe
something like:

		if (bbno <= bno) {
			if (end <= bend) {
				/* case 1, 3, 5 */
				return failure;
			}
			/* case 2, 6 */
			bno = bend;
			len = end - bno;
		} else if (bend >= end) {
			ASSERT(bbno > bno);
			/* case 4, 7 */
			end = bbno;
			len = end - bno;
		} else {
			ASSERT(bbno > bno);
			ASSERT(bend < end);
			/* case 8 */
			if (bbno - bno >= args->minlen) {
				/* left candidate OK */
				left = 1;
			}
			if (end - bend >= args->maxlen * 4) {
				/* right candidate OK */
				right = 1;
			}
			if (left && right) {
				/* take right if left is not a
				 * maximal allocation */
				if (bbno - bno < args->maxlen)
					left = 0;
			}
			if (left) {
				end = bbno;
				len = end - bno;
			} else if (right) {
				bno = bend;
				len = end - bno;
			} else {
				return failure;
			}
		}

> @@ -109,19 +109,16 @@ xfs_trim_extents(
>  		 * If any blocks in the range are still busy, skip the
>  		 * discard and try again the next time.
>  		 */
> -		if (xfs_alloc_busy_search(mp, agno, fbno, flen)) {
> -			trace_xfs_discard_busy(mp, agno, fbno, flen);
> -			goto next_extent;
> -		}
> +		xfs_alloc_busy_search_trim(mp, pag, fbno, flen, &tbno, &tlen);
>  
> -		trace_xfs_discard_extent(mp, agno, fbno, flen);
> +		trace_xfs_discard_extent(mp, agno, tbno, tlen);
>  		error = -blkdev_issue_discard(bdev,
> -				XFS_AGB_TO_DADDR(mp, agno, fbno),
> -				XFS_FSB_TO_BB(mp, flen),
> +				XFS_AGB_TO_DADDR(mp, agno, tbno),
> +				XFS_FSB_TO_BB(mp, tlen),
>  				GFP_NOFS, 0);
>  		if (error)
>  			goto out_del_cursor;
> -		*blocks_trimmed += flen;
> +		*blocks_trimmed += tlen;

Hmmm - that means if we get a case 8 overlap, we'll only trim one
side of the extent. That's probably not a big deal. However, perhaps
this should check the size of the trimmed extent before issuing the
discard against it in case we've reduced it to something smaller
thanthe minimum requested trim size....

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

  parent reply	other threads:[~2011-01-28  1:56 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-01-21  9:22 [PATCH 0/6] do not reuse busy extents Christoph Hellwig
2011-01-21  9:22 ` [PATCH 1/6] xfs: clean up the xfs_alloc_compute_aligned calling convention Christoph Hellwig
2011-01-25  4:23   ` Dave Chinner
2011-01-27 23:21   ` Alex Elder
2011-01-21  9:22 ` [PATCH 3/6] xfs: do not immediately reuse busy extent ranges Christoph Hellwig
2011-01-27 23:20   ` Alex Elder
2011-01-28  1:58   ` Dave Chinner [this message]
2011-01-28 16:19     ` Alex Elder
2011-01-29  0:25       ` Dave Chinner
2011-01-21  9:22 ` [PATCH 4/6] xfs: optimize xfs_alloc_fix_freelist Christoph Hellwig
2011-01-28  5:36   ` Dave Chinner
2011-01-28  5:51     ` Dave Chinner
2011-01-28 22:17   ` Alex Elder
2011-01-21  9:22 ` [PATCH 5/6] xfs: do not classify freed allocation btree blocks as busy Christoph Hellwig
2011-01-28  6:33   ` Dave Chinner
2011-01-28 22:17   ` Alex Elder
2011-02-01 23:02   ` Alex Elder
2011-01-21  9:22 ` [PATCH 6/6] xfs: remove handling of duplicates the busy extent tree Christoph Hellwig
2011-02-01 23:02   ` Alex Elder

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110128015835.GQ21311@dastard \
    --to=david@fromorbit.com \
    --cc=hch@infradead.org \
    --cc=xfs@oss.sgi.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.