From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <xfs-bounces@oss.sgi.com>
Received: from cuda.sgi.com (cuda3.sgi.com [192.48.176.15])
	by oss.sgi.com (8.14.3/8.14.3/SuSE Linux 0.8) with ESMTP id
	p0T0NSQi181324 for <xfs@oss.sgi.com>; Fri, 28 Jan 2011 18:23:29 -0600
Received: from ipmail06.adl2.internode.on.net (localhost [127.0.0.1])
	by cuda.sgi.com (Spam Firewall) with ESMTP id 5F0841A47120
	for <xfs@oss.sgi.com>; Fri, 28 Jan 2011 16:25:53 -0800 (PST)
Received: from ipmail06.adl2.internode.on.net (ipmail06.adl2.internode.on.net
	[150.101.137.129]) by cuda.sgi.com with ESMTP id
	AF3tXlLVerWU9Bwq for <xfs@oss.sgi.com>;
	Fri, 28 Jan 2011 16:25:53 -0800 (PST)
Date: Sat, 29 Jan 2011 11:25:50 +1100
From: Dave Chinner <david@fromorbit.com>
Subject: Re: [PATCH 3/6] xfs: do not immediately reuse busy extent ranges
Message-ID: <20110129002550.GA21311@dastard>
References: <20110121092227.115815324@bombadil.infradead.org>
	<20110121092550.933551564@bombadil.infradead.org>
	<20110128015835.GQ21311@dastard> <1296231591.2342.47.camel@doink>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: <1296231591.2342.47.camel@doink>
List-Id: XFS Filesystem from SGI <xfs.oss.sgi.com>
List-Unsubscribe: <http://oss.sgi.com/mailman/options/xfs>,
	<mailto:xfs-request@oss.sgi.com?subject=unsubscribe>
List-Archive: <http://oss.sgi.com/pipermail/xfs>
List-Post: <mailto:xfs@oss.sgi.com>
List-Help: <mailto:xfs-request@oss.sgi.com?subject=help>
List-Subscribe: <http://oss.sgi.com/mailman/listinfo/xfs>,
	<mailto:xfs-request@oss.sgi.com?subject=subscribe>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Sender: xfs-bounces@oss.sgi.com
Errors-To: xfs-bounces@oss.sgi.com
To: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>, xfs@oss.sgi.com

On Fri, Jan 28, 2011 at 10:19:51AM -0600, Alex Elder wrote:
> On Fri, 2011-01-28 at 12:58 +1100, Dave Chinner wrote:
> > On Fri, Jan 21, 2011 at 04:22:30AM -0500, Christoph Hellwig wrote:
> > > Every time we reallocate a busy extent, we cause a synchronous log force
> > > to occur to ensure the freeing transaction is on disk before we continue
> . . .
> > > +
> > > +	spin_lock(&pag->pagb_lock);
> > > +	rbp = pag->pagb_tree.rb_node;
> > > +	while (rbp) {
> 
> I will amend the loop termination condition I suggested
> before to be this:
> 
> 	while (rbp && len >= args->minlen) {

Makes sense.

> > > +		struct xfs_busy_extent *busyp =
> > > +			rb_entry(rbp, struct xfs_busy_extent, rb_node);
> > > +		xfs_agblock_t	end = bno + len;
> > > +		xfs_agblock_t	bend = busyp->bno + busyp->length;
> > > +
> > > +		if (bno + len <= busyp->bno) {
> > > +			rbp = rbp->rb_left;
> > > +			continue;
> > > +		} else if (bno >= busyp->bno + busyp->length) {
> > > +			rbp = rbp->rb_right;
> > > +			continue;
> > > +		}
> > 
> > 		if (end <= bbno)
> > 			left;
> > 		else if (bno > bend)
> > 			right;
> 
> I think the original code is right in this case.
> The value of "bend" is the offset *following* the
> end of the range.  So if "bno" equals that, we
> want to move Right.  (Same reason <= is correct
> for the first condition here.)

Oops, yes, you are right. Good catch - I failed to copy that code
into psuedo code correctly.

> 
> > 		/* overlap */
> > 
> > > +
> > > +		if (busyp->bno < bno) {
> > > +			/* start overlap */
> > > +			ASSERT(bend >= bno);
> > > +			ASSERT(bend <= end);
> > > +			len -= bno - bend;
> > > +			bno = bend;
> > 
> > 		if (bbno < bno) {
> > 
> > 		bbno           bend
> > 		+-----------------+
> > Case 1:
> > 		   +---------+
> > 		   bno     end
> > 
> > 		   No unbusy region in extent, return failure
> 
> Yes, that's right, I missed that.  My suggestion goes
> negative in this case.
> 
> > Case 2:
> > 		   +------------------------+
> > 		   bno                    end
> > 
> > 		   Needs to be trimmed to:
> > 		                    +-------+
> > 		                    bno   end
> > 		   bno = bend;
> > 		   len = end - bno;
> 
> I like defining len in terms of the updated bno as
> you have suggested here.
> 
> > > +		} else if (bend > end) {
> > > +			/* end overlap */
> > > +			ASSERT(busyp->bno >= bno);
> > > +			ASSERT(busyp->bno < end);
> > > +			len -= bend - end;
> > 
> . . .
> 
> 
> > So, it looks to me like the "overlap found" algorithm shoul dbe
> > something like:
> 
> For this algorithm, updating the value of len can be done
> once, at the bottom (or top) of the loop, based simply on
> the (updated) value of end and bno:
> 
> 	len = end - bno;
> 
> You could rearrange things a bit so this gets done at
> the top--instead of computing the value of end based
> on bno and len.

Quite possibly - I just wanted to enumerate what I though the code
should do rather than present a optimal, completed function ;)

> > 		if (bbno <= bno) {
> > 			if (end <= bend) {
> > 				/* case 1, 3, 5 */
> > 				return failure;
> > 			}
> > 			/* case 2, 6 */
> > 			bno = bend;
> > 			len = end - bno;
> > 		} else if (bend >= end) {
> > 			ASSERT(bbno > bno);
> > 			/* case 4, 7 */
> > 			end = bbno;
> > 			len = end - bno;
> > 		} else {
> > 			ASSERT(bbno > bno);
> > 			ASSERT(bend < end);
> > 			/* case 8 */
> > 			if (bbno - bno >= args->minlen) {
> > 				/* left candidate OK */
> > 				left = 1;
> > 			}
> > 			if (end - bend >= args->maxlen * 4) {
> 
> The "4" here I understand, but it's arbitrary (based
> on an educated guess) so it needs to at least be explained
> here with a comment.  Making it symbolic might make it
> something one could search for at some future date.

Yup. That value of "4" was simply a SWAG - I haven't really thought
about the best way to determine if the "remaining free space is much
larger than the allocation request" reliably, but I needed
something there to demonstrate what I was thinking....

....

> > > -		if (xfs_alloc_busy_search(mp, agno, fbno, flen)) {
> > > -			trace_xfs_discard_busy(mp, agno, fbno, flen);
> > > -			goto next_extent;
> > > -		}
> > > +		xfs_alloc_busy_search_trim(mp, pag, fbno, flen, &tbno, &tlen);
> > >  
> > > -		trace_xfs_discard_extent(mp, agno, fbno, flen);
> > > +		trace_xfs_discard_extent(mp, agno, tbno, tlen);
> > >  		error = -blkdev_issue_discard(bdev,
> > > -				XFS_AGB_TO_DADDR(mp, agno, fbno),
> > > -				XFS_FSB_TO_BB(mp, flen),
> > > +				XFS_AGB_TO_DADDR(mp, agno, tbno),
> > > +				XFS_FSB_TO_BB(mp, tlen),
> > >  				GFP_NOFS, 0);
> > >  		if (error)
> > >  			goto out_del_cursor;
> > > -		*blocks_trimmed += flen;
> > > +		*blocks_trimmed += tlen;
> > 
> > Hmmm - that means if we get a case 8 overlap, we'll only trim one
> > side of the extent. That's probably not a big deal. However, perhaps
> > this should check the size of the trimmed extent before issuing the
> > discard against it in case we've reduced it to something smaller
> > thanthe minimum requested trim size....
> 
> I think all of the places that (ultimately) call this function
> need to be looked at to make sure they handle the "error" case
> properly--either checking for a returned error or verifying the
> returned length is at least the minimum.

*nods vigorously*

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs