From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: [PATCH] xfs: byte range buffer dirty region tracking
Date: Thu, 1 Feb 2018 15:22:58 -0800
Message-ID: <20180201232258.GW4849@magnolia>
In-Reply-To: <20180201231647.qsiq6vnmllqc32le@destitution>

On Fri, Feb 02, 2018 at 10:16:47AM +1100, Dave Chinner wrote:
> On Thu, Feb 01, 2018 at 12:35:26PM -0800, Darrick J. Wong wrote:
> > On Thu, Feb 01, 2018 at 07:14:52PM +1100, Dave Chinner wrote:
> > > On Wed, Jan 31, 2018 at 09:11:28PM -0800, Darrick J. Wong wrote:
> > > > On Thu, Feb 01, 2018 at 12:05:14PM +1100, Dave Chinner wrote:
> > > > > From: Dave Chinner <dchinner@redhat.com>
> > > > > 
> > > > > One of the biggest performance problems with large directory block
> > > > > > sizes is the CPU overhead in maintaining the buffer log item dirty
> > > > > region bitmap.  The bit manipulations and buffer region mapping
> > > > > calls are right at the top of the profiles when running tests on 64k
> > > > > directory buffers:
> > > .....
> > > > > ---
> > > > >  fs/xfs/xfs_buf.c      |   2 +
> > > > >  fs/xfs/xfs_buf_item.c | 431 +++++++++++++++++++++++++-------------------------
> > > > >  fs/xfs/xfs_buf_item.h |  19 +++
> > > > >  3 files changed, 238 insertions(+), 214 deletions(-)
> > > > > 
> > > > > diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> > > > > index d1da2ee9e6db..7621fabeb505 100644
> > > > > --- a/fs/xfs/xfs_buf.c
> > > > > +++ b/fs/xfs/xfs_buf.c
> > > > > @@ -1583,6 +1583,8 @@ xfs_buf_iomove(
> > > > >  		page = bp->b_pages[page_index];
> > > > >  		csize = min_t(size_t, PAGE_SIZE - page_offset,
> > > > >  				      BBTOB(bp->b_io_length) - boff);
> > > > > +		if (boff + csize > bend)
> > > > > +			csize = bend - boff;
> > > > 
> > > > How often does csize exceed bend?
> > > 
> > > /me checks notes when the patch was written a couple of years ago
> > > 
> > > Rarely. I didn't record the exact cause because it was a memory
> > > corruption bug that showed up long after the cause was gone.
> > > Reading between the lines, I think it was a case where bsize was a
> > > single chunk (128 bytes), boff was 256 (third chunk in the buffer)
> > > b_io_length was 512 bytes and a page offset of ~512 bytes.
> > > 
> > > That means csize was coming out at 256 bytes, but we only wanted 128
> > > bytes to be copied. In most cases this didn't cause a problem
> > > because there was more space in the log iovec buffer being copied
> > > into, but occasionally it would be the last copy into the
> > > logvec buffer and that would overrun the user buffer and corrupt
> > > memory.
> > > 
> > > Essentially we are trying to copy from boff to bend, there's
> > > nothing in the loop to clamp the copy size to bend, and that's
> > > what this is doing. I can separate it out into another patch if you
> > > want - I'd completely forgotten this was in the patch because I've
> > > been running this patch in my tree for a long time now without
> > > really looking at it...
> > 
> > I don't know if this needs to be a separate patch, but it seems like the
> > upper levels shouldn't be sending us overlong lengths?  So we
> > need to go find the ones that do and fix them to dtrt, possibly leaving
> > an assert here for "hey someone screwed up but we're fixing it"
> > analysis.
> 
> It was probably caused by a bug in the original range->bitmap
> conversion code I'd written, not by any of the external code. I'll
> add an assert into the code, but also leave the clamping so that
> production systems don't go bad if there's some other bug in the
> code that triggers it.
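
For reference, here is a minimal userspace sketch of the clamp described above. It is not the actual xfs_buf_iomove() code: sketch_iomove_read(), SKETCH_PAGE_SIZE and the flat pages[] array are hypothetical stand-ins, the buffer is assumed to start at offset 0 of the first page, and the assert at the top is only my guess at what a "caller screwed up" check might look like.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u	/* hypothetical stand-in for PAGE_SIZE */

/*
 * Copy bytes [boff, boff + bsize) out of a buffer backed by an array of
 * pages into dst.  io_length is the total valid length of the buffer in
 * bytes.  Returns the number of bytes copied.
 */
static size_t
sketch_iomove_read(
	char		**pages,
	size_t		io_length,
	size_t		boff,
	size_t		bsize,
	char		*dst)
{
	size_t		bend = boff + bsize;
	size_t		copied = 0;

	/* Assumption: callers never ask for bytes beyond the buffer. */
	assert(bend <= io_length);

	while (boff < bend) {
		size_t	page_index = boff / SKETCH_PAGE_SIZE;
		size_t	page_offset = boff % SKETCH_PAGE_SIZE;
		size_t	csize = SKETCH_PAGE_SIZE - page_offset;

		/* Same shape as the existing min_t(): rest of page vs rest of buffer. */
		if (csize > io_length - boff)
			csize = io_length - boff;

		/* The clamp: never copy past the end of the requested range. */
		if (boff + csize > bend)
			csize = bend - boff;

		memcpy(dst + copied, pages[page_index] + page_offset, csize);
		boff += csize;
		copied += csize;
	}
	return copied;
}
```

With io_length = 512, boff = 256 and bsize = 128, the first two limits alone would give csize = 256; the clamp brings that down to the 128 bytes that were requested, so the destination is not overrun.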
> 
> > > > > +	ASSERT(bip->bli_range[0].last != 0);
> > > > > +	if (bip->bli_range[0].last == 0) {
> > > > > +		/* clean! */
> > > > > +		ASSERT(bip->bli_range[0].first == 0);
> > > > 
> > > > Hm, so given that the firsts are initialized to UINT_MAX, this only
> > > > happens if the first (only?) range we log is ... (0, 0) ?
> > > 
> > > Yeah, basically it catches code that should not be logging buffers
> > > because there is no dirty range in the buffer.
> > > 
> > > > Mildly confused about what these asserts are going after, since the
> > > > first one implies that this shouldn't happen anyway.
> > > 
> > > If first is after last, then we've really screwed up because we've
> > > got a dirty buffer with an invalid range. I can't recall seeing
> > > either of these asserts fire, but we still need the check for clean
> > > buffer ranges/screwups in production code. Maybe there's a better
> > > way to do this?
> > 
> > I only came up with:
> > 
> > /*
> >  * If the first bli_range has a last of 0, we've been fed a clean
> >  * buffer.  This shouldn't happen but we'll be paranoid and check
> >  * anyway.
> >  */
> > if (bip->bli_range[0].last == 0) {
> > 	ASSERT(0);
> > 	ASSERT(bip->bli_range[0].first == 0);
> > 	return;
> > }
> 
> Yup, that's a bit cleaner, I'll change it over.
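
To make the (0, 0) case concrete, here is a small userspace sketch of a first/last dirty range. The struct and helper names are hypothetical, not the ones in the patch, and initialising last to 0 is an assumption inferred from the "last == 0 means clean" check; the thread only states that first starts at UINT_MAX.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for one entry of bli_range[]. */
struct sketch_range {
	unsigned int	first;	/* lowest dirty byte offset */
	unsigned int	last;	/* highest dirty byte offset */
};

/* An empty range: first starts at UINT_MAX, last (assumed) at 0. */
static void
sketch_range_init(struct sketch_range *r)
{
	r->first = UINT_MAX;
	r->last = 0;
}

/* Extend the range so that it covers [first, last]. */
static void
sketch_range_log(struct sketch_range *r, unsigned int first, unsigned int last)
{
	assert(first <= last);
	if (first < r->first)
		r->first = first;
	if (last > r->last)
		r->last = last;
}

/* The check under discussion: a last of 0 is treated as "clean". */
static bool
sketch_range_is_clean(const struct sketch_range *r)
{
	return r->last == 0;
}
```

In this sketch a freshly initialised range has first == UINT_MAX and last == 0, while logging (0, 0) leaves first == 0 and last == 0. Both look clean to the last == 0 test, and only the second satisfies the first == 0 assert, which is the case the question above was getting at.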
> 
> > FWIW I also ran straight into this when I applied it for giggles and ran
> > xfstests -g quick (generic/001 blew up):
> 
> I must have screwed up the forward port worse than usual - the
> conflicts with the xfs_buf_log_item typedef removal were pretty
> extensive.

Ah, sorry about that.  I'd thought it was just the xfs_buf rename. :/

> > [   31.909228] ================================================================================
> > [   31.911258] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
> > [   31.912375] IP: xfs_buf_item_init+0x33/0x350 [xfs]
> 
> Hmmmm - I'm seeing that on my subvol smoke test script but not
> elsewhere. I've been looking through the subvol code to try to find
> this, maybe it's not the subvol code.  What mkfs parameters were
> you using?

mkfs.xfs -m rmapbt=1,reflink=1 -i sparse=1 /dev/pmem0

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
