linux-xfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Omar Sandoval <osandov@osandov.com>, linux-xfs@vger.kernel.org
Subject: Re: Transaction log reservation overrun when fallocating realtime file
Date: Mon, 2 Dec 2019 18:45:26 -0800	[thread overview]
Message-ID: <20191203024526.GF7339@magnolia> (raw)
In-Reply-To: <20191202215113.GH2695@dread.disaster.area>

On Tue, Dec 03, 2019 at 08:51:13AM +1100, Dave Chinner wrote:
> On Tue, Nov 26, 2019 at 04:34:26PM -0800, Darrick J. Wong wrote:
> > On Tue, Nov 26, 2019 at 12:27:14PM -0800, Omar Sandoval wrote:
> > > Hello,
> > > 
> > > The following reproducer results in a transaction log overrun warning
> > > for me:
> > > 
> > >   mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb
> > >   mount -o rtdev=/dev/vdc /dev/vdb /mnt
> > >   fallocate -l 4G /mnt/foo
> > > 
> > > I've attached the full dmesg output. My guess at the problem is that the
> > > tr_write reservation used by xfs_alloc_file_space is not taking the realtime
> > > bitmap and realtime summary inodes into account (inode numbers 129 and 130 on
> > > this filesystem, which I do see in some of the log items). However, I'm not
> > > familiar enough with the XFS transaction guts to confidently fix this. Can
> > > someone please help me out?
> > 
> > Hmm...
> > 
> > /*
> >  * In a write transaction we can allocate a maximum of 2
> >  * extents.  This gives:
> >  *    the inode getting the new extents: inode size
> >  *    the inode's bmap btree: max depth * block size
> >  *    the agfs of the ags from which the extents are allocated: 2 * sector
> >  *    the superblock free block counter: sector size
> >  *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> >  * And the bmap_finish transaction can free bmap blocks in a join:
> >  *    the agfs of the ags containing the blocks: 2 * sector size
> >  *    the agfls of the ags containing the blocks: 2 * sector size
> >  *    the super block free block counter: sector size
> >  *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> >  */
> > STATIC uint
> > xfs_calc_write_reservation(...);
> > 
> > So this means that the rt allocator can burn through at most ...
> > 1 ext * 2 trees * (2 * maxdepth - 1) * blocksize
> > ... worth of log reservation as part of setting bits in the rtbitmap and
> > fiddling with the rtsummary information.
> > 
> > Instead, 4GB of 4k rt extents == 1 million rtexts to mark in use, which
> > is 131072 bytes of rtbitmap to log, and *kaboom* there goes the 109K log
> > reservation.
> 
> Ok, if that's the case, we still need to be able to allocate MAXEXTLEN in
> a single transaction. That's 2^21 filesystem blocks, which at most
> is 2^21 rtexts.
> 
> Hence I think we probably should have a separate rt-write
> reservation that handles this case, and we use that for allocation
> on rt devices rather than the bt-based allocation reservation.

2^21 rtexts is ... 2^18 bytes worth of rtbitmap block, which implies a
transaction reservation of around ... ~300K?  I guess I'll have to go
play with xfs_db to see how small of a datadev you can make before that
causes us to fail the minimum log size checks.

As you said on IRC, it probably won't affect /most/ setups... but I
don't want to run around increasing support calls either.  Even if most
distributors don't turn on rt support.

> 
> > From: Darrick J. Wong <darrick.wong@oracle.com>
> > 
> > xfs: cap realtime allocation length to something we can log
> > 
> > Omar Sandoval reported that a 4G fallocate on the realtime device causes
> > filesystem shutdowns due to a log reservation overflow that happens when
> > we log the rtbitmap updates.
> > 
> > The tr_write transaction reserves enough log reservation to handle a
> > full splits of both free space btrees, so cap the rt allocation at that
> > number of bits.
> > 
> > "The following reproducer results in a transaction log overrun warning
> > for me:
> > 
> >     mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb
> >     mount -o rtdev=/dev/vdc /dev/vdb /mnt
> >     fallocate -l 4G /mnt/foo
> > 
> > Reported-by: Omar Sandoval <osandov@osandov.com>
> > Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> > ---
> >  fs/xfs/xfs_bmap_util.c |   23 +++++++++++++++++++++++
> >  1 file changed, 23 insertions(+)
> > 
> > diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> > index 49d7b530c8f7..15c4e2790de3 100644
> > --- a/fs/xfs/xfs_bmap_util.c
> > +++ b/fs/xfs/xfs_bmap_util.c
> > @@ -69,6 +69,26 @@ xfs_zero_extent(
> >  }
> >  
> >  #ifdef CONFIG_XFS_RT
> > +/*
> > + * tr_write allows for one full split in the bnobt and cntbt to record the
> > + * allocation, and that's how many bits of rtbitmap we can log to the
> > + * transaction.  We leave one full block's worth of log space to handle the
> > + * rtsummary update, though that's probably overkill.
> > + */
> > +static inline uint64_t
> > +xfs_bmap_rtalloc_max(
> > +	struct xfs_mount	*mp)
> > +{
> > +	uint64_t		max_rtbitmap;
> > +
> > +	max_rtbitmap = xfs_allocfree_log_count(mp, 1) - 1;
> > +	max_rtbitmap *= XFS_FSB_TO_B(mp, 1);
> > +	max_rtbitmap *= NBBY;
> > +	max_rtbitmap *= mp->m_sb.sb_rextsize;
> 
> I can see how this works, but it strikes me as a bit of a hack. We
> calculate the worst case reservations up front to avoid having to
> play games like this in the code. Hence I think the correct thing to
> do is fix the reservation to ensure we can do MAXEXTLEN allocations
> without overruns...

Yeah...

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

  reply	other threads:[~2019-12-03  2:45 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-26 20:27 Transaction log reservation overrun when fallocating realtime file Omar Sandoval
2019-11-27  0:34 ` Darrick J. Wong
2019-12-02 19:32   ` Omar Sandoval
2019-12-02 21:51   ` Dave Chinner
2019-12-03  2:45     ` Darrick J. Wong [this message]
2019-12-03 21:31       ` Dave Chinner
2019-12-04 16:31         ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191203024526.GF7339@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-xfs@vger.kernel.org \
    --cc=osandov@osandov.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).