linux-xfs.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: "Darrick J. Wong" <darrick.wong@oracle.com>
Cc: Omar Sandoval <osandov@osandov.com>, linux-xfs@vger.kernel.org
Subject: Re: Transaction log reservation overrun when fallocating realtime file
Date: Wed, 4 Dec 2019 08:31:17 +1100
Message-ID: <20191203213117.GL2695@dread.disaster.area>
In-Reply-To: <20191203024526.GF7339@magnolia>

On Mon, Dec 02, 2019 at 06:45:26PM -0800, Darrick J. Wong wrote:
> On Tue, Dec 03, 2019 at 08:51:13AM +1100, Dave Chinner wrote:
> > On Tue, Nov 26, 2019 at 04:34:26PM -0800, Darrick J. Wong wrote:
> > > On Tue, Nov 26, 2019 at 12:27:14PM -0800, Omar Sandoval wrote:
> > > > Hello,
> > > > 
> > > > The following reproducer results in a transaction log overrun warning
> > > > for me:
> > > > 
> > > >   mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb
> > > >   mount -o rtdev=/dev/vdc /dev/vdb /mnt
> > > >   fallocate -l 4G /mnt/foo
> > > > 
> > > > I've attached the full dmesg output. My guess at the problem is that the
> > > > tr_write reservation used by xfs_alloc_file_space is not taking the realtime
> > > > bitmap and realtime summary inodes into account (inode numbers 129 and 130 on
> > > > this filesystem, which I do see in some of the log items). However, I'm not
> > > > familiar enough with the XFS transaction guts to confidently fix this. Can
> > > > someone please help me out?
> > > 
> > > Hmm...
> > > 
> > > /*
> > >  * In a write transaction we can allocate a maximum of 2
> > >  * extents.  This gives:
> > >  *    the inode getting the new extents: inode size
> > >  *    the inode's bmap btree: max depth * block size
> > >  *    the agfs of the ags from which the extents are allocated: 2 * sector
> > >  *    the superblock free block counter: sector size
> > >  *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> > >  * And the bmap_finish transaction can free bmap blocks in a join:
> > >  *    the agfs of the ags containing the blocks: 2 * sector size
> > >  *    the agfls of the ags containing the blocks: 2 * sector size
> > >  *    the super block free block counter: sector size
> > >  *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> > >  */
> > > STATIC uint
> > > xfs_calc_write_reservation(...);
> > > 
> > > So this means that the rt allocator can burn through at most ...
> > > 1 ext * 2 trees * (2 * maxdepth - 1) * blocksize
> > > ... worth of log reservation as part of setting bits in the rtbitmap and
> > > fiddling with the rtsummary information.
> > > 
> > > Instead, 4GB of 4k rt extents == 1 million rtexts to mark in use, which
> > > is 131072 bytes of rtbitmap to log, and *kaboom* there goes the 109K log
> > > reservation.
> > 
> > Ok, if that's the case, we still need to be able to allocate MAXEXTLEN in
> > a single transaction. That's 2^21 filesystem blocks, which at most
> > is 2^21 rtexts.
> > 
> > Hence I think we probably should have a separate rt-write
> > reservation that handles this case, and we use that for allocation
> > on rt devices rather than the bt-based allocation reservation.
> 
> 2^21 rtexts is ... 2^18 bytes worth of rtbitmap block, which implies a
> transaction reservation of around ... ~300K?  I guess I'll have to go
> play with xfs_db to see how small of a datadev you can make before that
> causes us to fail the minimum log size checks.

Keep in mind that rtextsz is often larger than a single filesystem
block, so the bitmap size rapidly reduces as rtextsz goes up.

> As you said on IRC, it probably won't affect /most/ setups... but I
> don't want to run around increasing support calls either.  Even if most
> distributors don't turn on rt support.

Sure, we can limit the size of the allocation based on the
transaction reservation limits, but I suspect this will only affect
filesystems with really, really small data devices that result in a
<10MB default log size. I don't think there are that many of these
around in production....

I'd prefer to fix the transaction size, and then if people start
reporting that the log size is too small, we can then
limit the extent size allocation and transaction reservation based
on the (tiny) log size we read out of the superblock...

Alternatively, we could implement log growing :)

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

Thread overview: 7+ messages
2019-11-26 20:27 Transaction log reservation overrun when fallocating realtime file Omar Sandoval
2019-11-27  0:34 ` Darrick J. Wong
2019-12-02 19:32   ` Omar Sandoval
2019-12-02 21:51   ` Dave Chinner
2019-12-03  2:45     ` Darrick J. Wong
2019-12-03 21:31       ` Dave Chinner [this message]
2019-12-04 16:31         ` Darrick J. Wong
