Linux-XFS Archive on lore.kernel.org
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Omar Sandoval <osandov@osandov.com>, linux-xfs@vger.kernel.org
Subject: Re: Transaction log reservation overrun when fallocating realtime file
Date: Wed, 4 Dec 2019 08:31:36 -0800
Message-ID: <20191204163136.GO7335@magnolia>
In-Reply-To: <20191203213117.GL2695@dread.disaster.area>

On Wed, Dec 04, 2019 at 08:31:17AM +1100, Dave Chinner wrote:
> On Mon, Dec 02, 2019 at 06:45:26PM -0800, Darrick J. Wong wrote:
> > On Tue, Dec 03, 2019 at 08:51:13AM +1100, Dave Chinner wrote:
> > > On Tue, Nov 26, 2019 at 04:34:26PM -0800, Darrick J. Wong wrote:
> > > > On Tue, Nov 26, 2019 at 12:27:14PM -0800, Omar Sandoval wrote:
> > > > > Hello,
> > > > > 
> > > > > The following reproducer results in a transaction log overrun warning
> > > > > for me:
> > > > > 
> > > > >   mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb
> > > > >   mount -o rtdev=/dev/vdc /dev/vdb /mnt
> > > > >   fallocate -l 4G /mnt/foo
> > > > > 
> > > > > I've attached the full dmesg output. My guess at the problem is that the
> > > > > tr_write reservation used by xfs_alloc_file_space is not taking the realtime
> > > > > bitmap and realtime summary inodes into account (inode numbers 129 and 130 on
> > > > > this filesystem, which I do see in some of the log items). However, I'm not
> > > > > familiar enough with the XFS transaction guts to confidently fix this. Can
> > > > > someone please help me out?
> > > > 
> > > > Hmm...
> > > > 
> > > > /*
> > > >  * In a write transaction we can allocate a maximum of 2
> > > >  * extents.  This gives:
> > > >  *    the inode getting the new extents: inode size
> > > >  *    the inode's bmap btree: max depth * block size
> > > >  *    the agfs of the ags from which the extents are allocated: 2 * sector
> > > >  *    the superblock free block counter: sector size
> > > >  *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> > > >  * And the bmap_finish transaction can free bmap blocks in a join:
> > > >  *    the agfs of the ags containing the blocks: 2 * sector size
> > > >  *    the agfls of the ags containing the blocks: 2 * sector size
> > > >  *    the super block free block counter: sector size
> > > >  *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
> > > >  */
> > > > STATIC uint
> > > > xfs_calc_write_reservation(...);
> > > > 
> > > > So this means that the rt allocator can burn through at most ...
> > > > 1 ext * 2 trees * (2 * maxdepth - 1) * blocksize
> > > > ... worth of log reservation as part of setting bits in the rtbitmap and
> > > > fiddling with the rtsummary information.
> > > > 
> > > > Instead, 4GB of 4k rt extents == 1 million rtexts to mark in use, which
> > > > is 131072 bytes of rtbitmap to log, and *kaboom* there goes the 109K log
> > > > reservation.
> > > 
> > > Ok, if that's the case, we still need to be able to allocate MAXEXTLEN in
> > > a single transaction. That's 2^21 filesystem blocks, which at most
> > > is 2^21 rtexts.
> > > 
> > > Hence I think we probably should have a separate rt-write
> > > reservation that handles this case, and we use that for allocation
> > > on rt devices rather than the bt-based allocation reservation.
> > 
> > 2^21 rtexts is ... 2^18 bytes worth of rtbitmap block, which implies a
> > transaction reservation of around ... ~300K?  I guess I'll have to go
> > play with xfs_db to see how small of a datadev you can make before that
> > causes us to fail the minimum log size checks.
> 
> Keep in mind that rtextsz is often larger than a single filesystem
> block, so the bitmap size rapidly reduces as rtextsz goes up.
> 
> > As you said on IRC, it probably won't affect /most/ setups... but I
> > don't want to run around increasing support calls either.  Even if most
> > distributors don't turn on rt support.
> 
> Sure, we can limit the size of the allocation based on the
> transaction reservation limits, but I suspect this will only affect
> filesystems with really, really small data devices that result in a
> <10MB default log size. I don't think there are that many of these
> around in production....
> 
> I'd prefer to fix the transaction size, and then if people start
> reporting that the log size is too small, we can then
> limit the extent size allocation and transaction reservation based
> on the (tiny) log size we read out of the superblock...

Ok, I'll work on that.
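
The bitmap arithmetic in this thread can be checked with a short sketch.
It assumes one rtbitmap bit per rt extent, as the figures quoted above
imply, and counts only the raw bitmap bytes: rtsummary updates and
per-buffer log overhead are left out. The helper names rtexts_for() and
rtbitmap_bytes_for() are made up for illustration and are not kernel
functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): number of rt extents
 * needed to cover alloc_bytes when each rt extent is rtext_bytes long.
 * Rounds up so a partial extent still counts as one.
 */
static uint64_t rtexts_for(uint64_t alloc_bytes, uint64_t rtext_bytes)
{
	assert(rtext_bytes > 0);
	return (alloc_bytes + rtext_bytes - 1) / rtext_bytes;
}

/*
 * Hypothetical helper: bytes of rtbitmap touched when marking that many
 * rt extents in use, assuming one bit per rt extent and rounding up to
 * whole bytes. Ignores rtsummary and log header overhead.
 */
static uint64_t rtbitmap_bytes_for(uint64_t rtexts)
{
	return (rtexts + 7) / 8;
}
```

With these, rtbitmap_bytes_for(rtexts_for(4ULL << 30, 4096)) is 131072,
the figure that overran the 109K reservation, and
rtbitmap_bytes_for(1ULL << 21) is 262144 (2^18). Raising the rt extent
size to 64k shrinks the 4GB case to 8192 bytes, which is the effect Dave
describes when rtextsz goes up.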

> Alternatively, we could implement log growing :)

Heh.  Wandering logs?

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com


Thread overview: 7+ messages
2019-11-26 20:27 Omar Sandoval
2019-11-27  0:34 ` Darrick J. Wong
2019-12-02 19:32   ` Omar Sandoval
2019-12-02 21:51   ` Dave Chinner
2019-12-03  2:45     ` Darrick J. Wong
2019-12-03 21:31       ` Dave Chinner
2019-12-04 16:31         ` Darrick J. Wong [this message]
