From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Omar Sandoval <osandov@osandov.com>
Cc: linux-xfs@vger.kernel.org
Subject: Re: Transaction log reservation overrun when fallocating realtime file
Date: Tue, 26 Nov 2019 16:34:26 -0800
Message-ID: <20191127003426.GP6219@magnolia>
In-Reply-To: <20191126202714.GA667580@vader>

On Tue, Nov 26, 2019 at 12:27:14PM -0800, Omar Sandoval wrote:
> Hello,
>
> The following reproducer results in a transaction log overrun warning
> for me:
>
> mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb
> mount -o rtdev=/dev/vdc /dev/vdb /mnt
> fallocate -l 4G /mnt/foo
>
> I've attached the full dmesg output. My guess at the problem is that the
> tr_write reservation used by xfs_alloc_file_space is not taking the realtime
> bitmap and realtime summary inodes into account (inode numbers 129 and 130 on
> this filesystem, which I do see in some of the log items). However, I'm not
> familiar enough with the XFS transaction guts to confidently fix this. Can
> someone please help me out?
Hmm...

/*
 * In a write transaction we can allocate a maximum of 2
 * extents.  This gives:
 *    the inode getting the new extents: inode size
 *    the inode's bmap btree: max depth * block size
 *    the agfs of the ags from which the extents are allocated: 2 * sector
 *    the superblock free block counter: sector size
 *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
 * And the bmap_finish transaction can free bmap blocks in a join:
 *    the agfs of the ags containing the blocks: 2 * sector size
 *    the agfls of the ags containing the blocks: 2 * sector size
 *    the super block free block counter: sector size
 *    the allocation btrees: 2 exts * 2 trees * (2 * max depth - 1) * block size
 */
STATIC uint
xfs_calc_write_reservation(...);

So this means that the rt allocator can burn through at most ...

	1 ext * 2 trees * (2 * maxdepth - 1) * blocksize

... worth of log reservation as part of setting bits in the rtbitmap and
fiddling with the rtsummary information.

Instead, 4GB of 4k rt extents == 1 million rtexts to mark in use, which
is 131072 bytes of rtbitmap to log, and *kaboom* there goes the 109K log
reservation.
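
The arithmetic above can be checked with a small userspace sketch. This is
not kernel code: the helper names rtext_count() and rtbitmap_bytes() are
made up for illustration, and the only assumption carried over from the
message is that the rtbitmap uses one bit per realtime extent.

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE 8

/* Hypothetical helper: rt extents needed to cover len_bytes, rounded up. */
static uint64_t rtext_count(uint64_t len_bytes, uint64_t rtext_bytes)
{
	assert(rtext_bytes > 0);
	return (len_bytes + rtext_bytes - 1) / rtext_bytes;
}

/*
 * Hypothetical helper: bytes of rtbitmap that cover those extents,
 * assuming one bit per rt extent, rounded up to a whole byte.
 */
static uint64_t rtbitmap_bytes(uint64_t len_bytes, uint64_t rtext_bytes)
{
	uint64_t bits = rtext_count(len_bytes, rtext_bytes);

	return (bits + BITS_PER_BYTE - 1) / BITS_PER_BYTE;
}
```

For the reproducer, rtbitmap_bytes(4ULL << 30, 4096) gives 131072, which
is already larger than a 109K (111616 byte) reservation before anything
else is logged.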

So I think you're right, and the fix is probably? to cap ralen further
in xfs_bmap_rtalloc().  Does the following patch fix it?

--D

From: Darrick J. Wong <darrick.wong@oracle.com>
xfs: cap realtime allocation length to something we can log

Omar Sandoval reported that a 4G fallocate on the realtime device causes
filesystem shutdowns due to a log reservation overflow that happens when
we log the rtbitmap updates.

The tr_write transaction reserves enough log space to handle a full
split of both free space btrees, so cap the rt allocation at that number
of bits.

"The following reproducer results in a transaction log overrun warning
for me:

    mkfs.xfs -f -r rtdev=/dev/vdc -d rtinherit=1 -m reflink=0 /dev/vdb
    mount -o rtdev=/dev/vdc /dev/vdb /mnt
    fallocate -l 4G /mnt/foo"

Reported-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
fs/xfs/xfs_bmap_util.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index 49d7b530c8f7..15c4e2790de3 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -69,6 +69,26 @@ xfs_zero_extent(
 }
 
 #ifdef CONFIG_XFS_RT
+/*
+ * tr_write allows for one full split in the bnobt and cntbt to record the
+ * allocation, and that's how many bits of rtbitmap we can log to the
+ * transaction.  We leave one full block's worth of log space to handle the
+ * rtsummary update, though that's probably overkill.
+ */
+static inline uint64_t
+xfs_bmap_rtalloc_max(
+	struct xfs_mount *mp)
+{
+	uint64_t max_rtbitmap;
+
+	max_rtbitmap = xfs_allocfree_log_count(mp, 1) - 1;
+	max_rtbitmap *= XFS_FSB_TO_B(mp, 1);
+	max_rtbitmap *= NBBY;
+	max_rtbitmap *= mp->m_sb.sb_rextsize;
+
+	return max_rtbitmap;
+}
+
 int
 xfs_bmap_rtalloc(
 	struct xfs_bmalloca *ap) /* bmap alloc argument struct */
@@ -113,6 +133,9 @@ xfs_bmap_rtalloc(
 	if (ralen * mp->m_sb.sb_rextsize >= MAXEXTLEN)
 		ralen = MAXEXTLEN / mp->m_sb.sb_rextsize;
 
+	/* Don't allocate so much that we blow out the log reservation. */
+	ralen = min_t(uint64_t, ralen, xfs_bmap_rtalloc_max(mp));
+
 	/*
 	 * Lock out modifications to both the RT bitmap and summary inodes
 	 */
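
To get a feel for the size of the cap computed by xfs_bmap_rtalloc_max(),
here is a rough userspace sketch of the same multiplication chain. It is
an illustration under stated assumptions, not the kernel implementation:
struct rt_cap_params and both function names are hypothetical, and
allocfree_log_blocks() models only the "1 ext * 2 trees * (2 * maxdepth - 1)"
expression quoted earlier in this message, so the real
xfs_allocfree_log_count() may return a different number on a given
filesystem.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few xfs_mount fields the patch reads. */
struct rt_cap_params {
	unsigned int ag_maxlevels;	/* assumed free space btree max depth */
	uint32_t blocksize;		/* filesystem block size in bytes */
	uint32_t rextsize;		/* rt extent size in fs blocks */
};

/* Blocks logged by num_ops full splits of two btrees (assumed model). */
static uint64_t allocfree_log_blocks(const struct rt_cap_params *p,
				     unsigned int num_ops)
{
	assert(p->ag_maxlevels > 0);
	return (uint64_t)num_ops * 2 * (2 * p->ag_maxlevels - 1);
}

/* Same steps as the patch: blocks - 1, then bytes, bits, rextsize. */
static uint64_t rtalloc_max(const struct rt_cap_params *p)
{
	uint64_t max_rtbitmap;

	max_rtbitmap = allocfree_log_blocks(p, 1) - 1;	/* keep 1 for rtsummary */
	max_rtbitmap *= p->blocksize;			/* blocks -> bytes */
	max_rtbitmap *= 8;				/* bytes -> bits */
	max_rtbitmap *= p->rextsize;

	return max_rtbitmap;
}
```

With an assumed max depth of 5, 4096-byte blocks and rextsize 1, the model
allows 18 blocks per split, so rtalloc_max() returns 17 * 4096 * 8 = 557056.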
Thread overview: 7+ messages
2019-11-26 20:27 Transaction log reservation overrun when fallocating realtime file Omar Sandoval
2019-11-27 0:34 ` Darrick J. Wong [this message]
2019-12-02 19:32 ` Omar Sandoval
2019-12-02 21:51 ` Dave Chinner
2019-12-03 2:45 ` Darrick J. Wong
2019-12-03 21:31 ` Dave Chinner
2019-12-04 16:31 ` Darrick J. Wong