All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>, linux-xfs@vger.kernel.org
Subject: Re: t_firstblock assert triggered from dedup in generic/051
Date: Wed, 11 Jul 2018 12:43:21 -0700	[thread overview]
Message-ID: <20180711194321.GJ5724@magnolia> (raw)
In-Reply-To: <20180711193821.GA6841@killians.ges.redhat.com>

On Wed, Jul 11, 2018 at 03:38:21PM -0400, Brian Foster wrote:
> On Wed, Jul 11, 2018 at 12:09:25PM -0700, Darrick J. Wong wrote:
> > On Wed, Jul 11, 2018 at 08:42:41PM +0200, Christoph Hellwig wrote:
> > > I ran into this when testing the rebase of my cow optimization
> > > series to the 4.19-merge branch, but from a quick glance it really
> > > looks like the t_firstblock series might be at faul.  Feel free
> > > to discard if you think it is not, I will retest with my patches
> > > tomorrow.
> > > 
> > > [  635.600041] run fstests generic/051 at 2018-07-11 18:25:22
> > > [  635.802443] XFS (vdb): Mounting V5 Filesystem
> > > [  635.814103] XFS (vdb): Ending clean mount
> > > [  635.992757] XFS (vdc): Mounting V5 Filesystem
> > > [  636.000955] XFS (vdc): Ending clean mount
> > > [  636.031583] XFS (vdc): Unmounting Filesystem
> > > [  636.115131] XFS (vdc): Mounting V5 Filesystem
> > > [  636.123271] XFS (vdc): Ending clean mount
> > > [  639.538434] XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
> > > [  639.539893] ------------[ cut here ]------------
> > > [  639.540516] kernel BUG at fs/xfs/xfs_message.c:102!
> > > [  639.541168] invalid opcode: 0000 [#1] SMP PTI
> > > [  639.541814] CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
> > > [  639.542744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
> > > [  639.543848] RIP: 0010:assfail+0x23/0x30
> > > [  639.544470] Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
> > > [  639.546396] RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
> > > [  639.546899] RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
> > > [  639.547582] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
> > > [  639.548273] RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
> > > [  639.548954] R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
> > > [  639.549704] R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
> > > [  639.550397] FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> > > [  639.551166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  639.551795] CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
> > > [  639.552604] Call Trace:
> > > [  639.552896]  xfs_defer_init+0xff/0x160
> > > [  639.553331]  xfs_reflink_remap_extent+0x31b/0xa00
> > 
> > Hmmm, this is one of those functions that does:
> > 
> > xfs_trans_alloc
> > while(...)
> > 	xfs_defer_init
> > 	...
> > 	xfs_defer_finish
> > xfs_trans_commit
> > 
> > Hmm, we set t_firstblock to NULLFSBLOCK in _trans_alloc and _trans_dup,
> > which should take care of the preconditions and the defer_finis...
> > 
> > ...oh, you know what I think it is?  We don't roll the transaction after
> > successfully finishing the while (xfs_defer_has_unfinished_work) loop in
> > _defer_finish, which means that t_firstblock can be non-null at the
> > bottom of that loop, which means that we then trip the assert at the top
> > of the loop, just like you saw.
> > 
> > An extra downside is that _defer_finish returns with a dirty transaction,
> > which means that whatever happens afterwards can (in theory) exceed the
> > transaction reservation and blow up.  Yikes.
> > 
> 
> My first thought was whether we could get through that loop with an
> allocation or something but without deferring anything, but what you
> describe above seems much more likely. Given the next series of
> refactoring buries the final defer_finish() in trans_commit(), I'm
> wondering if we should just have a defer_finish() wrapper that rolls the
> tx a final time, which we can bypass in the trans_commit() case because
> we aren't making any more changes to the transaction. Hm?

That sounds like a reasonable approach.  In the meantime I'm testing a
patch (below) to fix(?) the problem on 4.9 - current kernels.  If it
tests ok I'll send it for reals to the list.

--D

Christoph Hellwig reported seeing the following assertion in
generic/051:

XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
------------[ cut here ]------------
kernel BUG at fs/xfs/xfs_message.c:102!
invalid opcode: 0000 [#1] SMP PTI
CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
RIP: 0010:assfail+0x23/0x30
Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
Call Trace:
 xfs_defer_init+0xff/0x160
 xfs_reflink_remap_extent+0x31b/0xa00
 xfs_reflink_remap_blocks+0xec/0x4a0
 xfs_reflink_remap_range+0x3a1/0x650
 xfs_file_dedupe_range+0x39/0x50
 vfs_dedupe_file_range+0x218/0x260
 do_vfs_ioctl+0x262/0x6a0
 ? __se_sys_newfstat+0x3c/0x60
 ksys_ioctl+0x35/0x60
 __x64_sys_ioctl+0x11/0x20
 do_syscall_64+0x4b/0x190
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

The root cause of the assertion failure is that xfs_defer_finish doesn't
roll the transaction after processing all the deferred items.  Therefore
it returns a dirty transaction to the caller, which leaves the caller at
risk of exceeding the transaction reservation if it logs more items.

Brian Foster's patchset to move the defer_ops firstblock into the
transaction requires t_firstblock == NULLFSBLOCK upon defer_ops
initialization, which is how this was noticed at all.

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/libxfs/xfs_defer.c |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
index 2713e2d808a7..9bab0fb23761 100644
--- a/fs/xfs/libxfs/xfs_defer.c
+++ b/fs/xfs/libxfs/xfs_defer.c
@@ -424,6 +424,11 @@ xfs_defer_finish(
 			cleanup_fn(*tp, state, error);
 	}
 
+	/*
+	 * Roll the transaction once more to avoid returning to the caller
+	 * with a dirty transaction.
+	 */
+	error = xfs_defer_trans_roll(tp, dop);
 out:
 	(*tp)->t_dfops = orig_dop;
 	if (error)

  reply	other threads:[~2018-07-11 19:49 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-11 18:42 t_firstblock assert triggered from dedup in generic/051 Christoph Hellwig
2018-07-11 19:09 ` Darrick J. Wong
2018-07-11 19:38   ` Brian Foster
2018-07-11 19:43     ` Darrick J. Wong [this message]
2018-07-12  0:25       ` Darrick J. Wong
2018-07-12 12:06         ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180711194321.GJ5724@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=bfoster@redhat.com \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.