All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Brian Foster <bfoster@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>, linux-xfs@vger.kernel.org
Subject: Re: t_firstblock assert triggered from dedup in generic/051
Date: Wed, 11 Jul 2018 17:25:34 -0700	[thread overview]
Message-ID: <20180712002534.GJ32415@magnolia> (raw)
In-Reply-To: <20180711194321.GJ5724@magnolia>

On Wed, Jul 11, 2018 at 12:43:21PM -0700, Darrick J. Wong wrote:
> On Wed, Jul 11, 2018 at 03:38:21PM -0400, Brian Foster wrote:
> > On Wed, Jul 11, 2018 at 12:09:25PM -0700, Darrick J. Wong wrote:
> > > On Wed, Jul 11, 2018 at 08:42:41PM +0200, Christoph Hellwig wrote:
> > > > I ran into this when testing the rebase of my cow optimization
> > > > series to the 4.19-merge branch, but from a quick glance it really
> > > > looks like the t_firstblock series might be at faul.  Feel free
> > > > to discard if you think it is not, I will retest with my patches
> > > > tomorrow.
> > > > 
> > > > [  635.600041] run fstests generic/051 at 2018-07-11 18:25:22
> > > > [  635.802443] XFS (vdb): Mounting V5 Filesystem
> > > > [  635.814103] XFS (vdb): Ending clean mount
> > > > [  635.992757] XFS (vdc): Mounting V5 Filesystem
> > > > [  636.000955] XFS (vdc): Ending clean mount
> > > > [  636.031583] XFS (vdc): Unmounting Filesystem
> > > > [  636.115131] XFS (vdc): Mounting V5 Filesystem
> > > > [  636.123271] XFS (vdc): Ending clean mount
> > > > [  639.538434] XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
> > > > [  639.539893] ------------[ cut here ]------------
> > > > [  639.540516] kernel BUG at fs/xfs/xfs_message.c:102!
> > > > [  639.541168] invalid opcode: 0000 [#1] SMP PTI
> > > > [  639.541814] CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
> > > > [  639.542744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
> > > > [  639.543848] RIP: 0010:assfail+0x23/0x30
> > > > [  639.544470] Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
> > > > [  639.546396] RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
> > > > [  639.546899] RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
> > > > [  639.547582] RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
> > > > [  639.548273] RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
> > > > [  639.548954] R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
> > > > [  639.549704] R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
> > > > [  639.550397] FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> > > > [  639.551166] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  639.551795] CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
> > > > [  639.552604] Call Trace:
> > > > [  639.552896]  xfs_defer_init+0xff/0x160
> > > > [  639.553331]  xfs_reflink_remap_extent+0x31b/0xa00
> > > 
> > > Hmmm, this is one of those functions that does:
> > > 
> > > xfs_trans_alloc
> > > while(...)
> > > 	xfs_defer_init
> > > 	...
> > > 	xfs_defer_finish
> > > xfs_trans_commit
> > > 
> > > Hmm, we set t_firstblock to NULLFSBLOCK in _trans_alloc and _trans_dup,
> > > which should take care of the preconditions and the defer_finis...
> > > 
> > > ...oh, you know what I think it is?  We don't roll the transaction after
> > > successfully finishing the while (xfs_defer_has_unfinished_work) loop in
> > > _defer_finish, which means that t_firstblock can be non-null at the
> > > bottom of that loop, which means that we then trip the assert at the top
> > > of the loop, just like you saw.
> > > 
> > > An extra downside is that _defer_finish returns with a dirty transaction,
> > > which means that whatever happens afterwards can (in theory) exceed the
> > > transaction reservation and blow up.  Yikes.
> > > 
> > 
> > My first thought was whether we could get through that loop with an
> > allocation or something but without deferring anything, but what you
> > describe above seems much more likely. Given the next series of
> > refactoring buries the final defer_finish() in trans_commit(), I'm
> > wondering if we should just have a defer_finish() wrapper that rolls the
> > tx a final time, which we can bypass in the trans_commit() case because
> > we aren't making any more changes to the transaction. Hm?
> 
> That sounds like a reasonable approach.  In the meantime I'm testing a
> patch (below) to fix(?) the problem on 4.9 - current kernels.  If it
> tests ok I'll send it for reals to the list.
> 
> Christoph Hellwig reported seeing the following assertion in
> generic/051:
> 
> XFS: Assertion failed: tp->t_firstblock == NULLFSBLOCK, file: fs/xfs/libxfs5
> ------------[ cut here ]------------
> kernel BUG at fs/xfs/xfs_message.c:102!
> invalid opcode: 0000 [#1] SMP PTI
> CPU: 2 PID: 20757 Comm: fsstress Not tainted 4.18.0-rc4+ #3969
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.1-1 04/01/4
> RIP: 0010:assfail+0x23/0x30
> Code: c3 66 0f 1f 44 00 00 48 89 f1 41 89 d0 48 c7 c6 88 e0 8c 82 48 89 fa  
> RSP: 0018:ffff88012dc43c08 EFLAGS: 00010202
> RAX: 0000000000000000 RBX: ffff88012dc43ca0 RCX: 0000000000000000
> RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffff828480eb
> RBP: ffff88012aa92758 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: f000000000000000 R12: 0000000000000000
> R13: ffff88012dc43d48 R14: ffff88013092e7e8 R15: 0000000000000014
> FS:  00007f8d689b8e80(0000) GS:ffff88013fd00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f8d689c7000 CR3: 000000012ba6a000 CR4: 00000000000006e0
> Call Trace:
>  xfs_defer_init+0xff/0x160
>  xfs_reflink_remap_extent+0x31b/0xa00
>  xfs_reflink_remap_blocks+0xec/0x4a0
>  xfs_reflink_remap_range+0x3a1/0x650
>  xfs_file_dedupe_range+0x39/0x50
>  vfs_dedupe_file_range+0x218/0x260
>  do_vfs_ioctl+0x262/0x6a0
>  ? __se_sys_newfstat+0x3c/0x60
>  ksys_ioctl+0x35/0x60
>  __x64_sys_ioctl+0x11/0x20
>  do_syscall_64+0x4b/0x190
>  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
> The root cause of the assertion failure is that xfs_defer_finish doesn't
> roll the transaction after processing all the deferred items.  Therefore
> it returns a dirty transaction to the caller, which leaves the caller at
> risk of exceeding the transaction reservation if it logs more items.
> 
> Brian Foster's patchset to move the defer_ops firstblock into the
> transaction requires t_firstblock == NULLFSBLOCK upon defer_ops
> initialization, which is how this was noticed at all.
> 
> Reported-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
> ---
>  fs/xfs/libxfs/xfs_defer.c |    5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/fs/xfs/libxfs/xfs_defer.c b/fs/xfs/libxfs/xfs_defer.c
> index 2713e2d808a7..9bab0fb23761 100644
> --- a/fs/xfs/libxfs/xfs_defer.c
> +++ b/fs/xfs/libxfs/xfs_defer.c
> @@ -424,6 +424,11 @@ xfs_defer_finish(
>  			cleanup_fn(*tp, state, error);
>  	}
>  
> +	/*
> +	 * Roll the transaction once more to avoid returning to the caller
> +	 * with a dirty transaction.
> +	 */
> +	error = xfs_defer_trans_roll(tp, dop);

Heh, it doesn't test ok; back to the drawing board...

XFS: Assertion failed: !test_bit(XFS_LI_DIRTY, &lip->li_flags), file: fs/xfs/xfs_trans.c, line: 740

I think we only need to roll if the transaction is actually dirty, i.e.

	if ((*tp)->t_flags & XFS_TRANS_DIRTY)
		error = xfs_defer_trans_roll(tp, dop);

--D

>  out:
>  	(*tp)->t_dfops = orig_dop;
>  	if (error)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2018-07-12  0:32 UTC|newest]

Thread overview: 6+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-07-11 18:42 t_firstblock assert triggered from dedup in generic/051 Christoph Hellwig
2018-07-11 19:09 ` Darrick J. Wong
2018-07-11 19:38   ` Brian Foster
2018-07-11 19:43     ` Darrick J. Wong
2018-07-12  0:25       ` Darrick J. Wong [this message]
2018-07-12 12:06         ` Christoph Hellwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180712002534.GJ32415@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=bfoster@redhat.com \
    --cc=hch@lst.de \
    --cc=linux-xfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.