linux-btrfs.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Josef Bacik <josef@toxicpanda.com>
To: Filipe Manana <fdmanana@kernel.org>
Cc: Josef Bacik <josef@toxicpanda.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction
Date: Thu, 12 Sep 2019 11:14:20 -0400	[thread overview]
Message-ID: <20190912151416.svmkxurjrtyc5jkf@MacBook-Pro-91.local> (raw)
In-Reply-To: <CAL3q7H4tTsZaz0Nr+yWBkDfHmj=M0JszZtaTaaM9LP2oeWLu3w@mail.gmail.com>

On Thu, Sep 12, 2019 at 02:19:55PM +0100, Filipe Manana wrote:
> On Thu, Sep 12, 2019 at 1:18 PM Josef Bacik <josef@toxicpanda.com> wrote:
> >
> > On Tue, Sep 10, 2019 at 03:26:49PM +0100, fdmanana@kernel.org wrote:
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > Sometimes when fsync'ing a file we need to log that other inodes exist and
> > > when we need to do that we acquire a reference on the inodes and then drop
> > > that reference using iput() after logging them.
> > >
> > > That generally is not a problem except if we end up doing the final iput()
> > > (dropping the last reference) on the inode and that inode has a link count
> > > of 0, which can happen in a very short time window if the logging path
> > > gets a reference on the inode while it's being unlinked.
> > >
> > > In that case we end up getting the eviction callback, btrfs_evict_inode(),
> > > invoked through the iput() call chain which needs to drop all of the
> > > inode's items from its subvolume btree, and in order to do that, it needs
> > > to join a transaction at the helper function evict_refill_and_join().
> > > However because the task previously started a transaction at the fsync
> > > handler, btrfs_sync_file(), it has current->journal_info already pointing
> > > to a transaction handle and therefore evict_refill_and_join() will get
> > > that transaction handle from btrfs_join_transaction(). From this point on,
> > > two different problems can happen:
> > >
> > > 1) evict_refill_and_join() will often change the transaction handle's
> > >    block reserve (->block_rsv) and set its ->bytes_reserved field to a
> > >    value greater than 0. If evict_refill_and_join() never commits the
> > >    transaction, the eviction handler ends up decreasing the reference
> > >    count (->use_count) of the transaction handle through the call to
> > >    btrfs_end_transaction(), and after that point we have a transaction
> > >    handle with a NULL ->block_rsv (which is the value prior to the
> > >    transaction join from evict_refill_and_join()) and a ->bytes_reserved
> > >    value greater than 0. If after the eviction/iput completes the inode
> > >    logging path hits an error or it decides that it must fallback to a
> > >    transaction commit, the btrfs fsync handle, btrfs_sync_file(), gets a
> > >    non-zero value from btrfs_log_dentry_safe(), and because of that
> > >    non-zero value it tries to commit the transaction using a handle with
> > >    a NULL ->block_rsv and a non-zero ->bytes_reserved value. This makes
> > >    the transaction commit hit an assertion failure at
> > >    btrfs_trans_release_metadata() because ->bytes_reserved is not zero but
> > >    the ->block_rsv is NULL. The produced stack trace for that is like the
> > >    following:
> > >
> > >    [192922.917158] assertion failed: !trans->bytes_reserved, file: fs/btrfs/transaction.c, line: 816
> > >    [192922.917553] ------------[ cut here ]------------
> > >    [192922.917922] kernel BUG at fs/btrfs/ctree.h:3532!
> > >    [192922.918310] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> > >    [192922.918666] CPU: 2 PID: 883 Comm: fsstress Tainted: G        W         5.1.4-btrfs-next-47 #1
> > >    [192922.919035] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
> > >    [192922.919801] RIP: 0010:assfail.constprop.25+0x18/0x1a [btrfs]
> > >    (...)
> > >    [192922.920925] RSP: 0018:ffffaebdc8a27da8 EFLAGS: 00010286
> > >    [192922.921315] RAX: 0000000000000051 RBX: ffff95c9c16a41c0 RCX: 0000000000000000
> > >    [192922.921692] RDX: 0000000000000000 RSI: ffff95cab6b16838 RDI: ffff95cab6b16838
> > >    [192922.922066] RBP: ffff95c9c16a41c0 R08: 0000000000000000 R09: 0000000000000000
> > >    [192922.922442] R10: ffffaebdc8a27e70 R11: 0000000000000000 R12: ffff95ca731a0980
> > >    [192922.922820] R13: 0000000000000000 R14: ffff95ca84c73338 R15: ffff95ca731a0ea8
> > >    [192922.923200] FS:  00007f337eda4e80(0000) GS:ffff95cab6b00000(0000) knlGS:0000000000000000
> > >    [192922.923579] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >    [192922.923948] CR2: 00007f337edad000 CR3: 00000001e00f6002 CR4: 00000000003606e0
> > >    [192922.924329] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > >    [192922.924711] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > >    [192922.925105] Call Trace:
> > >    [192922.925505]  btrfs_trans_release_metadata+0x10c/0x170 [btrfs]
> > >    [192922.925911]  btrfs_commit_transaction+0x3e/0xaf0 [btrfs]
> > >    [192922.926324]  btrfs_sync_file+0x44c/0x490 [btrfs]
> > >    [192922.926731]  do_fsync+0x38/0x60
> > >    [192922.927138]  __x64_sys_fdatasync+0x13/0x20
> > >    [192922.927543]  do_syscall_64+0x60/0x1c0
> > >    [192922.927939]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >    (...)
> > >    [192922.934077] ---[ end trace f00808b12068168f ]---
> > >
> > > 2) If evict_refill_and_join() decides to commit the transaction, it will
> > >    be able to do it, since the nested transaction join only increments the
> > >    transaction handle's ->use_count reference counter and it does not
> > >    prevent the transaction from getting committed. This means that after
> > >    eviction completes, the fsync logging path will be using a transaction
> > >    handle that refers to an already committed transaction. What happens
> > >    when using such a stale transaction can be unpredictable, we are at
> > >    least having a use-after-free on the transaction handle itself, since
> > >    the transaction commit will call kmem_cache_free() against the handle
> > >    regardless of its ->use_count value, or we can end up silently losing
> > >    all the updates to the log tree after that iput() in the logging path,
> > >    or using a transaction handle that in the meanwhile was allocated to
> > >    another task for a new transaction, etc, pretty much unpredictable
> > >    what can happen.
> > >
> >
> > And talking it over with Nikolay I realized that since we're doing the commit
> > through the flushing this doesn't actually happen anymore.
> > may_commit_transaction() returns -EGAIN if we already have a trans handle open,
> > so hooray I made it safer by accident!  But we definitely should follow up and
> > add an assert in btrfs_commit_transaction() to catch this case, cause holy shit
> > that's bad.  Thanks,
> 
> Yep, but problem 1) (assertion failure) is still valid after your
> recent changes in misc-next.
> Either way, this fix has to be added to stable releases to fix both
> problems there.
> 

Agreed, just adding context for the current changes.  This patch is fine as it
is, just some follow up stuff to catch transaction commits under nested trans
handles.  Thanks,

Josef

      reply	other threads:[~2019-09-12 15:14 UTC|newest]

Thread overview: 14+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-10 14:26 [PATCH] Btrfs: fix assertion failure during fsync and use of stale transaction fdmanana
2019-09-11 16:04 ` Josef Bacik
2019-09-12  7:30 ` Sasha Levin
2019-09-12  8:02   ` Filipe Manana
2019-09-16 12:59   ` Filipe Manana
2019-09-16 15:13     ` Greg KH
2019-09-12 11:10 ` Nikolay Borisov
2019-09-12 11:24   ` Filipe Manana
2019-09-12 11:35     ` Nikolay Borisov
2019-09-12 11:58 ` Josef Bacik
2019-09-12 13:17   ` Filipe Manana
2019-09-12 12:18 ` Josef Bacik
2019-09-12 13:19   ` Filipe Manana
2019-09-12 15:14     ` Josef Bacik [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190912151416.svmkxurjrtyc5jkf@MacBook-Pro-91.local \
    --to=josef@toxicpanda.com \
    --cc=fdmanana@kernel.org \
    --cc=linux-btrfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).