From: Filipe Manana <fdmanana@gmail.com>
To: Josef Bacik <josef@toxicpanda.com>
Cc: kernel-team@fb.com, linux-btrfs <linux-btrfs@vger.kernel.org>,
Josef Bacik <jbacik@fb.com>,
stable@vger.kernel.org
Subject: Re: [PATCH][v2] btrfs: run delayed items before dropping the snapshot
Date: Wed, 5 Dec 2018 20:13:20 +0000 [thread overview]
Message-ID: <CAL3q7H4a-Roey45q41bscKkYRRFk0sibpJcBkzvxayv73Jk8iA@mail.gmail.com> (raw)
In-Reply-To: <20181205171221.28777-1-josef@toxicpanda.com>
On Wed, Dec 5, 2018 at 5:14 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> From: Josef Bacik <jbacik@fb.com>
>
> With my delayed refs patches in place we started seeing a large amount
> of aborts in __btrfs_free_extent
>
> BTRFS error (device sdb1): unable to find ref byte nr 91947008 parent 0 root 35964 owner 1 offset 0
> Call Trace:
> ? btrfs_merge_delayed_refs+0xaf/0x340
> __btrfs_run_delayed_refs+0x6ea/0xfc0
> ? btrfs_set_path_blocking+0x31/0x60
> btrfs_run_delayed_refs+0xeb/0x180
> btrfs_commit_transaction+0x179/0x7f0
> ? btrfs_check_space_for_delayed_refs+0x30/0x50
> ? should_end_transaction.isra.19+0xe/0x40
> btrfs_drop_snapshot+0x41c/0x7c0
> btrfs_clean_one_deleted_snapshot+0xb5/0xd0
> cleaner_kthread+0xf6/0x120
> kthread+0xf8/0x130
> ? btree_invalidatepage+0x90/0x90
> ? kthread_bind+0x10/0x10
> ret_from_fork+0x35/0x40
>
> This was because btrfs_drop_snapshot depends on the root not being modified
> while it's dropping the snapshot. It will unlock the root node (and really
> every node) as it walks down the tree, only to re-lock it when it needs to do
> something. This is a problem because if we modify the tree we could cow a block
> in our path, which free's our reference to that block. Then once we get back to
> that shared block we'll free our reference to it again, and get ENOENT when
> trying to lookup our extent reference to that block in __btrfs_free_extent.
>
> This is ultimately happening because we have delayed items left to be processed
> for our deleted snapshot _after_ all of the inodes are closed for the snapshot.
> We only run the delayed inode item if we're deleting the inode, and even then we
> do not run the delayed insertions or delayed removals. These can be run at any
> point after our final inode does it's last iput, which is what triggers the
> snapshot deletion. We can end up with the snapshot deletion happening and then
> have the delayed items run on that file system, resulting in the above problem.
>
> This problem has existed forever, however my patches made it much easier to hit
> as I wake up the cleaner much more often to deal with delayed iputs, which made
> us more likely to start the snapshot dropping work before the transaction
> commits, which is when the delayed items would generally be run. Before,
> generally speaking, we would run the delayed items, commit the transaction, and
> wakeup the cleaner thread to start deleting snapshots, which means we were less
> likely to hit this problem. You could still hit it if you had multiple
> snapshots to be deleted and ended up with lots of delayed items, but it was
> definitely harder.
>
> Fix for now by simply running all the delayed items before starting to drop the
> snapshot. We could make this smarter in the future by making the delayed items
> per-root, and then simply drop any delayed items for roots that we are going to
> delete. But for now just a quick and easy solution is the safest.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Looks good now. Thanks.
> ---
> v1->v2:
> - check for errors from btrfs_run_delayed_items.
> - Dave I can reroll the series, but the second version of patch 1 is the same,
> let me know what you want.
>
> fs/btrfs/extent-tree.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index dcb699dd57f3..473084eb7a2d 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -9330,6 +9330,10 @@ int btrfs_drop_snapshot(struct btrfs_root *root,
> goto out_free;
> }
>
> + err = btrfs_run_delayed_items(trans);
> + if (err)
> + goto out_end_trans;
> +
> if (block_rsv)
> trans->block_rsv = block_rsv;
>
> --
> 2.14.3
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”
next prev parent reply other threads:[~2018-12-05 20:13 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-12-05 17:12 [PATCH][v2] btrfs: run delayed items before dropping the snapshot Josef Bacik
2018-12-05 20:13 ` Filipe Manana [this message]
2018-12-06 18:31 ` David Sterba
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAL3q7H4a-Roey45q41bscKkYRRFk0sibpJcBkzvxayv73Jk8iA@mail.gmail.com \
--to=fdmanana@gmail.com \
--cc=jbacik@fb.com \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-btrfs@vger.kernel.org \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).