From: Josef Bacik <josef@toxicpanda.com>
To: Nikolay Borisov <nborisov@suse.com>
Cc: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: bisected: btrfs dedupe regression in v5.11-rc1: 3078d85c9a10 vfs: verify source area in vfs_dedupe_file_range_one()
Date: Thu, 16 Dec 2021 17:07:50 -0500
Message-ID: <Ybu4tuzqpaiast5H@localhost.localdomain>
In-Reply-To: <ab295d78-d250-fe8f-33a5-09cc90d5e406@suse.com>
On Thu, Dec 16, 2021 at 11:29:06PM +0200, Nikolay Borisov wrote:
>
>
> On 16.12.21 г. 7:33, Zygo Blaxell wrote:
> > On Wed, Dec 15, 2021 at 12:25:04AM +0200, Nikolay Borisov wrote:
> >> Huhz, this means there is an open transaction handle somewhere o_O. I
> >> checked back the stacktraces in your original email but couldn't see
> >> where that might be coming from. I.e all processes are waiting on
> >> wait_current_trans and this happens _before_ the transaction handle is
> >> opened, hence num_extwriters can't have been incremented by them.
> >>
> >> When an fs wedges, and you get again num_extwriters can you provide the
> >> output of "echo w > /proc/sysrq-trigger"
> >
> > Here you go...
>
> <snip>
>
> >
> > Again we have "3 locks held" but no list of locks. WTF is 10883 doing?
> > Well, first of all it's using 100% CPU in the kernel. Some samples of
> > kernel stacks:
> >
> > # cat /proc/*/task/10883/stack
> > [<0>] down_read_nested+0x32/0x140
> > [<0>] __btrfs_tree_read_lock+0x2d/0x110
> > [<0>] btrfs_tree_read_lock+0x10/0x20
> > [<0>] btrfs_search_old_slot+0x627/0x8a0
> > [<0>] btrfs_next_old_leaf+0xcb/0x340
> > [<0>] find_parent_nodes+0xcd7/0x1c40
> > [<0>] btrfs_find_all_leafs+0x63/0xb0
> > [<0>] iterate_extent_inodes+0xc8/0x270
> > [<0>] iterate_inodes_from_logical+0x9f/0xe0
>
> That's the real culprit, in this case we are not searching the commit
> root hence we've attached to the transaction. So we are doing backref
> resolution which either:
>
> a) Hits some pathological case and loops for very long time, backref
> resolution is known to take a lot of time.
>
> b) We hit a bug in backref resolution and loop forever which again
> results in the transaction being kept open.
>
> Now I wonder why you were able to bisect this to the seemingly unrelated
> commit in the vfs code.
>
> Josef any ideas how to proceed further to debug why backref resolution
> takes a long time and if it's just an infinite loop?
>

It's probably an infinite loop. I'd just start with something like this:

bpftrace -e 'tracepoint:btrfs:btrfs_prelim_ref_insert { printf("bytenr is %llu\n", args->bytenr); }'

and see if it's spitting out the same shit over and over again. If it is, can I
get a btrfs inspect-internal dump-tree -e on the device along with the bytenr
it's hung up on so I can figure out wtf it's tripping over?

If it's not looping there, it may be looping higher up, but I don't see where it
would be doing that. Let's start here and work our way up if we need to.

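
To make "the same thing over and over" easier to spot in a large capture, the bpftrace output could be summarized with a small script. This is only a sketch, not part of any btrfs tooling: the function names, the threshold of 100, and the file names in the usage note are all assumptions chosen for illustration. It counts how often each bytenr was printed and reports the ones that repeat.

```python
import re
import sys
from collections import Counter

# Matches the text printed by the bpftrace one-liner above. Using findall
# means it works whether or not each record ends with a newline.
BYTENR_RE = re.compile(r"bytenr is (\d+)")


def count_bytenrs(text):
    """Return a Counter mapping each bytenr to how many times it was printed."""
    return Counter(int(m) for m in BYTENR_RE.findall(text))


def repeated(counts, threshold=100):
    """Return (bytenr, count) pairs seen at least `threshold` times,
    most frequent first. The default threshold is an arbitrary assumption."""
    return [(b, n) for b, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    # Reads the whole capture from stdin, so feed it a finished capture file
    # rather than a live, never-ending bpftrace pipe.
    for bytenr, n in repeated(count_bytenrs(sys.stdin.read())):
        print(f"{bytenr} seen {n} times")
```

One way to use it, assuming the script is saved as count_bytenr.py (a hypothetical name): redirect the bpftrace output to a file for a short while, stop bpftrace, then run `python3 count_bytenr.py < capture.txt`. A handful of bytenrs with very large counts would be consistent with a loop; a long tail of distinct values would point more toward a slow but progressing walk.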
Thanks,
Josef