linux-btrfs.vger.kernel.org archive mirror
From: Josef Bacik <josef@toxicpanda.com>
To: Nikolay Borisov <nborisov@suse.com>
Cc: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>,
	linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: bisected: btrfs dedupe regression in v5.11-rc1: 3078d85c9a10 vfs: verify source area in vfs_dedupe_file_range_one()
Date: Thu, 16 Dec 2021 17:07:50 -0500
Message-ID: <Ybu4tuzqpaiast5H@localhost.localdomain>
In-Reply-To: <ab295d78-d250-fe8f-33a5-09cc90d5e406@suse.com>

On Thu, Dec 16, 2021 at 11:29:06PM +0200, Nikolay Borisov wrote:
> 
> 
> On 16.12.21 г. 7:33, Zygo Blaxell wrote:
> > On Wed, Dec 15, 2021 at 12:25:04AM +0200, Nikolay Borisov wrote:
> >> Huhz, this means there is an open transaction handle somewhere o_O. I
> >> checked back the stacktraces in your original email but couldn't see
> >> where that might be coming from. I.e all processes are waiting on
> >> wait_current_trans and this happens _before_ the transaction handle is
> >> opened, hence num_extwriters can't have been incremented by them.
> >>
> >> When an fs wedges, and you get again num_extwriters can you provide the
> >> output of "echo w > /proc/sysrq-trigger"
> > 
> > Here you go...
> 
> <snip>
> 
> > 
> > Again we have "3 locks held" but no list of locks.  WTF is 10883 doing?
> > Well, first of all it's using 100% CPU in the kernel.  Some samples of
> > kernel stacks:
> > 
> > 	# cat /proc/*/task/10883/stack
> > 	[<0>] down_read_nested+0x32/0x140
> > 	[<0>] __btrfs_tree_read_lock+0x2d/0x110
> > 	[<0>] btrfs_tree_read_lock+0x10/0x20
> > 	[<0>] btrfs_search_old_slot+0x627/0x8a0
> > 	[<0>] btrfs_next_old_leaf+0xcb/0x340
> > 	[<0>] find_parent_nodes+0xcd7/0x1c40
> > 	[<0>] btrfs_find_all_leafs+0x63/0xb0
> > 	[<0>] iterate_extent_inodes+0xc8/0x270
> > 	[<0>] iterate_inodes_from_logical+0x9f/0xe0
> 
> That's the real culprit, in this case we are not searching the commit
> root hence we've attached to the transaction. So we are doing backref
> resolution which either:
> 
> a) Hits some pathological case and loops for very long time, backref
> resolution is known to take a lot of time.
> 
> b) We hit a bug in backref resolution and loop forever which again
> results in the transaction being kept open.
> 
> Now I wonder why you were able to bisect this to the seemingly unrelated
> commit in the vfs code.
> 
> Josef any ideas how to proceed further to debug why backref resolution
> takes a long time and if it's just an infinite loop?
> 

It's probably an infinite loop. I'd just start with something like this:

bpftrace -e 'tracepoint:btrfs:btrfs_prelim_ref_insert { printf("bytenr is %llu\n", args->bytenr); }'

and see if it's spitting out the same shit over and over again.  If it is, can I
get a btrfs inspect-internal dump-tree -e on the device, along with the bytenr
it's hung up on, so I can figure out wtf it's tripping over?
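
If the output scrolls too fast to eyeball, a rough way to check for repetition
is to save it to a file and count how often each bytenr shows up.  This is only
a sketch: the script name count_bytenr.py, the file name trace.txt and the
helper count_bytenrs() are made up for illustration, and it assumes nothing
beyond the "bytenr is <number>" text printed by the bpftrace one-liner.

```python
import re
import sys
from collections import Counter

# Matches the text emitted by the bpftrace printf; other lines
# (e.g. bpftrace's own "Attaching ..." banner) are ignored.
BYTENR_RE = re.compile(r"bytenr is (\d+)")


def count_bytenrs(lines):
    """Return a Counter mapping each bytenr to the number of times it was seen."""
    counts = Counter()
    for line in lines:
        # finditer also copes with several records run together on one line.
        for match in BYTENR_RE.finditer(line):
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 count_bytenr.py trace.txt
    with open(sys.argv[1]) as trace:
        for bytenr, hits in count_bytenrs(trace).most_common(10):
            print(f"{hits:8d}  {bytenr}")
```

A handful of bytenrs with counts that keep climbing between runs would point at
a loop; a long, flat list of distinct values would look more like a slow but
progressing walk.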

If it's not looping there, it may be looping higher up, but I don't see where it
would be doing that.  Let's start here and work our way up if we need to.
Thanks,

Josef
