Re: bisected: btrfs dedupe regression in v5.11-rc1

From: Thorsten Leemhuis <regressions@leemhuis.info>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: Chris Mason <clm@fb.com>, Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	Nikolay Borisov <nborisov@suse.com>,
	linux-btrfs <linux-btrfs@vger.kernel.org>,
	"regressions@lists.linux.dev" <regressions@lists.linux.dev>
Subject: Re: bisected: btrfs dedupe regression in v5.11-rc1
Date: Mon, 7 Mar 2022 07:17:56 +0100	[thread overview]
Message-ID: <00afd7f5-ffaf-659f-3d0c-9c12b97d6ef4@leemhuis.info> (raw)
In-Reply-To: <YiVFCO5KPGkdQAyA@hungrycats.org>

On 07.03.22 00:34, Zygo Blaxell wrote:
> On Sun, Mar 06, 2022 at 11:31:05AM +0100, Thorsten Leemhuis wrote:
>> [Adding Chris and David to the list of recipients]
>>
>> Hi, this is your Linux kernel regression tracker again.
>>
>> @Btrfs-Maintainers: is anyone addressing this regression that got
>> bisected many weeks ago? It from here looks like things are stuck: Libor
>> asked for a status update 24 days ago, I did the same 15 days ago, but
>> neither of us got a answer. Is there some good reason for this? Or did
>> the discussion continue somewhere else?
> 
> TL;DR it's not a regression and the bisection is invalid.

#regzbot invalid: it's not a regression and the bisection is invalid
according to the reporter

>  I apologize for using the word "regression" too early.

Np, that just didn't became clear to me from the discussion, but I can
only skim those, as I get many reports on my table.

thx for clarifying and good luck in hunting this down!

Ciao, Thorsten

>  This is a pre-existing bug
> that doesn't require special attention.
>
> After multiple bisection test runs and multiple brute-force linear tests
> of every btrfs commit between 5.10 and 5.11, and similar event reports
> observed in production fleet, I now believe the original bisection
> results are an artifact of the git bisection algorithm and a bug with
> unusual behavior.
> 
> Every commit in the range eventually locks up with the same symptoms.
> Commits seem to randomly fall into the "minutes to reproduce" or "weeks
> to reproduce" buckets, i.e. there's no single point in the commit history
> where everything before the point is good and everything after the point
> is bad.  The fact that bad commits are interleaved with good ones points
> to an issue elsewhere, far away from the commit that bisect happened to
> land on.
> 
> This bug is now at the top of my "reasons why btrfs crashes" list,
> but that is because 1) every worse bug on that list has now been fixed
> (including some other much more urgent regressions from the past few
> kernel cycles) and 2) I'm pushing hard on specific parts of btrfs that
> only a few people are using at the moment.  Most of the bugs I've hit
> in btrfs turned out to be more than 5 years old at the time, and this
> "new" bug could easily fall into that category.
> 
>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>
>> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
>> reports on my table. I can only look briefly into most of them and lack
>> knowledge about most of the areas they concern. I thus unfortunately
>> will sometimes get things wrong or miss something important. I hope
>> that's not the case here; if you think it is, don't hesitate to tell me
>> in a public reply, it's in everyone's interest to set the public record
>> straight.
>>
>> #regzbot poke
>>
>> On 18.02.22 15:46, Thorsten Leemhuis wrote:
>>> Hi, this is your Linux kernel regression tracker speaking. Top-posting
>>> for once, to make this easy accessible to everyone.
>>>
>>> What's up here? Can anyone (Zygo? Josef?) please provide a status
>>> update? Yes, it took quite a while till this regression got found, but
>>> nevertheless this looks to take awfully long to get resolved for a
>>> regression was bisected weeks ago.
>>>
>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>
>>> P.S.: As the Linux kernel's regression tracker I'm getting a lot of
>>> reports on my table. I can only look briefly into most of them and lack
>>> knowledge about most of the areas they concern. I thus unfortunately
>>> will sometimes get things wrong or miss something important. I hope
>>> that's not the case here; if you think it is, don't hesitate to tell me
>>> in a public reply, it's in everyone's interest to set the public record
>>> straight.
>>>
>>>
>>> On 21.01.22 01:27, Zygo Blaxell wrote:
>>>> On Thu, Jan 20, 2022 at 03:04:19PM +0100, Thorsten Leemhuis wrote:
>>>>> Hi, this is your Linux kernel regression tracker speaking.
>>>>>
>>>>> On 07.01.22 19:31, Zygo Blaxell wrote:
>>>>>> On Fri, Dec 17, 2021 at 03:50:44PM -0500, Zygo Blaxell wrote:
>>>>>> I left my VM running tests for a few weeks and got some more information.
>>>>>> Or at least more data, I'm not feeling particularly informed by it.  :-P
>>>>>>
>>>>>> 1.  It's not a regression.  5.10 has the same symptoms, but about 100x
>>>>>> less often (once a week under these test conditions, compared to once
>>>>>> every 90 minutes or so on 5.11-rc1).
>>>>>
>>>>> Well, I'd still call it a regression, as it's now happening way more
>>>>> often and thus will likely hit more users. It's thus a bit like a
>>>>> problem that leads to higher energy consumption: things still work, but
>>>>> worse than before -- nevertheless it's considered a regression. Anway:
>>>>>
>>>>> What's the status here? Are you still investigating the issue? Are any
>>>>> developers looking out for the root cause?
>>>>
>>>> I think Josef's plan (start inside the logical_ino ioctl with bpftrace
>>>> and work upwards to see where the looping is getting stuck) is a good plan,
>>>> but due to conflicting priorities I haven't found the time to act on it.
>>>>
>>>> I can take experimental patches and throw them at my repro setup if
>>>> anyone would like to supply some.
>>>>
>>>>> Ciao, Thorsten
>>>>>
>>>>> P.S.: As a Linux kernel regression tracker I'm getting a lot of reports
>>>>> on my table. I can only look briefly into most of them. Unfortunately
>>>>> therefore I sometimes will get things wrong or miss something important.
>>>>> I hope that's not the case here; if you think it is, don't hesitate to
>>>>> tell me about it in a public reply, that's in everyone's interest.
>>>>>
>>>>> BTW, I have no personal interest in this issue, which is tracked using
>>>>> regzbot, my Linux kernel regression tracking bot
>>>>> (https://linux-regtracking.leemhuis.info/regzbot/). I'm only posting
>>>>> this mail to get things rolling again and hence don't need to be CC on
>>>>> all further activities wrt to this regression.
>>>>>
>>>>> #regzbot poke
>>>>>
>>>>>> 2.  Bisection doesn't work, because there are patches that are repeatably
>>>>>> good and bad mixed together, so the bisect algorithm (even with stochastic
>>>>>> enhancement) repeatably picks the wrong commits and converges with
>>>>>> high confidence on nonsense.  Instead of bisecting, I picked commits
>>>>>> semi-randomly from 5.11-rc1's patch set, and got these results:
>>>>>>
>>>>>>    124  3a160a933111 btrfs: drop never met disk total bytes check in verify_one_dev_extent
>>>>>> 	1x hang, 2x slower
>>>>>>    125  bacce86ae8a7 btrfs: drop unused argument step from btrfs_free_extra_devids
>>>>>> 	1x pass (fast)
>>>>>>    126  2766ff61762c btrfs: update the number of bytes used by an inode atomically
>>>>>> 	1x hang (<20 minutes)
>>>>>>    127  7f458a3873ae btrfs: fix race when defragmenting leads to unnecessary IO
>>>>>> 	1x hang, runs 3x slower
>>>>>>    128  5893dfb98f25 btrfs: refactor btrfs_drop_extents() to make it easier to extend
>>>>>> 	2x hang (<20 minutes)
>>>>>>    129  e114c545bb69 btrfs: set the lockdep class for extent buffers on creation
>>>>>> 	2x pass (but runs 2x slower, both times)
>>>>>>    130  3fbaf25817f7 btrfs: pass the owner_root and level to alloc_extent_buffer
>>>>>> 	1x pass
>>>>>>    131  5d81230baa90 btrfs: pass the root owner and level around for readahead
>>>>>> 	1x pass
>>>>>>    132  1b7ec85ef490 btrfs: pass root owner to read_tree_block
>>>>>> 	1x pass
>>>>>>    133  182c79fcb857 btrfs: use btrfs_read_node_slot in btrfs_qgroup_trace_subtree
>>>>>>    134  3acfbd6a990c btrfs: use btrfs_read_node_slot in qgroup_trace_new_subtree_blocks
>>>>>> 	1x hang
>>>>>>    135  6b2cb7cb959a btrfs: use btrfs_read_node_slot in qgroup_trace_extent_swap
>>>>>>    136  c990ada2a0bb btrfs: use btrfs_read_node_slot in walk_down_tree
>>>>>> 	1x hang
>>>>>>    137  6b3426be27de btrfs: use btrfs_read_node_slot in replace_path
>>>>>> 	1x hang, 1x pass
>>>>>>    138  c975253682e0 btrfs: use btrfs_read_node_slot in do_relocation
>>>>>> 	1x hang
>>>>>>    139  8ef385bbf099 btrfs: use btrfs_read_node_slot in walk_down_reloc_tree
>>>>>> 	1x hang, 1x pass
>>>>>>    140  206983b72a36 btrfs: use btrfs_read_node_slot in btrfs_realloc_node
>>>>>> 	1x pass
>>>>>>    141  bfb484d922a3 btrfs: cleanup extent buffer readahead
>>>>>> 	1x pass
>>>>>>    142  416e3445ef80 btrfs: remove lockdep classes for the fs tree
>>>>>>    143  3e48d8d2540d btrfs: discard: reschedule work after sysfs param update
>>>>>>    144  df903e5d294f btrfs: don't miss async discards after scheduled work override
>>>>>>    145  6e88f116bd4c btrfs: discard: store async discard delay as ns not as jiffies
>>>>>> 	2x hang
>>>>>>    146  e50404a8a699 btrfs: discard: speed up async discard up to iops_limit
>>>>>>
>>>>>>    [snip]
>>>>>>
>>>>>>    155  0d01e247a06b btrfs: assert page mapping lock in attach_extent_buffer_page
>>>>>> 	1x hang, 1x pass
>>>>>>    156  bbb86a371791 btrfs: protect fs_info->caching_block_groups by block_group_cache_lock
>>>>>> 	1x hang
>>>>>>    157  e747853cae3a btrfs: load free space cache asynchronously
>>>>>> 	1x pass
>>>>>>    158  4d7240f0abda btrfs: load the free space cache inode extents from commit root
>>>>>> 	1x hang
>>>>>>    159  cd79909bc7cd btrfs: load free space cache into a temporary ctl
>>>>>> 	2x pass
>>>>>>    160  66b53bae46c8 btrfs: cleanup btrfs_discard_update_discardable usage
>>>>>> 	2x hang, 1x pass
>>>>>>    161  2ca08c56e813 btrfs: explicitly protect ->last_byte_to_unpin in unpin_extent_range
>>>>>> 	2x pass
>>>>>>    162  27d56e62e474 btrfs: update last_byte_to_unpin in switch_commit_roots
>>>>>> 	2x pass
>>>>>>    163  9076dbd5ee83 btrfs: do not shorten unpin len for caching block groups
>>>>>>    164  dc5161648693 btrfs: reorder extent buffer members for better packing
>>>>>> 	2x pass
>>>>>>    165  b9729ce014f6 btrfs: locking: rip out path->leave_spinning
>>>>>>    166  ac5887c8e013 btrfs: locking: remove all the blocking helpers
>>>>>>    167  2ae0c2d80d25 btrfs: scrub: remove local copy of csum_size from context
>>>>>>    168  419b791ce760 btrfs: check integrity: remove local copy of csum_size
>>>>>> 	1x hang, 1x pass
>>>>>>    169  713cebfb9891 btrfs: remove unnecessary local variables for checksum size
>>>>>>    170  223486c27b36 btrfs: switch cached fs_info::csum_size from u16 to u32
>>>>>>    171  55fc29bed8dd btrfs: use cached value of fs_info::csum_size everywhere
>>>>>>    172  fe5ecbe818de btrfs: precalculate checksums per leaf once
>>>>>>    173  22b6331d9617 btrfs: store precalculated csum_size in fs_info
>>>>>>    174  265fdfa6ce0a btrfs: replace s_blocksize_bits with fs_info::sectorsize_bits
>>>>>>    175  098e63082b9b btrfs: replace div_u64 by shift in free_space_bitmap_size
>>>>>> 	2x pass
>>>>>>    176  ab108d992b12 btrfs: use precalculated sectorsize_bits from fs_info
>>>>>>
>>>>>>    [snip]
>>>>>>
>>>>>>    200  5e8b9ef30392 btrfs: move pos increment and pagecache extension to btrfs_buffered_write
>>>>>> 	1x pass
>>>>>>    201  4e4cabece9f9 btrfs: split btrfs_direct_IO to read and write
>>>>>>
>>>>>>    [snip]
>>>>>>
>>>>>>    215  d70bf7484f72 btrfs: unify the ro checking for mount options
>>>>>> 	1x pass
>>>>>>    216  a6889caf6ec6 btrfs: do not start readahead for csum tree when scrubbing non-data block groups
>>>>>>    217  a57ad681f12e btrfs: assert we are holding the reada_lock when releasing a readahead zone
>>>>>>    218  aa8c1a41a1e6 btrfs: set EXTENT_NORESERVE bits side btrfs_dirty_pages()
>>>>>>    219  13f0dd8f7861 btrfs: use round_down while calculating start position in btrfs_dirty_pages()
>>>>>>    220  949b32732eab btrfs: use iosize while reading compressed pages
>>>>>>    221  eefa45f59379 btrfs: calculate num_pages, reserve_bytes once in btrfs_buffered_write
>>>>>>    222  fb8a7e941b1b btrfs: calculate more accurate remaining time to sleep in transaction_kthread
>>>>>> 	1x pass
>>>>>>
>>>>>> There is some repeatability in these results--some commits have a much
>>>>>> lower failure rate than others--but I don't see a reason why the bad
>>>>>> commits are bad or the good commits are good.  There are some commits with
>>>>>> locking and concurrency implications, but they're as likely to produce
>>>>>> good as bad results in test.  Sometimes there's a consistent change in
>>>>>> test result after a commit that only rearranges function arguments on
>>>>>> the stack.
>>>>>>
>>>>>> Maybe what we're looking at is a subtle race that is popping up due
>>>>>> to unrelated changes in the kernel, and disappearing just as often,
>>>>>> and 5.11-rc1 in particular did something innocent that aggravates
>>>>>> it somehow, so all later kernels hit the problem more often than
>>>>>> 5.10 did.
>>>>>>
>>>>>> 3.  Somewhere around "7f458a3873ae btrfs: fix race when defragmenting
>>>>>> leads to unnecessary IO" bees starts running about 3x slower than on
>>>>>> earlier kernels.  bees is a nightmare of nondeterministically racing
>>>>>> worker threads, so I'm not sure how important this observation is,
>>>>>> but it keeps showing up in the data.
>>>>>>
>>>>>> 4.  I had one machine on 5.10.84 (not a test VM) with a shell process
>>>>>> that got stuck spinning 100% CPU in the kernel on sys_write.  bees was
>>>>>> also running, but its threads were all stuck waiting for the shell to
>>>>>> release the transaction.  Other crashes on 5.10.8x kernels look more
>>>>>> like the one in this thread, with a logical_ino spinning.
>>>>>>
>>>>>>>> If it's not looping there, it may be looping higher up, but I don't see where it
>>>>>>>> would be doing that.  Lets start here and work our way up if we need to.
>>>>>>>> Thanks,
>>>>>
>>>
>>
> 
>