From: Nikolay Borisov <nborisov@suse.com>
To: Zygo Blaxell <ce3g8jdj@umail.furryterror.org>
Cc: linux-btrfs <linux-btrfs@vger.kernel.org>
Subject: Re: bisected: btrfs dedupe regression in v5.11-rc1: 3078d85c9a10 vfs: verify source area in vfs_dedupe_file_range_one()
Date: Wed, 15 Dec 2021 00:25:04 +0200
Message-ID: <c6125582-a1dc-1114-8211-48437dbf4976@suse.com>
In-Reply-To: <Ybj1jVYu3MrUzVTD@hungrycats.org>



On 14.12.21 21:50, Zygo Blaxell wrote:
> On Tue, Dec 14, 2021 at 01:11:24PM +0200, Nikolay Borisov wrote:
>>
>>
>> On 14.12.21 1:12, Zygo Blaxell wrote:
>>> On Mon, Dec 13, 2021 at 03:28:26PM +0200, Nikolay Borisov wrote:
>>>> On 10.12.21 20:34, Zygo Blaxell wrote:
>>>>> I've been getting deadlocks in dedupe on btrfs since kernel 5.11, and
>>>>> some bees users have reported it as well.  I bisected to this commit:
>>>>>
>>>>> 	3078d85c9a10 vfs: verify source area in vfs_dedupe_file_range_one()
>>>>>
>>>>> These kernels work for at least 18 hours:
>>>>>
>>>>> 	5.10.83 (months)
>>>>> 	5.11.22 with 3078d85c9a10 reverted (36 hours)
>>>>> 	btrfs misc-next 66dc4de326b0 with 3078d85c9a10 reverted
>>>>>
>>>>> These kernels lock up in 3 hours or less:
>>>>>
>>>>> 	5.11.22
>>>>> 	5.12.19
>>>>> 	5.14.21
>>>>> 	5.15.6
>>>>> 	btrfs for-next 279373dee83e
>>>>>
>>>>> All of the failing kernels include this commit, none of the non-failing
>>>>> kernels include the commit.
>>>>>
>>>>> Kernel logs from the lockup:
>>>>>
>>>>> 	[19647.696042][ T3721] sysrq: Show Blocked State
>>>>> 	[19647.697024][ T3721] task:btrfs-transacti state:D stack:    0 pid: 6161 ppid:     2 flags:0x00004000
>>>>> 	[19647.698203][ T3721] Call Trace:
>>>>> 	[19647.698608][ T3721]  __schedule+0x388/0xaf0
>>>>> 	[19647.699125][ T3721]  schedule+0x68/0xe0
>>>>> 	[19647.699615][ T3721]  btrfs_commit_transaction+0x97c/0xbf0
>>>>
>>>> Can you run this through the symbolize script? I'd like to understand
>>>> where in the transaction commit the sleep is happening.
>>>
>>> 	btrfs_commit_transaction+0x97c/0xbf0:
>>>
>>> 	btrfs_commit_transaction at fs/btrfs/transaction.c:2159 (discriminator 9)
>>> 	 2154
>>> 	 2155           ret = btrfs_run_delayed_items(trans);
>>> 	 2156           if (ret)
>>> 	 2157                   goto cleanup_transaction;
>>> 	 2158
>>> 	>2159<          wait_event(cur_trans->writer_wait,
>>> 	 2160                      extwriter_counter_read(cur_trans) == 0);
>>> 	 2161
>>> 	 2162           /* some pending stuffs might be added after the previous flush. */
>>> 	 2163           ret = btrfs_run_delayed_items(trans);
>>> 	 2164           if (ret)
>>>
>>
>> So it seems there is an open transaction handle, so the commit can't
>> continue and everything else is stalled behind it. Would you be able to
>> run the attached Python script on a host that is stuck? It requires
>> debug symbols for the kernel to be installed, as well as
>> https://github.com/osandov/drgn/, which is a scriptable debugger. The
>> easiest way would be to follow the instructions at
>> https://drgn.readthedocs.io/en/latest/installation.html and just get it
>> via pip.
>>
>>
>> Once you have it installed, run it with:
>>
>> "sudo drgn get-num-extwriters.py 310dd372-0fd1-4496-a232-0fb46ca4afd6"
>>
>> where 310dd372-0fd1-4496-a232-0fb46ca4afd6 is the fsid of the wedged
>> fs, as reported by 'blkid'.
> 
> [drum roll noises...]
> 
> 	[f79c1081-d81d-4abc-8b47-3b15bf2f93c5] num_extwriters is: 1

Huh, this means there is an open transaction handle somewhere o_O. I
went back through the stack traces in your original email but couldn't
see where it might be coming from, i.e. all processes are waiting in
wait_current_trans, and that happens _before_ the transaction handle is
opened, hence num_extwriters can't have been incremented by them.
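
The reasoning above can be illustrated with a toy model. This is a hypothetical user-space Python analogy, not btrfs code: a plain counter stands in for num_extwriters, a threading.Condition stands in for the writer_wait queue, and commit() mirrors the wait_event() at transaction.c:2159 quoted earlier, except that it gives up after a timeout instead of sleeping forever. ToyTransaction and its method names are invented for illustration, and in the kernel only some handle types are counted as external writers.

```python
import threading


class ToyTransaction:
    """Hypothetical user-space model of a transaction's external-writer count.

    num_extwriters stands in for the kernel counter of the same name and
    writer_wait for the wait queue; nothing here is real btrfs code.
    """

    def __init__(self) -> None:
        self.num_extwriters = 0
        self.writer_wait = threading.Condition()

    def open_handle(self) -> None:
        # Roughly analogous to a writer joining the running transaction.
        with self.writer_wait:
            self.num_extwriters += 1

    def close_handle(self) -> None:
        # Roughly analogous to ending the handle: drop the count and wake
        # anyone waiting for it to reach zero.
        with self.writer_wait:
            self.num_extwriters -= 1
            self.writer_wait.notify_all()

    def commit(self, timeout: float) -> bool:
        # Mirrors wait_event(cur_trans->writer_wait,
        #                    extwriter_counter_read(cur_trans) == 0);
        # returns False if the count is still nonzero after `timeout` seconds.
        with self.writer_wait:
            return self.writer_wait.wait_for(
                lambda: self.num_extwriters == 0, timeout=timeout
            )


if __name__ == "__main__":
    trans = ToyTransaction()
    trans.open_handle()
    trans.close_handle()
    print(trans.commit(timeout=0.1))  # True: every handle was closed

    trans.open_handle()               # a handle that is never closed
    print(trans.commit(timeout=0.1))  # False: a real commit would hang here
```

In this model, a task that is still waiting to join (like those sleeping in wait_current_trans) never calls open_handle(), so it cannot be what keeps the count at 1; only a task that already holds a handle can.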

When the fs wedges again and you have checked num_extwriters, can you
also provide the output of "echo w > /proc/sysrq-trigger"?
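
The sysrq-w dump ("Show Blocked State") prints each blocked task in the same layout as the excerpt at the top of the thread: a "task:... state:D ... pid: N" line followed by "function+0xoffset/0xsize" frames. A small helper can group that output per task and list, for example, every task sleeping in wait_current_trans, which may make a task blocked elsewhere while holding a handle easier to spot. This is a hedged sketch: the regular expressions are written only against the excerpt quoted in this thread, assume the task name contains no spaces, and may need adjusting for other kernel versions or log prefixes. parse_blocked_state, BlockedTask and tasks_in are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Leading "[19647.698608][ T3721]" style timestamp/caller prefixes.
PREFIX = re.compile(r"^(?:\[[^\]]*\])+")
# "task:NAME state:D stack:    0 pid: 6161 ppid: ..." (assumes NAME has no spaces).
TASK = re.compile(r"task:(?P<comm>\S+)\s+state:(?P<state>\S+).*?\bpid:\s*(?P<pid>\d+)")
# "func+0xOFF/0xSIZE", optionally prefixed by "? " for unreliable frames.
FRAME = re.compile(r"^(?:\? )?(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)")


@dataclass
class BlockedTask:
    comm: str
    state: str
    pid: int
    frames: List[Tuple[str, int, int]] = field(default_factory=list)


def parse_blocked_state(log: str) -> List[BlockedTask]:
    """Group a sysrq 'Show Blocked State' dump into tasks and their frames."""
    tasks: List[BlockedTask] = []
    for raw in log.splitlines():
        line = PREFIX.sub("", raw.strip()).strip()
        task = TASK.search(line)
        if task:
            tasks.append(BlockedTask(task["comm"], task["state"], int(task["pid"])))
            continue
        frame = FRAME.match(line)
        if frame and tasks:  # frames before the first task line are ignored
            tasks[-1].frames.append(
                (frame["func"], int(frame["off"], 16), int(frame["size"], 16))
            )
    return tasks


def tasks_in(tasks: List[BlockedTask], func: str) -> List[BlockedTask]:
    """Return the tasks whose call trace contains `func`."""
    return [t for t in tasks if any(name == func for name, _, _ in t.frames)]
```

For example, tasks_in(parse_blocked_state(dmesg_text), "wait_current_trans"), with dmesg_text holding the captured log, would list the tasks that, per the reasoning above, cannot be holding a handle; the blocked tasks outside that list are the more interesting ones.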

<snip>
