linux-btrfs.vger.kernel.org archive mirror
From: Qu Wenruo <wqu@suse.com>
To: Tavian Barnes <tavianator@tavianator.com>,
	Qu Wenruo <quwenruo.btrfs@gmx.com>
Cc: linux-btrfs@vger.kernel.org
Subject: Re: [PATCH] btrfs: tree-checker: dump the page status if hit something wrong
Date: Wed, 14 Feb 2024 07:56:31 +1030
Message-ID: <f33a6faa-eb99-4525-bf6b-c6276beebc5b@suse.com>
In-Reply-To: <CABg4E-kqfkX3nyVdcSsgucmcxdcJRMfH+ahVBR+bYXJyd0y53g@mail.gmail.com>





On 2024/2/14 04:37, Tavian Barnes wrote:
> On Tue, Feb 6, 2024 at 4:53 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
[...]
>>
>> Yes, still worthy.
>>
>> The btrfs/for-next contains that commit (which is already upstreamed).
>> That patch itself has some bugs fixed early (before hitting upstream),
>> but since it's touching the whole memory management of tree blocks, it
>> is still the best possible culprit.
> 
> Ah okay, I see what you mean.  Unfortunately it still reproduces on
> both that commit 09e6cef19c9f ("btrfs: refactor alloc_extent_buffer()
> to allocate-then-attach method"), and the commit before it
> 2b0122aaa800 ("btrfs: sysfs: validate scrub_speed_max value").

At least we have one less thing to worry about.

> 
> I tried to bisect but I don't know where to start from.  It still
> reproduces all the way back to v6.5, although with a different splat:
>       general protection fault, probably for non-canonical address
> 0x7f99872a6b80f0: 0000 [#1] PREEMPT SMP NOPTI
>       BTRFS critical (device dm-3): corrupted node, root=518
> block=16000637395156534217 owner mismatch, have 12049901028372027545
> expect [256, 18446744073709551360]
>       CPU: 47 PID: 3729 Comm: iou-wrk-3310 Not tainted 6.5.0-euclean
> #10 4197dfd21e86f976fbd69cbd6a56016cf20d42e1
>       Hardware name: Micro-Star International Co., Ltd. MS-7C60/TRX40
> PRO WIFI (MS-7C60), BIOS 2.80 05/17/2022

Since it's a Threadripper, which supports ECC memory, and you're already
using it, I don't believe it's a hardware problem.

Furthermore, Linus himself also hit it once, so it must be something
related to our extent buffer memory management.


The call trace is different, but I believe the problem is the same.
For now I don't have much of a clue, unfortunately.

The only recommendation I have is to try version by version. If the
problem persists even at v6.0, I believe we have a bigger problem.

I can add some extra trace_printk() calls for you to test.
Before that, please give me some time to craft a debug patch. Meanwhile,
feel free to try older kernels, going back as far as v6.0.

Really appreciate not only your report but all the effort,

Thanks,
Qu

> 
>      RIP: 0010:btrfs_bin_search+0xd7/0x1d0 [btrfs]
>      Code: c2 65 48 89 d0 25 ff 0f 00 00 48 83 c0 11 48 3d 00 10 00 00
> 0f 87 ae 00 00 00 48 89 d0 48 03 13 48 c1 e8 0c 81 e2 ff 0f 00 00 <48>
> 8b 44 c3 70 48 2b 05 35 3c c3 fa 48 c1 f8 06 48 c1 e0 0c 48 03
>      RSP: 0018:ffffbd8d5d7537c0 EFLAGS: 00010206
>      RAX: 000ffffffff9a33e RBX: ffff99872a9e6690 RCX: ffffbd8d5d753860
>      RDX: 00000000000000d1 RSI: 0000000000000000 RDI: ffff99872a9e6690
>      RBP: 0000000061c382ec R08: ffff99872a9e6690 R09: 0000083e36760000
>      R10: ffffbd8d5d753758 R11: 0000000000001000 R12: 00000000c38705d9
>      R13: 0000000000000000 R14: ffffbd8d5d75392f R15: 0000000000000021
>      FS:  00007f3d627f96c0(0000) GS:ffff99a57ecc0000(0000) knlGS:0000000000000000
>      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>      CR2: 00007f3d65db3010 CR3: 000000010fcd6000 CR4: 0000000000350ee0
>      Call Trace:
>       <TASK>
>       ? die_addr+0x36/0x90
>       ? exc_general_protection+0x1c5/0x430
>       ? asm_exc_general_protection+0x26/0x30
>       ? btrfs_bin_search+0xd7/0x1d0 [btrfs
> 698563e3c4412867d9f65411f4b3f353931d836b]
>       btrfs_search_slot+0x458/0xd00 [btrfs
> 698563e3c4412867d9f65411f4b3f353931d836b]
>       btrfs_lookup_inode+0x55/0xe0 [btrfs
> 698563e3c4412867d9f65411f4b3f353931d836b]
>       btrfs_read_locked_inode+0x52a/0x610 [btrfs
> 698563e3c4412867d9f65411f4b3f353931d836b]
>       btrfs_iget_path+0x93/0xe0 [btrfs 698563e3c4412867d9f65411f4b3f353931d836b]
>       btrfs_lookup_dentry+0x394/0x630 [btrfs
> 698563e3c4412867d9f65411f4b3f353931d836b]
>       ? d_alloc_parallel+0x230/0x3f0
>       btrfs_lookup+0x12/0x30 [btrfs 698563e3c4412867d9f65411f4b3f353931d836b]
>       __lookup_slow+0x86/0x130
>       walk_component+0xdb/0x150
>       path_lookupat+0x6a/0x1a0
>       filename_lookup+0xe8/0x1f0
>       vfs_statx+0x9e/0x180
>       do_statx+0x66/0xb0
>       io_statx+0x27/0x40
>       io_issue_sqe+0x63/0x3c0
>       io_wq_submit_work+0x89/0x2c0
>       io_worker_handle_work+0x189/0x560
>       io_wq_worker+0x10a/0x360
>       ? srso_return_thunk+0x5/0x10
>       ? __pfx_io_wq_worker+0x10/0x10
>       ret_from_fork+0x34/0x50
>       ? __pfx_io_wq_worker+0x10/0x10
>       ret_from_fork_asm+0x1b/0x30
>       </TASK>
>      Modules linked in: xt_conntrack xt_comment veth rpcrdma rdma_cm
> iw_cm ib_cm ib_core cmac algif_hash algif_skcipher nct6775 af_alg
> nct6775_core hwmon_vid bnep lm92 nls_iso8859_1 vfat fat intel_rapl_msr
> intel_rapl_common amd64_edac edac_mce_amd snd_hda_codec_hdmi kvm_amd
> uvcvideo snd_hda_intel snd_intel_dspcfg uvc snd_intel_sdw_acpi
> snd_usb_audio gspca_vc032x snd_hda_codec gspca_main btusb
> snd_usbmidi_lib snd_hda_core kvm btrtl videobuf2_vmalloc snd_ump btbcm
> videobuf2_memops snd_rawmidi snd_hwdep btintel videobuf2_v4l2
> snd_seq_device btmtk videodev snd_pcm bluetooth mxm_wmi wmi_bmof
> videobuf2_common snd_timer irqbypass rapl snd mc ecdh_generic pcspkr
> acpi_cpufreq sp5100_tco crc16 soundcore i2c_piix4 k10temp mousedev
> joydev wmi mac_hid nfsd auth_rpcgss nfs_acl lockd usbip_host
> usbip_core pkcs8_key_parser grace i2c_dev sg sunrpc crypto_user fuse
> loop btrfs blake2b_generic xor raid6_pq dm_crypt cbc encrypted_keys
> trusted asn1_encoder tee xt_MASQUERADE xt_tcpudp xt_mark uas
> usb_storage hid_logitech_hidpp dm_mod
>       tun hid_logitech_dj usbhid crct10dif_pclmul crc32_pclmul
> polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel
> sha512_ssse3 iwlwifi igb aesni_intel nvme crypto_simd ccp cryptd
> sr_mod i2c_algo_bit nvme_core xhci_pci cdrom dca xhci_pci_renesas
> bridge nf_tables stp llc ip6table_nat ip6table_filter ip6_tables
> cfg80211 iptable_nat nf_nat nf_conntrack rfkill nf_defrag_ipv6
> nf_defrag_ipv4 libcrc32c crc32c_generic crc32c_intel iptable_filter
> nfnetlink ip_tables x_tables
>      ---[ end trace 0000000000000000 ]---
>      RIP: 0010:btrfs_bin_search+0xd7/0x1d0 [btrfs]
>      Code: c2 65 48 89 d0 25 ff 0f 00 00 48 83 c0 11 48 3d 00 10 00 00
> 0f 87 ae 00 00 00 48 89 d0 48 03 13 48 c1 e8 0c 81 e2 ff 0f 00 00 <48>
> 8b 44 c3 70 48 2b 05 35 3c c3 fa 48 c1 f8 06 48 c1 e0 0c 48 03
>      RSP: 0018:ffffbd8d5d7537c0 EFLAGS: 00010206
>      RAX: 000ffffffff9a33e RBX: ffff99872a9e6690 RCX: ffffbd8d5d753860
>      RDX: 00000000000000d1 RSI: 0000000000000000 RDI: ffff99872a9e6690
>      RBP: 0000000061c382ec R08: ffff99872a9e6690 R09: 0000083e36760000
>      R10: ffffbd8d5d753758 R11: 0000000000001000 R12: 00000000c38705d9
>      R13: 0000000000000000 R14: ffffbd8d5d75392f R15: 0000000000000021
>      FS:  00007f3d627f96c0(0000) GS:ffff99a57ecc0000(0000) knlGS:0000000000000000
>      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>      CR2: 00007feb3489f960 CR3: 000000010fcd6000 CR4: 0000000000350ee0
>      BTRFS critical (device dm-3): corrupted node, root=518
> block=17613216952440067356 owner mismatch, have 16303017448389165215
> expect [256, 18446744073709551360]
> 

