* Kernel NULL pointer deref and data corruptions with xfs on 6.1
@ 2023-07-21 10:49 Daniel Dao
  2023-07-24 11:23 ` Daniel Dao
  2023-07-27  3:27 ` Matthew Wilcox
  0 siblings, 2 replies; 10+ messages in thread
From: Daniel Dao @ 2023-07-21 10:49 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Matthew Wilcox, Dave Chinner, kernel-team, linux-kernel, djwong

Hi,

In the past, we reported some corruptions with the xfs/iomap/xarray combination
on kernel 6.1. They happen very rarely (about once a week for every 10000
hosts), and the affected hosts exhibit symptoms such as rcu_preempt
self-detected stalls, NULL pointer dereferences, or a deadlock when reading a
particular file.

We do not have a reproducer yet, but we now have more debugging data, which
should hopefully help narrow this down. Details follow:

1. Kernel NULL pointer dereferences in __filemap_get_folio

This happened on a few different hosts, with a few repeated fault addresses:
0000000000000036, 0000000000000076 and 00000000000000f6. The faulting load is
"mov 0x34(%rax),%eax" with a small value in RAX (0x2 and 0x42 in the two dumps
below), so it looks like the xarray is corrupted and we were trying to operate
on a sibling entry.
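
To make the sibling-entry reading easier to check, here is a minimal Python
sketch of the entry encoding as we understand it from include/linux/xarray.h.
It is not kernel code: classify() is a hypothetical helper written for this
mail, XA_CHUNK_SHIFT = 6 assumes CONFIG_BASE_SMALL is not set, and 0x34 is
simply the displacement seen in the trapping instruction.

```python
# Sketch only: mirrors the xarray internal-entry encoding, assuming
# XA_CHUNK_SHIFT = 6 (CONFIG_BASE_SMALL not set).
XA_CHUNK_SHIFT = 6
XA_CHUNK_SIZE = 1 << XA_CHUNK_SHIFT

# Displacement in the trapping "mov 0x34(%rax),%eax".
REFCOUNT_OFFSET = 0x34


def xa_mk_internal(value):
    """Internal entries are encoded as (value << 2) | 2."""
    return (value << 2) | 2


def xa_is_internal(entry):
    return (entry & 3) == 2


def xa_is_sibling(entry):
    """Sibling entries are internal entries below xa_mk_internal(CHUNK_SIZE - 1)."""
    return xa_is_internal(entry) and entry < xa_mk_internal(XA_CHUNK_SIZE - 1)


# These match the immediates in the "cmp $0x402" / "cmp $0x406" instructions.
XA_RETRY_ENTRY = xa_mk_internal(256)
XA_ZERO_ENTRY = xa_mk_internal(257)


def classify(fault_addr):
    """Hypothetical helper: recover the slot value from a fault address."""
    entry = fault_addr - REFCOUNT_OFFSET
    if xa_is_sibling(entry):
        return entry, "sibling of slot %d" % (entry >> 2)
    if xa_is_internal(entry):
        return entry, "other internal entry"
    return entry, "not an internal entry"


if __name__ == "__main__":
    for addr in (0x36, 0x76, 0xF6):
        entry, kind = classify(addr)
        print(hex(addr), hex(entry), kind)
```

Under these assumptions all three fault addresses decode to sibling entries
(pointing at slots 0, 16 and 48 of their node), which the lookup path is not
expected to hand back as a folio.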

    BUG: kernel NULL pointer dereference, address: 0000000000000036
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 18806c5067 P4D 18806c5067 PUD 188ed48067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 73 PID: 3579408 Comm: prometheus Tainted: G           O
6.1.34-cloudflare-2023.6.7 #1
    Hardware name: GIGABYTE R162-Z12-CD1/MZ12-HD4-CD, BIOS M03 11/19/2021
    RIP: 0010:__filemap_get_folio (arch/x86/include/asm/atomic.h:29
include/linux/atomic/atomic-arch-fallback.h:1242
include/linux/atomic/atomic-arch-fallback.h:1267
include/linux/atomic/atomic-instrumented.h:608
include/linux/page_ref.h:238 include/linux/page_ref.h:247
include/linux/page_ref.h:280 include/linux/page_ref.h:313
mm/filemap.c:1863 mm/filemap.c:1915)
    Code: 10 e8 99 ac 84 00 48 3d 06 04 00 00 49 89 c4 74 e2 48 3d 02
04 00 00 74 da 48 85 c0 0f 84 2e 02 00 00 a8 01 0f 85 e3 00 00 00 <8b>
40 34 85 c0 74 c2 8d 50 01 4d 8d 7c 24 34 f0 41 0f b1 54 24 34
    All code
    ========
      0: 10 e8                adc    %ch,%al
      2: 99                    cltd
      3: ac                    lods   %ds:(%rsi),%al
      4: 84 00                test   %al,(%rax)
      6: 48 3d 06 04 00 00    cmp    $0x406,%rax
      c: 49 89 c4              mov    %rax,%r12
      f: 74 e2                je     0xfffffffffffffff3
      11: 48 3d 02 04 00 00    cmp    $0x402,%rax
      17: 74 da                je     0xfffffffffffffff3
      19: 48 85 c0              test   %rax,%rax
      1c: 0f 84 2e 02 00 00    je     0x250
      22: a8 01                test   $0x1,%al
      24: 0f 85 e3 00 00 00    jne    0x10d
      2a:* 8b 40 34              mov    0x34(%rax),%eax <-- trapping instruction
      2d: 85 c0                test   %eax,%eax
      2f: 74 c2                je     0xfffffffffffffff3
      31: 8d 50 01              lea    0x1(%rax),%edx
      34: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
      39: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)

    Code starting with the faulting instruction
    ===========================================
      0: 8b 40 34              mov    0x34(%rax),%eax
      3: 85 c0                test   %eax,%eax
      5: 74 c2                je     0xffffffffffffffc9
      7: 8d 50 01              lea    0x1(%rax),%edx
      a: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
      f: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)
    RSP: 0000:ffffaf5587cdfc60 EFLAGS: 00010246
    RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000002
    RDX: 0000000000000008 RSI: ffffa45181fa8000 RDI: ffffaf5587cdfc70
    RBP: 0000000000000000 R08: 0000000000000402 R09: 000000000006e44f
    R10: 000000000006e450 R11: 000000000006e448 R12: 0000000000000002
    R13: ffffa3fff6fdfeb0 R14: 000000000006e44a R15: 00000000000000d1
    FS:  000000c9e385ac90(0000) GS:ffffa4153fc40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000036 CR3: 000000296a1bc002 CR4: 0000000000770ee0
    PKRU: 55555554
    Call Trace:
    <TASK>
    ? __die_body.cold (arch/x86/kernel/dumpstack.c:478
arch/x86/kernel/dumpstack.c:465 arch/x86/kernel/dumpstack.c:420)
    ? page_fault_oops (arch/x86/mm/fault.c:727)
    ? migrate_task_rq_fair (include/linux/sched.h:1921
kernel/sched/fair.c:3932 kernel/sched/fair.c:7497)
    ? do_user_addr_fault (include/linux/kprobes.h:404
include/linux/kprobes.h:597 arch/x86/mm/fault.c:1280)
    ? ttwu_queue_wakelist (kernel/sched/core.c:3880)
    ? exc_page_fault (arch/x86/include/asm/irqflags.h:40
arch/x86/include/asm/irqflags.h:75 arch/x86/mm/fault.c:1527
arch/x86/mm/fault.c:1575)
    ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570)
    ? __filemap_get_folio (arch/x86/include/asm/atomic.h:29
include/linux/atomic/atomic-arch-fallback.h:1242
include/linux/atomic/atomic-arch-fallback.h:1267
include/linux/atomic/atomic-instrumented.h:608
include/linux/page_ref.h:238 include/linux/page_ref.h:247
include/linux/page_ref.h:280 include/linux/page_ref.h:313
mm/filemap.c:1863 mm/filemap.c:1915)
    filemap_fault (mm/filemap.c:3120)
    ? preempt_count_add (include/linux/ftrace.h:950
kernel/sched/core.c:5685 kernel/sched/core.c:5682
kernel/sched/core.c:5710)
    __do_fault (mm/memory.c:4234)
    do_fault (mm/memory.c:4564 mm/memory.c:4692)
    __handle_mm_fault (mm/memory.c:4964 mm/memory.c:5106)
    handle_mm_fault (mm/memory.c:5227)
    do_user_addr_fault (include/linux/sched/signal.h:433
arch/x86/mm/fault.c:1430)
    exc_page_fault (arch/x86/include/asm/irqflags.h:40
arch/x86/include/asm/irqflags.h:75 arch/x86/mm/fault.c:1527
arch/x86/mm/fault.c:1575)
    asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570)
    RIP: 0033:0x268b8b9
    Code: 70 48 89 4c 24 78 48 8b 94 24 b8 00 00 00 0f 1f 00 48 85 d2
74 3f 48 89 ce 48 29 d9 4c 8d 49 04 49 f7 d9 49 c1 f9 3f 49 21 f9 <46>
8b 0c 08 44 89 4c 24 34 90 90 48 89 d3 48 89 c1 41 b8 01 00 00
    All code
    ========
      0: 70 48                jo     0x4a
      2: 89 4c 24 78          mov    %ecx,0x78(%rsp)
      6: 48 8b 94 24 b8 00 00 mov    0xb8(%rsp),%rdx
      d: 00
      e: 0f 1f 00              nopl   (%rax)
      11: 48 85 d2              test   %rdx,%rdx
      14: 74 3f                je     0x55
      16: 48 89 ce              mov    %rcx,%rsi
      19: 48 29 d9              sub    %rbx,%rcx
      1c: 4c 8d 49 04          lea    0x4(%rcx),%r9
      20: 49 f7 d9              neg    %r9
      23: 49 c1 f9 3f          sar    $0x3f,%r9
      27: 49 21 f9              and    %rdi,%r9
      2a:* 46 8b 0c 08          mov    (%rax,%r9,1),%r9d <-- trapping instruction
      2e: 44 89 4c 24 34        mov    %r9d,0x34(%rsp)
      33: 90                    nop
      34: 90                    nop
      35: 48 89 d3              mov    %rdx,%rbx
      38: 48 89 c1              mov    %rax,%rcx
      3b: 41                    rex.B
      3c: b8                    .byte 0xb8
      3d: 01 00                add    %eax,(%rax)
      ...

    Code starting with the faulting instruction
    ===========================================
      0: 46 8b 0c 08          mov    (%rax,%r9,1),%r9d
      4: 44 89 4c 24 34        mov    %r9d,0x34(%rsp)
      9: 90                    nop
      a: 90                    nop
      b: 48 89 d3              mov    %rdx,%rbx
      e: 48 89 c1              mov    %rax,%rcx
      11: 41                    rex.B
      12: b8                    .byte 0xb8
      13: 01 00                add    %eax,(%rax)
      ...
    RSP: 002b:000000cbc509f520 EFLAGS: 00010202
    RAX: 00007e81cf427e0c RBX: 00000000000222cc RCX: 00000000123817b2
    RDX: 000000c00001ac00 RSI: 00000000123a3a7e RDI: 00000000000222c8
    RBP: 000000cbc509f5b0 R08: 0000000003cb5910 R09: 00000000000222c8
    R10: 000000c4de3dea00 R11: 0000000000000123 R12: 0000000000000000
    R13: 0000000000000005 R14: 000000c83bad2340 R15: 0000010000000000
    </TASK>
    Modules linked in: xt_connlabel xt_MASQUERADE nf_conntrack_netlink
xfrm_user xfrm_algo xt_addrtype br_netfilter bridge overlay zstd
zstd_compress zram zsmalloc tun tcp_diag inet_diag raid0 md_mod essiv
dm_crypt trusted asn1_encoder tee ip6table_filter ip6table_mangle
ip6table_raw ip6table_security ip6table_nat ip6_tables xt_bpf
xt_conntrack xt_multiport xt_set iptable_filter xt_NFLOG nfnetlink_log
xt_connbytes xt_comment xt_connmark xt_statistic iptable_mangle xt_nat
xt_tcpudp iptable_nat nf_nat xt_CT iptable_raw ip_set_hash_ip
ip_set_hash_net ip_set nfnetlink sch_fq nf_conntrack nf_defrag_ipv6
nf_defrag_ipv4 8021q garp mrp stp llc bonding nvme_fabrics amd64_edac
kvm_amd ipmi_ssif kvm irqbypass crc32_pclmul crc32c_intel sha512_ssse3
acpi_ipmi mlx5_core aesni_intel ipmi_si mlxfw rapl xhci_pci nvme tls
ipmi_devintf tiny_power_button psample nvme_core xhci_hcd i2c_piix4
ccp ipmi_msghandler button fuse dm_mod dax efivarfs ip_tables x_tables
bcmcrypt(O)
    crypto_simd cryptd
    CR2: 0000000000000036
    ---[ end trace 0000000000000000 ]---
    RIP: 0010:__filemap_get_folio (arch/x86/include/asm/atomic.h:29
include/linux/atomic/atomic-arch-fallback.h:1242
include/linux/atomic/atomic-arch-fallback.h:1267
include/linux/atomic/atomic-instrumented.h:608
include/linux/page_ref.h:238 include/linux/page_ref.h:247
include/linux/page_ref.h:280 include/linux/page_ref.h:313
mm/filemap.c:1863 mm/filemap.c:1915)
    Code: 10 e8 99 ac 84 00 48 3d 06 04 00 00 49 89 c4 74 e2 48 3d 02
04 00 00 74 da 48 85 c0 0f 84 2e 02 00 00 a8 01 0f 85 e3 00 00 00 <8b>
40 34 85 c0 74 c2 8d 50 01 4d 8d 7c 24 34 f0 41 0f b1 54 24 34
    All code
    ========
      0: 10 e8                adc    %ch,%al
      2: 99                    cltd
      3: ac                    lods   %ds:(%rsi),%al
      4: 84 00                test   %al,(%rax)
      6: 48 3d 06 04 00 00    cmp    $0x406,%rax
      c: 49 89 c4              mov    %rax,%r12
      f: 74 e2                je     0xfffffffffffffff3
      11: 48 3d 02 04 00 00    cmp    $0x402,%rax
      17: 74 da                je     0xfffffffffffffff3
      19: 48 85 c0              test   %rax,%rax
      1c: 0f 84 2e 02 00 00    je     0x250
      22: a8 01                test   $0x1,%al
      24: 0f 85 e3 00 00 00    jne    0x10d
      2a:* 8b 40 34              mov    0x34(%rax),%eax <-- trapping instruction
      2d: 85 c0                test   %eax,%eax
      2f: 74 c2                je     0xfffffffffffffff3
      31: 8d 50 01              lea    0x1(%rax),%edx
      34: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
      39: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)

    Code starting with the faulting instruction
    ===========================================
      0: 8b 40 34              mov    0x34(%rax),%eax
      3: 85 c0                test   %eax,%eax
      5: 74 c2                je     0xffffffffffffffc9
      7: 8d 50 01              lea    0x1(%rax),%edx
      a: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
      f: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)
    RSP: 0000:ffffaf5587cdfc60 EFLAGS: 00010246
    RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000002
    RDX: 0000000000000008 RSI: ffffa45181fa8000 RDI: ffffaf5587cdfc70
    RBP: 0000000000000000 R08: 0000000000000402 R09: 000000000006e44f
    R10: 000000000006e450 R11: 000000000006e448 R12: 0000000000000002
    R13: ffffa3fff6fdfeb0 R14: 000000000006e44a R15: 00000000000000d1
    FS:  000000c9e385ac90(0000) GS:ffffa4153fc40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000036 CR3: 000000296a1bc002 CR4: 0000000000770ee0
    PKRU: 55555554

    BUG: kernel NULL pointer dereference, address: 0000000000000076
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 7acd78067 P4D 7acd78067 PUD 7acd79067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 93 PID: 3784417 Comm: prometheus Tainted: G           O
6.1.20-cloudflare-2023.3.18 #1
    Hardware name: GIGABYTE R162-Z13-CD/MZ12-HD2-CD, BIOS R13 07/17/2020
    RIP: 0010:__filemap_get_folio (arch/x86/include/asm/atomic.h:29
include/linux/atomic/atomic-arch-fallback.h:1242
include/linux/atomic/atomic-arch-fallback.h:1267
include/linux/atomic/atomic-instrumented.h:608
include/linux/page_ref.h:238 include/linux/page_ref.h:247
include/linux/page_ref.h:280 include/linux/page_ref.h:313
mm/filemap.c:1863 mm/filemap.c:1915)
    Code: 10 e8 b9 a4 84 00 48 3d 06 04 00 00 49 89 c4 74 e2 48 3d 02
04 00 00 74 da 48 85 c0 0f 84 2e 02 00 00 a8 01 0f 85 e3 00 00 00 <8b>
40 34 85 c0 74 c2 8d 50 01 4d 8d 7c 24 34 f0 41 0f b1 54 24 34
    All code
    ========
       0: 10 e8                adc    %ch,%al
       2: b9 a4 84 00 48        mov    $0x480084a4,%ecx
       7: 3d 06 04 00 00        cmp    $0x406,%eax
       c: 49 89 c4              mov    %rax,%r12
       f: 74 e2                je     0xfffffffffffffff3
      11: 48 3d 02 04 00 00    cmp    $0x402,%rax
      17: 74 da                je     0xfffffffffffffff3
      19: 48 85 c0              test   %rax,%rax
      1c: 0f 84 2e 02 00 00    je     0x250
      22: a8 01                test   $0x1,%al
      24: 0f 85 e3 00 00 00    jne    0x10d
      2a:* 8b 40 34              mov    0x34(%rax),%eax <-- trapping instruction
      2d: 85 c0                test   %eax,%eax
      2f: 74 c2                je     0xfffffffffffffff3
      31: 8d 50 01              lea    0x1(%rax),%edx
      34: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
      39: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)

    Code starting with the faulting instruction
    ===========================================
       0: 8b 40 34              mov    0x34(%rax),%eax
       3: 85 c0                test   %eax,%eax
       5: 74 c2                je     0xffffffffffffffc9
       7: 8d 50 01              lea    0x1(%rax),%edx
       a: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
       f: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)
    RSP: 0000:ffffb15106683c60 EFLAGS: 00010246
    RAX: 0000000000000042 RBX: 0000000000000000 RCX: 0000000000000002
    RDX: 0000000000000018 RSI: ffff934b0029efc8 RDI: ffffb15106683c70
    RBP: 0000000000000000 R08: 0000000000000402 R09: 00000000000cbe5f
    R10: 00000000000cbe60 R11: 00000000000cbe5c R12: 0000000000000042
    R13: ffff93449c251eb0 R14: 00000000000cbe59 R15: 00000000000000d1
    FS:  000000c000300090(0000) GS:ffff937e6ed40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000076 CR3: 0000000a6528e000 CR4: 0000000000350ee0
    Call Trace:
     <TASK>
    filemap_fault (mm/filemap.c:3120)
    ? preempt_count_add (include/linux/ftrace.h:950
kernel/sched/core.c:5685 kernel/sched/core.c:5682
kernel/sched/core.c:5710)
    __do_fault (mm/memory.c:4234)
    do_fault (mm/memory.c:4564 mm/memory.c:4692)
    __handle_mm_fault (mm/memory.c:4964 mm/memory.c:5106)
    handle_mm_fault (mm/memory.c:5227)
    do_user_addr_fault (include/linux/sched/signal.h:433
arch/x86/mm/fault.c:1430)
    exc_page_fault (arch/x86/include/asm/irqflags.h:40
arch/x86/include/asm/irqflags.h:75 arch/x86/mm/fault.c:1527
arch/x86/mm/fault.c:1575)
    asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570)
    RIP: 0033:0x268b8b9
    Code: 70 48 89 4c 24 78 48 8b 94 24 b8 00 00 00 0f 1f 00 48 85 d2
74 3f 48 89 ce 48 29 d9 4c 8d 49 04 49 f7 d9 49 c1 f9 3f 49 21 f9 <46>
8b 0c 08 44 89 4c 24 34 90 90 48 89 d3 48 89 c1 41 b8 01 00 00
    All code
    ========
       0: 70 48                jo     0x4a
       2: 89 4c 24 78          mov    %ecx,0x78(%rsp)
       6: 48 8b 94 24 b8 00 00 mov    0xb8(%rsp),%rdx
       d: 00
       e: 0f 1f 00              nopl   (%rax)
      11: 48 85 d2              test   %rdx,%rdx
      14: 74 3f                je     0x55
      16: 48 89 ce              mov    %rcx,%rsi
      19: 48 29 d9              sub    %rbx,%rcx
      1c: 4c 8d 49 04          lea    0x4(%rcx),%r9
      20: 49 f7 d9              neg    %r9
      23: 49 c1 f9 3f          sar    $0x3f,%r9
      27: 49 21 f9              and    %rdi,%r9
      2a:* 46 8b 0c 08          mov    (%rax,%r9,1),%r9d <-- trapping instruction
      2e: 44 89 4c 24 34        mov    %r9d,0x34(%rsp)
      33: 90                    nop
      34: 90                    nop
      35: 48 89 d3              mov    %rdx,%rbx
      38: 48 89 c1              mov    %rax,%rcx
      3b: 41                    rex.B
      3c: b8                    .byte 0xb8
      3d: 01 00                add    %eax,(%rax)
    ...

    Code starting with the faulting instruction
    ===========================================
       0: 46 8b 0c 08          mov    (%rax,%r9,1),%r9d
       4: 44 89 4c 24 34        mov    %r9d,0x34(%rsp)
       9: 90                    nop
       a: 90                    nop
       b: 48 89 d3              mov    %rdx,%rbx
       e: 48 89 c1              mov    %rax,%rcx
      11: 41                    rex.B
      12: b8                    .byte 0xb8
      13: 01 00                add    %eax,(%rax)
    ...
    RSP: 002b:000000d735bb3558 EFLAGS: 00010206
    RAX: 00007c018402dad8 RBX: 000000000002c3d8 RCX: 0000000037f9be1c
    RDX: 000000c000222c00 RSI: 0000000037fc81f4 RDI: 000000000002c3d4
    RBP: 000000d735bb35e8 R08: 0000000003cb5910 R09: 000000000002c3d4
    R10: 000000c385d2a000 R11: 0000000000000021 R12: 0000000000000000
    R13: 000000000000000b R14: 000000d1bb70e340 R15: 0000000001000000
     </TASK>
    Modules linked in: veth xt_MASQUERADE nf_conntrack_netlink
xfrm_user xfrm_algo xt_addrtype br_netfilter bridge overlay raid1
md_mod essiv dm_crypt trusted tee asn1_encoder xt_hl ip6table_filter
ip6table_mangle ip6table_raw ip6table_security ip6table_nat ip6_tables
xt_tcpudp xt_conntrack xt_comment xt_multiport xt_set iptable_filter
iptable_mangle iptable_nat nf_nat xt_CT iptable_raw ip_set_hash_ip
ip_set_hash_net ip_set nfnetlink tcp_bbr sch_fq nf_conntrack
nf_defrag_ipv6 nf_defrag_ipv4 8021q mrp garp stp llc bonding
amd64_edac kvm_amd ipmi_ssif kvm irqbypass crc32_pclmul crc32c_intel
mlx5_core sha512_ssse3 psample acpi_ipmi aesni_intel xhci_pci nvme
ipmi_si rapl tls ipmi_devintf tiny_power_button nvme_core mlxfw
xhci_hcd i2c_piix4 ccp ipmi_msghandler button fuse dm_mod dax efivarfs
ip_tables x_tables bcmcrypt(O) crypto_simd cryptd
    CR2: 0000000000000076
    ---[ end trace 0000000000000000 ]---
    RIP: 0010:__filemap_get_folio (arch/x86/include/asm/atomic.h:29
include/linux/atomic/atomic-arch-fallback.h:1242
include/linux/atomic/atomic-arch-fallback.h:1267
include/linux/atomic/atomic-instrumented.h:608
include/linux/page_ref.h:238 include/linux/page_ref.h:247
include/linux/page_ref.h:280 include/linux/page_ref.h:313
mm/filemap.c:1863 mm/filemap.c:1915)
    Code: 10 e8 b9 a4 84 00 48 3d 06 04 00 00 49 89 c4 74 e2 48 3d 02
04 00 00 74 da 48 85 c0 0f 84 2e 02 00 00 a8 01 0f 85 e3 00 00 00 <8b>
40 34 85 c0 74 c2 8d 50 01 4d 8d 7c 24 34 f0 41 0f b1 54 24 34
    All code
    ========
       0: 10 e8                adc    %ch,%al
       2: b9 a4 84 00 48        mov    $0x480084a4,%ecx
       7: 3d 06 04 00 00        cmp    $0x406,%eax
       c: 49 89 c4              mov    %rax,%r12
       f: 74 e2                je     0xfffffffffffffff3
      11: 48 3d 02 04 00 00    cmp    $0x402,%rax
      17: 74 da                je     0xfffffffffffffff3
      19: 48 85 c0              test   %rax,%rax
      1c: 0f 84 2e 02 00 00    je     0x250
      22: a8 01                test   $0x1,%al
      24: 0f 85 e3 00 00 00    jne    0x10d
      2a:* 8b 40 34              mov    0x34(%rax),%eax <-- trapping instruction
      2d: 85 c0                test   %eax,%eax
      2f: 74 c2                je     0xfffffffffffffff3
      31: 8d 50 01              lea    0x1(%rax),%edx
      34: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
      39: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)

    Code starting with the faulting instruction
    ===========================================
       0: 8b 40 34              mov    0x34(%rax),%eax
       3: 85 c0                test   %eax,%eax
       5: 74 c2                je     0xffffffffffffffc9
       7: 8d 50 01              lea    0x1(%rax),%edx
       a: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
       f: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)
    RSP: 0000:ffffb15106683c60 EFLAGS: 00010246
    RAX: 0000000000000042 RBX: 0000000000000000 RCX: 0000000000000002
    RDX: 0000000000000018 RSI: ffff934b0029efc8 RDI: ffffb15106683c70
    RBP: 0000000000000000 R08: 0000000000000402 R09: 00000000000cbe5f
    R10: 00000000000cbe60 R11: 00000000000cbe5c R12: 0000000000000042
    R13: ffff93449c251eb0 R14: 00000000000cbe59 R15: 00000000000000d1
    FS:  000000c000300090(0000) GS:ffff937e6ed40000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000076 CR3: 0000000a6528e000 CR4: 0000000000350ee0
    note: prometheus[3784417] exited with irqs disabled

2. Kernel NULL pointer dereferences in xfs_read_iomap_begin

    BUG: unable to handle page fault for address: 0000000000034668
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 11cfd37067 P4D 11cfd37067 PUD b88086067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 124 PID: 3831226 Comm: rocksdb:low Kdump: loaded Tainted: G
     W  O L     6.1.27-cloudflare-2023.5.0 #1
    Hardware name: HYVE EDGE-METAL-GEN11/HS1811D_Lite, BIOS V0.11-sig 12/23/2022
    RIP: 0010:xfs_read_iomap_begin (fs/xfs/xfs_iomap.c:1200)
    Code: 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 50 48
89 14 24 4c 89 44 24 08 65 48 8b 04 25 28 00 00 00 48 89 44 24 48 <48>
8b 87 >
    All code
    ========
      0:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
      5:   41 57                   push   %r15
      7:   41 56                   push   %r14
      9:   41 55                   push   %r13
      b:   41 54                   push   %r12
      d:   55                      push   %rbp
      e:   53                      push   %rbx
      f:   48 83 ec 50             sub    $0x50,%rsp
      13:   48 89 14 24             mov    %rdx,(%rsp)
      17:   4c 89 44 24 08          mov    %r8,0x8(%rsp)
      1c:   65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
      23:   00 00
      25:   48 89 44 24 48          mov    %rax,0x48(%rsp)
      2a:*  48                      rex.W           <-- trapping instruction
      2b:   8b                      .byte 0x8b
      2c:   87 00                   xchg   %eax,(%rax)

    Code starting with the faulting instruction
    ===========================================
      0:   48                      rex.W
      1:   8b                      .byte 0x8b
      2:   87 00                   xchg   %eax,(%rax)
    RSP: 0018:ffffa63810733a70 EFLAGS: 00010282
    RAX: 78ac714f0997e100 RBX: ffffa63810733b40 RCX: 0000000000000000
    RDX: 0000000000004000 RSI: 0000000000000000 RDI: 00000000000347a0
    RBP: ffffffff8664d950 R08: ffffa63810733b68 R09: ffffa63810733bb0
    R10: 000000000001f627 R11: 0000000000000000 R12: ffffa63810733b68
    R13: ffffa63810733bb0 R14: 00000000000019c1 R15: 00000000fffffff5
    FS:  00007f48d8504700(0000) GS:ffffa2fe5ef00000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000034668 CR3: 00000013037ec001 CR4: 0000000000770ee0
    PKRU: 55555554
    Call Trace:
    <TASK>
    ? __mod_memcg_lruvec_state (mm/memcontrol.c:613 mm/memcontrol.c:799)
    iomap_iter (fs/iomap/iter.c:76)
    iomap_read_folio (fs/iomap/buffered-io.c:342)
    ? xfs_end_bio (fs/xfs/xfs_aops.c:542)
    filemap_read_folio (mm/filemap.c:2407)
    filemap_get_pages (mm/filemap.c:2492 mm/filemap.c:2606)
    filemap_read (mm/filemap.c:2677)
    xfs_file_buffered_read (fs/xfs/xfs_file.c:278)
    xfs_file_read_iter (fs/xfs/xfs_file.c:304)
    vfs_read (fs/read_write.c:390 fs/read_write.c:470)
    __x64_sys_pread64 (include/linux/file.h:44 fs/read_write.c:666
fs/read_write.c:675 fs/read_write.c:672 fs/read_write.c:672)
    do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
    entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
    RIP: 0033:0x7f49061ca917
    Code: 08 89 3c 24 48 89 4c 24 18 e8 05 f4 ff ff 4c 8b 54 24 18 48
8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 11 00 00 00 0f 05 <48>
3d 00 >
    All code
    ========
      0:   08 89 3c 24 48 89       or     %cl,-0x76b7dbc4(%rcx)
      6:   4c 24 18                rex.WR and $0x18,%al
      9:   e8 05 f4 ff ff          call   0xfffffffffffff413
      e:   4c 8b 54 24 18          mov    0x18(%rsp),%r10
      13:   48 8b 54 24 10          mov    0x10(%rsp),%rdx
      18:   41 89 c0                mov    %eax,%r8d
      1b:   48 8b 74 24 08          mov    0x8(%rsp),%rsi
      20:   8b 3c 24                mov    (%rsp),%edi
      23:   b8 11 00 00 00          mov    $0x11,%eax
      28:   0f 05                   syscall
      2a:*  48                      rex.W           <-- trapping instruction
      2b:   3d                      .byte 0x3d
            ...

    Code starting with the faulting instruction
    ===========================================
      0:   48                      rex.W
      1:   3d                      .byte 0x3d
            ...
    RSP: 002b:00007f48d84ffc70 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
    RAX: ffffffffffffffda RBX: 00000000018a0c90 RCX: 00007f49061ca917
    RDX: 00000000000c294f RSI: 000000002265e000 RDI: 000000000000003c
    RBP: 00007f48d84ffda0 R08: 0000000000000000 R09: 00007f48d84ffe60
    R10: 000000000191dcd8 R11: 0000000000000293 R12: 0000000007c3c6c0
    R13: 00000000000c294f R14: 00000000000c294f R15: 000000000191dcd8
    </TASK>
    Modules linked in: xt_connlabel overlay nft_compat esp4
xt_hashlimit ip_set_hash_netport xt_length nf_conntrack_netlink
mpls_gso mpls_iptunnel >
    tcp_bbr sch_fq nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 8021q
garp mrp stp llc ipmi_ssif amd64_edac kvm_amd kvm irqbypass
crc32_pclmul crc32>
    CR2: 0000000000034668
    ---[ end trace 0000000000000000 ]---

We also see a deadlock when reading one specific file on this host. We managed
to take a kdump on this host and extracted the state of the mapping.
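
The folio flags value in the dump that follows (13510764522438785) can also be
decoded by hand. The sketch below is a stand-in for the decode_page_flags
helper used in the session, not that helper itself: the bit order is an
assumption taken from enum pageflags in a 6.1 include/linux/page-flags.h, only
the first 17 bits are named, and the zone/node bits at the top of the word are
ignored.

```python
# Assumed bit order of the low page flags on 6.1 (enum pageflags);
# other kernel versions and configs may differ.
PAGE_FLAG_BITS = [
    "PG_locked",        # bit 0
    "PG_referenced",
    "PG_uptodate",
    "PG_dirty",
    "PG_lru",
    "PG_active",
    "PG_workingset",
    "PG_waiters",       # bit 7
    "PG_error",
    "PG_slab",
    "PG_owner_priv_1",
    "PG_arch_1",
    "PG_reserved",
    "PG_private",
    "PG_private_2",
    "PG_writeback",
    "PG_head",          # bit 16
]


def decode_low_page_flags(flags):
    """Return the names of the set low flag bits, joined with '|'."""
    names = [
        name
        for bit, name in enumerate(PAGE_FLAG_BITS)
        if flags & (1 << bit)
    ]
    return "|".join(names)


if __name__ == "__main__":
    print(decode_low_page_flags(13510764522438785))
```

With that table the value decodes to PG_locked|PG_waiters|PG_head, in line
with the helper's output: the folio is locked and has waiters, but none of
PG_uptodate, PG_lru or PG_writeback is set.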


    >>> trace
    #0  context_switch (/cfsetup_build/build/linux/kernel/sched/core.c:5241:2)
    #1  __schedule (/cfsetup_build/build/linux/kernel/sched/core.c:6554:8)
    #2  schedule (/cfsetup_build/build/linux/kernel/sched/core.c:6630:3)
    #3  io_schedule (/cfsetup_build/build/linux/kernel/sched/core.c:8774:2)
    #4  folio_wait_bit_common (/cfsetup_build/build/linux/mm/filemap.c:1296:4)
    #5  folio_put_wait_locked (/cfsetup_build/build/linux/mm/filemap.c:1465:9)
    #6  filemap_update_page (/cfsetup_build/build/linux/mm/filemap.c:2472:4)
    #7  filemap_get_pages (/cfsetup_build/build/linux/mm/filemap.c:2606:9)
    #8  filemap_read (/cfsetup_build/build/linux/mm/filemap.c:2676:11)
    #9  xfs_file_buffered_read (/cfsetup_build/build/linux/fs/xfs/xfs_file.c:277:8)
    #10 xfs_file_read_iter (/cfsetup_build/build/linux/fs/xfs/xfs_file.c:302:9)
    #11 call_read_iter (/cfsetup_build/build/linux/include/linux/fs.h:2199:9)
    #12 new_sync_read (/cfsetup_build/build/linux/fs/read_write.c:389:8)
    #13 vfs_read (/cfsetup_build/build/linux/fs/read_write.c:470:9)
    #14 ksys_read (/cfsetup_build/build/linux/fs/read_write.c:613:9)
    #15 do_syscall_x64 (/cfsetup_build/build/linux/arch/x86/entry/common.c:50:14)
    #16 do_syscall_64 (/cfsetup_build/build/linux/arch/x86/entry/common.c:80:7)
    #17 entry_SYSCALL_64+0x83/0x164 (/cfsetup_build/build/linux/arch/x86/entry/entry_64.S:120)
    #18 0x7f05f0b093ce
    >>> folio = trace[6]['folio']
    >>> decode_page_flags(folio)
    'PG_locked|PG_waiters|PG_head'
    >>> folio
    *(struct folio *)0xffffd67406346000 = {
            .flags = (unsigned long)13510764522438785,
            .lru = (struct list_head){
                    .next = (struct list_head *)0xdead000000000100,
                    .prev = (struct list_head *)0xdead000000000122,
            },
            .__filler = (void *)0xdead000000000100,
            .mlock_count = (unsigned int)290,
            .mapping = (struct address_space *)0x0,
            .index = (unsigned long)18446641474676726016,
            .private = (void *)0x400000,
            ._mapcount = (atomic_t){
                    .counter = (int)-1,
            },
            ._refcount = (atomic_t){
                    .counter = (int)1,
            },
            .memcg_data = (unsigned long)0,
            .page = (struct page){
                    .flags = (unsigned long)13510764522438785,
                    .lru = (struct list_head){
                            .next = (struct list_head *)0xdead000000000100,
                            .prev = (struct list_head *)0xdead000000000122,
                    },
                    .__filler = (void *)0xdead000000000100,
                    .mlock_count = (unsigned int)290,
                    .buddy_list = (struct list_head){
                            .next = (struct list_head *)0xdead000000000100,
                            .prev = (struct list_head *)0xdead000000000122,
                    },
                    .pcp_list = (struct list_head){
                            .next = (struct list_head *)0xdead000000000100,
                            .prev = (struct list_head *)0xdead000000000122,
                    },
                    .mapping = (struct address_space *)0x0,
                    .index = (unsigned long)18446641474676726016,
                    .private = (unsigned long)4194304,
                    .pp_magic = (unsigned long)16045481047390945536,
                    .pp = (struct page_pool *)0xdead000000000122,
                    ._pp_mapping_pad = (unsigned long)0,
                    .dma_addr = (unsigned long)18446641474676726016,
                    .dma_addr_upper = (unsigned long)4194304,
                    .pp_frag_count = (atomic_long_t){
                            .counter = (s64)4194304,
                    },
                    .compound_head = (unsigned long)16045481047390945536,
                    .compound_dtor = (unsigned char)34,
                    .compound_order = (unsigned char)1,
                    .compound_mapcount = (atomic_t){
                            .counter = (int)-559087616,
                    },
                    .compound_pincount = (atomic_t){
                            .counter = (int)0,
                    },
                    .compound_nr = (unsigned int)0,
                    ._compound_pad_1 = (unsigned long)16045481047390945536,
                    ._compound_pad_2 = (unsigned long)16045481047390945570,
                    .deferred_list = (struct list_head){
                            .next = (struct list_head *)0x0,
                            .prev = (struct list_head *)0xffffa2afcd181900,
                    },
                    ._pt_pad_1 = (unsigned long)16045481047390945536,
                    .pmd_huge_pte = (pgtable_t)0xdead000000000122,
                    ._pt_pad_2 = (unsigned long)0,
                    .pt_mm = (struct mm_struct *)0xffffa2afcd181900,
                    .pt_frag_refcount = (atomic_t){
                            .counter = (int)-854058752,
                    },
                    .ptl = (spinlock_t){
                            .rlock = (struct raw_spinlock){
                                    .raw_lock = (arch_spinlock_t){
                                            .val = (atomic_t){
                                                    .counter = (int)4194304,
                                            },
                                            .locked = (u8)0,
                                            .pending = (u8)0,
                                            .locked_pending = (u16)0,
                                            .tail = (u16)64,
                                    },
                            },
                    },
                    .pgmap = (struct dev_pagemap *)0xdead000000000100,
                    .zone_device_data = (void *)0xdead000000000122,
                    .callback_head = (struct callback_head){
                            .next = (struct callback_head *)0xdead000000000100,
                            .func = (void (*)(struct callback_head *))0xdead000000000122,
                    },
                    ._mapcount = (atomic_t){
                            .counter = (int)-1,
                    },
                    .page_type = (unsigned int)4294967295,
                    ._refcount = (atomic_t){
                            .counter = (int)1,
                    },
                    .memcg_data = (unsigned long)0,
            },
            ._flags_1 = (unsigned long)13510764522373120,
            .__head = (unsigned long)18446698392541487105,
            ._folio_dtor = (unsigned char)1,
            ._folio_order = (unsigned char)2,
            ._total_mapcount = (atomic_t){
                    .counter = (int)-1,
            },
            ._pincount = (atomic_t){
                    .counter = (int)0,
            },
            ._folio_nr_pages = (unsigned int)4,
    }
    >>> for index, entry in xa_for_each(trace[7]['mapping'].i_pages.address_of_()):
            print(index, entry, cast('struct folio *', entry).page.mapping.address_of_())
    ....
    6464 (void *)0xffffd674c130a000 *(struct address_space **)0xffffd674c130a018 = 0xffffa2b30e93b2b0
    6528 (void *)0xffffd674beb22000 *(struct address_space **)0xffffd674beb22018 = 0xffffa2b30e93b2b0
    6592 (void *)0xffffd67406346000 *(struct address_space **)0xffffd67406346018 = 0x0 <===== our folio
    6624 (void *)0x7037e8d8000100d (struct address_space **)0x7037e8d80001025
    6625 (void *)0x7037e047000100d (struct address_space **)0x7037e0470001025
    ....

This looks like the xarray is corrupted: for some reason the mapping contains a
locked folio whose page has no mapping.
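
As a rough cross-check of the sibling-entry hypothesis, the sketch below
classifies raw xarray slot values using the tagging scheme from
include/linux/xarray.h (bit 0 set is a value entry, low bits 10 mark an internal
entry, and small internal entries are siblings). It assumes XA_CHUNK_SIZE is 64
(CONFIG_BASE_SMALL unset) and that each faulting address is the slot value plus
0x34, the displacement in the trapping instruction mov 0x34(%rax),%eax. The
helper name classify_xa_entry is made up for illustration; this is plain Python,
not a drgn API.

```python
# Assumption: CONFIG_BASE_SMALL is unset, so XA_CHUNK_SHIFT is 6.
XA_CHUNK_SIZE = 64
XA_RETRY_ENTRY = (256 << 2) | 2   # 0x402, compared against in the oops code
XA_ZERO_ENTRY = (257 << 2) | 2    # 0x406, compared against in the oops code
# Assumption: displacement of the trapping load, mov 0x34(%rax),%eax.
REFCOUNT_OFFSET = 0x34


def classify_xa_entry(entry: int) -> str:
    """Classify a raw xarray slot value by its low tag bits."""
    if entry == 0:
        return "empty"
    if entry & 1:
        return "value"                      # xa_is_value()
    if (entry & 3) == 2:                    # xa_is_internal()
        if entry == XA_RETRY_ENTRY:
            return "retry"
        if entry == XA_ZERO_ENTRY:
            return "zero"
        if entry < (((XA_CHUNK_SIZE - 1) << 2) | 2):
            return f"sibling of slot {entry >> 2}"   # xa_is_sibling()
        if entry > 4096:
            return "node"                   # xa_is_node()
        return "internal"
    return "pointer"


# Fault addresses reported in the oopses, minus the load displacement.
for fault_addr in (0x36, 0x76, 0xF6):
    entry = fault_addr - REFCOUNT_OFFSET
    print(hex(fault_addr), hex(entry), classify_xa_entry(entry))

# Slot values copied from the xa_for_each() output above.
for entry in (0xFFFFD67406346000, 0x7037E8D8000100D, 0x7037E047000100D):
    print(hex(entry), classify_xa_entry(entry))
```

Under these assumptions the fault addresses 0x36, 0x76 and 0xf6 correspond to
sibling entries for slots 0, 16 and 48, which matches RAX = 0x2 and RAX = 0x42 in
the two register dumps. The entries at indices 6624 and 6625 have bit 0 set, so
they decode as value entries rather than folio pointers.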

Any suggestions for narrowing this down to a hypothesis we can try to reproduce,
or for potential fixes, are very much appreciated. We are also trying different
kernel configurations on different sets of hosts to see whether the problem goes
away for them, such as:
- 6.1.36 without "xfs: Support large folios"
  (6795801366da0cd3d99e27c37f020a8f16714886)
- 6.1.36 without THP
- 6.1.37 with the series "xfs, iomap: fix data corruption due to stale cached
  iomaps" backported:
  https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/

Best,
Daniel.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1
  2023-07-21 10:49 Kernel NULL pointer deref and data corruptions with xfs on 6.1 Daniel Dao
@ 2023-07-24 11:23 ` Daniel Dao
  2023-07-24 21:45   ` Dave Chinner
  2023-07-27  3:27 ` Matthew Wilcox
  1 sibling, 1 reply; 10+ messages in thread
From: Daniel Dao @ 2023-07-24 11:23 UTC (permalink / raw)
  To: linux-fsdevel
  Cc: Matthew Wilcox, Dave Chinner, kernel-team, linux-kernel, djwong

Hi again,

We had another example of xarray corruption involving xfs and zsmalloc. We are
running zram as swap. Two tasks are deadlocked, waiting for a page to be released.

The following backtrace is from the zsmalloc task:
  #0  context_switch (/cfsetup_build/build/linux/kernel/sched/core.c:5241:2)
  #1  __schedule (/cfsetup_build/build/linux/kernel/sched/core.c:6554:8)
  #2  schedule (/cfsetup_build/build/linux/kernel/sched/core.c:6630:3)
  #3  io_schedule (/cfsetup_build/build/linux/kernel/sched/core.c:8774:2)
  #4  folio_wait_bit_common (/cfsetup_build/build/linux/mm/filemap.c:1296:4)
  #5  folio_wait_locked (/cfsetup_build/build/linux/include/linux/pagemap.h:1022:3)
  #6  wait_on_page_locked (/cfsetup_build/build/linux/include/linux/pagemap.h:1034:2)
  #7  lock_zspage (/cfsetup_build/build/linux/mm/zsmalloc.c:1736:3)
  #8  async_free_zspage (/cfsetup_build/build/linux/mm/zsmalloc.c:1974:3)
  #9  process_one_work (/cfsetup_build/build/linux/kernel/workqueue.c:2289:2)
  #10 worker_thread (/cfsetup_build/build/linux/kernel/workqueue.c:2436:4)
  #11 kthread (/cfsetup_build/build/linux/kernel/kthread.c:376:9)
  #12 ret_from_fork+0x22/0x2d (/cfsetup_build/build/linux/arch/x86/entry/entry_64.S:306)

The following backtrace is from a userspace task:
  #0  context_switch (/cfsetup_build/build/linux/kernel/sched/core.c:5241:2)
  #1  __schedule (/cfsetup_build/build/linux/kernel/sched/core.c:6554:8)
  #2  schedule (/cfsetup_build/build/linux/kernel/sched/core.c:6630:3)
  #3  io_schedule (/cfsetup_build/build/linux/kernel/sched/core.c:8774:2)
  #4  folio_wait_bit_common (/cfsetup_build/build/linux/mm/filemap.c:1296:4)
  #5  folio_put_wait_locked (/cfsetup_build/build/linux/mm/filemap.c:1465:9)
  #6  filemap_update_page (/cfsetup_build/build/linux/mm/filemap.c:2472:4)
  #7  filemap_get_pages (/cfsetup_build/build/linux/mm/filemap.c:2606:9)
  #8  filemap_read (/cfsetup_build/build/linux/mm/filemap.c:2676:11)
  #9  xfs_file_buffered_read (/cfsetup_build/build/linux/fs/xfs/xfs_file.c:277:8)
  #10 xfs_file_read_iter (/cfsetup_build/build/linux/fs/xfs/xfs_file.c:302:9)
  #11 call_read_iter (/cfsetup_build/build/linux/include/linux/fs.h:2199:9)
  #12 new_sync_read (/cfsetup_build/build/linux/fs/read_write.c:389:8)
  #13 vfs_read (/cfsetup_build/build/linux/fs/read_write.c:470:9)
  #14 ksys_read (/cfsetup_build/build/linux/fs/read_write.c:613:9)
  #15 do_syscall_x64 (/cfsetup_build/build/linux/arch/x86/entry/common.c:50:14)
  #16 do_syscall_64 (/cfsetup_build/build/linux/arch/x86/entry/common.c:80:7)
  #17 entry_SYSCALL_64+0x83/0x164 (/cfsetup_build/build/linux/arch/x86/entry/entry_64.S:120)

The folio in question has
.mapping = (struct address_space *)zsmalloc_mops+0x2 = 0xffffffffc1a9f332
and flags 'PG_locked|PG_waiters|PG_private|PG_slob_free'. In fact, the file's
i_pages mapping has a node full of these pages. The following are the entries we
get from the mapping in frame #6 at 0xffffffffa4e1c586
(filemap_get_pages+0x5d6/0x624), in filemap_update_page at
/cfsetup_build/build/linux/mm/filemap.c:2472:4 (inlined):

  > for index, entry in xa_for_each(trace[6]['mapping'].i_pages.address_of_()):
      print(index, entry, cast('struct folio *', entry).page.mapping.address_of_())

  2936 (void *)0xffffe53ab6454f00 *(struct address_space **)0xffffe53ab6454f18 = 0xffff9ffc9ded16b0
  2940 (void *)0xffffe53ab6454300 *(struct address_space **)0xffffe53ab6454318 = 0xffff9ffc9ded16b0
  2944 (void *)0xffffe53a02696000 *(struct address_space **)0xffffe53a02696018 = zsmalloc_mops+0x2 = 0xffffffffc1a9f332 <== index
  2945 (void *)0xffffe53a02696000 *(struct address_space **)0xffffe53a02696018 = zsmalloc_mops+0x2 = 0xffffffffc1a9f332
  2946 (void *)0xffffe53a02696000 *(struct address_space **)0xffffe53a02696018 = zsmalloc_mops+0x2 = 0xffffffffc1a9f332
  ...
  2976 (void *)0xffffe53a02696000 *(struct address_space **)0xffffe53a02696018 = zsmalloc_mops+0x2 = 0xffffffffc1a9f332 <== last_index
  ...
  3006 (void *)0xffffe53a02696000 *(struct address_space **)0xffffe53a02696018 = zsmalloc_mops+0x2 = 0xffffffffc1a9f332
  3007 (void *)0xffffe53ad71c37c0 *(struct address_space **)0xffffe53ad71c37d8 = 0xffff9ffc9ded16b0
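
The +0x2 in zsmalloc_mops+0x2 is consistent with the low bits of page->mapping
being used as type tags. The sketch below splits a raw mapping value into its
base pointer and tag, assuming the 6.1 definitions in
include/linux/page-flags.h (PAGE_MAPPING_ANON = 0x1, PAGE_MAPPING_MOVABLE = 0x2,
both set for KSM). decode_mapping is a hypothetical helper in plain Python, not
part of drgn.

```python
# Assumption: tag values as defined in include/linux/page-flags.h for 6.1.
PAGE_MAPPING_ANON = 0x1
PAGE_MAPPING_MOVABLE = 0x2
PAGE_MAPPING_FLAGS = PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE

KINDS = {
    0: "file",                                         # plain struct address_space *
    PAGE_MAPPING_ANON: "anon",
    PAGE_MAPPING_MOVABLE: "movable",
    PAGE_MAPPING_FLAGS: "ksm",
}


def decode_mapping(raw: int) -> tuple[int, str]:
    """Split a raw page->mapping value into (untagged pointer, kind)."""
    tag = raw & PAGE_MAPPING_FLAGS
    base = raw & ~PAGE_MAPPING_FLAGS
    return base, KINDS[tag]


# Values copied from the xa_for_each() output above.
for raw in (0xFFFF9FFC9DED16B0, 0xFFFFFFFFC1A9F332):
    base, kind = decode_mapping(raw)
    print(hex(raw), "->", hex(base), kind)
```

With these assumptions 0xffffffffc1a9f332 decodes to a movable (non-LRU) page
whose untagged pointer is zsmalloc_mops, while 0xffff9ffc9ded16b0 carries no tag
and looks like an ordinary file address_space pointer.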

On Fri, Jul 21, 2023 at 11:49 AM Daniel Dao <dqminh@cloudflare.com> wrote:
>
> Hi,
>
> In the past, we reported some corruptions on xfs/iomap/xarray combinations on
> kernel 6.1. This happened very rarely ( once a week for every 10000 hosts), and
> the host exhibited symptoms such as: rcu_preempt self-detected stalls,
> NULL pointer
> dereferences or deadlock when reading a particular file.
>
> We do not have a reproducer yet, but we now have more debugging data
> which hopefully
> should help narrow this down. Details as followed:
>
> 1. Kernel NULL pointer dereferences in __filemap_get_folio
>
> This happened on a few different hosts, with a few different repeated addresses.
> The addresses are 0000000000000036, 0000000000000076,
> 00000000000000f6. This looks
> like the xarray is corrupted and we were trying to do some work on a
> sibling entry.
>
>     BUG: kernel NULL pointer dereference, address: 0000000000000036
>     #PF: supervisor read access in kernel mode
>     #PF: error_code(0x0000) - not-present page
>     PGD 18806c5067 P4D 18806c5067 PUD 188ed48067 PMD 0
>     Oops: 0000 [#1] PREEMPT SMP NOPTI
>     CPU: 73 PID: 3579408 Comm: prometheus Tainted: G           O
> 6.1.34-cloudflare-2023.6.7 #1
>     Hardware name: GIGABYTE R162-Z12-CD1/MZ12-HD4-CD, BIOS M03 11/19/2021
>     RIP: 0010:__filemap_get_folio (arch/x86/include/asm/atomic.h:29
> include/linux/atomic/atomic-arch-fallback.h:1242
> include/linux/atomic/atomic-arch-fallback.h:1267
> include/linux/atomic/atomic-instrumented.h:608
> include/linux/page_ref.h:238 include/linux/page_ref.h:247
> include/linux/page_ref.h:280 include/linux/page_ref.h:313
> mm/filemap.c:1863 mm/filemap.c:1915)
>     Code: 10 e8 99 ac 84 00 48 3d 06 04 00 00 49 89 c4 74 e2 48 3d 02
> 04 00 00 74 da 48 85 c0 0f 84 2e 02 00 00 a8 01 0f 85 e3 00 00 00 <8b>
> 40 34 85 c0 74 c2 8d 50 01 4d 8d 7c 24 34 f0 41 0f b1 54 24 34
>     All code
>     ========
>       0: 10 e8                adc    %ch,%al
>       2: 99                    cltd
>       3: ac                    lods   %ds:(%rsi),%al
>       4: 84 00                test   %al,(%rax)
>       6: 48 3d 06 04 00 00    cmp    $0x406,%rax
>       c: 49 89 c4              mov    %rax,%r12
>       f: 74 e2                je     0xfffffffffffffff3
>       11: 48 3d 02 04 00 00    cmp    $0x402,%rax
>       17: 74 da                je     0xfffffffffffffff3
>       19: 48 85 c0              test   %rax,%rax
>       1c: 0f 84 2e 02 00 00    je     0x250
>       22: a8 01                test   $0x1,%al
>       24: 0f 85 e3 00 00 00    jne    0x10d
>       2a:* 8b 40 34              mov    0x34(%rax),%eax <-- trapping instruction
>       2d: 85 c0                test   %eax,%eax
>       2f: 74 c2                je     0xfffffffffffffff3
>       31: 8d 50 01              lea    0x1(%rax),%edx
>       34: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
>       39: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)
>
>     Code starting with the faulting instruction
>     ===========================================
>       0: 8b 40 34              mov    0x34(%rax),%eax
>       3: 85 c0                test   %eax,%eax
>       5: 74 c2                je     0xffffffffffffffc9
>       7: 8d 50 01              lea    0x1(%rax),%edx
>       a: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
>       f: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)
>     RSP: 0000:ffffaf5587cdfc60 EFLAGS: 00010246
>     RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000002
>     RDX: 0000000000000008 RSI: ffffa45181fa8000 RDI: ffffaf5587cdfc70
>     RBP: 0000000000000000 R08: 0000000000000402 R09: 000000000006e44f
>     R10: 000000000006e450 R11: 000000000006e448 R12: 0000000000000002
>     R13: ffffa3fff6fdfeb0 R14: 000000000006e44a R15: 00000000000000d1
>     FS:  000000c9e385ac90(0000) GS:ffffa4153fc40000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 0000000000000036 CR3: 000000296a1bc002 CR4: 0000000000770ee0
>     PKRU: 55555554
>     Call Trace:
>     <TASK>
>     ? __die_body.cold (arch/x86/kernel/dumpstack.c:478
> arch/x86/kernel/dumpstack.c:465 arch/x86/kernel/dumpstack.c:420)
>     ? page_fault_oops (arch/x86/mm/fault.c:727)
>     ? migrate_task_rq_fair (include/linux/sched.h:1921
> kernel/sched/fair.c:3932 kernel/sched/fair.c:7497)
>     ? do_user_addr_fault (include/linux/kprobes.h:404
> include/linux/kprobes.h:597 arch/x86/mm/fault.c:1280)
>     ? ttwu_queue_wakelist (kernel/sched/core.c:3880)
>     ? exc_page_fault (arch/x86/include/asm/irqflags.h:40
> arch/x86/include/asm/irqflags.h:75 arch/x86/mm/fault.c:1527
> arch/x86/mm/fault.c:1575)
>     ? asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570)
>     ? __filemap_get_folio (arch/x86/include/asm/atomic.h:29
> include/linux/atomic/atomic-arch-fallback.h:1242
> include/linux/atomic/atomic-arch-fallback.h:1267
> include/linux/atomic/atomic-instrumented.h:608
> include/linux/page_ref.h:238 include/linux/page_ref.h:247
> include/linux/page_ref.h:280 include/linux/page_ref.h:313
> mm/filemap.c:1863 mm/filemap.c:1915)
>     filemap_fault (mm/filemap.c:3120)
>     ? preempt_count_add (include/linux/ftrace.h:950
> kernel/sched/core.c:5685 kernel/sched/core.c:5682
> kernel/sched/core.c:5710)
>     __do_fault (mm/memory.c:4234)
>     do_fault (mm/memory.c:4564 mm/memory.c:4692)
>     __handle_mm_fault (mm/memory.c:4964 mm/memory.c:5106)
>     handle_mm_fault (mm/memory.c:5227)
>     do_user_addr_fault (include/linux/sched/signal.h:433
> arch/x86/mm/fault.c:1430)
>     exc_page_fault (arch/x86/include/asm/irqflags.h:40
> arch/x86/include/asm/irqflags.h:75 arch/x86/mm/fault.c:1527
> arch/x86/mm/fault.c:1575)
>     asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570)
>     RIP: 0033:0x268b8b9
>     Code: 70 48 89 4c 24 78 48 8b 94 24 b8 00 00 00 0f 1f 00 48 85 d2
> 74 3f 48 89 ce 48 29 d9 4c 8d 49 04 49 f7 d9 49 c1 f9 3f 49 21 f9 <46>
> 8b 0c 08 44 89 4c 24 34 90 90 48 89 d3 48 89 c1 41 b8 01 00 00
>     All code
>     ========
>       0: 70 48                jo     0x4a
>       2: 89 4c 24 78          mov    %ecx,0x78(%rsp)
>       6: 48 8b 94 24 b8 00 00 mov    0xb8(%rsp),%rdx
>       d: 00
>       e: 0f 1f 00              nopl   (%rax)
>       11: 48 85 d2              test   %rdx,%rdx
>       14: 74 3f                je     0x55
>       16: 48 89 ce              mov    %rcx,%rsi
>       19: 48 29 d9              sub    %rbx,%rcx
>       1c: 4c 8d 49 04          lea    0x4(%rcx),%r9
>       20: 49 f7 d9              neg    %r9
>       23: 49 c1 f9 3f          sar    $0x3f,%r9
>       27: 49 21 f9              and    %rdi,%r9
>       2a:* 46 8b 0c 08          mov    (%rax,%r9,1),%r9d <-- trapping
> instruction
>       2e: 44 89 4c 24 34        mov    %r9d,0x34(%rsp)
>       33: 90                    nop
>       34: 90                    nop
>       35: 48 89 d3              mov    %rdx,%rbx
>       38: 48 89 c1              mov    %rax,%rcx
>       3b: 41                    rex.B
>       3c: b8                    .byte 0xb8
>       3d: 01 00                add    %eax,(%rax)
>       ...
>
>     Code starting with the faulting instruction
>     ===========================================
>       0: 46 8b 0c 08          mov    (%rax,%r9,1),%r9d
>       4: 44 89 4c 24 34        mov    %r9d,0x34(%rsp)
>       9: 90                    nop
>       a: 90                    nop
>       b: 48 89 d3              mov    %rdx,%rbx
>       e: 48 89 c1              mov    %rax,%rcx
>       11: 41                    rex.B
>       12: b8                    .byte 0xb8
>       13: 01 00                add    %eax,(%rax)
>       ...
>     RSP: 002b:000000cbc509f520 EFLAGS: 00010202
>     RAX: 00007e81cf427e0c RBX: 00000000000222cc RCX: 00000000123817b2
>     RDX: 000000c00001ac00 RSI: 00000000123a3a7e RDI: 00000000000222c8
>     RBP: 000000cbc509f5b0 R08: 0000000003cb5910 R09: 00000000000222c8
>     R10: 000000c4de3dea00 R11: 0000000000000123 R12: 0000000000000000
>     R13: 0000000000000005 R14: 000000c83bad2340 R15: 0000010000000000
>     </TASK>
>     Modules linked in: xt_connlabel xt_MASQUERADE nf_conntrack_netlink
> xfrm_user xfrm_algo xt_addrtype br_netfilter bridge overlay zstd
> zstd_compress zram zsmalloc tun tcp_diag inet_diag raid0 md_mod essiv
> dm_crypt trusted asn1_encoder tee ip6table_filter ip6table_mangle
> ip6table_raw ip6table_security ip6table_nat ip6_tables xt_bpf
> xt_conntrack xt_multiport xt_set iptable_filter xt_NFLOG nfnetlink_log
> xt_connbytes xt_comment xt_connmark xt_statistic iptable_mangle xt_nat
> xt_tcpudp iptable_nat nf_nat xt_CT iptable_raw ip_set_hash_ip
> ip_set_hash_net ip_set nfnetlink sch_fq nf_conntrack nf_defrag_ipv6
> nf_defrag_ipv4 8021q garp mrp stp llc bonding nvme_fabrics amd64_edac
> kvm_amd ipmi_ssif kvm irqbypass crc32_pclmul crc32c_intel sha512_ssse3
> acpi_ipmi mlx5_core aesni_intel ipmi_si mlxfw rapl xhci_pci nvme tls
> ipmi_devintf tiny_power_button psample nvme_core xhci_hcd i2c_piix4
> ccp ipmi_msghandler button fuse dm_mod dax efivarfs ip_tables x_tables
> bcmcrypt(O)
>     crypto_simd cryptd
>     CR2: 0000000000000036
>     ---[ end trace 0000000000000000 ]---
>     RIP: 0010:__filemap_get_folio (arch/x86/include/asm/atomic.h:29
> include/linux/atomic/atomic-arch-fallback.h:1242
> include/linux/atomic/atomic-arch-fallback.h:1267
> include/linux/atomic/atomic-instrumented.h:608
> include/linux/page_ref.h:238 include/linux/page_ref.h:247
> include/linux/page_ref.h:280 include/linux/page_ref.h:313
> mm/filemap.c:1863 mm/filemap.c:1915)
>     Code: 10 e8 99 ac 84 00 48 3d 06 04 00 00 49 89 c4 74 e2 48 3d 02
> 04 00 00 74 da 48 85 c0 0f 84 2e 02 00 00 a8 01 0f 85 e3 00 00 00 <8b>
> 40 34 85 c0 74 c2 8d 50 01 4d 8d 7c 24 34 f0 41 0f b1 54 24 34
>     All code
>     ========
>       0: 10 e8                adc    %ch,%al
>       2: 99                    cltd
>       3: ac                    lods   %ds:(%rsi),%al
>       4: 84 00                test   %al,(%rax)
>       6: 48 3d 06 04 00 00    cmp    $0x406,%rax
>       c: 49 89 c4              mov    %rax,%r12
>       f: 74 e2                je     0xfffffffffffffff3
>       11: 48 3d 02 04 00 00    cmp    $0x402,%rax
>       17: 74 da                je     0xfffffffffffffff3
>       19: 48 85 c0              test   %rax,%rax
>       1c: 0f 84 2e 02 00 00    je     0x250
>       22: a8 01                test   $0x1,%al
>       24: 0f 85 e3 00 00 00    jne    0x10d
>       2a:* 8b 40 34              mov    0x34(%rax),%eax <-- trapping instruction
>       2d: 85 c0                test   %eax,%eax
>       2f: 74 c2                je     0xfffffffffffffff3
>       31: 8d 50 01              lea    0x1(%rax),%edx
>       34: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
>       39: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)
>
>     Code starting with the faulting instruction
>     ===========================================
>       0: 8b 40 34              mov    0x34(%rax),%eax
>       3: 85 c0                test   %eax,%eax
>       5: 74 c2                je     0xffffffffffffffc9
>       7: 8d 50 01              lea    0x1(%rax),%edx
>       a: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
>       f: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)
>     RSP: 0000:ffffaf5587cdfc60 EFLAGS: 00010246
>     RAX: 0000000000000002 RBX: 0000000000000000 RCX: 0000000000000002
>     RDX: 0000000000000008 RSI: ffffa45181fa8000 RDI: ffffaf5587cdfc70
>     RBP: 0000000000000000 R08: 0000000000000402 R09: 000000000006e44f
>     R10: 000000000006e450 R11: 000000000006e448 R12: 0000000000000002
>     R13: ffffa3fff6fdfeb0 R14: 000000000006e44a R15: 00000000000000d1
>     FS:  000000c9e385ac90(0000) GS:ffffa4153fc40000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 0000000000000036 CR3: 000000296a1bc002 CR4: 0000000000770ee0
>     PKRU: 55555554
>
>     BUG: kernel NULL pointer dereference, address: 0000000000000076
>     #PF: supervisor read access in kernel mode
>     #PF: error_code(0x0000) - not-present page
>     PGD 7acd78067 P4D 7acd78067 PUD 7acd79067 PMD 0
>     Oops: 0000 [#1] PREEMPT SMP NOPTI
>     CPU: 93 PID: 3784417 Comm: prometheus Tainted: G           O
> 6.1.20-cloudflare-2023.3.18 #1
>     Hardware name: GIGABYTE R162-Z13-CD/MZ12-HD2-CD, BIOS R13 07/17/2020
>     RIP: 0010:__filemap_get_folio (arch/x86/include/asm/atomic.h:29
> include/linux/atomic/atomic-arch-fallback.h:1242
> include/linux/atomic/atomic-arch-fallback.h:1267
> include/linux/atomic/atomic-instrumented.h:608
> include/linux/page_ref.h:238 include/linux/page_ref.h:247
> include/linux/page_ref.h:280 include/linux/page_ref.h:313
> mm/filemap.c:1863 mm/filemap.c:1915)
>     Code: 10 e8 b9 a4 84 00 48 3d 06 04 00 00 49 89 c4 74 e2 48 3d 02
> 04 00 00 74 da 48 85 c0 0f 84 2e 02 00 00 a8 01 0f 85 e3 00 00 00 <8b>
> 40 34 85 c0 74 c2 8d 50 01 4d 8d 7c 24 34 f0 41 0f b1 54 24 34
>     All code
>     ========
>        0: 10 e8                adc    %ch,%al
>        2: b9 a4 84 00 48        mov    $0x480084a4,%ecx
>        7: 3d 06 04 00 00        cmp    $0x406,%eax
>        c: 49 89 c4              mov    %rax,%r12
>        f: 74 e2                je     0xfffffffffffffff3
>       11: 48 3d 02 04 00 00    cmp    $0x402,%rax
>       17: 74 da                je     0xfffffffffffffff3
>       19: 48 85 c0              test   %rax,%rax
>       1c: 0f 84 2e 02 00 00    je     0x250
>       22: a8 01                test   $0x1,%al
>       24: 0f 85 e3 00 00 00    jne    0x10d
>       2a:* 8b 40 34              mov    0x34(%rax),%eax <-- trapping instruction
>       2d: 85 c0                test   %eax,%eax
>       2f: 74 c2                je     0xfffffffffffffff3
>       31: 8d 50 01              lea    0x1(%rax),%edx
>       34: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
>       39: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)
>
>     Code starting with the faulting instruction
>     ===========================================
>        0: 8b 40 34              mov    0x34(%rax),%eax
>        3: 85 c0                test   %eax,%eax
>        5: 74 c2                je     0xffffffffffffffc9
>        7: 8d 50 01              lea    0x1(%rax),%edx
>        a: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
>        f: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)
>     RSP: 0000:ffffb15106683c60 EFLAGS: 00010246
>     RAX: 0000000000000042 RBX: 0000000000000000 RCX: 0000000000000002
>     RDX: 0000000000000018 RSI: ffff934b0029efc8 RDI: ffffb15106683c70
>     RBP: 0000000000000000 R08: 0000000000000402 R09: 00000000000cbe5f
>     R10: 00000000000cbe60 R11: 00000000000cbe5c R12: 0000000000000042
>     R13: ffff93449c251eb0 R14: 00000000000cbe59 R15: 00000000000000d1
>     FS:  000000c000300090(0000) GS:ffff937e6ed40000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 0000000000000076 CR3: 0000000a6528e000 CR4: 0000000000350ee0
>     Call Trace:
>      <TASK>
>     filemap_fault (mm/filemap.c:3120)
>     ? preempt_count_add (include/linux/ftrace.h:950
> kernel/sched/core.c:5685 kernel/sched/core.c:5682
> kernel/sched/core.c:5710)
>     __do_fault (mm/memory.c:4234)
>     do_fault (mm/memory.c:4564 mm/memory.c:4692)
>     __handle_mm_fault (mm/memory.c:4964 mm/memory.c:5106)
>     handle_mm_fault (mm/memory.c:5227)
>     do_user_addr_fault (include/linux/sched/signal.h:433
> arch/x86/mm/fault.c:1430)
>     exc_page_fault (arch/x86/include/asm/irqflags.h:40
> arch/x86/include/asm/irqflags.h:75 arch/x86/mm/fault.c:1527
> arch/x86/mm/fault.c:1575)
>     asm_exc_page_fault (arch/x86/include/asm/idtentry.h:570)
>     RIP: 0033:0x268b8b9
>     Code: 70 48 89 4c 24 78 48 8b 94 24 b8 00 00 00 0f 1f 00 48 85 d2
> 74 3f 48 89 ce 48 29 d9 4c 8d 49 04 49 f7 d9 49 c1 f9 3f 49 21 f9 <46>
> 8b 0c 08 44 89 4c 24 34 90 90 48 89 d3 48 89 c1 41 b8 01 00 00
>     All code
>     ========
>        0: 70 48                jo     0x4a
>        2: 89 4c 24 78          mov    %ecx,0x78(%rsp)
>        6: 48 8b 94 24 b8 00 00 mov    0xb8(%rsp),%rdx
>        d: 00
>        e: 0f 1f 00              nopl   (%rax)
>       11: 48 85 d2              test   %rdx,%rdx
>       14: 74 3f                je     0x55
>       16: 48 89 ce              mov    %rcx,%rsi
>       19: 48 29 d9              sub    %rbx,%rcx
>       1c: 4c 8d 49 04          lea    0x4(%rcx),%r9
>       20: 49 f7 d9              neg    %r9
>       23: 49 c1 f9 3f          sar    $0x3f,%r9
>       27: 49 21 f9              and    %rdi,%r9
>       2a:* 46 8b 0c 08          mov    (%rax,%r9,1),%r9d <-- trapping
> instruction
>       2e: 44 89 4c 24 34        mov    %r9d,0x34(%rsp)
>       33: 90                    nop
>       34: 90                    nop
>       35: 48 89 d3              mov    %rdx,%rbx
>       38: 48 89 c1              mov    %rax,%rcx
>       3b: 41                    rex.B
>       3c: b8                    .byte 0xb8
>       3d: 01 00                add    %eax,(%rax)
>     ...
>
>     Code starting with the faulting instruction
>     ===========================================
>        0: 46 8b 0c 08          mov    (%rax,%r9,1),%r9d
>        4: 44 89 4c 24 34        mov    %r9d,0x34(%rsp)
>        9: 90                    nop
>        a: 90                    nop
>        b: 48 89 d3              mov    %rdx,%rbx
>        e: 48 89 c1              mov    %rax,%rcx
>       11: 41                    rex.B
>       12: b8                    .byte 0xb8
>       13: 01 00                add    %eax,(%rax)
>     ...
>     RSP: 002b:000000d735bb3558 EFLAGS: 00010206
>     RAX: 00007c018402dad8 RBX: 000000000002c3d8 RCX: 0000000037f9be1c
>     RDX: 000000c000222c00 RSI: 0000000037fc81f4 RDI: 000000000002c3d4
>     RBP: 000000d735bb35e8 R08: 0000000003cb5910 R09: 000000000002c3d4
>     R10: 000000c385d2a000 R11: 0000000000000021 R12: 0000000000000000
>     R13: 000000000000000b R14: 000000d1bb70e340 R15: 0000000001000000
>      </TASK>
>     Modules linked in: veth xt_MASQUERADE nf_conntrack_netlink
> xfrm_user xfrm_algo xt_addrtype br_netfilter bridge overlay raid1
> md_mod essiv dm_crypt trusted tee asn1_encoder xt_hl ip6table_filter
> ip6table_mangle ip6table_raw ip6table_security ip6table_nat ip6_tables
> xt_tcpudp xt_conntrack xt_comment xt_multiport xt_set iptable_filter
> iptable_mangle iptable_nat nf_nat xt_CT iptable_raw ip_set_hash_ip
> ip_set_hash_net ip_set nfnetlink tcp_bbr sch_fq nf_conntrack
> nf_defrag_ipv6 nf_defrag_ipv4 8021q mrp garp stp llc bonding
> amd64_edac kvm_amd ipmi_ssif kvm irqbypass crc32_pclmul crc32c_intel
> mlx5_core sha512_ssse3 psample acpi_ipmi aesni_intel xhci_pci nvme
> ipmi_si rapl tls ipmi_devintf tiny_power_button nvme_core mlxfw
> xhci_hcd i2c_piix4 ccp ipmi_msghandler button fuse dm_mod dax efivarfs
> ip_tables x_tables bcmcrypt(O) crypto_simd cryptd
>     CR2: 0000000000000076
>     ---[ end trace 0000000000000000 ]---
>     RIP: 0010:__filemap_get_folio (arch/x86/include/asm/atomic.h:29
> include/linux/atomic/atomic-arch-fallback.h:1242
> include/linux/atomic/atomic-arch-fallback.h:1267
> include/linux/atomic/atomic-instrumented.h:608
> include/linux/page_ref.h:238 include/linux/page_ref.h:247
> include/linux/page_ref.h:280 include/linux/page_ref.h:313
> mm/filemap.c:1863 mm/filemap.c:1915)
>     Code: 10 e8 b9 a4 84 00 48 3d 06 04 00 00 49 89 c4 74 e2 48 3d 02
> 04 00 00 74 da 48 85 c0 0f 84 2e 02 00 00 a8 01 0f 85 e3 00 00 00 <8b>
> 40 34 85 c0 74 c2 8d 50 01 4d 8d 7c 24 34 f0 41 0f b1 54 24 34
>     All code
>     ========
>        0: 10 e8                adc    %ch,%al
>        2: b9 a4 84 00 48        mov    $0x480084a4,%ecx
>        7: 3d 06 04 00 00        cmp    $0x406,%eax
>        c: 49 89 c4              mov    %rax,%r12
>        f: 74 e2                je     0xfffffffffffffff3
>       11: 48 3d 02 04 00 00    cmp    $0x402,%rax
>       17: 74 da                je     0xfffffffffffffff3
>       19: 48 85 c0              test   %rax,%rax
>       1c: 0f 84 2e 02 00 00    je     0x250
>       22: a8 01                test   $0x1,%al
>       24: 0f 85 e3 00 00 00    jne    0x10d
>       2a:* 8b 40 34              mov    0x34(%rax),%eax <-- trapping instruction
>       2d: 85 c0                test   %eax,%eax
>       2f: 74 c2                je     0xfffffffffffffff3
>       31: 8d 50 01              lea    0x1(%rax),%edx
>       34: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
>       39: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)
>
>     Code starting with the faulting instruction
>     ===========================================
>        0: 8b 40 34              mov    0x34(%rax),%eax
>        3: 85 c0                test   %eax,%eax
>        5: 74 c2                je     0xffffffffffffffc9
>        7: 8d 50 01              lea    0x1(%rax),%edx
>        a: 4d 8d 7c 24 34        lea    0x34(%r12),%r15
>        f: f0 41 0f b1 54 24 34 lock cmpxchg %edx,0x34(%r12)
>     RSP: 0000:ffffb15106683c60 EFLAGS: 00010246
>     RAX: 0000000000000042 RBX: 0000000000000000 RCX: 0000000000000002
>     RDX: 0000000000000018 RSI: ffff934b0029efc8 RDI: ffffb15106683c70
>     RBP: 0000000000000000 R08: 0000000000000402 R09: 00000000000cbe5f
>     R10: 00000000000cbe60 R11: 00000000000cbe5c R12: 0000000000000042
>     R13: ffff93449c251eb0 R14: 00000000000cbe59 R15: 00000000000000d1
>     FS:  000000c000300090(0000) GS:ffff937e6ed40000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 0000000000000076 CR3: 0000000a6528e000 CR4: 0000000000350ee0
>     note: prometheus[3784417] exited with irqs disabled
>
> 2. Kernel NULL pointer dereferences in xfs_read_iomap_begin
>
>     BUG: unable to handle page fault for address: 0000000000034668
>     #PF: supervisor read access in kernel mode
>     #PF: error_code(0x0000) - not-present page
>     PGD 11cfd37067 P4D 11cfd37067 PUD b88086067 PMD 0
>     Oops: 0000 [#1] PREEMPT SMP NOPTI
>     CPU: 124 PID: 3831226 Comm: rocksdb:low Kdump: loaded Tainted: G
>      W  O L     6.1.27-cloudflare-2023.5.0 #1
>     Hardware name: HYVE EDGE-METAL-GEN11/HS1811D_Lite, BIOS V0.11-sig 12/23/2022
>     RIP: 0010:xfs_read_iomap_begin (fs/xfs/xfs_iomap.c:1200)
>     Code: 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 50 48
> 89 14 24 4c 89 44 24 08 65 48 8b 04 25 28 00 00 00 48 89 44 24 48 <48>
> 8b 87 >
>     All code
>     ========
>       0:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>       5:   41 57                   push   %r15
>       7:   41 56                   push   %r14
>       9:   41 55                   push   %r13
>       b:   41 54                   push   %r12
>       d:   55                      push   %rbp
>       e:   53                      push   %rbx
>       f:   48 83 ec 50             sub    $0x50,%rsp
>       13:   48 89 14 24             mov    %rdx,(%rsp)
>       17:   4c 89 44 24 08          mov    %r8,0x8(%rsp)
>       1c:   65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
>       23:   00 00
>       25:   48 89 44 24 48          mov    %rax,0x48(%rsp)
>       2a:*  48                      rex.W           <-- trapping instruction
>       2b:   8b                      .byte 0x8b
>       2c:   87 00                   xchg   %eax,(%rax)
>
>     Code starting with the faulting instruction
>     ===========================================
>       0:   48                      rex.W
>       1:   8b                      .byte 0x8b
>       2:   87 00                   xchg   %eax,(%rax)
>     RSP: 0018:ffffa63810733a70 EFLAGS: 00010282
>     RAX: 78ac714f0997e100 RBX: ffffa63810733b40 RCX: 0000000000000000
>     RDX: 0000000000004000 RSI: 0000000000000000 RDI: 00000000000347a0
>     RBP: ffffffff8664d950 R08: ffffa63810733b68 R09: ffffa63810733bb0
>     R10: 000000000001f627 R11: 0000000000000000 R12: ffffa63810733b68
>     R13: ffffa63810733bb0 R14: 00000000000019c1 R15: 00000000fffffff5
>     FS:  00007f48d8504700(0000) GS:ffffa2fe5ef00000(0000) knlGS:0000000000000000
>     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>     CR2: 0000000000034668 CR3: 00000013037ec001 CR4: 0000000000770ee0
>     PKRU: 55555554
>     Call Trace:
>     <TASK>
>     ? __mod_memcg_lruvec_state (mm/memcontrol.c:613 mm/memcontrol.c:799)
>     iomap_iter (fs/iomap/iter.c:76)
>     iomap_read_folio (fs/iomap/buffered-io.c:342)
>     ? xfs_end_bio (fs/xfs/xfs_aops.c:542)
>     filemap_read_folio (mm/filemap.c:2407)
>     filemap_get_pages (mm/filemap.c:2492 mm/filemap.c:2606)
>     filemap_read (mm/filemap.c:2677)
>     xfs_file_buffered_read (fs/xfs/xfs_file.c:278)
>     xfs_file_read_iter (fs/xfs/xfs_file.c:304)
>     vfs_read (fs/read_write.c:390 fs/read_write.c:470)
>     __x64_sys_pread64 (include/linux/file.h:44 fs/read_write.c:666
> fs/read_write.c:675 fs/read_write.c:672 fs/read_write.c:672)
>     do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80)
>     entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
>     RIP: 0033:0x7f49061ca917
>     Code: 08 89 3c 24 48 89 4c 24 18 e8 05 f4 ff ff 4c 8b 54 24 18 48
> 8b 54 24 10 41 89 c0 48 8b 74 24 08 8b 3c 24 b8 11 00 00 00 0f 05 <48>
> 3d 00 >
>     All code
>     ========
>       0:   08 89 3c 24 48 89       or     %cl,-0x76b7dbc4(%rcx)
>       6:   4c 24 18                rex.WR and $0x18,%al
>       9:   e8 05 f4 ff ff          call   0xfffffffffffff413
>       e:   4c 8b 54 24 18          mov    0x18(%rsp),%r10
>       13:   48 8b 54 24 10          mov    0x10(%rsp),%rdx
>       18:   41 89 c0                mov    %eax,%r8d
>       1b:   48 8b 74 24 08          mov    0x8(%rsp),%rsi
>       20:   8b 3c 24                mov    (%rsp),%edi
>       23:   b8 11 00 00 00          mov    $0x11,%eax
>       28:   0f 05                   syscall
>       2a:*  48                      rex.W           <-- trapping instruction
>       2b:   3d                      .byte 0x3d
>             ...
>
>     Code starting with the faulting instruction
>     ===========================================
>       0:   48                      rex.W
>       1:   3d                      .byte 0x3d
>             ...
>     RSP: 002b:00007f48d84ffc70 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
>     RAX: ffffffffffffffda RBX: 00000000018a0c90 RCX: 00007f49061ca917
>     RDX: 00000000000c294f RSI: 000000002265e000 RDI: 000000000000003c
>     RBP: 00007f48d84ffda0 R08: 0000000000000000 R09: 00007f48d84ffe60
>     R10: 000000000191dcd8 R11: 0000000000000293 R12: 0000000007c3c6c0
>     R13: 00000000000c294f R14: 00000000000c294f R15: 000000000191dcd8
>     </TASK>
>     Modules linked in: xt_connlabel overlay nft_compat esp4
> xt_hashlimit ip_set_hash_netport xt_length nf_conntrack_netlink
> mpls_gso mpls_iptunnel >
>     tcp_bbr sch_fq nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 8021q
> garp mrp stp llc ipmi_ssif amd64_edac kvm_amd kvm irqbypass
> crc32_pclmul crc32>
>     CR2: 0000000000034668
>     ---[ end trace 0000000000000000 ]---
>
> We also have a deadlock reading a very specific file on this host. We managed to
> do a kdump on this host and extracted out the state of the mapping.
>
>
>     >>> trace
>     #0  context_switch (/cfsetup_build/build/linux/kernel/sched/core.c:5241:2)
>     #1  __schedule (/cfsetup_build/build/linux/kernel/sched/core.c:6554:8)
>     #2  schedule (/cfsetup_build/build/linux/kernel/sched/core.c:6630:3)
>     #3  io_schedule (/cfsetup_build/build/linux/kernel/sched/core.c:8774:2)
>     #4  folio_wait_bit_common (/cfsetup_build/build/linux/mm/filemap.c:1296:4)
>     #5  folio_put_wait_locked (/cfsetup_build/build/linux/mm/filemap.c:1465:9)
>     #6  filemap_update_page (/cfsetup_build/build/linux/mm/filemap.c:2472:4)
>     #7  filemap_get_pages (/cfsetup_build/build/linux/mm/filemap.c:2606:9)
>     #8  filemap_read (/cfsetup_build/build/linux/mm/filemap.c:2676:11)
>     #9  xfs_file_buffered_read
> (/cfsetup_build/build/linux/fs/xfs/xfs_file.c:277:8)
>     #10 xfs_file_read_iter (/cfsetup_build/build/linux/fs/xfs/xfs_file.c:302:9)
>     #11 call_read_iter (/cfsetup_build/build/linux/include/linux/fs.h:2199:9)
>     #12 new_sync_read (/cfsetup_build/build/linux/fs/read_write.c:389:8)
>     #13 vfs_read (/cfsetup_build/build/linux/fs/read_write.c:470:9)
>     #14 ksys_read (/cfsetup_build/build/linux/fs/read_write.c:613:9)
>     #15 do_syscall_x64
> (/cfsetup_build/build/linux/arch/x86/entry/common.c:50:14)
>     #16 do_syscall_64 (/cfsetup_build/build/linux/arch/x86/entry/common.c:80:7)
>     #17 entry_SYSCALL_64+0x83/0x164
> (/cfsetup_build/build/linux/arch/x86/entry/entry_64.S:120)
>     #18 0x7f05f0b093ce
>     >>> folio = trace[6]['folio']
>     >>> decode_page_flags(folio)
>     'PG_locked|PG_waiters|PG_head'
>     >>> folio
>     *(struct folio *)0xffffd67406346000 = {
>             .flags = (unsigned long)13510764522438785,
>             .lru = (struct list_head){
>                     .next = (struct list_head *)0xdead000000000100,
>                     .prev = (struct list_head *)0xdead000000000122,
>             },
>             .__filler = (void *)0xdead000000000100,
>             .mlock_count = (unsigned int)290,
>             .mapping = (struct address_space *)0x0,
>             .index = (unsigned long)18446641474676726016,
>             .private = (void *)0x400000,
>             ._mapcount = (atomic_t){
>                     .counter = (int)-1,
>             },
>             ._refcount = (atomic_t){
>                     .counter = (int)1,
>             },
>             .memcg_data = (unsigned long)0,
>             .page = (struct page){
>                     .flags = (unsigned long)13510764522438785,
>                     .lru = (struct list_head){
>                             .next = (struct list_head *)0xdead000000000100,
>                             .prev = (struct list_head *)0xdead000000000122,
>                     },
>                     .__filler = (void *)0xdead000000000100,
>                     .mlock_count = (unsigned int)290,
>                     .buddy_list = (struct list_head){
>                             .next = (struct list_head *)0xdead000000000100,
>                             .prev = (struct list_head *)0xdead000000000122,
>                     },
>                     .pcp_list = (struct list_head){
>                             .next = (struct list_head *)0xdead000000000100,
>                             .prev = (struct list_head *)0xdead000000000122,
>                     },
>                     .mapping = (struct address_space *)0x0,
>                     .index = (unsigned long)18446641474676726016,
>                     .private = (unsigned long)4194304,
>                     .pp_magic = (unsigned long)16045481047390945536,
>                     .pp = (struct page_pool *)0xdead000000000122,
>                     ._pp_mapping_pad = (unsigned long)0,
>                     .dma_addr = (unsigned long)18446641474676726016,
>                     .dma_addr_upper = (unsigned long)4194304,
>                     .pp_frag_count = (atomic_long_t){
>                             .counter = (s64)4194304,
>                     },
>                     .compound_head = (unsigned long)16045481047390945536,
>                     .compound_dtor = (unsigned char)34,
>                     .compound_order = (unsigned char)1,
>                     .compound_mapcount = (atomic_t){
>                             .counter = (int)-559087616,
>                     },
>                     .compound_pincount = (atomic_t){
>                             .counter = (int)0,
>                     },
>                     .compound_nr = (unsigned int)0,
>                     ._compound_pad_1 = (unsigned long)16045481047390945536,
>                     ._compound_pad_2 = (unsigned long)16045481047390945570,
>                     .deferred_list = (struct list_head){
>                             .next = (struct list_head *)0x0,
>                             .prev = (struct list_head *)0xffffa2afcd181900,
>                     },
>                     ._pt_pad_1 = (unsigned long)16045481047390945536,
>                     .pmd_huge_pte = (pgtable_t)0xdead000000000122,
>                     ._pt_pad_2 = (unsigned long)0,
>                     .pt_mm = (struct mm_struct *)0xffffa2afcd181900,
>                     .pt_frag_refcount = (atomic_t){
>                             .counter = (int)-854058752,
>                     },
>                     .ptl = (spinlock_t){
>                             .rlock = (struct raw_spinlock){
>                                     .raw_lock = (arch_spinlock_t){
>                                             .val = (atomic_t){
>                                                     .counter = (int)4194304,
>                                             },
>                                             .locked = (u8)0,
>                                             .pending = (u8)0,
>                                             .locked_pending = (u16)0,
>                                             .tail = (u16)64,
>                                     },
>                             },
>                     },
>                     .pgmap = (struct dev_pagemap *)0xdead000000000100,
>                     .zone_device_data = (void *)0xdead000000000122,
>                     .callback_head = (struct callback_head){
>                             .next = (struct callback_head *)0xdead000000000100,
>                             .func = (void (*)(struct callback_head
> *))0xdead000000000122,
>                     },
>                     ._mapcount = (atomic_t){
>                             .counter = (int)-1,
>                     },
>                     .page_type = (unsigned int)4294967295,
>                     ._refcount = (atomic_t){
>                             .counter = (int)1,
>                     },
>                     .memcg_data = (unsigned long)0,
>             },
>             ._flags_1 = (unsigned long)13510764522373120,
>             .__head = (unsigned long)18446698392541487105,
>             ._folio_dtor = (unsigned char)1,
>             ._folio_order = (unsigned char)2,
>             ._total_mapcount = (atomic_t){
>                     .counter = (int)-1,
>             },
>             ._pincount = (atomic_t){
>                     .counter = (int)0,
>             },
>             ._folio_nr_pages = (unsigned int)4,
>     }
>     >>> for index, entry in
> xa_for_each(trace[7]['mapping'].i_pages.address_of_()):
>             print(index, entry, cast('struct folio *',
> entry).page.mapping.address_of_())
>     ....
>     6464 (void *)0xffffd674c130a000 *(struct address_space
> **)0xffffd674c130a018 = 0xffffa2b30e93b2b0
>     6528 (void *)0xffffd674beb22000 *(struct address_space
> **)0xffffd674beb22018 = 0xffffa2b30e93b2b0
>     6592 (void *)0xffffd67406346000 *(struct address_space
> **)0xffffd67406346018 = 0x0 <===== our folio
>     6624 (void *)0x7037e8d8000100d (struct address_space **)0x7037e8d80001025
>     6625 (void *)0x7037e047000100d (struct address_space **)0x7037e0470001025
>     ....
>
> This looks like the xarray is corrupted, and for some reason we have a
> locked folio
> in the mapping with a page with no mapping.
>
> Any suggestions on narrowing this down to a hypothesis to try to reproduce this,
> or potential fixes are very much appreciated. We are also trying some
> different kernels
> configurations on different set of hosts to see if the problems go
> away for them, such as:
> - 6.1.36 without xfs: Support large folios
> 6795801366da0cd3d99e27c37f020a8f16714886
> - 6.1.36 without THP
> - 6.1.37 with the following series backported xfs, iomap: fix data
> corruption due to stale cached iomaps
> https://lore.kernel.org/linux-fsdevel/20221129001632.GX3600936@dread.disaster.area/
>
> Best,
> Daniel.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1
  2023-07-24 11:23 ` Daniel Dao
@ 2023-07-24 21:45   ` Dave Chinner
  2023-07-24 22:04     ` Daniel Dao
  2023-07-25  3:41     ` Matthew Wilcox
  0 siblings, 2 replies; 10+ messages in thread
From: Dave Chinner @ 2023-07-24 21:45 UTC (permalink / raw)
  To: Daniel Dao
  Cc: linux-fsdevel, Matthew Wilcox, kernel-team, linux-kernel, djwong

On Mon, Jul 24, 2023 at 12:23:31PM +0100, Daniel Dao wrote:
> Hi again,
> 
> We had another example of xarray corruption involving xfs and zsmalloc. We are
> running zram as swap. We have 2 tasks deadlock waiting for page to be released

Do your problems on 6.1 go away if you stop using zram as swap?

-Dave.
-- 
Dave Chinner
david@fromorbit.com

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1
  2023-07-24 21:45   ` Dave Chinner
@ 2023-07-24 22:04     ` Daniel Dao
  2023-07-25  3:41     ` Matthew Wilcox
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel Dao @ 2023-07-24 22:04 UTC (permalink / raw)
  To: Dave Chinner
  Cc: linux-fsdevel, Matthew Wilcox, kernel-team, linux-kernel, djwong

On Mon, Jul 24, 2023 at 10:45 PM Dave Chinner <david@fromorbit.com> wrote:
>
> On Mon, Jul 24, 2023 at 12:23:31PM +0100, Daniel Dao wrote:
> > Hi again,
> >
> > We had another example of xarray corruption involving xfs and zsmalloc. We are
> > running zram as swap. We have 2 tasks deadlock waiting for page to be released
>
> Do your problems on 6.1 go away if you stop using zram as swap?

We had xarray corruptions even on nodes without swap, so I'm not sure
if swap matters.
The corruption on those nodes was noted in the first email, with the
following trace:

    BUG: kernel NULL pointer dereference, address: 0000000000000036
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 18806c5067 P4D 18806c5067 PUD 188ed48067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP NOPTI
    CPU: 73 PID: 3579408 Comm: prometheus Tainted: G           O 6.1.34-cloudflare-2023.6.7 #1
    Hardware name: GIGABYTE R162-Z12-CD1/MZ12-HD4-CD, BIOS M03 11/19/2021
    RIP: 0010:__filemap_get_folio (arch/x86/include/asm/atomic.h:29
        include/linux/atomic/atomic-arch-fallback.h:1242
        include/linux/atomic/atomic-arch-fallback.h:1267
        include/linux/atomic/atomic-instrumented.h:608
        include/linux/page_ref.h:238 include/linux/page_ref.h:247
        include/linux/page_ref.h:280 include/linux/page_ref.h:313
        mm/filemap.c:1863 mm/filemap.c:1915)

It's hard for us to run tests without zram swap at scale, since zram
brings significant benefits for a lot of our workloads.

Daniel.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1
  2023-07-24 21:45   ` Dave Chinner
  2023-07-24 22:04     ` Daniel Dao
@ 2023-07-25  3:41     ` Matthew Wilcox
  1 sibling, 0 replies; 10+ messages in thread
From: Matthew Wilcox @ 2023-07-25  3:41 UTC (permalink / raw)
  To: Dave Chinner; +Cc: Daniel Dao, linux-fsdevel, kernel-team, linux-kernel, djwong

On Tue, Jul 25, 2023 at 07:45:25AM +1000, Dave Chinner wrote:
> On Mon, Jul 24, 2023 at 12:23:31PM +0100, Daniel Dao wrote:
> > Hi again,
> > 
> > We had another example of xarray corruption involving xfs and zsmalloc. We are
> > running zram as swap. We have 2 tasks deadlock waiting for page to be released
> 
> Do your problems on 6.1 go away if you stop using zram as swap?

I think zram is the victim here, not the culprit.  I think what's
going on is that -- somehow -- there are stale pointers in the xarray.
zram allocates these pages (I suspect most of the memory in this machine
is allocated to zram or page cache) and then we blow up when finding
a folio in the page cache which has a ->mapping that is actually a
movable_ops structure.

But how do we get stale pointers in the xarray?  I've been worrying at
that problem for months.  At some point, the refcount must go down to
zero:

static inline void folio_put(struct folio *folio)
{
        if (folio_put_testzero(folio))
                __folio_put(folio);
}

(assume we're talking about a large folio; everything seems to point
that way):

__folio_put_large:
        if (!folio_test_hugetlb(folio))
                __page_cache_release(folio);
        destroy_large_folio(folio);

destroy_large_folio:
        free_transhuge_page()
free_transhuge_page:
        free_compound_page(page);
free_compound_page:
        free_the_page(page, compound_order(page));
free_the_page:
                __free_pages_ok(page, order, FPI_NONE);
__free_pages_ok:
        if (!free_pages_prepare(page, order, fpi_flags))
free_pages_prepare:
       if (PageMappingFlags(page))
                page->mapping = NULL;
(doesn't trigger; PageMappingFlags are false for page cache)
        if (is_check_pages_enabled()) {
                if (free_page_is_bad(page))
free_page_is_bad:
        if (likely(page_expected_state(page, PAGE_FLAGS_CHECK_AT_FREE)))
                return false;

        /* Something has gone sideways, find it */
        free_page_is_bad_report(page);
page_expected_state:
        if (unlikely((unsigned long)page->mapping | ...
                return false;

free_page_is_bad_report:
        bad_page(page,
                 page_bad_reason(page, PAGE_FLAGS_CHECK_AT_FREE));
page_bad_reason:
        if (unlikely(page->mapping != NULL))
                bad_reason = "non-NULL mapping";

So (assuming that Daniel has check_pages_enabled set and isn't ignoring
important parts of dmesg, which seem like reasonable assumptions), the
last put of a folio must be after the folio has had its ->mapping cleared.

But we remove the folio from the page cache in page_cache_delete(),
right before we set the mapping to NULL.  And again in
delete_from_page_cache_batch() (in the other order; I don't think that's
relevant?)

So where do we set folio->mapping to NULL without removing folio from
the XArray?  I'm beginning to suspect it's a mishandled failure in
split_huge_page(), so I'll re-review that code path tomorrow.
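
The invariant being argued for above can be sketched with a toy userspace
model. This is not kernel code: the Mapping and Folio classes, their fields
and the stale_entries() helper are hypothetical names made up for
illustration, and index 6592 is borrowed from the dump in the original
report. It only shows how an entry whose ->mapping was cleared without the
entry being removed from the index would be detected.

```python
class Folio:
    """Toy stand-in for struct folio (hypothetical, for illustration only)."""

    def __init__(self, index):
        self.index = index
        self.mapping = None


class Mapping:
    """Toy stand-in for struct address_space; i_pages is a plain dict here."""

    def __init__(self):
        self.i_pages = {}

    def add(self, folio):
        self.i_pages[folio.index] = folio
        folio.mapping = self

    def delete(self, folio):
        # Expected order: drop the index entry, then clear ->mapping.
        del self.i_pages[folio.index]
        folio.mapping = None


def stale_entries(mapping):
    """Indices still present in i_pages whose folio no longer points back."""
    return [index for index, folio in sorted(mapping.i_pages.items())
            if folio.mapping is not mapping]


if __name__ == "__main__":
    mapping = Mapping()
    good, bad = Folio(6528), Folio(6592)
    mapping.add(good)
    mapping.add(bad)

    mapping.delete(good)   # correct path: leaves nothing behind
    bad.mapping = None     # hypothetical buggy path: index entry survives

    print("stale indices:", stale_entries(mapping))
```

In this model the only way to get a stale entry is a path that clears
->mapping without going through delete(), which is the kind of path being
searched for.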

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1
  2023-07-21 10:49 Kernel NULL pointer deref and data corruptions with xfs on 6.1 Daniel Dao
  2023-07-24 11:23 ` Daniel Dao
@ 2023-07-27  3:27 ` Matthew Wilcox
  2023-07-27 10:25   ` Daniel Dao
  1 sibling, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2023-07-27  3:27 UTC (permalink / raw)
  To: Daniel Dao; +Cc: linux-fsdevel, Dave Chinner, kernel-team, linux-kernel, djwong

On Fri, Jul 21, 2023 at 11:49:04AM +0100, Daniel Dao wrote:
> We do not have a reproducer yet, but we now have more debugging data
> which hopefully
> should help narrow this down. Details as followed:
> 
> 1. Kernel NULL pointer deferencences in __filemap_get_folio
> 
> This happened on a few different hosts, with a few different repeated addresses.
> The addresses are 0000000000000036, 0000000000000076,
> 00000000000000f6. This looks
> like the xarray is corrupted and we were trying to do some work on a
> sibling entry.

I think I have a fix for this one.  Please try the attached.
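
As a rough check of the "sibling entry" reading, the three fault addresses
can be decoded by hand. The sketch below assumes that _refcount sits 0x34
bytes into struct page on this build (the folio dump in the report shows
->mapping at +0x18, which is consistent with that layout) and that
XA_CHUNK_SIZE is 64. The helper names are made up for illustration; only
the (v << 2) | 2 encoding is taken from xa_mk_internal() in
include/linux/xarray.h.

```python
# Assumption: _refcount is 0x34 bytes into struct page on this x86-64 build.
REFCOUNT_OFFSET = 0x34
# Assumption: XA_CHUNK_SHIFT == 6, so 64 slots per node.
XA_CHUNK_SIZE = 64


def xa_mk_internal(value):
    """Same encoding as xa_mk_internal() in include/linux/xarray.h."""
    return (value << 2) | 2


def sibling_slot(fault_addr):
    """Return the slot a sibling entry points at, or None if it is not one."""
    entry = fault_addr - REFCOUNT_OFFSET
    is_internal = (entry & 3) == 2
    if is_internal and entry < xa_mk_internal(XA_CHUNK_SIZE - 1):
        return entry >> 2
    return None


if __name__ == "__main__":
    for addr in (0x36, 0x76, 0xF6):
        entry = addr - REFCOUNT_OFFSET
        print(f"fault {addr:#04x} -> entry {entry:#04x} "
              f"-> sibling of slot {sibling_slot(addr)}")
```

Under those assumptions the three addresses decode to sibling entries for
slots 0, 16 and 48, i.e. a sibling entry was treated as a folio pointer and
its refcount field was read.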

> 2. Kernel NULL pointer deferencences in xfs_read_iomap_begin
> 
>     BUG: unable to handle page fault for address: 0000000000034668
>     #PF: supervisor read access in kernel mode
>     #PF: error_code(0x0000) - not-present page
>     PGD 11cfd37067 P4D 11cfd37067 PUD b88086067 PMD 0
>     Oops: 0000 [#1] PREEMPT SMP NOPTI
>     CPU: 124 PID: 3831226 Comm: rocksdb:low Kdump: loaded Tainted: G
>      W  O L     6.1.27-cloudflare-2023.5.0 #1
>     Hardware name: HYVE EDGE-METAL-GEN11/HS1811D_Lite, BIOS V0.11-sig 12/23/2022
>     RIP: 0010:xfs_read_iomap_begin (fs/xfs/xfs_iomap.c:1200)
>     Code: 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83 ec 50 48
> 89 14 24 4c 89 44 24 08 65 48 8b 04 25 28 00 00 00 48 89 44 24 48 <48>
> 8b 87 >
>     All code
>     ========
>       0:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
>       5:   41 57                   push   %r15
>       7:   41 56                   push   %r14
>       9:   41 55                   push   %r13
>       b:   41 54                   push   %r12
>       d:   55                      push   %rbp
>       e:   53                      push   %rbx
>       f:   48 83 ec 50             sub    $0x50,%rsp
>       13:   48 89 14 24             mov    %rdx,(%rsp)
>       17:   4c 89 44 24 08          mov    %r8,0x8(%rsp)
>       1c:   65 48 8b 04 25 28 00    mov    %gs:0x28,%rax
>       23:   00 00
>       25:   48 89 44 24 48          mov    %rax,0x48(%rsp)
>       2a:*  48                      rex.W           <-- trapping instruction
>       2b:   8b                      .byte 0x8b
>       2c:   87 00                   xchg   %eax,(%rax)
> 
>     Code starting with the faulting instruction
>     ===========================================
>       0:   48                      rex.W
>       1:   8b                      .byte 0x8b
>       2:   87 00                   xchg   %eax,(%rax)

This one is hard to understand because the decoding of the instruction
got cut off.  But ...

>     RSP: 0018:ffffa63810733a70 EFLAGS: 00010282
>     RAX: 78ac714f0997e100 RBX: ffffa63810733b40 RCX: 0000000000000000
>     RDX: 0000000000004000 RSI: 0000000000000000 RDI: 00000000000347a0

RDI is kind of close to the fault address ... RDI is used as the first
argument in the x86-64 SYSV ABI, and the first parameter to
xfs_read_iomap_begin() is supposed to be a struct inode pointer.

I don't think this is related.
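
For what it's worth, the visible bytes at the trapping instruction
(<48> 8b 87) are enough to recover the addressing form, even though the
displacement bytes are cut off in the oops. A small sketch, with helper
names invented for illustration: it splits the ModRM byte 0x87 into its
fields and computes the displacement implied by CR2 and RDI, assuming the
faulting access really is this load.

```python
# Register names for ModRM fields when no REX.R/REX.B bit is set
# (the 0x48 prefix sets only REX.W).
REGS64 = ("rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi")


def decode_modrm(byte):
    """Split a ModRM byte into (mod, reg, rm)."""
    return byte >> 6, (byte >> 3) & 7, byte & 7


def implied_displacement(cr2, base):
    """Signed displacement that would make base + disp equal the fault address."""
    return cr2 - base


if __name__ == "__main__":
    mod, reg, rm = decode_modrm(0x87)
    # Opcode 0x8b with REX.W is a 64-bit MOV from r/m into a register;
    # mod == 2 means a 32-bit displacement follows (rm != 4, so no SIB byte).
    print(f"mod={mod} reg={REGS64[reg]} rm={REGS64[rm]}")
    disp = implied_displacement(0x34668, 0x347A0)  # CR2, RDI from the oops
    print(f"implied displacement: {disp:#x}")
```

So the instruction has the form mov disp32(%rdi),%rax, and the oops values
imply a displacement of -0x138 from RDI, which fits a load relative to the
first argument rather than something unrelated to it.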

> We also have a deadlock reading a very specific file on this host. We managed to
> do a kdump on this host and extracted out the state of the mapping.

This is almost certainly a different bug, but also XArray related, so
I'll keep looking at this one.
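
For reference, the .flags value in the folio dump from the original report
can be cross-checked against the decode_page_flags() output shown there.
The sketch below only knows the three bit positions that output names
(PG_locked = 0, PG_waiters = 7, PG_head = 16, which is an assumption about
the 6.1 enum pageflags layout) and treats everything above the low 17 bits
as non-flag fields, which is also an assumption.

```python
# Assumption: bit positions as in the 6.1 enum pageflags; only the three
# flags named in the report's decode_page_flags() output are listed.
KNOWN_FLAG_BITS = {0: "PG_locked", 7: "PG_waiters", 16: "PG_head"}

# Assumption: bits above this mask hold non-flag fields (node, zone, ...).
LOW_FLAGS_MASK = (1 << 17) - 1


def decode_known_flags(flags):
    """Join the names of the known flag bits that are set in flags."""
    return "|".join(name for bit, name in sorted(KNOWN_FLAG_BITS.items())
                    if (flags >> bit) & 1)


if __name__ == "__main__":
    flags = 13510764522438785  # .flags from the folio dump
    print(hex(flags))
    print(hex(flags & LOW_FLAGS_MASK))
    print(decode_known_flags(flags))
```

The low bits come out as 0x10081, which matches the three flags reported:
the folio is locked, has waiters, and is the head of a large folio, while
its ->mapping is NULL.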

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1
  2023-07-27  3:27 ` Matthew Wilcox
@ 2023-07-27 10:25   ` Daniel Dao
  2023-07-27 12:27     ` Matthew Wilcox
  0 siblings, 1 reply; 10+ messages in thread
From: Daniel Dao @ 2023-07-27 10:25 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: linux-fsdevel, Dave Chinner, kernel-team, linux-kernel, djwong

On Thu, Jul 27, 2023 at 4:27 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Jul 21, 2023 at 11:49:04AM +0100, Daniel Dao wrote:
> > We do not have a reproducer yet, but we now have more debugging data
> > which hopefully
> > should help narrow this down. Details as followed:
> >
> > 1. Kernel NULL pointer deferencences in __filemap_get_folio
> >
> > This happened on a few different hosts, with a few different repeated addresses.
> > The addresses are 0000000000000036, 0000000000000076,
> > 00000000000000f6. This looks
> > like the xarray is corrupted and we were trying to do some work on a
> > sibling entry.
>
> I think I have a fix for this one.  Please try the attached.

For some reason I do not see the attached patch. Can you resend it, or
is it the same
one as in https://bugzilla.kernel.org/show_bug.cgi?id=216646#c31 ?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1
  2023-07-27 10:25   ` Daniel Dao
@ 2023-07-27 12:27     ` Matthew Wilcox
  2023-08-04 16:57       ` Frederick Lawler
  0 siblings, 1 reply; 10+ messages in thread
From: Matthew Wilcox @ 2023-07-27 12:27 UTC (permalink / raw)
  To: Daniel Dao; +Cc: linux-fsdevel, Dave Chinner, kernel-team, linux-kernel, djwong

On Thu, Jul 27, 2023 at 11:25:33AM +0100, Daniel Dao wrote:
> On Thu, Jul 27, 2023 at 4:27 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Fri, Jul 21, 2023 at 11:49:04AM +0100, Daniel Dao wrote:
> > > We do not have a reproducer yet, but we now have more debugging data
> > > which hopefully
> > > should help narrow this down. Details as followed:
> > >
> > > 1. Kernel NULL pointer deferencences in __filemap_get_folio
> > >
> > > This happened on a few different hosts, with a few different repeated addresses.
> > > The addresses are 0000000000000036, 0000000000000076,
> > > 00000000000000f6. This looks
> > > like the xarray is corrupted and we were trying to do some work on a
> > > sibling entry.
> >
> > I think I have a fix for this one.  Please try the attached.
> 
> For some reason I do not see the attached patch. Can you resend it, or
> is it the same
> one as in https://bugzilla.kernel.org/show_bug.cgi?id=216646#c31 ?

Yes, that's the one, sorry.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1
  2023-07-27 12:27     ` Matthew Wilcox
@ 2023-08-04 16:57       ` Frederick Lawler
  2023-08-30 19:26         ` Frederick Lawler
  0 siblings, 1 reply; 10+ messages in thread
From: Frederick Lawler @ 2023-08-04 16:57 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Daniel Dao, linux-fsdevel, Dave Chinner, kernel-team,
	linux-kernel, djwong

Hi Matthew,

On Thu, Jul 27, 2023 at 01:27:56PM +0100, Matthew Wilcox wrote:
> On Thu, Jul 27, 2023 at 11:25:33AM +0100, Daniel Dao wrote:
> > On Thu, Jul 27, 2023 at 4:27 AM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Fri, Jul 21, 2023 at 11:49:04AM +0100, Daniel Dao wrote:
> > > > We do not have a reproducer yet, but we now have more debugging data
> > > > which hopefully
> > > > should help narrow this down. Details as followed:
> > > >
> > > > 1. Kernel NULL pointer deferencences in __filemap_get_folio
> > > >
> > > > This happened on a few different hosts, with a few different repeated addresses.
> > > > The addresses are 0000000000000036, 0000000000000076,
> > > > 00000000000000f6. This looks
> > > > like the xarray is corrupted and we were trying to do some work on a
> > > > sibling entry.
> > >
> > > I think I have a fix for this one.  Please try the attached.
> > 
> > For some reason I do not see the attached patch. Can you resend it, or
> > is it the same
> > one as in https://bugzilla.kernel.org/show_bug.cgi?id=216646#c31 ?
> 
> Yes, that's the one, sorry.

I set up a kernel with this patch to deploy. It'll take some time to
see any results from that. I did run your multiorder.c changes with and
without the change to lib/xarray.c, and that seemed to work as intended. I
didn't see any regressions across multiple seeds with our kernel config.

Fred

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Kernel NULL pointer deref and data corruptions with xfs on 6.1
  2023-08-04 16:57       ` Frederick Lawler
@ 2023-08-30 19:26         ` Frederick Lawler
  0 siblings, 0 replies; 10+ messages in thread
From: Frederick Lawler @ 2023-08-30 19:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Daniel Dao, linux-fsdevel, Dave Chinner, kernel-team,
	linux-kernel, djwong

Hi Matthew,

On Fri, Aug 04, 2023 at 11:57:22AM -0500, Frederick Lawler wrote:
> Hi Matthew,
> 
> On Thu, Jul 27, 2023 at 01:27:56PM +0100, Matthew Wilcox wrote:
> > On Thu, Jul 27, 2023 at 11:25:33AM +0100, Daniel Dao wrote:
> > > On Thu, Jul 27, 2023 at 4:27 AM Matthew Wilcox <willy@infradead.org> wrote:
> > > >
> > > > On Fri, Jul 21, 2023 at 11:49:04AM +0100, Daniel Dao wrote:
> > > > > We do not have a reproducer yet, but we now have more debugging data
> > > > > which hopefully
> > > > > should help narrow this down. Details as followed:
> > > > >
> > > > > 1. Kernel NULL pointer deferencences in __filemap_get_folio
> > > > >
> > > > > This happened on a few different hosts, with a few different repeated addresses.
> > > > > The addresses are 0000000000000036, 0000000000000076,
> > > > > 00000000000000f6. This looks
> > > > > like the xarray is corrupted and we were trying to do some work on a
> > > > > sibling entry.
> > > >
> > > > I think I have a fix for this one.  Please try the attached.
> > > 
> > > For some reason I do not see the attached patch. Can you resend it, or
> > > is it the same
> > > one as in https://bugzilla.kernel.org/show_bug.cgi?id=216646#c31 ?
> > 
> > Yes, that's the one, sorry.
> 
> I setup a kernel with this patch to deploy out. It'll take some time to
> see any results from that. I did run your multiorder.c changes with/without
> the change to lib/xarray.c and that seemed to work as intended. I didn't see
> any regressions across multiple seeds with our kernel config.
> 
> Fred

We deployed the xarray lib fix to our fleet and haven't noticed any more
issues cropping up with respect to this error, among other oddities. LGTM

Fred

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-08-30 20:30 UTC | newest]

Thread overview: 10+ messages
2023-07-21 10:49 Kernel NULL pointer deref and data corruptions with xfs on 6.1 Daniel Dao
2023-07-24 11:23 ` Daniel Dao
2023-07-24 21:45   ` Dave Chinner
2023-07-24 22:04     ` Daniel Dao
2023-07-25  3:41     ` Matthew Wilcox
2023-07-27  3:27 ` Matthew Wilcox
2023-07-27 10:25   ` Daniel Dao
2023-07-27 12:27     ` Matthew Wilcox
2023-08-04 16:57       ` Frederick Lawler
2023-08-30 19:26         ` Frederick Lawler