* kernel BUG at include/linux/swapops.h:204!
@ 2021-07-10  7:33 Igor Raits
  2021-07-10 12:46 ` Hillf Danton
  2021-07-11  4:17 ` Hugh Dickins
  0 siblings, 2 replies; 24+ messages in thread
From: Igor Raits @ 2021-07-10  7:33 UTC (permalink / raw)
  To: linux-mm, Andrew Morton

Hello,

I've seen one weird bug on 5.12.14 that happened a couple of times when I
started a bunch of VMs on a server.

I've briefly googled this problem but could not find any relevant commit
that would fix this issue.

Do you have any hint how to debug this further or know the fix by any
chance?

Thanks in advance. Stack trace following:

[  376.876610] ------------[ cut here ]------------
[  376.881274] kernel BUG at include/linux/swapops.h:204!
[  376.886455] invalid opcode: 0000 [#1] SMP NOPTI
[  376.891014] CPU: 40 PID: 11775 Comm: rpc-worker Tainted: G            E     5.12.14-1.gdc.el8.x86_64 #1
[  376.900464] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 05/24/2021
[  376.909038] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
[  376.914562] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 00 f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
[  376.933443] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
[  376.938701] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX: ffffffffffffffff
[  376.945878] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI: fffff497473b2ae8
[  376.953055] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09: 0000000000000000
[  376.960230] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000af8
[  376.967407] R13: 0400000000000000 R14: 0400000000000080 R15: ffff908bbef7b6a8
[  376.974582] FS:  00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000) knlGS:0000000000000000
[  376.982718] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  376.988497] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4: 00000000007726e0
[  376.995673] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  377.002849] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  377.010026] PKRU: 55555554
[  377.012745] Call Trace:
[  377.015207]  __handle_mm_fault+0x5ad/0x6e0
[  377.019335]  handle_mm_fault+0xc5/0x290
[  377.023194]  do_user_addr_fault+0x1cd/0x740
[  377.027406]  exc_page_fault+0x54/0x110
[  377.031182]  ? asm_exc_page_fault+0x8/0x30
[  377.035307]  asm_exc_page_fault+0x1e/0x30
[  377.039340] RIP: 0033:0x7f5bb91d6734
[  377.042937] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00 31 c0 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39 d2 74 22 <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00 00 c7
[  377.061820] RSP: 002b:00007f5bb1f7ff58 EFLAGS: 00010206
[  377.067076] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007f5ba0000020
[  377.074255] RDX: 00007f5b2bfff700 RSI: 00007f5b2bfff9c0 RDI: 0000000000000001
[  377.081429] RBP: 0000000000000001 R08: 0000000000000000 R09: 00007f5bb93ea2f0
[  377.088606] R10: 00007f5bb1f81700 R11: 0000000000000202 R12: 0000000000000001
[  377.095782] R13: 0000000000000006 R14: 0000000000000cb4 R15: 00007f5bb1f801f0
[  377.102958] Modules linked in: ebt_arp(E) nft_meta_bridge(E)
ip6_tables(E) xt_CT(E) nf_log_ipv4(E) nf_log_common(E) nft_limit(E)
nft_counter(E) xt_LOG(E) xt_limit(E) xt_mac(E) xt_set(E) xt_multiport(E)
xt_state(E) xt_conntrack(E) xt_comment(E) xt_physdev(E) nft_compat(E)
ip_set_hash_net(E) ip_set(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E)
tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E) nf_tables(E)
vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E) binfmt_misc(E)
iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E) bonding(E) tls(E)
vfat(E) fat(E) dm_service_time(E) dm_multipath(E) rpcrdma(E) sunrpc(E)
rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E) target_core_mod(E)
ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E) scsi_transport_iscsi(E)
intel_rapl_msr(E) qedr(E) intel_rapl_common(E) ib_uverbs(E)
isst_if_common(E) ib_core(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E)
intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E)
crct10dif_pclmul(E)
[  377.102999]  crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E)
intel_cstate(E) ipmi_ssif(E) acpi_ipmi(E) ipmi_si(E) mei_me(E) ioatdma(E)
ipmi_devintf(E) dm_mod(E) ses(E) intel_uncore(E) pcspkr(E) qede(E)
enclosure(E) tg3(E) mei(E) lpc_ich(E) hpilo(E) hpwdt(E)
intel_pch_thermal(E) dca(E) ipmi_msghandler(E) acpi_power_meter(E) ext4(E)
mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) sg(E) qedf(E) qed(E) crc8(E)
libfcoe(E) libfc(E) smartpqi(E) scsi_transport_fc(E) scsi_transport_sas(E)
wmi(E) nf_conntrack(E) nf_defrag_ipv6(E) libcrc32c(E) crc32c_intel(E)
nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E)
[  377.243468] ---[ end trace 04bce3bb051f7620 ]---
[  377.385645] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
[  377.391194] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 00 f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
[  377.410091] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
[  377.415355] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX: ffffffffffffffff
[  377.422540] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI: fffff497473b2ae8
[  377.429721] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09: 0000000000000000
[  377.436902] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000af8
[  377.444086] R13: 0400000000000000 R14: 0400000000000080 R15: ffff908bbef7b6a8
[  377.451272] FS:  00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000) knlGS:0000000000000000
[  377.459415] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  377.465196] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4: 00000000007726e0
[  377.472377] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  377.479556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  377.486738] PKRU: 55555554
[  377.489465] Kernel panic - not syncing: Fatal exception
[  377.573911] Kernel Offset: 0xa000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  377.716482] ---[ end Kernel panic - not syncing: Fatal exception ]---
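
For reading the trace: the byte marked `<0f>` in the kernel-side Code: line, followed by `0b`, is the x86 `ud2` instruction, which is how BUG() traps on x86 and why the oops says "invalid opcode". A minimal sketch of that check is below. The helper name `code_line_traps_on_ud2` is invented for illustration and is not part of any kernel tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return true if the byte marked <xx> in an oops "Code:" line, together
 * with the byte that follows it, is the x86 ud2 encoding (0f 0b).
 * Hypothetical helper, for illustration only.
 */
static bool code_line_traps_on_ud2(const char *code)
{
	const char *mark;
	char *end, *end2;
	unsigned long first, second;

	assert(code != NULL);

	/* The byte at the faulting RIP is wrapped in <...>. */
	mark = strchr(code, '<');
	if (!mark)
		return false;

	first = strtoul(mark + 1, &end, 16);
	if (end == mark + 1 || *end != '>')
		return false;

	/* strtoul() skips the space that separates the hex bytes. */
	second = strtoul(end + 1, &end2, 16);
	if (end2 == end + 1)
		return false;

	return first == 0x0f && second == 0x0b;
}
```

Given the tail of the kernel Code: line above (`... ff ff ff <0f> 0b 48 ...`) the helper returns true. For the user-space Code: line (`... 74 22 <48> 8b 96 ...`) it returns false, since that marked byte is just the instruction that took the page fault.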


* Re: kernel BUG at include/linux/swapops.h:204!
  2021-07-10  7:33 kernel BUG at include/linux/swapops.h:204! Igor Raits
@ 2021-07-10 12:46 ` Hillf Danton
  2021-07-11  4:17 ` Hugh Dickins
  1 sibling, 0 replies; 24+ messages in thread
From: Hillf Danton @ 2021-07-10 12:46 UTC (permalink / raw)
  To: Igor Raits; +Cc: linux-mm, syzbot, Andrew Morton

On Sat, 10 Jul 2021 09:33:26 +0200 Igor Raits wrote:
>Hello,
>
>I've seen one weird bug on 5.12.14 that happened a couple of times when I
>started a bunch of VMs on a server.

Thanks for your report.

>
>I've briefly googled this problem but could not find any relevant commit
>that would fix this issue.

I am not sure this is the first report; syzbot sent a similar one [0].

[0] https://lore.kernel.org/linux-mm/00000000000045ff9505c1cfc9ae@google.com/

>
>Do you have any hint how to debug this further or know the fix by any
>chance?

This report has more information about the BUG: in pmd_migration_entry_wait()
the huge migration entry (hme) is checked under the page table lock (ptl).
On the updater side, the hme should also be set and removed with the ptl
held; see the diff below.

>
>Thanks in advance. Stack trace following:
>
>[  376.876610] ------------[ cut here ]------------
>[  376.881274] kernel BUG at include/linux/swapops.h:204!
>[  376.886455] invalid opcode: 0000 [#1] SMP NOPTI
>[  376.891014] CPU: 40 PID: 11775 Comm: rpc-worker Tainted: G            E
>    5.12.14-1.gdc.el8.x86_64 #1
>[  376.900464] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380
>Gen10, BIOS U30 05/24/2021
>[  376.909038] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
>[  376.914562] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2 00
>f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff
><0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
>[  376.933443] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
>[  376.938701] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX:
>ffffffffffffffff
>[  376.945878] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI:
>fffff497473b2ae8
>[  376.953055] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09:
>0000000000000000
>[  376.960230] R10: 0000000000000000 R11: 0000000000000000 R12:
>0000000000000af8
>[  376.967407] R13: 0400000000000000 R14: 0400000000000080 R15:
>ffff908bbef7b6a8
>[  376.974582] FS:  00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000)
>knlGS:0000000000000000
>[  376.982718] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>[  376.988497] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4:
>00000000007726e0
>[  376.995673] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
>0000000000000000
>[  377.002849] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
>0000000000000400
>[  377.010026] PKRU: 55555554
>[  377.012745] Call Trace:
>[  377.015207]  __handle_mm_fault+0x5ad/0x6e0
>[  377.019335]  handle_mm_fault+0xc5/0x290
>[  377.023194]  do_user_addr_fault+0x1cd/0x740
>[  377.027406]  exc_page_fault+0x54/0x110
>[  377.031182]  ? asm_exc_page_fault+0x8/0x30
>[  377.035307]  asm_exc_page_fault+0x1e/0x30


+++ x/mm/huge_memory.c
@@ -2983,6 +2983,7 @@ void set_pmd_migration_entry(struct page
 	struct vm_area_struct *vma = pvmw->vma;
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address = pvmw->address;
+	spinlock_t *ptl;
 	pmd_t pmdval;
 	swp_entry_t entry;
 	pmd_t pmdswp;
@@ -2998,7 +2999,9 @@ void set_pmd_migration_entry(struct page
 	pmdswp = swp_entry_to_pmd(entry);
 	if (pmd_soft_dirty(pmdval))
 		pmdswp = pmd_swp_mksoft_dirty(pmdswp);
+	ptl = pmd_lock(mm, pvmw->pmd);
 	set_pmd_at(mm, address, pvmw->pmd, pmdswp);
+	spin_unlock(ptl);
 	page_remove_rmap(page, true);
 	put_page(page);
 }
@@ -3009,6 +3012,7 @@ void remove_migration_pmd(struct page_vm
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address = pvmw->address;
 	unsigned long mmun_start = address & HPAGE_PMD_MASK;
+	spinlock_t *ptl;
 	pmd_t pmde;
 	swp_entry_t entry;
 
@@ -3028,7 +3032,9 @@ void remove_migration_pmd(struct page_vm
 		page_add_anon_rmap(new, vma, mmun_start, true);
 	else
 		page_add_file_rmap(new, true);
+	ptl = pmd_lock(mm, pvmw->pmd);
 	set_pmd_at(mm, mmun_start, pvmw->pmd, pmde);
+	spin_unlock(ptl);
 	if ((vma->vm_flags & VM_LOCKED) && !PageDoubleMap(new))
 		mlock_vma_page(new);
 	update_mmu_cache_pmd(vma, address, pvmw->pmd);



* Re: kernel BUG at include/linux/swapops.h:204!
  2021-07-10  7:33 kernel BUG at include/linux/swapops.h:204! Igor Raits
  2021-07-10 12:46 ` Hillf Danton
@ 2021-07-11  4:17 ` Hugh Dickins
  2021-07-11  6:06   ` Igor Raits
  1 sibling, 1 reply; 24+ messages in thread
From: Hugh Dickins @ 2021-07-11  4:17 UTC (permalink / raw)
  To: Igor Raits; +Cc: linux-mm, Andrew Morton, Hillf Danton

On Sat, 10 Jul 2021, Igor Raits wrote:

> Hello,
> 
> I've seen one weird bug on 5.12.14 that happened a couple of times when I
> started a bunch of VMs on a server.

Would it be possible for you to try the same on a 5.12.13 kernel?
Perhaps by reverting the diff between 5.12.13 and 5.12.14 temporarily.
Enough to form an impression of whether the issue is new in 5.12.14.

I ask because 5.12.14 did include several fixes and cleanups from me
to page_vma_mapped_walk(), and that is involved in inserting and
removing pmd migration entries.  I am not aware of introducing any
bug there, but your report has got me worried.  If it's happening in
5.12.14 but not in 5.12.13, then I must look again at my changes.

I don't expect Hillf's patch to help at all: the pmd_lock()
is supposed to be taken by page_vma_mapped_walk(), before
set_pmd_migration_entry() and remove_migration_pmd() are called.
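
To illustrate that convention: the sketch below is a user-space toy model, not kernel code, and every name in it (toy_lock, toy_pmd, walk_and_set and the two setters) is invented for the example. It assumes the walker already holds a non-recursive lock when it calls the setter. Under that assumption, a setter that takes the same lock again gains no protection and cannot make progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a non-recursive lock such as the PMD page table lock. */
struct toy_lock {
	bool held;
};

/* Hypothetical stand-in for one PMD slot protected by that lock. */
struct toy_pmd {
	struct toy_lock ptl;
	unsigned long val;
};

static bool toy_trylock(struct toy_lock *lock)
{
	if (lock->held)
		return false;
	lock->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

/* Setter that follows the convention: it relies on the caller's lock. */
static void set_entry_caller_locked(struct toy_pmd *pmd, unsigned long val)
{
	assert(pmd->ptl.held);
	pmd->val = val;
}

/* Setter that takes the lock itself; fails if the caller already holds it. */
static bool set_entry_self_locked(struct toy_pmd *pmd, unsigned long val)
{
	if (!toy_trylock(&pmd->ptl))
		return false;	/* a real spinlock would spin here instead */
	pmd->val = val;
	toy_unlock(&pmd->ptl);
	return true;
}

/* Walker: takes the lock, calls one setter, drops the lock. */
static bool walk_and_set(struct toy_pmd *pmd, unsigned long val,
			 bool callee_relocks)
{
	bool ok = true;

	if (!toy_trylock(&pmd->ptl))
		return false;
	if (callee_relocks)
		ok = set_entry_self_locked(pmd, val);
	else
		set_entry_caller_locked(pmd, val);
	toy_unlock(&pmd->ptl);
	return ok;
}
```

With a zero-initialised toy_pmd, walk_and_set(&pmd, 42, false) stores the value, while walk_and_set(&pmd, 7, true) returns false and leaves the slot unchanged.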

Thanks,
Hugh




* Re: kernel BUG at include/linux/swapops.h:204!
  2021-07-11  4:17 ` Hugh Dickins
@ 2021-07-11  6:06   ` Igor Raits
  2021-07-15 17:47     ` Igor Raits
  0 siblings, 1 reply; 24+ messages in thread
From: Igor Raits @ 2021-07-11  6:06 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, Andrew Morton, Hillf Danton

Hi Hugh,

On Sun, Jul 11, 2021 at 6:17 AM Hugh Dickins <hughd@google.com> wrote:

> On Sat, 10 Jul 2021, Igor Raits wrote:
>
> > Hello,
> >
> > I've seen one weird bug on 5.12.14 that happened a couple of times when I
> > started a bunch of VMs on a server.
>
> Would it be possible for you to try the same on a 5.12.13 kernel?
> Perhaps by reverting the diff between 5.12.13 and 5.12.14 temporarily.
> Enough to form an impression of whether the issue is new in 5.12.14.
>

We've been using 5.12.12 for quite some time (about a month) and I never saw
it there.

But I have to admit that I don't really have a reproducer. For example, on
the servers where it happened, I just rebooted them and the panic did not
happen again (so I saw it only once, and only on 2 of the 32 servers that
we have on 5.12.14).


> I ask because 5.12.14 did include several fixes and cleanups from me
> to page_vma_mapped_walk(), and that is involved in inserting and
> removing pmd migration entries.  I am not aware of introducing any
> bug there, but your report has got me worried.  If it's happening in
> 5.12.14 but not in 5.12.13, then I must look again at my changes.
>
> I don't expect Hillf's patch to help at all: the pmd_lock()
> is supposed to be taken by page_vma_mapped_walk(), before
> set_pmd_migration_entry() and remove_migration_pmd() are called.
>
> Thanks,
> Hugh
>


-- 

Igor Raits

Sr. SW Engineer

igor@gooddata.com

+420 775 117 817

Moravske namesti 1007/14

602 00 Brno-Veveri, Czech Republic


* Re: kernel BUG at include/linux/swapops.h:204!
  2021-07-11  6:06   ` Igor Raits
@ 2021-07-15 17:47     ` Igor Raits
  2021-07-16 19:45       ` Hugh Dickins
  0 siblings, 1 reply; 24+ messages in thread
From: Igor Raits @ 2021-07-15 17:47 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, Andrew Morton, Hillf Danton

Hi everyone again,

I've been trying to reproduce this issue but still can't find a consistent
pattern.

However, it did happen once more and this time on 5.13.1:

[  222.068216] ------------[ cut here ]------------
[  222.072884] kernel BUG at include/linux/swapops.h:204!
[  222.078062] invalid opcode: 0000 [#1] SMP NOPTI
[  222.082618] CPU: 38 PID: 9828 Comm: rpc-worker Tainted: G            E     5.13.1-1.gdc.el8.x86_64 #1
[  222.091894] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380 Gen10, BIOS U30 05/24/2021
[  222.100468] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
[  222.105994] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c5 f6 ff 48 81 e2 00 f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff <0f> 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
[  222.124878] RSP: 0000:ffffbcfe9eb7bdd8 EFLAGS: 00010246
[  222.130134] RAX: 0057ffffc0000000 RBX: ffff9eec4b3cdbf8 RCX: ffffffffffffffff
[  222.137309] RDX: 0000000000000000 RSI: ffff9eec4b3cdbf8 RDI: ffffdf55c52cf368
[  222.144485] RBP: ffffdf55c52cf368 R08: ffffdf57428d8080 R09: 0000000000000000
[  222.151661] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000bf8
[  222.158837] R13: 0400000000000000 R14: 0400000000000080 R15: ffff9eec2825b1f8
[  222.166015] FS:  00007f6754aeb700(0000) GS:ffff9f49bfd00000(0000) knlGS:0000000000000000
[  222.174153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  222.179932] CR2: 00007f676ffffd98 CR3: 000000012bf6a002 CR4: 00000000007726e0
[  222.187109] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  222.194283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  222.201457] PKRU: 55555554
[  222.204178] Call Trace:
[  222.206638]  __handle_mm_fault+0x5ad/0x6e0
[  222.210760]  ? sysvec_call_function_single+0xb/0x90
[  222.215672]  handle_mm_fault+0xc5/0x290
[  222.219529]  do_user_addr_fault+0x1a9/0x660
[  222.223740]  ? sched_clock_cpu+0xc/0xa0
[  222.227602]  exc_page_fault+0x68/0x130
[  222.231373]  ? asm_exc_page_fault+0x8/0x30
[  222.235495]  asm_exc_page_fault+0x1e/0x30
[  222.239526] RIP: 0033:0x7f67baaed734
[  222.243120] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00 31 c0 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39 d2 74 22 <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00 00 c7
[  222.262002] RSP: 002b:00007f6754aea298 EFLAGS: 00010287
[  222.267257] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[  222.274432] RDX: 00007f676ffff700 RSI: 00007f676ffff9c0 RDI: 00007f676f7fec10
[  222.281609] RBP: 0000000000000001 R08: 00007f676f7fed10 R09: 00007f67bad012f0
[  222.288785] R10: 00007f6754aeb700 R11: 0000000000000202 R12: 0000000000000001
[  222.295961] R13: 0000000000000006 R14: 0000000000000e28 R15: 00007f674006e1f0
[  222.303137] Modules linked in: vhost_net(E) vhost(E) vhost_iotlb(E)
tap(E) tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E)
nf_tables(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E)
binfmt_misc(E) iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E)
bonding(E) tls(E) vfat(E) fat(E) dm_service_time(E) dm_multipath(E)
rpcrdma(E) sunrpc(E) rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E)
target_core_mod(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E)
intel_rapl_msr(E) intel_rapl_common(E) scsi_transport_iscsi(E)
isst_if_common(E) ipmi_ssif(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E)
intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E)
crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E) qedr(E)
mei_me(E) acpi_ipmi(E) ib_uverbs(E) intel_cstate(E) ipmi_si(E) ib_core(E)
ipmi_devintf(E) dm_mod(E) ioatdma(E) ses(E) intel_uncore(E) pcspkr(E)
enclosure(E) mei(E) hpwdt(E) hpilo(E) lpc_ich(E) intel_pch_thermal(E)
dca(E) ipmi_msghandler(E)
[  222.303181]  acpi_power_meter(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E)
t10_pi(E) sg(E) qedf(E) qede(E) libfcoe(E) qed(E) libfc(E) smartpqi(E)
scsi_transport_fc(E) tg3(E) scsi_transport_sas(E) crc8(E) wmi(E)
nf_conntrack(E) libcrc32c(E) crc32c_intel(E) nf_defrag_ipv6(E)
nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E)
[  222.420050] ---[ end trace bcf7b6d1610cc21f ]---
[  222.572925] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
[  222.578469] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c5 f6 ff 48 81 e2 00 f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff <0f> 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
[  222.597359] RSP: 0000:ffffbcfe9eb7bdd8 EFLAGS: 00010246
[  222.602620] RAX: 0057ffffc0000000 RBX: ffff9eec4b3cdbf8 RCX: ffffffffffffffff
[  222.609807] RDX: 0000000000000000 RSI: ffff9eec4b3cdbf8 RDI: ffffdf55c52cf368
[  222.616990] RBP: ffffdf55c52cf368 R08: ffffdf57428d8080 R09: 0000000000000000
[  222.624177] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000bf8
[  222.631361] R13: 0400000000000000 R14: 0400000000000080 R15: ffff9eec2825b1f8
[  222.638548] FS:  00007f6754aeb700(0000) GS:ffff9f49bfd00000(0000) knlGS:0000000000000000
[  222.646694] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  222.652481] CR2: 00007f676ffffd98 CR3: 000000012bf6a002 CR4: 00000000007726e0
[  222.659665] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  222.666850] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  222.674031] PKRU: 55555554
[  222.676758] Kernel panic - not syncing: Fatal exception
[  222.817538] Kernel Offset: 0x16000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  222.965540] ---[ end Kernel panic - not syncing: Fatal exception ]---
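
A note on reading the trace above: the byte pair marked `<0f> 0b` in the Code: lines is the x86 `ud2` instruction, which is what BUG() executes on x86 and why the oops is reported as "invalid opcode". The following is a minimal sketch, in Python, of pulling the marked bytes out of a Code: line; the helper names `faulting_bytes` and `is_ud2` are hypothetical and not part of any kernel tooling.

```python
import re

# Byte pair that encodes the x86 `ud2` instruction used by BUG().
UD2 = ("0f", "0b")


def faulting_bytes(code_line):
    """Split an oops "Code:" line at the byte wrapped in <..>.

    Returns (bytes_before_rip, bytes_from_rip). The <..> byte is the
    first byte of the instruction that RIP pointed at.
    """
    tokens = code_line.split("Code:", 1)[1].split()
    for i, tok in enumerate(tokens):
        m = re.fullmatch(r"<([0-9a-f]{2})>", tok)
        if m:
            return tokens[:i], [m.group(1)] + tokens[i + 1:]
    raise ValueError("no <..> marker found in Code: line")


def is_ud2(code_line):
    """True if the instruction at RIP starts with the ud2 encoding."""
    _, at_fault = faulting_bytes(code_line)
    return tuple(at_fault[:2]) == UD2


# Code: line from the 5.13.1 report, rejoined into one string.
code = ("[  222.578469] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c5 f6 ff 48 81 e2 00 "
        "f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff "
        "<0f> 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48")
print(is_ud2(code))
```

This only confirms that the trap is a BUG()-style assertion; which condition failed still has to come from the source line named in the report (include/linux/swapops.h:204) and the register contents.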

On Sun, Jul 11, 2021 at 8:06 AM Igor Raits <igor@gooddata.com> wrote:

> Hi Hugh,
>
> On Sun, Jul 11, 2021 at 6:17 AM Hugh Dickins <hughd@google.com> wrote:
>
>> On Sat, 10 Jul 2021, Igor Raits wrote:
>>
>> > Hello,
>> >
>> > I've seen one weird bug on 5.12.14 that happened a couple of times when I
>> > started a bunch of VMs on a server.
>>
>> Would it be possible for you to try the same on a 5.12.13 kernel?
>> Perhaps by reverting the diff between 5.12.13 and 5.12.14 temporarily.
>> Enough to form an impression of whether the issue is new in 5.12.14.
>>
>
> We've been using 5.12.12 for quite some time (~ a month) and I never saw
> it there.
>
> But I have to admit that I don't really have a reproducer. For example, on the
> servers where it happened, I just rebooted them and the panic did not happen
> again (so I saw it only once, and only on 2 servers out of the 32 that we have
> on 5.12.14).
>
>
>> I ask because 5.12.14 did include several fixes and cleanups from me
>> to page_vma_mapped_walk(), and that is involved in inserting and
>> removing pmd migration entries.  I am not aware of introducing any
>> bug there, but your report has got me worried.  If it's happening in
>> 5.12.14 but not in 5.12.13, then I must look again at my changes.
>>
>> I don't expect Hillf's patch to help at all: the pmd_lock()
>> is supposed to be taken by page_vma_mapped_walk(), before
>> set_pmd_migration_entry() and remove_migration_pmd() are called.
>>
>> Thanks,
>> Hugh


-- 

Igor Raits
Sr. SW Engineer
igor@gooddata.com
+420 775 117 817
Moravske namesti 1007/14
602 00 Brno-Veveri, Czech Republic

Twitter <https://twitter.com/gooddata> | Facebook
<https://www.facebook.com/gooddata> | LinkedIn
<http://www.linkedin.com/company/gooddata> | Blog
<http://www.gooddata.com/blog>

<https://www.gooddata.com/>

[-- Attachment #2: Type: text/html, Size: 24877 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: kernel BUG at include/linux/swapops.h:204!
  2021-07-15 17:47     ` Igor Raits
@ 2021-07-16 19:45       ` Hugh Dickins
  2021-07-19 19:11         ` Hugh Dickins
  0 siblings, 1 reply; 24+ messages in thread
From: Hugh Dickins @ 2021-07-16 19:45 UTC (permalink / raw)
  To: Igor Raits; +Cc: Hugh Dickins, linux-mm, Andrew Morton, Hillf Danton

On Thu, 15 Jul 2021, Igor Raits wrote:

> Hi everyone again,
> 
> I've been trying to reproduce this issue but still can't find a consistent
> pattern.
> 
> However, it did happen once more and this time on 5.13.1:

Thanks for the updates, Igor.

I have to admit that what you have reported confirms the suspicion
that it's a bug introduced by one of my "stable" patches in 5.12.14
(which are also in 5.13): nothing else between 5.12.12 and 5.12.14
seems likely to be relevant.

But I've gone back and forth and not been able to spot the problem.

Please would you send (either privately to me, or to the list) your
5.13.1 kernel's .config, and disassembly of pmd_migration_entry_wait()
from its vmlinux (with line numbers if available; or just send the
whole vmlinux if that's easier, and I'll disassemble).

I am hoping that the disassembly, together with the register contents
that you've shown, will help guide towards an answer.

Thanks,
Hugh
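
For readers lining up such a disassembly with the report: the RIP line gives the faulting instruction as an offset into the function (`pmd_migration_entry_wait+0x132/0x140`), so its address in a vmlinux disassembly is the function's start address plus 0x132. The "Kernel Offset:" line gives the KASLR shift, which only matters when comparing against raw runtime addresses. A minimal sketch in Python follows; the helper names and the `SYMBOL_START` value are hypothetical placeholders, not values taken from this kernel.

```python
import re

# Matches symbolized kernel RIP lines such as
# "RIP: 0010:pmd_migration_entry_wait+0x132/0x140".
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_rip(line):
    """Return (symbol, offset, size) from a symbolized RIP line."""
    m = RIP_RE.search(line)
    if m is None:
        raise ValueError("not a symbolized kernel RIP line")
    return m.group("sym"), int(m.group("off"), 16), int(m.group("size"), 16)


def address_in_disassembly(symbol_start, offset):
    """Address of the faulting instruction in a disassembly of vmlinux."""
    return symbol_start + offset


def runtime_address(static_addr, kernel_offset):
    """Apply the KASLR shift reported on the "Kernel Offset:" line."""
    return static_addr + kernel_offset


sym, off, size = parse_rip(
    "[  222.100468] RIP: 0010:pmd_migration_entry_wait+0x132/0x140"
)
# Hypothetical start address; the real one comes from the vmlinux disassembly.
SYMBOL_START = 0xffffffff81300000
static = address_in_disassembly(SYMBOL_START, off)
print(sym, hex(off), hex(size))
print(hex(static), hex(runtime_address(static, 0x16000000)))
```

The userspace RIP line in the same report (`RIP: 0033:0x7f67baaed734`) carries no symbol, so `parse_rip` deliberately rejects it.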


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: kernel BUG at include/linux/swapops.h:204!
  2021-07-16 19:45       ` Hugh Dickins
@ 2021-07-19 19:11         ` Hugh Dickins
  2021-07-19 22:12           ` Peter Xu
                             ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Hugh Dickins @ 2021-07-19 19:11 UTC (permalink / raw)
  To: Peter Xu
  Cc: Igor Raits, Hugh Dickins, Andrew Morton, Hillf Danton,
	Axel Rasmussen, linux-mm

Hi Peter,

I believe you have already fixed this, but the fix needs to go to stable.
Sorry, the messages below are a muddle of top and middle posting,
I'll resume at the bottom.

On Fri, 16 Jul 2021, Hugh Dickins wrote:
> On Thu, 15 Jul 2021, Igor Raits wrote:
> 
> > Hi everyone again,
> > 
> > I've been trying to reproduce this issue but still can't find a consistent
> > pattern.
> > 
> > However, it did happen once more and this time on 5.13.1:
> 
> Thanks for the updates, Igor.
> 
> I have to admit that what you have reported confirms the suspicion
> that it's a bug introduced by one of my "stable" patches in 5.12.14
> (which are also in 5.13): nothing else between 5.12.12 and 5.12.14
> seems likely to be relevant.
> 
> But I've gone back and forth and not been able to spot the problem.
> 
> Please would you send (either privately to me, or to the list) your
> 5.13.1 kernel's .config, and disassembly of pmd_migration_entry_wait()
> from its vmlinux (with line numbers if available; or just send the
> whole vmlinux if that's easier, and I'll disassemble).
> 
> I am hoping that the disassembly, together with the register contents
> that you've shown, will help guide towards an answer.
> 
> Thanks,
> Hugh
> 
> > 
> > [  222.068216] ------------[ cut here ]------------
> > [  222.072884] kernel BUG at include/linux/swapops.h:204!
> > [  222.078062] invalid opcode: 0000 [#1] SMP NOPTI
> > [  222.082618] CPU: 38 PID: 9828 Comm: rpc-worker Tainted: G            E
> >   5.13.1-1.gdc.el8.x86_64 #1
> > [  222.091894] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380
> > Gen10, BIOS U30 05/24/2021
> > [  222.100468] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > [  222.105994] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c5 f6 ff 48 81 e2 00
> > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff
> > <0f> 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > [  222.124878] RSP: 0000:ffffbcfe9eb7bdd8 EFLAGS: 00010246
> > [  222.130134] RAX: 0057ffffc0000000 RBX: ffff9eec4b3cdbf8 RCX:
> > ffffffffffffffff
> > [  222.137309] RDX: 0000000000000000 RSI: ffff9eec4b3cdbf8 RDI:
> > ffffdf55c52cf368
> > [  222.144485] RBP: ffffdf55c52cf368 R08: ffffdf57428d8080 R09:
> > 0000000000000000
> > [  222.151661] R10: 0000000000000000 R11: 0000000000000000 R12:
> > 0000000000000bf8
> > [  222.158837] R13: 0400000000000000 R14: 0400000000000080 R15:
> > ffff9eec2825b1f8
> > [  222.166015] FS:  00007f6754aeb700(0000) GS:ffff9f49bfd00000(0000)
> > knlGS:0000000000000000
> > [  222.174153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  222.179932] CR2: 00007f676ffffd98 CR3: 000000012bf6a002 CR4:
> > 00000000007726e0
> > [  222.187109] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [  222.194283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [  222.201457] PKRU: 55555554
> > [  222.204178] Call Trace:
> > [  222.206638]  __handle_mm_fault+0x5ad/0x6e0
> > [  222.210760]  ? sysvec_call_function_single+0xb/0x90
> > [  222.215672]  handle_mm_fault+0xc5/0x290
> > [  222.219529]  do_user_addr_fault+0x1a9/0x660
> > [  222.223740]  ? sched_clock_cpu+0xc/0xa0
> > [  222.227602]  exc_page_fault+0x68/0x130
> > [  222.231373]  ? asm_exc_page_fault+0x8/0x30
> > [  222.235495]  asm_exc_page_fault+0x1e/0x30
> > [  222.239526] RIP: 0033:0x7f67baaed734
> > [  222.243120] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00 31 c0
> > 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39 d2 74 22
> > <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00 00 c7
> > [  222.262002] RSP: 002b:00007f6754aea298 EFLAGS: 00010287
> > [  222.267257] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > 0000000000000000
> > [  222.274432] RDX: 00007f676ffff700 RSI: 00007f676ffff9c0 RDI:
> > 00007f676f7fec10
> > [  222.281609] RBP: 0000000000000001 R08: 00007f676f7fed10 R09:
> > 00007f67bad012f0
> > [  222.288785] R10: 00007f6754aeb700 R11: 0000000000000202 R12:
> > 0000000000000001
> > [  222.295961] R13: 0000000000000006 R14: 0000000000000e28 R15:
> > 00007f674006e1f0
> > [  222.303137] Modules linked in: vhost_net(E) vhost(E) vhost_iotlb(E)
> > tap(E) tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E)
> > nf_tables(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E)
> > binfmt_misc(E) iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E)
> > bonding(E) tls(E) vfat(E) fat(E) dm_service_time(E) dm_multipath(E)
> > rpcrdma(E) sunrpc(E) rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E)
> > target_core_mod(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E)
> > intel_rapl_msr(E) intel_rapl_common(E) scsi_transport_iscsi(E)
> > isst_if_common(E) ipmi_ssif(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E)
> > intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E)
> > crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E) qedr(E)
> > mei_me(E) acpi_ipmi(E) ib_uverbs(E) intel_cstate(E) ipmi_si(E) ib_core(E)
> > ipmi_devintf(E) dm_mod(E) ioatdma(E) ses(E) intel_uncore(E) pcspkr(E)
> > enclosure(E) mei(E) hpwdt(E) hpilo(E) lpc_ich(E) intel_pch_thermal(E)
> > dca(E) ipmi_msghandler(E)
> > [  222.303181]  acpi_power_meter(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E)
> > t10_pi(E) sg(E) qedf(E) qede(E) libfcoe(E) qed(E) libfc(E) smartpqi(E)
> > scsi_transport_fc(E) tg3(E) scsi_transport_sas(E) crc8(E) wmi(E)
> > nf_conntrack(E) libcrc32c(E) crc32c_intel(E) nf_defrag_ipv6(E)
> > nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E)
> > [  222.420050] ---[ end trace bcf7b6d1610cc21f ]---
> > [  222.572925] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > [  222.578469] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c5 f6 ff 48 81 e2 00
> > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff
> > <0f> 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > [  222.597359] RSP: 0000:ffffbcfe9eb7bdd8 EFLAGS: 00010246
> > [  222.602620] RAX: 0057ffffc0000000 RBX: ffff9eec4b3cdbf8 RCX:
> > ffffffffffffffff
> > [  222.609807] RDX: 0000000000000000 RSI: ffff9eec4b3cdbf8 RDI:
> > ffffdf55c52cf368
> > [  222.616990] RBP: ffffdf55c52cf368 R08: ffffdf57428d8080 R09:
> > 0000000000000000
> > [  222.624177] R10: 0000000000000000 R11: 0000000000000000 R12:
> > 0000000000000bf8
> > [  222.631361] R13: 0400000000000000 R14: 0400000000000080 R15:
> > ffff9eec2825b1f8
> > [  222.638548] FS:  00007f6754aeb700(0000) GS:ffff9f49bfd00000(0000)
> > knlGS:0000000000000000
> > [  222.646694] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  222.652481] CR2: 00007f676ffffd98 CR3: 000000012bf6a002 CR4:
> > 00000000007726e0
> > [  222.659665] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [  222.666850] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [  222.674031] PKRU: 55555554
> > [  222.676758] Kernel panic - not syncing: Fatal exception
> > [  222.817538] Kernel Offset: 0x16000000 from 0xffffffff81000000
> > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > [  222.965540] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > 
> > On Sun, Jul 11, 2021 at 8:06 AM Igor Raits <igor@gooddata.com> wrote:
> > 
> > > Hi Hugh,
> > >
> > > On Sun, Jul 11, 2021 at 6:17 AM Hugh Dickins <hughd@google.com> wrote:
> > >
> > >> On Sat, 10 Jul 2021, Igor Raits wrote:
> > >>
> > >> > Hello,
> > >> >
> > >> > I've seen one weird bug on 5.12.14 that happened a couple of times when
> > >> I
> > >> > started a bunch of VMs on a server.
> > >>
> > >> Would it be possible for you to try the same on a 5.12.13 kernel?
> > >> Perhaps by reverting the diff between 5.12.13 and 5.12.14 temporarily.
> > >> Enough to form an impression of whether the issue is new in 5.12.14.
> > >>
> > >
> > > We've been using 5.12.12 for quite some time (~ a month) and I never saw
> > > it there.
> > >
> > > But I have to admit that I don't really have a reproducer. For example, on
> > > servers where it happened,
> > > I just rebooted them and panic did not happen anymore (so I saw it only
> > > only once,
> > > only on 2 servers out of 32 that we have on 5.12.14).
> > >
> > >
> > >> I ask because 5.12.14 did include several fixes and cleanups from me
> > >> to page_vma_mapped_walk(), and that is involved in inserting and
> > >> removing pmd migration entries.  I am not aware of introducing any
> > >> bug there, but your report has got me worried.  If it's happening in
> > >> 5.12.14 but not in 5.12.13, then I must look again at my changes.
> > >>
> > >> I don't expect Hillf's patch to help at at all: the pmd_lock()
> > >> is supposed to be taken by page_vma_mapped_walk(), before
> > >> set_pmd_migration_entry() and remove_migration_pmd() are called.
> > >>
> > >> Thanks,
> > >> Hugh
> > >>
> > >> >
> > >> > I've briefly googled this problem but could not find any relevant commit
> > >> > that would fix this issue.
> > >> >
> > >> > Do you have any hint how to debug this further or know the fix by any
> > >> > chance?
> > >> >
> > >> > Thanks in advance. Stack trace following:
> > >> >
> > >> > [  376.876610] ------------[ cut here ]------------
> > >> > [  376.881274] kernel BUG at include/linux/swapops.h:204!
> > >> > [  376.886455] invalid opcode: 0000 [#1] SMP NOPTI
> > >> > [  376.891014] CPU: 40 PID: 11775 Comm: rpc-worker Tainted: G
> > >>   E
> > >> >     5.12.14-1.gdc.el8.x86_64 #1
> > >> > [  376.900464] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380
> > >> > Gen10, BIOS U30 05/24/2021
> > >> > [  376.909038] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > >> > [  376.914562] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2
> > >> 00
> > >> > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff
> > >> > <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > >> > [  376.933443] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
> > >> > [  376.938701] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX:
> > >> > ffffffffffffffff
> > >> > [  376.945878] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI:
> > >> > fffff497473b2ae8
> > >> > [  376.953055] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09:
> > >> > 0000000000000000
> > >> > [  376.960230] R10: 0000000000000000 R11: 0000000000000000 R12:
> > >> > 0000000000000af8
> > >> > [  376.967407] R13: 0400000000000000 R14: 0400000000000080 R15:
> > >> > ffff908bbef7b6a8
> > >> > [  376.974582] FS:  00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000)
> > >> > knlGS:0000000000000000
> > >> > [  376.982718] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> > [  376.988497] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4:
> > >> > 00000000007726e0
> > >> > [  376.995673] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > >> > 0000000000000000
> > >> > [  377.002849] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > >> > 0000000000000400
> > >> > [  377.010026] PKRU: 55555554
> > >> > [  377.012745] Call Trace:
> > >> > [  377.015207]  __handle_mm_fault+0x5ad/0x6e0
> > >> > [  377.019335]  handle_mm_fault+0xc5/0x290
> > >> > [  377.023194]  do_user_addr_fault+0x1cd/0x740
> > >> > [  377.027406]  exc_page_fault+0x54/0x110
> > >> > [  377.031182]  ? asm_exc_page_fault+0x8/0x30
> > >> > [  377.035307]  asm_exc_page_fault+0x1e/0x30
> > >> > [  377.039340] RIP: 0033:0x7f5bb91d6734
> > >> > [  377.042937] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00 31
> > >> c0
> > >> > 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39 d2 74 22
> > >> > <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00 00 c7
> > >> > [  377.061820] RSP: 002b:00007f5bb1f7ff58 EFLAGS: 00010206
> > >> > [  377.067076] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > >> > 00007f5ba0000020
> > >> > [  377.074255] RDX: 00007f5b2bfff700 RSI: 00007f5b2bfff9c0 RDI:
> > >> > 0000000000000001
> > >> > [  377.081429] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > >> > 00007f5bb93ea2f0
> > >> > [  377.088606] R10: 00007f5bb1f81700 R11: 0000000000000202 R12:
> > >> > 0000000000000001
> > >> > [  377.095782] R13: 0000000000000006 R14: 0000000000000cb4 R15:
> > >> > 00007f5bb1f801f0
> > >> > [  377.102958] Modules linked in: ebt_arp(E) nft_meta_bridge(E)
> > >> > ip6_tables(E) xt_CT(E) nf_log_ipv4(E) nf_log_common(E) nft_limit(E)
> > >> > nft_counter(E) xt_LOG(E) xt_limit(E) xt_mac(E) xt_set(E) xt_multiport(E)
> > >> > xt_state(E) xt_conntrack(E) xt_comment(E) xt_physdev(E) nft_compat(E)
> > >> > ip_set_hash_net(E) ip_set(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E)
> > >> > tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E) nf_tables(E)
> > >> > vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E) binfmt_misc(E)
> > >> > iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E) bonding(E) tls(E)
> > >> > vfat(E) fat(E) dm_service_time(E) dm_multipath(E) rpcrdma(E) sunrpc(E)
> > >> > rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E)
> > >> target_core_mod(E)
> > >> > ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E)
> > >> scsi_transport_iscsi(E)
> > >> > intel_rapl_msr(E) qedr(E) intel_rapl_common(E) ib_uverbs(E)
> > >> > isst_if_common(E) ib_core(E) nfit(E) libnvdimm(E)
> > >> x86_pkg_temp_thermal(E)
> > >> > intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E)
> > >> > crct10dif_pclmul(E)
> > >> > [  377.102999]  crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E)
> > >> > intel_cstate(E) ipmi_ssif(E) acpi_ipmi(E) ipmi_si(E) mei_me(E)
> > >> ioatdma(E)
> > >> > ipmi_devintf(E) dm_mod(E) ses(E) intel_uncore(E) pcspkr(E) qede(E)
> > >> > enclosure(E) tg3(E) mei(E) lpc_ich(E) hpilo(E) hpwdt(E)
> > >> > intel_pch_thermal(E) dca(E) ipmi_msghandler(E) acpi_power_meter(E)
> > >> ext4(E)
> > >> > mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) sg(E) qedf(E) qed(E) crc8(E)
> > >> > libfcoe(E) libfc(E) smartpqi(E) scsi_transport_fc(E)
> > >> scsi_transport_sas(E)
> > >> > wmi(E) nf_conntrack(E) nf_defrag_ipv6(E) libcrc32c(E) crc32c_intel(E)
> > >> > nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E)
> > >> > [  377.243468] ---[ end trace 04bce3bb051f7620 ]---
> > >> > [  377.385645] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > >> > [  377.391194] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2
> > >> 00
> > >> > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff
> > >> > <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > >> > [  377.410091] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
> > >> > [  377.415355] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX:
> > >> > ffffffffffffffff
> > >> > [  377.422540] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI:
> > >> > fffff497473b2ae8
> > >> > [  377.429721] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09:
> > >> > 0000000000000000
> > >> > [  377.436902] R10: 0000000000000000 R11: 0000000000000000 R12:
> > >> > 0000000000000af8
> > >> > [  377.444086] R13: 0400000000000000 R14: 0400000000000080 R15:
> > >> > ffff908bbef7b6a8
> > >> > [  377.451272] FS:  00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000)
> > >> > knlGS:0000000000000000
> > >> > [  377.459415] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > >> > [  377.465196] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4:
> > >> > 00000000007726e0
> > >> > [  377.472377] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > >> > 0000000000000000
> > >> > [  377.479556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > >> > 0000000000000400
> > >> > [  377.486738] PKRU: 55555554
> > >> > [  377.489465] Kernel panic - not syncing: Fatal exception
> > >> > [  377.573911] Kernel Offset: 0xa000000 from 0xffffffff81000000
> > >> (relocation
> > >> > range: 0xffffffff80000000-0xffffffffbfffffff)
> > >> > [  377.716482] ---[ end Kernel panic - not syncing: Fatal exception ]---

Disassembly of the vmlinux Igor sent (along with other info) confirmed
something I suspected, that R08: fffff49747fa8080 in one of the dumps,
R08: ffffdf57428d8080 in the other, is the relevant struct page pointer
(and RAX the page->flags, which looks like it was pointing at a good page).

A page pointer ....8080 in pmd_migration_entry_wait() is interesting:
normally I'd expect that to be ....0000 or ....8000, pointing to the
head of a huge page.  But instead it's pointing to the second tail
(though by now that compound page has been freed, and head pointers in
the tails reset to 0): as if the pfn has been incremented by 2 somehow.
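
To make the "second tail" arithmetic concrete, here is a minimal user-space
sketch.  It assumes sizeof(struct page) is 64 bytes and that the struct page
array is aligned so that the head of a 2MB huge page lands on a 0x8000
boundary (512 pages * 64 bytes); both are assumptions about this
configuration, and the MODEL_* names and model_subpage_index() are
hypothetical helpers, not kernel API.

```c
#include <stdint.h>

/* Assumption: sizeof(struct page) == 64, typical for x86_64 configs. */
#define MODEL_STRUCT_PAGE_SIZE	64u
/* A 2MB huge page covers 512 base pages of 4KB each. */
#define MODEL_HPAGE_NR		512u

/*
 * Hypothetical helper: given a struct page pointer value, return the
 * index of that page within its 2MB-aligned group of 512 pages.
 * Index 0 is the head, 1 the first tail, 2 the second tail, and so on.
 */
static unsigned int model_subpage_index(uint64_t page_ptr)
{
	/* 512 * 64 = 0x8000 bytes of struct page per huge page */
	uint64_t span = (uint64_t)MODEL_STRUCT_PAGE_SIZE * MODEL_HPAGE_NR;

	return (unsigned int)((page_ptr % span) / MODEL_STRUCT_PAGE_SIZE);
}
```

Under those assumptions, both R08 values from the dumps
(0xfffff49747fa8080 and 0xffffdf57428d8080) give index 2, whereas a
pointer ending in ....0000 or ....8000 gives index 0.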

And if the pfn (swp_offset) in the migration entry has got corrupted,
then it's no surprise that when removing migration entries,
page_vma_mapped_walk() would see migration_entry_to_page(entry) != page,
so be unable to replace that migration entry, leaving it behind for the
user to hit BUG_ON(!PageLocked) in pmd_migration_entry_wait() when
faulting on it later.

So, what might increment the swp_offset by 2? Hunt around the encodings.
Hmm, _PAGE_BIT_UFFD_WP is _PAGE_BIT_SOFTW2 which is bit 10, whereas
_PAGE_BIT_PROTNONE (top bit to be avoided in pte encoding of swap)
is _PAGE_BIT_GLOBAL is bit 8. After overcoming off-by-one confusions,
it looks like if something somewhere were to set _PAGE_BIT_UFFD_WP
in a migration pmd (whereas it's only suitable for a present pmd),
it would indeed increment the swp_offset by 2.

Hunt for uffd_wps, and run across copy_huge_pmd() in mm/huge_memory.c:
in Igor's 5.13.1 and 5.12.14 and many others, that says
	if (!(vma->vm_flags & VM_UFFD_WP))
		pmd = pmd_clear_uffd_wp(pmd);
just *before* checking is_swap_pmd(). Fixed in 5.14-rc1 in commit
8f34f1eac382 ("mm/userfaultfd: fix uffd-wp special cases for fork()").

But clearing the bit would be harmless, wouldn't it? Because it wouldn't
be set anyway. Waste a day before remembering what I never forgot but
somehow blanked out: the L1TF "feature" forced us to invert the offset
bits in the pte encoding of a swap entry, so there really is a bit set
there in the pmd entry, and clearing it has the effect of setting it in
the corresponding swap entry, so incrementing the migration pfn by 2.
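
The effect can be reproduced in a small user-space model of the x86_64
swap entry encoding.  This is only a sketch: it assumes the layout
described above (5 type bits at the top, offset stored inverted starting
at the bit after _PAGE_BIT_PROTNONE, uffd-wp at bit 10), and the MODEL_*
macros and model_*() functions are hypothetical stand-ins, not the
kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_BIT_PROTNONE		8	/* same bit as _PAGE_BIT_GLOBAL */
#define MODEL_PAGE_BIT_UFFD_WP		10	/* same bit as _PAGE_BIT_SOFTW2 */
#define MODEL_SWP_TYPE_BITS		5
#define MODEL_SWP_OFFSET_FIRST_BIT	(MODEL_PAGE_BIT_PROTNONE + 1)
#define MODEL_SWP_OFFSET_SHIFT		(MODEL_SWP_OFFSET_FIRST_BIT + MODEL_SWP_TYPE_BITS)

/* Encode: offset is stored inverted (L1TF), type goes in the top 5 bits. */
static uint64_t model_swp_entry(uint64_t type, uint64_t offset)
{
	assert(type < (1ULL << MODEL_SWP_TYPE_BITS));
	assert(offset < (1ULL << (64 - MODEL_SWP_OFFSET_SHIFT)));

	return ((~offset << MODEL_SWP_OFFSET_SHIFT) >> MODEL_SWP_TYPE_BITS)
		| (type << (64 - MODEL_SWP_TYPE_BITS));
}

/* Decode: shift up to drop the type, invert, shift down to the offset. */
static uint64_t model_swp_offset(uint64_t val)
{
	return (~val << MODEL_SWP_TYPE_BITS) >> MODEL_SWP_OFFSET_SHIFT;
}

/* What clearing the uffd-wp bit does to the raw entry value. */
static uint64_t model_clear_uffd_wp(uint64_t val)
{
	return val & ~(1ULL << MODEL_PAGE_BIT_UFFD_WP);
}
```

For a huge-page-aligned offset such as 0x123400 (an arbitrary example
value, with an arbitrary type of 3), bit 1 of the offset is clear, so bit
10 of the encoded entry is set; after model_clear_uffd_wp() the decoded
offset comes back as 0x123402, i.e. incremented by 2.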

I cannot explain why Igor never saw this crash on 5.12.12: maybe
something else in the environment changed around that time.  And it
will take several days for it to be confirmed as the fix in practice.

But I'm confident that 8f34f1eac382 will prove to be the fix, so Peter
please prepare some backports of that for the various stable/longterm
kernels that need it - I've not looked into whether it applies cleanly,
or depends on other commits too.  You fixed several related but different
things in that commit: but this one is the worst, because it can corrupt
even those who are not using UFFD_WP at all.

Many thanks for reporting and helping, Igor.
Hugh

p.s. Peter, unrelated to this particular bug, and should not divert from
fixing it: but looking again at those swap encodings, and particularly
the soft_dirty manipulations: they look very fragile. I think uffd_wp
was wrong to follow that bad example, and your upcoming new encoding
(that I have previously called elegant) takes it a worse step further.

I think we should change to a rule where the architecture-independent
swp_entry_t contains *all* the info, including bits for soft_dirty and
uffd_wp, so that swap entry cases can move immediately to decoding from
arch-dependent pte to arch-independent swp_entry_t, and do all the
manipulations on that. But I don't have time to make that change, and
probably neither do you, and making the change is liable to introduce
errors itself. So, no immediate plans, but please keep in mind.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: kernel BUG at include/linux/swapops.h:204!
  2021-07-19 19:11         ` Hugh Dickins
@ 2021-07-19 22:12           ` Peter Xu
  2021-07-19 22:42             ` Hugh Dickins
  2021-07-20  7:47             ` Igor Raits
  2021-07-20 15:51           ` [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Peter Xu
  2021-07-20 15:56           ` [PATCH stable 5.10.y " Peter Xu
  2 siblings, 2 replies; 24+ messages in thread
From: Peter Xu @ 2021-07-19 22:12 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Igor Raits, Andrew Morton, Hillf Danton, Axel Rasmussen, linux-mm

On Mon, Jul 19, 2021 at 12:11:21PM -0700, Hugh Dickins wrote:
> Hi Peter,

Hi, Hugh,

> 
> I believe you have already fixed this, but the fix needs to go to stable.
> Sorry, the messages below are a muddle of top and middle posting,
> I'll resume at the bottom.
> 
> On Fri, 16 Jul 2021, Hugh Dickins wrote:
> > On Thu, 15 Jul 2021, Igor Raits wrote:
> > 
> > > Hi everyone again,
> > > 
> > > I've been trying to reproduce this issue but still can't find a consistent
> > > pattern.
> > > 
> > > However, it did happen once more and this time on 5.13.1:
> > 
> > Thanks for the updates, Igor.
> > 
> > I have to admit that what you have reported confirms the suspicion
> > that it's a bug introduced by one of my "stable" patches in 5.12.14
> > (which are also in 5.13): nothing else between 5.12.12 and 5.12.14
> > seems likely to be relevant.
> > 
> > But I've gone back and forth and not been able to spot the problem.
> > 
> > Please would you send (either privately to me, or to the list) your
> > 5.13.1 kernel's .config, and disassembly of pmd_migration_entry_wait()
> > from its vmlinux (with line numbers if available; or just send the
> > whole vmlinux if that's easier, and I'll disassemble).
> > 
> > I am hoping that the disassembly, together with the register contents
> > that you've shown, will help guide towards an answer.
> > 
> > Thanks,
> > Hugh
> > 
> > > 
> > > [  222.068216] ------------[ cut here ]------------
> > > [  222.072884] kernel BUG at include/linux/swapops.h:204!
> > > [  222.078062] invalid opcode: 0000 [#1] SMP NOPTI
> > > [  222.082618] CPU: 38 PID: 9828 Comm: rpc-worker Tainted: G            E
> > >   5.13.1-1.gdc.el8.x86_64 #1
> > > [  222.091894] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380
> > > Gen10, BIOS U30 05/24/2021
> > > [  222.100468] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > > [  222.105994] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c5 f6 ff 48 81 e2 00
> > > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff
> > > <0f> 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > > [  222.124878] RSP: 0000:ffffbcfe9eb7bdd8 EFLAGS: 00010246
> > > [  222.130134] RAX: 0057ffffc0000000 RBX: ffff9eec4b3cdbf8 RCX:
> > > ffffffffffffffff
> > > [  222.137309] RDX: 0000000000000000 RSI: ffff9eec4b3cdbf8 RDI:
> > > ffffdf55c52cf368
> > > [  222.144485] RBP: ffffdf55c52cf368 R08: ffffdf57428d8080 R09:
> > > 0000000000000000
> > > [  222.151661] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > 0000000000000bf8
> > > [  222.158837] R13: 0400000000000000 R14: 0400000000000080 R15:
> > > ffff9eec2825b1f8
> > > [  222.166015] FS:  00007f6754aeb700(0000) GS:ffff9f49bfd00000(0000)
> > > knlGS:0000000000000000
> > > [  222.174153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  222.179932] CR2: 00007f676ffffd98 CR3: 000000012bf6a002 CR4:
> > > 00000000007726e0
> > > [  222.187109] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > [  222.194283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > 0000000000000400
> > > [  222.201457] PKRU: 55555554
> > > [  222.204178] Call Trace:
> > > [  222.206638]  __handle_mm_fault+0x5ad/0x6e0
> > > [  222.210760]  ? sysvec_call_function_single+0xb/0x90
> > > [  222.215672]  handle_mm_fault+0xc5/0x290
> > > [  222.219529]  do_user_addr_fault+0x1a9/0x660
> > > [  222.223740]  ? sched_clock_cpu+0xc/0xa0
> > > [  222.227602]  exc_page_fault+0x68/0x130
> > > [  222.231373]  ? asm_exc_page_fault+0x8/0x30
> > > [  222.235495]  asm_exc_page_fault+0x1e/0x30
> > > [  222.239526] RIP: 0033:0x7f67baaed734
> > > [  222.243120] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00 31 c0
> > > 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39 d2 74 22
> > > <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00 00 c7
> > > [  222.262002] RSP: 002b:00007f6754aea298 EFLAGS: 00010287
> > > [  222.267257] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > > 0000000000000000
> > > [  222.274432] RDX: 00007f676ffff700 RSI: 00007f676ffff9c0 RDI:
> > > 00007f676f7fec10
> > > [  222.281609] RBP: 0000000000000001 R08: 00007f676f7fed10 R09:
> > > 00007f67bad012f0
> > > [  222.288785] R10: 00007f6754aeb700 R11: 0000000000000202 R12:
> > > 0000000000000001
> > > [  222.295961] R13: 0000000000000006 R14: 0000000000000e28 R15:
> > > 00007f674006e1f0
> > > [  222.303137] Modules linked in: vhost_net(E) vhost(E) vhost_iotlb(E)
> > > tap(E) tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E)
> > > nf_tables(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E)
> > > binfmt_misc(E) iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E)
> > > bonding(E) tls(E) vfat(E) fat(E) dm_service_time(E) dm_multipath(E)
> > > rpcrdma(E) sunrpc(E) rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E)
> > > target_core_mod(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E)
> > > intel_rapl_msr(E) intel_rapl_common(E) scsi_transport_iscsi(E)
> > > isst_if_common(E) ipmi_ssif(E) nfit(E) libnvdimm(E) x86_pkg_temp_thermal(E)
> > > intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E)
> > > crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E) qedr(E)
> > > mei_me(E) acpi_ipmi(E) ib_uverbs(E) intel_cstate(E) ipmi_si(E) ib_core(E)
> > > ipmi_devintf(E) dm_mod(E) ioatdma(E) ses(E) intel_uncore(E) pcspkr(E)
> > > enclosure(E) mei(E) hpwdt(E) hpilo(E) lpc_ich(E) intel_pch_thermal(E)
> > > dca(E) ipmi_msghandler(E)
> > > [  222.303181]  acpi_power_meter(E) ext4(E) mbcache(E) jbd2(E) sd_mod(E)
> > > t10_pi(E) sg(E) qedf(E) qede(E) libfcoe(E) qed(E) libfc(E) smartpqi(E)
> > > scsi_transport_fc(E) tg3(E) scsi_transport_sas(E) crc8(E) wmi(E)
> > > nf_conntrack(E) libcrc32c(E) crc32c_intel(E) nf_defrag_ipv6(E)
> > > nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E)
> > > [  222.420050] ---[ end trace bcf7b6d1610cc21f ]---
> > > [  222.572925] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > > [  222.578469] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c5 f6 ff 48 81 e2 00
> > > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff
> > > <0f> 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > > [  222.597359] RSP: 0000:ffffbcfe9eb7bdd8 EFLAGS: 00010246
> > > [  222.602620] RAX: 0057ffffc0000000 RBX: ffff9eec4b3cdbf8 RCX:
> > > ffffffffffffffff
> > > [  222.609807] RDX: 0000000000000000 RSI: ffff9eec4b3cdbf8 RDI:
> > > ffffdf55c52cf368
> > > [  222.616990] RBP: ffffdf55c52cf368 R08: ffffdf57428d8080 R09:
> > > 0000000000000000
> > > [  222.624177] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > 0000000000000bf8
> > > [  222.631361] R13: 0400000000000000 R14: 0400000000000080 R15:
> > > ffff9eec2825b1f8
> > > [  222.638548] FS:  00007f6754aeb700(0000) GS:ffff9f49bfd00000(0000)
> > > knlGS:0000000000000000
> > > [  222.646694] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [  222.652481] CR2: 00007f676ffffd98 CR3: 000000012bf6a002 CR4:
> > > 00000000007726e0
> > > [  222.659665] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > [  222.666850] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > 0000000000000400
> > > [  222.674031] PKRU: 55555554
> > > [  222.676758] Kernel panic - not syncing: Fatal exception
> > > [  222.817538] Kernel Offset: 0x16000000 from 0xffffffff81000000
> > > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > [  222.965540] ---[ end Kernel panic - not syncing: Fatal exception ]---
> > > 
> > > On Sun, Jul 11, 2021 at 8:06 AM Igor Raits <igor@gooddata.com> wrote:
> > > 
> > > > Hi Hugh,
> > > >
> > > > On Sun, Jul 11, 2021 at 6:17 AM Hugh Dickins <hughd@google.com> wrote:
> > > >
> > > >> On Sat, 10 Jul 2021, Igor Raits wrote:
> > > >>
> > > >> > Hello,
> > > >> >
> > > >> > I've seen one weird bug on 5.12.14 that happened a couple of times when
> > > >> I
> > > >> > started a bunch of VMs on a server.
> > > >>
> > > >> Would it be possible for you to try the same on a 5.12.13 kernel?
> > > >> Perhaps by reverting the diff between 5.12.13 and 5.12.14 temporarily.
> > > >> Enough to form an impression of whether the issue is new in 5.12.14.
> > > >>
> > > >
> > > > We've been using 5.12.12 for quite some time (~ a month) and I never saw
> > > > it there.
> > > >
> > > > But I have to admit that I don't really have a reproducer. For example, on
> > > > servers where it happened,
> > > > I just rebooted them and panic did not happen anymore (so I saw it only
> > > > only once,
> > > > only on 2 servers out of 32 that we have on 5.12.14).
> > > >
> > > >
> > > >> I ask because 5.12.14 did include several fixes and cleanups from me
> > > >> to page_vma_mapped_walk(), and that is involved in inserting and
> > > >> removing pmd migration entries.  I am not aware of introducing any
> > > >> bug there, but your report has got me worried.  If it's happening in
> > > >> 5.12.14 but not in 5.12.13, then I must look again at my changes.
> > > >>
> > > >> I don't expect Hillf's patch to help at at all: the pmd_lock()
> > > >> is supposed to be taken by page_vma_mapped_walk(), before
> > > >> set_pmd_migration_entry() and remove_migration_pmd() are called.
> > > >>
> > > >> Thanks,
> > > >> Hugh
> > > >>
> > > >> >
> > > >> > I've briefly googled this problem but could not find any relevant commit
> > > >> > that would fix this issue.
> > > >> >
> > > >> > Do you have any hint how to debug this further or know the fix by any
> > > >> > chance?
> > > >> >
> > > >> > Thanks in advance. Stack trace following:
> > > >> >
> > > >> > [  376.876610] ------------[ cut here ]------------
> > > >> > [  376.881274] kernel BUG at include/linux/swapops.h:204!
> > > >> > [  376.886455] invalid opcode: 0000 [#1] SMP NOPTI
> > > >> > [  376.891014] CPU: 40 PID: 11775 Comm: rpc-worker Tainted: G
> > > >>   E
> > > >> >     5.12.14-1.gdc.el8.x86_64 #1
> > > >> > [  376.900464] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380
> > > >> > Gen10, BIOS U30 05/24/2021
> > > >> > [  376.909038] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > > >> > [  376.914562] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2
> > > >> 00
> > > >> > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff
> > > >> > <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > > >> > [  376.933443] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
> > > >> > [  376.938701] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX:
> > > >> > ffffffffffffffff
> > > >> > [  376.945878] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI:
> > > >> > fffff497473b2ae8
> > > >> > [  376.953055] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09:
> > > >> > 0000000000000000
> > > >> > [  376.960230] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > >> > 0000000000000af8
> > > >> > [  376.967407] R13: 0400000000000000 R14: 0400000000000080 R15:
> > > >> > ffff908bbef7b6a8
> > > >> > [  376.974582] FS:  00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000)
> > > >> > knlGS:0000000000000000
> > > >> > [  376.982718] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > >> > [  376.988497] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4:
> > > >> > 00000000007726e0
> > > >> > [  376.995673] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > >> > 0000000000000000
> > > >> > [  377.002849] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > >> > 0000000000000400
> > > >> > [  377.010026] PKRU: 55555554
> > > >> > [  377.012745] Call Trace:
> > > >> > [  377.015207]  __handle_mm_fault+0x5ad/0x6e0
> > > >> > [  377.019335]  handle_mm_fault+0xc5/0x290
> > > >> > [  377.023194]  do_user_addr_fault+0x1cd/0x740
> > > >> > [  377.027406]  exc_page_fault+0x54/0x110
> > > >> > [  377.031182]  ? asm_exc_page_fault+0x8/0x30
> > > >> > [  377.035307]  asm_exc_page_fault+0x1e/0x30
> > > >> > [  377.039340] RIP: 0033:0x7f5bb91d6734
> > > >> > [  377.042937] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00 31
> > > >> c0
> > > >> > 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39 d2 74 22
> > > >> > <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00 00 c7
> > > >> > [  377.061820] RSP: 002b:00007f5bb1f7ff58 EFLAGS: 00010206
> > > >> > [  377.067076] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > > >> > 00007f5ba0000020
> > > >> > [  377.074255] RDX: 00007f5b2bfff700 RSI: 00007f5b2bfff9c0 RDI:
> > > >> > 0000000000000001
> > > >> > [  377.081429] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > > >> > 00007f5bb93ea2f0
> > > >> > [  377.088606] R10: 00007f5bb1f81700 R11: 0000000000000202 R12:
> > > >> > 0000000000000001
> > > >> > [  377.095782] R13: 0000000000000006 R14: 0000000000000cb4 R15:
> > > >> > 00007f5bb1f801f0
> > > >> > [  377.102958] Modules linked in: ebt_arp(E) nft_meta_bridge(E)
> > > >> > ip6_tables(E) xt_CT(E) nf_log_ipv4(E) nf_log_common(E) nft_limit(E)
> > > >> > nft_counter(E) xt_LOG(E) xt_limit(E) xt_mac(E) xt_set(E) xt_multiport(E)
> > > >> > xt_state(E) xt_conntrack(E) xt_comment(E) xt_physdev(E) nft_compat(E)
> > > >> > ip_set_hash_net(E) ip_set(E) vhost_net(E) vhost(E) vhost_iotlb(E) tap(E)
> > > >> > tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E) nf_tables(E)
> > > >> > vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E) binfmt_misc(E)
> > > >> > iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E) bonding(E) tls(E)
> > > >> > vfat(E) fat(E) dm_service_time(E) dm_multipath(E) rpcrdma(E) sunrpc(E)
> > > >> > rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E)
> > > >> target_core_mod(E)
> > > >> > ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E)
> > > >> scsi_transport_iscsi(E)
> > > >> > intel_rapl_msr(E) qedr(E) intel_rapl_common(E) ib_uverbs(E)
> > > >> > isst_if_common(E) ib_core(E) nfit(E) libnvdimm(E)
> > > >> x86_pkg_temp_thermal(E)
> > > >> > intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E)
> > > >> > crct10dif_pclmul(E)
> > > >> > [  377.102999]  crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E)
> > > >> > intel_cstate(E) ipmi_ssif(E) acpi_ipmi(E) ipmi_si(E) mei_me(E)
> > > >> ioatdma(E)
> > > >> > ipmi_devintf(E) dm_mod(E) ses(E) intel_uncore(E) pcspkr(E) qede(E)
> > > >> > enclosure(E) tg3(E) mei(E) lpc_ich(E) hpilo(E) hpwdt(E)
> > > >> > intel_pch_thermal(E) dca(E) ipmi_msghandler(E) acpi_power_meter(E)
> > > >> ext4(E)
> > > >> > mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) sg(E) qedf(E) qed(E) crc8(E)
> > > >> > libfcoe(E) libfc(E) smartpqi(E) scsi_transport_fc(E)
> > > >> scsi_transport_sas(E)
> > > >> > wmi(E) nf_conntrack(E) nf_defrag_ipv6(E) libcrc32c(E) crc32c_intel(E)
> > > >> > nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E)
> > > >> > [  377.243468] ---[ end trace 04bce3bb051f7620 ]---
> > > >> > [  377.385645] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > > >> > [  377.391194] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff 48 81 e2
> > > >> 00
> > > >> > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff ff
> > > >> > <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > > >> > [  377.410091] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
> > > >> > [  377.415355] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX:
> > > >> > ffffffffffffffff
> > > >> > [  377.422540] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI:
> > > >> > fffff497473b2ae8
> > > >> > [  377.429721] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09:
> > > >> > 0000000000000000
> > > >> > [  377.436902] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > >> > 0000000000000af8
> > > >> > [  377.444086] R13: 0400000000000000 R14: 0400000000000080 R15:
> > > >> > ffff908bbef7b6a8
> > > >> > [  377.451272] FS:  00007f5bb1f81700(0000) GS:ffff90e87fd80000(0000)
> > > >> > knlGS:0000000000000000
> > > >> > [  377.459415] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > >> > [  377.465196] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4:
> > > >> > 00000000007726e0
> > > >> > [  377.472377] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > >> > 0000000000000000
> > > >> > [  377.479556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > >> > 0000000000000400
> > > >> > [  377.486738] PKRU: 55555554
> > > >> > [  377.489465] Kernel panic - not syncing: Fatal exception
> > > >> > [  377.573911] Kernel Offset: 0xa000000 from 0xffffffff81000000
> > > >> (relocation
> > > >> > range: 0xffffffff80000000-0xffffffffbfffffff)
> > > >> > [  377.716482] ---[ end Kernel panic - not syncing: Fatal exception ]---
> 
> Disassembly of the vmlinux Igor sent (along with other info) confirmed
> something I suspected, that R08: fffff49747fa8080 in one of the dumps,
> R08: ffffdf57428d8080 in the other, is the relevant struct page pointer
> (and RAX the page->flags, which look like it was pointing at a good page).
> 
> A page pointer ....8080 in pmd_migration_entry_wait() is interesting:
> normally I'd expect that to be ....0000 or ....8000, pointing to the
> head of a huge page.  But instead it's pointing to the second tail
> (though by now that compound page has been freed, and head pointers in
> the tails reset to 0): as if the pfn has been incremented by 2 somehow.
> 
> And if the pfn (swp_offset) in the migration entry has got corrupted,
> then it's no surprise that when removing migration entries,
> page_vma_mapped_walk() would see migration_entry_to_page(entry) != page,
> so be unable to replace that migration entry, leaving it behind for the
> user to hit BUG_ON(!PageLocked) in pmd_migration_entry_wait() when
> faulting on it later.
> 
> So, what might increment the swp_offset by 2? Hunt around the encodings.
> Hmm, _PAGE_BIT_UFFD_WP is _PAGE_BIT_SOFTW2 which is bit 10, whereas
> _PAGE_BIT_PROTNONE (top bit to be avoided in pte encoding of swap)
> is _PAGE_BIT_GLOBAL is bit 8. After overcoming off-by-one confusions,
> it looks like if something somewhere were to set _PAGE_BIT_UFFD_WP
> in a migration pmd (whereas it's only suitable for a present pmd),
> it would indeed increment the swp_offset by 2.
> 
> Hunt for uffd_wps, and run across copy_huge_pmd() in mm/huge_memory.c:
> in Igor's 5.13.1 and 5.12.14 and many others, that says
> 	if (!(vma->vm_flags & VM_UFFD_WP))
> 		pmd = pmd_clear_uffd_wp(pmd);
> just *before* checking is_swap_pmd(). Fixed in 5.14-rc1 in commit
> 8f34f1eac382 ("mm/userfaultfd: fix uffd-wp special cases for fork()").
> 
> But clearing the bit would be harmless, wouldn't it? Because it wouldn't
> be set anyway. Waste a day before remembering what I never forgot but
> somehow blanked out: the L1TF "feature" forced us to invert the offset
> bits in the pte encoding of a swap entry, so there really is a bit set
> there in the pmd entry, and clearing it has the effect of setting it in
> the corresponding swap entry, so incrementing the migration pfn by 2.
> 
> I cannot explain why Igor never saw this crash on 5.12.12: maybe
> something else in the environment changed around that time.  And it
> will take several days for it to be confirmed as the fix in practice.
> 
> But I'm confident that 8f34f1eac382 will prove to be the fix, so Peter
> please prepare some backports of that for the various stable/longterm
> kernels that need it - I've not looked into whether it applies cleanly,
> or depends on other commits too.  You fixed several related but different
> things in that commit: but this one is the worst, because it can corrupt
> even those who are not using UFFD_WP at all.
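
The arithmetic in the analysis quoted above can be checked with a small
user-space model.  This is only a sketch: the bit positions and the 5-bit type
field are written from memory of the x86_64 headers, the 64-byte struct page
size and the type value used below are assumptions, and the helper names
(swp_encode() and friends) are made up for illustration, not kernel API.

```c
#include <stdint.h>

/* Simplified model of the x86_64 swap-entry encoding in a non-present
 * pte/pmd.  All constants are assumptions of this sketch. */
#define PAGE_BIT_PROTNONE     8   /* same bit as _PAGE_BIT_GLOBAL */
#define PAGE_BIT_UFFD_WP      10  /* same bit as _PAGE_BIT_SOFTW2 */
#define SWP_TYPE_BITS         5
#define SWP_OFFSET_FIRST_BIT  (PAGE_BIT_PROTNONE + 1)
#define SWP_OFFSET_SHIFT      (SWP_OFFSET_FIRST_BIT + SWP_TYPE_BITS)
#define ASSUMED_STRUCT_PAGE_SIZE 64

/* Type goes in the top 5 bits; the offset is stored *inverted* (L1TF)
 * starting at bit 9. */
static uint64_t swp_encode(uint64_t type, uint64_t offset)
{
	return ((~offset << SWP_OFFSET_SHIFT) >> SWP_TYPE_BITS) |
	       (type << (64 - SWP_TYPE_BITS));
}

static uint64_t swp_type(uint64_t val)
{
	return val >> (64 - SWP_TYPE_BITS);
}

/* Undo the inversion and drop the type and low flag bits. */
static uint64_t swp_offset(uint64_t val)
{
	return (~val << SWP_TYPE_BITS) >> SWP_OFFSET_SHIFT;
}

/* Effect on the raw value of a present-only helper that clears bit 10. */
static uint64_t clear_uffd_wp_bit(uint64_t val)
{
	return val & ~(1ULL << PAGE_BIT_UFFD_WP);
}

/* How many struct pages past the head a page pointer sits. */
static uint64_t page_index_from_head(uint64_t page_ptr, uint64_t head_ptr)
{
	return (page_ptr - head_ptr) / ASSUMED_STRUCT_PAGE_SIZE;
}
```

Under these assumptions, encoding offset 0x1000 with an arbitrary type of 30,
clearing bit 10 and decoding again gives 0x1002, while the type is unchanged.
Likewise page_index_from_head(0xfffff49747fa8080, 0xfffff49747fa8000) is 2,
where the head address is simply R08 rounded down to a 0x8000 boundary.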

Looks right to me; b569a1760782 ("userfaultfd: wp: drop _PAGE_UFFD_WP properly
when fork", 2020-04-07) seems to be the culprit.  I didn't notice this side
effect in the bug or in the fix, or the fix would already have landed in
stable.  I am very sorry for such an elementary bug that caused this fallout -
I really can't tell why I completely failed to check is_swap_pte(), which is
so obvious in hindsight.
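
The ordering problem can be shown with a toy model.  This is a sketch only: it
is not the code of b569a1760782 or of 8f34f1eac382, the toy_* names are
hypothetical, and "present" is reduced to a single bit, which is simpler than
the real pmd_present() test.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PRESENT  (1ULL << 0)   /* stand-in for _PAGE_PRESENT */
#define TOY_UFFD_WP  (1ULL << 10)  /* stand-in for _PAGE_UFFD_WP */

/* Non-empty and not present: the raw bits encode a swap/migration entry. */
static bool toy_is_swap_entry(uint64_t val)
{
	return val != 0 && !(val & TOY_PRESENT);
}

/* Problematic order: bit 10 is cleared before we know the entry is
 * present, so a swap entry can be silently modified. */
static uint64_t toy_copy_unguarded(uint64_t val, bool vma_has_uffd_wp)
{
	if (!vma_has_uffd_wp)
		val &= ~TOY_UFFD_WP;
	return val;
}

/* Safer order: swap entries pass through untouched; only present
 * entries get the present-only bit cleared. */
static uint64_t toy_copy_guarded(uint64_t val, bool vma_has_uffd_wp)
{
	if (toy_is_swap_entry(val))
		return val;
	if (!vma_has_uffd_wp)
		val &= ~TOY_UFFD_WP;
	return val;
}
```

With a non-present value that happens to have bit 10 set, the unguarded
version returns a different value even though the vma never used uffd-wp,
while the guarded version returns it unchanged.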

I checked: 5.6.y doesn't have the offending commit, as it wasn't marked with
"Fixes".  It is present in 5.7.y~5.13.y.  5.14-rc1 has 8f34f1eac382, which
is the fix.  So I think we need the fix, or an equivalent, for 5.7.y~5.13.y.

5.12.y & 5.13.y can pick up the fix 8f34f1eac382 cleanly.  The older ones
(5.7.y~5.11.y) can't, so I plan to revert b569a1760782 there instead.

> 
> Many thanks for reporting and helping, Igor.
> Hugh
> 
> p.s. Peter, unrelated to this particular bug, and should not divert from
> fixing it: but looking again at those swap encodings, and particularly
> the soft_dirty manipulations: they look very fragile. I think uffd_wp
> was wrong to follow that bad example, and your upcoming new encoding
> (that I have previously called elegant) takes it a worse step further.
> 
> I think we should change to a rule where the architecture-independent
> swp_entry_t contains *all* the info, including bits for soft_dirty and
> uffd_wp, so that swap entry cases can move immediately to decoding from
> arch-dependent pte to arch-independent swp_entry_t, and do all the
> manipulations on that. But I don't have time to make that change, and
> probably neither do you, and making the change is liable to introduce
> errors itself. So, no immediate plans, but please keep in mind.

Curious: have we previously hit a similar issue where the soft-dirty bit was
applied wrongly, causing hard-to-debug problems?

If this is destined to be the best solution, I can work on both of them.  I am
just worried that it's too big a change, as you said, so it's unclear which
approach is most efficient once we count the total time to develop, review and
debug.

The alternative is that we just fix bugs as they come; I know that's cheap to
say, but we can't rule it out as an option yet.

We can definitely discuss this outside this thread, and I'll prepare the
backport first.  In any case, this bug is certainly a warning, and I'll keep
that in mind.

Please let me know if there's any comment on the backport plan above, or I'll
prepare the patches for all the branches before tomorrow.

Thanks,

-- 
Peter Xu




* Re: kernel BUG at include/linux/swapops.h:204!
  2021-07-19 22:12           ` Peter Xu
@ 2021-07-19 22:42             ` Hugh Dickins
  2021-07-20  0:34               ` Peter Xu
  2021-07-20  7:47             ` Igor Raits
  1 sibling, 1 reply; 24+ messages in thread
From: Hugh Dickins @ 2021-07-19 22:42 UTC (permalink / raw)
  To: Peter Xu
  Cc: Hugh Dickins, Igor Raits, Andrew Morton, Hillf Danton,
	Axel Rasmussen, linux-mm

On Mon, 19 Jul 2021, Peter Xu wrote:
> On Mon, Jul 19, 2021 at 12:11:21PM -0700, Hugh Dickins wrote:
> > 
> > But I'm confident that 8f34f1eac382 will prove to be the fix, so Peter
> > please prepare some backports of that for the various stable/longterm
> > kernels that need it - I've not looked into whether it applies cleanly,
> > or depends on other commits too.  You fixed several related but different
> > things in that commit: but this one is the worst, because it can corrupt
> > even those who are not using UFFD_WP at all.
> 
> Looks right to me, b569a1760782 ("userfaultfd: wp: drop _PAGE_UFFD_WP properly
> when fork", 2020-04-07) seems to be the culprit.  I didn't notice the side
> effect in the bug or in the fix, or it should have already land stables. I am
> very sorry for such a preliminary bug that caused this fallout - I really can't
> tell why I completely didn't look at is_swap_pte() that's so obvious indeed.
> 
> I checked it up, 5.6.y doesn't have the issue commit yet as it's not marked as
> "fixes". It started to show up in 5.7.y~5.13.y. 5.14-rc1 has 8f34f1eac382 which
> is the fix.  So I think we need the fix or equivalent fix for 5.7.y~5.13.y.
> 
> 5.12.y & 5.13.y can pick up the fix 8f34f1eac382 cleanly.  For the olders
> (5.7.y~5.11.y) they can't.  I plan to revert b569a1760782 instead.
> 
...
> 
> Please let me know if there's any comment on the backport plan above, or I'll
> prepare the patches for all the branches before tomorrow.

Thanks for getting on to it so quickly, Peter.

The only non-EOL stable/longterm releases are then 5.13, 5.12 and 5.10.

I have no appreciation of the importance of UFFD_EVENT_FORK support
for uffd-wp.  And no appreciation of the importance of the other bugs
you fixed in 8f34f1eac382, and other uffd-wp fixes you may have made
recently, some backported, some not.

But I think it is worth giving 5.10, the longterm, a little more
consideration: don't be driven by whether 8f34f1eac382 applies cleanly
(all 5.13 and 5.12 would require then is a mail to GregKH Cc stable
asking him to add that commit), but by how important the support is
to users of 5.10, and how far away from working safely it is - maybe
a 5.10-specific patch would be worthwhile, maybe not, I cannot judge.

Hugh



* Re: kernel BUG at include/linux/swapops.h:204!
  2021-07-19 22:42             ` Hugh Dickins
@ 2021-07-20  0:34               ` Peter Xu
  2021-07-20  3:31                 ` Hugh Dickins
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2021-07-20  0:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Igor Raits, Andrew Morton, Hillf Danton, Axel Rasmussen, linux-mm

On Mon, Jul 19, 2021 at 03:42:41PM -0700, Hugh Dickins wrote:
> On Mon, 19 Jul 2021, Peter Xu wrote:
> > On Mon, Jul 19, 2021 at 12:11:21PM -0700, Hugh Dickins wrote:
> > > 
> > > But I'm confident that 8f34f1eac382 will prove to be the fix, so Peter
> > > please prepare some backports of that for the various stable/longterm
> > > kernels that need it - I've not looked into whether it applies cleanly,
> > > or depends on other commits too.  You fixed several related but different
> > > things in that commit: but this one is the worst, because it can corrupt
> > > even those who are not using UFFD_WP at all.
> > 
> > Looks right to me, b569a1760782 ("userfaultfd: wp: drop _PAGE_UFFD_WP properly
> > when fork", 2020-04-07) seems to be the culprit.  I didn't notice the side
> > effect in the bug or in the fix, or it should have already land stables. I am
> > very sorry for such a preliminary bug that caused this fallout - I really can't
> > tell why I completely didn't look at is_swap_pte() that's so obvious indeed.
> > 
> > I checked it up, 5.6.y doesn't have the issue commit yet as it's not marked as
> > "fixes". It started to show up in 5.7.y~5.13.y. 5.14-rc1 has 8f34f1eac382 which
> > is the fix.  So I think we need the fix or equivalent fix for 5.7.y~5.13.y.
> > 
> > 5.12.y & 5.13.y can pick up the fix 8f34f1eac382 cleanly.  For the olders
> > (5.7.y~5.11.y) they can't.  I plan to revert b569a1760782 instead.
> > 
> ...
> > 
> > Please let me know if there's any comment on the backport plan above, or I'll
> > prepare the patches for all the branches before tomorrow.
> 
> Thanks for getting on to it so quickly, Peter.
> 
> The only non-EOL stable/longterm releases are then 5.13, 5.12 and 5.10.

I see, thanks.  I haven't backported patches to stable myself before; it's a
good chance to learn how it works, though.

> 
> I have no appreciation of the importance of UFFD_EVENT_FORK support
> for uffd-wp.  And no appreciation of the importance of the other bugs
> you fixed in 8f34f1eac382, and other uffd-wp fixes you may have made
> recently, some backported, some not.
> 
> But I think it is worth giving 5.10, the longterm, a little more
> consideration: don't be driven by whether 8f34f1eac382 applies cleanly
> (all 5.13 and 5.12 would require then is a mail to GregKH Cc stable
> asking him to add that commit), but by how important the support is
> to users of 5.10, and how far away from working safely it is - maybe
> a 5.10-specific patch would be worthwhile, maybe not, I cannot judge.

I am not aware of anyone who's using fork with uffd-wp.  To my knowledge CRIU
is the major user of the uffd fork event, but it shouldn't be using uffd-wp.
I know of other users of uffd-wp, but never with the fork event.

As I understand it, if the above is true then patches fixing uffd-wp + fork
are probably not good candidates for a stable backport, since the stable tree
rules include this entry:

  - It must fix a real bug that bothers people (not a, “This could be a
    problem…” type thing).

https://www.kernel.org/doc/html/v5.13/process/stable-kernel-rules.html

That's also one reason I didn't add Fixes tags to some patches: I wasn't sure
whether that would help anyone, and in the worst case, if someone hits an
issue, we can backport explicitly.

A few other things make me hesitate about backporting the full patch, even
for 5.12/5.13, as there are requirements on the patch:

  - It cannot be bigger than 100 lines, with context.
  - It must fix only one thing.

The full patch (once squashed) contains ~200 LOC with context, and it does fix
a few more things... Do you know whether that would be a problem?

I'm not sure how strict the stable branches are, but if that's a blocker then
we can only either pick up just the real fix for copy_pmd_range() or revert
the offending commit, which is only a few more lines; that should also keep
the tree cleaner.  In that sense, maybe it's easier to just revert on all the
branches (5.10/5.12/5.13).  Then if someone's uffd-wp+fork usage breaks, I can
backport the full patch with better reasoning.

Thoughts?

Thanks,

-- 
Peter Xu




* Re: kernel BUG at include/linux/swapops.h:204!
  2021-07-20  0:34               ` Peter Xu
@ 2021-07-20  3:31                 ` Hugh Dickins
  0 siblings, 0 replies; 24+ messages in thread
From: Hugh Dickins @ 2021-07-20  3:31 UTC (permalink / raw)
  To: Peter Xu
  Cc: Hugh Dickins, Igor Raits, Andrew Morton, Hillf Danton,
	Axel Rasmussen, linux-mm

On Mon, 19 Jul 2021, Peter Xu wrote:
> On Mon, Jul 19, 2021 at 03:42:41PM -0700, Hugh Dickins wrote:
> > On Mon, 19 Jul 2021, Peter Xu wrote:
> > > On Mon, Jul 19, 2021 at 12:11:21PM -0700, Hugh Dickins wrote:
> > > > 
> > > > But I'm confident that 8f34f1eac382 will prove to be the fix, so Peter
> > > > please prepare some backports of that for the various stable/longterm
> > > > kernels that need it - I've not looked into whether it applies cleanly,
> > > > or depends on other commits too.  You fixed several related but different
> > > > things in that commit: but this one is the worst, because it can corrupt
> > > > even those who are not using UFFD_WP at all.
> > > 
> > > Looks right to me, b569a1760782 ("userfaultfd: wp: drop _PAGE_UFFD_WP properly
> > > when fork", 2020-04-07) seems to be the culprit.  I didn't notice the side
> > > effect in the bug or in the fix, or it should have already land stables. I am
> > > very sorry for such a preliminary bug that caused this fallout - I really can't
> > > tell why I completely didn't look at is_swap_pte() that's so obvious indeed.
> > > 
> > > I checked it up, 5.6.y doesn't have the issue commit yet as it's not marked as
> > > "fixes". It started to show up in 5.7.y~5.13.y. 5.14-rc1 has 8f34f1eac382 which
> > > is the fix.  So I think we need the fix or equivalent fix for 5.7.y~5.13.y.
> > > 
> > > 5.12.y & 5.13.y can pick up the fix 8f34f1eac382 cleanly.  For the olders
> > > (5.7.y~5.11.y) they can't.  I plan to revert b569a1760782 instead.
> > > 
> > ...
> > > 
> > > Please let me know if there's any comment on the backport plan above, or I'll
> > > prepare the patches for all the branches before tomorrow.
> > 
> > Thanks for getting on to it so quickly, Peter.
> > 
> > The only non-EOL stable/longterm releases are then 5.13, 5.12 and 5.10.
> 
> I see, thanks.  I haven't explicitly backported patches to stable; it's a good
> chance to learn about the bits, though.
> 
> > 
> > I have no appreciation of the importance of UFFD_EVENT_FORK support
> > for uffd-wp.  And no appreciation of the importance of the other bugs
> > you fixed in 8f34f1eac382, and other uffd-wp fixes you may have made
> > recently, some backported, some not.
> > 
> > But I think it is worth giving 5.10, the longterm, a little more
> > consideration: don't be driven by whether 8f34f1eac382 applies cleanly
> > (all 5.13 and 5.12 would require then is a mail to GregKH Cc stable
> > asking him to add that commit), but by how important the support is
> > to users of 5.10, and how far away from working safely it is - maybe
> > a 5.10-specific patch would be worthwhile, maybe not, I cannot judge.
> 
> I am not aware of anyone who's using fork with uffd-wp.  CRIU is the major user
> per my knowledge that uses uffd fork, but still it shouldn't be using uffd-wp.
> I know other users that use uffd-wp, but never with fork event.

Okay. It does run a risk, that an unknown 5.10 longterm user is relying
on uffd-wp with uffd-fork, and the revert will break their usage in what
is supposed to be a stable kernel. But I'll let you be the judge of that.

> 
> Per my understanding, if above is true then it's probably not a good candidate
> for such patches that fixing uffd-wp + fork to be backported to stable, as in
> stable tree rules there's one entry:
> 
>   - It must fix a real bug that bothers people (not a, “This could be a
>     problem…” type thing).

BUG_ON(!PageLocked) is a real bug that bothered Igor (though not on 5.10).
I cannot estimate the seriousness of the other fixes in that commit.

> 
> https://www.kernel.org/doc/html/v5.13/process/stable-kernel-rules.html

I have not re-read those rules recently, but last time I questioned them,
the conclusion seemed to be that they're reasons the stable guys can give
for refusing a patch, but generally nowadays they put in much more than
fits those rules; to the extent that akpm has to restrain them from time
to time, insisting that mm patches for stable be his decision.

> 
> That's also one reason I didn't add Fixes for some patches because I am not
> sure whether that'll help anyone, and the worst case is if someone hit some
> issue we can backport explicitly.

Putting in Fixes generally helps, but whether then to Cc stable will
usually be akpm's decision based on our input.  In the case of a BUG
that hit a real user, I feel fairly confident predicting his decision.

> 
> There're a few other things that made me a bit worried before backporting the
> full patch, even for 5.12/5.13, as there're requirements on the patch:
> 
>   - It cannot be bigger than 100 lines, with context.
>   - It must fix only one thing.
> 
> The full patch (after squashed) contains ~200 LOC with context, and it does fix
> a few more things... Do you know whether that would be a problem?

I doubt it. But I'd have to spend too long reading the git-log man-page
to learn what options to give it to find an example in stable.git to
reassure you.

I'd worry much more about the quality of your commit than its size.
If you're uneasy about that now, then it should not have gone into
5.14-rc.  If you're confident in it, then I'd say it's good for stable:
unless you suspect interdependent fixes, some of which may already be
there in 5.10.y (not always by akpm's choice), some of which not: that
can get messy. Applying cleanly is not the most important criterion.

> 
> I'm not sure how strict would stable branches be, but if that's one blocker
> then we'll only be able to either pick up the real fix for copy_pmd_range() or
> just revert the issue commit which is just a few more lines; that should also
> keep the tree cleaner.  From that sense, maybe it's easier to just revert for
> all the branches (5.10/5.12/5.13).  Then if someone broke with uffd-wp+fork, I
> can backport the full patch with better reasoning.

5.10 is a long-lasting foundation for many, considerably more important
than 5.12 and 5.13 which will likely reach EOL soon: so yes, if you have
any doubts about 8f34f1eac382, or the way it will fit into earlier releases,
and think a revert of uffd-wp+fork is safer, that may be the right thing
for all those releases.

I feel like I've used a lot of words to say nothing: back to patches!

Hugh


* Re: kernel BUG at include/linux/swapops.h:204!
  2021-07-19 22:12           ` Peter Xu
  2021-07-19 22:42             ` Hugh Dickins
@ 2021-07-20  7:47             ` Igor Raits
  2021-07-20 16:01               ` Peter Xu
  1 sibling, 1 reply; 24+ messages in thread
From: Igor Raits @ 2021-07-20  7:47 UTC (permalink / raw)
  To: Peter Xu
  Cc: Hugh Dickins, Andrew Morton, Hillf Danton, Axel Rasmussen, linux-mm

On Tue, Jul 20, 2021 at 12:13 AM Peter Xu <peterx@redhat.com> wrote:

> On Mon, Jul 19, 2021 at 12:11:21PM -0700, Hugh Dickins wrote:
> > Hi Peter,
>
> Hi, Hugh,
>
> >
> > I believe you have already fixed this, but the fix needs to go to stable.
> > Sorry, the messages below are a muddle of top and middle posting,
> > I'll resume at the bottom.
> >
> > On Fri, 16 Jul 2021, Hugh Dickins wrote:
> > > On Thu, 15 Jul 2021, Igor Raits wrote:
> > >
> > > > Hi everyone again,
> > > >
> > > > > I've been trying to reproduce this issue but still can't find a
> > > > > consistent pattern.
> > > >
> > > > However, it did happen once more and this time on 5.13.1:
> > >
> > > Thanks for the updates, Igor.
> > >
> > > I have to admit that what you have reported confirms the suspicion
> > > that it's a bug introduced by one of my "stable" patches in 5.12.14
> > > (which are also in 5.13): nothing else between 5.12.12 and 5.12.14
> > > seems likely to be relevant.
> > >
> > > But I've gone back and forth and not been able to spot the problem.
> > >
> > > Please would you send (either privately to me, or to the list) your
> > > 5.13.1 kernel's .config, and disassembly of pmd_migration_entry_wait()
> > > from its vmlinux (with line numbers if available; or just send the
> > > whole vmlinux if that's easier, and I'll disassemble).
> > >
> > > I am hoping that the disassembly, together with the register contents
> > > that you've shown, will help guide towards an answer.
> > >
> > > Thanks,
> > > Hugh
> > >
> > > >
> > > > [  222.068216] ------------[ cut here ]------------
> > > > [  222.072884] kernel BUG at include/linux/swapops.h:204!
> > > > [  222.078062] invalid opcode: 0000 [#1] SMP NOPTI
> > > > [  222.082618] CPU: 38 PID: 9828 Comm: rpc-worker Tainted: G
>     E
> > > >   5.13.1-1.gdc.el8.x86_64 #1
> > > > [  222.091894] Hardware name: HPE ProLiant DL380 Gen10/ProLiant DL380
> > > > Gen10, BIOS U30 05/24/2021
> > > > [  222.100468] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > > > [  222.105994] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c5 f6 ff 48 81
> e2 00
> > > > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff
> ff
> > > > <0f> 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > > > [  222.124878] RSP: 0000:ffffbcfe9eb7bdd8 EFLAGS: 00010246
> > > > [  222.130134] RAX: 0057ffffc0000000 RBX: ffff9eec4b3cdbf8 RCX:
> > > > ffffffffffffffff
> > > > [  222.137309] RDX: 0000000000000000 RSI: ffff9eec4b3cdbf8 RDI:
> > > > ffffdf55c52cf368
> > > > [  222.144485] RBP: ffffdf55c52cf368 R08: ffffdf57428d8080 R09:
> > > > 0000000000000000
> > > > [  222.151661] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > > 0000000000000bf8
> > > > [  222.158837] R13: 0400000000000000 R14: 0400000000000080 R15:
> > > > ffff9eec2825b1f8
> > > > [  222.166015] FS:  00007f6754aeb700(0000) GS:ffff9f49bfd00000(0000)
> > > > knlGS:0000000000000000
> > > > [  222.174153] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  222.179932] CR2: 00007f676ffffd98 CR3: 000000012bf6a002 CR4:
> > > > 00000000007726e0
> > > > [  222.187109] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > 0000000000000000
> > > > [  222.194283] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > > 0000000000000400
> > > > [  222.201457] PKRU: 55555554
> > > > [  222.204178] Call Trace:
> > > > [  222.206638]  __handle_mm_fault+0x5ad/0x6e0
> > > > [  222.210760]  ? sysvec_call_function_single+0xb/0x90
> > > > [  222.215672]  handle_mm_fault+0xc5/0x290
> > > > [  222.219529]  do_user_addr_fault+0x1a9/0x660
> > > > [  222.223740]  ? sched_clock_cpu+0xc/0xa0
> > > > [  222.227602]  exc_page_fault+0x68/0x130
> > > > [  222.231373]  ? asm_exc_page_fault+0x8/0x30
> > > > [  222.235495]  asm_exc_page_fault+0x1e/0x30
> > > > [  222.239526] RIP: 0033:0x7f67baaed734
> > > > [  222.243120] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b 21 00
> 31 c0
> > > > 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39 d2 74
> 22
> > > > <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00 00 c7
> > > > [  222.262002] RSP: 002b:00007f6754aea298 EFLAGS: 00010287
> > > > [  222.267257] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > > > 0000000000000000
> > > > [  222.274432] RDX: 00007f676ffff700 RSI: 00007f676ffff9c0 RDI:
> > > > 00007f676f7fec10
> > > > [  222.281609] RBP: 0000000000000001 R08: 00007f676f7fed10 R09:
> > > > 00007f67bad012f0
> > > > [  222.288785] R10: 00007f6754aeb700 R11: 0000000000000202 R12:
> > > > 0000000000000001
> > > > [  222.295961] R13: 0000000000000006 R14: 0000000000000e28 R15:
> > > > 00007f674006e1f0
> > > > [  222.303137] Modules linked in: vhost_net(E) vhost(E)
> vhost_iotlb(E)
> > > > tap(E) tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E)
> > > > nf_tables(E) vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E)
> > > > binfmt_misc(E) iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E)
> > > > bonding(E) tls(E) vfat(E) fat(E) dm_service_time(E) dm_multipath(E)
> > > > rpcrdma(E) sunrpc(E) rdma_ucm(E) ib_srpt(E) ib_isert(E)
> iscsi_target_mod(E)
> > > > target_core_mod(E) ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E)
> libiscsi(E)
> > > > intel_rapl_msr(E) intel_rapl_common(E) scsi_transport_iscsi(E)
> > > > isst_if_common(E) ipmi_ssif(E) nfit(E) libnvdimm(E)
> x86_pkg_temp_thermal(E)
> > > > intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E)
> > > > crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E)
> qedr(E)
> > > > mei_me(E) acpi_ipmi(E) ib_uverbs(E) intel_cstate(E) ipmi_si(E)
> ib_core(E)
> > > > ipmi_devintf(E) dm_mod(E) ioatdma(E) ses(E) intel_uncore(E) pcspkr(E)
> > > > enclosure(E) mei(E) hpwdt(E) hpilo(E) lpc_ich(E) intel_pch_thermal(E)
> > > > dca(E) ipmi_msghandler(E)
> > > > [  222.303181]  acpi_power_meter(E) ext4(E) mbcache(E) jbd2(E)
> sd_mod(E)
> > > > t10_pi(E) sg(E) qedf(E) qede(E) libfcoe(E) qed(E) libfc(E)
> smartpqi(E)
> > > > scsi_transport_fc(E) tg3(E) scsi_transport_sas(E) crc8(E) wmi(E)
> > > > nf_conntrack(E) libcrc32c(E) crc32c_intel(E) nf_defrag_ipv6(E)
> > > > nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E)
> > > > [  222.420050] ---[ end trace bcf7b6d1610cc21f ]---
> > > > [  222.572925] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > > > [  222.578469] Code: 02 00 00 00 5b 4c 89 c7 5d e9 ca c5 f6 ff 48 81
> e2 00
> > > > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44 ff ff
> ff
> > > > <0f> 0b 48 8b 2d 65 48 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41 55 48
> > > > [  222.597359] RSP: 0000:ffffbcfe9eb7bdd8 EFLAGS: 00010246
> > > > [  222.602620] RAX: 0057ffffc0000000 RBX: ffff9eec4b3cdbf8 RCX:
> > > > ffffffffffffffff
> > > > [  222.609807] RDX: 0000000000000000 RSI: ffff9eec4b3cdbf8 RDI:
> > > > ffffdf55c52cf368
> > > > [  222.616990] RBP: ffffdf55c52cf368 R08: ffffdf57428d8080 R09:
> > > > 0000000000000000
> > > > [  222.624177] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > > 0000000000000bf8
> > > > [  222.631361] R13: 0400000000000000 R14: 0400000000000080 R15:
> > > > ffff9eec2825b1f8
> > > > [  222.638548] FS:  00007f6754aeb700(0000) GS:ffff9f49bfd00000(0000)
> > > > knlGS:0000000000000000
> > > > [  222.646694] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [  222.652481] CR2: 00007f676ffffd98 CR3: 000000012bf6a002 CR4:
> > > > 00000000007726e0
> > > > [  222.659665] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > 0000000000000000
> > > > [  222.666850] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > > 0000000000000400
> > > > [  222.674031] PKRU: 55555554
> > > > [  222.676758] Kernel panic - not syncing: Fatal exception
> > > > [  222.817538] Kernel Offset: 0x16000000 from 0xffffffff81000000
> > > > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > [  222.965540] ---[ end Kernel panic - not syncing: Fatal exception
> ]---
> > > >
> > > > > On Sun, Jul 11, 2021 at 8:06 AM Igor Raits <igor@gooddata.com> wrote:
> > > >
> > > > > Hi Hugh,
> > > > >
> > > > > > On Sun, Jul 11, 2021 at 6:17 AM Hugh Dickins <hughd@google.com> wrote:
> > > > >
> > > > >> On Sat, 10 Jul 2021, Igor Raits wrote:
> > > > >>
> > > > >> > Hello,
> > > > >> >
> > > > > >> > I've seen one weird bug on 5.12.14 that happened a couple of
> > > > > >> > times when I started a bunch of VMs on a server.
> > > > >>
> > > > > >> Would it be possible for you to try the same on a 5.12.13 kernel?
> > > > > >> Perhaps by reverting the diff between 5.12.13 and 5.12.14 temporarily.
> > > > > >> Enough to form an impression of whether the issue is new in 5.12.14.
> > > > >>
> > > > >
> > > > > > We've been using 5.12.12 for quite some time (~ a month) and I never saw
> > > > > > it there.
> > > > > >
> > > > > > But I have to admit that I don't really have a reproducer. For example, on
> > > > > > servers where it happened,
> > > > > > I just rebooted them and panic did not happen anymore (so I saw it only
> > > > > > once, only on 2 servers out of 32 that we have on 5.12.14).
> > > > >
> > > > >
> > > > > >> I ask because 5.12.14 did include several fixes and cleanups from me
> > > > > >> to page_vma_mapped_walk(), and that is involved in inserting and
> > > > > >> removing pmd migration entries.  I am not aware of introducing any
> > > > > >> bug there, but your report has got me worried.  If it's happening in
> > > > > >> 5.12.14 but not in 5.12.13, then I must look again at my changes.
> > > > > >>
> > > > > >> I don't expect Hillf's patch to help at all: the pmd_lock()
> > > > >> is supposed to be taken by page_vma_mapped_walk(), before
> > > > >> set_pmd_migration_entry() and remove_migration_pmd() are called.
> > > > >>
> > > > >> Thanks,
> > > > >> Hugh
> > > > >>
> > > > >> >
> > > > > >> > I've briefly googled this problem but could not find any relevant commit
> > > > > >> > that would fix this issue.
> > > > > >> >
> > > > > >> > Do you have any hint how to debug this further or know the fix by any
> > > > > >> > chance?
> > > > >> >
> > > > >> > Thanks in advance. Stack trace following:
> > > > >> >
> > > > >> > [  376.876610] ------------[ cut here ]------------
> > > > >> > [  376.881274] kernel BUG at include/linux/swapops.h:204!
> > > > >> > [  376.886455] invalid opcode: 0000 [#1] SMP NOPTI
> > > > >> > [  376.891014] CPU: 40 PID: 11775 Comm: rpc-worker Tainted: G
> > > > >>   E
> > > > >> >     5.12.14-1.gdc.el8.x86_64 #1
> > > > >> > [  376.900464] Hardware name: HPE ProLiant DL380 Gen10/ProLiant
> DL380
> > > > >> > Gen10, BIOS U30 05/24/2021
> > > > >> > [  376.909038] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > > > >> > [  376.914562] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff
> 48 81 e2
> > > > >> 00
> > > > >> > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44
> ff ff ff
> > > > >> > <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41
> 55 48
> > > > >> > [  376.933443] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
> > > > >> > [  376.938701] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX:
> > > > >> > ffffffffffffffff
> > > > >> > [  376.945878] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI:
> > > > >> > fffff497473b2ae8
> > > > >> > [  376.953055] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09:
> > > > >> > 0000000000000000
> > > > >> > [  376.960230] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > > >> > 0000000000000af8
> > > > >> > [  376.967407] R13: 0400000000000000 R14: 0400000000000080 R15:
> > > > >> > ffff908bbef7b6a8
> > > > >> > [  376.974582] FS:  00007f5bb1f81700(0000)
> GS:ffff90e87fd80000(0000)
> > > > >> > knlGS:0000000000000000
> > > > >> > [  376.982718] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > >> > [  376.988497] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4:
> > > > >> > 00000000007726e0
> > > > >> > [  376.995673] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > >> > 0000000000000000
> > > > >> > [  377.002849] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > > >> > 0000000000000400
> > > > >> > [  377.010026] PKRU: 55555554
> > > > >> > [  377.012745] Call Trace:
> > > > >> > [  377.015207]  __handle_mm_fault+0x5ad/0x6e0
> > > > >> > [  377.019335]  handle_mm_fault+0xc5/0x290
> > > > >> > [  377.023194]  do_user_addr_fault+0x1cd/0x740
> > > > >> > [  377.027406]  exc_page_fault+0x54/0x110
> > > > >> > [  377.031182]  ? asm_exc_page_fault+0x8/0x30
> > > > >> > [  377.035307]  asm_exc_page_fault+0x1e/0x30
> > > > >> > [  377.039340] RIP: 0033:0x7f5bb91d6734
> > > > >> > [  377.042937] Code: 89 08 48 8b 35 dd 3b 21 00 4c 8d 0d d6 3b
> 21 00 31
> > > > >> c0
> > > > >> > 4c 39 ce 74 73 0f 1f 80 00 00 00 00 48 8d 96 40 fd ff ff 49 39
> d2 74 22
> > > > >> > <48> 8b 96 d8 03 00 00 48 01 15 4e 7c 21 00 80 be 50 03 00 00
> 00 c7
> > > > >> > [  377.061820] RSP: 002b:00007f5bb1f7ff58 EFLAGS: 00010206
> > > > >> > [  377.067076] RAX: 0000000000000000 RBX: 0000000000000000 RCX:
> > > > >> > 00007f5ba0000020
> > > > >> > [  377.074255] RDX: 00007f5b2bfff700 RSI: 00007f5b2bfff9c0 RDI:
> > > > >> > 0000000000000001
> > > > >> > [  377.081429] RBP: 0000000000000001 R08: 0000000000000000 R09:
> > > > >> > 00007f5bb93ea2f0
> > > > >> > [  377.088606] R10: 00007f5bb1f81700 R11: 0000000000000202 R12:
> > > > >> > 0000000000000001
> > > > >> > [  377.095782] R13: 0000000000000006 R14: 0000000000000cb4 R15:
> > > > >> > 00007f5bb1f801f0
> > > > >> > [  377.102958] Modules linked in: ebt_arp(E) nft_meta_bridge(E)
> > > > >> > ip6_tables(E) xt_CT(E) nf_log_ipv4(E) nf_log_common(E)
> nft_limit(E)
> > > > >> > nft_counter(E) xt_LOG(E) xt_limit(E) xt_mac(E) xt_set(E)
> xt_multiport(E)
> > > > >> > xt_state(E) xt_conntrack(E) xt_comment(E) xt_physdev(E)
> nft_compat(E)
> > > > >> > ip_set_hash_net(E) ip_set(E) vhost_net(E) vhost(E)
> vhost_iotlb(E) tap(E)
> > > > >> > tun(E) tcp_diag(E) udp_diag(E) inet_diag(E) netconsole(E)
> nf_tables(E)
> > > > >> > vxlan(E) ip6_udp_tunnel(E) udp_tunnel(E) nfnetlink(E)
> binfmt_misc(E)
> > > > >> > iscsi_tcp(E) libiscsi_tcp(E) 8021q(E) garp(E) mrp(E) bonding(E)
> tls(E)
> > > > >> > vfat(E) fat(E) dm_service_time(E) dm_multipath(E) rpcrdma(E)
> sunrpc(E)
> > > > >> > rdma_ucm(E) ib_srpt(E) ib_isert(E) iscsi_target_mod(E)
> > > > >> target_core_mod(E)
> > > > >> > ib_iser(E) rdma_cm(E) iw_cm(E) ib_cm(E) libiscsi(E)
> > > > >> scsi_transport_iscsi(E)
> > > > >> > intel_rapl_msr(E) qedr(E) intel_rapl_common(E) ib_uverbs(E)
> > > > >> > isst_if_common(E) ib_core(E) nfit(E) libnvdimm(E)
> > > > >> x86_pkg_temp_thermal(E)
> > > > >> > intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E)
> > > > >> > crct10dif_pclmul(E)
> > > > >> > [  377.102999]  crc32_pclmul(E) ghash_clmulni_intel(E) rapl(E)
> > > > >> > intel_cstate(E) ipmi_ssif(E) acpi_ipmi(E) ipmi_si(E) mei_me(E)
> > > > >> ioatdma(E)
> > > > >> > ipmi_devintf(E) dm_mod(E) ses(E) intel_uncore(E) pcspkr(E)
> qede(E)
> > > > >> > enclosure(E) tg3(E) mei(E) lpc_ich(E) hpilo(E) hpwdt(E)
> > > > >> > intel_pch_thermal(E) dca(E) ipmi_msghandler(E)
> acpi_power_meter(E)
> > > > >> ext4(E)
> > > > >> > mbcache(E) jbd2(E) sd_mod(E) t10_pi(E) sg(E) qedf(E) qed(E)
> crc8(E)
> > > > >> > libfcoe(E) libfc(E) smartpqi(E) scsi_transport_fc(E)
> > > > >> scsi_transport_sas(E)
> > > > >> > wmi(E) nf_conntrack(E) nf_defrag_ipv6(E) libcrc32c(E)
> crc32c_intel(E)
> > > > >> > nf_defrag_ipv4(E) br_netfilter(E) bridge(E) stp(E) llc(E)
> > > > >> > [  377.243468] ---[ end trace 04bce3bb051f7620 ]---
> > > > >> > [  377.385645] RIP: 0010:pmd_migration_entry_wait+0x132/0x140
> > > > >> > [  377.391194] Code: 02 00 00 00 5b 4c 89 c7 5d e9 8a e4 f6 ff
> 48 81 e2
> > > > >> 00
> > > > >> > f0 ff ff 48 f7 d2 48 21 c2 89 d1 f7 c2 81 01 00 00 75 80 e9 44
> ff ff ff
> > > > >> > <0f> 0b 48 8b 2d 75 bd 30 01 e9 ef fe ff ff 0f 1f 44 00 00 41
> 55 48
> > > > >> > [  377.410091] RSP: 0000:ffffb65a5e1cfdc8 EFLAGS: 00010246
> > > > >> > [  377.415355] RAX: 0017ffffc0000000 RBX: ffff908b8ecabaf8 RCX:
> > > > >> > ffffffffffffffff
> > > > >> > [  377.422540] RDX: 0000000000000000 RSI: ffff908b8ecabaf8 RDI:
> > > > >> > fffff497473b2ae8
> > > > >> > [  377.429721] RBP: fffff497473b2ae8 R08: fffff49747fa8080 R09:
> > > > >> > 0000000000000000
> > > > >> > [  377.436902] R10: 0000000000000000 R11: 0000000000000000 R12:
> > > > >> > 0000000000000af8
> > > > >> > [  377.444086] R13: 0400000000000000 R14: 0400000000000080 R15:
> > > > >> > ffff908bbef7b6a8
> > > > >> > [  377.451272] FS:  00007f5bb1f81700(0000)
> GS:ffff90e87fd80000(0000)
> > > > >> > knlGS:0000000000000000
> > > > >> > [  377.459415] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > >> > [  377.465196] CR2: 00007f5b2bfffd98 CR3: 00000001f793e006 CR4:
> > > > >> > 00000000007726e0
> > > > >> > [  377.472377] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > >> > 0000000000000000
> > > > >> > [  377.479556] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > > >> > 0000000000000400
> > > > >> > [  377.486738] PKRU: 55555554
> > > > >> > [  377.489465] Kernel panic - not syncing: Fatal exception
> > > > >> > [  377.573911] Kernel Offset: 0xa000000 from 0xffffffff81000000
> > > > >> (relocation
> > > > >> > range: 0xffffffff80000000-0xffffffffbfffffff)
> > > > >> > [  377.716482] ---[ end Kernel panic - not syncing: Fatal
> exception ]---
> >
> > Disassembly of the vmlinux Igor sent (along with other info) confirmed
> > something I suspected, that R08: fffff49747fa8080 in one of the dumps,
> > R08: ffffdf57428d8080 in the other, is the relevant struct page pointer
> > > (and RAX the page->flags, which look like it was pointing at a good page).
> >
> > A page pointer ....8080 in pmd_migration_entry_wait() is interesting:
> > normally I'd expect that to be ....0000 or ....8000, pointing to the
> > head of a huge page.  But instead it's pointing to the second tail
> > (though by now that compound page has been freed, and head pointers in
> > the tails reset to 0): as if the pfn has been incremented by 2 somehow.
> >
> > And if the pfn (swp_offset) in the migration entry has got corrupted,
> > then it's no surprise that when removing migration entries,
> > page_vma_mapped_walk() would see migration_entry_to_page(entry) != page,
> > so be unable to replace that migration entry, leaving it behind for the
> > user to hit BUG_ON(!PageLocked) in pmd_migration_entry_wait() when
> > faulting on it later.
> >
> > So, what might increment the swp_offset by 2? Hunt around the encodings.
> > Hmm, _PAGE_BIT_UFFD_WP is _PAGE_BIT_SOFTW2 which is bit 10, whereas
> > _PAGE_BIT_PROTNONE (top bit to be avoided in pte encoding of swap)
> > is _PAGE_BIT_GLOBAL is bit 8. After overcoming off-by-one confusions,
> > it looks like if something somewhere were to set _PAGE_BIT_UFFD_WP
> > in a migration pmd (whereas it's only suitable for a present pmd),
> > it would indeed increment the swp_offset by 2.
> >
> > Hunt for uffd_wps, and run across copy_huge_pmd() in mm/huge_memory.c:
> > in Igor's 5.13.1 and 5.12.14 and many others, that says
> >       if (!(vma->vm_flags & VM_UFFD_WP))
> >               pmd = pmd_clear_uffd_wp(pmd);
> > just *before* checking is_swap_pmd(). Fixed in 5.14-rc1 in commit
> > 8f34f1eac382 ("mm/userfaultfd: fix uffd-wp special cases for fork()").
> >
> > But clearing the bit would be harmless, wouldn't it? Because it wouldn't
> > be set anyway. Waste a day before remembering what I never forgot but
> > somehow blanked out: the L1TF "feature" forced us to invert the offset
> > bits in the pte encoding of a swap entry, so there really is a bit set
> > there in the pmd entry, and clearing it has the effect of setting it in
> > the corresponding swap entry, so incrementing the migration pfn by 2.
> >
> > I cannot explain why Igor never saw this crash on 5.12.12: maybe
> > something else in the environment changed around that time.  And it
> > will take several days for it to be confirmed as the fix in practice.
> >
> > But I'm confident that 8f34f1eac382 will prove to be the fix, so Peter
> > please prepare some backports of that for the various stable/longterm
> > kernels that need it - I've not looked into whether it applies cleanly,
> > or depends on other commits too.  You fixed several related but different
> > things in that commit: but this one is the worst, because it can corrupt
> > even those who are not using UFFD_WP at all.
>
> Looks right to me, b569a1760782 ("userfaultfd: wp: drop _PAGE_UFFD_WP properly
> when fork", 2020-04-07) seems to be the culprit.  I didn't notice the side
> effect in the bug or in the fix, or it would already have landed in stable.
> I am very sorry for such a preliminary bug that caused this fallout - I really
> can't tell why I completely didn't look at is_swap_pte() that's so obvious
> indeed.
>
> I checked: 5.6.y doesn't have the problematic commit, as it was not marked as
> "fixes". It is present in 5.7.y~5.13.y. 5.14-rc1 has 8f34f1eac382, which is
> the fix.  So I think we need the fix or an equivalent fix for 5.7.y~5.13.y.
>
> 5.12.y & 5.13.y can pick up the fix 8f34f1eac382 cleanly.  The older ones
> (5.7.y~5.11.y) can't.  I plan to revert b569a1760782 there instead.
>

FTR, even though 8f34f1eac382 applies cleanly, it does not compile on its own.
The 1st patch of that series (5fc7a5f6fd04) is also required: it removes a
use of *vma, which is later removed by the patch that fixes the actual problem.


>
> >
> > > Many thanks for reporting and helping, Igor.
> > Hugh
> >
> > p.s. Peter, unrelated to this particular bug, and should not divert from
> > fixing it: but looking again at those swap encodings, and particularly
> > the soft_dirty manipulations: they look very fragile. I think uffd_wp
> > was wrong to follow that bad example, and your upcoming new encoding
> > (that I have previously called elegant) takes it a worse step further.
> >
> > I think we should change to a rule where the architecture-independent
> > swp_entry_t contains *all* the info, including bits for soft_dirty and
> > uffd_wp, so that swap entry cases can move immediately to decoding from
> > arch-dependent pte to arch-independent swp_entry_t, and do all the
> > manipulations on that. But I don't have time to make that change, and
> > probably neither do you, and making the change is liable to introduce
> > errors itself. So, no immediate plans, but please keep in mind.
>
> Curious: did we encounter a similar issue previously, where the soft dirty bit
> was applied wrongly and caused hard-to-debug issues?
>
> If this is destined to be the best solution, I can work on both of them.  I am
> just worried that it's too big a change, as you said, so we don't know what's
> the most efficient considering the total time we'd spend to develop, review
> and debug them.
>
> The other alternative is we fix bugs; I know that's so cheap a word when I
> said it, however we still can't deny it as an option yet.
>
> We can definitely discuss this outside of this thread, and I'll prepare the
> backport first.  In any case, this bug definitely raises an alert, and I'll
> keep that in mind.
>
> Please let me know if there's any comment on the backport plan above, or I'll
> prepare the patches for all the branches before tomorrow.
>
> Thanks,
>
> --
> Peter Xu
>
>

-- 

Igor Raits



* [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork
  2021-07-19 19:11         ` Hugh Dickins
  2021-07-19 22:12           ` Peter Xu
@ 2021-07-20 15:51           ` Peter Xu
  2021-07-20 15:51             ` [PATCH stable 5.13.y/5.12.y 1/2] mm/thp: simplify copying of huge zero page pmd when fork Peter Xu
                               ` (2 more replies)
  2021-07-20 15:56           ` [PATCH stable 5.10.y " Peter Xu
  2 siblings, 3 replies; 24+ messages in thread
From: Peter Xu @ 2021-07-20 15:51 UTC (permalink / raw)
  To: stable, linux-mm, linux-kernel
  Cc: Hugh Dickins, Axel Rasmussen, Andrew Morton, peterx,
	Hillf Danton, Igor Raits

In summary: this series should be needed for 5.10/5.12/5.13. This is the
5.13.y/5.12.y backport of the series, and it should apply to both branches.
Patch 1 is a dependency of patch 2, while patch 2 should be the real fix.

This series should fix a rare race that was mentioned in this thread:

https://lore.kernel.org/linux-mm/796cbb7-5a1c-1ba0-dde5-479aba8224f2@google.com/

This wasn't discovered when the fix was proposed and merged, because the
fix was originally about uffd-wp and its fork event.  However, it turns out that
the problematic commit b569a1760782f3d also causes crashes on fork() of
pmd migration entries, which is even more severe than the original uffd-wp
problem.

The crash has been reproduced on at least 5.12.y, and it's possible that
5.13.y and 5.10.y could hit it too, since they have the problematic commit
b569a1760782f3d but lack the uffd-wp fix (8f34f1eac382, which is also
patch 2 of this series).

The pmd entry crash problem was reported by Igor Raits <igor@gooddata.com> and
debugged by Hugh Dickins <hughd@google.com>.
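
For reference, the corruption Hugh described can be reproduced with plain
integer arithmetic.  The sketch below is a simplified userspace model, not
kernel code: the model_* names are hypothetical, and the layout (5 type bits
at the top, offset stored inverted from bit 9 upwards, present-pmd uffd-wp at
bit 10) is an assumption based on the x86-64 encoding discussed in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model of a non-present (swap/migration) entry.
 * Assumed layout: type in the top 5 bits, offset stored INVERTED
 * starting at bit 9, i.e. one above _PAGE_BIT_PROTNONE (bit 8).
 */
#define MODEL_TYPE_BITS        5
#define MODEL_OFFSET_FIRST_BIT 9
#define MODEL_OFFSET_SHIFT     (MODEL_OFFSET_FIRST_BIT + MODEL_TYPE_BITS)

/* uffd-wp bit of a *present* pmd (bit 10); meaningless in a swap entry. */
#define MODEL_PAGE_UFFD_WP     (UINT64_C(1) << 10)

/* Encode (type, offset) into the raw entry value, inverting the offset. */
static uint64_t model_swp_encode(uint64_t type, uint64_t offset)
{
	assert(type < (UINT64_C(1) << MODEL_TYPE_BITS));
	return (((~offset) << MODEL_OFFSET_SHIFT) >> MODEL_TYPE_BITS) |
	       (type << (64 - MODEL_TYPE_BITS));
}

/* Decode the offset: invert again, then shift out the type and low bits. */
static uint64_t model_swp_offset(uint64_t val)
{
	return ((~val) << MODEL_TYPE_BITS) >> MODEL_OFFSET_SHIFT;
}

/* What the old copy_huge_pmd() effectively did before checking for swap. */
static uint64_t model_buggy_clear_uffd_wp(uint64_t val)
{
	return val & ~MODEL_PAGE_UFFD_WP;
}
```

With an offset aligned to a huge page, e.g. 0x200, bit 10 of the encoded entry
is set because the offset is stored inverted; clearing it makes the decoded
offset 0x202, i.e. the pfn moves forward by 2.  The type value passed in is
arbitrary in this model.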

Please review, thanks.

Peter Xu (2):
  mm/thp: simplify copying of huge zero page pmd when fork
  mm/userfaultfd: fix uffd-wp special cases for fork()

 include/linux/huge_mm.h |  2 +-
 include/linux/swapops.h |  2 ++
 mm/huge_memory.c        | 36 +++++++++++++++++-------------------
 mm/memory.c             | 25 +++++++++++++------------
 4 files changed, 33 insertions(+), 32 deletions(-)

-- 
2.31.1





* [PATCH stable 5.13.y/5.12.y 1/2] mm/thp: simplify copying of huge zero page pmd when fork
  2021-07-20 15:51           ` [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Peter Xu
@ 2021-07-20 15:51             ` Peter Xu
  2021-07-20 15:51             ` [PATCH stable 5.13.y/5.12.y 2/2] mm/userfaultfd: fix uffd-wp special cases for fork() Peter Xu
  2021-07-20 20:32             ` [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Hugh Dickins
  2 siblings, 0 replies; 24+ messages in thread
From: Peter Xu @ 2021-07-20 15:51 UTC (permalink / raw)
  To: stable, linux-mm, linux-kernel
  Cc: Hugh Dickins, Axel Rasmussen, Andrew Morton, peterx,
	Hillf Danton, Igor Raits

Patch series "mm/uffd: Misc fix for uffd-wp and one more test".

This series tries to fix some corner case bugs for uffd-wp on either thp
or fork().  It then introduces a new test with pagemap/pageout.

Patch layout:

Patch 1:    cleanup for THP, it'll slightly simplify the follow up patches
Patch 2-4:  misc fixes for uffd-wp here and there; please refer to each patch
Patch 5:    add pagemap support for uffd-wp
Patch 6:    add pagemap/pageout test for uffd-wp

The last test introduced can also verify some of the fixes in previous
patches, as the test will fail without the fixes.  However it's not easy
to verify all the changes in patch 2-4, but hopefully they can still be
properly reviewed.

Note that if considering the ongoing uffd-wp shmem & hugetlbfs work, patch
5 will be incomplete as it's missing e.g. the hugetlbfs part or the special
swap pte detection.  However that's not needed in this series, and since
that series is still under review, this series does not depend on that
one (the last test only runs with anonymous memory, not file-backed).  So
this series can be merged even before that series.

This patch (of 6):

Huge zero page is handled in a special path in copy_huge_pmd(), however it
should share most of its code with a normal thp page.  Share more code
with it by removing the special path.  The only leftover so far is the
huge zero page refcounting (mm_get_huge_zero_page()), because that's
separately done with a global counter.

This prepares for a future patch to modify the huge pmd to be installed,
so that we don't need to duplicate it explicitly into the huge zero page case
too.

Link: https://lkml.kernel.org/r/20210428225030.9708-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210428225030.9708-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mike Kravetz <mike.kravetz@oracle.com>, peterx@redhat.com
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 5fc7a5f6fd04bc18f309d9f979b32ef7d1d0a997)
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/huge_memory.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8857ef1543eb..4cea1e218b48 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1088,17 +1088,13 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * a page table.
 	 */
 	if (is_huge_zero_pmd(pmd)) {
-		struct page *zero_page;
 		/*
 		 * get_huge_zero_page() will never allocate a new page here,
 		 * since we already have a zero page to copy. It just takes a
 		 * reference.
 		 */
-		zero_page = mm_get_huge_zero_page(dst_mm);
-		set_huge_zero_page(pgtable, dst_mm, vma, addr, dst_pmd,
-				zero_page);
-		ret = 0;
-		goto out_unlock;
+		mm_get_huge_zero_page(dst_mm);
+		goto out_zero_page;
 	}
 
 	src_page = pmd_page(pmd);
@@ -1122,6 +1118,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	get_page(src_page);
 	page_dup_rmap(src_page, true);
 	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+out_zero_page:
 	mm_inc_nr_ptes(dst_mm);
 	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
 
-- 
2.31.1




* [PATCH stable 5.13.y/5.12.y 2/2] mm/userfaultfd: fix uffd-wp special cases for fork()
  2021-07-20 15:51           ` [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Peter Xu
  2021-07-20 15:51             ` [PATCH stable 5.13.y/5.12.y 1/2] mm/thp: simplify copying of huge zero page pmd when fork Peter Xu
@ 2021-07-20 15:51             ` Peter Xu
  2021-07-20 20:32             ` [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Hugh Dickins
  2 siblings, 0 replies; 24+ messages in thread
From: Peter Xu @ 2021-07-20 15:51 UTC (permalink / raw)
  To: stable, linux-mm, linux-kernel
  Cc: Hugh Dickins, Axel Rasmussen, Andrew Morton, peterx,
	Hillf Danton, Igor Raits

We tried to do something similar in b569a1760782 ("userfaultfd: wp: drop
_PAGE_UFFD_WP properly when fork") previously, but it did not get everything
right.  A few fixes around the code path:

1. We were referencing VM_UFFD_WP vm_flags on the _old_ vma rather
   than the new vma.  That was overlooked in b569a1760782, so it won't work
   as expected.  Thanks to the recent rework on fork code
   (7a4830c380f3a8b3), we can easily get the new vma now, so switch the
   checks to that.

2. Dropping the uffd-wp bit in copy_huge_pmd() could be wrong if the
   huge pmd is a migration huge pmd.  When it happens, instead of using
   pmd_uffd_wp(), we should use pmd_swp_uffd_wp().  The fix is simply to
   handle them separately.

3. We forgot to carry over the uffd-wp bit for a write migration huge pmd
   entry.  This also happens in copy_huge_pmd(), where we converted a
   write huge migration entry into a read one.

4. In copy_nonpresent_pte(), drop uffd-wp if necessary for swap ptes.

5. In copy_present_page(), when COW is enforced at fork(), we also
   need to pass over the uffd-wp bit if VM_UFFD_WP is armed on the new
   vma, and when the pte to be copied has uffd-wp bit set.

Remove the comment in copy_present_pte() about this.  It won't help a huge
lot to only comment there, but commenting everywhere would be overkill.
Let's assume the commit messages would help.
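
Item 2 above amounts to choosing the helper by entry type before touching any
bit.  A minimal sketch of that idea as a userspace model, not the kernel
implementation: the model_* names are hypothetical, the bit positions (P at
bit 0, swap-entry uffd-wp at bit 2, present uffd-wp at bit 10) are assumptions
taken from the x86-64 layout, and the real pmd_present() checks more than the
P bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_PRESENT     (UINT64_C(1) << 0)   /* P bit */
#define MODEL_PAGE_SWP_UFFD_WP (UINT64_C(1) << 2)   /* uffd-wp in a swap entry */
#define MODEL_PAGE_UFFD_WP     (UINT64_C(1) << 10)  /* uffd-wp in a present pmd */

/* Simplified stand-in for is_swap_pmd(): not empty and not present. */
static bool model_is_swap_pmd(uint64_t pmd)
{
	return pmd != 0 && !(pmd & MODEL_PAGE_PRESENT);
}

/*
 * Drop uffd-wp using the bit that matches the entry's encoding.
 * In a swap/migration entry, bit 10 belongs to the (inverted) offset
 * and must be left alone.
 */
static uint64_t model_drop_uffd_wp(uint64_t pmd)
{
	if (model_is_swap_pmd(pmd))
		return pmd & ~MODEL_PAGE_SWP_UFFD_WP;
	return pmd & ~MODEL_PAGE_UFFD_WP;
}
```

For a non-present value such as 0x404 only bit 2 is cleared and bit 10
survives, while for a present value such as 0x401 bit 10 is cleared.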

[peterx@redhat.com: fix a few thp pmd missing uffd-wp bit]
  Link: https://lkml.kernel.org/r/20210428225030.9708-4-peterx@redhat.com

Link: https://lkml.kernel.org/r/20210428225030.9708-3-peterx@redhat.com
Fixes: b569a1760782f ("userfaultfd: wp: drop _PAGE_UFFD_WP properly when fork")
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8f34f1eac3820fc2722e5159acceb22545b30b0d)
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/huge_mm.h |  2 +-
 include/linux/swapops.h |  2 ++
 mm/huge_memory.c        | 27 ++++++++++++++-------------
 mm/memory.c             | 25 +++++++++++++------------
 4 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b4e1ebaae825..939f21b69ead 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -10,7 +10,7 @@
 vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf);
 int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		  pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
-		  struct vm_area_struct *vma);
+		  struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma);
 void huge_pmd_set_accessed(struct vm_fault *vmf, pmd_t orig_pmd);
 int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		  pud_t *dst_pud, pud_t *src_pud, unsigned long addr,
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 6430a94c6981..0d429a102d41 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -265,6 +265,8 @@ static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
 
 	if (pmd_swp_soft_dirty(pmd))
 		pmd = pmd_swp_clear_soft_dirty(pmd);
+	if (pmd_swp_uffd_wp(pmd))
+		pmd = pmd_swp_clear_uffd_wp(pmd);
 	arch_entry = __pmd_to_swp_entry(pmd);
 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4cea1e218b48..9aaf4a8ebeeb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1026,7 +1026,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		  pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
-		  struct vm_area_struct *vma)
+		  struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 {
 	spinlock_t *dst_ptl, *src_ptl;
 	struct page *src_page;
@@ -1035,7 +1035,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	int ret = -ENOMEM;
 
 	/* Skip if can be re-fill on fault */
-	if (!vma_is_anonymous(vma))
+	if (!vma_is_anonymous(dst_vma))
 		return 0;
 
 	pgtable = pte_alloc_one(dst_mm);
@@ -1049,14 +1049,6 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	ret = -EAGAIN;
 	pmd = *src_pmd;
 
-	/*
-	 * Make sure the _PAGE_UFFD_WP bit is cleared if the new VMA
-	 * does not have the VM_UFFD_WP, which means that the uffd
-	 * fork event is not enabled.
-	 */
-	if (!(vma->vm_flags & VM_UFFD_WP))
-		pmd = pmd_clear_uffd_wp(pmd);
-
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
 	if (unlikely(is_swap_pmd(pmd))) {
 		swp_entry_t entry = pmd_to_swp_entry(pmd);
@@ -1067,11 +1059,15 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			pmd = swp_entry_to_pmd(entry);
 			if (pmd_swp_soft_dirty(*src_pmd))
 				pmd = pmd_swp_mksoft_dirty(pmd);
+			if (pmd_swp_uffd_wp(*src_pmd))
+				pmd = pmd_swp_mkuffd_wp(pmd);
 			set_pmd_at(src_mm, addr, src_pmd, pmd);
 		}
 		add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 		mm_inc_nr_ptes(dst_mm);
 		pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
+		if (!userfaultfd_wp(dst_vma))
+			pmd = pmd_swp_clear_uffd_wp(pmd);
 		set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 		ret = 0;
 		goto out_unlock;
@@ -1107,11 +1103,11 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * best effort that the pinned pages won't be replaced by another
 	 * random page during the coming copy-on-write.
 	 */
-	if (unlikely(page_needs_cow_for_dma(vma, src_page))) {
+	if (unlikely(page_needs_cow_for_dma(src_vma, src_page))) {
 		pte_free(dst_mm, pgtable);
 		spin_unlock(src_ptl);
 		spin_unlock(dst_ptl);
-		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
+		__split_huge_pmd(src_vma, src_pmd, addr, false, NULL);
 		return -EAGAIN;
 	}
 
@@ -1121,8 +1117,9 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 out_zero_page:
 	mm_inc_nr_ptes(dst_mm);
 	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
-
 	pmdp_set_wrprotect(src_mm, addr, src_pmd);
+	if (!userfaultfd_wp(dst_vma))
+		pmd = pmd_clear_uffd_wp(pmd);
 	pmd = pmd_mkold(pmd_wrprotect(pmd));
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 
@@ -1838,6 +1835,8 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			newpmd = swp_entry_to_pmd(entry);
 			if (pmd_swp_soft_dirty(*pmd))
 				newpmd = pmd_swp_mksoft_dirty(newpmd);
+			if (pmd_swp_uffd_wp(*pmd))
+				newpmd = pmd_swp_mkuffd_wp(newpmd);
 			set_pmd_at(mm, addr, pmd, newpmd);
 		}
 		goto unlock;
@@ -3248,6 +3247,8 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
 		pmde = pmd_mksoft_dirty(pmde);
 	if (is_write_migration_entry(entry))
 		pmde = maybe_pmd_mkwrite(pmde, vma);
+	if (pmd_swp_uffd_wp(*pvmw->pmd))
+		pmde = pmd_wrprotect(pmd_mkuffd_wp(pmde));
 
 	flush_cache_range(vma, mmun_start, mmun_start + HPAGE_PMD_SIZE);
 	if (PageAnon(new))
diff --git a/mm/memory.c b/mm/memory.c
index b15367c285bd..f8e8c99bba73 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -708,10 +708,10 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 static unsigned long
 copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
-		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
-		unsigned long addr, int *rss)
+		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long addr, int *rss)
 {
-	unsigned long vm_flags = vma->vm_flags;
+	unsigned long vm_flags = dst_vma->vm_flags;
 	pte_t pte = *src_pte;
 	struct page *page;
 	swp_entry_t entry = pte_to_swp_entry(pte);
@@ -780,6 +780,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			set_pte_at(src_mm, addr, src_pte, pte);
 		}
 	}
+	if (!userfaultfd_wp(dst_vma))
+		pte = pte_swp_clear_uffd_wp(pte);
 	set_pte_at(dst_mm, addr, dst_pte, pte);
 	return 0;
 }
@@ -845,6 +847,9 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 	/* All done, just insert the new page copy in the child */
 	pte = mk_pte(new_page, dst_vma->vm_page_prot);
 	pte = maybe_mkwrite(pte_mkdirty(pte), dst_vma);
+	if (userfaultfd_pte_wp(dst_vma, *src_pte))
+		/* Uffd-wp needs to be delivered to dest pte as well */
+		pte = pte_wrprotect(pte_mkuffd_wp(pte));
 	set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte);
 	return 0;
 }
@@ -894,12 +899,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		pte = pte_mkclean(pte);
 	pte = pte_mkold(pte);
 
-	/*
-	 * Make sure the _PAGE_UFFD_WP bit is cleared if the new VMA
-	 * does not have the VM_UFFD_WP, which means that the uffd
-	 * fork event is not enabled.
-	 */
-	if (!(vm_flags & VM_UFFD_WP))
+	if (!userfaultfd_wp(dst_vma))
 		pte = pte_clear_uffd_wp(pte);
 
 	set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte);
@@ -974,7 +974,8 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		if (unlikely(!pte_present(*src_pte))) {
 			entry.val = copy_nonpresent_pte(dst_mm, src_mm,
 							dst_pte, src_pte,
-							src_vma, addr, rss);
+							dst_vma, src_vma,
+							addr, rss);
 			if (entry.val)
 				break;
 			progress += 8;
@@ -1051,8 +1052,8 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 			|| pmd_devmap(*src_pmd)) {
 			int err;
 			VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma);
-			err = copy_huge_pmd(dst_mm, src_mm,
-					    dst_pmd, src_pmd, addr, src_vma);
+			err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd,
+					    addr, dst_vma, src_vma);
 			if (err == -ENOMEM)
 				return -ENOMEM;
 			if (!err)
-- 
2.31.1




* [PATCH stable 5.10.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork
  2021-07-19 19:11         ` Hugh Dickins
  2021-07-19 22:12           ` Peter Xu
  2021-07-20 15:51           ` [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Peter Xu
@ 2021-07-20 15:56           ` Peter Xu
  2021-07-20 15:56             ` [PATCH stable 5.10.y 1/2] mm/thp: simplify copying of huge zero page pmd when fork Peter Xu
                               ` (2 more replies)
  2 siblings, 3 replies; 24+ messages in thread
From: Peter Xu @ 2021-07-20 15:56 UTC (permalink / raw)
  To: stable, linux-kernel, linux-mm
  Cc: Igor Raits, peterx, Hillf Danton, Axel Rasmussen, Hugh Dickins,
	Andrew Morton

In summary, this series should be needed for 5.10/5.12/5.13. This is the 5.10.y
backport of the series.  Patch 1 is a dependency of patch 2, while patch 2
should be the real fix.

There's a minor conflict in patch 2 when cherry-picking, because 5.10.y does not
have the new helper page_needs_cow_for_dma().  This is also mentioned at the
start of patch 2.

This series should fix a rare race that was mentioned in this thread:

https://lore.kernel.org/linux-mm/796cbb7-5a1c-1ba0-dde5-479aba8224f2@google.com/

This wasn't discovered when the fix was proposed and merged, because the
fix was originally about uffd-wp and its fork event.  However, it turns out that
the problematic commit b569a1760782f3d also causes crashes on fork() of
pmd migration entries, which is even more severe than the original uffd-wp
problem.

The crash has been reproduced on at least 5.12.y, and it's possible that
5.13.y and 5.10.y could hit it too, since they have the problematic commit
b569a1760782f3d but lack the uffd-wp fix (8f34f1eac382, which is also
patch 2 of this series).

The pmd entry crash problem was reported by Igor Raits <igor@gooddata.com> and
debugged by Hugh Dickins <hughd@google.com>.

Please review, thanks.

Peter Xu (2):
  mm/thp: simplify copying of huge zero page pmd when fork
  mm/userfaultfd: fix uffd-wp special cases for fork()

 include/linux/huge_mm.h |  2 +-
 include/linux/swapops.h |  2 ++
 mm/huge_memory.c        | 36 +++++++++++++++++-------------------
 mm/memory.c             | 25 +++++++++++++------------
 4 files changed, 33 insertions(+), 32 deletions(-)

-- 
2.31.1





* [PATCH stable 5.10.y 1/2] mm/thp: simplify copying of huge zero page pmd when fork
  2021-07-20 15:56           ` [PATCH stable 5.10.y " Peter Xu
@ 2021-07-20 15:56             ` Peter Xu
  2021-07-20 15:56             ` [PATCH stable 5.10.y 2/2] mm/userfaultfd: fix uffd-wp special cases for fork() Peter Xu
  2021-07-20 20:38             ` [PATCH stable 5.10.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Hugh Dickins
  2 siblings, 0 replies; 24+ messages in thread
From: Peter Xu @ 2021-07-20 15:56 UTC (permalink / raw)
  To: stable, linux-kernel, linux-mm
  Cc: Igor Raits, peterx, Hillf Danton, Axel Rasmussen, Hugh Dickins,
	Andrew Morton

Patch series "mm/uffd: Misc fix for uffd-wp and one more test".

This series tries to fix some corner case bugs for uffd-wp on either thp
or fork().  It then introduces a new test with pagemap/pageout.

Patch layout:

Patch 1:    cleanup for THP, it'll slightly simplify the follow up patches
Patch 2-4:  misc fixes for uffd-wp here and there; please refer to each patch
Patch 5:    add pagemap support for uffd-wp
Patch 6:    add pagemap/pageout test for uffd-wp

The last test introduced can also verify some of the fixes in previous
patches, as the test will fail without the fixes.  It's not easy to verify
all the changes in patches 2-4 this way, but hopefully they can still be
properly reviewed.

Note that, considering the ongoing uffd-wp shmem & hugetlbfs work, patch
5 will be incomplete as it's missing e.g. the hugetlbfs part or the special
swap pte detection.  However that's not needed in this series, and since
that series is still under review, this series does not depend on that
one (the last test only runs with anonymous memory, not file-backed).  So
this series can be merged even before that series.

This patch (of 6):

Huge zero page is handled in a special path in copy_huge_pmd(), however it
should share most of its code with a normal thp page.  Share more code
with it by removing the special path.  The only leftover so far is the
huge zero page refcounting (mm_get_huge_zero_page()), because that's
done separately with a global counter.

This prepares for a future patch that modifies the huge pmd to be installed,
so that we don't need to duplicate that change explicitly in the huge zero
page case too.

Link: https://lkml.kernel.org/r/20210428225030.9708-1-peterx@redhat.com
Link: https://lkml.kernel.org/r/20210428225030.9708-2-peterx@redhat.com
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mike Kravetz <mike.kravetz@oracle.com>, peterx@redhat.com
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 5fc7a5f6fd04bc18f309d9f979b32ef7d1d0a997)
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/huge_memory.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9fe622ff2fc4..8763c4e346cb 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1074,17 +1074,13 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * a page table.
 	 */
 	if (is_huge_zero_pmd(pmd)) {
-		struct page *zero_page;
 		/*
 		 * get_huge_zero_page() will never allocate a new page here,
 		 * since we already have a zero page to copy. It just takes a
 		 * reference.
 		 */
-		zero_page = mm_get_huge_zero_page(dst_mm);
-		set_huge_zero_page(pgtable, dst_mm, vma, addr, dst_pmd,
-				zero_page);
-		ret = 0;
-		goto out_unlock;
+		mm_get_huge_zero_page(dst_mm);
+		goto out_zero_page;
 	}
 
 	src_page = pmd_page(pmd);
@@ -1110,6 +1106,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	get_page(src_page);
 	page_dup_rmap(src_page, true);
 	add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
+out_zero_page:
 	mm_inc_nr_ptes(dst_mm);
 	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
 
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH stable 5.10.y 2/2] mm/userfaultfd: fix uffd-wp special cases for fork()
  2021-07-20 15:56           ` [PATCH stable 5.10.y " Peter Xu
  2021-07-20 15:56             ` [PATCH stable 5.10.y 1/2] mm/thp: simplify copying of huge zero page pmd when fork Peter Xu
@ 2021-07-20 15:56             ` Peter Xu
  2021-07-20 20:38             ` [PATCH stable 5.10.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Hugh Dickins
  2 siblings, 0 replies; 24+ messages in thread
From: Peter Xu @ 2021-07-20 15:56 UTC (permalink / raw)
  To: stable, linux-kernel, linux-mm
  Cc: Igor Raits, peterx, Hillf Danton, Axel Rasmussen, Hugh Dickins,
	Andrew Morton

[Conflict: 5.10.y lacks the helper page_needs_cow_for_dma() in copy_huge_pmd()]

We tried to do something similar in b569a1760782 ("userfaultfd: wp: drop
_PAGE_UFFD_WP properly when fork") previously, but it did not get
everything right.  A few fixes around the code path:

1. We were referencing VM_UFFD_WP vm_flags on the _old_ vma rather
   than the new vma.  That was overlooked in b569a1760782, so it won't work
   as expected.  Thanks to the recent rework on fork code
   (7a4830c380f3a8b3), we can easily get the new vma now, so switch the
   checks to that.

2. Dropping the uffd-wp bit in copy_huge_pmd() could be wrong if the
   huge pmd is a migration huge pmd.  When that happens, instead of using
   pmd_uffd_wp(), we should use pmd_swp_uffd_wp().  The fix is simply to
   handle them separately.

3. We forgot to carry over the uffd-wp bit for a write migration huge pmd
   entry.  This also happens in copy_huge_pmd(), where we convert a
   write huge migration entry into a read one.

4. In copy_nonpresent_pte(), drop uffd-wp if necessary for swap ptes.

5. In copy_present_page(), when COW is enforced during fork(), we also
   need to pass over the uffd-wp bit if VM_UFFD_WP is armed on the new
   vma, and when the pte to be copied has the uffd-wp bit set.

Remove the comment in copy_present_pte() about this.  It won't help a huge
lot to only comment there, but commenting everywhere would be overkill.
Let's assume the commit messages would help.

[peterx@redhat.com: fix a few thp pmd missing uffd-wp bit]
  Link: https://lkml.kernel.org/r/20210428225030.9708-4-peterx@redhat.com

Link: https://lkml.kernel.org/r/20210428225030.9708-3-peterx@redhat.com
Fixes: b569a1760782f ("userfaultfd: wp: drop _PAGE_UFFD_WP properly when fork")
Signed-off-by: Peter Xu <peterx@redhat.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Joe Perches <joe@perches.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mina Almasry <almasrymina@google.com>
Cc: Oliver Upton <oupton@google.com>
Cc: Shaohua Li <shli@fb.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Wang Qing <wangqing@vivo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 8f34f1eac3820fc2722e5159acceb22545b30b0d)
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/huge_mm.h |  2 +-
 include/linux/swapops.h |  2 ++
 mm/huge_memory.c        | 27 ++++++++++++++-------------
 mm/memory.c             | 25 +++++++++++++------------
 4 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e72787731a5b..176457145bcf 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -10,7 +10,7 @@
 vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf);
 int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		  pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
-		  struct vm_area_struct *vma);
+		  struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma);
 void huge_pmd_set_accessed(struct vm_fault *vmf, pmd_t orig_pmd);
 int copy_huge_pud(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		  pud_t *dst_pud, pud_t *src_pud, unsigned long addr,
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 6430a94c6981..0d429a102d41 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -265,6 +265,8 @@ static inline swp_entry_t pmd_to_swp_entry(pmd_t pmd)
 
 	if (pmd_swp_soft_dirty(pmd))
 		pmd = pmd_swp_clear_soft_dirty(pmd);
+	if (pmd_swp_uffd_wp(pmd))
+		pmd = pmd_swp_clear_uffd_wp(pmd);
 	arch_entry = __pmd_to_swp_entry(pmd);
 	return swp_entry(__swp_type(arch_entry), __swp_offset(arch_entry));
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8763c4e346cb..594368f6134f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1012,7 +1012,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		  pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr,
-		  struct vm_area_struct *vma)
+		  struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 {
 	spinlock_t *dst_ptl, *src_ptl;
 	struct page *src_page;
@@ -1021,7 +1021,7 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	int ret = -ENOMEM;
 
 	/* Skip if can be re-fill on fault */
-	if (!vma_is_anonymous(vma))
+	if (!vma_is_anonymous(dst_vma))
 		return 0;
 
 	pgtable = pte_alloc_one(dst_mm);
@@ -1035,14 +1035,6 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	ret = -EAGAIN;
 	pmd = *src_pmd;
 
-	/*
-	 * Make sure the _PAGE_UFFD_WP bit is cleared if the new VMA
-	 * does not have the VM_UFFD_WP, which means that the uffd
-	 * fork event is not enabled.
-	 */
-	if (!(vma->vm_flags & VM_UFFD_WP))
-		pmd = pmd_clear_uffd_wp(pmd);
-
 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
 	if (unlikely(is_swap_pmd(pmd))) {
 		swp_entry_t entry = pmd_to_swp_entry(pmd);
@@ -1053,11 +1045,15 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			pmd = swp_entry_to_pmd(entry);
 			if (pmd_swp_soft_dirty(*src_pmd))
 				pmd = pmd_swp_mksoft_dirty(pmd);
+			if (pmd_swp_uffd_wp(*src_pmd))
+				pmd = pmd_swp_mkuffd_wp(pmd);
 			set_pmd_at(src_mm, addr, src_pmd, pmd);
 		}
 		add_mm_counter(dst_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 		mm_inc_nr_ptes(dst_mm);
 		pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
+		if (!userfaultfd_wp(dst_vma))
+			pmd = pmd_swp_clear_uffd_wp(pmd);
 		set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 		ret = 0;
 		goto out_unlock;
@@ -1093,13 +1089,13 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * best effort that the pinned pages won't be replaced by another
 	 * random page during the coming copy-on-write.
 	 */
-	if (unlikely(is_cow_mapping(vma->vm_flags) &&
+	if (unlikely(is_cow_mapping(src_vma->vm_flags) &&
 		     atomic_read(&src_mm->has_pinned) &&
 		     page_maybe_dma_pinned(src_page))) {
 		pte_free(dst_mm, pgtable);
 		spin_unlock(src_ptl);
 		spin_unlock(dst_ptl);
-		__split_huge_pmd(vma, src_pmd, addr, false, NULL);
+		__split_huge_pmd(src_vma, src_pmd, addr, false, NULL);
 		return -EAGAIN;
 	}
 
@@ -1109,8 +1105,9 @@ int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 out_zero_page:
 	mm_inc_nr_ptes(dst_mm);
 	pgtable_trans_huge_deposit(dst_mm, dst_pmd, pgtable);
-
 	pmdp_set_wrprotect(src_mm, addr, src_pmd);
+	if (!userfaultfd_wp(dst_vma))
+		pmd = pmd_clear_uffd_wp(pmd);
 	pmd = pmd_mkold(pmd_wrprotect(pmd));
 	set_pmd_at(dst_mm, addr, dst_pmd, pmd);
 
@@ -1829,6 +1826,8 @@ int change_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 			newpmd = swp_entry_to_pmd(entry);
 			if (pmd_swp_soft_dirty(*pmd))
 				newpmd = pmd_swp_mksoft_dirty(newpmd);
+			if (pmd_swp_uffd_wp(*pmd))
+				newpmd = pmd_swp_mkuffd_wp(newpmd);
 			set_pmd_at(mm, addr, pmd, newpmd);
 		}
 		goto unlock;
@@ -2995,6 +2994,8 @@ void remove_migration_pmd(struct page_vma_mapped_walk *pvmw, struct page *new)
 		pmde = pmd_mksoft_dirty(pmde);
 	if (is_write_migration_entry(entry))
 		pmde = maybe_pmd_mkwrite(pmde, vma);
+	if (pmd_swp_uffd_wp(*pvmw->pmd))
+		pmde = pmd_wrprotect(pmd_mkuffd_wp(pmde));
 
 	flush_cache_range(vma, mmun_start, mmun_start + HPAGE_PMD_SIZE);
 	if (PageAnon(new))
diff --git a/mm/memory.c b/mm/memory.c
index 0a905e0a7e67..f979511a3bb4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -696,10 +696,10 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 
 static unsigned long
 copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
-		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *vma,
-		unsigned long addr, int *rss)
+		pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long addr, int *rss)
 {
-	unsigned long vm_flags = vma->vm_flags;
+	unsigned long vm_flags = dst_vma->vm_flags;
 	pte_t pte = *src_pte;
 	struct page *page;
 	swp_entry_t entry = pte_to_swp_entry(pte);
@@ -768,6 +768,8 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 			set_pte_at(src_mm, addr, src_pte, pte);
 		}
 	}
+	if (!userfaultfd_wp(dst_vma))
+		pte = pte_swp_clear_uffd_wp(pte);
 	set_pte_at(dst_mm, addr, dst_pte, pte);
 	return 0;
 }
@@ -839,6 +841,9 @@ copy_present_page(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 	/* All done, just insert the new page copy in the child */
 	pte = mk_pte(new_page, dst_vma->vm_page_prot);
 	pte = maybe_mkwrite(pte_mkdirty(pte), dst_vma);
+	if (userfaultfd_pte_wp(dst_vma, *src_pte))
+		/* Uffd-wp needs to be delivered to dest pte as well */
+		pte = pte_wrprotect(pte_mkuffd_wp(pte));
 	set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte);
 	return 0;
 }
@@ -888,12 +893,7 @@ copy_present_pte(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		pte = pte_mkclean(pte);
 	pte = pte_mkold(pte);
 
-	/*
-	 * Make sure the _PAGE_UFFD_WP bit is cleared if the new VMA
-	 * does not have the VM_UFFD_WP, which means that the uffd
-	 * fork event is not enabled.
-	 */
-	if (!(vm_flags & VM_UFFD_WP))
+	if (!userfaultfd_wp(dst_vma))
 		pte = pte_clear_uffd_wp(pte);
 
 	set_pte_at(dst_vma->vm_mm, addr, dst_pte, pte);
@@ -968,7 +968,8 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 		if (unlikely(!pte_present(*src_pte))) {
 			entry.val = copy_nonpresent_pte(dst_mm, src_mm,
 							dst_pte, src_pte,
-							src_vma, addr, rss);
+							dst_vma, src_vma,
+							addr, rss);
 			if (entry.val)
 				break;
 			progress += 8;
@@ -1045,8 +1046,8 @@ copy_pmd_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 			|| pmd_devmap(*src_pmd)) {
 			int err;
 			VM_BUG_ON_VMA(next-addr != HPAGE_PMD_SIZE, src_vma);
-			err = copy_huge_pmd(dst_mm, src_mm,
-					    dst_pmd, src_pmd, addr, src_vma);
+			err = copy_huge_pmd(dst_mm, src_mm, dst_pmd, src_pmd,
+					    addr, dst_vma, src_vma);
 			if (err == -ENOMEM)
 				return -ENOMEM;
 			if (!err)
-- 
2.31.1



^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: kernel BUG at include/linux/swapops.h:204!
  2021-07-20  7:47             ` Igor Raits
@ 2021-07-20 16:01               ` Peter Xu
  2021-07-20 16:05                 ` Igor Raits
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Xu @ 2021-07-20 16:01 UTC (permalink / raw)
  To: Igor Raits
  Cc: Hugh Dickins, Andrew Morton, Hillf Danton, Axel Rasmussen, linux-mm

Hi, Igor,

On Tue, Jul 20, 2021 at 09:47:02AM +0200, Igor Raits wrote:
> FTR, even though 8f34f1eac382 applies cleanly it does not compile.
> The 1st patch of that series is also required (5fc7a5f6fd04) - it removes
> use of
> *vma, which is later removed by the patch that fixes the actual problem.

Please feel free to give it a shot with the two backport series to see whether
you spot any issue with it so far.

5.13/5.12: https://lore.kernel.org/linux-mm/20210720155150.497148-1-peterx@redhat.com
5.10:      https://lore.kernel.org/linux-mm/20210720155657.499127-1-peterx@redhat.com

Sorry for the trouble.

-- 
Peter Xu



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: kernel BUG at include/linux/swapops.h:204!
  2021-07-20 16:01               ` Peter Xu
@ 2021-07-20 16:05                 ` Igor Raits
  0 siblings, 0 replies; 24+ messages in thread
From: Igor Raits @ 2021-07-20 16:05 UTC (permalink / raw)
  To: Peter Xu
  Cc: Hugh Dickins, Andrew Morton, Hillf Danton, Axel Rasmussen, linux-mm

[-- Attachment #1: Type: text/plain, Size: 1543 bytes --]

Hi Peter,

On Tue, Jul 20, 2021 at 6:01 PM Peter Xu <peterx@redhat.com> wrote:

> Hi, Igor,
>
> On Tue, Jul 20, 2021 at 09:47:02AM +0200, Igor Raits wrote:
> > FTR, even though 8f34f1eac382 applies cleanly it does not compile.
> > The 1st patch of that series is also required (5fc7a5f6fd04) - it removes
> > use of
> > *vma, which is later removed by the patch that fixes the actual problem.
>
> Please feel free to give it a shot with the two backport series to see
> whether
> you spot any issue with it so far.
>

> 5.13/5.12:
> https://lore.kernel.org/linux-mm/20210720155150.497148-1-peterx@redhat.com


It seems like a normal cherry-pick without any extra changes (please
correct me if I'm wrong), so I've already added it to our internal kernel
builds, and so far I have not spotted any issues.  As the issue does not
happen frequently, I can't confirm that it is really fixed, but I trust
that Hugh identified the commit that should fix it.

Anyhow, I'll keep looking around and let you know if I spot anything :)

> 5.10:
> https://lore.kernel.org/linux-mm/20210720155657.499127-1-peterx@redhat.com
>
> Sorry for the trouble.
>

Thanks for providing fixes!

> --
> Peter Xu
>
>

-- 

Igor Raits

Sr. SW Engineer

igor@gooddata.com

+420 775 117 817

Moravske namesti 1007/14

602 00 Brno-Veveri, Czech Republic

[-- Attachment #2: Type: text/html, Size: 7100 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork
  2021-07-20 15:51           ` [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Peter Xu
  2021-07-20 15:51             ` [PATCH stable 5.13.y/5.12.y 1/2] mm/thp: simplify copying of huge zero page pmd when fork Peter Xu
  2021-07-20 15:51             ` [PATCH stable 5.13.y/5.12.y 2/2] mm/userfaultfd: fix uffd-wp special cases for fork() Peter Xu
@ 2021-07-20 20:32             ` Hugh Dickins
  2021-07-22 14:02               ` Greg KH
  2 siblings, 1 reply; 24+ messages in thread
From: Hugh Dickins @ 2021-07-20 20:32 UTC (permalink / raw)
  To: Peter Xu
  Cc: stable, linux-mm, linux-kernel, Axel Rasmussen, Andrew Morton,
	Hillf Danton, Igor Raits

On Tue, Jul 20, 2021 at 8:52 AM Peter Xu <peterx@redhat.com> wrote:
>
> In summary: this series should be needed for 5.10/5.12/5.13. This is the
> 5.13.y/5.12.y backport of the series, and it should be able to be applied on
> both of the branches.  Patch 1 is a dependency of patch 2, while patch 2 should
> be the real fix.
>
> This series should be able to fix a rare race that mentioned in thread:
>
> https://lore.kernel.org/linux-mm/796cbb7-5a1c-1ba0-dde5-479aba8224f2@google.com/
>
> This fact wasn't discovered when the fix got proposed and merged, because the
> fix was originally about uffd-wp and its fork event.  However it turns out that
> the problematic commit b569a1760782f3d is also causing crashing on fork() of
> pmd migration entries which is even more severe than the original uffd-wp
> problem.
>
> Stable kernels at least on 5.12.y has the crash reproduced, and it's possible
> 5.13.y and 5.10.y could hit it due to having the problematic commit
> b569a1760782f3d but lacking of the uffd-wp fix patch (8f34f1eac382, which is
> also patch 2 of this series).
>
> The pmd entry crash problem was reported by Igor Raits <igor@gooddata.com> and
> debugged by Hugh Dickins <hughd@google.com>.
>
> Please review, thanks.

These two 5.13.y patches look just right to me, thank you Peter (and
5.12.19 announced EOL overnight, so nothing more wanted for that).

But these do just amount to asking stable@vger.kernel.org to
cherry-pick the two commits
5fc7a5f6fd04bc18f309d9f979b32ef7d1d0a997
8f34f1eac3820fc2722e5159acceb22545b30b0d

Hugh

(I'd usually reply with alpine rather than gmail, but I see extra
blank lines on these 0/2s that way; but the patches themselves are
good.)

>
> Peter Xu (2):
>   mm/thp: simplify copying of huge zero page pmd when fork
>   mm/userfaultfd: fix uffd-wp special cases for fork()
>
>  include/linux/huge_mm.h |  2 +-
>  include/linux/swapops.h |  2 ++
>  mm/huge_memory.c        | 36 +++++++++++++++++-------------------
>  mm/memory.c             | 25 +++++++++++++------------
>  4 files changed, 33 insertions(+), 32 deletions(-)
>
> --
> 2.31.1
>
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH stable 5.10.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork
  2021-07-20 15:56           ` [PATCH stable 5.10.y " Peter Xu
  2021-07-20 15:56             ` [PATCH stable 5.10.y 1/2] mm/thp: simplify copying of huge zero page pmd when fork Peter Xu
  2021-07-20 15:56             ` [PATCH stable 5.10.y 2/2] mm/userfaultfd: fix uffd-wp special cases for fork() Peter Xu
@ 2021-07-20 20:38             ` Hugh Dickins
  2021-07-22 14:05               ` Greg KH
  2 siblings, 1 reply; 24+ messages in thread
From: Hugh Dickins @ 2021-07-20 20:38 UTC (permalink / raw)
  To: Peter Xu
  Cc: stable, linux-kernel, linux-mm, Igor Raits, Hillf Danton,
	Axel Rasmussen, Andrew Morton

On Tue, Jul 20, 2021 at 8:57 AM Peter Xu <peterx@redhat.com> wrote:
>
> In summary, this series should be needed for 5.10/5.12/5.13. This is the 5.10.y
> backport of the series.  Patch 1 is a dependency of patch 2, while patch 2
> should be the real fix.
>
> There's a minor conflict on patch 2 when cherry pick due to not having the new
> helper called page_needs_cow_for_dma().  It's also mentioned at the entry of
> patch 2.
>
> This series should be able to fix a rare race that mentioned in thread:
>
> https://lore.kernel.org/linux-mm/796cbb7-5a1c-1ba0-dde5-479aba8224f2@google.com/
>
> This fact wasn't discovered when the fix got proposed and merged, because the
> fix was originally about uffd-wp and its fork event.  However it turns out that
> the problematic commit b569a1760782f3d is also causing crashing on fork() of
> pmd migration entries which is even more severe than the original uffd-wp
> problem.
>
> Stable kernels at least on 5.12.y has the crash reproduced, and it's possible
> 5.13.y and 5.10.y could hit it due to having the problematic commit
> b569a1760782f3d but lacking of the uffd-wp fix patch (8f34f1eac382, which is
> also patch 2 of this series).
>
> The pmd entry crash problem was reported by Igor Raits <igor@gooddata.com> and
> debugged by Hugh Dickins <hughd@google.com>.
>
> Please review, thanks.

And these two for 5.10.y look good to me also: I'm glad you decided in
the end to keep 5.10's support for uffd-wp-fork.
The first is just a straight cherry-pick of
5fc7a5f6fd04bc18f309d9f979b32ef7d1d0a997, but as you noted above,
8f34f1eac3820fc2722e5159acceb22545b30b0d needed one line of fixup for
that tree.

Thank you Peter,
Hugh


>
> Peter Xu (2):
>   mm/thp: simplify copying of huge zero page pmd when fork
>   mm/userfaultfd: fix uffd-wp special cases for fork()
>
>  include/linux/huge_mm.h |  2 +-
>  include/linux/swapops.h |  2 ++
>  mm/huge_memory.c        | 36 +++++++++++++++++-------------------
>  mm/memory.c             | 25 +++++++++++++------------
>  4 files changed, 33 insertions(+), 32 deletions(-)
>
> --
> 2.31.1
>
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork
  2021-07-20 20:32             ` [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Hugh Dickins
@ 2021-07-22 14:02               ` Greg KH
  0 siblings, 0 replies; 24+ messages in thread
From: Greg KH @ 2021-07-22 14:02 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Peter Xu, stable, linux-mm, linux-kernel, Axel Rasmussen,
	Andrew Morton, Hillf Danton, Igor Raits

On Tue, Jul 20, 2021 at 01:32:19PM -0700, Hugh Dickins wrote:
> On Tue, Jul 20, 2021 at 8:52 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > In summary: this series should be needed for 5.10/5.12/5.13. This is the
> > 5.13.y/5.12.y backport of the series, and it should be able to be applied on
> > both of the branches.  Patch 1 is a dependency of patch 2, while patch 2 should
> > be the real fix.
> >
> > This series should be able to fix a rare race that mentioned in thread:
> >
> > https://lore.kernel.org/linux-mm/796cbb7-5a1c-1ba0-dde5-479aba8224f2@google.com/
> >
> > This fact wasn't discovered when the fix got proposed and merged, because the
> > fix was originally about uffd-wp and its fork event.  However it turns out that
> > the problematic commit b569a1760782f3d is also causing crashing on fork() of
> > pmd migration entries which is even more severe than the original uffd-wp
> > problem.
> >
> > Stable kernels at least on 5.12.y has the crash reproduced, and it's possible
> > 5.13.y and 5.10.y could hit it due to having the problematic commit
> > b569a1760782f3d but lacking of the uffd-wp fix patch (8f34f1eac382, which is
> > also patch 2 of this series).
> >
> > The pmd entry crash problem was reported by Igor Raits <igor@gooddata.com> and
> > debugged by Hugh Dickins <hughd@google.com>.
> >
> > Please review, thanks.
> 
> These two 5.13.y patches look just right to me, thank you Peter (and
> 5.12.19 announced EOL overnight, so nothing more wanted for that).
> 
> But these do just amount to asking stable@vger.kernel.org to
> cherry-pick the two commits
> 5fc7a5f6fd04bc18f309d9f979b32ef7d1d0a997
> 8f34f1eac3820fc2722e5159acceb22545b30b0d

Thanks for the review, both now queued up to 5.13.y.

greg k-h


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH stable 5.10.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork
  2021-07-20 20:38             ` [PATCH stable 5.10.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Hugh Dickins
@ 2021-07-22 14:05               ` Greg KH
  0 siblings, 0 replies; 24+ messages in thread
From: Greg KH @ 2021-07-22 14:05 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Peter Xu, stable, linux-kernel, linux-mm, Igor Raits,
	Hillf Danton, Axel Rasmussen, Andrew Morton

On Tue, Jul 20, 2021 at 01:38:53PM -0700, Hugh Dickins wrote:
> On Tue, Jul 20, 2021 at 8:57 AM Peter Xu <peterx@redhat.com> wrote:
> >
> > In summary, this series should be needed for 5.10/5.12/5.13. This is the 5.10.y
> > backport of the series.  Patch 1 is a dependency of patch 2, while patch 2
> > should be the real fix.
> >
> > There's a minor conflict on patch 2 when cherry pick due to not having the new
> > helper called page_needs_cow_for_dma().  It's also mentioned at the entry of
> > patch 2.
> >
> > This series should be able to fix a rare race that mentioned in thread:
> >
> > https://lore.kernel.org/linux-mm/796cbb7-5a1c-1ba0-dde5-479aba8224f2@google.com/
> >
> > This fact wasn't discovered when the fix got proposed and merged, because the
> > fix was originally about uffd-wp and its fork event.  However it turns out that
> > the problematic commit b569a1760782f3d is also causing crashing on fork() of
> > pmd migration entries which is even more severe than the original uffd-wp
> > problem.
> >
> > Stable kernels at least on 5.12.y has the crash reproduced, and it's possible
> > 5.13.y and 5.10.y could hit it due to having the problematic commit
> > b569a1760782f3d but lacking of the uffd-wp fix patch (8f34f1eac382, which is
> > also patch 2 of this series).
> >
> > The pmd entry crash problem was reported by Igor Raits <igor@gooddata.com> and
> > debugged by Hugh Dickins <hughd@google.com>.
> >
> > Please review, thanks.
> 
> And these two for 5.10.y look good to me also: I'm glad you decided in
> the end to keep 5.10's support for uffd-wp-fork.
> The first is just a straight cherry-pick of
> 5fc7a5f6fd04bc18f309d9f979b32ef7d1d0a997, but as you noted above,
> 8f34f1eac3820fc2722e5159acceb22545b30b0d needed one line of fixup for
> that tree.

All now queued up, thanks.

greg k-h


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2021-07-22 14:05 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-07-10  7:33 kernel BUG at include/linux/swapops.h:204! Igor Raits
2021-07-10 12:46 ` Hillf Danton
2021-07-11  4:17 ` Hugh Dickins
2021-07-11  6:06   ` Igor Raits
2021-07-15 17:47     ` Igor Raits
2021-07-16 19:45       ` Hugh Dickins
2021-07-19 19:11         ` Hugh Dickins
2021-07-19 22:12           ` Peter Xu
2021-07-19 22:42             ` Hugh Dickins
2021-07-20  0:34               ` Peter Xu
2021-07-20  3:31                 ` Hugh Dickins
2021-07-20  7:47             ` Igor Raits
2021-07-20 16:01               ` Peter Xu
2021-07-20 16:05                 ` Igor Raits
2021-07-20 15:51           ` [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Peter Xu
2021-07-20 15:51             ` [PATCH stable 5.13.y/5.12.y 1/2] mm/thp: simplify copying of huge zero page pmd when fork Peter Xu
2021-07-20 15:51             ` [PATCH stable 5.13.y/5.12.y 2/2] mm/userfaultfd: fix uffd-wp special cases for fork() Peter Xu
2021-07-20 20:32             ` [PATCH stable 5.13.y/5.12.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Hugh Dickins
2021-07-22 14:02               ` Greg KH
2021-07-20 15:56           ` [PATCH stable 5.10.y " Peter Xu
2021-07-20 15:56             ` [PATCH stable 5.10.y 1/2] mm/thp: simplify copying of huge zero page pmd when fork Peter Xu
2021-07-20 15:56             ` [PATCH stable 5.10.y 2/2] mm/userfaultfd: fix uffd-wp special cases for fork() Peter Xu
2021-07-20 20:38             ` [PATCH stable 5.10.y 0/2] mm/thp: Fix uffd-wp with fork(); crash on pmd migration entry on fork Hugh Dickins
2021-07-22 14:05               ` Greg KH