* PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
From: srikanth @ 2019-05-16 14:14 UTC
  To: linuxppc-dev; +Cc: bharata, mpe, linux-kernel, linux-next

Hello,

On a POWER9 host, performing memory hot-unplug from a ppc64le guest 
results in a kernel oops.

Kernel used: https://github.com/torvalds/linux/tree/v5.1 built using 
ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.

Recreation steps:

1. Boot a guest with the memory configuration below:
   <maxMemory slots='32' unit='KiB'>33554432</maxMemory>
   <memory unit='KiB'>8388608</memory>
   <currentMemory unit='KiB'>4194304</currentMemory>
   <cpu>
     <numa>
       <cell id='0' cpus='0-31' memory='8388608' unit='KiB'/>
     </numa>
   </cpu>

2. From the host, hotplug 8G of memory -> verify the memory was hot-added 
successfully -> reboot the guest -> once the guest comes back, try to 
unplug the 8G of memory

mem.xml used:
<memory model='dimm'>
<target>
<size unit='GiB'>8</size>
<node>0</node>
</target>
</memory>

Memory attach and detach commands used:
     virsh attach-device vm1 ./mem.xml --live
     virsh detach-device vm1 ./mem.xml --live
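
For orientation, the KiB values in the configuration above can be converted to GiB with a few lines of Python. This is only an illustrative sketch: the variable names are made up here, and the final check assumes that boot memory plus the hotplugged DIMM has to stay within maxMemory.

```python
# Sizes copied from the domain XML and mem.xml in this report.
KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib: int) -> float:
    """Convert a libvirt KiB value to GiB."""
    return kib / KIB_PER_GIB


max_memory_gib = kib_to_gib(33554432)     # <maxMemory>
boot_memory_gib = kib_to_gib(8388608)     # <memory>
current_memory_gib = kib_to_gib(4194304)  # <currentMemory>
dimm_gib = 8.0                            # <size unit='GiB'> in mem.xml

# Assumption: the hotplugged DIMM must fit between <memory> and <maxMemory>.
fits = boot_memory_gib + dimm_gib <= max_memory_gib

print(max_memory_gib, boot_memory_gib, current_memory_gib, fits)
```

Under that assumption the guest boots with 8 GiB (4 GiB currently assigned), and one 8 GiB DIMM fits comfortably below the 32 GiB ceiling.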

Trace seen inside the guest after the unplug; the guest then hangs indefinitely:

[   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
[   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
[   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath crc32c_vpmsum
[   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not tainted 5.1.0-dirty #2
[   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
[   21.963355] NIP:  c000000000079e18 LR: c000000000c79308 CTR: 0000000000008000
[   21.963392] REGS: c0000003f88034f0 TRAP: 0700   Not tainted (5.1.0-dirty)
[   21.963422] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 28002884  XER: 20040000
[   21.963470] CFAR: c000000000c79304 IRQMASK: 0
[   21.963470] GPR00: c000000000c79308 c0000003f8803780 c000000001521000 0000000000fff8c0
[   21.963470] GPR04: 0000000000000001 00000000ffe30005 0000000000000005 0000000000000020
[   21.963470] GPR08: 0000000000000000 0000000000000001 c00a000000fff8e0 c0000000016d21a0
[   21.963470] GPR12: c0000000016e7b90 c000000007ff2700 c00a000000a00000 c0000003ffe30100
[   21.963470] GPR16: c0000003ffe30000 c0000000014aa4de c00a0000009f0000 c0000000016d21b0
[   21.963470] GPR20: c0000000014de588 0000000000000001 c0000000016d21b8 c00a000000a00000
[   21.963470] GPR24: 0000000000000000 ffffffffffffffff c00a000000a00000 c0000003ffe96000
[   21.963470] GPR28: c00a000000a00000 c00a000000a00000 c0000003fffec000 c00a000000fff8c0
[   21.963802] NIP [c000000000079e18] pte_fragment_free+0x48/0xd0
[   21.963838] LR [c000000000c79308] remove_pagetable+0x49c/0x5b4
[   21.963873] Call Trace:
[   21.963890] [c0000003f8803780] [c0000003ffe997f0] 0xc0000003ffe997f0 (unreliable)
[   21.963933] [c0000003f88037b0] [0000000000000000] (null)
[   21.963969] [c0000003f88038c0] [c00000000006f038] vmemmap_free+0x218/0x2e0
[   21.964006] [c0000003f8803940] [c00000000036f100] sparse_remove_one_section+0xd0/0x138
[   21.964050] [c0000003f8803980] [c000000000383a50] __remove_pages+0x410/0x560
[   21.964093] [c0000003f8803a90] [c000000000c784d8] arch_remove_memory+0x68/0xdc
[   21.964136] [c0000003f8803ad0] [c000000000385d74] __remove_memory+0xc4/0x110
[   21.964180] [c0000003f8803b10] [c0000000000d44e4] dlpar_remove_lmb+0x94/0x140
[   21.964223] [c0000003f8803b50] [c0000000000d52b4] dlpar_memory+0x464/0xd00
[   21.964259] [c0000003f8803be0] [c0000000000cd5c0] handle_dlpar_errorlog+0xc0/0x190
[   21.964303] [c0000003f8803c50] [c0000000000cd6bc] pseries_hp_work_fn+0x2c/0x60
[   21.964346] [c0000003f8803c80] [c00000000013a4a0] process_one_work+0x2b0/0x5a0
[   21.964388] [c0000003f8803d10] [c00000000013a818] worker_thread+0x88/0x610
[   21.964434] [c0000003f8803db0] [c000000000143884] kthread+0x1a4/0x1b0
[   21.964468] [c0000003f8803e20] [c00000000000bdc4] ret_from_kernel_thread+0x5c/0x78
[   21.964506] Instruction dump:
[   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe90000 7fff1a14 395f0020 813f0020
[   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b090000> 7c0004ac 7d205028 3129ffff
[   21.964613] ---[ end trace aaa571aa1636fee6 ]---
[   21.966349]
[   21.966383] Sending IPI to other CPUs
[   21.978335] IPI complete
[   21.981354] kexec: Starting switchover sequence.
I'm in purgatory
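
The word marked <0b090000> in the instruction dump is the instruction at NIP. The Python sketch below is not part of the report, and the helper name decode_tdi is hypothetical. It decodes that word by hand as a PowerPC D-form trap instruction, which would fit the "TRAP: 0700" and "sig: 5" lines in the trace.

```python
# TO field bits of a trap instruction, most significant first.
TO_BITS = [(16, "lt"), (8, "gt"), (4, "eq"), (2, "ltu"), (1, "gtu")]


def decode_tdi(word: int):
    """Decode a `tdi TO,RA,SI` word; return None for any other opcode."""
    opcode = (word >> 26) & 0x3F
    if opcode != 2:  # primary opcode 2 is tdi (trap doubleword immediate)
        return None
    to = (word >> 21) & 0x1F
    ra = (word >> 16) & 0x1F
    si = word & 0xFFFF
    if si & 0x8000:  # sign-extend the 16-bit immediate
        si -= 0x10000
    return to, ra, si


faulting = 0x0B090000  # the <...> word from the instruction dump
to, ra, si = decode_tdi(faulting)
conds = [name for bit, name in TO_BITS if to & bit]
print(f"tdi {to},r{ra},{si} traps if r{ra} is {' or '.join(conds)} {si}")
```

The decoded form, tdi 24,r9,0, is usually written tdnei r9,0: trap if r9 is non-zero. GPR09 in the register dump is 1, which appears consistent with a BUG_ON()-style check firing in pte_fragment_free().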


* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
From: Michael Ellerman @ 2019-05-17 11:20 UTC
  To: srikanth, linuxppc-dev; +Cc: bharata, linux-kernel, linux-next

srikanth <sraithal@linux.vnet.ibm.com> writes:
> Hello,
>
> On power9 host, performing memory hotunplug from ppc64le guest results 
> in kernel oops.

Thanks for the report.

Did this used to work in the past? If so what is the last version that
worked?

> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using 
> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.
>
> Recreation steps:
>
> 1. Boot a guest with below mem configuration:
>    <maxMemory slots='32' unit='KiB'>33554432</maxMemory>
>    <memory unit='KiB'>8388608</memory>
>    <currentMemory unit='KiB'>4194304</currentMemory>
>    <cpu>
>      <numa>
>        <cell id='0' cpus='0-31' memory='8388608' unit='KiB'/>
>      </numa>
>    </cpu>
>
> 2. From host hotplug 8G memory -> verify memory hotadded successfully -> 
> now reboot guest -> once guest comes back try to unplug 8G memory

I assume the reboot is required to trigger the bug? i.e. if you unplug
without rebooting, it doesn't crash?

> mem.xml used:
> <memory model='dimm'>
> <target>
> <size unit='GiB'>8</size>
> <node>0</node>
> </target>
> </memory>
>
> Memory attach and detach commands used:
>      virsh attach-device vm1 ./mem.xml --live
>      virsh detach-device vm1 ./mem.xml --live
>
> Trace seen inside guest after unplug, guest just hangs there forever:
>
> [   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
> [   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
> [   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA 
> pSeries
> [   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse 
> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi 
> scsi_transport_iscsi ip_tables x_tables autofs4 btrfs zstd_decompress 
> zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy 
> async_pq async_xor async_tx xor raid6_pq multipath crc32c_vpmsum
> [   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not 
> tainted 5.1.0-dirty #2
> [   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
> [   21.963355] NIP:  c000000000079e18 LR: c000000000c79308 CTR: 
> 0000000000008000
> [   21.963392] REGS: c0000003f88034f0 TRAP: 0700   Not tainted (5.1.0-dirty)
> [   21.963422] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  
> CR: 28002884  XER: 20040000
> [   21.963470] CFAR: c000000000c79304 IRQMASK: 0
> [   21.963470] GPR00: c000000000c79308 c0000003f8803780 c000000001521000 
> 0000000000fff8c0

Can you try not to word wrap these? It makes them much harder to read.

There are some instructions here on configuring Thunderbird:
  https://www.kernel.org/doc/html/latest/process/email-clients.html#thunderbird-gui

> [   21.963470] GPR04: 0000000000000001 00000000ffe30005 0000000000000005 
> 0000000000000020
> [   21.963470] GPR08: 0000000000000000 0000000000000001 c00a000000fff8e0 
> c0000000016d21a0
> [   21.963470] GPR12: c0000000016e7b90 c000000007ff2700 c00a000000a00000 
> c0000003ffe30100
> [   21.963470] GPR16: c0000003ffe30000 c0000000014aa4de c00a0000009f0000 
> c0000000016d21b0
> [   21.963470] GPR20: c0000000014de588 0000000000000001 c0000000016d21b8 
> c00a000000a00000
> [   21.963470] GPR24: 0000000000000000 ffffffffffffffff c00a000000a00000 
> c0000003ffe96000
> [   21.963470] GPR28: c00a000000a00000 c00a000000a00000 c0000003fffec000 
> c00a000000fff8c0
> [   21.963802] NIP [c000000000079e18] pte_fragment_free+0x48/0xd0
> [   21.963838] LR [c000000000c79308] remove_pagetable+0x49c/0x5b4
> [   21.963873] Call Trace:
> [   21.963890] [c0000003f8803780] [c0000003ffe997f0] 0xc0000003ffe997f0 
> (unreliable)
> [   21.963933] [c0000003f88037b0] [0000000000000000] (null)
> [   21.963969] [c0000003f88038c0] [c00000000006f038] 
> vmemmap_free+0x218/0x2e0
> [   21.964006] [c0000003f8803940] [c00000000036f100] 
> sparse_remove_one_section+0xd0/0x138
> [   21.964050] [c0000003f8803980] [c000000000383a50] 
> __remove_pages+0x410/0x560
> [   21.964093] [c0000003f8803a90] [c000000000c784d8] 
> arch_remove_memory+0x68/0xdc
> [   21.964136] [c0000003f8803ad0] [c000000000385d74] 
> __remove_memory+0xc4/0x110
> [   21.964180] [c0000003f8803b10] [c0000000000d44e4] 
> dlpar_remove_lmb+0x94/0x140
> [   21.964223] [c0000003f8803b50] [c0000000000d52b4] 
> dlpar_memory+0x464/0xd00
> [   21.964259] [c0000003f8803be0] [c0000000000cd5c0] 
> handle_dlpar_errorlog+0xc0/0x190
> [   21.964303] [c0000003f8803c50] [c0000000000cd6bc] 
> pseries_hp_work_fn+0x2c/0x60
> [   21.964346] [c0000003f8803c80] [c00000000013a4a0] 
> process_one_work+0x2b0/0x5a0
> [   21.964388] [c0000003f8803d10] [c00000000013a818] 
> worker_thread+0x88/0x610
> [   21.964434] [c0000003f8803db0] [c000000000143884] kthread+0x1a4/0x1b0
> [   21.964468] [c0000003f8803e20] [c00000000000bdc4] 
> ret_from_kernel_thread+0x5c/0x78
> [   21.964506] Instruction dump:
> [   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe90000 7fff1a14 
> 395f0020 813f0020
> [   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b090000> 7c0004ac 
> 7d205028 3129ffff
> [   21.964613] ---[ end trace aaa571aa1636fee6 ]---
> [   21.966349]
> [   21.966383] Sending IPI to other CPUs
> [   21.978335] IPI complete
> [   21.981354] kexec: Starting switchover sequence.
> I'm in purgatory

It's not hung here, it's just not executing what we want it to :)

If you break into the qemu monitor and issue `info registers` it should
give you some idea of what's going on.

cheers

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
From: Bharata B Rao @ 2019-05-18 14:14 UTC
  To: srikanth
  Cc: linuxppc-dev, bharata, mpe, linux-kernel, linux-next, npiggin,
	aneesh.kumar

On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote:
> Hello,
> 
> On power9 host, performing memory hotunplug from ppc64le guest results in
> kernel oops.
> 
> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using
> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.
> 
> Recreation steps:
> 
> 1. Boot a guest with below mem configuration:
>   <maxMemory slots='32' unit='KiB'>33554432</maxMemory>
>   <memory unit='KiB'>8388608</memory>
>   <currentMemory unit='KiB'>4194304</currentMemory>
>   <cpu>
>     <numa>
>       <cell id='0' cpus='0-31' memory='8388608' unit='KiB'/>
>     </numa>
>   </cpu>
> 
> 2. From host hotplug 8G memory -> verify memory hotadded successfully -> now
> reboot guest -> once guest comes back try to unplug 8G memory
> 
> mem.xml used:
> <memory model='dimm'>
> <target>
> <size unit='GiB'>8</size>
> <node>0</node>
> </target>
> </memory>
> 
> Memory attach and detach commands used:
>     virsh attach-device vm1 ./mem.xml --live
>     virsh detach-device vm1 ./mem.xml --live
> 
> Trace seen inside guest after unplug, guest just hangs there forever:
> 
> [   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
> [   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
> [   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA
> pSeries
> [   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse
> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi
> ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress
> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
> xor raid6_pq multipath crc32c_vpmsum
> [   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not
> tainted 5.1.0-dirty #2
> [   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
> [   21.963355] NIP:  c000000000079e18 LR: c000000000c79308 CTR:
> 0000000000008000
> [   21.963392] REGS: c0000003f88034f0 TRAP: 0700   Not tainted (5.1.0-dirty)
> [   21.963422] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR:
> 28002884  XER: 20040000
> [   21.963470] CFAR: c000000000c79304 IRQMASK: 0
> [   21.963470] GPR00: c000000000c79308 c0000003f8803780 c000000001521000
> 0000000000fff8c0
> [   21.963470] GPR04: 0000000000000001 00000000ffe30005 0000000000000005
> 0000000000000020
> [   21.963470] GPR08: 0000000000000000 0000000000000001 c00a000000fff8e0
> c0000000016d21a0
> [   21.963470] GPR12: c0000000016e7b90 c000000007ff2700 c00a000000a00000
> c0000003ffe30100
> [   21.963470] GPR16: c0000003ffe30000 c0000000014aa4de c00a0000009f0000
> c0000000016d21b0
> [   21.963470] GPR20: c0000000014de588 0000000000000001 c0000000016d21b8
> c00a000000a00000
> [   21.963470] GPR24: 0000000000000000 ffffffffffffffff c00a000000a00000
> c0000003ffe96000
> [   21.963470] GPR28: c00a000000a00000 c00a000000a00000 c0000003fffec000
> c00a000000fff8c0
> [   21.963802] NIP [c000000000079e18] pte_fragment_free+0x48/0xd0
> [   21.963838] LR [c000000000c79308] remove_pagetable+0x49c/0x5b4
> [   21.963873] Call Trace:
> [   21.963890] [c0000003f8803780] [c0000003ffe997f0] 0xc0000003ffe997f0
> (unreliable)
> [   21.963933] [c0000003f88037b0] [0000000000000000] (null)
> [   21.963969] [c0000003f88038c0] [c00000000006f038]
> vmemmap_free+0x218/0x2e0
> [   21.964006] [c0000003f8803940] [c00000000036f100]
> sparse_remove_one_section+0xd0/0x138
> [   21.964050] [c0000003f8803980] [c000000000383a50]
> __remove_pages+0x410/0x560
> [   21.964093] [c0000003f8803a90] [c000000000c784d8]
> arch_remove_memory+0x68/0xdc
> [   21.964136] [c0000003f8803ad0] [c000000000385d74]
> __remove_memory+0xc4/0x110
> [   21.964180] [c0000003f8803b10] [c0000000000d44e4]
> dlpar_remove_lmb+0x94/0x140
> [   21.964223] [c0000003f8803b50] [c0000000000d52b4]
> dlpar_memory+0x464/0xd00
> [   21.964259] [c0000003f8803be0] [c0000000000cd5c0]
> handle_dlpar_errorlog+0xc0/0x190
> [   21.964303] [c0000003f8803c50] [c0000000000cd6bc]
> pseries_hp_work_fn+0x2c/0x60
> [   21.964346] [c0000003f8803c80] [c00000000013a4a0]
> process_one_work+0x2b0/0x5a0
> [   21.964388] [c0000003f8803d10] [c00000000013a818]
> worker_thread+0x88/0x610
> [   21.964434] [c0000003f8803db0] [c000000000143884] kthread+0x1a4/0x1b0
> [   21.964468] [c0000003f8803e20] [c00000000000bdc4]
> ret_from_kernel_thread+0x5c/0x78
> [   21.964506] Instruction dump:
> [   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe90000 7fff1a14
> 395f0020 813f0020
> [   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b090000> 7c0004ac
> 7d205028 3129ffff
> [   21.964613] ---[ end trace aaa571aa1636fee6 ]---
> [   21.966349]
> [   21.966383] Sending IPI to other CPUs
> [   21.978335] IPI complete
> [   21.981354] kexec: Starting switchover sequence.
> I'm in purgatory

git bisect points to

commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
Author: Nicholas Piggin <npiggin@gmail.com>
Date:   Fri Jul 27 21:48:17 2018 +1000

    powerpc/64s: Fix page table fragment refcount race vs speculative references

    The page table fragment allocator uses the main page refcount racily
    with respect to speculative references. A customer observed a BUG due
    to page table page refcount underflow in the fragment allocator. This
    can be caused by the fragment allocator set_page_count stomping on a
    speculative reference, and then the speculative failure handler
    decrements the new reference, and the underflow eventually pops when
    the page tables are freed.

    Fix this by using a dedicated field in the struct page for the page
    table fragment allocator.

    Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
    Cc: stable@vger.kernel.org # v3.10+

Regards,
Bharata.
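
The race described in the commit message above can be illustrated with a toy model. The Python sketch below is not kernel code: NFRAGS and both function names are hypothetical, and the ordering of steps is just one reading of the commit text. It compares a counter shared with speculative references against a dedicated fragment counter.

```python
NFRAGS = 4  # hypothetical number of fragments carved from one page-table page


def shared_refcount_model() -> int:
    """Old scheme: fragments and speculative references share one counter."""
    count = 1        # page freshly allocated
    count += 1       # a racing lookup takes a speculative reference
    count = NFRAGS   # set_page_count() overwrites it, losing that reference
    count -= 1       # the speculative failure path drops its reference
    return count     # fragment frees left before the counter reaches zero


def dedicated_refcount_model() -> int:
    """Fixed scheme: fragments use their own counter, untouched by speculation."""
    page_count = 1
    page_count += 1      # speculative reference only affects the page count
    frag_count = NFRAGS  # dedicated field owned by the fragment allocator
    page_count -= 1      # speculative failure path
    return frag_count


print(shared_refcount_model(), dedicated_refcount_model())
```

In the shared model the counter reaches zero one free too early (3 instead of 4), so the last fragment free would underflow it. With a dedicated field the count stays at NFRAGS regardless of speculative references.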


* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
  2019-05-18 14:14   ` Bharata B Rao
@ 2019-05-20  2:02     ` Michael Ellerman
  -1 siblings, 0 replies; 26+ messages in thread
From: Michael Ellerman @ 2019-05-20  2:02 UTC (permalink / raw)
  To: bharata, srikanth
  Cc: linuxppc-dev, bharata, linux-kernel, linux-next, npiggin, aneesh.kumar

Bharata B Rao <bharata@linux.ibm.com> writes:
> On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote:
>> Hello,
>> 
>> On power9 host, performing memory hotunplug from ppc64le guest results in
>> kernel oops.
>> 
>> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using
>> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.
>> 
>> Recreation steps:
>> 
>> 1. Boot a guest with below mem configuration:
>>   <maxMemory slots='32' unit='KiB'>33554432</maxMemory>
>>   <memory unit='KiB'>8388608</memory>
>>   <currentMemory unit='KiB'>4194304</currentMemory>
>>   <cpu>
>>     <numa>
>>       <cell id='0' cpus='0-31' memory='8388608' unit='KiB'/>
>>     </numa>
>>   </cpu>
>> 
>> 2. From host hotplug 8G memory -> verify memory hotadded successfully -> now
>> reboot guest -> once guest comes back try to unplug 8G memory
>> 
>> mem.xml used:
>> <memory model='dimm'>
>> <target>
>> <size unit='GiB'>8</size>
>> <node>0</node>
>> </target>
>> </memory>
>> 
>> Memory attach and detach commands used:
>>     virsh attach-device vm1 ./mem.xml --live
>>     virsh detach-device vm1 ./mem.xml --live
>> 
>> Trace seen inside guest after unplug, guest just hangs there forever:
>> 
>> [   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
>> [   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
>> [   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA
>> pSeries
>> [   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse
>> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi
>> ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress
>> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
>> xor raid6_pq multipath crc32c_vpmsum
>> [   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not
>> tainted 5.1.0-dirty #2
>> [   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
>> [   21.963355] NIP:  c000000000079e18 LR: c000000000c79308 CTR:
>> 0000000000008000
>> [   21.963392] REGS: c0000003f88034f0 TRAP: 0700   Not tainted (5.1.0-dirty)
>> [   21.963422] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR:
>> 28002884  XER: 20040000
>> [   21.963470] CFAR: c000000000c79304 IRQMASK: 0
>> [   21.963470] GPR00: c000000000c79308 c0000003f8803780 c000000001521000
>> 0000000000fff8c0
>> [   21.963470] GPR04: 0000000000000001 00000000ffe30005 0000000000000005
>> 0000000000000020
>> [   21.963470] GPR08: 0000000000000000 0000000000000001 c00a000000fff8e0
>> c0000000016d21a0
>> [   21.963470] GPR12: c0000000016e7b90 c000000007ff2700 c00a000000a00000
>> c0000003ffe30100
>> [   21.963470] GPR16: c0000003ffe30000 c0000000014aa4de c00a0000009f0000
>> c0000000016d21b0
>> [   21.963470] GPR20: c0000000014de588 0000000000000001 c0000000016d21b8
>> c00a000000a00000
>> [   21.963470] GPR24: 0000000000000000 ffffffffffffffff c00a000000a00000
>> c0000003ffe96000
>> [   21.963470] GPR28: c00a000000a00000 c00a000000a00000 c0000003fffec000
>> c00a000000fff8c0
>> [   21.963802] NIP [c000000000079e18] pte_fragment_free+0x48/0xd0
>> [   21.963838] LR [c000000000c79308] remove_pagetable+0x49c/0x5b4
>> [   21.963873] Call Trace:
>> [   21.963890] [c0000003f8803780] [c0000003ffe997f0] 0xc0000003ffe997f0
>> (unreliable)
>> [   21.963933] [c0000003f88037b0] [0000000000000000] (null)
>> [   21.963969] [c0000003f88038c0] [c00000000006f038]
>> vmemmap_free+0x218/0x2e0
>> [   21.964006] [c0000003f8803940] [c00000000036f100]
>> sparse_remove_one_section+0xd0/0x138
>> [   21.964050] [c0000003f8803980] [c000000000383a50]
>> __remove_pages+0x410/0x560
>> [   21.964093] [c0000003f8803a90] [c000000000c784d8]
>> arch_remove_memory+0x68/0xdc
>> [   21.964136] [c0000003f8803ad0] [c000000000385d74]
>> __remove_memory+0xc4/0x110
>> [   21.964180] [c0000003f8803b10] [c0000000000d44e4]
>> dlpar_remove_lmb+0x94/0x140
>> [   21.964223] [c0000003f8803b50] [c0000000000d52b4]
>> dlpar_memory+0x464/0xd00
>> [   21.964259] [c0000003f8803be0] [c0000000000cd5c0]
>> handle_dlpar_errorlog+0xc0/0x190
>> [   21.964303] [c0000003f8803c50] [c0000000000cd6bc]
>> pseries_hp_work_fn+0x2c/0x60
>> [   21.964346] [c0000003f8803c80] [c00000000013a4a0]
>> process_one_work+0x2b0/0x5a0
>> [   21.964388] [c0000003f8803d10] [c00000000013a818]
>> worker_thread+0x88/0x610
>> [   21.964434] [c0000003f8803db0] [c000000000143884] kthread+0x1a4/0x1b0
>> [   21.964468] [c0000003f8803e20] [c00000000000bdc4]
>> ret_from_kernel_thread+0x5c/0x78
>> [   21.964506] Instruction dump:
>> [   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe90000 7fff1a14
>> 395f0020 813f0020
>> [   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b090000> 7c0004ac
>> 7d205028 3129ffff
>> [   21.964613] ---[ end trace aaa571aa1636fee6 ]---
>> [   21.966349]
>> [   21.966383] Sending IPI to other CPUs
>> [   21.978335] IPI complete
>> [   21.981354] kexec: Starting switchover sequence.
>> I'm in purgatory
>
> git bisect points to
>
> commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> Author: Nicholas Piggin <npiggin@gmail.com>
> Date:   Fri Jul 27 21:48:17 2018 +1000
>
>     powerpc/64s: Fix page table fragment refcount race vs speculative references
>
>     The page table fragment allocator uses the main page refcount racily
>     with respect to speculative references. A customer observed a BUG due
>     to page table page refcount underflow in the fragment allocator. This
>     can be caused by the fragment allocator set_page_count stomping on a
>     speculative reference, and then the speculative failure handler
>     decrements the new reference, and the underflow eventually pops when
>     the page tables are freed.
>
>     Fix this by using a dedicated field in the struct page for the page
>     table fragment allocator.
>
>     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>     Cc: stable@vger.kernel.org # v3.10+

That's the commit that added the BUG_ON(), so prior to that you won't
see the crash.

cheers

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
  2019-05-20  2:02     ` Michael Ellerman
@ 2019-05-20  4:25       ` Bharata B Rao
  -1 siblings, 0 replies; 26+ messages in thread
From: Bharata B Rao @ 2019-05-20  4:25 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: srikanth, linuxppc-dev, bharata, linux-kernel, linux-next,
	npiggin, aneesh.kumar

On Mon, May 20, 2019 at 12:02:23PM +1000, Michael Ellerman wrote:
> Bharata B Rao <bharata@linux.ibm.com> writes:
> > On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote:
> >> Hello,
> >> 
> >> On power9 host, performing memory hotunplug from ppc64le guest results in
> >> kernel oops.
> >> 
> >> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using
> >> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.
> >> 
> >> Recreation steps:
> >> 
> >> 1. Boot a guest with below mem configuration:
> >>   <maxMemory slots='32' unit='KiB'>33554432</maxMemory>
> >>   <memory unit='KiB'>8388608</memory>
> >>   <currentMemory unit='KiB'>4194304</currentMemory>
> >>   <cpu>
> >>     <numa>
> >>       <cell id='0' cpus='0-31' memory='8388608' unit='KiB'/>
> >>     </numa>
> >>   </cpu>
> >> 
> >> 2. From host hotplug 8G memory -> verify memory hotadded successfully -> now
> >> reboot guest -> once guest comes back try to unplug 8G memory
> >> 
> >> mem.xml used:
> >> <memory model='dimm'>
> >> <target>
> >> <size unit='GiB'>8</size>
> >> <node>0</node>
> >> </target>
> >> </memory>
> >> 
> >> Memory attach and detach commands used:
> >>     virsh attach-device vm1 ./mem.xml --live
> >>     virsh detach-device vm1 ./mem.xml --live
> >> 
> >> Trace seen inside guest after unplug, guest just hangs there forever:
> >> 
> >> [   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
> >> [   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
> >> [   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA
> >> pSeries
> >> [   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse
> >> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi
> >> ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress
> >> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
> >> xor raid6_pq multipath crc32c_vpmsum
> >> [   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not
> >> tainted 5.1.0-dirty #2
> >> [   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
> >> [   21.963355] NIP:  c000000000079e18 LR: c000000000c79308 CTR:
> >> 0000000000008000
> >> [   21.963392] REGS: c0000003f88034f0 TRAP: 0700   Not tainted (5.1.0-dirty)
> >> [   21.963422] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR:
> >> 28002884  XER: 20040000
> >> [   21.963470] CFAR: c000000000c79304 IRQMASK: 0
> >> [   21.963470] GPR00: c000000000c79308 c0000003f8803780 c000000001521000
> >> 0000000000fff8c0
> >> [   21.963470] GPR04: 0000000000000001 00000000ffe30005 0000000000000005
> >> 0000000000000020
> >> [   21.963470] GPR08: 0000000000000000 0000000000000001 c00a000000fff8e0
> >> c0000000016d21a0
> >> [   21.963470] GPR12: c0000000016e7b90 c000000007ff2700 c00a000000a00000
> >> c0000003ffe30100
> >> [   21.963470] GPR16: c0000003ffe30000 c0000000014aa4de c00a0000009f0000
> >> c0000000016d21b0
> >> [   21.963470] GPR20: c0000000014de588 0000000000000001 c0000000016d21b8
> >> c00a000000a00000
> >> [   21.963470] GPR24: 0000000000000000 ffffffffffffffff c00a000000a00000
> >> c0000003ffe96000
> >> [   21.963470] GPR28: c00a000000a00000 c00a000000a00000 c0000003fffec000
> >> c00a000000fff8c0
> >> [   21.963802] NIP [c000000000079e18] pte_fragment_free+0x48/0xd0
> >> [   21.963838] LR [c000000000c79308] remove_pagetable+0x49c/0x5b4
> >> [   21.963873] Call Trace:
> >> [   21.963890] [c0000003f8803780] [c0000003ffe997f0] 0xc0000003ffe997f0
> >> (unreliable)
> >> [   21.963933] [c0000003f88037b0] [0000000000000000] (null)
> >> [   21.963969] [c0000003f88038c0] [c00000000006f038]
> >> vmemmap_free+0x218/0x2e0
> >> [   21.964006] [c0000003f8803940] [c00000000036f100]
> >> sparse_remove_one_section+0xd0/0x138
> >> [   21.964050] [c0000003f8803980] [c000000000383a50]
> >> __remove_pages+0x410/0x560
> >> [   21.964093] [c0000003f8803a90] [c000000000c784d8]
> >> arch_remove_memory+0x68/0xdc
> >> [   21.964136] [c0000003f8803ad0] [c000000000385d74]
> >> __remove_memory+0xc4/0x110
> >> [   21.964180] [c0000003f8803b10] [c0000000000d44e4]
> >> dlpar_remove_lmb+0x94/0x140
> >> [   21.964223] [c0000003f8803b50] [c0000000000d52b4]
> >> dlpar_memory+0x464/0xd00
> >> [   21.964259] [c0000003f8803be0] [c0000000000cd5c0]
> >> handle_dlpar_errorlog+0xc0/0x190
> >> [   21.964303] [c0000003f8803c50] [c0000000000cd6bc]
> >> pseries_hp_work_fn+0x2c/0x60
> >> [   21.964346] [c0000003f8803c80] [c00000000013a4a0]
> >> process_one_work+0x2b0/0x5a0
> >> [   21.964388] [c0000003f8803d10] [c00000000013a818]
> >> worker_thread+0x88/0x610
> >> [   21.964434] [c0000003f8803db0] [c000000000143884] kthread+0x1a4/0x1b0
> >> [   21.964468] [c0000003f8803e20] [c00000000000bdc4]
> >> ret_from_kernel_thread+0x5c/0x78
> >> [   21.964506] Instruction dump:
> >> [   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe90000 7fff1a14
> >> 395f0020 813f0020
> >> [   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b090000> 7c0004ac
> >> 7d205028 3129ffff
> >> [   21.964613] ---[ end trace aaa571aa1636fee6 ]---
> >> [   21.966349]
> >> [   21.966383] Sending IPI to other CPUs
> >> [   21.978335] IPI complete
> >> [   21.981354] kexec: Starting switchover sequence.
> >> I'm in purgatory
> >
> > git bisect points to
> >
> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> > Author: Nicholas Piggin <npiggin@gmail.com>
> > Date:   Fri Jul 27 21:48:17 2018 +1000
> >
> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
> >
> >     The page table fragment allocator uses the main page refcount racily
> >     with respect to speculative references. A customer observed a BUG due
> >     to page table page refcount underflow in the fragment allocator. This
> >     can be caused by the fragment allocator set_page_count stomping on a
> >     speculative reference, and then the speculative failure handler
> >     decrements the new reference, and the underflow eventually pops when
> >     the page tables are freed.
> >
> >     Fix this by using a dedicated field in the struct page for the page
> >     table fragment allocator.
> >
> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> >     Cc: stable@vger.kernel.org # v3.10+
> 
> That's the commit that added the BUG_ON(), so prior to that you won't
> see the crash.

Right, but the commit says it fixes the page table page refcount underflow
by introducing a new field, page->pt_frag_refcount. Now we are hitting an
underflow on pt_frag_refcount itself.

BTW, if I go to a kernel from before this commit, I don't hit the page count
check

VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);

which is in the pte_fragment_free() path.
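
To make the failure mode concrete, here is a minimal userspace sketch of a
per-page fragment counter. It is a model only, not the kernel code: struct
model_page, model_page_init() and model_fragment_free() are hypothetical
names, and the only thing taken from this thread is that a dedicated
pt_frag_refcount field is checked when a fragment is freed. The model
returns false at the point where I would expect the kernel to BUG.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the struct page field named in this thread. */
struct model_page {
	atomic_int pt_frag_refcount;	/* fragments still in use */
	bool freed;			/* set once the last fragment is gone */
};

/* Model of setting up a page that backs nr_frags page table fragments. */
static void model_page_init(struct model_page *page, int nr_frags)
{
	assert(nr_frags >= 0);
	atomic_init(&page->pt_frag_refcount, nr_frags);
	page->freed = false;
}

/*
 * Model of freeing one fragment. Returns false if the counter is already
 * zero or negative, i.e. the underflow case discussed above.
 */
static bool model_fragment_free(struct model_page *page)
{
	if (atomic_load(&page->pt_frag_refcount) <= 0)
		return false;

	/* atomic_fetch_sub() returns the old value: 1 means last fragment. */
	if (atomic_fetch_sub(&page->pt_frag_refcount, 1) == 1)
		page->freed = true;

	return true;
}
```

With a page initialised for two fragments, the first two frees succeed and
the second one marks the page freed; a third free returns false. The same
happens on the very first free if the counter was never raised above zero.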

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
  2019-05-20  4:25       ` Bharata B Rao
@ 2019-05-20  4:48         ` Nicholas Piggin
  -1 siblings, 0 replies; 26+ messages in thread
From: Nicholas Piggin @ 2019-05-20  4:48 UTC (permalink / raw)
  To: bharata, Michael Ellerman
  Cc: aneesh.kumar, bharata, linux-kernel, linux-next, linuxppc-dev, srikanth

Bharata B Rao's on May 20, 2019 2:25 pm:
> On Mon, May 20, 2019 at 12:02:23PM +1000, Michael Ellerman wrote:
>> Bharata B Rao <bharata@linux.ibm.com> writes:
>> > On Thu, May 16, 2019 at 07:44:20PM +0530, srikanth wrote:
>> >> Hello,
>> >> 
>> >> On power9 host, performing memory hotunplug from ppc64le guest results in
>> >> kernel oops.
>> >> 
>> >> Kernel used : https://github.com/torvalds/linux/tree/v5.1 built using
>> >> ppc64le_defconfig for host and ppc64le_guest_defconfig for guest.
>> >> 
>> >> Recreation steps:
>> >> 
>> >> 1. Boot a guest with below mem configuration:
>> >>   <maxMemory slots='32' unit='KiB'>33554432</maxMemory>
>> >>   <memory unit='KiB'>8388608</memory>
>> >>   <currentMemory unit='KiB'>4194304</currentMemory>
>> >>   <cpu>
>> >>     <numa>
>> >>       <cell id='0' cpus='0-31' memory='8388608' unit='KiB'/>
>> >>     </numa>
>> >>   </cpu>
>> >> 
>> >> 2. From host hotplug 8G memory -> verify memory hotadded successfully -> now
>> >> reboot guest -> once guest comes back try to unplug 8G memory
>> >> 
>> >> mem.xml used:
>> >> <memory model='dimm'>
>> >> <target>
>> >> <size unit='GiB'>8</size>
>> >> <node>0</node>
>> >> </target>
>> >> </memory>
>> >> 
>> >> Memory attach and detach commands used:
>> >>     virsh attach-device vm1 ./mem.xml --live
>> >>     virsh detach-device vm1 ./mem.xml --live
>> >> 
>> >> Trace seen inside guest after unplug, guest just hangs there forever:
>> >> 
>> >> [   21.962986] kernel BUG at arch/powerpc/mm/pgtable-frag.c:113!
>> >> [   21.963064] Oops: Exception in kernel mode, sig: 5 [#1]
>> >> [   21.963090] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA
>> >> pSeries
>> >> [   21.963131] Modules linked in: xt_tcpudp iptable_filter squashfs fuse
>> >> vmx_crypto ib_iser rdma_cm iw_cm ib_cm ib_core libiscsi scsi_transport_iscsi
>> >> ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress
>> >> raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
>> >> xor raid6_pq multipath crc32c_vpmsum
>> >> [   21.963281] CPU: 11 PID: 316 Comm: kworker/u64:5 Kdump: loaded Not
>> >> tainted 5.1.0-dirty #2
>> >> [   21.963323] Workqueue: pseries hotplug workque pseries_hp_work_fn
>> >> [   21.963355] NIP:  c000000000079e18 LR: c000000000c79308 CTR:
>> >> 0000000000008000
>> >> [   21.963392] REGS: c0000003f88034f0 TRAP: 0700   Not tainted (5.1.0-dirty)
>> >> [   21.963422] MSR:  800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR:
>> >> 28002884  XER: 20040000
>> >> [   21.963470] CFAR: c000000000c79304 IRQMASK: 0
>> >> [   21.963470] GPR00: c000000000c79308 c0000003f8803780 c000000001521000
>> >> 0000000000fff8c0
>> >> [   21.963470] GPR04: 0000000000000001 00000000ffe30005 0000000000000005
>> >> 0000000000000020
>> >> [   21.963470] GPR08: 0000000000000000 0000000000000001 c00a000000fff8e0
>> >> c0000000016d21a0
>> >> [   21.963470] GPR12: c0000000016e7b90 c000000007ff2700 c00a000000a00000
>> >> c0000003ffe30100
>> >> [   21.963470] GPR16: c0000003ffe30000 c0000000014aa4de c00a0000009f0000
>> >> c0000000016d21b0
>> >> [   21.963470] GPR20: c0000000014de588 0000000000000001 c0000000016d21b8
>> >> c00a000000a00000
>> >> [   21.963470] GPR24: 0000000000000000 ffffffffffffffff c00a000000a00000
>> >> c0000003ffe96000
>> >> [   21.963470] GPR28: c00a000000a00000 c00a000000a00000 c0000003fffec000
>> >> c00a000000fff8c0
>> >> [   21.963802] NIP [c000000000079e18] pte_fragment_free+0x48/0xd0
>> >> [   21.963838] LR [c000000000c79308] remove_pagetable+0x49c/0x5b4
>> >> [   21.963873] Call Trace:
>> >> [   21.963890] [c0000003f8803780] [c0000003ffe997f0] 0xc0000003ffe997f0
>> >> (unreliable)
>> >> [   21.963933] [c0000003f88037b0] [0000000000000000] (null)
>> >> [   21.963969] [c0000003f88038c0] [c00000000006f038]
>> >> vmemmap_free+0x218/0x2e0
>> >> [   21.964006] [c0000003f8803940] [c00000000036f100]
>> >> sparse_remove_one_section+0xd0/0x138
>> >> [   21.964050] [c0000003f8803980] [c000000000383a50]
>> >> __remove_pages+0x410/0x560
>> >> [   21.964093] [c0000003f8803a90] [c000000000c784d8]
>> >> arch_remove_memory+0x68/0xdc
>> >> [   21.964136] [c0000003f8803ad0] [c000000000385d74]
>> >> __remove_memory+0xc4/0x110
>> >> [   21.964180] [c0000003f8803b10] [c0000000000d44e4]
>> >> dlpar_remove_lmb+0x94/0x140
>> >> [   21.964223] [c0000003f8803b50] [c0000000000d52b4]
>> >> dlpar_memory+0x464/0xd00
>> >> [   21.964259] [c0000003f8803be0] [c0000000000cd5c0]
>> >> handle_dlpar_errorlog+0xc0/0x190
>> >> [   21.964303] [c0000003f8803c50] [c0000000000cd6bc]
>> >> pseries_hp_work_fn+0x2c/0x60
>> >> [   21.964346] [c0000003f8803c80] [c00000000013a4a0]
>> >> process_one_work+0x2b0/0x5a0
>> >> [   21.964388] [c0000003f8803d10] [c00000000013a818]
>> >> worker_thread+0x88/0x610
>> >> [   21.964434] [c0000003f8803db0] [c000000000143884] kthread+0x1a4/0x1b0
>> >> [   21.964468] [c0000003f8803e20] [c00000000000bdc4]
>> >> ret_from_kernel_thread+0x5c/0x78
>> >> [   21.964506] Instruction dump:
>> >> [   21.964527] fbe1fff8 f821ffd1 78638502 78633664 ebe90000 7fff1a14
>> >> 395f0020 813f0020
>> >> [   21.964569] 7d2907b4 7d2900d0 79290fe0 69290001 <0b090000> 7c0004ac
>> >> 7d205028 3129ffff
>> >> [   21.964613] ---[ end trace aaa571aa1636fee6 ]---
>> >> [   21.966349]
>> >> [   21.966383] Sending IPI to other CPUs
>> >> [   21.978335] IPI complete
>> >> [   21.981354] kexec: Starting switchover sequence.
>> >> I'm in purgatory
>> >
>> > git bisect points to
>> >
>> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
>> > Author: Nicholas Piggin <npiggin@gmail.com>
>> > Date:   Fri Jul 27 21:48:17 2018 +1000
>> >
>> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
>> >
>> >     The page table fragment allocator uses the main page refcount racily
>> >     with respect to speculative references. A customer observed a BUG due
>> >     to page table page refcount underflow in the fragment allocator. This
>> >     can be caused by the fragment allocator set_page_count stomping on a
>> >     speculative reference, and then the speculative failure handler
>> >     decrements the new reference, and the underflow eventually pops when
>> >     the page tables are freed.
>> >
>> >     Fix this by using a dedicated field in the struct page for the page
>> >     table fragment allocator.
>> >
>> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>> >     Cc: stable@vger.kernel.org # v3.10+
>> 
>> That's the commit that added the BUG_ON(), so prior to that you won't
>> see the crash.
> 
> Right, but the commit says it fixes page table page refcount underflow by
> introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
> for this pt_frag_refcount.

The fixed underflow is caused by a bug (race on page count) that got 
fixed by that patch. You are hitting a different underflow here. It's
not certain my patch caused it, I'm just trying to reproduce now.

> 
> BTW, if I go below this commit, I don't hit the pagecount
> 
> VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
> 
> which is in pte_fragment_free() path.

Do you have CONFIG_DEBUG_VM=y?
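
For context on why that matters: VM_BUG_ON_PAGE() only performs its check
when the kernel is built with CONFIG_DEBUG_VM=y, whereas a plain BUG_ON()
checks in a normal build. A minimal user-space sketch of that difference is
below. MODEL_DEBUG_VM and both function names are hypothetical stand-ins
for illustration, not kernel symbols.

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_DEBUG_VM; build with -DMODEL_DEBUG_VM=1 to enable. */
#ifndef MODEL_DEBUG_VM
#define MODEL_DEBUG_VM 0
#endif

/* Models an unconditional BUG_ON(refcount <= 0): returns 1 where it would trap. */
int model_bug_on_fires(int refcount)
{
	return refcount <= 0;
}

/* Models VM_BUG_ON_PAGE(refcount == 0, page): only checked in debug builds. */
int model_vm_bug_on_fires(int refcount)
{
	if (!MODEL_DEBUG_VM)
		return 0;	/* check compiled out, nothing can trap */
	return refcount == 0;
}
```

So a kernel built without the debug option would stay silent on a zero
page refcount in this path, while the unconditional check would still trip.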

Thanks,
Nick


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
  2019-05-20  4:48         ` Nicholas Piggin
@ 2019-05-20  5:56           ` Bharata B Rao
  -1 siblings, 0 replies; 26+ messages in thread
From: Bharata B Rao @ 2019-05-20  5:56 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: Michael Ellerman, aneesh.kumar, bharata, linux-kernel,
	linux-next, linuxppc-dev, srikanth

On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> >> > git bisect points to
> >> >
> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> >> > Author: Nicholas Piggin <npiggin@gmail.com>
> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
> >> >
> >> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
> >> >
> >> >     The page table fragment allocator uses the main page refcount racily
> >> >     with respect to speculative references. A customer observed a BUG due
> >> >     to page table page refcount underflow in the fragment allocator. This
> >> >     can be caused by the fragment allocator set_page_count stomping on a
> >> >     speculative reference, and then the speculative failure handler
> >> >     decrements the new reference, and the underflow eventually pops when
> >> >     the page tables are freed.
> >> >
> >> >     Fix this by using a dedicated field in the struct page for the page
> >> >     table fragment allocator.
> >> >
> >> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> >> >     Cc: stable@vger.kernel.org # v3.10+
> >> 
> >> That's the commit that added the BUG_ON(), so prior to that you won't
> >> see the crash.
> > 
> > Right, but the commit says it fixes page table page refcount underflow by
> > introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
> > for this pt_frag_refcount.
> 
> The fixed underflow is caused by a bug (race on page count) that got 
> fixed by that patch. You are hitting a different underflow here. It's
> not certain my patch caused it, I'm just trying to reproduce now.

Ok.

> 
> > 
> > BTW, if I go below this commit, I don't hit the pagecount
> > 
> > VM_BUG_ON_PAGE(page_ref_count(page) == 0, page);
> > 
> > which is in pte_fragment_free() path.
> 
> Do you have CONFIG_DEBUG_VM=y?

Yes.

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
  2019-05-20  5:56           ` Bharata B Rao
@ 2019-05-20  7:00             ` Nicholas Piggin
  -1 siblings, 0 replies; 26+ messages in thread
From: Nicholas Piggin @ 2019-05-20  7:00 UTC (permalink / raw)
  To: bharata
  Cc: aneesh.kumar, bharata, linux-kernel, linux-next, linuxppc-dev,
	Michael Ellerman, srikanth

Bharata B Rao's on May 20, 2019 3:56 pm:
> On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
>> >> > git bisect points to
>> >> >
>> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
>> >> > Author: Nicholas Piggin <npiggin@gmail.com>
>> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
>> >> >
>> >> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
>> >> >
>> >> >     The page table fragment allocator uses the main page refcount racily
>> >> >     with respect to speculative references. A customer observed a BUG due
>> >> >     to page table page refcount underflow in the fragment allocator. This
>> >> >     can be caused by the fragment allocator set_page_count stomping on a
>> >> >     speculative reference, and then the speculative failure handler
>> >> >     decrements the new reference, and the underflow eventually pops when
>> >> >     the page tables are freed.
>> >> >
>> >> >     Fix this by using a dedicated field in the struct page for the page
>> >> >     table fragment allocator.
>> >> >
>> >> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>> >> >     Cc: stable@vger.kernel.org # v3.10+
>> >> 
>> >> That's the commit that added the BUG_ON(), so prior to that you won't
>> >> see the crash.
>> > 
>> > Right, but the commit says it fixes page table page refcount underflow by
>> > introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
>> > for this pt_frag_refcount.
>> 
>> The fixed underflow is caused by a bug (race on page count) that got 
>> fixed by that patch. You are hitting a different underflow here. It's
>> not certain my patch caused it, I'm just trying to reproduce now.
> 
> Ok.

Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
4GB guest (via host adding / removing memory device), and it just works.

It's likely to be an edge case like an off by one or rounding error
that just happens to trigger in your config. Might be easiest if you
could test with a debug patch.

Thanks,
Nick


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
  2019-05-20  7:00             ` Nicholas Piggin
@ 2019-05-20  8:20               ` Bharata B Rao
  -1 siblings, 0 replies; 26+ messages in thread
From: Bharata B Rao @ 2019-05-20  8:20 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: aneesh.kumar, bharata, linux-kernel, linux-next, linuxppc-dev,
	Michael Ellerman, srikanth

On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
> Bharata B Rao's on May 20, 2019 3:56 pm:
> > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> >> >> > git bisect points to
> >> >> >
> >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> >> >> > Author: Nicholas Piggin <npiggin@gmail.com>
> >> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
> >> >> >
> >> >> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
> >> >> >
> >> >> >     The page table fragment allocator uses the main page refcount racily
> >> >> >     with respect to speculative references. A customer observed a BUG due
> >> >> >     to page table page refcount underflow in the fragment allocator. This
> >> >> >     can be caused by the fragment allocator set_page_count stomping on a
> >> >> >     speculative reference, and then the speculative failure handler
> >> >> >     decrements the new reference, and the underflow eventually pops when
> >> >> >     the page tables are freed.
> >> >> >
> >> >> >     Fix this by using a dedicated field in the struct page for the page
> >> >> >     table fragment allocator.
> >> >> >
> >> >> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> >> >> >     Cc: stable@vger.kernel.org # v3.10+
> >> >> 
> >> >> That's the commit that added the BUG_ON(), so prior to that you won't
> >> >> see the crash.
> >> > 
> >> > Right, but the commit says it fixes page table page refcount underflow by
> >> > introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
> >> > for this pt_frag_refcount.
> >> 
> >> The fixed underflow is caused by a bug (race on page count) that got 
> >> fixed by that patch. You are hitting a different underflow here. It's
> >> not certain my patch caused it, I'm just trying to reproduce now.
> > 
> > Ok.
> 
> Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
> 4GB guest (via host adding / removing memory device), and it just works.

Boot, add 8G, reboot, remove 8G is the sequence to reproduce.

> 
> It's likely to be an edge case like an off by one or rounding error
> that just happens to trigger in your config. Might be easiest if you
> could test with a debug patch.

Sure, I will continue debugging.

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
  2019-05-20  8:20               ` Bharata B Rao
@ 2019-05-20 14:29                 ` Bharata B Rao
  -1 siblings, 0 replies; 26+ messages in thread
From: Bharata B Rao @ 2019-05-20 14:29 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: aneesh.kumar, bharata, linux-kernel, linux-next, linuxppc-dev,
	Michael Ellerman, srikanth

On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
> > Bharata B Rao's on May 20, 2019 3:56 pm:
> > > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> > >> >> > git bisect points to
> > >> >> >
> > >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> > >> >> > Author: Nicholas Piggin <npiggin@gmail.com>
> > >> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
> > >> >> >
> > >> >> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
> > >> >> >
> > >> >> >     The page table fragment allocator uses the main page refcount racily
> > >> >> >     with respect to speculative references. A customer observed a BUG due
> > >> >> >     to page table page refcount underflow in the fragment allocator. This
> > >> >> >     can be caused by the fragment allocator set_page_count stomping on a
> > >> >> >     speculative reference, and then the speculative failure handler
> > >> >> >     decrements the new reference, and the underflow eventually pops when
> > >> >> >     the page tables are freed.
> > >> >> >
> > >> >> >     Fix this by using a dedicated field in the struct page for the page
> > >> >> >     table fragment allocator.
> > >> >> >
> > >> >> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> > >> >> >     Cc: stable@vger.kernel.org # v3.10+
> > >> >> 
> > >> >> That's the commit that added the BUG_ON(), so prior to that you won't
> > >> >> see the crash.
> > >> > 
> > >> > Right, but the commit says it fixes page table page refcount underflow by
> > >> > introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
> > >> > for this pt_frag_refcount.
> > >> 
> > >> The fixed underflow is caused by a bug (race on page count) that got 
> > >> fixed by that patch. You are hitting a different underflow here. It's
> > >> not certain my patch caused it, I'm just trying to reproduce now.
> > > 
> > > Ok.
> > 
> > Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
> > 4GB guest (via host adding / removing memory device), and it just works.
> 
> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
> 
> > 
> > It's likely to be an edge case like an off by one or rounding error
> > that just happens to trigger in your config. Might be easiest if you
> > could test with a debug patch.
> 
> Sure, I will continue debugging.

When the guest is rebooted after hotplug, the entire memory (including the
hotplugged memory) is mapped afresh. Since no slab is available yet at that
point, we never do pte_fragment_alloc() for these mappings, so
pt_frag_refcount is never initialized for their page table pages. As a
result, it looks like we hit the underflow right away on the first unplug.

I will check how this can be fixed.
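
The analysis above can be sketched as a small user-space model. This is an
illustration under assumptions, not the code in
arch/powerpc/mm/pgtable-frag.c: struct model_page and both functions are
hypothetical, and the real allocator's handling of several fragments per
page is reduced to a single counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one struct page field that matters here. */
struct model_page {
	int pt_frag_refcount;	/* stays zero unless the alloc path runs */
};

/* Models the allocation path: a reference is taken for the fragment. */
void model_fragment_alloc(struct model_page *page)
{
	page->pt_frag_refcount++;
}

/*
 * Models the free path. Returns false where an underflow check would
 * fire (count already <= 0), true when the reference is dropped cleanly.
 */
bool model_fragment_free(struct model_page *page)
{
	if (page->pt_frag_refcount <= 0)
		return false;
	page->pt_frag_refcount--;
	return true;
}
```

A page that went through model_fragment_alloc() can be freed once. A page
that was mapped without it starts at zero, so the very first
model_fragment_free() reports the underflow, which matches the unplug
failing on the first attempt.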

> 
> Regards,
> Bharata.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
@ 2019-05-20 14:29                 ` Bharata B Rao
  0 siblings, 0 replies; 26+ messages in thread
From: Bharata B Rao @ 2019-05-20 14:29 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: bharata, linux-kernel, linux-next, aneesh.kumar, srikanth, linuxppc-dev

On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
> > Bharata B Rao's on May 20, 2019 3:56 pm:
> > > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> > >> >> > git bisect points to
> > >> >> >
> > >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> > >> >> > Author: Nicholas Piggin <npiggin@gmail.com>
> > >> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
> > >> >> >
> > >> >> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
> > >> >> >
> > >> >> >     The page table fragment allocator uses the main page refcount racily
> > >> >> >     with respect to speculative references. A customer observed a BUG due
> > >> >> >     to page table page refcount underflow in the fragment allocator. This
> > >> >> >     can be caused by the fragment allocator set_page_count stomping on a
> > >> >> >     speculative reference, and then the speculative failure handler
> > >> >> >     decrements the new reference, and the underflow eventually pops when
> > >> >> >     the page tables are freed.
> > >> >> >
> > >> >> >     Fix this by using a dedicated field in the struct page for the page
> > >> >> >     table fragment allocator.
> > >> >> >
> > >> >> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> > >> >> >     Cc: stable@vger.kernel.org # v3.10+
> > >> >> 
> > >> >> That's the commit that added the BUG_ON(), so prior to that you won't
> > >> >> see the crash.
> > >> > 
> > >> > Right, but the commit says it fixes page table page refcount underflow by
> > >> > introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
> > >> > for this pt_frag_refcount.
> > >> 
> > >> The fixed underflow is caused by a bug (race on page count) that got 
> > >> fixed by that patch. You are hitting a different underflow here. It's
> > >> not certain my patch caused it, I'm just trying to reproduce now.
> > > 
> > > Ok.
> > 
> > Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
> > 4GB guest (via host adding / removing memory device), and it just works.
> 
> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
> 
> > 
> > It's likely to be an edge case like an off by one or rounding error
> > that just happens to trigger in your config. Might be easiest if you
> > could test with a debug patch.
> 
> Sure, I will continue debugging.

When the guest is rebooted after hotplug, the entire memory (which includes
the hotplugged memory) gets mapped afresh. However, since no slab is
available yet at this time, we never do pte_fragment_alloc() for these
mappings, and so pt_frag_refcount never gets initialized for them. So it
looks like we hit the underflow right away, during the first unplug itself.
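
To make the failure mode above concrete, here is a minimal user-space
sketch, not kernel code. All model_* names and MODEL_FRAGS_PER_PAGE are
hypothetical, and the alloc/free logic is simplified. It assumes only what
is described in this thread: a per-page pt_frag_refcount that the normal
fragment allocator sets up, early boot mappings that bypass that allocator,
and a free path that treats a count <= 0 as a bug.

```c
#include <assert.h>

/* Hypothetical stand-in for the one struct page field that matters here. */
struct model_page {
	int pt_frag_refcount;	/* outstanding page table fragments */
};

enum model_free_result {
	MODEL_WOULD_BUG,	/* count was <= 0: where a BUG_ON() would fire */
	MODEL_STILL_IN_USE,	/* other fragments of the page remain */
	MODEL_PAGE_RELEASED	/* last fragment gone, page can be freed */
};

#define MODEL_FRAGS_PER_PAGE 2	/* illustrative value, not the kernel's */

/* Normal path: the fragment allocator initializes the count. */
void model_fragment_alloc(struct model_page *page)
{
	assert(page);
	page->pt_frag_refcount = MODEL_FRAGS_PER_PAGE;
}

/* Early boot path: the table is set up without touching the count. */
void model_early_table_alloc(struct model_page *page)
{
	assert(page);
	/* Deliberately nothing: pt_frag_refcount stays at its initial 0. */
}

/* Free path: drop one fragment reference, flagging an underflow. */
enum model_free_result model_fragment_free(struct model_page *page)
{
	assert(page);
	if (page->pt_frag_refcount <= 0)
		return MODEL_WOULD_BUG;
	if (--page->pt_frag_refcount == 0)
		return MODEL_PAGE_RELEASED;
	return MODEL_STILL_IN_USE;
}
```

In this model, freeing a table that came from model_early_table_alloc()
reports MODEL_WOULD_BUG on the very first call, which mirrors the
"underflow during the first unplug" seen in the guest.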

I will check how this can be fixed.

> 
> Regards,
> Bharata.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
  2019-05-20 14:29                 ` Bharata B Rao
@ 2019-05-20 14:55                   ` Nicholas Piggin
  -1 siblings, 0 replies; 26+ messages in thread
From: Nicholas Piggin @ 2019-05-20 14:55 UTC (permalink / raw)
  To: bharata
  Cc: aneesh.kumar, bharata, linux-kernel, linux-next, linuxppc-dev,
	Michael Ellerman, srikanth

Bharata B Rao's on May 21, 2019 12:29 am:
> On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
>> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
>> > Bharata B Rao's on May 20, 2019 3:56 pm:
>> > > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
>> > >> >> > git bisect points to
>> > >> >> >
>> > >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
>> > >> >> > Author: Nicholas Piggin <npiggin@gmail.com>
>> > >> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
>> > >> >> >
>> > >> >> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
>> > >> >> >
>> > >> >> >     The page table fragment allocator uses the main page refcount racily
>> > >> >> >     with respect to speculative references. A customer observed a BUG due
>> > >> >> >     to page table page refcount underflow in the fragment allocator. This
>> > >> >> >     can be caused by the fragment allocator set_page_count stomping on a
>> > >> >> >     speculative reference, and then the speculative failure handler
>> > >> >> >     decrements the new reference, and the underflow eventually pops when
>> > >> >> >     the page tables are freed.
>> > >> >> >
>> > >> >> >     Fix this by using a dedicated field in the struct page for the page
>> > >> >> >     table fragment allocator.
>> > >> >> >
>> > >> >> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>> > >> >> >     Cc: stable@vger.kernel.org # v3.10+
>> > >> >> 
>> > >> >> That's the commit that added the BUG_ON(), so prior to that you won't
>> > >> >> see the crash.
>> > >> > 
>> > >> > Right, but the commit says it fixes page table page refcount underflow by
>> > >> > introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
>> > >> > for this pt_frag_refcount.
>> > >> 
>> > >> The fixed underflow is caused by a bug (race on page count) that got 
>> > >> fixed by that patch. You are hitting a different underflow here. It's
>> > >> not certain my patch caused it, I'm just trying to reproduce now.
>> > > 
>> > > Ok.
>> > 
>> > Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
>> > 4GB guest (via host adding / removing memory device), and it just works.
>> 
>> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
>> 
>> > 
>> > It's likely to be an edge case like an off by one or rounding error
>> > that just happens to trigger in your config. Might be easiest if you
>> > could test with a debug patch.
>> 
>> Sure, I will continue debugging.
> 
> When the guest is rebooted after hotplug, the entire memory (which includes
> the hotplugged memory) gets remapped again freshly. However at this time
> since no slab is available yet, pt_frag_refcount never gets initialized as we
> never do pte_fragment_alloc() for these mappings. So we right away hit the
> underflow during the first unplug itself, it looks like.

Nice catch, good debugging work.

> I will check how this can be fixed.

Tricky problem. What do you think? You might be able to make the early 
page table allocations in the same pattern as the frag allocations, and 
then fill in the struct page metadata when you have those.

The other option may be to create a new set of page tables after mm comes
up and replace the early page tables with them. That's a bigger hammer
though.
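
A rough user-space sketch of the first idea (early allocations following
the frag pattern, with the struct page metadata filled in later) might look
like the following. This is only an illustration under assumptions: the
model_* names, the fixed-size log and the toy mem_map are all hypothetical
and do not correspond to existing kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_NR_PAGES		8	/* size of the toy mem_map */
#define MODEL_MAX_EARLY_TABLES	16	/* capacity of the early log */

/* Hypothetical stand-in for struct page. */
struct model_page {
	int pt_frag_refcount;
};

static struct model_page model_mem_map[MODEL_NR_PAGES];
static size_t model_early_pfns[MODEL_MAX_EARLY_TABLES];
static size_t model_nr_early;

/*
 * Early boot: hand out one page table fragment from page 'pfn' and only
 * record that fact, since the metadata cannot be updated yet.
 * Returns 0 on success, -1 if pfn is out of range or the log is full.
 */
int model_early_table_alloc(size_t pfn)
{
	if (pfn >= MODEL_NR_PAGES || model_nr_early >= MODEL_MAX_EARLY_TABLES)
		return -1;
	model_early_pfns[model_nr_early++] = pfn;
	return 0;
}

/* Later, once the metadata is usable: replay the log, one ref per fragment. */
void model_fixup_early_tables(void)
{
	for (size_t i = 0; i < model_nr_early; i++)
		model_mem_map[model_early_pfns[i]].pt_frag_refcount++;
	model_nr_early = 0;
}

/* Read back the fragment count of a page; -1 for an invalid pfn. */
int model_frag_refcount(size_t pfn)
{
	return pfn < MODEL_NR_PAGES ? model_mem_map[pfn].pt_frag_refcount : -1;
}
```

The point of the sketch is only the ordering: the counts are wrong (zero)
until the fixup runs, and correct afterwards, so the fixup would have to
happen before any unplug can free those tables.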

Thanks,
Nick


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
  2019-05-20 14:55                   ` Nicholas Piggin
@ 2019-05-20 15:12                     ` Bharata B Rao
  -1 siblings, 0 replies; 26+ messages in thread
From: Bharata B Rao @ 2019-05-20 15:12 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: aneesh.kumar, bharata, linux-kernel, linux-next, linuxppc-dev,
	Michael Ellerman, srikanth

On Tue, May 21, 2019 at 12:55:49AM +1000, Nicholas Piggin wrote:
> Bharata B Rao's on May 21, 2019 12:29 am:
> > On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
> >> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
> >> > Bharata B Rao's on May 20, 2019 3:56 pm:
> >> > > On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
> >> > >> >> > git bisect points to
> >> > >> >> >
> >> > >> >> > commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
> >> > >> >> > Author: Nicholas Piggin <npiggin@gmail.com>
> >> > >> >> > Date:   Fri Jul 27 21:48:17 2018 +1000
> >> > >> >> >
> >> > >> >> >     powerpc/64s: Fix page table fragment refcount race vs speculative references
> >> > >> >> >
> >> > >> >> >     The page table fragment allocator uses the main page refcount racily
> >> > >> >> >     with respect to speculative references. A customer observed a BUG due
> >> > >> >> >     to page table page refcount underflow in the fragment allocator. This
> >> > >> >> >     can be caused by the fragment allocator set_page_count stomping on a
> >> > >> >> >     speculative reference, and then the speculative failure handler
> >> > >> >> >     decrements the new reference, and the underflow eventually pops when
> >> > >> >> >     the page tables are freed.
> >> > >> >> >
> >> > >> >> >     Fix this by using a dedicated field in the struct page for the page
> >> > >> >> >     table fragment allocator.
> >> > >> >> >
> >> > >> >> >     Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
> >> > >> >> >     Cc: stable@vger.kernel.org # v3.10+
> >> > >> >> 
> >> > >> >> That's the commit that added the BUG_ON(), so prior to that you won't
> >> > >> >> see the crash.
> >> > >> > 
> >> > >> > Right, but the commit says it fixes page table page refcount underflow by
> >> > >> > introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
> >> > >> > for this pt_frag_refcount.
> >> > >> 
> >> > >> The fixed underflow is caused by a bug (race on page count) that got 
> >> > >> fixed by that patch. You are hitting a different underflow here. It's
> >> > >> not certain my patch caused it, I'm just trying to reproduce now.
> >> > > 
> >> > > Ok.
> >> > 
> >> > Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
> >> > 4GB guest (via host adding / removing memory device), and it just works.
> >> 
> >> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
> >> 
> >> > 
> >> > It's likely to be an edge case like an off by one or rounding error
> >> > that just happens to trigger in your config. Might be easiest if you
> >> > could test with a debug patch.
> >> 
> >> Sure, I will continue debugging.
> > 
> > When the guest is rebooted after hotplug, the entire memory (which includes
> > the hotplugged memory) gets remapped again freshly. However at this time
> > since no slab is available yet, pt_frag_refcount never gets initialized as we
> > never do pte_fragment_alloc() for these mappings. So we right away hit the
> > underflow during the first unplug itself, it looks like.
> 
> Nice catch, good debugging work.

Thanks, with help from Aneesh.

> 
> > I will check how this can be fixed.
> 
> Tricky problem. What do you think? You might be able to make the early 
> page table allocations in the same pattern as the frag allocations, and 
> then fill in the struct page metadata when you have those.

Will explore.

> 
> Other option may be create a new set of page tables after mm comes up
> to replace the early page tables with. That's a bigger hammer though.

Will also check if a similar scenario exists on x86 and, if so, how and
when the pte frag data is fixed up there.

Regards,
Bharata.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest
  2019-05-20 14:55                   ` Nicholas Piggin
@ 2019-05-20 15:20                     ` Aneesh Kumar K.V
  -1 siblings, 0 replies; 26+ messages in thread
From: Aneesh Kumar K.V @ 2019-05-20 15:20 UTC (permalink / raw)
  To: Nicholas Piggin, bharata
  Cc: bharata, linux-kernel, linux-next, linuxppc-dev,
	Michael Ellerman, srikanth

On 5/20/19 8:25 PM, Nicholas Piggin wrote:
> Bharata B Rao's on May 21, 2019 12:29 am:
>> On Mon, May 20, 2019 at 01:50:35PM +0530, Bharata B Rao wrote:
>>> On Mon, May 20, 2019 at 05:00:21PM +1000, Nicholas Piggin wrote:
>>>> Bharata B Rao's on May 20, 2019 3:56 pm:
>>>>> On Mon, May 20, 2019 at 02:48:35PM +1000, Nicholas Piggin wrote:
>>>>>>>>> git bisect points to
>>>>>>>>>
>>>>>>>>> commit 4231aba000f5a4583dd9f67057aadb68c3eca99d
>>>>>>>>> Author: Nicholas Piggin <npiggin@gmail.com>
>>>>>>>>> Date:   Fri Jul 27 21:48:17 2018 +1000
>>>>>>>>>
>>>>>>>>>      powerpc/64s: Fix page table fragment refcount race vs speculative references
>>>>>>>>>
>>>>>>>>>      The page table fragment allocator uses the main page refcount racily
>>>>>>>>>      with respect to speculative references. A customer observed a BUG due
>>>>>>>>>      to page table page refcount underflow in the fragment allocator. This
>>>>>>>>>      can be caused by the fragment allocator set_page_count stomping on a
>>>>>>>>>      speculative reference, and then the speculative failure handler
>>>>>>>>>      decrements the new reference, and the underflow eventually pops when
>>>>>>>>>      the page tables are freed.
>>>>>>>>>
>>>>>>>>>      Fix this by using a dedicated field in the struct page for the page
>>>>>>>>>      table fragment allocator.
>>>>>>>>>
>>>>>>>>>      Fixes: 5c1f6ee9a31c ("powerpc: Reduce PTE table memory wastage")
>>>>>>>>>      Cc: stable@vger.kernel.org # v3.10+
>>>>>>>>
>>>>>>>> That's the commit that added the BUG_ON(), so prior to that you won't
>>>>>>>> see the crash.
>>>>>>>
>>>>>>> Right, but the commit says it fixes page table page refcount underflow by
>>>>>>> introducing a new field &page->pt_frag_refcount. Now we are hitting the underflow
>>>>>>> for this pt_frag_refcount.
>>>>>>
>>>>>> The fixed underflow is caused by a bug (race on page count) that got
>>>>>> fixed by that patch. You are hitting a different underflow here. It's
>>>>>> not certain my patch caused it, I'm just trying to reproduce now.
>>>>>
>>>>> Ok.
>>>>
>>>> Can't reproduce I'm afraid, tried adding and removing 8GB memory from a
>>>> 4GB guest (via host adding / removing memory device), and it just works.
>>>
>>> Boot, add 8G, reboot, remove 8G is the sequence to reproduce.
>>>
>>>>
>>>> It's likely to be an edge case like an off by one or rounding error
>>>> that just happens to trigger in your config. Might be easiest if you
>>>> could test with a debug patch.
>>>
>>> Sure, I will continue debugging.
>>
>> When the guest is rebooted after hotplug, the entire memory (which includes
>> the hotplugged memory) gets remapped again freshly. However at this time
>> since no slab is available yet, pt_frag_refcount never gets initialized as we
>> never do pte_fragment_alloc() for these mappings. So we right away hit the
>> underflow during the first unplug itself, it looks like.
> 
> Nice catch, good debugging work.
> 
>> I will check how this can be fixed.
> 
> Tricky problem. What do you think? You might be able to make the early
> page table allocations in the same pattern as the frag allocations, and
> then fill in the struct page metadata when you have those.


I guess we need to do something similar to what x86 does. We need to 
walk the init_mm page table again and re-init struct page and other data 
structures backing the tables?
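
As a toy illustration of the "walk the table again and re-init" idea, the
sketch below walks a one-level directory and bumps the fragment count of
every page that backs a populated entry. It is a user-space model under
assumptions: struct model_dir, the model_* names and the use of -1 for an
empty entry are hypothetical, and it makes no claim about how x86 or the
real init_mm walk is implemented.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PTRS_PER_DIR	4	/* entries in the toy directory */
#define MODEL_NR_PAGES		8	/* size of the toy mem_map */

/* Hypothetical stand-in for struct page. */
struct model_page {
	int pt_frag_refcount;
};

/*
 * Toy directory: each entry holds the index of the page backing a
 * lower-level table fragment, or -1 if the entry is empty.
 */
struct model_dir {
	int table_pfn[MODEL_PTRS_PER_DIR];
};

/*
 * Walk the directory after boot and account one fragment reference for
 * every populated entry. Returns the number of entries accounted.
 */
int model_reinit_table_pages(const struct model_dir *dir,
			     struct model_page *mem_map, size_t nr_pages)
{
	int fixed = 0;

	for (size_t i = 0; i < MODEL_PTRS_PER_DIR; i++) {
		int pfn = dir->table_pfn[i];

		/* Skip empty entries and anything outside mem_map. */
		if (pfn < 0 || (size_t)pfn >= nr_pages)
			continue;
		mem_map[pfn].pt_frag_refcount++;
		fixed++;
	}
	return fixed;
}
```

Unlike logging allocations at early boot, this approach derives the counts
from the tables themselves, so nothing extra needs to be recorded while the
early mappings are being created.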

-aneesh


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2019-05-20 15:22 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-16 14:14 PROBLEM: Power9: kernel oops on memory hotunplug from ppc64le guest srikanth
2019-05-16 14:14 ` srikanth
2019-05-17 11:20 ` Michael Ellerman
2019-05-17 11:20   ` Michael Ellerman
2019-05-18 14:14 ` Bharata B Rao
2019-05-18 14:14   ` Bharata B Rao
2019-05-20  2:02   ` Michael Ellerman
2019-05-20  2:02     ` Michael Ellerman
2019-05-20  4:25     ` Bharata B Rao
2019-05-20  4:25       ` Bharata B Rao
2019-05-20  4:48       ` Nicholas Piggin
2019-05-20  4:48         ` Nicholas Piggin
2019-05-20  5:56         ` Bharata B Rao
2019-05-20  5:56           ` Bharata B Rao
2019-05-20  7:00           ` Nicholas Piggin
2019-05-20  7:00             ` Nicholas Piggin
2019-05-20  8:20             ` Bharata B Rao
2019-05-20  8:20               ` Bharata B Rao
2019-05-20 14:29               ` Bharata B Rao
2019-05-20 14:29                 ` Bharata B Rao
2019-05-20 14:55                 ` Nicholas Piggin
2019-05-20 14:55                   ` Nicholas Piggin
2019-05-20 15:12                   ` Bharata B Rao
2019-05-20 15:12                     ` Bharata B Rao
2019-05-20 15:20                   ` Aneesh Kumar K.V
2019-05-20 15:20                     ` Aneesh Kumar K.V