* Crash in kvmppc_xive_release()
@ 2019-07-18 12:49 Michael Ellerman
  2019-07-18 13:14 ` Cédric Le Goater
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Ellerman @ 2019-07-18 12:49 UTC (permalink / raw)
  To: linuxppc-dev, Cédric Le Goater

Anyone else seen this?

This is running ~176 VMs on a Power9 (1 per thread); the host crashes:

  [   66.403750][ T6423] xive: OPAL failed to allocate VCPUs order 11, err -10
  [188523.080935670,4] Spent 1783 msecs in OPAL call 135!
  [   66.484965][ T6250] BUG: Kernel NULL pointer dereference at 0x000042e8
  [   66.485558][ T6250] Faulting instruction address: 0xc008000011a33fcc
  [   66.485990][ T6250] Oops: Kernel access of bad area, sig: 7 [#1]
  [   66.486405][ T6250] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
  [   66.486967][ T6250] Modules linked in: kvm_hv kvm
  [   66.487275][ T6250] CPU: 107 PID: 6250 Comm: qemu-system-ppc Not tainted 5.2.0-rc2-gcc9x-gf5a9e488d623 #1
  [   66.487902][ T6250] NIP:  c008000011a33fcc LR: c008000011a33fc4 CTR: c0000000005d5970
  [   66.488383][ T6250] REGS: c000001fabebb900 TRAP: 0300   Not tainted  (5.2.0-rc2-gcc9x-gf5a9e488d623)
  [   66.488933][ T6250] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24028224  XER: 00000000
  [   66.489724][ T6250] CFAR: c0000000005d6a4c DAR: 00000000000042e8 DSISR: 00080000 IRQMASK: 0 
  [   66.489724][ T6250] GPR00: c008000011a33fc4 c000001fabebbb90 c008000011a5a200 c000000001399928 
  [   66.489724][ T6250] GPR04: 0000000000000001 c00000000047b8d0 0000000000000000 0000000000000001 
  [   66.489724][ T6250] GPR08: 0000000000000000 0000000000000000 c000001fa8c42f00 c008000011a3af20 
  [   66.489724][ T6250] GPR12: 0000000000008000 c0002023ff65a880 000000013a1b4000 0000000000000002 
  [   66.489724][ T6250] GPR16: 0000000010000000 0000000000000002 0000000000000001 000000012b194cc0 
  [   66.489724][ T6250] GPR20: 00007fffb1645250 0000000000000001 0000000000000031 0000000000000000 
  [   66.489724][ T6250] GPR24: 00007fffb16408d8 c000001ffafb62e0 c000001f78699360 c000001ff35d0620 
  [   66.489724][ T6250] GPR28: c000001ed0ed0000 c000001ecd900000 0000000000000000 c000001ed0ed0000 
  [   66.495211][ T6250] NIP [c008000011a33fcc] kvmppc_xive_release+0x54/0x1b0 [kvm]
  [   66.495642][ T6250] LR [c008000011a33fc4] kvmppc_xive_release+0x4c/0x1b0 [kvm]
  [   66.496101][ T6250] Call Trace:
  [   66.496314][ T6250] [c000001fabebbb90] [c008000011a33fc4] kvmppc_xive_release+0x4c/0x1b0 [kvm] (unreliable)
  [   66.496893][ T6250] [c000001fabebbbf0] [c008000011a18d54] kvm_device_release+0xac/0xf0 [kvm]
  [   66.497399][ T6250] [c000001fabebbc30] [c000000000442f8c] __fput+0xec/0x310
  [   66.497815][ T6250] [c000001fabebbc90] [c000000000145f94] task_work_run+0x114/0x170
  [   66.498296][ T6250] [c000001fabebbce0] [c000000000115274] do_exit+0x454/0xee0
  [   66.498743][ T6250] [c000001fabebbdc0] [c000000000115dd0] do_group_exit+0x60/0xe0
  [   66.499201][ T6250] [c000001fabebbe00] [c000000000115e74] sys_exit_group+0x24/0x40
  [   66.499747][ T6250] [c000001fabebbe20] [c00000000000b83c] system_call+0x5c/0x70
  [   66.500261][ T6250] Instruction dump:
  [   66.500484][ T6250] fbe1fff8 fba1ffe8 fbc1fff0 7c7c1b78 f8010010 f821ffa1 eba30010 e87d0010 
  [   66.501006][ T6250] ebdd0000 48006f61 e8410018 39200000 <eb7e42ea> 913e42e8 48007f3d e8410018 
  [   66.501529][ T6250] ---[ end trace c021a6ca03594ec3 ]---
  [   66.513119][ T6150] xive: OPAL failed to allocate VCPUs order 11, err -10

cheers


* Re: Crash in kvmppc_xive_release()
  2019-07-18 12:49 Crash in kvmppc_xive_release() Michael Ellerman
@ 2019-07-18 13:14 ` Cédric Le Goater
  2019-07-18 21:51   ` Cédric Le Goater
  2019-07-19 11:20   ` Michael Ellerman
  0 siblings, 2 replies; 7+ messages in thread
From: Cédric Le Goater @ 2019-07-18 13:14 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev

On 18/07/2019 14:49, Michael Ellerman wrote:
> Anyone else seen this?
> 
> This is running ~176 VMs on a Power9 (1 per thread); the host crashes:

This is beyond the underlying limits of XIVE.

As we allocate 2K vCPUs per VM, that is 16K EQs for interrupt events. The
overall EQ count is 1M. I'll let you calculate our maximum number of VMs ...
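
Doing that arithmetic (presumably one EQ per priority level, 8 of them,
per VP, which is where the 16K comes from, and taking the 1M EQ figure
at face value):

  2048 VPs/VM * 8 EQs/VP        = 16384 EQs per VM
  (1 << 20) EQs / 16384 EQs/VM  =    64 VMs

so the ceiling is around 64 VMs, well short of the ~176 started here.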

>   [   66.403750][ T6423] xive: OPAL failed to allocate VCPUs order 11, err -10

Hence, the OPAL XIVE driver fails, which is good, but ...

>   [188523.080935670,4] Spent 1783 msecs in OPAL call 135!
>   [   66.484965][ T6250] BUG: Kernel NULL pointer dereference at 0x000042e8
>   [   66.485558][ T6250] Faulting instruction address: 0xc008000011a33fcc
>   [   66.485990][ T6250] Oops: Kernel access of bad area, sig: 7 [#1]
>   [   66.486405][ T6250] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
>   [   66.486967][ T6250] Modules linked in: kvm_hv kvm
>   [   66.487275][ T6250] CPU: 107 PID: 6250 Comm: qemu-system-ppc Not tainted 5.2.0-rc2-gcc9x-gf5a9e488d623 #1
>   [   66.487902][ T6250] NIP:  c008000011a33fcc LR: c008000011a33fc4 CTR: c0000000005d5970
>   [   66.488383][ T6250] REGS: c000001fabebb900 TRAP: 0300   Not tainted  (5.2.0-rc2-gcc9x-gf5a9e488d623)
>   [   66.488933][ T6250] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24028224  XER: 00000000
>   [   66.489724][ T6250] CFAR: c0000000005d6a4c DAR: 00000000000042e8 DSISR: 00080000 IRQMASK: 0 
>   [   66.489724][ T6250] GPR00: c008000011a33fc4 c000001fabebbb90 c008000011a5a200 c000000001399928 
>   [   66.489724][ T6250] GPR04: 0000000000000001 c00000000047b8d0 0000000000000000 0000000000000001 
>   [   66.489724][ T6250] GPR08: 0000000000000000 0000000000000000 c000001fa8c42f00 c008000011a3af20 
>   [   66.489724][ T6250] GPR12: 0000000000008000 c0002023ff65a880 000000013a1b4000 0000000000000002 
>   [   66.489724][ T6250] GPR16: 0000000010000000 0000000000000002 0000000000000001 000000012b194cc0 
>   [   66.489724][ T6250] GPR20: 00007fffb1645250 0000000000000001 0000000000000031 0000000000000000 
>   [   66.489724][ T6250] GPR24: 00007fffb16408d8 c000001ffafb62e0 c000001f78699360 c000001ff35d0620 
>   [   66.489724][ T6250] GPR28: c000001ed0ed0000 c000001ecd900000 0000000000000000 c000001ed0ed0000 
>   [   66.495211][ T6250] NIP [c008000011a33fcc] kvmppc_xive_release+0x54/0x1b0 [kvm]
>   [   66.495642][ T6250] LR [c008000011a33fc4] kvmppc_xive_release+0x4c/0x1b0 [kvm]
>   [   66.496101][ T6250] Call Trace:
>   [   66.496314][ T6250] [c000001fabebbb90] [c008000011a33fc4] kvmppc_xive_release+0x4c/0x1b0 [kvm] (unreliable)
>   [   66.496893][ T6250] [c000001fabebbbf0] [c008000011a18d54] kvm_device_release+0xac/0xf0 [kvm]
>   [   66.497399][ T6250] [c000001fabebbc30] [c000000000442f8c] __fput+0xec/0x310
>   [   66.497815][ T6250] [c000001fabebbc90] [c000000000145f94] task_work_run+0x114/0x170
>   [   66.498296][ T6250] [c000001fabebbce0] [c000000000115274] do_exit+0x454/0xee0
>   [   66.498743][ T6250] [c000001fabebbdc0] [c000000000115dd0] do_group_exit+0x60/0xe0
>   [   66.499201][ T6250] [c000001fabebbe00] [c000000000115e74] sys_exit_group+0x24/0x40
>   [   66.499747][ T6250] [c000001fabebbe20] [c00000000000b83c] system_call+0x5c/0x70
>   [   66.500261][ T6250] Instruction dump:
>   [   66.500484][ T6250] fbe1fff8 fba1ffe8 fbc1fff0 7c7c1b78 f8010010 f821ffa1 eba30010 e87d0010 
>   [   66.501006][ T6250] ebdd0000 48006f61 e8410018 39200000 <eb7e42ea> 913e42e8 48007f3d e8410018 
>   [   66.501529][ T6250] ---[ end trace c021a6ca03594ec3 ]---
>   [   66.513119][ T6150] xive: OPAL failed to allocate VCPUs order 11, err -10


... the rollback code for such an error must be bogus. Clearly, it was never
tested :/

Thanks,

C.



* Re: Crash in kvmppc_xive_release()
  2019-07-18 13:14 ` Cédric Le Goater
@ 2019-07-18 21:51   ` Cédric Le Goater
  2019-07-19  7:08     ` Michael Ellerman
  2019-07-22  2:48     ` Michael Ellerman
  2019-07-19 11:20   ` Michael Ellerman
  1 sibling, 2 replies; 7+ messages in thread
From: Cédric Le Goater @ 2019-07-18 21:51 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev, Greg Kurz, Satheesh Rajendran,
	Paul Mackerras

On 18/07/2019 15:14, Cédric Le Goater wrote:
> On 18/07/2019 14:49, Michael Ellerman wrote:
>> Anyone else seen this?
>>
>> This is running ~176 VMs on a Power9 (1 per thread); the host crashes:
> 
> This is beyond the underlying limits of XIVE.
> 
> As we allocate 2K vCPUs per VM, that is 16K EQs for interrupt events. The
> overall EQ count is 1M. I'll let you calculate our maximum number of VMs ...
> 
>>   [   66.403750][ T6423] xive: OPAL failed to allocate VCPUs order 11, err -10
> 
> Hence, the OPAL XIVE driver fails, which is good, but ...
> 
>>   [188523.080935670,4] Spent 1783 msecs in OPAL call 135!
>>   [   66.484965][ T6250] BUG: Kernel NULL pointer dereference at 0x000042e8
>>   [   66.485558][ T6250] Faulting instruction address: 0xc008000011a33fcc
>>   [   66.485990][ T6250] Oops: Kernel access of bad area, sig: 7 [#1]
>>   [   66.486405][ T6250] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
>>   [   66.486967][ T6250] Modules linked in: kvm_hv kvm
>>   [   66.487275][ T6250] CPU: 107 PID: 6250 Comm: qemu-system-ppc Not tainted 5.2.0-rc2-gcc9x-gf5a9e488d623 #1
>>   [   66.487902][ T6250] NIP:  c008000011a33fcc LR: c008000011a33fc4 CTR: c0000000005d5970
>>   [   66.488383][ T6250] REGS: c000001fabebb900 TRAP: 0300   Not tainted  (5.2.0-rc2-gcc9x-gf5a9e488d623)
>>   [   66.488933][ T6250] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24028224  XER: 00000000
>>   [   66.489724][ T6250] CFAR: c0000000005d6a4c DAR: 00000000000042e8 DSISR: 00080000 IRQMASK: 0 
>>   [   66.489724][ T6250] GPR00: c008000011a33fc4 c000001fabebbb90 c008000011a5a200 c000000001399928 
>>   [   66.489724][ T6250] GPR04: 0000000000000001 c00000000047b8d0 0000000000000000 0000000000000001 
>>   [   66.489724][ T6250] GPR08: 0000000000000000 0000000000000000 c000001fa8c42f00 c008000011a3af20 
>>   [   66.489724][ T6250] GPR12: 0000000000008000 c0002023ff65a880 000000013a1b4000 0000000000000002 
>>   [   66.489724][ T6250] GPR16: 0000000010000000 0000000000000002 0000000000000001 000000012b194cc0 
>>   [   66.489724][ T6250] GPR20: 00007fffb1645250 0000000000000001 0000000000000031 0000000000000000 
>>   [   66.489724][ T6250] GPR24: 00007fffb16408d8 c000001ffafb62e0 c000001f78699360 c000001ff35d0620 
>>   [   66.489724][ T6250] GPR28: c000001ed0ed0000 c000001ecd900000 0000000000000000 c000001ed0ed0000 
>>   [   66.495211][ T6250] NIP [c008000011a33fcc] kvmppc_xive_release+0x54/0x1b0 [kvm]
>>   [   66.495642][ T6250] LR [c008000011a33fc4] kvmppc_xive_release+0x4c/0x1b0 [kvm]
>>   [   66.496101][ T6250] Call Trace:
>>   [   66.496314][ T6250] [c000001fabebbb90] [c008000011a33fc4] kvmppc_xive_release+0x4c/0x1b0 [kvm] (unreliable)
>>   [   66.496893][ T6250] [c000001fabebbbf0] [c008000011a18d54] kvm_device_release+0xac/0xf0 [kvm]
>>   [   66.497399][ T6250] [c000001fabebbc30] [c000000000442f8c] __fput+0xec/0x310
>>   [   66.497815][ T6250] [c000001fabebbc90] [c000000000145f94] task_work_run+0x114/0x170
>>   [   66.498296][ T6250] [c000001fabebbce0] [c000000000115274] do_exit+0x454/0xee0
>>   [   66.498743][ T6250] [c000001fabebbdc0] [c000000000115dd0] do_group_exit+0x60/0xe0
>>   [   66.499201][ T6250] [c000001fabebbe00] [c000000000115e74] sys_exit_group+0x24/0x40
>>   [   66.499747][ T6250] [c000001fabebbe20] [c00000000000b83c] system_call+0x5c/0x70
>>   [   66.500261][ T6250] Instruction dump:
>>   [   66.500484][ T6250] fbe1fff8 fba1ffe8 fbc1fff0 7c7c1b78 f8010010 f821ffa1 eba30010 e87d0010 
>>   [   66.501006][ T6250] ebdd0000 48006f61 e8410018 39200000 <eb7e42ea> 913e42e8 48007f3d e8410018 
>>   [   66.501529][ T6250] ---[ end trace c021a6ca03594ec3 ]---
>>   [   66.513119][ T6150] xive: OPAL failed to allocate VCPUs order 11, err -10
> 
> 
> ... the rollback code for such an error must be bogus. Clearly, it was never
> tested :/

Here is a fix. Could you give it a try on your system?

Thanks,

C.

From b6f728ca19a9540c8bf4f5a56991c4e3dab4cf56 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@kaod.org>
Date: Thu, 18 Jul 2019 22:15:31 +0200
Subject: [PATCH] KVM: PPC: Book3S HV: XIVE: fix rollback when
 kvmppc_xive_create fails
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The XIVE device structure is now allocated in kvmppc_xive_get_device()
and kfree'd in kvmppc_core_destroy_vm(). In case of an OPAL error when
allocating the XIVE VPs, the kfree() call in kvmppc_xive_*create()
will result in a double free and corrupt the host memory.

Fixes: 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/kvm/book3s_xive.c        | 4 +---
 arch/powerpc/kvm/book3s_xive_native.c | 4 ++--
 2 files changed, 3 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 6ca0d7376a9f..e3ba67095895 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1986,10 +1986,8 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
 
 	xive->single_escalation = xive_native_has_single_escalation();
 
-	if (ret) {
-		kfree(xive);
+	if (ret)
 		return ret;
-	}
 
 	return 0;
 }
diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index c7c7e3d4c031..45b0c143280c 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -1090,9 +1090,9 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
 	xive->ops = &kvmppc_xive_native_ops;
 
 	if (ret)
-		kfree(xive);
+		return ret;
 
-	return ret;
+	return 0;
 }
 
 /*
-- 
2.21.0
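
To make the ownership rule explicit, here is a condensed sketch of the fixed
error path (simplified and illustrative -- the dev->private detail and the
function body are paraphrased from the commit message, not copied from the
kernel):

  /*
   * The XIVE structure is allocated earlier, in kvmppc_xive_get_device(),
   * and freed later, in kvmppc_core_destroy_vm().  create() only
   * initialises it and must not free it on failure.
   */
  static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
  {
          struct kvmppc_xive *xive = dev->private;
          int ret = 0;

          /* The OPAL call that failed in the trace above. */
          xive->vp_base = xive_native_alloc_vp_block(KVM_MAX_VCPUS);
          if (xive->vp_base == XIVE_INVALID_VP)
                  ret = -ENXIO;

          if (ret)
                  return ret;     /* no kfree(xive) here: the VM still owns
                                   * the structure and would free it again
                                   * in kvmppc_core_destroy_vm() */
          return 0;
  }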



* Re: Crash in kvmppc_xive_release()
  2019-07-18 21:51   ` Cédric Le Goater
@ 2019-07-19  7:08     ` Michael Ellerman
  2019-07-22  2:48     ` Michael Ellerman
  1 sibling, 0 replies; 7+ messages in thread
From: Michael Ellerman @ 2019-07-19  7:08 UTC (permalink / raw)
  To: Cédric Le Goater, linuxppc-dev, Greg Kurz,
	Satheesh Rajendran, Paul Mackerras

Cédric Le Goater <clg@kaod.org> writes:
> On 18/07/2019 15:14, Cédric Le Goater wrote:
>> On 18/07/2019 14:49, Michael Ellerman wrote:
>>> Anyone else seen this?
>>>
>>> This is running ~176 VMs on a Power9 (1 per thread); the host crashes:
>> 
>> This is beyond the underlying limits of XIVE.
>> 
>> As we allocate 2K vCPUs per VM, that is 16K EQs for interrupt events. The
>> overall EQ count is 1M. I'll let you calculate our maximum number of VMs ...
>> 
>>>   [   66.403750][ T6423] xive: OPAL failed to allocate VCPUs order 11, err -10
>> 
>> Hence, the OPAL XIVE driver fails, which is good, but ...
>> 
>>>   [188523.080935670,4] Spent 1783 msecs in OPAL call 135!
>>>   [   66.484965][ T6250] BUG: Kernel NULL pointer dereference at 0x000042e8
>>>   [   66.485558][ T6250] Faulting instruction address: 0xc008000011a33fcc
>>>   [   66.485990][ T6250] Oops: Kernel access of bad area, sig: 7 [#1]
>>>   [   66.486405][ T6250] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
>>>   [   66.486967][ T6250] Modules linked in: kvm_hv kvm
>>>   [   66.487275][ T6250] CPU: 107 PID: 6250 Comm: qemu-system-ppc Not tainted 5.2.0-rc2-gcc9x-gf5a9e488d623 #1
>>>   [   66.487902][ T6250] NIP:  c008000011a33fcc LR: c008000011a33fc4 CTR: c0000000005d5970
>>>   [   66.488383][ T6250] REGS: c000001fabebb900 TRAP: 0300   Not tainted  (5.2.0-rc2-gcc9x-gf5a9e488d623)
>>>   [   66.488933][ T6250] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24028224  XER: 00000000
>>>   [   66.489724][ T6250] CFAR: c0000000005d6a4c DAR: 00000000000042e8 DSISR: 00080000 IRQMASK: 0 
>>>   [   66.489724][ T6250] GPR00: c008000011a33fc4 c000001fabebbb90 c008000011a5a200 c000000001399928 
>>>   [   66.489724][ T6250] GPR04: 0000000000000001 c00000000047b8d0 0000000000000000 0000000000000001 
>>>   [   66.489724][ T6250] GPR08: 0000000000000000 0000000000000000 c000001fa8c42f00 c008000011a3af20 
>>>   [   66.489724][ T6250] GPR12: 0000000000008000 c0002023ff65a880 000000013a1b4000 0000000000000002 
>>>   [   66.489724][ T6250] GPR16: 0000000010000000 0000000000000002 0000000000000001 000000012b194cc0 
>>>   [   66.489724][ T6250] GPR20: 00007fffb1645250 0000000000000001 0000000000000031 0000000000000000 
>>>   [   66.489724][ T6250] GPR24: 00007fffb16408d8 c000001ffafb62e0 c000001f78699360 c000001ff35d0620 
>>>   [   66.489724][ T6250] GPR28: c000001ed0ed0000 c000001ecd900000 0000000000000000 c000001ed0ed0000 
>>>   [   66.495211][ T6250] NIP [c008000011a33fcc] kvmppc_xive_release+0x54/0x1b0 [kvm]
>>>   [   66.495642][ T6250] LR [c008000011a33fc4] kvmppc_xive_release+0x4c/0x1b0 [kvm]
>>>   [   66.496101][ T6250] Call Trace:
>>>   [   66.496314][ T6250] [c000001fabebbb90] [c008000011a33fc4] kvmppc_xive_release+0x4c/0x1b0 [kvm] (unreliable)
>>>   [   66.496893][ T6250] [c000001fabebbbf0] [c008000011a18d54] kvm_device_release+0xac/0xf0 [kvm]
>>>   [   66.497399][ T6250] [c000001fabebbc30] [c000000000442f8c] __fput+0xec/0x310
>>>   [   66.497815][ T6250] [c000001fabebbc90] [c000000000145f94] task_work_run+0x114/0x170
>>>   [   66.498296][ T6250] [c000001fabebbce0] [c000000000115274] do_exit+0x454/0xee0
>>>   [   66.498743][ T6250] [c000001fabebbdc0] [c000000000115dd0] do_group_exit+0x60/0xe0
>>>   [   66.499201][ T6250] [c000001fabebbe00] [c000000000115e74] sys_exit_group+0x24/0x40
>>>   [   66.499747][ T6250] [c000001fabebbe20] [c00000000000b83c] system_call+0x5c/0x70
>>>   [   66.500261][ T6250] Instruction dump:
>>>   [   66.500484][ T6250] fbe1fff8 fba1ffe8 fbc1fff0 7c7c1b78 f8010010 f821ffa1 eba30010 e87d0010 
>>>   [   66.501006][ T6250] ebdd0000 48006f61 e8410018 39200000 <eb7e42ea> 913e42e8 48007f3d e8410018 
>>>   [   66.501529][ T6250] ---[ end trace c021a6ca03594ec3 ]---
>>>   [   66.513119][ T6150] xive: OPAL failed to allocate VCPUs order 11, err -10
>> 
>> 
>> ... the rollback code for such an error must be bogus. Clearly, it was never
>> tested :/
>
> Here is a fix. Could you give it a try on your system?

Yeah that fixes it, thanks.

Will apply it to my fixes branch.

cheers


> From b6f728ca19a9540c8bf4f5a56991c4e3dab4cf56 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@kaod.org>
> Date: Thu, 18 Jul 2019 22:15:31 +0200
> Subject: [PATCH] KVM: PPC: Book3S HV: XIVE: fix rollback when
>  kvmppc_xive_create fails
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> The XIVE device structure is now allocated in kvmppc_xive_get_device()
> and kfree'd in kvmppc_core_destroy_vm(). In case of an OPAL error when
> allocating the XIVE VPs, the kfree() call in kvmppc_xive_*create()
> will result in a double free and corrupt the host memory.
>
> Fixes: 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method")
> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> ---
>  arch/powerpc/kvm/book3s_xive.c        | 4 +---
>  arch/powerpc/kvm/book3s_xive_native.c | 4 ++--
>  2 files changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index 6ca0d7376a9f..e3ba67095895 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -1986,10 +1986,8 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
>  
>  	xive->single_escalation = xive_native_has_single_escalation();
>  
> -	if (ret) {
> -		kfree(xive);
> +	if (ret)
>  		return ret;
> -	}
>  
>  	return 0;
>  }
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index c7c7e3d4c031..45b0c143280c 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -1090,9 +1090,9 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
>  	xive->ops = &kvmppc_xive_native_ops;
>  
>  	if (ret)
> -		kfree(xive);
> +		return ret;
>  
> -	return ret;
> +	return 0;
>  }
>  
>  /*
> -- 
> 2.21.0


* Re: Crash in kvmppc_xive_release()
  2019-07-18 13:14 ` Cédric Le Goater
  2019-07-18 21:51   ` Cédric Le Goater
@ 2019-07-19 11:20   ` Michael Ellerman
  2019-07-19 12:05     ` Cédric Le Goater
  1 sibling, 1 reply; 7+ messages in thread
From: Michael Ellerman @ 2019-07-19 11:20 UTC (permalink / raw)
  To: Cédric Le Goater, linuxppc-dev

Cédric Le Goater <clg@kaod.org> writes:
> On 18/07/2019 14:49, Michael Ellerman wrote:
>> Anyone else seen this?
>> 
>> This is running ~176 VMs on a Power9 (1 per thread); the host crashes:
>
> This is beyond the underlying limits of XIVE.
>
> As we allocate 2K vCPUs per VM, that is 16K EQs for interrupt events. The
> overall EQ count is 1M. I'll let you calculate our maximum number of VMs ...

We need to fix it somehow; people will expect to be able to run a VM per
thread.

cheers


* Re: Crash in kvmppc_xive_release()
  2019-07-19 11:20   ` Michael Ellerman
@ 2019-07-19 12:05     ` Cédric Le Goater
  0 siblings, 0 replies; 7+ messages in thread
From: Cédric Le Goater @ 2019-07-19 12:05 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev, Greg Kurz

On 19/07/2019 13:20, Michael Ellerman wrote:
> Cédric Le Goater <clg@kaod.org> writes:
>> On 18/07/2019 14:49, Michael Ellerman wrote:
>>> Anyone else seen this?
>>>
>>> This is running ~176 VMs on a Power9 (1 per thread); the host crashes:
>>
>> This is beyond the underlying limits of XIVE.
>>
>> As we allocate 2K vCPUs per VM, that is 16K EQs for interrupt events. The
>> overall EQ count is 1M. I'll let you calculate our maximum number of VMs ...
> 
> We need to fix it somehow; people will expect to be able to run a VM per
> thread.

We are limited by two spaces: the VP space (1 << 19, system-wide) and the
EQ space (1 << 20 per chip; this one we could increase). But one of the
big issues is the way we allocate XIVE VPs in the XIVE devices. As we
have no idea how many vCPUs we should provision for, we take the max:
2048 ...

If we had the maxcpus of the VM (from QEMU), or at least a hint at a
rough figure, let's say a power of 2 in the [32 - 4096] CPU range, we
would fragment our VP space much less and could fit far more VMs per
system.
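
Back-of-the-envelope, ignoring the EQ-space limit: if each VM grabs one
block of 1 << order VPs out of the 1 << 19 VP space, then

  order 11 (2048 vCPUs): (1 << 19) / (1 << 11) =   256 VP blocks
  order  5 (  32 vCPUs): (1 << 19) / (1 << 5)  = 16384 VP blocks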

It could be a kernel global (sysfs or whatever), a new KVM PPC control
on the VM to tune maxcpus, or a KVM device creation parameter. We could
also register multiple KVM device types, each with its own maximum:
tiny (5), small (6), normal (8), big (11, the default/legacy), huge (12),
and create from QEMU the one that fits best.
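
Purely as an illustration of that last idea -- the names and the table
are hypothetical, nothing like this exists today:

  /* Hypothetical size table for per-size XIVE KVM devices. */
  struct xive_dev_size {
          const char *name;
          u32 order;              /* VP block allocation order */
          u32 maxcpus;            /* 1 << order vCPUs */
  };

  static const struct xive_dev_size xive_dev_sizes[] = {
          { "tiny",    5,   32 },
          { "small",   6,   64 },
          { "normal",  8,  256 },
          { "big",    11, 2048 },         /* current default / legacy */
          { "huge",   12, 4096 },
  };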

I have to think this over. 

Nevertheless, I am trying to increase the XIVE spaces by 2 or 4 for
POWER10.

C.


* Re: Crash in kvmppc_xive_release()
  2019-07-18 21:51   ` Cédric Le Goater
  2019-07-19  7:08     ` Michael Ellerman
@ 2019-07-22  2:48     ` Michael Ellerman
  1 sibling, 0 replies; 7+ messages in thread
From: Michael Ellerman @ 2019-07-22  2:48 UTC (permalink / raw)
  To: Cédric Le Goater, linuxppc-dev, Greg Kurz,
	Satheesh Rajendran, Paul Mackerras

On Thu, 2019-07-18 at 21:51:54 UTC, Cédric Le Goater wrote:
> On 18/07/2019 15:14, Cédric Le Goater wrote:
...
> 
> Here is a fix. Could you give it a try on your system?
> 
> Thanks,
> 
> C.
> 
> From b6f728ca19a9540c8bf4f5a56991c4e3dab4cf56 Mon Sep 17 00:00:00 2001
> From: =?UTF-8?q?C=C3=A9dric=20Le=20Goater?= <clg@kaod.org>
> Date: Thu, 18 Jul 2019 22:15:31 +0200
> Subject: [PATCH] KVM: PPC: Book3S HV: XIVE: fix rollback when
>  kvmppc_xive_create fails
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> The XIVE device structure is now allocated in kvmppc_xive_get_device()
> and kfree'd in kvmppc_core_destroy_vm(). In case of an OPAL error when
> allocating the XIVE VPs, the kfree() call in kvmppc_xive_*create()
> will result in a double free and corrupt the host memory.
> 
> Fixes: 5422e95103cf ("KVM: PPC: Book3S HV: XIVE: Replace the 'destroy' method by a 'release' method")
> Signed-off-by: Cédric Le Goater <clg@kaod.org>

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/9798f4ea71eaf8eaad7e688c5b298528089c7bf8

cheers

