All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] KVM: PPC: Book3S HV: XIVE: fixes for the KVM devices
@ 2019-05-28 12:17 ` Cédric Le Goater
  0 siblings, 0 replies; 10+ messages in thread
From: Cédric Le Goater @ 2019-05-28 12:17 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Alexey Kardashevskiy, David Gibson, Greg Kurz, kvm, kvm-ppc,
	Cédric Le Goater

Hello,

First patch fixes a host crash in PCI passthrough when using the
XICS-on-XIVE or the XIVE native KVM device. Second patch fixes locking
in code accessing the KVM memslots.

Thanks,

C.

Cédric Le Goater (2):
  KVM: PPC: Book3S HV: XIVE: do not clear IRQ data of passthrough
    interrupts
  KVM: PPC: Book3S HV: XIVE: take the srcu read lock when accessing
    memslots

 arch/powerpc/kvm/book3s_xive.c        | 4 ++--
 arch/powerpc/kvm/book3s_xive_native.c | 8 ++++++++
 2 files changed, 10 insertions(+), 2 deletions(-)

-- 
2.21.0


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 0/2] KVM: PPC: Book3S HV: XIVE: fixes for the KVM devices
@ 2019-05-28 12:17 ` Cédric Le Goater
  0 siblings, 0 replies; 10+ messages in thread
From: Cédric Le Goater @ 2019-05-28 12:17 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Alexey Kardashevskiy, David Gibson, Greg Kurz, kvm, kvm-ppc,
	Cédric Le Goater

Hello,

First patch fixes a host crash in PCI passthrough when using the
XICS-on-XIVE or the XIVE native KVM device. Second patch fixes locking
in code accessing the KVM memslots.

Thanks,

C.

Cédric Le Goater (2):
  KVM: PPC: Book3S HV: XIVE: do not clear IRQ data of passthrough
    interrupts
  KVM: PPC: Book3S HV: XIVE: take the srcu read lock when accessing
    memslots

 arch/powerpc/kvm/book3s_xive.c        | 4 ++--
 arch/powerpc/kvm/book3s_xive_native.c | 8 ++++++++
 2 files changed, 10 insertions(+), 2 deletions(-)

-- 
2.21.0

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/2] KVM: PPC: Book3S HV: XIVE: do not clear IRQ data of passthrough interrupts
  2019-05-28 12:17 ` Cédric Le Goater
@ 2019-05-28 12:17   ` Cédric Le Goater
  -1 siblings, 0 replies; 10+ messages in thread
From: Cédric Le Goater @ 2019-05-28 12:17 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Alexey Kardashevskiy, David Gibson, Greg Kurz, kvm, kvm-ppc,
	Cédric Le Goater

The passthrough interrupts are defined at the host level and their IRQ
data should not be cleared unless specifically deconfigured (shutdown)
by the host. They differ from the IPI interrupts which are allocated
by the XIVE KVM device and reserved to the guest usage only.

This fixes a host crash when destroying a VM in which a PCI adapter
was passed-through. In this case, the interrupt is cleared and freed
by the KVM device and then shutdown by vfio at the host level.

[ 1007.360265] BUG: Kernel NULL pointer dereference at 0x00000d00
[ 1007.360285] Faulting instruction address: 0xc00000000009da34
[ 1007.360296] Oops: Kernel access of bad area, sig: 7 [#1]
[ 1007.360303] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
[ 1007.360314] Modules linked in: vhost_net vhost iptable_mangle ipt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc kvm_hv kvm xt_tcpudp iptable_filter squashfs fuse binfmt_misc vmx_crypto ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi nfsd ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath mlx5_ib ib_uverbs ib_core crc32c_vpmsum mlx5_core
[ 1007.360425] CPU: 9 PID: 15576 Comm: CPU 18/KVM Kdump: loaded Not tainted 5.1.0-gad7e7d0ef #4
[ 1007.360454] NIP:  c00000000009da34 LR: c00000000009e50c CTR: c00000000009e5d0
[ 1007.360482] REGS: c000007f24ccf330 TRAP: 0300   Not tainted  (5.1.0-gad7e7d0ef)
[ 1007.360500] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002484  XER: 00000000
[ 1007.360532] CFAR: c00000000009da10 DAR: 0000000000000d00 DSISR: 00080000 IRQMASK: 1
[ 1007.360532] GPR00: c00000000009e62c c000007f24ccf5c0 c000000001510600 c000007fe7f947c0
[ 1007.360532] GPR04: 0000000000000d00 0000000000000000 0000000000000000 c000005eff02d200
[ 1007.360532] GPR08: 0000000000400000 0000000000000000 0000000000000000 fffffffffffffffd
[ 1007.360532] GPR12: c00000000009e5d0 c000007fffff7b00 0000000000000031 000000012c345718
[ 1007.360532] GPR16: 0000000000000000 0000000000000008 0000000000418004 0000000000040100
[ 1007.360532] GPR20: 0000000000000000 0000000008430000 00000000003c0000 0000000000000027
[ 1007.360532] GPR24: 00000000000000ff 0000000000000000 00000000000000ff c000007faa90d98c
[ 1007.360532] GPR28: c000007faa90da40 00000000000fe040 ffffffffffffffff c000007fe7f947c0
[ 1007.360689] NIP [c00000000009da34] xive_esb_read+0x34/0x120
[ 1007.360706] LR [c00000000009e50c] xive_do_source_set_mask.part.0+0x2c/0x50
[ 1007.360732] Call Trace:
[ 1007.360738] [c000007f24ccf5c0] [c000000000a6383c] snooze_loop+0x15c/0x270 (unreliable)
[ 1007.360775] [c000007f24ccf5f0] [c00000000009e62c] xive_irq_shutdown+0x5c/0xe0
[ 1007.360795] [c000007f24ccf630] [c00000000019e4a0] irq_shutdown+0x60/0xe0
[ 1007.360813] [c000007f24ccf660] [c000000000198c44] __free_irq+0x3a4/0x420
[ 1007.360831] [c000007f24ccf700] [c000000000198dc8] free_irq+0x78/0xe0
[ 1007.360849] [c000007f24ccf730] [c00000000096c5a8] vfio_msi_set_vector_signal+0xa8/0x350
[ 1007.360878] [c000007f24ccf7f0] [c00000000096c938] vfio_msi_set_block+0xe8/0x1e0
[ 1007.360899] [c000007f24ccf850] [c00000000096cae0] vfio_msi_disable+0xb0/0x110
[ 1007.360912] [c000007f24ccf8a0] [c00000000096cd04] vfio_pci_set_msi_trigger+0x1c4/0x3d0
[ 1007.360922] [c000007f24ccf910] [c00000000096d910] vfio_pci_set_irqs_ioctl+0xa0/0x170
[ 1007.360941] [c000007f24ccf930] [c00000000096b400] vfio_pci_disable+0x80/0x5e0
[ 1007.360963] [c000007f24ccfa10] [c00000000096b9bc] vfio_pci_release+0x5c/0x90
[ 1007.360991] [c000007f24ccfa40] [c000000000963a9c] vfio_device_fops_release+0x3c/0x70
[ 1007.361012] [c000007f24ccfa70] [c0000000003b5668] __fput+0xc8/0x2b0
[ 1007.361040] [c000007f24ccfac0] [c0000000001409b0] task_work_run+0x140/0x1b0
[ 1007.361059] [c000007f24ccfb20] [c000000000118f8c] do_exit+0x3ac/0xd00
[ 1007.361076] [c000007f24ccfc00] [c0000000001199b0] do_group_exit+0x60/0x100
[ 1007.361094] [c000007f24ccfc40] [c00000000012b514] get_signal+0x1a4/0x8f0
[ 1007.361112] [c000007f24ccfd30] [c000000000021cc8] do_notify_resume+0x1a8/0x430
[ 1007.361141] [c000007f24ccfe20] [c00000000000e444] ret_from_except_lite+0x70/0x74
[ 1007.361159] Instruction dump:
[ 1007.361175] 38422c00 e9230000 712a0004 41820010 548a2036 7d442378 78840020 71290020
[ 1007.361194] 4082004c e9230010 7c892214 7c0004ac <e9240000> 0c090000 4c00012c 792a0022

Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 arch/powerpc/kvm/book3s_xive.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 12c8a36dd980..922fd62bcd2a 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1828,7 +1828,6 @@ static void kvmppc_xive_cleanup_irq(u32 hw_num, struct xive_irq_data *xd)
 {
 	xive_vm_esb_load(xd, XIVE_ESB_SET_PQ_01);
 	xive_native_configure_irq(hw_num, 0, MASKED, 0);
-	xive_cleanup_irq_data(xd);
 }
 
 void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
@@ -1842,9 +1841,10 @@ void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
 			continue;
 
 		kvmppc_xive_cleanup_irq(state->ipi_number, &state->ipi_data);
+		xive_cleanup_irq_data(&state->ipi_data);
 		xive_native_free_irq(state->ipi_number);
 
-		/* Pass-through, cleanup too */
+		/* Pass-through, cleanup too but keep IRQ hw data */
 		if (state->pt_number)
 			kvmppc_xive_cleanup_irq(state->pt_number, state->pt_data);
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 1/2] KVM: PPC: Book3S HV: XIVE: do not clear IRQ data of passthrough interrupts
@ 2019-05-28 12:17   ` Cédric Le Goater
  0 siblings, 0 replies; 10+ messages in thread
From: Cédric Le Goater @ 2019-05-28 12:17 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Alexey Kardashevskiy, David Gibson, Greg Kurz, kvm, kvm-ppc,
	Cédric Le Goater

The passthrough interrupts are defined at the host level and their IRQ
data should not be cleared unless specifically deconfigured (shutdown)
by the host. They differ from the IPI interrupts which are allocated
by the XIVE KVM device and reserved to the guest usage only.

This fixes a host crash when destroying a VM in which a PCI adapter
was passed-through. In this case, the interrupt is cleared and freed
by the KVM device and then shutdown by vfio at the host level.

[ 1007.360265] BUG: Kernel NULL pointer dereference at 0x00000d00
[ 1007.360285] Faulting instruction address: 0xc00000000009da34
[ 1007.360296] Oops: Kernel access of bad area, sig: 7 [#1]
[ 1007.360303] LE PAGE_SIZEdK MMU=Radix MMU=Hash SMP NR_CPUS 48 NUMA PowerNV
[ 1007.360314] Modules linked in: vhost_net vhost iptable_mangle ipt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc kvm_hv kvm xt_tcpudp iptable_filter squashfs fuse binfmt_misc vmx_crypto ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi nfsd ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath mlx5_ib ib_uverbs ib_core crc32c_vpmsum mlx5_core
[ 1007.360425] CPU: 9 PID: 15576 Comm: CPU 18/KVM Kdump: loaded Not tainted 5.1.0-gad7e7d0ef #4
[ 1007.360454] NIP:  c00000000009da34 LR: c00000000009e50c CTR: c00000000009e5d0
[ 1007.360482] REGS: c000007f24ccf330 TRAP: 0300   Not tainted  (5.1.0-gad7e7d0ef)
[ 1007.360500] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002484  XER: 00000000
[ 1007.360532] CFAR: c00000000009da10 DAR: 0000000000000d00 DSISR: 00080000 IRQMASK: 1
[ 1007.360532] GPR00: c00000000009e62c c000007f24ccf5c0 c000000001510600 c000007fe7f947c0
[ 1007.360532] GPR04: 0000000000000d00 0000000000000000 0000000000000000 c000005eff02d200
[ 1007.360532] GPR08: 0000000000400000 0000000000000000 0000000000000000 fffffffffffffffd
[ 1007.360532] GPR12: c00000000009e5d0 c000007fffff7b00 0000000000000031 000000012c345718
[ 1007.360532] GPR16: 0000000000000000 0000000000000008 0000000000418004 0000000000040100
[ 1007.360532] GPR20: 0000000000000000 0000000008430000 00000000003c0000 0000000000000027
[ 1007.360532] GPR24: 00000000000000ff 0000000000000000 00000000000000ff c000007faa90d98c
[ 1007.360532] GPR28: c000007faa90da40 00000000000fe040 ffffffffffffffff c000007fe7f947c0
[ 1007.360689] NIP [c00000000009da34] xive_esb_read+0x34/0x120
[ 1007.360706] LR [c00000000009e50c] xive_do_source_set_mask.part.0+0x2c/0x50
[ 1007.360732] Call Trace:
[ 1007.360738] [c000007f24ccf5c0] [c000000000a6383c] snooze_loop+0x15c/0x270 (unreliable)
[ 1007.360775] [c000007f24ccf5f0] [c00000000009e62c] xive_irq_shutdown+0x5c/0xe0
[ 1007.360795] [c000007f24ccf630] [c00000000019e4a0] irq_shutdown+0x60/0xe0
[ 1007.360813] [c000007f24ccf660] [c000000000198c44] __free_irq+0x3a4/0x420
[ 1007.360831] [c000007f24ccf700] [c000000000198dc8] free_irq+0x78/0xe0
[ 1007.360849] [c000007f24ccf730] [c00000000096c5a8] vfio_msi_set_vector_signal+0xa8/0x350
[ 1007.360878] [c000007f24ccf7f0] [c00000000096c938] vfio_msi_set_block+0xe8/0x1e0
[ 1007.360899] [c000007f24ccf850] [c00000000096cae0] vfio_msi_disable+0xb0/0x110
[ 1007.360912] [c000007f24ccf8a0] [c00000000096cd04] vfio_pci_set_msi_trigger+0x1c4/0x3d0
[ 1007.360922] [c000007f24ccf910] [c00000000096d910] vfio_pci_set_irqs_ioctl+0xa0/0x170
[ 1007.360941] [c000007f24ccf930] [c00000000096b400] vfio_pci_disable+0x80/0x5e0
[ 1007.360963] [c000007f24ccfa10] [c00000000096b9bc] vfio_pci_release+0x5c/0x90
[ 1007.360991] [c000007f24ccfa40] [c000000000963a9c] vfio_device_fops_release+0x3c/0x70
[ 1007.361012] [c000007f24ccfa70] [c0000000003b5668] __fput+0xc8/0x2b0
[ 1007.361040] [c000007f24ccfac0] [c0000000001409b0] task_work_run+0x140/0x1b0
[ 1007.361059] [c000007f24ccfb20] [c000000000118f8c] do_exit+0x3ac/0xd00
[ 1007.361076] [c000007f24ccfc00] [c0000000001199b0] do_group_exit+0x60/0x100
[ 1007.361094] [c000007f24ccfc40] [c00000000012b514] get_signal+0x1a4/0x8f0
[ 1007.361112] [c000007f24ccfd30] [c000000000021cc8] do_notify_resume+0x1a8/0x430
[ 1007.361141] [c000007f24ccfe20] [c00000000000e444] ret_from_except_lite+0x70/0x74
[ 1007.361159] Instruction dump:
[ 1007.361175] 38422c00 e9230000 712a0004 41820010 548a2036 7d442378 78840020 71290020
[ 1007.361194] 4082004c e9230010 7c892214 7c0004ac <e9240000> 0c090000 4c00012c 792a0022

Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Greg Kurz <groug@kaod.org>
---
 arch/powerpc/kvm/book3s_xive.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
index 12c8a36dd980..922fd62bcd2a 100644
--- a/arch/powerpc/kvm/book3s_xive.c
+++ b/arch/powerpc/kvm/book3s_xive.c
@@ -1828,7 +1828,6 @@ static void kvmppc_xive_cleanup_irq(u32 hw_num, struct xive_irq_data *xd)
 {
 	xive_vm_esb_load(xd, XIVE_ESB_SET_PQ_01);
 	xive_native_configure_irq(hw_num, 0, MASKED, 0);
-	xive_cleanup_irq_data(xd);
 }
 
 void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
@@ -1842,9 +1841,10 @@ void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
 			continue;
 
 		kvmppc_xive_cleanup_irq(state->ipi_number, &state->ipi_data);
+		xive_cleanup_irq_data(&state->ipi_data);
 		xive_native_free_irq(state->ipi_number);
 
-		/* Pass-through, cleanup too */
+		/* Pass-through, cleanup too but keep IRQ hw data */
 		if (state->pt_number)
 			kvmppc_xive_cleanup_irq(state->pt_number, state->pt_data);
 
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] KVM: PPC: Book3S HV: XIVE: take the srcu read lock when accessing memslots
  2019-05-28 12:17 ` Cédric Le Goater
@ 2019-05-28 12:17   ` Cédric Le Goater
  -1 siblings, 0 replies; 10+ messages in thread
From: Cédric Le Goater @ 2019-05-28 12:17 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Alexey Kardashevskiy, David Gibson, Greg Kurz, kvm, kvm-ppc,
	Cédric Le Goater

According to Documentation/virtual/kvm/locking.txt, the srcu read lock
should be taken when accessing the memslots of the VM. The XIVE KVM
device needs to do so when configuring the page of the OS event queue
of vCPU for a given priority and when marking the same page dirty
before migration.

This avoids warnings such as :

[  208.224882] =============================
[  208.224884] WARNING: suspicious RCU usage
[  208.224889] 5.2.0-rc2-xive+ #47 Not tainted
[  208.224890] -----------------------------
[  208.224894] ../include/linux/kvm_host.h:633 suspicious rcu_dereference_check() usage!
[  208.224896]
               other info that might help us debug this:

[  208.224898]
               rcu_scheduler_active = 2, debug_locks = 1
[  208.224901] no locks held by qemu-system-ppc/3923.
[  208.224902]
               stack backtrace:
[  208.224907] CPU: 64 PID: 3923 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc2-xive+ #47
[  208.224909] Call Trace:
[  208.224918] [c000200cdd98fa30] [c000000000be1934] dump_stack+0xe8/0x164 (unreliable)
[  208.224924] [c000200cdd98fa80] [c0000000001aec80] lockdep_rcu_suspicious+0x110/0x180
[  208.224935] [c000200cdd98fb00] [c0080000075933a0] gfn_to_memslot+0x1c8/0x200 [kvm]
[  208.224943] [c000200cdd98fb40] [c008000007599600] gfn_to_pfn+0x28/0x60 [kvm]
[  208.224951] [c000200cdd98fb70] [c008000007599658] gfn_to_page+0x20/0x40 [kvm]
[  208.224959] [c000200cdd98fb90] [c0080000075b495c] kvmppc_xive_native_set_attr+0x8b4/0x1480 [kvm]
[  208.224967] [c000200cdd98fca0] [c00800000759261c] kvm_device_ioctl_attr+0x64/0xb0 [kvm]
[  208.224974] [c000200cdd98fcf0] [c008000007592730] kvm_device_ioctl+0xc8/0x110 [kvm]
[  208.224979] [c000200cdd98fd10] [c000000000433a24] do_vfs_ioctl+0xd4/0xcd0
[  208.224981] [c000200cdd98fdb0] [c000000000434724] ksys_ioctl+0x104/0x120
[  208.224984] [c000200cdd98fe00] [c000000000434768] sys_ioctl+0x28/0x80
[  208.224988] [c000200cdd98fe20] [c00000000000b888] system_call+0x5c/0x70
legoater@boss01:~$

Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
Fixes: e6714bd1671d ("KVM: PPC: Book3S HV: XIVE: Add a control to dirty the XIVE EQ pages")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/kvm/book3s_xive_native.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index fec3b85411ef..8b762e3ebbc5 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -535,6 +535,7 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
 	struct xive_q *q;
 	gfn_t gfn;
 	unsigned long page_size;
+	int srcu_idx;
 
 	/*
 	 * Demangle priority/server tuple from the EQ identifier
@@ -610,20 +611,24 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
 		return -EINVAL;
 	}
 
+	srcu_idx = srcu_read_lock(&kvm->srcu);
 	gfn = gpa_to_gfn(kvm_eq.qaddr);
 	page = gfn_to_page(kvm, gfn);
 	if (is_error_page(page)) {
+		srcu_read_unlock(&kvm->srcu, srcu_idx);
 		pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
 		return -EINVAL;
 	}
 
 	page_size = kvm_host_page_size(kvm, gfn);
 	if (1ull << kvm_eq.qshift > page_size) {
+		srcu_read_unlock(&kvm->srcu, srcu_idx);
 		pr_warn("Incompatible host page size %lx!\n", page_size);
 		return -EINVAL;
 	}
 
 	qaddr = page_to_virt(page) + (kvm_eq.qaddr & ~PAGE_MASK);
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
 
 	/*
 	 * Backup the queue page guest address to the mark EQ page
@@ -854,6 +859,7 @@ static int kvmppc_xive_native_vcpu_eq_sync(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
 	unsigned int prio;
+	int srcu_idx;
 
 	if (!xc)
 		return -ENOENT;
@@ -865,7 +871,9 @@ static int kvmppc_xive_native_vcpu_eq_sync(struct kvm_vcpu *vcpu)
 			continue;
 
 		/* Mark EQ page dirty for migration */
+		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		mark_page_dirty(vcpu->kvm, gpa_to_gfn(q->guest_qaddr));
+		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
 	}
 	return 0;
 }
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/2] KVM: PPC: Book3S HV: XIVE: take the srcu read lock when accessing memslots
@ 2019-05-28 12:17   ` Cédric Le Goater
  0 siblings, 0 replies; 10+ messages in thread
From: Cédric Le Goater @ 2019-05-28 12:17 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Alexey Kardashevskiy, David Gibson, Greg Kurz, kvm, kvm-ppc,
	Cédric Le Goater

According to Documentation/virtual/kvm/locking.txt, the srcu read lock
should be taken when accessing the memslots of the VM. The XIVE KVM
device needs to do so when configuring the page of the OS event queue
of vCPU for a given priority and when marking the same page dirty
before migration.

This avoids warnings such as :

[  208.224882] ==============[  208.224884] WARNING: suspicious RCU usage
[  208.224889] 5.2.0-rc2-xive+ #47 Not tainted
[  208.224890] -----------------------------
[  208.224894] ../include/linux/kvm_host.h:633 suspicious rcu_dereference_check() usage!
[  208.224896]
               other info that might help us debug this:

[  208.224898]
               rcu_scheduler_active = 2, debug_locks = 1
[  208.224901] no locks held by qemu-system-ppc/3923.
[  208.224902]
               stack backtrace:
[  208.224907] CPU: 64 PID: 3923 Comm: qemu-system-ppc Kdump: loaded Not tainted 5.2.0-rc2-xive+ #47
[  208.224909] Call Trace:
[  208.224918] [c000200cdd98fa30] [c000000000be1934] dump_stack+0xe8/0x164 (unreliable)
[  208.224924] [c000200cdd98fa80] [c0000000001aec80] lockdep_rcu_suspicious+0x110/0x180
[  208.224935] [c000200cdd98fb00] [c0080000075933a0] gfn_to_memslot+0x1c8/0x200 [kvm]
[  208.224943] [c000200cdd98fb40] [c008000007599600] gfn_to_pfn+0x28/0x60 [kvm]
[  208.224951] [c000200cdd98fb70] [c008000007599658] gfn_to_page+0x20/0x40 [kvm]
[  208.224959] [c000200cdd98fb90] [c0080000075b495c] kvmppc_xive_native_set_attr+0x8b4/0x1480 [kvm]
[  208.224967] [c000200cdd98fca0] [c00800000759261c] kvm_device_ioctl_attr+0x64/0xb0 [kvm]
[  208.224974] [c000200cdd98fcf0] [c008000007592730] kvm_device_ioctl+0xc8/0x110 [kvm]
[  208.224979] [c000200cdd98fd10] [c000000000433a24] do_vfs_ioctl+0xd4/0xcd0
[  208.224981] [c000200cdd98fdb0] [c000000000434724] ksys_ioctl+0x104/0x120
[  208.224984] [c000200cdd98fe00] [c000000000434768] sys_ioctl+0x28/0x80
[  208.224988] [c000200cdd98fe20] [c00000000000b888] system_call+0x5c/0x70
legoater@boss01:~$

Fixes: 13ce3297c576 ("KVM: PPC: Book3S HV: XIVE: Add controls for the EQ configuration")
Fixes: e6714bd1671d ("KVM: PPC: Book3S HV: XIVE: Add a control to dirty the XIVE EQ pages")
Signed-off-by: Cédric Le Goater <clg@kaod.org>
---
 arch/powerpc/kvm/book3s_xive_native.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
index fec3b85411ef..8b762e3ebbc5 100644
--- a/arch/powerpc/kvm/book3s_xive_native.c
+++ b/arch/powerpc/kvm/book3s_xive_native.c
@@ -535,6 +535,7 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
 	struct xive_q *q;
 	gfn_t gfn;
 	unsigned long page_size;
+	int srcu_idx;
 
 	/*
 	 * Demangle priority/server tuple from the EQ identifier
@@ -610,20 +611,24 @@ static int kvmppc_xive_native_set_queue_config(struct kvmppc_xive *xive,
 		return -EINVAL;
 	}
 
+	srcu_idx = srcu_read_lock(&kvm->srcu);
 	gfn = gpa_to_gfn(kvm_eq.qaddr);
 	page = gfn_to_page(kvm, gfn);
 	if (is_error_page(page)) {
+		srcu_read_unlock(&kvm->srcu, srcu_idx);
 		pr_err("Couldn't get queue page %llx!\n", kvm_eq.qaddr);
 		return -EINVAL;
 	}
 
 	page_size = kvm_host_page_size(kvm, gfn);
 	if (1ull << kvm_eq.qshift > page_size) {
+		srcu_read_unlock(&kvm->srcu, srcu_idx);
 		pr_warn("Incompatible host page size %lx!\n", page_size);
 		return -EINVAL;
 	}
 
 	qaddr = page_to_virt(page) + (kvm_eq.qaddr & ~PAGE_MASK);
+	srcu_read_unlock(&kvm->srcu, srcu_idx);
 
 	/*
 	 * Backup the queue page guest address to the mark EQ page
@@ -854,6 +859,7 @@ static int kvmppc_xive_native_vcpu_eq_sync(struct kvm_vcpu *vcpu)
 {
 	struct kvmppc_xive_vcpu *xc = vcpu->arch.xive_vcpu;
 	unsigned int prio;
+	int srcu_idx;
 
 	if (!xc)
 		return -ENOENT;
@@ -865,7 +871,9 @@ static int kvmppc_xive_native_vcpu_eq_sync(struct kvm_vcpu *vcpu)
 			continue;
 
 		/* Mark EQ page dirty for migration */
+		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
 		mark_page_dirty(vcpu->kvm, gpa_to_gfn(q->guest_qaddr));
+		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
 	}
 	return 0;
 }
-- 
2.21.0

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] KVM: PPC: Book3S HV: XIVE: do not clear IRQ data of passthrough interrupts
  2019-05-28 12:17   ` Cédric Le Goater
@ 2019-05-29 13:17     ` Greg Kurz
  -1 siblings, 0 replies; 10+ messages in thread
From: Greg Kurz @ 2019-05-29 13:17 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Paul Mackerras, Alexey Kardashevskiy, David Gibson, kvm, kvm-ppc

On Tue, 28 May 2019 14:17:15 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> The passthrough interrupts are defined at the host level and their IRQ
> data should not be cleared unless specifically deconfigured (shutdown)
> by the host. They differ from the IPI interrupts which are allocated
> by the XIVE KVM device and reserved to the guest usage only.
> 
> This fixes a host crash when destroying a VM in which a PCI adapter
> was passed-through. In this case, the interrupt is cleared and freed
> by the KVM device and then shutdown by vfio at the host level.
> 
> [ 1007.360265] BUG: Kernel NULL pointer dereference at 0x00000d00
> [ 1007.360285] Faulting instruction address: 0xc00000000009da34
> [ 1007.360296] Oops: Kernel access of bad area, sig: 7 [#1]
> [ 1007.360303] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
> [ 1007.360314] Modules linked in: vhost_net vhost iptable_mangle ipt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc kvm_hv kvm xt_tcpudp iptable_filter squashfs fuse binfmt_misc vmx_crypto ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi nfsd ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath mlx5_ib ib_uverbs ib_core crc32c_vpmsum mlx5_core
> [ 1007.360425] CPU: 9 PID: 15576 Comm: CPU 18/KVM Kdump: loaded Not tainted 5.1.0-gad7e7d0ef #4
> [ 1007.360454] NIP:  c00000000009da34 LR: c00000000009e50c CTR: c00000000009e5d0
> [ 1007.360482] REGS: c000007f24ccf330 TRAP: 0300   Not tainted  (5.1.0-gad7e7d0ef)
> [ 1007.360500] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002484  XER: 00000000
> [ 1007.360532] CFAR: c00000000009da10 DAR: 0000000000000d00 DSISR: 00080000 IRQMASK: 1
> [ 1007.360532] GPR00: c00000000009e62c c000007f24ccf5c0 c000000001510600 c000007fe7f947c0
> [ 1007.360532] GPR04: 0000000000000d00 0000000000000000 0000000000000000 c000005eff02d200
> [ 1007.360532] GPR08: 0000000000400000 0000000000000000 0000000000000000 fffffffffffffffd
> [ 1007.360532] GPR12: c00000000009e5d0 c000007fffff7b00 0000000000000031 000000012c345718
> [ 1007.360532] GPR16: 0000000000000000 0000000000000008 0000000000418004 0000000000040100
> [ 1007.360532] GPR20: 0000000000000000 0000000008430000 00000000003c0000 0000000000000027
> [ 1007.360532] GPR24: 00000000000000ff 0000000000000000 00000000000000ff c000007faa90d98c
> [ 1007.360532] GPR28: c000007faa90da40 00000000000fe040 ffffffffffffffff c000007fe7f947c0
> [ 1007.360689] NIP [c00000000009da34] xive_esb_read+0x34/0x120
> [ 1007.360706] LR [c00000000009e50c] xive_do_source_set_mask.part.0+0x2c/0x50
> [ 1007.360732] Call Trace:
> [ 1007.360738] [c000007f24ccf5c0] [c000000000a6383c] snooze_loop+0x15c/0x270 (unreliable)
> [ 1007.360775] [c000007f24ccf5f0] [c00000000009e62c] xive_irq_shutdown+0x5c/0xe0
> [ 1007.360795] [c000007f24ccf630] [c00000000019e4a0] irq_shutdown+0x60/0xe0
> [ 1007.360813] [c000007f24ccf660] [c000000000198c44] __free_irq+0x3a4/0x420
> [ 1007.360831] [c000007f24ccf700] [c000000000198dc8] free_irq+0x78/0xe0
> [ 1007.360849] [c000007f24ccf730] [c00000000096c5a8] vfio_msi_set_vector_signal+0xa8/0x350
> [ 1007.360878] [c000007f24ccf7f0] [c00000000096c938] vfio_msi_set_block+0xe8/0x1e0
> [ 1007.360899] [c000007f24ccf850] [c00000000096cae0] vfio_msi_disable+0xb0/0x110
> [ 1007.360912] [c000007f24ccf8a0] [c00000000096cd04] vfio_pci_set_msi_trigger+0x1c4/0x3d0
> [ 1007.360922] [c000007f24ccf910] [c00000000096d910] vfio_pci_set_irqs_ioctl+0xa0/0x170
> [ 1007.360941] [c000007f24ccf930] [c00000000096b400] vfio_pci_disable+0x80/0x5e0
> [ 1007.360963] [c000007f24ccfa10] [c00000000096b9bc] vfio_pci_release+0x5c/0x90
> [ 1007.360991] [c000007f24ccfa40] [c000000000963a9c] vfio_device_fops_release+0x3c/0x70
> [ 1007.361012] [c000007f24ccfa70] [c0000000003b5668] __fput+0xc8/0x2b0
> [ 1007.361040] [c000007f24ccfac0] [c0000000001409b0] task_work_run+0x140/0x1b0
> [ 1007.361059] [c000007f24ccfb20] [c000000000118f8c] do_exit+0x3ac/0xd00
> [ 1007.361076] [c000007f24ccfc00] [c0000000001199b0] do_group_exit+0x60/0x100
> [ 1007.361094] [c000007f24ccfc40] [c00000000012b514] get_signal+0x1a4/0x8f0
> [ 1007.361112] [c000007f24ccfd30] [c000000000021cc8] do_notify_resume+0x1a8/0x430
> [ 1007.361141] [c000007f24ccfe20] [c00000000000e444] ret_from_except_lite+0x70/0x74
> [ 1007.361159] Instruction dump:
> [ 1007.361175] 38422c00 e9230000 712a0004 41820010 548a2036 7d442378 78840020 71290020
> [ 1007.361194] 4082004c e9230010 7c892214 7c0004ac <e9240000> 0c090000 4c00012c 792a0022
> 
> Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")

Maybe worth adding:

Cc: stable@vger.kernel.org # v4.12+

> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  arch/powerpc/kvm/book3s_xive.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index 12c8a36dd980..922fd62bcd2a 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -1828,7 +1828,6 @@ static void kvmppc_xive_cleanup_irq(u32 hw_num, struct xive_irq_data *xd)
>  {
>  	xive_vm_esb_load(xd, XIVE_ESB_SET_PQ_01);
>  	xive_native_configure_irq(hw_num, 0, MASKED, 0);
> -	xive_cleanup_irq_data(xd);
>  }
>  
>  void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
> @@ -1842,9 +1841,10 @@ void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
>  			continue;
>  
>  		kvmppc_xive_cleanup_irq(state->ipi_number, &state->ipi_data);
> +		xive_cleanup_irq_data(&state->ipi_data);
>  		xive_native_free_irq(state->ipi_number);
>  
> -		/* Pass-through, cleanup too */
> +		/* Pass-through, cleanup too but keep IRQ hw data */
>  		if (state->pt_number)
>  			kvmppc_xive_cleanup_irq(state->pt_number, state->pt_data);
>  

Also, even if this definitely allows to avoid the crash, I'm still not
convinced we should be calling kvmppc_xive_cleanup_irq() for pass-through
at all.

My concern is that kvmppc_xive_clr_mapped() which gets called when VFIO shuts
the interrupt down seems to be doing extra stuff to release the pass-through
interrupt back to the host. But when the KVM device gets released before VFIO
had a chance to do that, kvmppc_xive_clr_mapped() is a nop since both
kvmppc_xive_release() and kvmppc_xive_native_release() set kvm.arch->xive to
NULL. This triggers the following warning in the xics-on-xive case:

[25185.218975] kvmppc_clr_passthru_irq (irq 94, gsi 4870) fails: -19

I'm wondering if we should do this extra stuff from kvmppc_xive_clr_mapped()
as well when closing the KVM device instead of clearing the interrupt as the
we do now.

This needs some more investigation.

Cheers,

--
Greg

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] KVM: PPC: Book3S HV: XIVE: do not clear IRQ data of passthrough interrupts
@ 2019-05-29 13:17     ` Greg Kurz
  0 siblings, 0 replies; 10+ messages in thread
From: Greg Kurz @ 2019-05-29 13:17 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Paul Mackerras, Alexey Kardashevskiy, David Gibson, kvm, kvm-ppc

On Tue, 28 May 2019 14:17:15 +0200
Cédric Le Goater <clg@kaod.org> wrote:

> The passthrough interrupts are defined at the host level and their IRQ
> data should not be cleared unless specifically deconfigured (shutdown)
> by the host. They differ from the IPI interrupts which are allocated
> by the XIVE KVM device and reserved to the guest usage only.
> 
> This fixes a host crash when destroying a VM in which a PCI adapter
> was passed-through. In this case, the interrupt is cleared and freed
> by the KVM device and then shutdown by vfio at the host level.
> 
> [ 1007.360265] BUG: Kernel NULL pointer dereference at 0x00000d00
> [ 1007.360285] Faulting instruction address: 0xc00000000009da34
> [ 1007.360296] Oops: Kernel access of bad area, sig: 7 [#1]
> [ 1007.360303] LE PAGE_SIZEdK MMU=Radix MMU=Hash SMP NR_CPUS 48 NUMA PowerNV
> [ 1007.360314] Modules linked in: vhost_net vhost iptable_mangle ipt_MASQUERADE iptable_nat nf_nat xt_conntrack nf_conntrack nf_defrag_ipv4 ipt_REJECT nf_reject_ipv4 tun bridge stp llc kvm_hv kvm xt_tcpudp iptable_filter squashfs fuse binfmt_misc vmx_crypto ib_iser rdma_cm iw_cm ib_cm libiscsi scsi_transport_iscsi nfsd ip_tables x_tables autofs4 btrfs zstd_decompress zstd_compress lzo_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq multipath mlx5_ib ib_uverbs ib_core crc32c_vpmsum mlx5_core
> [ 1007.360425] CPU: 9 PID: 15576 Comm: CPU 18/KVM Kdump: loaded Not tainted 5.1.0-gad7e7d0ef #4
> [ 1007.360454] NIP:  c00000000009da34 LR: c00000000009e50c CTR: c00000000009e5d0
> [ 1007.360482] REGS: c000007f24ccf330 TRAP: 0300   Not tainted  (5.1.0-gad7e7d0ef)
> [ 1007.360500] MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 24002484  XER: 00000000
> [ 1007.360532] CFAR: c00000000009da10 DAR: 0000000000000d00 DSISR: 00080000 IRQMASK: 1
> [ 1007.360532] GPR00: c00000000009e62c c000007f24ccf5c0 c000000001510600 c000007fe7f947c0
> [ 1007.360532] GPR04: 0000000000000d00 0000000000000000 0000000000000000 c000005eff02d200
> [ 1007.360532] GPR08: 0000000000400000 0000000000000000 0000000000000000 fffffffffffffffd
> [ 1007.360532] GPR12: c00000000009e5d0 c000007fffff7b00 0000000000000031 000000012c345718
> [ 1007.360532] GPR16: 0000000000000000 0000000000000008 0000000000418004 0000000000040100
> [ 1007.360532] GPR20: 0000000000000000 0000000008430000 00000000003c0000 0000000000000027
> [ 1007.360532] GPR24: 00000000000000ff 0000000000000000 00000000000000ff c000007faa90d98c
> [ 1007.360532] GPR28: c000007faa90da40 00000000000fe040 ffffffffffffffff c000007fe7f947c0
> [ 1007.360689] NIP [c00000000009da34] xive_esb_read+0x34/0x120
> [ 1007.360706] LR [c00000000009e50c] xive_do_source_set_mask.part.0+0x2c/0x50
> [ 1007.360732] Call Trace:
> [ 1007.360738] [c000007f24ccf5c0] [c000000000a6383c] snooze_loop+0x15c/0x270 (unreliable)
> [ 1007.360775] [c000007f24ccf5f0] [c00000000009e62c] xive_irq_shutdown+0x5c/0xe0
> [ 1007.360795] [c000007f24ccf630] [c00000000019e4a0] irq_shutdown+0x60/0xe0
> [ 1007.360813] [c000007f24ccf660] [c000000000198c44] __free_irq+0x3a4/0x420
> [ 1007.360831] [c000007f24ccf700] [c000000000198dc8] free_irq+0x78/0xe0
> [ 1007.360849] [c000007f24ccf730] [c00000000096c5a8] vfio_msi_set_vector_signal+0xa8/0x350
> [ 1007.360878] [c000007f24ccf7f0] [c00000000096c938] vfio_msi_set_block+0xe8/0x1e0
> [ 1007.360899] [c000007f24ccf850] [c00000000096cae0] vfio_msi_disable+0xb0/0x110
> [ 1007.360912] [c000007f24ccf8a0] [c00000000096cd04] vfio_pci_set_msi_trigger+0x1c4/0x3d0
> [ 1007.360922] [c000007f24ccf910] [c00000000096d910] vfio_pci_set_irqs_ioctl+0xa0/0x170
> [ 1007.360941] [c000007f24ccf930] [c00000000096b400] vfio_pci_disable+0x80/0x5e0
> [ 1007.360963] [c000007f24ccfa10] [c00000000096b9bc] vfio_pci_release+0x5c/0x90
> [ 1007.360991] [c000007f24ccfa40] [c000000000963a9c] vfio_device_fops_release+0x3c/0x70
> [ 1007.361012] [c000007f24ccfa70] [c0000000003b5668] __fput+0xc8/0x2b0
> [ 1007.361040] [c000007f24ccfac0] [c0000000001409b0] task_work_run+0x140/0x1b0
> [ 1007.361059] [c000007f24ccfb20] [c000000000118f8c] do_exit+0x3ac/0xd00
> [ 1007.361076] [c000007f24ccfc00] [c0000000001199b0] do_group_exit+0x60/0x100
> [ 1007.361094] [c000007f24ccfc40] [c00000000012b514] get_signal+0x1a4/0x8f0
> [ 1007.361112] [c000007f24ccfd30] [c000000000021cc8] do_notify_resume+0x1a8/0x430
> [ 1007.361141] [c000007f24ccfe20] [c00000000000e444] ret_from_except_lite+0x70/0x74
> [ 1007.361159] Instruction dump:
> [ 1007.361175] 38422c00 e9230000 712a0004 41820010 548a2036 7d442378 78840020 71290020
> [ 1007.361194] 4082004c e9230010 7c892214 7c0004ac <e9240000> 0c090000 4c00012c 792a0022
> 
> Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")

Maybe worth adding:

Cc: stable@vger.kernel.org # v4.12+

> Signed-off-by: Cédric Le Goater <clg@kaod.org>
> Signed-off-by: Greg Kurz <groug@kaod.org>
> ---
>  arch/powerpc/kvm/book3s_xive.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index 12c8a36dd980..922fd62bcd2a 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -1828,7 +1828,6 @@ static void kvmppc_xive_cleanup_irq(u32 hw_num, struct xive_irq_data *xd)
>  {
>  	xive_vm_esb_load(xd, XIVE_ESB_SET_PQ_01);
>  	xive_native_configure_irq(hw_num, 0, MASKED, 0);
> -	xive_cleanup_irq_data(xd);
>  }
>  
>  void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
> @@ -1842,9 +1841,10 @@ void kvmppc_xive_free_sources(struct kvmppc_xive_src_block *sb)
>  			continue;
>  
>  		kvmppc_xive_cleanup_irq(state->ipi_number, &state->ipi_data);
> +		xive_cleanup_irq_data(&state->ipi_data);
>  		xive_native_free_irq(state->ipi_number);
>  
> -		/* Pass-through, cleanup too */
> +		/* Pass-through, cleanup too but keep IRQ hw data */
>  		if (state->pt_number)
>  			kvmppc_xive_cleanup_irq(state->pt_number, state->pt_data);
>  

Also, even if this definitely allows to avoid the crash, I'm still not
convinced we should be calling kvmppc_xive_cleanup_irq() for pass-through
at all.

My concern is that kvmppc_xive_clr_mapped() which gets called when VFIO shuts
the interrupt down seems to be doing extra stuff to release the pass-through
interrupt back to the host. But when the KVM device gets released before VFIO
had a chance to do that, kvmppc_xive_clr_mapped() is a nop since both
kvmppc_xive_release() and kvmppc_xive_native_release() set kvm.arch->xive to
NULL. This triggers the following warning in the xics-on-xive case:

[25185.218975] kvmppc_clr_passthru_irq (irq 94, gsi 4870) fails: -19

I'm wondering if we should do this extra stuff from kvmppc_xive_clr_mapped()
as well when closing the KVM device instead of clearing the interrupt as the
we do now.

This needs some more investigation.

Cheers,

--
Greg

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] KVM: PPC: Book3S HV: XIVE: fixes for the KVM devices
  2019-05-28 12:17 ` Cédric Le Goater
@ 2019-05-31  6:36   ` Paul Mackerras
  -1 siblings, 0 replies; 10+ messages in thread
From: Paul Mackerras @ 2019-05-31  6:36 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Alexey Kardashevskiy, David Gibson, Greg Kurz, kvm, kvm-ppc

On Tue, May 28, 2019 at 02:17:14PM +0200, Cédric Le Goater wrote:
> Hello,
> 
> First patch fixes a host crash in PCI passthrough when using the
> XICS-on-XIVE or the XIVE native KVM device. Second patch fixes locking
> in code accessing the KVM memslots.
> 
> Thanks,
> 
> C.
> 
> Cédric Le Goater (2):
>   KVM: PPC: Book3S HV: XIVE: do not clear IRQ data of passthrough
>     interrupts
>   KVM: PPC: Book3S HV: XIVE: take the srcu read lock when accessing
>     memslots

Thanks, series applied to my kvm-ppc-fixes branch.

Paul.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/2] KVM: PPC: Book3S HV: XIVE: fixes for the KVM devices
@ 2019-05-31  6:36   ` Paul Mackerras
  0 siblings, 0 replies; 10+ messages in thread
From: Paul Mackerras @ 2019-05-31  6:36 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Alexey Kardashevskiy, David Gibson, Greg Kurz, kvm, kvm-ppc

On Tue, May 28, 2019 at 02:17:14PM +0200, Cédric Le Goater wrote:
> Hello,
> 
> First patch fixes a host crash in PCI passthrough when using the
> XICS-on-XIVE or the XIVE native KVM device. Second patch fixes locking
> in code accessing the KVM memslots.
> 
> Thanks,
> 
> C.
> 
> Cédric Le Goater (2):
>   KVM: PPC: Book3S HV: XIVE: do not clear IRQ data of passthrough
>     interrupts
>   KVM: PPC: Book3S HV: XIVE: take the srcu read lock when accessing
>     memslots

Thanks, series applied to my kvm-ppc-fixes branch.

Paul.

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-05-31  6:38 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-05-28 12:17 [PATCH 0/2] KVM: PPC: Book3S HV: XIVE: fixes for the KVM devices Cédric Le Goater
2019-05-28 12:17 ` Cédric Le Goater
2019-05-28 12:17 ` [PATCH 1/2] KVM: PPC: Book3S HV: XIVE: do not clear IRQ data of passthrough interrupts Cédric Le Goater
2019-05-28 12:17   ` Cédric Le Goater
2019-05-29 13:17   ` Greg Kurz
2019-05-29 13:17     ` Greg Kurz
2019-05-28 12:17 ` [PATCH 2/2] KVM: PPC: Book3S HV: XIVE: take the srcu read lock when accessing memslots Cédric Le Goater
2019-05-28 12:17   ` Cédric Le Goater
2019-05-31  6:36 ` [PATCH 0/2] KVM: PPC: Book3S HV: XIVE: fixes for the KVM devices Paul Mackerras
2019-05-31  6:36   ` Paul Mackerras

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.