* Re: [PATCH 1/1] kvm/book3s_64: Fixes crash caused by not cleaning vhost IOTLB
@ 2019-12-18 4:53 ` Alexey Kardashevskiy
0 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-18 4:53 UTC (permalink / raw)
To: Leonardo Bras, Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman
Cc: David Gibson, linuxppc-dev, linux-kernel, kvm-ppc, farosas
On 18/12/2019 08:06, Leonardo Bras wrote:
> Fixes a bug that happens when a virtual machine is created without DDW,
> with vhost backing a virtio-net device.
>
> In this scenario, an IOMMU with a 32-bit DMA window may map the same
> IOVA to different memory addresses over time.
>
> As the code works today, the H_STUFF_TCE hypercall is handled only by
> kvm code, which does not invalidate the IOTLB entry in vhost, meaning
> that at some point a stale entry can cause an access to a memory
> address the IOVA previously pointed to.
>
> Example:
> - virtio-net passes IOVA N to vhost, which points to M1
> - vhost tries IOTLB, but misses
> - vhost translates IOVA N and stores the result to IOTLB
> - vhost writes to M1
> - (some IOMMU usage)
> - virtio-net passes IOVA N to vhost, which now points to M2
> - vhost tries IOTLB, hits the stale entry, and translates IOVA N to M1
> - vhost writes to M1 <error, should write to M2>
>
> The reason why this error was not so evident is probably that the
> IOTLB was small enough to almost always miss by the time an IOVA was
> reused. Raising the IOTLB size to 32k (a module parameter that
> defaults to 2k) is enough to reproduce the bug in over 90% of the runs.
> It usually takes less than 10 seconds of netperf to trigger the bug.
>
> A few minutes after reproducing this bug, the guest usually crashes.
>
> Fixing this bug involves cleaning an IOVA entry from the IOTLB.
> The guest kernel triggers this by doing an H_STUFF_TCE hypercall with
> tce_value == 0.
>
> This change fixes the bug by returning H_TOO_HARD from kvmppc_h_stuff_tce
> when tce_value == 0, which causes kvm to let qemu deal with the hypercall.
> In that path, qemu does free the vhost IOTLB entry, which fixes the bug.
>
> Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
> ---
> arch/powerpc/kvm/book3s_64_vio.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 883a66e76638..841eff3f6392 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -710,6 +710,9 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> if (ret != H_SUCCESS)
> return ret;
>
> + if (tce_value == 0)
H_STUFF_TCE is always called with 0. Well, maybe some AIX somewhere
calls it with a value other than zero, and I probably saw some other
value somewhere, but in the QEMU/KVM case it is 0, so you effectively
disable in-kernel acceleration of H_STUFF_TCE, which is undesirable.
For now we should disable in-kernel H_STUFF_TCE/... handlers in QEMU
just like we do for VFIO for older host kernels:
https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_iommu.c;h=3d3bcc86496a5277d62f7855fbb09c013c015f27;hb=HEAD#l208
I am not sure what a proper solution would be; perhaps something like an
eventfd, with KVM's kvmppc_h_stuff_tce() signalling vhost that the latter
needs to invalidate its IOTLBs. Or we can just say that we do not allow
KVM acceleration if there is vhost+iommu on the same liobn (== vPHB,
pretty much); a rough sketch of that check follows the quoted hunk
below. Thanks,
> + return H_TOO_HARD;
> +
> /* Check permission bits only to allow userspace poison TCE for debug */
> if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
> return H_PARAMETER;
>
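A rough sketch of that liobn check, for illustration only: the
has_vhost_iommu flag, and whoever would set it, are entirely invented
here, while kvmppc_find_table() and the hypercall signature are real.

long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
                        unsigned long liobn, unsigned long ioba,
                        unsigned long tce_value, unsigned long npages)
{
        struct kvmppc_spapr_tce_table *stt;

        stt = kvmppc_find_table(vcpu->kvm, liobn);
        if (!stt)
                return H_TOO_HARD;

        /* Invented flag: QEMU would set it when a vhost device with an
         * IOMMU is attached to this liobn, forcing the hypercall out to
         * userspace where the vhost IOTLB can be invalidated. */
        if (stt->has_vhost_iommu)
                return H_TOO_HARD;
        ...
}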
--
Alexey
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] kvm/book3s_64: Fixes crash caused by not cleaning vhost IOTLB
2019-12-18 4:53 ` Alexey Kardashevskiy
@ 2019-12-18 23:28 ` Leonardo Bras
-1 siblings, 0 replies; 12+ messages in thread
From: Leonardo Bras @ 2019-12-18 23:28 UTC (permalink / raw)
To: Alexey Kardashevskiy, Paul Mackerras, Benjamin Herrenschmidt,
Michael Ellerman
Cc: kvm-ppc, linuxppc-dev, linux-kernel, farosas, David Gibson
On Wed, 2019-12-18 at 15:53 +1100, Alexey Kardashevskiy wrote:
> H_STUFF_TCE is always called with 0. Well, maybe some AIX somewhere
> calls it with a value other than zero, and I probably saw some other
> value somewhere, but in the QEMU/KVM case it is 0, so you effectively
> disable in-kernel acceleration of H_STUFF_TCE, which is undesirable.
>
Thanks for the feedback!
> For now we should disable in-kernel H_STUFF_TCE/... handlers in QEMU
> just like we do for VFIO for older host kernels:
>
> https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_iommu.c;h=3d3bcc86496a5277d62f7855fbb09c013c015f27;hb=HEAD#l208
I am still reading into this temporary solution; I have not yet
understood how it works.
> I am not sure what a proper solution would be; perhaps something like an
> eventfd, with KVM's kvmppc_h_stuff_tce() signalling vhost that the latter
> needs to invalidate its IOTLBs. Or we can just say that we do not allow
> KVM acceleration if there is vhost+iommu on the same liobn (== vPHB,
> pretty much). Thanks,
I am not used to eventfd, but I agree it's a valid solution: talk to
QEMU and then use it to send a message via /dev/vhost.
KVM -> QEMU -> vhost
But I keep coming back to another solution: doing it in kernel space.
I am not sure how that would work, though.
If I understood correctly, there is a vhost IOTLB per vhost_dev, and
H_STUFF_TCE is not called in the 64-bit DMA case (for the tce_value == 0
case, at least), which makes sense, given there is no need to invalidate
IOTLB entries there.
So, we would need to somehow replace `return H_TOO_HARD` in this patch
with code that could call vhost_process_iotlb_msg() with
VHOST_IOTLB_INVALIDATE.
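For reference, this is roughly what that invalidation looks like when it
comes from userspace today; struct vhost_msg and VHOST_IOTLB_INVALIDATE
are real uapi from include/uapi/linux/vhost_types.h, while vhost_fd, ioba
and len below are placeholders:

#include <linux/vhost.h>
#include <unistd.h>

struct vhost_msg msg = {
        .type = VHOST_IOTLB_MSG,
        .iotlb = {
                .iova = ioba,   /* placeholder: start of the cleared range */
                .size = len,    /* placeholder: npages << page_shift */
                .type = VHOST_IOTLB_INVALIDATE,
        },
};
write(vhost_fd, &msg, sizeof(msg));     /* the vhost char-device fd */

vhost_process_iotlb_msg() is what picks this up on the kernel side.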
For that, I would need to know which vhost_dev's belong to that
process, and I don't know if that is currently possible (or safe
at all).
I am thinking of linking all vhost_dev's into a list (list.h) that
could be searched, comparing the `mm_struct *` of the calling task with
each vhost_dev's, and removing the entry from every IOTLB where it hits.
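A minimal sketch of that idea (everything here is invented except
vhost_dev->mm, which does exist; the invalidation helper would have to be
factored out of vhost_process_iotlb_msg()):

#include <linux/list.h>
#include <linux/mutex.h>

/* Invented global registry: vhost_dev_init()/vhost_dev_cleanup() would
 * add/remove devices, and struct vhost_dev would grow a 'node' member. */
static LIST_HEAD(vhost_dev_list);
static DEFINE_MUTEX(vhost_dev_list_lock);

/* Would be called from kvmppc_h_stuff_tce() instead of returning
 * H_TOO_HARD: drop IOTLB entries for [iova, iova + size) in every
 * vhost_dev owned by the same process as the vcpu. */
static void vhost_invalidate_iotlb_mm(struct mm_struct *mm,
                                      u64 iova, u64 size)
{
        struct vhost_dev *dev;

        mutex_lock(&vhost_dev_list_lock);
        list_for_each_entry(dev, &vhost_dev_list, node) {
                if (dev->mm != mm)
                        continue;
                /* Invented helper doing what VHOST_IOTLB_INVALIDATE
                 * does in vhost_process_iotlb_msg() today. */
                vhost_dev_iotlb_invalidate(dev, iova, iova + size - 1);
        }
        mutex_unlock(&vhost_dev_list_lock);
}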
Not sure if that's the best approach to find the related vhost_dev's.
What do you think?
Best regards,
Leonardo Bras
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] kvm/book3s_64: Fixes crash caused by not cleaning vhost IOTLB
2019-12-18 23:28 ` Leonardo Bras
@ 2019-12-20 3:35 ` Alexey Kardashevskiy
-1 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-20 3:35 UTC (permalink / raw)
To: Leonardo Bras, Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman
Cc: kvm-ppc, linuxppc-dev, linux-kernel, farosas, David Gibson,
Alex Williamson
On 19/12/2019 10:28, Leonardo Bras wrote:
> On Wed, 2019-12-18 at 15:53 +1100, Alexey Kardashevskiy wrote:
>> H_STUFF_TCE is always called with 0. Well, maybe some AIX somewhere
>> calls it with a value other than zero, and I probably saw some other
>> value somewhere, but in the QEMU/KVM case it is 0, so you effectively
>> disable in-kernel acceleration of H_STUFF_TCE, which is undesirable.
>>
>
> Thanks for the feedback!
>
>> For now we should disable in-kernel H_STUFF_TCE/... handlers in QEMU
>> just like we do for VFIO for older host kernels:
>>
>> https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_iommu.c;h=3d3bcc86496a5277d62f7855fbb09c013c015f27;hb=HEAD#l208
>
> I am still reading into this temporary solution; I have not yet
> understood how it works.
>
>> I am not sure what a proper solution would be; perhaps something like an
>> eventfd, with KVM's kvmppc_h_stuff_tce() signalling vhost that the latter
>> needs to invalidate its IOTLBs. Or we can just say that we do not allow
>> KVM acceleration if there is vhost+iommu on the same liobn (== vPHB,
>> pretty much). Thanks,
>
> I am not used to eventfd, but I agree it's a valid solution: talk to
> QEMU and then use it to send a message via /dev/vhost.
> KVM -> QEMU -> vhost
>
> But I keep coming back to another solution: doing it in kernel space.
> I am not sure how that would work, though.
>
> If I understood correctly, there is a vhost IOTLB per vhost_dev, and
> H_STUFF_TCE is not called in the 64-bit DMA case (for the tce_value == 0
> case, at least), which makes sense, given there is no need to invalidate
> IOTLB entries there.
>
> So, we would need to somehow replace `return H_TOO_HARD` in this patch
> with code that could call vhost_process_iotlb_msg() with
> VHOST_IOTLB_INVALIDATE.
>
> For that, I would need to know which vhost_dev's belong to that
> process, and I don't know if that is currently possible (or safe
> at all).
>
> I am thinking of linking all vhost_dev's into a list (list.h) that
> could be searched, comparing the `mm_struct *` of the calling task with
> each vhost_dev's, and removing the entry from every IOTLB where it hits.
>
> Not sure if that's the best approach to find the related vhost_dev's.
>
> What do you think?
As discussed in Slack, we need to do the same thing we do with physical
devices when we invalidate hardware IOMMU translation caches via
tbl->it_ops->tce_kill. The problem to solve now is how we tell KVM/PPC
about the vhost IOTLB (is there an fd?), with something similar to the
existing KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE. I guess x86 handles all the
mappings in QEMU and therefore does not have this problem. Thanks,
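To make the analogy concrete, a sketch of what I mean: the first loop is
close to what kvmppc_h_stuff_tce() already does for VFIO-backed tables,
while the vhost_devs list, vhost_iotlb_kill() and the KVM device
attribute that would register the fd are all invented.

/* In kvmppc_h_stuff_tce(), after clearing the TCEs: */
list_for_each_entry_lockless(stit, &stt->iommu_tables, next)
        iommu_tce_kill(stit->tbl, ioba >> stt->page_shift, npages);

/* Invented vhost analogue: devices registered against this liobn via
 * something like KVM_DEV_VHOST_SET_SPAPR_TCE (does not exist),
 * mirroring KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE. */
list_for_each_entry(svd, &stt->vhost_devs, next)
        vhost_iotlb_kill(svd->vdev, ioba, npages << stt->page_shift);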
--
Alexey
^ permalink raw reply [flat|nested] 12+ messages in thread