* Re: [PATCH 1/1] kvm/book3s_64: Fixes crash caused by not cleaning vhost IOTLB
@ 2019-12-18 4:53 ` Alexey Kardashevskiy
0 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-18 4:53 UTC (permalink / raw)
To: Leonardo Bras, Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman
Cc: David Gibson, linuxppc-dev, linux-kernel, kvm-ppc, farosas
On 18/12/2019 08:06, Leonardo Bras wrote:
> Fixes a bug that happens when a virtual machine is created without DDW,
> with vhost backing a virtio-net device.
>
> In this scenario, an IOMMU with a 32-bit DMA window may map the same
> IOVA to different memory addresses over time.
>
> As the code works today, the H_STUFF_TCE hypercall is handled only by
> kvm code, which does not invalidate the IOTLB entry in vhost, meaning
> that at some point a stale entry can cause an access to a memory
> address the IOVA previously pointed to.
>
> Example:
> - virtio-net passes IOVA N to vhost, which points to M1
> - vhost tries IOTLB, but misses
> - vhost translates IOVA N and stores the result to IOTLB
> - vhost writes to M1
> - (some IOMMU usage)
> - virtio-net passes IOVA N to vhost, which now points to M2
> - vhost tries IOTLB, hits the stale entry, and translates IOVA N to M1
> - vhost writes to M1 <error, should write to M2>
>
> The reason why this error was not so evident is probably that the
> IOTLB was small enough to almost always miss by the time an IOVA was
> reused. Raising the IOTLB size to 32k (a module parameter that
> defaults to 2k) is enough to reproduce the bug in over 90% of the runs.
> It usually takes less than 10 seconds of netperf to trigger the bug.
>
> A few minutes after reproducing this bug, the guest usually crashes.
>
> Fixing this bug involves cleaning an IOVA entry from the IOTLB.
> The guest kernel triggers this by doing an H_STUFF_TCE hypercall with
> tce_value == 0.
>
> This change fixes the bug by returning H_TOO_HARD from kvmppc_h_stuff_tce
> when tce_value == 0, which causes kvm to let qemu deal with the hypercall.
> In that path, qemu does free the vhost IOTLB entry, which fixes the bug.
>
> Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
> ---
> arch/powerpc/kvm/book3s_64_vio.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
> index 883a66e76638..841eff3f6392 100644
> --- a/arch/powerpc/kvm/book3s_64_vio.c
> +++ b/arch/powerpc/kvm/book3s_64_vio.c
> @@ -710,6 +710,9 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
> if (ret != H_SUCCESS)
> return ret;
>
> + if (tce_value == 0)
H_STUFF_TCE is always called with 0. Well, maybe some AIX somewhere
calls it with a value other than zero, and I probably saw some other
value somewhere, but in the QEMU/KVM case it is 0, so you effectively
disable in-kernel acceleration of H_STUFF_TCE, which is undesirable.
For now we should disable in-kernel H_STUFF_TCE/... handlers in QEMU
just like we do for VFIO for older host kernels:
https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_iommu.c;h=3d3bcc86496a5277d62f7855fbb09c013c015f27;hb=HEAD#l208
I am not sure what a proper solution would be; perhaps something like an
eventfd, with KVM's kvmppc_h_stuff_tce() signalling vhost that the latter
needs to invalidate its IOTLBs. Or we can just say that we do not allow
KVM acceleration if there is vhost+iommu on the same liobn (== vPHB,
pretty much); a rough sketch of that check follows the quoted hunk
below. Thanks,
> + return H_TOO_HARD;
> +
> /* Check permission bits only to allow userspace poison TCE for debug */
> if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ))
> return H_PARAMETER;
>
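A rough sketch of that liobn check, for illustration only: the
has_vhost_iommu flag, and whoever would set it, are entirely invented
here, while kvmppc_find_table() and the hypercall signature are real.

long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
                        unsigned long liobn, unsigned long ioba,
                        unsigned long tce_value, unsigned long npages)
{
        struct kvmppc_spapr_tce_table *stt;

        stt = kvmppc_find_table(vcpu->kvm, liobn);
        if (!stt)
                return H_TOO_HARD;

        /* Invented flag: QEMU would set it when a vhost device with an
         * IOMMU is attached to this liobn, forcing the hypercall out to
         * userspace where the vhost IOTLB can be invalidated. */
        if (stt->has_vhost_iommu)
                return H_TOO_HARD;
        ...
}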
--
Alexey
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] kvm/book3s_64: Fixes crash caused by not cleaning vhost IOTLB
2019-12-18 4:53 ` Alexey Kardashevskiy
@ 2019-12-18 23:28 ` Leonardo Bras
-1 siblings, 0 replies; 12+ messages in thread
From: Leonardo Bras @ 2019-12-18 23:28 UTC (permalink / raw)
To: Alexey Kardashevskiy, Paul Mackerras, Benjamin Herrenschmidt,
Michael Ellerman
Cc: kvm-ppc, linuxppc-dev, linux-kernel, farosas, David Gibson
On Wed, 2019-12-18 at 15:53 +1100, Alexey Kardashevskiy wrote:
> H_STUFF_TCE is always called with 0. Well, maybe some AIX somewhere
> calls it with a value other than zero, and I probably saw some other
> value somewhere, but in the QEMU/KVM case it is 0, so you effectively
> disable in-kernel acceleration of H_STUFF_TCE, which is undesirable.
>
Thanks for the feedback!
> For now we should disable in-kernel H_STUFF_TCE/... handlers in QEMU
> just like we do for VFIO for older host kernels:
>
> https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_iommu.c;h=3d3bcc86496a5277d62f7855fbb09c013c015f27;hb=HEAD#l208
I am still reading into this temporary solution; I have not yet
understood how it works.
> I am not sure what a proper solution would be; perhaps something like an
> eventfd, with KVM's kvmppc_h_stuff_tce() signalling vhost that the latter
> needs to invalidate its IOTLBs. Or we can just say that we do not allow
> KVM acceleration if there is vhost+iommu on the same liobn (== vPHB,
> pretty much). Thanks,
I am not used to eventfd, but I agree it's a valid solution: talk to
QEMU and then use it to send a message via /dev/vhost.
KVM -> QEMU -> vhost
But I keep coming back to another solution: doing it in kernel space.
I am not sure how that would work, though.
If I understood correctly, there is a vhost IOTLB per vhost_dev, and
H_STUFF_TCE is not called in the 64-bit DMA case (for the tce_value == 0
case, at least), which makes sense, given there is no need to invalidate
IOTLB entries there.
So, we would need to somehow replace `return H_TOO_HARD` in this patch
with code that could call vhost_process_iotlb_msg() with
VHOST_IOTLB_INVALIDATE.
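For reference, this is roughly what that invalidation looks like when it
comes from userspace today; struct vhost_msg and VHOST_IOTLB_INVALIDATE
are real uapi from include/uapi/linux/vhost_types.h, while vhost_fd, ioba
and len below are placeholders:

#include <linux/vhost.h>
#include <unistd.h>

struct vhost_msg msg = {
        .type = VHOST_IOTLB_MSG,
        .iotlb = {
                .iova = ioba,   /* placeholder: start of the cleared range */
                .size = len,    /* placeholder: npages << page_shift */
                .type = VHOST_IOTLB_INVALIDATE,
        },
};
write(vhost_fd, &msg, sizeof(msg));     /* the vhost char-device fd */

vhost_process_iotlb_msg() is what picks this up on the kernel side.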
For that, I would need to know which vhost_dev's belong to that
process, and I don't know if that is currently possible (or safe
at all).
I am thinking of linking all vhost_dev's into a list (list.h) that
could be searched, comparing the `mm_struct *` of the calling task with
each vhost_dev's, and removing the entry from every IOTLB where it hits.
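A minimal sketch of that idea (everything here is invented except
vhost_dev->mm, which does exist; the invalidation helper would have to be
factored out of vhost_process_iotlb_msg()):

#include <linux/list.h>
#include <linux/mutex.h>

/* Invented global registry: vhost_dev_init()/vhost_dev_cleanup() would
 * add/remove devices, and struct vhost_dev would grow a 'node' member. */
static LIST_HEAD(vhost_dev_list);
static DEFINE_MUTEX(vhost_dev_list_lock);

/* Would be called from kvmppc_h_stuff_tce() instead of returning
 * H_TOO_HARD: drop IOTLB entries for [iova, iova + size) in every
 * vhost_dev owned by the same process as the vcpu. */
static void vhost_invalidate_iotlb_mm(struct mm_struct *mm,
                                      u64 iova, u64 size)
{
        struct vhost_dev *dev;

        mutex_lock(&vhost_dev_list_lock);
        list_for_each_entry(dev, &vhost_dev_list, node) {
                if (dev->mm != mm)
                        continue;
                /* Invented helper doing what VHOST_IOTLB_INVALIDATE
                 * does in vhost_process_iotlb_msg() today. */
                vhost_dev_iotlb_invalidate(dev, iova, iova + size - 1);
        }
        mutex_unlock(&vhost_dev_list_lock);
}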
Not sure if that's the best approach to find the related vhost_dev's.
What do you think?
Best regards,
Leonardo Bras
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH 1/1] kvm/book3s_64: Fixes crash caused by not cleaning vhost IOTLB
2019-12-18 23:28 ` Leonardo Bras
@ 2019-12-20 3:35 ` Alexey Kardashevskiy
-1 siblings, 0 replies; 12+ messages in thread
From: Alexey Kardashevskiy @ 2019-12-20 3:35 UTC (permalink / raw)
To: Leonardo Bras, Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman
Cc: kvm-ppc, linuxppc-dev, linux-kernel, farosas, David Gibson,
Alex Williamson
On 19/12/2019 10:28, Leonardo Bras wrote:
> On Wed, 2019-12-18 at 15:53 +1100, Alexey Kardashevskiy wrote:
>> H_STUFF_TCE is always called with 0. Well, maybe some AIX somewhere
>> calls it with a value other than zero, and I probably saw some other
>> value somewhere, but in the QEMU/KVM case it is 0, so you effectively
>> disable in-kernel acceleration of H_STUFF_TCE, which is undesirable.
>>
>
> Thanks for the feedback!
>
>> For now we should disable in-kernel H_STUFF_TCE/... handlers in QEMU
>> just like we do for VFIO for older host kernels:
>>
>> https://git.qemu.org/?p=qemu.git;a=blob;f=hw/ppc/spapr_iommu.c;h=3d3bcc86496a5277d62f7855fbb09c013c015f27;hb=HEAD#l208
>
> I am still reading into this temporary solution; I have not yet
> understood how it works.
>
>> I am not sure what a proper solution would be; perhaps something like an
>> eventfd, with KVM's kvmppc_h_stuff_tce() signalling vhost that the latter
>> needs to invalidate its IOTLBs. Or we can just say that we do not allow
>> KVM acceleration if there is vhost+iommu on the same liobn (== vPHB,
>> pretty much). Thanks,
>
> I am not used to eventfd, but I agree it's a valid solution: talk to
> QEMU and then use it to send a message via /dev/vhost.
> KVM -> QEMU -> vhost
>
> But I keep coming back to another solution: doing it in kernel space.
> I am not sure how that would work, though.
>
> If I understood correctly, there is a vhost IOTLB per vhost_dev, and
> H_STUFF_TCE is not called in the 64-bit DMA case (for the tce_value == 0
> case, at least), which makes sense, given there is no need to invalidate
> IOTLB entries there.
>
> So, we would need to somehow replace `return H_TOO_HARD` in this patch
> with code that could call vhost_process_iotlb_msg() with
> VHOST_IOTLB_INVALIDATE.
>
> For that, I would need to know which vhost_dev's belong to that
> process, and I don't know if that is currently possible (or safe
> at all).
>
> I am thinking of linking all vhost_dev's into a list (list.h) that
> could be searched, comparing the `mm_struct *` of the calling task with
> each vhost_dev's, and removing the entry from every IOTLB where it hits.
>
> Not sure if that's the best approach to find the related vhost_dev's.
>
> What do you think?
As discussed in Slack, we need to do the same thing we do with physical
devices when we invalidate hardware IOMMU translation caches via
tbl->it_ops->tce_kill. The problem to solve now is how we tell KVM/PPC
about the vhost IOTLB (is there an fd?), with something similar to the
existing KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE. I guess x86 handles all the
mappings in QEMU and therefore does not have this problem. Thanks,
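To make the analogy concrete, a sketch of what I mean: the first loop is
close to what kvmppc_h_stuff_tce() already does for VFIO-backed tables,
while the vhost_devs list, vhost_iotlb_kill() and the KVM device
attribute that would register the fd are all invented.

/* In kvmppc_h_stuff_tce(), after clearing the TCEs: */
list_for_each_entry_lockless(stit, &stt->iommu_tables, next)
        iommu_tce_kill(stit->tbl, ioba >> stt->page_shift, npages);

/* Invented vhost analogue: devices registered against this liobn via
 * something like KVM_DEV_VHOST_SET_SPAPR_TCE (does not exist),
 * mirroring KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE. */
list_for_each_entry(svd, &stt->vhost_devs, next)
        vhost_iotlb_kill(svd->vdev, ioba, npages << stt->page_shift);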
--
Alexey
^ permalink raw reply [flat|nested] 12+ messages in thread