* [RFC 0/4] Virtio uses DMA API for all devices
@ 2018-07-20  3:59 Anshuman Khandual
  0 siblings, 0 replies; 206+ messages in thread
From: Anshuman Khandual @ 2018-07-20  3:59 UTC (permalink / raw)
  To: virtualization, linux-kernel
  Cc: robh, srikar, mst, benh, linuxram, hch, paulus, mpe, joe, khandual,
	linuxppc-dev, elfring, haren, david

This patch series is the follow-up on the discussions we had before about
the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
for virtio devices (https://patchwork.kernel.org/patch/10417371/). There
were suggestions about doing away with two different paths of transactions
with the host/QEMU, the first being the direct GPA and the other being the
DMA API based translations.

The first patch attempts to create a direct GPA mapping based DMA
operations structure called 'virtio_direct_dma_ops' with the exact same
implementation as the direct GPA path which virtio core currently has,
just wrapped in a DMA API format. Virtio core must use
'virtio_direct_dma_ops' instead of the arch default in the absence of the
VIRTIO_F_IOMMU_PLATFORM flag to preserve the existing semantics. The
second patch does exactly that inside the function
virtio_finalize_features(). The third patch removes the default direct
GPA path from virtio core, forcing it to use DMA API callbacks for all
devices. With that change, every device must have a DMA operations
structure associated with it. The fourth patch adds an additional hook
which gives the platform an opportunity to do yet another override if
required. This platform hook can be used on POWER Ultravisor based
protected guests to load up SWIOTLB DMA callbacks which bounce-buffer all
I/O scatter-gather buffers into shared memory for consumption on the host
side (as discussed previously in the above-mentioned thread, the host is
allowed to access only parts of the guest GPA range).

Please go through these patches and review whether this approach broadly
makes sense. I would appreciate suggestions, inputs and comments
regarding the patches or the approach in general. Thank you.

Anshuman Khandual (4):
  virtio: Define virtio_direct_dma_ops structure
  virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
  virtio: Force virtio core to use DMA API callbacks for all virtio devices
  virtio: Add platform specific DMA API translation for virtio devices

 arch/powerpc/include/asm/dma-mapping.h |  6 +++
 arch/powerpc/platforms/pseries/iommu.c |  6 +++
 drivers/virtio/virtio.c                | 72 ++++++++++++++++++++++++++++++++++
 drivers/virtio/virtio_pci_common.h     |  3 ++
 drivers/virtio/virtio_ring.c           | 65 +-----------------------------
 5 files changed, 89 insertions(+), 63 deletions(-)

-- 
2.9.3

^ permalink raw reply	[flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-20  3:59 Anshuman Khandual
@ 2018-07-20 13:16 ` Michael S. Tsirkin
  0 siblings, 0 replies; 206+ messages in thread
From: Michael S. Tsirkin @ 2018-07-20 13:16 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: robh, srikar, benh, linuxram, linux-kernel, virtualization, hch,
	paulus, mpe, joe, linuxppc-dev, elfring, haren, david

On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> This patch series is the follow-up on the discussions we had before about
> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> for virtio devices (https://patchwork.kernel.org/patch/10417371/). There
> were suggestions about doing away with two different paths of transactions
> with the host/QEMU, the first being the direct GPA and the other being the
> DMA API based translations.
>
> The first patch attempts to create a direct GPA mapping based DMA
> operations structure called 'virtio_direct_dma_ops' with the exact same
> implementation as the direct GPA path which virtio core currently has,
> just wrapped in a DMA API format. Virtio core must use
> 'virtio_direct_dma_ops' instead of the arch default in the absence of the
> VIRTIO_F_IOMMU_PLATFORM flag to preserve the existing semantics. The
> second patch does exactly that inside the function
> virtio_finalize_features(). The third patch removes the default direct
> GPA path from virtio core, forcing it to use DMA API callbacks for all
> devices. With that change, every device must have a DMA operations
> structure associated with it. The fourth patch adds an additional hook
> which gives the platform an opportunity to do yet another override if
> required. This platform hook can be used on POWER Ultravisor based
> protected guests to load up SWIOTLB DMA callbacks which bounce-buffer all
> I/O scatter-gather buffers into shared memory for consumption on the host
> side (as discussed previously in the above-mentioned thread, the host is
> allowed to access only parts of the guest GPA range).
>
> Please go through these patches and review whether this approach broadly
> makes sense. I would appreciate suggestions, inputs and comments
> regarding the patches or the approach in general. Thank you.

I like how patches 1-3 look. Could you test performance
with/without to see whether the extra indirection through
use of DMA ops causes a measurable slow-down?

> Anshuman Khandual (4):
>   virtio: Define virtio_direct_dma_ops structure
>   virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively
>   virtio: Force virtio core to use DMA API callbacks for all virtio devices
>   virtio: Add platform specific DMA API translation for virtio devices
>
>  arch/powerpc/include/asm/dma-mapping.h |  6 +++
>  arch/powerpc/platforms/pseries/iommu.c |  6 +++
>  drivers/virtio/virtio.c                | 72 ++++++++++++++++++++++++++++++++++
>  drivers/virtio/virtio_pci_common.h     |  3 ++
>  drivers/virtio/virtio_ring.c           | 65 +-----------------------------
>  5 files changed, 89 insertions(+), 63 deletions(-)
>
> -- 
> 2.9.3

^ permalink raw reply	[flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-20 13:16 ` Michael S. Tsirkin
@ 2018-07-23  6:28   ` Anshuman Khandual
  2018-07-23  9:08     ` Michael S. Tsirkin
  1 sibling, 1 reply; 206+ messages in thread
From: Anshuman Khandual @ 2018-07-23  6:28 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel, virtualization,
	hch, paulus, joe, linuxppc-dev, elfring, haren, david

On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote:
> On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
>> [...]
> I like how patches 1-3 look. Could you test performance
> with/without to see whether the extra indirection through
> use of DMA ops causes a measurable slow-down?

I ran this simple dd command 10 times, where /dev/vda is a virtio block
device of 10GB size.

dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct

With and without the patches, the bandwidth, which has a fairly wide
range, does not look that different.

Without patches
===============

---------- 1 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.95557 s, 4.4 GB/s
---------- 2 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.05176 s, 4.2 GB/s
---------- 3 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.88314 s, 4.6 GB/s
---------- 4 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.84899 s, 4.6 GB/s
---------- 5 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.37184 s, 1.6 GB/s
---------- 6 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.9205 s, 4.5 GB/s
---------- 7 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.85166 s, 1.3 GB/s
---------- 8 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.74049 s, 4.9 GB/s
---------- 9 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.31699 s, 1.4 GB/s
---------- 10 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.47057 s, 3.5 GB/s

With patches
============

---------- 1 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.25993 s, 3.8 GB/s
---------- 2 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.82438 s, 4.7 GB/s
---------- 3 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.93856 s, 4.4 GB/s
---------- 4 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.83405 s, 4.7 GB/s
---------- 5 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 7.50199 s, 1.1 GB/s
---------- 6 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.28742 s, 3.8 GB/s
---------- 7 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.74958 s, 1.5 GB/s
---------- 8 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.99149 s, 4.3 GB/s
---------- 9 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.67647 s, 1.5 GB/s
---------- 10 ---------
1024+0 records in
1024+0 records out
8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.93957 s, 2.9 GB/s

Does this look okay ?

^ permalink raw reply	[flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-23  6:28 ` Anshuman Khandual
@ 2018-07-23  9:08   ` Michael S. Tsirkin
  0 siblings, 0 replies; 206+ messages in thread
From: Michael S. Tsirkin @ 2018-07-23  9:08 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel, virtualization,
	hch, paulus, joe, linuxppc-dev, elfring, haren, david

On Mon, Jul 23, 2018 at 11:58:23AM +0530, Anshuman Khandual wrote:
> On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote:
> > On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> >> [...]
> > I like how patches 1-3 look. Could you test performance
> > with/without to see whether the extra indirection through
> > use of DMA ops causes a measurable slow-down?
>
> I ran this simple dd command 10 times, where /dev/vda is a virtio block
> device of 10GB size.
>
> dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct
>
> With and without the patches, the bandwidth, which has a fairly wide
> range, does not look that different.
>
> [...]
>
> Does this look okay ?

You want to test IOPS with lots of small writes and using
raw ramdisk on host.

-- 
MST

^ permalink raw reply	[flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-07-23 9:08 ` Michael S. Tsirkin (?) @ 2018-07-25 3:26 ` Anshuman Khandual 2018-07-27 11:31 ` Michael S. Tsirkin -1 siblings, 1 reply; 206+ messages in thread From: Anshuman Khandual @ 2018-07-25 3:26 UTC (permalink / raw) To: Michael S. Tsirkin Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel, virtualization, hch, paulus, joe, linuxppc-dev, elfring, haren, david On 07/23/2018 02:38 PM, Michael S. Tsirkin wrote: > On Mon, Jul 23, 2018 at 11:58:23AM +0530, Anshuman Khandual wrote: >> On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote: >>> On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote: >>>> This patch series is the follow up on the discussions we had before about >>>> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation >>>> for virito devices (https://patchwork.kernel.org/patch/10417371/). There >>>> were suggestions about doing away with two different paths of transactions >>>> with the host/QEMU, first being the direct GPA and the other being the DMA >>>> API based translations. >>>> >>>> First patch attempts to create a direct GPA mapping based DMA operations >>>> structure called 'virtio_direct_dma_ops' with exact same implementation >>>> of the direct GPA path which virtio core currently has but just wrapped in >>>> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of >>>> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the >>>> existing semantics. The second patch does exactly that inside the function >>>> virtio_finalize_features(). The third patch removes the default direct GPA >>>> path from virtio core forcing it to use DMA API callbacks for all devices. >>>> Now with that change, every device must have a DMA operations structure >>>> associated with it. The fourth patch adds an additional hook which gives >>>> the platform an opportunity to do yet another override if required. 
This >>>> platform hook can be used on POWER Ultravisor based protected guests to >>>> load up SWIOTLB DMA callbacks to do the required (as discussed previously >>>> in the above mentioned thread how host is allowed to access only parts of >>>> the guest GPA range) bounce buffering into the shared memory for all I/O >>>> scatter gather buffers to be consumed on the host side. >>>> >>>> Please go through these patches and review whether this approach broadly >>>> makes sense. I will appreciate suggestions, inputs, comments regarding >>>> the patches or the approach in general. Thank you. >>> I like how patches 1-3 look. Could you test performance >>> with/without to see whether the extra indirection through >>> use of DMA ops causes a measurable slow-down? >> >> I ran this simple DD command 10 times where /dev/vda is a virtio block >> device of 10GB size. >> >> dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct >> >> With and without patches bandwidth which has a bit wide range does not >> look that different from each other. 
>> >> Without patches >> =============== >> >> ---------- 1 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.95557 s, 4.4 GB/s >> ---------- 2 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.05176 s, 4.2 GB/s >> ---------- 3 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.88314 s, 4.6 GB/s >> ---------- 4 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.84899 s, 4.6 GB/s >> ---------- 5 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.37184 s, 1.6 GB/s >> ---------- 6 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.9205 s, 4.5 GB/s >> ---------- 7 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.85166 s, 1.3 GB/s >> ---------- 8 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.74049 s, 4.9 GB/s >> ---------- 9 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.31699 s, 1.4 GB/s >> ---------- 10 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.47057 s, 3.5 GB/s >> >> >> With patches >> ============ >> >> ---------- 1 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.25993 s, 3.8 GB/s >> ---------- 2 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.82438 s, 4.7 GB/s >> ---------- 3 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.93856 s, 4.4 GB/s >> ---------- 4 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.83405 s, 4.7 GB/s >> ---------- 5 
--------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 7.50199 s, 1.1 GB/s >> ---------- 6 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.28742 s, 3.8 GB/s >> ---------- 7 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.74958 s, 1.5 GB/s >> ---------- 8 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.99149 s, 4.3 GB/s >> ---------- 9 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.67647 s, 1.5 GB/s >> ---------- 10 --------- >> 1024+0 records in >> 1024+0 records out >> 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.93957 s, 2.9 GB/s >> >> Does this look okay ? > > You want to test IOPS with lots of small writes and using > raw ramdisk on host. Hello Michael, I have conducted the following experiments and here are the results. TEST SETUP ========== A virtio block disk is mounted on the guest as follows. <disk type='file' device='disk'> <driver name='qemu' type='raw' ioeventfd='off'/> <source file='/mnt/disk2.img'/> <target dev='vdb' bus='virtio'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/> </disk> In the host back end it's a QEMU raw image on a tmpfs file system. disk: -rw-r--r-- 1 libvirt-qemu kvm 5.0G Jul 24 06:26 disk2.img mount: size=21G on /mnt type tmpfs (rw,relatime,size=22020096k) TEST CONFIG =========== FIO (https://linux.die.net/man/1/fio) is being run with and without the patches. 
Read test config: [Sequential] direct=1 ioengine=libaio runtime=5m time_based filename=/dev/vda bs=4k numjobs=16 rw=read unlink=1 iodepth=256 Write test config: [Sequential] direct=1 ioengine=libaio runtime=5m time_based filename=/dev/vda bs=4k numjobs=16 rw=write unlink=1 iodepth=256 The virtio block device comes up as /dev/vda on the guest with /sys/block/vda/queue/nr_requests=128 TEST RESULTS ============ Without the patches ------------------- Read test: Run status group 0 (all jobs): READ: bw=550MiB/s (577MB/s), 33.2MiB/s-35.6MiB/s (34.9MB/s-37.4MB/s), io=161GiB (173GB), run=300001-300009msec Disk stats (read/write): vda: ios=42249926/0, merge=0/0, ticks=1499920/0, in_queue=35672384, util=100.00% Write test: Run status group 0 (all jobs): WRITE: bw=514MiB/s (539MB/s), 31.5MiB/s-34.6MiB/s (33.0MB/s-36.2MB/s), io=151GiB (162GB), run=300001-300009msec Disk stats (read/write): vda: ios=29/39459261, merge=0/0, ticks=0/1570580, in_queue=35745992, util=100.00% With the patches ---------------- Read test: Run status group 0 (all jobs): READ: bw=572MiB/s (600MB/s), 35.0MiB/s-37.2MiB/s (36.7MB/s-38.0MB/s), io=168GiB (180GB), run=300001-300006msec Disk stats (read/write): vda: ios=43917611/0, merge=0/0, ticks=1934268/0, in_queue=35531688, util=100.00% Write test: Run status group 0 (all jobs): WRITE: bw=546MiB/s (572MB/s), 33.7MiB/s-35.0MiB/s (35.3MB/s-36.7MB/s), io=160GiB (172GB), run=300001-300007msec Disk stats (read/write): vda: ios=14/41893878, merge=0/0, ticks=8/2107816, in_queue=35535716, util=100.00% Results with and without the patches are similar. ^ permalink raw reply [flat|nested] 206+ messages in thread
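[Editorial note] The fio bandwidth and the per-disk "ios" counters above can be cross-checked against each other: at bs=4k, bandwidth divided by block size should match ios divided by the ~300 s runtime. A quick sketch using the figures from the read test without the patches:

```shell
# Cross-check fio's reported bandwidth against its "ios" counter.
# Figures transcribed from the read test without the patches:
# bw=550MiB/s at bs=4k over a ~300 s run, ios=42249926.
iops_from_bw=$(( 550 * 1024 / 4 ))     # MiB/s -> KiB/s, divided by 4 KiB blocks
iops_from_ios=$(( 42249926 / 300 ))    # completed requests over the runtime
echo "$iops_from_bw $iops_from_ios"
```

Both work out to roughly 140k read IOPS, so the bandwidth and ios figures are mutually consistent; the same arithmetic applies to the other three runs.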
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-07-25 3:26 ` Anshuman Khandual @ 2018-07-27 11:31 ` Michael S. Tsirkin 0 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-07-27 11:31 UTC (permalink / raw) To: Anshuman Khandual Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel, virtualization, hch, paulus, joe, linuxppc-dev, elfring, haren, david On Wed, Jul 25, 2018 at 08:56:23AM +0530, Anshuman Khandual wrote: > Results with and without the patches are similar. Thanks! And another thing to try is virtio-net with a fast NIC backend (40G and up). Unfortunately at this point loopback tests stress the host scheduler too much. -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-07-27 11:31 ` Michael S. Tsirkin @ 2018-07-28 8:37 ` Anshuman Khandual -1 siblings, 0 replies; 206+ messages in thread From: Anshuman Khandual @ 2018-07-28 8:37 UTC (permalink / raw) To: Michael S. Tsirkin Cc: robh, srikar, aik, jasowang, linuxram, linux-kernel, virtualization, hch, paulus, joe, linuxppc-dev, elfring, haren, david On 07/27/2018 05:01 PM, Michael S. Tsirkin wrote: > On Wed, Jul 25, 2018 at 08:56:23AM +0530, Anshuman Khandual wrote: >> Results with and without the patches are similar. > > Thanks! And another thing to try is virtio-net with > a fast NIC backend (40G and up). Unfortunately > at this point loopback tests stress the host > scheduler too much. > Sure. Will look around for a 40G NIC system. BTW I have been testing virtio-net with a TAP device as back end. ip tuntap add dev tap0 mode tap user $(whoami) ip link set tap0 master virbr0 ip link set dev virbr0 up ip link set dev tap0 up which is exported into the guest as follows -device virtio-net,netdev=network0,mac=52:55:00:d1:55:01 \ -netdev tap,id=network0,ifname=tap0,script=no,downscript=no \ But I have not run any network benchmarks on it though. ^ permalink raw reply [flat|nested] 206+ messages in thread
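[Editorial note] No network benchmark was run on the tap setup above; for reference, a common way to do so would be iperf3 between host and guest. A hypothetical sketch — it assumes iperf3 (and netperf for the last test) are installed on both ends and that virbr0 has the usual libvirt default address 192.168.122.1:

```shell
# Host side: start an iperf3 server bound to the bridge address
# (192.168.122.1 is the libvirt default; adjust to the actual setup).
iperf3 -s -B 192.168.122.1

# Guest side: 60 s TCP throughput test over virtio-net, 4 parallel streams.
iperf3 -c 192.168.122.1 -t 60 -P 4

# Guest side: small-message request/response latency, which stresses the
# DMA map/unmap path more than bulk throughput does.
netperf -H 192.168.122.1 -t TCP_RR
```

These are command fragments requiring a live host/guest pair, not a runnable script.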
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-07-20 13:16 ` Michael S. Tsirkin 2018-07-23 6:28 ` Anshuman Khandual @ 2018-07-23 6:28 ` Anshuman Khandual 1 sibling, 0 replies; 206+ messages in thread From: Anshuman Khandual @ 2018-07-23 6:28 UTC (permalink / raw) To: Michael S. Tsirkin Cc: robh, srikar, linuxram, linux-kernel, virtualization, hch, paulus, joe, linuxppc-dev, elfring, haren, david On 07/20/2018 06:46 PM, Michael S. Tsirkin wrote: > On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote: >> This patch series is the follow up on the discussions we had before about >> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation >> for virito devices (https://patchwork.kernel.org/patch/10417371/). There >> were suggestions about doing away with two different paths of transactions >> with the host/QEMU, first being the direct GPA and the other being the DMA >> API based translations. >> >> First patch attempts to create a direct GPA mapping based DMA operations >> structure called 'virtio_direct_dma_ops' with exact same implementation >> of the direct GPA path which virtio core currently has but just wrapped in >> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of >> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the >> existing semantics. The second patch does exactly that inside the function >> virtio_finalize_features(). The third patch removes the default direct GPA >> path from virtio core forcing it to use DMA API callbacks for all devices. >> Now with that change, every device must have a DMA operations structure >> associated with it. The fourth patch adds an additional hook which gives >> the platform an opportunity to do yet another override if required. 
>> This platform hook can be used on POWER Ultravisor based protected guests to >> load up SWIOTLB DMA callbacks to do the required (as discussed previously >> in the above mentioned thread how host is allowed to access only parts of >> the guest GPA range) bounce buffering into the shared memory for all I/O >> scatter gather buffers to be consumed on the host side. >> >> Please go through these patches and review whether this approach broadly >> makes sense. I will appreciate suggestions, inputs, comments regarding >> the patches or the approach in general. Thank you. > I like how patches 1-3 look. Could you test performance > with/without to see whether the extra indirection through > use of DMA ops causes a measurable slow-down? I ran this simple DD command 10 times where /dev/vda is a virtio block device of 10GB size. dd if=/dev/zero of=/dev/vda bs=8M count=1024 oflag=direct With and without the patches, the bandwidth varies over a fairly wide range but does not look that different between the two. Without patches =============== ---------- 1 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.95557 s, 4.4 GB/s ---------- 2 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.05176 s, 4.2 GB/s ---------- 3 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.88314 s, 4.6 GB/s ---------- 4 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.84899 s, 4.6 GB/s ---------- 5 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.37184 s, 1.6 GB/s ---------- 6 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.9205 s, 4.5 GB/s ---------- 7 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.85166 s, 1.3 GB/s ---------- 8 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 
1.74049 s, 4.9 GB/s ---------- 9 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 6.31699 s, 1.4 GB/s ---------- 10 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.47057 s, 3.5 GB/s With patches ============ ---------- 1 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.25993 s, 3.8 GB/s ---------- 2 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.82438 s, 4.7 GB/s ---------- 3 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.93856 s, 4.4 GB/s ---------- 4 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.83405 s, 4.7 GB/s ---------- 5 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 7.50199 s, 1.1 GB/s ---------- 6 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.28742 s, 3.8 GB/s ---------- 7 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.74958 s, 1.5 GB/s ---------- 8 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 1.99149 s, 4.3 GB/s ---------- 9 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 5.67647 s, 1.5 GB/s ---------- 10 --------- 1024+0 records in 1024+0 records out 8589934592 bytes (8.6 GB, 8.0 GiB) copied, 2.93957 s, 2.9 GB/s Does this look okay ? ^ permalink raw reply [flat|nested] 206+ messages in thread
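[Editorial note] The "fairly wide range" in those dd runs can be made concrete by summarising the ten bandwidth figures from each set. A quick sketch (numbers transcribed from the dd output above, in GB/s as printed):

```shell
# Summarise the ten dd bandwidth figures (GB/s) from each set:
# sample mean and sample standard deviation via awk.
summary() {
    echo "$@" | tr ' ' '\n' | awk '
        { s += $1; ss += $1 * $1; n++ }
        END { m = s / n;
              printf "mean=%.2f stdev=%.2f\n", m, sqrt((ss - n * m * m) / (n - 1)) }'
}
summary 4.4 4.2 4.6 4.6 1.6 4.5 1.3 4.9 1.4 3.5   # without patches
summary 3.8 4.7 4.4 4.7 1.1 3.8 1.5 4.3 1.5 2.9   # with patches
```

The means (about 3.5 vs 3.3 GB/s) sit well inside one standard deviation (~1.4 GB/s) of each other, which supports the "not that different" reading, though the wide spread suggests dd direct-I/O bandwidth is a noisy metric on this setup.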
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-07-20 3:59 Anshuman Khandual 2018-07-20 13:16 ` Michael S. Tsirkin @ 2018-07-27 9:58 ` Will Deacon 2018-07-27 9:58 ` Will Deacon ` (2 subsequent siblings) 4 siblings, 0 replies; 206+ messages in thread From: Will Deacon @ 2018-07-27 9:58 UTC (permalink / raw) To: Anshuman Khandual Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, benh, mpe, mst, hch, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier Hi Anshuman, On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote: > This patch series is the follow up on the discussions we had before about > the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation > for virito devices (https://patchwork.kernel.org/patch/10417371/). There > were suggestions about doing away with two different paths of transactions > with the host/QEMU, first being the direct GPA and the other being the DMA > API based translations. > > First patch attempts to create a direct GPA mapping based DMA operations > structure called 'virtio_direct_dma_ops' with exact same implementation > of the direct GPA path which virtio core currently has but just wrapped in > a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of > the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the > existing semantics. The second patch does exactly that inside the function > virtio_finalize_features(). The third patch removes the default direct GPA > path from virtio core forcing it to use DMA API callbacks for all devices. > Now with that change, every device must have a DMA operations structure > associated with it. The fourth patch adds an additional hook which gives > the platform an opportunity to do yet another override if required. 
> This platform hook can be used on POWER Ultravisor based protected guests to > load up SWIOTLB DMA callbacks to do the required (as discussed previously > in the above mentioned thread how host is allowed to access only parts of > the guest GPA range) bounce buffering into the shared memory for all I/O > scatter gather buffers to be consumed on the host side. > > Please go through these patches and review whether this approach broadly > makes sense. I will appreciate suggestions, inputs, comments regarding > the patches or the approach in general. Thank you. I just wanted to say that this patch series provides a means for us to force the coherent DMA ops for legacy virtio devices on arm64, which in turn means that we can enable the SMMU with legacy devices in our fastmodel emulation platform (which is slowly being upgraded to virtio 1.0) without hanging during boot. Patch below. So: Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Will Deacon <will.deacon@arm.com> Thanks! Will --->8 From 4ef39e9de2c87c97bf046816ca762832f92e39b5 Mon Sep 17 00:00:00 2001 From: Will Deacon <will.deacon@arm.com> Date: Fri, 27 Jul 2018 10:49:25 +0100 Subject: [PATCH] arm64: dma: Override DMA ops for legacy virtio devices Virtio devices are always cache-coherent, so force use of the coherent DMA ops for legacy virtio devices where the dma-coherent property is known to be omitted by QEMU for the MMIO transport. 
Signed-off-by: Will Deacon <will.deacon@arm.com> --- arch/arm64/include/asm/dma-mapping.h | 6 ++++++ arch/arm64/mm/dma-mapping.c | 19 +++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h index b7847eb8a7bb..30aa8fb62dc3 100644 --- a/arch/arm64/include/asm/dma-mapping.h +++ b/arch/arm64/include/asm/dma-mapping.h @@ -44,6 +44,12 @@ void arch_teardown_dma_ops(struct device *dev); #define arch_teardown_dma_ops arch_teardown_dma_ops #endif +#ifdef CONFIG_VIRTIO +struct virtio_device; +void platform_override_dma_ops(struct virtio_device *vdev); +#define platform_override_dma_ops platform_override_dma_ops +#endif + /* do not use this function in a driver */ static inline bool is_device_dma_coherent(struct device *dev) { diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 61e93f0b5482..f9ca61b1b34d 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -891,3 +891,22 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, } #endif } + +#ifdef CONFIG_VIRTIO +#include <linux/virtio_config.h> + +void platform_override_dma_ops(struct virtio_device *vdev) +{ + struct device *dev = vdev->dev.parent; + const struct dma_map_ops *dma_ops = &arm64_swiotlb_dma_ops; + + if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) + return; + + dev->archdata.dma_coherent = true; + if (iommu_get_domain_for_dev(dev)) + dma_ops = &iommu_dma_ops; + + set_dma_ops(dev, dma_ops); +} +#endif /* CONFIG_VIRTIO */ -- 2.1.4 ^ permalink raw reply related [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices @ 2018-07-27 9:58 ` Will Deacon 0 siblings, 0 replies; 206+ messages in thread From: Will Deacon @ 2018-07-27 9:58 UTC (permalink / raw) To: Anshuman Khandual Cc: robh, srikar, mst, benh, linuxram, linux-kernel, virtualization, hch, paulus, marc.zyngier, mpe, joe, robin.murphy, linuxppc-dev, elfring, haren, david Hi Anshuman, On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote: > This patch series is the follow up on the discussions we had before about > the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation > for virito devices (https://patchwork.kernel.org/patch/10417371/). There > were suggestions about doing away with two different paths of transactions > with the host/QEMU, first being the direct GPA and the other being the DMA > API based translations. > > First patch attempts to create a direct GPA mapping based DMA operations > structure called 'virtio_direct_dma_ops' with exact same implementation > of the direct GPA path which virtio core currently has but just wrapped in > a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of > the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the > existing semantics. The second patch does exactly that inside the function > virtio_finalize_features(). The third patch removes the default direct GPA > path from virtio core forcing it to use DMA API callbacks for all devices. > Now with that change, every device must have a DMA operations structure > associated with it. The fourth patch adds an additional hook which gives > the platform an opportunity to do yet another override if required. 
This > platform hook can be used on POWER Ultravisor based protected guests to > load up SWIOTLB DMA callbacks to do the required (as discussed previously > in the above mentioned thread how host is allowed to access only parts of > the guest GPA range) bounce buffering into the shared memory for all I/O > scatter gather buffers to be consumed on the host side. > > Please go through these patches and review whether this approach broadly > makes sense. I will appreciate suggestions, inputs, comments regarding > the patches or the approach in general. Thank you. I just wanted to say that this patch series provides a means for us to force the coherent DMA ops for legacy virtio devices on arm64, which in turn means that we can enable the SMMU with legacy devices in our fastmodel emulation platform (which is slowly being upgraded to virtio 1.0) without hanging during boot. Patch below. So: Acked-by: Will Deacon <will.deacon@arm.com> Tested-by: Will Deacon <will.deacon@arm.com> Thanks! Will --->8 From 4ef39e9de2c87c97bf046816ca762832f92e39b5 Mon Sep 17 00:00:00 2001 From: Will Deacon <will.deacon@arm.com> Date: Fri, 27 Jul 2018 10:49:25 +0100 Subject: [PATCH] arm64: dma: Override DMA ops for legacy virtio devices Virtio devices are always cache-coherent, so force use of the coherent DMA ops for legacy virtio devices where the dma-coherent is known to be omitted by QEMU for the MMIO transport. 
Signed-off-by: Will Deacon <will.deacon@arm.com> --- arch/arm64/include/asm/dma-mapping.h | 6 ++++++ arch/arm64/mm/dma-mapping.c | 19 +++++++++++++++++++ 2 files changed, 25 insertions(+) diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h index b7847eb8a7bb..30aa8fb62dc3 100644 --- a/arch/arm64/include/asm/dma-mapping.h +++ b/arch/arm64/include/asm/dma-mapping.h @@ -44,6 +44,12 @@ void arch_teardown_dma_ops(struct device *dev); #define arch_teardown_dma_ops arch_teardown_dma_ops #endif +#ifdef CONFIG_VIRTIO +struct virtio_device; +void platform_override_dma_ops(struct virtio_device *vdev); +#define platform_override_dma_ops platform_override_dma_ops +#endif + /* do not use this function in a driver */ static inline bool is_device_dma_coherent(struct device *dev) { diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 61e93f0b5482..f9ca61b1b34d 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -891,3 +891,22 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, } #endif } + +#ifdef CONFIG_VIRTIO +#include <linux/virtio_config.h> + +void platform_override_dma_ops(struct virtio_device *vdev) +{ + struct device *dev = vdev->dev.parent; + const struct dma_map_ops *dma_ops = &arm64_swiotlb_dma_ops; + + if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) + return; + + dev->archdata.dma_coherent = true; + if (iommu_get_domain_for_dev(dev)) + dma_ops = &iommu_dma_ops; + + set_dma_ops(dev, dma_ops); +} +#endif /* CONFIG_VIRTIO */ -- 2.1.4 ^ permalink raw reply related [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
@ 2018-07-27  9:58 ` Will Deacon
  0 siblings, 0 replies; 206+ messages in thread
From: Will Deacon @ 2018-07-27 9:58 UTC (permalink / raw)
  To: Anshuman Khandual
  Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring,
      david, jasowang, benh, mpe, mst, hch, linuxram, haren, paulus, srikar,
      robin.murphy, jean-philippe.brucker, marc.zyngier

Hi Anshuman,

On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
> This patch series is the follow up on the discussions we had before about
> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation
> for virito devices (https://patchwork.kernel.org/patch/10417371/). There
> were suggestions about doing away with two different paths of transactions
> with the host/QEMU, the first being the direct GPA path and the other being
> the DMA API based translations.
>
> The first patch creates a direct GPA mapping based DMA operations structure
> called 'virtio_direct_dma_ops', with exactly the same implementation as the
> direct GPA path which virtio core currently has, just wrapped in the DMA
> API format. Virtio core must use 'virtio_direct_dma_ops' instead of the
> arch default in the absence of the VIRTIO_F_IOMMU_PLATFORM flag to preserve
> the existing semantics. The second patch does exactly that inside the
> function virtio_finalize_features(). The third patch removes the default
> direct GPA path from virtio core, forcing it to use DMA API callbacks for
> all devices. With that change, every device must have a DMA operations
> structure associated with it. The fourth patch adds an additional hook
> which gives the platform an opportunity to do yet another override if
> required. This platform hook can be used on POWER Ultravisor based
> protected guests to load up SWIOTLB DMA callbacks to do the required
> bounce buffering (as discussed previously in the above mentioned thread,
> the host is allowed to access only parts of the guest GPA range) into the
> shared memory for all I/O scatter gather buffers to be consumed on the
> host side.
>
> Please go through these patches and review whether this approach broadly
> makes sense. I would appreciate suggestions, inputs and comments regarding
> the patches or the approach in general. Thank you.

I just wanted to say that this patch series provides a means for us to
force the coherent DMA ops for legacy virtio devices on arm64, which in
turn means that we can enable the SMMU with legacy devices in our
fastmodel emulation platform (which is slowly being upgraded to virtio
1.0) without hanging during boot. Patch below.

So:

Acked-by: Will Deacon <will.deacon@arm.com>
Tested-by: Will Deacon <will.deacon@arm.com>

Thanks!

Will

--->8

From 4ef39e9de2c87c97bf046816ca762832f92e39b5 Mon Sep 17 00:00:00 2001
From: Will Deacon <will.deacon@arm.com>
Date: Fri, 27 Jul 2018 10:49:25 +0100
Subject: [PATCH] arm64: dma: Override DMA ops for legacy virtio devices

Virtio devices are always cache-coherent, so force use of the coherent
DMA ops for legacy virtio devices, where the dma-coherent property is
known to be omitted by QEMU for the MMIO transport.

Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/dma-mapping.h |  6 ++++++
 arch/arm64/mm/dma-mapping.c          | 19 +++++++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index b7847eb8a7bb..30aa8fb62dc3 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -44,6 +44,12 @@ void arch_teardown_dma_ops(struct device *dev);
 #define arch_teardown_dma_ops	arch_teardown_dma_ops
 #endif
 
+#ifdef CONFIG_VIRTIO
+struct virtio_device;
+void platform_override_dma_ops(struct virtio_device *vdev);
+#define platform_override_dma_ops platform_override_dma_ops
+#endif
+
 /* do not use this function in a driver */
 static inline bool is_device_dma_coherent(struct device *dev)
 {
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 61e93f0b5482..f9ca61b1b34d 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -891,3 +891,22 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
 	}
 #endif
 }
+
+#ifdef CONFIG_VIRTIO
+#include <linux/virtio_config.h>
+
+void platform_override_dma_ops(struct virtio_device *vdev)
+{
+	struct device *dev = vdev->dev.parent;
+	const struct dma_map_ops *dma_ops = &arm64_swiotlb_dma_ops;
+
+	if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
+		return;
+
+	dev->archdata.dma_coherent = true;
+	if (iommu_get_domain_for_dev(dev))
+		dma_ops = &iommu_dma_ops;
+
+	set_dma_ops(dev, dma_ops);
+}
+#endif /* CONFIG_VIRTIO */
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-27  9:58 ` Will Deacon
@ 2018-07-27 10:58 ` Anshuman Khandual
From: Anshuman Khandual @ 2018-07-27 10:58 UTC (permalink / raw)
  To: Will Deacon
  Cc: robh, srikar, mst, linuxppc-dev, linuxram, linux-kernel,
      virtualization, hch, paulus, marc.zyngier, joe, robin.murphy,
      elfring, haren, david

On 07/27/2018 03:28 PM, Will Deacon wrote:
> Hi Anshuman,
>
> On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote:
>> [...]
>
> I just wanted to say that this patch series provides a means for us to
> force the coherent DMA ops for legacy virtio devices on arm64, which in
> turn means that we can enable the SMMU with legacy devices in our
> fastmodel emulation platform (which is slowly being upgraded to virtio
> 1.0) without hanging during boot. Patch below.
>
> So:
>
> Acked-by: Will Deacon <will.deacon@arm.com>
> Tested-by: Will Deacon <will.deacon@arm.com>

Thanks Will.
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-27  9:58 ` Will Deacon
@ 2018-07-30  9:34 ` Christoph Hellwig
From: Christoph Hellwig @ 2018-07-30 9:34 UTC (permalink / raw)
  To: Will Deacon
  Cc: Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik,
      robh, joe, elfring, david, jasowang, benh, mpe, mst, hch, linuxram,
      haren, paulus, srikar, robin.murphy, jean-philippe.brucker,
      marc.zyngier

On Fri, Jul 27, 2018 at 10:58:05AM +0100, Will Deacon wrote:
> I just wanted to say that this patch series provides a means for us to
> force the coherent DMA ops for legacy virtio devices on arm64, which in
> turn means that we can enable the SMMU with legacy devices in our
> fastmodel emulation platform (which is slowly being upgraded to virtio
> 1.0) without hanging during boot. Patch below.

Yikes, this is a nightmare. That is exactly where I do not want things
to end up. We really need to distinguish between legacy virtual crappy
virtio (and that includes v1) that totally ignores the bus it pretends
to be on, and sane virtio (to be defined) that sits on a real (or
properly emulated, including iommu and details for dma mapping) bus.

Having a mumble jumble of arch specific undocumented magic as in the
powerpc patch replied to, or this arm patch, is a complete no-go.

Nacked-by: Christoph Hellwig <hch@lst.de>

for both.
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-30  9:34 ` Christoph Hellwig
@ 2018-07-30 10:28 ` Michael S. Tsirkin
From: Michael S. Tsirkin @ 2018-07-30 10:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Will Deacon, Anshuman Khandual, virtualization, linux-kernel,
      linuxppc-dev, aik, robh, joe, elfring, david, jasowang, benh, mpe,
      linuxram, haren, paulus, srikar, robin.murphy,
      jean-philippe.brucker, marc.zyngier

On Mon, Jul 30, 2018 at 02:34:14AM -0700, Christoph Hellwig wrote:
> We really need to distinguish between legacy virtual crappy
> virtio (and that includes v1) that totally ignores the bus it pretends
> to be on, and sane virtio (to be defined) that sits on a real (or
> properly emulated, including iommu and details for dma mapping) bus.

Let me reply to the "crappy" part first: virtio devices can run on
another CPU or on a PCI bus. Configuration can happen over multiple
transports. There is a discovery protocol to figure out where it is. It
has some warts, but any real system has warts.

So IMHO virtio running on another CPU isn't "legacy virtual crappy
virtio", and virtio devices that actually sit on a PCI bus aren't "sane"
simply because the DMA is more convoluted on some architectures.

The performance impact of the optimizations possible when you know your
"device" is in fact just another CPU has been measured, and it is real,
so we aren't interested in adding all that overhead back just so we can
use the DMA API. The "correct then fast" mantra doesn't apply to
something that is as widely deployed as virtio. And I can accept an
argument that maybe the DMA API isn't designed to support such virtual
DMA. Whether it should, I don't know.

With this out of my system: I agree these approaches are hacky. I think
it is generally better to have virtio feature negotiation tell you
whether the device runs on a CPU or not, rather than rely on platform
specific ways for this. To this end there was a recent proposal to
rename VIRTIO_F_IO_BARRIER to VIRTIO_F_REAL_DEVICE. It got stuck since
"real" sounds vague to people, e.g. what if it's a VF - is that real or
not? But I can see something like e.g. VIRTIO_F_PLATFORM_DMA gaining
support.

We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk
and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing.

-- 
MST
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-30 10:28 ` Michael S. Tsirkin
@ 2018-07-30 11:18 ` Christoph Hellwig
From: Christoph Hellwig @ 2018-07-30 11:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization,
      linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david,
      jasowang, benh, mpe, linuxram, haren, paulus, srikar, robin.murphy,
      jean-philippe.brucker, marc.zyngier

On Mon, Jul 30, 2018 at 01:28:03PM +0300, Michael S. Tsirkin wrote:
> Let me reply to the "crappy" part first: virtio devices can run on
> another CPU or on a PCI bus. Configuration can happen over multiple
> transports. There is a discovery protocol to figure out where it is. It
> has some warts, but any real system has warts.
>
> So IMHO virtio running on another CPU isn't "legacy virtual crappy
> virtio", and virtio devices that actually sit on a PCI bus aren't "sane"
> simply because the DMA is more convoluted on some architectures.

All of what you said would be true if virtio didn't claim to be a PCI
device. Once it claims to be a PCI device, and we also see real hardware
written to the interface, I stand by all I said above.

> With this out of my system: I agree these approaches are hacky. I think
> it is generally better to have virtio feature negotiation tell you
> whether the device runs on a CPU or not, rather than rely on platform
> specific ways for this. To this end there was a recent proposal to
> rename VIRTIO_F_IO_BARRIER to VIRTIO_F_REAL_DEVICE. It got stuck since
> "real" sounds vague to people, e.g. what if it's a VF - is that real or
> not? But I can see something like e.g. VIRTIO_F_PLATFORM_DMA gaining
> support.
>
> We would then rename virtio_has_iommu_quirk to virtio_has_dma_quirk
> and test VIRTIO_F_PLATFORM_DMA in addition to the IOMMU thing.

I don't really care about the exact naming, and indeed a device that
sets the flag doesn't have to be a 'real' device - it just has to act
like one. I explained all the issues that this implies (at least
relating to DMA) in one of the previous threads.

The important bit is that we can specify exact behavior in the spec for
both devices that set the "I'm real!" flag and ones that don't. And that
very much excludes arch-specific (or Xen-specific) overrides.
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-07-30 11:18 ` Christoph Hellwig
@ 2018-07-30 13:26 ` Michael S. Tsirkin
From: Michael S. Tsirkin @ 2018-07-30 13:26 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Will Deacon, Anshuman Khandual, virtualization, linux-kernel,
      linuxppc-dev, aik, robh, joe, elfring, david, jasowang, benh, mpe,
      linuxram, haren, paulus, srikar, robin.murphy,
      jean-philippe.brucker, marc.zyngier

On Mon, Jul 30, 2018 at 04:18:02AM -0700, Christoph Hellwig wrote:
> On Mon, Jul 30, 2018 at 01:28:03PM +0300, Michael S. Tsirkin wrote:
> > [...]
>
> All of what you said would be true if virtio didn't claim to be a PCI
> device.

There's nothing virtio claims to be. It's a PV device that uses PCI for
its configuration. Configuration is enumerated on the virtual PCI bus;
that part of the interface is emulated PCI. The data path is through a
PV device enumerated on the virtio bus.

> Once it claims to be a PCI device, and we also see real hardware
> written to the interface, I stand by all I said above.

Real hardware would reuse parts of the interface, but by necessity it
needs to behave slightly differently on some platforms. However, for
some platforms (such as x86) a PV virtio driver will by luck work with a
PCI device backend without changes. As these platforms and drivers are
widely deployed, some people will deploy hardware like that. That should
be a non-issue, as by definition it's transparent to guests.

> I don't really care about the exact naming, and indeed a device that
> sets the flag doesn't have to be a 'real' device - it just has to act
> like one. I explained all the issues that this implies (at least
> relating to DMA) in one of the previous threads.

I believe you refer to this: https://lkml.org/lkml/2018/6/7/15 - that
was a very helpful list outlining the problems we need to solve, thanks
a lot for that!

> The important bit is that we can specify exact behavior in the spec for
> both devices that set the "I'm real!" flag and ones that don't.

I would very much like that, yes.

> And that very much excludes arch-specific (or Xen-specific) overrides.

We already committed to a Xen-specific hack, but generally I prefer
devices that describe how they work instead of platforms magically
guessing, yes.

However, the question people raise is that the DMA API is already full
of arch-specific tricks, the likes of which are outlined in your post
linked above. How is this one much worse?

-- 
MST
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-07-30 13:26 ` Michael S. Tsirkin @ 2018-07-31 17:30 ` Christoph Hellwig -1 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-07-31 17:30 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, benh, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Mon, Jul 30, 2018 at 04:26:32PM +0300, Michael S. Tsirkin wrote: > Real hardware would reuse parts of the interface but by necessity it > needs to behave slightly differently on some platforms. However for > some platforms (such as x86) a PV virtio driver will by luck work with a > PCI device backend without changes. As these platforms and drivers are > widely deployed, some people will deploy hardware like that. Should be > a non issue as by definition it's transparent to guests. On some x86. As soon as you have an iommu or strange PCI root ports things are going to start breaking apart. > > And that very much excludes arch-specific (or > > Xen-specific) overrides. > > We already committed to a xen specific hack but generally I prefer > devices that describe how they work instead of platforms magically > guessing, yes. For legacy reasons I guess we'll have to keep it, but we really need to avoid adding more junk than this. > However the question people raise is that DMA API is already full of > arch-specific tricks the likes of which are outlined in your post linked > above. How is this one much worse? None of these warts is visible to the driver, they are all handled in the architecture (possibly on a per-bus basis). So for virtio we really need to decide if it has one set of behavior as specified in the virtio spec, or if it behaves exactly as if it was on a PCI bus, or in fact probably both as you lined up. But no magic arch specific behavior inbetween. 
^ permalink raw reply [flat|nested] 206+ messages in thread
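The rename Michael floats above (virtio_has_iommu_quirk becoming virtio_has_dma_quirk, testing a new VIRTIO_F_PLATFORM_DMA bit alongside the IOMMU one) amounts to a plain feature-bit test. A minimal sketch follows; VIRTIO_F_IOMMU_PLATFORM is really bit 33 in the virtio spec, but the VIRTIO_F_PLATFORM_DMA bit number here is purely hypothetical, since that flag was only a proposal in this thread:

```c
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_IOMMU_PLATFORM  33   /* real bit number from the virtio spec */
#define VIRTIO_F_PLATFORM_DMA    38   /* hypothetical bit for the proposed flag */

static bool virtio_has_feature(uint64_t features, unsigned int bit)
{
	return features & (1ULL << bit);
}

/* Sketch of the proposed virtio_has_dma_quirk(): the device keeps the
 * historical "bypass the DMA API, use guest physical addresses" quirk
 * only when neither flag was negotiated. */
static bool virtio_has_dma_quirk(uint64_t features)
{
	return !virtio_has_feature(features, VIRTIO_F_IOMMU_PLATFORM) &&
	       !virtio_has_feature(features, VIRTIO_F_PLATFORM_DMA);
}
```

Either flag alone is enough to route the device through the DMA API; a legacy device negotiating neither keeps the direct path.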
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-07-31 17:30 ` Christoph Hellwig (?) @ 2018-07-31 20:36 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-07-31 20:36 UTC (permalink / raw) To: Christoph Hellwig, Michael S. Tsirkin Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram, virtualization, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote: > > However the question people raise is that DMA API is already full of > > arch-specific tricks the likes of which are outlined in your post linked > > above. How is this one much worse? > > None of these warts is visible to the driver, they are all handled in > the architecture (possibly on a per-bus basis). > > So for virtio we really need to decide if it has one set of behavior > as specified in the virtio spec, or if it behaves exactly as if it > was on a PCI bus, or in fact probably both as you lined up. But no > magic arch specific behavior in between. The only arch specific behaviour is needed in the case where it doesn't behave like PCI. In this case, the PCI DMA ops are not suitable, but in our secure VMs, we still need to make it use swiotlb in order to bounce through non-secure pages. It would be nice if "real PCI" was the default but it's not, VMs are created in "legacy" mode all the time and we don't know at VM creation time whether it will become a secure VM or not. The way our secure VMs work is that they start as a normal VM, load a secure "payload" and call the Ultravisor to "become" secure. So we're in a bit of a bind here. We need that one-liner optional arch hook to make virtio use swiotlb in that "IOMMU bypass" case. Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
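The "one-liner optional arch hook" Ben asks for boils down to letting the platform swap the DMA ops after virtio picks its default. A toy sketch of that decision is below; the struct and function names are illustrative stand-ins, not the real kernel dma_map_ops API:

```c
#include <stdbool.h>

/* Toy stand-in for the kernel's dma_map_ops table. */
struct dma_map_ops { const char *name; };

static const struct dma_map_ops virtio_direct_dma_ops = { "direct-gpa" };
static const struct dma_map_ops swiotlb_dma_ops       = { "swiotlb-bounce" };

/* Hypothetical platform hook: a guest that has called the Ultravisor to
 * become secure overrides whatever ops virtio chose with swiotlb, so all
 * I/O bounces through shared, non-secure pages the host may access. */
static const struct dma_map_ops *
platform_override_dma_ops(bool is_secure_guest, const struct dma_map_ops *cur)
{
	return is_secure_guest ? &swiotlb_dma_ops : cur;
}
```

A normal VM keeps the direct ops it started with, which is why the hook can stay a no-op everywhere except on platforms that need it.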
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-07-31 20:36 ` Benjamin Herrenschmidt @ 2018-08-01 8:16 ` Will Deacon 2018-08-01 8:16 ` Will Deacon ` (2 subsequent siblings) 3 siblings, 0 replies; 206+ messages in thread From: Will Deacon @ 2018-08-01 8:16 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: robh, srikar, Michael S. Tsirkin, mpe, linuxram, linux-kernel, virtualization, Christoph Hellwig, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote: > On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote: > > > However the question people raise is that DMA API is already full of > > > arch-specific tricks the likes of which are outlined in your post linked > > > above. How is this one much worse? > > > > None of these warts is visible to the driver, they are all handled in > > the architecture (possibly on a per-bus basis). > > > > So for virtio we really need to decide if it has one set of behavior > > as specified in the virtio spec, or if it behaves exactly as if it > > was on a PCI bus, or in fact probably both as you lined up. But no > > magic arch specific behavior inbetween. > > The only arch specific behaviour is needed in the case where it doesn't > behave like PCI. In this case, the PCI DMA ops are not suitable, but in > our secure VMs, we still need to make it use swiotlb in order to bounce > through non-secure pages. On arm/arm64, the problem we have is that legacy virtio devices on the MMIO transport (so definitely not PCI) have historically been advertised by qemu as not being cache coherent, but because the virtio core has bypassed DMA ops then everything has happened to work. 
If we blindly enable the arch DMA ops, we'll plumb in the non-coherent ops and start getting data corruption, so we do need a way to quirk virtio as being "always coherent" if we want to use the DMA ops (which we do, because our emulation platforms have an IOMMU for all virtio devices). Will ^ permalink raw reply [flat|nested] 206+ messages in thread
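Will's corruption scenario can be written out as an ops-selection table: bypassing the DMA API hides the bogus "non-coherent" description, while blindly honouring it plumbs in cache maintenance the device does not need. A toy model follows, with a hypothetical "always coherent" quirk flag; none of these names are real kernel API:

```c
#include <stdbool.h>

enum dma_ops_kind { OPS_BYPASS, OPS_COHERENT, OPS_NONCOHERENT };

/* Toy model of the arm/arm64 ops choice for a legacy virtio-mmio device. */
static enum dma_ops_kind pick_dma_ops(bool use_dma_api,
				      bool fw_says_coherent,
				      bool virtio_always_coherent_quirk)
{
	if (!use_dma_api)
		return OPS_BYPASS;      /* historical virtio behaviour: works by luck */
	if (fw_says_coherent || virtio_always_coherent_quirk)
		return OPS_COHERENT;    /* correct for virtio behind an IOMMU */
	/* qemu historically mis-describes legacy virtio-mmio as non-coherent:
	 * this path does needless cache maintenance and corrupts ring data. */
	return OPS_NONCOHERENT;
}
```

The quirk is what lets the arch enable DMA ops (for the IOMMU) without inheriting the wrong coherency attribute from firmware.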
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-01 8:16 ` Will Deacon @ 2018-08-01 8:36 ` Christoph Hellwig 2018-08-01 8:36 ` Christoph Hellwig 2018-08-05 0:27 ` Michael S. Tsirkin 2 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-01 8:36 UTC (permalink / raw) To: Will Deacon Cc: robh, srikar, Michael S. Tsirkin, Benjamin Herrenschmidt, linuxram, linux-kernel, virtualization, Christoph Hellwig, paulus, marc.zyngier, mpe, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote: > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO > transport (so definitely not PCI) have historically been advertised by qemu > as not being cache coherent, but because the virtio core has bypassed DMA > ops then everything has happened to work. If we blindly enable the arch DMA > ops, No one is suggesting that as far as I can tell. > we'll plumb in the non-coherent ops and start getting data corruption, > so we do need a way to quirk virtio as being "always coherent" if we want to > use the DMA ops (which we do, because our emulation platforms have an IOMMU > for all virtio devices). From all that I've gathered so far: no you do not want that. We really need to figure out how virtio "dma" interacts with the host / device. If you look at the current iommu spec it does talk of physical address with a little carve-out for VIRTIO_F_IOMMU_PLATFORM. So between that and our discussion in this thread and its previous iterations I think we need to stick to the current always physical, bypass system dma ops mode of virtio operation as the default. We just need to figure out how to deal with devices that deviate from the default. One thing is that VIRTIO_F_IOMMU_PLATFORM really should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu dma tweaks (offsets, cache flushing), which seems well in spirit of the original design. 
The other issue is VIRTIO_F_IO_BARRIER which is very vaguely defined, and which needs a better definition. And last but not least we'll need some text explaining the challenges of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER is what would basically cover them, but we'll need a good description including an explanation of why these matter. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-01 8:36 ` Christoph Hellwig @ 2018-08-01 9:05 ` Will Deacon -1 siblings, 0 replies; 206+ messages in thread From: Will Deacon @ 2018-08-01 9:05 UTC (permalink / raw) To: Christoph Hellwig Cc: Benjamin Herrenschmidt, Michael S. Tsirkin, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier Hi Christoph, On Wed, Aug 01, 2018 at 01:36:39AM -0700, Christoph Hellwig wrote: > On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote: > > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO > > transport (so definitely not PCI) have historically been advertised by qemu > > as not being cache coherent, but because the virtio core has bypassed DMA > > ops then everything has happened to work. If we blindly enable the arch DMA > > ops, > > No one is suggesting that as far as I can tell. Apologies: it's me that wants the DMA ops enabled to handle legacy devices behind an IOMMU, but see below. > > we'll plumb in the non-coherent ops and start getting data corruption, > > so we do need a way to quirk virtio as being "always coherent" if we want to > > use the DMA ops (which we do, because our emulation platforms have an IOMMU > > for all virtio devices). > > From all that I've gather so far: no you do not want that. We really > need to figure out virtio "dma" interacts with the host / device. > > If you look at the current iommu spec it does talk of physical address > with a little careveout for VIRTIO_F_IOMMU_PLATFORM. That's true, although that doesn't exist in the legacy virtio spec, and we have an existing emulation platform which puts legacy virtio devices behind an IOMMU. Currently, Linux is unable to boot on this platform unless the IOMMU is configured as bypass. If we can use the coherent IOMMU DMA ops, then it works perfectly. 
> So between that and our discussion in this thread and its previous > iterations I think we need to stick to the current always physical, > bypass system dma ops mode of virtio operation as the default. As above -- that means we hang during boot because we get stuck trying to bring up a virtio-block device whose DMA is aborted by the IOMMU. The easy answer is "just upgrade to latest virtio and advertise the presence of the IOMMU". I'm pushing for that in future platforms, but it seems a shame not to support the current platform, especially given that other systems do have hacks in mainline to get virtio working. > We just need to figure out how to deal with devices that deviate > from the default. One things is that VIRTIO_F_IOMMU_PLATFORM really > should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu > dma tweaks (offsets, cache flushing), which seems well in spirit of > the original design. The other issue is VIRTIO_F_IO_BARRIER > which is very vaguely defined, and which needs a better definition. > And last but not least we'll need some text explaining the challenges > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER > is what would basically cover them, but a good description including > an explanation of why these matter. I agree that this makes sense for future revisions of virtio (or perhaps it can just be a clarification to virtio 1.0), but we're still left in the dark with legacy devices and it would be nice to have them work on the systems which currently exist, even if it's a legacy-only hack in the arch code. Will ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-01 9:05 ` Will Deacon (?) @ 2018-08-01 22:41 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-01 22:41 UTC (permalink / raw) To: Will Deacon Cc: Christoph Hellwig, Benjamin Herrenschmidt, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Wed, Aug 01, 2018 at 10:05:35AM +0100, Will Deacon wrote: > Hi Christoph, > > On Wed, Aug 01, 2018 at 01:36:39AM -0700, Christoph Hellwig wrote: > > On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote: > > > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO > > > transport (so definitely not PCI) have historically been advertised by qemu > > > as not being cache coherent, but because the virtio core has bypassed DMA > > > ops then everything has happened to work. If we blindly enable the arch DMA > > > ops, > > > > No one is suggesting that as far as I can tell. > > Apologies: it's me that wants the DMA ops enabled to handle legacy devices > behind an IOMMU, but see below. > > > > we'll plumb in the non-coherent ops and start getting data corruption, > > > so we do need a way to quirk virtio as being "always coherent" if we want to > > > use the DMA ops (which we do, because our emulation platforms have an IOMMU > > > for all virtio devices). > > > > From all that I've gather so far: no you do not want that. We really > > need to figure out virtio "dma" interacts with the host / device. > > > > If you look at the current iommu spec it does talk of physical address > > with a little careveout for VIRTIO_F_IOMMU_PLATFORM. > > That's true, although that doesn't exist in the legacy virtio spec, and we > have an existing emulation platform which puts legacy virtio devices behind > an IOMMU. 
Currently, Linux is unable to boot on this platform unless the > IOMMU is configured as bypass. If we can use the coherent IOMMU DMA ops, > then it works perfectly. > > > So between that and our discussion in this thread and its previous > > iterations I think we need to stick to the current always physical, > > bypass system dma ops mode of virtio operation as the default. > > As above -- that means we hang during boot because we get stuck trying to > bring up a virtio-block device whose DMA is aborted by the IOMMU. The easy > answer is "just upgrade to latest virtio and advertise the presence of the > IOMMU". I'm pushing for that in future platforms, but it seems a shame not > to support the current platform, especially given that other systems do have > hacks in mainline to get virtio working. > > > We just need to figure out how to deal with devices that deviate > > from the default. One things is that VIRTIO_F_IOMMU_PLATFORM really > > should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu > > dma tweaks (offsets, cache flushing), which seems well in spirit of > > the original design. The other issue is VIRTIO_F_IO_BARRIER > > which is very vaguely defined, and which needs a better definition. > > And last but not least we'll need some text explaining the challenges > > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER > > is what would basically cover them, but a good description including > > an explanation of why these matter. > > I agree that this makes sense for future revisions of virtio (or perhaps > it can just be a clarification to virtio 1.0), but we're still left in the > dark with legacy devices and it would be nice to have them work on the > systems which currently exist, even if it's a legacy-only hack in the arch > code. > > Will Myself I'm sympathetic to this use-case and I see more uses to this than just legacy support. But more work is required IMHO. Will post tomorrow though - it's late here ... 
-- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-01 8:36 ` Christoph Hellwig (?) (?) @ 2018-08-01 22:35 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-01 22:35 UTC (permalink / raw) To: Christoph Hellwig Cc: robh, srikar, Benjamin Herrenschmidt, Will Deacon, linux-kernel, linuxram, virtualization, paulus, marc.zyngier, mpe, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Wed, Aug 01, 2018 at 01:36:39AM -0700, Christoph Hellwig wrote: > On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote: > > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO > > transport (so definitely not PCI) have historically been advertised by qemu > > as not being cache coherent, but because the virtio core has bypassed DMA > > ops then everything has happened to work. If we blindly enable the arch DMA > > ops, > > No one is suggesting that as far as I can tell. > > > we'll plumb in the non-coherent ops and start getting data corruption, > > so we do need a way to quirk virtio as being "always coherent" if we want to > > use the DMA ops (which we do, because our emulation platforms have an IOMMU > > for all virtio devices). > > From all that I've gathered so far: no, you do not want that. We really > need to figure out how virtio "dma" interacts with the host / device. > > If you look at the current iommu spec it does talk of physical address > with a little carve-out for VIRTIO_F_IOMMU_PLATFORM. > > So between that and our discussion in this thread and its previous > iterations I think we need to stick to the current always physical, > bypass system dma ops mode of virtio operation as the default. > > We just need to figure out how to deal with devices that deviate > from the default.
One thing is that VIRTIO_F_IOMMU_PLATFORM really > should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu > dma tweaks (offsets, cache flushing), which seems well in spirit of > the original design. Well I wouldn't say that. VIRTIO_F_IOMMU_PLATFORM is for guest programmable protection which is designed for things like userspace drivers but still very much with a CPU doing the accesses. I think VIRTIO_F_IO_BARRIER needs to be extended to VIRTIO_F_PLATFORM_DMA. > The other issue is VIRTIO_F_IO_BARRIER > which is very vaguely defined, and which needs a better definition. > And last but not least we'll need some text explaining the challenges > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER > is what would basically cover them, but a good description including > an explanation of why these matter. I think VIRTIO_F_IOMMU_PLATFORM + VIRTIO_F_PLATFORM_DMA but yea. -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-01 8:36 ` Christoph Hellwig @ 2018-08-02 15:24 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-02 15:24 UTC (permalink / raw) To: Christoph Hellwig, Will Deacon Cc: Michael S. Tsirkin, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Wed, 2018-08-01 at 01:36 -0700, Christoph Hellwig wrote: > We just need to figure out how to deal with devices that deviate > from the default. One thing is that VIRTIO_F_IOMMU_PLATFORM really > should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu > dma tweaks (offsets, cache flushing), which seems well in spirit of > the original design. I don't completely agree: 1 - VIRTIO_F_IOMMU_PLATFORM is a property of the "other side", ie qemu for example. It indicates that the peer bypasses the normal platform iommu. The platform code in the guest has no real way to know that this is the case, this is a specific "feature" of the qemu implementation. 2 - VIRTIO_F_PLATFORM_DMA (or whatever you want to call it), is a property of the guest platform itself (not qemu), there's no way the "peer" can advertise it via the virtio negotiated flags. At least for us. I don't know for sure whether that would be workable for the ARM case. In our case, qemu has no idea at VM creation time that the VM will turn itself into a secure VM and thus will require bounce buffering for IOs (including virtio). So unless we have another hook for the arch code to set VIRTIO_F_PLATFORM_DMA on selected (or all) virtio devices from the guest itself, I don't see that as a way to deal with it. > The other issue is VIRTIO_F_IO_BARRIER > which is very vaguely defined, and which needs a better definition.
> And last but not least we'll need some text explaining the challenges > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER > is what would basically cover them, but a good description including > an explanation of why these matter. Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-02 15:24 ` Benjamin Herrenschmidt (?) @ 2018-08-02 15:41 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-02 15:41 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Thu, Aug 02, 2018 at 10:24:57AM -0500, Benjamin Herrenschmidt wrote: > On Wed, 2018-08-01 at 01:36 -0700, Christoph Hellwig wrote: > > We just need to figure out how to deal with devices that deviate > > from the default. One things is that VIRTIO_F_IOMMU_PLATFORM really > > should become VIRTIO_F_PLATFORM_DMA to cover the cases of non-iommu > > dma tweaks (offsets, cache flushing), which seems well in spirit of > > the original design. > > I don't completely agree: > > 1 - VIRTIO_F_IOMMU_PLATFORM is a property of the "other side", ie qemu > for example. It indicates that the peer bypasses the normal platform > iommu. The platform code in the guest has no real way to know that this > is the case, this is a specific "feature" of the qemu implementation. > > 2 - VIRTIO_F_PLATFORM_DMA (or whatever you want to call it), is a > property of the guest platform itself (not qemu), there's no way the > "peer" can advertize it via the virtio negociated flags. At least for > us. I don't know for sure whether that would be workable for the ARM > case. In our case, qemu has no idea at VM creation time that the VM > will turn itself into a secure VM and thus will require bounce > buffering for IOs (including virtio). > > So unless we have another hook for the arch code to set > VIRTIO_F_PLATFORM_DMA on selected (or all) virtio devices from the > guest itself, I don't see that as a way to deal with it. > > > The other issue is VIRTIO_F_IO_BARRIER > > which is very vaguely defined, and which needs a better definition. 
> > And last but not least we'll need some text explaining the challenges > > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER > > is what would basically cover them, but a good description including > > an explanation of why these matter. > > Ben. > So is it true that from qemu point of view there is nothing special going on? You pass in a PA, host writes there. -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-02 15:41 ` Michael S. Tsirkin @ 2018-08-02 16:01 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-02 16:01 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Thu, 2018-08-02 at 18:41 +0300, Michael S. Tsirkin wrote: > > > I don't completely agree: > > > > 1 - VIRTIO_F_IOMMU_PLATFORM is a property of the "other side", ie qemu > > for example. It indicates that the peer bypasses the normal platform > > iommu. The platform code in the guest has no real way to know that this > > is the case, this is a specific "feature" of the qemu implementation. > > > > 2 - VIRTIO_F_PLATFORM_DMA (or whatever you want to call it), is a > > property of the guest platform itself (not qemu), there's no way the > > "peer" can advertize it via the virtio negociated flags. At least for > > us. I don't know for sure whether that would be workable for the ARM > > case. In our case, qemu has no idea at VM creation time that the VM > > will turn itself into a secure VM and thus will require bounce > > buffering for IOs (including virtio). > > > > So unless we have another hook for the arch code to set > > VIRTIO_F_PLATFORM_DMA on selected (or all) virtio devices from the > > guest itself, I don't see that as a way to deal with it. > > > > > The other issue is VIRTIO_F_IO_BARRIER > > > which is very vaguely defined, and which needs a better definition. > > > And last but not least we'll need some text explaining the challenges > > > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER > > > is what would basically cover them, but a good description including > > > an explanation of why these matter. > > > > Ben. 
> > So is it true that from qemu point of view there is nothing special > going on? You pass in a PA, host writes there. Yes, qemu doesn't see a difference. It's the guest that will bounce the pages via a pool of "insecure" pages that qemu can access. Normal pages in a secure VM come from PAs that qemu cannot physically access. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-02 16:01 ` Benjamin Herrenschmidt (?) @ 2018-08-02 17:19 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-02 17:19 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Thu, Aug 02, 2018 at 11:01:26AM -0500, Benjamin Herrenschmidt wrote: > On Thu, 2018-08-02 at 18:41 +0300, Michael S. Tsirkin wrote: > > > > > I don't completely agree: > > > > > > 1 - VIRTIO_F_IOMMU_PLATFORM is a property of the "other side", ie qemu > > > for example. It indicates that the peer bypasses the normal platform > > > iommu. The platform code in the guest has no real way to know that this > > > is the case, this is a specific "feature" of the qemu implementation. > > > > > > 2 - VIRTIO_F_PLATFORM_DMA (or whatever you want to call it), is a > > > property of the guest platform itself (not qemu), there's no way the > > > "peer" can advertize it via the virtio negociated flags. At least for > > > us. I don't know for sure whether that would be workable for the ARM > > > case. In our case, qemu has no idea at VM creation time that the VM > > > will turn itself into a secure VM and thus will require bounce > > > buffering for IOs (including virtio). > > > > > > So unless we have another hook for the arch code to set > > > VIRTIO_F_PLATFORM_DMA on selected (or all) virtio devices from the > > > guest itself, I don't see that as a way to deal with it. > > > > > > > The other issue is VIRTIO_F_IO_BARRIER > > > > which is very vaguely defined, and which needs a better definition. 
> > > > And last but not least we'll need some text explaining the challenges > > > > of hardware devices - I think VIRTIO_F_PLATFORM_DMA + VIRTIO_F_IO_BARRIER > > > > is what would basically cover them, but a good description including > > > > an explanation of why these matter. > > > > > > Ben. > > > > > > > So is it true that from qemu point of view there is nothing special > > going on? You pass in a PA, host writes there. > > Yes, qemu doesn't see a different. It's the guest that will bounce the > pages via a pool of "insecure" pages that qemu can access. Normal pages > in a secure VM come from PAs that qemu cannot physically access. > > Cheers, > Ben. > I see. So yes, given that the device does not know or care, using virtio features is an awkward fit. So let's say as a quick fix for you maybe we could generalize the xen_domain hack: instead of just checking xen_domain, check some static branch. Then teach xen and others to enable that. OK but the problem then becomes this: if you do this and a virtio device appears behind a vIOMMU and it does not advertise the IOMMU flag, the code will try to use the vIOMMU mappings and fail. It does look like even with the trick above, you need a special version of DMA ops that does just swiotlb but not any of the other things the DMA API might do. Thoughts? -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-02 17:19 ` Michael S. Tsirkin @ 2018-08-02 17:53 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-02 17:53 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Thu, 2018-08-02 at 20:19 +0300, Michael S. Tsirkin wrote: > > I see. So yes, given that device does not know or care, using > virtio features is an awkward fit. > > So let's say as a quick fix for you maybe we could generalize the > xen_domain hack, instead of just checking xen_domain check some static > branch. Then teach xen and others to enable that.> > OK but problem then becomes this: if you do this and virtio device appears > behind a vIOMMU and it does not advertize the IOMMU flag, the > code will try to use the vIOMMU mappings and fail. > > It does look like even with trick above, you need a special version of > DMA ops that does just swiotlb but not any of the other things DMA API > might do. > > Thoughts? Yes, this is the purpose of Anshuman original patch (I haven't looked at the details of the patch in a while but that's what I told him to implement ;-) : - Make virtio always use DMA ops to simplify the code path (with a set of "transparent" ops for legacy) and - Provide an arch hook allowing us to "override" those "transparent" DMA ops with some custom ones that do the appropriate swiotlb gunk. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-02 17:53 ` Benjamin Herrenschmidt @ 2018-08-02 20:52 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-02 20:52 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Thu, Aug 02, 2018 at 12:53:41PM -0500, Benjamin Herrenschmidt wrote: > On Thu, 2018-08-02 at 20:19 +0300, Michael S. Tsirkin wrote: > > > > I see. So yes, given that device does not know or care, using > > virtio features is an awkward fit. > > > > So let's say as a quick fix for you maybe we could generalize the > > xen_domain hack, instead of just checking xen_domain check some static > > branch. Then teach xen and others to enable that.> > > > OK but problem then becomes this: if you do this and virtio device appears > > behind a vIOMMU and it does not advertize the IOMMU flag, the > > code will try to use the vIOMMU mappings and fail. > > > > It does look like even with trick above, you need a special version of > > DMA ops that does just swiotlb but not any of the other things DMA API > > might do. > > > > Thoughts? > > Yes, this is the purpose of Anshuman original patch (I haven't looked > at the details of the patch in a while but that's what I told him to > implement ;-) : > > - Make virtio always use DMA ops to simplify the code path (with a set > of "transparent" ops for legacy) > > and > > - Provide an arch hook allowing us to "override" those "transparent" > DMA ops with some custom ones that do the appropriate swiotlb gunk. > > Cheers, > Ben. > Right but as I tried to say doing that brings us to a bunch of issues with using DMA APIs in virtio. Put simply DMA APIs weren't designed for guest to hypervisor communication. 
When we do (as is the case with PLATFORM_IOMMU right now) this adds a bunch of overhead which we need to get rid of if we are to switch to PLATFORM_IOMMU by default. We need to fix that. -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-02 20:52 ` Michael S. Tsirkin @ 2018-08-02 21:13 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-02 21:13 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Thu, 2018-08-02 at 23:52 +0300, Michael S. Tsirkin wrote: > > Yes, this is the purpose of Anshuman original patch (I haven't looked > > at the details of the patch in a while but that's what I told him to > > implement ;-) : > > > > - Make virtio always use DMA ops to simplify the code path (with a set > > of "transparent" ops for legacy) > > > > and > > > > - Provide an arch hook allowing us to "override" those "transparent" > > DMA ops with some custom ones that do the appropriate swiotlb gunk. > > > > Cheers, > > Ben. > > > > Right but as I tried to say doing that brings us to a bunch of issues > with using DMA APIs in virtio. Put simply DMA APIs weren't designed for > guest to hypervisor communication. I'm not sure I see the problem, see below > When we do (as is the case with PLATFORM_IOMMU right now) this adds a > bunch of overhead which we need to get rid of if we are to switch to > PLATFORM_IOMMU by default. We need to fix that. So let's differentiate the two problems of having an IOMMU (real or emulated), which indeed adds overhead etc... and using the DMA API. At the moment, virtio does this all over the place: if (use_dma_api) dma_map/alloc_something(...) else use_pa The idea of the patch set is to do two, somewhat orthogonal, changes that together achieve what we want.
Let me know where you think there is "a bunch of issues" because I'm missing it: 1- Replace the above if/else constructs with just calling the DMA API, and have virtio, at initialization, hook up its own dma_ops that just "return pa" (roughly) when the IOMMU stuff isn't used. This adds an indirect function call to the path that previously didn't have one (the else case above). Is that a significant/measurable overhead? This change stands alone, and imho "cleans" up virtio by avoiding all that if/else "2 path" and unless it adds a measurable overhead, should probably be done. 2- Make virtio use the DMA API with our custom platform-provided swiotlb callbacks when needed, that is when not using IOMMU *and* running on a secure VM in our case. This benefits from -1- by making us just plumb in a different set of DMA ops we would have cooked up specifically for virtio in our arch code (or in virtio itself but built arch-conditionally in a separate file). But -2- doesn't strictly need -1-: we could have just done another xen-like hack that forces the DMA API "ON" for virtio when running in a secure VM. The problem if we do that however is that we also then need the arch PCI code to make sure it hooks up the virtio PCI devices with the special "magic" DMA ops that avoid the iommu but still do swiotlb, ie, not the same as other PCI devices. So it will have to play games such as checking vendor/device IDs for virtio, checking the IOMMU flag, etc... from the arch code which really bloody sucks when assigning PCI DMA ops. However, if we do it the way we plan here, on top of -1-, with a hook called from virtio into the arch to "override" the virtio DMA ops, then we avoid the problem completely: The arch hook would only be called by virtio if the IOMMU flag is *not* set. IE only when using that special "hypervisor" iommu bypass. If the IOMMU flag is set, virtio uses normal PCI dma ops as usual.
That way, we have a very clear semantic: This hook is purely about replacing those "null" DMA ops that just return PA introduced in -1- with some arch-provided, specially cooked up DMA ops for non-IOMMU virtio that know about the arch's special requirements. For us, bounce buffering. Is there something I'm missing? Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
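Ben's step -1- ("null" DMA ops that just hand the guest physical address to the host) can be sketched as a small userspace model. This is an illustrative sketch only, not the actual kernel patch: the struct here is a two-entry stand-in for the kernel's much larger `dma_map_ops`, and `virtio_map_buf` is a made-up name for the virtio-side dispatch; only `virtio_direct_dma_ops` mirrors the naming used in the series.

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t dma_addr_t;
typedef uint64_t phys_addr_t;

/* Minimal stand-in for the kernel's dma_map_ops function-pointer table. */
struct dma_map_ops {
    dma_addr_t (*map)(phys_addr_t pa, size_t len);
    void       (*unmap)(dma_addr_t da, size_t len);
};

/* "Null" ops: DMA address == guest physical address, preserving
 * virtio's legacy no-IOMMU behaviour behind the DMA API. */
static dma_addr_t direct_map(phys_addr_t pa, size_t len)
{
    (void)len;
    return (dma_addr_t)pa;          /* identity mapping */
}

static void direct_unmap(dma_addr_t da, size_t len)
{
    (void)da;
    (void)len;                      /* nothing to tear down */
}

static const struct dma_map_ops virtio_direct_dma_ops = {
    .map   = direct_map,
    .unmap = direct_unmap,
};

/* With -1- applied, virtio core always goes through ops; the old
 * if (use_dma_api) / else use_pa split disappears. */
static dma_addr_t virtio_map_buf(const struct dma_map_ops *ops,
                                 phys_addr_t pa, size_t len)
{
    return ops->map(pa, len);
}
```

Step -2- then amounts to swapping `virtio_direct_dma_ops` for an arch-provided table (e.g. swiotlb-backed) via the proposed hook, with no change to the call sites.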
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-02 21:13 ` Benjamin Herrenschmidt @ 2018-08-02 21:51 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-02 21:51 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote: > On Thu, 2018-08-02 at 23:52 +0300, Michael S. Tsirkin wrote: > > > Yes, this is the purpose of Anshuman original patch (I haven't looked > > > at the details of the patch in a while but that's what I told him to > > > implement ;-) : > > > > > > - Make virtio always use DMA ops to simplify the code path (with a set > > > of "transparent" ops for legacy) > > > > > > and > > > > > > - Provide an arch hook allowing us to "override" those "transparent" > > > DMA ops with some custom ones that do the appropriate swiotlb gunk. > > > > > > Cheers, > > > Ben. > > > > > > > Right but as I tried to say doing that brings us to a bunch of issues > > with using DMA APIs in virtio. Put simply DMA APIs weren't designed for > > guest to hypervisor communication. > > I'm not sure I see the problem, see below > > > When we do (as is the case with PLATFORM_IOMMU right now) this adds a > > bunch of overhead which we need to get rid of if we are to switch to > > PLATFORM_IOMMU by default. We need to fix that. > > So let's differentiate the two problems of having an IOMMU (real or > emulated) which indeed adds overhead etc... and using the DMA API. Well actually it's the other way around. An iommu in theory doesn't need to bring overhead if you set it in bypass mode. Which does imply the iommu supports bypass mode. Is that universally the case?
The DMA API does, though: see Christoph's list of things it does, some of which add overhead. > At the moment, virtio does this all over the place: > > if (use_dma_api) > dma_map/alloc_something(...) > else > use_pa > > The idea of the patch set is to do two, somewhat orthogonal, changes > that together achieve what we want. Let me know where you think there > is "a bunch of issues" because I'm missing it: > > 1- Replace the above if/else constructs with just calling the DMA API, > and have virtio, at initialization, hook up its own dma_ops that just > "return pa" (roughly) when the IOMMU stuff isn't used. > > This adds an indirect function call to the path that previously didn't > have one (the else case above). Is that a significant/measurable > overhead? Seems to be :( Jason reports about 4%. I wonder whether we can support map_sg and friends being NULL, then use that when mapping is an identity. A conditional branch there is likely very cheap. Would this cover all platforms with kvm (which is where we care most about performance)? > This change stands alone, and imho "cleans" up virtio by avoiding all > that if/else "2 path" and unless it adds a measurable overhead, should > probably be done. > > 2- Make virtio use the DMA API with our custom platform-provided > swiotlb callbacks when needed, that is when not using IOMMU *and* > running on a secure VM in our case. > > This benefits from -1- by making us just plumb in a different set of > DMA ops we would have cooked up specifically for virtio in our arch > code (or in virtio itself but built arch-conditionally in a separate > file). But -2- doesn't strictly need -1-: > > We could have just done another > xen-like hack that forces the DMA API "ON" for virtio when running in a > secure VM.
> > The problem if we do that however is that we also then need the arch > PCI code to make sure it hooks up the virtio PCI devices with the > special "magic" DMA ops that avoid the iommu but still do swiotlb, ie, > not the same as other PCI devices. So it will have to play games such > as checking vendor/device IDs for virtio, checking the IOMMU flag, > etc... from the arch code which really bloody sucks when assigning PCI > DMA ops. > > However, if we do it the way we plan here, on top of -1-, with a hook > called from virtio into the arch to "override" the virtio DMA ops, then > we avoid the problem completely: The arch hook would only be called by > virtio if the IOMMU flag is *not* set. IE only when using that special > "hypervisor" iommu bypass. If the IOMMU flag is set, virtio uses normal > PCI dma ops as usual. > > That way, we have a very clear semantic: This hook is purely about > replacing those "null" DMA ops that just return PA introduced in -1- > with some arch provided specially cooked up DMA ops for non-IOMMU > virtio that know about the arch special requirements. For us bounce > buffering. > > Is there something I'm missing ? > > Cheers, > Ben. Right so I was trying to write it up in a systematic way, but just to give you one example, if there is a system where DMA API handles coherency issues, or flushing of some buffers, then our PLATFORM_IOMMU flag causes that to happen. And we kinda worked around this without the IOMMU by basically saying "ok we do not really need DMA API so let's just bypass it" and it was kind of ok except now everyone is switching to vIOMMU just in case. So now people do want some parts of what DMA API does, such as the bounce buffer use, or IOMMU mappings. And maybe in the end the solution is going to be to do something similar to virt_Xmb except for DMA APIs: add APIs that handle just the addressing bits but without the overhead. 
See commit 6a65d26385bf487926a0616650927303058551e3 ("asm-generic: implement virt_xxx memory barriers") for reference; it's a similar set of issues. So it's not a problem with your patches as such, it's just that they don't solve that harder problem. -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
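MST's NULL-callback idea above can be sketched in the same userspace style: treat a missing `map_sg`-like callback as "identity mapping", so the common no-IOMMU case pays one well-predicted branch instead of an indirect call. This is only a model of the proposal under made-up names (`dma_map_model`, `iommu_window_ops`, and the 2 GiB window offset are all illustrative), not kernel code:

```c
#include <stdint.h>
#include <stddef.h>

typedef uint64_t dma_addr_t;
typedef uint64_t phys_addr_t;

/* Minimal stand-in for a dma_map_ops table whose callbacks may be NULL. */
struct dma_map_ops {
    dma_addr_t (*map)(phys_addr_t pa, size_t len);
};

/* Dispatch: NULL ops (or a NULL callback) means identity mapping.
 * The fast path is a conditional branch, not an indirect call. */
static dma_addr_t dma_map_model(const struct dma_map_ops *ops,
                                phys_addr_t pa, size_t len)
{
    if (!ops || !ops->map)
        return (dma_addr_t)pa;      /* identity fast path */
    return ops->map(pa, len);       /* indirect call only when needed */
}

/* A contrived non-identity implementation, purely to exercise the
 * dispatch: offsets addresses into a pretend IOMMU window. */
static dma_addr_t window_map(phys_addr_t pa, size_t len)
{
    (void)len;
    return (dma_addr_t)pa + 0x80000000ull;
}

static const struct dma_map_ops iommu_window_ops = {
    .map = window_map,
};
```

Whether the branch is in fact cheaper than the indirect call on real workloads is exactly the open question in the thread (Jason's ~4% number).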
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-02 21:13 ` Benjamin Herrenschmidt (?) (?) @ 2018-08-03 7:05 ` Christoph Hellwig -1 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-03 7:05 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: robh, srikar, Michael S. Tsirkin, mpe, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote: > So let's differentiate the two problems of having an IOMMU (real or > emulated) which indeed adds overhead etc... and using the DMA API. > > At the moment, virtio does this all over the place: > > if (use_dma_api) > dma_map/alloc_something(...) > else > use_pa > > The idea of the patch set is to do two, somewhat orthogonal, changes > that together achieve what we want. Let me know where you think there > is "a bunch of issues" because I'm missing it: > > 1- Replace the above if/else constructs with just calling the DMA API, > and have virtio, at initialization, hook up its own dma_ops that just > "return pa" (roughly) when the IOMMU stuff isn't used. > > This adds an indirect function call to the path that previously didn't > have one (the else case above). Is that a significant/measurable > overhead? If you call it often enough it does: https://www.spinics.net/lists/netdev/msg495413.html > 2- Make virtio use the DMA API with our custom platform-provided > swiotlb callbacks when needed, that is when not using IOMMU *and* > running on a secure VM in our case. And total NAK the custom platform-provided part of this. We need a flag passed in from the hypervisor that the device needs all bus-specific DMA API treatment, and then just use the normal platform DMA mapping setup.
To get swiotlb you'll need to then use the DT/ACPI dma-ranges property to limit the addressable range, and a swiotlb-capable platform will use swiotlb automatically. ^ permalink raw reply [flat|nested] 206+ messages in thread
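For reference, Christoph's suggestion would look roughly like the following device-tree fragment. Everything here is illustrative: the node name, unit address, and the 1 GiB window are made-up values, and the exact cell layout of dma-ranges depends on the parent's #address-cells/#size-cells; it is not a drop-in config.

```dts
/* Hypothetical: restrict the host bridge's DMA window so that buffers
 * above 1 GiB of guest memory get bounced through swiotlb by the
 * platform automatically, with no virtio-specific code involved. */
pcie@10000000 {
	compatible = "pci-host-ecam-generic";
	device_type = "pci";
	/* <pci-flags pci-addr(hi lo) cpu-addr(hi lo) size(hi lo)>:
	 * devices may only DMA to the first 0x40000000 bytes */
	dma-ranges = <0x02000000 0x0 0x00000000
		      0x0 0x00000000
		      0x0 0x40000000>;
};
```

Ben's objection below is not to this mechanism as such, but to the fact that the DT is generated before anyone knows the VM will turn secure.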
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-03 7:05 ` Christoph Hellwig @ 2018-08-03 15:58 ` Benjamin Herrenschmidt 2018-08-03 19:17 ` Michael S. Tsirkin 1 sibling, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-03 15:58 UTC (permalink / raw) To: Christoph Hellwig Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote: > > 2- Make virtio use the DMA API with our custom platform-provided > > swiotlb callbacks when needed, that is when not using IOMMU *and* > > running on a secure VM in our case. > > And total NAK the custom platform-provided part of this. We need > a flag passed in from the hypervisor that the device needs all bus > specific dma api treatment, and then just use the normal platform > dma mapping setup. Christoph, as I have explained already, we do NOT have a way to provide such a flag as neither the hypervisor nor qemu knows anything about this when the VM is created. > To get swiotlb you'll need to then use the DT/ACPI > dma-ranges property to limit the addressable range, and a swiotlb > capable platform will use swiotlb automatically. This cannot be done as you describe it. The VM is created as a *normal* VM. The DT stuff is generated by qemu at a point where it has *no idea* that the VM will later become secure and thus will have to restrict which pages can be used for "DMA". The VM will *at runtime* turn itself into a secure VM via interactions with the security HW and the Ultravisor layer (which sits below the HV). This happens way after the DT has been created and consumed, the qemu devices instantiated etc... Only the guest kernel knows because it initiates the transition.
When that happens, the virtio devices have already been used by the guest firmware, bootloader, possibly another kernel that kexeced the "secure" one, etc... So instead of running around saying NAK NAK NAK, please explain how we can solve that differently. Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-03 15:58 ` Benjamin Herrenschmidt @ 2018-08-03 16:02 ` Christoph Hellwig -1 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-03 16:02 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote: > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote: > > > 2- Make virtio use the DMA API with our custom platform-provided > > > swiotlb callbacks when needed, that is when not using IOMMU *and* > > > running on a secure VM in our case. > > > > And total NAK the custom platform-provided part of this. We need > > a flag passed in from the hypervisor that the device needs all bus > > specific dma api treatment, and then just use the normal platform > > dma mapping setup. > > Christoph, as I have explained already, we do NOT have a way to provide > such a flag as neither the hypervisor nor qemu knows anything about > this when the VM is created. Well, if your setup is so fucked up I see no way to support it in Linux. Let's end the discussion right now then. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-03 16:02 ` Christoph Hellwig @ 2018-08-03 18:58 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-03 18:58 UTC (permalink / raw) To: Christoph Hellwig Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Fri, 2018-08-03 at 09:02 -0700, Christoph Hellwig wrote: > On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote: > > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote: > > > > 2- Make virtio use the DMA API with our custom platform-provided > > > > swiotlb callbacks when needed, that is when not using IOMMU *and* > > > > running on a secure VM in our case. > > > > > > And total NAK the custom platform-provided part of this. We need > > > a flag passed in from the hypervisor that the device needs all bus > > > specific dma api treatment, and then just use the normal platform > > > dma mapping setup. > > > > Christoph, as I have explained already, we do NOT have a way to provide > > such a flag as neither the hypervisor nor qemu knows anything about > > this when the VM is created. > > Well, if your setup is so fucked up I see no way to support it in Linux. > > Let's end the discussion right now then. You are saying something along the lines of "I don't like an instruction in your ISA, let's not support your entire CPU architecture in Linux". Our setup is not fucked. It makes a LOT of sense and it's a very sensible design. It's hitting a problem due to a corner case oddity in virtio bypassing the MMU, we've worked around such corner cases many times in the past without any problem, I fail to see what the problem is here.
We aren't going to cancel years of HW and SW development for our security infrastructure because you don't like a two-line hook into virtio to make things work and aren't willing to even consider the options. Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
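To make the disagreement concrete, here is a rough sketch in plain userspace C of the kind of small platform hook being argued about (the names `platform_override_map`, `virtio_pick_map`, and `bounce_map_stub` are hypothetical illustrations, not taken from the actual patches): by default virtio hands guest physical addresses straight to the device, and a secure platform would install its own bounce-buffering ops instead.

```c
#include <assert.h>
#include <stdint.h>

/* One mapping operation: turn a CPU buffer into a device-visible address. */
typedef uint64_t (*map_fn)(void *buf);

/* Default virtio behaviour: the device sees the guest physical address
 * (modelled here as the raw pointer value). */
static uint64_t direct_map(void *buf)
{
    return (uint64_t)(uintptr_t)buf;
}

/* Hypothetical platform override, NULL on ordinary guests.  A secure
 * guest would install swiotlb-style bounce ops here before virtio
 * devices are probed. */
static map_fn platform_override_map;

/* The whole "hook": prefer the platform's ops when the platform set
 * them, otherwise keep the existing direct path. */
static map_fn virtio_pick_map(void)
{
    return platform_override_map ? platform_override_map : direct_map;
}

/* Stand-in for a bounce-buffer mapping: returns a fixed address inside
 * a shared pool instead of the original buffer's address. */
static uint64_t bounce_map_stub(void *buf)
{
    (void)buf;
    return 0x1000;
}
```

The point of contention in the thread is only who gets to install such an override: guest platform code at runtime, or the hypervisor via a negotiated feature bit.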
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-03 18:58 ` Benjamin Herrenschmidt @ 2018-08-04 8:21 ` Christoph Hellwig -1 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-04 8:21 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Fri, Aug 03, 2018 at 01:58:46PM -0500, Benjamin Herrenschmidt wrote: > You are saying something along the lines of "I don't like an > instruction in your ISA, let's not support your entire CPU architecture > in Linux". No. I'm saying if you can't describe your architecture in the virtio spec document it is bogus. > Our setup is not fucked. It makes a LOT of sense and it's a very > sensible design. It's hitting a problem due to a corner case oddity in > virtio bypassing the MMU, we've worked around such corner cases many > times in the past without any problem, I fail to see what the problem > is here. No matter if you like it or not (I don't!) virtio is defined to bypass dma translations, it is very clearly stated in the spec. It has some ill-defined bits to bypass it, so if you want the dma mapping API to be used you'll have to set that bit (in its original form, a refined form, or an entirely newly defined sane form) and make sure your hypervisors always set it. It's not rocket science, just a little bit of work to make sure your setup is actually going to work reliably and portably. > We aren't going to cancel years of HW and SW development for our Maybe you should have actually read the specs you are claiming to implement before spending all that effort. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-04 8:21 ` Christoph Hellwig @ 2018-08-05 1:10 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-05 1:10 UTC (permalink / raw) To: Christoph Hellwig Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Sat, 2018-08-04 at 01:21 -0700, Christoph Hellwig wrote: > No matter if you like it or not (I don't!) virtio is defined to bypass > dma translations, it is very clearly stated in the spec. It has some > ill-defined bits to bypass it, so if you want the dma mapping API > to be used you'll have to set that bit (in its original form, a refined > form, or an entirely newly defined sane form) and make sure your > hypersivors always sets it. It's not rocket science, just a little bit > for work to make sure your setup is actually going to work reliably > and portably. I think you are conflating completely different things, so let me try to clarify; we might actually be talking past each other. > > We aren't going to cancel years of HW and SW development for our > > Maybe you should have actually read the specs you are claiming to > implemented before spending all that effort. Anyway, let's cool our respective jets and sort that out, there are indeed other approaches than overriding the DMA ops with special ones, though I find them less tasty ... but here's my attempt at a (simpler) description. Bear with me for the long-ish email; this tries to describe the system so you get an idea of where we come from, and the options we can use to get out of this. So we *are* implementing the spec, since qemu is currently unmodified: Default virtio will bypass the iommu emulated by qemu as per spec etc..
On the Linux side, thus, virtio "sees" a normal iommu-bypassing device and will treat it as such. The problem is the assumption in the middle that qemu can access all guest pages directly, which holds true for traditional VMs, but breaks when the VM in our case turns itself into a secure VM. This isn't an action of (or due to changes in) the hypervisor. KVM operates (almost) normally here. But there's this (very thin and open source btw) layer underneath called ultravisor, which exploits some HW facilities to maintain a separate pool of "secure" memory, which cannot be physically accessed by a non-secure entity. So in our scenario, qemu and KVM create a VM totally normally, there are no changes required to the VM firmware, bootloader(s), etc... in fact we support Linux based bootloaders, and those will work as normal linux would in a VM, virtio works normally, etc... Until that VM (via grub or kexec for example) loads a "secure image". That secure image is a Linux kernel which has been "wrapped" (think of a modified zImage wrapper, though that's not entirely exact). When that is run, before it modifies its .data, it will interact with the ultravisor using a specific HW facility to make itself secure. What happens then is that the UV cryptographically verifies the kernel and ramdisk, and copies them to the secure memory where execution returns. The Ultravisor is then involved as a small shim for hypercalls between the secure VM and KVM to prevent leakage of information (sanitize registers etc...). Now at this point, qemu can no longer access the secure VM pages (there's more to this, such as using HMM to allow migration/encryption across etc... but let's not get bogged down). So virtio can no longer access any page in the VM. Now the VM *can* request from the Ultravisor some selected pages to be made "insecure" and thus shared with qemu.
This is how we handle some of the pages used in our paravirt stuff, and that's how we want to deal with virtio, by creating an insecure swiotlb pool. At this point, thus, there are two options. - One you have rejected, which is to have a way for "no-iommu" virtio (which still doesn't use an iommu on the qemu side and doesn't need to), to be forced to use some custom DMA ops on the VM side. - One, which sadly has more overhead and will require modifying more pieces of the puzzle, which is to make qemu use an emulated iommu. Once we make qemu do that, we can then layer swiotlb on top of the emulated iommu on the guest side, and pass that as dma_ops to virtio. Now, assuming you still absolutely want us to go down the second option, there are several ways to get there. We would prefer to avoid requiring the user to pass some special option to qemu. That has an impact up the food chain (libvirt, management tools etc...) and users probably won't understand what it's about. In fact the *end user* might not even need to know a VM is secure, though applications inside might. There's the additional annoyance that currently our guest FW (SLOF) cannot deal with virtio in IOMMU mode, but that's fixable. From there, refer to the email chain between Michael and me where we are discussing options to "switch" virtio at runtime on the qemu side. Any comments or suggestions? Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
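The "insecure swiotlb pool" idea described above can be sketched as follows (a minimal userspace C model; `insecure_pool`, `bounce_map`, and `bounce_unmap` are illustrative names, not the real swiotlb API): I/O buffers are copied into a small pool of pages the guest has asked the Ultravisor to share with qemu, and only pool addresses are ever handed to the device.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of the shared ("insecure") pool: the only guest memory the
 * hypervisor is allowed to touch once the VM has gone secure. */
#define POOL_SIZE 4096
static unsigned char insecure_pool[POOL_SIZE];
static size_t pool_used;

/* Map a secure buffer for device I/O: bounce it into the shared pool
 * and return the pool offset the device may use, or -1 if full. */
static long bounce_map(const void *secure_buf, size_t len)
{
    if (pool_used + len > POOL_SIZE)
        return -1;
    memcpy(insecure_pool + pool_used, secure_buf, len);
    long off = (long)pool_used;
    pool_used += len;
    return off;
}

/* Unmap after the device has written its reply: copy the data back
 * into secure memory the hypervisor cannot see. */
static void bounce_unmap(void *secure_buf, long off, size_t len)
{
    memcpy(secure_buf, insecure_pool + off, len);
}
```

The whole argument in the thread is about how virtio gets told to route its buffers through such a pool, not about the pool mechanism itself.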
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-05 1:10 ` Benjamin Herrenschmidt @ 2018-08-05 7:29 ` Christoph Hellwig -1 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-05 7:29 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Sun, Aug 05, 2018 at 11:10:15AM +1000, Benjamin Herrenschmidt wrote: > - One you have rejected, which is to have a way for "no-iommu" virtio > (which still doesn't use an iommu on the qemu side and doesn't need > to), to be forced to use some custom DMA ops on the VM side. > > - One, which sadly has more overhead and will require modifying more > pieces of the puzzle, which is to make qemu uses an emulated iommu. > Once we make qemu do that, we can then layer swiotlb on top of the > emulated iommu on the guest side, and pass that as dma_ops to virtio. Or number three: have a virtio feature bit that tells the VM to use whatever dma ops the platform thinks are appropriate for the bus it pretends to be on. Then set a dma-range that is limited to your secure memory range (if you really need it to be runtime enabled only after a device reset that rescans) and use the normal dma mapping code to bounce buffer. ^ permalink raw reply [flat|nested] 206+ messages in thread
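The "number three" proposal reduces, on the guest side, to a feature-bit check plus an address-window constraint. A hedged sketch in userspace C (`use_platform_dma` and `addr_in_dma_range` are illustrative names; only `VIRTIO_F_IOMMU_PLATFORM` and its bit number come from the virtio spec):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Feature bit number from the virtio 1.0 spec. */
#define VIRTIO_F_IOMMU_PLATFORM 33

/* If the device offers the bit, the guest must go through whatever DMA
 * ops its platform provides (possibly bounce buffering); otherwise it
 * may emit guest physical addresses directly. */
static bool use_platform_dma(uint64_t features)
{
    return (features >> VIRTIO_F_IOMMU_PLATFORM) & 1;
}

/* Model of the proposed dma-range restriction: every address handed to
 * the device must fall inside the window shared with the hypervisor. */
static bool addr_in_dma_range(uint64_t addr, uint64_t base, uint64_t size)
{
    return addr >= base && addr - base < size;
}
```

Under this scheme the normal DMA mapping code, seeing a dma-range smaller than guest memory, would bounce-buffer into the shared window automatically.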
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-05 7:29 ` Christoph Hellwig @ 2018-08-05 21:16 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-05 21:16 UTC (permalink / raw) To: Christoph Hellwig Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Sun, 2018-08-05 at 00:29 -0700, Christoph Hellwig wrote: > On Sun, Aug 05, 2018 at 11:10:15AM +1000, Benjamin Herrenschmidt wrote: > > - One you have rejected, which is to have a way for "no-iommu" virtio > > (which still doesn't use an iommu on the qemu side and doesn't need > > to), to be forced to use some custom DMA ops on the VM side. > > > > - One, which sadly has more overhead and will require modifying more > > pieces of the puzzle, which is to make qemu uses an emulated iommu. > > Once we make qemu do that, we can then layer swiotlb on top of the > > emulated iommu on the guest side, and pass that as dma_ops to virtio. > > Or number three: have a a virtio feature bit that tells the VM > to use whatever dma ops the platform thinks are appropinquate for > the bus it pretends to be on. Then set a dma-range that is limited > to your secure memory range (if you really need it to be runtime > enabled only after a device reset that rescans) and use the normal > dma mapping code to bounce buffer. Who would set this bit? qemu? Under what circumstances? What would be the effect of this bit while VIRTIO_F_IOMMU is NOT set, i.e., what would qemu do and what would Linux do? I'm not sure I fully understand your idea. I'm trying to understand because the limitation is not a device side limitation, it's not a qemu limitation, it's actually more of a VM limitation. It has most of its memory pages made inaccessible for security reasons.
The platform from a qemu/KVM perspective is almost entirely normal. So I don't understand: when would qemu set this bit, or should it be set by the VM at runtime? Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-05 21:16 ` Benjamin Herrenschmidt @ 2018-08-05 21:30 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-05 21:30 UTC (permalink / raw) To: Christoph Hellwig Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Mon, 2018-08-06 at 07:16 +1000, Benjamin Herrenschmidt wrote: > I'm trying to understand because the limitation is not a device side > limitation, it's not a qemu limitation, it's actually more of a VM > limitation. It has most of its memory pages made inaccessible for > security reasons. The platform from a qemu/KVM perspective is almost > entirely normal. In fact this is probably the best image of what's going on: It's a normal VM from a KVM/qemu perspective (and thus virtio). It boots normally, can run firmware, linux, etc... normally, it's not created with any different XML or qemu command line definition etc... It's just that once it reaches the kernel with the secure stuff enabled (could be via kexec from a normal kernel), that kernel will "stash away" most of the VM's memory into some secure space that nothing else (not even the hypervisor) can access. It can keep around a pool or two of normal memory for bounce buffering IOs but that's about it. I think that's the clearest way I could find to explain what's going on, and why I'm so resistant to adding things on the qemu side.
That said, we *can* (and will) notify KVM and qemu of the transition, and we can/will do so after virtio has been instantiated and used by the bootloader, but before it is used (or even probed) by the secure VM itself, so there's an opportunity to poke at things, either from the VM itself (a quirk poking at virtio config space for example) or from qemu (though I find the idea of iterating all virtio devices from qemu to change a setting rather gross). Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices @ 2018-08-05 21:30 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-05 21:30 UTC (permalink / raw) To: Christoph Hellwig Cc: robh, srikar, Michael S. Tsirkin, mpe, Will Deacon, linux-kernel, linuxram, virtualization, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Mon, 2018-08-06 at 07:16 +1000, Benjamin Herrenschmidt wrote: > I'm trying to understand because the limitation is not a device side > limitation, it's not a qemu limitation, it's actually more of a VM > limitation. It has most of its memory pages made inaccessible for > security reasons. The platform from a qemu/KVM perspective is almost > entirely normal. In fact this is probably the best image of what's going on: It's a normal VM from a KVM/qemu perspective (and thus virtio). It boots normally, can run firmware, linux, etc... normally, it's not created with any different XML or qemu command line definition etc... It just that once it reaches the kernel with the secure stuff enabled (could be via kexec from a normal kernel), that kernel will "stash away" most of the VM's memory into some secure space that nothing else (not even the hypervisor) can access. It can keep around a pool or two of normal memory for bounce buferring IOs but that's about it. I think that's the clearest way I could find to explain what's going on, and why I'm so resistant on adding things on qemu side. That said, we *can* (and will) notify KVM and qemu of the transition, and we can/will do so after virtio has been instanciated and used by the bootloader, but before it will be used (or even probed) by the secure VM itself, so there's an opportunity to poke at things, either from the VM itself (a quirk poking at virtio config space for example) or from qemu (though I find the idea of iterating all virtio devices from qemu to change a setting rather gross). Cheers, Ben. 
^ permalink raw reply [flat|nested] 206+ messages in thread
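The "pool of normal memory for bounce buffering" that Ben describes can be pictured with a small userspace sketch (not the kernel's actual swiotlb implementation; all names and sizes here are made up for illustration): the guest keeps a handful of hypervisor-visible slots, copies data into a slot before I/O, and copies results back afterwards.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4 KiB bounce slots carved out of the small "insecure"
 * pool the secure guest leaves accessible to the hypervisor. */
#define SLOT_SIZE 4096
#define NUM_SLOTS 8

static uint8_t shared_pool[NUM_SLOTS][SLOT_SIZE]; /* host-visible memory */
static int slot_busy[NUM_SLOTS];

/* "Map" a private buffer for device access: copy it into a free shared
 * slot and hand back the slot index (standing in for the bus address the
 * device would actually use). Returns -1 if no slot fits. */
static int bounce_map(const void *priv, size_t len)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (!slot_busy[i] && len <= SLOT_SIZE) {
            memcpy(shared_pool[i], priv, len);
            slot_busy[i] = 1;
            return i;
        }
    }
    return -1;
}

/* "Unmap" after the device is done: copy any results back into the
 * private (secure) buffer and release the slot. */
static void bounce_unmap(int slot, void *priv, size_t len)
{
    memcpy(priv, shared_pool[slot], len);
    slot_busy[slot] = 0;
}
```

The real swiotlb does this with DMA addresses and per-device constraints, but the data flow (copy out, let the device touch only the shared region, copy back) is the same.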
* Re: [RFC 0/4] Virtio uses DMA API for all devices
2018-08-05 21:16 ` Benjamin Herrenschmidt
@ 2018-08-06 9:42 ` Christoph Hellwig
0 siblings, 0 replies; 206+ messages in thread
From: Christoph Hellwig @ 2018-08-06 9:42 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier

On Mon, Aug 06, 2018 at 07:16:47AM +1000, Benjamin Herrenschmidt wrote:
> Who would set this bit? qemu? Under what circumstances?

I don't really care who sets what. The implementation might not even involve qemu.

It is your job to write a coherent interface specification that does not depend on the components used. The hypervisor might be PAPR, Linux + qemu, VMware, Hyper-V, or something so secret that you'd have to shoot me if you had to tell me. The guest might be Linux, FreeBSD, AIX, OS/400 or a hipster project of the day in Rust. As long as we properly specify the interface it simply does not matter.

> What would be the effect of this bit while VIRTIO_F_IOMMU is NOT set,
> ie, what would qemu do and what would Linux do? I'm not sure I fully
> understand your idea.

In a perfect world we'd just reuse VIRTIO_F_IOMMU and clarify the description, which currently is rather vague but basically captures the use case. Currently it is:

VIRTIO_F_IOMMU_PLATFORM(33)
    This feature indicates that the device is behind an IOMMU that translates bus addresses from the device into physical addresses in memory. If this feature bit is set to 0, then the device emits physical addresses which are not translated further, even though an IOMMU may be present.

And I'd change it to something like:

VIRTIO_F_PLATFORM_DMA(33)
    This feature indicates that the device emits platform specific bus addresses that might not be identical to physical addresses. The translation of physical to bus address is platform specific and defined by the platform specification for the bus that the virtio device is attached to. If this feature bit is set to 0, then the device emits physical addresses which are not translated further, even if the platform would normally require translations for the bus that the virtio device is attached to.

If we can't change the definition any more we should deprecate the old VIRTIO_F_IOMMU_PLATFORM bit, and require that VIRTIO_F_IOMMU_PLATFORM and VIRTIO_F_PLATFORM_DMA not be set at the same time.

> I'm trying to understand because the limitation is not a device side
> limitation, it's not a qemu limitation, it's actually more of a VM
> limitation. It has most of its memory pages made inaccessible for
> security reasons. The platform from a qemu/KVM perspective is almost
> entirely normal.

Well, find a way to describe this either in the qemu specification using new feature bits, or by using something like the above.

^ permalink raw reply [flat|nested] 206+ messages in thread
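Christoph's fallback proposal, deprecating the old bit and forbidding both bits together, amounts to a simple validity check during feature negotiation. A minimal sketch (the VIRTIO_F_PLATFORM_DMA bit number is hypothetical; only VIRTIO_F_IOMMU_PLATFORM's bit 33 comes from the actual spec):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_IOMMU_PLATFORM (1ULL << 33) /* per virtio 1.0 spec */
#define VIRTIO_F_PLATFORM_DMA   (1ULL << 34) /* hypothetical bit number */

/* Reject feature sets that advertise both the deprecated bit and its
 * proposed replacement, as Christoph's wording requires. */
static bool features_valid(uint64_t features)
{
    return !((features & VIRTIO_F_IOMMU_PLATFORM) &&
             (features & VIRTIO_F_PLATFORM_DMA));
}
```

Either bit alone (or neither) would remain a legal combination; only the pair together is rejected.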
* Re: [RFC 0/4] Virtio uses DMA API for all devices
2018-08-06 9:42 ` Christoph Hellwig
@ 2018-08-06 19:52 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 206+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-06 19:52 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier

On Mon, 2018-08-06 at 02:42 -0700, Christoph Hellwig wrote:
> On Mon, Aug 06, 2018 at 07:16:47AM +1000, Benjamin Herrenschmidt wrote:
> > Who would set this bit? qemu? Under what circumstances?
>
> I don't really care who sets what. The implementation might not even
> involve qemu.
>
> It is your job to write a coherent interface specification that does
> not depend on the components used. The hypervisor might be PAPR,
> Linux + qemu, VMware, Hyper-V, or something so secret that you'd have
> to shoot me if you had to tell me. The guest might be Linux, FreeBSD,
> AIX, OS/400 or a hipster project of the day in Rust. As long as we
> properly specify the interface it simply does not matter.

That's the point, Christoph. The interface is today's interface. It does NOT change. That information is not part of the interface.

It's the VM itself that is stashing away its memory in a secret place, and thus needs to do bounce buffering. There is no change to the virtio interface per se.

> > What would be the effect of this bit while VIRTIO_F_IOMMU is NOT set,
> > ie, what would qemu do and what would Linux do? I'm not sure I fully
> > understand your idea.
>
> In a perfect world we'd just reuse VIRTIO_F_IOMMU and clarify the
> description, which currently is rather vague but basically captures
> the use case. Currently it is:
>
> VIRTIO_F_IOMMU_PLATFORM(33)
>     This feature indicates that the device is behind an IOMMU that
>     translates bus addresses from the device into physical addresses in
>     memory. If this feature bit is set to 0, then the device emits
>     physical addresses which are not translated further, even though an
>     IOMMU may be present.
>
> And I'd change it to something like:
>
> VIRTIO_F_PLATFORM_DMA(33)
>     This feature indicates that the device emits platform specific
>     bus addresses that might not be identical to physical addresses.
>     The translation of physical to bus address is platform specific
>     and defined by the platform specification for the bus that the virtio
>     device is attached to. If this feature bit is set to 0, then the
>     device emits physical addresses which are not translated further,
>     even if the platform would normally require translations for the bus
>     that the virtio device is attached to.
>
> If we can't change the definition any more we should deprecate the
> old VIRTIO_F_IOMMU_PLATFORM bit, and require that VIRTIO_F_IOMMU_PLATFORM
> and VIRTIO_F_PLATFORM_DMA not be set at the same time.

But this doesn't really change our problem, does it? None of what happens in our case is part of the "interface". The suggestion to force the iommu ON was simply a "workaround": by doing so, we get to override the DMA ops, but that's just a trick. Fundamentally, what we need to solve is pretty much entirely a guest problem.

> > I'm trying to understand because the limitation is not a device side
> > limitation, it's not a qemu limitation, it's actually more of a VM
> > limitation. It has most of its memory pages made inaccessible for
> > security reasons. The platform from a qemu/KVM perspective is almost
> > entirely normal.
>
> Well, find a way to describe this either in the qemu specification using
> new feature bits, or by using something like the above.

But again, why do you want to involve the interface, and thus the hypervisor, for something that is essentially what the guest is doing to itself? It really is something we need to solve locally to the guest; it's not part of the interface.

Cheers,
Ben.

^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
2018-08-06 19:52 ` Benjamin Herrenschmidt
@ 2018-08-07 6:21 ` Christoph Hellwig
2018-08-07 6:42 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 206+ messages in thread
From: Christoph Hellwig @ 2018-08-07 6:21 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier

On Tue, Aug 07, 2018 at 05:52:12AM +1000, Benjamin Herrenschmidt wrote:
> > It is your job to write a coherent interface specification that does
> > not depend on the components used. The hypervisor might be PAPR,
> > Linux + qemu, VMware, Hyper-V, or something so secret that you'd have
> > to shoot me if you had to tell me. The guest might be Linux, FreeBSD,
> > AIX, OS/400 or a hipster project of the day in Rust. As long as we
> > properly specify the interface it simply does not matter.
>
> That's the point, Christoph. The interface is today's interface. It does
> NOT change. That information is not part of the interface.
>
> It's the VM itself that is stashing away its memory in a secret place,
> and thus needs to do bounce buffering. There is no change to the virtio
> interface per se.

Any guest that doesn't know about your magic limited addressing is simply not going to work, so we need to communicate that fact.

^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
2018-08-07 6:21 ` Christoph Hellwig
@ 2018-08-07 6:42 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 206+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-07 6:42 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier

On Mon, 2018-08-06 at 23:21 -0700, Christoph Hellwig wrote:
> On Tue, Aug 07, 2018 at 05:52:12AM +1000, Benjamin Herrenschmidt wrote:
> > > It is your job to write a coherent interface specification that does
> > > not depend on the components used. The hypervisor might be PAPR,
> > > Linux + qemu, VMware, Hyper-V, or something so secret that you'd have
> > > to shoot me if you had to tell me. The guest might be Linux, FreeBSD,
> > > AIX, OS/400 or a hipster project of the day in Rust. As long as we
> > > properly specify the interface it simply does not matter.
> >
> > That's the point, Christoph. The interface is today's interface. It does
> > NOT change. That information is not part of the interface.
> >
> > It's the VM itself that is stashing away its memory in a secret place,
> > and thus needs to do bounce buffering. There is no change to the virtio
> > interface per se.
>
> Any guest that doesn't know about your magic limited addressing is simply
> not going to work, so we need to communicate that fact.

The guest does. It's the guest itself that initiates it. That's my point: it's not a factor of the hypervisor, which is unchanged in that area. It's the guest itself that makes the decision early on to stash its memory away in a secure place, and thus needs to establish some kind of bounce buffering via a few left-over "insecure" pages.

It's all done by the guest: initiated by the guest and controlled by the guest. That's why I don't see why this specifically needs to involve the hypervisor side, and thus a VIRTIO feature bit.

Note that I can make it so that the same DMA ops (basically standard swiotlb ops without arch hacks) work for both "direct virtio" and "normal PCI" devices. The trick is simply for the arch to set up the iommu to map the swiotlb bounce buffer pool 1:1 in the iommu, so the iommu can essentially be ignored without affecting the physical addresses.

If I do that, *all* I need is a way, from the guest itself (again, the other side doesn't know anything about it), to force virtio to use the DMA ops as if there were an iommu, that is, use whatever dma ops were set up by the platform for the pci device.

Cheers,
Ben.

^ permalink raw reply [flat|nested] 206+ messages in thread
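Ben's trick of identity-mapping the bounce pool can be pictured with a toy page-granular translation table (purely illustrative; real iommu tables, page sizes, and window setup differ): after the 1:1 mapping, translating any bus address inside the pool yields the same physical address, so the iommu becomes a no-op for bounce-buffer traffic.

```c
#include <assert.h>
#include <stdint.h>

/* Toy page-granular "iommu" table: iova page number -> physical page
 * address. 0 means unmapped. Sizes here are made up. */
#define PAGE_SHIFT  12
#define TABLE_PAGES 1024

static uint64_t iommu_table[TABLE_PAGES];

/* Identity-map [start, start + len) so that bus address == physical
 * address for the bounce pool, as described in the mail above. */
static void iommu_map_identity(uint64_t start, uint64_t len)
{
    for (uint64_t a = start; a < start + len; a += 1ULL << PAGE_SHIFT)
        iommu_table[a >> PAGE_SHIFT] = a;
}

/* Translate a bus address through the table, keeping the page offset. */
static uint64_t iommu_translate(uint64_t iova)
{
    return iommu_table[iova >> PAGE_SHIFT] |
           (iova & ((1ULL << PAGE_SHIFT) - 1));
}
```

With the pool mapped this way, the same swiotlb-style DMA ops produce correct addresses whether or not the device's traffic actually goes through the iommu.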
* Re: [RFC 0/4] Virtio uses DMA API for all devices
2018-08-07 6:42 ` Benjamin Herrenschmidt
@ 2018-08-07 13:55 ` Christoph Hellwig
2018-08-07 20:32 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 206+ messages in thread
From: Christoph Hellwig @ 2018-08-07 13:55 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier

On Tue, Aug 07, 2018 at 04:42:44PM +1000, Benjamin Herrenschmidt wrote:
> Note that I can make it so that the same DMA ops (basically standard
> swiotlb ops without arch hacks) work for both "direct virtio" and
> "normal PCI" devices.
>
> The trick is simply for the arch to set up the iommu to map the swiotlb
> bounce buffer pool 1:1 in the iommu, so the iommu can essentially be
> ignored without affecting the physical addresses.
>
> If I do that, *all* I need is a way, from the guest itself (again, the
> other side doesn't know anything about it), to force virtio to use the
> DMA ops as if there were an iommu, that is, use whatever dma ops were
> set up by the platform for the pci device.

In that case just setting VIRTIO_F_IOMMU_PLATFORM in the flags should do the work (even if that isn't strictly what the current definition of the flag actually means). On the qemu side you'll need to make sure you have a way to set VIRTIO_F_IOMMU_PLATFORM without emulating an iommu, but with code to take dma offsets into account if your platform has any (various power platforms seem to have them; not sure if it affects your config).

^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
2018-08-07 13:55 ` Christoph Hellwig
@ 2018-08-07 20:32 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 206+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-07 20:32 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier

On Tue, 2018-08-07 at 06:55 -0700, Christoph Hellwig wrote:
> On Tue, Aug 07, 2018 at 04:42:44PM +1000, Benjamin Herrenschmidt wrote:
> > Note that I can make it so that the same DMA ops (basically standard
> > swiotlb ops without arch hacks) work for both "direct virtio" and
> > "normal PCI" devices.
> >
> > The trick is simply for the arch to set up the iommu to map the swiotlb
> > bounce buffer pool 1:1 in the iommu, so the iommu can essentially be
> > ignored without affecting the physical addresses.
> >
> > If I do that, *all* I need is a way, from the guest itself (again, the
> > other side doesn't know anything about it), to force virtio to use the
> > DMA ops as if there were an iommu, that is, use whatever dma ops were
> > set up by the platform for the pci device.
>
> In that case just setting VIRTIO_F_IOMMU_PLATFORM in the flags should
> do the work (even if that isn't strictly what the current definition
> of the flag actually means). On the qemu side you'll need to make
> sure you have a way to set VIRTIO_F_IOMMU_PLATFORM without emulating
> an iommu, but with code to take dma offsets into account if your
> platform has any (various power platforms seem to have them; not sure
> if it affects your config).

Something like that, yes. I prefer a slightly different way, see below, but in both cases it should alleviate your concerns, since it means there would be no particular mucking around with DMA ops at all: virtio would just use whatever "normal" ops we establish for all PCI devices on that platform, which will be standard ones (swiotlb ones today, and the new "integrated" ones you're cooking tomorrow).

As for the flag itself, while we could set it from qemu when we get notified that the guest is going secure, both Michael and I think it's rather gross: it requires qemu to go iterate all virtio devices and "poke" something into them. It also means qemu will need some other internal nasty flag that says "set that bit but don't do iommu".

It's nicer if we have a way in the guest virtio driver to do something along the lines of

	if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())

which would have the same effect and means the issue is entirely contained in the guest.

Cheers,
Ben.

^ permalink raw reply [flat|nested] 206+ messages in thread
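Ben's proposed guest-side check can be sketched as a small standalone C model (arch_virtio_wants_dma_ops() is his hypothetical per-arch hook, not an existing kernel function; the global standing in for the arch's secure-mode state is likewise made up): the device's feature bit and the arch override are simply OR-ed together.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_IOMMU_PLATFORM (1ULL << 33) /* per virtio 1.0 spec */

/* Stand-in for the arch's real state: on a secure pseries guest this
 * would become true once the guest has stashed its memory away. */
static bool secure_guest_mode;

/* Hypothetical per-arch hook proposed in the mail above. */
static bool arch_virtio_wants_dma_ops(void)
{
    return secure_guest_mode;
}

/* Guest-side decision: use the platform DMA ops (and thus bounce
 * buffering) either because the device asked for it via the feature
 * bit, or because the arch demands it regardless of the device. */
static bool virtio_use_dma_api(uint64_t flags)
{
    return (flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops();
}
```

The point of the design is visible in the second operand: the hypervisor never has to learn about the secure transition for the guest to switch itself over to DMA-API-based (bounce-buffered) virtio.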
* Re: [RFC 0/4] Virtio uses DMA API for all devices
2018-08-07 20:32 ` Benjamin Herrenschmidt
@ 2018-08-08 6:31 ` Christoph Hellwig
2018-08-08 10:07 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 206+ messages in thread
From: Christoph Hellwig @ 2018-08-08 6:31 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, srikar, Michael S. Tsirkin, mpe, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, jean-philippe.brucker, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual

On Wed, Aug 08, 2018 at 06:32:45AM +1000, Benjamin Herrenschmidt wrote:
> As for the flag itself, while we could set it from qemu when we get
> notified that the guest is going secure, both Michael and I think it's
> rather gross: it requires qemu to go iterate all virtio devices and
> "poke" something into them.

You don't need to set them at the time you go secure. You just need to set the flag from the beginning on any VM you might want to go secure. Or for simplicity just any VM: if the DT/ACPI tables exposed by qemu are good enough, that will always exclude an iommu and not set a DMA offset, so nothing will change on the qemu side of the processing, and with the new direct calls for the direct dma ops, performance in the guest won't change either.

> It's nicer if we have a way in the guest virtio driver to do something
> along the lines of
>
> 	if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
>
> which would have the same effect and means the issue is entirely
> contained in the guest.

It would not be the same effect. The problem with that is that you must now assume that your qemu knows that, for example, you might be passing a dma offset if the bus otherwise requires it. Or in other words: you potentially break the contract between qemu and the guest of always passing down physical addresses. If we explicitly change that contract through a flag that says you pass bus addresses, everything is fine.

Note that in practice your scheme will probably just work for your initial prototype, but chances are it will get us in trouble later on.

^ permalink raw reply [flat|nested] 206+ messages in thread
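The contract Christoph is worried about is the distinction between CPU physical addresses and bus addresses. A minimal sketch of a constant-offset translation, the simplest form such a platform difference takes (the offset value is made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Some platforms translate CPU physical addresses to bus addresses by
 * a constant offset. A guest that bypasses the DMA API and hands the
 * device raw physical addresses silently breaks on any such platform,
 * which is why the translation has to be part of the stated contract. */
#define DMA_OFFSET 0x80000000ULL /* hypothetical platform offset */

static uint64_t phys_to_bus(uint64_t phys) { return phys + DMA_OFFSET; }
static uint64_t bus_to_phys(uint64_t bus)  { return bus - DMA_OFFSET; }
```

When the offset is zero the two address spaces coincide, which is exactly why a scheme that "just works" on one platform can hide a latent bug on another.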
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-07 20:32 ` Benjamin Herrenschmidt (?) (?) @ 2018-08-08 6:31 ` Christoph Hellwig 2018-08-08 10:07 ` Benjamin Herrenschmidt -1 siblings, 1 reply; 206+ messages in thread From: Christoph Hellwig @ 2018-08-08 6:31 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Wed, Aug 08, 2018 at 06:32:45AM +1000, Benjamin Herrenschmidt wrote: > As for the flag itself, while we could set it from qemu when we get > notified that the guest is going secure, both Michael and I think it's > rather gross, it requires qemu to go iterate all virtio devices and > "poke" something into them. You don't need to set them the time you go secure. You just need to set the flag from the beginning on any VM you might want to go secure. Or for simplicity just any VM - if the DT/ACPI tables exposed by qemu are good enough that will always exclude a iommu and not set a DMA offset, so nothing will change on the qemu side of he processing, and with the new direct calls for the direct dma ops performance in the guest won't change either. > It's nicer if we have a way in the guest virtio driver to do something > along the lines of > > if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops()) > > Which would have the same effect and means the issue is entirely > contained in the guest. It would not be the same effect. The problem with that is that you must now assumes that your qemu knows that for example you might be passing a dma offset if the bus otherwise requires it. Or in other words: you potentially break the contract between qemu and the guest of always passing down physical addresses. 
If we explicitly change that contract through a flag that says you pass bus addresses, everything is fine. Note that in practice your scheme will probably just work for your initial prototype, but chances are it will get us in trouble later on. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-08 6:31 ` Christoph Hellwig @ 2018-08-08 10:07 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-08 10:07 UTC (permalink / raw) To: Christoph Hellwig Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Tue, 2018-08-07 at 23:31 -0700, Christoph Hellwig wrote: > > You don't need to set them at the time you go secure. You just need to > set the flag from the beginning on any VM you might want to go secure. > Or for simplicity just any VM - if the DT/ACPI tables exposed by > qemu are good enough that will always exclude an iommu and not set a > DMA offset, so nothing will change on the qemu side of the processing, > and with the new direct calls for the direct dma ops performance in > the guest won't change either. So that's where I'm not sure things are "good enough" due to how pseries works (remember it's paravirtualized). A pseries system starts with a default iommu on all devices that does translation using 4k entries with a "pinhole" window (usually 2G with qemu iirc). There's no "pass through" by default. Qemu virtio bypasses that iommu when the VIRTIO_F_IOMMU_PLATFORM flag is not set (default) but there's nothing in the device-tree to tell the guest about this since it's a violation of our pseries architecture, so we just rely on Linux virtio "knowing" that it happens. It's a bit yucky but that's now history... Essentially pseries "architecturally" does not have the concept of not having an iommu in the way, and qemu violates that architecture today. (Remember it comes from pHyp, our proprietary HV, which we are somewhat mimicking here).
So if we always set VIRTIO_F_IOMMU_PLATFORM, it *will* force all virtio through that iommu and performance will suffer (esp vhost I suspect), especially since adding/removing translations in the iommu is a hypercall. Now, we do have HV APIs to create a second window that's "permanently mapped" to the guest memory, thus avoiding dynamic map/unmaps, and Linux can make use of this but I don't know if that works with qemu and the performance impact with vhost. So the situation isn't that great.... On the other hand, I think the other approach works for us: > > It's nicer if we have a way in the guest virtio driver to do something > > along the lines of > > > > if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops()) > > > > Which would have the same effect and means the issue is entirely > > contained in the guest. > > It would not be the same effect. The problem with that is that you must > now assumes that your qemu knows that for example you might be passing > a dma offset if the bus otherwise requires it. I would assume that arch_virtio_wants_dma_ops() only returns true when no such offsets are involved, at least in our case that would be what happens. > Or in other words: > you potentially break the contract between qemu and the guest of always > passing down physical addresses. If we explicitly change that contract > through using a flag that says you pass bus address everything is fine. For us a "bus address" is behind the iommu so that's what VIRTIO_F_IOMMU_PLATFORM does already. We don't have the concept of a bus address that is different. I suppose it's an ARMism to have DMA offsets that are separate from iommus ? > Note that in practice your scheme will probably just work for your > initial prototype, but chances are it will get us in trouble later on. Not on pseries, at least not in any way I can think of mind you... but maybe other architectures would abuse it... 
We could add a WARN_ON if that call returns true on a bus with an offset, I suppose. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-08 10:07 ` Benjamin Herrenschmidt (?) @ 2018-08-08 12:30 ` Christoph Hellwig 2018-08-08 13:18 ` Benjamin Herrenschmidt -1 siblings, 1 reply; 206+ messages in thread From: Christoph Hellwig @ 2018-08-08 12:30 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Christoph Hellwig, Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Wed, Aug 08, 2018 at 08:07:49PM +1000, Benjamin Herrenschmidt wrote: > Qemu virtio bypasses that iommu when the VIRTIO_F_IOMMU_PLATFORM flag > is not set (default) but there's nothing in the device-tree to tell the > guest about this since it's a violation of our pseries architecture, so > we just rely on Linux virtio "knowing" that it happens. It's a bit > yucky but that's now history... That is ugly as hell, but it is how virtio works everywhere, so nothing special so far. > Essentially pseries "architecturally" does not have the concept of not > having an iommu in the way and qemu violates that architecture today. > > (Remember it comes from pHyp, our proprietary HV, which we are somewhat > mimicking here). It shouldn't be too hard to have a dt property that communicates this, should it? > So if we always set VIRTIO_F_IOMMU_PLATFORM, it *will* force all virtio > through that iommu and performance will suffer (esp vhost I suspect), > especially since adding/removing translations in the iommu is a > hypercall. Well, we'd need to make sure that for this particular bus we skip the actual iommu. > > It would not be the same effect. The problem with that is that you must > > now assume that your qemu knows that, for example, you might be passing > > a dma offset if the bus otherwise requires it.
> > I would assume that arch_virtio_wants_dma_ops() only returns true when > no such offsets are involved, at least in our case that would be what > happens. That would work, but we're really piling hacks on top of hacks here. > > Or in other words: > > you potentially break the contract between qemu and the guest of always > > passing down physical addresses. If we explicitly change that contract > > through using a flag that says you pass bus addresses, everything is fine. > > For us a "bus address" is behind the iommu so that's what > VIRTIO_F_IOMMU_PLATFORM does already. We don't have the concept of a > bus address that is different. I suppose it's an ARMism to have DMA > offsets that are separate from iommus ? No, a lot of platforms support a bus address that has an offset from the physical address, including a lot of power platforms: arch/powerpc/kernel/pci-common.c: set_dma_offset(&dev->dev, PCI_DRAM_OFFSET); arch/powerpc/platforms/cell/iommu.c: set_dma_offset(dev, cell_dma_nommu_offset); arch/powerpc/platforms/cell/iommu.c: set_dma_offset(dev, addr); arch/powerpc/platforms/powernv/pci-ioda.c: set_dma_offset(&pdev->dev, pe->tce_bypass_base); arch/powerpc/platforms/powernv/pci-ioda.c: set_dma_offset(&pdev->dev, (1ULL << 32)); arch/powerpc/platforms/powernv/pci-ioda.c: set_dma_offset(&dev->dev, pe->tce_bypass_base); arch/powerpc/platforms/pseries/iommu.c: set_dma_offset(dev, dma_offset); arch/powerpc/sysdev/dart_iommu.c: set_dma_offset(&dev->dev, DART_U4_BYPASS_BASE); arch/powerpc/sysdev/fsl_pci.c: set_dma_offset(dev, pci64_dma_offset); to make things worse some platforms (at least on arm/arm64/mips/x86) can also require additional banking where it isn't even a single linear map but multiple windows. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-08 12:30 ` Christoph Hellwig @ 2018-08-08 13:18 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-08 13:18 UTC (permalink / raw) To: Christoph Hellwig Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Wed, 2018-08-08 at 05:30 -0700, Christoph Hellwig wrote: > On Wed, Aug 08, 2018 at 08:07:49PM +1000, Benjamin Herrenschmidt wrote: > > Qemu virtio bypasses that iommu when the VIRTIO_F_IOMMU_PLATFORM flag > > is not set (default) but there's nothing in the device-tree to tell the > > guest about this since it's a violation of our pseries architecture, so > > we just rely on Linux virtio "knowing" that it happens. It's a bit > > yucky but that's now history... > > That is ugly as hell, but it is how virtio works everywhere, so nothing > special so far. Yup. > > Essentially pseries "architecturally" does not have the concept of not > > having an iommu in the way and qemu violates that architecture today. > > > > (Remember it comes from pHyp, our priorietary HV, which we are somewhat > > mimmicing here). > > It shouldnt be too hard to have a dt property that communicates this, > should it? We could invent something I suppose. The additional problem then (yeah I know ... what a mess) is that qemu doesn't create the DT for PCI devices, the firmware (SLOF) inside the guest does using normal PCI probing. That said, that FW could know about all the virtio vendor/device IDs, check the VIRTIO_F_IOMMU_PLATFORM and set that property accordingly... messy but doable. It's not a bus property (see my other reply below as this could complicate things with your bus mask). 
But we are drifting from the problem at hand :-) You propose we do set VIRTIO_F_IOMMU_PLATFORM so we aren't in the above case, and the bypass stuff works, so no need to touch it. See my recap at the end of the email to make sure I understand fully what you suggest. > > So if we always set VIRTIO_F_IOMMU_PLATFORM, it *will* force all virtio > > through that iommu and performance will suffer (esp vhost I suspect), > > especially since adding/removing translations in the iommu is a > > hypercall. > Well, we'd nee to make sure that for this particular bus we skip the > actualy iommu. It's not a bus property. Qemu will happily mix up everything on the same bus, that includes emulated devices that go through the emulated iommu, real VFIO devices that go through an actual HW iommu and virtio that bypasses everything. This makes things tricky in general (not just in my powerpc secure VM case) since, at least on powerpc but I suppose elsewhere too, iommu related properties tend to be per "bus" while here, qemu will mix and match. But again, I think we are drifting away from the topic, see below > > > It would not be the same effect. The problem with that is that you must > > > now assumes that your qemu knows that for example you might be passing > > > a dma offset if the bus otherwise requires it. > > > > I would assume that arch_virtio_wants_dma_ops() only returns true when > > no such offsets are involved, at least in our case that would be what > > happens. > > That would work, but we're really piling hacĸs ontop of hacks here. Sort-of :-) At least none of what we are discussing now involves touching the dma_ops themselves so we are not in the way of your big cleanup operation here. But yeah, let's continue discussing your other solution below. > > > Or in other words: > > > you potentially break the contract between qemu and the guest of always > > > passing down physical addresses. 
If we explicitly change that contract > > > through using a flag that says you pass bus addresses, everything is fine. > > > > For us a "bus address" is behind the iommu so that's what > > VIRTIO_F_IOMMU_PLATFORM does already. We don't have the concept of a > > bus address that is different. I suppose it's an ARMism to have DMA > > offsets that are separate from iommus ? > > No, a lot of platforms support a bus address that has an offset from > the physical address, including a lot of power platforms: Ok, just talking past each other :-) For all the powerpc ones, these *do* go through the iommu, which is what I meant. It's just a window of the iommu that provides some kind of direct mapping of memory. For pseries, there is no such thing however. What we do to avoid constant map/unmap of iommu PTEs in pseries guests is that we use hypercalls to create a 64-bit window and populate all its PTEs with an identity mapping. But that's not as efficient as a real bypass. There are good historical reasons for that, since pseries is a guest platform, its memory is never really where the guest thinks it is, so you always need an iommu to remap. Even for virtual devices, since for most of them, in the "IBM" pHyp model, the "peer" is actually another partition, so the virtual iommu handles translating across the two partitions. Same goes with cell in HW, no real bypass, just the iommu being configured with very large pages and a fixed mapping. powernv has a separate physical window that can be configured as a real bypass though, so does the U4 DART. Not sure about the FSL one. But yeah, your point stands, this is just implementation details.
> arch/powerpc/kernel/pci-common.c: set_dma_offset(&dev->dev, PCI_DRAM_OFFSET); > arch/powerpc/platforms/cell/iommu.c: set_dma_offset(dev, cell_dma_nommu_offset); > arch/powerpc/platforms/cell/iommu.c: set_dma_offset(dev, addr); > arch/powerpc/platforms/powernv/pci-ioda.c: set_dma_offset(&pdev->dev, pe->tce_bypass_base); > arch/powerpc/platforms/powernv/pci-ioda.c: set_dma_offset(&pdev->dev, (1ULL << 32)); > arch/powerpc/platforms/powernv/pci-ioda.c: set_dma_offset(&dev->dev, pe->tce_bypass_base); > arch/powerpc/platforms/pseries/iommu.c: set_dma_offset(dev, dma_offset); > arch/powerpc/sysdev/dart_iommu.c: set_dma_offset(&dev->dev, DART_U4_BYPASS_BASE); > arch/powerpc/sysdev/fsl_pci.c: set_dma_offset(dev, pci64_dma_offset); > > to make things worse some platforms (at least on arm/arm64/mips/x86) can > also require additional banking where it isn't even a single linear map > but multiple windows. Sure, but all of this is just the configuration of the iommu. But I think we agree here, and your point remains valid, indeed my proposed hack: > if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops()) Will only work if the IOMMU and non-IOMMU paths are completely equivalent. We can provide that guarantee for our secure VM case, but not generally, so if we were to go down the route of a quirk in virtio, it might be better to make it painfully obvious that it's specific to that one case with a different kind of turd: - if (xen_domain()) + if (xen_domain() || pseries_secure_vm()) return true; So to summarize, and make sure I'm not missing something, the three approaches at hand are: 1- The above, which is a one liner and contained in the guest, so that's nice, but also means another turd in virtio which isn't ...
2- We force pseries to always set VIRTIO_F_IOMMU_PLATFORM, but with the current architecture on our side that will force virtio to always go through an emulated iommu, as pseries doesn't have the concept of a real bypass window, and thus will impact performance for both secure and non-secure VMs. 3- Invent a property that can be put in selected PCI device tree nodes that indicates that for that device specifically, the iommu can be bypassed, along with a hypercall to turn that bypass on/off. Virtio would then use VIRTIO_F_IOMMU_PLATFORM but its DT nodes would also have that property and Linux would notice it and turn bypass on. The resulting properties of those options are: 1- Is what I want because it's the simplest, provides the best performance now, and works without code changes to qemu or non-secure Linux. However it does add a tiny turd to virtio which is annoying. 2- This works but it puts the iommu in the way always, thus reducing virtio performance across the board for pseries unless we only do that for secure VMs, but that is difficult (as discussed earlier). 3- This would recover the performance lost in -2-, however it requires qemu *and* guest changes. Specifically, existing guests (RHEL 7 etc...) would get the performance hit of -2- unless modified to call that 'enable bypass' call, which isn't great. So imho we have to choose one of 3 not-great solutions here... Unless I missed something in your ideas of course. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-08 13:18 ` Benjamin Herrenschmidt (?) @ 2018-08-08 20:31 ` Michael S. Tsirkin 2018-08-08 22:13 ` Benjamin Herrenschmidt -1 siblings, 1 reply; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-08 20:31 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Wed, Aug 08, 2018 at 11:18:13PM +1000, Benjamin Herrenschmidt wrote: > Sure, but all of this is just the configuration of the iommu. But I > think we agree here, and your point remains valid, indeed my proposed > hack: > > > if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops()) > > Will only work if the IOMMU and non-IOMMU paths are completely equivalent. > > We can provide that guarantee for our secure VM case, but not generally so if > we were to go down the route of a quirk in virtio, it might be better to > make it painfully obvious that it's specific to that one case with a different > kind of turd: > > - if (xen_domain()) > + if (xen_domain() || pseries_secure_vm()) > return true; I don't think it's pseries specific actually. E.g. I suspect AMD SEV might benefit from the same kind of hack. > So to summarize, and make sure I'm not missing something, the three approaches > at hand are: > > 1- The above, which is a one liner and contained in the guest, so that's nice, but > also means another turd in virtio which isn't ... > > 2- We force pseries to always set VIRTIO_F_IOMMU_PLATFORM, but with the current > architecture on our side that will force virtio to always go through an emulated > iommu, as pseries doesn't have the concept of a real bypass window, and thus will > impact performance for both secure and non-secure VMs.
> > 3- Invent a property that can be put in selected PCI device tree nodes that > indicates that for that device specifically, the iommu can be bypassed, along with > a hypercall to turn that bypass on/off. Virtio would then use VIRTIO_F_IOMMU_PLATFORM > but its DT nodes would also have that property and Linux would notice it and turn > bypass on. For completeness, virtio could also have its own bounce buffer outside of the DMA API one. I don't see lots of benefits to this though. > The resulting properties of those options are: > > 1- Is what I want because it's the simplest, provides the best performance now, > and works without code changes to qemu or non-secure Linux. However it does > add a tiny turd to virtio which is annoying. > > 2- This works but it puts the iommu in the way always, thus reducing virtio performance > across the board for pseries unless we only do that for secure VMs but that is > difficult (as discussed earlier). > > 3- This would recover the performance lost in -2-, however it requires qemu *and* > guest changes. Specifically, existing guests (RHEL 7 etc...) would get the > performance hit of -2- unless modified to call that 'enable bypass' call, which > isn't great. > > So imho we have to choose one of 3 not-great solutions here... Unless I missed > something in your ideas of course. > > Cheers, > Ben. > > ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-08 20:31 ` Michael S. Tsirkin
@ 2018-08-08 22:13 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 206+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-08 22:13 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization,
    linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang,
    mpe, linuxram, haren, paulus, srikar, robin.murphy,
    jean-philippe.brucker, marc.zyngier

On Wed, 2018-08-08 at 23:31 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 08, 2018 at 11:18:13PM +1000, Benjamin Herrenschmidt wrote:
> > Sure, but all of this is just the configuration of the iommu. But I
> > think we agree here, and your point remains valid, indeed my proposed
> > hack:
> >
> > >	if ((flags & VIRTIO_F_IOMMU_PLATFORM) || arch_virtio_wants_dma_ops())
> >
> > will only work if the IOMMU and non-IOMMU paths are completely equivalent.
> >
> > We can provide that guarantee for our secure VM case, but not generally, so if
> > we were to go down the route of a quirk in virtio, it might be better to
> > make it painfully obvious that it's specific to that one case with a different
> > kind of turd:
> >
> > -	if (xen_domain())
> > +	if (xen_domain() || pseries_secure_vm())
> > 		return true;
>
> I don't think it's pseries specific actually. E.g. I suspect AMD SEV
> might benefit from the same kind of hack.

As long as they can provide the same guarantee that the DMA ops are
completely equivalent between virtio and other PCI devices, at least on
the same bus, ie, we don't have to go hack special DMA ops.

I think the latter is really what Christoph wants to avoid for good
reasons.

> > So to summarize, and make sure I'm not missing something, the three approaches
> > at hand are:
> >
> > 1- The above, which is a one-liner and contained in the guest, so that's nice, but
> > also means another turd in virtio which isn't ...
> >
> > 2- We force pseries to always set VIRTIO_F_IOMMU_PLATFORM, but with the current
> > architecture on our side that will force virtio to always go through an emulated
> > iommu, as pseries doesn't have the concept of a real bypass window, and thus will
> > impact performance for both secure and non-secure VMs.
> >
> > 3- Invent a property that can be put in selected PCI device tree nodes that
> > indicates that for that device specifically, the iommu can be bypassed, along with
> > a hypercall to turn that bypass on/off. Virtio would then use VIRTIO_F_IOMMU_PLATFORM,
> > but its DT nodes would also have that property, and Linux would notice it and turn
> > bypass on.
>
> For completeness, virtio could also have its own bounce buffer
> outside of the DMA API one. I don't see lots of benefits to this
> though.

Not a fan of that either...

> > The resulting properties of those options are:
> >
> > 1- Is what I want because it's the simplest, provides the best performance now,
> > and works without code changes to qemu or non-secure Linux. However it does
> > add a tiny turd to virtio which is annoying.
> >
> > 2- This works but it puts the iommu in the way always, thus reducing virtio performance
> > across the board for pseries unless we only do that for secure VMs, but that is
> > difficult (as discussed earlier).
> >
> > 3- This would recover the performance lost in -2-, however it requires qemu *and*
> > guest changes. Specifically, existing guests (RHEL 7 etc...) would get the
> > performance hit of -2- unless modified to make that 'enable bypass' call, which
> > isn't great.
> >
> > So imho we have to choose one of 3 not-great solutions here... Unless I missed
> > something in your ideas of course.

^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-08 22:13 ` Benjamin Herrenschmidt
@ 2018-08-09  2:00 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 206+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-09 2:00 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization,
    linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang,
    mpe, linuxram, haren, paulus, srikar, robin.murphy,
    jean-philippe.brucker, marc.zyngier

On Thu, 2018-08-09 at 08:13 +1000, Benjamin Herrenschmidt wrote:
> > For completeness, virtio could also have its own bounce buffer
> > outside of the DMA API one. I don't see lots of benefits to this
> > though.
>
> Not a fan of that either...

To elaborate a bit ...

For our secure VMs, we will need bounce buffering for everything anyway:
virtio, emulated PCI, or vfio. By ensuring that we create an identity
mapping in the IOMMU for the bounce-buffering pool, we enable virtio
"legacy/direct" to use the same mapping ops as things using the iommu.

That said, we still need somewhere in arch/powerpc a set of dma ops
which we'll attach to all PCI devices of a secure VM to force bouncing
always, rather than just based on address (which is what the standard
swiotlb ones do)... Unless we can tweak the swiotlb "threshold", for
example by using an empty mask.

We'll need the same set of DMA ops for VIO devices too, not just PCI.

Cheers,
Ben.

^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-08 22:13 ` Benjamin Herrenschmidt
@ 2018-08-09  5:40 ` Christoph Hellwig
  0 siblings, 0 replies; 206+ messages in thread
From: Christoph Hellwig @ 2018-08-09 5:40 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Michael S. Tsirkin, Christoph Hellwig, Will Deacon, Anshuman Khandual,
    virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring,
    david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy,
    jean-philippe.brucker, marc.zyngier

On Thu, Aug 09, 2018 at 08:13:32AM +1000, Benjamin Herrenschmidt wrote:
> > > -	if (xen_domain())
> > > +	if (xen_domain() || pseries_secure_vm())
> > > 		return true;
> >
> > I don't think it's pseries specific actually. E.g. I suspect AMD SEV
> > might benefit from the same kind of hack.
>
> As long as they can provide the same guarantee that the DMA ops are
> completely equivalent between virtio and other PCI devices, at least on
> the same bus, ie, we don't have to go hack special DMA ops.
>
> I think the latter is really what Christoph wants to avoid for good
> reasons.

Yes. I also generally want to avoid too much arch specific magic.

FYI, I'm off to a week-long vacation today, don't expect quick replies.

^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-09  5:40 ` Christoph Hellwig
@ 2018-09-07  0:09 ` Jiandi An
  2018-09-10  6:19 ` Christoph Hellwig
  2 replies; 206+ messages in thread
From: Jiandi An @ 2018-09-07 0:09 UTC (permalink / raw)
To: Christoph Hellwig, Benjamin Herrenschmidt
Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization,
    linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang,
    mpe, linuxram, haren, paulus, srikar, robin.murphy,
    jean-philippe.brucker, marc.zyngier, thomas.lendacky, brijesh.singh,
    jiandi.an

On 08/09/2018 12:40 AM, Christoph Hellwig wrote:
> On Thu, Aug 09, 2018 at 08:13:32AM +1000, Benjamin Herrenschmidt wrote:
>>>> -	if (xen_domain())
>>>> +	if (xen_domain() || pseries_secure_vm())
>>>> 		return true;
>>>
>>> I don't think it's pseries specific actually. E.g. I suspect AMD SEV
>>> might benefit from the same kind of hack.
>>
>> As long as they can provide the same guarantee that the DMA ops are
>> completely equivalent between virtio and other PCI devices, at least on
>> the same bus, ie, we don't have to go hack special DMA ops.
>>
>> I think the latter is really what Christoph wants to avoid for good
>> reasons.
>
> Yes. I also generally want to avoid too much arch specific magic.
>
> FYI, I'm off to a week-long vacation today, don't expect quick replies.

I've been following this RFC series as it has an impact on AMD SEV.
Could you guys keep us in the loop on this (thomas.lendacky@amd.com,
brijesh.singh@amd.com, jiandi.an@amd.com are on cc)?

AMD SEV today sets swiotlb_force to SWIOTLB_FORCE early on in
x86_64_start_kernel. During start_kernel, mem_encrypt_init() sets
dma_ops to swiotlb_dma_ops if SEV is on, as it uses SWIOTLB to bounce
buffer DMA operations and the bounce pool is marked as decrypted.

For a virtio device we have to pass in the iommu_platform=true flag for
this to set the VIRTIO_F_IOMMU_PLATFORM flag. But for example QEMU has
the use of the iommu_platform attribute disabled for the virtio-gpu
device. So we would also like to move towards not having to specify the
VIRTIO_F_IOMMU_PLATFORM flag.

Anshuman's patch [RFC,2/4] "virtio: Override device's DMA OPS with
virtio_direct_dma_ops selectively" sets the default dma ops of the
virtio device's parent PCI device to virtio_direct_dma_ops if
VIRTIO_F_IOMMU_PLATFORM is not set, and later platform specific code can
override the dma ops again:

int virtio_finalize_features(struct virtio_device *dev)
{
	int ret = dev->config->finalize_features(dev);

@@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
 	if (ret)
 		return ret;
 
+	if (virtio_has_iommu_quirk(dev))
+		set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
+
 	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
 		return 0;

We would like to be in the loop so we can put in the platform specific
code for AMD SEV at the same time as this. Or does it make sense that,
if swiotlb force is set, we don't override the virtio device's parent
dma_ops with virtio_direct_dma_ops at all? What if someone passes in
swiotlb=force as a kernel boot command line parameter? This would still
override the parent PCI device's dma ops.

So where does this RFC stand currently? I saw Michael's comment
mentioning that this RFC is blocked for now by the performance overhead
of switching to using DMA ops unconditionally.

-Jiandi

^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-09-07  0:09 ` Jiandi An
@ 2018-09-10  6:19 ` Christoph Hellwig
  0 replies; 206+ messages in thread
From: Christoph Hellwig @ 2018-09-10 6:19 UTC (permalink / raw)
To: Jiandi An
Cc: brijesh.singh, srikar, Michael S. Tsirkin, Benjamin Herrenschmidt,
    Will Deacon, virtualization, paulus, elfring, Anshuman Khandual, robh,
    jean-philippe.brucker, mpe, Christoph Hellwig, thomas.lendacky,
    marc.zyngier, linuxram, david, linuxppc-dev, linux-kernel, joe,
    jiandi.an, robin.murphy, haren

On Thu, Sep 06, 2018 at 07:09:09PM -0500, Jiandi An wrote:
> For virtio device we have to pass in iommu_platform=true flag for
> this to set the VIRTIO_F_IOMMU_PLATFORM flag. But for example
> QEMU has the use of iommu_platform attribute disabled for virtio-gpu
> device. So would also like to move towards not having to specify
> the VIRTIO_F_IOMMU_PLATFORM flag.

Specifying VIRTIO_F_IOMMU_PLATFORM is the right thing for your
platform given that you can't directly use physical addresses.
Please fix qemu so that virtio-gpu works with VIRTIO_F_IOMMU_PLATFORM.

Also, just as I said to the power folks: you should really work with
the qemu folks so that VIRTIO_F_IOMMU_PLATFORM (or whatever we call the
properly documented flag) can be set by default, and no pointless
performance overhead is implied by having a sane and simple
implementation.

^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-09-10  6:19 ` Christoph Hellwig
@ 2018-09-10  8:53 ` Gerd Hoffmann
  0 siblings, 0 replies; 206+ messages in thread
From: Gerd Hoffmann @ 2018-09-10 8:53 UTC (permalink / raw)
To: Christoph Hellwig
Cc: brijesh.singh, srikar, Michael S. Tsirkin, Benjamin Herrenschmidt,
    Will Deacon, virtualization, paulus, elfring, Anshuman Khandual, robh,
    Jiandi An, jean-philippe.brucker, mpe, thomas.lendacky, marc.zyngier,
    linuxram, david, linuxppc-dev, linux-kernel, joe, jiandi.an,
    robin.murphy, haren

> > this to set the VIRTIO_F_IOMMU_PLATFORM flag. But for example
> > QEMU has the use of iommu_platform attribute disabled for virtio-gpu
> > device. So would also like to move towards not having to specify
> > the VIRTIO_F_IOMMU_PLATFORM flag.
>
> Specifying VIRTIO_F_IOMMU_PLATFORM is the right thing for your
> platform given that you can't directly use physical addresses.
> Please fix qemu so that virtio-gpu works with VIRTIO_F_IOMMU_PLATFORM.

This needs both host and guest side changes btw. The guest side patch
is in drm-misc (a3b815f09bb8) and should land in the next merge window.
Host side patches are here:

  https://git.kraxel.org/cgit/qemu/log/?h=sirius/virtio-gpu-iommu

They should also land in the next qemu version.

cheers,
  Gerd

^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-08 10:07 ` Benjamin Herrenschmidt
@ 2018-08-08 12:30 ` Christoph Hellwig
  0 siblings, 0 replies; 206+ messages in thread
From: Christoph Hellwig @ 2018-08-08 12:30 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, srikar, Michael S. Tsirkin, mpe, Will Deacon, linux-kernel,
    linuxram, virtualization, Christoph Hellwig, jean-philippe.brucker,
    paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring,
    haren, Anshuman Khandual

On Wed, Aug 08, 2018 at 08:07:49PM +1000, Benjamin Herrenschmidt wrote:
> Qemu virtio bypasses that iommu when the VIRTIO_F_IOMMU_PLATFORM flag
> is not set (default) but there's nothing in the device-tree to tell the
> guest about this since it's a violation of our pseries architecture, so
> we just rely on Linux virtio "knowing" that it happens. It's a bit
> yucky but that's now history...

That is ugly as hell, but it is how virtio works everywhere, so nothing
special so far.

> Essentially pseries "architecturally" does not have the concept of not
> having an iommu in the way and qemu violates that architecture today.
>
> (Remember it comes from pHyp, our proprietary HV, which we are somewhat
> mimicking here).

It shouldn't be too hard to have a dt property that communicates this,
should it?

> So if we always set VIRTIO_F_IOMMU_PLATFORM, it *will* force all virtio
> through that iommu and performance will suffer (esp vhost I suspect),
> especially since adding/removing translations in the iommu is a
> hypercall.

Well, we'd need to make sure that for this particular bus we skip the
actual iommu.

> > It would not be the same effect. The problem with that is that you must
> > now assume that your qemu knows that for example you might be passing
> > a dma offset if the bus otherwise requires it.
>
> I would assume that arch_virtio_wants_dma_ops() only returns true when
> no such offsets are involved, at least in our case that would be what
> happens.

That would work, but we're really piling hacks on top of hacks here.

> > Or in other words:
> > you potentially break the contract between qemu and the guest of always
> > passing down physical addresses. If we explicitly change that contract
> > through using a flag that says you pass bus addresses, everything is fine.
>
> For us a "bus address" is behind the iommu so that's what
> VIRTIO_F_IOMMU_PLATFORM does already. We don't have the concept of a
> bus address that is different. I suppose it's an ARMism to have DMA
> offsets that are separate from iommus?

No, a lot of platforms support a bus address that has an offset from
the physical address, including a lot of power platforms:

arch/powerpc/kernel/pci-common.c:		set_dma_offset(&dev->dev, PCI_DRAM_OFFSET);
arch/powerpc/platforms/cell/iommu.c:		set_dma_offset(dev, cell_dma_nommu_offset);
arch/powerpc/platforms/cell/iommu.c:		set_dma_offset(dev, addr);
arch/powerpc/platforms/powernv/pci-ioda.c:	set_dma_offset(&pdev->dev, pe->tce_bypass_base);
arch/powerpc/platforms/powernv/pci-ioda.c:	set_dma_offset(&pdev->dev, (1ULL << 32));
arch/powerpc/platforms/powernv/pci-ioda.c:	set_dma_offset(&dev->dev, pe->tce_bypass_base);
arch/powerpc/platforms/pseries/iommu.c:		set_dma_offset(dev, dma_offset);
arch/powerpc/sysdev/dart_iommu.c:		set_dma_offset(&dev->dev, DART_U4_BYPASS_BASE);
arch/powerpc/sysdev/fsl_pci.c:			set_dma_offset(dev, pci64_dma_offset);

To make things worse, some platforms (at least arm/arm64/mips/x86) can
also require additional banking where it isn't even a single linear map
but multiple windows.

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-07  6:42 ` Benjamin Herrenschmidt
@ 2018-08-07 13:55 ` Christoph Hellwig
  0 siblings, 0 replies; 206+ messages in thread
From: Christoph Hellwig @ 2018-08-07 13:55 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, srikar, Michael S. Tsirkin, mpe, Will Deacon, linux-kernel,
    linuxram, virtualization, Christoph Hellwig, jean-philippe.brucker,
    paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring,
    haren, Anshuman Khandual

On Tue, Aug 07, 2018 at 04:42:44PM +1000, Benjamin Herrenschmidt wrote:
> Note that I can make it so that the same DMA ops (basically standard
> swiotlb ops without arch hacks) work for both "direct virtio" and
> "normal PCI" devices.
>
> The trick is simply in the arch to set up the iommu to map the swiotlb
> bounce buffer pool 1:1 in the iommu, so the iommu essentially can be
> ignored without affecting the physical addresses.
>
> If I do that, *all* I need is a way, from the guest itself (again, the
> other side doesn't know anything about it), to force virtio to use the
> DMA ops as if there was an iommu, that is, use whatever dma ops were
> setup by the platform for the pci device.

In that case just setting VIRTIO_F_IOMMU_PLATFORM in the flags should
do the work (even if that isn't strictly what the current definition of
the flag actually means). On the qemu side you'll need to make sure you
have a way to set VIRTIO_F_IOMMU_PLATFORM without emulating an iommu,
but with code to take dma offsets into account if your platform has any
(various power platforms seem to have them, not sure if it affects your
config).

^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-06 19:52 ` Benjamin Herrenschmidt
@ 2018-08-07  6:21 ` Christoph Hellwig
  0 siblings, 0 replies; 206+ messages in thread
From: Christoph Hellwig @ 2018-08-07 6:21 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, srikar, Michael S. Tsirkin, mpe, Will Deacon, linux-kernel,
    linuxram, virtualization, Christoph Hellwig, jean-philippe.brucker,
    paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring,
    haren, Anshuman Khandual

On Tue, Aug 07, 2018 at 05:52:12AM +1000, Benjamin Herrenschmidt wrote:
> > It is your job to write a coherent interface specification that does
> > not depend on the used components. The hypervisor might be PAPR,
> > Linux + qemu, VMware, Hyperv or something so secret that you'd have
> > to shoot me if you had to tell me. The guest might be Linux, FreeBSD,
> > AIX, OS400 or a Hipster project of the day in Rust. As long as we
> > properly specify the interface it simply does not matter.
>
> That's the point Christoph. The interface is today's interface. It does
> NOT change. That information is not part of the interface.
>
> It's the VM itself that is stashing away its memory in a secret place,
> and thus needs to do bounce buffering. There is no change to the virtio
> interface per se.

Any guest that doesn't know about your magic limited addressing is
simply not going to work, so we need to communicate that fact.

^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03 15:58 ` Benjamin Herrenschmidt
@ 2018-08-03 19:07 ` Michael S. Tsirkin
  0 siblings, 0 replies; 206+ messages in thread
From: Michael S. Tsirkin @ 2018-08-03 19:07 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual

On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > 2- Make virtio use the DMA API with our custom platform-provided
> > > swiotlb callbacks when needed, that is when not using an IOMMU *and*
> > > running on a secure VM in our case.
> >
> > And a total NAK to the custom platform-provided part of this. We need
> > a flag passed in from the hypervisor that the device needs all bus
> > specific dma api treatment, and then just use the normal platform
> > dma mapping setup.
>
> Christoph, as I have explained already, we do NOT have a way to provide
> such a flag as neither the hypervisor nor qemu knows anything about
> this when the VM is created.

I think the fact you can't add flags from the hypervisor is a sign of a
problematic architecture; you should look at adding that down the road -
you will likely need it at some point.

However in this specific case, the flag does not need to come from the
hypervisor, it can be set by arch boot code I think.
Christoph, do you see a problem with that?

> > To get swiotlb you'll need to then use the DT/ACPI
> > dma-range property to limit the addressable range, and a swiotlb
> > capable platform will use swiotlb automatically.
>
> This cannot be done as you describe it.
>
> The VM is created as a *normal* VM. The DT stuff is generated by qemu
> at a point where it has *no idea* that the VM will later become secure
> and thus will have to restrict which pages can be used for "DMA".
>
> The VM will *at runtime* turn itself into a secure VM via interactions
> with the security HW and the Ultravisor layer (which sits below the
> HV). This happens way after the DT has been created and consumed, the
> qemu devices instantiated, etc...
>
> Only the guest kernel knows, because it initiates the transition. When
> that happens, the virtio devices have already been used by the guest
> firmware, bootloader, possibly another kernel that kexeced the "secure"
> one, etc...
>
> So instead of running around saying NAK NAK NAK, please explain how we
> can solve that differently.
>
> Ben.
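Michael's suggestion that the flag "can be set by arch boot code" amounts to the arch forcing the iommu-style treatment on the guest side only, with no hypervisor involvement. A hedged sketch of that idea (the function name and the `secure_guest` parameter are hypothetical; in a real kernel the secure state would be a global discovered from the ultravisor during early boot):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VIRTIO_F_IOMMU_PLATFORM is feature bit 33 in the virtio specification. */
#define VIRTIO_F_IOMMU_PLATFORM (1ULL << 33)

/*
 * Hypothetical arch hook, run before the driver finalizes features: if
 * early boot code has learned that this guest runs (or will run) in
 * secure mode, force the "behave as if behind an iommu" bit on, so the
 * core takes the platform DMA-ops path (swiotlb bouncing) even though
 * the device side never offered the bit.
 */
static uint64_t arch_fixup_virtio_features(uint64_t features, bool secure_guest)
{
	if (secure_guest)
		features |= VIRTIO_F_IOMMU_PLATFORM;
	return features;
}
```

The point of the sketch is that nothing crosses the guest/host boundary: qemu still presents an ordinary virtio device, and only the guest's view of the negotiated features changes.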
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03 19:07 ` Michael S. Tsirkin
@ 2018-08-04  1:11 ` Benjamin Herrenschmidt
  2018-08-04  1:16 ` Benjamin Herrenschmidt
  ` (4 subsequent siblings)
  5 siblings, 0 replies; 206+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-04 1:11 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier

On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > > 2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using an IOMMU *and*
> > > > running on a secure VM in our case.
> > >
> > > And a total NAK to the custom platform-provided part of this. We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal platform
> > > dma mapping setup.
> >
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
>
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

Well, we can later in the boot process. At VM creation time, it's just
a normal VM. The VM firmware, bootloader, etc... are just operating
normally.

Later on (we may even have already run Linux at that point, insecurely,
as we can use Linux as a bootloader under some circumstances), we start
a "secure image". This is a kernel zImage that includes a "ticket" with
the appropriate signature etc... so that when that kernel starts, it
can authenticate with the ultravisor, be verified (along with its
ramdisk) etc... and be copied (by the UV) into secure memory & run from
there. At that point, the hypervisor is informed that the VM has become
secure.

So at that point, we could exit to qemu to inform it of the change, and
have it walk the qtree and "switch" all the virtio devices to use the
IOMMU, I suppose, but it feels a lot grosser to me. That's the only
other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that, yes. Another approach would be to do it from a
small virtio "quirk" that pokes a bit in the device to force it to
iommu mode when it detects that we are running in a secure VM. That's a
bit warty on the virtio side, but probably not as much as having a qemu
one that walks over the virtio devices to change how they behave.

What do you reckon ?

Cheers,
Ben.

> > > To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable platform will use swiotlb automatically.
> >
> > This cannot be done as you describe it.
> >
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> >
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instantiated, etc...
> >
> > Only the guest kernel knows, because it initiates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc...
> >
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> >
> > Ben.
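The "quirk" Ben floats here reduces, on the guest side, to a secure-VM check that overrides the usual feature-driven choice of DMA path. A toy model of that selection logic (the names `virtio_pick_dma_path`, `DMA_DIRECT_GPA` and `DMA_PLATFORM_OPS` are hypothetical stand-ins for `virtio_direct_dma_ops` versus the platform/swiotlb ops discussed in this thread):

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the two ops structures discussed in the thread. */
enum dma_path {
	DMA_DIRECT_GPA,    /* virtio_direct_dma_ops: raw guest physical addresses */
	DMA_PLATFORM_OPS,  /* platform dma ops: iommu and/or swiotlb bouncing */
};

/*
 * The quirk: a secure VM forces the iommu-style path even when the
 * device never offered VIRTIO_F_IOMMU_PLATFORM, because the host can
 * only reach the insecure (shared) pages.
 */
static enum dma_path virtio_pick_dma_path(bool offers_iommu_platform,
					  bool secure_vm)
{
	if (offers_iommu_platform || secure_vm)
		return DMA_PLATFORM_OPS;
	return DMA_DIRECT_GPA;
}
```

Nothing here touches qemu or libvirt, which is the property Ben is after: the override is decided entirely inside the guest at the moment it knows it has gone secure.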
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-03 19:07 ` Michael S. Tsirkin
  2018-08-04  1:11 ` Benjamin Herrenschmidt
@ 2018-08-04  1:16 ` Benjamin Herrenschmidt
  ` (3 subsequent siblings)
  5 siblings, 0 replies; 206+ messages in thread
From: Benjamin Herrenschmidt @ 2018-08-04 1:16 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual

On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote:
> > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote:
> > > > 2- Make virtio use the DMA API with our custom platform-provided
> > > > swiotlb callbacks when needed, that is when not using an IOMMU *and*
> > > > running on a secure VM in our case.
> > >
> > > And a total NAK to the custom platform-provided part of this. We need
> > > a flag passed in from the hypervisor that the device needs all bus
> > > specific dma api treatment, and then just use the normal platform
> > > dma mapping setup.
> >
> > Christoph, as I have explained already, we do NOT have a way to provide
> > such a flag as neither the hypervisor nor qemu knows anything about
> > this when the VM is created.
>
> I think the fact you can't add flags from the hypervisor is
> a sign of a problematic architecture, you should look at
> adding that down the road - you will likely need it at some point.

Well, we can later in the boot process. At VM creation time, it's just
a normal VM. The VM firmware, bootloader, etc... are just operating
normally.

Later on (we may even have already run Linux at that point, insecurely,
as we can use Linux as a bootloader under some circumstances), we start
a "secure image". This is a kernel zImage that includes a "ticket" with
the appropriate signature etc... so that when that kernel starts, it
can authenticate with the ultravisor, be verified (along with its
ramdisk) etc... and be copied (by the UV) into secure memory & run from
there. At that point, the hypervisor is informed that the VM has become
secure.

So at that point, we could exit to qemu to inform it of the change, and
have it walk the qtree and "switch" all the virtio devices to use the
IOMMU, I suppose, but it feels a lot grosser to me. That's the only
other option I can think of.

> However in this specific case, the flag does not need to come from the
> hypervisor, it can be set by arch boot code I think.
> Christoph do you see a problem with that?

The above could do that, yes. Another approach would be to do it from a
small virtio "quirk" that pokes a bit in the device to force it to
iommu mode when it detects that we are running in a secure VM. That's a
bit warty on the virtio side, but probably not as much as having a qemu
one that walks over the virtio devices to change how they behave.

What do you reckon ?

What we want to avoid is to expose any of this to the *end user*, or to
libvirt, or any other higher level of the management stack. We really
want that stuff to remain contained between the VM itself, KVM and
maybe qemu.

We will need some other qemu changes for migration, so that's OK. But
the minute you start touching libvirt and the higher levels it becomes
a nightmare.

Cheers,
Ben.

> > > To get swiotlb you'll need to then use the DT/ACPI
> > > dma-range property to limit the addressable range, and a swiotlb
> > > capable platform will use swiotlb automatically.
> >
> > This cannot be done as you describe it.
> >
> > The VM is created as a *normal* VM. The DT stuff is generated by qemu
> > at a point where it has *no idea* that the VM will later become secure
> > and thus will have to restrict which pages can be used for "DMA".
> >
> > The VM will *at runtime* turn itself into a secure VM via interactions
> > with the security HW and the Ultravisor layer (which sits below the
> > HV). This happens way after the DT has been created and consumed, the
> > qemu devices instantiated, etc...
> >
> > Only the guest kernel knows, because it initiates the transition. When
> > that happens, the virtio devices have already been used by the guest
> > firmware, bootloader, possibly another kernel that kexeced the "secure"
> > one, etc...
> >
> > So instead of running around saying NAK NAK NAK, please explain how we
> > can solve that differently.
> >
> > Ben.
* Re: [RFC 0/4] Virtio uses DMA API for all devices
  2018-08-04  1:16 ` Benjamin Herrenschmidt
@ 2018-08-05  0:22 ` Michael S. Tsirkin
  2018-08-05  4:52 ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 206+ messages in thread
From: Michael S. Tsirkin @ 2018-08-05 0:22 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier

On Fri, Aug 03, 2018 at 08:16:21PM -0500, Benjamin Herrenschmidt wrote:
> On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote:
> > I think the fact you can't add flags from the hypervisor is
> > a sign of a problematic architecture, you should look at
> > adding that down the road - you will likely need it at some point.
>
> Well, we can later in the boot process. At VM creation time, it's just
> a normal VM. The VM firmware, bootloader, etc... are just operating
> normally.

I see the allure of this, but I think down the road you will discover
that passing a flag in the libvirt XML saying "please use a secure
mode" or whatever is a good idea. Even though it is probably not
required to address this specific issue.

For example, I don't think ballooning works in secure mode; you will be
able to teach libvirt not to try to add a balloon to the guest.

> Later on (we may even have already run Linux at that point,
> insecurely, as we can use Linux as a bootloader under some
> circumstances), we start a "secure image".
>
> This is a kernel zImage that includes a "ticket" with the appropriate
> signature etc... so that when that kernel starts, it can authenticate
> with the ultravisor, be verified (along with its ramdisk) etc... and
> be copied (by the UV) into secure memory & run from there.
>
> At that point, the hypervisor is informed that the VM has become
> secure.
>
> So at that point, we could exit to qemu to inform it of the change,

That's probably a good idea too.

> and have it walk the qtree and "switch" all the virtio devices to use
> the IOMMU, I suppose, but it feels a lot grosser to me.

That part feels gross, yes.

> That's the only other option I can think of.
>
> > However in this specific case, the flag does not need to come from the
> > hypervisor, it can be set by arch boot code I think.
> > Christoph do you see a problem with that?
>
> The above could do that, yes. Another approach would be to do it from a
> small virtio "quirk" that pokes a bit in the device to force it to
> iommu mode when it detects that we are running in a secure VM. That's a
> bit warty on the virtio side, but probably not as much as having a qemu
> one that walks over the virtio devices to change how they behave.
>
> What do you reckon ?

I think you are right that for the dma limit the hypervisor doesn't
seem to need to know.

> What we want to avoid is to expose any of this to the *end user* or
> libvirt or any other higher level of the management stack. We really
> want that stuff to remain contained between the VM itself, KVM and
> maybe qemu.
>
> We will need some other qemu changes for migration, so that's OK. But
> the minute you start touching libvirt and the higher levels it becomes
> a nightmare.
>
> Cheers,
> Ben.

I don't believe you'll be able to avoid that entirely. The split
between libvirt and qemu is more about community than about code;
random bits of functionality tend to land on random sides of that
fence. Better to add a tag in the domain XML early is my advice.
Having said that, it's your hypervisor. I'm just suggesting that when
the hypervisor does somehow need to care, I suspect most people won't
be receptive to the argument that changing libvirt is a nightmare.
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-05 0:22 ` Michael S. Tsirkin @ 2018-08-05 4:52 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-05 4:52 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Sun, 2018-08-05 at 03:22 +0300, Michael S. Tsirkin wrote: > I see the allure of this, but I think down the road you will > discover passing a flag in libvirt XML saying > "please use a secure mode" or whatever is a good idea. > > Even thought it is probably not required to address this > specific issue. > > For example, I don't think ballooning works in secure mode, > you will be able to teach libvirt not to try to add a > balloon to the guest. Right, we'll need some quirk to disable balloons in the guest I suppose. Passing something from libvirt is cumbersome because the end user may not even need to know about secure VMs. There are use cases where the security is a contract down to some special application running inside the secure VM, the sysadmin knows nothing about. Also there's repercussions all the way to admin tools, web UIs etc... so it's fairly wide ranging. So as long as we only need to quirk a couple of devices, it's much better contained that way. > > Later on, (we may have even already run Linux at that point, > > unsecurely, as we can use Linux as a bootloader under some > > circumstances), we start a "secure image". > > > > This is a kernel zImage that includes a "ticket" that has the > > appropriate signature etc... so that when that kernel starts, it can > > authenticate with the ultravisor, be verified (along with its ramdisk) > > etc... and copied (by the UV) into secure memory & run from there. 
> > > > At that point, the hypervisor is informed that the VM has become > > secure. > > > > So at that point, we could exit to qemu to inform it of the change, > > That's probably a good idea too. We probably will have to tell qemu eventually for migration, as we'll need some kind of key exchange phase etc... to deal with the crypto aspects (the actual page copy is sorted via encrypting the secure pages back to normal pages in qemu, but we'll need extra metadata). > > and > > have it walk the qtree and "Switch" all the virtio devices to use the > > IOMMU I suppose, but it feels a lot grosser to me. > > That part feels gross, yes. > > > That's the only other option I can think of. > > > > > However in this specific case, the flag does not need to come from the > > > hypervisor, it can be set by arch boot code I think. > > > Christoph do you see a problem with that? > > > > The above could do that yes. Another approach would be to do it from a > > small virtio "quirk" that pokes a bit in the device to force it to > > iommu mode when it detects that we are running in a secure VM. That's a > > bit warty on the virito side but probably not as much as having a qemu > > one that walks of the virtio devices to change how they behave. > > > > What do you reckon ? > > I think you are right that for the dma limit the hypervisor doesn't seem > to need to know. It's not just a limit mind you. It's a range, at least if we allocate just a single pool of insecure pages. swiotlb feels like a better option for us. > > What we want to avoid is to expose any of this to the *end user* or > > libvirt or any other higher level of the management stack. We really > > want that stuff to remain contained between the VM itself, KVM and > > maybe qemu. > > > > We will need some other qemu changes for migration so that's ok. But > > the minute you start touching libvirt and the higher levels it becomes > > a nightmare. > > > > Cheers, > > Ben. 
> > I don't believe you'll be able to avoid that entirely. The split between > libvirt and qemu is more about community than about code, random bits of > functionality tend to land on random sides of that fence. Better add a > tag in domain XML early is my advice. Having said that, it's your > hypervisor. I'm just suggesting that when hypervisor does somehow need > to care then I suspect most people won't be receptive to the argument > that changing libvirt is a nightmare. It only needs to care at runtime. The problem isn't changing libvirt per se, I don't have a problem with that. The problem is that it means creating two categories of machines "secure" and "non-secure", which is end-user visible, and thus has to be escalated to all the various management stacks, UIs, etc... out there. In addition, there are some cases where the individual creating the VMs may not have any idea that they are secure. But yes, if we have to, we'll do it. However, so far, we don't think it's a great idea. Cheers, Ben. > > > > > To get swiotlb you'll need to then use the DT/ACPI > > > > > dma-range property to limit the addressable range, and a swiotlb > > > > > capable platform will use swiotlb automatically. > > > > > > > > This cannot be done as you describe it. > > > > > > > > The VM is created as a *normal* VM. The DT stuff is generated by qemu > > > > at a point where it has *no idea* that the VM will later become secure > > > > and thus will have to restrict which pages can be used for "DMA". > > > > > > > > The VM will *at runtime* turn itself into a secure VM via interactions > > > > with the security HW and the Ultravisor layer (which sits below the > > > > HV). This happens way after the DT has been created and consumed, the > > > > qemu devices instantiated etc... > > > > > > > > Only the guest kernel knows because it initiates the transition. 
When > > > > that happens, the virtio devices have already been used by the guest > > > > firmware, bootloader, possibly another kernel that kexeced the "secure" > > > > one, etc... > > > > > > > > So instead of running around saying NAK NAK NAK, please explain how we > > > > can solve that differently. > > > > > > > > Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-05 4:52 ` Benjamin Herrenschmidt (?) @ 2018-08-06 13:46 ` Michael S. Tsirkin 2018-08-06 19:56 ` Benjamin Herrenschmidt -1 siblings, 1 reply; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-06 13:46 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Sun, Aug 05, 2018 at 02:52:54PM +1000, Benjamin Herrenschmidt wrote: > On Sun, 2018-08-05 at 03:22 +0300, Michael S. Tsirkin wrote: > > I see the allure of this, but I think down the road you will > > discover passing a flag in libvirt XML saying > > "please use a secure mode" or whatever is a good idea. > > > > Even though it is probably not required to address this > > specific issue. > > > > For example, I don't think ballooning works in secure mode, > > you will be able to teach libvirt not to try to add a > > balloon to the guest. > > Right, we'll need some quirk to disable balloons in the guest I > suppose. > > Passing something from libvirt is cumbersome because the end user may > not even need to know about secure VMs. There are use cases where the > security is a contract down to some special application running inside > the secure VM, which the sysadmin knows nothing about. > > Also there are repercussions all the way to admin tools, web UIs etc... > so it's fairly wide-ranging. > > So as long as we only need to quirk a couple of devices, it's much > better contained that way. So just the balloon thing already means that, yes, management and all the way to the user tools must know this is going on. Otherwise the user will try to inflate the balloon and wonder why this does not work. 
> > > Later on, (we may have even already run Linux at that point, > > > unsecurely, as we can use Linux as a bootloader under some > > > circumstances), we start a "secure image". > > > > > > This is a kernel zImage that includes a "ticket" that has the > > > appropriate signature etc... so that when that kernel starts, it can > > > authenticate with the ultravisor, be verified (along with its ramdisk) > > > etc... and copied (by the UV) into secure memory & run from there. > > > > > > At that point, the hypervisor is informed that the VM has become > > > secure. > > > > > > So at that point, we could exit to qemu to inform it of the change, > > > > That's probably a good idea too. > > We probably will have to tell qemu eventually for migration, as we'll > need some kind of key exchange phase etc... to deal with the crypto > aspects (the actual page copy is sorted via encrypting the secure pages > back to normal pages in qemu, but we'll need extra metadata). > > > > and > > > have it walk the qtree and "Switch" all the virtio devices to use the > > > IOMMU I suppose, but it feels a lot grosser to me. > > > > That part feels gross, yes. > > > > > That's the only other option I can think of. > > > > > > > However in this specific case, the flag does not need to come from the > > > > hypervisor, it can be set by arch boot code I think. > > > > Christoph do you see a problem with that? > > > > > > The above could do that yes. Another approach would be to do it from a > > > small virtio "quirk" that pokes a bit in the device to force it to > > > iommu mode when it detects that we are running in a secure VM. That's a > > > bit warty on the virtio side but probably not as much as having a qemu > > > one that walks over the virtio devices to change how they behave. > > > > > > What do you reckon ? > > > > I think you are right that for the dma limit the hypervisor doesn't seem > > to need to know. > > It's not just a limit, mind you. 
It's a range, at least if we allocate > just a single pool of insecure pages. swiotlb feels like a better > option for us. > > > > What we want to avoid is to expose any of this to the *end user* or > > > libvirt or any other higher level of the management stack. We really > > > want that stuff to remain contained between the VM itself, KVM and > > > maybe qemu. > > > > > > We will need some other qemu changes for migration so that's ok. But > > > the minute you start touching libvirt and the higher levels it becomes > > > a nightmare. > > > > > > Cheers, > > > Ben. > > > > I don't believe you'll be able to avoid that entirely. The split between > > libvirt and qemu is more about community than about code, random bits of > > functionality tend to land on random sides of that fence. Better add a > > tag in domain XML early is my advice. Having said that, it's your > > hypervisor. I'm just suggesting that when hypervisor does somehow need > > to care then I suspect most people won't be receptive to the argument > > that changing libvirt is a nightmare. > > It only needs to care at runtime. The problem isn't changing libvirt > per se, I don't have a problem with that. The problem is that it means > creating two categories of machines "secure" and "non-secure", which is > end-user visible, and thus has to be escalated to all the various > management stacks, UIs, etc... out there. > > In addition, there are some cases where the individual creating the VMs > may not have any idea that they are secure. > > But yes, if we have to, we'll do it. However, so far, we don't think > it's a great idea. > > Cheers, > Ben. Here's another example: you can't migrate a secure vm to a hypervisor which doesn't support this feature. Again management tools above libvirt need to know, otherwise they will try. 
> > > > > > To get swiotlb you'll need to then use the DT/ACPI > > > > > > dma-range property to limit the addressable range, and a swiotlb > > > > > > capable platform will use swiotlb automatically. > > > > > > > > > > This cannot be done as you describe it. > > > > > > > > > > The VM is created as a *normal* VM. The DT stuff is generated by qemu > > > > > at a point where it has *no idea* that the VM will later become secure > > > > > and thus will have to restrict which pages can be used for "DMA". > > > > > > > > > > The VM will *at runtime* turn itself into a secure VM via interactions > > > > > with the security HW and the Ultravisor layer (which sits below the > > > > > HV). This happens way after the DT has been created and consumed, the > > > > > qemu devices instantiated etc... > > > > > > > > > > Only the guest kernel knows because it initiates the transition. When > > > > > that happens, the virtio devices have already been used by the guest > > > > > firmware, bootloader, possibly another kernel that kexeced the "secure" > > > > > one, etc... > > > > > > > > > > So instead of running around saying NAK NAK NAK, please explain how we > > > > > can solve that differently. > > > > > > > > > > Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 13:46 ` Michael S. Tsirkin @ 2018-08-06 19:56 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-06 19:56 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote: > > > Right, we'll need some quirk to disable balloons in the guest I > > suppose. > > > > Passing something from libvirt is cumbersome because the end user may > > not even need to know about secure VMs. There are use cases where the > > security is a contract down to some special application running inside > > the secure VM, which the sysadmin knows nothing about. > > > > Also there are repercussions all the way to admin tools, web UIs etc... > > so it's fairly wide-ranging. > > > > So as long as we only need to quirk a couple of devices, it's much > > better contained that way. > > So just the balloon thing already means that yes management and all the > way to the user tools must know this is going on. Otherwise > user will try to inflate the balloon and wonder why this does not work. There are *dozens* of management systems out there, not even all open source, we won't ever be able to see the end of the tunnel if we need to teach every single one of them, including end users, about platform-specific new VM flags like that. .../... > Here's another example: you can't migrate a secure vm to a hypervisor > which doesn't support this feature. Again management tools above libvirt > need to know otherwise they will try. 
There will have to be a new machine type for that I suppose, yes, though it's not just the hypervisor that needs to know about the modified migration stream, it's also the need to have a compatible ultravisor with the right keys on the other side. So migration is going to be special and require extra admin work in all cases, yes. But not all secure VMs are meant to be migratable. In any case, back to the problem at hand. What a qemu flag gives us is just a way to force iommu at VM creation time. This is rather sub-optimal, we don't really want the iommu in the way, so it's at best a "workaround", and it's not really solving the real problem. As I said replying to Christoph, we are "leaking" into the interface something here that is really what the VM is doing to itself, which is to stash its memory away in an inaccessible place. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 19:56 ` Benjamin Herrenschmidt (?) @ 2018-08-06 20:35 ` Michael S. Tsirkin 2018-08-06 21:26 ` Benjamin Herrenschmidt ` (2 more replies) -1 siblings, 3 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-06 20:35 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Tue, Aug 07, 2018 at 05:56:59AM +1000, Benjamin Herrenschmidt wrote: > On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote: > > > > > Right, we'll need some quirk to disable balloons in the guest I > > > suppose. > > > > > > Passing something from libvirt is cumbersome because the end user may > > > not even need to know about secure VMs. There are use cases where the > > > security is a contract down to some special application running inside > > > the secure VM, which the sysadmin knows nothing about. > > > > > > Also there are repercussions all the way to admin tools, web UIs etc... > > > so it's fairly wide-ranging. > > > > > > So as long as we only need to quirk a couple of devices, it's much > > > better contained that way. > > > > So just the balloon thing already means that yes management and all the > > way to the user tools must know this is going on. Otherwise > > user will try to inflate the balloon and wonder why this does not work. > > There are *dozens* of management systems out there, not even all open > source, we won't ever be able to see the end of the tunnel if we need > to teach every single one of them, including end users, about platform-specific > new VM flags like that. > > .../... In the end I suspect you will find you have to. > > Here's another example: you can't migrate a secure vm to a hypervisor > > which doesn't support this feature. 
Again management tools above libvirt > > need to know otherwise they will try. > > There will have to be a new machine type for that I suppose, yes, > though it's not just the hypervisor that needs to know about the > modified migration stream, it's also the need to have a compatible > ultravisor with the right keys on the other side. > > So migration is going to be special and require extra admin work in all > cases yes. But not all secure VMs are meant to be migratable. > > In any case, back to the problem at hand. What a qemu flag gives us is > just a way to force iommu at VM creation time. I don't think a qemu flag is strictly required for the problem at hand. > This is rather sub-optimal, we don't really want the iommu in the way, > so it's at best a "workaround", and it's not really solving the real > problem. This specific problem, I think I agree. > As I said replying to Christoph, we are "leaking" into the interface > something here that is really what the VM is doing to itself, which > is to stash its memory away in an inaccessible place. > > Cheers, > Ben. I think Christoph merely objects to the specific implementation. If instead you do something like tweak dev->bus_dma_mask for the virtio device I think he won't object. -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 20:35 ` Michael S. Tsirkin @ 2018-08-06 21:26 ` Benjamin Herrenschmidt 2018-08-06 23:18 ` Benjamin Herrenschmidt 2018-08-07 6:12 ` Christoph Hellwig 2 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-06 21:26 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote: > > As I said replying to Christoph, we are "leaking" into the interface > > something here that is really what the VM is doing to itself, which > > is to stash its memory away in an inaccessible place. > > > > Cheers, > > Ben. > > I think Christoph merely objects to the specific implementation. If > instead you do something like tweak dev->bus_dma_mask for the virtio > device I think he won't object. Well, we don't have "bus_dma_mask" yet... or do you mean dma_mask? So, something like that would be a possibility, but the problem is that the current virtio (guest side) implementation doesn't honor this when not using dma ops and will not use dma ops if not using iommu, so back to square one. Christoph seems to be wanting to use a flag in the interface to make the guest use dma_ops, which is what I don't understand. What would be needed then would be something along the lines of virtio noticing that dma_mask isn't big enough to cover all of memory (which isn't something generic code can easily do here for various reasons I can elaborate if you want, but that specific test more or less has to be arch-specific), and in that case, force itself to use DMA ops routed to swiotlb. I'd rather have arch code do the bulk of that work, don't you think ? 
Which brings me back to this option, which may be the simplest and avoids the overhead of the proposed series (I found the series to be a nice cleanup but retpoline does kick us in the nuts here). So what about this ?

--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device *vdev)
 	 * the DMA API if we're a Xen guest, which at least allows
 	 * all of the sensible Xen configurations to work correctly.
 	 */
-	if (xen_domain())
+	if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev))
 		return true;
 
 	return false;

(Passing the dev allows the arch to know this is a virtio device in "direct" mode or whatever we want to call the !iommu case, and construct appropriate DMA ops for it, which aren't the same as the DMA ops of any other PCI devices which *do* use the iommu). Otherwise, the harder option would be for us to hack so that xen_domain() returns true in our setup (gross), and have the arch code, when it sets up PCI device DMA ops, have a gross hack to identify virtio PCI devices, check their F_IOMMU flag itself, and set up the different ops at that point. As for those "special" ops, they are of course just normal swiotlb ops, there's nothing "special" other than that they aren't the ops that other PCI devices on that bus use. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 21:26 ` Benjamin Herrenschmidt (?) @ 2018-08-06 21:46 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-06 21:46 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, jean-philippe.brucker, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote: > On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote: > > > As I said replying to Christoph, we are "leaking" into the interface > > > something here that is really what the VM is doing to itself, which > > > is to stash its memory away in an inaccessible place. > > > > > > Cheers, > > > Ben. > > > > I think Christoph merely objects to the specific implementation. If > > instead you do something like tweak dev->bus_dma_mask for the virtio > > device I think he won't object. > > Well, we don't have "bus_dma_mask" yet... or do you mean dma_mask? > > So, something like that would be a possibility, but the problem is that > the current virtio (guest side) implementation doesn't honor this when > not using dma ops and will not use dma ops if not using iommu, so back > to square one. Well we have the RFC for that - the switch to using DMA ops unconditionally isn't problematic itself IMHO, but that RFC is blocked by its performance overhead for now. Christoph says he's trying to remove that for direct mappings, so we should hopefully be able to get there in X weeks. > Christoph seems to be wanting to use a flag in the interface to make > the guest use dma_ops which is what I don't understand. 
> > What would be needed then would be something along the lines of virtio > noticing that dma_mask isn't big enough to cover all of memory (which > isn't something generic code can easily do here for various reasons I > can elaborate if you want, but that specific test more/less has to be > arch specific), and in that case, force itself to use DMA ops routed to > swiotlb. > > I'd rather have arch code do the bulk of that work, don't you think ? > > Which brings me back to this option, which may be the simplest and > avoids the overhead of the proposed series (I found the series to be a > nice cleanup but retpoline does kick us in the nuts here). > > So what about this ? > > --- a/drivers/virtio/virtio_ring.c > +++ b/drivers/virtio/virtio_ring.c > @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device > *vdev) > * the DMA API if we're a Xen guest, which at least allows > * all of the sensible Xen configurations to work correctly. > */ > - if (xen_domain()) > + if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev)) > return true; > > return false; Right but can't we fix the retpoline overhead such that vring_use_dma_api will not be called on data path any longer, making this a setup time check? > (Passing the dev allows the arch to know this is a virtio device in > "direct" mode or whatever we want to call the !iommu case, and > construct appropriate DMA ops for it, which aren't the same as the DMA > ops of any other PCI device who *do* use the iommu). I think that's where Christoph might have specific ideas about it. > Otherwise, the harder option would be for us to hack so that > xen_domain() returns true in our setup (gross), and have the arch code, > when it sets up PCI device DMA ops, have a gross hack to identify > virtio PCI devices, checks their F_IOMMU flag itself, and sets up the > different ops at that point. 
> > As for those "special" ops, they are of course just normal swiotlb ops, > there's nothing "special" other that they aren't the ops that other PCI > device on that bus use. > > Cheers, > Ben. -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 21:46 ` Michael S. Tsirkin @ 2018-08-06 22:13 ` Benjamin Herrenschmidt 2018-08-07 6:18 ` Christoph Hellwig 2018-08-07 6:18 ` Christoph Hellwig 2 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-06 22:13 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Tue, 2018-08-07 at 00:46 +0300, Michael S. Tsirkin wrote: > On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote: > > On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote: > > > > As I said replying to Christoph, we are "leaking" into the interface > > > > something here that is really what's the VM is doing to itself, which > > > > is to stash its memory away in an inaccessible place. > > > > > > > > Cheers, > > > > Ben. > > > > > > I think Christoph merely objects to the specific implementation. If > > > instead you do something like tweak dev->bus_dma_mask for the virtio > > > device I think he won't object. > > > > Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ? > > > > So, something like that would be a possibility, but the problem is that > > the current virtio (guest side) implementation doesn't honor this when > > not using dma ops and will not use dma ops if not using iommu, so back > > to square one. > > Well we have the RFC for that - the switch to using DMA ops unconditionally isn't > problematic itself IMHO, for now that RFC is blocked > by its perfromance overhead for now but Christoph says > he's trying to remove that for direct mappings, > so we should hopefully be able to get there in X weeks. That would be good yes. ../.. 
> > --- a/drivers/virtio/virtio_ring.c > > +++ b/drivers/virtio/virtio_ring.c > > @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device > > *vdev) > > * the DMA API if we're a Xen guest, which at least allows > > * all of the sensible Xen configurations to work correctly. > > */ > > - if (xen_domain()) > > + if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev)) > > return true; > > > > return false; > > Right but can't we fix the retpoline overhead such that > vring_use_dma_api will not be called on data path any longer, making > this a setup time check? Yes it needs to be a setup time check regardless actually ! The above is broken, sorry I was a bit quick here (too early in the morning... ugh). We don't want the arch to go override the dma ops every time that is called. But yes, if we can fix the overhead, it becomes just a matter of setting up the "right" ops automatically. > > (Passing the dev allows the arch to know this is a virtio device in > > "direct" mode or whatever we want to call the !iommu case, and > > construct appropriate DMA ops for it, which aren't the same as the DMA > > ops of any other PCI device who *do* use the iommu). > > I think that's where Christoph might have specific ideas about it. OK well, assuming Christoph can solve the direct case in a way that also works for the virtio !iommu case, we still want some bit of logic somewhere that will "switch" to swiotlb-based ops if the DMA mask is limited. You mentioned an RFC for that ? Do you happen to have a link ? It would indeed be ideal if all we had to do was set up some kind of bus_dma_mask on all PCI devices and have virtio automagically insert swiotlb when necessary. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
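[Editor's note: the "setup time check" both sides converge on above can be sketched as follows — decide the policy once when the ring is created, cache the result, and have the data path test a flag instead of re-running the check (or taking a retpolined indirect call) per buffer. All names and the offset arithmetic are illustrative, not the kernel's:

```c
#include <assert.h>
#include <stdbool.h>

struct vring_virtqueue {
    bool use_dma_api;            /* frozen at setup time */
};

static bool platform_forces_dma_api(void)
{
    return true;                 /* e.g. xen_domain() || an arch hook */
}

/* Setup time: evaluate the policy exactly once per virtqueue. */
static void vring_setup(struct vring_virtqueue *vq)
{
    vq->use_dma_api = platform_forces_dma_api();
}

/* Data path: a plain flag test, no per-buffer policy decision. */
static unsigned long vring_map(struct vring_virtqueue *vq, unsigned long pa)
{
    if (!vq->use_dma_api)
        return pa;               /* legacy direct-GPA path */
    return pa + 0x1000;          /* stand-in for a dma_map_single() call */
}
```

This also addresses Ben's correction that the arch must not override the DMA ops on every call: the hook runs once, at setup.]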
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 22:13 ` Benjamin Herrenschmidt @ 2018-08-06 23:16 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-06 23:16 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Tue, 2018-08-07 at 08:13 +1000, Benjamin Herrenschmidt wrote: > > OK well, assuming Christoph can solve the direct case in a way that > also work for the virtio !iommu case, we still want some bit of logic > somewhere that will "switch" to swiotlb based ops if the DMA mask is > limited. > > You mentioned an RFC for that ? Do you happen to have a link ? > > It would be indeed ideal if all we had to do was setup some kind of > bus_dma_mask on all PCI devices and have virtio automagically insert > swiotlb when necessary. Actually... I can think of a simpler option (Anshuman, didn't you prototype this earlier ?): Since that limitation of requiring bounce buffering via swiotlb is true of any device in a secure VM, whether it goes through the iommu or not, the iommu remapping is essentially pointless. Thus, we could ensure that the iommu maps 1:1 the swiotlb bounce buffer (either that or we configure it as "disabled" which is equivalent in this case). That way, we can now use the basic swiotlb ops everywhere, the same dma_ops (swiotlb) will work whether a device uses the iommu or not. Which boils down now to only making virtio use dma ops, there is no need to override the dma_ops. Which means all we have to do is either make xen_domain() return true (yuck) or replace that one test with arch_virtio_force_dma_api() which resolves to xen_domain() on x86 and can do something else for us. 
As to using a virtio feature flag for that, which is what Christoph proposes, I'm not much of a fan of it because this means effectively exposing this to the peer, ie the interface. I don't think it belongs there. The interface, from the hypervisor perspective, whether it's qemu, vmware, hyperv etc... has no business knowing how the guest manages its dma operations, and may not even be aware of the access limitations (in our case they are somewhat guest self-imposed). Now, if this flag really is what we have to do, then we'd probably need a qemu hack which will go set that flag on all virtio devices when it detects that the VM is going secure. But I don't think that's where that information "need to use the dma API even for direct mode" belongs. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
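[Editor's note: the `arch_virtio_force_dma_api()` idea Ben floats above can be sketched like this. The hook, the config symbol, and both predicates are hypothetical stand-ins — the intent is only to show x86 keeping the Xen behaviour while a secure powerpc guest also forces the DMA API (and, with the iommu mapped 1:1 over the swiotlb pool, plain swiotlb ops then work for every device):

```c
#include <assert.h>
#include <stdbool.h>

#define CONFIG_PPC_SVM 1  /* pretend-config: build for a secure guest */

static bool xen_domain(void) { return false; }      /* stub */
static bool is_secure_guest(void) { return true; }  /* stub: Ultravisor VM */

/* Hypothetical replacement for the bare xen_domain() test in
 * vring_use_dma_api(): one arch-overridable predicate. */
static bool arch_virtio_force_dma_api(void)
{
#ifdef CONFIG_PPC_SVM
    return xen_domain() || is_secure_guest();
#else
    return xen_domain();
#endif
}
```

Nothing here touches the virtio feature negotiation, which is exactly Ben's point: the decision stays inside the guest.]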
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 22:13 ` Benjamin Herrenschmidt (?) (?) @ 2018-08-06 23:45 ` Michael S. Tsirkin 2018-08-07 0:18 ` Benjamin Herrenschmidt ` (2 more replies) -1 siblings, 3 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-06 23:45 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Tue, Aug 07, 2018 at 08:13:56AM +1000, Benjamin Herrenschmidt wrote: > On Tue, 2018-08-07 at 00:46 +0300, Michael S. Tsirkin wrote: > > On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote: > > > On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote: > > > > > As I said replying to Christoph, we are "leaking" into the interface > > > > > something here that is really what's the VM is doing to itself, which > > > > > is to stash its memory away in an inaccessible place. > > > > > > > > > > Cheers, > > > > > Ben. > > > > > > > > I think Christoph merely objects to the specific implementation. If > > > > instead you do something like tweak dev->bus_dma_mask for the virtio > > > > device I think he won't object. > > > > > > Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ? > > > > > > So, something like that would be a possibility, but the problem is that > > > the current virtio (guest side) implementation doesn't honor this when > > > not using dma ops and will not use dma ops if not using iommu, so back > > > to square one. > > > > Well we have the RFC for that - the switch to using DMA ops unconditionally isn't > > problematic itself IMHO, for now that RFC is blocked > > by its perfromance overhead for now but Christoph says > > he's trying to remove that for direct mappings, > > so we should hopefully be able to get there in X weeks. > > That would be good yes. 
> > ../.. > > > > --- a/drivers/virtio/virtio_ring.c > > > +++ b/drivers/virtio/virtio_ring.c > > > @@ -155,7 +155,7 @@ static bool vring_use_dma_api(struct virtio_device > > > *vdev) > > > * the DMA API if we're a Xen guest, which at least allows > > > * all of the sensible Xen configurations to work correctly. > > > */ > > > - if (xen_domain()) > > > + if (xen_domain() || arch_virtio_direct_dma_ops(&vdev->dev)) > > > return true; > > > > > > return false; > > > > Right but can't we fix the retpoline overhead such that > > vring_use_dma_api will not be called on data path any longer, making > > this a setup time check? > > Yes it needs to be a setup time check regardless actually ! > > The above is broken, sorry I was a bit quick here (too early in the > morning... ugh). We don't want the arch to go override the dma ops > every time that is callled. > > But yes, if we can fix the overhead, it becomes just a matter of > setting up the "right" ops automatically. > > > > (Passing the dev allows the arch to know this is a virtio device in > > > "direct" mode or whatever we want to call the !iommu case, and > > > construct appropriate DMA ops for it, which aren't the same as the DMA > > > ops of any other PCI device who *do* use the iommu). > > > > I think that's where Christoph might have specific ideas about it. > > OK well, assuming Christoph can solve the direct case in a way that > also work for the virtio !iommu case, we still want some bit of logic > somewhere that will "switch" to swiotlb based ops if the DMA mask is > limited. > > You mentioned an RFC for that ? Do you happen to have a link ? No but Christoph did I think. > It would be indeed ideal if all we had to do was setup some kind of > bus_dma_mask on all PCI devices and have virtio automagically insert > swiotlb when necessary. > > Cheers, > Ben. > ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 23:45 ` Michael S. Tsirkin @ 2018-08-07 0:18 ` Benjamin Herrenschmidt 2018-08-07 6:32 ` Christoph Hellwig 2018-08-07 6:32 ` Christoph Hellwig 2 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-07 0:18 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Tue, 2018-08-07 at 02:45 +0300, Michael S. Tsirkin wrote: > > OK well, assuming Christoph can solve the direct case in a way that > > also work for the virtio !iommu case, we still want some bit of logic > > somewhere that will "switch" to swiotlb based ops if the DMA mask is > > limited. > > > > You mentioned an RFC for that ? Do you happen to have a link ? > > No but Christoph did I think. Ok I missed that, sorry, I'll dig it out. Thanks. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 23:45 ` Michael S. Tsirkin 2018-08-07 0:18 ` Benjamin Herrenschmidt @ 2018-08-07 6:32 ` Christoph Hellwig 2018-08-07 6:32 ` Christoph Hellwig 2 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-07 6:32 UTC (permalink / raw) To: Michael S. Tsirkin Cc: robh, srikar, Benjamin Herrenschmidt, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, jean-philippe.brucker, paulus, marc.zyngier, mpe, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Tue, Aug 07, 2018 at 02:45:25AM +0300, Michael S. Tsirkin wrote: > > > I think that's where Christoph might have specific ideas about it. > > > > OK well, assuming Christoph can solve the direct case in a way that > > also work for the virtio !iommu case, we still want some bit of logic > > somewhere that will "switch" to swiotlb based ops if the DMA mask is > > limited. > > > > You mentioned an RFC for that ? Do you happen to have a link ? > > No but Christoph did I think. Do you mean the direct map retpoline mitigation? It is here: https://www.spinics.net/lists/netdev/msg495413.html https://www.spinics.net/lists/netdev/msg495785.html ^ permalink raw reply [flat|nested] 206+ messages in thread
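[Editor's note: the retpoline mitigation Christoph links above boils down to recognising the direct ops by identity in the common DMA code and making a plain call, paying the indirect (retpolined) call only for real per-bus ops. The sketch below models that shape with simplified stand-in types; it is not the kernel's actual code:

```c
#include <assert.h>

typedef unsigned long dma_addr_t;

struct dma_map_ops {
    dma_addr_t (*map)(unsigned long pa);
};

/* Direct mapping: DMA address == physical address. */
static dma_addr_t dma_direct_map(unsigned long pa) { return pa; }
static const struct dma_map_ops dma_direct_ops = { .map = dma_direct_map };

/* Some iommu-backed ops, as a contrast; the offset is arbitrary. */
static dma_addr_t iommu_map(unsigned long pa) { return pa | 0x80000000UL; }
static const struct dma_map_ops iommu_ops = { .map = iommu_map };

struct device { const struct dma_map_ops *dma_ops; };

static dma_addr_t dma_map_single(struct device *dev, unsigned long pa)
{
    if (dev->dma_ops == &dma_direct_ops)
        return dma_direct_map(pa);   /* fast path: no indirect call */
    return dev->dma_ops->map(pa);    /* slow path: retpolined call */
}
```

With this in place, virtio using the DMA API unconditionally costs a predictable branch rather than an indirect call on the data path.]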
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 22:13 ` Benjamin Herrenschmidt ` (3 preceding siblings ...) (?) @ 2018-08-07 6:27 ` Christoph Hellwig -1 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-07 6:27 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: robh, srikar, Michael S. Tsirkin, mpe, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, jean-philippe.brucker, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Tue, Aug 07, 2018 at 08:13:56AM +1000, Benjamin Herrenschmidt wrote: > It would be indeed ideal if all we had to do was setup some kind of > bus_dma_mask on all PCI devices and have virtio automagically insert > swiotlb when necessary. For 4.20 I plan to remove the swiotlb ops and instead do the bounce buffering in the common code, including a direct call to the direct ops to avoid retpoline overhead. For that you still need a flag in virtio so that, instead of blindly working with physical addresses, it is treated like a real device in terms of DMA. And for powerpc to make use of that I need to get the dma series I posted last week reviewed and included, otherwise powerpc will have to be excepted (like arm, where rmk didn't like the way the code was factored, everything else has already been taken care of). https://lists.linuxfoundation.org/pipermail/iommu/2018-July/028989.html ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-07 6:27 ` Christoph Hellwig @ 2018-08-07 6:44 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-07 6:44 UTC (permalink / raw) To: Christoph Hellwig Cc: Michael S. Tsirkin, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Mon, 2018-08-06 at 23:27 -0700, Christoph Hellwig wrote: > On Tue, Aug 07, 2018 at 08:13:56AM +1000, Benjamin Herrenschmidt wrote: > > It would be indeed ideal if all we had to do was setup some kind of > > bus_dma_mask on all PCI devices and have virtio automagically insert > > swiotlb when necessary. > > For 4.20 I plan to remove the swiotlb ops and instead do the bounce > buffering in the common code, including a direct call to the direct > ops to avoid retpoline overhead. For that you still need a flag > in virtio that instead of blindly working physical addresses it needs > to be treated like a real device in terms of DMA. But you will still call the swiotlb infrastructure, right ? IE, I still need to control where/how the swiotlb "pool" is allocated. > > And for powerpc to make use of that I need to get the dma series I > posted last week reviewed and included, otherwise powerpc will have > to be excepted (like arm, where rmk didn't like the way the code > was factored, everything else has already been taken care of). > > https://lists.linuxfoundation.org/pipermail/iommu/2018-July/028989.html Yes, I saw your series. I'm just back from a week of travel, I plan to start reviewing it this week if Michael doesn't beat me to it. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 21:46 ` Michael S. Tsirkin 2018-08-06 22:13 ` Benjamin Herrenschmidt @ 2018-08-07 6:18 ` Christoph Hellwig 2018-08-07 6:18 ` Christoph Hellwig 2 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-07 6:18 UTC (permalink / raw) To: Michael S. Tsirkin Cc: robh, srikar, Benjamin Herrenschmidt, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, jean-philippe.brucker, paulus, marc.zyngier, mpe, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Tue, Aug 07, 2018 at 12:46:34AM +0300, Michael S. Tsirkin wrote: > Well we have the RFC for that - the switch to using DMA ops unconditionally isn't > problematic itself IMHO, for now that RFC is blocked > by its performance overhead but Christoph says > he's trying to remove that for direct mappings, > so we should hopefully be able to get there in X weeks. The direct calls to dma_direct_ops aren't going to help you with legacy virtio, given that virtio is specified to deal with physical addresses, while dma-direct is not in many cases. It would however help with the case where qemu always sets the platform dma flag, as we'd avoid the indirect calls for that. ^ permalink raw reply [flat|nested] 206+ messages in thread
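Christoph's "direct calls to dma_direct_ops" idea can be modeled in userspace: the common mapping helper recognizes the direct-mapping ops structure and calls it statically, so the indirect ops-pointer dispatch (and its retpoline cost on x86) only happens for translated mappings. Everything below is an illustrative sketch, not the actual dma-mapping code.

```c
#include <stdint.h>

/*
 * Userspace model of special-casing the direct mapping in common
 * code. All names are invented for the sketch.
 */
typedef uint64_t dma_addr_model_t;

struct dma_ops_model {
	dma_addr_model_t (*map)(void *cpu_addr);
};

static dma_addr_model_t direct_map(void *cpu_addr)
{
	/* direct mapping: bus address == CPU address (modeled) */
	return (dma_addr_model_t)(uintptr_t)cpu_addr;
}

static dma_addr_model_t iommu_map(void *cpu_addr)
{
	/* stand-in for a translated (iommu/offset) mapping */
	return 0x1000u ^ (dma_addr_model_t)(uintptr_t)cpu_addr;
}

static const struct dma_ops_model direct_ops_model = { .map = direct_map };

static dma_addr_model_t model_dma_map(const struct dma_ops_model *ops,
				      void *cpu_addr)
{
	/* The optimization: recognize the direct ops, call statically. */
	if (ops == &direct_ops_model)
		return direct_map(cpu_addr);
	return ops->map(cpu_addr);	/* indirect call only when needed */
}
```

Note this is also why it does not help legacy virtio as-is: virtio hands the device physical addresses regardless of which branch is taken, so the flag has to opt the device into real DMA treatment first.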
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 21:26 ` Benjamin Herrenschmidt @ 2018-08-07 6:16 ` Christoph Hellwig -1 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-07 6:16 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Michael S. Tsirkin, Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Tue, Aug 07, 2018 at 07:26:35AM +1000, Benjamin Herrenschmidt wrote: > > I think Christoph merely objects to the specific implementation. If > > instead you do something like tweak dev->bus_dma_mask for the virtio > > device I think he won't object. > > Well, we don't have "bus_dma_mask" yet ..or you mean dma_mask ? It will be new in 4.19: http://git.infradead.org/users/hch/dma-mapping.git/commitdiff/f07d141fe9430cdf9f8a65a87c41 > So, something like that would be a possibility, but the problem is that > the current virtio (guest side) implementation doesn't honor this when > not using dma ops and will not use dma ops if not using iommu, so back > to square one. > > Christoph seems to be wanting to use a flag in the interface to make > the guest use dma_ops which is what I don't understand. As-is virtio devices are very clearly and explicitly defined to use physical addresses in the spec. dma ops will often do platform based translations (iommu, offsets), so we can't just use the platform default dma ops and will need to opt into them. > What would be needed then would be something along the lines of virtio > noticing that dma_mask isn't big enough to cover all of memory (which > isn't something generic code can easily do here for various reasons I > can elaborate if you want, but that specific test more/less has to be > arch specific), and in that case, force itself to use DMA ops routed to > swiotlb. 
> > I'd rather have arch code do the bulk of that work, don't you think ? There is nothing architecture specific about that. I've been working hard to remove all the bullshit architectures have done in their DMA ops and consolidating them into common code based on rules. The last thing I want is another vector for weird underspecified arch interaction with DMA ops, which is exactly what your patch below does. ^ permalink raw reply [flat|nested] 206+ messages in thread
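The check Ben describes — compare the device's usable DMA mask against the highest RAM address and fall back to swiotlb bounce buffering when the mask falls short — is a one-line decision at setup time. Here is a hedged userspace model of just that decision, with invented names; the real policy would live wherever the dma ops for the device get chosen.

```c
#include <stdint.h>

/*
 * Userspace model: if the DMA mask cannot cover all of guest memory
 * (e.g. only a window of "insecure" pages is addressable in a secure
 * VM), route the device's DMA through swiotlb bounce buffering;
 * otherwise keep the direct path. Names are illustrative.
 */
enum dma_path_model { DMA_PATH_DIRECT, DMA_PATH_SWIOTLB };

static enum dma_path_model pick_dma_path(uint64_t dma_mask,
					 uint64_t max_ram_addr)
{
	/* mask covers the highest RAM address -> no bouncing needed */
	if (dma_mask >= max_ram_addr)
		return DMA_PATH_DIRECT;
	return DMA_PATH_SWIOTLB;
}
```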
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 20:35 ` Michael S. Tsirkin @ 2018-08-06 23:18 ` Benjamin Herrenschmidt 2018-08-06 23:18 ` Benjamin Herrenschmidt 2018-08-07 6:12 ` Christoph Hellwig 2 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-06 23:18 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Mon, 2018-08-06 at 23:35 +0300, Michael S. Tsirkin wrote: > On Tue, Aug 07, 2018 at 05:56:59AM +1000, Benjamin Herrenschmidt wrote: > > On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote: > > > > > > > Right, we'll need some quirk to disable balloons in the guest I > > > > suppose. > > > > > > > > Passing something from libvirt is cumbersome because the end user may > > > > not even need to know about secure VMs. There are use cases where the > > > > security is a contract down to some special application running inside > > > > the secure VM, the sysadmin knows nothing about. > > > > > > > > Also there's repercussions all the way to admin tools, web UIs etc... > > > > so it's fairly wide ranging. > > > > > > > > So as long as we only need to quirk a couple of devices, it's much > > > > better contained that way. > > > > > > So just the balloon thing already means that yes management and all the > > > way to the user tools must know this is going on. Otherwise > > > user will try to inflate the balloon and wonder why this does not work. > > > > There is *dozens* of management systems out there, not even all open > > source, we won't ever be able to see the end of the tunnel if we need > > to teach every single of them, including end users, about platform > > specific new VM flags like that. > > > > .../... > > In the end I suspect you will find you have to. Maybe... 
we'll tackle this if/when we have to. For balloon I suspect it's not such a big deal because once secure, all the guest memory goes into the secure memory which isn't visible or accounted by the hypervisor, so there's nothing to steal but the guest is also using no HV memory (other than the few "non-secure" pages used for swiotlb and a couple of other kernel things). Future versions of our secure architecture might allow to turn arbitrary pages of memory secure/non-secure rather than relying on a separate physical pool, in which case, the balloon will be able to work normally. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 20:35 ` Michael S. Tsirkin @ 2018-08-07 6:12 ` Christoph Hellwig 2018-08-06 23:18 ` Benjamin Herrenschmidt 2018-08-07 6:12 ` Christoph Hellwig 2 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-07 6:12 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Benjamin Herrenschmidt, Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Mon, Aug 06, 2018 at 11:35:39PM +0300, Michael S. Tsirkin wrote: > > As I said replying to Christoph, we are "leaking" into the interface > > something here that is really what's the VM is doing to itself, which > > is to stash its memory away in an inaccessible place. > > > > Cheers, > > Ben. > > I think Christoph merely objects to the specific implementation. If > instead you do something like tweak dev->bus_dma_mask for the virtio > device I think he won't object. As long as we also document how dev->bus_dma_mask is tweaked for this particular virtual bus, yes. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 19:56 ` Benjamin Herrenschmidt (?) (?) @ 2018-08-06 20:35 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-06 20:35 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, jean-philippe.brucker, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Tue, Aug 07, 2018 at 05:56:59AM +1000, Benjamin Herrenschmidt wrote: > On Mon, 2018-08-06 at 16:46 +0300, Michael S. Tsirkin wrote: > > > > > Right, we'll need some quirk to disable balloons in the guest I > > > suppose. > > > > > > Passing something from libvirt is cumbersome because the end user may > > > not even need to know about secure VMs. There are use cases where the > > > security is a contract down to some special application running inside > > > the secure VM, the sysadmin knows nothing about. > > > > > > Also there's repercussions all the way to admin tools, web UIs etc... > > > so it's fairly wide ranging. > > > > > > So as long as we only need to quirk a couple of devices, it's much > > > better contained that way. > > > > So just the balloon thing already means that yes management and all the > > way to the user tools must know this is going on. Otherwise > > user will try to inflate the balloon and wonder why this does not work. > > There is *dozens* of management systems out there, not even all open > source, we won't ever be able to see the end of the tunnel if we need > to teach every single of them, including end users, about platform > specific new VM flags like that. > > .../... In the end I suspect you will find you have to. > > Here's another example: you can't migrate a secure vm to hypervisor > > which doesn't support this feature. Again management tools above libvirt > > need to know otherwise they will try. 
> > There will have to be a new machine type for that I suppose, yes, > though it's not just the hypervisor that needs to know about the > modified migration stream, it's also the need to have a compatible > ultravisor with the right keys on the other side. > > So migration is going to be special and require extra admin work in all > cases yes. But not all secure VMs are meant to be migratable. > > In any case, back to the problem at hand. What a qemu flag gives us is > just a way to force iommu at VM creation time. I don't think a qemu flag is strictly required for the problem at hand. > This is rather sub-optimal, we don't really want the iommu in the way, > so it's at best a "workaround", and it's not really solving the real > problem. This specific problem, I think I agree. > As I said replying to Christoph, we are "leaking" into the interface > something here that is really what's the VM is doing to itself, which > is to stash its memory away in an inaccessible place. > > Cheers, > Ben. I think Christoph merely objects to the specific implementation. If instead you do something like tweak dev->bus_dma_mask for the virtio device I think he won't object. -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
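The bus_dma_mask suggestion amounts to a probe-time quirk that narrows dev->bus_dma_mask, after which generic code clamps the device's effective mask and engages bouncing automatically for anything above it. A userspace sketch of that mechanism follows; the field semantics track the 4.19 bus_dma_mask work (0 meaning "no bus restriction"), while the quirk function and limit value are hypothetical.

```c
#include <stdint.h>

/*
 * Model of the bus_dma_mask idea: a quirk lowers dev->bus_dma_mask
 * at probe time, and the generic DMA code uses
 * min(dma_mask, bus_dma_mask) as the effective mask. Names are
 * illustrative, not the kernel's struct device.
 */
struct device_model {
	uint64_t dma_mask;
	uint64_t bus_dma_mask;	/* 0 == no bus restriction */
};

static uint64_t effective_dma_mask(const struct device_model *dev)
{
	uint64_t mask = dev->dma_mask;

	if (dev->bus_dma_mask && dev->bus_dma_mask < mask)
		mask = dev->bus_dma_mask;
	return mask;
}

/* What a secure-VM quirk might do at probe time (hypothetical). */
static void secure_vm_quirk(struct device_model *dev, uint64_t insecure_limit)
{
	dev->bus_dma_mask = insecure_limit;
}
```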
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-05 4:52 ` Benjamin Herrenschmidt (?) (?) @ 2018-08-06 13:46 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-06 13:46 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, jean-philippe.brucker, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Sun, Aug 05, 2018 at 02:52:54PM +1000, Benjamin Herrenschmidt wrote: > On Sun, 2018-08-05 at 03:22 +0300, Michael S. Tsirkin wrote: > > I see the allure of this, but I think down the road you will > > discover passing a flag in libvirt XML saying > > "please use a secure mode" or whatever is a good idea. > > > > Even thought it is probably not required to address this > > specific issue. > > > > For example, I don't think ballooning works in secure mode, > > you will be able to teach libvirt not to try to add a > > balloon to the guest. > > Right, we'll need some quirk to disable balloons in the guest I > suppose. > > Passing something from libvirt is cumbersome because the end user may > not even need to know about secure VMs. There are use cases where the > security is a contract down to some special application running inside > the secure VM, the sysadmin knows nothing about. > > Also there's repercussions all the way to admin tools, web UIs etc... > so it's fairly wide ranging. > > So as long as we only need to quirk a couple of devices, it's much > better contained that way. So just the balloon thing already means that yes management and all the way to the user tools must know this is going on. Otherwise user will try to inflate the balloon and wonder why this does not work. > > > Later on, (we may have even already run Linux at that point, > > > unsecurely, as we can use Linux as a bootloader under some > > > circumstances), we start a "secure image". 
> > > > > > This is a kernel zImage that includes a "ticket" that has the > > > appropriate signature etc... so that when that kernel starts, it can > > > authenticate with the ultravisor, be verified (along with its ramdisk) > > > etc... and copied (by the UV) into secure memory & run from there. > > > > > > At that point, the hypervisor is informed that the VM has become > > > secure. > > > > > > So at that point, we could exit to qemu to inform it of the change, > > > > That's probably a good idea too. > > We probably will have to tell qemu eventually for migration, as we'll > need some kind of key exchange phase etc... to deal with the crypto > aspects (the actual page copy is sorted via encrypting the secure pages > back to normal pages in qemu, but we'll need extra metadata). > > > > and > > > have it walk the qtree and "Switch" all the virtio devices to use the > > > IOMMU I suppose, but it feels a lot grosser to me. > > > > That part feels gross, yes. > > > > > That's the only other option I can think of. > > > > > > > However in this specific case, the flag does not need to come from the > > > > hypervisor, it can be set by arch boot code I think. > > > > Christoph do you see a problem with that? > > > > > > The above could do that yes. Another approach would be to do it from a > > > small virtio "quirk" that pokes a bit in the device to force it to > > > iommu mode when it detects that we are running in a secure VM. That's a > > > bit warty on the virito side but probably not as much as having a qemu > > > one that walks of the virtio devices to change how they behave. > > > > > > What do you reckon ? > > > > I think you are right that for the dma limit the hypervisor doesn't seem > > to need to know. > > It's not just a limit mind you. It's a range, at least if we allocate > just a single pool of insecure pages. swiotlb feels like a better > option for us. 
> > What we want to avoid is to expose any of this to the *end user* or > > libvirt or any other higher level of the management stack. > > > > We will need some other qemu changes for migration so that's ok. But > > the minute you start touching libvirt and the higher levels it becomes > > a nightmare. > > > > Cheers, > > Ben. > > I don't believe you'll be able to avoid that entirely. The split between > libvirt and qemu is more about community than about code, random bits of > functionality tend to land on random sides of that fence. Better add a > tag in domain XML early is my advice. Having said that, it's your > hypervisor. I'm just suggesting that when hypervisor does somehow need > to care then I suspect most people won't be receptive to the argument > that changing libvirt is a nightmare. > > It only needs to care at runtime. The problem isn't changing libvirt > per-se, I don't have a problem with that. The problem is that it means > creating two categories of machines "secure" and "non-secure", which is > end-user visible, and thus has to be escalated to all the various > management stacks, UIs, etc... out there. > > In addition, there are some cases where the individual creating the VMs > may not have any idea that they are secure. > > But yes, if we have to, we'll do it. However, so far, we don't think > it's a great idea. > > Cheers, > Ben. Here's another example: you can't migrate a secure vm to hypervisor which doesn't support this feature. Again management tools above libvirt need to know otherwise they will try. > > > > > > To get swiotlb you'll need to then use the DT/ACPI > > > > > > dma-range property to limit the addressable range, and a swiotlb > > > > > > capable platform will use swiotlb automatically. > > > > > > > > > > This cannot be done as you describe it. 
> > > > > The VM is created as a *normal* VM. The DT stuff is generated by qemu > > > > > at a point where it has *no idea* that the VM will later become secure > > > > > and thus will have to restrict which pages can be used for "DMA". > > > > > > > > > > The VM will *at runtime* turn itself into a secure VM via interactions > > > > > with the security HW and the Ultravisor layer (which sits below the > > > > > HV). This happens way after the DT has been created and consumed, the > > > > > qemu devices instantiated etc... > > > > > > > > > > Only the guest kernel knows because it initiates the transition. When > > > > > that happens, the virtio devices have already been used by the guest > > > > > firmware, bootloader, possibly another kernel that kexeced the "secure" > > > > > one, etc... > > > > > > > > > > So instead of running around saying NAK NAK NAK, please explain how we > > > > > can solve that differently. > > > > > > > > > > Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-04 1:16 ` Benjamin Herrenschmidt 2018-08-05 0:22 ` Michael S. Tsirkin @ 2018-08-05 0:22 ` Michael S. Tsirkin 1 sibling, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-05 0:22 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Fri, Aug 03, 2018 at 08:16:21PM -0500, Benjamin Herrenschmidt wrote: > On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote: > > On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote: > > > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote: > > > > > 2- Make virtio use the DMA API with our custom platform-provided > > > > > swiotlb callbacks when needed, that is when not using IOMMU *and* > > > > > running on a secure VM in our case. > > > > > > > > And total NAK the custom platform-provided part of this. We need > > > > a flag passed in from the hypervisor that the device needs all bus > > > > specific dma api treatment, and then just use the normal platform > > > > dma mapping setup. > > > > > > Christoph, as I have explained already, we do NOT have a way to provide > > > such a flag as neither the hypervisor nor qemu knows anything about > > > this when the VM is created. > > > > I think the fact you can't add flags from the hypervisor is > > a sign of a problematic architecture, you should look at > > adding that down the road - you will likely need it at some point. > > Well, we can later in the boot process. At VM creation time, it's just > a normal VM. The VM firmware, bootloader etc... are just operating > normally etc... I see the allure of this, but I think down the road you will discover passing a flag in libvirt XML saying "please use a secure mode" or whatever is a good idea. 
Even though it is probably not required to address this specific issue. For example, I don't think ballooning works in secure mode; you will be able to teach libvirt not to try to add a balloon to the guest. > Later on, (we may have even already run Linux at that point, > insecurely, as we can use Linux as a bootloader under some > circumstances), we start a "secure image". > > This is a kernel zImage that includes a "ticket" that has the > appropriate signature etc... so that when that kernel starts, it can > authenticate with the ultravisor, be verified (along with its ramdisk) > etc... and copied (by the UV) into secure memory & run from there. > > At that point, the hypervisor is informed that the VM has become > secure. > > So at that point, we could exit to qemu to inform it of the change, That's probably a good idea too. > and > have it walk the qtree and "Switch" all the virtio devices to use the > IOMMU I suppose, but it feels a lot grosser to me. That part feels gross, yes. > That's the only other option I can think of. > > > However in this specific case, the flag does not need to come from the > > hypervisor, it can be set by arch boot code I think. > > Christoph do you see a problem with that? > > The above could do that, yes. Another approach would be to do it from a > small virtio "quirk" that pokes a bit in the device to force it to > iommu mode when it detects that we are running in a secure VM. That's a > bit warty on the virtio side but probably not as much as having a qemu > one that walks over the virtio devices to change how they behave. > > What do you reckon? I think you are right that for the dma limit the hypervisor doesn't seem to need to know. > What we want to avoid is to expose any of this to the *end user* or > libvirt or any other higher level of the management stack. We really > want that stuff to remain contained between the VM itself, KVM and > maybe qemu. > > We will need some other qemu changes for migration so that's ok.
But > the minute you start touching libvirt and the higher levels it becomes > a nightmare. > > Cheers, > Ben. I don't believe you'll be able to avoid that entirely. The split between libvirt and qemu is more about community than about code; random bits of functionality tend to land on random sides of that fence. My advice is to add a tag in domain XML early. Having said that, it's your hypervisor. I'm just suggesting that when the hypervisor does somehow need to care, then I suspect most people won't be receptive to the argument that changing libvirt is a nightmare. > > > > To get swiotlb you'll need to then use the DT/ACPI > > > > dma-range property to limit the addressable range, and a swiotlb > > > > capable platform will use swiotlb automatically. > > > > > > This cannot be done as you describe it. > > > > > > The VM is created as a *normal* VM. The DT stuff is generated by qemu > > > at a point where it has *no idea* that the VM will later become secure > > > and thus will have to restrict which pages can be used for "DMA". > > > > > > The VM will *at runtime* turn itself into a secure VM via interactions > > > with the security HW and the Ultravisor layer (which sits below the > > > HV). This happens way after the DT has been created and consumed, the > > > qemu devices instantiated etc... > > > > > > Only the guest kernel knows because it initiates the transition. When > > > that happens, the virtio devices have already been used by the guest > > > firmware, bootloader, possibly another kernel that kexec'd the "secure" > > > one, etc... > > > > > > So instead of running around saying NAK NAK NAK, please explain how we > > > can solve that differently. > > > > > > Ben. > > > ^ permalink raw reply [flat|nested] 206+ messages in thread
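The ops-selection logic discussed above — honor VIRTIO_F_IOMMU_PLATFORM when offered, otherwise fall back to the legacy direct path unless arch boot code has flagged the VM as secure — can be modeled in plain C. This is a sketch only: `platform_is_secure_vm` and the ops identifiers are hypothetical stand-ins; the actual series wires the decision into `virtio_finalize_features()` and a platform override hook.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers standing in for real dma_map_ops tables. */
enum dma_ops_kind { OPS_ARCH_IOMMU, OPS_DIRECT, OPS_SWIOTLB_BOUNCE };

/* Would be set by arch boot code once the guest has turned itself
 * secure; the name is illustrative, not a real kernel symbol. */
static bool platform_is_secure_vm;

static enum dma_ops_kind choose_virtio_dma_ops(bool iommu_platform_feature)
{
    if (iommu_platform_feature)
        return OPS_ARCH_IOMMU;      /* device sits behind a (v)IOMMU */
    if (platform_is_secure_vm)
        return OPS_SWIOTLB_BOUNCE;  /* bounce I/O through shared memory */
    return OPS_DIRECT;              /* preserve legacy direct-GPA semantics */
}
```

The point Christoph concedes here is that the second test need not come from the hypervisor at all; it is purely guest-local state.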
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-03 19:07 ` Michael S. Tsirkin ` (2 preceding siblings ...) 2018-08-04 1:16 ` Benjamin Herrenschmidt @ 2018-08-04 1:18 ` Benjamin Herrenschmidt 2018-08-04 1:18 ` Benjamin Herrenschmidt 2018-08-04 1:22 ` Benjamin Herrenschmidt 5 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-04 1:18 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote: > On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote: > > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote: > > > > 2- Make virtio use the DMA API with our custom platform-provided > > > > swiotlb callbacks when needed, that is when not using IOMMU *and* > > > > running on a secure VM in our case. > > > > > > And total NAK the custom platform-provided part of this. We need > > > a flag passed in from the hypervisor that the device needs all bus > > > specific dma api treatment, and then just use the normal platform > > > dma mapping setup. > > > > Christoph, as I have explained already, we do NOT have a way to provide > > such a flag as neither the hypervisor nor qemu knows anything about > > this when the VM is created. > > I think the fact you can't add flags from the hypervisor is > a sign of a problematic architecture, you should look at > adding that down the road - you will likely need it at some point. Well, we can later in the boot process. At VM creation time, it's just a normal VM. The VM firmware, bootloader etc... are just operating normally etc... Later on, (we may have even already run Linux at that point, insecurely, as we can use Linux as a bootloader under some circumstances), we start a "secure image".
This is a kernel zImage that includes a "ticket" that has the appropriate signature etc... so that when that kernel starts, it can authenticate with the ultravisor, be verified (along with its ramdisk) etc... and copied (by the UV) into secure memory & run from there. At that point, the hypervisor is informed that the VM has become secure. So at that point, we could exit to qemu to inform it of the change, and have it walk the qtree and "Switch" all the virtio devices to use the IOMMU I suppose, but it feels a lot grosser to me. That's the only other option I can think of. > However in this specific case, the flag does not need to come from the > hypervisor, it can be set by arch boot code I think. > Christoph do you see a problem with that? The above could do that, yes. Another approach would be to do it from a small virtio "quirk" that pokes a bit in the device to force it to iommu mode when it detects that we are running in a secure VM. That's a bit warty on the virtio side but probably not as much as having a qemu one that walks over the virtio devices to change how they behave. What do you reckon? What we want to avoid is to expose any of this to the *end user* or libvirt or any other higher level of the management stack. We really want that stuff to remain contained between the VM itself, KVM and maybe qemu. We will need some other qemu changes for migration so that's ok. But the minute you start touching libvirt and the higher levels it becomes a nightmare. Cheers, Ben. > > > To get swiotlb you'll need to then use the DT/ACPI > > > dma-range property to limit the addressable range, and a swiotlb > > > capable platform will use swiotlb automatically. > > > > This cannot be done as you describe it. > > > > The VM is created as a *normal* VM. The DT stuff is generated by qemu > > at a point where it has *no idea* that the VM will later become secure > > and thus will have to restrict which pages can be used for "DMA".
> > > > The VM will *at runtime* turn itself into a secure VM via interactions > > with the security HW and the Ultravisor layer (which sits below the > > HV). This happens way after the DT has been created and consumed, the > > qemu devices instantiated etc... > > > > Only the guest kernel knows because it initiates the transition. When > > that happens, the virtio devices have already been used by the guest > > firmware, bootloader, possibly another kernel that kexec'd the "secure" > > one, etc... > > > > So instead of running around saying NAK NAK NAK, please explain how we > > can solve that differently. > > > > Ben. > > ^ permalink raw reply [flat|nested] 206+ messages in thread
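Ben's "small virtio quirk" alternative amounts to forcing the IOMMU-platform path on during feature negotiation once the guest detects it has gone secure. A minimal model of that idea follows; `guest_is_secure` is an assumed detection hook, and only the feature-bit value (VIRTIO_F_IOMMU_PLATFORM = 33, per the virtio spec) is taken from reality — the real quirk would poke the device, not just local state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VIRTIO_F_IOMMU_PLATFORM 33   /* feature bit number from the virtio spec */

static bool guest_is_secure;         /* assumed secure-VM detection hook */

/* Quirk applied during negotiation: behave as if the device had
 * required the IOMMU-platform DMA path, so the core stops using
 * raw guest physical addresses. */
static uint64_t apply_secure_vm_quirk(uint64_t driver_features)
{
    if (guest_is_secure)
        driver_features |= (1ULL << VIRTIO_F_IOMMU_PLATFORM);
    return driver_features;
}
```

This keeps the change entirely inside the guest kernel, which is exactly the containment property Ben is after.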
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-03 19:07 ` Michael S. Tsirkin @ 2018-08-04 1:22 ` Benjamin Herrenschmidt 2018-08-04 1:16 ` Benjamin Herrenschmidt ` (4 subsequent siblings) 5 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-04 1:22 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Fri, 2018-08-03 at 22:07 +0300, Michael S. Tsirkin wrote: > On Fri, Aug 03, 2018 at 10:58:36AM -0500, Benjamin Herrenschmidt wrote: > > On Fri, 2018-08-03 at 00:05 -0700, Christoph Hellwig wrote: > > > > 2- Make virtio use the DMA API with our custom platform-provided > > > > swiotlb callbacks when needed, that is when not using IOMMU *and* > > > > running on a secure VM in our case. > > > > > > And total NAK the custom platform-provided part of this. We need > > > a flag passed in from the hypervisor that the device needs all bus > > > specific dma api treatment, and then just use the normal platform > > > dma mapping setup. > > > > Christoph, as I have explained already, we do NOT have a way to provide > > such a flag as neither the hypervisor nor qemu knows anything about > > this when the VM is created. > > I think the fact you can't add flags from the hypervisor is > a sign of a problematic architecture, you should look at > adding that down the road - you will likely need it at some point. (Apologies if you got this twice, my mailer had a brain fart and I don't know if the first one got through & am about to disappear in a plane for 17h) Well, we can later in the boot process. At VM creation time, it's just a normal VM. The VM firmware, bootloader etc... are just operating normally etc...
Later on, (we may have even already run Linux at that point, insecurely, as we can use Linux as a bootloader under some circumstances), we start a "secure image". This is a kernel zImage that includes a "ticket" that has the appropriate signature etc... so that when that kernel starts, it can authenticate with the ultravisor, be verified (along with its ramdisk) etc... and copied (by the UV) into secure memory & run from there. At that point, the hypervisor is informed that the VM has become secure. So at that point, we could exit to qemu to inform it of the change, and have it walk the qtree and "Switch" all the virtio devices to use the IOMMU I suppose, but it feels a lot grosser to me. That's the only other option I can think of. > However in this specific case, the flag does not need to come from the > hypervisor, it can be set by arch boot code I think. > Christoph do you see a problem with that? The above could do that, yes. Another approach would be to do it from a small virtio "quirk" that pokes a bit in the device to force it to iommu mode when it detects that we are running in a secure VM. That's a bit warty on the virtio side but probably not as much as having a qemu one that walks over the virtio devices to change how they behave. What do you reckon? What we want to avoid is to expose any of this to the *end user* or libvirt or any other higher level of the management stack. We really want that stuff to remain contained between the VM itself, KVM and maybe qemu. We will need some other qemu changes for migration so that's ok. But the minute you start touching libvirt and the higher levels it becomes a nightmare. Cheers, Ben. > > > To get swiotlb you'll need to then use the DT/ACPI > > > dma-range property to limit the addressable range, and a swiotlb > > > capable platform will use swiotlb automatically. > > > > This cannot be done as you describe it. > > > > The VM is created as a *normal* VM.
The DT stuff is generated by qemu > > at a point where it has *no idea* that the VM will later become secure > > and thus will have to restrict which pages can be used for "DMA". > > > > The VM will *at runtime* turn itself into a secure VM via interactions > > with the security HW and the Ultravisor layer (which sits below the > > HV). This happens way after the DT has been created and consumed, the > > qemu devices instantiated etc... > > > > Only the guest kernel knows because it initiates the transition. When > > that happens, the virtio devices have already been used by the guest > > firmware, bootloader, possibly another kernel that kexec'd the "secure" > > one, etc... > > > > So instead of running around saying NAK NAK NAK, please explain how we > > can solve that differently. > > > > Ben. > > ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-04 1:22 ` Benjamin Herrenschmidt (?) @ 2018-08-05 0:23 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-05 0:23 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Fri, Aug 03, 2018 at 08:22:11PM -0500, Benjamin Herrenschmidt wrote: > (Apologies if you got this twice, my mailer had a brain fart and I don't > know if the first one got through & am about to disappear in a plane for 17h) I got like 3 of these. I hope that's true for everyone as I replied to the 1st one. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-03 7:05 ` Christoph Hellwig @ 2018-08-03 19:17 ` Michael S. Tsirkin 2018-08-03 19:17 ` Michael S. Tsirkin 1 sibling, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-03 19:17 UTC (permalink / raw) To: Christoph Hellwig Cc: Benjamin Herrenschmidt, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Fri, Aug 03, 2018 at 12:05:07AM -0700, Christoph Hellwig wrote: > On Thu, Aug 02, 2018 at 04:13:09PM -0500, Benjamin Herrenschmidt wrote: > > So let's differentiate the two problems of having an IOMMU (real or > > emulated) which indeed adds overhead etc... and using the DMA API. > > > > At the moment, virtio does this all over the place: > > > > if (use_dma_api) > > dma_map/alloc_something(...) > > else > > use_pa > > > > The idea of the patch set is to do two, somewhat orthogonal, changes > > that together achieve what we want. Let me know where you think there > > is "a bunch of issues" because I'm missing it: > > > > 1- Replace the above if/else constructs with just calling the DMA API, > > and have virtio, at initialization, hook up its own dma_ops that just > > "return pa" (roughly) when the IOMMU stuff isn't used. > > > > This adds an indirect function call to the path that previously didn't > > have one (the else case above). Is that a significant/measurable > > overhead ? > > If you call it often enough it does: > > https://www.spinics.net/lists/netdev/msg495413.html > > > 2- Make virtio use the DMA API with our custom platform-provided > > swiotlb callbacks when needed, that is when not using IOMMU *and* > > running on a secure VM in our case. > > And total NAK the custom platform-provided part of this.
We need > a flag passed in from the hypervisor that the device needs all bus > specific dma api treatment, and then just use the normal platform > dma mapping setup. To get swiotlb you'll need to then use the DT/ACPI > dma-range property to limit the addressable range, and a swiotlb > capable platform will use swiotlb automatically. It seems reasonable to teach a platform to override dma-range for a specific device e.g. in case it knows about bugs in ACPI. -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
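The if/else construct Ben quotes, and the ops-table replacement the patch set proposes in its place, can be sketched in plain C. This is an illustrative model only: `struct dma_ops` stands in for the kernel's real `struct dma_map_ops`, and the identity mapping stands in for the direct-GPA path; only the name `virtio_direct_dma_ops` comes from the series.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

typedef uint64_t dma_addr_t;

/* Stand-in for the kernel's struct dma_map_ops (illustrative only). */
struct dma_ops {
    dma_addr_t (*map_page)(void *vaddr, size_t len);
};

/* The direct-GPA path wrapped in ops form: the mapping is the
 * identity, which is roughly what the proposed virtio_direct_dma_ops
 * does ("return pa"). */
static dma_addr_t direct_map_page(void *vaddr, size_t len)
{
    (void)len;
    return (dma_addr_t)(uintptr_t)vaddr;  /* pretend vaddr == guest PA */
}

static const struct dma_ops virtio_direct_dma_ops = {
    .map_page = direct_map_page,
};

/* Before: an if/else at every call site.
 * After:  one indirect call through whichever ops the device got. */
static dma_addr_t vring_map(const struct dma_ops *ops, void *buf, size_t len)
{
    return ops->map_page(buf, len);
}
```

The cost being debated is precisely the indirect call in `vring_map`, added to the path that previously had only the `else` branch.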
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-03 19:17 ` Michael S. Tsirkin @ 2018-08-04 8:15 ` Christoph Hellwig -1 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-04 8:15 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Benjamin Herrenschmidt, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Fri, Aug 03, 2018 at 10:17:32PM +0300, Michael S. Tsirkin wrote: > It seems reasonable to teach a platform to override dma-range > for a specific device e.g. in case it knows about bugs in ACPI. A platform will be able to override dma-range using the dev->bus_dma_mask field starting in 4.19. But we'll still need a) a way to document in the virtio spec that all bus dma quirks are to be applied, and b) a way to document in a virtio-related spec how the bus handles dma for Ben's totally fucked up hypervisor. Without that there is no way we'll get interoperable implementations. ^ permalink raw reply [flat|nested] 206+ messages in thread
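Christoph's point about dev->bus_dma_mask is that a platform-imposed addressing limit is what makes the DMA core fall back to swiotlb bouncing. The decision itself is simple; the sketch below models the bounce test only (it is not the kernel's actual dma_capable() implementation, and the function name is illustrative).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* A buffer must be bounced when any byte of it lies above the bus
 * DMA mask the platform set for the device. */
static bool needs_bounce(uint64_t dma_addr, size_t len, uint64_t bus_dma_mask)
{
    return dma_addr + len - 1 > bus_dma_mask;
}
```

With a 32-bit mask, a buffer entirely below 4 GiB goes through directly, while one that crosses the limit would be copied through a swiotlb slot first.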
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-04 8:15 ` Christoph Hellwig (?) @ 2018-08-05 0:09 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-05 0:09 UTC (permalink / raw) To: Christoph Hellwig Cc: robh, srikar, Benjamin Herrenschmidt, Will Deacon, linux-kernel, linuxram, virtualization, paulus, marc.zyngier, mpe, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Sat, Aug 04, 2018 at 01:15:00AM -0700, Christoph Hellwig wrote: > On Fri, Aug 03, 2018 at 10:17:32PM +0300, Michael S. Tsirkin wrote: > > It seems reasonable to teach a platform to override dma-range > > for a specific device e.g. in case it knows about bugs in ACPI. > > A platform will be able to override dma-range using the dev->bus_dma_mask > field starting in 4.19. But we'll still need a way to > > a) document in the virtio spec that all bus dma quirks are to be > applied I agree it's a good idea. In particular I suspect that PLATFORM_IOMMU should be extended to cover that. But see below. > b) document in a virtio-related spec how the bus handles > dma for Ben's totally fucked up hypervisor. Without that there > is no way we'll get interoperable implementations. So in this case however I'm not sure what exactly we want to add. It seems that from the point of view of the device, there is nothing special - it just gets a PA and writes there. It also seems that the guest does not need to get any info from the device either. Instead the guest itself needs the device to DMA into specific addresses, for its own reasons. It seems that the fact that within the guest it's implemented using a bounce buffer, and that it's easiest to do by switching virtio to use the DMA API, isn't something the virtio spec concerns itself with. I'm open to suggestions. -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
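MST's observation — that the device just sees a PA while the bouncing is entirely guest-internal — can be illustrated with a toy bounce pool. Everything here is a stand-in: `shared_pool` models the memory region the Ultravisor leaves accessible to the host, and nothing below reflects real SWIOTLB internals.

```c
#include <assert.h>
#include <string.h>
#include <stdint.h>
#include <stddef.h>

/* Toy stand-in for the guest memory region the host may access. */
static uint8_t shared_pool[4096];
static size_t  pool_used;

/* "Map for device": copy the private buffer into shared memory and
 * hand the device an address inside that region only.  The device
 * never learns that a copy happened. */
static uint8_t *bounce_map(const void *priv, size_t len)
{
    if (pool_used + len > sizeof shared_pool)
        return NULL;                 /* pool exhausted */
    uint8_t *slot = shared_pool + pool_used;
    memcpy(slot, priv, len);         /* guest-side bounce copy */
    pool_used += len;
    return slot;                     /* what the device DMAs to/from */
}
```

This matches the argument above: the spec-visible contract (device reads/writes a PA) is unchanged; only which PAs the guest hands out is restricted.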
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-04 8:15 ` Christoph Hellwig (?) (?) @ 2018-08-05 0:09 ` Michael S. Tsirkin 2018-08-05 1:11 ` Benjamin Herrenschmidt 2018-08-05 7:25 ` Christoph Hellwig -1 siblings, 2 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-05 0:09 UTC (permalink / raw) To: Christoph Hellwig Cc: Benjamin Herrenschmidt, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Sat, Aug 04, 2018 at 01:15:00AM -0700, Christoph Hellwig wrote: > On Fri, Aug 03, 2018 at 10:17:32PM +0300, Michael S. Tsirkin wrote: > > It seems reasonable to teach a platform to override dma-range > > for a specific device e.g. in case it knows about bugs in ACPI. > > A platform will be able to override dma-range using the dev->bus_dma_mask > field starting in 4.19. But we'll still need a way to > > a) document in the virtio spec that all bus dma quirks are to be > applied I agree it's a good idea. In particular I suspect that PLATFORM_IOMMU should be extended to cover that. But see below. > b) a way to document in a virtio-related spec how the bus handles > dma for Ben's totally fucked up hypervisor. Without that there > is no way we'll get interoperable implementations. So in this case, however, I'm not sure what exactly we want to add. It seems that from the point of view of the device, there is nothing special - it just gets a PA and writes there. It also seems that guest does not need to get any info from the device either. Instead guest itself needs device to DMA into specific addresses, for its own reasons. It seems that the fact that within guest it's implemented using a bounce buffer and that it's easiest to do by switching virtio to use the DMA API isn't something virtio spec concerns itself with. I'm open to suggestions. 
-- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-05 0:09 ` Michael S. Tsirkin @ 2018-08-05 1:11 ` Benjamin Herrenschmidt 2018-08-05 7:25 ` Christoph Hellwig 1 sibling, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-05 1:11 UTC (permalink / raw) To: Michael S. Tsirkin, Christoph Hellwig Cc: Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Sun, 2018-08-05 at 03:09 +0300, Michael S. Tsirkin wrote: > It seems that the fact that within guest it's implemented using a bounce > buffer and that it's easiest to do by switching virtio to use the DMA API > isn't something virtio spec concerns itself with. Right, this is my reasoning as well. See this other (long) email I just sent to Christoph to explain the whole flow. > I'm open to suggestions. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-05 0:09 ` Michael S. Tsirkin @ 2018-08-05 7:25 ` Christoph Hellwig 2018-08-05 7:25 ` Christoph Hellwig 1 sibling, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-05 7:25 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Benjamin Herrenschmidt, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Sun, Aug 05, 2018 at 03:09:55AM +0300, Michael S. Tsirkin wrote: > So in this case however I'm not sure what exactly do we want to add. It > seems that from point of view of the device, there is nothing special - > it just gets a PA and writes there. It also seems that guest does not > need to get any info from the device either. Instead guest itself needs > device to DMA into specific addresses, for its own reasons. > > It seems that the fact that within guest it's implemented using a bounce > buffer and that it's easiest to do by switching virtio to use the DMA API > isn't something virtio spec concerns itself with. And that is exactly what we added bus_dma_mask for - the case where the device itself has no limitation (or a less strict one), but the platform limits the accessible dma ranges. One typical case is a PCIe root port that is only connected to the CPU through an interconnect that is limited to 32 address bits, for example. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-04 8:15 ` Christoph Hellwig @ 2018-08-05 0:53 ` Benjamin Herrenschmidt -1 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-05 0:53 UTC (permalink / raw) To: Christoph Hellwig, Michael S. Tsirkin Cc: Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Sat, 2018-08-04 at 01:15 -0700, Christoph Hellwig wrote: > b) a way to document in a virtio-related spec how the bus handles > dma for Ben's totally fucked up hypervisor. Without that there > is no way we'll get interoperable implementations. Christoph, this isn't a totally fucked up hypervisor. It's not even about the hypervisor itself, I mean seriously, man, can you at least bother reading what I described is going on with the security architecture? Anyway, Michael is onto what could possibly be an alternative approach, by having us tell qemu to flip to iommu mode at secure VM boot time. Let's see where that leads. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-01 8:16 ` Will Deacon @ 2018-08-05 0:27 ` Michael S. Tsirkin 2018-08-01 8:36 ` Christoph Hellwig 2018-08-05 0:27 ` Michael S. Tsirkin 2 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-05 0:27 UTC (permalink / raw) To: Will Deacon Cc: Benjamin Herrenschmidt, Christoph Hellwig, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote: > On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote: > > On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote: > > > > However the question people raise is that DMA API is already full of > > > > arch-specific tricks the likes of which are outlined in your post linked > > > > above. How is this one much worse? > > > > > > None of these warts is visible to the driver, they are all handled in > > > the architecture (possibly on a per-bus basis). > > > > > > So for virtio we really need to decide if it has one set of behavior > > > as specified in the virtio spec, or if it behaves exactly as if it > > > was on a PCI bus, or in fact probably both as you lined up. But no > > > magic arch specific behavior inbetween. > > > > The only arch specific behaviour is needed in the case where it doesn't > > behave like PCI. In this case, the PCI DMA ops are not suitable, but in > > our secure VMs, we still need to make it use swiotlb in order to bounce > > through non-secure pages. > > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO > transport (so definitely not PCI) have historically been advertised by qemu > as not being cache coherent, but because the virtio core has bypassed DMA > ops then everything has happened to work. 
If we blindly enable the arch DMA > ops, we'll plumb in the non-coherent ops and start getting data corruption, > so we do need a way to quirk virtio as being "always coherent" if we want to > use the DMA ops (which we do, because our emulation platforms have an IOMMU > for all virtio devices). > > Will Right, that's not very different from placing the device within the IOMMU domain but in fact bypassing the IOMMU. I wonder whether anyone ever needs a non-coherent virtio-mmio. If yes, we can extend PLATFORM_IOMMU to cover that or add another bit. What exactly do the non-coherent ops do that causes the corruption? -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-05 0:27 ` Michael S. Tsirkin (?) @ 2018-08-06 14:05 ` Will Deacon -1 siblings, 0 replies; 206+ messages in thread From: Will Deacon @ 2018-08-06 14:05 UTC (permalink / raw) To: Michael S. Tsirkin Cc: robh, srikar, Benjamin Herrenschmidt, linuxram, linux-kernel, virtualization, Christoph Hellwig, jean-philippe.brucker, paulus, marc.zyngier, mpe, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual Hi Michael, On Sun, Aug 05, 2018 at 03:27:42AM +0300, Michael S. Tsirkin wrote: > On Wed, Aug 01, 2018 at 09:16:38AM +0100, Will Deacon wrote: > > On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote: > > > On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote: > > > > > However the question people raise is that DMA API is already full of > > > > > arch-specific tricks the likes of which are outlined in your post linked > > > > > above. How is this one much worse? > > > > > > > > None of these warts is visible to the driver, they are all handled in > > > > the architecture (possibly on a per-bus basis). > > > > > > > > So for virtio we really need to decide if it has one set of behavior > > > > as specified in the virtio spec, or if it behaves exactly as if it > > > > was on a PCI bus, or in fact probably both as you lined up. But no > > > > magic arch specific behavior inbetween. > > > > > > The only arch specific behaviour is needed in the case where it doesn't > > > behave like PCI. In this case, the PCI DMA ops are not suitable, but in > > > our secure VMs, we still need to make it use swiotlb in order to bounce > > > through non-secure pages. > > > > On arm/arm64, the problem we have is that legacy virtio devices on the MMIO > > transport (so definitely not PCI) have historically been advertised by qemu > > as not being cache coherent, but because the virtio core has bypassed DMA > > ops then everything has happened to work. 
If we blindly enable the arch DMA > > ops, we'll plumb in the non-coherent ops and start getting data corruption, > > so we do need a way to quirk virtio as being "always coherent" if we want to > > use the DMA ops (which we do, because our emulation platforms have an IOMMU > > for all virtio devices). > > > > Will > > Right that's not very different from placing the device within the IOMMU > domain but in fact bypassing the IOMMU Hmm, I'm not sure I follow you here -- the IOMMU bypassing is handled inside the IOMMU driver, so we'd still end up with non-coherent DMA ops for the guest accesses. The presence of an IOMMU doesn't imply coherency for us. Or am I missing your point here? > I wonder whether anyone ever needs a non coherent virtio-mmio. If yes we > can extend PLATFORM_IOMMU to cover that or add another bit. I think that's probably the right way around: assume that legacy virtio-mmio devices are coherent by default. > What exactly do the non-coherent ops do that causes the corruption? The non-coherent ops mean that the guest ends up allocating the vring queues using non-cacheable mappings, whereas qemu/hypervisor uses a cacheable mapping despite not advertising the devices as being cache-coherent. This hits something in the architecture known as "mismatched aliases", which means that coherency is lost between the guest and the hypervisor, consequently resulting in data not being visible and ordering not being guaranteed. The usual symptom is that the device appears to lock up iirc, because the guest and the hypervisor are unable to communicate with each other. Does that help to clarify things? Thanks, Will ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-07-31 20:36 ` Benjamin Herrenschmidt 2018-08-01 8:16 ` Will Deacon 2018-08-01 8:16 ` Will Deacon @ 2018-08-01 21:56 ` Michael S. Tsirkin 2018-08-01 21:56 ` Michael S. Tsirkin 3 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-01 21:56 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: robh, srikar, mpe, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, paulus, marc.zyngier, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Tue, Jul 31, 2018 at 03:36:22PM -0500, Benjamin Herrenschmidt wrote: > On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote: > > > However the question people raise is that DMA API is already full of > > > arch-specific tricks the likes of which are outlined in your post linked > > > above. How is this one much worse? > > > > None of these warts is visible to the driver, they are all handled in > > the architecture (possibly on a per-bus basis). > > > > So for virtio we really need to decide if it has one set of behavior > > as specified in the virtio spec, or if it behaves exactly as if it > > was on a PCI bus, or in fact probably both as you lined up. But no > > magic arch specific behavior inbetween. > > The only arch specific behaviour is needed in the case where it doesn't > behave like PCI. In this case, the PCI DMA ops are not suitable, but in > our secure VMs, we still need to make it use swiotlb in order to bounce > through non-secure pages. > > It would be nice if "real PCI" was the default I think you are mixing "real PCI", which isn't coded up yet, with IOMMU bypass, which is. IOMMU bypass may, with time, become unnecessary, since it seems that one can just program an IOMMU in a bypass mode instead. It's hard to blame you since, right now, if you disable IOMMU bypass you get a real PCI mode. 
But they are distinct, and to allow people to enable the IOMMU by default we will need to teach someone (virtio or the DMA API) about this mode, which does follow translation and protection rules in the IOMMU but runs on a CPU and so does not need cache flushes and whatnot. OTOH, real PCI mode, as opposed to the default hypervisor mode, does not perform as well when what you actually have is a hypervisor. So we'll likely have a mix of these two modes for a while. > but it's not, VMs are > created in "legacy" mode all the times and we don't know at VM creation > time whether it will become a secure VM or not. The way our secure VMs > work is that they start as a normal VM, load a secure "payload" and > call the Ultravisor to "become" secure. > > So we're in a bit of a bind here. We need that one-liner optional arch > hook to make virtio use swiotlb in that "IOMMU bypass" case. > > Ben. And just to make sure I understand, on your platform DMA APIs do include some of the cache flushing tricks and this is why you don't want to declare iommu support in the hypervisor? -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-01 21:56 ` Michael S. Tsirkin @ 2018-08-02 15:33 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-02 15:33 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Thu, 2018-08-02 at 00:56 +0300, Michael S. Tsirkin wrote: > > but it's not, VMs are > > created in "legacy" mode all the times and we don't know at VM creation > > time whether it will become a secure VM or not. The way our secure VMs > > work is that they start as a normal VM, load a secure "payload" and > > call the Ultravisor to "become" secure. > > > > So we're in a bit of a bind here. We need that one-liner optional arch > > hook to make virtio use swiotlb in that "IOMMU bypass" case. > > > > Ben. > > And just to make sure I understand, on your platform DMA APIs do include > some of the cache flushing tricks and this is why you don't want to > declare iommu support in the hypervisor? I'm not sure I parse what you mean. We don't need cache flushing tricks. The problem we have with our "secure" VMs is that: - At VM creation time we have no idea it's going to become a secure VM, qemu doesn't know anything about it, and thus qemu (or other management tools, libvirt etc...) are going to create "legacy" (ie iommu bypass) virtio devices. - Once the VM goes secure (early during boot but too late for qemu), it will need to make virtio do bounce-buffering via swiotlb because qemu cannot physically access most VM pages (blocked by HW security features), we need to bounce buffer using some unsecure pages that are accessible to qemu. 
That said, I wouldn't object to us switching, in the long run, to having qemu make virtio on powerpc use the IOMMU by default, provided we fix our guest firmware to understand it (it currently doesn't), and provided we verify that the performance impact on things like vhost is negligible. Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-02 15:33 ` Benjamin Herrenschmidt (?) @ 2018-08-02 20:53 ` Michael S. Tsirkin 2018-08-03 7:06 ` Christoph Hellwig 2018-08-03 7:06 ` Christoph Hellwig -1 siblings, 2 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-02 20:53 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: Christoph Hellwig, Will Deacon, Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, jasowang, mpe, linuxram, haren, paulus, srikar, robin.murphy, jean-philippe.brucker, marc.zyngier On Thu, Aug 02, 2018 at 10:33:05AM -0500, Benjamin Herrenschmidt wrote: > On Thu, 2018-08-02 at 00:56 +0300, Michael S. Tsirkin wrote: > > > but it's not, VMs are > > > created in "legacy" mode all the times and we don't know at VM creation > > > time whether it will become a secure VM or not. The way our secure VMs > > > work is that they start as a normal VM, load a secure "payload" and > > > call the Ultravisor to "become" secure. > > > > > > So we're in a bit of a bind here. We need that one-liner optional arch > > > hook to make virtio use swiotlb in that "IOMMU bypass" case. > > > > > > Ben. > > > > And just to make sure I understand, on your platform DMA APIs do include > > some of the cache flushing tricks and this is why you don't want to > > declare iommu support in the hypervisor? > > I'm not sure I parse what you mean. > > We don't need cache flushing tricks. You don't but do real devices on same platform need them? > The problem we have with our > "secure" VMs is that: > > - At VM creation time we have no idea it's going to become a secure > VM, qemu doesn't know anything about it, and thus qemu (or other > management tools, libvirt etc...) are going to create "legacy" (ie > iommu bypass) virtio devices. 
> > - Once the VM goes secure (early during boot but too late for qemu), > it will need to make virtio do bounce-buffering via swiotlb because > qemu cannot physically access most VM pages (blocked by HW security > features), we need to bounce buffer using some unsecure pages that are > accessible to qemu. > > That said, I wouldn't object for us to more generally switch long run > to changing qemu so that virtio on powerpc starts using the IOMMU as a > default provided we fix our guest firmware to understand it (it > currently doesn't), and provided we verify that the performance impact > on things like vhost is negligible. > > Cheers, > Ben. > ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-02 20:53 ` Michael S. Tsirkin @ 2018-08-03 7:06 ` Christoph Hellwig 2018-08-03 7:06 ` Christoph Hellwig 1 sibling, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-03 7:06 UTC (permalink / raw) To: Michael S. Tsirkin Cc: robh, srikar, Benjamin Herrenschmidt, Will Deacon, linux-kernel, linuxram, virtualization, Christoph Hellwig, paulus, marc.zyngier, mpe, joe, robin.murphy, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Thu, Aug 02, 2018 at 11:53:08PM +0300, Michael S. Tsirkin wrote: > > We don't need cache flushing tricks. > > You don't but do real devices on same platform need them? IBM's POWER platforms are always cache coherent. There are some powerpc platforms that do not have cache-coherent DMA, but I guess this scheme isn't intended for them. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-07-20 3:59 Anshuman Khandual ` (2 preceding siblings ...) 2018-07-27 9:58 ` Will Deacon @ 2018-08-02 20:55 ` Michael S. Tsirkin 2018-08-02 20:55 ` Michael S. Tsirkin 4 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-02 20:55 UTC (permalink / raw) To: Anshuman Khandual Cc: robh, srikar, benh, linuxram, linux-kernel, virtualization, hch, paulus, mpe, joe, linuxppc-dev, elfring, haren, david On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote: > This patch series is the follow up on the discussions we had before about > the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation > for virito devices (https://patchwork.kernel.org/patch/10417371/). There > were suggestions about doing away with two different paths of transactions > with the host/QEMU, first being the direct GPA and the other being the DMA > API based translations. > > First patch attempts to create a direct GPA mapping based DMA operations > structure called 'virtio_direct_dma_ops' with exact same implementation > of the direct GPA path which virtio core currently has but just wrapped in > a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of > the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the > existing semantics. The second patch does exactly that inside the function > virtio_finalize_features(). The third patch removes the default direct GPA > path from virtio core forcing it to use DMA API callbacks for all devices. > Now with that change, every device must have a DMA operations structure > associated with it. The fourth patch adds an additional hook which gives > the platform an opportunity to do yet another override if required. 
This > platform hook can be used on POWER Ultravisor based protected guests to > load up SWIOTLB DMA callbacks to do the required (as discussed previously > in the above mentioned thread how host is allowed to access only parts of > the guest GPA range) bounce buffering into the shared memory for all I/O > scatter gather buffers to be consumed on the host side. > > Please go through these patches and review whether this approach broadly > makes sense. I will appreciate suggestions, inputs, comments regarding > the patches or the approach in general. Thank you. Jason did some work on profiling this. Unfortunately he reports about 4% extra overhead from this switch on x86 with no vIOMMU. I expect he's writing up the data in more detail, but just wanted to let you know this would be one more thing to debug before we can just switch to DMA APIs. > Anshuman Khandual (4): > virtio: Define virtio_direct_dma_ops structure > virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively > virtio: Force virtio core to use DMA API callbacks for all virtio devices > virtio: Add platform specific DMA API translation for virito devices > > arch/powerpc/include/asm/dma-mapping.h | 6 +++ > arch/powerpc/platforms/pseries/iommu.c | 6 +++ > drivers/virtio/virtio.c | 72 ++++++++++++++++++++++++++++++++++ > drivers/virtio/virtio_pci_common.h | 3 ++ > drivers/virtio/virtio_ring.c | 65 +----------------------------- > 5 files changed, 89 insertions(+), 63 deletions(-) > > -- > 2.9.3 ^ permalink raw reply [flat|nested] 206+ messages in thread
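[Editor's note: for readers new to the series, the cover letter's 'virtio_direct_dma_ops' — the existing direct-GPA path wrapped in a DMA API shape — can be modeled in a few lines of self-contained C. The structure and function names below are illustrative assumptions, not the actual kernel definitions; the point is only that the "mapping" is a pure identity on the guest physical address.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t dma_addr_t;

/* Hypothetical model of a DMA-ops-style table. */
struct dma_ops_model {
    dma_addr_t (*map_page)(void *cpu_addr, size_t size);
};

/* Direct-GPA path wrapped as a DMA op: no translation at all,
 * the DMA address is the guest physical address itself. */
static dma_addr_t direct_map_page(void *cpu_addr, size_t size)
{
    (void)size;
    return (dma_addr_t)(uintptr_t)cpu_addr;
}

static const struct dma_ops_model virtio_direct_dma_ops_model = {
    .map_page = direct_map_page,
};
```

With this shape, virtio core can always go through an ops table, and the "no VIRTIO_F_IOMMU_PLATFORM" case simply installs the identity ops instead of branching around the DMA API.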
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-02 20:55 ` Michael S. Tsirkin @ 2018-08-03 2:41 ` Jason Wang 0 siblings, 0 replies; 206+ messages in thread From: Jason Wang @ 2018-08-03 2:41 UTC (permalink / raw) To: Michael S. Tsirkin, Anshuman Khandual Cc: virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, benh, mpe, hch, linuxram, haren, paulus, srikar On 2018年08月03日 04:55, Michael S. Tsirkin wrote: > On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote: >> This patch series is the follow up on the discussions we had before about >> the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation >> for virito devices (https://patchwork.kernel.org/patch/10417371/). There >> were suggestions about doing away with two different paths of transactions >> with the host/QEMU, first being the direct GPA and the other being the DMA >> API based translations. >> >> First patch attempts to create a direct GPA mapping based DMA operations >> structure called 'virtio_direct_dma_ops' with exact same implementation >> of the direct GPA path which virtio core currently has but just wrapped in >> a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of >> the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the >> existing semantics. The second patch does exactly that inside the function >> virtio_finalize_features(). The third patch removes the default direct GPA >> path from virtio core forcing it to use DMA API callbacks for all devices. >> Now with that change, every device must have a DMA operations structure >> associated with it. The fourth patch adds an additional hook which gives >> the platform an opportunity to do yet another override if required. 
This >> platform hook can be used on POWER Ultravisor based protected guests to >> load up SWIOTLB DMA callbacks to do the required (as discussed previously >> in the above mentioned thread how host is allowed to access only parts of >> the guest GPA range) bounce buffering into the shared memory for all I/O >> scatter gather buffers to be consumed on the host side. >> >> Please go through these patches and review whether this approach broadly >> makes sense. I will appreciate suggestions, inputs, comments regarding >> the patches or the approach in general. Thank you. > Jason did some work on profiling this. Unfortunately he reports > about 4% extra overhead from this switch on x86 with no vIOMMU. The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in guest and measure PPS on tap on host. Thanks > > I expect he's writing up the data in more detail, but > just wanted to let you know this would be one more > thing to debug before we can just switch to DMA APIs. > > >> Anshuman Khandual (4): >> virtio: Define virtio_direct_dma_ops structure >> virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively >> virtio: Force virtio core to use DMA API callbacks for all virtio devices >> virtio: Add platform specific DMA API translation for virito devices >> >> arch/powerpc/include/asm/dma-mapping.h | 6 +++ >> arch/powerpc/platforms/pseries/iommu.c | 6 +++ >> drivers/virtio/virtio.c | 72 ++++++++++++++++++++++++++++++++++ >> drivers/virtio/virtio_pci_common.h | 3 ++ >> drivers/virtio/virtio_ring.c | 65 +----------------------------- >> 5 files changed, 89 insertions(+), 63 deletions(-) >> >> -- >> 2.9.3 ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-03 2:41 ` Jason Wang (?) @ 2018-08-03 19:08 ` Michael S. Tsirkin 2018-08-04 1:21 ` Benjamin Herrenschmidt -1 siblings, 1 reply; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-03 19:08 UTC (permalink / raw) To: Jason Wang Cc: Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, benh, mpe, hch, linuxram, haren, paulus, srikar On Fri, Aug 03, 2018 at 10:41:41AM +0800, Jason Wang wrote: > > > On 2018年08月03日 04:55, Michael S. Tsirkin wrote: > > On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote: > > > This patch series is the follow up on the discussions we had before about > > > the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation > > > for virito devices (https://patchwork.kernel.org/patch/10417371/). There > > > were suggestions about doing away with two different paths of transactions > > > with the host/QEMU, first being the direct GPA and the other being the DMA > > > API based translations. > > > > > > First patch attempts to create a direct GPA mapping based DMA operations > > > structure called 'virtio_direct_dma_ops' with exact same implementation > > > of the direct GPA path which virtio core currently has but just wrapped in > > > a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of > > > the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the > > > existing semantics. The second patch does exactly that inside the function > > > virtio_finalize_features(). The third patch removes the default direct GPA > > > path from virtio core forcing it to use DMA API callbacks for all devices. > > > Now with that change, every device must have a DMA operations structure > > > associated with it. The fourth patch adds an additional hook which gives > > > the platform an opportunity to do yet another override if required. 
This > > > platform hook can be used on POWER Ultravisor based protected guests to > > > load up SWIOTLB DMA callbacks to do the required (as discussed previously > > > in the above mentioned thread how host is allowed to access only parts of > > > the guest GPA range) bounce buffering into the shared memory for all I/O > > > scatter gather buffers to be consumed on the host side. > > > > > > Please go through these patches and review whether this approach broadly > > > makes sense. I will appreciate suggestions, inputs, comments regarding > > > the patches or the approach in general. Thank you. > > Jason did some work on profiling this. Unfortunately he reports > > about 4% extra overhead from this switch on x86 with no vIOMMU. > > The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in > guest and measure PPS on tap on host. > > Thanks Could you supply host configuration involved please? > > > > I expect he's writing up the data in more detail, but > > just wanted to let you know this would be one more > > thing to debug before we can just switch to DMA APIs. > > > > > > > Anshuman Khandual (4): > > > virtio: Define virtio_direct_dma_ops structure > > > virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively > > > virtio: Force virtio core to use DMA API callbacks for all virtio devices > > > virtio: Add platform specific DMA API translation for virito devices > > > > > > arch/powerpc/include/asm/dma-mapping.h | 6 +++ > > > arch/powerpc/platforms/pseries/iommu.c | 6 +++ > > > drivers/virtio/virtio.c | 72 ++++++++++++++++++++++++++++++++++ > > > drivers/virtio/virtio_pci_common.h | 3 ++ > > > drivers/virtio/virtio_ring.c | 65 +----------------------------- > > > 5 files changed, 89 insertions(+), 63 deletions(-) > > > > > > -- > > > 2.9.3 ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-03 19:08 ` Michael S. Tsirkin @ 2018-08-04 1:21 ` Benjamin Herrenschmidt 0 siblings, 0 replies; 206+ messages in thread From: Benjamin Herrenschmidt @ 2018-08-04 1:21 UTC (permalink / raw) To: Michael S. Tsirkin, Jason Wang Cc: Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david, mpe, hch, linuxram, haren, paulus, srikar On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote: > > > > Please go through these patches and review whether this approach broadly > > > > makes sense. I will appreciate suggestions, inputs, comments regarding > > > > the patches or the approach in general. Thank you. > > > > > > Jason did some work on profiling this. Unfortunately he reports > > > about 4% extra overhead from this switch on x86 with no vIOMMU. > > > > The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in > > guest and measure PPS on tap on host. > > > > Thanks > > Could you supply host configuration involved please? I wonder how much of that could be caused by Spectre mitigations blowing up indirect function calls... Cheers, Ben. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-04 1:21 ` Benjamin Herrenschmidt (?) @ 2018-08-05 0:24 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-05 0:24 UTC (permalink / raw) To: Benjamin Herrenschmidt Cc: robh, srikar, mpe, linuxram, linux-kernel, virtualization, hch, paulus, joe, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote: > On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote: > > > > > Please go through these patches and review whether this approach broadly > > > > > makes sense. I will appreciate suggestions, inputs, comments regarding > > > > > the patches or the approach in general. Thank you. > > > > > > > > Jason did some work on profiling this. Unfortunately he reports > > > > about 4% extra overhead from this switch on x86 with no vIOMMU. > > > > > > The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in > > > guest and measure PPS on tap on host. > > > > > > Thanks > > > > Could you supply host configuration involved please? > > I wonder how much of that could be caused by Spectre mitigations > blowing up indirect function calls... > > Cheers, > Ben. I won't be surprised. If yes I suggested a way to mitigate the overhead. -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-05 0:24 ` Michael S. Tsirkin @ 2018-08-06 9:02 ` Anshuman Khandual 0 siblings, 0 replies; 206+ messages in thread From: Anshuman Khandual @ 2018-08-06 9:02 UTC (permalink / raw) To: Michael S. Tsirkin, Benjamin Herrenschmidt Cc: robh, srikar, aik, Jason Wang, linuxram, linux-kernel, virtualization, hch, paulus, joe, david, linuxppc-dev, elfring, haren On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote: > On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote: >> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote: >>>>>> Please go through these patches and review whether this approach broadly >>>>>> makes sense. I will appreciate suggestions, inputs, comments regarding >>>>>> the patches or the approach in general. Thank you. >>>>> >>>>> Jason did some work on profiling this. Unfortunately he reports >>>>> about 4% extra overhead from this switch on x86 with no vIOMMU. >>>> >>>> The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in >>>> guest and measure PPS on tap on host. >>>> >>>> Thanks >>> >>> Could you supply host configuration involved please? >> >> I wonder how much of that could be caused by Spectre mitigations >> blowing up indirect function calls... >> >> Cheers, >> Ben. > > I won't be surprised. If yes I suggested a way to mitigate the overhead. Did we get better results (lower regression due to indirect calls) with the suggested mitigation? Just curious. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 9:02 ` Anshuman Khandual @ 2018-08-06 13:36 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-06 13:36 UTC (permalink / raw) To: Anshuman Khandual Cc: Benjamin Herrenschmidt, robh, srikar, aik, Jason Wang, linuxram, linux-kernel, virtualization, hch, paulus, joe, david, linuxppc-dev, elfring, haren On Mon, Aug 06, 2018 at 02:32:28PM +0530, Anshuman Khandual wrote: > On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote: > > On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote: > >> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote: > >>>>>> Please go through these patches and review whether this approach broadly > >>>>>> makes sense. I will appreciate suggestions, inputs, comments regarding > >>>>>> the patches or the approach in general. Thank you. > >>>>> > >>>>> Jason did some work on profiling this. Unfortunately he reports > >>>>> about 4% extra overhead from this switch on x86 with no vIOMMU. > >>>> > >>>> The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in > >>>> guest and measure PPS on tap on host. > >>>> > >>>> Thanks > >>> > >>> Could you supply host configuration involved please? > >> > >> I wonder how much of that could be caused by Spectre mitigations > >> blowing up indirect function calls... > >> > >> Cheers, > >> Ben. > > > > I won't be surprised. If yes I suggested a way to mitigate the overhead. > > Did we get better results (lower regression due to indirect calls) with > the suggested mitigation ? Just curious. I'm referring to this: I wonder whether we can support map_sg and friends being NULL, then use that when mapping is an identity. A conditional branch there is likely very cheap. I don't think anyone tried implementing this yes. -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
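The test Jason describes above — pktgen in the guest, PPS measured on the tap device on the host — reduces to sampling the tap interface's packet counter twice and dividing by the interval. A rough sketch of that arithmetic (the interface name and sysfs path are the conventional ones, not taken from this thread):

```shell
# Compute packets-per-second from two counter samples.
# Usage: pps START_COUNT END_COUNT SECONDS
pps() {
    echo $(( ($2 - $1) / $3 ))
}

# On a real host one would sample something like:
#   cat /sys/class/net/tap0/statistics/tx_packets
# before and after the measurement window; here fixed samples
# just demonstrate the calculation.
start=1000000
end=4000000
pps "$start" "$end" 10   # prints 300000
```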
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 13:36 ` Michael S. Tsirkin (?) @ 2018-08-06 15:24 ` Christoph Hellwig 2018-08-06 16:06 ` Michael S. Tsirkin 2018-08-06 16:06 ` Michael S. Tsirkin -1 siblings, 2 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-06 15:24 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Anshuman Khandual, Benjamin Herrenschmidt, robh, srikar, aik, Jason Wang, linuxram, linux-kernel, virtualization, hch, paulus, joe, david, linuxppc-dev, elfring, haren On Mon, Aug 06, 2018 at 04:36:43PM +0300, Michael S. Tsirkin wrote: > On Mon, Aug 06, 2018 at 02:32:28PM +0530, Anshuman Khandual wrote: > > On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote: > > > On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote: > > >> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote: > > >>>>>> Please go through these patches and review whether this approach broadly > > >>>>>> makes sense. I will appreciate suggestions, inputs, comments regarding > > >>>>>> the patches or the approach in general. Thank you. > > >>>>> > > >>>>> Jason did some work on profiling this. Unfortunately he reports > > >>>>> about 4% extra overhead from this switch on x86 with no vIOMMU. > > >>>> > > >>>> The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in > > >>>> guest and measure PPS on tap on host. > > >>>> > > >>>> Thanks > > >>> > > >>> Could you supply host configuration involved please? > > >> > > >> I wonder how much of that could be caused by Spectre mitigations > > >> blowing up indirect function calls... > > >> > > >> Cheers, > > >> Ben. > > > > > > I won't be surprised. If yes I suggested a way to mitigate the overhead. > > > > Did we get better results (lower regression due to indirect calls) with > > the suggested mitigation ? Just curious. > > I'm referring to this: > I wonder whether we can support map_sg and friends being NULL, then use > that when mapping is an identity. 
A conditional branch there is likely > very cheap. > > I don't think anyone tried implementing this yet. I've done something very similar in the thread I posted a few years ago. I plan to get a version of that upstream for 4.20, but it won't cover the virtio case, just the real direct mapping. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 15:24 ` Christoph Hellwig @ 2018-08-06 16:06 ` Michael S. Tsirkin 2018-08-06 16:06 ` Michael S. Tsirkin 1 sibling, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-06 16:06 UTC (permalink / raw) To: Christoph Hellwig Cc: robh, srikar, Benjamin Herrenschmidt, linuxram, linux-kernel, virtualization, paulus, joe, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Mon, Aug 06, 2018 at 08:24:06AM -0700, Christoph Hellwig wrote: > On Mon, Aug 06, 2018 at 04:36:43PM +0300, Michael S. Tsirkin wrote: > > On Mon, Aug 06, 2018 at 02:32:28PM +0530, Anshuman Khandual wrote: > > > On 08/05/2018 05:54 AM, Michael S. Tsirkin wrote: > > > > On Fri, Aug 03, 2018 at 08:21:26PM -0500, Benjamin Herrenschmidt wrote: > > > >> On Fri, 2018-08-03 at 22:08 +0300, Michael S. Tsirkin wrote: > > > >>>>>> Please go through these patches and review whether this approach broadly > > > >>>>>> makes sense. I will appreciate suggestions, inputs, comments regarding > > > >>>>>> the patches or the approach in general. Thank you. > > > >>>>> > > > >>>>> Jason did some work on profiling this. Unfortunately he reports > > > >>>>> about 4% extra overhead from this switch on x86 with no vIOMMU. > > > >>>> > > > >>>> The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in > > > >>>> guest and measure PPS on tap on host. > > > >>>> > > > >>>> Thanks > > > >>> > > > >>> Could you supply host configuration involved please? > > > >> > > > >> I wonder how much of that could be caused by Spectre mitigations > > > >> blowing up indirect function calls... > > > >> > > > >> Cheers, > > > >> Ben. > > > > > > > > I won't be surprised. If yes I suggested a way to mitigate the overhead. > > > > > > Did we get better results (lower regression due to indirect calls) with > > > the suggested mitigation ? Just curious. 
> > > > I'm referring to this: > > I wonder whether we can support map_sg and friends being NULL, then use > > that when mapping is an identity. A conditional branch there is likely > > very cheap. > > > > I don't think anyone tried implementing this yes. > > I've done something very similar in the thread I posted a few years > ago. Right so that was before spectre where a virtual call was cheaper :( > I plan to get a version of that upstream for 4.20, but it won't > cover the virtio case, just the real direct mapping. I guess this RFC will have to be reworked on top and performance retested. Thanks, -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 16:06 ` Michael S. Tsirkin @ 2018-08-06 16:10 ` Christoph Hellwig 0 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-06 16:10 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Anshuman Khandual, Benjamin Herrenschmidt, robh, srikar, aik, Jason Wang, linuxram, linux-kernel, virtualization, paulus, joe, david, linuxppc-dev, elfring, haren On Mon, Aug 06, 2018 at 07:06:05PM +0300, Michael S. Tsirkin wrote: > > I've done something very similar in the thread I posted a few years > > ago. > > Right so that was before spectre where a virtual call was cheaper :( Sorry, I meant days, not years. The whole point of the thread was the slowdowns due to retpolines, which are the software spectre mitigation. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 16:10 ` Christoph Hellwig @ 2018-08-06 16:13 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-06 16:13 UTC (permalink / raw) To: Christoph Hellwig Cc: Anshuman Khandual, Benjamin Herrenschmidt, robh, srikar, aik, Jason Wang, linuxram, linux-kernel, virtualization, paulus, joe, david, linuxppc-dev, elfring, haren On Mon, Aug 06, 2018 at 09:10:40AM -0700, Christoph Hellwig wrote: > On Mon, Aug 06, 2018 at 07:06:05PM +0300, Michael S. Tsirkin wrote: > > > I've done something very similar in the thread I posted a few years > > > ago. > > > > Right so that was before spectre where a virtual call was cheaper :( > > Sorry, I meant days, not years. The whole point of the thread was the > slowdowns due to retpolines, which are the software spectre mitigation. Oh that makes sense then. Could you post a pointer pls so this patchset is rebased on top (there are things to change about 4/4 but 1-3 could go in if they don't add overhead)? -- MST ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-06 16:13 ` Michael S. Tsirkin @ 2018-08-06 16:34 ` Christoph Hellwig -1 siblings, 0 replies; 206+ messages in thread From: Christoph Hellwig @ 2018-08-06 16:34 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Christoph Hellwig, Anshuman Khandual, Benjamin Herrenschmidt, robh, srikar, aik, Jason Wang, linuxram, linux-kernel, virtualization, paulus, joe, david, linuxppc-dev, elfring, haren On Mon, Aug 06, 2018 at 07:13:32PM +0300, Michael S. Tsirkin wrote: > Oh that makes sense then. Could you post a pointer pls so > this patchset is rebased on top (there are things to > change about 4/4 but 1-3 could go in if they don't add > overhead)? The dma mapping direct calls will need a major work vs what I posted. I plan to start that work in about two weeks once returning from my vacation. ^ permalink raw reply [flat|nested] 206+ messages in thread
* Re: [RFC 0/4] Virtio uses DMA API for all devices 2018-08-03 2:41 ` Jason Wang (?) (?) @ 2018-08-03 19:08 ` Michael S. Tsirkin -1 siblings, 0 replies; 206+ messages in thread From: Michael S. Tsirkin @ 2018-08-03 19:08 UTC (permalink / raw) To: Jason Wang Cc: robh, srikar, benh, linuxram, linux-kernel, virtualization, hch, paulus, mpe, joe, david, linuxppc-dev, elfring, haren, Anshuman Khandual On Fri, Aug 03, 2018 at 10:41:41AM +0800, Jason Wang wrote: > > > On 2018年08月03日 04:55, Michael S. Tsirkin wrote: > > On Fri, Jul 20, 2018 at 09:29:37AM +0530, Anshuman Khandual wrote: > > > This patch series is the follow up on the discussions we had before about > > > the RFC titled [RFC,V2] virtio: Add platform specific DMA API translation > > > for virito devices (https://patchwork.kernel.org/patch/10417371/). There > > > were suggestions about doing away with two different paths of transactions > > > with the host/QEMU, first being the direct GPA and the other being the DMA > > > API based translations. > > > > > > First patch attempts to create a direct GPA mapping based DMA operations > > > structure called 'virtio_direct_dma_ops' with exact same implementation > > > of the direct GPA path which virtio core currently has but just wrapped in > > > a DMA API format. Virtio core must use 'virtio_direct_dma_ops' instead of > > > the arch default in absence of VIRTIO_F_IOMMU_PLATFORM flag to preserve the > > > existing semantics. The second patch does exactly that inside the function > > > virtio_finalize_features(). The third patch removes the default direct GPA > > > path from virtio core forcing it to use DMA API callbacks for all devices. > > > Now with that change, every device must have a DMA operations structure > > > associated with it. The fourth patch adds an additional hook which gives > > > the platform an opportunity to do yet another override if required. 
This > > > platform hook can be used on POWER Ultravisor based protected guests to > > > load up SWIOTLB DMA callbacks to do the required (as discussed previously > > > in the above mentioned thread how host is allowed to access only parts of > > > the guest GPA range) bounce buffering into the shared memory for all I/O > > > scatter gather buffers to be consumed on the host side. > > > > > > Please go through these patches and review whether this approach broadly > > > makes sense. I will appreciate suggestions, inputs, comments regarding > > > the patches or the approach in general. Thank you. > > Jason did some work on profiling this. Unfortunately he reports > > about 4% extra overhead from this switch on x86 with no vIOMMU. > > The test is rather simple, just run pktgen (pktgen_sample01_simple.sh) in > guest and measure PPS on tap on host. > > Thanks Could you supply host configuration involved please? > > > > I expect he's writing up the data in more detail, but > > just wanted to let you know this would be one more > > thing to debug before we can just switch to DMA APIs. 
> > > > > > > Anshuman Khandual (4): > > > virtio: Define virtio_direct_dma_ops structure > > > virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively > > > virtio: Force virtio core to use DMA API callbacks for all virtio devices > > > virtio: Add platform specific DMA API translation for virito devices > > > > > > arch/powerpc/include/asm/dma-mapping.h | 6 +++ > > > arch/powerpc/platforms/pseries/iommu.c | 6 +++ > > > drivers/virtio/virtio.c | 72 ++++++++++++++++++++++++++++++++++ > > > drivers/virtio/virtio_pci_common.h | 3 ++ > > > drivers/virtio/virtio_ring.c | 65 +----------------------------- > > > 5 files changed, 89 insertions(+), 63 deletions(-) > > > > > > -- > > > 2.9.3 _______________________________________________ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization ^ permalink raw reply [flat|nested] 206+ messages in thread