* Can VFIO pin only a specific region of guest memory when using pass-through devices?
@ 2018-10-29  2:42 Simon Guo
From: Simon Guo
To: Alex Williamson, Eric Auger
Cc: qixuan.wu, linux-kernel, kvm

Hi,

I am using network device pass-through with qemu on x86 (-device
vfio-pci,host=0000:xx:yy.z) and "intel_iommu=on" on the host kernel
command line, and the whole of guest memory gets pinned
(vfio_pin_pages()), as seen in the RES column of "top". I understand
this is because the device can DMA to any guest memory address, so none
of it can be swapped out.

However, can we pin only the range of address space allowed by the IOMMU
group of that device, instead of pinning the whole address space? I do
notice some code like vtd_host_dma_iommu(); maybe there is already some
way to enable that?

Sorry if I missed some basics. I googled a bit but have had no luck
finding the answer yet. Please let me know if any discussion has already
been raised on this.

Any other suggestion would also be appreciated. For example, could we
modify the guest network card driver to allocate only from a specific
memory region (zone), with qemu advising the guest kernel to pin only
that region accordingly?

Thanks,
- Simon
* Re: Can VFIO pin only a specific region of guest memory when using pass-through devices?
2018-10-29  9:14 ` Jason Wang
From: Jason Wang
To: Simon Guo, Alex Williamson, Eric Auger
Cc: qixuan.wu, linux-kernel, kvm, Peter Xu

On 2018/10/29 10:42 AM, Simon Guo wrote:
> Hi,
>
> I am using network device pass-through with qemu on x86 (-device
> vfio-pci,host=0000:xx:yy.z) and "intel_iommu=on" on the host kernel
> command line, and the whole of guest memory gets pinned
> (vfio_pin_pages()), as seen in the RES column of "top".
> [...]
> However, can we pin only the range of address space allowed by the
> IOMMU group of that device, instead of pinning the whole address space?
> [...]

One possible method is to enable the IOMMU of the VM.

Peter (cc'ed) may know more.

Thanks
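Jason's suggestion corresponds roughly to the following QEMU configuration sketch. The device and option names come from QEMU's emulated Intel IOMMU; the exact command line depends on the QEMU version, so treat this as an assumption-laden outline rather than a tested recipe:

```shell
# Sketch: x86 guest with an emulated Intel IOMMU in front of the
# assigned device. caching-mode=on is needed so QEMU is notified of
# guest mapping updates and can shadow them into VFIO; interrupt
# remapping requires the split irqchip on q35.
qemu-system-x86_64 \
    -machine q35,accel=kvm,kernel-irqchip=split \
    -device intel-iommu,intremap=on,caching-mode=on \
    -device vfio-pci,host=0000:xx:yy.z \
    ...

# Inside the guest, the kernel command line must then enable DMA
# remapping, e.g.:
#   intel_iommu=on    # bounds what VFIO pins to active guest mappings
# (but note: iommu=pt / passthrough mode in the guest defeats this,
#  as discussed below in the thread)
```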
* Re: Can VFIO pin only a specific region of guest memory when using pass-through devices?
2018-10-29 18:29 ` Alex Williamson
From: Alex Williamson
To: Jason Wang
Cc: Simon Guo, Eric Auger, qixuan.wu, linux-kernel, kvm, Peter Xu

On Mon, 29 Oct 2018 17:14:46 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2018/10/29 10:42 AM, Simon Guo wrote:
> > However, can we pin only the range of address space allowed by the
> > IOMMU group of that device, instead of pinning the whole address
> > space?
> > [...]
>
> One possible method is to enable the IOMMU of the VM.
Right, making use of a virtual IOMMU in the VM is really the only way to
bound the DMA to some subset of guest memory, but vIOMMU usage by the
guest is optional on x86, and even if the guest does use it, it might
enable passthrough mode, which puts you back at the problem that all
guest memory is pinned, with the additional problem that it might also
be accounted once per assigned device and may hit locked-memory limits.
Also, the DMA mapping and unmapping path with a vIOMMU is very slow, so
performance of the device in the guest will be abysmal unless the use
case is limited to very static mappings, such as userspace use within
the guest for nested assignment, or perhaps DPDK use cases.

Modifying the guest to use only a portion of memory for DMA sounds like
a quite intrusive option. There are certainly IOMMU models where the
IOMMU provides a fixed IOVA range, but creating dynamic mappings within
that range doesn't really solve anything, since it simply returns us to
a vIOMMU with slow mapping. A window with a fixed identity mapping used
as a DMA zone seems plausible, but again is also pretty intrusive to the
guest, and possibly also to the drivers. Host IOMMU page faulting could
also help the pinned-memory footprint, but of course it requires
hardware support and lots of new code paths, many of which are already
being discussed for things like Scalable IOV and SVA. Thanks,

Alex
* Re: Can VFIO pin only a specific region of guest memory when using pass-through devices?
2018-10-30  3:00 ` Peter Xu
From: Peter Xu
To: Alex Williamson
Cc: Jason Wang, Simon Guo, Eric Auger, qixuan.wu, linux-kernel, kvm

On Mon, Oct 29, 2018 at 12:29:22PM -0600, Alex Williamson wrote:
> On Mon, 29 Oct 2018 17:14:46 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
> > On 2018/10/29 10:42 AM, Simon Guo wrote:
> > > However, can we pin only the range of address space allowed by the
> > > IOMMU group of that device, instead of pinning the whole address
> > > space?
> > > [...]
> >
> > One possible method is to enable the IOMMU of the VM.
>
> Right, making use of a virtual IOMMU in the VM is really the only way
> to bound the DMA to some subset of guest memory, but vIOMMU usage by
> the guest is optional on x86 ...
> [...]

Agree with Jason's and Alex's comments. One trivial addition: the whole
of guest RAM will possibly still be pinned for a very short period
during guest system boot (e.g., while running the guest BIOS) and before
the guest kernel enables the vIOMMU for the assigned device, since
boot-up code like the BIOS still needs to be able to access the whole of
guest memory.

Thanks,

--
Peter Xu
* Re: Can VFIO pin only a specific region of guest memory when using pass-through devices?
2018-10-30 11:22 ` Simon Guo
From: Simon Guo
To: Peter Xu
Cc: Alex Williamson, Jason Wang, Eric Auger, qixuan.wu, linux-kernel, kvm

On Tue, Oct 30, 2018 at 11:00:51AM +0800, Peter Xu wrote:
> On Mon, Oct 29, 2018 at 12:29:22PM -0600, Alex Williamson wrote:
> > [...]
> Agree with Jason's and Alex's comments. One trivial addition: the
> whole of guest RAM will possibly still be pinned for a very short
> period during guest system boot (e.g., while running the guest BIOS)
> and before the guest kernel enables the vIOMMU for the assigned
> device, since boot-up code like the BIOS still needs to be able to
> access the whole of guest memory.

Peter, Alex, Jason,

Thanks for your nice and detailed explanations.

BR,
- Simon