* [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
@ 2020-11-05 16:34 Jason Gunthorpe
  2020-11-05 23:39 ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2020-11-05 16:34 UTC (permalink / raw)
  To: Cornelia Huck, kvm, Tom Lendacky; +Cc: Alex Williamson, Peter Xu

Commit f8f6ae5d077a ("mm: always have io_remap_pfn_range() set
pgprot_decrypted()") allows drivers that use mmap to expose PCI memory
mapped BAR space to userspace to work correctly on AMD SME systems that
default to all memory encrypted.

Since vfio_pci_mmap_fault() is working with PCI memory mapped BAR space, it
should be calling io_remap_pfn_range(); otherwise it will not work on SME
systems.

Fixes: 11c4cd07ba11 ("vfio-pci: Fault mmaps to enable vma tracking")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/vfio/pci/vfio_pci.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

The io_remap_pfn_range() commit is in Linus's tree and will be in rc3, but
there is no cross dependency here.

Tom says VFIO device assignment works OK with KVM, so I expect only things
like DPDK to be broken.

Don't have SME hardware, can't test.
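
For reference, the generic helper that commit adds is roughly the below (a
sketch only; the real definition lives in include/linux/mm.h and architectures
may override it), which is why switching the fault handler is enough:

static inline int io_remap_pfn_range(struct vm_area_struct *vma,
				     unsigned long addr, unsigned long pfn,
				     unsigned long size, pgprot_t prot)
{
	/* Same as remap_pfn_range(), but the protection bits are passed
	 * through pgprot_decrypted() so IO space is never mapped with the
	 * SME encryption bit set. */
	return remap_pfn_range(vma, addr, pfn, size, pgprot_decrypted(prot));
}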

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index fbd2b3404184ba..1853cc2548c966 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1635,8 +1635,8 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
 
 	mutex_unlock(&vdev->vma_lock);
 
-	if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
-			    vma->vm_end - vma->vm_start, vma->vm_page_prot))
+	if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
+			       vma->vm_end - vma->vm_start, vma->vm_page_prot))
 		ret = VM_FAULT_SIGBUS;
 
 up_out:
-- 
2.29.2



* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-05 16:34 [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory Jason Gunthorpe
@ 2020-11-05 23:39 ` Peter Xu
  2020-11-16 15:53   ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2020-11-05 23:39 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Cornelia Huck, kvm, Tom Lendacky, Alex Williamson

On Thu, Nov 05, 2020 at 12:34:58PM -0400, Jason Gunthorpe wrote:
> Tom says VFIO device assignment works OK with KVM, so I expect only things
> like DPDK to be broken.

Is there more information on why the difference?  Thanks,

-- 
Peter Xu



* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-05 23:39 ` Peter Xu
@ 2020-11-16 15:53   ` Jason Gunthorpe
  2020-11-16 21:43     ` Tom Lendacky
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2020-11-16 15:53 UTC (permalink / raw)
  To: Peter Xu; +Cc: Cornelia Huck, kvm, Tom Lendacky, Alex Williamson

On Thu, Nov 05, 2020 at 06:39:49PM -0500, Peter Xu wrote:
> On Thu, Nov 05, 2020 at 12:34:58PM -0400, Jason Gunthorpe wrote:
> > Tom says VFIO device assignment works OK with KVM, so I expect only things
> > like DPDK to be broken.
> 
> Is there more information on why the difference?  Thanks,

I have nothing, maybe Tom can explain how it works?

Jason 


* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-16 15:53   ` Jason Gunthorpe
@ 2020-11-16 21:43     ` Tom Lendacky
  2020-11-16 23:20       ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Lendacky @ 2020-11-16 21:43 UTC (permalink / raw)
  To: Jason Gunthorpe, Peter Xu; +Cc: Cornelia Huck, kvm, Alex Williamson

On 11/16/20 9:53 AM, Jason Gunthorpe wrote:
> On Thu, Nov 05, 2020 at 06:39:49PM -0500, Peter Xu wrote:
>> On Thu, Nov 05, 2020 at 12:34:58PM -0400, Jason Gunthorpe wrote:
>>> Tom says VFIO device assignment works OK with KVM, so I expect only things
>>> like DPDK to be broken.
>>
>> Is there more information on why the difference?  Thanks,
> 
> I have nothing, maybe Tom can explain how it works?

IIUC, the main differences would be along the lines of what is performing
the mappings or who is performing the MMIO.

For device passthrough using VFIO, the guest kernel is the one that ends
up performing the MMIO in kernel space with the proper encryption mask
(unencrypted).

I'm not familiar with how DPDK really works other than it is userspace
based and uses polling drivers, etc. So it all depends on how everything
gets mapped and by whom. For example, using mmap() to get a mapping to
something that should be mapped unencrypted will be an issue since the
userspace mappings are created encrypted. Extending mmap() to be able to
specify a new flag, maybe MAP_UNENCRYPTED, might be something to consider.

Thanks,
Tom

> 
> Jason 
> 


* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-16 21:43     ` Tom Lendacky
@ 2020-11-16 23:20       ` Jason Gunthorpe
  2020-11-17 15:33         ` Tom Lendacky
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2020-11-16 23:20 UTC (permalink / raw)
  To: Tom Lendacky; +Cc: Peter Xu, Cornelia Huck, kvm, Alex Williamson

On Mon, Nov 16, 2020 at 03:43:53PM -0600, Tom Lendacky wrote:
> On 11/16/20 9:53 AM, Jason Gunthorpe wrote:
> > On Thu, Nov 05, 2020 at 06:39:49PM -0500, Peter Xu wrote:
> >> On Thu, Nov 05, 2020 at 12:34:58PM -0400, Jason Gunthorpe wrote:
> >>> Tom says VFIO device assignment works OK with KVM, so I expect only things
> >>> like DPDK to be broken.
> >>
> >> Is there more information on why the difference?  Thanks,
> > 
> > I have nothing, maybe Tom can explain how it works?
> 
> IIUC, the main differences would be along the lines of what is performing
> the mappings or who is performing the MMIO.
> 
> For device passthrough using VFIO, the guest kernel is the one that ends
> up performing the MMIO in kernel space with the proper encryption mask
> (unencrypted).

The question here is why does VF assignment work if the MMIO mapping
in the hypervisor is being marked encrypted.

It sounds like this means the page table in the hypervisor is ignored,
and it works because the VM's kernel marks the guest's page table as
non-encrypted?

> I'm not familiar with how DPDK really works other than it is userspace
> based and uses polling drivers, etc. So it all depends on how everything
> gets mapped and by whom. For example, using mmap() to get a mapping to
> something that should be mapped unencrypted will be an issue since the
> userspace mappings are created encrypted. 

It is the same as the RDMA stuff: DPDK calls mmap() against VFIO, which
calls remap_pfn_range() and creates encrypted mappings.
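
Roughly, the userspace side of that path looks like the below (a sketch only:
error handling is trimmed and device_fd is assumed to be an already-opened
VFIO device fd):

#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

/* Map BAR 0 of a VFIO device the way a DPDK-style driver would.  The
 * mmap() below is what ends up in vfio_pci_mmap_fault(); without the
 * patch the PTEs it installs keep the SME encryption bit. */
static void *map_bar0(int device_fd)
{
	struct vfio_region_info reg = {
		.argsz = sizeof(reg),
		.index = VFIO_PCI_BAR0_REGION_INDEX,
	};

	if (ioctl(device_fd, VFIO_DEVICE_GET_REGION_INFO, &reg) < 0 ||
	    !(reg.flags & VFIO_REGION_INFO_FLAG_MMAP))
		return MAP_FAILED;

	return mmap(NULL, reg.size, PROT_READ | PROT_WRITE, MAP_SHARED,
		    device_fd, reg.offset);
}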

> Extending mmap() to be able to specify a new flag, maybe
> MAP_UNENCRYPTED, might be something to consider.
 
Not sure how this makes sense here; the kernel knows this should not be
encrypted.

Jason


* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-16 23:20       ` Jason Gunthorpe
@ 2020-11-17 15:33         ` Tom Lendacky
  2020-11-17 15:54           ` Alex Williamson
  2020-11-17 15:57           ` Peter Xu
  0 siblings, 2 replies; 16+ messages in thread
From: Tom Lendacky @ 2020-11-17 15:33 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Peter Xu, Cornelia Huck, kvm, Alex Williamson

On 11/16/20 5:20 PM, Jason Gunthorpe wrote:
> On Mon, Nov 16, 2020 at 03:43:53PM -0600, Tom Lendacky wrote:
>> On 11/16/20 9:53 AM, Jason Gunthorpe wrote:
>>> On Thu, Nov 05, 2020 at 06:39:49PM -0500, Peter Xu wrote:
>>>> On Thu, Nov 05, 2020 at 12:34:58PM -0400, Jason Gunthorpe wrote:
>>>>> Tom says VFIO device assignment works OK with KVM, so I expect only things
>>>>> like DPDK to be broken.
>>>>
>>>> Is there more information on why the difference?  Thanks,
>>>
>>> I have nothing, maybe Tom can explain how it works?
>>
>> IIUC, the main differences would be along the lines of what is performing
>> the mappings or who is performing the MMIO.
>>
>> For device passthrough using VFIO, the guest kernel is the one that ends
>> up performing the MMIO in kernel space with the proper encryption mask
>> (unencrypted).
> 
> The question here is why does VF assignment work if the MMIO mapping
> in the hypervisor is being marked encrypted.
> 
> It sounds like this means the page table in the hypervisor is ignored,
> and it works because the VM's kernel marks the guest's page table as
> non-encrypted?

If I understand the VFIO code correctly, the MMIO area gets registered as
a RAM memory region and added to the guest. This MMIO region is accessed
in the guest through ioremap(), which creates an un-encrypted mapping,
allowing the guest to read it properly. So I believe the mmap() call only
provides the information used to register the memory region for guest
access and is not directly accessed by Qemu (I don't believe the guest
VMEXITs for the MMIO access, but I could be wrong).
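
As a sketch, the guest side is just ordinary PCI driver MMIO handling, along
the lines of the below, where the ioremap()/pci_iomap() call is what applies
the decrypted attribute inside an SEV guest (REG_STATUS is a made-up register
offset for illustration):

#include <linux/pci.h>
#include <linux/io.h>

#define REG_STATUS 0x10	/* made-up offset, for illustration only */

static u32 guest_read_status(struct pci_dev *pdev)
{
	void __iomem *bar0;
	u32 val;

	/* pci_iomap() ends up in ioremap(), which builds a decrypted
	 * kernel mapping of the BAR inside the SEV guest. */
	bar0 = pci_iomap(pdev, 0, 0);
	if (!bar0)
		return ~0U;

	val = readl(bar0 + REG_STATUS);
	pci_iounmap(pdev, bar0);
	return val;
}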

> 
>> I'm not familiar with how DPDK really works other than it is userspace
>> based and uses polling drivers, etc. So it all depends on how everything
>> gets mapped and by whom. For example, using mmap() to get a mapping to
>> something that should be mapped unencrypted will be an issue since the
>> userspace mappings are created encrypted. 
> 
> It is the same as the RDMA stuff: DPDK calls mmap() against VFIO, which
> calls remap_pfn_range() and creates encrypted mappings.
> 
>> Extending mmap() to be able to specify a new flag, maybe
>> MAP_UNENCRYPTED, might be something to consider.
>  
> Not sure how this makes sense here; the kernel knows this should not be
> encrypted.

Yeah, not in this case. Was just a general comment on whether to allow
userspace to do something like that on any mmap().

Thanks,
Tom

> 
> Jason
> 


* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-17 15:33         ` Tom Lendacky
@ 2020-11-17 15:54           ` Alex Williamson
  2020-11-17 16:37             ` Tom Lendacky
  2020-11-17 15:57           ` Peter Xu
  1 sibling, 1 reply; 16+ messages in thread
From: Alex Williamson @ 2020-11-17 15:54 UTC (permalink / raw)
  To: Tom Lendacky; +Cc: Jason Gunthorpe, Peter Xu, Cornelia Huck, kvm

On Tue, 17 Nov 2020 09:33:17 -0600
Tom Lendacky <thomas.lendacky@amd.com> wrote:

> On 11/16/20 5:20 PM, Jason Gunthorpe wrote:
> > On Mon, Nov 16, 2020 at 03:43:53PM -0600, Tom Lendacky wrote:  
> >> On 11/16/20 9:53 AM, Jason Gunthorpe wrote:  
> >>> On Thu, Nov 05, 2020 at 06:39:49PM -0500, Peter Xu wrote:  
> >>>> On Thu, Nov 05, 2020 at 12:34:58PM -0400, Jason Gunthorpe wrote:  
> >>>>> Tom says VFIO device assignment works OK with KVM, so I expect only things
> >>>>> like DPDK to be broken.  
> >>>>
> >>>> Is there more information on why the difference?  Thanks,  
> >>>
> >>> I have nothing, maybe Tom can explain how it works?  
> >>
> >> IIUC, the main differences would be along the lines of what is performing
> >> the mappings or who is performing the MMIO.
> >>
> >> For device passthrough using VFIO, the guest kernel is the one that ends
> >> up performing the MMIO in kernel space with the proper encryption mask
> >> (unencrypted).  
> > 
> > The question here is why does VF assignment work if the MMIO mapping
> > in the hypervisor is being marked encrypted.
> > 
> > It sounds like this means the page table in the hypervisor is ignored,
> > and it works because the VM's kernel marks the guest's page table as
> > non-encrypted?  
> 
> If I understand the VFIO code correctly, the MMIO area gets registered as
> a RAM memory region and added to the guest. This MMIO region is accessed
> in the guest through ioremap(), which creates an un-encrypted mapping,
> allowing the guest to read it properly. So I believe the mmap() call only
> provides the information used to register the memory region for guest
> access and is not directly accessed by Qemu (I don't believe the guest
> VMEXITs for the MMIO access, but I could be wrong).

Ideally it won't, but trapping through QEMU is a common debugging
technique and required if we implement virtualization quirks for a
device in QEMU.  So I believe what you're saying is that device
assignment on SEV probably works only when we're using direct mapping
of the mmap into the VM and tracing or quirks would currently see
encrypted data.  Has anyone had the opportunity to check that we don't
break device assignment to VMs with this patch?  Thanks,

Alex



* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-17 15:33         ` Tom Lendacky
  2020-11-17 15:54           ` Alex Williamson
@ 2020-11-17 15:57           ` Peter Xu
  2020-11-17 16:34             ` Tom Lendacky
  1 sibling, 1 reply; 16+ messages in thread
From: Peter Xu @ 2020-11-17 15:57 UTC (permalink / raw)
  To: Tom Lendacky; +Cc: Jason Gunthorpe, Cornelia Huck, kvm, Alex Williamson

On Tue, Nov 17, 2020 at 09:33:17AM -0600, Tom Lendacky wrote:
> On 11/16/20 5:20 PM, Jason Gunthorpe wrote:
> > On Mon, Nov 16, 2020 at 03:43:53PM -0600, Tom Lendacky wrote:
> >> On 11/16/20 9:53 AM, Jason Gunthorpe wrote:
> >>> On Thu, Nov 05, 2020 at 06:39:49PM -0500, Peter Xu wrote:
> >>>> On Thu, Nov 05, 2020 at 12:34:58PM -0400, Jason Gunthorpe wrote:
> >>>>> Tom says VFIO device assignment works OK with KVM, so I expect only things
> >>>>> like DPDK to be broken.
> >>>>
> >>>> Is there more information on why the difference?  Thanks,
> >>>
> >>> I have nothing, maybe Tom can explain how it works?
> >>
> >> IIUC, the main differences would be along the lines of what is performing
> >> the mappings or who is performing the MMIO.
> >>
> >> For device passthrough using VFIO, the guest kernel is the one that ends
> >> up performing the MMIO in kernel space with the proper encryption mask
> >> (unencrypted).
> > 
> > The question here is why does VF assignment work if the MMIO mapping
> > in the hypervisor is being marked encrypted.
> > 
> > It sounds like this means the page table in the hypervisor is ignored,
> > and it works because the VM's kernel marks the guest's page table as
> > non-encrypted?
> 
> If I understand the VFIO code correctly, the MMIO area gets registered as
> a RAM memory region and added to the guest. This MMIO region is accessed
> in the guest through ioremap(), which creates an un-encrypted mapping,
> allowing the guest to read it properly. So I believe the mmap() call only
> provides the information used to register the memory region for guest
> access and is not directly accessed by Qemu (I don't believe the guest
> VMEXITs for the MMIO access, but I could be wrong).

Thanks for the explanations.

It seems fine if a two-dimensional page table is used in kvm, as long as the
1st level guest page table is handled the same way as in the host.

I'm wondering what happens if a shadow page table is used - IIUC the vfio mmio
region will then be the same as normal guest RAM from the kvm memslot pov;
however, if the mmio region is not encrypted, does it also mean that the whole
guest RAM is not encrypted too?  It's purely a question, because I feel like
these are two layers of security (host as the 1st, guest as the 2nd); maybe
here we're only talking about host security rather than the guest's, in which
case it looks fine too.

-- 
Peter Xu



* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-17 15:57           ` Peter Xu
@ 2020-11-17 16:34             ` Tom Lendacky
  2020-11-17 18:17               ` Peter Xu
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Lendacky @ 2020-11-17 16:34 UTC (permalink / raw)
  To: Peter Xu; +Cc: Jason Gunthorpe, Cornelia Huck, kvm, Alex Williamson

On 11/17/20 9:57 AM, Peter Xu wrote:
> On Tue, Nov 17, 2020 at 09:33:17AM -0600, Tom Lendacky wrote:
>> On 11/16/20 5:20 PM, Jason Gunthorpe wrote:
>>> On Mon, Nov 16, 2020 at 03:43:53PM -0600, Tom Lendacky wrote:
>>>> On 11/16/20 9:53 AM, Jason Gunthorpe wrote:
>>>>> On Thu, Nov 05, 2020 at 06:39:49PM -0500, Peter Xu wrote:
>>>>>> On Thu, Nov 05, 2020 at 12:34:58PM -0400, Jason Gunthorpe wrote:
>>>>>>> Tom says VFIO device assignment works OK with KVM, so I expect only things
>>>>>>> like DPDK to be broken.
>>>>>>
>>>>>> Is there more information on why the difference?  Thanks,
>>>>>
>>>>> I have nothing, maybe Tom can explain how it works?
>>>>
>>>> IIUC, the main differences would be along the lines of what is performing
>>>> the mappings or who is performing the MMIO.
>>>>
>>>> For device passthrough using VFIO, the guest kernel is the one that ends
>>>> up performing the MMIO in kernel space with the proper encryption mask
>>>> (unencrypted).
>>>
>>> The question here is why does VF assignment work if the MMIO mapping
>>> in the hypervisor is being marked encrypted.
>>>
>>> It sounds like this means the page table in the hypervisor is ignored,
>>> and it works because the VM's kernel marks the guest's page table as
>>> non-encrypted?
>>
>> If I understand the VFIO code correctly, the MMIO area gets registered as
>> a RAM memory region and added to the guest. This MMIO region is accessed
>> in the guest through ioremap(), which creates an un-encrypted mapping,
>> allowing the guest to read it properly. So I believe the mmap() call only
>> provides the information used to register the memory region for guest
>> access and is not directly accessed by Qemu (I don't believe the guest
>> VMEXITs for the MMIO access, but I could be wrong).
> 
> Thanks for the explanations.
> 
> It seems fine if a two-dimensional page table is used in kvm, as long as the
> 1st level guest page table is handled the same way as in the host.
> 
> I'm wondering what happens if a shadow page table is used - IIUC the vfio mmio
> region will then be the same as normal guest RAM from the kvm memslot pov;
> however, if the mmio region is not encrypted, does it also mean that the whole
> guest RAM is not encrypted too?  It's purely a question, because I feel like
> these are two layers of security (host as the 1st, guest as the 2nd); maybe
> here we're only talking about host security rather than the guest's, in which
> case it looks fine too.

SEV is only supported with NPT (TDP).

Thanks,
Tom

> 


* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-17 15:54           ` Alex Williamson
@ 2020-11-17 16:37             ` Tom Lendacky
  2020-11-17 17:07               ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Lendacky @ 2020-11-17 16:37 UTC (permalink / raw)
  To: Alex Williamson; +Cc: Jason Gunthorpe, Peter Xu, Cornelia Huck, kvm

On 11/17/20 9:54 AM, Alex Williamson wrote:
> On Tue, 17 Nov 2020 09:33:17 -0600
> Tom Lendacky <thomas.lendacky@amd.com> wrote:
> 
>> On 11/16/20 5:20 PM, Jason Gunthorpe wrote:
>>> On Mon, Nov 16, 2020 at 03:43:53PM -0600, Tom Lendacky wrote:  
>>>> On 11/16/20 9:53 AM, Jason Gunthorpe wrote:  
>>>>> On Thu, Nov 05, 2020 at 06:39:49PM -0500, Peter Xu wrote:  
>>>>>> On Thu, Nov 05, 2020 at 12:34:58PM -0400, Jason Gunthorpe wrote:  
>>>>>>> Tom says VFIO device assignment works OK with KVM, so I expect only things
>>>>>>> like DPDK to be broken.  
>>>>>>
>>>>>> Is there more information on why the difference?  Thanks,  
>>>>>
>>>>> I have nothing, maybe Tom can explain how it works?  
>>>>
>>>> IIUC, the main differences would be along the lines of what is performing
>>>> the mappings or who is performing the MMIO.
>>>>
>>>> For device passthrough using VFIO, the guest kernel is the one that ends
>>>> up performing the MMIO in kernel space with the proper encryption mask
>>>> (unencrypted).  
>>>
>>> The question here is why does VF assignment work if the MMIO mapping
>>> in the hypervisor is being marked encrypted.
>>>
>>> It sounds like this means the page table in the hypervisor is ignored,
>>> and it works because the VM's kernel marks the guest's page table as
>>> non-encrypted?  
>>
>> If I understand the VFIO code correctly, the MMIO area gets registered as
>> a RAM memory region and added to the guest. This MMIO region is accessed
>> in the guest through ioremap(), which creates an un-encrypted mapping,
>> allowing the guest to read it properly. So I believe the mmap() call only
>> provides the information used to register the memory region for guest
>> access and is not directly accessed by Qemu (I don't believe the guest
>> VMEXITs for the MMIO access, but I could be wrong).
> 
> Ideally it won't, but trapping through QEMU is a common debugging
> technique and required if we implement virtualization quirks for a
> device in QEMU.  So I believe what you're saying is that device
> assignment on SEV probably works only when we're using direct mapping
> of the mmap into the VM and tracing or quirks would currently see
> encrypted data.  Has anyone had the opportunity to check that we don't
> break device assignment to VMs with this patch?  Thanks,

I have not been able to test device assignment with this patch, yet. Jason?

Thanks,
Tom

> 
> Alex
> 


* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-17 16:37             ` Tom Lendacky
@ 2020-11-17 17:07               ` Jason Gunthorpe
  2020-11-17 17:10                 ` Tom Lendacky
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2020-11-17 17:07 UTC (permalink / raw)
  To: Tom Lendacky; +Cc: Alex Williamson, Peter Xu, Cornelia Huck, kvm

On Tue, Nov 17, 2020 at 10:37:18AM -0600, Tom Lendacky wrote:

> > Ideally it won't, but trapping through QEMU is a common debugging
> > technique and required if we implement virtualization quirks for a
> > device in QEMU.  So I believe what you're saying is that device
> > assignment on SEV probably works only when we're using direct mapping
> > of the mmap into the VM and tracing or quirks would currently see
> > encrypted data.  Has anyone had the opportunity to check that we don't
> > break device assignment to VMs with this patch?  Thanks,
> 
> I have not been able to test device assignment with this patch, yet. Jason?

I don't have SME systems, but we have a customer that reported RDMA didn't
work and confirmed that the similar RDMA patch worked.

I know VFIO is basically identical, so it should be applicable here
too.

Jason


* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-17 17:07               ` Jason Gunthorpe
@ 2020-11-17 17:10                 ` Tom Lendacky
  0 siblings, 0 replies; 16+ messages in thread
From: Tom Lendacky @ 2020-11-17 17:10 UTC (permalink / raw)
  To: Jason Gunthorpe; +Cc: Alex Williamson, Peter Xu, Cornelia Huck, kvm

On 11/17/20 11:07 AM, Jason Gunthorpe wrote:
> On Tue, Nov 17, 2020 at 10:37:18AM -0600, Tom Lendacky wrote:
> 
>>> Ideally it won't, but trapping through QEMU is a common debugging
>>> technique and required if we implement virtualization quirks for a
>>> device in QEMU.  So I believe what you're saying is that device
>>> assignment on SEV probably works only when we're using direct mapping
>>> of the mmap into the VM and tracing or quirks would currently see
>>> encrypted data.  Has anyone had the opportunity to check that we don't
>>> break device assignment to VMs with this patch?  Thanks,
>>
>> I have not been able to test device assignment with this patch, yet. Jason?
> 
> I don't have SME systems, but we have a customer that reported RDMA didn't
> work and confirmed that the similar RDMA patch worked.

Right, I think Alex was asking about device assignment to a guest in
general, regardless of SME.

Thanks,
Tom

> 
> I know VFIO is basically identical, so it should be applicable here
> too.
> 
> Jason
> 


* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-17 16:34             ` Tom Lendacky
@ 2020-11-17 18:17               ` Peter Xu
  2020-11-26 20:13                 ` Jason Gunthorpe
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Xu @ 2020-11-17 18:17 UTC (permalink / raw)
  To: Tom Lendacky; +Cc: Jason Gunthorpe, Cornelia Huck, kvm, Alex Williamson

On Tue, Nov 17, 2020 at 10:34:37AM -0600, Tom Lendacky wrote:
> On 11/17/20 9:57 AM, Peter Xu wrote:
> > On Tue, Nov 17, 2020 at 09:33:17AM -0600, Tom Lendacky wrote:
> >> On 11/16/20 5:20 PM, Jason Gunthorpe wrote:
> >>> On Mon, Nov 16, 2020 at 03:43:53PM -0600, Tom Lendacky wrote:
> >>>> On 11/16/20 9:53 AM, Jason Gunthorpe wrote:
> >>>>> On Thu, Nov 05, 2020 at 06:39:49PM -0500, Peter Xu wrote:
> >>>>>> On Thu, Nov 05, 2020 at 12:34:58PM -0400, Jason Gunthorpe wrote:
> >>>>>>> Tom says VFIO device assignment works OK with KVM, so I expect only things
> >>>>>>> like DPDK to be broken.
> >>>>>>
> >>>>>> Is there more information on why the difference?  Thanks,
> >>>>>
> >>>>> I have nothing, maybe Tom can explain how it works?
> >>>>
> >>>> IIUC, the main differences would be along the lines of what is performing
> >>>> the mappings or who is performing the MMIO.
> >>>>
> >>>> For device passthrough using VFIO, the guest kernel is the one that ends
> >>>> up performing the MMIO in kernel space with the proper encryption mask
> >>>> (unencrypted).
> >>>
> >>> The question here is why does VF assignment work if the MMIO mapping
> >>> in the hypervisor is being marked encrypted.
> >>>
> >>> It sounds like this means the page table in the hypervisor is ignored,
> >>> and it works because the VM's kernel marks the guest's page table as
> >>> non-encrypted?
> >>
> >> If I understand the VFIO code correctly, the MMIO area gets registered as
> >> a RAM memory region and added to the guest. This MMIO region is accessed
> >> in the guest through ioremap(), which creates an un-encrypted mapping,
> >> allowing the guest to read it properly. So I believe the mmap() call only
> >> provides the information used to register the memory region for guest
> >> access and is not directly accessed by Qemu (I don't believe the guest
> >> VMEXITs for the MMIO access, but I could be wrong).
> > 
> > Thanks for the explanations.
> > 
> > It seems fine if a two-dimensional page table is used in kvm, as long as
> > the 1st level guest page table is handled the same way as in the host.
> > 
> > I'm wondering what happens if a shadow page table is used - IIUC the vfio
> > mmio region will then be the same as normal guest RAM from the kvm memslot
> > pov; however, if the mmio region is not encrypted, does it also mean that
> > the whole guest RAM is not encrypted too?  It's purely a question, because
> > I feel like these are two layers of security (host as the 1st, guest as
> > the 2nd); maybe here we're only talking about host security rather than
> > the guest's, in which case it looks fine too.
> 
> SEV is only supported with NPT (TDP).

I see, thanks for answering (even if my question was kind of off-topic).

Regarding this patch, my current understanding is that the VM case worked only
because the guests in the previous tests were always using kvm's directly
mapped MMIO accesses.  However, that is not always guaranteed, because qemu is
in complete control of it (e.g., qemu can switch to user-exits for all mmio
accesses of a vfio-pci device at any time without the guest's awareness).

Logically this patch should fix that, just like the dpdk scenario where mmio
regions were accessed from userspace (qemu).  From that pov, I think this patch
should help.

Acked-by: Peter Xu <peterx@redhat.com>

Though if my above understanding is correct, it would be nice to mention some
of the above information in the commit message too, though it may not be worth
a repost.

Tests will always be welcomed as suggested by Alex, of course.

Thanks,

-- 
Peter Xu



* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-17 18:17               ` Peter Xu
@ 2020-11-26 20:13                 ` Jason Gunthorpe
  2020-11-30 14:34                   ` Tom Lendacky
  0 siblings, 1 reply; 16+ messages in thread
From: Jason Gunthorpe @ 2020-11-26 20:13 UTC (permalink / raw)
  To: Peter Xu; +Cc: Tom Lendacky, Cornelia Huck, kvm, Alex Williamson

On Tue, Nov 17, 2020 at 01:17:54PM -0500, Peter Xu wrote:
 
> Logically this patch should fix that, just like the dpdk scenario where mmio
> regions were accessed from userspace (qemu).  From that pov, I think this patch
> should help.
> 
> Acked-by: Peter Xu <peterx@redhat.com>

Thanks Peter

Is there more to do here?

Jason


* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-26 20:13                 ` Jason Gunthorpe
@ 2020-11-30 14:34                   ` Tom Lendacky
  2020-11-30 15:34                     ` Alex Williamson
  0 siblings, 1 reply; 16+ messages in thread
From: Tom Lendacky @ 2020-11-30 14:34 UTC (permalink / raw)
  To: Jason Gunthorpe, Peter Xu; +Cc: Cornelia Huck, kvm, Alex Williamson

On 11/26/20 2:13 PM, Jason Gunthorpe wrote:
> On Tue, Nov 17, 2020 at 01:17:54PM -0500, Peter Xu wrote:
>  
>> Logically this patch should fix that, just like the dpdk scenario where mmio
>> regions were accessed from userspace (qemu).  From that pov, I think this patch
>> should help.
>>
>> Acked-by: Peter Xu <peterx@redhat.com>
> 
> Thanks Peter
> 
> Is there more to do here?

I just did a quick, limited passthrough test of a NIC device (non-SRIOV)
for a legacy guest and an SEV guest, and it all appears to work.

I don't have anything more (e.g. SRIOV, GPUs, etc.) with which to test
device passthrough.

Thanks,
Tom

> 
> Jason
> 


* Re: [PATCH] vfio-pci: Use io_remap_pfn_range() for PCI IO memory
  2020-11-30 14:34                   ` Tom Lendacky
@ 2020-11-30 15:34                     ` Alex Williamson
  0 siblings, 0 replies; 16+ messages in thread
From: Alex Williamson @ 2020-11-30 15:34 UTC (permalink / raw)
  To: Tom Lendacky; +Cc: Jason Gunthorpe, Peter Xu, Cornelia Huck, kvm

On Mon, 30 Nov 2020 08:34:51 -0600
Tom Lendacky <thomas.lendacky@amd.com> wrote:

> On 11/26/20 2:13 PM, Jason Gunthorpe wrote:
> > On Tue, Nov 17, 2020 at 01:17:54PM -0500, Peter Xu wrote:
> >    
> >> Logically this patch should fix that, just like the dpdk scenario where mmio
> >> regions were accessed from userspace (qemu).  From that pov, I think this patch
> >> should help.
> >>
> >> Acked-by: Peter Xu <peterx@redhat.com>  
> > 
> > Thanks Peter
> > 
> > Is there more to do here?  
> 
> I just did a quick, limited passthrough test of a NIC device (non-SRIOV)
> for a legacy guest and an SEV guest, and it all appears to work.
> 
> I don't have anything more (e.g. SRIOV, GPUs, etc.) with which to test
> device passthrough.

Thanks, I'll include this for v5.11.

Alex



