* <summary> RE: [iGVT-g] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
From: Tian, Kevin @ 2016-02-03  8:04 UTC (permalink / raw)
  To: Lv, Zhiyuan, Alex Williamson, Gerd Hoffmann
  Cc: Yang Zhang, igvt-g@lists.01.org, qemu-devel, kvm, Paolo Bonzini

> From: Zhiyuan Lv
> Sent: Tuesday, February 02, 2016 3:35 PM
> 
> Hi Gerd/Alex,
> 
> On Mon, Feb 01, 2016 at 02:44:55PM -0700, Alex Williamson wrote:
> > On Mon, 2016-02-01 at 14:10 +0100, Gerd Hoffmann wrote:
> > >   Hi,
> > >
> > > > > Unfortunately it's not the only one. Another example is, device-model
> > > > > may want to write-protect a gfn (RAM). In case that this request goes
> > > > > to VFIO .. how it is supposed to reach KVM MMU?
> > > >
> > > > Well, let's work through the problem.  How is the GFN related to the
> > > > device?  Is this some sort of page table for device mappings with a base
> > > > register in the vgpu hardware?
> > >
> > > IIRC this is needed to make sure the guest can't bypass execbuffer
> > > verification and works like this:
> > >
> > >   (1) guest submits execbuffer.
> > >   (2) host makes execbuffer readonly for the guest
> > >   (3) verify the buffer (make sure it only accesses resources owned by
> > >       the vm).
> > >   (4) pass on execbuffer to the hardware.
> > >   (5) when the gpu is done with it make the execbuffer writable again.
> >
> > Ok, so are there opportunities to do those page protections outside of
> > KVM?  We should be able to get the vma for the buffer, can we do
> > something with that to make it read-only.  Alternatively can the vgpu
> > driver copy it to a private buffer and hardware can execute from that?
> > I'm not a virtual memory expert, but it doesn't seem like an
> > insurmountable problem.  Thanks,
> 
> Originally iGVT-g used write-protection for privilege execbuffers, as Gerd
> described. Now the latest implementation has removed wp to do buffer copy
> instead, since the privilege command buffers are usually small. So that part
> is fine.
> 
> But we need write-protection for graphics page table shadowing as well. Once
> guest driver modifies gpu page table, we need to know that and manipulate
> shadow page table accordingly. buffer copy cannot help here. Thanks!
> 

After walking through the whole thread again, let me summarize here so
everyone can be on the same page.

First, Jike told me before his vacation that we cannot make any changes to
the KVM module, based on community comments. Now I think that's not true.
We can make the necessary changes, as long as they are done in a
structural/layered approach, without a hard assumption that KVMGT is the
only user. That's the guideline we need to obey. :-)

Mostly we care about two aspects of a vgpu driver:
  - services/callbacks which the vgpu driver provides to external frameworks
(e.g. the vgpu core driver and VFIO);
  - services/callbacks which the vgpu driver relies on for proper emulation
(e.g. from VFIO and/or the hypervisor).

The former is being discussed in another thread, so here let's focus
on the latter.

In general, Intel GVT-g requires the services below for emulation:

1) Selectively pass-through a region to a VM
--
This can be supported by today's VFIO framework, by setting
VFIO_REGION_INFO_FLAG_MMAP for the regions concerned. Qemu will then
mmap that region, and its pages are finally added to the EPT tables of
the target VM.
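
For illustration, here is a minimal sketch of how the vgpu driver might
report such a region when handling VFIO_DEVICE_GET_REGION_INFO. The
vgpu_device struct and the helper name are made up for the example; only
the flag usage comes from the existing VFIO uAPI:

  #include <linux/types.h>
  #include <linux/vfio.h>

  /* Hypothetical per-vgpu state, for illustration only. */
  struct vgpu_device {
          size_t aperture_size;
  };

  static int vgpu_get_region_info(struct vgpu_device *vgpu, int index,
                                  struct vfio_region_info *info)
  {
          if (index == VFIO_PCI_BAR2_REGION_INDEX) {
                  /* Direct-mapped aperture: Qemu mmap()s it and the pages
                   * end up in the VM's EPT via the KVM memslot it creates. */
                  info->flags = VFIO_REGION_INFO_FLAG_READ |
                                VFIO_REGION_INFO_FLAG_WRITE |
                                VFIO_REGION_INFO_FLAG_MMAP;
                  info->size  = vgpu->aperture_size;
          } else {
                  /* No MMAP flag: every access is trapped and emulated,
                   * see 2) below. */
                  info->flags = VFIO_REGION_INFO_FLAG_READ |
                                VFIO_REGION_INFO_FLAG_WRITE;
                  info->size  = 16 * 1024 * 1024;
          }
          info->offset = (u64)index << 40; /* driver-chosen offset encoding */
          return 0;
  }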

2) Trap-and-emulate a region
--
Similarly, this can easily be achieved by clearing the MMAP flag for the
regions concerned. Every access from the VM then goes through Qemu, then
VFIO, and finally reaches the vgpu driver. The only concern is performance:
we need some general mechanism to deliver I/O emulation requests directly
from KVM in the kernel. For example, Alex mentioned a flavor based on file
descriptor + offset. Let's move forward with the default Qemu forwarding,
while brainstorming exit-less delivery in parallel.
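
As a sketch of that slow path (the handler and the device-model helpers
vgpu_emulate_mmio_read/write are made-up names), the region read/write
callback in the vgpu driver would look roughly like:

  /* Called from the vgpu framework's region rw callback after Qemu issues
   * a pread()/pwrite() on the trapped region; the device model emulates
   * the register access. */
  static ssize_t vgpu_region_rw(struct vgpu_device *vgpu, char __user *buf,
                                size_t count, loff_t *ppos, bool is_write)
  {
          u64 off = *ppos;
          u32 val;

          if (count != 4 || !IS_ALIGNED(off, 4))
                  return -EINVAL;

          if (is_write) {
                  if (copy_from_user(&val, buf, 4))
                          return -EFAULT;
                  vgpu_emulate_mmio_write(vgpu, off, val);
          } else {
                  val = vgpu_emulate_mmio_read(vgpu, off);
                  if (copy_to_user(buf, &val, 4))
                          return -EFAULT;
          }
          return count;
  }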

3) Inject a virtual interrupt
--
We can leverage the existing VFIO IRQ injection interfaces, including the
configuration and irqfd interfaces.
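
Concretely, the vgpu driver only has to signal the eventfd that userspace
registered via VFIO_DEVICE_SET_IRQS; with a KVM irqfd attached to the same
eventfd, that becomes a virtual interrupt in the guest. A sketch (the
stored trigger is assumed to have been saved at SET_IRQS time):

  #include <linux/eventfd.h>

  /* 'trigger' is the eventfd_ctx saved when userspace configured the MSI
   * through VFIO_DEVICE_SET_IRQS. */
  static void vgpu_inject_msi(struct eventfd_ctx *trigger)
  {
          if (trigger)
                  eventfd_signal(trigger, 1);
  }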

4) Map/unmap guest memory
--
It's there for KVM.

5) Pin/unpin guest memory
--
IGD or any PCI passthrough should have the same requirement, so we should
be able to leverage existing code in VFIO. The only tricky thing (Jike may
elaborate after he is back) is that KVMGT requires pinning the EPT entries
too, which requires some further changes on the KVM side. But I'm not sure
whether that still holds true after some design changes made in this
thread, so I'll leave it to Jike to comment further.
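
To make the requirement concrete, the kind of interface the vgpu driver
would want from the vfio/vgpu core layer might look like the strawman
below (these are not existing functions):

  /* Strawman: pin a set of guest pages (by gfn) on behalf of a vgpu and
   * return the host pfns so shadow GPU page tables can reference them;
   * unpin drops the references. Whether the EPT entries additionally need
   * pinning is the open left for Jike above. */
  long vgpu_pin_pages(struct vgpu_device *vgpu, unsigned long *gfns,
                      long npages, unsigned long *pfns);
  long vgpu_unpin_pages(struct vgpu_device *vgpu, unsigned long *gfns,
                        long npages);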

6) Write-protect a guest memory page
--
The primary purpose is GPU page table shadowing. We need to track
modifications to the guest GPU page table so that the shadow copy can be
synchronized accordingly; just think about CPU page table shadowing. An
older example, as Zhiyuan pointed out, was write-protecting the guest
command buffer, but that is no longer necessary.

So we need KVM to provide an interface through which agents can request
such write-protection (not just for KVMGT; it could serve other tracking
usages). Guangrong has been working on a general page-tracking mechanism,
upon which write-protection can easily be built. The review is still in
progress.
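
To show the shape of what KVMGT would need, here is a sketch written
against the names used in Guangrong's in-review series; since the series
is not merged yet, the exact signatures may still change:

  #include <linux/kvm_host.h>
  #include <asm/kvm_page_track.h>

  /* Called back by KVM on emulated writes to tracked gfns; re-shadow the
   * touched guest GPU page table entries here. */
  static void kvmgt_gpt_write(struct kvm_vcpu *vcpu, gpa_t gpa,
                              const u8 *new, int bytes,
                              struct kvm_page_track_notifier_node *node)
  {
          /* sync the shadow GPU page table for [gpa, gpa + bytes) */
  }

  static struct kvm_page_track_notifier_node kvmgt_node = {
          .track_write = kvmgt_gpt_write,
  };

  /* Register once per VM when KVMGT attaches to it. */
  static void kvmgt_attach(struct kvm *kvm)
  {
          kvm_page_track_register_notifier(kvm, &kvmgt_node);
  }

  /* Write-protect one guest GPU page table page. */
  static void kvmgt_protect_gpt_page(struct kvm *kvm, gfn_t gfn)
  {
          int idx = srcu_read_lock(&kvm->srcu);
          struct kvm_memory_slot *slot = gfn_to_memslot(kvm, gfn);

          spin_lock(&kvm->mmu_lock);
          kvm_slot_page_track_add_page(kvm, slot, gfn, KVM_PAGE_TRACK_WRITE);
          spin_unlock(&kvm->mmu_lock);

          srcu_read_unlock(&kvm->srcu, idx);
  }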

7) GPA->IOVA/HVA translation
--
It's required in various places, e.g.:
- read a guest structure according to GPA
- replace GPA with IOVA in various shadow structures

We can maintain both translations in the vfio-iommu-type1 driver, since
the necessary information is available at the map interface, and we should
use a MemoryListener to update the database. That is already there for
physical device passthrough (Qemu uses a MemoryListener and then relays to
vfio).

vfio-vgpu will expose a query interface, through the vgpu core driver, so
that the vgpu driver can use the above database for whatever purpose.
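
A strawman for that query interface (all names invented for illustration):

  /* Strawman: look up the translation database maintained by
   * vfio-iommu-type1 from the map/unmap calls (fed by Qemu's
   * MemoryListener). Given a GPA, return the matching IOVA and the HVA
   * in the Qemu process, so the vgpu driver can read guest structures
   * or rewrite shadow entries. */
  struct vgpu_translation {
          u64 gpa;
          u64 iova;
          u64 hva;
          u64 size;
  };

  int vgpu_core_lookup_gpa(struct vgpu_device *vgpu, u64 gpa,
                           struct vgpu_translation *out);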


----
Well, ending this write-up I realize that pretty much all of the opens
have been covered with a solution. We should then move forward to come up
with a prototype, upon which we can identify anything missing or overlooked
(there definitely will be), and also discuss the several remaining opens on
top (such as exit-less emulation, pin/unpin, etc.). Another thing we need
to think about is whether this new design is still compatible with the Xen
side.

Thanks a lot, all, for the great discussion (especially Alex, with many
good inputs)! I believe it is much clearer now than two weeks ago how to
integrate KVMGT with VFIO. :-)

Thanks
Kevin


* Re: <summary> RE: [iGVT-g] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
From: Neo Jia @ 2016-02-03  8:41 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Lv, Zhiyuan, Alex Williamson, Gerd Hoffmann, Yang Zhang,
	igvt-g@lists.01.org, qemu-devel, kvm, Paolo Bonzini,
	Kirti Wankhede

On Wed, Feb 03, 2016 at 08:04:16AM +0000, Tian, Kevin wrote:
> > From: Zhiyuan Lv
> > Sent: Tuesday, February 02, 2016 3:35 PM
> > 
> > Hi Gerd/Alex,
> > 
> > On Mon, Feb 01, 2016 at 02:44:55PM -0700, Alex Williamson wrote:
> > > On Mon, 2016-02-01 at 14:10 +0100, Gerd Hoffmann wrote:
> > > >   Hi,
> > > >
> > > > > > Unfortunately it's not the only one. Another example is, device-model
> > > > > > may want to write-protect a gfn (RAM). In case that this request goes
> > > > > > to VFIO .. how it is supposed to reach KVM MMU?
> > > > >
> > > > > Well, let's work through the problem.  How is the GFN related to the
> > > > > device?  Is this some sort of page table for device mappings with a base
> > > > > register in the vgpu hardware?
> > > >
> > > > IIRC this is needed to make sure the guest can't bypass execbuffer
> > > > verification and works like this:
> > > >
> > > >   (1) guest submits execbuffer.
> > > >   (2) host makes execbuffer readonly for the guest
> > > >   (3) verify the buffer (make sure it only accesses resources owned by
> > > >       the vm).
> > > >   (4) pass on execbuffer to the hardware.
> > > >   (5) when the gpu is done with it make the execbuffer writable again.
> > >
> > > Ok, so are there opportunities to do those page protections outside of
> > > KVM?  We should be able to get the vma for the buffer, can we do
> > > something with that to make it read-only.  Alternatively can the vgpu
> > > driver copy it to a private buffer and hardware can execute from that?
> > > I'm not a virtual memory expert, but it doesn't seem like an
> > > insurmountable problem.  Thanks,
> > 
> > Originally iGVT-g used write-protection for privilege execbuffers, as Gerd
> > described. Now the latest implementation has removed wp to do buffer copy
> > instead, since the privilege command buffers are usually small. So that part
> > is fine.
> > 
> > But we need write-protection for graphics page table shadowing as well. Once
> > guest driver modifies gpu page table, we need to know that and manipulate
> > shadow page table accordingly. buffer copy cannot help here. Thanks!
> > 
> 
> 
> 4) Map/unmap guest memory
> --
> It's there for KVM.
> 
> 5) Pin/unpin guest memory
> --
> IGD or any PCI passthru should have same requirement. So we should be
> able to leverage existing code in VFIO. The only tricky thing (Jike may
> elaborate after he is back), is that KVMGT requires to pin EPT entry too,
> which requires some further change in KVM side. But I'm not sure whether
> it still holds true after some design changes made in this thread. So I'll
> leave to Jike to further comment.
> 

Hi Kevin,

I think you should be able to map and pin guest memory via the IOMMU API, not
KVM.

> Well, then I realize pretty much opens have been covered with a solution
> when ending this write-up. Then we should move forward to come up a
> prototype upon which we can then identify anything missing or overlooked
> (definitely there would be), and also discuss several remaining opens atop
>  (such as exit-less emulation, pin/unpin, etc.). Another thing we need
> to think is whether this new design is still compatible to Xen side.
> 
> Thanks a lot all for the great discussion (especially Alex with many good
> inputs)! I believe it becomes much clearer now than 2 weeks ago, about 
> how to integrate KVMGT with VFIO. :-)
> 

It is great to see you guys are on board with the VFIO solution! As Kirti
has mentioned in other threads, let's review the current registration APIs
and figure out what we need to add for both solutions.

Thanks,
Neo

> Thanks
> Kevin


* Re: <summary> RE: [iGVT-g] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
From: Alex Williamson @ 2016-02-03 20:44 UTC (permalink / raw)
  To: Tian, Kevin, Lv, Zhiyuan, Gerd Hoffmann
  Cc: Yang Zhang, igvt-g@lists.01.org, qemu-devel, kvm, Paolo Bonzini

On Wed, 2016-02-03 at 08:04 +0000, Tian, Kevin wrote:
> > From: Zhiyuan Lv
> > Sent: Tuesday, February 02, 2016 3:35 PM
> > 
> > Hi Gerd/Alex,
> > 
> > On Mon, Feb 01, 2016 at 02:44:55PM -0700, Alex Williamson wrote:
> > > On Mon, 2016-02-01 at 14:10 +0100, Gerd Hoffmann wrote:
> > > >   Hi,
> > > > 
> > > > > > Unfortunately it's not the only one. Another example is, device-model
> > > > > > may want to write-protect a gfn (RAM). In case that this request goes
> > > > > > to VFIO .. how it is supposed to reach KVM MMU?
> > > > > 
> > > > > Well, let's work through the problem.  How is the GFN related to the
> > > > > device?  Is this some sort of page table for device mappings with a base
> > > > > register in the vgpu hardware?
> > > > 
> > > > IIRC this is needed to make sure the guest can't bypass execbuffer
> > > > verification and works like this:
> > > > 
> > > >   (1) guest submits execbuffer.
> > > >   (2) host makes execbuffer readonly for the guest
> > > >   (3) verify the buffer (make sure it only accesses resources owned by
> > > >       the vm).
> > > >   (4) pass on execbuffer to the hardware.
> > > >   (5) when the gpu is done with it make the execbuffer writable again.
> > > 
> > > Ok, so are there opportunities to do those page protections outside of
> > > KVM?  We should be able to get the vma for the buffer, can we do
> > > something with that to make it read-only.  Alternatively can the vgpu
> > > driver copy it to a private buffer and hardware can execute from that?
> > > I'm not a virtual memory expert, but it doesn't seem like an
> > > insurmountable problem.  Thanks,
> > 
> > Originally iGVT-g used write-protection for privilege execbuffers, as Gerd
> > described. Now the latest implementation has removed wp to do buffer copy
> > instead, since the privilege command buffers are usually small. So that part
> > is fine.
> > 
> > But we need write-protection for graphics page table shadowing as well. Once
> > guest driver modifies gpu page table, we need to know that and manipulate
> > shadow page table accordingly. buffer copy cannot help here. Thanks!
> > 
> 
> After walking through the whole thread again, let me do a summary here
> so everyone can be on the same page. 
> 
> First, Jike told me before his vacation, that we cannot do any change to 
> KVM module according to community comments. Now I think it's not true. 
> We can do necessary changes, as long as it is done in a structural/layered 
> approach, w/o hard assumption on KVMGT as the only user. That's the 
> guideline we need to obey. :-)

We certainly need to separate the functionality that you're trying to
enable from the more pure concept of vfio.  vfio is a userspace driver
interface, not a userspace driver interface for KVM-based virtual
machines.  Maybe it's more of a gimmick that we can assign PCI devices
to QEMU tcg VMs, but that's really just the proof of concept for more
useful capabilities, like supporting DPDK applications.  So, I
begrudgingly agree that structured/layered interactions are acceptable,
but consider what use cases may be excluded by doing so.

> Mostly we care about two aspects regarding to a vgpu driver:
>   - services/callbacks which vgpu driver provides to external framework
> (e.g. vgpu core driver and VFIO);
>   - services/callbacks which vgpu driver relies on for proper emulation
> (e.g. from VFIO and/or hypervisor);
> 
> The former is being discussed in another thread. So here let's focus
> on the latter.
> 
> In general Intel GVT-g requires below services for emulation:
> 
> 1) Selectively pass-through a region to a VM
> --
> This can be supported by today's VFIO framework, by setting
> VFIO_REGION_INFO_FLAG_MMAP for concerned regions. Then Qemu
> will mmap that region which will finally be added to the EPT table of
> the target VM
> 
> 2) Trap-and-emulate a region
> --
> Similarly, this can be easily achieved by clearing MMAP flag for concerned
> regions. Then every access from VM will go through Qemu and then VFIO
> and finally reach vgpu driver. The only concern is in the performance
> part. We need some general mechanism to allow delivering I/O emulation
> request directly from KVM in kernel. For example, Alex mentioned some
> flavor based on file descriptor + offset. Likely let's move forward with
> the default Qemu forwarding, while brainstorming exit-less delivery in parallel.
> 
> 3) Inject a virtual interrupt
> --
> We can leverage existing VFIO IRQ injection interface, including configuration
> and irqfd interface.
> 
> 4) Map/unmap guest memory
> --
> It's there for KVM.

Map and unmap for who?  For the vGPU or for the VM?  It seems like we
know how to map guest memory for the vGPU without KVM, but that's
covered in 7), so I'm not entirely sure what this is specifying.
 
> 5) Pin/unpin guest memory
> --
> IGD or any PCI passthru should have same requirement. So we should be
> able to leverage existing code in VFIO. The only tricky thing (Jike may
> elaborate after he is back), is that KVMGT requires to pin EPT entry too,
> which requires some further change in KVM side. But I'm not sure whether
> it still holds true after some design changes made in this thread. So I'll
> leave to Jike to further comment.

PCI assignment requires pinning all of guest memory, I would think that
IGD would only need to pin selective memory, so is this simply stating
that both have the need to pin memory, not that they'll do it to the
same extent?

> 6) Write-protect a guest memory page
> --
> The primary purpose is for GPU page table shadowing. We need to track
> modifications on guest GPU page table, so shadow part can be synchronized
> accordingly. Just think about CPU page table shadowing. And old example
> as Zhiyuan pointed out, is to write-protect guest cmd buffer. But it becomes
> not necessary now.
> 
> So we need KVM to provide an interface so some agents can request such
> write-protection action (not just for KVMGT. could be for other tracking 
> usages). Guangrong has been working on a general page tracking mechanism,
> upon which write-protection can be easily built on. The review is still in 
> progress.

I have a hard time believing we don't have the mechanics to do this
outside of KVM.  We should be able to write protect user pages from the
kernel, this is how copy-on-write generally works.  So it seems like we
should be able to apply those same mechanics to our userspace process,
which just happens to be a KVM VM.  I'm hoping that Paolo might have
some ideas how to make this work or maybe Intel has some virtual memory
experts that can point us in the right direction.

> 7) GPA->IOVA/HVA translation
> --
> It's required in various places, e.g.:
> - read a guest structure according to GPA
> - replace GPA with IOVA in various shadow structures
> 
> We can maintain both translations in vfio-iommu-type1 driver, since
> necessary information is ready at map interface. And we should use
> MemoryListener to update the database. It's already there for physical
> device passthru (Qemu uses MemoryListener and then rely to vfio).
> 
> vfio-vgpu will expose query interface, thru vgpu core driver, so that 
> vgpu driver can use above database for whatever purpose.
> 
> 
> ----
> Well, then I realize pretty much opens have been covered with a solution
> when ending this write-up. Then we should move forward to come up a
> prototype upon which we can then identify anything missing or overlooked
> (definitely there would be), and also discuss several remaining opens atop
>  (such as exit-less emulation, pin/unpin, etc.). Another thing we need
> to think is whether this new design is still compatible to Xen side.
> 
> Thanks a lot all for the great discussion (especially Alex with many good
> inputs)! I believe it becomes much clearer now than 2 weeks ago, about 
> how to integrate KVMGT with VFIO. :-)

Thanks for your summary, Kevin.  It does seem like there are only a few
outstanding issues which should be manageable and hopefully the overall
approach is cleaner for QEMU, management tools, and provides a more
consistent user interface as well.  If we can translate the solution to
Xen, that's even better.  Thanks,

Alex



* RE: <summary> RE: [iGVT-g] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
From: Tian, Kevin @ 2016-02-04  3:01 UTC (permalink / raw)
  To: Alex Williamson, Lv, Zhiyuan, Gerd Hoffmann
  Cc: Yang Zhang, igvt-g@lists.01.org, qemu-devel, kvm, Paolo Bonzini

> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: Thursday, February 04, 2016 4:45 AM
> >
> > First, Jike told me before his vacation, that we cannot do any change to
> > KVM module according to community comments. Now I think it's not true.
> > We can do necessary changes, as long as it is done in a structural/layered
> > approach, w/o hard assumption on KVMGT as the only user. That's the
> > guideline we need to obey. :-)
> 
> We certainly need to separate the functionality that you're trying to
> enable from the more pure concept of vfio.  vfio is a userspace driver
> interfaces, not a userspace driver interface for KVM-based virtual
> machines.  Maybe it's more of a gimmick that we can assign PCI devices
> to QEMU tcg VMs, but that's really just the proof of concept for more
> useful capabilities, like supporting DPDK applications.  So, I
> begrudgingly agree that structured/layered interactions are acceptable,
> but consider what use cases may be excluded by doing so.

Understood. We shouldn't assume VFIO is always connected to KVM. For
example, once we have vfio-vgpu ready, it can be used to drive container
usage too, not necessarily always connecting with KVM/Qemu. Actually,
thinking more from this angle, there is a new open which I'll describe at
the end...

> >
> > 4) Map/unmap guest memory
> > --
> > It's there for KVM.
> 
> Map and unmap for who?  For the vGPU or for the VM?  It seems like we
> know how to map guest memory for the vGPU without KVM, but that's
> covered in 7), so I'm not entirely sure what this is specifying.

Map guest memory for emulation purposes in the vGPU device model, e.g. to
read/write the guest GPU page table, command buffers, etc. It's the basic
requirement we see in any device model.

7) provides the database (both GPA->IOVA and GPA->HPA), where GPA->HPA can
be used to implement this interface for KVM. However, it's different for
Xen, as a special foreign-domain mapping hypercall is involved, which is
Xen specific and so not appropriate to put in VFIO.

That's why we list this interface separately as a key requirement (though
it's obvious for KVM).

> 
> > 5) Pin/unpin guest memory
> > --
> > IGD or any PCI passthru should have same requirement. So we should be
> > able to leverage existing code in VFIO. The only tricky thing (Jike may
> > elaborate after he is back), is that KVMGT requires to pin EPT entry too,
> > which requires some further change in KVM side. But I'm not sure whether
> > it still holds true after some design changes made in this thread. So I'll
> > leave to Jike to further comment.
> 
> PCI assignment requires pinning all of guest memory, I would think that
> IGD would only need to pin selective memory, so is this simply stating
> that both have the need to pin memory, not that they'll do it to the
> same extent?

For simplicity, let's first pin all memory, and take selective pinning as
a future enhancement.

The tricky thing is that the existing 'pin' action in VFIO doesn't actually
pin the EPT entries too (it only pins the host page tables for the Qemu
process). There are various places where EPT entries might be invalidated
while the guest is running, whereas KVMGT requires the EPT entries to be
pinned as well. Let's wait for Jike to elaborate on whether this part is
still required today.

> 
> > 6) Write-protect a guest memory page
> > --
> > The primary purpose is for GPU page table shadowing. We need to track
> > modifications on guest GPU page table, so shadow part can be synchronized
> > accordingly. Just think about CPU page table shadowing. And old example
> > as Zhiyuan pointed out, is to write-protect guest cmd buffer. But it becomes
> > not necessary now.
> >
> > So we need KVM to provide an interface so some agents can request such
> > write-protection action (not just for KVMGT. could be for other tracking
> > usages). Guangrong has been working on a general page tracking mechanism,
> > upon which write-protection can be easily built on. The review is still in
> > progress.
> 
> I have a hard time believing we don't have the mechanics to do this
> outside of KVM.  We should be able to write protect user pages from the
> kernel, this is how copy-on-write generally works.  So it seems like we
> should be able to apply those same mechanics to our userspace process,
> which just happens to be a KVM VM.  I'm hoping that Paolo might have
> some ideas how to make this work or maybe Intel has some virtual memory
> experts that can point us in the right direction.

What we want to write-protect against is accesses that happen inside the
VM. I don't see how any tricks in the host page tables can help here. The
only way is to tweak the page tables used in non-root mode (either the EPT
or the shadow page table).

> 
> Thanks for your summary, Kevin.  It does seem like there are only a few
> outstanding issues which should be manageable and hopefully the overall
> approach is cleaner for QEMU, management tools, and provides a more
> consistent user interface as well.  If we can translate the solution to
> Xen, that's even better.  Thanks,
> 

Here is the main open in my head, after thinking about the role of VFIO:

The above 7 services required by the vGPU device model fall into two
categories:

a) services to connect the vGPU with the VM, which are essentially what a
device driver does (so VFIO fits here), including:
	1) Selectively pass-through a region to a VM
	2) Trap-and-emulate a region
	3) Inject a virtual interrupt
	5) Pin/unpin guest memory
	7) GPA->IOVA/HVA translation (as a side-effect)

b) services to support device emulation, which are going to be hypervisor
specific, including:
	4) Map/unmap guest memory
	6) Write-protect a guest memory page

VFIO can fulfill category a), but not b). A possible abstraction would be
in the vGPU core driver, allowing a specific hypervisor to register
callbacks for category b) (which means a KVMGT-specific file, say KVM-vGPU,
would be added to KVM to connect the two together).

Then a likely layering of the blocks would look like this:

VFIO-vGPU  <--------->  vGPU Core  <-------------> KVMGT-vGPU
                        ^       ^
                        |       |
                        |       |
                        v       v
                      nvidia   intel
                       vGPU    vGPU

Xen would register its own vGPU bus driver (not using VFIO today) and also
its hypervisor services using the same framework. With this design,
everything is abstracted/registered through the vGPU core driver, instead
of the components talking to each other directly.
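
To make the registration idea concrete, the vGPU core driver could expose
two registration points roughly like the strawman below (every name here
is invented for illustration, not an existing interface):

  /* Strawman: category a) services, implemented by a bus driver such as
   * VFIO-vGPU (or a Xen-specific bus driver). */
  struct vgpu_bus_ops {
          int  (*add_region)(struct vgpu_device *vgpu, int index, u32 flags);
          void (*inject_irq)(struct vgpu_device *vgpu, int vector);
          long (*pin_pages)(struct vgpu_device *vgpu, unsigned long *gfns,
                            long npages, unsigned long *pfns);
  };

  /* Strawman: category b) services, registered by a hypervisor-specific
   * module (e.g. the KVM-vGPU file mentioned above, or Xen). */
  struct vgpu_hypervisor_ops {
          void *(*map_gfn)(struct vgpu_device *vgpu, unsigned long gfn);
          void  (*unmap_gfn)(struct vgpu_device *vgpu, void *va);
          int   (*set_wp_page)(struct vgpu_device *vgpu, unsigned long gfn);
          int   (*unset_wp_page)(struct vgpu_device *vgpu, unsigned long gfn);
  };

  int vgpu_core_register_bus(const struct vgpu_bus_ops *ops);
  int vgpu_core_register_hypervisor(const struct vgpu_hypervisor_ops *ops);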

Thoughts?

P.S. From the description of the above requirements, the whole framework
might also be extended to cover any device type using the same mediated
pass-through approach. Though graphics has some special requirements, the
majority are actually device agnostic. Maybe it's better not to limit it
with a vGPU name at all. :-)

Thanks
Kevin


* Re: [iGVT-g] <summary> RE: VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
From: Neo Jia @ 2016-02-04  3:52 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Alex Williamson, Lv, Zhiyuan, Gerd Hoffmann, Yang Zhang,
	igvt-g@lists.01.org, qemu-devel, kvm, Paolo Bonzini,
	Kirti Wankhede

On Thu, Feb 04, 2016 at 03:01:36AM +0000, Tian, Kevin wrote:
> > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > Sent: Thursday, February 04, 2016 4:45 AM
> > >
> > > First, Jike told me before his vacation, that we cannot do any change to
> > > KVM module according to community comments. Now I think it's not true.
> > > We can do necessary changes, as long as it is done in a structural/layered
> > > approach, w/o hard assumption on KVMGT as the only user. That's the
> > > guideline we need to obey. :-)
> > 
> > We certainly need to separate the functionality that you're trying to
> > enable from the more pure concept of vfio.  vfio is a userspace driver
> > interfaces, not a userspace driver interface for KVM-based virtual
> > machines.  Maybe it's more of a gimmick that we can assign PCI devices
> > to QEMU tcg VMs, but that's really just the proof of concept for more
> > useful capabilities, like supporting DPDK applications.  So, I
> > begrudgingly agree that structured/layered interactions are acceptable,
> > but consider what use cases may be excluded by doing so.
> 
> Understand. We shouldn't assume VFIO always connected to KVM. For 
> example, once we have vfio-vgpu ready, it can be used to drive container
> usage too, not exactly always connecting with KVM/Qemu. Actually thinking
> more from this angle there is a new open which I'll describe in the end...
> 
> > >
> > > 4) Map/unmap guest memory
> > > --
> > > It's there for KVM.
> > 
> > Map and unmap for who?  For the vGPU or for the VM?  It seems like we
> > know how to map guest memory for the vGPU without KVM, but that's
> > covered in 7), so I'm not entirely sure what this is specifying.
> 
> Map guest memory for emulation purpose in vGPU device model, e.g. to r/w
> guest GPU page table, command buffer, etc. It's the basic requirement as
> we see in any device model.
> 
> 7) provides the database (both GPA->IOVA and GPA->HPA), where GPA->HPA
> can be used to implement this interface for KVM. However for Xen it's
> different, as special foreign domain mapping hypercall is involved which is
> Xen specific so not appropriate to be in VFIO. 
> 
> That's why we list this interface separately as a key requirement (though
> it's obvious for KVM)

Hi Kevin,

It seems you are trying to map the guest physical memory into your kernel driver
on the host, right? 

If yes, I think we already have the required information to achieve that.

The type1 IOMMU VGPU interface has provided <QEMU_VA, iova, qemu_mm>, which is
enough for us to do any lookup.
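
For example (a rough sketch, assuming the backend records one entry per
VFIO_IOMMU_MAP_DMA call; all the names are made up), the host driver could
resolve a guest page roughly like this:

  /* Hypothetical record kept by the type1 vgpu backend per mapping: the
   * Qemu virtual address, the iova (== GPA for Qemu-created mappings),
   * the size, and the owning mm. */
  struct vgpu_dma_entry {
          unsigned long     qemu_va;
          dma_addr_t        iova;
          size_t            size;
          struct mm_struct *mm;
  };

  /* Translate a guest physical address (iova) into the corresponding Qemu
   * virtual address; the result can then be pinned via the entry's mm. */
  static unsigned long vgpu_iova_to_hva(struct vgpu_dma_entry *e,
                                        dma_addr_t iova)
  {
          if (iova < e->iova || iova >= e->iova + e->size)
                  return 0; /* not covered by this mapping */

          return e->qemu_va + (iova - e->iova);
  }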

> 
> > 
> > > 5) Pin/unpin guest memory
> > > --
> > > IGD or any PCI passthru should have same requirement. So we should be
> > > able to leverage existing code in VFIO. The only tricky thing (Jike may
> > > elaborate after he is back), is that KVMGT requires to pin EPT entry too,
> > > which requires some further change in KVM side. But I'm not sure whether
> > > it still holds true after some design changes made in this thread. So I'll
> > > leave to Jike to further comment.
> > 
> > PCI assignment requires pinning all of guest memory, I would think that
> > IGD would only need to pin selective memory, so is this simply stating
> > that both have the need to pin memory, not that they'll do it to the
> > same extent?
> 
> For simplicity let's first pin all memory, while taking selective pinning as a
> future enhancement.
> 
> The tricky thing is that existing 'pin' action in VFIO doesn't actually pin
> EPT entry too (only pin host page tables for Qemu process). There are 
> various places where EPT entries might be invalidated when guest is 
> running, while KVMGT requires EPT entries to be pinned too. Let's wait 
> for Jike to elaborate whether this part is still required today.

Sorry, don't quite follow the logic here. The current VFIO TYPE1 IOMMU (including API
and underlying IOMMU implementation) will pin the guest physical memory and
install those pages to the proper device domain. Yes, it is only for the QEMU
process, as that is the process the VM runs in.

Am I missing something here?
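
For reference, the conceptual pin-and-map flow looks roughly like the sketch
below; vgpu_pin_user_page() and device_domain_map() are hypothetical stand-ins
for the get_user_pages() family and the IOMMU-domain map operation, whose exact
flavours and signatures vary across kernel versions:

/*
 * Conceptual sketch of "pin guest memory and install it in the device
 * domain"; the real type1 code also handles accounting, batching and
 * error paths.
 */
int vgpu_dma_map_one(struct mm_struct *mm, unsigned long vaddr,
                     unsigned long iova, int prot)
{
        struct page *page;
        int ret;

        /* pin the QEMU page backing this GPA so it cannot be reclaimed */
        ret = vgpu_pin_user_page(mm, vaddr, prot & IOMMU_WRITE, &page);
        if (ret)
                return ret;

        /* install iova(GPA) -> HPA in the device's IOMMU domain */
        ret = device_domain_map(iova, page_to_phys(page), PAGE_SIZE, prot);
        if (ret)
                put_page(page);
        return ret;
}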

> 
> > 
> > > 6) Write-protect a guest memory page
> > > --
> > > The primary purpose is for GPU page table shadowing. We need to track
> > > modifications on guest GPU page table, so shadow part can be synchronized
> > > accordingly. Just think about CPU page table shadowing. And old example
> > > as Zhiyuan pointed out, is to write-protect guest cmd buffer. But it becomes
> > > not necessary now.
> > >
> > > So we need KVM to provide an interface so some agents can request such
> > > write-protection action (not just for KVMGT. could be for other tracking
> > > usages). Guangrong has been working on a general page tracking mechanism,
> > > upon which write-protection can be easily built on. The review is still in
> > > progress.
> > 
> > I have a hard time believing we don't have the mechanics to do this
> > outside of KVM.  We should be able to write protect user pages from the
> > kernel, this is how copy-on-write generally works.  So it seems like we
> > should be able to apply those same mechanics to our userspace process,
> > which just happens to be a KVM VM.  I'm hoping that Paolo might have
> > some ideas how to make this work or maybe Intel has some virtual memory
> > experts that can point us in the right direction.
> 
> What we want to write-protect, is when the access happens inside VM.
> I don't know why any tricks in host page table can help here. The only
> way is to tweak page tables used in non-root mode (either EPT or
> shadow page table).
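
A hedged sketch of what such a non-root-mode write-protection interface could
look like to a device model, in the spirit of the page-tracking work mentioned
above (all names here are hypothetical, not the actual KVM API):

/*
 * Protection is applied in the non-root translation (EPT or shadow page
 * table), so writes performed inside the VM fault back to the tracker.
 */
struct gfn_track_ops {
        /* called when the guest writes a tracked gfn */
        void (*track_write)(unsigned long gfn, unsigned long offset,
                            const void *data, int len, void *opaque);
};

int  gfn_track_register(struct kvm *kvm, const struct gfn_track_ops *ops,
                        void *opaque);
int  gfn_track_write_protect(struct kvm *kvm, unsigned long gfn);
void gfn_track_unprotect(struct kvm *kvm, unsigned long gfn);

/* vGPU shadow-PT usage: write-protect the guest GPU page-table pages and
 * resync the affected shadow entries from the track_write() callback. */
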
> 
> > 
> > Thanks for your summary, Kevin.  It does seem like there are only a few
> > outstanding issues which should be manageable and hopefully the overall
> > approach is cleaner for QEMU, management tools, and provides a more
> > consistent user interface as well.  If we can translate the solution to
> > Xen, that's even better.  Thanks,
> > 
> 
> Here is the main open in my head, after thinking about the role of VFIO:
> 
> For above 7 services required by vGPU device model, they can fall into
> two categories:
> 
> a) services to connect vGPU with VM, which are essentially what a device
> driver is doing (so VFIO can fit here), including:
> 	1) Selectively pass-through a region to a VM
> 	2) Trap-and-emulate a region
> 	3) Inject a virtual interrupt
> 	5) Pin/unpin guest memory
> 	7) GPA->IOVA/HVA translation (as a side-effect)
> 
> b) services to support device emulation, which gonna be hypervisor 
> specific, including:
> 	4) Map/unmap guest memory

I think we have the ability to support this already with VFIO, see my comments
above.

Thanks,
Neo

> 	6) Write-protect a guest memory page
> 
> VFIO can fulfill category a), but not for b). A possible abstraction would
> be in vGPU core driver, to allow specific hypervisor registering callbacks
> for category b) (which means a KVMGT specific file say KVM-vGPU will 
> be added to KVM to connect both together).
> 
> Then a likely layered blocks would be like:
> 
> VFIO-vGPU  <--------->  vGPU Core  <-------------> KVMGT-vGPU
>                         ^       ^
>                         |       |
>                         |       |
>                         v       v
>                       nvidia   intel
>                        vGPU    vGPU
> 
> Xen will register its own vGPU bus driver (not using VFIO today) and
> also hypervisor services using the same framework. With this design,
> everything is abstracted/registered through vGPU core driver, instead
> of talking with each other directly.
> 
> Thoughts?
> 
> P.S. from the description of above requirements, the whole framework
> might be also extended to cover any device type using same mediated
> pass-through approach. Though graphics has some special requirement,
> the majority are actually device agnostic. Maybe it's better not to tie it
> to a vGPU name at all. :-)
> 
> Thanks
> Kevin
> _______________________________________________
> iGVT-g mailing list
> iGVT-g@lists.01.org
> https://lists.01.org/mailman/listinfo/igvt-g

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [iGVT-g] <summary> RE: VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
  2016-02-04  3:52       ` [Qemu-devel] " Neo Jia
@ 2016-02-04  4:16         ` Tian, Kevin
  -1 siblings, 0 replies; 18+ messages in thread
From: Tian, Kevin @ 2016-02-04  4:16 UTC (permalink / raw)
  To: Neo Jia
  Cc: Yang Zhang, igvt-g@lists.01.org, kvm, qemu-devel, Kirti Wankhede,
	Alex Williamson, Lv, Zhiyuan, Paolo Bonzini, Gerd Hoffmann

> From: Neo Jia [mailto:cjia@nvidia.com]
> Sent: Thursday, February 04, 2016 11:52 AM
> 
> > > > 4) Map/unmap guest memory
> > > > --
> > > > It's there for KVM.
> > >
> > > Map and unmap for who?  For the vGPU or for the VM?  It seems like we
> > > know how to map guest memory for the vGPU without KVM, but that's
> > > covered in 7), so I'm not entirely sure what this is specifying.
> >
> > Map guest memory for emulation purpose in vGPU device model, e.g. to r/w
> > guest GPU page table, command buffer, etc. It's the basic requirement as
> > we see in any device model.
> >
> > 7) provides the database (both GPA->IOVA and GPA->HPA), where GPA->HPA
> > can be used to implement this interface for KVM. However for Xen it's
> > different, as special foreign domain mapping hypercall is involved which is
> > Xen specific so not appropriate to be in VFIO.
> >
> > That's why we list this interface separately as a key requirement (though
> > it's obvious for KVM)
> 
> Hi Kevin,
> 
> It seems you are trying to map the guest physical memory into your kernel driver
> on the host, right?

yes.

> 
> If yes, I think we already have the required information to achieve that.
> 
> The type1 IOMMU VGPU interface has provided <QEMU_VA, iova, qemu_mm>, which is
> enough for us to do any lookup.

As I said, it's easy for KVM, but not for Xen, which needs a special
hypercall to map guest memory into the kernel; and VFIO is not used by Xen
today.
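
One way to hide that difference from the device model is a hypervisor-ops
abstraction along the lines of the sketch below (names are made up for
illustration; the KVM backend would go through QEMU's mm, the Xen backend
through the foreign-domain mapping hypercall):

struct gvt_hypervisor_ops {
        /* map @len bytes starting at guest frame @gfn into kernel VA space */
        void *(*map_gfn)(void *vm_handle, unsigned long gfn, size_t len);
        void  (*unmap_gfn)(void *vm_handle, void *va, size_t len);
};

int gvt_register_hypervisor(const struct gvt_hypervisor_ops *ops);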

> 
> >
> > >
> > > > 5) Pin/unpin guest memory
> > > > --
> > > > IGD or any PCI passthru should have same requirement. So we should be
> > > > able to leverage existing code in VFIO. The only tricky thing (Jike may
> > > > elaborate after he is back), is that KVMGT requires to pin EPT entry too,
> > > > which requires some further change in KVM side. But I'm not sure whether
> > > > it still holds true after some design changes made in this thread. So I'll
> > > > leave to Jike to further comment.
> > >
> > > PCI assignment requires pinning all of guest memory, I would think that
> > > IGD would only need to pin selective memory, so is this simply stating
> > > that both have the need to pin memory, not that they'll do it to the
> > > same extent?
> >
> > For simplicity let's first pin all memory, while taking selective pinning as a
> > future enhancement.
> >
> > The tricky thing is that existing 'pin' action in VFIO doesn't actually pin
> > EPT entry too (only pin host page tables for Qemu process). There are
> > various places where EPT entries might be invalidated when guest is
> > running, while KVMGT requires EPT entries to be pinned too. Let's wait
> > for Jike to elaborate whether this part is still required today.
> 
> Sorry, don't quite follow the logic here. The current VFIO TYPE1 IOMMU (including API
> and underlying IOMMU implementation) will pin the guest physical memory and
> install those pages to the proper device domain. Yes, it is only for the QEMU
> process, as that is the process the VM runs in.
> 
> Am I missing something here?

For Qemu there are two page tables involved: one is the host page table
you mentioned here, used in root mode; the other is the EPT page table
used as the 2nd-level translation when the guest runs in non-root mode.
I'm not sure why KVMGT needs to pin EPT entries. Jike should know better
here.

> >
> > b) services to support device emulation, which gonna be hypervisor
> > specific, including:
> > 	4) Map/unmap guest memory
> 
> I think we have the ability to support this already with VFIO, see my comments
> above.

Again, please don't consider only KVM/VFIO. We need to support both KVM
and Xen in this common framework.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [iGVT-g] <summary> RE: VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
  2016-02-04  4:16         ` [Qemu-devel] " Tian, Kevin
@ 2016-02-04 15:04           ` Jike Song
  -1 siblings, 0 replies; 18+ messages in thread
From: Jike Song @ 2016-02-04 15:04 UTC (permalink / raw)
  To: Tian, Kevin
  Cc: Neo Jia, Yang Zhang, igvt-g@lists.01.org, kvm, qemu-devel,
	Kirti Wankhede, Alex Williamson, Lv, Zhiyuan, Paolo Bonzini,
	Gerd Hoffmann

On 02/04/2016 12:16 PM, Tian, Kevin wrote:
>>>>> 5) Pin/unpin guest memory
>>>>> --
>>>>> IGD or any PCI passthru should have same requirement. So we should be
>>>>> able to leverage existing code in VFIO. The only tricky thing (Jike may
>>>>> elaborate after he is back), is that KVMGT requires to pin EPT entry too,
>>>>> which requires some further change in KVM side. But I'm not sure whether
>>>>> it still holds true after some design changes made in this thread. So I'll
>>>>> leave to Jike to further comment.
>>>>
>>>> PCI assignment requires pinning all of guest memory, I would think that
>>>> IGD would only need to pin selective memory, so is this simply stating
>>>> that both have the need to pin memory, not that they'll do it to the
>>>> same extent?
>>>
>>> For simplicity let's first pin all memory, while taking selective pinning as a
>>> future enhancement.
>>>
>>> The tricky thing is that existing 'pin' action in VFIO doesn't actually pin
>>> EPT entry too (only pin host page tables for Qemu process). There are
>>> various places where EPT entries might be invalidated when guest is
>>> running, while KVMGT requires EPT entries to be pinned too. Let's wait
>>> for Jike to elaborate whether this part is still required today.
>>
>> Sorry, don't quite follow the logic here. The current VFIO TYPE1 IOMMU (including API
>> and underlying IOMMU implementation) will pin the guest physical memory and
>> install those pages to the proper device domain. Yes, it is only for the QEMU
>> process, as that is the process the VM runs in.
>>
>> Am I missing something here?
> 
> For Qemu there are two page tables involved: one is the host page table
> you mentioned here, used in root mode; the other is the EPT page table
> used as the 2nd-level translation when the guest runs in non-root mode.
> I'm not sure why KVMGT needs to pin EPT entries. Jike should know better
> here.
> 

There may be some misunderstanding here - KVMGT doesn't need to pin EPT
entries. Previously I mentioned spte pinning only because, at that time,
we wanted to query the pfn for a given gfn through the KVM MMU (rmap +
spte). Now we have a better way of doing this.

I promise this is not a problem :)
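
For illustration only (not necessarily the approach KVMGT settled on), one such
"better way" is to resolve the gfn through KVM's memslot helpers instead of
walking rmap/sptes; exact helper names and return types vary a bit across
kernel versions:

#include <linux/kvm_host.h>

/* copy guest memory through the memslot mapping, no spte involved */
static int example_read_gpa(struct kvm *kvm, gpa_t gpa, void *buf,
                            unsigned long len)
{
        return kvm_read_guest(kvm, gpa, buf, len);
}

/* host virtual address backing a guest frame */
static unsigned long example_gfn_to_hva(struct kvm *kvm, gfn_t gfn)
{
        return gfn_to_hva(kvm, gfn);
}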


--
Thanks,
Jike

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: [Qemu-devel] [iGVT-g] <summary> RE: VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
  2016-02-04 15:04           ` Jike Song
@ 2016-02-05  2:01             ` Tian, Kevin
  -1 siblings, 0 replies; 18+ messages in thread
From: Tian, Kevin @ 2016-02-05  2:01 UTC (permalink / raw)
  To: Song, Jike
  Cc: Neo Jia, Yang Zhang, igvt-g@lists.01.org, kvm, qemu-devel,
	Kirti Wankhede, Alex Williamson, Lv, Zhiyuan, Paolo Bonzini,
	Gerd Hoffmann

> From: Song, Jike
> Sent: Thursday, February 04, 2016 11:05 PM
> 
> On 02/04/2016 12:16 PM, Tian, Kevin wrote:
> >>>>> 5) Pin/unpin guest memory
> >>>>> --
> >>>>> IGD or any PCI passthru should have same requirement. So we should be
> >>>>> able to leverage existing code in VFIO. The only tricky thing (Jike may
> >>>>> elaborate after he is back), is that KVMGT requires to pin EPT entry too,
> >>>>> which requires some further change in KVM side. But I'm not sure whether
> >>>>> it still holds true after some design changes made in this thread. So I'll
> >>>>> leave to Jike to further comment.
> >>>>
> >>>> PCI assignment requires pinning all of guest memory, I would think that
> >>>> IGD would only need to pin selective memory, so is this simply stating
> >>>> that both have the need to pin memory, not that they'll do it to the
> >>>> same extent?
> >>>
> >>> For simplicity let's first pin all memory, while taking selective pinning as a
> >>> future enhancement.
> >>>
> >>> The tricky thing is that existing 'pin' action in VFIO doesn't actually pin
> >>> EPT entry too (only pin host page tables for Qemu process). There are
> >>> various places where EPT entries might be invalidated when guest is
> >>> running, while KVMGT requires EPT entries to be pinned too. Let's wait
> >>> for Jike to elaborate whether this part is still required today.
> >>
> >> Sorry, don't quite follow the logic here. The current VFIO TYPE1 IOMMU (including API
> >> and underlying IOMMU implementation) will pin the guest physical memory and
> >> install those pages to the proper device domain. Yes, it is only for the QEMU
> >> process, as that is the process the VM runs in.
> >>
> >> Am I missing something here?
> >
> > For Qemu there are two page tables involved: one is the host page table
> > you mentioned here, used in root mode; the other is the EPT page table
> > used as the 2nd-level translation when the guest runs in non-root mode.
> > I'm not sure why KVMGT needs to pin EPT entries. Jike should know better
> > here.
> >
> 
> There may be some misunderstanding here - KVMGT doesn't need to pin EPT
> entries. Previously I mentioned spte pinning only because, at that time,
> we wanted to query the pfn for a given gfn through the KVM MMU (rmap +
> spte). Now we have a better way of doing this.
> 
> I promise this is not a problem :)
> 

Thanks Jike for the confirmation. Then we can reuse the existing pin
mechanism in VFIO, which is much easier.
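
For reference, that pin mechanism is driven from userspace through the type1
VFIO_IOMMU_MAP_DMA ioctl; a minimal sketch of the call (error handling omitted)
looks like this:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Map (and thereby pin) one range of QEMU's VA space for the device:
 * the type1 backend pins the pages and installs iova->hpa in the
 * device's IOMMU domain.  @container is an open /dev/vfio/vfio fd with
 * the type1 IOMMU already selected. */
static int dma_map_range(int container, void *vaddr,
                         uint64_t iova, uint64_t size)
{
        struct vfio_iommu_type1_dma_map map;

        memset(&map, 0, sizeof(map));
        map.argsz = sizeof(map);
        map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
        map.vaddr = (uint64_t)(uintptr_t)vaddr;  /* QEMU virtual address  */
        map.iova  = iova;                        /* guest physical address */
        map.size  = size;

        return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}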

Thanks
Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread

* RE: <summary> RE: [iGVT-g] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
  2016-02-03 20:44   ` [Qemu-devel] " Alex Williamson
@ 2016-02-16  6:40     ` Tian, Kevin
  -1 siblings, 0 replies; 18+ messages in thread
From: Tian, Kevin @ 2016-02-16  6:40 UTC (permalink / raw)
  To: Alex Williamson, Lv, Zhiyuan, Gerd Hoffmann
  Cc: Yang Zhang, igvt-g@lists.01.org, qemu-devel, kvm, Paolo Bonzini

> From: Tian, Kevin
> Sent: Thursday, February 04, 2016 11:02 AM
> 
> >
> > Thanks for your summary, Kevin.  It does seem like there are only a few
> > outstanding issues which should be manageable and hopefully the overall
> > approach is cleaner for QEMU, management tools, and provides a more
> > consistent user interface as well.  If we can translate the solution to
> > Xen, that's even better.  Thanks,
> >
> 
> Here is the main open in my head, after thinking about the role of VFIO:
> 
> For above 7 services required by vGPU device model, they can fall into
> two categories:
> 
> a) services to connect vGPU with VM, which are essentially what a device
> driver is doing (so VFIO can fit here), including:
> 	1) Selectively pass-through a region to a VM
> 	2) Trap-and-emulate a region
> 	3) Inject a virtual interrupt
> 	5) Pin/unpin guest memory
> 	7) GPA->IOVA/HVA translation (as a side-effect)
> 
> b) services to support device emulation, which gonna be hypervisor
> specific, including:
> 	4) Map/unmap guest memory
> 	6) Write-protect a guest memory page
> 
> VFIO can fulfill category a), but not for b). A possible abstraction would
> be in vGPU core driver, to allow specific hypervisor registering callbacks
> for category b) (which means a KVMGT specific file say KVM-vGPU will
> be added to KVM to connect both together).
> 
> Then a likely layered blocks would be like:
> 
> VFIO-vGPU  <--------->  vGPU Core  <-------------> KVMGT-vGPU
>                         ^       ^
>                         |       |
>                         |       |
>                         v       v
>                       nvidia   intel
>                        vGPU    vGPU
> 
> Xen will register its own vGPU bus driver (not using VFIO today) and
> also hypervisor services using the same framework. With this design,
> everything is abstracted/registered through vGPU core driver, instead
> of talking with each other directly.
> 
> Thoughts?
> 
> P.S. from the description of above requirements, the whole framework
> might be also extended to cover any device type using same mediated
> pass-through approach. Though graphics has some special requirement,
> the majority are actually device agnostic. Maybe it's better not to tie it
> to a vGPU name at all. :-)
> 

Any feedback on the above open question?

btw, based on the above description I believe the interaction between VFIO
and vGPU has become very clear. The remaining two services are related to
how a hypervisor provides emulation services to a vendor-specific vGPU
device model (more generally it's not vGPU specific - it can apply to any
in-kernel emulation requirement, so KVMGT-vGPU might not be a good name).
This part is not related to VFIO at all, so we'll start prototyping the
VFIO-related changes in parallel.
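
As a rough sketch of that layering (all names hypothetical), vendor drivers and
hypervisor-specific modules would both register with a common "mediated device"
core rather than talk to each other directly, with the hypervisor ops covering
items 4) and 6) above:

struct mdev_vendor_ops {                 /* e.g. intel/nvidia vGPU drivers */
        int  (*create)(void *mdev, unsigned int type);
        void (*destroy)(void *mdev);
        ssize_t (*rw_region)(void *mdev, int region, loff_t off,
                             void *buf, size_t len, bool write);
};

struct mdev_hypervisor_ops {             /* e.g. a KVM- or Xen-specific module */
        void *(*map_gfn)(void *vm, unsigned long gfn, size_t len);    /* item 4 */
        void  (*unmap_gfn)(void *vm, void *va, size_t len);
        int   (*write_protect_gfn)(void *vm, unsigned long gfn);      /* item 6 */
};

int mdev_core_register_vendor(const struct mdev_vendor_ops *ops);
int mdev_core_register_hypervisor(const struct mdev_hypervisor_ops *ops);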

Since this is related to KVM, Paolo, your comment is also welcomed. :-)

Thanks,
Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread
