<summary> RE: [iGVT-g] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT

* <summary> RE: [iGVT-g] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
@ 2016-02-03  8:04 ` Tian, Kevin
  0 siblings, 0 replies; 18+ messages in thread
From: Tian, Kevin @ 2016-02-03  8:04 UTC (permalink / raw)
  To: Lv, Zhiyuan, Alex Williamson, Gerd Hoffmann
  Cc: Yang Zhang, igvt-g@lists.01.org, qemu-devel, kvm, Paolo Bonzini

> From: Zhiyuan Lv
> Sent: Tuesday, February 02, 2016 3:35 PM
> 
> Hi Gerd/Alex,
> 
> On Mon, Feb 01, 2016 at 02:44:55PM -0700, Alex Williamson wrote:
> > On Mon, 2016-02-01 at 14:10 +0100, Gerd Hoffmann wrote:
> > >   Hi,
> > >
> > > > > Unfortunately it's not the only one. Another example is, device-model
> > > > > may want to write-protect a gfn (RAM). In case that this request goes
> > > > > to VFIO .. how it is supposed to reach KVM MMU?
> > > >
> > > > Well, let's work through the problem.  How is the GFN related to the
> > > > device?  Is this some sort of page table for device mappings with a base
> > > > register in the vgpu hardware?
> > >
> > > IIRC this is needed to make sure the guest can't bypass execbuffer
> > > verification and works like this:
> > >
> > >   (1) guest submits execbuffer.
> > >   (2) host makes execbuffer readonly for the guest
> > >   (3) verify the buffer (make sure it only accesses resources owned by
> > >       the vm).
> > >   (4) pass on execbuffer to the hardware.
> > >   (5) when the gpu is done with it make the execbuffer writable again.
> >
> > Ok, so are there opportunities to do those page protections outside of
> > KVM?  We should be able to get the vma for the buffer, can we do
> > something with that to make it read-only.  Alternatively can the vgpu
> > driver copy it to a private buffer and hardware can execute from that?
> > I'm not a virtual memory expert, but it doesn't seem like an
> > insurmountable problem.  Thanks,
> 
> Originally iGVT-g used write-protection for privilege execbuffers, as Gerd
> described. Now the latest implementation has removed wp to do buffer copy
> instead, since the privilege command buffers are usually small. So that part
> is fine.
> 
> But we need write-protection for graphics page table shadowing as well. Once
> guest driver modifies gpu page table, we need to know that and manipulate
> shadow page table accordingly. buffer copy cannot help here. Thanks!
> 

After walking through the whole thread again, let me do a summary here
so everyone can be on the same page. 

First, Jike told me before his vacation, that we cannot do any change to 
KVM module according to community comments. Now I think it's not true. 
We can do necessary changes, as long as it is done in a structural/layered 
approach, w/o hard assumption on KVMGT as the only user. That's the 
guideline we need to obey. :-)

Mostly we care about two aspects regarding to a vgpu driver:
  - services/callbacks which vgpu driver provides to external framework
(e.g. vgpu core driver and VFIO);
  - services/callbacks which vgpu driver relies on for proper emulation
(e.g. from VFIO and/or hypervisor);

The former is being discussed in another thread. So here let's focus
on the latter.

In general Intel GVT-g requires below services for emulation:

1) Selectively pass-through a region to a VM
--
This can be supported by today's VFIO framework, by setting
VFIO_REGION_INFO_FLAG_MMAP for concerned regions. Then Qemu
will mmap that region which will finally be added to the EPT table of
the target VM

2) Trap-and-emulate a region
--
Similarly, this can be easily achieved by clearing MMAP flag for concerned
regions. Then every access from VM will go through Qemu and then VFIO
and finally reach vgpu driver. The only concern is in the performance
part. We need some general mechanism to allow delivering I/O emulation
request directly from KVM in kernel. For example, Alex mentioned some
flavor based on file descriptor + offset. Likely let's move forward with
the default Qemu forwarding, while brainstorming exit-less delivery in parallel.

3) Inject a virtual interrupt
--
We can leverage existing VFIO IRQ injection interface, including configuration
and irqfd interface.

4) Map/unmap guest memory
--
It's there for KVM.

5) Pin/unpin guest memory
--
IGD or any PCI passthru should have same requirement. So we should be
able to leverage existing code in VFIO. The only tricky thing (Jike may
elaborate after he is back), is that KVMGT requires to pin EPT entry too,
which requires some further change in KVM side. But I'm not sure whether
it still holds true after some design changes made in this thread. So I'll
leave to Jike to further comment.

6) Write-protect a guest memory page
--
The primary purpose is for GPU page table shadowing. We need to track
modifications on guest GPU page table, so shadow part can be synchronized
accordingly. Just think about CPU page table shadowing. And old example
as Zhiyuan pointed out, is to write-protect guest cmd buffer. But it becomes
not necessary now.

So we need KVM to provide an interface so some agents can request such
write-protection action (not just for KVMGT. could be for other tracking 
usages). Guangrong has been working on a general page tracking mechanism,
upon which write-protection can be easily built on. The review is still in 
progress.

7) GPA->IOVA/HVA translation
--
It's required in various places, e.g.:
- read a guest structure according to GPA
- replace GPA with IOVA in various shadow structures

We can maintain both translations in vfio-iommu-type1 driver, since
necessary information is ready at map interface. And we should use
MemoryListener to update the database. It's already there for physical
device passthru (Qemu uses MemoryListener and then rely to vfio).

vfio-vgpu will expose query interface, thru vgpu core driver, so that 
vgpu driver can use above database for whatever purpose.

----
Well, then I realize pretty much opens have been covered with a solution
when ending this write-up. Then we should move forward to come up a
prototype upon which we can then identify anything missing or overlooked
(definitely there would be), and also discuss several remaining opens atop
 (such as exit-less emulation, pin/unpin, etc.). Another thing we need
to think is whether this new design is still compatible to Xen side.

Thanks a lot all for the great discussion (especially Alex with many good
inputs)! I believe it becomes much clearer now than 2 weeks ago, about 
how to integrate KVMGT with VFIO. :-)

Thanks
Kevin

^ permalink raw reply	[flat|nested] 18+ messages in thread