* VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
@ 2016-01-18  2:39 ` Jike Song
From: Jike Song @ 2016-01-18  2:39 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Ruan, Shuai, Tian, Kevin, kvm, qemu-devel, igvt-g, Gerd Hoffmann,
	Paolo Bonzini, Zhiyuan Lv

Hi Alex, let's continue with a new thread :)

Basically we agree with you: exposing vGPU via VFIO lets QEMU
share as much code as possible with pcidev (PF or VF) assignment.
And yes, different vGPU vendors can share quite a lot of the
QEMU part, which will benefit upper layers such as libvirt.


To achieve this, there is quite a lot to do; I'll summarize
it below. I have dug into VFIO for a while but may still have
misunderstood some things, so please correct me :)



First, let me illustrate my understanding of the current VFIO
framework used to pass through a pcidev to a guest:


                 +----------------------------------+
                 |            vfio qemu             |
                 +-----+------------------------+---+
                       |DMA                  ^  |CFG
QEMU                   |map               IRQ|  |
-----------------------|---------------------|--|-----------
KERNEL    +------------|---------------------|--|----------+
          | VFIO       |                     |  |          |
          |            v                     |  v          |
          |  +-------------------+     +-----+-----------+ |
IOMMU     |  | vfio iommu driver |     | vfio bus driver | |
API  <-------+                   |     |                 | |
Layer     |  | e.g. type1        |     | e.g. vfio_pci   | |
          |  +-------------------+     +-----------------+ |
          +------------------------------------------------+


Here, when a particular pcidev is passed through to a KVM guest,
it is attached to the vfio_pci driver in the host, and guest memory
is mapped into the IOMMU via the type1 iommu driver.


Then, the draft infrastructure of a future VFIO-based vGPU:



                 +-------------------------------------+
                 |              vfio qemu              |
                 +----+-------------------------+------+
                      |DMA                   ^  |CFG
QEMU                  |map                IRQ|  |
----------------------|----------------------|--|-----------
KERNEL                |                      |  |
         +------------|----------------------|--|----------+
         |VFIO        |                      |  |          |
         |            v                      |  v          |
         | +--------------------+      +-----+-----------+ |
DMA      | | vfio iommu driver  |      | vfio bus driver | |
API <------+                    |      |                 | |
Layer    | |  e.g. vfio_type2   |      |  e.g. vfio_vgpu | |
         | +--------------------+      +-----------------+ |
         |         |  ^                      |  ^          |
         +---------|--|----------------------|--|----------+
                   |  |                      |  |
                   |  |                      v  |
         +---------|--|----------+   +---------------------+
         | +-------v-----------+ |   |                     |
         | |                   | |   |                     |
         | |      KVMGT        | |   |                     |
         | |                   | |   |   host gfx driver   |
         | +-------------------+ |   |                     |
         |                       |   |                     |
         |    KVM hypervisor     |   |                     |
         +-----------------------+   +---------------------+

        NOTE    vfio_type2 and vfio_vgpu are only *logically* parts
                of VFIO; they may be implemented in the KVM hypervisor
                or in the host gfx driver.



Here we need to implement a new vfio IOMMU driver instead of type1;
let's call it vfio_type2 for now. The main difference from pcidev
assignment is that a vGPU doesn't have its own DMA requester id, so
it has to share mappings with the host and other vGPUs.

        - the type1 iommu driver maps gpa to hpa for pass-through,
          whereas type2 maps iova to hpa;

        - a hardware iommu is always needed by type1, whereas for
          type2 a hardware iommu is optional;

        - type1 invokes the low-level IOMMU API (iommu_map et al) to
          set up the IOMMU page table directly, whereas type2 doesn't
          (it only needs to invoke the higher-level DMA API, such as
          dma_map_page);
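
To make the contrast above concrete, here is a minimal userspace
sketch. It is a toy model, not kernel code: every toy_* name is
hypothetical, toy_type1_map() only stands in for programming the
IOMMU table directly (as iommu_map would), and toy_dma_map_page()
only stands in for dma_map_page. It assumes that without an IOMMU
the DMA layer typically hands back the physical address itself, and
that with one it allocates an IOVA on the caller's behalf. Returning
0 on failure is a simplification of the toy, not how the real DMA
API reports errors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in the kernel. */
#define TOY_PAGE_SHIFT   12
#define TOY_MAX_MAPPINGS 16

struct toy_mapping {
    uint64_t iova;  /* address the device puts on the bus */
    uint64_t hpa;   /* host physical address backing it */
    bool     used;
};

struct toy_iommu_table {
    struct toy_mapping slots[TOY_MAX_MAPPINGS];
    uint64_t next_iova;  /* next IOVA the toy DMA layer hands out */
};

/* Store one translation in the first free slot; -1 if the table is full. */
int toy_table_insert(struct toy_iommu_table *t, uint64_t iova, uint64_t hpa)
{
    assert(t != NULL);
    for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++) {
        if (!t->slots[i].used) {
            t->slots[i] = (struct toy_mapping){ iova, hpa, true };
            return 0;
        }
    }
    return -1;
}

/* type1 style: the caller chooses the IOVA (here the gpa) and the
 * table is programmed directly. */
int toy_type1_map(struct toy_iommu_table *t, uint64_t gpa, uint64_t hpa)
{
    return toy_table_insert(t, gpa, hpa);
}

/* type2 style: the DMA layer chooses the bus address. With no IOMMU
 * the toy returns the hpa unchanged; with one it allocates a fresh
 * IOVA. Returns 0 on failure (a toy convention). */
uint64_t toy_dma_map_page(struct toy_iommu_table *t, bool iommu_present,
                          uint64_t hpa)
{
    assert(t != NULL);
    if (!iommu_present)
        return hpa;

    uint64_t iova = t->next_iova;
    if (toy_table_insert(t, iova, hpa) != 0)
        return 0;
    t->next_iova += 1ULL << TOY_PAGE_SHIFT;
    return iova;
}

/* Look up what the device would reach through a given IOVA. */
bool toy_translate(const struct toy_iommu_table *t, uint64_t iova,
                   uint64_t *hpa)
{
    assert(t != NULL && hpa != NULL);
    for (size_t i = 0; i < TOY_MAX_MAPPINGS; i++) {
        if (t->slots[i].used && t->slots[i].iova == iova) {
            *hpa = t->slots[i].hpa;
            return true;
        }
    }
    return false;
}
```

The point of the sketch is only who picks the bus address: in the
type1 path the caller does, in the type2 path the DMA layer does,
which is why the same type2 code path can work with or without a
hardware IOMMU.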


We also need to implement a new 'bus' driver instead of vfio_pci;
let's call it vfio_vgpu for now:

        - vfio_pci is a real pci driver, with a probe method called
          during device attach; whereas vfio_vgpu is a pseudo
          driver that won't attach to any device, since the GPU is
          always owned by the host gfx driver. It has to do its
          'probing' elsewhere, but still within the host gfx driver
          attached to the device;

        - a pcidev (PF or VF) attached to vfio_pci has a natural path
          in sysfs; whereas a vgpu is purely a software concept:
          vfio_vgpu needs to create/destroy vgpu instances,
          maintain their paths in sysfs (e.g. "/sys/class/vgpu/intel/vgpu0"),
          etc. Something should be added in a higher layer
          to do this (VFIO or DRM);

        - vfio_pci in most cases allows QEMU to access the pcidev
          hardware; whereas vfio_vgpu accesses a virtual resource
          emulated by another device model;

        - vfio_pci injects an IRQ into the guest only when a physical
          IRQ is generated; whereas vfio_vgpu may inject an IRQ for
          emulation purposes. Either way, they can share the same
          injection interface;
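
As a rough illustration of the instance bookkeeping described in the
list above, here is a small userspace sketch in C. It is a toy model
only: the toy_* names, the fixed-size table and the path length are
all hypothetical, and no real sysfs entries are created. The path
layout just follows the "/sys/class/vgpu/intel/vgpu0" example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Toy model only: none of these names exist in VFIO or DRM. */
#define TOY_MAX_VGPUS 8
#define TOY_PATH_LEN  64

struct toy_vgpu {
    bool in_use;
    char path[TOY_PATH_LEN];  /* where the instance would appear in sysfs */
};

struct toy_vgpu_registry {
    const char     *vendor;   /* e.g. "intel" */
    struct toy_vgpu slots[TOY_MAX_VGPUS];
};

/* Create a vgpu instance in the lowest free slot.
 * Returns the instance id, or -1 if no slot is free or the path
 * does not fit. */
int toy_vgpu_create(struct toy_vgpu_registry *r)
{
    assert(r != NULL && r->vendor != NULL);
    for (int id = 0; id < TOY_MAX_VGPUS; id++) {
        struct toy_vgpu *v = &r->slots[id];
        if (v->in_use)
            continue;
        int n = snprintf(v->path, sizeof(v->path),
                         "/sys/class/vgpu/%s/vgpu%d", r->vendor, id);
        if (n < 0 || (size_t)n >= sizeof(v->path))
            return -1;
        v->in_use = true;
        return id;
    }
    return -1;
}

/* Destroy a vgpu instance. Returns 0 on success, -1 if the id is
 * out of range or not in use. */
int toy_vgpu_destroy(struct toy_vgpu_registry *r, int id)
{
    assert(r != NULL);
    if (id < 0 || id >= TOY_MAX_VGPUS || !r->slots[id].in_use)
        return -1;
    r->slots[id].in_use = false;
    memset(r->slots[id].path, 0, sizeof(r->slots[id].path));
    return 0;
}
```

Whichever layer ends up owning this (VFIO or DRM), the sketch only
shows the shape of the job: allocate an id, derive a stable path
from vendor and id, and release both on destroy.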


Questions:

        [1] VFIO No-IOMMU mode (!iommu_present) was reverted upstream
            in ae5515d66362 (Revert: "vfio: Include No-IOMMU mode").
            In my opinion, vfio_type2 doesn't rely on it to support the
            No-IOMMU case; instead it needs a new implementation which
            fits both w/ and w/o IOMMU. Is this correct?


For things not mentioned above, we might discuss them in other
threads, or keep them in a TODO list for now (we can get back to
them after the big picture is agreed):


        - How to expose guest framebuffer via VFIO for SPICE;

        - How to avoid double translation with two stages (GTT + IOMMU):
          whether an identity map is possible and, if yes, how to make
          it more efficient;

        - Application acceleration
          You mentioned that with VFIO, a vGPU may be used by
          applications to get GPU acceleration. It's a potential
          opportunity to use vGPU for container usage, worthy of
          further investigation.





--
Thanks,
Jike
