From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jike Song
Subject: Re: VFIO based vGPU (was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)
Date: Wed, 20 Jan 2016 16:59:50 +0800
Message-ID: <569F4C86.2070501@intel.com>
References: <569C5071.6080004@intel.com> <1453092476.32741.67.camel@redhat.com> <569CA8AD.6070200@intel.com> <1453143919.32741.169.camel@redhat.com>
In-Reply-To: <1453143919.32741.169.camel@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
To: Alex Williamson
Cc: "Ruan, Shuai", "Tian, Kevin", kvm@vger.kernel.org, "igvt-g@lists.01.org", qemu-devel, Gerd Hoffmann, Paolo Bonzini, Zhiyuan Lv
List-Id: kvm.vger.kernel.org

On 01/19/2016 03:05 AM, Alex Williamson wrote:
> On Mon, 2016-01-18 at 16:56 +0800, Jike Song wrote:
>>
>> Would you elaborate a bit about 'iommu backends' here? Previously
>> I thought that the entire type1 would be duplicated. If not, what is
>> supposed to be added, a new vfio_dma_do_map?
>
> I don't know that you necessarily want to re-use any of the
> vfio_iommu_type1.c code as-is, it's just the API that we'll want to
> keep consistent so QEMU doesn't need to learn about a new iommu
> backend.  Opportunities for sharing certainly may arise, you may want
> to use a similar red-black tree for storing current mappings, the
> pinning code may be similar, etc.  We can evaluate on a case by case
> basis whether it makes sense to pull out common code for each of those.

It would be great if you could help with abstracting it :)

> As for an iommu backend in general, if you look at the code flow
> example in Documentation/vfio.txt, the user opens a container
> (/dev/vfio/vfio) and a group (/dev/vfio/$GROUPNUM).  The group is set
> to associate with a container instance via VFIO_GROUP_SET_CONTAINER and
> then an iommu model is set for the container with VFIO_SET_IOMMU.
> Looking at drivers/vfio/vfio.c:vfio_ioctl_set_iommu(), we look for an
> iommu backend that supports the requested extension (VFIO_TYPE1_IOMMU),
> call the open() callback on it and then attempt to attach the group via
> the attach_group() callback.  At this latter callback, the iommu
> backend can compare the device to those that it actually supports.  For
> instance the existing vfio_iommu_type1 will attempt to use the IOMMU
> API and should fail if the device cannot be supported with that.  The
> current loop in vfio_ioctl_set_iommu() will exit in this case, but as
> you can see in the code, it's easy to make it continue and look for
> another iommu backend that supports the requested extension.

Got it: the type1 API towards userspace should be kept, while a new
backend is used for vgpu.
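
To check that I'm reading the direction correctly, below is a bare-bones
sketch of how such a backend could register itself beside type1.  This is
only a sketch against the vfio_iommu_driver_ops interface; every vgpu_*
name is made up, and the actual iova tracking and pinning are left as
TODOs.

/*
 * Hypothetical vgpu iommu backend skeleton, registered like
 * vfio_iommu_type1, but it would only record translations instead of
 * programming a hardware iommu.
 */
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/err.h>
#include <linux/mutex.h>
#include <linux/rbtree.h>
#include <linux/iommu.h>
#include <linux/vfio.h>

struct vgpu_iommu {
	struct rb_root	dma_list;	/* iova -> vaddr records, like type1 */
	struct mutex	lock;
};

static void *vgpu_iommu_open(unsigned long arg)
{
	struct vgpu_iommu *iommu;

	if (arg != VFIO_TYPE1_IOMMU)
		return ERR_PTR(-EINVAL);

	iommu = kzalloc(sizeof(*iommu), GFP_KERNEL);
	if (!iommu)
		return ERR_PTR(-ENOMEM);

	iommu->dma_list = RB_ROOT;
	mutex_init(&iommu->lock);
	return iommu;
}

static void vgpu_iommu_release(void *iommu_data)
{
	kfree(iommu_data);
}

static long vgpu_iommu_ioctl(void *iommu_data,
			     unsigned int cmd, unsigned long arg)
{
	switch (cmd) {
	case VFIO_CHECK_EXTENSION:
		return arg == VFIO_TYPE1_IOMMU;
	case VFIO_IOMMU_MAP_DMA:
		/* TODO: record iova -> vaddr instead of mapping hardware */
		return 0;
	case VFIO_IOMMU_UNMAP_DMA:
		/* TODO: drop the record, unpin if it was pinned */
		return 0;
	default:
		return -ENOTTY;
	}
}

static int vgpu_iommu_attach_group(void *iommu_data,
				   struct iommu_group *group)
{
	/*
	 * TODO: succeed only for groups backed by a vgpu device, so that
	 * vfio_ioctl_set_iommu() can keep looking and fall back to type1
	 * for everything else.
	 */
	return 0;
}

static void vgpu_iommu_detach_group(void *iommu_data,
				    struct iommu_group *group)
{
}

static const struct vfio_iommu_driver_ops vgpu_iommu_ops = {
	.name		= "vfio-iommu-vgpu",
	.owner		= THIS_MODULE,
	.open		= vgpu_iommu_open,
	.release	= vgpu_iommu_release,
	.ioctl		= vgpu_iommu_ioctl,
	.attach_group	= vgpu_iommu_attach_group,
	.detach_group	= vgpu_iommu_detach_group,
};

static int __init vgpu_iommu_init(void)
{
	return vfio_register_iommu_driver(&vgpu_iommu_ops);
}

static void __exit vgpu_iommu_exit(void)
{
	vfio_unregister_iommu_driver(&vgpu_iommu_ops);
}

module_init(vgpu_iommu_init);
module_exit(vgpu_iommu_exit);
MODULE_LICENSE("GPL v2");

If that shape is roughly right, the interesting work is of course the
MAP/UNMAP bookkeeping and the hook the graphics driver calls later to pin
and translate an iova.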

>>> The benefit here is that QEMU could work
>>> unmodified, using the type1 vfio-iommu API regardless of whether a
>>> device is directly assigned or virtual.
>>>
>>> Let's look at the type1 interface; we have simple map and unmap
>>> interfaces which map and unmap process virtual address space (vaddr) to
>>> the device address space (iova).  The host physical address is obtained
>>> by pinning the vaddr.  In the current implementation, a map operation
>>> pins pages and populates the hardware iommu.  A vgpu compatible
>>> implementation might simply register the translation into a kernel-
>>> based database to be called upon later.  When the host graphics driver
>>> needs to enable dma for the vgpu, it doesn't need to go to QEMU for the
>>> translation, it already possesses the iova to vaddr mapping, which
>>> becomes iova to hpa after a pinning operation.
>>>
>>> So, I would encourage you to look at creating a vgpu vfio iommu
>>> backend that makes use of the type1 api since it will reduce the
>>> changes necessary for userspace.
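
Just to spell out the userspace side that stays untouched: the type1 flow
QEMU (or any other user) follows today is roughly the snippet below,
adapted from the example in Documentation/vfio.txt.  Error handling is
omitted, and the group number and mapping size are placeholders.

/* Userspace view of the type1 uAPI a vgpu backend would have to honour. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/vfio.h>

int main(void)
{
	int container = open("/dev/vfio/vfio", O_RDWR);
	int group = open("/dev/vfio/26", O_RDWR);	/* 26 is a placeholder */
	struct vfio_iommu_type1_dma_map dma_map = { .argsz = sizeof(dma_map) };

	/* Bind the group to the container, then pick an iommu backend;
	 * this is where vfio_ioctl_set_iommu() would select type1 or,
	 * for a vgpu group, the new vgpu backend. */
	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1_IOMMU);

	/* Map 1MB of anonymous memory at iova 0: type1 pins it and programs
	 * the hardware iommu here, while a vgpu backend would merely record
	 * the iova -> vaddr translation for later pinning. */
	dma_map.vaddr = (unsigned long)mmap(0, 1024 * 1024,
					    PROT_READ | PROT_WRITE,
					    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	dma_map.size = 1024 * 1024;
	dma_map.iova = 0;
	dma_map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
	ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);

	return 0;
}

So as long as the vgpu backend answers the same VFIO_SET_IOMMU and
VFIO_IOMMU_MAP_DMA/UNMAP_DMA calls, QEMU should indeed need no changes.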

>> BTW, that should be done in the 'bus' driver, right?
>
> I think you have some flexibility between the graphics driver and the
> vfio-vgpu driver in where this is done.  If we want vfio-vgpu to be
> more generic, then vgpu device creation and management should probably
> be done in the graphics driver and vfio-vgpu should be able to probe
> that device and call back into the graphics driver to handle requests.
> If it turns out there's not much for vfio-vgpu to share, ie. it's just
> a passthrough for device specific emulation, then maybe we want a
> vfio-intel-vgpu instead.

Good to know that.

>> Looks like things are getting clearer overall, with small exceptions.
>> Thanks for the advice :)
>
> Yes, please let me know how I can help.  Thanks,
>
> Alex

I will start the coding soon, will do :)

--
Thanks,
Jike